Apache Flume Source for AMPS
This project allows you to directly use an AMPS messaging subscription as an
input source into Apache Flume. This is done by implementing a custom pollable
Flume source. It also provides a Flume sink that can publish a stream into AMPS.
For more in depth information on the AMPS / Apache Flume integration, see our Crank Up Apache Flume blog article.
First make sure you have at least Maven 3.3 installed.
Create a ${basedir}/jars
directory and place the 5.3.3.0 or later AMPS Java client JAR under it.
Execute mvn clean install
.
The resulting JAR will be at: target/AMPSFlumeComponents-1.1.0-SNAPSHOT.jar
To install the AMPS custom Flume source as a plugin within your Flume
installation, follow the instructions below:
Create the plugins.d
directory if it doesn’t exist in the Flume home
directory.
Create the following directory & file structure under plugins.d
:
plugins.d/
AMPSFlume/
lib/
AMPSFlumeComponents-1.1.0-SNAPSHOT.jar
libext/
amps_client.jar
More information can be found about Flume plugins in the
Flume User Guide.
Look at the plugins.d
section.
Included in the project repository is a working example that shows off the powerful aggregated subscription feature of AMPS.
Copy the example Flume config from src/test/resources/flume-conf.properties
to $FLUME_HOME/conf/
.
Start an AMPS server using the config at src/test/resources/amps-config.xml
.
Create the temporary output directory specified in the above configuration:
mkdir /tmp/amps-flume/
From $FLUME_HOME, start Flume with:
bin/flume-ng agent -Xmx512m -c conf/ -f conf/flume-conf.properties -n agent -Dflume.root.logger=INFO,console
In a new window, subscribe to the AMPSFlumeSinkTest topic using the AMPS spark utility:
spark subscribe -server localhost:9007 -topic AMPSFlumeSinkTest
In another window, publish the example JSON messages to the Orders topic using the AMPS spark utility:
spark publish -server localhost:9007 -topic Orders -rate 1 -file src/test/resources/messages.json
Notice that we are publishing at a rate of 1 message per a second, so that in the output we can see the aggregate fields change over time as updates arrive. The messages are replicated to two Flume memory channels, feeding two separate sinks (a file_roll sink and the custom AMPS sink). After about 15 seconds our aggregated subscription view of the data gives us all the results under the output directory (/tmp/amps-flume/
) and the window with the AMPSFlumeSinkTest subscription (which the AMPS sink publishes to).
Please send any issues or questions to support@crankuptheamps.com