Scaffolding for data stream processing applications, leveraging Apache Flink.
The system requires the following packages:
as well as the following environment variables, each pointing to the respective package home directory:
Build the application for a specific query:
$> mvn clean package
Start the environment:
$socstream_home> bash start-env.sh
Visit the Flink web dashboard at http://localhost:8081.
The general job submission is as follows:
$flink_home> bin/flink run [PROGRAM_JAR] [QUERY] [QUERY_OPTS]
where
Notice that the following map/reduce programs are available:
The job can be interrupted by typing Ctrl+C.
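Alternatively, a running job can be cancelled cleanly through the Flink CLI; the job id can be retrieved by listing the running jobs:

```shell
$flink_home> bin/flink list
$flink_home> bin/flink cancel [JOB_ID]
```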
Read the output:
$hadoop_home> bin/hadoop fs -cat [RESULT]/*
where
[RESULT] is the HDFS directory of results.
Stop the environment:
$socstream_home> bash stop-env.sh
The 1st query requires a netcat session to be started:
$> ncat -l 127.0.0.1 9000
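The query consumes whatever is written to this socket, so instead of typing events manually, the dataset file can be replayed through the same listening session, where [DATASET] is the path to the dataset file:

```shell
$> ncat -l 127.0.0.1 9000 < [DATASET]
```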
The 1st query can be executed by running:
$socstream_home> bash query-1.sh
The output is saved to $path/to/the/project/out/query1.
The 2nd query requires a netcat session to be started:
$> ncat -l 127.0.0.1 9000
The 2nd query can be executed by running:
$socstream_home> bash query-2.sh
The output is saved to $path/to/the/project/out/query2.
The 3rd query requires Kafka and Elasticsearch to be configured and started.
Create the Kafka topic topic-query-3:
$kafka-home> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic topic-query-3
Test the topic creation:
$kafka-home> bin/kafka-topics.sh --list --zookeeper localhost:2181
To test message publishing:
$kafka-home> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic topic-query-3
$kafka-home> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic topic-query-3
Create the Elasticsearch index fsq4 with mapping output and the following mapping schema:
{
  "properties": {
    "wStart": {"type": "date"},
    "wEnd": {"type": "date"},
    "rank": {"type": "text"}
  }
}
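As a sketch, the index and mapping above could be created with curl, assuming Elasticsearch listens on the default localhost:9200 (adjust host and port as needed):

```shell
# Save the mapping schema to a file, then create the fsq4 index with it.
cat > fsq4-mapping.json <<'EOF'
{
  "mappings": {
    "output": {
      "properties": {
        "wStart": {"type": "date"},
        "wEnd": {"type": "date"},
        "rank": {"type": "text"}
      }
    }
  }
}
EOF
# Create the index; print a warning instead of failing if Elasticsearch is down.
curl -X PUT "http://localhost:9200/fsq4" \
  -H 'Content-Type: application/json' \
  -d @fsq4-mapping.json \
  || echo "warning: Elasticsearch not reachable at localhost:9200"
```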
The 3rd query can be executed by running:
$socstream_home> bash query-3.sh
The output is saved to ${FLINK_HOME}/log/*.out.
The dataset is provided by … and can be downloaded from here.
Giacomo Marciani, gmarciani@acm.org
Christopher Mutschler, Holger Ziekow, and Zbigniew Jerzak. 2013. The DEBS 2013 grand challenge. In Proceedings of the 7th ACM International Conference on Distributed Event-Based Systems (DEBS '13). ACM, New York, NY, USA, 289-294. DOI
The project is released under the MIT License.