Project author: gmarciani

Project description:
Scaffolding for data stream processing applications, leveraging Apache Flink.
Primary language: Java
Project URL: git://github.com/gmarciani/flink-app.git
Created: 2017-06-19T14:58:08Z
Project community: https://github.com/gmarciani/flink-app

License: MIT License


FLINK SCAFFOLDING

Scaffolding for data stream processing applications, leveraging Apache Flink

Requirements

The following packages must be installed on the system:

  • Java >= 1.8.0
  • Maven >= 3.5.0
  • Hadoop = 2.8.0
  • Flink = 1.3.1 (scala 2.11)
  • Kafka >= 0.10.2.1
  • Elasticsearch >= 5.5.0
  • Kibana >= 5.5.0
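
As a quick sanity check, several of these tools print their installed version from the command line (assuming they are on your PATH or the environment variables below are set):

  $> java -version
  $> mvn -version
  $> $HADOOP_HOME/bin/hadoop version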

and the following environment variables, pointing to the respective package home directory:

  • JAVA_HOME
  • MAVEN_HOME
  • HADOOP_HOME
  • FLINK_HOME
  • KAFKA_HOME
  • ELASTICSEARCH_HOME
  • KIBANA_HOME
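
For example, the variables might be exported in your shell profile as follows (the installation paths below are illustrative, not prescriptive):

  export JAVA_HOME=/usr/lib/jvm/java-8-openjdk
  export MAVEN_HOME=/opt/maven
  export HADOOP_HOME=/opt/hadoop-2.8.0
  export FLINK_HOME=/opt/flink-1.3.1
  export KAFKA_HOME=/opt/kafka_2.11-0.10.2.1
  export ELASTICSEARCH_HOME=/opt/elasticsearch-5.5.0
  export KIBANA_HOME=/opt/kibana-5.5.0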

Build

Build the application for a specific query:

  $> mvn clean package
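
If the build succeeds, the application JAR should be produced under target/ (the exact file name depends on the project's pom.xml); a quick way to locate it:

  $> ls target/*.jar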

Usage

Start the environment:

  $socstream_home> bash start-env.sh

Visit the Flink web dashboard at http://localhost:8081.

The general job submission is as follows:

  $flink_home> bin/flink run [PROGRAM_JAR] [QUERY] [QUERY_OPTS]

where

  • [PROGRAM_JAR] is the local absolute path to the program’s JAR;
  • [QUERY] is the name of the program query to execute;
  • [QUERY_OPTS] are the query arguments (e.g. --optName optValue).
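
For instance, a submission of the first query might look like the following (the JAR path is illustrative):

  $flink_home> bin/flink run /path/to/flink-app/target/flink-app.jar query-1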

Note that the following queries are available:

  • query-1: the 1st query, leveraging … .

The job can be interrupted by typing Ctrl+C.

Read the output:

  $hadoop_home> bin/hadoop fs -cat [RESULT]/*

where [RESULT] is the HDFS directory of results.
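
For example, if the results directory is /user/flink/query1 (an illustrative path), the output can be read with:

  $hadoop_home> bin/hadoop fs -cat /user/flink/query1/*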

Stop the environment:

  $socstream_home> bash stop-env.sh

Query 1

The 1st query requires a netcat session to be started:

  $> ncat -l 127.0.0.1 9000
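
The query's socket source presumably connects to this listener on 127.0.0.1:9000; any line typed into the netcat session is streamed to the running job as input.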

The 1st query can be executed by running:

  $socstream_home> bash query-1.sh

The output is saved to $path/to/the/project/out/query1.

Query 2

The 2nd query requires a netcat session to be started:

  $> ncat -l 127.0.0.1 9000

The 2nd query can be executed by running:

  $socstream_home> bash query-2.sh

The output is saved to $path/to/the/project/out/query2.

Query 3

The 3rd query requires Kafka and Elasticsearch to be configured and started.

Create the Kafka topic topic-query-3:

  $kafka-home> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic topic-query-3

Verify that the topic has been created:

  $kafka-home> bin/kafka-topics.sh --list --zookeeper localhost:2181
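
If creation succeeded, the listing should include the new topic:

  topic-query-3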

To test message publishing:

  $kafka-home> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic topic-query-3
  $kafka-home> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic topic-query-3
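
Run the producer and the consumer in two separate terminals: lines typed into the producer console should appear in the consumer console, confirming that the broker is reachable on localhost:9092.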

Create the Elasticsearch index fsq4 with the mapping type output, using the following mapping schema:

  {
    "properties": {
      "wStart": {"type": "date"},
      "wEnd": {"type": "date"},
      "rank": {"type": "text"}
    }
  }
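
One way to create the index with this mapping, assuming Elasticsearch listens on the default port 9200, is the following request (a sketch; adjust host and index settings as needed):

  $> curl -XPUT 'http://localhost:9200/fsq4' -H 'Content-Type: application/json' -d '
  {
    "mappings": {
      "output": {
        "properties": {
          "wStart": {"type": "date"},
          "wEnd": {"type": "date"},
          "rank": {"type": "text"}
        }
      }
    }
  }'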

The 3rd query can be executed by running:

  $socstream_home> bash query-3.sh

The output is saved to ${FLINK_HOME}/log/*.out.
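
To follow the output while the job is running, the Flink TaskManager stdout files can be tailed:

  $flink_home> tail -f log/*.out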

Dataset

The dataset is provided by … and can be downloaded from here.

Authors

Giacomo Marciani, gmarciani@acm.org

References

Christopher Mutschler, Holger Ziekow, and Zbigniew Jerzak. 2013. The DEBS 2013 grand challenge. In Proceedings of the 7th ACM International Conference on Distributed Event-Based Systems (DEBS '13). ACM, New York, NY, USA, 289-294.

License

The project is released under the MIT License.