Author: jpbirdy

Description:
kafka-hdfs-source-connector
Language: Java
Repository: git://github.com/jpbirdy/kafka-hdfs-source-connector.git
Created: 2017-05-14T14:07:23Z
Community: https://github.com/jpbirdy/kafka-hdfs-source-connector

License:


kafka-hdfs-source-connector

kafka-hdfs-source-connector is a Kafka Connect connector for loading data from
HDFS into Kafka. The corresponding HDFS sink connector is a separate project.

  • supports multiple files
  • supports watching both files and folders
  • supports Kerberos authentication
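
To give a rough idea of the multi-file side of this, here is a JDK-only sketch of how a source might enumerate every regular file under a watched path on each poll. This is an illustration only; `FileScan` and `scan` are hypothetical names, and the connector's actual internals may differ:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FileScan {
    // Collect all regular files under a root, the way a multi-file
    // source might enumerate its inputs on each poll cycle.
    public static List<Path> scan(Path root) throws IOException {
        try (Stream<Path> s = Files.walk(root)) {
            return s.filter(Files::isRegularFile)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("hdfs-src-demo");
        Files.writeString(tmp.resolve("a.txt"), "one");
        Files.writeString(tmp.resolve("b.txt"), "two");
        System.out.println(scan(tmp).size()); // prints 2
    }
}
```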

Development

To build a development version you’ll need a recent version of Kafka. You can build kafka-hdfs-source-connector with Maven using the standard lifecycle phases.
Just run ./build.sh to build the project into a standalone jar file.

Configuration

A source connector properties file may look like the one below. It can be placed
in any directory, such as ‘${CONFLUENT_HOME}/etc/kafka-connect-hdfs/’

    name=test-hdfs
    connector.class=hdfs.HDFSSourceConnector
    hadoop.conf.dir=
    hadoop.home=
    hdfs.url=hdfs://localhost:9000
    hdfs.authentication.kerberos=false
    connect.hdfs.principal=
    connect.hdfs.keytab=
    hdfs.namenode.principal=
    kerberos.ticket.renew.period.ms=3600000
    file=
    file.path=/tmp
    topic.prefix=test-hdfs
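
Several of the keys above stay blank unless Kerberos is enabled, so a small sanity check can catch an inconsistent file before deployment. A minimal sketch using only `java.util.Properties`; the validation rule and the `ConfigCheck` class are assumptions for illustration, not something the connector itself enforces:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class ConfigCheck {
    // Hypothetical check: returns the topic prefix after verifying that
    // connect.hdfs.principal is set whenever hdfs.authentication.kerberos=true.
    public static String validate(Properties p) {
        boolean kerberos = Boolean.parseBoolean(
                p.getProperty("hdfs.authentication.kerberos", "false"));
        if (kerberos && p.getProperty("connect.hdfs.principal", "").isEmpty()) {
            throw new IllegalStateException(
                    "connect.hdfs.principal is required when Kerberos is enabled");
        }
        return p.getProperty("topic.prefix");
    }

    public static void main(String[] args) throws IOException {
        Properties p = new Properties();
        p.load(new StringReader(
                "hdfs.url=hdfs://localhost:9000\n"
              + "hdfs.authentication.kerberos=false\n"
              + "topic.prefix=test-hdfs\n"));
        System.out.println(validate(p)); // prints test-hdfs
    }
}
```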

Running steps:

  1. Change into CONFLUENT_HOME

    $ cd ${CONFLUENT_HOME}

  2. Start ZooKeeper

    $ bin/zookeeper-server-start etc/kafka/zookeeper.properties

  3. Start Kafka Broker

    $ bin/kafka-server-start etc/kafka/server.properties

  4. Start Schema Registry (start it after the Kafka broker)

    $ bin/schema-registry-start etc/schema-registry/schema-registry.properties

  5. Start HDFS

    $ start-dfs.sh

  6. Start the HDFS source connector

    $ bin/connect-standalone \
        etc/schema-registry/connect-avro-standalone.properties \
        etc/kafka-connect-hdfs/test-hdfs.properties

  7. Start a console consumer

    $ ./bin/kafka-avro-console-consumer --zookeeper localhost:2181 \
        --topic test-hdfs --from-beginning