Author: JannikArndt

Description:
Out of the box scheduling, logging, monitoring and data governance for your Scala ETL jobs.
Language: Scala
Repository: git://github.com/JannikArndt/DataMover.git
Created: 2017-03-30T11:59:45Z
Project page: https://github.com/JannikArndt/DataMover

License: MIT License


DataMover

Out of the box scheduling, logging, monitoring and data governance for your scala ETL jobs.


Getting DataMover

    <dependency>
      <groupId>de.jannikarndt</groupId>
      <artifactId>datamover_2.12</artifactId>
      <version>1.2.0</version>
    </dependency>

or

    libraryDependencies += "de.jannikarndt" % "datamover" % "1.1.0"

Example:

    import de.jannikarndt.datamover._
    import de.jannikarndt.datamover.governance.GovernedID
    import scala.concurrent.duration._
    import scala.language.postfixOps

    object ExampleJob {
      // Schedule ExampleJob to run every 10 seconds
      def main(args: Array[String]): Unit = DataMover run classOf[ExampleJob] every (10 seconds)
    }

    class ExampleJob extends DataMover("ExampleJob") {
      override def run(): Unit = {
        // Here you can access
        // - logger     => log debug, info or error information
        // - monitor    => track throughput
        // - governedId => append this to your output to find the job that generated it

        // Logging
        logger.info(s"Logs are aggregated per run. These are for Job ${governedId.identifier}.")

        // Write your own EXTRACT function

        // Monitor your input
        monitor.input(5)

        // Write your own TRANSFORM function

        // Write your own LOAD function

        // Monitor your output
        monitor.output("Appended successfully")
      }
    }

Then head to http://localhost:55555 (each additional job uses the next free port, counting up from 55555):

Monitoring Screenshot
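
To find out which port a further job would most likely be served on, you can probe for the first unbound port starting at 55555. The following is only a sketch of that idea using the JDK's `java.net.ServerSocket`; `FreePort` and `firstFreePort` are hypothetical names and are not part of DataMover, and DataMover's own port selection may work differently.

```scala
import java.net.ServerSocket
import scala.util.Try

object FreePort {
  // Hypothetical helper, NOT part of DataMover: returns the first port at or
  // above `start` that can currently be bound on this machine, if any.
  def firstFreePort(start: Int = 55555, maxTries: Int = 100): Option[Int] =
    (start until start + maxTries).find { port =>
      // Binding succeeds only if nothing else is listening on the port;
      // the probe socket is closed again immediately.
      Try(new ServerSocket(port)).map(_.close()).isSuccess
    }
}
```

Calling `FreePort.firstFreePort()` while one job is already running on 55555 would typically return a later port, which is where the next job's monitoring page is likely to appear.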

Monitoring with Prometheus

Prometheus can scrape DataMover jobs directly. Add targets: ['localhost:55555'] to a scrape configuration in your prometheus.yml.
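
As a rough illustration, a scrape configuration in prometheus.yml could look like the fragment below. The job name `datamover` is an arbitrary label chosen for this example, and the fragment assumes the job exposes its metrics on Prometheus's default path; only the target address comes from the text above.

```yaml
# prometheus.yml (fragment)
scrape_configs:
  - job_name: 'datamover'              # hypothetical label, pick any name
    static_configs:
      - targets: ['localhost:55555']   # first DataMover job; list further jobs by their ports
```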

License

This code is open source software licensed under the MIT License.

To-Do / Planned

  • Data governance
  • Alerting when a job fails
  • Uptime monitoring for sources and sinks
  • Versioning: which version of which job is deployed where, plus a changelog
  • Feature toggles
  • Central server to monitor all jobs
  • Interface for Elastic/Kibana
  • Interface for Jolokia

Deployment

Snapshots are deployed at oss.sonatype.org.
Releases are deployed at maven.org.

Changes

v1.3.0

  • Support for Prometheus
  • Jobs automatically choose a free port, starting at 55555
  • LogLevel coloring
  • GovernedId can be accessed anywhere in class

v1.2.0

  • Logger now supports ERROR, WARN and DEBUG
  • Governor writes valid JSON
  • Artifact ID now contains the Scala version

v1.1.0

  • Upgraded to Scala 2.12.3
  • Removed unnecessary dependencies
  • Removed old files