Out-of-the-box scheduling, logging, monitoring and data governance for your Scala ETL jobs.
<dependency>
<groupId>de.jannikarndt</groupId>
<artifactId>datamover_2.12</artifactId>
<version>1.2.0</version>
</dependency>
or, with sbt:
libraryDependencies += "de.jannikarndt" %% "datamover" % "1.1.0"
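The published artifact carries the Scala binary version in its name (datamover_2.12). sbt's %% operator appends that suffix automatically, so the sbt line resolves only when the project itself is built with Scala 2.12. Below is a minimal build.sbt sketch. It assumes the 1.2.0 release from the Maven snippet, and the 2.12 patch version is an illustrative choice.

```scala
// build.sbt (sketch): the project must use Scala 2.12 so that %%
// resolves "datamover" to the published artifact datamover_2.12.
scalaVersion := "2.12.18"

libraryDependencies += "de.jannikarndt" %% "datamover" % "1.2.0"
```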
import de.jannikarndt.datamover._
import de.jannikarndt.datamover.governance.GovernedID
import scala.concurrent.duration._
import scala.language.postfixOps

object ExampleJob {
  def main(args: Array[String]): Unit = DataMover run classOf[ExampleJob] every (10 seconds)
}

class ExampleJob extends DataMover("ExampleJob") {
  override def run(): Unit = {
    // here you can access
    // - logger     => log debug, info or error information
    // - monitor    => track throughput
    // - governedId => append this to your output to find the job that generated it

    // Logging
    logger.info(s"Logs are aggregated per run. These are for Job ${governedId.identifier}.")

    // Write your own EXTRACT function

    // Monitor your input
    monitor.input(5)

    // Write your own TRANSFORM function

    // Write your own LOAD function

    // Monitor your output
    monitor.output("Appended successfully")
  }
}
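The comments in run() describe an extract, transform, load sequence in which governedId is appended to the output, so every written row can be traced back to the job run that produced it. The sketch below shows that tagging pattern without the library. RunId, Tagged, extract, transform and load are hypothetical stand-ins and are not DataMover's API. The identifier string used with it is made up.

```scala
// Hypothetical stand-in for the value behind governedId.identifier;
// DataMover's real type and identifier format may differ.
final case class RunId(identifier: String)

// An output row tagged with the identifier of the run that produced it.
final case class Tagged[A](value: A, producedBy: String)

object GovernanceSketch {
  // EXTRACT: a hypothetical in-memory source standing in for a real one.
  def extract(): Seq[Int] = Seq(1, 2, 3, 4, 5)

  // TRANSFORM: any pure function over the extracted rows.
  def transform(rows: Seq[Int]): Seq[Int] = rows.map(_ * 10)

  // LOAD: append the run identifier to every row before writing it out,
  // so each row can later be traced back to the run that created it.
  def load(rows: Seq[Int], runId: RunId): Seq[Tagged[Int]] =
    rows.map(row => Tagged(row, runId.identifier))

  // One run of the pipeline: extract, transform, then load with tagging.
  def runOnce(runId: RunId): Seq[Tagged[Int]] =
    load(transform(extract()), runId)
}
```

For example, GovernanceSketch.runOnce(RunId("ExampleJob-42")) returns five rows, each with producedBy set to "ExampleJob-42".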
Then head to http://localhost:55555 (the port number increases for every additional job).
Prometheus can scrape DataMover jobs directly. Just add targets: ['localhost:55555'] to your prometheus.yml!
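When several jobs run on one host, each one needs its own target entry. The sketch below builds that targets line for a given number of jobs. PrometheusTargets is a hypothetical helper, and it assumes the first job listens on 55555 with each additional job on the next port (55556, 55557, ...). The README only says the port increases.

```scala
// Hypothetical helper: builds the Prometheus targets line for n DataMover
// jobs on one host, ASSUMING the first job uses port 55555 and every
// additional job uses the next port number.
object PrometheusTargets {
  val BasePort = 55555

  // Ports of jobs 0 until jobCount, one apart.
  def ports(jobCount: Int): Seq[Int] =
    (0 until jobCount).map(BasePort + _)

  // Renders e.g. targets: ['localhost:55555', 'localhost:55556']
  def targetsLine(host: String, jobCount: Int): String =
    ports(jobCount).map(p => s"'$host:$p'").mkString("targets: [", ", ", "]")
}
```

In prometheus.yml this line goes under a static_configs entry of a scrape job.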
This code is open source software licensed under the MIT License.
Uptime monitoring for sources and sinks
Versioning: which version of which job is deployed where, plus a changelog
Feature toggles
Central server to monitor all jobs:
Interface for Elastic/Kibana
Snapshots are published to oss.sonatype.org.
Releases are published to maven.org.