Project author: AndrewKuzmin

Project description:
Analytics for IoT devices using Apache Spark Structured Streaming 2.4.0
Primary language: Scala
Repository: git://github.com/AndrewKuzmin/Analytics-For-IoT-Devices-Using-Spark.git
Created: 2018-05-04T17:25:26Z
Project homepage: https://github.com/AndrewKuzmin/Analytics-For-IoT-Devices-Using-Spark

License: Apache License 2.0


Analytics-For-IoT-Devices-Using-Spark

Analytics for IoT devices using Apache Spark 2.4.0

Use cases of processing modes (trigger modes)

1) Default;
2) Fixed interval micro-batches;
3) One-time micro-batch;
4) Continuous with fixed checkpoint interval;
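A minimal sketch of how the four trigger modes map onto the `Trigger` factory methods in the Spark 2.4 Structured Streaming API. The object and helper names (`TriggerModesSketch`, `startWith`), the built-in `rate` test source, the console sink and the interval values are illustrative assumptions, not taken from the repository. Leaving the trigger unset gives the default mode, where the next micro-batch starts as soon as the previous one finishes; the string passed to `Trigger.Continuous` is the checkpoint interval, not a batch interval.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.{StreamingQuery, Trigger}

object TriggerModesSketch {
  // Hypothetical helper: streams the built-in "rate" source to the console with the given trigger.
  def startWith(spark: SparkSession, trigger: Option[Trigger]): StreamingQuery = {
    val stream = spark.readStream
      .format("rate")                 // test source emitting (timestamp, value) rows
      .option("rowsPerSecond", "5")
      .load()
    val writer = stream.writeStream.format("console").outputMode("append")
    // No trigger set means the default micro-batch mode.
    trigger.fold(writer)(t => writer.trigger(t)).start()
  }

  // The four modes listed above, in the same order.
  val modes: Seq[(String, Option[Trigger])] = Seq(
    "default"        -> None,
    "fixed interval" -> Some(Trigger.ProcessingTime("10 seconds")),
    "one-time"       -> Some(Trigger.Once()),
    "continuous"     -> Some(Trigger.Continuous("1 second")) // checkpoint every second
  )
}
```

For example, `TriggerModesSketch.startWith(spark, Some(Trigger.Once())).awaitTermination()` runs a single micro-batch and then stops. In Spark 2.4, continuous mode accepts only map-like queries (projections and selections) and a limited set of sources and sinks; the rate source and console sink used here are among the supported ones.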

Optimizations

1) Tungsten execution engine;
2) Catalyst query optimizer;
3) Cost-based optimizer;
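These three layers are driven mostly by configuration rather than application code. A hedged sketch for Spark 2.4: the cost-based optimizer is disabled by default and only helps once table statistics exist, so the example saves a small hypothetical `readings` table to the session catalog (written under a local `spark-warehouse` directory), runs `ANALYZE TABLE`, and then calls `explain(true)` to print the Catalyst plans. The object, table and column names are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object OptimizerSketch {
  // Settings touching the three layers listed above (Spark 2.4 key names).
  val optimizerConf: Map[String, String] = Map(
    "spark.sql.cbo.enabled"             -> "true", // cost-based optimizer, off by default
    "spark.sql.cbo.joinReorder.enabled" -> "true", // CBO join reordering, off by default
    "spark.sql.codegen.wholeStage"      -> "true"  // Tungsten whole-stage codegen, on by default
  )

  def main(args: Array[String]): Unit = {
    val builder = SparkSession.builder().appName("optimizer-sketch").master("local[*]")
    val spark = optimizerConf.foldLeft(builder) { case (b, (k, v)) => b.config(k, v) }.getOrCreate()
    import spark.implicits._

    // Hypothetical readings table; CBO needs statistics, so persist it and analyze it.
    Seq(("d1", 21.5), ("d2", 19.0), ("d1", 22.1)).toDF("device_id", "temp")
      .write.mode("overwrite").saveAsTable("readings")
    spark.sql("ANALYZE TABLE readings COMPUTE STATISTICS FOR COLUMNS device_id, temp")

    // Prints parsed, analyzed and optimized logical plans (Catalyst) plus the physical plan;
    // physical operators marked with '*' run inside whole-stage generated code (Tungsten).
    spark.table("readings").filter(col("temp") > 20).groupBy("device_id").count().explain(true)

    spark.stop()
  }
}
```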

Structured Sessionization

1) KeyValueGroupedDataset.mapGroupsWithState;
2) KeyValueGroupedDataset.flatMapGroupsWithState;
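A minimal sessionization sketch with `mapGroupsWithState`, following the pattern of Spark's own `StructuredSessionization` example rather than this repository's code. Assumptions: the `Reading`, `SessionInfo` and `SessionUpdate` case classes, keying sessions by a device id, a 30-second processing-time gap, and the `rate` source standing in for device traffic. `flatMapGroupsWithState` has the same shape, except that the state function returns an `Iterator[U]` (so a key can emit zero or several rows per trigger) and the call also takes an `OutputMode`.

```scala
import java.sql.Timestamp

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout}

// Hypothetical record types; the repository's own classes may differ.
case class Reading(deviceId: String, timestamp: Timestamp)
case class SessionInfo(numEvents: Int, startMs: Long, endMs: Long) {
  def durationMs: Long = endMs - startMs
}
case class SessionUpdate(deviceId: String, durationMs: Long, numEvents: Int, expired: Boolean)

object SessionizationSketch {
  // Pure merge step, kept outside Spark so it can be checked without a running stream.
  def mergeSession(old: Option[SessionInfo], timestampsMs: Seq[Long]): SessionInfo = old match {
    case Some(s) =>
      SessionInfo(s.numEvents + timestampsMs.size,
        math.min(s.startMs, timestampsMs.min), math.max(s.endMs, timestampsMs.max))
    case None =>
      SessionInfo(timestampsMs.size, timestampsMs.min, timestampsMs.max)
  }

  // Called per device key when new readings arrive, and again when that key's timeout fires.
  def updateSession(deviceId: String, readings: Iterator[Reading],
                    state: GroupState[SessionInfo]): SessionUpdate =
    if (state.hasTimedOut) {
      // No readings within the gap: emit the final session and drop its state.
      val s = state.get
      state.remove()
      SessionUpdate(deviceId, s.durationMs, s.numEvents, expired = true)
    } else {
      val merged = mergeSession(state.getOption, readings.map(_.timestamp.getTime).toList)
      state.update(merged)
      state.setTimeoutDuration("30 seconds") // assumed session gap
      SessionUpdate(deviceId, merged.durationMs, merged.numEvents, expired = false)
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("sessionization-sketch").master("local[*]").getOrCreate()
    import spark.implicits._

    // The "rate" source stands in for device traffic, spread over 3 fake device ids.
    val readings = spark.readStream.format("rate").option("rowsPerSecond", "5").load()
      .select(($"value" % 3).cast("string").as("deviceId"), $"timestamp")
      .as[Reading]

    val sessions = readings
      .groupByKey(_.deviceId)
      .mapGroupsWithState[SessionInfo, SessionUpdate](GroupStateTimeout.ProcessingTimeTimeout)(updateSession _)

    // mapGroupsWithState requires the "update" output mode.
    sessions.writeStream.outputMode("update").format("console").start().awaitTermination()
  }
}
```

The timed-out branch is what closes a session: it emits one final row with `expired = true` and removes the state, so the next reading for that device starts a fresh session.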

Examples from the following notebooks were used:

1) Complex and Nested Data;
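The nested-data technique referred to above can be sketched with `from_json`, dot-path column access and `explode`. This is a hedged illustration, not code from the notebook or the repository: the payload shape, field names and the `NestedJsonSketch` object are hypothetical. The same `flatten` function should also work on a streaming DataFrame that has a `json` string column, for example a Kafka `value` cast to string and renamed.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode, from_json}
import org.apache.spark.sql.types._

object NestedJsonSketch {
  // Hypothetical payload: a nested struct plus an array of readings.
  val schema: StructType = new StructType()
    .add("device_id", StringType)
    .add("location", new StructType().add("lat", DoubleType).add("lon", DoubleType))
    .add("temps", ArrayType(DoubleType))

  def flatten(raw: DataFrame): DataFrame =
    raw
      .select(from_json(col("json"), schema).as("d"))  // JSON string column -> struct column
      .select(
        col("d.device_id"),                             // dot syntax reaches into the struct
        col("d.location.lat").as("lat"),
        explode(col("d.temps")).as("temp"))             // one output row per array element

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("nested-json-sketch").master("local[*]").getOrCreate()
    import spark.implicits._
    val raw = Seq("""{"device_id":"t-1","location":{"lat":37.4,"lon":-122.1},"temps":[20.5,21.0]}""")
      .toDF("json")
    flatten(raw).show()
    spark.stop()
  }
}
```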

The following reference was used for the test data:

Nest Developers

The JSON data generator by EverWatch Corporation was used to produce the test data:

Json Data Generator
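One way to feed generated test data into a stream, assuming the generator writes newline-delimited JSON files into a local directory: Spark's file source can replay them as micro-batches. The directory path, the `GeneratedDataSketch` object and the field subset (loosely modelled on Nest thermostat attributes such as `ambient_temperature_c` and `humidity`) are assumptions; check them against the actual generator configuration. The input directory must exist before the query starts.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types._

object GeneratedDataSketch {
  // Assumed field subset; verify names and types against the generator's output.
  val thermostatSchema: StructType = new StructType()
    .add("device_id", StringType)
    .add("ambient_temperature_c", DoubleType)
    .add("humidity", IntegerType)

  // Streaming file sources need an explicit schema (schema inference is off by default).
  def readGenerated(spark: SparkSession, dir: String): DataFrame =
    spark.readStream
      .schema(thermostatSchema)
      .option("maxFilesPerTrigger", "1") // replay the generated files one per micro-batch
      .json(dir)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("generated-data-sketch").master("local[*]").getOrCreate()
    // Hypothetical default output directory of the data generator.
    val readings = readGenerated(spark, args.headOption.getOrElse("/tmp/iot-json"))
    readings.writeStream.format("console").start().awaitTermination()
  }
}
```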