Workshop material for the Near Real-Time Predictive Analytics with Apache Spark Structured Streaming workshop at the Open Data Science Conference WEST 2019
Find me on Twitter: @newfront
Find me on Medium: @newfrontcreative
About Twilio: Twilio
Install Docker Desktop (https://www.docker.com/products/docker-desktop)
Additional Docker Resources:
cd /path/to/odsc-west-2019-realtime-analytics/docker
./run.sh install
./run.sh start
The initial download can take some time depending on your network connection. Expect around 5-10 minutes, and fingers crossed it goes faster!
The ./run.sh init process will 1.) download Apache Spark and untar it into docker/spark-2.4.4 and 2.) unzip the wine reviews data set from docker/data.
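The two init steps described above can be sketched roughly as follows. This is not the contents of run.sh, only an illustration under stated assumptions: the Apache archive URL and the Hadoop 2.7 build are guesses at what gets downloaded, and the data archive name `wine-reviews.zip` is hypothetical. The sketch defaults to a dry run, so it only prints the commands it would execute.

```shell
# Sketch of the two init steps. DRY_RUN defaults to 1, so commands are only printed.
DRY_RUN="${DRY_RUN:-1}"
SPARK_VERSION="2.4.4"
# Assumption: Apache archive URL and Hadoop 2.7 build; run.sh may use another source.
SPARK_URL="https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop2.7.tgz"
SPARK_DIR="docker/spark-${SPARK_VERSION}"
# Hypothetical archive name; check docker/data for the real file.
DATA_ZIP="docker/data/wine-reviews.zip"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

init_sketch() {
  # 1.) download Apache Spark and untar it into docker/spark-2.4.4
  run mkdir -p "$SPARK_DIR"
  run curl -fL -o /tmp/spark.tgz "$SPARK_URL"
  run tar -xzf /tmp/spark.tgz -C "$SPARK_DIR" --strip-components=1
  # 2.) unzip the wine reviews data set inside docker/data
  run unzip -o "$DATA_ZIP" -d docker/data
}

init_sketch
```

Setting `DRY_RUN=0` before running the sketch would execute the commands instead of printing them.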
The ./run.sh start command will 1.) download the official Apache Zeppelin docker image, and 2.) download the official Redis docker image. It will then run docker compose on redis followed by zeppelin. Zeppelin will use the Spark version (2.4.4) that you downloaded in the init phase so we are running on the latest and greatest Spark.
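Once start has finished, a quick check that both containers respond can save some debugging later. The sketch below assumes the Redis container is named redis5 (the name used with redis-cli later in this guide) and that Zeppelin is published on localhost port 8080, which is Zeppelin's default but may differ in this compose setup. The helper names are made up for illustration.

```shell
# Assumption: container name redis5; Zeppelin reachable on localhost:8080.
check_redis() {
  # redis-cli ping replies PONG when the server is reachable
  [ "$(docker exec redis5 redis-cli ping 2>/dev/null)" = "PONG" ]
}

check_zeppelin() {
  # /api/version is part of Zeppelin's REST API
  curl -fsS --max-time 5 "http://localhost:${ZEPPELIN_PORT:-8080}/api/version" >/dev/null 2>&1
}

report() {
  if check_redis; then echo "redis: up"; else echo "redis: down"; fi
  if check_zeppelin; then echo "zeppelin: up"; else echo "zeppelin: down"; fi
}

report
```

If Zeppelin is mapped to a different port, set `ZEPPELIN_PORT` before calling `report`.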
Enter spark in the Search Interpreters input field.
Click the edit button to initiate editing mode.
Add the following key/values.
Update the following key/values
com.redislabs
2.4.0
Click Save and these settings will be applied to the Zeppelin Runtime.
docker exec -it redis5 redis-cli
xadd books-liked * userId 1 bookId 3
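Typing events one at a time gets tedious, so a small generator can help when you want several entries on the stream. The sketch below prints one xadd command per user/book pair in the same shape as the command above. The id values are made-up sample data and `gen_events` is a hypothetical helper name.

```shell
# Print one xadd command per (userId, bookId) pair; the ids are made-up sample values.
gen_events() {
  for user in 1 2 3; do
    for book in 3 7; do
      printf 'xadd books-liked * userId %s bookId %s\n' "$user" "$book"
    done
  done
}

gen_events
```

Piping the output into the container, for example `gen_events | docker exec -i redis5 redis-cli`, should append the entries, since redis-cli reads commands from standard input when it is not attached to a terminal. Running `xlen books-liked` in redis-cli afterwards reports how many entries the stream holds.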
These events will now be processed in spark-2.4.4 foreachBatch