Spark exercises: Spark RDD, SparkSQL, Spark ML pipelines, Spark in the Cloud (AWS)
This repository contains solutions for five Spark exercises, one per directory:
├── README.md <- You are here
├── SparkSQL
│ ├── exercise1.py <- Python source code file
│ ├── exercise1.png <- Output of the Spark Job
│ ├── exercise1-findings.txt <- Findings
│ ├── Problem_Statement.md <- Problem Statement
├── SparkRDD
│ ├── exercise2.py <- Python source code file
│ ├── exercise2.txt <- Output of the Spark Job
│ ├── exercise2-findings.txt <- Findings
│ ├── Problem_Statement.md <- Problem Statement
├── Spark_Machine_Learning_Pipeline
│ ├── exercise3.py <- Python source code file
│ ├── exercise3.txt <- Output of the Spark Job: out-of-sample R-squared of the model
│ ├── Problem_Statement.md <- Problem Statement
├── Spark_Application_Crime_Analysis
│ ├── exercise4.py <- Python source code file
│ ├── exercise4.txt <- Output of the Spark Job
│ ├── exercise4.png <- Output of the Spark Job
│ ├── exercise3-findings.txt <- Findings
│ ├── Problem_Statement.md <- Problem Statement
├── Spark_Application_Profit_Prediction
│ ├── exercise5.py <- Python source code file
│ ├── mape_all.txt <- Output of the Spark Job
│ ├── Problem_Statement.md <- Problem Statement