Project author: rachelzhaolp

Project description:
Spark exercises: Spark RDD, SparkSQL, Spark ML pipelines, Spark in the cloud (AWS)
Primary language: Python
Repository: git://github.com/rachelzhaolp/BigData-HW-Spark.git
Created: 2020-09-23T19:35:49Z
Project page: https://github.com/rachelzhaolp/BigData-HW-Spark



BigData-HW-Spark

This repository contains solutions for five Spark exercises:

  1. SparkSQL
  2. Spark RDD
  3. Spark DataFrame and Machine Learning Pipelines — Gradient Boosted Tree
  4. Spark Application — Crime Analysis
  5. Spark Application — Profit Prediction

Directory structure

- `README.md`: you are here
- `SparkSQL/`
  - `exercise1.py`: Python source code
  - `exercise1.png`: output of the Spark job
  - `exercise1-findings.txt`: findings
  - `Problem_Statement.md`: problem statement
- `SparkRDD/`
  - `exercise2.py`: Python source code
  - `exercise2.txt`: output of the Spark job
  - `exercise2-findings.txt`: findings
  - `Problem_Statement.md`: problem statement
- `Spark_Machine_Learning_Pipeline/`
  - `exercise3.py`: Python source code
  - `exercise3.txt`: output of the Spark job (out-of-sample R² of the model)
  - `Problem_Statement.md`: problem statement
- `Spark_Application_Crime_Analysis/`
  - `exercise4.py`: Python source code
  - `exercise4.txt`: output of the Spark job
  - `exercise4.png`: output of the Spark job
  - `exercise3-findings.txt`: findings
  - `Problem_Statement.md`: problem statement
- `Spark_Application_Profit_Prediction/`
  - `exercise5.py`: Python source code
  - `mape_all.txt`: output of the Spark job
  - `Problem_Statement.md`: problem statement
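`exercise3.txt` reports the model's out-of-sample R². `mape_all.txt` most likely holds mean absolute percentage error (MAPE) values, but the file name is the only hint, so treat that reading as an assumption. To make those numbers easier to interpret, here is a minimal pure-Python sketch of both metrics on a held-out set. It is not taken from the repository. Inside a Spark job, R² is more commonly computed with `pyspark.ml.evaluation.RegressionEvaluator(metricName="r2")`. The helper names `r_squared` and `mape` are hypothetical.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot, on held-out rows."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent.

    Assumption: rows whose true value is 0 are skipped, because the
    percentage error is undefined there.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must be of equal length")
    pairs = [(y, p) for y, p in zip(y_true, y_pred) if y != 0]
    if not pairs:
        raise ValueError("no non-zero true values to score")
    return 100.0 * sum(abs((y - p) / y) for y, p in pairs) / len(pairs)


if __name__ == "__main__":
    actual = [3.0, 5.0, 7.0]
    predicted = [2.5, 5.0, 7.5]
    print(r_squared(actual, predicted))  # 0.9375
    print(round(mape(actual, predicted), 2))  # 7.94
```

R² is scale-free relative to the variance of the target. MAPE, by contrast, is a relative error per row, which is why zero-valued targets need special handling.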