Project author: SatLight
Project description:
Builds an airline delay prediction model on a large dataset using big data technologies such as Kylin and Spark.
Language: Jupyter Notebook
Project URL: git://github.com/SatLight/Airline-Delay-Prediction-using-Spark-and-Kylin.git
Airline-Delay-Prediction-using-Spark-and-Kylin
The Mapper and Reducer detect null values and replace them with the column average.
The model is built in PySpark using Decision Tree Classifiers.
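The null-replacement step described above can be sketched in plain Python. This is a minimal illustration, not the repository's actual Mapper and Reducer: the function names (`mapper`, `reducer`, `fill_nulls`, `impute`), the set of null markers in `NULL_TOKENS`, and the assumption that every column is numeric are all hypothetical.

```python
from collections import defaultdict

# Assumed null markers; the real dataset may encode missing values differently.
NULL_TOKENS = {"", "NA", "null"}


def mapper(row):
    """Emit (column_index, value) for every non-null field in a row."""
    for idx, field in enumerate(row):
        if field not in NULL_TOKENS:
            yield idx, float(field)


def reducer(pairs):
    """Average the values emitted for each column index."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for idx, value in pairs:
        sums[idx] += value
        counts[idx] += 1
    return {idx: sums[idx] / counts[idx] for idx in sums}


def fill_nulls(rows, averages):
    """Replace each null field with its column average.

    A column with no non-null values has no average and is filled with None.
    """
    return [
        [averages.get(idx) if field in NULL_TOKENS else float(field)
         for idx, field in enumerate(row)]
        for row in rows
    ]


def impute(rows):
    """Run the map step, the reduce step, then the replacement pass."""
    pairs = (pair for row in rows for pair in mapper(row))
    return fill_nulls(rows, reducer(pairs))
```

Two passes over the data are needed because a column's average is only known after every row has been mapped. For example, `impute([["10", "NA"], ["20", "4"]])` fills the missing field with the average of the remaining values in the second column.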
Dataset