Project author: Mkang525

Project description:
Performed an ETL process on Amazon datasets using PySpark, AWS and Google Colab.
Primary language: Jupyter Notebook
Repository: git://github.com/Mkang525/Amazon_ETL.git
Created: 2020-11-16T22:20:33Z
Project page: https://github.com/Mkang525/Amazon_ETL

License:


Big Data


  • Performed ETL on two Amazon datasets entirely in the cloud. The first dataset covered beauty products; the second covered watches.

  • Transformed each dataset to fit the tables defined in the schema file. Ensured the DataFrames matched the tables in both column names and data types, then loaded each DataFrame into its corresponding table in an RDS instance.

  • Demonstrated ability to conduct statistical analyses on data using PySpark
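
The transform-and-load step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook code: the column names in `EXAMPLE_SCHEMA`, the helper names, and the choice of PostgreSQL as the RDS engine are all assumptions made for the example. The helpers rely only on the standard PySpark calls `DataFrame.selectExpr` and `DataFrameWriter.jdbc`.

```python
# Hypothetical target schema: column name -> Spark SQL type.
# The real column list lives in the project's schema file and is not reproduced here.
EXAMPLE_SCHEMA = {
    "review_id": "string",
    "customer_id": "int",
    "product_id": "string",
    "review_date": "date",
}


def cast_expressions(schema):
    """Build one SQL CAST expression per target column, keeping the column name."""
    return [f"CAST(`{name}` AS {dtype}) AS `{name}`" for name, dtype in schema.items()]


def jdbc_url(host, database, port=5432):
    """Build a JDBC URL for an RDS endpoint (assumes a PostgreSQL engine)."""
    return f"jdbc:postgresql://{host}:{port}/{database}"


def load_table(df, table, schema, url, user, password):
    """Select and cast the table's columns, then append the rows over JDBC.

    `df` is expected to be a pyspark.sql.DataFrame.
    """
    # Keep only the columns the table defines, cast to the table's types.
    shaped = df.selectExpr(*cast_expressions(schema))
    shaped.write.jdbc(
        url=url,
        table=table,
        mode="append",
        properties={
            "user": user,
            "password": password,
            "driver": "org.postgresql.Driver",  # assumption: PostgreSQL JDBC driver
        },
    )
    return shaped
```

In a Colab session this would be called once per table, with a DataFrame read from the dataset and the PostgreSQL JDBC driver available to Spark. Deriving the `selectExpr` list from a single schema mapping keeps column names and data types aligned with the target table in one place.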

