Project author: vineetdcunha

Project description:
Processing and transforming data with the Hadoop ecosystem
Primary language: Python
Repository: git://github.com/vineetdcunha/Hadoop_Ecosystem.git
Created: 2020-11-25T05:22:08Z
Project page: https://github.com/vineetdcunha/Hadoop_Ecosystem

License:



Hadoop_Ecosystem

WordCount_Python - Counts the words in an input file using Python.

Hive_Vehicle_data - HiveQL queries on vehicle data.

Hive_transform - HiveQL with Python code to transform and load the data.

Hive_Sum_transform - Sum and transformation of Lineorder data using Python.

Hadoop_Stream_Average - Calculates the average using Hadoop Streaming and a Python script.

Hadoop_Stream_Std_Dev - Calculates the standard deviation using Hadoop Streaming and a Python script.

Hadoop_Stream_Join - Joins the Employee and Customer datasets using Hadoop Streaming.

Hadoop_Stream_Join_Agg - Joins and aggregates data from the Lineorder and Customer datasets using Hadoop Streaming.

Hadoop_Stream_Cluster - Clustering using Hadoop Streaming.

HBase - Creating a system for Employee data.

lo_pig - lo_discount_count and lo_revenue_sum: count and sum over Lineorder data.

Pig_Join_Agg - Joins and aggregates data from the Lineorder and Customer datasets using Pig.

Hadoop_Multi_Node_WordCount - Counts the words in an input file using Python on a multi-node Hadoop setup.

Mahout_Page_Rank - Implementation of the PageRank algorithm using Mahout.

Mahout_Kmeans_Matrix_Fact - Implementation of k-means and matrix factorization for MovieLens data using Mahout.
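
Several of the entries above pair Hadoop Streaming with Python. As a rough illustration of that pattern, and not the repository's actual code, here is a minimal sketch of a word-count mapper and reducer plus a per-key average reducer. All function names are hypothetical, and the tab-separated `key<TAB>value` record layout is an assumption about the input. In a real streaming job each function would live in its own script that reads `sys.stdin` and prints to standard output; here they are plain generators so they can be tried locally.

```python
from itertools import groupby


def wordcount_mapper(lines):
    """Emit one 'word<TAB>1' record per whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def _grouped(sorted_records):
    """Split 'key<TAB>value' records and group consecutive records by key."""
    pairs = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    return groupby(pairs, key=lambda kv: kv[0])


def wordcount_reducer(sorted_records):
    """Sum the counts for each word; records must arrive sorted by key."""
    for word, group in _grouped(sorted_records):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def average_reducer(sorted_records):
    """Average the numeric values for each key; records must arrive sorted."""
    for key, group in _grouped(sorted_records):
        values = [float(value) for _, value in group]
        yield f"{key}\t{sum(values) / len(values)}"


def run_locally(lines, mapper, reducer):
    """Simulate map -> sort -> reduce in a single process (no Hadoop needed)."""
    return list(reducer(sorted(mapper(lines))))
```

The reducers rely on records with the same key arriving next to each other, which is what the sort between the map and reduce phases provides. Calling `run_locally(["a b a", "B c"], wordcount_mapper, wordcount_reducer)` exercises the same flow without a cluster.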