Project author: mustafabacchus

Project description:
Create and query relational tables with Hive through Apache Ambari.
Project URL: git://github.com/mustafabacchus/Hive-Data-Reporting.git
Created: 2018-06-03T02:26:24Z
Project community: https://github.com/mustafabacchus/Hive-Data-Reporting

Note: Requires a basic understanding of Hadoop, the Hortonworks Sandbox, Apache Hive, and how to navigate HDFS through ‘File View’.

Hive - Data Reporting

Data derived from public Nasdaq repositories.

Description:

This repository demonstrates creating relational tables in HDFS, warehousing structured data, and reporting on the stored data.

The Files:

NASDAQ_daily_prices.csv, NASDAQ_dividends.csv - CSV files containing the raw data.
questions.txt - Use cases used to create the tables and report on them.
script.sql - The scripts used to create the tables and queries used for reporting.

Prerequisites:

1. Hortonworks Sandbox (Hadoop framework) installed and running with Apache Ambari.
(https://hortonworks.com/tutorial/learning-the-ropes-of-the-hortonworks-sandbox/)
2. Apache Hive installed on HDFS.

Execution:

1. Download (clone) this repository to your local machine.
2. Create the following directories using ‘File View’:
‘/user/maria_dev/nasdaq/daily_prices/’, ‘/user/maria_dev/nasdaq/dividends/’
(https://hortonworks.com/tutorial/loading-and-querying-data-with-hadoop/)
3. Use ‘File View’ to place each CSV file into its respective HDFS directory.
4. Open Hive, paste the entire contents of script.sql, and execute it.
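
For readers who prefer a terminal, steps 2 to 4 can be sketched from the sandbox shell as shown here. This is an assumed alternative to ‘File View’ and the Hive view, not the author's method. It assumes the `hdfs` and `hive` clients are on the PATH, that the commands run from the root of the cloned repository, and that the current user may write under /user/maria_dev. The `run` helper and the `DRY_RUN` variable are hypothetical names introduced for this sketch; by default the commands are only printed, not executed.

```shell
#!/bin/sh
# Sketch only: a command-line equivalent of steps 2 to 4.
# Assumption: run on the sandbox, from the cloned repository's root.
set -eu

# HDFS base directory taken from step 2 of the README.
BASE=/user/maria_dev/nasdaq

# Hypothetical helper: prints the command unless DRY_RUN=0 is set,
# in which case the command is actually executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 2: create one HDFS directory per CSV file.
run hdfs dfs -mkdir -p "$BASE/daily_prices" "$BASE/dividends"

# Step 3: copy each CSV file into its respective directory.
run hdfs dfs -put NASDAQ_daily_prices.csv "$BASE/daily_prices/"
run hdfs dfs -put NASDAQ_dividends.csv "$BASE/dividends/"

# Step 4: execute the whole script non-interactively.
run hive -f script.sql
```

Reviewing the printed commands first and then re-running with `DRY_RUN=0` avoids creating directories or loading files by accident.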