项目作者: BalaPasala20
项目描述 :
Clean a museum objects data using python. Visualize data with Tableau.
高级语言: Jupyter Notebook
项目地址: git://github.com/BalaPasala20/DataCleaningPython_TableauVisualization.git
DataCleaningPython_TableauVisualization
A Museum’s messy data about its objects is available in MessyData_Sample.csv. The goal is to clean the data and then visualize it using Tableau.
Techniques used for data cleaning
- Identify the null values and their percentage to total values
- Remove null values when they are insignificantly small. The null values are removed for the string type data.
- Repetitive values with pipe delimiter are restructured to a single value within a data frame
- Data types of columns are changed appropriately.
- Medium column contains description about composition of objects in the museum. The word frequency is calculated within the column to identify different types of “Medium”.
Techniques used for Tableau visualization
- Text tables
- Horizontal bar charts
- Highlight tables
- Packed bubbles chart
Prerequisites
- Jupyter notebook
- Tableau-desktop
Steps to run the project
- The python file “Clean_Data.ipynb” takes in “MessyData_Sample.csv” and outputs two .csv files - medium1.csv and Clean_Data.csv. The .csv files are also available in the repository.
- After opening the Tableau desktop, the csv files - “medium1.csv” and “Clean_Data.csv” are taken as data source by a logical relationship by considering “Medium” as a common field. The worksheets are already developed to display the charts.
Data Cleaning using Python:
- “MessyData_Sample.csv” has messy data about the museum’s objects.
- The python program “Clean_Data.ipynb” takes the “MessyData_Sample.csv” and outputs two .csv files - “medium1.csv” and “Clean_Data.csv”
- The columns - Object ID, Object Name, Medium, Department, Culture, Object Begin Date, Object End Data, and Artist Nationality, are considered for the analysis.
- The “medium1.csv” contains the word frequency about each type of medium. The “Medium” column in “MessyData_Sample.csv” has description about composition of objects.
- The “Artist Nationality” column has repetitve values with a pipe delimiter. A new column “Artist_Nationality_new” is created with a single value.
- Clean data is saved inside “Clean_Data.csv”.
Outputs of Tableau visualization:
- A calculated field “Age” is generated by finding the difference between “Object End Date” and “Object Begin Date”.
- Horinzontal bar chart showing different medium of artifacts in the museum.
- Text table with a “Culture” filter showing “Department” of each “Object ID”.
- Text table with a bar chart showing average “age” of an “Object ID” under each “Culture” along with “Culture” filter.
- Highlight table that displays no.of artists based on their nationality.
- Horizontal bar chart showing top 20 oldest objects.
- Packed bubble chart showing top “Culture”s that produced high number of “Object ID”s.