项目作者: leo-you

项目描述 :
Country Profiling Using PCA and KMeans Clustering
高级语言: HTML
项目地址: git://github.com/leo-you/Country-PCA-and-Custering.git
创建时间: 2020-07-12T03:40:34Z
项目社区:https://github.com/leo-you/Country-PCA-and-Custering

开源协议:

下载


Country Profiling Using PCA and Clustering

The data from this analysis is from kaggle Unsupervised Learning on Country Data, which contains socio-economic and health related factors of 167 countries over the world. The goal of the project is to categorise the countries using socio-economic and health factors that determine the overall development of the country.

The variables provided in the dataset include socio-economic factors such as export, import and GDP of a country, as well as heath-related factors such as child mortality rate, life expectancy and health spend % of a country. The data is relatively well formatted and unlablled. Due to the nature of the dataset, the analysis will use a combinatino of Unsupervised Machine Learning techniques such as kmeans clustering and Dimensionality Reduction techniques such as Principal Component Analysis. We’ll also perform outlier analysis and scalling the data to provide a more accurate clustering result. The analysis will be performed in Python using Jupyter Notebook. The packages used in this analysis are as followed:

  • Seaborn
  • Sklearn
  • Matlibplot
  • Numpy
  • Pandas
  • Yellowbrick