Project author: shimonyagrawal

Project description:
This repository contains coursework for the Data Mining course in the MS Applied Business Analytics program at Boston University.
Language: R
Repository URL: git://github.com/shimonyagrawal/Data-Mining-for-Airbnb-Listings.git
Created: 2020-08-13T10:52:42Z
Project page: https://github.com/shimonyagrawal/Data-Mining-for-Airbnb-Listings

License: MIT License


Data Mining for Airbnb Listings

This repository contains coursework for the Data Mining course in the MS ABA program at Boston University.
Team Members: Shimony Agrawal, Gerardo Bastidas, Alberto Calderon, Benjamin Flavin, Oscar Villarreal Rojas

Introduction

The project analyses Airbnb listings for Copacabana, Brazil, with the aim of improving their performance. It has four key parts:

  1. Data Exploration and Preparation
  2. Prediction
  3. Classification
  4. Clustering

Following these steps, we applied supervised and unsupervised machine learning algorithms (Multiple Linear Regression, K-Nearest Neighbours, Naive Bayes, CART and Clustering Analysis) to predict rental prices, instant bookability and cancellation policies, to assess the impact of the cleaning fee on bookings, and to identify the clusters the rentals belong to.
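As an illustration of the unsupervised side of this list, here is a minimal sketch of a clustering step in base R. It assumes k-means, which the README does not name as the clustering method; the data are synthetic and the column names (`price`, `accommodates`) and the choice of two clusters are hypothetical, not taken from the project.

```r
# Synthetic stand-in for the rentals; values and column names are hypothetical.
set.seed(1)
rentals <- data.frame(
  price        = c(rnorm(50, mean = 150, sd = 20), rnorm(50, mean = 600, sd = 50)),
  accommodates = c(sample(1:3, 50, replace = TRUE), sample(5:8, 50, replace = TRUE))
)

# Standardise the features so that price does not dominate the distance metric.
scaled <- scale(rentals)

# k-means with an assumed k = 2; nstart repeats the random initialisation.
km <- kmeans(scaled, centers = 2, nstart = 10)

# Attach the cluster label to each rental and inspect the cluster sizes.
rentals$cluster <- km$cluster
print(table(rentals$cluster))
```

Scaling before clustering is a common design choice when features are on very different scales; whether the project did so is not stated in the README.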

Analysis

We first performed data wrangling on 33,715 records to eliminate N/A and missing values before further analysis. We then visualized the data to identify outliers. Using the training set, we built five machine learning models in RStudio: Multiple Linear Regression for price prediction, K-Nearest Neighbours for predicting the cancellation policy, Naive Bayes for predicting the instant bookability of a rental, a Classification and Regression Tree to assess the cleaning fee, and, lastly, feature engineering to cluster the rentals. The results and analysis can be used by Airbnb to further improve its listings.
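The wrangling, training-set and price-prediction steps described above could look roughly like the following base R sketch. It runs on synthetic data rather than the Airbnb records; the column names, the 60/40 split ratio and the choice of predictors are assumptions made for illustration and do not reproduce the project's actual code.

```r
# Synthetic stand-in for the listings data; column names are hypothetical.
set.seed(42)
n <- 200
listings <- data.frame(
  bedrooms     = sample(1:4, n, replace = TRUE),
  accommodates = sample(1:8, n, replace = TRUE),
  cleaning_fee = round(runif(n, min = 0, max = 150))
)
listings$price <- 100 + 60 * listings$bedrooms + 15 * listings$accommodates +
  0.5 * listings$cleaning_fee + rnorm(n, sd = 20)

# Inject a few missing values to mimic the raw data.
listings$cleaning_fee[sample(n, 10)] <- NA

# Data wrangling: keep only complete records.
clean <- na.omit(listings)

# Training/validation split (the 60/40 ratio is an assumption).
train_idx <- sample(nrow(clean), size = floor(0.6 * nrow(clean)))
train <- clean[train_idx, ]
valid <- clean[-train_idx, ]

# Multiple linear regression for price, fitted on the training set only.
price_model <- lm(price ~ bedrooms + accommodates + cleaning_fee, data = train)

# Root mean squared error on the held-out validation set.
pred <- predict(price_model, newdata = valid)
rmse <- sqrt(mean((valid$price - pred)^2))
print(summary(price_model)$coefficients)
```

Evaluating on records that were held out of training gives a fairer estimate of how the model would perform on unseen listings than the training error does.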