Project author: DimitriKourouniotis

Project description:
Data Science Capstone: Machine Learning applied to FCC Net Neutrality comments
Primary language: Jupyter Notebook
Repository: git://github.com/DimitriKourouniotis/Capstone1FCCNN.git
Created: 2018-04-06T06:56:57Z
Project page: https://github.com/DimitriKourouniotis/Capstone1FCCNN


Dimitri Kourouniotis Data Science Enthusiast

Analysing FCC Net Neutrality Comments using Machine Learning and NLP

Supervised Machine Learning using NLP
by Dimitri Kourouniotis
In the winter of 2017, numerous articles reported on the number of fake comments submitted regarding the FCC's repeal of Net Neutrality laws.

A blog post by Jeff Kao caught my attention, and I followed up with him about his analysis of the text. He provided me with the full, unedited set of 22 million filings. I analyzed a sample from 3 million of them to see what I could find and to develop my own features based on the text of the faked comments.
Image of Jeff Kao Article


Capstone Report (pdf)

Capstone Summary Slidedeck (pdf)

00 Summary and Table of Contents

01 Importing 3 million FCC records from SQL

02 Email domains
Emails from fake domains or fake accounts

03 WordCloud
Wordcloud of comments

04 Submission Frequency
Suspicious submission timing

05 State Population Estimates 2016 and Comment Percentages

06 Plotting Differences from Average
Variation in comments by state relative to population

07 Choropleth grid Map of US
Variations mapped

08 Statistics Proportions by State Relative to Population
Variation in comments by state relative to normal

09 Classifiers and Feature Selection
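
As a rough illustration of the idea behind notebooks 05 through 08 (comparing each state's share of comments with its share of population), here is a minimal sketch. It assumes a one-proportion z-statistic as the measure of deviation, which may differ from what the notebooks actually compute. The function name state_z_scores, the state codes, and all numbers are hypothetical and are not taken from the FCC data.

```python
import math


def state_z_scores(comments, population):
    """Compare each state's share of comments with its share of population.

    Returns {state: (comment_share, population_share, z)}, where z is a
    one-proportion z-statistic under the assumption that comments arrive
    in proportion to population.
    """
    total_comments = sum(comments.values())
    total_population = sum(population.values())
    results = {}
    for state, pop in population.items():
        p0 = pop / total_population                      # expected share
        p_hat = comments.get(state, 0) / total_comments  # observed share
        # Standard error of the observed share if the expected share were true
        se = math.sqrt(p0 * (1 - p0) / total_comments)
        results[state] = (p_hat, p0, (p_hat - p0) / se)
    return results


# Made-up illustrative counts (hypothetical states AA, BB, CC).
comments = {"AA": 500, "BB": 300, "CC": 200}
population = {"AA": 4_000_000, "BB": 4_000_000, "CC": 2_000_000}

for state, (p_hat, p0, z) in state_z_scores(comments, population).items():
    print(f"{state}: comments {p_hat:.1%}, population {p0:.1%}, z = {z:+.2f}")
```

A large positive or negative z suggests a state is over- or under-represented relative to its population. With millions of comments even small differences produce large z values, so the raw difference in shares is worth reading alongside it.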


Acknowledgements

Many thanks to my mentor, Rajiv Shah!

Thanks to the following for the data and code help for this capstone:

Data: Jeff Kao

More than a million pro-repeal net neutrality comments were likely faked

https://hackernoon.com/more-than-a-million-pro-repeal-net-neutrality-comments-were-likely-faked-e9f0e3ed36a6

Word Cloud: Nikhil Kumar Singh

wordcloud example

https://github.com/nikhilkumarsingh/wordcloud-example/blob/7a77e97c4da135b67ad924be96269d6bb68a0fe6/mywc.py

Chorogrid Plot: lavinben88

chorogrid tutorial part 2

https://plot.ly/~lavinben88/116/chorogrid-tutorial-part-2-chorogri/

Classifier Iterator: Evgeny Volkov

SMS spam detection with various classifiers

https://www.kaggle.com/muzzzdy/sms-spam-detection-with-various-classifiers