Project author: VibhaBelavadi

Project description:
Sentiment Analysis using NLTK: Maximum entropy classification
Language: Python
Repository: git://github.com/VibhaBelavadi/sentiment-analysis-using-nltk.git
Created: 2016-02-02T05:06:06Z
Project page: https://github.com/VibhaBelavadi/sentiment-analysis-using-nltk

License: MIT License


Sentiment Analysis using NLTK:

Problem Statement:

  1. Perform sentiment analysis by applying Maximum Entropy classification to movie review data.
  2. Observe the effect on accuracy of stop words, punctuation, lemmatization, and the amount of training data.
  3. Analyze unbalanced collections by varying the proportions of positive and negative samples in the training data.
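As a minimal sketch of the first step, the snippet below trains NLTK's `MaxentClassifier` on a four-review toy dataset. The toy reviews, the `bag_of_words` helper, and the choice of the GIS algorithm with 10 iterations are assumptions made for illustration; the README does not say which settings the project uses.

```python
from nltk.classify import MaxentClassifier


def bag_of_words(tokens):
    """Map a token list to the {feature_name: value} dict NLTK classifiers expect."""
    return {token: True for token in tokens}


# Toy stand-in for the movie review data (illustrative, not from the project).
train_set = [
    (bag_of_words(["great", "movie"]), "pos"),
    (bag_of_words(["wonderful", "film"]), "pos"),
    (bag_of_words(["awful", "movie"]), "neg"),
    (bag_of_words(["boring", "film"]), "neg"),
]

# GIS is one of NLTK's built-in maxent training algorithms; trace=0 silences logs.
classifier = MaxentClassifier.train(
    train_set, algorithm="gis", trace=0, max_iter=10
)

prediction = classifier.classify(bag_of_words(["great"]))
```

On real data, the same call would take feature dictionaries built from full reviews, and accuracy on a held-out set could be measured with `nltk.classify.accuracy`.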

Case Studies:

The following case studies were proposed:

Case Study I:

Maximum entropy classification on a) raw data, b) with stop words, c) without punctuation, d) with lemmatization, using all words and assuming equal proportions of positive and negative examples.
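The preprocessing variants can be sketched as optional filters over a token list. Everything here is an assumption for illustration: the `preprocess` helper, its flags, and the tiny `STOP_WORDS` set are hypothetical, and stop-word handling is exposed as a flag so that either reading of variant b) can be tried. Lemmatization is left out because NLTK's `WordNetLemmatizer` needs the WordNet data to be downloaded first.

```python
import string

# Tiny illustrative stop list; a real run would more likely use a fuller list
# such as nltk.corpus.stopwords.words("english") (requires a corpus download).
STOP_WORDS = {"a", "an", "and", "is", "it", "of", "the", "this", "was"}


def preprocess(tokens, drop_stop_words=False, drop_punctuation=False):
    """Return lowercased tokens with the selected filters applied."""
    result = []
    for token in tokens:
        token = token.lower()
        # Skip tokens made up entirely of punctuation characters.
        if drop_punctuation and all(ch in string.punctuation for ch in token):
            continue
        if drop_stop_words and token in STOP_WORDS:
            continue
        result.append(token)
    return result


tokens = ["This", "movie", "was", "great", "!"]
raw = preprocess(tokens)
no_stop = preprocess(tokens, drop_stop_words=True)
no_punct = preprocess(tokens, drop_punctuation=True)
```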

Case Study II:

Maximum entropy classification on a) raw data, b) with stop words, c) without punctuation, d) with lemmatization, using the top 500 words and assuming equal proportions of positive and negative examples.

Case Study III:

Maximum entropy classification on a) raw data, b) with stop words, c) without punctuation, d) with lemmatization, using the top 1000 words and assuming equal proportions of positive and negative examples.
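Case Studies II and III restrict the features to the top 500 or 1000 words. One plausible reading, assumed here, is that "top" means most frequent across the corpus. The sketch below selects such a vocabulary and builds features from it; `top_n_words` and `restricted_features` are hypothetical helper names, and the three short documents are made up.

```python
from collections import Counter


def top_n_words(documents, n):
    """Return the set of the n most frequent tokens across all documents."""
    counts = Counter(token for tokens in documents for token in tokens)
    return {word for word, _ in counts.most_common(n)}


def restricted_features(tokens, vocabulary):
    """Bag-of-words features limited to tokens in the chosen vocabulary."""
    return {token: True for token in tokens if token in vocabulary}


documents = [
    ["great", "movie", "great", "cast"],
    ["boring", "movie", "great", "score"],
    ["movie", "boring"],
]
vocabulary = top_n_words(documents, 2)
features = restricted_features(["great", "boring", "movie"], vocabulary)
```

With `n` set to 500 or 1000 and `documents` replaced by the tokenized reviews, the resulting feature dictionaries could be passed to the classifier in place of the all-words features.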

Case Study IV:

Maximum entropy classification on a) raw data, b) with stop words, c) without punctuation, d) with lemmatization, using all words and assuming unequal proportions of negative and positive examples.

Case Study V:

Maximum entropy classification on a) raw data, b) with stop words, c) without punctuation, d) with lemmatization, using all words and assuming only negative examples.
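Case Studies IV and V vary the class proportions in the training data. A minimal way to build such training sets, assuming the positive and negative reviews are available as two separate lists, is sketched below. The `sample_training_set` helper, its parameters, and the placeholder review names are hypothetical; the README does not state which proportions were used.

```python
import random


def sample_training_set(positives, negatives, size, positive_fraction, seed=0):
    """Draw `size` labeled examples with the requested share of positives."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    n_pos = round(size * positive_fraction)
    n_neg = size - n_pos
    sample = [(doc, "pos") for doc in rng.sample(positives, n_pos)]
    sample += [(doc, "neg") for doc in rng.sample(negatives, n_neg)]
    rng.shuffle(sample)
    return sample


# Placeholder documents standing in for real reviews.
positives = [f"pos_review_{i}" for i in range(100)]
negatives = [f"neg_review_{i}" for i in range(100)]

balanced = sample_training_set(positives, negatives, size=40, positive_fraction=0.5)
skewed = sample_training_set(positives, negatives, size=40, positive_fraction=0.25)
negative_only = sample_training_set(positives, negatives, size=40, positive_fraction=0.0)
```

Setting `positive_fraction=0.0` yields the negative-only training set of Case Study V.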