Sentiment Analysis using NLTK:

Problem Statement:

Perform sentiment analysis by applying Maximum Entropy classification to movie review data.
Observe how accuracy is affected by the discriminating features (stop words, punctuation, and lemmatization) and by the amount of training data used.
Repeat the analysis on unbalanced collections by varying the proportions of positive and negative samples in the training data.
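
The feature variations named above can be sketched as a single extraction function with switches. This is only an illustrative sketch under stated assumptions, not the setup used in the experiments: the stop word list is a tiny hand-written stand-in (NLTK ships a fuller one in nltk.corpus.stopwords), the names extract_features and naive_lemmatize are hypothetical, and naive_lemmatize is a crude placeholder for a real lemmatizer such as NLTK's WordNetLemmatizer. The output is a bag-of-words feature dict, the shape NLTK classifiers such as nltk.MaxentClassifier accept when paired with a label.

```python
import string

# Tiny illustrative stop word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "the", "is", "was", "of", "and", "it", "this"}


def naive_lemmatize(word):
    """Crude stand-in for a real lemmatizer: strips a trailing 's'."""
    if word.endswith("s") and len(word) > 3:
        return word[:-1]
    return word


def extract_features(tokens, remove_stop_words=False,
                     remove_punctuation=False, lemmatize=None):
    """Turn a list of tokens into a bag-of-words feature dict."""
    features = {}
    for token in tokens:
        word = token.lower()
        # Skip tokens made up entirely of punctuation characters.
        if remove_punctuation and all(ch in string.punctuation for ch in word):
            continue
        if remove_stop_words and word in STOP_WORDS:
            continue
        if lemmatize is not None:
            word = lemmatize(word)
        features[word] = True
    return features


review = ["This", "movie", "was", "great", ",", "the", "actors", "shine", "!"]

raw = extract_features(review)
filtered = extract_features(review, remove_stop_words=True,
                            remove_punctuation=True)
lemmatized = extract_features(review, remove_stop_words=True,
                              remove_punctuation=True,
                              lemmatize=naive_lemmatize)
```

Each variant yields a different feature set for the same review, so the classifier can be trained once per variant and the accuracies compared.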

Case Studies:

The following case studies were proposed:

Case Study I:

Maximum entropy classification on a) raw data, b) with stop words, c) without punctuation, and d) with lemmatization, for all the words, assuming equal proportions of positive and negative examples.

Case Study II:

Maximum entropy classification on a) raw data, b) with stop words, c) without punctuation, and d) with lemmatization, for the top 500 words, assuming equal proportions of positive and negative examples.

Case Study III:

Maximum entropy classification on a) raw data, b) with stop words, c) without punctuation, and d) with lemmatization, for the top 1000 words, assuming equal proportions of positive and negative examples.
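
Restricting the features to the top N words, as in Case Studies II and III, could look roughly like the sketch below. It assumes "top" means most frequent across the corpus, which the text does not state. The helper names top_words and restrict_features are hypothetical, and the three toy documents stand in for real reviews.

```python
from collections import Counter


def top_words(documents, n):
    """Return the n most frequent lowercased words across all documents."""
    counts = Counter(word.lower() for tokens in documents for word in tokens)
    return {word for word, _ in counts.most_common(n)}


def restrict_features(tokens, vocabulary):
    """Build a feature dict that keeps only words found in the vocabulary."""
    return {word.lower(): True for word in tokens if word.lower() in vocabulary}


# Toy corpus: "good" occurs 3 times, "movie" twice, every other word once.
docs = [
    ["good", "movie", "good", "plot"],
    ["bad", "movie", "acting"],
    ["good"],
]

vocab = top_words(docs, 2)
features = restrict_features(["Good", "acting", "movie"], vocab)
```

With a real corpus, n would be 500 or 1000, and the vocabulary would be computed once and reused for both training and test documents.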

Case Study IV:

Maximum entropy classification on a) raw data, b) with stop words, c) without punctuation, and d) with lemmatization, for all the words, assuming unequal proportions of negative and positive examples.

Case Study V:

Maximum entropy classification on a) raw data, b) with stop words, c) without punctuation, and d) with lemmatization, for all the words, assuming only negative examples.
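
The unbalanced training sets of Case Studies IV and V could be drawn along the lines of the sketch below. It is a minimal illustration, not the procedure used in the experiments: the function name sample_training_set, the "pos" and "neg" label strings, the fixed seed, and the placeholder review names are all assumptions.

```python
import random


def sample_training_set(positive, negative, size, positive_fraction, seed=0):
    """Draw a labeled training set with a chosen share of positive examples."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    n_pos = round(size * positive_fraction)
    n_neg = size - n_pos
    if n_pos > len(positive) or n_neg > len(negative):
        raise ValueError("not enough examples for the requested proportions")
    sample = [(doc, "pos") for doc in rng.sample(positive, n_pos)]
    sample += [(doc, "neg") for doc in rng.sample(negative, n_neg)]
    rng.shuffle(sample)  # avoid all positives preceding all negatives
    return sample


# Placeholder documents standing in for real reviews.
positive = [f"pos_review_{i}" for i in range(10)]
negative = [f"neg_review_{i}" for i in range(10)]

# Case Study IV style: unequal proportions (here 30% positive).
mostly_negative = sample_training_set(positive, negative, 10, 0.3)

# Case Study V style: only negative examples.
only_negative = sample_training_set(positive, negative, 10, 0.0)
```

Setting positive_fraction to 0.5 gives the balanced split assumed in Case Studies I to III, so the same helper can cover every case.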