Project author: ArmanKabiri

Project description:
A Sentiment Text Classifier employing Hadoop implemented in Java
Primary language: Java
Project URL: git://github.com/ArmanKabiri/A-Sentiment-Analysis-System-for-Big-Datasets.git


A-Sentiment-Analysis-System-for-Big-Datasets

A sentiment text classifier implemented in Java on Hadoop

Sentiment analysis is a subfield of natural language processing whose goal is to predict the sentiments expressed in reviews and comments published on the Web. The rapid growth in the volume of data available on the Web calls for big data technologies that can store and analyze it. In this project, we use Hadoop to process the data. Reviews are classified with a lexicon-based method. To evaluate the proposed model, the performance of the system is measured on two popular datasets: SAR14 and the Amazon dataset. The evaluation criteria are F-measure and accuracy. The results show that the Hadoop-based system outperforms the sequential mode. The proposed system is only a prototype, so its accuracy does not exceed that of state-of-the-art systems. However, the architecture could be extended and adapted to produce more accurate results.
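As a rough illustration of the lexicon-based idea and the two evaluation criteria, here is a minimal sketch in plain Java. It is not the project's actual code: the class name `LexiconSentimentSketch`, the tiny `LEXICON` map, the scoring rule (sum the word polarities, positive if the sum is above zero), and the choice of F1 on the positive class as the F-measure are all assumptions made for this example. In a Hadoop job, logic like `isPositive` would presumably run inside a map task on each review, but the sketch deliberately has no Hadoop dependency.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class LexiconSentimentSketch {

    // Hypothetical toy lexicon (word -> polarity); the project's real lexicons are not shown here.
    static final Map<String, Integer> LEXICON = Map.of(
            "good", 1, "great", 2, "excellent", 2,
            "bad", -1, "poor", -1, "terrible", -2);

    /** Sums the polarity of every known word; the review counts as positive when the sum is above zero. */
    public static boolean isPositive(String review) {
        int score = 0;
        // Lowercase, then split on anything that is not a letter a-z.
        for (String token : review.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            score += LEXICON.getOrDefault(token, 0);
        }
        return score > 0;
    }

    /** Returns {accuracy, F1 of the positive class} for the given reviews and gold labels. */
    public static double[] evaluate(List<String> reviews, List<Boolean> labels) {
        int tp = 0, fp = 0, fn = 0, tn = 0;
        for (int i = 0; i < reviews.size(); i++) {
            boolean predicted = isPositive(reviews.get(i));
            boolean actual = labels.get(i);
            if (predicted && actual) tp++;
            else if (predicted) fp++;
            else if (actual) fn++;
            else tn++;
        }
        double accuracy = (double) (tp + tn) / reviews.size();
        // Guard against division by zero when a class is never predicted or never present.
        double precision = tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
        double recall = tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
        double f1 = precision + recall == 0 ? 0.0
                : 2 * precision * recall / (precision + recall);
        return new double[] {accuracy, f1};
    }

    public static void main(String[] args) {
        // Made-up reviews and labels, only to exercise the code.
        List<String> reviews = List.of(
                "A great movie with excellent acting",
                "Terrible plot and poor pacing",
                "Not bad at all",
                "Good idea, bad execution");
        List<Boolean> labels = List.of(true, false, true, false);

        double[] metrics = evaluate(reviews, labels);
        System.out.printf("accuracy=%.2f f1=%.2f%n", metrics[0], metrics[1]);
    }
}
```

The third review ("Not bad at all") is misclassified because this naive sum ignores negation, which is one reason a plain lexicon approach tends to trail state-of-the-art classifiers.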

To access the datasets and the lexicons, please contact me.

Note: This project was completed as part of the Big Data Systems course at UNB.