A Sentiment Text Classifier Using Hadoop, Implemented in Java
Sentiment analysis is a subfield of natural language processing whose goal is to predict the sentiment expressed in reviews and comments published on the Web. The rapid growth in the volume of data available on the Web calls for big data technologies that can store and analyze it. In this project, we propose using Hadoop to process the data. Reviews are classified mainly with a lexicon-based method. To evaluate the proposed model, we measure the system's performance on two popular datasets: SAR14 and the Amazon dataset. The evaluation criteria are F-measure and accuracy. The results show higher performance with Hadoop than in sequential mode. The proposed system is only a prototype, so its accuracy does not exceed that of state-of-the-art systems. However, the architecture is designed so that it can be extended and retrofitted to produce more accurate results.
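To make the lexicon-based idea and the two evaluation measures concrete, here is a minimal sketch in plain Java. It is not the project's actual code: the class name `LexiconSketch`, the tiny word list, the tokenization rule, and the choice to treat a score of zero as positive are all assumptions made for illustration. In a Hadoop job, a function like `score` would presumably be called from a mapper, one review per input record; that wiring is left out so the sketch runs without Hadoop.

```java
import java.util.Locale;
import java.util.Map;

public class LexiconSketch {
    // Hypothetical toy lexicon (assumption): word -> polarity.
    // A real system would load a full sentiment lexicon instead.
    static final Map<String, Integer> LEXICON = Map.of(
        "good", 1, "great", 1, "love", 1, "excellent", 1,
        "bad", -1, "poor", -1, "hate", -1, "terrible", -1);

    // Sum the polarities of all known words in the review.
    static int score(String review) {
        int total = 0;
        // Lowercase and split on anything that is not a letter a-z.
        for (String token : review.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            total += LEXICON.getOrDefault(token, 0);
        }
        return total;
    }

    // Assumption: a score of zero is treated as positive (arbitrary tie-break).
    static boolean isPositive(String review) {
        return score(review) >= 0;
    }

    // Accuracy: fraction of predictions that match the true labels.
    static double accuracy(boolean[] predicted, boolean[] actual) {
        int correct = 0;
        for (int i = 0; i < predicted.length; i++) {
            if (predicted[i] == actual[i]) correct++;
        }
        return (double) correct / predicted.length;
    }

    // F-measure (F1) for the positive class: 2*TP / (2*TP + FP + FN).
    static double f1(boolean[] predicted, boolean[] actual) {
        int tp = 0, fp = 0, fn = 0;
        for (int i = 0; i < predicted.length; i++) {
            if (predicted[i] && actual[i]) tp++;
            else if (predicted[i] && !actual[i]) fp++;
            else if (!predicted[i] && actual[i]) fn++;
        }
        int denom = 2 * tp + fp + fn;
        return denom == 0 ? 0.0 : 2.0 * tp / denom;
    }

    public static void main(String[] args) {
        // Made-up example reviews and labels, for demonstration only.
        String[] reviews = {
            "Great film, I love it",
            "Terrible plot and poor acting",
            "Not bad at all",
            "Excellent value"
        };
        boolean[] actual = {true, false, true, true};
        boolean[] predicted = new boolean[reviews.length];
        for (int i = 0; i < reviews.length; i++) {
            predicted[i] = isPositive(reviews[i]);
        }
        System.out.println("accuracy = " + accuracy(predicted, actual));
        System.out.println("F1       = " + f1(predicted, actual));
    }
}
```

The third example review ("Not bad at all") shows a known weakness of plain word counting: the sketch ignores negation, so it scores the review as negative. Handling such cases is one of the ways a prototype like this could be extended.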