Sentiment Analysis System for Amazon Reviews
We developed a system that automatically categorizes comments as positive or negative using an SVM, and that predicts the rating of an Amazon review from its text and other properties. In this project, we tried a range of machine learning approaches to sentiment analysis on Amazon reviews. We first eliminated words unrelated to ratings, most of which are objective words. The methods we used include frequency-based word probability, stemming, and PCA. We then added useful additional features, such as data, title, and bigram features. We also tried and compared standard machine learning approaches. Throughout the competition, we used cross-validation to evaluate our performance. After trying SVM, AdaBoost, logistic regression, k-NN, and our own kernel, we found that a combination of several features, after performing PCA and training with LIBLINEAR, gave us the best performance: an RMSE of 0.8529.
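
The cross-validated RMSE evaluation mentioned above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code used in the project: the function names (rmse, k_fold_splits, cross_validated_rmse, fit_mean_baseline) are hypothetical, the fold count is an assumption, and the mean-rating baseline is only a stand-in for the actual models (SVM, logistic regression, and so on).

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# A "model" here is any function mapping review text to a predicted rating.
Model = Callable[[str], float]


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between true and predicted ratings."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def k_fold_splits(n: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle indices 0..n-1 and return k (train, test) index pairs."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def cross_validated_rmse(
    texts: Sequence[str],
    ratings: Sequence[float],
    fit: Callable[[List[str], List[float]], Model],
    k: int = 5,  # assumed fold count, not taken from the project
    seed: int = 0,
) -> float:
    """Average RMSE over k folds; `fit` trains on one fold's training part."""
    scores = []
    for train_idx, test_idx in k_fold_splits(len(texts), k, seed):
        model = fit([texts[i] for i in train_idx], [ratings[i] for i in train_idx])
        predictions = [model(texts[i]) for i in test_idx]
        scores.append(rmse([ratings[i] for i in test_idx], predictions))
    return sum(scores) / len(scores)


def fit_mean_baseline(train_texts: List[str], train_ratings: List[float]) -> Model:
    """Stand-in model: always predicts the mean training rating."""
    mean = sum(train_ratings) / len(train_ratings)
    return lambda text: mean


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    toy_texts = ["great", "awful", "fine", "love it", "broken", "okay"]
    toy_ratings = [5.0, 1.0, 3.0, 5.0, 1.0, 3.0]
    print(cross_validated_rmse(toy_texts, toy_ratings, fit_mean_baseline, k=3))
```

Passing the training step in as a function keeps the evaluation loop identical across models, so any two approaches are compared on the same folds.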