Sentiment Analysis using Logistic Regression
Authors: George B. Aliman, Tanya Faye S. Nivera, Jensine Charmille A. Olazo, Daisy Jane P. Ramos, Chris Danielle B. Sanchez, Timothy M. Amado, Nilo M. Arago, Romeo L. Jorda Jr., Glenn C. Virrey, Ira C. Valenzuela
Abstract
This paper assesses different machine learning techniques for classifying tweets. Four techniques are tested on the same set of data: Naive Bayes, Linear Support Vector Classifier, Stochastic Gradient Descent Classifier, and Logistic Regression. Identifying which machine learning model performs best in sentiment analysis is a persistent challenge. The main objective of this paper is to find the best machine learning technique for sentiment analysis in the English, Filipino, and Taglish languages. The models were integrated with Twitter's API to collect tweets, which were preprocessed to make them analyzable; features were then extracted using Natural Language Processing. Performance scores were computed for each machine learning algorithm. The Support Vector Classifier, Stochastic Gradient Descent, Naive Bayes, and Logistic Regression algorithms achieved accuracies of 69%, 71%, 77%, and 81%, respectively. The Logistic Regression model had the highest accuracy and is the best-fitting algorithm for predicting tweets that indicate a potential mental health crisis.
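The pipeline summarized in the abstract (preprocessing, feature extraction, then logistic regression) can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the sample tweets, the label convention (1 = negative, 0 = positive), the bag-of-words features, the hyperparameters, and all function names are assumptions made for illustration only.

```python
import math
import re
from collections import Counter

def preprocess(tweet):
    """Lowercase a tweet, drop URLs and @mentions, and return word tokens."""
    tweet = tweet.lower()
    tweet = re.sub(r"https?://\S+", " ", tweet)  # remove links
    tweet = re.sub(r"@\w+", " ", tweet)          # remove user mentions
    return re.findall(r"[a-z']+", tweet)         # keeps hashtag words, drops '#'

def build_vocab(token_lists):
    """Map each distinct token to a feature index."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab

def vectorize(tokens, vocab):
    """Sparse bag-of-words counts as {feature index: count}."""
    counts = Counter(tok for tok in tokens if tok in vocab)
    return {vocab[tok]: n for tok, n in counts.items()}

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(x, w, b):
    """Predicted probability of label 1 for a sparse feature vector."""
    return sigmoid(b + sum(w[j] * v for j, v in x.items()))

def train_logreg(rows, labels, n_features, lr=0.1, epochs=300):
    """Fit weights and bias by per-example gradient descent on log loss."""
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            err = score(x, w, b) - y  # gradient of log loss w.r.t. the logit
            for j, v in x.items():
                w[j] -= lr * err * v
            b -= lr * err
    return w, b

def predict(tweet, vocab, w, b):
    """Return 1 (negative) or 0 (positive) for a raw tweet string."""
    return 1 if score(vectorize(preprocess(tweet), vocab), w, b) >= 0.5 else 0

# Invented toy data (hypothetical): 1 = negative, 0 = positive.
TRAIN = [
    ("I feel so hopeless and tired today", 1),
    ("Sobrang lungkot ko ngayon", 1),
    ("Everything is awful, I am so sad @friend", 1),
    ("Ang saya ng araw na ito! https://example.com", 0),
    ("I love this, so happy and grateful", 0),
    ("Great day with friends, masaya ako", 0),
]

token_lists = [preprocess(text) for text, _ in TRAIN]
vocab = build_vocab(token_lists)
rows = [vectorize(tokens, vocab) for tokens in token_lists]
labels = [label for _, label in TRAIN]
w, b = train_logreg(rows, labels, len(vocab))

correct = sum(predict(text, vocab, w, b) == label for text, label in TRAIN)
print(f"training accuracy: {correct / len(TRAIN):.2f}")
```

The sketch reports accuracy on its own training tweets only, so the printed number says nothing about generalization; a comparison like the one in the abstract would require a held-out test set and the same split for every classifier.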