Classification of Depression through Social Media Posts Using Machine Learning Techniques




Depression, Tweets, Social media, Bag of Words (BOW), Term Frequency-Inverse Document Frequency (TFIDF), Tokenizer, Naïve Bayes, Random Forest, Decision Tree


Machine learning has been applied to problems in many areas of life, including medicine, the sciences and industry. Depression is a major problem worldwide and an increasingly serious challenge for the health sector. Millions of people suffer from depression at different levels of severity, but only a few take preventive measures and receive appropriate treatment, mainly because early detection of depression can be difficult. A close study of an individual's behaviour could lead to early detection, and some of this behaviour can be observed on social media platforms. This study analyses users' tweets collected from Twitter and classifies depressive content into four levels, rather than the usual two-tier classification. Users' tweets were extracted using the Twitter API and a web scraping tool called 'Twint'. The Bag of Words (BoW) model, Term Frequency-Inverse Document Frequency (TF-IDF) and the Tokenizer text pre-processing utility provided by the Keras framework were used to vectorise the tweets, and these representations were compared to evaluate how each influenced classification. Three machine learning algorithms, Naïve Bayes, Random Forest and Decision Tree, were used for the classification. The results show that Random Forest classifies the tweets into the four depression categories best.
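To make the feature-extraction and classification steps above concrete, the following is a minimal, dependency-free sketch. It is not the authors' code: the study used library implementations (including the Keras Tokenizer), whereas this sketch uses a deliberately simple regex tokenizer. It builds Bag of Words count vectors, computes TF-IDF weights with one common smoothed variant, idf(t) = ln((1 + n) / (1 + df(t))) + 1 with L2-normalised rows (the default in scikit-learn's `TfidfVectorizer`), and trains a multinomial Naïve Bayes classifier with Laplace smoothing on the BoW counts. The four label names (`none`, `mild`, `moderate`, `severe`) and the toy sentences are hypothetical placeholders, not the study's categories or data; Random Forest and Decision Tree are not shown.

```python
import math
import re

def tokenize(text):
    # Lowercase and keep runs of letters/apostrophes (a simple stand-in for real pre-processing).
    return re.findall(r"[a-z']+", text.lower())

def build_vocab(docs):
    # Map every distinct token in the corpus to a column index.
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(vocab)}

def bag_of_words(doc, vocab):
    # Raw term counts, one column per vocabulary word; unseen words are ignored.
    vec = [0] * len(vocab)
    for tok in tokenize(doc):
        if tok in vocab:
            vec[vocab[tok]] += 1
    return vec

def tfidf_matrix(docs, vocab):
    # Smoothed IDF: idf(t) = ln((1 + n) / (1 + df(t))) + 1, then each row is L2-normalised.
    n = len(docs)
    counts = [bag_of_words(d, vocab) for d in docs]
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n) / (1 + d)) + 1 for d in df]
    rows = []
    for row in counts:
        weighted = [c * w for c, w in zip(row, idf)]
        norm = math.sqrt(sum(x * x for x in weighted)) or 1.0
        rows.append([x / norm for x in weighted])
    return rows

def train_multinomial_nb(docs, labels, vocab, alpha=1.0):
    # Multinomial Naive Bayes on BoW counts with add-alpha (Laplace) smoothing.
    classes = sorted(set(labels))
    log_prior, log_lik = {}, {}
    for c in classes:
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        totals = [0] * len(vocab)
        for d in class_docs:
            for j, cnt in enumerate(bag_of_words(d, vocab)):
                totals[j] += cnt
        denom = sum(totals) + alpha * len(vocab)
        log_lik[c] = [math.log((t + alpha) / denom) for t in totals]
    return classes, log_prior, log_lik

def predict_nb(model, doc, vocab):
    # Pick the class with the highest log prior + log likelihood of the document's counts.
    classes, log_prior, log_lik = model
    x = bag_of_words(doc, vocab)
    scores = {c: log_prior[c] + sum(cnt * lp for cnt, lp in zip(x, log_lik[c]))
              for c in classes}
    return max(scores, key=scores.get)

# Hypothetical toy tweets and four hypothetical severity labels (illustration only).
train_docs = [
    "had a great day with friends",
    "feeling a bit low today",
    "cannot sleep and feel empty every night",
    "i feel hopeless and worthless nothing matters",
]
train_labels = ["none", "mild", "moderate", "severe"]

vocab = build_vocab(train_docs)
tfidf = tfidf_matrix(train_docs, vocab)
model = train_multinomial_nb(train_docs, train_labels, vocab)
print(predict_nb(model, "feeling hopeless and worthless", vocab))
```

With only one toy example per class the prediction merely shows the mechanics; a real comparison like the one described in the study needs a labelled corpus, a held-out test split, and the other two classifiers trained on the same BoW, TF-IDF or Tokenizer features.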