About the Project

Today, more than 250 million people suffer from depression, and many more are likely suffering without knowing it or without believing it. Mental health awareness is crucial for tackling this, and so is a way to detect depression accurately and efficiently, within seconds.

SentiMate, short for ‘Sentiment Mate’, is a machine learning model that can identify depression from text with an accuracy of over 96%. SentiMate is not a replacement for professionals in this field; rather, it complements their work. Once it flags a likelihood of depression, users can contact therapists and doctors to understand the problem as well as the way out.

The Idea

During my time as a volunteer at Zenonco (“The World’s First Integrative Oncology Healthtech Platform”), I learned that patients suffering from cancer constantly faced poor mental health, and most of the time never knew about it. I wrote an article summarising Dr. Vidhya Nair’s talk on the relation between cancer and mental health. Mental health affects our recovery from diseases, and it is necessary to stay happy and mentally strong during these times. A tool to detect depression is therefore essential. So, I looked up a few ways to detect depression and other causes of poor mental health, but all I found was a bunch of surveys that predicted the results based on the number of “positive” answers, i.e., answers that could be associated with depression. I started SentiMate as a research project that aimed to find a better way to detect depression. After about 6 months, I finished writing a 15-page paper on my research (publishing soon) and had a model ready. I then deployed the model with a few adjustments, and it is now available online. The concept of this model is to replace traditional surveys with a more logical and efficient approach.

~Achintya Jha


This project uses 2 separate models for predictions. Each model is trained on its own dataset, and the two do best when used together. The first dataset is a general one containing 1.6 million tweets, while the second has over 20 thousand tweets that were scraped specifically for their relation to depression. Each dataset has been used to build its own vocabulary and, subsequently, to train its own model. Of all the machine learning models suitable for this project, Logistic Regression performs the best.

So, with 2 custom vocabularies and 2 logistic regression models, SentiMate classifies the text into positive and negative categories. However, as you might expect, there is a lot more going on than that. The language processing and feature engineering methods used are covered in depth in my paper, which I will be releasing soon.
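
To make the two-vocabulary, two-model idea concrete, here is a minimal sketch in Python. It is not SentiMate's actual code: the tiny example sentences are made-up placeholders for the two tweet datasets, the bag-of-words features and plain gradient-descent training are simplifications, and averaging the two probabilities is only an assumed way of using the models together. All function names are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def build_vocabulary(texts):
    """Assign a column index to every distinct token in one dataset."""
    tokens = sorted({tok for text in texts for tok in tokenize(text)})
    return {tok: i for i, tok in enumerate(tokens)}

def featurize(text, vocab):
    """Sparse bag-of-words counts; tokens outside the vocabulary are ignored."""
    return Counter(vocab[tok] for tok in tokenize(text) if tok in vocab)

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train(texts, labels, epochs=200, lr=0.5):
    """Fit one vocabulary and one logistic regression (plain SGD) on a dataset."""
    vocab = build_vocabulary(texts)
    weights, bias = [0.0] * len(vocab), 0.0
    rows = [featurize(t, vocab) for t in texts]
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(weights[i] * c for i, c in x.items()))
            err = p - y  # gradient of the log loss with respect to the score
            bias -= lr * err
            for i, c in x.items():
                weights[i] -= lr * err * c
    return vocab, weights, bias

def predict_proba(text, model):
    """Probability of the negative class (label 1) under one model."""
    vocab, weights, bias = model
    x = featurize(text, vocab)
    return sigmoid(bias + sum(weights[i] * c for i, c in x.items()))

def classify(text, models, threshold=0.5):
    """Average the models' probabilities (an assumed rule) and apply a threshold."""
    p = sum(predict_proba(text, m) for m in models) / len(models)
    return ("negative" if p >= threshold else "positive"), p

# Placeholder data standing in for the general and the depression-related
# tweet datasets. Label 1 = negative, label 0 = positive.
general_texts = ["what a lovely sunny day", "great coffee with good friends",
                 "this traffic is awful", "so tired of this awful weather"]
general_labels = [0, 0, 1, 1]
focused_texts = ["feeling hopeless and so tired lately",
                 "nothing seems worth doing anymore",
                 "had a great walk with friends",
                 "woke up rested happy to see everyone"]
focused_labels = [1, 1, 0, 0]

# One vocabulary and one model per dataset, then a combined prediction.
models = [train(general_texts, general_labels),
          train(focused_texts, focused_labels)]
print(classify("so tired and hopeless, this is awful", models))
print(classify("had a great day with friends", models))
```

Each model only sees words from its own vocabulary, so a word missing from one dataset can still contribute through the other. That is one plausible reason two models trained on different data could do better together than either alone.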

If you still wish to know more, consider dropping me an email at [achintyakjha] at [gmail] dot [com].