During my third year of HBO, I worked with another student on determining the sentiment of movie reviews. We collected over 150,000 movie reviews from Rotten Tomatoes, through Kaggle. Some of these reviews already had a sentiment grade (a grade between 0 and 4 indicating how negative or positive the review was). These labeled rows served as our training set.
We started by exploring and cleaning our dataset. We visualised the data and cleaned it by, among other things, dropping rows with empty cells, dropping rows containing only non-alphabetic characters, and removing certain non-alphabetic characters from the remaining reviews.
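
The cleaning steps above could look roughly like the sketch below. It uses plain Python on a list of (text, sentiment) pairs rather than the project's actual tooling, and the function name `clean_reviews`, the sample rows, and the exact set of characters kept (letters, digits, apostrophes and whitespace) are assumptions made for illustration.

```python
import re


def clean_reviews(rows):
    """Drop unusable reviews and strip unwanted symbols.

    `rows` is a list of (text, sentiment) pairs. Which characters to
    keep is an assumption for this sketch, not the project's real rule.
    """
    cleaned = []
    for text, sentiment in rows:
        # Skip rows with an empty cell.
        if text is None or not text.strip():
            continue
        # Skip rows that contain no alphabetic character at all.
        if not re.search(r"[A-Za-z]", text):
            continue
        # Replace everything except letters, digits, apostrophes and
        # whitespace with a space, then collapse repeated whitespace.
        text = re.sub(r"[^A-Za-z0-9'\s]", " ", text)
        text = re.sub(r"\s+", " ", text).strip()
        cleaned.append((text, sentiment))
    return cleaned


# Hypothetical sample rows, not taken from the real dataset.
rows = [
    ("A charming, funny film!", 3),
    ("", 2),
    (None, 1),
    ("--- ***", 0),
    ("Doesn't work...  at all", 1),
]
cleaned = clean_reviews(rows)
```

Only the first and last sample rows survive here; the other three are dropped for being empty or purely non-alphabetic.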
After that, we tried out multiple supervised algorithms and compared them against each other. We also experimented with deep learning using an LSTM model. Overall, this was a basic, introductory project on the concepts of machine learning and deep learning. We ended up with an accuracy of around 67%.
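
Comparing several supervised algorithms usually comes down to training each one on the same data and scoring it on the same held-out reviews. The sketch below shows that pattern in plain Python with two stand-in models: a majority-class baseline and a small multinomial Naive Bayes classifier. These are not necessarily the algorithms used in the project, and the class names, the `compare` helper and the toy data are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class MajorityBaseline:
    """Always predicts the most frequent training label."""

    def fit(self, texts, labels):
        self.label = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, texts):
        return [self.label for _ in texts]


class NaiveBayes:
    """Multinomial Naive Bayes on whitespace tokens, add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, texts):
        total = sum(self.class_counts.values())
        predictions = []
        for text in texts:
            scores = {}
            for label, n in self.class_counts.items():
                counts = self.word_counts[label]
                denom = sum(counts.values()) + len(self.vocab)
                # Log prior plus smoothed log likelihood of each token.
                score = math.log(n / total)
                for word in text.lower().split():
                    score += math.log((counts[word] + 1) / denom)
                scores[label] = score
            predictions.append(max(scores, key=scores.get))
        return predictions


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def compare(models, train, test):
    """Fit every model on `train` and return its accuracy on `test`."""
    x_train, y_train = train
    x_test, y_test = test
    return {
        name: accuracy(y_test, model.fit(x_train, y_train).predict(x_test))
        for name, model in models.items()
    }


# Toy data with sentiment grades 0-4, invented for this sketch.
train = (
    ["a terrible boring mess", "boring and dull", "an average film",
     "a good fun film", "a brilliant wonderful film",
     "average and forgettable"],
    [0, 1, 2, 3, 4, 2],
)
test = (["boring mess", "wonderful brilliant", "average"], [0, 4, 2])

results = compare(
    {"majority": MajorityBaseline(), "naive_bayes": NaiveBayes()},
    train,
    test,
)
```

Including a trivial baseline like the majority class is a useful design choice: with five sentiment grades, it shows how much of a reported accuracy comes from the class distribution alone rather than from the model.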