Detecting why a product is rated bad or good

Welcome back in this week, I look at an algorithm used to generate features on why a product is bad or good.

First how it works is it we change a review to be a “negative review” if it is less than 3 stars there are surely text in that review that tells us why it is 3 stars. These are the features we would like to extract and show to the people visting our website.
Then we create lists of both the text and the labels.
Use tf-idf function to ignore words that are not frequently said, eg 1 %. These words may be nonsense or words not important at all, like names.
Use a vectorizer to vectorize all the words and to learn the vocabulary of the document.
Use svm
Take the top 10 and last 10 features to be features that you want to show.

Docker

I have also for this week finally set up docker for our api. This ensures that the code will run anywhere, be it at a local computer or on a virtual machine elsewhere. This would solve our problems that was present during the byte-hackz where the machine learning code would run on only certain laptops. Installing all the required modules on one person’s was a big headache that now would be solved by seperating the algorithms from the main application.

Now all that is left is to set up a virtual machine to process the requests.

Thats all for now, see you after demo-day!