Features
Detecting why a product is rated bad or good
Welcome back in this week, I look at an algorithm used to generate features on why a product is bad or good.
-
First how it works is it we change a review to be a “negative review” if it is less than 3 stars there are surely text in that review that tells us why it is 3 stars. These are the features we would like to extract and show to the people visting our website.
-
Then we create lists of both the text and the labels.
-
Use tf-idf function to ignore words that are not frequently said, eg 1 %. These words may be nonsense or words not important at all, like names.
-
Use a vectorizer to vectorize all the words and to learn the vocabulary of the document.
-
Use svm
-
Take the top 10 and last 10 features to be features that you want to show.
Docker
I have also for this week finally set up docker for our api. This ensures that the code will run anywhere, be it at a local computer or on a virtual machine elsewhere. This would solve our problems that was present during the byte-hackz where the machine learning code would run on only certain laptops. Installing all the required modules on one person’s was a big headache that now would be solved by seperating the algorithms from the main application.
Now all that is left is to set up a virtual machine to process the requests.
Thats all for now, see you after demo-day!