Using NLP on Amazon Echo Reviews

Sentiment Analysis and Topic Modelling

Kelsey Heng
Analytics Vidhya

Image obtained from Google

Recent years have seen the rise of the smart home and the growth of smart home technology: systems that anticipate users' needs, respond to them and automate control of home amenities.

Big tech companies such as Amazon and Google have invested heavily in this promising market, each developing and launching its own line of products. Spoilt for choice, where do consumers start? Typically, by reading reviews from other users to decide which device to invest in.

In this project, we will look specifically at reviews of three Amazon Echo devices, namely the Echo Plus, Echo and Echo Dot, using natural language processing (NLP) techniques. The reviews were obtained from a dataset on Kaggle.

To give us a basis for comparison, the Echos retail from $50 to $150, with the Echo Plus at the highest price point and the Echo Dot at the lowest.

A preliminary look at the ratings given by users (5 being the best) tells us two things: 1) there are more positive ratings than negative ones, and 2) the Echo could be the bestseller.
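Both observations come from simple counts. Below is a minimal sketch that assumes the reviews are available as (product, rating) pairs, treats 4 and 5 stars as "positive" (a hypothetical threshold), and uses the number of reviews per product as a rough proxy for popularity. The sample data is invented and is not taken from the Kaggle dataset.

```python
from collections import Counter

# Hypothetical sample of (product, star rating) pairs; the real Kaggle
# dataset's schema is not described in this article.
reviews = [
    ("Echo Dot", 5), ("Echo Dot", 4), ("Echo", 5), ("Echo", 5),
    ("Echo", 1), ("Echo Plus", 5), ("Echo Plus", 2), ("Echo", 4),
]


def rating_summary(pairs, positive_threshold=4):
    """Return review counts per product and the share of positive ratings.

    `positive_threshold` is an assumed cut-off: ratings at or above it
    count as positive.
    """
    counts = Counter(product for product, _ in pairs)
    positive = sum(1 for _, rating in pairs if rating >= positive_threshold)
    return counts, positive / len(pairs)


counts, positive_share = rating_summary(reviews)
print(counts.most_common())
print(f"{positive_share:.0%} of ratings are 4 or 5 stars")
```

A product with more reviews is not necessarily the one with more sales, so the "bestseller" reading is only suggestive.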

To find out whether the sentiment of the reviews matches the rating scores, I performed sentiment analysis using VADER (Valence Aware Dictionary and sEntiment Reasoner) and took the average positive and negative scores. VADER is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed on social media.

The average positive sentiment score of the reviews is 10 times higher than the average negative score, suggesting that the rating scores are consistent with what users actually wrote.
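The averaging step can be sketched as follows. VADER's `polarity_scores` method returns a dict with `neg`, `neu`, `pos` and `compound` keys, where `neg`, `neu` and `pos` are proportions of the text. The score dicts below are hypothetical stand-ins so that the sketch runs without the `vaderSentiment` or NLTK packages installed; the ratio it prints is illustrative, not the article's result.

```python
from statistics import mean

# Hypothetical VADER outputs. In practice each dict would come from
# SentimentIntensityAnalyzer().polarity_scores(review_text).
scores = [
    {"neg": 0.0, "neu": 0.4, "pos": 0.6, "compound": 0.8},
    {"neg": 0.1, "neu": 0.5, "pos": 0.4, "compound": 0.6},
    {"neg": 0.05, "neu": 0.75, "pos": 0.2, "compound": 0.3},
]


def average_pos_neg(score_dicts):
    """Average the 'pos' and 'neg' proportions over all reviews."""
    avg_pos = mean(s["pos"] for s in score_dicts)
    avg_neg = mean(s["neg"] for s in score_dicts)
    return avg_pos, avg_neg


avg_pos, avg_neg = average_pos_neg(scores)
print(f"avg pos={avg_pos:.3f}, avg neg={avg_neg:.3f}, "
      f"ratio={avg_pos / avg_neg:.1f}x")
```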

To find out which features users commonly reviewed, I performed topic modelling using LDA (Latent Dirichlet Allocation). LDA treats each document as a mixture of topics and each topic as a distribution over words, and it refines both distributions over many iterations. (This article explains LDA in depth if you are interested.)
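To make the iterative process concrete, here is a small from-scratch LDA fitted with collapsed Gibbs sampling, one standard inference method for the model. It is a sketch, not the code used in this project (libraries such as gensim and scikit-learn fit LDA with variational inference instead). The toy reviews, the stop-word list, the number of topics and the `alpha`, `beta` and `iterations` values are all hypothetical.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenised docs with collapsed Gibbs sampling.

    Returns (theta, phi): per-document topic distributions (lists) and
    per-topic word distributions (dicts mapping word -> probability).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    n_vocab = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]            # words in doc d with topic k
    topic_word = [defaultdict(int) for _ in range(num_topics)]  # count of word w in topic k
    topic_total = [0] * num_topics                          # words assigned to topic k

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take the token out of the counts before resampling it.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # p(topic t) is proportional to how much the document uses t
                # times how strongly t already favours this word.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta) / (topic_total[t] + n_vocab * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        {w: (topic_word[k][w] + beta) / (topic_total[k] + n_vocab * beta) for w in vocab}
        for k in range(num_topics)
    ]
    return theta, phi


# Hypothetical toy reviews and stop words.
reviews = [
    "love it love alexa",
    "easy to use and easy to set up",
    "sound quality is great",
    "love the sound",
    "setup was easy",
]
STOPWORDS = {"it", "to", "and", "up", "is", "the", "was"}
docs = [[w for w in r.split() if w not in STOPWORDS] for r in reviews]

theta, phi = lda_gibbs(docs, num_topics=3)
for k, dist in enumerate(phi):
    top = sorted(dist, key=dist.get, reverse=True)[:3]
    print(f"topic {k}: {top}")
```

Each sweep reassigns every word to a topic, which is how the topic-word distributions build up over the iterations. On five toy reviews the topics will be noisy; meaningful topics need the full review set.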

The most common topics seem to be users commenting on how much they love the device, its ease of use and its sound quality.

Using TF-IDF (term frequency-inverse document frequency) weighting of word counts, I analysed what users love and hate about the Amazon Echo, that is, the words that contributed to positive and negative sentiment.
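The exact method behind this step is not described here. One plausible approach is sketched below, with stated assumptions: split the reviews into positive and negative groups by rating (4 or more stars as positive is a hypothetical threshold), weight every word by TF-IDF, and rank words by their summed weight within each group. TF-IDF is computed from scratch with the textbook `log(N / df)` idf; scikit-learn's `TfidfVectorizer` uses a smoothed variant by default. The reviews and stop words are invented.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {word: tf-idf} dict per tokenised document.

    tf = count / document length; idf = log(N / df), where df is the
    number of documents containing the word.
    """
    n_docs = len(docs)
    df = Counter(w for doc in docs for w in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            w: (c / len(doc)) * math.log(n_docs / df[w]) for w, c in counts.items()
        })
    return weights


def top_terms(docs, labels, target, n=3):
    """Rank words by summed TF-IDF over documents labelled `target`.

    Summing gives the same ranking as averaging, because every word in
    the group shares the same divisor. Ties are broken alphabetically.
    """
    weights = tfidf(docs)
    totals = Counter()
    for doc_weights, label in zip(weights, labels):
        if label == target:
            totals.update(doc_weights)
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:n]]


# Hypothetical (review text, rating) pairs.
reviews = [
    ("works great easy to set up", 5),
    ("easy to use works great", 5),
    ("love it easy", 4),
    ("poor sound quality", 1),
    ("not worth the money poor sound", 2),
]
STOPWORDS = {"to", "it", "the"}
docs = [[w for w in text.split() if w not in STOPWORDS] for text, _ in reviews]
labels = ["pos" if rating >= 4 else "neg" for _, rating in reviews]

print(top_terms(docs, labels, "pos"))  # words driving positive reviews
print(top_terms(docs, labels, "neg"))  # words driving negative reviews
```

Words that appear in many reviews (such as "easy" here) get a lower idf, so TF-IDF favours words that are distinctive to a few reviews over words that are merely frequent.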

While some users did not like the sound quality and thought that the Echo was not worth their money, many users thought that the device worked well and was easy to use. This kind of feedback could help Amazon's development team improve its devices to meet consumer needs.

All code for this project can be found on my GitHub. I can be contacted on LinkedIn if you would like to connect.

Neuroscience researcher turned analytics consultant. Huge love for data storytelling, turning numbers into fun facts!