How Flipkart used data science to rank product reviews

Flipkart is one of the biggest e-commerce platforms in India, with approximately 80 million products across more than 80 product categories. (Source)

At Cypher 2018, Ravi Vijayaraghavan, Chief Data Analytics Officer at Flipkart, gave a talk titled “Analytics And Decision Sciences For E-Commerce” in which he explained how Flipkart uses data science to score product reviews.

The Problem

According to a study, reviews and ratings are the “most important” factor in customers' purchase decisions. Reviews can boost sales by giving customers information about a product or service they are considering buying.


It is therefore very important for an e-commerce platform to show its customers relevant product reviews. Originally, Flipkart used a fairly straightforward algorithm that ranked reviews on the following two metrics:

1. Proportion of Upvotes
2. Recency of Reviews
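The talk, as summarized here, does not say exactly how the two metrics were combined. A minimal sketch, assuming upvote proportion is the primary sort key and recency only breaks ties (the `Review` fields and function names are hypothetical):

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Review:
    # Hypothetical review record; field names are illustrative only.
    text: str
    upvotes: int
    downvotes: int
    posted_on: date


def upvote_proportion(review: Review) -> float:
    """Fraction of votes that are upvotes; 0.0 when a review has no votes yet."""
    total = review.upvotes + review.downvotes
    return review.upvotes / total if total else 0.0


def rank_by_votes_and_recency(reviews: list[Review]) -> list[Review]:
    """Highest upvote proportion first; among equal proportions, newest first (assumed tie-break)."""
    return sorted(
        reviews,
        key=lambda r: (upvote_proportion(r), r.posted_on),
        reverse=True,
    )


reviews = [
    Review("Nice product", 90, 10, date(2017, 1, 5)),
    Review("Battery lasts two days, camera is average", 45, 5, date(2018, 3, 1)),
    Review("Bad delivery", 10, 30, date(2018, 4, 2)),
]
for r in rank_by_votes_and_recency(reviews):
    print(f"{upvote_proportion(r):.2f}  {r.posted_on}  {r.text}")
```

Nothing in this ordering looks at what a review actually says, which is the weakness discussed next.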

In short, reviews with a higher proportion of upvotes were displayed first. However, this algorithm had several shortcomings. For instance, the speaker showed some of the most highly upvoted reviews.


Although these were the most upvoted reviews, they were vague and did not describe the product well. In addition, many reviews discussed aspects such as delivery and service, which change over time, so showing old reviews that are no longer relevant could even hurt customers' purchase decisions.

Flipkart needed a more robust way to rank reviews so that customers would see in-depth, unbiased reviews.

A New NLP and ML Based Approach

To show customers the reviews that best describe the product, rather than simply the most upvoted ones, Flipkart developed a new approach based on natural language processing (NLP) and machine learning (ML). This approach ranks reviews by their structure, relevance, and depth.

Step 1: Extract cluster seeds using catalog data

The words that best describe a product can be found in that product's catalog data, so Flipkart used the catalog data to extract nouns that serve as seeds for clustering.
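The talk does not describe the extraction code. A minimal sketch, assuming the catalog text has already been POS-tagged with Penn Treebank tags (the tag set used by Stanford's English tagger) and that every token tagged as a noun (`NN`, `NNS`, `NNP`, `NNPS`) becomes a seed. The catalog snippet and function name are hypothetical:

```python
def extract_seed_nouns(tagged_catalog: list[tuple[str, str]]) -> list[str]:
    """Return unique lowercase nouns (any tag starting with NN) in first-seen order."""
    seeds: list[str] = []
    seen: set[str] = set()
    for token, tag in tagged_catalog:
        word = token.lower()
        if tag.startswith("NN") and word not in seen:
            seen.add(word)
            seeds.append(word)
    return seeds


# Hypothetical, hand-tagged catalog description of a phone.
catalog = [
    ("Large", "JJ"), ("display", "NN"), ("with", "IN"), ("long", "JJ"),
    ("battery", "NN"), ("life", "NN"), ("and", "CC"), ("dual", "JJ"),
    ("cameras", "NNS"), (".", "."), ("Battery", "NN"), ("included", "VBN"),
]
print(extract_seed_nouns(catalog))
```

Deduplicating case-insensitively keeps “Battery” and “battery” as one seed, which matters once the seeds are looked up in an embedding vocabulary in the next step.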


Step 2: Extend seed list using word2vec

Word2vec is a shallow, two-layer neural network that, when trained on a text corpus, learns a feature vector (embedding) for each word, so that words used in similar contexts end up with similar vectors. Flipkart used such a model to extend the initially extracted seeds into a list of words similar to them.
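Assuming word vectors are already available (for example from a word2vec model trained on review text), extending the seeds amounts to a nearest-neighbour search by cosine similarity. The sketch below uses tiny made-up 3-dimensional vectors and a hypothetical similarity threshold of 0.8; with a real trained model, gensim's `KeyedVectors.most_similar` performs the same kind of lookup.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_seeds(
    seeds: list[str], vectors: dict[str, list[float]], threshold: float = 0.8
) -> dict[str, list[str]]:
    """Map each seed to itself plus every vocabulary word at least `threshold` similar to it.

    Seeds missing from the vocabulary are skipped.
    """
    clusters: dict[str, list[str]] = {}
    for seed in seeds:
        if seed not in vectors:
            continue
        similar = sorted(
            word
            for word, vec in vectors.items()
            if word != seed and cosine(vectors[seed], vec) >= threshold
        )
        clusters[seed] = [seed] + similar
    return clusters


# Made-up toy embeddings; real word2vec vectors usually have 100 to 300 dimensions.
vectors = {
    "battery": [0.9, 0.1, 0.0],
    "charge": [0.8, 0.2, 0.1],
    "backup": [0.85, 0.05, 0.1],
    "camera": [0.1, 0.9, 0.1],
    "photos": [0.15, 0.85, 0.0],
    "delivery": [0.0, 0.1, 0.95],
}
print(expand_seeds(["battery", "camera", "warranty"], vectors))
```

A fixed threshold is only one option; taking the top-k neighbours per seed is an equally plausible choice, and the talk does not say which was used.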



Step 3: Extract phrases from review text

They then extracted nouns and adjectives from the review text using Stanford's Part-of-Speech (POS) tagger.
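The tagger only labels words; the article does not say how the labels were turned into phrases. One simple option, shown here as an assumption, is to group consecutive adjectives (`JJ*`) and nouns (`NN*`) into a phrase and keep only phrases that contain a noun. The input is hypothetical, hand-tagged review text:

```python
def extract_phrases(tagged_review: list[tuple[str, str]]) -> list[str]:
    """Group runs of adjectives (JJ*) and nouns (NN*) into lowercase phrases.

    Runs without any noun (e.g. a lone predicative adjective) are discarded.
    """
    phrases: list[str] = []
    current: list[tuple[str, str]] = []

    def flush() -> None:
        # Emit the current run if it contains at least one noun, then reset it.
        if any(tag.startswith("NN") for _, tag in current):
            phrases.append(" ".join(word.lower() for word, _ in current))
        current.clear()

    for word, tag in tagged_review:
        if tag.startswith(("JJ", "NN")):
            current.append((word, tag))
        else:
            flush()
    flush()
    return phrases


# "Great battery backup but poor camera quality", hand-tagged for illustration.
review = [
    ("Great", "JJ"), ("battery", "NN"), ("backup", "NN"), ("but", "CC"),
    ("poor", "JJ"), ("camera", "NN"), ("quality", "NN"),
]
print(extract_phrases(review))
```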



Step 4: Cluster phrases

The phrases extracted from the review text are then clustered using the seed list obtained in Step 2. Extracted phrases that do not match the seed list are reviewed and, if relevant, added to the original seed list.
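A minimal sketch of the clustering step, assuming a phrase belongs to every cluster whose word list (from Step 2) shares at least one word with it, and that phrases matching nothing are set aside for the review described above. The cluster lists and phrases are hypothetical:

```python
def cluster_phrases(
    phrases: list[str], clusters: dict[str, list[str]]
) -> tuple[dict[str, list[str]], list[str]]:
    """Assign each phrase to every cluster sharing a word with it; return unmatched phrases separately."""
    assigned: dict[str, list[str]] = {name: [] for name in clusters}
    unmatched: list[str] = []
    for phrase in phrases:
        words = set(phrase.split())
        hits = [name for name, members in clusters.items() if words & set(members)]
        for name in hits:
            assigned[name].append(phrase)
        if not hits:
            # Candidates for review and possible addition to the seed list.
            unmatched.append(phrase)
    return assigned, unmatched


clusters = {
    "battery": ["battery", "backup", "charge"],
    "camera": ["camera", "photos"],
}
assigned, unmatched = cluster_phrases(
    ["great battery backup", "poor camera quality", "fast delivery"], clusters
)
print(assigned)
print(unmatched)
```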

Step 5: Assign weights

Each cluster is assigned a weight based on how frequently it occurs across the entire review corpus: clusters that appear in more reviews receive higher weights than clusters that appear less often.
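The exact weighting formula was not given in the talk. A minimal sketch, assuming a cluster's weight is simply the fraction of reviews in the corpus that mention it, so more frequent clusters get higher weights. The input is one set of matched cluster names per review, all hypothetical:

```python
from collections import Counter


def cluster_weights(review_clusters: list[set[str]]) -> dict[str, float]:
    """Weight of a cluster = share of reviews that mention it at least once."""
    if not review_clusters:
        return {}
    counts = Counter()
    for clusters in review_clusters:
        counts.update(clusters)  # each review's clusters are a set, so each counts once per review
    total = len(review_clusters)
    return {name: count / total for name, count in counts.items()}


corpus = [
    {"battery", "camera"},
    {"battery"},
    {"battery", "display"},
    {"camera"},
]
print(cluster_weights(corpus))
```

Counting documents rather than raw mentions keeps a single review that repeats “battery” ten times from inflating that cluster's weight.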

Step 6: Score each review

Finally, they developed a scoring algorithm that assigns each review a Review Quality Score (RQS). A brief overview of the algorithm is as follows:

1. Extract the keywords in the review that match the cluster definitions and assign the corresponding cluster weights $(w_1, w_2, \dots, w_n)$.
2. If the review contains keywords such as “pros” or “cons”, apply a multiplier of 1.2 to boost the score.
3. Calculate the RQS as the sum of the matched cluster weights:

$$\textbf{RQS} = w_1 + w_2 + \dots + w_n$$

or, when the review contains “pros” or “cons”:

$$\textbf{RQS} = 1.2 \times (w_1 + w_2 + \dots + w_n)$$
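The three steps above can be sketched as follows. Assumptions not stated in the talk: reviews are tokenized into lowercase alphabetic words, each matched cluster contributes its weight once per review, and only the literal words “pros” and “cons” trigger the 1.2 multiplier. The clusters and weights are hypothetical values of the kind Steps 2 and 5 would produce:

```python
import re

# Assumed trigger words for the 1.2 multiplier (the talk only says "such as").
PROS_CONS = re.compile(r"\b(pros|cons)\b", re.IGNORECASE)


def review_quality_score(
    text: str,
    clusters: dict[str, list[str]],
    weights: dict[str, float],
    multiplier: float = 1.2,
) -> float:
    """RQS = sum of weights of clusters matched by the review, times 1.2 if it mentions pros/cons."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    matched = [name for name, members in clusters.items() if tokens & set(members)]
    score = sum((weights.get(name, 0.0) for name in matched), 0.0)
    if PROS_CONS.search(text):
        score *= multiplier
    return score


clusters = {
    "battery": ["battery", "backup", "charge"],
    "camera": ["camera", "photos"],
    "display": ["display", "screen"],
}
weights = {"battery": 0.75, "camera": 0.5, "display": 0.25}

reviews = [
    "Nice product",
    "Pros: great battery backup, sharp screen. Cons: average camera.",
    "Battery drains fast",
]
# Rank reviews by RQS, highest first.
ranked = sorted(
    reviews, key=lambda r: review_quality_score(r, clusters, weights), reverse=True
)
for r in ranked:
    print(f"{review_quality_score(r, clusters, weights):.2f}  {r}")
```

Because the score adds one weight per covered cluster, a review that discusses battery, display, and camera outranks a one-line “Nice product”, regardless of how many upvotes either has.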


With the new data-science-based approach, reviews that covered the product in depth received a higher RQS and were therefore ranked higher. As a result, customers were shown more relevant and unbiased reviews, which helped them make better-informed purchase decisions. The new ranking also increased key KPIs at Flipkart, such as product page views and the number of reviews read per product page view, by up to 142%.

Flipkart is an excellent example of an e-commerce business harnessing the power of data science. If you are looking to apply data science in your own business, you might want to read the more detailed article How can businesses use data science.

