Online Reviewing Made Super Easy By Mining User Generated Content

With the transition from word of mouth to an ‘electronic word of mouth’ marketing culture, businesses today feel the need to possess an expansive arsenal of user feedback (preferably positive) to establish their reputation and presence on the Web. Though the 1% rule of the Internet is presumed to be dead, the proportion of lurkers, i.e., people who consume user-generated content on the Web without contributing, remains high. A 2010 Pew Internet survey reveals that only a quarter (24%) of Americans have ever posted comments or reviews online about the things they buy. Businesses therefore devise innovative incentive programs in a desperate attempt to garner customer reviews.

“Writing reviews is too tedious” is the most popular reason given by people who never or rarely write online reviews, followed by “I forgot”. While businesses can take care of the latter by periodically sending reminder emails to customers, they still need a system that simplifies the review-writing task itself.

Dr. Gautam Das at the University of Texas at Arlington and his students, Mahashweta Das and Azade Nazi, provide a novel solution to this problem. They leverage feedback left on the Web by past users of a product to identify a set of meaningful phrases, i.e., tags, that, when suggested to a user, would help her review the product. The user can quickly choose from among the returned tags to articulate her feedback for the product without having to spend a lot of time writing a review.

As a first step toward the solution, they employ text mining techniques to extract meaningful phrases, or tags, with sentiment labels from user feedback in the form of reviews. For example, “It is a lightweight camera with some amazing features” is reduced to the tags {lightweight camera, amazing features}, where both tags have positive sentiment.
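The extraction step can be sketched at a toy level. The tiny sentiment lexicon and the adjective-plus-noun pattern below are illustrative assumptions standing in for the authors' actual text mining pipeline, which is not spelled out here:

```python
import re

# Toy sentiment lexicon -- an assumption for illustration; real systems
# rely on much larger sentiment resources.
POSITIVE = {"lightweight", "amazing", "great", "perfect", "sharp"}
NEGATIVE = {"short", "poor", "blurry", "heavy"}

def extract_tags(review):
    """Extract (tag, sentiment) pairs as sentiment-adjective + noun bigrams."""
    words = re.findall(r"[a-z]+", review.lower())
    tags = []
    for first, second in zip(words, words[1:]):
        if first in POSITIVE:
            tags.append((f"{first} {second}", "+"))
        elif first in NEGATIVE:
            tags.append((f"{first} {second}", "-"))
    return tags

print(extract_tags("It is a lightweight camera with some amazing features"))
# [('lightweight camera', '+'), ('amazing features', '+')]
```

On the example sentence from the text, this toy extractor recovers the two positive tags {lightweight camera, amazing features}.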

The inventors formulate the problem as a general constrained optimization problem. A core challenge in this design is defining the essential properties of the returned tags that would serve to review the product effectively. They consider relevance (i.e., how well the result set of tags describes a product to a user), coverage (i.e., how well the result set of tags covers the diverse aspects of the product), and polarity (i.e., how well sentiment is attached to the result set of tags) in order to enable a user to satisfactorily review a product.

A user can review a product in different ways. A user can express her broad opinion about the different aspects of the product, which, in turn, can be either positive or negative. A user can also express both positive and negative opinions about the same product feature. For example, a user may write a review for a camera as “The picture quality of this camera is great and so is the sharpness and color accuracy of the pictures, but the battery life is short.”, while another user of the same camera may write “Though the extra screen with touchscreen and gesture-control features saps battery life, it’s perfect for fashion-conscious snap shooters.”. The first review contains positive feedback for the camera’s image quality and negative feedback for the camera’s battery life. The second review contains both positive and negative feedback for the camera’s advanced features such as dual-screen, touchscreen and gesture-control. The general problem formulation considers two different definitions of coverage of product features by tags in order to accommodate these different real-world scenarios. They develop practical algorithms with theoretical bounds to solve the problem efficiently and effectively.
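The selection step can be sketched as a simple greedy procedure that rewards a tag's relevance plus the number of product aspects it newly covers. The scores, the additive weighting, and the greedy strategy below are illustrative assumptions, not the authors' actual formulation or algorithm:

```python
# Hypothetical per-tag scores; the paper defines relevance, coverage, and
# polarity formally -- these values are made up for illustration.
tags = {
    "great picture quality": {"rel": 0.9, "aspects": {"image"},   "pol": "+"},
    "sharp pictures":        {"rel": 0.7, "aspects": {"image"},   "pol": "+"},
    "short battery life":    {"rel": 0.8, "aspects": {"battery"}, "pol": "-"},
    "handy touchscreen":     {"rel": 0.6, "aspects": {"screen"},  "pol": "+"},
}

def select_tags(tags, k):
    """Greedily pick k tags, scoring relevance plus newly covered aspects."""
    chosen, covered = [], set()
    remaining = dict(tags)
    for _ in range(k):
        if not remaining:
            break
        best = max(remaining,
                   key=lambda t: remaining[t]["rel"]
                                 + len(remaining[t]["aspects"] - covered))
        chosen.append(best)
        covered |= remaining.pop(best)["aspects"]
    return chosen

print(select_tags(tags, 3))
# ['great picture quality', 'short battery life', 'handy touchscreen']
```

Note how the coverage bonus steers the greedy choice away from the second image tag toward tags for uncovered aspects, yielding a result set that spans diverse features and both sentiment polarities.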

The team conducts an Amazon Mechanical Turk user study to validate whether users prefer and benefit from the tags returned by the proposed solution when reviewing products. They use 12,600 reviews posted on Walmart by 11,500 users for 140 digital cameras as training data and generate tags for 6 new cameras.

The user study was conducted in two phases. In the first phase, an overwhelming 71% of the users reviewed the six cameras by choosing tags returned by the proposed solution instead of writing reviews from scratch, validating that they find the returned tags meaningful and adequate. The second phase was intended to determine whether the feedback left by users matched the tags returned by the solution. Domain experts studied the results and found that 77% of the users submitted feedback matching tags returned by the proposed system.

The work was accepted for publication at the premier international peer-reviewed research conferences ACM SIGMOD and VLDB, both of which rank among the top computer science conferences. A demonstration of the work will be presented at the 42nd International Conference on Very Large Data Bases in New Delhi, India, on 6 September 2016.


  1. Azade Nazi, Mahashweta Das, Gautam Das. The TagAdvisor: Luring the Lurkers to Review Web Items. 34th ACM SIGMOD International Conference on Management of Data.
  2. Rajeshkumar Kannapalli, Azade Nazi, Gautam Das and Mahashweta Das. ADWIRE: Addon for Web Item Reviewing System. 42nd International Conference on Very Large Data Bases.
  3. Web Item Reviewing Made Easy by Leveraging Available User Feedback. arXiv preprint: 1602.06454. 2016.