This invention pertains in general to mining of information from product reviews in electronic commerce and more particularly to a method and a system for providing a comprehensive product overview/search using user-weighted attribute-based sort-ordering of products.
Products are often discussed in public reviews, online and in other media. Reviews are typically written by professional critics, by experts, and/or by ordinary consumers. Reviews often discuss particular features of a reviewed item, and provide the reviewer's subjective opinions regarding the item (product or service) and its features. A rating may be given as part of a review, to indicate an item's relative merit. E-commerce websites often provide a facility to write product reviews on their sites, giving consumers a chance to rate and comment on products they have purchased. Such reviews are published on or near the web page(s) that offer the reviewed product. Users can also rate products using a star-based rating system. Other consumers can read these reviews when considering items for purchase. When several reviews have been given, an overall rating based on the individual ratings can be calculated and displayed on the product page.
Internet product searches help Web users research and buy products. With the widespread growth of Internet use, large numbers of users now participate on the Internet (through blogs, forums, etc.), commenting on products and events and providing other review information. These comments express a variety of user information, emotional colouring and emotional tendencies, giving businesses a platform for displaying information and giving consumers (i.e. users) a platform for exchanging product experience. Extracting information and meaning from this mass of emotionally coloured text, using text sentiment analysis and language processing, and converting it into an instantly comprehensible representation (such as a numerical sentiment score) has strong business and customer value. For example, users can consult reviews for information about goods and choose the right product, while businesses can use data gleaned from user reviews to improve product quality and strive for greater market share.
A basic task of sentiment analysis is classifying text as positive or negative. Another task is to identify the entities and attributes mentioned in the text. The larger goal within the product review context is to mine all the relevant information and convert it into an easily understood metric about the product (such as a numerical score).
A number of product search systems currently exist. Many companies (e.g. Google, Microsoft) operate search engines offering a variety of product search systems that crawl the websites of e-retailers. Vertical search engines also exist that provide a plethora of search options.
In both product search and online shopping systems, a common function is to rank products according to the preference of end users. Since most of these Web sites allow users to give rating scores (typically from 1 to 5 stars) for products, the typical product ranking approach is based on the average score of all ratings given by end users for each product.
The search process for products with many attributes, and many variants at different price points is complex. All the existing approaches for product search at e-Commerce websites and shopping comparison websites implement product attribute-based filtering to aid the product search and discovery process. This has certain drawbacks—it does not provide a comprehensive product overview, it does not consider products holistically (products at the boundary are eliminated) and it does not customise according to user preferences.
A prior art document, U.S. Pat. No. 8,892,422, discloses methods of phrase identification, in which a phrase weighting of a sequence of words is identified as a function of the position of the words present in the sequence, and apparatus thereof. Methods are provided therein to help determine the co-occurrence consistencies for positional word pairings of a variety of word sequences in a corpus that may be used in identifying a phrase; determining a phrase coherence of a word sequence based on the co-occurrence consistencies for positional word pairings in the word sequence; and determining one or more phrase boundaries in a word sequence.
Another prior art document, U.S. Pat. No. 5,696,962, discusses a method for computerized information retrieval from a text corpus in response to a natural-language input string, e.g., a question, supplied by a user. A string is accepted as input and analyzed to detect noun phrases and other grammatical constructs therein. The analyzed input string is converted into a series of Boolean queries based on the detected phrases. U.S. Pat. No. 9,037,464 B1 (Computing Numeric Representations of words in a high-dimensional space) discusses techniques to obtain a respective numeric representation of each word in the vocabulary in the high-dimensional space.
The following non-patent literature has also been referred to in the prior art:
Lack of Comprehensive Overview of a Product:
It is possible to get a comprehensive overview of the quality of a product by analysing it along two dimensions: one based on the technical specifications of the product, and another based on what the users of the product are saying about it. Existing approaches to product search do not provide a useful summarisation of user reviews; at most, they provide only a listing of user reviews from their own sites. Users are forced to navigate hundreds of reviews for each product on multiple websites and then assimilate all this information. It is very difficult to condense all this information into a single representative metric that provides an overview of the product. Since a representative metric that conveys the quality of the product as gleaned from user reviews cannot easily be obtained, it is not possible to get a comprehensive overview of a product; it can be rated only on the basis of its technical specifications.
Arbitrary Elimination of Products:
Filtering applies an arbitrary boundary and excludes all products that fall just outside the boundary. For example, camera resolution (in megapixels) is a common filter used to simplify the search for smartphones. However, applying a filter at 8 MP and above for the camera arbitrarily excludes phones that may have a very good camera with 7.9 MP resolution.
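The boundary effect described above can be illustrated with a minimal sketch. The catalogue, the field name `camera_mp`, and the two helper functions are hypothetical and introduced only for illustration; they are not taken from any existing system.

```python
# Hypothetical catalogue; names and numbers are illustrative only.
phones = [
    {"name": "Phone A", "camera_mp": 12.0},
    {"name": "Phone B", "camera_mp": 7.9},
    {"name": "Phone C", "camera_mp": 8.0},
]


def filter_by_camera(items, min_mp):
    """Hard filter: keep only items at or above the threshold."""
    return [p for p in items if p["camera_mp"] >= min_mp]


def sort_by_camera(items):
    """Sort-ordering: keep every item, best camera first."""
    return sorted(items, key=lambda p: p["camera_mp"], reverse=True)


kept = [p["name"] for p in filter_by_camera(phones, 8.0)]
ranked = [p["name"] for p in sort_by_camera(phones)]

print(kept)    # Phone B (7.9 MP) is dropped by the 8 MP filter
print(ranked)  # Phone B is still present, just ranked last
```

With a filter, the 7.9 MP phone disappears from the results entirely; with sort-ordering it remains visible and can still rank well once other attributes are taken into account.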
Lack of Customisation:
Different users attach different levels of importance to various product attributes. The filtering mechanism performs only a binary selection/elimination and does not allow users to attach varying levels of importance to different attributes. For example, if battery life is the most important criterion for a user, followed by camera quality, and screen size does not matter at all, then the search results should be sorted so that phones with the best battery life appear higher than others. The filtering mechanism does not allow for this.
The discussion above is merely provided for general background information and is not intended for use as an aid in determining the scope of the claimed subject matter.
Systems and methods in accordance with various embodiments of the present invention can provide for the information mining via language processing of product reviews in electronic commerce. For products with many attributes and many variants, the buying decision involves a lot of complex research because—
Therefore herein described there is provided a computer-implemented system and method for product search using the User-Weighted, Attribute-Based, Sort-Ordering comprising the steps of: computing of specification score for product attribute; computing of sentiment score for product attribute; characterized by steps of extracting reviews for each product from multiple sources; detecting the attributes described in each product review; detecting the polarity (positive/negative) of the user review with respect to each attribute and converting the detected information into a numerical score for each attribute which captures all the information about that attribute from user-ratings; computing the overall product score based on specification score and sentiment score of individual product attributes; and displaying the search results sorted according to the overall product score.
In some embodiments, the present invention provides a computerized system and method for searching, analyzing, and displaying data using a User-Weighted Attribute-Based Sort-Ordering algorithm. More particularly, the present invention provides a solution to personalize relevant data using a user-defined, user-weighted, and user-profile-driven method to obtain relevant data and feedback tuning for searching, comparing, and analysing data such as product reviews.
In some embodiments, the present invention provides a novel approach to product search that overcomes the drawbacks of the existing method by doing the following—
Some embodiments further include enabling user defined relevant information in the form of input data or feedback. Other embodiments enable and facilitate sharing of data and user defined and user weighted feedback and decisions with regards to purchasing, evaluating, comparing, predicting, searching and browsing a particular product, individual event or other user-defined topic. The new approach has the following advantages
As herein described, there is provided a method and system configured for comprehensive product search and overview using user-weighted attribute-based sort-ordering. The disclosed sort-ordering takes all products into consideration and does not eliminate products at arbitrary boundaries. The improved method takes all the attributes of the product into consideration and therefore provides a more holistic ranking of products.
Users are allowed to assign different weights to individual product attributes, leading to a more personalized search that also accommodates all possible variants of products. This is not possible under the existing methods.
As per an exemplary embodiment, the system architecture includes a processing unit, typically a computer for use as a user and/or server according to one embodiment. Illustrated is at least one processor coupled to a bus. Also coupled to the bus are a memory, a storage device, a keyboard, a graphics adapter, a pointing device, and a network adapter. A display is coupled to the graphics adapter.
The processor may be any general-purpose processor. The results may be stored in the memory, and the method comprises storing the real result. The results may be stored in any memory, and may be stored in a volatile, or preferably non-volatile, memory. They may be stored using any suitable data storage medium or media. In particularly preferred embodiments the results are stored using a set of one or more memory drives. Any suitable drive may be used, but preferably the or each drive is a solid state drive (SSD). Such drives have been found to be particularly useful for storing result tables, as SSDs may provide fast access to stored data. The pointing device may be a mouse, track ball, or other type of pointing device, and is used in combination with the keyboard to input data into the computer system. The graphics adapter displays images and other information on the display. The network adapter couples the computer to a network.
As is known in the art, the computer is adapted to execute computer program modules stored in memory. As used herein, the term “module” refers to computer program logic and/or data for providing the specified functionality. A module can be implemented in hardware, firmware, and/or software. In one embodiment, the modules are stored on the storage device, loaded into the memory, and executed by the processor.
Relevant pieces of the information are extracted from the data retrieved from the diverse set of sources and stored. Product information gathered by aggregation may be normalized into a single unified representation, which is described in detail below. Each product is associated with a product category as well as with the information collected about the product. The processing of the information obtained from different information sources across numerous product categories is challenging, since there is no single representational standard used across web sites for representing the information and the information is constantly changing. The accuracy of the analysis of the quality of a product typically improves with the volume and diversity of data used for processing. More diverse data results in better estimation of customer satisfaction and sentiment, and in better coverage of products across the internet.
Systems and methods in accordance with various embodiments of the present invention can overcome the aforementioned and other deficiencies in existing product review approaches by providing a different approach to product search, based on the following key insights.
The sentiment analysis engine analyses millions of user reviews, extracts meaning from these reviews, and produces a numerical score for each product that encapsulates the user reviews for that product (the more positive the reviews, the higher the score).
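The text does not specify how detected polarities are turned into a number, so the following is only a minimal sketch under a stated assumption: each review mention of an attribute has already been labelled +1 (positive) or -1 (negative), and the score is simply the fraction of positive mentions. The function name `sentiment_score` is hypothetical.

```python
def sentiment_score(polarities):
    """Map a list of +1 / -1 polarity labels to a score in [0, 1].

    Assumption (not from the source): the score is the share of
    positive mentions. Returns None when there are no labelled
    mentions, so callers can tell "no data" apart from "all negative".
    """
    if not polarities:
        return None
    positives = sum(1 for p in polarities if p > 0)
    return positives / len(polarities)


# Three positive mentions and one negative mention of an attribute.
print(sentiment_score([1, 1, -1, 1]))
# No mentions at all: the sentiment score is unavailable.
print(sentiment_score([]))
```

Any monotone mapping would satisfy the stated property that more positive reviews yield a higher score; the fraction of positives is used here only because it is the simplest such mapping.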
There are n Products in a set {P1 . . . Pn}.
Each of these products has r Attributes, i.e. all products {P1 . . . Pn} have r attributes in the set {A1 . . . Ar}. The number of possible product-attribute combinations is n×r.
Each of these r Attributes has any number of discrete possible values along a spectrum from Ai(min) to Ai(max), where Ai(min) and Ai(max) are the minimum and maximum values for the attribute Ai.
There is a user u who assigns a weight Wi to every attribute Ai in the set {A1 . . . Ar}. Each weight Wi can take a value from a discrete set of weight values {Wmin . . . Wmax}.
Our user-weighted Attribute-based Sort-Ordering for Product Search ranks the n products in descending order of their Product Scores. The product score is computed as a weighted sum of the individual attribute scores (weights are assigned by the user).
Each attribute score is computed as a weighted average of the specifications score, and sentiment score for the attribute. The specifications score is based on the technical specifications as suggested by the manufacturers, while the sentiment score is based on analysis of the text of the review for the product.
For example, the Product Score for mobile phone P1 will be a weighted sum of the attribute scores for display, camera, screen size and performance, where the weights are specified by the user for each of the four attributes to denote the importance of those attributes. The scores of the attributes themselves will be weighted averages of the specification score for the attribute (rank-normalized) and the sentiment score for the attribute (a numerical score based on sentiment analysis).
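The mobile phone example above can be worked through numerically. All scores and weights below are hypothetical values chosen for illustration, and the specification and sentiment scores are combined with equal weighting, which is one possible choice of weighted average.

```python
# Hypothetical, illustrative numbers: none of these come from the source.
spec_scores = {"display": 0.75, "camera": 1.0, "screen_size": 0.5, "performance": 0.25}
sent_scores = {"display": 0.25, "camera": 0.5, "screen_size": 0.5, "performance": 0.75}

# Weights chosen by the user; 0 means "this attribute does not matter to me".
user_weights = {"display": 3, "camera": 5, "screen_size": 0, "performance": 2}

# Attribute score: equal-weight average of specification and sentiment scores.
attribute_scores = {a: (spec_scores[a] + sent_scores[a]) / 2 for a in spec_scores}

# Product score: user-weighted sum of the attribute scores.
product_score_p1 = sum(user_weights[a] * attribute_scores[a] for a in attribute_scores)

print(attribute_scores)
print(product_score_p1)  # 3*0.5 + 5*0.75 + 0*0.5 + 2*0.5 = 6.25
```

Because the screen size weight is 0, that attribute contributes nothing to the score for this user, yet the phone is never removed from the results.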
The process therefore has the following two steps:
Step 1: Computation of standardized scores for individual product attributes
This step can be divided into two parts—
A. Computation of specification score for product attribute
B. Computation of sentiment score for product attribute
Part A: Computation of specification score for product attributes
Since individual attributes are not directly comparable (e.g. camera resolution in megapixels is not comparable to battery capacity in mAh), it is necessary to standardize the individual attribute scores so that they can be added together. This is achieved using normalization and percentile-based scaling.
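Two common ways of putting attributes with different units onto a shared 0 to 1 scale are sketched below. The exact formulas used by the described system are not given, so the helper names `min_max_scale` and `percentile_rank`, the tie handling, and the sample values are all assumptions made for illustration.

```python
def min_max_scale(values):
    """Linearly rescale values so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: if every product has the same value, use the midpoint.
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def percentile_rank(values):
    """For each value, the fraction of values that are less than or equal to it."""
    n = len(values)
    return [sum(1 for other in values if other <= v) / n for v in values]


# Hypothetical specifications for four phones.
camera_mp = [8.0, 12.0, 16.0, 12.0]
battery_mah = [2000, 3000, 4000, 2500]

print(min_max_scale(camera_mp))
print(min_max_scale(battery_mah))
print(percentile_rank(camera_mp))
```

After scaling, a camera score and a battery score are both unitless numbers in the same range, so they can be added or averaged.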
Part B: Computation of sentiment score for product attributes
This involves the following steps
Further details of the sentiment score computation are given here.
sa(i)=standardized score (between 0 and 1) for attribute Ai.
This score has two components—
The specifications score for the attribute is obtained by rank normalization, min-max scaling, or a similar technique. This makes it possible to add up scores that are not normally comparable.
For sentiment scores, a different methodology is used to compute scores, as outlined below.
The standardized attribute score is therefore, an average of the specification score and sentiment score for the attribute.
For phones where the sentiment score is unavailable, a smoothing constant is applied to the specifications score before arriving at the overall product score.
Therefore, for a product P1, the standardized attribute score for individual attribute Ai is denoted by $s^{(P_1)}_{a(i)}$:

$$s^{(P_1)}_{a(i)} = \frac{s^{(P_1)}_{a(\mathrm{spec})(i)} + s^{(P_1)}_{a(\mathrm{sent})(i)}}{2}$$

where $s^{(P_1)}_{a(\mathrm{spec})(i)}$ is the specification score for attribute a(i) of product P1 and $s^{(P_1)}_{a(\mathrm{sent})(i)}$ is the sentiment score for attribute a(i) of product P1.
Step 2: Calculating the overall product scores by summing up the standardized attribute scores, with user-weighted criteria, to derive user-specific product score.
$$S^{(P_j)}_{u} = \sum_{i=1}^{r} w_u(i)\, s^{(P_j)}_{a(i)}$$

This is a weighted summation of the scores for the r individual attributes of Pj, where $S^{(P_j)}_{u}$ is the product score of product Pj for user u, $s^{(P_j)}_{a(i)}$ is the standardized attribute score for individual attribute a(i) of product Pj, and $w_u(i)$ is the weight assigned by user u to attribute i.
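A minimal sketch of the resulting sort-ordering follows, assuming the standardized attribute scores have already been computed. The catalogue, the attribute names, the two weight profiles, and the function names `product_score` and `rank_products` are hypothetical.

```python
def product_score(attribute_scores, weights):
    """User-weighted sum of standardized attribute scores for one product."""
    return sum(weights.get(attr, 0) * score for attr, score in attribute_scores.items())


def rank_products(catalogue, weights):
    """Return product names in descending order of their user-specific score."""
    return sorted(
        catalogue,
        key=lambda name: product_score(catalogue[name], weights),
        reverse=True,
    )


# Hypothetical standardized attribute scores (0 to 1) for three products.
catalogue = {
    "P1": {"battery": 1.0, "camera": 0.25, "screen": 0.5},
    "P2": {"battery": 0.5, "camera": 1.0, "screen": 0.5},
    "P3": {"battery": 0.25, "camera": 0.5, "screen": 1.0},
}

# Two users with different priorities; neither cares about screen size.
battery_first = {"battery": 5, "camera": 3, "screen": 0}
camera_first = {"battery": 1, "camera": 5, "screen": 0}

print(rank_products(catalogue, battery_first))
print(rank_products(catalogue, camera_first))
```

The same three products are returned for both users, but in a different order: no product is eliminated, and the ordering reflects each user's own weights.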
The following can be noted from the above equation:
Smartphone Search.
The disclosed system and method use machine learning approaches to perform sentiment analysis on user reviews and expert reviews. Several steps are involved in processing the reviews to derive a numerical score; a brief summary of the stages in the process is given below:
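When a sentiment score is available for aspect a of product p:

$$\text{total score}(a,p) = \frac{\text{sentiment score}(a,p) + \text{specification score}(a,p)}{2}$$

When the sentiment score is unavailable:

$$\text{total score}(a,p) = \text{specification score}(a,p) \times \text{sentiment smoothing}(p)$$

The total score for product p is the mean over all aspects:

$$\text{total score}(p) = \frac{\sum_{a \in \text{aspects}} \text{total score}(a,p)}{|\text{aspects}|}$$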
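The three total score formulas can be combined into one short sketch. It assumes that a missing sentiment score is represented as `None` and that the smoothing value is supplied per product; the value 0.5 used below is purely hypothetical, as are the function names.

```python
def total_attribute_score(spec, sent, smoothing):
    """Total score for one aspect of one product.

    With a sentiment score: the average of sentiment and specification.
    Without one (sent is None): specification score times the smoothing value.
    """
    if sent is None:
        return spec * smoothing
    return (sent + spec) / 2


def total_product_score(aspects, smoothing):
    """Mean of the per-aspect total scores.

    `aspects` maps an aspect name to (specification score, sentiment score or None)
    and must contain at least one aspect.
    """
    scores = [
        total_attribute_score(spec, sent, smoothing)
        for spec, sent in aspects.values()
    ]
    return sum(scores) / len(scores)


# Hypothetical phone: camera has reviews, battery has none.
phone = {"camera": (1.0, 0.5), "battery": (0.5, None)}
SMOOTHING = 0.5  # hypothetical smoothing value for this product

print(total_attribute_score(1.0, 0.5, SMOOTHING))   # camera: (0.5 + 1.0) / 2
print(total_attribute_score(0.5, None, SMOOTHING))  # battery: 0.5 * 0.5
print(total_product_score(phone, SMOOTHING))
```

A smoothing value below 1 means that an aspect with no review data scores lower than its specification alone would suggest, which keeps unreviewed products from outranking well-reviewed ones purely on specifications.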
Although the foregoing description of the present invention has been shown and described with reference to particular embodiments and applications thereof, it has been presented for purposes of illustration and description and is not intended to be exhaustive or to limit the invention to the particular embodiments and applications disclosed. It will be apparent to those having ordinary skill in the art that a number of changes, modifications, variations, or alterations to the invention as described herein may be made, none of which depart from the spirit or scope of the present invention. The particular embodiments and applications were chosen and described to provide the best illustration of the principles of the invention and its practical application to thereby enable one of ordinary skill in the art to utilize the invention in various embodiments and with various modifications as are suited to the particular use contemplated. All such changes, modifications, variations, and alterations should therefore be seen as being within the scope of the present invention as determined by the appended claims when interpreted in accordance with the breadth to which they are fairly, legally, and equitably entitled.
| Number | Date | Country | Kind |
|---|---|---|---|
| 3691/CHE/2015 | Jul 2015 | IN | national |

| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/IN2015/000342 | 9/1/2015 | WO | 00 |