An enterprise, such as a company, educational organization, government agency, and so forth, can receive a large amount of feedback from users or customers in the form of comments received over time. If there is a large volume of comments, then it can be relatively difficult for analysts to manually detect problems indicated by the customer feedback.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
Some embodiments are described with respect to the following figures:
An enterprise can receive relatively large amounts of data, such as customer feedback in the form of comments. The comments can be received over a network, such as the Internet, where customers can supply comments regarding a product or service through the enterprise's website or through a third party website such as a social networking site. Alternatively, or additionally, comments can be received in paper form and entered by the enterprise's personnel into a system in electronic form.
Data records can be stored to represent the time sequence of comments. In some cases, it may be desirable to visually analyze the comments by employing automated visualization of the comments in graphical form (without a user having to read the individual comments). When there is a relatively large number of comments, however, graphical elements representing corresponding comments can be close to each other or can actually overlap each other, particularly when the comments are associated with the same time points or time points that are relatively close to each other. A large number of overlapping graphical elements or graphical elements close to each other can make it difficult to understand what is being represented by the graphical elements.
In the ensuing discussion, reference is made to visually analyzing customer comments. Note, however, that techniques according to some implementations can also be applied to data records representing other types of events, such as measurements taken by sensors within a system (e.g., a network of computing devices), or other types of events.
Referring to
In dense regions of the sequence 202 (regions that have relatively large numbers of comments close in time to each other or having the same time), the rectangles can overlap either partially or even entirely (such as when there are multiple comments associated with the same time point). The time point with which a comment (or other type of event) is associated can represent the time point at which the comment (or other event) was created, received, submitted, and so forth. Any gap between two rectangles in the sequence 202 represents a time gap between the respective comments. Darker lines or even dark rectangles (206) in the sequence 202 represent multiple comments that are close in time to each other (note that the rectangles of the corresponding comments overlap each other).
To address the issue of overlapping graphical elements (e.g., overlapping rectangles in the sequence 202 of
In the example of
The graphical elements in the comment sequence track 208 are assigned different colors corresponding to different values of a respective attribute of the respective comment. In the example of
Other examples of attributes that can be represented by different colors of the graphical elements in the comment sequence track 208 include product features, concepts, persons, etc.
Since the inter-event temporal information (in other words, the time gaps between comments) has been removed in the comment sequence track 208, a second visualization is generated (at 106), which can be in the form of a time density track 210. The time density track 210 has gap-representing elements to represent time gaps between respective successive comments. In the example of
For example, a point 216 that has a high value indicates that the comments represented by respective graphical elements 218A and 218B in the comment sequence track 208 are relatively close to each other in time (and in fact, overlap each other). Another point 220 that has an above-average height indicates that two successive comments represented by graphical elements 218C and 218D in the comment sequence track 208 are relatively close to each other (they do not overlap but have a relatively short time gap in between).
Another point 222 having a below-average height indicates that the two corresponding comments represented by two respective graphical elements in the comment sequence track 208 have a medium gap between each other. A zero height of a point along the curve 214 indicates that there is a relatively long time gap between successive events.
By looking at the curve 214 of the time density track 210, an analyst can quickly identify points along the comment sequence track 208 that would be more interesting (for example, points along the comment sequence track 208 associated with negative feedback and where the comments are arriving relatively close in time to each other). Such an “interesting” point along the comment sequence track 208 can correspond to a time when some problem has occurred, such as a website crashing, a product being out of stock, and so forth. Since the graphical elements of the comment sequence track 208 do not occlude each other, a user can go to any point along the comment sequence track 208 and select, using an input device, respective ones of the graphical elements to obtain further detail regarding the respective comments. Also, by looking at the combination of the comment sequence track 208 and time density track 210, patterns can become more visible to the analyst. A pattern can be based on colors of the graphical elements of the comment sequence track 208, along with the varying heights of the curve 214 in the time density track 210.
The visualizations in
To remove time gaps between comments and to avoid representing overlapping comments with overlapping graphical elements, a comment sequence track 304 is generated having five graphical elements (each of the same length) to represent the respective comments a-e. A time density track 306 is also generated, which can be in the form of a curve 308 having points (small squares) to represent respective time gaps between successive pairs of comments. For example, the point on the curve 308 corresponding to the time gap between comments a and b has a height of 5 time units (to represent the time gap of 5 time units between comments a and b). Any time gap between successive comments of greater than 20 time units (or other predefined threshold) has a zero height on the curve 308 (to indicate that the successive comments are far away from each other in time such that they are not considered to be interesting). The threshold at which the height of the curve 308 representing a time gap between comments is set at zero can be defined differently for different implementations.
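The layout described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation from the text: the timestamps for comments a-e are hypothetical (chosen only so that the gap between a and b is 5 time units), the function names are illustrative, and the sketch stops at computing the raw gaps and flagging those beyond the threshold, leaving the actual height calculation aside.

```python
from typing import List, Tuple


def sequence_track(n_comments: int, width: float = 1.0) -> List[Tuple[float, float]]:
    """Return (left, right) extents of equal-width elements placed side by side.

    Position depends only on arrival order, so elements never overlap,
    even when the underlying comments share a time point.
    """
    return [(i * width, (i + 1) * width) for i in range(n_comments)]


def successive_gaps(timestamps: List[float]) -> List[float]:
    """Return the time gap between each successive pair of comments."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def far_apart(gaps: List[float], threshold: float = 20.0) -> List[bool]:
    """Flag gaps above the threshold; these would be drawn at zero height."""
    return [gap > threshold for gap in gaps]


# Hypothetical time points for comments a-e (b and c share a time point).
times = [0, 5, 5, 12, 40]
track = sequence_track(len(times))
gaps = successive_gaps(times)
flags = far_apart(gaps)
```

With these assumed time points, the five elements occupy adjacent unit-width slots, the gaps are 5, 0, 7, and 28 time units, and only the last gap exceeds the 20-unit threshold.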
In some examples, the height point (square box in the curve 308 of the time density track 306) is calculated according to:
where time_density_height(i, j) represents the height of the point along the curve 308 that represents the relative time gap between comments i and j, timedist(i, j) represents the time distance between comments i and j, and avgtimedist represents the average time gap between successive pairs of comments. The parameter avgtimedist is a moving average, since avgtimedist changes as more comments are received.
More generally, the height can be based on a ratio between the time gap between successive comments i and j, and the average time gap (e.g., moving average time gap) of comments received so far.
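As a rough illustration of a ratio-based height, the Python sketch below uses one possible form. The specific expression, max(0, 1 - timedist / avgtimedist), is an assumption made for this example and should not be read as the calculation referred to above; it is chosen only because it yields high values for short gaps and zero for gaps at or above the average. The running average over all gaps seen so far stands in for the moving average, and the function name and sample time points are hypothetical.

```python
from typing import List


def time_density_heights(timestamps: List[float]) -> List[float]:
    """Return one height per successive pair of comments.

    ASSUMED form: height = max(0, 1 - timedist / avgtimedist), where
    avgtimedist is the running average of all gaps seen so far
    (including the current one).
    """
    ordered = sorted(timestamps)
    heights: List[float] = []
    total_gap = 0.0
    for count, (prev, curr) in enumerate(zip(ordered, ordered[1:]), start=1):
        timedist = curr - prev
        total_gap += timedist
        avgtimedist = total_gap / count  # changes as more comments arrive
        if avgtimedist == 0:
            # Every gap so far is zero, so treat the pair as maximally dense.
            heights.append(1.0)
        else:
            heights.append(max(0.0, 1.0 - timedist / avgtimedist))
    return heights


# Hypothetical time points: two comments at the same time, one shortly
# after, and one much later.
heights = time_density_heights([0, 10, 10, 12, 40])
```

Under this assumed form, the sample input produces heights of 0.0, 1.0, 0.5, and 0.0: the coincident pair reaches the maximum, the short gap lands in between, and the long gap drops to zero.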
As further shown in
The system can then search (at 404) for opinion words in the selected comments, and map the opinion words to the selected attribute. “Opinion words” refer to words in the selected comments that have some bearing on the selected attribute. Opinion words can be considered relevant to the selected attribute based on the proximity of the opinion words to the selected attribute. The opinion words may include negative opinion words, positive opinion words, or neutral opinion words. Based on the opinion words mapped to the selected attribute, each comment can be assigned a particular color to represent whether the comment is associated with negative, positive, or neutral feedback with respect to the selected attribute. Instead of, or in addition to, searching for opinion words to map to the selected attribute, the system can use a user rating (e.g., 1-5) that a comment includes regarding a particular attribute. Such ratings can be used for assigning colors to the graphical elements of the comment sequence track.
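A proximity-based mapping of this kind might be sketched in Python as follows. Everything specific here is an assumption for illustration: the tiny word lists, the three-word window used as the proximity test, the color names, the rating cutoffs, and the function names. The sketch also handles only single-word attributes.

```python
import re

# Hypothetical, deliberately tiny lexicons; a real system would need far more.
POSITIVE = {"great", "fast", "love", "good"}
NEGATIVE = {"slow", "broken", "bad", "terrible"}
# Hypothetical color assignment for the three feedback classes.
COLORS = {"negative": "red", "neutral": "gray", "positive": "green"}


def sentiment_for_attribute(comment: str, attribute: str, window: int = 3) -> str:
    """Classify a comment with respect to one single-word attribute.

    Only opinion words within `window` words of a mention of the
    attribute are counted (the assumed proximity test).
    """
    words = re.findall(r"[a-z']+", comment.lower())
    target = attribute.lower()
    score = 0
    for pos, word in enumerate(words):
        if word != target:
            continue
        nearby = words[max(0, pos - window): pos + window + 1]
        score += sum(w in POSITIVE for w in nearby)
        score -= sum(w in NEGATIVE for w in nearby)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def color_from_rating(rating: int) -> str:
    """Map a 1-5 rating to a color (assumed cutoffs: 1-2, 3, 4-5)."""
    if rating <= 2:
        return COLORS["negative"]
    if rating == 3:
        return COLORS["neutral"]
    return COLORS["positive"]


comment = "Great price, but the checkout page is broken"
price_color = COLORS[sentiment_for_attribute(comment, "price")]
checkout_color = COLORS[sentiment_for_attribute(comment, "checkout")]
```

In this example the same comment is colored differently depending on the selected attribute, because "great" falls within the window around "price" while "broken" falls within the window around "checkout".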
Next, the system calculates (at 406) time density heights for the time density track. For example, the heights of points along a curve (e.g., 214 or 308 in
The comment sequence track and time density track are then depicted (at 408) in respective visualizations (such as shown in
The computer 700 has a network interface 712 to communicate over the network 702. The network interface 712 is connected to a processor (or multiple processors) 714. A visual analysis module 716 is executable on the processor(s) 714 to perform the tasks of
The visual analysis module 716 can include machine-readable instructions that are loaded for execution on processor(s) 714. A processor can include a microprocessor, microcontroller, processor module or subsystem, programmable integrated circuit, programmable gate array, or another control or computing device.
Data and instructions are stored in respective storage devices, which are implemented as one or more computer-readable or machine-readable storage media. The storage media include different forms of memory including semiconductor memory devices such as dynamic or static random access memories (DRAMs or SRAMs), erasable and programmable read-only memories (EPROMs), electrically erasable and programmable read-only memories (EEPROMs) and flash memories; magnetic disks such as fixed, floppy and removable disks; other magnetic media including tape; optical media such as compact disks (CDs) or digital video disks (DVDs); or other types of storage devices. Note that the instructions discussed above can be provided on one computer-readable or machine-readable storage medium, or alternatively, can be provided on multiple computer-readable or machine-readable storage media distributed in a large system having possibly plural nodes. Such computer-readable or machine-readable storage medium or media is (are) considered to be part of an article (or article of manufacture). An article or article of manufacture can refer to any manufactured single component or multiple components.
In the foregoing description, numerous details are set forth to provide an understanding of the subject disclosed herein. However, implementations may be practiced without some or all of these details. Other implementations may include modifications and variations from the details discussed above. It is intended that the appended claims cover such modifications and variations.