This application relates to data display in a treemap format.
A treemap is a useful tool for representing hierarchical (e.g., tree-structured) data in a constrained space. A treemap typically consists of a group of two-dimensional cells (e.g., rectangles), each of which corresponds to a respective portion of the data (e.g., a branch of a tree). Each cell may further contain subdivisions (e.g., sub-rectangles) that represent data of lower hierarchy (e.g., a sub-branch or a leaf of the branch). A cell can have several dimensions (e.g., size and color) that are correlated with various characteristics of the data. For example, the size of a cell can correspond to one statistic about the data, whereas the color of a cell can correspond to another statistic.
One general aspect of the invention relates to a computer-implemented method of generating a treemap display. A collection of data elements characterized by a first attribute is accepted, and at least some of the data elements are grouped into a first set of data elements according to a first rule associated with the first attribute. A treemap field is partitioned into a collection of cells according to the grouping result, and the collection of cells includes a first cell representing the first set of data elements. The first cell has a first dimension corresponding to a value of the first attribute of the first set of data elements. The first set of data elements is then divided into a collection of subsets of data elements according to a second rule. Correspondingly, the first cell of the treemap field is partitioned into a collection of sub-cells according to the division result, each sub-cell representing a respective one of the subsets of data elements.
Some embodiments may include one or more of the following features.
The first dimension of the first cell, which corresponds to the value of the first attribute, includes cell size. In some examples where the collection of data elements is further characterized by a second attribute, the first cell is configured to have a second dimension corresponding to a value of the second attribute of the first set of data elements. The second dimension of the first cell may include color.
In some examples, the first set of data elements is associated with a first set of audio files. The area of the first cell may correspond to a volume of the first set of audio files, and the color of the first cell may correspond to an average time length of the first set of audio files.
In some examples, to group data elements into the first set of data elements, the collection of data elements is first ranked in descending order based on the value of the first attribute of each data element. A first data element is then compared with a second data element that immediately follows it in the ranking to determine whether the two data elements belong to the same group. In some further embodiments, the nth data element is iteratively compared with the (n+1)th data element that immediately follows it in the ranking to determine whether the nth and (n+1)th data elements belong to the same group.
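The ranking-and-comparison grouping described above can be sketched in Python as follows. This is a minimal illustration, not the method of the disclosure: the source does not state the rule used to decide whether two adjacent elements belong together, so the `within_ratio` rule (two neighbors share a group when the smaller value is at least a chosen fraction of the larger) and all function names are hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

def group_ranked(elements: Sequence[T], value: Callable[[T], float],
                 same_group: Callable[[T, T], bool]) -> List[List[T]]:
    """Rank elements by `value` (descending), then compare each nth element with
    the (n+1)th element that immediately follows it to decide group membership."""
    ranked = sorted(elements, key=value, reverse=True)
    groups: List[List[T]] = []
    for item in ranked:
        if groups and same_group(groups[-1][-1], item):
            groups[-1].append(item)   # same group as its predecessor in the ranking
        else:
            groups.append([item])     # start a new group
    return groups

def within_ratio(threshold: float = 0.5) -> Callable[[float, float], bool]:
    """Hypothetical rule: neighbors share a group when the smaller value is at
    least `threshold` times the larger one. The source does not specify a rule."""
    def rule(a: float, b: float) -> bool:
        hi, lo = max(a, b), min(a, b)
        return hi == 0 or lo / hi >= threshold
    return rule
```

For example, with sizes 101, 90, 40, 35, and 5 and a threshold of 0.5, the sketch yields the groups [101, 90], [40, 35], and [5].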
Another general aspect of the invention also relates to a computer-implemented method of generating a treemap display. A collection of data elements characterized by a first attribute and a second attribute is accepted, and a treemap field is partitioned into a collection of cells based on a rule. Each cell represents a respective group of data elements and has at least a first and a second dimension corresponding to the first and second attributes respectively.
Embodiments of this aspect may include one or more of the following features.
In some examples, each of the collection of cells is partitioned into a respective set of sub-cells based on the rule, and each sub-cell represents at least a sub-group of the group of data elements represented by one of the collection of cells. Each sub-cell has at least the first and second dimensions corresponding to the first and second attributes, respectively. In some further examples, each sub-cell is iteratively partitioned based on the rule.
In some examples, the collection of cells is first ranked in descending order by the first dimension, and the location of each cell on the treemap is determined based on its corresponding ranking.
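One way to turn a descending ranking into cell locations is sketched below, under the assumption that larger cells should land toward the upper left of the field. The recursive halving used here (split the ranked list where the leading part is closest to half the total, then cut the longer side of the rectangle) is a generic binary treemap layout chosen for illustration, not necessarily the layout of the disclosure; `Rect`, `layout`, and `_place` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

@dataclass
class Rect:
    x: float  # left edge
    y: float  # top edge (y grows downward, as in screen coordinates)
    w: float
    h: float

def layout(sizes: Sequence[float], field: Rect) -> List[Tuple[float, Rect]]:
    """Rank cell sizes in descending order and place them so that larger
    cells land toward the upper left of the field."""
    out: List[Tuple[float, Rect]] = []
    _place(sorted(sizes, reverse=True), field, out)
    return out

def _place(ranked: List[float], r: Rect, out: List[Tuple[float, Rect]]) -> None:
    if not ranked:
        return
    if len(ranked) == 1:
        out.append((ranked[0], r))
        return
    total = sum(ranked)
    # Split where the leading (larger) part of the ranking is closest to half the total.
    k, best, acc = 1, float("inf"), 0.0
    for i in range(1, len(ranked)):
        acc += ranked[i - 1]
        if abs(acc - total / 2) < best:
            k, best = i, abs(acc - total / 2)
    head, tail = ranked[:k], ranked[k:]
    frac = sum(head) / total if total else 0.5
    if r.w >= r.h:  # cut the longer side; the head part goes on the left
        _place(head, Rect(r.x, r.y, r.w * frac, r.h), out)
        _place(tail, Rect(r.x + r.w * frac, r.y, r.w * (1 - frac), r.h), out)
    else:           # the head part goes on top
        _place(head, Rect(r.x, r.y, r.w, r.h * frac), out)
        _place(tail, Rect(r.x, r.y + r.h * frac, r.w, r.h * (1 - frac)), out)
```

Because each split gives the head part exactly its share of the total, every cell's area stays proportional to its size while the highest-ranked cell occupies the top-left corner.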
The collection of data elements is represented in a tree-based structure. The tree-based structure may be generated based at least on one of the first and second attributes.
In some examples, the tree-based structure includes a first level of nodes each corresponding to one of the groups of data elements, and a second level of nodes each corresponding to one of the sub-groups of data elements. The first dimension of a cell may include size, and the second dimension may include color.
The collection of data elements may be associated with a library of audio files, for example, recordings of calls placed at a call center. Each of the first and second attributes of the data elements may include one or more of the following: a volume of a group of audio files having a common characteristic, an average handle time of the group of audio files, a median handle time of the group of audio files, a standard deviation of the individual handle times of the audio files in the group, and a customer satisfaction feedback score associated with the audio files.
A further aspect of the invention relates to a system for generating a treemap display representing a plurality of data elements. The system includes an interface for accepting a selection of display mode and for receiving a description of the plurality of data elements characterized by a first attribute and a second attribute; and a processor for partitioning a treemap field into a plurality of cells according to the selected display mode, where each cell represents a group of data elements. The processor is further configured for computing a size of each of the plurality of cells based on the first attribute, and computing a color of each of the plurality of cells based on the second attribute.
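As a minimal sketch, the candidate attributes listed above could be computed for one group of audio files with Python's standard `statistics` module. The input shape (a list of per-file handle times plus optional satisfaction scores) and the function name are assumptions; the population standard deviation (`pstdev`) is used here, although a sample standard deviation (`stdev`) would be an equally plausible choice.

```python
import statistics
from typing import Dict, Optional, Sequence

def group_attributes(
    handle_times: Sequence[float],
    satisfaction_scores: Sequence[float] = (),
) -> Dict[str, Optional[float]]:
    """Candidate cell attributes for one group of audio files.

    handle_times: handle time (e.g., in seconds) of each audio file in the group.
    satisfaction_scores: optional customer feedback scores tied to those files.
    """
    if not handle_times:
        raise ValueError("a group must contain at least one audio file")
    return {
        "volume": len(handle_times),
        "average_handle_time": statistics.mean(handle_times),
        "median_handle_time": statistics.median(handle_times),
        # population standard deviation of the individual handle times
        "stdev_handle_time": statistics.pstdev(handle_times),
        "mean_satisfaction": (
            statistics.mean(satisfaction_scores) if satisfaction_scores else None
        ),
    }
```

Any two of these values could then be assigned to a cell's size and color.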
Embodiments may include one or more of the following features.
In some examples, the mode of display includes one or more of the following: a “global” mode, a “selected sessions” mode, a “sessions only” mode, a “per session” mode, and an “animated” mode.
The processor may be further configured for grouping at least some of the plurality of data elements into a first set of data elements according to a first rule associated with the first attribute; partitioning the treemap field into a plurality of cells according to the grouping result, the plurality of cells including a first cell representing the first set of data elements, the first cell having a size corresponding to a value of the first attribute of the first set of data elements; dividing the first set of data elements into a plurality of subsets of data elements according to a second rule; and partitioning the first cell of the treemap field into a plurality of sub-cells according to the division, each sub-cell representing a respective one of the plurality of subsets of data elements. In some examples, the processor is also configured for re-partitioning the treemap field in response to a modification in the selection of display mode.
Embodiments of various aspects may include one or more of the following advantages.
One application of a treemap is in audio analytics, including, for example, tracking audio sessions and query results for a call center. By visualizing calls that match specific queries in grouped regions of the map, where each region has multiple dimensions representing various statistics of the calls, areas of concern to a call center may be quickly identified in a visually intuitive way. In some embodiments, the area of each group may correspond to the call volume (number of calls or audio files) and the color of the group may correspond to the average handle time of the calls matching a specific query. Multiple “sessions”, each containing distinct audio files (calls) and queries, can also be simultaneously displayed in a treemap. In some circumstances (for example, where a new session is defined on a subset of an old session), different sessions may also contain a common set of audio files.
As the number of sessions, queries, and calls tracked by a call center analytics system increases, there is an increasing challenge to identify relevant data and to navigate the data to retrieve useful information. The treemap tool described in this application can help both by visually identifying sets of calls that differ from the norm (or from the past) in handle time or in call volume, and by providing a landing page from which a user may dive into the details of specific sessions, queries, and calls that are representative of a trend of interest.
Other features and advantages of the invention are apparent from the following description, and from the claims.
Referring to
Each region of the treemap 100 includes multiple cells that are also shown as rectangles. For example, region 110 contains cells 112, 114, 116, and 118. A cell can represent data at the smallest unit of granularity or, alternatively, further include a set of sub-cells. Each cell has at least three dimensions: its size, its color, and the region to which it belongs. Each of the three dimensions can be used to represent a respective characteristic of the data that the cell represents, as described in greater detail below.
By encoding various aspects of data characteristics in a compact and organized display, the treemap 100 provides a visual representation of a high-level overview that allows users to navigate through the data and obtain information for identifying particular areas of interest. Such a treemap layout can be useful in many applications, including, for example, display of stock prices, photo albums, and distributed networks. The following embodiments will be described primarily in the context of audio analytics.
In this example, treemap 200 provides an effective visual representation that conveys condensed information to viewers. Here, each one of regions 210, 220, 230, and 240 represents a set of audio files that belongs to a particular session. For example, region 210 represents audio files that belong to the session “technical support.” Within this session, files are further organized into subdivisions, for example, according to the results of queries. For instance, cell 212 may represent files that match the query “Is this call about an installation failure?” in the “technical support” session.
The color and size of each subdivision are chosen to represent characteristics of the calls associated with that subdivision. Depending on the implementation, the color of cell 212 may be rendered in RGB color or in grey scale to represent the average handle time of the calls associated with this cell, and the size may represent the call volume (e.g., the number of audio files). Other characteristics that can be represented by either color or size include the median handle time of the calls, the standard deviation of the individual handle times of the calls in this subdivision, the number of transfers, the number of hits for a specific query, customer satisfaction feedback scores, and so on.
Referring to
Referring to
Subsequently, in step 358, the grouping unit 324 uses a recursive algorithm to group nodes into sets and determines the locations and dimensions of the individual rectangular cells that will be used to represent the sets and nodes. Based on the grouping result, the area computation unit 326 and the color computation unit 328 determine the area and color of each cell in steps 360 and 362, respectively. The treemap generation unit 320 then generates the treemap in step 364 and displays the treemap in the selected mode in step 366.
Depending on the implementation, there are various approaches to grouping nodes in a tree structure and mapping them to cells on a treemap so as to improve the readability of the treemap. One approach, for example, places groups of nodes of larger size in the upper-left sections of the map and groups of smaller size in the lower-right sections. Described below is a two-stage recursive algorithm developed for this approach.
Referring to
Here, a node refers to a unit of subdivision, for example, an ingest session, a structured query, or a search term. A node can therefore be a session node, a query node, a search term node, or another type of node. The size of a node refers to the call volume (e.g., the number of audio files) associated with the node. For example, a query node “installation failure” having a size of 101 indicates that 101 audio files match the query “installation failure.” In some implementations, a query node is specifically tied to the session in which it is situated. That is, a query node “installation failure” in the session “technical support” differs from a query node “installation failure” in the session “agent behavior.” The size of a query node therefore represents the call volume directed to that query within a specific session.
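The session-specific keying of query nodes can be illustrated with a small sketch that builds `{session: {query: size}}` from per-file match results. The record layout `(file_id, session, matched queries)` and the function name are hypothetical; the point shown is that a node is identified by the pair (session, query), so the same query text in two sessions produces two distinct nodes with independent sizes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical input record: (file_id, session name, queries the file matched)
Record = Tuple[str, str, Iterable[str]]

def build_query_nodes(records: Iterable[Record]) -> Dict[str, Dict[str, int]]:
    """Return {session: {query: size}}, where size is the number of audio files
    in that session matching that query."""
    hits: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for file_id, session, queries in records:
        for query in queries:
            # keyed by (session, query): the same query in another session is another node
            hits[(session, query)].add(file_id)  # a set counts each file once per node
    tree: Dict[str, Dict[str, int]] = defaultdict(dict)
    for (session, query), files in hits.items():
        tree[session][query] = len(files)
    return dict(tree)
```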
When there are no more subgroups that have more than two nodes, the node grouping process completes.
Once the cells of the session nodes are drawn in the map, in step 522, the child nodes of each session node (e.g., query nodes) are ranked in descending order of size. In steps 524 and 526, the children of each session node are grouped into subsets, again using the two-stage recursive algorithm. In steps 528 and 530, cells for each child node are then created in a manner similar to that described above for the session nodes.
As described earlier, each node of a data tree can be represented as a rectangular cell in a treemap. The color of a cell can be used to represent one or more characteristics of the corresponding node. In one embodiment, the color of each rectangle corresponds to the average handle time (AHT) of the calls associated with the node (or the average duration of the audio files). In this embodiment, there are three “anchor colors,” corresponding to an “average AHT,” a “maximum AHT,” and a “minimum AHT.” When an RGB color model is used, these three anchor colors can be, for example, blue, red, and green, respectively. Depending on whether the AHT of an individual rectangle is longer or shorter than an “overall average” value AHTavg (e.g., the overall average length of the calls in this session, or the overall average length of the calls in the entire data set), the color of the rectangle is computed by fading between the “maximum AHT” color and the “average AHT” color (when AHT > AHTavg), or by fading between the “minimum AHT” color and the “average AHT” color (when AHT < AHTavg). This fading can be done in a logarithmic or a linear fashion, for example, by applying a fading function.
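The anchor-color fading can be sketched as follows, using the example anchors above (blue for average, red for maximum, and green for minimum AHT). The linear and logarithmic fraction formulas, the clamping of values beyond the extremes, and the function names are assumptions made for illustration; the logarithmic variant assumes all handle times are positive.

```python
import math
from typing import Tuple

RGB = Tuple[int, int, int]
AVG_COLOR: RGB = (0, 0, 255)  # blue: "average AHT" anchor
MAX_COLOR: RGB = (255, 0, 0)  # red: "maximum AHT" anchor
MIN_COLOR: RGB = (0, 255, 0)  # green: "minimum AHT" anchor

def _fade(a: RGB, b: RGB, t: float) -> RGB:
    """Linearly interpolate each channel; t=0 gives a, t=1 gives b."""
    r, g, bl = (round(ca + (cb - ca) * t) for ca, cb in zip(a, b))
    return (r, g, bl)

def aht_color(aht: float, aht_avg: float, aht_min: float, aht_max: float,
              logarithmic: bool = False) -> RGB:
    """Color for a cell whose calls have average handle time `aht`."""
    # Longer than average fades toward red; shorter fades toward green.
    anchor, extreme = (MAX_COLOR, aht_max) if aht >= aht_avg else (MIN_COLOR, aht_min)
    if extreme == aht_avg:
        return AVG_COLOR
    if logarithmic:
        t = math.log(aht / aht_avg) / math.log(extreme / aht_avg)
    else:
        t = (aht - aht_avg) / (extreme - aht_avg)
    # clamp so values beyond the extremes keep the anchor color
    return _fade(AVG_COLOR, anchor, min(max(t, 0.0), 1.0))
```

Because the reference value is passed in as `aht_avg`, the same function could be called with a manually chosen reference time rather than the measured average.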
The area of a cell in a treemap can also be used to represent one or multiple characteristics of the node associated with the cell. In the embodiment described with reference to
In some implementations, the areas of cells within a single session are drawn to the same scale, for example, based on the corresponding call volume of each cell. The areas of cells that represent sessions, however, are scaled relative to the number of calls in the session that have at least one structured query or search term hit, rather than the total call volume in the session. Therefore, the area of a cell in one session may not be directly comparable, in terms of call volume, to the area of a cell in a different session.
Note that variations of the above-described processes of node grouping, color computation, and area computation may be implemented depending on the specific display mode selected by a user or a computer program. Each mode may present the user with selected portions of the data at the desired level of detail. By switching between display modes, users can navigate through the data and identify areas of interest for further study. For illustrative purposes, the following section describes four exemplary display modes.
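A minimal sketch of the two-level scaling described above, assuming that sessions share the treemap field in proportion to their counts of calls with at least one hit, and that query cells then divide their session's area in proportion to their own call volumes. The function names and the proportional split inside a session are illustrative assumptions.

```python
from typing import Dict

def session_areas(hit_calls: Dict[str, int], field_area: float) -> Dict[str, float]:
    """Area of each session cell, proportional to its number of calls with at least one hit."""
    total = sum(hit_calls.values())
    return {s: field_area * n / total if total else 0.0 for s, n in hit_calls.items()}

def query_areas(session_area: float, query_volumes: Dict[str, int]) -> Dict[str, float]:
    """Divide one session cell among its query cells in proportion to call volume,
    so every query cell inside the session is drawn to the same scale."""
    total = sum(query_volumes.values())
    return {q: session_area * v / total if total else 0.0 for q, v in query_volumes.items()}
```

With sessions A (30 calls with hits) and B (10) on a field of area 100, session A receives area 75; if A's query cells have volumes 50 and 25, they receive areas 50 and 25. A 50-call query cell in session B could receive at most B's area of 25, which is why cell areas are not directly comparable across sessions.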
In the “Global” mode, the entire collection of sessions or a group of selected sessions is displayed in the treemap as rectangles bordered by white lines. Within each session area, query nodes are displayed as rectangles of different colors.
In the “Selected Sessions” mode, a group of selected sessions is displayed in the treemap as rectangles bordered by white lines. Within each session area, query nodes are displayed as rectangles of different colors.
In the “Sessions only” mode, the treemap shows only the session divisions of audio files, not the individual queries. Therefore, no query cells are shown within each session cell.
In the “Per Session” mode, a group of selected sessions is displayed in the treemap as rectangles bordered by white lines. Within each session area, query nodes are displayed as rectangles of different colors.
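The difference between the first three modes can be sketched as a filter applied to the session/query tree before layout. The tree shape and the mode strings are hypothetical, and the “Per Session” mode is omitted because the description above does not distinguish it from the “Selected Sessions” mode.

```python
from typing import Dict, Iterable, Optional

# Hypothetical tree shape: {session name: {query name: call volume}}
Tree = Dict[str, Dict[str, int]]

def view_for_mode(tree: Tree, mode: str, selected: Optional[Iterable[str]] = None) -> Tree:
    """Return the portion of the tree that a given display mode would draw."""
    if mode == "global":
        # every session, each subdivided into its query cells
        return {s: dict(q) for s, q in tree.items()}
    if mode == "selected sessions":
        # only the chosen sessions, still subdivided into query cells
        keep = set(selected or ())
        return {s: dict(q) for s, q in tree.items() if s in keep}
    if mode == "sessions only":
        # session cells are kept, but no query cells are drawn inside them
        return {s: {} for s in tree}
    raise ValueError(f"unsupported display mode: {mode!r}")
```

Re-partitioning the field after a mode change would then amount to calling the filter again and re-running grouping, area, and color computation on its result.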
In addition to the four examples described above, many other display modes are suitable for use. For example, the treemap may be configured in an animated mode, such that the color and size of each cell change as calls are tracked and filtered against a specific time frame or against other metadata, such as “call center.” In such an animated view, it may be possible to see a specific type of call grow as a proportion of the total calls over time, or to watch the average handle time of a specific type of call change over time, indicating, for example, that agents have become more proficient in handling that type of call. In fact, the ability to display a treemap on such filtered data has been implemented, although without animation or automated sequencing of views.
In addition to the area computation approach described in section 4, there are several alternative approaches for computing the areas of rectangular cells in a treemap. For example, the area of a session cell can be determined based on the number of audio files in the session, regardless of whether the files contain hits (i.e., match queries). In a second example, the area of a session cell can be determined based on the number of audio files with hits, counting each file only once even if it has multiple hits. In a third example, the area of a session cell can be determined based on the number of hits on the audio files (that is, a file with multiple hits is counted multiple times).
There are also alternative approaches to node grouping. For example, the grouping unit 324 may use a single-stage recursive algorithm that divides the nodes into two groups such that the ratio of the sums of the node sizes in the two groups equals the square of the golden ratio (approximately 2.618). To find the correct division of nodes, multiple intermediate sums are computed and compared. Although this algorithm may be slower, it is likely to make better choices in the presence of unusual or worst-case data.
In each of the display modes, the AHTavg value used for color computation can also be set manually to a time chosen by the user (e.g., artificially raised or lowered), so that the colors of the cells are computed with reference to a desired average handle time rather than the actual average handle time.
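The time-frame filtering underlying an animated view can be sketched as bucketing calls into fixed windows and computing the per-query volume and average handle time for each window; successive snapshots would then drive the cell sizes and colors. The call record layout, the fixed-window scheme, and the function name are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

# Hypothetical call record: (call start time, matched query, handle time in seconds)
Call = Tuple[datetime, str, float]

def frames(calls: Iterable[Call], start: datetime, window: timedelta,
           count: int) -> List[Dict[str, Tuple[int, float]]]:
    """One snapshot per time window: {query: (call volume, average handle time)}."""
    buckets: List[Dict[str, List[float]]] = [defaultdict(list) for _ in range(count)]
    for when, query, handle_time in calls:
        if when < start:
            continue                      # before the first window
        index = (when - start) // window  # timedelta // timedelta gives an int
        if index < count:                 # ignore calls after the last window
            buckets[index][query].append(handle_time)
    return [{q: (len(ts), sum(ts) / len(ts)) for q, ts in b.items()} for b in buckets]
```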
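The three alternative session sizes can be sketched side by side, assuming each audio file in a session is mapped to the list of queries it matched. The mode names and the input shape are hypothetical.

```python
from typing import Dict, List

def session_size(file_hits: Dict[str, List[str]], mode: str) -> int:
    """Size of a session cell under one of three counting rules.

    file_hits maps each audio file in the session to the queries it matched.
    """
    if mode == "all_files":        # every file, with or without hits
        return len(file_hits)
    if mode == "files_with_hits":  # each file with any hit counts once
        return sum(1 for hits in file_hits.values() if hits)
    if mode == "hits":             # each hit counts, so multi-hit files count repeatedly
        return sum(len(hits) for hits in file_hits.values())
    raise ValueError(f"unknown counting mode: {mode!r}")
```

For a session whose three files match no query, one query, and two queries, the three rules give sizes of 3, 2, and 3, respectively.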
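The golden-ratio variant can be sketched as follows: rank the node sizes in descending order, accumulate the intermediate (prefix) sums, and split at the index where the ratio of the leading sum to the remainder is closest to φ² ≈ 2.618, then recurse on each part. Choosing the closest achievable ratio (an exact match is rarely possible) and the nested-list output are assumptions; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Union

PHI_SQUARED = ((1 + math.sqrt(5)) / 2) ** 2  # square of the golden ratio, about 2.618

Nested = Union[float, List["Nested"]]

def golden_split(ranked: Sequence[float]) -> int:
    """Index k that makes sum(ranked[:k]) / sum(ranked[k:]) closest to phi squared.
    Expects at least two non-negative sizes in descending order."""
    total = sum(ranked)
    best_k, best_err, prefix = 1, math.inf, 0.0
    for k in range(1, len(ranked)):
        prefix += ranked[k - 1]          # intermediate (prefix) sum
        rest = total - prefix
        if rest <= 0:                    # nothing left to balance against
            break
        err = abs(prefix / rest - PHI_SQUARED)
        if err < best_err:
            best_k, best_err = k, err
    return best_k

def golden_tree(sizes: Sequence[float]) -> Nested:
    """Recursively split descending-ranked sizes into two groups per level."""
    ranked = sorted(sizes, reverse=True)
    if not ranked:
        raise ValueError("no nodes to group")
    if len(ranked) == 1:
        return ranked[0]
    k = golden_split(ranked)
    return [golden_tree(ranked[:k]), golden_tree(ranked[k:])]
```

For example, for sizes 50, 22, 10, 8, 6, and 4 the split falls after the first two nodes, since 72/28 ≈ 2.57 is the closest available ratio to φ².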
Although the treemaps described above and shown in
The techniques described herein can be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or in combinations of them. The techniques can be implemented as a computer program product, i.e., a computer program tangibly embodied in an information carrier, e.g., in a machine-readable storage device or in a propagated signal, for execution by, or to control the operation of, data processing apparatus, e.g., a programmable processor, a computer, or multiple computers. A computer program can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program can be deployed to be executed on one computer or on multiple computers at one site or distributed across multiple sites and interconnected by a communication network.
Method steps of the techniques described herein can be performed by one or more programmable processors executing a computer program to perform functions of the invention by operating on input data and generating output. Method steps can also be performed by, and apparatus of the invention can be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit). Modules can refer to portions of the computer program and/or the processor/special circuitry that implements that functionality.
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. Information carriers suitable for embodying computer program instructions and data include all forms of non-volatile memory, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
To provide for interaction with a user, the techniques described herein can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer (e.g., interact with a user interface element, for example, by clicking a button on such a pointing device). Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input.
The techniques described herein can be implemented in a distributed computing system that includes a back-end component, e.g., as a data server, and/or a middleware component, e.g., an application server, and/or a front-end component, e.g., a client computer having a graphical user interface and/or a Web browser through which a user can interact with an implementation of the invention, or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (“LAN”) and a wide area network (“WAN”), e.g., the Internet, and include both wired and wireless networks.
The computing system can include clients and servers. A client and server are generally remote from each other and typically interact over a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It is to be understood that the foregoing description is intended to illustrate and not to limit the scope of the invention.
This application claims the benefit of U.S. Provisional Application No. 61/089,265 filed Aug. 15, 2008, the contents of which are incorporated herein in their entirety.
Number | Date | Country
---|---|---
61089265 | Aug 2008 | US