Analyzing software users with instrumentation data and user group modeling and analysis

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application is related to the following copending U.S. patent applications, assigned to the assignee of the present application, filed concurrently herewith and hereby incorporated by reference:

Software Reliability Analysis Using Alerts, Asserts, and User Interface Controls, U.S. Pat. No. 7,681,085;

Multidimensional Analysis Tool for High Dimensional Data, U.S. patent application Ser. No. 11/818,607;

Efficient Data Infrastructure for High Dimensional Data Analysis, U.S. patent application Ser. No. 11/818,879;

Software Feature Usage Analysis and Reporting, U.S. patent application Ser. No. 11/818,600;

Software Feature Modeling and Recognition, U.S. Pat. No. 7,680,645; and

Analyzing Software Usage with Instrumentation Data, U.S. patent application Ser. No. 11/818,611.

BACKGROUND

Understanding the way in which software users use software can be very valuable when working to improve the effectiveness and ease of use of software applications. Traditional ways to analyze software users include usability studies, user interviews, user surveys and the like.

Various data can be collected during actual software usage to obtain information related to how users use and otherwise interact with a software program. However, analyzing that data to obtain useful information about the users, including how to model and analyze a specific group of users, is a difficult problem.

SUMMARY

This Summary is provided to introduce a selection of representative concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used in any way that would limit the scope of the claimed subject matter.

Briefly, various aspects of the subject matter described herein are directed towards a technology by which software instrumentation data collected from user sessions corresponding to one or more programs is analyzed, including by determining program usage metrics and/or command usage metrics. Information representative of the program usage metrics and/or the command usage metrics is output, such as in the form of a report. The software instrumentation data may be further analyzed, such as to determine at least one usage trend over time, and to determine user groups.

Examples of program usage metrics include session count information based on a number of application sessions, session frequency information based on a time measurement between sessions, running time information based on session time, session length information based on session time and session count, and/or depth of usage information based on a percentage of commands used. Examples of command usage metrics include user count information based on a number of users of the set who use the selected command, percentage of users information corresponding to a percentage of users of the set who use the selected command, session count information based on a number of sessions in which the selected command occurred, percentage of session information corresponding to a percentage of application sessions in which the selected command was used, click count information corresponding to a number of clicks corresponding to the selected command, percentage of click count information corresponding to a percentage of program clicks corresponding to the selected command, click count per user information based on click count and user count of the selected command, and/or click count per session information corresponding to a click count per session.

The software instrumentation data may be analyzed to determine at least one type of user, and for modeling a user group. For example, users may be categorized by their depth of usage, and/or by the types of activities in which they engage. Potential outliers may be identified based on command usage that is significantly different from the command usage of other users. Users may be located from their sessions based on session criterion comprising a dimension and a value for that dimension, where each dimension comprises a variable recorded in a session, a feature, or results computed from a plurality of variables.

A subset of sessions that meet specified session criteria based on a set of session data may be located, along with a subset of users based on users whose sessions meet specified user criteria. The subsets may be combined via Boolean logic to produce a result set.

Other advantages may become apparent from the following detailed description when taken in conjunction with the drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is illustrated by way of example and not limited in the accompanying figures in which like reference numerals indicate similar elements and in which:

FIG. 1 shows an example representation of recording software instrumentation data for subsequent analysis.

FIG. 2 shows a representation in a table format of example software instrumentation (e.g., software quality metrics) data saved for various program usage sessions by users of a suite of application programs.

FIG. 3 shows an example representation of an analyzer for analyzing software instrumentation data and a mechanism for modeling groups of users based on the software instrumentation data.

FIG. 4 shows a representation of various example concepts related to analyzing software instrumentation data and/or modeling groups of users.

FIG. 5 is a flow diagram representing various example concepts related to analyzing software instrumentation data and/or modeling groups of users.

FIG. 6 is an illustrative example of a general-purpose computing environment into which various aspects of the present invention may be incorporated.

DETAILED DESCRIPTION

Various aspects of the technology described herein are generally directed towards analyzing software usage and software users, such as for the purpose of improving software products such as application programs, and improving the user experience with those software products. To this end, as described below, various aspects are directed towards collecting and analyzing various application-related usage data, referred to as software instrumentation data, in an attempt to understand the usage of an application program, including concepts such as how long, how much, how often and how extensively users use the application, the use of commands by users, and/or usage trends over time.

In other aspects, the software instrumentation data includes information about the types of users that use a program, and helps to define one or more groups of users. A user interface may be provided to help define and model a user group, along with an example language to model a user group and example ways to analyze a user group. As will be understood, the use of user groups provides mechanisms for software feature usage analysis and application usage analysis.

For purposes of understanding, the technology is described herein by use of examples, including those that operate in various environments, such as internal users (e.g., corresponding to employees of the enterprise that is analyzing the software) and external users. Further, the programs exemplified herein are generally a suite of application programs such as those provided as part of the Microsoft® Office software product suite. However, as will be understood, these are only non-limiting examples, and the technology is applicable to different user environments and different software products, including individual application programs and operating system components.

As such, the present invention is not limited to any particular embodiments, aspects, concepts, structures, functionalities or examples described herein. Rather, any of the embodiments, aspects, concepts, structures, functionalities or examples described herein are non-limiting, and the present invention may be used in various ways that provide benefits and advantages in computing in general.

Turning to FIG. 1, there is shown a mechanism for collecting software instrumentation data 102, including a session recorder 104 that collects various data from one or more application instances 106 corresponding to various users 108₁-108ₙ, where n represents any practical number of users. The session recorder 104 may be per application instance/user, or may be a mechanism such as an agent on each computing device of a network that communicates with at least one data collection server component running on a network server or the like. One or more various mechanisms 110 allow a test operator or the like to set collection parameters, such as from which program or programs to collect the instrumentation data, from which users to collect data, how long a collection session should last (if the program is not ended by the user within that time limit) and so forth. Note that the mechanisms 110 represent any mechanisms that may be used at various points during which collection parameters can be set, e.g., during development of the session recorder, or as (or after) the instrumentation data is collected on the users' client machine and/or uploaded to servers.

In general, the instrumentation data 102 comprise data collected from each user session, where a session corresponds to actual usage by a user of an executing program. A typical session starts from the application start (e.g., by double clicking on the application executable or a document that launches the application executable, or by choosing the application from a start menu), and ends when the application is closed (e.g., by choosing “Exit” in the application or closing the application window). Sessions can also be time limited, e.g., if a session exceeds twenty-four hours, the session is ended and the instrumentation data to that point recorded (the application continues to run). Sessions can also end by non-normal termination of a program, e.g., because of program or system crashes.

FIG. 2 provides an example of one type of software instrumentation data 102, with some of the data (arranged in columns) collected for some number of sessions (arranged in rows); it is equivalent to have the sessions be in the columns and the rows represent the data. In one example implementation, each session is associated with some or all of the information shown in FIG. 2, including a session ID, a user ID, and an application name. Other information that is typically recorded includes the application version, a start time, an end time, the commands used during the session and still other data, such as the number of files opened and so forth. Note that in actual implementations, the instrumentation data 102 may be many thousands of dimensions.
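By way of example, and not limitation, one session row of the kind described above might be sketched as a simple record. The field names, the `exit_type` values and the `length_seconds` helper are hypothetical and chosen only for illustration; an actual implementation may record many thousands of dimensions per session.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class Session:
    """One row of instrumentation data (all field names are illustrative)."""
    session_id: str
    user_id: str
    application: str
    version: str
    start: datetime
    end: datetime
    # Command stream of the session, in the order the commands were used.
    commands: List[str] = field(default_factory=list)
    # Hypothetical values: "normal", "crash" or "hang".
    exit_type: str = "normal"

    @property
    def length_seconds(self) -> float:
        """Session time, derived from the recorded start and end times."""
        return (self.end - self.start).total_seconds()


example = Session(
    session_id="s1",
    user_id="u1",
    application="Word",
    version="12.0",
    start=datetime(2007, 1, 1, 9, 0),
    end=datetime(2007, 1, 1, 9, 30),
    commands=["Copy", "Paste", "Bold"],
)
```

Derived values such as session length are computed from recorded variables rather than stored, which mirrors the distinction drawn herein between recorded and calculated dimensions.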

To analyze software product usage, the software instrumentation data 102 is processed, such as to measure the overall usage of an application by a group of users. FIG. 3 shows example analysis components, including an analyzer 330 that generates a report 332 from the instrumentation data 102. Note that the data may be first pre-processed into any suitable data structure or set of data structures, such as described in the aforementioned U.S. patent applications entitled “Multidimensional Analysis Tool for High Dimensional Data” and “Efficient Data Infrastructure for High Dimensional Data Analysis.” Further, the data may be accessed via a client/service architecture, such as described in the aforementioned U.S. patent application entitled “Analyzing Software Usage with Instrumentation Data,” in which a data infrastructure system manages the data for the analysis and provides access to the data via APIs.

A user interface 310 (which may or may not be associated with, or be the same as, the mechanisms 110 of FIG. 1) establishes the parameters, the criteria and the metrics that are used to analyze the instrumentation data 102. Also shown in FIG. 3 is a group modeling mechanism 334.

In one implementation, the metrics set forth in the table below may be used for generating at least some of the report 332 with respect to the usage of an application:

Average session count: Average number of application sessions by these users. This indicates on average, how much the users have been using the application during a time period. One suitable calculation is: get the session count (total number of application sessions) of each user, and average across the users in the group.

Average session frequency: Average time between consecutive sessions by users. This indicates on average, how frequent the users use the application. One suitable calculation is: get the session elapse time (the time between the end of the last session and the end of the first session) of each user, get the session frequency (session elapse time divided by session count) of each user, average across the users.

Average total running time: Average total session time by users. This is another indication of on average, how much the analysis users have been using the application. One suitable calculation is: get the total running time (sum of the session time) of each user, average across the users.

Average session length: Average session time by users. This indicates on average, how much time users spend in each session using the application. One suitable calculation is: get the average session length (total running time divided by session count) of each user, average across the users.

Average depth of usage: Average percentage of total commands of the application used by users. This indicates how deep users use the application. One suitable calculation is: get the depth of usage (percentage of total application commands used by the user, where total application commands is the total number of distinct commands found in the command stream of the application so far, and used by the user is defined as the command found at least once in the command stream of the user) of each user, average across the users. For example, users can be characterized as beginner, intermediate, advanced and expert users depending on their depth of usage, or some other levels may be used.

A distribution of the above measures can also be obtained by counting how many or what percentage of users have values that fall within an interval.
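The calculations described in the above table may be sketched as follows. This is a minimal illustration only: the tuple layout of `SESSIONS`, the use of hours as the time unit and the function name `usage_metrics` are assumptions, not part of the described system.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sessions: (user_id, start_hour, end_hour, commands used).
SESSIONS = [
    ("u1", 0.0, 1.0, ["Copy", "Paste"]),
    ("u1", 24.0, 26.0, ["Copy", "Bold"]),
    ("u2", 0.0, 3.0, ["Print"]),
]


def usage_metrics(sessions):
    """Return (per-user metrics, averages across the users in the group)."""
    by_user = defaultdict(list)
    all_commands = set()  # distinct commands seen in the application so far
    for user, start, end, commands in sessions:
        by_user[user].append((start, end, commands))
        all_commands.update(commands)
    if not by_user:
        return {}, {}

    per_user = {}
    for user, rows in by_user.items():
        count = len(rows)
        ends = [end for _, end, _ in rows]
        # Session elapse time: end of the last session minus end of the first.
        elapse = max(ends) - min(ends)
        total = sum(end - start for start, end, _ in rows)
        used = {c for _, _, commands in rows for c in commands}
        per_user[user] = {
            "session_count": count,
            "session_frequency": elapse / count,
            "total_running_time": total,
            "session_length": total / count,
            "depth_of_usage": 100.0 * len(used) / (len(all_commands) or 1),
        }

    names = next(iter(per_user.values())).keys()
    averages = {n: mean(m[n] for m in per_user.values()) for n in names}
    return per_user, averages


per_user, averages = usage_metrics(SESSIONS)
```

With the sample data, user u1 has used three of the four distinct commands, so that user's depth of usage is 75 percent, and the average depth of usage across the two users is 50 percent.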

As part of the analysis processing and report 332 generation, the source of users may be specified. Some example user sources include all users from whom instrumentation data have been collected (All), users who are external customers and not internal employees of the company performing the analysis (External), users who are employees of the company performing the analysis (Internal), users who are from a particular group that the company performing the analysis has set up from which to collect data (e.g., a Study ID such as the beta participants of the next release of a software product), or another customized group.

In general, any filtering, grouping and sorting may be used in the processing of the instrumentation data; for example, a particular application and/or version for which the analysis is being conducted may be specified. The user interface 310 may be designed to help an operator filter, group and/or sort the data as desired, as well as to determine how the output should look and what results should be computed.

A typical example analysis report 332 summarizes the type of analysis performed and the parameters used (e.g., data source, program, build or version, time period of collection, user source, filtering criteria, user count and so forth). A summary section may show the metrics including average session count, average session frequency, average total running time, average session length, and average depth of usage.

FIG. 4 represents example types of analyses that may be performed by the analyzer 330, as well as group modeling concepts. For example, the above-described application usage analysis is represented by the block labeled 440. Alternatively, or in addition to usage analysis, the usage of commands (block 442) of an application by a group of users may be measured, for example with the metrics for each command set forth in the following table:

User count: Number of users who use this command.

Percentage of users: Percentage of application users who use this command.

Session count: Number of sessions in which this command occurred.

Percentage of sessions: Percentage of application sessions in which this command occurred.

Click count: Number of clicks that are of this command.

Percentage of click count: Percentage of application clicks that are of this command.

Click count per user: The ratio of click count of this command and user count. This shows on average, how many times a user uses a command.

Click count per session: The ratio of click count of this command and session count. This shows on average, how many times a command occurs in a session.

In this example, the application users, application sessions and application clicks described above refer to the total number of users, sessions and command clicks of the application for which command usage analysis is being performed. The total number of sessions of an application is the total number of sessions in which the application name (or other suitable identifier) that was recorded is the application of interest. The total number of users of an application is the total number of unique user identifiers (IDs) of the sessions of the application. The total number of command clicks of an application is the total number of command clicks in all the sessions of the application. Note that the application and version for which the analysis is being conducted, and the source of users, can be specified by the analyzer operator.
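As a non-limiting sketch, the command usage metrics and the application totals described above might be computed as follows. The session layout and the names `SESSIONS` and `command_metrics` are assumed for illustration only.

```python
# Hypothetical sessions of one application: (session_id, user_id, clicks).
SESSIONS = [
    ("s1", "u1", ["Copy", "Paste", "Copy"]),
    ("s2", "u1", ["Bold"]),
    ("s3", "u2", ["Copy"]),
    ("s4", "u3", ["Print", "Print"]),
]


def command_metrics(sessions, command):
    """Compute the per-command metrics for one command of the application."""
    # Application totals: unique user IDs, sessions and command clicks.
    total_users = len({user for _, user, _ in sessions})
    total_sessions = len(sessions)
    total_clicks = sum(len(clicks) for _, _, clicks in sessions)

    users, session_count, click_count = set(), 0, 0
    for _, user, clicks in sessions:
        occurrences = clicks.count(command)
        if occurrences:
            users.add(user)
            session_count += 1
            click_count += occurrences

    def ratio(numerator, denominator, scale=1.0):
        # Guard against empty data rather than dividing by zero.
        return scale * numerator / denominator if denominator else 0.0

    return {
        "user_count": len(users),
        "pct_users": ratio(len(users), total_users, 100.0),
        "session_count": session_count,
        "pct_sessions": ratio(session_count, total_sessions, 100.0),
        "click_count": click_count,
        "pct_clicks": ratio(click_count, total_clicks, 100.0),
        "clicks_per_user": ratio(click_count, len(users)),
        "clicks_per_session": ratio(click_count, session_count),
    }


copy_metrics = command_metrics(SESSIONS, "Copy")
```

In the sample data the Copy command is used by two of three users, in two of four sessions, and accounts for three of seven clicks.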

Another aspect with respect to analysis is referred to as trend analysis 444. More particularly, given the time information in the recorded instrumentation data, the trend of using an application may be measured, corresponding to the usage of an application over time. The application and version for which the analysis is being conducted and/or the source of users can be specified via the user interface 310. The trend data may be displayed as a table or a graph.

The period to analyze and the reporting interval may also be specified. The period to analyze can be an absolute period, e.g., the time period from a start date to an end date, or may be a relative period, e.g., each user's enrollment length, which is the time period from a user's first session to the last session. The reporting interval is the interval to report the measures, and for example may be monthly, weekly, daily, or any other suitable interval. Example measures may include:

User count: Total number of users using the build and application during a reporting interval.

Session count: Total number of sessions of the build and application during a reporting interval.

Session Count Per User: Total number of sessions of the build and application divided by the number of users using the build and application during a reporting interval.

Cumulative Session Count Per User: Total number of sessions of the build and application divided by the number of users using the build and application from the start time of the period to analyze to the end of each reporting interval.

Cumulative Running Time Per User: Total session length of the users using the build and application divided by the number of users from the start time of the period to analyze to the end of each reporting interval.
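The trend measures above may be illustrated with the following sketch, which buckets sessions into reporting intervals and keeps running totals for the cumulative measures. The data layout (a day index within the period to analyze and a session length in hours), the weekly default interval and the name `trend` are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical sessions: (user_id, day index in the period, length in hours).
SESSIONS = [
    ("u1", 0, 1.0),
    ("u2", 1, 2.0),
    ("u1", 8, 0.5),
    ("u1", 9, 1.5),
    ("u3", 15, 1.0),
]


def trend(sessions, interval_days=7):
    """Return one row of trend measures per reporting interval."""
    if not sessions:
        return []
    buckets = defaultdict(list)
    for user, day, length in sessions:
        buckets[day // interval_days].append((user, length))

    rows = []
    cum_users, cum_sessions, cum_time = set(), 0, 0.0
    for index in range(max(buckets) + 1):
        entries = buckets.get(index, [])
        users = {user for user, _ in entries}
        # Cumulative values run from the start of the period to analyze
        # to the end of the current reporting interval.
        cum_users |= users
        cum_sessions += len(entries)
        cum_time += sum(length for _, length in entries)
        rows.append({
            "interval": index,
            "user_count": len(users),
            "session_count": len(entries),
            "sessions_per_user": len(entries) / len(users) if users else 0.0,
            "cum_sessions_per_user":
                cum_sessions / len(cum_users) if cum_users else 0.0,
            "cum_running_time_per_user":
                cum_time / len(cum_users) if cum_users else 0.0,
        })
    return rows


rows = trend(SESSIONS)
```

The resulting rows may be displayed as a table or plotted as a graph, one point per reporting interval.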

One or more other types of analysis may be performed, as represented in FIG. 4 by the block 446.

Other measures are directed towards users, and are represented in FIG. 4 via the group modeling block 334. User categorization (block 450) refers to categorizing users of an application based on their usage of the application. As described above, one way to categorize users is by depth of usage, which is related to the percentage of total application commands used by the user. Depth of usage is a measure of how extensively users use an application, and users can be categorized based on their depth of usage. For example, users can be categorized as “beginners” if their depth of usage is less than some threshold such as three (3.0) percent; “intermediate” if their depth of usage is between some range such as three and eight (3.0-8.0) percent; “advanced” if their depth of usage is between eight and twelve (8.0-12.0) percent; and “expert” if their depth of usage is greater than twelve (12.0) percent. Other categories and other thresholds/ranges may be used.
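A minimal sketch of this categorization follows, using the example thresholds given above. How a value that falls exactly on a boundary is treated is not specified herein; the sketch assumes each lower bound is inclusive, and the function name `categorize` is hypothetical.

```python
def categorize(depth_percent, thresholds=(3.0, 8.0, 12.0)):
    """Map a depth of usage (in percent) to an example user category.

    Assumption: a value exactly equal to a threshold falls into the
    higher category.
    """
    labels = ("beginner", "intermediate", "advanced", "expert")
    for label, upper in zip(labels, thresholds):
        if depth_percent < upper:
            return label
    return labels[-1]


categories = [categorize(d) for d in (1.5, 3.0, 10.0, 15.0)]
```

Other categories or ranges can be tried by passing a different `thresholds` tuple with a matching set of labels.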

The commands of an application also may be clustered into representative activities of the application, as represented in FIG. 4 via block 452. For example, in a word processing program such as Microsoft® Word, the various commands can be classified into editing, formatting, managing files, viewing and navigating documents, printing, reviewing, tools, emailing, automating tasks and programmability, customization, reading and getting help.

Thus, another way to categorize users is by the types of activities in which they engage. For example, for a set of users, each of their levels of engagement in an activity can be measured by the ratio of the total number of command clicks of the activity and the total number of command clicks by the user across the sessions, such as exemplified in the table below:

User      Activity 1   Activity 2   Activity 3   . . .
User 1    20.0%        10.0%        1.0%         . . .
User 2    5.0%         36.0%        20.0%        . . .
User 3    2.0%         25.0%        6.0%         . . .
. . .     . . .        . . .        . . .        . . .
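The level of engagement shown in the above table may be computed per user as sketched below. The command-to-activity mapping `ACTIVITY_OF` is a hypothetical, greatly reduced stand-in for an actual clustering of commands into activities.

```python
from collections import Counter

# Hypothetical mapping of commands to representative activities.
ACTIVITY_OF = {
    "Copy": "editing",
    "Paste": "editing",
    "Bold": "formatting",
    "Print": "printing",
}


def engagement(clicks):
    """Percentage of a user's command clicks that fall in each activity."""
    counts = Counter(ACTIVITY_OF.get(command, "other") for command in clicks)
    total = sum(counts.values())
    if not total:
        return {}
    return {activity: 100.0 * n / total for activity, n in counts.items()}


# All command clicks of one user across that user's sessions.
levels = engagement(["Copy", "Paste", "Bold", "Print"])
primary = max(levels, key=levels.get)
```

A user's primary activity, taken here as the activity with the highest engagement level, is one possible basis for grouping users by type of use.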

Using activity grouping, users can be categorized into groups based on usage, that is, each group of users may represent a type of use of the application. For example, a word processing program may have users who primarily use the editing functionalities and not much of anything else, other users who primarily use the formatting functionalities, and so forth. In this manner, analysis parameters 460 such as the application and version for which the analysis is being conducted, and the source of users can be specified via filtering criteria. The number of categories can also be specified.

Outlier analysis (block 454) identifies a user as a potential outlier if his or her use of a command is substantially different from that of most other users. Various criteria can be used, such as the entropy of the occurrence distribution of each command. The smaller the entropy, the more unevenly distributed the occurrence of the command among the set of all users. For example, if the entropy is less than one-half (0.5), a first criterion is met.

More particularly, in one example implementation, outliers are determined for a particular application, version/build and each command as follows. If a command is used by only one user, and that user's average clicks per session is larger than some threshold number (e.g., 100), this user is identified as an outlier. Alternatively, if a command is used by more than one user, the entropy of the command is calculated as follows:

$P_{i} = \frac{C_{i}}{C_{total}}$

$E = - \frac{\sum_{i = 1}^{n} P_{i} \times \log (P_{i})}{\log (n)}$

where $n$ is the total number of users who used the command, $C_{i}$ is the total number of clicks of the command by user $i$, and $C_{total}$ is the total number of clicks of the command.

If the entropy of a command is smaller than some threshold value, (e.g., 0.5), and the average clicks per session by a user is larger than some other threshold number (e.g., 100), this user is identified as an outlier.
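The entropy test may be sketched as follows for a single command. The function names, the dictionary inputs and the sample counts are hypothetical; the thresholds default to the example values of 0.5 and 100 given above, and average clicks per session is taken as the user's clicks of the command divided by that user's application session count.

```python
import math


def normalized_entropy(clicks_per_user):
    """Entropy E of a command's click distribution, normalized by log(n)."""
    counts = [c for c in clicks_per_user.values() if c > 0]
    n = len(counts)
    if n < 2:
        raise ValueError("entropy is defined here only for two or more users")
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts) / math.log(n)


def find_outliers(clicks_per_user, sessions_per_user,
                  entropy_max=0.5, clicks_min=100.0):
    """Return the users flagged as potential outliers for one command."""
    users = [u for u, c in clicks_per_user.items() if c > 0]
    if not users:
        return []
    # With more than one user, the entropy criterion must be met first.
    if len(users) > 1 and normalized_entropy(clicks_per_user) >= entropy_max:
        return []
    return [u for u in users
            if clicks_per_user[u] / sessions_per_user[u] > clicks_min]


# Hypothetical counts for one command: u1 dominates its usage.
CLICKS = {"u1": 5000, "u2": 3, "u3": 2}
APP_SESSIONS = {"u1": 10, "u2": 5, "u3": 4}
flagged = find_outliers(CLICKS, APP_SESSIONS)
```

Because the entropy is divided by log(n), it ranges from near zero (one user accounts for nearly all clicks) to one (clicks are spread evenly), so a single threshold can be applied regardless of how many users used the command.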

The outlier analysis outputs all (or some specified subset of) users who are identified as outliers, including the application for which the user is considered as an outlier, the total number of application sessions the user had, the command of unusual usage, the total number of times the user used the command, and the number of application sessions in which the user used the command more than 100 times.

Additionally, the average occurrence per session of the command by this user may be considered, e.g., the total occurrence of the command divided by application session count of the user. If the average occurrence per session is greater than some number, such as one-hundred, the second criterion is met. In this example, any user who meets the two criteria can be grouped and reported; in this example, the user is likely using automation.

To use a user group in analysis, a user group is defined and can thereafter be used in software feature usage analysis and application usage analysis. In the analysis configuration, the operator can specify the “User source” to be a user group. When the operator sets the user source to be a user group, the analysis is focused to that user group.

One approach to defining a user group is to define a set of sessions that meet certain criteria (block 462) based on per session data, define a user criterion specifying users whose sessions in a session set as a whole meet a certain criterion or criteria, and allow the specifying of multiple criteria mathematically combined in some way, e.g., using Boolean logic or weighted factors. For example, basic elements to define a user group may include user group, user criterion, union, intersection, and complement. Basic elements to define a session set may include: session set, session criterion, AND, OR and NOT. For example, in a user interface, the basic elements (or user modeling controls) may be listed on the left, with the user group and session set definition (user group modeling) on the right. To define a user group, the operator can drag the basic elements from the left to add to the right, and can also change the name of a session set or user group.

In one example implementation, a session criterion includes a “dimension” and a “value.” A dimension may be any variable recorded in a session (e.g., OfficeApplication), a feature (e.g., copy and paste, typically comprising a series of commands), and/or a variable that is commonly used but is not directly recorded in a session, and is instead calculated from variables that are recorded. For example, ImportantBuild is based on several variables such as OfficeProductVer, OfficeMajorVer, OfficeMinorVer and OfficeDotBuild.

Once the operator selects a dimension, the operator may specify the value or values that are of interest. For example, if “feature” is selected as the dimension, the operator can specify a feature file.

By default, the logical relationship between session criteria is AND. In the above example, for each session in the session set, by default the operator may specify that OfficeApplication=OneNote AND ImportantBuild=Office 12 Beta 1. The operator may specify other types of logical relationships by selecting the basic elements (e.g., dragging from the left to add to the right).

Once the operator has defined a session set, the session set may be used to define a user group, e.g., by selecting and dragging a user criterion to the right. The user criterion may be named, with the user criterion condition or conditions specified that a user's sessions need to meet. For example, to be considered a “OneNote12Beta1User,” a user needs to have at least one session that corresponds to a OneNote Beta1 session.
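One way to picture session criteria and their logical combination is as composable predicates, as in the following sketch. The helper names (`criterion`, `all_of`, `any_of`, `negate`, `session_set`, `users_with_at_least`) and the sample sessions are hypothetical; only the dimension names and values come from the example above.

```python
from collections import Counter

# Hypothetical per-session data, keyed by dimension name.
SESSIONS = [
    {"user": "u1", "OfficeApplication": "OneNote",
     "ImportantBuild": "Office 12 Beta 1"},
    {"user": "u1", "OfficeApplication": "Word",
     "ImportantBuild": "Office 12 Beta 1"},
    {"user": "u2", "OfficeApplication": "OneNote",
     "ImportantBuild": "Office 11"},
]


def criterion(dimension, value):
    """A session criterion: a dimension and a value for that dimension."""
    return lambda session: session.get(dimension) == value


def all_of(*predicates):   # AND, the default relationship
    return lambda session: all(p(session) for p in predicates)


def any_of(*predicates):   # OR
    return lambda session: any(p(session) for p in predicates)


def negate(predicate):     # NOT
    return lambda session: not predicate(session)


def session_set(sessions, predicate):
    """The subset of sessions that meet the session criteria."""
    return [s for s in sessions if predicate(s)]


def users_with_at_least(sessions, minimum=1):
    """A user criterion: users having at least `minimum` sessions in the set."""
    counts = Counter(s["user"] for s in sessions)
    return {user for user, n in counts.items() if n >= minimum}


onenote_beta1 = all_of(
    criterion("OfficeApplication", "OneNote"),
    criterion("ImportantBuild", "Office 12 Beta 1"),
)
onenote12_beta1_users = users_with_at_least(session_set(SESSIONS, onenote_beta1))
```

Here only user u1 has a session that satisfies both criteria, so only u1 is placed in the resulting user group.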

Example measures that can be used to specify conditions are listed in the table below. The measures are calculated per user, e.g., for each user of the session set. In this example, if the chosen measure of a user meets the condition specified, the user is included in the user group:

Average Session Length: Average session length of the sessions of a user.

Crash Ratio: Ratio of the number of sessions that crashed (crash count) to the total number of sessions of a user.

Depth of Usage: Percentage of total commands in software instrumentation data that are used by a user.

Enrollment Length: Time between earliest session and latest session of a user.

Failure Ratio: Ratio of the number of sessions that failed (such as crash and hang) to total number of sessions of a user.

MTTC (mean time to crash): The ratio of total session length to the total number of sessions that crashed of a user.

MTTF (mean time to failure): Ratio of total session length to total number of sessions that failed of a user.

Session Count: Total number of sessions of a user.

Session Frequency: Average time between consecutive sessions of a user's sessions.

Total Running Time: Total session time of a user's sessions.
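Several of the per-user measures in the above table may be sketched as follows. The tuple layout, the `exit_type` values and the choice to measure enrollment length from the earliest session start to the latest session start are assumptions for illustration; MTTC and MTTF are reported as `None` when the user has no crashed or failed sessions, since the ratio is undefined in that case.

```python
def user_measures(sessions):
    """Measures for ONE user.

    `sessions` is a list of (start_hour, end_hour, exit_type) tuples, where
    exit_type is a hypothetical value among "normal", "crash" and "hang".
    """
    if not sessions:
        raise ValueError("a user needs at least one session")
    count = len(sessions)
    total = sum(end - start for start, end, _ in sessions)
    crashes = sum(1 for _, _, exit_type in sessions if exit_type == "crash")
    failures = sum(1 for _, _, exit_type in sessions
                   if exit_type in ("crash", "hang"))
    starts = [start for start, _, _ in sessions]
    return {
        "session_count": count,
        "total_running_time": total,
        "average_session_length": total / count,
        "crash_ratio": crashes / count,
        "failure_ratio": failures / count,
        "mttc": total / crashes if crashes else None,
        "mttf": total / failures if failures else None,
        "enrollment_length": max(starts) - min(starts),
    }


measures = user_measures([
    (0.0, 2.0, "normal"),
    (10.0, 11.0, "crash"),
    (20.0, 23.0, "hang"),
    (30.0, 32.0, "normal"),
])
# A user criterion condition might then be, e.g., measures["crash_ratio"] > 0.2.
```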

The operator may also specify other criteria, such as that the total time since the user's first session until now needs to be less than a month.

The relationship between the user criteria in a user group is “Intersection” by default, e.g., the above examples would specify that the user group “OneNote 12 Starters” is the intersection of “OneNote12Beta1Users” and “OneNote12Beta1LessThanAMonth” users. The operator may specify other types of relationships via the basic elements, e.g., by dragging the basic elements on the left to add to the right. In this way, straightforward user interface interaction defines a user group. Note that the operator can also define a user group in other ways, e.g., via links shown when hovering on the user count of a category (“bucket”) that if selected provides a “user groups” creation dialog, wizard or the like.
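Because user criteria yield sets of users, the union, intersection and complement elements map naturally onto set operations, as the following sketch suggests. The user IDs, dates and the reading of “a month” as 30 days are assumptions for the example.

```python
from datetime import date, timedelta

# Hypothetical inputs: date of each user's first session, and the users
# who already satisfy the "OneNote12Beta1Users" criterion.
FIRST_SESSION = {
    "u1": date(2007, 1, 5),
    "u2": date(2007, 5, 20),
    "u3": date(2007, 6, 1),
}
ONENOTE12_BETA1_USERS = {"u1", "u2"}
ALL_USERS = set(FIRST_SESSION)


def first_session_within(first_session, today, days=30):
    """Users whose first session was less than `days` days before `today`."""
    limit = timedelta(days=days)
    return {user for user, first in first_session.items()
            if today - first < limit}


less_than_a_month = first_session_within(FIRST_SESSION, date(2007, 6, 15))

# Intersection is the default relationship between user criteria.
onenote12_starters = ONENOTE12_BETA1_USERS & less_than_a_month
# Other relationships: union and complement.
either_criterion = ONENOTE12_BETA1_USERS | less_than_a_month
not_starters = ALL_USERS - onenote12_starters
```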

To analyze a user group once defined, the instrumentation data may be queried to get results for the user group. As represented in FIG. 4, the analysis criteria/parameters 460 and user group modeling based on filtering parameters 462 thus may be used to generate the query 470. The query results may then be formatted into the report 332.

Example query results that may be included in the report 332 may include some or all of the data set forth in the following table, as well as additional data:

User Count: Total number of users who are in the user group, i.e., who meet the criteria of the user group.

Data Characteristics: Total number of users and sessions for the applications and builds.

User counts: User counts for each user criterion and for each logical group.

Session counts: Session count for each session set, session criterion and logical group.

FIG. 5 summarizes an overall example process, beginning at step 502 which represents collecting the software instrumentation data. As is readily understood, the software instrumentation data may be collected at any previous time, not necessarily just prior to analysis.

Step 504 represents obtaining the analysis criteria (e.g., application usage, command usage, trend analysis and/or others), and obtaining the user set, which may be all, external, internal, a user group and so forth as set forth above. Step 506 generates the query from the operator-input analysis and/or user filtering criteria.

Step 508 represents submitting the query against the software instrumentation data (in any appropriate format), with step 510 representing receiving the query results. Step 512 represents generating the report, which may include performing calculations on the results as needed to match the operator's requirements. For example, as described above, some of the report can include information that is not directly measured but is computed from a combination of two or more measured sets of data.
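Steps 504 through 512 may be pictured with the following in-memory sketch. It is only an illustration under stated assumptions: the query is modeled as a plain dictionary, the data as a list of dictionaries, and the names `build_query`, `run_query` and `make_report` are hypothetical. An actual system may instead submit the query to a data infrastructure service.

```python
# Hypothetical instrumentation data, one dictionary per session.
SESSIONS = [
    {"user": "u1", "source": "Internal", "application": "Word"},
    {"user": "u1", "source": "Internal", "application": "Word"},
    {"user": "u2", "source": "External", "application": "Word"},
    {"user": "u3", "source": "External", "application": "Excel"},
]


def build_query(analysis, user_source, filters):
    """Step 506: combine analysis criteria and user filtering criteria."""
    return {"analysis": analysis, "user_source": user_source,
            "filters": dict(filters)}


def run_query(query, sessions):
    """Steps 508 and 510: apply the query and return the matching sessions."""
    rows = [s for s in sessions
            if all(s.get(k) == v for k, v in query["filters"].items())]
    if query["user_source"] != "All":
        rows = [s for s in rows if s["source"] == query["user_source"]]
    return rows


def make_report(query, rows):
    """Step 512: compute values that are derived rather than measured."""
    users = {r["user"] for r in rows}
    return {
        "analysis": query["analysis"],
        "user_source": query["user_source"],
        "user_count": len(users),
        "session_count": len(rows),
        "sessions_per_user": len(rows) / len(users) if users else 0.0,
    }


query = build_query("application usage", "All", {"application": "Word"})
report = make_report(query, run_query(query, SESSIONS))
```

The sessions-per-user value in the report is an example of information that is not directly measured but is computed from two measured quantities.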

Exemplary Operating Environment

FIG. 6 illustrates an example of a suitable computing system environment 600 on which the collection, analysis and/or group modeling mechanisms (FIGS. 1 and 2) may be implemented. The computing system environment 600 is only one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the invention. Neither should the computing environment 600 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary operating environment 600.

The invention is operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable for use with the invention include, but are not limited to: personal computers, server computers, hand-held or laptop devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.

The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, and so forth, which perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in local and/or remote computer storage media including memory storage devices.

With reference to FIG. 6, an exemplary system for implementing various aspects of the invention may include a general purpose computing device in the form of a computer 610. Components of the computer 610 may include, but are not limited to, a processing unit 620, a system memory 630, and a system bus 621 that couples various system components including the system memory to the processing unit 620. The system bus 621 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus also known as Mezzanine bus.

The computer 610 typically includes a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by the computer 610 and includes both volatile and nonvolatile media, and removable and non-removable media. By way of example, and not limitation, computer-readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer 610. Communication media typically embodies computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer-readable media.

The system memory 630 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 631 and random access memory (RAM) 632. A basic input/output system 633 (BIOS), containing the basic routines that help to transfer information between elements within computer 610, such as during start-up, is typically stored in ROM 631. RAM 632 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 620. By way of example, and not limitation, FIG. 6 illustrates operating system 634, application programs 635, other program modules 636 and program data 637.

The computer 610 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only, FIG. 6 illustrates a hard disk drive 641 that reads from or writes to non-removable, nonvolatile magnetic media, a magnetic disk drive 651 that reads from or writes to a removable, nonvolatile magnetic disk 652, and an optical disk drive 655 that reads from or writes to a removable, nonvolatile optical disk 656 such as a CD ROM or other optical media. Other removable/non-removable, volatile/nonvolatile computer storage media that can be used in the exemplary operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, solid state ROM, and the like. The hard disk drive 641 is typically connected to the system bus 621 through a non-removable memory interface such as interface 640, and magnetic disk drive 651 and optical disk drive 655 are typically connected to the system bus 621 by a removable memory interface, such as interface 650.

The drives and their associated computer storage media, described above and illustrated in FIG. 6, provide storage of computer-readable instructions, data structures, program modules and other data for the computer 610. In FIG. 6, for example, hard disk drive 641 is illustrated as storing operating system 644, application programs 645, other program modules 646 and program data 647. Note that these components can either be the same as or different from operating system 634, application programs 635, other program modules 636, and program data 637. Operating system 644, application programs 645, other program modules 646, and program data 647 are given different numbers herein to illustrate that, at a minimum, they are different copies. A user may enter commands and information into the computer 610 through input devices such as a tablet, or electronic digitizer, 664, a microphone 663, a keyboard 662 and pointing device 661, commonly referred to as a mouse, trackball or touch pad. Other input devices not shown in FIG. 6 may include a joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processing unit 620 through a user input interface 660 that is coupled to the system bus, but may be connected by other interface and bus structures, such as a parallel port, game port or a universal serial bus (USB). A monitor 691 or other type of display device is also connected to the system bus 621 via an interface, such as a video interface 690. The monitor 691 may also be integrated with a touch-screen panel or the like. Note that the monitor and/or touch screen panel can be physically coupled to a housing in which the computing device 610 is incorporated, such as in a tablet-type personal computer. In addition, computers such as the computing device 610 may also include other peripheral output devices such as speakers 695 and printer 696, which may be connected through an output peripheral interface 694 or the like.

The computer 610 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 680. The remote computer 680 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 610, although only a memory storage device 681 has been illustrated in FIG. 6. The logical connections depicted in FIG. 6 include one or more local area networks (LAN) 671 and one or more wide area networks (WAN) 673, but may also include other networks. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and the Internet.

When used in a LAN networking environment, the computer 610 is connected to the LAN 671 through a network interface or adapter 670. When used in a WAN networking environment, the computer 610 typically includes a modem 672 or other means for establishing communications over the WAN 673, such as the Internet. The modem 672, which may be internal or external, may be connected to the system bus 621 via the user input interface 660 or other appropriate mechanism. A wireless networking component 674 such as comprising an interface and antenna may be coupled through a suitable device such as an access point or peer computer to a WAN or LAN. In a networked environment, program modules depicted relative to the computer 610, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation, FIG. 6 illustrates remote application programs 685 as residing on memory device 681. It may be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.

An auxiliary subsystem 699 (e.g., for auxiliary display of content) may be connected via the user input interface 660 to allow data such as program content, system status and event notifications to be provided to the user, even if the main portions of the computer system are in a low power state. The auxiliary subsystem 699 may be connected to the modem 672 and/or network interface 670 to allow communication between these systems while the main processing unit 620 is in a low power state.

Conclusion

While the invention is susceptible to various modifications and alternative constructions, certain illustrated embodiments thereof are shown in the drawings and have been described above in detail. It should be understood, however, that there is no intention to limit the invention to the specific forms disclosed, but on the contrary, the intention is to cover all modifications, alternative constructions, and equivalents falling within the spirit and scope of the invention.

Claims

1. In a computing environment, a method comprising: analyzing software instrumentation data collected from user sessions corresponding to one or more programs and one or more commands associated with the user sessions, wherein the analyzing the software instrumentation data includes determining program usage metrics, command usage metrics and at least one usage trend over time; and upon determining the program usage metrics, command usage metrics and at least one usage trend over time, outputting information representative of at least one of the program usage metrics, the command usage metrics, and at least one usage trend over time.
2. The method of claim 1 further comprising, collecting the instrumentation data during actual user sessions.
3. The method of claim 1 wherein determining the program usage metrics comprises determining for a set of users, session length information based on session time and session count, depth of usage information based on a percentage of commands used, and at least one of: session count information based on a number of application sessions, session frequency information based on a time measurement between sessions, running time information based on session time, session length information based on session time and session count.
4. The method of claim 1 wherein determining the command usage metrics comprises determining for a set of users and a selected command, session count information based on a number of sessions in which the selected command occurred, percentage of users information corresponding to a percentage of users of the set who use the selected command, and at least one of: user count information based on a number of users of the set who use the selected command, percentage of session information corresponding to a percentage of application sessions in which the selected command was used, click count information corresponding to a number of clicks corresponding to the selected command, percentage of click count information corresponding to a percentage of program clicks corresponding to the selected command, click count per user information based on click count and user count of the selected command, and click count per session information corresponding to a click count per session.
5. The method of claim 1 wherein analyzing the software instrumentation data to determine at least one usage trend over time comprises determining at least one of: user count information corresponding to a number of users that used a program during a reporting interval, session count information corresponding to a number of sessions of a program during a reporting interval, session count per user information based on a number of sessions and a total number of users that used a program during a reporting interval, running time per user information corresponding to a session length of users and a number of users that used a program during a reporting interval, cumulative session count per user information corresponding to a total number of sessions and a number of users from a start time of a period to analyze to an end of each reporting interval, and cumulative running time per user information corresponding to a session length of users that used a program and a number of users from a start time of a period to analyze to an end of each reporting interval.
6. The method of claim 1 further comprising, analyzing the software instrumentation data to determine at least one type of user.
7. The method of claim 6 wherein analyzing the software instrumentation data to determine at least one type of user comprises categorizing users by at least one of depth of usage and categorizing users by the types of activities in which they engage.
8. The method of claim 1 further comprising, analyzing the software instrumentation data to determine at least one potential outlier corresponding to command usage that appears different from command usage of other users.
9. The method of claim 1 wherein analyzing the software instrumentation data to determine at least one potential outlier comprises computing an entropy value corresponding to an occurrence distribution of a command, computing an average occurrence per session of the command usage by a potential outlier, and using the entropy value or the average occurrence per session, or both, as outlier criterion or criteria.
10. The method of claim 1 further comprising, locating a subset of the user sessions based on at least one session criterion, wherein each session criteria comprises a dimension and a value for that dimension, and each dimension comprises a variable recorded in a session, a feature, or a set of one or more variables computed from a plurality of variables recorded in a session, and wherein analyzing the information corresponding to the software instrumentation data comprises analyzing the subset.
11. The method of claim 10 wherein locating the subset comprises mathematically combining each of at least two session criteria.
12. The method of claim 1 further comprising, providing a mechanism for modeling a user group, including providing an interface for receiving one or more measures that specify one or more conditions that any user needs to meet in order to belong to the user group, and locating the users from the software instrumentation data based on the one or more measures.
13. The method of claim 12 wherein locating the users comprises determining at least one of: session length information corresponding to a session length of the sessions of a user, crash information corresponding to a number of sessions of a user that crashed, depth of usage information corresponding to which commands were used by a user, enrollment length information corresponding to a time between an earliest session and a latest session of a user, failure information corresponding to a number of sessions that failed of a user, mean time to crash information corresponding to session length and sessions of a user that crashed, mean time to failure information corresponding to session length and number of sessions of a user that failed, session count information corresponding to a total number of sessions of a user, session frequency information corresponding to time between consecutive sessions of a user, and total running time information corresponding to total session time of sessions of a user.
14. A computer-readable storage medium having computer executable instructions, which when executed perform steps comprising, analyzing software instrumentation data collected from user sessions corresponding to one or more programs and one or more commands associated with the user sessions, wherein the analyzing the software instrumentation data includes determining program usage metrics and command usage metrics and at least one usage trend over time; and upon determining the program usage metrics and, command usage metrics and at least one usage trend over time, outputting information representative of at least one of the program usage metrics, the command usage metrics, and at least one usage trend over time.
15. In a computing environment, a system comprising: an analyzer that processes information corresponding to software instrumentation data recorded from user software program usage sessions to produce a first subset comprising software usage data; wherein the analyzer comprises means for performing a command usage analysis, an application usage analysis, and a trend analysis; a group modeling mechanism that processes the information corresponding to the software instrumentation data and a second subset comprising user data; and means for combining the first subset with the second subset to provide an output that corresponds to a selected group of users and their software program usage.
16. The system of claim 15 further comprising a user interface for facilitating selection of one or more criteria by which the first subset and second subset are located from the information.
17. The system of claim 15 further comprising means for recording the software instrumentation data.
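
By way of informal illustration only, and not as part of the claims, the outlier criteria recited in claim 9 might be sketched as follows. The sketch assumes that the "occurrence distribution of a command" is the share of the command's total occurrences attributable to each user, and that low entropy combined with a high average occurrence per session marks a potential outlier. The function names and both threshold values are hypothetical.

```python
import math


def command_entropy(occurrences_by_user):
    """Shannon entropy (in bits) of how a command's occurrences spread across users."""
    total = sum(occurrences_by_user.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in occurrences_by_user.values():
        if count:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


def is_potential_outlier(user, occurrences_by_user, sessions_by_user,
                         max_entropy_bits=1.0, min_avg_per_session=50.0):
    """Flag `user` when usage is concentrated (low entropy) and the user's
    average occurrence per session is high. Both thresholds are assumptions."""
    sessions = sessions_by_user.get(user, 0)
    if sessions == 0:
        return False
    avg_per_session = occurrences_by_user.get(user, 0) / sessions
    concentrated = command_entropy(occurrences_by_user) < max_entropy_bits
    return concentrated and avg_per_session >= min_avg_per_session
```

With occurrences of 980, 10 and 10 for three users who each ran ten sessions, the entropy is well below one bit and the first user averages 98 occurrences per session, so only that user is flagged under these assumed thresholds.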

Related Publications (1)

	Number	Date	Country
	20080313617 A1	Dec 2008	US
