Analyzing software users with instrumentation data and user group modeling and analysis

Abstract
Described is a technology by which software instrumentation data collected from user program sessions are analyzed, including by determining program usage metrics and/or command usage metrics. Information representative of the program usage metrics and/or the command usage metrics is output, such as in the form of a report. The software instrumentation data may be further analyzed, such as to determine at least one usage trend over time, and to determine user groups. For example, a usage subset of sessions that meet specified session usage criteria based on a set of session data may be located, along with a subset of users based on users whose sessions meet specified user criteria. The usage and user subsets may be combined via Boolean logic to produce a result set.
Description
BACKGROUND

Understanding the way in which software users use software can be very valuable when working to improve the effectiveness and ease of use of software applications. Traditional ways to analyze software users include usability studies, user interviews, user surveys and the like.


Various data can be collected during actual software usage to obtain information related to how users use and otherwise interact with a software program. However, analyzing that data to obtain useful information about the users, including how to model and analyze a specific group of users, is a difficult problem.


SUMMARY

This Summary is provided to introduce a selection of representative concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used in any way that would limit the scope of the claimed subject matter.


Briefly, various aspects of the subject matter described herein are directed towards a technology by which software instrumentation data collected from user sessions corresponding to one or more programs is analyzed, including by determining program usage metrics and/or command usage metrics. Information representative of the program usage metrics and/or the command usage metrics is output, such as in the form of a report. The software instrumentation data may be further analyzed, such as to determine at least one usage trend over time, and to determine user groups.


Examples of program usage metrics include session count information based on a number of application sessions, session frequency information based on a time measurement between sessions, running time information based on session time, session length information based on session time and session count, and/or depth of usage information based on a percentage of commands used. Examples of command usage metrics include user count information based on a number of users of the set who use the selected command, percentage of users information corresponding to a percentage of users of the set who use the selected command, session count information based on a number of sessions in which the selected command occurred, percentage of session information corresponding to a percentage of application sessions in which the selected command was used, click count information corresponding to a number of clicks corresponding to the selected command, percentage of click count information corresponding to a percentage of program clicks corresponding to the selected command, click count per user information based on click count and user count of the selected command, and/or click count per session information corresponding to a click count per session.


The software instrumentation data may be analyzed to determine at least one type of user, and for modeling a user group. For example, users may be categorized by their depth of usage, and/or by the types of activities in which they engage. Potential outliers may be identified based on command usage that is significantly different from the command usage of other users. Users may be located from their sessions based on a session criterion comprising a dimension and a value for that dimension, where each dimension comprises a variable recorded in a session, a feature, or results computed from a plurality of variables.


A subset of sessions that meet specified session criteria based on a set of session data may be located, along with a subset of users based on users whose sessions meet specified user criteria. The subsets may be combined via Boolean logic to produce a result set.


Other advantages may become apparent from the following detailed description when taken in conjunction with the drawings.





BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is illustrated by way of example and not limitation in the accompanying figures, in which like reference numerals indicate similar elements and in which:



FIG. 1 shows an example representation of recording software instrumentation data for subsequent analysis.



FIG. 2 shows a representation in a table format of example software instrumentation (e.g., software quality metrics) data saved for various program usage sessions by users of a suite of application programs.



FIG. 3 shows an example representation of an analyzer for analyzing software instrumentation data and a mechanism for modeling groups of users based on the software instrumentation data.



FIG. 4 shows a representation of various example concepts related to analyzing software instrumentation data and/or modeling groups of users.



FIG. 5 is a flow diagram representing various example concepts related to analyzing software instrumentation data and/or modeling groups of users.



FIG. 6 is an illustrative example of a general-purpose computing environment into which various aspects of the present invention may be incorporated.





DETAILED DESCRIPTION

Various aspects of the technology described herein are generally directed towards analyzing software usage and software users, such as for the purpose of improving software products such as application programs, and improving the user experience with those software products. To this end, as described below, various aspects are directed towards collecting and analyzing various application-related usage data, referred to as software instrumentation data, in an attempt to understand the usage of an application program, including concepts such as how long, how much, how often and how extensively users use the application, the use of commands by users, and/or usage trends over time.


In other aspects, the software instrumentation data includes information about the types of users that use a program, and helps to define one or more groups of users. A user interface may be provided to help define and model a user group, along with an example language to model a user group and example ways to analyze a user group. As will be understood, the use of user groups provides mechanisms for software feature usage analysis and application usage analysis.


For purposes of understanding, the technology is described herein by use of examples, including those that operate in various environments, such as internal users (e.g., corresponding to employees of the enterprise that is analyzing the software) and external users. Further, the programs exemplified herein are generally a suite of application programs such as those provided as part of the Microsoft® Office software product suite. However, as will be understood, these are only non-limiting examples, and the technology is applicable to different user environments and different software products, including individual application programs and operating system components.


As such, the present invention is not limited to any particular embodiments, aspects, concepts, structures, functionalities or examples described herein. Rather, any of the embodiments, aspects, concepts, structures, functionalities or examples described herein are non-limiting, and the present invention may be used in various ways that provide benefits and advantages in computing in general.


Turning to FIG. 1, there is shown a mechanism for collecting software instrumentation data 102, including a session recorder 104 that collects various data from one or more application instances 106 corresponding to various users 108₁-108ₙ, where n represents any practical number of users. The session recorder 104 may be per application instance/user, or may be a mechanism such as an agent on each computing device of a network that communicates with at least one data collection server component running on a network server or the like. One or more various mechanisms 110 allow a test operator or the like to set collection parameters, such as from which program or programs to collect the instrumentation data, from which users to collect data, how long a collection session should last (if the program is not ended by the user within that time limit) and so forth. Note that the mechanisms 110 represent any mechanisms that may be used at various points during which collection parameters can be set, e.g., during development of the session recorder, or as (or after) the instrumentation data is collected on the users' client machine and/or uploaded to servers.


In general, the instrumentation data 102 comprise data collected from each user session, where a session corresponds to actual usage by a user of an executing program. A typical session starts from the application start (e.g., by double clicking on the application executable or a document that launches the application executable, or by choosing the application from a start menu), and ends when the application is closed (e.g., by choosing “Exit” in the application or closing the application window). Sessions can also be time limited, e.g., if a session exceeds twenty-four hours, the session is ended and the instrumentation data to that point is recorded (the application continues to run). Sessions can also end by non-normal termination of a program, e.g., because of program or system crashes.



FIG. 2 provides an example of one type of software instrumentation data 102, with some of the data (arranged in columns) collected for some number of sessions (arranged in rows); it is equivalent to have the sessions be in the columns and the rows represent the data. In one example implementation, each session is associated with some or all of the information shown in FIG. 2, including a session ID, a user ID, and an application name. Other information that is typically recorded includes the application version, a start time, an end time, the commands used during the session and still other data, such as the number of files opened and so forth. Note that in actual implementations, the instrumentation data 102 may have many thousands of dimensions.


To analyze software product usage, the software instrumentation data 102 is processed, such as to measure the overall usage of an application by a group of users. FIG. 3 shows example analysis components, including an analyzer 330 that generates a report 332 from the instrumentation data 102. Note that the data may be first pre-processed into any suitable data structure or set of data structures, such as described in the aforementioned U.S. patent applications entitled “Multidimensional Analysis Tool for High Dimensional Data” and “Efficient Data Infrastructure for High Dimensional Analysis.” Further, the data may be accessed via a client/service architecture, such as described in the aforementioned U.S. patent application entitled “Analyzing Software Usage with Instrumentation Data,” in which a data infrastructure system manages the data for the analysis and provides access to the data via APIs.


A user interface 310 (which may or may not be associated with or otherwise the same as the user interface 110 of FIG. 1) establishes the parameters, criteria and which metrics are used to analyze the instrumentation data 102. Also shown in FIG. 3 is a group modeling mechanism 334.


In one implementation, the metrics set forth in the table below may be used for generating at least some of the report 332 with respect to the usage of an application:


















| Metric | Description |
| --- | --- |
| Average session count | Average number of application sessions by these users. This indicates on average, how much the users have been using the application during a time period. One suitable calculation is: get the session count (total number of application sessions) of each user, and average across the users in the group. |
| Average session frequency | Average time between consecutive sessions by users. This indicates on average, how frequent the users use the application. One suitable calculation is: get the session elapse time (the time between the end of the last session and the end of the first session) of each user, get the session frequency (session elapse time divided by session count) of each user, average across the users. |
| Average total running time | Average total session time by users. This is another indication of on average, how much the analysis users have been using the application. One suitable calculation is: get the total running time (sum of the session time) of each user, average across the users. |
| Average session length | Average session time by users. This indicates on average, how much time users spend in each session using the application. One suitable calculation is: get the average session length (total running time divided by session count) of each user, average across the users. |
| Average depth of usage | Average percentage of total commands of the application used by users. This indicates how deep users use the application. One suitable calculation is: get the depth of usage (percentage of total application commands used by the user, where total application commands is the total number of distinct commands found in the command stream of the application so far, and used by the user is defined as the command found at least once in the command stream of the user) of each user, average across the users. For example, users can be characterized as beginner, intermediate, advanced and expert users depending on their depth of usage, or some other levels may be used. |











A distribution of the above measures can also be obtained by counting how many or what percentage of users have values that fall within an interval.
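The calculations described in the table above can be sketched as follows. This is a minimal illustration, not the analyzer 330 itself: the record layout `(user_id, start_hour, end_hour, commands_used)`, the sample data and the function name `average_usage_metrics` are assumptions made for the example. The session elapse time follows the definition given above (end of the last session minus end of the first session).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical session records: (user_id, start_hour, end_hour, commands_used).
SESSIONS = [
    ("u1", 0.0, 1.0, {"copy", "paste"}),
    ("u1", 10.0, 12.0, {"copy", "bold"}),
    ("u2", 5.0, 6.0, {"print"}),
    ("u2", 20.0, 21.0, {"print", "save"}),
]

METRICS = ("session_count", "session_frequency", "total_running_time",
           "session_length", "depth_of_usage")


def average_usage_metrics(sessions):
    """Compute each metric per user, then average across the users."""
    by_user = defaultdict(list)
    all_commands = set()  # distinct commands seen in the command stream so far
    for user, start, end, commands in sessions:
        by_user[user].append((start, end, commands))
        all_commands |= commands

    per_user = []
    for rows in by_user.values():
        count = len(rows)
        total_time = sum(end - start for start, end, _ in rows)
        ends = [end for _, end, _ in rows]
        elapse = max(ends) - min(ends)  # end of last session minus end of first
        used = set().union(*(commands for _, _, commands in rows))
        depth = 100.0 * len(used) / len(all_commands) if all_commands else 0.0
        per_user.append({
            "session_count": count,
            "session_frequency": elapse / count,
            "total_running_time": total_time,
            "session_length": total_time / count,
            "depth_of_usage": depth,
        })
    return {name: mean(u[name] for u in per_user) for name in METRICS}


averages = average_usage_metrics(SESSIONS)
print(averages)
```

A distribution of any of these measures could be obtained from the same per-user values by counting how many users fall within each interval instead of averaging.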


As part of the analysis processing and report 332 generation, the source of users may be specified. Some example user sources include all users from whom instrumentation data have been collected (All), users who are external customers and not internal employees of the company performing the analysis (External), users who are employees of the company performing the analysis (Internal), users who are from a particular group that the company performing the analysis has set up from which to collect data (e.g., a Study ID such as the beta participants of the next release of a software product), or another customized group.


In general, any filtering, grouping and sorting may be used in the processing of the instrumentation data; for example, a particular application and/or version for which the analysis is being conducted may be specified. The user interface 310 may be designed to help an operator filter, group and/or sort the data as desired, as well as to determine how the output should look and what results should be computed.


A typical example analysis report 332 summarizes the type of analysis performed, the parameters used (e.g., data source, program, build or version, time period of collection, user source, filtering criteria, user count and so forth). A summary section may show the metrics including session count, session frequency, average running time, average session length, and average depth of usage.



FIG. 4 represents example types of analyses that may be performed by the analyzer 330, as well as group modeling concepts. For example, the above-described application usage analysis is represented by the block labeled 440. Alternatively, or in addition to usage analysis, the usage of commands (block 442) of an application by a group of users may be measured, for example with the metrics for each command set forth in the following table:


















| Metric | Description |
| --- | --- |
| User count | Number of users who use this command. |
| Percentage of users | Percentage of application users who use this command. |
| Session count | Number of sessions in which this command occurred. |
| Percentage of Sessions | Percentage of application sessions in which this command occurred. |
| Click count | Number of clicks that are of this command. |
| Percentage of click count | Percentage of application clicks that are of this command. |
| Click count per user | The ratio of click count of this command and user count. This shows on average, how many times a user uses a command. |
| Click count per session | The ratio of click count of this command and session. This shows on average, how many times a command occurs in a session. |










In this example, the application users, application sessions and application clicks described above refer to the total number of users, sessions and command clicks of the application for which command usage analysis is being performed. The total number of sessions of an application is the total number of sessions in which the application name (or other suitable identifier) that was recorded is the application of interest. The total number of users of an application is the total number of unique user identifiers (IDs) of the sessions of the application. The total number of command clicks of an application is the total number of command clicks in all the sessions of the application. Note that the application and version for which the analysis is being conducted, and the source of users, can be specified by the analyzer operator.
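As an illustrative sketch of the command usage metrics, the following computes each metric for one selected command. The session layout `(session_id, user_id, clicks)`, the sample data and the name `command_usage` are hypothetical. Click count per session is taken here as click count divided by the number of sessions in which the command occurred, which is one reading of the definition above.

```python
# Hypothetical sessions: (session_id, user_id, command clicks in order).
SESSIONS = [
    ("s1", "u1", ["copy", "paste", "copy"]),
    ("s2", "u1", ["bold"]),
    ("s3", "u2", ["copy", "print"]),
    ("s4", "u3", ["print"]),
]


def command_usage(sessions, command):
    """Return the command usage metrics for one command of one application."""
    # Application-wide totals used as denominators for the percentages.
    total_users = len({user for _, user, _ in sessions})
    total_sessions = len(sessions)
    total_clicks = sum(len(clicks) for _, _, clicks in sessions)

    users = {user for _, user, clicks in sessions if command in clicks}
    session_count = sum(1 for _, _, clicks in sessions if command in clicks)
    click_count = sum(clicks.count(command) for _, _, clicks in sessions)

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    return {
        "user_count": len(users),
        "percentage_of_users": 100.0 * ratio(len(users), total_users),
        "session_count": session_count,
        "percentage_of_sessions": 100.0 * ratio(session_count, total_sessions),
        "click_count": click_count,
        "percentage_of_click_count": 100.0 * ratio(click_count, total_clicks),
        "click_count_per_user": ratio(click_count, len(users)),
        "click_count_per_session": ratio(click_count, session_count),
    }


copy_usage = command_usage(SESSIONS, "copy")
print(copy_usage)
```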


Another aspect with respect to analysis is referred to as trend analysis 444. More particularly, given the time information in the recorded instrumentation data, the trend of using an application may be measured, corresponding to the usage of an application over time. The application and version for which the analysis is being conducted and/or the source of users can be specified via the user interface 310. The trend data may be displayed as a table or a graph.


The period to analyze and the reporting interval may also be specified. The period to analyze can be an absolute period, e.g., the time period from a start date to an end date, or may be a relative period, e.g., each user's enrollment length, which is the time period from a user's first session to the last session. The reporting interval is the interval to report the measures, and for example may be monthly, weekly, daily, or any other suitable interval. Example measures may include:


















| Measure | Description |
| --- | --- |
| User count | Total number of users using the build and application during a reporting interval. |
| Session count | Total number of sessions of the build and application during a reporting interval. |
| Session Count Per User | Total number of sessions of the build and application divided by the number of users using the build and application during a reporting interval. |
| Cumulative Session Count Per User | Total number of sessions of the build and application divided by the number of users using the build and application from the start time of the period to analyze to the end of each reporting interval. |
| Cumulative Running Time Per User | Total session length of the users using the build and application divided by the number of users from the start time of the period to analyze to the end of each reporting interval. |
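The trend measures above can be illustrated with a monthly reporting interval. This is a sketch under stated assumptions: the record layout `(user_id, session_date, session_hours)`, the sample data and the name `monthly_trend` are hypothetical, and the cumulative measures count every user seen from the start of the period to the end of each interval, which is one interpretation of the definitions.

```python
from datetime import date

# Hypothetical session records: (user_id, session_date, session_hours).
SESSIONS = [
    ("u1", date(2024, 1, 5), 1.0),
    ("u1", date(2024, 1, 20), 2.0),
    ("u2", date(2024, 2, 3), 0.5),
    ("u1", date(2024, 2, 10), 1.5),
]


def monthly_trend(sessions):
    """Return one row of trend measures per calendar month that has sessions."""
    def month_of(session):
        return (session[1].year, session[1].month)

    rows = []
    for month in sorted({month_of(s) for s in sessions}):
        in_month = [s for s in sessions if month_of(s) == month]
        to_date = [s for s in sessions if month_of(s) <= month]
        users = {user for user, _, _ in in_month}
        cumulative_users = {user for user, _, _ in to_date}
        rows.append({
            "interval": f"{month[0]}-{month[1]:02d}",
            "user_count": len(users),
            "session_count": len(in_month),
            "session_count_per_user": len(in_month) / len(users),
            "cumulative_session_count_per_user":
                len(to_date) / len(cumulative_users),
            "cumulative_running_time_per_user":
                sum(hours for _, _, hours in to_date) / len(cumulative_users),
        })
    return rows


trend = monthly_trend(SESSIONS)
for row in trend:
    print(row)
```

The resulting rows map directly onto a table, or onto one graph series per measure.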










One or more other types of analysis may be performed, as represented in FIG. 4 by the block 446.


Other measures are directed towards users, and are represented in FIG. 4 via the group modeling block 334. User categorization (block 450) refers to categorizing users of an application based on their usage of the application. As described above, one way to categorize users is by depth of usage, which is related to the percentage of total application commands used by the user. Depth of usage is a measure of how extensively users use an application, and users can be categorized based on their depth of usage. For example, users can be categorized as “beginners” if their depth of usage is less than some threshold such as three (3.0) percent; “intermediate” if their depth of usage is between some range such as three and eight (3.0-8.0) percent; “advanced” if their depth of usage is between eight and twelve (8.0-12.0) percent; and “expert” if their depth of usage is greater than twelve (12.0) percent. Other categories and other thresholds/ranges may be used.
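A minimal sketch of the depth-of-usage categorization, using the example thresholds given above. The function name is hypothetical, and the handling of the exact boundary values (3.0 and 8.0 fall into the higher category, 12.0 stays "advanced") is an assumption, since the text leaves the boundaries open.

```python
def categorize_by_depth(depth_percent):
    """Map a user's depth of usage (in percent) to an example category."""
    if depth_percent < 3.0:
        return "beginner"
    if depth_percent < 8.0:
        return "intermediate"
    if depth_percent <= 12.0:  # "expert" requires strictly more than 12 percent
        return "advanced"
    return "expert"


# Hypothetical per-user depth-of-usage values, in percent.
depths = {"u1": 1.5, "u2": 5.0, "u3": 10.0, "u4": 20.0}
categories = {user: categorize_by_depth(d) for user, d in depths.items()}
print(categories)
```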


The commands of an application also may be clustered into representative activities of the application, as represented in FIG. 4 via block 452. For example, in a word processing program such as Microsoft® Word, the various commands can be classified into editing, formatting, managing files, viewing and navigating documents, printing, reviewing, tools, emailing, automating tasks and programmability, customization, reading and getting help.


Thus, another way to categorize users is by the types of activities in which they engage. For example, for a set of users, each of their levels of engagement in an activity can be measured by the ratio of the total number of command clicks of the activity and the total number of command clicks by the user across the sessions, such as exemplified in the table below:


















| User | Activity 1 | Activity 2 | Activity 3 | . . . |
| --- | --- | --- | --- | --- |
| User 1 | 20.0% | 10.0% | 1.0% | . . . |
| User 2 | 5.0% | 36.0% | 20.0% | . . . |
| User 3 | 2.0% | 25.0% | 6.0% | . . . |
| . . . | . . . | . . . | . . . | . . . |









Using activity grouping, users can be categorized into groups based on usage, that is, each group of users may represent a type of use of the application. For example, a word processing program may have users who primarily use the editing functionalities and not much of anything else, other users who primarily use the formatting functionalities, and so forth. In this manner, analysis parameters 460 such as the application and version for which the analysis is being conducted, and the source of users can be specified via filtering criteria. The number of categories can also be specified.
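The level-of-engagement ratio can be sketched as below. The command-to-activity mapping `ACTIVITY_OF`, the sample clicks and both function names are hypothetical. Grouping users by their single most-used activity is a deliberate simplification for illustration; the text does not prescribe a particular grouping method.

```python
# Hypothetical mapping of commands to representative activities.
ACTIVITY_OF = {
    "copy": "editing",
    "paste": "editing",
    "bold": "formatting",
    "print": "printing",
}


def engagement_levels(clicks_by_user):
    """Percentage of each user's command clicks that fall in each activity."""
    levels = {}
    for user, clicks in clicks_by_user.items():
        if not clicks:
            continue  # a user with no clicks has no defined ratio
        counts = {}
        for command in clicks:
            activity = ACTIVITY_OF.get(command, "other")
            counts[activity] = counts.get(activity, 0) + 1
        levels[user] = {a: 100.0 * n / len(clicks) for a, n in counts.items()}
    return levels


def primary_activity(user_levels):
    """Simplified grouping: the activity with the highest engagement level."""
    return max(user_levels, key=user_levels.get)


clicks_by_user = {
    "u1": ["copy", "paste", "copy", "bold"],
    "u2": ["print", "bold", "bold", "bold", "bold"],
}
levels = engagement_levels(clicks_by_user)
groups = {user: primary_activity(lv) for user, lv in levels.items()}
print(levels)
print(groups)
```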


Outlier analysis (block 454) identifies a user as a potential outlier if his or her use of a command is substantially different from that of most other users. Various criteria can be used, such as the entropy of the occurrence distribution of each command. The smaller the entropy, the more unevenly distributed the occurrence of the command among the set of all users. For example, if the entropy is less than one-half (0.5), a first criterion is met.


More particularly, in one example implementation, outliers are determined for a particular application, version/build and each command. If a command is only used by one user, and the average clicks per session is larger than some threshold number (e.g., 100), this user is identified as an outlier. Alternatively, if a command is used by more than one user, the entropy of the command is calculated as follows:







$$P_i = \frac{C_i}{C_{\text{total}}}$$

$$E = -\frac{\sum_{i=1}^{n} P_i \times \log(P_i)}{\log(n)}$$

where $n$ is the total number of users who used the command, $C_i$ is the total number of clicks of the command by user $i$, and $C_{\text{total}}$ is the total number of clicks of the command.


If the entropy of a command is smaller than some threshold value (e.g., 0.5), and the average clicks per session by a user is larger than some other threshold number (e.g., 100), this user is identified as an outlier.


The outlier analysis outputs all (or some specified subset of) users who are identified as outliers, including the application for which the user is considered an outlier, the total number of application sessions the user had, the command of unusual usage, the total number of times the user used the command, and the number of application sessions in which the user used the command more than 100 times.


Additionally, the average occurrence per session of the command by this user may be considered, e.g., the total occurrence of the command divided by application session count of the user. If the average occurrence per session is greater than some number, such as one-hundred, the second criterion is met. In this example, any user who meets the two criteria can be grouped and reported; in this example, the user is likely using automation.
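The two criteria can be sketched for a single command as follows. The input dictionaries (clicks of the command per user, and application session count per user), the sample numbers and the function names are assumptions made for the example; the default thresholds are the example values 0.5 and 100 from the text. The entropy is normalized by log(n), so a command clicked equally often by every user yields 1.0.

```python
import math


def command_entropy(clicks_per_user):
    """Normalized entropy E of one command's click distribution over users."""
    counts = [c for c in clicks_per_user.values() if c > 0]
    n = len(counts)
    if n < 2:
        raise ValueError("entropy is only defined here for two or more users")
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts) / math.log(n)


def find_outliers(clicks_per_user, sessions_per_user,
                  entropy_threshold=0.5, clicks_threshold=100):
    """Users whose average clicks per session are high for an uneven command."""
    users = [u for u, c in clicks_per_user.items() if c > 0]
    if not users:
        return []
    # First criterion: a single user, or a low-entropy (uneven) distribution.
    uneven = (len(users) == 1
              or command_entropy(clicks_per_user) < entropy_threshold)
    if not uneven:
        return []
    # Second criterion: average occurrence per application session.
    return [u for u in users
            if clicks_per_user[u] / sessions_per_user[u] > clicks_threshold]


# Hypothetical data for one command of one application build.
clicks = {"u1": 5000, "u2": 3, "u3": 2, "u4": 1}
session_counts = {"u1": 10, "u2": 3, "u3": 2, "u4": 1}
outliers = find_outliers(clicks, session_counts)
print(command_entropy(clicks), outliers)
```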


To use a user group in analysis, a user group is defined and can thereafter be used in software feature usage analysis and application usage analysis. In the analysis configuration, the operator can specify the “User source” to be a user group. When the operator sets the user source to be a user group, the analysis is focused to that user group.


One approach to defining a user group is to define a set of sessions that meet certain criteria (block 462) based on per session data, define a user criterion specifying users whose sessions in a session set as a whole meet a certain criterion or criteria, and allow the specifying of multiple criteria mathematically combined in some way, e.g., using Boolean logic or weighted factors. For example, basic elements to define a user group may include user group, user criterion, union, intersection, and complement. Basic elements to define a session set may include: session set, session criterion, AND, OR and NOT. For example, in a user interface, the basic elements (or user modeling controls) may be listed on the left, with the user group and session set definition (user group modeling) on the right. To define a user group, the operator can drag the basic elements from the left to add to the right, and can also change the name of a session set or user group.


In one example implementation, a session criterion includes a “dimension” and a “value.” A dimension may be any variable recorded in a session (e.g., OfficeApplication), a feature, (e.g., copy and paste, typically comprising a series of commands), and/or variables that are commonly used but are not directly recorded in a session, but rather are calculated from variables that are recorded. For example, ImportantBuild is based on several variables such as OfficeProductVer, OfficeMajorVer, OfficeMinorVer and OfficeDotBuild.


Once the operator selects a dimension, the operator may specify the value or values that are of interest. For example, if “feature” is selected as the dimension, the operator can specify a feature file.


By default, the logical relationship between session criteria is AND. In the above example, for each session in the session set, by default the operator may specify that OfficeApplication=OneNote AND ImportantBuild=Office 12 Beta 1. The operator may specify other types of logical relationships by selecting the basic elements (e.g., dragging from the left to add to the right).


Once the operator has defined a session set, the session set may be used to define a user group, e.g., by selecting and dragging a user criterion to the right. The user criterion may be named, with the user criterion condition or conditions specified that a user's sessions need to meet. For example, to be considered a “OneNote12Beta1User,” a user needs to have at least one session that corresponds to a OneNote Beta 1 session.


Example measures that can be used to specify conditions are listed in the table below. The measures are calculated per user, e.g., for each user of the session set. In this example, if the chosen measure of a user meets the condition specified, the user is included in the user group:


















| Measure | Description |
| --- | --- |
| Average Session Length | Average session length of the sessions of a user. |
| Crash Ratio | Ratio of the number of sessions that crashed (crash count) to the total number of sessions of a user. |
| Depth of Usage | Percentage of total commands in software instrumentation data that are used by a user. |
| Enrollment Length | Time between earliest session and latest session of a user. |
| Failure Ratio | Ratio of the number of sessions that failed (such as crash and hang) to total number of sessions of a user. |
| MTTC (mean time to crash) | The ratio of total session length to the total number of sessions that crashed of a user. |
| MTTF (mean time to failure) | Ratio of total session length to total number of sessions that failed of a user. |
| Session Count | Total number of sessions of a user. |
| Session Frequency | Average time between consecutive sessions of a user's sessions. |
| Total Running Time | Total session time of a user's sessions. |
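Several of the per-user measures in the table above can be sketched as below. The row layout `(start_hour, end_hour, outcome)`, the outcome labels `"ok"`, `"crash"` and `"hang"`, and the function name are hypothetical. Enrollment length is measured here between the earliest and latest session start times, which is an assumption, and MTTC and MTTF are reported as `None` when the user has no crashed or failed sessions.

```python
def user_measures(rows):
    """Measures for one user; rows are (start_hour, end_hour, outcome)."""
    count = len(rows)
    total_time = sum(end - start for start, end, _ in rows)
    crashes = sum(1 for _, _, outcome in rows if outcome == "crash")
    failures = sum(1 for _, _, outcome in rows if outcome in ("crash", "hang"))
    starts = [start for start, _, _ in rows]
    return {
        "session_count": count,
        "total_running_time": total_time,
        "average_session_length": total_time / count,
        "crash_ratio": crashes / count,
        "failure_ratio": failures / count,
        "mttc": total_time / crashes if crashes else None,
        "mttf": total_time / failures if failures else None,
        "enrollment_length": max(starts) - min(starts),
    }


# Hypothetical sessions of a single user.
rows = [(0.0, 2.0, "ok"), (10.0, 11.0, "crash"),
        (20.0, 23.0, "hang"), (30.0, 32.0, "ok")]
measures = user_measures(rows)

# A user criterion condition can then be a simple predicate on a measure.
in_group = measures["crash_ratio"] < 0.5
print(measures, in_group)
```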










The operator may also specify other criteria, such as that the total time since the user's first session until now needs to be less than a month.


The relationship between the user criteria in a user group is “Intersection” by default, e.g., the above examples would specify that the user group “OneNote 12 Starters” is the intersection of “OneNote12Beta1Users” and “OneNote12Beta1LessThanAMonth” users. The operator may specify other types of relationships via the basic elements, e.g., by dragging the basic elements on the left to add to the right. In this way, straightforward user interface interaction defines a user group. Note that the operator can also define a user group in other ways, e.g., via links shown when hovering on the user count of a category (“bucket”) that if selected provides a “user groups” creation dialog, wizard or the like.
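The session set and user group definitions can be modeled with ordinary set operations. In the sketch below, the helper names `session_set` and `user_criterion`, the `day` field (days from an arbitrary origin) and the sample records are hypothetical; the dimension names and values are the ones used in the examples above. Session criteria are ANDed by default, and user criteria are intersected by default.

```python
# Hypothetical per-session records; "day" counts days from an arbitrary origin.
SESSIONS = [
    {"user": "u1", "OfficeApplication": "OneNote",
     "ImportantBuild": "Office 12 Beta 1", "day": 35},
    {"user": "u1", "OfficeApplication": "OneNote",
     "ImportantBuild": "Office 12 Beta 1", "day": 38},
    {"user": "u2", "OfficeApplication": "OneNote",
     "ImportantBuild": "Office 12 Beta 1", "day": 2},
    {"user": "u3", "OfficeApplication": "Word",
     "ImportantBuild": "Office 12 Beta 1", "day": 36},
]
NOW = 40  # hypothetical current day


def session_set(sessions, **criteria):
    """Sessions matching every dimension=value criterion (AND by default)."""
    return [s for s in sessions
            if all(s.get(dim) == value for dim, value in criteria.items())]


def user_criterion(session_subset, condition):
    """Users whose sessions in the subset, taken as a whole, meet a condition."""
    by_user = {}
    for s in session_subset:
        by_user.setdefault(s["user"], []).append(s)
    return {user for user, rows in by_user.items() if condition(rows)}


onenote_beta1 = session_set(SESSIONS, OfficeApplication="OneNote",
                            ImportantBuild="Office 12 Beta 1")
beta1_users = user_criterion(onenote_beta1, lambda rows: len(rows) >= 1)
less_than_a_month = user_criterion(
    onenote_beta1, lambda rows: NOW - min(r["day"] for r in rows) < 30)

starters = beta1_users & less_than_a_month   # intersection (the default)
either = beta1_users | less_than_a_month     # union
all_users = {s["user"] for s in SESSIONS}
non_beta1_users = all_users - beta1_users    # complement
print(starters, either, non_beta1_users)
```

Representing each user criterion as a set keeps union, intersection and complement as single operators, which mirrors the basic elements the operator drags together in the user interface.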


To analyze a user group once defined, the instrumentation data may be queried to get results for the user group. As represented in FIG. 4, the analysis criteria/parameters 460 and user group modeling based on filtering parameters 462 thus may be used to generate the query 470. The query results may then be formatted into the report 332.


Example query results that may be included in the report 332 may include some or all of the data set forth in the following table, as well as additional data:


















| Result | Description |
| --- | --- |
| User Count | Total number of users who are in the user group, i.e. who meet the criteria of the user group. |
| Data Characteristics | Total number of users and sessions for the applications and builds. |
| User counts | User counts for each user criterion and for each logical group. |
| Session counts | Session count for each session set, session criterion and logical group. |











FIG. 5 summarizes an overall example process, beginning at step 502 which represents collecting the software instrumentation data. As is readily understood, the software instrumentation data may be collected at any previous time, not necessarily just prior to analysis.


Step 504 represents obtaining the analysis criteria (e.g., application usage, command usage, trend analysis and/or others), and obtaining the user set, which may be all, external, internal, a user group and so forth as set forth above. Step 506 generates the query from the operator-input analysis and/or user filtering criteria.


Step 508 represents submitting the query against the software instrumentation data (in any appropriate format), with step 510 representing receiving the query results. Step 512 represents generating the report, which may include performing calculations on the results as needed to match the operator's requirements. For example, as described above, some of the report can include information that is not directly measured but is computed from a combination of two or more measured sets of data.


Exemplary Operating Environment


FIG. 6 illustrates an example of a suitable computing system environment 600 on which the collection, analysis and/or group modeling mechanisms (FIGS. 1 and 3) may be implemented. The computing system environment 600 is only one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the invention. Neither should the computing environment 600 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary operating environment 600.


The invention is operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable for use with the invention include, but are not limited to: personal computers, server computers, hand-held or laptop devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.


The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, and so forth, which perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in local and/or remote computer storage media including memory storage devices.


With reference to FIG. 6, an exemplary system for implementing various aspects of the invention may include a general purpose computing device in the form of a computer 610. Components of the computer 610 may include, but are not limited to, a processing unit 620, a system memory 630, and a system bus 621 that couples various system components including the system memory to the processing unit 620. The system bus 621 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus also known as Mezzanine bus.


The computer 610 typically includes a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by the computer 610 and includes both volatile and nonvolatile media, and removable and non-removable media. By way of example, and not limitation, computer-readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer 610. Communication media typically embodies computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer-readable media.


The system memory 630 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 631 and random access memory (RAM) 632. A basic input/output system 633 (BIOS), containing the basic routines that help to transfer information between elements within computer 610, such as during start-up, is typically stored in ROM 631. RAM 632 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 620. By way of example, and not limitation, FIG. 6 illustrates operating system 634, application programs 635, other program modules 636 and program data 637.


The computer 610 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only, FIG. 6 illustrates a hard disk drive 641 that reads from or writes to non-removable, nonvolatile magnetic media, a magnetic disk drive 651 that reads from or writes to a removable, nonvolatile magnetic disk 652, and an optical disk drive 655 that reads from or writes to a removable, nonvolatile optical disk 656 such as a CD ROM or other optical media. Other removable/non-removable, volatile/nonvolatile computer storage media that can be used in the exemplary operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, solid state ROM, and the like. The hard disk drive 641 is typically connected to the system bus 621 through a non-removable memory interface such as interface 640, and magnetic disk drive 651 and optical disk drive 655 are typically connected to the system bus 621 by a removable memory interface, such as interface 650.


The drives and their associated computer storage media, described above and illustrated in FIG. 6, provide storage of computer-readable instructions, data structures, program modules and other data for the computer 610. In FIG. 6, for example, hard disk drive 641 is illustrated as storing operating system 644, application programs 645, other program modules 646 and program data 647. Note that these components can either be the same as or different from operating system 634, application programs 635, other program modules 636, and program data 637. Operating system 644, application programs 645, other program modules 646, and program data 647 are given different numbers herein to illustrate that, at a minimum, they are different copies.


A user may enter commands and information into the computer 610 through input devices such as a tablet, or electronic digitizer, 664, a microphone 663, a keyboard 662 and pointing device 661, commonly referred to as a mouse, trackball or touch pad. Other input devices not shown in FIG. 6 may include a joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processing unit 620 through a user input interface 660 that is coupled to the system bus, but may be connected by other interface and bus structures, such as a parallel port, game port or a universal serial bus (USB). A monitor 691 or other type of display device is also connected to the system bus 621 via an interface, such as a video interface 690. The monitor 691 may also be integrated with a touch-screen panel or the like. Note that the monitor and/or touch screen panel can be physically coupled to a housing in which the computing device 610 is incorporated, such as in a tablet-type personal computer. In addition, computers such as the computing device 610 may also include other peripheral output devices such as speakers 695 and printer 696, which may be connected through an output peripheral interface 694 or the like.


The computer 610 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 680. The remote computer 680 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 610, although only a memory storage device 681 has been illustrated in FIG. 6. The logical connections depicted in FIG. 6 include one or more local area networks (LAN) 671 and one or more wide area networks (WAN) 673, but may also include other networks. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and the Internet.


When used in a LAN networking environment, the computer 610 is connected to the LAN 671 through a network interface or adapter 670. When used in a WAN networking environment, the computer 610 typically includes a modem 672 or other means for establishing communications over the WAN 673, such as the Internet. The modem 672, which may be internal or external, may be connected to the system bus 621 via the user input interface 660 or other appropriate mechanism. A wireless networking component 674, such as one comprising an interface and antenna, may be coupled through a suitable device such as an access point or peer computer to a WAN or LAN. In a networked environment, program modules depicted relative to the computer 610, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation, FIG. 6 illustrates remote application programs 685 as residing on memory device 681. It may be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.


An auxiliary subsystem 699 (e.g., for auxiliary display of content) may be connected via the user input interface 660 to allow data such as program content, system status and event notifications to be provided to the user, even if the main portions of the computer system are in a low power state. The auxiliary subsystem 699 may be connected to the modem 672 and/or network interface 670 to allow communication between these systems while the main processing unit 620 is in a low power state.


CONCLUSION

While the invention is susceptible to various modifications and alternative constructions, certain illustrated embodiments thereof are shown in the drawings and have been described above in detail. It should be understood, however, that there is no intention to limit the invention to the specific forms disclosed, but on the contrary, the intention is to cover all modifications, alternative constructions, and equivalents falling within the spirit and scope of the invention.

Claims
  • 1. In a computing environment, a method comprising: analyzing information corresponding to software instrumentation data collected from user sessions corresponding to one or more programs, including determining program usage metrics or determining command usage metrics, or determining program usage metrics and command usage metrics; and outputting information representative of the program usage metrics, the command usage metrics, or both the program usage metrics and the command usage metrics.
  • 2. The method of claim 1 further comprising, collecting the instrumentation data during actual user sessions.
  • 3. The method of claim 1 wherein determining the program usage metrics comprises determining for a set of users at least one of: session count information based on a number of application sessions, session frequency information based on a time measurement between sessions, running time information based on session time, session length information based on session time and session count, or depth of usage information based on a percentage of commands used, or any combination of session count information, session frequency information, running time information, session length information or depth of usage information.
  • 4. The method of claim 1 wherein determining the command usage metrics comprises determining for a set of users and a selected command at least one of: user count information based on a number of users of the set who use the selected command, percentage of users information corresponding to a percentage of users of the set who use the selected command, session count information based on a number of sessions in which the selected command occurred, percentage of session information corresponding to a percentage of application sessions in which the selected command was used, click count information corresponding to a number of clicks corresponding to the selected command, percentage of click count information corresponding to a percentage of program clicks corresponding to the selected command, click count per user information based on click count and user count of the selected command, or click count per session information corresponding to a click count per session, or any combination of user count information, percentage of users information, session count information, percentage of session information, click count information, percentage of click count information, click count per user information, or click count per session information.
  • 5. The method of claim 1 further comprising, analyzing the information corresponding to the software instrumentation data to determine at least one usage trend over time.
  • 6. The method of claim 5 wherein analyzing the information corresponding to the software instrumentation data to determine at least one usage trend over time comprises determining at least one of: user count information corresponding to a number of users that used a program during a reporting interval, session count information corresponding to a number of sessions of a program during a reporting interval, session count per user information based on a number of sessions and a total number of users that used a program during a reporting interval, running time per user information corresponding to a session length of users and a number of users that used a program during a reporting interval, cumulative session count per user information corresponding to a total number of sessions and a number of users from a start time of a period to analyze to an end of each reporting interval, or cumulative running time per user information corresponding to a session length of users that used a program and a number of users from a start time of a period to analyze to an end of each reporting interval, or any combination of user count information, session count information, session count per user information, running time per user information, cumulative session count per user information, or cumulative running time per user information.
  • 7. The method of claim 1 further comprising, analyzing the information corresponding to the software instrumentation data to determine at least one type of user.
  • 8. The method of claim 7 wherein analyzing the information corresponding to the software instrumentation data to determine at least one type of user comprises categorizing users by depth of usage, or categorizing users by types of activities in which they engage, or categorizing users by depth of usage and categorizing users by the types of activities in which they engage.
  • 9. The method of claim 1 further comprising, analyzing the information corresponding to the software instrumentation data to determine at least one potential outlier corresponding to command usage that appears different from command usage of other users.
  • 10. The method of claim 1 wherein analyzing the information corresponding to the software instrumentation data to determine at least one potential outlier comprises computing an entropy value corresponding to an occurrence distribution of a command, computing an average occurrence per session of the command usage by a potential outlier, and using the entropy value or the average occurrence per session, or both, as outlier criterion or criteria.
  • 11. The method of claim 1 further comprising, locating a subset of the user sessions based on at least one session criterion, wherein each session criterion comprises a dimension and a value for that dimension, and each dimension comprises a variable recorded in a session, a feature, or a set of one or more variables computed from a plurality of variables recorded in a session, and wherein analyzing the information corresponding to the software instrumentation data comprises analyzing the subset.
  • 12. The method of claim 11 wherein locating the subset comprises mathematically combining each of at least two session criteria.
  • 13. The method of claim 1 further comprising, providing a mechanism for modeling a user group, including providing an interface for receiving one or more measures that specify one or more conditions that any user needs to meet in order to belong to the user group, and locating the users from the information corresponding to the software instrumentation data based on the one or more measures.
  • 14. The method of claim 13 wherein locating the users comprises determining at least one of: session length information corresponding to a session length of the sessions of a user, crash information corresponding to a number of sessions of a user that crashed, depth of usage information corresponding to which commands were used by a user, enrollment length information corresponding to a time between an earliest session and a latest session of a user, failure information corresponding to a number of sessions that failed of a user, mean time to crash information corresponding to session length and sessions of a user that crashed, mean time to failure information corresponding to session length and number of sessions of a user that failed, session count information corresponding to a total number of sessions of a user, session frequency information corresponding to time between consecutive sessions of a user, or total running time information corresponding to total session time of sessions of a user, or any combination of session length information, crash information, depth of usage information, enrollment length information, failure information, mean time to crash information, mean time to failure information, session count information, session frequency information, or total running time information.
  • 15. A computer-readable medium having computer executable instructions, which when executed perform steps comprising, locating a subset of sessions that meet specified session criteria based on a set of session data, locating a subset of users based on users whose sessions meet specified user criteria based on the set of session data, and combining the subset of sessions with the subset of users via Boolean logic to produce a result set.
  • 16. The computer-readable medium of claim 15 wherein the session criteria comprise at least one session criterion and another session criterion combined with each other via Boolean logic, and wherein the user criteria comprise at least one user criterion and another user criterion combined with each other via Boolean logic.
  • 17. In a computing environment, a system comprising: an analyzer that processes information corresponding to software instrumentation data recorded from user software program usage sessions to produce a first subset comprising software usage data; a group modeling mechanism that processes the information corresponding to the software instrumentation data to produce a second subset comprising user data; and means for combining the first subset with the second subset to provide an output that corresponds to a selected group of users and their software program usage.
  • 18. The system of claim 17 further comprising a user interface for facilitating selection of one or more criteria by which the first subset and second subset are located from the information.
  • 19. The system of claim 17 further comprising means for recording the software instrumentation data.
  • 20. The system of claim 17 wherein the analyzer comprises means for performing at least one of an application usage analysis, a command usage analysis, or a trend analysis, or any combination of an application usage analysis, a command usage analysis, or a trend analysis.
CROSS-REFERENCE TO RELATED APPLICATIONS

The present application is related to the following copending U.S. patent applications, assigned to the assignee of the present application, filed concurrently herewith and hereby incorporated by reference: Software Reliability Analysis Using Alerts, Asserts, and User Interface Controls, U.S. patent application Ser. No. ______ (attorney docket no. 319768.01); Multidimensional Analysis Tool for High Dimensional Data, U.S. patent application Ser. No. ______ (attorney docket no. 319769.01); Efficient Data Infrastructure for High Dimensional Data Analysis, U.S. patent application Ser. No. ______ (attorney docket no. 319771.01); Software Feature Usage Analysis and Reporting, U.S. patent application Ser. No. ______ (attorney docket no. 319772.01); Software Feature Modeling and Recognition, U.S. patent application Ser. No. ______ (attorney docket no. 319773.01); and Analyzing Software Usage with Instrumentation Data, U.S. patent application Ser. No. ______ (attorney docket no. 319774.01).