The present invention relates to the field of computer software development. In particular, but not by way of limitation, the present invention discloses techniques for analyzing software development and predicting software defect rates for planning purposes.
Managing computer software development is a notoriously difficult task that has been studied for many years. Predicting how long it will take to develop, test, and debug a particular software product is often more art than science. The difficulties in planning, scheduling, and managing software development have long caused problems for software development teams since these software development teams must also interact with customers and marketing teams that want to have reliable software development schedules for planning purposes.
For example, software development teams often have a difficult time projecting an accurate release date for a new software product since the amount of time required to create a software application is difficult to estimate. Compounding this problem is the fact that the amount of time required to thoroughly test and debug a new software product is also very difficult to forecast. The lack of an accurate release date makes it difficult for marketing and advertising teams to plan their sales campaigns. The lack of an accurate release date also complicates the financial planning for a company since it is not known how much software development will cost and when revenue from a product release will begin to be collected.
Even after a software product is eventually released, it can be very difficult to manage the support of that released software product because it is hard to accurately determine the amount of support staff that will be required to fix the bugs that customers find within a newly released software product. Proper post-release planning is required because if a newly released software product is not properly supported then the reputation of the newly released software product and the company that created the software product will suffer.
The difficulties in forecasting software development schedules and forecasting the amount of post-release support that will be required for a software product have long made software development a very difficult business risk. Thus, it would be desirable to improve the techniques for software development and release planning.
In the drawings, which are not necessarily drawn to scale, like numerals describe substantially similar components throughout the several views. Like numerals having different letter suffixes represent different instances of substantially similar components. The drawings illustrate generally, by way of example, but not by way of limitation, various embodiments discussed in the present document.
The following detailed description includes references to the accompanying drawings, which form a part of the detailed description. The drawings show illustrations in accordance with example embodiments. These embodiments, which are also referred to herein as “examples,” are described in enough detail to enable those skilled in the art to practice the invention. It will be apparent to one skilled in the art that specific details in the example embodiments are not required in order to practice the present invention. For example, although some of the example embodiments are disclosed with specific reference to computer software development, many of the teachings of the present disclosure may be used in many other environments that involve scheduling the development and support of complex projects wherein various project metrics can be obtained. For example, a complex construction project that involves many different subcontractors may use many of the same techniques for managing the construction project. The example embodiments may be combined, other embodiments may be utilized, or structural, logical and electrical changes may be made without departing from the scope of what is claimed. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope is defined by the appended claims and their equivalents.
In this document, the terms “a” or “an” are used, as is common in patent documents, to include one or more than one. In this document, the term “or” is used to refer to a nonexclusive or, such that “A or B” includes “A but not B,” “B but not A,” and “A and B,” unless otherwise indicated. Furthermore, all publications, patents, and patent documents referred to in this document are incorporated by reference herein in their entirety, as though individually incorporated by reference. In the event of inconsistent usages between this document and those documents so incorporated by reference, the usage in the incorporated reference(s) should be considered supplementary to that of this document; for irreconcilable inconsistencies, the usage in this document controls.
Computer Systems
The present disclosure concerns techniques for improving the scheduling and support of software development projects. To monitor the software development, computer systems may be used.
The example computer system 100 of
The disk drive unit 116 includes a machine-readable medium 122 on which is stored one or more sets of computer instructions and data structures (e.g., instructions 124 also known as ‘software’) embodying or utilized by any one or more of the methodologies or functions described herein. The instructions 124 may also reside, completely or at least partially, within the main memory 104 and/or within a cache memory 103 associated with the processor 102. The main memory 104 and the cache memory 103 associated with the processor 102 also constitute machine-readable media.
The instructions 124 may further be transmitted or received over a computer network 126 via the network interface device 120. Such transmissions may occur utilizing any one of a number of well-known transfer protocols such as the well-known File Transfer Protocol (FTP). While the machine-readable medium 122 is shown in an example embodiment to be a single medium, the term “machine-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. The term “machine-readable medium” shall also be taken to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies described herein, or that is capable of storing, encoding or carrying data structures utilized by or associated with such a set of instructions. The term “machine-readable medium” shall accordingly be taken to include, but not be limited to, solid-state memories, optical media, and magnetic media.
For the purposes of this specification, the term “module” includes an identifiable portion of code, computational or executable instructions, data, or computational object to achieve a particular function, operation, processing, or procedure. A module need not be implemented in software; a module may be implemented in software, hardware/circuitry, or a combination of software and hardware.
Traditional Approach
Predictive analytics is the analysis of recent operations to predict future outcomes, using information learned from experience in the past. After creating a set of predictions, a user of a predictive analytics system may then take corrective action to avoid a predicted detrimental future outcome. Specifically, analysis of recent operations is used to determine future outcomes, based on past behavior so that corrective action can be taken today. This is graphically illustrated in
Referring to
Combining the insight from the past with the informational metrics from the present provides foresight such that predictions of the future can be made. Based upon the predictions of the future, a manager can take corrective action which will change the predicted outcome of the future. Thus, predictive analytics provides a substantial amount of information that can help software managers and executives including product ship dates, customer satisfaction, revenue estimates, etc.
The traditional approach of performing predictive analytics for planning and scheduling a software project is based upon simple bug tracking. All of the bugs discovered within a software program being developed are tracked with a bug tracking system and the rate at which bugs are being discovered provides some guidance as to how the software development is proceeding.
An actual bug rate 310 may be linearly extrapolated to form the simple estimation 315 of the bug rate at the release date as illustrated in
The current actual bug rate may be compared to bug rates of previous products to come up with a revised bug prediction. For example, one may scale last year's bug rate curve 320 to match this year's current bug rate data 310 to generate an improved bug prediction 325. This improved bug prediction 325 is likely to be better than the simple linear estimation 315 since the improved bug prediction 325 more accurately incorporates the realities of software development processes. However, this improved bug prediction 325 is also likely to be inaccurate since every software project is different and just a simple mapping of a previous bug rate 320 onto a current bug rate will result only in a simple prediction that will only be accurate if the two development scenarios are very similar.
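By way of illustration only, the two traditional estimates described above may be sketched as follows. The weekly bug counts, the function names, and the use of a least-squares scale factor to match the previous curve to the current data are assumptions made for this sketch and are not taken from the disclosed figures.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def linear_estimate(bug_counts, release_week):
    """Simple estimation: extend the current bug rate as a straight line."""
    slope, intercept = fit_line(list(range(len(bug_counts))), bug_counts)
    return slope * release_week + intercept


def scaled_history_estimate(previous_curve, bug_counts, release_week):
    """Improved estimation: scale a previous project's curve to the current data.

    The scale factor is the least-squares ratio between the current counts
    and the previous curve over the weeks observed so far (an assumption).
    """
    observed = previous_curve[:len(bug_counts)]
    scale = (sum(c * p for c, p in zip(bug_counts, observed))
             / sum(p * p for p in observed))
    return scale * previous_curve[release_week]


# Hypothetical weekly bug counts; the list index is the week number.
last_year = [5, 12, 20, 26, 28, 25, 18, 10, 4]
this_year = [10, 24, 40, 52]  # first four weeks of the current project

linear = linear_estimate(this_year, release_week=8)
scaled = scaled_history_estimate(last_year, this_year, release_week=8)
```

With this made-up data the straight-line estimate keeps climbing through the release week, while the scaled curve follows the rise and fall of the earlier project, which is the difference between the two estimates discussed above.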
However, most software development projects are very different from each other. For example, what if the current software development project was attempting to add several more complex features than the previous software development projects? The more complex current software development project would likely lead to more bugs. Thus,
Problems with the Traditional Approach
Current predictive analytics systems that rely only on bug rates ignore too much of the activity that is occurring during the software development process. For example, the amount of testing being performed should be considered. If there is a large amount of testing then more bugs will be discovered. However, more bugs discovered due to more testing does not necessarily mean the code is worse than previous code; it is simply more thoroughly tested.
Current predictive analytics systems that rely only on bug rates also ignore the “volume” of software code being analyzed. If the current software development project is much larger than previous software development projects there will generally be more bugs in the current larger software development project. But if the larger number of bugs is proportional to the larger size of the current software development project, the larger number of bugs may not signal any significant problem with the current software development project. Furthermore, if a large number of new features are being added to the current software development project, these new features may be more vulnerable to having bugs than code written to implement well-known features that have been created in previous software projects.
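As a brief worked illustration of the volume point, raw bug counts may be normalized by code size before two projects are compared. The figures below are invented for the example, and defects per thousand lines of code is only one possible normalization.

```python
def defects_per_kloc(bug_count, lines_of_code):
    """Bugs per thousand lines of code."""
    return bug_count / (lines_of_code / 1000)


# Hypothetical projects: the current one has 2.5 times as many bugs
# but is also 2.5 times as large.
previous = defects_per_kloc(bug_count=120, lines_of_code=80_000)
current = defects_per_kloc(bug_count=300, lines_of_code=200_000)
```

In this example both projects come out at 1.5 defects per thousand lines, so the larger raw count alone would not indicate a problem.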
Current predictive analytics systems that rely only on bug rates may also ignore the “density” of software code being analyzed. Equally sized software development projects may have different levels of complexity. For example, if a project has multiple different code threads that run on different cores of a processor and each thread must carefully interoperate with the other concurrently executing threads then such a software development project will be inherently more complex than a single-threaded software program that runs on a single processor even if both software development projects have the same number of lines of code. Thus, one would expect to have more bugs in an inherently complex software development project.
A key insight here is that the traditional approach to predictive analytics that only uses bug rate tracking can have problems because software bugs are a lagging indicator. Software bugs only indicate problems that have been discovered and are poor indicators as to problems that will be encountered later. And depending on the specific context, bugs discovered during a software development project may be either positive or negative indicators. For example, a larger number of bugs may actually be a positive indicator if this larger number of bugs was discovered by extremely thorough testing. Conversely, a large number of bugs may also indicate significant problems with the software being developed.
An Improved Approach Using More Information
To improve upon the predictive analytics for software development, the present disclosure discloses a predictive analytics system that collects much more information about the software development project to create significantly better predictions of future outcomes. The new information collected about the software project is combined with previously used indicators (such as bug rate tracking) in a synergistic manner that greatly improves the accuracy of the predictions that can be made. Recent research has revealed that there indeed are several software code metrics that are highly correlated with quality. Measuring these software code metrics and implementing them within a predictive analytics system can greatly improve the predictive analytics system.
Three different groups of significant factors have been identified as important and implemented in the predictive analytics system: code complexity, code churn, and development process factors. Code complexity may be defined as a set of metrics that may be extracted from the actual software code itself and which provide a measure as to the complexity of created software code. Code churn may be defined as the set of interactions between humans (programmers and testers) and the actual software code. Finally, the development process factors are a set of factors that affect the software development process such as the number of new features being added, the amount the code is exposed to consumers, and the code ownership.
The number of global variables written to in a software file is generally highly correlated to the defect rate of software. With global variables, many different entities can access the global variable such that any one of them may cause an error and determining which one caused the error may be difficult. Note that these particular code complexity metrics listed in
All of these code complexity metrics may be collected on a localized basis (per method, per class, etc.) and used to perform local analysis for individual methods, classes, etc. In this manner, predictions made on local code regions may be used to allocate resources to code areas where there may be localized trouble. The code complexity metrics may also be combined together for a larger project basis view.
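As one hedged illustration of collecting a complexity metric on a localized basis and then combining it for a larger view, the following sketch counts the global variables written by each function in a Python source file. The choice of Python as the analyzed language, the function name `globals_written_per_function`, and the sample source are assumptions for this sketch; the disclosure does not specify how the metric is extracted.

```python
import ast


def globals_written_per_function(source):
    """Map each function name to the set of global variables it assigns.

    Simplifications (assumptions of this sketch): only names declared with
    a `global` statement are considered, and nested functions are counted
    as part of the enclosing function as well as on their own.
    """
    result = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        declared, stored = set(), set()
        for child in ast.walk(node):
            if isinstance(child, ast.Global):
                declared.update(child.names)
            elif isinstance(child, ast.Name) and isinstance(child.ctx, ast.Store):
                stored.add(child.id)
        result[node.name] = declared & stored
    return result


# Hypothetical file contents used only to exercise the sketch.
sample = '''
counter = 0
cache = {}

def bump():
    global counter
    counter += 1

def reset():
    global counter, cache
    counter = 0
    cache = {}

def read_only():
    return counter
'''

per_function = globals_written_per_function(sample)
file_level = set().union(*per_function.values())
```

The per-function sets support the local analysis described above, while the union gives the number of global variables written to in the file as a whole.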
Additional code churn metrics may include the sum over all revisions of the lines of code added to a file, the sum of all lines of code minus the deleted lines of code over all revisions, the maximum number of files committed together, and the age of a file in weeks counted backwards from the release time. In general, the less a particular section of software code has been altered, the more likely that software code is to be stable. Furthermore, a series of relatively small or simple changes to a section of code, generally accompanied by testing (which also may be tracked), is correlated with fewer bugs for that code section.
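The four churn metrics listed above may be computed from a file's revision history roughly as sketched below. The `Revision` record layout and the sample history are hypothetical; an actual source code control system would supply the equivalent fields in its own format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Revision:
    """One committed revision touching a file (hypothetical record layout)."""
    day: date
    lines_added: int
    lines_deleted: int
    files_in_commit: int  # number of files committed together


def churn_metrics(revisions, release_day):
    """Compute the churn metrics for one file from its revision history."""
    first_day = min(r.day for r in revisions)
    return {
        "total_lines_added": sum(r.lines_added for r in revisions),
        "net_lines": sum(r.lines_added - r.lines_deleted for r in revisions),
        "max_files_committed_together": max(r.files_in_commit for r in revisions),
        "age_weeks": (release_day - first_day).days // 7,
    }


history = [
    Revision(date(2011, 1, 3), lines_added=120, lines_deleted=0, files_in_commit=4),
    Revision(date(2011, 2, 14), lines_added=30, lines_deleted=10, files_in_commit=2),
    Revision(date(2011, 3, 7), lines_added=5, lines_deleted=25, files_in_commit=9),
]
metrics = churn_metrics(history, release_day=date(2011, 6, 6))
```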
Referring back to the predictive analytics system 500 diagram of
The source code control system 581 tracks when any source code is changed, who changed the source code, a description of the changes made, an identifier token for the feature being added or the defect being fixed by the change, and any reviewers of the change. In addition, the system may determine the version branch impact of the code changes. In one embodiment, the system handles the existing version branching structure and can analyze the version branching without requiring any changes.
In addition to the source code control system 581, a bug tracking system 583 (also known as a defect tracking system) can provide a wealth of code churn information. For each bug that has been identified, the bug tracking system 583 may maintain a bug identifier token, a bug description, a title, the name of the person that found the bug, an identifier of the component with the bug, the specific version release with the bug, the specific hardware platform with the bug, the date the bug was identified, a log of changes made to address the bug, the name of the developer and/or manager assigned to the bug, whether the bug is interesting to a customer, the priority of the bug, the severity of the bug, and other custom fields. When a particular bug tracked by the bug tracking system 583 is addressed by a programmer, the programmer will indicate which particular bug was being addressed using the bug identifier token. The source code control system 581 may then update all the associated information such as the log of changes made to address the bug and the specific code segments modified. Thus, the number of times a code section has been modified due to bug-fixing can be tracked. If a bug is associated with a new feature being added, the system may also provide a link to the feature in the feature tracking system 589.
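One possible way to track how often a code section has been modified due to bug fixing is to scan commit descriptions for bug identifier tokens, as in the sketch below. The token format `BUG-<number>`, the commit tuple layout, and the file names are assumptions for illustration and are not the format of any particular bug tracking system.

```python
import re
from collections import Counter

# Hypothetical bug identifier token format, e.g. "BUG-101".
BUG_TOKEN = re.compile(r"\bBUG-(\d+)\b")


def bug_fix_counts(commits):
    """Count, per file, the commits whose description cites a bug token."""
    counts = Counter()
    for description, files in commits:
        if BUG_TOKEN.search(description):
            counts.update(files)
    return counts


commits = [
    ("Fix BUG-101: null check in parser", ["parser.c"]),
    ("Add export feature FEAT-7", ["export.c", "ui.c"]),
    ("BUG-102 and BUG-105: race in scheduler", ["sched.c", "parser.c"]),
]
fixes = bug_fix_counts(commits)
```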
In one embodiment of the predictive analytics system 500 of the present disclosure, the predictive analytics system 500 may provide feedback directly into some of the programming support tools. For example, referring to
A third set of metrics that may be tracked are a set of software development process factors that may be referred to as ‘process’ metrics. These process metrics keep track of various activities that occur during software development such as testing, adding new features, “ownership” of code sections by various programmers, input from beta-testing sites, etc.
One particularly important process metric to analyze is “orphan” analysis of the source code. When one or two programmers work on a particular section of source code, those one or two programmers are said to “own” that code and tend to take responsibility for that code. However, if there is a section of code that is accessed by numerous different programmers, the various different programmers may make contradictory modifications to that section of code such that defects become more likely.
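A minimal sketch of such an ownership check follows, assuming the change log is available as (file, author) pairs. The threshold of two owners mirrors the "one or two programmers" wording above but is otherwise an assumption, as are the file and author names.

```python
from collections import defaultdict


def find_orphans(changes, max_owners=2):
    """Return the files modified by more than `max_owners` distinct authors."""
    authors = defaultdict(set)
    for path, author in changes:
        authors[path].add(author)
    return sorted(path for path, who in authors.items() if len(who) > max_owners)


changes = [
    ("net/socket.c", "alice"),
    ("net/socket.c", "bob"),
    ("net/socket.c", "alice"),
    ("ui/menu.c", "alice"),
    ("ui/menu.c", "carol"),
    ("ui/menu.c", "dave"),
    ("ui/menu.c", "erin"),
]
orphans = find_orphans(changes)
```

Here `net/socket.c` has two owners and is not flagged, while `ui/menu.c` has four different authors and is reported as a candidate orphan.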
Referring again to
Brand new features are generally more difficult to create than well-known features such that the bug rates may be expected to be higher. In one embodiment, each new feature is rated with a complexity score. For example, each feature may be rated as high, medium, or low in complexity such that each new feature is not treated exactly the same since some new features are more difficult to add than others.
The amount of marketing exposure can also be used to help track the progress of software development. Referring to
In summary, the present disclosure proposes tracking a much larger amount of information than is tracked by conventional bug tracking systems in order to improve predictive analytics during software development. Specifically, in addition to traditional bug tracking, an improved predictive analytics system will track many code complexity features (that can generally be extracted from the source code), many code churn statistics describing the interaction between programmers and the source code (that can often be extracted from source code control systems), and many software development process metrics such as the number of new features being added, the amount of testing being performed on the various code sections, and feedback from customers.
Improved Predictive Analytics System
All of the metrics described in the previous section are collected and used within a predictive analytics system 500 that predicts the future progress of the software development. Specifically, all of the metrics described in the previous section are collected within a current project development metrics database 530. All of the metrics within the current project development metrics database 530 provide a deep quantified measure of how the software project development is progressing. A predictive analysis engine 521 processes information from the current project development metrics database 530 along with a previous software development history and system model 550 to develop a set of current predictions 525 for the current software development project.
At the bottom of
The predictive analysis engine processes all of the data received to generate useful predictive analytic information. In
The pre-release defect rate information provided to the user may be used to guide the software development effort. For example, the pre-release defect rate may specify particular areas of software development project code that are more likely to have defects. This information can be used to allocate software development resources to those particular code sections. For example, more testing may be done on those code sections. If the predicted pre-release defect rate appears to be too high, the software project managers may decide to eliminate some new features in order to reduce the complexity of the software project in order to ensure a more stable software product upon release.
The post-release defect rate provides an estimate of how many customer found defects (CFDs) will be reported by customers. The post-release defect rate can be used to plan for the post-release customer support efforts. The number of customer support responders and programmers needed to address customer found defects may be allocated based on the post-release defect rate. If the predicted post-release defect rate is deemed too high, the release date of the product may be postponed to improve the product quality before release.
Referring again to the
Note that as a project progresses, additional bug tracking information will be provided on the current project. This additional information can be used to create a feedback loop 713 to the dependency analyzer as depicted in
Many different predictive analysis systems may be used to implement the predictor. For example, the statistical techniques of multi-collinearity, logistic regression, and hierarchical clustering may be used to make predictions based on the previous data. Various different artificial intelligence techniques may also be used. For example, Bayesian inference, neural networks, and support vector machines may also be used to create new predictions based on the current project information (bug tracking, code complexity, code churn, etc.) in view of the experience data collected from previous projects that is stored within the representative data model.
In one particular embodiment, the primary techniques used in the predictor system include Principal Component Regression (one application of principal component analysis), factor analysis, auto regression, and parametric forms of defect curves. These particular techniques have proved to provide accurate defect forecasting results for both pre-release and post-release defects in the software development project.
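Of the techniques named above, auto regression is the simplest to illustrate. The sketch below fits a first-order autoregressive model to a weekly defect series by least squares and projects it forward. The model order, the function names, and the data are assumptions for this example; the disclosed predictor is not limited to, and may differ from, this form.

```python
def fit_ar1(series):
    """Least-squares fit of y[t] = a + b * y[t-1]; returns (a, b)."""
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def forecast(series, steps):
    """Project the series forward by repeatedly applying the fitted model."""
    a, b = fit_ar1(series)
    value, projected = series[-1], []
    for _ in range(steps):
        value = a + b * value
        projected.append(value)
    return projected


# Hypothetical weekly defect counts that decay toward a floor of 8.
weekly_defects = [64.0, 36.0, 22.0, 15.0, 11.5]
a, b = fit_ar1(weekly_defects)
next_two_weeks = forecast(weekly_defects, steps=2)
```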
For comparison, a set of simple predictions from a bug-tracking only based system is drawn on the same graph. As illustrated in
Customer found defects (CFDs) represent only one of the many sets of predictions that can be made by the improved predictive analytics system.
The predictive analytics system can be used to determine a proper ship date given a quality standard that must be met. A projected ship date based upon empirical objective statistics can be used to determine whether a release date desired by executive management should be postponed. Without such an objective figure, internal office politics may allow poor decisions to be made on whether to ship a product or not.
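One way a ship date could be derived from a forecast is sketched below, assuming the quality standard is expressed as a maximum predicted weekly defect rate that must hold from the ship week onward. That interpretation of the quality standard, the function name, and the forecast values are all assumptions of this sketch.

```python
def earliest_ship_week(predicted_weekly_defects, quality_threshold):
    """Return the first week from which the forecast stays at or below the threshold.

    Returns None when the forecast never settles within the threshold.
    """
    for week in range(len(predicted_weekly_defects)):
        remaining = predicted_weekly_defects[week:]
        if all(rate <= quality_threshold for rate in remaining):
            return week
    return None


# Hypothetical forecast; the list index is the week number.
forecast_rates = [40, 31, 18, 9, 12, 7, 5]
ship_week = earliest_ship_week(forecast_rates, quality_threshold=10)
```

In this example week 3 dips below the threshold but week 4 rises above it again, so the sketch reports week 5 as the earliest defensible ship week.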
The predictive analytics system can be used to determine the amount of resources that will likely be required to provide good post-release support for a product. Once a product ships, a software development project needs to hire support staff to handle support calls received from the customers of the product. Furthermore, engineering resources need to be allocated to the software development project in order to remedy the various customer found defects. Thus, the predictive analytics system can be used to make budgeting and hiring decisions for post-release customer support.
The improved predictive analytics system disclosed in this document can be used to significantly improve the software development process by providing objective analysis of the software development project and a set of objective predictions for the software development project. Providing objective analysis from an automated predictive analysis system can help remove many of the subjective decisions made by software managers that can be controversial and often very wrong. Traditional bug rate-only analysis is too simplistic to provide accurate results since reported bugs are lagging indicators that only describe defects that have already been found. By using other detailed information about a software project including code complexity, code churn, new features, and testing information in addition to traditional bug tracking, much more accurate predictions can be made. Most of the additional information can easily be obtained by automated processing of the source code, retrieving information from source code control systems, retrieving information from testing databases, and retrieving information from feature request systems. This additional data reflects the future bug risk inherent in the software project instead of just the problems found so far with bug tracking. The predictions made by the improved predictive analytics system can then be used to provide better scheduling and resource allocations.
Improved Predictive Analytics System
To fully describe how the predictive analytics system of the present disclosure operates, a full example of its application is disclosed with reference to the flow chart of
The predictive analytics system then builds a statistical model of the software development process based upon all of the information collected. The statistical model correlates the various code complexity, code churn, and process metrics to an observed set of software defect rates. Referring back to
Next, at stage 1020, the system collects a set of code complexity, code churn, and process metrics for a current software development project. As set forth in the previous sections, the collection of these metrics is largely performed in a manner that is completely transparent to the programmers and managers working on the project. Referring back to
Referring back to
During the processing of the current project's collected metrics 530, the predictive analytics system 500 may feed back some of the recently collected metrics from the current project into the statistical model 550. In this manner, the predictive analytics system 500 is continually updated with more recent experience. Furthermore, the information stored within the statistical model 550 may be weighted depending on the age of the information. By continually adding new information and weighting the information by age, the predictive analytics system 500 will continually adjust the predictions made based upon the way the software development team changes their practices. Thus, as a software development team uses a predictive analytics system 500, that software development team will change the way they work based upon the advice they receive from the predictive analytics system 500. This in turn will change defect rates. Thus, having a feedback system that continually adjusts the statistical model 550 of the predictive analytics system 500 with the latest information will ensure that predictions continue to be accurate.
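Age weighting of this kind may, for example, be realized with an exponential decay, as in the following sketch. The disclosure does not state a weighting function, so the half-life form, the 26-week half-life, and the sample observations are assumptions chosen only to make the idea concrete.

```python
def age_weight(age_weeks, half_life_weeks=26.0):
    """Weight that halves every `half_life_weeks` (assumed decay form)."""
    return 0.5 ** (age_weeks / half_life_weeks)


def weighted_defect_rate(observations):
    """Age-weighted mean of (age in weeks, defects per KLOC) observations."""
    weights = [age_weight(age) for age, _ in observations]
    weighted_sum = sum(w * rate for w, (_, rate) in zip(weights, observations))
    return weighted_sum / sum(weights)


# Hypothetical history: the team's defect rate has been improving over time.
observations = [(0, 2.0), (26, 4.0), (52, 8.0)]
recent_biased = weighted_defect_rate(observations)
plain_mean = sum(rate for _, rate in observations) / len(observations)
```

Because the most recent observation carries the largest weight, the weighted estimate sits closer to the team's current practice than the plain average does.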
After analyzing the current state of a software development project as reflected in the current project's collected metrics 530, the predictive analytics system 500 will display a forecast of the current software development project at stage 1040.
Displaying the forecast provides some useful information to the software manager. However, to provide more useful information, additional displays of information are made available to the software manager using the predictive analytics system 500. Thus, at stage 1050, the system displays a visual representation of the model that shows the relative importance of the various metrics. In one embodiment, the relative importance is displayed with a color coding system. This display allows a software manager to know which metrics are very important to handle properly. Conversely, this also allows the software manager to see which factors are not very important and probably not worth focusing on. The relative importance of the metrics is extracted from the statistical model 550 of the predictive analytics system 500. Note that the importance of the metrics will depend on what the system learned from the previous software development projects. Thus, for the best advice, the system should use a collection of metrics collected from the same development team and tools.
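As a hedged sketch of how relative importance might be extracted and color coded, the example below assumes the statistical model exposes one standardized coefficient per metric and ranks metrics by each coefficient's share of the total absolute weight. The coefficient values, metric names, color thresholds, and color names are all hypothetical.

```python
def relative_importance(standardized_coefficients):
    """Share of total absolute coefficient weight attributed to each metric."""
    total = sum(abs(c) for c in standardized_coefficients.values())
    return {name: abs(c) / total for name, c in standardized_coefficients.items()}


def color_for(share):
    """Map an importance share onto a three-level color code (assumed cutoffs)."""
    if share >= 0.4:
        return "red"
    if share >= 0.2:
        return "yellow"
    return "green"


# Hypothetical standardized coefficients taken from a fitted model.
coefficients = {
    "global_writes": 4.0,
    "code_churn": 2.0,
    "distinct_authors": -1.5,
    "file_age_weeks": -0.5,
}
shares = relative_importance(coefficients)
colors = {name: color_for(share) for name, share in shares.items()}
```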
After displaying the important metrics in the model, the system may then proceed to stage 1060 where the predictive analytics system displays the most important metrics affecting the current predictions. Thus, specific issues with the current software development project may be causing abnormally large risks. For example, a set of widely used global variables may be introducing a high risk to this particular project even though that is not often a problem with this team's projects. By highlighting the specific factors that are most important for this project, the software manager can take direct actions to address those issues. In one embodiment, the user is able to change certain metrics to see how the changes adjust the forecast. In this manner the user can see how different changes to the development process will affect the outcome.
Finally, at stage 1070, the predictive analytics system 500 may employ an expert system 527 to process the current predictions 525 and output a set of specific recommendations to address the highest-risk areas of the current software development project. For example, a set of general recommendations for minimizing the risks presented by the metrics identified in stage 1050 as highly important to the model will be presented. Similarly, the expert system 527 may include a set of specific recommendations for addressing the specific problem areas identified in stage 1060 that are strongly affecting this current software development project.
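The what-if capability may be illustrated with a deliberately simple stand-in for the statistical model. The linear form, the weights, the baseline, and the metric names below are assumptions for this sketch and do not represent the disclosed model.

```python
def predict_defects(metrics, weights, baseline):
    """Toy linear forecast: baseline plus a weighted sum of metric values."""
    return baseline + sum(weights[name] * value for name, value in metrics.items())


def what_if(metrics, weights, baseline, **changes):
    """Change in the forecast when some metrics are replaced by new values."""
    adjusted = {**metrics, **changes}
    return (predict_defects(adjusted, weights, baseline)
            - predict_defects(metrics, weights, baseline))


# Hypothetical model parameters and current project metrics.
weights = {"global_writes": 2.0, "weekly_churn_kloc": 4.0, "new_features": 3.0}
current = {"global_writes": 6, "weekly_churn_kloc": 1.5, "new_features": 5}

forecast_now = predict_defects(current, weights, baseline=10.0)
delta = what_if(current, weights, 10.0, new_features=3)
```

In this toy example, dropping two planned features lowers the forecast by six defects, which is the kind of comparison a manager could use when deciding whether to cut scope.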
The preceding technical disclosure is intended to be illustrative, and not restrictive. For example, the above-described embodiments (or one or more aspects thereof) may be used in combination with each other. Other embodiments will be apparent to those of skill in the art upon reviewing the above description. The scope of the claims should, therefore, be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled. In the appended claims, the terms “including” and “in which” are used as the plain-English equivalents of the respective terms “comprising” and “wherein.” Also, in the following claims, the terms “including” and “comprising” are open-ended, that is, a system, device, article, or process that includes elements in addition to those listed after such a term in a claim is still deemed to fall within the scope of that claim. Moreover, in the following claims, the terms “first,” “second,” and “third,” etc. are used merely as labels, and are not intended to impose numerical requirements on their objects.
The Abstract is provided to comply with 37 C.F.R. §1.72(b), which requires that it allow the reader to quickly ascertain the nature of the technical disclosure. The abstract is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. Also, in the above Detailed Description, various features may be grouped together to streamline the disclosure. This should not be interpreted as intending that an unclaimed disclosed feature is essential to any claim. Rather, inventive subject matter may lie in less than all features of a particular disclosed embodiment. Thus, the following claims are hereby incorporated into the Detailed Description, with each claim standing on its own as a separate embodiment.
The present patent application claims the benefit of the previous U.S. Provisional Patent Application entitled “Methods and Apparatus for Providing Predictive Analytics for Software Development” filed on Nov. 9, 2011 having Ser. No. 61/557,891.
Number | Date | Country
---|---|---
61557891 | Nov 2011 | US