Conventional programming environments include one or more database(s) holding many computer programming elements (CPEs) that may be interconnected to provide expected resources and proper functionality to an end-user (e.g., a client using a computing system).
Currently, a plurality of computer programmers may contribute to the development of hundreds to thousands of CPEs stored on multiple computers hosting the programs made up of the elements. Not all CPEs depend upon one another, but with so many CPEs developed for particular computing systems and applications, there are numerous dependencies between multiple CPEs that make up particular programs like operating systems and browsers.
Thus, domain knowledge and management experience associated with dependent CPEs are reduced or lost when a programming task is transferred from one programmer to another. With this lack of domain knowledge and management, errors committed by computer programmers making one or more modifications or updates to the CPEs are more likely to damage the functionality of the programs.
This document describes tools for determining dependencies and associations between computer programming elements (CPEs) in a computing system. These tools track code check-ins in order to mine dependencies and associations between CPEs. Code check-ins may be mined for a period of time, and CPEs checked in together may be identified. Code check-ins performed by a plurality of different computer programmers may also be identified. In at least one embodiment, an indication that a CPE has either already been modified or is about to be modified, such as via a check-out, is received. In response to the received indication, the tools provide a recommendation indicating additional CPEs which are associated with the checked-out CPE. This recommendation is based on the mined dependencies and associations ascertained from previous code check-ins.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter. The term “tools,” for instance, may refer to one or more systems, methods, computer-readable instructions, and/or techniques as permitted by the context above and throughout the document.
The detailed description is presented with reference to accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The use of the same reference numbers in different figures indicates similar or identical items.
The following description sets forth tools for determining dependencies and associations between computer programming elements (CPEs) in a computing environment, such as during the maintenance phase of the software development life cycle (SDLC). These tools track code check-ins performed by a plurality of different programmers, extracting patterns, dependencies, and associations between CPEs. For example, associations may be based on CPEs being historically checked-in together. When performing these code check-ins the plurality of programmers develop individual skills and domain knowledge associated with the CPEs being checked-in.
In one embodiment, responsive to receiving an indication that a CPE has been checked-out and either has already been modified or is about to be modified by a computer programmer, the tools provide recommendations indicating additional CPEs associated with the CPE checked-out. For example, requesting the CPE for check-out may indicate that the CPE is about to be modified. In another example, submitting the CPE for check-in may indicate that the CPE has been modified. Because the preceding examples are not mutually exclusive, either one or both may serve as a trigger for a recommendation tool. The recommendation tool extracts patterns, dependencies, and associations ascertained from the previous code check-ins performed by the plurality of different computer programmers.
The provided recommendation can be presented to computer programmers, for example, via a graphical user interface, via a command line, etc., as an indication of additional CPEs associated with a CPE being modified, in order to facilitate transfer of domain knowledge and enhance individual skill sets. Such computer programmers include, but are not limited to, any interested programmers, inexperienced programmers, programmers newly assigned to a particular task or group, programmers modifying complex programming elements, etc.
In this way, the programmers are better able to estimate the effort and programming knowledge that may be required to make a modification to a CPE. Additionally, programmers can reduce the number of defective fixes within a programming environment and expeditiously familiarize themselves with the domain knowledge and management of a source code base.
In one practical example, as time passes, individual computer programmers cease working on a particular project at a company. For example, the programmer may retire or transfer from positions working on projects associated with a set of particular source code files that make up at least part of a particular program or project. As such, these programmers leave further development and/or maintenance tasks associated with the particular set of source code files to another programmer. Furthermore, when these programmers leave, they take with them their knowledge (built up over time) regarding this set of source code files. The recommendation tool described herein, however, helps fill this void by recommending that certain CPEs be analyzed and/or modified in response to a programmer checking-out a particular CPE.
In another practical example, source code files developed by one or more computer programmers in a source code development group for a particular area of development are often procedurally transferred. For example, development groups transfer source code files to one or more different computer programmers in a separate source code file-maintenance group that fixes the programs when bugs are reported and adds new features and/or applications to the computing system as part of a maintenance phase.
In both of these practical examples, transferring domain knowledge and development management experience associated with the CPEs would benefit computer programmers with limited domain knowledge and development management experience of a particular application.
For example, when a programmer checks-out a code segment from a database of CPEs in order to modify, add or delete a set of CPEs previously written or maintained by another programmer, the programmer may have limited knowledge of the inner-workings of the set of CPEs and any implicit relationships between CPEs. This scenario may occur when there is a large number of CPEs with hundreds to thousands of lines of source code interconnected and dependent upon one another, such as for operating systems, browsers, and integrated development environments. Examples include the Microsoft Windows® operating system, Internet Explorer®, and Visual Studio®. This disclosure is not limited to these implementations and other applications are also envisioned.
In order to modify one or more CPEs, a programmer checks-out (e.g. accesses) at least one CPE from a central repository in order to analyze, review and ultimately perform a modification. Checking-out CPEs may include pulling the CPEs from the central repository to a separate client computer where the programmer is working. Alternately, checking-out CPEs may include securing the checked-out CPE on the resident computer. In at least one embodiment both options are enabled. Once the modification is performed, the programmer checks-in the modified CPEs. Checking-in includes returning the modified CPEs to the central repository, thus updating the source code database.
For example, the computer programmer may check-out ten CPEs from a programming code database (e.g. the central repository), but only modify five of the ten CPEs that have been checked-out. Thus, only the five modified CPEs make up the code check-in when the programming code database is updated with the five modified CPEs. In this example, the recommendation tool may notify the programmer in an event that one or more associations or dependencies are identified between the modified and unmodified CPEs.
In another practical example, the computer programmer may modify all ten CPEs that have been checked-out. Thus, all ten modified CPEs make up the code check-in when the programming code database is updated with the ten modified CPEs. Thus, CPEs identified and associated with a particular check-in are the CPEs that have been modified by a programmer while the CPE was checked out or those checked-in within a predefined window of time, as discussed later in this document.
In yet another practical example, a programmer may not check-out any existing CPEs for modification. Instead, a programmer may develop a new set of CPEs, and then check-in the newly developed CPEs into a computing environment. In this scenario, no existing CPEs are checked-out from the computing environment. However, in this example, the newly developed CPEs can be used to mine patterns, dependencies and associations for future check-outs.
One practical scenario in which a computer programmer would benefit from transferred domain knowledge and management arises when different areas (e.g. source code files, functions, etc.) of an operating system (such as Microsoft Windows®) are developed by a plurality of computer programmers in different countries or across multiple time zones. In this exemplary scenario, it is beneficial to transfer domain knowledge and development management when building, testing and maintaining the source code files that make up the operating system.
For example, when a programmer in the United States is checking-in a set of source code files located on one or more central servers hosting the operating system, another programmer, in Taiwan, for example, may be simultaneously preparing to check-out a particular source code file in the set of source code files previously checked-in by the programmer in the United States. Thus, it would be practical and beneficial to inform the computer programmer in Taiwan of any associations and dependencies between the CPEs and/or source code files in the complete set of source code files that resulted from the check-in performed by the programmer in the United States so that errors can be avoided.
A programmer may modify, add or delete computer programming code in one or more CPEs for security purposes, for example when an application has been exploited from the outside and one or more CPEs should be fixed expeditiously. Modifications may also be made for reliability purposes, such as to be compatible with a particular piece of hardware or software running in a different part of the world, or to implement new features in one or more applications. However, it is to be appreciated that programmers will check-out one or more CPEs in many other contexts as well.
Thus, described herein is a recommendation tool that informs programmers of associated and dependent CPEs as well as patterns in the development and maintenance of a source code database. The recommendation tool extracts information from code check-ins. In this way, a computer programmer is informed of any potential impact that modification to one CPE will have on another CPE within a computing system.
As described herein, for purposes of this document, a programmer is a user who checks-out (e.g. accesses) or checks-in, via a computing device, one or more CPEs in order to review, analyze, add and/or modify one or more CPEs that comprise part of a code database. Modifying programming code includes, but is not limited to adding code, deleting code, merging code or changing code.
As described herein, for purposes of this document, a CPE is illustrated and described as a source code file. However, it is appreciated, without departing from the scope thereof, that CPEs in the context of this document can also be interpreted as relating to particular development areas (e.g. Internet Explorer®, HTML rendering, Multimedia) within a computing system or software product (e.g. operating system, browser, integrated development environment, Microsoft Word® etc.), sub-areas within the computing system (e.g. operating system user interface, browser control, Input/Output, Document Rendering), code components (e.g. DirectX), code sub-components (e.g. Sound), binaries, functions/classes, and individual lines of programming code. Thus, source code files are but one exemplary CPE, and it is understood that there are numerous different CPEs for which the recommendation tool may be implemented.
By mining information in code check-ins, a recommendation tool can discover patterns, dependencies and associations between hundreds to thousands of CPEs. The recommendation tool provides a finite number of CPEs (e.g. source code files) associated with the source code file currently or about to be modified. This finite list of associated source code files can then be reviewed and modified in association with the source code file currently or about to be modified.
In illustrated architecture 100, user 102 may check-out Element-1.x 106 for the purpose of modifying the source code file. One or more servers 110 store Element-1.x 106. The servers 110 host at least part of a computing system made up of numerous source code files. As illustrated, the servers 110 individually, or in combination, store or otherwise have access to computer programming data 112 (e.g. via code databases such as in managed code environments) including a plurality of CPEs (1) . . . (N). Hereinafter the plurality of CPEs is referred to as source code files. Furthermore, servers 110 are capable of compiling the computer programming data 112 when the computer programming data is modified by a user 102.
While
As illustrated, the servers 110 include one or more processors 114 and at least one memory 116. Memory 116 stores the computer programming data 112, transaction data 118, and a transaction mining tool 120.
The transaction data 118, for example includes a plurality of previous transactions 122 T1, T2, T3, . . . TN. In many instances transaction data 118 will store hundreds to thousands of transactions, wherein TN is equal to the number of transactions stored over a period of time. However, it is contemplated that some computing environments may include far fewer or far more transactions over a period of time. Thus, the number of transactions is associated with the size and complexity of the computing environment, and how many programmers develop, update, and maintain the applications over a period of time.
For purposes of this document, a transaction may be considered equivalent to a code check-in as previously discussed. Thus, the transaction data 118 stores information associated with numerous code check-ins performed by a plurality of programmers.
As previously discussed, a code check-in could include a set of source code files that are checked-out, modified and subsequently checked-in together, or a set of new source code files developed and subsequently checked-in together thereby adding additional source code files to the computing environment. Of course, a check-in can also be a combination of modified source code files that previously existed in a computing environment and new source code files developed and thereby added to a computing environment.
Furthermore, in at least one embodiment the transaction data 118 may store time data 124, t1, t2, t3 . . . tN, and/or person data 126 p1, p2, p3 . . . pN associated with each transaction 122 T1, T2, T3 . . . TN respectively. The time data 124 corresponds to a timestamp indicating the date and time when a particular code check-in (or check-out) occurred. The person data 126 uses a unique identification to identify the person (e.g. programmer) who performed the code check-in. The time data 124 and the person data 126 can be used in a variety of ways, such as to help determine the strength of associations between source code files as described later in this document.
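By way of illustration only, one stored transaction with its time data and person data might be represented as in the following sketch. The record layout, the field names (`files`, `timestamp`, `person_id`), and the sample values are assumptions made for this example and are not prescribed by this document.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet


@dataclass(frozen=True)
class Transaction:
    """One code check-in (hypothetical record layout)."""
    files: FrozenSet[str]   # source code files checked in together
    timestamp: datetime     # time data: when the check-in occurred
    person_id: str          # person data: unique ID of the programmer


# Hypothetical sample values, for illustration only.
transaction_data = [
    Transaction(frozenset({"FileA", "FileB"}), datetime(2024, 1, 8, 9, 0), "p1"),
    Transaction(frozenset({"FileC"}), datetime(2024, 1, 8, 9, 10), "p1"),
]
```

Keeping the timestamp and programmer identifier alongside each set of files is what would later allow associations to be weighted by time or by person.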
In response to the user 102 checking-out Element-1.x 106, a recommendation tool 128 provides (e.g., displays) a recommendation to the user 102. In one embodiment, the user 102 checks-out Element-1.x 106 by simply entering the name (or another form of unique identification) of the source code file (e.g. Element-1.x 106) at the computing device 104 for submission to the servers 110. By entering the name of a source code file, a user is indicating that he or she intends to check-out, analyze and possibly modify the source code file.
The recommendation tool 128 indicates one or more source code files (e.g. Element-2.x, Element-3.x, Element-6.x, Element-7.x, and Element-11.x) that are associated with Element-1.x 106, which the user is currently modifying or intends to modify. As depicted, recommendation tool 128 indicates associated source code files. The recommendation tool does not indicate each source code file in an ordered set of source code files, for example. Note that Element-4.x and Element-5.x are not recommended via the recommendation tool in the illustrated example. Although the illustrated recommendation tool 128 in this example lists and ranks associated source code files according to association percentage values, it is understood that the source files can be recommended in a variety of ways.
Examples of implementations of recommendation tool 128 include a Graphical User Interface (GUI) that pops-up allowing the user 102 to be presented with the recommendation, a textual representation in a command line of a computer programming application utilized by the user 102 at the computing device 104, an audio recommendation via an audio component on the computing device 104, or a combination thereof implemented separately or as part of an integrated development environment (IDE) used to manage check-out and modification of source code files. In each implementation, the recommendation tool 128 is a combination of hardware (e.g. computer monitor) and software. In at least one embodiment the combination of hardware and software is utilized to present recommendations to a user 102 who is modifying computer programming code.
Once the user 102 submits the name of a source code file (e.g. Element-1.x 106) that he or she intends to check-out, analyze, and possibly modify, the transaction mining tool 120 gathers information from the code check-ins stored in the transaction data 118. Using the information gathered by the transaction mining tool 120, the system can determine and generalize associations and dependencies between the source code file intended to be analyzed and modified (e.g. Element-1.x 106) and other source code files in the programming data 112, and can recommend the other source code files accordingly.
Thus, the transaction mining tool 120 mines the stored transaction data 118 and provides information to be presented to the user 102. As illustrated in
Memory 116 is but one example of computer-readable media, and in some embodiments transaction mining tool 120 may be stored on computer-readable media outside of servers 110. Computer-readable media can be any available media that can be accessed by a computing device such as computing device 104. Computer-readable media includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer-readable media comprises computer storage media. “Computer storage media” includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computing device such as computing device 104.
In various embodiments the transaction mining tool 120 is made up of one or more software modules. In at least one embodiment transaction mining tool 120 includes: a frequency tracker module 204, an itemset passing module 206, a time weighting module 208, and a person weighting module 210. In some embodiments other weighting module(s) 212 may be included. These modules are utilized individually, or in combination, to determine associations between two source code files. These modules can selectively be utilized and combined to strengthen the recommendation of associated source code files presented to the user 102 via the recommendation tool 128.
As previously discussed, when programmers intend to make a code modification or addition, they will typically check-out and modify or add a set of source code files. This modification or fix occurs in one transaction (e.g. code check-in). Associations between source code files exist because the source code files are often programmed to work together to provide proper programming and functionality within the computing system. Thus, a particular set of source code files are typically modified together (e.g. in a group). In
As illustrated in the previous code check-ins 302 in
While the above discussion indicates different code check-ins performed by different computer programmers, it is noted that an individual programmer is capable of performing more than one code check-in over a period of time. In fact, it is more likely an individual programmer performs numerous code check-ins relating to code maintenance and development in association with a particular computer programming code database over a period of time. Additionally, the six code check-ins 302 as illustrated in
Next,
In at least one embodiment, the associations determined in the matrix 304 correspond directly to the association values (presented as percentages) indicated via the recommendation tool 128 in
Furthermore, in some embodiments the source code files recommended via the recommendation tool 128 are source code files that meet a defined threshold. In a practical example, a group administrator may set a threshold such that any source code file with an association value of 50% or higher must be indicated via the recommendation tool 128 to any individual computer programmer in the programming group which the group administrator supervises. In another practical example, a computer programmer can set a defined threshold based on his/her own level of experience relating to the CPEs being checked-out, analyzed and/or modified.
Thus, using the numbers in the matrix 304 with a defined threshold of 50%, if a user 102 intends to modify File A, the recommendation tool 128 will indicate File B and File C as associated files with their corresponding association value strengths of 50% and 75% respectively, while not recommending File D. If another user 102 intends to modify File C, the recommendation tool 128 will indicate File A and File D as associated files with their corresponding association value strengths of 75% and 50% respectively, while not recommending File B.
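The threshold behavior described above can be sketched as follows. This is a minimal illustration only: the function name `recommend` is hypothetical, the values for File B and File C relative to File A are those stated above, and the value shown for File D is an assumed placeholder below 50%, since the text states only that File D is not recommended.

```python
def recommend(associations, threshold):
    """Return (file, value) pairs at or above the threshold, strongest first."""
    kept = [(name, value) for name, value in associations.items()
            if value >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Association values relative to File A. File B and File C are from the
# text; File D's value is an ASSUMED placeholder below the 50% threshold.
file_a = {"File B": 0.50, "File C": 0.75, "File D": 0.25}

print(recommend(file_a, 0.50))
```

With a 50% threshold, File C and File B are returned in ranked order and File D is omitted; raising the threshold to 75% would leave only File C.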
Of course, the example illustrated in
With more granular association values, numerous source code files may approach an association value within five percentage points of 100% while other source code files in the same computing system may approach an association value closer to 0%. Therefore, careful consideration is given when determining a defined threshold for the recommendation tool 128.
For example, an experienced programmer may have a defined threshold set at 95% because the group administrator is aware the experienced programmer has a high knowledge level of the computing environment and therefore does not need to review and check all the associated source code files that do not meet the 95% threshold. On the other hand, if a programmer has limited experience, the group administrator may set a relatively low defined threshold (e.g. 75%) for the recommendation tool 128 so the programmer with limited experience is presented with a recommendation to review a more exhaustive list of associated source code files and make sure he or she has modified all source code files necessary to avoid any potential errors.
Furthermore, in another example, the defined thresholds can be set in accordance with functionality of a particular computing environment and the severity of any potential consequences resulting from modification error(s) within the computing environment. For example, if a computing environment is programmed to control a nuclear reactor, the defined threshold should be set very low so that any user 102 making a modification is presented with, and checks, a more exhaustive list of associated code (e.g. a lower tolerance for error) compared to a computing environment programmed to control an email login system, where the tolerance for failures may be significantly higher. In this way, an error that could create a catastrophic consequence is more heavily controlled.
Ultimately, the transaction mining tool 120 utilizes the frequency tracker module 204 to extract data from the transaction data 118 and presents, via the recommendation tool 128, a finite list of associated source code files that meet a defined threshold to the user 102. In at least one embodiment this list is ranked according to the strength of the association values for each individual source code file.
In some embodiments, a user 102 may indicate, or pass the name of two source code files that he or she intends to check-out and modify together. In this scenario, the transaction mining tool 120 uses the frequency tracker module 204 to further recommend, via the recommendation tool 128, source code files based on an aggregate of the two source code files being checked-out and modified together by the user 102.
For example,
Again, it is understood that more complex computing systems could be made up of hundreds to thousands of interconnected source code files and corresponding numbers of previous code check-ins. Thus, in most scenarios the association values determined for an aggregate of source code files will also be more granular when recommending a finite list of associated source code files.
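A recommendation based on an aggregate of two source code files might be sketched as follows. The calculation shown (the fraction of previous check-ins containing both selected files that also contain a given other file) is an assumption made for illustration, as are the function name `aggregate_associations` and the hypothetical check-in history.

```python
from collections import Counter


def aggregate_associations(check_ins, selected):
    """Associate other files with a *set* of selected files.

    Assumed measure: among check-ins containing every selected file,
    the fraction that also contain each other file.
    """
    selected = set(selected)
    matching = [set(c) for c in check_ins if selected <= set(c)]
    if not matching:
        return {}
    counts = Counter(f for c in matching for f in c - selected)
    return {f: n / len(matching) for f, n in counts.items()}


# Hypothetical history of previous check-ins.
check_ins = [
    {"A", "B"},
    {"A", "B", "C"},
    {"A", "C"},
    {"A", "B", "C", "D"},
]
result = aggregate_associations(check_ins, {"A", "B"})
```

Here three check-ins contain both A and B; C appears in two of them and D in one, so C would be the stronger recommendation for a user checking-out A and B together.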
Exemplary operations are described herein with reference to
At 402, the transaction mining tool 120 monitors transactions on the servers 110. The transactions are performed by multiple programmers over a period of time. In at least one embodiment the period of time may be defined to be the life of the development and maintenance of a particular application (e.g. the SDLC for the application). In some embodiments the period of time may be a user-defined time period in which a particular development or maintenance task is occurring. In at least one embodiment periods of time may be defined as the SDLC in some instances, such as critical infrastructure applications, and user-defined in others, such as for particular testing projects.
At 404, information associated with each transaction monitored in 402 is stored in the transaction data 118. In some embodiments, the respective time data 124 and person data 126 are stored in association with the transaction(s) as discussed in the exemplary architecture of
At 406, the transaction mining tool 120 utilizes the frequency tracker module 204 to determine associations between CPEs (e.g. source code files) based on source code files that are historically checked-in together.
At 408, the servers 110 receive an indication that a CPE is being checked-out. In at least one implementation, the user 102 submits the name of the source code file he or she intends to check-out, analyze and possibly modify.
At 410, based on mined patterns, dependencies, and/or associations for the CPE being checked-out, the recommendation is provided via the recommendation tool 128. As previously discussed, the recommendation tool 128 can be in the form of a GUI that presents a finite ranked list of source code files and association values (e.g. percentages) that meet a defined threshold. Therefore, the user 102 is informed of associated source code files that programmers have previously checked-out and modified or added in shared transactions with the source code file the user 102 intends to check-out.
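Operations 406 through 410 can be sketched as follows. This is an illustrative sketch under stated assumptions rather than a definitive implementation: the association measure (the fraction of check-ins containing one file that also contain another), the function names `mine_associations` and `on_check_out`, and the check-in history are all hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def mine_associations(check_ins):
    """Operation 406 (sketch): derive association values from co-check-ins.

    Assumed measure: value[a][b] = (check-ins containing both a and b)
    divided by (check-ins containing a).
    """
    file_counts = defaultdict(int)
    pair_counts = defaultdict(int)
    for files in check_ins:
        for f in files:
            file_counts[f] += 1
        for a, b in permutations(files, 2):
            pair_counts[(a, b)] += 1
    table = defaultdict(dict)
    for (a, b), n in pair_counts.items():
        table[a][b] = n / file_counts[a]
    return table


def on_check_out(table, name, threshold=0.5):
    """Operations 408-410 (sketch): ranked files meeting the threshold."""
    ranked = sorted(table.get(name, {}).items(),
                    key=lambda kv: kv[1], reverse=True)
    return [(f, v) for f, v in ranked if v >= threshold]


# Hypothetical history of stored transactions (operations 402-404).
history = [{"A", "B"}, {"A", "C"}, {"A", "B", "C"}, {"A", "C", "D"}]
table = mine_associations(history)
print(on_check_out(table, "A"))
```

Because the measure assumed here is directional, the value from A to C need not equal the value from C to A; other measures could equally be substituted.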
While
At 504, the itemset passing module 206 discovers itemsets of size 2 in a second pass. Size 2 itemsets include previous code check-ins 302 in which the computer programmer checked-in three source code files (e.g. one more source code file than size 1 itemsets). Thus, check-ins 3 and 5 in
At 506, itemset passing module 206 iteratively discovers itemsets of size M, where M represents further passes up to size M. Accordingly, in at least one embodiment, M may be equal to the code check-in 302 with the largest N-itemset such that the itemset passing module 206 discovers and mines all transactions stored in the transaction data 118. In some embodiments M may be defined so that less than all transactions stored in the transaction data 118 are mined. As illustrated in
In some embodiments, M may define a cut-off set by an administrator of the computing system. For example, the administrator can define a cut-off M so that the transaction mining tool 120 and the itemset passing module 206 stop after completing ten passes of size 1, 2, 3, 4, 5, 6, 7, 8, 9 and 10. In this scenario, a code check-in 302 with an itemset of more than eleven source code files will not be used by the itemset passing module 206 to determine associations. In this scenario, the administrator may conclude that mining a transaction with an itemset of more than eleven source code files would not result in strong indications of associations, patterns and dependencies. Therefore, there is no reason to expend the processing time and resources associated with the transaction mining tool 120 to mine such large itemsets. In an alternate scenario, a code check-in 302 with an itemset of a small number of source code files (e.g. passes of size 1 or 2) will not be used by the transaction mining tool 120 to determine associations. In this scenario, the administrator may conclude that mining a transaction with an itemset of so few source code files would not provide enough data to indicate associations, patterns and dependencies. In yet another scenario, code check-ins of itemsets above and below defined thresholds may be excluded from mining for reasons similar to those discussed above.
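The cut-offs described above can be sketched as a simple filter applied before mining. The sketch follows the convention used in this description that an itemset of size 1 corresponds to two source code files checked-in together; the function names and the default bounds are assumptions made for illustration.

```python
def itemset_size(files):
    """Itemset size as used in this description: size 1 means two files."""
    return len(files) - 1


def select_for_mining(check_ins, min_size=1, max_size=10):
    """Keep only check-ins whose itemset size lies within the cut-offs.

    min_size and max_size are hypothetical administrator settings.
    """
    return [c for c in check_ins
            if min_size <= itemset_size(c) <= max_size]


# Hypothetical check-ins containing 2, 11 and 12 source code files.
two_files = {"A", "B"}
eleven_files = {f"F{i}" for i in range(11)}
twelve_files = {f"F{i}" for i in range(12)}
kept = select_for_mining([two_files, eleven_files, twelve_files])
```

With a cut-off of ten passes, the eleven-file check-in (size 10) is still mined while the twelve-file check-in is skipped; raising `min_size` to 3 would instead exclude the smallest check-ins.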
At 508, association values are initially determined and/or weighted according to the number of passes completed by the itemset passing module 206. When weighting the associations, the itemset passing module 206 uses previous code check-ins of different sizes. Thus a relative association can be determined, and ultimately recommended, based on itemsets of size 1 . . . M.
In at least one embodiment, for example, an itemset of size 1 (e.g. two source code files checked-in together) may indicate a stronger association than an itemset of size 5 (e.g. six source code files checked-in together). The itemset of size 1 may indicate a stronger association because it is known that a computer programmer modified or added File B, for example, when initially checking-out and modifying File A. Thus, there is a 1:1 correspondence. On the other hand, an itemset of size 5 may not indicate as strong an association as an itemset of size 1 because an itemset of size 5 indicates six source code files checked-out and modified or added together. Thus, there is no 1:1 correspondence in an itemset of size 5, and the dependencies may not be as clearly defined. For example, a code check-in with an itemset of size 5 includes Files A, B, C, D, E and F checked in together. If a programmer initially intended to modify File A, it may be unclear which files depend upon File A. Any one or more of Files B, C, D, E and/or F alone or in combination could depend upon File A. Furthermore, the dependencies may not be direct. For example, File B may have been modified or added because of a dependency upon File C, which was modified in response to File A being modified. Thus, the associations between source code files determined by the transaction mining tool 120 may not be as strong when the itemset size increases. Accordingly, check-ins with smaller sizes of itemsets may be weighted more than check-ins with larger sizes of itemsets.
Thus, in some embodiments the association values previously discussed in
In some embodiments, association values can be adjusted based on the size of the itemset by using one or more algorithms to determine weighting coefficients to be applied to the association values. For example, weighting coefficients (1, 0.9, 0.8, 0.7, etc.) can be applied to the association values based on whether the transaction was discovered in the first pass, second pass, etc. Additionally, regression or other evolutionary algorithms can be used to determine weighting coefficients.
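As a non-limiting sketch of the coefficient-based weighting described above, the following example accumulates a weighted association value for each pair of files checked in together. The linear continuation of the coefficients beyond 0.7, the pairwise accumulation, and all names are assumptions chosen for illustration; regression-derived coefficients could be substituted for the pass_coefficient function.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def pass_coefficient(pass_number: int, step: float = 0.1) -> float:
    """Coefficients 1, 0.9, 0.8, 0.7 for passes 1 to 4.

    Continuing the decrease linearly and clamping at zero is an assumption.
    """
    return max(0.0, round(1.0 - step * (pass_number - 1), 10))


def weighted_associations(check_ins: Iterable[Set[str]]) -> Dict[Tuple[str, str], float]:
    """Sum, for each pair of files, the coefficient of every check-in containing both."""
    scores: Dict[Tuple[str, str], float] = defaultdict(float)
    for files in check_ins:
        pass_number = len(files) - 1  # size 1 means two files checked in together
        if pass_number < 1:
            continue  # a single file carries no pairwise association
        weight = pass_coefficient(pass_number)
        for pair in combinations(sorted(files), 2):
            scores[pair] += weight
    return dict(scores)


scores = weighted_associations([{"A", "B"}, {"A", "B", "C"}])
```

In this example the pair (A, B) appears in a size 1 check-in and a size 2 check-in, so it accumulates a higher value than the pair (A, C), which appears only in the size 2 check-in.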
Additionally,
The time data 124 allows the time weighting module 208 to apply relative associations between individual source code files, based on the time when the check-in occurred.
For example, if a programmer checked-in Files A and B as illustrated in
In some embodiments, the time weighting module 208 weights association values according to delta t, thereby adjusting (e.g. strengthening) the association values to indicate a degree of confidence incorporating a delta t.
In some embodiments, the time weighting module 208 determines that code check-in 1 and code check-in 2 should be treated as a single transaction based on delta t, thereby combining the two transactions so that Files A, B and C are associated in one transaction.
This example can be illustrated in a practical scenario where the programmer forgets to change necessary programming code in File C (
It is understood that this weighting technique supports the assumption that the closer in time two separate check-ins occur, the more likely it is that the two separate check-ins are related, and therefore association values should be determined and/or adjusted to indicate a degree of confidence associated with a delta t. The ten-minute difference previously discussed in relation to code check-in 1 and code check-in 2 is used for exemplary purposes only. Thus, any time period or time difference may be defined to weight the association between individual source code files or to combine two transactions into one transaction. Furthermore, the time data 124 can be used to strengthen the associations based on a definite time threshold (e.g. 10 minutes, 12 hours, 1 day, 1 week, etc.). In at least one embodiment, a definite time threshold is not implemented, and association strengths and weighting factors are instead determined linearly based on the difference (delta t) in time between two individual code check-ins.
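The two time-based approaches described above, namely combining check-ins that fall within a definite time threshold and weighting linearly by delta t, may be sketched as follows. The function names, the one-day horizon used for the linear weight, and the assumption that code check-in 2 contains only File C are illustrative choices and not part of the described system.

```python
from datetime import datetime, timedelta
from typing import List, Set


def merge_if_close(
    files_1: Set[str], time_1: datetime,
    files_2: Set[str], time_2: datetime,
    threshold: timedelta = timedelta(minutes=10),  # exemplary threshold only
) -> List[Set[str]]:
    """Treat two check-ins as one transaction when delta t is within the threshold."""
    if abs(time_2 - time_1) <= threshold:
        return [files_1 | files_2]
    return [files_1, files_2]


def linear_time_weight(
    time_1: datetime, time_2: datetime,
    horizon: timedelta = timedelta(days=1),  # hypothetical point where the weight reaches zero
) -> float:
    """Weight that falls linearly from 1 (delta t = 0) to 0 (delta t >= horizon)."""
    delta_t = abs(time_2 - time_1)
    return max(0.0, 1.0 - delta_t / horizon)


check_in_1_time = datetime(2020, 1, 1, 9, 0)
check_in_2_time = check_in_1_time + timedelta(minutes=10)
transactions = merge_if_close({"A", "B"}, check_in_1_time, {"C"}, check_in_2_time)
weight = linear_time_weight(check_in_1_time, check_in_2_time)
```

With the exemplary ten-minute difference, the two check-ins are combined so that Files A, B and C are associated in one transaction, and the linear weight remains close to 1.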
Furthermore, using the person data 126, the person weighting module 210 can determine relative associations between individual source code files based on distance metrics (delta p) between two programmers (e.g. persons) performing two previous code check-ins 602. In order to determine the delta p the person weighting module 210 may access a structure of an organization or a social network.
In one embodiment, an organization hierarchy tree is employed with a plurality of nodes representing different persons within the organization. In this example, each node in the organizational hierarchy tree has a manager or parent node, up to the most senior or root node. Using the organization hierarchy tree, the person weighting module 210 determines the distance, delta p, in number of nodes, between two programmers performing two code check-ins. In one embodiment the person weighting module 210 may count the least number of nodes between the two programmers through a common manager in the organization hierarchy tree.
For example, assume the hierarchy tree 604 illustrated in
In the first scenario, team 1 programmer 612 and team 1 programmer 614, under the same team 1 manager 608, perform two separate code check-ins. The programmer (person) distance metric corresponding to these two separate code check-ins is two, based at least in part on the person weighting module 210 counting nodes to the closest common managing node. In this scenario, team 1 programmer 612 and team 1 programmer 614 have common team 1 manager 608, and thus, traversing the hierarchy tree from team 1 programmer 612 to team 1 programmer 614 via common team 1 manager 608, the person weighting module 210 will count two nodes. Here the distance metric, delta p, is equal to two.
In a second scenario, team 1 programmer 612 and team 2 programmer 618 perform two code check-ins. In the second scenario, the programmer (person) distance metric corresponding to the two separate code check-ins is four, based at least in part on the closest common managing node being the senior manager 606. Thus, traversing the hierarchy tree from team 1 programmer 612 to team 2 programmer 618 via senior manager 606, the person weighting module 210 will count four nodes. Here the distance metric, delta p, is equal to four.
Using the first and second scenarios described above, the transaction mining tool 120 weights association values based on the determined distance metric delta p. The lower the distance metric delta p, the more strongly the associations between source code files modified in two separate check-ins are weighted because, for example, members of the same programming team are more likely to be modifying and adding source code files that should be checked-in together within a particular computing environment. Thus, the first scenario would yield a stronger association than the second scenario, and the association values would be weighted accordingly into values indicating a degree of confidence.
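The delta p computation in the two scenarios above may be sketched, for illustration only, as a walk up an organization hierarchy tree to the closest common manager. The dictionary representation of the tree, the node names, the presence of a team 2 manager node, and the inverse weighting formula are assumptions made for this sketch.

```python
from typing import Dict, List

# Hypothetical hierarchy: each person maps to his or her manager (parent node).
MANAGER_OF: Dict[str, str] = {
    "team1_manager_608": "senior_manager_606",
    "team2_manager": "senior_manager_606",
    "team1_programmer_612": "team1_manager_608",
    "team1_programmer_614": "team1_manager_608",
    "team2_programmer_618": "team2_manager",
}


def chain_to_root(person: str, manager_of: Dict[str, str]) -> List[str]:
    """Return the person followed by each successive manager up to the root node."""
    chain = [person]
    while person in manager_of:
        person = manager_of[person]
        chain.append(person)
    return chain


def delta_p(a: str, b: str, manager_of: Dict[str, str]) -> int:
    """Count the steps from a to b through their closest common manager."""
    steps_from_b = {p: i for i, p in enumerate(chain_to_root(b, manager_of))}
    for steps_from_a, p in enumerate(chain_to_root(a, manager_of)):
        if p in steps_from_b:
            return steps_from_a + steps_from_b[p]
    raise ValueError("the two persons share no common manager")


def person_weight(distance: int) -> float:
    """Assumed weighting: a smaller delta p yields a larger weight."""
    return 1.0 / (1.0 + distance)


first_scenario = delta_p("team1_programmer_612", "team1_programmer_614", MANAGER_OF)
second_scenario = delta_p("team1_programmer_612", "team2_programmer_618", MANAGER_OF)
```

Under these assumptions the first scenario yields a delta p of two and the second a delta p of four, so the first scenario receives the larger weight.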
It is understood the discussed weighting techniques can be implemented separately or in combination with other weighting techniques. For example the itemset sizes discussed in
Furthermore, in some embodiments, in addition to recommending source code files as explained, implementations that recommend additional information can also be realized. For example, the transaction mining tool 120 may mine data associated with ownership of a particular source code file. In this example, a user 102 changing source code file Element-1.x 106 will be informed of the identity of an owner (e.g. original programmer, programmer who last modified the source code file, administrator) of Element-1.x 106. Thus, if any questions or issues arise, user 102 could contact the owner in order to find out more information about Element-1.x 106. In another embodiment, the user 102 would need to obtain authorization from the owner to modify Element-1.x 106.
In some embodiments, a further recommendation can be given to the user 102 about a depth of inheritance of the source code file to be modified. The user 102, when modifying Element-1.x 106 is informed of another element of risk, for example, if Element-1.x 106 is inherited by numerous other source code files. In this sense, Element-1.x 106 may be well nested within cascading source code, and any modification to Element-1.x 106 would affect the source code files which inherit it. With this recommendation a domino effect of failures can be avoided.
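One possible way to estimate the risk described above is to collect every source code file that directly or indirectly inherits the file to be modified. The following sketch assumes that inheritance relationships are available as a mapping from each file to the files it inherits from. That mapping, the file names other than Element-1.x, and the function name are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set


def transitive_inheritors(target: str, inherits_from: Dict[str, Iterable[str]]) -> Set[str]:
    """Return every file that inherits target, directly or through other files."""
    # Invert the mapping so each file points to the files that inherit it.
    inherited_by: Dict[str, Set[str]] = defaultdict(set)
    for child, parents in inherits_from.items():
        for parent in parents:
            inherited_by[parent].add(child)

    # Breadth-first walk over the inverted mapping, starting at the target.
    affected: Set[str] = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for child in inherited_by.get(current, ()):
            if child != target and child not in affected:
                affected.add(child)
                queue.append(child)
    return affected


# Hypothetical inheritance data for illustration.
INHERITS_FROM = {
    "Element-2.x": ["Element-1.x"],
    "Element-3.x": ["Element-2.x"],
    "Element-4.x": [],
}
at_risk = transitive_inheritors("Element-1.x", INHERITS_FROM)
```

The size of the returned set could then be surfaced to the user 102 as an indication of how many files a modification to Element-1.x 106 may affect.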
In some embodiments, a recommendation can be given to the user 102 based on cyclomatic complexity of the CPE to be modified. In this implementation, the transaction mining tool 120 determines risk associated with how complex, or how important, the CPE is.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the system and method defined in the appended claims is not necessarily limited to the specific features or acts described. Rather, the specific features and acts are disclosed as exemplary forms of implementing the claims.