This application claims priority under 35 U.S.C. §119 from Japanese Patent Application number 2008-73565, filed on Mar. 21, 2008, the entire contents of which are incorporated herein by reference.
The present invention relates to a technology for predicting trouble occurrence in project management for software development, product development and the like.
In project management for software development and the like, it is very important to recognize troubles as early as possible. The later a trouble is recognized, the harder it becomes to take measures against it and the more extra cost and work it causes. In typical project progress management, a project leader regularly reports project information, and a supervisor evaluates the project state based on that information. Since a supervisor is generally required to manage a large number of projects, it is difficult for him or her to find the time to examine every report. Thus, in actual project progress management, the supervisor recognizes the occurrence of troubles by checking whether the project has a problem against predetermined objective criteria, for example, whether there is a schedule delay or a cost overrun. That is, each report is checked against standard check items and scored according to the result. These measures are useful for reliably recognizing a trouble after it has occurred, but they are not effective for recognizing troubles earlier. Furthermore, project leaders generally have a psychological tendency to avoid notifying their supervisors of troubles. As a result, troubles are often not recognized by the supervisor until the very last minute.
Meanwhile, a report of project information often includes text written in natural language to complement the standard check items described above. A skilled project manager is said to be able to predict, by reading such text, whether troubles are likely to occur in the future, based on its content and the characteristics of its expressions. For a project likely to have a trouble, the trouble can be prevented, or measures can be taken early, by checking the project carefully in advance. The text information is therefore very useful. However, as the number of projects increases, such prediction becomes difficult, because there is a limit to how much text a person can read.
Therefore, it has been desired to automatically analyze the text information in the project progress management and to narrow down the projects to those that are likely to have a trouble based on the analyzed information.
Japanese Patent Application Publication No. Hei 10(1998)-240715 relates to a method for predicting and estimating new problems from a plurality of cases including quantitative attributes. To be more specific, for example, in the case of estimating “quality characteristics” from a set of “design attribute values” of a product, the following steps are disclosed, including: (1) obtaining similarity of the design attribute values between each of the cases and the new problem; (2) selecting the cases having high similarity to the new problem and obtaining a predicted distribution of the quality characteristics for each of the cases; and (3) obtaining a final predicted value by synthesizing a plurality of the predicted distributions thus obtained.
Japanese Patent Application Publication No. 2004-252893 relates to a method for measuring operational risks. Particularly, the method is intended to improve the validity and stability of a risk value when an amount of loss is estimated from the past records. To be more specific, there is disclosed a specific smoothing method for a transaction amount distribution to be used for the estimation.
Japanese Patent Application Publication No. 2005-018304 relates to a time-series data prediction method and discloses methods including: (1) dividing time-series data to be used for prediction into subsets; (2) creating a value frequency distribution histogram for each of the subsets; and (3) obtaining a predicted value based on a cumulative frequency of the histogram corresponding to attributes of a prediction target from a group of the histograms.
Japanese Patent Application Publication No. 2005-157755 relates to a system for recording medical accidents and discloses methods of recording, as internal factors, personal attributes of a person who reports and judges the accidents in addition to recording the accidents. Particularly, it is disclosed that internal values in the personal attributes are extracted as factors by analyzing report descriptions of accident records through language analysis.
However, none of the above conventional technologies can, with sufficient reliability, automatically analyze the text information in project progress management and narrow down the projects to those likely to have a trouble based on the analyzed information.
It is an object of the present invention to provide a technique for recognizing the occurrence of troubles early and with high likelihood in project management.
It is another object of the present invention to provide a technique that makes it possible, with sufficient reliability, to automatically analyze text information in project progress management and to narrow down projects to those likely to have a trouble based on the analyzed information.
In order to achieve the foregoing objects, the present invention preferably has the following configuration.
In the above description, in particular, the project information is a project state report created regularly for each project, and it includes at least a text describing the project state and, if the project falls into a troubled state, information indicating that state. An expression pattern is one or more definition descriptions, each specifying a particular linguistic expression identified by natural language processing. Moreover, the expression pattern characteristics include a project trouble occurrence probability distribution whose origin is the point in time at which a given expression pattern appears in a text of the project information, together with statistics representing that distribution.
For a more complete understanding of the present invention and the advantage thereof, reference is now made to the following description taken in conjunction with the accompanying drawings.
With reference to the drawings, a configuration and processing according to an embodiment of the present invention will be described below. In the following description, unless otherwise noted, the same components are denoted by the same reference numerals throughout the drawings. Note that the configuration and processing described here are only one example of the embodiment, and there is no intention to limit the technical scope of the present invention to this specific embodiment.
Although not individually shown, the hard disk drive 108 stores in advance an operating system, past project information, current project information to be analyzed, and processing programs according to the present invention.
The operating system may be any one compatible with the CPU 104, such as Linux (trademark); Windows (trademark) Vista, Windows XP (trademark) or Windows (trademark) 2000 by Microsoft Corporation; or Mac OS (trademark) by Apple Computer.
Moreover, the hard disk drive 108 may also store a language processor for any programming language, such as C, C++, C# or Java (trademark). This language processor is used to create and retain the processing programs according to the present invention.
The hard disk drive 108 may further store a text editor for writing source code to be compiled by the language processor, and a development environment such as Eclipse (trademark).
The keyboard 110 and the mouse 112 are used to start the operating system or a program (not shown) that is loaded from the hard disk drive 108 into the main memory 106 and displayed on the display 114, and to type characters.
The display 114 is preferably a liquid crystal display, and one having an arbitrary resolution, such as XGA (1024×768 resolution) and UXGA (1600×1200 resolution), can be used. The display 114 is used to display a result of processing according to the present invention.
Next, a processing flow according to the present invention will be schematically described with reference to a functional block diagram shown in
In
A text analysis part 204 performs morphological analysis and syntactic analysis on a given text using a publicly known text analysis technique, such as, but not limited to, those described in Japanese Patent Application Publication Nos. Hei 6(1994)-325104, 2000-76274 and 2004-126933. The text analysis part 204 then determines whether each of the specified expression patterns appears in the text and outputs its frequency of appearance. To perform this determination, the text analysis part 204 receives expression patterns 214 from an expression pattern management part 212.
Thereafter, the text analysis part 204 outputs appearing expression patterns 206 in the following form.
This table of appearing expression patterns 206 is stored on the hard disk drive 108 in a computer-readable form such as CSV or XML, for example. This processing is performed for all the stored past project information 202, and the results together serve as the appearing expression patterns 206. Here, the appearing expression patterns 206 include, for example, patterns indicating the occurrence of trouble, such as “okyaku-sama . . . chousei-suru” (The customer makes an adjustment to . . . ) and “keikaku henkou . . . hassei-suru” (a plan change occurs).
In the context of this embodiment, an expression pattern means a set of definitional descriptions, each of which specifies a particular linguistic expression obtained as a result of natural language processing. For example, each linguistic expression definitional description includes the following.
Each expression pattern is a group of one or more of the linguistic expression definitions described above. If a text contains any of the linguistic expression definitional descriptions belonging to an expression pattern, the text is regarded as matching that expression pattern.
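The stored form of the appearing expression patterns can be illustrated with CSV. The following is a minimal sketch that assumes one row per (project report, expression pattern) with four columns: project ID, date of notification, pattern, and observation count. The column names and the class `PatternRow` are hypothetical; only the kinds of fields are taken from the description of Table 1 and the later calculation steps.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PatternRow:
    """One row of the appearing expression patterns table (assumed layout)."""
    project_id: str
    notified: date   # time and date of notification of the report
    pattern: str     # expression pattern identifier
    count: int       # observation (appearance) count within that report


FIELDS = ["project_id", "notified", "pattern", "count"]  # hypothetical header


def write_rows(rows, stream):
    """Write rows as CSV, preceded by a header line."""
    writer = csv.writer(stream)
    writer.writerow(FIELDS)
    for r in rows:
        writer.writerow([r.project_id, r.notified.isoformat(), r.pattern, r.count])


def read_rows(stream):
    """Parse CSV produced by write_rows back into PatternRow objects."""
    return [
        PatternRow(d["project_id"], date.fromisoformat(d["notified"]),
                   d["pattern"], int(d["count"]))
        for d in csv.DictReader(stream)
    ]
```

An in-memory `io.StringIO` buffer can stand in for a file on the hard disk drive when trying the round trip.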
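The "match if any definition is contained" rule can be sketched as follows. This assumes, for illustration only, that each linguistic expression definition is a (noun, predicate) pair and that the text analysis yields the list of such pairs found in a text; `ExpressionPattern`, `count` and `matches` are hypothetical names, not part of IBM OAE or any other product.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Tuple

# Hypothetical: a definition is a (noun, predicate) pair taken from the
# "subject ... predicate" dependency analysis of a text.
Definition = Tuple[str, str]


@dataclass(frozen=True)
class ExpressionPattern:
    name: str
    definitions: FrozenSet[Definition]

    def count(self, analyzed_pairs: Iterable[Definition]) -> int:
        """Appearance count: analyzed pairs equal to any member definition."""
        return sum(1 for pair in analyzed_pairs if pair in self.definitions)

    def matches(self, analyzed_pairs: Iterable[Definition]) -> bool:
        """A text matches the pattern if it contains ANY member definition."""
        return any(pair in self.definitions for pair in analyzed_pairs)
```

Grouping several definitions (for example, "keikaku henkou" and "purojekuto keikaku henkou" with "hassei-suru") under one pattern lets paraphrases be counted together.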
The form in which the linguistic expression definitions and the expression patterns are described is arbitrary. For example, in a form that can be used as an input to IBM OAE (OmniFind Analytics Edition), the linguistic expression definitions and the expression patterns are expressed as follows by use of XML.
A pattern characteristic calculation part 208 calculates, based on the past project information 202, characteristics of each expression pattern with respect to project trouble occurrence, and outputs them as expression pattern characteristics 210. The input to the pattern characteristic calculation part 208 consists of the appearing expression patterns and a list, stored in the past project information 202, of the dates on which the occurrence of already-occurred troubles was recognized. The following table shows an example of the list.
Here, Table 2 shows only the date on which the trouble occurrence was recognized; of course, the time may also be included.
Using the above information, the characteristics of each expression pattern e are calculated as follows.
a. Only rows related to the expression pattern e in Table 1 are selected. Thereafter, a total sum of appearance counts of those rows is obtained and set as c.
b. For each of the selected rows, Table 2 is consulted with the project ID to check whether a project trouble has occurred. If a trouble has occurred, a relative time T is obtained, defined by the following equation.
T=(time and date when trouble occurrence is recognized)−(time and date of notification)
Thereafter, (T, (observation count)/c) is recorded for the expression pattern e. The second term, (observation count)/c, is called the normalized observation count.
c. The list of (T, normalized observation count) pairs thus obtained constitutes the characteristics of the expression pattern e.
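Steps a to c can be sketched for all expression patterns at once. This is a minimal sketch assuming that Table 1 rows are (project_id, notified, pattern, count) tuples, that Table 2 maps each troubled project ID to a single recognition date, and that T is measured in days; the function name `pattern_characteristics` is hypothetical. Multiple troubles per project are not handled here.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple


def pattern_characteristics(
        rows: List[Tuple[str, date, str, int]],
        trouble_date: Dict[str, date]) -> Dict[str, List[Tuple[int, float]]]:
    """Return pattern -> list of (T in days, normalized observation count)."""
    # Step a: c = total appearance count of each pattern over all rows.
    c = defaultdict(int)
    for _, _, pattern, count in rows:
        c[pattern] += count

    # Step b: for rows of troubled projects, record (T, count / c).
    characteristics = defaultdict(list)
    for project_id, notified, pattern, count in rows:
        recognized = trouble_date.get(project_id)
        if recognized is None:
            continue  # no trouble recorded for this project
        T = (recognized - notified).days
        characteristics[pattern].append((T, count / c[pattern]))

    # Step c: the (T, normalized count) list is the pattern's characteristics.
    return dict(characteristics)
```

Because c counts appearances in all projects, including those without trouble, the normalized counts of a pattern sum to the fraction of its appearances that occurred in troubled projects.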
Note that
For later description, this probability is expressed as Prob[trouble|e](T).
An output from the pattern characteristic calculation part is a group containing the list (T, normalized observation counts) indicating the characteristics of each expression pattern and a value of effectiveness.
Note that, instead of using the above relative time directly as T, a progress rate of the project can also be used; the progress rate is obtained by dividing the relative time by the length of the project. In this case, T is defined as follows.
T={(time and date when the trouble occurrence is recognized)−(time and date of notification)}/(length of project)
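The two definitions of T differ only by the division by the project length. The sketch below assumes that time stamps are `datetime` values and that the project length is given in days; both function names are hypothetical.

```python
from datetime import datetime


def relative_time(recognized: datetime, notified: datetime) -> float:
    """T = (trouble recognized) - (notification), expressed in days."""
    return (recognized - notified).total_seconds() / 86400


def progress_rate(recognized: datetime, notified: datetime,
                  project_length_days: float) -> float:
    """T divided by the project length, giving a unitless progress rate."""
    return relative_time(recognized, notified) / project_length_days
```

Using the progress rate makes T comparable between short and long projects, at the cost of requiring the project length to be known.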
Moreover, when the past project information contains a large volume of data covering projects of various kinds and properties, the above processing can also be performed separately for each kind or property of project. In this case, the characteristics of each expression pattern are calculated for each kind or property of project.
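Per-kind calculation amounts to partitioning the past rows and running the same characteristic calculation on each partition. In this minimal sketch, `kind_of` (how a row's project kind is determined) and `compute` (the characteristic calculation itself) are hypothetical parameters supplied by the caller.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, TypeVar

Row = TypeVar("Row")
Result = TypeVar("Result")


def characteristics_by_kind(
        rows: Iterable[Row],
        kind_of: Callable[[Row], Hashable],
        compute: Callable[[List[Row]], Result]) -> Dict[Hashable, Result]:
    """Split rows by project kind/property and compute each group separately."""
    groups = defaultdict(list)
    for row in rows:
        groups[kind_of(row)].append(row)
    return {kind: compute(group) for kind, group in groups.items()}
```

Each partition then yields its own characteristics, so a new project can be scored with those of its own kind.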
The expression pattern management part 212 manages a set of expression patterns together with characteristic information of the expression patterns. The expression pattern characteristic information is as described above in connection with the pattern characteristic calculation part 208.
The operations of the expression pattern management part 212 are as follows.
New project information 220 is data on a project subject to trouble prediction. The data format of the new project information 220 preferably includes the project ID of the new project, a text reporting the state of the project, and the time and date of notification.
The text analysis part 204 may be used as-is as a text analysis part 222, or a separate component having approximately the same function may be used. The text analysis part 222 also performs analysis using expression patterns 216 provided by the expression pattern management part 212.
As in the case of the text analysis part 204, the text analysis part 222 generates appearing expression patterns 224. The appearing expression patterns 224 are provided to the output part 226.
The following are specific contents of processing executed by the output part 226.
Using the above, an estimated probability that a trouble occurs within the time T can be output.
The output part 226 performs the above processing for each target project, sorts the projects in descending order of trouble occurrence probability, and outputs them.
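The specific processing contents of the output part 226 are not reproduced here, so the following is only a sketch under stated assumptions: the probability that a trouble occurs within a horizon is taken to be the sum of the normalized counts p over items with 0 < T ≤ horizon, and a project is scored by the maximum of that value over the expression patterns in its report. Both function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

CharList = List[Tuple[float, float]]  # (T, normalized count p) items


def trouble_probability_within(char_list: CharList, horizon: float) -> float:
    """Assumed reading: sum of p over items with 0 < T <= horizon."""
    return sum(p for T, p in char_list if 0 < T <= horizon)


def rank_projects_within(projects: Iterable[Tuple[str, Iterable[str]]],
                         chars: Dict[str, CharList],
                         horizon: float) -> List[Tuple[str, float]]:
    """Score each project by its best pattern, then sort in descending order."""
    scored = []
    for project_id, patterns in projects:
        prob = max((trouble_probability_within(chars[e], horizon)
                    for e in patterns if e in chars), default=0.0)
        scored.append((project_id, prob))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored
```

Taking the maximum rather than, say, combining probabilities across patterns mirrors the pp_max approach used in the prediction flowchart.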
Next, with reference to flowcharts shown in
In Step 504, it is determined whether there is an unprocessed project report. If all the project reports have been processed, the determination is No, and in Step 506 the total appearance count C[e] of each expression pattern is output as a list. The processing then terminates.
If it is determined in Step 504 that there is an unprocessed project report, an expression pattern list Er of a next project report r is obtained in Step 508. Thereafter, in Step 510, it is determined whether or not there is an unprocessed expression pattern in the expression pattern list Er. If it is determined in Step 510 that all the expression patterns in the expression pattern list Er are processed, the processing returns to the determination in Step 504.
If it is determined in Step 510 that there is still an unprocessed expression pattern in the expression pattern list Er, a next expression pattern e is taken out in Step 512.
In Step 514, it is determined whether or not the expression pattern e appears for the first time. If so, in Step 516 the total appearance count C[e] of the expression pattern e is initialized to 0.
Next, in Step 518, the total appearance count C[e] of the expression pattern e is incremented by the number of appearances of e in the expression pattern list Er. Thereafter, the processing returns to the determination in Step 510.
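Steps 504 to 518 accumulate C[e] over all past reports. In this minimal sketch each report's expression pattern list Er is assumed to be a dict mapping a pattern e to its number of appearances in that report; the function name is hypothetical.

```python
from typing import Dict, Iterable


def total_appearance_counts(reports: Iterable[Dict[str, int]]) -> Dict[str, int]:
    """Accumulate the total appearance count C[e] over all project reports."""
    C: Dict[str, int] = {}
    for Er in reports:                  # Steps 504 and 508: next report
        for e, n in Er.items():         # Steps 510 and 512: next pattern
            if e not in C:              # Steps 514 and 516: first appearance
                C[e] = 0
            C[e] += n                   # Step 518: add appearances in Er
    return C                            # Step 506: output C[e] as a list
```

In idiomatic Python the initialization could be folded into `collections.Counter`; the explicit branch is kept to follow Steps 514 and 516.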
In Step 604, it is determined whether there is an unprocessed project report. If all the project reports have been processed, the determination is No, and in Step 606 the characteristic list of each expression pattern is output. The processing then terminates.
If it is determined in Step 604 that there is an unprocessed project report, the next project report r is obtained in Step 608.
In Step 610, a project ID of the project report r is set to p_id.
In Step 612, trouble information on p_id is obtained. Thereafter, in Step 614, it is determined whether or not there are one or more troubles in the trouble information.
If it is determined in Step 614 that there are one or more troubles in the trouble information, the number of troubles of p_id is set to Nt in Step 616.
If it is determined in Step 614 that there is no trouble in the trouble information, the processing returns to Step 604.
Subsequent to Step 616, a time stamp of the project report r is set to Tr in Step 618.
In Step 620, an expression pattern list within the project report r is obtained. Thereafter, in Step 622, it is determined whether or not there is an unprocessed expression pattern. If there is no more unprocessed expression pattern, the processing returns to Step 604.
If it is determined in Step 622 that there is an unprocessed expression pattern, the next expression pattern e is taken out in Step 624.
In Step 626, the number of appearances of the expression pattern e within the project report r is stored in cr.
In Step 628, trouble information on p_id is obtained. Thereafter, in Step 630, it is determined whether or not there is unprocessed trouble information.
If it is determined in Step 630 that there is no more unprocessed trouble information, the processing returns to Step 622.
If it is determined in Step 630 that there is still unprocessed trouble information, a time stamp of the trouble information is stored in Tt in Step 632. Thereafter, in Step 634, Tt−Tr is assigned to T.
In Step 636, cr/C[e]/Nt is assigned to p.
In Step 638, the subroutine 638 that adds [T, p] to the characteristic list L of the expression pattern e is executed. The processing then returns to Step 630.
In Step 706, p is set to the normalized count, that is, the value cr/C[e]/Nt calculated in Step 636.
In Step 708, it is determined whether or not the expression pattern e appears for the first time. If so, the characteristic list L of the expression pattern e is emptied in Step 710.
Next, in Step 712, [T, p] is added to the characteristic list L of the expression pattern e.
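Steps 604 to 638, together with the subroutine of Steps 706 to 712, can be sketched as a nested loop. This assumes each report is a (p_id, Tr, Er) tuple with Er mapping a pattern to its count cr, that trouble information maps p_id to a list of trouble time stamps Tt, and that time stamps are plain numbers (for example, day numbers) so that Tt − Tr is numeric. The function name is hypothetical.

```python
from typing import Dict, List, Tuple


def characteristic_lists(
        reports: List[Tuple[str, float, Dict[str, int]]],
        troubles: Dict[str, List[float]],
        C: Dict[str, int]) -> Dict[str, List[Tuple[float, float]]]:
    """Build the characteristic list L of every expression pattern e."""
    L: Dict[str, List[Tuple[float, float]]] = {}
    for p_id, Tr, Er in reports:                 # Steps 604-610, 618-620
        trouble_times = troubles.get(p_id, [])   # Step 612
        Nt = len(trouble_times)                  # Steps 614 and 616
        if Nt == 0:
            continue                             # no trouble: back to Step 604
        for e, cr in Er.items():                 # Steps 622-626
            for Tt in trouble_times:             # Steps 628-632
                T = Tt - Tr                      # Step 634
                p = cr / C[e] / Nt               # Step 636
                L.setdefault(e, []).append((T, p))  # Steps 706-712
    return L
```

Dividing by Nt spreads a report's weight evenly over the project's troubles, so a project with several troubles does not count several times as much as one with a single trouble.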
Next, with reference to a flowchart of
In Step 802 shown in
In Step 808, the expression pattern associated with the characteristic list L is set to e.
In Step 810, a floating-point variable pp is initialized to 0.0. Then, in Step 812, it is determined whether unprocessed items remain in the characteristic list L. If so, the next histogram item [T, p] is taken out of the characteristic list L in Step 814.
In Step 816, it is determined whether the value of T in the histogram item [T, p] is larger than 0. If T is not larger than 0, the trouble had already been recognized when the report was written, so the item is not useful for prediction and the processing immediately returns to Step 812.
If T>0, p is added to pp in Step 818. Thereafter, the processing returns to Step 812.
If it is determined in Step 812 that no unprocessed item remains in the characteristic list L, pp is assigned as the trouble probability of the expression pattern e in Step 820. The processing then returns to Step 804.
If it is determined, back in Step 804, that there is no more unprocessed characteristic list, the processing advances to Step 822 where the expression patterns are sorted in descending order of the trouble probability.
In Step 824, a list Ep of the expression patterns whose trouble probabilities exceed a threshold is output. In Step 826, a user checks the list Ep through a GUI on the display 114 and, for example, selects which of the expression patterns are actually to be used.
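Steps 804 to 824 can be sketched as follows, assuming the characteristic lists are a dict from pattern e to its list of (T, p) items; the function name and the threshold argument are hypothetical. The user selection of Step 826 is interactive and is not modeled.

```python
from typing import Dict, List, Tuple


def select_patterns(char_lists: Dict[str, List[Tuple[float, float]]],
                    threshold: float) -> List[Tuple[str, float]]:
    """Return [(e, pp)] with pp > threshold, in descending order of pp."""
    probs: Dict[str, float] = {}
    for e, L in char_lists.items():      # Steps 804-808
        pp = 0.0                         # Step 810
        for T, p in L:                   # Steps 812 and 814
            if T > 0:                    # Step 816: skip items with T <= 0
                pp += p                  # Step 818
        probs[e] = pp                    # Step 820
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)  # Step 822
    return [(e, pp) for e, pp in ranked if pp > threshold]              # Step 824
```

The threshold trades coverage for precision: a lower value keeps more patterns for the user to review in Step 826.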
Specifically, in
Next, with reference to
In Step 1002 shown in
In Step 1006, it is determined whether there is an unprocessed project report in the list of new project reports. If there is, the next project report r is retrieved in Step 1008.
Next, in Step 1010, a project ID of the project report r is assigned to a variable project_id. Thereafter, in Step 1012, a text of the project report is subjected to syntactic analysis. This processing is executed by the text analysis part 222 shown in
Thus, in Step 1014, a list Er of the expression patterns included in the project report r is obtained.
Next, in Step 1016, 0.0 is assigned to a variable pp_max. Then, in Step 1018, it is determined whether there is an unprocessed pattern in Er. If there is, the processing advances to Step 1020, where the next expression pattern e is obtained from Er.
In Step 1022, it is determined whether or not the expression pattern e is included in Ep. Here, Ep means the expression pattern characteristics 218 provided by the expression pattern management part 212. It can also be said that Ep is one selected as the selected expression pattern list in
If the determination in Step 1022 is negative, the processing returns to Step 1018. On the other hand, if it is determined in Step 1022 that the expression pattern e is included in Ep, a trouble probability of the expression pattern e is assigned to a variable pp. As can be seen from
In Step 1026, it is determined whether or not pp thus obtained is larger than pp_max. If pp is larger than pp_max, the maximum value pp_max is updated by assigning a value of pp to pp_max in Step 1028. Thereafter, the processing returns to Step 1018. On the other hand, if pp is not larger than pp_max, the processing directly returns to Step 1018.
When it is determined in Step 1018 that no unprocessed pattern remains in Er, pp_max is set as the predicted trouble probability of the project report r in Step 1030. The processing then returns to Step 1006.
If it is determined in Step 1006 that no unprocessed project report remains, the list of project reports is sorted in descending order of predicted trouble probability in Step 1032. Then, in Step 1034, the sorted list of project reports is preferably output and displayed on the display 114.
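The prediction loop of Steps 1006 to 1034 reduces to taking, for each new report, the largest trouble probability among its selected patterns and then sorting. This minimal sketch assumes the text analysis has already produced each report's pattern set Er, and that Ep is a dict from selected pattern to its trouble probability; the function name is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def rank_new_reports(new_reports: Iterable[Tuple[str, Iterable[str]]],
                     Ep: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (project_id, predicted trouble probability), highest first."""
    results = []
    for project_id, Er in new_reports:        # Steps 1006-1014
        pp_max = 0.0                          # Step 1016
        for e in Er:                          # Steps 1018 and 1020
            if e not in Ep:                   # Step 1022: not a selected pattern
                continue
            pp = Ep[e]
            if pp > pp_max:                   # Steps 1026 and 1028
                pp_max = pp
        results.append((project_id, pp_max))  # Step 1030
    results.sort(key=lambda item: item[1], reverse=True)  # Step 1032
    return results                            # Step 1034: output for display
```

A report containing no selected pattern keeps pp_max = 0.0 and therefore sorts to the bottom of the list.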
Next, description will be given of a concrete example of the past project report and processing associated therewith.
It is assumed that there is the following past project report.
(In Japanese) Tsugi no tsuki ikou no keiyaku ni tsuite okyaku-sama to chousei-shita. Sarani, okyaku-sama kara souteigai no gaibu sekkei kikan enchou no hanashi ga ari, tsuika shien keiyaku nai de jisshi suru koto de goui. Yoki senu purojekuto keikaku henkou ga hassei shita. Shanai ni okeru keiyaku no chousei ga hitsuyou de aru. Mata, kokyaku manzokudo chousa ni okeru hyouka no teika mo houkoku sareta. Okyaku-sama ni taishite mo shinchou na taiou ga hitsuyou de aru.
(An adjustment has been made with the client regarding the contract from next month onward. In addition, the client unexpectedly proposed extending the period of the external design, and it was agreed that this would be carried out within the additional support contract. An unexpected project plan change has occurred. An in-house adjustment of the contract is required. Moreover, a drop in the rating in the customer satisfaction survey was also reported. Careful handling of the client is also required.)
The following table shows the appearance counts of the expression patterns extracted by the text analysis processing. Here, the table shows the case where the “noun . . . verb” and “noun . . . adjective verb” forms among the Japanese “subject . . . predicate” forms are used as the expression patterns. The expression patterns can be obtained in either of two ways: by specifying a modification (dependency) pattern and extracting the expressions that match it, as described here; or by manually creating a set of specific expressions in advance as a dictionary and extracting the items that match the dictionary.
Each of the past reports is similarly analyzed and added to the above table to obtain a past project information set.
Next, description will be given of an example of expression pattern characteristics.
The expression pattern characteristics are represented by a list of [T, p]. For example, expression pattern characteristics calculated for the expression pattern “keikaku henkou . . . hassei-suru” are sorted in ascending order of T. The following table shows the result. In this table, the third column shows a cumulative value of p (a sum of a cumulative value of p up to the previous row (0 in the case of the first row) and p in the current row).
Specifically, if the above expression pattern appears in a report, the project becomes a troubled project with a probability of 87.9% (0.879388), according to the total cumulative value. Furthermore, according to the cumulative value over the portion with T>0, a trouble occurs after the point at which the expression pattern appears in the report with a probability of 76.2% (pp=0.879388−0.116946=0.762442). In general, the larger the value of pp, the more useful the expression pattern is for trouble prediction.
Next, an example of a prediction result obtained by applying the expression pattern characteristics of past projects will be described.
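Written with the characteristic list L_e of the expression pattern e, the value pp read off the cumulative column is the final cumulative value minus the cumulative value at the last row with T ≤ 0:

```latex
pp(e) = \sum_{(T,\,p) \in L_e,\ T > 0} p
      = \sum_{(T,\,p) \in L_e} p \;-\; \sum_{(T,\,p) \in L_e,\ T \le 0} p
      = 0.879388 - 0.116946 = 0.762442
```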
The following is an example of a new project report to be a target for trouble prediction.
(In Japanese) Shanai tetsuzuki de keiyaku shounin made sunda ga, sono go okyaku-sama tsugou ni yori 11/1 kaishi ni henkou. Sagyou naiyou no naibu chousei chuu. Jitsu-keiyaku ni mukete okyaku-sama to hibi chousei chuu. Genzai okure to naru youin ha miukerarenai ga, ikutsuka no keikaku henkou ga hassei shite iru. Sonotame, kongetsu ni haitte kara suudo, sagyou sukejyu-ru wo okyaku-sama to chousei shita. Sono kekka shidai deha, kongo henkou ga hassei uru kanousei ga aru. Kongo mo chuui ga hituyou to kangaete iru.
(The internal procedures have been completed up to approval of the contract. However, the starting date was subsequently changed to November 1 at the client's request. Internal adjustments of the work details are in progress. Adjustments are being made daily with the client toward the actual contract. Currently, there is no apparent factor that would cause a delay, but several plan changes have occurred. Therefore, since the start of this month, the work schedule has been adjusted with the client several times. Depending on the results, further changes may occur. We consider that continued caution is required.)
The following are expression patterns which are extracted by the text analysis and used for trouble prediction.
With reference to Table 5, the maximum value of pp obtained from this report is pp_max=0.7624422 (in the case of “keikaku henkou . . . hassei-suru”).
Similarly, the above processing is performed for each of new reports on other projects to obtain pp_max. Thereafter, the projects are sorted in descending order of pp_max. The following table shows the result.
This is the order of project trouble occurrence probabilities obtained according to the present invention. Since a project ranked higher in the list has a higher trouble probability, the user can examine project situations thoroughly, starting from the top of the list (for example, the project 1084698 in the project report example described above has the second priority here). Thus, troubles can be detected early and prevented efficiently.
Although the present invention has been described above based on a specific embodiment, the embodiment is only one example of the present invention. Those skilled in the art can therefore devise various modifications without departing from the scope of the invention. For example, in the block diagram shown in
Moreover, for example, in the block diagram shown in
According to the present invention, the probability and time of trouble occurrence can be estimated from the expression patterns used in the project information, by use of the project trouble occurrence probability distribution whose origin is the point in time at which an expression pattern appears in a text of the project information, together with statistics representing that distribution. Thus, projects likely to have a trouble can be narrowed down with a reliability not previously achievable.
Although the preferred embodiment of the present invention has been described in detail, it should be understood that various changes, substitutions and alterations can be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
200873565 | Mar 2008 | JP | national |