Many software applications have been written to perform particular tasks. For example, test programs have been written to test electronic equipment and identify faults in it. However, these programs are often designed only to run, independently, a series of tests on a particular electronic unit to determine faults in that unit. They lack the ability to make intelligent decisions and/or to interact with other test software. To compensate for this lack of ability, a decision support system can be implemented. A decision support system collects and analyzes data and the test software in order to make intelligent decisions about how and when the test software is to be run.
To design a decision support system that can augment the test software with intelligent decision making, it is important for the decision support system to have access to, and to analyze, the process flow of the different test software. However, analyzing the process flow can be extremely complex, due in part to the many different decision branches that can be taken while the test software runs. A decision branch is a branch of the source code that is run if a particular outcome of a test occurs. For example, if a test fails, one branch of the test software is run, whereas if the test passes, a different branch is run. Analyzing the process flow of multiple test programs is further complicated by the fact that the programs are often written in different programming languages, or in different syntax or vocabulary variants of the same programming language.
The above-mentioned problems and other problems are resolved by the present invention and will be understood by reading and studying the following specification.
In one embodiment, a method of extracting process flow data from a source code is provided. The method comprises parsing the source code with a plurality of regular expressions to identify a plurality of elements and at least one relationship between two or more identified elements, and generating a model of the source code process flow based on the plurality of identified elements and the at least one relationship between two or more identified elements.
In another embodiment, a system adapted to analyze source code is provided. The system comprises a parser adapted to parse a source code with a plurality of regular expressions to identify a plurality of elements and at least one relationship between two or more identified elements, and a model generator adapted to generate a model of the source code process flow based on the plurality of identified elements and the at least one relationship.
In yet another embodiment, a computer program product comprising a computer-usable medium having computer-readable code embodied therein for configuring a computer processor is provided. The computer-readable code comprises first executable computer-readable code configured to cause a computer processor to parse a source code using at least one regular expression in order to identify a plurality of elements and at least one relationship between two or more identified elements, and second executable computer-readable code configured to cause a computer processor to generate a model of the source code process flow based on the plurality of identified elements and the at least one relationship.
The present invention can be more easily understood and further advantages and uses thereof more readily apparent, when considered in view of the description of the preferred embodiments and the following figures in which:
In the following detailed description, reference is made to the accompanying drawings that form a part hereof, and in which is shown by way of illustration specific illustrative embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention, and it is to be understood that other embodiments may be utilized and that logical, mechanical and electrical changes may be made without departing from the scope of the present invention. It should be understood that the exemplary method illustrated may include additional or fewer steps or may be performed in the context of a larger processing scheme. Furthermore, the method presented in the drawing figures or the specification is not to be construed as limiting the order in which the individual steps may be performed. The following detailed description is, therefore, not to be taken in a limiting sense.
Embodiments of the present invention enable automated extraction of process flow data through parsing the source code. The parsed data is then used in the generation of a process flow model. In addition, embodiments of the present invention enable editing of the parser output to improve the accuracy and reliability of the generated process flow model. The process flow model is generated in a format that is usable by a high level application (e.g. a decision support system) to analyze the data and make intelligent decisions. For example, the process flow model is analyzed to determine which tests should be run and the order in which the selected tests should be run.
The use of regular expressions enables parser 102 to identify elements in the source code. For example, in this embodiment, parser 102 is adapted to use regular expressions to identify elements which indicate, among other things, tests to be run, outcomes of the tests, and actions to be taken based on the outcome (also known as indictments). In this embodiment, indictments include, but are not limited to, additional tests to be run and suggested repairs.
In order to associate identified elements with each other, parser 102 maintains one or more state variables. In particular, the state variables keep track of the current position in the process flow. For example, in this embodiment, state variables record the last test found and the outcome value found for that test. In this way, as each indictment is found, it is related to a given test and outcome. The identified tests are also related to one another through the state variables. For example, in this embodiment, if a test is found after the last test without an outcome being found between the two tests, the second test is determined to run regardless of the outcome of the prior test. Other tests, by contrast, are identified as dependent on the outcome of a prior test; such a test is run only if a particular outcome occurs for the prior test.
Additionally, parser 102 is configurable so that it can parse variations in different source code files. For example, in this embodiment, parser 102 is adapted to handle modified vocabularies and possible syntax distortions of the ATLAS code. These variations can cause the same regular expression to return different results for different source code files. In this embodiment, parser 102 uses a plurality of configuration files to handle the different variations. Each configuration file contains a set of regular expressions corresponding to a different variation. If the differences are mutually exclusive to each variation, the different configuration files can be combined into one configuration file. However, in other embodiments, the differences are not mutually exclusive. In such embodiments, a global configuration file holds the regular expressions that are common to all of the variations, whereas the non-mutually exclusive differences are stored in a separate local configuration file for each variation. Parser 102 selects a local configuration file and uses both the local and global configuration files to parse a particular source code file.
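The association rule above can be illustrated with a short Python sketch. This is not the parser's actual implementation. It assumes that one explored branch of the source code has already been reduced to an ordered list of hypothetical `(kind, value)` events. It also uses the placeholder outcome `"Any"` to mark a test that runs regardless of the prior test's outcome.

```python
from typing import List, Optional, Tuple

Event = Tuple[str, str]               # (kind, value); kind is "test", "check" or "indictment"
Relation = Tuple[str, str, str, str]  # (relation, prior test, outcome, target)


def relate_elements(events: List[Event]) -> List[Relation]:
    """Relate tests, outcomes and indictments along one branch using two state variables."""
    last_test: Optional[str] = None  # state variable: last test found
    outcome: Optional[str] = None    # state variable: outcome of last_test on this branch
    relations: List[Relation] = []
    for kind, value in events:
        if kind == "test":
            if last_test is not None:
                # No outcome between the two tests: the new test runs regardless of the
                # prior outcome ("Any"); otherwise it depends on that particular outcome.
                relations.append(("next", last_test, outcome or "Any", value))
            last_test, outcome = value, None
        elif kind == "check":
            outcome = value
        elif kind == "indictment":
            # Each indictment is tied to the current test and outcome.
            relations.append(("indict", last_test or "", outcome or "Any", value))
    return relations


print(relate_elements([("test", "T1"), ("test", "T2"), ("check", "Failed"), ("indictment", "A1")]))
# [('next', 'T1', 'Any', 'T2'), ('indict', 'T2', 'Failed', 'A1')]
```

Here T2 follows T1 with no intervening check, so it runs whatever T1's outcome is. The indictment A1 is tied to the failed outcome of T2.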
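The global/local arrangement can be sketched as two dictionaries of patterns that are merged before parsing. The variation names, element names and patterns below are hypothetical. Letting a local entry override a global entry with the same name is an assumed design choice; the text only says that both files are used together.

```python
import re
from typing import Dict

# Hypothetical global configuration: expressions shared by every variation.
GLOBAL_CONFIG: Dict[str, str] = {
    "test": r"^\s+(?:COMPARE|VERIFY)\b",
    "check": r"^\s+IF\s*,\s*NOGO\b",
}

# Hypothetical local configurations: one per variation (names and patterns assumed).
LOCAL_CONFIGS: Dict[str, Dict[str, str]] = {
    "variant_a": {"indictment": r"PERFORM\s*,\s*'ONEFAILURE'"},
    "variant_b": {
        "indictment": r"PERFORM\s*,\s*'ONE-FAILURE'",
        "check": r"^\s+IF\s*,\s*NO-GO\b",  # this variation spells the flag differently
    },
}


def load_patterns(variant: str) -> Dict[str, re.Pattern]:
    """Combine the global file with the selected local file and compile the result."""
    merged = dict(GLOBAL_CONFIG)
    merged.update(LOCAL_CONFIGS[variant])  # assumption: local entries override global ones
    return {name: re.compile(pattern) for name, pattern in merged.items()}


patterns = load_patterns("variant_b")
print(bool(patterns["check"].search("   IF, NO-GO THEN")))  # True
```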
In the embodiment in
Model generator 106, in this embodiment, is adapted to receive and process editable parser output 104. Model generator 106 generates the model based on the elements and relationships identified by parser 102. In particular, model generator 106, in this embodiment, formats the parser output into a tree structure model. For example, model generator 106 generates and outputs a tree structure in eXtensible Markup Language (XML) for each received editable parser output 104 in this example. The generated model can then be used by other high level applications to analyze the process flow.
In operation, parser 102 receives and parses the source code for one or more programs. Parser 102 parses the source code using regular expressions to identify elements and relationships between the identified elements. Parser 102 outputs the parsing results as editable parser output 104. A human, in this example, reviews editable parser output 104 for errors, such as redundancies, false positives, etc. Once editable parser output 104 has been reviewed and edited, it is input to model generator 106. Model generator 106 generates a tree structure model of the process flow based on an analysis of the edited data in editable parser output 104.
A parser, such as parser 102, reads and stores statement numbers 202 and the corresponding strings. For each line, everything not part of statement number 202 is stored as a text string. The parser searches the text string of each line with configurable regular expressions. In this example, the regular expressions are implemented using the Practical Extraction and Report Language (Perl). However, it is to be understood that in other embodiments, other languages are used to implement the regular expressions, such as the Tool Command Language (Tcl). The regular expressions are used to find keywords indicating particular elements of the code, such as tests, outcomes, and indictments in this example. In addition, other values are also located using regular expressions. Such values include configuration, outcome qualifier (HI, LO), test group ID, and programming language constructs such as IF, WHILE, GOTO and procedure calls.
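The line handling described above can be sketched in Python, the single language used for the examples in this section. The sketch assumes that each line begins with a six-digit statement number, as in "050015", followed by the text string. The source lines and patterns in the usage are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Assumption: each line begins with a six-digit statement number; everything after it,
# including leading spaces, is kept as the line's text string.
STATEMENT_RE = re.compile(r"^(\d{6})(.*)$")


def split_statements(source: str) -> List[Tuple[str, str]]:
    """Return (statement number, text string) pairs; lines without a number are skipped."""
    statements = []
    for line in source.splitlines():
        match = STATEMENT_RE.match(line)
        if match:
            statements.append((match.group(1), match.group(2)))
    return statements


def find_elements(statements: List[Tuple[str, str]],
                  patterns: Dict[str, re.Pattern]) -> List[Tuple[str, str]]:
    """Search each text string with every configured expression."""
    hits = []
    for number, text in statements:
        for name, pattern in patterns.items():
            if pattern.search(text):
                hits.append((number, name))
    return hits


stmts = split_statements("050015  VERIFY, X\n050020  IF, NOGO THEN\n")
print(find_elements(stmts, {"test": re.compile(r"^\s+(?:COMPARE|VERIFY)\b")}))
```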
For example, the following is a configurable Perl regular expression used to locate keywords indicating a test in source code 200:
The test regular expression looks for lines which begin with one or more spaces followed by either the word COMPARE or the word VERIFY. Each line identified using the test regular expression is a test. In this example, the test ID (e.g. unique identifying name) of each test is based on the statement number 202 corresponding to the line where each test is located. For example, a test is identified on line 204 using the regular expression above. Line 204 corresponds to the statement number “050015”. The test ID for this test is “T50015” based on the statement number. Alternatively, test IDs are determined differently in other embodiments. For example, in some embodiments, unique test IDs are found in the text strings of source code 200.
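A Python rendering of the test expression, as the text describes it, might look like the following. It is not the original Perl expression, which is not reproduced here. The rule for turning "050015" into "T50015" by dropping leading zeros is an assumption that happens to reproduce the example.

```python
import re
from typing import Optional

# Per the description: one or more leading spaces, then the word COMPARE or VERIFY.
TEST_RE = re.compile(r"^\s+(?:COMPARE|VERIFY)\b")


def make_test_id(statement_number: str, text: str) -> Optional[str]:
    """Return a test ID such as 'T50015' if the text string is a test, else None."""
    if not TEST_RE.match(text):
        return None
    # Assumption: "T" plus the statement number with leading zeros removed.
    return "T" + str(int(statement_number))


print(make_test_id("050015", "  VERIFY, (VOLTAGE)"))  # T50015
```

The `\b` word boundary keeps a longer word such as "COMPARED" from being counted as a test.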
Similarly, the following configurable Perl regular expression is used to locate keywords indicating outcomes of a test (also referred to as a check):
The check regular expression looks for lines beginning with one or more spaces followed by variations of the phrase “IF, NOGO”. Each line identified using the check regular expression indicates a check. A check is a point where the code branches in two or more directions. For example, in this embodiment, a check indicates a determination if the previous test failed or passed. If the “NOGO” flag is true, the expression “IF, NOGO” evaluates to true, and the outcome of the test is failed. This outcome is depicted by the statement ‘<TrueOutcome value=“Failed”/>’ in the check regular expression. If the “NOGO” flag is false, the expression “IF, NOGO” evaluates to false, and the outcome of the test is passed, as depicted by the statement ‘<FalseOutcome value=“Passed”/>’. The parser keeps track of whether it is inside the IF statement or not. In this way, it determines the outcome of the test and which branch of the code is being followed. In this example, the parser locates a check on line 206. For example, if the “NOGO” flag at this point is true, the expression “IF, NOGO” evaluates to true and the outcome of the test identified on line 204 is failed. The parser keeps track of what follows the IF statement identified on line 206 as the indictment corresponding to a failed test.
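The check expression and its outcome mapping can be sketched as follows. The `<TrueOutcome value="Failed"/>` and `<FalseOutcome value="Passed"/>` statements come from the text. The surrounding `<Check>` element, its `regex` attribute, and the pattern itself are hypothetical stand-ins for the actual configuration entry.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Hypothetical configuration entry; only the outcome statements are taken from the text.
CHECK_CONFIG = r"""
<Check regex="^\s+IF\s*,\s*NOGO\b">
  <TrueOutcome value="Failed"/>
  <FalseOutcome value="Passed"/>
</Check>
"""

config = ET.fromstring(CHECK_CONFIG.strip())
CHECK_RE = re.compile(config.get("regex"))
TRUE_OUTCOME = config.find("TrueOutcome").get("value")    # NOGO flag true -> test failed
FALSE_OUTCOME = config.find("FalseOutcome").get("value")  # NOGO flag false -> test passed


def check_outcomes(text: str) -> Optional[Tuple[str, str]]:
    """For a check line, return (outcome inside the IF branch, outcome when it is skipped)."""
    if not CHECK_RE.match(text):
        return None
    return TRUE_OUTCOME, FALSE_OUTCOME


print(check_outcomes("  IF, NOGO THEN"))
```

While the parser is inside the IF branch, it would set the Outcome state variable to the first value. On the branch that skips the IF, it would use the second value.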
Additionally, the following configurable Perl regular expression is used in this example to locate keywords indicating indictments:
The indictment regular expression looks for lines containing variations of the phrase “PERFORM, ‘ONEFAILURE’”. For example, the indictment regular expression identifies an indictment on line 208 corresponding to failed test “T50015”. The indictment on line 208 corresponds to statement number “050025”. The indictment regular expression also extracts the failed part “A1” from the string “C'A1'”. The capture group ([^\']+) in the statement extractRegex="C\'([^\']+)\'" matches one or more characters other than a single quote. The parser therefore returns whatever lies between C' and the next closing quote, here “A1”.
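The extraction step can be sketched in Python with the extract expression written using straight quotes. The indictment pattern, which tolerates spacing variations, and the sample lines are hypothetical.

```python
import re
from typing import Optional

# Hypothetical indictment expression tolerating spacing variations of "PERFORM, 'ONEFAILURE'".
INDICTMENT_RE = re.compile(r"PERFORM\s*,\s*'ONEFAILURE'")
# The extract expression from the text: C'([^']+)'.
EXTRACT_RE = re.compile(r"C'([^']+)'")


def extract_indictment(text: str) -> Optional[str]:
    """Return the failed part named in an indictment line, or None."""
    if not INDICTMENT_RE.search(text):
        return None
    match = EXTRACT_RE.search(text)
    # [^']+ stops at the first quote after C', so only the part name is captured.
    return match.group(1) if match else None


print(extract_indictment("  PERFORM, 'ONEFAILURE', C'A1' $"))  # A1
```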
It is to be understood that the regular expressions shown above are exemplary and that, in other embodiments, other regular expressions are used. In addition, other expressions are used to identify other elements such as outcome qualifiers.
For example, the first entry 314-1 in test column 302 is “T50015”. This is the test ID of the first test found in source code 200. Test “T50015” is part of “Test Group 1” as indicated by corresponding entry 314-2 in test group column 304. The test groups are used in some embodiments to determine entry points in the source code. For example, a given source code file may have multiple entry or starting points. The test group ID is used to associate tests allowing for any of the multiple starting points to be used.
The corresponding entry 314-3 in outcome column 306 is “Failed”, indicating that test “T50015” failed. In the same row, under indictment column 310, is entry 314-6 with the value “A1”. This information indicates that if the outcome of test “T50015” is a failed outcome, the corresponding action is “A1”, where “A1” is a variable representing a particular action. Similarly, the second entry 316-1 in test column 302 is also “T50015”, but the corresponding entry 316-3 in outcome column 306 is “Passed”. In this row, there is an entry 316-5, “T50040”, in the next test column 310. This indicates that when the outcome of test “T50015” is “Passed”, the next test to perform is test “T50040”. Similar relationships between tests, outcomes and indictments are shown in parser output 300 for other tests found in source code 200 of
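The tabular parser output can be pictured as an editable CSV file. The sketch below makes several assumptions. It uses hypothetical column names, omits the outcome qualifier column, and fills in only the two rows for test “T50015” described above; the test group of the second row is also assumed. The code writes the table and then looks up what follows a given test outcome.

```python
import csv
import io
from typing import Dict, List

FIELDS = ["test", "test_group", "outcome", "next_test", "indictment"]  # column names assumed

ROWS = [
    {"test": "T50015", "test_group": "Test Group 1", "outcome": "Failed",
     "next_test": "", "indictment": "A1"},
    {"test": "T50015", "test_group": "Test Group 1", "outcome": "Passed",
     "next_test": "T50040", "indictment": ""},
]


def write_table(rows: List[Dict[str, str]]) -> str:
    """Serialize parser rows as CSV text that a reviewer can open and edit."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def lookup(table_text: str, test: str, outcome: str) -> Dict[str, str]:
    """Find the row describing what happens after `test` produces `outcome`."""
    for row in csv.DictReader(io.StringIO(table_text)):
        if row["test"] == test and row["outcome"] == outcome:
            return row
    raise KeyError((test, outcome))


table = write_table(ROWS)
print(lookup(table, "T50015", "Passed")["next_test"])  # T50040
```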
As can be seen in
For example, sub-class 404a represents the fault “F1” (identified by its fault ID) in
In addition, sub-class 404a has properties 408a-408e. Properties 408a-408c indicate the test IDs of the tests and test outcomes that detect the fault “F1” (e.g. outcomeID “T1F” indicates a failed outcome of test T1). Property 408d indicates which test outcome indicates that fault “F1” is not a problem (i.e. the cleared outcome). Property 408e indicates which repair outcome indicates that the physical part has been repaired. Sub-classes 404b-404m have similar properties and attributes. For example, sub-class 404b has properties 406f and 406g which indicate possible outcomes of test “T1”. Therefore, model 400 provides information regarding relationships between tests, faults and repairs such that a high level application can analyze model 400 and make intelligent decisions regarding which tests to run and in which order.
At 504, the parser parses the source code using the regular expressions contained in the at least one selected configuration file. The parser steps through the source code line by line, exploring the different branches of the source code process flow. The parser identifies keywords indicating particular elements as described above. In this example, the parser looks for keywords indicating, among other things, tests, checks, indictments, outcome qualifiers, test groups, configuration, etc. The parser keeps track of where it is in the process flow by updating one or more state variables as it processes each line of code. For example, in this embodiment, the parser updates state variables for the last test found and the outcome value of the last test. Alternatively, additional or different state variables are used in other embodiments. For example, in some embodiments, an additional state variable for configuration is used. The configuration state variable is used to distinguish between two models of the same unit being tested by the source code. For example, if the unit being tested is an electronic component on an F/A-18 aircraft, a test failure may indicate different actions for model A than for model C. The configuration state variable keeps a record of when each model is being addressed in the source code.
At 506, the results of parsing the source code at 504 are output and edited. In this example, the results are output to an editable tabular file, such as output 300 above. This output is then reviewed and edited for errors, such as redundancy or missed elements. In this example, the reviewing and editing is performed by a human. However, in other embodiments, the reviewing and editing is automated by a computer and software. At 508, a model generator, such as model generator 106, uses the edited parser output to generate a model of the source code process flow. The model generator organizes the parser output into a tree structure relating the elements identified by the parser. In this example, the model generator relates each fault with the tests that detect it and with the recommended actions or repairs (i.e. indictments). In addition, the model generator, in this example, formats the parser output so that it is usable by a high level application in analyzing the process flow. In particular, the model generator, in this example, formats the parser output into an XML file, such as model 400 above.
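A model generator that relates each repair to the test outcomes that detect it can be sketched with Python's `xml.etree.ElementTree`. This is a simplification, not the layout of model 400. The element names `ProcessFlowModel`, `Fault` and `DetectedBy` are hypothetical. Keying each fault by its indictment is also an assumption, as is forming outcome IDs by appending the first letter of the outcome to the test ID, which mirrors the “T1F” example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_model(rows: Iterable[Tuple[str, str, str]]) -> str:
    """Group edited parser rows (test, outcome, indictment) into an XML tree.

    Each indictment becomes a <Fault> node listing the test outcomes that detect it.
    """
    detected_by: Dict[str, List[str]] = defaultdict(list)
    for test, outcome, indictment in rows:
        if indictment:
            # Assumed outcome-ID convention: "T1" + "Failed" -> "T1F".
            detected_by[indictment].append(test + outcome[0])
    root = ET.Element("ProcessFlowModel")
    for indictment in sorted(detected_by):
        fault = ET.SubElement(root, "Fault", {"repair": indictment})
        for outcome_id in detected_by[indictment]:
            ET.SubElement(fault, "DetectedBy", {"outcomeID": outcome_id})
    return ET.tostring(root, encoding="unicode")


print(build_model([("T1", "Failed", "A1"), ("T2", "Failed", "A1"), ("T1", "Passed", "")]))
```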
If the parser finds an element at 604, the parser stores in memory the condition of the state variables and other data (e.g. type of element and relationship to other elements) related to the identified element at 606. The state variables, in this example, are lastTest and Outcome, representing the last test found and the outcome of the test (e.g. failed or passed), respectively. However, it is to be understood that in other embodiments, additional and/or different state variables are used. At 608, the parser determines if the element identified at 604 is a test. If the element is a test, the parser updates the lastTest state variable with the test ID of the identified test at 610. In this example the test ID is based on the statement number of the line of code, as described above. Alternatively, other protocols are used for determining test IDs in other embodiments. The parser then moves to the next line of code to be processed at 602. In this example, the next line of code to be processed is the line of code which linearly follows the line just processed unless the line just processed calls another line of code. If another line of code is called, the parser moves to that line and internally remembers from which line of code the new line was called. However, in other embodiments, the parser moves forward in a strictly linear fashion from one line to the next while keeping track of relationships between lines based on call statements. If the element identified at 604 is not a test, the parser determines if the element is a check at 612.
If the element is a check, the parser updates the Outcome state variable at 614 to indicate which outcome is being processed. For example, as described above, the parser knows, based on keywords and regular expressions, whether or not it is inside an IF statement. In this way, the parser knows which outcome (e.g. passed, failed, fail lo, fail hi) is being processed. Method 600 then continues at 602 to process the next line of code as described above. In this way, the parser explores the branch of the code corresponding to a particular outcome. Additionally, the parser internally stores the values of the state variables for each line of code. As the parser recursively processes the source code, it can follow a different branch of the code for each check while avoiding processing lines of code a second time, because it remembers the values of the state variables for each line. For example, when a line of code is reached, the parser does not process it again if the state variables for that line are the same as on a prior visit.
The state variables are also used to reduce the processing power and time spent on irrelevant branches of the code (e.g. branches that do not contain information relevant to the task at hand). For example, if a branch of code goes through a series of outputs to a user without providing information regarding suggested repairs, additional tests, etc., the state variables associated with the start of the branch are the same at the end of the branch. Because the state variables are the same when the branch starts and ends, that branch of the code is ignored when the parser processes and outputs its results. Finally, if the element is not a check at 612, method 600 continues at 602 to process the next line of code as described above.
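The branch exploration of method 600 can be sketched as a worklist over `(line, state)` pairs. A pair that has already been visited is skipped, which both avoids reprocessing and guarantees termination on loops. The sketch makes several simplifying assumptions. The program is a hypothetical pre-parsed list of operations (`test`, `if_nogo`, `indict`, `goto`, `end`, plus non-element statements such as `print`). Procedure calls and the call-return bookkeeping described above are not modeled.

```python
from typing import List, Optional, Set, Tuple

State = Tuple[Optional[str], Optional[str]]  # (lastTest, Outcome)
Op = Tuple[str, object]                      # hypothetical pre-parsed statement

# ("if_nogo", n): the IF body starts on the next line; line n is reached when NOGO is false.
PROGRAM: List[Op] = [
    ("test", "T1"),
    ("if_nogo", 4),
    ("indict", "A1"),
    ("end", None),
    ("print", "T1 OK"),  # operator message: not an element, state unchanged
    ("test", "T2"),
    ("end", None),
]


def explore(program: List[Op]) -> List[Tuple[str, str, str, str]]:
    """Explore every branch once per distinct (line, state) pair."""
    rows = set()
    seen: Set[Tuple[int, State]] = set()
    stack: List[Tuple[int, State]] = [(0, (None, None))]
    while stack:
        line, state = stack.pop()
        # Do not reprocess a line reached again with the same state variables.
        if line >= len(program) or (line, state) in seen:
            continue
        seen.add((line, state))
        op, arg = program[line]
        last_test, outcome = state
        if op == "test":
            if last_test is not None:
                rows.add((last_test, outcome or "Any", "next", arg))
            stack.append((line + 1, (arg, None)))
        elif op == "if_nogo":
            stack.append((line + 1, (last_test, "Failed")))  # inside the IF: test failed
            stack.append((arg, (last_test, "Passed")))       # IF skipped: test passed
        elif op == "indict":
            rows.add((last_test or "", outcome or "Any", "indict", arg))
            stack.append((line + 1, state))
        elif op == "goto":
            stack.append((arg, state))
        elif op != "end":
            # Non-element statements leave the state unchanged and add no rows.
            stack.append((line + 1, state))
    return sorted(rows)


print(explore(PROGRAM))
```

The `print` statement on the passed branch produces no row, because the state is the same before and after it. A `goto` back to an earlier line with an unchanged state is cut off by the `seen` set.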
If a procedure definition is not identified at 704, it is determined if a procedure perform statement is identified at 708. A procedure perform statement is a statement in the source code indicating that a particular procedure should be run. Procedure perform statements are also identified using regular expressions. For example, in this embodiment, regular expressions search for variations of the term “PERFORM”. If a perform procedure statement is identified at 708, data regarding the procedure flow for the particular procedure to be run is inserted into the parser output for the overall source code flow, at 710 (e.g. rows of data are entered into a parser output such as parser output 300 for the procedure flow). Once the procedure flow data has been inserted, method 700 continues at 702 where the next line of code is parsed. If a procedure perform statement is not identified at 708, method 700 continues at 702 where the next line of code is parsed as described above. Method 700 improves the performance of the parser since each procedure need only be evaluated once and can then be inserted into the parser results whenever a perform procedure statement is identified calling a particular procedure. Since a particular procedure may be called multiple times in several different places in the source code, method 700 decreases processing time and required memory since the data for a particular procedure need only be calculated and stored once.
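The reuse of procedure results in method 700 can be sketched as a cache keyed by procedure name. The `PERFORM, 'NAME'` pattern and the procedure names are hypothetical. Each "row" here is just a stripped source line standing in for real parser-output rows. The sketch assumes that procedures are not recursive.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical form of a perform statement: PERFORM, 'NAME'.
PERFORM_RE = re.compile(r"^\s*PERFORM\s*,\s*'([^']+)'")


def parse_with_procedures(main: List[str],
                          procedures: Dict[str, List[str]]) -> Tuple[List[str], List[str]]:
    """Expand perform statements, parsing each procedure body at most once.

    Returns the expanded rows and the names of the procedures actually parsed.
    """
    cache: Dict[str, List[str]] = {}
    parsed: List[str] = []

    def procedure_rows(name: str) -> List[str]:
        if name not in cache:
            parsed.append(name)                    # first call: evaluate the procedure
            cache[name] = expand(procedures[name])
        return cache[name]                         # later calls: reuse the stored rows

    def expand(lines: List[str]) -> List[str]:
        rows: List[str] = []
        for line in lines:
            match = PERFORM_RE.match(line)
            if match:
                rows.extend(procedure_rows(match.group(1)))  # insert the procedure flow
            else:
                rows.append(line.strip())
        return rows

    return expand(main), parsed
```

If `SELF_TEST` is performed twice, its rows are inserted twice, but its body appears only once in `parsed`.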
1. lastTest=TEST1, outcome=failed
2. lastTest=TEST2, outcome=failed
3. lastTest=TEST3, outcome=failed
4. lastTest=TEST3, outcome=passed
The state variable lastTest represents the last test found, and outcome represents the test outcome of the last test found. Since the procedure will exit if any of the three tests fail, there is an exit state corresponding to each of those states. In addition, the procedure will exit once all of the tests have been run. Therefore, a fourth exit state is included, indicating that TEST3 (the last test in the procedure) has run. It is to be understood that in other examples, exit states may contain additional and/or different state variables, such as a state variable for configuration as discussed above. In addition, the exit states may be based on criteria other than a failed or passed test.
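The four exit states listed above follow a simple rule for a procedure that runs its tests in order and exits on the first failure. The sketch below captures that rule. The `resume_points` helper is a hypothetical illustration of how a caller might continue parsing once from each exit state.

```python
from typing import List, Tuple


def exit_states(tests: List[str]) -> List[Tuple[str, str]]:
    """Exit states (lastTest, outcome) of a procedure that runs `tests` in order and
    exits as soon as any test fails."""
    states = [(test, "failed") for test in tests]  # one exit per possible failure
    if tests:
        states.append((tests[-1], "passed"))       # exit after the last test passes
    return states


def resume_points(return_line: int, tests: List[str]) -> List[Tuple[int, str, str]]:
    """Hypothetical: each exit state becomes a starting state for the caller's next line."""
    return [(return_line, last_test, outcome) for last_test, outcome in exit_states(tests)]


print(exit_states(["TEST1", "TEST2", "TEST3"]))
```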
Memory 1004 includes any type of suitable medium such as floppy disks, conventional hard disks, DVD-RW, CD-RW, reprogrammable non-volatile memory such as flash memory and EEPROM, volatile memory such as dynamic RAM and static RAM, and any other existing or later developed suitable medium. Processor 1002 and memory 1004 are coupled together allowing processor 1002 to write to and store data in memory 1004 as well as retrieve stored data from memory 1004. For example, memory 1004 is used to store state variable values as well as data regarding identified elements in the source code (e.g. relationships, state variable values for each line, etc.).
Processor 1002 includes or interfaces with hardware components that support the parsing of source code and generation of a process flow model as described above. By way of example and not by way of limitation, these hardware components include one or more microprocessors, graphics processors, memories, storage devices, interface cards, and other components used to parse a source code file and to generate a process flow model from the parsed source code file. Additionally, processor 1002 includes or functions with software programs, firmware or computer readable instructions for carrying out various methods, process tasks, calculations, and control functions for the parsing of source code and the generation of a process model. The computer readable instructions, firmware and software programs are tangibly embodied on any appropriate medium used for storage of computer readable instructions including, but not limited to, all forms of non-volatile memory, including, by way of example and not by limitation, semiconductor memory devices, such as EPROM, EEPROM, and flash memory devices; magnetic disks such as internal hard disks and removable disks; magneto-optical disks; and DVD disks.
In operation, processing unit 1000 receives a source code file via input/output interface 1006. According to computer-readable instructions (stored in memory 1004 in this example), processor 1002 parses the source code file using regular expressions to identify certain elements of the source code, as described above. Data regarding the identified elements and the relationships between them is stored in memory 1004. Once processor 1002 has completed parsing the source code file, the results are generated and output in a human-readable file via input/output interface 1006. Alternatively, the results are generated and output in a computer-readable file via input/output interface 1006. Once the file has been reviewed and edited, the edited results are input via input/output interface 1006. The edited results are then used by processor 1002 to generate a process flow model as described above. Notably, although processing unit 1000 is used, in this example, both to parse the source code file and to generate the process flow model, it is to be understood that in other embodiments, one processing unit is used to parse the source code file and another processing unit is used to generate the process flow model.
Although specific embodiments have been illustrated and described herein, it will be appreciated by those of ordinary skill in the art that any arrangement, which is calculated to achieve the same purpose, may be substituted for the specific embodiment shown. This application is intended to cover any adaptations or variations of the present invention. Therefore, it is manifestly intended that this invention be limited only by the claims and the equivalents thereof.
The U.S. Government may have certain rights in the present invention as provided for by the terms of a Government contract.