Conventional parallel processing software development models either (a) create no revenue for the developers (Open source, GPL model), (b) pay the developers by sharing in a corporate environment (profit sharing at the discretion of a company or controlling organization), (c) pay the developers per programming job (consulting), or (d) pay the developers per time period (salary model). These payment models are at the discretion of some controlling company. Thus, developers may not fully reap the rewards of their labors.
The controlling company itself typically receives remuneration only for completed applications. The exception is if the company creates libraries of specialized functions and sells entire libraries. Writing software is very time consuming, with developers needing to redevelop various software code components over and over again, even though the same or other organizations may have already developed the required functionality. This is because there is no current method of identifying and accessing those previously created software components. What is missing is a business model that allows developers from multiple, non-associated organizations to share useful software functionality such that 1) the required software functionality can be quickly identified, 2) such codes can be easily accessed, 3) the underlying software codes are inherently protected from theft, and 4) the originating company can receive remuneration from the use of their functionality.
Presently, an individual or organization can purchase a single copy of an application which places a copy of the underlying code on the purchaser's equipment. This can allow the purchaser to duplicate the underlying code, repackage the duplicated code, and resell the duplicated code with no recompense to the original development organization. During application development, it can be very difficult for the development organization to know if it has a performance advantage over its competitors. Similarly, application program purchasers must depend primarily upon the claims of the application creating organizations, with little head-to-head comparison capability available. Since the performance of an application can be a function of the specific data processed by that application, the ability to compare the performance of multiple applications under the user's conditions can be extremely valuable to the application purchaser, and is not directly available through third-party evaluations.
An organization that utilizes the parallel processing development environment may include one or more administrators and zero or more developers. The organization may represent an actual company with employees that utilize the parallel processing development environment, or may represent a collective of individuals that cooperate to develop parallel processing routines using the parallel processing development environment.
The parallel processing development environment represents a client/server-based, multicore, multiserver, graphical process-control, computer program management, and application-construction collaboration system.
Environment 100 includes a graphical process control server 104 that provides an interface to the Internet 150, through which one or more developers 152 may access environment 100 concurrently. Environment 100 also includes one or more databases (e.g., database 106) for storing kernel 122, algorithm 124, organization 126, user 128, database 130, and usage information 132. A development server 108 of environment 100 facilitates creation and maintenance of kernels 122 and algorithms 124 in cooperation with graphical process control server 104 and database 106. A program management server 110 of environment 100 facilitates access to a cluster 112 of environment 100 to execute one or more algorithms 124 and kernels 122.
As illustrated in
As shown in
Development server 108 allows developer 152, through interaction with graphical process control server 104, to submit a kernel and/or an algorithm for testing within environment 100. Development server 108 stores received kernels and algorithms within database 106 in association with developer 152 and organization 154. In one embodiment, database 106 represents a relational database and a file store. Additional control information that defines access to, and cost of, each kernel and algorithm is stored within database 106 (e.g., within separate database tables, not shown) in association with these kernels and algorithms.
Environment 100 also includes a financial server 102 that provides payment to organizations 154, administrators 158, and developers 152 based upon license fees and usage fees received for each of the organization's kernels and algorithms. For example, kernel 122 developed by developer 152(1) of organization 154(1) may be incorporated into algorithm 124 developed by developer 152(3) of organization 154(2). A license fee, defined by administrator 158(1), for kernel 122 is paid by organization 154(2) and a first part of the license fee is distributed to developer 152(1), a second part of the license fee is distributed to administrator 158(1), and a third part of the license fee is distributed to organization 154(1). A fourth part of the license fee may be accrued by financial server 102 as payment for use of environment 100. That is, environment 100 may not charge connect and use time for each developer and administrator, but instead receives financial compensation based upon a percentage of license fees and usage fees associated with each kernel and algorithm. Similarly, developed algorithms may be sold, through environment 100, to other organizations, and proceeds from the sale may be distributed to the owning organization, its administrators, and its developers, with environment 100 receiving a percentage of the overall sale price.
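The four-way distribution of a license fee described above may be sketched in the ‘C’ language as follows. This is only an illustration under stated assumptions: the text does not give the size of each part, so the basis-point shares, the structure names, and the function name split_license_fee are hypothetical, and the rule that the organization receives the rounding remainder is an assumption made so that the parts always sum to the fee.

```c
#include <assert.h>

/* Hypothetical shares in basis points (1/100 of a percent).
   The organization receives whatever is left over. */
struct split_bp {
    int developer;
    int administrator;
    int environment;
};

/* Amounts, in cents, paid to each party for one license fee. */
struct payout {
    long developer;
    long administrator;
    long organization;
    long environment;
};

/* Splits one license fee into the four parts described in the text. */
struct payout split_license_fee(long fee_cents, struct split_bp bp)
{
    assert(fee_cents >= 0);
    assert(bp.developer >= 0 && bp.administrator >= 0 && bp.environment >= 0);
    assert(bp.developer + bp.administrator + bp.environment <= 10000);

    struct payout p;
    p.developer = fee_cents * bp.developer / 10000;
    p.administrator = fee_cents * bp.administrator / 10000;
    p.environment = fee_cents * bp.environment / 10000;
    /* Assumption: the remainder (including rounding) goes to the organization. */
    p.organization = fee_cents - p.developer - p.administrator - p.environment;
    return p;
}
```

With hypothetical shares of 40%, 10%, and 15% for the developer, administrator, and environment, a fee of 10000 cents yields 4000, 1000, and 1500 cents respectively, with the remaining 3500 cents going to the organization.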
Each kernel 122 and algorithm 124 within database 106 has a defined category and a set of keywords that classify each kernel and algorithm within environment 100. Categories may include ‘cross-communication’, ‘image-processing’, ‘mmo-gaming-tools’, and so on. Additional keywords may be associated with each kernel and algorithm to define features thereof in detail, such as required parameters and data output formats. Kernels and algorithms stored within database 106 may be selected by developers inputting a category and/or one or more keywords.
Each kernel (e.g., kernels 204) represents a software routine that runs on cluster 112,
In the example of
In one embodiment, environment 100 ensures that, upon creation of a new algorithm, the usage cost and license cost are each equal to or greater than the sum of the usage costs and license costs, respectively, of the components included therein. Specifically, when algorithm 222 is licensed (or used), environment 100 ensures that developer(s) 152 of each kernel 204 and algorithm 202 included therein receive an appropriate portion of a license fee 220 and/or a usage fee 230 paid for algorithm 222.
When creating algorithm 222, developer 152 requires a license for each kernel 204 and algorithm 202 used therein. Developer 152 therefore pays for a new license for each kernel 204 and/or algorithm 202, unless a license for that kernel or algorithm is already held by developer 152. Environment 100 operates to ensure that developer 152 pays any necessary license costs 208 prior to allowing developer 152 to include any selected kernel 204 and/or algorithm 202 within a new algorithm.
Once a new kernel or algorithm is created, it may remain private for use within the creating organization, or it may be published for use by developers within other organizations. In one embodiment, user interface 160,
Environment 100 controls licensing and use of kernels 204 and algorithms 202, 222, tracks their earned usage and license fees, and thereby allows developers to share income from developed routines and algorithms. Further, sharing and re-use of developed software is encouraged and rewarded by environment 100 through automatic control and payment of license fees and usage fees.
To encourage developers to create and publish parallel processing routines (e.g., kernels and algorithms), environment 100 does not charge developers for use of the facilities provided by environment 100. Rather, environment 100 retains a percentage of the usage fees and license fees earned by each kernel and algorithm as it is licensed and used. This fee is added on top of the other fees such that the requested income flow remains unimpeded.
Upon running of program 304 on cluster 112 to process data 306, program management server 110 determines an appropriate usage fee 320, payable by user 352 based upon usage costs of program 304, size and type of data 306, and the number of processing nodes 113 of cluster 112 selected for running program 304. Program management server 110 may inform financial server 102 of usage fee 320, such that financial server 102 may determine payments 322, based upon components of program 304, for developers 152. Using the examples of
Financial server 102 also withholds a certain percentage of usage fee 320 as payment for use of environment 100 by developers 152(1)-(5), since these developers contributed to algorithm 222. User 352 may select higher performance processing for a particular task, and pay a premium price for that higher performance from environment 100. A task selected for higher performance processing may utilize additional processing nodes of cluster 112 or may have a higher priority that ensures nodes are allocated to the task in preference to lower priority task node requests. Payment for this higher performance processing is used only to pay for use of environment 100 and not paid to developers.
Parallel processing routines (e.g., kernels and algorithms) and databases (e.g., database 130,
“Massively Parallel Technologies” is one exemplary organization name, which may be abbreviated to “MPT” on a button or control of user interface 160. Where the organization name is abbreviated within user interface 160, if the developer ‘hovers’ the mouse over the abbreviation, the full organization name will be displayed. Within an organization, exemplary categories are: “cross-communication,” “image-processing,” and “mmo-gaming-tools.” These categories would appear within user interface 160 once the organization is selected. Exemplary parallel processing routine names are: “PAAX-exchange,” “FAAX-exchange,” and “Howard-Cascade.”
In one example of operation, developer 152(5) first selects the name “MPT” of organization 154(3) and then category cross-communication, and then a kernel called Howard-Cascade. Developer 152(5) may then include the selected kernel within a new algorithm or profile the kernel to determine characteristics based upon a test data set.
Development server 108 profiles each of first routine 404(1) and second routine 404(2) to determine first routine profile 408(1) and second routine profile 408(2), respectively. Each routine profile 408 includes one or more of: amount of RAM used 410, communication model 412, first and second processing speed 414 and Amdahl Scaling 416. In one embodiment, one routine profile 408 is created for each communication model 412 selected for routine 404. Selection of a particular communication model may result from profiling the routine using each available communication model, or may be made by a user.
In one example of operation, development server 108 profiles first routine 404(1) running on a single processing node of cluster 112 to process test data 406 and derives RAM used 410(1), communication model 412(1) and a first processing speed 414(1) based upon the execution time of the first routine to process the test data. Development server 108 then profiles first routine 404(1) running on ten processing nodes of cluster 112 to process test data 406 and derives a second processing speed 414(3). Processing speed and execution time are used interchangeably herein to represent the processing performance of the parallel processing routines, and not the computing power of the processing node. For example, first processing speed 414(1) represents the execution time for processing test data 406 by first routine 404(1) on a single processing node of cluster 112. Development server 108 then determines Amdahl Scaling 416(1) based upon the first processing speed 414(1), the determined second processing speed 414(3) and the number of processing nodes (N) used to determine the second processing speed 414(3), as described in association with
To encourage the use of the most appropriate kernels and algorithms, and to allow developers to evaluate newly created kernels and/or algorithms, environment 100 allows a developer or user to compare kernels and algorithms against one another, such that the best kernel/algorithm for a particular task may be identified and incorporated into that task. Many factors determine suitability of a kernel and/or algorithm for a particular task, including, but not limited to, size of the data set, parameters input to the kernel and/or algorithm, number of processing nodes selected for processing the kernel and/or algorithm, and Amdahl Scaling of the kernel and/or algorithm.
In one embodiment, environment 100 does not save routine profiles 408 within database 106, since conditions for evaluating the parallel processing routines typically change, particularly since each developer evaluates the routines utilizing their own test data tailored to their processing specifications and requirements. Environment 100 facilitates automatic evaluation of new and existing parallel processing routines against test data and input parameters to allow a developer to select optimal kernels and algorithms based upon their data requirements. In another embodiment, environment 100 stores routine profiles 408 in relation to test data 406 and the evaluating developer 152, such that a developer need not profile routines more than once when input parameters and test data have not changed.
In step 502 of method 500, the routine is profiled on a single processing node to get a First Execution Time. In one example of step 502, development server 108 profiles first routine 404(1) processing test data 406 within a single processing node of cluster 112 to determine first processing speed 414(1). In step 504, a projected execution time of the routine on N processing nodes is calculated as First Execution Time/N, where N is the number of processing nodes used for profiling. In one example of step 504, ten processing nodes 113 are to be used to profile routine 404(1) in step 506, and thus N equals 10, giving the projected execution time as first processing speed 414(1) divided by 10. In step 506, the routine is profiled on N processing nodes to determine a second execution time. In one example of step 506, development server 108 profiles routine 404(1) processing test data 406 on ten processing nodes 113 of cluster 112 to determine second processing speed 414(3). In step 508, the Amdahl Scaling is calculated as the Projected Execution Time/Second Execution Time. In one example of step 508, the first processing speed 414(1) is divided by ten, since ten processing nodes 113 were used in step 506, and the result is then divided by second processing speed 414(3). For example, with N equal to 10, if the first execution time is 10 seconds and the second execution time is 5 seconds, the projected execution time is 1 second and the Amdahl Scaling factor is 0.2. An Amdahl Scaling factor of one is ideal; parallel processing routines having an Amdahl Scaling value close to one scale more efficiently than routines with a smaller Amdahl Scaling factor.
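The calculations of steps 504 and 508 may be sketched in the ‘C’ language as follows. This is a minimal illustration only; the function names projected_execution_time and amdahl_scaling are hypothetical, and the execution times are assumed to have already been measured in steps 502 and 506.

```c
#include <assert.h>

/* Step 504: projected execution time on n processing nodes, i.e. the
   single-node execution time divided by the number of nodes. */
double projected_execution_time(double first_time, int n)
{
    assert(n > 0);
    return first_time / (double)n;
}

/* Step 508: Amdahl Scaling = projected execution time divided by the
   execution time actually measured on n processing nodes. */
double amdahl_scaling(double first_time, double second_time, int n)
{
    assert(second_time > 0.0);
    return projected_execution_time(first_time, n) / second_time;
}
```

For instance, a routine that takes 20 seconds on one processing node and 4 seconds on ten processing nodes has a projected execution time of 2 seconds, so amdahl_scaling(20.0, 4.0, 10) evaluates to 0.5.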
In step 606, each selected similar parallel processing routine is profiled using the test data. In one example of step 606, development server 108 utilizes method 500 to profile second routine 404(2) processing test data 406 and generates routine profile 408(2). In step 608, the profile data of the first parallel processing routine is compared to profile data of each of the selected similar parallel processing routines to rank the first parallel processing routine against the selected similar parallel processing routines. In one example of step 608, where efficiency of parallel scaling is of greatest importance, development server 108 compares first routine profile 408(1) against second routine profile 408(2) and ranks first routine 404(1) against second routine 404(2) based upon Amdahl Scaling 416 within each routine profile 408. In step 610, the communication model of the selected existing routine is then determined.
Optionally, developer 152 may prioritize elements of routine profile 408 to influence the ranking of step 608. For example, for a particular application where the maximum amount of RAM used is based upon the size of the data being processed, the algorithm that utilizes less RAM may be more valuable than the algorithm with the fastest processing speed. Thus, developer 152 may define RAM used 410 as the highest priority element within routine profiles 408, such that development server 108, in step 608 of method 600, ranks the kernel with the lowest RAM used 410 value highest, ahead of any ranking based upon other profiled characteristics.
In one example of operation, developer 152 uses environment 100 to evaluate a new kernel against existing kernels with similar functionality within environment 100 using test data 406. Development server 108 selects kernels from database 106 based upon one or both of category and defined keywords defined by developer 152 for the new kernel. Development server 108 profiles, using method 600 of
Software Plagiarism Detection
Unscrupulous software developers may copy (or use a close imitation of) computer code and ideas developed by another developer, and present this replicated code as original work. Software is easily duplicated, and, thus, its value can be easily harmed. Source code is easily modified, without changing its functionality, using global find-and-replace methods and/or by rearranging the order of the functions within the source code. When these two modifications are combined, it is difficult for the uninitiated to recognize software plagiarism.
In the following example, the ‘C’ software language is used; however, other software languages may be used in place of the ‘C’ software language without departing from the scope hereof. Further, the amount of formatting that is ignored by a compiler of software source code varies between software languages, and only formatting that has no effect on the compiled code is removed in the following methodology.
Functionally, there is no difference between first software source code 700 and second software source code 800; however, this is not immediately apparent when comparing second software source code 800 to first software source code 700. Further, since the functions within second software source code 800 are re-ordered, as compared to the order of functions within first software source code 700, compiled code of second software source code 800 will differ from compiled code of first software source code 700; compiled code cannot be directly compared to identify plagiarism. In these examples, the ‘C’ language is case sensitive, and this requires the case of characters to match. Other software languages are case insensitive, and in embodiments supporting such languages, characters may be converted to all lower-case (or all upper-case) to ignore character case.
Environment 100 includes a plagiarism detection module (PDM) 109 for identifying plagiarism within submitted parallel processing routines (e.g., kernel 122 and algorithm 124). PDM 109 is illustratively shown within development server 108; however, PDM 109 may be implemented within other servers (e.g., program management server 110 and financial server 102) without departing from the scope hereof. PDM 109 may also be implemented as a separate tool for identifying software plagiarism external to environment 100.
In a further example, an unscrupulous developer changes the order of independent statements within the software source code in an attempt to hide plagiarism.
At step 5010, assignment statements within the block are analyzed to determine which assignment statements are dependent within the block and which are independent. There are two types of assignment statements in the ‘C’ language: single-sided and two-sided. A single-sided assignment statement utilizes the increment and decrement operators, “++” and “--”, respectively, in association with a variable. For example, “a++;” is an assignment statement that is equivalent to “a=a+1;”. A two-sided assignment statement includes one of the following operators: “=”, “/=”, “*=”, “+=”, “-=”, “&=”, “|=”, “^=”, “<<=”, and “>>=”. For example, “a=a+1” is a two-sided assignment statement. The variable shown in the above single-sided assignment statement is considered as occurring on both the left and right side of the assignment. If a variable found in the right side of an assignment statement within a code block is also found on the left side of any preceding assignment statement (real or implied) within that same block, then that statement is considered dependent (e.g., dependent statements 3030, 3032 and 3034). Within the same block, any non-assignment statements following an assignment are considered associated (e.g., independent statements 3010 and 3012) with that assignment statement.
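The dependency rule of step 5010 may be sketched in the ‘C’ language as follows. This is an illustration under stated assumptions, not the parser itself: each assignment statement is assumed to have already been reduced to one left-side variable and a short list of right-side variables, and the structure assignment, the limit MAX_RHS, and the function classify_block are hypothetical names. A single-sided statement such as “i++;” is represented with the same variable on both sides.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_RHS 4   /* hypothetical limit on right-side variables */

/* Hypothetical pre-parsed assignment statement. */
struct assignment {
    const char *lhs;            /* variable assigned (real or implied) */
    const char *rhs[MAX_RHS];   /* variables read on the right side */
    size_t rhs_count;
};

/* Sets dependent[i] to 1 when statement i reads a variable that appears
   on the left side of any preceding statement in the block, else 0. */
void classify_block(const struct assignment *stmts, size_t count, int *dependent)
{
    assert(stmts != NULL && dependent != NULL);
    for (size_t i = 0; i < count; i++) {
        assert(stmts[i].rhs_count <= MAX_RHS);
        dependent[i] = 0;
        for (size_t r = 0; r < stmts[i].rhs_count && !dependent[i]; r++) {
            for (size_t p = 0; p < i; p++) {
                if (strcmp(stmts[i].rhs[r], stmts[p].lhs) == 0) {
                    dependent[i] = 1;
                    break;
                }
            }
        }
    }
}
```

For a block containing “a=x;”, “b=y;”, “c=a+b;”, and “i++;”, only the third statement is marked dependent, because it reads variables assigned by the first two statements.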
At step 5015, multiple instances 2910* (shown in
Statements that are not determined as dependent within a block are considered independent statements and may be placed, along with any associated statements, anywhere within a given code block, provided such placement does not change an independent statement into a dependent statement or change the dependency of a dependent statement (i.e., as long as the placement does not affect the dependency of any statements within the block). The dependency of a statement changes if an independent statement containing a variable on its left side (actual or implied) is exchanged for a statement that depends upon that left side variable. Dependent statements must occur after their defining independent statements. A dependent statement has no associated statements. Each software source code instance represents one permutation of the independent statements within their respective code blocks.
Looking at code block 3006 and at the above rules for positioning independent code statements, there is only one other permutation of the included statements. That is, independent statements 3010 and 3012 may exchange positions, but independent statement 3014 cannot move since the “++i” portion of the statement would cause either independent statement 3010 or independent statement 3012 to become dependent therefrom. Independent statement 3014 cannot exchange with any of dependent statements 3030, 3032, and 3034 since their dependence would be violated.
In one embodiment, at step 5020, each new code instance 2910* generated from permutations of movable independent statements is stored as a separate file using the following filename format: sourcefilename+“_#”+“.c(cpp)”, where “#” represents the instance number. For example, if the original software source code file is named “a.c”, the first new software source code instance filename is generated as “a_1.c”.
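The filename format of step 5020 may be sketched in the ‘C’ language as follows. This is a minimal illustration; the function name instance_filename is hypothetical, and the sketch assumes that the source filename is a plain filename (no directory part) whose extension, such as “.c” or “.cpp”, is carried over unchanged.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Builds "<name>_<instance><extension>" from a source filename.
   Returns 0 on success, or -1 if the filename has no extension or the
   output buffer is too small. */
int instance_filename(const char *source, unsigned instance,
                      char *out, size_t out_size)
{
    assert(source != NULL && out != NULL);
    const char *dot = strrchr(source, '.');   /* start of the extension */
    if (dot == NULL || dot == source)
        return -1;
    int written = snprintf(out, out_size, "%.*s_%u%s",
                           (int)(dot - source), source, instance, dot);
    return (written < 0 || (size_t)written >= out_size) ? -1 : 0;
}
```

Called with “a.c” and instance number 1, the function writes “a_1.c” into the output buffer.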
In step 902 of
In step 904, the software source code is parsed to generate one source code instance for each permutation of independent statements, as described above with respect to
Process 1000 of
In step 1004, functions within the source compare file are placed in ascending order according to length in characters. In one example of step 1004, PDM 109 determines the length in characters of each function within source compare file 1500 and places these functions in ascending size order, shown as source compare file 1600,
In step 1006, a component redaction file 2922(*) is generated for each function within the source compare file. In one example of step 1006, PDM 109 creates a component redaction file 1700,
Returning to method 900,
Steps 910 through 916 are repeated for each identified parallel processing routine of step 908.
In step 910, the identified software source code is parsed to construct a function table and a variable table for the ‘main’ routine, and a variable table for each additional function listed within the function table. In one example of step 910, PDM 109 parses software source code 700 to generate second function table 2000,
In step 912, process 1000 is invoked to perform redaction on identified software source code of step 908 to form second compare files and zero or more second component redaction files. In one example of step 912, PDM 109 implements process 1000 to process software source code 700 and generate source compare file 2400,
In step 914, the first compare files are compared to the second compare files to determine the percentage of plagiarism between code statements of the first source compare files and code statements of the second source compare files. In one example of step 914, PDM 109 utilizes a Needleman-Wunsch analysis to determine a percentage of plagiarism between (a) compare file 1600 and compare file 2500, and (b) compare files 1700, 1800, 1900 and compare files 2600, 2700 and 2800, respectively. In particular, plagiarism percentages are determined for each instance 2910(1), 2910(2), and 2910(3) derived from software source code 800 against compare files 2500, 2600, 2700 and 2800. Source code alignment and plagiarism percentage determination are described in detail below, with reference to
In step 916, the first source code file is rejected if the determined plagiarism percentage is greater than an acceptable limit. In one example of step 916, PDM 109 has a defined limit of 60% and flags software source code 800 for rejection since the determined plagiarism percentage is greater than 60%. PDM 109 may also send a rejection notice for software source code 800 to the associated developer 152.
Step 918 is a decision. If, in step 918, method 900 determines that the first source code file was not rejected in step 916 for any identified parallel processing routine within database 106, method 900 continues with step 920; otherwise, method 900 terminates. In step 920, the first source code file is accepted. In one example of step 920, software source code 2902 is accepted as not being plagiarized.
By utilizing method 900, each function may be evaluated against other functions stored in database 106 to determine a plagiarism percentage. Within software source code, functions may be considered a complete functional idea and are thus individually checked for plagiarism. As shown above, redacted code for each function is placed into its own file, called a component redaction file, which may have the file extension “.CRE”. Each component redaction file is compared against selected component redaction files within environment 100 (e.g., as stored within database 106). This process is similar to the process described in
Plagiarism—Alignment Step
Software is typically created in versions, with one version including many of the features of a previous version. That is, there may be an evolutionary relationship between versions of code. Based upon this evolutionary relationship, bioinformatics mathematical tools may be used to determine a closest version of tested code to a newly submitted software source code. Using the Needleman-Wunsch dynamic programming model, it is possible to obtain all optimal global alignments between two redacted files (e.g., component redaction file 2922(1) and component redaction files 2922(4)-2922 (9)). The Needleman-Wunsch equation is as follows:
$$M_{i,j} = M_{i,j} + \max\left(M_{k,j+1},\; M_{i+1,l}\right)$$
Where:
Using a Smith-Waterman dynamic programming model, it is possible to obtain all optimal local alignments between two source compare files (e.g., compare files 1600 and 2500). The Smith-Waterman dynamic programming model, as described here, is considered the preferred alignment method because it allows the effects of gaps in the compared sequences to be weighted. The equations below show the Smith-Waterman dynamic programming model:
Example:
Plagiarism—Compare Step
The greater the number of matched characters found in two codes used to generate filtered, optimally aligned traces, the lower the probability that those codes are unaffiliated. If the compared codes generate matches along the filtered, optimally aligned trace above 25%, then homology may be assumed; that is, the codes are evolutionarily related. Therefore, 25% character matches along any filtered, optimally aligned trace by any two codes (called A and B, with A being the code tested for plagiarism) constitutes plagiarism of A against B.
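The counting of matched characters along an optimal alignment may be sketched in the ‘C’ language as follows. This is an illustration under stated assumptions rather than the described method itself: it uses a global alignment in the Needleman-Wunsch style with an assumed scoring of one for a match and zero for a mismatch or gap (so the optimal score equals the number of matched characters), it omits the filtering of the trace, and it assumes the percentage is taken relative to the length of code A. The function names aligned_matches and match_percentage are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Number of matched characters in an optimal global alignment of a and b
   (assumed scoring: match = 1, mismatch = 0, gap = 0).
   Returns -1 if memory cannot be allocated. */
long aligned_matches(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    size_t n = strlen(a), m = strlen(b);
    long *prev = calloc(m + 1, sizeof *prev);   /* previous matrix row */
    long *curr = calloc(m + 1, sizeof *curr);   /* current matrix row */
    if (prev == NULL || curr == NULL) {
        free(prev);
        free(curr);
        return -1;
    }
    for (size_t i = 1; i <= n; i++) {
        curr[0] = 0;
        for (size_t j = 1; j <= m; j++) {
            /* Align a[i-1] with b[j-1], or leave a gap in either code. */
            long best = prev[j - 1] + (a[i - 1] == b[j - 1] ? 1 : 0);
            if (prev[j] > best)
                best = prev[j];
            if (curr[j - 1] > best)
                best = curr[j - 1];
            curr[j] = best;
        }
        long *tmp = prev;
        prev = curr;
        curr = tmp;
    }
    long matches = prev[m];
    free(prev);
    free(curr);
    return matches;
}

/* Percentage of the characters of code A matched against code B
   (assumed denominator: the length of A). */
double match_percentage(const char *a, const char *b)
{
    size_t n = strlen(a);
    long matches = aligned_matches(a, b);
    if (n == 0 || matches < 0)
        return 0.0;
    return 100.0 * (double)matches / (double)n;
}
```

Under these assumptions, comparing “abcd” against “abed” gives three matched characters, or 75%, which is above the 25% level at which homology is assumed.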
Determining Code Lineage
Since software source code is generally created in versions, with one version conserving many of the features of the previous version, where multiple versions of the code exist, a given version will have a higher percentage of matches in the filtered aligned trace to the version closest to it in lineage. For example, if an unknown software source code (version X) is compared against software source code versions that are evolutionarily related, then the following scenarios may occur.
Code-creation time-stamps may also be used in place of version numbers to show the association of some unknown code such as version X.
Malicious Software Behavior Detection
Within environment 100, parallel processing routines (e.g., kernels 122 and algorithms 124) should not cause problems to other parallel processing routines. Software that causes problems to other software is called malicious software, and the unwanted software activity is called malicious software behavior. Malicious software behavior may occur accidentally or may be intentional. In either event, malicious software behavior is undesirable within environment 100. Preferably, malicious software is detected prior to publication of that software (e.g., parallel processing routine) within environment 100.
One exemplary malicious software behavior is when a variable (e.g., an array type structure or pointer) in memory overflows and protected memory is accessed. A hacker (i.e., a person that intentionally creates malicious software) attempts to gain unauthorized access to protected memory of a system and then exploit that access.
To prevent malicious software behavior within environment 100, development server 108 includes a malicious behavior detector (MBD) 111. Specifically, MBD 111 functions to detect malicious behavior within parallel processing routines submitted for publication within environment 100. MBD 111 detects malicious software behavior in submitted parallel processing routines, and detects when a parallel processing routine is overflowing its variables.
Identifying linear source code segments within the software source code allows the software to be iteratively tested when not all linear code segments can be tested in a single run. MBD 111 further modifies augmented source code 3204 to output tracking information from each linear code segment into a tracking file 3208 with the same filename as the software source code and a “.TRK” extension. A parallel processing routine associated with software source code 3202 is not published for use by the present system until all branches and looped code segments have been tested as indicated by tracking information within tracking file 3208.
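The tracking output described above may be sketched in the ‘C’ language as follows. This is an illustration only and is not the code inserted by MBD 111: the function names tracking_filename and append_segment, the timestamp layout, and the line format are hypothetical, and the sketch assumes that the “.TRK” extension replaces the extension of the software source code filename.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Derives the tracking filename from the source filename, e.g.
   "kernel.c" becomes "kernel.TRK". Returns 0 on success, -1 if the
   output buffer is too small. */
int tracking_filename(const char *source, char *out, size_t out_size)
{
    assert(source != NULL && out != NULL);
    const char *dot = strrchr(source, '.');
    size_t stem = (dot != NULL) ? (size_t)(dot - source) : strlen(source);
    int written = snprintf(out, out_size, "%.*s.TRK", (int)stem, source);
    return (written < 0 || (size_t)written >= out_size) ? -1 : 0;
}

/* Writes one time-stamped segment identifier to an open tracking file.
   Returns 0 on success, -1 on failure. */
int append_segment(FILE *trk, int segment)
{
    assert(trk != NULL);
    char stamp[32];
    time_t now = time(NULL);
    struct tm *local = localtime(&now);
    if (local == NULL ||
        strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M:%S", local) == 0)
        return -1;
    return fprintf(trk, "%s segment %d\n", stamp, segment) < 0 ? -1 : 0;
}
```

A tracking file written this way lists one line per executed linear code segment, so the set of segment numbers present in the file indicates which branches and loops have been exercised so far.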
In step 3402, process 3400 inserts code to include a definition file into an augmented source code. In one example of step 3402, MBD 111 inserts “#include <mpttrace.h>” at point 3302 of software source code 3300 to include definitions that support tracking code that will also be inserted into augmented source code 3204. In step 3404, process 3400 inserts code to open a tracking file into a first linear code segment of the augmented source code. In one example of step 3404, MBD 111 inserts code insert 3500,
In step 3408, process 3400 adds block markers to surround the identified linear code segment if it is a single statement without block markers. In one example of step 3408, MBD 111 adds delimiters “{” and “}” around linear code segment 3356. In step 3410, process 3400 inserts source code to append a time-stamped segment identifier to the tracking file within each linear code segment. In one example of step 3410, MBD 111 adds code to call a function ‘mptWriteSegment (trkFile, “X”)’, where X is the segment number, as a first statement within each identified linear code segment 3352, 3354, 3356, 3358, 3360, and 3362. The function ‘mptWriteSegment’ writes the current time and date, and the segment number X, to the end of the already opened tracking file, ‘trkFile’. In step 3412, process 3400 inserts source code to close the tracking file prior to each program termination point. In one example of step 3412, MBD 111 adds code insert 3700,
In addition, the “mptWriteSegment” function determines whether the execution time of previous segments, and/or the total execution time, exceeds a defined maximum time. If the defined maximum time limit has been reached, the “mptWriteSegment( )” function returns a 1; otherwise, it returns a 0. As shown in code insert 3600,
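The code inserts referenced above are not reproduced in this text. As an illustration only, the behavior described for the write segment function may be sketched as follows. The parameter types, the timestamp format, the limit constant MPT_MAX_TOTAL_SECONDS, and the variable mptRunStart are assumptions made for this sketch and do not appear in the description; only the total execution time check is shown.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical limit: the description only refers to "a defined maximum time". */
#define MPT_MAX_TOTAL_SECONDS 60.0

/* Time of the first call, used as the start of the run (an assumption). */
static time_t mptRunStart = 0;

/* Appends "<date> <time> <segment>" to the already opened tracking file.
 * Returns 1 if the total run time has reached the limit, otherwise 0. */
int mptWriteSegment(FILE *trkFile, const char *segment)
{
    time_t now = time(NULL);
    struct tm *tmNow;
    char stamp[32];

    assert(trkFile != NULL && segment != NULL);

    if (mptRunStart == 0)
        mptRunStart = now;

    /* Format the current date and time; fall back to an empty stamp on failure. */
    tmNow = localtime(&now);
    if (tmNow == NULL ||
        strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M:%S", tmNow) == 0)
        stamp[0] = '\0';

    fprintf(trkFile, "%s %s\n", stamp, segment);
    fflush(trkFile);

    return difftime(now, mptRunStart) >= MPT_MAX_TOTAL_SECONDS ? 1 : 0;
}
```

In augmented source code, a call such as mptWriteSegment(trkFile, "3") would be the first statement of segment three, and the caller would presumably test the return value to stop a run that has exceeded its time limit.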
Tracing Kernel Data Usage—Level 2 Augmentation
Computer languages may have different static and dynamic memory allocation models. In the C and C++ languages, dynamic memory is allocated using “malloc ( )”, “calloc ( )”, “realloc ( )”, and “new type ( )” commands. Arrays may also be dynamically allocated at runtime. The allocated memory utilizes heap space. Unless the allocation is static, it is created for each routine in each thread. The C language includes the ability to determine a variable address and write any value starting at that address. To ensure that memory outside of the memory allocated to the routine is not accessed (e.g., by writing more values to a variable than that variable is defined to hold, which is a standard hacker technique), all variables, static and dynamic, are located and their addresses are checked at runtime for overflow conditions.
To identify code that will access memory beyond the defined extent of a variable, the starting and ending addresses of each variable are determined at runtime.
If a pointer is declared, as shown at position 4012 of
When required, allocation of memory to the pointer is isolated, such as from within an “if” statement as shown at position 3840. The assignment of the memory and the evaluation of the pointer resulting from the allocation are separated, as shown at position 4014, to allow the variable address detection code 4002 (e.g., function “mptStartingAddressDetector( )”) to record the start address, and the test of the allocated pointer is performed within a separate “if” statement as shown.
The starting address is obtained as follows:
To evaluate the pointer value at run time, a function is inserted after the statement changing the pointer value as follows:
In this example, the function “mptCurrentAddressDetector( )” compares the modified pointer value against the starting and ending address values previously determined by the “mptStartAddressDetector( )” function and stored within a variable tracking table 4100 of
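The listings for the address detection functions are not reproduced in this text. The following is a minimal sketch of how such a range check could work, assuming a fixed-size table keyed by variable name. The function names follow the spelling “mptStartingAddressDetector( )” used earlier in the description; the parameter lists, return values, the MptVarRange structure, and the table size are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MPT_MAX_VARS 64  /* hypothetical table capacity */

/* One row of a variable tracking table (layout is an assumption). */
typedef struct {
    const char *name;
    uintptr_t   start;  /* address of the first valid byte */
    uintptr_t   end;    /* address one past the last valid byte */
} MptVarRange;

static MptVarRange mptVarTable[MPT_MAX_VARS];
static size_t mptVarCount = 0;

/* Records the address range [base, base + size) for a variable.
 * Returns 0 on success, or -1 if the table is full. */
int mptStartingAddressDetector(const char *name, const void *base, size_t size)
{
    assert(name != NULL && base != NULL);
    if (mptVarCount == MPT_MAX_VARS)
        return -1;
    mptVarTable[mptVarCount].name  = name;
    mptVarTable[mptVarCount].start = (uintptr_t)base;
    mptVarTable[mptVarCount].end   = (uintptr_t)base + size;
    mptVarCount++;
    return 0;
}

/* Checks a modified pointer value against the recorded range.
 * Returns 1 if in range, 0 if out of range, -1 if the name is unknown. */
int mptCurrentAddressDetector(const char *name, const void *current)
{
    uintptr_t p = (uintptr_t)current;
    size_t i;

    assert(name != NULL);
    for (i = 0; i < mptVarCount; i++) {
        if (strcmp(mptVarTable[i].name, name) == 0)
            return (p >= mptVarTable[i].start && p < mptVarTable[i].end) ? 1 : 0;
    }
    return -1;
}
```

Addresses are compared as integers (uintptr_t) rather than as pointers because relational comparison of pointers into different objects is undefined in C. An out-of-range result would be written to the tracking file so that the run can later be flagged.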
Tracking Memory Allocations And Deallocations
As noted above, memory is typically assigned to a pointer using an allocation function within the language. In the C and C++ languages, memory is allocated using a malloc, calloc, or realloc function call or, in C++, the new operator. To record these memory allocations, an allocation tracking function is added to augmented source code 3204 proximate to the assignment to the pointer, to write the name of the variable on the left side of the memory allocation assignment into an allocated resources table.
Proximate to each memory allocation and assignment to a pointer variable within augmented source code 3204, a call to the “mptAllocationTableChange( )” function, with a one as the third parameter, updates allocated resources table 4300 to indicate that memory has been allocated to that pointer variable. Similarly, for each memory de-allocation statement of augmented source code 3204, a call to the “mptAllocationTableChange( )” function is inserted with a zero as the third parameter to record the deallocation of memory from the pointer variable of the statement. Where memory is allocated to a pointer already listed within allocated resources table 4300 (e.g., memory is allocated to a pointer variable more than once), an additional entry with the same variable name is added to allocated resources table 4300.
When memory is deallocated from the pointer variable, the first entry in allocated resources table 4300 that matches the variable name and function name, and has the allocation flag set to one, is modified to have the allocation flag set to zero. Allocated resources table 4300 thereby tracks allocation and deallocation of memory, such that abnormal use of allocated memory (e.g., where memory is allocated twice to a pointer variable without the first memory being deallocated) can be determined. Similarly, address assignments (e.g., a memory address stored within one pointer variable assigned to a second pointer variable) are tracked to prevent misuse of allocated memory.
At every program termination point (e.g., a return or exit function call within the C language), the allocation resource table values are stored in tracking file 3208. The function required to perform the allocation resource table value tracing augmentation is shown below.
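The original listing is not reproduced in this text. As a hedged illustration of the table update rules described above, the following sketch appends an entry on allocation and clears the first matching allocated entry on deallocation. Only the meaning of the third parameter comes from the description; the first two parameters, the return values, the MptAllocEntry structure, and the helper mptOutstandingAllocations are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MPT_MAX_ALLOCS 128  /* hypothetical table capacity */

/* One row of an allocated resources table (layout is an assumption). */
typedef struct {
    const char *variable;   /* pointer variable name */
    const char *function;   /* enclosing function name */
    int         allocated;  /* 1 = allocated, 0 = deallocated */
} MptAllocEntry;

static MptAllocEntry mptAllocTable[MPT_MAX_ALLOCS];
static size_t mptAllocCount = 0;

/* flag == 1: append a new allocated entry, even if the name is already listed.
 * flag == 0: clear the first matching entry whose allocation flag is one.
 * Returns 0 on success, -1 if the table is full or nothing matched. */
int mptAllocationTableChange(const char *variable, const char *function, int flag)
{
    size_t i;

    assert(variable != NULL && function != NULL);
    if (flag == 1) {
        if (mptAllocCount == MPT_MAX_ALLOCS)
            return -1;
        mptAllocTable[mptAllocCount].variable  = variable;
        mptAllocTable[mptAllocCount].function  = function;
        mptAllocTable[mptAllocCount].allocated = 1;
        mptAllocCount++;
        return 0;
    }
    for (i = 0; i < mptAllocCount; i++) {
        if (mptAllocTable[i].allocated == 1 &&
            strcmp(mptAllocTable[i].variable, variable) == 0 &&
            strcmp(mptAllocTable[i].function, function) == 0) {
            mptAllocTable[i].allocated = 0;
            return 0;
        }
    }
    return -1;
}

/* Counts entries for a variable that are still flagged as allocated.
 * A count above one suggests a second allocation without a deallocation. */
size_t mptOutstandingAllocations(const char *variable, const char *function)
{
    size_t i, count = 0;

    assert(variable != NULL && function != NULL);
    for (i = 0; i < mptAllocCount; i++) {
        if (mptAllocTable[i].allocated == 1 &&
            strcmp(mptAllocTable[i].variable, variable) == 0 &&
            strcmp(mptAllocTable[i].function, function) == 0)
            count++;
    }
    return count;
}
```

Under these assumptions, a statement such as buf = malloc(n); inside main would be followed by mptAllocationTableChange("buf", "main", 1), and free(buf); by the same call with a zero as the third parameter.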
Forced Code Segment Entry—Level 3 Augmentation
Accessing certain code segments within software source code 3202 may be problematic in that they are typically accessed only upon certain error conditions. Where code segments are not accessed through normal operation, a forced segment file 3210 (see
Within augmented source code 3204, each branch point 4512, 4514, and 4516, is modified to evaluate the appropriate element of the force array. For example, the conditional statement at the entry point of segment six evaluates element six of the force array. Thus, by including the segment number within forced segment file 3210, the force array element associated with that code segment is set to one when the file is read in at run time, and that code segment is entered when the condition for the branch statement is evaluated.
Within augmented source code 3204, for the C language, an additional case is added to case statements (e.g., switch) prior to the default case label, which allows activation of the default via the force file. Further, where the code segment to be forced is embedded within another code segment (e.g., nested if statements), activation of all enclosing branch points is required to ensure that the targeted code segment is actually activated.
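The forced entry mechanism can be illustrated with a short sketch. The array name “mptForceArray” is taken from the description; the loader mptLoadForceFile, the assumed file layout (whitespace-separated segment numbers), the array size, and the example function handleStatus are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

#define MPT_MAX_SEGMENTS 256  /* hypothetical upper bound on segment numbers */

/* Element N is one when segment N must be entered regardless of its condition. */
int mptForceArray[MPT_MAX_SEGMENTS];

/* Reads segment numbers from an open forced segment file and sets the
 * matching force array elements to one. Returns the number of entries applied. */
int mptLoadForceFile(FILE *forceFile)
{
    int segment;
    int applied = 0;

    assert(forceFile != NULL);
    while (fscanf(forceFile, "%d", &segment) == 1) {
        if (segment >= 0 && segment < MPT_MAX_SEGMENTS) {
            mptForceArray[segment] = 1;
            applied++;
        }
    }
    return applied;
}

/* Example of an augmented branch point: segment six is an error path that is
 * normally entered only when status is negative. */
int handleStatus(int status)
{
    if (status < 0 || mptForceArray[6]) {
        return 6;  /* segment six entered */
    }
    return 7;      /* segment seven entered */
}
```

With an empty force file, handleStatus(0) takes the normal path. Once the force file lists segment six, the same call enters the error path, which allows that segment to be recorded as accessed.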
Use of Multiple Program Runs to Access All Segments
Augmented source code 3204 is compiled and then run to produce tracking file 3208, which contains variable address accesses, code segment accesses, and times/dates. MBD 111 then processes tracking file 3208 to determine whether all segments within software source code 3202 have been accessed. If not all code segments within software source code 3202 have been accessed, MBD 111 generates a missing segment file 3212 which contains a list of un-accessed code segments. The file name format for missing segment file 3212 is “sourceFileName.MIS.”
The user may view missing segment file 3212 to determine whether additional runs are necessary with modified forced segment file 3210 to activate the identified missed code segments. Tracking file 3208 is cumulative in that output from additional runs of augmented source code 3204 is appended to the file. Missing segment file 3212 is regenerated by each run of augmented source code 3204 so that the user knows which segments still require profiling. When all code segments of software source code 3202 have been accessed, missing segment file 3212 is not created, thereby indicating that all segments have been analyzed. If a new software source file is provided by the user, then any tracking file with the same source file name is erased from the system, thereby requiring all segments to be analyzed again.
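The missing segment determination amounts to a set difference between all segment numbers and those recorded across the cumulative runs. A minimal sketch follows, assuming that segments are numbered from one and that the accessed segment numbers have already been read from the tracking file; the function name mptFindMissingSegments and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MPT_MAX_SEGMENTS 256  /* hypothetical upper bound on segment count */

/* Writes the numbers of segments 1..segmentCount that never appear in
 * accessed[] into missing[] (which must hold segmentCount entries).
 * Returns the number of missing segments; zero means no .MIS file is needed. */
size_t mptFindMissingSegments(const int *accessed, size_t accessedCount,
                              int segmentCount, int *missing)
{
    unsigned char seen[MPT_MAX_SEGMENTS + 1] = {0};
    size_t i, missingCount = 0;
    int s;

    assert(accessed != NULL && missing != NULL);
    assert(segmentCount >= 0 && segmentCount <= MPT_MAX_SEGMENTS);

    /* Mark every segment recorded in any run; repeats are harmless. */
    for (i = 0; i < accessedCount; i++) {
        if (accessed[i] >= 1 && accessed[i] <= segmentCount)
            seen[accessed[i]] = 1;
    }
    /* Collect the segments that were never marked. */
    for (s = 1; s <= segmentCount; s++) {
        if (!seen[s])
            missing[missingCount++] = s;
    }
    return missingCount;
}
```

Because the tracking file is cumulative, the accessed list grows with each run and the missing list can only shrink until a new source file resets the history.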
Interactive Kernel Tracing
Since testing software source code 3202 may require several runs of augmented source code 3204, MBD 111 allows a user (e.g., developer 152) to interact with user interface 160 within client 156 to trace execution of a submitted kernel interactively. MBD 111 creates a visual representation of a submitted (or selected) kernel (e.g., kernel 204(1),
By selecting the “trace” option within user interface 160, a runtime “interactive flag” is set, which causes the write segment function (e.g., “mptWriteSegment ( )”) to stop execution of the kernel at each code segment and allows the user to set the force array (e.g., “mptForceArray[ ]”) interactively prior to continuing execution of the kernel.
In one example of operation, as augmented source code 3204 is executed, the code segment being executed is highlighted within function-structure diagram 4600. MBD 111 stops execution of augmented source code 3204 at each branch point (e.g., branch points 4512, 4514, and 4516 of
The user may select a code segment using a right mouse button to indicate that execution should not halt at that segment. Whenever execution of augmented source code 3204 is halted (e.g., at one of a branch point, an exit, and a return) then the user may optionally display variable names, their starting, ending, and current addresses, as well as their current location values within a pop-up window. For example, the user may click a “View-Change Variables” button within user interface 160 to display these variables. Selecting the current value field of any variable within the pop-up window allows the user to change the variable's data. If the variable is an array then the array index value may also be changed by the user to display that array element's value. Where the user changes a variable's value, code segments executed after the change are not tracked as accessed segment paths. In one embodiment, an array (e.g., “mptVariableArray[ ]”) is used to store this variable information for display within the pop-up window.
Furthermore, whenever execution of augmented source code 3204 is halted (e.g., at one of a branch point, an exit, and a return), then the user may optionally display the contents of the mapping file (e.g., mapped source code 3206) within a pop-up window by selecting a “View Code” button within user interface 160. Within this pop-up window, the current code segment is highlighted, for example as determined from execution of the “mptWriteSegment( )” function added to augmented source code 3204. In addition, MBD 111 records the code segments executed within augmented source code 3204 and displays older code segment executions in one or more different colors. Since code segment execution is based upon data within the missing segment file 3212, all segment activation history is reset when a new version of the software source code 3202 is loaded into environment 100.
Code Segment Rollback
Whenever execution of augmented source code 3204 is halted (e.g., at one of a branch point, an exit, and a return), the user may optionally select a rollback button (e.g., “Rollback Code” button) within user interface 160 to resume execution at the last executed code segment. This is implemented, in one embodiment, by utilizing the last executed code segment returned by the “mptWriteSegment” function, thereby allowing MBD 111 to use that information to transfer control to the returned code segment.
Collaborative Kernel Level Debugging
Since the above described functionality and tools are implemented within development server 108, for example, and not on the user's equipment, the interactive activity may also be shared with other developers. For example, multiple users within an organization may each activate trace mode for the same kernel and then simultaneously access the above described tools. In one embodiment, the first person initiating trace of the kernel becomes the moderator and may selectively allow other users access to view and optionally control the interactive session.
In one embodiment, the name of each collaborative user is displayed within user interface 160, and highlighting and/or color indicates which user has control of the currently executed segment. For example, the user with current control may select the name of another user to pass control of the interactive session thereto. Only the user with segment control may select the segment, display code, display variables and/or change variables. Only the moderator may select the “Continue” and the “Rollback Code” buttons. The moderator may change the segment control user at any time during halted execution.
Collaborative Algorithm Tracing
An algorithm may consist of multiple kernels and may include other algorithms. Within user interface 160, the user (e.g., developer 152 or administrator 158) may select an algorithm for tracing by MBD 111.
For example, selecting a kernel results in function-structure diagram 4600,
In one embodiment, the user assigned to each kernel 4802 becomes the moderator of that kernel and proceeds to trace that kernel within MBD 111, as described above (see
The moderator is able to assign input and output values to each kernel/algorithm they are tracing. This is accomplished by double right-clicking on (i.e., selecting) the required kernel or algorithm. The moderator's selection of a kernel/algorithm causes the input/output selection popup menu to be displayed. After the “Input” button is selected on the Input/Output selection popup menu, the file or variables selection popup menu is displayed. If the URL of the variable file is entered, followed by the selection of the “Continue” button, then a file with the following format is used to define all input variables.
Blank spaces and line feed/carriage return characters are ignored. If the variable is an array then the array element that is affected is selected. For example: (test[3], 10) means that the fourth element of the array named test will receive the value ten. Any undefined elements are designated “N/A.” Any variable with an “N/A” designation will not be defined.
The selection of the “Display Variables” button within user interface 160 causes all variables for the current kernel/algorithm to be displayed. The moderator may then place values in the current value field of each variable or enter “N/A,” where “N/A” means that this value is not important. Each element in an array must be defined separately. Any variable that is not given a value is assumed to be defined as “N/A.”
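The full file format is not reproduced in this text. Based only on the single example (test[3], 10) and the “N/A” rule above, one entry of such a file might be parsed as sketched below. The structure MptVarAssignment, the function names, the 31-character name limit, and the use of a floating-point value are assumptions made for this sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* One parsed entry (layout is an assumption). */
typedef struct {
    char   name[32];
    long   index;    /* zero-based array index, or -1 for a scalar */
    int    defined;  /* 0 when the value is "N/A" */
    double value;
} MptVarAssignment;

/* Skips blanks, line feeds, and carriage returns, which the format ignores. */
static const char *mptSkipSpace(const char *s)
{
    while (*s != '\0' && isspace((unsigned char)*s))
        s++;
    return s;
}

/* Parses one "(name, value)" or "(name[index], value)" entry.
 * Returns a pointer just past the closing ')' or NULL if malformed. */
const char *mptParseAssignment(const char *s, MptVarAssignment *out)
{
    size_t n = 0;
    char *end;

    assert(s != NULL && out != NULL);
    s = mptSkipSpace(s);
    if (*s != '(')
        return NULL;
    s = mptSkipSpace(s + 1);

    /* Variable name: letters, digits, and underscores. */
    while ((isalnum((unsigned char)*s) || *s == '_') && n < sizeof out->name - 1)
        out->name[n++] = *s++;
    out->name[n] = '\0';
    if (n == 0)
        return NULL;

    /* Optional array index. */
    out->index = -1;
    s = mptSkipSpace(s);
    if (*s == '[') {
        out->index = strtol(s + 1, &end, 10);
        if (end == s + 1 || out->index < 0)
            return NULL;
        s = mptSkipSpace(end);
        if (*s != ']')
            return NULL;
        s = mptSkipSpace(s + 1);
    }
    if (*s != ',')
        return NULL;
    s = mptSkipSpace(s + 1);

    /* Value: either "N/A" (left undefined) or a number. */
    if (strncmp(s, "N/A", 3) == 0) {
        out->defined = 0;
        out->value = 0.0;
        s += 3;
    } else {
        out->value = strtod(s, &end);
        if (end == s)
            return NULL;
        out->defined = 1;
        s = end;
    }
    s = mptSkipSpace(s);
    return (*s == ')') ? s + 1 : NULL;
}
```

Calling the function repeatedly on the returned pointer walks through a whole file held in memory. The index is zero-based, consistent with the statement that test[3] refers to the fourth element.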
The selection of an “Output” button within the “Input/Output” popup menu will cause the “Output File or Variable” popup menu to be displayed. The “Output” files and variables are filled in a manner analogous to the “Input” files or variables.
After all input and output variables are defined, the moderator may select the starting kernel/algorithm for activation. In one embodiment, the moderator left clicks the starting kernel/algorithm followed by left clicking the “Start” button within user interface 160. The algorithm is then processed by development server 108 and, once complete, the output data is compared to the entered output variable values. The moderated algorithm is considered traced when all possible algorithm paths have been selected and the required values have been obtained for each path. An algorithm may be considered traced only when all kernels and algorithms defined within that algorithm are successfully traced and considered safe.
Unsafe Code Determination
MBD 111 analyzes tracking file 3208 and missing segment file 3212 to determine whether the tested software source code 3202 is considered safe. If missing segment file 3212 identifies any code segment as untested, the software source code is not considered safe. If, within tracking file 3208, a current address of any variable is outside of that variable's assigned address range during a program run, then the software source code 3202 is not considered safe. If, within tracking file 3208, a code segment is indicated as having a total execution time greater than a defined maximum time, then the software source code 3202 is not considered safe.
If, within tracking file 3208, the sum of all execution times of a looping segment (without exiting the looping segment) is greater than a defined maximum time, then the software source code is not considered safe. If, within tracking file 3208, the total execution time for software source code 3202 exceeds a defined maximum time, then the software source code is not considered safe. If, within tracking file 3208, there are any allocated variables that never have memory allocated to them, then software source code 3202 is not considered safe. If, within tracking file 3208, more than one memory allocation is made per variable per function, then software source code 3202 is not considered safe.
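The conditions above combine into a single pass/fail decision. The following sketch assumes that the tracking file and missing segment file have already been reduced to a summary record; the structures MptRunSummary and MptTimeLimits, their fields, and the function mptIsSafe are hypothetical and serve only to show how the listed conditions compose.

```c
#include <assert.h>
#include <stddef.h>

/* Summary derived from the tracking and missing segment files (an assumption). */
typedef struct {
    size_t untestedSegments;     /* entries in the missing segment file */
    size_t addressViolations;    /* current address outside assigned range */
    double worstSegmentSeconds;  /* longest total time of any single segment */
    double worstLoopSeconds;     /* longest time spent inside one looping segment */
    double totalSeconds;         /* total execution time of the run */
    size_t neverAllocated;       /* variables that never had memory allocated */
    size_t multiplyAllocated;    /* variables allocated more than once per function */
} MptRunSummary;

/* The "defined maximum time" values; separate limits are an assumption. */
typedef struct {
    double maxSegmentSeconds;
    double maxLoopSeconds;
    double maxTotalSeconds;
} MptTimeLimits;

/* Returns 1 when none of the unsafe conditions holds, otherwise 0. */
int mptIsSafe(const MptRunSummary *run, const MptTimeLimits *limits)
{
    assert(run != NULL && limits != NULL);

    if (run->untestedSegments > 0)                              return 0;
    if (run->addressViolations > 0)                             return 0;
    if (run->worstSegmentSeconds > limits->maxSegmentSeconds)   return 0;
    if (run->worstLoopSeconds > limits->maxLoopSeconds)         return 0;
    if (run->totalSeconds > limits->maxTotalSeconds)            return 0;
    if (run->neverAllocated > 0)                                return 0;
    if (run->multiplyAllocated > 0)                             return 0;
    return 1;
}
```

Every condition is a veto: a single violation is sufficient to mark the software source code as not safe.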
Ancillary Services
In the example of
Ancillary resource server 4902 retrieves service information and associated organization information from database 106 based upon service request 4908, and presents a list of organizations offering the requested services to organization 154(4). In one embodiment, service information 4904 may be presented as a graphic similar to a kernel (e.g., kernels 204,
In another example of
Developers (e.g., developers 152(6) and 152(7)) that are interested in finding work in association with environment 100 may submit résumés (e.g., résumés 4930(1) and 4930(2), respectively) to ancillary resource server 4902 via graphical process control server 104. Ancillary resource server 4902 stores résumés 4930(1) and 4930(2) within developer information table 4932 of database 106. Each developer 152 may then interact with ancillary resource server 4902, via graphical process control server 104, to search for jobs within job descriptions 4922 based upon an input category and/or one or more keywords. In response, ancillary resource server 4902, via graphical process control server 104, may display a list 4934 of organizations (e.g., organizations 154(4) and 154(5)) offering work to the developer. Selection, by the developer (e.g., developer 152(6)) of one or more of these organizations on list 4934 is received by ancillary resource server 4902 and stored within database 106 in association with developer 152(6) and job descriptions 4922.
Administrators 158 of organizations 154(4) and 154(5) may each interact with ancillary resource server 4902, via graphical process control server 104, to evaluate résumés 4930 of developers 152 that have selected their organization from organization list 4934. In the example of
Changes may be made in the above methods and systems without departing from the scope hereof. It should thus be noted that the matter contained in the above description or shown in the accompanying drawings should be interpreted as illustrative and not in a limiting sense. The following claims are intended to cover all generic and specific features described herein, as well as all statements of the scope of the present method and system, which, as a matter of language, might be said to fall therebetween.
This application claims the benefit of U.S. Provisional Application Ser. No. 61/377,422 filed Aug. 26, 2010, which is incorporated herein by reference.
Number | Date | Country
---|---|---
61377422 | Aug 2010 | US