The present invention relates to the field of trusted software and software testing. More particularly, this invention relates to a program constraint construction based method for examining output uniqueness of a multithread program and generating a proof.
With the widespread application of multi-core processors, writing multithread programs with good performance and structure is an important way to release the potential of multi-core processors. Debugging obscure errors in a multithread program is now an urgent issue. For a serial program, the output for a given input is the same across multiple executions. A multithread program, however, does not necessarily produce the same output for one input across multiple executions, because different thread interleavings may arise during execution and may affect the execution result differently. Hence, how to verify output uniqueness of a multithread program is an urgent issue to be solved.
However, verifying a multithread program is difficult, and its errors are hard to reproduce. A multithread program has the following features: 1) a user can hardly control the execution order of all threads; 2) a debugger that uses instrumentation or breakpoints may introduce side effects, causing some errors to disappear; 3) because of the operating system and the running environment, a sequence in which errors occur seldom reoccurs; and 4) thread interleaving causes state space explosion, where, for example, the number of interleave sequences of a program with n threads, each executing k instructions, is as many as $(nk)!/(k!)^n \geq (n!)^k$. Even assuming controllable thread scheduling, a programmer cannot enumerate all thread interleaves.
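To give a sense of how quickly this count grows, the following Python sketch evaluates both sides of the inequality for a few small values of n and k. The function names are illustrative only and do not come from the disclosure.

```python
from math import factorial


def interleaving_count(n: int, k: int) -> int:
    """Number of interleavings of n threads with k instructions each: (nk)! / (k!)^n."""
    return factorial(n * k) // factorial(k) ** n


def lower_bound(n: int, k: int) -> int:
    """The lower bound (n!)^k quoted in the text."""
    return factorial(n) ** k


if __name__ == "__main__":
    for n, k in [(2, 2), (2, 5), (3, 4)]:
        print(f"n={n} k={k} interleavings={interleaving_count(n, k)} bound={lower_bound(n, k)}")
```

Even three threads of four instructions each already admit tens of thousands of interleavings, which is why exhaustive enumeration does not scale.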
At present, a lot of work has been done on testing and verifying multithread programs, which includes uncertainty testing and model checking. In an uncertainty testing method guided by a coverage standard, the coverage achieved in each execution is examined to determine elements not yet covered, and a random delay is inserted into the program to increase the possibility of covering other elements in the next execution. In model checking, the state of the program is symbolized and the entire state space is traversed to find error states in the program. Though model checking solves the issue of verifying multithread programs to some extent, it is subject to state space explosion, which makes it hard to apply to large-scale and complex software systems.
To overcome the disadvantages above in conventional technology, the disclosure provides a program constraint construction based method for examining output uniqueness of a multithread program and generating proof, in which constraint expressions are constructed based on the semantics of a multithread program, the issue of verifying output uniqueness is converted into an issue of constraint solving, a constraint solver detects whether there are different outputs, and counter-example execution paths are generated to describe the different outputs.
To achieve the object above, following technical solutions are provided according to the disclosure.
A program constraint construction based method for examining output uniqueness of a multithread program and generating proof is provided, which includes the following steps:
S1), embedding monitoring code into a program to be tested, to record an execution process of the program;
S2), executing the instrumented program at a given input, and generating a path record file;
S3), pre-processing an execution path to facilitate constraint construction;
S4), automatically adding an attribute condition at the end of running of the program, and for a running output of the multithread program, inserting an output uniqueness condition ρ into the program in the form of an assertion;
S5), converting, based on execution semantics of the program, state transfers and thread interleaving relationships in the execution path into quantifier-free first-order logic expressions, and constructing a multithread program execution path constraint model F covering all possible interleave sequences;
S6), verifying whether there is a solution to $F \wedge \rho$ given the uniqueness condition ρ; and
S7), generating a proof sequence if there is a solution which indicates existence of multiple different outputs, and determining that output is unique at the given input if there is no solution.
A further improvement of the invention is that: the instrumentation in step S1) is performed not at the source code or binary level but at the bytecode level, where a specific implementation includes first converting source code of the multithread program to be tested into bytecode of an intermediate format, that is, LLVM bytecode, then embedding statements with a monitoring function into the program to be tested, and finally linking the bytecode with the embedded monitoring code to generate an executable program.
A further improvement of the invention is that: the pre-processing in step S3) includes extracting shared variables to recognize the access points of public variables in the execution path, and slicing to remove statements irrelevant to the attribute being verified.
A further improvement of the invention is that: an output variable is automatically recognized and the output uniqueness condition ρ is constructed therefor in step S4).
A further improvement of the invention is that: the multithread program execution path constraint model F in step S5) covers all possible interleave sequences of the execution path, and includes five constraints: a path expression, a memory model constraint, a read-write relationship constraint, a partial-order constraint and a synchronous semantic constraint, which are defined as follows:
1), the path expression: describing definition-use chains inside threads and controlling state switching inside the threads;
2), the memory model constraint: representing relationships between statements and between variables in a program and adopting sequentially consistent semantics, where sequential consistency provides that a CPU executes a program in the order of the statements in the code;
3), the read-write relationship constraint: defining definition-use chains across the threads, and providing that a value read from a shared variable must come from either the initial value or the most recently written value;
4), the partial-order constraint: defining ordering relationships, across the threads, between the statements that create or terminate a thread and the statements of the thread so created or terminated;
5), the synchronous semantic constraint: defining ordering relationships between synchronization control statements across the threads;
where the definition-use chain is defined as follows: with each thread sequence converted into an SSA format, each execution sequence of the SSA format except shared access points is a complete definition-use chain.
A further improvement of the invention is that: a method for constructing the multithread program execution path constraint model F in step S5) includes the following operations:
1), calculating the path expression to control the state switching inside the threads;
2), calculating the memory model constraint to restrict relationships between statements in each of the threads;
3), calculating the read-write relationship constraint to establish the definition-use chains across the threads;
4), calculating the synchronous semantic constraint to define synchronization relationships between the threads;
5), calculating the partial-order constraint to describe the semantics for creating and terminating the threads; and
forming the constraint model F by combining the five constraints above.
A further improvement of the invention is that: with an event set of the execution path defined as $\tau = \{T_i \mid 0 < i \leq k\}$, where k is the number of the threads, $T_i = \{e_1, e_2, \ldots, e_n\}$ is an execution sequence of thread i, $e_n$ represents the nth event of $T_i$, $O(e_n)$ represents the rank of event $e_n$ and n represents the number of events in $T_i$,
a method for calculating the path expression includes:
converting each thread sequence into an SSA format, which is similar to collecting of path conditions (Path Condition), and directly converting the sequence of the SSA format into the path expression;
a method for calculating the memory model constraint includes:
performing all operations in accordance with an order of the program with a model with sequential consistency adopted, where an order of events in a thread is subject to the following constraint:
where ei and ei+1 represent two successive events in one thread and τ represents all thread sequences;
a method for calculating the read-write relationship constraint includes:
letting each read of the shared variable take its value from the most recent write, and for one shared variable v, with R representing the set of all events performing a reading operation thereon and W representing the set of all events performing a writing operation thereon, providing the following formula:
where er is a read event, ew and ex are write events, vr and vw are variables operated by events er and ew, and the meaning of the formula is that a value of vr in event er is from vw in event ew on condition that er is after ew, that is O(ew)<O(er), and that all writing is either before ew or after er;
a method for calculating the synchronous semantic constraint includes two kinds of operations, lock/unlock operations and wait/signal operations:
1), the lock/unlock operations are for constructing a locked synchronous semantic constraint, which requires that any two lock/unlock event pairs li/ui and lk/uk in a lock/unlock set L in one mutex are subject to the following formula:
where lock pair li/ui occurs either before or after lock pair lk/uk; and
2) the wait/signal operations are for constructing a synchronous semantic constraint for condition variables, which is subject to the condition that each wait operation corresponds to one signal operation and one signal operation awakes one wait operation at most, and for one condition variable cond, with WT representing a set of all wait operations on cond and SG representing a set of all signal operations on cond, the following formula is provided to meet the condition above:
where ewt is an element in WT, SGwt represents a set of signal operations matching ewt, esg is any signal operation event in SGwt, whether esg matches ewt is represented by whether variable pairsgwt is equal to 1, and the sub-formula Σe∈SG
a method for calculating the partial-order constraint includes:
providing that if an event creates a thread, all events of the created thread are required to be executed after the event and that if an event performs a thread terminating operation, all events of a terminated thread are required to be before the event, and with C representing a set of events of create/fork operations and J representing a set of events of join operations, providing the following constraint:
where ec is a thread creating event, first (ec) is a rank of the first event of a thread created by ec, ej is a thread terminating event and last(ej) is a rank of the last event of a thread terminated by ej; and
the five constraints above are combined to form the constraint model F.
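The constraints described in words above can be written compactly as follows. This is a reconstruction from the surrounding definitions (the symbols $O$, $\tau$, $R$, $W$, $L$, $C$, $J$, first and last are used as defined above), offered as a sketch and not as the exact formulas of the disclosure. The wait/signal constraint is left out because its sub-formula is not fully specified above.

```latex
% Memory model constraint (program order inside every thread sequence)
\[ \bigwedge_{T \in \tau}\ \bigwedge_{e_i,\, e_{i+1} \in T} O(e_i) < O(e_{i+1}) \]

% Read-write relationship constraint for one shared variable v
\[ \bigwedge_{e_r \in R}\ \bigvee_{e_w \in W}
   \Big( v_r = v_w \;\wedge\; O(e_w) < O(e_r) \;\wedge\;
   \bigwedge_{e_x \in W,\ e_x \neq e_w} \big( O(e_x) < O(e_w) \vee O(e_r) < O(e_x) \big) \Big) \]

% Lock/unlock constraint for one mutex
\[ \bigwedge_{(l_i,u_i),\,(l_k,u_k) \in L,\ i \neq k}
   \big( O(u_i) < O(l_k) \;\vee\; O(u_k) < O(l_i) \big) \]

% Partial-order constraint for thread creation and termination
\[ \bigwedge_{e_c \in C} O(e_c) < \mathit{first}(e_c)
   \;\wedge\; \bigwedge_{e_j \in J} \mathit{last}(e_j) < O(e_j) \]
```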
A further improvement of the invention is that: with the given constraint model and the output uniqueness attribute condition, the attribute condition is solved for with the constraint solver in step S6), and if there is a different output, a counter-example is generated to describe a triggering process of the different output.
Compared with conventional technology, the invention has the following beneficial effects:
1), a multithread program execution path constraint model is provided, the issue of verifying output uniqueness of a multithread program is converted into an issue of constraint solving, the model constructs constraints based on the semantics of the program, the constructed expressions cover all possible interleave sequences, and a constraint solver examines whether any of the interleaves generates a different output;
2), a proof sequence is generated if there are different outputs, to show the user how the different results are generated; and
3), a post-hoc analysis is performed on an execution sequence, without generating a high running cost as in on-the-fly technology.
Some embodiments of the invention are described hereinafter in conjunction with the figures and an example. A program to be tested is shown as follows, where x and y are shared variables and thread 0 creates threads 1 and 2 in lines 1 and 2.
As shown in
In step S1), monitoring code is embedded into the program to be tested, to record an execution process of the program. Code presented after the instrumentation is completed at an LLVM bytecode level is shown as follows:
where function clap_inst_pre is an inserted monitoring statement, which monitors the statement that follows it and outputs the thread ID, the instruction ID, the state value and the return value of that statement during execution.
In step S2), the example program is executed at a given input and a path is recorded as [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14].
In step S3), the path is pre-processed for constraint construction in step S5), access points of global variables are extracted, which include lines 5, 6, 8, 9, 12 and 13, and the path is converted into an SSA format, where thread 0 is converted into track 0, thread 1 is converted into track 1 and thread 2 is converted into track 2, as follows:
where subscripts of global variables x and y represent reading (r) or writing (w), superscripts are for telling different reading or writing operations apart and a superscript being 0 indicates an initial assigned value.
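The renaming scheme just described can be sketched in Python as follows. The statement list is an assumption: it is inferred from the path expression given in step S501, since the program listing is not reproduced in this text, and the last statement is treated as a write of y. The tuple format and the names `TRACE` and `to_ssa` are hypothetical.

```python
from collections import defaultdict

SHARED = {"x", "y"}

# Assumed statements on the recorded path: (line, target, operands, expression template).
TRACE = [
    (5, "a", ["x"], "{0}"),
    (6, "b", ["y"], "{0}"),
    (8, "x", ["a"], "{0} + 2"),
    (9, "y", ["b"], "{0} + 3"),
    (12, "x", ["x"], "{0} + 1"),
    (13, "y", ["y"], "{0} + 1"),
]


def to_ssa(trace, shared, initial):
    """Rename shared reads to v_rN and shared writes to v_wN; index 0 of a write is the initial value."""
    reads = defaultdict(int)    # next read index per shared variable
    writes = defaultdict(int)   # next write index per shared variable
    local_version = {}          # current version of each thread-local variable
    out = []
    for var, value in initial.items():
        out.append(f"{var}_w{writes[var]} = {value}")
        writes[var] += 1
    for _line, target, operands, template in trace:
        names = []
        for op in operands:
            if op in shared:
                names.append(f"{op}_r{reads[op]}")
                reads[op] += 1
            else:
                names.append(f"{op}{local_version[op]}")
        if target in shared:
            lhs = f"{target}_w{writes[target]}"
            writes[target] += 1
        else:
            local_version[target] = local_version.get(target, -1) + 1
            lhs = f"{target}{local_version[target]}"
        out.append(f"{lhs} = {template.format(*names)}")
    return out


if __name__ == "__main__":
    for clause in to_ssa(TRACE, SHARED, {"x": 3, "y": 1}):
        print(clause)
```

Joining the printed clauses with conjunctions gives a path expression of the same shape as the one used in step S501.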
In step S4), with expected outputs of global variables x and y being 6 and 5 respectively, assertions x=6 and y=5 are inserted at the end. In addition, x=6 and y=5 are determined to be output uniqueness verification conditions, as is shown as follows:
In step S5), state transfers and thread interleaving relationships in the execution path are converted into quantifier-free first-order logic expressions based on the execution semantics of the program, and a constraint model F of the execution path π is constructed, which includes five constraints: a path expression, a memory model constraint, a read-write relationship constraint, a partial-order constraint and a synchronous semantic constraint. The whole constraint model F covers all possible interleave sequences. Specifically, as shown in
Step S501), directly calculating, based on the SSA format of the path, the path expression, which is expressed by the following formula:
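$x_w^0 = 3 \wedge y_w^0 = 1 \wedge a_0 = x_r^0 \wedge b_0 = y_r^0 \wedge x_w^1 = a_0 + 2 \wedge y_w^1 = b_0 + 3 \wedge x_w^2 = x_r^1 + 1 \wedge y_w^2 = y_r^1 + 1$;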
Step S502), constructing the memory model constraint, which includes performing all operations in accordance with an order of the program with a model with sequential consistency adopted, and calculating the memory model constraint of the path π according to the formula:
which is expressed by the following formula:
$o(e_1) < o(e_2) < o(e_3) < o(e_4) \wedge o(e_5) < o(e_6) < o(e_7) < o(e_8) < o(e_9) < o(e_{10}) \wedge o(e_{11}) < o(e_{12}) < o(e_{13}) < o(e_{14})$,
where $o(e_i)$ represents the position, in the interleave sequence, of the statement on line i.
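As a minimal sketch of how this constraint can be checked against a candidate interleaving, the Python function below tests whether an ordering of line numbers respects program order within each thread. The thread-to-line mapping is read off the formula above; the names `THREADS` and `respects_program_order` are illustrative and not part of the disclosure.

```python
# Lines executed by each thread, read off the memory model constraint above.
THREADS = {
    0: [1, 2, 3, 4],
    1: [5, 6, 7, 8, 9, 10],
    2: [11, 12, 13, 14],
}


def respects_program_order(order):
    """Return True if successive events of every thread keep their relative order."""
    position = {line: index for index, line in enumerate(order)}
    return all(
        position[first] < position[second]
        for lines in THREADS.values()
        for first, second in zip(lines, lines[1:])
    )


recorded = list(range(1, 15))
interleaved = [1, 2, 5, 11, 12, 13, 14, 6, 7, 8, 9, 10, 3, 4]
swapped = [1, 2, 6, 5, 3, 4, 7, 8, 9, 10, 11, 12, 13, 14]  # line 6 before line 5

if __name__ == "__main__":
    for name, order in [("recorded", recorded), ("interleaved", interleaved), ("swapped", swapped)]:
        print(name, respects_program_order(order))
```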
Step S503) calculating the read-write relationship constraint, which includes letting each read of a shared variable take its value from the most recent write, and for one shared variable, with R representing the set of all events performing a reading operation thereon and W representing the set of all events performing a writing operation thereon, providing the following formula:
where er is a read event, ew and ex are write events, vr and vw are variables operated by events er and ew, and the meaning of the formula is that a value of vr in event er is from vw in event ew on condition that er is after ew, that is O(ew)<O(er), and that all writing is either before ew or after er.
In the path, for global variable x, R={e5,e12}, W={e0,e8,e12}, and a read-write relationship is expressed by the following formula:
$\{x_r^0 = x_w^0 \wedge O(e_0) < O(e_5) \wedge O(e_8) < O(e_{12})\} \vee \{x_r^0 = x_w^1 \wedge O(e_{12}) < O(e_5) \wedge O(e_0) < O(e_{12})\}$,
where the reading and writing possibilities of variable x are enumerated: the read of x in line 5 takes its value from the write of x in line 0 on condition that line 0 is before line 5 and the write of x in line 12 does not occur between the write of x in line 0 and the read of x in line 5. The case of variable y is similar to that of variable x.
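For a concrete interleaving, the write that each read observes can be computed directly, which illustrates what the read-write constraint encodes symbolically. The Python sketch below assumes that event 0 stands for the initial assignment and is placed first in every order; `reads_from` is a hypothetical helper name, and line 12 appears as both a read and a write of x, as in the sets R and W above.

```python
X_READS = [5, 12]
X_WRITES = [0, 8, 12]


def reads_from(order, reads, writes):
    """Map each read event to the most recent write that precedes it in the given order."""
    position = {event: index for index, event in enumerate(order)}
    source = {}
    for read in reads:
        # Strict comparison: an event that both reads and writes does not see its own write.
        earlier = [w for w in writes if position[w] < position[read]]
        source[read] = max(earlier, key=lambda w: position[w])
    return source


recorded = [0] + list(range(1, 15))
thread2_first = [0, 1, 2, 11, 12, 13, 14, 5, 6, 7, 8, 9, 10, 3, 4]

if __name__ == "__main__":
    print(reads_from(recorded, X_READS, X_WRITES))
    print(reads_from(thread2_first, X_READS, X_WRITES))
```

In the recorded order the read in line 5 sees the initial value, whereas in the second order it sees the value written in line 12.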
Step S504) constructing the synchronous semantic constraint, which includes two kinds of operations, lock/unlock operations and wait/signal operations:
1), in constructing a locked synchronous semantic constraint (that is, performing the lock/unlock operations), it is required that any two lock/unlock event pairs li/ui and lk/uk in a lock/unlock set L in one mutex are subject to the following formula:
where lock pair li/ui occurs either before or after lock pair lk/uk; and
2) in constructing a synchronous semantic constraint for condition variables (that is, performing the wait/signal operations), it is required to meet the condition that each wait operation corresponds to one signal operation and one signal operation awakes one wait operation at most, and for one condition variable cond, with WT representing a set of all wait operations on cond and SG representing a set of all signal operations on cond, the following formula is provided to meet the condition above:
where ewt is an element in WT, SGwt represents a set of signal operations matching ewt, esg is any signal operation event in SGwt, whether esg matches ewt is represented by whether variable pairsgwt is equal to 1, and the sub-formula Σe∈SG
In the path, there is only one lock m, and the synchronous semantic constraint is expressed by the following formula:
$o(e_{10}) < o(e_{11}) \vee o(e_{14}) < o(e_7)$,
where the constraint expression indicates that either thread 1 obtains the lock first, that is, $o(e_{10}) < o(e_{11})$, or thread 2 obtains the lock first, that is, $o(e_{14}) < o(e_7)$.
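The mutual-exclusion requirement can be checked on a candidate interleaving as in the sketch below. It assumes, from the formula above, that thread 1 holds the lock from line 7 to line 10 and thread 2 from line 11 to line 14; `LOCK_PAIRS` and `respects_mutex` are illustrative names.

```python
# (lock line, unlock line) for each critical section on mutex m, taken from the formula above.
LOCK_PAIRS = [(7, 10), (11, 14)]


def respects_mutex(order, pairs=LOCK_PAIRS):
    """Return True if no two critical sections on the same mutex overlap in the given order."""
    position = {line: index for index, line in enumerate(order)}
    for i, (lock_a, unlock_a) in enumerate(pairs):
        for lock_b, unlock_b in pairs[i + 1:]:
            a_first = position[unlock_a] < position[lock_b]
            b_first = position[unlock_b] < position[lock_a]
            if not (a_first or b_first):
                return False
    return True


recorded = list(range(1, 15))
thread2_locks_first = [1, 2, 5, 6, 11, 12, 13, 14, 7, 8, 9, 10, 3, 4]
overlapping = [1, 2, 5, 6, 7, 11, 8, 9, 10, 12, 13, 14, 3, 4]  # line 11 locks while thread 1 holds m

if __name__ == "__main__":
    for name, order in [("recorded", recorded), ("thread 2 first", thread2_locks_first), ("overlapping", overlapping)]:
        print(name, respects_mutex(order))
```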
Step S505) calculating the partial-order constraint, which provides that: if an event creates a thread, all events of the created thread are required to be executed after the event; and if an event performs a thread terminating operation, all events of a terminated thread are required to be before the event. With C representing a set of events of create/fork operations and J representing a set of events of join operations, the following constraint is provided:
where ec is a thread creating event, first (ec) is a rank of the first event of a thread created by ec, ej is a thread terminating event and last(ej) is a rank of the last event of a thread terminated by ej.
In the path, the thread creating statements are e1 and e2 and the thread wait statements are e3 and e4, and the partial-order constraint thereof is expressed by the following formula:
$o(e_1) < o(e_5) \wedge o(e_2) < o(e_{11}) \wedge o(e_{10}) < o(e_3) \wedge o(e_{14}) < o(e_4)$,
where the constraint $o(e_1) < o(e_5)$ represents that thread creating statement e1 is executed before the first event e5 of thread 1 created by e1, and the constraint $o(e_{10}) < o(e_3)$ represents that thread wait statement e3 is executed after the last event e10 of thread 1.
Step S506) obtaining the constraint model F by combining the five constraints above.
In step S6) in the example, with the output uniqueness verification conditions being ρ1: x=6 and ρ2: y=5, $F \wedge \rho_1$ and $F \wedge \rho_2$ are solved with a constraint solver, each of which has a solution, where a counter-example of ρ1 is {1,2,5,11,12,13,14,6,7,8,9,10} and a counter-example of ρ2 is {1,2,5,6,11,12,13,14,7,8,9,10}.
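The two counter-examples can be replayed on a small model of the example program. In the sketch below the effect of each line is an assumption inferred from the path expression of step S501 (the program listing is not reproduced in this text), and brute-force enumeration of the interleavings of threads 1 and 2 stands in for the constraint solver, so this illustrates the result and not the patented solving procedure. Thread 0's create and join lines are fixed at the start and end, which does not affect x or y.

```python
from itertools import combinations

THREAD_1 = [5, 6, 7, 8, 9, 10]
THREAD_2 = [11, 12, 13, 14]


def replay(order):
    """Execute the assumed statements in the given order and return the final (x, y)."""
    x, y = 3, 1
    a = b = None
    for line in order:
        if line == 5:
            a = x
        elif line == 6:
            b = y
        elif line == 8:
            x = a + 2
        elif line == 9:
            y = b + 3
        elif line == 12:
            x = x + 1
        elif line == 13:
            y = y + 1
        # Remaining lines (create, join, lock, unlock) leave x and y unchanged.
    return x, y


def merges(first, second):
    """Yield every interleaving of two sequences that keeps each sequence's own order."""
    total = len(first) + len(second)
    for slots in combinations(range(total), len(first)):
        chosen = set(slots)
        a, b = iter(first), iter(second)
        yield [next(a) if i in chosen else next(b) for i in range(total)]


def lock_ok(order):
    """Mutual exclusion on m: one critical section ends before the other begins."""
    position = {line: index for index, line in enumerate(order)}
    return position[10] < position[11] or position[14] < position[7]


def distinct_outputs():
    """Collect the final (x, y) of every lock-respecting interleaving of threads 1 and 2."""
    return {replay([1, 2] + body + [3, 4]) for body in merges(THREAD_1, THREAD_2) if lock_ok(body)}


if __name__ == "__main__":
    print(replay(range(1, 15)))                                    # recorded path
    print(replay([1, 2, 5, 11, 12, 13, 14, 6, 7, 8, 9, 10]))      # counter-example of rho1
    print(replay([1, 2, 5, 6, 11, 12, 13, 14, 7, 8, 9, 10]))      # counter-example of rho2
    print(sorted(distinct_outputs()))
```

Under these assumptions the recorded path yields x=6 and y=5, while the first counter-example ends with x=5 and the second with y=4, so the output at this input is not unique.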
In step S7), a verification result and the counter-examples are outputted.
Number | Date | Country | Kind |
---|---|---|---|
201410320129.0 | Jul 2014 | CN | national |
This is a continuation application of International Application PCT/CN2015/081055 filed on Jun. 9, 2015, which claims the benefit of Chinese Patent Application CN 201410320129.0 filed Jul. 7, 2014, both of which are incorporated herein by reference in their entirety.
 | Number | Date | Country |
---|---|---|---|
Parent | PCT/CN2015/081055 | Jun 2015 | US |
Child | 15270266 | | US |