This section is intended to introduce the reader to various aspects of art, which may be related to various aspects of the present invention that are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present invention. Accordingly, it should be understood that these statements are to be read in this light, and not as admissions of prior art.
A system may comprise a number of resources, some of which may be defective. For example, a system may be a computer that comprises a number of electronic chips. The chips are resources of the computer system, and some of the chips may be defective. Accordingly, it may be desirable or even necessary to test the system to determine which resources are functional (good) and which resources are inoperative (bad). In another example, a single electronic chip may be the system, and a number of logic devices on the chip may be the resources. In this second example, it may be desirable to perform a test or tests to determine which, if any, of the logic devices are good and which logic devices are bad.
One method of determining the status of system resources, as in the examples above, may be to test each resource in the system individually. However, individualized testing may not always be possible or efficient. For example, some systems may prohibit such testing based on the structure of the system. Other systems may comprise too many resources to efficiently test each resource individually. Accordingly, it may be desirable or even necessary to test some systems by testing groups of resources within the system, thus reducing the number of tests required. For example, if a system contains one hundred resources, the system may be tested by first dividing the system into five groups of twenty resources, reducing the number of tests from one hundred to five. Next, each of the five groups may be tested to determine whether the group as a whole is good or bad. A group is defined as good if all of its resources are good. If any resource in the group is bad, the group is defined as bad.
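By way of a non-limiting illustration, the grouping described above might be sketched as follows. The function names, the integer resource identifiers, and the locations of the bad resources are hypothetical and are chosen only to make the example concrete.

```python
from typing import List, Sequence, Set


def split_into_groups(resources: Sequence[int], group_size: int) -> List[List[int]]:
    """Split resource identifiers into consecutive groups of at most group_size."""
    return [list(resources[i:i + group_size])
            for i in range(0, len(resources), group_size)]


def group_is_good(group: Sequence[int], bad: Set[int]) -> bool:
    """A group is good only if none of its resources is bad."""
    return not any(r in bad for r in group)


resources = list(range(100))      # one hundred resources
bad_resources = {7, 63}           # hypothetical defect locations, for illustration only
groups = split_into_groups(resources, 20)
results = [group_is_good(g, bad_resources) for g in groups]
print(len(groups), results)
```

In this sketch, five tests replace one hundred, but a failed group reveals only that at least one of its twenty resources is bad.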
However, inherent problems exist with the abovementioned method of testing groups. Further, there exists difficulty in selecting groups with a reasonable likelihood of providing a positive outcome based on initial estimates or based on information obtained from prior testing. For example, if twenty electrical components are tested as a group and the test fails, the test may not indicate which of the twenty components is/are faulty. This inherent difficulty exists because one bad component can cause the entire group to fail and thus cause the group to be deemed bad. Accordingly, it may be necessary to choose groups wherein all the resources comprised by the group are good. This necessity arises because negative tests only indicate that at least one resource in the group is faulty, and good groups may be necessary in order to obtain valid results regarding the status of resources in a system. However, it may be difficult to select groups that have a reasonable likelihood of producing a success when tested. Additionally, a further problem exists in that even a positive test may not be a true indication that all twenty components are good. There may be some probability of an accidental success.
Faulty information may result from tests that yield incorrect results due to accidental successes. This faulty information can be very problematic. For example, a group may be deemed good when in actuality it contains a bad resource. A falsely positive test result can damage or reduce the value of an entire system or network of systems. One means of overcoming the problem of accidental successes is to employ a stronger test, one which has less likelihood of producing an accidental success. However, the number and complexity of resources in a group may limit the strength of a test of that group. One means to employ stronger tests is to increase the number of resources in each group. By making the group contain more resources, it may be possible to employ a test strong enough that the probability of achieving an accidental success is negligible. However, by increasing the number of resources in each group, it becomes even more difficult to select groups that will, with reasonable probability, provide a positive or good test. Very simply, the increase in the difficulty of selecting good groups is due to an increase in the probability of each group containing at least one bad resource. Of course, the probability of failure also increases with the defect rate of resources in the particular system. This is due to the increased likelihood of including a bad resource. Thus, in systems with defect rates above a certain level, it may not be practical to overcome the problem of accidental successes by increasing the group size.
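The effect of group size on the likelihood of selecting a good group may be illustrated with a short calculation. The sketch below assumes, purely for illustration, that each resource is bad independently with the same defect rate; neither the function name nor the five percent defect rate comes from the present disclosure.

```python
def prob_group_all_good(defect_rate: float, group_size: int) -> float:
    """Probability that every resource in a group is good, assuming
    independent defects with a uniform defect rate (an illustrative assumption)."""
    return (1.0 - defect_rate) ** group_size


# Hypothetical defect rate of five percent, evaluated for several group sizes.
for size in (5, 20, 50):
    print(size, prob_group_all_good(0.05, size))
```

Under these assumptions, the chance of selecting an all-good group falls off quickly as the group grows, which is why larger groups may become impractical at higher defect rates.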
Advantages of one or more disclosed embodiments may become apparent upon reading the following detailed description and upon reference to the drawings in which:
One or more specific embodiments of the present invention will be described below. In an effort to provide a concise description of these embodiments, not all features of an actual implementation are described in the specification. It should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another. Moreover, it should be appreciated that such a development effort might be complex and time consuming, but would nevertheless be a routine undertaking of design, fabrication, and manufacture for those of ordinary skill having the benefit of this disclosure.
Embodiments of the present invention may provide the ability to test resource characteristics and/or resources of a system for which individual testing is either impractical due to the number of resources comprised by the system, or impossible because of inaccessibility. Further, embodiments of the present invention may account for accidental successes, thus allowing for division of systems into relatively small groups. Because accidental successes may be taken into account, the groups need not be so large as to negate the possibility of accidental success. Additionally, by allowing relatively small groups, the probability of choosing good groups (groups that will return positive results or that will pass) may increase. Accordingly, even systems with high defect rates, which require smaller groups in order to acquire positive tests, may be reliably testable utilizing embodiments of the present invention.
It should also be noted that embodiments of the present invention may provide means for selecting groups with higher probabilities of success. In other words, the present invention may provide information or a system for choosing good groups, thus increasing the probability of acquiring useful information. This may be beneficial because failed tests are not fair or adequate tests for good resources in a group. For example, failed (negative) tests only indicate that at least one resource is bad. Good resources can only demonstrate that they are good through successful (positive) tests. Thus, positive tests provide the best and most valuable information.
Embodiments of the present invention may comprise an iterative method for estimating a probability “Px” that X is good, where X is a resource. The iterative method may begin with an initial estimate of Px for each resource X, as shown in block 114, based on the nature of the system and the nature of the resource. Considering the resource X, this method includes counting a number of successes “Sx” and tests “Tx” for X. As shown in block 116, Sx and Tx may initially be set so that their ratio Sx/Tx equals Px. The iterative method may select a testable group of resources, as shown in block 118, perform a test, and update the counts of successes and tests for the resources in the group. The ratio of Sx to Tx (Sx/Tx) may then be calculated to give a revised estimate for the probability Px. It should be noted that the tests and successes are not counted as integers, but rather as summations of probabilities based on the results of the group tests in which X has been involved. For example, a group may comprise three resources (X1, X2, X3). Based on previous tests, the probability that X1 is good may be estimated as 0.9, the probability that X2 is good may be estimated as 0.9, and the probability that X3 is good may be estimated as 0.1. A test of this group of three resources may be expected to succeed with probability 0.081, assuming that the test depends uniformly on all three resources. In other words, this test may be expected to fail. But it would not be fair to attribute much of the cause for such a failure to X1 or X2, because the failure would be much more likely due to X3. In particular, each resource may be seen as being tested only under the conditional probability that the other resources are good. For resource X1, the probability that the other resources are good may be estimated as the product 0.9×0.1=0.09. Accordingly, after testing the example group and getting a failure, the fraction 0.09 would be added to the number of tests for X1. 
The test is not very effective for resource X1 because the failure is most likely the fault of the resource having only a ten percent likelihood of being good. For resource X3, the probability that all the other resources are good may be estimated as the product 0.9×0.9=0.81. Accordingly, after testing the example group and getting a failure, the fraction 0.81 would be added to the number of tests for X3. Similarly, if a success occurs, only the probability that the success was not accidental may be attributed to a success count. This probability that the success was not an accidental success is related to the probability of success with a bad element discussed below and depends on the particular test being used. The probability of success with a bad element may be referred to as “PSB.”
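The arithmetic of the three-resource example above may be reproduced with the following sketch. The helper name `other_good_probability` and the dictionary layout are hypothetical; the sketch assumes the test depends uniformly on all resources and that the estimates are independent.

```python
from math import prod
from typing import Dict


def other_good_probability(estimates: Dict[str, float], name: str) -> float:
    """Product of the estimates for every resource in the group except `name`."""
    return prod(p for other, p in estimates.items() if other != name)


group = {"X1": 0.9, "X2": 0.9, "X3": 0.1}

# Expected probability that the group test succeeds (product of all estimates).
p_success = prod(group.values())

# Fraction of a test attributed to each resource after a failure.
weights = {name: other_good_probability(group, name) for name in group}

print(round(p_success, 3))
print({name: round(w, 2) for name, w in weights.items()})
```

The computed weights match the fractions given above: 0.09 for X1 (and for X2) and 0.81 for X3.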
In testing a group involving resource X, several assumptions may be made. An assumption may be that the test will always succeed if all the resources in the group are good (block 120). Another assumption may be that if some resource in the group is bad, the test will usually fail, but it may accidentally succeed with some probability PSB (block 122). Another assumption may be that the test can indicate the value of the probability PSB, based on its internal structure (block 124). For example, a test that expects never to produce an accidental success would indicate that PSB=0. Another example is a full-sequence linear-feedback shift register of n bits, which, in spite of containing a bad resource, has a probability of roughly 2^(-n) of accidental success after being clocked for many cycles.
Another assumption is that the test can tell us, based on the internal structure of the test and the estimated probabilities of the resources in the group, the expectation P that the test will succeed (block 126). If the test depends uniformly on all of the resources in the group, then P could be the product of Py over all resources Y in the group. This would assume that all of the probability estimates are independent. This could be a default assumption, because a well-designed test should depend uniformly on all of its resources. However, tests that depend more heavily on some resources than others may still be used with this method by making a proper calculation of the expectation P.
In the illustrated embodiment, the test is performed, as shown in block 130. If the test succeeds, for each resource X in the group, the number of successes Sx is increased by 1−PSB and the number of tests Tx is increased by (1−PSB)×P/Px (blocks 132 and 134). If the test fails, for each resource X in the group, the number of tests Tx is increased by P/Px and the number of successes Sx is left unchanged (block 136). With the updated information, the estimated probability Px for each resource X in the group is recalculated as illustrated in block 140. The iteration proceeds through block 142 back to block 118 to select another testable group of resources. When enough testing has been performed, the method is finished (block 199).
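One possible way to express the update of blocks 132 through 140 is sketched below. The `Tally` class and the `update_group` function are hypothetical names introduced only for illustration. The sketch assumes a test that depends uniformly on its resources, so that P/Px equals the product of the estimates of the other resources in the group; that product is computed directly, which avoids a division when an estimate happens to be zero.

```python
from dataclasses import dataclass
from math import prod
from typing import Dict, Iterable


@dataclass
class Tally:
    successes: float  # Sx, a fractional count
    tests: float      # Tx, a fractional count

    @property
    def estimate(self) -> float:
        """Px, the ratio Sx/Tx."""
        return self.successes / self.tests


def update_group(tallies: Dict[str, Tally], group: Iterable[str],
                 passed: bool, psb: float) -> None:
    """Apply one group test result to the tallies of every resource in the group."""
    members = list(group)
    # Snapshot the estimates first so every update uses pre-test values.
    px = {x: tallies[x].estimate for x in members}
    for x in members:
        # P/Px under the uniform-dependence assumption.
        weight = prod(px[y] for y in members if y != x)
        if passed:
            tallies[x].successes += 1.0 - psb
            tallies[x].tests += (1.0 - psb) * weight
        else:
            tallies[x].tests += weight  # successes are left unchanged


# Hypothetical starting tallies chosen so that Sx/Tx matches 0.9, 0.9 and 0.1.
tallies = {"X1": Tally(0.9, 1.0), "X2": Tally(0.9, 1.0), "X3": Tally(0.1, 1.0)}
update_group(tallies, ["X1", "X2", "X3"], passed=False, psb=0.0)
print({name: round(t.estimate, 3) for name, t in tallies.items()})
```

In this usage example, a failed test barely lowers the estimates for X1 and X2 while sharply lowering the estimate for X3, which already appeared likely to be bad.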
Other embodiments can be envisaged comprising similar versions of the calculations presented in blocks 114-142 that may also function with the present invention. For example, on a successful test, the number of tests Tx could be increased by P/Px and the number of successes Sx could be increased by 1−PSB.
In summary, there may be two general concepts concerning attributing fractional test results. The first concept is to attribute a fractional test to each resource X based on how strongly the state of that resource X influenced the test result. The second concept is to attribute a fractional success to each resource X based on how strongly the test result indicates that the resource X must be good. It should be noted that different numerical calculations that follow these concepts may tend to provide an acceptable result, even if the exact conditional probability calculation is not mathematically accurate. In the present context, an acceptable result may be defined such that, after iteration of many tests, the probability estimates for each resource tend to discriminate between good resources and bad resources. In other words, rough approximations may be sufficient, which is beneficial because getting an exact probability model of the dependence of a test on its resources is often quite difficult.
In some embodiments, a large system may be tested in which many non-overlapping groups may be configured. In such a system, tests may be performed and then evaluated utilizing the graph walking method discussed above. For example, groups may be selected using the graph walking method. Further, this group selection process may be applied repeatedly to completion (block 399) or until all nodes have been visited and included in some group (blocks 320 and 340). Accordingly, all groups may be tested individually or in parallel (block 350). Even if the system does not allow for simultaneous tests, applying the graph walking method repeatedly until all nodes have been visited may be a good approach to obtain coverage.
When selecting the starting node for a path in the graph walking method, selecting the unvisited node with the highest probability for success, as illustrated by block 322, helps to select a larger group of nodes. Likewise, when selecting other nodes to extend a path (block 336), selecting the nodes with the highest probability of producing a successful test will also help to select a larger group of nodes. Larger groups permit more robust tests and thus have a lower chance of accidental success, which makes the information produced by a good test more valuable.
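The greedy selection described above might be sketched as follows. This is an illustrative assumption only, not the graph walking method itself: the adjacency-list representation, the `max_size` limit on path length, and the function names `walk_group` and `select_groups` are all hypothetical.

```python
from typing import Dict, List, Set


def walk_group(adjacency: Dict[str, List[str]], px: Dict[str, float],
               visited: Set[str], max_size: int) -> List[str]:
    """Grow one path, starting from the unvisited node with the highest estimate."""
    start = max((n for n in adjacency if n not in visited), key=lambda n: px[n])
    path = [start]
    visited.add(start)
    while len(path) < max_size:
        candidates = [n for n in adjacency[path[-1]] if n not in visited]
        if not candidates:
            break  # no unvisited neighbor remains, so the path ends here
        best = max(candidates, key=lambda n: px[n])
        path.append(best)
        visited.add(best)
    return path


def select_groups(adjacency: Dict[str, List[str]], px: Dict[str, float],
                  max_size: int) -> List[List[str]]:
    """Repeat the walk until every node belongs to some non-overlapping group."""
    visited: Set[str] = set()
    groups: List[List[str]] = []
    while len(visited) < len(adjacency):
        groups.append(walk_group(adjacency, px, visited, max_size))
    return groups


# Hypothetical four-node system and probability estimates.
adjacency = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"], "d": ["b", "c"]}
px = {"a": 0.9, "b": 0.4, "c": 0.8, "d": 0.7}
print(select_groups(adjacency, px, max_size=3))
```

In this sketch, the three most promising nodes are collected into one group, and the least promising node is left in a group of its own, so that it is less likely to spoil the test of the larger group.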
In some embodiments of the present invention, rather than keeping counts of successes (Sx) and tests (Tx), successes (Sx) and failures (Fx) could be tallied. In such a system, the number of tests (Tx) would simply be the sum of the successes (Sx) and the failures (Fx).
In performing the tests and iterations discussed above, it may be beneficial to initialize all nodes to 0.5 successes (Sx) and one test (Tx). This would correspond to an initial probability estimate Px of 0.5. The process of choosing groups and testing the groups may be repeated iteratively until every node (or most nodes) has accumulated a minimum number of tests. For example, the minimum number of tests may be set at twenty. As the iteration proceeds, nodes with a probability of less than 0.45, for example, may be considered as bad, and nodes with a probability of greater than 0.55 may be considered as good. Of course, these thresholds may be adjusted, and such a procedure may continue until convergence, until certain values are reached, until users are satisfied, or until some other designated stopping point. Further, the results may be utilized in the identification of error-prone nodes or bad nodes.
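The initialization, thresholds, and stopping condition described above might be sketched as follows. The function names, the "undecided" label for nodes between the two thresholds, and the `required_fraction` parameter (standing in for "every or most nodes") are hypothetical additions made for illustration.

```python
from typing import Dict, Iterable, Tuple

Counts = Tuple[float, float]  # (Sx, Tx)


def initial_tallies(nodes: Iterable[str]) -> Dict[str, Counts]:
    """Start every node at 0.5 successes and one test, so that Px = 0.5."""
    return {node: (0.5, 1.0) for node in nodes}


def classify(sx: float, tx: float,
             bad_below: float = 0.45, good_above: float = 0.55) -> str:
    """Label a node from its current estimate Px = Sx/Tx."""
    px = sx / tx
    if px < bad_below:
        return "bad"
    if px > good_above:
        return "good"
    return "undecided"  # hypothetical label for nodes between the thresholds


def enough_tests(tallies: Dict[str, Counts], min_tests: float = 20.0,
                 required_fraction: float = 1.0) -> bool:
    """True once the required fraction of nodes has accumulated min_tests tests."""
    done = sum(1 for _, tx in tallies.values() if tx >= min_tests)
    return done >= required_fraction * len(tallies)


tallies = initial_tallies(["n1", "n2", "n3"])
print({node: classify(sx, tx) for node, (sx, tx) in tallies.items()})
print(enough_tests(tallies))
```

Lowering `required_fraction` below 1.0 corresponds to stopping once most, rather than all, nodes have reached the minimum number of tests.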
Specifically, the illustrated computer system 400 comprises a computer and a hard drive which are represented by blocks 410 and 420 respectively. The computer system 400 may also comprise various other modules, as illustrated in
Other modules represented in
While the invention may be susceptible to various modifications and alternative forms, specific embodiments have been shown by way of example in the drawings and will be described in detail herein. However, it should be understood that the invention is not intended to be limited to the particular forms disclosed. Rather, the invention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the invention as defined by the following appended claims.