The present invention, in at least some embodiments, is of an advanced system and method for continuous software testing, and in particular, for testing in a system of continuous code delivery.
In order for software applications to be delivered quickly, with new features on rapid release, new systems for coding and software release have been developed. Termed “CI/CD” for “continuous integration and continuous delivery”, these systems enable new features to be rolled out rapidly for consumption by users.
Unfortunately, such rapid release has put a significant strain on existing systems for code development, particularly for code testing. Previously, systems for code delivery were set up for a new code release once or twice per year, with static delivery dates that were fixed well in advance. This rigid scheduling made it easy to schedule tests and to have sufficient time for all tests to be performed before the code was released.
CI/CD does not integrate well with such rigid test scheduling as it requires dynamic analysis of code changes and test coverage. Existing code quality management systems are all built for an earlier age of rigid scheduling and cannot easily be adjusted to the new requirements of dynamic testing and release.
To further increase the complexity of what must be tested, dynamic code and test analysis is now also required. Previously, static code analysis analyzed a complete set of code, which was only changed once or twice per year. Dynamic code analysis is better suited to the new rapid release format, but has only recently been developed and is not fully implemented in all quality management systems. Analysis and management of the results of both static and dynamic code analysis lag far behind the new needs of CI/CD systems.
In addition, as DevOps and Agile methods are emerging, developers are building automated tests, that is, automated code written to test code. The number of automated tests per application is increasing dramatically, resulting in dozens and even hundreds of thousands of automated tests running for each build.
Combining all of these factors (the high speed of releases, the high number and high frequency of releases, and the growing number of tests) makes it impossible to control and understand the readiness of each and every build, that is, whether a given build is ready for production deployment.
For example, U.S. Pat. Nos. 8,473,907 and 7,966,346 both relate to static code analysis performed to understand dependencies and component mapping. Attempts to determine which tests are more important are described for example in U.S. Pat. No. 9,075,914, which describes a system to run Selenium tests after checking all possible user paths through software, and then determining which ones are different and which ones are more important.
US20150007140 analyzes the code for test prioritization in order to determine the order in which tests should be run.
The background art does not teach or suggest a system or method for constructing testing systems around the needs of CI/CD systems.
The background art also does not teach or suggest a system or method that is suitable for continuous deployment and release systems for software.
The background art also does not teach or suggest a system or method for selecting one or more tests according to test impact, for example as determined with regard to test history and/or test timing.
By “test history” it is meant which tests were executed at which time, for example in order to determine which tests are relevant for coverage of particular code.
By contrast, the present invention, in at least some embodiments, relates to a system and method for CI/CT/CD (continuous integration/continuous testing/continuous delivery), in which testing is fully integrated with the needs of rapid code development and delivery. Such a system needs to be capable of continuous testing with seamless integration to the CI/CD system, and be able to raise a flag if code quality is reduced, or if testing fails to determine code quality. For example, if there is a hole in test coverage, then code may not be adequately tested and hence code quality may not be correctly determined. Quality analysis is therefore an important aspect of testing code, to determine whether it is ready to release. Such quality analysis optionally and preferably is able to determine build quality, more preferably before release of a build.
The present invention further overcomes the drawbacks of the background art by providing a system and method for selecting one or more tests according to test impact, for example as determined with regard to test history and/or test timing.
The present invention, in at least some embodiments, further relates to a system and method for determining build quality for a plurality of tests being performed on each build of code across at least a plurality of environments or even every environment. Each environment may also be described as a test stage with a plurality of tests to be performed at each stage. Each stage may have its own quality measure determined, in terms of build quality, which then preferably leads to a measure of build quality for the test stages. Tests are executed on the application under test. Tests are preferably performed on the completed compiled build.
By “build quality” it is meant that the quality of a build includes one or more of the following: detection of at least one change in the code from a previous build to a current build and analysis of at least one test in one test environment to determine whether such change has been tested; assessment of at least one previously performed test; assessment of at least one test coverage; assessment of at least one test coverage hole; or a combination thereof.
By “test coverage hole” it is meant a determination that at least a portion of the code that has been modified, has not adequately been tested by test(s) that have been run, which may optionally include but is not limited to zero test coverage, in which no test that has been run tests that portion of the code.
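As a non-limiting illustration only, the determination of a test coverage hole may be sketched as a set difference between the modified portions of the code and the portions exercised by the tests that were run. The function name find_coverage_holes, the method identifiers and the footprint dictionary are hypothetical and are assumed here solely for the purpose of the example.

```python
from typing import Dict, Set


def find_coverage_holes(modified: Set[str],
                        footprints: Dict[str, Set[str]]) -> Set[str]:
    """Return the modified methods that no executed test has exercised.

    `footprints` maps each executed test (hypothetical names) to the set
    of methods it exercised; an empty result means no hole was detected.
    """
    covered = set().union(*footprints.values())
    return modified - covered


# Hypothetical data: three modified methods, two executed tests.
modified_methods = {"Cart.add", "Cart.total", "Auth.login"}
test_footprints = {
    "test_add_item": {"Cart.add", "Cart.validate"},
    "test_login": {"Auth.login"},
}
holes = find_coverage_holes(modified_methods, test_footprints)
```

In this sketch, "Cart.total" was modified but is not in any footprint, so it is reported as a hole; when no tests have been run at all, every modified method is a hole (zero test coverage).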
“Test coverage” may optionally be considered according to test environment or test stage, optionally and preferably in a two-step process, in which test coverage is first determined across all environments (or at least a plurality of environments) to avoid repeating footprints, and is then determined according to a specific environment. Optionally and alternatively, test coverage may be determined first per environment and then on the build level, optionally then leading to determination of complete test coverage. Test coverage is optionally determined according to environmental footprint, which relates to the build in a particular environment.
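By way of a hedged example, coverage per environment and at the build level may be computed as follows, assuming that each environmental footprint is available as a set of exercised method identifiers. The names coverage_ratio and build_coverage, and the environment labels, are assumptions introduced only for this sketch.

```python
from typing import Dict, Set, Tuple


def coverage_ratio(covered: Set[str], all_methods: Set[str]) -> float:
    """Fraction of the build's methods found in a footprint."""
    if not all_methods:
        return 1.0
    return len(covered & all_methods) / len(all_methods)


def build_coverage(env_footprints: Dict[str, Set[str]],
                   all_methods: Set[str]) -> Tuple[Dict[str, float], float]:
    """Return (coverage per environment, coverage at the build level).

    The build-level value merges footprints with a set union, so a method
    exercised in several environments is counted only once.
    """
    per_env = {env: coverage_ratio(fp, all_methods)
               for env, fp in env_footprints.items()}
    merged = set().union(*env_footprints.values())
    return per_env, coverage_ratio(merged, all_methods)


methods = {"a", "b", "c", "d"}
envs = {"unit": {"a", "b"}, "integration": {"b", "c"}}
per_env, overall = build_coverage(envs, methods)
```

Here each environment covers half of the build, while the merged build-level coverage is three quarters, because method "b" is exercised in both environments and is not counted twice.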
Tests may optionally be automatically selected to provide improved test coverage, for example according to one or more of changes in the code; run-time environment; previous failed tests or test coverage holes; or other priority components (such as a user request, dependency or connection to a particular code area or code family).
Test priority may optionally be determined according to the above parameters, in which the order in which tests are to be performed may optionally be determined according to test priority, such that tests which will provide more information and/or more important information are performed first. Alternatively, only certain selected tests may optionally be run at any given time, since in a continuous delivery environment, the need to release a new build may outweigh the need to run all possible tests before release.
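A minimal sketch of running only certain selected tests, assuming that each test already has a priority value and an estimated duration, is a greedy selection within a time budget. The Candidate class, the pick_within_budget function and the greedy strategy itself are assumptions made for illustration; other selection strategies may equally be used.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    priority: float    # higher means more informative (assumed scale)
    duration_s: float  # estimated run time in seconds


def pick_within_budget(candidates: List[Candidate],
                       budget_s: float) -> List[str]:
    """Greedily take tests in descending priority while they still fit."""
    chosen = []
    remaining = budget_s
    for cand in sorted(candidates, key=lambda c: c.priority, reverse=True):
        if cand.duration_s <= remaining:
            chosen.append(cand.name)
            remaining -= cand.duration_s
    return chosen


candidates = [
    Candidate("test_checkout", 0.9, 60),
    Candidate("test_search", 0.8, 50),
    Candidate("test_profile", 0.5, 30),
]
selected = pick_within_budget(candidates, budget_s=100)
```

With a budget of 100 seconds, the highest priority test is taken first, the second no longer fits, and the third is taken in its place.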
According to at least some embodiments, the system and method as described herein may optionally be applied to a Continuous Testing paradigm, in which a build-test-release cycle is preferably implemented for code construction and implementation in a working environment. The “build” part of the cycle may optionally relate to relatively small or at least incremental differences in the code, rather than large changes to the code. In this paradigm, the system and method are preferably able to detect code changes between builds. At least partially according to these code changes, test priority and test selection are performed, so as to select matching tests and priority for running these tests.
According to at least some embodiments, test priority and selection may optionally be performed according to a variety of analytical tools, including but not limited to a calculation based on historical test status and build content (binaries and configuration files), as well as user input, environmental changes and realtime calculations; and realtime dynamic test priority calculation based on realtime coverage data collection, optionally including modifying the priority list on the fly.
For greater efficacy, optionally and preferably selected tests are automatically run across different environments and testing tools.
In order to assist users in determining the results of the tests and in selecting further actions to be performed, optionally and preferably the build quality is collected, and is more preferably displayed to the user. Such build quality information optionally includes but is not limited to one or more of test status, coverage, quality holes, trends, timing, or a combination thereof.
A build quality dashboard may optionally be implemented to show an aggregated display of all quality metrics, optionally including the previously described build quality information. To assist the user in understanding the meaning of the build quality, preferably a build quality analysis is performed, which optionally and preferably includes calculating a build scorecard. The scorecard preferably includes various metrics to show the quality of a build. Optionally and preferably, a rule based engine may be used to determine build readiness for production deployment; such an engine may also optionally calculate the metrics for the scorecard for the user. The rule based engine may also optionally and preferably calculate coverage on a distributed application.
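As a non-limiting sketch, a rule based engine for build readiness may be expressed as a list of named predicates evaluated against the collected build metrics. The rule names, the metric keys and the 80% threshold below are hypothetical values chosen for the example and are not prescribed by the embodiments described herein.

```python
from typing import Callable, Dict, List, Tuple

Rule = Tuple[str, Callable[[Dict[str, float]], bool]]

# Hypothetical rules; a deployment would configure its own.
RULES: List[Rule] = [
    ("no failed tests", lambda m: m["failed_tests"] == 0),
    ("modified code coverage >= 80%", lambda m: m["modified_coverage"] >= 0.8),
    ("no open quality holes", lambda m: m["quality_holes"] == 0),
]


def evaluate_build(metrics: Dict[str, float],
                   rules: List[Rule] = RULES) -> Dict[str, object]:
    """Return a small scorecard: readiness plus the rules that failed."""
    failed = [name for name, check in rules if not check(metrics)]
    return {"ready": not failed, "failed_rules": failed}


scorecard = evaluate_build(
    {"failed_tests": 0, "modified_coverage": 0.65, "quality_holes": 2}
)
```

The build in this example is not ready, and the scorecard lists the two rules that were not satisfied, which may then be displayed to the user.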
Some non-limiting examples of build quality metrics include quality analytics such as automated test maintenance analysis, to detect irrelevant tests, redundant or never failing tests, which may optionally be eliminated. Other quality analytics optionally and preferably include the detection of problematic code areas, that is, code that is uniquely or frequently associated with failing tests. Other quality analytics optionally and preferably include the detection of duplicate and similar tests; comparing footprints of production and QA execution to highlight relevant and irrelevant tests, and improve priority. Other quality metrics may optionally include detecting failed tests, to filter out failed test coverage from coverage calculation; and the automatic detection of quality holes for automatically identifying when a quality hole is closed or covered.
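One possible way to detect duplicate and similar tests, offered here only as an assumption since no particular similarity measure is specified above, is to compare test footprints with the Jaccard index. The function names and the 0.9 threshold are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def similar_tests(footprints: Dict[str, Set[str]],
                  threshold: float = 0.9) -> List[Tuple[str, str]]:
    """Return pairs of tests whose footprints overlap at least `threshold`."""
    pairs = []
    for first, second in combinations(sorted(footprints), 2):
        if jaccard(footprints[first], footprints[second]) >= threshold:
            pairs.append((first, second))
    return pairs


footprints = {
    "test_a": {"f", "g"},
    "test_b": {"f", "g"},
    "test_c": {"h"},
}
duplicates = similar_tests(footprints)
```

Tests reported as a pair exercise nearly the same code and may be candidates for consolidation, subject to review, since identical footprints do not by themselves establish that two tests assert the same behavior.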
Other non-limiting examples of build quality analytics may optionally be determined on a build level per application component and may optionally include performing a build compare of all the quality parameters. For example, the analytics may optionally include determining the test footprint diff between environments and between builds, and the test content change detection (to be able to detect when a test has changed and to update the priority list).
According to at least some embodiments, there is provided the ability to collect footprint from a multi-tier application with automatic detection of the servers and services in each environment and application under test (with no manual configuration of the servers under test). In such an application, each tier may have its own server such that multiple servers may need to be considered in a single environment and/or several services may share the same server. Therefore it is necessary to determine which servers are related to which environment to determine the footprint, for example by determining which server(s) are performing operations for a particular environment or through a pre-determined manual mapping. Optionally and preferably automatic detection is performed by analyzing messages regarding the tests and in particular, which server(s) or service(s) report that a particular test is being performed and/or which listener determines that a particular test is being executed on a particular server. The combined information allows for servers to be automatically mapped to environments.
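The automatic mapping of servers to environments may be sketched, under stated assumptions, by joining two message streams on the test identifier: messages stating in which environment a test is run, and listener messages stating on which server the test was observed executing. The message shapes and the function name below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def map_servers_to_environments(
        test_events: List[Tuple[str, str]],
        listener_reports: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Join (test, environment) events with (test, server) reports.

    Reports for tests with no known environment are ignored.
    """
    env_of_test = dict(test_events)
    mapping = defaultdict(set)
    for test_id, server in listener_reports:
        env = env_of_test.get(test_id)
        if env is not None:
            mapping[env].add(server)
    return dict(mapping)


events = [("t1", "staging"), ("t2", "qa")]
reports = [("t1", "web-01"), ("t1", "db-01"), ("t2", "web-02"),
           ("t9", "web-03")]
servers_by_env = map_servers_to_environments(events, reports)
```

In this example the staging environment is found to comprise two servers without any manual configuration, while the report for the unknown test "t9" does not contribute to the mapping.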
According to at least some embodiments for testing an integration build, in which a single such integration build features a plurality of components of different versions, special testing processes are preferably performed to determine coverage, quality holes and so forth for the integration build. In particular, for an integration build, tests are performed to determine the quality of the integration between the components.
According to at least some embodiments, there is provided a method for determining test triage, to determine which methods are suspected as causing the test to fail. Such a method may also optionally include showing suspected methods/functions for failed tests, based on build-diff analysis and matching with test footprints in distributed test environments. Test triage optionally and preferably involves detecting the cause or root source of the failure, such as of failed tests, for example according to a specific change in the code. The change in the code would be identified as a potential cause for the failure of the test(s).
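As an illustrative sketch only, suspected methods for a failed test may be obtained by intersecting the methods changed between builds (the build-diff) with the footprint of that test. The function name suspect_methods and the sample identifiers are assumptions made for the example.

```python
from typing import Dict, List, Set


def suspect_methods(changed: Set[str],
                    footprints: Dict[str, Set[str]],
                    failed_tests: List[str]) -> Dict[str, List[str]]:
    """For each failed test, list changed methods that the test exercised."""
    return {test: sorted(changed & footprints.get(test, set()))
            for test in failed_tests}


changed_in_build = {"Cart.total", "Auth.login"}
footprints = {
    "test_checkout": {"Cart.add", "Cart.total"},
    "test_search": {"Search.query"},
}
suspects = suspect_methods(changed_in_build, footprints,
                           ["test_checkout", "test_search"])
```

A failed test with an empty list of suspects, such as "test_search" here, exercised no changed code, which may indicate that the test itself or its environment is at fault rather than the code change.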
According to at least some embodiments, there is provided a method for automatic build discovery, optionally and preferably including an ability to automatically detect a build version of running components in a distributed testing environment.
According to at least some embodiments, there is provided a system and method for statistically determining a relative importance of running a particular test within a plurality of such tests. The solution may be based on mapping of a test method to a list of methods/functions which are executed upon execution of a test, or overall execution of a test itself. Preferably, once the relative importance of running a particular test has been determined, it is possible to determine whether a particular test should be run at a particular time, or indeed whether it should be run at all. The relative importance may be determined for example according to a characteristic of the code, a characteristic of the test, a characteristic of who or what caused the code to change, when the code was changed and the context of other changes to the code, or a combination thereof. A system or method comprising such a solution may therefore comprise a test selection system for selecting one or more tests, according to the importance of the test, the importance of the code to be tested, or a combination thereof. A plurality of such tests may be referred to as a “suite of tests”. The code to be tested may be described as “program code”.
For example in relation to an importance of the code, characteristics include but are not limited to code usage in execution by the end user, interactions between a section of the code and other sections (with increased interactions relating to increased importance), an importance of the code overall for execution in production (that is, being “mission critical”), code with many changes or new commits, otherwise modified code, and so forth. Optionally such importance may be determined according to build scanning of the binary files and commit history as described herein, and/or other methods.
In relation to an importance of the test, characteristics include but are not limited to tests that apply to modified or added code, or code that is deemed important, tests that have failed previously, new and modified tests, and so forth. The relationship between a test and the code that is tested may be determined statistically, for example. As a non-limiting example, the statistical relationship may relate to a test execution time frame relative to execution time frame of a particular section of code.
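A simple, non-limiting way to estimate such a statistical relationship from time frames is to count how often a section of code is observed executing within the time window of a given test, relative to all observations of that code. The data shapes, the function name and the use of a plain proportion as the score are assumptions for this sketch.

```python
from collections import Counter
from typing import Dict, List, Tuple


def correlate_by_time(
        test_windows: Dict[str, Tuple[float, float]],
        method_hits: List[Tuple[str, float]]) -> Dict[str, Dict[str, float]]:
    """Score each (test, method) pair by the share of the method's
    observed executions that fall inside the test's time window."""
    total = Counter(method for method, _ in method_hits)
    scores = {}
    for test, (start, end) in test_windows.items():
        inside = Counter(m for m, ts in method_hits if start <= ts <= end)
        scores[test] = {m: inside[m] / total[m] for m in inside}
    return scores


windows = {"t1": (0.0, 10.0), "t2": (20.0, 30.0)}
hits = [("f", 1.0), ("f", 5.0), ("f", 25.0), ("g", 22.0)]
scores = correlate_by_time(windows, hits)
```

Scores near one suggest that the code runs mostly while that test is executing; scores accumulated over many builds would give a more reliable estimate than a single run.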
According to at least some embodiments there is provided a system, comprising: one or more computing devices configured to implement a test selection system, wherein the test selection system is configured to:
As previously described, such mapping may comprise mapping of a test method to a list of methods/functions which are executed upon execution of a test, or overall execution of a test itself. The mapping may also comprise code that has changed between a plurality of sets of compared program code, and/or data indicative of modified or new portions of code. For example, a first set of code may be compared to a second set of code as described herein. The subset of tests that are likely to exercise the second set of code—that is, to touch, test or otherwise cause to execute the second set of code—may then be selected.
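By way of example only, selecting the subset of tests may be sketched as follows, assuming that each set of program code is represented as a dictionary from method identifier to a hash of the method body, and that the mapping holds the footprint of each test on the first set of code. These representations and the function names are hypothetical.

```python
from typing import Dict, List, Set


def changed_portions(first: Dict[str, str],
                     second: Dict[str, str]) -> Set[str]:
    """Methods of the second set that are new or whose hash differs."""
    return {m for m, digest in second.items() if first.get(m) != digest}


def select_tests(mapping: Dict[str, Set[str]],
                 first: Dict[str, str],
                 second: Dict[str, str]) -> List[str]:
    """Tests whose recorded footprint touches a changed portion.

    Newly added methods have no footprint yet, so selecting tests for
    them needs additional information not modeled in this sketch.
    """
    changed = changed_portions(first, second)
    return sorted(t for t, fp in mapping.items() if fp & changed)


first_code = {"Cart.add": "h1", "Cart.total": "h2", "Auth.login": "h3"}
second_code = {"Cart.add": "h1", "Cart.total": "h9", "Auth.login": "h3",
               "Cart.clear": "h4"}
mapping = {
    "test_total": {"Cart.add", "Cart.total"},
    "test_login": {"Auth.login"},
}
subset = select_tests(mapping, first_code, second_code)
```

Only the test whose footprint includes the modified method is selected; the test of unchanged code is left out of the subset.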
An order for executing the subset of tests (that is, the selected tests) may also be determined, for example according to importance of the test, importance of the code to be tested, or a combination thereof.
Optionally, the test selection system is further configured to: determine an ordered sequence for the subset of the tests, wherein the ordered sequence is determined based at least in part on an estimated likelihood of failure for the tests. For example, the importance of the test may relate to whether the code is likely to fail the test—or whether the test itself is likely to fail. In the former situation, the test may be deemed to be more important and so appear earlier in the ordered sequence; for the latter situation, the test may be deemed to be less important.
Optionally, the test selection system is further configured to: determine an ordered sequence for the subset of the tests, wherein the ordered sequence is determined based at least in part on an estimated execution time for the tests.
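One hedged way to combine an estimated likelihood of failure with an estimated execution time into a single ordered sequence is to sort by failure likelihood per second of run time, so that quick tests that are likely to reveal a failure run first. This ratio is an assumption chosen for illustration; the two factors may be weighted differently or used separately.

```python
from typing import Dict, List, Tuple


def order_tests(estimates: Dict[str, Tuple[float, float]]) -> List[str]:
    """Order tests by (failure probability / estimated seconds), descending.

    `estimates` maps test name to (failure probability, seconds); the
    seconds value is assumed to be strictly positive.
    """
    return sorted(estimates,
                  key=lambda name: estimates[name][0] / estimates[name][1],
                  reverse=True)


estimates = {
    "slow_risky": (0.5, 100.0),
    "fast_risky": (0.4, 10.0),
    "fast_safe": (0.01, 5.0),
}
sequence = order_tests(estimates)
```

The fast and fairly risky test is placed first, since it is expected to give the most failure information per unit of execution time.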
Optionally, the test selection system is further configured to: determine, based at least in part on the data indicative of code coverage, that a particular test may not exercise the first set of program code; and move the particular test from the suite of tests to a suite of deprecated tests, wherein the suite of deprecated tests is excluded from consideration for the subset of the tests that are likely to exercise the second set of program code. By “deprecated” it may mean tests that are less important and/or that are not selected for testing the second set of program code. A particular test may be deprecated according to a statistical likelihood of exercising or testing the second set of program code.
According to at least some embodiments, there is provided a computer-implemented method, comprising: receiving a mapping of a suite of tests to a first set of program code, wherein data indicative of code coverage is generated using execution of the tests on the first set of program code, wherein the mapping is determined based at least in part on the data indicative of code coverage, and wherein the mapping comprises data indicative of one or more portions of the first set of program code exercised by respective tests from the suite; and determining a subset of the tests that are likely to exercise a second set of program code that comprises one or more newly added portions, based at least in part on:
Optionally, the method further comprises: determining an ordered sequence for the subset of the tests, wherein the ordered sequence is determined based at least in part on an estimated likelihood of failure for the tests. Optionally, the method further comprises: determining an ordered sequence for the subset of the tests, wherein the ordered sequence is determined based at least in part on an estimated execution time for the tests. Optionally, the method further comprises: determining, based at least in part on the data indicative of code coverage, that a particular test does not exercise the first set of program code; and moving the particular test from the suite of tests to a suite of deprecated tests, wherein the suite of deprecated tests is excluded from consideration for the subset of the tests that are likely to exercise the second set of program code.
Optionally, the method further comprises: determining, based at least in part on a test execution history, that a particular test is infrequently executed; and moving the particular test from the suite of tests to a suite of deprecated tests, wherein the suite of deprecated tests is excluded from consideration for the subset of the tests that are likely to exercise the second set of program code. Alternatively, an infrequently executed test may be marked as more important, so that it is actually executed.
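As a non-limiting sketch, moving tests to a suite of deprecated tests may be implemented by partitioning the suite according to the coverage mapping and the execution history. The function name, the min_runs parameter and the data shapes are hypothetical; this sketch implements only the deprecation option, not the alternative of marking infrequently executed tests as more important.

```python
from typing import Dict, List, Set, Tuple


def split_deprecated(suite: List[str],
                     mapping: Dict[str, Set[str]],
                     run_counts: Dict[str, int],
                     min_runs: int = 1) -> Tuple[List[str], List[str]]:
    """Partition the suite into (active, deprecated) tests.

    A test is deprecated when its footprint is empty or missing, or when
    it was executed fewer than `min_runs` times in the recorded history.
    """
    active, deprecated = [], []
    for test in suite:
        if not mapping.get(test) or run_counts.get(test, 0) < min_runs:
            deprecated.append(test)
        else:
            active.append(test)
    return active, deprecated


suite = ["test_a", "test_b", "test_c"]
mapping = {"test_a": {"f"}, "test_b": set(), "test_c": {"g"}}
history = {"test_a": 12, "test_b": 12, "test_c": 0}
active, deprecated = split_deprecated(suite, mapping, history)
```

Only the active list would then be considered when determining the subset of tests likely to exercise the second set of program code.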
Optionally, the method further comprises: receiving a new test, wherein the new test is associated with the second set of program code; and adding the new test to the subset of the tests that are likely to exercise the second set of program code. The association may be determined manually (for example, by a developer or quality assurance tester) and/or it may be determined automatically, by analyzing the test and determining the code that it is likely to execute.
Optionally the second set of program code comprises a newly added portion, wherein the subset of the tests comprises one or more tests that are likely to exercise the newly added portion, and wherein the one or more tests are determined based at least in part on a machine learning model that analyzes the similarity of the newly added portion to the one or more portions of the first set of program code represented in the mapping.
Optionally the subset of the tests is determined based at least in part on user input representing a selection of tests for prior versions of the second set of program code.
Optionally, the method further comprises: executing the subset of the tests on the second set of program code.
According to at least some embodiments, there is provided a computer-readable storage medium storing program instructions computer-executable to perform:
Optionally the program instructions are further computer-executable to perform:
Optionally the program instructions are further computer-executable to perform:
Optionally the program instructions are further computer-executable to perform:
Optionally the program instructions are further computer-executable to perform:
Optionally in determining the subset of the tests that are likely to exercise the second version of the program code, the program instructions are further computer-executable to perform:
Optionally the subset of the tests are executed using a continuous integration system that tests the second version of the program code.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The materials, methods, and examples provided herein are illustrative only and not intended to be limiting.
Implementation of the method and system of the present invention involves performing or completing certain selected tasks or steps manually, automatically, or a combination thereof. Moreover, according to actual instrumentation and equipment of preferred embodiments of the method and system of the present invention, several selected steps could be implemented by hardware or by software on any operating system of any firmware or a combination thereof. For example, as hardware, selected steps of the invention could be implemented as a chip or a circuit. As software, selected steps of the invention could be implemented as a plurality of software instructions being executed by a computer using any suitable operating system. In any case, selected steps of the method and system of the invention could be described as being performed by a data processor, such as a computing platform for executing a plurality of instructions.
An algorithm as described herein may refer to any series of functions, steps, one or more methods or one or more processes, for example for performing data analysis.
Implementation of the apparatuses, devices, methods and systems of the present disclosure involves performing or completing certain selected tasks or steps manually, automatically, or a combination thereof. Specifically, several selected steps can be implemented by hardware, by software on an operating system, by firmware, and/or by a combination thereof. For example, as hardware, selected steps of at least some embodiments of the disclosure can be implemented as a chip or circuit (e.g., ASIC). As software, selected steps of at least some embodiments of the disclosure can be implemented as a number of software instructions being executed by a computer (e.g., a processor of the computer) using an operating system. In any case, selected steps of methods of at least some embodiments of the disclosure can be described as being performed by a processor, such as a computing platform for executing a plurality of instructions.
Software (e.g., an application, computer instructions) which is configured to perform (or cause to be performed) certain functionality may also be referred to as a “module” for performing that functionality, and also may be referred to as a “processor” for performing such functionality. Thus, a processor, according to some embodiments, may be a hardware component, or, according to some embodiments, a software component.
Further to this end, in some embodiments: a processor may also be referred to as a module; in some embodiments, a processor may comprise one or more modules; in some embodiments, a module may comprise computer instructions (which can be a set of instructions, an application, or software) which are operable on a computational device (e.g., a processor) to cause the computational device to conduct and/or achieve one or more specific functionalities.
Some embodiments are described with regard to a “computer,” a “computer network,” and/or a “computer operational on a computer network.” It is noted that any device featuring a processor (which may be referred to as “data processor”; “pre-processor” may also be referred to as “processor”) and the ability to execute one or more instructions may be described as a computer, a computational device, and a processor (e.g., see above), including but not limited to a personal computer (PC), a server, a cellular telephone, an IP telephone, a smart phone, a PDA (personal digital assistant), a thin client, a mobile communication device, a smart watch, head mounted display or other wearable that is able to communicate externally, a virtual or cloud based processor, a pager, and/or a similar device. Two or more of such devices in communication with each other may be a “computer network.”
Although the present invention is described with regard to a “computer”, it should be noted that optionally any device featuring a data processor and the ability to execute one or more instructions may be described as a computer, computing device, or mobile computing device, or user device including but not limited to any type of personal computer (PC), a server, a cellular telephone, an IP telephone, a smartphone, a PDA (personal digital assistant), or a pager. A server as used herein may refer to any of a single server, multiple servers, distributed servers or cloud computing environment. Any two or more of such devices in communication with each other may optionally comprise a “computer network”.
The invention is herein described, by way of example only, with reference to the accompanying drawings. With specific reference now to the drawings in detail, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of the preferred embodiments of the present invention only, and are presented in order to provide what is believed to be the most useful and readily understood description of the principles and conceptual aspects of the invention. In this regard, no attempt is made to show structural details of the invention in more detail than is necessary for a fundamental understanding of the invention, the description taken with the drawings making apparent to those skilled in the art how the several forms of the invention may be embodied in practice. In the drawings:
Test impact analytics are an important aspect of testing code. Simply running all available tests may take a great deal of time and slow down development. On the other hand, failing to run an important test may cause code to be inadequately tested, and even lead to failures in production or execution.
New defects in the system are mainly created by code changes. Therefore, it is preferable that areas of code that were modified since the last test execution are tested. The tests that are run on code that was recently modified have a much higher probability of failure.
On the other hand, tests that have been previously run on code that was not modified should not suddenly start failing. If such a test is failing without a reason, it may be defective and so this test result would not be trusted.
However, if a test fails, the most probable cause of the failure is the code change that was made prior to the occurrence of the failure.
Given the above, when a full regression cycle is performed for every code change, many tests may be run that have a very low probability of failure. As noted above, this may cause the delivery pipeline to be very slow and the feedback to the developers to be greatly delayed. In addition, the infrastructure cost of running a full regression is very high, and the waiting queue for a test environment may then expand to a catastrophic level. In such a situation, test jobs sometimes need to be canceled urgently in order to push some important fixes or changes earlier. However, canceling jobs without prior analysis will not lead to the most efficient outcome, such as canceling only unnecessary tests.
Some manual test operations are built on the practice of having a test architect select the required tests to execute for a given build in order to optimize the test execution time. This selection is based on the knowledge of what the tests are doing and what the code change is, and how the code change may impact specific tests. Through such knowledge, the tests that are selected are hopefully the ones with a higher probability to fail. However, this is a manual selection done based on an individual's knowledge and subject to human mistakes. Such a manual selection inherently lacks visibility of how the changes are actually impacting the tests.
Preferably, instead of relying on fallible human selection, automatic and scientific methods are performed for test impact analytics, and for selecting important code to test and/or the tests themselves. Various methods may be used for the determination of code importance and/or test impact analytics. These test impact analytics consider the impact of the code change on tests and/or of the test on the code, in order to build a list of tests to be performed. Optionally the list is built according to an order of relative test importance, which may in turn be determined according to an importance of the test, an importance of the code to be tested or a combination thereof. For example, tests may be preferentially implemented on code that the end user is actively using, or causing to execute.
A user may also implement a manual rule or condition for running one or more tests. Additionally or alternatively, a user may also determine a degree of sensitivity vs accuracy. For example, the user may prefer that analysis of a relationship between code components and tests tends to favor running a test rather than not running the test. On the other hand, the user may prefer to only run tests that are clearly related to, or associated with, particular code.
All software modules and/or functional processes described herein are assumed to be run by a computational device or a plurality of such devices, even if not explicitly shown.
Turning now to the drawings, there is shown an exemplary system in
Customer premises 102 optionally and preferably features a user computer 104 with a web interface 106, which allows the user to control the various components of the system, including with regard to deploying builds, determining whether a build has been properly tested, analyzing test results, and also running a dashboard for various parameters for these results.
In addition, Customer Premises 102 features a Customer Build Server 108 operating with a Build Listener 110. Customer Build Server 108 may also optionally be referred to as a CI (continuous integration) server. Customer Build Server 108 is present at Customer Premises 102 to be able to build or determine the various builds, and once the builds have actually been created, to be able to deploy them to the test environment. Build Listener 110 monitors Customer Build Server 108, and determines when a new build has been created. Build Listener 110 then determines any differences from a previous build, optionally and preferably ignoring any extraneous information, such as comments on the code.
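The comparison performed by the Build Listener can be illustrated with a minimal sketch. It assumes, purely for illustration, that each build is available as a mapping from method name to source text and that comments use `//` or `#` line syntax; the names `strip_comments`, `method_hash` and `diff_builds` are hypothetical and are not part of the described system.

```python
import hashlib
import re

def strip_comments(source: str) -> str:
    """Drop // and # line comments and collapse whitespace.
    Simplified: does not handle comment markers inside string literals."""
    no_comments = re.sub(r"(//|#).*", "", source)
    return " ".join(no_comments.split())

def method_hash(source: str) -> str:
    """Hash the comment-free text so that comment-only edits do not count as changes."""
    return hashlib.sha256(strip_comments(source).encode("utf-8")).hexdigest()

def diff_builds(previous: dict, current: dict) -> dict:
    """Return the added, deleted and modified method names between two builds."""
    prev_hashes = {name: method_hash(src) for name, src in previous.items()}
    curr_hashes = {name: method_hash(src) for name, src in current.items()}
    return {
        "added": sorted(set(curr_hashes) - set(prev_hashes)),
        "deleted": sorted(set(prev_hashes) - set(curr_hashes)),
        "modified": sorted(
            name for name in curr_hashes
            if name in prev_hashes and curr_hashes[name] != prev_hashes[name]
        ),
    }

old_build = {"pay": "return a + b  // sum", "refund": "return -a"}
new_build = {
    "pay": "return a + b  // add the amounts",  # comment-only change
    "refund": "return -abs(a)",                  # real change
    "audit": "log(a)",                           # new method
}
changes = diff_builds(old_build, new_build)
print(changes)
```

In this sketch the comment-only edit to `pay` is ignored, while `refund` is reported as modified and `audit` as added.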
It should be noted that for parts of the build that need to be tested, Customer Application Server 112 preferably operates a Test Listener 114. Test Listener 114 listens to the tests which are to be performed, and also determines which tests are being performed and which tests have already been performed, in order to determine whether sufficient test coverage for the new build has been provided. Test listener 114 also determines which tests cover which parts of the build.
A Customer Test Server 116 then proceeds to run the test, assisted by a Test Runner 118 that collects the test data. The information provided by operating the tests, and from determining the test coverage and how the tests have been operated, is provided through Internet 120 to Internet Gateway 124, and hence to Cloud Application 122. Cloud Application 122 runs on a server, which as used herein may refer to a distributed server, virtual server or cloud computing environment (not shown), or a combination of these. This information is stored in Storage 142, from which Analysis Engine 120 is able to withdraw information and perform various analyses. Analysis Engine 120 performs analysis as information comes in, preferably in combination with Storage 142 to actually obtain the information for testing. The results from Analysis Engine 120 are then stored in Database 128 and Storage 142, which provide a record of build changes, test coverage, test events, quality holes, trends, reports and dashboard data.
Internet gateway 124 receives messages and requests from customer premises 102, including from web interface 106. Messages and requests from web interface 106 are preferably transmitted through internet gateway 124 to UI engine 126 for answers and bidirectional commands.
An optional exemplary method is provided according to some embodiments of the present invention for optionally operating through the system of
In stage 1, the build content is analyzed to determine what changes have been made since the last build, optionally and preferably ignoring comments. A stream of builds is preferably received through the server which is creating them, and as each build is received, it is analyzed in stage 1 for differences from the previous builds.
Next, test selection is performed in stage 2, optionally including test coverage calculation. Such a selection is preferably performed according to the build content, including any differences from previous builds, and also according to tests that have been previously run and optionally also those that are scheduled to run. Optionally the selection is performed according to quality analytics, for example, previous test coverage holes, and also according to environments in which the build is to be deployed. Optionally and preferably it is also performed across tools and environments.
Test coverage calculation may optionally form part of the test selection process, particularly in a continuous testing environment, in which for each build, optionally and preferably the following process is performed: analyze build, build quality analytics, select tests, run them, analyze results, determine build quality according to rules, and decide if the build can go to production.
The tests may optionally include one or more of unit tests, component tests, integration tests, and other types of tests, preferably also including test automation so that the tests are run automatically, after which the coverage analysis determines whether the build may be released. The order in which the tests are run and/or whether certain tests are run according to some type of selection may optionally be determined according to requirements for test coverage.
In stage 3, the test(s) are run and the results are collected. These include the results of the test, any test coverage holes, and optionally and preferably a determination of the tests which need to be performed. The results of the test are also collected in order to determine whether the build actually passes the test and whether sufficient coverage is provided in order for the determination to be made as to whether the build passed the test in stage 3. Optionally, the test(s) are run according to a determination of test priority, such that more urgent tests are run first or optionally even only urgent tests are run. Optionally the test selection and operating order are determined according to a test management framework, which may optionally receive the operating order and then cause the tests to occur according to that order. Optionally, according to the results, the test management framework can change the test priority as the tests are performed and may optionally also be able to change the test(s) being performed and even to stop the performance, according to a dynamic test priority selection.
In stage 4, the build quality is determined, preferably including automatically building a score for the build according to the baseline multi-metrics of the analytics. This determines whether the build may be deployed. If the build is of sufficient quality, then optionally and preferably it is deployed in stage 5.
BuildDiff Queue 130 receives a message that outlines the content of the build, and this information is then pulled by a build Queue Parser 136, which causes Analysis Engine 120 to retrieve information from Storage 142 regarding the content of the build. Footprint Queue 132 receives information regarding test coverage. This messaging causes the Footprint Queue Parser 138 to pull information from Storage 142 for further analysis. TestEvents Queue 134 receives information regarding tests that have been performed and the results. This causes TestEvents Queue Parser 140 to pull the information from storage 142 regarding the test results.
All this information is optionally and preferably fed into Database 128 and is then handled by Test Scoring Engine 144, as shown. Test Scoring Engine 144 then determines the level of the test, whether the code passes the test and whether the code has undergone sufficient testing. This information is then stored in Database 128. In addition, the information regarding whether the build passed, whether or not there is sufficient build coverage, and whether any test holes or problems still remain, is passed through UI Engine 126.
UI Engine 126 then connects back to Internet Gateway 124, and the information is passed back to the customer server or other user computational device (not shown, see
A further exemplary implementation of Cloud Application 122 is shown in
Analysis engine 120 optionally and preferably determines test quality coverage according to the test results. Optionally, test quality coverage is determined according to one or more of detection of at least one change in the code from a previous build to a current build and analysis of at least one test to determine whether such change has been tested; assessment of at least one previously failed test; assessment of at least one test coverage hole; or a combination thereof. As previously noted, a test coverage hole may optionally be determined in the case that at least a portion of the code has not adequately been tested by test(s) that have been run, which may optionally include but is not limited to zero test coverage, in which no test that has been run tests that portion of the code.
For example, in order to determine test coverage, optionally and preferably the number of methods tested is divided by the number of methods in each component to determine the percent coverage. Preferably code branches and code lines are also considered in the calculation of the percent coverage. This determination is optionally and preferably made per environment and also across all environments.
The test cycle of a build may also optionally be divided into test environments, and quality coverage may optionally be calculated as described above both per test environment and also across the entire build. Additionally and/or alternatively, unit-test results may also optionally be considered; unit-test results only provide coverage results across an entire build. The results of these tests may optionally show 100% coverage for each individual method within the build, as the tests may provide coverage such that each method is tested once, even if not all of the other test environments have 100% coverage.
According to at least some embodiments, a quality hole is flagged, indicating that a method that was modified has not been tested at all levels of integration, for example according to at least one unit-test, component test and functional test. At each higher level of integration, at least the number of tests that were run and the total coverage are considered.
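As a minimal sketch of this calculation, the percent coverage can be computed per environment and across all environments by dividing the number of tested methods by the number of methods in the component. The environment names, method names and the helper `percent_coverage` below are assumptions chosen for illustration; branch and line coverage are omitted for brevity.

```python
def percent_coverage(tested: set, all_methods: set) -> float:
    """Percent of the component's methods hit by at least one test."""
    if not all_methods:
        return 0.0
    return 100.0 * len(tested & all_methods) / len(all_methods)

# Hypothetical component with four methods.
component_methods = {"login", "logout", "reset_password", "audit"}

# Methods hit by the tests that ran in each (hypothetical) environment.
hits_by_environment = {
    "unit": {"login", "logout"},
    "integration": {"login", "reset_password"},
}

per_environment = {
    env: percent_coverage(hits, component_methods)
    for env, hits in hits_by_environment.items()
}
# Across all environments, a method counts as tested if any environment hit it.
across_all = percent_coverage(
    set().union(*hits_by_environment.values()), component_methods
)

print(per_environment)  # each environment covers 2 of 4 methods
print(across_all)       # 3 of 4 methods are covered somewhere
```

Here each environment alone reaches 50% coverage, while the union across environments reaches 75%, which is why the two figures are reported separately.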
It is also possible to skip a test, which may then optionally be noted separately, to show how many tests were skipped in general in terms of the coverage, and/or in comparison to what was checked in the previous build.
In addition, UI Engine 126 informs Internet Gateway and hence the customer servers (such as user computer 104) of the results. Internet Gateway 124 also connects to the UI Engine to receive any commands from the user at the Customer Premises 102.
In addition, builddiff queue parser 136 also preferably connects to priority service 180, trends service 182, coverage service 184, and quality holes service 186. Each such service may optionally contribute to test scoring and the determination of overall build quality coverage. If test priority is to be determined (for example whether to run a test and/or which tests to run first), the priority service 180 preferably determines the test priority.
Trends service 182 optionally and preferably determines the trends in test coverage and build quality coverage and may also optionally determine the trends in test results: for example, whether test coverage is increasing or decreasing over time; whether test coverage holes are being detected and covered by one or more tests; whether build quality is increasing or decreasing over time; and/or whether the necessary tests for achieving a certain level of test coverage are being performed.
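A trend determination of this kind can be sketched very simply by comparing the earliest and latest values of a metric over a series of builds. The function name `trend`, the tolerance parameter and the sample values are hypothetical; an actual trends service may use any other suitable measure.

```python
def trend(values, tolerance=0.0):
    """Classify a metric series (oldest first) as increasing, decreasing or flat."""
    if len(values) < 2:
        return "unknown"  # not enough builds to compare
    delta = values[-1] - values[0]
    if delta > tolerance:
        return "increasing"
    if delta < -tolerance:
        return "decreasing"
    return "flat"

coverage_history = [62.0, 64.5, 70.0]  # percent coverage over three builds
quality_holes_history = [5, 3, 3]      # open quality holes over three builds

print(trend(coverage_history))
print(trend(quality_holes_history))
```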
Coverage service 184 optionally and preferably determines the level of coverage while quality holes service 186 detects holes in test coverage. This information is preferably passed back to trends service 182 in order for trends to be determined. Each service determines what information is required, such as information about changes in the build which is obtained from BuildDiff, and any other required information, and uses such information to perform the calculations.
In stage 4, the tests are run, optionally and preferably according to coverage requirements (for example in order to increase test coverage quality) in each environment. Optionally tests are run according to priority, such that the highest priority tests are performed first, and the lowest priority tests are performed last. Optionally, for each test, or alternatively, only after certain tests have been done or only after all tests have been done, post-test test coverage is calculated for each environment, and also connected to the build coverage. This is because different tests may optionally be needed in different environments, and the build may also be different for each requirement.
The coverage trends are calculated in stage 6, including whether, in fact, additional sufficient coverage has been provided, or whether coverage holes still remain. The actual identity of the quality holes, for example a section or sections of code that were not tested, for example for a specific build and preferably over a plurality of environments or even all environments, is preferably determined in stage 7. In stage 8, the build scoring is calculated to determine whether or not the build passes, whether it has been tested, whether it has been shown to have sufficiently high quality, and whether there are sufficiently few quality holes in order to determine the actual coverage of the test, and hence the quality of the build.
Once build quality has been calculated, the system waits for a new build in stage 9, and then the process returns back to stage 1, once a new build has been detected. Optionally,
Then, in stage 4, looping through the list of methods for each component change in each environment, the following processes are optionally and preferably performed. In stage 5a, it is determined whether a method has been added. If so, then in stage 6a, all tests are collected for the environment that have the same method name in their footprints. Footprints are the locations where the tests touch or otherwise analyze the application under test. Each test is done through code and may optionally be performed in a plurality of test stages, optionally over a plurality of environments. Quality holes are then determined according to each section of code in each environment. Unique quality holes are sections of code which were not tested by any test in any environment. The tests may for example only enter a method and then exit it, or may also handle branching code or code related to a particular case. For the latter, the test may not have handled all of the branches or all of the cases, which is another type of quality hole. This process continues on to stage 7 for each test collected. An exemplary screenshot of a user interface dashboard such as interface 106 for receiving information from cloud application such as cloud application 122 concerning build coverage and quality holes is shown in
It is determined in stage 8 whether or not to add an L1 score to the test-scoring for the impacted test, to indicate that this test needs to be run again to increase test coverage quality and/or to increase the priority for running that test. Such an indication may optionally be determined independently of the test score, for example according to such considerations as test coverage and whether a test was skipped. The method then returns to stage 4 as shown.
Returning to stage 5a, if a method was not added, then the build is assessed in stage 5b to determine whether a method was deleted or modified. If not, the process loops back to stage 4. If a method was modified or deleted, then in stage 6b, all tests are collected for each environment that had a footprint in the same method. Again, this option is preferably performed in order to determine overlap or other processes that are occurring across different environments. The method then again returns to stage 7 as previously described.
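The loop over changed methods and environments described above can be sketched as follows. The data layout (a mapping from environment to test name to the set of methods in that test's footprint) and the function name `impacted_tests` are assumptions made for illustration only.

```python
from collections import defaultdict

# footprints[environment][test_name] = methods touched by that test (hypothetical data)
footprints = {
    "qa": {"test_checkout": {"pay", "tax"}, "test_refund": {"refund"}},
    "staging": {"test_checkout": {"pay"}, "test_smoke": {"health"}},
}
changed_methods = {"added": ["tax"], "modified": ["refund"], "deleted": []}

def impacted_tests(footprints, changed_methods):
    """For every changed method in every environment, collect the tests whose
    footprint contains that method and give each such test an L1 score."""
    scores = defaultdict(set)  # (environment, test) -> set of score labels
    changed = [
        method
        for kind in ("added", "modified", "deleted")
        for method in changed_methods[kind]
    ]
    for environment, tests in footprints.items():
        for method in changed:
            for test_name, touched in tests.items():
                if method in touched:
                    scores[(environment, test_name)].add("L1")
    return dict(scores)

result = impacted_tests(footprints, changed_methods)
print(result)
```

In this example only the two tests in the `qa` environment receive an L1 score, since no test in `staging` has a footprint in a changed method.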
Turning now to
For any type of software computer test, once the tests are run, it is possible to determine what these tests have actually examined, termed the “test footprint”. In Stage (2), the test footprint is collected by the TestListener and sent to the server. The test footprint includes information in regard to methods tested, hashes of the functions, and locations (for example within the build) to determine what was tested, which aspects of the code were tested and hence coverage, and also which tests were run in terms of test events. Hashes of the functions may optionally and preferably be used to map between the tests performed and the functions of the code, for example to determine whether certain parts of the code were tested. Hashes may optionally be used to determine whether a method has changed even if the method does not have a name.
In Stage (3), the test status is collected by the TestListener and sent to the service to determine whether or not success or failure has been detected for a particular test. In Stage (4), a test event is created by the TestListener and sent to the server, after which it is determined whether the test has actually been performed. In addition, optionally and preferably the status of the test and its results are determined.
In Stage (1), test execution is finished, and the footprint is reported as previously described in
In Stage (3), it is determined whether the test has failed before. If so, then in Stage (4) an L2 score is optionally added to the test scoring as a suspected test, which may optionally relate to test priority, for example in terms of whether certain test results need to be reconsidered or whether the test is insufficient and so should be performed again (or with a different test). In Stage (5), it is determined whether at least part of the build or the test footprint (that is, coverage of the tests in relation to the build) falls within a user selected parameter. If so, then in Stage (6) an L3 score is optionally added to the test scoring as a user selected test, again to increase the priority of the test. This may apply, for example, in cases where the user wishes to pay particular attention to a certain test and/or to a certain code type, code function, or area of the code.
In Stage (7), it is considered whether this is the first time the test has been run, such that it is a new test. If so, then in Stage (8) an L4 score is added to the test scoring, indicating that no previous scoring has occurred. Therefore, this test needs to be considered particularly and/or to receive special consideration.
In Stage (9), it is considered whether the test fails often, for example, in greater than 50 percent of the past five runs or according to another parameter of frequent failure of the test. If so, then in Stage (10) an L5 score is added to the test scoring, indicating that it is a non-stable test or alternatively non-stable code.
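The L2 through L5 scoring stages can be summarized in a short sketch. The record type `RunHistory`, the function `score_test` and the default window and threshold values (taken from the "greater than 50 percent in the past five runs" example) are illustrative assumptions, not a definitive implementation of the scoring engine.

```python
from dataclasses import dataclass, field

@dataclass
class RunHistory:
    """Hypothetical per-test record: past results (True = passed, most recent
    last) and the methods found in the test's footprint."""
    name: str
    past_results: list = field(default_factory=list)
    touched_methods: set = field(default_factory=set)

def score_test(history, user_selected_methods, window=5, threshold=0.5):
    """Return the set of score labels (L2..L5) that apply to one test."""
    labels = set()
    if any(not passed for passed in history.past_results):
        labels.add("L2")  # the test has failed before: suspected test
    if history.touched_methods & user_selected_methods:
        labels.add("L3")  # footprint falls within a user selected area
    if not history.past_results:
        labels.add("L4")  # first run: no previous scoring exists
    recent = history.past_results[-window:]
    if recent and sum(not p for p in recent) / len(recent) > threshold:
        labels.add("L5")  # fails often: non-stable test or code
    return labels

flaky = RunHistory("test_login", [True, False, False, True, False], {"login"})
brand_new = RunHistory("test_new_payment", [], {"pay"})

print(score_test(flaky, {"pay"}))
print(score_test(brand_new, {"pay"}))
```

The flaky test failed three of its last five runs and so receives both L2 and L5, while the new test receives L3 and L4 because it touches a user selected method and has no history.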
These test scores may optionally be used to calculate test quality coverage, for example according to the exemplary method of
Starting in Stage (1), it is determined whether a function or method has been added or modified. In Stage (2), tests are executed in the selected environment. Again, optionally and preferably, the tests are run in a plurality of environments, and may optionally be executed in each environment separately. In Stage (3), it is determined whether a test footprint has been detected for the added or modified function or method. If not, then in Stage (4), the method or function is marked as a quality hole in the selected environment. If, however, such a footprint has been detected, then in Stage (5) it is determined whether more environments are required to run the test. If so, the method returns to Stage (2); and if not, then it returns to Stage (1) as previously described.
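A minimal sketch of this quality hole check is given below. It assumes that the footprints collected in each environment are available as a mapping from test name to the set of methods touched; the names `find_quality_holes` and `unique_holes` and the sample data are hypothetical.

```python
def find_quality_holes(changed_methods, footprints_by_environment):
    """Per environment, list the added or modified methods that no test footprint touched."""
    holes = {}
    for environment, footprints in footprints_by_environment.items():
        covered = set().union(*footprints.values())
        holes[environment] = sorted(m for m in changed_methods if m not in covered)
    return holes

def unique_holes(holes):
    """Methods that were not tested in any environment at all."""
    per_env = [set(methods) for methods in holes.values()]
    return sorted(set.intersection(*per_env)) if per_env else []

changed = ["pay", "tax", "refund"]
footprints_by_environment = {
    "qa": {"test_checkout": {"pay", "tax"}},
    "staging": {"test_checkout": {"pay"}},
}

holes = find_quality_holes(changed, footprints_by_environment)
print(holes)
print(unique_holes(holes))
```

In this example `tax` is a quality hole only in `staging`, whereas `refund` is untested in every environment.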
In Stage (4), the BuildDiff is reported and stored optionally in a cloud application. In Stage (5), the cloud application calculates test scoring based on build and run time data. In the first branch, it relates only to build data. In Stage (6), the cloud application builds analytic data based on the historical data collected. Then, in Stage (7), the test quality coverage is calculated based on current test scoring or historical data or both for this specified environment. Optionally, the results of the method performed as described in
Optionally and preferably, these tests and examination for Stages (5) through (7) are performed separately for each environment. Now, as the process is performed from Stage (4) to Stage (8), the build is deployed at least in the development or quality assurance environment. Optionally and preferably, the build is deployed in each environment in which it is to be deployed in real life.
In Stage (9), tests are executed in the deploying environment based on coverage analysis as described above, and also based on parameters determined by the particular environment. In Stage (10), the footprint listener collects test run time data. In Stage (11), the test run time data is reported and stored in the cloud application. The method then returns to Stage (5) as previously described.
Common code coverage tools today focus on a single process, typically a process that runs the user code plus tests. For modern micro-services, multiple small services are used, and each one is usually tested independently, but when running system tests (tests that involve the majority of the system—multiple micro-services in this context), there is no way to gather the total code coverage (e.g. 67% of micro-service “A”, 25% of micro-service B, weighted total 69%).
According to at least some embodiments, a method for performing code coverage calculations, and hence determining coverage quality, may optionally be performed as described herein. For each service, the total known methods, “Ms”, and the unique method hits, “ms”, are calculated. The calculated coverage for the process is ms/Ms (for example, shown in percent, e.g. 5 methods out of 10=50%).
An “Integration build” or “Release” is a manifest/list of versioned components that constitute a full deliverable, deployable package. For example, release 123 of the product contains component A version 10 and component B version 25. Many test tools are available, each suited to a different purpose, technology or methodology (e.g. TestNG for unit tests in Java, RSpec for behavior driven testing in Ruby, etc.). The present invention provides test listeners for multiple supported test tools, and these send data in a common format to the cloud application, such as application 122 described above. Additionally, this data is used to track activity (method execution) that happens in the tested process(es), and to match every activity to the test that caused it to be executed.
An end-to-end/system test is executed against the whole “Integration Build” and each test “passes” through one or more components. For the entire build, the weighted average is mb/Mb, where mb is the total number of hit methods across all services, and Mb is the total number of methods in all services.
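The per-service ratio ms/Ms and the build-wide ratio mb/Mb can be illustrated with a short sketch. The service names, method names and hit lists below are invented for the example, and the helper names `service_coverage` and `build_coverage` are hypothetical.

```python
def service_coverage(known_methods: set, hits: list) -> float:
    """ms/Ms for one service: unique method hits over total known methods."""
    unique_hits = set(hits) & known_methods
    return len(unique_hits) / len(known_methods)

def build_coverage(services: dict) -> float:
    """mb/Mb for the integration build: total unique hits over total known methods."""
    mb = sum(len(set(s["hits"]) & s["known"]) for s in services.values())
    Mb = sum(len(s["known"]) for s in services.values())
    return mb / Mb

services = {
    # Repeated hits on the same method count once.
    "A": {"known": {"a1", "a2", "a3"}, "hits": ["a1", "a2", "a1"]},
    "B": {"known": {"b1", "b2", "b3", "b4"}, "hits": ["b1"]},
}

per_service = {
    name: service_coverage(s["known"], s["hits"]) for name, s in services.items()
}
total = build_coverage(services)

print(per_service)  # A: 2 of 3 methods, B: 1 of 4 methods
print(total)        # 3 of 7 methods across the whole build
```

Because the build-wide figure divides total hits by total methods, larger services weigh more heavily than smaller ones, and the result always lies between the lowest and highest per-service values.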
Optionally, according to at least some embodiments, code coverage and quality are determined across a plurality of test tools. For this embodiment, test listeners are provided for a plurality of test tools, which send data in a common format for analysis. Additionally, this data is used to track activity (method execution) that happens in the tested process(es), and to match every activity to the test that caused it to be executed.
To further assist with detection of code coverage quality, optionally a further embodiment is implemented, to detect testing coverage quality of interaction between micro-services through auto discovery of the test environment (sometimes referred to as “coloring”).
In a micro-service (or any N-tier) architecture, services communicate with each other. Naturally, this communication is a potential subject for testing (explicitly or implicitly), and a test may trigger activity (method execution) in more than one process. It is necessary to associate each test with all executed methods in all processes. One approach is to notify all processes that test XYZ has started and that from this point on all methods should be associated with it; however, this requires knowing in advance exactly which processes are involved. A better approach is to do this without prior knowledge, which requires tracking the execution. This is what “process coloring” is. Whenever process A (micro-service A) makes a request to process B (micro-service B), the test listener on process A augments the request with metadata about the ongoing test. On the listening side, the test listener on process B receives this metadata and stores it in memory. From this point on, all methods in process B will be associated with that test.
Every test listener also reports some information about the execution environment: the process ID, the machine name, its IP addresses, local time, O/S, and runtime versions (e.g. Java 7 version xx). Once all data is received for analysis, it is possible to report the test environment (involved machines).
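Process coloring can be sketched with two in-memory listeners standing in for the test listeners on process A and process B. The class `ColoringListener`, the header name `X-Test-Id` and the dictionary-based request format are assumptions made for illustration; a real listener would hook into the actual transport used between the services.

```python
TEST_HEADER = "X-Test-Id"  # hypothetical metadata key carried with each request

class ColoringListener:
    """Simplified stand-in for a test listener attached to one process."""

    def __init__(self, process_name):
        self.process_name = process_name
        self.current_test = None
        self.hits = []  # (test_id, method_name) pairs recorded in this process

    def start_test(self, test_id):
        self.current_test = test_id

    def record(self, method_name):
        """Associate an executed method with the test currently in effect."""
        self.hits.append((self.current_test, method_name))

    def outgoing(self, request: dict) -> dict:
        """Augment an outgoing request with metadata about the ongoing test."""
        headers = dict(request.get("headers", {}))
        if self.current_test is not None:
            headers[TEST_HEADER] = self.current_test
        return {**request, "headers": headers}

    def incoming(self, request: dict) -> None:
        """Read the test metadata from an incoming request and keep it in memory."""
        test_id = request.get("headers", {}).get(TEST_HEADER)
        if test_id is not None:
            self.current_test = test_id

listener_a = ColoringListener("micro-service A")
listener_b = ColoringListener("micro-service B")

listener_a.start_test("test_XYZ")  # only process A is told about the test
listener_a.record("create_order")
request = listener_a.outgoing({"path": "/charge", "headers": {}})
listener_b.incoming(request)       # process B learns the test from the request
listener_b.record("charge_card")

print(listener_a.hits)
print(listener_b.hits)
```

Process B was never notified directly that the test started, yet its method execution is attributed to `test_XYZ` because the metadata travelled with the request.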
In stage 1, the level of coverage for each method is preferably calculated for at least one environment. Optionally the level of coverage is only calculated for a plurality of methods.
In stage 2, the level of coverage for each component of the build, or at least a plurality of components, is preferably calculated for at least one environment. In this context, a component is optionally a portion of the code, including without limitation groups of methods, code lines or code branches, or micro-services. For example, a package or file could be examined. Alternatively, stages 1 and 2 may be performed together.
In stage 3, the level of coverage for the build overall is preferably calculated for at least one environment.
Optionally, the above stages 1-3 are only performed in reference to code elements that have been added or modified.
Optionally and preferably, in stage 4, one or more of stages 1-3 is repeated for any environments which have not been tested.
In stage 5, a method which is suspected of causing failure (that is, unacceptable coverage) in at least one of stages 1-4 is identified, and is preferably flagged in a dashboard.
In stage 6, it is determined whether the methods and components of a build have been sufficiently tested for the build to pass to acceptance and hence to deployment. Sufficient testing may optionally relate to a percent level of coverage in terms of one or more of the overall build, methods, components and environments. Optionally all code parts must be tested, such that 100% test coverage must be achieved for each of the overall build, methods, components and environments.
In stage 7, the trends for coverage for the above are also optionally calculated. Also optionally, the trend may also determine whether the build is accepted; for example, if the trend for coverage in any part, overall or in one or more specific parts has decreased, then optionally the build is not accepted.
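The acceptance decision of stages 6 and 7 can be sketched as a simple gate over coverage values per scope (overall build, component, environment and so on). The function name `build_accepted`, the scope labels and the 80% minimum are hypothetical values chosen only for the example.

```python
def build_accepted(current, previous, minimum=80.0, reject_on_decline=True):
    """Accept the build only if every scope meets the minimum coverage and,
    optionally, no scope has lower coverage than in the previous build."""
    for scope, value in current.items():
        if value < minimum:
            return False
        if reject_on_decline and value < previous.get(scope, 0.0):
            return False
    return True

previous_build = {"build": 83.0, "component:auth": 92.0}
current_build = {"build": 85.0, "component:auth": 90.0}

# Coverage of the auth component dropped from 92% to 90%.
print(build_accepted(current_build, previous_build))
print(build_accepted(current_build, previous_build, reject_on_decline=False))
```

Setting `minimum=100.0` corresponds to the option in which all code parts must be tested before the build is accepted.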
Cloud application 122 then analyzes the results to determine whether the build can be provided to build release 906, for subsequent release to the production environment.
Analysis engine 120 comprises BuildDiff Queue 1004, Footprint Queue 1006, and TestEvents Queue 1008. These queues receive information about tests that have been performed and changes to code as well as footprints of the tests on the applications under test, and in turn preferably connect to a core calculations service 1010. Core calculations service 1010 receives build changes (methods/branches and lines added/modified/deleted), the results and identity of tests, and also the footprints. Core calculations service 1010 then maps the code to the results and identity of tests, and the footprints (in terms of what is tested), and provides this information to additional services for precise calculations, for example to detect a quality hole. Optionally these calculations may be combined into a single service (not shown), but are preferably split in this way in order to reduce computational load and to increase scalability. The mapping functions are preferably placed in one service so as to provide enough computational power to the particular subset of these functions required for mapping.
Without wishing to be limited in any way, core calculations service 1010 is preferably separated out as a service because although it is the first service to begin the quality coverage determination and quality hole detection, information about the tests and code may be expected to arrive according to the order of generation and transmission, but not necessarily in the order in which the information is needed. The information needs to be stored until it can be accessed which can require a great deal of memory, for example.
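The need to hold information until it can be used can be illustrated with a minimal sketch in which footprints that arrive before the corresponding build changes are buffered in memory. The class `CoreCalculations`, its method names and the message shapes are hypothetical and greatly simplified relative to a queue-based service.

```python
from collections import defaultdict

class CoreCalculations:
    """Maps test footprints to changed methods, buffering early arrivals."""

    def __init__(self):
        self.build_diffs = {}             # build_id -> set of changed methods
        self.pending = defaultdict(list)  # build_id -> footprints awaiting the diff
        self.mapped = []                  # (build_id, test, changed methods touched)

    def on_build_diff(self, build_id, changed_methods):
        self.build_diffs[build_id] = set(changed_methods)
        # Drain any footprints that arrived before the build changes were known.
        for footprint in self.pending.pop(build_id, []):
            self._map(build_id, footprint)

    def on_footprint(self, build_id, test, methods):
        if build_id not in self.build_diffs:
            self.pending[build_id].append((test, methods))  # store until needed
        else:
            self._map(build_id, (test, methods))

    def _map(self, build_id, footprint):
        test, methods = footprint
        touched = sorted(set(methods) & self.build_diffs[build_id])
        self.mapped.append((build_id, test, touched))

core = CoreCalculations()
core.on_footprint(7, "test_checkout", ["pay", "tax"])  # arrives first
pending_before = len(core.pending[7])
core.on_build_diff(7, ["tax"])                          # arrives later
print(core.mapped)
```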
Next core calculations service 1010 preferably connects to a test priority queue 1012 which in turn connects to a test priority service 1014, for determining the list and order for test priorities. Basic information regarding these calculations is provided by core calculations service 1010 to test priority queue 1012 so that test priority service 1014 can determine the details of the test priorities. Information regarding test priorities is sent to a reports queue 1016, which then connects to a reports module 1018 which sends the reports to storage 1019. Storage 1019 does not need to be expensive storage such as a database. If the user requests a report, then it is sent to a cache 1021. The reports are then sent through API gateway 1002 from cache 1021 back to the customer premises (not shown), for example to enable the customer computer to analyze the reports and automatically determine the order in which further tests should be run.
The remaining queues connected to core calculations service 1010 receive equivalent information about the respective material that they need to calculate, and in turn provide detailed reports to reports queue 1016, so that the reports can ultimately be sent to the customer premises.
For example, a failed test queue 1020 receives calculations regarding which tests were executed and their status and provides this information to a failed test service 1022 in order to perform a detailed analysis of what the test results are, so that a detailed report can be created.
A test coverage queue 1024 receives information regarding the extent of test coverage, optionally and preferably in different environments and for a new build (according to differences with the old build), and provides this information to a test coverage service 1026, which then performs the detailed analysis of test coverage.
A quality holes queue 1028 receives information regarding any quality holes, optionally and preferably in different environments and for a new build (according to differences with the old build), and provides this information to a quality holes service 1030, which then performs the detailed analysis of quality holes.
A notifications queue 1032 receives information regarding specific tests, coverage levels or any other information that the user is interested in for analysis by notifications service 1034; this report is preferably sent to cache 1021 for transmission to the user.
External data processor queue 1001 connects to external data processor 1005 which performs the calculations for external reports for the dashboard and test labs, including but not limited to information about bugs, static analysis tool results, and functional requirements;
BuildDiff Queue 1004 connects to BuildDiff queue parser 1004P for preferably calculating the differences between the builds and also to build service 1003 which preferably sends information about the build to the BuildDiff queue 1004 and determines against which other build to compare, as well as which tests run against which part of the build, and whether a part has changed and isn't tested. BuildDiff queue parser 1004P also connects to storage 1019 for storing the results of the calculated difference between the builds.
Footprint Queue 1006 connects to Footprint Queue Parser 1006P which preferably calculates the footprint of the test coverage and which parts of code were examined by a test;
TestEvents Queue 1008 connects to TestEvents Queue parser 1008P which preferably collects data about test events for informing other services including but not limited to test starts, and test ends. TestEvents Queue parser 1008P further connects to test event service 1015 for analyzing the implication of the test event related to test coverage, as well as the state of the tests and also to storage 1019 for storing of results;
Test state tracker 1011 receives input from external data processor 1005, Footprint Queue Parser 1006P, and TestEvents Queue parser 1008P. This input enables it to preferably monitor when tests start or end. It also receives results from external data processor 1005 and tracks these, and can send information through API. It also determines whether a test is still running to know whether additional coverage will be received. The output of Test state tracker 1011 is fed into TestEvents Queue 1008. Test state tracker 1011 is also connected to cache 1021 for temporary storage of information;
TestEvents Queue parser 1008P and Footprint Queue Parser 1006P both connect to Queue optimizer 1009, which optimizes the use of memory for calculations and acts as a context switching manager. Context switching enables memory to be loaded with information as infrequently as possible, to increase efficiency. Queue optimizer 1009 connects to cache 1021 for temporary storage of information.
Queue optimizer 1009 connects to core calculations service 1010. Core calculations service 1010 receives build changes, the results and identity of tests, and also the footprints. Core calculations service 1010 then maps the code to the results and identity of tests, and the footprints (in terms of what is tested), and provides this information to additional services for precise calculations, for example to detect a quality hole. Optionally these calculations may be combined into a single service (not shown), but are preferably split in this way in order to reduce computational load and to increase scalability. The mapping functions are preferably combined into one service so as to provide enough computational power to the particular subset of these functions required for mapping.
Core calculations service 1010 determines the test coverage and for example also optionally determines the quality hole detection, by building a matrix of all parts of the code. Core calculations service 1010 then determines that each part of the code has been tested, alone and also optionally through integration (for example combinations of code). Core calculations service 1010 receives the build, with the analyzable components provided (such as for example methods, branching functions, lines and so forth) from queue optimizer 1009 upon completing a calculation and then starts a new calculation, according to information received from queue optimizer 1009. Core calculations service 1010 then places all components in a matrix to check the coverage of each component with one or more tests. Preferably also the results of the test are included, for example in terms of whether the test succeeded. Core calculations service 1010 preferably calculates coverage according to successful tests, rather than merely whether a test ran.
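By way of non-limiting illustration only, the matrix described above may be sketched as follows in Python. The names RunRecord, build_coverage_matrix and covered_components are hypothetical and are not part of the described system; the sketch assumes that each test reports a pass/fail status and a footprint expressed as a set of component identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunRecord:
    """One executed test: its name, its result and the components it touched."""
    name: str
    passed: bool
    footprint: frozenset


def build_coverage_matrix(components, runs):
    """Return {component: {test_name: passed}} for every component in the build."""
    matrix = {component: {} for component in components}
    for run in runs:
        for component in run.footprint:
            if component in matrix:  # ignore hits on code outside this build
                matrix[component][run.name] = run.passed
    return matrix


def covered_components(matrix, successful_only=True):
    """Components covered by at least one test (by default, one passing test)."""
    return {
        component
        for component, tests in matrix.items()
        if any(passed or not successful_only for passed in tests.values())
    }
```

With successful_only left at its default, a component touched only by failing tests is reported as uncovered, which corresponds to calculating coverage according to successful tests rather than merely whether a test ran.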
Optionally the memory that serves core calculations service 1010 may be distributed in order to handle large amounts of code, whether by separate components or by type of test to be run on each component. Both successful and failed results may optionally be included, to determine whether a test succeeded or failed according to the last time that the test(s) were run, more preferably according to whether one or more code components changed. To increase scalability, optionally sharding is performed, so that information that is required for a particular set of operations is stored on the same or similarly accessible memory, again to avoid swapping in and out of memory. With sufficient memory and a sufficiently optimized memory structure, optionally core calculations service 1010 acts to collect information, which is then served directly to test coverage queue optimizer 1024 and failed test queue optimizer 1020. Optionally queue optimizer 1009 may operate to reduce the demand on memory as previously described.
Without wishing to be limited in any way, core calculations service 1010 is preferably separated out because although it is the first service to begin the quality coverage determination and quality hole detection, information about the tests and code may be expected to arrive according to the order of generation and transmission, but not necessarily in the order in which the information is needed. The information needs to be stored until it can be accessed which can require a great deal of memory, for example. Core calculations service 1010 is preferably able to analyze the results of tests much more quickly so that test analysis, and calculation of the results, can be determined in real time. Also, because the system and methods are highly asynchronous, core calculations service 1010 is preferably organized to be able to analyze the results of the tests, even if the results appear in various non-predetermined, and non-deterministic, orders. For example, information regarding the components of the build may optionally arrive after the test and/or footprint results.
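As a minimal, non-limiting sketch of handling such out-of-order arrival, footprint results may be buffered until the build components are known and then replayed. The class OutOfOrderCollector and its method names are hypothetical and are used only for illustration; the sketch assumes a single build and in-memory buffering.

```python
class OutOfOrderCollector:
    """Buffers footprints that arrive before the build map, then replays them."""

    def __init__(self):
        self.components = None   # unknown until the build map arrives
        self._pending = []       # footprints received too early
        self.coverage = {}       # component -> set of test names

    def on_footprint(self, test, component):
        if self.components is None:
            self._pending.append((test, component))  # store until it can be used
        else:
            self._apply(test, component)

    def on_build_map(self, components):
        self.components = set(components)
        self.coverage = {c: set() for c in self.components}
        pending, self._pending = self._pending, []
        for test, component in pending:
            self._apply(test, component)

    def _apply(self, test, component):
        if component in self.coverage:  # drop hits on unknown components
            self.coverage[component].add(test)
```

The buffer grows with the amount of early data, which illustrates why the storage of not-yet-usable information can require a great deal of memory.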
Several queues are connected to core calculations service 1010 and receive relevant information related to the aspect of testing that they need to analyze.
Failed test queue optimizer 1020 receives calculations regarding which tests failed and provides this information to a failed test service 1022 in order to perform a detailed analysis of which tests failed. The output of failed test service 1022 is preferably sent to dashboard queue 1039 and reports queue 1016, so that a detailed report can be created preferably covering one or more of failed tests, passed tests and test status overall.
Test coverage queue optimizer 1024 receives information regarding the extent of test coverage, optionally and preferably in different environments and for a new build (according to differences with the old build), and provides this information to a test coverage service 1026, which then performs the detailed analysis of test coverage also based on information retrieved from storage 1019.
These analyses from failed test service 1022 and test coverage service 1026 are forwarded to a reports queue 1016 and to reports service 1018, which listens to events from reports queue 1016 and connects to a reports module 1018, which creates the reports and sends them to storage 1019. If the user requests a report, then it is sent to a cache 1021. The reports are then sent through API gateway 1002 from cache 1021 back to the customer premises (not shown).
Core calculations service 1010, API gateway 1002, failed test service 1022 and test coverage service 1026 connect to dashboard queue 1039. Dashboard queue 1039 connects to dashboard service 1040 which listens to events from dashboard queue 1039, generates reports for sending to a dashboard at the client's location, and also determines how to display and arrange the dashboard data to the client. Dashboard service 1040 connects to threshold queue 1043 which in turn connects to threshold service 1042 which preferably checks the thresholds and rules set by the client/user and then checks whether thresholds are met or exceeded. Threshold service 1042 feeds back into dashboard queue 1039 such that these threshold indications are preferably displayed on the dashboard, for example, as red or green indications or other suitable indications of threshold exceeding (as shown in
Notifications service 1034 receives information from the dashboard service regarding specific tests, coverage levels or any other information that the user is interested in and provides this information for transmission to the user, for example by sending a notification through email or Slack or any other messaging service as known in the art. Non limiting examples of reports include weekly reports or reports that a build is ready.
Security is optionally and preferably provided according to the Amazon AWS platform as well. Furthermore, optionally and without limitation, storage 1102 preferably communicates with analysis engine 120 through the HTTPS/443 protocol. Internet gateway 124 preferably communicates with storage 1102 and UI engine 126 through the HTTPS/443 protocol. In both cases, such communication optionally and preferably includes build meta data, including differences with previous builds; coverage meta data on methods/functions covered during test per listener running on the Application Under Test; and test event data, including test execution meta data: test names, start time, end time, status and so forth.
Customer build server 108 preferably communicates the build meta data from build listener 110 to storage 1102. Customer application server 112 preferably communicates the coverage meta data from test listener 114 to storage 1102. Customer test server 116 preferably communicates the test execution meta data to storage 1102.
Additionally, internet gateway 124 preferably communicates with internet 120 through the HTTPS/443 protocol.
Analysis engine 120 and UI engine 126 both optionally and preferably communicate with database 128 according to the SSL protocol.
In a system 1300, the list of tests to run or exclusion of such tests is determined by a backend 1302 by analyzing code changes and the status of various tests in the previous run (failed tests or new/modified tests). Backend 1302 may for example correspond to the previously described cloud system. Backend 1302 determines the effect of such code changes. Backend 1302 receives the status of the tests being run, as well as the results from previously run tests, for example from an agent or other software monitoring the underlying test framework. Preferably backend 1302 then applies statistical analysis to determine which tests are to be run, and more preferably to determine a relative importance of running one or more tests. For example and without limitation, backend 1302 may determine that certain tests are required according to one or more test criteria, while other tests are less important in relation to the required tests. Optionally, input may be obtained from an end user or from another source, such as a software source for parameters for example, to determine which tests are more relevant and/or which tests are required.
Code changes are preferably collected by backend 1302 after being reported by a build scanner 1304. Build scanner 1304 preferably scans test classes in the build map or artefact folder to allow management of test changes. Build scanner 1304 preferably monitors changes in the build, and in particular changes in the code, which may be used by backend 1302 to determine which tests are relevant. Build scanner 1304 may optionally cause or define the build, but preferably listens to a build server that actually causes the build to be constructed (not shown). Upon detecting that a new build has been made, or upon being notified of such a new build being made, build scanner 1304 preferably then determines any differences from a previous build, optionally and preferably ignoring any extraneous information, such as comments on the code. Alternatively, build scanner 1304 only receives the differences in the code from a previous build. Also optionally and alternatively, build scanner 1304 determines such differences in the code from a previous build.
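For illustration only, a comparison of two builds that ignores comments may be sketched as below. The functions fingerprint and diff_builds are hypothetical, and the sketch assumes that each build is available as a mapping from method name to source text using C-style comments. The regular expression is deliberately naive (it would also strip comment-like text inside string literals), so it is a sketch of the idea rather than a robust parser.

```python
import hashlib
import re

# Naive matcher for // line comments and /* block */ comments (assumption:
# C-style syntax; comment markers inside string literals are not handled).
_COMMENTS = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)


def fingerprint(source):
    """Hash of the source with comments removed and whitespace collapsed."""
    stripped = _COMMENTS.sub("", source)
    normalized = " ".join(stripped.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def diff_builds(old, new):
    """Compare two {method_name: source} maps, ignoring comment-only edits."""
    shared = set(old) & set(new)
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "modified": sorted(
            name for name in shared
            if fingerprint(old[name]) != fingerprint(new[name])
        ),
    }
```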
A test listener 1306 may for example operate according to the previously described test agent. Preferably test listener 1306 is able to modify test methods and to determine which should be excluded. Test listener 1306 also preferably defines a new footprint format. Optionally, the context for this footprint, for example in terms of which tests are applied, the headers for communication for the tests, and so forth is also sent. Such context is preferably sent for unit-test implementations, optionally in place of timing or time based information. Alternatively, the test context is not sent.
Optionally and preferably test listener 1306 collects code coverage from the application being tested while tests are running; collects test information and test results; and forces the test framework to run the recommended test list and skip the other tests not in the list (the recommendation process is described in greater detail below).
Back end 1302 is optionally able to determine coverage by already applied test methods in relation to the previous build of the code. Optionally, back end 1302 is able to predict code coverage of available tests on the new test build. Optionally and preferably back end 1302 determines test coverage according to the test results or may receive such test coverage from a test agent as described in greater detail below. Optionally, test coverage is determined according to one or more of detection of at least one change in the code from a previous build to a current build and analysis of at least one test to determine whether such change has been tested; assessment of at least one previously failed test; assessment of at least one test coverage hole; or a combination thereof. As previously noted, a test coverage hole may optionally be determined in the case that at least a portion of the code has not adequately been tested by test(s) that have been run, which may optionally include but is not limited to zero test coverage, in which no test that has been run tests that portion of the code.
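As a non-limiting sketch of the zero-coverage case described above, a test coverage hole may be found by subtracting everything that the executed tests touched from the set of changed code. The function name find_quality_holes and the shape of its inputs are assumptions made for illustration; the notion of code that is touched but not adequately tested is not modeled here.

```python
def find_quality_holes(changed_methods, coverage_by_test):
    """Changed methods that no executed test touched.

    coverage_by_test maps a test name to the set of methods in its footprint.
    """
    covered = set().union(*coverage_by_test.values())
    return set(changed_methods) - covered
```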
In some cases (for example, when test frameworks are used to run the tests) test listener 1306 collects footprints within a test execution time frame and packs the test identifier together with the collected footprints, yielding so-called colored footprints. In this case back end 1302 optionally does not need any additional mechanism for associating tests with tested code.
For example, in order to determine test coverage, optionally and preferably the number of methods tested is divided by the total number of methods in each component(s) to determine the percent coverage. Preferably also code branches and code lines are also considered in the calculation for the percent coverage. This determination is optionally and preferably made per environment and also across all environments.
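The method-level part of this calculation may be sketched as follows, purely as an illustration. The names percent_coverage and coverage_report, and the reserved key "all" for the cross-environment figure, are hypothetical; branches and lines, which are preferably also considered, could be handled the same way but are omitted from the sketch.

```python
def percent_coverage(tested, total):
    """Percentage of the items in `total` that also appear in `tested`."""
    total = set(total)
    if not total:
        return 0.0
    return 100.0 * len(set(tested) & total) / len(total)


def coverage_report(all_methods, hits_by_environment):
    """Coverage per environment, plus coverage across all environments."""
    report = {
        env: percent_coverage(hits, all_methods)
        for env, hits in hits_by_environment.items()
    }
    combined = set().union(*hits_by_environment.values())
    report["all"] = percent_coverage(combined, all_methods)  # assumed key name
    return report
```

For example, with four methods in total, two of them hit in one environment and a partly different two hit in another, each environment reports 50% while the combined figure is 75%.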
The test cycle of a build may also optionally be divided into test environments, and coverage may optionally be calculated as described above both per test environment and also across the entire build. Additionally and/or alternatively, unit-test results may also optionally be considered; unit-test results only provide coverage results across an entire build. The results of these tests may optionally show 100% coverage for each individual method within the build, as the tests may provide coverage such that each method is tested once, even if not all of the other test environments have 100% coverage.
It is also possible to skip a test, which may then optionally be noted separately, to show how many tests were skipped in general in terms of the coverage, and/or in comparison to what was checked in the previous build.
If tests are executed in stages, then optionally the determination is made for each test stage, for example to select one or more tests to be performed at each stage.
Backend 1302 preferably associates footprints with a particular test according to the time at which the footprints were collected, as a time based solution. Optionally, backend 1302 reviews tests that were performed, the code that was executed or reviewed at the time of performing the tests, and then compares such information. Preferably a statistical analysis is used to determine the likelihood of a particular test being associated with particular code.
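A minimal sketch of such a time based association is given below, under the assumption that each test is reported with a start and end time and each footprint with a collection timestamp on the same clock. The function associate_footprints and its tuple layouts are hypothetical. When tests overlap in time, a footprint is attributed to every test running at that moment, which is why a subsequent statistical analysis is useful.

```python
from collections import defaultdict


def associate_footprints(tests, footprints):
    """Attribute each footprint to every test that was running when it was collected.

    tests:      iterable of (test_name, start_time, end_time)
    footprints: iterable of (timestamp, code_element)
    """
    mapping = defaultdict(set)
    for timestamp, element in footprints:
        for name, start, end in tests:
            if start <= timestamp <= end:
                mapping[name].add(element)
    return dict(mapping)
```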
Backend 1302 preferably then generates a test to method mapping 1308, by updating the build history on each build (shown as a build history 1310) and by updating a statistical model 1312 through a timer. Wherever the term “test to method mapping” is described, optionally a test may be mapped to a file or to another code component. Each of these two different types of mapping is explained in greater detail below.
Build History Mapping 1310 is a matrix for each test, which contains the affected files and methods for a particular execution identifier. Up to N executions may be supported for each test. This mapping is created based on test execution data (footprints, etc.). Optionally, the above build map information may be fed to a machine learning model, to further analyze and refine test mapping to particular parts of code according to the build history. Also optionally, the above build map is fed to a statistical model to determine a further statistical map.
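One possible in-memory shape for such a per-test history, keeping at most N executions, is sketched below for illustration. The class BuildHistoryMapping as written here, its method names and the hit_ratio helper are assumptions and are not taken from the described system.

```python
from collections import defaultdict, deque


class BuildHistoryMapping:
    """Keeps, for each test, the code elements hit in its last N executions."""

    def __init__(self, max_executions=5):
        # deque(maxlen=N) silently discards the oldest execution when full
        self._history = defaultdict(lambda: deque(maxlen=max_executions))

    def record(self, test, execution_id, elements):
        self._history[test].append((execution_id, frozenset(elements)))

    def executions(self, test):
        return list(self._history.get(test, ()))

    def hit_ratio(self, test, element):
        """Fraction of the retained executions of `test` that hit `element`."""
        runs = self._history.get(test)
        if not runs:
            return 0.0
        return sum(element in elements for _, elements in runs) / len(runs)
```

A ratio of this kind is one simple input that could be passed on to a statistical model or a machine learning model.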
Turning back to
In the system 1500, executable code processor 1554 is executing a unit under test 1556. The test agent 1536 monitors the behavior of unit under test 1556 and causes one or more tests to be performed. These tests may be performed through a test engine server 1532 and a test framework 1534. Test framework 1534 preferably determines the code coverage for new, modified, or existing code and may also receive information from cloud system 1522 regarding previous tests. Test framework 1534 may then calculate whether or not a new test will increase test code coverage, and in particular test coverage for specific code.
Test information is sent first to a storage manager 1542 and then to analysis engine 1520. Analysis engine 1520 determines whether or not test code coverage should be updated, how it should be updated, whether any code has not been tested, and so forth. This information is stored in database 1528 and is also passed back to gateway 1524.
As shown, the test listener functions of
A build mapper 1502 preferably determines the relevance of one or more tests, according to whether the code that is likely covered by such tests has changed. Such a determination of likely coverage and code change in turn may be used to determine which tests are relevant, and/or the relative relevance of a plurality of tests.
Build mapper 1502 preferably receives information about a new build and/or changes in a build from a build scanner 1512. Alternatively, such functions may be performed by analysis engine 1520. Build mapper then preferably receives information about test coverage, when certain tests were performed and when different portions of code were being executed when such tests were performed, from test agent 1536 and/or analysis engine 1520.
Build mapper 1502 preferably comprises a footprint correlator 1504, for determining which tests relate to code that has changed, or that is likely to have changed, as well as for receiving information regarding code coverage. Footprint correlator 1504 in turn preferably communicates such information to a history analyzer 1506 and a statistical analyzer 1508. History analyzer 1506 preferably assigns likely relevance of tests to the new or changed code, based on historical information. Such likely relevance is then passed to statistical analyzer 1508. Statistical analyzer 1508 preferably determines statistical relevance of one or more tests to one or more sections of code, preferably new or changed code. For example, such statistical relevance may be determined according to the timing of execution of certain tests in relation to the code that was being executed at the time. Other relevance measures may also optionally be applied. Information regarding the results of the build history map and/or statistical model is preferably stored in a database 1510.
For determining the relative or absolute impact of a test, and optionally for selecting one or more preferred tests, a test impact analyzer 1560 is shown as part of cloud system 1522. Test impact analyzer 1560 may have part or all of its functions incorporated in build mapper 1502 or at another location (not shown). Test impact analyzer 1560 is preferably able to recommend one or more tests to be performed, for example according to a policy set by the user, according to one or more rules or a combination thereof. Tests may be given a preference, relative or absolute, according to such a policy or such rules.
Non-limiting examples of such rules include a preference for impacted tests, previously used at least once, that cover a footprint in a method that changed in a given build. Also preferably recently failed tests may be performed again. New tests that were added recently or modified tests may be performed again. Tests that were recommended in the past but were not executed since then may have a preference to be performed.
Tests that are covering code that is being used in production may be preferred, particularly in case of inclusion of one of the above rules. Other code related rules may include but are not limited to tests that are covering code that was modified multiple times recently, and/or tests that are covering code that is marked manually or automatically as high risk code.
A non-limiting example of a user determined policy rule is the inclusion of an important test recommended by the user so that it is always executed in each selective run of the tests.
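The rules given above may be sketched, without limitation, as a simple rule evaluation that records why each test is recommended. The class CandidateTest, its field names and the reason labels are hypothetical; the production-usage, frequently-modified and high-risk code rules are omitted for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class CandidateTest:
    name: str
    footprint: set = field(default_factory=set)  # methods the test has hit before
    recently_failed: bool = False
    new_or_modified: bool = False
    recommended_not_run: bool = False            # recommended earlier, not executed since
    pinned_by_user: bool = False                 # user policy: always run


def recommend_tests(candidates, changed_methods):
    """Return {test_name: [reasons]} for every test selected by at least one rule."""
    changed = set(changed_methods)
    recommended = {}
    for test in candidates:
        reasons = []
        if test.footprint & changed:
            reasons.append("impacted")
        if test.recently_failed:
            reasons.append("recently failed")
        if test.new_or_modified:
            reasons.append("new or modified")
        if test.recommended_not_run:
            reasons.append("recommended but not run")
        if test.pinned_by_user:
            reasons.append("user policy")
        if reasons:
            recommended[test.name] = reasons
    return recommended
```

Keeping the reasons alongside each test makes it possible to assign a relative or absolute preference afterwards, for example by weighting the rules.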
When a user selects the option of running only the recommended tests, this is done by defining a selective run policy. The policy may include information about when to run the selective test list or the full test list, which can be based on the number of test executions in a day/week/month, on every given number of test executions, or on a certain time frame.
Test impact analyzer 1560 preferably performs test impact analytics, based on the above. Test Impact Analytics is based on the correlation between a test coverage and code changes. It may for example include giving a preference to the tests that have footprints in methods that were modified in a given build. Test impact analyzer 1560 is supported by calculations of the code coverage for each test that was executed in any given test environment. The per test coverage is calculated based on a statistical correlation between a given time frame of the tests that were executed with the coverage information being collected during this time frame, as described above and in greater detail below.
Also as described in greater detail below, a machine learning system is preferably used to refine the aforementioned correlation of tests to coverage data. Such a system is preferably applied, due to the fact that test execution is not deterministic by nature and the fact that tests may run in parallel, which may render the results even more non-deterministic.
Optionally, test impact analyzer 1560 determines that a full list of tests is to be run, for example but without limitation under the following conditions: 1. User selection of full run. 2. When server bootstrapping code has been modified. 3. Configuration or environment variables have been modified.
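The three conditions listed above may be expressed, as a non-limiting sketch, by a single predicate. The function name should_run_full_suite and its parameters are hypothetical; in particular, the sketch assumes that the set of server bootstrapping files is known in advance and that configuration or environment changes are reported as a flag.

```python
def should_run_full_suite(user_requested_full, changed_files,
                          bootstrap_files, config_changed):
    """True when the full test list should be run instead of the selective list."""
    if user_requested_full:                        # 1. user selection of full run
        return True
    if set(changed_files) & set(bootstrap_files):  # 2. bootstrapping code modified
        return True
    return bool(config_changed)                    # 3. configuration/environment modified
```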
Functions of processor 1562 preferably relate to those performed by any suitable computational processor, which generally refers to a device or combination of devices having circuitry used for implementing the communication and/or logic functions of a particular system. For example, a processor may include a digital signal processor device, a microprocessor device, and various analog-to-digital converters, digital-to-analog converters, and other support circuits and/or combinations of the foregoing. Control and signal processing functions of the system are allocated between these processing devices according to their respective capabilities. The processor may further include functionality to operate one or more software programs based on computer-executable program code thereof, which may be stored in a memory, such as a memory 1564 in this non-limiting example. As the phrase is used herein, the processor may be “configured to” perform a certain function in a variety of ways, including, for example, by having one or more general-purpose circuits perform the function by executing particular computer-executable program code embodied in computer-readable medium, and/or by having one or more application-specific circuits perform the function.
Also optionally, memory 1564 is configured for storing a defined native instruction set of codes. Processor 1562 is configured to perform a defined set of basic operations in response to receiving a corresponding basic instruction selected from the defined native instruction set of codes stored in memory 1564. For example and without limitation, memory 1564 may store a first set of machine codes selected from the native instruction set for receiving information from build scanner 1512 (not shown) about a new build and/or changes in a build; a second set of machine codes selected from the native instruction set for receiving information about test coverage, when certain tests were performed and when different portions of code were being executed when such tests were performed, from test agent 1536 and/or analysis engine 1520 (not shown); and a third set of machine codes from the native instruction set for operating footprint correlator 1504, for determining which tests relate to code that has changed, or that is likely to have changed, as well as for receiving information regarding code coverage.
Memory 1564 may store a fourth set of machine codes from the native instruction set for communicating such changed code and/or code coverage information to a history analyzer 1506, and a fifth set of machine codes from the native instruction set for assigning likely relevance of tests to the new or changed code, based on historical information. Memory 1564 may store a sixth set of machine codes from the native instruction set for communicating such changed code and/or code coverage information to a statistical analyzer 1508, and a seventh set of machine codes from the native instruction set for determining statistical relevance of one or more tests to one or more sections of code, preferably new or changed code.
More preferably, an output correlator 1604 receives information from history analyzer 1506 and statistical analyzer 1508, and transmits this information to machine learning analyzer 1602. Such transmission may enable the information to be rendered in the correct format for machine learning analyzer 1602. Optionally, if history analyzer 1506 and statistical analyzer 1508 are also implemented according to machine learning, or other adjustable algorithms, then feedback from machine learning analyzer 1602 may be used to adjust the performance of one or both of these components.
Once a test stage finishes executing, optionally with a “grace” period for all agents to submit data (and the API gateway to receive it), then preferably the following data is available to machine learning analyzer 1602: a build map, a test list, and time slices. A Build map relates to the code of the build and how it has changed. For example, this may be implemented as a set of unique IDs+code element IDs which are persistent across builds. The test list is a list of all tests and their start/end timing. Time slices preferably include high-time-resolution slicing of low-coverage-resolution data (e.g. file-level hits [or method hits] in 1-second intervals).
The first step is to process the data to correlate the footprint per test (or a plurality of tests when tests are run in parallel). The second step is model update for the machine learning algorithm. Based on the build history, the latest available model for a previous build is loaded (ideally this should be the previous build).
If no such model exists, it is possible to assume an empty model with no data, or an otherwise untrained machine learning algorithm. The model consists of a set of test+code element id mappings (which are the keys), each with a floating point number that indicates the correlation between the test and the code element id. Such correlation information is preferably determined by statistical analyzer 1508. For example, a “1.0” means the highest correlation, whereas a “0” means no correlation at all (the actual numbers will probably be in between).
For any test+code element id, the method preferably updates each map element, such as each row, according to the results received. For example, updating may be performed according to the following formula: NewCorrelation[test i, code element id j] = OldCorrelation[test i, code element id j] * 0.9 + (0.1 if there is a hit, 0 otherwise). This type of updating is an example of a heuristic which may be implemented in addition to, or in place of, a machine learning algorithm. Preferably these coefficients always sum up to 1.0, so there is effectively a single coefficient, which determines the speed of adaptation (in number of builds). For example, it is possible to compute a new statistical model after each set of tests is run, optionally per build.
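The update heuristic above may be sketched as follows, for illustration only. The function update_correlations is hypothetical, and the sketch assumes that the model is a dictionary keyed by (test, code element id) pairs and that the hits observed in the latest run are given as a set of such pairs. Keys that are absent from the model are treated as having a previous correlation of 0.

```python
def update_correlations(model, hits, decay=0.9):
    """Apply new = old * decay + (1 - decay if hit else 0) to every known pair.

    model: {(test, code_element_id): correlation in [0, 1]}
    hits:  set of (test, code_element_id) pairs observed in the latest run
    """
    updated = {}
    for key in set(model) | set(hits):
        old = model.get(key, 0.0)
        bonus = (1.0 - decay) if key in hits else 0.0
        updated[key] = old * decay + bonus
    return updated
```

With decay set to 0.9, a pair that is hit on every build converges toward 1.0 and a pair that stops being hit loses 10% of its correlation per build, so the single coefficient controls how many builds the model needs to adapt.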
Next preferably a cleanup step is performed, in which old correlations are deleted for code elements that no longer exist in the new build. Optionally a further cleanup step is performed, in which old tests are deleted, as well as methods that are only very weakly correlated with tests (e.g. a correlation of less than 0.1).
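Both cleanup steps may be sketched together as a single filter over the model, purely by way of example. The function clean_model and its parameter names are hypothetical; the sketch assumes the same dictionary layout keyed by (test, code element id) pairs and takes the 0.1 threshold from the example above.

```python
def clean_model(model, live_elements, live_tests, min_correlation=0.1):
    """Drop entries for removed code elements, removed tests and weak correlations."""
    live_elements = set(live_elements)
    live_tests = set(live_tests)
    return {
        (test, element): score
        for (test, element), score in model.items()
        if element in live_elements
        and test in live_tests
        and score >= min_correlation
    }
```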
Optionally tests are selected for implementation according to a variety of criteria, once the statistically likely relationship between a particular test that is executed and the related code has been established. Such criteria may be determined according to test impact analytics. These test impact analytics consider the impact of the code and/or of the test on the code, in order to build a list of tests to be performed. Optionally the list is built according to the above described order of relative importance. The list may also comprise a plurality of lists, in which each list may contain a preferred order of tests. Such lists may be assigned for performance in a particular order and/or according to a particular time schedule. For example and without limitation, one list of a plurality of tests may be executed immediately, while another such list may be executed at a later time, which may for example and without limitation be a particular time of day or day of the week.
Another consideration for the creation of one or more lists is the implementation of a minimal set of tests as opposed to a full test review. For example, a list may contain a minimal necessary set of tests. Alternatively or additionally, at least one list may comprise a full set of tests only on code that the end user is actively using, or causing to execute. Such a list may be preferentially implemented, while a full set of tests on code that the user is not actively using may not be preferentially implemented, may be implemented with a delay or when resources are free, or may not be implemented at all.
The test impact analytics may include such criteria as preferred implementation of new and modified tests; tests selected by the user, failed tests, and/or tests that are selected according to code that actual end users cause to execute when the code is in production. Other important criteria for the code that may influence test selection include highly important code sections or code that is considered to be mission critical, as well as code that has required significant numbers of tests according to past or current criteria, with many interactions with other sections of code and/or with many new commits in the code.
Turning now to
A DBN is a type of neural network composed of multiple layers of latent variables (“hidden units”), with connections between the layers but not between units within each layer.
A CNN is a type of neural network that features additional separate convolutional layers for feature extraction, in addition to the neural network layers for classification/identification. Overall, the layers are organized in 3 dimensions: width, height and depth. Further, the neurons in one layer do not connect to all the neurons in the next layer but only to a small region of it. Lastly, the final output will be reduced to a single vector of probability scores, organized along the depth dimension. It is often used for audio and image data analysis, but has recently been also used for natural language processing (NLP; see for example Yin et al, Comparative Study of CNN and RNN for Natural Language Processing, arXiv: 1702.01923v1 [cs.CL] 7 Feb. 2017).
Outputs 1812 from AI engine 1806 may then be provided as test relevance ranking 1804, as previously described.
The training data that is provided is preferably balanced and representative of the types of actual data that will be processed by the machine learning model.
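One possible way to check and enforce balance in the training data is sketched below. Downsampling the larger classes is an assumption made for this example; the description does not state how balance is achieved, and the function names and the representation of an example as a (features, label) pair are hypothetical.

```python
import random
from collections import Counter, defaultdict


def class_balance(labels):
    """Return the fraction of examples carrying each label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}


def downsample_to_balance(examples, rng):
    """Cut every class down to the size of the smallest class.

    examples is a list of (features, label) pairs.
    """
    by_label = defaultdict(list)
    for example in examples:
        by_label[example[1]].append(example)
    smallest = min(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, smallest))
    rng.shuffle(balanced)  # avoid presenting the classes in blocks
    return balanced
```

Balance by itself does not make the data representative; that property depends on drawing the examples from the same kinds of data the model will process.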
While the invention has been described with respect to a limited number of embodiments, it will be appreciated that many variations, modifications and other applications of the invention may be made, including different combinations of various embodiments and sub-embodiments, even if not specifically described herein.
It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable sub-combination.
Although the invention has been described in conjunction with specific embodiments thereof, it is evident that many alternatives, modifications and variations will be apparent to those skilled in the art. Accordingly, it is intended to embrace all such alternatives, modifications and variations that fall within the spirit and broad scope of the appended claims. All publications, patents and patent applications mentioned in this specification are herein incorporated in their entirety by reference into the specification, to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated herein by reference. In addition, citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the present invention.
Number | Date | Country
---|---|---
62372419 | Aug 2016 | US
62906215 | Sep 2019 | US

Relation | Number | Date | Country
---|---|---|---
Parent | 17371144 | Jul 2021 | US
Child | 18476558 | | US
Parent | 16323263 | Feb 2019 | US
Child | 17371144 | | US
Parent | 17024740 | Sep 2020 | US
Child | 18061463 | | US

Relation | Number | Date | Country
---|---|---|---
Parent | 18476558 | Sep 2023 | US
Child | 18637485 | | US
Parent | 18061463 | Dec 2022 | US
Child | 18637485 | | US