Embodiments of the present invention generally relate to machine learning models and to testing machine learning models, including very large machine learning models. More particularly, at least some embodiments of the invention relate to systems, hardware, software, computer-readable media, and methods for testing very large machine learning models.
Machine learning models are examples of applications that become more accurate in generating predictions without being specifically programmed to generate the predictions. There are different manners in which machine learning models learn. Examples of learning include supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning.
Generally, a machine learning model is trained with certain types of data. The data may depend on the application. Once trained or once the machine learning model has learned from the training data, the machine learning model is prepared to generate predictions using real data.
Training a machine learning model, however, can be costly. This is particularly true for certain machine learning models such as VLMs (Very Large Models). VLMs may have, for example, on the order of a trillion parameters. As a result, training and testing VLMs can be costly from both economic and time perspectives.
These VLM training and testing difficulties can present problems whenever a change is made to anything associated with the operation of the VLM. If a change is made to the dataset, the model pipeline, or the codebase, there is a need to ensure that the VLM remains valid. In fact, there are many instances where it is critical to have quality and performance guarantees, such as in self-driving vehicles. Accordingly, example embodiments disclosed herein address issues associated with retraining and retesting VLMs while minimizing costs and ensuring that changes surrounding the VLMs do not adversely impact the behavior of the VLMs.
In order to describe the manner in which at least some of the advantages and features of the invention may be obtained, a more particular description of embodiments of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. Understanding that these drawings depict only typical embodiments of the invention and are not therefore to be considered to be limiting of its scope, embodiments of the invention will be described and explained with additional specificity and detail through the use of the accompanying drawings, in which:
Embodiments of the present invention generally relate to machine learning models including very large machine learning models (VLMs), referred to generally herein as models. More particularly, at least some embodiments of the invention relate to systems, hardware, software, computer-readable media, and methods for unit testing of very large machine learning models.
Model management relates to managing models and ensures that the models meet expectations and business requirements. Model management also ensures that models are properly stored, retrieved, delivered in an up-to-date state, and the like. Embodiments of the invention relate to increasing quality assurance when a change or changes are made to a model pipeline, model datasets, model codebase, or the like. Embodiments of the invention are able to retrain and/or retest a model while reducing or minimizing costs.
Retraining and/or retesting models such as VLMs can be cost prohibitive and embodiments of the invention ensure that, when a change that may impact the behavior of a model occurs, the training and validation behavior remains the same or sufficiently close to the expected behaviors of the model prior to the change. In order to retrain and/or retest in a more cost-effective manner, embodiments of the invention may generate a small or proxy version of a model using compression, such as neural network compression. Embodiments of the invention may perform unit testing on compressed models.
A framework is provided that allows specific tests to be created for a given functionality of a model such as a VLM. For example, a test for the expected final training error or the expected validation error curve may be created. These tests are executed using the proxy or compressed versions of the models. Embodiments of the invention relate to unit testing and neural network compression in a single framework.
Aspects (e.g., functionality, behavior, metrics) of models can be tested using unit tests. A unit test, which may be automated, helps ensure that a particular unit of code or other aspect of a model is performing the desired behavior. The unit of code being tested may be a small module of code or relate to a single function or procedure. In some examples, unit tests may be written in advance.
Model compression allows a compact version of a model to be generated. Compression is often achieved by decreasing the resolution of a model's weights or by pruning parameters. Embodiments of the invention ensure that the compressed model is small and achieves similar performance on selected metrics with respect to the original uncompressed model. The compressed models may be, by way of example only, 10%-20% of the size of the original models while still achieving comparable metrics.
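By way of illustration only, the two compression approaches mentioned above (pruning parameters and decreasing the resolution of weights) can be sketched as follows. This is a minimal sketch over a flat list of weights, not the compression method of any particular embodiment; the function names `prune_by_magnitude` and `quantize_uniform`, and the choice of magnitude-based pruning and uniform quantization, are assumptions made for the example.

```python
def prune_by_magnitude(weights, keep_fraction):
    """Zero out all but the largest-magnitude weights (hypothetical helper)."""
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    keep = max(1, round(len(weights) * keep_fraction))
    # Indices of the `keep` weights with the largest absolute value.
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    kept = set(ranked[:keep])
    return [w if i in kept else 0.0 for i, w in enumerate(weights)]


def quantize_uniform(weights, levels):
    """Snap each weight to one of `levels` evenly spaced values (hypothetical helper)."""
    if levels < 2:
        raise ValueError("levels must be at least 2")
    lo, hi = min(weights), max(weights)
    if hi == lo:
        return list(weights)
    step = (hi - lo) / (levels - 1)
    # Lower resolution: every weight is replaced by its nearest grid point.
    return [lo + round((w - lo) / step) * step for w in weights]
```

A real compressed model would also need to be evaluated against the selected metrics of the original model; the sketch only shows how the parameter count or resolution might be reduced.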
A framework is further provided for determining whether the passing or failing of a unit test is due to an underlying problem with the data pipeline or to changes in the underlying distribution of the data used to train or execute the models. That is, if a change in the distribution (i.e., data drift) exists in the domain, the unit tests with the compressed models may fail in cases in which the VLM still retains total (or sufficient) functionality. Thus, the framework disclosed herein accounts for changes in the data distribution between the time at which the VLM (and the compressed test model) is trained and the time at which the test (with the compressed model) takes place. This helps to ensure that the training and validation behavior of these models remains close to the expected behavior, which helps to avoid costly retraining or revalidation every time a change is made to the codebase or data.
One example method includes generating a first test metric using an unknown dataset and second test metrics using shifted datasets that are shifted versions of a known dataset. A determination is made of a data distribution difference between the unknown dataset and one of the shifted datasets that is closest to the unknown dataset. A determination is made if the data distribution difference is less than or equal to a first known threshold, and in response, applying the data distribution difference to a correlation model to determine an estimated test metric difference. A determination is made of a test metric difference between the first test metric and a second test metric associated with the one of the shifted datasets that is closest to the unknown dataset. A determination is made if a difference between the test metric difference and the estimated test metric difference is less than or equal to a second known threshold.
The method 100 may begin in different manners. For example, the method 100 may begin by selecting 102 a model that has already been trained. If a compressed model (CM) for the selected model exists (Yes at 104), the method may spawn 118 automatic unit tests. Spawning tests 118 may include recommending tests for execution. These tests may have been developed in advance and may be automatically associated with the CM.
If the CM does not exist (No at 104), the model may be compressed 110. If the model is not compressed, the method ends 122. If a compressed model is generated (Yes at 110), the compressed model is run or executed 112 using a data pipeline 106. Metadata generated from running the compressed model is stored 120 and unit tests may be created or spawned 118.
Another starting point is to train 108 a model and then compress (Yes at 110) the model. If the model is not compressed (No at 110), the method may end 122. If there is a need to compress 110 the model that has been trained 108 (Yes at 110), a compression model is run 112 based on data from a data pipeline 106. The output of the compression model is stored 120 as CM metadata and automatic unit tests are spawned 118.
Training 108 a model, particularly a very large model, may require access to large amounts of storage and multiple processors or accelerators. Training the model may require days or weeks, depending on the resources. Because of the time required to train the model or for other reasons, embodiments of the invention may store metadata associated with training the model. The metadata generated and/or stored may include, but is not limited to, training/validation loss evolution, edge cases with bad prediction, timestamps for waypoints along training/validation, or the like. These metadata can be used for various automatic unit tests. More specifically, the unit test may generate or be associated with metadata that can be compared to the metadata generated during training or collected for validation of the model.
As previously stated, compressing a model into a CM is performed and metadata associated with training and validating the CM are stored. Embodiments of the invention do not require the CM to achieve the same level of accuracy or other metric as the original model. Rather, the CM serves as a valid proxy when the metric or other output is reasonable. Reasonable may be defined by a threshold value or percentage. Further, the assessment of the metric or output can be based on hard (exact) or soft (within a threshold deviation) standards.
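The distinction between hard and soft standards can be sketched as follows. This is an illustrative example only; the function name `metrics_match`, the use of a relative tolerance, and the default value of 5% are assumptions and are not prescribed by the embodiments.

```python
def metrics_match(expected, observed, mode="soft", rel_tol=0.05):
    """Compare a stored metric with a newly observed one (hypothetical helper).

    mode="hard": the values must be exactly equal.
    mode="soft": the relative deviation must not exceed rel_tol.
    """
    if mode == "hard":
        return expected == observed
    if mode != "soft":
        raise ValueError("mode must be 'hard' or 'soft'")
    scale = abs(expected)
    if scale == 0.0:
        # No meaningful relative deviation; fall back to an absolute check.
        return abs(observed) <= rel_tol
    return abs(observed - expected) / scale <= rel_tol
```

For instance, with a stored validation accuracy of 0.90, an observed value of 0.88 deviates by roughly 2.2% and would pass a 5% soft comparison while failing a hard comparison.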
Embodiments of the invention may rely on the relationship between the metadata gathered or generated by the CM and the metadata gathered or generated by the original model. When running a unit test, the current training or validation data or metrics (metadata) generated by running or executing the CM with the change may be compared to the metadata stored in association with the model prior to the change.
Regardless of the starting point of the method 100 (selecting 102 or training 108 a model), once a CM is associated with a model and metadata for the CM has been generated, a series of automatic unit tests can be created or spawned 118. These unit tests may assert a hard or soft comparison between the metadata of the stored CM with the metadata of the CM based on the modified code base.
In addition, embodiments of the invention allow a user to create 116 additional unit tests, for example via a manual interface 114. These unit tests can be based on any metadata related to the CMs and may be created to address cases or situations that are not covered by the automatically generated unit tests.
In general, the method 100 may be represented more compactly by the method 148 performed in the framework of method 100. The method 148 may include training/selecting 150 a model. The trained/selected model is compressed 152 to generate a compressed model. In one example, the trained/selected model may already be associated with a compressed model and the compressed model does not need to be generated. Unit tests can be created or spawned 154 for the compressed model. Additional unit tests can be created 156 for the compressed model.
Whenever there is a change that impacts the model 202, it may be necessary to determine whether the behavior or other aspect of the model 202 is affected. In this example, the model 202 is impacted by or associated with a change 204. The change 204 may be a change to the training data or other data set, the codebase of or used by the model 202, the pipeline or the like. The metadata 214 is generated from operation of the CM 210.
The unit test 216 can be performed separately or independently on the metadata 212 and the metadata 214. Thus, the unit test 216 generates an output 218 from the metadata 212 and the unit test 216 generates an output 220 from the metadata 214. The outputs 218 and 220 are compared 222 to generate a result 224. The result 224 may indicate whether the model 202 is operating as expected or whether any change in behavior is acceptable in light of the change 204. Stated differently, the result 224 may indicate that the behavior, prediction, or other aspect of the model 202 is operating properly or is valid for the aspect of the model 202 tested by the unit test 216.
As illustrated in
Embodiments of the invention allow the behavior of the model 202 to be evaluated based on unit tests that are applied to the CM 210. More specifically, the behavior of the model 202 can be compared to the behavior of the CM 210. The behavior of the CM 210, which is operated in the context of the change 204, allows the impact of the change 204 on the model 202 to be determined and to determine whether the behavior of the model 202 will be acceptable in light of the change 204.
As previously stated, unit tests may be generated automatically. Once a CM is generated, unit tests can be automatically associated with the CM. This is one way to identify which unit tests should be performed in the event of the change 204. Further, unit tests can be suggested (e.g., based on actions of other users or based on unit tests for similar models) to the user. Unit tests may also be created.
Unit tests can be created to test different functions, metrics, or other aspects of models and may be specific to changes or to the type of the change. Thus, changes impacting the codebase may be performed with specific metadata or metrics related to the part of the codebase that was changed. Unit testing is often used in test-driven machine learning development. This allows tests to be written in order to detect changes to intended behavior. This allows for development to be performed rapidly.
In the context of very large machine learning models, automatic unit testing using CMs overcomes the problem of having to test the actual model. Unit tests can be generated based on generic algorithms, based on feedback, or the like.
For example, the unit test 216 may be an inner model metric unit test. In this case, the unit test attempts to measure deviation from established inner model metrics. For a given dataset (or portion thereof), for example, a certain final state or behavior may be expected. The metric can involve a single hidden layer, two or more hidden layers, interactions between those layers, or the like.
When the output 220 (for the CM 210 with the change 204) is sufficiently close or equal to the output 218 (for the model 202 without the change), then the test may be a success. More specifically, the unit test is performed on metadata 214 generated by the CM 210 rather than the model 202 itself because, as previously stated, testing very large machine learning models takes substantial time and/or cost. Thus, the output 220 is associated with the CM 210 and gives an indication of how the change 204 impacted the original model 202.
If the deviation (e.g., difference between the output 220 and the output 218) is sufficiently small or within a threshold (e.g., 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, or other value), the test may be a success. In this example, the metadata associated with an inner model metric unit test may include values pertaining to hidden layers of the model/CMs in relation to a given dataset or portion thereof. These metadata serve to assert the expected behavior of the model with respect to a given set of input samples and allow the functionality of the model 202 to be tested using the CM 210 that is operated in the context of the change 204.
In another example, the unit test 216 may be an output metric unit test. Output metric unit tests are configured to compare the output 218 (e.g., a prediction or inference) associated with the model 202 with the output 220 associated with the CM 210. The output metric unit test is thus configured to determine the impact of a change to the codebase (e.g., data processing, pipeline code changes). In this example, the changes to the codebase do not affect the input entering the CM 210. If the CM is deterministic, then the outputs 218 and 220 can be compared. More specifically, the output metric unit test may perform a soft comparison as changes to the dataset or output may be expected. In one example, only minor changes are expected. Thus, a threshold between the outputs 218 and 220 can be determined. In this example, the metadata 212 and 214 may include values output by the CM with respect to a given dataset or set of datasets thereof. If a soft comparison is performed, the unit test may be successful if the deviation or difference is within a threshold or is acceptable to a user.
The unit test 216 may be an evolution metric unit test. This type of unit test is configured to compare the evolution of a given metric across an interval of time or steps, such as the validation loss curve. The metadata may include values related to the evolution of one or more metrics across time, such as for training, validation, or the like.
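An evolution metric unit test of this kind might be sketched as below. The sketch assumes that the stored and current curves are sampled at the same steps and that the comparison uses the largest pointwise deviation; both the function name `curves_match` and the default tolerance are hypothetical choices for the example.

```python
def curves_match(reference, current, max_abs_dev=0.05):
    """Compare two metric curves step by step (hypothetical helper).

    Returns (passed, worst), where `worst` is the largest absolute
    deviation between the curves at any step.
    """
    if len(reference) != len(current):
        raise ValueError("curves must be sampled at the same steps")
    if not reference:
        raise ValueError("curves must not be empty")
    worst = max(abs(r - c) for r, c in zip(reference, current))
    return worst <= max_abs_dev, worst
```

Other aggregate measures, such as the mean deviation or the deviation of only the final value, could be substituted depending on which aspect of the evolution is being asserted.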
The change 204 may include changes to the model pipeline, datasets, or codebase. For example, datasets used in machine learning models undergo processing. The change 204 may be related to data ETL (Extract-Transform-Load). This is a process of moving and transforming data from an environment where the data is stored to a volume where it can be used, such as by a machine learning model. This may include feature extraction, parameter related processing, or the like. Any modification to the ETL process (e.g., the change 204) may affect the behavior of the model 202. As a result, unit tests may be created to determine whether changes to the ETL in the context of the CMs have affected the behavior of the original model. Thus, the impact of the ETL changes on the model 202 can be determined based on the output 220 using the metadata 214 of the CM 210.
The change 204 may relate to library updates or rollbacks. When there is a modification to a library used to process or model a codebase (e.g., Machine Learning framework libraries), it is useful to test for the expected behavior of the model based on how these changes relate to how the model is trained, runs, or is stored.
The change 204 may relate to hardware changes. Modifications to the hardware (e.g., CPU (Central Processing Unit)/GPU (Graphical Processing Unit) version) running the model may impact the behavior of the model. It may be useful to ensure that these changes do not change or only minimally change (within a threshold) the expected behavior.
As previously suggested, unit tests can be performed to ensure that expected behavior does not change or that the behaviors do not deviate from expected behavior by more than a threshold. Embodiments of the invention integrate model compression and unit testing in the same framework.
As discussed previously, the framework discussed above in relation to
The embodiments disclosed here provide for an extension and adaptation to the framework of
The embodiments disclosed herein have two main phases: offline and online. The goal of the offline phase is to learn a model of how the VLM behaves under data distribution shifts in terms of its relevant metrics. The goal of the online phase is to detect false positives/negatives of VLM unit tests. That is, the online phase determines whether a unit test's failing or passing is related to the data having shifted or is due to the actual test having passed or failed.
In the offline phase, a VLM is trained and tested, and relevant metrics are collected. Then, perturbation functions are applied to a test dataset to obtain variations of the dataset (i.e., shifted datasets) and to collect relevant metrics with the variations of the dataset. These perturbation test metrics are then correlated with the shifted datasets so that an estimation model for test metrics on unknown shifted datasets can be obtained.
FA illustrates a training and testing stage 300 that in operation goes from an untrained VLM and a given dataset to a trained VLM. The training and testing stage 300 can then generate both training and testing metrics that are computed using a training dataset and a testing dataset. In one embodiment, the training and testing stage 300 is implemented using the framework discussed above in relation to
As illustrated, the training and testing stage 300 includes a VLM 302, which may correspond to the VLMs discussed in relation to
As illustrated, the compression stage 320 includes the VLM 302, the training dataset 304, and the trained VLM 308. The VLM 302 and/or the trained VLM 308 are compressed 322 using any compression method to generate a compressed VLM 324.
As mentioned, the set of perturbation functions 332 are applied to test dataset 334 to obtain the shifted test dataset 336. In the illustrated embodiment, the shifted test dataset 336 includes a shifted test dataset D1test 336A, a shifted test dataset D2test 336B, a shifted test dataset D3test 336C, and any number of additional shifted test datasets Ditest 336D as illustrated by the ellipses.
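By way of example only, a set of perturbation functions and their application to a test dataset can be sketched as follows. The dataset representation (a list of `(features, label)` tuples) and the three perturbations shown (additive Gaussian noise, feature scaling, and dropping samples of one label) are assumptions made for illustration; the embodiments do not limit the perturbation functions to these.

```python
import random


def add_noise(dataset, sigma, seed=0):
    """Add zero-mean Gaussian noise to every feature (hypothetical perturbation)."""
    rng = random.Random(seed)
    return [([x + rng.gauss(0.0, sigma) for x in features], label)
            for features, label in dataset]


def scale_features(dataset, factor):
    """Multiply every feature by a constant factor (hypothetical perturbation)."""
    return [([x * factor for x in features], label) for features, label in dataset]


def drop_label(dataset, label, keep_prob, seed=0):
    """Keep samples of `label` only with probability keep_prob, shifting label balance."""
    rng = random.Random(seed)
    return [(f, y) for f, y in dataset if y != label or rng.random() < keep_prob]


def make_shifted_datasets(dataset, perturbations):
    """Apply each perturbation to the same base dataset, one shifted dataset per function."""
    return [perturb(dataset) for perturb in perturbations]
```

Each perturbation returns a new list and leaves the base test dataset unmodified, so the same base dataset can be reused to produce any number of shifted variants.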
Each shifted dataset 336A-336D is then tested 338 by the compressed VLM 324 to obtain a set of shifted test metrics (TMi for each test metric) 340. That is, each application of a perturbation function to the test dataset 334 will result in a different shifted test metric 340. In the illustrated embodiment, the shifted test metrics 340 include a shifted test metric TM1 340A that is obtained by testing the shifted test dataset D1test 336A, a shifted test metric TM2 340B that is obtained by testing the shifted test dataset D2test 336B, a shifted test metric TM3 340C that is obtained by testing the shifted test dataset D3test 336C, and any number of additional shifted test metrics TMi 340D, as illustrated by the ellipses, that are obtained by testing the additional shifted datasets Ditest 336D. Thus, each shifted test metric 340A-340D is associated with a shifted test dataset 336A-336D.
Having computed the set of shifted test metrics 340, two values are computed for each combination of a (Ditest, TMi) pair with another (Djtest, TMj) pair: Qji, the difference between Ditest and Djtest; and Rji, the difference between TMi and TMj. In other words, an aggregate distance is computed between the two value pairs to output a single value. The aggregate distance can be computed using any reasonable function such as finding an average between the two value pairs.
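The computation of the Q and R values over all pairs of shifted datasets can be sketched as follows. The sketch assumes that each dataset is reduced to a list of scalar values, that the dataset distance is supplied as a function, and that the metric difference is taken as an absolute value; the helper names `pairwise_differences` and `mean_gap` are hypothetical.

```python
from itertools import combinations


def mean_gap(a, b):
    """Toy dataset distance: absolute difference of the sample means (assumption)."""
    return abs(sum(a) / len(a) - sum(b) / len(b))


def pairwise_differences(datasets, metrics, dataset_distance):
    """Return one (Q, R) tuple for every unordered pair of shifted datasets.

    Q is the distance between the two datasets and R is the absolute
    difference between the test metrics obtained on them.
    """
    if len(datasets) != len(metrics):
        raise ValueError("need exactly one test metric per dataset")
    pairs = []
    for i, j in combinations(range(len(datasets)), 2):
        q = dataset_distance(datasets[i], datasets[j])
        r = abs(metrics[i] - metrics[j])
        pairs.append((q, r))
    return pairs
```

In practice the dataset distance would more plausibly be a divergence between distributions rather than a difference of means; the toy distance is used here only to keep the example self-contained.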
The (Qji, Rji) pairs can then be used to build a correlation model S. In operation, the correlation model S allows for an estimation of the expected difference in the test metric R given the difference in data distribution, as will be explained in more detail to follow. That is, the correlation model S estimates the difference between the test metrics given the distance between datasets.
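One simple way to realize such a correlation model is an ordinary least-squares line fitted to the (Q, R) pairs, as sketched below. The linear form is an assumption made for illustration only; the embodiments do not specify the form of the correlation model S, and any regression technique could be substituted. The function name `fit_correlation_model` is hypothetical.

```python
def fit_correlation_model(pairs):
    """Fit R ~ slope * Q + intercept by least squares (hypothetical helper).

    `pairs` is a list of (Q, R) tuples. Returns a function that maps a
    dataset distance Q to an estimated test metric difference.
    """
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two (Q, R) pairs")
    mean_q = sum(q for q, _ in pairs) / n
    mean_r = sum(r for _, r in pairs) / n
    var_q = sum((q - mean_q) ** 2 for q, _ in pairs)
    if var_q == 0.0:
        raise ValueError("all Q values are identical; slope is undefined")
    slope = sum((q - mean_q) * (r - mean_r) for q, r in pairs) / var_q
    intercept = mean_r - slope * mean_q

    def estimate(q):
        # Estimated test metric difference for a given dataset distance.
        return slope * q + intercept

    return estimate
```

The returned function plays the role of the correlation model: given a distance between two datasets, it yields the test metric difference that would be expected from the offline observations.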
As shown at 416, the unit test was expected to pass, but it failed and as shown at 418, the unit test was expected to fail, but it passed. Since the dataset is a known dataset, a user will know not to trust these results and therefore can perform retraining or revalidation of the data pipeline as needed.
The embodiments disclosed herein provide for an online phase where checks of the unit tests for the VLM can then be made against the unknown dataset Dx using test metrics and the correlation model S generated during an offline phase. The idea is that, if the distribution shift of the unknown dataset Dx in relation to the datasets on which the VLM was tested (e.g., shifted test datasets 336) is small enough, both the passing and failing of a unit test can be trusted as likely being valid. In such a case, the unit test's failing or passing is likely related to the data having shifted and no retraining or revalidation of the data pipeline is needed.
If, however, the distribution shift of the unknown dataset Dx in relation to the datasets on which the VLM was tested is large enough, it is possible that if the results show a test passing, this is a false positive, or if the results show a test failing, this is a false negative. Thus, a user may be uncertain if the unit test's failing or passing is due to the actual test having passed or failed or due to the false positive/negative. In such cases, retraining or revalidation of the data pipeline may be needed to determine the cause of the unit test's failing or passing. However, any such retraining or revalidation will be less than the case of
The unknown dataset Dx 502 and the shifted test datasets 336A, 336B, and 336C are applied to a distance function 510 to determine a distribution distance quantity or value Qij between the unknown dataset Dx 502 and each of the shifted test datasets 336A, 336B, and 336C. The distance function 510 then determines which distribution distance between the unknown dataset Dx 502 and each of the shifted test datasets 336A, 336B, and 336C is the shortest, or in other words which one of the shifted test datasets 336 is closest to the unknown dataset Dx 502 based on having the shortest or smallest distribution distance Qij 512 with the unknown dataset Dx 502. In some embodiments the distance function 510 may be a histogram of label distributions for a classification task or the continuous distribution for a regression task. Then, it would be possible to calculate a probability distribution divergence as the distance between the two datasets.
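For a classification task, the label-histogram approach described above might be sketched as follows. The choice of the Jensen-Shannon divergence as the probability distribution divergence is an assumption made for the example (it is symmetric and remains finite when a class is absent from one dataset); the function names are hypothetical.

```python
import math
from collections import Counter


def label_histogram(labels, classes):
    """Normalized histogram of labels over a fixed, ordered list of classes."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    return [counts[c] / len(labels) for c in classes]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability in x contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def closest_dataset(unknown_labels, shifted_label_sets, classes):
    """Return (index, distance) of the shifted dataset nearest to the unknown one."""
    target = label_histogram(unknown_labels, classes)
    distances = [js_divergence(target, label_histogram(s, classes))
                 for s in shifted_label_sets]
    best = min(range(len(distances)), key=distances.__getitem__)
    return best, distances[best]
```

The returned distance corresponds to the smallest distribution distance between the unknown dataset and the shifted test datasets, and the returned index identifies which shifted test dataset produced it.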
As shown in
As part of the further test, the test metric difference Rij 516 between the test metric 504 obtained from the unknown dataset Dx 502 and the shifted test metric 340A obtained from the shifted test dataset 336A is calculated as shown in
For example, as shown in
It is noted with respect to the disclosed methods, including the example method of
Directing attention now to
The method 600 includes generating a first test metric from a machine learning model using an unknown dataset (610). For example, as previously described, the test metric 504 is generated from the compressed VLM 324 using the unknown dataset Dx 502.
The method 600 includes generating a plurality of second test metrics from the machine learning model using a plurality of shifted datasets, the plurality of shifted datasets being shifted versions of a known dataset (620). For example, as previously described the shifted test metrics 340A-340D are generated using the compressed VLM 324 using the shifted test datasets 336A-336D. The shifted test datasets 336A-336D are shifted from the test dataset 334 based on the application of the perturbation functions 332A-332D.
The method 600 includes determining a data distribution difference between the unknown dataset and one of the plurality of shifted datasets that is closest to the unknown dataset (630). For example, as previously described, the distance function 510 determines the distribution distance Qij 512 between the unknown dataset Dx 502 and the shifted test dataset 336A. In some embodiments, this is done by finding all the distribution differences between the unknown dataset Dx 502 and the shifted test datasets 336A-336D and then finding the test dataset having the shortest distance to the unknown dataset Dx 502.
The method 600 includes determining if the data distribution difference is less than or equal to a first known threshold (640). For example, as previously described the distribution distance Qij 512 is checked to see if it is less than or equal to the first known threshold E.
The method 600 includes in response to determining that the data distribution difference is less than or equal to the first known threshold, applying the data distribution difference to a correlation model to determine an estimated test metric difference (650). For example, as previously described when the distribution distance Qij 512 is less than or equal to the first known threshold E, the distribution distance Qij 512 can be applied to the correlation model S 372 to determine the estimated test metric difference R* 514.
The method 600 includes determining a test metric difference between the first test metric and a second test metric associated with the one of the plurality of shifted datasets that is closest to the unknown dataset (660). For example, as previously described the test metric difference Rij 516 between the test metric 504 obtained from the unknown dataset Dx 502 and the shifted test metric 340A is determined.
The method 600 includes determining if a difference between the test metric difference and the estimated test metric difference is less than or equal to a second known threshold (670). For example, as previously described, the difference between the estimated test metric difference R* 514 and the metric difference Rij 516 is checked against the second known threshold as follows: |Rij 516−R* 514|≤τ. If the difference is low (less than or equal to the second known threshold τ), then the unit tests can be considered trustworthy and there will be no need for retraining or revalidation of the data pipeline. However, if the difference is high (above the second known threshold τ), then the unit tests are not considered trustworthy and retraining or revalidation of the data pipeline is likely needed to verify if the unit tests are correct.
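The two threshold checks of the method 600 can be summarized in the following minimal sketch. It assumes that the distribution distance to the closest shifted dataset has already been determined, that the correlation model is available as a callable, and that the test metric difference is taken as an absolute value; the function and parameter names are hypothetical, with `epsilon` standing for the first known threshold and `tau` for the second known threshold.

```python
def unit_test_is_trustworthy(q_closest, tm_unknown, tm_closest,
                             correlation_model, epsilon, tau):
    """Decide whether unit test results on an unknown dataset can be trusted.

    q_closest:         distribution distance to the closest shifted dataset
    tm_unknown:        test metric obtained on the unknown dataset
    tm_closest:        test metric obtained on the closest shifted dataset
    correlation_model: callable mapping a distance to an estimated metric difference
    epsilon, tau:      first and second known thresholds
    """
    # First check: the data shift must be small enough.
    if q_closest > epsilon:
        return False
    # Estimated metric difference for this amount of shift.
    r_estimated = correlation_model(q_closest)
    # Observed metric difference against the closest shifted dataset.
    r_observed = abs(tm_unknown - tm_closest)
    # Second check: observed and estimated differences must agree within tau.
    return abs(r_observed - r_estimated) <= tau
```

A result of `False` does not by itself indicate a defect; it indicates only that the passing or failing of the unit tests should not be relied upon without retraining or revalidation.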
Following are some further example embodiments of the invention. These are presented only by way of example and are not intended to limit the scope of the invention in any way.
Embodiment 1. A method, comprising: generating a first test metric from a machine learning model using an unknown dataset; generating a plurality of second test metrics from the machine learning model using a plurality of shifted datasets, the plurality of shifted datasets being shifted versions of a known dataset; determining a data distribution difference between the unknown dataset and one of the plurality of shifted datasets that is closest to the unknown dataset; determining if the data distribution difference is less than or equal to a first known threshold; in response to determining that the data distribution difference is less than or equal to the first known threshold, applying the data distribution difference to a correlation model to determine an estimated test metric difference; determining a test metric difference between the first test metric and a second test metric associated with the one of the plurality of shifted datasets that is closest to the unknown dataset; and determining if a difference between the test metric difference and the estimated test metric difference is less than or equal to a second known threshold.
Embodiment 2. The method as recited in embodiment 1, wherein determining that the data distribution difference is greater than the first known threshold is indicative of a false positive or false negative and that retraining or revalidation of the machine learning model is to be performed.
Embodiment 3. The method as recited in any of embodiments 1-2, wherein determining that the difference between the test metric difference and the estimated test metric difference is greater than the second known threshold is indicative of a false positive or false negative and that retraining or revalidation of the machine learning model is to be performed.
Embodiment 4. The method as recited in any of embodiments 1-3, wherein determining that the difference between the test metric difference and the estimated test metric difference is less than or equal to a second known threshold is indicative that an underlying data pipeline of the machine learning model is operating in an expected manner.
Embodiment 5. The method as recited in any of embodiments 1-4, wherein determining a data distribution difference between the unknown dataset and one of the plurality of shifted datasets that is closest to the unknown dataset comprises: determining a data distribution difference between the unknown dataset and each of the plurality of shifted datasets; and selecting the one of the plurality of shifted datasets that is closest to the unknown dataset based on the one of the plurality of shifted datasets having a smallest data distribution difference with the unknown dataset.
Embodiment 6. The method as recited in any of embodiments 1-5, further comprising: generating a plurality of second data distribution differences between each of the plurality of shifted datasets; generating a plurality of second test metric differences between each of the second test metrics; and generating the correlation model based on the plurality of second data distribution differences and the plurality of second test metric differences.
Embodiment 7. The method as recited in any of embodiments 1-6, wherein the machine learning model is a compressed Very Large Model (VLM) that acts as a proxy for a VLM.
Embodiment 8. The method as recited in any of embodiments 1-7, further comprising: applying a plurality of perturbation functions to the known dataset to generate the plurality of shifted datasets.
Embodiment 9. The method as recited in any of embodiments 1-8, wherein the first known threshold is based on an average of a data distribution difference between the unknown dataset and each of the plurality of shifted datasets.
Embodiment 10. The method as recited in any of embodiments 1-9, wherein the second known threshold is based on an average of second test metric differences between each of the second test metrics.
Embodiment 11. A system, comprising hardware and/or software, operable to perform any of the operations, methods, or processes, or any portion of any of these, disclosed herein.
Embodiment 12. A non-transitory storage medium having stored therein instructions that are executable by one or more hardware processors to perform operations comprising the operations of any one or more of embodiments 1-10.
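By way of illustration only, the flow recited in embodiments 1-6 and 8 may be sketched as follows. This is a minimal sketch under stated assumptions and not a definitive implementation: the function names (`distribution_difference`, `fit_correlation_model`, `check_unknown`), the use of a difference of sample means as the data distribution difference, the use of absolute values for the test metric differences, and the through-origin linear fit used as the correlation model are all hypothetical choices made for the example. The callable `metric_fn` stands in for evaluating the machine learning model (for example, a compressed VLM acting as a proxy) on a dataset.

```python
from itertools import combinations
from statistics import mean


def distribution_difference(a, b):
    # Assumption: absolute difference of sample means stands in for
    # whatever data distribution difference an embodiment actually uses.
    return abs(mean(a) - mean(b))


def fit_correlation_model(shifted, metrics):
    # Embodiment 6: pairwise distribution differences and pairwise test
    # metric differences between the shifted datasets.
    xs, ys = [], []
    for i, j in combinations(range(len(shifted)), 2):
        xs.append(distribution_difference(shifted[i], shifted[j]))
        ys.append(abs(metrics[i] - metrics[j]))
    # Assumption: a least-squares line through the origin is the model.
    denom = sum(x * x for x in xs)
    slope = sum(x * y for x, y in zip(xs, ys)) / denom if denom else 0.0
    return lambda diff: slope * diff


def check_unknown(metric_fn, unknown, known, perturbations, t1, t2):
    # Embodiment 8: shifted datasets from perturbation functions.
    shifted = [[p(x) for x in known] for p in perturbations]
    second_metrics = [metric_fn(s) for s in shifted]
    first_metric = metric_fn(unknown)

    # Embodiment 5: pick the shifted dataset closest to the unknown one.
    diffs = [distribution_difference(unknown, s) for s in shifted]
    k = min(range(len(shifted)), key=diffs.__getitem__)

    # Embodiment 2: distribution difference above the first threshold.
    if diffs[k] > t1:
        return "retrain_or_revalidate"

    # Embodiment 1: estimated versus observed test metric difference.
    estimated = fit_correlation_model(shifted, second_metrics)(diffs[k])
    observed = abs(first_metric - second_metrics[k])

    # Embodiment 3: mismatch above the second threshold.
    if abs(observed - estimated) > t2:
        return "retrain_or_revalidate"

    # Embodiment 4: the data pipeline behaves as expected.
    return "pipeline_ok"
```

In this sketch the thresholds `t1` and `t2` are passed in by the caller; consistent with embodiments 9 and 10, they could instead be derived from averages of the distribution differences and of the second test metric differences, respectively.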
The embodiments disclosed herein may include the use of a special purpose or general-purpose computer including various computer hardware or software modules, as discussed in greater detail below. A computer may include a processor and computer storage media carrying instructions that, when executed by the processor and/or caused to be executed by the processor, perform any one or more of the methods disclosed herein, or any part(s) of any method disclosed.
As indicated above, embodiments within the scope of the present invention also include computer storage media, which are physical media for carrying or having computer-executable instructions or data structures stored thereon. Such computer storage media may be any available physical media that may be accessed by a general purpose or special purpose computer.
By way of example, and not limitation, such computer storage media may comprise hardware storage such as solid state disk/device (SSD), RAM, ROM, EEPROM, CD-ROM, flash memory, phase-change memory (“PCM”), or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other hardware storage devices which may be used to store program code in the form of computer-executable instructions or data structures, which may be accessed and executed by a general-purpose or special-purpose computer system to implement the disclosed functionality of the invention. Combinations of the above should also be included within the scope of computer storage media. Such media are also examples of non-transitory storage media, and non-transitory storage media also embraces cloud-based storage systems and structures, although the scope of the invention is not limited to these examples of non-transitory storage media.
Computer-executable instructions comprise, for example, instructions and data which, when executed, cause a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. As such, some embodiments of the invention may be downloadable to one or more systems or devices, for example, from a website, mesh topology, or other source. As well, the scope of the invention embraces any hardware system or device that comprises an instance of an application that comprises the disclosed executable instructions.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts disclosed herein are disclosed as example forms of implementing the claims.
As used herein, the term ‘module’ or ‘component’ may refer to software objects or routines that are executed on the computing system. The different components, modules, engines, and services described herein may be implemented as objects or processes that execute on the computing system, for example, as separate threads. While the system and methods described herein may be implemented in software, implementations in hardware or a combination of software and hardware are also possible and contemplated. In the present disclosure, a ‘computing entity’ may be any computing system as previously defined herein, or any module or combination of modules running on a computing system.
In at least some instances, a hardware processor is provided that is operable to carry out executable instructions for performing a method or process, such as the methods and processes disclosed herein. The hardware processor may or may not comprise an element of other hardware, such as the computing devices and systems disclosed herein.
In terms of computing environments, embodiments of the invention may be performed in client-server environments, whether network or local environments, or in any other suitable environment. Suitable operating environments for at least some embodiments of the invention include cloud computing environments where one or more of a client, server, or other machine may reside and operate in a cloud environment.
With reference briefly now to
In the example of
Such executable instructions may take various forms including, for example, instructions executable to perform any method or portion thereof disclosed herein, and/or executable by/at any of a storage site, whether on-premises at an enterprise, or a cloud computing site, client, datacenter, data protection site including a cloud storage site, or backup server, to perform any of the functions disclosed herein. As well, such instructions may be executable to perform any of the other operations and methods, and any portions thereof, disclosed herein.
The present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.