Various embodiments relate generally, but not exclusively, to scientific instruments and scientific instrument support apparatuses, such as data processing and analysis systems for data generated by scientific instruments.
Scientific instruments can generate anomalous data for a variety of reasons. For example, individual instruments may be incorrectly calibrated or malfunctioning. Such instruments could potentially generate inconsistent or incorrect data. Environmental interferences can also impact data generated by scientific instruments. For example, variances in environmental conditions such as temperature, humidity, pressure, and/or electromagnetic interference can cause scientific instruments to generate anomalous data. Furthermore, the conditions of samples being tested can also lead to anomalous data. Contaminated, degraded, or improperly prepared samples can cause scientific instruments to generate anomalous data. It is often important for scientific laboratories—particularly laboratories utilizing a high degree of automation—to have processes for identifying anomalous data. For example, detecting anomalous data allows laboratories to ensure the integrity and reliability of data provided to clients and end users. However, due to the complexity of modern laboratories—which can generate large amounts of data from many different scientific instruments at a fast pace—identifying anomalous data can be technically challenging.
Various techniques can be used to detect anomalous results in test data generated by scientific instruments. Pre-defined thresholds can be used to identify anomalous data. For example, test data analysis software can flag data that exceeds a pre-defined threshold as anomalous. However, these and other rule-based approaches to identifying anomalous results can be technically challenging to implement. For example, threshold-based techniques require well-defined sets of thresholds to be set for each type of data. These thresholds may be difficult to define and are not always available. Furthermore, such thresholds must be set by subject matter experts, and their usefulness is often limited by the experience and knowledge of those experts. Additionally, while threshold-based techniques are well-suited to detecting univariate anomalies in test data, they may nevertheless miss other types of anomalous data.
In some scenarios, each individual variable in the test data might appear to be behaving normally. Each data point may fall within an expected range, with no significant deviations or outliers. For example, if the scientific instrument is measuring both temperature and pressure in a chemical reaction, both may fall within an expected range. However, when these variables are analyzed together, the relationships between the variables might not be as expected—and the test data might be considered univariate non-anomalous but multivariate anomalous. In the example where the scientific instrument is measuring both temperature and pressure, temperature and pressure might be expected to increase together. However, if the test data shows that temperature is increasing while pressure is remaining constant or decreasing, then the relationship between the multiple variables could suggest that the test data is anomalous—which could suggest malfunctions in the scientific instrument and/or errors in the data recording. While such inconsistent relationships may be manually spotted by a subject matter expert in low-dimensional test data, these types of anomalous relationships are almost impossible for human users to spot in high-dimensional, high-throughput test data. Thus, identifying multivariate anomalous relationships in complex, high-dimensional, high-throughput scientific instrument test data may be challenging or impossible for skilled human users—even when aided by test data analysis software.
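The temperature/pressure scenario above can be illustrated with a short sketch. The variable names and the use of the Mahalanobis distance here are illustrative assumptions, not the disclosed implementation: each reading falls inside its own marginal range, yet the joint distance exposes the broken relationship between the variables.

```python
import numpy as np

rng = np.random.default_rng(0)

# Historical readings in which temperature and pressure rise together.
temp = rng.normal(100.0, 5.0, 1000)
pressure = 2.0 * temp + rng.normal(0.0, 2.0, 1000)
history = np.column_stack([temp, pressure])

mean = history.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(history, rowvar=False))

def mahalanobis(x):
    """Joint (multivariate) distance of a reading from the historical data."""
    d = x - mean
    return float(np.sqrt(d @ cov_inv @ d))

# Each value below is individually within its expected range, but in the
# anomalous reading the relationship is broken: high temperature, low pressure.
normal = np.array([110.0, 220.0])     # pressure consistent with temperature
anomalous = np.array([110.0, 190.0])  # pressure "should" be near 220

print(mahalanobis(normal) < 3.0)     # univariate and multivariate normal
print(mahalanobis(anomalous) > 3.0)  # univariate normal, multivariate anomalous
```

Both prints are `True`: a per-variable range check passes both readings, while the joint distance flags only the second.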
Accordingly, what is needed are software solutions for laboratory information management systems that are capable of automatically processing high-dimensional, high-throughput sets of scientific instrument test data to consistently identify both univariate and multivariate anomalies without relying on subjective and potentially inconsistent human judgment.
A method of detecting sample anomalies within a laboratory information management system includes obtaining a first result for a sample within the laboratory information management system, processing the first result via a univariate machine learning model within the laboratory information management system, processing, within the laboratory information management system, a plurality of results for the sample via a multivariate machine learning model in response to the univariate machine learning model generating a normal output for the first result, and flagging, within the laboratory information management system, the sample for rejection processing in response to the multivariate machine learning model generating an abnormal output for the plurality of results. The first result represents a first type of result, the univariate machine learning model is trained using unsupervised machine learning, the plurality of results includes the first result, each of the plurality of results represents a different type of result for the sample, and the multivariate machine learning model is trained using unsupervised machine learning.
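The two-stage flow described above—a univariate screen per result, followed by a multivariate check over all of the sample's results—might be sketched as follows. The function names, dictionary layout, and threshold values are hypothetical placeholders, not the disclosed implementation:

```python
def check_sample(results, univariate_models, multivariate_model):
    """Two-stage anomaly screen for one sample.

    results: dict mapping result type -> value (one value per type).
    univariate_models: dict mapping result type -> (min_threshold, max_threshold).
    multivariate_model: callable returning "normal" or "abnormal" for all results.
    """
    # Stage 1: univariate screen. Any out-of-range result flags the sample.
    for result_type, value in results.items():
        lo, hi = univariate_models[result_type]
        if value < lo or value > hi:
            return "flag_for_rejection"  # abnormal univariate output

    # Stage 2: runs only when every univariate check produced a normal output.
    if multivariate_model(results) == "abnormal":
        return "flag_for_rejection"
    return "accept"

# Toy thresholds and a trivial stand-in multivariate rule for illustration.
models = {"temperature": (90.0, 120.0), "pressure": (180.0, 240.0)}
mv = lambda r: "abnormal" if abs(r["pressure"] - 2 * r["temperature"]) > 10 else "normal"

print(check_sample({"temperature": 110.0, "pressure": 220.0}, models, mv))  # accept
print(check_sample({"temperature": 110.0, "pressure": 190.0}, models, mv))  # flag_for_rejection
```

The second sample passes both univariate range checks but is flagged by the multivariate stage because the temperature/pressure relationship is inconsistent.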
In other features, processing the plurality of results via the multivariate machine learning model includes generating an input vector from the plurality of results and providing the input vector to the multivariate machine learning model to generate an output vector. In other features, the method includes generating the abnormal output for the plurality of results in response to an anomaly score computed based on a comparison of the input vector and the output vector exceeding a threshold value. In other features, the method includes setting the threshold value based on a training dataset. Setting the threshold value based on the training dataset includes loading a training dataset including training results for a plurality of training samples, inputting the training results to the multivariate machine learning model to generate training outputs, computing differences between the training results and the training outputs, and computing the threshold value based on the differences.
In other features, computing the threshold value based on the differences includes ordering the differences in ascending order, computing a first training value based on a lower percentile threshold of the ordered differences, computing a second training value based on an upper percentile threshold of the ordered differences, computing a first range based on a difference between the first training value and the second training value, and computing the threshold value as a function of the second training value and the first range. In other features, the multivariate machine learning model includes a neural network. In other features, the neural network includes an autoencoder. In other features, the neural network includes a variational autoencoder. In other features, the multivariate machine learning model is configured to identify anomalous features in the plurality of results. In other features, the method includes generating the abnormal output for the plurality of results in response to identifying anomalous features in the plurality of results. In other features, the multivariate machine learning model is an isolation forest model. In other features, the multivariate machine learning model is a local outlier factor model. In other features, the multivariate machine learning model is a one-class support vector machine.
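The threshold computation described above resembles an interquartile-range rule applied to reconstruction errors. A minimal sketch follows, assuming the 25th/75th percentiles and a 1.5× multiplier; the disclosure only says the threshold is computed "as a function of" the second training value and the first range, so those specific constants are assumptions:

```python
import numpy as np

def reconstruction_threshold(diffs, lower_pct=25, upper_pct=75, k=1.5):
    """Anomaly-score threshold from training reconstruction errors.

    diffs: differences between training inputs and the multivariate model's
    outputs, one value per training sample.
    """
    ordered = np.sort(np.asarray(diffs, dtype=float))  # ascending order
    first = np.percentile(ordered, lower_pct)          # first training value
    second = np.percentile(ordered, upper_pct)         # second training value
    spread = second - first                            # first range
    return second + k * spread                         # threshold value

# Toy reconstruction errors; the last sample reconstructs poorly.
errors = [0.1, 0.2, 0.2, 0.3, 0.3, 0.4, 0.5, 5.0]
t = reconstruction_threshold(errors)
print(t)  # ~0.7625: flags the 5.0 error, keeps the rest
```

At inference time, an anomaly score above `t` would produce the abnormal output; scores at or below it, the normal output.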
In other features, the method includes training the univariate machine learning model. Training the univariate machine learning model includes loading a plurality of training results from the laboratory information management system, ordering the plurality of training results in ascending order, computing a first observation value based on a lower percentile threshold of the ordered plurality of training results, computing a second observation value based on an upper percentile threshold of the ordered plurality of training results, computing a second range based on a difference between the first observation value and the second observation value, setting a minimum threshold as a function of the first observation value and the second range, and setting a maximum threshold as a function of the second observation value and the second range. Each result of the plurality of training results is the first type of result. In other features, the method includes generating the normal output for the first result in response to determining the first result does not exceed the maximum threshold.
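The univariate training procedure above follows the same percentile-and-range pattern, yielding minimum and maximum thresholds for one result type. The sketch below assumes the 25th/75th percentiles and Tukey-style 1.5× fences; as before, the exact constants are assumptions, since the disclosure states only that the thresholds are set "as a function of" the observation values and the range:

```python
import numpy as np

def train_univariate(training_results, lower_pct=25, upper_pct=75, k=1.5):
    """Learn min/max thresholds for one result type from historical results."""
    ordered = np.sort(np.asarray(training_results, dtype=float))  # ascending
    first = np.percentile(ordered, lower_pct)    # first observation value
    second = np.percentile(ordered, upper_pct)   # second observation value
    spread = second - first                      # second range
    return first - k * spread, second + k * spread

def univariate_output(result, lo, hi):
    """Normal when the result is neither below the minimum nor above the maximum."""
    return "normal" if lo <= result <= hi else "abnormal"

# Toy historical results for one result type.
lo, hi = train_univariate([9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 10.3, 9.7])
print(univariate_output(10.1, lo, hi))   # normal
print(univariate_output(14.0, lo, hi))   # abnormal
```

A normal output here is what gates the sample's results into the multivariate stage.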
In other features, the method includes generating the normal output for the first result in response to determining the first result is not below the minimum threshold. In other features, the method includes generating an abnormal output in response to determining the first result exceeds the maximum threshold or is below the minimum threshold. In other features, flagging the sample for rejection processing includes generating a notification on a graphical user interface. The notification includes at least one of (i) anomaly scores per feature, (ii) graphs, or (iii) graphical representations of clusters. In other features, flagging the sample for rejection processing includes flagging the sample for manual processing. In other features, flagging the sample for rejection processing includes adding, within the laboratory information management system, an anomaly tag to the plurality of results.
In other features, one or more non-transitory computer-readable media include instructions thereon that, when executed by one or more processing devices of a scientific instrument support apparatus, cause the scientific instrument support apparatus to perform the method.
A scientific instrument support apparatus includes memory hardware configured to store instructions and processing hardware configured to execute the instructions. The instructions include obtaining a first result for a sample within a laboratory information management system, processing, within the laboratory information management system, the first result via a univariate machine learning model trained using unsupervised machine learning, processing, within the laboratory information management system, a plurality of results for the sample via a multivariate machine learning model in response to the univariate machine learning model generating a normal output for the first result, and flagging, within the laboratory information management system, the sample for rejection processing in response to the multivariate machine learning model generating an abnormal output for the plurality of results. The first result represents a first type of result, the plurality of results includes the first result, each of the plurality of results represents a different type of result for the sample, and the multivariate machine learning model is trained using unsupervised machine learning.
In other features, processing the plurality of results via the multivariate machine learning model includes generating an input vector from the plurality of results and providing the input vector to the multivariate machine learning model to generate an output vector. In other features, the instructions further comprise generating the abnormal output for the plurality of results in response to an anomaly score computed based on a comparison of the input vector and the output vector exceeding a threshold value. In other features, the instructions further comprise setting the threshold value based on a training dataset. Setting the threshold value based on the training dataset includes loading a training dataset including training results for a plurality of training samples, inputting the training results to the multivariate machine learning model to generate training outputs, computing differences between the training results and the training outputs, and computing the threshold value based on the differences.
In other features, computing the threshold value based on the differences includes ordering the differences in ascending order, computing a first training value based on a lower percentile threshold of the ordered differences, computing a second training value based on an upper percentile threshold of the ordered differences, computing a first range based on a difference between the first training value and the second training value, and computing the threshold value as a function of the second training value and the first range. In other features, the multivariate machine learning model includes a neural network. In other features, the neural network includes an autoencoder. In other features, the neural network includes a variational autoencoder. In other features, the multivariate machine learning model is configured to identify anomalous features in the plurality of results. In other features, the instructions further comprise generating the abnormal output for the plurality of results in response to identifying anomalous features in the plurality of results. In other features, the multivariate machine learning model is an isolation forest model. In other features, the multivariate machine learning model is a local outlier factor model. In other features, the multivariate machine learning model is a one-class support vector machine.
In other features, the instructions further include training the univariate machine learning model. Training the univariate machine learning model includes loading a plurality of training results from the laboratory information management system, each result of the plurality of training results being the first type of result, ordering the plurality of training results in ascending order, computing a first observation value based on a lower percentile threshold of the ordered plurality of training results, computing a second observation value based on an upper percentile threshold of the ordered plurality of training results, computing a second range based on a difference between the first observation value and the second observation value, setting a minimum threshold as a function of the first observation value and the second range, and setting a maximum threshold as a function of the second observation value and the second range.
In other features, the instructions further comprise generating the normal output for the first result in response to determining the first result does not exceed the maximum threshold. In other features, the instructions further comprise generating the normal output for the first result in response to determining the first result is not below the minimum threshold. In other features, the instructions further comprise generating an abnormal output in response to determining the first result exceeds the maximum threshold or is below the minimum threshold. In other features, flagging the sample for rejection processing includes generating a notification on a graphical user interface. The notification includes at least one of (i) anomaly scores per feature, (ii) graphs, or (iii) graphical representations of clusters. In other features, flagging the sample for rejection processing includes flagging the sample for manual processing. In other features, flagging the sample for rejection processing includes adding, within the laboratory information management system, an anomaly tag to the plurality of results.
A computer-implemented method includes processing a sample with a scientific instrument to generate a plurality of results, inputting at least one result of the plurality of results to a trained univariate machine learning model to generate a univariate output for each result, inputting the univariate outputs to a trained multivariate machine learning model to generate a multivariate output, computing an anomaly score between the univariate outputs input to the trained multivariate machine learning model and the multivariate output, and flagging, within a laboratory information management system, the sample for rejection processing in response to determining that the anomaly score exceeds a threshold.
In other features, the method includes generating an input vector based on the univariate outputs and providing the input vector to the trained multivariate machine learning model to generate the multivariate output. Computing the anomaly score between the univariate outputs input to the trained multivariate machine learning model and the multivariate output includes computing a distance between the input vector and the multivariate output. In other features, the method includes training a multivariate machine learning model. Training the multivariate machine learning model includes generating a training input vector based on a training sample retrieved from the laboratory information management system, providing the training input vector to the multivariate machine learning model to generate a training output vector, computing a distance between the training input vector and the training output vector, updating parameters of the multivariate machine learning model in response to determining that the distance exceeds a threshold, and saving the multivariate machine learning model configured with the updated parameters as the trained multivariate machine learning model.
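The training loop described above—encode a training vector, decode it, measure the input/output distance, and update parameters while the distance stays above a threshold—can be sketched with a small linear autoencoder. The toy data, network size, learning rate, and stopping threshold are all assumptions for illustration; the disclosure does not specify them:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy training set: four result types per sample driven by two underlying
# factors, so a 2-unit bottleneck can reconstruct normal samples well.
base = rng.normal(0.0, 1.0, (200, 2))
X = np.hstack([base, base @ np.array([[1.0, 0.5], [0.5, 1.0]])])

n_features, n_hidden = X.shape[1], 2
W_enc = rng.normal(0.0, 0.1, (n_features, n_hidden))  # encoder parameters
W_dec = rng.normal(0.0, 0.1, (n_hidden, n_features))  # decoder parameters
lr, threshold = 0.05, 0.1

def mean_distance():
    """Average distance between training inputs and reconstructed outputs."""
    X_hat = (X @ W_enc) @ W_dec
    return float(np.mean(np.linalg.norm(X_hat - X, axis=1)))

initial = mean_distance()
for _ in range(3000):
    Z = X @ W_enc                 # encode training input vectors
    err = Z @ W_dec - X           # training output vector minus input vector
    if float(np.mean(np.linalg.norm(err, axis=1))) <= threshold:
        break                     # distance small enough: stop and save model
    # Otherwise update parameters: gradient step on mean squared error.
    grad_dec = (Z.T @ err) / len(X)
    grad_enc = (X.T @ (err @ W_dec.T)) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

final = mean_distance()
trained_model = (W_enc, W_dec)    # "saved" trained multivariate model
print(final < initial)            # reconstruction improves with training
```

After training, the same input/output distance serves as the anomaly score for new samples: samples unlike the training data reconstruct poorly and score high.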
In other features, training the multivariate machine learning model includes saving the multivariate machine learning model as the trained multivariate machine learning model in response to determining that the distance does not exceed the threshold. In other features, the trained multivariate machine learning model comprises a neural network. In other features, the neural network comprises an autoencoder. In other features, the method further includes training the univariate machine learning model. Training the univariate machine learning model includes loading a plurality of training results from the laboratory information management system, ordering the plurality of training results in ascending order, computing a first observation value based on a lower percentile threshold of the ordered plurality of training results, computing a second observation value based on an upper percentile threshold of the ordered plurality of training results, computing a range based on a difference between the first observation value and the second observation value, setting a minimum threshold as a function of the first observation value and the range, and setting a maximum threshold as a function of the second observation value and the range. Each result of the plurality of training results is a first type of result.
In other features, the lower percentile threshold is about a 25th percentile. In other features, the upper percentile threshold is about a 75th percentile. In other features, inputting the univariate outputs to a trained multivariate machine learning model includes loading a selected result from the sample, determining whether the selected result is within a range between the minimum threshold and the maximum threshold, and adding the selected result to an input vector for the trained multivariate machine learning model in response to determining that the selected result is within a range between the minimum threshold and the maximum threshold. The selected result is the first type of result.
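The gating step described above—loading a selected result, checking it against the univariate min/max range, and adding it to the multivariate input vector only when it is in range—might look like the following sketch (result type names and the returned `skipped` list are illustrative assumptions):

```python
def build_input_vector(sample_results, thresholds):
    """Collect results that pass their univariate range check into an input
    vector for the trained multivariate machine learning model.

    sample_results: dict mapping result type -> selected result value.
    thresholds: dict mapping result type -> (min_threshold, max_threshold).
    Returns (vector, skipped), where skipped lists out-of-range result types.
    """
    vector, skipped = [], []
    for result_type, value in sample_results.items():
        lo, hi = thresholds[result_type]
        if lo <= value <= hi:      # within range: add to the input vector
            vector.append(value)
        else:                      # out of range: already handled univariately
            skipped.append(result_type)
    return vector, skipped

vec, skipped = build_input_vector(
    {"pH": 7.1, "conductivity": 480.0, "turbidity": 99.0},
    {"pH": (6.5, 8.0), "conductivity": (300.0, 600.0), "turbidity": (0.0, 10.0)},
)
print(vec)      # [7.1, 480.0]
print(skipped)  # ['turbidity']
```

Only the in-range results reach the multivariate model; the out-of-range turbidity reading has already produced an abnormal univariate output.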
In other features, one or more non-transitory computer-readable media include instructions thereon that, when executed by one or more processing devices of a scientific instrument support apparatus, cause the scientific instrument support apparatus to perform the method.
A scientific instrument support apparatus includes memory hardware configured to store instructions and processing hardware configured to execute the instructions. The instructions include processing a sample with a scientific instrument to generate a plurality of results, inputting at least one result of the plurality of results to a trained univariate machine learning model to generate a univariate output for each result, inputting the univariate outputs to a trained multivariate machine learning model to generate a multivariate output, computing an anomaly score between the univariate outputs input to the trained multivariate machine learning model and the multivariate output, and flagging, within a laboratory information management system, the sample for rejection processing in response to determining that the anomaly score exceeds a threshold.
In other features, the instructions further comprise generating an input vector based on the univariate outputs and providing the input vector to the trained multivariate machine learning model to generate the multivariate output. Computing the anomaly score between the univariate outputs input to the trained multivariate machine learning model and the multivariate output includes computing a distance between the input vector and the multivariate output. In other features, the instructions further comprise training a multivariate machine learning model. Training the multivariate machine learning model includes generating a training input vector based on a training sample retrieved from the laboratory information management system, providing the training input vector to the multivariate machine learning model to generate a training output vector, computing a distance between the training input vector and the training output vector, and updating parameters of the multivariate machine learning model and saving the multivariate machine learning model configured with the updated parameters as the trained multivariate machine learning model in response to determining that the distance exceeds a threshold.
In other features, training the multivariate machine learning model includes saving the multivariate machine learning model as the trained multivariate machine learning model in response to determining that the distance does not exceed the threshold. In other features, the trained multivariate machine learning model comprises a neural network. In other features, the neural network comprises an autoencoder. In other features, the instructions further comprise training the univariate machine learning model. Training the univariate machine learning model includes loading a plurality of training results from the laboratory information management system, ordering the plurality of training results in ascending order, computing a first observation value based on a lower percentile threshold of the ordered plurality of training results, computing a second observation value based on an upper percentile threshold of the ordered plurality of training results, computing a range based on a difference between the first observation value and the second observation value, setting a minimum threshold as a function of the first observation value and the range, and setting a maximum threshold as a function of the second observation value and the range. Each result of the plurality of training results is a first type of result.
In other features, the lower percentile threshold is about a 25th percentile. In other features, the upper percentile threshold is about a 75th percentile. In other features, inputting the univariate outputs to a trained multivariate machine learning model includes loading a selected result from the sample, determining whether the selected result is within a range between the minimum threshold and the maximum threshold, and adding the selected result to an input vector for the trained multivariate machine learning model in response to determining that the selected result is within a range between the minimum threshold and the maximum threshold. The selected result is the first type of result.
In other features, one or more non-transitory computer-readable media include instructions thereon that, when executed by one or more processing devices of a scientific instrument support apparatus, cause the scientific instrument support apparatus to perform the method.
Embodiments will be readily understood by the following detailed description in conjunction with the accompanying drawings. To facilitate this description, like reference numerals designate like structural elements. Embodiments are illustrated by way of example, not by way of limitation, in the figures of the accompanying drawings.
Disclosed herein are scientific instrument support systems, as well as related methods, computing devices, and computer-readable media. For example, in some embodiments, a method of detecting sample anomalies within a laboratory information management system includes obtaining a first result for a sample within the laboratory information management system, processing the first result via a univariate machine learning model within the laboratory information management system, processing, within the laboratory information management system, a plurality of results for the sample via a multivariate machine learning model in response to the univariate machine learning model generating a normal output for the first result, and flagging, within the laboratory information management system, the sample for rejection processing in response to the multivariate machine learning model generating an abnormal output for the plurality of results. The first result represents a first type of result, the univariate machine learning model is trained using unsupervised machine learning, the plurality of results includes the first result, each of the plurality of results represents a different type of result for the sample, and the multivariate machine learning model is trained using unsupervised machine learning.
The embodiments disclosed herein thus provide improvements to scientific instrument technology (e.g., improvements in the computer technology supporting such scientific instruments, among other improvements). For example, techniques described in this specification do not rely on human operators to set rules (such as thresholds) for each variable in test data, which removes the variability introduced by relying on human expertise and improves the consistency of laboratory quality control processes. Additionally, by removing the need for and reliance on skilled human operators, techniques described in this specification allow for the implementation of fully automated sampling and analysis processes within the laboratory environment. Techniques described in this specification also do not require human operators to learn and adapt their analyses to new types of data and/or samples analyzed under different conditions. Instead, techniques described herein are capable of operating in an unsupervised manner. For example, techniques described herein are capable of automatically learning—for example, using historical scientific instrument test data—how to process and analyze new types of scientific instrument results generated from new sample types that are analyzed under new conditions, all without requiring human input or analysis.
In the following detailed description, reference is made to the accompanying drawings that form a part hereof wherein like numerals designate like parts throughout, and in which is shown, by way of illustration, embodiments that may be practiced. It is to be understood that other embodiments may be utilized, and structural or logical changes may be made, without departing from the scope of the present disclosure. Therefore, the following detailed description is not to be taken in a limiting sense.
Various operations may be described as multiple discrete actions or operations in turn, in a manner that is most helpful in understanding the subject matter disclosed herein. However, the order of description should not be construed as to imply that these operations are necessarily order dependent. In particular, these operations may not be performed in the order of presentation. Operations described may be performed in a different order from the described embodiment. Various additional operations may be performed, and/or described operations may be omitted in additional embodiments.
For the purposes of the present disclosure, the phrases “A and/or B” and “A or B” mean (A), (B), or (A and B). For the purposes of the present disclosure, the phrases “A, B, and/or C” and “A, B, or C” mean (A), (B), (C), (A and B), (A and C), (B and C), or (A, B, and C). Although some elements may be referred to in the singular (e.g., “a processing device”), any appropriate elements may be represented by multiple instances of that element, and vice versa. For example, a set of operations described as performed by a processing device may be implemented with different ones of the operations performed by different processing devices.
The description uses the phrases “an embodiment,” “various embodiments,” and “some embodiments,” each of which may refer to one or more of the same or different embodiments. Furthermore, the terms “comprising,” “including,” “having,” and the like, as used with respect to embodiments of the present disclosure, are synonymous. When used to describe a range of dimensions, the phrase “between X and Y” represents a range that includes X and Y. As used herein, an “apparatus” may refer to any individual device, collection of devices, part of a device, or collections of parts of devices. The drawings are not necessarily to scale.
In various implementations, the scientific instrument support module 1000 may implement a software-based solution that manages the data and processes associated with operations of scientific instruments within a laboratory environment. For example, scientific instrument support module 1000 may implement a laboratory information management system. Accordingly, as shown in the example of
In some embodiments, workflow management logic 1004 manages and automates the laboratory's processes and tasks. Workflow management logic 1004 automates the planning, execution, and monitoring of sequences of processes within the laboratory. For example, workflow management logic 1004 guides samples through each stage of the laboratory process—from reception, assignment, processing, and quality control to final approval. In some examples, instrument integration logic 1006 connects support module 1000 and the laboratory's scientific instruments and other equipment. Instrument integration logic 1006 may control scientific instruments and other machines in the laboratory and/or capture data produced by the scientific instruments and other machines. In example embodiments, data management logic 1008 captures, stores, and processes data generated from laboratory operations. Such data may include sample data, results from experiments or tests conducted on samples using scientific instruments, calibration data for instruments, and/or quality control data. In various implementations, data management logic 1008 performs data analysis and validation operations on the data generated from laboratory operations. For example, data management logic 1008 and machine learning logic 1010 perform multivariate anomaly detection on test data generated by scientific instruments.
In various implementations, machine learning logic 1010 trains machine learning models stored in machine learning models 1012. Machine learning models 1012 include one or more univariate machine learning models and one or more multivariate machine learning models. Examples of univariate machine learning models include machine learning models that implement outlier detection methods, such as the Interquartile Range (IQR) Method, the Z-score threshold method, and other suitable methods. Examples of multivariate machine learning models include neural networks (such as autoencoders and variational autoencoders), ensemble learning models such as the Isolation Forest model, density-based anomaly detection algorithms such as the Local Outlier Factor model, and/or the One-Class Support Vector Machine model. In some embodiments, user interface logic 1014 generates graphical user interfaces for users to interact with the laboratory information management system (such as graphical user interface 11000, which will be described further on in this specification with reference to
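As one concrete illustration of the univariate methods named above, a Z-score threshold model fits a mean and standard deviation to historical results for one result type and flags results far from the mean. This is a minimal sketch; the class name, `z_max` default, and normal/abnormal labels are assumptions, not elements of machine learning models 1012:

```python
import numpy as np

class ZScoreModel:
    """Univariate Z-score outlier model (one of the methods named above)."""

    def __init__(self, z_max=3.0):
        self.z_max = z_max

    def fit(self, training_results):
        # Estimate mean and standard deviation from historical results
        # for a single result type.
        x = np.asarray(training_results, dtype=float)
        self.mean_, self.std_ = x.mean(), x.std()
        return self

    def predict(self, result):
        # Flag results more than z_max standard deviations from the mean.
        z = abs(result - self.mean_) / self.std_
        return "abnormal" if z > self.z_max else "normal"

model = ZScoreModel().fit([10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 10.3, 9.9])
print(model.predict(10.1))  # normal
print(model.predict(12.0))  # abnormal
```

The IQR method sketched elsewhere in this disclosure plays the same role; both are unsupervised in the sense that they learn their thresholds from unlabeled historical results.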
As used herein, the term “logic” may include an apparatus that is to perform a set of operations associated with the logic. For example, any of the logic elements included in the support module 1000 may be implemented by one or more computing devices programmed with instructions to cause one or more processing devices of the computing devices to perform the associated set of operations. In a particular embodiment, a logic element may include one or more non-transitory computer-readable media having instructions thereon that, when executed by one or more processing devices of one or more computing devices, cause the one or more computing devices to perform the associated set of operations. As used herein, the term “module” may refer to a collection of one or more logic elements that, together, perform a function associated with the module. Different ones of the logic elements in a module may take the same form or may take different forms. For example, some logic in a module may be implemented by a programmed general-purpose processing device, while other logic in a module may be implemented by an application-specific integrated circuit (ASIC). In another example, different ones of the logic elements in a module may be associated with different sets of instructions executed by one or more processing devices. A module may not include all of the logic elements depicted in the associated drawing; for example, a module may include a subset of the logic elements depicted in the associated drawing when that module is to perform a subset of the operations discussed herein with reference to that module.
Each result may refer to a single data point associated with a single property of the sample generated by one or more scientific instruments. Accordingly, where the one or more scientific instruments generate test data including multiple data points associated with multiple properties of the sample, the one or more scientific instruments generate multiple results—one associated with each data point. Thus, the multiple results generated for the sample may include one or more data points for one or more properties of the sample. Results associated with a property may be collectively referred to as a type of result. At 2004, results generated by the one or more scientific instruments are imported into a laboratory information management system—such as the laboratory information management system implemented by support module 1000. For example, instrument integration logic 1006 passes the results to data management logic 1008 and data management logic 1008 saves the results to one or more data stores.
At 2006, data management logic 1008 and/or machine learning logic 1010 inputs each result of the sample to a trained univariate machine learning model corresponding to each result type to generate a univariate output for each result. For example, if the result is a pH measurement, then data management logic 1008 and/or machine learning logic 1010 selects a trained univariate machine learning model for pH and inputs the pH measurement into the selected trained univariate machine learning model to generate a univariate output for the result. In various implementations, each univariate machine learning model may be trained to predict whether a single result or a single type of result is likely to be univariate anomalous, and the univariate output indicates whether or not the result input into the univariate machine learning model is likely to be univariate anomalous. Data management logic 1008 labels each result with whether it is likely to be univariate anomalous according to the output of the univariate machine learning model. In various implementations, reporting and analytics logic 1016 generates a report related to each result input into the univariate machine learning model. For example, the report includes a visual representation of the result along with a distribution of other results of the same type. The visual representation may be a density plot, box plot, dot plot, and/or a time-series plot. The report may also include summary statistics of the result input into the univariate machine learning model and other results of the same type. Example summary statistics include a mean, median, minimum, maximum, first quartile value, third quartile value, standard deviation, and/or mean absolute deviation. The report may be output to the graphical user interface. In various implementations, the report may include interactive portions, such as an interactive dashboard.
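The summary statistics described above can be sketched as follows. This is an illustrative example only—the result values and dictionary keys are made-up stand-ins, not data or identifiers from this specification:

```python
# Hypothetical illustration: summary statistics for a set of results of the
# same type (e.g., pH measurements), as might appear in the report generated
# at 2006. All result values below are made-up example data.
import statistics

results = [6.9, 7.0, 7.1, 7.2, 7.0, 6.8, 7.3, 7.1]

summary = {
    "mean": statistics.mean(results),
    "median": statistics.median(results),
    "minimum": min(results),
    "maximum": max(results),
    "stdev": statistics.stdev(results),
}
# First and third quartile values (quantiles with n=4 yields Q1, Q2, Q3).
q1, _, q3 = statistics.quantiles(results, n=4)
summary["first_quartile"] = q1
summary["third_quartile"] = q3
```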
Additional details associated with training the univariate machine learning models will be described further on in this specification with reference to
At 2008, data management logic 1008 generates a subset of results based on the univariate output for each result. In various implementations, data management logic 1008 adds each result labeled as not likely to be univariate anomalous to the subset of results. At 2010, data management logic 1008 and/or machine learning logic 1010 inputs the subset of results to a trained multivariate machine learning model to generate a multivariate output. In some embodiments, the subset of results are first preprocessed using the same preprocessing techniques used during training of the multivariate machine learning model and the preprocessed results are saved to an input vector. In some embodiments, preprocessing may include techniques such as normalization, standardization, imputation, and/or variable encoding. The input vector is provided to the trained multivariate machine learning model as input data. In various implementations, the trained multivariate machine learning model generates an efficient (or compressed) representation of the input data by mapping the input data to a lower-dimensional representation. The trained multivariate machine learning model then reconstructs the lower-dimensional representation back into a higher-dimensional representation (for example, matching the dimensionality of the input data) as multivariate outputs. Additional details associated with training the multivariate machine learning models will be described further in this specification with reference to
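The compress-and-reconstruct behavior at 2010 can be sketched with a toy linear encoder/decoder. This is not the trained model itself—the weights and input values below are hand-picked for illustration, and a real model would learn its weights from training data:

```python
# Illustrative sketch only: a linear "encoder" maps a 4-dimensional input to a
# 2-dimensional (lower-dimensional) representation, and a "decoder" maps it
# back to 4 dimensions, mirroring the behavior described at block 2010.

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

# Encoder: 2 x 4 weights -> lower-dimensional representation.
encoder_weights = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]
# Decoder: 4 x 2 weights -> back to the input dimensionality.
decoder_weights = [
    [1.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [0.0, 1.0],
]

# An input that follows the pattern the weights encode reconstructs well.
input_vector = [0.2, 0.2, -0.4, -0.4]
reconstruction = matvec(decoder_weights, matvec(encoder_weights, input_vector))

# An input that breaks the pattern reconstructs poorly, yielding a high
# reconstruction error.
anomalous_input = [0.2, -0.2, 0.4, -0.4]
anomalous_reconstruction = matvec(
    decoder_weights, matvec(encoder_weights, anomalous_input)
)
```

The design point this illustrates: because the lower-dimensional representation can only preserve the correlations the model has learned, inputs that violate those correlations cannot be reconstructed accurately, which is what makes reconstruction error usable as an anomaly score.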
At 2012, data management logic 1008 processes the multivariate output to calculate an anomaly score, such as an error or a reconstruction error. In various implementations, data management logic 1008 calculates a difference between the input and output of the multivariate machine learning model and returns the calculated difference as the anomaly score. The anomaly score may include a difference between each feature input into the multivariate machine learning model and a corresponding component of the output. In some embodiments, the anomaly score may include an aggregate difference between all features input into the multivariate machine learning model and all components of the output. Additional details associated with calculating the anomaly score will be described further on in this specification with reference to
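The anomaly-score computation at 2012 can be sketched as follows. The feature values are made-up examples, and the aggregate shown here (a mean absolute difference) is one of several plausible aggregations:

```python
# Illustrative sketch of the anomaly-score computation at 2012: per-feature
# differences between the model's input and output, plus an aggregate score.
# All values are made-up examples.

model_input = [0.20, -0.10, 0.50, 0.00]   # features fed to the model
model_output = [0.18, -0.05, 0.10, 0.02]  # reconstructed output components

# Difference between each input feature and its corresponding output component.
per_feature_error = [abs(i - o) for i, o in zip(model_input, model_output)]

# Aggregate difference across all features (mean absolute difference here).
anomaly_score = sum(per_feature_error) / len(per_feature_error)
```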
At 2014, data management logic 1008 compares the anomaly score to a threshold and determines whether the anomaly score exceeds the threshold. Additional details associated with automatically calculating the threshold will be described further on in this specification with reference to
While process 2000 is described with results being generated by one or more scientific instruments at 2002 and saved to a laboratory information management system at 2004, in some implementations, results may be generated by any suitable data source at 2002 and saved to any suitable data store and/or data management system at 2004. Accordingly, the subset of results, the results, and/or the sample may be flagged in any suitable data management system at 2016 and/or approved in any suitable data management system at 2018.
Furthermore, in some embodiments, one or more of the results are not provided to a corresponding trained univariate machine learning model at 2006. These results may also be added to the subset of results at 2008. In some examples, the subset of results may include all results, regardless of whether they have been processed through a trained univariate machine learning model at 2006.
At 3010, machine learning logic 1010 computes a range based on a difference between the first and second observation values. In various implementations, the range is computed by subtracting the first observation value from the second observation value, as shown in equation (1) below:
At 3012, machine learning logic 1010 computes a minimum threshold as a function of the first observation value and the range. For example, the minimum threshold is computed according to equation (2) below:
At 3014, machine learning logic 1010 computes a maximum threshold as a function of the second observation value and the range. For example, the maximum threshold is computed according to equation (3) below:
In equations (2) and (3), a and b may be pre-defined or user-defined constants or range multipliers. In various implementations, a and/or b may be about 1.5. Accordingly, the minimum threshold and maximum threshold are set for the type of result corresponding to the training dataset. Process 3000 may be repeated with training datasets corresponding to each type of result so that a univariate machine learning model is trained for each type of result. In various implementations, process 3000 may be repeated when new training datasets become available.
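The threshold computation at 3010–3014 can be sketched as below. Since the equations themselves are not reproduced here, this sketch assumes the common IQR-fence form consistent with the surrounding description—range = Q3 − Q1, minimum threshold = Q1 − a·range, maximum threshold = Q3 + b·range, with a = b = 1.5—and the training values are made-up examples:

```python
# Sketch of the IQR-style threshold computation at 3010-3014, assuming the
# standard fence form with range multipliers a = b = 1.5. Training values
# are made-up examples for a single result type.
import statistics

training_values = [7.0, 7.1, 6.9, 7.2, 7.0, 7.3, 6.8, 7.1]

# First/second observation values (25th/75th percentiles of the training data).
q1, _, q3 = statistics.quantiles(training_values, n=4)

value_range = q3 - q1                    # range, per equation (1)
a = b = 1.5                              # range multipliers
minimum_threshold = q1 - a * value_range # per equation (2)
maximum_threshold = q3 + b * value_range # per equation (3)

def is_univariate_anomalous(value):
    """Label a result as in process 4000: anomalous if outside the thresholds."""
    return value < minimum_threshold or value > maximum_threshold
```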
In response to machine learning logic 1010 determining that the value of the selected result is below the minimum threshold (“YES” at decision block 4010), data management logic 1008 and/or machine learning logic 1010 labels the selected result as univariate anomalous at 4008 and proceeds to decision block 4014. In response to machine learning logic 1010 determining that the value of the selected result is not below the minimum threshold (“NO” at decision block 4010), data management logic 1008 and/or machine learning logic 1010 labels the selected result as not univariate anomalous at 4012 and proceeds to decision block 4014. At decision block 4014, machine learning logic 1010 determines whether another unlabeled result is present in the set of results corresponding to the sample. In response to determining another unlabeled result is present (“YES” at decision block 4014), machine learning logic 1010 selects the next unlabeled result at 4016 and proceeds back to block 4004. In response to machine learning logic 1010 determining all results corresponding to the sample have been labeled (“NO” at decision block 4014), data management logic 1008 and/or machine learning logic 1010 saves the labeled results as labeled results for the sample at 4018.
In various embodiments, each original value x may be standardized to a new value x′ according to equation (5) below—where μ is the mean of the epoch or dataset and σ is the standard deviation of the epoch or dataset:
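The standardization described above can be sketched using the standard z-score form x′ = (x − μ)/σ. The dataset values are made-up examples, and the population standard deviation is used here (a sample standard deviation could be used instead):

```python
# Sketch of the standardization of equation (5): x' = (x - mu) / sigma, where
# mu and sigma are the mean and standard deviation of the dataset. Dataset
# values are made-up examples.
import statistics

dataset = [10.0, 12.0, 14.0, 16.0, 18.0]
mu = statistics.mean(dataset)       # mean of the dataset
sigma = statistics.pstdev(dataset)  # standard deviation of the dataset

standardized = [(x - mu) / sigma for x in dataset]
```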
In various implementations, the training dataset and dataset representing the sample being evaluated by the multivariate machine learning model are preprocessed using the same method and/or values calculated during the training phase. At 5006, machine learning logic 1010 generates an input vector from the preprocessed results. For example, the input vector may include preprocessed results from a set (e.g., representing a sample). At 5010, machine learning logic 1010 provides the input vector to the multivariate machine learning model to generate an output vector. At 5012, machine learning logic 1010 computes an anomaly score—such as a reconstruction error em using a loss function. In some embodiments, the reconstruction error represents a difference between the input vector and the output vector. Additional details associated with computing the anomaly score will be described further on in this specification with reference to
In some examples, the threshold may be automatically determined according to techniques described further on in this specification with reference to
In response to machine learning logic 1010 determining that the anomaly score is less than or equal to the threshold (“YES” at decision block 5014), machine learning logic 1010 saves the multivariate machine learning model as a trained multivariate machine learning model at 5020. For example, machine learning logic 1010 saves the trained multivariate machine learning model to machine learning models 1012. At 5022, machine learning logic 1010 loads a validation dataset. In various implementations, the validation dataset includes results corresponding to a sample not present in the training dataset. In some embodiments, the validation dataset is pre-processed using the same methods as used for the training dataset. At 5024, machine learning logic 1010 provides the validation dataset to the trained machine learning model to generate a validation output. At 5026, machine learning logic 1010 computes a validation anomaly score using the loss function. The validation anomaly score may be representative of a difference between the validation dataset input into the trained multivariate machine learning model and the validation output. In various implementations, the validation anomaly score is calculated according to the same techniques as used to compute the anomaly score at block 5012. At 5028, machine learning logic 1010 determines whether the validation anomaly score is less than or equal to the threshold.
In response to determining that the validation anomaly score is not less than or equal to the threshold (“NO” at decision block 5028), machine learning logic 1010 adjusts hyperparameters and/or the architecture of the trained multivariate machine learning model (because the trained multivariate machine learning model may be overfitted to the training data) at 5030. Examples of hyperparameters include: (i) ρ (coefficient used for computing a running average of squared gradients) in implementations where the Adadelta optimization algorithm is used, (ii) the strength of regularization techniques used (such as L1 or L2) to prevent overfitting, and/or (iii) a number of nodes in the encoder and decoder layers. In response to determining that the validation anomaly score is less than or equal to the threshold (“YES” at decision block 5028), machine learning logic 1010 accepts the trained multivariate machine learning model at 5032.
At 6008, machine learning logic 1010 determines whether another result associated with the sample is present that has not yet been processed. In response to determining that another result is present (“YES” at decision block 6008), machine learning logic 1010 selects the next result associated with the sample at 6010 and process 6000 proceeds back to block 6004. In response to determining that all results associated with the sample have been processed (“NO” at decision block 6008), machine learning logic 1010 computes an average of the difference values in the data object at 6012. While
At 7008, machine learning logic 1010 generates an error value between the selected results and the training output. In various implementations, machine learning logic 1010 computes the error value according to techniques previously discussed with reference to
At 7018, machine learning logic 1010 computes a first observation value based on a lower percentile threshold of the ordered training error set. For example, machine learning logic 1010 computes the value below which a given percentage of observations in the ordered training error set fall and sets the value as the first observation value. In various implementations, the first observation value is the value below which about 25% of the observations fall. At 7020, machine learning logic 1010 computes a second observation value based on an upper percentile threshold of the ordered training error set. For example, machine learning logic 1010 computes the value below which a given percentage of observations in the ordered training error set fall and sets the value as the second observation value. In some embodiments, the second observation value is the value below which about 75% of the observations fall.
At 7022, machine learning logic 1010 computes a range based on a difference between the first and second observation values. In various implementations, the range is computed by subtracting the first observation value from the second observation value, as shown in equation (6) below:
At 7024, machine learning logic 1010 computes a maximum threshold as a function of the second observation value and the range. For example, the maximum threshold is computed according to equation (7) below:
In equation (7), a may be a pre-defined or user-defined constant or range multiplier. In various implementations, a may be about 1.5. At 7026, machine learning logic 1010 sets the maximum threshold as the anomaly value threshold for the trained multivariate machine learning model. For example, the anomaly value threshold set at block 7026 may be the anomaly value threshold used at decision block 2014 of process 2000 and/or decision blocks 5014 and/or 5028 of process 5000.
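The automatic threshold computation at 7018–7026 can be sketched as below. Because equations (6) and (7) are not reproduced here, this sketch assumes the percentile form consistent with the surrounding description—range = second observation − first observation, maximum threshold = second observation + a·range, with a = 1.5—and the training error values are made-up examples:

```python
# Sketch of the anomaly-value threshold computation at 7018-7026: percentile
# observations over the ordered training error set, a range per equation (6),
# and a maximum threshold per equation (7) with a = 1.5. Error values are
# made-up examples.
import statistics

training_errors = [0.01, 0.02, 0.02, 0.03, 0.03, 0.04, 0.05, 0.08]

# First/second observation values: ~25th and ~75th percentiles of the ordered
# training error set.
first_observation, _, second_observation = statistics.quantiles(
    sorted(training_errors), n=4
)

error_range = second_observation - first_observation          # equation (6)
a = 1.5                                                       # range multiplier
anomaly_value_threshold = second_observation + a * error_range  # equation (7)
```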
Generally, the number of hidden layers—and the number of nodes in each layer—may be selected based on the complexity of the input data, time complexity requirements, and accuracy requirements. Time complexity may refer to an amount of time required for the neural network to learn a problem—which can be represented by the input variables—and produce acceptable results—which can be represented by the output variables. Accuracy may refer to how close the results represented by the output variables are to real results. In various implementations, increasing the number of hidden layers and/or increasing the number of nodes in each layer may increase the accuracy of neural networks but also increase the time complexity. Conversely, in various implementations, decreasing the number of hidden layers and/or decreasing the number of nodes in each layer may decrease the accuracy of neural networks but also decrease the time complexity.
As shown in
In various implementations, if previous layer 8002 is the input layer of neural network 8000, each of the nodes 8006-8012 may correspond to an element of the input vector. For example, the input variables to a neural network may be expressed as input vector i having n dimensions. So for neural network 8000—which has an input layer with nodes 8006-8012 assigned scalar values x1-xn, respectively—input vector i may be represented by equation (8) below:
In various implementations, input vector i may be a signed vector, and each element may be a scalar value in a range of between about −1 and about 1. So, in some examples, the ranges of the scalar values of nodes 8006-8012 may be expressed in interval notation as: x1∈[−1,1], x2∈[−1,1], x3∈[−1,1], and xn∈[−1,1].
Each of the nodes of a previous layer of a feedforward neural network—such as neural network 8000—may be multiplied by a weight before being fed into one or more nodes of a next layer. For example, the nodes of previous layer 8002 may be multiplied by weights before being fed into one or more nodes of the next layer 8004. In various implementations, next layer 8004 may include one or more nodes, such as node 8014. While only a single node is shown in
In various implementations—such as in the example of
In various implementations, if a bias b is added to the summed outputs of the previous layer, then summation Σ may be represented by equation (10) below:
The summation Σ may then be fed into activation function ƒ. In various implementations, the activation function ƒ may be any mathematical function suitable for calculating an output of the node. Example activation functions ƒ may include linear or non-linear functions, step functions such as the Heaviside step function, derivative or differential functions, monotonic functions, sigmoid or logistic activation functions, rectified linear unit (ReLU) functions, and/or leaky ReLU functions. In some embodiments, the activation function may be the tanh function. The output of the function ƒ may then be the output of the node. In the example of
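The per-node computation described above can be sketched as follows. The inputs, weights, and bias are made-up example values, and tanh is used as the activation function per the embodiment mentioned above:

```python
# Sketch of a single node of next layer 8004: previous-layer outputs x1..xn
# are multiplied by weights, summed together with a bias b, and the summation
# is passed through activation function f (tanh here). All numeric values are
# made-up examples.
import math

inputs = [0.5, -0.25, 0.75]   # x1..xn from the previous layer
weights = [0.4, 0.8, -0.2]    # weights applied to each previous-layer output
bias = 0.1                    # bias b

# Summation, in the style of equation (10): sum of weighted inputs plus bias.
summation = sum(w * x for w, x in zip(weights, inputs)) + bias

# The node's output is the activation function applied to the summation.
output = math.tanh(summation)
```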
As previously discussed with reference to
At 10010, the subset of results may be provided to a trained multivariate machine learning model to generate a multivariate output. In some embodiments, the multivariate machine learning model may include a trained machine learning model suitable for detecting anomalous results within a set of results. For example, the multivariate machine learning model may be a trained Isolation Forest model, a trained Local Outlier Factor model, and/or a trained One-Class Support Vector Machine model. Accordingly, the multivariate output includes which of the input features (e.g., results) are predicted to be anomalous. At 10012, the multivariate output is analyzed to determine whether multivariate anomalous features are detected. In response to multivariate anomalous features being detected (“YES” at decision block 10012), the subset of results, the set of results, and/or the sample corresponding to the results is flagged within the laboratory information management system as being potentially anomalous at 10014. In various implementations, block 10014 may be implemented in a substantially similar manner as block 2016 of process 2000. In response to multivariate anomalous features not being detected (“NO” at decision block 10012), the subset of results, the set of results, and/or the sample corresponding to the results is approved within the laboratory information management system at 10016.
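The flag-or-approve decision at 10012–10016 can be sketched as below. The per-feature predictions are hypothetical stand-ins for a trained model's multivariate output, not output from any model described in this specification:

```python
# Sketch of the decision logic at 10012-10016: the multivariate output
# indicates which input features are predicted anomalous; the sample is
# flagged if any are detected and approved otherwise. Prediction values
# are hypothetical examples.

def disposition(anomalous_feature_flags):
    """Return 'flagged' if any feature is predicted anomalous, else 'approved'."""
    if any(anomalous_feature_flags):
        return "flagged"   # block 10014: flag as potentially anomalous
    return "approved"      # block 10016: approve the results

# e.g., the third result in the subset is predicted anomalous
multivariate_output = [False, False, True, False]
sample_disposition = disposition(multivariate_output)
```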
Although the operations of processes 2000-7000 and 10000 may be illustrated with reference to particular embodiments disclosed herein (e.g., the scientific instrument support modules 1000 discussed herein with reference to
The scientific instrument support methods disclosed herein may include interactions with a human user (e.g., via the user local computing device 13020 discussed herein with reference to
The GUI 11000 may include a data display region 11002, a data analysis region 11004, a scientific instrument control region 11006, and a settings region 11008. The particular number and arrangement of regions depicted in
The scientific instrument control region 11006 may include options that allow the user to control a scientific instrument (e.g., the scientific instrument 13010 discussed herein with reference to
As noted above, the scientific instrument support module 1000 may be implemented by one or more computing devices.
The computing device 12000 of
The computing device 12000 may include a processing device 12002 (e.g., one or more processing devices). As used herein, the term “processing device” may refer to any device or portion of a device that processes electronic data from registers and/or memory to transform that electronic data into other electronic data that may be stored in registers and/or memory. The processing device 12002 may include one or more digital signal processors (DSPs), application-specific integrated circuits (ASICs), central processing units (CPUs), graphics processing units (GPUs), cryptoprocessors (specialized processors that execute cryptographic algorithms within hardware), server processors, or any other suitable processing devices.
The computing device 12000 may include a storage device 12004 (e.g., one or more storage devices). The storage device 12004 may include one or more memory devices such as random access memory (RAM) (e.g., static RAM (SRAM) devices, magnetic RAM (MRAM) devices, dynamic RAM (DRAM) devices, resistive RAM (RRAM) devices, or conductive-bridging RAM (CBRAM) devices), hard drive-based memory devices, solid-state memory devices, networked drives, cloud drives, or any combination of memory devices. In some embodiments, the storage device 12004 may include memory that shares a die with a processing device 12002. In such an embodiment, the memory may be used as cache memory and may include embedded dynamic random access memory (eDRAM) or spin transfer torque magnetic random access memory (STT-MRAM), for example. In some embodiments, the storage device 12004 may include non-transitory computer readable media having instructions thereon that, when executed by one or more processing devices (e.g., the processing device 12002), cause the computing device 12000 to perform any appropriate ones of or portions of the methods disclosed herein.
The computing device 12000 may include an interface device 12006 (e.g., one or more interface devices 12006). The interface device 12006 may include one or more communication chips, connectors, and/or other hardware and software to govern communications between the computing device 12000 and other computing devices. For example, the interface device 12006 may include circuitry for managing wireless communications for the transfer of data to and from the computing device 12000. The term “wireless” and its derivatives may be used to describe circuits, devices, systems, methods, techniques, communications channels, etc., that may communicate data through the use of modulated electromagnetic radiation through a nonsolid medium. The term does not imply that the associated devices do not contain any wires, although in some embodiments they might not. Circuitry included in the interface device 12006 for managing wireless communications may implement any of a number of wireless standards or protocols, including but not limited to Institute for Electrical and Electronic Engineers (IEEE) standards including Wi-Fi (IEEE 802.11 family), IEEE 802.16 standards (e.g., IEEE 802.16-2005 Amendment), Long-Term Evolution (LTE) project along with any amendments, updates, and/or revisions (e.g., advanced LTE project, ultra mobile broadband (UMB) project (also referred to as “3GPP2”), etc.). In some embodiments, circuitry included in the interface device 12006 for managing wireless communications may operate in accordance with a Global System for Mobile Communication (GSM), General Packet Radio Service (GPRS), Universal Mobile Telecommunications System (UMTS), High Speed Packet Access (HSPA), Evolved HSPA (E-HSPA), or LTE network. 
In some embodiments, circuitry included in the interface device 12006 for managing wireless communications may operate in accordance with Enhanced Data for GSM Evolution (EDGE), GSM EDGE Radio Access Network (GERAN), Universal Terrestrial Radio Access Network (UTRAN), or Evolved UTRAN (E-UTRAN). In some embodiments, circuitry included in the interface device 12006 for managing wireless communications may operate in accordance with Code Division Multiple Access (CDMA), Time Division Multiple Access (TDMA), Digital Enhanced Cordless Telecommunications (DECT), Evolution-Data Optimized (EV-DO), and derivatives thereof, as well as any other wireless protocols that are designated as 3G, 4G, 5G, and beyond. In some embodiments, the interface device 12006 may include one or more antennas (e.g., one or more antenna arrays) for receipt and/or transmission of wireless communications.
In some embodiments, the interface device 12006 may include circuitry for managing wired communications, such as electrical, optical, or any other suitable communication protocols. For example, the interface device 12006 may include circuitry to support communications in accordance with Ethernet technologies. In some embodiments, the interface device 12006 may support both wireless and wired communication, and/or may support multiple wired communication protocols and/or multiple wireless communication protocols. For example, a first set of circuitry of the interface device 12006 may be dedicated to shorter-range wireless communications such as Wi-Fi or Bluetooth, and a second set of circuitry of the interface device 12006 may be dedicated to longer-range wireless communications such as global positioning system (GPS), EDGE, GPRS, CDMA, WiMAX, LTE, EV-DO, or others. In some embodiments, a first set of circuitry of the interface device 12006 may be dedicated to wireless communications, and a second set of circuitry of the interface device 12006 may be dedicated to wired communications.
The computing device 12000 may include battery/power circuitry 12008. The battery/power circuitry 12008 may include one or more energy storage devices (e.g., batteries or capacitors) and/or circuitry for coupling components of the computing device 12000 to an energy source separate from the computing device 12000 (e.g., AC line power).
The computing device 12000 may include a display device 12010 (e.g., multiple display devices). The display device 12010 may include any visual indicators, such as a heads-up display, a computer monitor, a projector, a touchscreen display, a liquid crystal display (LCD), a light-emitting diode display, or a flat panel display.
The computing device 12000 may include other input/output (I/O) devices 12012. The other I/O devices 12012 may include one or more audio output devices (e.g., speakers, headsets, earbuds, alarms, etc.), one or more audio input devices (e.g., microphones or microphone arrays), location devices (e.g., GPS devices in communication with a satellite-based system to receive a location of the computing device 12000, as known in the art), audio codecs, video codecs, printers, sensors (e.g., thermocouples or other temperature sensors, humidity sensors, pressure sensors, vibration sensors, accelerometers, gyroscopes, etc.), image capture devices such as cameras, keyboards, cursor control devices such as a mouse, a stylus, a trackball, or a touchpad, bar code readers, Quick Response (QR) code readers, or radio frequency identification (RFID) readers, for example.
The computing device 12000 may have any suitable form factor for its application and setting, such as a handheld or mobile computing device (e.g., a cell phone, a smart phone, a mobile internet device, a tablet computer, a laptop computer, a netbook computer, an ultrabook computer, a personal digital assistant (PDA), an ultra mobile personal computer, etc.), a desktop computing device, or a server computing device or other networked computing component.
One or more computing devices implementing any of the scientific instrument support modules or methods disclosed herein may be part of a scientific instrument support system.
Any of the scientific instrument 13010, the user local computing device 13020, the service local computing device 13030, or the remote computing device 13040 may include any of the embodiments of the computing device 12000 discussed herein with reference to
The scientific instrument 13010, the user local computing device 13020, the service local computing device 13030, or the remote computing device 13040 may each include a processing device 13002, a storage device 13004, and an interface device 13006. The processing device 13002 may take any suitable form, including the form of any of the processing devices 12002 discussed herein with reference to
The scientific instrument 13010, the user local computing device 13020, the service local computing device 13030, and the remote computing device 13040 may be in communication with other elements of the scientific instrument support system 13000 via communication pathways 13008. The communication pathways 13008 may communicatively couple the interface devices 13006 of different ones of the elements of the scientific instrument support system 13000, as shown, and may be wired or wireless communication pathways (e.g., in accordance with any of the communication techniques discussed herein with reference to the interface devices 12006 of the computing device 12000 of
The user local computing device 13020 may be a computing device (e.g., in accordance with any of the embodiments of the computing device 12000 discussed herein) that is local to a user of the scientific instrument 13010. In some embodiments, the user local computing device 13020 may also be local to the scientific instrument 13010, but this need not be the case; for example, a user local computing device 13020 that is in a user's home or office may be remote from, but in communication with, the scientific instrument 13010 so that the user may use the user local computing device 13020 to control and/or access data from the scientific instrument 13010. In some embodiments, the user local computing device 13020 may be a laptop, smartphone, or tablet device. In some embodiments the user local computing device 13020 may be a portable computing device.
The service local computing device 13030 may be a computing device (e.g., in accordance with any of the embodiments of the computing device 12000 discussed herein) that is local to an entity that services the scientific instrument 13010. For example, the service local computing device 13030 may be local to a manufacturer of the scientific instrument 13010 or to a third-party service company. In some embodiments, the service local computing device 13030 may communicate with the scientific instrument 13010, the user local computing device 13020, and/or the remote computing device 13040 (e.g., via a direct communication pathway 13008 or via multiple “indirect” communication pathways 13008, as discussed above) to receive data regarding the operation of the scientific instrument 13010, the user local computing device 13020, and/or the remote computing device 13040 (e.g., the results of self-tests of the scientific instrument 13010, calibration coefficients used by the scientific instrument 13010, the measurements of sensors associated with the scientific instrument 13010, etc.). In some embodiments, the service local computing device 13030 may communicate with the scientific instrument 13010, the user local computing device 13020, and/or the remote computing device 13040 (e.g., via a direct communication pathway 13008 or via multiple “indirect” communication pathways 13008, as discussed above) to transmit data to the scientific instrument 13010, the user local computing device 13020, and/or the remote computing device 13040 (e.g., to update programmed instructions, such as firmware, in the scientific instrument 13010, to initiate the performance of test or calibration sequences in the scientific instrument 13010, to update programmed instructions, such as software, in the user local computing device 13020 or the remote computing device 13040, etc.). 
A user of the scientific instrument 13010 may utilize the scientific instrument 13010 or the user local computing device 13020 to communicate with the service local computing device 13030 to report a problem with the scientific instrument 13010 or the user local computing device 13020, to request a visit from a technician to improve the operation of the scientific instrument 13010, to order consumables or replacement parts associated with the scientific instrument 13010, or for other purposes.
The remote computing device 13040 may be a computing device (e.g., in accordance with any of the embodiments of the computing device 12000 discussed herein) that is remote from the scientific instrument 13010 and/or from the user local computing device 13020. In some embodiments, the remote computing device 13040 may be included in a datacenter or other large-scale server environment. In some embodiments, the remote computing device 13040 may include network-attached storage (e.g., as part of the storage device 13004). The remote computing device 13040 may store data generated by the scientific instrument 13010, perform analyses of the data generated by the scientific instrument 13010 (e.g., in accordance with programmed instructions), facilitate communication between the user local computing device 13020 and the scientific instrument 13010, and/or facilitate communication between the service local computing device 13030 and the scientific instrument 13010.
In some embodiments, one or more of the elements of the scientific instrument support system 13000 illustrated in
The following paragraphs provide various examples of the embodiments disclosed herein.
Example 1 includes a method of detecting sample anomalies within a laboratory information management system. The method includes obtaining a first result for a sample within the laboratory information management system, processing, within the laboratory information management system, the first result via a univariate machine learning model trained using unsupervised machine learning, processing, within the laboratory information management system, a plurality of results for the sample via a multivariate machine learning model in response to the univariate machine learning model generating a normal output for the first result, and flagging, within the laboratory information management system, the sample for rejection processing in response to the multivariate machine learning model generating an abnormal output for the plurality of results. The first result represents a first type of result, the plurality of results includes the first result and each of the plurality of results represents a different type of result for the sample, and the multivariate machine learning model is trained using unsupervised machine learning.
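The two-stage screen of Example 1 can be sketched as below. The class names, the simple range and deviation-sum models, and the string labels are hypothetical stand-ins for illustration; the actual univariate and multivariate models are elaborated in the later examples.

```python
class UnivariateModel:
    """Hypothetical univariate gate: a result is normal if it lies in [lo, hi]."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def is_normal(self, result):
        return self.lo <= result <= self.hi


class MultivariateModel:
    """Hypothetical multivariate check: a sample is normal if the sum of
    absolute deviations of its results from per-type reference means stays
    at or below a threshold."""
    def __init__(self, means, threshold):
        self.means, self.threshold = means, threshold

    def is_normal(self, results):
        score = sum(abs(r - m) for r, m in zip(results, self.means))
        return score <= self.threshold


def screen_sample(first_result, all_results, uni, multi):
    # Stage 1: univariate gate on the first result.
    if not uni.is_normal(first_result):
        return "abnormal"
    # Stage 2: only samples passing the gate reach the multivariate model,
    # which considers every result type for the sample jointly.
    return "normal" if multi.is_normal(all_results) else "abnormal"
```

A sample can thus be flagged even when the first result looks normal in isolation, because the multivariate stage examines the joint pattern across result types.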
Example 2 includes the subject matter of Example 1 and further specifies that processing the plurality of results via the multivariate machine learning model includes generating an input vector from the plurality of results and providing the input vector to the multivariate machine learning model to generate an output vector.
Example 3 includes the subject matter of Example 2 and further specifies generating the abnormal output for the plurality of results in response to an anomaly score computed based on a comparison of the input vector and the output vector exceeding a threshold value.
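One plausible reading of the comparison in Example 3 is a Euclidean distance between the input vector and the model's output (reconstructed) vector; the examples do not fix a specific metric, so the function names and the distance choice here are assumptions.

```python
import math

def anomaly_score(input_vec, output_vec):
    """Anomaly score as the Euclidean distance between the vector fed to the
    multivariate model and the vector it produced (one possible comparison)."""
    return math.sqrt(sum((i - o) ** 2 for i, o in zip(input_vec, output_vec)))

def is_abnormal(input_vec, output_vec, threshold):
    # The abnormal output is generated when the score exceeds the threshold.
    return anomaly_score(input_vec, output_vec) > threshold
```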
Example 4 includes the subject matter of Example 3 and further specifies setting the threshold value based on a training dataset. Setting the threshold value based on the training dataset includes loading a training dataset including training results for a plurality of training samples, inputting the training results to the multivariate machine learning model to generate training outputs, computing differences between the training results and the training outputs, and computing the threshold value based on the differences.
Example 5 includes the subject matter of Example 4 and further specifies that computing the threshold value based on the differences includes ordering the differences in ascending order, computing a first training value based on a lower percentile threshold of the ordered differences, computing a second training value based on an upper percentile threshold of the ordered differences, computing a first range based on a difference between the first training value and the second training value, and computing the threshold value as a function of the second training value and the first range.
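The threshold computation of Example 5 resembles an interpercentile-range rule. A minimal sketch, assuming 25th/75th percentiles and a 1.5x multiplier; the examples leave the exact percentiles and the precise function of the upper value and the range open, so those constants are illustrative.

```python
def percentile(sorted_vals, p):
    """Linear-interpolation percentile over an ascending-sorted list."""
    k = (len(sorted_vals) - 1) * p / 100.0
    lo, hi = int(k), min(int(k) + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)

def reconstruction_threshold(differences, lower_p=25, upper_p=75, k=1.5):
    # Order the reconstruction differences ascending, take the lower- and
    # upper-percentile training values, compute the first range between
    # them, and set the threshold above the upper value by a multiple
    # of that range.
    ordered = sorted(differences)
    first = percentile(ordered, lower_p)   # first training value
    second = percentile(ordered, upper_p)  # second training value
    rng = second - first                   # first range
    return second + k * rng
```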
Example 6 includes the subject matter of any of Examples 1-5 and further specifies that the multivariate machine learning model includes a neural network.
Example 7 includes the subject matter of Example 6 and further specifies that the neural network includes an autoencoder.
Example 8 includes the subject matter of Example 6 and further specifies that the neural network includes a variational autoencoder.
Example 9 includes the subject matter of Example 1 and further specifies that the multivariate machine learning model is configured to identify anomalous features in the plurality of results.
Example 10 includes the subject matter of Example 9 and further specifies generating the abnormal output for the plurality of results in response to identifying anomalous features in the plurality of results.
Example 11 includes the subject matter of any of Examples 9 or 10 and further specifies that the multivariate machine learning model is an isolation forest model.
Example 12 includes the subject matter of any of Examples 9 or 10 and further specifies that the multivariate machine learning model is a local outlier factor model.
Example 13 includes the subject matter of any of Examples 9 or 10 and further specifies that the multivariate machine learning model is a one-class support vector machine.
Example 14 includes the subject matter of any of Examples 1-13 and further specifies training the univariate machine learning model. Training the univariate machine learning model includes loading a plurality of training results from the laboratory information management system, each result of the plurality of training results being the first type of result, ordering the plurality of training results in ascending order, computing a first observation value based on a lower percentile threshold of the ordered plurality of training results, computing a second observation value based on an upper percentile threshold of the ordered plurality of training results, computing a second range based on a difference between the first observation value and the second observation value, setting a minimum threshold as a function of the first observation value and the second range, and setting a maximum threshold as a function of the second observation value and the second range.
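The univariate training of Example 14, together with the classification of Examples 15-17, can be sketched as follows. The 25th/75th percentiles and the 1.5x multiplier are assumptions (Examples 48-49 suggest those percentiles for a related claim chain, but the multiplier is not specified).

```python
def percentile(sorted_vals, p):
    """Linear-interpolation percentile over an ascending-sorted list."""
    k = (len(sorted_vals) - 1) * p / 100.0
    lo, hi = int(k), min(int(k) + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)

def train_univariate_model(training_results, lower_p=25, upper_p=75, k=1.5):
    # Order the training results ascending, take the lower- and
    # upper-percentile observation values, compute the range between them,
    # and set the min/max thresholds outside that range.
    ordered = sorted(training_results)
    first = percentile(ordered, lower_p)   # first observation value
    second = percentile(ordered, upper_p)  # second observation value
    rng = second - first                   # second range
    return first - k * rng, second + k * rng  # (min_threshold, max_threshold)

def classify(result, min_threshold, max_threshold):
    # Normal only when the result neither exceeds the maximum threshold
    # nor falls below the minimum threshold (Examples 15-17).
    return "normal" if min_threshold <= result <= max_threshold else "abnormal"
```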
Example 15 includes the subject matter of Example 14 and further specifies generating the normal output for the first result in response to determining the first result does not exceed the maximum threshold.
Example 16 includes the subject matter of any of Examples 14 or 15 and further specifies generating the normal output for the first result in response to determining the first result is not below the minimum threshold.
Example 17 includes the subject matter of any of Examples 14-16 and further specifies generating an abnormal output in response to determining the first result exceeds the maximum threshold or is below the minimum threshold.
Example 18 includes the subject matter of any of Examples 1-17 and further specifies that flagging the sample for rejection processing includes generating a notification on a graphical user interface, wherein the notification includes at least one of (i) anomaly scores per feature, (ii) graphs, or (iii) graphical representations of clusters.
Example 19 includes the subject matter of any of Examples 1-17 and further specifies that flagging the sample for rejection processing includes flagging the sample for manual processing.
Example 20 includes the subject matter of any of Examples 1-19 and further specifies that flagging the sample for rejection processing includes adding, within the laboratory information management system, an anomaly tag to the plurality of results.
Example 21 includes a scientific instrument support apparatus that includes memory hardware configured to store instructions and processing hardware configured to execute the instructions. The instructions include obtaining a first result for a sample within a laboratory information management system, processing, within the laboratory information management system, the first result via a univariate machine learning model trained using unsupervised machine learning, processing, within the laboratory information management system, a plurality of results for the sample via a multivariate machine learning model in response to the univariate machine learning model generating a normal output for the first result, the multivariate machine learning model trained using unsupervised machine learning, and flagging, within the laboratory information management system, the sample for rejection processing in response to the multivariate machine learning model generating an abnormal output for the plurality of results. The first result represents a first type of result, the plurality of results include the first result, and each of the plurality of results represents a different type of result for the sample.
Example 22 includes the subject matter of Example 21 and further specifies that processing the plurality of results via the multivariate machine learning model includes generating an input vector from the plurality of results and providing the input vector to the multivariate machine learning model to generate an output vector.
Example 23 includes the subject matter of Example 22 and further specifies that the instructions further comprise generating the abnormal output for the plurality of results in response to an anomaly score computed based on a comparison of the input vector and the output vector exceeding a threshold value.
Example 24 includes the subject matter of Example 23 and further specifies that the instructions further comprise setting the threshold value based on a training dataset. Setting the threshold value based on the training dataset includes loading a training dataset including training results for a plurality of training samples, inputting the training results to the multivariate machine learning model to generate training outputs, computing differences between the training results and the training outputs, and computing the threshold value based on the differences.
Example 25 includes the subject matter of Example 24 and further specifies that computing the threshold value based on the differences includes ordering the differences in ascending order, computing a first training value based on a lower percentile threshold of the ordered differences, computing a second training value based on an upper percentile threshold of the ordered differences, computing a first range based on a difference between the first training value and the second training value, and computing the threshold value as a function of the second training value and the first range.
Example 26 includes the subject matter of any of Examples 21-25 and further specifies that the multivariate machine learning model includes a neural network.
Example 27 includes the subject matter of Example 26 and further specifies that the neural network includes an autoencoder.
Example 28 includes the subject matter of Example 26 and further specifies that the neural network includes a variational autoencoder.
Example 29 includes the subject matter of Example 21 and further specifies that the multivariate machine learning model is configured to identify anomalous features in the plurality of results.
Example 30 includes the subject matter of Example 29 and further specifies that the instructions further comprise generating the abnormal output for the plurality of results in response to identifying anomalous features in the plurality of results.
Example 31 includes the subject matter of any of Examples 29 or 30 and further specifies that the multivariate machine learning model is an isolation forest model.
Example 32 includes the subject matter of any of Examples 29 or 30 and further specifies that the multivariate machine learning model is a local outlier factor model.
Example 33 includes the subject matter of any of Examples 29 or 30 and further specifies that the multivariate machine learning model is a one-class support vector machine.
Example 34 includes the subject matter of any of Examples 21-33 and further specifies that the instructions further comprise training the univariate machine learning model. Training the univariate machine learning model includes loading a plurality of training results from the laboratory information management system, each result of the plurality of training results being the first type of result, ordering the plurality of training results in ascending order, computing a first observation value based on a lower percentile threshold of the ordered plurality of training results, computing a second observation value based on an upper percentile threshold of the ordered plurality of training results, computing a second range based on a difference between the first observation value and the second observation value, setting a minimum threshold as a function of the first observation value and the second range, and setting a maximum threshold as a function of the second observation value and the second range.
Example 35 includes the subject matter of Example 34 and further specifies that the instructions further comprise generating the normal output for the first result in response to determining the first result does not exceed the maximum threshold.
Example 36 includes the subject matter of any of Examples 34 or 35 and further specifies that the instructions further comprise generating the normal output for the first result in response to determining the first result is not below the minimum threshold.
Example 37 includes the subject matter of any of Examples 34-36 and further specifies that the instructions further comprise generating an abnormal output in response to determining the first result exceeds the maximum threshold or is below the minimum threshold.
Example 38 includes the subject matter of any of Examples 21-37 and further specifies that flagging the sample for rejection processing includes generating a notification on a graphical user interface, wherein the notification includes at least one of (i) anomaly scores per feature, (ii) graphs, or (iii) graphical representations of clusters.
Example 39 includes the subject matter of any of Examples 21-37 and further specifies that flagging the sample for rejection processing includes flagging the sample for manual processing.
Example 40 includes the subject matter of any of Examples 21-39 and further specifies that flagging the sample for rejection processing includes adding, within the laboratory information management system, an anomaly tag to the plurality of results.
Example 41 includes a computer-implemented method that includes processing a sample with a scientific instrument to generate a plurality of results, inputting at least one result of the plurality of results to a trained univariate machine learning model to generate a univariate output for each result, inputting the univariate outputs to a trained multivariate machine learning model to generate a multivariate output, computing an anomaly score between the univariate outputs input to the trained multivariate machine learning model and the multivariate output, and flagging, within a laboratory information management system, the sample for rejection processing in response to determining that the anomaly score exceeds a threshold.
Example 42 includes the subject matter of Example 41 and further specifies generating an input vector based on the univariate outputs and providing the input vector to the trained multivariate machine learning model to generate the multivariate output. Computing the anomaly score between the univariate outputs input to the trained multivariate machine learning model and the multivariate output includes computing a distance between the input vector and the multivariate output.
Example 43 includes the subject matter of Example 42 and further specifies training a multivariate machine learning model. Training the multivariate machine learning model includes generating a training input vector based on a training sample retrieved from the laboratory information management system, providing the training input vector to the multivariate machine learning model to generate a training output vector, computing a distance between the training input vector and the training output vector, and updating parameters of the multivariate machine learning model and saving the multivariate machine learning model configured with the updated parameters as the trained multivariate machine learning model in response to determining that the distance exceeds a threshold.
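A toy version of the training loop in Examples 43-44, substituting a one-parameter linear reconstruction for a real autoencoder so the sketch stays self-contained; every name and constant here is illustrative, not part of the claimed method.

```python
import math

def train_multivariate_model(train_vec, threshold=1e-3, lr=0.1, max_epochs=1000):
    """Toy training loop in the shape of Examples 43-44: reconstruct the
    training input vector, measure the distance to it, and update the
    model parameter while that distance exceeds the threshold."""
    w = 0.0  # single model parameter, initialized far from a good fit
    for _ in range(max_epochs):
        output = [w * x for x in train_vec]  # reconstruct the input vector
        # Distance between the training input vector and the training output.
        dist = math.sqrt(sum((o - x) ** 2 for o, x in zip(output, train_vec)))
        if dist <= threshold:
            break  # Example 44: keep the model once the distance is small enough
        # Otherwise update parameters: gradient step on the squared error.
        grad = sum((w * x - x) * x for x in train_vec)
        w -= lr * grad
    return w  # the "saved" trained model (here, just its parameter)
```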
Example 44 includes the subject matter of Example 43 and further specifies that training the multivariate machine learning model includes saving the multivariate machine learning model as the trained multivariate machine learning model in response to determining that the distance does not exceed the threshold.
Example 45 includes the subject matter of any of Examples 41-44 and further specifies that the trained multivariate machine learning model comprises a neural network.
Example 46 includes the subject matter of Example 45 and further specifies that the neural network comprises an autoencoder.
Example 47 includes the subject matter of any of Examples 41-45 and further specifies training the univariate machine learning model. Training the univariate machine learning model includes loading a plurality of training results from the laboratory information management system, each result of the plurality of training results being a first type of result, ordering the plurality of training results in ascending order, computing a first observation value based on a lower percentile threshold of the ordered plurality of training results, computing a second observation value based on an upper percentile threshold of the ordered plurality of training results, computing a range based on a difference between the first observation value and the second observation value, setting a minimum threshold as a function of the first observation value and the range, and setting a maximum threshold as a function of the second observation value and the range.
Example 48 includes the subject matter of Example 47 and further specifies that the lower percentile threshold is about a 25th percentile.
Example 49 includes the subject matter of any of Examples 47 or 48 and further specifies that the upper percentile threshold is about a 75th percentile.
Example 50 includes the subject matter of any of Examples 47-49 and further specifies that inputting the univariate outputs to a trained multivariate machine learning model includes loading a selected result from the sample, wherein the selected result is the first type of result, determining whether the selected result is within a range between the minimum threshold and the maximum threshold, and adding the selected result to an input vector for the trained multivariate machine learning model in response to determining that the selected result is within a range between the minimum threshold and the maximum threshold.
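The gating step of Example 50 can be sketched as below; the dictionary-based argument shapes and the names are hypothetical conveniences, not part of the claims.

```python
def build_input_vector(sample_results, thresholds):
    """Assemble the multivariate input vector per Example 50: a selected
    result is added only if it lies within the [min, max] range fitted
    for its result type by the univariate model."""
    vec = []
    for result_type, value in sample_results.items():
        lo, hi = thresholds[result_type]
        if lo <= value <= hi:  # passes the univariate range check
            vec.append(value)
    return vec
```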
Example 51 includes a scientific instrument support apparatus that includes memory hardware configured to store instructions and processing hardware configured to execute the instructions. The instructions include processing a sample with a scientific instrument to generate a plurality of results, inputting at least one result of the plurality of results to a trained univariate machine learning model to generate a univariate output for each result, inputting the univariate outputs to a trained multivariate machine learning model to generate a multivariate output, computing an anomaly score between the univariate outputs input to the trained multivariate machine learning model and the multivariate output, and flagging, within a laboratory information management system, the sample for rejection processing in response to determining that the anomaly score exceeds a threshold.
Example 52 includes the subject matter of Example 51 and further specifies that the instructions further comprise generating an input vector based on the univariate outputs and providing the input vector to the trained multivariate machine learning model to generate the multivariate output. Computing the anomaly score between the univariate outputs input to the trained multivariate machine learning model and the multivariate output includes computing a distance between the input vector and the multivariate output.
Example 53 includes the subject matter of Example 52 and further specifies that the instructions further comprise training a multivariate machine learning model. Training the multivariate machine learning model includes generating a training input vector based on a training sample retrieved from the laboratory information management system, providing the training input vector to the multivariate machine learning model to generate a training output vector, computing a distance between the training input vector and the training output vector, and updating parameters of the multivariate machine learning model and saving the multivariate machine learning model configured with the updated parameters as the trained multivariate machine learning model in response to determining that the distance exceeds a threshold.
Example 54 includes the subject matter of Example 53 and further specifies that training the multivariate machine learning model includes saving the multivariate machine learning model as the trained multivariate machine learning model in response to determining that the distance does not exceed the threshold.
Example 55 includes the subject matter of any of Examples 51-54 and further specifies that the trained multivariate machine learning model comprises a neural network.
Example 56 includes the subject matter of Example 55 and further specifies that the neural network comprises an autoencoder.
Example 57 includes the subject matter of any of Examples 51-55 and further specifies that the instructions further comprise training the univariate machine learning model. Training the univariate machine learning model includes loading a plurality of training results from the laboratory information management system, each result of the plurality of training results being a first type of result, ordering the plurality of training results in ascending order, computing a first observation value based on a lower percentile threshold of the ordered plurality of training results, computing a second observation value based on an upper percentile threshold of the ordered plurality of training results, computing a range based on a difference between the first observation value and the second observation value, setting a minimum threshold as a function of the first observation value and the range, and setting a maximum threshold as a function of the second observation value and the range.
Example 58 includes the subject matter of Example 57 and further specifies that the lower percentile threshold is about a 25th percentile.
Example 59 includes the subject matter of any of Examples 57 or 58 and further specifies that the upper percentile threshold is about a 75th percentile.
Example 60 includes the subject matter of any of Examples 57-59 and further specifies that inputting the univariate outputs to a trained multivariate machine learning model includes loading a selected result from the sample, wherein the selected result is the first type of result, determining whether the selected result is within a range between the minimum threshold and the maximum threshold, and adding the selected result to an input vector for the trained multivariate machine learning model in response to determining that the selected result is within a range between the minimum threshold and the maximum threshold.
Example 61 includes one or more non-transitory computer-readable media having instructions thereon that, when executed by one or more processing devices of a scientific instrument support apparatus, cause the scientific instrument support apparatus to perform the method of any of Examples 1-20.
Example 62 includes one or more non-transitory computer-readable media having instructions thereon that, when executed by one or more processing devices of a scientific instrument support apparatus, cause the scientific instrument support apparatus to perform the method of any of Examples 41-50.