Oligonucleotides (sometimes referred to herein as oligos) are short DNA or RNA molecules having a specific sequence of bases that can be used for a variety of purposes. For example, a group of oligos can be used in a positive control sample provided to a sequencing device (e.g., a next generation sequencing device) to determine whether the sequencing device and/or associated sequencing processes (e.g., sequence alignment) properly identify the sequences that are known to be present in the group of oligos that were included in the positive control sample. However, using oligos for this or other purposes can be confounded if the oligos are not of sufficiently high quality. For example, quality can be affected by factors including but not limited to the presence of additional undesired species of oligos, discrepancies in relative abundances between desired oligo species, or insufficient similarity of oligo properties to the properties of the sample types for which the oligos will be used as a control.
Accordingly, new systems, methods, and media for determining relative quality of oligonucleotide preparations are desirable.
In accordance with some embodiments of the disclosed subject matter, systems, methods, and media for determining relative quality of oligonucleotide preparations are provided.
In accordance with some embodiments of the disclosed subject matter, a system for determining relative quality of oligonucleotide preparations is provided, the system comprising: at least one hardware processor that is programmed to: (a) receive genetic sequencing results for multiple libraries each associated with a target concentration of a plurality of oligonucleotides; (b) calculate at least one prediction band based on the multiple libraries; (c) repeat (a) and (b) for a plurality of preparations; (d) determine boundaries for a final prediction band based on the prediction bands calculated at (b) for each of the plurality of preparations; and (e) cause to be presented a report indicative of quality of the oligonucleotide libraries associated with the plurality of preparations, wherein the report includes at least metrics indicative of the final prediction band.
In some embodiments, the at least one hardware processor is further programmed to: subsequent to (a) and prior to (b), (i) divide the libraries into a plurality of titer bins based on target concentration, including a high titer bin and a low titer bin; and repeat (a), (i), and (b) for each of the plurality of preparations.
In some embodiments, the at least one hardware processor is further programmed to: receive genetic sequencing results for multiple new libraries each associated with a target concentration of oligonucleotides; calculate a prediction band based on the multiple new libraries; and cause the report to include at least metrics indicative of the prediction band calculated based on the multiple new libraries.
In some embodiments, the at least one hardware processor is further programmed to: divide the new libraries into the plurality of titer bins based on target concentration, including the high titer bin and the low titer bin; calculate a prediction band for each titer bin based on the multiple new libraries; and cause the report to include at least metrics indicative of the prediction band for the high titer bin calculated based on the multiple new libraries.
In some embodiments, the at least one hardware processor is further programmed to: cause the report to include a graphical representation of the final prediction band using a first pair of axes; and cause the report to include a graphical representation of the metrics indicative of the prediction band for the high titer bin calculated based on the multiple new libraries using the same pair of axes.
In some embodiments, each prediction band includes an upper line and a lower line, wherein the upper line and the lower line are each characterized by a slope m and an intercept c.
In some embodiments, the processor is further programmed to: generate a distribution of slopes for the upper line of each prediction band corresponding to the high titer bin; determine a range of slopes for an upper boundary for the final prediction band based on the distribution of slopes for the upper line of each prediction band corresponding to the high titer bin; generate a distribution of slopes for the lower line of each prediction band corresponding to the high titer bin; determine a range of slopes for a lower boundary for the final prediction band based on the distribution of slopes for the lower line of each prediction band corresponding to the high titer bin; generate a distribution of intercepts for the high titer bin; determine a range of intercepts based on the distribution of intercepts for the high titer bin; and cause the report to include the range of slopes for the upper boundary, the range of slopes for the lower boundary, and the range of intercepts.
In some embodiments, the at least one hardware processor is further programmed to: cause the report to include a graphical representation of the final prediction band using a first pair of axes; and cause the report to include a graphical representation of the metrics indicative of the prediction band calculated based on the multiple new libraries using the same pair of axes.
In some embodiments, each prediction band includes an upper line and a lower line, wherein the upper line and the lower line are each characterized by a slope m and an intercept c.
In some embodiments, the processor is further programmed to: generate a distribution of slopes for the upper line of each prediction band; determine a range of slopes for an upper boundary for the final prediction band based on the distribution of slopes for the upper line of each prediction band; generate a distribution of slopes for the lower line of each prediction band; determine a range of slopes for a lower boundary for the final prediction band based on the distribution of slopes for the lower line of each prediction band; generate a distribution of intercepts for the high titer bin; determine a range of intercepts based on the distribution of intercepts; and cause the report to include the range of slopes for the upper boundary, the range of slopes for the lower boundary, and the range of intercepts.
In some embodiments, the processor is further programmed to: cause the report to include a graphical representation of the final prediction band based on the range of slopes for the upper boundary, the range of slopes for the lower boundary, and the range of intercepts.
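As a non-limiting illustration, the determination of slope and intercept ranges from their distributions can be sketched in Python as follows. The rule used here to turn a distribution into a range (the mean plus or minus k sample standard deviations) is an assumption made only for this sketch, and the names value_range and final_band_metrics are hypothetical.

```python
from statistics import mean, stdev


def value_range(values, k=2.0):
    """Return (low, high) as mean -/+ k sample standard deviations.

    ASSUMPTION: this rule is illustrative; another rule (e.g., percentiles
    or min/max) could be substituted. Requires at least two values.
    """
    center = mean(values)
    spread = stdev(values)
    return center - k * spread, center + k * spread


def final_band_metrics(upper_slopes, lower_slopes, intercepts, k=2.0):
    """Summarize per-preparation band lines into final-band ranges.

    Each argument holds one value per preparation (hypothetical inputs).
    """
    return {
        "upper_slope_range": value_range(upper_slopes, k),
        "lower_slope_range": value_range(lower_slopes, k),
        "intercept_range": value_range(intercepts, k),
    }


# Hypothetical values from three preparations.
metrics = final_band_metrics(
    upper_slopes=[1.05, 1.10, 1.15],
    lower_slopes=[0.85, 0.90, 0.95],
    intercepts=[0.18, 0.20, 0.22],
)
```

In this sketch each range widens as the per-preparation lines disagree more, so a tightly clustered set of preparations yields a narrow final band.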
In some embodiments, the genetic sequencing results for each of the multiple libraries are indicative of a number of reads corresponding to each oligonucleotide of the plurality of oligonucleotides; and the processor is further programmed to: determine, for each of the libraries, a signal value indicative of the number of reads corresponding to an average of the number of reads corresponding to each oligonucleotide of the plurality of oligonucleotides;
calculate a ratio of target concentration for each pair of libraries in the multiple libraries by dividing the higher target concentration of the pair by the lower target concentration of the pair; calculate a ratio of signal values for each pair of libraries in the multiple libraries by dividing the signal value associated with the sample with the higher target concentration of the pair by the signal value associated with the sample with the lower target concentration of the pair; calculate a logarithm of each ratio of target concentration; calculate a logarithm of each ratio of signal values; and calculate the prediction band based on a plurality of points each having an x value corresponding to the logarithm of the ratio of target concentration of two libraries and a y value corresponding to the logarithm of the ratio of signal values of the two libraries.
In accordance with some embodiments of the disclosed subject matter, a method for determining relative quality of oligonucleotide preparations is provided, the method comprising: (a) receiving genetic sequencing results for multiple libraries each associated with a target concentration of a plurality of oligonucleotides; (b) calculating at least one prediction band based on the multiple libraries; (c) repeating (a) and (b) for a plurality of preparations; (d) determining boundaries for a final prediction band based on the prediction bands calculated at (b) for each of the plurality of preparations; and (e) causing to be presented a report indicative of quality of the oligonucleotide libraries associated with the plurality of preparations, wherein the report includes at least metrics indicative of the final prediction band.
In accordance with some embodiments of the disclosed subject matter, a non-transitory computer readable medium containing computer executable instructions that, when executed by a processor, cause the processor to perform a method for determining relative quality of oligonucleotide preparations is provided, the method comprising: (a) receiving genetic sequencing results for multiple libraries each associated with a target concentration of a plurality of oligonucleotides; (b) calculating at least one prediction band based on the multiple libraries; (c) repeating (a) and (b) for a plurality of libraries; (d) determining boundaries for a final prediction band based on the prediction bands calculated at (b) for the high titer bin associated with each of the plurality of libraries; and (e) causing to be presented a report indicative of quality of the oligonucleotide libraries associated with the plurality of libraries, wherein the report includes at least metrics indicative of the final prediction band.
Various objects, features, and advantages of the disclosed subject matter can be more fully appreciated with reference to the following detailed description of the disclosed subject matter when considered in connection with the following drawings, in which like reference numerals identify like elements.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
In accordance with various embodiments, mechanisms (which can, for example, include systems, methods, and media) for determining relative quality of oligonucleotide preparations are provided.
In accordance with some embodiments of the disclosed subject matter, mechanisms described herein can be used to determine metrics that can be indicative of the quality of a preparation of oligos and/or the quality of a process for sequencing a library derived from a preparation of oligos. In general, oligos can be used as normalization controls (which can sometimes be referred to as quantitative controls) that can be used to determine whether a genetic sequencing process is producing accurate and precise results. In some embodiments, a preparation of oligos can refer to a group of oligos synthesized based on a design that specifies a set of oligos based on various parameters. For example, a preparation of oligos can be a master, which can refer to a collection of oligos synthesized based on a particular design during a particular period of time. In a more particular example, a master can be X total moles of oligos based on a design specifying a set of Y different oligos at one or more target concentrations (e.g., each oligo in a design can be associated with a target molar concentration per liter, a target number of nanomoles, etc., which may be the same across all oligos or different for different sets of one or more oligos). As another example, a preparation of oligos can be a pool, which can refer to a portion of a master. In a particular example, if a particular master originally comprises 1 liter of solution, a 100 milliliter portion of the master can be referred to as a pool of oligos. As yet another example, a preparation of oligos can be a sample, which can refer to a portion of a master or pool of oligos. In a more particular example, a sample can refer to a portion of a master or pool that is to be prepared for sequencing (e.g., using one or more next generation sequencing techniques). 
As still another example, a preparation of oligos can be a library, which can refer to a sample or a portion of a sample that has been prepared such that it is suitable for sequencing (e.g., by ligating an adapter that the sequencing technique utilizes during sequencing to each end of the oligos). In some embodiments, multiple libraries (e.g., at different target concentrations) can be derived from a single sample by subdividing the sample and combining the portion of the sample with an amount of solvent needed to achieve a particular target concentration, where the amount of solvent needed to achieve a particular target concentration can be determined based on the target concentration of the sample.
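As a non-limiting illustration, the amount of solvent needed to bring a portion of a sample to a particular target concentration can be sketched in Python as follows. The sketch assumes simple dilution (C1 × V1 = C2 × V2) with consistent units; the function name solvent_volume_needed and the numerical values are hypothetical.

```python
def solvent_volume_needed(sample_conc, aliquot_volume, target_conc):
    """Volume of solvent to add so the aliquot reaches target_conc.

    ASSUMPTION: simple dilution, C1 * V1 = C2 * V2, with all concentrations
    in the same unit and all volumes in the same unit.
    """
    if sample_conc <= 0 or target_conc <= 0:
        raise ValueError("concentrations must be positive")
    if target_conc > sample_conc:
        raise ValueError("dilution cannot increase concentration")
    final_volume = sample_conc * aliquot_volume / target_conc
    return final_volume - aliquot_volume


# Hypothetical example: a 10-unit aliquot at concentration 100 diluted to 10.
added = solvent_volume_needed(sample_conc=100.0, aliquot_volume=10.0, target_conc=10.0)
```

Repeating the calculation with several target concentrations yields a dilution series of libraries derived from the same sample.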
In some embodiments, mechanisms described herein can be used to, among other things, indicate the quality of a particular oligo preparation, to compare quality between oligo preparations, to compare oligos corresponding to different experimental designs, and/or to compare oligos manufactured via different manufacturing techniques.
In some embodiments, system 100 can include an alignment system that can use any suitable alignment technique or combination of techniques, such as linear alignment techniques, and graph-based alignment techniques (e.g., as described in U.S. Patent Application Publication No. 2020/0090786, which is hereby incorporated by reference herein in its entirety) to assemble reads in results received from data source 102 into sequences (e.g., sequences corresponding to oligos in the library).
In some embodiments, oligo quality assessment system 104 can determine prediction bands based on the known target concentration of the libraries and the sequencing results received from data source 102 for the libraries. For example, oligo quality assessment system 104 can execute one or more portions of process 300 described below in connection with
Additionally or alternatively, in some embodiments, computing device 110 can communicate information about genetic information (e.g., genetic sequence results generated by a next generation sequencing device, aligned reads associated with a particular library) from data source 102 to a server 120 over a communication network 108 and/or server 120 can receive genetic information from data source 102 (e.g., directly and/or using communication network 108), which can execute at least a portion of oligo quality assessment system 104. In such embodiments, server 120 can return analysis results to computing device 110 (and/or any other suitable computing device) indicative of quality of the oligo preparations.
In some embodiments, computing device 110 and/or server 120 can be any suitable computing device or combination of devices, such as a desktop computer, a laptop computer, a smartphone, a tablet computer, a wearable computer, a server computer, a virtual machine being executed by a physical computing device, a specialty device (e.g., a next generation sequencing device), etc. As described below in connection with
In some embodiments, data source 102 can be any suitable source or sources of genetic data. For example, data source 102 can be a next generation sequencing device or devices that generate a large number of reads from a library. As another example, data source 102 can be a data store configured to store genetic data, which may be aligned genetic data or unaligned reads.
In some embodiments, data source 102 can be local to computing device 110. For example, data source 102 can be incorporated with computing device 110. As another example, data source 102 can be connected to computing device 110 by one or more cables, a direct wireless link, etc. Additionally or alternatively, in some embodiments, data source 102 can be located locally and/or remotely from computing device 110, and provide data to computing device 110 (and/or server 120) via a communication network (e.g., communication network 108).
In some embodiments, communication network 108 can be any suitable communication network or combination of communication networks. For example, communication network 108 can include a Wi-Fi network (which can include one or more wireless routers, one or more switches, etc.), a peer-to-peer network (e.g., a Bluetooth network), a cellular network (e.g., a 3G network, a 4G network, a 5G network, etc., complying with any suitable standard, such as CDMA, GSM, LTE, LTE Advanced, WiMAX, 5G NR, etc.), a wired network, etc. In some embodiments, communication network 108 can be a local area network, a wide area network, a public network (e.g., the Internet), a private or semi-private network (e.g., a corporate or university intranet), any other suitable type of network, or any suitable combination of networks. Communications links shown in
In some embodiments, communications systems 208 can include any suitable hardware, firmware, and/or software for communicating information over communication network 108 and/or any other suitable communication networks. For example, communications systems 208 can include one or more transceivers, one or more communication chips and/or chip sets, etc. In a more particular example, communications systems 208 can include hardware, firmware and/or software that can be used to establish a Wi-Fi connection, a Bluetooth connection, a cellular connection, an Ethernet connection, etc.
In some embodiments, memory 210 can include any suitable storage device or devices that can be used to store instructions, values, etc., that can be used, for example, by processor 202 to present content using display 204, to communicate with server 120 via communications system(s) 208, etc. Memory 210 can include any suitable volatile memory, non-volatile memory, storage, or any suitable combination thereof. For example, memory 210 can include RAM, ROM, EEPROM, one or more flash drives, one or more hard disks, one or more solid state drives, one or more optical drives, etc. In some embodiments, memory 210 can have encoded thereon a computer program for controlling operation of computing device 110. In such embodiments, processor 202 can execute at least a portion of the computer program to present content (e.g., user interfaces, graphics, tables, reports, etc.), receive genetic data, information, and/or content from data source 102, receive information (e.g., content, genetic information, etc.) from server 120, transmit information to server 120, etc.
In some embodiments, server 120 can include a processor 212, a display 214, one or more inputs 216, one or more communications systems 218, and/or memory 220. In some embodiments, processor 212 can be any suitable hardware processor or combination of processors, such as a CPU, a GPU, an MCU, an ASIC, an FPGA, etc. In some embodiments, display 214 can include any suitable display devices, such as a computer monitor, a touchscreen, a television, etc. In some embodiments, inputs 216 can include any suitable input devices and/or sensors that can be used to receive user input, such as a keyboard, a mouse, a touchscreen, a microphone, etc.
In some embodiments, communications systems 218 can include any suitable hardware, firmware, and/or software for communicating information over communication network 108 and/or any other suitable communication networks. For example, communications systems 218 can include one or more transceivers, one or more communication chips and/or chip sets, etc. In a more particular example, communications systems 218 can include hardware, firmware and/or software that can be used to establish a Wi-Fi connection, a Bluetooth connection, a cellular connection, an Ethernet connection, etc.
In some embodiments, memory 220 can include any suitable storage device or devices that can be used to store instructions, values, etc., that can be used, for example, by processor 212 to present content using display 214, to communicate with one or more computing devices 110, etc. Memory 220 can include any suitable volatile memory, non-volatile memory, storage, or any suitable combination thereof. For example, memory 220 can include RAM, ROM, EEPROM, one or more flash drives, one or more hard disks, one or more solid state drives, one or more optical drives, etc. In some embodiments, memory 220 can have encoded thereon a server program for controlling operation of server 120. In such embodiments, processor 212 can execute at least a portion of the server program to transmit information and/or content (e.g., a user interface, graphs, tables, reports, etc.) to one or more computing devices 110, receive genetic data, information, and/or content from one or more computing devices 110, receive instructions from one or more devices (e.g., a personal computer, a laptop computer, a tablet computer, a smartphone, etc.), etc.
In some embodiments, the oligo libraries can include the same distribution of oligos and/or different distributions of oligos. For example, the oligo libraries can all be drawn from the same preparation (e.g., sample, pool, master, etc.) and can include the same distribution of oligos. As another example, the oligo libraries can be drawn from different preparations (e.g., different samples from the same pool, different samples from the same master, different samples of different pools drawn from the same master, different samples of different masters, different samples of different pools drawn from different masters, etc.) that each include the same distribution of oligos. As yet another example, the oligo libraries can be drawn from different preparations in which at least two of the preparations include a distribution of oligos that at least partially overlaps with another preparation (e.g., there may be some oligos in common and some oligos that are different). As yet another example, the oligo libraries can be drawn from different preparations that each include a distribution of unique oligos that are not present in more than one of the preparations.
In some embodiments, process 300 can receive the genetic sequencing results in any suitable format. For example, in some embodiments, genetic data received at 302 can be formatted as results from a next generation sequencing device. In a more particular example, the results can be formatted as a BCL file, which includes information received from the sequencer's sensors (e.g., regarding the luminescence that represents the biochemical signal of the reaction). In such an example, process 300 can include aligning the genetic data received at 302. In such an example, the data can be converted into another format, such as a FASTQ format, that includes both a called base and a quality score for each position of a read. As another example, the genetic data received at 302 can be received as reads that include a called base and in some cases a quality score for each position of each read. In a more particular example, the results can be formatted as a FASTQ file.
In some embodiments, process 300 can receive an indication of the target concentration associated with the genetic sequencing results from any suitable source. For example, in some embodiments, process 300 can receive an indication of the target concentration associated with the genetic sequencing results from an input device (e.g., an input device associated with computing device 110).
In some embodiments, process 300 can receive the results and/or can format the results as two arrays of values, an array of input titer values (e.g., an input titer array) and an array of observed RPM values (e.g., an observed RPM array). In such embodiments, the elements in the arrays can be ordered by library such that the input titer value at n=1 corresponds to the observed RPM at n=1. As another example, process 300 can receive the results and/or can format the results as a matrix (e.g., a 2×M or M×2 matrix) in which a first row (or column) corresponds to titer values, and a second row (or column) corresponds to RPM values. As yet another example, process 300 can receive the results and/or can format the results as a matrix (e.g., a 2×M×N or M×2×N matrix, or any other suitable permutation, where M is the number of libraries derived from a preparation (e.g., sample, pool, etc.) from which the largest number of libraries were derived, and N is the number of preparations being evaluated).
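As a non-limiting illustration, formatting results as an input titer array and an observed RPM array ordered by library can be sketched in Python as follows. The sketch assumes that RPM denotes reads per million of total reads and that the signal for a library is the average read count across its oligos; the dictionary keys, function names, and numerical values are hypothetical.

```python
from statistics import mean


def library_signal_rpm(oligo_read_counts, total_reads):
    """Mean reads per oligo, scaled to reads per million total reads.

    ASSUMPTION: RPM is interpreted as reads per million of total reads.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return mean(oligo_read_counts) / total_reads * 1_000_000


def build_arrays(libraries):
    """Return (titers, rpms); index n in both lists is the same library."""
    titers = [float(lib["target_titer"]) for lib in libraries]
    rpms = [library_signal_rpm(lib["oligo_reads"], lib["total_reads"]) for lib in libraries]
    return titers, rpms


# Hypothetical libraries derived from one preparation.
libraries = [
    {"target_titer": 1000.0, "oligo_reads": [90, 110], "total_reads": 1_000_000},
    {"target_titer": 100.0, "oligo_reads": [8, 12], "total_reads": 1_000_000},
]
titers, rpms = build_arrays(libraries)
matrix_2xM = [titers, rpms]  # first row: titer values, second row: RPM values
```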
At 304, process 300 can divide the oligo libraries associated with each preparation (e.g., sample, pool, master, etc.) into i relative titer bins based on the target concentration associated with each oligo library. In some embodiments, for example as described below in connection with
In some embodiments, process 300 can omit 304. For example, in lieu of dividing the oligo libraries associated with each preparation (e.g., sample, pool, master, etc.) into i relative titer bins, process 300 can use a single titer bin (e.g., with a range that includes all of the target concentrations). As another example, in lieu of dividing the oligo libraries into multiple titer bins, process 300 can utilize a single titer bin (e.g., such that i=1).
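As a non-limiting illustration, dividing libraries into titer bins based on target concentration can be sketched in Python as follows. The cut point of 100.0 and the function name split_by_bin are hypothetical; passing an empty list of cut points places every library in a single bin, which corresponds to i=1.

```python
from bisect import bisect_right


def split_by_bin(titers, signals, edges):
    """Group (titer, signal) pairs by titer bin.

    edges is an ascending list of cut points; bin 0 holds titers below the
    first cut point, and a titer equal to a cut point goes to the higher bin.
    """
    bins = {}
    for titer, signal in zip(titers, signals):
        bins.setdefault(bisect_right(edges, titer), []).append((titer, signal))
    return bins


# Hypothetical data: one cut point yields a low titer bin (0) and a high titer bin (1).
two_bins = split_by_bin([10.0, 100.0, 1000.0], [1.0, 2.0, 3.0], edges=[100.0])
single_bin = split_by_bin([10.0, 100.0, 1000.0], [1.0, 2.0, 3.0], edges=[])
```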
At 306, process 300 can calculate one or more prediction bands (e.g., for all libraries in the preparation, for a subset of libraries in the preparation such as for libraries that have a signal above a threshold, for each titer bin based on all results in that titer bin, etc.). In some embodiments, process 300 can remove libraries that failed. For example, process 300 can remove libraries with results having a signal below a particular threshold level (e.g., samples for which results have a value of 0). In some embodiments, process 300 can record the identity of the libraries that failed, which can be used when evaluating quality of a preparation from which the sample(s) used to derive the libraries was drawn.
In some embodiments, process 300 can calculate, for pairs of library results (e.g., for all libraries in the preparation, for a subset of libraries in the preparation such as for libraries that have a signal above a threshold, for each titer bin, etc.), a ratio of concentrations and a ratio of signals based on the genetic sequencing results received at 302. For example, for a titer bin that includes the libraries described below in connection with
Process 300 can generate a similar ratio for the signal (e.g., RPM) associated with each result using the same relationship between libraries that was used to determine ratios for the target concentrations (e.g., for signal values as, bs, cs, process 300 can determine three ratios
regardless of the numerical values of as, bs, cs). In some embodiments, a logarithm (e.g., log base 10) can be applied to each ratio. Note that this can result in negative values for the log of a ratio of the signals, as it is possible for a library with a higher target concentration to result in a lower signal level (e.g., through one or more sources of error).
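As a non-limiting illustration, the pairwise log ratios can be sketched in Python as follows. For each pair of libraries, the library with the higher target concentration is placed in the numerator of both ratios, regardless of which signal is larger. The sketch assumes that libraries with a zero signal have already been removed; the function name pairwise_log_ratios and the numerical values are hypothetical.

```python
import math
from itertools import combinations


def pairwise_log_ratios(titers, signals):
    """Return one (x, y) point per pair of libraries.

    x = log10(higher titer / lower titer)
    y = log10(signal of higher-titer library / signal of lower-titer library)
    ASSUMPTION: all titers and signals are strictly positive.
    """
    points = []
    for i, j in combinations(range(len(titers)), 2):
        # Order the pair by target concentration, not by observed signal.
        hi, lo = (i, j) if titers[i] >= titers[j] else (j, i)
        x = math.log10(titers[hi] / titers[lo])
        y = math.log10(signals[hi] / signals[lo])
        points.append((x, y))
    return points


# Hypothetical libraries: the second pair shows a negative y value, because the
# higher-titer library produced the lower signal.
ideal = pairwise_log_ratios([1000.0, 100.0, 10.0], [5000.0, 500.0, 50.0])
inverted = pairwise_log_ratios([100.0, 10.0], [40.0, 50.0])
```

Three libraries yield three points; M libraries yield M(M-1)/2 points.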
In a particular example with reference to
In some embodiments, process 300 can calculate a prediction band for the data corresponding to the pairwise ratios. For example, process 300 can generate a 95% prediction band for the data. A 95% prediction band can be a band within which 95% of future measurements are expected to fall. In some embodiments, process 300 can use any suitable technique or combination of techniques to calculate a prediction band for the data. For example, process 300 can calculate a pointwise prediction band. As another example, process 300 can calculate a simultaneous prediction band (e.g., using Bonferroni's method, or Scheffé's method to account for multiple comparisons). Note that this is merely an example, and the prediction band can be any suitable prediction band (e.g., an 80% prediction band, a 90% prediction band, etc.). In some embodiments, confidence intervals can be used in addition to, or in lieu of, prediction intervals to represent the scattered distribution.
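As a non-limiting illustration, a pointwise prediction band around an ordinary least squares line can be sketched in Python as follows. The use of a straight-line fit is an assumption made for this sketch. By default the critical value is taken from the standard normal distribution, which is only a large-sample approximation; for a small number of points, a Student's t quantile with n-2 degrees of freedom can be supplied through the crit parameter. The function names and data are hypothetical.

```python
import math
from statistics import NormalDist, mean


def prediction_band(xs, ys, x_new, level=0.95, crit=None):
    """Return (lower, upper) of a pointwise prediction interval at x_new.

    Uses center +/- crit * s * sqrt(1 + 1/n + (x_new - x_bar)^2 / Sxx),
    where s is the residual standard error of a least squares line.
    """
    n = len(xs)
    if n < 3:
        raise ValueError("at least three points are needed")
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    if crit is None:
        # ASSUMPTION: normal approximation; pass a t quantile for small n.
        crit = NormalDist().inv_cdf(0.5 + level / 2)
    center = intercept + slope * x_new
    half_width = crit * s * math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sxx)
    return center - half_width, center + half_width


# Hypothetical log-ratio points (x: log titer ratio, y: log signal ratio).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.1, 0.9, 2.1, 2.9]
lower_mid, upper_mid = prediction_band(xs, ys, 1.5)
lower_end, upper_end = prediction_band(xs, ys, 3.0)
```

In this sketch the band is narrowest at the mean of the x values and widens toward the ends of the range.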
As described below in connection with
At 308, process 300 can repeat 302 to 306 for each preparation (e.g., sample, pool, master, etc.) that is being used to generate a final prediction band.
At 310, process 300 can determine boundaries for the final prediction band. For example, process 300 can determine boundaries for all libraries. As another example, process 300 can determine boundaries for all libraries that have a signal above a threshold. As yet another example, process 300 can determine boundaries for each titer bin (e.g., one set of boundaries for the high titer bin, and another set of boundaries for the low titer bin). In some embodiments, process 300 can use any suitable technique to determine the boundaries for the final prediction band(s) (e.g., based on data from all libraries derived from a particular preparation). For example, as described below in connection with
As described below in connection with
At 312, process 300 can receive genetic sequencing results for multiple oligo libraries at different target titer concentrations that are drawn from a new preparation (e.g., a new master based on a new design, a new master based on the same design, a new pool prepared from the same master, etc.). In some embodiments, process 300 can receive genetic sequencing results using any suitable technique or combination of techniques, such as techniques described above in connection with 302.
At 314, process 300 can divide the new oligo libraries into i relative titer bins (e.g., one or more titer bins). In some embodiments, process 300 can use the titer concentration ranges used to divide the libraries at 304 to divide the new samples. At 316, process 300 can calculate a prediction band for each titer bin based on all results from the new libraries that are included in that titer bin. In some embodiments, process 300 can use any suitable technique or combination of techniques to calculate a prediction band, such as techniques described above in connection with 306. As described above, in some embodiments, process 300 can omit 314. For example, in lieu of dividing the oligo libraries associated with each preparation (e.g., sample, pool, master, etc.) into i relative titer bins, process 300 can use a single titer bin (e.g., with a range that includes all of the target concentrations). As another example, in lieu of dividing the oligo libraries into multiple titer bins, process 300 can utilize a single titer bin (e.g., such that i=1).
At 318, process 300 can generate a comparison of the prediction bands for the new libraries with the final boundaries of the prediction band (e.g., for each titer bin). In some embodiments, the comparison can be used to evaluate the quality of the new preparation(s) from which the new libraries were derived with respect to the quality of the preparation(s) used to generate the final prediction band boundaries at 310.
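As a non-limiting illustration, a comparison of a prediction band for new libraries with the final prediction band boundaries can be sketched in Python as follows. The sketch assumes that the final boundaries are expressed as ranges of slopes and intercepts and that a simple in-range check is an acceptable comparison; the dictionary keys, function names, and numerical values are hypothetical.

```python
def within(value, bounds):
    """True if value lies inside the inclusive (low, high) range."""
    low, high = bounds
    return low <= value <= high


def compare_to_final(new_band, final_ranges):
    """Flag which metrics of a new band fall inside the final ranges.

    ASSUMPTION: an in-range check per metric; other comparisons (e.g.,
    overlaying both bands on one pair of axes) are equally possible.
    """
    return {
        "upper_slope_ok": within(new_band["upper_slope"], final_ranges["upper_slope_range"]),
        "lower_slope_ok": within(new_band["lower_slope"], final_ranges["lower_slope_range"]),
        "intercept_ok": within(new_band["intercept"], final_ranges["intercept_range"]),
    }


# Hypothetical final ranges and a hypothetical band from a new preparation.
final_ranges = {
    "upper_slope_range": (1.00, 1.20),
    "lower_slope_range": (0.80, 1.00),
    "intercept_range": (0.15, 0.25),
}
new_band = {"upper_slope": 1.10, "lower_slope": 0.75, "intercept": 0.20}
comparison = compare_to_final(new_band, final_ranges)
```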
At 320, process 300 can present a report that is indicative of the relative quality of the new preparation(s) based on the quality of the original samples (e.g., the original samples used to generate the final prediction band at 310).
In some embodiments, the report can include any suitable information and/or graphics. For example, the report can include graphical information shown in, and described below in connection with, one or more of
In some embodiments, 312 to 318 can be omitted, and the report can include information that is indicative of the original preparation(s) and/or that includes comparisons of various subgroups from the original libraries derived from the original preparation. For example, the report can include graphical information shown in, and described below in connection with, one or more of
As shown in
As shown in
As shown in
In some embodiments, any suitable computer readable media can be used for storing instructions for performing the functions and/or processes described herein. For example, in some embodiments, computer readable media can be transitory or non-transitory. For example, non-transitory computer readable media can include media such as magnetic media (such as hard disks, floppy disks, etc.), optical media (such as compact discs, digital video discs, Blu-ray discs, etc.), semiconductor media (such as RAM, Flash memory, electrically programmable read only memory (EPROM), electrically erasable programmable read only memory (EEPROM), etc.), any suitable media that is not fleeting or devoid of any semblance of permanence during transmission, and/or any suitable tangible media. As another example, transitory computer readable media can include signals on networks, in wires, conductors, optical fibers, circuits, or any suitable media that is fleeting and devoid of any semblance of permanence during transmission, and/or any suitable intangible media.
It should be noted that, as used herein, the term mechanism can encompass hardware, software, firmware, or any suitable combination thereof.
It should be understood that the above described steps of the processes of
Although the invention has been described and illustrated in the foregoing illustrative embodiments, it is understood that the present disclosure has been made only by way of example, and that numerous changes in the details of implementation of the invention can be made without departing from the spirit and scope of the invention, which is limited only by the claims that follow. Features of the disclosed embodiments can be combined and rearranged in various ways.
This application is based on, claims the benefit of, and claims priority to U.S. Provisional Application No. 63/059,542, filed Jul. 31, 2020, which is hereby incorporated herein by reference in its entirety for all purposes.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2021/044026 | 7/30/2021 | WO | |

Number | Date | Country |
---|---|---|
63059542 | Jul 2020 | US |