The instant application contains a Sequence Listing which has been submitted electronically in XML file format and is hereby incorporated by reference in its entirety. Said XML copy, created on Apr. 11, 2024, is named 64100-718_301_SL.xml and is 8,015 bytes in size.
The invention relates to encoded assays, in which a target analyte is detected based on association of the target with a code, and detection of the code as a surrogate for detection of the target analyte.
Many assays, such as single base detection assays, require a high level of sensitivity and specificity and are associated with a low signal level. Low signal requires amplification (e.g., PCR, immunostaining cascades, and the like), resulting in complex and lengthy protocols, a high level of background, and other biases that limit the performance of the assay. There is a need in the art for assays that are easier to read and detect at higher sensitivity than the analyte itself.
The features and advantages of the present invention will be more clearly understood from the following description taken in conjunction with the accompanying drawings, which are not necessarily drawn to scale, and wherein:
In various embodiments of the invention, a method is provided of conducting an assay for a set of targets, the method comprising: (a) subjecting a set of targets to a recognition event, in which each target is uniquely recognized by and bound to at least one recognition element from a set of coded recognition elements, each recognition element comprising a target-specific binding site and a code from a set of codes, each code comprises at least one segment encoding one or more symbols that correspond to a sequence of one or more nucleotides, to yield a set of coded targets comprising the target and recognition element; (b) subjecting the recognition elements of the set of coded targets to a molecular transformation event to yield a set of modified recognition elements comprising the codes, such that the codes of the modified recognition elements can be amplified in an amplification event; and (c) performing the amplification event for each code of the modified recognition elements and detecting the targets associated with the set of modified recognition elements by decoding the codes that are amplified.
In other embodiments, a method is provided for conducting an assay for a set of targets. The method includes (a) subjecting a set of targets to a recognition event, in which each target is uniquely recognized by and bound to a recognition element comprising a code from a set of codes; (b) subjecting the recognition elements to a molecular transformation event to produce a set of modified recognition elements in the presence of the target and a set of unmodified recognition elements in the absence of the target, in which the codes of the modified recognition elements can be amplified and the codes of the unmodified recognition elements cannot be amplified in an amplification event; and (c) performing the amplification event on the transformed recognition elements and detecting the targets associated with the set of modified recognition elements by decoding the codes that are amplified. Each code may include at least one segment encoding one or more symbols that correspond to a sequence of one or more nucleotides.
In the methods of the invention, the codes may be soft decodable codes.
In some instances, the set of coded recognition elements may include at least 10, 100, 1,000, or 10,000 coded recognition elements and each of the coded recognition elements includes a soft decodable code.
Decoding the codes that are amplified in the methods of the invention may include: (a) recording signal produced in response to interrogation of each segment of the codes; and (b) upon completion of the interrogation, determining a probability of the presence of each of the codes by applying a soft-decision probabilistic decoding algorithm to the recorded signal, wherein the presence of the code is indicative of the presence of the target.
In various embodiments of the invention, the codes in the set of coded recognition elements are the same length. In other instances, at least a subset of the set of coded recognition elements has codes of the same length.
In some embodiments of the methods of the invention, the set of coded recognition elements consists of tens, hundreds, thousands, or up to tens of thousands of the coded recognition elements, decoding the codes that are amplified includes decoding the codes by a soft decoding method, and the codes are trellis codes and at least a subset of the trellis codes has the same length.
In some instances, a method is provided for conducting an assay for a set of target analytes that includes: (a) performing a recognition and amplification event on a set of coded target analytes potentially present in a sample to generate a set of rolling circle amplification products (RCPs) from the target analytes or representative of the target analytes present in the sample, wherein each of the coded RCPs comprises multiple copies of a nucleic acid code from a set of codes, wherein each code comprises at least one segment encoding one or more symbols that correspond to a sequence of one or more nucleotides; (b) recording signal produced in response to interrogation of each segment of the codes; and (c) upon completion of the interrogation, determining a probability of the presence of each of the codes by applying a soft-decision probabilistic decoding algorithm to the recorded signal, wherein the presence of the code is indicative of the presence of the target analyte.
The set of coded RCPs may include at least 10, 100, 1,000, or 10,000 coded RCPs and each of the coded RCPs may include a soft decodable code.
In the methods of the invention, decoding the codes that are amplified or interrogation of the segments can include one or a combination of nanopore sequencing, next-generation sequencing, massively parallel sequencing, Sanger sequencing, sequencing by synthesis (SBS), pyrosequencing, sequencing by hybridization, decoding by hybridization, single molecule real-time sequencing, SOLiD, and sequencing by ligation.
Each segment of the codes of the invention may include one symbol corresponding to one nucleotide. Each of the codes may include up to 50 segments for a length of each code comprising up to 50 nucleotides. Interrogation of the up to 50 segments, each having one symbol corresponding to one nucleotide, may be performed by sequencing by synthesis (SBS).
In other embodiments, each segment may include one symbol corresponding to more than one nucleotide.
In various instances, each code may include two or more segments. Each code may include three or more segments. Each code may include four or more segments. In some cases, each code includes five to sixteen segments.
In one embodiment, interrogation of the segments including one symbol corresponding to more than one nucleotide is performed by decoding by hybridization. In some instances, at least one of the segments is interrogated more than one time by hybridization with one or more hybridization probes each having at least one label to produce the signal. At least four different labels may be utilized in the decoding by hybridization. In one example, each code includes at least four segments and at least sixteen symbols. In the case that at least one of the segments is interrogated more than one time by hybridization with one or more hybridization probes each having at least one label to produce the signal, a unique number of possibilities at each of the segments includes up to a number of the different labels to the power of a number of the hybridizations per segment. The label may be an optical label. The label may be a fluorescent label. At least one probe may include two or more of the labels to create a pseudo label and generate a larger number of the symbols.
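The counting rule stated above (up to the number of labels raised to the power of the number of hybridizations per segment) can be made concrete with a short calculation. This is a minimal sketch for illustration only: the function names `symbols_per_segment` and `max_codes` are hypothetical, and the values are upper bounds that ignore any codes set aside for error correction or other design constraints.

```python
def symbols_per_segment(num_labels: int, hybridizations_per_segment: int) -> int:
    """Upper bound on the distinguishable symbols at one segment."""
    return num_labels ** hybridizations_per_segment


def max_codes(num_labels: int, hybridizations_per_segment: int, num_segments: int) -> int:
    """Upper bound on the distinct codes across all segments of a code."""
    return symbols_per_segment(num_labels, hybridizations_per_segment) ** num_segments


if __name__ == "__main__":
    # Four labels, two hybridizations per segment, four segments per code.
    print(symbols_per_segment(4, 2))  # 16
    print(max_codes(4, 2, 4))         # 65536
```

With four labels and two hybridizations per segment, each segment can carry up to sixteen symbols, which is consistent with the four-segment, sixteen-symbol example given above.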
In the methods of the invention, the set of targets may include tens of target analytes, hundreds of target analytes, thousands of target analytes, or tens of thousands of target analytes.
The set of targets may be nucleic acid targets, polypeptide targets, or both nucleic acid and polypeptide targets.
In the various embodiments of the invention, at least one of the following I, II or III may be true: (I)(A) the set of targets is immobilized on a surface; or (I)(B) the recognition element in the recognition event is immobilized on a surface; (II) the amplification event is performed on a surface; or (III) the amplification event and the recognition event are performed on the same surface.
In some instances, the set of rolling circle amplification products generated in the amplification event are attached non-covalently to a charged surface. The surface may be a cation-coated surface. The surface may be a polylysine coated surface.
In various embodiments, encoded probes, sets of encoded probes, and compositions including the sets of encoded probes are provided.
In one instance, a set of coded oligonucleotide probes is provided, each probe including a target-specific binding site and a code from a set of codes. In this instance, each code is a soft decodable code that includes at least one segment encoding one or more symbols that correspond to a sequence of one or more nucleotides.
The set of coded oligonucleotide probes may include padlock probes.
The set of coded oligonucleotide probes may include at least 10, 100, 1,000, or 10,000 probes.
“A,” “an” and “the” include their plural forms unless the context clearly dictates otherwise.
“About” means approximately, roughly, around, or in the region of. When “about” is used with a numerical range, it modifies that range by extending the boundaries above and below the numerical values indicated. “About” can modify a numerical value above and below the stated value by a variance of, e.g., 10 percent up or down (higher or lower).
“And” is used interchangeably with “or” unless expressly stated otherwise.
“Include,” “including,” and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is to say, in the sense of “including, but not limited to.”
“Invention,” “the invention” and the like are intended to refer to various embodiments or aspects of subject matter disclosed herein and are not intended to limit the invention to the specific embodiments or aspects of the invention referred to.
The terms “coded” and “encoded” are intended to have the same meaning and are herein used interchangeably.
“Linked” with respect to two nucleic acids means not only a fusion of a first moiety to a second moiety at the C-terminus or the N-terminus, but also includes insertion of the first moiety to the second moiety into a common nucleic acid. Thus, for example, the nucleic acid A may be linked directly to nucleic acid B such that A is adjacent to B (-A-B-), but nucleic acid A may be linked indirectly to nucleic acid B, by intervening nucleotide or nucleotide sequence C between A and B (e.g., -A-C-B- or -B-C-A-). The term “linked” is intended to encompass these various possibilities.
“Optimum,” “optimal,” “optimize” and the like are not intended to limit the invention to the absolute optimum state of the aspect or characteristic being optimized but will include improved but less than optimum states.
The terms “rolling circle amplification products (RCPs)” and “nanoballs” are intended to have the same meaning and are herein used interchangeably.
“Sample” means a source of target or analyte. Examples of samples include biological samples, such as whole blood, lymphatic fluid, serum, plasma, sweat, tear, saliva, sputum, cerebrospinal fluid, amniotic fluid, seminal fluid, vaginal excretion, serous fluid, synovial fluid, pericardial fluid, peritoneal fluid, pleural fluid, transudates, exudates, cystic fluid, bile, urine, gastric fluid, intestinal fluid, fecal samples, liquids containing single or multiple cells, liquids containing organelles, fluidized tissues, fluidized organisms, liquids containing multi-celled organisms, biological swabs and biological washes. Samples may be from any organism (e.g., prokaryotes, eukaryotes, plants, animals, humans) or other sample (e.g., environmental or forensic samples).
“Set” includes sets of one or more elements or objects. A “subset” of a set includes any number of elements or objects from the set, from one up to all of the elements of the set.
“Subject” includes any plant or animal, including without limitation, humans.
“Target” means a nucleic acid analyte (e.g., mRNA, cfDNA etc.) or a proxy for the target analyte of interest (e.g., an antibody conjugated with oligonucleotide). Thus, in some instances, the term “target” and the term “target analyte” are used interchangeably. “Target” with respect to a nucleic acid includes wild-type and mutated nucleic acid sequences, including for example, point mutations (e.g., substitutions, insertions and deletions), chromosomal mutations (e.g., inversions, deletions, duplications), and copy number variations (e.g., gene amplifications). “Target” with respect to a nucleic acid may also include the presence or absence of one or more methyl groups on the nucleic acid target. “Target” with respect to a polypeptide includes wild-type and mutated polypeptides of any length, including proteins and peptides.
“Decoding” with respect to a code includes determining the presence of a known code or a probability of the presence of a known code with or without determining the sequence of the code. Decoding may be hard decision decoding. Decoding may be soft decision decoding.
“Identify,” “determine” and the like with respect to codes, targets or analytes of the invention are intended to include any or all of: (A) an indication of the presence or absence of the relevant code, target or analyte, (B) an indication of the probability of the presence or absence of the relevant code, target or analyte, and/or (C) quantification of the relevant code, target or analyte.
“Hard decision decoding” or “hard decision” refers to a method or model that includes making a call for each nucleotide in a nucleic acid segment (commonly referred to as a “base call”) in order to identify nucleotides in the nucleic acid segment. Models of the invention incorporate hard decision decoding models. The particular nucleic acid being decoded may be or include a code of the invention.
“Soft decision decoding” or “soft decision” refers to a method or a model that uses data collected during a sequencing or decoding process to calculate a probability that a particular nucleic acid or nucleic acid segment is present. The probability may optionally be calculated without making a base call for each nucleotide in a nucleic acid segment. In another example, a probability is calculated without making a hard call that a string of nucleic acids in a segment are present. Instead of making a hard call for each nucleotide or nucleotide segment, a probabilistic decoding algorithm is applied to the recorded signal upon completion of signal collection. A probability of the presence of each of the codes may be determined without discarding signal in contrast to hard decision decoding method in which hard calls are made during the signal collection process. In soft decision decoding, the data may, for example, include or be calculated from, intensity readings in spectral bands for signals produced by the sequencing/decoding chemistry. In one embodiment, soft decision decoding uses data collected during a sequencing/decoding process to calculate a probability that a particular nucleic acid segment from a known set of sequences is present. Models of the invention may be used for soft decision decoding. The particular nucleic acid or nucleic acid segment being decoded may be or include a code of the invention.
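The distinction between hard and soft decision decoding can be sketched in a few lines. The example below is an illustration under stated assumptions, not the model of the invention: it assumes one idealized channel per base (`SIGNATURES`), independent Gaussian noise with a hypothetical width `sigma`, a uniform prior over the codebook, and a recorded signal with one four-channel reading per code position. All names are hypothetical.

```python
import math

# Assumed ideal four-channel signature for each base (no crosstalk).
SIGNATURES = {
    "A": (1.0, 0.0, 0.0, 0.0),
    "C": (0.0, 1.0, 0.0, 0.0),
    "G": (0.0, 0.0, 1.0, 0.0),
    "T": (0.0, 0.0, 0.0, 1.0),
}


def code_log_likelihood(code, signal, sigma):
    """Gaussian log-likelihood (up to a constant) of the whole signal given a code."""
    total = 0.0
    for base, observed in zip(code, signal):
        expected = SIGNATURES[base]
        total += sum((o - e) ** 2 for o, e in zip(observed, expected))
    return -total / (2.0 * sigma ** 2)


def soft_decode(codebook, signal, sigma=0.5):
    """Return {code: probability}, computed only after all signal is recorded."""
    log_l = {code: code_log_likelihood(code, signal, sigma) for code in codebook}
    peak = max(log_l.values())  # subtract the peak for numerical stability
    weights = {code: math.exp(v - peak) for code, v in log_l.items()}
    norm = sum(weights.values())
    return {code: w / norm for code, w in weights.items()}


def hard_calls(signal):
    """Hard decision for comparison: call the brightest channel at every cycle."""
    bases = "ACGT"
    return "".join(bases[max(range(4), key=lambda i: obs[i])] for obs in signal)


if __name__ == "__main__":
    codebook = ["ACGT", "CATG", "GTAC"]
    signal = [
        (0.9, 0.1, 0.0, 0.0),
        (0.1, 0.45, 0.5, 0.0),  # ambiguous cycle: C and G channels are close
        (0.0, 0.1, 0.8, 0.1),
        (0.0, 0.0, 0.2, 0.9),
    ]
    print(hard_calls(signal))  # AGGT, which is not a member of the codebook
    print(soft_decode(codebook, signal))
```

In this example the per-cycle base calls give a sequence that matches no known code, while the soft decision assigns the highest probability to "ACGT" because the full signal is scored against each candidate code without discarding the ambiguous reading.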
“Phasing” or “signal phasing” means misalignment of SBS cycles during an SBS process caused by the non-incorporation of a nucleotide during a cycle or by the incorporation of two or more nucleotides during an SBS cycle.
“Droop” or “signal droop” means signal decay that occurs during an SBS process, which may be caused by some complementary strands being synthesized as part of the SBS process being blocked, preventing further nucleotide incorporation.
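As a rough illustration of how phasing and droop erode signal over successive SBS cycles, the following toy model may be helpful. It is a sketch under simplifying assumptions that are not taken from the specification: a constant per-cycle blocking probability `droop_rate`, a constant per-cycle non-incorporation probability `lag_rate`, and no modeling of the incorporation of two or more nucleotides in one cycle. All names are hypothetical.

```python
def active_fraction(cycle: int, droop_rate: float) -> float:
    """Fraction of strands not yet blocked after `cycle` cycles (droop)."""
    return (1.0 - droop_rate) ** cycle


def in_phase_fraction(cycle: int, lag_rate: float) -> float:
    """Fraction of strands that incorporated a nucleotide in every one of `cycle` cycles."""
    return (1.0 - lag_rate) ** cycle


def expected_in_phase_signal(cycle: int, droop_rate: float, lag_rate: float) -> float:
    """Relative signal from strands that are both unblocked and still in phase."""
    return active_fraction(cycle, droop_rate) * in_phase_fraction(cycle, lag_rate)


if __name__ == "__main__":
    for cycle in (0, 10, 25, 50):
        print(cycle, round(expected_in_phase_signal(cycle, 0.01, 0.005), 4))
```

Under these assumptions the usable signal decays geometrically with cycle number, which is one reason short codes and decoding methods that tolerate weak late-cycle signal can be attractive.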
“Sample” means a set of nucleic acids for testing. A sample preparation process may be used to produce a sequencing-ready sample from a raw sample or partially processed sample. Note that one or more samples may be combined for sample preparation and/or sequencing and may be distinguished post-sequencing using sample-specific DNA barcodes linked to sample fragments.
“Crosstalk” refers to the situation in which a signal from one nucleotide addition reaction may be picked up by multiple channels (referred to as “color crosstalk”) or the situation in which a signal from a nanoball or sequencing cluster interferes with an adjacent or nearby cluster or nanoball (referred to as “cluster crosstalk” or “nanoball crosstalk”).
“Color channel” means a set of optical elements for sensing and recording an electromagnetic signal from a sequencing reaction. Examples of optical elements include lenses, filters, mirrors, and cameras.
“Spectral band” or “spectral region” means a continuous wavelength range in the electromagnetic spectrum.
Headings are included herein for reference and to aid in locating the various sections. These headings are not intended to limit the scope of the concepts described with respect to the headings.
The description and examples should not be construed as limiting the scope of the invention to the embodiments and examples described herein, but as encompassing all modifications and alternatives falling within the true scope and spirit of the invention.
The disclosure provides encoded assays for detection of target analytes in a sample. At a high level, in an encoded assay, a target analyte (“target”) is detected based on association of the target with a code and detection of the code is a surrogate for detection of the analyte.
In various embodiments, an encoded assay may include a recognition event in which a target is uniquely recognized by a recognition element. The recognition event may be effected by submitting targets of a set of targets to a recognition event, in which each target is uniquely recognized by and bound to a recognition element associated with a code, thereby yielding a set of coded targets comprising the target and the recognition element.
In various embodiments, an encoded assay may include a transformation event, in which a high-fidelity molecular transformation of the recognition element associated with a code produces a modified recognition element. The transformation event may be effected by submitting each recognition element of the set of coded targets to a transformation event, in which a molecular transformation of each recognition element produces a modified recognition element, thereby yielding a set of modified recognition elements comprising the code.
In various embodiments, an encoded assay may include a detection event, which detects the code as a surrogate for detection of the analyte, e.g., by recognizing or decoding the code (and optionally other elements). The detection event may include an amplification step in which each code of the set of modified recognition elements is amplified, thereby yielding a set of amplified codes. Amplified codes of the set of amplified codes may have their sequences determined or decoded using a variety of techniques, including, but not limited to, microarray detection or nucleic acid sequencing. In some cases, the detection step may be integrated with the amplification step, e.g., as in amplification with intercalating dyes.
In one embodiment, the method may include:
As described in more detail herein, the recognition event, transformation event, and the detection event may occur sequentially, or combinations of the steps may occur simultaneously, e.g., as a single combined step. For example, the transformation event and the coding event may be simultaneous, such that the sequential process involves (i) recognition event, followed by (ii) transformation event/coding event, followed by (iii) detection event.
To further illustrate the encoded assays:
The codes may be error corrected and thus easy to distinguish from each other, so they can be detected at low abundance, in the presence of a high level of background, and in the presence of many other codes.
Since many assays can be converted into codes, the invention provides for multiomic assays in which a sample is analyzed in multiple parallel workflows that are analyte-dependent and then converge on codes that can then be detected simultaneously on a single platform. Parallel assay workflows may be merged into a single workflow, where multiple targets and target-types (e.g., nucleic acids and polypeptides) may be detected simultaneously in a single workflow and also read simultaneously within the same readout platform.
Following recognition and transformation, the codes may be detected and matched to targets for identification and/or quantification of targets present in the sample.
The encoded assays of the invention make use of codewords or codes. The codes may be detected as surrogates in the place of direct analysis of target analytes. As an example, a target analyte may be a particular nucleic acid fragment (e.g., a nucleic acid fragment with a specific mutation); in the assays of the invention, a codeword may be associated with the nucleic acid fragment and the codeword may be read to identify the presence of the nucleic acid fragment in the sample.
For example, a code may be a predetermined sequence ranging from about 3 to about 100 nucleotides or about 3 to about 75 nucleotides. Codes may have sequences selected to avoid inadvertent interaction with other assay components, such as targets, probes, or primers. Code sequences may be selected to ensure that codes differ from each other to permit unique identifiability during the decoding process.
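One simple way to select codes that differ from each other and avoid unwanted subsequences is a greedy search over candidate sequences. The sketch below is only an illustration of that idea, not the selection method of the invention; the function names, the use of Hamming distance as the difference measure, and the `forbidden` subsequence list (standing in for fragments of targets, probes, or primers) are all assumptions.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


def select_codes(length: int, min_distance: int, alphabet: str = "ACGT", forbidden=()):
    """Greedily pick codes whose pairwise Hamming distance is at least `min_distance`.

    Candidates containing any subsequence in `forbidden` are skipped. The search
    enumerates every sequence of the given length, so it is practical only for
    short codes.
    """
    chosen = []
    for letters in product(alphabet, repeat=length):
        candidate = "".join(letters)
        if any(f in candidate for f in forbidden):
            continue
        if all(hamming(candidate, c) >= min_distance for c in chosen):
            chosen.append(candidate)
    return chosen


if __name__ == "__main__":
    codes = select_codes(length=4, min_distance=3, forbidden=("AA",))
    print(len(codes), codes[:5])
```

A larger minimum distance yields fewer codes but makes each code easier to tell apart from the others when individual positions are read incorrectly.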
The invention includes a dataset or database of codes generated using the methods of the invention. The dataset or database may associate the codes with other assay elements, such as primers or probes linked to the codes. The invention also includes a method of making a probe set comprising synthesizing probes having the sequences set forth in the dataset or database.
In one embodiment, the codes are homopolymer-free codes. For standard genomic applications that use a full 4-ary nucleotide alphabet of {ACGT}, the method uses a 4-state encoding trellis with 3 transitions per state.
As illustrated in
In one embodiment, codes for the set of codes are selected using a 4-ary alphabet, avoid homopolymers, and every code in the set is different from every other code in the set. The codes may be generated using the trellis method.
In one embodiment, codes for the set of codes are selected using a 3-ary alphabet, avoid homopolymers, and every code in the set is different from every other code in the set. The codes may be generated using the trellis method.
This method will eliminate all repeats. The same method can be applied to generate homopolymer-free codes for 3-ary alphabets (e.g., {C, G, T}), and larger 5-ary+ alphabets (such as oligopolymers).
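One plausible reading of the trellis described above is that each state records the most recently emitted symbol and each state has one outgoing transition for every other symbol in the alphabet (3 transitions per state for a 4-ary alphabet). The following sketch enumerates codes by walking such a trellis. It is an assumption-laden illustration, since the referenced figure is not reproduced here, and the function name is hypothetical.

```python
def homopolymer_free_codes(length: int, alphabet: str = "ACGT"):
    """Enumerate codes by walking a trellis whose state is the last emitted symbol.

    Every state has len(alphabet) - 1 outgoing transitions (each symbol except
    the one just emitted), so no symbol is ever repeated back to back.
    """
    if length < 1:
        return []
    paths = list(alphabet)  # one starting path per trellis state
    for _ in range(length - 1):
        paths = [p + s for p in paths for s in alphabet if s != p[-1]]
    return paths


if __name__ == "__main__":
    four_ary = homopolymer_free_codes(3)          # alphabet {A, C, G, T}
    three_ary = homopolymer_free_codes(3, "CGT")  # alphabet {C, G, T}
    print(len(four_ary), len(three_ary))          # 36 12
```

For an alphabet of size q and codes of length n, this walk yields q(q-1)^(n-1) candidate codes, from which a subset with the desired distance properties could then be chosen.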
Codes may be optimized for pyrosequencing and similar cyclic serial dispensation schemes. In one embodiment, the invention provides a locus code-encoding approach for pyrosequencing or similar serial (rather than pooled) primer dispensation methods. The method generates homopolymer-free codes.
When the locus code is encapsulated between header and tail bases, all generated codewords finish decoding at the same time. The technique avoids unexpected spurious incorporations that change how long a codeword needs to finish its decoding. This is important because a sequencer then need only collect a prescribed number of samples to obtain complete data for decoding, regardless of the underlying codeword. This also keeps all codeword candidates aligned, so that the theoretical design distances between codewords are maintained.
The previously mentioned synchrony ensures that soft decision block decoding techniques can be applied when decoding each codeword's block of samples. This soft decision decoding guarantees that SNR requirements are improved by at least 2 dB, and sometimes by many factors more when the signal strength fades significantly during the reception of codeword samples.
In pyrosequencing, nucleotides are dispensed sequentially (and non-overlappingly) in a cycle, such as G, C, T, A, G, C, T, A, G, C, . . . etc. This encoding is quite original because it doesn't directly encode bases; instead, it encodes base positions within G, C, T, A cycles. Each cycle element can be either populated, or unpopulated—and multiple elements within a cycle can be populated. For this to be implemented, the underlying code must be derived from a binary alphabet, with 1s and 0s. To emphasize, with these codes, more than one base can be incorporated within a single G, C, T, A dispensation cycle. This also implies that sequencing, though serial in nature, can be fast. And with the underlying {0,1} alphabet that underpins and drives the encoding of the populated/unpopulated cycle positions, all codewords are guaranteed to be of the same length—and to finish decoding in the same amount of time.
To provide coding gain, the sequence of 0s and 1s that makes up each codeword is derived from constructions of optimal binary error correction codes. Such codes possess many redundant parity bits, and these parity bits are designed such that each codeword differs from every other codeword in multiple positions. This quality results in strong error correction capabilities.
Note the use of 4 states in the trellis. Each state represents previous mappings of the last two positions:
Transitions to next states indicate an update which either does not populate or does populate the next position in a sequence.
Four (4) states are used to correctly implement a pyrosequencing scheme that is homopolymer-free; one position is populated every 3 positions. Note that if 3 consecutive positions were allowed to be unfilled, then the 4th position would need to be filled (because an unzipped hybrid will have an opening to at least one of the four nucleotides). That 4th position being filled would result in generation of a homopolymer (repeat) of bases in a sequence—since the last filled base was the same base in the cycle before.
This aforementioned restriction explains the double transition from the 00 state to the 10 state in the trellis diagram. A current state of 00 transitioning to a next state of 00 would imply 3 positions in a row were unfilled.
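The mapping from a binary codeword to populated dispensation positions, together with the restriction on three consecutive unfilled positions, can be sketched as follows. This is a minimal illustration under assumptions: the dispensation order G, C, T, A is taken from the text above, header and tail bases are omitted, and the function names are hypothetical.

```python
DISPENSATION_ORDER = "GCTA"  # cyclic dispensation order described in the text


def has_three_unfilled(bits: str) -> bool:
    """True if three consecutive positions are unpopulated (the disallowed case)."""
    return "000" in bits


def map_to_bases(bits: str, order: str = DISPENSATION_ORDER) -> str:
    """Return the bases incorporated at the populated ('1') dispensation positions."""
    if has_three_unfilled(bits):
        raise ValueError("three consecutive unfilled positions are not allowed")
    return "".join(order[i % len(order)] for i, b in enumerate(bits) if b == "1")


if __name__ == "__main__":
    # Positions 0, 2, 3 and 6 are populated: G, T, A, then T in the second cycle.
    print(map_to_bases("10110010"))  # GTAT
```

Because consecutive populated positions are at most three dispensations apart, they always fall on different bases of the four-base cycle, so the mapped sequence contains no homopolymer. Every codeword of a given bit length also occupies the same number of dispensations, regardless of how many bases it incorporates.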
Optimal error correction codes are constructed to maximize distance between their sets of codewords. They are not constrained to disallow runs of three consecutive zeros. That would reduce the degrees of freedom they use to maximize distance. By contrast, the mappings to pyrosequenced positions comply with homopolymer-free and pyrosequencing constraints.
All other transitions in the pictured design trellis are natural results of populating a position with a ‘0’ or a ‘1’ and updating the next state to reflect that transition. Since 7 of the 8 transitions in the trellis perfectly express the underlying error correction code's structure, such a code can be quite effective and powerful.
Weakening transitions occur when the underlying code has 3 consecutive zeros. One way to reduce those appearances is to use the sorting methodology described above. This method modestly reduces the library of codes. This method also ensures that the pyro-mapped codewords that best reflect the underlying binary code's structure are faithfully reproduced, while those least reflective are not.
Another method to mitigate the weakening due to transitions involves breaking up strings of zeros by interleaving the code. Within a code, the (systematic) information bits, which precede the redundant parity bits, are where the most consecutive zeros are usually seen. One way to eliminate those strings of zeros is to interleave the entire code design, so that the parity and information bits are intermingled. All codewords may be intermingled by the same interleaving pattern. The interleaving technique does not help for the all-zeros codeword, which is generated by almost all linear codes. The all-zeros codeword can be excluded from the codeword set.
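The mechanics of interleaving and of excluding unsuitable codewords can be illustrated with a small systematic code. The sketch below uses a Hamming(7,4) code purely as a convenient stand-in for the underlying binary error correction code, and the interleaving pattern is a hypothetical choice that alternates information and parity positions. Whether a given pattern actually reduces runs of zeros depends on the code and the pattern, so this example shows only how the steps fit together.

```python
from itertools import product


def hamming74_codewords():
    """All 16 codewords of a systematic Hamming(7,4) code: 4 info bits, then 3 parity bits."""
    words = []
    for d1, d2, d3, d4 in product((0, 1), repeat=4):
        parity = (d1 ^ d2 ^ d4, d1 ^ d3 ^ d4, d2 ^ d3 ^ d4)
        words.append("".join(map(str, (d1, d2, d3, d4) + parity)))
    return words


# Hypothetical pattern: output position k takes its bit from input index PATTERN[k],
# alternating information bits (indices 0-3) and parity bits (indices 4-6).
PATTERN = (0, 4, 1, 5, 2, 6, 3)


def interleave(word: str, pattern=PATTERN) -> str:
    """Apply the same fixed interleaving pattern to a codeword."""
    return "".join(word[i] for i in pattern)


def usable(words):
    """Drop the all-zeros codeword and any codeword with three consecutive zeros."""
    return [w for w in words if "1" in w and "000" not in w]


if __name__ == "__main__":
    plain = hamming74_codewords()
    mixed = [interleave(w) for w in plain]
    print(len(usable(plain)), len(usable(mixed)))
```

Because every codeword is permuted by the same pattern, the pairwise distances between codewords are unchanged by interleaving; only the positions of the zeros within each codeword move.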
For the purposes of the specification and claims, the codes of the invention that are based on an encoding trellis as illustrated in
In an encoded assay, a target is detected based on association of the target with a code, and detection of the code is used as a surrogate for detection of the analyte. A variety of techniques may be used to amplify and read the codes.
In one embodiment, codes of the invention are amplified using rolling circle amplification (RCA) to produce DNA nanoballs that include many duplicates of the code. An RCA reaction may include one or more rounds of amplification to produce the nanoball product. A nanoball may be from about 10,000 to about 1,000,000 or more nucleotides in length. A nanoball may include from about 100 to about 10,000 or more copies of the amplified code.
In one embodiment, the codes of the invention are amplified using a linear PCR amplification reaction to generate double stranded DNA amplicon products.
In one embodiment, codes of the invention are amplified using bridge amplification to produce clusters of oligos on a surface.
In one embodiment, codes of the invention are amplified on bead surfaces to produce bead-attached oligos.
In one embodiment, the amplified codes are read in a sequencing reaction.
In one embodiment, codes of the invention are detected using a patterned array, such as a microarray comprising oligos which are complementary to the codes.
In one embodiment, codes of the invention are detected in situ, i.e., in a cell or tissue.
In one embodiment, in situ detection comprises reading the code in a sequencing reaction.
In one embodiment, codes of the invention are detected using an electronic/electrical sensing mechanism.
A variety of techniques and models may be used to identify a nucleic acid code of the invention. In one embodiment, the invention provides models that make use of hard decision decoding methods or models. In another embodiment, the invention provides models that make use of soft decision decoding methods or models.
When using soft decision decoding techniques, it is not necessary for the model to identify each base specifically. For example, signals generated during each nucleotide addition cycle of a sequencing process may be detected and recorded to produce a data set that may be used as input into a model of the invention to calculate a probability that a specific code is present without requiring a hard decoding model. Although it is not necessary in a soft decision decoding model to make a hard decision about the identity of each nucleotide, a model developed according to the methods of the invention may nevertheless include a model for assigning a probability or identity to each nucleotide in the sequence of a code.
Data gathered during a sequencing process may, for example, include intensity readings for signals produced by the sequencing chemistry in various spectral bands. For example, in some cases the data is collected across a set of spectral bands that corresponds to part or all of the spectral bands expected to be produced by a series of nucleotide extension steps during a sequencing process.
In some embodiments, it is not necessary to filter light from each nucleotide extension step in order to distinguish between the nucleotides. Instead, a set of intensity readings may be detected, stored and used as input into a model of the invention for determining a probability that a particular code is present. In other embodiments, one or more filters may be used to refine signals from a sequencing process.
A model may be developed or trained using sequencing data from known codes, such as signal intensity data across a predetermined spectrum, during a sequencing process. The model may be used to calculate a set of probabilities across a set of one or more codes, indicating, for example, for each code, a probability that it is present in a sample.
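By way of non-limiting illustration only, the calculation of a set of probabilities across a set of codes may be sketched in Python as follows. The sketch assumes a hypothetical four-channel crosstalk table (`CROSSTALK`, with illustrative values), independent Gaussian noise of known standard deviation `sigma`, codes of equal length, and a uniform prior over the codebook. These are assumptions made for the example and are not taken from the disclosure.

```python
import math

# Hypothetical expected channel intensities for each incorporated base
# (illustrative values only, not measured data).
CROSSTALK = {
    "A": [1.0, 0.4, 0.0, 0.0],
    "C": [0.3, 1.0, 0.0, 0.0],
    "G": [0.0, 0.0, 1.0, 0.5],
    "T": [0.0, 0.0, 0.3, 1.0],
}

def log_likelihood(code, intensities, sigma=0.3):
    """Gaussian log-likelihood of the observed intensities given a code.

    The constant normalization term is omitted because it is identical
    for all codes of the same length.
    """
    total = 0.0
    for base, observed in zip(code, intensities):
        expected = CROSSTALK[base]
        sq_err = sum((o - e) ** 2 for o, e in zip(observed, expected))
        total -= sq_err / (2.0 * sigma ** 2)
    return total

def code_posteriors(codebook, intensities, sigma=0.3):
    """Posterior probability of each code, assuming a uniform prior."""
    logs = {c: log_likelihood(c, intensities, sigma) for c in codebook}
    peak = max(logs.values())  # subtract the peak for numerical stability
    weights = {c: math.exp(v - peak) for c, v in logs.items()}
    norm = sum(weights.values())
    return {c: w / norm for c, w in weights.items()}
```

In this sketch no base call is made at any cycle; the raw intensity vectors are scored directly against each candidate code, and the output is a probability for each code rather than a single decoded sequence.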
In some cases, the model is developed or trained using data corresponding to color intensity signals across multiple color channels. In some cases, the model is developed or trained using data corresponding to color intensity signals across four color channels, each generally corresponding to the signal produced by addition of one of the four nucleotides A, T, C or G during a sequencing process. As discussed elsewhere in this specification, the channels may experience color crosstalk.
A model may be built using data obtained using multiple light sensing channels. Each channel may be specific for a specific frequency bandwidth. In some cases, the model may be built using four channels, wherein the bandwidth of each channel may be selected for signals produced by addition of one of the four nucleotides A, T, C or G. In other cases, more or less than four channels may be used to collect data used to produce the model.
In certain embodiments of the invention, each channel detects a bandwidth region of a fluorescence signal produced by addition of one of the four nucleotides. Nevertheless, the bandwidth of the signal produced by addition of one of the four nucleotides may be spread across a spectral band that overlaps with other channels. This effect is illustrated in
As will be discussed in the examples below, a color crosstalk model may be empirically developed and used as input into the model of the invention for producing a probability that a code is present. Relative coefficient strength may be experimentally determined across color channels for signal produced by addition of each nucleotide (A, T, C, G) from empirically produced test data.
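By way of non-limiting illustration only, one simple way to derive relative coefficient strengths from empirically produced test data may be sketched in Python as follows. The sketch assumes that each training reading is labelled with the known incorporated base and that averaging followed by normalization to the strongest channel is an acceptable estimator; the function name and data layout are hypothetical.

```python
def estimate_crosstalk(labelled_readings, n_channels=4):
    """Estimate relative channel coefficients for each base.

    labelled_readings: iterable of (base, [intensity per channel]) pairs
    obtained from codes of known sequence.
    Returns {base: [coefficient per channel]}, scaled so that the
    strongest channel for each base equals 1.0.
    """
    sums, counts = {}, {}
    for base, reading in labelled_readings:
        acc = sums.setdefault(base, [0.0] * n_channels)
        for i, value in enumerate(reading):
            acc[i] += value
        counts[base] = counts.get(base, 0) + 1

    matrix = {}
    for base, acc in sums.items():
        means = [v / counts[base] for v in acc]
        peak = max(means)
        matrix[base] = [m / peak for m in means] if peak > 0 else means
    return matrix
```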
Other factors that may be included in a statistical model according to the invention for calculating a probability that a code is present include signal phasing, signal droop, color cross-talk values, fluctuations in color cross-talk values, noise, amplitude noise, Gaussian amplitude models, and base calling algorithms.
The model of the invention may also account for various sources of noise and error, such as variability in the concentration of the active molecules in the assay, variability in color channel response due primarily to limited ability to estimate the color channel responses individually for each cluster, and background and random error noise sources. A concentration noise model may be used to model the variable density of active molecules for a given cluster. A transduction noise model may be included to model variability in the color crosstalk matrix.
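By way of non-limiting illustration only, the manner in which factors such as droop, phasing and cluster concentration might enter a forward signal model may be sketched in Python as follows. The sketch uses a deliberately simplified first-order approximation: droop is modelled as a fixed multiplicative decay per cycle, phasing as a fixed fraction `lag` of strands reporting the previous base, and concentration as a single `amplitude` scale per cluster. The parameter names and default values are hypothetical.

```python
def expected_signal(code, crosstalk, droop=0.98, lag=0.05, amplitude=1.0):
    """Expected per-cycle channel intensities for a code.

    crosstalk: {base: [expected intensity per channel]}.
    droop: fraction of signal retained from one cycle to the next.
    lag: fraction of strands that are one cycle behind (first-order only).
    amplitude: per-cluster scale factor standing in for concentration.
    """
    signals = []
    for t, base in enumerate(code):
        current = crosstalk[base]
        # At the first cycle all strands are in phase, so there is no lag term.
        previous = crosstalk[code[t - 1]] if t > 0 else current
        scale = amplitude * droop ** t
        signals.append(
            [scale * ((1.0 - lag) * c + lag * p) for c, p in zip(current, previous)]
        )
    return signals
```

Expected signals of this kind could serve as the means of the noise distributions against which observed intensities are scored; noise terms themselves are not modelled in this sketch.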
Accurately modeling the biochemical and opto-mechanical processes in DNA sequencing is complex. Furthermore, deriving the inputs for a soft decision probabilistic signal estimator requires estimating the parameters driving the model, as well as having strong confidence that the model is accurate. Under these two assumptions, metrics can be computed that work directly with the received signals. In the commercially available base call algorithms, channel distortion effects are compensated for before the decision process; however, in soft decision decoding of the invention it is not necessary to compensate for distortions before decoding. Embodiments which do not compensate for distortions before decoding have the advantage of avoiding the information loss associated with compensation operations, such as inversions.
The probability that a particular code is present may be indicative of the probability that a particular target associated with the probe is present. Data indicating the probability that a particular target is present may be used, for example, to calculate probabilities relevant to diagnosis or screening of various medical conditions, or selection of drugs for treatment of various medical conditions.
The disclosure provides encoded probes that can be decoded using soft decision decoding methods or models. The codes may be generated using the trellis method, and such codes may be referred to as “trellis codes”. The probes of the invention may be padlock probes that include a soft decodable code, such as a trellis code. The probes of the invention may be a dual probe that includes a soft decodable code, such as a trellis code.
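By way of non-limiting illustration only, one possible state-based construction of a homopolymer-free code may be sketched in Python as follows. The sketch is an assumption made for the example and is not necessarily the trellis method of the disclosure: the state is the previous base, each state has three outgoing transitions (the three other bases), and each ternary symbol selects one transition. The virtual starting state `start` is hypothetical.

```python
BASES = "ACGT"

def encode_homopolymer_free(symbols, start="A"):
    """Map ternary symbols (0, 1 or 2) to bases so that no base repeats."""
    previous = start
    out = []
    for s in symbols:
        choices = [b for b in BASES if b != previous]  # three allowed transitions
        previous = choices[s]
        out.append(previous)
    return "".join(out)

def decode_homopolymer_free(code, start="A"):
    """Recover the ternary symbols from a code produced by the encoder."""
    previous = start
    symbols = []
    for base in code:
        choices = [b for b in BASES if b != previous]
        symbols.append(choices.index(base))
        previous = base
    return symbols
```

Because every transition leads to a base different from the current state, adjacent positions in the resulting code can never carry the same nucleotide.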
The disclosure provides assays that make use of encoded probes that may be decoded using soft decision decoding (“soft decoding”). In various embodiments, the assays make use of mixtures of probes, each with a soft decodable code. A mixture may include 100s, 1000s, 10000s, 100000s or more of encoded probes.
In some instances of the methods of the invention, decoding of the code is performed without making a specific base call for each nucleotide in the code.
In some embodiments, a hybridization-based detection method may be used to decode the code. In one embodiment, the amplified codes are identified using oligonucleotide probes in a hybridization-based reaction. The amplified codes may be identified using decoding by hybridization. In one example, the hybridization-based detection method uses fluorescently labeled oligonucleotide probes. The code data may then be used as a digital count of the target-specific detection events.
The encoded assays make use of recognition elements and encoded probe sequences (“encoded probes”) for detecting a panel of target analytes (“targets”).
An assay using encoded probes (i.e., an encoded assay) may include: (i) a recognition event, in which a target is uniquely recognized and bound by a recognition element associated with a code (i.e., an encoded probe); (ii) a transformation event, in which a molecular transformation of the recognition element produces a modified recognition element comprising the code that may be used to provide a measure of the presence or absence of the target; and (iii) a detection event, that uses the code as a surrogate for detection of the target, e.g., by recognizing or decoding the code (and optionally other elements).
An encoded assay may be a solution-based assay.
An encoded assay may be a surface-bound assay, e.g., on a flow cell or on beads.
An encoded assay may be a hybrid assay that includes a surface-bound component and a solution-based component.
An encoded assay may be performed in a plate-based format (e.g., a multi-well plate). The multi-well plate may include, for example, an array of nanowells.
An encoded assay may be performed on a microfluidics device.
The encoded probe may include other functional sequences such as sequencing primers, one or more amplification primer sequences, unique molecular identifiers (UMIs) and sample indexes. The sequencing primers may, in some cases, be adjacent to the code sequence. The amplification primer sequences may, in some cases, be universal primer sequences that are common to all probes in a set of encoded probes.
An encoded probe may be a padlock probe that includes a recognition element associated with a code. The code may be a soft decodable code, such as a trellis code.
Thus, for example, the disclosure provides a padlock probe in which the terminal sequences comprise a probe and a soft decodable code is provided between the terminal sequences. Similarly, the disclosure provides a padlock probe in which the terminal sequences comprise a probe and a trellis code is provided between the terminal sequences. The disclosure provides a set of 10 or more padlock probes in each of which (A) the terminal sequences comprise a probe and (B) a soft decodable code is provided between the terminal sequences. The disclosure provides a set of 100 or more padlock probes in each of which (A) the terminal sequences comprise a probe and (B) a soft decodable code is provided between the terminal sequences. The disclosure provides a set of 1000 or more padlock probes in each of which (A) the terminal sequences comprise a probe and (B) a soft decodable code is provided between the terminal sequences. The disclosure provides a set of 10,000 or more padlock probes in each of which (A) the terminal sequences comprise a probe and (B) a soft decodable code is provided between the terminal sequences. In certain embodiments, the foregoing sets are provided in the absence of any padlock probes that do not include the soft decodable codes. In certain embodiments, the foregoing sets are provided with codes that are homopolymer-free and soft decodable.
The disclosure provides a set of 10 or more padlock probes in each of which (A) the terminal sequences comprise a probe and (B) a trellis code is provided between the terminal sequences. The disclosure provides a set of 100 or more padlock probes in each of which (A) the terminal sequences comprise a probe and (B) a trellis code is provided between the terminal sequences. The disclosure provides a set of 1000 or more padlock probes in each of which (A) the terminal sequences comprise a probe and (B) a trellis code is provided between the terminal sequences. The disclosure provides a set of 10,000 or more padlock probes in each of which (A) the terminal sequences comprise a probe and (B) a trellis code is provided between the terminal sequences. In certain embodiments, the foregoing sets are provided in the absence of any padlock probes that do not include the trellis codes. In certain embodiments, the foregoing sets are provided with codes that are homopolymer-free trellis codes.
An encoded probe may be a molecular inversion probe that includes a recognition element associated with a code. The code may be a soft decodable code, such as a trellis code.
The disclosure provides a set of 10 or more molecular inversion probes in each of which (A) the terminal sequences comprise a probe and (B) a soft decodable code is provided between the terminal sequences. The disclosure provides a set of 100 or more molecular inversion probes in each of which (A) the terminal sequences comprise a probe and (B) a soft decodable code is provided between the terminal sequences. The disclosure provides a set of 1000 or more molecular inversion probes in each of which (A) the terminal sequences comprise a probe and (B) a soft decodable code is provided between the terminal sequences. The disclosure provides a set of 10,000 or more molecular inversion probes in each of which (A) the terminal sequences comprise a probe and (B) a soft decodable code is provided between the terminal sequences. In certain embodiments, the foregoing sets are provided in the absence of any molecular inversion probes that do not include the soft decodable codes. In certain embodiments, the foregoing sets are provided with codes that are homopolymer-free and soft decodable.
The disclosure provides a set of 10 or more molecular inversion probes in each of which (A) the terminal sequences comprise a probe and (B) a trellis code is provided between the terminal sequences. The disclosure provides a set of 100 or more molecular inversion probes in each of which (A) the terminal sequences comprise a probe and (B) a trellis code is provided between the terminal sequences. The disclosure provides a set of 1000 or more molecular inversion probes in each of which (A) the terminal sequences comprise a probe and (B) a trellis code is provided between the terminal sequences. The disclosure provides a set of 10,000 or more molecular inversion probes in each of which (A) the terminal sequences comprise a probe and (B) a trellis code is provided between the terminal sequences. In certain embodiments, the foregoing sets are provided in the absence of any molecular inversion probes that do not include the trellis codes. In certain embodiments, the foregoing sets are provided with codes that are homopolymer-free trellis codes.
The transformation event may include a ligation or gap-fill ligation reaction to produce the modified recognition element comprising the code.
The detection event may include an amplification step in which the code sequence (among other elements) is amplified. Amplification may be by any method of amplification, including for example, on-surface PCR, isothermal amplification, rolling circle amplification, and/or ultrarapid amplification. Surface based amplification may be performed using PCR with surface-anchored primers (e.g., Illumina bridge amplification technology) or recombinase polymerase amplification (RPA) (e.g., ExAmp technology).
In one embodiment, the amplification step comprises a rolling circle amplification (RCA) reaction to generate a nanoball product. In one embodiment, the amplification step comprises a rolling circle amplification (RCA) on an anionic surface to generate a nanoball product. In one embodiment, the amplification step comprises a rolling circle amplification (RCA) on a polylysine surface to generate a nanoball product. In one embodiment, the amplification step comprises a rolling circle amplification (RCA) on an anionic surface without covalently attaching the template to the surface to generate a nanoball product. In one embodiment, the amplification step comprises a rolling circle amplification (RCA) on a polylysine surface without covalently attaching the template to the surface to generate a nanoball product.
In one embodiment, an encoded probe may include a sequence which may prevent RCA of the probe, thereby allowing for production of linear double-stranded PCR products. The non-extendable sequence may, for example, be located between a pair of amplification primer sequences.
In one embodiment, an encoded probe may include a restriction enzyme site that may be cleaved to yield a linear DNA molecule.
In some embodiments, the amplified code may be sequenced to identify the sequence of the code associated with the target. Any sequencing technology may be used to sequence the amplified code. Examples of sequencing technologies that may be used include sequencing by synthesis (e.g., pyrosequencing; sequencing by reversible terminator chemistry (Illumina)), avidity sequencing (Element Biosciences), sequencing by hybridization, sequencing by ligation, and nanopore sequencing.
In some embodiments, a sequencing library may be generated from a set of modified recognition elements comprising the codes. The library may be sequenced to determine the code associated with a target of interest. The code data may then be used as a digital count of the target-specific detection events. In some embodiments the code is a soft-decodable code.
In one embodiment, a sequencing library comprising the code (among other elements) may be generated from a circularized padlock probe.
In one embodiment, a sequencing library comprising the code (among other elements) may be generated from a nanoball product.
In one embodiment, a nanoball or a portion of the nanoball that includes the code (and optionally other elements) may be directly sequenced to determine the code associated with the target of interest. The code data may then be used as a digital count of the target-specific detection events.
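By way of non-limiting illustration only, the use of code data as a digital count of target-specific detection events may be sketched in Python as follows. The sketch assumes that decoding has already produced one code string per detected nanoball or read and that a lookup table (`code_to_target`, hypothetical) associates each valid code with its target; codes absent from the table are ignored.

```python
from collections import Counter

def count_targets(decoded_codes, code_to_target):
    """Tally detection events per target from a list of decoded codes."""
    counts = Counter()
    for code in decoded_codes:
        target = code_to_target.get(code)
        if target is not None:  # skip codes that are not in the panel
            counts[target] += 1
    return counts
```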
In some embodiments, a hybridization-based detection method may be used to decode the code. In one embodiment, the amplified codes are decoded using oligonucleotide probes in a hybridization-based reaction such as, for example, decoding by hybridization. In one example, the hybridization-based detection method uses fluorescently labeled oligonucleotide probes. The code data may then be used as a digital count of the target-specific detection events. Decoding using a hybridization approach may be soft decoding.
The disclosure provides assays that make use of novel padlock probes comprising codes that may be used as a surrogate for detection of a target, e.g., by recognizing or decoding the code (and optionally other elements). The code in a padlock probe may be a soft decodable code (e.g., a trellis code). A coded padlock probe may include target-specific regions that may be used for target recognition and enrichment. A coded padlock probe may include a 5′ terminal phosphate that may be used to facilitate ligation (i.e., circularization) after target recognition. A coded padlock probe may include a 3′ nucleotide that is the complement to a nucleotide at a target site of interest (e.g., a 3′ SNP-specific nucleotide). A coded padlock probe may include an RCA priming site that includes a primer sequence suitable for priming an RCA reaction.
For example, the coded padlock probe may include regions at the 3′ and 5′ ends that are complementary to regions of a target. The probe regions may hybridize to the target, and the probe may be circularized, e.g., by a ligation or gap-fill ligation reaction. As described elsewhere in this disclosure, the target may be a nucleic acid analyte (e.g., mRNA, cfDNA etc.) or a proxy for the analyte of interest (e.g., an antibody conjugated with oligonucleotide).
Target specific regions 510a and 510b may hybridize to the target, and the probe may be circularized. For example, when the complementary nucleotide is present in the target, the 3′ SNP specific nucleotide hybridizes to the target, enabling circularization, e.g., by ligation or gap-fill ligation. Other types of features or mutations may be detected by varying the terminal nucleotide (N) or nucleotides of target specific region 510a and/or target specific region 510b to hybridize when the target feature is present and not hybridize when the target feature is not present.
Coded padlock probe 500 may include an RCA priming site 515 that includes a primer sequence suitable for priming an RCA reaction. In this example, RCA priming site 515 is downstream from target specific region 510b. However, other locations are possible, as long as the positioning of the primer site does not interfere with the other functions of the probe, e.g., the probe hybridization function and the encoding function.
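By way of non-limiting illustration only, the relationship between the target-specific regions, the code and the ligation junction may be sketched in Python as follows. The sketch assumes DNA sequences written 5′ to 3′, places the RCA priming site and the code between the two arms in an arbitrary order, and treats circularization as possible only when both arms are perfectly complementary to adjacent target sites, which is a simplification of real ligase behavior. All function names and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def design_padlock(target_5p_side, target_3p_side, code, rca_primer_site):
    """Assemble a linear padlock probe (5'->3').

    target_5p_side and target_3p_side are the target segments on either
    side of the ligation junction; the first base of target_3p_side is
    the interrogated position and pairs with the probe's 3' terminal base.
    """
    arm_5p = revcomp(target_5p_side)  # 5' arm of the probe
    arm_3p = revcomp(target_3p_side)  # 3' arm, ends at the SNP-specific base
    return arm_5p + rca_primer_site + code + arm_3p

def can_circularize(probe, target, arm_5p_len, arm_3p_len):
    """True if both arms match adjacent target sites with no mismatch."""
    footprint = revcomp(probe[-arm_3p_len:] + probe[:arm_5p_len])
    return footprint in target
```

In this sketch, changing the target base at the interrogated position removes the perfect match at the 3′ terminus of the probe, so `can_circularize` returns `False` and no code would be carried forward to amplification.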
A coded padlock probe may optionally include other functional sequences. For example, the probe may include index sequences which are unique oligo identifiers present in the probe sequence or inserted as part of the assay. Index sequences, such as sample barcodes, allow differentiation among different samples, experiments, etc. during the detection event (i.e., reading (decoding) the code).
The coded padlock probe may include unique molecular identifiers (UMIs). UMIs may be inserted anywhere within the probe to address downstream readout and data analysis purposes. For example, UMIs may be introduced to distinguish unique recognition events with single-molecule resolution during the readout. UMIs may facilitate error correction and/or individual molecule counting.
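By way of non-limiting illustration only, individual molecule counting with UMIs may be sketched in Python as follows. The sketch assumes that each read has already been resolved to a (code, UMI) pair and that UMIs are compared by exact match; correction of UMI sequencing errors is not attempted.

```python
def count_unique_molecules(reads):
    """Count distinct UMIs observed for each code.

    reads: iterable of (code, umi) pairs.
    Returns {code: number of distinct UMIs}, so that repeated reads of
    the same original molecule are counted once.
    """
    seen = {}
    for code, umi in reads:
        seen.setdefault(code, set()).add(umi)
    return {code: len(umis) for code, umis in seen.items()}
```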
A coded padlock probe may include other primers in addition to the priming region required for RCA amplification. Other priming regions may, for example, be present to facilitate the readout of an index, a UMI or other oligonucleotide sequences present in the probe. Priming regions may allow parallel or serial reading schemes. They may also be used to increase the amount of multiplexing or allow sequential readout. For instance, if a plurality of probes or amplified objects are present, only those containing a specific primer will be amplified or read. Primers may also be used to facilitate the capture and immobilization of a probe or amplified object onto a surface (e.g., via DNA-DNA hybridization).
A coded padlock probe may include one or more sequences recognizable by enzymes, such as endonucleases. Various sequences may be selected and used to facilitate additional transformations, such as digestion, nick or gap formation, phosphorylation etc. In one embodiment, the probe includes one or more restriction sites.
A coded padlock probe may include one or more non-natural NTP components. Examples include phosphorothioate groups, locked nucleic acid (LNA), peptide nucleic acid (PNA) and others, which may be included to improve certain features of the probe, such as melting temperature for target recognition, or primer recognition, or resistance to degradation. Additionally, abasic NTPs (“wobble bases”) may be included in the probe sequence to add degeneracy to targeting or priming regions and extend the ability to recognize a broader number of complementary sequences.
A coded padlock probe may include one or more chemical moieties. Such chemical moieties may be included in the probe structure or added at any stage of the workflow to enable additional transformations or properties. Examples include cleavable groups to open or linearize the probe, reactive groups to add additional components such as dyes, and groups to facilitate immobilization on surfaces.
A coded padlock probe may include CRISPR recognition sequences, oligo sequences designed to be recognized by CRISPR enzymes and replaced with other arbitrary sequences. The probe may optionally include one or more oligo sequences designed to be recognized by transposases and replaced with other arbitrary sequences.
A coded padlock probe may optionally include one or more adapter primers for compatibility with sequencing by synthesis (SBS) and other non-SBS platforms. The adapter primers may be included in the probe sequence or added at any stage as part of the workflow. Such adapter primers may be used directly to immobilize, cluster, extend, and amplify as precursor activities to a decoding run by SBS or another non-SBS method.
In one embodiment, a padlock probe assay workflow may include:
Index sequences, such as sample barcodes, allow differentiation among different samples, experiments, etc. during the detection event (i.e., reading (decoding) the code). Indexes may be added to a padlock probe using a variety of strategies.
Indexes may be added during the synthesis of a padlock probe. In this case, each probe must be synthesized once per index, so the total number of distinct probes to manufacture is N×P, where N is the number of indexes and P is the plexity of the probe pool.
Indexes may be added after probe synthesis as part of manufacturing or at a site of use as a step prior to performing an encoded assay. In this case, only one synthesis is required for each probe and additional functional elements. Additional functional elements may be added to a probe to enable insertion of an index. Examples of functional elements that may be added include (i) non-natural nucleotides (e.g., biotin, amine, etc.) and (ii) polynucleotides that enable biochemical transformation of the probe to contain an index sequence such as adapters for ligations or extension ligations, restriction endonuclease recognition sites, and transposome binding sites.
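By way of non-limiting illustration only, the difference in synthesis burden between the two indexing strategies may be sketched in Python as follows. The N×P figure follows from building the index into each probe at synthesis. The figure for post-synthesis indexing assumes, for the purposes of the example only, one synthesis per probe plus one synthesis per separate index oligonucleotide.

```python
def syntheses_with_builtin_index(n_indexes, plexity):
    """Distinct probes needed when the index is part of each synthesis."""
    return n_indexes * plexity

def syntheses_with_added_index(n_indexes, plexity):
    """Distinct oligos needed when indexes are added after synthesis.

    Assumption: one synthesis per probe plus one per index oligo.
    """
    return plexity + n_indexes
```

For example, under these assumptions a 1,000-plex pool with 96 sample indexes would call for 96,000 distinct probes if the index is built in at synthesis, compared with 1,096 distinct oligonucleotides if the index is added afterwards.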
Indexes may be added during an encoded assay. For example, a ligation reaction to insert an index can occur at the same time as ligation of the padlock probe at the target site of interest to generate a circularized padlock probe (i.e., the transformation event). In some cases, the ligation reaction may be a gap-fill extension/ligation reaction.
Indexes may be added after ligation of the padlock probe and RCA by including modified nucleotides during the RCA reaction. The modified nucleotides may then be coupled to an index sequence. In cases where there is a covalent or non-covalent interaction, either moiety can be linked to the index sequence or incorporated during RCA.
Examples of coupling strategies include: (i) ligand protein pairs such as biotin-streptavidin, antigen-antibody, CLIP tag and SNAP tag pair (i.e., O6-benzylguanine derivatives coupling to O6-alkylguanine-DNA-alkyltransferase, wherein either the protein or the substrate may be bound to the probe), carbohydrate-protein pairs (e.g., lectins), and digoxigenin-DIG-binding protein; (ii) peptide-protein pairs (e.g., SpyTag-SpyCatcher); and (iii) hybridizing indexes to a common sequence on the RCA product.
Indexes may be added to RCA products by restriction endonuclease cleavage followed by index ligation.
Indexes may be added to RCA products using a transposase enzyme that fragments and indexes the RCA products.
The encoded assays of the invention may be performed on a surface. For example, a target may be immobilized on a surface for conducting assays of the invention. The probes of the invention may be immobilized on a surface for conducting assays of the invention. DNA nanoballs of the invention may be immobilized on a surface for conducting assays of the invention. Various intermediate assemblies of molecules of the assays of the invention may be immobilized on a surface for conducting assays of the invention.
Various steps of the invention may be performed on a surface, such as target capture, recognition events, transformation events, amplification, and/or detection events, i.e., determination of the absence or presence of the code (e.g., by sequencing or hybridization-based detection).
Thus, for example, the disclosure provides a surface having a probe as described herein immobilized on the surface. The disclosure provides a surface having a nanoball as described herein immobilized on the surface. The disclosure provides a surface having a target immobilized on the surface. The disclosure provides a surface having a target immobilized on the surface with a probe as described herein hybridized to the target. The disclosure provides a surface having a probe immobilized on the surface with a target as described herein hybridized to the probe. The disclosure provides a surface having a target nucleic acid immobilized on the surface, and a protein or peptide bound to the target nucleic acid. The disclosure provides a surface having a target nucleic acid immobilized on the surface, and an antibody, aptamer, binder, or antibody fragment bound to the target nucleic acid. The disclosure provides a surface having a ligand that has affinity for any of the foregoing immobilized on the surface. For example, the ligand may have affinity for a probe as described herein, a nanoball as described herein, or a target as described herein. The ligand may, for example, be a protein, peptide, antibody, aptamer, binder, or antibody fragment.
A variety of surfaces may be used for the surface attachments described herein. In various embodiments, the surface includes an oxide, a nitride, a metal, an organic or an inorganic polymer (e.g., hydrogel, resin, plastic or other).
The surface may take a variety of forms, e.g., it may be flat or curved. It may be beads or particles. In some cases, the surface is the surface of a flow cell. Beads or other particles may in some embodiments range in size from less than 100 nm up to several centimeters.
Various surface modifications may be used to permit attachment of various components of the assays of the invention to a surface. For example, various anchoring ligands may be used (e.g., streptavidin, biotin, aptamers, antibodies, etc.). Chemical handles, such as click chemistry handles, may be used. Examples include azides, alkynes, unsaturated bonds, amines, carboxylic acids, NHS, DBCO, BCN, tetrazine, epoxy and the like. Single- or double-stranded oligonucleotides may be used. Size ranges of the oligonucleotides may, in some cases, be from about 10 to about 200 nucleotides. Proteins or peptides may be used for surface attachment. Charge-based molecules or polymers may be used, e.g., polyethylenimine.
Various techniques may be used to prepare a surface for binding to a target or to a component of an assay of the invention. In one example, a flow cell with primers may be used. A splint DNA segment that comprises a segment complementary to the primer and a segment complementary to the target or to the component of the assay may be hybridized to the primer. A variety of splints may be used on a surface, with various subsets of the splints having different segments complementary to different components of the invention or different targets. Specific splints may be arranged on different regions of a surface. For example, splints may be arranged in a manner that permits the identification of distinct regions of a surface targeted to specific analytes or components of the assays.
In various embodiments, amplification of a nucleic acid may occur on the surface. The nucleic acid may be a target or any nucleic acid component of an assay of the invention. For example, a target analyte may be amplified on a surface, or a probe of the invention may be amplified on a surface, and/or a fragment of any of the foregoing may be amplified on a surface. The amplification may be performed on a bead or particle, or on a flat surface, such as on the surface of a flow cell.
It should also be noted that DNA may be amplified in solution, e.g., in an aqueous suspension or emulsion, such as in microdroplets. Solution-based amplification may be performed, for example, in an open environment, such as a well of a microtiter plate, in a nanowell, or in an enclosed space, such as a droplet in an emulsion, or on a flow cell or other microfluidic device.
Amplification may be by any method of amplification, including for example, PCR, isothermal amplification and/or ultrarapid amplification.
Attachment for immobilization of components of the assays or of targets may be covalent or non-covalent (e.g., Coulombic in nature), temporary or permanent, and/or rendered labile when subject to a particular stimulus.
Examples of mechanisms of lability include:
A variety of surface-based workflows are possible within the scope of the assays disclosed. In some embodiments, a surface-based workflow may use a padlock probe that includes a recognition element associated with a code. The code may be a soft decodable code, such as a trellis code. In some embodiments, a surface-based workflow may use a dual probe that includes a recognition element associated with a code (e.g., a trellis code).
In some embodiments, a surface-based workflow may include immobilizing a target on a surface and hybridizing a probe to the target. In one embodiment, a surface-based workflow may include:
In some embodiments, the target may be a nucleic acid, e.g., DNA. In this case, immobilization of the nucleic acid target (e.g., DNA) may be at an end of the target or via a side chain or internal segment of the target.
In a step 701, a target is immobilized on a surface. For example, a target 710 is immobilized on a surface 715 by an anchor element 720. In one example, target 710 is DNA and anchor element 720 is an oligonucleotide.
In a step 702, a linear probe is hybridized to the immobilized target. For example, a solution that includes a probe 725 is added and a hybridization reaction is performed to bind probe 725 to target 710. In one example, probe 725 is a coded padlock probe.
In a step 703, the probe is circularized. For example, a ligation reaction is performed to circularize probe 725 to produce a circular modified probe 730. In some cases, a gap-fill extension/ligation reaction is used to circularize probe 725 to produce the circular modified probe.
In a step 704, the circular modified probe is released from the immobilized target for downstream processing. For example, circular modified probe 730 may be dehybridized from target 710 and amplified in an RCA reaction to produce a nanoball product.
In some cases, the RCA reaction may be performed in a solution that remains in contact with the surface on which the target is immobilized (e.g., in the same container, well, reservoir, liquid volume or droplet). In some cases, the solution comprising the released modified probe may be transferred to a separate container prior to performing the RCA reaction. In some cases, the solution comprising the released modified probe may be transferred to a different surface prior to performing the RCA reaction.
In some embodiments, the immobilized target (e.g., DNA) may be used to prime the RCA reaction. In one embodiment, a surface-based workflow may include:
In a step 801, a target analyte is immobilized on a surface. For example, a target 710 is immobilized on a surface 715 by an anchor element 720. In one example, target 710 is DNA and anchor element 720 is an oligonucleotide.
In a step 802, a linear probe is hybridized to the immobilized target. For example, a solution that includes a probe 725 (e.g., a coded padlock probe) is added and a hybridization reaction is performed to bind probe 725 to target 710.
In a step 803, the probe is circularized. For example, a ligation reaction is performed to circularize probe 725 to produce a circular modified probe 730. In some cases, a gap-fill extension/ligation reaction is used to circularize probe 725 to produce the circular modified probe.
In a step 804, the immobilized target 710 is used as a primer to initiate an RCA reaction to generate a nanoball product.
In some embodiments, a surface-based workflow may include immobilizing a probe (or a part thereof) on a surface and using the immobilized probe to capture a target. In one embodiment, a surface-based workflow may include:
In a step 901, a linear probe is immobilized on a surface. For example, a probe 910 is immobilized on a surface 915 by an anchor element 920. In one example, probe 910 is a padlock probe and anchor element 920 is an oligonucleotide.
In a step 902, a target is hybridized to the immobilized probe. For example, a solution that may include a target 925 is added and a hybridization reaction is performed to bind target 925 to probe 910.
In a step 903, the probe is circularized. For example, a ligation reaction is performed to circularize probe 910 to produce a circular modified probe 930. In some cases, a gap-fill extension/ligation reaction is used to circularize probe 910 to produce the circular modified probe.
In a step 904, the circular modified probe is amplified in an RCA reaction to generate a nanoball product. Circular modified probe 930 may be amplified without being released from the surface. For example, circular modified probe 930 may be amplified in an RCA reaction using target 925 as a primer to initiate the amplification reaction.
In some embodiments, the circular modified probe may be released from the surface prior to amplification. In some cases, the RCA reaction may be performed in a solution that remains in contact with the surface on which the probe was anchored (e.g., in the same container, well, reservoir, liquid volume or droplet). In some cases, the solution comprising the released modified probe may be transferred to a separate container prior to performing the RCA reaction.
In some embodiments, the solution comprising the released modified probe may be transferred to a different surface prior to performing the RCA reaction. In one embodiment, oligonucleotides bound to the new surface may be used as capture moieties to immobilize the circular modified probe on the surface and to initiate the amplification reaction. In one embodiment, the target may be immobilized on the new surface and used to initiate the amplification reaction.
A surface-based workflow may use a dual probe as a recognition element. In one embodiment, a surface-based workflow using a dual probe may include:
In some embodiments, the first probe and the second probe may both be immobilized on the surface.
In some embodiments, the first probe is immobilized on the surface and the second probe is in solution. The surface may, for example, be the surface of a flow cell.
In a step 1001, a target is hybridized to a first probe immobilized on a surface. For example, a first probe 1010a is immobilized on a surface 1015 via an anchor element 1020. In one example, anchor element 1020 is a surface bound primer. The surface bound primer may, for example, be a primer on a sequencing flow cell. A process for anchoring a probe (or a segment thereof) on a surface bound primer is described below with reference to
First probe 1010a may be used as a capture element for recognizing and binding a target. For example, a solution that may include a DNA target 1025 is added and a hybridization reaction is performed to bind target 1025 to first probe 1010a.
In a step 1002, the target is hybridized to a second probe. For example, a second probe 1010b that includes a sequence for recognizing and binding target 1025 is added and a hybridization reaction is performed to hybridize second probe 1010b to target 1025.
In a step 1003, the dual probe is ligated to link the first probe and the second probe to produce a modified probe immobilized on the surface. For example, a ligation reaction is performed to link first probe 1010a and second probe 1010b to produce a modified probe 1030.
In some cases, a gap-fill extension/ligation reaction is used to link first probe 1010a and second probe 1010b to produce the modified probe.
In some cases, second probe 1010b may further include a surface oligonucleotide adapter for binding to another surface bound primer.
The disclosure provides a process for preparing a surface for binding to a target or to a component of an assay of the invention. Surface modifications may serve a dual purpose. For example, a surface modification may (i) capture the target of interest and (ii) initiate the amplification of a probe or a portion thereof on the surface. In another example, a surface modification may (i) capture a component of the assay (e.g., a circular modified probe), and (ii) initiate an RCA reaction to generate a nanoball product.
A surface bound primer may be enzymatically modified to include a capture sequence. A capture sequence may be a target-specific probe or a sequence that is specific for a component of an assay. A surface bound primer may be enzymatically modified to include a probe or a portion thereof (e.g., a probe arm or a primer binding site). For example, a splint oligonucleotide that includes a segment that is complementary to a surface bound primer and a segment that is complementary to a probe (or a portion thereof) may be hybridized to the primer and used to template the synthesis of a surface bound probe. In one example, the surface bound probe is one arm of a dual probe.
In a step 1101, a surface is provided with a surface bound primer. For example, a primer 1110 is bound to a surface 1115. Surface 1115 may, for example, be the surface of a flow cell.
In a step 1102, a splint oligonucleotide is hybridized to the surface bound primer. For example, a splint 1120 that includes a segment 1122 that is complementary to primer 1110 and a capture segment 1124 is hybridized to primer 1110. In one example, capture segment 1124 is one arm of a dual capture probe.
In a step 1103, a primer extension reaction is performed to synthesize the surface bound probe. For example, in the primer extension reaction, splint 1120 is used to template the synthesis of a capture segment 1124 extending from primer 1110 to produce a surface bound probe arm 1124a.
Amplification may be by any method of amplification, including for example, on-surface PCR, isothermal amplification, rolling circle amplification, and/or ultrarapid amplification.
Surface based amplification may be performed using PCR with surface-anchored primers (e.g., Illumina bridge amplification technology) or recombinase polymerase amplification (RPA) (e.g., ExAmp technology).
Clonally amplified material may be a nanoball or a DNA cluster (e.g., Illumina surface-based amplification).
An amplification strategy may include adding a second surface adapter to a probe. The second surface adapter may be complementary to a second primer on a flow cell surface (e.g., a bridge amplification primer). The second surface adapter may, for example, be added to a probe during the ligation or gap-fill ligation event or added separately by PCR or through its own ligation to a probe. For example, an amplification strategy may include using the splint ligation approach described with reference to
Surface adapter 1218 may be complementary to a second primer 1230 on surface 1220. Second primer 1230 may be a primer used in a bridge amplification reaction. Probe structure 1210b may include first probe element 1212 and second probe element 1214 that are separated by an adapter 1216, and surface adapter 1218. A bridge amplification reaction (see
An amplification strategy may include adding a restriction enzyme site in a probe. For example, the probe may include a restriction enzyme site that when hybridized with a complementary oligonucleotide provides a double-stranded site for a restriction endonuclease to cleave the probe, rendering a linear strand. The linear strand may be amplified for downstream processing, e.g., for sequencing. For example, the linear strand may be captured on a flow cell and amplified by bridge amplification (e.g., Illumina bridge amplification technology) or recombinase polymerase amplification (RPA) (e.g., ExAmp technology).
The probe may include surface primers or surface adapter sequences that are complementary to surface bound primers of a flow cell. The adapter sequences may be linked to or adjacent to the restriction site, so that when the site is cut by a restriction enzyme the linear strand is ready for sequencing. As noted, other forms of cleavage are possible, such as CRISPR mediated cleavage or any other double-stranded break inducing protein.
Similarly, a nanoball may include surface primers or sequencing adapters linked to or adjacent to a restriction site, so that when the site is cut by a restriction enzyme the linear strands are released ready for sequencing. As noted, other forms of cleavage are possible, such as CRISPR mediated cleavage.
In another embodiment, a nanoball with adapter sequences complementary to surface bound primers may be seeded directly onto the surface without cleaving. Amplification may proceed through bridge amplification (e.g., Illumina bridge amplification technology) or recombinase polymerase amplification (RPA) (e.g., ExAmp technology) initiated directly.
Rolling circle amplification (RCA) may be used to produce nanoballs as part of the assays of the invention. An RCA reaction may be performed as a surface-bound reaction. For example, RCA may be initiated by an oligonucleotide bound to a surface (e.g., beads, flow cells, microwells, or nanowells). Any method may be used to bind the oligonucleotide to the surface. In one example, the oligonucleotide may be covalently bound to the surface.
In another example, a cation-coated surface (e.g., beads, flow cells, microwells, or nanowells) may be used to capture nanoballs. In one example, the cation-coated surface may be a polylysine-coated surface.
In another example, a streptavidin-coated surface (e.g., beads, flow cells, microwells, or nanowells) may be used to capture nanoballs. In this approach, biotin-linked deoxynucleotides may be incorporated into the nanoballs during RCA. The nanoballs will then be bound to the surface by a biotin-streptavidin linkage.
In another embodiment, biotin linked RCA primers may be bound to a surface by a streptavidin-biotin linkage and used to initiate an RCA reaction as described above with reference to
Following the formation of a nanoball, a determination may be made with respect to the identity of the code. Prior to making the determination, various secondary processing steps are possible within the scope of the assays described herein. The probe may include various elements that facilitate secondary processing steps. Examples include restriction endonuclease sites and CRISPR sites.
The nanoball may be converted to double-stranded DNA (dsDNA) prior to fragmentation. The dsDNA nanoball may be fragmented. In one embodiment, the probe includes restriction sites which are replicated in the nanoball, and the nanoball is fragmented using a restriction enzyme having specificity for the restriction sites.
CRISPR may be used to fragment the nanoball at specific sites.
Random fragmentation of nanoballs may be performed, using known fragmentation techniques.
Tagmentation may be performed on the nanoball, and the tagmentation may be used to add sequencing adapters.
This disclosure provides a variety of techniques for amplifying and preparing circularized probes for sequencing. In certain embodiments, amplification and preparation for sequencing may be performed sequentially (e.g., PCR+primer ligation). In certain embodiments, amplification and preparation for sequencing may be performed in a single reaction (e.g., adapter addition via PCR). Addition of sequencing adapters may be performed with or without RCA amplification of circularized probes.
In one embodiment, sequencing adapters are added via PCR. In this case, amplification and preparation for sequencing may be a single step. Depending on the probe design, the code, UMI, and index may be read in a single step or in two separate reads with a dehybridization step.
In one embodiment, RCA products (nanoballs) may be fragmented with restriction endonucleases (RE) to yield a multitude of code-containing single stranded nucleic acids. The single-stranded nucleic acids (i.e., the RE reaction products) may then be prepared for sequencing by ligation to adapter sequences.
In one embodiment, sequencing adapters may be added by transposomes that simultaneously fragment double-stranded DNA and add adapters.
As discussed elsewhere in the application, the assays of the invention include a transformation step. Typically, the transformation involves circularization of a probe when a target is present (e.g., by ligation or gap-fill ligation).
The circular modified probe shown in
In some embodiments, the RCA products (nanoballs) may be sequenced directly. In some embodiments, sequencing adapters may be added by PCR amplification, followed by clustering and sequencing.
In another embodiment, the probes of the invention may include restriction sites. The probes may be designed with restriction sites, or the restriction sites may be added to the probes as part of the assay process. The restriction sites will be amplified into the nanoball and will provide multiple sites at which to cut the nanoball into fragments.
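The fragmentation of a nanoball at replicated restriction sites can be illustrated in silico. The following is a minimal sketch under stated assumptions, not the disclosed protocol: it models an RCA product as tandem repeats of one single-stranded unit and cuts at every copy of a restriction site. The EcoRI site GAATTC (cut after the first G), the unit sequence, the code sequence, and the function names are hypothetical choices made for illustration only.

```python
import re

def build_concatemer(unit, copies):
    """Model an RCA product as `copies` tandem repeats of one unit sequence."""
    return unit * copies

def digest(concatemer, site, cut_offset):
    """Cut at each occurrence of `site`, `cut_offset` bases into the site.

    Returns the fragments in order; joining them restores the input.
    """
    cuts = [m.start() + cut_offset
            for m in re.finditer(re.escape(site), concatemer)]
    fragments, prev = [], 0
    for pos in cuts:
        fragments.append(concatemer[prev:pos])
        prev = pos
    fragments.append(concatemer[prev:])
    return [f for f in fragments if f]

# Hypothetical unit: spacer + restriction site + code + spacer.
CODE = "ACGTTGCA"
UNIT = "TT" + "GAATTC" + CODE + "CC"

fragments = digest(build_concatemer(UNIT, 4), "GAATTC", 1)
with_code = [f for f in fragments if CODE in f]
print(len(fragments), "fragments,", len(with_code), "contain the code")
```

In this model every internal fragment has unit length and carries one copy of the code, which is the property the restriction-site design relies on. An actual digestion acts on a double-stranded site, for example one formed by hybridizing a complementary oligonucleotide to the nanoball.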
Referring to panel “B”, restriction sites consist of a recognition sequence and flanking bases to ensure that strands remain hybridized after cleavage. Flanking sequences (NNNNNN) may be of length ranging from about 5 to about 50 bases and can be designed to minimize interactions with other probe components and tune the melting temperature (Tm). In this example, the flanking sequences include five bases (N). The RS sequences can be used as an SBS primer such that sequencing begins with the code or may include a spacer region that is read prior to the code.
Digestion of nanoball 1530 hybridized to RS complementary sequences 1547 yields many code-containing DNA fragments with termini that contain single-stranded DNA overhangs or “sticky ends”. The digestion products may be further processed for sequencing. For example, adapters may be ligated to the sticky ends resulting from the restriction digestion.
Alternatively, the ends may be blunt ended (i.e., the single-stranded overhangs removed) and prepared for ligation to adapters. Blunt ended fragments may then be processed via typical sequencing sample preparation protocols such as A-tailing and adapter ligation.
An additional embodiment includes using a primer and polymerase to create RCA products where the entire concatemer is double stranded. This structure can then be processed via the restriction endonuclease procedure described above.
Another embodiment includes employing hyperbranched RCA to create many double stranded, code-containing sequences that can be processed via the restriction endonuclease procedure described above.
In certain embodiments, the restriction endonuclease may be a member of the Cas family of proteins or a derivative thereof. These proteins recognize longer sequences of DNA, making them more specific.
In an additional embodiment, circularized probes may be prepared for sequencing without RCA. In certain embodiments, the nanoballs of the invention may be compacted prior to sequencing. Rolling circle amplification produces linear concatemers of single-stranded DNA. When the substrate for RCA is a circularized probe, these concatemers may contain hundreds to thousands of copies of a code. When preparing RCA products for sequencing, it is useful to compact them. The compacting may produce spherical structures. The compacted structures can increase localization of signal.
Compaction of RCA products into spherical nanoballs can be accomplished by a variety of techniques. In one embodiment, cationic additives that condense high molecular weight DNA (e.g., spermidine, Mg ions, cationic polymers) may be used. The compactness of a spherical nanoball may be tuned by controlling the concentration of the cationic reagent used. The concentration of the cationic reagent used may be selected to avoid aggregation of multiple nanoballs.
In one embodiment, multivalent oligonucleotide sequences that crosslink sites on RCA products may be used to compact RCA products into spherical nanoballs. The RCA binding sites may be separated by a nucleic acid or polymeric linker to control the degree of compaction. The compactness of the spherical nanoball may, for example, be tuned by controlling the degree of crosslinking in the RCA product.
In one embodiment, incorporation of modified nucleotides followed by crosslinking may be used to compact RCA products into spherical nanoballs. Examples of modified nucleotides that may be used include biotinylated nucleotides that bind to streptavidin proteins and nucleotides that covalently react with multifunctional linkers (e.g., amino nucleotides and NHS-terminated linkers). The compactness of the spherical nanoball may, for example, be tuned by controlling the degree of crosslinking in the RCA product.
In certain embodiments, the assays of the invention make use of nanopore sequencing. A nanoball or a circular modified probe may be sequenced using nanopore sequencing. Various nanopore sequencing sample preparation techniques are known in the art. Amplification is optional. Various components required for other sequencing techniques, such as sequencing primers, may be omitted from the probe. Purification can be accomplished using, for example, SPRI beads or BluePippin. Oxford Nanopore Technologies, Inc. (Oxford, UK) provides kits for sample preparation. Examples include Ligation Sequencing Kit, Native Barcoding Kit 96, and Rapid Barcoding Kit.
In certain embodiments, it may be useful to further amplify RCA products prior to sequencing. For example, in applications that use cell-free DNA (cfDNA) as the input where the analyte number may be low, it may be useful to amplify the RCA product prior to sequencing. In one embodiment, a circle-to-circle amplification approach may be used to produce multiple RCA products from one initial RCA product by monomerization of the concatemer (i.e., cleavage to unit length fragments), recircularization of the unit length fragments (i.e., monomers) and amplification of the newly generated circles in a second RCA reaction to produce multiple RCA product copies for further processing or sequencing. The restriction enzyme approach described with reference to
In a step 1701, a probe is hybridized to a target and circularized to yield a circular modified probe. For example, a probe 1710 that includes a code 1712, and a restriction site (not shown) is hybridized to target 1715. A ligation reaction is then performed to circularize probe 1710 to produce a circular modified probe 1720.
In a step 1702, the circular modified probe 1720 is amplified in an RCA reaction to generate a nanoball product 1725. During amplification, the restriction site is amplified into the nanoball and provides multiple sites at which to cut nanoball 1725 into fragments.
In a step 1703, the nanoball product is cleaved to produce multiple unit sized fragments each comprising the code. For example, nanoball 1725 is cleaved at the restriction sites to produce multiple unit size fragments 1730 each comprising code 1712. The cleavage reaction may, for example, be performed as described with reference to
In a step 1704, the unit size fragments are amplified in a PCR reaction to generate multiple double-stranded fragments. For example, indexed amplification primers 1732 are hybridized to unit size fragments 1730 and a PCR reaction is performed to produce multiple unit size fragments 1735 that include code 1712 and the indexed amplification primer 1732.
In a step 1705, the amplified unit size fragments are circularized to generate circular unit size fragments. For example, an end-to-end joining oligonucleotide 1740 that is complementary to sequences in amplification primer 1732 is hybridized to unit size fragment 1730 and an end-to-end ligation reaction is performed to generate circular unit size fragments 1735 comprising the code.
In a step 1706, the circular unit size fragments are amplified in a second RCA reaction to produce multiple nanoball copies for further processing or sequencing. For example, circular unit size fragments 1735 are amplified in an RCA reaction to produce multiple nanoballs 1745 each comprising code 1712 and indexed amplification primers 1732.
In an embodiment of process 1700 of
In a step 1801, a probe is hybridized to a target and circularized to yield a circular modified probe. For example, a probe 1810 that includes target recognition sequences (not shown), a code 1812 and a restriction site (not shown) is hybridized to a target 1715. A ligation reaction is then performed to circularize probe 1810 to produce a circular modified probe 1820.
In a step 1802, the circular modified probe 1820 is amplified in an RCA reaction to generate a nanoball product 1825. During amplification, the restriction site is amplified into the nanoball and provides multiple sites at which to cut nanoball 1825 into fragments.
In a step 1803, the nanoball product is cleaved to produce multiple unit sized fragments each comprising the code. For example, nanoball 1825 is cleaved at the restriction sites to produce multiple unit size fragments 1830 each comprising code 1812. The cleavage reaction may, for example, be performed as described with reference to
In a step 1804, the unit size fragments are circularized to generate circular unit size fragments. For example, a splint oligonucleotide 1840 that is complementary to the target recognition sequences in unit size fragments 1830 is hybridized to the fragments and a ligation reaction is performed to generate circular unit size fragments 1835 comprising the code.
In a step 1805, the circular unit size fragments are amplified in a second RCA reaction to produce multiple nanoball copies for further processing or sequencing. For example, circular unit size fragments 1835 are amplified in an RCA reaction to produce multiple nanoballs 1845 each comprising code 1812.
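The multiplicative effect of circle-to-circle amplification can be estimated with simple arithmetic. The sketch below is illustrative only: the function name, the number of monomers released per RCA product, and the recircularization efficiency are hypothetical parameters, not values stated in this disclosure.

```python
def c2c_circle_count(initial_circles, monomers_per_product,
                     recircularization_efficiency, rounds):
    """Estimate the number of circles available after `rounds` of
    monomerization and recircularization.

    Each round, every circle is amplified into one RCA product, the
    product is cleaved into `monomers_per_product` unit length fragments,
    and a fraction `recircularization_efficiency` of them is recircularized.
    """
    circles = float(initial_circles)
    for _ in range(rounds):
        circles *= monomers_per_product * recircularization_efficiency
    return circles

# Hypothetical inputs: 10 circles, 1000 monomers per product, 50% efficiency.
print(c2c_circle_count(10, 1000, 0.5, rounds=1))
print(c2c_circle_count(10, 1000, 0.5, rounds=2))
```

Under these assumptions the gain per round is the product of the concatemer copy number and the recircularization efficiency, which is why the approach may be useful for low-input samples such as cfDNA.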
Examples of sequencing techniques suitable for use with the assays disclosed herein include nanopore sequencing, next-generation sequencing, massively parallel sequencing, Sanger sequencing, sequencing by synthesis (SBS), pyrosequencing, sequencing by hybridization, single molecule real-time sequencing, SOLiD, and sequencing by ligation.
In some embodiments, a process for circularizing a probe may include a gap-fill ligation reaction that may be used to circularize the probe and capture an unknown region of the target that may then be sequenced along with the code.
In some embodiments, an unknown region of a target sequence may be captured by a probe transformation reaction and sequenced along with the code.
In a step 1910, a probe is hybridized to a target and circularized in a gap-fill ligation reaction that captures an unknown region of the target sequence. For example, a probe 1910 that includes a code 1912 (among other elements not shown) and a pair of target recognition elements 1914 (e.g., 1914a and 1914b) is hybridized to a target analyte 1920. Target 1920 may include a region 1922 comprising an unknown sequence. Target recognition elements 1914a and 1914b recognize and bind to target 1920 at sites flanking region 1922. A gap-fill ligation reaction (indicated by dashed arrow) is performed to copy region 1922 into probe 1910 and circularize the probe to yield a circular modified probe (not shown) comprising the unknown region of target 1920. The ligation reaction may be followed by an exonuclease digestion step to remove unligated probes 1940 and target.
In a step 1915, the circular modified probe is amplified in an RCA reaction to form an RCA product 1925 comprising multiple copies of the unknown region 1922 and the code 1912 (among other sequences). The RCA product 1925 may be sequenced directly or sequencing adapters may be added by PCR amplification, followed by clustering and sequencing as described herein above.
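The gap-fill capture described in the preceding steps can be sketched in silico. This is a simplified model under stated assumptions rather than the disclosed method: the two probe arms are taken to be exact reverse complements of the target sites flanking the unknown region, the 3' arm is extended across the gap, and the circular product is written as a linear string starting at the 5' arm. All sequences and function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]

def gap_fill(target, arm5, backbone, arm3):
    """Model gap-fill ligation of a padlock-style probe on `target`.

    arm5 binds the site upstream of the gap and arm3 binds the site
    downstream of it (target read 5' to 3'). Extension from the 3' arm
    copies the gap as its reverse complement. Returns the circularized
    probe as a linear string, or None if either arm finds no exact site.
    """
    site_up = reverse_complement(arm5)
    site_down = reverse_complement(arm3)
    up = target.find(site_up)
    if up < 0:
        return None
    gap_start = up + len(site_up)
    down = target.find(site_down, gap_start)
    if down < 0:
        return None
    captured = reverse_complement(target[gap_start:down])
    return arm5 + backbone + arm3 + captured

# Hypothetical target: upstream site + unknown region + downstream site.
target = "GG" + "ACCTGAAC" + "TTGCA" + "GGCATTCG" + "AA"
probe = gap_fill(target,
                 arm5=reverse_complement("ACCTGAAC"),
                 backbone="ACGTTGCA",          # stands in for the code
                 arm3=reverse_complement("GGCATTCG"))
print(probe)
```

Because the captured region sits next to the code in the circular product, a single read across both can associate the unknown sequence with the probe that captured it.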
The assays provide a readout that can be measured alongside the readout of various molecular assays that may be performed in parallel, thereby enabling a multiomic platform for the analysis of different target analytes in a sample.
Examples of target analytes include, but are not limited to, proteins, nucleic acids (e.g., DNA and RNA), metabolites, glycosylation, exosomes, viruses, bacteria, and cells (e.g., circulating tumor cells). DNA targets include single nucleotide variants (SNVs), insertion/deletions (indels), and methylated nucleotides. An RNA target may be a splice variant.
In one embodiment, an encoded assay may be performed for the analysis of a set of nucleic acid targets in a sample.
In one embodiment, the analyte is DNA. In an encoded assay, a set of DNA targets may be targeted for detection of a single nucleotide difference relative to a reference nucleotide. A single nucleotide difference may be a change in the methylation status of a nucleotide at a target site of interest. In another example, a single nucleotide difference may be a change in nucleotide usage at a target site of interest, i.e., a single nucleotide polymorphism (SNP).
In one embodiment, the analyte is RNA. In an encoded assay, an RNA sample may, for example, be processed in a reverse transcription reaction to generate cDNA molecules for detection of a set of targets of interest. An encoded RNA assay may, for example, be used to detect and count RNA targets of interest in a sample. In another example, an encoded RNA assay may be used to detect alternative splicing variants for a target of interest.
At a step 2010, a sample is collected. For example, a blood or saliva sample may be collected. In one example, a whole blood sample may be collected and processed to separate the plasma fraction from the cellular components of whole blood.
At a step 2015, analyte extraction, concentration, conversion, and/or purification processes are performed. In this example, the analyte is DNA. DNA (e.g., cell-free DNA) in the plasma sample may be extracted, purified, and concentrated for analysis. A proteinase K (ThermoFisher, Waltham, MA) digestion step may be used to digest proteins present in the plasma sample. In some cases, a heat denaturation step (e.g., 94-98° C. for 20-30 seconds) may be used to denature double-stranded DNA into single-stranded nucleic acid. A bead-based extraction and concentration protocol may be used to capture single-stranded DNA in the plasma sample. In some embodiments, the bead-based extraction protocol uses magnetically responsive nucleic acid capture beads. The bead-bound DNA may be released from the capture beads using an elution buffer (or other elution means suitable to the capture bead used) to produce a processed DNA sample for analysis.
In one embodiment, the DNA sample may be further processed in a bisulfite conversion reaction for analysis of the methylation status of a set of targets in the sample.
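The effect of bisulfite conversion on a read can be illustrated with a short sketch. It relies on the general property that bisulfite treatment converts unmethylated cytosine to uracil, which reads as thymine after amplification, while methylated cytosine is protected. The function name and the example sequence are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions):
    """Return `seq` as it would read after bisulfite conversion and
    amplification: unmethylated C becomes T, methylated C stays C.

    `methylated_positions` holds 0-based indices of methylated cytosines.
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )

# Hypothetical fragment with one methylated cytosine at index 1.
print(bisulfite_convert("ACGTCCGA", {1}))   # ACGTTTGA
print(bisulfite_convert("ACGTCCGA", set())) # ATGTTTGA
```

After conversion, methylation status at a target site appears as a C versus T difference, so it can be interrogated like any other single nucleotide difference.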
At a step 2020, the processed DNA sample is transferred into an analysis cartridge.
At a step 2025, a recognition event for each target in a set of targets is performed. For example, each target is uniquely recognized by and bound to a recognition element associated with a code (and optionally other elements). In one example, the recognition event for the set of targets uses a panel of coded padlock probes. In another example, the recognition event for the set of targets uses a panel of molecular inversion probes. The recognition event yields a set of coded targets comprising the target and the recognition element.
At a step 2030, a transformation event for each recognition element of the set of coded targets is performed. For example, in the transformation event, a ligation or a gap-fill ligation may produce the modified recognition element, i.e., a version of the recognition element that is ligated or gap-filled. In one example, transformation of a modified padlock probe in a ligation or gap-fill ligation reaction generates a circular molecule. In some cases, an exonuclease cleanup step may be used following the transformation event to digest any remaining single stranded nucleic acid, such as unreacted coded padlock probes, amplification primers, and single stranded target sequences. The transformation event yields a set of modified recognition elements comprising the code.
At a step 2035, an amplification event for each code of the set of modified recognition elements is performed. In one example, the amplification event may be a rolling circle amplification (RCA) reaction to generate a set of target-specific nanoballs. The amplification event yields a set of amplified codes (among other elements).
At a step 2040, a decoding event for each amplified code of the set of amplified codes is performed to identify the code. In one example, the code may be decoded by sequencing the code (and optionally other elements). The detection event detects the code as a surrogate for detection of the targeted analyte. Decoding by sequencing may in some cases make use of soft decoding.
At a step 2045, using the code information (and optionally other elements) from step 2040, bioinformatics is performed.
In some embodiments, the amplification event (step 2035) and the detection event (step 2040) may be combined in a single step.
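Soft decoding, mentioned in connection with the decoding event, is not defined in this passage. One common interpretation, assumed here, is soft-decision decoding, in which per-base confidence is used instead of treating every base call as equally reliable. The sketch below is illustrative only: it assumes Phred-scaled base qualities (error probability 10^(-Q/10)), a uniform substitution model, and a hypothetical two-entry codebook.

```python
import math

def log_likelihood(read, quals, codeword):
    """Log-likelihood of observing `read` if `codeword` was the true code.

    A matching base contributes log(1 - p_err); a mismatching base
    contributes log(p_err / 3), assuming errors are uniform over the
    three other bases. p_err is capped at 0.75 so Q=0 is well defined.
    """
    total = 0.0
    for base, q, expected in zip(read, quals, codeword):
        p_err = min(10 ** (-q / 10), 0.75)
        total += math.log(1 - p_err) if base == expected else math.log(p_err / 3)
    return total

def soft_decode(read, quals, codebook):
    """Return the codebook key whose codeword best explains the read."""
    return max(codebook, key=lambda name: log_likelihood(read, quals, codebook[name]))

# Hypothetical codebook; the read is one mismatch away from both codewords.
codebook = {"target_A": "ACGTAC", "target_B": "ACCTTC"}
read = "ACGTTC"
print(soft_decode(read, [30, 30, 35, 30, 5, 30], codebook))  # target_A
print(soft_decode(read, [30, 30, 5, 30, 35, 30], codebook))  # target_B
```

A hard decoder based on mismatch counts alone could not separate the two candidates in this example, whereas the quality-weighted score favors the codeword whose disagreement falls on the low-confidence base.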
In some embodiments of workflow 2000, a sequencing library comprising the codes (among other elements) may be generated. The library may be sequenced to identify codes associated with a target of interest. In one embodiment, a sequencing library may be generated from a circularized padlock probe (step 2030). The padlock probe library may be sequenced to identify the code associated with the target of interest.
A sequencing library comprising the codes (among other elements) may be generated from a set of target-specific nanoballs (step 2035). The nanoball library may be sequenced to identify codes associated with targets of interest.
In a step 2110, recognition and transformation events (steps 2025 and 2030) for each target in a set of targets of interest are performed to yield a set of modified recognition elements comprising the code. For example, a set of coded padlock probes 2112 that include target-specific recognition elements associated with a code may be used. The transformation event may include a ligation or a gap-fill ligation reaction to produce a circularized modified probe comprising the code. In the transformation event, only the coded padlock probes 2112 that hybridize to a target sequence of interest with no mismatches may be ligated to yield a circular modified probe comprising the code. In this example, a single modified probe 2114 is shown, but any number of modified probes 2114 may be generated to yield a set of modified probes 2114.
In a step 2115, an amplification event for each code of the set of modified recognition elements is performed. For example, modified probe 2114 may be amplified in a rolling circle amplification (RCA) to generate a nanoball product 2116. In this example, a single nanoball 2116 is shown, but any number of nanoballs may be generated corresponding to the number of circular modified probes present to yield a set of nanoballs comprising the codes.
In a step 2120, a sequencing library is generated from the nanoball product. For example, 25 cycles of amplification may be used to add sequencing adapters and sample index sequences (among other optional sequences) to the code sequence, generating a sequencing library 2122 that includes a set of codes. Sequencing library 2122 may then be loaded onto a sequencing flow cell (e.g., a MiSeq flow cell) for next generation sequencing (NGS).
In a step 2125, a detection event for each code of the set of codes is performed. For example, the library is sequenced using an NGS sequencing protocol to identify the codes (and other elements (e.g., sample index, UMIs)) associated with the set of targets of interest. The code data may then be used as a digital count of the target-specific detection events.
An example of using the NGS readout from a nanoball sequencing library for counting detection events is described below with reference to
A set of nanoballs (step 2035) may be directly sequenced to identify codes associated with the set of targets of interest. The code data may then be used as a digital count of the target-specific detection events.
In one embodiment, the nanoballs may be immobilized onto the surface of a sequencing flow cell for direct sequencing on the nanoballs. The nanoballs may be immobilized onto the flow cell surface using an immobilization agent. In one example, the immobilization agent is a surface bound oligonucleotide that is complementary to a sequence on the nanoball. In another example, the immobilization agent is a polypeptide.
To facilitate immobilization of a nanoball on a flow cell surface for direct sequencing, a recognition element associated with a code (i.e., an encoded probe) may include a palindrome sequence that is incorporated into the nanoball to create a secondary structure that compacts (collapses) the nanoball. The compacted nanoball provides a structure that may be more readily sequenced.
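By way of a non-limiting illustration, the sketch below shows one way to test whether a candidate sequence is a palindrome in the nucleic acid sense, i.e., whether it is identical to its own reverse complement, which is the property that would allow copies of the sequence within a nanoball to base pair with one another. The function names and the example sequences are assumptions made for illustration and are not taken from the disclosure.

```python
# Illustrative only: helper names and example sequences are assumptions.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_palindromic(seq: str) -> bool:
    """True when the sequence is identical to its own reverse complement."""
    return seq.upper() == reverse_complement(seq)


print(is_palindromic("GAATTC"))   # EcoRI recognition site, a known palindrome
print(is_palindromic("GATTACA"))  # not a palindrome
```

A check of this kind may be applied to a candidate insert before it is added to an encoded probe design.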
In a step 2210, recognition and transformation events (steps 2025 and 2030) for each target in a set of targets of interest are performed to yield a set of modified recognition elements comprising the code. For example, a set of coded padlock probes 2212 that include target-specific recognition elements associated with a code may be used. The transformation event may include a ligation or a gap-fill ligation reaction to produce a circularized modified probe comprising the code. In the transformation event, only the coded padlock probes 2212 that hybridize to a target sequence of interest with no mismatches may be ligated to yield a circular modified probe comprising the code. In this example, a single modified probe 2214 is shown, but any number of modified probes 2214 may be generated to yield a set of modified probes 2214.
In a step 2215, an amplification event for each code of the set of modified recognition elements is performed. For example, modified probe 2214 may be amplified in a rolling circle amplification (RCA) to generate a nanoball product 2216. In this example, a single nanoball 2216 is shown, but any number of nanoballs may be generated corresponding to the number of circular modified probes present to yield a set of nanoballs comprising the codes.
In a step 2220, the nanoball product is loaded onto the surface of a sequencing flow cell. For example, nanoball product 2216 is loaded onto a MiSeq flow cell. The nanoballs may be immobilized onto the flow cell surface using an immobilization agent. In one example, the immobilization agent is a surface bound oligonucleotide that is complementary to a sequence on the nanoball. In another example, the immobilization agent is a polypeptide.
In a step 2225, a detection event for each amplified code of the set of amplified codes is performed. For example, the nanoball is directly sequenced to identify codes associated with the set of targets of interest. The code data may then be used as a digital count of the target-specific detection events.
An example of using the readout from direct nanoball sequencing for counting detection events is described below with reference to
Assays of the invention may be used to interrogate the methylation status of a target sequence of interest. In one embodiment, methylated cytosines in a target sequence of interest may be detected using assays that include a conversion reaction to detect methylated cytosines. In another embodiment, methylated cytosines in a target sequence of interest may be detected using assays that do not use a conversion reaction (i.e., conversion-free).
In one embodiment of a conversion assay for detection of methylated cytosines, a bisulfite conversion reaction that converts non-methylated cytosines to uracil, which is read as thymine after amplification (C→T), may be used.
For example, a methylated cytosine assay using encoded probes may include: (i) a bisulfite conversion reaction to convert non-methylated cytosine to thymine (C→T); (ii) a recognition event, in which a target nucleic acid is uniquely recognized and bound by a recognition element associated with a code (i.e., an encoded probe); (iii) a transformation event, in which a molecular transformation of the recognition element produces a modified recognition element comprising the code; and (iv) a detection event that uses the code as a surrogate for detection of the target nucleic acid, e.g., by recognizing or decoding the code (and optionally other elements).
In some embodiments, a methylated target site of interest may be interrogated using an encoded probe in combination with a transformation event that includes a ligation reaction to detect the methylation status of the target site.
In one embodiment, the recognition element (i.e., an encoded probe) may be a coded padlock probe that includes a 3′-terminal guanine (“G”). The transformation event (i.e., ligation) to generate the modified recognition element may only occur when the 3′-guanine is matched to a cytosine at a target site of interest.
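By way of a non-limiting illustration, the logic of combining bisulfite conversion with a 3′-terminal guanine may be sketched in silico as follows. The sketch models conversion as a C→T change at every unmethylated cytosine and models ligation as an all-or-nothing requirement for a cytosine at the interrogated site. The function names, the toy template, and the all-or-nothing ligation model are assumptions made for illustration and do not describe actual enzyme kinetics.

```python
# Illustrative model only; names and the toy template are assumptions.
def bisulfite_convert(template: str, methylated_positions: set) -> str:
    """Read every unmethylated C as T; methylated C is protected and stays C."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(template)
    )


def probe_3prime_g_ligates(converted: str, site: int) -> bool:
    """A 3'-terminal G pairs only with C, so ligation is modeled as requiring C at the site."""
    return converted[site] == "C"


template = "ATCGTA"  # CpG cytosine at index 2

methylated = bisulfite_convert(template, {2})
unmethylated = bisulfite_convert(template, set())

print(methylated, probe_3prime_g_ligates(methylated, 2))
print(unmethylated, probe_3prime_g_ligates(unmethylated, 2))
```

In this model only the methylated template retains the cytosine, so only that template would yield a circularized probe carrying the code.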
In the recognition event, target sequence 2310 is recognized and bound by a recognition element associated with a code, i.e., padlock probe 2315. Padlock probe 2315 includes a 3′-terminal G nucleotide that base pairs with the target C at the CpG site of interest.
In the transformation event, ligation of padlock probe 2315 only occurs when the 3′-terminus of the padlock probe (i.e., a guanine “G”) is matched to the target site “C” of interest in target sequence 2310a to generate a circularized modified padlock probe 2320. No ligation occurs at the target site “T” in the bisulfite converted target sequence 2310b and consequently, transformation of padlock probe 2315 hybridized to target sequence 2310b to a circular modified probe does not occur. As described above with reference to
In one embodiment of process 2300, the recognition element (i.e., encoded probe) may be a molecular inversion probe that includes a 3′-terminal single base gap at a target site of interest. A gap-fill ligation event using only a single added nucleotide may be used to generate the modified recognition element comprising the code only when the nucleotide corresponding to the target site of interest is incorporated. This approach provides two forms of specificity to the assay: (i) the 3′-terminus of the probe must recognize and bind the interrogated site; and (ii) a single base extension reaction that incorporates the nucleotide corresponding to the target site of interest occurs.
In the recognition event, target sequences 2310a and 2310b are recognized and bound by a recognition element associated with a code, i.e., molecular inversion probe 2410. Molecular inversion probe 2410 includes a single 3′-terminal base gap that spans a target site of interest.
In the transformation event, a single dGTP nucleotide (“G”) is incorporated in molecular inversion probe 2410, thereby allowing ligation of the probe to generate a circularized modified probe 2420. No incorporation of dGTP occurs at the target site “T” in the bisulfite converted target sequence 2310b and consequently, transformation of molecular inversion probe 2410 hybridized to target sequence 2310b to a circular modified probe does not occur. As described above with reference to
In one embodiment of process 2400, the recognition element (i.e., a molecular inversion probe) may be designed to target two methylated cytosine sites of interest in a target sequence of interest. A gap-fill ligation event using all dNTPs may be used to generate the modified recognition element comprising the code. In this approach, both methylated cytosines must be present in the target nucleic acid molecule for ligation to occur. The requirement for multiple matches has several advantages: (i) it provides enhanced specificity relative to a single match at a methylated cytosine; (ii) the ability to discriminate between a disease state (e.g., all CpG sites in a region are methylated) and a healthy state (e.g., only some CpG sites are methylated) is increased by requiring multiple methylated cytosines for detection; and (iii) multiple matches can be used to correct for incomplete bisulfite conversion of unmethylated cytosines at the target site of interest.
In the recognition event, target sequence 2310 is recognized and bound by a recognition element associated with a code, i.e., molecular inversion probe 2515. Molecular inversion probe 2515 includes a 3′-probe arm that terminates at a first methylated cytosine site and a 5′-probe arm that terminates at a second methylated cytosine site. Both a 3′-GC match and a 5′-GC match during the recognition event (hybridization) are required for a transformation event to occur.
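By way of a non-limiting illustration, the requirement for two matches may be expressed as a logical AND over the two interrogated sites of a bisulfite-converted template. The function name, the site indices, and the toy sequences below are assumptions made for illustration; matching is treated as all-or-nothing.

```python
# Illustrative model only; names, indices, and toy sequences are assumptions.
def two_site_probe_circularizes(converted: str, site_3p: int, site_5p: int) -> bool:
    """Model circularization as requiring a retained C at both interrogated sites."""
    match_3p = converted[site_3p] == "C"  # 3'-GC match at the first site
    match_5p = converted[site_5p] == "C"  # 5'-GC match at the second site
    return match_3p and match_5p


# Converted templates with CpG cytosines at indices 1 and 6 before conversion.
fully_methylated = "ACGTTACGA"      # both cytosines protected
partially_methylated = "ACGTTATGA"  # second cytosine read as T
unmethylated = "ATGTTATGA"          # both cytosines read as T

for name, seq in [
    ("fully methylated", fully_methylated),
    ("partially methylated", partially_methylated),
    ("unmethylated", unmethylated),
]:
    print(name, two_site_probe_circularizes(seq, 1, 6))
```

Under this model a partially methylated template is rejected in the same way as an unmethylated template, which reflects the enhanced discrimination described above.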
In the transformation event, a gap-fill ligation reaction using all dNTPs is performed. The 3′-GC match is required for polymerase extension in the gap-fill reaction. The 5′-GC match is required for ligation of the gap-filled molecule. Gap-fill ligation generates a circularized modified probe 2515. No incorporation of dGTP occurs at the target site “T” in the bisulfite converted target sequence (not shown) and consequently, transformation to a circular modified probe does not occur in non-methylated target sequences. As described above with reference to
The assays of the invention may be used in a genotyping assay. A target site of interest may be interrogated using an encoded probe in combination with a ligation reaction to detect a single nucleotide variant (SNV) of interest. In one example, the single nucleotide change may be a single nucleotide polymorphism (SNP).
In one embodiment, a genotyping assay using encoded probes may include: (i) a recognition event, in which a target nucleic acid is uniquely recognized and bound by a recognition element associated with a code (i.e., an encoded probe); (ii) a transformation event, in which a molecular transformation of the recognition element produces a modified recognition element comprising the code; and (iii) a detection event that uses the code as a surrogate for detection of the target nucleic acid, e.g., by recognizing or decoding the code (and optionally other elements).
In one embodiment, the recognition element (i.e., an encoded probe) may be a coded padlock probe that includes a 3′-terminal nucleotide that is matched to a SNV of interest. The transformation event (i.e., ligation) to generate the modified recognition element may only occur when the 3′-nucleotide is matched to the SNV at the target site of interest.
In one embodiment, the recognition element (i.e., an encoded probe) may be a molecular inversion probe that includes a 3′-terminal single base gap at a target site of interest. A gap-fill ligation event using only a single added nucleotide may then be used to generate the modified recognition element comprising the code only when the corresponding nucleotide is incorporated.
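By way of a non-limiting illustration, the single-nucleotide gap-fill logic may be sketched as follows: the probe is modeled as circularizing only when the one supplied nucleotide is complementary to the template base at the target site. The function names are assumptions made for illustration, and base pairing is treated as all-or-nothing.

```python
# Illustrative model only; names are assumptions and pairing is all-or-nothing.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def gap_fill_circularizes(template_base: str, supplied_dntp: str) -> bool:
    """The single-base gap is filled (and the probe ligated) only when the
    supplied nucleotide is complementary to the template base at the site."""
    return COMPLEMENT[template_base] == supplied_dntp


def infer_template_base(circularizing_dntp: str) -> str:
    """Infer the template base from the nucleotide whose reaction circularized the probe."""
    return COMPLEMENT[circularizing_dntp]


print(gap_fill_circularizes("A", "T"))  # matched nucleotide
print(gap_fill_circularizes("A", "G"))  # mismatched nucleotide
print(infer_template_base("T"))
```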
The assays of the invention may be used in an RNA analysis assay.
In one embodiment, an RNA assay using encoded probes may include: (i) a reverse transcription reaction to convert RNA (e.g., polyA RNA) to cDNA; (ii) a recognition event, in which a target nucleic acid is uniquely recognized and bound by a recognition element associated with a code (i.e., an encoded probe); (iii) a transformation event, in which a molecular transformation of the recognition element produces a modified recognition element comprising the code; and (iv) a detection event that uses the code as a surrogate for detection of the target nucleic acid, e.g., by recognizing or decoding the code (and optionally other elements).
In some cases, the reverse transcription step (i) may be omitted and a ligase tolerant to DNA-RNA hybrid duplexes may be used in the transformation event. In one example, the ligase is SplintR® ligase (New England BioLabs).
In one embodiment, the encoded probe may be a padlock probe that includes a recognition element associated with a code.
In one embodiment, the encoded probe may be a molecular inversion probe that includes a recognition element associated with a code.
Assays of the invention may be used to detect and count RNA targets of interest in a sample.
Assays of the invention may be used to detect alternative splicing variants for a target of interest. In one example, splicing variants may be identified by placing one half of a recognition element (e.g., a coded padlock probe) on either side of the splice site. The transformation event (i.e., ligation) to generate the modified recognition element may only occur when the 3′-nucleotide is matched to the splice variant at the target site of interest.
In another example, splicing variants may be identified using a molecular inversion probe and an extension ligation reaction, wherein one probe arm spans the splice site.
Examples of tissues from which nucleic acid may be extracted using the techniques described herein may include solid tissue, lysed solid tissue, fixed tissue samples, whole blood, plasma, serum, dried blood spots, buccal swabs, other forensic samples, fresh or frozen tissue, biopsy tissue, organ tissue, cultured or harvested cells, and bodily fluids.
In various embodiments, a sample may include a biological sample, such as whole blood, lymphatic fluid, serum, plasma, sweat, tear, saliva, sputum, cerebrospinal fluid, amniotic fluid, seminal fluid, vaginal excretion, serous fluid, synovial fluid, pericardial fluid, peritoneal fluid, pleural fluid, transudates, exudates, cystic fluid, bile, urine, gastric fluid, intestinal fluid, fecal samples, liquids containing single or multiple cells, liquids containing organelles, fluidized tissues, fluidized organisms, liquids containing multi-celled organisms, biological swabs and biological washes.
Samples may be provided directly from biological sources, or may be processed samples, such as samples which are enriched for targets, nucleic acids, or proteins from any of the foregoing sources.
The assays provide a readout that can be measured alongside the readout of various molecular assays that may be performed in parallel, thereby enabling a multiomic platform for the analysis of different target analytes in a sample. Examples of target analytes include, but are not limited to, proteins, nucleic acids (e.g., DNA and RNA), metabolites, glycosylation, exosomes, viruses, bacteria, and cells (e.g., circulating tumor cells). DNA targets include single nucleotide variants (SNVs), insertion/deletions (indels), and methylated nucleotides. An RNA target may be a splice variant.
Targets may include any biological markers. Examples include biological markers for screening or diagnosing cancer. In one embodiment, targets include a panel of methylation markers for diagnosing cancer. Examples of panels of probes which may be targeted are set forth in WO2019195268, entitled “Methylation markers and targeted methylation probe panels,” and WO2020069350A1, entitled “Methylation markers and targeted methylation probe panel,” the entire disclosures of which (including without limitation the sequence listings) are incorporated herein by reference. Targets may be obtained from biopsies, circulating nucleic acid samples, or nucleic acids from other samples.
In one embodiment, targets include a panel of single nucleotide variants (SNV) for diagnosing cancer.
The methods of the invention may be used for screening or diagnosing a subject for a disease, such as cancer or for selecting a therapy for treating a disease, such as selecting a therapy for treating a cancer.
In one embodiment, the methods of the invention may be used in a liquid biopsy application. In one example, a liquid biopsy assay may include determination of the methylation status and/or the variant usage of a set of target sequences.
In one embodiment, the methods of the invention may be used in a pathogen detection application. In one example, pathogen detection may include detecting both a protein and a nucleic acid (e.g., an RNA) associated with the pathogen.
In one embodiment, the methods of the invention may be used to monitor and/or determine complications associated with a transplantation procedure.
A sequencing library may be generated from a set of target-specific nanoballs. The nanoball library may be sequenced to decode the code associated with the target of interest. The sequence analysis may include, for example, demultiplexing sample indexes, binning code sequences, and filtering the data based on UMIs. The code data may then be used as a digital count of the target-specific detection events.
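By way of a non-limiting illustration, the sequence analysis steps named above (demultiplexing, code binning, and UMI-based filtering) may be sketched as follows. The sketch assumes that each read has already been parsed into a (sample index, code, UMI) tuple and that binning uses exact matching against known sequences; the read layout, the function name, and the toy sequences are assumptions made for illustration and are not taken from the disclosure.

```python
from collections import defaultdict

# Illustrative only: read layout, names, and toy sequences are assumptions.
def count_detection_events(reads, known_samples, known_codes):
    """Return {(sample name, code): number of distinct UMIs}.

    reads: iterable of (sample_index, code, umi) tuples.
    known_samples: dict mapping sample index sequence -> sample name.
    known_codes: set of valid code sequences.
    """
    umis = defaultdict(set)
    for sample_index, code, umi in reads:
        # Demultiplex and bin: discard reads that cannot be assigned exactly.
        if sample_index not in known_samples or code not in known_codes:
            continue
        # UMI filtering: repeated UMIs for the same sample and code count once.
        umis[(known_samples[sample_index], code)].add(umi)
    return {key: len(seen) for key, seen in umis.items()}


reads = [
    ("AAC", "ACGT", "U1"),
    ("AAC", "ACGT", "U1"),  # duplicate of the read above
    ("AAC", "ACGT", "U2"),
    ("GGT", "TTGA", "U9"),
    ("TTT", "ACGT", "U3"),  # unknown sample index
    ("AAC", "CCCC", "U4"),  # unknown code
]
counts = count_detection_events(reads, {"AAC": "s1", "GGT": "s2"}, {"ACGT", "TTGA"})
print(counts)
```

A production pipeline would typically tolerate sequencing errors when matching indexes and codes; exact matching is used here only to keep the sketch short.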
To evaluate detection of target sequences using a nanoball library and NGS, a methylation assay was performed. Nanoball libraries were generated using a synthetic DNA sample comprising 8 methylated or unmethylated target sequences. The experimental conditions were as follows: (i) the input target concentrations were 1, 10, 100, or 1000 fM; (ii) the total target probe concentration was 2 nM; (iii) 8 target-specific probes were used, each at 250 pM; and (iv) for each input target concentration, the recognition event (i.e., target and probe hybridization; at 65° C. for 15 minutes), an exonuclease cleanup step, and the amplification event (i.e., an RCA reaction) were performed in a single tube by the sequential addition of reaction reagents.
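By way of a non-limiting illustration, the stated probe concentrations may be checked for consistency, and the molar excess of probe over target may be estimated, as follows. The sketch assumes that each stated input concentration applies to an individual target; that assumption, and the variable and function names, are not taken from the disclosure.

```python
# Concentrations are expressed in femtomolar (fM) so the arithmetic stays in integers.
FM_PER_PM = 1_000
FM_PER_NM = 1_000_000

n_probes = 8
per_probe_fm = 250 * FM_PER_PM            # each probe at 250 pM
total_probe_fm = n_probes * per_probe_fm  # 8 x 250 pM

# Consistency check: 8 probes at 250 pM each give the stated 2 nM total.
assert total_probe_fm == 2 * FM_PER_NM


def probe_excess(target_fm: int) -> float:
    """Molar ratio of one target-specific probe to its target.

    Assumption: the input concentration applies to each target individually.
    """
    return per_probe_fm / target_fm


for target_fm in (1, 10, 100, 1000):
    print(f"{target_fm} fM target: probe excess {probe_excess(target_fm):,.0f}x")
```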
To evaluate the specificity of the NGS assay (i.e., on-target vs. off-target performance), the same data set was used, together with additional samples that had either no targets present (i.e., a background sample) or an excess of non-methylated target (Me (−)).
Referring now to
A set of nanoballs may be directly sequenced to identify codes associated with the set of targets of interest. The nanoballs may be immobilized onto a flow cell surface using an immobilization agent and then sequenced. The code data may then be used as a digital count of the target-specific detection events.
Referring now to photo A, the individual features generated from nanoballs that include the standard recognition element and are immobilized on the flow cell surface via oligonucleotide hybridization appear spread out, i.e., as streaks. This streaking of the features may be due to the unrolling of the nanoballs on the surface of the flow cell.
Referring now to photo B, the features generated from nanoballs that include the palindrome sequence and are immobilized on the flow cell surface via oligonucleotide hybridization appear more punctate, but still display some streaking. The density of features achieved using this approach was about 23k nanoballs/mm².
Referring now to photo C, the features generated from nanoballs that include the palindrome sequence and are immobilized on the flow cell surface via the polypeptide are more punctate, i.e., more compacted. The density of features achieved using this approach was about 110k nanoballs/mm². The compacted nanoballs on the flow cell surface provide a nanoball structure that may be more readily sequenced.
Sequencing on nanoballs allows counting of detection events. To demonstrate that the input target concentration directly correlates with the number of counts, a titration experiment was performed. Briefly, nanoballs were generated using one target sequence at a range of input concentrations (i.e., 100 pM, 10 pM, 1 pM, or no target (0 pM)) and 8 probes for methylation sites. Following the recognition and transformation events (i.e., hybridization and ligation), an exonuclease cleanup reaction was performed prior to performing the amplification event (i.e., an RCA reaction) to generate nanoballs. The nanoballs were then loaded onto a MiSeq flow cell and sequenced.
A soft decoding process may use decoding by hybridization (DBH).
In one embodiment of the invention, a method is provided for conducting an assay for a set of target analytes that includes: (a) performing a recognition and amplification event on a set of target analytes potentially present in a sample to generate a set of rolling circle amplification products (RCPs) from the target analytes or representative of the target analytes present in the sample, wherein each of the RCPs comprises multiple copies of a nucleic acid code from a set of codes, wherein each code comprises at least one segment encoding one or more symbols that correspond to a sequence of one or more nucleotides; (b) recording signal produced in response to interrogation of each segment of the codes; and (c) upon completion of the interrogation, determining a probability of the presence of each of the codes by applying a soft-decision probabilistic decoding algorithm to the recorded signal, wherein detecting the presence of the code is indicative of the presence of the target analyte.
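By way of a non-limiting illustration, one simple form of soft-decision decoding is sketched below. Rather than calling a single symbol for each segment and then looking up the code, the sketch retains the recorded signal for every symbol, scores each code in the codebook by the product of its per-segment symbol probabilities, and normalizes the scores into a probability for each code. The signal model (intensities normalized within a segment), the uniform prior over codes, the independence of segments, and all names and values are assumptions made for illustration; the disclosure does not specify this particular algorithm.

```python
import math

# Illustrative only: signal model, uniform prior, names, and values are assumptions.
def soft_decode(signals, codebook, floor=1e-6):
    """Return {code name: probability} for one feature.

    signals: one dict per segment, mapping symbol -> non-negative intensity.
    codebook: dict mapping code name -> string with one symbol per segment.
    floor: lower bound on a symbol probability, to avoid log(0).
    """
    log_like = {}
    for name, symbols in codebook.items():
        total = 0.0
        for segment_signal, symbol in zip(signals, symbols):
            norm = sum(segment_signal.values())
            if norm > 0:
                p = segment_signal.get(symbol, 0.0) / norm
            else:
                p = 1.0 / len(segment_signal)  # no signal: all symbols equally likely
            total += math.log(max(p, floor))
        log_like[name] = total
    peak = max(log_like.values())
    weights = {name: math.exp(value - peak) for name, value in log_like.items()}
    z = sum(weights.values())
    return {name: weight / z for name, weight in weights.items()}


codebook = {"code_A": "ACG", "code_B": "ATG", "code_C": "TCA"}
signals = [
    {"A": 0.9, "T": 0.1},
    {"C": 0.55, "T": 0.45},  # ambiguous segment
    {"G": 0.8, "A": 0.2},
]
posterior = soft_decode(signals, codebook)
for name, prob in sorted(posterior.items(), key=lambda item: -item[1]):
    print(f"{name}: {prob:.3f}")
```

In this toy case the ambiguous second segment leaves appreciable probability on two codes, information that a hard call on each segment would discard.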
Various modifications and variations of the disclosed methods, compositions and uses of the invention will be apparent to the skilled person without departing from the scope and spirit of the invention. Although the invention has been disclosed in connection with specific preferred aspects or embodiments, it should be understood that the invention as claimed should not be unduly limited to such specific aspects or embodiments.
The present invention may be implemented using hardware, software, or a combination thereof and may be implemented in one or more computer systems or other processing systems. In one aspect, the invention is directed toward one or more computer systems capable of carrying out the functionality described herein.
For the purposes of this specification and appended claims, unless otherwise indicated, all numbers expressing amounts, sizes, dimensions, proportions, shapes, formulations, parameters, percentages, quantities, characteristics, and other numerical values used in the specification and claims, are to be understood as being modified in all instances by the term “about” even though the term “about” may not expressly appear with the value, amount or range. Accordingly, unless indicated to the contrary, the numerical parameters set forth in the following specification and attached claims are not and need not be exact, but may be approximate and/or larger or smaller as desired, reflecting tolerances, conversion factors, rounding off, measurement error and the like, and other factors known to those of skill in the art depending on the desired properties sought to be obtained by the presently disclosed subject matter. For example, the term “about,” when referring to a value can be meant to encompass variations of, in some embodiments ±100%, in some embodiments ±50%, in some embodiments ±20%, in some embodiments ±10%, in some embodiments ±5%, in some embodiments ±1%, in some embodiments ±0.5%, and in some embodiments ±0.1% from the specified amount, as such variations are appropriate to perform the disclosed methods or employ the disclosed compositions.
Further, the term “about” when used in connection with one or more numbers or numerical ranges, should be understood to refer to all such numbers, including all numbers in a range and modifies that range by extending the boundaries above and below the numerical values set forth. The recitation of numerical ranges by endpoints includes all numbers, e.g., whole integers, including fractions thereof, subsumed within that range (for example, the recitation of 1 to 5 includes 1, 2, 3, 4, and 5, as well as fractions thereof, e.g., 1.5, 2.25, 3.75, 4.1, and the like) and any range within that range.
Although the foregoing subject matter has been described in some detail by way of illustration and example for purposes of clarity of understanding, it will be understood by those skilled in the art that certain changes and modifications can be practiced within the scope of the appended claims.
This application is a continuation application of International Application No. PCT/US2022/037785, filed Jul. 21, 2022, which claims the benefit of U.S. Provisional Application No. 63/346,307, filed on May 26, 2022, U.S. Provisional Application No. 63/345,866, filed on May 25, 2022, U.S. Provisional Application No. 63/332,245, filed on Apr. 18, 2022, U.S. Provisional Application No. 63/329,781, filed on Apr. 11, 2022, and International Patent Application No. PCT/US2021/060647, filed on Nov. 23, 2021, each of which is herein incorporated by reference in its entirety.
Number | Date | Country
---|---|---
63329781 | Apr 2022 | US
63332245 | Apr 2022 | US
63345866 | May 2022 | US
63346307 | May 2022 | US

Relation | Number | Date | Country
---|---|---|---
Parent | PCT/US2022/037785 | Jul 2022 | WO
Child | 18391323 | | US
Parent | PCT/US2021/060647 | Nov 2021 | WO
Child | PCT/US2022/037785 | | US