This invention relates to the use of families of oligonucleotide tags, for example, in the sorting of molecules, in the identification of target nucleic acid molecules, or in analyzing the presence of a mutation or polymorphism at a locus of each target nucleic acid molecule.
With the completion of the nucleic acid sequencing of the human genome, the demand for fast, reliable, cost-effective and user-friendly tests for genomics research and related drug design efforts has greatly increased. A number of institutions are actively mining the available genetic sequence information to identify correlations between genes, gene expression and phenotypes (e.g., disease states, metabolic responses, and the like). These analyses include an attempt to characterize the effect of gene mutations and genetic and gene expression heterogeneity in individuals and populations. Often, it is desirable to look at many different loci and alleles in parallel, generally in a single reaction.
Working in a highly parallel hybridization environment that requires specific hybridization imposes very rigorous selection criteria on the design of the families of oligonucleotides to be used. The success of these approaches depends on the specific hybridization of a probe to its complement. Problems arise when members of the family of nucleic acid molecules cross-hybridize or hybridize incorrectly to the target sequences. Although incorrect hybridization, resulting in false positives, and failure to form hybrids, resulting in false negatives, are common, the frequency of such results must be minimized. To achieve this goal, certain thermodynamic properties of nucleic acid hybrid formation must be considered.
The design of families of oligonucleotide sequences for use in multiplexed hybridization reactions therefore includes consideration of the thermodynamic properties of the oligonucleotides and of duplex formation, so as to reduce or eliminate cross-hybridization within the designed oligonucleotide set.
In the INVADER Assay and other 5′ nuclease assays, one system of multiplexing involves the use of different 5′ arms, or “flaps,” for different alleles or loci. Using different flaps is one way of detecting many different sequences in a single “multiplex” reaction. It is therefore desirable to incorporate a large number of “flap” molecules into the INVADER Assay, with the flap sequences selected such that each flap is highly selective for its own complementary sequence.
The present invention relates to the use of minimally cross-hybridizing oligonucleotide sequences in the INVADER Assay. Incorporating these sequences into one of the two oligonucleotides that form an invasive cleavage structure with a target nucleic acid, followed by structure-dependent cleavage of the oligonucleotide comprising the minimally cross-hybridizing sequence, provides a way of using the INVADER Assay for massively parallel analysis of multiple genes, e.g., on a gene microarray. The present invention provides, for example, oligonucleotide probes for cleavage in INVADER assays, wherein the oligonucleotide probes comprise a 5′ portion comprising a minimally cross-hybridizing nucleic acid tag, such that at least a portion of the tag is released when the probe is cleaved.
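As a rough illustration of screening a candidate tag family for cross-hybridization, the Python sketch below flags pairs of tags that share a long contiguous stretch of identical sequence. A released tag can anneal to another tag's capture sequence (that tag's complement) only where the two tags share sequence, so the longest shared run is used here as a crude stand-in for cross-hybridization potential. The function names, the example sequences, and the six-base threshold are hypothetical; a real design would rest on thermodynamic calculations rather than on this proxy.

```python
from itertools import combinations


def longest_shared_run(a: str, b: str) -> int:
    """Length of the longest contiguous stretch found in both a and b."""
    best = 0
    prev = [0] * (len(b) + 1)  # run lengths ending at the previous base of a
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the diagonal run
                best = max(best, cur[j])
        prev = cur
    return best


def risky_pairs(tags: dict[str, str], max_run: int = 6) -> list[tuple[str, str, int]]:
    """Return (tag, tag, run) for every pair whose shared run exceeds max_run.

    The threshold is an illustrative assumption, not a validated design rule.
    """
    flagged = []
    for (name_a, seq_a), (name_b, seq_b) in combinations(tags.items(), 2):
        run = longest_shared_run(seq_a.upper(), seq_b.upper())
        if run > max_run:
            flagged.append((name_a, name_b, run))
    return flagged


# Hypothetical candidate tags.
candidate_tags = {
    "t1": "ACGTACGTTGCA",
    "t2": "TTTTACGTACGA",
    "t3": "GGCCGGCCAATT",
}
print(risky_pairs(candidate_tags))  # -> [('t1', 't2', 7)]
```

Here t1 and t2 share the seven-base stretch ACGTACG, so that pair would be rejected or redesigned, while t3 shares no more than two consecutive bases with either.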
In some embodiments, the present invention provides a composition comprising a cleavage structure, said cleavage structure comprising:
In preferred embodiments, the tag identifiers 1-210 are selected from SEQ ID NOS: 1173-1382.
In some embodiments, the composition further comprises a 5′ nuclease. In preferred embodiments, the 5′ nuclease is a FEN-1 nuclease. In particularly preferred embodiments, the FEN-1 nuclease is a thermostable FEN-1 nuclease.
In some embodiments, the present invention provides a method for detecting the presence of a target nucleic acid molecule in a sample, comprising:
wherein said thermostable 5′ nuclease lacks synthesis activity, and wherein at least a portion of said first nucleic acid molecule is annealed to said first region of said target nucleic acid, and wherein at least a portion of said second nucleic acid molecule is annealed to said second region of said target nucleic acid;
In some embodiments, said non-target cleavage product comprises the 5′ portion of said first nucleic acid molecule, and detecting the cleavage of the cleavage structure comprises detecting annealing of the non-target cleavage product to a third nucleic acid molecule, wherein the third nucleic acid molecule comprises a nucleic acid sequence complementary to the tag identifier selected in step (a)(iv).
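The detection step can be illustrated with a simplified Python sketch in which hybridization is reduced to exact sequence matching. Each capture oligonucleotide (the third nucleic acid molecule) is assumed to be the exact complement of one tag identifier, and a released fragment is assigned to every capture whose reverse complement forms the fragment's 5′ end. The names `decode_released_fragment` and `captures`, and the sequences used, are illustrative assumptions; actual detection relies on annealing under defined conditions and a readout such as fluorescence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def decode_released_fragment(fragment: str, captures: dict[str, str]) -> list[str]:
    """Return the names of capture oligos that fully pair with the fragment's 5' tag.

    fragment: non-target cleavage product, 5'->3', beginning with the tag.
    captures: capture name -> capture sequence (5'->3'); each capture is
              assumed to be the exact reverse complement of one tag.
    """
    fragment = fragment.upper()
    hits = []
    for name, capture in captures.items():
        # Antiparallel pairing: the tag equals the capture's reverse complement.
        if fragment.startswith(reverse_complement(capture)):
            hits.append(name)
    return hits


# Hypothetical tags and their capture oligonucleotides.
tags = {"tag_A": "ACCTGAGTCA", "tag_B": "GTTCCAGTAC"}
captures = {name: reverse_complement(seq) for name, seq in tags.items()}

# A released fragment consisting of tag_A plus one extra 3' nucleotide.
print(decode_released_fragment("ACCTGAGTCAG", captures))  # -> ['tag_A']
```

The sketch tests for a prefix rather than equality because the non-target cleavage product comprises the tag but may carry additional nucleotides at its 3′ end.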
In some preferred embodiments, the tag identifiers 1-210 are selected from SEQ ID NOS: 1173-1382.
In some embodiments, the target nucleic acid comprises an amplified nucleic acid. In some preferred embodiments, the amplified nucleic acid is produced using a polymerase chain reaction.
In some embodiments, the detecting of the cleavage of said cleavage structure comprises detection of fluorescence. In preferred embodiments, the detecting of the cleavage of said cleavage structure comprises detection of fluorescence energy transfer. In some embodiments, the detecting of the cleavage of said cleavage structure comprises detection of radioactivity, luminescence, phosphorescence, fluorescence polarization, and/or charge.
In some embodiments, the target nucleic acid comprises DNA and in some embodiments the target nucleic acid comprises RNA.
In some embodiments, the 3′ portion of the second nucleic acid molecule comprises a 3′ terminal nucleotide not complementary to said target nucleic acid. In other embodiments, the 3′ portion of the second nucleic acid molecule comprises a 3′ terminal nucleotide complementary to said target nucleic acid.
In some embodiments, the 3′ portion of the second nucleic acid molecule consists of a single nucleotide. In some embodiments, the single nucleotide is not complementary to said target nucleic acid, while in other embodiments, the single nucleotide is complementary to said target nucleic acid.
In some embodiments, the 3′ terminal nucleotide of the second nucleic acid molecule comprises a naturally occurring nucleotide, while in other embodiments, the 3′ terminal nucleotide comprises a nucleotide analog.
In some embodiments, the 3′ portion of the second nucleic acid molecule is completely complementary to the target nucleic acid.
The present invention provides a composition comprising a cleavage structure, said cleavage structure comprising:
i) a target nucleic acid having a first region and a second region, wherein said second region is located adjacent to and downstream of said first region;
ii) a first nucleic acid molecule comprising a 3′ portion and a 5′ portion, wherein at least a portion of said 3′ portion of said first nucleic acid molecule is completely complementary to said first region of said target nucleic acid, and wherein said 5′ portion contains a tag identifier that is not base-paired to said target nucleic acid and that is selected from the group consisting of tag identifiers 211-1378, wherein
In preferred embodiments, the tag identifiers 211-1378 are selected from SEQ ID NOS: 1-1172.
In some embodiments, the composition further comprises a 5′ nuclease. In preferred embodiments, the 5′ nuclease is a FEN-1 nuclease. In particularly preferred embodiments, the FEN-1 nuclease is a thermostable FEN-1 nuclease.
The present invention provides a method for detecting the presence of a target nucleic acid molecule in a sample, comprising:
a) incubating a sample with a thermostable 5′ nuclease under conditions wherein a cleavage structure is formed, said cleavage structure comprising:
wherein said thermostable 5′ nuclease lacks synthesis activity, and wherein at least a portion of said first nucleic acid molecule is annealed to said first region of said target nucleic acid, and wherein at least a portion of said second nucleic acid molecule is annealed to said second region of said target nucleic acid;
b) cleaving said cleavage structure with said thermostable 5′ nuclease so as to generate non-target cleavage product; and
c) detecting the cleavage of said cleavage structure.
In some embodiments, the non-target cleavage product comprises the 5′ portion of the first nucleic acid molecule, and detecting the cleavage of the cleavage structure comprises detecting annealing of the non-target cleavage product to a third nucleic acid molecule, wherein the third nucleic acid molecule comprises a nucleic acid sequence complementary to the tag identifier selected in step (a)(iv). In preferred embodiments, the tag identifiers 211-1378 are selected from the group consisting of SEQ ID NOS: 1-1172.
In some embodiments, the target nucleic acid comprises an amplified nucleic acid. In some preferred embodiments, the amplified nucleic acid is produced using a polymerase chain reaction.
In some embodiments, the detecting of the cleavage of said cleavage structure comprises detection of fluorescence. In preferred embodiments, the detecting of the cleavage of said cleavage structure comprises detection of fluorescence energy transfer. In some embodiments, the detecting of the cleavage of said cleavage structure comprises detection of radioactivity, luminescence, phosphorescence, fluorescence polarization, and/or charge.
In some embodiments, the target nucleic acid comprises DNA and in some embodiments the target nucleic acid comprises RNA.
In some embodiments, the 3′ portion of the second nucleic acid molecule comprises a 3′ terminal nucleotide not complementary to said target nucleic acid. In other embodiments, the 3′ portion of the second nucleic acid molecule comprises a 3′ terminal nucleotide complementary to said target nucleic acid.
In some embodiments, the 3′ portion of the second nucleic acid molecule consists of a single nucleotide. In some embodiments, the single nucleotide is not complementary to said target nucleic acid, while in other embodiments, the single nucleotide is complementary to said target nucleic acid.
In some embodiments, the 3′ terminal nucleotide of the second nucleic acid molecule comprises a naturally occurring nucleotide, while in other embodiments, the 3′ terminal nucleotide comprises a nucleotide analog.
In some embodiments, the 3′ portion of the second nucleic acid molecule is completely complementary to the target nucleic acid.
Embodiments of the invention are described in this summary, and in the Detailed Description of the Invention, below, which is incorporated here by reference. Although the invention has been described in connection with specific preferred embodiments, it should be understood that the invention as claimed should not be unduly limited to such specific embodiments.
To facilitate an understanding of the present invention, a number of terms and phrases are defined below:
As used herein, the terms “subject” and “patient” refer to any organism, including plants, microorganisms and animals (e.g., mammals such as dogs, cats, livestock, and humans).
As used herein, the term “INVADER assay reagents” refers to one or more reagents for detecting target sequences, said reagents comprising oligonucleotides capable of forming an invasive cleavage structure in the presence of the target sequence. In some embodiments, the INVADER assay reagents further comprise an agent for detecting the presence of an invasive cleavage structure (e.g., a cleavage agent). In some embodiments, the oligonucleotides comprise first and second oligonucleotides, said first oligonucleotide comprising a portion complementary to a first region of the target nucleic acid and said second oligonucleotide comprising a 3′ portion and a 5′ portion, said 5′ portion complementary to a second region of the target nucleic acid downstream of and contiguous to the first region. In some embodiments, the 3′ portion of the second oligonucleotide comprises a 3′ terminal nucleotide not complementary to the target nucleic acid. In preferred embodiments, the 3′ portion of the second oligonucleotide consists of a single nucleotide not complementary to the target nucleic acid. In some embodiments, the 3′ portion of the second oligonucleotide comprises a moiety that is not a nucleotide. In preferred embodiments, the 3′ portion of the second oligonucleotide comprises an aromatic ring moiety that is not a nucleotide. In some embodiments, the first oligonucleotide further comprises a 5′ portion comprising a tag sequence. In preferred embodiments, the tag sequence is a non-cross-hybridizing tag as described herein.
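The geometry described above can be sketched in Python by treating annealing as exact sequence matching. Writing the complement of the target 5′→3′, the target-complementary 5′ portion of the second oligonucleotide should be followed immediately, with no gap, by the target-complementary portion of the first oligonucleotide, whose tag hangs off as an unpaired 5′ flap. The function name, the one-nucleotide 3′ overhang, and the assumption that cleavage releases the flap plus the first base-paired nucleotide of the first oligonucleotide are illustrative choices, not limitations of the reagents described here.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def invasive_cleavage_product(
    target: str, probe: str, flap_len: int, invader: str, overhang_len: int = 1
) -> Optional[str]:
    """Return the 5' fragment assumed to be released from the probe, or None.

    target:  target strand, 5'->3'.
    probe:   first oligonucleotide, 5'->3': tag (flap) then target-complementary portion.
    invader: second oligonucleotide, 5'->3': target-complementary portion, then a
             3' portion of `overhang_len` nucleotides (identity ignored here).
    """
    template = reverse_complement(target)  # complement of the target, read 5'->3'
    probe_arm = probe.upper()[flap_len:]
    invader_arm = invader.upper()[: len(invader) - overhang_len]
    # The second region lies downstream (3') of the first on the target, so on the
    # complementary strand the invader's arm precedes the probe's arm with no gap.
    if not probe_arm or invader_arm + probe_arm not in template:
        return None
    # Assumed cleavage site: just after the first base-paired probe nucleotide.
    return probe.upper()[: flap_len + 1]


template = "AAGATCCATGCGTACC"  # hypothetical complement of the target
target = reverse_complement(template)
print(invasive_cleavage_product(target, "CTGACTGCGTA", 5, "GATCCAT"))  # -> CTGACT
```

If a single unpaired base separates the two binding sites, the sketch returns None, mirroring the requirement that the two regions be contiguous.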
In some embodiments, INVADER assay reagents are configured to detect a target nucleic acid sequence comprising first and second non-contiguous single-stranded regions separated by an intervening region comprising a double-stranded region. In preferred embodiments, the INVADER assay reagents comprise a bridging oligonucleotide capable of binding to said first and second non-contiguous single-stranded regions of a target nucleic acid sequence. In particularly preferred embodiments, either or both of said first and said second oligonucleotides of said INVADER assay reagents are bridging oligonucleotides. See, e.g., U.S. Pat. No. 6,709,815, which is incorporated herein by reference.
In some embodiments, the INVADER assay reagents further comprise a solid support. For example, in some embodiments, one or more oligonucleotides of the assay reagents (e.g., the first and/or second oligonucleotide, whether bridging or non-bridging) are attached to said solid support. In some embodiments, the INVADER assay reagents further comprise a buffer solution. In some preferred embodiments, the buffer solution comprises a source of divalent cations (e.g., Mn2+ and/or Mg2+ ions). Individual ingredients (e.g., oligonucleotides, enzymes, buffers, target nucleic acids) that collectively make up INVADER assay reagents are termed “INVADER assay reagent components.”
In some embodiments, the INVADER assay reagents further comprise a third oligonucleotide complementary to a third region of the target nucleic acid upstream of the first region of the first target nucleic acid. In yet other embodiments, the INVADER assay reagents further comprise a target nucleic acid. In some embodiments, the INVADER assay reagents further comprise a second target nucleic acid. In yet other embodiments, the INVADER assay reagents further comprise a third oligonucleotide comprising a 5′ portion complementary to a first region of the second target nucleic acid. In some specific embodiments, the 3′ portion of the third oligonucleotide is covalently linked to the second target nucleic acid. In other specific embodiments, the second target nucleic acid further comprises a 5′ portion, wherein the 5′ portion of the second target nucleic acid is the third oligonucleotide. In some embodiments, the third oligonucleotide further comprises a 5′ terminal portion comprising a tag sequence. In preferred embodiments, the tag sequence is a non-cross-hybridizing tag as described herein. In still other embodiments, the INVADER assay reagents further comprise an arrestor molecule (e.g., arrestor oligonucleotide).
In some preferred embodiments, the INVADER assay reagents further comprise reagents for detecting a nucleic acid cleavage product. In some embodiments, one or more oligonucleotides in the INVADER assay reagents comprise a label. In some preferred embodiments, said first oligonucleotide comprises a label. In other preferred embodiments, said third oligonucleotide comprises a label. In particularly preferred embodiments, the reagents comprise a first and/or a third oligonucleotide labeled with moieties that produce a fluorescence resonance energy transfer (FRET) effect.
In some embodiments, one or more of the INVADER assay reagents may be provided in a predispensed format (i.e., premeasured for use in a step of the procedure without re-measurement or re-dispensing). In some embodiments, selected INVADER assay reagent components are mixed and predispensed together. In preferred embodiments, predispensed assay reagent components are provided in a reaction vessel (including but not limited to a reaction tube or a well, as in, e.g., a microtiter plate, or a microfluidic card or chip). In certain preferred embodiments, the INVADER assay reagents are provided in microfluidic devices such as those described in U.S. Pat. Nos. 6,627,159; 6,720,187; 6,734,401; and 6,814,935, as well as U.S. Pat. Pub. 2002/0064885, all of which are herein incorporated by reference. In particularly preferred embodiments, predispensed INVADER assay reagent components are dried down (e.g., desiccated or lyophilized) in a reaction vessel.
In some embodiments, the INVADER assay reagents are provided as a kit. As used herein, the term “kit” refers to any delivery system for delivering materials. In the context of reaction assays, such delivery systems include systems that allow for the storage, transport, or delivery of reaction reagents (e.g., oligonucleotides, enzymes, etc. in the appropriate containers) and/or supporting materials (e.g., buffers, written instructions for performing the assay, etc.) from one location to another. For example, kits include one or more enclosures (e.g., boxes) containing the relevant reaction reagents and/or supporting materials. As used herein, the term “fragmented kit” refers to a delivery system comprising two or more separate containers, each of which contains a subportion of the total kit components. The containers may be delivered to the intended recipient together or separately. For example, a first container may contain an enzyme for use in an assay, while a second container contains oligonucleotides. The term “fragmented kit” is intended to encompass, but is not limited to, kits containing analyte specific reagents (ASRs) regulated under section 520(e) of the Federal Food, Drug, and Cosmetic Act. Indeed, any delivery system comprising two or more separate containers, each of which contains a subportion of the total kit components, is included in the term “fragmented kit.” In contrast, a “combined kit” refers to a delivery system containing all of the components of a reaction assay in a single container (e.g., in a single box housing each of the desired components). The term “kit” includes both fragmented and combined kits.
In some embodiments, the present invention provides INVADER assay reagent kits comprising one or more of the components necessary for practicing the present invention. For example, the present invention provides kits for storing or delivering the enzymes and/or the reaction components necessary to practice an INVADER assay. The kit may include any and all components necessary or desired for assays including, but not limited to, the reagents themselves, buffers, control reagents (e.g., tissue samples, positive and negative control target oligonucleotides, etc.), solid supports, labels, written and/or pictorial instructions and product information, software (e.g., for collecting and analyzing data), inhibitors, labeling and/or detection reagents, package environmental controls (e.g., ice, desiccants, etc.), and the like. In some embodiments, the kits provide a sub-set of the required components, wherein it is expected that the user will supply the remaining components. In some embodiments, the kits comprise two or more separate containers wherein each container houses a subset of the components to be delivered. For example, a first container (e.g., box) may contain an enzyme (e.g., structure specific cleavage enzyme in a suitable storage buffer and container), while a second box may contain oligonucleotides (e.g., INVADER oligonucleotides, probe oligonucleotides, control target oligonucleotides, etc.).
The term “label” as used herein refers to any atom or molecule that can be used to provide a detectable (preferably quantifiable) effect, and that can be attached to a nucleic acid or protein. Labels include, but are not limited to, dyes; radiolabels such as ³²P; binding moieties such as biotin; haptens such as digoxigenin; luminogenic, phosphorescent or fluorogenic moieties; mass tags; and fluorescent dyes alone or in combination with moieties that can suppress (“quench”) or shift emission spectra by fluorescence resonance energy transfer (FRET). FRET is a distance-dependent interaction between the electronic excited states of two molecules (e.g., two dye molecules, or a dye molecule and a non-fluorescing quencher molecule) in which excitation is transferred from a donor molecule to an acceptor molecule without emission of a photon (Stryer et al., 1978, Ann. Rev. Biochem., 47:819; Selvin, 1995, Methods Enzymol., 246:300, each incorporated herein by reference). As used herein, the term “donor” refers to a fluorophore that absorbs at a first wavelength and emits at a second, longer wavelength. The term “acceptor” refers to a moiety such as a fluorophore, chromophore, or quencher that has an absorption spectrum that overlaps the donor's emission spectrum, and that is able to absorb some or most of the emitted energy from the donor when it is near the donor group (typically within about 1-10 nm). If the acceptor is a fluorophore, it generally then re-emits at a third, still longer wavelength; if it is a chromophore or quencher, it then releases the energy absorbed from the donor without emitting a photon. In some embodiments, changes in detectable emission from a donor dye (e.g., when an acceptor moiety is near or distant) are detected. In some embodiments, changes in detectable emission from an acceptor dye are detected.
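The distance dependence noted above follows the standard Förster relationship, in which the transfer efficiency E falls with the sixth power of the donor-acceptor distance r relative to the Förster radius R0 (the distance at which E is 50%, a property of the particular dye pair). The short Python sketch below evaluates E = 1 / (1 + (r/R0)^6); the 5 nm value used for R0 is an assumed, illustrative figure, not a property of any label named in this document.

```python
def fret_efficiency(distance_nm: float, forster_radius_nm: float) -> float:
    """Förster transfer efficiency E = 1 / (1 + (r / R0)**6)."""
    if distance_nm < 0 or forster_radius_nm <= 0:
        raise ValueError("distance must be >= 0 and R0 must be > 0")
    return 1.0 / (1.0 + (distance_nm / forster_radius_nm) ** 6)


R0_NM = 5.0  # assumed, illustrative Förster radius
for r in (2.5, 5.0, 10.0):
    print(f"r = {r:4.1f} nm  E = {fret_efficiency(r, R0_NM):.3f}")
```

With R0 = 5 nm, halving the distance to 2.5 nm raises the efficiency to about 98%, while doubling it to 10 nm drops the efficiency below 2%. This steep falloff is why cleavage that separates a donor from its quencher produces a large change in signal.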
In preferred embodiments, the emission spectrum of the acceptor dye is distinct from the emission spectrum of the donor dye such that emissions from the dyes can be differentiated (e.g., spectrally resolved) from each other.
In some embodiments, a donor dye is used in combination with multiple acceptor moieties. In a preferred embodiment, a donor dye is used in combination with a non-fluorescing quencher and with an acceptor dye, such that when the donor dye is close to the quencher, its excitation is transferred to the quencher rather than the acceptor dye, and when the quencher is removed (e.g., by cleavage of a probe), donor dye excitation is transferred to an acceptor dye. In particularly preferred embodiments, emission from the acceptor dye is detected. See, e.g., Tyagi, et al., Nature Biotechnology 18:1191 (2000), which is incorporated herein by reference. Labels may provide signals detectable by fluorescence (e.g., simple fluorescence, FRET, time-resolved fluorescence, fluorescence polarization, etc.), radioactivity, colorimetry, gravimetry, X-ray diffraction or absorption, magnetism, enzymatic activity, characteristics of mass or behavior affected by mass (e.g., MALDI time-of-flight mass spectrometry), and the like. A label may be a charged moiety (positive or negative charge) or alternatively, may be charge neutral. Labels can include or consist of nucleic acid or protein sequence, so long as the sequence comprising the label is detectable.
In some embodiments, a label comprises a particle for detection. In preferred embodiments, the particle is a phosphor particle. In particularly preferred embodiments, the phosphor particle is an up-converting phosphor particle (see, e.g., Ostermayer, F. W. Preparation and properties of infrared-to-visible conversion phosphors. Metall. Trans. 752, 747-755 [1971]). In some embodiments, rare earth-doped ceramic particles are used as phosphor particles. Phosphor particles may be detected by any suitable method, including but not limited to up-converting phosphor technology (UPT), in which up-converting phosphors convert low-energy infrared (IR) radiation into higher-energy visible light. While the present invention is not limited to any particular mechanism, in some embodiments UPT converts infrared light to visible light by multi-photon absorption and subsequent emission of dopant-dependent phosphorescence. See, e.g., U.S. Pat. No. 6,399,397, Issued Jun. 4, 2002 to Zarling, et al.; van De Rijke, et al., Nature Biotechnol. 19(3):273-6 [2001]; Corstjens, et al., IEE Proc. Nanobiotechnol. 152(2):64 [2005], each incorporated by reference herein in its entirety.
As used herein, the term “distinct” in reference to signals refers to signals that can be differentiated one from another, e.g., by spectral properties such as fluorescence emission wavelength, color, absorbance, mass, size, fluorescence polarization properties, charge, etc., or by capability of interaction with another moiety, such as with a chemical reagent, an enzyme, an antibody, etc.
As used herein, the terms “complementary” or “complementarity” are used in reference to polynucleotides (i.e., a sequence of nucleotides such as an oligonucleotide or a target nucleic acid) related by the base-pairing rules. For example, the sequence “5′-A-G-T-3′” is complementary to the sequence “3′-T-C-A-5′.” Complementarity may be “partial,” in which only some of the nucleic acids' bases are matched according to the base-pairing rules, or there may be “complete” or “total” complementarity between the nucleic acids. The degree of complementarity between nucleic acid strands has significant effects on the efficiency and strength of hybridization between nucleic acid strands. This is of particular importance in amplification reactions, as well as in detection methods that depend upon binding between nucleic acids. Either term may also be used in reference to individual nucleotides, especially within the context of polynucleotides. For example, a particular nucleotide within an oligonucleotide may be noted for its complementarity, or lack thereof, to a nucleotide within another nucleic acid strand, in contrast or comparison to the complementarity between the rest of the oligonucleotide and the nucleic acid strand.
The terms “homology” and “homologous” refer to a degree of identity. There may be partial homology or complete homology. A partially homologous sequence is one that is less than 100% identical to another sequence. In the context of this invention, pairs of sequences are compared with each other based on the amount of “homology” between the sequences. By way of example, two sequences are said to have a 50% “maximum homology” with each other if, when the two sequences are aligned side-by-side so as to obtain the (absolute) maximum number of identically paired bases, the number of identically paired bases is 50% of the total number of bases in one of the sequences. (If the sequences being compared are of different lengths, the percentage is calculated relative to the total number of bases in the shorter of the two sequences.)
As used herein, the term “hybridization” is used in reference to the pairing of complementary nucleic acids. Hybridization and the strength of hybridization (i.e., the strength of the association between the nucleic acids) are influenced by such factors as the degree of complementarity between the nucleic acids, the stringency of the conditions involved, and the Tm of the formed hybrid. “Hybridization” methods involve the annealing of one nucleic acid to another, complementary nucleic acid, i.e., a nucleic acid having a complementary nucleotide sequence. The ability of two polymers of nucleic acid containing complementary sequences to find each other and anneal through base-pairing interactions is a well-recognized phenomenon. The initial observations of the “hybridization” process by Marmur and Lane, Proc. Natl. Acad. Sci. USA 46:453 (1960) and Doty et al., Proc. Natl. Acad. Sci. USA 46:461 (1960) have been followed by the refinement of this process into an essential tool of modern biology.
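The base-pairing rules in the example above can be expressed as a short Python sketch. `complement` pairs each base in place, so its output is read 3′→5′ (as in “3′-T-C-A-5′”), while `reverse_complement` gives the same partner strand rewritten in the conventional 5′→3′ orientation. `fraction_complementary` is a hypothetical helper that scores partial complementarity as the fraction of aligned positions obeying the A-T and G-C rules; it handles only A, C, G and T and ignores modified bases such as inosine.

```python
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq: str) -> str:
    """Base-by-base complement; the result is read 3'->5'."""
    return "".join(PAIRS[base] for base in seq.upper())


def reverse_complement(seq: str) -> str:
    """Complementary strand rewritten 5'->3'."""
    return complement(seq)[::-1]


def fraction_complementary(top_5to3: str, bottom_3to5: str) -> float:
    """Fraction of aligned positions that pair (top 5'->3' over bottom 3'->5')."""
    if len(top_5to3) != len(bottom_3to5) or not top_5to3:
        raise ValueError("strands must be non-empty and of equal length")
    paired = sum(
        PAIRS[a] == b for a, b in zip(top_5to3.upper(), bottom_3to5.upper())
    )
    return paired / len(top_5to3)


print(complement("AGT"))                     # -> TCA  (3'-T-C-A-5')
print(reverse_complement("AGT"))             # -> ACT  (5'-A-C-T-3')
print(fraction_complementary("AGT", "TCG"))  # partial: 2 of 3 positions pair
```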
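The “maximum homology” measure defined above can be sketched in Python under the assumption that “aligned side-by-side” means an ungapped alignment in which one sequence is slid along the other: every offset is tried, identically paired bases are counted, and the best count is expressed as a percentage of the length of the shorter sequence. The function name and the ungapped interpretation are assumptions made for illustration.

```python
def max_homology(a: str, b: str) -> float:
    """Percent maximum homology between a and b using ungapped alignments.

    Every relative offset of b against a is tried; at each offset the bases
    that face each other are compared, and the best count of identical pairs
    is divided by the length of the shorter sequence.
    """
    a, b = a.upper(), b.upper()
    if not a or not b:
        raise ValueError("sequences must be non-empty")
    best = 0
    # offset = index in a that faces b[0]; negative offsets overhang a's start.
    for offset in range(-(len(b) - 1), len(a)):
        matches = sum(
            1
            for j, base in enumerate(b)
            if 0 <= offset + j < len(a) and a[offset + j] == base
        )
        best = max(best, matches)
    return 100.0 * best / min(len(a), len(b))


print(max_homology("AAAA", "AATT"))      # two identical pairs out of four
print(max_homology("ACGTACGT", "CGTA"))  # shorter sequence fully matched
```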
The complement of a nucleic acid sequence as used herein refers to an oligonucleotide which, when aligned with the nucleic acid sequence such that the 5′ end of one sequence is paired with the 3′ end of the other, is in “antiparallel association.” Certain bases not commonly found in natural nucleic acids may be included in the nucleic acids of the present invention and include, for example, inosine and 7-deazaguanine. Complementarity need not be perfect; stable duplexes may contain mismatched base pairs or unmatched bases. Those skilled in the art of nucleic acid technology can determine duplex stability empirically considering a number of variables including, for example, the length of the oligonucleotide, base composition and sequence of the oligonucleotide, ionic strength and incidence of mismatched base pairs.
As used herein, the term “Tm” is used in reference to the “melting temperature.” The melting temperature is the temperature at which a population of double-stranded nucleic acid molecules becomes half dissociated into single strands. Several equations for calculating the Tm of nucleic acids are well known in the art. As indicated by standard references, a simple estimate of the Tm value may be calculated by the equation Tm = 81.5 + 0.41(% G+C) when a nucleic acid is in aqueous solution at 1 M NaCl (see, e.g., Anderson and Young, Quantitative Filter Hybridization, in Nucleic Acid Hybridization (1985)). Other references (e.g., Allawi, H. T. & SantaLucia, J., Jr. Thermodynamics and NMR of internal G·T mismatches in DNA. Biochemistry 36, 10581-94 (1997)) include more sophisticated computations that take structural and environmental, as well as sequence, characteristics into account for the calculation of Tm.
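The simple estimate Tm = 81.5 + 0.41(% G+C) can be computed directly. The Python sketch below applies that equation exactly as written; the function name `simple_tm` is hypothetical.

```python
def simple_tm(seq: str) -> float:
    """Tm estimate in deg C from Tm = 81.5 + 0.41 * (% G+C), for 1 M NaCl."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must be non-empty")
    gc_percent = 100.0 * sum(base in "GC" for base in seq) / len(seq)
    return 81.5 + 0.41 * gc_percent


print(simple_tm("ACGTACGTACGTACGTACGT"))  # 50% G+C
```

Because the equation has no length or salt-concentration term, any sequence with 50% G+C yields about 102 °C regardless of length. This is far above realistic values for short oligonucleotides, which is why sequence-dependent methods such as those cited above are preferred when designing probes.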
The term “gene” refers to a DNA sequence that comprises control and coding sequences necessary for the production of an RNA having a non-coding function (e.g., a ribosomal or transfer RNA), a polypeptide or a precursor. The RNA or polypeptide can be encoded by a full length coding sequence or by any portion of the coding sequence so long as the desired activity or function is retained.
The term “wild-type” refers to a gene or a gene product that has the characteristics of that gene or gene product when isolated from a naturally occurring source. A wild-type gene is that which is most frequently observed in a population and is thus arbitrarily designated the “normal” or “wild-type” form of the gene. In contrast, the terms “modified,” “mutant,” and “polymorphic” refer to a gene or gene product that displays modifications in sequence and/or functional properties (i.e., altered characteristics) when compared to the wild-type gene or gene product. It is noted that naturally occurring mutants can be isolated; these are identified by the fact that they have altered characteristics when compared to the wild-type gene or gene product.
The term “oligonucleotide” as used herein is defined as a molecule comprising two or more deoxyribonucleotides or ribonucleotides, preferably at least 5 nucleotides, more preferably at least about 10-15 nucleotides, and even more preferably at least about 15 to 30 nucleotides. The exact size will depend on many factors, which in turn depend on the ultimate function or use of the oligonucleotide. The oligonucleotide may be generated in any manner, including chemical synthesis, DNA replication, reverse transcription, PCR, or a combination thereof. In some embodiments, oligonucleotides that form invasive cleavage structures are generated in a reaction (e.g., by extension of a primer in an enzymatic extension reaction).
Because mononucleotides are reacted to make oligonucleotides in a manner such that the 5′ phosphate of one mononucleotide pentose ring is attached to the 3′ oxygen of its neighbor in one direction via a phosphodiester linkage, an end of an oligonucleotide is referred to as the “5′ end” if its 5′ phosphate is not linked to the 3′ oxygen of a mononucleotide pentose ring and as the “3′ end” if its 3′ oxygen is not linked to a 5′ phosphate of a subsequent mononucleotide pentose ring. As used herein, a nucleic acid sequence, even if internal to a larger oligonucleotide, also may be said to have 5′ and 3′ ends. A first region along a nucleic acid strand is said to be upstream of another region if the 3′ end of the first region is before the 5′ end of the second region when moving along a strand of nucleic acid in a 5′ to 3′ direction.
When two different, non-overlapping oligonucleotides anneal to different regions of the same linear complementary nucleic acid sequence, and the 3′ end of one oligonucleotide points towards the 5′ end of the other, the former may be called the “upstream” oligonucleotide and the latter the “downstream” oligonucleotide. Similarly, when two overlapping oligonucleotides are hybridized to the same linear complementary nucleic acid sequence, with the first oligonucleotide positioned such that its 5′ end is upstream of the 5′ end of the second oligonucleotide, and the 3′ end of the first oligonucleotide is upstream of the 3′ end of the second oligonucleotide, the first oligonucleotide may be called the “upstream” oligonucleotide and the second oligonucleotide may be called the “downstream” oligonucleotide.
The term “primer” refers to an oligonucleotide that is capable of acting as a point of initiation of synthesis when placed under conditions in which primer extension is initiated. An oligonucleotide “primer” may occur naturally, as in a purified restriction digest or may be produced synthetically.
A primer is selected to be “substantially” complementary to a strand of specific sequence of the template. A primer must be sufficiently complementary to hybridize with a template strand for primer elongation to occur. A primer sequence need not reflect the exact sequence of the template. For example, a non-complementary nucleotide fragment may be attached to the 5′ end of the primer, with the remainder of the primer sequence being substantially complementary to the strand. Non-complementary bases or longer sequences can be interspersed into the primer, provided that the primer sequence has sufficient complementarity with the sequence of the template to hybridize and thereby form a template primer complex for synthesis of the extension product of the primer.
The term “cleavage structure” as used herein, refers to a structure that is formed by the interaction of at least one probe oligonucleotide and a target nucleic acid, forming a structure comprising a duplex, the resulting structure being cleavable by a cleavage means, including but not limited to an enzyme. The cleavage structure is a substrate for specific cleavage by the cleavage means in contrast to a nucleic acid molecule that is a substrate for non-specific cleavage by agents such as phosphodiesterases which cleave nucleic acid molecules without regard to secondary structure (i.e., no formation of a duplexed structure is required).
The term “cleavage means” or “cleavage agent” as used herein refers to any means that is capable of cleaving a cleavage structure, including but not limited to enzymes. “Structure-specific nucleases” or “structure-specific enzymes” are enzymes that recognize specific secondary structures in a nucleic acid molecule and cleave these structures. The cleavage means of the invention cleave a nucleic acid molecule in response to the formation of cleavage structures; it is not necessary that the cleavage means cleave the cleavage structure at any particular location within the cleavage structure.
The cleavage means may include nuclease activity provided from a variety of sources including the CLEAVASE enzymes, the FEN-1 endonucleases (including RAD2 and XPG proteins), Taq DNA polymerase and E. coli DNA polymerase I. The cleavage means may include enzymes having 5′ nuclease activity (e.g., Taq DNA polymerase (DNAP), E. coli DNA polymerase I). The cleavage means may also include modified DNA polymerases having 5′ nuclease activity but lacking synthetic activity. Examples of cleavage means suitable for use in the method and kits of the present invention are provided in U.S. Pat. Nos. 5,614,402; 5,795,763; 5,843,669; 6,090,606; PCT Appln. Nos. WO 98/23774; WO 02/070755A2; WO0190337A2; and WO03073067, each of which is herein incorporated by reference in its entirety.
The term “thermostable” when used in reference to an enzyme, such as a 5′ nuclease, indicates that the enzyme is functional or active (i.e., can perform catalysis) at an elevated temperature, i.e., at about 55° C. or higher.
The term “cleavage products” as used herein, refers to products generated by the reaction of a cleavage means with a cleavage structure (i.e., the treatment of a cleavage structure with a cleavage means).
The term “target nucleic acid,” when used in reference to an invasive cleavage reaction, refers to a nucleic acid molecule containing a sequence that has at least partial complementarity with at least a probe oligonucleotide and may also have at least partial complementarity with an INVADER oligonucleotide. The target nucleic acid may comprise single- or double-stranded DNA or RNA.
The term “non-target cleavage product” refers to a product of a cleavage reaction that is not derived from the target nucleic acid. As discussed above, in the methods of the present invention, cleavage of the cleavage structure generally occurs within the probe oligonucleotide. The fragments of the probe oligonucleotide generated by this target nucleic acid-dependent cleavage are “non-target cleavage products.”
The term “probe oligonucleotide,” when used in reference to an invasive cleavage reaction, refers to an oligonucleotide that interacts with a target nucleic acid to form a cleavage structure in the presence or absence of an INVADER oligonucleotide. When annealed to the target nucleic acid, the probe oligonucleotide and target form a cleavage structure and cleavage occurs within the probe oligonucleotide.
The term “INVADER oligonucleotide” refers to an oligonucleotide that hybridizes to a target nucleic acid at a location near the region of hybridization between a probe and the target nucleic acid, wherein the INVADER oligonucleotide comprises a portion (e.g., a chemical moiety, or a nucleotide, whether complementary to that target or not) that overlaps with the region of hybridization between the probe and target. In some embodiments, the INVADER oligonucleotide contains sequences at its 3′ end that are substantially the same as sequences located at the 5′ end of a probe oligonucleotide.
The term “cassette,” when used in reference to an invasive cleavage reaction, as used herein refers to an oligonucleotide or combination of oligonucleotides configured to generate a detectable signal in response to cleavage of a probe oligonucleotide in an INVADER assay. In preferred embodiments, the cassette hybridizes to a non-target cleavage product (e.g., a minimally cross-hybridizing 5′ tag) from cleavage of the probe oligonucleotide to form a second invasive cleavage structure, such that the cassette can then be cleaved.
In some embodiments, the cassette is a single oligonucleotide comprising a hairpin portion (i.e., a region wherein one portion of the cassette oligonucleotide hybridizes to a second portion of the same oligonucleotide under reaction conditions, to form a duplex). In other embodiments, a cassette comprises at least two oligonucleotides comprising complementary portions that can form a duplex under reaction conditions. In preferred embodiments, the cassette comprises a label. In some embodiments, the cassette comprises a 5′ tag of the present invention. In particularly preferred embodiments, the cassette comprises labeled moieties that produce a fluorescence resonance energy transfer (FRET) effect.
As used herein, the phrase “non-amplified oligonucleotide detection assay” refers to a detection assay configured to detect the presence or absence of a particular polymorphism (e.g., SNP, repeat sequence, etc.) in a target sequence (e.g. genomic DNA) that has not been amplified (e.g. by PCR), without creating copies of the target sequence. A “non-amplified oligonucleotide detection assay” may, for example, amplify a signal used to indicate the presence or absence of a particular polymorphism in a target sequence, so long as the target sequence is not copied.
The term “sequence variation” as used herein refers to differences in nucleic acid sequence between two nucleic acids. For example, a wild-type structural gene and a mutant form of this wild-type structural gene may vary in sequence by the presence of single base substitutions and/or deletions or insertions of one or more nucleotides. These two forms of the structural gene are said to vary in sequence from one another. A second mutant form of the structural gene may exist. This second mutant form is said to vary in sequence from both the wild-type gene and the first mutant form of the gene.
The term “nucleotide analog” as used herein refers to modified or non-naturally occurring nucleotides including but not limited to analogs that have altered stacking interactions such as 7-deaza purines (i.e., 7-deaza-dATP and 7-deaza-dGTP); base analogs with alternative hydrogen bonding configurations (e.g., Iso-C and Iso-G and other non-standard base pairs described in U.S. Pat. No. 6,001,983 to S. Benner); non-hydrogen bonding analogs (e.g., non-polar, aromatic nucleoside analogs such as 2,4-difluorotoluene, described by B. A. Schweitzer and E. T. Kool, J. Org. Chem., 1994, 59, 7238-7242, B. A. Schweitzer and E. T. Kool, J. Am. Chem. Soc., 1995, 117, 1863-1872); “universal” bases such as 5-nitroindole and 3-nitropyrrole; and universal purines and pyrimidines (such as “K” and “P” nucleotides, respectively; P. Kong, et al., Nucleic Acids Res., 1989, 17, 10373-10383, P. Kong et al., Nucleic Acids Res., 1992, 20, 5149-5152). Nucleotide analogs comprise modified forms of deoxyribonucleotides as well as ribonucleotides.
The term “sample” in the present specification and claims is used in its broadest sense. On the one hand it is meant to include a specimen or culture (e.g., microbiological cultures). On the other hand, it is meant to include both biological and environmental samples. A sample may include a specimen of synthetic origin.
Biological samples may be animal, including human, fluid, solid (e.g., stool) or tissue, as well as liquid and solid food and feed products and ingredients such as dairy items, vegetables, meat and meat by-products, and waste. Biological samples may be obtained from all of the various families of domestic animals, as well as feral or wild animals, including, but not limited to, such animals as ungulates, bear, fish, lagomorphs, rodents, etc.
Environmental samples include environmental material such as surface matter, soil, water and industrial samples, as well as samples obtained from food and dairy processing instruments, apparatus, equipment, utensils, disposable and non-disposable items. These examples are not to be construed as limiting the sample types applicable to the present invention.
An oligonucleotide is said to be present in “excess” relative to another oligonucleotide (or target nucleic acid sequence) if that oligonucleotide is present at a higher molar concentration than the other oligonucleotide (or target nucleic acid sequence). When an oligonucleotide such as a probe oligonucleotide is present in a cleavage reaction in excess relative to the concentration of the complementary target nucleic acid sequence, the reaction may be used to indicate the amount of the target nucleic acid present. Typically, when present in excess, the probe oligonucleotide will be present in at least a 100-fold molar excess; typically at least 1 pmole of each probe oligonucleotide would be used when the target nucleic acid sequence was present at about 10 fmoles or less.
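The 100-fold figure follows from the example amounts: 1 pmol of probe against 10 fmol of target is 1000 fmol / 10 fmol = 100. A minimal sketch of that check is shown below; the function names and the unit table are illustrative assumptions, not part of any described kit or protocol.

```python
from fractions import Fraction

# Exact conversion factors to femtomoles; Fraction avoids floating-point
# rounding right at the 100-fold boundary.
FMOL_PER_UNIT = {
    "nmol": Fraction(10**6),
    "pmol": Fraction(10**3),
    "fmol": Fraction(1),
    "amol": Fraction(1, 10**3),
}


def to_fmol(amount, unit):
    """Convert an amount in nmol, pmol, fmol or amol to femtomoles."""
    return Fraction(amount) * FMOL_PER_UNIT[unit]


def molar_excess(probe_amount, probe_unit, target_amount, target_unit):
    """Fold molar excess of probe over target (hypothetical helper)."""
    return to_fmol(probe_amount, probe_unit) / to_fmol(target_amount, target_unit)


def is_in_excess(probe_amount, probe_unit, target_amount, target_unit, min_fold=100):
    """True if the probe is present in at least min_fold molar excess."""
    return molar_excess(probe_amount, probe_unit, target_amount, target_unit) >= min_fold


print(molar_excess(1, "pmol", 10, "fmol"))  # 100
```

With 20 fmol of target, the same 1 pmol of probe would be only a 50-fold excess and would fail the check.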
The term “nucleic acid sequence” as used herein refers to an oligonucleotide, nucleotide or polynucleotide, and fragments or portions thereof, and to DNA or RNA of genomic or synthetic origin that may be single or double stranded, and represent the sense or antisense strand. Similarly, “amino acid sequence” as used herein refers to peptide or protein sequence.
As used herein, the terms “purified” or “substantially purified” refer to molecules, either nucleic or amino acid sequences, that are removed from their natural environment, isolated or separated, and are at least 60% free, preferably 75% free, and most preferably 90% free from other components with which they are naturally associated. An “isolated polynucleotide” or “isolated oligonucleotide” is therefore a substantially purified polynucleotide.
As used herein, the term “non-cross-hybridization” refers to the absence of hybridization between two nucleic acids that are not perfect complements of each other.
As used herein, the term “cross-hybridization” refers to the hydrogen bonding of a single-stranded nucleic acid sequence that is partially but not entirely complementary to a single-stranded substrate.
As used herein, the term “minimal cross-hybridization” refers to low-level cross hybridization such that any cross hybridization detectable is of little or no consequence (e.g., in an experiment or assay).
As used herein, the term “tag” refers to an oligonucleotide or a portion of an oligonucleotide that comprises a non-cross-hybridizing or minimally cross-hybridizing sequence. In preferred embodiments, a tag sequence is not the same as, or complementary to, the portion of target nucleic acid recognized by (e.g., complementary to) a target-specific portion of a tag-containing oligonucleotide.
As used herein, the term “block sequence” refers to a symbolic representation of a sequence of blocks. In its most general form a block sequence is a representative sequence in which no particular value, mathematical variable, or other designation is assigned to each block of the sequence.
As used herein, the term “incidence matrix” refers to the well-defined term in the field of Discrete Mathematics. However, an incidence matrix cannot be defined without first defining a “graph”. In the method described herein, a subset of general graphs called simple graphs is used. Members of this subcategory are further defined as follows.
A simple graph G is a pair (V, E) where V represents the set of vertices of the simple graph and E is a set of un-oriented edges of the simple graph. An edge is defined as a 2-component combination of members of the set of vertices, that is, an unordered pair of distinct vertices. In other words, in a simple graph G there are some pairs of vertices that are connected by an edge. In this application, a graph is built from nucleic acid sequences generated using sequence templates: vertices represent DNA sequences, and edges represent a relative property of a pair of sequences.
The incidence matrix is a mathematical object that allows one to describe any given graph. For the subset of simple graphs used herein, the simple graph $G=(V,E)$, and for a pre-selected and fixed ordering of vertices $V=\{v_1, v_2, \ldots, v_n\}$, the elements of the incidence matrix $A(G)=[a_{ij}]$ are defined by the following rules:
(1) $a_{ij}=1$ for any pair of vertices $\{v_i, v_j\}$ that is a member of the set of edges $E$; and
(2) $a_{ij}=0$ for any pair of vertices $\{v_i, v_j\}$ that is not a member of the set of edges $E$.
This is an exact, unequivocal definition of the incidence matrix. In effect, one selects the indices $1, 2, \ldots, n$ of the vertices and then forms an $n \times n$ square matrix with elements $a_{ij}=1$ if the vertices $v_i$ and $v_j$ are connected by an edge and $a_{ij}=0$ if the vertices $v_i$ and $v_j$ are not connected by an edge.
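The two rules translate directly into code. The sketch below builds the matrix from a list of sequences and a pairwise edge predicate. What this document calls the incidence matrix is, in standard graph-theory usage, the adjacency matrix of G. The predicate `passes_threshold` and its `max_identical` cutoff are hypothetical stand-ins for the threshold rule (they simply count position-wise identities between equal-length sequences); they are not the “maximum simple homology” measure itself.

```python
from typing import Callable, List, Sequence


def passes_threshold(s1: str, s2: str, max_identical: int = 2) -> bool:
    """Hypothetical stand-in for the threshold rule: the pair is joined by an
    edge if no more than max_identical aligned positions carry the same base."""
    identical = sum(a == b for a, b in zip(s1, s2))
    return identical <= max_identical


def incidence_matrix(vertices: Sequence[str],
                     connected: Callable[[str, str], bool]) -> List[List[int]]:
    """Build A(G) = [a_ij]: a_ij = 1 if v_i and v_j are joined by an edge, else 0.
    The diagonal stays 0 because a simple graph has no self-loops."""
    n = len(vertices)
    a = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if connected(vertices[i], vertices[j]):
                a[i][j] = a[j][i] = 1  # edges are un-oriented, so A is symmetric
    return a


seqs = ["GATT", "TGAT", "GATA"]
print(incidence_matrix(seqs, passes_threshold))  # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
```

Here GATT and GATA share three aligned bases, so that pair is not joined by an edge.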
To define the term “class property” as used herein, the term “complete simple graph” or “clique” must first be defined. The complete simple graph is required because all sequences that result from the method described herein should collectively share the relative property that defines an edge of graph G for any pair of sequences: for example, not violating the threshold rule, that is, not having a “maximum simple homology” greater than a predetermined amount, whichever pair of sequences is chosen from the final set. Additional “local” rules, based on known or empirically determined behavior of particular nucleotides or nucleotide sequences, may also be applied to sequence pairs in addition to the basic threshold rule.
In the language of a simple graph, G=(V, E), this means in the final graph there should be no pair of vertices (no sequence pair) not connected by an edge (because an edge means that the sequences represented by vi and vj do not violate the threshold rule).
Because the incidence matrix of any simple graph can be generated by the above definition of its elements, the consequence of defining a simple complete graph is that the corresponding incidence matrix for a simple complete graph will have all off-diagonal elements equal to 1 and all diagonal elements equal to 0. This is because if one aligns a sequence with itself, the threshold rule is of course violated, and all other sequences are connected by an edge.
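This characterization gives a direct test of whether a set of sequences has the class property: inspect its matrix. A minimal sketch, assuming the matrix is held as a list of 0/1 rows (the function name is illustrative):

```python
from typing import List


def is_complete(a: List[List[int]]) -> bool:
    """True if the 0/1 matrix a describes a complete simple graph (a clique):
    all diagonal elements are 0 and all off-diagonal elements are 1."""
    n = len(a)
    return all(a[i][j] == (0 if i == j else 1) for i in range(n) for j in range(n))


print(is_complete([[0, 1, 1], [1, 0, 1], [1, 1, 0]]))  # True
print(is_complete([[0, 1, 0], [1, 0, 1], [0, 1, 0]]))  # False
```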
For any simple graph, there might be a complete subgraph. First, a subgraph of a graph is defined as follows. The subgraph Gs=(Vs, Es) of a simple graph G=(V, E) is a simple graph whose vertex set Vs is a subset of the set V of vertices, where the inclusion of the set Vs into the set V is an immersion (a mathematical term). This means that one generates a subgraph Gs=(Vs, Es) of a simple graph G in two steps: first, select some vertices Vs from G; then select those edges Es of G that connect the chosen vertices, without selecting edges that connect selected with non-selected vertices.
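The two-step construction of Gs (keep some vertices, then keep only the edges among them) corresponds to taking the rows and columns of the matrix for the kept vertices; in graph-theory terms, Gs is the subgraph induced by Vs. A minimal sketch, assuming the graph is stored as a 0/1 matrix (the function name is illustrative):

```python
from typing import List, Sequence


def induced_submatrix(a: List[List[int]], keep: Sequence[int]) -> List[List[int]]:
    """Matrix of the subgraph Gs whose vertices are the indices in keep.

    Step 1 chooses the vertices (keep); step 2 retains only the edges whose two
    endpoints are both kept, which is exactly the sub-block of rows and
    columns listed in keep."""
    return [[a[i][j] for j in keep] for i in keep]


a = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(induced_submatrix(a, [0, 1]))  # [[0, 1], [1, 0]]
print(induced_submatrix(a, [0, 2]))  # [[0, 0], [0, 0]]
```

Keeping vertices 0 and 1 yields a complete subgraph; keeping vertices 0 and 2 does not, because those two were never joined by an edge in G.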
We desire a subgraph of G that is a complete simple graph. By using this property of the complete simple graph generated from the simple graph G of all sequences generated by the template based algorithm, the pairwise property of any pair of the sequences (violating/non-violating the threshold rule) is converted into the property of all members of the set, termed “the class property”.
By selecting a subgraph of a simple graph G that is a complete simple graph, this assures that, up to the tests involving the local rules described herein, there are no pairs of sequences in the resulting set that violate the threshold rule, also described above, independent of which pair of sequences in the set are chosen. This feature is called the “desired class property”.
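The text does not say how a complete subgraph is located in G. Finding the largest clique is NP-hard in general, so a common practical substitute is a greedy heuristic; the sketch below is that substitute, not the method described here. It visits vertices in order of decreasing degree and keeps a vertex only if it is connected to every vertex already kept, so the result always has the desired class property, though it may be smaller than the largest possible set.

```python
from typing import List


def greedy_clique(a: List[List[int]]) -> List[int]:
    """Return indices of a clique (pairwise-connected vertices) found greedily.

    a is a symmetric 0/1 matrix with zero diagonal. Vertices are tried in order
    of decreasing degree; ties keep their original index order because
    sorted() is stable."""
    order = sorted(range(len(a)), key=lambda v: sum(a[v]), reverse=True)
    clique: List[int] = []
    for v in order:
        # Keep v only if it is joined by an edge to every vertex kept so far.
        if all(a[v][u] == 1 for u in clique):
            clique.append(v)
    return sorted(clique)


example = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
print(greedy_clique(example))  # [0, 1, 2]
```

Vertex 3 is rejected because it is not joined to vertex 0, which was already kept.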
The present invention thus includes reducing the potential for cross-hybridization behavior by taking into account local homologies of the sequences, and appears to have greater rigor than known approaches. For example, the method described herein involves sliding one sequence relative to the other in order to form a sequence alignment that accommodates insertions or deletions (Kane et al., Nucleic Acids Res., 28, 4552-4557: 2000).
The present invention relates to the use of non- and minimally cross-hybridizing oligonucleotide sequences in the INVADER Assay. Incorporating these sequences into one of the two oligonucleotides that form an invasive cleavage structure with a target nucleic acid, followed by structure-dependent cleavage of the oligonucleotide comprising the minimally cross-hybridizing sequence, provides a way of using the INVADER Assay in massively parallel analysis of multiple genes, e.g., in a gene microarray.
Invasive cleavage assays, or INVADER assays, comprise forming a nucleic acid cleavage structure that is dependent upon the presence of a target nucleic acid and cleaving the nucleic acid cleavage structure so as to release distinctive cleavage products. 5′ nuclease activity, for example, is used to cleave the target-dependent cleavage structure, and the resulting cleavage products are indicative of the presence of specific target nucleic acid sequences in the sample. When two strands of nucleic acid, or oligonucleotides, both hybridize to a target nucleic acid strand such that they form an overlapping invasive cleavage structure, as described below, invasive cleavage can occur. Through the interaction of a cleavage agent (e.g., a 5′ nuclease) and the upstream oligonucleotide, the cleavage agent can be made to cleave the downstream oligonucleotide at an internal site in such a way that a distinctive fragment is produced. Such embodiments have been termed the INVADER assay (Third Wave Technologies) and are described in U.S. Pat. Nos. 5,846,717, 5,985,557, 5,994,069, 6,001,567, and 6,090,543, WO 97/27214, WO 98/42873, Lyamichev et al., Nat. Biotech., 17:292 (1999), and Hall et al., PNAS, USA, 97:8272 (2000), each of which is herein incorporated by reference in its entirety for all purposes. The INVADER assay detects hybridization of probes to a target by enzymatic cleavage of specific structures by structure-specific enzymes.
The INVADER assay detects specific DNA and RNA sequences by using structure-specific enzymes (e.g. FEN endonucleases) to cleave a complex formed by the hybridization of overlapping oligonucleotide probes (See, e.g.
The INVADER assay detects specific mutations and SNPs in unamplified, as well as amplified, RNA and DNA including genomic DNA. In the embodiments shown schematically in
If the primary probe oligonucleotide and the target nucleotide sequence do not match perfectly at the cleavage site, the overlapped structure does not form and cleavage is suppressed. The structure-specific enzyme used (e.g., CLEAVASE VIII enzyme, Third Wave Technologies) cleaves the overlapped structure more efficiently (e.g., at least 340-fold) than the non-overlapping structure, allowing excellent discrimination of the alleles.
In the INVADER assays, the probes can turn over without temperature cycling to produce many signals per target (i.e., linear signal amplification). Similarly, each target-specific product can enable the cleavage of many FRET probes. The primary INVADER assay reaction is directed against the target DNA (or RNA) being detected. The target DNA is the limiting component in the first invasive cleavage, since the INVADER and primary probe oligonucleotides are supplied in molar excess. In the second invasive cleavage, it is the released flap that is limiting. When these two cleavage reactions are performed sequentially, the fluorescence signal from the composite reaction accumulates linearly with respect to the target DNA amount.
In certain embodiments, the INVADER assay, or other nucleotide detection assays, are performed with accessible site designed oligonucleotides and/or bridging oligonucleotides. Such methods, procedures and compositions are described in U.S. Pat. No. 6,194,149, WO9850403, and WO0198537, all of which are specifically incorporated by reference in their entireties.
In certain embodiments, the target nucleic acid sequences are amplified prior to detection (e.g., such that amplified products are generated). See, for example, co-pending application Ser. Nos. 10/356,861 and 10/967,711, each of which is incorporated by reference herein in its entirety for all purposes. In some embodiments, the target nucleic acid comprises genomic DNA. In other embodiments, the target nucleic acid comprises synthetic DNA or RNA. In some preferred embodiments, synthetic DNA within a sample is created using a purified polymerase. In some preferred embodiments, the creation of synthetic DNA using a purified polymerase occurs in the same reaction mixture as the INVADER assay. In some preferred embodiments, creation of synthetic DNA using a purified polymerase comprises the use of PCR. In other preferred embodiments, creation of synthetic DNA using a purified DNA polymerase, suitable for use with the methods of the present invention, comprises use of rolling circle amplification (e.g., as in U.S. Pat. Nos. 6,210,884, 6,183,960 and 6,235,502, herein incorporated by reference in their entireties). In other preferred embodiments, creation of synthetic DNA comprises copying genomic DNA by priming from a plurality of sites on a genomic DNA sample. In some embodiments, priming from a plurality of sites on a genomic DNA sample comprises using short (e.g., fewer than about 8 nucleotides) oligonucleotide primers. In other embodiments, priming from a plurality of sites on genomic DNA comprises extension of 3′ ends in nicked, double-stranded genomic DNA (i.e., where a 3′ hydroxyl group has been made available for extension by breakage or cleavage of one strand of a double-stranded region of DNA). Some examples of making synthetic DNA using a purified polymerase on nicked genomic DNAs, suitable for use with the methods and compositions of the present invention, are provided in U.S. Pat. No. 6,117,634, issued Sep. 12, 2000, and U.S. Pat. No. 6,197,557, issued Mar. 6, 2001, and in PCT application WO 98/39485, each incorporated by reference herein in its entirety for all purposes.
In other embodiments, synthetic DNA suitable for use with the methods and compositions of the present invention is made using a purified polymerase on multiply-primed genomic DNA, as provided, e.g., in U.S. Pat. Nos. 6,291,187, and 6,323,009, and in PCT applications WO 01/88190 and WO 02/00934, each herein incorporated by reference in their entireties for all purposes. In these embodiments, amplification of DNA such as genomic DNA is accomplished using a DNA polymerase, such as the highly processive Φ 29 polymerase (as described, e.g., in U.S. Pat. Nos. 5,198,543 and 5,001,050, each herein incorporated by reference in their entireties for all purposes) in combination with exonuclease-resistant random primers, such as hexamers.
The present invention further provides assays in which the target nucleic acid is reused or recycled during multiple rounds of hybridization with oligonucleotide probes and cleavage of the probes without the need to use temperature cycling (e.g., for periodic denaturation of target nucleic acid strands) or nucleic acid synthesis (e.g., for the polymerization-based displacement of target or probe nucleic acid strands). When a cleavage reaction is run under conditions in which the probes are continuously replaced on the target strand (e.g., through probe-probe displacement, through an equilibrium between probe/target association and dissociation, or through a combination comprising these mechanisms; see Luis P. Reynaldo, Alexander V. Vologodskii, Bruce P. Neri and Victor I. Lyamichev, The kinetics of oligonucleotide replacement, J. Mol. Biol. 297: 511-520 (2000)), multiple probes can hybridize to the same target, allowing multiple cleavages and the generation of multiple cleavage products.
The present invention provides INVADER assays using probes comprising non- and minimally cross-hybridizing tags.
A family of 210 sequences has been described as having optimal hybridization properties for use in nucleic acid detection assays. The sequence set of 210 oligonucleotides was characterized in hybridization assays, demonstrating the ability of family members to correctly hybridize to their complementary sequences with an absence of cross hybridization. (See U.S. Patent Publication No. 2005/0186573 to Janeczko, incorporated by reference herein in its entirety). These are the sequences having SEQ ID NOs:1173 to 1382 of Table I.
A family of complements is obtained from a set of oligonucleotides based on a family of oligonucleotides such as those of Table I. For illustrative purposes, providing a family of complements based on the oligonucleotides of Table I will be described.
Firstly, the groups of sequences based on the oligonucleotides of Table I can be represented as follows:
In Table IA, each of the numerals 1 to 22 (“numeric identifiers”) represents a 4-mer (a sequence of 4 nucleotides), and the pattern of numeric identifiers 1 to 22 in the above list corresponds to the pattern of tetrameric oligonucleotide segments present in the tag, e.g., in the oligonucleotides of Table I, below. These oligonucleotide sequences have been found to be non-cross-hybridizing (See Janeczko, supra).
Each pattern is identified by a number in the left column, the “tag identifier,” which is associated with the pattern of numeric identifiers on that line. Each 4-mer is selected from the group of 4-mers consisting of WWWW, WWWX, WWWY, WWXW, WWXX, WWXY, WWYW, WWYX, WWYY, WXWW, WXWX, WXWY, WXXW, WXXX, WXXY, WXYW, WXYX, WXYY, WYWW, WYWX, WYWY, WYXW, WYXX, WYXY, WYYW, WYYX, WYYY, XWWW, XWWX, XWWY, XWXW, XWXX, XWXY, XWYW, XWYX, XWYY, XXWW, XXWX, XXWY, XXXW, XXXX, XXXY, XXYW, XXYX, XXYY, XYWW, XYWX, XYWY, XYXW, XYXX, XYXY, XYYW, XYYX, XYYY, YWWW, YWWX, YWWY, YWXW, YWXX, YWXY, YWYW, YWYX, YWYY, YXWW, YXWX, YXWY, YXXW, YXXX, YXXY, YXYW, YXYX, YXYY, YYWW, YYWX, YYWY, YYXW, YYXX, YYXY, YYYW, YYYX, and YYYY. Here W, X and Y represent nucleotide bases, A, G, C, etc., the assignment of bases being made according to rules described below. Given this numeric pattern, a 4-mer is assigned to a numeral. For example, 1=WXYY, 2=YWXY, etc. Once a given 4-mer has been assigned to a given numeral, it is not assigned for use in the position of a different numeral. It is possible, however, to assign a different 4-mer to the same numeral. That is, for example, the numeral 1 in one position could be assigned WXYY and another numeral 1, in a different position, could be assigned XXXW, but none of the other numerals 2 to 22 can then be assigned WXYY or XXXW. A different way of saying this is that each of 1 to 22 is assigned a 4-mer from the list of eighty-one 4-mers indicated so as to be different from all of the others of 1 to 22.
In the case of the specific oligonucleotides given in Table I, 1=WXYY, 2=YWXY, 3=XXXW, 4=YWYX, 5=WYXY, 6=YYWX, 7=YWXX, 8=WYXX, 9=XYYW, 10=XYWX, 11=YYXW, 12=WYYX, 13=XYXW, 14=WYYY, 15=WXYW, 16=WYXW, 17=WXXW, 18=WYYW, 19=XYYX, 20=YXYX, 21=YXXY and 22=XYXY.
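The assignment constraints can be checked mechanically. The sketch below enumerates the 81 possible 4-mers over W, X and Y and verifies that a numeral-to-4-mer assignment uses only those 4-mers and never gives the same 4-mer to two different numerals; it is applied to the Table I assignment listed above. Representing each numeral's 4-mers as a set (to cover the case where one numeral takes different 4-mers at different positions) is a choice made for this illustration.

```python
from itertools import product

# Numeral -> 4-mer assignment stated for the oligonucleotides of Table I.
TABLE_I_ASSIGNMENT = {
    1: "WXYY", 2: "YWXY", 3: "XXXW", 4: "YWYX", 5: "WYXY", 6: "YYWX",
    7: "YWXX", 8: "WYXX", 9: "XYYW", 10: "XYWX", 11: "YYXW", 12: "WYYX",
    13: "XYXW", 14: "WYYY", 15: "WXYW", 16: "WYXW", 17: "WXXW", 18: "WYYW",
    19: "XYYX", 20: "YXYX", 21: "YXXY", 22: "XYXY",
}

# The 3**4 = 81 allowed 4-mers over the symbols W, X and Y.
ALL_4MERS = {"".join(p) for p in product("WXY", repeat=4)}


def is_valid_assignment(assignment):
    """assignment maps each numeral to a set of 4-mers. Valid when every 4-mer
    is one of the 81 allowed 4-mers and no 4-mer is used by two numerals."""
    owner = {}
    for numeral, fourmers in assignment.items():
        for f in fourmers:
            if f not in ALL_4MERS:
                return False
            # setdefault records the first numeral to claim f; a different
            # owner means the same 4-mer was given to two numerals.
            if owner.setdefault(f, numeral) != numeral:
                return False
    return True


table_i = {n: {f} for n, f in TABLE_I_ASSIGNMENT.items()}
print(len(ALL_4MERS), is_valid_assignment(table_i))  # 81 True
```

The example from the text, where numeral 1 takes WXYY at one position and XXXW at another, passes as long as no other numeral uses either 4-mer.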
Once the 4-mers are assigned to positions according to the above pattern, a particular set of oligonucleotides can be created by appropriate assignment of bases, A, T/U, G, C to W, X, Y. These assignments are made according to one of the following two sets of rules:
(i) Each of W, X and Y is a base in which:
and each of W, X and Y is selected so as to be different from all of the others of W, X and Y,
(ii) Each of W, X and Y is a base in which:
In the case of the specific oligonucleotides given in Table I, W=G, X=A and Y=T.
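Putting the two assignments together, a tag is obtained by replacing each numeral of its pattern with its 4-mer and then each of W, X and Y with its base. Because the Table IA patterns are not reproduced here, the pattern [1, 2, 3, 4] used below is a hypothetical example, and only numerals 1 to 4 of the Table I assignment are included.

```python
# Excerpt of the Table I numeral -> 4-mer assignment (numerals 1 to 4 only).
FOURMER = {1: "WXYY", 2: "YWXY", 3: "XXXW", 4: "YWYX"}

# Base assignment used for the oligonucleotides of Table I.
BASE = {"W": "G", "X": "A", "Y": "T"}


def expand(pattern):
    """Translate a pattern of numeric identifiers into a DNA sequence:
    numeral -> 4-mer of W/X/Y symbols -> bases."""
    return "".join(BASE[symbol] for numeral in pattern for symbol in FOURMER[numeral])


print(expand([1, 2, 3, 4]))  # GATTTGATAAAGTGTA
```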
In any case, given a set of oligonucleotides generated according to one of these sets of rules, it is possible to modify the members of a given set in relatively minor ways and thereby obtain a different set of sequences while largely maintaining the cross-hybridization properties of the set. In particular, up to 3 bases (each A, T/U, G or C) can be inserted at any location of any sequence of the set. Alternatively, or additionally, up to 3 bases can be deleted from any sequence of the set.
A person skilled in the art would understand that, given a set of oligonucleotides having a set of properties making it suitable for use as a family of tags (or tag complements), one can obtain another family with the same properties by reversing the base order of every member of the set. In other words, all the members can be taken to be read 5′ to 3′ or to be read 3′ to 5′.
A family of complements of the present invention is based on a given set of oligonucleotides defined as described above. Each complement of the family is based on a different oligonucleotide of the set and each complement contains at least 10 consecutive (i.e., contiguous) bases of the oligonucleotide on which it is based. For a given family of complements where one is seeking to reduce or minimize inter-sequence similarity that would result in cross-hybridization, each and every pair of complements meets particular homology requirements. Particularly, subject to limited exceptions, described below, any two complements within a set of complements are generally required to have a defined amount of dissimilarity.
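The requirement that each complement contain at least 10 consecutive bases of its parent oligonucleotide amounts to checking whether any 10-base window of the parent appears verbatim in the complement. A minimal sketch follows; the function name and the 16-base example sequence are illustrative.

```python
def contains_run(complement: str, oligo: str, k: int = 10) -> bool:
    """True if complement contains at least k consecutive bases of oligo,
    i.e. some length-k window of oligo appears verbatim in complement."""
    return any(oligo[i:i + k] in complement for i in range(len(oligo) - k + 1))


oligo = "GATTTGATAAAGTGTA"  # illustrative 16-base sequence
print(contains_run("CCGATTTGATAAAG", oligo))  # True
print(contains_run("GATTTGATA", oligo))       # False (only 9 bases)
```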
In order to notionally understand these requirements for dissimilarity as they exist for a given pair of complements of a family, a phantom sequence is generated from the pair of complements. A “phantom” sequence is a single sequence that is generated from a pair of complements by selection, from each complement of the pair, of a string of bases wherein the bases of the string occur in the same order in both complements. An object of creating such a phantom sequence is to create a convenient and objective means of comparing the sequence identity of the two parent sequences from which the phantom sequence is created.
A phantom sequence may thus be generated from exemplary Sequence 1 and Sequence 2 as follows:
The phantom sequence generated from these two sequences is thus 22 bases in length. That is, Sequences 1 and 2 share 22 identical bases occurring in the same order. There is a total of three insertions/deletions and mismatches present in the phantom sequence when compared with the sequences from which it was generated:
The dashed lines in this latter representation of the phantom sequence indicate the locations of the insertions/deletions and mismatches in the phantom sequence relative to the parent sequences from which it was derived. Thus, the “T” marked with an asterisk in Sequence 1, the “A” marked with a diamond in Sequence 2 and the “A-T” mismatch of Sequences 1 and 2 marked with two dots were deleted in generating the phantom sequence.
A person skilled in the art will appreciate that the term “insertion/deletion” is intended to cover the situations indicated by the asterisk and diamond. Whether the change is considered, strictly speaking, an insertion or a deletion is merely a matter of vantage point. That is, the fourth base of Sequence 1 can be deleted therefrom to obtain the phantom sequence, or a “T” can be inserted after the third base of the phantom sequence to obtain Sequence 1.
One can thus see that if a phantom sequence could be created by eliminating a single insertion/deletion from one of the parent sequences, the two parent sequences would be identical over the length of the phantom sequence except for the presence of a single extra base in one of the two sequences being compared. Likewise, if a phantom sequence could be created by deleting a mismatched pair of bases, one base from each parent, the two parent sequences would be identical over the length of the phantom sequence except for the presence of a single base in each of the sequences being compared. For this reason, the effect of an insertion/deletion is considered equivalent to the effect of a mismatched pair of bases when comparing the homology of two sequences.
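Read computationally, a phantom sequence of maximal length is a longest common subsequence of the two parents. The following sketch assumes that reading and uses the textbook dynamic-programming recurrence; the function names are hypothetical, and the code only recovers one longest phantom sequence, without deciding which leftover bases form mismatched pairs and which are insertions/deletions.

```python
from typing import Tuple


def phantom_sequence(s1: str, s2: str) -> str:
    """Return one longest phantom sequence of s1 and s2 (hypothetical helper).

    The bases of a phantom sequence occur in the same order in both parents,
    so the longest one is a longest common subsequence (LCS).
    """
    n, m = len(s1), len(s2)
    # dp[i][j] holds the LCS length of the suffixes s1[i:] and s2[j:].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if s1[i] == s2[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk forward through the table to recover one optimal phantom sequence.
    phantom, i, j = [], 0, 0
    while i < n and j < m:
        if s1[i] == s2[j]:
            phantom.append(s1[i])
            i, j = i + 1, j + 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1  # s1[i] is left out of the phantom sequence
        else:
            j += 1  # s2[j] is left out of the phantom sequence
    return "".join(phantom)


def unmatched_bases(s1: str, s2: str) -> Tuple[int, int]:
    """Number of bases of each parent left out of the longest phantom sequence."""
    p = len(phantom_sequence(s1, s2))
    return len(s1) - p, len(s2) - p


print(phantom_sequence("ACGT", "AGT"))  # AGT
print(unmatched_bases("ACGT", "AGT"))   # (1, 0)
```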
Once a phantom sequence is generated, the compatibility of the pair of complements from which it was generated within a family of complements can be systematically evaluated.
According to one embodiment of the invention, a pair of complements is compatible for inclusion within a family of complements if any phantom sequence generated from the pair of complements has the following properties:
Here, L1 is the length of the first complement, L2 is the length of the second complement, and L = L1 if L1 = L2; otherwise, L is the greater of L1 and L2.
In particular preferred embodiments of the invention, all pairs of complements of a given set have the properties set out above. Under particular circumstances, it may be advantageous to have a limited number of complements that do not meet all of these requirements when compared to every other complement in a family.
In one case, for any first complement there are at most two second complements in the family that do not meet all three of the listed requirements. For two such complements, there would thus be a greater chance of cross-hybridization between their tag counterparts and the first complement. In another case, for any first complement there is at most one second complement that does not meet all three of the listed requirements.
It is also possible, given this invention, to design a family of complements in which a specific number or specific portion of the complements do not meet the three listed requirements. For example, a set could be designed in which only one pair of complements within the set does not meet the requirements when compared to each other. There could be two pairs, three pairs, or any number of pairs up to and including all possible pairs. Alternatively, it may be advantageous to have a given proportion of the pairs of complements, say 10% of pairs, that do not meet one or more of the three listed requirements. This proportion could instead be 5%, 15%, 20%, 25%, 30%, 35%, or 40%.
The foregoing comparisons would generally be carried out using appropriate computer software. Although notionally described in terms of a phantom sequence for the sake of clarity and understanding, it will be understood that a competent computer programmer can carry out pair-wise comparisons of complements in any number of ways using logical steps that obtain equivalent results.
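As one illustration of such pair-wise bookkeeping, the sketch below reports, for a family, the largest number of non-conforming partners of any single complement and the fraction of all pairs that do not conform, which correspond to the exceptions contemplated above. Because the three listed requirements are not reproduced in this passage, `compatible` is a hypothetical predicate supplied by the caller.

```python
from itertools import combinations
from typing import Callable, Sequence, Tuple


def audit_family(family: Sequence[str],
                 compatible: Callable[[str, str], bool]) -> Tuple[int, float]:
    """Return (largest number of incompatible partners of any one member,
    fraction of all pairs that are incompatible)."""
    bad_partners = [0] * len(family)
    pairs = list(combinations(range(len(family)), 2))
    bad_pairs = 0
    for i, j in pairs:
        if not compatible(family[i], family[j]):
            bad_pairs += 1
            bad_partners[i] += 1  # each failing pair counts against both members
            bad_partners[j] += 1
    worst = max(bad_partners, default=0)
    fraction = bad_pairs / len(pairs) if pairs else 0.0
    return worst, fraction
```

A family in which no complement has more than two non-conforming partners would give a first value of 2 or less; a 10% allowance corresponds to a second value of 0.10 or less.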
The symbols A, G, T/U and C take on their usual meaning in the art here. In the case of T and U, a person skilled in the art would understand that these are equivalent to each other with respect to the inter-strand hydrogen-bond (Watson-Crick) binding properties at work in the context of this invention. The two bases are thus interchangeable and hence the designation of T/U. Base analogues can be inserted in their respective places where desired.
In another broad embodiment, a family of 1168 sequences was determined using a computer algorithm to have desirable hybridization properties for use in nucleic acid detection assays. The set of 1168 oligonucleotide sequences has been partially characterized in hybridization assays, demonstrating the ability of family members to correctly hybridize to their complementary sequences with minimal cross-hybridization. (See Janeczko, supra). These are the sequences having SEQ ID NOS: 211 to 1378 of Table II.
Variant families of sequences (seen as tags or tag complements) of a family of sequences taken from Table II are also part of the invention. For the purposes of discussion, a family or set of oligonucleotides will often be described as a family of tag complements, but it will be understood that such a set could just as easily be a family of tags.
A family of complements is obtained from a set of oligonucleotides based on a family of oligonucleotides such as those of Table II. To simplify discussion, the case of a family of complements based on the oligonucleotides of Table II will be described.
Firstly, the groups of sequences based on the oligonucleotides of Table II can be represented as shown in Table IIA.
In Table IIA, each of the numerals 1 to 3 (“numeric identifiers”) represents a nucleotide base and the pattern of numeric identifiers (1, 2 or 3) in the above list corresponds to the pattern of nucleotide bases present in the tag, e.g., in the oligonucleotides of Table II, below. These oligonucleotides have been found to be non- or minimally cross-hybridizing (See Janeczko, supra).
Each pattern is identified by a number in the left column, the “tag identifier,” which is associated with the pattern of numeric identifiers on that line. Each nucleotide base is selected from the group of nucleotide bases consisting of A, C, G, and T/U. A particularly preferred embodiment of the invention, in which a specific base is assigned to each numeric identifier is shown in Table II, below.
In another broad aspect, the invention is a composition comprising INVADER assay probes comprising tags or tag complements, wherein each tag portion of the molecule comprises an oligonucleotide selected from a set of oligonucleotides based on a group of sequences as specified by numeric identifiers set out in Table IIA. In the sequences, each of 1 to 3 is a nucleotide base selected to be different from the others of 1 to 3 with the proviso that up to three nucleotide bases of each sequence can be substituted with any nucleotide base provided that:
for any pair of sequences of the set:
An explanation of the meaning of the parameters set out above is given in the section describing detailed embodiments.
In another broad aspect, the invention is a composition comprising INVADER assay probes comprising tags or tag complements, wherein each tag portion of the molecule comprises an oligonucleotide selected from a set of oligonucleotides based on a group of sequences as specified by numeric identifiers set out in Table IIA wherein each of 1 to 3 is a nucleotide base selected to be different from the others of 1 to 3 with the proviso that up to three nucleotide bases of each sequence can be substituted with any nucleotide base provided that:
for any pair of sequences of the set:
In another broad aspect, the invention is a composition comprising INVADER assay probes comprising tags or tag complements, wherein each tag portion of the molecule comprises an oligonucleotide selected from a set of oligonucleotides based on a group of sequences as specified by numeric identifiers set out in Table IIA wherein each of 1 to 3 is a nucleotide base selected to be different from the others of 1 to 3 with the proviso that up to three nucleotide bases of each sequence can be substituted with any nucleotide base provided that:
for any pair of sequences of the set:
In preferred aspects, the invention provides a composition in which, for the group of 24-mer sequences in which 1=A, 2=T and 3=G, under a defined set of conditions in which the maximum degree of hybridization between a sequence and any complement of a different sequence of the group of 24-mer sequences does not exceed 30% of the degree of hybridization between said sequence and its complement, for all said oligonucleotides of the composition, the maximum degree of hybridization between an oligonucleotide and a complement of any other oligonucleotide of the composition does not exceed 50% of the degree of hybridization of the oligonucleotide and its complement.
More preferably, the maximum degree of hybridization between a sequence and any complement of a different sequence does not exceed 30% of the degree of hybridization between said sequence and its complement, and the degree of hybridization between each sequence and its complement varies by a factor of between 1 and up to 10, more preferably between 1 and up to 9, more preferably between 1 and up to 8, more preferably between 1 and up to 7, more preferably between 1 and up to 6, and more preferably between 1 and up to 5.
It is also preferred that the maximum degree of hybridization between a sequence and any complement of a different sequence does not exceed 25%, more preferably does not exceed 20%, more preferably does not exceed 15%, more preferably does not exceed 10%, more preferably does not exceed 5%.
Even more preferably, the above-referenced defined set of conditions results in a level of hybridization that is the same as the level of hybridization obtained when hybridization conditions include 0.2 M NaCl, 0.1 M Tris, 0.08% Triton X-100, pH 8.0 at 37° C.
In the composition, the defined set of conditions can include the group of 24-mer sequences being covalently linked to beads.
In a particular preferred aspect, for the group of 24-mers the maximum degree of hybridization between a sequence and any complement of a different sequence does not exceed 15% of the degree of hybridization between said sequence and its complement and the degree of hybridization between each sequence and its complement varies by a factor of between 1 and up to 9, and for all oligonucleotides of the set, the maximum degree of hybridization between an oligonucleotide and a complement of any other oligonucleotide of the set does not exceed 20% of the degree of hybridization of the oligonucleotide and its complement.
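The two quantities in these preferred aspects, the worst cross-hybridization ratio and the fold spread of perfect-match signals, can be computed from a square matrix of measured signals. The sketch assumes a hypothetical layout in which `signal[i][j]` is the signal of sequence i hybridized to the complement of sequence j, so that the diagonal holds the perfect-match signals; the default thresholds are the 30% and factor-of-10 values from the text.

```python
from typing import Sequence, Tuple


def hybridization_metrics(signal: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Return (worst cross-hybridization ratio, fold spread of perfect matches)."""
    n = len(signal)
    perfect = [signal[i][i] for i in range(n)]
    # Each off-diagonal signal is scaled by the perfect-match signal of its own row.
    worst_cross = max(
        (signal[i][j] / perfect[i] for i in range(n) for j in range(n) if i != j),
        default=0.0,
    )
    fold_spread = max(perfect) / min(perfect)
    return worst_cross, fold_spread


def meets_spec(signal: Sequence[Sequence[float]],
               max_cross: float = 0.30, max_fold: float = 10.0) -> bool:
    """True if both preferred limits are satisfied."""
    cross, fold = hybridization_metrics(signal)
    return cross <= max_cross and fold <= max_fold


print(hybridization_metrics([[100, 10], [20, 50]]))  # (0.4, 2.0)
```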
It is possible that each 1 is one of A, T/U, G and C; each 2 is one of A, T/U, G and C; and each 3 is one of A, T/U, G and C; and each of 1, 2 and 3 is selected so as to be different from all of the others of 1, 2 and 3. More preferably, 1 is A or T/U, 2 is A or T/U and 3 is G or C. Even more preferably, 1 is A, 2 is T/U, and 3 is G.
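A minimal sketch of how a numeric pattern becomes a concrete sequence under a base assignment, together with a count of the possible assignments of three different bases. The pattern `3122` is hypothetical (the patterns of Table IIA are not reproduced here), and the helper names are illustrative only.

```python
from itertools import permutations
from typing import Dict, Iterator

# Most preferred assignment given in the text: 1 = A, 2 = T/U, 3 = G.
PREFERRED = {"1": "A", "2": "T", "3": "G"}


def pattern_to_sequence(pattern: str, assignment: Dict[str, str]) -> str:
    """Replace each numeric identifier (1, 2 or 3) with its assigned base."""
    if len(set(assignment.values())) != len(assignment):
        raise ValueError("1, 2 and 3 must each map to a different base")
    return "".join(assignment[digit] for digit in pattern)


def all_assignments(bases: str = "ATGC") -> Iterator[Dict[str, str]]:
    """Yield every assignment of three different bases to 1, 2 and 3."""
    for trio in permutations(bases, 3):
        yield dict(zip("123", trio))


print(pattern_to_sequence("3122", PREFERRED))  # GATT (hypothetical pattern)
print(sum(1 for _ in all_assignments()))       # 4 * 3 * 2 = 24
```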
In certain preferred compositions, each of the oligonucleotides is from twenty-two to twenty-six bases in length, or from twenty-three to twenty-five, and preferably, each oligonucleotide is of the same length as every other said oligonucleotide.
In a particularly preferred embodiment, each oligonucleotide is twenty-four bases in length.
It is preferred that no oligonucleotide contains more than four contiguous bases that are identical to each other.
It is also preferred that the number of G's in each oligonucleotide does not exceed L/4 where L is the number of bases in said sequence.
For reasons described below, it is preferred that the number of G's in each said oligonucleotide not vary from the average number of G's in all of the oligonucleotides by more than one. Even more preferably, the number of G's in each said oligonucleotide is the same as in every other said oligonucleotide. In the embodiment disclosed below in which oligonucleotides were tested, the sequence of each was twenty-four bases in length and each oligonucleotide contained 6 G's.
It is also preferred that, for each oligonucleotide, there are at most six bases other than G between every pair of neighboring G's.
Also, it is preferred that at least one of the first, second, third, fourth, fifth, sixth and seventh bases of each oligonucleotide, counting from its 5′-end, is a G. Similarly, it is preferred that at least one of the first to seventh bases of each oligonucleotide, counting from its 3′-end, is a G.
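The preferred compositional rules above can be checked mechanically. This is a sketch, assuming sequences written with T rather than U and using hypothetical function names; it is not the screening code actually used to select the sequences.

```python
import re


def longest_run(seq: str) -> int:
    """Length of the longest stretch of one repeated base."""
    return max((len(m.group(0)) for m in re.finditer(r"(.)\1*", seq)), default=0)


def passes_preferred_rules(seq: str) -> bool:
    """Check the preferred composition rules for a single oligonucleotide."""
    n = len(seq)
    g_pos = [i for i, base in enumerate(seq) if base == "G"]
    # A G is required near each end, so a G-free sequence fails outright.
    if not g_pos or longest_run(seq) > 4:  # at most four identical contiguous bases
        return False
    if len(g_pos) > n / 4:                 # number of G's does not exceed L/4
        return False
    # At most six non-G bases between each pair of neighboring G's.
    if any(b - a - 1 > 6 for a, b in zip(g_pos, g_pos[1:])):
        return False
    # A G among the first seven bases counted from the 5'-end and from the 3'-end.
    return g_pos[0] <= 6 and g_pos[-1] >= n - 7


print(passes_preferred_rules("GATTTGTATTGATTGAGATTAAAG"))  # True
```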
It is possible to have sequence compositions that include one hundred and sixty said molecules, or that include one hundred and seventy said molecules, or that include one hundred and eighty said molecules, or that include one hundred and ninety said molecules, or that include two hundred said molecules, or that include two hundred and twenty said molecules, or that include two hundred and forty said molecules, or that include two hundred and sixty said molecules, or that include two hundred and eighty said molecules, or that include three hundred said molecules, or that include four hundred said molecules, or that include five hundred said molecules, or that include six hundred said molecules, or that include seven hundred said molecules, or that include eight hundred said molecules, or that include nine hundred said molecules, or that include one thousand said molecules.
It is possible, in certain applications, for each molecule to be linked to a solid phase support so as to be distinguishable, by hybridization to its complement, from a mixture containing others of the molecules. Such a molecule can be linked to a defined location on a solid phase support such that the defined location for each molecule is different from the defined location for each of the other molecules.
In certain embodiments, each solid phase support is a microparticle and each said molecule is covalently linked to a different microparticle than each other different said molecule.
In another broad aspect, the invention is a composition comprising INVADER assay probes comprising tags or tag complements, wherein the composition comprises a set of 150 molecules for use as tags or tag complements wherein each molecule includes an oligonucleotide having a sequence of at least sixteen nucleotide bases wherein for any pair of sequences of the set:
In yet another broad aspect, the invention is a composition that includes a set of 150 molecules for use as tags or tag complements wherein each molecule has an oligonucleotide having a sequence of at least sixteen nucleotide bases wherein for any pair of sequences of the set:
In certain embodiments of the invention, each sequence of a composition has up to fifty bases. More preferably, however, each sequence is between sixteen and forty bases in length, or between sixteen and thirty-five bases in length, or between eighteen and thirty bases in length, or between twenty and twenty-eight bases in length, or between twenty-one and twenty-seven bases in length, or between twenty-two and twenty-six bases in length.
Often, each sequence is of the same length as every other said sequence. In particular embodiments disclosed herein, each sequence is twenty-four bases in length.
Again, it can be preferred that no sequence contains more than four contiguous bases that are identical to each other, etc., as described above.
In certain preferred embodiments, the composition is such that, under a defined set of conditions, the maximum degree of hybridization between an oligonucleotide and any complement of a different oligonucleotide of the composition does not exceed about 30% of the degree of hybridization between said oligonucleotide and its complement, more preferably 20%, more preferably 15%, more preferably 10%, more preferably 6%.
Preferably, the set of conditions results in a level of hybridization that is the same as the level of hybridization obtained when hybridization conditions include 0.2 M NaCl, 0.1 M Tris, 0.08% Triton X-100, pH 8.0 at 37° C., and the oligonucleotides are covalently linked to microparticles. Of course it is possible that these specific conditions be used for determining the level of hybridization.
It is also preferred that under such a defined set of conditions, the degree of hybridization between each oligonucleotide and its complement varies by a factor of between 1 and up to 8, more preferably up to 7, more preferably up to 6, more preferably up to 5. In a particular disclosed embodiment, the observed variance in the degree of hybridization was a factor of only 5.3, i.e., the degree of hybridization between each oligonucleotide and its complement varied by a factor of between 1 and 5.6.
In certain preferred embodiments, under the defined set of conditions, the maximum degree of hybridization between a said oligonucleotide and any complement of a different oligonucleotide of the composition does not exceed about 15%, more preferably 10%, more preferably 6%.
In one preferred embodiment, the set of conditions results in a level of hybridization that is the same as the level of hybridization obtained when hybridization conditions include 0.2 M NaCl, 0.1 M Tris, 0.08% Triton X-100, pH 8.0 at 37° C., and the oligonucleotides are covalently linked to microparticles.
Also, under the defined set of conditions, it is preferred that the degree of hybridization between each oligonucleotide and its complement varies by a factor of between 1 and up to 8, more preferably up to 7, more preferably up to 6, more preferably up to 5.
Any composition of the invention can include one hundred and sixty of the oligonucleotide molecules, or one hundred and seventy of the oligonucleotide molecules, or one hundred and eighty of the oligonucleotide molecules, or one hundred and ninety of the oligonucleotide molecules, or two hundred of the oligonucleotide molecules, or two hundred and twenty of the oligonucleotide molecules, or two hundred and forty of the oligonucleotide molecules, or two hundred and sixty of the oligonucleotide molecules, or two hundred and eighty of the oligonucleotide molecules, or three hundred of the oligonucleotide molecules, or four hundred of the oligonucleotide molecules, or five hundred of the oligonucleotide molecules, or six hundred of the oligonucleotide molecules, or seven hundred of the oligonucleotide molecules, or eight hundred of the oligonucleotide molecules, or nine hundred of the oligonucleotide molecules, or one thousand or more of the oligonucleotide molecules.
A composition of the invention can comprise a family of 5′ tags, or it can comprise a family of 5′ tag complements.
An oligonucleotide molecule belonging to a family of molecules of the invention can have one or more analogues of nucleotide bases incorporated into it, preference being given to those that undergo normal Watson-Crick base pairing.
The invention includes kits for sorting and identifying polynucleotides. Such a kit can include one or more solid phase supports each having one or more spatially discrete regions, each such region having a uniform population of substantially identical tag complements covalently attached. The tag complements are made up of a set of oligonucleotides of the invention.
The one or more solid phase supports can be a planar substrate in which the one or more spatially discrete regions is a plurality of spatially addressable regions.
The tag complements can also be coupled to microparticles. See, e.g., U.S. Pat. No. 6,916,661, which is incorporated herein by reference. Microparticles preferably each have a diameter in the range of from 5 to 40 μm.
Such a kit preferably includes microparticles that are spectrophotometrically unique, and therefore distinguishable from each other according to conventional laboratory techniques. Of course for such kits to work, each type of microparticle would generally have only one tag complement associated with it, and usually there would be a different oligonucleotide tag complement associated with (attached to) each type of microparticle.
The invention provides a method for sorting complex mixtures of molecules, e.g., in multiplex INVADER assays, by the use of families of oligonucleotide sequence tags. The families of oligonucleotide sequence tags are designed so as to provide minimal cross hybridization during the sorting process. Thus any sequence within a family of sequences will not cross hybridize with any other sequence derived from that family under appropriate hybridization conditions known by those skilled in the art. The invention is particularly useful in highly parallel processing of analytes.
The present invention includes a family of 24-mer polynucleotides that have been demonstrated to be minimally cross-hybridizing with each other. This family of polynucleotides is thus useful as a family of tags, and their complements as tag complements.
The oligonucleotide sequences that belong to families of sequences that do not exhibit cross-hybridization behavior can be derived by computer programs (described in U.S. Provisional Patent Application No. 60/181,563 filed Feb. 10, 2000). The programs use a method of generating a maximum number of minimally cross-hybridizing polynucleotide sequences that can be summarized as follows. First, a set of sequences of a given length is created based on a given number of block elements. Thus, if a family of polynucleotide sequences 24 nucleotides (24-mer) in length is desired from a set of 6 block elements, each element comprising 4 nucleotides, then a family of 24-mers is generated considering all positions of the 6 block elements. In this case, there will be 6⁶ (46,656) ways of assembling the 6 block elements to generate all possible polynucleotide sequences 24 nucleotides in length.
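The count of assemblies is simply 6 choices at each of 6 positions. The sketch below enumerates them with `itertools.product`, using six hypothetical tetramer blocks chosen only for illustration; it is not the block set used by the cited program.

```python
from itertools import product
from typing import Iterator, Sequence

# Hypothetical 4-nt block elements, chosen only for illustration.
HYPOTHETICAL_BLOCKS = ("GATT", "TGTA", "TTGA", "AAAG", "GTAG", "ATGA")


def assemble(blocks: Sequence[str], n_positions: int) -> Iterator[str]:
    """Yield every sequence built from n_positions ordered block choices."""
    for combo in product(blocks, repeat=n_positions):
        yield "".join(combo)


total = sum(1 for _ in assemble(HYPOTHETICAL_BLOCKS, 6))
print(total)  # 46656, i.e. 6**6 candidate 24-mers
```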
Constraints are imposed on the sequences and are expressed as a set of rules on the identities of the blocks such that homology between any two sequences will not exceed the degree of homology desired between these two sequences. All polynucleotide sequences generated that obey the rules are saved. Sequence comparisons are performed in order to generate an incidence matrix. The incidence matrix is presented as a simple graph and the sequences with the desired property of being minimally cross hybridizing are found from a clique of the simple graph, which may have multiple cliques. Once a clique containing a suitably large number of sequences is found, the sequences are experimentally tested to determine if it is a set of minimally cross hybridizing sequences. This method was used to obtain the 210 non cross-hybridizing tags of Table I (See Janeczko, supra).
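The incidence-matrix step can be sketched as follows. Finding a maximum clique is NP-hard in general, so this sketch swaps in a simple greedy heuristic rather than reproducing the program of the cited application; `compatible` is a hypothetical predicate standing in for the homology rules, and the result is guaranteed to be a clique but not necessarily the largest one.

```python
from itertools import combinations
from typing import Callable, List, Sequence


def incidence_matrix(seqs: Sequence[str],
                     compatible: Callable[[str, str], bool]) -> List[List[bool]]:
    """adj[i][j] is True when sequences i and j may coexist in one family."""
    n = len(seqs)
    adj = [[False] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if compatible(seqs[i], seqs[j]):
            adj[i][j] = adj[j][i] = True
    return adj


def greedy_clique(adj: List[List[bool]]) -> List[int]:
    """Grow a clique by repeatedly taking the candidate with the most
    neighbors among the remaining candidates (heuristic, not optimal)."""
    candidates = set(range(len(adj)))
    clique: List[int] = []
    while candidates:
        v = max(candidates, key=lambda u: sum(adj[u][w] for w in candidates))
        clique.append(v)
        # Keep only candidates adjacent to v, hence adjacent to the whole clique.
        candidates = {w for w in candidates if adj[v][w]}
    return clique
```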
The method includes a rational approach to the selection of groups of sequences that are used to describe the blocks. For example, there are n⁴ different tetramers that can be obtained from n different nucleotides, non-standard bases or analogues thereof. In a more preferred embodiment, there are 4⁴ or 256 possible tetramers when natural nucleotides are used. More preferably, there are 81 (3⁴) possible tetramers when only three bases, A, T and G, are used. Most preferably, there are 32 different tetramers when, in addition, every tetramer contains exactly one G.
Block sequences can be composed of a subset of natural bases, most preferably A, T and G. Sequences derived from blocks that are deficient in one base possess useful characteristics, for example, reduced potential for secondary structure formation or for cross-hybridization with nucleic acids in nature. Sets of block sequences that are most preferable in constructing families of non-cross-hybridizing tag sequences should each contribute approximately the same stability to the formation of the correct duplex as all other block sequences of the set. This should provide tag sequences that behave isothermally. This can be achieved, for example, by maintaining a constant base composition for all block sequences, such as one G and three A's or T's for each block sequence. Preferably, non-cross-hybridizing sets of block sequences will be composed of blocks of sequences that are isothermal. The block sequences should differ from each other by at least one mismatch. Guidance for selecting such sequences is provided by methods for selecting primer and/or probe sequences that can be found in published techniques (Robertson et al., Methods Mol Biol; 98:121-54 (1998); Rychlik et al., Nucleic Acids Research, 17:8543-8551 (1989); Breslauer et al., Proc. Natl. Acad. Sci., 83:3746-3750 (1986)) and the like. Additional sets of sequences can be designed by extrapolating from the original family of non-cross-hybridizing sequences by simple methods known to those skilled in the art.
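These tetramer counts can be verified by enumeration. The sketch assumes that a tetramer is an ordered string of four bases over the stated alphabet.

```python
from itertools import product
from typing import List


def tetramers(alphabet: str) -> List[str]:
    """All ordered 4-base strings over the given alphabet."""
    return ["".join(p) for p in product(alphabet, repeat=4)]


all_natural = tetramers("ACGT")  # 4**4 tetramers
three_base = tetramers("ATG")    # 3**4 tetramers
# Exactly one G: 4 positions for the G times 2**3 choices of A/T elsewhere.
one_g = [t for t in three_base if t.count("G") == 1]

print(len(all_natural), len(three_base), len(one_g))  # 256 81 32
```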
A preferred family of 100 tags is shown as SEQ ID NOs:1173 to 1272 in Table I. Characterization of the family of 100 sequence tags was performed to determine the ability of these sequences to form specific duplex structures with their complementary sequences and to assess the potential for cross-hybridization. (See Janeczko, supra). The results indicated that the members of the family are non-cross-hybridizing (tag) sequences.
The family of 100 non-cross-hybridizing sequences can be expanded by incorporating additional tetramer sequences that are used in constructing further 24-mer oligonucleotides. (See Janeczko, supra). An additional set of 73 tag sequences so obtained (SEQ ID NOs:1273 to 1345 of Table I) is composed of sequences that, when compared to any of SEQ ID NOs:1173-1272 of Table I, have no greater similarity than the sequences of the original 100 sequence tags of Table I. The set of 173 24-mer oligonucleotides was expanded again to include those having SEQ ID NOs:1346 to 1382 as follows. The 4-mers WXYW, WYXW, WXXW, WYYW, XYYX, YXYX, YXXY and XYXY, where W=G, X=A, and Y=U/T, were used in combination with the fourteen 4-mers used in the generation of SEQ ID NOs:1173-1345 to generate potential 24-base oligonucleotides. Excluded from the set were those containing the sequence patterns GG, AAAA and TTTT. To be included in the set of additional 24-mers, a sequence also had to have at least one of the 4-mers containing two G's, WXYW (GATG), WYXW (GTAG), WXXW (GAAG) or WYYW (GTTG), while also containing exactly six G's. Also required for a 24-mer to be included was that there be at most six bases between every neighboring pair of G's. Another way of putting this is that there are at most six non-G's between any two consecutive G's. Also, the G nearest the 5′-end of each oligonucleotide (the left-hand side as written in Table I) was required to occupy one of the first to seventh positions (counting the 5′-terminal position as the first position). A set of candidate sequences was obtained by eliminating any new sequence that was found to have a maximum simple homology of 16/24 or more with any of the previous set of 173 oligonucleotides (Table I, SEQ ID NOs:1173-1345). As above, an arbitrary 174th sequence was chosen and candidate sequences were eliminated by comparison therewith. In this case the permitted maximum degree of simple homology was 16/24.
A second sequence was also eliminated if there were ten consecutive matches between the two (i.e., it was notionally possible to generate a phantom sequence containing a sequence of 10 bases that is identical to a sequence in each of the sequences being compared). A second sequence was also eliminated if it was possible to generate a phantom sequence 20 bases in length or greater.
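The two elimination criteria of this paragraph can be sketched as follows: a run of ten consecutive matches is a common substring of length 10 or more, and a phantom sequence of 20 or more bases is a common subsequence of that length. The maximum simple homology criterion (16/24) is not implemented, because its exact definition is not given here; the function names and default limits are hypothetical parameters matching the text.

```python
def longest_common_run(a: str, b: str) -> int:
    """Length of the longest block of consecutive matches (common substring)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def longest_phantom_length(a: str, b: str) -> int:
    """Length of the longest phantom sequence (longest common subsequence)."""
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
            else:
                cur[j] = max(prev[j], cur[j - 1])
        prev = cur
    return prev[-1]


def eliminated(candidate: str, chosen: str,
               run_limit: int = 10, phantom_limit: int = 20) -> bool:
    """True if the candidate would be discarded relative to a chosen sequence."""
    return (longest_common_run(candidate, chosen) >= run_limit
            or longest_phantom_length(candidate, chosen) >= phantom_limit)
```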
A property of the polynucleotide sequences shown in Table I is that the maximum block homology between any two sequences is never greater than 66⅔ percent. This is because the computer algorithm by which the sequences were initially generated was designed to prevent such an occurrence. It is within the capability of a person skilled in the art, given the family of sequences of Table I, to modify the sequences, or add other sequences, while largely retaining the property of minimal cross-hybridization which the polynucleotides of Table I have been demonstrated to have.
There are 210 polynucleotide sequences given in Table I. Since all 210 of this family of polynucleotides can work with each other as a minimally cross-hybridizing set, any plurality of polynucleotides that is a subset of the 210 can also act as a minimally cross-hybridizing set of polynucleotides. An application in which, for example, 30 molecules are to be sorted using a family of polynucleotide tags and tag complements could thus use any group of 30 sequences shown in Table I. This is not to say that some subsets will not be found, in a practical sense, to be preferable to others. For example, it may be found that a particular subset is more tolerant of a wider variety of conditions under which hybridization is conducted before the degree of cross-hybridization becomes unacceptable.
It may be desirable to use polynucleotides that are shorter in length than the 24 bases of those in Table I. A family of subsequences (i.e., subframes of the sequences illustrated) based on those contained in Table I having as few as 10 bases per sequence could be chosen, so long as the subsequences are chosen to retain homological properties between any two of the sequences of the family important to their non cross-hybridization.
The selection of sequences using this approach would be amenable to a computerized process. Thus, for example, a string of 10 contiguous bases could be selected from the first 24-mer of Table I, GATTTGTATTGATTGAGATTAAAG.
A string of contiguous bases from the second 24-mer could then be selected and compared for maximum homology against the first chosen sequence:
Systematic pairwise comparison could then be carried out to determine whether the maximum homology requirement of 66⅔ percent is violated:
As can be seen, the maximum homology between the two selected subsequences is 50 percent (5 matches out of the total length of 10), and so these two sequences are compatible with each other.
A 10mer subsequence can then be selected from the third 24-mer sequence of Table I and compared pairwise with each of the first two 10mer sequences to determine its compatibility therewith, and so on; in this way a family of 10mer sequences can be developed.
It is within the scope of this invention to obtain families of sequences containing 11mer, 12mer, 13mer, 14mer, 15mer, 16mer, 17mer, 18mer, 19mer, 20mer, 21mer, 22mer and 23mer sequences by analogy to that shown for 10mer sequences.
It may be desirable to have a family of sequences in which there are sequences greater in length than the 24-mer sequences shown in Table I. It is within the capability of a person skilled in the art, given the family of sequences shown in Table I, to obtain such a family of sequences. One possible approach would be to insert, at one or more locations in each sequence, a nucleotide, non-natural base or analogue such that the longer sequence does not have greater similarity than any two of the original non-cross-hybridizing sequences of Table I. The addition of extra bases to the tag sequences should also not result in a major change in the thermodynamic properties of the tag sequences of that set; for example, the GC content must be maintained between 10% and 40% with a variance from the average of 20%. This method of inserting bases could be used to obtain a family of sequences up to 40 bases long.
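A sketch of the computerized selection just described, under stated assumptions: because the worked alignments are not reproduced here, “maximum homology” is assumed to mean the largest number of identical bases over any ungapped offset divided by the window length, and each parent contributes the first window that stays within 66⅔ percent of every window already chosen. Helper names are hypothetical.

```python
from fractions import Fraction
from typing import List, Sequence

MAX_HOMOLOGY = Fraction(2, 3)  # the 66 2/3 percent ceiling


def max_homology(a: str, b: str) -> Fraction:
    """Assumed measure: most identical bases over any ungapped offset of b
    against a, divided by len(a)."""
    best = 0
    for shift in range(-(len(a) - 1), len(b)):
        matches = sum(
            1
            for i in range(len(a))
            if 0 <= i + shift < len(b) and a[i] == b[i + shift]
        )
        best = max(best, matches)
    return Fraction(best, len(a))


def pick_subsequences(parents: Sequence[str], k: int = 10) -> List[str]:
    """Greedily take, from each parent in turn, the first k-base window that
    stays within MAX_HOMOLOGY of every window already chosen."""
    chosen: List[str] = []
    for parent in parents:
        for start in range(len(parent) - k + 1):
            window = parent[start:start + k]
            if all(max_homology(window, c) <= MAX_HOMOLOGY for c in chosen):
                chosen.append(window)
                break  # a parent with no compatible window contributes nothing
    return chosen
```

Exact fractions are used so that the 66⅔ percent boundary is compared without floating-point rounding.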
The present invention also provides INVADER assays making use of a family of 1168 24-mer polynucleotides that have been demonstrated to be minimally cross-hybridizing with each other. This family of polynucleotides is thus useful as a family of tags, and their complements as tag complements.
In order to be considered for inclusion in the family, a sequence had to satisfy a number of rules regarding its composition. For example, repetitive regions that present potential hybridization problems, such as four or more identical bases in a row (e.g., AAAA or TTTT) or pairs of adjacent Gs, were forbidden. Another rule is that each sequence contains exactly six Gs and no Cs, so that the sequences are more or less isothermal. A 24-mer was also required to have at most six bases between every neighboring pair of Gs; put another way, there are at most six non-Gs between any two consecutive Gs. Finally, the G nearest the 5′ end of each oligonucleotide (the left-hand side as written in Table II) was required to occupy one of the first seven positions counting from the 5′ terminus, and the G nearest the 3′ end (the right-hand side) was required to occupy one of the first seven positions counting from the 3′ terminus.
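The composition rules lend themselves to a simple filter. The Python sketch below is one possible reading of those rules, not the program actually used; in particular it assumes that the forbidden "pairs of Gs" are adjacent Gs (GG) and that sequences are written 5′ to 3′. The function name is hypothetical.

```python
import re


def satisfies_composition_rules(seq: str) -> bool:
    """Check a candidate 24-mer (written 5'->3') against the composition rules."""
    seq = seq.upper()
    # A 24-mer built only from A, T and G (no C).
    if len(seq) != 24 or set(seq) - set("ATG"):
        return False
    # Four or more identical bases in a row, e.g. AAAA or TTTT.
    if re.search(r"(.)\1{3}", seq):
        return False
    # Adjacent pair of Gs (our reading of "pairs of Gs").
    if "GG" in seq:
        return False
    g_positions = [i for i, base in enumerate(seq) if base == "G"]
    # Exactly six Gs.
    if len(g_positions) != 6:
        return False
    # At most six non-G bases between any two consecutive Gs.
    if any(b - a - 1 > 6 for a, b in zip(g_positions, g_positions[1:])):
        return False
    # The Gs nearest each end must lie within the seven terminal positions (0-based index <= 6).
    if g_positions[0] > 6 or (len(seq) - 1 - g_positions[-1]) > 6:
        return False
    return True
```

SEQ ID NO:1 and SEQ ID NO:2 of Table II, and the first 24-mer of Table I, all pass this filter, while variants containing AAAA, GG or a C are rejected.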
Depending on the application for which these families of sequences will be used, various rules are designed. Some rules specify constraints on sequence composition (such as those described in the previous paragraph). The other rules are used to judge whether two sequences are too similar. Based on these rules, a computer program can derive families of sequences that exhibit minimal or no cross-hybridization behavior. The exact method used by the computer program is not crucial, since various computer programs can derive similar families based on these rules. Such a program is described, for example, in international patent application No. PCT/CA 01/00141, published under WO 01/59151 on Aug. 16, 2001. Other programs can use different methods, such as the ones summarized below.
A first method of generating a maximum number of minimally cross-hybridizing polynucleotide sequences starts with any number of non-cross-hybridizing sequences, for example just one sequence, and increases the family as follows. A certain number of sequences is generated and compared to the sequences already in the family. The generated sequences that exhibit too much similarity with sequences already in the family are dropped. Among the “candidate sequences” that remain, one sequence is selected and added to the family. The other candidate sequences are then compared to the selected sequence, and the ones that show too much similarity are dropped. A new sequence is selected from the remaining candidate sequences, if any, and added to the family, and so on until there are no candidate sequences left. At this stage, the process can be repeated (generating a certain number of sequences and comparing them to the sequences in the family, etc.) as often as desired. The family obtained at the end of this method contains only minimally cross-hybridizing sequences.
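A minimal sketch of the first method follows. The similarity rule and the sequence generator are passed in as functions; the toy stand-ins used here (random 10mers over A, T and G, and a rule that flags pairs sharing more than two-thirds of their positions) are hypothetical placeholders for the composition and similarity rules described in this specification, and always taking the first remaining candidate is just one possible selection rule.

```python
import random
from typing import Callable, List, Optional


def grow_family(seed_family: List[str],
                generate: Callable[[random.Random], str],
                too_similar: Callable[[str, str], bool],
                batch_size: int = 200,
                rounds: int = 3,
                rng: Optional[random.Random] = None) -> List[str]:
    """First method: grow a family of mutually compatible sequences batch by batch."""
    rng = rng or random.Random(0)
    family = list(seed_family)
    for _ in range(rounds):
        batch = (generate(rng) for _ in range(batch_size))
        # Drop generated sequences that are too similar to any current family member.
        candidates = [c for c in batch if not any(too_similar(c, f) for f in family)]
        while candidates:
            chosen = candidates.pop(0)  # selection rule: simply take the first candidate
            family.append(chosen)
            # Drop the remaining candidates that clash with the newly added sequence.
            candidates = [c for c in candidates if not too_similar(c, chosen)]
    return family


# Hypothetical toy stand-ins for the real generator and similarity rules.
def random_10mer(rng: random.Random) -> str:
    return "".join(rng.choice("ATG") for _ in range(10))


def identity_too_similar(a: str, b: str) -> bool:
    # More than two-thirds of positions identical.
    return 3 * sum(x == y for x, y in zip(a, b)) > 2 * len(a)
```

Because every added sequence has been checked against all earlier members, and every surviving candidate is re-checked against each newly added one, the returned family is pairwise compatible under whatever rule is supplied.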
A second method of generating a maximum number of minimally cross-hybridizing polynucleotide sequences starts with a fixed-size family of polynucleotide sequences. The sequences of this family can be generated randomly or designed by some other method. Many sequences in this family may not be compatible with each other, because they show too much similarity and are not minimally cross-hybridizing. Therefore, some sequences need to be replaced by new ones with less similarity. One way to achieve this consists of repeatedly replacing a sequence of the family by the best (that is, lowest similarity) sequence among a certain number of (for example, randomly generated) sequences that are not part of the family. This process can be repeated until the family of sequences shows minimal similarity, and hence minimal cross-hybridization, or until a set number of replacements has occurred. If, at the end of the process, some sequences do not obey the similarity rules that have been set, they can be taken out of the family, thus providing a somewhat smaller family that only contains minimally cross-hybridizing sequences. Additional rules can be added to this method in order to make it more efficient, such as rules to determine which sequence will be replaced.
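The second method can be sketched the same way. Here the "similarity" of a candidate is approximated by the number of family members it is too similar to, and the member replaced at each step is the one with the most such conflicts; both choices, the function names and the toy stand-ins are assumptions for illustration.

```python
import random
from typing import Callable, List, Optional


def refine_family(family: List[str],
                  generate: Callable[[random.Random], str],
                  too_similar: Callable[[str, str], bool],
                  max_replacements: int = 500,
                  pool_size: int = 20,
                  rng: Optional[random.Random] = None) -> List[str]:
    """Second method: repeatedly replace a member by the best of a pool of new sequences,
    then drop any members that still violate the similarity rule."""
    rng = rng or random.Random(0)
    family = list(family)

    def clashes(seq: str, others: List[str]) -> int:
        return sum(too_similar(seq, o) for o in others)

    def clash_counts(fam: List[str]) -> List[int]:
        return [clashes(s, fam[:i] + fam[i + 1:]) for i, s in enumerate(fam)]

    for _ in range(max_replacements):
        counts = clash_counts(family)
        if max(counts, default=0) == 0:
            break  # every pair already satisfies the similarity rule
        worst = counts.index(max(counts))  # hypothetical rule for choosing which member to replace
        rest = family[:worst] + family[worst + 1:]
        pool = [generate(rng) for _ in range(pool_size)]
        family[worst] = min(pool, key=lambda c: clashes(c, rest))

    # Remove remaining offenders, worst first, leaving a smaller compatible family.
    while True:
        counts = clash_counts(family)
        if max(counts, default=0) == 0:
            return family
        family.pop(counts.index(max(counts)))


# Hypothetical toy stand-ins for the real generator and similarity rules.
def random_10mer(rng: random.Random) -> str:
    return "".join(rng.choice("ATG") for _ in range(10))


def identity_too_similar(a: str, b: str) -> bool:
    return 3 * sum(x == y for x, y in zip(a, b)) > 2 * len(a)
```

The final clean-up loop mirrors the last step of the method: whatever the replacement phase achieves, the returned family contains no pair that the supplied rule flags as too similar.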
Such methods have been used to obtain the 1168 non-cross-hybridizing tags of Table II (see also U.S. Patent Publication 20050186573).
One embodiment of the invention is a composition comprising molecules for use as tags or tag complements on INVADER assay probes, wherein each molecule comprises an oligonucleotide selected from a set of oligonucleotides based on the group of sequences set out in Table IIA, wherein each of the numeric identifiers 1 to 3 (see the Table) is a nucleotide base selected to be different from the others of 1 to 3. According to this embodiment, several different families of specific sets of oligonucleotide sequences are described, depending upon the assignment of bases made to the numeric identifiers 1 to 3.
The sequences contained in Table II have a mathematical relationship to each other, described as follows.
Let S and T be two DNA sequences of lengths s and t respectively. While the term “alignment” of nucleotide sequences is widely used in the field of biotechnology, in the context of this invention the term has the specific meaning illustrated here. An alignment of S and T is a 2×p matrix A (with p≧s and p≧t) such that the first (or second) row of A contains the characters of S (or T, respectively) in order, interspersed with p-s (or p-t, respectively) spaces. It is assumed that no column of the alignment matrix contains two spaces, i.e., any alignment in which a column contains two spaces is ignored and not considered here. The columns containing the same base in both rows are called matches, while the columns containing different bases are called mismatches. Each column of an alignment containing a space in its first row is called an insertion and each column containing a space in its second row is called a deletion, while a column of the alignment containing a space in either row is called an indel. Insertions and deletions within a sequence are represented by the character ‘-’. A gap is a continuous sequence of spaces in one of the rows (that is neither immediately preceded nor immediately followed by another space in the same row), and the length of a gap is the number of spaces in that gap. An internal gap is one in which the first space is preceded by a base and the last space is followed by a base, and an internal indel is an indel belonging to an internal gap. Finally, a block is a continuous sequence of matches (that is neither immediately preceded nor immediately followed by another match), and the length of a block is the number of matches in that block. In order to illustrate these definitions, two sequences S=TGATCGTAGCTACGCCGCG (of length s=19; SEQ ID NO:1169) and T=CGTACGATTGCAACGT (of length t=16; SEQ ID NO:1170) are considered. Exemplary alignment R1 of S and T (with p=23) is:
----TGATCGTAGCTACGCCGCG
CGTACGAT--T-GCAACGT----
Columns 1 to 4, 9, 10, 12 and 20 to 23 are indels, columns 6, 7, 8, 11, 13, 14, 16, 17 and 18 are matches, and columns 5, 15 and 19 are mismatches. Columns 9 and 10 form a gap of length 2, while columns 16 to 18 form a block of length 3. Columns 9, 10 and 12 are internal indels.
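The column definitions above can be made concrete in code. The sketch below classifies the columns of a two-row alignment and lists its gaps, internal indels and blocks. The R1 rows used in the check, ----TGATCGTAGCTACGCCGCG over CGTACGAT--T-GCAACGT----, are reconstructed from the column-by-column description of R1 (an arrangement consistent with every stated match, mismatch and indel column), so treat them as an assumption; the function name is hypothetical.

```python
from typing import Dict, List


def classify_columns(row1: str, row2: str) -> Dict[str, List]:
    """Classify the columns of a two-row alignment, using 1-based column numbers and '-' for a space."""
    if len(row1) != len(row2):
        raise ValueError("alignment rows must have equal length")
    p = len(row1)
    out: Dict[str, List] = {"matches": [], "mismatches": [], "indels": [],
                            "gaps": [], "internal_indels": [], "blocks": []}
    for c, (x, y) in enumerate(zip(row1, row2), start=1):
        if x == "-" and y == "-":
            raise ValueError("no column may contain two spaces")
        if x == "-" or y == "-":
            out["indels"].append(c)
        elif x == y:
            out["matches"].append(c)
        else:
            out["mismatches"].append(c)
    # A gap is a maximal run of spaces in one row; it is internal if bases flank it in that row.
    for row in (row1, row2):
        c = 0
        while c < p:
            if row[c] != "-":
                c += 1
                continue
            start = c
            while c < p and row[c] == "-":
                c += 1
            out["gaps"].append((start + 1, c))  # (first column, last column), 1-based
            if start > 0 and c < p:
                out["internal_indels"].extend(range(start + 1, c + 1))
    # A block is a maximal run of consecutive match columns, stored as (first, last).
    for c in out["matches"]:
        if out["blocks"] and out["blocks"][-1][1] == c - 1:
            out["blocks"][-1] = (out["blocks"][-1][0], c)
        else:
            out["blocks"].append((c, c))
    out["gaps"].sort()
    out["internal_indels"].sort()
    return out
```

Applied to the reconstructed R1 rows, this reproduces the classification stated above: the gap spanning columns 9 and 10, the block spanning columns 16 to 18, and internal indels at columns 9, 10 and 12.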
A score is assigned to the alignment A of two sequences by assigning weights to each of matches, mismatches and gaps as follows:
the reward for a match m,
the penalty for a mismatch mm,
the penalty for opening a gap og,
the penalty for extending a gap eg.
Once these values are set, a score is assigned to each column of the alignment according to the following rules:
1. assign 0 to each column preceding the first match and to each column following the last match.
2. for each of the remaining columns, assign m if it is a match, -mm if it is a mismatch, -og-eg if it is the first indel of a gap, and -eg if it is an indel but not the first indel of a gap.
The score of the alignment A is the sum of the scores of its columns. An alignment is said to be of maximum score if no other alignment of the same two sequences has a higher score (with the same values of m, mm, og and eg). A person knowledgeable in the field will recognize this method of scoring an alignment as scoring a local (as opposed to global) alignment with affine gap penalties (that is, gap penalties that can distinguish between the first indel of a gap and the other indels). It will be appreciated that the total number of indels that open a gap is the same as the total number of gaps and that an internal indel is not one of those assigned a 0 in rule (1) above. It will also be noted that foregoing rule (1) assigns a 0 for non-internal mismatches. An internal mismatch is a mismatch that is preceded and followed (not necessarily immediately) by a match.
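Rules 1 and 2 translate directly into a scoring function. In this Python sketch (function name hypothetical), mismatches and indels are subtracted because mm, og and eg are penalties. With m, mm, og and eg set to 3, 1, 2 and 1, it returns 19 for the rows ----TGATCGTAGCTACGCCGCG and CGTACGAT--T-GCAACGT----, which is the score quoted for R1; those rows are reconstructed from the column description of R1 and are an assumption.

```python
def alignment_score(row1: str, row2: str, m: int, mm: int, og: int, eg: int) -> int:
    """Score a two-row alignment ('-' = space) with scoring rules 1 and 2."""
    if len(row1) != len(row2):
        raise ValueError("alignment rows must have equal length")
    if any(x == "-" and y == "-" for x, y in zip(row1, row2)):
        raise ValueError("no column may contain two spaces")
    match_cols = [c for c, (x, y) in enumerate(zip(row1, row2)) if x == y and x != "-"]
    if not match_cols:
        return 0  # rule 1 assigns 0 to every column when there is no match
    first, last = match_cols[0], match_cols[-1]
    score = 0  # rule 1: columns before the first / after the last match contribute 0
    for c in range(first, last + 1):
        x, y = row1[c], row2[c]
        if x == "-" or y == "-":
            row = row1 if x == "-" else row2
            # c > first here, so row[c - 1] exists; the indel opens a gap when the
            # previous column has no space in the same row.
            score -= (og + eg) if row[c - 1] != "-" else eg
        elif x == y:
            score += m
        else:
            score -= mm
    return score
```

Changing the weights to {6, 6, 0, 6} gives 30 for the same rows, since each of the three internal indels and the internal mismatch then costs 6.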
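As an illustration, if the values of m, mm, og and eg are set to 3, 1, 2 and 1 respectively, alignment R1 has a score of 19. Columns 1 to 5 precede the first match and columns 19 to 23 follow the last match, so they score 0; the nine matches contribute 9×3=27; the internal mismatch in column 15 contributes −1; the gap formed by columns 9 and 10 contributes −(2+1)−1=−4; and the gap at column 12 contributes −(2+1)=−3. The total is 27−1−4−3=19.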
Note that for two given sequences S and T, there are numerous alignments. There are often several alignments of maximum score.
Based on these alignments, five sequence similarity measures are defined as follows. For two sequences S and T, and weights {m, mm, og, eg}:
Notice that, by definition, the following inequalities between these similarity measures are obtained: M4≦M3 and M5≦M3. Also, in order to determine M2 it is sufficient to determine the maximum length of a block over all alignments free of internal indels. For two given sequences, the values of M3 to M5 can vary depending on the values of the weights {m, mm, og, eg}, but not M1 and M2.
For weights {3, 1, 2, 1}, the illustrated alignment is not a maximum score alignment of the two example sequences, but for weights {6, 6, 0, 6} it is; hence this alignment shows that, for these two example sequences and weights {6, 6, 0, 6}, M2≧3, M3≧9, M4≧6 and M5≧6. In order to determine the exact values of M1 to M5, all the necessary alignments need to be considered. M1 and M2 can be found by examining the s+t−1 alignments free of internal indels, where s and t are the lengths of the two sequences considered. Dynamic programming algorithms, implemented on a computer, can determine M3 to M5 very quickly. Using a computer program to do these calculations, it was determined that:
with the weights {6, 6, 0, 6}, M1=8, M2=4, M3=10, M4=6 and M5=6;
with the weights {3, 1, 2, 1}, M1=8, M2=4, M3=10, M4=6 and M5=4.
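Both computations mentioned above can be sketched in Python. The first function scans the s+t−1 alignments free of internal indels. Because the definitions of M1 and M2 are not reproduced in this excerpt, it assumes that M1 is the largest number of matches and M2 the longest block found among those alignments; under that assumption it returns M1=8 and M2=4 for the example sequences S and T, in agreement with the values given above. The second function computes the maximum alignment score defined by rules 1 and 2 using the standard local-alignment dynamic program with affine gap penalties (Smith-Waterman with Gotoh's recurrence). It finds the maximum score needed to recognize maximum-score alignments, but does not by itself compute M3 to M5. Function names are hypothetical and the gap weights are assumed non-negative.

```python
from typing import Tuple


def ungapped_m1_m2(s: str, t: str) -> Tuple[int, int]:
    """Scan the s+t-1 alignments free of internal indels (one per diagonal offset d) and
    return (most matches on any diagonal, longest run of consecutive matches)."""
    best_matches = best_block = 0
    for d in range(-(len(t) - 1), len(s)):  # d = position in s minus position in t
        matches = run = 0
        for j in range(max(0, -d), min(len(t), len(s) - d)):
            if s[j + d] == t[j]:
                matches += 1
                run += 1
                best_block = max(best_block, run)
            else:
                run = 0
        best_matches = max(best_matches, matches)
    return best_matches, best_block


def max_alignment_score(s: str, t: str, m: int, mm: int, og: int, eg: int) -> int:
    """Maximum alignment score under scoring rules 1 and 2, computed as a local alignment
    with affine gap penalties (first indel of a gap costs og + eg, later ones eg)."""
    neg = float("-inf")
    rows, cols = len(s) + 1, len(t) + 1
    H = [[0] * cols for _ in range(rows)]    # best score of an alignment ending at (i, j)
    E = [[neg] * cols for _ in range(rows)]  # ... whose last column has a space in the s row
    F = [[neg] * cols for _ in range(rows)]  # ... whose last column has a space in the t row
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            E[i][j] = max(H[i][j - 1] - (og + eg), E[i][j - 1] - eg)
            F[i][j] = max(H[i - 1][j] - (og + eg), F[i - 1][j] - eg)
            diag = H[i - 1][j - 1] + (m if s[i - 1] == t[j - 1] else -mm)
            H[i][j] = max(0, diag, E[i][j], F[i][j])  # 0 = start a fresh local alignment
            best = max(best, H[i][j])
    return int(best)
```

Since every alignment's score is at most the maximum, the dynamic program returns at least 19 for the example sequences with weights {3, 1, 2, 1}, the score of alignment R1.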
According to the preferred embodiment of this invention, two sequences S and T each of length 24 are too similar if at least one of the following happens:
M1>16 or
M2>13 or
M3>20 or
M4>16 or
M5>19
when using any of the weight sets {6, 6, 0, 6}, {6, 6, 5, 1}, {6, 2, 5, 1} or {6, 6, 6, 0}. In other words, the five similarity measures between S and T are determined for each of these four sets of weights and checked against the thresholds (for a total of 20 tests).
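The 20-test criterion can be expressed as a small predicate. In this sketch the similarity measures are supplied precomputed, one (M1, ..., M5) tuple per weight set, because computing M3 to M5 depends on definitions not shown here; the dictionary layout and function name are assumptions.

```python
from typing import Dict, Sequence, Tuple

Weights = Tuple[int, int, int, int]  # (m, mm, og, eg)

WEIGHT_SETS: Tuple[Weights, ...] = ((6, 6, 0, 6), (6, 6, 5, 1), (6, 2, 5, 1), (6, 6, 6, 0))
PREFERRED_THRESHOLDS = (16, 13, 20, 16, 19)  # limits for M1..M5 in the preferred embodiment


def too_similar(measures: Dict[Weights, Sequence[int]],
                thresholds: Sequence[int] = PREFERRED_THRESHOLDS) -> bool:
    """measures[w] holds (M1, ..., M5) computed with weight set w. Returns True if any of
    the 4 x 5 = 20 tests finds a measure strictly above its threshold."""
    for w in WEIGHT_SETS:
        if any(value > limit for value, limit in zip(measures[w], thresholds)):
            return True
    return False
```

Passing a different threshold tuple, such as (19, 17, 21, 18, 20), applies the relaxed definition of "too similar" described next with the same four weight sets.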
The above thresholds of 16, 13, 20, 16 and 19, and the above sets of weights, were used to obtain the sequences listed in Table I. Additional sequences can thus be added to those of Table I as long as the above alignment rules are obeyed for all sequences.
It is also possible to alter the thresholds for M1, M2, etc., while remaining within the scope of this invention. It is thus possible to substitute or add sequences to those of Table II, or more generally to those of Table IIA, to obtain other sets of sequences that would also exhibit reasonably low cross-hybridization. More specifically, a set of 24-mer sequences in which there are no two sequences that are too similar, where too similar is defined as:
M1>19 or
M2>17 or
M3>21 or
M4>18 or
M5>20
when using any of the weight sets {6, 6, 0, 6}, {6, 6, 5, 1}, {6, 2, 5, 1} or {6, 6, 6, 0}, would also exhibit low cross-hybridization. Reducing any of the threshold values provides sets of sequences with even lower cross-hybridization. Alternatively, ‘too similar’ can also be defined as:
M1>19 or
M2>17 or
M3>21 or
M4>18 or
M5>20
when using weights {3, 1, 2, 1}. Other combinations of weights can also lead to sets of sequences with low cross-hybridization.
Notice that using weights {6, 6, 0, 6} is equivalent to using weights {1, 1, 0, 1}, or weights {2, 2, 0, 2}, . . . (that is, for any two sequences, the values of M1 to M5 are exactly the same whether weights {6, 6, 0, 6} or {1, 1, 0, 1} or {2, 2, 0, 2} or any other multiple of {1, 1, 0, 1} is used).
When dealing with sequences of length other than 24, or sequences of various lengths, the definition of similarity can be adjusted. Such adjustments are obvious to persons skilled in the art. For example, when comparing a sequence of length L1 with a sequence of length L2 (with L1<L2), they can be considered too similar when
M1>19/24×L1
M2>17/24×L1
M3>21/24×L1
M4>18/24×L1
M5>20/24×L1
when using any of the weight sets {6, 6, 0, 6}, {6, 6, 5, 1}, {6, 2, 5, 1} or {6, 6, 6, 0}.
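The length-scaled thresholds can be checked without floating-point arithmetic by multiplying both sides of each inequality by 24. The following is a minimal sketch applied to the measures obtained with one weight set (it would be called once for each of the four sets). The function name and argument layout are hypothetical, and the shorter of the two lengths is taken as L1.

```python
from typing import Sequence

LIMITS_AT_24 = (19, 17, 21, 18, 20)  # thresholds for M1..M5 when both sequences are 24-mers


def too_similar_scaled(measures: Sequence[int], len1: int, len2: int,
                       limits: Sequence[int] = LIMITS_AT_24) -> bool:
    """Return True if any Mk exceeds (limit_k / 24) x L1, where L1 is the shorter length.

    24 * Mk > limit_k * L1 is the same inequality multiplied through by 24, which keeps
    the arithmetic in integers and avoids rounding problems at the boundary.
    """
    shorter = min(len1, len2)
    return any(24 * value > limit * shorter for value, limit in zip(measures, limits))
```

For two 24-mers this reduces to the thresholds 19, 17, 21, 18 and 20 given above; for a 12-mer compared with a longer sequence, M1 may be at most 9 (the limit is 9.5) and M4 at most 9 (the limit is exactly 9).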
Polynucleotide sequences can be composed of a subset of natural bases, most preferably A, T and G. Sequences that are deficient in one base possess useful characteristics, for example, a reduced potential for secondary structure formation or for cross-hybridization with naturally occurring nucleic acids. It is also preferable to have tag sequences that behave isothermally. This can be achieved, for example, by maintaining a constant base composition for all sequences, such as six Gs and eighteen As or Ts for each sequence. Additional sets of sequences can be designed by extrapolating from the original family of non-cross-hybridizing sequences by simple methods known to those skilled in the art.
There are 1168 polynucleotide sequences given in Table II. This family of 1168 sequence tags has been shown to form specific duplex structures with the complementary sequences, with low potential for cross-hybridization within the sequence set (see, e.g., U.S. Patent Publication 20050186573).
Since all 1168 of this family of polynucleotides can work with each other as a minimally cross-hybridizing set, then any plurality of polynucleotides that is a subset of the 1168 can also act as a minimally cross-hybridizing set of polynucleotides. An application in which, for example, 30 molecules are to be sorted using a family of polynucleotide tags and tag complements could thus use any group of 30 sequences shown in Table II. This is not to say that some subsets may be found in a practical sense to be more preferred than others. For example, it may be found that a particular subset is more tolerant of a wider variety of conditions under which hybridization is conducted before the degree of cross-hybridization becomes unacceptable.
It may be desirable to use polynucleotides that are shorter than the 24 bases of those in Table II. A family of subsequences (i.e., subframes of the sequences illustrated) based on those contained in Table II, having as few as 10 bases per sequence, could be chosen, provided the subsequences are chosen so that any two of them retain the similarity properties that are important to their non-cross-hybridization.
The selection of sequences using this approach is amenable to a computerized process. Thus, for example, a string of 10 contiguous bases could be selected from the first 24-mer of Table II, AAATTGTGAAAGATTGTTTGTGTA (SEQ ID NO:1).
A string of contiguous bases at the same positions in the second 24-mer, GTTAGAGTTAATTGTATTTGATGA (SEQ ID NO:2 of Table II), could then be selected and compared for similarity against the first chosen subsequence. A systematic pairwise comparison could then be carried out to determine whether the similarity requirements are violated. If the pair of subsequences does not violate any set property, a 10-mer subsequence can be selected from the third 24-mer sequence of Table II and compared, in a pairwise fashion, to each of the first two 10-mer sequences to determine its compatibility with them, and so on. In this way a family of 10-mer sequences may be developed.
It is within the scope of this invention to obtain families of sequences containing 11mer, 12mer, 13mer, 14mer, 15mer, 16mer, 17mer, 18mer, 19mer, 20mer, 21mer, 22mer and 23mer sequences by analogy to that shown for 10mer sequences. It may also be desirable to have a family of sequences that are longer than the 24-mer sequences shown in Table II. Given the family of sequences shown in Table II, a person skilled in the art could obtain such a family. One approach is to insert a nucleotide, non-natural base or analogue at one or more locations in each sequence, such that the longer sequences are no more similar to one another than any two of the original non-cross-hybridizing sequences of Table II, and such that the extra bases do not substantially change the thermodynamic properties of the tag set; for example, the GC content must be maintained between 10% and 40%, with a variance from the average of 20%. This method of inserting bases could be used to obtain, for example, a family of sequences up to 40 bases long.
Given a particular family of sequences that can be used as a family of tags (or tag complements), e.g., those of Table II, a skilled person will readily recognize variant families that work equally as well.
Again taking the sequences of Table II for example, every T could be converted to an A and vice versa and no significant change in the cross-hybridization properties would be expected to be observed. This would also be true if every G were converted to a C.
Also, all of the sequences of a family could be taken to be constructed in the 5′-3′ direction, as is the convention, or all of them could be constructed in the opposite direction (3′-5′).
There are additional modifications that may be carried out. For example, C has not been used in the family of sequences. Substitution of C in place of one or more G's of a particular sequence would yield a sequence that is at least as low in homology with every other sequence of the family as was the particular sequence chosen for modification. It is thus possible to substitute C in place of one or more G's in any of the sequences shown in Table II. Analogously, C can be substituted in place of one or more A's, or in place of one or more T's.
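The reasoning behind these variants can be checked with a simple position-by-position match count, used here as a stand-in for the alignment-based similarity measures (an assumption of this sketch). Converting every A to T and vice versa relabels the alphabet consistently, so two positions match after the conversion exactly when they matched before; substituting C for a G in one sequence can only remove matches, because no other sequence contains C. Function names are hypothetical; the test sequences are the 24-mers quoted earlier in this section.

```python
from typing import List


def matches(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences carry the same base."""
    return sum(x == y for x, y in zip(a, b))


def swap_a_t(seq: str) -> str:
    """Convert every A to T and every T to A (a consistent relabelling of the alphabet)."""
    return seq.translate(str.maketrans("AT", "TA"))


def g_to_c(seq: str, positions: List[int]) -> str:
    """Substitute C for the G at each given 0-based position."""
    chars = list(seq)
    for p in positions:
        if chars[p] != "G":
            raise ValueError(f"position {p} does not hold a G")
        chars[p] = "C"
    return "".join(chars)
```

The same argument applies to converting every G to a C across the whole family, since that is also a one-to-one relabelling of the bases in use.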
It is preferred that the sequences of a given family are of the same, or roughly the same, length. Preferably, all the sequences of a family of sequences of this invention have a length that is within five bases of the average length of the family. More preferably, all sequences are within four bases of the average length. Even more preferably, all or almost all sequences are within three bases of the average length of the family. Better still, all or almost all sequences have a length within two bases of the average length of the family, and even better still, within one base of it.
It is also possible for a person skilled in the art to derive sets of sequences from the family of sequences described in this specification and remove sequences that would be expected to have undesirable hybridization properties.
Given a particular family of sequences that can be used as a family of tags (or tag complements), e.g., those of Table I or Table II, or the combined sequences of these two tables, a skilled person will readily recognize variant families that work equally as well.
Again taking the sequences of Table I for example, every T could be converted to an A and vice versa and no significant change in the cross-hybridization properties would be expected to be observed. This would also be true if every G were converted to a C.
Also, all of the sequences of a family could be taken to be constructed in the 5′-3′ direction, as is the convention, or all of them could be constructed in the opposite direction (3′-5′).
There are additional modifications that can be carried out. For example, C has not been used in the family of sequences. Substitution of C in place of one or more T's of a particular sequence would yield a sequence that is at least as low in homology with every other sequence of the family as the particular sequence chosen to be modified. It is thus possible to substitute C in place of one or more T's in any of the sequences shown in Table I. Analogously, C can be substituted in place of one or more A's.
It is preferred that the sequences of a given family are of the same, or roughly the same, length. Preferably, all the sequences of a family of sequences of this invention have a length that is within five bases of the average length of the family. More preferably, all sequences are within four bases of the average length. Even more preferably, all or almost all sequences are within three bases of the average length of the family. Better still, all or almost all sequences have a length within two bases of the average length of the family.
It is also possible for a person skilled in the art to derive sets of sequences from the family of sequences that is the subject of this patent and remove sequences that would be expected to have undesirable hybridization properties.
Preferably, oligonucleotide sequences of the invention are synthesized directly by standard phosphoramidite synthesis approaches and the like (Caruthers et al, Methods in Enzymology; 154, 287-313: 1987; Lipshutz et al, Nature Genet.; 21, 20-24: 1999; Fodor et al, Science; 251, 763-773: 1991). Alternative chemistries involving non-natural bases such as peptide nucleic acids or modified nucleosides that offer advantages in duplex stability may also be used (Hacia et al, Nucleic Acids Res.; 27, 4034-4039: 1999; Nguyen et al, Nucleic Acids Res.; 27, 1492-1498: 1999; Weiler et al, Nucleic Acids Res.; 25, 2792-2799: 1997). It is also possible to synthesize the oligonucleotide sequences of this invention with alternate nucleotide backbones such as phosphorothioate or phosphoramidate nucleotides. Methods involving synthesis through the stepwise addition of blocks of sequence may also be employed (Lyttle et al, Biotechniques; 19, 274-280: 1995). Synthesis may be carried out directly on the substrate to be used as a solid phase support for the application, or the oligonucleotide can be cleaved from the support for use in solution or coupling to a second support.
There are several different solid phase supports that can be used with the invention. They include, but are not limited to, slides, plates, chips, membranes, beads, microparticles and the like. The solid phase supports can also vary in the materials of which they are composed, including plastic, glass, silicon, nylon, polystyrene, silica gel, latex and the like. The surface of the support is coated with the sequences complementary to the tags (the tag complements).
In some embodiments, the family of tag complement sequences are derivatized to allow binding to a solid support. Many methods of derivatizing a nucleic acid for binding to a solid support are known in the art (Hermanson G., Bioconjugate Techniques; Acad. Press: 1996). The sequence tag may be bound to a solid support through covalent or non-covalent bonds (Iannone et al, Cytometry; 39: 131-140, 2000; Matson et al, Anal. Biochem.; 224: 110-106, 1995; Proudnikov et al, Anal Biochem; 259: 34-41, 1998; Zammatteo et al, Analytical Biochemistry; 280:143-150, 2000). The sequence tag can be conveniently derivatized for binding to a solid support by incorporating modified nucleic acids in the terminal 5′ or 3′ locations.
A variety of moieties useful for binding to a solid support (e.g., biotin, antibodies, and the like), and methods for attaching them to nucleic acids, are known in the art. For example, an amine-modified nucleic acid base (available from, e.g., Glen Research) may be attached to a solid support (for example, Covalink-NH, a polystyrene surface grafted with secondary amino groups, available from Nunc) through a bifunctional crosslinker (e.g., bis(sulfosuccinimidyl suberate), available from Pierce). Additional spacing moieties can be added to reduce steric hindrance between the capture moiety and the surface of the solid support.
A family of oligonucleotide tag sequences can be conjugated to a population of analytes, most preferably polynucleotide sequences, in several different ways including, but not limited to, direct chemical synthesis, chemical coupling, ligation, amplification, and the like. Sequence tags that have been synthesized with primer sequences can be used for enzymatic extension of the primer on the target, for example in PCR amplification.
The families of INVADER assay probes comprising non-cross-hybridizing 5′ tag sequences may be provided in kits for use in, for example, genetic analysis. Such kits include one or more probes comprising non-cross-hybridizing sequences. Reagents may include enzymes, nucleotides, fluorescent labels and the like that would be required for specific applications. Instructions for correct use of the kit for a given application may be provided.
Filed herewith on compact disk, and expressly incorporated herein by reference, is a Sequence Listing provided as a file entitled “10956.txt,” 427 kb in size, created Jun. 7, 2006.
All publications and patents mentioned in the above specification are herein incorporated by reference. Various modifications and variations of the described compositions and methods of the invention will be apparent to those skilled in the art without departing from the scope and spirit of the invention. Although the invention has been described in connection with specific preferred embodiments, it should be understood that the invention as claimed should not be unduly limited to such specific embodiments. Indeed, various modifications of the described modes for carrying out the invention that are obvious to those skilled in molecular biology, genetics, or related fields are intended to be within the scope of the following claims.