The invention relates to the field of aptamers and their use.
Aptamers are nucleic acids or peptide molecules that bind targets with an affinity and specificity that rival antibody-antigen interactions. DNA/RNA aptamers promise to provide a cost-effective alternative to antibodies because there is no need for selection in animals or cell lines, they have shelf-lives of years, and they can be easily modified to reduce cross-reactivity with undesired targets. This ability to bind and, in some instances, alter their targets' functions has earned aptamers potential applications in biosensor development, affinity chromatography and, recently, therapeutics and diagnostics.
Traditionally, artificial aptamer sequences are discovered by SELEX (Systematic Evolution of Ligands by EXponential Enrichment) and other closely related methods of in vitro evolution. Starting libraries have relatively long oligomers of DNA/RNA sequences (80-120 nt) with central randomized regions (30-120 nt). These are sparsely sampled libraries with a probability of ~10^-4 that any particular sequence occurs in a typical starting pool for a randomized 30 mer, and of ~10^-29 with randomized 70 mers. This means that such SELEX experiments begin with single copies of those sequences that are present by random chance. Evolution occurs via the selective pressure of binding to the target followed by amplification of the survivors; selection and amplification are typically repeated for 5-20 rounds. Winners are found by cloning and sequencing, after which a minimal core binding sequence is sought by truncating segments of the parent aptamer that are not needed for the interaction with the target.
Despite the wide adoption of the SELEX procedure for the discovery of DNA/RNA aptamers, only a few hundred target-specific aptamers have been discovered to date using this method, compared with the discovery of thousands of antibodies during the same period. This limited success may stem primarily from a significant number of drawbacks with the SELEX selection method itself. First, the universe of possible sequences in SELEX experiments (e.g., 1×10^18 for a 30 nucleotide random stretch) is so large that direct synthesis and screening of all sequences is impossible, even given the high-throughput advancements made in DNA/RNA synthesizer instrumentation. Second, even when SELEX identifies nucleic acid sequences with extremely high affinity for the target, these sequences are generally relatively long (typically 80-150 monomer units in length), and often have complex internal structures (secondary structures). Such long folded molecules are often disadvantageous for a variety of applications, where cost and ease of production and manipulation are better for short (20-40 unit), defined binding domains. Third, the SELEX methodology of repeated rounds of selection and amplification is cumbersome, time-consuming and expensive.
For the foregoing reasons, there is an unmet need for improved high-throughput methods of aptamer discovery.
A procedure called high throughput screening of aptamers (HTSA) is described for the rapid discovery of relatively small, structurally-defined nucleic acid sequences that bind targets with high affinity and selectivity.
In one aspect, the invention provides an aptamer library comprising a plurality of aptamer candidates. Each aptamer candidate is substantially of the same length and has a primary structure and a pre-selected secondary structure. The primary structure comprises at least a variable nucleotide sequence where nucleotides at m number of positions are varied, and a secondary structure comprising at least a single-stranded region and a double-stranded region, where the variable sequence is at least part of the single-stranded region, and where, for every 100 pmol of aptamer candidates, an average of at least about three copies of each possible variable sequence is represented.
In various embodiments, the pre-selected secondary structure is a hairpin loop, a bulge loop, an internal loop, a multi-branch loop, a pseudoknot or combinations thereof.
The variable sequence can have randomized nucleotides at some positions and invariant nucleotides at other positions, or randomized nucleotides at all positions. The variable sequence can be completely within the single-stranded region, or can comprise nucleotides at positions in the double-stranded region that are no more than three nucleotides away from an end of the single-stranded region.
In some embodiments, for every 100 pmol of the aptamer candidates, an average of at least about six, twelve, or a higher number of copies of each possible variable sequence is represented. The m number of positions can be at least about 5. Each aptamer candidate can be about 50-60 nucleotides in length and m can be about 25, 22 or less. In one feature, each aptamer candidate has a common secondary structure. Each aptamer candidate may comprise an oligonucleotide selected from DNAs, RNAs, PNA, modified nucleotides, and mixtures of any of the above. In some embodiments, each aptamer candidate is no more than 100, 75 or 50 nucleotides in length.
In one embodiment, the aptamer library comprises at least 10^9 distinct members. In one feature, the aptamer library may comprise a plurality of concatenated aptamers that can include two or more identical secondary structures, two or more non-identical secondary structures or a combination of identical and non-identical secondary structures.
In one aspect, the invention provides a microarray chip having the above-described aptamer library or other library embodiments of the present invention.
In another aspect, the invention further provides a method of using the library of the invention, specifically, a method for identifying an aptamer that binds to a target. Naturally, features of the library also apply to methods involving the library and are not repeated here. The method includes the steps of (a) providing an aptamer library comprising a plurality of aptamer candidates, each aptamer candidate having a primary structure of substantially the same length and a pre-selected secondary structure, the primary structure comprising at least a variable nucleotide sequence where nucleotides at m number of positions are being varied, the secondary structure comprising at least a single-stranded region and a double-stranded region, wherein the variable sequence is at least part of the single-stranded region and wherein for every 100 pmol of the aptamer candidates, an average of at least about three copies of each possible variable sequence is represented; (b) contacting the aptamer library with a target under a buffer condition that allows binding between members of the aptamer library and the target; (c) isolating at least a member of the aptamer library that is bound to the target, and (d) determining the variable sequence of the bound aptamer candidate.
In one embodiment, the above method includes an amplification step after step (c).
In one embodiment, step (c) comprises isolating a sub-fraction of the aptamer library bound to the target and wherein the method further comprises a step (e) of ranking the affinity of the bound candidate aptamers for the target according to their frequency of occurrence within the sub-fraction, as evidenced by result from step (d).
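The frequency-based ranking of step (e) can be sketched in a few lines. The Python below is an illustration only, not the applicants' implementation: the function name `rank_by_frequency` and the example reads are hypothetical, and it simply treats the number of times a variable sequence is read within the bound sub-fraction as a proxy for relative affinity, as this embodiment describes.

```python
from collections import Counter

def rank_by_frequency(variable_sequences):
    """Return (sequence, count) pairs, most frequently observed first.

    Ties are broken alphabetically so that the ordering is deterministic.
    """
    counts = Counter(variable_sequences)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical variable-sequence reads recovered from the bound sub-fraction.
reads = ["GGTTGG", "ACGTAC", "GGTTGG", "TTTAGG", "GGTTGG", "ACGTAC"]
ranking = rank_by_frequency(reads)
for sequence, count in ranking:
    print(sequence, count)
```

In practice the counts would come from the sequencing of step (d), and background reads would have to be taken into account before a ranking is interpreted.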
In one feature, the variable sequence has randomized nucleotides at some positions and invariant nucleotides at other positions. In another feature, the variable sequence comprises randomized nucleotides at all positions.
In an embodiment, the above-described method of identifying an aptamer that binds to a target comprises a washing step after the contacting step, wherein the aptamer candidates that do not bind to the target are washed away by a buffer. The buffer condition of the washing step may be no more stringent than the buffer condition in the contacting step or the washing may occur in the presence of a competing oligonucleotide that comprises at least a part of the secondary structure of the aptamer candidates.
In some embodiments, the target comprises a polypeptide sequence, a nucleotide sequence, a lipid or a carbohydrate. In other embodiments, the target comprises a peptide, nucleotide, lipid or carbohydrate moieties at the surface of a virus, or cell. The target can be immobilized on a solid support. In one feature, the target comprises a small molecule. The small molecule may have a molecular weight of 1000 or less. In one feature, the target may comprise a label.
In one feature, step (d) of the method is accomplished through high throughput sequencing technology. In an embodiment, the high throughput sequencing technology is capable of generating at least 10,000 sequences in the library subsequent to step (c).
In yet another aspect, the invention provides a method of identifying a candidate aptamer sequence that binds to a target, comprising the steps of (a) providing an aptamer library comprising a plurality of aptamer candidates, each aptamer candidate having a primary structure and a pre-selected secondary structure, the primary structure comprising at least a variable nucleotide sequence, where nucleotides at m number of positions are being varied, the secondary structure comprising at least a single-stranded region and a double-stranded region, wherein the variable sequence is at least part of the single-stranded region and wherein for every 100 pmol of the aptamer candidates, an average of at least about three copies of each possible variable sequence is represented, (b) dividing the aptamer library into pools of aptamer candidates, each pool comprising 4^m aptamer candidates, wherein m represents the number of randomized nucleotides within the variable sequence of each aptamer candidate, (c) affixing each of the pools to a distinct feature on a support, (d) contacting the support with a target, (e) identifying features that exhibit sufficient binding to the target above a pre-determined level, (f) subsequently devising sub-pools from any candidate pool associated with each feature identified in step (e), each of the sub-pools comprising a fraction of distinct candidate aptamers contained in the candidate pool, (g) repeating steps (c) through (f) until at least one of the sub-pools has only aptamer candidates of the same variable sequence, and (h) identifying the variable sequence of the aptamer candidate in the at least one sub-pool obtained in step (g).
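The iterative pooling of steps (b) through (g) can be illustrated with a small simulation. This is a sketch under stated assumptions, not the claimed procedure itself: the function names, the choice of four sub-pools per round, and the `binds` callback (which stands in for the physical readout of a feature exceeding the pre-determined binding level) are all hypothetical.

```python
from itertools import product

def all_variable_sequences(m, alphabet="ACGT"):
    """Enumerate all 4^m variable sequences with m randomized positions."""
    return ["".join(p) for p in product(alphabet, repeat=m)]

def split_pool(pool, n_subpools):
    """Split a pool into at most n_subpools contiguous sub-pools."""
    size = -(-len(pool) // n_subpools)  # ceiling division
    return [pool[i:i + size] for i in range(0, len(pool), size)]

def deconvolute(pool, binds, n_subpools=4):
    """Subdivide positive pools until each holds a single variable sequence.

    `binds` is a hypothetical stand-in for the experimental readout: a pool
    counts as positive if at least one of its members binds the target.
    """
    hits = []
    pending = [pool] if any(binds(seq) for seq in pool) else []
    while pending:
        current = pending.pop()
        if len(current) == 1:
            hits.append(current[0])
            continue
        for sub in split_pool(current, n_subpools):
            if any(binds(seq) for seq in sub):
                pending.append(sub)
    return sorted(hits)

# Toy example: m = 4 gives 256 sequences, two of which are assumed binders.
binders = {"GGTT", "ACGT"}
library = all_variable_sequences(4)
found = deconvolute(library, lambda seq: seq in binders)
print(found)
```

With four sub-pools per round, a pool of 4^m sequences is resolved to single sequences in m rounds in this toy model, which ignores experimental noise and weak binders.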
In some embodiments, the solid support is a microarray chip or a filter substrate. In an embodiment, the sub-pool is identified through gel shift.
In one embodiment, the number of the randomized nucleotides, m, within the variable sequence of each aptamer candidate is about 25, 22 or less.
In a further aspect, the invention provides a method for refining the desirable properties of a template aptamer by randomizing certain segments of the aptamer sequence. The method comprises providing a template aptamer, introducing randomized sequences into a segment of the template aptamer, applying any one of the above-described methods of identifying a candidate aptamer sequence that binds to a target, and determining which of the randomized sequences within the segment increases the binding affinity of the template aptamer for the target.
The template aptamer can be a SELEX-derived aptamer. The binding affinity for the target can be determined by fluorescence polarization. The target may be labeled.
In yet another aspect, the invention discloses an aptamer-based biosensor comprising (a) a test aptamer capable of binding to a target, the test aptamer being selected from an aptamer library comprising a plurality of aptamer candidates, each aptamer candidate having a primary structure and a pre-selected secondary structure, the primary structure comprising at least a variable nucleotide sequence, where nucleotides at m number of positions are being varied, the secondary structure comprising at least a single-stranded region and a double-stranded region, wherein the variable sequence is at least part of the single-stranded region and wherein for every 100 pmol of the aptamer candidates, an average of at least about three copies of each possible variable sequence is represented and (b) a detection moiety, attached to the test aptamer, wherein the absence of binding of the target to the test aptamer permits detection of a signal from the detection moiety.
The detection moiety can be an oligonucleotide and the oligonucleotide can include a fluorescence donor and either a fluorescence acceptor or a fluorescence quencher. Binding of the target to the test aptamer can induce a conformational change in the detection moiety that causes a change in the fluorescence signal.
In yet a further aspect, the invention provides a diagnostic kit for identifying the presence of a target in a sample, comprising (a) a test aptamer capable of binding to a target, the aptamer being selected from an aptamer library comprising a plurality of aptamer candidates, each aptamer candidate having substantially the same length and having a primary structure and a pre-selected secondary structure, the primary structure comprising at least a variable nucleotide sequence where nucleotides at m number of positions are being varied, the secondary structure comprising at least a single-stranded region and a double-stranded region, wherein the variable sequence is at least part of the single-stranded region, and wherein for every 100 pmol of the aptamer candidates, an average of at least about three copies of each possible variable sequence is represented, (b) reagents for performing the binding reaction between the test aptamer and the target, and (c) instructions for the use of the diagnostic kit in identifying the presence of the target in a test sample.
Features and embodiments described with regard to one aspect of the present invention, as would be obvious to one skilled in the art, often apply to other aspects of the invention and are not repeated here. For example, features described with regard to the library generally apply to the biosensor and the diagnostic kit aspects of the invention as well.
It should be understood that this application is not limited to the embodiments disclosed in this Summary, and it is intended to cover modifications and variations that are within the scope of those of sufficient skill in the field, and as defined by the claims.
The embodiments described here have many advantages over SELEX and other similar methods for aptamer discovery. The herein described HTSA procedure employs a comprehensive library of short nucleic acid sequences having a pre-defined secondary structure in which every possible variant sequence is represented by at least one copy in the library. Selection and sequencing of candidate aptamers that bind to a target occur after just one round of binding to the target. HTSA methodology therefore resolves many of the limitations of current aptamer discovery technology by improving throughput, cost, the diversity of the sequences screened as well as the time needed to validate candidate aptamers.
FIGS. 12a and 12b illustrate an N3-N6 DNA hairpin loop library (5440 total sequences, 106 library pools).
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as is commonly understood by one of skill in the art. The following definitions are provided to help interpret the disclosure and claims of this application. In the event a definition in this section is not consistent with definitions elsewhere, the definition set forth in this section will control.
As used herein, the term “about” or “approximately” when used in conjunction with a number refers to any number within 5, 10 or 15% of the referenced number.
The term “plurality”, as used herein, refers to a quantity of two or more.
As used herein, “nucleic acid,” “oligonucleotide,” and “polynucleotide” are used interchangeably to refer to a polymer of nucleotides of any length, and such nucleotides may include deoxyribonucleotides, ribonucleotides, and/or analogs or chemically modified deoxyribonucleotides or ribonucleotides. The terms “polynucleotide,” “oligonucleotide,” and “nucleic acid” include double- or single-stranded molecules as well as triple-helical molecules. An oligonucleotide may have any number of nucleotides theoretically but preferably 2-200 nucleotides, more preferably 10-100 nucleotides, and yet more preferably 20-40 nucleotides.
“Enumerate” refers to a series of positions in an oligonucleotide sequence. An enumerated position will have only one of several different bases (generally G, A, T, C, or U) at that position. The enumerated positions are generally found in a single-stranded loop or bulge loop.
As used herein, “target molecule” and “target” are used interchangeably to refer to any molecule to which an aptamer can bind. “Target molecules” or “targets” refer to, for example, proteins, polypeptides, nucleic acids, carbohydrates, lipids, polysaccharides, glycoproteins, hormones, receptors, antigens, antibodies, affybodies, antibody mimics, viruses, pathogens, toxic substances, substrates, metabolites, transition state analogs, cofactors, inhibitors, drugs, small molecules, dyes, nutrients, pollutants, growth factors, cells, tissues, or microorganisms and any fragment or portion of any of the foregoing. In one embodiment, a “target” refers to a cell surface molecule, such as a cell membrane protein.
As used herein, “combimer,” “aptamer candidate” and “aptamer,” are used interchangeably and refer to an oligonucleotide that is able to bind a target of interest other than by base pair hybridization. “Aptamers” typically comprise DNA, RNA, PNA, nucleotide analogs, modified nucleotides or mixtures of any of the above. “Aptamers” may be naturally occurring or made by synthetic or recombinant means. “Aptamers” used herein comprise single stranded regions and regions of secondary structure including, but not limited to, a hairpin loop, a bulge loop, an internal loop, a multi-branch loop, a pseudoknot or combinations thereof. “Aptamers” may comprise naturally occurring nucleotides, nucleotides that have been modified in some way, such as by chemical modification, and unnatural bases, for example 2-aminopurine. “Aptamers” may be chemically modified, for example, by the addition of a label, such as a fluorophore, or by the addition of a molecule that allows the aptamer to be crosslinked to a molecule to which it is bound. “Aptamers” or “candidate aptamers” are of the same “type” if they have the same sequence or are capable of specific binding to the same molecule. The length of the aptamer will vary, but it is typically less than about 100 nucleotides. HT-aptamers designate aptamers found in HTSA libraries and SE-aptamers designate aptamers found in SELEX libraries.
An “aptamer candidate” is an HTSA selected aptamer (sometimes referred to as HT-aptamer) that has a low, moderate or high binding affinity for a target molecule. It is recognized that affinity interactions are a matter of degree; however, in this context, the “specific binding affinity” of an aptamer for its target means that the aptamer binds to its target generally with a much higher degree of affinity than it binds to other components in a test sample.
As used herein, “a template aptamer” is an aptamer having an affinity for a target that can be improved by refinement, i.e., modification of the nucleotide sequence of an aptamer to increase or decrease the affinity of the template aptamer for the target. In one embodiment, “a template aptamer” is a SELEX-derived aptamer (sometimes referred to as SE-aptamer).
As used herein, “high affinity” binding refers to binding of a candidate aptamer to a target with a binding dissociation constant (Kd) of less than 100 nM.
As used herein, “moderate affinity” binding refers to binding of a candidate aptamer to a target with a binding dissociation constant (Kd) of from 0.1 μM to 100 μM.
As used herein, “low affinity” binding refers to binding of a candidate aptamer to a target with a binding dissociation constant (Kd) of from 0.1 mM to 1000 mM.
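The three affinity definitions above form contiguous Kd ranges, which the following sketch makes explicit. It is illustrative only: the function name `classify_affinity` is hypothetical, and because 100 μM and 0.1 mM are the same value, the assignment of that shared boundary to the “moderate” class is an assumption made here, not something the definitions state.

```python
def classify_affinity(kd_molar):
    """Classify a dissociation constant (in mol/L) using the ranges above.

    high:     Kd < 100 nM
    moderate: 0.1 uM (= 100 nM) <= Kd <= 100 uM
    low:      0.1 mM (= 100 uM) <  Kd <= 1000 mM
    The shared 100 uM boundary is assigned to "moderate" by assumption.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    if kd_molar < 100e-9:
        return "high"
    if kd_molar <= 100e-6:
        return "moderate"
    if kd_molar <= 1.0:
        return "low"
    return "outside the defined ranges"

# Hypothetical example values: 5 nM, 2 uM and 5 mM.
for kd in (5e-9, 2e-6, 5e-3):
    print(kd, classify_affinity(kd))
```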
As used herein, the term “library” refers to a plurality of compounds, e.g. aptamers.
As used herein, Peptide Nucleic Acids (PNAs) are nucleic acids in which the sugar phosphate backbone of the oligonucleotide is replaced by a peptide backbone comprising an amide bond.
As used herein, the term “label” or “detection moiety” refers to one or more reagents that can be used to detect interactions involving a target and an aptamer. A detection moiety or label is capable of being detected directly or indirectly. In general, any reporter molecule that is detectable can be a label. Labels include, for example, (i) reporter molecules that can be detected directly by virtue of generating a signal, (ii) specific binding pair members that can be detected indirectly by subsequent binding to a cognate that contains a reporter molecule, (iii) mass tags detectable by mass spectrometry, and (iv) oligonucleotide primers that can provide a template for amplification or ligation. The reporter molecule can be a catalyst, such as an enzyme, dye, fluorescent molecule, quantum dot, chemiluminescent molecule, coenzyme, enzyme substrate, radioactive group, a small organic molecule, amplifiable polynucleotide sequence, a particle such as latex or carbon particle, metal sol, crystallite, etc., which may or may not be further labeled with a dye, catalyst or other detectable group, a mass tag that alters the weight of the molecule to which it is conjugated for mass spectrometry purposes, and the like. The label can be selected from electromagnetic or electrochemical materials. In one embodiment, the detectable label is a fluorescent dye such as Cy-3 or Cy-5. Other labels and labeling schemes will be evident to one skilled in the art based on the disclosure herein.
The detection moiety can be detected by emission of a fluorescent signal, a chemiluminescent signal, or any other detectable signal that is dependent upon the identity of the moiety. In the case where the detectable moiety is an enzyme (for example, alkaline phosphatase), the signal can be generated in the presence of the enzyme substrate and any additional factors necessary for enzyme activity. In the case where the detectable moiety is an enzyme substrate, the signal can be generated in the presence of the enzyme and any additional factors necessary for enzyme activity. Suitable reagent configurations for attaching the detectable moiety to a target molecule include covalent attachment of the detectable moiety to the target molecule, non-covalent association of the detectable moiety with another labeling agent component that is covalently attached to the target molecule, and covalent attachment of the detectable moiety to a labeling agent component that is non-covalently associated with the target molecule. Universal protein stains are described in detail in U.S. Patent Application US20080160535. In one embodiment, the detection moiety is a molecular switch based on a FRET pair, for example, an “Alloswitch” (Orthosystems, Inc.), further described in the published U.S. patent applications US20060216692 and US20060029933.
“Solid support” refers herein to any substrate having a surface to which molecules can be attached, directly or indirectly, through either covalent or non-covalent bonds. The substrate materials can be naturally occurring, synthetic, or a modification of a naturally occurring material. Solid support materials include silicon, graphite, mirrored surfaces, laminates, ceramics, plastics (including polymers such as, e.g., poly(vinyl chloride), cyclo-olefin copolymers, polyacrylamide, polyacrylate, polyethylene, polypropylene, poly(4-methylbutene), polystyrene, polymethacrylate, polyethylene terephthalate), polytetrafluoroethylene (PTFE or Teflon®), nylon, poly(vinyl butyrate)), germanium, gallium arsenide, gold, silver, etc., either used by themselves or in conjunction with other materials. Additional rigid materials can be considered, such as glass, which includes silica and further includes, for example, glass that is available as Bioglass. Other materials that can be employed include porous materials, such as, for example, controlled pore glass beads. Any other materials known in the art that are capable of having one or more functional groups, such as any of an amino, carboxyl, thiol, or hydroxyl functional group, for example, incorporated on its surface, are also contemplated. The solid support can take any of a variety of configurations ranging from simple to complex and can have any one of a number of shapes, including a strip, plate, disk, rod, particle, including bead, tube, well, and the like. The surface can be relatively planar (e.g., a slide), spherical (e.g., a bead), cylindrical (e.g., a column), or grooved. Exemplary solid supports include, but are not limited to, microtiter wells, microscope slides, membranes, paramagnetic beads, charged paper, filters, gels, Langmuir-Blodgett films, silicon wafer chips, flow through chips, microarray chips, microbeads and magnetic beads.
As used herein the term “amplification” or “amplifying” means any process or combination of process steps that increases the amount or number of copies of a molecule or class of molecules. In one embodiment, “amplification” refers to a polymerase chain reaction (PCR).
As used herein, primary structure of an oligonucleotide refers to its nucleotide sequence.
As used herein, “secondary structures” of an oligonucleotide refer to RNA or DNA secondary structures including, but not limited to, a hairpin loop, a bulge loop, an internal loop, a multi-branch loop, a pseudoknot or combinations thereof.
“Pre-selected secondary structures” refers to those secondary structures that are selected and engineered into an aptamer by design.
As used herein, a “variable sequence” or a “variable nucleotide sequence” refers to a base sequence within an aptamer that includes at least one enumerated or randomized position. In some embodiments, “a variable sequence” also includes invariant nucleotides where the nucleotide sequence at that location is the same amongst all members of a given population of aptamers, as long as there is at least one other base that is not constant. In one embodiment, a variable sequence is confined to a single-stranded region of an aptamer. In another embodiment, a variable sequence comprises nucleotides at positions in the double-stranded region that are no more than three nucleotides away from an end of the single-stranded region. “A variable nucleotide sequence” can be at least 2, at least 5, at least 10, at least 15, at least 20 or at least 25 or 50 nucleotides in length.
A “double-stranded region” refers to a region of an aptamer where two single stranded regions have sufficient complementarity to base-pair with each other. Double-stranded regions may have an invariant sequence. In some embodiments, the inclusion of randomized sequences within a region originally intended as single-stranded may permit varied stem positions because randomized positions may be able to base pair with each other, thus extending the double-stranded region into a previously single-stranded region. In other words, the “single stranded” region of some candidate aptamers may include varied loop positions that may adopt structures with Watson-Crick or non-canonical pairs, triples, or quadruples.
As used herein, a concatenated aptamer is a continuous nucleic acid molecule that contains one or more repeats of base sequences linked in series. The linkage may be covalent or non-covalent. In one embodiment, concatenated aptamers comprise two or more identical secondary structures. In another embodiment, concatenated aptamers comprise two or more non-identical secondary structures. In yet another embodiment, concatenated aptamers comprise a combination of identical and non-identical secondary structures.
Buffer conditions refer to the chemical nature of the buffer, pH, added salts, denaturants, detergents, mole ratio of target to aptamer candidates, and other parameters well known to those skilled in the art of modulating target interactions with nucleic acids.
As used herein the term “stringency” is used in reference to the conditions of temperature, ionic strength, and the presence of other compounds such as organic solvents, under which binding assays are conducted.
As used herein, “over-sampling” or “ample-sampling” means that each distinct aptamer sequence has on average at least one, preferably multiple copies in a library and that substantially all possible sequences within a variable nucleotide sequence are represented in a library.
As used herein, “sparse sampling” means that not all possible sequences are present in a library.
As used herein, the term “small molecules” and analogous terms include, but are not limited to, peptides, peptidomimetics, amino acids, amino acid analogs, polynucleotides, polynucleotide analogs, nucleotides, nucleotide analogs, and other organic and inorganic compounds (i.e., including heteroorganic and organometallic compounds) having a molecular weight less than about 10,000 grams per mole. In some embodiments, the term refers to organic or inorganic compounds having a molecular weight less than about 5,000 grams per mole, less than about 1,000 grams per mole, less than about 500 grams per mole, or less than about 100 grams per mole. Salts, esters, and other pharmaceutically acceptable forms of such compounds are also encompassed.
I. HTSA Screening
In one aspect, the present invention can be practiced using an “in solution” approach where the HT-aptamer library is provided in a solution where it binds to a target immobilized on a solid support. The bound aptamers are then eluted from the target, ligated with adaptor sequences, and PCR amplified prior to high-throughput sequencing. The identity and frequency of occurrence of each bound aptamer are therefore determined by sequencing.
Current aptamer discovery technologies based on SELEX require successive rounds of enrichment of candidate aptamers starting from a highly complex pool of 10^14 distinct, fully randomized SE-aptamer candidates of 30 to 120 nucleotides in length. As shown in
Despite its initial success, the SELEX procedure remains arduous, time consuming and poorly amenable to automation. As will become apparent from this disclosure, the SELEX methodology is fundamentally flawed because the complexity of the starting library severely limits the diversity of sequences that can be present in a SELEX library. As shown in Table 1, introduction of random nucleotides at every position of a 70 nucleotide aptamer would potentially generate 4^70 = 1.4×10^42 distinct aptamer candidate sequences. One hundred pmol of a 70 nucleotide aptamer library comprises just 100×10^-12 × 6.022×10^23 = 6.02×10^13 sequences. Hence, even at high concentrations, SE-libraries are very sparsely sampled and capture only a tiny fraction of the full diversity of HTSA libraries described herein. Of course, a long randomized sequence naturally contains shorter sequences as well. For instance, Table 1 shows that all possible 20 mer sequences are represented an average of 55 times in a library containing 100 pmol of all randomized NA molecules, and all 17 mers are represented more than 3,000 times on average. However, all target-binding sequences of substantial length (20 mers, 17 mers, etc.) cannot be represented in the context of all possible secondary structures, because much of the diversity in the pool would be exhausted in creating the H-bonded context. This may help to explain why, in the ~20 years since SELEX was first reported, SE-aptamers for only ~500 targets have been discovered.
Unlike SELEX, the HTSA procedure pre-defines the secondary structure of oligonucleotide library members and systematically limits their sequence diversity by position in the chain, thereby creating smaller, more manageable sequence pools which, taken together, screen a large diversity of combinatorial sequence space. Typically, each library contains 10^9-10^12 HT-aptamer candidates where every possible permutation of a variable sequence is present on average at least once. This is accomplished by generating relatively short HT-aptamers of just 30-50 nucleotides in length and confining the variable nucleotide sequence generally to single stranded regions and, in some instances, to adjacent double-stranded regions. In some cases, base randomization within an intended single-stranded region can result in base-pairing inside the previously single-stranded region, resulting in extension of an existing double-stranded region or formation of new double-stranded region(s).
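The sparse-sampling arithmetic quoted from Table 1 can be reproduced with a short calculation. The sketch below is illustrative; the function names are hypothetical, and it follows the simplification used in the text of counting one short sequence per molecule rather than every overlapping window of a long random region.

```python
AVOGADRO = 6.022e23  # molecules per mole

def molecules_in(pmol):
    """Number of molecules contained in the given number of picomoles."""
    return pmol * 1e-12 * AVOGADRO

def distinct_sequences(n_random):
    """Number of distinct sequences with n_random fully randomized positions."""
    return 4 ** n_random

def mean_copies(pmol, n_random):
    """Average copies of each distinct sequence (one sequence per molecule)."""
    return molecules_in(pmol) / distinct_sequences(n_random)

pool = molecules_in(100)             # about 6.02e13 molecules
space_70 = distinct_sequences(70)    # about 1.4e42 sequences
fraction_sampled = pool / space_70   # on the order of 1e-29

print(f"molecules in 100 pmol: {pool:.3g}")
print(f"distinct 70-mers:      {space_70:.3g}")
print(f"fraction sampled:      {fraction_sampled:.1e}")
print(f"mean copies, 20-mers:  {mean_copies(100, 20):.0f}")
print(f"mean copies, 17-mers:  {mean_copies(100, 17):.0f}")
```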
Characterization of aptamers isolated by SELEX, such as the thrombin-binding aptamer of
In one embodiment, HT-aptamers comprise any known secondary structure including, but not limited to, a hairpin loop, a bulge loop, an internal loop, a multi-branch loop, a pseudoknot or combinations thereof (see
Moreover, HTSA library design permits direct screening of the library in a single partitioning/PCR amplification step. As illustrated in
The under-sampling limitations seen with SE-aptamer libraries are resolved by HTSA. To illustrate, Table 1 shows the changes in the sequence redundancy and complexity in 100 pmol of an HT-aptamer library as the number “m” of randomized nucleotides increases from 1 to 120. In 100 pmol of a candidate aptamer pool where each aptamer candidate is ˜15 nt in length, there are approximately A_M = 6.02×10^13 aptamer candidates in the pool. The number of unique sequences of length m is equal to 4^m. For instance, there are p_m = 4^5 = 1,024 unique loops with m=5. The number of copies of each unique sequence is therefore equal to 6.02×10^13/1,024 = 5.9×10^10. When m=15, there are approximately 56,000 copies of each unique sequence in the pool and a 0.006 chance that any particular HT-aptamer is counted without PCR.
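By way of illustration only, the redundancy arithmetic of this paragraph can be reproduced with a few lines of Python. The function names below are hypothetical and are not part of the disclosed method; the sketch assumes an ideal pool in which all 4^m sequences are equally represented.

```python
AVOGADRO = 6.022e23  # molecules per mole


def molecules_in_pool(picomoles):
    """Number of molecules in a pool of the given size (in pmol)."""
    return picomoles * 1e-12 * AVOGADRO


def copies_per_sequence(m, picomoles=100.0):
    """Average copies of each unique sequence when m positions are randomized.

    Assumes every one of the 4**m sequences is equally likely.
    """
    return molecules_in_pool(picomoles) / 4 ** m


# Print a few rows in the spirit of Table 1 (values for a 100 pmol pool).
for m in (5, 15, 17, 20, 21, 22, 23):
    print(f"m={m:2d}  unique={4 ** m:.3g}  copies={copies_per_sequence(m):.3g}")
```

Under these assumptions the sketch gives about 5.9×10^10 copies for m=5 and about 56,000 copies for m=15, in agreement with the figures stated above.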
An issue involved in sampling all possible HT-aptamers only becomes apparent when m is about 22 and there are on average only about 3 copies of each distinct sequence in the pool. This represents a threshold number of HT-aptamers that can be detected and sequenced after PCR amplification using current Illumina (Solexa) high throughput sequencers. With the use of more than 100 pmol in the selection step and with even newer generations of sequencers a threshold of m=23, 24, or 25 will become practical. Single-molecule sequencers are due to come on the market soon that require no PCR step. These are especially attractive for the in-solution mode of HTSA.
A fundamental limitation of all aptamer discovery methods, including HTSA, is that the partitioning step of contacting the pool with the target is never 100% efficient at removing unbound or weakly bound candidates. Thousands to millions of randomly selected molecules will therefore be sequenced; these represent background “noise” in the experiment. Other non-binding or weakly binding candidates will also be carried forward to the sequencing step in HTSA. In the example of 100 pmol of hairpin loops with m=15, sampling 6×10^6 sequences with no partitioning step, the Poisson distribution predicts about 31 instances where a random hairpin will be sequenced three times, and nearly 17,000 instances where a random hairpin will be sequenced twice.
A conservative noise floor can be set from the Poisson distribution by those skilled in the art of DNA sequence analysis. A sequence determined from the partitioned pool should be considered a possible binding candidate if it occurs more often than the Poisson estimate for multiple appearances of random sequences. A hairpin candidate that appears three or more times in the partitioned pool can be considered a “signal” in the example of m=15 and 6×10^6 sequences determined. As will be seen below, known high affinity aptamers for targets appear thousands of times from such libraries.
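The Poisson estimate used in this example can be sketched as follows. This is a minimal illustration that assumes reads are drawn uniformly at random from the 4^15 possible loops with no partitioning; the function name is hypothetical.

```python
import math


def expected_repeats(n_unique, n_reads, k):
    """Expected number of distinct sequences observed exactly k times.

    Assumes n_reads are sampled uniformly at random from n_unique
    equally abundant sequences, so each sequence's count is Poisson
    distributed with mean n_reads / n_unique.
    """
    lam = n_reads / n_unique
    return n_unique * math.exp(-lam) * lam ** k / math.factorial(k)


n_unique = 4 ** 15   # all loops with m = 15
n_reads = 6e6        # sequences determined in one run

for k in (2, 3, 4):
    print(k, expected_repeats(n_unique, n_reads, k))
```

With these inputs the sketch returns roughly 16,700 random sequences expected to appear twice and roughly 31 expected to appear three times, matching the values quoted in the example.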
In one embodiment, increasing the stringency in the partitioning step may reduce non sequence-specific binding. For example, the ionic strength of the buffer may be increased or competition oligonucleotides, e.g., those containing a part of the double-stranded regions of the candidate HT-aptamers, may be added to the binding buffer.
A person of skill in the art will recognize that selection of target-specific aptamers can be accomplished using a variety of partition methods known in the art including, but not limited to, immunoprecipitation, gel shift assays, kinetic capillary electrophoresis, size fractionation and various bead assays requiring fractionation by centrifugation or by application of a magnetic field.
The HTSA method inherently identifies alternative HT-aptamers that have a wide range of affinities for the target. To compare the affinity and specificity of the different candidate HT-aptamer sequences, DNA-protein microarrays may be screened using fluorescently tagged proteins or by Surface Plasmon Resonance (SPR) for low throughput, label-free analysis. Also, validated HT-aptamers can be exposed to microarray analysis with other protein targets that are likely to be cross-reactive to determine HT-aptamer specificity.
Surface Plasmon Resonance (SPR) is a label-free method to determine kinetic on-rates and off-rates, and hence the equilibrium constant, Kd, for dissociating an aptamer-target complex. Biotinylated aptamer candidates can be attached to the surface of an SPR microarray chip. Any increase in mass associated with binding the protein target is then measured. Hence, the instrument can usually detect whether the complex has a 1:1 or different stoichiometry. Although SPR has lower throughput than HT-sequencing or microarray analysis (see below), it is still capable of high enough throughput to evaluate the top 100+ of the most interesting candidate aptamers that pass the sequencing and microarray tests.
The HTSA method may also be adapted to refine previously identified aptamers, such as SELEX-derived aptamers, by introducing targeted mutations into a selected region or regions of the aptamer and determining the affinity of the refined aptamers for their target.
II. Multiplex Library
In another aspect, the present invention can be practiced using a multiplex approach, where the HT-aptamer library is divided into pools that are immobilized at one of up to 10^6 or more locations on a solid support, e.g., a microarray chip. Each pool is designed to contain a defined number of enumerated bases within the HT-aptamer's variable sequence from which a predictable number of distinct aptamers of known sequence can be deduced. Binding of the target molecule to a specific location then indicates at least one of the HT-aptamers within the pool at that location contains a binding site for the target. By designing a second microarray chip where each location contains only one of the aptamer candidate species predicted to be found in each sub-pool and repeating the binding to the target, it is possible to determine the identity of any HT-aptamer that binds to the target without the need for direct sequencing.
This aspect of the present invention provides a method that is simpler, more defined and more flexible than the existing in vitro selection methods with respect to both the chemical nature of the oligomer libraries being screened and the resulting high affinity target sequence. The present procedure also affords a huge increase in throughput compared to in vitro selection when many target species are being investigated.
Given the universe of possible sequences in SELEX experiments (e.g., 1×10^18 for a random 30-nucleotide stretch), direct synthesis and screening of all sequences is impossible, even given the high-throughput advancements made in DNA/RNA synthesizer instrumentation. This “under-sampling” problem means that in a typical SELEX library, a significant number of candidate sequences are not even present in the library. By contrast, in one application of the present invention, a procedure has been devised to systematically limit library members' sequence diversity by a position-driven approach. Specifically, an embodiment of the present method sequentially holds a predetermined number of positions (e.g., two) invariant, just within a subset of the library, in the chain of the variable sequence under examination, thereby creating smaller, manageable subsets (i.e. features on a chip). Taken together, these smaller subsets are used to screen a large diversity of combinatorial sequence space. We sometimes refer to this multiplex library screening approach as the Combigen method. As described above, the secondary structures of members of such an aptamer library are defined or pre-selected.
In essence, the present invention solves the above-noted “under-sampling” problem in SELEX methodology by dividing the sequence complexity of a library amongst subsets of degenerate pools. If the total sequence complexity is 4^m, meaning that the total length of the variable sequence is “m”, and “n” nucleotides are chosen to be held invariant in a subset, then 4^n subsets are needed, but each subset will need only 4^(m−n) distinct sequences to provide the same desired sequence complexity. By manipulating the 4^(m−n) number, a given feature's physical limitation can now comfortably accommodate the number of oligonucleotides needed to guarantee the sequence complexity desired of each subset; in fact, each distinct sequence can be represented in a subset by a sufficient number, e.g., an average of about 3 copies, preferably 4, 5, 6, 7, 8, 9, 10, and more preferably 12 or even higher copies, resulting in “ample-sampling” or “over-sampling” to guarantee the completeness of the multiplex library.
Still referring to
Assuming GCATGA is the ultimate high affinity aptamer sequence for the loop, then Round 1 will have a hit for NNNNGA, Round 2 will have a hit for NNATGA, and Round 3 will reveal GCATGA. Thus, for an N6 library, three rounds of 16 N6-2 subset syntheses (or 3 chip screens) are sufficient to discover an especially tight-binding aptamer. In Round 2, since “GA” has been determined as part of the overall variable sequence, the total sequence complexity required of that round is 256. Through division into subsets, the sequence complexity for each subset within Round 2 is further reduced to 16. The presumed hit sequence of Round 2, “NNATGA,” is already represented in Round 1, albeit in much smaller number, in the “NNNNGA” subset library. Accordingly, positive sequences are further enriched in each subsequent round, and stronger binding signals can be expected if all other conditions remain similar.
One of the key values of the above approach lies in how a defined space of sequences is systematically divided into sequence pools or library sets, providing a context in which sequences of a desired affinity can be located and monitored as the resolution of the selected library set is expanded in subsequent screens. Thus, the present invention enriches the number of each of the aptamer candidates within a feature to avoid inadequate or sparse sampling of the library. Desired affinity can be affinity above a pre-determined level, e.g., as measured through binding dissociation constant Kd. In one embodiment, the desired affinity is relatively higher affinity among all the candidates as determined by the strength of a signal that results from the binding in all the library subsets. In other embodiments, the desired affinity is weak affinity, moderate or, preferably, high affinity. Referring back to Table 1 and as described earlier, for every 100 pmol of aptamer candidates about 50 nt in total length, the present invention can provide about 1 copy on average for a variable sequence that is 23 nt in length, about 3 copies on average for a variable sequence that is 22 nt in length, and about 14 copies on average for a variable sequence that is 21 nt in length.
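The three-round example can be sketched in Python as shown below. The sketch is illustrative only: the function names are hypothetical, the `is_hit` callable stands in for an actual binding measurement on each subset, and the loop length is assumed to be a multiple of the number of positions enumerated per round.

```python
from itertools import product

BASES = "ACGT"


def subset_pattern(fixed_suffix, enumerated, loop_len):
    """Build a pattern such as 'NNNNGA': degenerate N positions followed by
    the bases enumerated in this round and those fixed in earlier rounds."""
    known = enumerated + fixed_suffix
    return "N" * (loop_len - len(known)) + known


def pattern_contains(pattern, sequence):
    """True if the degenerate pattern includes the given sequence."""
    return all(p == "N" or p == s for p, s in zip(pattern, sequence))


def deconvolute(is_hit, loop_len=6, step=2):
    """Resolve a loop sequence by enumerating `step` positions per round.

    Each round screens 4**step subsets; the winning subset fixes `step`
    more positions for the next round.
    """
    suffix, hits = "", []
    while len(suffix) < loop_len:
        patterns = [subset_pattern(suffix, "".join(d), loop_len)
                    for d in product(BASES, repeat=step)]
        winner = next((p for p in patterns if is_hit(p)), None)
        if winner is None:
            raise ValueError("no subset produced a hit in this round")
        hits.append(winner)
        suffix = winner.lstrip("N")
    return hits


# Stand-in for the binding assay: a subset "binds" if it contains GCATGA.
hits = deconvolute(lambda p: pattern_contains(p, "GCATGA"))
print(hits)                               # winning subset in each round
print([4 ** p.count("N") for p in hits])  # sequences per winning subset
```

In this idealized setting each round screens 16 subsets, and the winning subsets contain 256, 16 and 1 sequences respectively, which mirrors the progressive enrichment described above.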
There are a number of methods and media that can be used to examine the affinity of these library sets: chips, filters, gel shifts, or any other means commonly known in the art as suitable for testing binding affinity. In one embodiment, microarray chips are used as a fast, low cost means of comprehensively and comparatively measuring the affinity of millions of oligonucleotide/aptamer features against a target in a parallel, high throughput format.
In one example, the target is a protein. Several groups have used DNA microarrays to study protein-DNA interactions (7,8); much of this work has focused on identifying putative transcription-factor (TF) binding sites (9-11). Bulyk et al. have termed these chips “Protein Binding Microarrays” and the technique the PBM technique. These library chips are designed such that each feature of the microarray represents a completely defined, double-stranded (ds) DNA library sequence for profiling putative binding sites for DNA-binding proteins such as TFs (11-14). These dsDNA features are typically generated by primer extension or self-hairpinning sequences (15). In contrast, the present approach, as it would apply to microarrays, would routinely use multiplexed features in initial and subsequent screens until the resolution is such that each feature represents one defined sequence on the final chip. Also, the “sweet spot” of the present multiplex library constructs is within a pre-defined secondary structure, e.g., a hairpin loop, bulge or junction, and not within a dsDNA helix. Furthermore, most of the PBM studies use antibody-based detection methods; while we do not rule out that possibility, in a preferred embodiment, the present invention utilizes direct labeling. Microarray chips have been used to study aptamers (16-18); however, these studies were focused on presenting chips as a general method for characterizing aptamer hits generated from the SELEX process. These aptamer chips used a completely defined sequence on each feature.
In an application embodiment, the target-specific aptamers are incorporated into switchable sensors, as described in the published U.S. patent applications US20060216692 and US20060029933. For example, the AlloSwitch is a molecular switch that changes its shape upon binding its cognate target. The shape change is coupled to a fluorescent or luminescent reporter. The heart of an AlloSwitch sensor is a nucleic acid probe (HT-aptamer) that has a high affinity for the target (
As shown in
The 15-base canonical TBA sequence, described above and shown in
As noted above, two elements of HTSA's selection step expedite the discovery process. (1) The employment of combinatorial libraries with relatively short (<22 bases) degenerate regions allows full coverage of all possible sequences at relatively low library concentrations. (2) The library is oversampled resulting in multiple copies of each possible sequence (see Table 1). Overrepresentation of each sequence coupled with a single partitioning step allows the determination of high affinity binders at frequencies far above the background in the 5-6 million reads generated by a new-generation sequencing instrument.
As outlined above, a short combinatorial 15 mer hairpin library with constant stem and non-complementary tail regions was first generated (stem and tail sequence as in
The library was constructed by application of predetermined input ratios of nucleoside phosphoramidites in a hand-mixed loop synthesis to generate equal numbers of the four bases in the randomized positions before the partitioning step. Prior to running the thrombin-partitioned sample, a dose-response analysis with 4 different specified m=15 hairpins in 1.00:0.10:0.010:0.0010 molar proportion was run without selection against a target. The counts of 3.2 million sequenced clusters were directly proportional to the dose, 1.00:0.11:0.012:0.0010, accurately representing the input population and thereby eliminating concerns of bias due to bridge amplification in the sequencing by synthesis process.
Library partitioning conditions were previously described by Bock et al. A 60:1 target:DNA ratio was maintained from the Bock et al. protocol to demonstrate the efficiency of HTSA. Due to the nature of SELEX, multiple selection and amplification “enrichment” cycles are required after starting with single copies of each sequence.
Because HTSA has a single selection step, in an embodiment, greater stringency could be implemented by reduction of the target:DNA ratio, increasing the salt concentration, adding competitors, etc., among several measures. The successful isolation of high affinity aptamer sequences at the 60:1 ratio served as confirmation of HTSA's efficiency even in conditions of low stringency. Following isolation of high affinity binders, the samples were prepared for sequencing by ligation to adapter DNA molecules required by the Illumina system and PCR amplification. Confirmation of the ligation product and PCR amplification was achieved by agarose gel electrophoresis. The purified PCR product was then analyzed in a single lane of an 8-lane flow-cell for sequencing by the Illumina Genome Analyzer.
The Illumina Genome Analyzer generated ˜5 million reads per partitioning experiment. Output reads were analyzed using a custom Perl script (TABLE 2). To determine the accuracy of the generated sequences, we assessed the base calls of the constant known indexed stem and tail regions and report >95% accuracy for each base position (TABLE 3). The script also counted and ranked each output sequence by frequency, as well as generated a FASTA file that was used for sequence alignment and generation of a phylogenetic tree diagram by ClustalX and Drawtree, respectively (
Based on the assumption that a relationship exists between the number of times a sequence is counted and its affinity for the target it is screened against, HTSA can be used to screen for aptamer sequences that bind a specific target. Of the ˜5 million reads generated, aptamer candidates were distinguishable as they occurred hundreds to thousands of times above a conservative background count of 3 determined from a Poisson distribution of a theoretical data set of 5 million sequences (Table 4). The canonical TBA sequence occurred most frequently (46,444 counts) while the novel α-methyl-mannoside binding sequence had the second highest count of 29,405. Both constructs lead their sequence homologues and other novel sequences. A sequence alignment and phylogenetic tree of all sequences that appeared at least 10 times revealed 3 distinct sequence motif families (
To validate the findings, the binding affinities of these sequences to α-thrombin were investigated. The highest frequency sequences from each motif family were used in binding studies by SPR analysis (
The dual identification of aptamer candidates for two different targets within one selection experiment substantiates the great promise of HTSA. This was revealed through secondary analysis of the relationship between counts and affinity by SPR analysis. Another contrast with SELEX is also apparent in that repeated cycles of selection and PCR lead to sequence “monsters” that can dominate the population at the expense of desirable aptamers specific for the target. HTSA is shown to be able to tolerate the sugar-binding monsters, which themselves might prove to be useful. In addition, the SPR experiment illustrated in
HTSA's employment of new generation DNA sequencing technology allowed the efficient exploration of the sequence space of thrombin aptamer candidates. The first 108 sequences of the TBA motif were aligned and the frequency of each base in each of the 15 possible library positions was counted. Alignment profiles display high conservation of the TBA bases GGTTGG that constitute the first half of the stacked GG structure, while the largest variability is tolerated at the G position of the TGT loop of the central loop (see
We also showed that only the aptamer candidates capable of forming a G-quartet motif could effectively inhibit the activity of α-thrombin. The canonical TBA was most effective, while TBA variants had reduced performance in the order of their counts from the HTSA experiments.
Similar results to this α-thrombin example have been obtained for a library of concatenated RNA internal and hairpin loops that bind human coagulation factor IXa. The 5′- and 3′-termini of this molecule consisted of the same DNA tails and the bottom six DNA base-pairs of the stem as in
HTSA bypasses the 3 slowest steps in standard SELEX aptamer generation: (1) multiple rounds of partitioning; (2) cloning of the sequences into plasmids, picking colonies and conducting conventional sequencing; and (3) truncation of sequences from the ends of the long chains to find the core binding sequences of aptamers. The principal expense is the cost of next generation sequencing technologies, which can be reduced by multiplexed sequencing of different selection experiments. However, the largest cost in a biotechnology laboratory is for salaries of highly trained employees, so the sequencing expense is quickly recovered. In addition, newer sequencing technologies offer the chance to multiplex the sequencing runs to analyze winning sequences from different pools applied to multiple targets.
Materials and Methods
Aptamer Selection
Following elution of high affinity binders, the eluted mixture was phenol extracted twice followed by a final chloroform extraction. After concentration, adapter constructs were ligated to the candidate sequences. The ligation step was as follows: 50 μM adapter sequences and their complements were added to the partitioned DNA library and incubated at 90° C. for 3 min; ligation buffer and T4 DNA ligase (New England BioLabs) were then added at 25° C. and the mixture was incubated for 30 min. DNA was extracted using a QIAquick PCR Purification Kit (QIAGEN) and purified on a 2% agarose gel, after which the ligation product was excised and extracted using a QIAGEN MinElute Gel Extraction Kit. PCR cycling conditions were as follows: initial denaturation at 94° C. for 2 min and 18 repeats of denaturation at 94° C. for 1 min, primer annealing at 61° C. for 1 min and elongation at 72° C. for 1 min. The PCR product was purified and its length confirmed on a 2% agarose gel prior to sequencing.
DNA Sequencing Data Analysis
The Illumina Genome Analyzer (GA) generated ˜4-6 million reads per partitioning experiment. Output sequence files were analyzed using a custom Perl script. A stringent algorithm (low penalty tolerance) was used to filter the output GA data for sequence strings that contained full length library sequences containing the 15 nt loop region. Sequences that contained ≤2 base mismatches or a single gap within the 10 bases to the 5′ side, and 4 bases to the 3′ side, of the “m” degenerate library were categorized as “candidate reads”. Sequences that failed to meet these conditions were categorized as “bad reads”, and served to highlight adaptor ligation or amplification issues encountered in the experiment. The script output all parsed data into text files described further in Supplementary Data. The selection of 10 bases of the header sequence and 4 bases in the tail was a result of optimization of the script and observation that the combination was sufficient for maximum filtration of bad reads. Of the “candidate reads,” sequences with exactly 15 bases in the variable “m” region were selected as “good reads”. These sequences were subsequently input into ClustalX to generate alignment profiles and phylogenetic trees for further analysis.
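The filtering logic described above may be sketched as follows. This is not the Perl script of the disclosure but a simplified Python illustration under stated assumptions: the flanking sequences `HEAD` and `TAIL` are hypothetical placeholders rather than the actual stem and tail, up to two mismatches are allowed across both flanks combined, gaps are not handled, and only loops of exactly 15 bases (the “good reads”) are returned.

```python
from collections import Counter

HEAD = "GGTCACGAGC"  # hypothetical 10-base 5' flank (placeholder only)
TAIL = "GCTC"        # hypothetical 4-base 3' flank (placeholder only)
LOOP_LEN = 15        # length of the degenerate "m" region


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def extract_loop(read, max_mismatch=2):
    """Return the 15-nt variable region if both flanks are found, else None."""
    span = len(HEAD) + LOOP_LEN + len(TAIL)
    for start in range(len(read) - span + 1):
        loop_start = start + len(HEAD)
        head = read[start:loop_start]
        tail = read[loop_start + LOOP_LEN:start + span]
        if hamming(head, HEAD) + hamming(tail, TAIL) <= max_mismatch:
            return read[loop_start:loop_start + LOOP_LEN]
    return None


def rank_loops(reads):
    """Count each accepted loop sequence and rank by frequency."""
    counts = Counter()
    for read in reads:
        loop = extract_loop(read.upper())
        if loop is not None:
            counts[loop] += 1
    return counts.most_common()
```

A read that fails the flank test returns `None` and would correspond to a “bad read”; the ranked output corresponds to the frequency table from which candidate aptamers are chosen.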
Thrombin Analysis
To avoid the selection of aptamers against contaminants, the purity of α-thrombin used in the selection experiments was verified by sedimentation velocity experiments which verified a consistent ˜90% purity and ˜10% self-cleavage products. Following a 24 hour dialysis period in selection buffer, sedimentation velocity experiments of α-thrombin were performed on a Beckman XL-A instrument in which the sample was monitored using absorbance optics at 280 nm. Data was acquired over 21 h using a 6 channel cell with an epon charcoal-filled 3 mm centrepiece at a rotor speed of 50,000 rev/min at 20° C. The data was analyzed using SEDFIT using a v-bar of 0.69 mg/mL23.
SPR Analysis
Binding affinities were measured using a GWC SPRimager®II array instrument (GWC Technologies, Inc.) and 16 and 25 SpotReady™ chips. SPR data was acquired using the V++ imaging software and analyzed in Microsoft Excel. All SPR experiments were conducted at 25° C., using selection buffer as the running buffer. For each experiment, the surface of the SpotReady™ chip (GWC Technologies, Inc.) was functionalized by incubating the chip in a 1 mM solution of 8-amino-octanethiol (AOT) (Dojindo Molecular Technologies, Inc.) in absolute ethanol at room temperature overnight, creating a self-assembled monolayer. The chip was rinsed with absolute ethanol, dried under nitrogen and incubated with 1 mM 4-(N-maleimidomethyl) cyclohexane-1-carboxylic 3-sulfo-N-hydroxysuccinimide ester (SSMCC) (Pierce Biotechnology) for an hour to create a thiol-reactive maleimide-terminated surface. Reduced 3′ thiolated DNA oligonucleotides (2 mM) were then spotted in 5 replicates per sequence onto the SSMCC treated chip and allowed to react overnight. Excess DNA was removed by washing with nuclease free water and drying under nitrogen. The chip was blocked overnight with 4 mM mPEG-thiol (MW 1000) (Nanocs) to cap all unreacted SSMCC. Once mounted on the instrument, the chip was blocked with 500 nM bovine serum albumin (Fisher Scientific), washed with 0.02% Tween-20 in selection buffer and subsequently with selection buffer (without Tween-20). Binding experiments were performed with 50 nM α-thrombin that was pumped into the flowcell at a constant flow rate for 10 min, after which selection buffer was used to wash the chip.
Gel Mobility Shift Assay (GMSA)
Each DNA aptamer candidate was pre-incubated for 30 minutes in selection buffer with α-thrombin, Con-A, α-thrombin+Con-A, Con-A beads and Con-A beads saturated with α-thrombin, all separately in both the presence and absence of 20 mM glucose and 20 mM α-methylmannoside in a 60 DNA:1 protein ratio as per selection conditions. Samples were analyzed on native polyacrylamide gels (14% (w/v)) in 1×Tris/glycine running buffer at 100V for 30 min at 4° C. Immediately after electrophoresis, gels were stained with SYBR Gold for 1 hour, imaged and then stained with Coomassie Brilliant Blue for protein staining.
Semi-quantitative Real Time PCR (sqRT-PCR)
In an effort to confirm that the counts generated in high throughput sequencing were representative of affinity for a target and not a result of “super” amplification bias, sqRT-PCR was performed. 12 PCR reactions per aptamer candidate were prepared with equal amounts of starting template DNA and PCR cocktail reagents. PCR cycling conditions were as described for the selection process but were repeated for 30 cycles instead of 18. 2 tubes per sequence were removed at cycle 10, 14, 18, 22, 26 and 30 and their amplification rates were compared by gel electrophoresis and Nanodrop DNA concentration readings.
Thrombin Activity Assay
Clotting times were measured in duplicate using a mechanical fibrometer, Dataclot 2 (Helena Laboratories). Normal human plasma and varying concentrations of DNA aptamer candidates (0.1 nM-700 nM) were incubated for 4 min at 37° C. before adding α-thrombin diluted in selection buffer and pre-equilibrated at 37° C. to a final α-thrombin concentration of 7.5 nM. The extent of thrombin inhibition was then calculated using a thrombin standard curve generated by measuring the plasma clotting time versus thrombin concentration, at various thrombin concentrations in the absence of the high affinity binding DNA sequences.
DNA sequences used in the “in-solution” example of HTSA are listed below. All sequences are listed in 5′ to 3′ direction and m=15. Note that adapter complementary sequences possessed overhangs into the constant stem and tail regions of the library from each direction, thus their longer lengths. The forward PCR primer also introduced a 5′ overhang sequence thus its longer length. The overhang sequence was complementary to a sequence planted on the Illumina flowcell and thus facilitated the annealing of the amplified library to the flowcell for sequencing. The sequencing primer was essentially adapter 1.
DNA Sequences:
The process of discovering tight binding sequences was greatly accelerated by systematically searching through a structurally defined library of sequences assembled in microarray format in this example. For this study we successfully screened HIV-1 Nucleocapsid Protein p7 (NC) against a DNA hairpin library containing all possible 3 to 6 nucleotide loop sequences at varying levels of feature complexity. In two consecutive chip screens, we discovered several high affinity DNA loop sequences that bound NC with low nM affinity, as determined by NC-Tryptophan titration assays.
Materials and Methods
DNA libraries: The N3-N6 DNA hairpin library covered all possible 3 to 6 base loop sequences (21 mers to 24 mers respectively) for a total of 5440 unique sequences. The library was synthesized in pool complexities (# sequences per pool) of 64 (
Microarray Printing: The DNA libraries were transferred to 384-well plates and diluted 1:1 with 2× spotting buffer (Arrayit, Inc.) making 50 μM printing stocks. DNA libraries were printed using an Omnigrid 100 arrayer, equipped with four 100 micron silicon wafer printing pins. The libraries were printed on super streptavidin slides (Arrayit, Inc.) in lots of 25 slides, at 70% humidity. Slides were left overnight to dry and subsequently stored in a 4° C. desiccator. Libraries were printed as 4 identical arrays (A, B, C, D) each having 4 identical library “blocks” (1, 2, 3, 4). Control sequences G (positive), 5′GGACUAGCGGUGGCUAGUCC, and A (negative), 5′GGACUAGCGAUAGCUAGUCC have known affinities to NCp7.
Protein Labeling: The HIV-1 NCp7 protein was supplied by Dr. Borer's laboratory (Syracuse University, Chemistry). Fresh stocks of NCp7 protein are routinely made in the laboratory on a weekly basis to >95% homogeneity and in high yield, as determined by SDS-PAGE. Prior to screening, each protein was fluorescently tagged using amine-reactive DyLight 549 or 649 reagent (Pierce Biotechnology). Labeling reactions were optimized to obtain 1 label per protein using manufacturer protocols. Unreacted label was completely removed using an affinity purification resin supplied by Pierce Biotechnology.
Protein Screening: Slides were fitted with a 4-well gasket and loaded onto the Fast frame (
NC-Trp Titration assay: The oligonucleotides were independently titrated against NCp7 protein in the microarray screening buffer (PBS, pH 7.4, 0.1% Tween-20, 5 mM MgCl2) at 25° C. The Trp fluorescence at 350 nm was monitored upon addition of concentrated aliquots of oligonucleotide to a 0.35 μM NCp7 sample. Titrations were run on a PTI spectrometer (QM-4/2005 SE, Photon Technology International, Birmingham, N.J.) and data was acquired 5 minutes after each oligonucleotide aliquot using Felix 5.1 software. Data was exported to Excel and Kd values were determined for each oligonucleotide by fitting titration curves to a 1:1 binding model using nonlinear regression analysis.
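A 1:1 binding fit of this kind can be sketched in Python as shown below. The sketch is an assumption-laden illustration and not the Excel procedure actually used: all names are hypothetical, the free and saturated fluorescence levels are taken as known, only Kd is fitted, and a golden-section search on log10(Kd) replaces a general nonlinear regression routine. Concentrations are in mol/L.

```python
import math


def complex_conc(p_total, l_total, kd):
    """Concentration of the 1:1 complex from the exact quadratic solution."""
    b = p_total + l_total + kd
    return (b - math.sqrt(b * b - 4.0 * p_total * l_total)) / 2.0


def model_fluorescence(l_total, kd, p_total, f_free, f_bound):
    """Fluorescence as a weighted average of free and bound protein signals."""
    frac = complex_conc(p_total, l_total, kd) / p_total
    return f_free + (f_bound - f_free) * frac


def fit_kd(l_values, f_values, p_total, f_free, f_bound,
           lo=1e-12, hi=1e-4, iters=200):
    """Least-squares Kd by golden-section search on log10(Kd).

    Assumes f_free and f_bound are known and that the sum of squared
    residuals has a single minimum between lo and hi.
    """
    def sse(log_kd):
        kd = 10.0 ** log_kd
        return sum((model_fluorescence(l, kd, p_total, f_free, f_bound) - f) ** 2
                   for l, f in zip(l_values, f_values))

    a, b = math.log10(lo), math.log10(hi)
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    for _ in range(iters):
        if sse(c) < sse(d):
            b = d
        else:
            a = c
        c, d = b - g * (b - a), a + g * (b - a)
    return 10.0 ** ((a + b) / 2.0)
```

In practice the residual fluorescence at saturation would typically be fitted together with Kd rather than fixed, and noisy data would call for an estimate of the fitting uncertainty.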
Results:
The microarray studies were conducted on streptavidin chips using biotinylated DNA libraries. The microarray layout is shown in
Initial studies were conducted on the N3-N6 diversity chip sets containing features having up to 256 sequences, which were hybridized with Cy3 labeled NC (Cy3-NC). These screens produced a number of hits shown in
A histogram of the Cy3-NC chip profile is shown in
Collectively, the average intensities of all 64-complexity library features (middle group of darker bars) are higher than the 256-complexity features (right group of lighter bars) for the same library. This is due to the lower concentration of each sequence in a 256-complexity feature, which is ¼ of that same sequence present in a 64-complexity feature. Although the 256-complexity features are less than twice background (AUA), their relative intensities clearly reveal features, N6-6(256) and N6-14(256) mentioned previously.
“Expanded” Chips
The N3-N6 diversity chip allowed us to rapidly assess all possible DNA hairpin loops of 3 to 6 bases against NC in a single microarray. In a second round of screening, the N6-56(64) and N6-57(64) library sets were completely enumerated and printed onto streptavidin chips in the same FAST frame format (
The slides were blocked, hybridized with Cy3-NC and washed using identical protocols. The result of this screen and the chip layout is shown in
The expanded N6-56(64) hit set was printed in the bottom half of the array, shown in
C-probe/NCp7 Secondary Screens
In preliminary work, the affinities of SL3 RNA hairpin constructs having point mutations in the loop region were determined for the NCp7 protein by monitoring the protein's tryptophan fluorescence. The tryptophan-37 residue of NCp7 is fluorescent, and its emission is quenched upon formation of a complex with a nucleic acid. This behavior permits a quantitative fluorescence titration to be performed in which RNA (or DNA) is added to an NCp7 solution. The resulting data are then analyzed to determine the stoichiometry of the complex, the residual fluorescence level at saturation and the equilibrium dissociation constant, Kd, for 1:1 complexes (19-21). To confirm the intensity profile of the expanded NC chip screens, a collection of hit and non-hit library sequences was independently investigated using the NC-Trp titration assay.
The results of these NC-Trp titrations are shown in
In these studies we discovered a novel set of DNA hairpin constructs with low nM affinities to the NCp7 protein using two multiplex library chips of the present invention. Each protein screen took less than 24 hours to complete from labeling the protein to analyzing the protein's chip profile. The FAST frame slide holder allowed us to rapidly process multiple slides in parallel and under different buffer conditions during the 24 hour period. The only “bottleneck” in the entire process was waiting for IDT (Integrated DNA Technologies, Inc.) to deliver the biotinylated multiplex libraries. The results of these screens surpassed our own expectations in terms of sensitivity, reproducibility and speed. Furthermore, these chip studies and resulting profiles serve as a valuable control library in further optimizing the multiplex microarray format of the invention.
The N3-N6 diversity chips used in the first-round screens covered all possible 3 to 6 base loop DNA sequences (21 mers to 24 mers, respectively) for a total of 5440 unique sequences. The 5440 sequences were systematically covered in 110-feature arrays using 3 or 4 contiguous degenerate positions within a loop structure. This level of degeneracy allowed us to study feature complexities of 64 and 256 on a single chip for hairpin loop sizes of 3-6 bases. NCp7 was selected as the protein target due to its ability to bind known hairpin loop constructs, which were used as control features.
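The sequence count quoted above follows directly from the loop lengths, as the short sketch below illustrates. The helper names are hypothetical, and the feature arithmetic assumes only that a feature with d degenerate positions pools 4^d sequences; it does not attempt to reproduce the actual chip layout or its control features.

```python
from itertools import product

BASES = "ACGT"

def all_loops(length):
    """Enumerate every DNA loop sequence of the given length."""
    return ["".join(p) for p in product(BASES, repeat=length)]

def features_needed(loop_length, degenerate_positions):
    """Features required to cover all loops of one length when each
    feature pools 4**degenerate_positions sequences."""
    return 4 ** (loop_length - degenerate_positions)

# Unique loop sequences for loop sizes 3 to 6.
loop_counts = {m: len(all_loops(m)) for m in range(3, 7)}
total_sequences = sum(loop_counts.values())

print(loop_counts)        # {3: 64, 4: 256, 5: 1024, 6: 4096}
print(total_sequences)    # 5440

# Covering the 6-base loops at the two feature complexities used on the chip.
print(features_needed(6, 3))  # 64 features of complexity 64
print(features_needed(6, 4))  # 16 features of complexity 256
```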
The N3-N6/NCp7 screens generated several hits as shown in
Here we successfully demonstrated that multiplexed features of 64 and 256 sequences identify the same 6-base loop sequence class TGXXXX or GGXXXX (where X=A, G, T or C). This is a very important first step in developing the multiplex screening approach. In further embodiments, aptamer libraries with feature complexities of 1024 (NNNNN, or 4^5) are constructed. In using libraries of higher feature complexities, background noise due to manual washing should be minimized by, e.g., automating the hybridization and wash steps using available hybridization stations. Amplifying the hit signals should also facilitate analysis of higher complexity libraries.
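To make the feature complexities concrete, the sketch below expands a degenerate loop class into its member sequences. The `expand` helper and its default wildcard are assumptions made for illustration; they are not part of the described method.

```python
from itertools import product

def expand(pattern, wildcard="X", alphabet="AGTC"):
    """List every sequence matching a pattern in which each wildcard
    position may be any base in the alphabet."""
    slots = [alphabet if ch == wildcard else ch for ch in pattern]
    return ["".join(p) for p in product(*slots)]

tg_class = expand("TGXXXX")                  # four degenerate positions
gg_class = expand("GGXXXX")
five_mer = expand("NNNNN", wildcard="N")     # five degenerate positions

print(len(tg_class), len(gg_class))  # 256 256
print(len(five_mer))                 # 1024
```

Each added degenerate position multiplies the feature complexity by four, which is why moving from four to five degenerate positions raises the complexity from 256 to 1024.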
If the aim of the experiment is to discover only the highest affinity sequences, it will be important to ensure that no high affinity hits are “hidden” within non-hit features. Screening at lower feature complexities (i.e. higher resolution microarrays) will reduce this problem, but at the expense of library coverage. The present invention treats chip resolution as a balance between broad coverage of sequence space and high resolution. In various embodiments, high density microarray platforms such as Nimblegen (Roche), Geniom (Febit), and Agilent arrays are employed to address this issue.
Within the expanded N6-56(64) library set, several distinct hits were present, which contributed to the total intensity of the parent N6-56 and N6-14(256) features on the diversity chip. Loop sequences TGTTGT and TGTGGG represent the top two hits on the expanded N6-56/57 chip. In general, sequence families TGTGGX, TGTGTX, and TGTTGX, where X represents C, T and G bases, have been found to collectively contribute to the total intensity of the parent features. This result demonstrates that the sequences within a multiplex feature bind collectively as a “family” of sequences, which suggests that a multiplex feature is unlikely to contain only a single high affinity sequence. In other words, the multiplex feature that possesses the highest affinity “star” sequence will have close sequence homologues that will more than likely bind the protein target with a moderate affinity, and contribute to the protein's overall affinity for that mixture. These sequence homologues are very useful for identifying the best sequences to use as aptamers specific for a given target. They are also useful for distinguishing aptamers that are least likely to cross-react with known interferences for a target, simply by screening the interferences against these same arrays at moderate to high complexity. Of course, as the feature complexity increases, the homology of the sequences within the feature will become more distant.
The hit and non-hit sequences discovered using the multiplex microarray method were further investigated using the NC-Trp titration assay. An important aspect of this assay is that these experiments were performed in homogeneous buffer solution under equilibrium conditions at physiological ionic strength. Reversibility and reproducibility were demonstrated, and the data conformed to expectations with respect to both equilibrium constant and stoichiometry.
In preliminary NC-Trp titration studies using RNA hairpin constructs of the HIV-1 genomic RNA SL3 motif, 24 of the 64 possible SL3 constructs having GNNN loop diversity (4^3) showed Kd values ranging from 20,000 to 10 nM, a 2000-fold variation in affinity for these three loop positions (22). Interestingly, the stem sequence and length have very little effect on the stability of the complex; even DNA stems decrease affinity only slightly (23), while replacing RNA loop residues with DNA reduces the stability of the complex by ~10-fold (24). Results from several of these titration studies indicated that the highest affinity loop sequence for SL3 RNA is GGUG, followed by GGGG (24). The value of the dissociation constant, Kd, for the GGUG case is 10 nM in 0.20 M NaCl buffer, pH 7.4 (19-21). All of the other loop sequences were found to have lower affinity toward NCp7 (24). These results correlate well with the appearance of GTG and GGG DNA base patterns (for loop positions 4, 5, 6) discovered using the multiplex chip screening approach. Furthermore, our high affinity hit sequences also correlate well with the findings of Fisher et al., who used surface plasmon resonance (SPR) to study NC binding to a series of short DNA oligonucleotides. They found that NC bound tightly to d(G) homopolymers, but exhibited much stronger binding to d(TG)n, where n≥5 (25).
Through work performed in this Example, a novel set of DNA hairpin constructs with low nM affinities to the NCp7 protein was discovered using two Combigen library chips. Each protein screen took less than 24 hours to complete, from labeling the protein to analyzing the protein's chip profile. Using a chambered slide holder, it was possible to rapidly process multiple slides in parallel and under different buffer conditions in a 24 hour period. The only “bottleneck” in the entire process was the 1 week delay for IDT (Integrated DNA Technologies, Inc.) to synthesize and deliver the biotinylated multiplex libraries. The results of these screens surpassed our own expectations in terms of sensitivity, reproducibility and speed.
Any patent, patent application, publication, or other disclosure material identified in the specification is hereby incorporated by reference herein in its entirety. Any material, or portion thereof, that is said to be incorporated by reference herein, but which conflicts with existing definitions, statements, or other disclosure material set forth herein is only incorporated to the extent that no conflict arises between that incorporated material and the present disclosure material.
Calculations made for 100 pmol of library.
AM=6.02E+13 is the total number of all NA molecules in the library.
m=length of varied loop sequence.
pm=4^m, the number of unique sequences of length m.
Np=AM/pm, the average number of each unique molecule in a pool that includes only length m.
H=6.0E06, the number of readable sequences from a chip.
tm=H/pm, the average number of times a given loop in a pool of length m should be sequenced in the absence of a prior separation step.
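The quantities defined above can be evaluated with a few lines of arithmetic. The sketch below uses the stated 100 pmol library and 6.0E06 readable sequences; the variable and function names are hypothetical, and Avogadro's number is rounded to 6.022E23.

```python
AVOGADRO = 6.022e23          # molecules per mole (rounded)
LIBRARY_MOL = 100e-12        # 100 pmol of library
READABLE_SEQUENCES = 6.0e6   # H, readable sequences from a chip

# AM: total number of nucleic acid molecules in the library (about 6.02E+13).
total_molecules = LIBRARY_MOL * AVOGADRO

def pool_stats(m):
    """Return (pm, Np, tm) for a pool whose varied loop has length m."""
    unique = 4 ** m                         # pm = 4^m
    copies = total_molecules / unique       # Np = AM / pm
    reads = READABLE_SEQUENCES / unique     # tm = H / pm
    return unique, copies, reads

for m in (6, 10, 15):
    pm, np_, tm = pool_stats(m)
    print(f"m={m:2d}  pm={pm:.3e}  Np={np_:.3e}  tm={tm:.3e}")
```

For m = 15, tm falls well below 1, meaning that most individual loop sequences would not be expected to be read even once without a prior separation step.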
m ≠ 15
NNNNNNNN
m = 15
NNNNNNNN
string ≦ 2 mismatches and < 1
= correct constant stem base,
AGTGTGGTCGGAAGT
ATGTGGCGAGGATGA
a A total of 4,749,241 reads had validated stems separated by 15 bases, with 4,237,141 unique sequences found. Only 4 sequences with varying count numbers from each conserved sequence group are shown.
This application claims priority to and the benefit of U.S. provisional patent applications Ser. Nos. 61/035,844 filed on Mar. 12, 2008, and 61/119,777, filed on Dec. 4, 2008, the entire contents of which applications are incorporated herein by reference.
The invention described herein was sponsored by the NIH under Phase I SBIR grants awarded to Orthosystems, Inc.; the U.S. government may have certain rights in this invention.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/US09/37022 | 3/12/2009 | WO | 00 | 1/3/2011 |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 61035844 | Mar 2008 | US |
Child | 12922173 | | US |
Parent | 61119777 | Dec 2008 | US |
Child | 61035844 | | US |