A METHOD FOR REPRODUCIBLE APTAMER SELECTION USED TO IDENTIFY APTAMERS THAT BIND TO UNKNOWN BIOMARKERS

FIELD OF THE INVENTION

The present invention relates to the field of aptamers, and provides a library of sequences for aptamer selection and related methods for selecting aptamers for binding to target molecules.

BACKGROUND

Aptamer development has grown into a global industry since aptamers were invented over thirty years ago. A key difficulty with aptamer selection has been that it is essentially not reproducible, because the sequences sampled in any given project represent only a small sub-sample of all the possible sequences in a synthesized library. In addition, the use of libraries with fewer random nucleotides has been constrained by the need for primer recognition sites to support reiterative selection. The present invention provides a basis for the effective use of closed solution spaces of sequences for aptamer selection, first with relatively few random nucleotides for small-molecule selection, and second with enough random nucleotides to support sufficiently complex structures for larger molecules. The Inventor has termed aptamers derived from this new process “Neomers”, so that a library of such aptamers can be referred to as a “Neomer Library.”

Aptamers are synthetic, single stranded oligonucleotides that mimic antibodies in their ability to act as ligands and bind to analytes. U.S. Pat. No. 5,475,096 A teaches a method for the in vitro selection of DNA or RNA molecules that are capable of binding specifically to any given target molecule. The method taught is composed of a process of reiterative selection steps that the inventors named “Systematic Evolution of Ligands by Exponential Enrichment” or SELEX.

The Inventors of U.S. Pat. No. 5,475,096 A suggested that a single cycle of selection may be sufficient to identify desirable aptamer sequences if the selection process is sufficiently stringent. The patent states:

“In one embodiment of the method of the SELEX patent applications, the selection process is so efficient at isolating those nucleic acid ligands that bind most strongly to the selected target, that only one cycle of selection and amplification is required. Such an efficient selection may occur, for example, in a chromatographic-type process wherein the ability of nucleic acids to associate with targets bound on a column operates in such a manner that the column is sufficiently able to allow separation and isolation of the highest affinity nucleic acid ligands.

In many cases, it is not necessarily desirable to perform the iterative steps of SELEX until a single nucleic acid ligand is identified. The target-specific nucleic acid ligand solution may include a family of nucleic acid structures or motifs that have a number of conserved sequences and a number of sequences which can be substituted or added without significantly effecting the affinity of the nucleic acid ligands to the target. By terminating the SELEX process prior to completion, it is possible to determine the sequence of a number of members of the nucleic acid ligand solution family.”

We define a closed solution space, in terms of possible aptamer sequences, as an initial selection library that contains multiple copies of every possible sequence, together with the capacity to characterize the frequency of every possible sequence.
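Whether a physical library can realize such a closed solution space is a matter of simple arithmetic. The sketch below assumes, as a simplification not stated in the source, that synthesis is unbiased, so that the copy number of each possible sequence is Poisson-distributed with a mean equal to the number of molecules divided by 4ⁿ for n random nucleotides. The 1 pmol library amount is a hypothetical example.

```python
import math

def coverage(n_random: int, n_molecules: float) -> tuple[int, float, float]:
    """Return (possible sequences, mean copies per sequence, expected fraction of
    possible sequences absent), assuming unbiased synthesis so that the copy
    number of each sequence is Poisson-distributed."""
    possible = 4 ** n_random                   # 4 nucleotides at each random position
    mean_copies = n_molecules / possible       # Poisson mean (lambda)
    fraction_missing = math.exp(-mean_copies)  # Poisson probability of 0 copies
    return possible, mean_copies, fraction_missing

# Hypothetical example: 1 pmol of library (about 6.022e11 molecules)
molecules = 1e-12 * 6.022e23
for n in (8, 40):
    possible, mean_copies, missing = coverage(n, molecules)
    print(f"{n} random nt: {possible:.3g} possible, "
          f"{mean_copies:.3g} mean copies, {missing:.3g} expected missing")
```

With 8 random nucleotides each possible sequence is expected in millions of copies, whereas with a 40-nucleotide random region the mean copy number is far below one, so almost every possible sequence is absent from any physical library.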

In the US patent described above, this definition of closed solution space was not contemplated because this was prior to the invention and use of next generation sequencing. They only contemplated an open solution system because they did not consider it possible to characterize all of the possible sequences in the naïve selection library (the random library prior to the initiation of selection). Instead, they relied on either reiterative selection or highly effective selection to limit the number of successful aptamers (aptamers that bind to the target) to a smaller number than were present in the naïve library. There was no consideration of the possibility of comparing changes in frequency for every sequence in the naïve library to the frequency of every sequence in a selected library after a single round of selection. This necessary reduction in the number of sequences is a function of both the number of possible sequences in the naïve library, and the capacity to sequence the aptamers after selection. They clearly state that this problem can be solved with either extremely strong selection pressure or with reiterative selection. In this invention we provide an alternative solution which relies on the design of the selection library such that the frequency of all of the possible sequences in the naïve library can be characterized and thus can be directly compared to the frequency of all of the possible sequences in a selected library after a single round of selection, or after multiple rounds of selection.

The SELEX process disclosed in the original patent quoted above requires synthesis with primer recognition sites flanking the random sequence at both the 5′ and 3′ ends. These extended, conserved sequence regions are required in that method because they enable PCR amplification of the selected library. However, because the sequences within the library are single-stranded and partly random, a large proportion of the sequences in the naïve library will form substantial secondary structure through hybridization between random nucleotides and the primer recognition sites. This is deleterious for several reasons.

Reason 1: the selection process is based on the functionality of the aptamers (their ability to bind to a specific target). This functionality is manifested by the secondary and tertiary structure that the aptamers are able to adopt based on their sequences. We can think of this as the structure space for an aptamer library. The presence of extended primer recognition sequences constrains the secondary and tertiary structure space. The possible structure space is not fully random as it is dominated by structures that involve the primer recognition sequences.

In practice, this difficulty is reduced by using more random nucleotides, which increases the probability of a broader diversity of structures based on hybridization within the random region. However, this does not entirely overcome the deleterious effect, because a large proportion of the sequences, and consequently of the structures, still involve hybridization between nucleotides in the random region and the primer recognition region.

Reason 2: a broader diversity of structures can be obtained by using more random nucleotides, but this also leads to longer aptamers, with the probability that sub-structures within an aptamer are responsible for its functionality while other regions of the aptamer are not involved in binding. The presence of these non-functional regions in the selected aptamer is deleterious: they add to the cost of aptamer synthesis, and they can also decrease aptamer functionality by interfering with the functional domain (decreasing affinity) or by binding to other target molecules (decreasing specificity). This deleterious effect affects aptamer selection for all targets but becomes more of a concern as the size of the target molecule decreases. The smaller the target molecule, the fewer nucleotides are involved in binding events with the target, and thus the more pronounced the effects of such unnecessary domains on affinity and specificity.

To explain further, it is common in the art of aptamer selection to use primer recognition regions averaging 20 nucleotides in length, flanking a random region of 40 nucleotides. The selected full-length aptamers are therefore 80 nucleotides long. An aptamer of this length has, on average, a higher probability of being in flux between different possible shapes at the temperature at which it was selected (FIGS. 1 & 2). An example of this flux in the secondary structure of a random aptamer sequence is provided below. This analysis is based on the software referenced by Gruber et al., 2008 (Nucleic Acids Res. 36 (Web Server issue): W70-W74).

This flux between possible shapes decreases the proportion of time that the aptamer spends in a shape that is optimal for binding to the target molecule. Consequently, in some of its collisions with the target, the aptamer will not be in the appropriate shape for binding to occur.
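The effect can be expressed with the Boltzmann distribution that underlies equilibrium secondary-structure analysis: the fraction of time a molecule spends in a given structure is proportional to exp(−ΔG/RT). The sketch below uses hypothetical free energies for three competing structures of one sequence; they are illustrative values, not predictions for any sequence in this document, and the partition function is taken over the listed structures only.

```python
import math

R = 1.987e-3  # gas constant, kcal/(mol*K)

def structure_probabilities(delta_g: dict[str, float], temp_c: float) -> dict[str, float]:
    """Equilibrium (Boltzmann) probability of each structure, given its free
    energy of folding in kcal/mol, at the selection temperature in degrees C."""
    rt = R * (temp_c + 273.15)
    weights = {name: math.exp(-dg / rt) for name, dg in delta_g.items()}
    z = sum(weights.values())  # partition function over the listed structures only
    return {name: w / z for name, w in weights.items()}

# Hypothetical free energies (kcal/mol) for three structures of one sequence
probs = structure_probabilities(
    {"binding-competent": -6.0, "alternative 1": -5.8, "alternative 2": -5.5}, 25.0)
for name, p in probs.items():
    print(f"{name}: {p:.2f}")
```

With free energies this close together, the binding-competent structure in this example is populated less than half of the time, which is the situation described above.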

In the art, it is common to attempt to overcome this difficulty by truncating the aptamer to a minimal size, thereby limiting the number of possible structures that can be formed.

This practice is time-consuming, and not always successful, as simple truncation of a sequence often does not result in an improvement in the stability of a sub-structure. Such stabilization also requires the introduction of nucleotide substitutions, and given that these substitutions were not part of the original selection, they may result in complete loss of aptamer function.

Reason 3: we have disclosed an alternative approach to SELEX-based aptamer selection, which we have named FRELEX (EP 3 344 805 B1 and U.S. Pat. No. 10,415,034 B2). In this approach, we expose traditional aptamer libraries to a field of immobilized oligonucleotides, each consisting of 8 contiguous random nucleotides. In phase I of selection, we select for aptamers capable of hybridizing with the random 8-mers and discard all aptamers that do not. We elute these hybridized aptamers and, in phase II of selection, combine them with a molecular target and re-expose the mixture to a fresh field of immobilized random 8-nucleotide oligonucleotides. In this case, we retain the sequences that did not hybridize to the immobilized oligonucleotides, because we presume that they were prevented from doing so by their binding to the target molecule.

This approach has worked well for the development of many aptamers for many targets, but for small targets, the approach implicitly drives selection for aptamers that bind at multiple sites to a given target molecule over those aptamers that bind the target more tightly. This is also an issue with SELEX selection for small molecules, as the immobilization of the target molecule also favours selection of aptamers that bind to more than the target molecule and hence have a higher probability of being retained in the selection process.

Because the presence of primer recognition sequences makes it necessary to increase the length of the aptamer in order to explore the structure space adequately during selection, it aggravates this constraint on selection for small molecules with either FRELEX or SELEX.

Here, we describe a synthetic library of aptamers in a closed sequence solution space, allowing a reproducible selection of aptamers, particularly useful for aptamer selection against small molecules.

SUMMARY

The present invention relates to a synthetic library of aptamers with one module, said library comprising a plurality of aptamer oligonucleotide sequences, each comprising one module comprising at least two regions comprising a mixture of fixed and random nucleotides, two regions being interspersed with a stretch of fixed nucleotides, preferably with a stretch of adenosines and/or thymidines;

- wherein internal sequence hybridization events are driven by variations in the random nucleotides of each of the at least two regions; and
- wherein the aptamer oligonucleotide sequences comprise at most 11 random nucleotides so that the maximum number of possible different sequences in the library of aptamers is limited to 4 194 304; preferably at most 8 or 9 random nucleotides.

In some embodiments, each aptamer oligonucleotide sequence comprises at most 8 or 9 random nucleotides.

In some embodiments, each aptamer in the library of aptamers has the nucleotide sequence SEQ ID NO: 1, 7, 9, 17 or 18.

The present invention also relates to a synthetic library of aptamers, said library comprising a plurality of aptamer oligonucleotide sequences each comprising two or more modules, each of said modules comprising at least two regions comprising a mixture of fixed and random nucleotides, two regions being interspersed with a stretch of fixed nucleotides, preferably with a stretch of adenosines and/or thymidines;

- wherein internal sequence hybridization events are driven by variations in the random nucleotides of each of the at least two regions within a same module or between two modules, preferably within a same module;
- wherein each module comprises at most 11 random nucleotides, preferably at most 8 or 9 random nucleotides; and
- wherein a restriction site is present between two modules.

In some embodiments, each aptamer in the library of aptamers has the nucleotide sequence SEQ ID NO: 6 or 14.

The present invention also relates to the use of the synthetic library of aptamers with one module described above in an aptamer selection process against a small target molecule, preferably wherein the small target molecule has a molecular weight below about 1 kDa.

The present invention also relates to a method of selecting aptamers that specifically bind to a small target molecule, preferably to a target molecule having a molecular weight below about 1 kDa, said method comprising contacting the small target molecule with the synthetic library of aptamers with one module described above, and recovering the aptamer oligonucleotides that bound to the small target molecule; and optionally, contacting another small target molecule with the synthetic library of aptamers and recovering the aptamer oligonucleotides that did not bind to said other small target molecule.

In some embodiments, the small molecule is selected from the group consisting of antibiotics, volatile organic compounds (VOCs), amino acids, sugars, lipids, phenolic compounds, and alkaloids.

The present invention also relates to the use of the synthetic library of aptamers with two or more modules described above in an aptamer selection process against a large target molecule, preferably wherein the large target molecule has a molecular weight above about 1 kDa.

The present invention also relates to a method of selecting aptamers that specifically bind to a large target molecule, preferably to a target molecule having a molecular weight above about 1 kDa, said method comprising contacting the large target molecule with the synthetic library of aptamers with two or more modules described above, and recovering the aptamer oligonucleotides that bound to the large target molecule; and optionally, contacting another large target molecule with the synthetic library of aptamers and recovering the aptamer oligonucleotides that did not bind to said other large target molecule.

In some embodiments common to all uses and methods disclosed herein, the selection is performed in a single round based on the statistical evaluation of the change in frequency of each sequence in the synthetic library of aptamers between a positive selection in the presence of the target molecule and a negative selection in the absence of the target molecule.

In some embodiments common to all uses and methods disclosed herein, the use or method is for selecting aptamers that specifically bind to a target molecule in a given 3-D or 4-D conformation.

In some embodiments common to all uses and methods disclosed herein, the use or method is for selecting aptamers that specifically bind to a full-length or native target molecule as opposed to the same target molecule that is degraded or otherwise cleaved.

In some embodiments common to all uses and methods disclosed herein, the use or method is for selecting aptamers that specifically bind to a target molecule as opposed to one or several counter-target molecules, wherein one or several positive selections are performed for both the target and the counter-target molecules with the same starting library, and aptamers that are preferentially selected in the presence of the target molecule compared to the counter-target molecules are recovered.

In some embodiments, the target molecule consists of a single target molecule or several target molecules.

In some embodiments, the target molecule consists of a known target molecule or unknown target molecules.

In some embodiments, the counter-target molecule consists of a single counter-target molecule or several counter-target molecules.

In some embodiments the target molecule and/or the counter-target molecule is selected from the group consisting of antibiotics, volatile organic compounds (VOCs), amino acids, sugars, lipids, phenolic compounds, alkaloids, proteins and peptides, optionally extracellular domains of transmembrane receptors.

In some embodiments the aptamers are selected to specifically bind to a full-length or native target molecule as opposed to the same target molecule that is degraded, cleaved, or altered due to post-translational modifications, mutations, or changes in 3D structure.

In some embodiments, the identification of aptamers for a target molecule comprises the following steps:

- a. Performing aptamer selection on the desired target with the synthetic library of aptamers,
- b. Performing selection on a counter-target or counter-targets with the same library either before, simultaneously or after step a,
- c. Selecting the best performing aptamers on the desired target using statistical analysis, preferably using the formulation of a Z statistic for each sequence:

$Z = \frac{\text{Avg. freq. w/ target} - \text{Avg. freq. w/o target}}{\text{Avg. of the standard deviations of the averages}}$

wherein “Avg. freq. w/ target” is the average frequency in the presence of selection for a target, “Avg. freq. w/o target” is the average frequency in the absence of selection for a target, and the denominator is the average of the standard deviations of these two average frequencies.

- d. Optionally evaluating how these best performing aptamers on the desired target respond to selection on the counter-target or counter-targets by characterizing their response in the selection on a counter-target or counter-targets with the same library, and
- Selecting the aptamers by retaining only those sequences that do not exhibit a statistically significant response to selection against a counter-target or counter-targets,
- or
- Selecting aptamers that cross-react with multiple targets, by retaining only those sequences that exhibit a statistically significant response to selection against a counter-target or counter-targets.
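Steps c and d can be sketched as follows, assuming per-replicate frequencies are already available for each sequence (at least two replicates per condition, since a standard deviation is required). Using the sample standard deviation across replicates, a threshold of 3, and reading a counter-target “response” as a Z value at or above that threshold are assumptions made for illustration; the source does not fix them. The sequence names and values are hypothetical.

```python
import math
from statistics import mean, stdev

def z_score(with_target: list[float], without_target: list[float]) -> float:
    """Z = (mean freq. with target - mean freq. without target)
    divided by the average of the two standard deviations (>= 2 replicates each)."""
    diff = mean(with_target) - mean(without_target)
    denom = (stdev(with_target) + stdev(without_target)) / 2
    if denom == 0:
        # Identical replicates: report no change, or an unbounded Z
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / denom

def select_aptamers(z_target: dict[str, float],
                    z_counters: list[dict[str, float]],
                    threshold: float = 3.0,
                    cross_reactive: bool = False) -> list[str]:
    """Keep sequences enriched on the target (Z >= threshold), then keep either
    those with no significant counter-target response (specific aptamers) or
    those with one (cross-reactive aptamers)."""
    hits = [seq for seq, z in z_target.items() if z >= threshold]

    def responds(seq: str) -> bool:
        return any(zc.get(seq, 0.0) >= threshold for zc in z_counters)

    return [seq for seq in hits if responds(seq) == cross_reactive]

# Hypothetical duplicate frequencies for one sequence
print(round(z_score([0.012, 0.014], [0.002, 0.004]), 2))
z_target = {"seq1": 7.1, "seq2": 5.0, "seq3": 1.0}
z_counter = [{"seq1": 0.7, "seq2": 4.2}]
print(select_aptamers(z_target, z_counter))                       # ['seq1']
print(select_aptamers(z_target, z_counter, cross_reactive=True))  # ['seq2']
```

The same `z_score` function serves both the target selections and the counter-target selections; only the filtering rule in `select_aptamers` distinguishes the specific and cross-reactive options of step d.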

In some embodiments, the method for identifying at least one aptamer against a target molecule comprises the steps of:

- a. Generating a synthetic library of aptamers,
- b. Using SELEX or FRELEX selection process, incubating the candidate aptamers with said target,
- c. Performing PCR reaction for each selected library,
- d. Cutting the amplified library using a restriction enzyme that recognizes the restriction site designed to reside in the middle of the library, to divide the selected library into two different modules, Module A and Module B, with a difference in sequence at either the 5′ or 3′ end or both ends of the modules.
- e. Dividing the restricted library into two aliquots and specifically amplifying Module A from one aliquot and Module B from the second aliquot with sufficient PCR reactions to ensure maintenance of an average copy number of at least 100 copies of each sequence.
- f. Determining the frequency of each Module sequence within each selected library by dividing copy number by the total number of reads for that Module within each selected library.
- g. Multiplying the frequency of the sequences in Module A by the frequency of the sequences in Module B thus creating a matrix of the frequencies of 4 294 967 296 possible sequences for each selection,
- h. Performing steps to g in at least duplicate to enable calculation of average frequencies and the standard deviation of these average frequencies for each 4 294 967 296 possible sequences for a given target and for selections with the same library in the absence of the target.
- i. Identifying sequences in terms of determining the statistical significance of the average differences in frequencies between a library selected for a target and a library selected in the absence of the target, including but not limited to evaluating Z values by subtracting the average frequency of each sequence in the absence of target from the average frequency of each sequence in the presence of target, and dividing this subtracted value by the average of the standard deviation for the same sequence in both the presence and absence of the target.
- j. Optionally, evaluating the binding performance of the sequences against the desired target and counter-targets using methods from the group comprising: surface plasmon resonance imaging, isothermal titration calorimetry, dialysis, qPCR analysis of bound and unbound fractions of aptamer, electrochemical approaches, modulation of fluorescence either through quenching, or fluorescence polarization, lateral flow assays, ELISA assays, or HPLC analysis.
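Steps f and g can be sketched as follows. The module sequence labels and read counts are hypothetical placeholders; in a real analysis each module would have up to 65 536 sequences (8 random nucleotides), giving the 4 294 967 296-entry matrix of step g. Stored densely as 64-bit floating-point values, that matrix would occupy 32 GiB, so a practical implementation would more likely compute only the entries it needs, as `combined_frequency` does.

```python
def module_frequencies(read_counts: dict[str, int]) -> dict[str, float]:
    """Step f: frequency = read count of a sequence / total reads for that module."""
    total = sum(read_counts.values())
    return {seq: count / total for seq, count in read_counts.items()}

def combined_frequency(freq_a: dict[str, float], freq_b: dict[str, float],
                       seq_a: str, seq_b: str) -> float:
    """Step g, one matrix entry: product of the Module A and Module B frequencies."""
    return freq_a.get(seq_a, 0.0) * freq_b.get(seq_b, 0.0)

def frequency_matrix(freq_a: dict[str, float],
                     freq_b: dict[str, float]) -> dict[tuple[str, str], float]:
    """Step g, full matrix (only practical for small examples)."""
    return {(a, b): fa * fb for a, fa in freq_a.items() for b, fb in freq_b.items()}

# Hypothetical read counts for the two modules of one selected library
freq_a = module_frequencies({"A1": 300, "A2": 100})
freq_b = module_frequencies({"B1": 50, "B2": 150})
matrix = frequency_matrix(freq_a, freq_b)
print(matrix[("A1", "B2")])  # 0.75 * 0.75 = 0.5625
print(sum(matrix.values()))  # 1.0
```

Because each module's frequencies sum to one, the product matrix also sums to one; repeating this per replicate yields the inputs for the averages and standard deviations of steps h and i.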

DETAILED DESCRIPTION

The present invention relates to a synthetic library of aptamers comprising a plurality of aptamer oligonucleotide sequences, and related uses and methods.

The terms “aptamer” or “aptamer oligonucleotide” or “aptamer oligonucleotide sequence” refer to oligonucleotides that mimic antibodies in their ability to act as ligands and bind to a target molecule. In some embodiments, aptamers comprise natural or synthetic DNA nucleotides, natural or synthetic RNA nucleotides, modified DNA nucleotides, modified RNA nucleotides, or a combination thereof. The term “library of aptamers” refers to a DNA library of aptamers or an RNA library of aptamers.

In some embodiments, the synthetic aptamer library comprises at least 50 000 different sequences, at least 75 000 different sequences, at least 100 000 different sequences, at least 250 000 different sequences, at least 500 000 different sequences, at least 750 000 different sequences, at least 1 000 000 different sequences, at least 1 250 000 different sequences, at least 1 500 000 different sequences, at least 1 750 000 different sequences, at least 2 000 000 different sequences, at least 2 250 000 different sequences, at least 2 500 000 different sequences, at least 2 750 000 different sequences, at least 3 000 000 different sequences, at least 3 250 000 different sequences, at least 3 500 000 different sequences, at least 3 750 000 different sequences, at least 4 000 000 different sequences, at least 4 250 000 different sequences.

In some embodiments, the synthetic aptamer library with one module comprises at most 4 194 304 different sequences, at most 1 048 576 different sequences, at most 262 144 different sequences or at most 65 536 different sequences.

In some embodiments, the synthetic aptamer library with two modules comprises at most 17 592 186 044 416 different sequences, at most 1 099 511 627 776 different sequences, at most 68 719 476 736 different sequences or at most 4 294 967 296 different sequences.

In some embodiments, the synthetic aptamer library comprises at least 50 000 aptamers, at least 100 000 aptamers, at least 500 000 aptamers, at least 1 000 000 aptamers, at least 2 000 000 aptamers, at least 3 000 000 aptamers, at least 4 000 000 aptamers, at least 5 000 000 aptamers, at least 6 000 000 aptamers, at least 7 000 000 aptamers, at least 8 000 000 aptamers, at least 9 000 000 aptamers, at least 10 000 000 aptamers, at least 20 000 000 aptamers, at least 30 000 000 aptamers, at least 40 000 000 aptamers, at least 50 000 000 aptamers, at least 60 000 000 aptamers, at least 70 000 000 aptamers, at least 80 000 000 aptamers, at least 90 000 000 aptamers, at least 100 000 000 aptamers, at least 200 000 000 aptamers, at least 300 000 000 aptamers, at least 400 000 000 aptamers, at least 500 000 000 aptamers, at least 600 000 000 aptamers, at least 700 000 000 aptamers, at least 800 000 000 aptamers, at least 900 000 000 aptamers, at least 1 000 000 000 aptamers, at least 2 000 000 000 aptamers, at least 3 000 000 000 aptamers, at least 4 000 000 000 aptamers, at least 5 000 000 000 aptamers, at least 6 000 000 000 aptamers, at least 7 000 000 000 aptamers, at least 8 000 000 000 aptamers, at least 9 000 000 000 aptamers, at least 10 000 000 000 aptamers, at least 20 000 000 000 aptamers, at least 30 000 000 000 aptamers, at least 40 000 000 000 aptamers, at least 50 000 000 000 aptamers, at least 60 000 000 000 aptamers, at least 70 000 000 000 aptamers, at least 80 000 000 000 aptamers, at least 90 000 000 000 aptamers, at least 100 000 000 000 aptamers, at least 200 000 000 000 aptamers, at least 300 000 000 000 aptamers, at least 400 000 000 000 aptamers, at least 500 000 000 000 aptamers, at least 600 000 000 000 aptamers, at least 700 000 000 000 aptamers, at least 800 000 000 000 aptamers, at least 900 000 000 000 aptamers, or at least 1 000 000 000 000 aptamers.

According to the invention, each aptamer oligonucleotide sequence in the library has the same modular structure, with at least one module, each of said modules comprising at least two regions interspersed with a stretch of fixed nucleotides.

The at least two regions comprise a mix of fixed nucleotides (i.e., nucleotides that are identical in all the sequences of the library) and random nucleotides (i.e., nucleotides that vary between sequences of the library).

In a preferred embodiment, the fixed sequences are designed to minimize the potential for complementary hybridization with other fixed sequences, both within a region and between regions. As such, all structural variation within the library is driven by variation in the identity of the random nucleotides and their ability to hybridize with other random or fixed nucleotides, either within a region or between regions. In some embodiments, the at least two regions are at least partially complementary in pairs. Depending on the identity of the random nucleotides in the at least two regions, they can be partially or fully complementary. In some embodiments, two regions are therefore capable of forming a secondary structure element consisting of a double-stranded stem, possibly with one or several mismatches where two random nucleotides do not pair. Structural variation among the aptamers is hence driven by variation in the random nucleotides and by their potential for hybridization between different regions (through their random nucleotides, or through their random and fixed nucleotides).

In some embodiments, each of the at least two regions comprises at least 3 nucleotides in total (fixed and random), such as 3, 4, 5, 6, 7 or more nucleotides in total. In some preferred embodiments, each of the at least two regions comprises 4 or 5 nucleotides in total (fixed and random).

In some embodiments, each of the at least two regions comprises from about 20% to about 80% of random nucleotides. In some embodiments, each of the at least two regions comprises at least 1, such as 1, 2, 3, 4, 5 or more random nucleotides.

In some preferred embodiments, each of the at least two regions comprises 4 or 5 nucleotides in total (fixed and random), of which 1, 2 or 3 are random nucleotides.

In some preferred embodiments, each aptamer oligonucleotide sequence with one module in the library comprises at most 11 random nucleotides, so that the maximum number of possible different sequences in the library of aptamers is limited to 4¹¹=4 194 304. In some preferred embodiments, each aptamer oligonucleotide sequence with one module in the library comprises at most 10 random nucleotides, so that the maximum number of possible different sequences in the library of aptamers is limited to 4¹⁰=1 048 576. In some preferred embodiments, each aptamer oligonucleotide sequence with one module in the library comprises at most 9 random nucleotides, so that the maximum number of possible different sequences in the library of aptamers is limited to 4⁹=262 144. In some preferred embodiments, each aptamer oligonucleotide sequence with one module in the library comprises at most 8 random nucleotides, so that the maximum number of possible different sequences in the library of aptamers is limited to 4⁸=65 536.
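The upper bounds above follow from there being four possible nucleotides at each random position. The short sketch below reproduces them, assuming (for the two-module case) that each module carries the same number of random nucleotides; the function name is illustrative only.

```python
def max_library_size(random_per_module: int, modules: int = 1) -> int:
    """Number of possible distinct sequences: 4 nucleotide choices per random
    position, multiplied across all modules."""
    return 4 ** (random_per_module * modules)

for n in (8, 9, 10, 11):
    # random nucleotides per module, one-module size, two-module size
    print(n, max_library_size(n), max_library_size(n, modules=2))
```

The two-module column (4 294 967 296 up to 17 592 186 044 416) corresponds to the upper bounds given earlier for libraries with two modules.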

In some embodiments, the stretch(es) of fixed nucleotides comprise(s) adenosines and/or thymidines.

In some embodiments, the stretch(es) of fixed nucleotides comprise(s) at least 3 nucleotides, such as 3, 4, 5, 6, 7, 8, 9, 10 or more fixed nucleotides, preferably adenosines and/or thymidines. In some embodiments, the stretch(es) of fixed nucleotides comprise(s) a sufficient number of fixed nucleotides to be capable of forming a secondary structure element consisting of a loop. Depending on the overall organization of the at least two regions and the stretch of fixed nucleotides, this loop can be a hairpin loop at the end of a double-stranded stem, or an interior loop between two double-stranded stems.

Example 1 shows an exemplary embodiment wherein the aptamer oligonucleotide sequences have an overall A-s₁-B-s₂-C-s₃-D structure, wherein A, B, C and D are regions of fixed and random nucleotides and each of s₁, s₂ and s₃ is a stretch of fixed nucleotides. In this example, region A and region D can be at least partially complementary and form a stem, and region B and region C can be at least partially complementary and form a stem; hence, the stretch s₂ would form a hairpin loop, and the stretches s₁ and s₃ together would form an internal loop.

An exemplary consensus sequence that can be shared by all the aptamers in the library is given as SEQ ID NO: 1, and further described in Example 1 below. It is to be understood that this consensus sequence is not intended to be a limiting feature of the present invention, and that the skilled artisan will readily contemplate modifications in this sequence, such as in the exact position and/or number of random nucleotides within the various regions, as well as in the nature, position and number of fixed nucleotides within the various regions and stretches.

In some embodiments, the consensus sequence is as set forth in SEQ ID NO: 34.

SEQ ID NO: 34

5′-AAANGWWWNNNGWWWCNNNWWWCNTTT-3′

wherein W indicates adenine or thymine, and N indicates any nucleotide.

In some embodiments, the consensus sequence is as set forth in SEQ ID NO: 35.

SEQ ID NO: 35

5′-AAANGWWWNNNGWWWGNNNWWWCNTTT-3′

wherein W indicates adenine or thymine, and N indicates any nucleotide.

In some embodiments, the consensus sequence is as set forth in SEQ ID NO: 36.

SEQ ID NO: 36

5′-AAANGWWWNNNCWWWCNNNWWWCNTAA-3′

wherein W indicates adenine or thymine, and N indicates any nucleotide.
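The pairing between regions can be checked on the consensus pattern itself. The sketch below splits SEQ ID NO: 34 at its W stretches, assuming (true for this consensus, but not necessarily for every embodiment) that W occurs only in the stretches s₁, s₂ and s₃. Comparison is at the pattern level: an N matched against an N only indicates that pairing is possible, since actual base pairing depends on which nucleotides occupy the random positions.

```python
import re

# Complements for the symbols used in the consensus (W = A/T pairs with W)
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N", "W": "W"}

def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def split_regions(consensus: str) -> list[str]:
    """Split on the W stretches (s1, s2, s3) to recover regions A, B, C, D."""
    return re.split(r"W+", consensus)

SEQ_34 = "AAANGWWWNNNGWWWCNNNWWWCNTTT"
a, b, c, d = split_regions(SEQ_34)
n_random = SEQ_34.count("N")
print(a, b, c, d)                   # AAANG NNNG CNNN CNTTT
print(reverse_complement(d) == a)   # True: A and D can form the outer stem
print(reverse_complement(c) == b)   # True: B and C can form the inner stem
print(n_random, 4 ** n_random)      # 8 65536
```

The check also shows that SEQ ID NO: 34 carries 8 random nucleotides, placing it in the 4⁸=65 536 size class.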

Also contemplated herein is a synthetic library of aptamers comprising a plurality of aptamer oligonucleotide sequences, and related uses and methods, wherein each aptamer oligonucleotide sequence in the library comprises two or more modules as described above, i.e., each module having at least two regions interspersed with a stretch of fixed nucleotides.

According to this embodiment, a restriction site may be present between each module, so as to allow individualization of each module by use of a restriction enzyme.

Two examples of restriction enzymes are given in Examples 4 and 5 below:

- the DraI enzyme, which recognizes and cleaves a TTT↓AAA sequence and produces blunt ends; and
- the KasI enzyme, which recognizes and cleaves a G↓GCGCC sequence and produces 5′ overhangs.

It is however to be understood that these two enzymes are not intended to be a limiting feature of the present invention, and that the skilled artisan will readily contemplate using other restriction enzymes, depending on the specific nucleic acid sequence that is present at the interface between two modules.
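Module individualization can be illustrated with a simple top-strand digest. The sketch assumes the recognition sequences and top-strand cut positions of DraI (TTT↓AAA, blunt) and KasI (G↓GCGCC, whose 5′ overhang is not modeled here). The two module sequences are hypothetical instances of the SEQ ID NO: 34 pattern, chosen so that the TTT at the 3′ end of the first module and the AAA at the 5′ end of the second form a DraI site, and so that no site occurs inside either module; primer and other flanking regions are omitted.

```python
# Recognition sequence and top-strand cut offset within the site
DRAI = ("TTTAAA", 3)   # TTT^AAA, blunt ends
KASI = ("GGCGCC", 1)   # G^GCGCC, 5' overhang (bottom strand not modeled)

def digest(seq: str, site: str, cut_offset: int) -> list[str]:
    """Cut the top strand at every occurrence of `site`, `cut_offset` bases into it."""
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        fragments.append(seq[start:pos + cut_offset])
        start = pos + cut_offset
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments

# Hypothetical instances of the SEQ ID NO: 34 pattern (A, s1, B, s2, C, s3, D)
MODULE_1 = "AAACG" "ATA" "GCAG" "TAT" "CTGC" "ATA" "CGTTT"
MODULE_2 = "AAAGG" "TTA" "ACTG" "AAT" "CAGT" "TAA" "CCTTT"

two_module = MODULE_1 + MODULE_2   # junction ...CGTTT|AAAGG... forms TTTAAA
print(digest(two_module, *DRAI) == [MODULE_1, MODULE_2])  # True
```

Because DraI cuts in the middle of its site, the blunt cut returns each module exactly, with the TTT and AAA ends that belong to regions D and A respectively.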

Examples 4 and 5 show exemplary embodiments of such aptamer oligonucleotide sequences comprising two modules.

In some embodiments, the two or more modules can be identical. Alternatively, they can be different, e.g., they can differ by at least 1 fixed nucleotide, such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more fixed nucleotides. When they differ, the differences between the at least two modules may lie in the fixed nucleotides of one or several regions, in the stretch(es) of fixed nucleotides, or in both.

In some embodiments, the aptamer oligonucleotide sequences comprise at least 2 modules as described above, such as at least 2, 3, 4, 5, 6, 7, 8 or more modules as described above. When the aptamer oligonucleotide sequences comprise more than 2 modules, such as 3 or more, the restriction sites present between the modules may be the same or different, and are preferably the same.

The synthetic libraries of aptamers described herein find their use in several applications, as described in the Example section, and in particular in Example 6. The skilled artisan is familiar with techniques of aptamer selection, such as FRELEX or SELEX.

A particular application is a method of selecting aptamers that bind, preferably specifically bind, to a target molecule, which method comprises contacting a synthetic library of aptamers as described herein with the target molecule, and selecting those aptamer sequences that bind to the target molecule.

In particular, this method may be applied to the selection against a target molecule in multiple parallel applications of a single selection round. As detailed in the Example section, such selection in a single round is rendered possible and accurate with the synthetic libraries of aptamers described herein. Preferably, the basis for selection is the statistical evaluation of the change in frequency of each sequence in the library between a positive selection (i.e., in the presence of the target molecule) and a control, negative selection (i.e., in the absence of the target molecule).
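The passage above leaves the exact statistic open. As one possible reading, the Python sketch below treats each parallel selection as a replicate, converts NGS read counts to relative frequencies, and reports for every sequence the fold change in mean frequency (target versus no-target control) together with a Welch-style z score. The choice of statistic, the small pseudocount, and the function names are assumptions for illustration and are not necessarily the procedure behind the Z scores and Fold values shown in the figures.

```python
import math
from statistics import mean, stdev


def to_frequencies(counts: dict[str, int]) -> dict[str, float]:
    """Convert NGS read counts for one selection into relative frequencies."""
    total = sum(counts.values())
    return {seq: c / total for seq, c in counts.items()}


def score(positive: list[dict[str, int]], control: list[dict[str, int]],
          pseudo: float = 1e-9) -> dict[str, tuple[float, float]]:
    """Return {sequence: (fold, z)} comparing target vs no-target replicates.

    Requires at least two replicates per arm so a standard deviation exists.
    """
    pos = [to_frequencies(r) for r in positive]
    neg = [to_frequencies(r) for r in control]
    results = {}
    for seq in set().union(*pos, *neg):
        p = [f.get(seq, 0.0) for f in pos]  # unseen sequence -> frequency 0
        n = [f.get(seq, 0.0) for f in neg]
        fold = (mean(p) + pseudo) / (mean(n) + pseudo)
        se = math.sqrt(stdev(p) ** 2 / len(p) + stdev(n) ** 2 / len(n))
        z = (mean(p) - mean(n)) / se if se > 0 else 0.0
        results[seq] = (fold, z)
    return results


# Hypothetical counts for two sequences in two replicate selections per arm.
scores = score(positive=[{"A": 90, "B": 10}, {"A": 80, "B": 20}],
               control=[{"A": 10, "B": 90}, {"A": 20, "B": 80}])
print(scores)
```

Sequences enriched by the target then have a fold above 1 and a positive z score, while sequences depleted relative to the control have a negative z score.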

The method can be applied, for instance, for the selection of aptamer sequences which bind specifically to a target molecule, but not to another specific molecule or group of molecules. In this case, the method will include a step of counter-selection against the other specific molecule or group of molecules.

The method can also be applied, for instance, for the selection of aptamer sequences that bind at relatively different affinities to different target molecules, e.g., for determining the presence of such molecules in a mixture and quantifying their relative amounts.

Such a method can, for instance, allow the selection of aptamer sequences that bind specifically to a target molecule in a given 3-D or 4-D conformation, e.g., a target molecule in a free state versus the same target molecule in complex with other molecules (4-D conformation), or a target molecule with one 3-D structure versus the same target molecule with a different folding.

Such a method can also, for instance, allow the selection of aptamer sequences that bind specifically to a full-length or native target molecule versus the same target molecule after it has been degraded or otherwise cleaved into one or several fragments.

As explained in the Example section below, the synthetic library of aptamers, in particular when it comprises a single module, can be particularly useful for selecting aptamer sequences that bind specifically to small molecules. The term “small molecule” refers to molecules having a molecular weight below about 1 kDa.

Some examples of such small molecules include, without limitation, antibiotics, volatile organic compounds (VOCs), amino acids, some sugars, lipids, as well as phenolic compounds, alkaloids, and the like.

Conversely, the synthetic library of aptamers, in particular when it comprises two or more modules, can be particularly useful for selecting aptamer sequences that bind specifically to large or small molecules. The term “large molecule” refers to molecules having a molecular weight above about 1 kDa.

Some examples of such large molecules include, without limitation, proteins and polypeptides.

One object of the invention is an aptamer or aptamers obtainable from implementing the method of the invention.

Another object of the invention is the above-mentioned aptamer or aptamers or specific set of aptamers for use in a diagnostic or prognostic or disease monitoring method.

In one embodiment of this invention, the above-mentioned aptamer or aptamers or specific set of aptamers are for determining whether individuals have high or low brain amyloid.

In one embodiment of this invention, the above-mentioned aptamer or aptamers or specific set of aptamers are for determining the resistance to cognitive decline of a subject in the presence of high brain amyloid.

In one embodiment of this invention, the above-mentioned aptamer or aptamers or specific set of aptamers are for determining the rate of brain amyloid deposition in a subject.

In one embodiment of this invention, the method for determining whether individuals have high or low brain amyloid, the resistance to cognitive decline of a subject in the presence of high brain amyloid, or the rate of brain amyloid deposition in a subject, or for diagnosing or prognosing a disease in a subject, or for monitoring a disease in a subject, comprises contacting the aptamers or specific set of aptamers of the invention with a biological sample of a subject.

In one embodiment, the method as described above comprises contacting the aptamers or specific set of aptamers of the invention with a biological sample from at least one subject. In practice, the aptamers or specific set of aptamers of the invention would be applied to biological samples from a subject and then subjected to quantitative PCR analysis to determine the relative frequency of the aptamer sequences within the sample. In one embodiment, the relative frequency of diagnostic aptamers would be determined by a method other than quantitative PCR analysis, such as NGS analysis, hybridization to antisense sequences, or quantitative LCR analysis.
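One common way to turn quantitative PCR readouts into relative frequencies is the comparative threshold-cycle (Ct) approach. The Python sketch below assumes that each diagnostic aptamer is measured by its own assay and that all assays amplify with the same per-cycle efficiency (ideally a doubling per cycle); the aptamer names and Ct values are hypothetical.

```python
def relative_frequencies(ct: dict[str, float],
                         efficiency: float = 2.0) -> dict[str, float]:
    """Estimate each aptamer's share of the panel from qPCR threshold cycles.

    Assumes starting quantity is proportional to efficiency ** (-Ct),
    i.e. every assay amplifies with the same per-cycle efficiency.
    """
    quantity = {name: efficiency ** (-c) for name, c in ct.items()}
    total = sum(quantity.values())
    return {name: q / total for name, q in quantity.items()}


# Hypothetical Ct values for two diagnostic aptamers in one sample.
print(relative_frequencies({"aptamer_1": 20.0, "aptamer_2": 21.0}))
```

With a doubling per cycle, an aptamer crossing the threshold one cycle earlier is estimated to be twice as abundant, so the two hypothetical aptamers above account for about two thirds and one third of the panel.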

In one embodiment, the subject is/was diagnosed with the medical state, disease or condition under investigation. In one embodiment, the subject is at risk of developing the medical state, disease or condition under investigation. In one embodiment, the subject is/was not diagnosed with the medical state, disease or condition under investigation.

In one embodiment, the subject or diagnostic subject is a mammal, preferably a primate, more preferably a human. In one embodiment, the subject or diagnostic subject is a man. In one embodiment, the subject or diagnostic subject is a woman. In one embodiment, the subject or diagnostic subject is above the age of 20, preferably above the age of 30, 40, 50, 60, 70, 80, 90 years old or more. In one embodiment, the subject or diagnostic subject is from 30 to 90 years old, preferably from 40 to 90 years old, more preferably from 50 to 90 years old, even more preferably from 60 to 90 years old, even more preferably from 70 to 90 years old.

In one embodiment, the medical states, diseases or conditions include, but are not limited to, neurodegenerative diseases and neurological diseases.

Examples of neurodegenerative diseases include, but are not limited to, Alzheimer's disease, mild cognitive impairment (MCI), impaired memory functions, Parkinson's disease, Huntington's disease, multiple sclerosis, amyotrophic lateral sclerosis (ALS) (including familial ALS and sporadic ALS), Pick's disease, dementia, depression, sleep disorders, psychoses, epilepsy, schizophrenia, paranoia, attention deficit hyperactivity disorder (ADHD), amnesiac syndromes, progressive supranuclear palsy, brain tumor, head trauma and Lyme disease.

In a preferred embodiment, the medical state, disease or condition is Alzheimer's disease.

In one embodiment, the subject or diagnostic subject is not receiving medication for the medical state, disease or condition. In one embodiment, the subject or diagnostic subject is not receiving medication for Alzheimer's disease.

In one embodiment, the subject or diagnostic subject is receiving medication for the medical state, disease or condition. In one embodiment, the subject or diagnostic subject is receiving medication for Alzheimer's disease.

Examples of Alzheimer's disease medications include, but are not limited to, acetylcholinesterase inhibitors, NMDA receptor antagonists and antibodies.

Examples of acetylcholinesterase inhibitors include, but are not limited to, donepezil (Aricept®, Namzaric®), rivastigmine (Exelon®), galantamine (Razadyne®), huperzine A and tacrine (Cognex®).

Examples of NMDA receptor antagonists include, but are not limited to, memantine (Axura®, Ebixa®, Namenda®, Namzaric®).

Examples of antibodies include, but are not limited to, monoclonal antibodies directed against Aβ (such as, e.g., aducanumab, bapineuzumab, crenezumab, gantenerumab, ponezumab, solanezumab) and immunoglobulin therapies (such as, e.g., Gammagard®, Flebogamma®).

In one embodiment, the Alzheimer's disease medication is an agency-approved medication, i.e., a medication which has been approved by a national or regional drug agency selected from the group consisting of Food and Drug Administration (FDA—United States), European Medicines Agency (EMA—European Union), Pharmaceuticals and Medical Devices Agency (PMDA—Japan), China Food and Drug Administration (CFDA—China) and Ministry of Food and Drug Safety (MFDS—South Korea).

Also disclosed herein are the following embodiments.

E1: A synthetic library of aptamers suitable for use in a method of selecting aptamers specifically binding to small molecules, said library comprising multiple regions composed of a mixture of fixed and random nucleotides designed in such a way that the random nucleotides have the potential to be homologous between specific regions, and where these regions are separated by fixed sequences that enable hairpin turns between contiguous homologous regions, wherein the maximum number of possible different sequences is limited to 4 194 304.

E2: The synthetic library of aptamers according to E1, wherein the aptamer sequences comprise at most 11 nucleotides of random sequence, preferably at most 8 or 9 nucleotides of random sequence.

E3: The synthetic library of aptamers according to E1 or E2, wherein each aptamer in the library of aptamers has a sequence 5′-AAANGAAANNNGAAACNNNAAACNTTT-3′ with SEQ ID NO: 1, wherein N represents any nucleotide.

E4: The synthetic library of aptamers according to any one of E1 to E3, wherein small molecules are smaller in size than 1 000 Daltons.

E5: A method for the reproducible processing and analysis of aptamer selections for target molecules comprising a selection library of not more than 11 random nucleotides, and preferably 8 or 9 random nucleotides.

E6: The method according to E5, as applied to the selection of a target in multiple parallel applications of a single selection round where the basis for selection is the statistical evaluation of the change in frequency of each sequence in the library between the positive selection and a control selection with no target.

E7: The method according to E5 or E6, wherein the selection library is a synthetic library of aptamers according to any one of E1 to E4.

E8: The method according to any one of E5 to E7, for the identification of aptamers that bind specifically to one target molecule and not to another specific molecule or molecules.

E9: The method according to any one of E5 to E7, for the identification of aptamers that bind at relatively different affinities to different target molecules, for use in determining the presence of such molecules in a mixture and quantifying their relative amounts.

E10: The method according to any one of E5 to E9, wherein the target molecules are small molecules, preferably smaller in size than 1 000 Daltons.

E11: A synthetic library of aptamers composed of modules that can be separated post selection with a restriction enzyme, each of said modules comprising the design components as described in any one of E1 to E4.

E12: The synthetic library of aptamers according to E11, wherein each aptamer in the library of aptamers has a sequence 5′-AAANGAAANNNGAATGNNNAAACNTTTAAANGAAANNNCATTCNNNTTACNTAA-3′ with SEQ ID NO: 6.

E13: Use of the library according to E11 or E12 in the method according to E5 or E6, said method being followed by the separation of each of the individual modules of the library by a restriction enzyme, wherein the target molecules are molecules larger than 1 000 Daltons.

E14: The use according to E13, for the selection of aptamers for more complex targets than molecules that are smaller than 1 000 Daltons in multiple parallel applications of a single selection round, where the basis for selection is the statistical evaluation of the change in frequency of each sequence in the library between the positive selection and a control selection with no target.

E15: The use according to E13 or E14, for the identification of aptamers that bind specifically to one molecule larger than 1 000 Daltons and not to another specific molecule or molecules.

E16: The use according to E13 or E14, for the identification of aptamers that bind to different molecules with different affinities, for use in determining the presence of such molecules in a mixture, or for quantifying the relative amounts of such molecules.

E17: The use according to E13 or E14, to characterize a difference between a molecule by itself and the same molecule with another molecule bound to it.

E18: The use according to E13 or E14, to characterize a difference in the manner in which a molecule is folded.

E19: The use according to E13 or E14, to characterize different cleavage products of a protein.

E20: A synthetic library of aptamers, said library comprising a plurality of aptamer oligonucleotide sequences each comprising two or more modules, each of said modules comprising at least two regions comprising a mixture of fixed and random nucleotides, two regions being interspersed with a stretch of fixed nucleotides, preferably with a stretch of adenosines and/or thymidines;

- wherein internal sequence hybridization events are driven by variations in the random nucleotides of each of the at least two regions within a same module or between two modules, preferably within a same module;
- wherein each module comprises at most 11 random nucleotides, preferably at most 8 or 9 random nucleotides; and
- wherein a restriction site is present between two modules.

E21: The synthetic library of aptamers according to E20, wherein each aptamer in the library of aptamers has the fixed nucleotides of the sequence SEQ ID NO: 6 or SEQ ID NO: 14, and a varying sequence at the random-nucleotide positions of SEQ ID NO: 6 or SEQ ID NO: 14.

E22: Use of the synthetic library of aptamers according to E20 or E21, in an aptamer selection process against a target molecule.

E23: Use of the synthetic library of aptamers according to any one of E20 to E22, to identify aptamers that bind to target molecules that differ in biological samples that are derived from individuals that differ in terms of phenotype where the identity of the target molecules that the aptamers bind to is not necessarily known.

E24: A method of selecting aptamers that specifically bind to a target molecule, wherein the method comprises contacting different biological samples that are derived from individuals that differ in terms of phenotype with the synthetic library of aptamers according to E20 or E21, and recovering the aptamer oligonucleotides that bound to the target molecules.

E25: The use or the method according to E22 or E24, wherein the target molecule or molecules are unknown.

E26: The use or the method according to any one of E22 to E25, wherein the target molecule or molecules are located on cell surfaces, wherein preferably the cell is a mammalian cell, bacterial cell, fungus, or virus, more preferably a Mycobacterium cell or virus.

E27: The use or the method according to any one of E22 to E26, wherein the target molecule or molecules are contained in biological fluids, biological samples, or tissues.

E28: The use or the method according to E27, wherein the biological fluid is blood, plasma or serum.

E29: The use or the method according to any one of E22 to E28, wherein the binding of the aptamers to unknown molecules is used to predict whether individuals have high or low brain amyloid, or high or low resistance to cognitive decline in the presence of high brain amyloid, or fast or slow rates of brain amyloid deposition.

E30: An aptamer or aptamers obtainable by the use or the method of any one of E22 to E29.

E31: Use of at least one aptamer or at least one set of aptamers according to E30, for the diagnosis or monitoring of a disease or disorder or for predicting any medical state.

E32: The aptamer having the nucleotide sequence of SEQ ID NO: 43 (Ham_6968) or SEQ ID NO: 44 (Ham_2753) or SEQ ID NO: 45 (Ham_6700) or SEQ ID NO: 46 (Ham_8505) or SEQ ID NO: 47 (C-LAM_1) or SEQ ID NO: 48 (C-LAM_168) or SEQ ID NO: 49 (C-LAM_2709) or SEQ ID NO: 50 (C-LAM_262).

E33: The aptamer having the nucleotide sequence of SEQ ID NO: 51 or SEQ ID NO: 52 or SEQ ID NO: 53 or SEQ ID NO: 54 or SEQ ID NO: 55 or SEQ ID NO: 56 or SEQ ID NO: 57 or SEQ ID NO: 58.

E34: The aptamer having the nucleotide sequence of SEQ ID NO: 59 or SEQ ID NO: 60 or SEQ ID NO: 61 or SEQ ID NO: 62 or SEQ ID NO: 63 or SEQ ID NO: 64 or SEQ ID NO: 65 or SEQ ID NO: 66.

E35: A set of aptamers, wherein the set comprises a combination of 1, 2, 3, 4, 5, 6, 7 or 8 aptamers according to any one of E30 to E34, wherein the set comprises:

- a. the set A, with the nucleotide sequences of SEQ ID NO: 43 (Ham_6968) and/or SEQ ID NO: 44 (Ham_2753) and/or SEQ ID NO: 45 (Ham_6700) and/or SEQ ID NO: 46 (Ham_8505) and/or SEQ ID NO: 47 (C-LAM_1) and/or SEQ ID NO: 48 (C-LAM_168) and/or SEQ ID NO: 49 (C-LAM_2709) and/or SEQ ID NO: 50 (C-LAM_262); or
- b. the set B, with the nucleotide sequences of SEQ ID NO: 51 and/or SEQ ID NO: 52 and/or SEQ ID NO: 53 and/or SEQ ID NO: 54 and/or SEQ ID NO: 55 and/or SEQ ID NO: 56 and/or SEQ ID NO: 57 and/or SEQ ID NO: 58; or
- c. the set C, with the nucleotide sequences of SEQ ID NO: 59 and/or SEQ ID NO: 60 and/or SEQ ID NO: 61 and/or SEQ ID NO: 62 and/or SEQ ID NO: 63 and/or SEQ ID NO: 64 and/or SEQ ID NO: 65 and/or SEQ ID NO: 66.

E36: Use of at least one aptamer or at least one set of aptamers according to any one of E30 and E32 to E35, for the diagnosis or monitoring of a neurodegenerative disorder involving brain amyloid.

E37: The use according to E36, wherein the neurodegenerative disorder is Alzheimer's Disease (AD), amyotrophic lateral sclerosis (ALS), Parkinson's Disease (PD), Huntington's Disease, prion disease, motor neuron disease, spinocerebellar ataxia, spinal muscular atrophy, neuronal loss, cognitive defect, primary age-related tauopathy (PART)/Neurofibrillary tangle-predominant senile dementia, chronic traumatic encephalopathy including dementia pugilistica, dementia with Lewy bodies, neuroaxonal dystrophies, multiple system atrophy, progressive supranuclear palsy, Pick's Disease, corticobasal degeneration, some forms of frontotemporal lobar degeneration, frontotemporal dementia and parkinsonism linked to chromosome 17, Lytico-Bodig disease (Parkinson-dementia complex of Guam), ganglioglioma, gangliocytoma, meningioangiomatosis, postencephalitic parkinsonism, subacute sclerosing panencephalitis, lead encephalopathy, tuberous sclerosis, Hallervorden-Spatz disease, and lipofuscinosis, preferably Alzheimer's Disease.

E38: Use of at least one aptamer or at least one set of aptamers according to any one of E30 and E32 to E35, for the detection and/or the quantification of a target molecule.

The present invention will be further understood from the following examples. However, the scope of protection of the present invention shall not be limited to these examples.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows the energy landscape of an 80-nucleotide oligonucleotide sequence. The number at the end of each vertical bar corresponds to a different predicted secondary structure of an oligonucleotide with the nucleic acid sequence SEQ ID NO: 37; the length of the vertical bars indicates the free energy required to change from one structure to another; the scale for these free energies is provided in the y-axis.

FIG. 2 shows the predicted proportion of secondary structures of the sequence referred to in FIG. 1 as the system evolves to equilibrium. In this figure, it was assumed that initially 100% of the sequence referred to in FIG. 1 was in the form of predicted “Structure 1”. The structures were allowed to evolve over time as a function of the energy landscape depicted in FIG. 1. After 10³ time units, the system has equilibrated and “Structure 2” is present at a higher proportion than “Structure 1”.

FIG. 3 is a dot plot showing the distribution of number of base pairs predicted across the 65 536 possible sequences derived from SEQ ID NO: 1.

FIG. 4 shows the annealing of double-stranded coupling oligonucleotides to a Neomer library.

FIG. 5 provides a schematic overview of the Aptamarker process as applied with Neomer selection.

FIG. 6 provides a schematic overview of the FRELEX selection process as applied to Aptamarker selection with Neomers.

FIG. 7 is the observed distribution of Z scores in relation to Fold values for each of the top 10,000 sequences in terms of Z scores from the high brain amyloid versus low brain amyloid selection.

FIG. 8 is a chart providing the predictive capacity of Aptamarkers chosen for the detection of high brain amyloid in terms of their frequencies in the next generation sequencing analysis on high and low brain amyloid samples (labeled as high or low on the horizontal axis in the graph, and their frequency in the absence of blood sample (labeled as buffer on the horizontal axis)). The predictive capacity of each Aptamarker is provided as labeled by the name of the Aptamarker on the horizontal axis (HAM).

FIG. 9 is the observed distribution of Z scores in relation to Fold values for each of the top 10,000 sequences in terms of Z scores from the low brain amyloid versus high brain amyloid selection.

FIG. 10 is a chart providing the predictive capacity of Aptamarkers chosen for the detection of low brain amyloid in terms of their frequencies in the next generation sequencing analysis on high and low brain amyloid samples (labeled as high or low on the horizontal axis in the graph, and their frequency in the absence of blood sample (labeled as buffer on the horizontal axis)). The predictive capacity of each Aptamarker is provided as labeled by the name of the Aptamarker on the horizontal axis (C-LAM).

FIG. 11 provides the brain amyloid deposition level as characterized by PET scan analysis on two individuals over a course of months. The individual denoted with a dashed line did not exhibit cognitive dysfunction, while the individual denoted with a solid line exhibited mild cognitive impairment at the first brain amyloid sampling point and significant cognitive dysfunction (diagnosed as affected by Alzheimer's disease) in the subsequent sampling points.

FIG. 12 is the observed distribution of Z scores in relation to Fold values for each of the top 10,000 sequences in terms of Z scores from the analysis of high versus low resistance to cognitive dysfunction in the presence of high brain amyloid.

FIG. 13 is a chart providing the predictive capacity of Aptamarkers chosen for the detection of high resistance to cognitive dysfunction in the presence of high brain amyloid in terms of their frequencies in the next generation sequencing analysis on individuals exhibiting high and low resistance to cognitive dysfunction (labeled as high or low on the horizontal axis in the graph, and their frequency in the absence of blood sample (labeled as buffer on the horizontal axis)). The predictive capacity of each Aptamarker is provided as labeled by the name of the Aptamarker on the horizontal axis (C-HR).

FIG. 14 is the observed distribution of Z scores in relation to Fold values for each of the top 10,000 sequences in terms of Z scores from the analysis of low versus high resistance to cognitive dysfunction in the presence of high brain amyloid.

FIG. 15 is a chart providing the predictive capacity of Aptamarkers chosen for the detection of low resistance to cognitive dysfunction in the presence of high brain amyloid in terms of their frequencies in the next generation sequencing analysis on individuals exhibiting high and low resistance to cognitive dysfunction (labeled as high or low on the horizontal axis in the graph, and their frequency in the absence of blood sample (labeled as buffer on the horizontal axis)). The predictive capacity of each Aptamarker is provided as labeled by the name of the Aptamarker on the horizontal axis (C-LR).

FIG. 16 provides a comparison of two individuals from the AIBL cohort in Australia who differ in the rate at which brain amyloid is accumulating in their brains. Based on PET scans, the individual described by the solid line is exhibiting a faster rate of brain amyloid accumulation than the individual described by the dashed line.

FIG. 17 is the observed distribution of Z scores in relation to Fold values for each of the top 10,000 sequences in terms of Z scores from the fast brain amyloid accumulation individual selections versus the slow brain amyloid accumulation individual selections.

FIG. 18 characterizes the predictive capacity of the top four Aptamarkers arising from the selection of fast versus slow brain amyloid accumulation (labeled as fast or slow on the horizontal axis in the graph, and their frequency in the absence of blood sample (labeled as buffer on the horizontal axis)). The predictive capacity of each Aptamarker is provided as labeled by the name of the Aptamarker on the horizontal axis (c-FAM).

FIG. 19 is the observed distribution of Z scores in relation to Fold values for each of the top 10,000 sequences in terms of Z scores from the slow brain amyloid accumulation individual selections versus the fast brain amyloid accumulation individual selections.

FIG. 20 characterizes the predictive capacity of the top four Aptamarkers arising from the selection of slow versus fast brain amyloid accumulation (labeled as slow or fast on the horizontal axis in the graph, and their frequency in the absence of blood sample (labeled as buffer on the horizontal axis)). The predictive capacity of each Aptamarker is provided as labeled by the name of the Aptamarker on the horizontal axis (c-SAM).

EXAMPLES
Example 1: Design of an Aptamer Library in which all Possible Sequences can be Characterized by Next-Generation Sequencing

U.S. Pat. No. 5,475,096 A was written prior to the development and wide-spread commercial application of next-generation sequencing. The examples provided within U.S. Pat. No. 5,475,096 A are based on the cloning of selected sequences into plasmids, transformation into bacteria and individual clone sequencing. The advent of next-generation sequencing has made this approach obsolete, since millions of sequences can now be characterized directly from a single selection library following PCR amplification.

Given that this scale of sequence characterization was not available at the time that patent was filed, the concept of limiting aptamer selection to a single selection round depended entirely on the effectiveness of the selection process. In the present invention, we overcome this constraint by providing a method in which the effectiveness of the selection process is not the limiting factor determining the number of selection rounds required. We achieved this by designing synthetic aptamer libraries with conserved elements and by including segments (or blocks) of random sequence that can be fully characterized with the current capacity of next-generation sequencing.

The number of possible sequences that can exist in a synthetic aptamer library can be calculated with the formula 4ⁿ, where “n” is the number of random nucleotides in the sequence. Table 1 provides the total number of possible sequences based on the total number of random nucleotides within a sequence of an oligonucleotide. Table 1 also provides the average number of copies expected of each sequence in an NGS analysis of 5×10⁶ sequences.

TABLE 1

| Number of random nucleotides | Number of possible sequences | Average copy number in NGS analysis |
|---|---|---|
| 1 | 4 | 1250000 |
| 2 | 16 | 312500 |
| 3 | 64 | 78125 |
| 4 | 256 | 19531.25 |
| 5 | 1024 | 4882.8125 |
| 6 | 4096 | 1220.703125 |
| 7 | 16384 | 305.1757813 |
| 8 | 65536 | 76.29394531 |
| 9 | 262144 | 19.07348633 |
| 10 | 1048576 | 4.768371582 |
| 11 | 4194304 | 1.192092896 |
| 12 | 16777216 | 0.298023224 |
| 13 | 67108864 | 0.074505806 |
| 14 | 268435456 | 0.018626451 |
| 15 | 1073741824 | 0.004656613 |
| 16 | 4294967296 | 0.001164153 |
| 17 | 17179869184 | 0.000291038 |
| 18 | 68719476736 | 7.27596 × 10⁻⁵ |
| 19 | 2.74878 × 10¹¹ | 1.81899 × 10⁻⁵ |
| 20 | 1.09951 × 10¹² | 4.54747 × 10⁻⁶ |
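Table 1 follows directly from the 4ⁿ formula and the NGS depth of 5×10⁶ sequences. A minimal Python sketch that regenerates it is given below; like the table, it assumes that every possible sequence is equally represented, and the function names are illustrative.

```python
READS = 5_000_000  # NGS depth assumed in Table 1


def possible_sequences(n_random: int) -> int:
    """4 ** n: each random position can be A, C, G or T."""
    return 4 ** n_random


def mean_copy_number(n_random: int, reads: int = READS) -> float:
    """Average reads per sequence if every possible sequence is equally abundant."""
    return reads / possible_sequences(n_random)


for n in range(1, 21):
    print(f"{n:>2} {possible_sequences(n):>16,} {mean_copy_number(n):.9g}")
```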

It is clear from Table 1 that if the number of random nucleotides in a sequence exceeds 11, it is expected that, in an NGS analysis of 5 million sequences from such a library, not all sequences would be observed. Moreover, one trained in the art of evaluating NGS data of aptamer libraries would appreciate that there is variation in copy number in any naïve random library and as such, the probability of observing all possible sequences in a library would more likely be limited to libraries of 8 or 9 random nucleotides. As such, the use of 8 or 9 random nucleotides within a library would be a preferred embodiment of this invention.
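The statement that not all sequences would be observed can be quantified under an idealized sampling model. The Python sketch below assumes a perfectly uniform library and Poisson-distributed read counts with mean 5×10⁶ / 4ⁿ, so the expected number of sequences with zero reads is 4ⁿ · exp(−5×10⁶ / 4ⁿ); the model and the name `expected_unobserved` are assumptions added for illustration.

```python
import math


def expected_unobserved(n_random: int, reads: int = 5_000_000) -> float:
    """Expected number of possible sequences that receive zero reads.

    Idealised model: every sequence is equally abundant and its read count
    is Poisson-distributed with mean reads / 4**n.
    """
    size = 4 ** n_random
    return size * math.exp(-reads / size)


for n in range(8, 12):
    missing = expected_unobserved(n)
    print(f"{n} random nt: ~{missing:.3g} of {4 ** n:,} sequences expected unobserved")
```

Under this model, essentially no sequence is expected to be missed at 8 random nucleotides and fewer than 0.01 at 9, whereas roughly 8,900 sequences would be missed at 10 and about 30% of all sequences at 11. Because real libraries show copy-number variation, actual coverage would be lower still, which is consistent with the preference for 8 or 9 random nucleotides.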

We have also determined that the probability of an oligonucleotide composed of only 8 contiguous random nucleotides forming significant secondary structure is very low, as there is a need for the structure to contain a hairpin loop of at least 3 nucleotides between double-stranded regions. As such, to achieve the structural complexity necessary for aptamer function, it is necessary to intersperse the random nucleotides with defined, fixed sequences.
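The constraint can be made concrete with a simple count, assuming a single hairpin formed entirely within a contiguous stretch of n nucleotides: the loop occupies at least 3 nucleotides and each base pair uses two of the remaining positions, so the stem contains at most

$$
b_{\max}(n) = \left\lfloor \frac{n - 3}{2} \right\rfloor, \qquad b_{\max}(8) = \left\lfloor \frac{5}{2} \right\rfloor = 2 .
$$

For 8 contiguous random nucleotides, at most two base pairs can form, a stem that is generally too short to be stable on its own.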

An example of a library with 8 random nucleotides interspersed with conserved elements is provided as SEQ ID NO: 1.

SEQ ID NO: 1

5′-AAANGAAANNNGAAACNNNAAACNTTT-3′

wherein N represents a random nucleotide.

This library has 8 random nucleotides and is specifically designed to maximize the effect of the random nucleotides on secondary structure, by placing the random nucleotides in four separate regions, A to D, which lie between the fixed AAA spacers as depicted below:

(SEQ ID NO: 1)

5′-AAA [A: NG] AAA [B: NNNG] AAA [C: CNNN] AAA [D: CN] TTT-3′

The 4 regions A to D are composed of a mixture of fixed and random nucleotides. These regions are separated from one another by 3 A residues, which provide the spatial freedom required for hairpin turns when the regions hybridize to one another. For instance, region C has the capacity to hybridize to region B, and region D has the capacity to hybridize to region A. All hybridization events within a single sequence must be antiparallel, with one strand running in the 5′-to-3′ direction and the other in the 3′-to-5′ direction.
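The intended pairings can be checked for any given member of the library. In SEQ ID NO: 1 the four regions are the blocks between the fixed AAA spacers, namely A = NG, B = NNNG, C = CNNN and D = CN (read 5′ to 3′); because region B ends in a fixed G and region C starts with a fixed C, those two fixed bases always pair, and likewise for regions A and D. The Python sketch below only tests perfect Watson-Crick complementarity of whole regions; real folding, as predicted with RNAfold in this example, also weighs partial pairing, wobble pairs and free energy. The function names and the two example sequences are hypothetical.

```python
import re

# Watson-Crick complement of each base.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s: str) -> str:
    """Reverse complement, i.e. the antiparallel pairing partner."""
    return s.translate(COMPLEMENT)[::-1]


# Layout of SEQ ID NO: 1: AAA-[A]-AAA-[B]-AAA-[C]-AAA-[D]-TTT
LAYOUT = re.compile(
    r"AAA(?P<A>[ACGT]G)AAA(?P<B>[ACGT]{3}G)AAA(?P<C>C[ACGT]{3})AAA(?P<D>C[ACGT])TTT"
)


def regions(seq: str) -> dict[str, str]:
    m = LAYOUT.fullmatch(seq.upper())
    if m is None:
        raise ValueError("sequence does not follow the SEQ ID NO: 1 layout")
    return m.groupdict()


def pairing(seq: str) -> dict[str, bool]:
    """Report whether C fully pairs with B and D fully pairs with A."""
    r = regions(seq)
    return {"C:B": r["C"] == revcomp(r["B"]), "D:A": r["D"] == revcomp(r["A"])}


print(pairing("AAATGAAAGCAGAAACTGCAAACATTT"))  # both pairs fully complementary
print(pairing("AAAAGAAAAAAGAAACAAAAAACATTT"))  # neither pair complementary
```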

The enablement of this example is not limited to this sequence, or this particular design. This example is provided to demonstrate that thoughtful consideration of the interspersion of fixed and random regions is necessary to optimize the capacity of the library to form an optimal number of secondary structures.

It has also been our experience that effective aptamers are composed of both hybridized regions (stems) and regions that are not hybridized (loops). Our library design takes this into consideration by optimizing the proportion of possible sequences that will have a mixture of stems and loops.

To confirm the potential efficacy of this library design, we tabulated all 65 536 possible sequences of the single-module library of SEQ ID NO: 1 and determined their predicted secondary structures by running the RNAfold program in batch from the Linux command line (available at https://www.tbi.univie.ac.at/RNA/RNAfold.1.html). A distribution of shape complexity is provided in FIG. 3.
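The enumeration step can be sketched in Python as below. This is a minimal illustration, not the Inventor's script: the function names, the FASTA record names (seq1, seq2, …) and the idea of writing the whole library to one file for a batch RNAfold run are assumptions. It expands every N in SEQ ID NO: 1 into A, C, G or T, giving 4^8 = 65 536 sequences.

```python
from itertools import product

# SEQ ID NO: 1, where "N" marks a random nucleotide position.
SEQ_ID_1 = "AAANGAAANNNGAAACNNNAAACNTTT"


def expand_library(template: str):
    """Yield every concrete sequence encoded by a template containing N positions."""
    slots = [i for i, base in enumerate(template) if base == "N"]
    for combo in product("ACGT", repeat=len(slots)):
        seq = list(template)
        for i, base in zip(slots, combo):
            seq[i] = base
        yield "".join(seq)


def to_fasta(template: str) -> str:
    """Return a FASTA string with one record (hypothetical names seq1, seq2, ...) per sequence."""
    records = (
        f">seq{k}\n{seq}" for k, seq in enumerate(expand_library(template), start=1)
    )
    return "\n".join(records) + "\n"


library = list(expand_library(SEQ_ID_1))
print(len(library))  # 65536
```

The string returned by `to_fasta(SEQ_ID_1)` could be saved, e.g. as `library.fa`, and folded in a single batch with `RNAfold < library.fa > library.fold`; the choice of energy parameters for DNA is left open here.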

Overall, 30 645 of the 65 536 possible sequences (about 47%) exhibit significant secondary structure in the absence of binding to another molecule. A preferred embodiment of this invention is a library design in which over 25% of all possible sequences form secondary structure. The Inventor recognizes that the library of this example may not provide sufficiently complex structures to bind to relatively complex targets such as proteins; as such, a preferred embodiment would be a library of aptamers, each aptamer comprising at least one module, each module having from 8 to 11 random nucleotides. Such a library would be suitable for selection against small targets (e.g., with a molecular weight below about 1 kDa) or against larger targets exhibiting only a few charged groups.
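Counting structured sequences from batch RNAfold output could look like the Python sketch below. Two assumptions are made: that each structure line starts with a dot-bracket string followed by the free energy in parentheses, and that "significant secondary structure" means at least `min_pairs` predicted base pairs (one by default). The threshold actually used to obtain the 30 645 sequences is not stated in this document.

```python
DOT_BRACKET = set(".()")


def structures_from_rnafold(output: str) -> list:
    """Collect the dot-bracket token from each structure line of RNAfold output."""
    structures = []
    for line in output.splitlines():
        tokens = line.split()
        # Headers start with ">" and sequence lines contain letters, so only
        # structure lines have a first token made purely of ".", "(" and ")".
        if tokens and set(tokens[0]) <= DOT_BRACKET:
            structures.append(tokens[0])
    return structures


def fraction_structured(structures: list, min_pairs: int = 1) -> float:
    """Fraction of sequences whose structure has at least min_pairs base pairs."""
    if not structures:
        raise ValueError("no structures to evaluate")
    structured = sum(1 for s in structures if s.count("(") >= min_pairs)
    return structured / len(structures)
```

With the figures reported above, 30 645 / 65 536 ≈ 0.468, comfortably above the 25% threshold of the preferred embodiment.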

Example 2: A Method for the Preparation of a Library without Primer Recognition Sequences for Next Generation Sequence Analysis

The library of Example 1 is an example of a library enabling the present invention that does not have primer recognition sequences. A method describing how such a library would be prepared for next-generation sequence analysis is provided below and illustrated in FIG. 4.

Step a): the library is synthesized with a 5′ phosphate group to enable effective ligation.

Step b): two double-stranded coupling oligonucleotides are designed to facilitate preparation of the library for NGS.

Step c): sequences Fwd A and Fwd B are annealed with each other and the sequences Rvs A and Rvs B are annealed to each other.

The four sequences in this example provided for the double-stranded coupling oligonucleotides are as follows:

-Fwd A

SEQ ID NO: 2

5′-TAATACGACTCACTATAGGGATAAT-3′

-Fwd B

SEQ ID NO: 3

5′-TTTCNTTTATTATCCCTATAGTGAGTCTATTA-3′

-Rvs A

SEQ ID NO: 4

5′-GAACGAGCGACCTCATACGTATTG-3′

-Rvs B

SEQ ID NO: 5

5′-CAATACGTATGAGGTCGCTCGTTCAAANGTTT-3′

The Rvs B sequence can be labeled at its 5′ end, for instance with a biotin moiety; the Rvs A oligonucleotide is preferably synthesized with a 5′ phosphate.
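The complementarity implied by this design can be checked with a short Python sketch. It assumes that the 8-nucleotide 5′ end of Fwd B and the 8-nucleotide 3′ end of Rvs B are the overhangs that pair with the library, and it treats a degenerate N as pairing with N; the function and constant names are illustrative only.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

LIBRARY = "AAANGAAANNNGAAACNNNAAACNTTT"     # SEQ ID NO: 1
FWD_B = "TTTCNTTTATTATCCCTATAGTGAGTCTATTA"  # SEQ ID NO: 3
RVS_A = "GAACGAGCGACCTCATACGTATTG"          # SEQ ID NO: 4
RVS_B = "CAATACGTATGAGGTCGCTCGTTCAAANGTTT"  # SEQ ID NO: 5
OVERHANG = 8  # assumed length of the library-binding overhangs


def revcomp(seq: str) -> str:
    """Reverse complement; a degenerate N is paired with N."""
    return seq.translate(COMPLEMENT)[::-1]


def check_coupling_design():
    """Return three booleans:
    Fwd B overhang pairs with the library 5' end,
    Rvs B overhang pairs with the library 3' end,
    Rvs A pairs with the remainder of Rvs B."""
    fwd_overhang_ok = revcomp(FWD_B[:OVERHANG]) == LIBRARY[:OVERHANG]
    rvs_overhang_ok = revcomp(RVS_B[-OVERHANG:]) == LIBRARY[-OVERHANG:]
    rvs_duplex_ok = RVS_B[:-OVERHANG] == revcomp(RVS_A)
    return fwd_overhang_ok, rvs_overhang_ok, rvs_duplex_ok


print(check_coupling_design())  # (True, True, True)
```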

Step d): the double-stranded coupling constructs are annealed with the library sequences.

Step e): the construct created in step d) is used to add primer recognition regions to the library using, e.g., T4 DNA ligase. This creates a post-selection library. Only the sense strand containing the library is efficiently ligated, as the antisense strands are separated by a gap in the hybridized construct.

Step f): the post-selection library is washed free of all extraneous sequences; for instance, when the Rvs B sequence is labelled with biotin, the entire construct can be captured on immobilized streptavidin (e.g., streptavidin coated resin, streptavidin agarose, streptavidin coated magnetic beads, etc.).

Step g): the sense strand of the post-selection library is recovered, e.g., from the immobilized streptavidin using an elution step. Preferably, this step is performed in the absence of any chemical denaturing agent to avoid additional washing steps. For instance, elution can be performed using heat alone.

Step h): The sense strand of the post-selection library is amplified with appropriate primers for preparation for NGS.

It is a preferred embodiment of this invention that both the naïve (i.e., unselected) library and a selected library be prepared in the same manner, for instance according to the method described above. This then enables a reliable comparison of the frequency of each sequence in the naïve library with its frequency in the selected library. Indeed, in SELEX or FRELEX, candidate aptamers are chosen based on their overall enrichment in frequency across selection rounds; however, if only a single round of selection is performed, the choice is based on the sequences with the highest frequency, which may be biased if their frequency in the naïve library is not known.

In the present invention, candidate sequences are chosen based on the relative proportion of their increase or decrease in frequency in the selected library from their frequency in the naïve library. This is the sole basis for their selection.
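A minimal Python sketch of this criterion, assuming raw NGS read counts per sequence are available as dictionaries (the function names are hypothetical). Each sequence's frequency in the selected library is compared with its frequency in the naïve library, so positive values indicate enrichment and negative values depletion.

```python
def frequencies(counts: dict) -> dict:
    """Convert read counts per sequence into relative frequencies."""
    total = sum(counts.values())
    return {seq: n / total for seq, n in counts.items()}


def relative_change(naive_counts: dict, selected_counts: dict) -> dict:
    """(selected frequency / naive frequency) - 1 for every sequence in the naive library."""
    naive = frequencies(naive_counts)
    selected = frequencies(selected_counts)
    change = {}
    for seq, f_naive in naive.items():
        if f_naive == 0:
            # In a closed solution space every sequence should be observed in
            # the naive library; a zero here signals insufficient sampling.
            raise ValueError(f"{seq} not observed in the naive library")
        change[seq] = selected.get(seq, 0.0) / f_naive - 1.0
    return change


print(relative_change({"A": 50, "B": 50}, {"A": 75, "B": 25}))  # {'A': 0.5, 'B': -0.5}
```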

Moreover, a key advantage of the present invention is that, because we characterize the frequency of all possible sequences in the naïve and selected libraries, the selection process is reasonably expected to be reproducible. This means that replications of the same selection process should result in similar changes in overall frequency for the selected sequences. As such, the design of the library described in Example 1, and the processing of such libraries for NGS, establishes a basis for an innovative approach to defining the optimal aptamers from a single-round selection process. This is described in more detail in Example 3 below.

Example 3: A Method for the Statistical Analysis of Replicated Single Step Selection Processes for the Identification of Optimal Aptamers for any Given Target

It was an insight of the Inventor that, if libraries could be designed in which a significant proportion of the possible sequences would exhibit secondary and tertiary structures, and if the frequencies of all sequences within such libraries could be characterized, then the aptamer selection process would be reproducible. As such, it is possible to design aptamer identification processes that are based on statistical analysis of replicated selection experiments using the same libraries. This capacity has not previously existed within the field of aptamer selection.

An example of selection strategy is as follows.

Step a): a sample of the library containing 10^10 copies is combined with a target and allowed to incubate in binding buffer, preferably for 5 minutes, although the incubation time could be longer or shorter.

Step b): the mixture from step a) is applied to a gold surface coated with immobilized oligonucleotides that are composed of 8 contiguous random nucleotides (i.e., a FRELEX field as described in EP 3 344 805 B1 or U.S. Pat. No. 10,415,034 B2), and allowed to incubate.

Step c): the supernatant from this surface, containing those sequences that have not hybridized to the immobilized random 8-nucleotide oligonucleotides, is recovered.

The product of step c) constitutes a “selected library”. This would then be directly processed for NGS as described in Example 2.

It is a preferred enablement of this new approach to characterize the frequency of each sequence in the naïve (unselected) library from at least two iterations (replications) of independent and separate single-step selection processes, as described above, but in the absence of any target molecule contacted with the library. A selection for aptamers in the presence of a target molecule would be performed in the same manner, with at least two separate and independent iterations of a single selection round.

The application of such a process will result in average frequencies and standard deviations of these frequencies of all sequences in a library. For example, if 8 random nucleotides are used in the selected library (as per the example of SEQ ID NO: 1), then we would have average expected frequencies and standard deviations of these average frequencies for each of all possible 65 536 sequences in the absence of selection, and in the presence of selection for a specific target.

This information allows evaluation of the aptamer sequences that exhibit the highest statistically significant deviation, either increased or decreased frequency, in the presence of selection for the target versus in its absence. An example of processing such data is to compute, for each sequence, a Z statistic using the following well-known formula:

$Z = \frac{\text{Avg. freq w/target} - \text{Avg. freq w/o target}}{\text{Avg. of the standard deviations of the averages}}$

wherein “Avg. freq w/target” is the average frequency in the presence of selection for a target; and “Avg. freq w/o target” is the average frequency in the absence of selection for a target.
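The Z statistic above can be computed per sequence as in the following Python sketch. It interprets "Avg. of the standard deviations of the averages" as the mean of the two sample standard deviations across replicates (one with target, one without), which is an assumption; at least two replicates per condition are required. The ranking helper and its `top` parameter are hypothetical conveniences.

```python
from statistics import mean, stdev


def z_statistic(with_target, without_target) -> float:
    """Z for one sequence from replicate frequencies with and without target."""
    avg_sd = (stdev(with_target) + stdev(without_target)) / 2
    if avg_sd == 0:
        raise ValueError("zero variance across replicates; Z is undefined")
    return (mean(with_target) - mean(without_target)) / avg_sd


def rank_by_z(reps_with: dict, reps_without: dict, top: int = 10_000) -> list:
    """Rank sequences by Z (highest first); skips sequences where Z is undefined."""
    scores = {}
    for seq in reps_with.keys() & reps_without.keys():
        try:
            scores[seq] = z_statistic(reps_with[seq], reps_without[seq])
        except ValueError:  # also catches statistics.StatisticsError
            continue
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top]
```

Sorting on `abs(z)` instead would surface the most strongly depleted sequences alongside the enriched ones.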

This statistical analysis process for identifying optimal aptamers from the selection process is not limited to a positive/negative comparison. It is a further insight of the Inventor that, given the reproducibility of the process, positive selection could be applied against one or more counter-targets, and the sequences that differ between selections could be identified. This leads to implicit identification of sequences with the desired specificity for one target over another.

It is a further insight of the Inventor that such a process is not limited to qualitative differences defined as binding or not binding. The effect of different targets on the relative frequencies of aptamers within a defined library as described herein could also be used to identify aptamers that have relatively different binding affinities to multiple targets. Such aptamers could then be used individually, or in combination, to quantify the relative amounts of several different target molecules in a mixture.

Example 4: Extension of Structural Complexity of Aptamer Libraries without Loss of Reproducibility

It was noted previously that the structural complexity of the library of Example 1 may be limiting when considering aptamer selection for complex targets (e.g., molecules with a molecular weight above 1 kDa or with several charged groups), such as proteins. In this invention, this limitation can be overcome by increasing the potential for structural complexity while maintaining the reproducibility of this aptamer selection process through the introduction of multiple modules in each aptamer of the library.

Multiple library modules, similar to the library described in Example 1, are synthesized in a library as contiguous units that remain together for selection but can be cleaved into their respective modules prior to NGS characterization. This is shown below with an exemplary sequence, SEQ ID NO: 6, comprising SEQ ID NO: 9 as an example of a first module (Module A) and the similar sequence SEQ ID NO: 7 as an example of a second module (Module B).

SEQ ID NO: 6

5′-AAANGAAANNNGAATGNNNAAACNTTTAAANGAAANNNCATTCNNNTTACNTAA-3′

wherein N represents a random nucleotide.

An example of Module A,

SEQ ID NO: 9

5′-AAANGAAANNNGAATGNNNAAACNTTT-3′

An example of Module B,

SEQ ID NO: 7

5′-AAANGAAANNNCATTCNNNTTACNTAA-3′

This library has 16 random nucleotides, each aptamer being divided into two modules, A and B, and each module being sub-divided into four separate regions, as indicated below from 5′ to 3′:

Module A: 5′-AAANGAAANNNGAATGNNNAAACNTTT-3′ (regions A, B, C′, D)

Module B: 5′-AAANGAAANNNCATTCNNNTTACNTAA-3′ (regions A, B′, C, D′)

In this library design, when compared to the single-module design with SEQ ID NO: 1 of Example 1, region C was modified in Module A to start with a G nucleotide (named region C′), and region B was modified in Module B to end with a C nucleotide (named region B′). These changes were made to ensure that the sequenced products of the two modules could be differentiated from each other.

Region D was also modified in Module B (named region D′) to enable separate amplification of this module. This difference enables the option of preparing Module B separately from Module A, with different hex codes in the forward primer for NGS.

These changes also ensure that the potential for hybridization is driven mainly by the random nucleotides rather than by the fixed nucleotides. Indeed, any implicit secondary structure due to hybridization of complementary fixed nucleotides would reduce the potential diversity of the solution space.

This library of aptamers, comprising in this example two modules (A and B), can be synthesized and used as a contiguous strand in selection. Prior to NGS preparation, the library can be cleaved into the two modules with a restriction enzyme (in this particular example, for instance, DraI, which recognizes the sequence TTTAAA and cleaves it as TTT↓AAA, producing blunt ends). The use of this enzyme, or of any other restriction enzyme, can be facilitated by the addition of an antisense oligonucleotide, for instance with SEQ ID NO: 8, that hybridizes to the library astride Module A and Module B, creating a double-stranded recognition site for the restriction enzyme.

SEQ ID NO: 8

5′-TTTCNTTTAAANGTTT-3′

5′-AAANGAAANNNGAATGNNNAAACNTTT↓AAANGAAANNNCATTCNNNTTACNTAA-3′
3′-                   TTTGNAAA↑TTTNCTTT-5′

The remaining free antisense fragments can be easily removed using, e.g., a primer cleanup column.
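The expected products of this digestion can be checked in silico with the Python sketch below, which operates on the degenerate template of SEQ ID NO: 6 only (there, N never reads as T, so the only TTTAAA is at the module junction). It also confirms that SEQ ID NO: 8 is the reverse complement of the 16 nucleotides spanning the cleavage site. Function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
DRAI_SITE = "TTTAAA"  # DraI recognition site; cuts TTT^AAA, leaving blunt ends

SEQ_ID_6 = "AAANGAAANNNGAATGNNNAAACNTTTAAANGAAANNNCATTCNNNTTACNTAA"
SEQ_ID_8 = "TTTCNTTTAAANGTTT"


def revcomp(seq: str) -> str:
    """Reverse complement; a degenerate N is paired with N."""
    return seq.translate(COMPLEMENT)[::-1]


def drai_split(seq: str):
    """Split a sequence at its single DraI site into (5' fragment, 3' fragment)."""
    hits = [i for i in range(len(seq) - len(DRAI_SITE) + 1)
            if seq.startswith(DRAI_SITE, i)]
    if len(hits) != 1:
        raise ValueError(f"expected one DraI site, found {len(hits)}")
    cut = hits[0] + 3  # blunt cut in the middle of TTT|AAA
    return seq[:cut], seq[cut:]


module_a, module_b = drai_split(SEQ_ID_6)
print(module_a)  # AAANGAAANNNGAATGNNNAAACNTTT  (SEQ ID NO: 9)
print(module_b)  # AAANGAAANNNCATTCNNNTTACNTAA  (SEQ ID NO: 7)
```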

The retained library modules can then be processed separately for NGS analysis. Each library module comprises 65 536 possible sequences and the library as a whole comprises 4 294 967 296 possible sequences.

As such, the concept of the selection process comprising a closed sequence solution space remains satisfied. Samples of the library for selection or for naïve library processing for NGS analysis would contain all 4 294 967 296 possible sequences with an average of about 100 copies of each, i.e., roughly 4.3 × 10^11 molecules in total.

The frequency of the random nucleotides at each position can be determined with confidence by NGS for each module separately. The product of the frequencies at each position is equivalent to the frequency of each possible sequence. Thus, the frequency of each of the 4 294 967 296 possible sequences could be determined by the product of the individual frequencies of nucleotide identity at each random nucleotide position.
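Following the product rule stated here, a sequence frequency can be computed from per-position base frequencies. The Python sketch below shows that calculation for one module and then multiplies module frequencies for the full two-module library; the read format (one string per read, already trimmed to the module) and all names are assumptions made for illustration.

```python
from collections import Counter
from math import prod


def position_frequencies(reads, slots):
    """Base frequencies (A, C, G, T) at each random position listed in slots."""
    n = len(reads)
    table = []
    for i in slots:
        counts = Counter(read[i] for read in reads)
        table.append({base: counts[base] / n for base in "ACGT"})
    return table


def sequence_frequency(bases: str, table) -> float:
    """Product of per-position frequencies for the given random-position bases."""
    if len(bases) != len(table):
        raise ValueError("one base is needed per random position")
    return prod(freqs[base] for base, freqs in zip(bases, table))


# Toy module with two random positions, observed in four reads.
reads = ["AC", "AG", "TC", "TG"]
table = position_frequencies(reads, slots=[0, 1])
module_freq = sequence_frequency("AC", table)              # 0.5 * 0.5
full_freq = module_freq * sequence_frequency("TG", table)  # Module A x Module B
```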

As such, the concept of enabling reproducible aptamer selection articulated in Examples 1 to 3 is retained. We have demonstrated that all of the possible sequences in the naïve library would be present in each aliquot of the library analyzed, and that the frequency of each possible sequence could be ascertained in the naïve library and in the selected libraries.

We have the capacity to characterize the frequency of each of the 65 536 sequences present in each module, thus we have the capacity to characterize the frequency of each of the 4 294 967 296 sequences in a library of aptamers with SEQ ID NO: 6. As such, the reproducibility of the selection process is maintained, while the range of complexity in potential secondary and tertiary structures is increased.

In theory, this concept could be expanded to include more than 16 random nucleotides and thus more than 4 294 967 296 possible sequences. As the sequence length is increased, however, the number of individual sequences that are not observed in the original library would increase simply as a function of sampling. In this case, we would increase the original sample of aptamer sequences applied to a target from 4 294 967 296 sequences to a larger number, to ensure adequate coverage of all sequences.
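How coverage depends on sample size can be estimated with a simple model in which molecules are drawn uniformly and independently, so that the number of copies of each possible sequence follows a Poisson distribution. This model is an assumption introduced for illustration; under it, the expected number of possible sequences with zero copies is n·exp(-m/n) for n possible sequences and m molecules.

```python
import math


def expected_missing(n_possible: int, n_molecules: float) -> float:
    """Expected count of possible sequences absent from a sample (Poisson model)."""
    mean_copies = n_molecules / n_possible
    return n_possible * math.exp(-mean_copies)


single_module = expected_missing(4 ** 8, 1e10)             # ~152 588 copies each
one_copy_each = expected_missing(4 ** 16, 4 ** 16)         # ~37% of sequences missing
hundred_copies = expected_missing(4 ** 16, 100 * 4 ** 16)  # ~1.6e-34 missing
longer_library = expected_missing(4 ** 20, 100 * 4 ** 16)  # ~68% missing
```

Under this model, one molecule per possible sequence leaves about e^-1 ≈ 37% of sequences unobserved, about 100 copies per sequence makes a missing sequence vanishingly unlikely, and the same sample size leaves most sequences unobserved once the library grows to 20 random nucleotides.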

It is also possible to vary the number of random nucleotides to any possible number as long as the number within a given library module can be fully characterized by NGS.

It is also possible to vary the library structure, by changing the sequence identity and/or the length of the fixed sequence nucleotides.

Following digestion of SEQ ID NO: 6 with DraI, the two Modules A and B are produced, with SEQ ID NOs: 9 and 7, respectively. These modules require a redesign of the sequences for preparation for next generation sequencing as follows.

-Fwd B

SEQ ID NO: 10

5′-TTTCNTTTATTATCCCTATAGTGAGTCGTATTA-3′

-Rvs A

SEQ ID NO: 11

5′-GAACGAGCGACCTCATACGTATTTG-3′

-Rvs B module A

SEQ ID NO: 12

5′-CAAATACGTATGAGGTCGCTCGTTCAAANGTTT-3′

-Rvs B module B

SEQ ID NO: 13

5′-CAAATACGTATGAGGTCGCTCGTTCTTANGTAA-3′

Similarly to Example 2, the Rvs B sequences with SEQ ID NOs: 12 and 13 can be labeled at their 5′ end, for instance with a biotin moiety; and the Rvs A oligonucleotide with SEQ ID NO: 11 is preferably synthesized with a 5′ phosphate.

These would be used to prepare each module for NGS analysis as described for the single module in Example 2. This preparation could either be done simultaneously or with separate aliquots of the digested library.

Example 5: A Simplified Library Design for Neomer Selection

A potential difficulty encountered with the practical application of the sequences and approach described in Example 4 could be inefficiency, as the process requires the ligation of primer recognition regions to a library sequence prior to amplification. In practice, the ligation step may be difficult to robustly reproduce, resulting in an arbitrary loss of a proportion of the sequences, and/or mis-amplification of the library with non-flanking primer sequences. This difficulty can be overcome with the use of a library with SEQ ID NO: 14.

SEQ ID NO: 14

5′-CCAGATACAGACNNGAGGNNNGAATNNNAACCATCGGCGCCAACANNNCATTCNNNCAGANNCAGTAGACAGC-3′

In this example, SEQ ID NO: 14 is first amplified with a forward primer with SEQ ID NO: 15 and a reverse primer with SEQ ID NO: 16. The amplified sequence is then restricted with the restriction enzyme KasI forming two restricted fragments: an example of Module A with SEQ ID NO: 17 and an example of Module B with SEQ ID NO: 18. Module A is amplified for NGS analysis with a forward NGS sequence containing a hex code (SEQ ID NO: 19) and a reverse NGS sequence (SEQ ID NO: 20). A second round of amplification is performed with a forward NGS 2 primer containing a hex code (SEQ ID NO: 21) and a reverse NGS 2 primer (SEQ ID NO: 22).

SEQ ID NO: 15

5′-CAAATACGTATGAGGTCGCTCGTTCCCAGATACAGAC-3′

SEQ ID NO: 16

5′-TAATACGACTCACTATAGGGATAATGCTGTCTACTG-3′

SEQ ID NO: 17

5′-CAAATACGTATGAGGTCGCTCGTTCCCAGATACAGACNNGAGGNNNGAATNNNAACCATCG-3′

SEQ ID NO: 18

5′-GCGCCAACANNNCATTCNNNCAGANNCAGTAGACAGCATTATCCCTATAGTGAGTCGTATTA-3′

SEQ ID NO: 38

5′-CCCTACACGACGCTCTTCCGATCTATCACGCAAATACGTATGAGGTCGCTCGTTC-3′

SEQ ID NO: 20

5′-GGTCAGACGTGTGCTCTTCCGATCGGGGCGCCGATGGTT-3′

SEQ ID NO: 39

5′-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCG-3′

SEQ ID NO: 40

5′-CAAGCAGAAGACGGCATACGAGATGTGACTGGAGTTCAGACGTGTGCTCTTCC-3′

an example of Fwd NGS 1 module B,

SEQ ID NO: 41

5′-CCCTACACGACGCTCTTCCGATCTATCACGGCGCCAACA-3′

an example of Rev NGS 1 module B,

SEQ ID NO: 42

5′-GGTCAGACGTGTGCTCTTCCGATCGGGTAATACGACTCACTATAGGGATAATGCTGTCTACTG-3′

In SEQ ID NO: 38, the hex code sequence is ATCACG; it varies with different libraries.
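The expected amplification and KasI digestion products can be reproduced in silico with the Python sketch below. It assumes that each primer anneals through the longest stretch of its 3′ end that matches the template, and it models the top strand only (KasI cuts G↓GCGCC, leaving a 4-nucleotide 5′ overhang, so the bottom-strand ends differ). The helper names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
KASI_SITE = "GGCGCC"  # KasI cuts G^GCGCC

SEQ_14 = "CCAGATACAGACNNGAGGNNNGAATNNNAACCATCGGCGCCAACANNNCATTCNNNCAGANNCAGTAGACAGC"
SEQ_15 = "CAAATACGTATGAGGTCGCTCGTTCCCAGATACAGAC"  # forward primer
SEQ_16 = "TAATACGACTCACTATAGGGATAATGCTGTCTACTG"   # reverse primer


def revcomp(seq: str) -> str:
    """Reverse complement; a degenerate N is paired with N."""
    return seq.translate(COMPLEMENT)[::-1]


def overlap_merge(left: str, right: str, min_overlap: int = 8) -> str:
    """Join two strands on the longest suffix of left that equals a prefix of right."""
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    raise ValueError("no sufficient overlap")


def amplify_and_cut(template: str, fwd: str, rev: str):
    """Top strand of the PCR product, split at its single KasI site."""
    amplicon = overlap_merge(overlap_merge(fwd, template), revcomp(rev))
    if amplicon.count(KASI_SITE) != 1:
        raise ValueError("expected exactly one KasI site")
    cut = amplicon.index(KASI_SITE) + 1  # after the first G of GGCGCC
    return amplicon[:cut], amplicon[cut:]


module_a, module_b = amplify_and_cut(SEQ_14, SEQ_15, SEQ_16)
```

The two fragments correspond to SEQ ID NO: 17 (Module A) and SEQ ID NO: 18 (Module B).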

Example 6: The Use of Neomer Libraries to Identify Aptamarkers for a Medical State, Risk of Developing Alzheimer's from High Deposition of Brain Amyloid as Predicted in Blood Samples

The Aptamarker platform is described in patent application WO 2018/150030 A1. In brief, this concept describes the use of aptamer libraries with SELEX to identify aptamers that bind to unknown biomarkers in biological fluids or tissues. The aptamers identified are referred to as Aptamarkers. The Aptamarker process has been used to identify aptamers for high brain amyloid through their binding to unknown targets in blood samples (DOI: 10.1371/journal.pone.0243902).

The application of a Neomer library in this process is clearly advantageous, as it eliminates the first step of SELEX-based Aptamarker selection, namely the enrichment of a library against pools of blood samples. The Neomer library approach enables direct application of the same library to different blood samples, where such blood samples are chosen based on differences in phenotype.

This approach is described in FIG. 5.

A Neomer library composed of the same 4 294 967 296 sequences is applied to blood samples (plasma or serum) from individuals that vary for a particular disease state. Each of the aptamers may bind to target molecules in these blood samples. Aptamers that bind to target molecules are partitioned from aptamers that do not bind using the FRELEX approach (U.S. Pat. No. 10,415,034 B2), as illustrated in FIG. 6.

In FRELEX, all aptamers will have a tendency to hybridize to the antisense oligonucleotides immobilized on a surface. If an aptamer binds to a molecule in blood, then such an aptamer is less likely to hybridize to the antisense on the surface. It would be clear to one trained in the art that all that is required in this process is a modulation of the frequency with which each aptamer binds to the antisense on the surface. In this example of the Neomer library application to Aptamarker identification, a preferred embodiment is to start selection with an average of 1,000 copies of each of the 4 294 967 296 sequences. As such, it is only necessary for the enablement of this application that a portion of the copies of each sequence hybridize to the antisense oligonucleotides, and that the magnitude of this portion is modulated by the binding of the aptamer to molecules in blood.

It should also be noted that while blood was the bodily fluid used in this example, this application is applicable to all bodily fluids and cell suspensions.

It should also be noted that the blood samples were all received from the AIBL cohort in Australia.

All aptamers that are not hybridized to the antisense oligonucleotides are recovered and processed for next generation sequencing (NGS) analysis. One embodiment of this partitioning process with FRELEX involves immobilization of the antisense oligonucleotides on gold nanoparticles. The gold nanoparticles functionalized with immobilized antisense oligonucleotides are incubated with the mixture of aptamers and blood sample. Aptamers that hybridize to the antisense oligonucleotides are removed from the solution by microcentrifugation, which pellets the gold nanoparticles together with the hybridized aptamers. The supernatant is enriched for those aptamers that are bound to a target molecule.

We use the term hybridized for the interaction between an aptamer and the immobilized antisense, and the term bound for the interaction between an aptamer and a target molecule.

The proportion of aptamer in the supernatant following the FRELEX process (that is, the proportion of the aptamer not hybridized to the antisense) would increase with the concentration of the target molecule in the blood sample. The nature of this relationship between the amount of aptamer bound and the target molecule concentration will be a function of the binding affinity between the aptamer and the target molecule, which is not known. However, this relationship does not need to be known for the enablement of this invention because, whatever its value, the binding affinity will be the same between a given aptamer sequence and a given target molecule. As such, differences in aptamer frequency detected in NGS analysis will be indicative of differences in the concentration of the target that they bind to.

As such, aptamers that exhibit differences in frequency in NGS analysis following FRELEX selection in different blood samples are detecting a difference in target molecule concentration in these blood samples. Clearly there can be many differences in target molecule concentration between any two blood samples. In order to identify aptamers that bind to target molecules that are related to a phenotype or medical state, it is necessary to apply the Neomer aptamer set to multiple blood samples comprising at least two different phenotypes or medical states. In theory, this requires a minimum of two samples exhibiting one phenotype and a minimum of two samples exhibiting a different phenotype.

In this example, we used ten samples of blood from individuals with high brain amyloid as determined by positron emission tomography (PET scans) and ten samples of blood from individuals with low brain amyloid as determined by PET scans. The frequency of each of the 4 294 967 296 sequences in each blood sample was determined by the methods described in Example 3.

The average and standard deviation of the frequency of each of the 4 294 967 296 sequences were determined for the ten samples with high brain amyloid and, separately, for the ten samples with low brain amyloid. These values were used to determine the statistical significance of differences between the two phenotypes (high and low brain amyloid) with the following formula:

$Z = \frac{\text{Avg. of sequence in one phenotype} - \text{Avg. of same sequence in other phenotype}}{\text{Avg. of the standard deviations of the same sequence in both phenotypes}}$

In this example, the top 10,000 Z scores were defined for high versus low brain amyloid and for low versus high brain amyloid. It is, however, to be understood that the number 10,000 is not a limiting feature of the present invention, and that the skilled artisan will readily contemplate using a number other than 10,000. We termed the high brain amyloid versus low brain amyloid aptamers Ham_#, where the # sign designates the rank of the aptamer in terms of Z score, with 1 assigned to the aptamer with the highest Z score and 10,000 to the aptamer with the 10,000th highest Z score. We named the low brain amyloid versus high brain amyloid aptamers C-Lam_#, using the same convention.

The frequencies of each of these sets of 10,000 sequences were then analyzed for the capacity of each sequence to differentiate between high and low brain amyloid. To calculate this, for each aptamer we gave each high brain amyloid sample a score of 1 if the aptamer frequency in that sample was higher than the maximum frequency of the same aptamer across all the low brain amyloid samples. Likewise, we gave each low brain amyloid sample a score of 1 if the aptamer frequency in that sample was lower than the minimum frequency observed across the high brain amyloid samples.

In addition, we processed a buffer sample containing no blood in each analysis, as a further test. Aptamers that exhibited a consistently higher frequency in one phenotype than in the other, as described above, and a similar or lower frequency in the buffer compared with the non-targeted phenotype, were deemed desirable. This buffer control serves as a further check that the aptamers are in fact binding to a target molecule that differs between the phenotypes. Finally, we also ranked the aptamers on the basis of the difference in frequency between the least frequent high amyloid sample and the most frequent low amyloid sample for each aptamer.
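For a single aptamer, the scores described above can be expressed as in the Python sketch below. The buffer test follows the criterion applied later in this example (buffer frequency lower than every high brain amyloid sample); the names and the returned tuple are illustrative assumptions.

```python
from typing import NamedTuple, Optional, Sequence


class Separation(NamedTuple):
    high_score: int   # high-amyloid samples above every low-amyloid sample
    low_score: int    # low-amyloid samples below every high-amyloid sample
    margin: float     # least frequent high sample minus most frequent low sample
    buffer_ok: bool   # buffer frequency below every high-amyloid sample


def separation(high: Sequence[float], low: Sequence[float],
               buffer: Optional[float] = None) -> Separation:
    """Score how cleanly one aptamer's frequencies separate the two phenotypes."""
    high_score = sum(1 for f in high if f > max(low))
    low_score = sum(1 for f in low if f < min(high))
    margin = min(high) - max(low)
    buffer_ok = buffer is None or buffer < min(high)
    return Separation(high_score, low_score, margin, buffer_ok)
```

An aptamer that scores 1 in every sample has a positive margin; ranking by margin then orders such aptamers by how far apart the two phenotypes are.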

The Z scores versus fold values for the top 10,000 aptamers, based on Z scores for high brain amyloid versus low brain amyloid, are provided in FIG. 7. The fold value is the average frequency for high brain amyloid divided by the average frequency for low brain amyloid, minus one.
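As a short Python sketch (the function name is hypothetical), the fold value defined here is:

```python
from statistics import mean


def fold_value(high_freqs, low_freqs) -> float:
    """Average high-amyloid frequency / average low-amyloid frequency, minus one."""
    return mean(high_freqs) / mean(low_freqs) - 1


print(fold_value([4.0, 6.0], [2.0, 3.0]))  # 1.0
```

A fold value of 0 means equal average frequencies; 1.0 means the aptamer is on average twice as frequent in the high brain amyloid samples.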

Of these 10,000 aptamers, 3,595 exhibited higher frequencies on all high brain amyloid samples (n=10) versus all low brain amyloid samples (n=10) tested.

We then screened all 3,595 sequences noted above to identify those for which the buffer-alone frequency was also lower than the frequency in every high amyloid sample. We identified 2,806 sequences that also satisfied this test.

The predictive capacity of the top four aptamers for high brain amyloid versus low brain amyloid is described in FIG. 8.

We also performed the same analysis for low amyloid versus high amyloid. FIG. 9 provides the Z score versus fold difference for the Aptamarker selection in this direction.

The predictive capacity of the top four aptamers based on this analysis is provided in FIG. 10.

One enablement of this invention is the use of a combined set of Aptamarkers including Aptamarkers that were selected for high versus low brain amyloid and Aptamarkers that were selected for low versus high brain amyloid in a combined qPCR analysis.

The sequences noted above are all of utility in the diagnosis of high or low brain amyloid in blood samples using a qPCR assay. The method of developing such a qPCR assay from these Aptamarkers is described in WO 2018/150030 A1.

Aptamarker sequences: High versus low brain amyloid

Ham_6968,

SEQ ID NO: 43

CCAGATACAGACCAGAGGCGTGAATTTCAACCATCGGCGCCAACACCCCATTCCTGCAGATACAGTAGACAGC

Ham_2753,

SEQ ID NO: 44

CCAGATACAGACTCGAGGACTGAATCGGAACCATCGGCGCCAACAAGACATTCATACAGAAACAGTAGACAGC

Ham_6700,

SEQ ID NO: 45

CCAGATACAGACTCGAGGTAAGAATGGGAACCATCGGCGCCAACACAGCATTCAATCAGAGACAGTAGACAGC

Ham_8505,

SEQ ID NO: 46

CCAGATACAGACAGGAGGGCGGAATTCCAACCATCGGCGCCAACATTTCATTCTAACAGACACAGTAGACAGC

Aptamarker sequences: Low versus high brain amyloid

C-LAM_1,

SEQ ID NO: 47

CCAGATACAGACTAGAGGTATGAATAGAAACCATCGGCGCCAACAATTCATTCATTCAGATGCAGTAGACAGC

C-LAM_168,

SEQ ID NO: 48

CCAGATACAGACAAGAGGACCGAATGTCAACCATCGGCGCCAACATTACATTCATTCAGATTCAGTAGACAGC

C-LAM_2709,

SEQ ID NO: 49

CCAGATACAGACTTGAGGTTCGAATCTCAACCATCGGCGCCAACATGACATTCCTTCAGATTCAGTAGACAGC

C-LAM_262,

SEQ ID NO: 50

CCAGATACAGACCAGAGGTTTGAATTCCAACCATCGGCGCCAACATTACATTCCTCCAGAATCAGTAGACAGC
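Each Aptamarker listed above is one member of the SEQ ID NO: 14 library, so its identity is fully captured by its 16 random nucleotides. The Python sketch below (names hypothetical) checks a sequence against the fixed positions of SEQ ID NO: 14 and returns the random bases, split into the 8 that fall in Module A and the 8 that fall in Module B.

```python
SEQ_14 = "CCAGATACAGACNNGAGGNNNGAATNNNAACCATCGGCGCCAACANNNCATTCNNNCAGANNCAGTAGACAGC"


def random_bases(seq: str, template: str = SEQ_14) -> tuple:
    """Return (Module A bases, Module B bases) after checking all fixed positions."""
    if len(seq) != len(template):
        raise ValueError("length differs from the library template")
    for i, (s, t) in enumerate(zip(seq, template)):
        if t != "N" and s != t:
            raise ValueError(f"fixed position {i} mismatch: {s} != {t}")
    bases = "".join(s for s, t in zip(seq, template) if t == "N")
    return bases[:8], bases[8:]  # the KasI site lies between the two halves


HAM_6968 = "CCAGATACAGACCAGAGGCGTGAATTTCAACCATCGGCGCCAACACCCCATTCCTGCAGATACAGTAGACAGC"
print(random_bases(HAM_6968))  # ('CACGTTTC', 'CCCCTGTA')
```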

Example 7: The Use of the Neomer Library for the Identification of Aptamarkers for Individual Resistance to Cognitive Decline in the Presence of High Brain Amyloid

It is known in the art that the relationship between brain amyloid deposition and cognitive decline is not absolute. Certain individuals exhibit cognitive decline very rapidly in the presence of high brain amyloid, while other individuals exhibit cognitive decline much more slowly. Data from two individuals in the AIBL cohort analysis are provided in FIG. 11 to illustrate this point. The dots and lines correspond to evaluations of brain amyloid based on PET scan data on successive tests, with 18 months between each test. Individual B did not exhibit cognitive decline even though their brain amyloid deposition was considered high (high is defined as being above 30 centiloids). NMC denotes non-memory complainer. Individual A, on the other hand, did exhibit cognitive decline, progressing from a clinical diagnosis of mild cognitive impairment (MCI) to a full diagnosis of Alzheimer's disease dementia (AD).

In general, there is a clear trend towards cognitive decline in the presence of high brain amyloid, but the observations described above point to the need to diagnose individual resistance to cognitive decline in the presence of high brain amyloid. We also refer to this individual resistance as resiliency.

We used the same approach used in Example 6 to develop Aptamarkers for the prediction of resiliency. We applied the same Neomer library to blood samples from ten individuals that exhibited high brain amyloid and a lack of cognitive decline or a delay in the onset of cognitive decline (high resiliency) and ten individuals that exhibited high brain amyloid and rapid rates of cognitive decline (low resiliency). We also applied this same library to a buffer with no blood added.

The selected libraries were processed for next generation sequencing (NGS) analysis as described in the previous example and the results from this analysis were characterized statistically as described in the previous example as well. FIG. 12 provides the Z scores and fold values for the top 10,000 aptamers from the high resiliency versus low resiliency analysis.

The frequencies of these 10,000 aptamers were then screened for their predictive capacity between the samples that they were selected on as described in Example 6.

The top four aptamers as defined by this approach are presented in FIG. 14. Once aptamers are defined as being meaningful for diagnosis, we refer to them as Aptamarkers.

We also analyzed this same set of NGS data for low resiliency versus high resiliency. FIG. 15 provides the distribution of Z scores and Fold values for the top 10,000 aptamers in this analysis.

It is noted that the Z scores for low versus high resiliency are lower than those for high versus low resiliency.

These 10,000 sequences were likewise evaluated, as described above, for their capacity to predict low resiliency to high brain amyloid.

The predictive capacity of the top twenty sequences from this analysis is provided in FIG. 16.

It would be clear to one skilled in the art that each of these Aptamarkers has individual utility for the diagnosis or prediction of this aspect of dementia as it relates to brain amyloid deposition. Taken together as a set, the Aptamarkers will have greater diagnostic or predictive power across individuals, because a set enables differentiation between different patterns of the disease. In this example, each individual Aptamarker exhibited 100% sensitivity, 100% specificity and 100% accuracy across the twenty samples evaluated.
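The three figures of merit quoted above follow from the per-sample calls in the standard way. The minimal sketch below shows the calculation; the function name `diagnostic_metrics` is illustrative, and the calls used in the usage note are hypothetical rather than the actual data.

```python
def diagnostic_metrics(y_true, y_pred):
    """Return (sensitivity, specificity, accuracy) for paired binary labels and calls.

    y_true: True if the sample belongs to the target group (e.g. high resiliency)
    y_pred: True if the Aptamarker calls the sample positive
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    sensitivity = tp / (tp + fn)   # fraction of target-group samples detected
    specificity = tn / (tn + fp)   # fraction of comparison-group samples correctly rejected
    accuracy = (tp + tn) / len(pairs)
    return sensitivity, specificity, accuracy
```

For instance, `diagnostic_metrics([True] * 10 + [False] * 10, [True] * 10 + [False] * 10)` returns `(1.0, 1.0, 1.0)`, which is what perfect separation of ten high-resiliency and ten low-resiliency samples would give.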

Aptamarker sequences: High versus low resiliency

C-HR_4638,

SEQ ID NO: 51

CCAGATACAGACTTGAGGTTCGAATATCAACCATCGGCGCCAACACCACATTCCATCAGATACAGTAGACAGC

C-HR_41,

SEQ ID NO: 52

CCAGATACAGACTTGAGGCTTGAATAGAAACCATCGGCGCCAACAAACCATTCGAGCAGAAACAGTAGACAGC

C-HR_2838,

SEQ ID NO: 53

CCAGATACAGACTTGAGGCCTGAATAGTAACCATCGGCGCCAACATCACATTCTAGCAGAACCAGTAGACAGC

C-HR_1437,

SEQ ID NO: 54

CCAGATACAGACATGAGGGGTGAATAAGAACCATCGGCGCCAACAAAGCATTCATCCAGAAACAGTAGACAGC

Aptamarker sequences: Low versus high resiliency

C-LR_3,

SEQ ID NO: 55

CCAGATACAGACGCGAGGCTCGAATTAAAACCATCGGCGCCAACACTCCATTCTTTCAGAATCAGTAGACAGC

C-LR_133,

SEQ ID NO: 56

CCAGATACAGACGCGAGGCTCGAATTAAAACCATCGGCGCCAACACTACATTCTTTCAGATTCAGTAGACAGC

C-LR_504,

SEQ ID NO: 57

CCAGATACAGACCTGAGGTTAGAATTACAACCATCGGCGCCAACATTTCATTCTTTCAGAGGCAGTAGACAGC

C-LR_531,

SEQ ID NO: 58

CCAGATACAGACACGAGGTAAGAATAAGAACCATCGGCGCCAACATTTCATTCTCCCAGAAGCAGTAGACAGC

Example 8: The Use of the Neomer Library for the Identification of Aptamarkers for Individual Differences in the Rate of Brain Amyloid Deposition

Individuals differ in their rates of brain amyloid accumulation. Here we are not differentiating between individuals who exhibit high brain amyloid and those who exhibit low brain amyloid; rather, we are differentiating between individuals who end up with high brain amyloid but differ in the rate at which they accumulate it. FIG. 16 provides an example of two individuals in the AIBL Alzheimer's cohort who differ in their rate of brain amyloid accumulation.

The individual represented by the dashed line in FIG. 16 accumulates brain amyloid at a slower rate than the individual represented by the solid line. Such differences in the rate of brain amyloid accumulation may have a significant impact on the risk of developing Alzheimer's disease as a function of brain amyloid, and on the rate at which cognitive dysfunction increases. It would therefore be useful for clinicians and caregivers to have access to a blood-based diagnostic test that predicts the rate of brain amyloid accumulation.

If two individuals exhibit the same level of brain amyloid at a given moment in time, the individual with the slower rate of accumulation may be at higher risk of developing cognitive dysfunction, because that individual will have been exposed to elevated brain amyloid over a longer time period, and therefore to a greater cumulative amyloid burden.
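The reasoning can be made concrete under a deliberately simple assumption that is not a claim about real amyloid kinetics: suppose brain amyloid rises linearly from zero to the current level L at a constant rate r. The time spent accumulating is then L/r, and the area under the trajectory (a cumulative exposure) is L²/(2r), so at equal L a slower rate gives a larger exposure. The function name and units are illustrative.

```python
def cumulative_exposure(current_level, rate):
    """Area under a linear amyloid trajectory rising from 0 to current_level at `rate` per year.

    Simplifying assumption for illustration only: accumulation is linear and starts from zero.
    Units are centiloid-years when current_level is in centiloids and rate in centiloids/year.
    """
    years_accumulating = current_level / rate
    # Triangle area: half of base (years) times height (current level)
    return 0.5 * current_level * years_accumulating
```

With L = 60 centiloids, a hypothetical fast accumulator at 10 centiloids per year reaches that level in 6 years (180 centiloid-years), while a slow accumulator at 3 centiloids per year takes 20 years (600 centiloid-years).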

Accordingly, we used the same Neomer library that was applied in Examples 6 and 7 to select Aptamarkers for predicting the rate of brain amyloid accumulation. Individuals from the AIBL cohort were chosen based on their individual rates of brain amyloid accumulation (fast or slow), and the blood samples analyzed were those for which the brain amyloid level was similar across samples. Ten samples were chosen for fast brain amyloid accumulation and ten for slow brain amyloid accumulation.
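A minimal sketch of how individuals and samples might be grouped is given below. It assumes that the accumulation rate is taken as the least-squares slope of serial PET centiloid values against time (the scans described for FIG. 11 were 18 months apart), that fast and slow are separated by a cut-off, and that samples are matched on amyloid level within a tolerance. The cut-off, the tolerance and all function names are hypothetical.

```python
def accumulation_rate(times_years, centiloids):
    """Least-squares slope of brain amyloid (centiloids) against time (years)."""
    n = len(times_years)
    mean_t = sum(times_years) / n
    mean_c = sum(centiloids) / n
    sxy = sum((t - mean_t) * (c - mean_c) for t, c in zip(times_years, centiloids))
    sxx = sum((t - mean_t) ** 2 for t in times_years)
    return sxy / sxx

def split_fast_slow(rates, cutoff):
    """Label each individual 'fast' if its rate exceeds the (hypothetical) cutoff, else 'slow'."""
    return {pid: ("fast" if r > cutoff else "slow") for pid, r in rates.items()}

def matched_samples(levels, target, tolerance):
    """Keep sample IDs whose amyloid level lies within `tolerance` of `target`."""
    return [sid for sid, level in levels.items() if abs(level - target) <= tolerance]
```

For example, scans at 0, 1.5, 3.0 and 4.5 years reading 20, 35, 50 and 65 centiloids give a slope of 10 centiloids per year.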

Aptamarker selection from the Neomer library was performed for this example in the same manner as in Examples 6 and 7.

FIG. 17 summarizes the Z scores and associated Fold values for the top 10,000 aptamers, ranked by Z score, from the selection of fast versus slow brain amyloid accumulation. The top four Aptamarkers, chosen on the basis of predictive capacity, are characterized in FIG. 18.

Likewise, FIG. 19 summarizes the Z scores and associated Fold values for the top 10,000 aptamers, ranked by Z score, from the slow-versus-fast analysis, and the top four Aptamarkers chosen on the basis of predictive capacity are characterized in FIG. 20. Note that the buffer comparison contains no blood sample at all, so the best Aptamarkers are expected not to show a significant increase in frequency in this negative control.
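The negative-control logic can be sketched as a filter, assuming normalized frequencies are available for the buffer-only selection and for each blood-sample selection. The rejection rule (buffer frequency above a multiple of the mean blood frequency), the parameter `max_ratio` and the function names are illustrative assumptions, not the criterion used in the examples.

```python
def passes_buffer_control(freq_in_buffer, freqs_in_blood, max_ratio=1.0):
    """Return True if an aptamer is not enriched in the no-blood buffer control.

    freq_in_buffer: normalized frequency in the buffer-only selection
    freqs_in_blood: normalized frequencies in the blood-sample selections
    Hypothetical rule: reject if the buffer frequency exceeds max_ratio times the mean blood frequency.
    """
    mean_blood = sum(freqs_in_blood) / len(freqs_in_blood)
    return freq_in_buffer <= max_ratio * mean_blood

def filter_candidates(candidates, buffer_freqs, blood_freqs, max_ratio=1.0):
    """Keep candidate aptamers that pass the buffer negative control.

    Aptamers never observed in the buffer selection are treated as frequency 0.
    """
    return [seq for seq in candidates
            if passes_buffer_control(buffer_freqs.get(seq, 0.0), blood_freqs[seq], max_ratio)]
```

An aptamer that is as frequent in buffer as in blood is more likely binding to the selection matrix or reagents than to a blood biomarker, which is why such sequences are dropped.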

An implicit strength of the Aptamarker approach is that it enables prediction of a medical state from a combination of Aptamarkers, ideally Aptamarkers that bind to different biomarkers. Using a set of Aptamarkers in this manner enables more robust and accurate prediction of the medical state across large numbers of individuals.
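The combination rule for a set of Aptamarkers is not specified here. As one simple possibility, labeled as an assumption, a panel can take a majority vote of the individual calls, so that a single Aptamarker misfiring on one sample does not change the panel call. The function names are hypothetical.

```python
def panel_call(individual_calls):
    """Combine binary calls from several Aptamarkers by simple majority (ties count as negative)."""
    positives = sum(1 for call in individual_calls if call)
    return positives * 2 > len(individual_calls)

def panel_predictions(calls_by_marker):
    """calls_by_marker: {marker_name: [call for each sample]} -> list of panel calls per sample."""
    per_sample = zip(*calls_by_marker.values())
    return [panel_call(calls) for calls in per_sample]
```

For example, with three markers calling three samples as `[True, True, False]`, `[True, False, False]` and `[False, True, True]`, the panel calls are `[True, True, False]`: each of the first two samples is rescued from one dissenting marker.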

Aptamarker sequences: Fast accumulation

c-FAM_02108,

SEQ ID NO: 59

CCAGATACAGACATGAGGTTCGAATTTCAACCATCGGCGCCAACATAACATTCTCACAGATCCAGTAGACAGC

c-FAM_00217,

SEQ ID NO: 60

CCAGATACAGACATGAGGTTCGAATTTCAACCATCGGCGCCAACATACCATTCTTTCAGACTCAGTAGACAGC

c-FAM_03136,

SEQ ID NO: 61

CCAGATACAGACATGAGGTTCGAATTTCAACCATCGGCGCCAACACACCATTCATCCAGATTCAGTAGACAGC

c-FAM_05536,

SEQ ID NO: 62

CCAGATACAGACTTGAGGTAGGAATTAAAACCATCGGCGCCAACATAACATTCTTCCAGATCCAGTAGACAGC

Aptamarker sequences: Slow accumulation

c-SAM_06642,

SEQ ID NO: 63

CCAGATACAGACAAGAGGATTGAATATTAACCATCGGCGCCAACACTGCATTCTAACAGACGCAGTAGACAGC

c-SAM_00707,

SEQ ID NO: 64

CCAGATACAGACATGAGGTATGAATGTTAACCATCGGCGCCAACACTGCATTCTAACAGACGCAGTAGACAGC

c-SAM_08571,

SEQ ID NO: 65

CCAGATACAGACTAGAGGAATGAATATTAACCATCGGCGCCAACACTGCATTCTAACAGACGCAGTAGACAGC

c-SAM_03894,

SEQ ID NO: 66

CCAGATACAGACATGAGGGTTGAATTTTAACCATCGGCGCCAACACTGCATTCTAACAGACGCAGTAGACAGC

The following nucleotide sequences are listed herein, from 5′ to 3′:

SEQ ID NO: 1:

AAANGAAANNNGAAACNNNAAACNTTT

SEQ ID NO: 2:

TAATACGACTCACTATAGGGATAAT

SEQ ID NO: 3:

TTTCNTTTATTATCCCTATAGTGAGTCTATTA

SEQ ID NO: 4:

GAACGAGCGACCTCATACGTATTG

SEQ ID NO: 5:

CAATACGTATGAGGTCGCTCGTTCAAANGTTT

SEQ ID NO: 6:

AAANGAAANN NGAATGNNNA AACNTTTAAA NGAAANNNCA TTCNNNTTAC NTAA

SEQ ID NO: 7:

AAANGAAANN NCATTCNNNT TACNTAA

SEQ ID NO: 8:

TTTCNTTTAA ANGTTT

SEQ ID NO: 9:

AAANGAAANN NGAATGNNNA AACNTTT

SEQ ID NO: 10:

TTTCNTTTAT TATCCCTATA GTGAGTCGTA TTA

SEQ ID NO: 11:

GAACGAGCGA CCTCATACGT ATTTG

SEQ ID NO: 12:

CAAATACGTA TGAGGTCGCT CGTTCAAANG TTT

SEQ ID NO: 13:

CAAATACGTA TGAGGTCGCT CGTTCTTANG TAA

SEQ ID NO: 14:

CCAGATACAG ACNNGAGGNN NGAATNNNAA CCATCGGCGC CAACANNNCA TTCNNNCAGA NNCAGTAGAC AGC

SEQ ID NO: 15:

CAAATACGTA TGAGGTCGCT CGTTCCCAGA TACAGAC

SEQ ID NO: 16:

TAATACGACT CACTATAGGG ATAATGCTGT CTACTG

SEQ ID NO: 17:

CAAATACGTA TGAGGTCGCT CGTTCCCAGA TACAGACNNG AGGNNNGAAT NNNAACCATC G

SEQ ID NO: 18:

GCGCCAACAN NNCATTCNNN CAGANNCAGT AGACAGCATT ATCCCTATAG TGAGTCGTAT TA

SEQ ID NO: 20:

GGTCAGACGTGTGCTCTTCCGATCGGGGCGCCGATGGT

SEQ ID NO: 23:

CCAGATACAG ACCGGAGGTC TGAATCTCAA CCATCGGCGC CAACATTTCA TTCAACCAGA ACCAGTAGAC AGC

SEQ ID NO: 24:

CCAGATACAG ACACGAGGTC TGAATCCTAA CCATCGGCGC CAACAGCTCA TTCGCCCAGA CCCAGTAGAC AGC

SEQ ID NO: 25:

CCAGATACAG ACCTGAGGTC TGAATCTTAA CCATCGGCGC CAACAACGCA TTCTGCCAGA GTCAGTAGAC AGC

SEQ ID NO: 26:

CCAGATACAG ACCGGAGGAT CGAATCTAAA CCATCGGCGC CAACAGCGCA TTCCTCCAGA CGCAGTAGAC AGC

SEQ ID NO: 27:

CCAGATACAG ACCTGAGGTC GGAATCCAAA CCATCGGCGC CAACAGCGCA TTCCTCCAGA CGCAGTAGAC AGC

SEQ ID NO: 28:

CCAGATACAG ACTCGAGGTC TGAATCTGAA CCATCGGCGC CAACACCGCA TTCATTCAGA CACAGTAGAC AGC

SEQ ID NO: 29:

CCAGATACAG ACCGGAGGTC TGAATCCAAA CCATCGGCGC CAACAGCGCA TTCCTCCAGA CGCAGTAGAC AGC

SEQ ID NO: 30:

CCAGATACAG ACCCGAGGCC TGAATCCCAA CCATCGGCGC CAACAGCGCA TTCCTCCAGA CGCAGTAGAC AGC

SEQ ID NO: 31:

CCAGATACAG ACCAGAGGTC TGAATCCCAA CCATCGGCGC CAACAGGGCA TTCTGGCAGA CTCAGTAGAC AGC

SEQ ID NO: 32:

CCAGATACAG ACCGGAGGTC TGAATCTCAA CCATCGGCGC CAACAGCGCA TTCTAGCAGA TACAGTAGAC AGC

SEQ ID NO: 33:

CCAGATACAG ACCAGAGGTC TGAATCGTAA CCATCGGCGC CAACAGCGCA TTCCTCCAGA CGCAGTAGAC AGC

SEQ ID NO: 34:

AAANGWWWNN NGWWWCNNNW WWCNTTT

SEQ ID NO: 35:

AAANGWWWNN NGWWWGNNNW WWCNTTT

SEQ ID NO: 36:

AAANGWWWNN NCWWWCNNNW WWCNTAA

SEQ ID NO: 37:

AACTCGCGAG CCACGGGTAG TCCTCCACCT CACTAGGGGG TTGGCGGATC TCAGCTGACA AAACAAGATC TTTGCGATCG

SEQ ID NO: 38:

CCCTACACGA CGCTCTTCCG ATCTATCACG CAAATACGTA TGAGGTCGCT CGTT

SEQ ID NO: 39:

AATGATACGG CGACCACCGA GATCTACACT CTTTCCCTAC ACGACGCTCT TCCG

SEQ ID NO: 40:

CAAGCAGAAG ACGGCATACG AGATGTGACT GGAGTTCAGA CGTGTGCTCT TCC

SEQ ID NO: 41:

CCCTACACGA CGCTCTTCCG ATCTATCACG GCGCCAACA

SEQ ID NO: 42:

GGTCAGACGT GTGCTCTTCC GATCGGGTAA TACGACTCAC TATAGGGATA ATGCTGTCTA CTG

SEQ ID NO: 43:

CCAGATACAG ACCAGAGGCG TGAATTTCAA CCATCGGCGC CAACACCCCA TTCCTGCAGA TACAGTAGAC AGC

SEQ ID NO: 44:

CCAGATACAG ACTCGAGGAC TGAATCGGAA CCATCGGCGC CAACAAGACA TTCATACAGA AACAGTAGAC AGC

SEQ ID NO: 45:

CCAGATACAG ACTCGAGGTA AGAATGGGAA CCATCGGCGC CAACACAGCA TTCAATCAGA GACAGTAGAC AGC

SEQ ID NO: 46:

CCAGATACAG ACAGGAGGGC GGAATTCCAA CCATCGGCGC CAACATTTCA TTCTAACAGA CACAGTAGAC AGC

SEQ ID NO: 47:

CCAGATACAG ACTAGAGGTA TGAATAGAAA CCATCGGCGC CAACAATTCA TTCATTCAGA TGCAGTAGAC AGC

SEQ ID NO: 48:

CCAGATACAG ACAAGAGGAC CGAATGTCAA CCATCGGCGC CAACATTACA TTCATTCAGA TTCAGTAGAC AGC

SEQ ID NO: 49:

CCAGATACAG ACTTGAGGTT CGAATCTCAA CCATCGGCGC CAACATGACA TTCCTTCAGA TTCAGTAGAC AGC

SEQ ID NO: 50:

CCAGATACAG ACCAGAGGTT TGAATTCCAA CCATCGGCGC CAACATTACA TTCCTCCAGA ATCAGTAGAC AGC

SEQ ID NO: 51:

CCAGATACAG ACTTGAGGTT CGAATATCAA CCATCGGCGC CAACACCACA TTCCATCAGA TACAGTAGAC AGC

SEQ ID NO: 52:

CCAGATACAG ACTTGAGGCT TGAATAGAAA CCATCGGCGC CAACAAACCA TTCGAGCAGA AACAGTAGAC AGC

SEQ ID NO: 53:

CCAGATACAG ACTTGAGGCC TGAATAGTAA CCATCGGCGC CAACATCACA TTCTAGCAGA ACCAGTAGAC AGC

SEQ ID NO: 54:

CCAGATACAG ACATGAGGGG TGAATAAGAA CCATCGGCGC CAACAAAGCA TTCATCCAGA AACAGTAGAC AGC

SEQ ID NO: 55:

CCAGATACAG ACGCGAGGCT CGAATTAAAA CCATCGGCGC CAACACTCCA TTCTTTCAGA ATCAGTAGAC AGC

SEQ ID NO: 56:

CCAGATACAG ACGCGAGGCT CGAATTAAAA CCATCGGCGC CAACACTACA TTCTTTCAGA TTCAGTAGAC AGC

SEQ ID NO: 57:

CCAGATACAG ACCTGAGGTT AGAATTACAA CCATCGGCGC CAACATTTCA TTCTTTCAGA GGCAGTAGAC AGC

SEQ ID NO: 58:

CCAGATACAG ACACGAGGTA AGAATAAGAA CCATCGGCGC CAACATTTCA TTCTCCCAGA AGCAGTAGAC AGC

SEQ ID NO: 59:

CCAGATACAG ACATGAGGTT CGAATTTCAA CCATCGGCGC CAACATAACA TTCTCACAGA TCCAGTAGAC AGC

SEQ ID NO: 60:

CCAGATACAG ACATGAGGTT CGAATTTCAA CCATCGGCGC CAACATACCA TTCTTTCAGA CTCAGTAGAC AGC

SEQ ID NO: 61:

CCAGATACAG ACATGAGGTT CGAATTTCAA CCATCGGCGC CAACACACCA TTCATCCAGA TTCAGTAGAC AGC

SEQ ID NO: 62:

CCAGATACAG ACTTGAGGTA GGAATTAAAA CCATCGGCGC CAACATAACA TTCTTCCAGA TCCAGTAGAC AGC

SEQ ID NO: 63:

CCAGATACAG ACAAGAGGAT TGAATATTAA CCATCGGCGC CAACACTGCA TTCTAACAGA CGCAGTAGAC AGC

SEQ ID NO: 64:

CCAGATACAG ACATGAGGTA TGAATGTTAA CCATCGGCGC CAACACTGCA TTCTAACAGA CGCAGTAGAC AGC

SEQ ID NO: 65:

CCAGATACAG ACTAGAGGAA TGAATATTAA CCATCGGCGC CAACACTGCA TTCTAACAGA CGCAGTAGAC AGC

SEQ ID NO: 66:

CCAGATACAG ACATGAGGGT TGAATTTTAA CCATCGGCGC CAACACTGCA TTCTAACAGA CGCAGTAGAC AGC

Wherein W indicates adenine or thymine, and N indicates any nucleotide.
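The degenerate codes defined above make each template a compact description of a closed set of sequences. The sketch below, using hypothetical helper names, checks whether a listed aptamer conforms to a template, extracts the bases at its variable positions, and counts the sequences a template encodes: SEQ ID NO: 14 has 16 N positions, giving 4^16 (about 4.3 × 10^9) sequences, and SEQ ID NO: 34 has 8 N and 9 W positions, giving 4^8 × 2^9 = 33,554,432. Only the codes used in this listing (A, C, G, T, W, N) are handled.

```python
# Degenerate codes used in the listing: W = A or T, N = any nucleotide.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "AT", "N": "ACGT"}

def clean(s):
    """Drop the spaces that group the listed sequences into blocks of ten."""
    return s.replace(" ", "").upper()

def matches(seq, pattern):
    """True if seq has the template's length and every base is allowed at its position."""
    seq, pattern = clean(seq), clean(pattern)
    return len(seq) == len(pattern) and all(b in IUPAC[p] for b, p in zip(seq, pattern))

def library_size(pattern):
    """Number of distinct sequences encoded by a degenerate template."""
    size = 1
    for p in clean(pattern):
        size *= len(IUPAC[p])
    return size

def variable_bases(seq, pattern):
    """Concatenate the bases of seq that sit at degenerate (non-fixed) template positions."""
    seq, pattern = clean(seq), clean(pattern)
    return "".join(b for b, p in zip(seq, pattern) if len(IUPAC[p]) > 1)

SEQ_ID_14 = ("CCAGATACAG ACNNGAGGNN NGAATNNNAA CCATCGGCGC "
             "CAACANNNCA TTCNNNCAGA NNCAGTAGAC AGC")
SEQ_ID_34 = "AAANGWWWNN NGWWWCNNNW WWCNTTT"
SEQ_ID_51 = ("CCAGATACAG ACTTGAGGTT CGAATATCAA CCATCGGCGC "
             "CAACACCACA TTCCATCAGA TACAGTAGAC AGC")
```

For SEQ ID NO: 51 (C-HR_4638), `matches(SEQ_ID_51, SEQ_ID_14)` is true and `variable_bases` returns the 16 randomized bases `TTTTCATCCCACATTA`; the remaining 57 bases are the fixed regions shared by every member of the library.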
