The present invention pertains to the sequencing of individual monomers of a polymer and, more particularly, to increasing the sequencing accuracy of a nanopore-based system by controlling sequencing error rates and monomer identification error rates.
Extensive research effort and funding, as exemplified by the Human Genome Project, are being invested in developing a method to sequence DNA by recording the signal of each base as the polymer is passed base by base through a recording system. Such a system could offer a rapid, low-cost alternative to present methods based on chemical reactions with probing analytes and, as a result, might usher in a revolution in medicine.
Research in this area to date has focused on developing a measurement system that can record a sufficient signal from each monomer to distinguish one monomer from another. In the case of DNA, the monomers are the well-known bases: adenine (A), cytosine (C), guanine (G), and thymine (T). The signals produced by each base must be a) different from those of the other bases, and b) different by an amount substantially larger than the internal noise of the measurement device. For convenience, we will refer to this aspect of sequencing as the Signal Amplitude Problem (SAP). The SAP is fundamentally limited by the specific property of the polymer that is probed to differentiate the monomers and by the signal to noise ratio (SNR) of the measurement device used to probe it.
A separate question, and one that has been overlooked to date, is the need to control, and thereby preserve, the order of the monomers while the measurement is made. We will refer to this as the Sequence Order Problem (SOP). For a polymer pulled through a measurement device, it might seem that the SOP is simply a question of providing a very well controlled pulling force. In a simple nanopore model, the polymer motion is one-dimensional, i.e., along the major axis of the polymer, and the total distance s that the polymer has been displaced in time t is given by $s = v_{DC} t$, where vDC is the average translocation velocity. However, such a model ignores the often critical effect of diffusion, which causes the polymer to move unpredictably. This phenomenon, also known as Brownian motion, results in a “random walk” such that the average magnitude of the net displacement in a given time t is proportional to $(Dt)^{1/2}$ for an entity with diffusion rate D. This random motion is superimposed on the average translocation velocity, resulting in an inherent uncertainty in the number of bases that have passed through the measurement device.
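The drift-plus-diffusion picture above can be illustrated with a short Monte Carlo sketch. It assumes a one-dimensional walk in which each time step dt adds a deterministic drift vDC·dt plus a Gaussian kick of variance 2·D·dt, so that after time t the displacement has mean vDC·t and standard deviation (2Dt)^{1/2}. The value D = 0.125 bases²/μs is the 15° C. αHL figure quoted elsewhere in this text; vDC = 0.05 bases/μs and the function name `simulate_displacements` are hypothetical.

```python
import math
import random
import statistics


def simulate_displacements(v_dc, d, t_total, dt=1.0, trials=2000, seed=0):
    """Net displacement (in bases) after t_total for drift v_dc plus 1-D diffusion d.

    Units are assumed consistent: v_dc in bases/us, d in bases^2/us, times in us.
    """
    rng = random.Random(seed)
    steps = int(round(t_total / dt))
    sigma_step = math.sqrt(2.0 * d * dt)  # std of the diffusive kick per step
    results = []
    for _ in range(trials):
        s = 0.0
        for _ in range(steps):
            # deterministic drift plus a random Brownian step
            s += v_dc * dt + rng.gauss(0.0, sigma_step)
        results.append(s)
    return results


disp = simulate_displacements(v_dc=0.05, d=0.125, t_total=100.0)
print(f"mean advance: {statistics.mean(disp):.2f} bases")   # near v_dc * t
print(f"spread (std): {statistics.stdev(disp):.2f} bases")  # near sqrt(2 D t)
```

With these hypothetical inputs both the mean advance and the diffusive spread after 100 μs come out near 5 bases, so the uncertainty in position is as large as the intended motion itself.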
The diffusion rate D is given by $D = D_0 e^{-E/kT}$, in which D0 is a constant, E is the activation energy, k is Boltzmann's constant and T is the absolute temperature. The motion of a measured molecule is formally equivalent to that of a rigid particle moving between periodic potential energy wells separated by energy barriers of height E. For passage of DNA through a narrow pore, the motion can be approximated as one-dimensional, and can be represented by the one-dimensional potential shown in
The rate of motion of the molecule in a one-dimensional potential as shown in
The energy barrier shown in
The diffusion constant of single-stranded DNA in α-hemolysin (αHL) under conditions of zero applied voltage was first measured by Mathe in 2003. The Mathe experiment gave a value of D only at 15° C. and so was not sufficient to determine the activation energy for diffusional processes in this system. Without knowing E, it is impossible to determine the extent to which diffusion affects, and in the limit dominates, the molecular motion under practical conditions. To the best of our knowledge, no prior experiment has determined E for any kind of nanopore.
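Why a single temperature is insufficient follows directly from the Arrhenius form of D: one measurement gives one equation in the two unknowns D0 and E. Assuming, as in that form, that D0 does not depend on temperature, dividing measurements at two absolute temperatures T1 and T2 eliminates D0 and isolates E:

$$
\frac{D(T_1)}{D(T_2)} = \exp\!\left[\frac{E}{k}\left(\frac{1}{T_2} - \frac{1}{T_1}\right)\right]
\quad\Longrightarrow\quad
E = \frac{k \,\ln\!\left[D(T_1)/D(T_2)\right]}{1/T_2 - 1/T_1}
$$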
An idea of the effect of diffusion can be obtained by using the Mathe value of D for the case of zero voltage bias. For DNA threading αHL at 15° C. (the Mathe case), the net one-dimensional motion due to diffusion alone in 100 microseconds (μs) is calculated to be approximately 5 bases. Thus, in a notional example in which a given base is measured for 100 μs, the DNA would on average have moved 5 bases away from its desired position due to diffusion, an unacceptable level of sequence order error. In a second notional case, in which each base is measured for 20 μs and a total of five bases are measured, by the time the fifth base is measured the average error in the DNA position would again be 5 bases. This simple example shows that, if not taken into account, the diffusive motion of the polymer could quickly overwhelm any attempt to sequence it. Further, these positional errors occur no matter how sensitive the device that identifies each base is.
One way to tackle the SOP is to reduce the time used to measure each base. In the simple example above, going to a measurement time per base of 1 μs would allow 5 bases to be measured in 5 μs, thereby reducing the mean random displacement due to diffusion to 0.5 bases during each single-base measurement and to roughly 1.1 bases over all five measurements. However, for any real recording system, reducing the measurement time tm significantly exacerbates the SAP. To date, no base-by-base serial method has been able to differentiate DNA bases with a single-base tm of order 10 μs because of inadequate measurement sensitivity. Reducing tm, and therefore increasing the measurement bandwidth in inverse proportion, reduces the signal to noise ratio of the individual base measurement by at least a factor of order the square root of the reduction in measurement time. Thus, for tm = 1 μs the SNR relative to tm = 100 μs is reduced by at least a factor of 10. Conversely, addressing the SOP directly by minimizing the effect of diffusion allows longer measurement times to be used, thereby alleviating the SAP.
To date, the impact of diffusion on systems that aim to sequence a polymer in a monomer-by-monomer or base-by-base serial manner has been overlooked. Owing to the very small distance between monomers, diffusion can limit the ability of any measurement device to sequence a polymer far more severely than the need to record the signal from an individual monomer alone would suggest. What is needed to develop a practical polymer sequencing system is an approach that reduces the net uncertainty in position due to diffusion and incorporates this improvement into the design of the measurement protocol, so as to reduce the overall combined effect of the SAP and SOP.
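The arithmetic in the two preceding paragraphs can be checked with a few lines of Python. This sketch assumes the diffusive spread is the one-dimensional root-mean-square displacement (2Dt)^{1/2} with D = 0.125 bases²/μs (the 15° C. αHL value quoted in this text), which reproduces the stated 5 bases in 100 μs, and it assumes white noise so that SNR scales as the square root of tm. The function names are illustrative only.

```python
import math

# 15 C, alpha-hemolysin value quoted in the text (0.125 bases^2 per 1 us measurement)
D_BASES2_PER_US = 0.125


def rms_displacement(t_us, d=D_BASES2_PER_US):
    """Root-mean-square 1-D diffusive displacement sqrt(2 D t), in bases."""
    return math.sqrt(2.0 * d * t_us)


def snr_penalty(t_long_us, t_short_us):
    """Factor by which SNR falls when the measurement time shrinks,
    assuming white noise so that SNR scales as sqrt(t_m)."""
    return math.sqrt(t_long_us / t_short_us)


cases = [
    ("one 100 us measurement", 100.0),
    ("five 20 us measurements", 5 * 20.0),
    ("one 1 us measurement", 1.0),
    ("five 1 us measurements", 5 * 1.0),
]
for label, t in cases:
    print(f"{label}: {rms_displacement(t):.2f} bases")
print(f"SNR penalty 100 us -> 1 us: {snr_penalty(100.0, 1.0):.0f}x")
```

It prints 5.00, 5.00, 0.50 and 1.12 bases for the four cases and a 10x SNR penalty, making the trade-off between the SOP and the SAP explicit.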
The system and method of the present invention utilize a combination of measurement parameters to limit the sequencing error rate produced by diffusional motion of a polymer in solution, in order to optimize the sequencing accuracy of the overall system and allow single-nucleotide level sequencing. The total sequence error is the sum of the sequence order error rate (SOER) and the monomer identification error rate (MIER). More specifically, the SOER is the probability that a series of monomers or bases will be correctly identified but reported in the wrong order. There are three types of sequence order error: 1) a base counting error, in which the polymer does not move in the desired direction at the expected rate and the same base is inadvertently reported multiple times; 2) a base skipping error, in which the polymer moves faster than expected and either a base is not reported or the signals from one or more bases are correctly measured but inadvertently combined and reported as a single base; and 3) a base repeat error, in which the polymer moves opposite to the desired direction and one or more bases are re-measured and inadvertently repeated in the reported sequence. The MIER is the probability that a base is measured erroneously and reported as a different base.
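The three sequence order error types can be made concrete with a small classifier. This sketch assumes the protocol expects the polymer to advance exactly one base per reporting interval and classifies each interval by the number of bases it actually advanced; the enum, the function names and the integer-advance simplification are hypothetical. The empirical SOER is then simply the fraction of intervals with an order error.

```python
from enum import Enum


class OrderError(Enum):
    NONE = "none"
    COUNTING = "base counting"  # polymer stalled: same base reported again
    SKIPPING = "base skipping"  # polymer advanced >1 base: base(s) missed or merged
    REPEAT = "base repeat"      # polymer moved backwards: base(s) re-reported


def classify_step(advance_bases: int) -> OrderError:
    """Classify one reporting interval by the number of bases actually advanced,
    assuming the protocol expects exactly one base per interval."""
    if advance_bases == 1:
        return OrderError.NONE
    if advance_bases == 0:
        return OrderError.COUNTING
    if advance_bases > 1:
        return OrderError.SKIPPING
    return OrderError.REPEAT


def empirical_soer(advances):
    """Fraction of intervals that produced any sequence order error."""
    errors = sum(classify_step(a) is not OrderError.NONE for a in advances)
    return errors / len(advances)


advances = [1, 1, 0, 2, 1, -1, 1, 1]
print([classify_step(a).value for a in advances])
print(f"empirical SOER: {empirical_soer(advances):.3f}")
```

In this toy trace one interval of each error type occurs, giving an empirical SOER of 3/8.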
In accordance with the method of the present invention, a user selects a measurement device or system and one or more means for reducing the diffusional motion of a polymer within the system. In a preferred embodiment, the measuring system includes a first fluid chamber separated from a second fluid chamber by a barrier structure including a nanopore. The nanopore provides a fluid path connecting electrolytes in the first and second chambers. The system further includes electrodes extending into the first and second chambers, a power source, a controller and a temperature control stage for regulating the temperature of the electrolytes in the first and second chambers. In use, electrical current signals sensed by a current sensor are processed to calculate the monomer sequence of a polymer driven through the nanopore.
Once a measurement device is selected, one or more means for reducing the diffusional motion of the polymer to be sequenced are applied, depending on the measurement device selected. Means for reducing the diffusional motion of a polymer include using a modified nanopore adapted to increase the effective frictional force for polymer motion through the nanopore, cooling an electrolyte solution containing the polymer, using an electrolyte solution adapted to reduce the diffusion constant of a polymer in the solution (such as an electrolyte having an increased salt concentration), or combinations thereof. Next, a major system parameter, such as average translocation velocity or measurement time, is selected based on the characteristics of the measurement device, and an algorithm is used to jointly optimize the SOER and the MIER of the system. The algorithm is preferably performed on a computer system in communication with the controller of the measurement device. Although preferably used for single-nucleotide sequencing, the invention can be used in combination with any method that seeks to sequence a polymer, or indeed any method that measures a property of a polymer. In particular, when combined with new methods for improving pore current measurement sensitivity, the invention offers a means to enable sequencing of individual DNA molecules.
Additional objects, features and advantages of the present invention will become more readily apparent from the following detailed description of a preferred embodiment when taken in conjunction with the drawings wherein like reference numerals refer to corresponding parts in the several views.
FIG. 5 is a chart illustrating mean aggregate SNR vs. vDC for fixed tm, assuming frequency-independent measurement system noise;
With initial reference to
Orifice 17 must be small enough that polymer 18 produces a measurable blocking signal when located within the channel. In the case where polymer 18 is DNA, orifice 17 preferably has a diameter on the order of 2 nanometers (nm) at its narrowest point. In any case, it should be realized that measurement device 1 is exemplary only, and the present invention can be employed with any type of system used to sequence individual monomers, or a unique set of monomers, of a polymer whose accuracy is limited by the effect of diffusion. The term “nanopore” should be taken to include any structure that is used to guide a polymer so that its individual monomers or bases can be measured in a base-by-base manner. To this end, further details regarding some basic components of measurement device 1, as well as certain variants thereof, are set forth in pending U.S. Patent Application Publication No. 2008/0041733 entitled “Controlled Translation of a Polymer in an Electrolytic Sensing System” filed Aug. 16, 2007, which is incorporated herein by reference. Therefore, the above description is basically provided for the sake of completeness. The present invention is actually concerned with polymers in general and with any method that seeks to sequence a polymer. However, because of its technological significance and the large body of existing experimental data, the specifics of the invention will be discussed further below in terms of sequencing DNA via a nano-scale pore. Although base-by-base sequencing is discussed, it should be understood that sequencing of unique monomer sets (for example, a set of three adenine bases) can also be improved using the present method.
Experiments have shown that DNA passage through a nano-scale orifice of comparable diameter to the DNA is limited by an essentially frictional interaction, such that the average translocation velocity, vDC, is proportional to the applied force. Because each base of DNA carries a net charge, a force to induce translocation through a pore can easily be applied by imposing an electric field across the pore. It is therefore relatively straightforward to arrange for DNA to pass through a nanopore at any desired average velocity up to a limit that depends on the maximum allowable applied voltage, the effective friction of the pore, and the breaking force of the DNA. Similarly, the properties of various available approaches to measure the signal of an individual (or small number of) DNA bases are relatively well known and the duration of each individual measurement, tm, can be set over a range that is limited by the inherent signal to noise ratio (SNR) of the approach. In the work that has been done to date, vDC and tm have been analyzed and preferred values postulated only in light of the signal amplitude problem (SAP) and large scale issues such as the overall total time required to sequence a human genome.
The present invention was premised on recognizing, and establishing a path to reduce, the diffusion-driven motion of DNA in at least one system of significant technological relevance for sequencing. To this end, it has been determined that the rate of passage of DNA through an αHL protein pore can be reduced by orders of magnitude by methods that can be used singly or in combination with each other. For example, mutating αHL or adding an internal adapter to reduce its internal dimensions will increase the energy barrier E, resulting in a reduction in the diffusion rate D. Similarly, there are indications that increasing the electrolyte concentration and adding glycerol to a solution containing DNA can reduce the average translocation velocity vDC, suggesting an increase in E and a reduction in D. Finally, the inventors of the present invention have explicitly shown that the diffusion rate of DNA in αHL can be reduced by a factor of over 100 by cooling the electrolyte from 20° C. to −5° C. In one preferred embodiment of the present invention, an αHL-based measurement apparatus and protocol are provided to reduce diffusional motion of the target polymer 18. As will become more fully evident below, one or more of the above methods can be applied to other potential sequencing methods that share common features.
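Combining the Arrhenius form D = D0 e^{-E/kT} with the measured cooling result gives a rough, lower-bound estimate of the activation energy for DNA in αHL. The sketch below assumes D0 is temperature independent and attributes the whole reduction in D to the exponential factor, which also lumps in any temperature dependence of the electrolyte viscosity; the function name is hypothetical.

```python
import math

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def activation_energy_ev(d_ratio, t_warm_c, t_cold_c):
    """Activation energy E (eV) implied by D(warm)/D(cold) = d_ratio, assuming
    D = D0 * exp(-E / kT) with a temperature-independent D0."""
    t_warm = t_warm_c + 273.15  # convert to kelvin
    t_cold = t_cold_c + 273.15
    return K_B_EV_PER_K * math.log(d_ratio) / (1.0 / t_cold - 1.0 / t_warm)


# Reduction by a factor of 100 on cooling from 20 C to -5 C
e_min = activation_energy_ev(100.0, 20.0, -5.0)
print(f"E >= {e_min:.2f} eV")
```

For a factor of 100 this gives roughly 1.25 eV; because the measured factor exceeds 100, E is at least this large under the stated assumptions.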
A detailed projection of the relationship between diffusion constant and two principal types of sequencing error is given in
It is important to note that the analysis summarized in
As indicated, the SOP can be reduced by reducing the time used to measure each base. A tm of 1 μs would produce a D value (at 15° C. in αHL) of 0.125 bases²/measurement, giving an error for the two components plotted in
A preferable approach is to reduce diffusion to the greatest feasible extent and then to optimize the system based on its resulting properties. The example of
However, as vDC is changed, the average number of measurements per base, N, changes. As N changes, the mean aggregate SNR of the measurement of an individual base, and so the MIER, will also change.
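A sketch of how N and the mean aggregate SNR move with vDC at fixed tm, in the spirit of the FIG. 5 description. It assumes that a base remains in the sensing region for 1/vDC on average, so that N = 1/(vDC·tm), and that the noise is frequency independent, so averaging N independent measurements raises the SNR by a factor of N^{1/2}. The numerical inputs and function names are hypothetical.

```python
import math


def measurements_per_base(v_dc_bases_per_us, t_m_us):
    """Average number N of measurements taken while one base is in the sensing
    region: mean dwell time per base (1 / v_DC) divided by the measurement time t_m."""
    return 1.0 / (v_dc_bases_per_us * t_m_us)


def aggregate_snr(snr_single, n):
    """Mean aggregate SNR after averaging n independent measurements of one base,
    assuming frequency-independent (white) noise, so SNR grows as sqrt(n)."""
    return snr_single * math.sqrt(n)


t_m = 1.0         # hypothetical measurement time, us
snr_single = 1.5  # hypothetical single-measurement SNR
for v_dc in (0.01, 0.05, 0.1, 0.5):
    n = measurements_per_base(v_dc, t_m)
    print(f"v_DC={v_dc:5.2f} bases/us  N={n:6.1f}  "
          f"aggregate SNR={aggregate_snr(snr_single, n):5.2f}")
```

Slowing the polymer raises N and hence the aggregate SNR (lowering the MIER), but it also lengthens the time each base spends exposed to diffusion, which is the tension the balancing algorithm resolves.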
As discussed, the SNR of the measurement device determines the error rate in distinguishing one monomer from the others. This is the signal amplitude problem, and the precise relationship between measurement device SNR and MIER depends on the specific technology used by the measurement device and on the physical properties of the monomer that produce the measured signal. However, regardless of the exact functional relationship, it is clear from
With particular reference to
Step 2 fundamentally addresses the SOP. Even if the SAP could be reduced to zero, or effectively zero in terms of the errors in distinguishing individual bases, by appropriate design of the measurement device and appropriate setting of vDC and tm, sequencing may still be impossible because diffusion randomizes the positions of the bases. Thus, it is essential that the method and apparatus used to sequence the polymer be configured to take into account the contribution of diffusion to polymer motion. A number of methods may be used to reduce the diffusion constant of a polymer in solution, including: reducing the temperature of the solution; adding an agent, such as glycerol, to increase viscosity; changing the ionic concentration of the electrolyte; and adding functional groups to the pore and/or adducts to the DNA that increase the effective friction through the pore. Additionally, secondary molecules can be placed within the pore to reduce the diffusional motion of a polymer traveling through it. For example, with respect to measurement device 1, temperature stage 14 may be used to cool first and second electrolyte solutions 6 and 8, wherein electrolyte solutions 6 and 8 have an increased ionic concentration and a higher viscosity due to glycerol. Further, orifice 17 is preferably a protein pore mutated or chemically altered to increase the effective friction of polymer 18 through orifice 17, and may include a secondary or adaptor molecule (not shown) to decrease the internal diameter of orifice 17. The method or combination of methods used will depend on the type of measurement approach chosen in Step 1. Once the apparatus is constructed, the diffusion parameters can be quantified, for the type and length of polymer to be sequenced, by methods known to those skilled in the art.
In Step 3, major system parameters, such as vDC and tm, are selected to jointly optimize the SOER and the MIER. In accordance with the invention, the innovation of controlling polymer diffusion is combined with the inherent trade-offs in the performance of the base identification approach in an algorithm to minimize the combination of the SOER and the MIER. The basic structure of a preferred algorithm is summarized in
In the analysis of the SOER summarized in
Most likely, for the initial value of the average total number of data points per base, the SOER and MIER will not be identical, and one will dominate the other. In that case, a new value of tm is chosen and the process is repeated as shown in
Alternatively, as depicted in
As before, the MIER and resulting SOER are then compared. In this latter case, if MIER > SOER, the product of tm and N is increased and the algorithm is repeated; if MIER < SOER, the product of tm and N is reduced and the algorithm is repeated. Once the product of tm and N has been set so that the combination of the SOER and MIER is balanced at an acceptable value, tm should be made as small as possible consistent with the engineering and cost limitations of acquiring the data very quickly. The smaller tm is, the higher the time resolution available to capture signals from bases that remain in the pore only briefly because of random, diffusion-driven motion.
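The balancing rule just described (increase tm·N while MIER > SOER, decrease it while MIER < SOER) can be sketched as a bisection on the dwell time per base, tm·N. The error models here are toy assumptions, not ones given in this text: SOER is taken as the probability that Gaussian diffusion (variance 2Dt) moves the polymer more than half a base during the dwell time, and MIER as the symbol error rate for four equally spaced signal levels in white Gaussian noise whose aggregate SNR grows as the square root of the dwell time. D = 0.00125 bases²/μs (100 times below the 15° C. αHL value quoted in this text), the 1 μs SNR of 1.0 and all function names are hypothetical.

```python
import math


def soer(dwell_us, d_bases2_per_us, tolerance_bases=0.5):
    """Toy SOER: probability that 1-D diffusion (variance 2*D*t) displaces the
    polymer by more than tolerance_bases while one base is being recorded."""
    return math.erfc(tolerance_bases / (2.0 * math.sqrt(d_bases2_per_us * dwell_us)))


def mier(dwell_us, snr_1us, levels=4):
    """Toy MIER: symbol error for `levels` equally spaced signal levels in white
    Gaussian noise; aggregate SNR (spacing / noise std) grows as sqrt(dwell)."""
    snr = snr_1us * math.sqrt(dwell_us)
    # 2(1 - 1/M) * Q(snr / 2), with Q(x) = 0.5 * erfc(x / sqrt(2))
    return (1.0 - 1.0 / levels) * math.erfc(snr / (2.0 * math.sqrt(2.0)))


def balance_dwell(d_bases2_per_us, snr_1us, lo=1e-2, hi=1e4, iters=60):
    """Geometric bisection on tm*N (in us) until the toy SOER and MIER balance."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if mier(mid, snr_1us) > soer(mid, d_bases2_per_us):
            lo = mid  # identification errors dominate: increase tm*N
        else:
            hi = mid  # order errors dominate: decrease tm*N
    return math.sqrt(lo * hi)


dwell = balance_dwell(d_bases2_per_us=0.00125, snr_1us=1.0)
print(f"tm*N ~ {dwell:.1f} us, SOER ~ {soer(dwell, 0.00125):.2e}, "
      f"MIER ~ {mier(dwell, 1.0):.2e}")
```

Under these assumptions the balance falls at a dwell of roughly 19 μs per base. Consistent with the text, tm would then be made as small as practical, with N = (tm·N)/tm and vDC of about 1/(tm·N).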
As can be seen by comparing the first algorithm depicted in
These three algorithms are given as examples of the overall process of varying the system parameters tm, N and vDC in order to reduce the total sequence error rate, and are not meant to be limiting in their specific embodiments. In all cases, the average time the system is expected to remain recording one specific base is used in combination with the statistics of diffusion to calculate the SOER.
Generally, the goal is to reduce diffusion as much as practically possible. However, depending on the physical properties of the measurement device, the modifications made to reduce diffusion (e.g., cooling the electrolyte) may directly alter the SNR measured for each base. In this case, the balance between SOER and MIER will involve multiple adjustable parameters. The final system setting will be a synergistic combination of these two or more parameters and a clear optimum setting may not exist, but rather a broad range of possible operating conditions will be applicable. Nevertheless, regardless of the complexity of the balancing condition, a trade-off between the SOER and the MIER is required for a practical sequencing system.
The means for calculating measurement device parameters to jointly balance SOER and MIER may be in the form of a computer 50, or may comprise standard iterative manual calculation methods. For example, as depicted in
Advantageously, the present invention addresses not only the SOP of a system but the SAP as well, and provides a system and method for balancing a measurement device in such a way that synergistic results are obtained, allowing unprecedented sensitivity and single-nucleotide sequencing. Although described with reference to a preferred embodiment of the invention, it should be readily understood that various changes and/or modifications can be made to the invention without departing from the spirit thereof. In general, the invention is only intended to be limited by the scope of the following claims.
The present application represents a continuation of U.S. patent application Ser. No. 12/395,682 entitled “System and Method to Improve the Accuracy of Sequencing a Polymer” filed Mar. 1, 2009 which claims the benefit of U.S. Provisional Patent Application Ser. No. 61/032,318 entitled “System and Method to Improve Sequencing Accuracy of a Polymer” filed Feb. 28, 2008.
The U.S. Government has a paid-up license in this invention and the right in limited circumstances to require the patent owner to license others on reasonable terms as provided for by the terms of Grant No. 1R43HG004466-01 awarded by the National Institutes of Health and under Grant No. FA9550-06-C-0006 awarded by the U.S. Air Force Office of Scientific Research.
Number | Date | Country
---|---|---
61032318 | Feb 2008 | US

Relation | Number | Date | Country
---|---|---|---
Parent | 12395682 | Mar 2009 | US
Child | 13538537 | | US