This invention relates, generally, to methods of species categorization. More specifically, it relates to methods of determining DNA barcodes for efficient species categorization without relying on traditional chemical-based DNA sequencing of lengthy sections of nucleotides.
The methods of DNA sequencing and species categorization provide essential insight into basic biological research for studied species, and help describe relationships between different species. The knowledge gained through DNA sequencing techniques is useful across a broad scientific spectrum, such as in conserving biodiversity [1], estimating phyletic diversity, identifying disease vectors [2], authenticating herbal products [3], unambiguously labeling food products [4, 5], and protecting endangered species [1]. Rather than sequencing an entire DNA strand, researchers found that DNA barcodes could be derived from a targeted gene, and that these barcodes yield accurate species identifications. A DNA barcode consists of a short strand of DNA sequence taken from a targeted gene, such as the COI or cox I gene (Cytochrome C Oxidase 1) [6] present in the mitochondrial genome of animals. As such, during the early 21st century, DNA sequencing techniques dramatically improved as quicker categorizations were possible based on these DNA barcodes.
To determine the DNA barcode, traditional sequencing methods based on chemical analyses are widely used in the biological community. Recently, nanopore-based sequencing methods [7] have been explored in a dual nanopore system for cost-effective, high-throughput, chemical-free, and real-time barcode generation. Dual nanopore systems determine DNA barcodes by scanning a captured dsDNA (double-stranded DNA) multiple times as the strand passes through both pores of the dual nanopore system, applying a net periodic bias across the two pores. However, such a system relies on the accurate calculation of dwell time or time of flight (hereinafter “TOF”) of the barcodes (e.g., tags) using the current blockage information from individual nanopores. As the tags are heavier and bulkier in nature, they produce significant current blockage (e.g., increased dwell time) compared to the normal nucleotide monomers. The disparate velocity of tags and monomers within a segment leads to an over/underestimation of the distance between sequential tags if only dwell or TOF velocity information is used.
Accordingly, what is needed is an improved method of DNA barcoding to efficiently categorize species without suffering from over/underestimation problems related to the distance between measured tags within the DNA sequence. However, in view of the art considered as a whole at the time the present invention was made, it was not obvious to those of ordinary skill in the field of this invention how the shortcomings of the prior art could be overcome.
While certain aspects of conventional technologies have been discussed to facilitate disclosure of the invention, Applicant in no way disclaims these technical aspects, and it is contemplated that the claimed invention may encompass one or more of the conventional technical aspects discussed herein.
The present invention may address one or more of the problems and deficiencies of the prior art discussed above. However, it is contemplated that the invention may prove useful in addressing other problems and deficiencies in a number of technical areas. Therefore, the claimed invention should not necessarily be construed as limited to addressing any of the particular problems or deficiencies discussed herein.
In this specification, where a document, act or item of knowledge is referred to or discussed, this reference or discussion is not an admission that the document, act or item of knowledge or any combination thereof was at the priority date, publicly available, known to the public, part of common general knowledge, or otherwise constitutes prior art under the applicable statutory provisions; or is known to be relevant to an attempt to solve any problem with which this specification is concerned.
The long-standing but heretofore unfulfilled need for a method of categorizing a species associated with a segment of double-stranded DNA is now met by a new, useful, and nonobvious invention.
The novel method includes a step of passing a segment of double-stranded DNA through a singular cylindrical nanopore formed within a test chamber. The segment of double-stranded DNA may include a plurality of monomers, a first protein tag, and/or a subsequent protein tag. Each of the plurality of monomers and each protein tag may have an equal size, shape, and/or volume. In an embodiment, the test chamber may also include at least two opposing longitudinal walls joined together by at least two opposing lateral walls, such that the singular cylindrical nanopore may be formed between the two opposing longitudinal walls, such that a central axis of the singular cylindrical nanopore may be parallel to each of the at least two opposing lateral walls. In an embodiment, the singular cylindrical nanopore may comprise an associated diameter of 2σ, where σ is a diameter of each of the plurality of monomers, the first protein tag, and/or the subsequent protein tag.
In some embodiments, the method may also include a step of calculating an average scanning velocity of the segment of double-stranded DNA by dividing a length of the segment of double-stranded DNA by an average scanning time for the double-stranded DNA taken over multiple scans. In these embodiments, the method comprises a step of retaining at least a portion of the segment of double-stranded DNA within the singular cylindrical nanopore throughout each of the multiple scans.
In some embodiments, the method may include a step of passing the segment of double-stranded DNA through the singular cylindrical nanopore in an opposing direction and/or repeating the steps of calculating the average scanning velocity, calculating the estimated distance between the first protein tag and/or the subsequent protein tag, calculating the estimated number of monomers of the plurality of monomers, calculating the weighted velocity of the segment of double-stranded DNA, and/or calculating the distance between the first protein tag and the subsequent protein tag. In addition, a bias voltage may be applied to the test chamber in a reverse direction prior to passing the segment of double-stranded DNA through the singular cylindrical nanopore in the opposing direction.
An estimated distance between a first protein tag and a subsequent protein tag of the segment of double-stranded DNA may be calculated by measuring, for the first protein tag and/or the subsequent protein tag, a dwell time and/or a dwell velocity based on an entry time into the singular cylindrical nanopore and/or an exit time from the singular cylindrical nanopore. In some embodiments, using the estimated distance between the first protein tag and the subsequent protein tag, an estimated number of monomers of the plurality of monomers that are disposed between the first protein tag and the subsequent protein tag may also be calculated.
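The entry-time and exit-time bookkeeping described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the claimed implementation: the `PoreEvent` record, the `pore_thickness` parameter, and the convention that the dwell velocity equals the pore thickness divided by the dwell time are all introduced here for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoreEvent:
    """Entry and exit times of one tag (hypothetical record type)."""
    index: int      # position of the tag along the chain
    t_entry: float  # time at which the tag enters the nanopore
    t_exit: float   # time at which the tag leaves the nanopore


def dwell_time(event: PoreEvent) -> float:
    """Dwell time W(m): exit time minus entry time."""
    w = event.t_exit - event.t_entry
    if w <= 0:
        raise ValueError("exit time must be later than entry time")
    return w


def dwell_velocity(event: PoreEvent, pore_thickness: float) -> float:
    """Assumed convention: pore thickness divided by dwell time."""
    return pore_thickness / dwell_time(event)


tag_m = PoreEvent(index=3, t_entry=10.0, t_exit=14.0)
print(dwell_time(tag_m))           # 4.0
print(dwell_velocity(tag_m, 2.0))  # 0.5
```

A heavier or bulkier tag would show up here as a longer dwell time and therefore a smaller dwell velocity than a plain monomer.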
In some embodiments, a weighted velocity of the segment of double-stranded DNA may be calculated using the dwell velocity for each of the first protein tag and/or the subsequent protein tag, the average scanning velocity of the segment of double-stranded DNA, and/or the estimated number of monomers. In these embodiments, the weighted velocity of the segment of double-stranded DNA may be calculated using
where $v_{\mathrm{weight}}^{U\to D}$ may represent the weighted velocity in a downward direction through the singular cylindrical nanopore, $N_{mn}$ may represent the estimated number of monomers of the plurality of monomers, $v_{\mathrm{dwell}}^{U\to D}(m)$ may represent the dwell velocity of the first protein tag in the downward direction through the singular cylindrical nanopore, $v_{\mathrm{dwell}}^{U\to D}(n)$ may represent the dwell velocity of the subsequent protein tag in the downward direction through the singular cylindrical nanopore, and/or
In some embodiments, the method may further comprise a step of passing the segment of double-stranded DNA through the singular cylindrical nanopore in an opposing direction. As such, the weighted velocity of the segment of double-stranded DNA in an upward direction through the singular cylindrical nanopore may be calculated using
Moreover, in some embodiments, the method may comprise a step of calculating the distance between the first protein tag and the subsequent protein tag by multiplying the weighted velocity of the segment of double-stranded DNA by a time delay between the entry time of the first protein tag and the entry time of the subsequent protein tag. In these embodiments, the steps of calculating a distance between sequential protein tags may be repeated for a plurality of protein tags within the segment of double-stranded DNA.
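The weighted-velocity expression itself is not reproduced in this text. The following is one plausible form, offered only as a sketch consistent with the quantities named in this summary (the two tag dwell velocities, the average scanning velocity, and the estimated monomer count). The symbols $\bar{v}_{\mathrm{scan}}$ for the average scanning velocity and $\tau_{mn}^{U\to D}$ for the entry-time delay, as well as the $(N_{mn}-2)$ weighting of the non-tag monomers, are assumptions.

```latex
v_{\mathrm{weight}}^{U\to D}
  = \frac{1}{N_{mn}}
    \left[ v_{\mathrm{dwell}}^{U\to D}(m) + v_{\mathrm{dwell}}^{U\to D}(n)
           + \left(N_{mn}-2\right)\bar{v}_{\mathrm{scan}} \right],
\qquad
d_{mn}^{U\to D} = v_{\mathrm{weight}}^{U\to D}\,\tau_{mn}^{U\to D}
```

Under this form, two closely spaced tags (small $N_{mn}$) are dominated by the slow dwell velocities, while widely spaced tags approach the average scanning velocity.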
In some embodiments, the novel method may also include a step of applying a first voltage to a first side of a test chamber that defines a singular nanopore therethrough. In these embodiments, based on the applied first voltage, the method may further comprise a step of passing the segment of double-stranded DNA through the first side of the singular nanopore defined by the test chamber. In this manner, the method may further comprise a step of applying a second voltage to a second side of the test chamber, with the second side of the test chamber being opposite the first side of the test chamber, such that a bias voltage applied to the test chamber may reverse. As such, based on the applied second voltage, the method may comprise a step of passing the segment of double-stranded DNA through the second side of the singular nanopore in a direction toward the first side of the test chamber.
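The bias reversal described above can be illustrated with a small Python sketch. It assumes, following the detailed description, that the bias is reversed whenever an end tag is detected at the nanopore. The function name `track_bias`, the direction labels, and the event list are hypothetical and exist only for this example.

```python
from typing import Iterable, List, Tuple

DOWN = "U->D"  # chain translocating in the downward direction
UP = "D->U"    # chain translocating in the upward direction


def track_bias(detected_tags: Iterable[int],
               end_tags: Tuple[int, int],
               start: str = DOWN) -> List[Tuple[int, str]]:
    """Return (tag, bias direction in force after that tag) pairs."""
    direction = start
    history = []
    for tag in detected_tags:
        # Reverse the bias when either end tag reaches the pore,
        # so that the chain is scanned again instead of escaping.
        if tag in end_tags:
            direction = UP if direction == DOWN else DOWN
        history.append((tag, direction))
    return history


# Tags detected in order during one downward and one upward scan.
events = [2, 3, 4, 5, 6, 7, 8, 7, 6, 5, 4, 3, 2, 1, 2]
for tag, direction in track_bias(events, end_tags=(1, 8)):
    print(tag, direction)
```

In this toy run the direction switches to `D->U` when tag 8 is detected and back to `U->D` when tag 1 is detected, which is the repeated-scan behavior the paragraph describes.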
In some embodiments, the novel method may further comprise the step of calculating the distance between the first protein tag and the subsequent protein tag for each of a plurality of protein tags on a segment of dsDNA. In some embodiments, the method may also include a step of generating a barcode for the segment of double-stranded DNA by arranging the plurality of protein tags of the segment of double-stranded DNA in sequential order.
An object of the invention is to provide efficient and accurate methods of calculating distances between sequential protein tags of a double-stranded DNA, thereby providing for efficient categorization of species based on the calculated DNA barcode.
Additional aspects and advantages of the present disclosure will become readily apparent to those skilled in this art from the following detailed description, wherein only illustrative embodiments of the present disclosure are shown and described. As will be realized, the present disclosure is capable of other and different embodiments, and its several details are capable of modifications in various obvious respects, all without departing from the disclosure. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not restrictive.
The invention accordingly comprises the features of construction, combination of elements, and arrangement of parts that will be exemplified in the disclosure set forth hereinafter and the scope of the invention will be indicated in the claims.
For a fuller understanding of the invention, reference should be made to the following detailed description, taken in connection with the accompanying drawings, in which:
In the following detailed description of the preferred embodiments, reference is made to the accompanying drawings, which form a part thereof, and within which are shown by way of illustration specific embodiments by which the invention may be practiced. It is to be understood that one skilled in the art will recognize that other embodiments may be utilized, and it will be apparent to one skilled in the art that structural changes may be made without departing from the scope of the invention.
As such, elements/components shown in diagrams are illustrative of exemplary embodiments of the disclosure and are meant to avoid obscuring the disclosure. Any headings, used herein, are for organizational purposes only and shall not be used to limit the scope of the description or the claims.
Furthermore, the use of certain terms in various places in the specification, described herein, is for illustration and should not be construed as limiting. For example, any reference to an element herein using a designation such as “first,” “second,” and so forth does not limit the quantity or order of those elements, unless such limitation is explicitly stated. Rather, these designations may be used herein as a convenient method of distinguishing between two or more elements or instances of an element. Therefore, a reference to first and/or second elements does not mean that only two elements may be employed there or that the first element must precede the second element in some manner. Also, unless stated otherwise, a set of elements may comprise one or more elements.
Reference in the specification to “one embodiment,” “preferred embodiment,” “an embodiment,” or “embodiments” means that a particular feature, structure, characteristic, or function described in connection with the embodiment is included in at least one embodiment of the disclosure and may be in more than one embodiment. The appearances of the phrases “in one embodiment,” “in an embodiment,” “in embodiments,” “in alternative embodiments,” “in an alternative embodiment,” or “in some embodiments” in various places in the specification are not necessarily all referring to the same embodiment or embodiments. The terms “include,” “including,” “comprise,” and “comprising” shall be understood to be open terms and any lists that follow are examples and not meant to be limited to the listed items.
Referring in general to the following description and accompanying drawings, various embodiments of the present disclosure are illustrated to show its structure and method of operation. Common elements of the illustrated embodiments may be designated with similar reference numerals.
Accordingly, the relevant descriptions of such features apply equally to the features and related components among all the drawings. For example, any suitable combination of the features, and variations of the same, described with components illustrated in
As used in this specification and the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the content clearly dictates otherwise. As used in this specification and the appended claims, the term “or” is generally employed in its sense including “and/or” unless the context clearly dictates otherwise.
In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of embodiments of the present technology. It will be apparent, however, to one skilled in the art that embodiments of the present technology may be practiced without some of these specific details. The techniques introduced here can be embodied as special-purpose hardware (e.g., circuitry), as programmable circuitry appropriately programmed with software and/or firmware, or as a combination of special-purpose and programmable circuitry. Hence, embodiments may include a machine-readable medium having stored thereon instructions which may be used to program a computer (or other electronic devices) to perform a process. The machine-readable medium may include, but is not limited to, floppy diskettes, optical disks, compact disc read-only memories (CD-ROMs), magneto-optical disks, ROMs, random access memories (RAMs), erasable programmable read-only memories (EPROMs), electrically erasable programmable read-only memories (EEPROMs), magnetic or optical cards, flash memory, or other type of media/machine-readable medium suitable for storing electronic instructions.
As used herein, the term “communicatively coupled” refers to any coupling mechanism known in the art, such that at least one electrical signal may be transmitted between one device and one alternative device. Communicatively coupled may refer to Wi-Fi, Bluetooth, wired connections, wireless connection, and/or magnets. For ease of reference, the exemplary embodiment described herein refers to Wi-Fi and/or Bluetooth, but this description should not be interpreted as exclusionary of other electrical coupling mechanisms.
As used herein, the terms “about,” “approximately,” or “roughly” refer to being within an acceptable error range (i.e., tolerance) for the particular value as determined by one of ordinary skill in the art, which will depend in part on how the value is measured or determined (e.g., the limitations of a measurement system, or the degree of precision required for a particular purpose, such as determining DNA barcodes for efficient species categorization without relying on traditional chemical-based DNA sequencing of lengthy sections of nucleotides). As used herein, “about,” “approximately,” or “roughly” refer to within ±25% of the numerical value.
All numerical designations, including ranges, are approximations which are varied up or down by increments of 1.0, 0.1, 0.01 or 0.001 as appropriate. It is to be understood, even if it is not always explicitly stated, that all numerical designations are preceded by the term “about”. It is also to be understood, even if it is not always explicitly stated, that the compounds and structures described herein are merely exemplary and that equivalents of such are known in the art and can be substituted for the compounds and structures explicitly stated herein.
Wherever the term “at least,” “greater than,” or “greater than or equal to” precedes the first numerical value in a series of two or more numerical values, the term “at least,” “greater than” or “greater than or equal to” applies to each of the numerical values in that series of numerical values. For example, greater than or equal to 1, 2, or 3 is equivalent to greater than or equal to 1, greater than or equal to 2, or greater than or equal to 3.
Wherever the term “no more than,” “less than,” or “less than or equal to” precedes the first numerical value in a series of two or more numerical values, the term “no more than,” “less than” or “less than or equal to” applies to each of the numerical values in that series of numerical values. For example, less than or equal to 1, 2, or 3 is equivalent to less than or equal to 1, less than or equal to 2, or less than or equal to 3.
The present invention pertains to methods of accurately determining DNA barcodes using a cylindrical nanopore as opposed to a dual nanopore architecture. The methods of the present invention explain the underestimation of DNA tags caused by the fast-moving nucleotides in between the barcodes of a strand using tension propagation theory [8]. Instead, the methods described herein, schematic and graphical diagrams of which are shown in
As shown in particular in
In addition, still referring to
Similar to a double nanopore setup, in an embodiment, a periodic variation in the differential bias may be applied at the single cylindrical nanopore 16 to scan the co-captured DNA multiple times. The force bias direction may be reversed when either of the end tags is detected at the nanopore 16, which prevents the DNA chain from escaping and keeps it captured in the nanopore 16 for a long time. As such, in some embodiments, (e.g., the embodiment shown in
As shown in
Moreover, as shown in
As described above, entry time $t_i(m)$ and/or exit time $t_f(m)$ of each tag 22 and/or monomer with index $m$ may be recorded as the monomer/tag passes through the nanopore 16 membrane during each scan event, resulting in a calculation of the dwell time $W(m)$. As shown in
Where $t_i^{U\to D}(m)$ and $t_f^{U\to D}(m)$ may represent the arrival and exit times of a monomer with index $m$ through nanopore 16 traveling in a downward direction, as shown in
The presence of tags with heavier mass ($m_{\mathrm{tag}} > m_{\mathrm{bulk}}$) and/or larger solvent friction ($\gamma_{\mathrm{tag}} > \gamma_{\mathrm{bulk}}$) may introduce a large variation in the dwell time and, hence, a large variation in the dwell velocities of the dsDNA monomers and/or tags, as shown in
As shown in
If the dsDNA were a rigid rod, then the barcode distance ($d_{mn}^{U\to D}$) between tags $T_m$ and $T_n$ may be calculated by:
Equations 3a-3c apply to U→D translocation; the same set of equations may be derived for D→U translocation by interchanging the indices U and D. Equations 3a-3c may provide the shortest distance between the tags, but not necessarily the contour length, or the actual distance, between the tags. As such, this calculation is likely to underestimate the barcode distances.
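Because Equations 3a-3c are not reproduced in this text, the rigid-rod estimate can only be sketched under an assumption. The Python example below assumes the segment velocity is the mean of the two tag dwell velocities and that the time delay is the difference between the tag entry times. The function name `rigid_rod_distance` and the numerical values are hypothetical.

```python
def rigid_rod_distance(v_dwell_m: float, v_dwell_n: float,
                       t_entry_m: float, t_entry_n: float) -> float:
    """Rigid-rod barcode distance between tags T_m and T_n.

    Assumed form: mean of the two tag dwell velocities multiplied by
    the delay between the entry times of the two tags.
    """
    v_mn = 0.5 * (v_dwell_m + v_dwell_n)  # segment velocity from tags only
    tau_mn = abs(t_entry_n - t_entry_m)   # time delay between the two tags
    return v_mn * tau_mn


print(rigid_rod_distance(0.5, 0.25, 10.0, 110.0))  # 37.5
```

Since only the slow tag velocities enter this estimate, any faster motion of the monomers between the tags is ignored, which is why the result tends to fall short of the contour length.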
Unlike for a rigid rod, tension propagation is important in the motion of the semi-flexible dsDNA chain in the presence of an external bias force, as the motion of the dsDNA sub-chain on the cis side decouples into two domains [8, 9]. In an embodiment, as the dsDNA travels through the nanopore 16, after the tag 22 $T_m$ translates through the nanopore 16, the preceding monomers may be quickly dragged into the nanopore 16 by the tension front of the dsDNA, similar to an uncoiling effect of a rope pulled from one end. As such, faster motion may occur as the monomer strand translates through the nanopore 16, hitting a maximum at the subsequent tag 22 $T_{m\pm 1}$ with greater inertia and/or viscous drag. In this embodiment, at this tension propagation time, the faster motion of the monomers (e.g., shown in
Accordingly, in an embodiment, a first improved method for accurately determining tag 22 locations, without underestimations, may include measuring a barcode from known end-to-end tag 22 distances. By adding additional tags 22 disposed at the approximate ends of a dsDNA chain or by considering two end tags 22 ($T_1$ and $T_8$, with a distance therebetween being defined as $d_{18} \simeq L$), an average velocity for the dsDNA chain may be calculated by:
Where $\tau_{18}^{U\to D}$ may represent the time delay of arrival for tags 22 $T_1$ and/or $T_8$ at the nanopore 16 for the U→D scan direction. The barcode distance between tags 22 $T_m$ and/or $T_n$ may then be calculated by multiplying the time delay by the velocity $v_{18}^{U\to D}$:
The method is effective for estimating long-spaced barcodes; however, the method may be prone to overestimate barcode distances if multiple tags 22 are next to each other.
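The end-to-end approach can be sketched in Python as follows. The sketch follows the description above (chain length divided by the arrival delay of the two end tags, then multiplied by the delay between the tags of interest). The function names and the numerical values are hypothetical.

```python
def end_to_end_velocity(chain_length: float,
                        t_first_tag: float, t_last_tag: float) -> float:
    """Average chain velocity: end-to-end distance / arrival delay."""
    tau_ends = abs(t_last_tag - t_first_tag)
    if tau_ends == 0:
        raise ValueError("end tags must arrive at different times")
    return chain_length / tau_ends


def barcode_distance(v_average: float, t_m: float, t_n: float) -> float:
    """Distance between tags T_m and T_n at the average chain velocity."""
    return v_average * abs(t_n - t_m)


v_18 = end_to_end_velocity(1024.0, 0.0, 2048.0)
print(v_18)                                  # 0.5
print(barcode_distance(v_18, 100.0, 300.0))  # 100.0
```

A single chain-wide velocity is applied to every segment here, so a segment made up mostly of slow tags is assigned a velocity that is too high, consistent with the overestimation noted above for tags that sit next to each other.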
As such, in an embodiment, a second improved method including a two-step process may be employed to correct for overestimations using the average scan time for the entire chain, measured experimentally, to estimate the average velocity of the dsDNA chain. The scan length $L_{\mathrm{scan}}$ may be the maximum length up to which the dsDNA segment (e.g., including monomers and tags 22) remains captured inside nanopore 16 for scanning events. The scan length may denote the theoretical maximum beyond which the dsDNA will escape from the nanopore 16, $L \approx L_{\mathrm{scan}}$. The average scanning velocity from a number of repeated scans, such as 500 independent scans, may be calculated by Equation 6:
Where $\tau_{\mathrm{scan}}(i)$ may represent the scan time for the $i$-th event, $N_{\mathrm{scan}}$ may represent the number of scanning events, and/or the average chain velocity may be represented by $v_{\mathrm{chain}} \approx$
During the first step of the method, the barcode distance between $T_m$ and $T_n$ may be calculated using only tag velocities $v_{\mathrm{dwell}}(m)$ and $v_{\mathrm{dwell}}(n)$, using Equations 3a-3c. The estimated distance $d_{mn}$ may then be used to approximately calculate the number of monomers $N_{mn} = d_{mn}^{U\to D}/b_1$ present in a segment joining the two tags $T_m$ and $T_n$, with $b_1$ being the bond length. In the second step, the segment velocity may be re-calculated by accounting for the weighted velocity contributions from both the tags 22 and the non-tag monomers as:
The same set of equations for D→U direction may be obtained by interchanging U with D. The barcodes may be finally calculated by multiplying the weighted two-step velocity by the tag time delay as:
The two-step method may accurately capture barcode distances across the range of the dsDNA segment, independent of the proximity of the sequential tags. The underlying concept used in the single nanopore case may be equally applicable to other multi-nanopore systems which use the dwell time and time of flight velocities to measure the barcodes.
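The two-step procedure can be sketched in Python as shown below. Several parts are assumptions rather than the equations of this specification, which are not reproduced in the text: the average scanning velocity is taken as the scan length divided by the mean scan time, the first-pass distance uses the mean of the two tag dwell velocities, and the weighting gives the two tags their own dwell velocities while the remaining monomers move at the average scanning velocity. All function names and numbers are hypothetical.

```python
from typing import Sequence


def average_scan_velocity(scan_length: float,
                          scan_times: Sequence[float]) -> float:
    """Scan length divided by the mean scan time over repeated scans."""
    if not scan_times:
        raise ValueError("at least one scan time is required")
    mean_time = sum(scan_times) / len(scan_times)
    return scan_length / mean_time


def two_step_distance(v_dwell_m: float, v_dwell_n: float, tau_mn: float,
                      v_scan: float, bond_length: float) -> float:
    """Barcode distance between two tags from the two-step estimate."""
    # Step 1: first-pass distance from the tag dwell velocities only
    # (assumed here to be their mean times the time delay).
    d_first = 0.5 * (v_dwell_m + v_dwell_n) * tau_mn
    # Estimated number of monomers in the segment: distance / bond length.
    n_mn = max(2, round(d_first / bond_length))
    # Step 2 (assumed weighting): the two tags keep their dwell velocities,
    # the remaining n_mn - 2 monomers move at the average scan velocity.
    v_weight = (v_dwell_m + v_dwell_n + (n_mn - 2) * v_scan) / n_mn
    return v_weight * tau_mn


v_scan = average_scan_velocity(1024.0, [1000.0, 1024.0, 1048.0])
print(v_scan)                                            # 1.0
print(two_step_distance(0.25, 0.25, 256.0, v_scan, 1.0))  # 250.0
```

In this example the first-pass estimate alone would give a distance of 64, whereas the weighted velocity yields 250, illustrating how the second step compensates for the faster monomers between slow tags.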
To test the methods described herein, an in silico coarse-grained (CG) model of a dsDNA segment including 1,024 monomers interspersed with 8 barcodes at different distances shown in
By implementing the barcode determination method described above, utilizing an in-silico Brownian dynamics scheme on a model dsDNA with known locations of the barcodes, a broad distribution of DNA tags may be accurately identified for species classification without overestimation and/or underestimation issues. The method may include scanning the dsDNA through a cylindrical nanopore multiple times and/or using the dwell time data of the tags in conjunction with a weighted extrapolation scheme to calculate the average velocities of the chain segment in between two tags. Using one of the tags as a reference, the barcodes may be calculated by multiplying time delays between sequential tags by the corresponding segment velocities using Equation 6 and Equation 7.
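The final assembly step, in which one tag serves as the reference and each subsequent tag is placed by its segment distance, can be sketched in Python. The function name `barcode_positions` and the input values are hypothetical. The segment velocities are assumed to have been obtained beforehand, for example from a weighted-velocity calculation.

```python
from itertools import accumulate
from typing import List, Sequence


def barcode_positions(segment_velocities: Sequence[float],
                      time_delays: Sequence[float],
                      reference: float = 0.0) -> List[float]:
    """Tag positions relative to a reference tag.

    Each segment distance is the segment velocity multiplied by the time
    delay between the two sequential tags bounding that segment.
    """
    if len(segment_velocities) != len(time_delays):
        raise ValueError("one velocity is needed per time delay")
    distances = [v * tau for v, tau in zip(segment_velocities, time_delays)]
    # Running sum of segment distances gives each tag's position.
    return [reference] + [reference + d for d in accumulate(distances)]


print(barcode_positions([0.5, 1.0, 0.25], [100.0, 50.0, 200.0]))
# [0.0, 50.0, 100.0, 150.0]
```

The returned list is ordered along the chain, so it can serve directly as the sequential arrangement of tags that constitutes the barcode.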
The advantages set forth above, and those made apparent from the foregoing description, are efficiently attained. Since certain changes may be made in the above construction without departing from the scope of the invention, it is intended that all matters contained in the foregoing description or shown in the accompanying drawings shall be interpreted as illustrative and not in a limiting sense.
All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference. To the extent publications and patents or patent applications incorporated by reference contradict the disclosure contained in the specification, the specification is intended to supersede and/or take precedence over any such contradictory material.
It is also to be understood that the following claims are intended to cover all of the generic and specific features of the invention herein described, and all statements of the scope of the invention which, as a matter of language, might be said to fall therebetween.
This nonprovisional patent application is a continuation of and claims the benefit of U.S. Nonprovisional patent application Ser. No. 17/649,577 entitled “METHODS OF DETERMINING DNA BARCODES FOR EFFICIENT SPECIES CATEGORIZATION USING NANOPORE TRANSLOCATION” filed Feb. 1, 2022 by the same inventors, which claims the benefit of and priority to U.S. Provisional Patent Application No. 63/199,898 entitled “METHODS OF DETERMINING DNA BARCODES FOR EFFICIENT SPECIES CATEGORIZATION USING NANOPORE TRANSLOCATION” filed Feb. 1, 2021 by the same inventors, all of which are incorporated herein by reference, in their entireties, for all purposes.
This invention was made with Government support under 1R21HG011236-01 as awarded by the National Human Genome Research Institute at the National Institutes of Health. The government has certain rights in the invention.
Number | Date | Country
---|---|---
63199898 | Feb 2021 | US

Relation | Number | Date | Country
---|---|---|---
Parent | 17649577 | Feb 2022 | US
Child | 18787273 | | US