Using DNA for storing data is an emerging technology.
Traditional biological DNA data storage is limited to four states; the state values are represented by the nucleotide present: (A) adenine, (C) cytosine, (G) guanine, or (T) thymine. A data storage bit is represented by one nucleotide on one half (single strand) of the DNA double strand; the other half of the DNA strand has the complementary nucleotide, which offers redundancy but not extra data capacity.
This disclosure provides methodology that massively increases the amount of data that can be stored on DNA, with the theoretical storage limit exceeding 1 binary bit per atom. Particularly, this disclosure provides methodologies that utilize isotopes in natural nucleotides, synthetic nucleotides and other nucleotides for data storage. The nucleotides, and thus the data they encode, can be read, e.g., by spectroscopy such as Surface-Enhanced Raman Spectroscopy (SERS). Other molecules, in addition to synthetic or natural nucleotides, can be similarly used. In some implementations, any of the nucleotides or molecules can be modified with at least one isotope of at least one of H, C, N or O.
This disclosure provides, in one particular implementation, a method of storing data on a DNA strand. The method includes providing a DNA strand having at least one isotope-modified nucleotide comprising at least one isotope of carbon, nitrogen, oxygen or hydrogen, and assigning a bit pattern to the at least one isotope-modified nucleotide that is different than a bit pattern assigned to a non-isotope-modified nucleotide. The nucleotide can be a natural nucleotide, a synthetic nucleotide, or an otherwise-modified nucleotide (e.g., with a different atom in a cyclic position or a ligated ion or atom).
A similar method can be utilized for storing data on any molecule, crystal, or other material that can be isotope-modified in such a way that physical or logical order is maintained.
This disclosure provides, in another particular implementation, a DNA strand or an RNA strand encoding data, the DNA or RNA strand having at least one nucleotide having a first bit pattern assigned thereto, and at least one modified nucleotide having a second bit pattern assigned thereto different than the first bit pattern. The nucleotide can be a natural nucleotide or a synthetic nucleotide. The modified nucleotide may be isotope-modified, comprising at least one isotope of one of carbon, nitrogen, oxygen or hydrogen, or may be an otherwise-modified nucleotide (e.g., with a different atom in a cyclic position or a ligated ion or atom).
This disclosure also provides, in another particular implementation, a system for data storage on a DNA strand. The system includes a plurality of isotope-modified nucleotides, each isotope-modified nucleotide comprising at least one isotope, and each isotope-modified nucleotide having a number of possible states. The number of possible states is defined by $a^{N_a} \cdot b^{N_b} \cdot c^{N_c} \cdots z^{N_z}$, where a, b, c . . . z is the number of isotopes available for a given atom, and $N_a$, $N_b$, $N_c$ . . . $N_z$ is the number of atoms of type a, b, c . . . z in the nucleotide.
A similar system can be used to store data on any molecule, crystal or other material that can be isotope-modified.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. These and various other features and advantages will be apparent from a reading of the following detailed description.
The described technology is best understood from the following Detailed Description describing various implementations read in connection with the accompanying drawing, where:
As indicated above, this disclosure describes the use of nucleotides for DNA data storage. Natural nucleotides in DNA are adenine (A), thymine (T), cytosine (C), guanine (G), with uracil (U) used in place of thymine (T) for RNA. Synthetic nucleotides can have different atomic species (e.g., fluorine, chlorine, bromine, mercury, or sulfur) or exclude an atomic species (e.g., carbon, nitrogen, oxygen, or hydrogen) from the typical naturally occurring biological nucleotides. One well-known set of synthetic nucleotides are the Hachimoji nucleotides. Other synthetic nucleotides, having an atom other than carbon (C) or nitrogen (N) in a cyclic position, can also be used for data storage. Additionally, nucleotides or molecules modified by metal ion ligation can be used for data storage. Any of the nucleotides or other molecules can be modified with one or more isotopes of at least one of hydrogen (H), carbon (C), nitrogen (N) or oxygen (O). In some implementations, synthetic nucleotides with isotopes other than hydrogen (H) or nitrogen (N) (e.g., in a cyclic position) can be used for data storage.
Other molecules, in addition to natural nucleotides, synthetic nucleotides, and otherwise-modified nucleotides, could be modified with one or more isotopes and additionally or alternately used in place of the nucleotides; for example, the methodology described herein can be applicable to polymers and other large molecules (e.g., hexane, heptane, octane, pentane, etc.).
It is noted that although the term “nucleotide” is used herein throughout, it is actually the nucleotide base (e.g., the adenine (A), thymine (T), cytosine (C), guanine (G)) that includes the at least one isotope in many implementations. A nucleotide base attached to a sugar molecule (e.g., ribose) is a nucleoside, which when attached to a phosphate forms a nucleotide. In some implementations, however, at least one isotope may be located in the sugar molecule (e.g., ribose) or the phosphate backbone.
The nucleotides or molecules, and thus the data they encode, can be read, e.g., by Surface-Enhanced Raman Spectroscopy (SERS). SERS is able to differentiate between molecules, including between molecules with different atoms and/or isotope concentrations. This atom and/or isotope differentiation allows the same chemical compound (e.g., nucleotide, molecule) to represent multiple unique states.
By using synthetic nucleotides and/or isotope-modified nucleotides and/or otherwise-modified nucleotides for DNA data storage, data density can be greatly increased due to the additional spectral signatures present beyond the traditional four signatures present in the four natural nucleotides. Overlapping spectral signatures due to molecular symmetry are expected to be detectable as sensing technology continues to evolve. In essence, the more sensitive the spectroscopic technique, the higher the potential data storage. When all possible states are resolvable with sensing technology, greater than 1 bit per atom can be realized using DNA or other suitable molecules.
Additionally, by using isotope-modified or otherwise-modified nucleotides for DNA data storage, the data is protected from any reading system that makes chemical copies of the nucleotides as part of the reading process. Sensing techniques (e.g., spectroscopy) that detect isotopes or different atoms will still require additional information to determine which spectroscopic shifts represent data and which ones represent natural or intentionally introduced background noise.
Still further, by using isotope-modified nucleotides for DNA data storage, a limited lifetime for the data can be designed by utilizing decaying isotopes, e.g., to provide data security in niche applications.
In the following description, reference is made to the accompanying drawing that forms a part hereof and in which is shown by way of illustration at least one specific implementation. The following description provides additional specific implementations. It is to be understood that other implementations are contemplated and may be made without departing from the scope or spirit of the present disclosure. The following detailed description, therefore, is not to be taken in a limiting sense. While the present disclosure is not so limited, an appreciation of various aspects of the disclosure will be gained through a discussion of the examples, including the figures, provided below. In some instances, a reference numeral may have an associated sub-label consisting of a lower-case letter to denote one of multiple similar components. When reference is made to a reference numeral without specification of a sub-label, the reference is intended to refer to all such multiple similar components.
Any nucleotide in a strand of DNA can be a carrier for data, by assigning a bit pattern to the nucleotide. Traditional biological DNA data storage is limited to four states per natural nucleotide. A data storage bit is typically represented by one nucleotide on one strand of the DNA double helix, the other strand having the complementary nucleotide, which offers redundancy but not extra data capacity. For example, binary bits can be arbitrarily assigned to the nucleotides as follows: A=00, G=01, C=10, and T=11. Thus, with this example, if the binary data 0000111100011110 is desired, an oligo (a portion of a DNA strand) having nucleotides in the order AATTAGTC is needed. Such an oligo can be formed by any suitable method to obtain the desired nucleotide sequence. Once the oligo is formed, it can be sequenced, or “read,” by any suitable method that can identify the nucleotides and convert the nucleotide identifications to data bits.
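The two-bits-per-nucleotide mapping in the example above can be checked with a short sketch. It assumes only the example assignment A=00, G=01, C=10, T=11 given in the text; the function names encode and decode are illustrative and do not describe any particular synthesis or sequencing tool.

```python
# Example assignment from the text (arbitrary): A=00, G=01, C=10, T=11.
BITS_TO_BASE = {"00": "A", "01": "G", "10": "C", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(bits: str) -> str:
    """Map a binary string to a nucleotide sequence, two bits per nucleotide."""
    if len(bits) % 2:
        raise ValueError("binary string must have an even number of bits")
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(oligo: str) -> str:
    """Map a nucleotide sequence back to its binary string."""
    return "".join(BASE_TO_BITS[base] for base in oligo)


print(encode("0000111100011110"))  # AATTAGTC
print(decode("AATTAGTC"))          # 0000111100011110
```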
Surface Enhanced Raman Spectroscopy (SERS) is an ultrasensitive optical detection method that can be used to identify molecules such as nucleotides based on their unique Raman scattering spectra. Each of the four natural nucleotides (adenine (A), cytosine (C), guanine (G), and thymine (T)) emits Raman-scattered photons with unique frequencies when excited by a laser,
Each of the natural nucleotides A, C, G, T (of
With the four natural biologic or genetic nucleotides, there are four states per bit (nucleotide) position. These natural nucleotides are a base 4 (quaternary) number system compared with the more commonly used base 2 (binary), base 10 (decimal), and base 16 (hexadecimal) number systems. The number of bit states (and therefore the base of the number system) can be increased by utilizing at least one isotope in a nucleotide. For example, with the addition of two isotope-modified nucleotides, the number of nucleotide states increases from four to six. By increasing the number of isotopes and where those isotopes are located in a nucleotide, the number of bit states represented by a nucleotide can be increased exponentially.
A natural nucleotide or a synthetic nucleotide can have one of four states per position. These four states are the equivalent of 2 binary bits ($4 = 2^2$). Each nucleotide position can therefore carry two binary bits. However, as will be shown with isotope encoding, each correlated nucleotide pair can have more than $2^{31}$ states (a base-$2^{31}$ number system), representing a more than 15-fold increase in storage density (binary bits per unit volume), since the nucleotide volume stays essentially constant as the data density increases.
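As a rough check of this arithmetic, the information carried by one position is the base-2 logarithm of its number of distinguishable states. The sketch below is illustrative (bits_per_position is a hypothetical helper) and uses $2^{31}$ as a lower bound for a correlated pair; 31 bits against 2 bits gives the factor of 15.5.

```python
import math


def bits_per_position(num_states: int) -> float:
    """Binary bits carried by a position that can take num_states distinguishable states."""
    return math.log2(num_states)


natural = bits_per_position(4)          # 4 natural nucleotides -> 2 bits
six_states = bits_per_position(6)       # e.g., two added isotope-modified nucleotides
correlated = bits_per_position(2**31)   # lower bound for a correlated nucleotide pair

print(natural)                 # 2.0
print(round(six_states, 3))    # 2.585
print(correlated / natural)    # 15.5
```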
The number of states each nucleotide or synthetic nucleotide can have is dependent on the resolution capability of the reading (e.g., spectroscopic) technique used. Higher spectroscopic resolution will support detection of smaller spectroscopic shifts, which directly affects the number and position of isotopes that can be used to provide additional states for a given nucleotide. Greater spectroscopic sensitivity allows for a greater number of isotopes per nucleotide, and thus a greater number of states and increased data storage.
In adenine, seen in
Referring to
Guanine, of
Thymine, of
Hachimoji nucleotide 2-aminoimidazo[1,2a][1,3,5]triazin-4(1H)-one of
Hachimoji nucleotide 6-amino-5-nitropyridin-2-one of
Hachimoji nucleotide isoguanine of
Hachimoji nucleotide isocytosine of
The number of states grows exponentially with the number of possible locations for an isotope in the molecule, with the number of available isotopes as the base. There are multiple stable and decay-prone isotopes that can be used to increase the number of detectable states for a given nucleotide. For example, carbon (C) has isotopes C12, C13 and radioactive C14; hydrogen (H) has H1 (protium), H2 (deuterium) and radioactive H3 (tritium); nitrogen (N) has N14 and N15; oxygen (O) has O16, O17 and O18. Other isotopes of C, H, N and O are known but are less practical due to their short half-lives.
As seen in
The ability to differentiate between isotopes is dependent on the given isotope's frequency shift of the Raman-scattered photons, the location of the isotope in the molecule, and the sensitivity of the Raman spectrometer. Raman Spectroscopy including SERS is just one of the spectroscopic techniques that can be used to identify different atomic isotopes; other spectroscopic techniques (e.g., X-ray spectroscopy) can also be used. The SERS implementation described here is representative of the other spectroscopic implementations (ultra-violet, x-ray, gamma ray). Higher spectroscopic sensitivity (usually associated with higher frequencies) will yield improved state detection of overlapping frequency shifts due to molecular symmetry. This will allow for increasing data density, improving copy protection, and improving self-erasing characteristics as detector sensitivity continues to improve over time.
The lagging strand nucleotide is always chemically fixed in relation to the leading strand nucleotide. In the absence of synthetic nucleotides, for DNA, guanine (G) only pairs with cytosine (C), and adenine (A) only pairs with thymine (T). As such, although the lagging strand is different it is generally redundant for data storage purposes as shown in
However, as shown in
The total possible states for any position (e.g., the position identified by the box 505a) of the leading strand 502a is four (i.e., A, C, G, T); each natural genetic or biological nucleotide position supports only four possible states.
FIG. 5C shows an example DNA oligo 500c having nine nucleotide pairs, with various nucleotides in the lagging strand being isotope-modified. Similar to the oligo 500b in
The examples of
Whether only one isotope or multiple, the leading strand 502 and the lagging strand 504 can be interpreted by a “reader” in one of two methods. The first method is as described above in respect to
Correlating the strands 502, 504 increases the size of the data set that can be represented in the overall strand 500. Any one position in the strand 500 now supports sixteen states: AT, AT′, A′T, A′T′, TA, TA′, T′A, T′A′, CG, CG′, C′G, C′G′, GC, GC′, G′C, G′C′. Synchronizing data from both the leading strand 502 and lagging strand 504 has a multiplicative effect on states represented, compared to an additive effect when data is only read from one strand (e.g., the leading strand). A strand tagging method can be used to ensure data can be synchronized.
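The sixteen correlated states listed above can be enumerated directly. This sketch assumes, as in the example, that each base is either natural or carries a single isotope modification (marked with a prime); the names are hypothetical. Reading the leading strand alone distinguishes eight symbols, while correlating both strands yields sixteen.

```python
from itertools import product

PRIME = "\u2032"  # marks an isotope-modified base, as in A′

# Base pairs as (leading-strand base, lagging-strand base); AT and TA are distinct.
PAIRS = [("A", "T"), ("T", "A"), ("C", "G"), ("G", "C")]


def correlated_states(pairs):
    """All pairs where each base is either natural or carries one isotope modification."""
    states = []
    for lead, lag in pairs:
        for lead_mod, lag_mod in product(("", PRIME), repeat=2):
            states.append(lead + lead_mod + lag + lag_mod)
    return states


states = correlated_states(PAIRS)
leading_only = [base + mod for base in "ATCG" for mod in ("", PRIME)]

print(len(leading_only))  # 8 states when only the leading strand is read
print(len(states))        # 16 states when both strands are correlated
```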
For a non-correlated strand, the two strands 602, 604 do not need to be read simultaneously or even together, and each position (e.g., a nucleotide in the position of the box 605) in the leading strand 602 or in the lagging strand 604 can support a different number of states depending on the nucleotide present. The data present in the position of the box 605 shows thymine supporting “y” unique states. The number of unique states (e.g., “y”) is dependent on the number and atomic species of the isotopes in the (e.g., thymine) molecule. Other nucleotides will have different numbers of unique states, as has been discussed above. The number of unique states is not dependent on the nucleotide with which it is paired.
For a correlated strand, the relative position between the leading and lagging strands 602, 604 is relevant and must be known at all times, as the nucleotides in the two strands are paired;
Although the strands 602, 604 are correlated, it is not necessary to read both strands simultaneously; rather, each strand can be read individually as long as the positions (e.g., any one of positions 0-8) of the leading strand 602 and lagging strand 604 nucleotides are known. The strands 602, 604 can be tagged or otherwise have the position(s) identified or indexed, particularly if the strands 602, 604 are processed separately.
Returning to
Each nucleotide supports a different number of isotopic states due to the individual atomic makeup of the nucleotide. For natural nucleotides, the AT pairing supports more individual states (approximately double) than the CG pairing, before accounting for symmetry. In some implementations, using the AT pairing exclusively can be done to maximize the data stored, as long as the DNA double strand remains stable with just one nucleotide pairing present.
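By applying the formula $\text{Num\_isotopes}^{\text{Num\_atoms}}$ to each atomic species, the total independent states for a nucleotide, taking into account all possible isotope locations for each isotope, can be calculated. Thus, each isotope-modified nucleotide has a number of possible states defined by:
$$\text{number of possible states} = a^{N_a} \cdot b^{N_b} \cdot c^{N_c} \cdots z^{N_z} \qquad (1)$$
where a, b, c . . . z is the number of isotopes available for a given atom, and $N_a$, $N_b$, $N_c$ . . . $N_z$ is the number of atoms of type a, b, c . . . z in the nucleotide.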
Returning to
The following calculations provide correlated and uncorrelated positions for the natural nucleotides; it should be understood that the theory similarly applies to the Hachimoji and other synthetic nucleotides.
The number of states available to a correlated position in the strand (e.g., denoted by the box 607) is much greater than to a non-correlated position (e.g., denoted by the box 605). Each non-correlated position in the strand can represent 217,088 possible (different) isotope-modified nucleotide states (i.e., 73,728+32,768+98,304+12,288=217,088 for the natural nucleotides), whereas a correlated position in the strand has significantly more possible (different) isotope-modified nucleotide states, more than $2^{31}$ or more than $2^{30}$ (i.e., 32,768*73,728=2,415,919,104 for an AT pair or 12,288*98,304=1,207,959,552 for a CG pair).
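The state counts above can be reproduced from Equation (1). This is a minimal sketch under stated assumptions: only stable isotopes are counted (two each for C, H and N, three for O), the isotopes sit on the nucleobase rather than the sugar or phosphate, and molecular symmetry is ignored. Under those assumptions it returns the per-nucleotide values, the non-correlated sum, and the correlated products used in the text.

```python
from math import prod

# Assumption: stable isotopes only -> C (12C, 13C), H (1H, 2H), N (14N, 15N), O (16O, 17O, 18O).
STABLE_ISOTOPES = {"C": 2, "H": 2, "N": 2, "O": 3}

# Atom counts of the natural nucleobases (outer keys: base letter; inner keys: element).
NUCLEOBASES = {
    "A": {"C": 5, "H": 5, "N": 5},          # adenine  C5H5N5
    "T": {"C": 5, "H": 6, "N": 2, "O": 2},  # thymine  C5H6N2O2
    "G": {"C": 5, "H": 5, "N": 5, "O": 1},  # guanine  C5H5N5O
    "C": {"C": 4, "H": 5, "N": 3, "O": 1},  # cytosine C4H5N3O
}


def isotope_states(formula, isotopes=STABLE_ISOTOPES):
    """Equation (1): product over elements of isotopes_available ** atom_count."""
    return prod(isotopes[element] ** count for element, count in formula.items())


states = {base: isotope_states(f) for base, f in NUCLEOBASES.items()}
# {"A": 32768, "T": 73728, "G": 98304, "C": 12288}

# Non-correlated position: one strand is read alone, so the four nucleotides' states add.
non_correlated = sum(states.values())   # 217088

# Correlated position: both strands are decoded together, so paired states multiply.
at_pair = states["A"] * states["T"]     # 2415919104 (more than 2**31)
cg_pair = states["C"] * states["G"]     # 1207959552 (more than 2**30)
```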
If both the leading and lagging strands are processed independently (i.e., they are not correlated), the AT or CG pair may make up the entire double strand, provided the DNA can remain stable in that configuration. An example of this is shown in the first four positions of
For non-correlated reading or decoding, each position of the AT pair would support 32,768+73,728 states and each CG pair would support 12,288+98,304 states. However, if both the leading and lagging strands 502d, 504d were correlated while encoding and decoding (processed dependently), as shown by the pair in the box 607 in
With more than $2^{31}$ total possible states represented by the 30 atoms of the AT pair, a storage density of more than 1 binary bit per atom is possible in the pair. The GC pair supports 1,207,959,552 states (more than $2^{30}$) per position, essentially half that of the AT pair.
With correlated decoding of the two strands, the order of the leading strand to the lagging strand has an effect; i.e., AT is uniquely different from TA and CG is uniquely different from GC, providing different data and a different number of possible states. The total possible states for a single position of a nucleotide pair is AT+TA+CG+GC, which is 7,247,757,312 possible states (more than $2^{32}$). If an isotope with a long half-life (e.g., carbon-14) is included, it will add long-term data decay and will increase the possible states to more than $2^{38}$ (approximately 1.3 binary bits per atom).
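The ordered-pair total and the effect of adding carbon-14 can be checked numerically. This sketch assumes only stable isotopes on the nucleobase (two each for C, H and N, three for O), ignores symmetry, and treats carbon-14 simply as a third carbon state; the per-atom figure divides by the 30 atoms of an AT pair, following the text.

```python
from math import log2, prod

# Atom counts of the natural nucleobases (outer keys: base letter; inner keys: element).
NUCLEOBASES = {
    "A": {"C": 5, "H": 5, "N": 5},          # adenine  C5H5N5
    "T": {"C": 5, "H": 6, "N": 2, "O": 2},  # thymine  C5H6N2O2
    "G": {"C": 5, "H": 5, "N": 5, "O": 1},  # guanine  C5H5N5O
    "C": {"C": 4, "H": 5, "N": 3, "O": 1},  # cytosine C4H5N3O
}
AT_PAIR_ATOMS = 30  # 15 atoms in adenine + 15 in thymine


def isotope_states(formula, isotopes):
    """Equation (1) for one nucleobase."""
    return prod(isotopes[element] ** count for element, count in formula.items())


def ordered_pair_total(isotopes):
    """States for one correlated position, counting AT, TA, CG and GC separately."""
    s = {base: isotope_states(f, isotopes) for base, f in NUCLEOBASES.items()}
    return 2 * s["A"] * s["T"] + 2 * s["C"] * s["G"]


stable = {"C": 2, "H": 2, "N": 2, "O": 3}
with_c14 = dict(stable, C=3)  # adds long-lived, radioactive 14C as a third carbon state

total = ordered_pair_total(stable)        # 7247757312, log2 about 32.75
total_c14 = ordered_pair_total(with_c14)  # 371504185344, log2 about 38.43
bits_per_atom = log2(total_c14) / AT_PAIR_ATOMS  # about 1.28
```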
With today's technology, many of the state combinations may not be resolvable, for example, with Raman scattering or surface enhanced Raman scattering (SERS). However, future techniques (e.g., x-ray spectroscopy) are expected to be able to resolve more states. Other spectroscopic techniques may also be usable. As increased sensitivity allows more states to be resolved, data storage density will improve accordingly. The higher the resolution of the sensing technique, the greater the ability to differentiate symmetrical combinations and the greater the amount of data that can be stored on a given isotope-modified nucleotide, approaching the theoretical states calculated above. Isotope-modified nucleotides for DNA data storage have the potential to exceed 1 bit of storage per atom as the sensitivity of the detector improves over time.
Isotope-modified nucleotides have a unique property: they support a variable number base system for storing data. The number base is defined by the number of states that are encoded, and the number of possible states is determined by which isotope combinations are used in the encoding. This state information is created and utilized as needed by the data encoder.
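A variable number base can be illustrated as mixed-radix encoding, where each position has its own base equal to the number of states the encoder chose to use there. This is a hypothetical sketch (to_mixed_radix and from_mixed_radix are illustrative names), assuming the intended reader is given the same list of bases as the encoder.

```python
from typing import List, Sequence


def to_mixed_radix(value: int, bases: Sequence[int]) -> List[int]:
    """Split a non-negative integer into one digit per position, least significant first.
    Position i may take bases[i] distinguishable states."""
    digits = []
    for base in bases:
        value, digit = divmod(value, base)
        digits.append(digit)
    if value:
        raise ValueError("value does not fit in the given positions")
    return digits


def from_mixed_radix(digits: Sequence[int], bases: Sequence[int]) -> int:
    """Inverse of to_mixed_radix; the reader must know the same bases."""
    value = 0
    for digit, base in zip(reversed(digits), reversed(bases)):
        value = value * base + digit
    return value


# Example: three positions using 6, 4 and 10 states respectively.
bases = [6, 4, 10]
digits = to_mixed_radix(123, bases)
print(digits)                           # [3, 0, 5]
print(from_mixed_radix(digits, bases))  # 123
```

A reader who assumes the wrong list of bases recovers a different integer from the same digits, which is the sense in which the number base itself protects the data.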
Utilizing isotope-modified nucleotides not only drastically increases the data storage density on a DNA strand but also inhibits copying of the DNA strand, which adds a level of security to the data.
In some methodologies, when data is read from DNA, multiple copies of the DNA strand are created. These copies are processed in parallel and the read data is combined to obtain a full data set from the original strand. This technique is conventionally used because reading an entire length of a strand of DNA can take a long time with standard techniques, whereas processing multiple copies at the same time has the effect of increasing the speed of reading the DNA nucleotide values. SERS, as discussed in respect to
As indicated directly above, copies of the DNA strand are commonly made, e.g., to hasten reading. However, a chemical process cannot copy the isotope information in an isotope-modified strand, as disclosed herein, as all isotopes of a single element, and hence the resulting nucleotide, are chemically identical. In such a manner, although a chemical copy can be made, the copy will not include the isotope information and therefore that copy is not a true duplicate, thus providing a mode of copy protection, because the data is protected from common chemical copying processes. In this copy protection methodology, the unintended reader, without additional information on how nucleotide encoding is being used (e.g., which isotopes, where in the nucleotide, which nucleotides, number of isotopes per nucleotide, etc.) or whether it is being used, will not know data was lost with the chemical copy, and thus will be unable to know, much less effectively decode, the data. Thus, by using isotope-modified DNA for data storage, the data is protected from common chemical copying and reading.
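The loss of isotope information in a chemical copy can be modeled with a small, hypothetical example. It assumes an eight-state alphabet (each natural base plus one isotope-modified variant, marked with a prime) and simplifies copying to "same bases, natural isotopes"; actual replication proceeds through a complementary strand.

```python
PRIME = "\u2032"  # marks an isotope-modified base

# Hypothetical 8-state alphabet: each natural base followed by its isotope-modified variant.
ALPHABET = ["A", "A" + PRIME, "C", "C" + PRIME, "G", "G" + PRIME, "T", "T" + PRIME]
VALUE = {symbol: i for i, symbol in enumerate(ALPHABET)}


def chemical_copy(strand):
    """Model a chemical copy: base identities survive, isotope labels do not,
    because the copy is assembled from natural nucleotides."""
    return [symbol.rstrip(PRIME) for symbol in strand]


original = ["A" + PRIME, "T", "G" + PRIME, "C"]
copy = chemical_copy(original)

print([VALUE[s] for s in original])  # [1, 6, 5, 2]
print([VALUE[s] for s in copy])      # [0, 6, 4, 2]
```

The copy is still a valid sequence, so nothing signals to the unintended reader that the decoded values differ from the original.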
Another reading process for DNA data uses spectroscopic techniques, e.g., Raman spectroscopy. However, without prior knowledge as to which nucleotides should have isotopic shifts in the spectroscopy, the unintended reader will not know if a measured spectroscopic shift is due to an expected isotope and hence part of the data or if it is background noise. Additionally, the unintended reader may overlook the encoded data completely if the reading technique is not sensitive enough to recognize the small shifts in the isotope spectroscopic response. Again, by utilizing isotope-modified DNA for data storage, the data is protected from common spectroscopic analysis. The data is also protected from the unintended reader by the number base used in the encoding. Only the encoder and the intended reader know the number base being used. Any number base between $2^1$ and $2^{32}$ can be chosen to encode the data when using the techniques described.
It is noted that to have a viable spectroscopic copy protection, the concentration of the isotopes in the DNA should be taken into account. Too much variation from natural spectral levels can suggest to the unintended reader the presence of isotopic-modification in the nucleotides, although the unintended reader would nevertheless need to determine how the nucleotide encoding is being used (e.g., which isotopes, where in the nucleotide, which nucleotides, number of isotopes per nucleotide, etc.).
Higher levels of less common isotopes can be used to flood the spectroscopic response, thus hiding the true data present in only pre-defined specific shifts. Flooding the signal, in this manner, complicates attempts to determine which isotope locations represent the encoded data.
Offsetting correlated strands is another technique to protect isotope encoded data from unintended viewing. When two strands (e.g., strands 602, 604 of
As indicated above, not only does utilizing isotope-modified nucleotides drastically increase the data storage density on a DNA strand and inhibit copying and identification of the DNA strand, the data can be designed with a limited lifetime, or, designed with a “self-destruct” mechanism. A limited data life can be implemented using short-lived isotopes in an isotope-modified nucleotide.
When an isotope decays, the spectroscopic information changes to a new state and the value no longer reflects the original recorded data. Depending on the resulting decayed atom, the molecule (nucleotide) may also become unstable and break up. Examples of decay-prone isotopes that can be used to encode data in a nucleotide include tritium (12.32 year half-life) and phosphorus-33 (25 day half-life). Tritium (H3) is a particularly good candidate isotope for self-erasing or limited-life data. The natural nucleotides contain about 30% hydrogen, and tritium can break the nucleotide bonds when it converts to helium-3 (He3). Once the nucleotide bonds are broken, order is lost and the data is permanently scrambled. When designing a limited life for an isotope-modified nucleotide, the isotope percentage should be sufficiently high that the decayed state cannot be reversed with error correction techniques.
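The limited-lifetime idea can be quantified with the standard exponential decay law, in which the surviving fraction after time $t$ is $(1/2)^{t/T_{1/2}}$. This sketch uses the half-lives quoted above; probability_any_decayed is a hypothetical helper that assumes each labeled atom in a nucleotide decays independently.

```python
def fraction_remaining(elapsed: float, half_life: float) -> float:
    """Fraction of an isotope's atoms not yet decayed after `elapsed` (same time units)."""
    return 0.5 ** (elapsed / half_life)


def probability_any_decayed(labeled_atoms: int, elapsed: float, half_life: float) -> float:
    """Chance that at least one of a nucleotide's labeled atoms has decayed,
    assuming the atoms decay independently."""
    return 1.0 - fraction_remaining(elapsed, half_life) ** labeled_atoms


TRITIUM_HALF_LIFE_YEARS = 12.32
P33_HALF_LIFE_DAYS = 25.0

print(fraction_remaining(12.32, TRITIUM_HALF_LIFE_YEARS))          # 0.5 after one half-life
print(fraction_remaining(100.0, P33_HALF_LIFE_DAYS))               # 0.0625 after four half-lives
print(probability_any_decayed(3, 12.32, TRITIUM_HALF_LIFE_YEARS))  # 0.875
```

More labeled atoms per nucleotide make it more likely that every encoded position has changed after a given time, which is one way to read the guidance that the isotope percentage should be high.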
To read the DNA strand having at least one isotope-modified nucleotide, numerous technologies may be used. Raman spectroscopy is one suitable technology.
A Raman sensor or device can be used that has a Raman “hot spot” channel formed by laser excitation and enhanced by resonance of focusing plasmonic (e.g., gold, silver) nanostructures. A DNA template strand is drawn or fed through the hot spot channel. As the DNA template strand moves through the hot spot, Raman spectra for the individual nucleotides and isotope-modified nucleotides are measured.
In some implementations, rather than measuring each nucleotide individually, the Raman spectra for a first group of nucleotides present in the hot spot channel is measured at a first point in time, and the Raman spectra for a second group of nucleotides present in the hot spot channel is measured at a second point in time subsequent to the first point in time. The two Raman spectra are compared to determine what nucleotide(s) left the hot spot and what nucleotide(s) entered the hot spot.
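The comparison of two successive hot-spot measurements can be sketched as a multiset difference. The sketch assumes a hypothetical upstream step has already resolved each spectrum into a list of nucleotide identities; it cannot detect the case where the same nucleotide type both leaves and enters between the two measurements.

```python
from collections import Counter


def hot_spot_change(first_group, second_group):
    """Return (left, entered): nucleotides present only in the first or only in the
    second hot-spot reading, counted as multisets."""
    first, second = Counter(first_group), Counter(second_group)
    return first - second, second - first


# Hypothetical readings at two successive points in time.
left, entered = hot_spot_change(["A", "T", "G"], ["T", "G", "C"])
print(dict(left))     # {'A': 1}
print(dict(entered))  # {'C': 1}
```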
In some implementations, the device includes a DNA polymerase, which replicates the template strand being sequenced. The replication action by the polymerase pulls the template strand through the hot spot channel. In some implementations, a secondary force, e.g., an electric force or voltage differential, is additionally or alternatively used to aid the passage of the strand through the hot spot channel between the nanostructures.
The sensor can be provided as a microfluidic lab-on-a-chip system, or, “on chip.”
The sensor 700 has a sample loading chamber 702, a secondary or sample receiving chamber 704 and a nanochannel 705 connecting the chambers 702, 704. A pair of nanostructures 710a, 710b is located on opposite sides of the nanochannel 705, operably connected to a pair of waveguides 712a, 712b. The nanostructures 710 focus the Raman signal to a small region (e.g., 1-10 nm wide) in the nanochannel 705. The nanostructures 710 may be any of a variety of shapes, such as triangular (as in
At least one laser 720 is focused on at least one of the nanostructures 710 in the region of the nanochannel 705;
The laser(s) 720 are directed at the nanostructures 710 and/or the gap between them, to generate plasmons across the nanostructures 710 and create a Raman hot spot in the nanochannel 705. The one or more waveguides 712 may be used to direct the laser beam(s) to the nanostructures 710. The laser(s) 720 may be, individually, e.g., a solid state laser, a gas (e.g., xenon) laser, a liquid laser, etc., or any similar light source operating at, e.g., 600 nm, 800 nm, 1064 nm wavelengths. Multiple lasers 720 may be positioned parallel to or perpendicular to the nanostructures and may be on the same plane or a separate plane.
The resulting Raman photons or light scattered by the nucleotides (hence, the Raman spectra) are measured and the nucleotides identified. Stokes scattered photons, Anti-Stokes scattered photons, or both may be used for nucleotide identification. The Raman scattered photons may be collected and/or focused by mirrors or lenses to facilitate identification of the nucleotides, or the scattered light may be collected by a waveguide. Light may be detected and quantified by a photomultiplier tube, photodiode array, charge-coupled device, electron multiplied charge-coupled device, etc. The resulting Raman-scattered photons may be filtered such that only photons of specific frequencies are detected. In some implementations, optical resonator(s) may be present to increase the signal from the detected photons.
In use of the sensor 700, a DNA template strand having one or more isotope-modified nucleotides is drawn or fed from the sample loading chamber 702 through the nanochannel 705 through the hot spot formed by the nanostructures 710 and the laser(s) 720. The laser(s) 720, focused on the nanostructures 710, enhance the Raman spectra or resonance obtained from the scattered photons, allowing each individual nucleotide to be identified by its Raman spectra.
In
The sensor 800 has a sample loading chamber 802, a secondary chamber 804, and a nanochannel hot spot 805 therebetween. This nanochannel hot spot 805 is generated by laser excitation and enhanced by resonance of metallic (e.g., gold) nanostructures 810. The sample loading chamber 802 is upstream of the nanochannel hot spot 805 and the secondary chamber 804 is downstream of the nanochannel hot spot 805.
A DNA polymerase 830 (illustrated as a Pac Man™ type shape) replicates a DNA template strand 840 to be sequenced, the strand having at least one isotope-modified nucleotide; the replication process, however, is not able to replicate the isotope information, as discussed above. The replicated complementary strand 850 is shown proximate the DNA polymerase 830. The action of replicating the template strand 840, by the DNA polymerase 830, applies a tension or force on the strand 840 and pulls the strand through the Raman nanochannel hot spot 805. Each of the nucleotides of the template strand 840 generates a unique Raman signal depending on its identity as it passes through the nanochannel hot spot 805.
The nucleotides present in the nanochannel hot spot emit Raman-scattered photons, which can then be filtered and detected. Each of the nucleotides A, C, G, T emits Raman photons of specific frequencies (see,
Various additional and alternate implementations are also contemplated.
In some implementations, the DNA template strand is a linear single strand (as shown, e.g., in
In other implementations, a DNA exonuclease, an RNA polymerase or exonuclease may be used in place of a DNA polymerase or DNA exonuclease, in order to sequence RNA or DNA. Alternately, an electric current or voltage differential may be used to pull the strand through the hot spot(s) or aid in the pulling. Other sources of electrophoresis may additionally or alternatively be used, as well as another source of force, e.g., electromechanical.
In summary, described herein is the use of isotope-modified nucleotides and other molecules for encoding data thereon. Any or all of the H, C, N and O atoms can be replaced with an isotope, thus modifying the nucleotide. Each modified nucleotide will produce a different Raman scattering spectrum. Thus, the more and/or different isotopes in the nucleotide, the more nucleotide signatures, and the more nucleotide signatures, the greater the increase in the data density available in the DNA strand. Rather than each nucleotide having only four possible states and encoding 2 bits (e.g., 00, or 01, or 10, or 11), the number of possible states is a function of the number of isotope-replaceable atoms and the number of available isotopes. As shown above, thymine theoretically has 73,728 data states, adenine theoretically has 32,768 data states, guanine theoretically has 98,304 data states, and cytosine theoretically has 12,288 data states. Thus, each modified nucleotide can encode significantly more bits. Additionally, if the processing of the two strands is correlated (where position matters), the data stored in any nucleotide pair position exceeds $2^{32}$ states (32 bits).
The above specification and examples provide a complete description of the structure and use of exemplary implementations of the invention. The above description provides specific implementations. It is to be understood that other implementations are contemplated and may be made without departing from the scope or spirit of the present disclosure. The above detailed description, therefore, is not to be taken in a limiting sense. While the present disclosure is not so limited, an appreciation of various aspects of the disclosure will be gained through a discussion of the examples provided.
Unless otherwise indicated, all numbers expressing feature sizes, amounts, and physical properties are to be understood as being modified by the term “about,” whether or not the term “about” is immediately present. Accordingly, unless indicated to the contrary, the numerical parameters set forth are approximations that can vary depending upon the desired properties sought to be obtained by those skilled in the art utilizing the teachings disclosed herein.
As used herein, the singular forms “a”, “an”, and “the” encompass implementations having plural referents, unless the content clearly dictates otherwise. As used in this specification and the appended claims, the term “or” is generally employed in its sense including “and/or” unless the content clearly dictates otherwise.
Spatially related terms, including but not limited to, “bottom,” “lower”, “top”, “upper”, “beneath”, “below”, “above”, “on top”, “on,” etc., if used herein, are utilized for ease of description to describe spatial relationships of an element(s) to another. Such spatially related terms encompass different orientations of the device in addition to the particular orientations depicted in the figures and described herein. For example, if a structure depicted in the figures is turned over or flipped over, portions previously described as below or beneath other elements would then be above or over those other elements.
Since many implementations of the invention can be made without departing from the spirit and scope of the invention, the invention resides in the claims hereinafter appended. Furthermore, structural features of the different implementations may be combined in yet another implementation without departing from the disclosure or the recited claims.
This application is a continuation-in-part of pending U.S. application Ser. No. 17/166,838 filed Feb. 3, 2021 titled Nucleotides with Isotopes for DNA Data Storage, the entire disclosure of which is incorporated herein by reference for all purposes.
Relationship | Number | Date | Country
---|---|---|---
Parent | 17166838 | Feb 2021 | US
Child | 17308837 | | US