This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Other features, details, utilities, and advantages of the claimed subject matter will be apparent from the following, more particular written Detailed Description of various implementations as further illustrated in the accompanying drawings and defined in the appended claims.
Various embodiments of methods and systems for reading DNA storage genes are described herein. In some embodiments, a method for use in reading a DNA storage gene includes removing one or more linking symbols from a first strand of a DNA storage gene and introducing a test symbol pool to the DNA storage gene. The test symbol pool can include a plurality of single stranded test symbols, each single stranded test symbol comprising a data symbol and a linking symbol. The method further includes replacing a data symbol in the first strand of the DNA storage gene with a single stranded test symbol from the test symbol pool when the data symbol and the linking symbol in the single stranded test symbol are complementary, respectively, to an adjacent data symbol and linking symbol in the second strand of the DNA storage gene. The method further includes scanning the DNA storage gene to identify whether each linking symbol in the DNA storage gene is single stranded DNA or double stranded DNA, and recording the locations where the linking symbol is double stranded DNA.
In some embodiments, a method for use in reading a DNA storage gene includes providing a first DNA storage gene having one or more linking symbols removed from a first strand of the DNA storage gene and introducing a first test symbol pool to the first DNA storage gene. The first test symbol pool includes a plurality of single stranded test symbols, each single stranded test symbol including one of a first set of data symbols and one of a first set of linking symbols. The method further includes replacing a data symbol in the first strand of the first DNA storage gene with a single stranded test symbol from the first test symbol pool when the data symbol and the linking symbol in the single stranded test symbol are complementary, respectively, to an adjacent data symbol and linking symbol in the second strand of the first DNA storage gene. The method further includes scanning the first DNA storage gene to identify whether each linking symbol in the first DNA storage gene is single stranded DNA or double stranded DNA and recording each location on the first DNA storage gene where the linking symbol is double stranded DNA. The method further includes providing a second DNA storage gene having one or more linking symbols removed from a first strand of the second DNA storage gene, the second DNA storage gene being identical to the first DNA storage gene, and introducing a second test symbol pool to the second DNA storage gene. The second test symbol pool includes a plurality of single stranded test symbols, each single stranded test symbol including one of a second set of data symbols and one of the first set of linking symbols, the second set of data symbols being different from the first set of data symbols.
The method further includes replacing a data symbol in the first strand of the second DNA storage gene with a single stranded test symbol from the second test symbol pool when the data symbol and the linking symbol in the single stranded test symbol are complementary, respectively, to an adjacent data symbol and linking symbol in the second strand of the second DNA storage gene. The method further includes scanning the second DNA storage gene to identify whether each linking symbol in the second DNA storage gene is single stranded DNA or double stranded DNA and recording each location on the second DNA storage gene where the linking symbol is double stranded DNA. The method further includes using the recorded linking symbol locations and the composition of the first test symbol pool and the second test symbol pool to read the DNA storage gene.
In some embodiments, a system for use in reading a DNA storage gene includes a reaction vessel, an enzyme source, one or more test symbol pool sources, and a scanner. The reaction vessel is configured to receive a DNA storage gene. The enzyme source is in fluid communication with the reaction vessel and stores an enzyme that is configured to remove one or more linking symbols from a first strand of a DNA storage gene. The one or more test symbol pool sources are in fluid communication with the reaction vessel and store a test symbol pool including a plurality of single stranded test symbols, each single stranded test symbol including a data symbol and a linking symbol. The scanner is configured to scan a DNA storage gene located in the reaction vessel and distinguish between single stranded DNA and double stranded DNA in the DNA storage gene, detect a single stranded DNA overhang, or both.
A further understanding of the nature and advantages of the present technology may be realized by reference to the figures, which are described in the remaining portion of the specification.
With reference to
Regarding step 110, a DNA storage gene is provided and then manipulated in order to remove one or more linking symbols from a first strand of the DNA storage gene. Any suitable DNA storage gene can be used in step 110 and as part of the overall method 100 of reading a DNA storage gene. DNA storage genes serve as volumetrically efficient archival storage mediums by way of using an encoding scheme to construct/synthesize a sequence of base pairs, after which the base pairs can be decoded/read in order to communicate information via the DNA storage gene.
In some embodiments, the DNA storage gene is a long strand of double stranded DNA made up of multiple individually selected data symbols assembled in a specific order using a predefined set of linking symbols. The data symbols themselves each comprise a unique sequence of one or more base pairs. The DNA storage gene may comprise alternating fixed length sections of data symbols and linking symbols. The DNA storage gene may further comprise binding symbols used to connect a linking symbol to either end of the data symbol. In some embodiments, the linking symbol prior to the data symbol when reviewing the DNA storage gene from left to right is the linking symbol associated with the data symbol, while the linking symbol following the data symbol when reviewing the DNA storage gene from left to right is the linking symbol associated with the next data symbol in the DNA storage gene. Thus, using the nomenclature where DS stands for data symbol, LS stands for linking symbol, and BS-L and BS-R stand for left binding symbol and right binding symbol, respectively, the DNA storage gene may have a construction as follows:
BS-R:LS1:BS-L:DS1:BS-R:LS2:BS-L:DS2:BS-R:LS1:BS-L:DS3:BS-R:LS2:BS-L
The linking symbols used in the DNA storage gene may be selected from a full set of known linking symbols that arrange in repeating order. In this manner, linking symbols may be reused in the DNA storage gene such that the DNA storage gene need not use a unique linking symbol prior to every data symbol, but the linking symbols will arrange in repeating order. Thus, in the representation given above, two linking symbols are used (LS1 and LS2), and the pattern of linking symbols used in the DNA storage gene repeats LS1, LS2, LS1, LS2 . . . , though with different, random data symbols attached to the linking symbols.
The units of associated linking symbols and data symbols can be referred to as a data block, and the DNA storage gene is designed such that the data blocks assemble in sequence, using the linking symbol sequence as the overall sequencing guide. When the sequence of available linking symbols ends, the sequence repeats in order to extend the length of the DNA storage gene. The binding symbols may be constant, such that the same left and right binding symbols are used in each data block.
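By way of non-limiting illustration, the repeating arrangement of data blocks described above can be sketched in Python as follows. The function name build_gene_layout and the use of text labels in place of actual base pair sequences are assumptions made only for this sketch; the labels BS-R, BS-L, LS and DS follow the nomenclature used above.

```python
from itertools import cycle


def build_gene_layout(data_symbols, linking_symbols):
    """Return a colon-separated layout string for a DNA storage gene.

    Only symbol labels are modeled here; no nucleotide sequences are
    represented (an assumption made for illustration).
    """
    if not linking_symbols:
        raise ValueError("at least one linking symbol is required")
    ls_cycle = cycle(linking_symbols)  # linking symbols repeat in fixed order
    parts = []
    for ds in data_symbols:
        # Each data block: right binding symbol, linking symbol,
        # left binding symbol, then the data symbol itself.
        parts += ["BS-R", next(ls_cycle), "BS-L", ds]
    # The construction closes with one more linking symbol and its binding symbols.
    parts += ["BS-R", next(ls_cycle), "BS-L"]
    return ":".join(parts)


layout = build_gene_layout(["DS1", "DS2", "DS3"], ["LS1", "LS2"])
print(layout)
```

With three data symbols and two linking symbols, the sketch reproduces the example construction given above, with LS1 and LS2 alternating along the gene.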
With reference again to step 110, the DNA storage gene provided for reading is manipulated such that one or more linking symbols on a first strand of the double stranded DNA storage gene are removed from the DNA storage gene. The result of step 110 is a DNA storage gene having various segments that are single stranded due to the removal of linking symbols from the first strand of the double stranded DNA storage gene. In some embodiments, step 110 is carried out such that all linking symbols from a first strand of the DNA storage gene are removed, thereby providing a DNA storage gene that is single stranded at all linking symbol locations.
Any manner of removing the linking symbols from a first strand of the DNA storage gene can be used as part of step 110. In some embodiments, the linking symbols are removed from the first strand of the DNA storage gene by nicking or cutting the first strand of the DNA storage gene at the juncture of each linking symbol and the adjacent binding symbols. Referring again to the DNA storage gene construction described previously, the first strand of the DNA storage gene can be nicked or cut at the juncture of LS1 and BS-L, the juncture of LS1 and BS-R, the juncture of LS2 and BS-L, the juncture of LS2 and BS-R, and so on.
Any manner of creating these nicks or cuts at the linking symbols can be used. In some embodiments, an enzyme is programmed to make these cuts such that the introduction of the enzyme to the DNA storage gene results in the cuts being made at either end of the linking symbols. Once cut, further steps may be used to remove the linking symbols from the first strand of the DNA storage gene. In some embodiments, a heating step is used in order to remove the cut linking symbols from the first strand of the DNA storage gene. As noted previously, the result of this step is to provide a DNA storage gene that is single stranded at one or more linking symbol locations. The removal of the one or more linking symbols also results in the creation of toe-holds in the first strand of the DNA storage gene. These toe-holds can be used in subsequent steps described in greater detail below where complementary single stranded test symbols replace data symbols in the first strand of the DNA storage gene and thereby revert the single stranded linking symbols to double stranded linking symbols.
In step 120, a test symbol pool having one or more single stranded test symbols is introduced to the DNA storage gene. The test symbol pool is comprised of a plurality of single stranded test symbols, each test symbol comprising a data symbol selected from a first set of data symbols and a linking symbol selected from a first set of linking symbols. Each single stranded test symbol may also include a right binding symbol and a left binding symbol. Each single stranded test symbol may further include an anchor symbol (AS), which may be located at a first end of the test symbol, with the linking symbol being located at the opposite (second) end of the test symbol. In some embodiments, the single stranded test symbols that make up the test symbol pool may have the following construction:
LSx:BS-L:DSx:BS-R:AS
The test symbol pool is designed to include single stranded test symbols of every possible combination of the first set of data symbols and the first set of linking symbols used to make up the test symbol pool. Thus, in an embodiment where the test symbol pool includes a first set of data symbols DS1, DS2, DS3 and DS4, and a first set of linking symbols LS1 and LS2, the test symbol pool includes single stranded test symbols of LS1-DS1, LS1-DS2, LS1-DS3, LS1-DS4, LS2-DS1, LS2-DS2, LS2-DS3 and LS2-DS4. The first set of data symbols and the first set of linking symbols will include data symbols and linking symbols that are present in the DNA storage gene. In some embodiments, the first set of data symbols used in the test symbol pool may include fewer than all of the data symbols used in the DNA storage gene, while the first set of linking symbols used in the test symbol pool will include any and all of the linking symbols used in the DNA storage gene.
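As a non-limiting sketch, the composition of such a test symbol pool can be enumerated as the Cartesian product of the two sets. The function name build_test_symbol_pool and the representation of a test symbol as a (linking symbol, data symbol) pair of labels are assumptions made for illustration; binding symbols and anchor symbols are omitted, and the two linking symbols are labeled LS1 and LS2.

```python
from itertools import product


def build_test_symbol_pool(linking_symbols, data_symbols):
    """Return every (linking symbol, data symbol) pairing as a set of tuples."""
    return set(product(linking_symbols, data_symbols))


pool = build_test_symbol_pool(["LS1", "LS2"], ["DS1", "DS2", "DS3", "DS4"])
for linking_symbol, data_symbol in sorted(pool):
    print(f"{linking_symbol}-{data_symbol}")
```

Two linking symbols and four data symbols give the eight test symbols listed above.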
In step 120, the test symbol pool is introduced to the DNA storage gene in any manner that allows for interaction between the test symbol pool and the DNA storage gene, such as the replacement of a data symbol in the first strand of the DNA storage gene with a single stranded test symbol from the test symbol pool. In step 130, this replacement occurs at any location on the DNA storage gene where a single stranded test symbol, comprising a data symbol and a linking symbol, is complementary to an adjacent data symbol and linking symbol in the second strand of the DNA storage gene. In this replacement, a data symbol in the first strand of the DNA storage gene, which may include a right binding symbol and a left binding symbol, is removed from the first strand. In its place, a test symbol from the test symbol pool attaches to the first strand of the DNA storage gene, the test symbol comprising a data symbol and a linking symbol that are complementary to the data symbol and linking symbol in the second strand at the location where the data symbol from the first strand previously resided. Because the test symbol includes a data symbol and a linking symbol, the replacement of the data symbol in the first strand with the test symbol results in that linking symbol location of the DNA storage gene reverting back to double stranded DNA. This replacement occurs in any location where a test symbol exists in the test symbol pool that is complementary to a linking symbol/data symbol pair in the second strand of the DNA storage gene.
In locations along the DNA storage gene where the test symbol pool does not include a single stranded test symbol that complements a linking symbol/data symbol pair in the second strand of the DNA storage gene, no replacement takes place. As such, the first strand of the DNA storage gene in those locations retains the original data symbol and continues to be without the linking symbol removed from the first strand of the DNA storage gene in step 110. In some embodiments, the DNA storage gene following step 130 will include some locations where the linking symbol location has reverted to double stranded DNA and some locations where the linking symbol locations remain single stranded. Depending on the composition of the test symbol pool and the sequence of data symbols in the DNA storage gene, it is also possible that following step 130, all linking symbol locations on the DNA storage gene are reverted back to double stranded DNA or all linking symbol locations on the DNA storage gene remain single stranded DNA.
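The outcome of step 130 can be illustrated, in a non-limiting manner, with the following sketch. It assumes an idealized case in which all linking symbols were removed in step 110 and every matching replacement goes to completion; the function name linking_states_after_replacement and the labels "ds" and "ss" are hypothetical, and each data block is modeled only as a (linking symbol, data symbol) pair of labels.

```python
def linking_states_after_replacement(gene_blocks, pool):
    """Return 'ds' or 'ss' for the linking symbol location of each data block.

    gene_blocks: ordered (linking symbol, data symbol) pairs along the gene.
    pool: set of (linking symbol, data symbol) pairs in the test symbol pool.
    A location reverts to double stranded ('ds') only when a matching test
    symbol exists in the pool; otherwise it stays single stranded ('ss').
    """
    return ["ds" if block in pool else "ss" for block in gene_blocks]


gene_blocks = [("LS1", "DS1"), ("LS2", "DS4"), ("LS1", "DS2")]
pool = {(ls, ds) for ls in ("LS1", "LS2") for ds in ("DS1", "DS2")}
states = linking_states_after_replacement(gene_blocks, pool)
print(states)
```

In this example the pool lacks any test symbol carrying DS4, so the second location remains single stranded while the first and third revert to double stranded.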
The specific manner in which the test symbol replaces the data symbol in the first strand of the DNA storage gene when the test symbol is complementary to the linking symbol/data symbol pair in the second strand of the DNA storage gene is not limited. In some embodiments, the mechanism for replacement is a toe-hold mediated strand displacement (TMSD) reaction.
Any manner of preparing the test symbol pool can be used, provided that the test symbol pool includes test symbols for every combination of the first set of data symbols and the first set of linking symbols selected for a given test symbol pool. In some embodiments, the single stranded test symbols are created from double stranded versions of the test symbols that are denatured, annealed to a methanol-responsive polymer anchor or magnetic particle anchor for capture, and then, following pull down and release from the polymer anchor, are refined to single stranded test symbols. In embodiments where the test symbols include an anchor symbol but where the anchor symbol interferes with toe-hold mediated strand displacement or is otherwise not helpful during subsequent scanning of the DNA storage gene, the anchor may be removed from the test symbols as part of preparing the test symbols that will make up the test symbol pool.
As discussed in greater detail below, the overall method 100 used for reading a DNA storage gene may use multiple test symbol pools, wherein the composition of each test symbol pool is different. In some embodiments, each test symbol pool includes a different set of data symbols. A different set of data symbols means that each set of data symbols is different from the other set of data symbols by at least one data symbol. Thus, some sets of data symbols will have common data symbols, but will not be completely overlapping. In some embodiments, a first set of data symbols used for a first test symbol pool may include DS1, DS2, DS3 and DS4, while a second set of data symbols used for a second test symbol pool may include DS1, DS2, and DS3. Thus, while both sets of data symbols include DS1, DS2 and DS3, the sets of data symbols are considered different because of the presence of DS4 in the first set of data symbols and the absence of DS4 in the second set of data symbols. The set of linking symbols between different test symbol pools remains the same.
In step 140, the DNA storage gene is scanned to identify whether each linking symbol in the DNA storage gene is single stranded DNA or double stranded DNA. The locations where a linking symbol remains single stranded following step 130 allow for the inference that the test symbol pool used in step 130 did not include a test symbol including the data symbol at the location on the DNA storage gene where the linking symbol remains single stranded. Similarly, the locations where a linking symbol has reverted back to double stranded following step 130 allow for the inference that the test symbol pool used in step 130 did include a test symbol including the data symbol at the location on the DNA storage gene where the linking symbol has reverted back to double stranded. As discussed in greater detail below, this information, along with the known composition of the test symbol pool used, can be used to determine the specific data symbol at each location along the DNA storage gene and thereby permit reading of the DNA storage gene. Any manner of scanning the DNA storage gene to identify double stranded or single stranded locations can be used. In some embodiments, a scanning device capable of distinguishing between double stranded DNA and single stranded DNA is used to carry out step 140.
In other embodiments, step 140 includes scanning the DNA storage gene to identify anchor tails of the test symbols that have replaced data symbols in the first strand of the DNA storage gene. As with detecting double stranded locations, this approach allows for inferences like those detailed above, where the presence of an anchor tail means a replacement has taken place and therefore the test symbol pool included a test symbol having a data symbol also included in the DNA storage gene. Any device capable of identifying the presence of such anchor tails can be used to carry out this version of step 140. In some embodiments, the anchor tails are magnetic particle anchors, which can be detected through the use of magnetic sensors.
In step 150, the location on the DNA storage gene where double stranded linking symbols are located (or where anchor symbols are identified) is recorded. Any manner of recording this information can be used, such as through the use of an associated computer system/processor used as part of carrying out the method 100. A computer system/processor may be in direct or indirect communication with the scanner used in step 140 so that location information is relayed from the scanner to the computer system/processor used to record location information. As discussed in greater detail below, this location information, along with information relating to the composition of the test symbol pool, can be used as part of reading/decoding the DNA storage gene.
In some embodiments, the method 100 is carried out multiple times, with each iteration of the method 100 being carried out using a test symbol pool with a different composition of test symbols. In some embodiments, the method 100 is carried out concurrently using different test symbol pools in order to expedite the overall process of reading the DNA storage gene. When location data is recorded from multiple iterations of the method 100 performed serially or in parallel, each iteration using a test symbol pool of a different test symbol composition, relatively basic logic can be used to identify the data symbol at each location of the DNA storage gene and thereby read the DNA storage gene.
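One non-limiting way to express such logic is candidate elimination with set operations, sketched below. The sketch assumes that every test symbol pool contains all linking symbols used in the gene, so that whether a location reverts to double stranded depends only on the data symbol at that location. The function name narrow_candidates and the data layout are hypothetical.

```python
def narrow_candidates(alphabet, runs):
    """Narrow the possible data symbols at each location across several runs.

    alphabet: all data symbols that may appear in the DNA storage gene.
    runs: list of (pool_data_symbols, double_stranded) pairs, one per run,
          where double_stranded[i] is True if location i was recorded as
          double stranded in that run.
    """
    if not runs:
        return []
    n_locations = len(runs[0][1])
    candidates = [set(alphabet) for _ in range(n_locations)]
    for pool_data_symbols, double_stranded in runs:
        pool_set = set(pool_data_symbols)
        for i, is_double in enumerate(double_stranded):
            if is_double:
                candidates[i] &= pool_set  # symbol must be in this pool
            else:
                candidates[i] -= pool_set  # symbol cannot be in this pool
    return candidates


alphabet = {"DS1", "DS2", "DS3", "DS4"}
runs = [
    ({"DS1", "DS2"}, [True, False, True]),
    ({"DS1", "DS3"}, [True, False, False]),
]
result = narrow_candidates(alphabet, runs)
print(result)
```

Here two runs suffice to leave exactly one candidate at each of the three locations (DS1, DS4 and DS2, respectively). Where more than one candidate remains at a location, further runs with additional pool compositions would be needed.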
With reference to
Following the replacement shown in
With reference to
The method 300 generally includes step 310 of providing a first DNA storage gene having one or more linking symbols removed from a first strand of the first DNA storage gene. In some embodiments, all linking symbols are removed from the first strand of the first DNA storage gene. The specific manner and details of removing linking symbols from the first DNA storage gene are similar or identical to step 110 described in greater detail above.
In step 320, a first test symbol pool is introduced to the DNA storage gene. The first test symbol pool includes a plurality of single stranded test symbols, each single stranded test symbol comprising one data symbol from a first set of data symbols and one linking symbol from a first set of linking symbols. In some embodiments, the test symbol pool includes single stranded test symbols for every combination of the first set of data symbols and the first set of linking symbols. Furthermore, the data symbols in the first set of data symbols and the linking symbols in the first set of linking symbols include only data symbols and linking symbols present in the DNA storage gene. The specific manner and details of introducing the first test symbol pool to the DNA storage gene can be similar or identical to step 120 described in greater detail above.
In step 330, data symbols in the first strand of the DNA storage gene are replaced with single stranded test symbols from the first test symbol pool when the data symbol and linking symbol in the single stranded test symbol are complementary, respectively, to an adjacent data symbol and linking symbol in the second strand of the DNA storage gene. Where these replacements take place, the linking symbol portion of the DNA storage gene reverts back to double stranded DNA. The replacement of a data symbol in the first strand of the DNA storage gene with a single stranded test symbol may be carried out via TMSD. The specific manner and details of replacing the data symbols in the first strand of the DNA storage gene with single stranded test symbols from the first test symbol pool can be similar or identical to step 130 described in greater detail above.
In step 340, the DNA storage gene is scanned to identify whether each linking symbol in the DNA storage gene is single stranded or double stranded. The presence of a double stranded linking symbol in the DNA storage gene indicates that the test symbol pool included a test symbol having the data symbol present at that location in the DNA storage gene. The specific manner and details of scanning the DNA storage gene can be similar or identical to step 140 described in greater detail above.
In step 350, the location on the DNA storage gene where the linking symbol is double stranded is recorded. This recorded location information is associated with the known composition of the test symbol pool used in step 320 so that it can later be used to read the DNA storage gene. The specific manner and details of recording the location information can be similar or identical to step 150 described in greater detail above.
Method 300 further includes steps 310A, 320A, 330A, 340A and 350A, which are essentially identical to steps 310, 320, 330, 340 and 350, respectively, but which use a second DNA storage gene and a second test symbol pool. In some embodiments, the second DNA storage gene is identical to the first DNA storage gene, including the removal of linking symbols from the same locations as in the first DNA storage gene (in embodiments where not all linking symbols are removed from the first DNA storage gene). The second test symbol pool introduced in step 320A has a different composition of test symbols from the composition of test symbols included in the first test symbol pool of step 320. Generally speaking, the second test symbol pool comprises test symbols made up of data symbols from a second set of data symbols and linking symbols from the same first set of linking symbols used for the first test symbol pool. The second set of data symbols is different from the first set of data symbols in that the second set of data symbols includes at least one data symbol not present in the first set of data symbols or excludes at least one data symbol present in the first set of data symbols. Thus, while the first set of data symbols and the second set of data symbols may share some data symbols, the two sets are not identical.
In step 360, the recorded locations from steps 350 and 350A, along with the known compositions of the first and second test symbol pools, are used to identify the data symbols, including their sequence, in the DNA storage gene. Relatively simple logic can be used to read the DNA storage gene based on this information and as described in greater detail below with respect to
In order to determine the data symbol at each data symbol location DS1-DS8 of the DNA storage gene 400, a protocol is introduced where different test symbol pools are introduced to the DNA storage gene. A binary system may be used to determine the composition of each test symbol pool used, where ones and zeros denote the presence or absence, respectively, of a possible data symbol present in the DNA storage gene 400. Thus, as shown in
Because only limited conclusions can be drawn from this information as it relates to identifying the data symbol at each data symbol location, the second test symbol pool is introduced to the DNA storage gene having all linking symbols removed. Generally, the DNA storage gene used with the second test symbol pool will be a fresh version of the DNA storage gene in that no replacements have yet taken place on the DNA storage gene to which the second test symbol pool is introduced.
The data collected from the first test symbol pool run and the second test symbol pool run can now be used together to identify the specific data symbol at each data symbol location in the DNA storage gene.
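By way of non-limiting illustration only, the binary notation for pool composition and the combination of two runs can be sketched as a signature lookup. The four-symbol alphabet A through D, the masks "1100" and "1010", and the scan results below are hypothetical values chosen for this sketch and are not taken from the figures; the function names are likewise assumptions. The sketch also assumes every pool contains all linking symbols used in the gene.

```python
def pool_from_mask(alphabet, mask):
    """Select the data symbols whose position in the binary mask is '1'."""
    return {ds for ds, bit in zip(alphabet, mask) if bit == "1"}


def read_by_signature(alphabet, masks, scans):
    """Identify the data symbol at each location from per-run scan results.

    masks: one binary string per test symbol pool; position k is '1' when
           alphabet[k] is present in that pool.
    scans: one list of booleans per run; True marks a location recorded
           as double stranded in that run.
    """
    signature_to_symbol = {}
    for k, ds in enumerate(alphabet):
        signature = tuple(mask[k] == "1" for mask in masks)
        if signature in signature_to_symbol:
            raise ValueError("masks do not distinguish every data symbol")
        signature_to_symbol[signature] = ds
    n_locations = len(scans[0])
    return [
        signature_to_symbol[tuple(scan[i] for scan in scans)]
        for i in range(n_locations)
    ]


alphabet = ["A", "B", "C", "D"]   # hypothetical data symbol identities
masks = ["1100", "1010"]          # first pool holds A, B; second holds A, C
pools = [pool_from_mask(alphabet, m) for m in masks]
scans = [
    [True, False, True, False],   # hypothetical first run
    [True, False, False, True],   # hypothetical second run
]
sequence = read_by_signature(alphabet, masks, scans)
print(pools, sequence)
```

Because each of the four hypothetical data symbols has a distinct presence/absence pattern across the two masks, two runs are enough in this sketch to identify the symbol at every location. A larger alphabet would generally call for more pools so that every symbol keeps a distinct pattern.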
While
With reference to
System 500 further includes one or more test symbol pool sources 530A, 530B, 530C, with each test symbol pool source 530A, 530B, 530C being in fluid communication with the reaction vessel 510 via tubing or piping or the like such that test symbol pools stored within the test symbol pool sources 530A, 530B, 530C can be delivered into the reaction vessel 510. While
The system 500 further includes a scanner 540 that is configured to scan the DNA storage gene located within the reaction vessel 510. More specifically, the scanner 540 is configured to scan the DNA storage gene and distinguish between single stranded and double stranded DNA in the DNA storage gene, detect a single stranded DNA overhang, or both. The scanner 540 may be positioned within the reaction vessel 510 as shown in
The above specification, examples, and data provide a complete description of the structure and use of example embodiments of the disclosed technology. Since many embodiments of the disclosed technology can be made without departing from the spirit and scope of the disclosed technology, the disclosed technology resides in the claims hereinafter appended. Furthermore, structural features of the different embodiments may be combined in yet another embodiment without departing from the recited claims.