The present disclosure belongs to the field of molecular computing and digital data storage/decoding. In particular, provided herein are systems and methods for data storage and readout using, for instance, hybrid nucleic acid-polymeric molecules.
As digital information continues to accumulate, higher-density and longer-term storage are necessary. Data storage capability has been a key aspect of humankind's latest technological developments, and facing the current and future “big data” explosion makes it a more compelling challenge than ever. Although research on semiconductors has dramatically improved the capacity of data storage in silicon devices, this technology cannot meet the exponential growth of demand for digital data production and storage. This gap is expected to keep widening, as the data storage density of silicon chips is limited and the magnetic tapes used to maintain large-scale permanent archives begin to deteriorate after 20 years.
As DNA has evolved to store genetic information at large scales, it has been proposed as an alternative medium for data storage, as it can provide both higher density and longer-term storage. Precise synthesis of sequence-encoded heteropolymers has recently opened the possibility of storing information at the molecular scale with ultrahigh density and long-term storage persistence. However, writing strategies are still complicated and reading options are not always practical.
The two essential requirements for molecular data storage at the single-molecule level are writing and sequence-reading mechanisms. For writing purposes, chemists have developed numerous synthetic methods of controlling sequences in chain-growth and step-growth polymerizations to build sequence-defined macromolecules in a straightforward and protecting-group-free way.
Meanwhile, the nanopore technique, a next-generation sequencing tool, has been explored as a fast way to read DNA sequences. When a single-stranded DNA (ssDNA) passes through the nanopore, its sequence can be characterized by the variation of ionic currents caused by the different nucleobases. More recently, nanopores have been explored for sensing digitally encoded DNA nanostructures.
Nanopore sensing is an approach that relies on the exploitation of individual binding or interaction events between to-be-analysed molecules and pore-forming macromolecules. Nanopore sensors can be created by placing nanometric-scaled pore peptide structures in an insulating membrane and measuring voltage-driven ionic transport through the pore in the presence of substrate molecules. The identity of a substrate can be ascertained through its peculiar electrical signature, particularly the duration and extent of current block and the variance of current levels. Two of the essential components of sequencing nucleic acids using nanopore sensing are (1) the control of nucleic acid movement through the pore and (2) the discrimination of nucleotides as the nucleic acid polymer is moved through the pore.
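By way of illustration only, the electrical signature mentioned above (duration of the current block, extent of the current block and variance of the current level) can be extracted from a sampled current trace as in the following minimal Python sketch. The names `BlockadeEvent` and `find_events`, the fixed threshold of 80% of the open-pore current and the assumption of a known open-pore current are all hypothetical choices made for this sketch; they are not taken from the present disclosure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BlockadeEvent:
    start_index: int       # index of the first sample below the threshold
    dwell_time_s: float    # duration of the current block, in seconds
    relative_block: float  # mean fractional current drop, (I0 - I) / I0
    variance: float        # variance of the current within the event


def find_events(trace: List[float], open_pore: float,
                sample_rate_hz: float, threshold: float = 0.8) -> List[BlockadeEvent]:
    """Return the events in which the current drops below threshold * open_pore."""
    events: List[BlockadeEvent] = []
    start = None
    # A sentinel sample at the open-pore level closes an event that runs to the end.
    for i, current in enumerate(trace + [open_pore]):
        blocked = current < threshold * open_pore
        if blocked and start is None:
            start = i
        elif not blocked and start is not None:
            segment = trace[start:i]
            mean = sum(segment) / len(segment)
            var = sum((x - mean) ** 2 for x in segment) / len(segment)
            events.append(BlockadeEvent(start, len(segment) / sample_rate_hz,
                                        (open_pore - mean) / open_pore, var))
            start = None
    return events
```

In practice the open-pore current and the threshold would have to be estimated from the recording itself; the sketch only shows which quantities make up an event signature.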
Pore-forming proteins are produced by a variety of organisms and are often involved in defense or attack mechanisms. One notable feature is that they are produced as soluble proteins that subsequently oligomerize and convert into a transmembrane pore in the target membrane. The most extensively characterized pore-forming proteins are the bacterial pore-forming toxins (PFTs), which, depending on the secondary structure elements that cross the bilayer, have been classified as α- or β-PFTs.
In the past, to achieve nucleotide discrimination the nucleic acid has been passed through a mutant of hemolysin (WO 2014/100481). This has provided current signatures that have been shown to be sequence dependent. It has also been shown that a large number of nucleotides contribute to the observed current when a hemolysin pore is used, making a direct relationship between observed current and polynucleotide challenging.
While the current range for nucleotide discrimination has been improved through mutation of the hemolysin pore, a sequencing system would have higher performance if the current differences between nucleotides could be improved further. In addition, it has been observed that when the nucleic acids are moved through a pore, some current states show high variance. It has also been shown that some mutant hemolysin pores exhibit higher variance than others. While the variance of these states may contain sequence specific information, it is desirable to produce pores that have low variance to simplify the system.
In another approach, mutant forms of lysenin, as well as analyte characterisation using such mutants, have been described (WO 2013/153359). Lysenin (also known as efLI) is a pore-forming toxin purified from the coelomic fluid of the earthworm Eisenia fetida. It specifically binds to sphingomyelin, which inhibits aerolysin-induced hemolysis. In still another approach, mutant forms of the pore-forming Msp monomer, as well as analyte characterisation using such mutants, have been described (WO 2012/107778).
Cao C. et al. (Nat. Nanotechnol. 2016 Apr. 25. doi: 10.1038/nnano.2016.66) demonstrated the ability of aerolysin nanopore to resolve at high resolution individual short oligonucleotides that are 2 to 10 bases long without any extra chemicals or modifications, useful for single-molecule analysis of oligonucleotides.
International patent application WO 2017/189914 discloses methods for controlled segregation of blocks of information encoded in the sequence of a biopolymer, such as nucleic acids and polypeptides, with rapid retrieval based on multiply addressing nanostructured data. In some embodiments, sequence controlled polymer memory objects include data-encoded biopolymers of any length or form encapsulated by natural or synthetic polymers and including one or more address tags. The sequence address labels are used to associate or select memory objects for sequencing readout, enabling organization and access of distinct memory objects or subsets of memory objects using Boolean logic. In some embodiments, a memory object is a single-stranded nucleic acid scaffold strand encoding bit stream information that is folded into a nucleic acid nanostructure of arbitrary geometry, including one or more sequence address labels.
International patent application WO 2018/081745 discloses methods, systems and devices for reading data stored in a polymer (e.g., DNA) and for verifying the sequence of a polymer synthesized in situ in a nanopore-based chip, said method comprising providing a resonator having an inductor and a cell, the cell having a nanopore and a polymer that can traverse through the nanopore, the resonator having an AC output voltage frequency response at a probe frequency in response to an AC input voltage at the probe frequency, providing the AC input voltage having at least the probe frequency, and monitoring the AC output voltage at least at the probe frequency, the AC output voltage at the probe frequency being indicative of the data stored in the polymer at the time of monitoring, wherein the polymer includes at least two monomers having different properties causing different resonant frequency responses.
The articles “Translocation of precision polymers through biological nanopores” (M. Boukhet et al., Macromolecular Rapid Communications, 38, 1700680, 2017), “Tuning Polymer-Protein Interactions with Salt” (M. Talarimoghari et al., Biophysical Journal, 112, 457a, 2017) and “Translocation of Sequence-controlled Synthetic Polymers through Biological Nanopores” (M. Boukhet et al., Biophysical Journal, 114, 182a, 2018) describe threading but not sequencing of macromolecular analytes in non-modified hemolysin and aerolysin nanopores.
There is still a need for alternative solutions with regards to molecular systems and methods for encoding, storing and decoding data information which are simple, robust, precise and reliable.
In order to address and overcome at least some of the above-mentioned drawbacks of the prior art solutions, the present inventors developed a brand new tool for encoding and decoding information having improved features and capabilities.
In particular, a first purpose of the present invention is that of providing a novel molecular medium able to encode information, such as in a bitstream format, which is relatively easy to synthesise, can be deciphered accurately and holds a high density of information.
A further purpose of the present invention is that of providing a method for encoding and decoding information based on a molecular data storage medium.
Still a further purpose of the present invention is that of providing a decoding system based on nanopore technology that can precisely and reliably decode information stored in a molecular data storage medium.
All those aims have been accomplished with the present invention, as described herein and in the appended claims.
Inspired by the recent progress presented in the previous background section, the present inventors encoded individual binary information in sequence-controlled DNA-polymer hybrid structures and decoded it using solid-state nanopores or biological nanopores based on the engineered pore-forming toxin aerolysin. In non-limiting embodiments detailed later in the present disclosure, through the rational and synergistic development of aerolysin mutants and the design of DNA nucleobases intercalated on sequence-encoded heteropolymers, the translocation speed of the hybrid molecule can be optimized to give a uniquely identifiable level-by-level signal, which delivers digital reading with single-bit resolution without compromising information density.
Using, in one embodiment, a deep learning strategy to process the current signal, the present inventors demonstrated the ability of engineered aerolysin nanopores to accurately read the information encoded in hybrid DNA-polymer molecules, alone and in mixed samples. These findings open promising possibilities to develop writing-reading techniques to process digital data using a biologically inspired platform. In embodiments of the invention, the molecular data storage medium was designed in a binary format, with n-propyl-phosphate representing bit-0 and (2,2-dipropargyl)-propyl-phosphate representing bit-1. Each bit, as well as each DNA base, is characterized by peculiar current levels. By using deep learning, the reading accuracies of 1-bit, 2-bit, 3-bit and 4-bit barcodes were assessed at 98.7%, 96.4%, 95.0% and 76.9%, respectively, thereby demonstrating the ability of nanopores to act as polymer sequence decoders and opening the avenue for the further design of polymers specific for a particular reading pore.
In view of the above, according to the present invention there is provided a molecular data storage medium according to claim 1.
Another object of the present invention relates to a method for encoding a bitstream-format information in a molecular data storage medium according to claim 6.
Still another object of the present invention relates to a nanopore-based device for reading data stored in a molecular data storage medium according to claim 8.
Still another object of the present invention relates to a method for decoding a bitstream-format information encoded in the molecular data storage medium according to claim 13.
Further embodiments of the present invention are defined by the appended claims.
The above and other objects, features and advantages of the herein presented subject-matter will become more apparent from a study of the following description with reference to the attached figures showing some preferred aspects of said subject-matter.
Wt: AA00200AA polymer (‘2’ is the non-zero bit depicted in
The subject-matter herein described will be clarified in the following by means of the following description of those aspects which are depicted in the drawings. It is however to be understood that the subject matter described in this specification is not limited to the aspects described in the following and depicted in the drawings; to the contrary, the scope of the subject-matter herein described is defined by the claims. Moreover, it is to be understood that the specific conditions or parameters described and/or shown in the following are not limiting of the subject-matter herein described, and that the terminology used herein is for the purpose of describing particular aspects by way of example only and is not intended to be limiting.
Unless otherwise defined, technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Further, unless otherwise required by the context, singular terms shall include pluralities and plural terms shall include the singular. The methods and techniques of the present disclosure are generally performed according to conventional methods well known in the art and as described in various general and more specific references that are cited and discussed throughout the present specification unless otherwise indicated. Further, for the sake of clarity, the use of the term “about” is herein intended to encompass a variation of +/- 10% of a given value.
The following description will be better understood by means of the following definitions.
As used in the following and in the appended claims, the singular forms “a”, “an” and “the” include plural referents unless the context clearly dictates otherwise. Also, the use of “or” means “and/or” unless stated otherwise. Similarly, “comprise”, “comprises”, “comprising”, “include”, “includes” and “including” are interchangeable and not intended to be limiting. It is to be further understood that where for the description of various embodiments use is made of the term “comprising”, those skilled in the art will understand that in some specific instances, an embodiment can be alternatively described using language “consisting essentially of” or “consisting of.”
In the frame of the present disclosure, the expression “operatively connected” and similar reflects a functional relationship between the several components of the device or a system among them, that is, the term means that the components are correlated in a way to perform a designated function. The “designated function” can change depending on the different components involved in the connection. A person skilled in the art would easily understand and figure out what are the designated functions of each and every component of the device or the system of the invention, as well as their correlations, on the basis of the present disclosure.
The term “nucleotide” refers to a molecule that contains a nitrogen-containing heterocyclic base (also referred to as “nucleobase”), a sugar or a modified sugar and one or more phosphate groups. For example, in some embodiments, a nucleotide can be a deoxynucleotide triphosphate (dNTP). The term “non-natural nucleotide” as used herein refers to a nucleotide that obeys Watson-Crick base pairing but has a modification that can be detected. By way of example, but not limitation, such a modification can be a functional group attached to the nucleobase such as a methyl group on methylcytosine.
As used herein, the terms “nucleic acid molecule,” “nucleic acid sequence,” “nucleic acid fragment,” “oligonucleotide” and “polynucleotide” are used interchangeably and refer to biopolymers that are made from nucleotides as monomer units. The nucleotide monomers link up to form a linear sequence of the nucleic acid polymer. Nucleic acids encompassed by the present disclosure can include deoxyribonucleic acid (DNA), ribonucleic acid (RNA), cDNA or a synthetic nucleic acid known in the art, such as glycerol nucleic acid (GNA), threose nucleic acid (TNA), locked nucleic acid (LNA) or other synthetic polymers with nucleotide side chains, or any combination thereof. For the sake of simplicity, peptide nucleic acids (PNAs), artificially synthesized polymers similar to DNA or RNA, are also included in the definition of oligonucleotides according to the invention.
Nucleotide subunits of nucleic acids can be naturally occurring, artificial, or modified. As indicated above, a nucleotide typically contains a nucleobase, a sugar, and at least one phosphate group. The nucleobase is typically heterocyclic. Suitable nucleobases include the canonical purines and pyrimidines, and more specifically adenine (A), guanine (G), thymine (T) (or typically in RNA, uracil (U) instead of thymine (T)), and cytosine (C). The sugar is typically a pentose sugar. Suitable sugars include, but are not limited to, ribose and deoxyribose. The nucleotide is typically a ribonucleotide or deoxyribonucleotide. The nucleotide typically contains a monophosphate, diphosphate or triphosphate. These are generally referred to herein as nucleotides or nucleotide residues to indicate the subunit. Without specific identification, the terms nucleotides, nucleotide residues, and the like are not intended to imply any specific structure or identity.
As indicated above, the nucleic acids of the present disclosure can also include synthetic variants of DNA or RNA. “Synthetic variants” encompass nucleic acids incorporating known analogs of natural nucleotides/nucleobases that e.g. can hybridize to nucleic acids in a manner similar to naturally occurring nucleotides. Exemplary synthetic variants include peptide nucleic acids (PNAs), phosphorothioate DNA, locked nucleic acids, and the like. Modified or synthetic nucleobases and analogs can include, but are not limited to, 5-Br-UTP, 5-Br-dUTP, 5-F-UTP, 5-F-dUTP, 5-propynyl-dCTP, 5-propynyl-dUTP, diaminopurine, S2T, 5-fluorouracil, 5-bromouracil, 5-chlorouracil, 5-iodouracil, hypoxanthine, xanthine, 4-acetylcytosine, 5-(carboxyhydroxylmethyl)uracil, 5-carboxymethylaminomethyl-2-thiouridine, 5-carboxymethylaminomethyluracil, dihydrouracil, beta-D-galactosylqueosine, inosine, N6-isopentenyladenine, 1-methylguanine, 1-methylinosine, 2,2-dimethylguanine, 2-methyladenine, 2-methylguanine, 3-methylcytosine, 5-methylcytosine, N6-adenine, 7-methylguanine, 5-methylaminomethyluracil, 5-methoxyaminomethyl-2-thiouracil, beta-D-mannosylqueosine, 5′-methoxycarboxymethyluracil, 5-methoxyuracil, 2-methylthio-N6-isopentenyladenine, uracil-5-oxyacetic acid (v), wybutoxosine, pseudouracil, queosine, 2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil, 4-thiouracil, 5-methyluracil, uracil-5-oxyacetic acid methylester, 3-(3-amino-3-N-2-carboxypropyl) uracil, (acp3)w, 2,6-diaminopurine and the like. Persons of ordinary skill in the art can readily determine what base pairings for each modified nucleobase are deemed a base-pair match versus a base-pair mismatch.
The term “payload” refers to the actual body of data for transmission or for storage or computation. For example, in nucleic acid memory storage, the payload is encoded in the specified nucleotide sequence. The terms “desired data”, “desired information” or “desired media” are used interchangeably to specify the payload information that is contained in the bit stream encoded sequence within a given memory object.
The term “bit” is a contraction of “binary digit”. Commonly “bit” refers to a basic capacity of information in computing and telecommunications. A “bit” conventionally represents either 1 or 0 (one or zero) only, though other higher-order codes can be used with e.g. 2, 4, 6, 8 or more different unit possibilities at every position.
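As a purely illustrative worked relation, and not as a statement of the capacity of any particular embodiment, the amount of information carried by one position of a code with $k$ distinguishable unit possibilities, and by a payload of $n$ such positions, can be written as follows (the symbols $k$, $n$, $I_{\text{position}}$ and $I_{\text{payload}}$ are introduced here only for this example):

```latex
I_{\text{position}} = \log_2 k \ \text{bits}, \qquad
I_{\text{payload}} = n \log_2 k \ \text{bits}

% Examples:
%   k = 2  ->  1 bit per position (binary code)
%   k = 4  ->  2 bits per position
%   k = 6  ->  \log_2 6 \approx 2.585 bits per position
%   k = 8  ->  3 bits per position
```

A binary payload of four positions therefore carries 4 bits, whereas a four-position payload built from eight distinguishable units would carry 12 bits, provided all units can be told apart on readout.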
The term “bit stream encoded sequence” as used herein relates to any natural or synthetic sequence-controlled polymer sequence that encodes for data to be stored in a so-called “bitstream-format media”. A “bitstream format” is the format of the data found in a stream (or sequence) of bits used in a digital communication or data storage application. For example, when nucleic acid is used to store data, the “bit stream encoded sequence” is the nucleic acid sequence, either synthetically obtained or naturally occurring, that corresponds to the data that is encoded in a bitstream format.
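To make the notion of a bitstream format concrete, the following minimal Python sketch converts a block of bytes into a stream of bits and back. The function names `bytes_to_bitstream` and `bitstream_to_bytes`, and the choice of eight bits per byte with the most significant bit first, are assumptions made for this example only; the present disclosure does not prescribe a particular byte-to-bit convention.

```python
def bytes_to_bitstream(data: bytes) -> str:
    """Return the data as a string of '0'/'1' characters, 8 bits per byte, MSB first."""
    return "".join(format(byte, "08b") for byte in data)


def bitstream_to_bytes(bits: str) -> bytes:
    """Inverse of bytes_to_bitstream; the length must be a multiple of 8."""
    if len(bits) % 8 != 0 or set(bits) - {"0", "1"}:
        raise ValueError("bitstream must contain only 0/1 and have a length divisible by 8")
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
```

Each character of the resulting string would then correspond to one unit of a bit stream encoded sequence.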
The terms “sequence-controlled polymer”, “sequence-defined polymer”, “sequence-specific polymer” or “sequence-controlled macromolecule”, as used herein, refer to a macromolecule that is composed of two or more distinct monomers sequentially arranged in a specific, non-random manner, as a polymer “chain”. The arrangement of the two or more distinct monomers constitutes a precise molecular “signature”, or “code” within the polymer chain, particularly in the payload of the molecules of the present disclosure. Sequence controlled polymers can be biological polymers (i.e., biopolymers), or synthetic polymers. Exemplary sequence-controlled biopolymers include natural and/or synthetic nucleic acids, polypeptides or proteins, linear or branched carbohydrate chains, or other sequence controlled polymers that encode a format of information. Exemplary sequence controlled polymers are described in Lutz et al., Science, 341, 1238149 (2013).
As used herein, a “header” refers to supplemental data placed at the beginning of a block of data being stored or transmitted. In the frame of the present disclosure, a header refers to a molecular header, i.e. a supplemental molecular signature, such as a monomer or a polymer, which is placed at the beginning of a sequence-controlled polymer payload storing information to be transmitted. In the same way, a “footer” refers herein to supplemental data placed at the end of a block of data being stored or transmitted. In the frame of the present disclosure, a footer refers to a molecular footer, i.e. a supplemental molecular signature, such as a monomer or a polymer, which is placed at the end of a sequence-controlled polymer payload storing information to be transmitted. Molecular headers and footers according to the invention can preferably but not exclusively be nucleic acid units selected from a list comprising a mononucleotide, a dinucleotide and a nucleic acid sequence such as an oligonucleotide or a polynucleotide. Additionally or alternatively, molecular headers and footers can be other kind of monomers or polymers composed therefrom such as amino acids, oligo- or polypeptides, linear or branched carbohydrate or carbohydrate chains, as well as synthetic chemical entities, or synthetic variants of any of the foregoing.
The term “molecular data storage medium” refers to an object that includes a bit stream-encoded sequence-controlled polymer as a payload of information, at least one header and at least one footer as defined. The bit stream-encoded sequence includes a discrete piece of data, and the at least one header and at least one footer enable selection, organization, and/or isolation of the molecular data storage medium. In some embodiments, molecular data storage media include bitstream-encoded sequence in the form of a continuous stretch of sequence-controlled polymer. In some embodiments, molecular data storage media include discontinuous segments of sequence.
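The header/payload/footer layout defined above can be sketched, for illustration only, as a small Python data structure. The class `StorageMedium`, the function `parse_medium` and the flat text notation (nucleotide letters for header and footer, digits 0/1 for the payload, e.g. "AA0110AA") are hypothetical conventions adopted for this sketch. The two monomer names are those given for bit-0 and bit-1 in the binary embodiment described earlier in this disclosure.

```python
import re
from dataclasses import dataclass
from typing import List

# Monomers named in the disclosure for the binary embodiment.
MONOMERS = {
    "0": "n-propyl-phosphate",
    "1": "(2,2-dipropargyl)-propyl-phosphate",
}

# Assumed text notation: nucleotide header, binary payload, nucleotide footer.
_PATTERN = re.compile(r"([ACGT]+)([01]+)([ACGT]+)")


@dataclass(frozen=True)
class StorageMedium:
    header: str   # nucleic acid unit(s) upstream of the payload
    payload: str  # bit stream encoded sequence, one character per bit
    footer: str   # nucleic acid unit(s) downstream of the payload

    def to_string(self) -> str:
        return self.header + self.payload + self.footer

    def payload_monomers(self) -> List[str]:
        """Monomer names making up the payload, in order."""
        return [MONOMERS[bit] for bit in self.payload]


def parse_medium(text: str) -> StorageMedium:
    """Split the assumed flat notation into header, payload and footer."""
    match = _PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a header/payload/footer string: {text!r}")
    return StorageMedium(*match.groups())
```

Because the header and footer use a different alphabet from the payload in this notation, the payload boundaries can be located without any length field, which mirrors the role of chemically distinct flanking units.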
A “nanopore” is any structure comprising and/or defining a pore having a diameter of less than 1 micron, typically between 1 and 20 nm in diameter, for example between 2 and 5 nm in diameter. By way of example, for the sake of providing reference dimensions, single stranded DNA can pass through a 2 nm nanopore, whereas double stranded DNA can pass through a 4 nm nanopore. A very small nanopore, e.g. of 2-5 nm, allows a biomolecule such as DNA to pass through, but not larger molecular entities such as proteinaceous complexes or enzymes, thereby allowing for controlled passage of polymeric biomolecules or charged polymers in general.
Different types of nanopores are known. For example, biological nanopores are formed by assembly of (a) pore-forming protein(s) in a membrane such as a lipid bilayer. For example, α-hemolysin and similar protein pores are found naturally in cell membranes, where they act as channels for ions or molecules to be transported in and out of cells, and such proteins can be repurposed as nanochannels. Solid-state nanopores are formed in synthetic materials such as silicon nitride or graphene, by configuring holes in the synthetic membrane, e.g. using feedback controlled low energy ion beam sculpting (IBS) or high energy electron beam illumination. Hybrid nanopores can be made by embedding a pore-forming protein in synthetic materials.
Where there is a means for applying an electrical potential at either end or either side of a nanopore, e.g. via electrodes, a current flow may be established through the nanopore via an electrolyte medium. Electrodes may be made of any conductive material, for example silver, gold, platinum, copper or titanium dioxide, or for example silver coated with silver chloride. The flow of materials across a nanopore may also be regulated by electrodes; for example, as biomolecules are electrically charged, or may become electrically charged depending on factors such as the pH of the medium they are in (e.g., DNA and RNA are negatively charged in many buffer media), they will be drawn to a positively charged electrode upon application of an electrical voltage across the nanopore. In the event a polymer, such as a sequence-controlled polymer, passes through the nanopore, the change in electric potential, capacitance or current across the nanopore caused by the partial blockage of said nanopore can be detected and used to identify the sequence of monomers in the polymer, wherein different monomers can be distinguished by their different sizes and/or electrostatic potentials.
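As a simplified illustration of how distinguishable current levels could be mapped back to a sequence of monomers, the Python sketch below assigns each measured level to the closest reference level. This nearest-level lookup is a deliberately simple stand-in chosen for this example and is not the deep learning strategy mentioned elsewhere in this disclosure; the function name `call_monomers` and the reference values used in the usage note are hypothetical.

```python
from typing import Dict, List


def call_monomers(level_means: List[float], reference_levels: Dict[str, float]) -> str:
    """Assign each measured current level to the symbol with the closest reference level."""
    if not reference_levels:
        raise ValueError("at least one reference level is required")
    calls = []
    for level in level_means:
        # Pick the symbol whose reference level is nearest to the measured level.
        symbol = min(reference_levels, key=lambda s: abs(reference_levels[s] - level))
        calls.append(symbol)
    return "".join(calls)
```

For instance, with made-up relative residual currents of 0.55 for bit-0 and 0.35 for bit-1, `call_monomers([0.54, 0.36, 0.33, 0.60], {"0": 0.55, "1": 0.35})` yields the string "0110". Real traces would first require segmentation into levels, which is outside the scope of this sketch.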
Methods for configuring a solid state nanopore, a biological nanopore or a hybrid nanopore in membranes or substrates are known in the art, a review of which can be found for instance in Haque, Farzin et al. “Solid-State and Biological Nanopore for Real-Time Sensing of Single Chemical and Sequencing of DNA” Nano today vol. 8, 1 (2013): 56-74, incorporated herein in its entirety by reference.
The terms “membrane”, “film” or “thin film” can be used interchangeably and relate to the thin form factor of an element of the device of the invention. Generally speaking, a “membrane”, “film” or “thin film” as used herein relates to a layer of a material having a thickness much smaller than its other dimensions, e.g. at most one fifth of the other dimensions. Typically, a film is a solid layer having a first surface and an opposed second surface, with any suitable shape, and a thickness generally in the order of nanometers or micrometers, depending on the needs and circumstances, e.g. the manufacturing steps used to produce it. In some embodiments, films according to the invention have a thickness comprised between 0.1 nm and 500 μm, such as between 0.3 and 10 nm, between 1 and 50 nm, between 20 and 100 nm, between 200 and 500 nm, between 50 nm and 1 μm, between 1 and 50 μm, between 50 μm and 150 μm, between 100 μm and 500 μm or between 200 μm and 500 μm.
In embodiments of the invention, a membrane or thin film can be made of a silicon material, for example silicon dioxide or silicon nitride. Silicon nitride (e.g., Si3N4) is especially desirable for this purpose because it is chemically relatively inert and provides an effective barrier against diffusion of water and ions even when only a few nm thick. Silicon dioxide is also useful, because it is a good surface to chemically modify. Alternatively, in certain embodiments, a membrane or thin film may be made in whole or in part out of materials which can form sheets as thin as a single molecule (sometimes referred to as “single layered” membrane, “monolayer” membrane or “2D” and “two dimensional” sheet or membrane), for example and without limitation: graphene; GaS; GaSe; GaTe; MX2 type of dichalcogenides where M=Mo, Nb, Ni, Sn, Ti, Ta, Pt, V, W, or Hf and X=S, Se, or Te; M2X3 type of trichalcogenides where M=As, Bi, or Sb and X=S, Se, or Te; MPX3 where X=S or Se; MAX3 where A=Si or Ge and X=S, Se, or Te; and alloy sheets like MxM′1-xS2, as well as combinations of any of the foregoing. Accordingly, suitable materials include molybdenum disulfide (MoS2), molybdenum diselenide (MoSe2), molybdenum ditelluride (MoTe2), tungsten disulfide (WS2), tungsten diselenide (WSe2), tungsten ditelluride (WTe2), chromium disulfide (CrS2), chromium diselenide (CrSe2), chromium ditelluride (CrTe2), gallium arsenide, germanium, boron nitride (hBN) and gallium indium phosphide.
A “two-dimensional” or “2D” layer, sheet, polymer, film, membrane and the like is a sheet-like macromolecule or crystal having a thickness in the order of a single molecule (monomolecular) layer, i.e. of a few nanometers or less, and is therefore not retrievable in nature as a free-standing structure. The best-known example of a two-dimensional crystal is graphene, an individual, atomically thin layer or sheet of graphite. However, in a broader sense, a 2D structure may comprise more than one monolayer, such as two or three stacked monomolecular layers, and still be considered as two-dimensional in nature. Two-dimensional materials, sometimes also referred to as layered materials, may comprise laterally connected repeat units (monomers) or may be composed of a single or a few atomic elements. These materials have found use in applications such as photovoltaics, semiconductors, electrodes and water purification, to cite a few. Layered combinations of different 2D materials are generally called van der Waals heterostructures, and are contemplated in the frame of the present invention.
The term “unit” as used herein refers to a basic element identical or equivalent in function or form to other elements of the same kind, and by comparison with which any other quantity of the same kind is measured or estimated. For instance, one unit of a chemical species herein means the single element of said chemical species that forms a base unit of measure to determine the nature of said chemical species. For instance, a nucleic acid unit can be a nucleotide, a dinucleotide, an oligonucleotide such as a sequence of 3, 4, 5, 6 or more nucleotides, a polynucleotide etc., whereas a peptide unit can be one amino acid, a dipeptide, an oligopeptide, a polypeptide etc. The same is true for any kind of chemical species mutatis mutandis, as well as variants thereof. Units according to the invention can also represent bits in a bit stream-encoded sequence-controlled polymer payload.
According to a main aspect, the present invention discloses a molecular data storage medium comprising:
This first aspect of the invention is based on the consideration and intuition that the design of a data storage medium typically used in information technology is highly convenient when translated into a molecular data storage setting. In particular, the present inventors designed and synthesized a “hybrid” molecule comprising a payload, carrying information to be stored and decoded, operatively linked to an upstream header and a downstream footer, wherein the payload comprises a polymeric chain of a chemical species different from the chemical species forming both the header and the footer. This design and implementation provide technical and functional advantages for a molecular data storage and decoding approach. In the approaches exploited in the prior art, nucleic acid molecules have typically been used and adapted in several possible ways (including 3D structures and non-classical folding, coupling with luminescent labels, modification with functional or bulky groups etc.), or nucleic acids and amino acids have been used in the same molecule to obtain some technical advantage. By contrast, the presence of a header and a footer which are chemically distinct from the sequence-controlled polymeric payload makes it possible to 1) direct and orient the molecular data storage medium towards a decoding spot including a nanopore, 2) easily and advantageously synthesize the molecule with readily available and low-cost synthesis approaches and 3) easily distinguish, thanks to their different chemical nature, the encoded data of the payload from the header and the footer, thereby facilitating the decoding of the information whenever needed.
In some embodiments, the header and the footer each comprise or consist of at least one nucleic acid unit as defined before, and the sequence-controlled polymeric chain payload comprises or consists of a non-nucleic acid polymer chain. Non-nucleic acid polymer chain may include amino acids, oligo- or polypeptides, synthetic monomers or polymers, linear or branched carbohydrate chains and the like. In still another embodiment, the sequence-controlled polymeric chain payload comprises or consists of a non-natural nucleic acid polymer chain. The inventors have implemented a series of such sequence-controlled polymeric chains, tailoring in particular the constituting monomers and their chemistry in order to have optimized performances when decoding the payload through a nanopore-based device. Some exemplary monomers are depicted in
In embodiments of the invention, the header and the footer have the same chemical nature, i.e. they are composed of the same chemical species. In embodiments, the header and the footer have the same length. In embodiments, the header and the footer are composed of the same number of units. In embodiments, the units of the header and the footer are the same. By way of example, the header and the footer may comprise one or more mononucleotide, dinucleotide, oligonucleotide or polynucleotide units, such as for instance a dinucleotide unit (e.g. AA, CC, GG, TT etc.). In embodiments envisaging a header and/or a footer comprising nucleic acid units, said nucleic acid may contain only two base types and no bases capable of self-hybridizing, e.g., wherein the DNA comprises adenines and guanines, adenines and cytosines, thymidines and guanines, or thymidines and cytosines.
In embodiments of the invention, said header and/or said footer may comprise a unit of a first chemical species having a sequence complementary to the unit of a first chemical species of a header and/or a footer of a second molecular data storage medium. The complementarity of sequences in headers and/or footers may allow the association of molecular data storage media of the invention into larger super-structures based on a pool of memory media, and enable physical association in supra-memory blocks for networking and/or spatially segregating blocks of related information, so as to allow, for instance, rapid retrieval of said pool of memory information by a decoding system. Typically, assembly occurs through complementary sequences on overhangs, through a bridging oligonucleotide (splint strand) in case said first chemical species is a nucleic acid, or through protein or chemical adducts to overhangs. The super-structured molecular data storage media can be specifically dissociated and re-grouped by using external signals as desired by the user. Exemplary external signals used to control dissociation include changing the pH, lowering the salt concentration in a molecule-containing buffer, increasing the temperature, applying electromagnetic radiation, toe-hold strand displacement, complementary strand excess, enzymatic release by restriction nucleases, nickases, helicases or resolvases, release using a UV-sensitive linker, using CRISPR/Cas9 and guide RNAs, or any combination thereof.
In embodiments, the molecular data storage media according to the invention comprise sequence-controlled polymeric chain payloads in which each monomer composing the same encodes for one or more bits of a bitstream-format media, such as 2 bits/monomer, 3 bits/monomer or higher. In one embodiment, data storage media according to the invention comprise sequence-controlled polymeric chain payloads in which each monomer composing the same encodes for a single bit of a bitstream-format media. Advantageously, in embodiments said sequence-controlled polymeric chain is composed of a sequence of two or more types of monomers, i.e. two distinct monomers of the same chemical species, thereby having a plurality of monomers arranged in sequence to correspond to a binary code. The use of only two, distinct monomers, one representing bit-0 and the other representing bit-1, facilitates at the same time the synthesis of the polymers, the encoding and the decoding of information, inter alia, thereby permitting operations similar to bitstream format memory data typically used in information technology. The use of more than two monomers is also possible, as it may improve the storage density of the molecular data storage media. The bit stream may also be improved by the use of error-correcting codes and data compression methods.
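The one-bit-per-monomer scheme described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, a dinucleotide "AA" header and footer and the symbols "0" and "1" standing for the bit-0 and bit-1 monomers; the function names are hypothetical and no error-correcting code or compression is applied.

```python
# Illustrative assumptions: "AA" header/footer and "0"/"1" monomer symbols.
HEADER = "AA"
FOOTER = "AA"


def bytes_to_bits(data: bytes) -> str:
    """Turn bytes into a bit string, most significant bit first."""
    return "".join(f"{byte:08b}" for byte in data)


def encode_payload(data: bytes) -> str:
    """Return header + one monomer symbol per bit + footer."""
    return HEADER + bytes_to_bits(data) + FOOTER


def decode_payload(medium: str) -> bytes:
    """Strip header and footer, then rebuild bytes from the monomer symbols."""
    if not (medium.startswith(HEADER) and medium.endswith(FOOTER)):
        raise ValueError("missing header or footer")
    bits = medium[len(HEADER):len(medium) - len(FOOTER)]
    if len(bits) % 8 != 0 or set(bits) - {"0", "1"}:
        raise ValueError("payload is not a whole number of 8-bit groups")
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
```

For example, `encode_payload(b"A")` yields the string `AA01000001AA`, and `decode_payload` recovers the original byte. A scheme with more than two monomer types would map several bits to each symbol instead.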
As it will be apparent, a second aspect of the present invention concerns a method for encoding a bitstream-format information in a molecular data storage medium, comprising the steps of:
The present invention is further directed to systems and methods for digital data decoding, said digital data being encoded in a molecular data storage medium according to the invention. In particular, the invention features a system adapted and configured for reading data stored in a molecular data storage medium. Even more particularly, the invention features a nanopore-based device adapted and configured for reading data stored in a molecular data storage medium, said nanopore-based device comprising:
Preferably, the device further comprises means for recording and analysing an electrical current. The membrane can be either a solid state membrane or a biological membrane, such as a lipid bilayer. In embodiments of the invention, the membrane comprises an array of nanopores, and the device can be accordingly configured to record and analyse an electrical current obtainable from more than one nanopore.
The design and technical features of the device are tightly linked to, and based upon, a method for decoding a bitstream-format information encoded in the molecular data storage medium according to the invention, which represents a further aspect of the present disclosure. In one embodiment, said method comprises the steps of:
The device of the invention comprises at least two chambers separated by one or more nanopores, wherein each chamber is configured to comprise an electrolytic fluid and one or more electrodes to draw an electrically charged polymer according to the invention from one chamber to another. The device may optionally be configured with functional elements to guide, channel and/or control the molecular data storage medium of the invention, it may optionally be coated or made with materials selected to allow smooth molecule flow, and it may comprise for instance circuit elements to provide and control electrodes proximate to the nanopores. For example, the one or more nanopores may optionally each be associated with electrodes which can control the movement of the polymer through the nanopore and/or detect changes in electrical potential, current, resistance or capacitance at the interface of the nanopore and the polymer, thereby detecting the sequence of the polymer as it passes through the one or more nanopores. As the polymer passes through the nanopore, the change in electrical potential, capacitance or current across the nanopore caused by the partial blockage of the nanopore can be detected and used to identify the sequence of monomers in the polymer, as the different monomers can be distinguished by their different sizes and electrostatic potentials.
Accordingly, the methods of the invention involve the measuring of a current passing through the pore as the substrate, such as a target molecular data storage medium, moves with respect to the pore. Suitable conditions for measuring ionic currents through transmembrane protein pores are known in the art. The method is typically carried out with a voltage applied across the membrane and pore. It is possible to increase discrimination between different monomers of the substrate by a pore by e.g. using an increased applied potential.
The current needed to move a charged polymer through the nanopore depends on, e.g., the nature of the polymer, the size of the nanopore, the material of the membrane containing the nanopore and/or the salt concentrations, and so needs to be optimized for the particular system depending on the needs and circumstances. In the case of the hybrid polymeric molecules as used in the present invention, examples of voltage and current would be, e.g., between −300 and +300 mV, typically between 80 and 140 mV, and between −250 and 250 pA, e.g., between 40 and 120 pA, with salt concentrations in the range of 0.1 to 10 M.
The methods are typically carried out in the presence of any charge carriers, such as metal salts, for example alkali metal salt, halide salts, for example chloride salts, such as alkali metal chloride salt. Charge carriers may include ionic liquids or organic salts, for example tetramethyl ammonium chloride, trimethylphenyl ammonium chloride, phenyltrimethyl ammonium chloride, or 1-ethyl-3-methyl imidazolium chloride. In the exemplary apparatus discussed above, the salt is present in the aqueous solution in the chamber. Potassium chloride (KCl), lithium chloride (LiCl), sodium chloride (NaCl) or caesium chloride (CsCl) is typically used. The salt concentration may be at saturation. High salt concentrations provide a high signal to noise ratio and allow for currents indicative of the presence of a nucleotide to be identified against the background of normal current fluctuations.
The methods are typically carried out in the presence of a buffer. In the exemplary apparatus discussed above, the buffer is present in the aqueous solution in the chamber. Any suitable buffer may be used in the method of the invention. Typically, the buffer is HEPES. Another suitable buffer is Tris-HCl buffer. The methods are typically carried out at a pH of from 3.0 to 12.0, preferably about 7.5.
The methods may be carried out at temperatures from 0° C. to 100° C., such as from 15° C. to 95° C., from 16° C. to 90° C., from 17° C. to 85° C., from 18° C. to 80° C., 19° C. to 70° C., or from 20° C. to 60° C. The methods are typically carried out at room temperature. The methods are optionally carried out at a temperature that supports enzyme function, such as about 37° C.
In one embodiment, the step of recording and analysing an electrical current comprises measuring a relative current distribution I/I0, wherein I0 is the value of the open nanopore current and I is the residual current value during the passage of said molecular data storage medium through said nanopore.
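As a minimal Python sketch of this measure, the relative current of one blockade event can be taken as the mean residual current divided by the open-pore current. Using the mean of the sampled current as I is an assumption made here for illustration, and the function names are hypothetical.

```python
from statistics import mean


def relative_current(event_samples_pa, open_pore_current_pa):
    """Return I/I0 for one event: mean residual current over open-pore current."""
    if open_pore_current_pa <= 0:
        raise ValueError("open-pore current must be positive")
    return mean(event_samples_pa) / open_pore_current_pa


def relative_current_distribution(events_pa, open_pore_current_pa):
    """Return the list of I/I0 values for a collection of events."""
    return [relative_current(event, open_pore_current_pa) for event in events_pa]
```

With an open-pore current of 80 pA, an event whose residual current averages 20 pA has I/I0 = 0.25; a histogram of such values over many events gives the relative current distribution.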
In order to implement the methods of the invention (particularly a method for decoding a bitstream-format information encoded in a molecular data storage medium), the system may comprise an operatively coupled computing device configured to control the operation of the system, said computing device comprising a processing unit and a memory encoding instructions that, when executed, cause the processing unit to control at least one of the means to provide a voltage and the means for recording and analyzing an electrical current. The computing device may include one or more processing units and computer readable media. Computer readable media includes physical memory such as volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.) or some combination thereof. Additionally, the computing device can include mass storage (removable and/or non-removable) such as magnetic or optical disks or tape. An operating system and one or more application programs can be stored on the mass storage device. The computing device can further include input devices (such as a keyboard and mouse) and output devices (such as a monitor), if needed.
In sequencing devices known in the art, a nanopore is typically used in a fluid-filled cell to read data, usually DNA, by measuring the change in current as the DNA passes through the nanopore; such changes are typically in the range of nano-amps. Accordingly, it is very difficult to detect such small changes reliably and repeatably, as they are difficult to distinguish from typical background noise. The difficulties are compounded by the fact that charged polymers like DNA can move through a nanopore at a rate of approximately one million bases per second, which is too fast to be read accurately using means known in the art, requiring the use of protein nanopores which slow the passage of DNA through the nanopore itself, and which were considered impractical for reading data.
To the contrary, the present inventors proved able to manage the very rapid movement of the molecular data storage polymer of the invention while getting an accurate reading thereof distinct from the noise in the system. In particular, the inclusion in one embodiment of the device of the invention of at least one biological, macromolecular nanopore selected from a list comprising pore-forming toxins and mutated pore-forming toxins resulted in an astoundingly precise and reliable measurement of tiny current changes during the passage of the hybrid molecule of the invention through said nanopores. Non-limiting examples of suitable biological, macromolecular nanopores include wild-type or mutated versions of alpha-hemolysin (aHL), Mycobacterium smegmatis porin A (MspA) and aerolysin, to cite some. In a preferred embodiment of the invention, said pore-forming toxin and/or mutated pore-forming toxin is at least one of an aerolysin pore or a mutated aerolysin pore.
One aspect of the invention relates to the optimization of nanopores used for implementing the methods of the invention for decoding a bitstream-format information encoded into a molecular data storage medium, in parallel with the optimization of the (stereo)chemical nature of the molecular data storage medium. In this sense, as will be better detailed in the Example section herein below, the type and structure of the monomers used in the payload of the molecular data storage medium have “evolved” together with the sensing interface of the nanopores.
The inventors have developed in the past a series of aerolysin mutants that have been rationally designed and studied, using molecular modelling and simulation based on recent aerolysin structures and models, in order to alter the interaction between an aerolysin monomer and an analyte such as a polynucleotide, polypeptide or small molecules such as ions. Pores comprising said mutant monomers have an enhanced ability to interact with a substrate analyte such as polynucleotides, polypeptide and small molecules, and therefore display improved properties for estimating the characteristics of, such as the sequence of, said analytes. The aim of this aerolysin mutation process was to increase a current blockage difference/variance (with respect to the basal current of the open pore) in order to better discriminate different monomers of a polymer in a nanopore sensing approach. Such aerolysin mutants, including pores and various constructs obtainable therefrom, are described in European patent application 19 197 435.1, incorporated herein in its entirety by reference.
However, the inclusion of mutated aerolysin pores into systems configured for decoding digital information was never tested in the past. The use of aerolysin pores in this context has been validated with surprisingly good results when used in combination with the molecular data storage media of the invention.
In particular embodiments of the invention, therefore, the device of the invention exploits the improved sensing abilities of those aerolysin nanopores. A mutant aerolysin pore useful in the frame of the present invention comprises one or more modifications on the aerolysin monomer sequence that change the net positive charge, as well as the size of the pore region formed upon oligomerization of the monomers into a pore-forming structure. Said net charge is increased by e.g. introducing one or more positively charged amino acids and/or by neutralising one or more negatively charged amino acids, for instance by substituting one or more negatively charged amino acids with one or more uncharged amino acids, non-polar amino acids and/or aromatic amino acids or by introducing one or more positively charged amino acids adjacent to one or more negatively charged amino acids. The size of the pore is altered by increasing or reducing the steric hindrance of side chains protruding into the internal lumen of the pore.
A modified aerolysin polypeptide to be used as a monomer in an aerolysin pore generally comprises, consists essentially of or consists of a modified aerolysin amino acid sequence. An amino acid sequence of a wild-type (i.e., native, unmodified) aerolysin monomer polypeptide from Aeromonas hydrophila is provided herein as SEQ ID NO: 1 which corresponds to region or positions 24-493 of the wild type aerolysin protein sequence https://www.ncbi.nlm.nih.gov/protein/P09167.2. Such modifications alter the ability of the aerolysin monomer, assembled in a heptameric pore form, to interact with a polymer such as a polynucleotide, a polypeptide or even another analyte via (i) a steric effect of the aerolysin pore on the interacting substrate, (ii) a net charge alteration of the aerolysin pore and/or (iii) the ability of the aerolysin pore to alter the hydrogen bonds established with an interacting substrate.
Said monomer can comprise or can consist of a polypeptide comprising a modified aerolysin amino acid sequence, wherein said sequence comprises the amino acid sequence of SEQ ID NO: 1 or the amino acid sequence of SEQ ID NO: 2 (representing the mature aerolysin monomer without a C-terminal propeptide, namely positions 24-445 of the wild type aerolysin protein sequence) having one or more amino acid substitutions at one or more positions corresponding to positions 220, 238, 242 and 282. In some additional or alternative embodiments, the polypeptide further comprises one or more amino acid substitutions at one or more positions corresponding to positions 216, 222, 244, 246, 252, 254 and 258 of SEQ ID NO: 1 or SEQ ID NO: 2.
Preferably, the amino acid(s) substituted into the mutant aerolysin monomer at positions R220, K238, K242 and R282 are selected from the group comprising asparagine (N), glutamine (Q), arginine (R), glutamic acid (E), leucine (L), lysine (K), cysteine (C), tryptophan (W), histidine (H) or alanine (A).
In embodiments, a mutant aerolysin monomer comprises at least one of the following mutations: R220A/W/K/Q, R282A/E/W, K238A/Q/N/R/W/H, K242A/W as well as any combination thereof. Preferably, the amino acid(s) substituted into the mutant aerolysin monomer at positions D216, D222, K244, K246, E252, E254 and E258 are selected from the group comprising asparagine (N), glutamine (Q), arginine (R), aspartic acid (D) or alanine (A). In embodiments, a mutant aerolysin comprises at least one of the following mutations: D216A/N/Q/R, D222A/N/Q/R, K244A/N/Q/R/D, K246A/N/Q/R/D, E252A/N/Q/R, E254A/N/Q/R, E258A/N/R/Q as well as any combination thereof.
In embodiments of the invention, a mutant aerolysin monomer comprises a substitution on at least one of the following positions of SEQ ID NO: 1 or SEQ ID NO: 2: 220, 238, 242 and 282 (hereinafter referred to as “group 1 of mutations”) together with a substitution on at least one of the following positions 216, 222, 244, 246, 252, 254 and 258 (hereinafter referred to as “group 2 of mutations”). For example, a mutant aerolysin monomer comprises at least one of the following mutations in group 1 of mutations: R220A/W/K/Q, R282A/E/W, K238A/Q/N/R/W/H, K242A/W, as well as at least one of the following mutations in group 2 of mutations: D216A/N/Q/R, D222A/N/Q/R, K244A/N/Q/R/D, K246A/N/Q/R/D, E252A/N/Q/R, E254A/N/Q/R, E258A/N/R/Q as well as any combination thereof.
A mutant aerolysin pore suitable in the frame of the present invention may comprise at least one polypeptide of SEQ ID NO: 2 (representing the mature aerolysin monomer without a C-terminal propeptide) or a variant thereof having one or more amino acid substitutions at one or more positions corresponding to positions 220, 238, 242, 282, 216, 222, 244, 246, 252, 254 and 258; additionally or alternatively, a homo-oligomeric pore derived from said mutant aerolysin monomer and comprising identical mutant monomers, and a hetero-oligomeric pore derived from said mutant aerolysin monomer as described herein, wherein at least one of the monomers differs from the others, are envisaged in the frame of the invention.
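The slash shorthand used for the mutations (e.g. R220A/W/K/Q meaning R220A, R220W, R220K or R220Q) can be expanded mechanically. The following Python sketch is only an illustration of the notation: the helper names are hypothetical, and the pairing function simply enumerates one group 1 substitution with one group 2 substitution, which is the smallest combination covered by the paragraph above.

```python
import re
from itertools import product

GROUP_1 = ["R220A/W/K/Q", "R282A/E/W", "K238A/Q/N/R/W/H", "K242A/W"]
GROUP_2 = ["D216A/N/Q/R", "D222A/N/Q/R", "K244A/N/Q/R/D", "K246A/N/Q/R/D",
           "E252A/N/Q/R", "E254A/N/Q/R", "E258A/N/R/Q"]


def expand_mutations(shorthand):
    """Expand e.g. 'K242A/W' into ['K242A', 'K242W']."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z](?:/[A-Z])*)", shorthand)
    if match is None:
        raise ValueError(f"unrecognised mutation shorthand: {shorthand!r}")
    wild_type, position, substitutes = match.groups()
    return [f"{wild_type}{position}{s}" for s in substitutes.split("/")]


def expand_group(group):
    """Flatten a list of shorthand entries into single substitutions."""
    return [m for entry in group for m in expand_mutations(entry)]


def paired_variants():
    """One group 1 substitution combined with one group 2 substitution."""
    return list(product(expand_group(GROUP_1), expand_group(GROUP_2)))
```

Expanding the two lists gives 15 single substitutions in group 1 and 30 in group 2, hence 450 such pairs; variants carrying more than one substitution per group are not enumerated by this sketch.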
A mutant monomer can be produced using standard methods known in the art. Polynucleotide sequences encoding a mutant monomer may be expressed in a bacterial host cell using standard techniques in the art. The mutant monomer may be produced in a cell by in situ expression of the polypeptide from a recombinant expression vector. The expression vector optionally carries an inducible promoter to control the expression of the polypeptide. The monomer may be made synthetically or by recombinant means. For example, the monomer may be synthesized by in vitro translation and transcription (IVTT). Suitable methods for producing pore monomers are discussed for instance in International Applications WO 2010/004273, WO 2010/004265 or WO 2010/086603.
Aerolysin Nanopores
In the following non-limiting examples, the inventors show that aerolysin pores have the potential to achieve the molecular equivalent of single-base resolution for tailored digital analytes, which in turn allows for single-bit reading accuracy. Using deep learning, the inventors were able to decode digital sequences encoding up to 4-bit information with high accuracy, while blindly detecting the identity and relative concentration of polymers in mixtures.
Single-channel recording experiments were performed to analyze polymer translocation using the PFT aerolysin from Aeromonas hydrophila (
These macromolecules are sequence-defined poly(phosphodiester)s prepared by automated phosphoramidite chemistry (
aValues obtained by ESI-HRMS measurements in the negative ion mode;
bDetected as [M − 3H]3−;
cDetected as [M − 4H]4−.
Digital decoding in aerolysin pores was first attempted with copolymers containing only bit-0 and bit-1 monomers. The negatively charged backbone of these polymers ensures efficient translocation, but also speeds up the crossing time that would be too fast to allow decoding. As no signals were observed during the single-channel recording after addition of ‘00000’ polymer in the cis side of the chamber and the signal-to-noise-ratio was too low for ‘11111’ (
To understand the relationship between the polymer chemical nature and current levels, the inventors collected more than 10,000 blockade current events for statistical analysis (see Methods and
To optimize the polymer design and further understand the influence of the terminal nucleotides for decoding, a series of polymers which replaced the terminal di-deoxyadenosine groups with other types of dinucleotides were tested. According to previous observations, DNA prefers to enter the aerolysin pore from the 3′-terminal (thus all polymers are oriented starting from the 3′-end,
The sensitivity of the pore for detecting bit-1 monomers was then tested with the bit-1 monomer placed at each of the 5 available positions along the n-propyl-phosphate backbone (i.e., AA10000AA, AA01000AA, AA00100AA, AA00010AA and AA00001AA). For this task, the inventors developed a deep learning approach to process the current signal, which was able to automatically classify a much larger fraction of events (~40%) with high accuracy (~84%,
To better evaluate the reading capability of aerolysin nanopores, the inventors performed a statistical analysis across N=546 separate nanopore measurements of all 30 polymers (~6.6M events in total,
Based on the model generated by deep learning, the expectation is that any item in the library of polymer sequences can be identified directly with high confidence. To test this hypothesis, blind tests were performed to identify the given polymers and their relative concentration when mixed. Following this blind procedure, the inventors were able to correctly detect polymer “AA010AA” among all 30 polymers, with a percentage of 72.0±3.0% (
In an additional experiment, to show the potential of different types of “chemical bits”, single-channel recording and reading of polymers encoding single-bit information were performed using an aerolysin pore. As shown in
In conclusion, the inventors demonstrated that tailor-made informational polymers can be efficiently decoded by using, in the described exemplary embodiment, a variant of the aerolysin pore (K238A). In particular, the design of an optimal bio-inspired writing-reading framework allowed for single-bit resolution, which is unprecedented in analytical chemistry. The aerolysin pore structure can in principle be further tuned to optimize the translocation speed to allow efficient reading of longer polymers. On the other hand, the vast chemical space accessible to informational polymers can be further explored to enhance optimal decoding by biological nanopores. Importantly, informational polymers hybridized with DNA nucleobases keep some of the advantages of synthetic DNA used as support for data storage. For instance, different terminal nucleobases, which allow for more efficient capture by the nanopore, can be readily discriminated (
Writing and reading digital data using this biologically inspired nanopore-based platform can offer numerous advantages. First, single-bit resolution on the proposed informational polymer theoretically provides the opportunity to increase the information density of existing DNA-based solutions. Second, there is no need for amplification of detected molecules, lowering the time and cost of sample preparation and avoiding amplification errors. Furthermore, nanopore sensing does not require additional labelling and there is no theoretical upper limit for the reading length, further reducing the overall cost and workflow time. More importantly, nanopore sensing, which relies on an electrical readout, naturally enables large-scale parallelization based on already established technologies, thus allowing the construction of more affordable and portable devices for data management.
Mycobacterium Smegmatis Porin A (MspA) Nanopores
In an additional exemplary set-up, the inventors included the so-called M2-NNN mutant (D90N/D91N/D93N/D118R/E139K/D134R) of the MspA as a biological nanopore in a nanopore-based device.
As shown in
Methods
Synthesis of the Macromolecular Analytes
The polymers used in the nanopore experiments were synthesized by automated phosphoramidite chemistry on an Expedite DNA synthesizer (Perseptive Biosystem 8900), as previously described (Al Ouahabi, A., et al, J. Am. Chem. Soc., 2015, April 8, doi: 10.1021/jacs.5b02639; Al Ouahabi, A., et al, ACS Macro Lett. 2015, September 10, doi: 10.1021/acsmacrolett.5b00606). All polymers were characterized by ESI-HRMS and their purity was controlled by anion-exchange HPLC, on an Agilent Apparatus equipped with a column Dionex BioLC DNAPac-PA100 and UV detectors (260 and 280 nm).
Aerolysin Production
The aerolysin full length sequence was cloned in a pET22b vector with a C-terminal hexa-histidine tag to aid purification as described in Cao, C. et al., Nature Communications 2019, 10, 4918. The QuikChange II XL kit from Agilent Technologies was used for performing site-directed mutagenesis on the aerolysin gene, following the manufacturer's instructions. The recombinant protein K238A was expressed and purified from BL21 DE3 pLys E. coli cells. Cells were grown to an optical density of 0.6-0.7 in Luria-Bertani (LB) media. Protein expression was induced by the addition of 1 mM isopropyl β-D-1-thiogalactopyranoside (IPTG) and subsequent growth overnight at 20° C. Cell pellets were resuspended in lysis buffer (20 mM Sodium phosphate pH 7.4, 500 mM NaCl) mixed with cOmplete™ Protease Inhibitor Cocktail (Roche) and then lysed by sonication. The resulting suspensions were centrifuged (12,000 rpm for 35 min at 4° C.) and the supernatants were applied to a HisTrap HP column (GE Healthcare) previously equilibrated with lysis buffer. The protein was eluted with a gradient over 40 column volumes of elution buffer (20 mM Sodium phosphate pH 7.4, 500 mM NaCl, 500 mM Imidazole), and buffer exchanged into final buffer (20 mM Tris, pH 7.4, 500 mM NaCl) using a HiPrep Desalting column (GE Healthcare). The purified protein was flash frozen in liquid nitrogen and stored at −20° C.
Single-Channel Recording Experiments
1,2-Diphytanoyl-sn-glycero-3-phosphocholine phospholipid powder (Avanti Polar Lipids, Inc., Alabaster, AL, USA) was dissolved in octane (Sigma-Aldrich Chemie GmbH, Buchs, Switzerland) to a final concentration of 1.0 mg per 100 μl. Purified K238A aerolysin mutant was diluted to a concentration of 0.2 μg/ml and then incubated with Trypsin-agarose (Sigma-Aldrich Chemie GmbH, Buchs, SG, Switzerland) for 2 h at 4° C. to activate the toxin for oligomerization. The solution was finally centrifuged to remove trypsin.
Nanopore single-channel recording experiments were performed using Orbit Mini equipment (Nanion, Munich, Germany). Phospholipid membranes were formed across a MECA 4 recording chip that contains a 2×2 array of circular microcavities in a highly inert polymer. Each cavity contains an individual integrated Ag/AgCl microelectrode, so that the chip is able to record four artificial lipid bilayers in parallel. The current value leaps from 0.0 pA to nearly 80.0 pA once a single K238A aerolysin pore self-assembles into the membrane under the applied voltage of +100 mV. The measurement chamber temperature was set to 25° C. for all experiments.
Polymers in powder form were pre-diluted in water to a stock concentration of 2.0 mg/ml and added to the cis side of the chamber in 1.0 M KCl solution buffered with 10 mM Tris and 1.0 mM EDTA (pH=7.4) to the final concentration of 20 μmol. All experiments shown here were repeated with at least 10 different pores. The same conditions were used in experiments using the Mycobacterium smegmatis porin A (MspA) nanopores, such as the M2-NNN MspA mutant (D90N/D91N/D93N/D118R/E139K/D134R).
Current Signal Processing
The raw signals are segmented based on voltage discontinuities and large time-scale discontinuities in order to separate out the signal segments where the pore is blocked or where a second pore is inserted into the membrane. For each segment, the open pore current distribution is measured by fitting a Gaussian function on the peak of the current distribution with the highest mean current. The signal segments with an open pore current distribution of mean between 67 and 98 pA and standard deviation between 1.5 and 4.2 pA are kept.
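The segment acceptance criterion can be sketched in Python as follows. As a simplifying assumption, the sketch takes samples already known to belong to the open-pore state and uses their sample mean and standard deviation in place of a Gaussian fit on the highest-current peak of the histogram; the function names are hypothetical.

```python
from statistics import mean, stdev

# Acceptance windows stated in the text, in pA.
MEAN_RANGE_PA = (67.0, 98.0)
STD_RANGE_PA = (1.5, 4.2)


def open_pore_stats(open_pore_samples_pa):
    """Estimate the open-pore current mean and standard deviation.

    Simplification: sample statistics stand in for the Gaussian fit.
    """
    return mean(open_pore_samples_pa), stdev(open_pore_samples_pa)


def keep_segment(open_pore_samples_pa):
    """Return True if the open-pore distribution falls in both windows."""
    mu, sigma = open_pore_stats(open_pore_samples_pa)
    mean_ok = MEAN_RANGE_PA[0] <= mu <= MEAN_RANGE_PA[1]
    std_ok = STD_RANGE_PA[0] <= sigma <= STD_RANGE_PA[1]
    return mean_ok and std_ok
```

A segment with an open-pore current around 80 pA and a spread of about 1.8 pA would be kept, whereas a segment around 50 pA, or one with an unusually narrow or broad distribution, would be discarded.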
The events are extracted using a current threshold at 3σ from the open pore current distribution (
In order to detect and label the different levels in the signal, the local relative current extrema are used to generate a Gaussian mixture model (GMM) with three components: low, high and transition level. The low and high Gaussian models correspond to the two main modes of the relative current extrema distribution. The transition level describes a possible change of state between the high and low levels. Each event is segmented into low, high and transition levels based on the level type with the highest probability predicted by the GMM. Finally, the transition levels which are not transitions between high and low, such as high-transition-high and low-transition-low, are merged into a single high or low level respectively.
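The final merging step can be illustrated with a short Python sketch. It assumes the GMM labelling has already been done and that each event is represented simply as an ordered list of level labels ("high", "low", "transition"); level durations and current values are ignored, and the function names are hypothetical.

```python
def collapse_runs(labels):
    """Collapse consecutive identical labels into a single entry."""
    collapsed = []
    for label in labels:
        if not collapsed or collapsed[-1] != label:
            collapsed.append(label)
    return collapsed


def merge_levels(labels):
    """Merge high-transition-high into high and low-transition-low into low.

    Transitions that actually separate a high level from a low level,
    or that sit at the start or end of the event, are left untouched.
    """
    runs = collapse_runs(labels)
    relabelled = list(runs)
    for i in range(1, len(runs) - 1):
        if runs[i] == "transition" and runs[i - 1] == runs[i + 1]:
            relabelled[i] = runs[i - 1]
    return collapse_runs(relabelled)
```

For instance, the label sequence high, transition, high, transition, low, transition, low reduces to high, transition, low: only the transition that separates the high level from the low level survives.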
Finally, to classify the current events, a machine learning approach comprising two steps was devised. The first step is the classification of every event and the second is the assessment of the quality of the prediction of the classifier (
While the invention has been disclosed with reference to certain preferred embodiments, numerous modifications, alterations, and changes to the described embodiments, and equivalents thereof, are possible without departing from the sphere and scope of the invention. Accordingly, it is intended that the invention not be limited to the described embodiments, and be given the broadest reasonable interpretation in accordance with the language of the appended claims.
hydrophila GN = aerA (corresponds to region
hydrophila GN = aerA without C-terminal
https://www.ncbi.nlm.nih.gov/protein/P09167.2
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2020/077229 | 9/29/2020 | WO |