The present invention relates to a method for identifying nucleic acid segments which interact with a target nucleic acid segment or segments as well as kits for performing the method. The invention also relates to a method of identifying one or more interacting nucleic acid segments that are indicative of a particular disease.
Regulatory elements play a central role in an organism's genetic control and have been shown to contribute to health and disease (e.g. in cancer and autoimmune disorders). It has been demonstrated that such regulatory elements (for example, enhancers) can be located at considerable genomic distances (on a linear scale) from their target genes. Approaches for capture of these regulatory elements and their target genes have been developed and are widely applied to study the impact of regulatory landscape dynamics on gene expression and phenotype establishment, as well as to study the role of genetic modifications in disease development. However, when working with low cell numbers, determining which target genes these regulatory elements regulate represents a major challenge.
One of the first methods developed to identify interactions between genomic loci was Chromosome Conformation Capture (3C) technology (Dekker et al. Science (2002) 295: 1306-1311). This involved creating a 3C library by: crosslinking a nuclear composition so that genomic loci that are in close spatial proximity become linked; removing the intervening DNA loop between the crosslink by digestion; and ligating and reversing crosslinking of the interacting regions to generate a 3C library. The library can then be used to detect/identify the frequency of interactions between known sequences. However, this method requires prior knowledge of the interaction in order to detect the interacting regions of interest. Since then, the technology has been further developed to overcome limitations of the 3C method.
Hi-C is a genome-wide method that does not require any prior knowledge about the interactome of interest. This method uses junction markers to isolate all of the ligated interacting sequences in the cell (see WO 2010/036323 and Lieberman-Aiden et al. 2009). Although this provides information on all interactions occurring within the nuclear composition at a particular time point, the resulting libraries are extremely complex which impedes their analysis at a resolution required to identify significant interactions between specific elements, such as promoters and enhancers. To overcome this limitation, the capture Hi-C technique has been developed which involves a capture step to enrich Hi-C libraries for chromosomal interactions comprising, at least at one end, the regions of interest (see WO 2015/033134, Dryden et al. 2014 and Schoenfelder et al., 2015). WO 2015/033134 discloses a method and kit for identifying nucleic acid segments which interact with a target nucleic acid segment by use of an isolating nucleic acid molecule. However, this method requires starting with a large number of cells (30-40 million cells), which is impossible when working with rare cell types, cells from the early stages of organism development or patient/biopsy samples.
There is therefore a need to provide an improved method for identifying nucleic acid interactions which overcomes the limitations of the currently available methodologies.
According to a first aspect of the invention, there is provided a method for identifying nucleic acid segments which interact with a target nucleic acid segment or segments, said method comprising the steps of:
According to a further aspect of the invention, there is provided a method of identifying one or more interacting nucleic acid segments that are indicative of a particular disease state, comprising:
According to a yet further aspect of the invention, there is provided a kit for identifying a nucleic acid segment which interacts with a target nucleic acid segment or segments, comprising buffers and reagents capable of performing the methods defined herein.
According to a first aspect of the invention, there is provided a method for identifying nucleic acid segments which interact with a target nucleic acid segment or segments, said method comprising the steps of:
The method of the present invention provides a means for identifying interacting sequences and nucleic acid segments by using either targeted amplification or isolating nucleic acid molecules to isolate a target nucleic acid segment or segments. Such methods have the advantage of focussing the data on particular interactions within enormously complex libraries. Furthermore, methods comprising targeted amplification or addition of isolating nucleic acid molecules which bind to the target nucleic acid segment or segments can also be used to organise the information into various subsets depending on the type of reagents used for selection or the type of isolating nucleic acid molecules used (e.g. promoters to identify promoter interactions). Detailed information on the chromosomal interactions within a particular group of targets of interest can then be obtained.
The methods of the present invention further provide a single step fragmentation and oligonucleotide insertion using a recombinase enzyme. Such single step fragmentation and oligonucleotide insertion has the advantage of providing a method with significantly fewer overall steps and reduced manipulation of the nucleic acid composition. For example, in particular embodiments of the present method, a single tube may be utilised from obtaining a nucleic acid composition (step (a) as defined herein) to ligating the fragmented nucleic acid segments (step (e) as defined herein) and also from performing single step fragmentation and oligonucleotide insertion (step (f) as defined herein) to enrichment of fragments comprising biotin (step (g) as defined herein). Methods which do not comprise single step fragmentation and oligonucleotide insertion require separate fragmentation by physical or enzymatic means (e.g. sonication or restriction enzyme digestion), end repair of library fragments, addition of dATP at the 3′-end of library fragments, size selection, ligation of oligonucleotide sequences and purification of fragments from unligated oligonucleotides (such as those methods disclosed in WO 2015/033134). Therefore, it will be appreciated that the present invention provides methods for identifying nucleic acid segments which interact with a target nucleic acid segment or segments which are simpler, comprise fewer steps and may comprise shorter time-frames for completion. Thus, the methods of the present invention are faster than conventional protocols and moreover decrease the overall cost of library production. It will also be appreciated that such advantages may lead to reduced loss of nucleic acid composition. 
Such reduced loss of nucleic acid composition allows the amount of starting material to be reduced, for example the number of cells from which the nucleic acid composition is obtained, or increases the amount of resulting nucleic acid composition which is available for subsequent analysis.
Furthermore, whereas previous techniques, such as 4C, allow the capture of genome-wide interactions of one or a few promoters, the methods described herein can capture over 22000 promoters and their interacting genomic loci in a single experiment. Moreover, the present methods yield a significantly more quantitative readout.
Genome-Wide Association Studies (GWAS) have identified thousands of single-nucleotide polymorphisms (SNPs) that are linked to disease. However, many of these SNPs are located at great distances from genes, making it very challenging to predict on which genes they act. Therefore, the present methods provide ways of identifying interacting nucleic acid segments, even if they are located far away from each other within the genome.
References to “nucleic acid segments” as used herein, are equivalent to references to “nucleic acid sequences”, and refer to any polymer of nucleotides (i.e. for example, adenine (A), thymidine (T), cytosine (C), guanosine (G), and/or uracil (U)). This polymer may or may not result in a functional genomic fragment or gene. A combination of nucleic acid sequences may ultimately comprise a chromosome. A nucleic acid sequence comprising deoxyribonucleosides is referred to as deoxyribonucleic acid (DNA). A nucleic acid sequence comprising ribonucleosides is referred to as ribonucleic acid (RNA). RNA can be further characterised into several types, such as protein-coding RNA, messenger RNA (mRNA), transfer RNA (tRNA), long non-coding RNA (lncRNA), long intergenic non-coding RNA (lincRNA), antisense RNA (asRNA), micro RNA (miRNA), short interfering RNA (siRNA), small nuclear RNA (snRNA) and small nucleolar RNA (snoRNA).
“Single-nucleotide polymorphisms” or “SNPs” are single nucleotide variations (i.e. A, C, G or T) within a genome that differ between members of a biological species or between paired chromosomes.
It will be understood that the term “target nucleic acid segment or segments” refers to the sequence or sequences of interest which are known to the user. Isolating only the ligated fragments which contain the target nucleic acid segment or segments helps to focus the data to identify specific interactions with a particular gene or gene segment of interest. Alternatively, performing targeted amplification to enrich fragments comprising the target nucleic acid segment or segments helps to focus the data to identify specific interactions with a particular gene or gene segment of interest by increasing the proportion of fragments within the composition which comprise the target nucleic acid segment or segments.
References herein to the term “interacts” or “interacting”, refer to an association between two elements, for example in the present method, a genomic interaction between a nucleic acid segment and a target nucleic acid segment. The interaction may cause one interacting element to have an effect upon the other, for example, silencing or activating the element it binds to. The interaction may occur between two nucleic acid segments that are located close together or far apart on the linear genome sequence. Thus, in one embodiment, the nucleic acid segment or segments which interact with a target nucleic acid segment or segments are in close proximity to said target nucleic acid segment or segments on the linear genome sequence, for example, are relatively close to each other on the same chromosome. In a further embodiment, the nucleic acid segment or segments which interact with a target nucleic acid segment or segments are located far apart from said target nucleic acid segment or segments on the linear genome sequence, for example, present on a different chromosome or further away if on the same chromosome.
References herein to the term “nucleic acid composition” refer to any composition comprising nucleic acids and protein. The nucleic acids within the nucleic acid composition may be organised into chromosomes, wherein the proteins (i.e. for example, histones) may become associated with the chromosomes having a regulatory function. In one embodiment, the nucleic acid composition comprises a nuclear composition. Such a nuclear composition may typically include a nuclear genome organisation or chromatin.
References to “crosslinking” or “crosslink” as used herein, refer to any stable chemical association between two compounds, such that they may be further processed as a unit. Such stability may be based upon covalent and/or non-covalent bonding (e.g. ionic). For example, nucleic acids and/or proteins may be crosslinked by chemical agents (i.e. for example, a fixative), heat, pressure, change in pH, or radiation, such that they maintain their spatial relationships during routine laboratory procedures (i.e. for example, extracting, washing, centrifugation etc.). Crosslinking as used herein is equivalent to the terms “fixing” or “fixation”, which applies to any method or process that immobilises any and all cellular processes. A crosslinked/fixed cell, therefore, accurately maintains the spatial relationships between components within the nucleic acid composition at the time of fixation. Many chemicals are capable of providing fixation, including but not limited to, formaldehyde, formalin, or glutaraldehyde.
References to the term “fragments” as used herein refer to any nucleic acid sequence that is shorter than the sequence from which it is derived. Fragments can be of any size, ranging from several megabases and/or kilobases to only a few nucleotides long. Fragments are suitably greater than 5 nucleotide bases in length, for example 10, 15, 20, 25, 30, 40, 50, 100, 250, 500, 750, 1000, 2000, 5000 or 10000 nucleotide bases in length. Fragments may be even longer, for example 1, 5, 10, 20, 25, 50, 75, 100, 200, 300, 400 or 500 nucleotide kilobases in length. Methods such as restriction enzyme digestion, sonication, acid incubation, base incubation, microfluidization etc., can all be used to fragment a nucleic acid composition.
In some embodiments, fragmentation (i.e. step (c)) is performed using an endonuclease enzyme. Examples of suitable endonuclease enzymes include, but are not limited to, sequence specific endonucleases, such as restriction enzymes, and non-sequence specific endonucleases, such as MNase or DNase.
Thus, in one embodiment, the endonuclease enzyme is a sequence specific endonuclease, such as a restriction enzyme. The term “restriction enzyme” as used herein, refers to any protein that cleaves nucleic acid at a specific base pair sequence. Cleavage can result in a blunt or sticky end, depending on the type of restriction enzyme chosen. Examples of restriction enzymes include, but are not limited to, Eco RI, Eco RII, Bam HI, Hind III, Dpn II, Bgl II, Nco I, Taq I, Not I, Hinf I, Sau 3A, Pvu II, Sma I, Hae III, Hga I, Alu I, Eco RV, Kpn I, Pst I, Sac I, Sal I, Sca I, Spe I, Sph I, Stu I, Xba I. In a further embodiment, fragmentation (i.e. step (c)) is performed using a restriction enzyme. In one embodiment, the restriction enzyme is Hind III. In a further embodiment, the restriction enzyme is Dpn II.
In an alternative embodiment, the endonuclease enzyme is a non-sequence specific endonuclease. The term “non-sequence specific endonuclease” as used herein, refers to any protein that cleaves nucleic acid and is not restricted to the sequence of said nucleic acid, for example one that may cleave nucleic acid at any region where protein (e.g. nucleosomes and/or transcription factors) is not bound. Examples of non-sequence specific endonucleases are known in the art and include, but are not limited to, DNase, RNase and MNase. MNase is a non-specific endo-exonuclease derived from the bacterium Staphylococcus aureus, which binds and cleaves protein-unbound regions of DNA on chromatin, whereas DNA bound to histones or other chromatin-bound proteins remains undigested. In a yet further embodiment, fragmentation (i.e. step (c)) is performed using a non-sequence specific endonuclease.
In another embodiment, fragmentation (i.e. step (c)) is performed using sonication.
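By way of illustration only, fragmentation with a sequence specific endonuclease can be sketched in silico. The following minimal Python sketch assumes the well-known top-strand cut positions of Hind III (A^AGCTT) and Dpn II (^GATC); the function names and the toy sequence are hypothetical and form no part of the method as defined herein. Only the top strand is modelled, so overhangs are not represented.

```python
# Recognition site and top-strand cut offset (counted from the start of the site).
# Hind III cuts A^AGCTT (offset 1); Dpn II cuts ^GATC (offset 0).
ENZYMES = {
    "HindIII": ("AAGCTT", 1),
    "DpnII": ("GATC", 0),
}


def cut_positions(sequence, enzyme):
    """Return the top-strand cut coordinate for every site found in `sequence`."""
    site, offset = ENZYMES[enzyme]
    sequence = sequence.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start + offset)
        start = sequence.find(site, start + 1)
    return positions


def digest(sequence, enzyme):
    """Split `sequence` at each cut position and return the top-strand fragments."""
    bounds = [0] + cut_positions(sequence, enzyme) + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


# Hypothetical toy sequence containing two Hind III sites and one Dpn II site.
toy = "CCGGAAGCTTTTGATCAACCAAGCTTGG"
hind_fragments = digest(toy, "HindIII")
dpn_fragments = digest(toy, "DpnII")
```

In a real genome, a four-base site such as that of Dpn II is expected to occur far more frequently than a six-base site such as that of Hind III, and therefore to produce shorter fragments on average.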
References herein to the term “filling the end(s)” of fragments or of nucleic acid segments, refer to the addition of nucleotides to the 3′ end of the crosslinked nucleic acid composition or segments following fragmentation. Such filling comprises the addition of dATP, dCTP, dGTP and/or dTTP nucleotides to the 3′ end of the nucleic acid composition or segments. In order to allow the enrichment of nucleic acid fragments or segments which have been ligated and thus contain a ligation junction, one or more of the nucleotides used for filling as described herein may comprise a covalently linked biotin moiety. Thus, in one embodiment, filling the ends of fragmented crosslinked nucleic acid segments comprises the addition of a biotin moiety to the ends of the crosslinked nucleic acid fragments. In a further embodiment, filling the ends of the fragmented crosslinked nucleic acid segments comprises “marking” the ends of the crosslinked nucleic acid fragments with a “junction marker”. Such “marking” of the ends or addition of a biotin moiety to the ends of the crosslinked nucleic acid fragments allows for the subsequent selection, or enrichment, of nucleic acid fragments and/or segments which have been ligated according to step (e) and the methods as defined herein.
The junction marker allows ligated fragments to be purified prior to enrichment step (h), therefore ensuring that only ligated sequences are enriched, rather than non-ligated (i.e. non-interacting) fragments.
In certain embodiments, the junction marker comprises a labelled nucleotide linker (i.e. a nucleotide comprising a covalently linked biotin moiety). In a further embodiment, the junction marker comprises biotin. In one embodiment, the junction marker may comprise a modified nucleotide. In one embodiment, the junction marker may comprise an oligonucleotide linker sequence.
References herein to the terms “ligated” or “ligating”, refer to any linkage of two nucleic acid segments usually comprising a phosphodiester bond. The linkage is normally facilitated by the presence of a catalytic enzyme (i.e. for example, a ligase such as T4 DNA ligase) in the presence of co-factor reagents and an energy source (i.e. for example, adenosine triphosphate (ATP)). In the methods described herein, the fragments of two nucleic acid segments that have been crosslinked are ligated together in order to produce a single ligated fragment.
In one embodiment, ligation of fragmented nucleic acid segments to produce ligated fragments (i.e. step (e)) utilises in-nucleus ligation. Thus, in certain embodiments, ligation of fragmented nucleic acid segments is performed by in-nucleus ligation. Such in-nucleus ligation has the advantage that small volumes of reagents may be used, leading to reduced loss of nucleic acid composition, and thus may also allow for the amount of starting material to be reduced. For example, the number of cells from which the nucleic acid composition is obtained may be reduced, or the resulting nucleic acid composition which is available for subsequent analysis may be increased.
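By way of illustration only, the sequence expected at a ligation junction after end filling can be derived from the restriction site. The following minimal Python sketch assumes a palindromic recognition site cut to leave a 5′ overhang which is completely filled in before blunt-end ligation; the function names and the toy read are hypothetical, and the junction sequences shown are a consequence of these assumptions rather than values stated herein.

```python
def filled_ligation_junction(site, cut_offset):
    """Sequence formed when two filled-in ends of the same site are blunt-ligated.

    Assumes a palindromic site cut `cut_offset` bases into the top strand,
    leaving a 5' overhang that is completely filled in before ligation.
    """
    left_end = site[: len(site) - cut_offset]  # upstream fragment end after fill-in
    right_end = site[cut_offset:]              # downstream fragment end after fill-in
    return left_end + right_end


def contains_junction(read, junction):
    """Return True if the read spans the expected ligation junction."""
    return junction in read.upper()


hind_junction = filled_ligation_junction("AAGCTT", 1)  # Hind III cuts A^AGCTT
dpn_junction = filled_ligation_junction("GATC", 0)     # Dpn II cuts ^GATC

# Hypothetical read spanning a filled-in Hind III ligation junction.
toy_read = "CCGGAAGCTAGCTTGGCC"
spans_junction = contains_junction(toy_read, hind_junction)
```

Searching reads for such a junction sequence is one possible way of checking, after sequencing, that a read derives from a ligated fragment rather than from a fragment which was never ligated.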
References herein to “single step fragmentation and oligonucleotide insertion”, refer to the fragmentation of ligated fragments and insertion of oligonucleotide sequences in a single step. Such methods utilise a recombinase enzyme which binds to the oligonucleotide sequences and inserts these onto the fragments. This process is also known as “tagmentation”. Therefore, in one embodiment, single step fragmentation and oligonucleotide insertion comprises tagmentation.
Advantages of single step fragmentation and oligonucleotide insertion, further to those mentioned above, include that any binding pair element (such as biotin) which has been incorporated into the nucleic acid composition does not need to be removed from unligated fragments, no size selection of ligated fragments need be performed, enzymatic fragmentation by a recombinase removes the need for end repair as no sonication has been performed, and the addition of A-tails need not be performed. Furthermore, the insertion of oligonucleotide and/or adapter sequences, which may include barcode sequences and/or a unique molecular identifier, is performed concurrently with fragmentation. Such barcode sequences or unique molecular identifier may allow for the identification of a particular nucleic acid composition in subsequent analysis and processing and allow for multiple nucleic acid compositions to be combined in subsequent steps, whilst retaining the ability to identify and analyse individual nucleic acid compositions. Thus, in one embodiment, the oligonucleotide sequence is an “adapter” sequence which allows for or enables subsequent library preparation and sequencing of the adapter-containing nucleic acid fragments. In a further embodiment, the adapter comprises a barcode sequence and/or unique molecular identifier.
In a yet further embodiment, single step fragmentation and oligonucleotide insertion comprises inserting barcode sequences into the ligated fragments. In one embodiment, paired end adapter sequences comprise barcode sequences and/or a unique molecular identifier.
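By way of illustration only, the use of barcode sequences to retain the identity of individual nucleic acid compositions after pooling can be sketched as follows. This minimal Python sketch assumes that the barcode is read as a fixed-length prefix of each sequence read; the barcodes, sample names and reads are hypothetical, and sequencing platforms may instead report barcodes in a separate index read.

```python
from collections import defaultdict

# Hypothetical barcode-to-sample assignments, for illustration only.
BARCODES = {"ACGT": "sample_A", "TGCA": "sample_B"}
BARCODE_LENGTH = 4


def demultiplex(reads, barcodes=BARCODES, barcode_length=BARCODE_LENGTH):
    """Group pooled reads by sample, stripping the barcode prefix from each read."""
    by_sample = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_length], read[barcode_length:]
        # Reads whose prefix matches no known barcode are kept aside, not discarded.
        sample = barcodes.get(barcode, "unassigned")
        by_sample[sample].append(insert)
    return dict(by_sample)


# Hypothetical pooled reads from two libraries plus one unrecognised read.
pooled = ["ACGTGGGAAA", "TGCACCCTTT", "ACGTTTTGGG", "NNNNAAAA"]
assigned = demultiplex(pooled)
```

Exact matching is used here for simplicity; a practical implementation would typically tolerate a limited number of mismatches in the barcode.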
A yet further advantage of methods of the invention utilising single step fragmentation and oligonucleotide insertion (e.g. tagmentation) as presented herein is the obtaining of a significantly enriched library of fragments comprising the target nucleic acid segment or segments compared to previously published protocols. For example, enrichment values of between at least 5-fold and 20-fold or between at least 5-fold and 80-fold compared to libraries produced according to previously known or conventional Hi-C protocols may be generated. In one embodiment, a library at least 5-fold, at least 10-fold, at least 15-fold or at least 20-fold enriched may be generated according to the methods defined herein, compared to a library generated according to conventional Hi-C protocols. In a further embodiment, a library at least 10-fold, at least 11-fold, at least 12-fold, at least 13-fold, at least 14-fold or at least 15-fold enriched may be generated according to the methods defined herein. In a yet further embodiment, a library at least 50-fold, at least 55-fold or at least 60-fold enriched may be generated according to the methods defined herein. It will be appreciated that any enrichment value for a library which is obtained when performing the methods as defined herein, compared to a library generated according to conventional Hi-C protocols, can be dependent on the identity of the endonuclease enzyme used for fragmenting the crosslinked nucleic acid composition. For example, when the restriction enzyme Hind III is used, an enrichment value of up to 20-fold may be obtained. Alternatively, when the restriction enzyme Dpn II is used, an enrichment value of up to 80-fold may be obtained.
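By way of illustration only, a fold-enrichment value of the kind referred to above can be computed from read counts. How enrichment is calculated is not specified herein; the following minimal Python sketch assumes that it is the ratio between the fraction of reads comprising a target nucleic acid segment in the enriched library and the same fraction in a conventional Hi-C library. The function name and all counts are hypothetical.

```python
def fold_enrichment(enriched_on_target, enriched_total,
                    reference_on_target, reference_total):
    """Ratio of the on-target read fraction in the enriched library to that
    in the reference (e.g. conventional Hi-C) library.

    Assumed definition:
        (enriched_on_target / enriched_total)
        / (reference_on_target / reference_total)
    """
    if enriched_total <= 0 or reference_total <= 0:
        raise ValueError("total read counts must be positive")
    if reference_on_target <= 0:
        raise ValueError("reference library needs at least one on-target read")
    # Cross-multiply so that a single division is performed.
    return (enriched_on_target * reference_total) / (
        enriched_total * reference_on_target
    )


# Hypothetical counts: 60% of reads on target after enrichment versus 3% before.
example = fold_enrichment(600, 1000, 30, 1000)
```

With these hypothetical counts the function returns a 20-fold enrichment; the counts are chosen only to illustrate the arithmetic and are not experimental results.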
The term “paired end adapters” as used herein, refers to any primer pair set that allows automated high throughput sequencing to read from both ends. For example, such high throughput sequencing devices that are compatible with these adapters include, but are not limited to Solexa (Illumina), the 454 System, and/or the ABI SOLiD. For example, the method may include using universal primers in conjunction with poly-A tails.
Recombinase enzymes suitable for use in the present methods will be appreciated to include any enzyme capable of removing (or cutting) and inserting sequence into an oligonucleotide or nucleic acid fragment. Examples of such recombinase enzymes include retroviral integrase and transposase enzymes such as MuA, Tn5, Tn7 and Tc1/mariner-type transposases. Thus, in one embodiment of the present method, the recombinase enzyme is a retroviral integrase. In a further embodiment, the recombinase enzyme is a transposase enzyme, such as Tn5 transposase. In order for the recombinase, integrase or transposase enzyme to be active in the method presented herein, the enzyme may be mutated to overcome the naturally occurring low level of activity of such enzymes. Thus, in a yet further embodiment, the recombinase enzyme is a mutant transposase, such as a hyperactive transposase. Such a hyperactive transposase may be a mutant Tn5 transposase. In one embodiment, the recombinase is Tn5 transposase, such as hyperactive Tn5 transposase.
Tn5 transposase is a member of the RNase H-like superfamily of recombinase proteins which includes retroviral integrases and catalyses the movement of a portion of nucleic acid, known as a transposon, to another part of or another genome by a so-called “cut and paste” mechanism. Recombinases, such as transposase enzymes, and transposon elements can be found in certain bacteria and are involved in the acquisition of antibiotic resistance. Transposase enzymes are commonly inactive and mutations in either the active site or elsewhere in the protein can lead to the generation of a hyperactive enzyme. Methods of producing Tn5 transposase enzyme are known in the art (Picelli et al. (2014) Genome Research 24:2033-2040). However, these methods may be further adapted by utilising oligonucleotide sequences, such as adapter sequences, when purifying the Tn5 transposase enzyme.
Oligonucleotide sequences, such as adapter sequences, used when purifying the recombinase enzyme (e.g. the Tn5 transposase) incorporate with the enzyme and are subsequently inserted by said recombinase into a nucleic acid fragment or segment. Such sequences may be diverse in their sequence and comprise additional elements which enable further processing of the nucleic acid fragment or segment into which they are inserted. For example, oligonucleotides incorporated with a purified recombinase enzyme may comprise an adapter sequence for sequencing and/or a barcode sequence. It will be appreciated, however, that all such oligonucleotides comprise a transposon sequence or element which allows for incorporation with the enzyme. Examples of transposon sequences or elements include the Tn5 transposase-compatible Mosaic End (ME) sequence and sequences which are sterically compatible with the binding pocket of a recombinase and/or transposase enzyme.
Thus, according to one embodiment, the recombinase enzyme of the method comprises Mosaic End Double-Stranded (MEDS) oligonucleotides, which comprise a half of paired end adapter sequences. In a further embodiment, the recombinase enzyme comprises paired end adapter sequences for sequencing. In yet further embodiments, the transposase enzyme may comprise oligonucleotides comprising paired end adapter sequences for sequencing which additionally comprise barcode sequences. In further embodiments, the oligonucleotide sequences are selected from: SEQ ID NO: 1, SEQ ID NO: 2 and/or SEQ ID NO: 3 as defined herein. In an alternative embodiment, the oligonucleotide sequences comprise any sequence that enables subsequent library preparation and sequencing. Such sequences will be appreciated to enable the amplification and isolation of nucleic acid segments as well as the binding of said nucleic acid segments for analysis of sequence by high-throughput or next generation sequencing. Examples of next generation sequencing platforms include: Roche 454 (i.e. Roche 454 GS FLX), Applied Biosystems' SOLiD system (i.e. SOLiDv4), Illumina's GAIIx, HiSeq 2000 and MiSeq sequencers, Life Technologies' Ion Torrent semiconductor-based sequencing instruments, Pacific Biosciences' PacBio RS and Oxford Nanopore's MinION.
References herein to “enriching” or “enrichment”, refer to any isolation of nucleic acid segments or increase in the proportion of nucleic acid segments of interest or target nucleic acid segments relative to other nucleic acid segments within the nucleic acid composition. It will be appreciated that such references include the terms “isolating”, “isolation”, “separating”, “removing”, “purifying” and the like. For example, the enrichment or isolation of nucleic acid segments of interest or target nucleic acid segments may comprise positive methods, such as the “pulling out” of nucleic acid segments of interest or target nucleic acid segments, or may comprise negative methods, such as the exclusion of nucleic acid segments which are not of interest or which do not comprise a target nucleic acid segment. Alternatively, enriching or isolating may comprise the selective or targeted amplification of nucleic acid segments of interest or target nucleic acid segments. Such selective or targeted amplification of nucleic acid segments of interest or target nucleic acid segments will increase the proportion of such segments in the nucleic acid composition (i.e. enrich said segments).
In one embodiment, said enrichment step (h) comprises the step of performing targeted amplification to enrich fragments comprising the target nucleic acid segment or segments.
In an alternative embodiment, said enrichment step (h) comprises the steps of:
Thus, enrichment step (h) of the present method comprises the enrichment of nucleic acid fragments or segments of interest or target nucleic acid segments comprising a particular target segment or sequence.
References herein to “targeted amplification” refer to amplification using methods which preferentially amplify particular nucleic acid segments of interest or target nucleic acid segments. Such targeted amplification may utilise particular primer sequences which are complementary to target nucleic acid segments or sequences present within target nucleic acid segments (e.g. a promoter or silencer sequence). Thus, in one embodiment, the primer sequences are complementary to a promoter sequence. In another embodiment, the primer sequences are complementary to a sequence comprising a SNP. Primer sequences utilised in the methods presented herein may comprise additional elements involved in subsequent processing or analysis of amplified nucleic acid segments. For example, primer sequences may comprise adapter sequences for sequencing as described herein or a unique molecular identifier useful for identification of a nucleic acid segment or group of segments (e.g. those derived from a particular sample). Alternatively or additionally, targeted amplification may utilise particular conditions which may favour the amplification of target nucleic acid segments or fragments comprising target nucleic acid segments. It will be appreciated that amplification may be performed by any method known in the art, such as polymerase chain reaction (PCR). It will be further appreciated that targeted amplification as described herein may comprise amplification of nucleic acid segments in solution or on a support moiety, such as a bead, used for enrichment. Elongation of primer sequences may also be performed prior to amplification, such that amplification on a support moiety may additionally comprise a step of elongation of primer sequences prior to the amplification of said elongated sequences and nucleic acid segments.
References herein to an “isolating nucleic acid molecule” refer to a molecule formed of nucleic acids that is configured to bind to the target nucleic acid segment or segments. For example, the isolating nucleic acid molecule may contain the complementary sequence to the target nucleic acid segment or segments which will then form interactions with the nucleotide bases of the target nucleic acid segment or segments (i.e. to form base pairs (bp)). It will be understood that the isolating nucleic acid molecule, for example biotinylated RNA, does not need to contain the entire complementary sequence of the target nucleic acid segment or segments in order to form complementary interactions and isolate it from the nucleic acid composition. The isolating nucleic acid molecule may be at least 10 nucleotide bases long, for example, at least 20, 30, 40, 50, 60, 70, 80, 90, 100, 130, 150, 170, 200, 300, 400, 500, 750, 1000, 2000, 3000, 4000 or 5000 nucleotide bases long.
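By way of illustration only, the complementarity between an isolating nucleic acid molecule and a target nucleic acid segment can be sketched as follows. This minimal Python sketch assumes an RNA isolating molecule which is the exact reverse complement of part of a DNA target; the function names and sequences are hypothetical, and exact matching is a simplification, since hybridisation in practice tolerates some mismatches.

```python
# Translation tables for Watson-Crick pairing between DNA and RNA.
DNA_TO_RNA_COMPLEMENT = str.maketrans("ACGT", "UGCA")
RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")


def rna_bait_for(target_dna):
    """Return the RNA reverse complement of a DNA target, written 5' to 3'."""
    return target_dna.upper().translate(DNA_TO_RNA_COMPLEMENT)[::-1]


def binds(bait_rna, fragment_dna):
    """Return True if the fragment contains a stretch exactly complementary to the bait."""
    footprint = bait_rna.upper().translate(RNA_TO_DNA_COMPLEMENT)[::-1]
    return footprint in fragment_dna.upper()


# Hypothetical 10-base target (the minimum bait length mentioned above).
target = "AAGCTTGGCA"
bait = rna_bait_for(target)

# The bait covers only part of this longer hypothetical fragment, yet still matches it.
fragment_with_target = "CCCC" + target + "TTTT"
fragment_without_target = "CCCCCCCCCCCCCCCCCC"
```

As the sketch shows, the isolating nucleic acid molecule need only be complementary to a portion of a fragment in order to select it.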
In one embodiment, the addition of isolating nucleic acid molecules which bind to the target nucleic acid segment or segments is performed at between 65° C. and 72° C. In a particular embodiment, the addition of isolating nucleic acid molecules is performed at 65° C. Thus, in a further embodiment, step (i) of enrichment step (h) above is performed at between 65° C. and 72° C., such as at 65° C. In another embodiment, isolating fragments which contain the target nucleic acid segment or segments bound to the isolating nucleic acid molecules using the second half of said binding pair is performed at between 68° C. and 72° C. In a particular embodiment, isolating fragments using the second half of the binding pair is performed at 68° C. Thus, in a yet further embodiment, step (ii) of enrichment step (h) is performed at between 68° C. and 72° C., such as at 68° C.
In one embodiment, the isolating nucleic acid molecules are added in the presence of blocking or blocker sequences. Such blocker sequences prevent the binding of ligated fragments comprising adapter sequences to other ligated fragments comprising adapter sequences through any complementarity in the sequence of the adapter sequences. Thus, such blocker sequences prevent binding of fragments which do not comprise the target nucleic acid segment or segments to fragments which do comprise the target nucleic acid segment or segments. In certain embodiments, the blocker sequences are added to the ligated fragments prior to the addition of isolating nucleic acid molecules. In alternative embodiments, the blocker sequences are added to the ligated fragments concurrently, or together with, the isolating nucleic acid molecules. It will therefore be appreciated that, in one embodiment, the blocker sequences comprise any sequence compatible with the adapter sequences ligated to fragments, such as a sequence complementary to the particular adapter sequence. In a further embodiment, the blocker sequences comprise any sequence compatible with, such as complementary to, the MEDS oligonucleotides comprising half of paired end adapter sequences. In some embodiments, the blocker sequences are selected from: SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16 and/or SEQ ID NO: 17 as defined herein.
Additionally, enrichment step (h) of the present methods may be performed according to methods and utilising reagents known in the art. For example, where enrichment step (h) comprises isolating nucleic acid molecules which bind to the target nucleic acid segment or segments as described herein, the method or steps of the method may be performed in the presence of a buffer with high concentrations of divalent cation salt, such as between 100 mM and 600 mM. The salt may be present at a molar ratio of between 2.5:1 and 60:1. A volume-excluding/thickening agent may also be present, for example in a concentration of between 0.002% and 0.1%. Additionally or alternatively, said method or steps of the method may comprise incubating the nucleic acid composition in the presence of a buffer. Incubation may be for a period of 8 hours or less, optionally at two different temperatures, wherein the two different temperatures are cycled between 2 and 100 times. Examples of such buffers and methods are described in U.S. Pat. No. 9,587,268. It will thus be appreciated that, where enrichment step (h) is performed according to certain embodiments disclosed herein, enrichment comprising isolating nucleic acid molecules is more rapid than when using conventional reagents and reaction methods.
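By way of illustration only, the numerical ranges recited in the preceding paragraph may be collected into a simple check of proposed hybridization conditions. The following Python sketch is not part of the described method; the class name, field names and messages are hypothetical, and the molar ratio is omitted because the paragraph does not state which components it relates.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FastHybridizationConditions:
    """Hypothetical container for the ranges recited in the text."""
    salt_mM: float                              # divalent cation salt concentration
    thickener_percent: Optional[float] = None   # volume-excluding agent, if present
    incubation_hours: float = 8.0               # total incubation time
    temperature_cycles: Optional[int] = None    # only set when two temperatures are cycled

    def problems(self) -> List[str]:
        """Return a list of values falling outside the recited ranges."""
        issues = []
        if not 100 <= self.salt_mM <= 600:
            issues.append("salt concentration outside 100-600 mM")
        if self.thickener_percent is not None and not 0.002 <= self.thickener_percent <= 0.1:
            issues.append("thickening agent outside 0.002-0.1 %")
        if self.incubation_hours > 8:
            issues.append("incubation longer than 8 hours")
        if self.temperature_cycles is not None and not 2 <= self.temperature_cycles <= 100:
            issues.append("temperature cycles outside 2-100")
        return issues
```

An empty list returned by `problems()` indicates only that the values lie within the recited ranges; it does not indicate that the conditions are optimal for a given library.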
References herein to a “binding pair” refer to at least two moieties (i.e. a first half and a second half) that specifically recognise each other in order to form an attachment. Suitable binding pairs include, for example, biotin and avidin or biotin and derivatives of avidin such as streptavidin and neutravidin.
References herein to “labelling” or “labelled” refer to the process of distinguishing a target by attaching a marker, wherein the marker comprises a specific moiety having a unique affinity for a ligand (i.e. an affinity tag). For example, the label may serve to selectively purify the isolating nucleic acid sequence (i.e. for example, by affinity chromatography). Such a label may include, but is not limited to, a biotin label, a histidine label (i.e. 6His), or a FLAG label.
In one embodiment, the isolating nucleic acid molecules comprise biotin. In a further embodiment, the isolating nucleic acid molecules are labelled with biotin. In a yet further embodiment, the isolating nucleic acid molecules are labelled with a histidine label or a FLAG label. Thus, according to certain embodiments, the binding pair may comprise a label (such as a histidine or FLAG label) and an antibody.
In one embodiment, the target nucleic acid segment or segments are selected from promoters, enhancers, silencers or insulators. In a further embodiment, the target nucleic acid segment or segments are promoters. In a further alternative embodiment, the target nucleic acid segment or segments are insulators.
References herein to the terms “promoter” and “promoters” refer to nucleic acid sequences which facilitate the initiation of transcription of an operably linked coding region. Promoters are sometimes referred to as “transcription initiation regions”. Regulatory elements often interact with promoters in order to activate or inhibit transcription.
The present inventors have used the method of the invention to identify thousands of promoter interactions, with ten to twenty interactions occurring per promoter. The method described herein has identified some interactions to be cell specific, or to be associated with different disease states. A wide range of separation distances between interacting nucleic acid segments has also been identified—most interactions are within 100 kilobases, but some can extend to 2 megabases and beyond. Interestingly, the method has also been used to show that both active and inactive genes form interactions.
Nucleic acid segments that are identified to interact with promoters are candidates to be regulatory elements that are required for proper genetic control. Their disruption may alter transcriptional output and contribute to disease. Linking these elements to their target genes could therefore provide potential new drug targets for new therapies.
Identifying which regulatory elements interact with promoters is crucial to understanding genetic interactions. The present method also provides a snapshot of the interactions within the nucleic acid composition at a particular point in time. It is therefore envisaged that the method could be performed over a series of time points, developmental states or experimental conditions to build a picture of the changes of interactions within the nucleic acid composition of a cell.
It will be understood that in one embodiment the target nucleic acid segment interacts with a nucleic acid segment which comprises a regulatory element. In a further embodiment, the regulatory element comprises an enhancer, silencer or insulator.
The term “regulatory gene” as used herein, refers to any nucleic acid sequence encoding a protein, wherein the protein binds to the same or a different nucleic acid sequence thereby modulating the transcription rate or otherwise affecting the expression level of the same or a different nucleic acid sequence. The term “regulatory element” as used herein, refers to any nucleic acid sequence that affects the activity status of another genomic element. For example, various regulatory elements may include, but are not limited to, enhancers, activators, repressors, insulators, promoters or silencers.
In one embodiment, the target nucleic acid molecule is a genomic site identified through chromatin immunoprecipitation (ChIP) sequencing. ChIP sequencing experiments analyse protein-DNA interactions by crosslinking protein-DNA complexes within a nucleic acid composition. The protein-DNA complex is then isolated (by immunoprecipitation) prior to sequencing the genomic region to which the protein is bound.
It will be envisaged that in some embodiments, the nucleic acid segment is located on the same chromosome as the target nucleic acid segment. Alternatively, the nucleic acid segment is located on a different chromosome to the target nucleic acid segment.
The method may be used to identify a long range interaction, a short range interaction or a close neighbour interaction. The term “long range interaction” as used herein, refers to the detection of interacting nucleic acid segments that are far apart within the linear genome sequence. This type of interaction may identify two genomic regions that are, for instance, located on different arms of the same chromosome, or located on different chromosomes. The term “short range interaction” as used herein, refers to the detection of interacting nucleic acid segments that are located relatively close to each other within the genome. The term “close neighbour interaction” as used herein, refers to the detection of interacting nucleic acid segments that are very close to each other in the linear genome and, for instance, part of the same gene.
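By way of illustration only, the three categories defined above may be sketched as a simple classifier in Python. The text defines the categories qualitatively, so the distance cut-offs below are hypothetical values chosen purely for the example, as are the class and function names.

```python
from dataclasses import dataclass

# Hypothetical cut-offs in base pairs; the text gives no numerical thresholds.
CLOSE_NEIGHBOUR_MAX_BP = 10_000
SHORT_RANGE_MAX_BP = 1_000_000


@dataclass(frozen=True)
class Segment:
    chromosome: str
    position: int  # representative coordinate (e.g. midpoint) in base pairs


def classify_interaction(a: Segment, b: Segment) -> str:
    """Assign an interacting pair to one of the three categories."""
    # Segments on different chromosomes are treated as long range.
    if a.chromosome != b.chromosome:
        return "long range"
    distance = abs(a.position - b.position)
    if distance <= CLOSE_NEIGHBOUR_MAX_BP:
        return "close neighbour"
    if distance <= SHORT_RANGE_MAX_BP:
        return "short range"
    return "long range"
```

In practice the cut-offs would be chosen to suit the genome and resolution under study, for example with reference to gene boundaries rather than a fixed distance.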
SNPs have been shown by the present inventors to be positioned more often in an interacting nucleic acid segment than would be expected by chance. The method of the present invention can therefore be used to identify which SNPs interact with, and are therefore likely to regulate, specific genes.
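By way of illustration only, one common way of asking whether SNPs fall in interacting segments more often than expected by chance is a one-sided hypergeometric test. The following Python sketch is an assumption for the purposes of example and is not stated to be the analysis used by the inventors; the function and argument names are hypothetical.

```python
from math import comb


def snp_enrichment(total_segments: int, interacting_segments: int,
                   snp_segments: int, snp_in_interacting: int):
    """Return (fold_enrichment, p_value) for SNP-containing segments.

    total_segments        all segments considered
    interacting_segments  segments identified as interacting
    snp_segments          segments containing at least one SNP
    snp_in_interacting    segments that are both interacting and SNP-containing
    """
    # Number of SNP-containing interacting segments expected by chance.
    expected = snp_segments * interacting_segments / total_segments
    fold = snp_in_interacting / expected

    # Upper tail of the hypergeometric distribution: probability of seeing
    # at least the observed overlap if SNP segments were placed at random.
    denominator = comb(total_segments, interacting_segments)
    upper = min(snp_segments, interacting_segments)
    numerator = sum(
        comb(snp_segments, k)
        * comb(total_segments - snp_segments, interacting_segments - k)
        for k in range(snp_in_interacting, upper + 1)
    )
    return fold, numerator / denominator
```

A fold enrichment above 1 together with a small p-value would be consistent with SNPs being over-represented in interacting segments.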
Thus, from the disclosures presented herein, it will be appreciated that the present methods may be used to identify any nucleic acid interactions, in particular DNA-DNA interactions within a nucleic acid composition.
In one embodiment, the isolating nucleic acid molecule is obtained from bacterial artificial chromosomes (BACs), fosmids or cosmids. In a further embodiment, the isolating nucleic acid molecule is obtained from bacterial artificial chromosomes (BACs).
In one embodiment, the isolating nucleic acid molecule is DNA, cDNA or RNA. In a further embodiment, the isolating nucleic acid molecule is RNA.
The isolating nucleic acid molecule may be employed in a suitable method, such as solution hybridization selection (see WO 2009/099602). In this method a set of ‘bait’ sequences is generated to form a hybridization mixture that can be used to isolate a sub group of target nucleic acids from a sample (i.e. ‘pond’).
In one embodiment, the first half of the binding pair comprises biotin and the second half of the binding pair comprises streptavidin.
In one embodiment, the method additionally comprises reversing the cross-linking prior to step (f). It will be understood that there are several ways known in the art to reverse crosslinks and it will depend upon the way in which the crosslinks are originally formed. For example, crosslinks may be reversed by subjecting the crosslinked nucleic acid composition to high heat, such as above 50° C., 55° C., 60° C., 65° C., 70° C., 75° C., 80° C., 85° C., or greater. Furthermore, the crosslinked nucleic acid composition may need to be subjected to high heat for longer than 1 hour, for example, at least 5 hours, 6 hours, 7 hours, 8 hours, 9 hours, 10 hours or 12 hours or longer. In one embodiment, reversing the cross-linking prior to step (f) comprises incubating the crosslinked nucleic acid composition at 65° C. for at least 8 hours (i.e. overnight) in the presence of Proteinase K.
In one embodiment, the method additionally comprises purifying the nucleic acid composition to remove any fragments which do not contain the junction marker prior to step (f).
References herein to “purifying”, may refer to a nucleic acid composition that has been subjected to treatment (i.e. for example, fractionation) to remove various other components, and which composition substantially retains its expressed biological activity. Where the term “substantially purified” is used, this designation will refer to a composition in which the nucleic acid forms the major component of the composition, such as constituting about 50%, about 60%, about 70%, about 80%, about 90%, about 95% or more of the composition (i.e. for example, weight/weight (w/w), volume/volume (v/v) and/or weight/volume (w/v)).
In one embodiment, the method additionally comprises amplifying the isolated target ligated fragments prior to step (i). In a further embodiment, the amplifying is performed by polymerase chain reaction (PCR).
In one embodiment, the nucleic acid composition is derived from a mammalian cell nucleus. In a further embodiment, the mammalian cell nucleus may be a human cell nucleus. Many human cells are available in the art for use in the method described herein, for example GM12878 (a human lymphoblastoid cell line) or CD34+ (human ex vivo haematopoietic progenitors).
It will be appreciated that the method described herein finds utility in a range of organisms, not just humans. For example, the present method may also be used to identify genomic interactions in plants and animals.
Therefore, in an alternative embodiment, the nucleic acid composition is derived from a non-human cell nucleus. In one embodiment, the non-human cell is selected from the group including, but not limited to, plants, yeast, mice, cows, pigs, horses, dogs, cats, goats, or sheep. In one embodiment, the non-human cell nucleus is a mouse cell nucleus or a plant cell nucleus.
It will be appreciated from the advantages of the invention as mentioned herein, that the present methods provide for a reduced loss of nucleic acid composition during the herein mentioned steps. Such reduced loss of nucleic acid composition may allow for the amount of starting material to be reduced, for example the number of cells from which the nucleic acid composition is obtained. Thus, in one embodiment, the nucleic acid composition may be derived from a smaller number of cells than previous promoter capture or conformation capture techniques. In a further embodiment, the nucleic acid composition is derived from 1 million or fewer cells, 0.5 million or fewer cells, 0.2 million or fewer cells, 50000 or fewer cells or 10000 or fewer cells. In a yet further embodiment, the nucleic acid composition is derived from 1 million cells, 0.5 million cells, 0.2 million cells, 50000 cells or 10000 cells. In certain embodiments, the nucleic acid composition is derived from 1 million cells, 50000 cells or 10000 cells.
In one embodiment, the method as defined herein comprises the steps of:
In another embodiment, the method as defined herein comprises the steps of:
According to a further aspect of the invention, there is provided a method of identifying one or more interacting nucleic acid segments that are indicative of a particular disease state comprising:
References to “frequency of interaction” or “interaction frequency” as used herein, refer to the number of times a specific interaction occurs within a nucleic acid composition (i.e. sample). In some instances, a lower frequency of interaction in the nucleic acid composition, compared to a normal control nucleic acid composition from a healthy subject, is indicative of a particular disease state (i.e. because the nucleic acid segments are interacting less frequently). Alternatively, a higher frequency of interaction in the nucleic acid composition, compared to a normal control nucleic acid composition from a healthy subject, is indicative of a particular disease state (i.e. because the nucleic acid segments are interacting more frequently). In some instances, the difference will be represented by at least a 0.5-fold difference, such as a 1-fold, 1.5-fold, 2-fold, 2.5-fold, 3-fold, 4-fold, 5-fold, 7-fold or 10-fold difference.
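By way of illustration only, a fold difference in interaction frequency between a sample and a normal control may be sketched in Python as follows. Normalising each count by the total number of read pairs in its library, and the default 2-fold threshold, are assumptions made for the example; the function names are hypothetical.

```python
def fold_difference(sample_count: int, control_count: int,
                    sample_total: int, control_total: int) -> float:
    """Ratio of normalised interaction frequencies (sample / control)."""
    # Normalise by library size so that libraries of different depth are comparable.
    sample_freq = sample_count / sample_total
    control_freq = control_count / control_total
    return sample_freq / control_freq


def flag_interaction(fold: float, threshold: float = 2.0) -> str:
    """Label an interaction relative to a hypothetical fold threshold."""
    if fold >= threshold:
        return "higher"     # interacting more frequently than in the control
    if fold <= 1 / threshold:
        return "lower"      # interacting less frequently than in the control
    return "unchanged"
```

For example, 40 read pairs supporting an interaction in a sample library against 10 in a control library of equal size gives a fold difference of approximately 4, which this sketch labels as higher.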
In one aspect of the invention, the frequency of interaction may be used to determine the spatial proximity of two different nucleic acid segments. As the interaction frequency increases, the probability increases that the two genomic regions are physically proximal to one another in 3D nuclear space. Conversely, as the interaction frequency decreases, the probability decreases that the two genomic regions are physically proximal to one another in 3D nuclear space.
Quantifying can be performed by any method suitable to calculate the frequency of interaction in a nucleic acid composition from a patient or a purification or extract of a nucleic acid composition sample or a dilution thereof. For example, high throughput sequencing results can also enable examination of the frequency of a particular interaction. In methods of the invention, quantifying may be performed by measuring the concentration of the target nucleic acid segment or ligation products in the sample or samples. The nucleic acid composition may be obtained from cells in biological samples that may include cerebrospinal fluid (CSF), whole blood, blood serum, plasma, or an extract or purification therefrom, or dilution thereof. In one embodiment, the biological sample may be cerebrospinal fluid (CSF), whole blood, blood serum or plasma. Biological samples also include tissue homogenates, tissue sections and biopsy specimens from a live subject, or taken post-mortem. The samples can be prepared, for example where appropriate diluted or concentrated, and stored in the usual manner.
In one embodiment, the disease state is selected from: cancer, autoimmune disease, a developmental disorder, a genetic disorder, diabetes, cardiovascular disease, kidney disease, lung disease, liver disease, neurological disease, viral infection or bacterial infection. In a further embodiment, the disease state is cancer or autoimmune disease. In a yet further embodiment, the disease state is cancer, for example breast, bowel, bladder, bone, brain, cervical, colon, endometrial, oesophageal, kidney, liver, lung, ovarian, pancreatic, prostate, skin, stomach, testicular, thyroid or uterine cancer, leukaemia, lymphoma, myeloma or melanoma.
References herein to an “autoimmune disease” include conditions which arise from an immune response targeted against a person's own body, for example Acute disseminated encephalomyelitis (ADEM), Ankylosing Spondylitis, Behçet's disease, Celiac disease, Crohn's disease, Diabetes mellitus type 1, Graves' disease, Guillain-Barré syndrome (GBS), Psoriasis, Rheumatoid arthritis, Rheumatic fever, Sjögren's syndrome, Ulcerative colitis and Vasculitis.
References herein to a “developmental disorder” include conditions, usually originating from childhood, such as learning disabilities, communication disorders, Autism, Attention-deficit hyperactivity disorder (ADHD) and Developmental coordination disorder.
References herein to a “genetic disorder” include conditions which result from one or more abnormalities in the genome, such as Angelman syndrome, Canavan disease, Charcot-Marie-Tooth disease, Colour blindness, Cri du chat syndrome, Cystic fibrosis, Down syndrome, Duchenne muscular dystrophy, Haemochromatosis, Haemophilia, Klinefelter syndrome, Neurofibromatosis, Phenylketonuria, Polycystic kidney disease, Prader-Willi syndrome, Sickle-cell disease, Tay-Sachs disease and Turner syndrome.
According to a further aspect of the invention, there is provided a kit for identifying a nucleic acid segment which interacts with a target nucleic acid segment or segments, which comprises buffers and reagents capable of performing the methods defined herein.
The kit may include one or more articles and/or reagents for performance of the method. For example, an oligonucleotide probe, pair of amplification primers and/or recombinase enzyme associated oligonucleotides for use in the methods described herein may be provided in isolated form and may be part of a kit, e.g. in a suitable container such as a vial in which the contents are protected from the external environment. The kit may include instructions for use according to the protocol of the method described herein. A kit wherein the nucleic acid is intended for use in PCR may include one or more other reagents required for the reaction, such as polymerase, nucleotides, buffer solution etc.
In one embodiment, the kit comprises a recombinase enzyme. In a further embodiment, the recombinase enzyme comprised in the kit as defined herein is a transposase enzyme, such as a hyperactive mutant transposase enzyme, e.g. a hyperactive mutant Tn5 transposase.
According to a yet further aspect of the invention, there is provided a recombinase enzyme as defined herein capable of single step fragmentation and adapter insertion. Thus, there is also provided herein, a recombinase enzyme capable of tagmentation.
In one embodiment, the recombinase enzyme provided herein is a hyperactive mutant transposase enzyme. In a further embodiment, the transposase enzyme is hyperactive mutant Tn5 transposase. In a yet further embodiment, the transposase enzyme comprises paired end adapter sequences.
It will be understood that examples of the types of buffers and reagents to be included in the kit, in addition to those previously described can be seen in the Examples described herein.
The following studies and protocols illustrate embodiments of the methods described herein:
BB Binding buffer
dd dideoxy
EDTA Ethylenediaminetetraacetic acid
HB Agilent Hybridization buffer (HBI, HBII, HBIII and HBIV)
SPRI beads Solid Phase Reversible Immobilisation beads
TB Tween buffer
WB Wash buffer
Incubate for 7 min at 55° C. without mixing.
Aim for DNA fragment distribution around 400 bp.
As a guideline: use 0.5-1 μl of 12.3 μM Tn5 if working with ˜50 ng of DNA. If working with 100-300 ng of DNA, use 1 μl of 24.6 μM Tn5.
For better results—titrate the amount of Tn5 to get a proper fragment distribution.
If the distribution is correct—add 1 μl of nuclease free water to the initial tagmentation mix from step 21 and strip off the Tn5 by adding 5 μl of 0.2% SDS and incubating at 55° C. for 7 min.
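By way of illustration only, the Tn5 guideline given above may be sketched as a lookup in Python. The tolerance applied around 50 ng is a hypothetical value, as are the function name and return format; DNA amounts not covered by the guideline return no suggestion, in which case the amount of Tn5 would be titrated.

```python
from typing import Optional, Tuple

# Hypothetical tolerance used to interpret "~50 ng"; not taken from the protocol.
APPROX_50_NG_TOLERANCE = 10.0


def suggest_tn5(dna_ng: float) -> Optional[Tuple[float, float, float]]:
    """Return (Tn5 concentration in uM, min volume in ul, max volume in ul).

    Returns None where the guideline gives no figure for the DNA amount.
    """
    if 100 <= dna_ng <= 300:
        return (24.6, 1.0, 1.0)
    if abs(dna_ng - 50) <= APPROX_50_NG_TOLERANCE:
        return (12.3, 0.5, 1.0)
    return None


def tn5_pmol(concentration_uM: float, volume_ul: float) -> float:
    """Amount of Tn5 in pmol (uM multiplied by ul gives pmol)."""
    return concentration_uM * volume_ul
```

The lookup is a starting point only; the resulting fragment distribution would still be checked as described above.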
Prepare three PCR strips: “DNA”, “Hybridization” and “RNA”.
Mix thoroughly; if a precipitate has formed, heat at 65° C. for 5 minutes. Aliquot 30 μl per capture to each well in “Hybridization” PCR strip (Agilent 410022), close with a PCR strip tube lid (Agilent optical cap 8×strip 401425) and keep at room temperature.
The PCR machine lid has to be heated. Throughout the procedure, work quickly and try to keep the PCR machine lid open for the minimum time possible. Evaporation of the sample will result in suboptimal hybridization conditions.
Close the remaining “RNA” PCR strip (now containing Hi-C library/hybridization buffer/RNA bait as shown in
Streptavidin-Biotin Pull-Down and Washes—to be used with Method 1 above
Binding buffer (BB, Agilent Technologies) at room temperature
Wash buffer I (WB I, Agilent Technologies) at room temperature
Wash buffer II (WB II, Agilent Technologies) at between 65° C. and 72° C., in particular at 65° C.
NEB2 1× (NEB B7002S) at room temperature
Mix Dynabeads MyOne Streptavidin T1 (Life Technologies 65601) thoroughly before adding 60 μl per Capture Hi-C sample into a 1.5 ml low bind Eppendorf tube. Wash the beads as follows (same procedures for all subsequent wash steps):
Repeat steps a) to d) for a total of 3 washes.
With the Dynabeads MyOne Streptavidin T1 beads in 200 μl BB in a fresh low bind Eppendorf tube, open the lid of the PCR machine (while the PCR machine is running) and pipette the entire hybridization reaction into the tube containing the streptavidin beads. Incubate on a rotating wheel for 30 mins at room temperature.
After 30 mins, place the sample on the magnetic separator, discard supernatant.
Resuspend beads in 500 μl WB I, and transfer to a fresh tube. Incubate at room temperature for 15 mins. Vortex every 2 to 3 minutes for 5 seconds each.
Separate the beads and buffer on a magnetic separator and remove the supernatant. Resuspend in 500 μl WB II (prewarmed to between 65° C. and 72° C., in particular 65° C.) and transfer to a fresh tube. Incubate at between 65° C. and 72° C., in particular at 65° C., for 10 mins, and vortex (at low to medium setting) for 5 seconds every 2 to 3 minutes. Repeat for a total of 3 washes in WB II, all at between 65° C. and 72° C., in particular at 65° C.
Resuspend in 200 μl of NEB2 1×. Put directly on the magnet. Remove the supernatant and resuspend in 30 μl of NEB2 1×.
The RNA/DNA mixture hybrid ‘catch’ on beads is now ready for PCR amplification (step 45).
Capture Hybridization of Hi-C Library with Biotin-RNA—Method 2 (using Buffers with High Concentrations of Divalent Cation Salt—herein referred to as “Fast Hybridization”)
As described herein, according to embodiments utilising this method, preparation time may be greatly reduced (for example, to approx. 2 hours 45 minutes).
Resuspend in a final volume of 20 μl TLE or nuclease free water.
Check the quality and quantity of the Capture Hi-C library by TapeStation/Bioanalyzer and KAPA qPCR.
1540 mM MgCl2·6H2O, 0.0417% w/w HPMC, 100 mM Tris (pH 8.0) and H2O.
(“low-stringency buffer”—high salt concentrations and low temperatures, to remove non-specifically bound probe) 2×SSC, 0.1% SDS and H2O.
(“high-stringency buffer”—low salt concentrations and high temperatures, to remove low-affinity hybridization probe) 0.1×SSC, 0.1% SDS and H2O.
Number | Date | Country | Kind |
---|---|---|---|
1914325.4 | Oct 2019 | GB | national |
Relation | Number | Date | Country |
---|---|---|---|
Parent | PCT/GB2020/052448 | Oct 2020 | US |
Child | 17656063 | | US |