ENGINEERED TRANSPOSASE AND USES THEREOF

FIELD OF THE INVENTION

The present invention relates to the field of genomic and epigenomic analysis. More specifically, the present invention relates to an engineered transposase and an engineered transposome to target specific regions of chromatin. The present invention also relates to methods for genomic and/or epigenomic analysis and uses of the engineered transposase and/or engineered transposome of the invention for genomic and/or epigenomic analysis.

BACKGROUND TO THE INVENTION

Epigenetic modifications are heritable phenotype changes that do not result from alteration of the DNA sequence itself. Epigenetic mechanisms are highly conserved throughout eukaryotes. Examples of epigenetic modifications include histone modification and DNA methylation, each of which alters gene expression without changing the underlying DNA sequence. In particular, histone modification alters local chromatin structure and thereby gene expression.

Several human diseases are the result of disrupted epigenetics impinging on underlying genetic lesions. A case in point is represented by cancer. Cancers are characterized by extensive inter-patient and intra-tumour heterogeneity, down to the single cell level. This fuels clonal evolution, leading to treatment resistance, both primary and acquired, which is the leading cause of death for cancer patients. Despite extensive studies, the mechanisms underlying this resistance are still largely unknown both for standard chemotherapeutic regimens and for the recently introduced immunotherapies. Increasingly detailed analyses of cancer genomes, before and after treatment, have so far failed to identify genetic causes, such as the acquisition of somatic mutations or copy number aberrations, which could explain the ensuing refractoriness to therapeutic regimens. Growing evidence points to epigenetic traits as crucially important in driving the acquisition of resistance towards anticancer regimens. This suggests that only a comprehensive assessment of both genetic changes of the cancer genome and the concomitant chromatin remodelling events ensuing after treatment could finally provide the insights required to tackle this pressing unmet clinical need. Additionally, given the rampant heterogeneity that is present within cancer cell populations, single-cell approaches are emerging as truly revolutionary tools to reliably and comprehensively capture cancer heterogeneity and inform on treatment resistance mechanisms.

Next-generation sequencing (NGS) has transformed genomic research by reducing turnaround time and cost. Library construction plays an important role in high-throughput NGS. A plethora of library construction methods have been developed, including the traditional ligation-based methods and the more recently developed transposase-based Nextera method.

The transposase-based Nextera approach employs an in vitro transposition reaction, using a transposome complex formed of a transposase Tn5 and a free transposon end that contains a transposase recognition site mosaic end (ME) and a sequencing adaptor (which may be a sequencing primer). When the transposome complex is incubated with target double-stranded DNA (dsDNA), the target dsDNA undergoes tagmentation by the transposase. Thus, the target dsDNA is fragmented and the transposon (including the ME and the sequencing primer) is covalently attached to the 5′ end of the target dsDNA fragment, resulting in a sequencing-ready DNA library. Nextera libraries can also incorporate tagging sequences (also termed barcodes), enabling multiplexed sequencing in a single run.

Whilst significant improvements have been made in genome sequencing approaches, methodologies currently used for sequencing of chromatin fragments suffer from various limitations.

Conventional chromatin immunoprecipitation with sequencing (ChIP-seq) is a complex, time-consuming and multistep process involving crosslinking of DNA and protein in live cells, extraction followed by shearing of crosslinked material, immunoprecipitation of crosslinked DNA-protein complexes (by antibody binding of the protein of interest), reverse crosslinking, and the sequencing of the resulting DNA molecules. Thus, ChIP-seq and its variations involve performing DNA sequence analysis on the fraction of DNA isolated by immunoprecipitation with antibodies specific to the protein of interest, which is directly or indirectly associated with DNA. These methodologies suffer from low signals, high backgrounds, epitope masking due to cross-linking, low yields which require large numbers of cells, limitations associated with efficient immunocapture of protein-associated DNA, and technical challenges associated with the use of antibodies. In particular, ChIP-seq and other antibody-based approaches are limited to a single library per immunoprecipitation, i.e. these methods are not suitable for multiplex sequencing analysis of different epigenetic markers.

ChIP-seq and Nextera sequencing have also been integrated in an approach termed transposase assisted chromatin immunoprecipitation (TAM-ChIP). This approach combines the antibody-mediated targeting of chromatin immunoprecipitation with the ability of Tn5 to tagment DNA, leading to chromatin fragmentation and tagging of the chromatin surrounding the antibody binding site. In this process, a transposase is conjugated to an antibody such that antibody-directed tagmentation of DNA by the transposase occurs following antibody binding to the target molecule. This approach relies on antibodies, which pose technical challenges.

Recently, a method for determining chromatin accessibility has been developed, termed Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq). This method uses transposases to probe accessible chromatin. Transposases allow for the fragmenting and then sequencing of native accessible chromatin in bulk (ATAC-seq), as well as at the single-cell level (scATAC-seq). This approach is providing key insights on the cellular status of open chromatin. However, the epigenetic modifications of large portions of the genome which exert essential roles in cellular physiology are excluded from this analysis.

Hence, while recent efforts have succeeded in surveying open chromatin, the high-throughput, single-cell assessment of genomic and epigenetic landscapes remains challenging.

Thus, there is a significant need in the art for a tool which comprehensively audits, for example at the single cell level, both the genomic and the epigenetic landscape.

SUMMARY OF THE INVENTION

The present inventors have developed engineered transposases which have been redirected to bind to a different component of chromatin compared to the corresponding wild type transposase. This permits the analysis of chromatin modifications which were previously excluded from sequencing analyses.

In addition, the present inventors have devised a genomic and epigenetic approach, termed “genome and epigenome by transposases sequencing” (GET-seq), which can be performed at the single-cell level (scGET-seq), that may exploit such engineered transposases to comprehensively probe open and closed chromatin, concomitantly recording the underlying genomic sequences. Hence, a comprehensive epigenetic assessment of heterochromatin is achieved. Additionally, building upon the differential enrichment between closed and open chromatin, the present inventors devised a method using scGET-seq, termed “Chromatin Velocity”, which identifies the trajectories of epigenetic modifications at the single-cell level. Thus, GET-seq, and in particular, scGET-seq, may illuminate the dynamic and evolving genomic and epigenetic landscapes of single cell populations in physiology and human diseases.

Furthermore, the present inventors have devised a multiomics approach (i.e. an approach which combines multiple omics technologies), termed GET²-seq, which can be performed at the single-cell level (scGET²-seq), that may exploit the engineered transposases described herein to comprehensively probe open and closed chromatin, concomitantly recording the underlying genomic sequences while simultaneously capturing RNA. Hence, a comprehensive genomic, epigenomic and transcriptomic approach may be achieved. Thus, GET²-seq, and in particular, scGET²-seq, may illuminate the dynamic and evolving genomic, epigenetic and transcriptomic landscapes of single cell populations in physiology and human diseases.

The methods of the invention significantly improve on the principal techniques currently used for sequencing of chromatin fragments, such as for epigenetic analysis, including Nextera (transposon-based), ATAC-seq (transposon-based), ChIP and TAM-ChIP. In particular, existing methodologies may not be suitable for single cell analysis, require extraction and optionally fragmentation of genomic DNA, exclude epigenetic modifications of large portions of the genome and/or rely on antibodies, which pose technical challenges. The methods of the invention permit multiplex sequencing analysis and are less time-consuming, i.e. more rapid and efficient, since they do not require steps such as histone-DNA crosslinking, chromatin shearing and de-crosslinking. Further, the GET²-seq method permits simultaneous genomic, epigenomic and transcriptomic profiling.

Advantages of the methods of the invention over conventional techniques include the following:

- the need for antibodies is eliminated, thus providing a more efficient and robust process;
- the insertion of the oligonucleotide to the DNA by the transposase reduces the input DNA requirement;
- the need for pre-processing of genetic material is eliminated (i.e. the method can be performed on intact cells), providing a more efficient and cost effective process;
- the methods of the invention may be applicable to a broader range of chromatin targets which were previously excluded due to the limited targeting of the available transposases and/or the lack of suitable antibodies for certain targets;
- the methods of the invention are applicable to single cell analysis;
- the methods of the invention are applicable to multiplexed sequencing applications;
- the methods of the invention permit simultaneous and dynamic profiling of both accessible and compacted chromatin, i.e. simultaneous and dynamic genomic and epigenetic analysis, even at the single cell level; and
- the multiomics methods of the invention achieve simultaneous and dynamic profiling of the chromatin conformation state (euchromatin and heterochromatin) and capture of RNA, e.g. simultaneous and dynamic genomic, epigenomic and transcriptomic profiling, even at the single cell level.

Accordingly, in one aspect, the invention provides a method for making a DNA sequence library or libraries comprising the steps:

- a) providing a sample comprising genomic DNA;
- b) adding at least one engineered transposome complex according to the invention;
- c) optionally amplifying tagged DNA; and
- d) optionally isolating the amplified DNA.

In some embodiments, the method further comprises the step of sequencing tagged DNA, the amplified DNA or the isolated DNA.

In a further aspect, the invention provides a method for DNA sequencing comprising the steps:

- a) providing a sample comprising genomic DNA;
- b) adding at least one engineered transposome complex according to the invention;
- c) optionally amplifying tagged DNA;
- d) optionally isolating the amplified DNA; and
- e) sequencing tagged DNA, the amplified DNA or the isolated DNA.

In a further aspect, the invention provides a method for genome sequence and/or epigenome analysis comprising the steps:

- a) providing a sample comprising genomic DNA;
- b) adding at least one engineered transposome complex according to the invention;
- c) optionally amplifying tagged DNA;
- d) optionally isolating the amplified DNA; and
- e) sequencing tagged DNA, the amplified DNA or the isolated DNA.

In some embodiments, the sample further comprises RNA.

In some embodiments, the methods further comprise the steps of tagging the RNA, optionally amplifying the tagged RNA, optionally isolating the amplified cDNA and optionally sequencing the tagged RNA, amplified cDNA or isolated cDNA. Suitably, the RNA is tagged using a polyA capture probe(s) which may comprise an RNA tagging sequence.

In a further aspect, the invention provides a method for making a DNA sequence library or libraries and an RNA sequence library or libraries comprising the steps:

- a) providing a sample comprising genomic DNA and RNA;
- b) (i) adding at least one engineered transposome complex; and
  - (ii) tagging the RNA;
- c) optionally amplifying tagged DNA and/or tagged RNA;
- d) optionally isolating the amplified DNA and/or the amplified cDNA; and
- e) optionally sequencing tagged DNA, the amplified DNA or the isolated DNA and/or optionally sequencing tagged RNA, the amplified cDNA or the isolated cDNA.

In a further aspect, the invention provides a method for DNA sequencing and RNA sequencing comprising the steps:

- a) providing a sample comprising genomic DNA and RNA;
- b) (i) adding at least one engineered transposome complex; and
  - (ii) tagging the RNA;
- c) optionally amplifying tagged DNA and/or tagged RNA;
- d) optionally isolating the amplified DNA and/or the amplified cDNA; and
- e) sequencing tagged DNA, the amplified DNA or the isolated DNA and sequencing tagged RNA, the amplified cDNA or the isolated cDNA.

In a further aspect, the invention provides a method for genome sequence, epigenome and/or transcriptome analysis comprising the steps:

- a) providing a sample comprising genomic DNA and RNA;
- b) (i) adding at least one engineered transposome complex; and
  - (ii) tagging the RNA;
- c) optionally amplifying tagged DNA and/or tagged RNA;
- d) optionally isolating the amplified DNA and/or the amplified cDNA; and
- e) sequencing tagged DNA, the amplified DNA or the isolated DNA and sequencing tagged RNA, the amplified cDNA or the isolated cDNA.

In some embodiments, the sequencing comprises single-cell sequence analysis. Suitably, the method may use a microfluidic device. Suitably, the method may use a droplet-based microfluidic device and/or beads comprising an RNA tagging sequence(s).

In some embodiments, the engineered transposome complex comprises an oligonucleotide and an engineered transposase.

In some embodiments, the oligonucleotide comprises a sequencing primer site, a tagging sequence and/or a mosaic end.

In some embodiments, the oligonucleotide comprises a 5′ phosphate group.

In some embodiments, the engineered transposase comprises a transposase operably linked to a polypeptide that binds to a component of heterochromatin and/or euchromatin. In preferred embodiments, the engineered transposase comprises a transposase operably linked to a polypeptide that binds to a component of heterochromatin.

In some embodiments, the polypeptide binds to methylated histone.

In some embodiments, the polypeptide binds to H3K9me3, H3K27me3 and/or H3K36me3.

In some embodiments, the polypeptide binds to H3K9me3.

In some embodiments, the polypeptide comprises a chromodomain, a bromodomain, a HMG-box domain, a JmJc domain, a KRAB domain or a PWWP domain.

In some embodiments, the polypeptide comprises a chromodomain.

In some embodiments, the chromodomain is selected from the chromodomain of heterochromatin protein 1-α, of chromobox protein homolog 2, of chromobox protein homolog 5, of chromobox protein homolog 7, of chromobox protein homolog 8, of yeast protein Eaf3 or of M phase phosphoprotein 8.

In preferred embodiments, the chromodomain is the chromodomain of heterochromatin protein 1-α.

In some embodiments, the transposase is a DD[E/D] transposase.

In some embodiments, the transposase is selected from Tn5, Sleeping Beauty, Tn10, Drosophila P element, bacteriophage Mu, Tc1/Mariner, IS10 and IS50.

In preferred embodiments, the transposase is Tn5.

In preferred embodiments, the engineered transposase comprises Tn5 operably linked to a chromodomain, preferably chromodomain of heterochromatin protein 1-α.

In some embodiments, the engineered transposase comprises:

- a) a sequence having at least 70% sequence identity to the sequence set forth in SEQ ID NO: 9; and/or
- b) a sequence having at least 70% sequence identity to the sequence set forth in SEQ ID NO: 22 or SEQ ID NO: 24.

In some embodiments, the engineered transposase comprises a sequence having at least 70% sequence identity to the sequence set forth in SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID NO: 5 or SEQ ID NO: 7.

In preferred embodiments, the engineered transposase comprises a sequence having at least 70% sequence identity to the sequence set forth in SEQ ID NO: 1. In some embodiments, the engineered transposase comprises a sequence having at least 70% sequence identity to the sequence set forth in SEQ ID NO: 3. In some embodiments, the engineered transposase comprises a sequence having at least 70% sequence identity to the sequence set forth in SEQ ID NO: 5. In some embodiments, the engineered transposase comprises a sequence having at least 70% sequence identity to the sequence set forth in SEQ ID NO: 7.
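By way of illustration only, a percent identity threshold such as the 70% figure above can be checked on two sequences that have already been aligned. The following Python sketch assumes one common convention (identical positions divided by the number of alignment columns, ignoring columns where both sequences have a gap); the function name `percent_identity`, the gap character `-` and the short example sequences are hypothetical and are not taken from the sequence listing. Other conventions for the denominator exist and may give different values.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumed convention: matches / alignment columns, where columns in which
    both sequences carry a gap ('-') are skipped. A gap opposite a residue
    counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            continue  # shared gap column carries no information
        columns += 1
        if a == b:
            matches += 1
    if columns == 0:
        raise ValueError("no comparable alignment columns")
    return 100.0 * matches / columns


# Hypothetical toy alignment, not a sequence from this document
identity = percent_identity("MKT-AIL", "MRTLAIL")
meets_threshold = identity >= 70.0
```

In this toy alignment five of the seven columns are identical, so the pair falls just above a 70% threshold under the assumed convention.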

In some embodiments, the analysis determines genomic copy number variants (CNVs). In some embodiments, the analysis determines single nucleotide variations (SNVs), for example within single cells.

In some embodiments, step b) further comprises adding at least one further transposome complex.

In some embodiments:

- a) the at least one engineered transposome complex; and
- b) the at least one further transposome complex,
  
  each binds to a different methylated histone.

In some embodiments:

- a) the at least one engineered transposome complex; and
- b) the at least one further transposome complex,
  
  each preferentially binds to a different methylated histone.

In some embodiments:

- a) the at least one engineered transposome complex; and
- b) the at least one further transposome complex,
  
  each has a different methylated histone binding specificity.

In some embodiments, the tagging sequence of the at least one engineered transposome complex differs from the tagging sequence of the at least one further transposome complex.
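Where the tagging sequences of the two complexes differ, sequenced reads can in principle be assigned to the complex that produced them. The following Python sketch illustrates this under stated assumptions: the tagging sequence is taken to sit at the start of each read, all tagging sequences are taken to have the same length, and the two 8-nucleotide tags shown are hypothetical placeholders rather than sequences disclosed herein.

```python
from collections import defaultdict

# Hypothetical 8-nt tagging sequences (placeholders, not from this document)
TAGS = {
    "ACGTACGT": "engineered",  # engineered transposome complex
    "TGCATGCA": "further",     # further transposome complex
}


def assign_reads(reads, tags=TAGS):
    """Group reads by the tagging sequence found at their 5' end.

    Assumes every tag has the same length and occupies the first bases of
    the read. Reads with an unknown tag are collected under 'unassigned'.
    The tag itself is trimmed from the stored read.
    """
    tag_len = len(next(iter(tags)))
    binned = defaultdict(list)
    for read in reads:
        label = tags.get(read[:tag_len], "unassigned")
        binned[label].append(read[tag_len:])
    return dict(binned)


example_reads = ["ACGTACGTGGGG", "TGCATGCACCCC", "NNNNNNNNAAAA"]
binned_reads = assign_reads(example_reads)
```

Exact matching is used here for simplicity; a practical pipeline might tolerate sequencing errors in the tag, which this sketch does not attempt.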

In some embodiments, the sample comprising genomic DNA is a sample of isolated cells, tissue, or whole organs. In some embodiments, the sample has not been pre-processed. In some embodiments, the sample comprising genomic DNA comprises genomic DNA which has been extracted from isolated cells, tissue, or whole organs, and optionally fragmented.

In some embodiments, nuclei in the sample have been permeabilized.

In some embodiments, the sample comprising genomic DNA is a sample comprising permeabilized nuclei.

In some embodiments, the sample comprising genomic DNA is a sample comprising permeabilized cells.

In some embodiments, the sample comprising genomic DNA comprises a single cell. In some embodiments, the sample comprising genomic DNA comprises an intact single cell.

In some embodiments, the sequencing comprises single-cell sequence analysis.

In some embodiments, the signals obtained from the at least one further transposome complex and the at least one engineered transposome complex at a DNA locus are compared.

In some embodiments, the at least one further transposase and/or at least one further transposome complex binds to euchromatin.

In some embodiments, the ratio between signals obtained from the at least one further transposome complex and the at least one engineered transposome complex at a DNA locus is determined. In some embodiments, an increase in the ratio indicates an increase in open chromatin. In some embodiments, a decrease in the ratio indicates an increase in compact chromatin.
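The ratio described above can be sketched as a per-locus calculation. The Python example below is a minimal illustration, assuming that the signal from each complex is available as a read count per locus; the use of a log2 transform, the pseudocount of 1 and the locus names are assumptions made for the example and are not specified in this section.

```python
import math


def open_closed_log_ratio(further_counts, engineered_counts, pseudocount=1.0):
    """Per-locus log2 ratio of further-complex signal over engineered-complex signal.

    Higher values suggest relatively more open chromatin at the locus;
    lower values suggest relatively more compact chromatin. The pseudocount
    (an assumption of this sketch) avoids division by zero.
    """
    ratios = {}
    for locus in sorted(set(further_counts) | set(engineered_counts)):
        f = further_counts.get(locus, 0) + pseudocount
        e = engineered_counts.get(locus, 0) + pseudocount
        ratios[locus] = math.log2(f / e)
    return ratios


# Hypothetical read counts per locus
further = {"chr1:0-5000": 31, "chr1:5000-10000": 3}
engineered = {"chr1:0-5000": 7, "chr1:5000-10000": 15}
ratios = open_closed_log_ratio(further, engineered)
```

With these hypothetical counts the first locus has a positive log ratio (relatively open) and the second a negative one (relatively compact).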

In a further aspect, the invention provides an engineered transposase as described herein.

In a further aspect, the invention provides an engineered transposase comprising a transposase operably linked to a polypeptide that binds to a component of heterochromatin and/or euchromatin.

In a further aspect, the invention provides an engineered transposase comprising a transposase operably linked to a polypeptide that binds to a component of heterochromatin.

In some embodiments, the polypeptide binds to methylated histone.

In some embodiments, the polypeptide binds to H3K9me3, H3K27me3 and/or H3K36me3.

In some embodiments, the polypeptide binds to H3K9me3.

In some embodiments, the polypeptide comprises a chromodomain, a bromodomain, a HMG-box domain, a JmJc domain, a KRAB domain or a PWWP domain.

In some embodiments, the polypeptide comprises a chromodomain.

In preferred embodiments, the chromodomain is the chromodomain of heterochromatin protein 1-α.

In some embodiments, the transposase is selected from Tn5, Sleeping Beauty, Tn10, Drosophila P element, bacteriophage Mu, Tc1/Mariner, IS10 and IS50.

In preferred embodiments, the transposase is Tn5.

In preferred embodiments, the engineered transposase comprises Tn5 operably linked to a chromodomain, preferably chromodomain of heterochromatin protein 1-α.

In some embodiments, the engineered transposase comprises:

- a) a sequence having at least 70% sequence identity to the sequence set forth in SEQ ID NO: 9; and/or
- b) a sequence having at least 70% sequence identity to the sequence set forth in SEQ ID NO: 22 or SEQ ID NO: 24.

In some embodiments, the engineered transposase comprises a sequence having at least 70% sequence identity to the sequence set forth in SEQ ID NO: 1, SEQ ID NO: 3, SEQ ID NO: 5 or SEQ ID NO: 7.

In a further aspect, the invention provides an engineered transposome complex as described herein.

In a further aspect, the invention provides an engineered transposome complex comprising an oligonucleotide and an engineered transposase according to the invention.

In some embodiments, the oligonucleotide comprises a sequencing primer site, a tagging sequence and/or a mosaic end. In some embodiments, the oligonucleotide comprises a sequencing primer site, a tagging sequence and a mosaic end.
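For illustration, an oligonucleotide containing all three elements can be modelled as a simple concatenation. In the Python sketch below, the 5′ to 3′ order (sequencing primer site, then tagging sequence, then mosaic end) is an assumption of the example, and the primer site and tagging sequence are hypothetical placeholders. The mosaic end shown is the 19-nucleotide sequence commonly used with Tn5; other transposases would use different recognition sequences.

```python
from dataclasses import dataclass

# 19-nt mosaic end commonly used with Tn5
TN5_MOSAIC_END = "AGATGTGTATAAGAGACAG"


@dataclass(frozen=True)
class TransposonOligo:
    """Toy model of a transposon oligonucleotide (element order is assumed)."""

    primer_site: str        # sequencing primer site (hypothetical placeholder)
    tagging_sequence: str   # barcode identifying the transposome complex
    mosaic_end: str = TN5_MOSAIC_END

    def sequence(self) -> str:
        # Assumed 5'->3' layout: primer site, tagging sequence, mosaic end
        return self.primer_site + self.tagging_sequence + self.mosaic_end


# Placeholder primer site and tag, chosen only for the example
oligo = TransposonOligo(primer_site="GGGGCCCC", tagging_sequence="ACGTACGT")
full_sequence = oligo.sequence()
```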

In a further aspect, the invention provides a kit comprising:

- a) at least one engineered transposase according to the invention and at least one further transposase; or
- b) at least one engineered transposome complex according to the invention and at least one further transposome complex.

In a further aspect, the invention provides the use of an engineered transposase according to the invention for making a DNA sequence library or libraries.

In a further aspect, the invention provides the use of an engineered transposome according to the invention for making a DNA sequence library or libraries.

In a further aspect, the invention provides the use of an engineered transposase according to the invention for DNA sequencing.

In a further aspect, the invention provides the use of an engineered transposome according to the invention for DNA sequencing.

In a further aspect, the invention provides the use of an engineered transposase according to the invention for genome and epigenetic sequencing.

In a further aspect, the invention provides the use of an engineered transposome according to the invention for genome and epigenetic sequencing.

In a further aspect, the invention provides a method for making a DNA sequence library or libraries comprising the steps:

- a) providing a sample comprising genomic DNA;
- b) adding at least one engineered transposome complex comprising an oligonucleotide and an engineered transposase;
- c) optionally amplifying tagged DNA; and
- d) optionally isolating the amplified DNA,