The present invention relates to nucleic acids comprising nucleotide sequences corresponding to or based on isolated and purified MAR sequences of human and non-human animal origin. These nucleic acids generally have transcription and/or protein production enhancing activities. The invention also relates to methods for identifying such sequences and systems employing them, e.g., for high yield production of proteins.
The publications and other materials, including patents, used herein to illustrate the invention and, in particular, to provide additional details respecting the practice are incorporated herein by reference. For convenience, the publications, as far as not stated in full within the text, are listed in alphabetical order in the appended bibliography. EMBL accession no. AC102666 and sequences flanked by EMBL accession no. BH101870 and BH101901 as well as EMBL accession nos. (synonyms). 126658, 23119391, 22981746 are also incorporated herein by reference in their entirety.
Nowadays, the model of the organization of eukaryotic chromosomes into chromatin loop domains of about 50 to 100 kb is widely accepted [Bodnar J W, Breyne P, Van Montagu M and Gheysen G, Razin S V]. The outer ends of these loops are believed to correspond to specific DNA sequences that are attached to the nuclear matrix, a proteinaceous network made up of RNPs (ribonucleoproteins) and other nonhistone proteins [Bode J, Benham C, Knopp A and Mielke C]. The chromosomal DNA sequences that are attached to the nuclear matrix are called SAR or MAR, respectively, for scaffold (during metaphase) or matrix (interphase) attachment regions. S/MARs, MAR elements or MAR sequences, or MARs for short, are polymorphic regions of typically 300-3000 bp length. It is estimated that there are approximately 100,000 MARs in a mammalian nucleus [Bode J, Stengert-Iber M, Kay V, Schlake T and Dietz-Pfeilstetter A].
By structurally and functionally segregating the chromatin into looped domains, MAR elements are considered to play a crucial role in the replication and regulation of gene expression such as to facilitate the sequential assembly and disassembly of transcription foci in mammalian nuclei. A host of indirect evidence has been generated to support this notion; for instance, in various eukaryotic genomes, DNA replication origins were mapped within MAR elements [Amati B and Gasser S M (1988), Amati B and Gasser S M (1990)]. MARs are also almost always found in non-coding intergenic regions, within introns [Girod P A, Zahn-Zabal M and Mermod N] or at the borders of transcription units [Gasser S M and Laemmli U K; National Center for Biotechnology Information], where they can bind ubiquitous and/or tissue-specific transcription factors. Overall, in transgenic experiments in plants and in animal cell lines, MAR elements have been successfully used to increase transgene expression and stability [Allen G C, Spiker S, Thompson W F, Bode J, Schlake T, Rios-Ramirez M, Mielke C, Stengart M, Kay V and Klehr-Wirth D, Girod P A, Zahn-Zabal M and Mermod N]. For instance, MARs have been used to increase the production of various recombinant proteins in cells relevant to biotechnology and therapeutic applications, such as CHO (Chinese hamster ovary) cells [Girod P A, Zahn-Zabal M and Mermod N, Kim J M, Kim J S, Park D H, Kang H S, Yoon J, Baek K and Yoon Y, Zahn-Zabal M, Kobr M, Girod P A, Imhof M, Chatellard P, de Jesus M, Wurm F and Mermod N] (Mermod et al., “Development of stable cell lines for production or regulated expression using matrix attachment regions,” WO 02074969, also U.S. Patent publication 20030087342).
The functional activity of MARs has been linked to their structural properties rather than to their primary DNA sequence. Indeed, MARs are high in A and T content [Boulikas T (1993)] and some particular conformational and physicochemical properties have been observed, such as a natural curvature of the molecule, a narrow minor groove, a high unwinding/unpairing potential or a susceptibility to denature [Bode J, Schlake T, Rios-Ramirez M, Mielke C, Stengart M, Kay V and Klehr-Wirth D, Boulikas T (1993), Boulikas T (1995)]. In fact those very properties have been used to identify MARs via a method called SMAR Scan. In addition, MAR activity may also be mediated by DNA binding proteins, such as chromatin remodeling enzymes and/or transcription factors that may recognize specific structural features of MAR elements such as single stranded and/or curved DNA [Bode J, Stengert-Iber M, Kay V, Schlake T and Dietz-Pfeilstetter A]. No clear-cut protein-binding site or MAR consensus sequence has been found [Boulikas T (1993)], which makes the prediction of MARs from genomic sequences difficult.
While certain functional and structural properties of MARs have been described, their identification is difficult, since they share little in terms of primary structure. While MAR elements may be functionally conserved in eukaryotic genomes, an assumption which is supported by the fact that animal MARs can bind to plant nuclear scaffolds and vice versa [Breyne P, Van Montagu M, Depicker A and Gheysen G, Mielke C, Kohwi Y, Kohwi-Shigematsu T and Bode J], little can be said about what feature renders a MAR sequence, e.g., a potent protein producing sequence. Also, varying results can be obtained depending on the assay employed [Razin S V, Boulikas T (1995), Kay V and Bode J]. Considering the huge number of expected MARs in a eukaryotic organism and the amount of sequences issued by genome projects, tools/programs were developed to detect the structural features of the MAR DNA sequences (SMAR Scan I), or functional sequences such as the binding sites for specific proteins that act as regulatory proteins or transcription factors (SMAR Scan II) [U.S. provisional patent application 60/953,910, filed Aug. 3, 2007, U.S. Patent Publication 20070178469 to Mermod et al.]. Such programs were designed to identify novel potential MAR sequences by detecting clusters of DNA sequence features corresponding to DNA bending, major groove depth and minor groove width potentials, as well as binding sites for specific transcription regulatory proteins. These programs have been used to scan the human genome to identify putative MAR DNA sequences, several of which were shown to increase transgene expression when introduced into an expression plasmid that was transfected into CHO cells (Girod et al., “Identification of S/MAR from genomic sequences with bioinformatics and use to increase protein production in industrial and therapeutic processes,” U.S. Patent Publication 20070178469 to Mermod et al.).
This demonstrated that the SMAR Scan programs can efficiently identify human genetic elements that, in turn, can be used to increase protein synthesis. While functional screens performed so far were limited to the human genome, in large-scale production, a protein of interest is often expressed in non-human mammalian cells.
About sixteen hundred MARs have been identified in the human genome by SMAR Scan and six out of eight were demonstrated to trigger enhanced expression of genes (such as for green fluorescent protein (GFP), antibodies and receptors) in CHO cells when placed upstream of the enhancer/promoter. The length of DNA shown to have ectopic MAR activity ranges from 2.5 kb to 6 kb. However, the lack of structural characterization of MARs has, as of now, limited the production of “designer” MARs. Thus, there is a need for the characterization of MARs, in particular functional and/or structural regions of MARs, to allow for MAR engineering and design.
The functional screens performed so far were limited to the human genome. Since in large-scale production, a protein of interest is often expressed in mammalian cells, there is also a need for identifying more potent naturally occurring MARs that enhance transcription and/or gene-expression and/or potent protein producer cells in human and/or non-human mammalian cells.
Overall, a need exists to identify and/or produce MARs having advantageous properties, e.g., by identifying further naturally occurring MARs, by engineering identified MARs and/or by producing synthetic MARs. Advantageous properties include, but are not limited to, enhanced transcription and/or protein production/gene-expression properties; reduced length relative to naturally occurring MARs, thus allowing, e.g., for more versatile use in genetic engineering; tissue, cell or organ specificity; and/or inducibility upon addition of an external stimulant, such as a drug.
To address one or more of these needs and other needs that will become apparent from the following disclosure, several approaches were employed including a large-scale bioinformatics analysis of the mouse genome to identify putative MAR DNA sequences. The mouse genome was analyzed using MAR predictive software SMAR Scan I. Newly identified rodent sequences were assessed for their ability to mediate improved production of recombinant proteins of pharmaceutical interest from cultured cells. To this end, the transcriptional activity of the newly identified MARs was assessed in transgene transfection assays.
Furthermore, MARs such as human MAR 1_68 and mouse MAR S4 were studied. Modules, in particular modules comprising certain structural and/or sequence-specific features of MARs, were identified, and these modules were utilized to engineer MARs having advantageous properties by, e.g., reshuffling, deletion and/or duplication of sequences. Modules were also combined with other elements, e.g., synthetic nucleotide sequences comprising certain binding sites, in particular transcription factor binding sites (TFBS).
The present invention is, in one embodiment, directed at an expression system for high-level expression of at least one gene comprising:
a promoter for operably linking a nucleotide sequence encoding a gene of interest, and
at least one non-human mammalian MAR nucleotide sequence for enhancing expression of said gene in a cell transformed with said expression system,
wherein said non-human mammalian MAR nucleotide sequence increases expression of said gene about 2, about 3, about 4, about 5, about 6, about 7, about 8, about 9, about 10 fold or more upon transformation of said cell with said construct.
Said non-human mammalian MAR nucleotide sequence may comprise, consist essentially of or consist of:
(i) SEQ ID No. 3, SEQ ID No. 10 or a functional fragment thereof; or
(ii) a nucleotide sequence having about 80%, about 90%, about 95% or about 98% sequence identity with any of the sequences of (i).
The invention is also directed at an isolated and purified nucleic acid molecule comprising, consisting essentially of or consisting of:
(a) the nucleotide sequence of SEQ ID No. 3 or SEQ ID No. 10 or a functional fragment thereof, or
(b) a nucleotide sequence that has at least about 80%, about 90%, about 95% or about 98% sequence identity with the sequence of (a) and has MAR activity.
The invention is furthermore directed at a method for identifying non-human mammalian MAR sequences comprising:
The feature may hereby be the DNA bending angle, whose value is multiplied with the window value to obtain a multiplication value of between about 320 and about 1320, such as between about 420 and about 1220, about 520 and about 1120, about 620 and about 1020, or about 720 and about 920; the feature may hereby be the major groove depth value, which is multiplied with the window value to obtain a multiplication value of between about 900 and about 4000, such as between about 1200 and about 3700, about 1500 and about 3400, about 1800 and about 3100, or about 2100 and about 2800; and/or the feature may hereby be the minor groove depth value, which is multiplied with the window size value to obtain a multiplication value of between about 500 and about 2500, such as between about 750 and about 2250, about 1000 and about 2000, or about 1250 and about 1750.
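The multiplication values above can be read as range checks on the product of a per-feature value and the window size. The following is a minimal sketch under that reading; the names `BROADEST_RANGES`, `multiplication_value` and `in_range` are hypothetical, only the broadest range stated for each feature is encoded, and the sketch does not reproduce the SMAR Scan programs themselves.

```python
# Broadest "multiplication value" ranges quoted in the text
# (feature value multiplied by the window size).
# Keys and function names are illustrative assumptions, not SMAR Scan identifiers.
BROADEST_RANGES = {
    "bending_angle": (320, 1320),
    "major_groove_depth": (900, 4000),
    "minor_groove": (500, 2500),
}


def multiplication_value(feature_value: float, window_bp: int) -> float:
    """Product of a structural feature value and the window size in bp."""
    return feature_value * window_bp


def in_range(feature: str, feature_value: float, window_bp: int) -> bool:
    """True if the product falls inside the broadest range for that feature."""
    low, high = BROADEST_RANGES[feature]
    return low <= multiplication_value(feature_value, window_bp) <= high
```

Because the ranges in the text are qualified by "about", the hard boundaries used here are a simplification.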
The invention is also directed towards MAR constructs comprising:
(a) (i) an isolated nucleotide sequence comprising at least part of a terminal region of an identified MAR, and
(ii) a further isolated nucleotide sequence comprising about 10%, about 15%, about 20%, about 25%, about 30% or more of said identified MAR or another identified MAR; or
(b) (i) a nucleotide sequence having about 90%, about 95%, about 96%, about 97%, about 98% or about 99% sequence identity with the nucleotide sequence of (a) (i), and
(ii) a nucleotide sequence having about 70%, about 80%, preferably about 90%, about 95%, about 96%, about 97%, about 98% or about 99% sequence identity with the nucleotide sequence of (b) (i).
Other MAR constructs according to the invention comprise: regions of an identified MAR sequence or a part thereof in consecutive arrangement, wherein an order and/or an orientation differs from that of an identified MAR sequence.
Yet other MAR constructs according to the invention comprise:
(a) a core nucleotide sequence comprising
The invention is also directed at expression systems comprising any of the specified MAR constructs, kits comprising any of the specified expression systems, and the use of any of the MAR constructs, expression systems, cells, transgenic non-human animals, kits and/or methods referred to herein in (1) producing proteins such as antibodies recognizing human pathogen proteins or human cell surface proteins and proteins such as erythropoietin, interferons or other therapeutic or diagnostic proteins and/or (2) in vitro or in vivo gene therapy, cell therapy or tissue regeneration therapy.
The present invention relates to isolated and purified MAR sequences from non-human animals, a method of identifying those sequences and a system employing those sequences for the high yield production of proteins in human cells as well as non-human cells such as rodent cells.
The present invention is also directed at MAR constructs, in particular enhanced MAR constructs, expression systems and kits employing these MAR constructs and their use in the production, in particular large scale production of proteins and in therapy. Furthermore, the invention is directed at methods for the high yield production of proteins in human cells as well as non-human mammalian cells via MARs/MAR constructs.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains. Although methods and materials varying from those described herein can be used in the practice of the present invention, exemplary suitable methods and materials are described below.
An expression cassette according to the present invention is a nucleic acid comprising at least one gene as well as elements required for the transcription of this gene.
A promoter according to the present invention is a regulatory region of DNA that, when located upstream of a gene, furthers transcription of the gene.
Expression in a cell, e.g., expression in a non-human mammalian cell, refers, in the context of the present invention, to expression in vitro and in vivo. In vitro expression includes, e.g., expression in a cell line such as a HeLa cell line or a CHO cell line and in cells used for in vitro gene therapy. In vivo expression comprises expression in a transgenic non-human animal and expression in human cells used in in vivo gene therapy or in in vitro gene therapy after reintroduction of the cells into a human gene therapy recipient.
A mammalian cell, such as a non-human mammalian cell, according to the present invention is capable of being maintained under cell culture conditions. A non-limiting example of this type of cell is the Chinese hamster ovary (CHO) cell.
A MAR construct, MAR element, a MAR sequence, a S/MAR or just a MAR according to the present invention is a nucleotide sequence sharing one or more (such as two, three or four) characteristics with a naturally occurring “SAR” or “MAR” and having at least one property that facilitates protein expression of any gene influenced by said MAR. A MAR construct also has the feature of being an isolated and/or purified nucleic acid with MAR activity, in particular, with transcription modulation, preferably enhancement activity, but also with, e.g., expression stabilization activity and/or other activities which are also described under “enhanced MAR constructs.” MAR constructs may be defined based on the identified MAR they are primarily based on: a MAR S4 construct is, accordingly, a MAR construct the majority of whose nucleotides (50% plus) are based on MAR S4. Naturally occurring SARs or MARs, according to a well-accepted model, mediate the anchorage of specific DNA sequences to the nuclear matrix, generating chromatin loop domains that extend outwards from the heterochromatin cores. While SARs or MARs do not contain any obvious consensus or recognizable sequence, their most consistent feature appears to be an overall high A and T content, with C bases predominating on one strand. MARs generally have the propensity to form bent secondary structures that may be prone to strand separation. Several simple sequence motifs high in A and T content have often been found within SARs and/or MARs, but for the most part, their functional importance and potential mode of action have remained unresolved. These include the A-box, the T-box, DNA unwinding motifs, SATB1 binding sites (H-box, A/T/C25) and consensus topoisomerase II sites for vertebrates or Drosophila.
A MAR candidate or MAR candidate sequence according to the present invention is a sequence sharing one or more characteristics such as two, three or four with naturally occurring SARs or MARs.
An identified MAR or identified MAR sequence according to the present invention is an isolated nucleotide sequence and corresponds to a naturally occurring MAR sequence in that it comprises all regions (“modules” or “elements”) that allow for the full enhancement of protein/gene expression of its natural counterpart.
The modules (also referred to herein as “regions,” “DNA regions,” “portions” or “domains”) of an identified MAR are all required to allow enhancement of protein/gene expression to the capacity of the naturally occurring MAR. None of the modules is generally able to achieve the full activity of the MAR by itself. Some of these regions are sequence specific, such as AT-dinucleotide rich bent regions and transcription factor binding site (TFBS) regions described below. Other “regions” are characterized by their location, e.g., the 5′ and 3′ terminal regions of an identified MAR sequence.
An AT/TA-dinucleotide rich bent DNA region (hereinafter referred to as “AT-rich region”) is a bent DNA region comprising a high number of As and Ts, in particular in the form of the dinucleotides AT and TA. In a preferred embodiment, it contains at least 10% of dinucleotide TA, and/or at least 12% of dinucleotide AT on a stretch of 100 contiguous base pairs, preferably at least 33% of dinucleotide TA, and/or at least 33% of dinucleotide AT on a stretch of 100 contiguous base pairs (or on a respective shorter stretch when the AT-rich region is of shorter length), while having a bent secondary structure. An “AT-rich region” may be as short as about 30 nucleotides or less, but is preferably about 50 nucleotides, about 75 nucleotides, about 100 nucleotides, about 150, about 200, about 250, about 300, about 350 or about 400 nucleotides long or longer.
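The dinucleotide thresholds above lend themselves to a simple sliding count. The sketch below is illustrative only: the function names are hypothetical, the percentage is assumed to mean occurrences of the dinucleotide divided by the number of overlapping dinucleotide positions in the stretch (the text does not fix the counting convention), and the bent secondary structure that the definition also requires is not assessed.

```python
def dinucleotide_fraction(seq: str, dinuc: str) -> float:
    """Fraction of overlapping dinucleotide positions in seq equal to dinuc.

    Assumption: a stretch of n bases has n - 1 overlapping dinucleotide positions.
    """
    seq = seq.upper()
    positions = len(seq) - 1
    if positions < 1:
        return 0.0
    hits = sum(1 for i in range(positions) if seq[i:i + 2] == dinuc)
    return hits / positions


def meets_at_rich_sequence_criterion(seq: str, window: int = 100,
                                     min_ta: float = 0.10,
                                     min_at: float = 0.12) -> bool:
    """True if any stretch of `window` bases reaches the TA and/or AT threshold.

    Sequences shorter than `window` are evaluated as one shorter stretch.
    Only the sequence criterion is checked, not DNA bending.
    """
    seq = seq.upper()
    window = min(window, len(seq))
    for start in range(len(seq) - window + 1):
        stretch = seq[start:start + window]
        if (dinucleotide_fraction(stretch, "TA") >= min_ta
                or dinucleotide_fraction(stretch, "AT") >= min_at):
            return True
    return False
```

Passing `min_ta=0.33, min_at=0.33` would apply the preferred thresholds instead of the minimum ones.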
As will be discussed below, an AT-rich region can be distinguished from a neighboring region, such as a binding site region, by, e.g., its relatively high bending angle. Some binding sites also often have relatively high A and T content, such as the SATB1 binding sites (H-box, A/T/C25) and consensus topoisomerase II sites for vertebrates or Drosophila. However, a binding site region (module), in particular a TFBS region, which comprises a cluster of binding sites, can be readily distinguished from AT and TA dinucleotide rich regions (“AT-rich regions”), even when its binding sites are high in A and T content, by a comparison of the bending pattern of the regions. For example, for human MAR 1_68, an AT-rich region might have an average degree of curvature exceeding about 3.8 or about 4.0, while a TFBS region might have an average degree of curvature below about 3.5 or about 3.3. Regions of an identified MAR can also be ascertained by alternative means, such as, but not limited to, relative melting temperatures, as described elsewhere herein. However, such values are species specific and thus may vary from species to species, and may, e.g., be lower. Thus, the respective AT and TA dinucleotide rich regions may have lower degrees of curvature, such as from about 3.2 to about 3.4, from about 3.4 to about 3.6 or from about 3.6 to about 3.8, and the TFBS regions may have proportionally lower degrees of curvature, such as below about 2.7, below about 2.9, below about 3.1 or below about 3.3. In SMAR Scan II, correspondingly lower window sizes will be selected by the skilled artisan.
A terminal region of an identified MAR/MAR sequence according to the present invention comprises at least about 5%, about 6%, about 7%, about 8%, about 9% or about 10% of an identified MAR.
A binding site or DNA protein binding site is any nucleotide sequence that can bind a DNA binding protein. Binding sites for DNA binding proteins are typically TFBSs. A TFBS is any sequence that can bind a transcription factor. The TFBS can be of any origin such as, but not limited to, human or mouse. TFBSs may also be engineered or synthetic. However, in certain embodiments, the TFBS has a counterpart in a MAR sequence, such as a MAR sequence of the same organism, the same species or the same genus. However, the TFBS may be from a MAR sequence of a different species or a different genus. Also, TFBSs that have no currently known counterpart in a MAR sequence are within the scope of the present invention. Such TFBSs may include, but are not limited to, binding sites for USF1 (upstream stimulatory factor 1) or the zinc-finger protein CTCF. TFBSs might be modified by 1, 2, 3, 4, 5 or more substitutions, additions and/or deletions and may be synthesized in full or in part. Optimized TFBSs, that is, TFBSs with optimized binding affinities for the respective DNA binding protein, which often have no known natural counterpart, are also within the scope of the present invention. Those optimized TFBSs might be created by the above modifications of naturally occurring TFBSs or synthetically, in particular by chemical synthesis. In certain embodiments of the invention, the binding site(s) or TFBS(s) confer tissue specificity to the MAR by, e.g., being bound by tissue-specific natural, engineered or synthetic regulatory proteins or other natural, engineered or synthetic proteins, which, e.g., may respond to specific drugs and molecules. Gene and/or cell therapy are typical cases benefiting from tissue specificity as well as from the ability of the MAR to specifically respond to a certain drug, that is, to be inducible by the drug.
In the former case, e.g., the gene of interest would only be expressed in specific organs or tissues; in the latter case, the expression could, e.g., only be turned on in response to a certain drug. Other non-limiting examples of transcription factors for which TFBSs may be included are, e.g., SATB1, NMP4, MEF2, S8, DLX1, FREAC7, BRN2, GATA 1/3, TATA, Bright, MSX, AP1, C/EBP, CREBP1, FOX, Freac7, HFH1, HNF3alpha, Nkx25, POU3F2, Pit1, TTF1, XFD1, AR, C/EBPgamma, Cdc5, FOXD3, HFH3, HNF3 beta, MRF2, Oct1, POU6F1, SRF, V$MTATA_B, XFD2, Bach2, CDP CR3, Cdx2, FOXJ2, HFL, HP1, Myc, PBX, Pax3, TEF, VBP, XFD3, Brn2, COMP1, Evi1, FOXP3, GATA4, HFN1, Lhx3, NKX3A, POU1F1, Pax6 and/or TFIIA.
A binding site, such as a TFBS, is said to be adjacent to a core nucleotide sequence if the core nucleotide sequence and the binding site are separated by not more than about 200, preferably not more than about 100 nucleotides, even more preferably not more than about 50 nucleotides, even more preferably not more than about 25, not more than about 15, not more than about 5 or no nucleotides. In a preferred embodiment, the binding sites, in particular TFBSs, themselves comprise short linkers or adapters of up to 25 nucleotides on each side of the TFBS. In an even more preferred embodiment, the TFBS is part of an oligomer of up to about 50 nucleotides, up to about 40 nucleotides or up to about 30 nucleotides. A series of binding sites, such as TFBSs in accordance with the present invention, is a row of TFBSs arranged in sequence next to each other. A series of TFBSs is said to be adjacent to a core nucleotide sequence if the TFBS of this series which is proximate to the core has the distance specified above. A binding site is said to flank an “AT-rich region” if the binding site is a binding site which is part of the core nucleotide sequence and has a counterpart at the identical location in a naturally occurring MAR.
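The adjacency definition above reduces to measuring the gap between two intervals on the same nucleic acid molecule. The following sketch assumes a hypothetical coordinate convention (0-based, half-open `(start, end)` intervals) and hypothetical function names; the default of 200 nucleotides is the broadest separation named above.

```python
from typing import Iterable, Tuple

Interval = Tuple[int, int]  # assumed convention: 0-based, half-open (start, end)


def gap_between(core: Interval, site: Interval) -> int:
    """Nucleotides separating two intervals; 0 if they abut or overlap."""
    (c_start, c_end), (s_start, s_end) = core, site
    return max(0, max(c_start, s_start) - min(c_end, s_end))


def is_adjacent(core: Interval, site: Interval, max_gap: int = 200) -> bool:
    """True if the binding site lies within max_gap nucleotides of the core."""
    return gap_between(core, site) <= max_gap


def series_is_adjacent(core: Interval, sites: Iterable[Interval],
                       max_gap: int = 200) -> bool:
    """A series is adjacent if its TFBS closest to the core is within max_gap."""
    return min(gap_between(core, s) for s in sites) <= max_gap
```

Lowering `max_gap` to 100, 50, 25, 15, 5 or 0 corresponds to the progressively preferred separations.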
A binding site may be modified by 1, 2, 3, 4, 5 or more substitutions, additions and/or deletions. Preferably these substitutions, additions and/or deletions are introduced so that the binding site matches a consensus sequence of the respective binding site.
A variety of enhanced MAR constructs are part of the present invention and have properties that constitute an enhancement over a naturally occurring and/or identified MAR on which a MAR construct according to the present invention may be based, in particular the naturally occurring MAR on which the core nucleic acid sequence is based. Such properties include, but are not limited to, reduced length relative to the full length naturally occurring and/or identified MAR, gene expression/transcription enhancement, enhancement of stability of expression, tissue specificity, inducibility or a combination thereof. Accordingly, a MAR construct that is enhanced may, e.g., comprise less than about 90%, preferably less than about 80%, even more preferably less than about 70%, less than about 60%, or less than about 50% of the number of nucleotides of an identified MAR sequence. A MAR construct may enhance gene expression and/or transcription of a gene upon transformation of an appropriate cell with said construct. If, in the context of the present invention, reference is made to MAR constructs/MAR (nucleotide) sequences that “enhance expression,” have a “gene expression enhancing activity,” “enhance protein expression” or similar, this “enhancement” is relative to the expression of, e.g., a gene, expressed under otherwise equivalent conditions but in the absence of such a sequence. The enhancement can, for example, be about 2, about 3, about 4, about 5, about 6, about 7, about 8, about 9, about 10 fold or about 15 fold, about 20 fold or about 25 fold or higher.
A MAR construct may also increase the average percentile of very high producing cells by about 5 fold, about 10 fold, about 15 fold or more. Thus, apart from a higher average expression of a gene, an increase in the percentile of very high expressing cells, as well as the occurrence of stable (“resistant”) colonies (about 100%, about 200%, about 300% or about 400% or higher increase), and/or a lower variability of expression (reduction of cv (coefficient of variation) of about 30%, about 40%, about 50% or more) are within the scope of the present invention.
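The fold enhancement and the reduction in coefficient of variation referred to above are straightforward to compute from per-cell or per-clone expression measurements. The sketch below is illustrative only: the function names are hypothetical, the use of the sample standard deviation is an assumption (the text does not specify the estimator), and the numbers in the usage note are invented for illustration rather than measured data.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """cv = sample standard deviation / mean (assumed estimator)."""
    return stdev(values) / mean(values)


def cv_reduction_percent(control, with_mar):
    """Percent reduction of cv with a MAR construct relative to the control."""
    cv_control = coefficient_of_variation(control)
    cv_mar = coefficient_of_variation(with_mar)
    return 100.0 * (cv_control - cv_mar) / cv_control


def fold_enhancement(control, with_mar):
    """Mean expression with the MAR construct divided by mean control expression."""
    return mean(with_mar) / mean(control)
```

For the invented values `control = [1.0, 2.0, 3.0]` and `with_mar = [9.0, 10.0, 11.0]`, the cv falls from 0.5 to 0.1, i.e., a reduction of 80%, and the fold enhancement is 5.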
A MAR construct or similar may “enhance stability of expression.” This “enhancement” is relative to the expression of, e.g., a gene being expressed under otherwise equivalent conditions, but in the absence of such a MAR construct/MAR sequence. The stability enhancement can, for example, maintain 100% enhancement after up to about 5, 10, 20, 25, 30, 35, 40, 45, or 50 weeks. A MAR construct may be specific for, e.g., muscle, liver, central nervous system or other tissues and/or may be inducible upon administration of a substance such as antibiotics, hormones and/or metabolic intermediates.
A MAR construct/MAR sequence may be inserted preferably upstream of a promoter region to which a gene of interest is or can be operably linked. However, in certain embodiments, it is advantageous that a MAR construct is located upstream as well as downstream or just downstream of a gene/nucleic acid sequence of interest. Other multiple MAR arrangements both in cis and/or in trans are also within the scope of the present invention.
A MAR construct or a region of a MAR is said to be based on, e.g., an identified MAR or a region of an identified MAR if it shares one or more (such as two, three or four) characteristics with naturally occurring “SARs” or “MARs” or a respective region thereof and has at least one property that facilitates protein expression of any gene influenced by said MAR. These MAR constructs or regions of a MAR generally have “substantial identity” with the identified MARs they are based on in accordance with the definition of the term provided herein. Despite any modifications of their nucleotide sequence, they will maintain at least one functionality/characteristic of the underlying identified MAR.
The present invention is also directed to uses of MAR constructs, including enhanced MAR constructs. In these uses, a MAR construct may also be combined with one or more non-MAR epigenetic gene regulation tools such as, but not limited to, histone modifiers such as histone deacetylase (HDAC), other DNA elements such as locus control regions (LCRs), insulators such as cHS4 or antirepressor elements (e.g., stabilizer and antirepressor elements (STAR or UCOE elements)) or hot spots (Kwaks THJ and Otte AP).
Synthetic, when used in the context of a MAR/MAR construct, refers to a MAR whose design involved more than simple reshuffling, duplication and/or deletion of sequences/regions or partial regions of identified MARs or MARs based thereon. In particular, synthetic MARs/MAR constructs generally comprise one or more, preferably one, region of an identified MAR, which, however, might in certain embodiments be synthesized or modified, as well as specifically designed, well characterized elements, such as a single TFBS or a series of TFBSs, which are, in a preferred embodiment, produced synthetically. These designer elements are in many embodiments relatively short; in particular, they are generally not more than about 300 bps long, preferably not more than about 100, about 50, about 40, about 30, about 20 or about 10 bps long. These elements may, in certain embodiments, be multimerized.
A non-human mammalian MAR according to the present invention is a MAR/MAR sequence that is, at least in part, ascertained via the genome or parts of the genome of a non-human mammalian organism. This includes, for example, MAR/MAR sequences identified via analysis of a rodent genome such as, but not limited to, a mouse genome.
A vector according to the present invention is a nucleic acid molecule capable of transporting another nucleic acid molecule to which it has been linked. For example, a plasmid is a type of vector, a retrovirus or lentivirus is another type of vector.
Transfection according to the present invention is the introduction of a nucleic acid into a recipient eukaryotic cell, such as, but not limited to, by electroporation, lipofection, via a viral vector or via chemical means.
Transformation, as used herein, refers to modifying a eukaryotic cell by the addition of a nucleic acid. For example, transforming a cell could include transfecting the cell with nucleic acid, such as by introducing a DNA vector via electroporation. However, in many embodiments of the invention, the way of introducing the enhanced MARs of the present invention into a cell is not limited to any particular method.
Transcription means the synthesis of RNA from a DNA template.
Cis refers to the placement of two or more elements (such as chromatin elements) on the same nucleic acid molecule such as, but not limited to, the same vector or chromosome.
Trans refers to the placement of two or more elements (such as chromatin elements) on two or more nucleic acid molecules such as, but not limited to, two or more vectors or chromosomes.
A sequence is said to act in cis and/or trans on, e.g., a gene when it exerts its activity from a cis/trans location.
A window according to the present invention describes a number of base pairs evaluated for MARs, e.g., during the SMAR Scan procedure. The number is usually about 50 bps, about 100 bps, about 200 bps or about 300 bps. However, windows of 400, 500, 600 or more bps are also within the scope of the present invention.
A nucleotide sequence or fragment thereof has substantial identity with another if, when optimally aligned (with appropriate nucleotide insertions or deletions) with the other nucleotide sequence (or its complementary strand), there is nucleotide sequence identity in at least about 60% of the nucleotide bases, usually at least about 70%, more usually at least about 80%, preferably at least about 90%, and more preferably at least about 95-98% of the nucleotide bases.
Identity means the degree of sequence relatedness between two nucleotide sequences as determined by the identity of the match between two strings of such sequences, such as the full and complete sequence. Identity can be readily calculated. While there exist a number of methods to measure identity between two nucleotide sequences, the term “identity” is well known to skilled artisans (Computational Molecular Biology, Lesk, A. M., ed., Oxford University Press, New York, 1988; Biocomputing: Informatics and Genome Projects, Smith, D. W., ed., Academic Press, New York, 1993; Computer Analysis of Sequence Data, Part I, Griffin, A. M., and Griffin, H. G., eds., Humana Press, New Jersey, 1994; Sequence Analysis in Molecular Biology, von Heijne, G., Academic Press, 1987; and Sequence Analysis Primer, Gribskov, M. and Devereux, J., eds., M Stockton Press, New York, 1991). Methods commonly employed to determine identity between two sequences include, but are not limited to, those disclosed in Guide to Huge Computers, Martin J. Bishop, ed., Academic Press, San Diego, 1994, and Carrillo, H., and Lipman, D., SIAM J Applied Math. 48: 1073 (1988). Preferred methods to determine identity are designed to give the largest match between the two sequences tested. Such methods are codified in computer programs. Preferred computer program methods to determine identity between two sequences include, but are not limited to, GCG (Genetics Computer Group, Madison Wis.) program package (Devereux, J., et al., Nucleic Acids Research 12(1): 387 (1984)), BLASTP, BLASTN, FASTA (Altschul et al. (1990); Altschul et al. (1997)). The well-known Smith-Waterman algorithm may also be used to determine identity.
As an illustration, a nucleic acid comprising a nucleotide sequence having at least, for example, 95% “identity” with a reference nucleotide sequence is one in which the nucleotide sequence of the nucleic acid is identical to the reference sequence except that the nucleotide sequence may include up to five point mutations per each 100 nucleotides of the reference nucleotide sequence. In other words, to obtain a nucleic acid having a nucleotide sequence at least 95% identical to a reference nucleotide sequence, up to 5% of the nucleotides in the reference sequence may be deleted or substituted with another nucleotide, or a number of nucleotides up to 5% of the total nucleotides in the reference sequence may be inserted into the reference sequence. These mutations of the reference sequence may occur at the 5′ or 3′ terminal positions of the reference nucleotide sequence or anywhere between those terminal positions, interspersed either individually among nucleotides in the reference sequence or in one or more contiguous groups within the reference sequence.
Functional fragments of nucleotide sequences are also part of the present invention. A fragment is considered functional as long as it maintains a desirable function of the naturally occurring counterpart sequences, in particular increasing expression of a gene influenced by them. A fragment of a MAR or a MAR region is still considered a functional fragment if its deletion decreases the transcription enhancing activity of a MAR/region, but does not abolish it. A “fully functional fragment” is a fragment in which any decrease in activity, if at all observed, cannot be statistically verified when the fragment is used without other MAR sequences. Also included within the scope of the present invention are functional fragments having substantial identity in accordance with the definition provided herein with, e.g., the naturally occurring MAR, identified MAR, MAR region or a fragment of any of these.
As will be described in detail herein, in certain embodiments, modules or parts thereof are reshuffled, duplicated and/or subject to deletion. As the person skilled in the art will recognize, such shuffling and/or duplication of regions may create, e.g., new restriction sites, which in turn can lead to a new restriction pattern of the constructs so created and may lead to adjustments in the length of the sequences. Those adjustments may affect, but are not limited to, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10-15, 15-20, 20-25, 25-30, 30-35, 35-40 nucleotides. These adjustments as well as other modifications are within the scope of the present invention. Sequences of the rearranged MARs, in particular reshuffled and/or duplicated MARs, that have substantial identity in accordance with the definition provided herein with each of the respective element(s) (or region(s)/module(s)) and/or fragment(s) thereof, are within the scope of the present invention.
MAR sequences can be transferred from plant to mammalian cells or vice versa, and will retain nuclear matrix attachment activity in the heterologous host cells [Breyne P, Van Montagu M, Depicker A and Gheysen G, Mielke C, Kohwi Y, Kohwi-Shigematsu T and Bode J]. Given this conservation of MAR functions in all higher eukaryotes, one would expect that a MAR sequence from one genus would work as well in the genus it was derived from as in another genus.
Nonetheless, reasoning that MAR sequences from rodent origins might be in some way advantageous for the production of recombinant proteins, the whole mouse genome was screened to identify MAR candidate sequences using SMAR Scan I, a computer program that, as described below, detects structural features of the DNA sequences (DNA bend, for example).
As will be discussed below, it was surprisingly found that non-human, in particular rodent (here mouse) MAR sequences are more potent in terms of expression enhancement, e.g., in CHO cells as well as human cells such as HeLa cells. Even more surprisingly, it was found that certain non-human MAR sequences work substantially better, both in non-human cells, e.g., CHO cells as well as in human cells, e.g. in HeLa cells, than human MAR sequences.
Several of the identified novel S/MAR DNA sequences of mouse origin could be shown to increase transgene expression, thus providing evidence that SMAR Scan I, a program designed for and tested with human MAR sequences, is an efficient tool for identifying S/MAR elements from a multitude of genomic origins, e.g., mouse in addition to human. Importantly, however, it was found that more potent MAR elements can be identified by screening rodent (e.g., mouse) genomes than by screening the human genome. In particular, the invention establishes that highly active S/MAR elements from the mouse genome can be used to increase the production of recombinant proteins, such as recombinant proteins having pharmaceutical uses, in a variety of cells, in particular mouse and human cells. The mouse S/MAR S4 was shown to be the most potent of the newly isolated mouse MARs and of the previously cloned human MARs. The invention is thus directed at non-human MARs having enhanced protein production and/or at MARs enhancing the stability of protein expression over time.
SMAR Scan I is a software tool that identifies MAR candidate sequences based on the structural and physicochemical features of these sequences. A thorough discussion of the method has been provided elsewhere (U.S. Patent Publication 20070178469 to Mermod et al). Essentially, “SMAR Scan” describes bioinformatic tools comprising algorithms that recognize profiles, based on dinucleotide weight-matrices, to compute the theoretical values for conformational and physicochemical properties of DNA. Preferably, SMAR Scan evaluates DNA sequence features corresponding to DNA bending, major groove depth and minor groove width potentials, and melting temperatures in a wide variety of combinations using scanning windows of variable sizes. For each feature, a cut-off or threshold value has to be set. The program returns a hit each time the computed score of a given region is above the set cut-off/threshold value.
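The general idea of scoring a scanning window with a dinucleotide weight table and reporting a hit when the score exceeds a threshold can be sketched as follows. This is not the SMAR Scan implementation: the weight values, the averaging of dinucleotide steps, the function names and the toy sequence are all assumptions chosen purely for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical weights for illustration only; NOT the SMAR Scan matrices.
TOY_WEIGHTS: Dict[str, float] = {"AA": 1.0, "AT": 0.8, "TA": 0.9, "TT": 1.0}


def window_score(window_seq: str, weights: Dict[str, float]) -> float:
    """Mean weight over all overlapping dinucleotide steps in the window.

    Dinucleotides missing from the table contribute 0.0 (an assumption).
    """
    steps = [weights.get(window_seq[i:i + 2], 0.0)
             for i in range(len(window_seq) - 1)]
    return sum(steps) / len(steps) if steps else 0.0


def scan(seq: str, weights: Dict[str, float],
         window: int, threshold: float) -> List[Tuple[int, float]]:
    """Slide a fixed-size window along seq; return (start, score) for hits.

    A hit is a window whose score is strictly above the threshold.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - window + 1):
        score = window_score(seq[start:start + window], weights)
        if score > threshold:
            hits.append((start, score))
    return hits


if __name__ == "__main__":
    toy_sequence = "GCGCGCATATATGCGCGC"
    for start, score in scan(toy_sequence, TOY_WEIGHTS, window=4, threshold=0.6):
        print(start, round(score, 3))
```

In this sketch only the windows lying fully inside the A/T-rich stretch of the toy sequence score above the threshold; a real tool would apply one such table and cut-off per structural criterion.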
Two data output modes are available to handle the hits: the first (called “profile-like”) simply returns all hit positions on the query sequence and their corresponding values for the different criteria chosen. The second mode (called “contiguous hits”) returns only the positions of several contiguous hits and their corresponding sequence. For this mode, the minimum number of contiguous hits is another cut-off/threshold value that can be set, again with a tunable window size. To tune the default cut-off/threshold values for, e.g., the four theoretical structural criteria, experimentally validated MARs, e.g., from SMARt DB can be used. In this way, for example, all human MAR sequences from the database were retrieved and analyzed with SMAR Scan using the “profile-like” mode with the four criteria and with no set cut-off/threshold value. This allowed the setting of each function for every position of the sequences. The distribution for each criterion was then computed according to these data (see FIGS. 1 and 3 of U.S. Patent Publication 20070178469 to Mermod et al).
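The difference between the two output modes can be illustrated with a short sketch that starts from a list of hit positions (the “profile-like” output) and keeps only runs of adjacent hits of a minimum length (a “contiguous hits” style output). The function name is hypothetical, and treating hits at consecutive positions as "contiguous" is an assumption; the actual program may define contiguity differently.

```python
from typing import Iterable, List, Tuple


def contiguous_runs(hit_positions: Iterable[int],
                    min_hits: int) -> List[Tuple[int, int]]:
    """Group hits at consecutive positions into runs.

    Returns (first_position, last_position) for every run that contains
    at least min_hits hits; shorter runs are discarded.
    """
    runs: List[Tuple[int, int]] = []
    current: List[int] = []
    for pos in sorted(hit_positions):
        if current and pos == current[-1] + 1:
            current.append(pos)          # extends the current run
        else:
            if len(current) >= min_hits:
                runs.append((current[0], current[-1]))
            current = [pos]              # start a new run
    if len(current) >= min_hits:
        runs.append((current[0], current[-1]))
    return runs


if __name__ == "__main__":
    profile_like_hits = [3, 4, 5, 9, 20, 21, 22, 23]
    print(contiguous_runs(profile_like_hits, min_hits=3))
```

Raising `min_hits` plays the role of the additional cut-off described above: isolated hits, such as position 9 in the toy list, are dropped.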
While the use of SMAR Scan technology is a preferred one for the identification of MAR sequences, the person skilled in the art will recognize that other bioinformatic tools that allow for the identification of S/MAR motifs with similar or even somewhat lower selectivity can be used in the context of the present invention. Preferably such tools can be set so that only those MAR associated features that display these features beyond a certain value, that is a set threshold or cut-off value, yield or can be set to yield a positive hit. Many bioinformatic tools used to identify MARs were, however, designed to identify matrix-binding activity. This activity does not necessarily correlate with the ability to increase gene expression [Phi-Van, L. & Stratling, W. H.].
SMAR Scan I has been developed to identify human MARs. Thus, it was developed using structural data collected from known human MARs. A human “tuned” SMAR Scan I program was used in the context of the present invention to evaluate the mouse genome for MAR sequences. However, differences in the base compositions of the mouse and human genomes prevented the use of the SMAR Scan program with the settings previously defined to scan the human genome (U.S. Patent Publication 20070178469 to Mermod et al). Therefore, distinct window size and structural parameter threshold values had to be defined by trial and error, until the program would allow the identification of a manageable collection of candidate mouse MAR sequences. Several of those, when tested, turned out to be “super MAR sequences”, that is, MAR sequences allowing for a substantial increase of protein production, when, e.g., placed on a vector with the gene encoding the respective protein and introduced into a rodent cell line.
Mouse MAR S4 and Mouse MAR S46 are examples of rodent MAR sequences that are within the scope of the present invention. These MAR sequences as isolated are shown in the appended sequence listing as SEQ ID No. 3 and SEQ ID No. 10. However, as the person skilled in the art will appreciate, base pair insertions, deletions, substitutions, in particular fragments of these and other non-human MARs that themselves may contain base pair insertions, deletions or substitutions are within the scope of the present invention as long as they maintain a desirable function of the wild type sequences, in particular increasing expression of a gene influenced by them. For example, an insertion that decreases the transcription/gene expression enhancing activity of a MAR sequence, but does not abolish it, is considered to not substantially interfere with the desirable function, here gene expression enhancement, of the MAR. Similarly, a fragment of an, e.g., identified MAR is still considered a functional fragment if it has a somewhat reduced transcription enhancing activity relative to the identified MAR, but does not completely lose the transcription enhancing activity. A “fully functional fragment” is a fragment in which any decrease in activity, if at all observed, cannot be statistically verified. As detailed elsewhere herein, also included within the scope of the present invention are sequences having “substantial identity” with the nucleotide sequence of the naturally occurring MAR or a fragment thereof.
Identified MARs were analyzed to determine whether they comprise modules (or regions), in particular sequence-specific modules, which could be used in engineering identified MARs or in producing synthetic MARs, including MARs comprising synthesized regions. In fact, several sequence-specific modules of identified MARs could be ascertained. Surprisingly it was found that shuffling and/or full or partial duplication and even deletion of certain modules or parts thereof resulted in enhanced MARs as described above.
The human 1_68 MAR and S4 MAR from mouse will serve as a model for producing MAR constructs by shuffling, deletion and/or duplication of regions. However, as the person skilled in the art will readily understand, the present invention is directed at manipulating any identified MAR and at the MAR constructs resulting therefrom. Appropriate adjustments that may be necessary to accommodate different MARs, including MARs of different origin, are well within the skill of the artisan. Examples include, but are not limited to, eukaryotic organisms, preferably mammals, especially model organisms such as mouse, and species of economic importance such as cattle, pigs, sheep as well as humans.
The human 1_68 MAR served as a model for producing MAR constructs by shuffling and/or duplication of regions. Using modules ascertained as described below or parts thereof, MAR constructs were produced based on identified MARs, such as human 1_68 MAR. The MAR constructs were in particular produced by shuffling, and/or duplication of regions (modules) or parts thereof.
The 1_68 MAR example shows that modules (also referred to herein as regions or elements) of an identified MAR were all required to allow enhancement of gene expression to the capacity of the naturally occurring MAR. None of the modules identified was able to achieve the full activity of the MAR by itself. Surprisingly, it was found that shuffling and full or partial duplication of certain modules resulted in further enhancement of gene expression.
Several non-redundant sequence-specific modules (regions) were identified. These modules cooperate to influence local chromatin structure. This organization of MARs parallels somewhat the control of metazoan transcription: a diverse collection of modules, which are dispersed up to several kilobases from the initiation site, collectively dictate where transcription will initiate.
The sequence-specific modules identified were in particular (1) regions high in A and T content, such as symmetrical A-T rich regions (alternating A and T) in particular “AT rich regions” and (2) regions rich in binding sites, in particular, but not limited to, TFBSs separated by A-T rich regions.
It has been reported that bent DNAs high in A and T content are commonly found in promoter regions, MARs and replicators [Aladjem and Fanning 2004]. Previously, sequences high in A and T content (“symmetric” ones as described above as well as “asymmetric” ones, that are sequences having mostly A on one strand and mostly T on the other) were thought to primarily facilitate duplex opening. However, these regions might have a wide range of functions. For example, sequences high in A and T content in the lamin B2 replicator bind the origin-recognition complex (ORC) [Abdurashidova, Danailov et al. 2003; Stefanovic, Stanojcic et al. 2003] and can facilitate the loading of the Mcm4/6/7 helicase and the unwinding of duplex DNA in vitro [You, Ishimi et al. 2003]. Architectural roles for intrinsically bent DNAs high in A and T content have also been considered. The “AT-hook DNA-binding motifs” of fission yeast ORC4, which resemble those of the high mobility group protein HMG-I/Y, may have such an architectural role [Strick and Laemmli 1995; Bell 2002]. Protein-mediated bending, analogous to the HMG-I/Y-mediated DNA bending that facilitates V(D)J recombination, and the assembly and stabilization of transcription complexes at enhancers and promoters in eukaryotes, might also occur [Levine and Tjian 2003]. Not all regions that have a high A and T content correspond to bent DNA. However, those DNAs that are bent could act as a ‘histone magnet’ to attract histones to form nucleosomes over the bent DNA, leaving the adjacent regions free to act as a landing pad for pre-replication/transcription proteins.
As described above, MARs also contain binding sites for other proteins, in particular in the “regions rich in binding sites” or just “binding site regions” (see (2) above). Those other proteins may include, but are not limited to, DNA unwinding element-binding protein (DUE-B) and transcription factors such as Hox proteins, SATB1, CEBP, etc., as found in 1_68 MAR. Mutational analysis indicates that these binding sites contribute to the MAR function.
Human 1_68 MAR could be improved by reversing its orientation and by moving away the bent DNA to augment the size of the transcription factor binding site region upstream of the promoter region. As can be seen in
Several MARs were constructed based on the S4 MAR (Table 3) and characterized (
Experiment with the human 1_68 MAR (
Thus, the present invention includes high activity MAR constructs that are considerably shorter in length than their natural counterparts, thus making them of more convenient size for, e.g., vector design and transfer.
In particular, MAR constructs comprising less than about 90%, preferably less than about 80%, even more preferably less than about 70%, less than about 60% or less than about 50% of the number of nucleotides of an identified MAR sequence are within the scope of the present invention. Those constructs preferably comprise the 3′ terminal region of the identified MAR, even more preferably at least about 5%, about 6%, about 7%, about 8%, about 9% or about 10% of the 3′ terminal region of an identified MAR/MAR sequence. However, MAR constructs that contain the 5′ terminal region of the identified MAR are also within the scope of the present invention.
The rearrangement of the human 1_68 MAR showed that a 223 bp fragment of the Hox-rich region located at the 3′ end of the forward hatched portion of an isolated MAR, retains, in certain embodiments, the activity of the full-length region. This suggests that this portion may, in certain embodiments of the invention, be of importance in cooperating with other elements.
The findings of a possible cooperation between the AT-rich bent DNA region and transcription factor binding sites in human MAR 1_68 prompted the construction of MARs/MAR constructs comprising the AT-rich region of MAR 1_68 adjacent to one or several transcription factor binding sites.
Binding sites for the C/EBP, NMP4, FAST1, SATB1, and HoxF (also called Gsh) transcription factors were identified from the MAR 1-68 sequence (
As can be seen from
While, in preferred embodiments, the additional binding sites are downstream of the AT-rich core, but upstream of the promoter, other configurations, such as, but not limited to, a location upstream of the AT-rich region, within the AT-rich region, adjacent to the AT-rich region of the core or downstream of the gene, are also within the scope of the present invention.
In a preferred embodiment, certain combinations of protein binding sites, either synthetic or isolated, are contemplated, such as combinations of two different protein binding sites, combinations of three different protein binding sites, combinations of four, five, six, seven, eight, nine, ten or more protein binding sites. These combinations may be multimerized, in full or in part. In a preferred embodiment, the combination comprises Hox/Gsh and SATB1. The insertion of these combinations or multimerized combinations, e.g., between the core and the appropriate promoter, may increase the occurrence of high expressor clones about two fold or more, such as, but not limited to, about three, four, five, six, seven, eight, nine fold or more, preferably about 10 fold or more, even more preferably, about 11, 12, 13, 14, 15, 16, 17, 18, 19 fold or more or about 20 or even about 25 or about 30 fold or more, relative to the occurrence of high expression clones when vectors not comprising a MAR construct/MAR sequence are used under otherwise equivalent conditions.
In sum, MAR constructs can be assembled from building blocks. These building blocks may include or be based on regions, such as sequence specific regions, of identified MARs or parts thereof, synthetic building blocks (including modifications to optimize their functionality), such as a series of chemically synthesized transcription factor binding sites (TFBS), building blocks from or based on non-MAR sequences, or building blocks of or based on MAR sequences of different species or genera. In a preferred embodiment, such MARs comprise AT-rich regions coupled to TFBS regions or specific transcription factor DNA-binding site combinations as those shown in Table 5. The person skilled in the art will appreciate that these principles are not limited to the particular sequences or to the binding sites disclosed herein, and that other derivatives, homologues or sequence combinations are also within the scope of the present invention.
As mentioned above, the MAR constructs, expression systems and/or kits of the invention can be used for protein production. Here a MAR construct may be included in a vector comprising a gene for a protein of interest, for example insulin, under the control of a promoter. The vector is introduced into a cell and the cells are grown. The process is then scaled-up for large scale batch production of insulin. High insulin production, e.g. 3 to 5 times higher than without the MAR construct, can be maintained over three weeks.
As mentioned above, the MAR constructs, expression systems and/or kits of the invention can be used for in vitro and/or in vivo gene therapy and in cell and tissue replacement therapy. For example, in in vitro gene therapy, a MAR construct may be included in a vector comprising, under the control of a promoter, a gene that is defective in the patient in need of in vitro gene therapy. Subsequently the MAR construct is introduced into cells, such as bone marrow cells of the patient. After transformation with the MAR construct, the bone marrow cells are introduced into the patient and expression of the gene of interest may proceed at a level 5 times higher than without the MAR construct. An effective amount of protein may thus be expressed.
In in vivo gene therapy, a vector comprising the MAR construct may be directly introduced into the cells of a patient in need thereof, e.g. by injection.
Similarly, an expression system of the present invention can be introduced into a stem cell for engraftment for tissue regeneration or for, e.g., neuronal cell therapy for neurodegenerative diseases. Non-limiting examples of stem cells, which can be used in this embodiment of the invention, are hematopoietic stem cells (HSCs) and mesenchymal stem cells (MSCs) obtained from bone marrow tissue of an individual at any age or from cord blood of a newborn individual. The stem cells are transfected with an expression system according to the present invention and successful transformants can be transplanted or reintroduced into a patient in need of the cell therapy or tissue regeneration therapy. Several methods are available for obtaining transformed stem cells, e.g., Nucleofection® (Cell Line Solution V (VCA-1003), amaxa GmbH, Germany).
Transgenic animals, which can produce a wide variety of proteins including antibodies that bind to human antigens, can be produced by known methods (e.g., but not limited to, U.S. Pat. Nos. 5,770,428, 5,569,825, 5,545,806, 5,625,126, 5,625,825, 5,633,425, 5,661,016 and 5,789,650 issued to Lonberg et al.). The expression systems and MAR constructs can be employed in protein production via, e.g., transgenic cattle, sheep, goats or pigs, typically by secretion of the protein into a biological fluid (e.g., milk). See, e.g., U.S. Pat. No. 5,750,172 to Meade et al. See also U.S. Pat. No. 6,518,482 to Lubon et al. for the production of transgenic animals.
The invention will be further described in the following examples, which do not limit the scope of the invention set forth in the claims, the summary of the invention or elsewhere herein. The materials, methods, and examples are illustrative only and are not intended to be limiting. With the guidance provided herein, the person skilled in the art will be able to make modifications, additions and improvements, all of which are within the scope of the present invention.
All mouse chromosome sequences corresponding to the NCBI m34 mouse assembly were compiled and analyzed with SMAR Scan I. Low and high stringency screens were performed using either a threshold for the DNA bending criterion of 3.6 degrees and a minimal window size of 300 bp, or a threshold of 4.2 degrees and a minimal window size of 100 bp, respectively.
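The two screens described above differ only in two parameters, which can be written down as a small configuration sketch. The class and function names are hypothetical, and the reading of "minimal window size" as a minimum length of the region scoring above the bending threshold is an assumption made for illustration; only the numeric values come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreenSettings:
    bend_threshold_deg: float   # cut-off for the DNA bending criterion
    min_window_bp: int          # minimal window size in base pairs


# Values as stated for the low and high stringency screens.
LOW_STRINGENCY = ScreenSettings(bend_threshold_deg=3.6, min_window_bp=300)
HIGH_STRINGENCY = ScreenSettings(bend_threshold_deg=4.2, min_window_bp=100)


def passes(settings: ScreenSettings, bend_score_deg: float,
           region_length_bp: int) -> bool:
    """Assumed acceptance rule: score above threshold over a long enough region."""
    return (bend_score_deg > settings.bend_threshold_deg
            and region_length_bp >= settings.min_window_bp)


if __name__ == "__main__":
    # A hypothetical candidate: 350 bp region with a bending score of 3.8 degrees.
    print(passes(LOW_STRINGENCY, 3.8, 350), passes(HIGH_STRINGENCY, 3.8, 350))
```

Under this reading, the high stringency screen trades a shorter required region for a stricter bending cut-off, so a short but strongly bent region can pass it while failing the low stringency screen.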
Low stringency analysis via SMAR Scan I of the whole mouse genome yielded a total of 1496 putative S/MARs (candidate MARs), representing a total of 622,410 bp (0.024% of the whole mouse genome). Table 1 shows for each chromosome: its size, its number of genes, its number of predicted MARs (candidate MARs), its MAR density per gene and the average distance in kb between S/MARs. This table reveals that there are various gene densities per predicted S/MAR (candidate MAR) on different chromosomes (with a standard deviation representing around 50% of the mean of the density of genes per MARs). The fold difference between the higher and the lower density of genes per MAR is 6 without considering the chromosome Y, which is extremely rich in predicted MARs (candidate MARs) relative to its size and its number of genes, indicating a strong and unexpected bias in the distribution of these MARs. Table 1 also shows that the average distance between S/MARs (kb per S/MAR) is variable (standard deviation represents 38% of the mean of kb per S/MAR and the fold difference between the higher and the lower density of kb per S/MAR is 8.3). The chromosomes 10, 11, X and Y contribute significantly to the high standard deviation of these densities.
SMAR Scan I was originally tuned for human sequences and thus yields few MARs with mouse genomic sequences when using the most stringent parameters; therefore, the default cutoff values were adjusted for the high stringency screen (threshold of 4.2 degrees for the DNA bending criterion) to a minimum size of contiguous hits to be considered a MAR, using a window of 100 bp instead of 300 bp. Analysis by SMAR Scan I of the mouse genome predicted 49 “super” MARs with a value > 4.2 degrees for the DNA bending criterion.
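As a quick consistency check on the genome-wide totals quoted for the low stringency screen, the following sketch derives three summary figures from them. Only the three input numbers come from the text; the derived values are back-of-the-envelope estimates (the genome size in particular is inferred from the rounded 0.024% figure and is therefore approximate), and the variable names are illustrative.

```python
# Inputs quoted in the text for the low stringency screen.
candidate_mars = 1496            # putative S/MARs
total_mar_bp = 622_410           # base pairs covered by the candidates
genome_fraction_percent = 0.024  # share of the whole mouse genome

# Mean length of a candidate MAR.
mean_mar_length_bp = total_mar_bp / candidate_mars

# Genome size implied by the rounded percentage (approximate).
implied_genome_bp = total_mar_bp / (genome_fraction_percent / 100)

# Genome-wide average spacing between candidates, in kb per S/MAR.
mean_spacing_kb = implied_genome_bp / candidate_mars / 1000

if __name__ == "__main__":
    print(f"mean candidate length: {mean_mar_length_bp:.0f} bp")
    print(f"implied genome size:   {implied_genome_bp / 1e9:.2f} Gb")
    print(f"average spacing:       {mean_spacing_kb:.0f} kb per S/MAR")
```

The mean candidate length of roughly 416 bp sits within the 300-3000 bp range given for MARs earlier in the document, and the implied genome size of roughly 2.6 Gb is in line with the size of the mouse genome.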
Five MAR elements were selected from the putative MARs (candidate MARs) obtained with the high stringency screen of the complete mouse genome with SMAR Scan. They were cloned in plasmid vectors from mouse genomic DNA bacterial artificial chromosomes purchased from the Children's Hospital Oakland Research Institute (CHORI, http://bacpac.chori.org/).
These newly-identified mouse MARs were named S4, S8, S15, S32 and S46 (according to the order of identification by SMAR Scan I, “super” MARs S1 to S49). The human MARs 1_3, 1_6, 1_9, 1_42, 1_68, 3_S5 and X_S29 have been previously identified, the MARs 1_68 and X_S29 being the most potent human elements (Mermod et al. “High efficiency gene transfer and expression in mammalian cells by a multiple transfection procedure of MAR sequences,” WO2005/040377, see also U.S. Patent Publication 20070178469 to Mermod et al). These MARs were inserted into the pGEGFP control vector upstream of the SV40 promoter and enhancer driving the expression of the green fluorescent protein and these plasmids were transfected into cultured CHO cells, as described previously [Girod P A, Zahn-Zabal M and Mermod N]. Expression of the transgene was then analyzed in the total population of stably transfected cells using a fluorescent cell sorter (FACS) machine.
As can be seen from
The transcriptional activity of the most potent human MARs 1_68 and X_S29 was compared to the ones obtained with the newly identified mouse MARs. Five mouse MARs were initially tested via GFP expression assays, and they were all found to increase the expression of GFP to different levels. Mouse MARs S15 and S32 are relatively the least transcriptionally active MARs (~2 fold increase compared to GFP alone), S8 and S46 showed a medium activity (3-4 fold increase) and MAR S4 displayed very high transcriptional activity (7 fold increase). Moreover, mouse MAR S4 is the most potent of all MARs tested in this study. Comparison between the human MAR 1-68 and mouse MAR S4 transcriptional activity reveals a 50% increase of the mean fluorescence of the whole population (Gmean M0) and of the high GFP-producing cells (M2), whereas the percentile of very high GFP-producing cells (M3) was 175% higher with mouse MAR S4. The homogeneity of the whole population in terms of GFP fluorescence (CV M0) was always 1-2% lower with mouse MAR S4, which is advantageous because it indicates greater stability of the cell productivity.
After this first round of cloning, it was sought to determine whether highly active MAR elements can be consistently obtained from the mouse genome. Thus, two additional mouse MARs (S6 and S10) were cloned and characterized. These new mouse MARs were inserted into the pGEGFP control vector and analyzed by FACS as above. Mouse MAR S10 also appeared to be more potent than the best human MARs in all the different parameters analyzed by FACS, and is nearly as active transcriptionally as MAR S4 in increasing overall expression.
To assess very high producers, the percentile of M3 cells was normalized to the one obtained for the human MAR 1_68. The results are presented in
The potency of the S4 MAR was assessed in CHO cells. In addition, EGFP expression vectors comprising either human MAR 1-68, mouse MAR S4 or no MAR were transfected stably in human HeLa cells and EGFP fluorescence was analyzed.
To determine if mouse MARs, in particular the most potent ones, can be used to augment the production of proteins for pharmaceutical applications, they were inserted in the pMZ37 and pMZ59 vectors encoding the heavy and light chains of a Rhesus-D-recognizing immunoglobulin [Miescher S, Zahn-Zabal M, De Jesus, M, Moudry, R, Fisch, I, Vogel, M, Kobr, M, Imboden, M A, Kragten, E, Bichler, J, Mermod, N, Stadler, BC, Amstutz, H., Wurm, F]. These plasmids were transfected in CHO cells; selection and immunoglobulin assays were performed as described previously [Girod P A, Zahn-Zabal M and Mermod N].
Expression Stability with Human MAR 1_68
MAR 1_68 was used to demonstrate that, whereas the expression of genes in clones not containing MARs is gradually silenced, equivalent clones containing MARs not only maintain high-level expression over time, but silent cells also recover expression.
A structural analysis of MARs revealed DNA sequence regions/modules that each contribute to enhanced gene expression.
The experiments depicted in
Based on the findings with human 1_68 MAR, the S4 MAR was also analyzed for modules, in particular those responsible for its transcriptional activity. This analysis was also performed with the goal of reducing the size of the S4 MAR, which is relatively long. Thus, several MARs were constructed from the S4 MAR (Table 3) and characterized (
To further analyze the activity of the 3′ end sequences of MAR S4, this portion of the MAR was further dissected by removing or duplicating portions of it.
The findings of a possible cooperation between the AT-rich bent DNA region and transcription factor binding sites in human MAR 1_68 prompted the construction of synthetic MARs comprising the AT-rich portion of MAR 1_68 adjacent to one or several transcription factor binding sites.
Different mixtures of active binding sites were also tested, to determine if synergistic effects may be observed. To do so, various combinations of oligonucleotides containing binding sites for the different transcription factors were mixed in DNA ligation reactions, and the precise order and arrangement of binding sites were determined by DNA sequencing. The obtained combinations are shown in Table 5:
The resulting plasmids were tested by transfection as before.
This application claims the benefit of U.S. provisional application Nos. 60/823,319, filed Aug. 23, 2006 and 60/953,910, filed Aug. 3, 2007, which are incorporated herein by reference in their entirety.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/IB07/02404 | 8/22/2007 | WO | 00 | 10/15/2009 |
Number | Date | Country |
---|---|---|
60823319 | Aug 2006 | US |
60953910 | Aug 2007 | US |