COMPOSITIONS AND METHODS FOR TREATING SENSORINEURAL HEARING LOSS USING STEREOCILIN DUAL VECTOR SYSTEMS

SEQUENCE LISTING

The instant application contains a Sequence Listing which has been submitted electronically in ASCII format and is hereby incorporated by reference in its entirety. The ASCII copy, created on May 4, 2022, is named 51471-011 W02_Sequence_Listing_5_4_22_ST25 and is 113,251 bytes in size.

FIELD OF THE INVENTION

Described herein are compositions and methods for treatment of hearing loss, particularly forms of the disease that are associated with mutations in stereocilin (STRC) by way of STRC gene therapy, in which the expression of the STRC gene is under regulatory control of an oncomodulin (OCM) promoter. The disclosure provides two-vector expression systems that include a first nucleic acid vector that contains a polynucleotide encoding an N-terminal portion of a stereocilin protein and a second nucleic acid vector that contains a polynucleotide encoding a C-terminal portion of a stereocilin protein. These vectors can be used to increase the expression of or provide wild-type STRC to a cell or subject, such as a subject suffering from hearing loss (e.g., sensorineural hearing loss).

BACKGROUND

Sensorineural hearing loss is a type of hearing loss caused by defects in the cells of the inner ear or the neural pathways that project from the inner ear to the brain. Although sensorineural hearing loss is often acquired, and can be caused by noise, infections, head trauma, ototoxic drugs, or aging, there are also congenital forms of sensorineural hearing loss associated with autosomal recessive mutations. One such form of autosomal recessive sensorineural hearing loss is associated with mutation of the stereocilin (STRC) gene. Stereocilin is a large protein encoded by the STRC gene on chromosome 15q15, which contains 29 exons spanning approximately 19 kb of the genome. The STRC gene is tandemly duplicated, where the second copy contains a premature stop codon in exon 20, thereby producing an STRC pseudogene. Previous studies have identified mutations in STRC in families with autosomal recessive non-syndromic sensorineural hearing loss (Verpy et al., Nat. Genet. 29:345-9 (2001)). Stereocilin protein expression is limited to stereocilia in hair bundles of inner ear hair cells. Stereocilin protein is thought to form horizontal top connectors and tectorial membrane-attachment crowns, which are required for the normal functioning of the auditory apparatus (Avan et al., PNAS 116:25948-57 (2019); Verpy et al., J. Comp. Neurol. 519:194-210 (2011)). Mice lacking stereocilin have been shown to exhibit abnormal hair cell bundles with defective cohesion and impaired hearing (Verpy et al., Nature 456:255-8 (2008)).

In recent years, efforts to treat hearing loss have increasingly focused on gene therapy as a possible solution; however, the STRC gene is too large to allow for treatment using standard gene therapy approaches. There is a need for new therapeutics to treat STRC-related sensorineural hearing loss.

SUMMARY OF THE INVENTION

The present invention provides compositions and methods for treating sensorineural hearing loss in a subject, such as a human subject. The compositions and methods of the disclosure pertain to dual vector systems for the delivery of a polynucleotide encoding a stereocilin protein to a subject having or at risk of developing sensorineural hearing loss (e.g., a subject with a mutation in STRC). For example, using the compositions and methods described herein, a first nucleic acid vector and a second nucleic acid vector that each encode a portion of a functional stereocilin protein may be delivered to a subject by way of viral gene therapy. The compositions and methods described herein may also be used to increase expression of a wild-type stereocilin protein in a cochlear hair cell (e.g., an outer hair cell).

In a first aspect, the invention provides a two-vector system comprising (a) a first nucleic acid vector comprising an oncomodulin (OCM) promoter having at least 85% (e.g., at least 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more) sequence identity to any one of SEQ ID NOs: 1-3 operably linked to a first polynucleotide encoding an N-terminal portion of a stereocilin protein; and (b) a second nucleic acid vector including a second polynucleotide encoding a C-terminal portion of a stereocilin protein.

In some embodiments, the first polynucleotide partially overlaps with the second polynucleotide. In some embodiments, the first polynucleotide and the second polynucleotide have a region of overlap having a length of at least 200 bases (b) (e.g., at least 200 b, 300 b, 400 b, 500 b, 600 b, 700 b, 800 b, 900 b, 1.0 kilobase (kb), 1.1 kb, 1.2 kb, 1.3 kb, 1.4 kb, 1.5 kb or more). In these embodiments, when introduced into a mammalian cell, the first and second nucleic acid vectors undergo homologous recombination to form a recombined polynucleotide that encodes a full-length stereocilin protein. In some embodiments, the first nucleic acid vector includes a polynucleotide including the sequence of nucleotides 225 to 4574 of SEQ ID NO: 43. In some embodiments, the second nucleic acid vector includes a polynucleotide including the sequence of nucleotides 211 to 4219 of SEQ ID NO: 44.
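By way of a non-limiting illustration, the overlap requirement described above can be checked computationally. The following Python sketch assumes that the overlap is an exact match between the 3′ end of the first polynucleotide and the 5′ end of the second polynucleotide. The function names `overlap_length` and `merge_overlapping` are hypothetical. The sketch models only the sequence bookkeeping, not the cellular recombination process.

```python
def overlap_length(first: str, second: str) -> int:
    """Length of the longest suffix of `first` that is also a prefix of `second`.

    Assumption: the region of overlap is an exact, ungapped match.
    """
    first, second = first.upper(), second.upper()
    for k in range(min(len(first), len(second)), 0, -1):
        if first.endswith(second[:k]):
            return k
    return 0


def merge_overlapping(first: str, second: str, min_overlap: int = 200) -> str:
    """Join two overlapping fragments into a single sequence.

    Raises ValueError if the shared region is shorter than `min_overlap` bases.
    """
    k = overlap_length(first, second)
    if k < min_overlap:
        raise ValueError(
            f"overlap of {k} b is shorter than the required {min_overlap} b"
        )
    # Keep the shared region once: all of `first`, then the remainder of `second`.
    return first.upper() + second.upper()[k:]
```

For example, two toy fragments that share a 240 b region would pass the default 200 b check and merge into one sequence containing the shared region once.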

In some embodiments, the first nucleic acid vector includes a splice donor signal sequence positioned at the 3′ end of the first polynucleotide and the second nucleic acid vector includes a splice acceptor signal sequence positioned 5′ of the second polynucleotide. In some embodiments, the first and second polynucleotides do not overlap.

In some embodiments, the first nucleic acid vector includes a splice donor signal sequence positioned at the 3′ end of the first polynucleotide and a first recombinogenic region positioned 3′ of the splice donor signal sequence, and the second nucleic acid vector includes a second recombinogenic region, a splice acceptor signal sequence 3′ of the recombinogenic region, and the second polynucleotide 3′ of the splice acceptor signal sequence. In some embodiments, the first and second polynucleotides do not overlap. In some embodiments, the first and second recombinogenic regions are the same. In some embodiments, each of the first recombinogenic region and the second recombinogenic region is an AP gene fragment. In some embodiments, the AP gene fragment includes or consists of the sequence of any one of SEQ ID NOs: 47-52. In some embodiments, the AP gene fragment includes or consists of the sequence of SEQ ID NO: 50. In some embodiments, the first nucleic acid vector further includes a degradation signal sequence positioned 3′ of the recombinogenic region; and the second nucleic acid vector further includes a degradation signal sequence positioned between the recombinogenic region and the splice acceptor signal sequence. In some embodiments, the first nucleic acid vector includes a polynucleotide including the sequence of nucleotides 225 to 4454 of SEQ ID NO: 45 and the second nucleic acid vector includes a polynucleotide including the sequence of nucleotides 257 to 3597 of SEQ ID NO: 46.

In some embodiments, the second nucleic acid vector further includes an OCM promoter having at least 85% (e.g., at least 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more) sequence identity to the nucleic acid sequence of any one of SEQ ID NOs: 1-3 operably linked to the second polynucleotide, wherein the promoter is positioned 5′ of the second polynucleotide. In some embodiments, the OCM promoter in the second nucleic acid vector is the same (i.e., has the same nucleotide sequence) as the OCM promoter in the first nucleic acid vector. In some embodiments, the OCM promoter in the second nucleic acid vector has a different nucleotide sequence than the OCM promoter in the first nucleic acid vector.

In some embodiments, the first nucleic acid vector further includes a polynucleotide encoding an N-terminal intein (N-intein) positioned 3′ of the first polynucleotide. In some embodiments, the second nucleic acid vector further includes a polynucleotide encoding a C-terminal intein (C-intein) positioned between the OCM promoter and the second polynucleotide. In some embodiments, the N-intein and C-intein are components of a split intein trans-splicing system.

In some embodiments, the first and/or second vectors include an intein degradation signal. In some embodiments, the degradation signal is an N-degron and/or a C-degron. In some embodiments, the N-degron and/or the C-degron are independently a CL1, PB29, SMN, CIITA, or ODC degron. In some embodiments, the degradation signal is an E. coli dihydrofolate reductase (ecDHFR) degradation signal. In some embodiments the degradation signal is an FKBP12 degradation domain (Banaszynski et al., Cell 126:995-1004, 2006). In some embodiments the degradation signal is a PEST degradation domain (Rechsteiner and Rogers, Trends Biochem Sci. 21:267-271, 1996). In some embodiments the degradation signal is a UbR tag ubiquitination signal (Chassin et al., Nat Commun. 10:2013, 2019). In some embodiments the degradation signal is a destabilized mutation of human ELRBD (Miyazaki et al., J. Am. Chem. Soc., 134:3942-3945, 2012).

In some embodiments, the first and second vectors, when introduced into a mammalian cell, produce a first and second fusion protein, respectively, wherein the first fusion protein includes the N-terminal portion of stereocilin and the N-intein positioned 3′ thereto, and wherein the second fusion protein includes the C-intein and the C-terminal portion of stereocilin positioned 3′ thereto. In some embodiments, the C-terminus of the N-intein of the first fusion protein and the N-terminus of the C-intein of the second fusion protein are capable of forming a peptide bond, thereby producing a polypeptide including, from N-terminus to C-terminus, the N-terminal portion of stereocilin, N-intein, C-intein, and the C-terminal portion of stereocilin, wherein the bound N-intein and C-intein are capable of self-excising and ligating the C-terminus of the N-terminal portion of stereocilin and the N-terminus of the C-terminal portion of stereocilin, thereby producing a full-length stereocilin protein.
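By way of a non-limiting illustration, the order of elements in the two fusion proteins and in the spliced product can be modeled with plain strings. The following Python sketch is a toy model only. The function name `trans_splice` and the example residues are hypothetical placeholders, not sequences from the Sequence Listing, and the model ignores the chemistry of intein excision.

```python
def trans_splice(fusion_n: str, fusion_c: str, n_intein: str, c_intein: str) -> str:
    """Toy model of split-intein trans-splicing on amino acid strings.

    fusion_n: N-terminal portion of the protein followed by the N-intein.
    fusion_c: C-intein followed by the C-terminal portion of the protein.
    Returns the two protein portions ligated, with both inteins removed.
    """
    if not fusion_n.endswith(n_intein):
        raise ValueError("first fusion protein must end with the N-intein")
    if not fusion_c.startswith(c_intein):
        raise ValueError("second fusion protein must begin with the C-intein")
    n_portion = fusion_n[: len(fusion_n) - len(n_intein)]
    c_portion = fusion_c[len(c_intein):]
    # The inteins excise themselves and the flanking portions are joined.
    return n_portion + c_portion
```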

In some embodiments, the split intein trans-splicing system is derived from a DnaE gene of one or more bacteria. In some embodiments, the one or more bacteria are selected from the group consisting of Nostoc punctiforme (Npu), Synechocystis sp. PCC6803 (Ssp), Fischerella sp. PCC9605 (Fsp), Scytonema tolypothrichoides (Sto), Cyanobacteria bacterium SW_9_47_5, Nodularia spumigena (Nsp), Nostoc flagelliforme (Nfl), Crocosphaera watsonii (Cwa) WH8502, Chroococcidiopsis cubana (Ccu) CCALA043, Trichodesmium erythraeum (Ter), Rhodothermus marinus (Rma), Saccharomyces cerevisiae (Sce), Saccharomyces castellii (Sca), Saccharomyces unisporus (Sun), Zygosaccharomyces bisporus (Zbi), Torulaspora pretoriensis (Tpr), Mycobacterium tuberculosis (Mtu), Mycobacterium leprae (Mle), Mycobacterium smegmatis (Msm), Pyrococcus abyssi (Pab), Pyrococcus horikoshii (Pho), Coxiella burnetii (Cbu), Cryptococcus neoformans (Cne), Cryptococcus gattii (Cga), Histoplasma capsulatum (Hca), and Porphyra purpurea chloroplast (Ppu). In some embodiments, the split intein trans-splicing system is derived from multiple sequence alignment studies of DnaE for identifying a consensus design (e.g., Cfa) to engineer a split intein with desirable stability and activity.

In some embodiments, the N-intein has a sequence of any one of SEQ ID NOs: 8, 10, 13, 15, 17-22, 27, 29, 31, 33, 35, 37, and 39, and the C-intein has a sequence of any one of SEQ ID NOs: 9, 11, 12, 14, 16, 23-26, 28, 30, 32, 34, 36, 38, and 40. In some embodiments, the N-intein has the sequence of SEQ ID NO: 8 and the C-intein has the sequence of SEQ ID NO: 9. In some embodiments, the N-intein has the sequence of SEQ ID NO: 8 and the C-intein has the sequence of SEQ ID NO: 11. In some embodiments, the N-intein has the sequence of SEQ ID NO: 8 and the C-intein has the sequence of SEQ ID NO: 12. In some embodiments, the N-intein has the sequence of SEQ ID NO: 10 and the C-intein has the sequence of SEQ ID NO: 9. In some embodiments, the N-intein has the sequence of SEQ ID NO: 10 and the C-intein has the sequence of SEQ ID NO: 11. In some embodiments, the N-intein has the sequence of SEQ ID NO: 10 and the C-intein has the sequence of SEQ ID NO: 12. In some embodiments, the N-intein has the sequence of SEQ ID NO: 13 and the C-intein has the sequence of SEQ ID NO: 14. In some embodiments, the N-intein has the sequence of SEQ ID NO: 15 and the C-intein has the sequence of SEQ ID NO: 16. In some embodiments, the N-intein has the sequence of SEQ ID NO: 17 and the C-intein has the sequence of SEQ ID NO: 23. In some embodiments, the N-intein has the sequence of SEQ ID NO: 20 and the C-intein has the sequence of SEQ ID NO: 24. In some embodiments, the N-intein has the sequence of SEQ ID NO: 21 and the C-intein has the sequence of SEQ ID NO: 25. In some embodiments, the N-intein has the sequence of SEQ ID NO: 22 and the C-intein has the sequence of SEQ ID NO: 26. In some embodiments, the N-intein has the sequence of SEQ ID NO: 27 and the C-intein has the sequence of SEQ ID NO: 28. In some embodiments, the N-intein has the sequence of SEQ ID NO: 29 and the C-intein has the sequence of SEQ ID NO: 30. 
In some embodiments, the N-intein has the sequence of SEQ ID NO: 31 and the C-intein has the sequence of SEQ ID NO: 32. In some embodiments, the N-intein has the sequence of SEQ ID NO: 33 and the C-intein has the sequence of SEQ ID NO: 34. In some embodiments, the N-intein has the sequence of SEQ ID NO: 35 and the C-intein has the sequence of SEQ ID NO: 36. In some embodiments, the N-intein has the sequence of SEQ ID NO: 37 and the C-intein has the sequence of SEQ ID NO: 38. In some embodiments, the N-intein has the sequence of SEQ ID NO: 39 and the C-intein has the sequence of SEQ ID NO: 40. In some embodiments, the N-intein has the sequence of any one of SEQ ID NOs: 17-22 and the C-intein has the sequence of any one of SEQ ID NOs: 23-26.

In some embodiments, the split intein trans-splicing system includes one or more inteins that perform protein trans-splicing only upon contact with a ligand. In some embodiments, the ligand is selected from the group consisting of 4-hydroxytamoxifen, a peptide, a protein, a polynucleotide, an amino acid, and a nucleotide.

In some embodiments, the first nucleic acid vector further includes a polynucleotide encoding a signal peptide. In some embodiments, the polynucleotide encoding a signal peptide is placed 5′ of the polynucleotide encoding the N-terminal portion of the stereocilin protein. In some embodiments, the polynucleotide encoding a signal peptide is placed 3′ of the polynucleotide encoding the N-terminal portion of the stereocilin protein. In some embodiments, the second nucleic acid vector further includes a polynucleotide encoding a signal peptide. In some embodiments, the polynucleotide encoding a signal peptide is placed 5′ of the polynucleotide encoding the C-terminal portion of the stereocilin protein. In some embodiments, the polynucleotide encoding a signal peptide is placed 3′ of the polynucleotide encoding the C-terminal portion of the stereocilin protein.

In some embodiments, neither the first nor the second polynucleotide encodes a full-length stereocilin protein. In some embodiments, each of the first and second polynucleotides encodes about half of the stereocilin protein sequence.

In some embodiments, the second nucleic acid vector further includes a poly(A) sequence 3′ of the second polynucleotide.

In some embodiments, the first and second nucleic acid vectors do not include STRC untranslated regions (UTRs). In some embodiments, the first and second nucleic acid vectors include STRC UTRs. In some embodiments, the first nucleic acid vector includes a 5′ STRC UTR 5′ of the first polynucleotide. In some embodiments, the second nucleic acid vector includes a 3′ STRC UTR 3′ of the second polynucleotide.

In some embodiments, the first and second polynucleotides that encode the stereocilin protein do not include introns (e.g., the first and second polynucleotides are portions of STRC cDNA). In some embodiments, the first and second polynucleotides that encode the stereocilin protein include introns.

In some embodiments, the OCM promoter has at least 85% (e.g., at least 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more) sequence identity to SEQ ID NO: 1. In some embodiments, the OCM promoter has the sequence of SEQ ID NO: 1. In some embodiments, the OCM promoter has at least 85% (e.g., at least 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more) sequence identity to SEQ ID NO: 2. In some embodiments, the OCM promoter has the sequence of SEQ ID NO: 2. In some embodiments, the OCM promoter has at least 85% (e.g., at least 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more) sequence identity to SEQ ID NO: 3. In some embodiments, the OCM promoter has the sequence of SEQ ID NO: 3.
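By way of a non-limiting illustration, a percent sequence identity threshold such as the 85% value recited above can be evaluated as follows. The Python sketch assumes that the two sequences have already been aligned to equal length, with gaps written as "-", and that identity is computed as matching positions divided by alignment length. Other conventions exist, and any definition of percent identity given elsewhere in this disclosure controls. The function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumption: identity = matching columns / total alignment columns * 100,
    with gap characters ('-') never counted as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_identity_threshold(
    aligned_a: str, aligned_b: str, threshold: float = 85.0
) -> bool:
    """True if the two aligned sequences share at least `threshold` percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

In practice the alignment itself would be produced by an established alignment tool; the sketch covers only the final arithmetic.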

In some embodiments, the two-vector system is capable of directing cochlear outer hair cell (OHC)-specific expression of a full-length stereocilin protein in a mammalian OHC. In some embodiments, the mammalian OHC is a human OHC. In some embodiments, the mammalian OHC is a murine OHC.

In some embodiments, the stereocilin protein is a human stereocilin protein having at least 85% (e.g., at least 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more) sequence identity to SEQ ID NO: 4. In some embodiments, the stereocilin protein has the sequence of SEQ ID NO: 4. In some embodiments, the human stereocilin protein is encoded by a polynucleotide having at least 85% (e.g., at least 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more) sequence identity to SEQ ID NO: 6. In some embodiments, the polynucleotide that has at least 85% (e.g., at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more) sequence identity to SEQ ID NO: 6 encodes the stereocilin protein of SEQ ID NO: 4.

In some embodiments, the human stereocilin protein is encoded by a polynucleotide having the sequence of SEQ ID NO: 6. In some embodiments, the STRC protein is a murine stereocilin protein having at least 85% (e.g., at least 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more) sequence identity to SEQ ID NO: 5. In some embodiments, the murine stereocilin protein has the sequence of SEQ ID NO: 5. In some embodiments, the murine stereocilin protein is encoded by a polynucleotide having at least 85% (e.g., at least 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more) sequence identity to SEQ ID NO: 7. In some embodiments, the polynucleotide that has at least 85% (e.g., at least 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more) sequence identity to SEQ ID NO: 7 encodes the stereocilin protein of SEQ ID NO: 5. In some embodiments, the murine stereocilin protein is encoded by a polynucleotide having the sequence of SEQ ID NO: 7.

In some embodiments, the first and second vectors are viral vectors, plasmids, cosmids, or artificial chromosomes. In some embodiments, the first and second vectors are viral vectors. In some embodiments, the viral vectors are adeno-associated virus (AAV) vectors, adenovirus vectors, or lentivirus vectors. In some embodiments, the first and second vectors are AAV vectors. In some embodiments, each of the first and second AAV vectors has an AAV1, AAV2, AAV2quad(Y-F), AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV10, AAV11, rh10, rh39, rh43, rh74, Anc80, Anc80L65, DJ/8, DJ/9, 7m8, PHP.B, PHP.eb, or PHP.S capsid. In some embodiments, each of the first and second AAV vectors has an AAV1 capsid. In some embodiments, each of the first and second AAV vectors has an AAV9 capsid. In some embodiments, each of the first and second AAV vectors has an AAV6 capsid. In some embodiments, each of the first and second AAV vectors has an AAV8 capsid. In some embodiments, each of the first and second AAV vectors has an Anc80 capsid. In some embodiments, each of the first and second AAV vectors has an Anc80L65 capsid. In some embodiments, each of the first and second AAV vectors has a DJ/9 capsid. In some embodiments, each of the first and second AAV vectors has a 7m8 capsid. In some embodiments, each of the first and second AAV vectors has an AAV2 capsid. In some embodiments, each of the first and second AAV vectors has a PHP.B capsid. In some embodiments, each of the first and second AAV vectors has an AAV2quad(Y-F) capsid.

In another aspect, the invention provides a pharmaceutical composition containing the two-vector system of the foregoing aspect and embodiments. In some embodiments, the composition further includes a pharmaceutically acceptable carrier, diluent, or excipient.

In another aspect, the invention provides a cell (e.g., a mammalian cell, e.g., a human cell, such as an OHC, e.g., an OHC having a pathogenic mutation in the STRC gene) including the two-vector system of any of the foregoing aspects and embodiments. In some embodiments, the cell is a mammalian OHC. In some embodiments, the mammalian OHC is a human OHC.

In another aspect, the disclosure provides a method of expressing a stereocilin protein in a mammalian cell by contacting the cell with the two-vector system or the pharmaceutical composition of the foregoing aspects and embodiments. In some embodiments, the cell is a cochlear hair cell. In some embodiments, the cell is an OHC. In some embodiments, the cell is a human cell. In some embodiments, the cell is in a subject (e.g., the contacting occurs in vivo).

In another aspect, the invention provides a method of treating a subject having or at risk of developing sensorineural hearing loss by administering to an inner ear of the subject an effective amount of the two-vector system or the pharmaceutical composition of the foregoing aspects and embodiments. In some embodiments, the sensorineural hearing loss is genetic sensorineural hearing loss. In some embodiments, the genetic hearing loss is autosomal recessive hearing loss. In some embodiments of any of the foregoing aspects, the hearing loss is associated with loss of OHCs or dysfunction of OHCs. In some embodiments, the hearing loss is associated with abnormal OHC stereocilia bundle deflection or impaired connectivity between the OHC hair bundles and the tectorial membrane.

In another aspect, the invention provides a method of increasing STRC expression (e.g., wild-type STRC expression, e.g., to produce wild-type stereocilin protein) in a subject in need thereof, the method including administering to an inner ear of the subject a therapeutically effective amount of the two-vector system or the pharmaceutical composition of the foregoing aspects and embodiments.

In another aspect, the invention provides a method of preventing or reducing OHC damage or death in a subject in need thereof, including administering to an inner ear of the subject an effective amount of the two-vector system or the pharmaceutical composition of the foregoing aspects and embodiments.

In another aspect, the invention provides a method of increasing OHC survival in a subject in need thereof, including administering to an inner ear of the subject an effective amount of the two-vector system or the pharmaceutical composition of the foregoing aspects and embodiments.

In another aspect, the invention provides a method of increasing or improving OHC hair bundle attachment to the tectorial membrane in a subject in need thereof, including administering to an inner ear of the subject an effective amount of the two-vector system or the pharmaceutical composition of the foregoing aspects and embodiments.

In some embodiments of any of the foregoing aspects, the subject has a mutation in STRC. In some embodiments of any of the foregoing aspects, the subject has been identified as having a mutation in STRC. In some embodiments of any of the foregoing aspects, the method further includes identifying the subject as having a mutation in STRC prior to administering the two-vector system or pharmaceutical composition. In some embodiments of any of the foregoing aspects, the subject has deafness, autosomal recessive 16 (DFNB16). In some embodiments of any of the foregoing aspects, the subject has been identified as having DFNB16.

In some embodiments of any of the foregoing aspects, the method further includes evaluating the hearing of the subject prior to administering the two-vector system or pharmaceutical composition (e.g., evaluating hearing using standard tests, such as audiometry, auditory brainstem response (ABR), electrocochleography (ECOG), or otoacoustic emissions).

In some embodiments of any of the foregoing aspects, the method further includes evaluating the hearing of the subject after administering the two-vector system or pharmaceutical composition (e.g., evaluating hearing using standard tests, such as audiometry, ABR, ECOG, or otoacoustic emissions).

In some embodiments of any of the foregoing aspects, the two-vector system or pharmaceutical composition is locally administered. In some embodiments, the two-vector system or pharmaceutical composition is administered to the ear of the subject (e.g., administered to the inner ear, e.g., into the perilymph or endolymph, such as to or through the oval window, round window, or horizontal canal, or by transtympanic or intratympanic injection). In some embodiments, the vectors in the two-vector system are administered concurrently. In some embodiments, the vectors in the two-vector system are administered sequentially.

In some embodiments of any of the foregoing aspects, the nucleic acid vector or composition is administered in an amount sufficient to prevent or reduce hearing loss, delay the development of hearing loss, slow the progression of hearing loss, improve hearing, improve speech discrimination, improve hair cell function, prevent or reduce hair cell damage, prevent or reduce hair cell death, promote or increase hair cell survival, improve OHC hair bundle attachment to the tectorial membrane, or increase STRC expression in a hair cell.

In some embodiments of any of the foregoing aspects, the subject is a human.

In another aspect, the invention provides a kit containing the two-vector system or the pharmaceutical composition of the foregoing aspects and embodiments.

Definitions

As used herein, the term “about” refers to a value that is within 10% above or below the value being described.
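By way of a non-limiting illustration, this definition corresponds to a simple tolerance check. The following Python sketch uses a hypothetical function name, `is_about`, and treats the 10% bounds as inclusive, which is an assumption.

```python
def is_about(value: float, described: float, tolerance: float = 0.10) -> bool:
    """True if `value` lies within `tolerance` (default 10%) above or below `described`."""
    return abs(value - described) <= tolerance * abs(described)
```

Under this sketch, 210 is "about 200" because it differs from 200 by 5%, whereas 225 is not because it differs by 12.5%.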

As used herein, “administration” refers to providing or giving a subject a therapeutic agent (e.g., a two-vector system containing an oncomodulin (OCM) promoter operably linked to a polynucleotide encoding a stereocilin protein), by any effective route. Exemplary routes of administration are described herein below.

As used herein, the phrase “administering to the inner ear” refers to providing or giving a therapeutic agent described herein to a subject by any route that allows for transduction of inner ear cells. Exemplary routes of administration to the inner ear include administration into the perilymph or endolymph, such as to or through the oval window, round window, or semicircular canal (e.g., horizontal canal), or by transtympanic or intratympanic injection, e.g., administration to an OHC.

As used herein, the term “cell type” refers to a group of cells sharing a phenotype that is statistically separable based on gene expression data. For instance, cells of a common cell type may share similar structural and/or functional characteristics, such as similar gene activation patterns and antigen presentation profiles. Cells of a common cell type may include those that are isolated from a common tissue (e.g., epithelial tissue, neural tissue, connective tissue, or muscle tissue) and/or those that are isolated from a common organ, tissue system, blood vessel, or other structure and/or region in an organism.

As used herein, the term “cochlear hair cell” refers to a group of specialized cells in the inner ear that are involved in sensing sound. There are two types of cochlear hair cells: inner hair cells and outer hair cells. Damage to cochlear hair cells and genetic mutations that disrupt cochlear hair cell function are implicated in hearing loss and deafness.

As used herein, the terms “conservative mutation,” “conservative substitution,” and “conservative amino acid substitution” refer to a substitution of one or more amino acids for one or more different amino acids that exhibit similar physicochemical properties, such as polarity, electrostatic charge, and steric volume. These properties are summarized for each of the twenty naturally occurring amino acids in Table 1, below.

TABLE 1

Representative physicochemical properties of naturally occurring amino acids

Amino Acid     3-Letter Code  1-Letter Code  Side-chain Polarity  Electrostatic character at physiological pH (7.4)         Steric Volume†
Alanine        Ala            A              nonpolar             neutral                                                    small
Arginine       Arg            R              polar                cationic                                                   large
Asparagine     Asn            N              polar                neutral                                                    intermediate
Aspartic acid  Asp            D              polar                anionic                                                    intermediate
Cysteine       Cys            C              nonpolar             neutral                                                    intermediate
Glutamic acid  Glu            E              polar                anionic                                                    intermediate
Glutamine      Gln            Q              polar                neutral                                                    intermediate
Glycine        Gly            G              nonpolar             neutral                                                    small
Histidine      His            H              polar                Both neutral and cationic forms in equilibrium at pH 7.4   large
Isoleucine     Ile            I              nonpolar             neutral                                                    large
Leucine        Leu            L              nonpolar             neutral                                                    large
Lysine         Lys            K              polar                cationic                                                   large
Methionine     Met            M              nonpolar             neutral                                                    large
Phenylalanine  Phe            F              nonpolar             neutral                                                    large
Proline        Pro            P              nonpolar             neutral                                                    intermediate
Serine         Ser            S              polar                neutral                                                    small
Threonine      Thr            T              polar                neutral                                                    intermediate
Tryptophan     Trp            W              nonpolar             neutral                                                    bulky
Tyrosine       Tyr            Y              polar                neutral                                                    large
Valine         Val            V              nonpolar             neutral                                                    intermediate

†Based on volume in Å³: 50-100 is small, 100-150 is intermediate, 150-200 is large, and >200 is bulky.

From this table it is appreciated that the conservative amino acid families include (i) G, A, V, L, and I; (ii) D and E; (iii) C, S and T; (iv) H, K and R; (v) N and Q; and (vi) F, Y and W. A conservative mutation or substitution is therefore one that substitutes one amino acid for a member of the same amino acid family (e.g., a substitution of Ser for Thr or Lys for Arg).
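By way of a non-limiting illustration, the six conservative amino acid families listed above can be encoded as sets of one-letter codes and used to test whether a given substitution is conservative. In the following Python sketch, the names `CONSERVATIVE_FAMILIES` and `is_conservative_substitution` are hypothetical. Treating an unchanged residue as "not a substitution" is an assumption, and residues that appear in none of the listed families (M and P) are never reported as conservative.

```python
# Families (i) through (vi) as listed in the text, using one-letter codes.
CONSERVATIVE_FAMILIES = (
    frozenset("GAVLI"),  # (i)
    frozenset("DE"),     # (ii)
    frozenset("CST"),    # (iii)
    frozenset("HKR"),    # (iv)
    frozenset("NQ"),     # (v)
    frozenset("FYW"),    # (vi)
)


def is_conservative_substitution(original: str, replacement: str) -> bool:
    """True if both residues differ and belong to the same listed family."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False  # assumption: an unchanged residue is not a substitution
    return any(a in family and b in family for family in CONSERVATIVE_FAMILIES)
```

For example, Ser to Thr and Lys to Arg are reported as conservative, whereas Asp to Lys is not.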

As used herein, the term “degradation signal sequence” refers to a sequence (e.g., a nucleotide sequence that can be translated into an amino acid sequence) that mediates the degradation of a polypeptide in which it is contained. Degradation signal sequences can be included in the nucleic acid vectors of the invention to reduce or prevent the expression of portions of stereocilin proteins that have not undergone recombination and/or splicing.

The terms “derived” and “derivative” as used herein refer to a nucleic acid, peptide, or protein or a variant or analog thereof comprising one or more mutations and/or chemical modifications as compared to a corresponding full-length wild-type nucleic acid, peptide, or protein. Non-limiting examples of chemical modifications involving nucleic acids include, for example, modifications to the base moiety, sugar moiety, phosphate moiety, phosphate-sugar backbone, or a combination thereof.

As used herein, the terms “effective amount,” “therapeutically effective amount,” and a “sufficient amount” of a composition, vector construct, or viral vector described herein refer to a quantity sufficient to, when administered to the subject, including a mammal, for example a human, effect beneficial or desired results, including clinical results, and, as such, an “effective amount” or synonym thereto depends upon the context in which it is being applied. For example, in the context of treating sensorineural hearing loss, it is an amount of the composition, vector construct, or viral vector sufficient to achieve a treatment response as compared to the response obtained without administration of the composition, vector construct, or viral vector. The amount of a given composition described herein that will correspond to such an amount will vary depending upon various factors, such as the given agent, the pharmaceutical formulation, the route of administration, the type of disease or disorder, the identity of the subject (e.g., age, sex, weight) or host being treated, and the like, but can nevertheless be routinely determined by one skilled in the art. Also, as used herein, a “therapeutically effective amount” of a composition, vector construct, or viral vector of the present disclosure is an amount which results in a beneficial or desired result in a subject as compared to a control. As defined herein, a therapeutically effective amount of a composition, vector construct, or viral vector of the present disclosure may be readily determined by one of ordinary skill by routine methods known in the art. Dosage regimen may be adjusted to provide the optimum therapeutic response.

As used herein, the term “endogenous” refers to a molecule (e.g., a polypeptide, nucleic acid, or cofactor) that is found naturally in a particular organism (e.g., a human) or in a particular location within an organism (e.g., an organ, a tissue, or a cell, such as a human cell, e.g., an OHC).

As used herein, the term “express” refers to one or more of the following events: (1) production of an RNA template from a DNA sequence (e.g., by transcription); (2) processing of an RNA transcript (e.g., by splicing, editing, 5′ cap formation, and/or 3′ end processing); (3) translation of an RNA into a polypeptide or protein; and (4) post-translational modification of a polypeptide or protein.

As used herein, the term “exogenous” describes a molecule (e.g., a polypeptide, nucleic acid, or cofactor) that is not found naturally in a particular organism (e.g., a human) or in a particular location within an organism (e.g., an organ, a tissue, or a cell, such as a human cell, e.g., a human OHC).

Exogenous materials include those that are provided from an external source to an organism or to cultured matter extracted therefrom.

As used herein, the term “exon” refers to a region within the coding region of a gene, the nucleotide sequence of which determines the amino acid sequence of the corresponding protein. The term exon also refers to the corresponding region of the RNA transcribed from a gene. Exons are transcribed into pre-mRNA and may be included in the mature mRNA depending on the alternative splicing of the gene. Exons that are included in the mature mRNA following processing are translated into protein, wherein the sequence of the exon determines the amino acid composition of the protein.

As used herein, the term “heterologous” refers to a combination of elements that is not naturally occurring. For example, a heterologous transgene refers to a transgene that is not naturally expressed by the promoter to which it is operably linked.

As used herein, the terms “increasing” and “decreasing” refer to modulation resulting in, respectively, greater or lesser amounts of function, expression, or activity of a metric relative to a reference. For example, subsequent to administration of a composition in a method described herein, the amount of a marker of a metric (e.g., transgene expression) as described herein may be increased or decreased in a subject by at least 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or 98% or more relative to the amount of the marker prior to administration. Generally, the metric is measured subsequent to administration at a time that the administration has had the recited effect, e.g., at least one week, one month, 3 months, or 6 months, after a treatment regimen has begun.

As used herein, the term “intein,” also referred to as “protein intron,” refers to a portion of a protein that is typically 100-900 amino acid residues long and that is capable of self-excision and ligation of the flanking protein fragments (“exteins”) with a peptide bond. Inteins are produced during protein splicing. The term “intein” subsumes four different classes of inteins, including maxi-intein, mini-intein, trans-splicing intein, and alanine intein. Maxi-inteins refer to N- and C-terminal splicing regions of a protein containing an endonuclease domain. Endonuclease domains, also known as “homing endonuclease genes” or “HEG,” refer to a class of endonucleases encoded as stand-alone genes within introns, as protein fusions with other proteins, or as self-splicing inteins. HEGs generally hydrolyze very few and often targeted DNA regions. Once a HEG hydrolyzes a piece of DNA, the gene encoding the HEG typically incorporates itself into the cleavage site, thereby increasing its allele frequency. Mini-inteins refer to N- and C-terminal splicing domains lacking the endonuclease domain. Trans-splicing inteins refer to inteins that are split into two or more domains which are further split into N-termini and C-termini. Alanine inteins refer to inteins having a splicing junction of an alanine instead of a cysteine or serine. An intein of a precursor protein may come in two genes; in such cases, the intein is designated a “split intein.”

As used herein, the term “intron” refers to a region within the coding region of a gene, the nucleotide sequence of which is not translated into the amino acid sequence of the corresponding protein. The term intron also refers to the corresponding region of the RNA transcribed from a gene. Introns are transcribed into pre-mRNA, but are removed during processing, and are not included in the mature mRNA.

As used herein, the term “outer hair cell-specific expression” or “OHC-specific expression” refers to production of an RNA transcript or polypeptide primarily within cochlear OHCs as compared to other cell types of the cochlea (e.g., spiral ganglion neurons, glia, or other cochlear cell types). OHC-specific expression of a transgene can be confirmed by comparing transgene expression (e.g., RNA or protein expression) between various cell types of the cochlea (e.g., OHCs vs. non-OHCs) using any standard technique (e.g., quantitative RT-PCR, immunohistochemistry, western blot analysis, or measurement of the fluorescence of a reporter (e.g., GFP) operably linked to a promoter). An OHC-specific promoter induces expression (e.g., RNA or protein expression) of a transgene to which it is operably linked that is at least 50% greater (e.g., 50%, 75%, 100%, 125%, 150%, 175%, 200% greater or more) in OHCs compared to at least 2 (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, or more) of the following inner ear cell types: inner hair cells, Border cells, inner phalangeal cells, inner pillar cells, outer pillar cells, first row Deiter cells, second row Deiter cells, third row Deiter cells, Hensen's cells, Claudius cells, inner sulcus cells, outer sulcus cells, spiral prominence cells, root cells, interdental cells, basal cells of the stria vascularis, intermediate cells of the stria vascularis, marginal cells of the stria vascularis, spiral ganglion neurons, and Schwann cells. An OHC-specific promoter induces expression (e.g., RNA or protein expression) of a transgene to which it is operably linked that is at least 50% greater (e.g., 50%, 75%, 100%, 125%, 150%, 175%, 200% greater or more) in OHCs of the cochlea compared to other cells of the cochlea.
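The threshold test in this definition can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the dictionary layout, and the expression values are hypothetical, and the expression levels are assumed to be already normalized measurements (e.g., reporter fluorescence) on a common scale.

```python
def percent_greater(ohc_level, other_level):
    """Percent by which OHC expression exceeds expression in another cell type."""
    if other_level == 0:
        # No detectable expression in the comparison cell type.
        return float("inf") if ohc_level > 0 else 0.0
    return 100.0 * (ohc_level - other_level) / other_level


def is_ohc_specific(ohc_level, other_levels, min_percent=50.0, min_cell_types=2):
    """Return True if OHC expression is at least `min_percent` greater than
    expression in at least `min_cell_types` of the comparison cell types."""
    qualifying = [
        name
        for name, level in other_levels.items()
        if percent_greater(ohc_level, level) >= min_percent
    ]
    return len(qualifying) >= min_cell_types


# Hypothetical normalized expression values, for illustration only.
measurements = {
    "inner hair cells": 4.0,
    "spiral ganglion neurons": 2.0,
    "Hensen's cells": 9.0,
}
print(is_ohc_specific(10.0, measurements))
```

With these made-up values, two of the three comparison cell types clear the 50% threshold, so the check passes; raising `min_cell_types` to 3 would make it fail.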

As used herein, “locally” or “local administration” means administration at a particular site of the body intended for a local effect and not a systemic effect. Examples of local administration are epicutaneous, inhalational, intra-articular, intrathecal, intravaginal, intravitreal, intrauterine, intra-lesional administration, lymph node administration, intratumoral administration, administration to the inner ear, and administration to a mucous membrane of the subject, wherein the administration is intended to have a local and not a systemic effect.

As used herein, the term “operably linked” refers to a first molecule joined to a second molecule, wherein the molecules are so arranged that the first molecule affects the function of the second molecule. The two molecules may or may not be part of a single contiguous molecule and may or may not be adjacent. For example, a promoter is operably linked to a transcribable polynucleotide molecule if the promoter modulates transcription of the transcribable polynucleotide molecule of interest in a cell. Additionally, two portions of a transcription regulatory element are operably linked to one another if they are joined such that the transcription-activating functionality of one portion is not adversely affected by the presence of the other portion. Two transcription regulatory elements may be operably linked to one another by way of a linker polynucleotide (e.g., an intervening non-coding polynucleotide) or may be operably linked to one another with no intervening nucleotides present.

As used herein, the term “plasmid” refers to an extrachromosomal circular double stranded DNA molecule into which additional DNA segments may be ligated. A plasmid is a type of vector, a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked. Certain plasmids are capable of autonomous replication in a host cell into which they are introduced (e.g., bacterial plasmids having a bacterial origin of replication and episomal mammalian plasmids). Other vectors (e.g., non-episomal mammalian vectors) can be integrated into the genome of a host cell upon introduction into the host cell, and thereby are replicated along with the host genome. Certain plasmids are capable of directing the expression of genes to which they are operably linked.

As used herein, the term “polynucleotide” refers to a polymer of nucleosides. Typically, a polynucleotide is composed of nucleosides that are naturally found in DNA or RNA (e.g., adenosine, thymidine, guanosine, cytidine, uridine, deoxyadenosine, deoxythymidine, deoxyguanosine, and deoxycytidine) joined by phosphodiester bonds. The term encompasses molecules comprising nucleosides or nucleoside analogs containing chemically or biologically modified bases, modified backbones, etc., whether or not found in naturally occurring nucleic acids, and such molecules may be preferred for certain applications. Where this application refers to a polynucleotide, it is understood that both DNA and RNA, and in each case both single- and double-stranded forms (and complements of each single-stranded molecule), are provided. “Polynucleotide sequence” as used herein can refer to the polynucleotide material itself and/or to the sequence information (i.e., the succession of letters used as abbreviations for bases) that biochemically characterizes a specific nucleic acid. A polynucleotide sequence presented herein is presented in a 5′ to 3′ direction unless otherwise indicated.

As used herein, the term “promoter” refers to a recognition site on DNA that is bound by an RNA polymerase. The polymerase drives transcription of the transgene. A representative promoter of the disclosure is the oncomodulin (OCM) promoter, such as an OCM promoter having a nucleic acid sequence of any one of SEQ ID NOs: 1-3 or a variant thereof having at least 85% (e.g., at least 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more) sequence identity to the nucleic acid sequence of any one of SEQ ID NOs: 1-3.

“Percent (%) sequence identity” with respect to a reference polynucleotide or polypeptide sequence is defined as the percentage of nucleic acids or amino acids in a candidate sequence that are identical to the nucleic acids or amino acids in the reference polynucleotide or polypeptide sequence, after aligning the sequences and introducing gaps, if necessary, to achieve the maximum percent sequence identity. Alignment for purposes of determining percent nucleic acid or amino acid sequence identity can be achieved in various ways that are within the capabilities of one of skill in the art, for example, using publicly available computer software such as BLAST, BLAST-2, or Megalign software. Those skilled in the art can determine appropriate parameters for aligning sequences, including any algorithms needed to achieve maximal alignment over the full length of the sequences being compared. For example, percent sequence identity values may be generated using the sequence comparison computer program BLAST. As an illustration, the percent sequence identity of a given nucleic acid or amino acid sequence, A, to, with, or against a given nucleic acid or amino acid sequence, B, (which can alternatively be phrased as a given nucleic acid or amino acid sequence, A that has a certain percent sequence identity to, with, or against a given nucleic acid or amino acid sequence, B) is calculated as follows:

$100 \times \frac{X}{Y}$

where X is the number of nucleotides or amino acids scored as identical matches by a sequence alignment program (e.g., BLAST) in that program's alignment of A and B, and where Y is the total number of nucleotides or amino acids in B. It will be appreciated that where the length of nucleic acid or amino acid sequence A is not equal to the length of nucleic acid or amino acid sequence B, the percent sequence identity of A to B will not equal the percent sequence identity of B to A.
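As an illustration of this calculation, the following minimal Python sketch computes percent sequence identity from two sequences that have already been aligned. It does not perform the alignment itself (a program such as BLAST would do that); the function name and the use of “-” as the gap character are assumptions made for the example.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of A to B: 100 * X / Y, where X is the number of
    identical aligned positions and Y is the number of residues in B."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # X: positions where both sequences carry the same (non-gap) residue.
    x = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap)
    # Y: total residues in B, ignoring gap characters added by the alignment.
    y = sum(1 for b in aligned_b if b != gap)
    if y == 0:
        raise ValueError("sequence B contains no residues")
    return 100.0 * x / y


# Hypothetical toy alignment: A is 4 residues long, B is 6 residues long.
a_to_b = percent_identity("ACGT--", "ACGTAC")
b_to_a = percent_identity("ACGTAC", "ACGT--")
print(round(a_to_b, 2), round(b_to_a, 2))
```

Because the denominator is the length of the second sequence, the two calls give different values for sequences of unequal length, which is the asymmetry noted above.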

As used herein, the term “pharmaceutical composition” refers to a mixture containing a therapeutic agent, optionally in combination with one or more pharmaceutically acceptable excipients, diluents, and/or carriers, to be administered to a subject, such as a mammal, e.g., a human, in order to prevent, treat or control a particular disease or condition affecting or that may affect the subject.

As used herein, the term “pharmaceutically acceptable” refers to those compounds, materials, compositions and/or dosage forms, which are suitable for contact with the tissues of a subject, such as a mammal (e.g., a human) without excessive toxicity, irritation, allergic response, or other problems or complications, commensurate with a reasonable benefit/risk ratio.

As used herein, the term “recombinogenic region” refers to a region of homology that mediates recombination between two different sequences.

As used herein, the term “regulatory sequence” includes promoters, enhancers, and other expression control elements (e.g., polyadenylation signals) that control the transcription or translation of the polynucleotides that encode STRC. Such regulatory sequences are described, for example, in Goeddel, Gene Expression Technology: Methods in Enzymology 185 (Academic Press, San Diego, CA, 1990); incorporated herein by reference.

As used herein, the term “sample” refers to a specimen (e.g., blood, blood component (e.g., serum or plasma), urine, saliva, amniotic fluid, cerebrospinal fluid, tissue (e.g., placental or dermal), pancreatic fluid, chorionic villus sample, and cells) isolated from a subject.

As used herein, the terms “stereocilin” and “STRC” (also known as DFNB16) refer to a protein encoded by the STRC gene and to the gene encoding this protein, respectively. In humans, STRC is tandemly duplicated, where the second copy contains a premature stop codon in exon 20, thereby producing an STRC pseudogene. In the context of the present disclosure, STRC does not refer to the STRC pseudogene. Previous studies have identified mutations in the full-length copy of STRC in human patients with autosomal recessive non-syndromic sensorineural hearing loss (Verpy et al., Nat. Genet. 29:345-9 (2001)). Stereocilin protein expression is limited to stereocilia in hair bundles of hair cells. Stereocilin is thought to form horizontal top connectors and tectorial membrane-attachment crowns, which are required for the normal functioning of the auditory apparatus (Avan et al., PNAS 116:25948-57 (2019); Verpy et al., J. Comp. Neurol. 519:194-210 (2011)). Mice lacking stereocilin have been shown to exhibit abnormal hair cell bundles with defective cohesion and impaired hearing (Verpy et al., Nature 456:255-8 (2008)). The present disclosure provides polynucleotides encoding the full-length stereocilin protein, which, when incorporated into the vector systems described herein, may be used as a therapeutic agent for the treatment of hearing loss (e.g., sensorineural hearing loss) in subjects in need thereof.
The terms “stereocilin” and “STRC” also refer to variants of wild-type stereocilin protein and nucleic acids encoding the same, respectively, such as variant proteins having at least 85% sequence identity (e.g., 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 99.9% identity, or more) to the amino acid sequence of a wild-type stereocilin protein (e.g., SEQ ID NO: 4 or 5) or polynucleotides having at least 85% sequence identity (e.g., 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 99.9% identity, or more) to the nucleic acid sequence of a wild-type STRC gene (e.g., SEQ ID NO: 6 or 7), provided that the STRC analog encoded retains the therapeutic function of wild-type STRC.

As used herein, the term “transcription regulatory element” refers to a polynucleotide that controls, at least in part, the transcription of a gene of interest. Transcription regulatory elements may include promoters, enhancers, and other polynucleotides (e.g., polyadenylation signals) that control or help to control gene transcription. Examples of transcription regulatory elements are described, for example, in Lorence, Recombinant Gene Expression: Reviews and Protocols (Humana Press, New York, NY, 2012).

As used herein, the terms “subject” and “patient” refer to an animal (e.g., a mammal, such as a human). A subject to be treated according to the methods described herein may be one who has been diagnosed with hearing loss (e.g., hearing loss associated with a mutation in STRC) or one at risk of developing this condition. Diagnosis may be performed by any method or technique known in the art. One skilled in the art will understand that a subject to be treated according to the present disclosure may have been subjected to standard tests or may have been identified, without examination, as one at risk due to the presence of one or more risk factors associated with the disease or condition.

As used herein, the terms “transduction” and “transduce” refer to a method of introducing a vector construct or a part thereof into a cell. Where the vector construct is contained in a viral vector, such as an AAV vector, transduction refers to viral infection of the cell and subsequent transfer and integration of the vector construct or part thereof into the cell genome.

As used herein, “treatment” and “treating” in reference to a disease or condition, refer to an approach for obtaining beneficial or desired results, e.g., clinical results. Beneficial or desired results can include, but are not limited to, alleviation or amelioration of one or more symptoms or conditions; diminishment of extent of disease or condition; stabilized (i.e., not worsening) state of disease, disorder, or condition; preventing spread of disease or condition; delay or slowing the progress of the disease or condition; amelioration or palliation of the disease or condition; and remission (whether partial or total), whether detectable or undetectable. “Ameliorating” or “palliating” a disease or condition means that the extent and/or undesirable clinical manifestations of the disease, disorder, or condition are lessened and/or time course of the progression is slowed or lengthened, as compared to the extent or time course in the absence of treatment. “Treatment” can also mean prolonging survival as compared to expected survival if not receiving treatment. Those in need of treatment include those already with the condition or disorder, as well as those prone to have the condition or disorder or those in which the condition or disorder is to be prevented.

As used herein, the term “vector” refers to a nucleic acid vector, e.g., a DNA vector, such as a plasmid, cosmid, or artificial chromosome, an RNA vector, a virus, or any other suitable replicon (e.g., viral vector). A variety of vectors have been developed for the delivery of polynucleotides encoding exogenous proteins into a prokaryotic or eukaryotic cell. Examples of such expression vectors are described in, e.g., Gellissen, Production of Recombinant Proteins: Novel Microbial and Eukaryotic Expression Systems (John Wiley & Sons, Marblehead, MA, 2006). Expression vectors suitable for use with the compositions and methods described herein contain a polynucleotide sequence as well as, e.g., additional sequence elements used for the expression of proteins and/or the integration of these polynucleotide sequences into the genome of a mammalian cell. Certain vectors that can be used for the expression of STRC as described herein include vectors that contain regulatory sequences, such as promoter and enhancer regions, which direct gene transcription. Other useful vectors for expression of STRC contain polynucleotide sequences that enhance the rate of translation of STRC or improve the stability or nuclear export of the mRNA that results from gene transcription. These sequence elements include, e.g., 5′ and 3′ untranslated regions and a polyadenylation signal site to direct efficient transcription of the gene carried on the expression vector. The expression vectors suitable for use with the compositions and methods described herein may also contain a polynucleotide encoding a marker for selection of cells that contain such a vector. Examples of a suitable marker include genes that encode resistance to antibiotics, such as ampicillin, chloramphenicol, kanamycin, or nourseothricin.

As used herein, the term “wild-type” refers to a genotype with the highest frequency for a particular gene in a given organism.

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1A-1B are a series of fluorescent images of mouse cochlea transduced with either an adeno-associated virus (AAV) vector expressing green fluorescent protein (GFP) under the control of the ubiquitous cytomegalovirus (CMV) promoter (FIG. 1A), or an AAV vector expressing GFP under control of an oncomodulin (OCM) promoter (SEQ ID NO: 1; FIG. 1B). Native GFP fluorescence is shown. Using a ubiquitous promoter, AAV-CMV-GFP induced GFP expression in many cell types within the cochlea including inner hair cells (IHCs), outer hair cells (OHCs), spiral ganglion neurons, mesenchymal cells, and glia (FIG. 1A). Using an OHC-specific promoter, AAV-OCM (SEQ ID NO: 1)-GFP induced GFP expression exclusively in OHCs (FIG. 1B).

FIGS. 2A and 2B are a series of micrographs of single paraffin sections from a cochlea basal turn of two non-human primates (Macaca fascicularis) administered an AAV vector expressing H2B-GFP under control of the OCM promoter of SEQ ID NO: 1. FIG. 2A is a micrograph of a single paraffin section from a first animal and FIG. 2B is a micrograph of a single paraffin section from a second animal. Panel A in FIG. 2A and panel B in FIG. 2B (upper images) show a grey scale conversion of the area around the organ of Corti with Hematoxylin-stained nuclei originally in blue and H2B-GFP antibody originally stained in red. Panel A′ in FIG. 2A and panel B′ in FIG. 2B (lower images) show the remaining signal after removing the signal for blue Hematoxylin; H2B-GFP positive (red) nuclei remain visible as darker colors after greyscale conversion. The scale bars represent 100 μm. Inner hair cells (IHCs) and outer hair cells (OHCs) are highlighted for orientation.

FIGS. 3A-3C are a series of fluorescent images of stereocilin expression in the mouse organ of Corti in a 200 μm² ROI at 16 kHz. FIG. 3A shows stereocilin antibody staining at the tips of the outer hair cell (OHC) stereocilia in a wild-type CBA/CaJ mouse. As shown in FIG. 3B, 232 bp STRC knockout (KO) animals lacked the signal for the antibody. FIG. 3C shows stereocilin antibody staining in a 232 bp STRC KO mouse administered dual Anc80 vectors, in which the first vector carried a CMV promoter and nucleotides 1-3200 of the murine STRC cDNA and the second vector carried nucleotides 2201-5430, creating a 1000 bp overlap between the two cDNA fragments in the two vectors. De-novo stereocilin protein expression could be observed at the tips of the OHC stereocilia and in the body of inner hair cells of the organ of Corti in treated 232 bp STRC KO mice.

FIGS. 4A-4C are a series of graphs showing improvement of auditory function in Anc80-CMV-mStrc treated 232 bp STRC KO mice and correlation to OHC STRC expression. Untreated contralateral ears showed near absent DPOAEs and highly elevated ABR thresholds indicative of loss of OHC function (FIGS. 4A-4B, open circles), while treated 232 bp STRC KO animals showed recovery of hearing thresholds (FIGS. 4A-4B, filled circles). The best responder of the treated animals (FIGS. 4A-4B, black squares) showed close to wild type (FIGS. 4A-4B, triangles) hearing thresholds. A high fraction of OHCs of 232 bp STRC KO mice expressing stereocilin after treatment with AAV-Anc80-CMV-mStrc was found to promote hearing recovery (FIG. 4C).

FIGS. 5A-5B are an image and a graph showing that transfection of HEK293T cells with a two-vector split intein system led to reconstitution of full-length stereocilin. FIG. 5A is a representative image of a Western blot against stereocilin protein and beta-actin. FIG. 5B is a densitometry quantification of full-length stereocilin band intensity relative to actin and indicates the relative expression of full-length stereocilin protein for the negative (GFP) control, positive full-length control, and the Npu intein construct.

FIGS. 6A-6B are maps of plasmids P959 and P724, respectively, used to create an overlapping dual vector system for expressing stereocilin (STRC) under the control of a murine OCM promoter.

FIGS. 7A-7B are maps of plasmids P960 and P726, respectively, used to create a dual hybrid vector system for expressing STRC under the control of a murine OCM promoter.

DETAILED DESCRIPTION

Described herein are compositions and methods for the treatment of sensorineural hearing loss in a subject (such as a mammalian subject, for instance, a human) by administering a first nucleic acid vector containing a promoter, such as an oncomodulin (OCM) promoter, and a polynucleotide encoding an N-terminal portion of a stereocilin (STRC) protein (e.g., wild-type (WT) STRC protein) and a second nucleic acid vector containing a polynucleotide encoding a C-terminal portion of a STRC protein and a polyadenylation (poly(A)) sequence. When introduced into a mammalian cell, such as a cochlear outer hair cell (OHC), the polynucleotides encoded by the two nucleic acid vectors can combine to form a polynucleotide that encodes the full-length STRC protein. The disclosure also features two-vector expression systems (e.g., overlapping dual vectors, trans-splicing vectors, dual hybrid vectors, and split intein trans-splicing vectors) containing the aforementioned polynucleotides. The compositions and methods described herein can be used to express polynucleotides encoding STRC specifically in OHCs, and, therefore, the compositions described herein can be administered to a subject (such as a mammalian subject, for instance, a human) to treat disorders caused by dysfunction of OHCs, such as hearing loss (e.g., sensorineural hearing loss) and auditory neuropathy.

Stereocilin

Stereocilin (also known as DFNB16) is a protein encoded by the STRC gene on chromosome 15q15, which contains 29 exons spanning approximately 19 kb of the genome. The STRC gene is tandemly duplicated, where the second copy contains a premature stop codon in exon 20, thereby producing an STRC pseudogene. Previous studies have identified two frameshift mutations and a large deletion in the full-length copy of STRC in two families with autosomal recessive non-syndromic sensorineural hearing loss (Verpy et al., Nat. Genet. 29:345-9 (2001)). Stereocilin protein expression is limited to stereocilia in hair bundles of inner ear hair cells and is thought to form horizontal top connectors and tectorial membrane-attachment crowns, which are required for the normal functioning of the auditory apparatus (Avan et al., PNAS 116:25948-57 (2019); Verpy et al., J. Comp. Neurol. 519:194-210 (2011)). Mice lacking stereocilin have been shown to exhibit abnormal hair cell bundles with defective cohesion and impaired hearing (Verpy et al., Nature 456:255-8 (2008)).

The compositions and methods described herein can be used to treat sensorineural hearing loss by administering a first nucleic acid vector containing a polynucleotide encoding an N-terminal portion of a stereocilin protein and a second nucleic acid vector containing a polynucleotide encoding a C-terminal portion of a stereocilin protein. The full-length STRC coding sequence is too large to include in the type of vector that is commonly used for gene therapy (e.g., an adeno-associated virus (AAV) vector, which is thought to have a packaging limit of 5 kb). The compositions and methods described herein overcome this problem by dividing the STRC coding sequence between two different nucleic acid vectors such that the full-length STRC sequence can be reconstituted in a cell. These compositions and methods can be used to treat subjects having one or more mutations in the STRC gene, e.g., an STRC mutation that reduces STRC expression, reduces STRC function, or is associated with hearing loss (e.g., a subject having DFNB16). When the first and second nucleic acid vectors are administered in a composition, the polynucleotides encoding the N-terminal and C-terminal portions of stereocilin can combine within a cell (e.g., a human cell, e.g., a cochlear hair cell) to form a single polynucleotide that contains the full-length STRC coding sequence (e.g., through homologous recombination and/or splicing).
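The arithmetic behind dividing a coding sequence between two overlapping vectors can be sketched as follows. The coordinates are those of the murine STRC cDNA example given in the description of FIGS. 3A-3C (nucleotides 1-3200 and 2201-5430); the function names are hypothetical, and the size check is a simplification that compares only the cDNA fragment lengths to the approximate 5 kb limit, ignoring the promoter, poly(A) sequence, inverted terminal repeats, and other elements that would also occupy space in a real vector.

```python
AAV_PACKAGING_LIMIT_BP = 5000  # approximate limit stated in the text (5 kb)


def fragment_length(fragment):
    """Length of a fragment given as 1-based inclusive (start, end) coordinates."""
    start, end = fragment
    return end - start + 1


def split_with_overlap(cdna_length, first_end, second_start):
    """Split a cDNA into a 5' fragment (1..first_end) and a 3' fragment
    (second_start..cdna_length) and report the shared overlap in bp."""
    if not 1 <= second_start <= first_end <= cdna_length:
        raise ValueError("fragments must overlap and lie within the cDNA")
    first = (1, first_end)
    second = (second_start, cdna_length)
    overlap = first_end - second_start + 1
    return first, second, overlap


# Coordinates from the murine example: 5430 nt cDNA, split at 3200 / 2201.
first, second, overlap = split_with_overlap(5430, 3200, 2201)
print("overlap (bp):", overlap)
print("fragment lengths (nt):", fragment_length(first), fragment_length(second))
print("full cDNA exceeds limit:", 5430 > AAV_PACKAGING_LIMIT_BP)
print(
    "each fragment under limit:",
    all(fragment_length(f) < AAV_PACKAGING_LIMIT_BP for f in (first, second)),
)
```

For these coordinates the overlap is 1000 bp and the fragments are 3200 and 3230 nucleotides long, so each fragment falls under the approximate limit even though the full-length cDNA does not.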

The nucleic acid vectors used in the compositions and methods described herein include polynucleotide sequences that encode wild-type stereocilin, or a variant thereof, such as polynucleotide sequences that, when combined, encode a protein having at least 85% sequence identity (e.g., 85%, 90%, 95%, 96%, 97%, 98%, 99%, or more, sequence identity) to the amino acid sequence of wild-type mammalian (e.g., human or mouse) stereocilin. The polynucleotides used in the nucleic acid vectors described herein encode an N-terminal portion and a C-terminal portion of a stereocilin amino acid sequence in Table 2 below (e.g., two portions that, when combined, encode a full-length stereocilin amino acid sequence listed in Table 2, e.g., SEQ ID NO: 4 or SEQ ID NO: 5).

According to the methods described herein, a subject can be administered a composition containing a first nucleic acid vector and a second nucleic acid vector that contain an N-terminal and C-terminal portion, respectively, of a polynucleotide sequence encoding the amino acid sequence of SEQ ID NO: 4 or SEQ ID NO: 5, or a polynucleotide sequence encoding an amino acid sequence having at least 85% sequence identity (e.g., 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more, sequence identity) to the amino acid sequence of SEQ ID NO: 4 or SEQ ID NO: 5, or a polynucleotide sequence encoding an amino acid sequence that contains one or more conservative amino acid substitutions relative to SEQ ID NO: 4 or SEQ ID NO: 5 (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more conservative amino acid substitutions), provided that the stereocilin analog encoded retains the therapeutic function of wild-type STRC. In some embodiments, no more than 10% of the amino acids in the N-terminal portion of the stereocilin protein and no more than 10% of the amino acids in the C-terminal portion of the stereocilin protein may be replaced with conservative amino acid substitutions. The stereocilin protein may be encoded by a polynucleotide having the sequence of SEQ ID NO: 6 or SEQ ID NO: 7. The stereocilin protein may also be encoded by a polynucleotide having single nucleotide variants (SNVs) that have been found to be non-pathogenic in human subjects. The stereocilin protein may be a human stereocilin protein or may be a homolog of the human stereocilin protein from another mammalian species (e.g., mouse, rat, cow, horse, goat, sheep, donkey, cat, dog, rabbit, guinea pig, or other mammal).

TABLE 2

STRC Sequences

SEQ ID NO.: 4
Sequence Name: Wild-type human stereocilin protein, UniProt ID: Q7RTU9
Sequence:

MALSLWPLLLLLLLLLLLSFAVTLAPTGPHSLDPGLSFLKSLLSTLDQ

APQGSLSRSRFFTFLANISSSFEPGRMGEGPVGEPPPLQPPALRLH

DFLVTLRGSPDWEPMLGLLGDMLALLGQEQTPRDFLVHQAGVLGG

LVEVLLGALVPGGPPTPTRPPCTRDGPSDCVLAADWLPSLLLLLEG

TRWQALVQVQPSVDPTNATGLDGREAAPHFLQGLLGLLTPTGELG

SKEALWGGLLRTVGAPLYAAFQEGLLRVTHSLQDEVFSILGQPEPD

TNGQCQGGNLQQLLLWGVRHNLSWDVQALGFLSGSPPPPPALLH

CLSTGVPLPRASQPSAHISPRQRRAITVEALCENHLGPAPPYSISNF

SIHLLCQHTKPATPQPHPSTTAICQTAVWYAVSWAPGAQGWLQAC

HDQFPDEFLDAICSNLSFSALSGSNRRLVKRLCAGLLPPPTSCPEG

LPPVPLTPDIFWGCFLENETLWAERLCGEASLQAVPPSNQAWVQH

VCQGPTPDVTASPPCHIGPCGERCPDGGSFLVMVCANDTMYEVLV

PFWPWLAGQCRISRGGNDTCFLEGLLGPLLPSLPPLGPSPLCLTPG

PFLLGMLSQLPRCQSSVPALAHPTRLHYLLRLLTFLLGPGAGGAEA

QGMLGRALLLSSLPDNCSFWDAFRPEGRRSVLRTIGEYLEQDEEQ

PTPSGFEPTVNPSSGISKMELLACFSPVLWDLLQREKSVWALQILV

QAYLHMPPENLQQLVLSAEREAAQGFLTLMLQGKLQGKLQVPPSE

EQALGRLTALLLQRYPRLTSQLFIDLSPLIPFLAVSDLMRFPPSLLAN

DSVLAAIRDYSPGMRPEQKEALAKRLLAPELFGEVPAWPQELLWA

VLPLLPHLPLENFLQLSPHQIQALEDSWPAAGLGPGHARHVLRSLV

NQSVQDGEEQVRRLGPLACFLSPEELQSLVPLSDPTGPVERGLLE

CAANGTLSPEGRVAYELLGVLRSSGGAVLSPRELRVWAPLFSQLG

LRFLQELSEPQLRAMLPVLQGTSVTPAQAVLLLGRLLPRHDLSLEEL

CSLHLLLPGLSPQTLQAIPRRVLVGACSCLAPELSRLSACQTAALLQ

TFRVKDGVKNMGTTGAGPAVCIPGQPIPTTWPDCLLPLLPLKLLQL

DSLALLANRRRYWELPWSEQQAQFLWKKMQVPTNLTLRNLQALG

TLAGGMSCEFLQQINSMVDFLEVVHMIYQLPTRVRGSLRACIWAEL

QRRMAMPEPEWTTVGPELNGLDSKLLLDLPIQLMDRLSNESIMLVV

ELVQRAPEQLLALTPLHQAALAERALQNLAPKETPVSGEVLETLGP

LVGFLGTESTRQIPLQILLSHLSQLQGFCLGETFATELGWLLLQESV

LGKPELWSQDEVEQAGRLVFTLSTEAISLIPREALGPETLERLLEKQ

QSWEQSRVGQLCREPQLAAKKAALVAGVVRPAAEDLPEPVPNCA

DVRGTFPAAWSATQIAEMELSDFEDCLTLFAGDPGLGPEELRAAM

GKAKQLWGPPRGFRPEQILQLGRLLIGLGDRELQELILVDWGVLST

LGQIDGWSTTQLRIVVSSFLRQSGRHVSHLDFVHLTALGYTLCGLR

PEELQHISSWEFSQAALFLGTLHLQCSEEQLEVLAHLLVLPGGFGPI

SNWGPEIFTEIGTIAAGIPDLALSALLRGQIQGVTPLAISVIPPPKFAV

VFSPIQLSSLTSAQAVAVTPEQMAFLSPEQRRAVAWAQHEGKESP

EQQGRSTAWGLQDWSRPSWSLVLTISFLGHLL

5
Murine stereocilin protein (NP_536707.2)
MALSLQPQLLLLLSLLPQEVTSAPTGPQSLDAGLSLLKSFVATLDQA

PQRSLSQSRFSAFLANISSSFQLGRMGEGPVGEPPPLQPPALRLH

DFLVTLRGSPDWEPMLGLLGDVLALLGQEQTPRDFLVHQAGVLGG

LVEALLGALVPGGPPAPTRPPCTRDGPSDCVLAADWLPSLMLLLEG

TRWQALVQLQPSVDPTNATGLDGREPAPHFLQGLLGLLTPAGELG

SEEALWGGLLRTVGAPLYAAFQEGLLRVTHSLQDEVFSIMGQPEP

DASGQCQGGNLQQLLLWGMRNNLSWDARALGFLSGSPPPPPALL

HCLSRGVPLPRASQPAAHISPRQRRAISVEALCENHSGPEPPYSIS

NFSIYLLCQHIKPATPRPPPTTPRPPPTTPQPPPTTTQPIPDTTQPPP

VTPRPPPTTPQPPPSTAVICQTAVWYAVSWAPGARGWLQACHDQ

FPDQFLDMICGNLSFSALSGPSRPLVKQLCAGLLPPPTSCPPGLIPV

PLTPEIFWGCFLENETLWAERLCVEDSLQAVPPRNQAWVQHVCRG

PTLDATDFPPCRVGPCGERCPDGGSFLLMVCANDTLYEALVPFWA

WLAGQCRISRGGNDTCFLEGMLGPLLPSLPPLGPSPLCLAPGPFLL

GMLSQLPRCQSSVPALAHPTRLHYLLRLLTFLLGPGTGGAETQGML

GQALLLSSLPDNCSFWDAFRPEGRRSVLRTVGEYLQREEPTPPGL

DSSLSLGSGMSKMELLSCFSPVLWDLLQREKSVWALRTLVKAYLR

MPPEDLQQLVLSAEMEAAQGFLTLMLRSWAKLKVQPSEEQAMGR

LTALLLQRYPRLTSQLFIDMSPLIPFLAVPDLMRFPPSLLANDSVLAAI

RDHSSGMKPEQKEALAKRLLAPELFGEVPDWPQELLWAALPLLPH

LPLESFLQLSPHQIQALEDSWPVADLGPGHARHVLRSLVNQSMED

GEEQVLRLGSLACFLSPEELQSLVPLSDPMGPVEQGLLECAANGTL

SPEGRVAYELLGVLRSSGGTVLSPRELRVWAPLFPQLGLRFLQELS

ETQLRAMLPALQGASVTPAQAVLLFGRLLPKHDLSLEELCSLHPLLP

GLSPQTLQAIPKRVLVGACSCLGPELSRLSACQIAALLQTFRVKDGV

KNMGAAGAGSAVCIPGQPTTWPDCLLPLLPLKLLQLDAAALLANRR

LYRQLPWSEQQAQFLWKKMQVPTNLSLRNLQALGNLAGGMTCEF

LQQISSMVDFLDVVHMLYQLPTGVRESLRACIWTELQRRMTMPEP

ELTTLGPELSELDTKLLLDLPIQLMDRLSNDSIMLVVEMVQGAPEQL

LALTPLHQTALAERALKNLAPKETPISKEVLETLGPLVGFLGIESTRRI

PLPILLSHLSQLQGFCLGETFATELGWLLLQEPVLGKPELWSQDEIE

QAGRLVFTLSAEAISSIPREALGPETLERLLGKHQSWEQSRVGHLC

GESQLAHKKAALVAGIVHPAAEGLQEPVPNCADIRGTFPAAWSATQ

ISEMELSDFEDCLSLFAGDPGLGPEELRAAMGKAKQLWGPPRGFR

PEQILQLGRLLIGLGERELQELTLVDWGVLSSLGQIDGWSSMQLRA

VVSSFLRQSGRHVSHLDFIYLTALGYTVCGLRPEELQHISSWEFSQ

AALFLGSLHLPCSEEQLEVLAYLLVLPGGFGPVSNWGPEIFTEIGTIA

AGIPDLALSALLRGQIQGLTPLAISVIPAPKFAVVENPIQLSSLTRGQA

VAVTPEQLAYLSPEQRRAVAWAQHEGKEIPEQLGRNSAWGLYDW

FQASWALALPVSIFGHLL

6
Polynucleotide encoding full-length WT human stereocilin (from NM_153700.2), encodes the protein of SEQ ID NO: 4 (includes stop codon)
ATGGCTCTCAGCCTCTGGCCCCTGCTGCTGCTGCTGCTGCTGC

TGCTGCTGCTGTCCTTTGCAGTGACTCTGGCCCCTACTGGGCCT

CATTCCCTGGACCCTGGTCTCTCCTTCCTGAAGTCATTGCTCTC

CACTCTGGACCAGGCTCCCCAGGGCTCCCTGAGCCGCTCACGG

TTCTTTACATTCCTGGCCAACATTTCTTCTTCCTTTGAGCCTGGG

AGAATGGGGGAAGGACCAGTAGGAGAGCCCCCACCTCTCCAGC

CGCCTGCTCTGCGGCTCCATGATTTTCTAGTGACACTGAGAGGT

AGCCCCGACTGGGAGCCAATGCTAGGGCTGCTAGGGGATATGC

TGGCACTGCTGGGACAGGAGCAGACTCCCCGAGATTTCCTGGT

GCACCAGGCAGGGGTGCTGGGTGGACTTGTGGAGGTGCTGCT

GGGAGCCTTAGTTCCTGGGGGCCCCCCTACCCCAACTCGGCCC

CCATGCACCCGTGATGGGCCGTCTGACTGTGTCCTGGCTGCTG

ACTGGTTGCCTTCTCTGCTGCTGTTGTTAGAGGGCACACGCTGG

CAAGCTCTGGTGCAGGTGCAGCCCAGTGTGGACCCCACCAATG

CCACAGGCCTCGATGGGAGGGAGGCAGCTCCTCACTTTTTGCA

GGGTCTGTTGGGTTTGCTTACCCCAACAGGGGAGCTAGGCTCC

AAGGAGGCTCTTTGGGGCGGTCTGCTACGCACAGTGGGGGCCC

CCCTCTATGCTGCCTTTCAGGAGGGGCTGCTCCGTGTCACTCAC

TCCCTGCAGGATGAGGTCTTCTCCATTTTGGGGCAGCCAGAGC

CTGATACCAATGGGCAGTGCCAGGGAGGTAACCTTCAACAGCT

GCTCTTATGGGGCGTCCGGCACAACCTTTCCTGGGATGTCCAG

GCGCTGGGCTTTCTGTCTGGATCACCACCCCCACCCCCTGCCC

TCCTTCACTGCCTGAGCACGGGCGTGCCTCTGCCCAGAGCTTC

TCAGCCGTCAGCCCACATCAGCCCACGCCAACGGCGAGCCATC

ACTGTGGAGGCCCTCTGTGAGAACCACTTAGGCCCAGCACCAC

CCTACAGCATTTCCAACTTCTCCATCCACTTGCTCTGCCAGCACA

CCAAGCCTGCCACTCCACAGCCCCATCCCAGCACCACTGCCAT

CTGCCAGACAGCTGTGTGGTATGCAGTGTCCTGGGCACCAGGT

GCCCAAGGCTGGCTACAGGCCTGCCACGACCAGTTTCCTGATG

AGTTTTTGGATGCGATCTGCAGTAACCTCTCCTTTTCAGCCCTGT

CTGGCTCCAACCGCCGCCTGGTGAAGCGGCTCTGTGCTGGCCT

GCTCCCACCCCCTACCAGCTGCCCTGAAGGCCTGCCCCCTGTT

CCCCTCACCCCAGACATCTTTTGGGGCTGCTTCTTGGAGAATGA

GACTCTGTGGGCTGAGCGACTGTGTGGGGAGGCAAGTCTACAG

GCTGTGCCCCCCAGCAACCAGGCTTGGGTCCAGCATGTGTGCC

AGGGCCCCACCCCAGATGTCACTGCCTCCCCACCATGCCACAT

TGGACCCTGTGGGGAACGCTGCCQGGATGGGGGCAGCTTCCT

GGTGATGGTCTGTGCCAATGACACCATGTATGAGGTCCTGGTGC

CCTTCTGGCCTTGGCTAGCAGGCCAATGCAGGATAAGTCGTGG

GGGCAATGACACTTGCTTCCTAGAAGGGCTGCTGGGCCCCCTT

CTGCCCTCTCTGCCACCACTGGGACCATCCCCACTCTGTCTGAC

CCCTGGCCCCTTCCTCCTTGGCATGCTATCCCAGTTGCCACGCT

GTCAGTCCTCTGTCCCAGCTCTTGCTCACCCCACACGCCTACAC

TATCTCCTCCGCCTGCTGACCTTCCTCTTGGGTCCAGGGGCTGG

GGGCGCTGAGGCCCAGGGGATGCTGGGTCGGGCCCTACTGCT

CTCCAGTCTCCCAGACAACTGCTCCTTCTGGGATGCCTTTCGCC

CAGAGGGCCGGCGCAGTGTGCTACGGACGATTGGGGAATACCT

GGAACAAGATGAGGAGCAGCCAACCCCATCAGGCTTTGAACCC

ACTGTCAACCCCAGCTCTGGTATAAGCAAGATGGAGCTGCTGGC

CTGCTTTAGTCCTGTGCTGTGGGATCTGCTCCAGAGGGAAAAGA

GTGTTTGGGCCCTGCAGATTCTAGTGCAGGCGTACCTGCATATG

CCCCCAGAAAACCTCCAGCAGCTGGTGCTTTCAGCAGAGAGGG

AGGCTGCACAGGGCTTCCTGACACTCATGCTGCAGGGGAAGCT

GCAGGGGAAGCTGCAGGTACCACCATCCGAGGAGCAGGCCCT

GGGTCGCCTGACAGCCCTGCTGCTCCAGCGGTACCCACGCCTC

ACCTCCCAGCTCTTCATTGACCTGTCACCACTCATCCCTTTCTTG

GCTGTCTCTGACCTGATGCGCTTCCCACCATCCCTGTTAGCCAA

CGACAGTGTCCTGGCTGCCATCCGGGATTACAGCCCAGGAATG

AGGCCTGAACAGAAGGAGGCTCTGGCAAAGCGACTGCTGGCCC

CTGAACTGTTTGGGGAAGTGCCTGCCTGGCCCCAGGAGCTGCT

GTGGGCAGTGCTGCCCCTGCTCCCCCACCTCCCTCTGGAGAAC

TTTTTGCAGCTCAGCCCTCACCAGATCCAGGCCCTGGAGGATAG

CTGGCCAGCAGCAGGTCTGGGGCCAGGGCATGCCCGCCATGT

GCTGCGCAGCCTGGTAAACCAGAGTGTCCAGGATGGTGAGGAG

CAGGTACGCAGGCTTGGGCCCCTCGCCTGTTTCCTGAGCCCTG

AGGAGCTGCAGAGCCTAGTGCCCCTGAGTGATCCAACGGGGCC

AGTAGAACGGGGGCTGCTGGAATGTGCAGCCAATGGGACCCTC

AGCCCAGAAGGACGGGTGGCATATGAACTTCTGGGTGTGTTGC

GCTCATCTGGAGGAGCGGTGCTGAGCCCCCGGGAGCTGCGGG

TCTGGGCCCCTCTCTTCTCTCAGCTGGGCCTCCGCTTCCTTCAG

GAGCTGTCAGAGCCCCAGCTTAGAGCCATGCTTCCTGTCCTGCA

GGGAACTAGTGTTACACCTGCTCAGGCTGTCCTGCTGCTTGGAC

GGCTCCTTCCTAGGCACGATCTATCCCTGGAGGAACTCTGCTCC

TTGCACCTTCTGCTACCAGGCCTCAGCCCCCAGACACTCCAGG

CCATCCCTAGGCGAGTCCTGGTCGGGGCTTGTTCCTGCCTGGC

CCCTGAACTGTCACGCCTCTCAGCCTGCCAGACCGCAGCACTG

CTGCAGACCTTTCGGGTTAAAGATGGTGTTAAAAATATGGGTAC

AACAGGTGCTGGTCCAGCTGTGTGTATCCCTGGTCAGCCTATTC

CCACCACCTGGCCAGACTGCCTGCTTCCCCTGCTCCCATTAAAG

CTGCTACAACTGGATTCCTTGGCTCTTCTGGCAAATCGAAGACG

CTACTGGGAGCTGCCCTGGTCTGAGCAGCAGGCACAGTTTCTC

TGGAAGAAGATGCAAGTACCCACCAACCTTACCCTCAGGAATCT

GCAGGCTCTGGGCACCCTGGCAGGAGGCATGTCCTGTGAGTTT

CTGCAGCAGATCAACTCCATGGTAGACTTCCTTGAAGTGGTGCA

CATGATCTATCAGCTGCCCACTAGAGTTCGAGGGAGCCTGAGG

GCCTGTATCTGGGCAGAGCTACAGCGGAGGATGGCAATGCCAG

AACCAGAATGGACAACTGTAGGGCCAGAACTGAACGGGCTGGA

TAGCAAGCTACTCCTGGACTTACCGATCCAGTTGATGGACAGAC

TATCCAATGAATCCATTATGTTGGTGGTGGAGCTGGTGCAAAGA

GCTCCAGAGCAGCTGCTGGCACTGACCCCCCTCCACCAGGCAG

CCCTGGCAGAGAGGGCACTACAAAACCTGGCTCCAAAGGAGAC

TCCAGTCTCAGGGGAAGTGCTGGAGACCTTAGGCCCTTTGGTT

GGATTCCTGGGGACAGAGAGCACACGACAGATCCCCCTACAGA

TCCTGCTGTCCCATCTCAGTCAGCTGCAAGGCTTCTGCCTAGGA

GAGACATTTGCCACAGAGCTGGGATGGCTGCTATTGCAGGAGT

CTGTTCTTGGGAAACCAGAGTTGTGGAGCCAGGATGAAGTAGA

GCAAGCTGGACGCCTAGTATTCACTCTGTCTACTGAGGCAATTT

CCTTGATCCCCAGGGAGGCCTTGGGTCCAGAGACCCTGGAGCG

GCTTCTAGAAAAGCAGCAGAGCTGGGAGCAGAGCAGAGTTGGA

CAGCTGTGTAGGGAGCCACAGCTTGCTGCCAAGAAAGCAGCCC

TGGTAGCAGGGGTGGTGCGACCAGCTGCTGAGGATCTTCCAGA

ACCTGTGCCAAATTGTGCAGATGTACGAGGGACATTCCCAGCAG

CCTGGTCTGCAACCCAGATTGCAGAGATGGAGCTCTCAGACTTT

GAGGACTGCCTGACATTATTTGCAGGAGACCCAGGACTTGGGC

CTGAGGAACTGCGGGCAGCCATGGGCAAAGCAAAACAGTTGTG

GGGTCCCCCCCGGGGATTTCGTCCTGAGCAGATCCTGCAGCTT

GGTAGGCTCTTAATAGGTCTAGGAGATCGGGAACTACAGGAGCT

GATCCTAGTGGACTGGGGAGTGCTGAGCACCCTGGGGCAGATA

GATGGCTGGAGCACCACTCAGCTCCGCATTGTGGTCTCCAGTTT

CCTACGGCAGAGTGGTCGGCATGTGAGCCACCTGGACTTCGTT

CATCTGACAGCGCTGGGTTATACTCTCTGTGGACTGCGGCCAGA

GGAGCTCCAGCACATCAGCAGTTGGGAGTTCAGCCAAGCAGCT

CTCTTCCTCGGCACCCTGCATCTCCAGTGCTCTGAGGAACAACT

GGAGGTTCTGGCCCACCTACTTGTACTGCCTGGTGGGTTTGGC

CCAATCAGTAACTGGGGGCCTGAGATCTTCACTGAAATTGGCAC

CATAGCAGCTGGGATCCCAGACCTGGCTCTTTCAGCACTGCTGC

GGGGACAGATCCAGGGCGTTACTCCTCTTGCCATTTCTGTCATC

CCTCCTCCTAAATTTGCTGTGGTGTTTAGTCCCATCCAACTATCT

AGTCTCACCAGTGCTCAGGCTGTGGCTGTCACTCCTGAGCAAAT

GGCCTTTCTGAGTCCTGAGCAGCGACGAGCAGTTGCATGGGCC

CAACATGAGGGAAAGGAGAGCCCAGAACAGCAAGGTCGAAGTA

CAGCCTGGGGCCTCCAGGACTGGTCACGACCTTCCTGGTCCCT

GGTATTGACTATCAGCTTCCTTGGCCACCTGCTATGA

7
Polynucleotide encoding full-length, murine wild-type stereocilin (from NM_080459.2), encodes the protein of SEQ ID NO: 5 (includes stop codon)
ATGGCTCTGAGCCTCCAGCCCCAGCTGCTCCTTCTCCTGTCGCT

CCTGCCGCAGGAAGTGACTTCAGCCCCTACTGGGCCTCAGTCT

TTGGATGCTGGTCTCTCCCTTCTGAAGTCATTCGTAGCCACTCT

GGACCAAGCTCCTCAGCGTTCCCTCAGCCAGTCACGGTTCTCTG

CGTTCCTGGCCAACATTTCTTCATCCTTCCAGCTTGGGAGGATG

GGGGAGGGACCGGTGGGAGAGCCCCCACCTCTCCAGCCCCCT

GCACTTCGACTTCATGATTTCCTCGTGACACTGAGAGGTAGCCC

AGACTGGGAGCCAATGCTAGGGCTTCTGGGAGATGTGCTGGCA

CTCCTGGGACAGGAACAGACTCCCCGGGACTTTTTGGTGCACC

AGGCAGGTGTACTGGGTGGACTTGTAGAGGCATTGTTGGGAGC

GTTAGTTCCTGGAGGCCCCCCTGCCCCCACTCGACCCCCATGC

ACCCGTGATGGCCCTTCTGACTGTGTCCTGGCTGCTGATTGGTT

GCCTTCTCTGATGTTGTTATTAGAGGGTACACGCTGGCAGGCCC

TGGTGCAGTTGCAGCCCAGTGTGGACCCAACCAATGCCACAGG

TCTTGATGGTAGAGAGCCAGCTCCTCACTTTTTACAGGGTCTGC

TGGGCTTGCTTACCCCAGCAGGAGAGTTGGGCTCTGAGGAGGC

TCTTTGGGGTGGTCTGCTGCGCACAGTGGGGGCCCCCCTCTAT

GCTGCCTTCCAGGAGGGGCTACTGCGAGTCACTCATTCTCTGCA

AGATGAGGTCTTTTCTATTATGGGACAGCCAGAGCCTGATGCCA

GTGGGCAGTGCCAGGGAGGCAACCTTCAACAGCTGCTTTTATG

GGGCATGCGGAACAACCTTTCTTGGGACGCCCGAGCACTGGGT

TTTCTATCTGGATCACCACCTCCACCCCCTGCTCTCCTGCACTG

CCTGAGCAGAGGTGTGCCTCTGCCCAGGGCTTCCCAGCCTGCG

GCTCACATCAGCCCTCGACAGCGGCGAGCCATCTCTGTGGAGG

CCCTCTGCGAGAACCACTCAGGCCCAGAGCCACCCTACAGCAT

CTCCAACTTCTCCATCTACTTGCTCTGCCAGCACATCAAGCCTG

CCACCCCGCGGCCCCCTCCTACCACCCCACGGCCTCCTCCTAC

CACCCCACAGCCCCCTCCTACCACTACACAGCCCATTCCTGACA

CTACACAGCCCCCTCCTGTCACCCCAAGGCCTCCTCCTACCACC

CCACAACCCCCTCCTAGCACAGCTGTCATCTGCCAGACAGCTGT

ATGGTACGCAGTCTCGTGGGCACCAGGTGCCCGAGGTTGGCTC

CAAGCCTGCCATGATCAGTTTCCTGATCAATTTCTGGATATGATC

TGCGGCAACCTCTCATTTTCAGCCCTGTCTGGCCCCAGTCGTCC

TTTGGTAAAGCAGCTCTGTGCTGGCTTGCTCCCACCCCCCACTA

GCTGTCCACCAGGCCTGATCCCTGTGCCCCTCACCCCAGAAATA

TTCTGGGGCTGTTTCCTGGAGAATGAGACACTGTGGGCTGAAC

GGTTGTGTGTGGAGGACAGTCTGCAGGCTGTGCCCCCGAGGAA

CCAGGCTTGGGTTCAGCATGTGTGTCGGGGCCCCACCTTGGAC

GCCACTGATTTTCCACCGTGCCGCGTTGGACCCTGTGGGGAAC

GCTGCCCAGATGGGGGCAGCTTCCTGCTCATGGTCTGTGCCAA

TGACACTCTGTATGAAGCCTTGGTTCCCTTCTGGGCTTGGCTAG

CAGGCCAATGCAGAATTAGTCGTGGAGGAAATGATACTTGCTTT

CTAGAAGGCATGCTGGGCCCCTTGTTGCCCTCTCTGCCCCCTCT

GGGACCATCCCCACTCTGTCTGGCTCCTGGTCCTTTTCTGCTTG

GCATGTTATCCCAGTTGCCACGCTGTCAGTCCTCCGTGCCAGCC

CTCGCCCACCCCACGCGCCTACATTACCTCCTGCGCCTACTGAC

CTTCCTTCTGGGTCCAGGGACTGGGGGTGCCGAGACGCAGGG

GATGTTAGGTCAAGCCCTGCTGCTCTCTAGTCTCCCAGACAACT

GTTCATTCTGGGATGCCTTCCGCCCAGAGGGCCGGAGAAGTGT

ACTGAGGACAGTCGGAGAGTACTTGCAGCGGGAAGAGCCAACC

CCACCAGGCTTAGACTCCTCCCTCAGCCTCGGCTCTGGTATGAG

CAAGATGGAGCTTCTGTCCTGCTTCAGTCCTGTACTGTGGGATC

TACTCCAGAGAGAGAAGAGCGTTTGGGCCCTGAGGACCCTGGT

GAAGGCCTACCTGCGCATGCCTCCAGAAGACCTTCAGCAGCTT

GTGCTTTCAGCAGAGATGGAGGCTGCACAGGGCTTCCTGACGC

TCATGCTTCGTTCCTGGGCTAAGCTGAAGGTTCAACCATCCGAG

GAGCAGGCCATGGGCCGCCTGACAGCCTTGCTGCTCCAGCGGT

ACCCACGCCTCACCTCCCAACTCTTTATCGACATGTCACCGCTC

ATCCCCTTCCTGGCTGTCCCTGACCTCATGCGCTTCCCACCGTC

CCTTTTGGCCAACGACAGTGTCCTGGCTGCCATCAGGGATCACA

GCTCAGGAATGAAGCCTGAACAGAAGGAGGCCCTGGCAAAACG

ACTGCTGGCCCCTGAGCTGTTTGGAGAAGTGCCTGATTGGCCC

CAGGAGCTGCTGTGGGCAGCCCTGCCTCTGCTTCCCCATCTGC

CTCTGGAGAGCTTTCTCCAGCTCAGCCCTCACCAGATCCAGGCC

CTGGAGGATAGCTGGCCAGTAGCAGATCTTGGGCCGGGACACG

CCCGACATGTGCTTCGTAGCCTAGTAAACCAGAGCATGGAGGA

TGGGGAGGAGCAGGTGCTCAGGCTTGGGTCCCTCGCCTGTTTC

CTGAGTCCTGAGGAGCTACAGAGTCTGGTGCCCTTGAGTGATC

CAATGGGGCCTGTAGAACAGGGTCTGCTGGAATGTGCGGCCAA

TGGGACCCTCAGCCCAGAAGGACGGGTGGCATATGAACTTCTG

GGAGTGTTGCGTTCATCTGGAGGAACTGTCTTAAGCCCCCGAGA

GCTGAGGGTCTGGGCACCTCTCTTTCCCCAGCTGGGCCTCCGC

TTCCTGCAGGAGCTCTCAGAGACCCAGCTTAGAGCCATGCTTCC

TGCCCTACAGGGAGCCAGTGTCACACCTGCCCAGGCTGTTCTG

TTGTTTGGAAGGCTCCTTCCTAAGCATGATCTGTCCCTGGAGGA

ACTCTGCTCCCTGCACCCTCTCCTGCCAGGTCTCAGCCCCCAGA

CACTCCAGGCCATCCCTAAGAGAGTTCTGGTTGGTGCTTGTTCC

TGCCTGGGCCCTGAACTGTCAAGGCTTTCAGCTTGCCAGATTGC

AGCTCTGCTGCAGACCTTTCGGGTAAAAGATGGTGTTAAAAATA

TGGGTGCAGCAGGTGCCGGCTCAGCCGTGTGCATTCCTGGGCA

GCCCACCACTTGGCCAGACTGCCTGCTTCCCCTGCTCCCATTAA

AGCTGCTACAGCTGGACGCTGCAGCTCTTCTGGCAAACCGAAG

ACTCTATCGGCAGCTGCCTTGGTCTGAGCAACAGGCACAGTTTC

TCTGGAAGAAAATGCAAGTGCCTACCAACCTGAGCCTGAGGAAT

CTGCAGGCTCTGGGCAACTTGGCAGGAGGCATGACCTGCGAGT

TTCTGCAGCAGATCAGCTCAATGGTTGACTTTCTTGATGTGGTAC

ACATGCTCTACCAGCTGCCCACTGGTGTTCGAGAGAGCCTGCG

GGCCTGTATCTGGACAGAGCTACAGCGGAGGATGACAATGCCA

GAGCCAGAGCTGACCACCCTAGGGCCAGAACTGAGTGAACTTG

ACACAAAGCTACTCCTGGACTTGCCGATCCAGCTGATGGACAGA

TTGTCCAATGATTCCATTATGTTGGTGGTGGAGATGGTCCAAGG

CGCTCCAGAGCAGCTGCTGGCACTGACCCCACTCCACCAGACA

GCCTTGGCAGAGCGAGCACTTAAAAACCTGGCTCCAAAGGAGA

CCCCAATCTCCAAAGAAGTGCTGGAGACACTGGGCCCCTTGGTT

GGATTCCTGGGAATAGAGAGCACGCGACGGATCCCTTTACCCAT

TCTACTGTCTCATCTCAGTCAGCTGCAGGGCTTCTGCCTAGGAG

AGACATTTGCCACAGAGCTGGGATGGCTGCTGTTGCAGGAGCC

TGTTCTTGGAAAACCAGAATTGTGGAGCCAGGATGAAATAGAGC

AAGCTGGACGCCTAGTATTCACTCTGTCTGCTGAGGCTATTTCC

TCGATCCCCAGGGAGGCTTTGGGCCCAGAGACACTGGAGAGGC

TTCTGGGAAAGCATCAAAGCTGGGAGCAGAGCAGAGTGGGCCA

TCTGTGTGGGGAGTCACAGCTTGCCCACAAGAAAGCAGCTCTG

GTAGCTGGGATTGTGCATCCAGCTGCTGAGGGTCTCCAAGAGC

CTGTACCAAACTGTGCAGACATACGGGGAACCTTCCCAGCGGC

CTGGTCTGCGACACAAATCTCAGAGATGGAACTCTCAGACTTTG

AAGACTGCCTGTCACTATTTGCTGGAGATCCAGGACTTGGTCCT

GAGGAACTACGGGCAGCCATGGGCAAGGCCAAGCAGTTGTGG

GGTCCCCCTCGAGGATTCCGTCCTGAGCAGATCTTGCAGCTGG

GCCGTCTCCTGATAGGTCTAGGAGAACGGGAACTGCAGGAGCT

TACCTTGGTGGACTGGGGTGTGCTGAGCAGCCTGGGGCAAATA

GATGGCTGGAGTTCCATGCAGCTCCGAGCCGTGGTCTCCAGTT

TCCTAAGGCAGAGTGGTCGGCATGTGAGCCACCTGGACTTCATT

TATCTGACAGCACTGGGTTACACAGTCTGTGGATTGCGACCAGA

GGAGTTACAGCACATCAGCAGTTGGGAGTTTAGCCAAGCAGCTC

TCTTCCTGGGTAGCTTGCATCTCCCGTGCTCTGAGGAACAGCTG

GAAGTTCTGGCCTATCTCCTTGTGTTGCCTGGTGGCTTTGGCCC

AGTCAGTAACTGGGGGCCTGAGATCTTCACTGAAATTGGCACAA

TAGCAGCTGGCATCCCAGACCTGGCTCTTTCAGCATTACTGCGG

GGACAGATCCAAGGCCTGACTCCTCTTGCCATTTCTGTCATTCC

TGCTCCCAAGTTTGCAGTGGTCTTCAACCCCATCCAGTTATCTA

GTCTCACCAGGGGTCAGGCCGTAGCTGTTACTCCTGAACAGCT

GGCCTATCTGAGTCCTGAGCAGCGGCGAGCAGTTGCATGGGCC

CAACACGAAGGGAAGGAGATCCCAGAGCAGCTGGGTCGAAACT

CAGCCTGGGGTCTCTACGACTGGTTCCAAGCCTCCTGGGCCCT

GGCATTGCCCGTCAGCATTTTTGGCCACCTATTATGA
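
The correspondence stated in Table 2 between a coding polynucleotide and its protein (e.g., SEQ ID NO: 6 encoding SEQ ID NO: 4) can be checked computationally. The following is a minimal illustrative sketch, not part of the disclosed methods, that assumes the standard genetic code; the function name `translate` and the variable names are hypothetical. It translates the first line of SEQ ID NO: 6 as printed above.

```python
# Illustrative sketch only: translate a coding sequence with the standard
# genetic code (NCBI translation table 1). All names here are hypothetical.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate complete codons from position 0, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(cds) - len(cds) % 3, 3):
        residue = CODON_TABLE[cds[pos:pos + 3].upper()]
        if residue == "*":  # stop codon: end of the open reading frame
            break
        protein.append(residue)
    return "".join(protein)


# First line of SEQ ID NO: 6 as listed in Table 2 (43 nt; the trailing
# incomplete codon is ignored by translate()).
seq6_start = "ATGGCTCTCAGCCTCTGGCCCCTGCTGCTGCTGCTGCTGCTGC"
# First residues of SEQ ID NO: 4 as listed in Table 2.
seq4_start = "MALSLWPLLLLLLLLLLLSFAVTLAPTGPHSLDPGLSFLKSLLSTLDQ"

n_terminal_residues = translate(seq6_start)
```

Applied to a full-length sequence, the same function would stop at the terminal stop codon noted in the table, so the output can be compared directly with the listed protein sequence.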

Expression of Stereocilin in Mammalian Cells

Mutations in STRC have been linked to sensorineural hearing loss. The compositions and methods described herein can be used to induce or increase the expression of WT stereocilin by administering to a subject or contacting a cell with a first nucleic acid vector that contains a polynucleotide encoding an N-terminal portion of a stereocilin protein and a second nucleic acid vector that contains a polynucleotide encoding a C-terminal portion of a stereocilin protein. In order to utilize nucleic acid vectors for therapeutic application in the treatment of sensorineural hearing loss, they can be directed to the interior of the cell, and, in particular, to specific cell types. A wide array of methods has been established for the delivery of proteins to mammalian cells and for the stable expression of genes encoding proteins in mammalian cells.

Polynucleotides Encoding Stereocilin

One platform that can be used to achieve therapeutically effective intracellular concentrations of stereocilin in mammalian cells is via the stable expression of the gene encoding stereocilin (e.g., by integration into the nuclear or mitochondrial genome of a mammalian cell, or by episomal concatemer formation in the nucleus of a mammalian cell). The gene is a polynucleotide that encodes the primary amino acid sequence of the corresponding protein. In order to introduce exogenous genes into a mammalian cell, genes can be incorporated into a vector. Vectors can be introduced into a cell by a variety of methods, including transformation, transfection, transduction, direct uptake, projectile bombardment, and by encapsulation of the vector in a liposome. Examples of suitable methods of transfecting or transforming cells include calcium phosphate precipitation, electroporation, microinjection, infection, lipofection, and direct uptake. Such methods are described in more detail, for example, in Green, et al., Molecular Cloning: A Laboratory Manual, Fourth Edition (Cold Spring Harbor Laboratory Press, New York 2014); and Ausubel, et al., Current Protocols in Molecular Biology (John Wiley & Sons, New York 2015), the disclosures of each of which are incorporated herein by reference.

STRC can also be introduced into a mammalian cell by targeting vectors containing portions of a gene encoding a stereocilin protein to cell membrane phospholipids. For example, vectors can be targeted to the phospholipids on the extracellular surface of the cell membrane by linking the vector molecule to a VSV-G protein, a viral protein with affinity for all cell membrane phospholipids. Such a construct can be produced using methods well known to those of skill in the field.

Recognition and binding of the polynucleotide encoding a stereocilin protein by mammalian RNA polymerase is important for gene expression. As such, one may include sequence elements within the polynucleotide that exhibit a high affinity for transcription factors that recruit RNA polymerase and promote the assembly of the transcription complex at the transcription initiation site. Such sequence elements include, e.g., a mammalian promoter, the sequence of which can be recognized and bound by specific transcription initiation factors and ultimately RNA polymerase. Examples of mammalian promoters have been described in Smith, et al., Mol. Sys. Biol., 3:73, online publication, the disclosure of which is incorporated herein by reference.

Polynucleotides suitable for use in the compositions and methods described herein include those that encode a stereocilin protein downstream of a mammalian promoter (e.g., a polynucleotide that encodes an N-terminal portion of a stereocilin protein downstream of a mammalian promoter). Promoters that are useful for the expression of a stereocilin protein in mammalian cells include OHC-specific promoters, such as an oncomodulin (OCM) promoter (e.g., a polynucleotide having at least 85% sequence identity (e.g., 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more, sequence identity) to any one of the OCM promoter sequences listed in Table 3 (e.g., any one of SEQ ID NOs: 1-3)).

Oncomodulin Promoters

The present inventors have discovered a region of 1,140 base pairs (bp) located upstream of the OCM translation start site that is sufficient for driving gene expression in OHCs. The compositions and methods described herein can, thus, be used to express stereocilin in OHCs to treat subjects having or at risk of developing hearing loss (e.g., sensorineural hearing loss associated with a mutation in STRC, such as DFNB16). Since the OCM promoters described herein (e.g., an OCM promoter having at least 85% sequence identity to SEQ ID NO: 1) can be used to induce OHC-specific gene expression, they can reduce or eliminate off-target expression in other inner ear cells (e.g., in cells other than OHCs), thereby improving the safety and efficacy of gene therapy by targeting STRC expression to the cells in which it is endogenously expressed and reducing toxicity associated with off-target expression.

The compositions and methods described herein include an OCM promoter listed in Table 3 (e.g., any one of SEQ ID NOs: 1-3) that is capable of expressing stereocilin specifically in OHCs, such as a polynucleotide sequence having at least 85% sequence identity (e.g., 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more, sequence identity) to any one of SEQ ID NOs: 1-3.

Exemplary OCM promoter sequences are listed in Table 3.

TABLE 3

OCM promoter sequences

SEQ ID NO:
Description of promoter sequence
Promoter sequence

1
Murine OCM promoter sequence (1140 bp)
AGCAGGTTTGTTACAGAAACCTTAGTTAAGGTTTGTTGAGG

GTTTTTTTTCTCTCTCTCTCTCTTAATTGGCTGTCCCAATCC

ATCCTTCTATAAATAGAAAAGAGAGACAGGGAGTGTGTGTG

GTTTCATTACTAAGGTAAAGACACTTGAGCTACACACACTT

GATCCCTGAACATGAAATCTAAGAGGTTGAACGATCACAGT

TTCAGGACTATATAAGGTGGTGAAAGACCATCTGCTTCGTT

TTTCTGTTTGTTCCTACAACTCTTTCCCTCCGCTTGATTTTA

ACTCTAAATTGGTGAGTAGCTGGTGGGCTCACCAGACTCC

GAGATCCTCTTCTCTGCACGCACTGTATTAGACTTGGCACC

CGGGAGGATTTTCACCTCTGCTGCATGGGCTAATCTTCCA

CAAGGGATCTGTGGTATTGCAATCTCGGGTTGATGCATGA

CGGTGATGTTGTGTTTATAGCATGGCTAAGGTTTAGCTGCC

TATGATGATTGGTTAGGGAAGGATAATTTTTGCTAGAAGAT

TGGACTTTAGGGAAAAAAAACCCCACTTTTATTTGCTTTTAG

AATTTTAAAAGACTGGGCCATGTAGCTCAGGCTGGTTTGGA

GTTCATTATGTAGTCAAGGATGCTCTTGGACTCTTTAGCAT

CCTCCTCCTCCTCTTTTTCCTCCTCCTCCTTCTTGTTCTTCT

TCTTGTTCTTCCTCTTCCCCTTCTCTTCCCCCTTCTCTCCCT

CTTCCTCCTCTTCCTCCTCCTTCTTGTTCTTCTTCCTCTTCC

CCTTCTCCTCTCCCCCTTGTCCTCCTCTTCCTTCTCCTCCT

CCTCCTCTTCTTCTTTCTGAGTACCAAGATTGCAAGTGTGC

ACACGATGACCAGCTTGGTCTTTCTTTGTCTTTTTTTTTTAA

CTTCAATTTTGGAGTGAATTCAAGAGCAACCATGTAGTCAA

GAGGTGGCTGGAGTCTTTTCTGTATCTGGGTTTGGTTTAGT

ACTCTGCCCCATCACTTAACAGGTCCTTATGGCCACATCTT

AAAAAAATTCTAGAGATACACGGTGTCGGTGAGTGGCTGA

GAATGTGTGGTCTTCCCATTTCTCTGTCACCGTGGCTCACA

TCTTGTTTCCTCTGTTCGGCCAGGTAGAAA

2
Human OCM promoter sequence containing a polynucleotide located −2 kb to +0.5 kb of the TSS of the human OCM gene
TTTTACCACAATAATTAAAAAGAACAGTCTAGCACAGTGCT

GGCCATATAAAGGCTCAATAAATGTTTGCTGAAAGTTAAAA

AAAAAAAAAAAAAAAAAAAAGCCAGGCGCAGTGGTTCATTC

CTGTAATCCCAGCACTTTAGGAGGATGAGGTGGGAGAATT

ACTTGAGCCCAGGAGTTCGAGACCAGCCTAGGCAACATGG

CAAAACCCTGTCAAAACCCTGTCTCTCCAAAAAATATGCAT

ATTTAAAAAATTAGCCAGGCATGGTGGTGTGTGCCTGTAGT

ACCAGCTACTCGGGAGACTGAAGTGGGAGGATCGCTTGAG

CCTGGGAGGTCAAGGCTGCAATGAGCTGAGATCGTGCCAC

TGCACTCCAGCCGGGGCAACAGAGCAAGACCCTGTCACAA

CAGAAACAAAATCTTGAGGTGTCTAGTCCTGGCCTCAGCCT

CAGAATATTTGTTTCTGAACATGTTAGTTTTGGGGGTTGGG

GATGCTGGTTTGATTTCCTCCTTTTTGCCTTTTGAGTGTGTG

CAATTTATGGTATAGCTGGGAAACGTCAAAGTCAAGAGTTT

TGTAGGAAAGTCACGTCACTTAGCCCTGTCTCCTGTGCCG

GGTGAGACCTGTGTGTGCACTTGGTGACAATGGCTTTGAG

TCTGTCAACTCCAGACTGAGGTCAGCCTTACACACCCATAG

TTCCCAAAGCTGAAAACAGGCCTGCCTCCAACGGTACCTG

CTAATATCAGGGGAGCCTTTTCAGCTTACAGAGCACCCTGT

ATGTGTTTGTCTTAGTTCAGGCCACCATCTCCACCTTACCA

GGCATCTAGAACCTTCTCCACACTTTGCCAACAGGGTTCGT

TTGCAGAATTGAAATCTTAGTTAAGGTTTGTTGAAGTTTGTT

GTTGTTTTTTTTTTTTTTTTACAATTGGCTGTTCCCACCCACA

TTCCCTTGAGACATAAATAGAAAAAAAAAAAAAAAGAGGTTT

CATGAGTAAGACAAGACATTTGAGCTGCATCCACTTGATCC

TTGAAAAGGAAATCTAAGAGGTTGTAACTATCACTTTTTCTA

GCCTATATAAGGTAGGTCAGTAAGGTAGCAAAAACACATCT

GTTGTTTTGCTCCTTCAACTCTTTTTCCTGATTCTTCCTGGG

GGGAAACCGAAAACGGTGAGTAACTGGTGGACACATCAGA

CCCCAGACTCTTTTCTTCACTGCATGCATTCATATTAGGCT

CAGGTGCTTAGACTCCTGTTTTCCGGTGGCTCTGACACCT

GGAAGGATTTTAATCTCTGGGAGATGGGCTTTTCATCCATC

TGCTTCCCACCTTTCAGGACAGGTGCATGCCTTCTTCCACA

GAATGTCTGCAAGCAGCCCAAACTGTATCCTTTCCCACGTG

GAATTTGCAACATTGCATCTCTCGGGCTGCTGTAGGAAAAT

GCCAGTGCATGTGTAACATGGTTTACGGCTGCCTATGCAA

ATGACTGATTATGTCAGTATAATTTTTATAAGAAAACAATTG

AATCCTTCTTTGGGTCATTTTTTTTTTCCATTTTTGGCATGTA

TTCAAAAGAAGGCTCTGAGACAAAAAAGGCTGGGGTGTTTT

CCGTATCTGGTTTTAATTTGGATATTCTGTCCCGTCACTTAA

TACAAAACCATGCTTATCACATTTTAAAAATTCTAGACAGGC

CTGGCTCGGTGGCTTGCATCTGTCATCCCAGCACTTTGTG

AGGCCAAGGCAGGCAGATCACCTGAGGTCAGGAGCTCAA

GACCAGCCTGGCCAACATGGCAAAACCCCGTCTCTACTAA

AAACACAAAAATTAGCCAGGCATGGTAGTGCGCACCTGTA

ATCCCAGCTACTGGGAAGGCTTAGGCAGGAGAATCACTTG

AGCCCAGGAGGCGGAGGTTGCGGTGAGCCGAGATCACGC

TCTTGCACTCCAGCCTGGGTGACAGAGTGAGACTCCGTCT

TAATTTAAAAAAAAAAATAATCTAGACACACATACAGTTTCA

GTGGGCCTGGGAAGATGTGTTTCCCCTGGATGTGCACATT

CCTGTTTGTGGCTTATCGCCTCTCATTTATTCTGTGTGAGT

AGGTAGAAAATGAGCATCACGGACGTGCTCAGTGCTGACG

ACATTGCAGCAGCGCTCCAGGAATGCCGAGGTAGAGGGG

ACGTGAGGCGGGGGTGGGATTTCCTCACAGCTTTGCACCT

CCAGCGAGTCAACACAAAATCAAAATGTAGGCCAGGCGGC

CAGACGCAGTGGCTCACACCTGTAATCCCAGCACTTTGGG

AGGCCGAGGCGGGTGGATCACGAGGTCAGGAGTTCGAGA

CCAGCCTGGCCAAGATGGTGAAACCCCATCTCTACTAAAA

ATACAAAAAAATTAACCGGGCGTGGTGGTGGGTGCCTGTA

ATCCCAGCTACTCGGGAGGCTGAGGCAGAGAATTGCTTGA

ACCCGGGAGGCAGAAGTTGCAGTGAGCTGAGATCATGCCA

CTGCACTCCAGCCTGGGCA

3
Human OCM promoter sequence containing regions from SEQ ID NO: 2 that are conserved across mammalian species
GTTCCCAAAGCTGAAAACAGGCCTGCCTCCAACGGTACCT

GCTAATATCAGGGGAGCCTTTTCAGCTTACAGAGCACCCT

GTATGTGTTTGTCTTAGTTCAGGCACCTTACCAGGCATCTA

GAACCTTCTCCACACTTTGCCAACAGGGTTCGTTTGCAGAA

TTGAAATCTTAGTTAAGGTTTGTTGAAGTTTGTTGTTGTTTT

TTTTTTTTTTTTACAATTGGCTGTTCCCACCCACATTCCCTT

GAGACATAAATAGAAAAAAAAAAAAAAAGAGGTTTCATGAG

TAAGACAAGACATTTGAGCTGCATCCACTTGATCCTTGAAA

AGGAAATCTAAGAGGTTGTAACTATCACTTTTTCTAGCCTAT

ATAAGGTAGGTCAGTAAGGTAGCAAAAACACATCTGTTGTT

TTGCTCCTTCAACTCTTTTTCCTGATTCTTCCTGGGGGGAA

ACCGAAAACGGTGAGTAACTGGTGGACACATCAGACCCCA

GACTCTTTTCTTCACTGCATGCATTCATATTAGGCTCAGGT

GCTTAGACTCCTGTTTTCCGGTTTACGGCTGCCTATGCAAA

TGACTGATTATGTCAGTATAATTTTTATAAGAAAACAATTGA

ATCCTTCTTTGGGTCATTTTTTTTTTCCATTTTTGGCATGTAT

GTGCACATTCCTGTTTGTGGCTTATCGCCTCTCATTTATTCT

GTGTGAGTAGGTAGAAAATGAGCATCACGGACGTGCTCAG

TGCTGACGACATTGCAGCAGCGCTCCAGGAATGCCGAGGT

AGAGGGGACGTGAGGGGGGGGTGGGATTTCCTCACAGCT

TTGCACCTCCAGC

The foregoing polynucleotides can be included in a nucleic acid vector and operably linked to a transgene to express the transgene specifically in OHCs. In the vectors described herein, the transgene can encode an N-terminal portion of a stereocilin protein. According to the methods described herein, a subject can be administered a composition containing one of the foregoing polynucleotides (e.g., any one of the polynucleotide sequences listed in Table 3 (e.g., SEQ ID NOs: 1-3) or a polynucleotide sequence having at least 85% sequence identity thereto (e.g., 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more, sequence identity to any one of SEQ ID NOs: 1-3)) operably linked to a polynucleotide encoding, e.g., an N-terminal portion of a stereocilin protein (e.g., an N-terminal portion of SEQ ID NO: 4 or SEQ ID NO: 5) for the treatment of hearing loss.
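
By way of illustration only, a percent sequence identity threshold such as the 85% value recited above can be evaluated computationally. The sketch below is a simplified example under stated assumptions: it performs a global (Needleman-Wunsch) alignment with arbitrary illustrative scores (match +1, mismatch −1, gap −1) and reports identical positions divided by the number of alignment columns. The function names, the scoring values, and the choice of denominator are assumptions made for this example and are not a definition of sequence identity for purposes of this disclosure; established alignment programs may use different parameters and give different values.

```python
# Illustrative sketch only: percent identity from a simple global alignment.
# Scoring values and the identity denominator are assumptions for this example.
def percent_identity(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -1) -> float:
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Fill the dynamic-programming matrix.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back one optimal alignment, counting columns and identical pairs.
    i, j = len(a), len(b)
    identical = columns = 0
    while i > 0 or j > 0:
        columns += 1
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
    return 100.0 * identical / columns if columns else 100.0


def meets_identity_threshold(candidate: str, reference: str,
                             threshold: float = 85.0) -> bool:
    """Return True if the candidate is at least `threshold` percent identical."""
    return percent_identity(candidate, reference) >= threshold
```

For sequences of the length listed in Table 3, the quadratic memory use of this sketch is manageable, but a dedicated alignment tool would ordinarily be used in practice.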

Once a polynucleotide encoding stereocilin has been incorporated into the nuclear DNA of a mammalian cell, the transcription of this polynucleotide can be induced by methods known in the art. For example, expression can be induced by exposing the mammalian cell to an external chemical reagent, such as an agent that modulates the binding of a transcription factor and/or RNA polymerase to the mammalian promoter and thus regulates gene expression. The chemical reagent can serve to facilitate the binding of RNA polymerase and/or transcription factors to the mammalian promoter, e.g., by removing a repressor protein that has bound the promoter. Alternatively, the chemical reagent can serve to enhance the affinity of the mammalian promoter for RNA polymerase and/or transcription factors such that the rate of transcription of the gene located downstream of the promoter is increased in the presence of the chemical reagent. Examples of chemical reagents that potentiate polynucleotide transcription by the above mechanisms include tetracycline and doxycycline. These reagents are commercially available (Life Technologies, Carlsbad, CA) and can be administered to a mammalian cell in order to promote gene expression according to established protocols.

Other DNA sequence elements that may be included in polynucleotides for use in the compositions and methods described herein include enhancer sequences. Enhancers represent another class of regulatory elements that induce a conformational change in the polynucleotide containing the gene of interest such that the DNA adopts a three-dimensional orientation that is favorable for binding of transcription factors and RNA polymerase at the transcription initiation site. Thus, polynucleotides for use in the compositions and methods described herein include those that encode an STRC protein and additionally include a mammalian enhancer sequence. Many enhancer sequences are now known from mammalian genes, and examples include enhancers from the genes that encode mammalian globin, elastase, albumin, α-fetoprotein, and insulin. Enhancers for use in the compositions and methods described herein also include those that are derived from the genetic material of a virus capable of infecting a eukaryotic cell. Examples include the SV40 enhancer on the late side of the replication origin (bp 100-270), the cytomegalovirus early promoter enhancer, the polyoma enhancer on the late side of the replication origin, and adenovirus enhancers. Additional enhancer sequences that induce activation of eukaryotic gene transcription include the CMV enhancer and RSV enhancer. An enhancer may be spliced into a vector containing a polynucleotide encoding a protein of interest, for example, at a position 5′ or 3′ to this gene. In a preferred orientation, the enhancer is positioned at the 5′ side of the promoter, which in turn is located 5′ relative to the polynucleotide encoding a stereocilin protein.

The nucleic acid vectors described herein may include a Woodchuck Posttranscriptional Regulatory Element (WPRE). The WPRE acts at the mRNA level, by promoting nuclear export of transcripts and/or by increasing the efficiency of polyadenylation of the nascent transcript, thus increasing the total amount of mRNA in the cell. The addition of the WPRE to a vector can result in a substantial improvement in the level of transgene expression from several different promoters, both in vitro and in vivo.

In some embodiments, the nucleic acid vectors described herein include a reporter sequence, which can be useful in verifying stereocilin expression, for example, in cells and tissues (e.g., in OHCs). Reporter sequences that may be provided in a transgene include DNA sequences encoding β-lactamase, β-galactosidase (LacZ), alkaline phosphatase, thymidine kinase, green fluorescent protein (GFP), chloramphenicol acetyltransferase (CAT), luciferase, and others well known in the art. When associated with regulatory elements that drive their expression, such as an OCM promoter, the reporter sequences provide signals detectable by conventional means, including enzymatic, radiographic, colorimetric, fluorescence or other spectrographic assays, fluorescence-activated cell sorting (FACS) assays, and immunological assays, including enzyme linked immunosorbent assay (ELISA), radioimmunoassay (RIA), and immunohistochemistry. For example, where the marker sequence is the LacZ gene, the presence of the vector carrying the signal is detected by assays for β-galactosidase activity. Where the transgene is green fluorescent protein or luciferase, the vector carrying the signal may be measured visually by color or light production in a luminometer.

Dual Vector Expression Systems
Overlapping Dual Vectors

One approach for expressing large proteins in mammalian cells involves the use of overlapping dual vectors. This approach is based on the use of two nucleic acid vectors, each of which contains a portion of a polynucleotide that encodes a protein of interest and has a defined region of sequence overlap with the other polynucleotide. Homologous recombination can occur at the region of overlap and lead to the formation of a single polynucleotide that encodes the full-length protein of interest (e.g., a stereocilin protein).
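
The outcome of recombination at the region of overlap can be modeled at the sequence level as follows. This is a purely illustrative string-level sketch, not a model of the cellular recombination mechanism; the function name `reconstitute` and the `min_overlap` parameter are hypothetical. Two fragments are joined wherever the 3′ end of the first matches the 5′ end of the second over at least a minimum number of bases.

```python
# Illustrative sketch only: sequence-level model of joining two overlapping
# fragments into one full-length polynucleotide. Names are hypothetical.
import random


def reconstitute(five_prime: str, three_prime: str, min_overlap: int = 200) -> str:
    """Join two fragments at their longest shared suffix/prefix overlap.

    Raises ValueError if the shared region is shorter than min_overlap.
    """
    longest = min(len(five_prime), len(three_prime))
    for size in range(longest, min_overlap - 1, -1):
        if five_prime.endswith(three_prime[:size]):
            # Keep the overlap once, then append the rest of the 3' fragment.
            return five_prime + three_prime[size:]
    raise ValueError("fragments share no overlap of the required length")


# Toy example with an arbitrary 1,000-base sequence (not a real gene).
rng = random.Random(0)
full_length = "".join(rng.choice("ACGT") for _ in range(1000))
first_fragment = full_length[:600]    # 5' portion, includes the overlap
second_fragment = full_length[400:]   # 3' portion, includes the overlap
rejoined = reconstitute(first_fragment, second_fragment)
```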

Overlapping dual vectors for use in the methods and compositions described herein contain at least 200 bases of overlapping sequence (e.g., at least 200 b, 300 b, 400 b, 500 b, 600 b, 700 b, 800 b, 900 b, 1.0 kilobase (kb), 1.1 kb, 1.2 kb, 1.3 kb, 1.4 kb, 1.5 kb or more of overlapping sequence). The nucleic acid vectors are designed such that the overlapping region is centered at or near a position within the stereocilin-encoding polynucleotide that corresponds to approximately half of the length of the stereocilin-encoding polynucleotide, with an equal amount of overlap on either side of the central position. The center of the overlapping region can also be chosen based on the size of the promoter and the locations of sequence elements of interest in the polynucleotide that encodes stereocilin. In some embodiments, the stereocilin-encoding polynucleotide is split in two halves of approximately equal length with some degree of overlap (e.g., 50 b, 100 b, 150 b, 200 b, 250 b, 300 b, 350 b, 400 b, 450 b, 500 b, 600 b, 700 b, 800 b, 900 b, 1 kb, 1.1 kb, 1.2 kb, 1.3 kb, 1.4 kb, 1.5 kb, or more), in which the 5′ half of the polynucleotide encodes an N-terminal portion of the stereocilin protein and the 3′ half of the polynucleotide encodes a C-terminal portion of the stereocilin protein. The nucleic acid vectors for use in the methods and compositions described herein are also designed such that approximately half of the stereocilin-encoding polynucleotide is contained within each vector (e.g., each vector contains a polynucleotide that encodes approximately half of the stereocilin protein).
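
The design described above, in which the overlapping region is centered at or near the midpoint of the stereocilin-encoding polynucleotide, can be sketched as follows. This is a hedged illustration rather than a prescribed procedure; the function name `split_with_overlap` and its parameters are hypothetical, and the toy input is an arbitrary placeholder rather than an STRC sequence.

```python
# Illustrative sketch only: split a coding sequence into 5' and 3' portions
# that share a centered overlap. All names are hypothetical.
from typing import Optional, Tuple


def split_with_overlap(cds: str, overlap: int,
                       center: Optional[int] = None) -> Tuple[str, str]:
    """Return (5' portion, 3' portion) sharing `overlap` bases around `center`.

    If `center` is None, the midpoint of the sequence is used, so that an
    equal amount of overlap lies on either side of the central position
    (up to one base when `overlap` is odd).
    """
    if center is None:
        center = len(cds) // 2
    upstream = overlap // 2
    start = center - upstream                # first base of the shared region
    end = center + (overlap - upstream)      # one past its last base
    if overlap < 0 or start <= 0 or end >= len(cds):
        raise ValueError("overlap does not fit around the chosen center")
    return cds[:end], cds[start:]


# Toy example: a 5,000-base placeholder split with a 1,000-base overlap.
toy_cds = "ACGTTGCA" * 625
five_prime_part, three_prime_part = split_with_overlap(toy_cds, overlap=1000)
```

Passing an explicit `center` corresponds to choosing the central position based on promoter size or the location of sequence elements of interest, as noted above.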

In some embodiments, the first nucleic acid vector encodes an N-terminal portion of the stereocilin protein. In some embodiments, the second nucleic acid vector encodes a C-terminal portion of the stereocilin protein. In some embodiments, the stereocilin protein has the sequence of SEQ ID NO: 4 or at least 85% (e.g., at least 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more) sequence identity thereto. In some embodiments, the stereocilin protein has the sequence of SEQ ID NO: 5 or at least 85% (e.g., at least 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more) sequence identity thereto. In some embodiments, the polynucleotide that encodes a full-length human stereocilin protein has the sequence of SEQ ID NO: 6 or is a variant thereof having at least 85% (e.g., at least 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more) sequence identity to SEQ ID NO: 6. In some embodiments, the polynucleotide that encodes a full-length murine stereocilin protein has the sequence of SEQ ID NO: 7 or is a variant thereof having at least 85% (e.g., at least 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more) sequence identity to SEQ ID NO: 7.
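For orientation only, a percent identity threshold such as the 85% figure recited above can be checked with a few lines of Python once two sequences have been aligned. This sketch assumes the sequences are already aligned to equal length (with `-` marking gaps) and counts identical positions over the aligned length; that counting convention, the function names, and the toy sequences are assumptions made for illustration, and the definition of sequence identity given in the disclosure governs.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned positions that are identical (gap positions never match)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 85.0) -> bool:
    """True if the pair reaches the stated identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy 20-base placeholders (hypothetical, not sequences from the Sequence Listing).
reference = "ACGTACGTACGTACGTACGT"
two_changes = "ACGTACGTACGTACGTACAA"   # differs from the reference at 2 positions
four_changes = "ACGTACGTACGTACGTTTAA"  # differs from the reference at 4 positions
```

Here `two_changes` is 18/20 identical to the reference and passes an 85% threshold, whereas `four_changes` is 16/20 identical and does not.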

One exemplary overlapping dual vector system includes a first nucleic acid vector containing an OCM promoter described hereinabove (e.g., an OCM promoter having at least 85% sequence identity (e.g., 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more, sequence identity) to any one of SEQ ID NOs: 1-3) operably linked to a polynucleotide encoding an N-terminal portion of a stereocilin protein (e.g., an N-terminal portion of SEQ ID NO: 4 or SEQ ID NO: 5) including 500 b immediately 3′ of the position selected as the central position; and a second nucleic acid vector containing the C-terminal portion of the polynucleotide encoding the stereocilin protein, which includes 500 b immediately 5′ of the position selected as the central position, and a poly(A) sequence (e.g., a bovine growth hormone (bGH) poly(A) signal sequence). The nucleic acid vectors can optionally contain STRC untranslated regions (UTRs). In some embodiments, the OCM promoter is a polynucleotide having the sequence of SEQ ID NO: 1 or a variant having at least 85% (e.g., at least 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more) sequence identity to SEQ ID NO: 1. In some embodiments, the OCM promoter is a polynucleotide having the sequence of SEQ ID NO: 2 or a variant having at least 85% (e.g., at least 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more) sequence identity to SEQ ID NO: 2. In some embodiments, the OCM promoter is a polynucleotide having the sequence of SEQ ID NO: 3 or a variant having at least 85% (e.g., at least 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more) sequence identity to SEQ ID NO: 3.

In some embodiments, the first member of the dual vector system includes the OCM promoter of SEQ ID NO:1 (also represented by nucleotides 225-1364 of SEQ ID NO: 43) operably linked to nucleotides that encode an N-terminal portion of a stereocilin protein. In certain embodiments, the nucleotide sequence that encodes an N-terminal portion of a stereocilin protein is nucleotides 1375-4574 of SEQ ID NO: 43. The nucleotide sequences that encode an N-terminal portion of a stereocilin protein can be partially or fully codon-optimized for expression. In particular embodiments, the first member of the dual vector system includes nucleotides 225-4574 of SEQ ID NO: 43 flanked on each of the 5′ and 3′ sides by an inverted terminal repeat. In some embodiments, the flanking inverted terminal repeats are any variant of AAV2 inverted terminal repeats that can be encapsidated by a plasmid that carries the AAV2 Rep gene. In certain embodiments, the 5′ flanking inverted terminal repeat has a sequence corresponding to nucleotides 1-130 of SEQ ID NO: 43 or a sequence having at least 80% sequence identity (at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity) thereto; and the 3′ flanking inverted terminal repeat has a sequence corresponding to nucleotides 4662-4791 of SEQ ID NO: 43 or a sequence having at least 80% sequence identity (at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity) thereto. 
It will be understood by those of skill in the art that, for any given pair of inverted terminal repeat sequences in a transfer plasmid (e.g., SEQ ID NO: 43) that is used to create the viral vector (typically by transfecting cells with that plasmid together with other plasmids carrying the necessary AAV genes for viral vector formation), the corresponding sequence in the viral vector can be altered due to the ITRs adopting a “flip” or “flop” orientation during recombination. Thus, the sequence of the ITR in the transfer plasmid is not necessarily the same sequence that is found in the viral vector prepared therefrom. However, in some very specific embodiments, the first member of the dual vector system includes nucleotides 1-4791 of SEQ ID NO: 43.

In some embodiments, the second member of the dual vector system includes nucleotides that encode the C-terminal portion of the stereocilin protein immediately followed by a stop codon. In certain embodiments, the nucleotide sequence that encodes the C-terminal amino acids of the stereocilin protein is nucleotides 211-3440 of SEQ ID NO: 44. The nucleotide sequences that encode the C-terminal portion of the STRC protein can be partially or fully codon-optimized for expression. In some embodiments, the second member of the dual vector system includes a WPRE sequence corresponding to nucleotides 3452-3999 of SEQ ID NO: 44. In some embodiments, the second member of the dual vector system includes the poly(A) sequence corresponding to nucleotides 4012-4219 of SEQ ID NO: 44. In particular embodiments, the second member of the dual vector system includes nucleotides 211-4219 of SEQ ID NO: 44 flanked on each of the 5′ and 3′ sides by an inverted terminal repeat. In some embodiments, the flanking inverted terminal repeats are any variant of AAV2 inverted terminal repeats that can be encapsidated by a plasmid that carries the AAV2 Rep gene. In certain embodiments, the 5′ flanking inverted terminal repeat has a sequence corresponding to nucleotides 1-130 of SEQ ID NO: 44 or a sequence having at least 80% sequence identity (at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity) thereto; and the 3′ flanking inverted terminal repeat has a sequence corresponding to nucleotides 4307-4436 of SEQ ID NO: 44 or a sequence having at least 80% sequence identity (at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity) thereto. 
It will be understood by those of skill in the art that, for any given pair of inverted terminal repeat sequences in a transfer plasmid (e.g., SEQ ID NO: 44) that is used to create the viral vector (typically by transfecting cells with that plasmid together with other plasmids carrying the necessary AAV genes for viral vector formation), the corresponding sequence in the viral vector can be altered due to the ITRs adopting a “flip” or “flop” orientation during recombination. Thus, the sequence of the ITR in the transfer plasmid is not necessarily the same sequence that is found in the viral vector prepared therefrom. However, in some very specific embodiments, the second member of the dual vector system includes nucleotides 1-4436 of SEQ ID NO: 44.

Transfer plasmids that may be used to produce nucleic acid vectors for use in the compositions and methods described herein are provided in Tables 4 and 5. A transfer plasmid (e.g., a plasmid containing a DNA sequence to be delivered by a nucleic acid vector, e.g., to be delivered by an AAV) may be co-delivered into producer cells with a helper plasmid (e.g., a plasmid providing proteins necessary for AAV manufacture) and a rep/cap plasmid (e.g., a plasmid that provides AAV capsid proteins and proteins that insert the transfer plasmid DNA sequence into the capsid shell) to produce a nucleic acid vector (e.g., an AAV vector) for administration. Nucleic acid vectors (e.g., a nucleic acid vector (e.g., an AAV vector) containing a polynucleotide encoding an N-terminal portion of a stereocilin protein and a nucleic acid vector (e.g., an AAV vector) containing a polynucleotide encoding a C-terminal portion of a stereocilin protein) can be combined (e.g., in a single formulation) prior to administration.

Transfer plasmids that may be used to produce nucleic acid vectors (e.g., AAV vectors) for co-formulation or co-administration (e.g., administration simultaneously or sequentially) in an overlapping dual vector system are provided in Table 4 (SEQ ID NO: 43 and SEQ ID NO: 44).

TABLE 4

Transfer plasmids designed to produce overlapping dual vectors

SEQ ID NO.
Description
Plasmid Sequence

43
Plasmid P959: 5′ ITR at nucleotide positions 1-130; murine OCM promoter at nucleotide positions 225-1364; N-terminal STRC coding sequence at nucleotide positions 1375-4574 (including overlap at nucleotide positions 4075-4574 with P724); 3′ ITR at nucleotide positions 4662-4791

CTGCGCGCTCGCTCGCTCACTGAGGCCGCCCGGGCAAAGCCCGG

GCGTCGGGCGACCTTTGGTCGCCCGGCCTCAGTGAGCGAGCGAG

CGCGCAGAGAGGGAGTGGCCAACTCCATCACTAGGGGTTCCTTGT

AGTTAATGATTAACCCGCCATGCTACTTATCTACGTAGCCATGCTC

TAGGAAGATCGGAATTCGCCCTTAAGCTAGCGGCGCGCCACCGGT

AGCAGGTTTGTTACAGAAACCTTAGTTAAGGTTTGTTGAGGGTTTT

TTTTCTCTCTCTCTCTCTTAATTGGCTGTCCCAATCCATCCTTCTAT

AAATAGAAAAGAGAGACAGGGAGTGTGTGTGGTTTCATTACTAAG

GTAAAGACACTTGAGCTACACACACTTGATCCCTGAACATGAAATC

TAAGAGGTTGAACGATCACAGTTTCAGGACTATATAAGGTGGTGAA

AGACCATCTGCTTCGTTTTTCTGTTTGTTCCTACAACTCTTTCCCTC

CGCTTGATTTTAACTCTAAATTGGTGAGTAGCTGGTGGGCTCACCA

GACTCCGAGATCCTCTTCTCTGCACGCACTGTATTAGACTTGGCAC

CCGGGAGGATTTTCACCTCTGCTGCATGGGCTAATCTTCCACAAG

GGATCTGTGGTATTGCAATCTCGGGTTGATGCATGACGGTGATGT

TGTGTTTATAGCATGGCTAAGGTTTAGCTGCCTATGATGATTGGTT

AGGGAAGGATAATTTTTGCTAGAAGATTGGACTTTAGGGAAAAAAA

ACCCCACTTTTATTTGCTTTTAGAATTTTAAAAGACTGGGCCATGTA

GCTCAGGCTGGTTTGGAGTTCATTATGTAGTCAAGGATGCTCTTGG

ACTCTTTAGCATCCTCCTCCTCCTCTTTTTCCTCCTCCTCCTTCTTG

TTCTTCTTCTTGTTCTTCCTCTTCCCCTTCTCTTCCCCCTTCTCTCC

CTCTTCCTCCTCTTCCTCCTCCTTCTTGTTCTTCTTCCTCTTCCCCT

TCTCCTCTCCCCCTTGTCCTCCTCTTCCTTCTCCTCCTCCTCCTCTT

CTTCTTTCTGAGTACCAAGATTGCAAGTGTGCACACGATGACCAGC

TTGGTCTTTCTTTGTCTTTTTTTTTTAACTTCAATTTTGGAGTGAATT

CAAGAGCAACCATGTAGTCAAGAGGTGGCTGGAGTCTTTTCTGTAT

CTGGGTTTGGTTTAGTACTCTGCCCCATCACTTAACAGGTCCTTAT

GGCCACATCTTAAAAAAATTCTAGAGATACACGGTGTCGGTGAGTG

GCTGAGAATGTGTGGTCTTCCCATTTCTCTGTCACCGTGGCTCACA

TCTTGTTTCCTCTGTTCGGCCAGGTAGAAAGGCGGCCGCCATGGC

TCTGAGCCTCCAGCCCCAGCTGCTCCTTCTCCTGTCGCTCCTGCC

GCAGGAAGTGACTTCAGCCCCTACTGGGCCTCAGTCTTTGGATGC

TGGTCTCTCCCTTCTGAAGTCATTCGTAGCCACTCTGGACCAAGCT

CCTCAGCGTTCCCTCAGCCAGTCACGGTTCTCTGCGTTCCTGGCC

AACATTTCTTCATCCTTCCAGCTTGGGAGGATGGGGGAGGGACCG

GTGGGAGAGCCCCCACCTCTCCAGCCCCCTGCACTTCGACTTCAT

GATTTCCTCGTGACACTGAGAGGTAGCCCAGACTGGGAGCCAATG

CTAGGGCTTCTGGGAGATGTGCTGGCACTCCTGGGACAGGAACA

GACTCCCCGGGACTTTTTGGTGCACCAGGCAGGTGTACTGGGTG

GACTTGTAGAGGCATTGTTGGGAGCGTTAGTTCCTGGAGGCCCCC

CTGCCCCCACTCGACCCCCATGCACCCGTGATGGCCCTTCTGACT

GTGTCCTGGCTGCTGATTGGTTGCCTTCTCTGATGTTGTTATTAGA

GGGTACACGCTGGCAGGCCCTGGTGCAGTTGCAGCCCAGTGTGG

ACCCAACCAATGCCACAGGTCTTGATGGTAGAGAGCCAGCTCCTC

ACTTTTTACAGGGTCTGCTGGGCTTGCTTACCCCAGCAGGAGAGT

TGGGCTCTGAGGAGGCTCTTTGGGGTGGTCTGCTGCGCACAGTG

GGGGCCCCCCTCTATGCTGCCTTCCAGGAGGGGCTACTGCGAGT

CACTCATTCTCTGCAAGATGAGGTCTTTTCTATTATGGGACAGCCA

GAGCCTGATGCCAGTGGGCAGTGCCAGGGAGGCAACCTTCAACA

GCTGCTTTTATGGGGCATGCGGAACAACCTTTCTTGGGACGCCCG

AGCACTGGGTTTTCTATCTGGATCACCACCTCCACCCCCTGCTCTC

CTGCACTGCCTGAGCAGAGGTGTGCCTCTGCCCAGGGCTTCCCA

GCCTGCGGCTCACATCAGCCCTCGACAGCGGCGAGCCATCTCTG

TGGAGGCCCTCTGCGAGAACCACTCAGGCCCAGAGCCACCCTAC

AGCATCTCCAACTTCTCCATCTACTTGCTCTGCCAGCACATCAAGC

CTGCCACCCCGCGGCCCCCTCCTACCACCCCACGGCCTCCTCCT

ACCACCCCACAGCCCCCTCCTACCACTACACAGCCCATTCCTGAC

ACTACACAGCCCCCTCCTGTCACCCCAAGGCCTCCTCCTACCACC

CCACAACCCCCTCCTAGCACAGCTGTCATCTGCCAGACAGCTGTA

TGGTACGCAGTCTCGTGGGCACCAGGTGCCCGAGGTTGGCTCCA

AGCCTGCCATGATCAGTTTCCTGATCAATTTCTGGATATGATCTGC

GGCAACCTCTCATTTTCAGCCCTGTCTGGCCCCAGTCGTCCTTTG

GTAAAGCAGCTCTGTGCTGGCTTGCTCCCACCCCCCACTAGCTGT

CCACCAGGCCTGATCCCTGTGCCCCTCACCCCAGAAATATTCTGG

GGCTGTTTCCTGGAGAATGAGACACTGTGGGCTGAACGGTTGTGT

GTGGAGGACAGTCTGCAGGCTGTGCCCCCGAGGAACCAGGCTTG

GGTTCAGCATGTGTGTCGGGGCCCCACCTTGGACGCCACTGATTT

TCCACCGTGCCGCGTTGGACCCTGTGGGGAACGCTGCCCAGATG

GGGGCAGCTTCCTGCTCATGGTCTGTGCCAATGACACTCTGTATG

AAGCCTTGGTTCCCTTCTGGGCTTGGCTAGCAGGCCAATGCAGAA

TTAGTCGTGGAGGAAATGATACTTGCTTTCTAGAAGGCATGCTGG

GCCCCTTGTTGCCCTCTCTGCCCCCTCTGGGACCATCCCCACTCT

GTCTGGCTCCTGGTCCTTTTCTGCTTGGCATGTTATCCCAGTTGCC

ACGCTGTCAGTCCTCCGTGCCAGCCCTCGCCCACCCCACGCGCC

TACATTACCTCCTGCGCCTACTGACCTTCCTTCTGGGTCCAGGGA

CTGGGGGTGCCGAGACGCAGGGGATGTTAGGTCAAGCCCTGCTG

CTCTCTAGTCTCCCAGACAACTGTTCATTCTGGGATGCCTTCCGCC

CAGAGGGCCGGAGAAGTGTACTGAGGACAGTCGGAGAGTACTTG

CAGCGGGAAGAGCCAACCCCACCAGGCTTAGACTCCTCCCTCAG

CCTCGGCTCTGGTATGAGCAAGATGGAGCTTCTGTCCTGCTTCAG

TCCTGTACTGTGGGATCTACTCCAGAGAGAGAAGAGCGTTTGGGC

CCTGAGGACCCTGGTGAAGGCCTACCTGCGCATGCCTCCAGAAG

ACCTTCAGCAGCTTGTGCTTTCAGCAGAGATGGAGGCTGCACAGG

GCTTCCTGACGCTCATGCTTCGTTCCTGGGCTAAGCTGAAGGTTC

AACCATCCGAGGAGCAGGCCATGGGCCGCCTGACAGCCTTGCTG

CTCCAGCGGTACCCACGCCTCACCTCCCAACTCTTTATCGACATGT

CACCGCTCATCCCCTTCCTGGCTGTCCCTGACCTCATGCGCTTCC

CACCGTCCCTTTTGGCCAACGACAGTGTCCTGGCTGCCATCAGGG

ATCACAGCTCAGGAATGAAGCCTGAACAGAAGGAGGCCCTGGCAA

AACGACTGCTGGCCCCTGAGCTGTTTGGAGAAGTGCCTGATTGGC

CCCAGGAGCTGCTGTGGGCAGCCCTGCCTCTGCTTCCCCATCTGC

CTCTGGAGAGCTTTCTCCAGCTCAGCCCTCACCAGATCCAGGCCC

TGGAGGATAGCTGGCCAGTAGCAGATCTTGGGCCGGGACACGCC

CGACATGTGCTTCGTAGCCTAGTAAACCAGAGCATGGAGGATGGG

GAGGAGCAGGTGCTCAGGCTTGGGTCCCTCGCCTGTTTCCTGAGT

CCTGAGGAGCTACAGAGTCTGGTGCCCTTGAGTGATCCAATGGGG

CCTGTAGAACAGGGTCTGCTGGAATGTGCGGCCAATGGGACCCTC

AGCCCAGAAGGACGGGTGGCATATGAACTTCTGGGAGTGTTGCGT

TCATCTGGAGGAACTGTCTTAAGCCCCCGAGAGCTGAGGGTCTGG

GCACCTCTCTTTCCCCAGCTGGGCCTCCGCTTCCTGCAGGAGCTC

TCAGAGACCCAGCTTAGAGCCATGCTTCCTGCCCTACAGGGAGCC

AGTGTCACACCCTCGAGTTAAGGGCGAATTCCCGATAAGGATCTT

CCTAGAGCATGGCTACGTAGATAAGTAGCATGGGGGGTTAATCAT

TAACTACAAGGAACCCCTAGTGATGGAGTTGGCCACTCCCTCTCT

GCGCGCTCGCTCGCTCACTGAGGCCGGGCGACCAAAGGTCGCCC

GACGCCCGGGCTTTGCCCGGGGGGCCTCAGTGAGCGAGCGAGC

GCGCAGCCTTAATTAACCTAATTCACTGGCCGTCGTTTTACAACGT

CGTGACTGGGAAAACCCTGGCGTTACCCAACTTAATCGCCTTGCA

GCACATCCCCCTTTCGCCAGCTGGCGTAATAGCGAAGAGGCCCG

CACCGATCGCCCTTCCCAACAGTTGCGCAGCCTGAATGGCGAATG

GGACGCGCCCTGTAGCGGCGCATTAAGCGCGGCGGGTGTGGTG

GTTACGCGCAGCGTGACCGCTACACTTGCCAGCGCCCTAGCGCC

CGCTCCTTTCGCTTTCTTCCCTTCCTTTCTCGCCACGTTCGCCGGC

TTTCCCCGTCAAGCTCTAAATCGGGGGCTCCCTTTAGGGTTCCGA

TTTAGTGCTTTACGGCACCTCGACCCCAAAAAACTTGATTAGGGTG

ATGGTTCACGTAGTGGGCCATCGCCCTGATAGACGGTTTTTCGCC

CTTTGACGTTGGAGTCCACGTTCTTTAATAGTGGACTCTTGTTCCA

AACTGGAACAACACTCAACCCTATCTCGGTCTATTCTTTTGATTTAT

AAGGGATTTTGCCGATTTCGGCCTATTGGTTAAAAAATGAGCTGAT

TTAACAAAAATTTAACGCGAATTTTAACAAAATATTAACGCTTACAA

TTTAGGTGGCACTTTTCGGGGAAATGTGCGCGGAACCCCTATTTG

TTTATTTTTCTAAATACATTCAAATATGTATCCGCTCATGAGACAATA

ACCCTGATAAATGCTTCAATAATATTGAAAAAGGAAGAGTATGAGC

CATATTCAACGGGAAACGTCGAGGCCGCGATTAAATTCCAACATG

GATGCTGATTTATATGGGTATAAATGGGCTCGCGATAATGTCGGG

CAATCAGGTGCGACAATCTATCGCTTGTATGGGAAGCCCGATGCG

CCAGAGTTGTTTCTGAAACATGGCAAAGGTAGCGTTGCCAATGAT

GTTACAGATGAGATGGTCAGACTAAACTGGCTGACGGAATTTATGC

CTCTTCCGACCATCAAGCATTTTATCCGTACTCCTGATGATGCATG

GTTACTCACCACTGCGATCCCCGGAAAAACAGCATTCCAGGTATTA

GAAGAATATCCTGATTCAGGTGAAAATATTGTTGATGCGCTGGCAG

TGTTCCTGCGCCGGTTGCATTCGATTCCTGTTTGTAATTGTCCTTTT

AACAGCGATCGCGTATTTCGTCTTGCTCAGGCGCAATCACGAATG

AATAACGGTTTGGTTGATGCGAGTGATTTTGATGACGAGCGTAATG

GCTGGCCTGTTGAACAAGTCTGGAAAGAAATGCATAAACTTTTGCC

ATTCTCACCGGATTCAGTCGTCACTCATGGTGATTTCTCACTTGAT

AACCTTATTTTTGACGAGGGGAAATTAATAGGTTGTATTGATGTTG

GACGAGTCGGAATCGCAGACCGATACCAGGATCTTGCCATCCTAT

GGAACTGCCTCGGTGAGTTTTCTCCTTCATTACAGAAACGGCTTTT

TCAAAAATATGGTATTGATAATCCTGATATGAATAAATTGCAGTTTC

ATTTGATGCTCGATGAGTTTTTCTAACTGTCAGACCAAGTTTACTCA

TATATACTTTAGATTGATTTAAAACTTCATTTTTAATTTAAAAGGATC

TAGGTGAAGATCCTTTTTGATAATCTCATGACCAAAATCCCTTAACG

TGAGTTTTCGTTCCACTGAGCGTCAGACCCCGTAGAAAAGATCAAA

GGATCTTCTTGAGATCCTTTTTTTCTGCGCGTAATCTGCTGCTTGC

AAACAAAAAAACCACCGCTACCAGCGGTGGTTTGTTTGCCGGATC

AAGAGCTACCAACTCTTTTTCCGAAGGTAACTGGCTTCAGCAGAGC

GCAGATACCAAATACTGTTCTTCTAGTGTAGCCGTAGTTAGGCCAC

CACTTCAAGAACTCTGTAGCACCGCCTACATACCTCGCTCTGCTAA

TCCTGTTACCAGTGGCTGCTGCCAGTGGCGATAAGTCGTGTCTTA

CCGGGTTGGACTCAAGACGATAGTTACCGGATAAGGCGCAGCGG

TCGGGCTGAACGGGGGGTTCGTGCACACAGCCCAGCTTGGAGCG

AACGACCTACACCGAACTGAGATACCTACAGCGTGAGCTATGAGA

AAGCGCCACGCTTCCCGAAGGGAGAAAGGCGGACAGGTATCCGG

TAAGCGGCAGGGTCGGAACAGGAGAGCGCACGAGGGAGCTTCCA

GGGGGAAACGCCTGGTATCTTTATAGTCCTGTCGGGTTTCGCCAC

CTCTGACTTGAGCGTCGATTTTTGTGATGCTCGTCAGGGGGGCGG

AGCCTATGGAAAAACGCCAGCAACGCGGCCTTTTTACGGTTCCTG

GCCTTTTGCTGGCCTTTTGCTCACATGTTCTTTCCTGCGTTATCCC

CTGATTCTGTGGATAACCGTATTACCGCCTTTGAGTGAGCTGATAC

CGCTCGCCGCAGCCGAACGACCGAGCGCAGCGAGTCAGTGAGCG

AGGAAGCGGAAGAGCGCCCAATACGCAAACCGCCTCTCCCCGCG

CGTTGGCCGATTCATTAATGCAGCTGGCACGACAGGTTTCCCGAC

TGGAAAGCGGGCAGTGAGCGCAACGCAATTAATGTGAGTTAGCTC

ACTCATTAGGCACCCCAGGCTTTACACTTTATGCTTCCGGCTCGTA

TGTTGTGTGGAATTGTGAGCGGATAACAATTTCACACAGGAAACAG

CTATGACCATGATTACGCCAGATTTAATTAAGGCCTTAATTAGG

44
Plasmid P724: 5′ ITR at nucleotide positions 1-130; C-terminal STRC coding sequence at nucleotide positions 211-3440 (including overlap at nucleotide positions 211-710 with P959); WPRE at nucleotide positions 3452-3999; bGH poly(A) at nucleotide positions 4012-4219; 3′ ITR at nucleotide positions 4307-4436

CTGCGCGCTCGCTCGCTCACTGAGGCCGCCCGGGCAAAGCCCG

GGCGTCGGGCGACCTTTGGTCGCCCGGCCTCAGTGAGCGAGCG

AGCGCGCAGAGAGGGAGTGGCCAACTCCATCACTAGGGGTTCCT

TGTAGTTAATGATTAACCCGCCATGCTACTTATCTACGTAGCCATG

CTCTAGGAAGATCGGAATTCGCCCTTAAGCTAGCGCCTCGGCTCT

GGTATGAGCAAGATGGAGCTTCTGTCCTGCTTCAGTCCTGTACTG

TGGGATCTACTCCAGAGAGAGAAGAGCGTTTGGGCCCTGAGGAC

CCTGGTGAAGGCCTACCTGCGCATGCCTCCAGAAGACCTTCAGC

AGCTTGTGCTTTCAGCAGAGATGGAGGCTGCACAGGGCTTCCTG

ACGCTCATGCTTCGTTCCTGGGCTAAGCTGAAGGTTCAACCATCC

GAGGAGCAGGCCATGGGCCGCCTGACAGCCTTGCTGCTCCAGC

GGTACCCACGCCTCACCTCCCAACTCTTTATCGACATGTCACCGC

TCATCCCCTTCCTGGCTGTCCCTGACCTCATGCGCTTCCCACCGT

CCCTTTTGGCCAACGACAGTGTCCTGGCTGCCATCAGGGATCACA

GCTCAGGAATGAAGCCTGAACAGAAGGAGGCCCTGGCAAAACGA

CTGCTGGCCCCTGAGCTGTTTGGAGAAGTGCCTGATTGGCCCCA

GGAGCTGCTGTGGGCAGCCCTGCCTCTGCTTCCCCATCTGCCTC

TGGAGAGCTTTCTCCAGCTCAGCCCTCACCAGATCCAGGCCCTG

GAGGATAGCTGGCCAGTAGCAGATCTTGGGCCGGGACACGCCC

GACATGTGCTTCGTAGCCTAGTAAACCAGAGCATGGAGGATGGG

GAGGAGCAGGTGCTCAGGCTTGGGTCCCTCGCCTGTTTCCTGAG

TCCTGAGGAGCTACAGAGTCTGGTGCCCTTGAGTGATCCAATGG

GGCCTGTAGAACAGGGTCTGCTGGAATGTGCGGCCAATGGGACC

CTCAGCCCAGAAGGACGGGTGGCATATGAACTTCTGGGAGTGTT

GCGTTCATCTGGAGGAACTGTCTTAAGCCCCCGAGAGCTGAGGG

TCTGGGCACCTCTCTTTCCCCAGCTGGGCCTCCGCTTCCTGCAG

GAGCTCTCAGAGACCCAGCTTAGAGCCATGCTTCCTGCCCTACAG

GGAGCCAGTGTCACACCTGCCCAGGCTGTTCTGTTGTTTGGAAG

GCTCCTTCCTAAGCATGATCTGTCCCTGGAGGAACTCTGCTCCCT

GCACCCTCTCCTGCCAGGTCTCAGCCCCCAGACACTCCAGGCCA

TCCCTAAGAGAGTTCTGGTTGGTGCTTGTTCCTGCCTGGGCCCTG

AACTGTCAAGGCTTTCAGCTTGCCAGATTGCAGCTCTGCTGCAGA

CCTTTCGGGTAAAAGATGGTGTTAAAAATATGGGTGCAGCAGGTG

CCGGCTCAGCCGTGTGCATTCCTGGGCAGCCCACCACTTGGCCA

GACTGCCTGCTTCCCCTGCTCCCATTAAAGCTGCTACAGCTGGAC

GCTGCAGCTCTTCTGGCAAACCGAAGACTCTATCGGCAGCTGCCT

TGGTCTGAGCAACAGGCACAGTTTCTCTGGAAGAAAATGCAAGTG

CCTACCAACCTGAGCCTGAGGAATCTGCAGGCTCTGGGCAACTT

GGCAGGAGGCATGACCTGCGAGTTTCTGCAGCAGATCAGCTCAA

TGGTTGACTTTCTTGATGTGGTACACATGCTCTACCAGCTGCCCA

CTGGTGTTCGAGAGAGCCTGCGGGCCTGTATCTGGACAGAGCTA

CAGCGGAGGATGACAATGCCAGAGCCAGAGCTGACCACCCTAGG

GCCAGAACTGAGTGAACTTGACACAAAGCTACTCCTGGACTTGCC

GATCCAGCTGATGGACAGATTGTCCAATGATTCCATTATGTTGGT

GGTGGAGATGGTCCAAGGCGCTCCAGAGCAGCTGCTGGCACTGA

CCCCACTCCACCAGACAGCCTTGGCAGAGCGAGCACTTAAAAAC

CTGGCTCCAAAGGAGACCCCAATCTCCAAAGAAGTGCTGGAGAC

ACTGGGCCCCTTGGTTGGATTCCTGGGAATAGAGAGCACGCGAC

GGATCCCTTTACCCATTCTACTGTCTCATCTCAGTCAGCTGCAGG

GCTTCTGCCTAGGAGAGACATTTGCCACAGAGCTGGGATGGCTG

CTGTTGCAGGAGCCTGTTCTTGGAAAACCAGAATTGTGGAGCCAG

GATGAAATAGAGCAAGCTGGACGCCTAGTATTCACTCTGTCTGCT

GAGGCTATTTCCTCGATCCCCAGGGAGGCTTTGGGCCCAGAGAC

ACTGGAGAGGCTTCTGGGAAAGCATCAAAGCTGGGAGCAGAGCA

GAGTGGGCCATCTGTGTGGGGAGTCACAGCTTGCCCACAAGAAA

GCAGCTCTGGTAGCTGGGATTGTGCATCCAGCTGCTGAGGGTCT

CCAAGAGCCTGTACCAAACTGTGCAGACATACGGGGAACCTTCC

CAGCGGCCTGGTCTGCGACACAAATCTCAGAGATGGAACTCTCA

GACTTTGAAGACTGCCTGTCACTATTTGCTGGAGATCCAGGACTT

GGTCCTGAGGAACTACGGGCAGCCATGGGCAAGGCCAAGCAGTT

GTGGGGTCCCCCTCGAGGATTCCGTCCTGAGCAGATCTTGCAGC

TGGGCCGTCTCCTGATAGGTCTAGGAGAACGGGAACTGCAGGAG

CTTACCTTGGTGGACTGGGGTGTGCTGAGCAGCCTGGGGCAAAT

AGATGGCTGGAGTTCCATGCAGCTCCGAGCCGTGGTCTCCAGTT

TCCTAAGGCAGAGTGGTCGGCATGTGAGCCACCTGGACTTCATTT

ATCTGACAGCACTGGGTTACACAGTCTGTGGATTGCGACCAGAG

GAGTTACAGCACATCAGCAGTTGGGAGTTTAGCCAAGCAGCTCTC

TTCCTGGGTAGCTTGCATCTCCCGTGCTCTGAGGAACAGCTGGAA

GTTCTGGCCTATCTCCTTGTGTTGCCTGGTGGCTTTGGCCCAGTC

AGTAACTGGGGGCCTGAGATCTTCACTGAAATTGGCACAATAGCA

GCTGGCATCCCAGACCTGGCTCTTTCAGCATTACTGCGGGGACA

GATCCAAGGCCTGACTCCTCTTGCCATTTCTGTCATTCCTGCTCC

CAAGTTTGCAGTGGTCTTCAACCCCATCCAGTTATCTAGTCTCACC

AGGGGTCAGGCCGTAGCTGTTACTCCTGAACAGCTGGCCTATCT

GAGTCCTGAGCAGCGGCGAGCAGTTGCATGGGCCCAACACGAAG

GGAAGGAGATCCCAGAGCAGCTGGGTCGAAACTCAGCCTGGGGT

CTCTACGACTGGTTCCAAGCCTCCTGGGCCCTGGCATTGCCCGT

CAGCATTTTTGGCCACCTATTATGATAATAAGCTTGGATCCAATCA

ACCTCTGGATTACAAAATTTGTGAAAGATTGACTGGTATTCTTAAC

TATGTTGCTCCTTTTACGCTATGTGGATACGCTGCTTTAATGCCTT

TGTATCATGCTATTGCTTCCCGTATGGCTTTCATTTTCTCCTCCTT

GTATAAATCCTGGTTGCTGTCTCTTTATGAGGAGTTGTGGCCCGT

TGTCAGGCAACGTGGCGTGGTGTGCACTGTGTTTGCTGACGCAA

CCCCCACTGGTTGGGGCATTGCCACCACCTGTCAGCTCCTTTCC

GGGACTTTCGCTTTCCCCCTCCCTATTGCCACGGCGGAACTCATC

GCCGCCTGCCTTGCCCGCTGCTGGACAGGGGCTCGGCTGTTGG

GCACTGACAATTCCGTGGTGTTGTCGGGGAAATCATCGTCCTTTC

CTTGGCTGCTCGCCTGTGTTGCCACCTGGATTCTGCGCGGGACG

TCCTTCTGCTACGTCCCTTCGGCCCTCAATCCAGCGGACCTTCCT

TCCCGCGGCCTGCTGCCGGCTCTGCGGCCTCTTCCGCGTCTTCG

AGATCTGCCTCGACTGTGCCTTCTAGTTGCCAGCCATCTGTTGTT

TGCCCCTCCCCCGTGCCTTCCTTGACCCTGGAAGGTGCCACTCC

CACTGTCCTTTCCTAATAAAATGAGGAAATTGCATCGCATTGTCTG

AGTAGGTGTCATTCTATTCTGGGGGGTGGGGTGGGGCAGGACAG

CAAGGGGGAGGATTGGGAAGACAATAGCAGGCATGCTGGGGACT

CGAGTTAAGGGCGAATTCCCGATAAGGATCTTCCTAGAGCATGGC

TACGTAGATAAGTAGCATGGGGGGTTAATCATTAACTACAAGGAA

CCCCTAGTGATGGAGTTGGCCACTCCCTCTCTGCGCGCTCGCTC

GCTCACTGAGGCCGGGCGACCAAAGGTCGCCCGACGCCCGGGC

TTTGCCCGGGGGGCCTCAGTGAGCGAGCGAGCGCGCAGCCTTAA

TTAACCTAATTCACTGGCCGTCGTTTTACAACGTCGTGACTGGGA

AAACCCTGGCGTTACCCAACTTAATCGCCTTGCAGCACATCCCCC

TTTCGCCAGCTGGCGTAATAGCGAAGAGGCCCGCACCGATCGCC

CTTCCCAACAGTTGCGCAGCCTGAATGGCGAATGGGACGCGCCC

TGTAGCGGCGCATTAAGCGCGGCGGGTGTGGTGGTTACGCGCA

GCGTGACCGCTACACTTGCCAGCGCCCTAGCGCCCGCTCCTTTC

GCTTTCTTCCCTTCCTTTCTCGCCACGTTCGCCGGCTTTCCCCGT

CAAGCTCTAAATCGGGGGCTCCCTTTAGGGTTCCGATTTAGTGCT

TTACGGCACCTCGACCCCAAAAAACTTGATTAGGGTGATGGTTCA

CGTAGTGGGCCATCGCCCTGATAGACGGTTTTTCGCCCTTTGACG

TTGGAGTCCACGTTCTTTAATAGTGGACTCTTGTTCCAAACTGGAA

CAACACTCAACCCTATCTCGGTCTATTCTTTTGATTTATAAGGGAT

TTTGCCGATTTCGGCCTATTGGTTAAAAAATGAGCTGATTTAACAA

AAATTTAACGCGAATTTTAACAAAATATTAACGCTTACAATTTAGGT

GGCACTTTTCGGGGAAATGTGCGCGGAACCCCTATTTGTTTATTT

TTCTAAATACATTCAAATATGTATCCGCTCATGAGACAATAACCCT

GATAAATGCTTCAATAATATTGAAAAAGGAAGAGTATGAGCCATAT

TCAACGGGAAACGTCGAGGCCGCGATTAAATTCCAACATGGATGC

TGATTTATATGGGTATAAATGGGCTCGCGATAATGTCGGGCAATC

AGGTGCGACAATCTATCGCTTGTATGGGAAGCCCGATGCGCCAG

AGTTGTTTCTGAAACATGGCAAAGGTAGCGTTGCCAATGATGTTA

CAGATGAGATGGTCAGACTAAACTGGCTGACGGAATTTATGCCTC

TTCCGACCATCAAGCATTTTATCCGTACTCCTGATGATGCATGGTT

ACTCACCACTGCGATCCCCGGAAAAACAGCATTCCAGGTATTAGA

AGAATATCCTGATTCAGGTGAAAATATTGTTGATGCGCTGGCAGT

GTTCCTGCGCCGGTTGCATTCGATTCCTGTTTGTAATTGTCCTTTT

AACAGCGATCGCGTATTTCGTCTTGCTCAGGCGCAATCACGAATG

AATAACGGTTTGGTTGATGCGAGTGATTTTGATGACGAGCGTAAT

GGCTGGCCTGTTGAACAAGTCTGGAAAGAAATGCATAAACTTTTG

CCATTCTCACCGGATTCAGTCGTCACTCATGGTGATTTCTCACTTG

ATAACCTTATTTTTGACGAGGGGAAATTAATAGGTTGTATTGATGT

TGGACGAGTCGGAATCGCAGACCGATACCAGGATCTTGCCATCC

TATGGAACTGCCTCGGTGAGTTTTCTCCTTCATTACAGAAACGGC

TTTTTCAAAAATATGGTATTGATAATCCTGATATGAATAAATTGCAG

TTTCATTTGATGCTCGATGAGTTTTTCTAACTGTCAGACCAAGTTT

ACTCATATATACTTTAGATTGATTTAAAACTTCATTTTTAATTTAAAA

GGATCTAGGTGAAGATCCTTTTTGATAATCTCATGACCAAAATCCC

TTAACGTGAGTTTTCGTTCCACTGAGCGTCAGACCCCGTAGAAAA

GATCAAAGGATCTTCTTGAGATCCTTTTTTTCTGCGCGTAATCTGC

TGCTTGCAAACAAAAAAACCACCGCTACCAGCGGTGGTTTGTTTG

CCGGATCAAGAGCTACCAACTCTTTTTCCGAAGGTAACTGGCTTC

AGCAGAGCGCAGATACCAAATACTGTTCTTCTAGTGTAGCCGTAG

TTAGGCCACCACTTCAAGAACTCTGTAGCACCGCCTACATACCTC

GCTCTGCTAATCCTGTTACCAGTGGCTGCTGCCAGTGGCGATAAG

TCGTGTCTTACCGGGTTGGACTCAAGACGATAGTTACCGGATAAG

GCGCAGCGGTCGGGCTGAACGGGGGGTTCGTGCACACAGCCCA

GCTTGGAGCGAACGACCTACACCGAACTGAGATACCTACAGCGT

GAGCTATGAGAAAGCGCCACGCTTCCCGAAGGGAGAAAGGCGGA

CAGGTATCCGGTAAGCGGCAGGGTCGGAACAGGAGAGCGCACG

AGGGAGCTTCCAGGGGGAAACGCCTGGTATCTTTATAGTCCTGTC

GGGTTTCGCCACCTCTGACTTGAGCGTCGATTTTTGTGATGCTCG

TCAGGGGGGCGGAGCCTATGGAAAAACGCCAGCAACGCGGCCTT

TTTACGGTTCCTGGCCTTTTGCTGGCCTTTTGCTCACATGTTCTTT

CCTGCGTTATCCCCTGATTCTGTGGATAACCGTATTACCGCCTTT

GAGTGAGCTGATACCGCTCGCCGCAGCCGAACGACCGAGCGCA

GCGAGTCAGTGAGCGAGGAAGCGGAAGAGCGCCCAATACGCAAA

CCGCCTCTCCCCGCGCGTTGGCCGATTCATTAATGCAGCTGGCA

CGACAGGTTTCCCGACTGGAAAGCGGGCAGTGAGCGCAACGCAA

TTAATGTGAGTTAGCTCACTCATTAGGCACCCCAGGCTTTACACTT

TATGCTTCCGGCTCGTATGTTGTGTGGAATTGTGAGCGGATAACA

ATTTCACACAGGAAACAGCTATGACCATGATTACGCCAGATTTAAT

TAAGGCCTTAATTAGG
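The feature coordinates annotated for plasmids P959 (SEQ ID NO: 43) and P724 (SEQ ID NO: 44) in Table 4 can be tabulated to confirm feature lengths, for example that the annotated overlap spans the same number of nucleotides in both plasmids. The following Python sketch uses only the coordinates stated in Table 4. It assumes those positions are 1-based and inclusive, and the dictionary and function names are hypothetical.

```python
from typing import Dict, Tuple

# Coordinates as annotated in Table 4; assumed to be 1-based and inclusive.
P959_FEATURES: Dict[str, Tuple[int, int]] = {
    "5' ITR": (1, 130),
    "OCM promoter": (225, 1364),
    "N-terminal STRC coding sequence": (1375, 4574),
    "overlap with P724": (4075, 4574),
    "3' ITR": (4662, 4791),
}
P724_FEATURES: Dict[str, Tuple[int, int]] = {
    "5' ITR": (1, 130),
    "C-terminal STRC coding sequence": (211, 3440),
    "overlap with P959": (211, 710),
    "WPRE": (3452, 3999),
    "bGH poly(A)": (4012, 4219),
    "3' ITR": (4307, 4436),
}


def span_length(start: int, end: int) -> int:
    """Length of a feature given 1-based, inclusive start and end positions."""
    return end - start + 1


for plasmid, features in (("P959", P959_FEATURES), ("P724", P724_FEATURES)):
    for name, (start, end) in features.items():
        print(f"{plasmid}  {name}: {start}-{end} ({span_length(start, end)} nt)")
```

Under the inclusive-coordinate assumption, the annotated overlap is 500 nucleotides in each plasmid and each annotated ITR is 130 nucleotides.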

Trans-Splicing Dual Vectors

A second approach for expressing large proteins in mammalian cells involves the use of trans-splicing dual vectors. In this approach, two nucleic acid vectors are used that contain distinct nucleic acid sequences, and the polynucleotide encoding the N-terminal portion of the protein of interest and the polynucleotide encoding the C-terminal portion of the protein of interest do not overlap. Instead, the first nucleic acid vector includes a splice donor sequence 3′ of the polynucleotide encoding the N-terminal portion of the protein of interest, and the second nucleic acid vector includes a splice acceptor sequence 5′ of the polynucleotide encoding the C-terminal portion of the protein of interest. When the first and second nucleic acid vectors are present in the same cell, their ITRs can concatemerize, forming a single nucleic acid structure in which the concatemerized ITRs are positioned between the splice donor and splice acceptor. Trans-splicing then occurs during transcription, producing a nucleic acid molecule in which the polynucleotides encoding the N-terminal and C-terminal portions of the protein of interest are contiguous, thereby forming the full-length coding sequence.
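The order of elements described above can be followed with a small toy model. The Python sketch below tracks element order only: each vector is a list of labels, joining is modeled as head-to-tail list concatenation, and splicing is modeled as removing everything from the splice donor through the splice acceptor. It does not model transcription or RNA processing, and the function names and labels are hypothetical.

```python
from typing import List


def concatemerize(first_vector: List[str], second_vector: List[str]) -> List[str]:
    """Toy head-to-tail joining of two vector genomes at their ITRs."""
    return first_vector + second_vector


def remove_intervening_elements(elements: List[str]) -> List[str]:
    """Drop everything from the splice donor through the splice acceptor."""
    donor = elements.index("splice donor")
    acceptor = elements.index("splice acceptor")
    if acceptor < donor:
        raise ValueError("splice acceptor must lie 3' of the splice donor")
    return elements[:donor] + elements[acceptor + 1:]


first_vector = ["ITR", "OCM promoter", "N-terminal STRC coding sequence",
                "splice donor", "ITR"]
second_vector = ["ITR", "splice acceptor", "C-terminal STRC coding sequence",
                 "poly(A)", "ITR"]

joined = concatemerize(first_vector, second_vector)
spliced = remove_intervening_elements(joined)
print(spliced)
```

In the joined list the two internal ITR labels sit between the splice donor and the splice acceptor, and after the removal step the N-terminal and C-terminal coding labels are adjacent.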

Trans-splicing dual vectors for use in the methods and compositions described herein are designed such that approximately half of the stereocilin coding sequence is contained within each vector (e.g., each vector contains a polynucleotide that encodes approximately half of the stereocilin protein, as is discussed above). The determination of how to split the polynucleotide sequence between the two nucleic acid vectors is made based on the size of the promoter and the locations of sequence elements of interest in the polynucleotide that encodes the stereocilin protein (e.g., exons of the STRC gene). The first vector in the trans-splicing dual vector system can contain a promoter sequence 5′ of a polynucleotide encoding an N-terminal portion of a stereocilin protein. The nucleic acid vectors can optionally contain STRC UTRs (e.g., both the 5′ and 3′ STRC UTRs, e.g., full-length UTRs). One exemplary trans-splicing dual vector system for use in the compositions and methods described herein includes a first nucleic acid vector containing an OCM promoter (e.g., any one of SEQ ID NOs: 1-3) operably linked to a polynucleotide encoding an N-terminal portion of a stereocilin protein (e.g., an N-terminal portion of a human stereocilin protein, e.g., an N-terminal portion of SEQ ID NO: 4) and a splice donor sequence 3′ of the polynucleotide sequence; and a second nucleic acid vector containing a splice acceptor sequence 5′ of a polynucleotide encoding a C-terminal portion of the stereocilin protein (e.g., a C-terminal portion of human stereocilin, e.g., a C-terminal portion of SEQ ID NO: 4) and a poly(A) sequence. 
An alternative trans-splicing dual vector system includes a first nucleic acid vector containing an OCM promoter (e.g., any one of SEQ ID NOs: 1-3) operably linked to a polynucleotide encoding an N-terminal portion of the stereocilin protein (e.g., an N-terminal portion of a murine stereocilin protein, e.g., an N-terminal portion of SEQ ID NO: 5) and a splice donor sequence 3′ of the polynucleotide sequence; and a second nucleic acid vector containing a splice acceptor sequence 5′ of a polynucleotide encoding a C-terminal portion of the stereocilin protein (e.g., a C-terminal portion of a murine stereocilin protein, e.g., a C-terminal portion of SEQ ID NO: 5) and a poly(A) sequence. In some embodiments, the OCM promoter is a polynucleotide having the sequence of SEQ ID NO: 1 or a variant having at least 85% (e.g., at least 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more) sequence identity to SEQ ID NO: 1. In some embodiments, the OCM promoter is a polynucleotide having the sequence of SEQ ID NO: 2 or a variant having at least 85% (e.g., at least 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more) sequence identity to SEQ ID NO: 2. In some embodiments, the OCM promoter is a polynucleotide having the sequence of SEQ ID NO: 3 or a variant having at least 85% (e.g., at least 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more) sequence identity to SEQ ID NO: 3. These nucleic acid vectors can also contain full-length 5′ and/or 3′ STRC UTRs in the first and second nucleic acid vectors, respectively (e.g., the first nucleic acid vector can contain the 5′ human STRC UTR in dual vector systems encoding human stereocilin, or the 5′ mouse UTR in dual vector systems encoding mouse stereocilin; and the second nucleic acid vector can contain the 3′ human STRC UTR in dual vector systems encoding human stereocilin, or the 3′ mouse STRC UTR in dual vector systems encoding mouse stereocilin). 
To accommodate an STRC UTR, the stereocilin coding sequence can be divided at such a position as to accommodate the length of the promoter sequence and the sequence encoding the N-terminal portion of stereocilin.

In some embodiments, the polynucleotide that encodes a full-length human stereocilin protein has the sequence of SEQ ID NO: 6 or is a variant thereof having at least 85% (e.g., at least 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more) sequence identity to SEQ ID NO: 6. In some embodiments, the polynucleotide that encodes a full-length murine stereocilin protein has the sequence of SEQ ID NO: 7 or is a variant thereof having at least 85% (e.g., at least 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more) sequence identity to SEQ ID NO: 7.

Dual Hybrid Vectors

A third approach for expressing large proteins in mammalian cells involves the use of dual hybrid vectors. This approach combines elements of the overlapping dual vector strategy and the trans-splicing strategy in that it features both an overlapping region at which homologous recombination can occur and splice donor and splice acceptor sequences. In dual hybrid vector systems, the overlapping region is a recombinogenic region that is contained in both the first and second nucleic acid vectors, rather than a portion of the polynucleotide sequence encoding the protein of interest—the polynucleotide encoding the N-terminal portion of the protein of interest and the polynucleotide encoding the C-terminal portion of the protein of interest do not overlap in this approach. The recombinogenic region is 3′ of the splice donor sequence in the first nucleic acid vector and 5′ of the splice acceptor sequence in the second nucleic acid sequence. The first and second nucleic acid sequences can then join to form a single sequence based on one of two mechanisms: 1) recombination at the overlapping region, or 2) concatemerization of the ITRs. The remaining recombinogenic region(s) and/or the concatemerized ITRs can be removed by splicing, leading to the formation of a contiguous polynucleotide sequence that encodes the full-length protein of interest. Recombinogenic regions, splice donor sequences, and splice acceptor sequences that can be used in the compositions and methods described herein include those well-known to one of skill in the art. Exemplary recombinogenic regions include the F1 phage AK gene and alkaline phosphatase (AP) gene fragments as described in U.S. Pat. Nos. 10,494,645 and 8,236,557, which are incorporated herein by reference. In some embodiments, the AP gene fragment has the sequence of:

(SEQ ID NO: 47)

CCCCGGGTGCGCGGCGTCGGTGGTGCCGGGGGGGGCGCCAGGTCGCAGG

CGGTGTAGGGCTCCAGGCAGGCGGCGAAGGCCATGACGTGCGCTATGAA

GGTCTGCTCCTGCACGCCGTGAACCAGGTGCGCCTGCGGGCCGCGCGCG

AACACCGCCACGTCCTCGCCTGCGTGGGTCTCTTCGTCCAGGGGCACTG

CTGACTGCTGCCGATACTCGGGGCTCCCGCTCTCGCTCTCGGTAACATC

CGGCCGGGCGCCGTCCTTGAGCACATAGCCTGGACCGTTTCCGTATAGG

AGGACCGTGTAGGCCTTCCTGTCCCGGGCCTTGCCAGCGGCCAGCCCGA

TGAAGGAGCTCCCTCGCAGGGGGTAGCCTCCGAAGGAGAAGACGTGGGA

GTGGTCGGCAGTGACGAGGCTCAGCGTGTCCTCCTCGCTGGTGAGCTGG

CCCGCCCTCTCAATGGCGTCGTCGAACATGATCGTCTCAGTCAGTGCCC

GGTAAGCCCTGCTTTCATGATGACCATGGTCGATGCGACCACCCTCCAC

GAAGAGGAAGAAGCCGCGGGGGTGTCTGCTCAGCAGGCGCAGGGCAGCC

TCTGTCATCTCCATCAGGGAGGGGTCCAGTGTGGAGTCTCGGTGGATCT

CGTATTTCATGTCTCCAGGCTCAAAGAGACCCATGAGATGGGTCACAGA

CGGGTCCAGGGAAGCCTGCATGAGCTCAGTGCGGTTCCACACGTACCGG

GCACCCTGGCGTTCGCCGAGCCATTCCTGCACCAGATTCTTCCCGTCCA

GCCTGGTCCCACCTTGGCTGTAGTCATCTGGGTACTCAGGGTCTGGGGT

TCCCATGCGAAACATGTACTTTCGGCCTCCA.
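The two joining routes described for dual hybrid vectors (recombination at the shared recombinogenic region, or concatemerization of the ITRs) can be compared with a toy model of element order. In the Python sketch below, each vector is a list of labels and splicing is modeled as removal of everything from the splice donor through the splice acceptor. The model ignores transcription and all molecular detail, and the function names and labels are hypothetical.

```python
from typing import List

RR = "recombinogenic region"


def recombine_at_shared_region(first: List[str], second: List[str]) -> List[str]:
    """Route 1: join at the shared region, keeping a single copy of it."""
    return first[: first.index(RR)] + second[second.index(RR):]


def concatemerize(first: List[str], second: List[str]) -> List[str]:
    """Route 2: toy head-to-tail joining at the ITRs."""
    return first + second


def splice_out(elements: List[str]) -> List[str]:
    """Drop everything from the splice donor through the splice acceptor."""
    donor = elements.index("splice donor")
    acceptor = elements.index("splice acceptor")
    if acceptor < donor:
        raise ValueError("splice acceptor must lie 3' of the splice donor")
    return elements[:donor] + elements[acceptor + 1:]


first_vector = ["ITR", "OCM promoter", "N-terminal STRC coding sequence",
                "splice donor", RR, "ITR"]
second_vector = ["ITR", RR, "splice acceptor",
                 "C-terminal STRC coding sequence", "poly(A)", "ITR"]

via_recombination = splice_out(recombine_at_shared_region(first_vector, second_vector))
via_concatemerization = splice_out(concatemerize(first_vector, second_vector))
```

In this toy model both routes end with the same element order, in which the N-terminal and C-terminal coding labels are adjacent and no recombinogenic region or internal ITR remains between them.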