The present invention relates to improved nucleotide sequences and nucleic acids that encode peptide linkers.
The present invention also relates to nucleotide sequences and nucleic acids that encode (fusion) proteins and polypeptides that contain peptide linkers, which nucleotide sequences and nucleic acids contain such improved nucleotide sequences and nucleic acids that encode peptide linkers.
The present invention also relates to methods for expressing/producing (fusion) proteins and polypeptides containing peptide linkers, which involve the use of such improved nucleotide sequences and nucleic acids that encode peptide linkers.
Other aspects, embodiments, uses and advantages of the present invention will become clear from the further description herein.
The use of peptide linkers to link two or more proteins, peptides, peptide moieties, binding domains or binding units is well known in the art. One often-used class of peptide linkers is known as the “Gly-Ser” or “GS” linkers. These are linkers that essentially consist of glycine (G) and serine (S) residues, and usually comprise one or more repeats of a peptide motif such as the GGGGS motif (for example, they have the formula (Gly-Gly-Gly-Gly-Ser)n in which n may be 1, 2, 3, 4, 5, 6, 7 or more). Some often-used examples of such GS linkers are 15GS linkers (n=3) and 35GS linkers (n=7). Reference is for example made to Chen et al., Adv. Drug Deliv. Rev. 2013 Oct. 15; 65(10): 1357-1369; and Klein et al., Protein Eng. Des. Sel. (2014) 27 (10): 325-330.
Polypeptides and (fusion) proteins that comprise such GS linkers are often produced by suitably expressing a genetic construct that comprises two or more nucleotide sequences encoding the relevant peptide moieties to be linked, in which these nucleotide sequences encoding the peptide moieties are suitably and operably linked via one or more nucleotide sequences that encode the one or more GS linker(s), such that upon suitable expression in a suitable host cell or host organism, the desired fusion protein or polypeptide is obtained, optionally after suitable steps for isolation and/or purification. Some preferred, but non-limiting examples of such genetic constructs (using Nanobodies as representative examples of the peptides to be linked, see the legend to Table III) are shown schematically in
It is also generally known that, due to the degeneracy of the genetic code, in the nucleotide sequences that encode GS linkers, each one of four different codons may be used to encode a glycine residue, namely GGU (or GGT), GGC, GGA and/or GGG. It is similarly known that the serine residues in a GS linker may be encoded by a UCU (or TCT), UCC (or TCC), UCA (or TCA), UCG (or TCG), AGU (or AGT) and/or AGC codon.
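The degeneracy described above can be illustrated with a minimal Python sketch. The function names `split_codons` and `translate_gs` are hypothetical helpers introduced only for this illustration, and the two input sequences are made-up examples rather than sequences taken from the tables of this disclosure.

```python
# Codons for glycine and serine, written in DNA form (T instead of U).
GLY_CODONS = {"GGT", "GGC", "GGA", "GGG"}
SER_CODONS = {"TCT", "TCC", "TCA", "TCG", "AGT", "AGC"}


def split_codons(seq):
    """Split a coding sequence into codons, accepting DNA or RNA letters."""
    dna = seq.upper().replace("U", "T")
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def translate_gs(seq):
    """Translate a sequence that is expected to encode only Gly (G) and Ser (S)."""
    peptide = []
    for codon in split_codons(seq):
        if codon in GLY_CODONS:
            peptide.append("G")
        elif codon in SER_CODONS:
            peptide.append("S")
        else:
            raise ValueError(f"{codon} does not encode glycine or serine")
    return "".join(peptide)


# Two different (hypothetical) nucleotide sequences encoding the same motif.
first = translate_gs("GGAGGGGGAGGGTCT")
second = translate_gs("GGUGGCGGUGGCAGC")
print(first, second)
```

Both inputs translate to the same GGGGS motif, although the first uses only GGA and GGG codons for glycine and the second uses only GGU and GGC codons.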
It has now been found that improved nucleotide sequences encoding GS linkers may be provided by using an excess of GGA and GGG codons to encode the glycine residues in the GS linker (i.e. compared to the amount of GGT/GGU and/or GGC codons).
It has further been found that improved nucleotide sequences encoding GS linkers may be provided by using an excess of GGA, GGG, and GGT/GGU codons to encode the glycine residues in the GS linker (i.e. compared to the amount of GGC codons).
Thus, in a first aspect, the invention relates to a nucleotide sequence and/or a nucleic acid that encodes a GS linker (as further defined herein), in which more than 70%, preferably more than 85%, more preferably more than 90%, such as more than 95% and up to 99% and more (including 100%) of the codons that encode a glycine residue in the GS linker are either GGA, GGG or GGT/GGU.
In this aspect, the invention also relates to a nucleotide sequence and/or a nucleic acid that encodes a GS linker (as further defined herein), in which more than 70%, preferably more than 85%, more preferably more than 90%, such as more than 95% and up to 99% and more (including 100%) of the codons that encode a glycine residue in the GS linker are either GGA or GGG.
In this aspect, the invention also relates to a nucleotide sequence and/or a nucleic acid that encodes a GS linker (as further defined herein), in which less than 30%, preferably less than 15%, more preferably less than 10%, such as less than 5% and up to less than 1% or lower (including 0%) of the codons that encode a glycine residue in the GS linker are GGC.
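As a rough illustration of how the percentages recited in this aspect could be evaluated for a given linker-encoding sequence, the following Python sketch counts the in-frame glycine codons and reports the three proportions. The helper names (`glycine_codon_usage`, `summarize`) and the example sequence are assumptions made for this sketch only; the sequence is hypothetical and is not one of the sequences of Table II.

```python
from collections import Counter

GLY_CODONS = ("GGA", "GGG", "GGT", "GGC")


def glycine_codon_usage(seq):
    """Return the percentage of each glycine codon among all in-frame glycine codons."""
    dna = seq.upper().replace("U", "T")
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    counts = Counter(c for c in codons if c in GLY_CODONS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no glycine codons found")
    return {c: 100.0 * counts[c] / total for c in GLY_CODONS}


def summarize(seq):
    """Report the three proportions referred to in the text."""
    usage = glycine_codon_usage(seq)
    return {
        "GGA_GGG_GGT_percent": usage["GGA"] + usage["GGG"] + usage["GGT"],
        "GGA_GGG_percent": usage["GGA"] + usage["GGG"],
        "GGC_percent": usage["GGC"],
    }


# Hypothetical 15GS-encoding sequence with one GGC codon per repeat.
example = "GGAGGGGGAGGCTCT" * 3
summary = summarize(example)
print(summary)
```

In this made-up example, 9 of the 12 glycine codons are GGA or GGG (75%) and 3 are GGC (25%), so the sequence would meet the "more than 70%" and "less than 30%" thresholds but not the preferred "more than 85%" threshold.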
In a further aspect, the invention relates to a nucleotide sequence and/or a nucleic acid that encodes a peptide linker, in which more than 70%, preferably more than 85%, more preferably more than 90%, such as more than 95% and up to 99% and more (including 100%) of the codons that encode a glycine residue in said peptide linker are either GGA, GGG or GGT/GGU.
In this aspect, the invention also relates to a nucleotide sequence and/or a nucleic acid that encodes a peptide linker (as further described herein), in which the peptide linker encoded by said nucleotide sequence or nucleic acid comprises or essentially consists of glycine and serine residues, in which more than 70%, preferably more than 85%, more preferably more than 90%, such as more than 95% and up to 99% and more (including 100%) of the codons that encode a glycine residue in said peptide linker are either GGA or GGG.
In this aspect, the invention also relates to a nucleotide sequence and/or a nucleic acid that encodes a peptide linker, in which less than 30%, preferably less than 15%, more preferably less than 10%, such as less than 5% and up to less than 1% or lower (including 0%) of the codons that encode a glycine residue in said peptide linker are GGC.
As further described herein, the peptide linkers encoded by said nucleotide sequences or nucleic acids will generally comprise at least 5 amino acid residues and up to 50 amino acid residues or more (but in practice will usually comprise between 10 and 40 amino acid residues, such as about 15 amino acid residues to about 35 amino acid residues). Also, as further described herein, the peptide linkers encoded by said nucleotide sequences or nucleic acids will usually contain an excess of glycine residues compared to the number of serine residues, for example between 3 and 6 glycine residues for each serine residue. Also, often, the peptide linkers encoded by said nucleotide sequences or nucleic acids will contain one or more (such as two or more) repeats of a sequence motif.
In a further aspect, the invention relates to a nucleotide sequence and/or a nucleic acid that encodes a peptide linker (as further described herein), in which the peptide linker encoded by said nucleotide sequence or nucleic acid comprises or essentially consists of one or more (such as two or more) repeats of the sequence motif GGGGS (SEQ ID NO:1), in which more than 70%, preferably more than 85%, more preferably more than 90%, such as more than 95% and up to 99% and more (including 100%) of the codons that encode a glycine residue in said peptide linker are either GGA, GGG or GGT/GGU.
In this aspect, the invention also relates to a nucleotide sequence and/or a nucleic acid that encodes a peptide linker (as further described herein), in which the peptide linker encoded by said nucleotide sequence or nucleic acid comprises or essentially consists of one or more (such as two or more) repeats of the sequence motif GGGGS (SEQ ID NO:1), in which more than 70%, preferably more than 85%, more preferably more than 90%, such as more than 95% and up to 99% and more (including 100%) of the codons that encode a glycine residue in said peptide linker are either GGA or GGG.
In this aspect, the invention also relates to a nucleotide sequence and/or a nucleic acid that encodes a peptide linker (as further described herein), in which the peptide linker encoded by said nucleotide sequence or nucleic acid comprises or essentially consists of one or more (such as two or more) repeats of the sequence motif GGGGS (SEQ ID NO:1), in which less than 30%, preferably less than 15%, more preferably less than 10%, such as less than 5% and up to less than 1% or lower (including 0%) of the codons that encode a glycine residue in said peptide linker are GGC.
For example, in this aspect of the invention, the peptide linker encoded by said nucleotide sequence or nucleic acid may comprise or essentially consist of 2, 3, 4, 5, 6, 7, 8, 9 or 10 repeats of the sequence motif GGGGS.
In a further aspect, the invention relates to a nucleotide sequence and/or a nucleic acid that encodes a peptide linker (as further described herein), in which the peptide linker encoded by said nucleotide sequence or nucleic acid is of the formula (Gly-Gly-Gly-Gly-Ser)n (in which n may be 1, 2, 3, 4, 5, 6, 7 or more), in which more than 70%, preferably more than 85%, more preferably more than 90%, such as more than 95% and up to 99% and more (including 100%) of the codons that encode a glycine residue in said peptide linker are either GGA, GGG or GGT/GGU.
In this aspect, the invention also relates to a nucleotide sequence and/or a nucleic acid that encodes a peptide linker (as further described herein), in which the peptide linker encoded by said nucleotide sequence or nucleic acid is of the formula (Gly-Gly-Gly-Gly-Ser)n (in which n may be 1, 2, 3, 4, 5, 6, 7 or more), in which more than 70%, preferably more than 85%, more preferably more than 90%, such as more than 95% and up to 99% and more (including 100%) of the codons that encode a glycine residue in said peptide linker are either GGA or GGG.
In this aspect, the invention also relates to a nucleotide sequence and/or a nucleic acid that encodes a peptide linker (as further described herein), in which the peptide linker encoded by said nucleotide sequence or nucleic acid is of the formula (Gly-Gly-Gly-Gly-Ser)n (in which n may be 1, 2, 3, 4, 5, 6, 7 or more), in which less than 30%, preferably less than 15%, more preferably less than 10%, such as less than 5% and up to less than 1% or lower (including 0%) of the codons that encode a glycine residue in said peptide linker are GGC.
For example, in this aspect of the invention, the peptide linker encoded by said nucleotide sequence or nucleic acid may comprise or essentially consist of 2, 3, 4, 5, 6, 7, 8, 9 or 10 repeats of the sequence motif GGGGS.
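By way of a non-limiting illustration, a DNA sequence encoding a linker of the formula (Gly-Gly-Gly-Gly-Ser)n without any GGC codons could be assembled as in the following Python sketch. The function name `build_gs_linker_dna`, the alternation between GGA and GGG, and the choice of TCT for serine are all assumptions of this sketch; the sequences actually exemplified in this disclosure may use a different codon order.

```python
from itertools import cycle


def build_gs_linker_dna(n, gly_codons=("GGA", "GGG"), ser_codon="TCT"):
    """Return DNA encoding (Gly-Gly-Gly-Gly-Ser)n, cycling through gly_codons."""
    if n < 1:
        raise ValueError("n must be at least 1")
    gly = cycle(gly_codons)
    codons = []
    for _ in range(n):
        # Four glycine codons followed by one serine codon per repeat.
        codons.extend(next(gly) for _ in range(4))
        codons.append(ser_codon)
    return "".join(codons)


linker_15gs = build_gs_linker_dna(3)  # 15 codons, i.e. a 15GS linker
linker_35gs = build_gs_linker_dna(7)  # 35 codons, i.e. a 35GS linker
print(linker_15gs)
print(len(linker_15gs) // 3, len(linker_35gs) // 3)
```

Passing `gly_codons=("GGA", "GGG", "GGT")` instead would give a sequence in which the glycine residues are encoded by GGA, GGG and GGT codons, again without GGC.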
In a further aspect, the invention relates to a nucleotide sequence and/or a nucleic acid of the general formula
(Ax-Bp-Ay-Bq)n,
in which:
In a further aspect, the invention relates to a nucleotide sequence and/or a nucleic acid of the general formula
(Ax-B)n,
in which:
In a further aspect, the invention relates to a nucleotide sequence and/or a nucleic acid of one of the formulas shown in Table I, in which:
Generally, the nucleotide sequences and nucleic acids described herein which encode Gly-Ser linkers and in which the glycine residues in said GS linkers are predominantly or exclusively encoded by GGA, GGG, or GGT/GGU codons are also referred to herein as “GS linker-encoding sequence(s) of the invention”. Generally, the nucleotide sequences and nucleic acids described herein which encode Gly-Ser linkers and in which the glycine residues in said GS linkers are predominantly or exclusively encoded by GGA or GGG codons are also referred to herein as “GS linker-encoding sequence(s) of the invention”. Generally, the nucleotide sequences and nucleic acids described herein which encode Gly-Ser linkers and in which almost none or not any of the glycine residues in said GS linkers are encoded by GGC codons are also referred to herein as “GS linker-encoding sequence(s) of the invention”.
In one preferred but non-limiting aspect of the invention, more than 95%, and up to 99% or more (and including 100%) of the codons that encode a glycine residue in a GS linker-encoding sequence of the invention are either GGA, GGG, or GGT/GGU.
In one preferred but non-limiting aspect of the invention, more than 95%, and up to 99% or more (and including 100%) of the codons that encode a glycine residue in a GS linker-encoding sequence of the invention are either GGA or GGG.
In one preferred but non-limiting aspect of the invention, less than 5%, and up to less than 1% or lower (and including 0%) of the codons that encode a glycine residue in a GS linker-encoding sequence of the invention are GGC. Table II gives some representative, but non-limiting, examples of GS linker-encoding sequence(s) of the invention. Other examples of GS linker-encoding sequence(s) of the invention will be clear to the skilled person based on the disclosure herein.
Without being limited to any specific explanation, hypothesis or mechanism, it is assumed that the use of such nucleotide sequences (i.e. compared to the use of nucleotide sequences encoding GS linkers that contain a greater amount/proportion of GGU and/or GGC codons; or compared to the use of nucleotide sequences encoding GS linkers that contain a greater amount/proportion of GGC codons) reduces the risk of aspartate residues being erroneously included in the desired GS linkers (instead of the intended glycine residues) and/or reduces the amount of aspartate residues that, upon expression in a suitable host or host organism, are erroneously included in the desired GS linkers.
Thus, when used in the expression and/or production of fusion proteins or polypeptides, the invention also reduces the amount of contaminants that is obtained in the expressed product (i.e. contaminants that contain GS linkers with one or more aspartate residues instead of the intended glycine residues) and also reduces deleterious effects associated with the unwanted presence of aspartate residues in the desired GS linkers, such as undesired isomerization into iso-aspartate, as well as increased susceptibility to proteolytic degradation.
Thus in another aspect, the invention relates to a nucleotide sequence and/or a nucleic acid that encodes a (fusion) protein or fusion polypeptide, in which the fusion protein or polypeptide that is encoded by said nucleotide sequence and/or a nucleic acid comprises two or more peptide moieties that are suitably linked via one or more GS linkers, in which the one or more GS linkers are encoded by one or more GS linker-encoding sequence(s) of the invention (i.e. by a nucleotide sequence or nucleic acid in which more than 70%, preferably more than 85%, more preferably more than 90%, such as more than 95% and up to 99% and more (including 100%) of the codons that encode a glycine residue in the GS linker are either GGA, GGG, or GGT/GGU).
In this aspect, the invention also relates to a nucleotide sequence and/or a nucleic acid that encodes a (fusion) protein or fusion polypeptide, in which the fusion protein or polypeptide that is encoded by said nucleotide sequence and/or a nucleic acid comprises two or more peptide moieties that are suitably linked via one or more GS linkers, in which the one or more GS linkers are encoded by one or more GS linker-encoding sequence(s) of the invention (i.e. by a nucleotide sequence or nucleic acid in which more than 70%, preferably more than 85%, more preferably more than 90%, such as more than 95% and up to 99% and more (including 100%) of the codons that encode a glycine residue in the GS linker are either GGA or GGG).
In this aspect, the invention also relates to a nucleotide sequence and/or a nucleic acid that encodes a (fusion) protein or fusion polypeptide, in which the fusion protein or polypeptide that is encoded by said nucleotide sequence and/or a nucleic acid comprises two or more peptide moieties that are suitably linked via one or more GS linkers, in which the one or more GS linkers are encoded by one or more GS linker-encoding sequence(s) of the invention (i.e. by a nucleotide sequence or nucleic acid in which less than 30%, preferably less than 15%, more preferably less than 10%, such as less than 5% and up to less than 1% or lower (including 0%) of the codons that encode a glycine residue in the GS linker are GGC).
In another aspect, the invention relates to a nucleotide sequence and/or a nucleic acid that encodes a (fusion) protein or fusion polypeptide, in which the fusion protein or polypeptide that is encoded by said nucleotide sequence and/or a nucleic acid comprises two or more peptide moieties that are suitably linked via one or more GS linkers, in which the part(s) of the nucleotide sequence or nucleic acid that encode(s) the GS linker(s) are one or more GS linker-encoding sequence(s) of the invention (i.e. nucleotide sequences or nucleic acids in which more than 70%, preferably more than 85%, more preferably more than 90%, such as more than 95% and up to 99% and more (including 100%) of the codons that encode a glycine residue in the GS linker are either GGA, GGG, or GGT/GGU).
In this aspect, the invention also relates to a nucleotide sequence and/or a nucleic acid that encodes a (fusion) protein or fusion polypeptide, in which the fusion protein or polypeptide that is encoded by said nucleotide sequence and/or a nucleic acid comprises two or more peptide moieties that are suitably linked via one or more GS linkers, in which the part(s) of the nucleotide sequence or nucleic acid that encode(s) the GS linker(s) are one or more GS linker-encoding sequence(s) of the invention (i.e. nucleotide sequences or nucleic acids in which more than 70%, preferably more than 85%, more preferably more than 90%, such as more than 95% and up to 99% and more (including 100%) of the codons that encode a glycine residue in the GS linker are either GGA or GGG).
In this aspect, the invention also relates to a nucleotide sequence and/or a nucleic acid that encodes a (fusion) protein or fusion polypeptide, in which the fusion protein or polypeptide that is encoded by said nucleotide sequence and/or a nucleic acid comprises two or more peptide moieties that are suitably linked via one or more GS linkers, in which the part(s) of the nucleotide sequence or nucleic acid that encode(s) the GS linker(s) are one or more GS linker-encoding sequence(s) of the invention (i.e. nucleotide sequences or nucleic acids in which less than 30%, preferably less than 15%, more preferably less than 10%, such as less than 5% and up to less than 1% or lower (including 0%) of the codons that encode a glycine residue in the GS linker are GGC).
More generally, in another aspect, the invention relates to a nucleotide sequence or nucleic acid that comprises or contains one or more GS linker-encoding sequence(s) of the invention. Such a nucleotide sequence or nucleic acid is preferably such that, upon expression in a suitable host cell or host organism, it expresses a (fusion) protein or polypeptide that comprises at least one GS linker (i.e. a GS linker encoded by a GS linker-encoding sequence of the invention).
In another aspect, the invention relates to a method for expressing or producing a (fusion) protein or polypeptide, in which said (fusion) protein or polypeptide comprises two or more peptide moieties that are suitably linked via one or more GS linkers, which method comprises suitably expressing, in a suitable host cell or host organism, a nucleotide sequence and/or a nucleic acid encoding said (fusion) protein or polypeptide, in which said nucleotide sequence and/or a nucleic acid comprises or contains one or more GS linker-encoding sequence(s) of the invention (and further is as described herein). Said method may further comprise the optional step of isolating/purifying the (fusion) protein or polypeptide thus expressed.
In another aspect, the invention relates to a host cell or host organism that comprises a nucleotide sequence and/or a nucleic acid that encodes a (fusion) protein or polypeptide that comprises one or more GS linkers, in which said nucleotide sequence and/or nucleic acid comprises or contains one or more GS linker-encoding sequence(s) of the invention (and further is as described herein).
In another aspect, the invention relates to a method for expressing or producing a (fusion) protein or polypeptide, in which said (fusion) protein or polypeptide comprises two or more peptide moieties that are suitably linked via one or more GS linkers, which method comprises cultivating a suitable host cell or host organism that comprises a nucleotide sequence and/or nucleic acid that comprises or contains one or more GS linker-encoding sequence(s) of the invention (and that further is as described herein), under conditions such that said host cell or host organism expresses/produces said (fusion) protein or polypeptide (in which said fusion protein or polypeptide comprises one or more GS linkers, i.e. as encoded by the GS linker-encoding sequence(s) of the invention). Said method may further comprise the optional step of isolating/purifying the (fusion) protein or polypeptide thus expressed.
In a further aspect, the invention relates to a (fusion) protein or polypeptide (and in particular, to a (fusion) protein or polypeptide comprising one or more GS linkers) that has been obtained by expression, in a suitable host cell or host organism, of a nucleotide sequence or nucleic acid encoding said (fusion) protein or polypeptide, in which said nucleotide sequence or nucleic acid contains or comprises one or more GS linker-encoding sequence(s) of the invention (and is as further described herein).
In a further aspect, the invention provides a method for reducing the level of Gly to Asp misincorporation in a peptide linker (such as e.g. a GS linker), said method comprising the step of replacing, in the nucleic acid sequence and/or nucleic acid that encodes said peptide linker, at least one GGC codon with a GGG, GGA or GGT/GGU codon.
In this aspect, the invention also provides a method for reducing the level of Gly to Asp misincorporation in a peptide linker (such as e.g. a GS linker), said method comprising the step of replacing, in the nucleic acid sequence and/or nucleic acid that encodes said peptide linker, at least one GGC codon with a GGG or GGA codon.
In a further aspect, the invention provides a method for reducing the level of Gly to Asp misincorporation in a peptide linker (such as e.g. a GS linker) present in a multivalent (such as bivalent, trivalent, tetravalent) immunoglobulin single variable domain or Nanobody, said method comprising the step of replacing, in the nucleic acid sequence and/or nucleic acid that encodes said peptide linker, at least one GGC codon with a GGG, GGA or GGT/GGU codon.
In this aspect, the invention also provides a method for reducing the level of Gly to Asp misincorporation in a peptide linker (such as e.g. a GS linker) present in a multivalent (such as bivalent, trivalent, tetravalent) immunoglobulin single variable domain or Nanobody, said method comprising the step of replacing, in the nucleic acid sequence and/or nucleic acid that encodes said peptide linker, at least one GGC codon with a GGG or GGA codon.
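The codon replacement step of the above methods can be sketched at the sequence level as follows. This Python example is only an in-silico illustration under stated assumptions: the function name `replace_ggc_codons` is hypothetical, the input sequence is made up, and the sketch replaces every in-frame GGC codon, whereas the methods described herein require that at least one GGC codon be replaced.

```python
def replace_ggc_codons(seq, replacement="GGA"):
    """Replace every in-frame GGC codon with another glycine codon."""
    if replacement not in {"GGA", "GGG", "GGT"}:
        raise ValueError("replacement must be GGA, GGG or GGT")
    dna = seq.upper().replace("U", "T")
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    # Only codons read in frame are changed; the serine codon AGC is untouched.
    return "".join(replacement if c == "GGC" else c for c in codons)


# Hypothetical sequence encoding GGGGS with two GGC codons.
original = "GGCGGTGGCGGAAGC"
improved = replace_ggc_codons(original, "GGG")
print(original, "->", improved)
```

Working codon by codon, rather than with a plain string substitution, avoids altering a GGC triplet that happens to span two adjacent codons, so the encoded amino acid sequence stays the same.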
The nucleotide sequences and nucleic acids described herein may be DNA or RNA (and are preferably double stranded DNA) and may be in the form of a genetic construct (for example in the form of a suitable vector, such as an expression vector). Such a genetic construct may for example, besides the nucleotide sequence encoding the (fusion) protein or polypeptide, comprise one or more suitable elements for expression of said nucleotide sequence, such as a suitable promoter, a suitable translation initiation sequence such as a ribosomal binding site and start codon, a suitable termination codon, and a suitable transcription termination sequence, 3′- or 5′-UTR sequences, leader sequences, selection markers, expression markers/reporter genes, and/or elements that may facilitate or increase (the efficiency of) transformation or integration, all suitably (and where appropriate, operably) linked to the nucleotide sequence encoding the (fusion) protein or polypeptide. Suitable examples of such elements will be clear to the skilled person and may for example depend upon the host or host cell in which said (expression) vector is to be expressed.
The genetic constructs described herein may also be in a form suitable for transformation of the intended host cell or host organism, in a form suitable for integration into the genomic DNA of the intended host cell or in a form suitable for independent replication, maintenance and/or inheritance in the intended host organism. For instance, the genetic constructs described herein may be in the form of a vector, such as for example a plasmid, cosmid, YAC, a viral vector or transposon. In particular, the vector may be an expression vector, i.e. a vector that can provide for expression in vitro and/or in vivo (e.g. in a suitable host cell, host organism and/or expression system). Such genetic constructs and (expression) vectors form further aspects of the invention.
Preferably, the regulatory and further elements of the genetic constructs described herein are such that they are capable of providing their intended biological function in the intended host cell or host organism.
For instance, a promoter, enhancer or terminator should be “operable” in the intended host cell or host organism, by which is meant that (for example) said promoter should be capable of initiating or otherwise controlling/regulating the transcription and/or the expression of a nucleotide sequence—e.g. a coding sequence—to which it is operably linked (as defined herein).
Some particularly preferred promoters include, but are not limited to, promoters known per se for the expression in the host cells mentioned herein; and in particular promoters for the expression in the bacterial cells, such as those mentioned herein.
A selection marker should be such that it allows—i.e. under appropriate selection conditions—host cells and/or host organisms that have been (successfully) transformed with a nucleotide sequence (as described herein) to be distinguished from host cells/organisms that have not been (successfully) transformed. Some preferred, but non-limiting examples of such markers are genes that provide resistance against antibiotics (such as kanamycin or ampicillin), genes that provide for temperature resistance, or genes that allow the host cell or host organism to be maintained in the absence of certain factors, compounds and/or (food) components in the medium that are essential for survival of the non-transformed cells or organisms.
A leader sequence should be such that—in the intended host cell or host organism—it allows for the desired post-translational modifications and/or such that it directs the transcribed mRNA to a desired part or organelle of a cell. A leader sequence may also allow for secretion of the expression product from said cell. As such, the leader sequence may be any pro-, pre-, or prepro-sequence operable in the host cell or host organism. Leader sequences may not be required for expression in a bacterial cell. For example, leader sequences known per se for the expression and production of antibodies and antibody fragments (including but not limited to single domain antibodies and ScFv fragments) may be used in an essentially analogous manner.
An expression marker or reporter gene should be such that—in the host cell or host organism—it allows for detection of the expression of (a gene or nucleotide sequence present on) the genetic construct. An expression marker may optionally also allow for the localisation of the expressed product, e.g. in a specific part or organelle of a cell and/or in (a) specific cell(s), tissue(s), organ(s) or part(s) of a multicellular organism. Such reporter genes may also be expressed as a protein fusion with the encoded amino acid sequence. Some preferred, but non-limiting examples include fluorescent proteins such as GFP.
Some preferred, but non-limiting examples of suitable promoters, terminators and further elements include those that can be used for the expression in the host cells mentioned herein; and in particular those that are suitable for expression in bacterial cells, such as those mentioned herein. For some (further) non-limiting examples of the promoters, selection markers, leader sequences, expression markers and further elements that may be present/used in the genetic constructs described herein (such as terminators, transcriptional and/or translational enhancers and/or integration factors), reference is made to the general handbooks such as Sambrook et al., “Molecular Cloning: A Laboratory Manual” (2nd Ed.), Vols. 1-3, Cold Spring Harbor Laboratory Press (1989); F. Ausubel et al., eds., “Current Protocols in Molecular Biology”, Green Publishing and Wiley Interscience, New York (1987), as well as to the examples that are given in WO 95/07463, WO 96/23810, WO 95/21191, WO 97/11094, WO 97/42320, WO 98/06737, WO 98/21355, U.S. Pat. Nos. 7,207,410, 5,693,492 and EP 1 085 089. Other examples will be clear to the skilled person. Reference is also made to the general background art cited above and the further references cited herein.
Techniques for generating the nucleotide sequences, nucleic acids and genetic constructs described herein will be clear to the skilled person and may for instance include, but are not limited to, automated DNA synthesis. The genetic constructs described herein may also generally be provided by suitably linking the nucleotide sequence(s) described herein to the one or more further elements described above. Often, the genetic constructs described herein will be obtained by inserting a nucleotide sequence or nucleic acid as described herein in a suitable (expression) vector known per se. These and other techniques will be clear to the skilled person, and reference is again made to the standard handbooks, such as Sambrook et al. and Ausubel et al., mentioned above.
The nucleic acids described herein and/or the genetic constructs described herein may be used to transform a host cell or host organism, i.e. for expression and/or production of the encoded (fusion) protein or polypeptide. Suitable hosts or host cells will be clear to the skilled person, and may for example be any suitable fungal, prokaryotic or eukaryotic cell or cell line or any suitable fungal, prokaryotic or eukaryotic organism, for example:
Some preferred expression hosts are Pichia pastoris and human cell lines used for the expression/production of therapeutic proteins.
The term “GS linkers” as used herein generally refers to peptide linkers that are comprised of and/or essentially consist of glycine and serine residues.
Generally, such GS linkers (as well as other peptide linkers referred to herein) will contain at least 5 amino acid residues, such as about 10 amino acid residues, about 15 amino acid residues, about 20 amino acid residues, about 25 amino acid residues, about 35 amino acid residues, and up to 50 amino acid residues or more (although usually, linkers comprising about 10 to 40 amino acid residues, such as about 15 to about 35 amino acid residues, will often be used in practice).
Usually, such linkers will contain an excess of glycine residues compared to the number of serine residues, for example between 3 and 6 glycine residues for each serine residue. Usually also, such linkers will contain one or more (such as two or more) repeats of a sequence motif. Also, although in the invention in its broadest sense, the presence of one or more other amino acids (such as a glutamic acid residue, or a threonine residue instead of a serine residue) is not excluded, the linkers used herein preferably only contain (or are intended to only contain) glycine and serine residues.
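The general features of a GS linker set out above (length, glycine-to-serine ratio, repeats of a sequence motif, and the absence of other residues) can be checked for a given amino acid sequence with a short Python sketch such as the one below. The function name `describe_linker` and the dictionary keys are hypothetical and serve only this illustration.

```python
def describe_linker(peptide, motif="GGGGS"):
    """Summarize the features of a linker given in one-letter amino acid code."""
    peptide = peptide.upper()
    gly = peptide.count("G")
    ser = peptide.count("S")
    repeats, rest = divmod(len(peptide), len(motif))
    is_pure_repeat = rest == 0 and peptide == motif * repeats
    return {
        "length": len(peptide),
        "only_gly_ser": gly + ser == len(peptide),
        "gly_per_ser": gly / ser if ser else None,
        "motif_repeats": repeats if is_pure_repeat else None,
    }


info = describe_linker("GGGGS" * 3)  # a 15GS linker
print(info)
```

For the 15GS linker used here, the sketch reports 15 residues, only glycine and serine, 4 glycine residues per serine residue and 3 repeats of the GGGGS motif, which falls within the ranges given above.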
As will be clear to the skilled person, the GS linkers that are most commonly used in the art of protein engineering (and which are also preferred in the practice of the present invention) are linkers that comprise one or more repeats of the GGGGS (SEQ ID NO: 1) motif, i.e. linkers of the general formula (Gly-Gly-Gly-Gly-Ser)n, in which n may be 1, 2, 3, 4, 5, 6, 7 or more. Some examples are 15GS linkers (n=3) and 35GS linkers (n=7). Reference is for example made to Chen et al., Adv. Drug Deliv. Rev. 2013 Oct. 15; 65(10): 1357-1369; and Klein et al., Protein Eng. Des. Sel. (2014) 27 (10): 325-330.
The GS linkers encoded by the GS linker-encoding sequence(s) of the invention can be used to link together, in a suitable manner, any desired proteins, peptides, peptide moieties, binding domains or binding units, so as to form a (fusion) protein or polypeptide in which two or more of such proteins, peptides, peptide moieties, binding domains or binding units are linked together by one or more GS linkers. Generally, and as will be clear to the skilled person, the GS linkers encoded by the GS linker-encoding sequence(s) of the invention can be used for any purpose for which GS linkers can be used and/or have been used in the prior art. Such uses and applications of the GS linker-encoding sequence(s) of the invention (and of the GS linkers encoded by the same) will be clear to the skilled person.
In one specific aspect, the GS linkers encoded by the GS linker-encoding sequence(s) of the invention can suitably be used to link together two or more immunoglobulin single variable domains (such as two or more Nanobodies, e.g. VHH's, humanized VHH's, sequence-optimized VHH's, or camelized VH's, such as camelized human VH's), to form bivalent, trivalent, bispecific, trispecific, biparatopic, tetravalent, or other suitable ISVD constructs. Reference is for example made to the various applications by Ablynx N.V., such as for example and without limitation WO 2004/062551, WO 2006/122825, WO 2008/020079 and WO 2009/068627. The GS linkers may for example also be used to link one or more immunoglobulin single variable domains or Nanobodies against a therapeutic target to an immunoglobulin single variable domain or Nanobody that provides for increased half-life (e.g. increased t1/2-beta), such as an immunoglobulin single variable domain or Nanobody against serum albumin. Again, in these uses or applications, the GS linker-encoding sequence(s) of the invention (and GS linkers encoded by the same) can be used in essentially the same way as known nucleotide sequences that encode GS linkers. Some specific but non-limiting examples of such immunoglobulin single variable domain or Nanobody constructs are schematically shown in Table III, and nucleic acids encoding these constructs are also schematically shown in Figure I (the legend of Table III applies). Other examples will be clear to the skilled person based on the disclosure herein.
The invention will now be further described by means of the following non-limiting preferred aspects, examples and figures, in which:
The entire contents of all of the references (including literature references, issued patents, published patent applications, and co-pending patent applications) cited throughout this application are hereby expressly incorporated by reference, in particular for the teaching that is referenced hereinabove.
In this Example, the invention will be illustrated using, as a non-limiting example, a tetravalent Nanobody construct consisting of four sequence optimized variable domains of a heavy-chain llama antibody, which are fused head-to-tail with 35GS linkers (see
[A]-[35GS linker]-[B]-[35GS linker]-[C]-[35GS linker]-[C]
in which [A], [B] and [C] represent three different Nanobodies and [35GS linker] represents a 35GS linker (see also
DNA fragments containing the coding information of Nanobody Construct A were cloned into the multiple cloning site of a Pichia expression vector that contains a Zeocin™ resistance gene (a derivative of the original pPpT4_Alpha_S expression vector described by Näätsaari et al., PLoS One. 2012; 7(6):e39720), such that the Nanobody® sequence was downstream of and in frame with the alpha Mating Factor (aMF) signal peptide sequence.
Transformation of the Nanobody Construct A Coding Sequence, Expression and Secretion of the Construct in Pichia pastoris
Transformation and expression studies were performed in the Pichia strain NRRL Y-11430 (ARS Patent Culture Collection, 1815 North University St., Peoria). This WT strain was used to make a derivative strain overexpressing the endogenous Pichia auxiliary protein KAR2 (GeneID:8198455) as well as Nanobody Construct A. Both Nanobody Construct A and KAR2 were under the control of the methanol-inducible AOX1 promoter. Transformation was performed by standard techniques and in accordance with the standard handbooks (see for example Methods In Molecular Biology 2007, Humana Press Inc.). Transformants were grown on selective medium containing Zeocin, and a number of individual colonies were selected and evaluated for the expression level of Nanobody Construct A in 5 mL shake-flask cultures in BMCM medium, induced by the addition of methanol as described in Pichia protocols (see again the standard handbooks). The best expressing clone was used in standard fed-batch fermentation. Glycerol fed batches were performed and induction was initiated by the addition of methanol. The productions were performed at 2 L scale at pH 6 and 30° C. in complex medium with a methanol feed rate of 4 mL/L*h.
Purification of the Nanobody Construct A after Fed-Batch Fermentation
Nanobody Construct A was purified as follows: after fermentation, part of the cell broth was clarified via a 750 kDa hollow fiber, followed by a capture step using a CIEX Poros XS resin, a polish step using CIEX Nuvia HR-S resin and a flow-through step on an AIEX Sartobind STIC PA. Finally, a concentration and buffer exchange step was performed via UF/DF using the Hydrosart 10 kDa membrane.
The purified Nanobody Construct A was analyzed by strong cation exchange chromatography using a pH gradient (pH-IEX). The chromatogram, shown in
A 58 Dalton mass difference can be explained by the exchange of glycine with the acidic amino acid aspartic acid.
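The 58 Dalton figure can be checked with a short calculation. The following Python sketch uses standard monoisotopic atomic masses and the elemental compositions of the glycine and aspartic acid residues (i.e. the amino acids minus one water, as they occur within a peptide chain). The names used are hypothetical and serve only to illustrate the arithmetic.

```python
# Standard monoisotopic atomic masses (Da).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}

# Elemental composition of each residue within a peptide chain.
RESIDUE_FORMULA = {
    "G": {"C": 2, "H": 3, "N": 1, "O": 1},  # glycine residue, C2H3NO
    "D": {"C": 4, "H": 5, "N": 1, "O": 3},  # aspartic acid residue, C4H5NO3
}


def residue_mass(aa: str) -> float:
    """Monoisotopic mass of an amino acid residue, summed from its atoms."""
    return sum(MONOISOTOPIC[el] * n for el, n in RESIDUE_FORMULA[aa].items())


# Mass increment expected for a single Gly -> Asp exchange.
delta = residue_mass("D") - residue_mass("G")
print(f"Gly: {residue_mass('G'):.3f} Da")
print(f"Asp: {residue_mass('D'):.3f} Da")
print(f"Gly->Asp increment: {delta:.3f} Da")
```

The increment corresponds to the net addition of C2H2O2 (the carboxymethyl side chain of aspartic acid replacing the hydrogen side chain of glycine), which rounds to 58 Dalton.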
Analysis and Identification of Acidic Variants by Peptide Map Reversed Phase UHPLC Coupled with Mass Spectrometry (RP-UHPLC-MS)
Peptide map analysis (after trypsin digest) of the acidic variants fraction of Nanobody Construct A resulted in identification of two peptides with a mass increment of 58 Dalton. As schematically shown in
As collision-induced fragmentation in the mass spectrometer led to only partial sequence coverage of the T10 peptide, the T10 peptide of the trypsin digest was fractionated by reversed phase chromatography and subsequently digested with the enzyme Asp-N. Asp-N is an endoproteinase that hydrolyses peptide bonds on the N-terminal side of aspartic acid residues. Because the sequence of this peptide contains no aspartic acid residues, cleavages were only expected in case of Gly->Asp misincorporation events. In the analysis of the Asp-N digest of the T10 peptide by RP-UHPLC-MS, different fragments were identified with a mass corresponding to fragments of the T10 peptide with a mass increment of 58 Dalton. In total 9 Asp-N fragmentation sites were identified, as shown in
As mentioned, the peptide map analysis of Nanobody Construct A also resulted in identification of a second peptide with a mass increment of 58 Dalton. This peptide was found to correspond to one of the CDR's of one of the Nanobodies present in Nanobody Construct A. Further analysis (data not shown) confirmed that also for this peptide, the observed mass increment of 58 Dalton was most likely due to Asp misincorporation.
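The logic of the Asp-N digest described above can be illustrated with a minimal in silico sketch. It assumes idealized cleavage on the N-terminal side of every aspartic acid (D) residue. The function name asp_n_digest and the example peptides are hypothetical; the actual T10 peptide sequence is not reproduced here.

```python
def asp_n_digest(peptide: str) -> list[str]:
    """Cleave a peptide on the N-terminal side of every Asp (D) residue.

    Idealized model: complete cleavage, no missed or non-specific sites.
    """
    fragments = []
    start = 0
    for i, aa in enumerate(peptide):
        # Cut before each D, unless the D is the first residue of the fragment.
        if aa == "D" and i > start:
            fragments.append(peptide[start:i])
            start = i
    fragments.append(peptide[start:])
    return fragments


# Hypothetical linker-derived peptide without Asp: no cleavage is expected.
print(asp_n_digest("GGGGSGGGGS"))

# The same peptide with one Gly -> Asp exchange: cleavage reveals the site.
print(asp_n_digest("GGGGSGDGGS"))
```

Because an unmodified Gly-Ser peptide yields a single, uncleaved fragment, each additional fragment observed after the digest points to a position at which Asp was incorporated instead of Gly.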
The GGC codon sequences present in the 35GS linker sequence of Nanobody construct A were replaced with a GGG, GGA or GGT codon sequence.
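A minimal sketch of such a codon replacement is given below, assuming a coding sequence that is supplied in frame and as DNA. The function name replace_ggc and the example sequence are hypothetical and are not the sequences used in this Example. The replacement is performed codon by codon, so that a GGC triplet that merely spans two adjacent codons is left untouched.

```python
ALLOWED_REPLACEMENTS = {"GGA", "GGG", "GGT"}  # the non-GGC glycine codons


def replace_ggc(coding_seq: str, replacement: str = "GGT") -> str:
    """Replace every in-frame GGC codon with another glycine codon."""
    seq = coding_seq.upper()
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    if replacement not in ALLOWED_REPLACEMENTS:
        raise ValueError("replacement must be GGA, GGG or GGT")
    # Split into codons so that only in-frame GGC triplets are replaced.
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return "".join(replacement if codon == "GGC" else codon for codon in codons)


# Hypothetical sequence encoding one GGGGS motif with two GGC codons.
original = "GGCGGTGGCGGAAGC"
optimized = replace_ggc(original, "GGT")
print(original, "->", optimized)
```

Since GGA, GGG, GGT and GGC all encode glycine, the encoded amino acid sequence is unchanged by this substitution.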
The obtained Nanobody constructs were expressed in Pichia strain NRRL Y-11430 and purified as described above. The level of Asp misincorporation in the obtained polypeptides was measured by the same method as described above. The mass spectrometer was set up to quantify 3 of the 9 misincorporation sites.
The relative levels of Asp misincorporation in the 35GS linker of the polypeptide obtained with the Reference Nanobody construct A (no codon optimization) and of the polypeptide obtained with the codon optimized Nanobody construct A are shown in
In this example, the impact of Nanobody valency and linker length on Gly to Asp misincorporation was studied. For this, bi-, tri- and tetravalent constructs, each with 9GS, 20GS or 35GS linker sequences and a Nanobody building block sequence (different from the Nanobody building block sequence present in Nanobody construct A), were produced. An extra tetravalent, 35GS linker Nanobody construct was also produced without any GGC codons. The ten new constructs are shown in
Each possible new peptide after Gly to Asp misincorporation was followed with the mass spectrometry method as described above. The method was further optimized to allow simultaneous quantification of all 9 Asp-N fragmentation sites. The results on the misincorporation are shown in
From these results it can be concluded that the valency or the linker length does not have an impact on Gly to Asp misincorporation levels. Removal or reduction of the number of GGC codons clearly reduces the level of Gly to Asp misincorporation.
Finally, although the invention is described herein mainly with respect to GS linkers, it will be clear to the skilled person that the invention can generally be applied to other peptide linkers that contain glycine residues.
Thus, in a further aspect, the invention relates to a nucleotide sequence and/or a nucleic acid that encodes a peptide linker, in which the peptide linker encoded by said nucleotide sequence and/or nucleic acid contains four or more glycine residues, in which more than 70%, preferably more than 85%, more preferably more than 90%, such as more than 95% and up to 99% and more (including 100%) of the codons that encode a glycine residue in the peptide linker are either GGA, GGG or GGT/GGU.
In this aspect, the invention also relates to a nucleotide sequence and/or a nucleic acid that encodes a peptide linker, in which the peptide linker encoded by said nucleotide sequence and/or nucleic acid contains four or more glycine residues, in which more than 70%, preferably more than 85%, more preferably more than 90%, such as more than 95% and up to 99% and more (including 100%) of the codons that encode a glycine residue in the peptide linker are either GGA or GGG.
In this aspect, the invention also relates to a nucleotide sequence and/or a nucleic acid that encodes a peptide linker, in which the peptide linker encoded by said nucleotide sequence and/or nucleic acid contains four or more glycine residues, in which less than 30%, preferably less than 15%, more preferably less than 10%, such as less than 5% and down to less than 1% or lower (including 0%) of the codons that encode a glycine residue in the peptide linker are GGC.
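As a non-limiting illustration, the percentages referred to in these aspects can be determined for a given linker-encoding sequence as sketched below. The function name glycine_codon_usage and the example sequence are hypothetical. The sketch assumes an in-frame coding sequence and treats U as equivalent to T.

```python
from collections import Counter

GLY_CODONS = ("GGA", "GGC", "GGG", "GGT")  # all four glycine codons (DNA)


def glycine_codon_usage(coding_seq: str) -> dict[str, float]:
    """Return the fraction of glycine codons accounted for by each Gly codon."""
    seq = coding_seq.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    gly = [codon for codon in codons if codon in GLY_CODONS]
    if not gly:
        raise ValueError("sequence encodes no glycine residues")
    counts = Counter(gly)
    return {codon: counts[codon] / len(gly) for codon in GLY_CODONS}


# Hypothetical sequence encoding one GGGGS motif with a single GGC codon.
usage = glycine_codon_usage("GGAGGGGGTGGCAGC")
non_ggc = usage["GGA"] + usage["GGG"] + usage["GGT"]
print(f"GGC: {usage['GGC']:.0%}, GGA/GGG/GGT: {non_ggc:.0%}")
print("more than 70% non-GGC:", non_ggc > 0.70)
```

Only codons that encode glycine enter the denominator, so serine codons (such as AGC in the example) do not affect the calculated percentages.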
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2019/054697 | 2/26/2019 | WO | 00 |
Number | Date | Country | |
---|---|---|---|
62634985 | Feb 2018 | US |