This application claims priority to UK Patent Application Number 2007532.1 filed on May 20, 2020 entitled POLYPEPTIDES, the contents of which are herein incorporated by reference in their entirety.
The present application is being filed along with a Sequence Listing in electronic format. The Sequence Listing file, entitled 2231_1000PCT.txt, was created on May 19, 2021 and is 1,327,961 bytes in size. The information in electronic format of the Sequence Listing is incorporated herein by reference in its entirety.
The present invention relates to polypeptides which were identified in the BCR heavy chain repertoire of individuals during SARS-CoV-2 infection. The invention also includes polynucleotides encoding said polypeptides, pharmaceutical compositions comprising said polypeptides and the use of said polypeptides in suppressing or treating a disease or disorder mediated by infection with SARS-CoV-2, for providing prophylaxis to a subject at risk of infection of SARS-CoV-2 or for the diagnosis and/or prediction of outcome of SARS-CoV-2 infection.
Since the report of the first patients in December 20191,2, the unprecedented global scale of the COVID-19 pandemic has become apparent. The infectious agent, the SARS-CoV-2 betacoronavirus3, causes mild symptoms in most cases but can cause severe respiratory diseases such as acute respiratory distress syndrome in some individuals. Risk factors for severe disease include age, male gender and underlying co-morbidities4.
Understanding the immune response to SARS-CoV-2 infection is critical to support the development of therapies. Recombinant monoclonal antibodies derived from analysis of B cell receptor (BCR) repertoires in infected patients or the immunisation of animals have been shown to be effective against several infectious diseases including Ebola virus5, rabies6 and respiratory syncytial virus disease7. Such therapeutic antibodies have the potential to protect susceptible populations as well as to treat severe established infections.
While many vaccine approaches are underway in response to the SARS-CoV-2 outbreak, many of these compositions include as immunogens either whole, attenuated virus or whole spike (S) protein—a viral membrane glycoprotein which mediates cell uptake by binding to host angiotensin-converting enzyme 2 (ACE2). The antibody response to such vaccines will be polyclonal in nature and will likely include both neutralising and non-neutralising antibodies. It is hoped that the neutralising component will be sufficient to provide long-term SARS-CoV-2 immunity following vaccination, although other potential confounders may exist, such as raising antibodies which mediate antibody-dependent enhancement (ADE) of viral entry8-10. While ADE is not proven for SARS-CoV-2, prior studies of SARS-CoV-1 in non-human primates showed that, while some S protein antibodies from human SARS-CoV-1 patients were protective, others enhanced the infection via ADE11. An alternative could be to support passive immunity to SARS-CoV-2, by administering one, or a small cocktail of, well-characterised, neutralising antibodies.
Patients recovering from COVID-19 have already been screened to identify neutralising antibodies, following analysis of relatively small numbers (100-500) of antibody sequences12,13. A more extensive BCR repertoire analysis was performed on six patients in Stanford, USA with signs and symptoms of COVID-19 who also tested positive for SARS-CoV-2 RNA14. Although no information was provided on the patient outcomes in that study, the analysis demonstrated preferential expression of a subset of immunoglobulin heavy chain (IGH) V gene segments with relatively little somatic hypermutation and showed evidence of convergent antibodies between patients.
To drive a deeper understanding of the nature of humoral immunity to SARS-CoV-2 infection and to identify potential therapeutic antibodies to SARS-CoV-2, we have evaluated the BCR heavy chain repertoire from 19 individuals at various stages of their immune response. We show that (1) there are stereotypic responses to SARS-CoV-2 infection, (2) infection stimulates both naïve and memory B cell responses, (3) sequence convergence can be used to identify putative SARS-CoV-2 specific antibodies, and (4) sequence convergence can be identified between different SARS-CoV-2 studies in different locations and using different sample types.
Polypeptides of the present invention may, in at least some embodiments, have one or more of the following advantages compared to the prior art:
(i) increased binding affinity to SARS-CoV-2, for example SARS-CoV-2 spike protein,
(ii) increased neutralising potency against SARS-CoV-2,
(iii) binding to non-spike protein components of SARS-CoV-2 to reduce viral load,
(iv) binding to host proteins to inhibit virus entry/infection,
(v) binding to SARS-CoV-2 infected human cells to enable infected cell killing,
(vi) binding to human cells or soluble factor to modulate immune response to the virus,
(vii) binding to human cells to alter innate immune responses from structural cells such as epithelial cells,
(viii) binding to endothelial cells to alter viral-related endothelial inflammation and modulation of the clotting response,
(ix) activity across all potential anti-viral mechanisms including novel ones (e.g., binding viral epitopes, secreted host epitopes, membrane host epitopes, modulating infected host cells, modulating innate and adaptive immune responses),
(x) neutralising potential against other/new forms of coronavirus,
(xi) suitability for administration with other agents in treating COVID-19 (e.g., to enhance anti-viral efficacy),
(xii) suitability for prevention or treatment of SARS-CoV-2 infection,
(xiii) suitability for administration by multiple routes (SC, IV, IM, dermal, nasal, oral),
(xiv) one or more polypeptides can be used in the diagnosis or prediction of outcome post SARS-CoV-2 infection.
According to a first aspect of the invention, there is provided a polypeptide comprising:
a CDRH1 sequence comprising or consisting of a sequence sharing 80% or greater sequence identity with a CDRH1 sequence as shown in Table 1 and/or
a CDRH2 sequence comprising or consisting of a sequence sharing 80% or greater sequence identity with a CDRH2 sequence as shown in Table 1 and/or
a CDRH3 sequence comprising or consisting of a sequence sharing 80% or greater sequence identity with a CDRH3 sequence as shown in Table 1.
In a further aspect there is provided a polypeptide comprising:
a FWRH1 sequence comprising or consisting of a sequence sharing 80% or greater sequence identity with a FWRH1 sequence as shown in Table 1 and/or
a FWRH2 sequence comprising or consisting of a sequence sharing 80% or greater sequence identity with a FWRH2 sequence as shown in Table 1 and/or
a FWRH3 sequence comprising or consisting of a sequence sharing 80% or greater sequence identity with a FWRH3 sequence as shown in Table 1 and/or
a FWRH4 sequence comprising or consisting of a sequence sharing 80% or greater sequence identity with a FWRH4 sequence as shown in Table 1.
In a further aspect there are provided pharmaceutical compositions comprising the polypeptides above and polynucleotides encoding the polypeptides above. Further aspects of the invention will be apparent from the detailed description of the invention.
The complementarity determining regions (CDRs) and framework regions (FWRs) of an antibody or fragment thereof may be numbered from N- to C-terminus, i.e. FWR1, CDR1, FWR2, CDR2, FWR3, CDR3 and FWR4. In the context of a heavy chain variable domain, these regions may be denoted with an ‘H’, i.e. FWRH1, CDRH1, FWRH2, CDRH2, FWRH3, CDRH3 and FWRH4.
Table 1 below provides the polypeptide sequences of immunoglobulin heavy chain variable domains of the invention (VHs) with complementarity determining regions (CDRH1-3) and frameworks (FWRH1-4) of the invention annotated according to the IMGT system (Lefranc et al. “IMGT unique numbering for immunoglobulin and T cell receptor variable domains and Ig superfamily V-like domains” Dev. Comp. Immunol. 27(1):55-77 (2003)). The full length polypeptide sequence of any VH given in Table 1 is the combination of, from N- to C-terminus, FWRH1, CDRH1, FWRH2, CDRH2, FWRH3, CDRH3 and FWRH4 on a single row. For example, the polypeptide sequence of set1_1 is QVQLVESGGGVVQPGRSLRLSCAASGFTFSSYAMHWVRQAPGKGLEWVAVISYDGSNKYYADSVKGRFTISRDNSKNTLYLQMNSLRAEDAVYYCARDGGGYMDVWGQGTTVTVSS (SEQ ID NO: 1). “v_call” and “j_call” refer to the germline V and J gene segments from which the sequence originated, according to the IMGT system.
Table 2 below also provides the polypeptide sequences of immunoglobulin heavy chain variable domains (VHs) of the invention.
Based on the experimental work provided herein, it is expected that components of these VHs, such as the complementarity determining regions, frameworks, or combinations of these (such as full length VH sequences) may be utilised in therapeutic or prophylactic agents for treating or preventing SARS-CoV-2 infection, or for performing diagnostic or prognostic analysis of subjects infected, or suspected of being infected, with SARS-CoV-2.
It is envisaged that the proposed heavy chains be paired with suitable light chains to enable production of monoclonal antibodies, for example in IgG1 format. Cognate light chains can be identified by various methods, including computational prediction (e.g. Mason et al. bioRxiv 617860 (2019)), the use of promiscuous or ‘common light chains’ (e.g. Xue et al. Biochem Biophys Res Commun. 515(3):481-486 (2019)), high-throughput paired heavy and light chain sequencing to identify native pairings (e.g. Wang et al. Nat Biotechnol. 36(2):152-155 (2018)) and antibody display-based methods to find and optimise heavy and light chain pairings (e.g. Guo-Qiang et al. Methods Mol Biol. 562:133-142 (2009)).
In one embodiment there is provided a polypeptide comprising:
a sequence (such as a CDRH1 sequence) comprising or consisting of a sequence sharing 80% or greater sequence identity with a CDRH1 sequence as shown in Table 1 and/or
a sequence (such as a CDRH2 sequence) comprising or consisting of a sequence sharing 80% or greater sequence identity with a CDRH2 sequence as shown in Table 1 and/or
a sequence (such as a CDRH3 sequence) comprising or consisting of a sequence sharing 80% or greater sequence identity with a CDRH3 sequence as shown in Table 1.
Suitably the polypeptide comprises
a sequence (such as a CDRH1 sequence) comprising or consisting of a sequence sharing 90% or greater sequence identity with a CDRH1 sequence as shown in Table 1 and/or
a sequence (such as a CDRH2 sequence) comprising or consisting of a sequence sharing 90% or greater sequence identity with a CDRH2 sequence as shown in Table 1 and/or
a sequence (such as a CDRH3 sequence) comprising or consisting of a sequence sharing 90% or greater sequence identity with a CDRH3 sequence as shown in Table 1.
More suitably the polypeptide comprises
a sequence (such as a CDRH1 sequence) comprising or consisting of a CDRH1 sequence as shown in Table 1 and/or
a sequence (such as a CDRH2 sequence) comprising or consisting of a CDRH2 sequence as shown in Table 1 and/or
a sequence (such as a CDRH3 sequence) comprising or consisting of a CDRH3 sequence as shown in Table 1.
More suitably the polypeptide comprises
a sequence (such as a CDRH1 sequence) comprising or consisting of a CDRH1 sequence as shown in Table 1 and
a sequence (such as a CDRH2 sequence) comprising or consisting of a CDRH2 sequence as shown in Table 1 and
a sequence (such as a CDRH3 sequence) comprising or consisting of a CDRH3 sequence as shown in Table 1.
Suitably the polypeptide comprises
a sequence (such as a FWRH1 sequence) comprising or consisting of a sequence sharing 80% or greater sequence identity with a FWRH1 sequence as shown in Table 1 and/or
a sequence (such as a FWRH2 sequence) comprising or consisting of a sequence sharing 80% or greater sequence identity with a FWRH2 sequence as shown in Table 1 and/or
a sequence (such as a FWRH3 sequence) comprising or consisting of a sequence sharing 80% or greater sequence identity with a FWRH3 sequence as shown in Table 1 and/or
a sequence (such as a FWRH4 sequence) comprising or consisting of a sequence sharing 80% or greater sequence identity with a FWRH4 sequence as shown in Table 1.
In one embodiment the polypeptide comprises:
a sequence (such as a FWRH1 sequence) comprising or consisting of a sequence sharing 80% or greater sequence identity with a FWRH1 sequence as shown in Table 1 and/or
a sequence (such as a FWRH2 sequence) comprising or consisting of a sequence sharing 80% or greater sequence identity with a FWRH2 sequence as shown in Table 1 and/or
a sequence (such as a FWRH3 sequence) comprising or consisting of a sequence sharing 80% or greater sequence identity with a FWRH3 sequence as shown in Table 1 and/or
a sequence (such as a FWRH4 sequence) comprising or consisting of a sequence sharing 80% or greater sequence identity with a FWRH4 sequence as shown in Table 1.
More suitably the polypeptide comprises
a sequence (such as a FWRH1 sequence) comprising or consisting of a sequence sharing 90% or greater sequence identity with a FWRH1 sequence as shown in Table 1 and/or
a sequence (such as a FWRH2 sequence) comprising or consisting of a sequence sharing 90% or greater sequence identity with a FWRH2 sequence as shown in Table 1 and/or
a sequence (such as a FWRH3 sequence) comprising or consisting of a sequence sharing 90% or greater sequence identity with a FWRH3 sequence as shown in Table 1 and/or
a sequence (such as a FWRH4 sequence) comprising or consisting of a sequence sharing 90% or greater sequence identity with a FWRH4 sequence as shown in Table 1.
More suitably the polypeptide comprises
a sequence (such as a FWRH1 sequence) comprising or consisting of a FWRH1 sequence as shown in Table 1 and/or
a sequence (such as a FWRH2 sequence) comprising or consisting of a FWRH2 sequence as shown in Table 1 and/or
a sequence (such as a FWRH3 sequence) comprising or consisting of a FWRH3 sequence as shown in Table 1 and/or
a sequence (such as a FWRH4 sequence) comprising or consisting of a FWRH4 sequence as shown in Table 1.
More suitably the polypeptide comprises
a sequence (such as a FWRH1 sequence) comprising or consisting of a FWRH1 sequence as shown in Table 1 and
a sequence (such as a FWRH2 sequence) comprising or consisting of a FWRH2 sequence as shown in Table 1 and
a sequence (such as a FWRH3 sequence) comprising or consisting of a FWRH3 sequence as shown in Table 1 and
a sequence (such as a FWRH4 sequence) comprising or consisting of a FWRH4 sequence as shown in Table 1.
Suitably the polypeptide comprises three complementarity determining regions (CDRH1-CDRH3). Suitably, the polypeptide comprises four framework regions (FWRH1-FWRH4).
In one embodiment there is provided a polypeptide comprising or consisting of a sequence sharing 80% or greater, more suitably 90% or greater, sequence identity with any immunoglobulin heavy chain variable domain (VH) sequence as shown in Table 1 (i.e. from N- to C-terminus, the combined sequence of FWRH1, CDRH1, FWRH2, CDRH2, FWRH3, CDRH3, FWRH4, for a single row) or Table 2. More suitably the polypeptide comprises or consists of an immunoglobulin heavy chain variable domain (VH) sequence as shown in Table 1 (i.e. from N- to C-terminus, the combined sequence of FWRH1, CDRH1, FWRH2, CDRH2, FWRH3, CDRH3, FWRH4, for a single row) or Table 2.
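The identity thresholds recited above can be illustrated with a minimal sketch. It assumes the simplest possible definition of sequence identity, namely the percentage of matching positions between two ungapped sequences of equal length; this passage does not fix an alignment method, and in practice identity would usually be computed over an alignment. The function names and the toy sequences are hypothetical and are not taken from Table 1 or Table 2.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions at which two equal-length sequences match.

    Assumption: ungapped, position-by-position comparison only.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch only handles equal-length, ungapped sequences")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def meets_threshold(candidate: str, reference: str, threshold: float = 80.0) -> bool:
    """True if the candidate shares `threshold` percent or greater identity."""
    return percent_identity(candidate, reference) >= threshold


# Toy ten-residue sequences (hypothetical): two mismatches out of ten positions.
reference = "ACDEFGHIKL"
candidate = "ACDEFGHIVV"
print(percent_identity(candidate, reference))
print(meets_threshold(candidate, reference, threshold=80.0))
print(meets_threshold(candidate, reference, threshold=90.0))
```

Under this assumed definition the same check applies whether the compared sequence is a single CDR, a single framework region or a full length VH.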
Suitably the polypeptide is an antibody, such as an antibody which belongs to the isotype subclass IGHA1, IGHA2 or IGHG1. Alternatively, the polypeptide is an antibody fragment, such as a F(ab′)2, an Fd, an Fv, an scFv, a VH, or a VHH.
Suitably the polypeptide binds to the spike protein (S protein) of SARS-CoV-2. More suitably the polypeptide binds to the S1 or S2 domain of the spike protein (S protein), such as the S1 domain of the spike protein (S1 protein).
An antibody fragment as used herein refers to a portion of an antibody that binds to a target. Examples of binding fragments encompassed within the term include a Fab, a F(ab′)2, an Fd, an Fv, an scFv, a VH, or a VHH.
Suitably the polypeptide comprises light chain CDRs (i.e. CDRL1, CDRL2, CDRL3). More suitably the polypeptide comprises light chain CDRs and framework regions (i.e. FWRL1, CDRL1, FWRL2, CDRL2, FWRL3, CDRL3 and FWRL4). More suitably the polypeptide is an antibody comprising both heavy and light chains. Suitably the light chain CDRs and/or frameworks and/or light chains are any one or more of those disclosed in Xue et al. Biochem Biophys Res Commun. 515(3):481-486, (2019).
Suitably, the polypeptide of the invention is isolated. An “isolated” polypeptide is one that is removed from its original environment. For example, a naturally-occurring polypeptide of the invention is isolated if it is separated from some or all of the coexisting materials in the natural system.
In one embodiment there is provided a pharmaceutical composition comprising the polypeptide and one or more pharmaceutically acceptable diluents or carriers. Suitably the composition comprises at least one further, different polypeptide according to any preceding claim. Suitably the composition comprises at least one further active agent.
In one embodiment the polypeptide or pharmaceutical composition is for use in suppressing or treating a disease or disorder mediated by infection of SARS-CoV-2, such as COVID-19, or for providing prophylaxis to a subject at risk of infection of SARS-CoV-2, such as COVID-19. In one embodiment there is provided a method of suppressing or treating a disease or disorder mediated by infection of SARS-CoV-2, such as COVID-19 or for providing prophylaxis to a subject at risk of infection of SARS-CoV-2, such as COVID-19, comprising administering to a person in need thereof a therapeutically effective amount of the polypeptide or pharmaceutical composition.
In one embodiment there is provided a polynucleotide encoding a polypeptide sequence disclosed in Table 1 or Table 2. In one embodiment there is provided a polynucleotide encoding an immunoglobulin heavy chain variable domain recited in Table 1 or Table 2. In one embodiment there is provided a vector comprising the polynucleotide.
The present invention will now be further described by means of the following non-limiting example.
While various invention embodiments have been particularly shown and described in the present disclosure, it will be understood by those skilled in the art that various changes in form and details may be made without departing from the spirit and scope of the embodiments disclosed herein and set forth in the appended claims.
Those skilled in the art will recognize or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments described herein. The scope of the present disclosure is not intended to be limited to the above description, but rather is as set forth in the appended claims.
In the claims, articles such as “a,” “an,” and “the” may mean one or more than one unless indicated to the contrary or otherwise evident from the context. Claims or descriptions that include “or” between one or more members of a group are considered satisfied if one, more than one, or all of the group members are present in, employed in, or otherwise relevant to a given product or process unless indicated to the contrary or otherwise evident from the context. The disclosure includes embodiments in which exactly one member of a group is present in, employed in, or otherwise relevant to a given product or process. The invention includes embodiments in which more than one, or all group members are present in, employed in, or otherwise relevant to a given product or process.
It is also noted that the term “comprising” is intended to be open and permits but does not require the inclusion of additional elements or steps. When the term “comprising” is used herein, the terms “consisting of” and “or including” are thus also encompassed and disclosed.
Where ranges are given, endpoints are included. Furthermore, it is to be understood that unless otherwise indicated or otherwise evident from the context and understanding of one of ordinary skill in the art, values that are expressed as ranges can assume any specific value or subrange within the stated ranges in different embodiments of the invention, to the tenth of the unit of the lower limit of the range, unless the context clearly dictates otherwise.
In addition, it is to be understood that any particular embodiment of the present disclosure that falls within the prior art may be explicitly excluded from any one or more of the claims. Since such embodiments are deemed to be known to those of ordinary skill in the art, they may be excluded even if the exclusion is not set forth explicitly herein. Any particular embodiments of compositions disclosed herein can be excluded from any one or more claims, for any reason, whether or not related to the existence of prior art.
All cited sources, for example, references, publications, databases, database entries, and art cited herein, are incorporated into this application by reference, even if not expressly stated in the citation. In case of conflicting statements of a cited source and the instant application, the statement in the instant application shall control.
Section and table headings are not intended to be limiting.
Blood samples were collected from n=19 patients admitted to hospital with acute COVID-19 pneumonia. The mean age of patients was 50.2 (SD 18.5) years and 13 (68%) were male. All patients had a clinical history consistent with COVID-19 and typical radiological changes. Seventeen patients had a confirmatory positive PCR test for SARS-CoV-2. The patients experienced an average of 11 days (range 4-20) of symptoms prior to the day on which the blood sample was collected. Nine of the patients still required hospital care but not oxygen therapy on the day of sample collection (WHO Ordinal Scale Score 3), while eight were hospitalised requiring oxygen by conventional mask or nasal prongs (WHO Ordinal Scale Score 4) and two were hospitalised with severe COVID-19 pneumonia requiring high-flow nasal oxygen (WHO Ordinal Scale Score 5). On the day of sample collection, the direct clinical care team considered two patients to be deteriorating, four to be improving and the remaining thirteen to be clinically stable.
IGHA and IGHG BCR sequencing yielded on average 135,437 unique sequences, and 23,742 clonotypes per sample (Table 3). To characterise the B cell response in COVID-19, we compared this BCR repertoire data to BCR repertoire data from healthy controls obtained in a separate study15. Comparing IGHV gene segment usage revealed a significantly different IGHV gene usage in COVID-19 patients compared to the healthy controls, most notably with increases in the usage of IGHV2-5 (2.6×IGHA, 1.0×IGHG increase), IGHV2-70 (4.6×IGHA, 4.1×IGHG increase), IGHV3-30 (2.0×IGHA, 1.4×IGHG increase), IGHV5-51 (3.5×IGHA, 2.0×IGHG increase), and IGHV4-34 (1.4×IGHA, 2.4×IGHG increase) in the COVID-19 patients (
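The fold increases in IGHV gene usage quoted above can be sketched as a ratio of usage fractions between the two cohorts. This is only an illustration: it assumes that usage is the fraction of sequences assigned to each V gene segment and that all sequences in a cohort are pooled, whereas the study may have counted clonotypes or averaged per individual. The function names and the toy input lists are hypothetical.

```python
from collections import Counter


def gene_usage(v_calls):
    """Fraction of sequences assigned to each V gene segment."""
    counts = Counter(v_calls)
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}


def usage_fold_change(patient_v_calls, control_v_calls):
    """Patient usage divided by control usage, for genes seen in both cohorts."""
    patient = gene_usage(patient_v_calls)
    control = gene_usage(control_v_calls)
    return {
        gene: patient[gene] / control[gene]
        for gene in patient
        if gene in control  # avoid dividing by zero for genes absent from controls
    }


# Toy data (hypothetical counts, not study data).
patients = ["IGHV2-70"] * 4 + ["IGHV3-23"] * 6
controls = ["IGHV2-70"] * 1 + ["IGHV3-23"] * 9
for gene, fold in sorted(usage_fold_change(patients, controls).items()):
    print(gene, round(fold, 2))
```

A value above 1 indicates higher relative usage in the patient cohort; the sketch says nothing about statistical significance, which would need a separate test.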
Comparing isotype subclasses showed a significant increase in the relative usage of IGHA1 and IGHG1 in COVID-19 patients (
To further investigate the COVID-19-specific B cell response, we analysed the characteristics of the BCR sequences that are consistent with recent B cell activation—somatic hypermutation, and clonal expansion. In healthy controls, for class-switched sequences, there is a clear unimodal distribution of sequences with different numbers of mutations, and a mean mutation count across IGHA and IGHG isotypes of 17.6 (
To investigate differential clonal expansion between patients, the Shannon diversity index of each repertoire was calculated (while accounting for differences in read depth through subsampling). A more diverse repertoire is indicative of a greater abundance of different clonal expansions. The BCR repertoires of the COVID-19 patients were significantly more diverse than the BCR repertoires of the healthy controls (
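The diversity calculation described above can be sketched as follows. The sketch assumes that each sequence has already been assigned a clonotype label, that the Shannon index is computed with the natural logarithm, and that read depth is equalised by drawing a fixed number of sequences without replacement; the subsampling depth, the seed and the function names are hypothetical choices, not details taken from the study.

```python
import math
import random
from collections import Counter


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over non-zero clonotype counts."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts)


def subsampled_shannon(clonotype_labels, depth, seed=0):
    """Shannon index of a repertoire after subsampling to a fixed depth.

    clonotype_labels: one clonotype label per sequence in the repertoire.
    depth: number of sequences to draw without replacement (assumed value).
    """
    labels = list(clonotype_labels)
    if depth > len(labels):
        raise ValueError("repertoire is smaller than the requested depth")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    sample = rng.sample(labels, depth)
    return shannon_index(Counter(sample).values())


# Toy repertoires (hypothetical): one even, one dominated by a single clone.
even = ["a"] * 5 + ["b"] * 5 + ["c"] * 5 + ["d"] * 5
skewed = ["a"] * 17 + ["b", "c", "d"]
print(subsampled_shannon(even, depth=20))
print(subsampled_shannon(skewed, depth=20))
```

Subsampling every repertoire to the same depth before computing the index keeps a deeply sequenced sample from appearing more diverse simply because more sequences were read.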
Given the skewing of the B cell response in the COVID-19 patients to specific IGHV genes, we next investigated whether the same similarity was also seen on the BCR sequence level between different participants. Such convergent BCR signatures have been observed in response to other infectious diseases21, and may be used to identify disease-specific antibody sequences.
Of the 435,420 total clonotypes across all the COVID-19 patients, 9,646 (2.2%) were shared between at least two of the participants (
To identify a set of SARS-CoV-2-specific antibody sequences with high confidence, we identified 777 convergent clonotypes that were shared between at least four of the COVID-19 patients (see Tables 1 and 2, which also include further convergent clonotypes from another set of samples), but not seen in the healthy controls. In parallel, for a comparison of convergent signatures, we performed the same analysis on a cohort of seven metastatic breast cancer patient biopsy samples22, which identified 469 convergent clonotypes. These convergent clonotypes were highly specific to each disease cohort (
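The selection of convergent clonotypes described above amounts to counting, for each clonotype, the number of patients in which it occurs and then subtracting anything also seen in healthy controls. A minimal sketch, assuming clonotypes have already been assigned identifiers that are comparable across individuals (how clonotypes are defined is not specified in this passage); the function name, the identifiers and the toy data are hypothetical.

```python
from collections import Counter


def convergent_clonotypes(patient_clonotypes, control_clonotypes, min_patients=4):
    """Clonotypes shared by at least `min_patients` patients and absent from controls.

    patient_clonotypes: dict mapping patient ID -> iterable of clonotype IDs.
    control_clonotypes: set of clonotype IDs observed in healthy controls.
    """
    sharing = Counter()
    for clonotypes in patient_clonotypes.values():
        sharing.update(set(clonotypes))  # count each patient at most once
    return {
        clonotype
        for clonotype, n_patients in sharing.items()
        if n_patients >= min_patients and clonotype not in control_clonotypes
    }


# Toy data (hypothetical identifiers, not study data).
patients = {
    "P1": ["c1", "c2", "c3"],
    "P2": ["c1", "c2", "c3"],
    "P3": ["c1", "c2"],
    "P4": ["c1", "c2", "c2"],
    "P5": ["c4"],
}
controls = {"c2"}
print(convergent_clonotypes(patients, controls))
```

Setting `min_patients=2` gives the looser sharing criterion mentioned earlier (clonotypes shared between at least two participants), in that case without requiring absence from controls if an empty control set is passed.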
We next tested whether these convergent clonotypes correlated with disease severity. Indeed, 25 of these convergent clonotypes were found to associate with clinical symptoms after correcting for multiple testing, of which 22 were observed at a significantly higher frequency in improving patients (
BCR Sequence Convergence Signatures are Shared Between Different COVID-19 Studies in Different Locations and from Different Anatomical Sites
To further explore whether the convergent clonotypes observed in our study were indeed disease specific, and to determine whether such convergence was common across studies and geographic regions, we compared these 777 convergent clonotypes to public B cell datasets.
First, we compared our data to RNAseq data of bronchoalveolar lavage fluid obtained from five of the first infected patients in Wuhan, China23. These samples were obtained for the purpose of metagenomic analyses to identify the aetiological agent of the novel coronavirus but were re-analysed to determine whether we could extract any transcripts from BCRs. From the 10,038,758 total reads, we were able to identify 16 unique CDR3 AA sequences (Table 4). Of these, one had an exact AA match to a clonotype in our data and shared the same V gene segment (IGHV3-15), and J gene segment (IGHJ4) usage (
Next, we compared our 777 convergent clonotypes to CoV-AbDab—the Coronavirus Antibody Database [accessed 10 May 2020]16. At the time of access, this database contained 80 non-redundant CDRH3 sequences from published and patented antibodies proven to bind SARS-CoV-1 and/or SARS-CoV-2. We found 6 of our clonotypes to have high CDRH3 homology to the antibodies in CoV-AbDab (
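A comparison of CDRH3 sequences against a reference database can be sketched with an edit-distance search. This is an assumption-laden illustration: the passage above refers to high CDRH3 homology without stating the criterion used, so the Levenshtein distance and the `max_distance` cut-off below are stand-ins rather than the study's method, and the function names and toy sequences are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-residue insertions, deletions or substitutions."""
    previous = list(range(len(b) + 1))
    for i, residue_a in enumerate(a, start=1):
        current = [i]
        for j, residue_b in enumerate(b, start=1):
            current.append(
                min(
                    previous[j] + 1,  # deletion from a
                    current[j - 1] + 1,  # insertion into a
                    previous[j - 1] + (residue_a != residue_b),  # substitution
                )
            )
        previous = current
    return previous[-1]


def match_to_database(query_cdrh3s, database_cdrh3s, max_distance=1):
    """Map each query CDRH3 to the database CDRH3s within `max_distance` edits."""
    matches = {}
    for query in query_cdrh3s:
        hits = [ref for ref in database_cdrh3s if levenshtein(query, ref) <= max_distance]
        if hits:
            matches[query] = hits
    return matches


# Toy sequences (hypothetical, not entries from CoV-AbDab or Table 1).
queries = ["ARDGGGYMDV", "ARXXXX"]
database = ["ARDGGAYMDV", "CARTTT"]
print(match_to_database(queries, database))
```

A stricter comparison could additionally require identical V and J gene segment assignments, as in the exact-match comparison with the bronchoalveolar lavage data described above.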
Finally, we compared our data to a publicly available BCR deep sequencing dataset from six COVID-19 patients from Stanford, USA. 405 of our 777 convergent clonotypes matched to sequences in this dataset (
The CDRH3 identified in our SARS-CoV-2 patient dataset is SEQ ID NO: 2002.
We have used deep sequencing of the BCR heavy chain repertoire to evaluate the B cell responses of 19 individuals with COVID-19. In agreement with previous studies, there was a skewing of the repertoire in the response to SARS-CoV-2 infection, with an increased use of certain V genes, an increase in the proportion of antibodies with longer CDRH3s, and an altered isotype subclass distribution14. The significantly increased usage of IGHA1 observed in the COVID-19 patients is in line with mucosal responses, where the longer hinge in IGHA1 compared to IGHA2 may offer advantages in antigen recognition by allowing higher avidity bivalent interactions with distantly spaced antigens.
As anticipated, given the novel nature of the virus, SARS-CoV-2 infection largely stimulated a characteristically naïve response, rather than a reactivation of pre-existing memory B cells: (1) there was an increased prevalence of unmutated antigen-experienced class-switched BCR sequences, (2) there was an increase in the diversity of class-switched IGHA and IGHG BCRs, and (3) there was an increase in the usage of isotype subclasses that are associated with viral immunity. These observations are consistent with an increase in the frequency of recently activated B cells in response to SARS-CoV-2. In addition to the naïve response, there was also evidence of a proportion of the response arising from memory recall. In the COVID-19 patients, the largest clonal expansions were highly mutated, equivalent to the level observed in the healthy control cohort. Such a secondary response to SARS-CoV-2 has been previously observed25, and may be due to recall of B cells activated in response to previously circulating human coronaviruses, as recently highlighted26,27.
We observed a potential relationship between repertoire characteristics and disease state, with improving patients showing a tendency towards a higher proportion of unmutated sequences. The increased prevalence of autoreactive IGHV4-34 sequences in improving COVID-19 patients compared to stable or deteriorating COVID-19 patients potentially suggests a role for natural or autoreactive antibodies in resolving infection and lowering the risk of pathology. However, this will need to be confirmed using larger sample cohorts. There is a clear need to expand on these findings by deepening the data pool and gathering more clinical data to aid understanding of the differences between individuals who respond with mild versus severe disease and have different recovery patterns. Building upon these observations could help to inform the future development of diagnostic assays to monitor and predict the progression of disease in infected patients.
A large number (777) of highly convergent clonotypes unique to COVID-19 were identified (see Table 1 and Table 2, which also include further convergent clonotypes from a separate set of samples). Our approach of subtracting the convergent clonotypes also observed in healthy controls15 allowed us to identify convergence specific to the disease cohort. The unbiased nature of the BCR repertoire analysis approach means that, whilst these convergent clonotypes are likely to include many antibodies to the spike protein and other parts of the virus, they may also include other protective antibodies, including those to host proteins. It is expected that the heavy chains we have identified, and components of these heavy chains, will find utility in the treatment, prevention and diagnosis of COVID-19. Furthermore, characterisation of the heavy chains we have identified, coupled with matched light chains to generate functional antibodies, will permit analysis of the binding sites and neutralising potential of these antibodies. The report that plasma derived from recently recovered donors with high neutralising antibody titres can improve the outcome of patients with severe disease28 supports the hypothesis that intervention with a therapeutic antibody has the potential to be an effective treatment. A manufactured monoclonal antibody or combination of antibodies would also provide a simpler, scalable and safer approach than plasma therapy.
Sequence convergence between our 777 convergent clonotypes and heavy chains from published and patented SARS-CoV-1 and SARS-CoV-2 antibodies16 supports several observations. Firstly, it demonstrates that our approach of finding a convergent sequence signature is a useful method for enriching disease-specific antibodies, as we find matches to known SARS-CoV spike-binding antibodies. Secondly, it shows that the clonotypes observed in response to SARS-CoV-2 overlap with those to SARS-CoV-1, presumably explained by the relatively high homology of the two related viruses 3. Indeed, here we show that more clonotypes correlate with patient clinical symptoms than would be expected by chance, and these BCR sequences are associated with the dominant IgA1 and IgG1 responses. Finally, it shows that the convergence extends beyond our UK COVID-19 disease cohort.
Further evidence for convergence extending beyond our disease cohort came from the comparisons of our 777 convergent clonotypes to deep sequencing datasets from China23 and the USA14. The dataset from the USA is also from BCR sequencing of the peripheral blood of COVID-19 patients, and here we found matches to 405 of our 777 clonotypes. The dataset from China was from total RNA sequencing of the bronchoalveolar lavage fluid of SARS-CoV-2 infected patients. Only 16 unique CDRH3 sequences could be identified in this whole dataset, but one of them matched a convergent clonotype in the current study, showing that convergence can be seen both between different locations and between different sample types. We believe that the identification of such high BCR sequence convergence between geographically distinct and independent datasets could be highly significant and validates the disease association of the clonotypes, as well as the overall approach.
In summary, our BCR repertoire analysis provides information on the specific nature of the B cell response to SARS-CoV-2 infection. The information generated has the potential to facilitate the treatment of COVID-19 by supporting diagnostic approaches to predict the progression of disease, informing vaccine development and enabling the development of therapeutic antibody treatments and prophylactics.
Peripheral blood was obtained from patients admitted with acute COVID-19 pneumonia to medical wards at Barts Health NHS Trust, London, UK, after informed consent was obtained by the direct care team (NHS HRA RES Ethics 19/SC/0361). Venous blood was collected in EDTA Vacutainers (BD). Patient demographics and clinical information relevant to their admission were collected by members of the direct care team, including duration of symptoms prior to blood sample collection. Current severity was mapped to the WHO Ordinal Scale of Severity. Whether patients at the time of sample collection were clinically Improving, Stable or Deteriorating was subjectively determined by the direct clinical team prior to any sample analysis. This determination was primarily made on the basis of whether the requirement for supplemental oxygen was increasing, stable or decreasing when comparing the current day with the previous three days.
Blood samples were centrifuged at 150×g for 15 minutes at room temperature to separate plasma. The cell pellet was resuspended with phosphate-buffered saline (PBS without calcium and magnesium, Sigma) to 20 ml, layered onto 15 ml Ficoll-Paque Plus (GE Healthcare) and then centrifuged at 400×g for 30 minutes at room temperature without brake. Mononuclear cells (PBMCs) were extracted from the buffy coat and washed twice with PBS at 300×g for 8 min. PBMCs were counted with Trypan blue (Sigma) and viability of >96% was observed. A total of 5×10⁶ PBMCs were resuspended in RLT (Qiagen) and incubated at room temperature for 10 min prior to storage at −80° C. Consecutive donor samples with sufficient RLT samples progressed to RNA preparation and BCR preparation and are included in this manuscript.
Metastatic breast cancer biopsy samples were collected and RNA extracted as part of a previously reported cohort22.
Total RNA from 5×10⁶ PBMCs was isolated using RNeasy kits (Qiagen). First-strand cDNA was generated from total RNA using SuperScript RT IV (Invitrogen) and IgA and IgG isotype-specific primers29 including UMIs, at 50° C. for 45 min (inactivation at 80° C. for 10 min).
The resulting cDNA was used as template for High Fidelity PCR amplification (KAPA, Roche) using a set of 6 FR1-specific forward primers29 including sample-specific barcode sequences (6 bp) and a reverse primer specific to the RT primer (initial denaturation at 95° C. for 3 min, 25 cycles at 98° C. for 20 sec, 60° C. for 30 sec, 72° C. for 1 min and final extension at 72° C. for 7 min). The amount of Ig amplicons (˜450 bp) was quantified by TapeStation (Beckman Coulter) and gel-purified.
Dual-indexed sequencing adapters (KAPA) were ligated onto 500 ng amplicons per patient using the HyperPrep library construction kit (KAPA) and the adapter-ligated libraries were finally PCR-amplified for 3 cycles (98° C. for 15 sec, 60° C. for 30 sec, 72° C. for 30 sec, final extension at 72° C. for 1 min). Pools of 10 and 9 libraries were sequenced on an Illumina MiSeq using 2×300 bp chemistry.
The Immcantation framework was used for sequence processing30,31. Briefly, paired-end reads were joined based on a minimum overlap of 20 nt and a maximum error of 0.2, and reads with a mean Phred score below 20 were removed. Primer regions, including UMIs and sample barcodes, were then identified within each read and trimmed. Together, the sample barcode, UMI and constant region primer were used to assign molecular groupings for each read. Within each grouping, usearch32 was used to subdivide the grouping, with a cutoff of 80% nucleotide identity, to account for randomly overlapping UMIs. Each of the resulting groupings is assumed to represent reads arising from a single RNA. Reads within each grouping were then aligned, and a consensus sequence determined.
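The quality filter, molecular grouping and consensus steps described above can be illustrated with a minimal Python sketch. It is not the Immcantation implementation: the function names are hypothetical, quality strings are assumed to use the Phred+33 encoding, reads within a grouping are assumed to be already aligned and of equal length, and the usearch subdivision at 80% nucleotide identity is omitted.

```python
from collections import Counter, defaultdict

def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)

def passes_quality(quality: str, threshold: float = 20.0) -> bool:
    """Keep a read only if its mean Phred score is not below the threshold."""
    return mean_phred(quality) >= threshold

def group_reads(reads):
    """Group sequences by (sample barcode, UMI, constant region primer).

    `reads` is an iterable of (barcode, umi, primer, sequence) tuples;
    this tuple layout is an assumption made for illustration.
    """
    groups = defaultdict(list)
    for barcode, umi, primer, seq in reads:
        groups[(barcode, umi, primer)].append(seq)
    return dict(groups)

def consensus(seqs):
    """Majority-vote consensus over pre-aligned, equal-length sequences."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))
```

For example, `consensus(["ACGT", "ACGA", "ACGT"])` resolves the single disagreeing position in favour of the majority base.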
For each processed sequence, IgBlast33 was used to determine V, D and J gene segments, and locations of the CDRs and FWRs. Isotype was determined based on comparison to germline constant region sequences. Sequences annotated as unproductive by IgBlast were removed. The number of mutations within each sequence was determined using the shazam R package31.
Sequences were clustered to identify those arising from clonally related B cells, a process termed clonotyping. Sequences from all samples were clustered together to also identify convergent clusters between samples. Clustering was performed using a previously described algorithm34. Clustering required identical V and J gene segment usage, identical CDRH3 length, and allowed 1 AA mismatch for every 10 AAs within the CDRH3. Cluster centres were defined as the most common sequence within the cluster. Lineages were reconstructed from clusters using the alakazam R package35. The similarity tree of the convergent clonotype CDR3 sequences was generated through a k-mer similarity matrix between sequences in R.
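The clonotype membership criterion can be sketched in Python as follows. This is an illustration only, not the previously described clustering algorithm34: the `Seq` type and function names are hypothetical, and "1 AA mismatch for every 10 AAs" is interpreted here, as an assumption, as a Hamming distance of at most one tenth of the CDRH3 length.

```python
from collections import Counter
from typing import NamedTuple

class Seq(NamedTuple):
    """Hypothetical minimal record for an annotated heavy chain."""
    v_gene: str
    j_gene: str
    cdrh3: str  # CDRH3 amino acid sequence

def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def same_clonotype(a: Seq, b: Seq) -> bool:
    """Identical V and J genes, identical CDRH3 length, and at most
    1 AA mismatch per 10 CDRH3 AAs (assumed: distance/length <= 0.1)."""
    if a.v_gene != b.v_gene or a.j_gene != b.j_gene:
        return False
    if len(a.cdrh3) != len(b.cdrh3):
        return False
    return hamming(a.cdrh3, b.cdrh3) * 10 <= len(a.cdrh3)

def cluster_centre(members: list[Seq]) -> str:
    """Most common CDRH3 sequence within a cluster."""
    return Counter(s.cdrh3 for s in members).most_common(1)[0][0]
```

Under this interpretation a CDRH3 shorter than 10 AAs tolerates no mismatches; a rounding-up rule would be more permissive, and the source does not state which was used.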
The healthy control BCR sequence dataset used here has been described previously15. Only samples from participants aged 10 years or older, and from peripheral blood were used, resulting in a mean age of 28 (range: 11-51). Furthermore, only class-switched sequences were considered.
The bronchoalveolar lavage data comes from a previously published study of SARS-CoV-2 infection23 with data available under the PRJNA605983 BioProject on NCBI. MIXCR v3.0.3 was used, with default settings, to extract reads mapping to antibody genes from the total RNASeq data36.
All public CDRH3 AA sequences associated with published or patented SARS-CoV-1 or SARS-CoV-2 binding antibodies were mined from CoV-AbDab16, downloaded on 10 May 2020. A total of 80 non-redundant CDRH3s were identified (100% identity threshold). These sequences were then clustered alongside the representative CDRH3 sequence from each of our 777 convergent clones using CD-HIT37, at an 80% sequence identity threshold (allowing at most a CDRH3 length mismatch of 1 AA). Cluster centres containing at least one CoV-AbDab CDRH3 and one convergent clone CDRH3 were further investigated.
The fourteen MiSeq “read 1” FASTQ datasets from the six SARS-CoV-2 patients analysed in Nielsen et al.14 were downloaded from the Sequence Read Archive38. IgBlast33 was used to identify heavy chain V, D, and J gene rearrangements and antibody regions. Unproductive sequences, sequences with out-of-frame V and J genes, and sequences missing the CDRH3 region were removed from the downstream analysis. Sequences with 100% amino acid and isotype matches were collapsed. To circumvent the disparity in collapsed dataset sizes between pairs of replicates, we selected the replicate with the highest number of sequences for downstream analysis.
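The collapsing and replicate selection steps can be illustrated with a short Python sketch. The record layout (an amino acid sequence paired with an isotype) and the function names are assumptions made for illustration; the sketch does not reproduce the IgBlast annotation or filtering steps.

```python
def collapse(records):
    """Collapse records with identical amino acid sequence and isotype.

    `records` is an iterable of (aa_sequence, isotype) tuples; the first
    occurrence of each distinct pair is kept, in input order.
    """
    return list(dict.fromkeys(records))

def pick_replicate(replicates):
    """Return the name of the replicate with the most collapsed sequences.

    `replicates` maps a replicate name to its list of records.
    """
    sizes = {name: len(collapse(recs)) for name, recs in replicates.items()}
    return max(sizes, key=sizes.get)
```

Selecting on the collapsed size rather than the raw read count means that a deeply sequenced replicate with few distinct sequences is not preferred over a more diverse one.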
The public SARS-CoV-2-positive14 and healthy control BCR repertoires39 were scanned for clonotype matches to our 777 convergent clonotype cluster centres. A BCR repertoire sequence was considered a match if it had identical V and J genes and the same CDRH3 length as a convergent clonotype representative sequence, and was within 1 AA mismatch per 10 CDRH3 AAs of it.
Statistical analysis and plotting were performed using R40. Plotting was performed using ggplot241. Sequence logos were created using ggseqlogo42. Specific statistical tests used are detailed in the figure descriptions. Correlations between IGHV4-34 autoreactive motifs and convergent clonotypes were assessed by MANOVA in R.
Number | Date | Country | Kind |
---|---|---|---|
2007532.1 | May 2020 | GB | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/GB2021/051221 | 5/20/2021 | WO | |