COMPOSITIONS AND METHODS FOR IMMUNE REPERTOIRE MONITORING

Information

  • Patent Application
  • 20230416810
  • Publication Number
    20230416810
  • Date Filed
    May 15, 2023
  • Date Published
    December 28, 2023
Abstract
The present disclosure provides methods, compositions, kits, and systems useful in the determination and evaluation of the immune repertoire. In one aspect, target-specific primer panels provide for the effective amplification of sequences of B cell receptor heavy and light chains in a single assay, with improved sequencing accuracy and resolution over the repertoire. Variable regions associated with the immune cell receptor are resolved to effectively portray clonal diversity of a biological sample and/or differences associated with the immune cell repertoire of a biological sample.
Description
SEQUENCE LISTING

This application hereby incorporates by reference the material of the electronic Sequence Listing filed concurrently herewith. The material in the electronic Sequence Listing is submitted as an Extensible Markup Language (.xml) file entitled “TP109031USCON1-WO1_ST26.xml” created on May 14, 2023 which has a file size of 3,298,198 bytes and is herein incorporated by reference in its entirety.


BACKGROUND

Adaptive immune response comprises selective response of B and T cells recognizing antigens. The immunoglobulin genes encoding antibody (Ab, in B cell) and T-cell receptor (TCR, in T cell) antigen receptors comprise complex loci wherein extensive diversity of receptors is produced as a result of recombination of the respective variable (V), diversity (D), and joining (J) gene segments, as well as subsequent somatic hypermutation events during early lymphoid differentiation. The recombination process occurs separately for both subunit chains of each receptor and subsequent heterodimeric pairing creates still greater combinatorial diversity. Calculations of the potential combinatorial and junctional possibilities that contribute to the human immune receptor repertoire have estimated that the number of possibilities greatly exceeds the total number of peripheral B or T cells in an individual. See, for example, Davis and Bjorkman (1988) Nature 334:395-402; Arstila et al. (1999) Science 286:958-961; van Dongen et al., In: Leukemia, Henderson et al. (eds) Philadelphia: WB Saunders Company, 2002, pp 85-429.


Extensive efforts have been made over the years to improve analysis of the immune repertoire at high resolution. Means for specific detection and monitoring of expanded clones of lymphocytes would provide significant opportunities for characterization and analysis of normal and pathogenic immune reactions and responses. Despite these efforts, effective high resolution analysis has remained challenging. Advances in next generation sequencing (NGS) have provided access to capturing the repertoire; however, due to the nature of the numerous related sequences and the introduction of sequence errors as a result of the technology, efficient and effective reflection of the true repertoire has proven difficult. Interactions of primer-primer dimers as well as incompatibility of reaction conditions in multiplex PCR assays often require separate PCR reactions to survey each immunoglobulin chain and sometimes within each immunoglobulin chain, often leading to a longer time-to-result for samples in which no marker is initially detected. Thus, there remains a need for improved sequencing methodologies and workflows capable of efficiently resolving complex populations of highly variable immune cell receptor sequences for effective profiling of vast repertoires of immune cell receptors in order to better understand immune cell response, enhance diagnostic and treatment capabilities, and devise new therapeutics.


SUMMARY OF THE INVENTION

In one aspect of the invention compositions are provided for a single stream determination of an immune repertoire in a sample. In some embodiments the composition comprises at least one set of primers i) and ii) and iii), wherein i) consists of a plurality of variable (V) gene primers directed to a majority of different variable regions of an immune receptor IgH coding sequence and a plurality of joining (J) gene primers directed to at least a portion of a majority of different J genes of an immune receptor IgH coding sequence; and ii) consists of one or more variable (V) gene primers directed to at least a portion of the respective target variable region of the respective immune receptor IgL lambda coding sequence and a plurality of joining (J) gene primers directed to at least a portion of a majority of different J genes of an immune receptor IgLlambda coding sequence; and iii) consists of one or more variable (V) gene primers directed to at least a portion of the respective target variable region of the respective immune receptor IgL kappa coding sequence and a plurality of joining (J) gene primers directed to at least a portion of a majority of different J genes of an immune receptor IgLkappa coding sequence.
In some embodiments the composition for analysis of a B cell receptor (BCR) repertoire in a sample comprises at least one set of primers i) and ii) and iii), wherein i) consists of (a) a plurality of V gene primers directed to a majority of different V genes of BCR IgH coding sequence comprising at least a portion of framework region 3 (FR3) within the V gene, and (b) a plurality of J gene primers directed to a majority of different J genes of BCR IgH coding sequence; and ii) consists of (a) one or more V gene primers directed to at least a portion of a V gene of the BCR IgL lambda coding sequence comprising at least a portion of framework region 3 (FR3) within the V gene, and (b) a plurality of J gene primers directed to at least a portion of a majority of different J genes of the BCR IgL lambda coding sequence; and iii) consists of (a) one or more V gene primers directed to at least a portion of a V gene of the BCR IgL kappa coding sequence comprising at least a portion of framework region 3 (FR3) within the V gene, and (b) a plurality of J gene primers directed to at least a portion of a majority of different J genes of the BCR IgL kappa coding sequence; wherein each set of i) and ii) and iii) primers is directed to coding sequences of the same target BCR gene selected from IgH, IgLlambda, and IgLkappa; and wherein each set of i) and ii) and iii) primers directed to the same target BCR is configured to amplify the target BCR repertoire. In certain embodiments compositions further comprise iv) consisting of (a) one or more gene primers directed to an IgLkappa Cintron sequence and (b) one or more gene primers directed to a KDE sequence.
In still other embodiments the composition comprises at least one set of primers i) and ii), wherein i) consists of (a) a plurality of V gene primers directed to a majority of different V genes of at least one BCR coding sequence comprising at least a portion of distal FR3 within the V gene of the IgH BCR coding sequence and/or (b) a plurality of V gene primers directed to a majority of different V genes of at least one BCR coding sequence comprising at least a portion of FR2 within the V gene of the IgH BCR coding sequence; and ii) consists of a plurality of J gene primers directed to at least a portion of a majority of different J genes of the at least one IgH BCR coding sequence.


In some aspects, a multiplex assay comprising compositions of the invention is provided. In some embodiments a test kit comprising compositions of the invention is provided.


In other aspects of the invention, methods are provided for determining immune repertoire activity in a biological sample. Such methods comprise performing multiplex amplification with primer sets which target two or more different types of immune receptors, for example, multiplex amplification of BCR targets in a single reaction.


In some embodiments, the method for amplification of rearranged genomic DNA (gDNA) sequences of a B cell receptor (BCR) repertoire in a sample comprises performing a single multiplex amplification reaction to amplify expressed target immune receptor nucleic acid template molecules using at least one set of:

    • i) (a) a plurality of V gene primers directed to a majority of different V genes of BCR IgH coding sequence comprising at least a portion of framework region 3 (FR3) within the V gene,
      • (b) a plurality of J gene primers directed to at least a portion of a majority of different J genes of the BCR IgH coding sequence; and
    • ii) (a) a plurality of V gene primers directed to a majority of different V genes of BCR IgLlambda coding sequence comprising at least a portion of framework region 3 (FR3) within the V gene,
      • (b) a plurality of J gene primers directed to at least a portion of a majority of different J genes of the BCR IgLlambda coding sequence; and
    • iii) (a) a plurality of V gene primers directed to a majority of different V genes of BCR IgLkappa coding sequence comprising at least a portion of framework region 3 (FR3) within the V gene,
      • (b) a plurality of J gene primers directed to at least a portion of a majority of different J genes of the BCR IgLkappa coding sequence; and optionally
    • iv) (a) one or more gene primers directed to an IgLkappa Cintron sequence, and
      • (b) one or more gene primers directed to a KDE sequence;
    • wherein each set of i) and ii) and iii) primers is directed to coding sequences of the same target BCR immune receptor gene selected from IgH, IgLlambda, and IgLkappa gene and wherein performing the amplification using the set of i) and ii) and iii) primers results in amplicon molecules representing the target BCR immune receptor repertoire in the sample; thereby generating immune receptor amplicon molecules comprising the target immune receptor repertoire.


In some embodiments, the method for amplification of rearranged genomic DNA (gDNA) sequences of a B cell receptor (BCR) repertoire in a sample comprises performing a single multiplex amplification reaction to amplify expressed target immune receptor nucleic acid template molecules using at least one set of:

    • i) (a) a plurality of V gene primers directed to a majority of different V genes of at least one BCR coding sequence comprising at least a portion of distal FR3 within the V gene, and/or
      • (b) a plurality of V gene primers directed to a majority of different V genes of at least one BCR coding sequence comprising at least a portion of FR2 within the V gene; and
    • ii) a plurality of J gene primers directed to at least a portion of a majority of different J genes of the at least one BCR coding sequence;
    • wherein each set of i) and ii) primers is directed to coding sequences of the same target BCR IgH gene and wherein performing the amplification using the at least one set of i) and ii) primers results in amplicon molecules representing the target BCR repertoire in the sample; thereby generating target BCR amplicon molecules comprising the target BCR repertoire.


Methods of the invention further comprise preparing a BCR repertoire library using the amplified target immune receptor sequences through introducing adapter sequences to the termini of the amplified target sequences. In some embodiments, the adapter-modified immune receptor repertoire library is clonally amplified. The methods further comprise detecting sequences of the immune repertoire of each of the immune receptors in the sample and/or expression of each of the plurality of target immune receptor sequences, wherein a change in the level of repertoire sequences and/or expression of one or more target immune receptor markers as compared with a second sample or a control sample determines a change in immune repertoire activity in the sample. In certain embodiments sequencing of the immune receptor amplicon molecules is carried out using next generation sequence analysis to determine sequence of the immune receptor amplicons. In particular embodiments determining the sequence of the immune receptor amplicon molecules includes obtaining initial sequence reads, aligning and identifying productive reads and correcting errors to generate rescued productive reads and determining the sequences of the resulting total productive reads, thereby providing sequence of the immune repertoire in the sample. Provided methods described herein utilize compositions of the invention provided herein. In still other aspects of the invention, particular analysis methodology for error correction is provided in order to generate comprehensive, effective sequence information from methods provided herein.


In another aspect, methods are provided for identifying or screening for a biomarker for a disease or condition in a subject using provided compositions and methods described herein. In some embodiments, the disease or condition for which a biomarker is identified or screened is selected from cancer, autoimmune disease, infectious disease, allergy, response to vaccination, and response to an immunotherapy treatment.





BRIEF DESCRIPTION OF THE DRAWINGS


FIGS. 1A-1C are diagrams depicting assays of the invention. (A) depicts a B cell clonality assay for detection of IGH, IGkappa, IGlambda comprising FR3-J primers as well as KDE/Cint primers in a single library preparation reaction; (B) depicts two additional IGH assays targeting distal regions of FR3 (FR3(d)-J) and FR2-J regions; (C) depicts two additional IGH assays targeting leader-J and FR1-J regions.





DESCRIPTION OF THE INVENTION

We have developed a multiplex library preparation technology and sequencing workflow for effective detection and analysis of the B cell immune repertoire in a sample. Provided methods enable a single reaction for profiling B cell receptor heavy and light chains using a single library assay. Combining receptors in a single reaction allows for a higher success rate in clonality detection while maintaining the ability to efficiently detect rare clones of IGH, IGK, and IGL chain rearrangements (e.g., down to 1:10^6). Provided methods simplify the workflow for clonality assessment and rare clone detection of B cells, e.g., in B cell malignancies. Provided methods and compositions herein represent an advancement in repertoire assessment by NGS, by combining multiple B-cell receptor targets in a single library construction reaction. Multiple receptor assays allow for simpler determination of clonality from DNA samples using fewer secondary tests and conserving sample material.
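
The clonality and rare-clone statements above rest on simple frequency arithmetic. The following is a minimal sketch of that arithmetic, offered only as an illustration and not as part of the disclosed workflow; the function names and the example counts are hypothetical.

```python
def clone_frequencies(clone_counts):
    """Return the fraction of all assigned reads that belongs to each clonotype.

    clone_counts maps a clonotype label (hypothetical) to its read count.
    """
    total = sum(clone_counts.values())
    if total == 0:
        return {}
    return {clone: n / total for clone, n in clone_counts.items()}


def expected_clone_templates(input_cells, one_in_n):
    """Expected number of templates from a clone present at 1 in `one_in_n`
    cells when `input_cells` cell-equivalents of gDNA enter the reaction."""
    return input_cells / one_in_n


# Hypothetical read counts for two clonotypes.
freqs = clone_frequencies({"clone_A": 3, "clone_B": 1})

# A clone present at 1 in 1,000,000 cells contributes about two templates
# when two million cell-equivalents are sampled.
templates = expected_clone_templates(2_000_000, 1_000_000)
```

The second function makes the sampling constraint explicit: a clone at a frequency of one in N cannot be expected in the library unless at least on the order of N cell-equivalents are used as input.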


We have developed a multiplex next generation sequencing workflow for effective detection and analysis of the immune repertoire in a sample. Provided methods, compositions, systems, and kits are for use in high accuracy amplification and sequencing of immune cell receptor sequences (e.g., B cell receptor (BCR or Ab) targets) in monitoring and resolving complex immune cell repertoire(s) in a subject. The target immune cell receptor genes have undergone rearrangement (or recombination) of the VDJ or VJ gene segments, the gene segments depending on the particular receptor gene (e.g., IgH, IgLkappa, IgLlambda). In certain embodiments, the present disclosure provides methods, compositions, and systems that use nucleic acid amplification, such as PCR, to enrich rearranged target immune cell receptor gene sequences from gDNA for subsequent sequencing. In certain embodiments, the present disclosure also provides methods and systems for effective identification and removal of amplification or sequencing-derived error(s) to improve read assignment accuracy and lower the false positive rate. In particular, provided methods described herein may improve accuracy and performance in sequencing applications with nucleotide sequences associated with genomic recombination and high variability. In some embodiments, methods, compositions, systems, and kits provided herein are for use in amplification and sequencing of the CDRs of rearranged immune cell receptor gDNA in a sample. Thus, provided herein are multiplex immune cell receptor expression compositions and immune cell receptor gene-directed compositions for multiplex library preparation, used in conjunction with next generation sequencing technologies and workflow solutions (e.g., manual or automated), for effective detection and characterization of the immune repertoire in a sample.


The CDRs of a BCR result from genomic DNA undergoing recombination of the V(D)J gene segments as well as addition and/or deletion of nucleotides at the gene segment junctions. Recombination of the V(D)J gene segments and subsequent hypermutation events lead to extensive diversity of the expressed immune cell receptors. With the stochastic nature of V(D)J recombination, it is often the case that rearrangement of the B cell receptor genomic DNA will fail to produce a functional receptor, instead producing what is termed an “unproductive” rearrangement. Typically, unproductive rearrangements have out-of-frame Variable and Joining coding segments, and lead to the presence of premature stop codons and synthesis of irrelevant peptides. Unproductive BCR gene rearrangements are generally rare in cDNA-based repertoire sequencing for a number of biological or physiological reasons such as: 1) nonsense-mediated decay, which destroys mRNA containing premature stop codons, 2) B cell selection, where only B cells with a functional receptor survive, and 3) allelic exclusion, where only a single rearranged receptor allele is expressed in any given B cell.


BCR sequences can also appear as unproductive rearrangements from errors introduced during amplification reactions or during sequencing processes. For example, an insertion or deletion (indel) error during a target amplification or sequencing reaction can cause a frameshift in the reading frame of the resulting coding sequence. Such a change may result in a target sequence read of a productive rearrangement being interpreted as an unproductive rearrangement and discarded from the group of identified clonotypes. Accordingly, in some embodiments, methods and systems provided herein include processes for identifying and/or removing PCR or sequencing-derived error from the determined immune receptor sequence.
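
The effect of an indel on read classification can be sketched in a few lines. The example below is a simplified illustration and not the error-correction method of this disclosure: it assumes each read begins at the first base of a codon, treats a read as productive when it is in frame and free of stop codons, and reassigns an unproductive read to a productive sequence that differs from it by exactly one inserted or deleted base. All function names and sequences are hypothetical.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_productive(coding_seq):
    """True if the read is in frame (length divisible by 3) and has no stop codon.

    Assumption: the read starts at the first base of a codon.
    """
    seq = coding_seq.upper()
    if len(seq) == 0 or len(seq) % 3 != 0:
        return False
    return all(seq[i:i + 3] not in STOP_CODONS for i in range(0, len(seq), 3))


def differs_by_one_indel(a, b):
    """True if deleting exactly one base from the longer sequence yields the shorter."""
    if abs(len(a) - len(b)) != 1:
        return False
    longer, shorter = (a, b) if len(a) > len(b) else (b, a)
    i = 0
    while i < len(shorter) and longer[i] == shorter[i]:
        i += 1
    # Skip the first mismatching base of the longer read and compare the rest.
    return longer[i + 1:] == shorter[i:]


def rescue_reads(reads):
    """Count productive reads, then add unproductive reads that sit one indel
    away from a productive sequence to that sequence's count ("rescued" reads).
    Unproductive reads with no such neighbour are dropped."""
    productive = Counter(r for r in reads if is_productive(r))
    rescued = Counter()
    for read in reads:
        if is_productive(read):
            continue
        candidates = [p for p in productive if differs_by_one_indel(read, p)]
        if candidates:
            best = max(candidates, key=lambda p: productive[p])
            rescued[best] += 1
    return productive + rescued


# Three error-free copies, one copy with a single-base deletion, and one
# unrelated read that contains a stop codon.
reads = ["TGTGCGAGAGATTACTGG"] * 3 + ["TGTGCGAGGATTACTGG", "TGTTGAAGA"]
totals = rescue_reads(reads)
```

In this toy input the read carrying the deletion would otherwise be discarded as out of frame; after rescue it is counted with its parent sequence, while the read containing a stop codon has no productive neighbour and remains excluded.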


In some embodiments, methods and compositions provided are used for amplifying the rearranged variable regions of immune cell receptor gDNA, e.g., rearranged BCR gene DNA. Multiplex amplification is used to enrich for a portion of rearranged BCR gDNA which includes at least a portion of the variable region of the receptor. In some embodiments, the amplified gDNA includes one or more complementarity determining regions CDR1, CDR2, and/or CDR3 for the target receptor. In some embodiments, the amplified gDNA includes one or more complementarity determining regions CDR2 and/or CDR3 for IgH. In some embodiments, the amplified gDNA includes primarily CDR3 for the target receptor, e.g., CDR3 for IgH.


As used herein, “immune cell receptor” and “immune receptor” are used interchangeably.


As used herein, the terms “complementarity determining region” and “CDR” refer to regions of a B cell receptor or an antibody (immunoglobulin) where the molecule complements an antigen's conformation, thereby determining the molecule's specificity and contact with a specific antigen. In the variable regions of B cell receptors, the CDRs are interspersed with regions that are more conserved, termed framework regions (FR). Each variable region of a B cell receptor contains 3 CDRs, designated CDR1, CDR2 and CDR3, and also contains 4 framework sub-regions, designated FR1, FR2, FR3 and FR4.


As used herein, the term “framework” or “framework region” or “FR” refers to the residues of the variable region other than the CDR residues as defined herein. There are four separate framework sub-regions that make up the framework: FR1, FR2, FR3, and FR4.


The particular designation in the art for the exact location of the CDRs and FRs within the receptor molecule (BCR or immunoglobulin) varies depending on what definition is employed. Unless specifically stated otherwise, the IMGT designations are used herein in describing the CDR and FR regions (see Brochet et al. (2008) Nucleic Acids Res. 36:W503-508, herein specifically incorporated by reference).


Other well-known standard designations for describing the regions include those found in Kabat et al., (1991) Sequences of Proteins of Immunological Interest, 5th Ed. Public Health Service, National Institutes of Health, Bethesda, Md., and in Chothia and Lesk (1987) J. Mol. Biol. 196:901-917; herein specifically incorporated by reference. As one example of CDR designations, the residues that make up the six immunoglobulin CDRs have been characterized by Kabat as follows: residues 24-34 (CDRL1), 50-56 (CDRL2) and 89-97 (CDRL3) in the light chain variable region and 31-35 (CDRH1), 50-65 (CDRH2) and 95-102 (CDRH3) in the heavy chain variable region; and by Chothia as follows: residues 26-32 (CDRL1), 50-52 (CDRL2) and 91-96 (CDRL3) in the light chain variable region and 26-32 (CDRH1), 53-55 (CDRH2) and 96-101 (CDRH3) in the heavy chain variable region.
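
Because the CDR boundaries depend on the definition employed, the same residue number can fall in a CDR under one designation and in a framework region under another. The lookup below is a small illustrative sketch using the Kabat and Chothia residue ranges listed above (with the commonly cited Kabat range 50-56 for CDRL2). It is not an IMGT implementation, it ignores insertion letters such as 27A or 100A, and the names `CDR_RANGES` and `region_of` are hypothetical.

```python
# Residue ranges (inclusive) for each CDR under two designations.
CDR_RANGES = {
    "kabat": {
        "light": {"CDRL1": (24, 34), "CDRL2": (50, 56), "CDRL3": (89, 97)},
        "heavy": {"CDRH1": (31, 35), "CDRH2": (50, 65), "CDRH3": (95, 102)},
    },
    "chothia": {
        "light": {"CDRL1": (26, 32), "CDRL2": (50, 52), "CDRL3": (91, 96)},
        "heavy": {"CDRH1": (26, 32), "CDRH2": (53, 55), "CDRH3": (96, 101)},
    },
}


def region_of(position, chain, scheme="kabat"):
    """Return the CDR name containing `position`, or "framework" if none does.

    `chain` is "light" or "heavy"; `scheme` is "kabat" or "chothia".
    """
    for name, (start, end) in CDR_RANGES[scheme][chain].items():
        if start <= position <= end:
            return name
    return "framework"


# Heavy chain residue 30 is framework under Kabat but CDRH1 under Chothia.
kabat_call = region_of(30, "heavy", "kabat")
chothia_call = region_of(30, "heavy", "chothia")
```

This is why a single designation (IMGT, unless stated otherwise) is fixed for the remainder of the description.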


The term “antibody” or “immunoglobulin” or “B cell receptor” or “BCR,” as used herein, is intended to refer to immunoglobulin molecules comprised of four polypeptide chains, two heavy (H) chains and two light (L) chains (lambda or kappa) inter-connected by disulfide bonds. An antibody has a known specific antigen with which it binds. Each heavy chain of an antibody is comprised of a heavy chain variable region (abbreviated herein as HCVR, HV or VH) and a heavy chain constant region. The heavy chain constant region is comprised of three domains, CH1, CH2 and CH3. Each light chain is comprised of a light chain variable region (abbreviated herein as LCVR or VL or KV or LV to designate kappa or lambda light chains) and a light chain constant region. The light chain constant region is comprised of one domain, CL. The heavy chain determines the class or isotype to which the immunoglobulin belongs. In mammals, for example, the five main immunoglobulin isotypes are IgA, IgD, IgE, IgG and IgM and they are classed according to the alpha, delta, epsilon, gamma or mu heavy chain they contain, respectively.


As noted, the diversity of the BCR chain CDRs is created by recombination of germline variable (V), diversity (D), and joining (J) gene segments, as well as by independent addition and deletion of nucleotides at each of the gene segment junctions during the process of BCR gene rearrangement. In the rearranged nucleic acid encoding a BCR heavy chain, CDR1 and CDR2 are found in the V gene segment and CDR3 includes some of the V gene segment and the D and J gene segments. In the rearranged nucleic acid encoding a BCR light chain, CDR1 and CDR2 are found in the V gene segment and CDR3 includes some of the V gene segment and the J gene segment.


In some embodiments, a multiplex amplification reaction is used to amplify BCR genomic DNA having undergone V(D)J rearrangement. In some embodiments, a multiplex amplification reaction is used to amplify nucleic acid molecule(s) comprising at least a portion of a BCR CDR from gDNA derived from a biological sample. In some embodiments, a multiplex amplification reaction is used to amplify nucleic acid molecule(s) comprising at least two CDRs of a BCR from gDNA derived from a biological sample. In some embodiments, a multiplex amplification reaction is used to amplify nucleic acid molecules comprising at least three CDRs of a BCR from gDNA derived from a biological sample. In some embodiments, the resulting amplicons are used to determine the nucleotide sequences of the rearranged BCR CDRs in the sample. In some embodiments, determining the nucleotide sequences of such amplicons comprising at least CDR3 is used to identify and characterize novel BCR alleles.


In some embodiments of the multiplex amplification reactions, each primer set used targets the same BCR region; however, the different primers in the set permit targeting of the gene's different V(D)J gene rearrangements. For example, the primers in the set for amplification of the expressed IgH or the rearranged IgH gDNA are all designed to target the same region(s) from IgH mRNA or IgH gDNA, respectively, but the individual primers in the set lead to amplification of the various IgH VDJ gene combinations. In some embodiments, at least one primer set includes a variety of primers directed to at least a portion of J gene segments of an immune receptor gene and the other primer set includes a variety of primers directed to at least a portion of V gene segments of the same gene.
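
The pairing logic described above, in which every V gene primer for a receptor gene can pair with every J gene primer for the same gene but not with primers for a different gene, can be sketched as follows. The panel contents and primer names are placeholders invented for illustration and do not correspond to the primers of the Tables or the Sequence Listing.

```python
from itertools import product


def primer_pairs(panel):
    """List every (gene, V primer, J primer) combination within each gene.

    `panel` maps a receptor gene name to a dict holding its "V" and "J"
    primer names. Primers are never paired across different genes.
    """
    pairs = []
    for gene, primers in panel.items():
        for v_primer, j_primer in product(primers["V"], primers["J"]):
            pairs.append((gene, v_primer, j_primer))
    return pairs


# Hypothetical, heavily reduced single-reaction panel.
panel = {
    "IgH": {"V": ["IGHV_FR3_a", "IGHV_FR3_b", "IGHV_FR3_c"],
            "J": ["IGHJ_a", "IGHJ_b"]},
    "IgLkappa": {"V": ["IGKV_FR3_a", "IGKV_FR3_b"], "J": ["IGKJ_a"]},
    "IgLlambda": {"V": ["IGLV_FR3_a"], "J": ["IGLJ_a", "IGLJ_b"]},
}

pairs = primer_pairs(panel)
pairs_per_gene = {gene: sum(1 for g, _, _ in pairs if g == gene) for gene in panel}
```

With this toy panel the intended pairings number 3 x 2 for IgH, 2 x 1 for IgLkappa and 1 x 2 for IgLlambda. Which pairing actually yields an amplicon in a given cell depends on the V and J segments joined in that cell's rearrangement.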


In some embodiments, a multiplex amplification reaction is used to amplify cDNA derived from mRNA expressed from rearranged BCR genomic DNA, including rearranged IgH, IgLkappa, and IgLlambda genomic DNA. In some embodiments, at least a portion of a BCR CDR, for example CDR3, is amplified from cDNA in a multiplex amplification reaction. In some embodiments, at least two CDR portions of BCR are amplified from cDNA in a multiplex amplification reaction. In certain embodiments, a multiplex amplification reaction is used to amplify at least the CDR1, CDR2, and CDR3 regions of a BCR cDNA. In some embodiments, the resulting amplicons are used to determine the expressed BCR CDR nucleotide sequence. In some embodiments, the resulting amplicons are used to determine the expressed BCR CDR nucleotide sequence and Ig isotype of the sequence. In some embodiments, the resulting amplicons are used to determine the expressed IgH CDR nucleotide sequence and the Ig isotype and Ig sub-isotype.


In some embodiments, a multiplex amplification reaction is used to amplify rearranged BCR genomic DNA, including rearranged IgH, IgLkappa, and IgLlambda genomic DNA. In some embodiments, at least a portion of a BCR CDR, for example CDR3, is amplified from gDNA in a multiplex amplification reaction. In some embodiments, at least two CDR portions of BCR are amplified from gDNA in a multiplex amplification reaction. In certain embodiments, a multiplex amplification reaction is used to amplify at least the CDR2 and CDR3 regions of a rearranged BCR gDNA. In some embodiments, the resulting amplicons are used to determine the rearranged BCR CDR nucleotide sequence. In some embodiments, the resulting amplicons are used to determine the rearranged BCR CDR nucleotide sequence and Ig isotype of the sequence.


In some embodiments, multiplex amplification reactions are performed with primer sets designed to generate amplicons which include the expressed CDR3 regions of the target immune receptor mRNA. In some embodiments, multiplex amplification reactions are performed using i) (a) a plurality of V gene primers directed to a majority of different V genes of BCR IgH coding sequence comprising at least a portion of framework region 3 (FR3) within the V gene and (b) a plurality of J gene primers directed to at least a portion of a majority of different J genes of the BCR IgH coding sequence; and ii) (a) a plurality of V gene primers directed to a majority of different V genes of BCR IgLlambda coding sequence comprising at least a portion of framework region 3 (FR3) within the V gene, (b) a plurality of J gene primers directed to at least a portion of a majority of different J genes of the BCR IgLlambda coding sequence; and iii) (a) a plurality of V gene primers directed to a majority of different V genes of BCR IgLkappa coding sequence comprising at least a portion of framework region 3 (FR3) within the V gene, wherein each set of i) and ii) and iii) primers is directed to coding sequences of the same BCR gene such that performing the amplification using the at least one set of i) and ii) primers results in amplicon molecules representing the target BCR repertoire in the sample; thereby generating target BCR amplicon molecules comprising the target BCR repertoire.
For example, exemplary primers specific for IgH V gene FR3 regions are shown in Table 9 and exemplary primers specific for IgH J genes are shown in Table 6, exemplary primers specific for IgLkappa V gene FR3 regions are shown in Table 3 and exemplary primers specific for IgLkappa J genes are shown in Table 4, exemplary primers specific for IgLlambda V gene FR3 regions are shown in Table 1 and exemplary primers specific for IgLlambda J genes are shown in Table 2 and exemplary primers specific for KDE and Cint are shown in Table 5.


In some embodiments, the multiplex amplification reaction uses i) (a) a plurality of V gene primers directed to a majority of different V genes of at least one BCR coding sequence comprising at least a portion of distal FR3 within the V gene, and/or (b) a plurality of V gene primers directed to a majority of different V genes of at least one BCR coding sequence comprising at least a portion of FR2 within the V gene; and ii) a plurality of J gene primers directed to at least a portion of a majority of different J genes of the at least one BCR coding sequence, wherein each set of i) and ii) primers is directed to coding sequences of the same target BCR IgH gene such that performing the amplification using the at least one set of i) and ii) primers results in amplicon molecules representing the target BCR repertoire in the sample; thereby generating target BCR amplicon molecules comprising the target BCR repertoire. For example, exemplary primers specific for IgH V gene FR2 regions are shown in Table 7, exemplary primers specific for IgH V gene distal FR3 regions are shown in Table 8, and exemplary primers specific for IgH J genes are shown in Table 6.


In some embodiments, provided are compositions for multiplex amplification of at least a portion of an expressed BCR variable region. In some embodiments, the composition comprises a plurality of sets of primer pair reagents directed to a portion of a V gene framework region and a portion of a constant (C) gene of rearranged target immune receptor genes selected from the group consisting of immunoglobulin heavy chain (IgH), immunoglobulin light chain lambda (IgL), and immunoglobulin light chain kappa (IgK). In some embodiments, the composition comprises a plurality of sets of primer pair reagents directed to a portion of a V gene framework region and a portion of a J gene of rearranged target immune receptor genes selected from the group consisting of IgH, IgLkappa and IgLlambda.


In some embodiments, provided methods comprise multiplex amplification reactions performed with primer sets designed to generate amplicons which include the CDR1, CDR2 and CDR3 regions of the target immune receptor nucleic acid. In some embodiments, multiplex amplification reactions are performed using i) (a) a plurality of V gene primers directed to a majority of different V genes of BCR IgH coding sequence comprising at least a portion of framework region 1 (FR1) within the V gene or (b) a plurality of V gene primers directed to a majority of different V genes of BCR IgH coding sequence comprising at least a portion of leader region within the V gene; and ii) a plurality of J gene primers directed to at least a portion of a majority of different J genes of the BCR IgH coding sequence; wherein each set of i) and ii) primers is directed to coding sequences of the same BCR gene such that performing the amplification using the at least one set of i) and ii) primers results in amplicon molecules representing the target BCR repertoire in the sample; thereby generating target BCR amplicon molecules comprising the target BCR repertoire. For example, exemplary primers specific for IgH V gene FR1 regions are shown in Table 11 and exemplary primers specific for IgH J genes are shown in Table 6, exemplary primers specific for IgH V gene leader regions are shown in Table 10 and exemplary primers specific for IgH J genes are shown in Table 6.


In some embodiments, provided are compositions for multiplex amplification of at least a portion of an expressed BCR variable region. In some embodiments, the composition comprises a plurality of sets of primer pair reagents directed to a portion of a V gene framework region 1 (FR1) and a portion of a joining (J) gene of rearranged immunoglobulin heavy chain (IgH) genes. In some embodiments, the composition comprises a plurality of sets of primer pair reagents directed to a portion of a V gene leader region and a portion of a J gene of rearranged IgH genes.


Amplification by PCR is performed with at least two primers. For the methods provided herein, a set of primers is used that is sufficient to amplify all or a defined portion of the variable sequences at the locus of interest, which locus may include any or all of the aforementioned BCR immunoglobulin loci. In some embodiments, various parameters or criteria outlined herein may be used to select the set of target-specific primers for the multiplex amplification.


In some embodiments, primer sets used in the multiplex reactions are designed to amplify at least 50% of the known expressed or gDNA rearrangements at the locus of interest. In certain embodiments, primer sets used in the multiplex reactions are designed to amplify at least 75%, at least 85%, at least 90%, at least 95%, at least 97%, at least 98% or more of the known expressed or gDNA rearrangements at the locus of interest.
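
The coverage criterion above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each known rearrangement is summarized as a (V gene, J gene) pair and that a rearrangement counts as amplifiable when both genes are targeted by at least one primer in the set. The function name, the gene lists, and the exact-name matching rule are hypothetical and are not taken from the disclosure; real primer design would also weigh mismatches, allelic variants, and hypermutation.

```python
from typing import Iterable, Set, Tuple

# A rearrangement is summarized here as (V gene name, J gene name).
# This simplification is an assumption made for illustration only.
Rearrangement = Tuple[str, str]


def primer_set_coverage(
    rearrangements: Iterable[Rearrangement],
    v_targets: Set[str],
    j_targets: Set[str],
) -> float:
    """Return the fraction of rearrangements whose V and J genes are both targeted."""
    items = list(rearrangements)
    if not items:
        raise ValueError("at least one rearrangement is required")
    covered = sum(1 for v, j in items if v in v_targets and j in j_targets)
    return covered / len(items)


# Hypothetical toy data: four known rearrangements at one locus.
known = [
    ("IGHV1-2", "IGHJ4"),
    ("IGHV3-23", "IGHJ4"),
    ("IGHV3-23", "IGHJ6"),
    ("IGHV4-34", "IGHJ5"),
]
v_targets = {"IGHV1-2", "IGHV3-23"}  # V genes covered by the hypothetical V primers
j_targets = {"IGHJ4", "IGHJ6"}       # J genes covered by the hypothetical J primers

coverage = primer_set_coverage(known, v_targets, j_targets)
meets_75_percent = coverage >= 0.75
print(f"coverage = {coverage:.0%}, meets 75% criterion: {meets_75_percent}")
```

In this toy case three of the four rearrangements are covered, so the set would satisfy an "at least 75%" criterion but not an "at least 85%" criterion.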


For example, such a multiplex amplification reaction includes at least 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, or 90, preferably 22, 23, 24, 25, 26, 27, 28, 29, 30, 34, 38, 42, 46, 50, 54, 58, or 62 reverse primers in which each reverse primer is directed to a sequence corresponding to at least a portion of one or more BCR V gene FR3 regions. In such embodiments, the plurality of reverse primers directed to the BCR V gene FR3 regions is combined with at least 1 forward primer directed to a sequence corresponding to at least a portion of a J gene of the same BCR gene. In some embodiments, the plurality of reverse primers directed to the BCR V gene FR3 regions is combined with at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 15, or about 2 to about 7, about 5 to about 20, about 5 to about 15, or about 7 to about 12 forward primers each directed to a sequence corresponding to at least a portion of at least one of the J genes of the same BCR gene. In some embodiments of the multiplex amplification reactions, the BCR V gene FR3-directed primers may be the forward primers and the BCR J gene-directed primer(s) may be the reverse primer(s). Accordingly, in some embodiments, a multiplex amplification reaction includes at least 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, or 90, preferably 22, 23, 24, 25, 26, 27, 28, 29, 30, 34, 38, 42, 46, 50, 54, 58, or 62 forward primers in which each forward primer is directed to a sequence corresponding to at least a portion of one or more BCR V gene FR3 regions. In such embodiments, the plurality of forward primers directed to the BCR V gene FR3 regions is combined with at least 1 reverse primer directed to a sequence corresponding to at least a portion of a J gene of the same BCR gene. 
In some embodiments, the plurality of forward primers directed to the BCR V gene FR3 regions is combined with at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 15, or about 2 to about 7, about 5 to about 20, about 5 to about 15, or about 7 to about 12 reverse primers each directed to a sequence corresponding to at least a portion of at least one of the J genes of the same BCR gene. In some embodiments, such FR3 and J gene amplification primer sets may be directed to IgH gene sequences. In some preferred embodiments, about 22 to about 35 reverse primers directed to different IgH V gene FR3 regions are combined with about 2 to about 8 forward primers directed to a portion of the IgH J genes. In other preferred embodiments, about 22 to about 35 reverse primers directed to different IgH V gene FR3 regions are combined with about 5 to about 15 forward primers directed to a portion of the IgH J genes. In other preferred embodiments, about 48 to about 60 reverse primers directed to different IgH V gene FR3 regions are combined with about 5 to about 15 forward primers directed to a portion of the IgH J genes. In some preferred embodiments, about 22 to about 35 forward primers directed to different IgH V gene FR3 regions are combined with about 2 to about 8 reverse primers directed to a portion of the IgH J genes. In other preferred embodiments, about 22 to about 35 forward primers directed to different IgH V gene FR3 regions are combined with about 5 to about 15 reverse primers directed to a portion of the IgH J genes. In yet other preferred embodiments, about 48 to about 60 forward primers directed to different IgH V gene FR3 regions are combined with about 5 to about 15 reverse primers directed to a portion of the IgH J genes. 
In some preferred embodiments, the forward primers directed to IgH V gene FR3 regions are selected from those listed in Table 8 and the reverse primers directed to the IgH J genes are selected from those listed in Table 6. In other embodiments, the FR3 and J gene amplification primer sets are directed to Ig light chain lambda or Ig light chain kappa gene sequences.


In some embodiments, a multiplex amplification reaction includes at least 5, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, or 90 reverse primers in which each reverse primer is directed to a sequence corresponding to at least a portion of one or more BCR V gene FR2 regions. In such embodiments, the plurality of reverse primers directed to the BCR V gene FR2 regions is combined with at least 1 forward primer directed to a sequence corresponding to at least a portion of a J gene of the same BCR gene. In some embodiments, the plurality of reverse primers directed to the BCR V gene FR2 regions is combined with at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 15, or about 2 to about 7, about 5 to about 20, about 5 to about 15, or about 7 to about 12 forward primers each directed to a sequence corresponding to at least a portion of at least one of the J genes of the same BCR gene. In some embodiments of the multiplex amplification reactions, the BCR V gene FR2 directed primers may be the forward primers and the BCR J gene-directed primer(s) may be the reverse primer(s). Accordingly, in some embodiments, a multiplex amplification reaction includes at least 5, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, or 90 forward primers in which each forward primer is directed to a sequence corresponding to at least a portion of one or more BCR V gene FR2 regions. In such embodiments, the plurality of forward primers directed to the BCR V gene FR2 regions is combined with at least 1 reverse primer directed to a sequence corresponding to at least a portion of a J gene of the same BCR gene. 
In some embodiments, the plurality of forward primers directed to the BCR V gene FR2 regions is combined with at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 15, or about 2 to about 7, about 5 to about 20, about 5 to about 15, or about 7 to about 12 reverse primers each directed to a sequence corresponding to at least a portion of at least one of the J genes of the same BCR gene. In some embodiments, such FR2 and J gene amplification primer sets may be directed to IgH gene sequences. In some embodiments, about 5 to about 15 reverse primers directed to different IgH V gene FR2 regions are combined with about 2 to about 8 forward primers directed to a portion of the IgH J gene. In some embodiments, about 5 to about 15 reverse primers directed to different IgH V gene FR2 regions are combined with about 5 to about 15 forward primers directed to a portion of the IgH J gene. In some embodiments, about 5 to about 15 forward primers directed to different IgH V gene FR2 regions are combined with about 2 to about 8 reverse primers directed to a portion of the IgH J gene. In some embodiments, about 5 to about 15 forward primers directed to different IgH V gene FR2 regions are combined with about 5 to about 15 reverse primers directed to a portion of the IgH J gene. In some preferred embodiments, the forward primers directed to IgH V gene FR2 regions are selected from those listed in Table 7 and the reverse primers directed to the IgH J gene are selected from those listed in Table 6.


In some embodiments, a multiplex amplification reaction includes at least 5, 10, 15, 20, 25, 30, 50, 60, 70, 80, or 90 reverse primers in which each reverse primer is directed to a sequence corresponding to at least a portion of one or more BCR V gene FR1 regions. In such embodiments, the plurality of reverse primers directed to the BCR V gene FR1 regions is combined with at least 1 forward primer directed to a sequence corresponding to at least a portion of a J gene of the same BCR gene. In some embodiments, the plurality of reverse primers directed to the BCR V gene FR1 regions is combined with at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 15, or about 2 to about 7, about 5 to about 20, about 5 to about 15, or about 7 to about 12 forward primers each directed to a sequence corresponding to at least a portion of at least one of the J genes of the same BCR gene. In some embodiments of the multiplex amplification reactions, the BCR V gene FR1-directed primers may be the forward primers and the BCR J gene-directed primer(s) may be the reverse primer(s). Accordingly, in some embodiments, a multiplex amplification reaction includes at least 5, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, or 90 forward primers in which each forward primer is directed to a sequence corresponding to at least a portion of one or more BCR V gene FR1 regions. In such embodiments, the plurality of forward primers directed to the BCR V gene FR1 regions is combined with at least 1 reverse primer directed to a sequence corresponding to at least a portion of a J gene of the same BCR gene. 
In some embodiments, the plurality of forward primers directed to the BCR V gene FR1 regions is combined with at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 15, or about 2 to about 7, about 5 to about 20, about 5 to about 15, or about 7 to about 12 reverse primers each directed to a sequence corresponding to at least a portion of at least one of the J genes of the same BCR gene. In some embodiments, such FR1 and J gene amplification primer sets may be directed to IgH gene sequences. In some embodiments, about 5 to about 15 reverse primers directed to different IgH V gene FR1 regions are combined with about 2 to about 8 forward primers directed to a portion of the IgH J gene. In some embodiments, about 5 to about 15 reverse primers directed to different IgH V gene FR1 regions are combined with about 5 to about 15 forward primers directed to a portion of the IgH J gene. In some embodiments, about 5 to about 15 forward primers directed to different IgH V gene FR1 regions are combined with about 2 to about 8 reverse primers directed to a portion of the IgH J gene. In some embodiments, about 5 to about 15 forward primers directed to different IgH V gene FR1 regions are combined with about 5 to about 15 reverse primers directed to a portion of the IgH J gene. In some preferred embodiments, the forward primers directed to IgH V gene FR1 regions are selected from those listed in Table 11 and the reverse primers directed to the IgH J gene are selected from those listed in Table 6.


In some embodiments, a multiplex amplification reaction includes at least 5, 10, 15, 20, 25, 30, 50, 60, 70, 80, or 90 reverse primers in which each reverse primer is directed to a sequence corresponding to at least a portion of one or more BCR V gene LEADER regions. In such embodiments, the plurality of reverse primers directed to the BCR V gene LEADER regions is combined with at least 1 forward primer directed to a sequence corresponding to at least a portion of a J gene of the same BCR gene. In some embodiments, the plurality of reverse primers directed to the BCR V gene LEADER regions is combined with at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 15, or about 2 to about 7, about 5 to about 20, about 5 to about 15, or about 7 to about 12 forward primers each directed to a sequence corresponding to at least a portion of at least one of the J genes of the same BCR gene. In some embodiments of the multiplex amplification reactions, the BCR V gene LEADER-directed primers may be the forward primers and the BCR J gene-directed primer(s) may be the reverse primer(s). Accordingly, in some embodiments, a multiplex amplification reaction includes at least 5, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, or 90 forward primers in which each forward primer is directed to a sequence corresponding to at least a portion of one or more BCR V gene LEADER regions. In such embodiments, the plurality of forward primers directed to the BCR V gene LEADER regions is combined with at least 1 reverse primer directed to a sequence corresponding to at least a portion of a J gene of the same BCR gene. 
In some embodiments, the plurality of forward primers directed to the BCR V gene LEADER regions is combined with at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 15, or about 2 to about 7, about 5 to about 20, about 5 to about 15, or about 7 to about 12 reverse primers each directed to a sequence corresponding to at least a portion of at least one of the J genes of the same BCR gene. In some embodiments, such LEADER and J gene amplification primer sets may be directed to IgH gene sequences. In some embodiments, about 5 to about 15 reverse primers directed to different IgH V gene LEADER regions are combined with about 2 to about 8 forward primers directed to a portion of the IgH J gene. In some embodiments, about 5 to about 15 reverse primers directed to different IgH V gene LEADER regions are combined with about 5 to about 15 forward primers directed to a portion of the IgH J gene. In some embodiments, about 5 to about 15 forward primers directed to different IgH V gene LEADER regions are combined with about 2 to about 8 reverse primers directed to a portion of the IgH J gene. In some embodiments, about 5 to about 15 forward primers directed to different IgH V gene LEADER regions are combined with about 5 to about 15 reverse primers directed to a portion of the IgH J gene. In some preferred embodiments, the forward primers directed to IgH V gene LEADER regions are selected from those listed in Table 10 and the reverse primers directed to the IgH J gene are selected from those listed in Table 6.


In some embodiments, a multiplex amplification reaction includes at least 20, 25, 30, 40, 45, preferably 50, 55, 60, 65, 70, 75, 80, 85, or 90 reverse primers in which each reverse primer is directed to a sequence corresponding to at least a portion of one or more BCR V gene FR3 regions. In such embodiments, the plurality of reverse primers directed to the BCR V gene FR3 regions is combined with at least 1 forward primer directed to a sequence corresponding to at least a portion of a J gene of the same BCR gene. In some embodiments, the plurality of reverse primers directed to the BCR V gene FR3 regions is combined with at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 15, or about 2 to about 7, about 5 to about 20, about 5 to about 15, or about 7 to about 12 forward primers each directed to a sequence corresponding to at least a portion of at least one of the J genes of the same BCR gene. In some embodiments of the multiplex amplification reactions, the BCR V gene FR3-directed primers may be the forward primers and the BCR J gene-directed primer(s) may be the reverse primer(s). Accordingly, in some embodiments, a multiplex amplification reaction includes at least 20, 25, 30, 40, 45, preferably 50, 55, 60, 65, 70, 75, 80, 85, or 90 forward primers in which each forward primer is directed to a sequence corresponding to at least a portion of one or more BCR V gene FR3 regions. In such embodiments, the plurality of forward primers directed to the BCR V gene FR3 regions is combined with at least 1 reverse primer directed to a sequence corresponding to at least a portion of a J gene of the same BCR gene. 
In some embodiments, the plurality of forward primers directed to the BCR V gene FR3 regions is combined with at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 15, or about 2 to about 7, about 5 to about 20, about 5 to about 15, or about 7 to about 12 reverse primers each directed to a sequence corresponding to at least a portion of at least one of the J genes of the same BCR gene. In some embodiments, such FR3 and J gene amplification primer sets may be directed to IgH gene sequences. In some preferred embodiments, about 62 to about 75 reverse primers directed to different IgH V gene FR3 regions are combined with about 2 to about 8 forward primers directed to a portion of IgH J genes. In other preferred embodiments, about 62 to about 75 reverse primers directed to different IgH V gene FR3 regions are combined with about 5 to about 15 forward primers directed to a portion of IgH J genes. In some preferred embodiments, about 62 to about 75 forward primers directed to different IgH V gene FR3 regions are combined with about 2 to about 8 reverse primers directed to a portion of IgH J genes. In other preferred embodiments, about 62 to about 75 forward primers directed to different IgH V gene FR3 regions are combined with about 5 to about 15 reverse primers directed to a portion of IgH J genes. In some preferred embodiments, the forward primers directed to IgH V gene FR3 regions are selected from those listed in Table 8 and the reverse primers directed to the IgH J gene are selected from those listed in Table 6.


In some embodiments, a multiplex amplification reaction includes at least 5, 10, 15, 20, 25, 30, 50, 60, 70, 80, or 90 reverse primers in which each reverse primer is directed to a sequence corresponding to at least a portion of one or more BCR V gene FR2 regions. In such embodiments, the plurality of reverse primers directed to the BCR V gene FR2 regions is combined with at least 2, 3, 4, 5, 6, 8, or about 3-6 forward primers directed to a sequence corresponding to at least a portion of a J gene of the same BCR gene. In some embodiments of the multiplex amplification reactions, the BCR V gene FR2-directed primers may be the forward primers and the BCR J gene-directed primers may be the reverse primers. Accordingly, in some embodiments, a multiplex amplification reaction includes at least 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, or 90 forward primers in which each forward primer is directed to a sequence corresponding to at least a portion of one or more BCR V gene FR2 regions. In such embodiments, the plurality of forward primers directed to the BCR V gene FR2 regions is combined with at least 2, 3, 4, 5, 6, 8, or about 3-6 reverse primers directed to a sequence corresponding to at least a portion of a J gene of the same BCR gene. In some embodiments, such FR2 and J gene amplification primer sets may be directed to IgH gene sequences. In some preferred embodiments, about 5 to about 15 reverse primers directed to different IgH V gene FR2 regions are combined with about 3 to about 6 forward primers directed to different IgH J genes. In some preferred embodiments, about 5 to about 15 forward primers directed to different IgH V gene FR2 regions are combined with about 3 to about 6 reverse primers directed to different IgH J genes. In some preferred embodiments, the forward primers directed to IgH V gene FR2 regions are selected from those listed in Table 7 and the reverse primers directed to the IgH J gene are selected from those listed in Table 6.


In some embodiments, a multiplex amplification reaction includes at least 5, 10, 15, 20, 25, 30, 50, 60, 70, 80, or 90 reverse primers in which each reverse primer is directed to a sequence corresponding to at least a portion of one or more BCR V gene FR1 regions. In such embodiments, the plurality of reverse primers directed to the BCR V gene FR1 regions is combined with at least 2, 3, 4, 5, 6, 8, or about 3-6 forward primers directed to a sequence corresponding to at least a portion of a J gene of the same BCR gene. In some embodiments of the multiplex amplification reactions, the BCR V gene FR1-directed primers may be the forward primers and the BCR J gene-directed primers may be the reverse primers. Accordingly, in some embodiments, a multiplex amplification reaction includes at least 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, or 90 forward primers in which each forward primer is directed to a sequence corresponding to at least a portion of one or more BCR V gene FR1 regions. In such embodiments, the plurality of forward primers directed to the BCR V gene FR1 regions is combined with at least 2, 3, 4, 5, 6, 8, or about 3-6 reverse primers directed to a sequence corresponding to at least a portion of a J gene of the same BCR gene. In some embodiments, such FR1 and J gene amplification primer sets may be directed to IgH gene sequences. In some preferred embodiments, about 5 to about 15 reverse primers directed to different IgH V gene FR1 regions are combined with about 3 to about 6 forward primers directed to different IgH J genes. In some preferred embodiments, about 5 to about 15 forward primers directed to different IgH V gene FR1 regions are combined with about 3 to about 6 reverse primers directed to different IgH J genes. In some preferred embodiments, the forward primers directed to IgH V gene FR1 regions are selected from those listed in Table 11 and the reverse primers directed to the IgH J gene are selected from those listed in Table 6.


In some embodiments, a multiplex amplification reaction includes at least 5, 10, 15, 20, 25, 30, 50, 60, 70, 80, or 90 reverse primers in which each reverse primer is directed to a sequence corresponding to at least a portion of one or more BCR V gene LEADER regions. In such embodiments, the plurality of reverse primers directed to the BCR V gene LEADER regions is combined with at least 2, 3, 4, 5, 6, 8, or about 3-6 forward primers directed to a sequence corresponding to at least a portion of a J gene of the same BCR gene. In some embodiments of the multiplex amplification reactions, the BCR V gene LEADER-directed primers may be the forward primers and the BCR J gene-directed primers may be the reverse primers. Accordingly, in some embodiments, a multiplex amplification reaction includes at least 5, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, or 90 forward primers in which each forward primer is directed to a sequence corresponding to at least a portion of one or more BCR V gene LEADER regions. In such embodiments, the plurality of forward primers directed to the BCR V gene LEADER regions is combined with at least 2, 3, 4, 5, 6, 8, or about 3-6 reverse primers directed to a sequence corresponding to at least a portion of a J gene of the same BCR gene. In some embodiments, such LEADER and J gene amplification primer sets may be directed to IgH gene sequences. In some preferred embodiments, about 5 to about 15 reverse primers directed to different IgH V gene LEADER regions are combined with about 3 to about 6 forward primers directed to different IgH J genes. In some preferred embodiments, about 5 to about 15 forward primers directed to different IgH V gene LEADER regions are combined with about 3 to about 6 reverse primers directed to different IgH J genes. 
In some preferred embodiments, the forward primers directed to IgH V gene LEADER regions are selected from those listed in Table 10 and the reverse primers directed to the IgH J gene are selected from those listed in Table 6.


In some embodiments, the concentration of the forward primer is about equal to that of the reverse primer in a multiplex amplification reaction. In other embodiments, the concentration of the forward primer is about twice that of the reverse primer in a multiplex amplification reaction. In other embodiments, the concentration of the forward primer is about half that of the reverse primer in a multiplex amplification reaction. In some embodiments, the concentration of each of the primers targeting the V gene leader or FR region is about 5 nM to about 2000 nM. In some embodiments, the concentration of each of the primers targeting the V gene leader or FR region is about 50 nM to about 800 nM. In some embodiments, the concentration of each of the primers targeting the V gene leader or FR region is about 50 nM to about 400 nM or about 100 nM to about 500 nM. In some embodiments, the concentration of each of the primers targeting the V gene leader or FR region is about 200 nM, about 400 nM, about 600 nM, or about 800 nM. In some embodiments, the concentration of each of the primers targeting the V gene leader or FR region is about 5 nM, about 10 nM, about 50 nM, about 100 nM, about 150 nM. In some embodiments, the concentration of each of the primers targeting the V gene leader or FR region is about 1000 nM, about 1250 nM, about 1500 nM, about 1750 nM, or about 2000 nM. In some embodiments, the concentration of each of the primers targeting the V gene leader or FR region is about 50 nM to about 800 nM. In some embodiments, the concentration of each of the primers targeting the J gene is about 5 nM to about 2000 nM. In some embodiments, the concentration of each of the primers targeting the J gene is about 50 nM to about 800 nM. In some embodiments, the concentration of each of the primers targeting the J gene is about 50 nM to about 400 nM or about 100 nM to about 500 nM. 
In some embodiments, the concentration of each of the primers targeting the J gene is about 200 nM, about 400 nM, about 600 nM, or about 800 nM. In some embodiments, the concentration of each of the primers targeting the J gene is about 5 nM, about 10 nM, about 50 nM, about 100 nM, about 150 nM. In some embodiments, the concentration of each of the primers targeting the J gene is about 1000 nM, about 1250 nM, about 1500 nM, about 1750 nM, or about 2000 nM. In some embodiments, the concentration of each of the primers targeting the J gene is about 50 nM to about 800 nM. In some embodiments, the concentration of each of the primers targeting the Cint-KDE gene is about 5 nM to about 2000 nM. In some embodiments, the concentration of each of the primers targeting the Cint-KDE gene is about 50 nM to about 800 nM. In some embodiments, the concentration of each of the primers targeting the Cint-KDE gene is about 50 nM to about 400 nM or about 100 nM to about 500 nM. In some embodiments, the concentration of each of the primers targeting the Cint-KDE gene is about 200 nM, about 400 nM, about 600 nM, or about 800 nM. In some embodiments, the concentration of each of the primers targeting the Cint-KDE gene is about 5 nM, about 10 nM, about 50 nM, about 100 nM, about 150 nM. In some embodiments, the concentration of each of the primers targeting the Cint-KDE gene is about 1000 nM, about 1250 nM, about 1500 nM, about 1750 nM, or about 2000 nM. In some embodiments, the concentration of each of the primers targeting the Cint-KDE gene is about 50 nM to about 800 nM. In some embodiments, the concentration of each forward and reverse primer in a multiplex reaction is about nM, about 100 nM, about 200 nM, or about 400 nM. In some embodiments, the concentration of each forward and reverse primer in a multiplex reaction is about 5 nM to about 2000 nM. In some embodiments, the concentration of each forward and reverse primer in a multiplex reaction is about 50 nM to about 800 nM. 
In some embodiments, the concentration of each forward and reverse primer in a multiplex reaction is about 50 nM to about 400 nM or about 100 nM to about 500 nM. In some embodiments, the concentration of each forward and reverse primer in a multiplex reaction is about 600 nM, about 800 nM, about 1000 nM, about 1250 nM, about 1500 nM, about 1750 nM, or about 2000 nM. In some embodiments, the concentration of each forward and reverse primer in a multiplex reaction is about 5 nM, about 10 nM, or about 150 nM, or is about 50 nM to about 800 nM.
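
As a worked illustration of the per-primer concentrations recited above, the following sketch applies the standard dilution relation C1 x V1 = C2 x V2 to find how much primer stock yields a given final concentration. The 20 µL reaction volume and the 10 µM stock are assumptions chosen for the example and are not specified in the text; the function names are hypothetical. Because the recited bounds are qualified by "about", the range check treats 5 nM and 2000 nM only as nominal limits.

```python
def stock_volume_ul(final_nm: float, reaction_ul: float, stock_nm: float) -> float:
    """Volume of primer stock (µL) giving `final_nm` in a reaction of `reaction_ul`.

    Uses C1 * V1 = C2 * V2 with concentrations in nM and volumes in µL.
    """
    if final_nm <= 0 or reaction_ul <= 0 or stock_nm <= 0:
        raise ValueError("concentrations and volumes must be positive")
    if final_nm > stock_nm:
        raise ValueError("final concentration cannot exceed the stock concentration")
    return final_nm * reaction_ul / stock_nm


def in_nominal_range(conc_nm: float, low_nm: float = 5.0, high_nm: float = 2000.0) -> bool:
    """Check a per-primer concentration against the nominal 5 nM to 2000 nM range."""
    return low_nm <= conc_nm <= high_nm


# Assumed example values (not from the disclosure): 20 µL reaction, 10 µM stock.
REACTION_UL = 20.0
STOCK_NM = 10_000.0

reverse_nm = 200.0             # one of the recited per-primer concentrations
forward_nm = 2.0 * reverse_nm  # "about twice that of the reverse primer" embodiment

for label, conc in (("forward", forward_nm), ("reverse", reverse_nm)):
    vol = stock_volume_ul(conc, REACTION_UL, STOCK_NM)
    print(f"{label}: {conc:.0f} nM final -> {vol:.2f} µL of stock per primer")
```

The same helper can be reused for the "about equal" and "about half" embodiments by changing the multiplier applied to the reverse primer concentration.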


In some embodiments, the V gene FR and J gene target-directed primers combine as amplification primer pairs to amplify target immune receptor cDNA or rearranged gDNA sequences and generate target amplicons. Generally, the length of a target amplicon will depend upon which V gene primer set (e.g., LEADER, FR1, FR2, and/or FR3 directed primers) is paired with the J gene primers. Accordingly, in some embodiments, target amplicons can range from about 50 nucleotides to about 350 nucleotides in length. In some embodiments, target amplicons are about 50 to about 200, about 70 to about 170, about 200 to about 350, about 250 to about 320, about 270 to about 300, about 225 to about 300, about 250 to about 275, about 200 to about 235, about 200 to about 250, or about 175 to about 275 nucleotides in length. In some embodiments, IgH amplicons are about 80, about 60 to about 100, or about 70 to about 90 nucleotides in length. In some embodiments, IgH amplicons, such as those generated using V gene LEADER, FR1, FR2, and/or FR3- and J gene-directed primer pairs, are about 50 to about 200 nucleotides in length, preferably about 60 to about 160, about 65 to about 120, about 90 to about 120, about 70 to about 90 nucleotides, or about 80 nucleotides in length. In some embodiments, generating amplicons of such short lengths allows the provided methods and compositions to effectively detect and analyze the immune repertoire from highly degraded gDNA template material, such as that derived from an FFPE sample or cell-free DNA (cfDNA).


In some embodiments, amplification primers may include a barcode sequence, for example to distinguish or separate a plurality of amplified target sequences in a sample. In some embodiments, amplification primers may include two or more barcode sequences, for example to distinguish or separate a plurality of amplified target sequences in a sample. In some embodiments, amplification primers may include a tagging sequence that can assist in subsequent cataloguing, identification or sequencing of the generated amplicon. In some embodiments, the barcode sequence(s) or the tagging sequence(s) is incorporated into the amplified nucleotide sequence through inclusion in the amplification primer or by ligation of an adapter. Primers may further comprise nucleotides useful in subsequent sequencing, e.g. pyrosequencing. Such sequences are readily designed by commercially available software programs or companies.


In some embodiments, multiplex amplification is performed with target-directed amplification primers which do not include a tagging sequence. In other embodiments, multiplex amplification is performed with amplification primers each of which includes a target-directed sequence and a tagging sequence such as, for example, where the forward primer or primer set includes tagging sequence 1 and the reverse primer or primer set includes tagging sequence 2. In still other embodiments, multiplex amplification is performed with amplification primers where one primer or primer set includes a target-directed sequence and a tagging sequence and the other primer or primer set includes a target-directed sequence but does not include a tagging sequence, such as, for example, where the forward primer or primer set includes a tagging sequence and the reverse primer or primer set does not include a tagging sequence.


Accordingly, in some embodiments, a plurality of target cDNA or gDNA template molecules are amplified in a single multiplex amplification reaction mixture with BCR directed amplification primers in which the forward and/or reverse primers include a tagging sequence and the resultant amplicons include the target BCR sequence and a tagging sequence on one or both ends. In some embodiments, the forward and/or reverse amplification primer or primer sets may also include a barcode and the one or more barcode is then included in the resultant amplicon.


In some embodiments, a plurality of target cDNA or gDNA template molecules are amplified in a single multiplex amplification reaction mixture with BCR directed amplification primers and the resultant amplicons contain only BCR. In some embodiments, a tagging sequence is added to the ends of such amplicons through, for example, adapter ligation. In some embodiments, a barcode sequence is added to one or both ends of such amplicons through, for example, adapter ligation.


Nucleotide sequences suitable for use as barcodes and for barcoding libraries are known in the art. Adapters and amplification primers and primer sets including a barcode sequence are commercially available. Oligonucleotide adapters containing a barcode sequence are also commercially available including, for example, IonXpress™, IonCode™ and Ion Select barcode adapters (Thermo Fisher Scientific). Similarly, additional and other universal adapter/primer sequences described and known in the art (e.g., Illumina universal adapter/primer sequences, PacBio universal adapter/primer sequences, etc.) can be used in conjunction with the methods and compositions provided herein and the resultant amplicons sequenced using the associated analysis platform.


In some embodiments, two or more barcodes are added to amplicons when sequencing multiplexed samples. In some embodiments, at least two barcodes are added to amplicons prior to sequencing multiplexed samples to reduce the frequency of artefactual results (e.g., immune receptor gene rearrangements or clone identification) derived from barcode cross-contamination or barcode bleed-through between samples. In some embodiments, at least two barcodes are used to label samples when tracking low frequency clones of the immune repertoire. In some embodiments, at least two barcodes are added to amplicons when the assay is used to detect clones of frequency less than 1:1,000. In some embodiments, at least two barcodes are added to amplicons when the assay is used to detect clones of frequency less than 1:10,000. In other embodiments, at least two barcodes are added to amplicons when the assay is used to detect clones of frequency less than 1:20,000, less than 1:40,000, less than 1:100,000, less than 1:200,000, less than 1:400,000, less than 1:500,000, or less than 1:1,000,000. Methods for characterizing the immune repertoire which benefit from a high sequencing depth per clone and/or detection of clones at such low frequencies include, but are not limited to, monitoring a patient with a hyperproliferative disease undergoing treatment and testing for minimal residual disease following treatment.


In some embodiments, target-specific primers (e.g., the V gene LEADER, FR1, FR2, and/or FR3-directed primers, the J gene directed primers, and the Cint-KDE gene directed primers) used in the methods of the invention are selected or designed to satisfy any one or more of the following criteria: (1) includes two or more modified nucleotides within the primer sequence, at least one of which is included near or at the termini of the primer and at least one of which is included at, or about, the center nucleotide position of the primer sequence; (2) length of about 15 to about 40 bases; (3) Tm of from above 60° C. to about 70° C.; (4) has low cross-reactivity with non-target sequences present in the sample of interest; (5) at least the first four nucleotides (going from 3′ to 5′ direction) are non-complementary to any sequence within any other primer present in the same reaction; and (6) non-complementarity to any consecutive stretch of at least 5 nucleotides within any other produced target amplicon. In some embodiments, the target-specific primers used in the methods provided are selected or designed to satisfy any 2, 3, 4, 5, or 6 of the above criteria.


In some embodiments, the target-specific primers used in the methods of the invention include one or more modified nucleotides having a cleavable group. In some embodiments, the target-specific primers used in the methods of the invention include two or more modified nucleotides having cleavable groups. In some embodiments, the target-specific primers comprise at least one modified nucleotide having a cleavable group selected from methylguanine, 8-oxo-guanine, xanthine, hypoxanthine, 5,6-dihydrouracil, uracil, 5-methylcytosine, thymine-dimer, 7-methylguanosine, 8-oxo-deoxyguanosine, xanthosine, inosine, dihydrouridine, bromodeoxyuridine, uridine or 5-methylcytidine.


In some embodiments, target amplicons generated using the amplification methods (and associated compositions, systems, and kits) disclosed herein are used in the preparation of an immune receptor repertoire library. In some embodiments, preparation of the immune receptor repertoire library includes introducing adapter sequences to the termini of the target amplicon sequences. In certain embodiments, a method for preparing an immune receptor repertoire library includes generating target immune receptor amplicon molecules according to any of the multiplex amplification methods described herein, treating the amplicon molecules by digesting a modified nucleotide within the amplicon molecules' primer sequences, and ligating at least one adapter to at least one of the treated amplicon molecules, thereby producing a library of adapter-ligated target immune receptor amplicon molecules comprising the target immune receptor repertoire. In some embodiments, the steps of preparing the library are carried out in a single reaction vessel involving only addition steps. In certain embodiments, the method further includes clonally amplifying a portion of the at least one adapter-ligated target amplicon molecule.


In some embodiments, target amplicons using the methods (and associated compositions, systems, and kits) disclosed herein, are coupled to a downstream process, such as but not limited to, library preparation and nucleic acid sequencing. For example, target amplicons can be amplified using bridge amplification, emulsion PCR or isothermal amplification to generate a plurality of clonal templates suitable for nucleic acid sequencing. In some embodiments, the amplicon library is sequenced using any suitable DNA sequencing platform such as any next generation sequencing platform, including semi-conductor sequencing technology such as the Ion Torrent sequencing platform. In some embodiments, an amplicon library is sequenced using an Ion GeneStudio S5 540™ System or an Ion GeneStudio S5 520™ System or an Ion GeneStudio S5 530™ System or an Ion PGM 318™ System or an Ion Genexus™ System.


In some embodiments, sequencing of immune receptor amplicons generated using the methods (and associated compositions and kits) disclosed herein produces contiguous sequence reads from about 200 to about 600 nucleotides in length. In some embodiments, contiguous read lengths are from about 300 to about 400 nucleotides. In some embodiments, contiguous read lengths are from about 350 to about 450 nucleotides. In some embodiments, read lengths average about 300 nucleotides, about 350 nucleotides, or about 400 nucleotides. In some embodiments, contiguous read lengths are from about 250 to about 350 nucleotides, about 275 to about 340, or about 295 to about 325 nucleotides in length. In some embodiments, read lengths average about 270, about 280, about 290, about 300, or about 325 nucleotides in length. In other embodiments, contiguous read lengths are from about 180 to about 300 nucleotides, about 200 to about 290 nucleotides, about 225 to about 280 nucleotides, or about 230 to about 250 nucleotides in length. In some embodiments, read lengths average about 200, about 220, about 230, about 240, or about 250 nucleotides in length. In other embodiments, contiguous read lengths are from about 70 to about 200 nucleotides, about 80 to about 150 nucleotides, about 90 to about 140 nucleotides, or about 100 to about 120 nucleotides in length. In some embodiments, contiguous read lengths are from about 50 to about 170 nucleotides, about 60 to about 160 nucleotides, about 60 to about 120 nucleotides, about 70 to about 100 nucleotides, about 70 to about 90 nucleotides, or about 80 nucleotides in length. In some embodiments, read lengths average about 70, about 80, about 90, about 100, about 110, or about 120 nucleotides. In some embodiments, the sequence read length includes the amplicon sequence and a barcode sequence. In some embodiments, the sequence read length does not include a barcode sequence.


In some embodiments, the amplification primers and primer pairs are target-specific sequences that can amplify specific regions of a nucleic acid molecule. In some embodiments, the target-specific primers can amplify expressed RNA or cDNA. In some embodiments, the target-specific primers can amplify mammalian RNA, such as human RNA or cDNA prepared therefrom, or murine RNA or cDNA prepared therefrom. In some embodiments, the target-specific primers can amplify DNA, such as gDNA. In some embodiments, the target-specific primers can amplify mammalian DNA, such as human DNA or murine DNA.


In methods and compositions provided herein, for example those for determining, characterizing, and/or tracking the immune repertoire in a biological sample, the amount of input RNA or gDNA required for amplification of target sequences will depend in part on the fraction of immune receptor bearing cells (e.g., B cells) in the sample. For example, a higher fraction of B cells in the sample, such as samples enriched for B cells, permits use of a lower amount of input RNA or gDNA for amplification. In some embodiments, the amount of input RNA for amplification of one or more target sequences can be about 0.05 ng to about 10 micrograms. In some embodiments, the amount of input RNA used for multiplex amplification of one or more target sequences can be from about 5 ng to about 2 micrograms. In some embodiments, the amount of RNA used for multiplex amplification of one or more target sequences can be from about 5 ng to about 1 microgram or about 10 ng to about 1 microgram. In some embodiments, the amount of RNA used for multiplex amplification of one or more immune repertoire target sequences is about 1.5 micrograms, about 2 micrograms, about 2.5 micrograms, about 3 micrograms, about 3.5 micrograms, about 4.0 micrograms, about 5 micrograms, about 6 micrograms, about 7 micrograms, or about 10 micrograms. In some embodiments, the amount of RNA used for multiplex amplification of one or more immune repertoire target sequences is about 10 ng, about 25 ng, about 50 ng, about 100 ng, about 200 ng, about 250 ng, about 500 ng, about 750 ng, or about 1000 ng. In some embodiments, the amount of RNA used for multiplex amplification of one or more immune repertoire target sequences is from about 25 ng to about 500 ng RNA or from about 50 ng to about 200 ng RNA. 
In some embodiments, the amount of RNA used for multiplex amplification of one or more immune repertoire target sequences is from about 0.05 ng to about 10 ng RNA, from about 0.1 ng to about 5 ng RNA, from about 0.2 ng to about 2 ng RNA, or from about 0.5 ng to about 1 ng RNA. In some embodiments, the amount of RNA used for multiplex amplification of one or more immune repertoire target sequences is about 0.05 ng, about 0.1 ng, about 0.2 ng, about 0.5 ng, about 1.0 ng, about 2.0 ng, or about 5.0 ng.


As described herein, RNA from a biological sample is converted to cDNA, typically using reverse transcriptase in a reverse transcription reaction, prior to the multiplex amplification. In some embodiments, a reverse transcription reaction is performed with the input RNA and a portion of the cDNA from the reverse transcription reaction is used in the multiplex amplification reaction. In some embodiments, substantially all of the cDNA prepared from the input RNA is added to the multiplex amplification reaction. In other embodiments, a portion, such as about 80%, about 75%, about 66%, about 50%, about 33%, or about 25% of the cDNA prepared from the input RNA is added to the multiplex amplification reaction. In other embodiments, about 15%, about 10%, about 8%, about 6%, or about 5% of the cDNA prepared from the input RNA is added to the multiplex amplification reaction.


In some embodiments, the amount of cDNA from a sample added to the multiplex amplification reaction can be about 0.001 ng to about 5 micrograms. In some embodiments, the amount of cDNA used for multiplex amplification of one or more immune repertoire target sequences can be from about 0.01 ng to about 2 micrograms. In some embodiments, the amount of cDNA used for multiplex amplification of one or more target sequences can be from about 0.1 ng to about 1 microgram or about 1 ng to about 0.5 microgram. In some embodiments, the amount of cDNA used for multiplex amplification of one or more immune repertoire target sequences is about 0.5 ng, about 1 ng, about 5 ng, about 10 ng, about 25 ng, about 50 ng, about 100 ng, about 200 ng, about 250 ng, about 500 ng, about 750 ng, or about 1000 ng. In some embodiments, the amount of cDNA used for multiplex amplification of one or more immune repertoire target sequences is from about 0.01 ng to about 10 ng cDNA, from about 0.05 ng to about 5 ng cDNA, from about 0.1 ng to about 2 ng cDNA, or from about 0.01 ng to about 1 ng cDNA. In some embodiments, the amount of cDNA used for multiplex amplification of one or more immune repertoire target sequences is about 0.005 ng, about 0.01 ng, about 0.05 ng, about 0.1 ng, about 0.2 ng, about 0.5 ng, about 1.0 ng, about 2.0 ng, or about 5.0 ng.


In some embodiments, mRNA is obtained from a biological sample and converted to cDNA for amplification purposes using conventional methods. Methods and reagents for extracting or isolating nucleic acid from biological samples are well known and commercially available. In some embodiments, RNA extraction from biological samples is performed by any method described herein or otherwise known to those of skill in the art, e.g., methods involving proteinase K tissue digestion and alcohol-based nucleic acid precipitation, treatment with DNAse to digest contaminating DNA, and RNA purification using silica-gel-membrane technology, or any combination thereof. Exemplary methods for RNA extraction from biological samples use commercially available kits including RecoverAll™ Multi-Sample RNA/DNA Workflow (Invitrogen), RecoverAll™ Total Nucleic Acid Isolation Kit (Invitrogen), NucleoSpin® RNA blood (Macherey-Nagel), PAXgene® Blood RNA system, TRI Reagent™ (Invitrogen), PureLink™ RNA Micro Scale kit (Invitrogen), MagMAX™ FFPE DNA/RNA Ultra Kit (Applied Biosystems), ZR RNA MicroPrep™ kit (Zymo Research), RNeasy Micro kit (Qiagen), and ReliaPrep™ RNA Tissue miniPrep system (Promega).


In some embodiments, the amount of input gDNA for amplification of one or more target sequences can be about 0.1 ng to about 10 micrograms. In some embodiments, the amount of gDNA required for amplification of one or more target sequences can be from about 0.5 ng to about 5 micrograms. In some embodiments, the amount of gDNA required for amplification of one or more target sequences can be from about 1 ng to about 1 microgram or about 10 ng to about 1 microgram. In some embodiments, the amount of gDNA required for amplification of one or more immune repertoire target sequences is from about 10 ng to about 500 ng, about 25 ng to about 400 ng, or from about 50 ng to about 200 ng. In some embodiments, the amount of gDNA required for amplification of one or more target sequences is about 0.5 ng, about 1 ng, about 5 ng, about 10 ng, about 20 ng, about 50 ng, about 100 ng, or about 200 ng. In some embodiments, the amount of gDNA required for amplification of one or more immune repertoire target sequences is about 1 microgram, about 2 micrograms, about 3 micrograms, about 4.0 micrograms, or about 5 micrograms.


In some embodiments, gDNA is obtained from a biological sample using conventional methods. Methods and reagents for extracting or isolating nucleic acid from biological samples are well known and commercially available. In some embodiments, DNA extraction from biological samples is performed by any method described herein or otherwise known to those of skill in the art, e.g., methods involving proteinase K tissue digestion and alcohol-based nucleic acid precipitation, treatment with RNAse to digest contaminating RNA, and DNA purification using silica-gel-membrane technology, or any combination thereof. Exemplary methods for DNA extraction from biological samples use commercially available kits including Ion AmpliSeq™ Direct FFPE DNA Kit, MagMAX™ FFPE DNA/RNA Ultra Kit, TRI Reagent™ (Invitrogen), PureLink™ Genomic DNA Mini kit (Invitrogen), RecoverAll™ Total Nucleic Acid Isolation Kit (Invitrogen), MagMAX™ DNA Multi-Sample Kit (Invitrogen) and DNA extraction kits from BioChain Institute Inc. (e.g., FFPE Tissue DNA Extraction Kit, Genomic DNA Extraction Kit, Blood and Serum DNA Isolation Kit).


A sample or biological sample, as used herein, refers to a composition from an individual that contains or may contain cells related to the immune system. Exemplary biological samples include, without limitation, tissue (for example, lymph node, organ tissue, bone marrow), whole blood, synovial fluid, cerebrospinal fluid, tumor biopsy, and other clinical specimens containing cells. The sample may include normal and/or diseased cells and be a fine needle aspirate, fine needle biopsy, core sample, or other sample. In some embodiments, the biological sample may comprise hematopoietic cells, peripheral blood mononuclear cells (PBMCs), B cells, tumor infiltrating lymphocytes (“TILs”) or other lymphocytes. In some embodiments, the sample may be fresh (e.g., not preserved), frozen, or formalin-fixed paraffin-embedded tissue (FFPE). Some samples comprise cancer cells, such as carcinomas, melanomas, sarcomas, lymphomas, myelomas, leukemias, and the like, and the cancer cells may be circulating tumor cells. In some embodiments, the biological sample comprises cfDNA, such as found, for example, in blood or plasma.


The biological sample can be a mix of tissue or cell types, a preparation of cells enriched for at least one particular category or type of cell, or an isolated population of cells of a particular type or phenotype. Samples can be separated by centrifugation, elutriation, density gradient separation, apheresis, affinity selection, panning, FACS, centrifugation with Hypaque, etc. prior to analysis. Methods for sorting, enriching for, and isolating particular cell types are well-known and can be readily carried out by one of ordinary skill. In some embodiments, the sample may be a preparation enriched for B cells.


In some embodiments, the provided methods and systems include processes for analysis of immune repertoire receptor cDNA or gDNA sequence data and for identifying and/or removing PCR or sequencing-derived error(s) from the determined immune receptor sequence.


In some embodiments, the error correction strategy includes the following steps:

    • 1) Align the sequenced rearrangement to a reference database of variable, diversity and joining/constant genes to produce a query sequence/reference sequence pair. Many alignment procedures may be used for this purpose including, for example, IgBLAST, a freely-available tool from the NCBI, and custom computer scripts.
    • 2) Realign the reference and query sequences to each other, taking into account the flow order used for sequencing. The flow order provides information that allows one to identify and correct some types of erroneous alignments.
    • 3) Identify the borders of the CDR3 region by their characteristic sequence motifs.
    • 4) Over the aligned portion of the rearrangement corresponding to the variable gene and joining/constant genes, excluding the CDR3 region, identify indels in the query with respect to the reference and alter the mismatching query base position so that it is consistent with the reference.
    • 5) For the CDR3 region, if the CDR3 length is not a multiple of three (indicative of an indel error):
      • (a) Search the CDR3 for the homopolymer stretch having the highest probability of containing a sequence error, based on PHRED score (denoted e).
      • (b) Obtain the probability of error over the entire CDR3 region based on PHRED score (denoted t)
      • (c) If e/t is greater than a defined threshold, edit the homopolymer by either increasing or decreasing the length of the homopolymer by one base such that the CDR3 nucleotide length is a multiple of three.
      • (d) As an alternative to steps a-c, search the CDR3 for the longest homopolymer, and if the length of the homopolymer is above a defined threshold, edit the homopolymer by either increasing or decreasing the length of the homopolymer by one base such that the CDR3 nucleotide length is a multiple of three.
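Steps 5(a) to 5(c) above can be illustrated with a minimal Python sketch. Several details are assumptions made only for illustration and are not specified in the steps: a PHRED score Q is converted to an error probability as 10^(-Q/10); the error probability of a region is taken as the probability that at least one base in it is wrong; a homopolymer is taken to be a run of two or more identical bases; and the function names and the default threshold of 0.5 are hypothetical.

```python
import itertools


def phred_to_error(q):
    """Convert a PHRED quality score to a per-base error probability."""
    return 10 ** (-q / 10)


def region_error(quals):
    """Probability that at least one base in the region is erroneous
    (assumed definition of the 'probability of error' of a region)."""
    p_all_correct = 1.0
    for q in quals:
        p_all_correct *= 1.0 - phred_to_error(q)
    return 1.0 - p_all_correct


def homopolymer_runs(seq):
    """Yield (start, end) index pairs for each run of identical bases."""
    pos = 0
    for _, group in itertools.groupby(seq):
        n = len(list(group))
        yield pos, pos + n
        pos += n


def correct_cdr3(seq, quals, ratio_threshold=0.5):
    """Return a CDR3 whose length is a multiple of three, or None when no
    sufficiently confident homopolymer edit is available."""
    if len(seq) % 3 == 0:
        return seq  # already in frame, nothing to correct
    runs = [(s, e) for s, e in homopolymer_runs(seq) if e - s >= 2]
    if not runs:
        return None
    # (a) homopolymer stretch most likely to contain an error: e
    start, end = max(runs, key=lambda r: region_error(quals[r[0]:r[1]]))
    e = region_error(quals[start:end])
    # (b) error probability over the entire CDR3: t
    t = region_error(quals)
    # (c) edit the homopolymer only if e/t exceeds the threshold
    if t == 0 or e / t <= ratio_threshold:
        return None
    if len(seq) % 3 == 1:
        return seq[:start] + seq[start + 1:]       # shorten by one base
    return seq[:start] + seq[start] + seq[start:]  # lengthen by one base
```

In this sketch a CDR3 of length 3n+1 is shortened by one base and a CDR3 of length 3n+2 is lengthened by one base, which is the only single-base edit that restores a length divisible by three in each case.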


In some embodiments, methods are provided to identify B cell clones in repertoire data in a manner that is robust to PCR and sequencing error. The following describes steps that may be employed in such methods. Table 1 is a diagram of an exemplary workflow for use in identifying and removing PCR or sequencing-derived errors from immune receptor sequencing data.


For a set of BCR sequences derived from mRNA or gDNA, where 1) each sequence has been annotated as a productive rearrangement, either natively or after error correction, such as previously described, and 2) each sequence has an identified V gene and CDR3 nucleotide region, in some embodiments, methods include the following:

    • 1) Identify and exclude chimeric sequences. For each unique CDR3 nucleotide sequence present in the dataset, tally the number of reads having that CDR3 nucleotide sequence and any of the possible V genes. Any V gene-CDR3 combination making up less than 10% of total reads for that CDR3 nucleotide sequence is flagged as a chimera and eliminated from downstream analyses. As an example, among the sequences below, which share the same CDR3 nucleotide sequence, the sequences having TRBV3 and TRBV6 paired with CDR3nt sequence AATTGGT will be flagged as chimeric.

        V gene    CDR3 nt    Read counts
        TRBV2     AATTGGT           1000
        TRBV3     AATTGGT             10
        TRBV6     AATTGGT              3

    • 2) Identify and exclude sequences containing simple indel errors. For each read in the dataset, obtain the homopolymer-collapsed representation of the CDR3 sequence of that read. For each set of reads having the same V gene and collapsed-CDR3 combination, tally the number of occurrences of each non-collapsed CDR3 nucleotide sequence. Any non-collapsed CDR3 sequence making up less than 10% of total reads for that read set is flagged as having a simple homopolymer error. As an example, three different V gene-CDR3 nucleotide sequences are presented below that are identical after homopolymer collapsing of the CDR3 nucleotide sequence. The two less frequent V gene-CDR3 combinations make up less than 10% of total reads for the read set and will be flagged as containing a simple indel error:

        V gene    CDR3 nt       Homopolymer-collapsed CDR3 nt    Read counts
        TRBV2     AATTGGT       ATGT                                    1000
        TRBV2     AAATGGT       ATGT                                      10
        TRBV2     AAAATTTGGT    ATGT                                       3

    • 3) Identify and exclude singleton reads. For each read in the dataset, tally the number of times that the exact read sequence is found in the dataset. Reads that appear only once in the dataset will be flagged as singleton reads.

    • 4) Identify and exclude truncated reads. For each read in the dataset, determine whether the read possesses an annotated V gene FR1, CDR1, FR2, CDR2, and FR3 region, as indicated by the IgBLAST alignment of the read to the IgBLAST reference V gene set. Reads that do not possess the above regions are flagged as truncated if the region(s) is expected based on the particular V gene primer used for amplification.

    • 5) Identify and exclude rearrangements lacking bidirectional support. For each read in the dataset, obtain the V gene and CDR3 sequence of the read as well as the strand orientation of the read (plus or minus strand). For each V gene-CDR3 combination in the dataset, tally the number of plus and minus strand reads having that V gene-CDR3nt combination. V gene-CDR3nt combinations that are only present in reads of one orientation will be deemed to be spurious. All reads having a spurious V gene-CDR3nt combination will be flagged as lacking bidirectional support.

    • 6) For sequences that have not been flagged, perform stepwise clustering based on CDR3 nucleotide similarity. Separate the sequences into groups based on the V gene identity of the read, excluding allele information (v-gene groups). For each group:
      • a. Arrange reads in each group into clusters using cd-hit-est and the following parameters:
      • cd-hit-est -i vgene_groups.fa -o clustered_vgene_groups.cdhit -T 24 -d 0 -M 100000 -B 0 -r 0 -g 1 -S 0 -U 2 -uL 0.05 -n 10 -l 7 (The freely available software program cd-hit-est clusters a nucleotide dataset into clusters that meet a user-defined similarity threshold. For code and instructions on cd-hit-est, see github.com/weizhongli/cdhit/wiki/3.-User%27s-Guide#CDHITEST.)
      • Where vgene_groups.fa is a fasta format file of the CDR3 nucleotide regions of sequences having the same V gene and clustered_vgene_groups.cdhit is the output, containing the subdivided sequences.
      • b. Assign each sequence in a cluster the same clone ID, used to denote that members of the subgroup are believed to represent the same B cell clone.
      • c. Choose a representative sequence for each cluster, such that the representative sequence is the sequence that appears the greatest number of times, or, in cases of a tie, is randomly chosen.
      • d. Merge all other reads in the cluster into the representative sequence such that the number of reads for the representative sequence is increased according to the number of reads for the merged sequences.
      • e. Compare the representative sequences within a v-gene group to each other on the basis of Hamming distance. If a representative sequence is within a Hamming distance of 1 to a representative sequence that is >50 times more abundant, merge that sequence into the more common representative sequence. If a representative sequence is within a Hamming distance of 2 to a representative sequence that is >10000 times more abundant, merge that sequence into the more common representative sequence.
      • f. Identify complex sequence errors. Homopolymer-collapse the representative sequences within each V gene group, then compare to each other using Levenshtein distances. If a representative sequence is within a Levenshtein distance of 1 to a representative sequence that is >50 times more abundant, merge that sequence into the more common representative sequence.
      • g. Identify CDR3 misannotation errors. Homopolymer-collapse the representative sequences within each V gene group, then perform a pairwise comparison of each homopolymer-collapsed sequence. For each pair of sequences, determine whether one sequence is a subset of the other sequence. If so, merge the less abundant sequence into the more abundant sequence if the more abundant sequence is >500 fold more abundant.

    • 7) Report cluster representatives to user.
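
Two of the simpler steps in the workflow above, step 5 (bidirectional support) and steps 6(b) to 6(d) (clone ID assignment, representative selection, and read merging), can be sketched in Python as follows. The input layouts, function names, and the use of a seedable random number generator for tie-breaking are assumptions made for illustration; the clusters themselves are assumed to have been produced already, for example by cd-hit-est.

```python
import random
from collections import defaultdict


def lacking_bidirectional_support(reads):
    """Step 5. `reads` is an iterable of (v_gene, cdr3_nt, strand) tuples,
    with strand given as '+' or '-'. Returns the set of (v_gene, cdr3_nt)
    combinations observed on only one strand."""
    strands = defaultdict(set)
    for v_gene, cdr3_nt, strand in reads:
        strands[(v_gene, cdr3_nt)].add(strand)
    return {combo for combo, seen in strands.items() if seen != {"+", "-"}}


def collapse_clusters(clusters, rng=None):
    """Steps 6(b)-6(d). `clusters` is a list of dicts, one per cluster,
    mapping each sequence to its read count. Returns a list of
    (clone_id, representative_sequence, total_reads) tuples."""
    rng = rng or random.Random()
    collapsed = []
    for clone_id, counts in enumerate(clusters, start=1):
        top = max(counts.values())
        # Ties for the most frequent sequence are broken at random.
        candidates = sorted(s for s, n in counts.items() if n == top)
        representative = rng.choice(candidates)
        # Merging adds the reads of every member to the representative.
        collapsed.append((clone_id, representative, sum(counts.values())))
    return collapsed
```

Passing a seeded `random.Random` instance makes the tie-breaking reproducible between runs, which can be convenient when comparing outputs.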





In some embodiments, step 6 of the above workflow separates the rearrangement sequences into groups based on the V-gene identity (excluding allele information), and the CDR3 nucleotide length. In other embodiments, the J-gene identity and/or isotype identity is also used as part of the grouping criteria. Accordingly, in some embodiments, step 6 of the above workflow includes the following steps:

    • a. Arrange reads in each group into clusters using cd-hit-est and the following parameters:
      • cd-hit-est -i vgene_groups.fa -o clustered_vgene_groups.cdhit -T 24 -l 9 -d 0 -M 100000 -B 0 -r 0 -g 1 -S 15 -U 2 -uL 0.05 -n 9
      • Where vgene_groups.fa is a fasta format file of the sequenced portion of the VDJ rearrangement.
      • In some embodiments, the full sequence of the VDJ is considered for clustering as somatic hypermutation may occur throughout the VDJ region.
    • b. Assign each sequence in a cluster the same clone ID, used to denote that members of the subgroup are believed to represent the same B cell clone.
    • c. Choose a representative sequence for each cluster, such that the representative sequence is the sequence that appears the greatest number of times, or, in cases of a tie, is randomly chosen.
    • d. Merge all other reads in the cluster into the representative sequence such that the number of reads for the representative sequence is increased according to the number of reads for the merged sequences.
    • e. Compare the representative sequences within a v-gene group to each other on the basis of Hamming distance. If a representative sequence is within a Hamming distance of 1 to a representative sequence that is >50 times more abundant, merge that sequence into the more common representative sequence. If a representative sequence is within a Hamming distance of 2 to a representative sequence that is >10000 times more abundant, merge that sequence into the more common representative sequence. In some embodiments, fold thresholds of >50/3 and >10000/3, among others, are used to merge sequences of Hamming distances 1 or 2, respectively. Reducing the fold thresholds can be useful when comparing sequences of the entire VDJ region rather than sequences of only the CDR3 region, as the longer sequence has a greater chance of accumulating amplification and/or sequencing errors.
    • f. Identify complex sequence errors. Homopolymer-collapse the representative sequences within each V gene group, then compare to each other using Levenshtein distances. If a representative sequence is within a Levenshtein distance of 1 to a representative sequence that is >50 times more abundant, merge that sequence into the more common representative sequence.
    • g. Identify CDR3 misannotation errors. Homopolymer-collapse the representative sequences within each V gene group, then perform a pairwise comparison of each homopolymer-collapsed sequence. For each pair of sequences, determine whether one sequence is a subset of the other sequence. If so, merge the less abundant sequence into the more abundant sequence if the more abundant sequence is >500 fold more abundant.
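
Steps (f) and (g) above can be illustrated with the following Python sketch. It is a simplified illustration under stated assumptions, not a definitive implementation: "subset" in step (g) is interpreted as one collapsed sequence being a contiguous substring of the other, a sequence that has itself been merged is not used as a merge target, and the function and parameter names are hypothetical.

```python
from itertools import groupby


def collapse(seq):
    """Homopolymer-collapse: reduce each run of repeated bases to one base."""
    return "".join(base for base, _ in groupby(seq))


def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def merge_targets(reps, lev_fold=50, subset_fold=500):
    """`reps` maps each representative sequence in one V-gene group to its
    read count. Returns a dict mapping each sequence to be merged onto the
    more abundant sequence it is merged into."""
    merges = {}
    ordered = sorted(reps, key=reps.get, reverse=True)
    for i, major in enumerate(ordered):
        if major in merges:
            continue  # assumption: merged sequences are not merge targets
        for minor in ordered[i + 1:]:
            if minor in merges:
                continue
            c_major, c_minor = collapse(major), collapse(minor)
            ratio = reps[major] / reps[minor]
            if levenshtein(c_major, c_minor) <= 1 and ratio > lev_fold:
                merges[minor] = major   # step (f): complex sequence error
            elif (c_minor in c_major or c_major in c_minor) and ratio > subset_fold:
                merges[minor] = major   # step (g): CDR3 misannotation
    return merges
```

The fold thresholds are exposed as parameters because, as discussed for step (e), other frequency ratio thresholds may be substituted.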


In some embodiments, the provided workflows are not limited to the frequency ratio thresholds listed in the various steps, and other frequency ratio thresholds may be substituted for the representative frequency ratio thresholds included above. The frequency ratio refers to a ratio of the abundance value of the more common representative sequence to the abundance value of the less common representative sequence. The frequency ratio threshold gives the threshold at which the less common representative sequence is merged into the more common representative sequence. For example, in some embodiments, comparing the representative sequences within a v-gene group to each other on the basis of Hamming distance may use a frequency ratio threshold other than those listed in step (e) above. For example and without limitation, frequency ratio thresholds of 1000, 5000, 20,000, etc. may be used if a representative sequence is within a Hamming distance of 2 to a representative sequence. For example and without limitation, frequency ratio thresholds of 20, 100, 200, etc. may be used if a representative sequence is within a Hamming distance of 1 to a representative sequence. The frequency ratio thresholds provided are representative of the general process of labeling the more abundant sequence of a similar pair as a correct sequence.
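
As an illustration of how the frequency ratio threshold operates, the following Python sketch merges representative sequences by Hamming distance with configurable thresholds. The defaults mirror the representative values of step (e) (greater than 50-fold at distance 1, greater than 10000-fold at distance 2). The processing order (least abundant sequence first, merged into the most abundant qualifying sequence) and all names are assumptions made for this sketch.

```python
def hamming(a, b):
    """Number of mismatching positions, or None if the lengths differ."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))


def merge_by_hamming(reps, thresholds=None):
    """`reps` maps representative sequences to read counts. `thresholds`
    maps a Hamming distance to the frequency ratio that must be exceeded
    for the less common sequence to be merged into the more common one."""
    thresholds = thresholds or {1: 50, 2: 10000}
    merged = dict(reps)
    for minor in sorted(reps, key=reps.get):  # least abundant first
        for major in sorted(merged, key=merged.get, reverse=True):
            if major == minor:
                continue
            fold = thresholds.get(hamming(major, minor))
            # Frequency ratio = abundance of more common / less common.
            if fold is not None and merged[major] / merged[minor] > fold:
                merged[major] += merged.pop(minor)
                break
    return merged
```

With the default thresholds, a sequence at Hamming distance 2 from a sequence 500 times more abundant is kept as a separate clone; lowering the distance-2 threshold to 400 causes it to be merged.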


Similarly, when comparing the frequencies of two sequences at other steps in the workflows, e.g., step (1), step (2), step (6f) and step (6g), frequency ratio thresholds other than those listed in the steps above may be used.


As used herein, the term “homopolymer-collapsed sequence” is intended to represent a sequence where repeated bases are collapsed to a single base representative.


As used herein, the terms “clone,” “clonotype,” “lineage,” or “rearrangement” are intended to describe a unique V gene nucleotide combination for an immune receptor, such as a BCR. For example, a unique V gene-CDR3 nucleotide combination.


As used herein, the term “productive reads” refers to BCR sequence reads that have no stop codon and have in-frame variable gene and joining gene segments. Productive reads are biologically plausible in coding for a polypeptide.
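As a minimal illustration of this definition, the following Python sketch tests a read for productivity. It assumes, as a simplification not stated above, that the read has already been trimmed so that its first base is the first position of a codon in the V gene reading frame and its last base completes a codon in the J gene segment. Under that assumption, the in-frame requirement reduces to a length that is a multiple of three.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def is_productive(coding_seq):
    """True if the trimmed read is in frame and contains no in-frame stop codon."""
    seq = coding_seq.upper()
    if len(seq) % 3 != 0:
        return False  # V and J segments cannot both be in frame
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    return not any(codon in STOP_CODONS for codon in codons)
```

A stop-codon triplet that occurs out of frame (for example, TGA spanning two codons) does not make a read unproductive in this sketch.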


As used herein, “chimeras” or “chimeric sequences” refer to artefactual sequences that arise from template switching during target amplification, such as PCR. Chimeras typically present as a CDR3 sequence grafted onto an unrelated V gene, resulting in a CDR3 sequence that is associated with multiple V genes within a dataset. The chimeric sequence is usually far less abundant than the true sequence in the dataset.
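The chimera signature described above (one CDR3 sequence associated with several V genes, with the artefactual pairing being much rarer) can be sketched in Python as follows. The function name, the input layout, and the default ratio of 10 are hypothetical choices made for illustration only; this definition does not fix a threshold.

```python
from collections import defaultdict

def flag_chimeras(clone_counts, min_ratio=10):
    """clone_counts: {(v_gene, cdr3_nt): read_count}.
    Returns the (v_gene, cdr3_nt) pairs whose CDR3 is also seen with another
    V gene that is more than min_ratio times as abundant (assumed threshold)."""
    by_cdr3 = defaultdict(list)
    for (v_gene, cdr3), n in clone_counts.items():
        by_cdr3[cdr3].append((n, v_gene))
    flagged = set()
    for cdr3, entries in by_cdr3.items():
        top_count, _ = max(entries)  # most abundant V gene pairing for this CDR3
        for n, v_gene in entries:
            if n * min_ratio < top_count:
                flagged.add((v_gene, cdr3))
    return flagged
```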


As used herein, the term “indel” refers to an insertion and/or deletion of one or more nucleotide bases in a nucleic acid sequence. In coding regions of a nucleic acid sequence, unless the length of an indel is a multiple of 3, it will produce a frameshift when the sequence is translated. As used herein, “simple indel errors” are errors that do not alter the homopolymer-collapsed representation of the sequence. As used herein, “complex indel errors” are indel sequencing errors that alter the homopolymer-collapsed representation of the sequence and include, without limitation, errors that eliminate a homopolymer, insert a homopolymer into the sequence, or create a dyslexic-type error.
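The distinction between simple and complex indel errors can be made concrete with a short Python sketch. It assumes that the observed read is already known to differ from its reference by an indel (substitutions are not considered), and the function names are illustrative.

```python
from itertools import groupby

def hp_collapse(seq):
    """Collapse each run of identical bases to a single base."""
    return "".join(base for base, _ in groupby(seq))

def classify_indel(reference, observed):
    """Label an indel-bearing read as a 'simple' or 'complex' indel error."""
    if reference == observed:
        return "identical"
    if hp_collapse(reference) == hp_collapse(observed):
        return "simple"   # only homopolymer run lengths changed
    return "complex"      # a homopolymer was removed, inserted, or reordered

def causes_frameshift(reference, observed):
    """True if the net length change is not a multiple of 3."""
    return (len(observed) - len(reference)) % 3 != 0
```

For example, lengthening the GG run in ACGGT to ACGGGT leaves the collapsed form ACGT unchanged and is a simple error, whereas inserting a new A into ACGT to give ACAGT changes the collapsed form and is a complex error.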


As used herein, “singleton reads” refer to sequence reads whose indel-corrected sequence appears only once in a dataset. Typically, singleton reads are enriched for reads containing a PCR or sequencing error.


As used herein, “truncated reads” refer to immune receptor sequence reads that are missing annotated V gene regions. For example, truncated reads include, without limitation, sequence reads that are missing annotated BCR V gene FR1, CDR1, FR2, CDR2, or FR3 regions. Such reads typically are missing a portion of the V gene sequence due to quality trimming. Truncated reads can give rise to artifacts if the truncation leads one to misidentify the V gene.


In the context of identified V gene-CDR3 sequences (clonotypes), “bidirectional support” indicates that a particular V gene-CDR3 sequence is found in at least one read that maps to the plus strand (proceeding from the V gene to the constant gene) and at least one read that maps to the minus strand (proceeding from the constant gene to the V gene). Systematic sequencing errors often lead to identification of V gene-CDR3 sequences having unidirectional support.
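A check for bidirectional support can be sketched in Python as shown below. The input layout, one (V gene, CDR3, strand) tuple per read with strand given as "+" or "-", is an assumption made for the example.

```python
def bidirectional_clonotypes(reads):
    """reads: iterable of (v_gene, cdr3_nt, strand) with strand '+' or '-'.
    Returns the clonotypes observed on both strands."""
    strands = {}
    for v_gene, cdr3, strand in reads:
        strands.setdefault((v_gene, cdr3), set()).add(strand)
    return {clone for clone, seen in strands.items() if {"+", "-"} <= seen}
```

Clonotypes absent from the returned set have only unidirectional support and may warrant closer scrutiny.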


For a set of sequences that have been grouped according to a predetermined sequence similarity threshold to account for variation due to PCR or sequencing error, the “cluster representative” is the sequence that is chosen as most likely to be error free. This is typically the most abundant sequence.


As used herein, “IgBLAST annotation error” refers to rare events where the border of the CDR3 is identified to be in an incorrect adjacent position. These events typically add three bases to the 5′ or 3′ end of a CDR3 nucleotide sequence.


For two sequences of equal length, the “Hamming distance” is the number of positions at which the corresponding bases or amino acids are different. For any two sequences, the “Levenshtein distance” or the “edit distance” is the minimum number of single base or amino acid edits (insertions, deletions, or substitutions) required to make one nucleotide or amino acid sequence into another nucleotide or amino acid sequence.
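Both distances can be computed with a few lines of Python. The sketch below uses the standard dynamic-programming recurrence for the Levenshtein distance, and the function names are illustrative.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))

def levenshtein(a, b):
    """Minimum number of insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # match or substitute
        prev = curr
    return prev[-1]
```

The same functions apply to amino acid sequences, since they compare characters without reference to the alphabet.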


In some embodiments in which J gene-directed primers are used in amplification of the immune receptor sequences, for example multiplex amplification with primers directed to V gene FR3 regions and primers directed to J genes, raw sequence reads derived from the assay undergo a J gene sequence inference process before any downstream analysis. In this process, the beginning and end of raw read sequences are interrogated for the presence of characteristic sequences of 10-30 nucleotides corresponding to the portion of the J gene sequences expected to exist after amplification with the J primer and any subsequent manipulation or processing (for example, digestion) of the amplicon termini prior to sequencing. The characteristic nucleotide sequences permit one to infer the sequence of the J primer, as well as the remaining portion of the J gene that was targeted since the sequence of each J gene is known. To complete the J gene sequence inference process, the inferred J gene sequence is added to the raw read to create an extended read that then spans the entire J gene. The extended read then contains the entire J gene sequence, the entire sequence of the CDR3 region, and at least a portion of the V gene sequence, which will be reported after downstream analysis. The portion of V gene sequence in the extended read will depend on the V gene-directed primers used for the multiplex amplification, for example, FR3-, or FR2-directed primers.
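The J gene sequence inference process can be sketched in Python as follows. The entries in `J_GENES` are illustrative placeholders rather than curated reference sequences, and the data layout (a characteristic tag retained in the read, paired with the J gene remainder that was removed with the primer) is an assumption made for the example. Reads whose J gene portion lies at the beginning are handled by testing the reverse complement.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

# Placeholder entries (hypothetical): name -> (characteristic tag kept in the
# read, remainder of the J gene that is inferred and appended).
J_GENES = {
    "J_example_1": ("ACTACTGGGGCCAGG", "GAACCCTGGTCACCGTCTCCTCAG"),
    "J_example_2": ("TGGACGTCTGGGGCC", "AAGGGACCACGGTCACCGTCTCCTCAG"),
}

def extend_read(read, j_genes=J_GENES):
    """Return (j_gene_name, extended_read) oriented V to J, or None if no
    characteristic J sequence is found (a likely off-target product)."""
    for candidate in (read, revcomp(read)):
        for name, (tag, remainder) in j_genes.items():
            if candidate.endswith(tag):
                return name, candidate + remainder
    return None
```

Returning `None` for reads without a recognizable tag mirrors the use of the explicit J gene search to discard off-target amplicons.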


Use of V gene FR3 and J gene primers to amplify expressed immune receptor sequences or rearranged immune receptor gDNA sequences yields a minimum length amplicon (for example, about 60-100 or about 80 nucleotides in length) while still producing data that allows for reporting of the entire CDR3 region. With the expectation of short amplicon length, reads of amplicons <100 nucleotides in length are not eliminated as low-quality and/or off target products during the sequence analysis workflow. However, the explicit search for the expected J gene sequences in the raw reads allows one to eliminate amplicons deriving from off-target amplifications by the J gene primers. In addition, this short amplicon length improves the performance of the assay on highly degraded template material, such as that derived from an FFPE or cfDNA sample.


In some embodiments, provided methods comprise sequencing an immune receptor library and subjecting the obtained sequence data to error identification and correction processes to generate rescued productive reads, and identifying productive and rescued productive sequence reads. In some embodiments, provided methods comprise sequencing an immune receptor library and subjecting the obtained sequence dataset to error identification and correction processes, identifying productive and rescued productive sequence reads, and grouping the sequence reads by clonotype to identify immune receptor clonotypes in the library.


In some embodiments, provided methods comprise sequencing a rearranged immune receptor DNA library and subjecting the obtained sequence data to error identification and correction processes for the V gene portions to generate rescued productive reads, and identifying productive, rescued productive, and unproductive sequence reads. In some embodiments, provided methods comprise sequencing a rearranged immune receptor DNA library and subjecting the obtained sequence dataset to error identification and correction processes for the V gene portions, identifying productive, rescued productive, and unproductive sequence reads, and grouping the sequence reads by clonotype to identify immune receptor clonotypes in the library. In some embodiments, both productive and unproductive sequence reads of rearranged immune receptor DNA are separately reported.


In some embodiments, the provided error identification and correction workflow is used for identifying and resolving PCR or sequencing-derived errors that lead to a sequence read being identified as from an unproductive rearrangement. In some embodiments, the provided error identification and correction workflow is applied to immune receptor sequence data generated from a sequencing platform in which indel or other frameshift-causing errors occur while generating the sequence data.


In some embodiments, the provided error identification and correction workflow is applied to sequence data generated by an Ion Torrent sequencing platform. In some embodiments, the provided error identification and correction workflow is applied to sequence data generated by Roche 454 Life Sciences sequencing platforms, PacBio sequencing platforms, and Oxford Nanopore sequencing platforms.


In some embodiments, the BCR repertoire analysis workflow includes an additional last step to identify clonal lineages in the sample. A clonal lineage represents a set of B cell clones (e.g., identified as having unique VDJ sequences) that derive from a common VDJ rearrangement but differ owing to somatic hypermutation and/or class switch recombination. It is generally assumed that members of a clonal lineage may be more likely to target the same antigen than members of different clonal lineages.


In some embodiments, the process of clonal lineage identification includes using a set of BCR clones (e.g., IgH clones) identified (for example as described herein) to perform the following:

    • 1. Separate the clone sequences into groups where group members share the same variable gene (excluding allele information), the same CDR3 nucleotide length, and the same joining gene (excluding allele information). In some embodiments the above J-gene criterion may be omitted.
    • 2. Arrange the clone sequences in each group into clusters based on the CDR3 nucleotide similarity of the clone sequences. Thresholds for CDR3 nucleotide similarity are about 0.70 to about 0.99. In some embodiments, the threshold for CDR3 nucleotide similarity is between about 0.80 to about 0.99. In some embodiments, the threshold for CDR3 nucleotide similarity is between about 0.80 to about 0.90. In certain embodiments, the threshold for CDR3 nucleotide similarity is about 0.80, 0.81, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87, 0.88, 0.89, 0.90, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, or 0.99.
      • a. In some embodiments, the clustering is performed using cd-hit-est as described: cd-hit-est -i vgene_groups.fa -o clustered_vgene_groups.cdhit -T 24 -l 9 -d 0 -M 100000 -B 0 -r 0 -g 1 -S 0 -c 0.85 -n 5, where vgene_groups.fa consists of the set of CDR3 nucleotide sequences of each clone within a group. Clones within the same cluster are considered members of the same clonal lineage.
      • b. In some instances, somatic hypermutation may be extensive enough that the described clustering criteria may not group all clonal lineage members. For such cases, in some embodiments, an additional step is performed to merge clusters identified in (a). The additional step consists of searching for instances of shared somatic hypermutation-derived mutations in the variable gene between clonal lineages, then merging clonal lineages if the fraction and/or number of shared mutations is above a certain threshold. Variable gene mutations are identified by comparison of the variable gene sequence to the best matching variable gene sequence in the IMGT database, as described. In some embodiments, the threshold for number of shared mutations is 2 or more. In some embodiments, the threshold for number of shared mutations is 3 or more. In other embodiments, the threshold for number of shared mutations is 4, 5, 6, 7, 8, 9, 10 or more. In some embodiments, the fraction of shared mutations is about to about 0.95. In some embodiments, the fraction of shared mutations is about or about 0.85. In other embodiments, the fraction of shared mutations is about 0.2, 0.25, 0.3, 0.35, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, or 0.95.
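The grouping and clustering of steps 1 and 2 can be illustrated with the Python sketch below. It is not cd-hit-est: as a simple stand-in, each clone joins the first cluster whose seed CDR3 it matches at or above the identity threshold, with identity computed position by position (all CDR3 sequences in a group have the same length). The dictionary keys, the function names, and the assumption of non-empty CDR3 sequences are illustrative.

```python
from collections import defaultdict

def strip_allele(gene):
    """'IGHV3-23*01' -> 'IGHV3-23' (drop allele information)."""
    return gene.split("*")[0]

def identity(a, b):
    """Fraction of matching positions between equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def assign_lineages(clones, threshold=0.85):
    """clones: list of dicts with keys 'v', 'j', 'cdr3' (nucleotide sequence).
    Returns a list of lineage ids parallel to clones."""
    groups = defaultdict(list)
    for idx, clone in enumerate(clones):
        key = (strip_allele(clone["v"]), len(clone["cdr3"]),
               strip_allele(clone["j"]))
        groups[key].append(idx)
    lineage = [None] * len(clones)
    next_id = 0
    for members in groups.values():
        seeds = []  # (lineage id, seed CDR3) for clusters within this group
        for idx in members:
            cdr3 = clones[idx]["cdr3"]
            for lid, seed in seeds:
                if identity(cdr3, seed) >= threshold:
                    lineage[idx] = lid
                    break
            else:  # no existing cluster is similar enough: start a new one
                seeds.append((next_id, cdr3))
                lineage[idx] = next_id
                next_id += 1
    return lineage
```

Omitting the J gene from the grouping key corresponds to the embodiments in which the J gene criterion is not applied.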


In some instances, a variable gene allele may be identified that is not represented in the IMGT database. In such instances, alignment to the IMGT database will indicate a mismatch that is not derived from somatic hypermutation. To avoid noise caused by such unannotated genetic variants, in some embodiments, an initial step is performed before (b) where one identifies all putative novel variable gene alleles in a sample, noting each position that differs from reference. In some embodiments, such positions are then excluded from consideration in the analysis described in (b). Methods for the identification of novel alleles from immune repertoire sequencing data have been described, for example, by Gadala-Maria et al. (2015) Proc. Natl. Acad. Sci. USA 112: E862-E870 and PCT Application Publication No. WO 2018/136562.
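The merging criterion of (b), together with the exclusion of positions attributed to putative novel alleles, can be sketched in Python as follows. The representation of mutations as (position, base) pairs, the choice of the lineage with fewer mutations as the denominator of the shared fraction, and the default thresholds (3 shared mutations, fraction 0.5) are assumptions made for illustration; the text above leaves the exact definition of the fraction open.

```python
def shared_mutation_stats(muts_a, muts_b, novel_allele_positions=frozenset()):
    """muts_a, muts_b: sets of (position, base) differences from the
    best-matching reference V gene. Positions attributed to putative novel
    alleles are ignored. Returns (number shared, fraction shared)."""
    a = {m for m in muts_a if m[0] not in novel_allele_positions}
    b = {m for m in muts_b if m[0] not in novel_allele_positions}
    shared = a & b
    smaller = min(len(a), len(b))  # assumed denominator for the fraction
    fraction = len(shared) / smaller if smaller else 0.0
    return len(shared), fraction

def should_merge(muts_a, muts_b, novel_allele_positions=frozenset(),
                 min_shared=3, min_fraction=0.5):
    """True if two lineages share enough V gene mutations to be merged."""
    n, fraction = shared_mutation_stats(muts_a, muts_b, novel_allele_positions)
    return n >= min_shared and fraction >= min_fraction
```

In this sketch both criteria must be met; an embodiment using only the number or only the fraction would drop the other condition.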


At the end of this clonal lineage identification process, each clone has been assigned to a clonal lineage. BCR repertoire features such as diversity, evenness, and convergence may be calculated with the clonal lineage as the unit of analysis. In some embodiments, clonal lineage features, such as the number of clones belonging to a lineage, the isotypes of those clones, the maximum and minimum frequency of the clones in a lineage, the maximum and minimum variable gene somatic hypermutation in a lineage, and others, are calculated and reported to the user.


In the absence of somatic hypermutation, BCR convergence may be calculated as the frequency of clones that are identical, or functionally identical, in amino acid sequence but different in nucleotide sequence. These represent clones that independently underwent VDJ recombination and are generally assumed to have proliferated in response to a common antigen. However, somatic hypermutation can create distinct VDJ sequences that do not represent B cells that independently underwent VDJ recombination. To account for this, a definition of convergence is used that takes into account the clonal lineage identification. For this purpose, “BCR convergence” is defined as the frequency of B cell clones that are members of different clonal lineages, as determined above, but are similar or identical in amino acid sequence. In some embodiments, two IGH rearrangements are considered convergent if they are assigned to separate clonal lineages but have the same variable gene (excluding allele information) and the same or similar CDR3 amino acid sequence. In other embodiments where sequencing covers all three CDR domains of the IGH chain, two IGH rearrangements may be considered convergent if they are assigned to separate clonal lineages but have the same variable gene (excluding allele information) and the same or similar CDR1, 2 and 3 amino acid sequence. In some embodiments, similar CDR amino acid sequences are within a Hamming or Levenshtein edit distance of 1. In other embodiments, similar CDR amino acid sequences are within a Hamming or Levenshtein edit distance of 2.
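A lineage-aware convergence calculation can be sketched in Python as shown below. The sketch uses the Hamming distance on CDR3 amino acid sequences and reports the fraction of clones that have at least one convergent partner. Reporting a count-based fraction (rather than, for example, a read-weighted frequency), the dictionary keys, and the function names are assumptions made for the example.

```python
def strip_allele(gene):
    """'IGHV3-23*01' -> 'IGHV3-23' (drop allele information)."""
    return gene.split("*")[0]

def similar(a, b, max_dist=1):
    """True if a and b have equal length and Hamming distance <= max_dist."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) <= max_dist

def convergent_fraction(clones, max_dist=1):
    """clones: list of dicts with keys 'v', 'cdr3_aa', 'lineage'.
    Returns the fraction of clones with a partner in a different lineage that
    has the same V gene and a similar CDR3 amino acid sequence."""
    convergent = set()
    for i, a in enumerate(clones):
        for j in range(i + 1, len(clones)):
            b = clones[j]
            if (a["lineage"] != b["lineage"]
                    and strip_allele(a["v"]) == strip_allele(b["v"])
                    and similar(a["cdr3_aa"], b["cdr3_aa"], max_dist)):
                convergent.update((i, j))
    return len(convergent) / len(clones) if clones else 0.0
```

Clones that share an amino acid sequence but belong to the same lineage are not counted, which is the point of the lineage-aware definition.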


Accordingly, in some embodiments, functionally equivalent B cells are identified by searching for BCR clones having the same variable gene and CDR amino acid sequences that are within a Hamming or Levenshtein edit distance of 1 or 2. In some embodiments the program cd-hit may be used to identify clones having similar but functionally equivalent amino acid sequences. (For code and information on the program cd-hit, see github.com/weizhongli/cdhit/wiki/3.-User%27s-Guide.) In some embodiments cd-hit is run using the following command:

    • cd-hit -i vgene_groups.fa -o clustered_vgene_groups.cdhit -T 24 -l 5 -d 0 -M 100000 -B 0 -g 1 -S 1 -U 1 -n 5, where vgene_groups.fa consists of the set of CDR3 amino acid sequences of clones having the same variable gene. Clones within the same cluster are considered to be functionally equivalent.


      In some embodiments, the value for the parameter -S may be 0, 1, 2, or 3. In some embodiments, the value for the parameter -U may be 0, 1, 2, or 3.


      In some embodiments, vgene_groups.fa consists of the set of CDR 1, 2 and 3 amino acid sequences of clones having the same variable gene. In some embodiments, vgene_groups.fa consists of the set of clones having both the same variable gene and the same CDR3 length.


In some embodiments, provided sequence analysis workflows include a downsampling analysis. For immune repertoire sequencing and subsequent analysis, use of downsampling analysis can help, for example, to eliminate variability owing to differences in sequencing depth across an assay. For example, an exemplary downsampling analysis for use with RNA or cDNA sequencing and analysis workflows applies the following procedure to the data: a) starting with the total set of productive+rescued productive reads, sequence reads are randomly removed down to one of several fixed read depths and b) this subset of reads is used to perform all downstream calculations (for example, clonotyping and calculation of secondary repertoire features including without limitation evenness, convergence, diversity, number and identity of clones detected, and clonal lineages).
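The downsampling procedure can be sketched in Python as follows. The sketch assumes that each productive or rescued productive read has already been reduced to a clonotype label, and it reports the number of distinct clones as a single example of a downstream calculation. The function names and the fixed random seed are illustrative.

```python
import random

def downsample(reads, depth, seed=0):
    """Randomly keep `depth` reads without replacement, or return None if the
    sample was not sequenced to that depth."""
    if depth > len(reads):
        return None
    return random.Random(seed).sample(reads, depth)

def clones_at_depths(reads, depths, seed=0):
    """reads: list of clonotype labels, one per productive or rescued read.
    Returns {depth: number of distinct clones detected at that depth}."""
    result = {}
    for depth in depths:
        subset = downsample(reads, depth, seed)
        if subset is not None:
            result[depth] = len(set(subset))
    return result
```

Plotting the returned clone counts against depth gives a simple view of whether additional reads are still revealing additional clones.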


In some embodiments, downsampling analysis identifies the point at which a particular sample is sequenced to saturation, for example, a point at which additional reads do not identify additional clones or lineages or add additional diversity to the detected repertoire. In some embodiments, downsampling allows the refining of sequencing depth or multiplexing among or between assays with similar sample types.


In some embodiments, the set of variable gene alleles detected by the assay methods and compositions provided may be used for de novo identification of haplotype groups within human populations. In particular embodiments, provided assay methods and compositions which include use of a plurality of V gene-specific primers and at least one C gene specific primer to amplify IgH CDR 1, 2, and 3 nucleotide sequences may be used to identify the IgH haplotype of a subject's BCR repertoire. For example, in some embodiments, methods and compositions provided which use at least one set of primers comprising a plurality of V gene FR1 primers selected from Table 3 and at least one C gene primer selected from Tables 6-10 may be used to identify the IgH haplotype of a subject's BCR repertoire. Methods for identification of TCR haplotype groups are described in PCT Application No. PCT/US2019/023731, filed Mar. 22, 2019, the entirety of which is incorporated herein by reference, and may similarly be used in conjunction with the methods and compositions provided herein to identify IgH haplotype groups. In some embodiments, the set of variable gene alleles detected by amplifying and sequencing IgH CDR 1, 2, and 3 nucleotide sequences may be used to assign a sample to one of several pre-existing haplotype groups as part of a larger procedure for predicting the risk of autoimmune disease or adverse events following an immunotherapy. Methods for assigning a sample to a haplotype group in a procedure for predicting risk of autoimmune disease or adverse events following an immunotherapy are also described in PCT Application No. PCT/US2019/023731, filed Mar. 22, 2019 and incorporated herein by reference, and may similarly be used in conjunction with the methods and compositions provided herein to assign a sample to an IgH haplotype group, for example, for predicting such risks.
In some embodiments, the IgH CDR 1, 2, 3 sequence data obtained using the provided assay methods and compositions may be used to infer phased IgH locus haplotypes (for example, Kidd et al. (2012) J. Immunol. 188(3): 1333-1340).


In some embodiments, the method comprises hybridizing a plurality of V gene-specific primers and a plurality of J gene-specific primers to a cDNA molecule, extending a first primer (e.g., a V gene-specific primer) of the primer pair, denaturing the extended first primer from the cDNA molecule, hybridizing a second primer (e.g., a J gene-specific primer) of the primer pair to the extended first primer product, extending the second primer, and digesting the target-specific primer pairs to generate a plurality of target amplicons. In some embodiments, adapters are ligated to the ends of the target amplicons prior to performing a nick translation reaction to generate a plurality of target amplicons suitable for nucleic acid sequencing. In some embodiments, at least one of the ligated adapters includes at least one barcode sequence. In some embodiments, each adapter ligated to the ends of the target amplicons includes a barcode sequence. In some embodiments, the one or more target amplicons can be amplified using bridge amplification, emulsion PCR or isothermal amplification to generate a plurality of clonal templates suitable for nucleic acid sequencing.


In some embodiments, provided methods comprise preparation and formation of a plurality of immune receptor-specific amplicons. In some embodiments, the method comprises hybridizing a plurality of V gene-specific primers and a plurality of J gene-specific primers to a gDNA molecule, extending a first primer (e.g., a V gene-specific primer) of the primer pair, denaturing the extended first primer from the gDNA molecule, hybridizing a second primer (e.g., a J gene-specific primer) of the primer pair to the extended first primer product, extending the second primer, and digesting the target-specific primer pairs to generate a plurality of target amplicons. In some embodiments, adapters are ligated to the ends of the target amplicons prior to performing a nick translation reaction to generate a plurality of target amplicons suitable for nucleic acid sequencing. In some embodiments, at least one of the ligated adapters includes at least one barcode sequence. In some embodiments, each adapter ligated to the ends of the target amplicons includes a barcode sequence. In some embodiments, the one or more target amplicons can be amplified using bridge amplification or emulsion PCR to generate a plurality of clonal templates suitable for nucleic acid sequencing.


In some embodiments, the disclosure provides methods for sequencing target amplicons and processing the sequence data to identify productive immune receptor rearrangements expressed in the biological sample from which the cDNA was derived. In other embodiments, the disclosure provides methods for sequencing target amplicons and processing the sequence data to identify productive immune receptor gene rearrangements in gDNA from a biological sample. In embodiments in which J gene-directed primers are used to amplify the expressed immune receptor sequences or rearranged immune receptor gDNA sequences, processing the sequence data includes inferring the nucleotide sequence of the J gene primer used for amplification as well as the remaining portion of the J gene that was targeted, as described herein. In some embodiments, processing the sequence data includes performing provided error identification and correction steps to generate rescued productive sequences. In some embodiments, use of the provided error identification and correction workflow can result in a combination of productive reads and rescued productive reads being at least 50% of the sequencing reads for an immune receptor cDNA or gDNA sample. In some embodiments, use of the provided error identification and correction workflow can result in a combination of productive reads and rescued productive reads being at least 60%, at least 70%, at least 80%, at least 90%, or at least 95% of the sequencing reads for an immune receptor cDNA or gDNA sample. In some embodiments, use of the provided error identification and correction workflow can result in a combination of productive reads and rescued productive reads being about 50-60%, about 60-70%, about 70-80%, about 80-90%, about 50-80%, or about 60-90% of the sequencing reads for an immune receptor cDNA or gDNA sample.
In some embodiments, use of the provided error identification and correction workflow can result in a combination of productive reads and rescued productive reads averaging about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90% of the sequencing reads for an immune receptor cDNA or gDNA sample.


With particular samples, the provided error identification and correction workflow can result in a combination of productive reads and rescued productive reads being less than 50% of the sequencing reads for an immune receptor cDNA or gDNA sample. Such samples include, for example, those in which the RNA or gDNA is highly degraded such as FFPE samples and cfDNA samples, and those in which the number of target immune cells is very low such as, for example, samples with very low B cell count or samples from subjects experiencing severe leukopenia. Accordingly, in some embodiments, use of the provided error identification and correction workflow can result in a combination of productive reads and rescued productive reads being about 30-50%, about 40-50%, about 30-40%, about 40-60%, at least 30%, or at least 40% of the sequencing reads for an immune receptor cDNA or gDNA sample.


In certain embodiments, methods of the invention comprise the use of target immune receptor primer sets wherein the primers are directed to sequences of the same target immune receptor gene, e.g., BCR (immunoglobulin) genes. In some embodiments the immune receptor is an antibody receptor selected from the group consisting of heavy chain alpha, heavy chain delta, heavy chain epsilon, heavy chain gamma, heavy chain mu, light chain kappa, and light chain lambda.


In certain embodiments, provided is a method for amplification of expression nucleic acid sequences of a BCR repertoire in a sample, comprising performing a multiplex amplification reaction to amplify BCR nucleic acid template molecules having a constant portion and a variable portion using at least one set of: i) a plurality of V gene primers directed to a majority of different V genes of at least one BCR coding sequence comprising at least a portion of a leader or framework region within the V gene, and ii) one or more C gene primers directed to at least a portion of the respective target constant gene of the BCR coding sequence, wherein each set of i) and ii) primers directed to the same target immune receptor sequences is selected from the group consisting of IgH, IgLkappa, and IgLlambda, and wherein performing amplification using each set results in amplicons representing the entire repertoire of the respective immune receptor in the sample; thereby generating immune receptor amplicons comprising the repertoire of the BCR. In particular embodiments the one or more plurality of V gene primers of i) are directed to sequences over about an 80 nucleotide portion of the framework region. In more particular embodiments the one or more plurality of V gene primers of i) are directed to sequences over about a 50 nucleotide portion of the framework region.


In certain embodiments, provided is a method for amplification of expression nucleic acid sequences of a BCR repertoire in a sample, comprising performing a multiplex amplification reaction to amplify BCR nucleic acid template molecules having a J gene portion and a V gene portion using at least one set of: i) a plurality of V gene primers directed to a majority of different V genes of a BCR coding sequence comprising at least a portion of a leader or framework region within the V gene, and ii) a plurality of J gene primers directed to a majority of different J genes of the respective target immune receptor coding sequence, wherein each set of i) and ii) primers directed to the same target immune receptor sequences is selected from the group consisting of IgH, IgLkappa and IgLlambda, and wherein performing amplification using each set results in amplicons representing the entire repertoire of the respective immune receptor in the sample; thereby generating amplicons comprising the repertoire of the BCR. In particular embodiments the one or more plurality of V gene primers of i) are directed to sequences over about an 80 nucleotide portion of the framework region. In more particular embodiments the one or more plurality of V gene primers of i) are directed to sequences over about a 50 nucleotide portion of the framework region. In particular embodiments the one or more plurality of J gene primers of ii) are directed to sequences over about a 50 nucleotide portion of the J gene. In more particular embodiments the one or more plurality of J gene primers of ii) are directed to sequences over about a 30 nucleotide portion of the J gene. In certain embodiments, the one or more plurality of J gene primers of ii) are directed to sequences completely within the J gene.


In certain embodiments, provided is a method for amplification of expression nucleic acid or genomic DNA sequences of a BCR repertoire in a sample, comprising performing a multiplex amplification reaction to amplify BCR nucleic acid template molecules having a J gene portion and a V gene portion using at least one set of: i) a plurality of V gene primers directed to a majority of different V genes of at least one BCR coding sequence comprising at least a portion of framework region 3 (FR3) within the V gene, and ii) a plurality of J gene primers directed to a majority of different J genes of the respective target BCR coding sequence, wherein each set of i) and ii) primers directed to the same target immune receptor sequences is selected from the group consisting of IgH, IgLkappa, and IgLlambda, and wherein performing amplification using each set results in amplicons representing the entire repertoire of the respective immune receptor in the sample; thereby generating BCR amplicons comprising the repertoire of the BCR. In particular embodiments the one or more plurality of V gene primers of i) are directed to sequences over about an 80 nucleotide portion of the framework region. In more particular embodiments the one or more plurality of V gene primers of i) are directed to sequences over about a 50 nucleotide portion of the framework region. In more particular embodiments the one or more plurality of V gene primers of i) are directed to sequences over about a 40 to about a 60 nucleotide portion of the framework region. In some embodiments the one or more plurality of V gene primers of i) anneal to at least a portion of the framework 3 region of the template molecules. In certain embodiments the plurality of J gene primers of ii) comprises at least two primers that anneal to at least a portion of the J gene portion of the template molecules. 
In some embodiments the plurality of J gene primers of ii) comprises at least 2 to about 8 primers that anneal to at least a portion of the J gene portion of the template molecules. In some embodiments the plurality of J gene primers of ii) comprises about 4 primers that anneal to at least a portion of the J gene portion of the template molecules. In some embodiments the plurality of J gene primers of ii) comprises about 3 to about 6 primers that anneal to at least a portion of the J gene portion of the template molecules. In particular embodiments at least one set of the generated amplicons includes complementarity determining region CDR3 of a BCR expression sequence. In some embodiments the amplicons are about 60 to about 160 nucleotides in length, about 70 to about 100 nucleotides in length, about 100 to about 120 nucleotides in length, at least about 70 to about 90 nucleotides in length, about 80 to about 90 nucleotides in length, or about 80 nucleotides in length. In some embodiments the nucleic acid template used in methods is cDNA produced by reverse transcribing nucleic acid molecules extracted from a biological sample.


In certain embodiments, methods are provided for providing sequence of the BCR repertoire in a sample, comprising performing a multiplex amplification reaction to amplify BCR nucleic acid template molecules having a J gene portion and a V gene portion using at least one set of primers comprising i) a plurality of V gene primers directed to a majority of different V genes of at least one BCR coding sequence comprising at least a portion of framework region 3 (FR3) within the V gene, and ii) a plurality of J gene primers directed to a majority of different J genes of the respective target immune receptor coding sequence, wherein each set of i) and ii) primers directed to the same target immune receptor sequences is selected from the group consisting of IgH, IgLkappa, and IgLlambda, thereby generating BCR amplicon molecules. Sequencing of resulting BCR amplicon molecules is then performed and the sequences of the immune receptor amplicon molecules determined thereby provides sequence of the BCR repertoire in the sample. In some embodiments, determining the sequence of the BCR amplicon molecules includes obtaining initial sequence reads, aligning the initial sequence read to a reference sequence, identifying productive reads, correcting one or more indel errors to generate rescued productive sequence reads, and determining the sequences of the resulting immune receptor molecules. In particular embodiments, determining the sequence of the BCR amplicon molecules includes obtaining initial sequence reads, adding the inferred J gene sequence to the sequence read to create an extended sequence read, aligning the extended sequence read to a reference sequence and identifying productive reads, correcting one or more indel errors to generate rescued productive sequence reads, and determining the sequences of the resulting BCR molecules. 
In particular embodiments the combination of productive reads and rescued productive reads is at least 50%, at least 60%, at least 70%, or at least 75% of the sequencing reads for the BCRs. In additional embodiments the method further comprises sequence read clustering and BCR clonotype reporting. In some embodiments, the sequences of the identified BCR repertoire are compared to a contemporaneous or current version of the IMGT database and the sequence of at least one allelic variant absent from that IMGT database is identified. In some embodiments the sequence read lengths are about 60 to about 185 nucleotides, depending in part on inclusion of any barcode sequence in the read length. In some embodiments the average sequence read length is between 90 and 120 nucleotides, is between 70 and 90 nucleotides, or is between about 75 and about 85 nucleotides, or is about 80 nucleotides. In certain embodiments at least one set of the sequenced amplicons includes complementarity determining region CDR3 of a BCR expression sequence.
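By way of non-limiting illustration, the fraction of productive and rescued productive reads described above may be computed as in the following minimal sketch. The sketch assumes a working definition in which a read is productive when its rearranged coding sequence is in frame and free of stop codons. The function names `is_productive` and `productive_fraction` and the representation of indel-corrected reads as an optional second list are hypothetical and are not taken from the disclosure.

```python
# Assumed working definition: a read is "productive" when it is in frame
# (length divisible by 3) and contains no in-frame stop codon.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_productive(seq):
    """Return True if seq is in frame and has no in-frame stop codon."""
    seq = seq.upper()
    if len(seq) == 0 or len(seq) % 3 != 0:
        return False
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    return not any(codon in STOP_CODONS for codon in codons)


def productive_fraction(raw_reads, corrected_reads):
    """Fraction of reads that are productive or rescued productive.

    raw_reads[i] is a read as sequenced; corrected_reads[i] is the same
    read after indel correction, or None if no correction was attempted
    (hypothetical input layout).
    """
    productive = 0
    rescued = 0
    for raw, fixed in zip(raw_reads, corrected_reads):
        if is_productive(raw):
            productive += 1
        elif fixed is not None and is_productive(fixed):
            rescued += 1
    total = len(raw_reads)
    return (productive + rescued) / total if total else 0.0


raw = ["TGTGCGAGA", "TGTGCGAG", "TGTTAAAGA", "TGTGC"]
corrected = [None, "TGTGCGAGA", None, None]
fraction = productive_fraction(raw, corrected)
```

In this example one read is productive as sequenced and one is rescued after correction of a single-base deletion, so the resulting fraction can be compared directly against a threshold such as at least 50%.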


In particular embodiments, methods provided utilize target BCR primer sets comprising V gene primers wherein the one or more of a plurality of V gene primers are directed to sequences over an FR3 region about 50 nucleotides in length. In other embodiments the one or more of a plurality of V gene primers are directed to sequences over an FR3 region about 70 nucleotides in length. In other particular embodiments the one or more of a plurality of V gene primers are directed to sequences over an FR3 region about 40 to about 60 nucleotides in length. In certain embodiments a target BCR primer set comprises V gene primers comprising about 50 to about 85 different FR3-directed primers. In certain embodiments a target BCR primer set comprises V gene primers comprising about 55 to about 80 different FR3-directed primers. In some embodiments, a target immune receptor primer set comprises V gene primers comprising about 62 to about 75 different FR3-directed primers. In some embodiments, a target BCR primer set comprises V gene primers comprising about 65, 66, 67, 68, 69, or 70 different FR3-directed primers. In some embodiments the target BCR primer set comprises a plurality of J gene primers. In some embodiments a target BCR primer set comprises at least two J gene primers wherein each is directed to at least a portion of a J gene within target polynucleotides. In some embodiments a target BCR primer set comprises 2 to about 8 J gene primers wherein each is directed to at least a portion of a J gene within target polynucleotides. In some embodiments a target BCR primer set comprises about 3 to about 6 different J gene primers wherein each is directed to at least a portion of a J gene within target polynucleotides. In some embodiments a target BCR primer set comprises about 2, 3, 4, 5, 6, 7 or 8 different J gene primers. 
In particular embodiments a target immune receptor primer set comprises about 4 J gene primers wherein each is directed to at least a portion of a J gene within target polynucleotides.


In particular embodiments, methods of the invention comprise the use of at least one set of primers comprising V gene primers of BCR IgH coding sequence and J gene primers of BCR IgH coding sequence i), and V gene primers of BCR IgLlambda coding sequence and J gene primers of BCR IgLlambda coding sequence ii), and V gene primers of BCR IgLkappa coding sequence and J gene primers of BCR IgLkappa coding sequence iii), and optionally Cint sequence primers and KDE sequence primers iv), selected from Tables 9 and 6 and Tables 3-4 and Tables 1-2 and Table 5, respectively.


In particular embodiments, methods of the invention comprise the use of at least one set of primers comprising V gene primers of BCR IgH FR2 coding sequence and J gene primers of BCR IgH coding sequence i), and/or V gene primers of BCR IgH distal FR3 coding sequence and J gene primers of BCR IgH coding sequence ii), selected from Tables 8 and 6 and Tables 7 and 6, respectively.


In some embodiments methods of the invention comprise the use of at least one set of primers i) and ii) and iii), optionally iv) comprising primers selected from SEQ ID NOs: 1161-1446 and 973-988 and SEQ ID NOs: 597-910 and 911-950 and SEQ ID NOs: 1-548 and 549-596 and optionally selected from SEQ ID NOs: 951-972. In other certain embodiments methods of the invention comprise the use of at least one set of primers i) and ii) comprising primers selected from SEQ ID NOs: 1304-1446 and 981-988 and SEQ ID NOs: 785-816, 847-876 and 931-935, 941-945 and SEQ ID NOs: 406-456 and 557-580-596 and optionally selected from SEQ ID NOs: 960, 961 and 972.


In some embodiments methods of the invention comprise the use of at least one set of primers i) and ii) comprising primers selected from SEQ ID NOs: 1065-1160 and 973-988 or selected from SEQ ID NOs: 1065-1112 and 981-988. In other certain embodiments methods of the invention comprise the use of at least one set of primers i) and ii) comprising primers selected from SEQ ID NOs: 989-1064 and 973-988 or selected from SEQ ID NOs: 1027-1064 and 981-988.


In certain embodiments, provided is a method for amplification of expression nucleic acid sequences of a BCR repertoire in a sample, comprising performing a multiplex amplification reaction to amplify BCR nucleic acid template molecules having a J gene portion and a V gene portion using at least one set of: i) a plurality of V gene primers directed to a majority of different V genes of a BCR coding sequence comprising at least a portion of framework region 2 (FR2) within the V gene, and ii) a plurality of J gene primers directed to a majority of different J genes of the respective target immune receptor coding sequence, wherein each set of i) and ii) primers directed to the same target immune receptor sequences is selected from the group consisting of IgH, IgLkappa and IgLlambda, and wherein performing amplification using each set results in amplicons representing the entire repertoire of the respective immune receptor in the sample; thereby generating immune receptor amplicons comprising the repertoire of the BCR. In particular embodiments the one or more plurality of V gene primers of i) are directed to sequences over about an 80 nucleotide portion of the framework region. In more particular embodiments the one or more plurality of V gene primers of i) are directed to sequences over about a 50 nucleotide portion of the framework region. In some embodiments the one or more plurality of V gene primers of i) anneal to at least a portion of the FR2 region of the template molecules. In certain embodiments the plurality of J gene primers of ii) comprise at least ten primers that anneal to at least a portion of the J gene of the template molecules. In some embodiments the plurality of J gene primers of ii) comprises about 14 primers that anneal to at least a portion of the J gene portion of the template molecules. 
In some embodiments the plurality of J gene primers of ii) comprises at least two primers that anneal to at least a portion of the J gene portion of the template molecules. In some embodiments the plurality of J gene primers of ii) comprises at least 2 to about 8 primers that anneal to at least a portion of the J gene portion of the template molecules. In some embodiments the plurality of J gene primers of ii) comprises about 4 primers that anneal to at least a portion of the J gene portion of the template molecules. In some embodiments the plurality of J gene primers of ii) comprises about 3 to about 6 primers that anneal to at least a portion of the J gene portion of the template molecules. In particular embodiments at least one set of the generated amplicons includes complementarity determining regions CDR2 and CDR3 of a BCR gene sequence. In some embodiments the amplicons are about 160 to about 270 nucleotides in length, about 180 to about 250 nucleotides, or about 195 to about 225 nucleotides in length. In some embodiments the nucleic acid template used in methods is cDNA produced by reverse transcribing nucleic acid molecules extracted from a biological sample.


In certain embodiments, methods are provided for providing sequence of the BCR repertoire in a sample, comprising performing a multiplex amplification reaction to amplify BCR nucleic acid template molecules having a J gene portion and a V gene portion using at least one set of primers comprising i) a plurality of V gene primers directed to a majority of different V genes of at least one BCR coding sequence comprising at least a portion of FR2 within the V gene, and ii) a plurality of J gene primers directed to a majority of different J genes of the respective target immune receptor coding sequence, wherein each set of i) and ii) primers directed to the same target immune receptor sequences is selected from the group consisting of IgH, IgLkappa and IgLlambda, thereby generating BCR amplicon molecules. Sequencing of resulting immune receptor amplicon molecules is then performed and the sequences of the BCR amplicon molecules determined thereby provides sequence of the BCR repertoire in the sample. In some embodiments, determining the sequence of the BCR amplicon molecules includes obtaining initial sequence reads, aligning the initial sequence read to a reference sequence, identifying productive reads, correcting one or more indel errors to generate rescued productive sequence reads, and determining the sequences of the resulting immune receptor molecules. In particular embodiments, determining the sequence of the BCR amplicon molecules includes obtaining initial sequence reads, adding the inferred J gene sequence to the sequence read to create an extended sequence read, aligning the extended sequence read to a reference sequence and identifying productive reads, correcting one or more indel errors to generate rescued productive sequence reads, and determining the sequences of the resulting BCR molecules. 
In particular embodiments the combination of productive reads and rescued productive reads is at least 40%, at least 50%, at least 60%, at least 70%, or at least 75% of the sequencing reads for the BCRs. In additional embodiments the method further comprises sequence read clustering and BCR clonotype reporting. In some embodiments, the sequences of the identified immune repertoire are compared to a contemporaneous or current version of the IMGT database and the sequence of at least one allelic variant absent from that IMGT database is identified. In some embodiments the average sequence read length is between 160 and 300 nucleotides, between 180 and 280 nucleotides, between 200 and 260 nucleotides, or between 225 and 270 nucleotides, depending in part on inclusion of any barcode sequence in the read length. In certain embodiments at least one set of the sequenced amplicons includes complementarity determining regions CDR2 and CDR3 of a BCR expression sequence.


In particular embodiments, methods provided utilize target BCR primer sets comprising V gene primers wherein the one or more of a plurality of V gene primers are directed to sequences over an FR2 region about 70 nucleotides in length. In other particular embodiments the one or more of a plurality of V gene primers are directed to sequences over an FR2 region about 50 nucleotides in length. In certain embodiments a target BCR primer set comprises V gene primers comprising about 4 to about 20 different FR2-directed primers. In some embodiments a target BCR primer set comprises V gene primers comprising about 5 to about 15 different FR2-directed primers. In some embodiments a target BCR primer set comprises V gene primers comprising about 5, 6, 7, 8, 9, 10, 11, or 12 different FR2-directed primers. In some embodiments the target BCR primer set comprises a plurality of J gene primers. In some embodiments a target BCR primer set comprises at least two J gene primers wherein each is directed to at least a portion of a J gene within target polynucleotides. In some embodiments a target BCR primer set comprises 2 to about 8 J gene primers wherein each is directed to at least a portion of a J gene within target polynucleotides. In some embodiments a target BCR primer set comprises about 3 to about 6 different J gene primers wherein each is directed to at least a portion of a J gene within target polynucleotides. In some embodiments a target BCR primer set comprises about 2, 3, 4, 5, 6, 7 or 8 different J gene primers. In particular embodiments a target immune receptor primer set comprises about 4 J gene primers wherein each is directed to at least a portion of a J gene within target polynucleotides.


In particular embodiments, methods of the invention comprise use of at least one set of primers i) and ii) and iii), optionally iv) comprising primers selected from SEQ ID NOs: 1161-1446 and 973-988 and SEQ ID NOs: 597-910 and 911-950 and SEQ ID NOs: 1-548 and 549-596 and optionally selected from SEQ ID NOs: 951-972. In other certain embodiments methods comprise use of at least one set of primers i) and ii) comprising primers selected from SEQ ID NOs: 1304-1446 and 981-988 and SEQ ID NOs: 785-816, 847-876 and 931-935, 941-945 and SEQ ID NOs: 406-456 and 557-580-596 and optionally selected from SEQ ID NOs: 960, 961 and 972.


In particular embodiments, methods of the invention comprise use of at least one set of primers i) and ii) comprising primers selected from SEQ ID NOs: 1065-1160 and 973-988 or selected from SEQ ID NOs: 1065-1112 and 981-988. In other certain embodiments methods comprise use of at least one set of primers i) and ii) comprising primers selected from SEQ ID NOs: 989-1064 and 973-988 or selected from SEQ ID NOs: 1027-1064 and 981-988.


In certain embodiments, methods of the invention comprise use of a biological sample selected from the group consisting of hematopoietic cells, lymphocytes, and tumor cells. In some embodiments the biological sample is selected from the group consisting of peripheral blood mononuclear cells (PBMCs), B cells, circulating tumor cells, and tumor infiltrating lymphocytes (herein “TILs” or “TIL”). In some embodiments, the biological sample comprises B cells undergoing ex vivo activation and/or expansion. In some embodiments, the biological sample comprises cfDNA, such as found, for example, in blood or plasma. In some embodiments, the biological sample is selected from the group consisting of tissue (for example, lymph node, organ tissue, bone marrow), whole blood, synovial fluid, cerebral spinal fluid, tumor biopsy, and other clinical specimens containing cells.


In some embodiments, methods, compositions, and systems are provided for determining the immune repertoire of a biological sample by assessing both expressed immune receptor RNA and rearranged immune receptor genomic DNA (gDNA) from a biological sample. In some embodiments, the sample RNA and gDNA may be assessed concurrently and following reverse transcription of the RNA to form cDNA, the cDNA and gDNA may be amplified in the same multiplex amplification reaction. In some embodiments, cDNA from the sample RNA and the sample gDNA may undergo multiplex amplification in separate reactions. In some embodiments, cDNA from the sample RNA and sample gDNA may undergo multiplex amplification with parallel primer pools. In some embodiments, the same BCR-directed primer pools are used to assess the BCR repertoire of gDNA and RNA from the sample. In some embodiments, different immune receptor-directed primer pools are used to assess the immune repertoire of gDNA and RNA from the sample. In some embodiments, multiplex amplification reactions are performed separately with cDNA from the sample RNA and with sample gDNA to amplify the same or different target immune receptor molecules from the sample and the resulting immune receptor amplicons are sequenced, thereby providing sequence of the expressed immune receptor RNA and rearranged immune receptor gDNA of a biological sample.


In some embodiments, different immune receptor-directed primer pools are used to assess the immune repertoire of gDNA and/or RNA from the sample. In some embodiments, multiplex amplification reactions are performed with a set of IgH primers provided herein and with a set of TCR beta-directed primers, for example as described in PCT Application No. PCT/US2018/014111, filed Jan. 17, 2018, and PCT Application No. PCT/US2018/049259, filed Aug. 31, 2018, the entirety of each of which is incorporated herein by reference, or commercially available as Oncomine™ TCR Beta-SR Assay DNA, Oncomine™ TCR Beta-SR Assay RNA, and Oncomine™ TCR Beta-LR Assay (Thermo Fisher Scientific). The ability to assess both the BCR (e.g., IgH) and TCR (e.g., TCR beta) repertoires from a sample using a single multiplex amplification reaction is useful in saving time and limited biological sample and is applicable in many of the methods described herein, including methods related to allergy and autoimmunity, vaccine development and use, and immuno-oncology. For example, combining B cell repertoire analysis with T cell repertoire analysis may be used to improve detection of changes in the immune repertoire following administration of immunotherapy, such as checkpoint blockade or checkpoint inhibitor immunotherapy, potentially indicating a response to the immunotherapy. Also, combining B cell repertoire analysis with T cell repertoire analysis may be used to improve evaluation of vaccine efficacy. Exemplary immune repertoire changes in response to immunotherapy or in response to vaccine administration include, without limitation, a decrease in T and B cell evenness following treatment (for example without limitation, at day 7-14 post treatment) in comparison to the pretreatment evenness values, and an increase in the representation of IgG1 expressing B cells following treatment(s) in comparison to the pretreatment values.
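By way of non-limiting illustration, a pre-treatment versus post-treatment evenness comparison of the kind described above may be sketched as follows. The passage above does not specify an evenness metric; the sketch assumes normalized Shannon entropy (Shannon entropy of the clone frequencies divided by the natural logarithm of the number of clones), and the function name `shannon_evenness` and the example clone counts are hypothetical.

```python
import math


def shannon_evenness(clone_counts):
    """Normalized Shannon entropy of clone counts, in the range 0 to 1.

    Assumption: evenness is H / ln(S), where H is the Shannon entropy of
    the clone frequencies and S is the number of clones with a nonzero
    count. A repertoire with fewer than two clones is reported as 0.0.
    """
    counts = [c for c in clone_counts if c > 0]
    if len(counts) < 2:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts)
    return entropy / math.log(len(counts))


# Hypothetical read counts per clone before and after treatment.
pre_treatment = [25, 25, 25, 25]
post_treatment = [97, 1, 1, 1]

evenness_pre = shannon_evenness(pre_treatment)
evenness_post = shannon_evenness(post_treatment)
evenness_decreased = evenness_post < evenness_pre
```

Under this assumed metric, expansion of one clone at the expense of the others lowers the evenness value, which corresponds to the exemplary post-treatment decrease noted above.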


In some embodiments, the methods and compositions provided are used to identify and/or characterize an immune repertoire of a subject. In some embodiments, methods and compositions provided are used to identify and characterize novel or non-canonical BCR alleles of a subject's immune repertoire. In some embodiments, the sequences of the identified immune repertoire are compared to a contemporaneous or current version of the IMGT database and the sequence of at least one allelic variant absent from that IMGT database is identified. In some embodiments, identified allelic variants absent from the IMGT database are subjected to evidence-based filtering using, for example, criteria such as clone number support, sequence read support and/or number of individuals having the allelic variant. Allelic variants identified and reported as absent from IMGT may be compared to other databases containing immune repertoire sequence information, such as NCBI NR database and Lym1K database, to cross-validate the reported novel or non-canonical BCR alleles. Characterizing the existence of undocumented or non-canonical IgH polymorphism, for example, may help with understanding factors that influence autoimmune disease, infectious disease, and response to immunotherapy. In some embodiments, the sequences of novel or non-canonical BCR alleles identified as described herein may be used to generate recombinant BCR nucleic acids or molecules. In other embodiments accordingly, provided are methods for making recombinant nucleic acids encoding identified novel IgH allelic variants. In some embodiments, provided are methods for making recombinant IgH allelic variant molecules and for making recombinant cells which express the same.
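By way of non-limiting illustration, evidence-based filtering of candidate allelic variants absent from the IMGT database may be sketched as follows. The class `CandidateAllele`, the function `filter_candidates`, the allele names, and the threshold values are hypothetical and serve only to show how clone number support, sequence read support, and number of individuals could be combined; in this sketch all three criteria must be met, which is one possible choice among several.

```python
from dataclasses import dataclass


@dataclass
class CandidateAllele:
    name: str            # hypothetical label for the candidate variant
    clone_support: int   # number of clones carrying the variant
    read_support: int    # number of sequence reads carrying the variant
    individuals: int     # number of individuals in which it was observed


def filter_candidates(candidates, min_clones=2, min_reads=10,
                      min_individuals=1):
    """Keep candidates meeting every support threshold (assumed values)."""
    return [
        c for c in candidates
        if c.clone_support >= min_clones
        and c.read_support >= min_reads
        and c.individuals >= min_individuals
    ]


candidates = [
    CandidateAllele("variant_A", clone_support=5, read_support=120, individuals=3),
    CandidateAllele("variant_B", clone_support=1, read_support=200, individuals=1),
    CandidateAllele("variant_C", clone_support=4, read_support=6, individuals=2),
]
supported = [c.name for c in filter_candidates(candidates)]
```

Candidates retained by such a filter could then be carried forward to cross-validation against other databases as described above.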


In some embodiments, methods and compositions provided are used to identify and characterize novel or non-canonical BCR alleles of a subject's immune repertoire. In some embodiments, a patient's immune repertoire may be identified or characterized before and/or after a therapeutic treatment, for example treatment for a cancer or immune disorder. In some embodiments, identification or characterization of an immune repertoire may be used to assess the effect or efficacy of a treatment, to modify therapeutic regimens, and/or to optimize the selection of therapeutic agents. In some embodiments, identification or characterization of the immune repertoire may be used to assess a patient's response to an immunotherapy, a cancer vaccine and/or other immune-based treatment or combination(s) thereof. In some embodiments, identification or characterization of the immune repertoire may indicate a patient's likelihood to respond to a therapeutic agent or may indicate a patient's likelihood to not be responsive to a therapeutic agent.


In some embodiments, a patient's BCR repertoire may be identified or characterized to monitor progression and/or treatment of hyperproliferative diseases, including detection of residual disease following patient treatment, monitor progression and/or treatment of autoimmune disease, transplantation monitoring, and to monitor conditions of antigenic stimulation, including following vaccination, exposure to bacterial, fungal, parasitic, or viral antigens, or infection by bacteria, fungi, parasites or virus. In some embodiments, identification or characterization of the BCR repertoire may be used to assess a patient's response to an anti-infective or anti-inflammatory therapy.


In some embodiments, methods and compositions are provided for identifying and/or characterizing immune repertoire clonal populations in a sample from a subject, comprising performing one or more multiplex amplification reactions with the sample or with cDNA prepared from the sample to amplify immune repertoire nucleic acid template molecules having a constant portion and a variable portion using at least one set of primers comprising i) a plurality of V gene primers directed to a majority of different V genes of at least one BCR coding sequence comprising at least a portion of framework region 1 (FR1) within the V gene, and ii) one or more C gene primers directed to at least a portion of a respective target C gene of the immune receptor coding sequence, wherein each set of i) and ii) primers directed to the same target immune receptor sequences is selected from the group consisting of IgH, IgLkappa and IgLlambda, thereby generating BCR amplicon molecules. The method further comprises sequencing the resulting BCR amplicon molecules, determining the sequences of the BCR amplicon molecules, and identifying one or more immune repertoire clonal populations for the target BCR from the sample. In particular embodiments, determining the sequence of the immune receptor amplicon molecules includes obtaining initial sequence reads, aligning the initial sequence read to a reference sequence and identifying productive reads, correcting one or more indel errors to generate rescued productive sequence reads, and determining the sequences of the resulting immune receptor molecules. 
In other embodiments of such methods and compositions, the one or more multiplex amplification reactions are performed using at least one set of primers comprising i) a plurality of V gene primers directed to a majority of different V genes of at least one BCR coding sequence comprising at least a portion of framework region 3 (FR3) within the V gene, and ii) one or more J gene primers directed to at least a portion of a respective target J gene of the BCR coding sequence, wherein each set of i) and ii) primers directed to the same target immune receptor sequences is selected from the group consisting of IgH, IgLlambda, and IgLkappa. In some embodiments, multiplex amplification reactions are performed with primer sets designed to generate amplicons which include the expressed CDR3 regions of the target immune receptor. In some embodiments, multiplex amplification reactions are performed using i) (a) a plurality of V gene primers directed to a majority of different V genes of BCR IgH coding sequence comprising at least a portion of framework region 3 (FR3) within the V gene and (b) a plurality of J gene primers directed to at least a portion of a majority of different J genes of the BCR IgH coding sequence; and ii) (a) a plurality of V gene primers directed to a majority of different V genes of BCR IgLlambda coding sequence comprising at least a portion of framework region 3 (FR3) within the V gene, (b) a plurality of J gene primers directed to at least a portion of a majority of different J genes of the BCR IgLlambda coding sequence; and iii) (a) a plurality of V gene primers directed to a majority of different V genes of BCR IgLkappa coding sequence comprising at least a portion of framework region 3 (FR3) within the V gene, wherein each set of i) and ii) and iii) primers is directed to coding sequences of the same BCR gene such that performing the amplification using the at least one set of i) and ii) primers results in amplicon molecules representing the 
target BCR repertoire in the sample; thereby generating target BCR amplicon molecules comprising the target BCR repertoire. For example, exemplary primers specific for IgH V gene FR3 regions are shown in Table 9 and exemplary primers specific for IgH J genes are shown in Table 6, exemplary primers specific for IgLkappa V gene FR3 regions are shown in Table 3 and exemplary primers specific for IgLkappa J genes are shown in Table 4, exemplary primers specific for IgLlambda V gene FR3 regions are shown in Table 1 and exemplary primers specific for IgLlambda J genes are shown in Table 2 and exemplary primers specific for KDE and Cint are shown in Table 5. The method further comprises sequencing the resulting BCR amplicon molecules, determining the sequences of the BCR amplicon molecules, and identifying one or more immune repertoire clonal populations for the target BCR from the sample. In particular embodiments, determining the sequence of the immune receptor amplicon molecules includes obtaining initial sequence reads, adding the inferred J gene sequence to the sequence read to create an extended sequence read, aligning the extended sequence read to a reference sequence and identifying productive reads, correcting one or more indel errors to generate rescued productive sequence reads, and determining the sequences of the resulting immune receptor molecules.


In some embodiments, the multiplex amplification reaction uses i) (a) a plurality of V gene primers directed to a majority of different V genes of at least one BCR coding sequence comprising at least a portion of distal FR3 within the V gene, and/or (b) a plurality of V gene primers directed to a majority of different V genes of at least one BCR coding sequence comprising at least a portion of FR2 within the V gene; and ii) a plurality of J gene primers directed to at least a portion of a majority of different J genes of the at least one BCR coding sequence, wherein each set of i) and ii) primers is directed to coding sequences of the same target BCR IgH gene such that performing the amplification using the at least one set of i) and ii) primers results in amplicon molecules representing the target BCR repertoire in the sample; thereby generating target BCR amplicon molecules comprising the target BCR repertoire. For example, exemplary primers specific for IgH V gene FR2 regions are shown in Table 7 and exemplary primers specific for IgH J genes are shown in Table 6 and exemplary primers specific for IgH V gene distal FR3 regions are shown in Table 8 and exemplary primers specific for IgH J genes are shown in Table 6. The method further comprises sequencing the resulting BCR amplicon molecules, determining the sequences of the BCR amplicon molecules, and identifying one or more immune repertoire clonal populations for the target BCR from the sample. In particular embodiments, determining the sequence of the immune receptor amplicon molecules includes obtaining initial sequence reads, adding the inferred J gene sequence to the sequence read to create an extended sequence read, aligning the extended sequence read to a reference sequence and identifying productive reads, correcting one or more indel errors to generate rescued productive sequence reads, and determining the sequences of the resulting immune receptor molecules.


In some embodiments, accordingly, methods, compositions and workflows provided are for use, without limitation, in assessing clonality, diversity and richness of B cell populations. For example, clonal expansion may identify B cells that are responding to antigen challenge and longitudinal analysis may be used to evaluate efficacy of vaccination. In some embodiments, methods, compositions and workflows provided are for use in identifying clonal lineages with many members. For example, clonal lineages with many members may represent B cells that are responding to chronic antigen stimulation. In some embodiments, methods, compositions and workflows provided are for use in identifying antigen-specific B cells. For example, comparing the IgH repertoire across groups of individuals who have been exposed to the same antigen may reveal shared IgH amino acid motifs indicative of antigen specific IgH chains. In some embodiments, methods, compositions and workflows provided are for use in evaluating clonal overlap. For example, clonal overlap analysis may reveal B cell trafficking and developmental relationships between populations of B cells. In some embodiments, methods, compositions and workflows provided are for use in determining VDJ sequence of dominant clones, including in longitudinal analysis. In some embodiments, methods, compositions and workflows provided are for use in identifying malignant subclones via clonal lineage analysis. For example, for some B cell malignancies (e.g., follicular lymphoma), somatic hypermutation is ongoing, leading to the presence of malignant subclones having different but related IgH sequences that may be tracked with the provided methods, compositions and workflows.
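By way of non-limiting illustration, richness and clonal overlap between two B cell populations may be sketched as follows. The sketch assumes that each clonotype is represented by a string identifier (here, hypothetical CDR3 amino acid strings) and uses the Jaccard index as one possible overlap measure; the passage above does not prescribe a particular overlap statistic, and the function names are hypothetical.

```python
def richness(clonotypes):
    """Number of distinct clonotypes observed in a sample."""
    return len(set(clonotypes))


def jaccard_overlap(sample_a, sample_b):
    """Shared clonotypes divided by all clonotypes seen in either sample.

    The Jaccard index is an assumed choice of overlap measure.
    """
    a, b = set(sample_a), set(sample_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical clonotype identifiers from two B cell populations.
population_1 = ["CARDY", "CARGG", "CAKSS", "CARDY"]
population_2 = ["CARGG", "CAKSS", "CTTWF"]

richness_1 = richness(population_1)
overlap = jaccard_overlap(population_1, population_2)
shared = sorted(set(population_1) & set(population_2))
```

The list of shared clonotypes, rather than the summary statistic alone, is what would be inspected when evaluating trafficking or developmental relationships between the two populations.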


In some embodiments, methods, compositions and workflows provided are for use in evaluating clonal evolution. For example, analysis of clonal lineages may reveal isotype switching and IgH residues important for antigen binding. In some embodiments, methods, compositions and workflows provided are for use in evaluating isotype abundance. For example, over or under representation of certain isotypes may indicate disease or immunodeficiency such as, without limitation, elevated IgG1 in response to viral infection, elevated IgE in allergy, and missing or underrepresented isotypes may indicate primary immunodeficiency. In some embodiments, methods, compositions and workflows provided are for use in quantifying somatic hypermutation. For example, the frequency of somatic hypermutation provides insight into the stage of B cell development at which malignant transformation occurred.
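By way of non-limiting illustration, quantification of somatic hypermutation for a single read may be sketched as follows. The sketch assumes that the V gene portion of the read has already been aligned to its germline reference so that both strings have equal length, with `-` marking alignment gaps and `N` marking ambiguous bases; the function name `shm_frequency` and the example sequences are hypothetical.

```python
def shm_frequency(read_v, germline_v):
    """Fraction of compared positions at which the read differs from germline.

    Assumes the two sequences are pre-aligned and of equal length.
    Positions with a gap ('-') or ambiguous base ('N') in either sequence
    are skipped and do not count toward the denominator.
    """
    if len(read_v) != len(germline_v):
        raise ValueError("sequences must be aligned to equal length")
    compared = 0
    mismatches = 0
    for r, g in zip(read_v.upper(), germline_v.upper()):
        if r in "-N" or g in "-N":
            continue
        compared += 1
        if r != g:
            mismatches += 1
    return mismatches / compared if compared else 0.0


germline = "ACGTACGTAC"
unmutated_read = "ACGTACGTAC"
mutated_read = "ACGAACGTAC"   # one substitution at the fourth position

freq_unmutated = shm_frequency(unmutated_read, germline)
freq_mutated = shm_frequency(mutated_read, germline)
```

Per-read frequencies of this kind could be averaged within a clone or clonal lineage to compare mutation burden between lineages.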


In some embodiments, methods and compositions provided are used to identify and/or characterize somatic hypermutations (SHM) within a BCR repertoire or clonal populations. In some embodiments, methods and compositions provided are used to identify and/or screen for rare BCR clones or subclones, for example those having somatically hypermutated VDJ rearrangements. In some embodiments, identification, quantification and/or characterization of rare BCR clones may provide biomarkers for a given condition or treatment response. Accordingly, in some embodiments, methods and compositions provided herein are used to identify, screen for and/or characterize BCR clones as biomarkers using samples obtained for example from retrospective or longitudinal subject studies.


In some embodiments, methods for identifying and/or characterizing BCR clonal lineages and SHM comprise performing one or more multiplex amplification reactions with a subject's sample to amplify BCR nucleic acid template molecules having a constant portion and a variable portion using at least one set of primers directed to a majority of different V genes of at least one BCR coding sequence comprising at least a portion of FR1, FR2 or FR3 within the V gene, and one or more C gene primers directed to at least a portion of a respective target C gene of the BCR coding sequence, sequencing the resultant BCR amplicons, and performing VDJ sequence analysis provided herein to identify and/or quantify SHM and clonal lineages for the target BCR from the sample. In other embodiments, methods for identifying and/or characterizing BCR clonal lineages and SHM comprise performing one or more multiplex amplification reactions with a subject's sample to amplify BCR nucleic acid template molecules having a J gene portion and a variable portion using at least one set of primers directed to a majority of different V genes of at least one BCR coding sequence comprising at least a portion of FR1, FR2 or FR3 within the V gene, and a plurality of J gene primers directed to a majority of different J genes of the respective target BCR coding sequence, sequencing the resultant BCR amplicons, and performing VDJ sequence analysis provided herein to identify SHM and clonal lineages for the target BCR from the sample.


In certain embodiments, the methods and compositions provided are used to monitor changes in BCR repertoire clonal populations and clonal lineages, for example changes in clonal expansion, changes in clonal contraction, changes in relative ratios of clones or clonal populations within a BCR repertoire, changes in expansion or contraction of clonal lineages, changes in somatic hypermutation and/or isotype class switching within a repertoire. In some embodiments, the provided methods and compositions are used to monitor changes in BCR repertoire clonal populations or clonal lineages (e.g., clonal population or lineage expansion, clonal population or lineage contraction, clonal population or lineage changes in relative ratios, changes in somatic hypermutation and/or class switching) in response to tumor growth. In some embodiments, the provided methods and compositions are used to monitor changes in BCR repertoire clonal populations (e.g., clonal population or lineage expansion, clonal population or lineage contraction, clonal population or lineage changes in relative ratios, changes in somatic hypermutation and/or class switching) in response to tumor treatment. In some embodiments, the provided methods and compositions are used to monitor changes in BCR repertoire clonal populations or clonal lineages (e.g., clonal population or lineage expansion, clonal population or lineage contraction, clonal population or lineage changes in relative ratios, changes in somatic hypermutation and/or class switching) during a remission period. For many lymphoid malignancies, a clonal B cell receptor sequence can be used as a biomarker for the malignant cells of the particular cancer (e.g., leukemia) and to monitor residual disease, tumor expansion, contraction, and/or treatment response. In certain embodiments, a clonal B cell receptor may be identified and further characterized to confirm a new utility in therapeutic, biomarker and/or diagnostic use.


In some embodiments, methods and compositions are provided for monitoring changes in BCR clonal populations in a subject, comprising performing one or more multiplex amplification reactions with a subject's sample to amplify immune repertoire nucleic acid template molecules having a J gene portion and a V gene portion using at least one set of primers directed to a majority of different V genes of at least one BCR coding sequence comprising at least a portion of FR1, FR2 or FR3 within the V gene, and a plurality of J gene primers directed to a majority of different J genes of the respective target BCR coding sequence, sequencing the resultant BCR amplicons, identifying immune repertoire clonal populations for the target BCR from the sample, and comparing the identified immune repertoire clonal populations to those identified in samples obtained from the subject at a different time. In various embodiments, the one or more multiplex amplification reactions performed in such methods may be a single multiplex amplification reaction or may be two or more multiplex amplification reactions performed in parallel, for example parallel, highly multiplexed amplification reactions performed with different primer pools. Samples for use in monitoring changes in BCR repertoire clonal populations include, without limitation, samples obtained prior to a diagnosis, samples obtained at any stage of diagnosis, samples obtained during a remission, samples obtained at any time prior to a treatment (pre-treatment sample), samples obtained at any time following completion of treatment (post-treatment sample), and samples obtained during the course of treatment.
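As a non-limiting illustration of the comparing step, the relative frequency of each clone may be computed for samples from two time points and each clone labeled as new, lost, expanded, contracted, or stable. The Python sketch below is an assumption-laden example rather than the method of the disclosure: the function names, the use of one clonotype label per read, and the two-fold threshold `min_fold` are all hypothetical choices.

```python
from collections import Counter


def clone_frequencies(clonotypes):
    """Map each clonotype label to its fraction of all reads in the sample."""
    counts = Counter(clonotypes)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()} if total else {}


def compare_timepoints(earlier, later, min_fold=2.0):
    """Label each clone by its change in relative frequency between samples.

    `earlier` and `later` are iterables with one clonotype label per read.
    `min_fold` is an illustrative threshold, not a value from the disclosure.
    """
    f1 = clone_frequencies(earlier)
    f2 = clone_frequencies(later)
    report = {}
    for clone in set(f1) | set(f2):
        a = f1.get(clone, 0.0)
        b = f2.get(clone, 0.0)
        if a == 0.0:
            status = "new"
        elif b == 0.0:
            status = "lost"
        elif b / a >= min_fold:
            status = "expanded"
        elif a / b >= min_fold:
            status = "contracted"
        else:
            status = "stable"
        report[clone] = (a, b, status)
    return report


# Toy usage with made-up clonotype labels.
pre = ["A"] * 8 + ["B"] * 2
post = ["A"] * 2 + ["B"] * 6 + ["C"] * 2
for clone, row in sorted(compare_timepoints(pre, post).items()):
    print(clone, row)
```

Relative frequencies are used here so that samples sequenced to different depths can be compared; a practical analysis would typically also account for sampling noise at low read counts.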


In certain embodiments, methods and compositions are provided for identifying and/or characterizing the BCR repertoire of a patient to monitor progression and/or treatment of the patient's hyperproliferative disease. In some embodiments, the methods and compositions provided are used for minimal residual disease (MRD) monitoring for a patient following treatment. In some embodiments, the methods and compositions provided allow for the deep sequencing of the patient BCR repertoire useful for MRD measurements and for identifying rare BCR clones. In some embodiments, monitoring MRD includes assessing somatic hypermutation of the BCR repertoire. In some embodiments, the methods and compositions are used to identify and/or track B cell lineage malignancies. In some embodiments, the methods and compositions are used to detect and/or monitor MRD in patients diagnosed with leukemia or lymphoma, including without limitation, acute lymphoblastic leukemia, chronic myeloid leukemia, chronic lymphocytic leukemia, chronic myelogenous leukemia, B cell lymphoma, mantle cell lymphoma, and multiple myeloma. In some embodiments, the methods and compositions are used to detect and/or monitor MRD in patients diagnosed with solid tumors, including without limitation, breast cancer, lung cancer, colorectal cancer, and neuroblastoma. In some embodiments, the methods and compositions are used to detect and/or monitor MRD in patients following cancer treatment including without limitation bone marrow transplant, lymphocyte infusion, adoptive cell therapy, other cell-based immunotherapy, and antibody-based immunotherapy.


In some embodiments, methods and compositions are provided for identifying and/or characterizing the BCR repertoire of a patient to monitor progression and/or treatment of the patient's hyperproliferative disease, comprising performing one or more multiplex amplification reactions with a sample from the patient or with cDNA prepared from the sample to amplify BCR nucleic acid template molecules having a constant portion and a variable portion using at least one set of primers comprising i) a plurality of V gene primers directed to a majority of different V genes of at least one BCR coding sequence comprising at least a portion of framework region 3 (FR3) within the V gene, and ii) one or more J gene primers directed to at least a portion of a respective target J gene of the BCR coding sequence, wherein each set of i) and ii) primers directed to the same target immune receptor sequences is selected from the group consisting of IgH, IgLkappa, and IgLlambda, thereby generating BCR amplicon molecules. The method further comprises sequencing the resulting BCR amplicon molecules, determining the sequences of the BCR amplicon molecules, and identifying immune repertoire for the target BCR from the sample. In particular embodiments, determining the sequence of the immune receptor amplicon molecules includes obtaining initial sequence reads, aligning the initial sequence reads to a reference sequence and identifying productive reads, correcting one or more indel errors to generate rescued productive sequence reads, and determining the sequences of the resulting immune receptor molecules.
In other embodiments of such methods and compositions, the multiplex amplification reaction is performed using at least one set of primers comprising i) a plurality of V gene primers directed to a majority of different V genes of at least one BCR coding sequence comprising at least a portion of FR2 within the V gene, and ii) one or more C gene primers directed to at least a portion of a respective target C gene of the BCR coding sequence, wherein each set of i) and ii) primers directed to the same target immune receptor sequences is selected from the group consisting of IgH, IgLkappa and IgLlambda.


In some embodiments, methods and compositions are provided for identifying and/or characterizing the BCR repertoire of a patient to monitor progression and/or treatment of the patient's hyperproliferative disease, comprising performing one or more multiplex amplification reactions with a sample from the patient or with cDNA prepared from the sample to amplify immune repertoire nucleic acid template molecules having a J gene portion and a V gene portion using at least one set of primers comprising i) a plurality of V gene primers directed to a majority of different V genes of at least one BCR coding sequence comprising at least a portion of framework region 3 (FR3) within the V gene, and ii) a plurality of J gene primers directed to a majority of different J genes of the respective target BCR coding sequence, wherein each set of i) and ii) primers directed to the same target immune receptor sequences is selected from the group consisting of IgH, IgLkappa, and IgLlambda, thereby generating BCR amplicon molecules. The method further comprises sequencing the resulting BCR amplicon molecules, determining the sequences of the BCR amplicon molecules, and identifying immune repertoire for the target BCR from the sample. In particular embodiments, determining the sequence of the immune receptor amplicon molecules includes obtaining initial sequence reads, adding the inferred J gene sequence to the sequence read to create an extended sequence read, aligning the extended sequence read to a reference sequence and identifying productive reads, correcting one or more indel errors to generate rescued productive sequence reads, and determining the sequences of the resulting immune receptor molecules.
In other embodiments of such methods and compositions, the multiplex amplification reaction is performed using at least one set of primers comprising i) a plurality of V gene primers directed to a majority of different V genes of at least one BCR coding sequence comprising at least a portion of FR1 within the V gene, and ii) a plurality of J gene primers directed to a majority of different J genes of the respective target BCR coding sequence, wherein each set of i) and ii) primers directed to the same target immune receptor sequences is selected from the group consisting of IgH, IgLkappa and IgLlambda. In other embodiments of such methods and compositions, the multiplex amplification reaction is performed using at least one set of primers comprising i) a plurality of V gene primers directed to a majority of different V genes of at least one BCR coding sequence comprising at least a portion of FR2 within the V gene, and ii) a plurality of J gene primers directed to a majority of different J genes of the respective target BCR coding sequence, wherein each set of i) and ii) primers directed to the same target immune receptor sequences is selected from the group consisting of IgH, IgLkappa and IgLlambda.
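Two of the read-processing steps recited above, adding an inferred J gene sequence to create an extended read and identifying productive reads, can be illustrated with a simplified Python sketch. The sketch is offered only as an example under stated assumptions: the function names are hypothetical, the J gene is assumed to be already inferred, the extension uses exact substring matching rather than alignment, "productive" is reduced to in-frame with no stop codon, and the alignment and indel-correction steps are not shown.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def extend_with_j(read: str, j_reference: str, min_overlap: int = 6) -> str:
    """Append the portion of the J reference lying 3' of where the read ends.

    Simplifying assumption: the longest read suffix (at least `min_overlap`
    bases) that occurs exactly in the J reference marks the overlap.
    If no such suffix is found, the read is returned unchanged.
    """
    longest = min(len(read), len(j_reference))
    for k in range(longest, min_overlap - 1, -1):
        pos = j_reference.find(read[-k:])
        if pos != -1:
            return read + j_reference[pos + k:]
    return read


def is_productive(seq: str, frame: int = 0) -> bool:
    """Toy productivity check: in-frame length and no stop codon.

    Assumes `seq[frame:]` begins on a codon boundary.
    """
    coding = seq[frame:].upper()
    if len(coding) % 3 != 0:
        return False
    codons = (coding[i:i + 3] for i in range(0, len(coding), 3))
    return not any(codon in STOP_CODONS for codon in codons)


# Toy usage with made-up sequences (not real V or J genes).
read = "GCTAGAGGATACTGGTTCGAC"   # read ends inside the J gene
j_ref = "AACTGGTTCGACCCCTGG"     # hypothetical J reference
extended = extend_with_j(read, j_ref)
print(extended, is_productive(extended))
```

Extending the read in this way lets a frame and stop-codon check span the full J region even though the J gene primer site truncates the sequenced amplicon.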


In some embodiments, methods and compositions are provided for MRD monitoring for a patient having a hyperproliferative disease, comprising performing one or more multiplex amplification reactions with a patient's sample to amplify immune repertoire nucleic acid template molecules having a J gene portion and a V gene portion using at least one set of primers directed to a majority of different V genes of at least one BCR coding sequence comprising at least a portion of FR1, FR2 or FR3 within the V gene, and a plurality of J gene primers directed to a majority of different J genes of the respective target BCR coding sequence, sequencing the resultant BCR amplicons, identifying immune repertoire sequences for the target BCR, and detecting the presence or absence of immune receptor sequence(s) in the sample associated with the hyperproliferative disease. In various embodiments, the one or more multiplex amplification reactions performed in such methods may be a single multiplex amplification reaction or may be two or more multiplex amplification reactions performed in parallel, for example parallel, highly multiplexed amplification reactions performed with different primer pools. Samples for use in MRD monitoring include, without limitation, samples obtained during a remission, samples obtained at any time following completion of treatment (post-treatment sample), and samples obtained during the course of treatment.
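For illustration only, detecting the presence or absence of disease-associated sequences can be sketched as a lookup of previously identified clonotypes among the clonotypes observed in a follow-up sample. In the Python sketch below, the function name `detect_mrd`, the representation of each read by a single clonotype label, and the rule that any matching read counts as detection are assumptions made for the example; a practical assay would apply a validated detection threshold.

```python
from collections import Counter


def detect_mrd(sample_clonotypes, disease_clonotypes):
    """Report read count, frequency, and detection flag for each
    disease-associated clonotype in a follow-up sample.

    Assumption: one clonotype label per read; any read counts as detected.
    """
    counts = Counter(sample_clonotypes)
    total = sum(counts.values())
    result = {}
    for clone in disease_clonotypes:
        n = counts.get(clone, 0)
        result[clone] = {
            "reads": n,
            "frequency": n / total if total else 0.0,
            "detected": n > 0,
        }
    return result


# Toy usage: clonotype "X" was identified at diagnosis; "Z" is absent.
follow_up = ["X"] * 3 + ["Y"] * 997
for clone, row in detect_mrd(follow_up, ["X", "Z"]).items():
    print(clone, row)
```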


In certain embodiments, methods and compositions are provided for identifying and/or characterizing the BCR repertoire of a subject in response to a treatment. In some embodiments, the methods and compositions are used to characterize and/or monitor populations or clones of tumor infiltrating lymphocytes (TILs) before, during, and/or following tumor treatment. In some embodiments, profiling immune receptor repertoires of TILs provides characterization and/or assessment of the tumor microenvironment. In some embodiments, the methods and compositions for determining immune repertoire are used to identify and/or track therapeutic B cell population(s). In some embodiments, the methods and compositions provided are used to identify and/or monitor the persistence of cell-based therapies following patient treatment, and/or immune reconstitution after allogeneic hematopoietic cell transplantation.


In some embodiments, the methods and compositions provided are used to characterize and/or monitor B cell clones or populations present in a patient sample following administration of cell-based therapies to the patient, including but not limited to, e.g., cancer vaccine cells, CAR-T, TIL, and/or other engineered cell-based therapy. In some embodiments, the provided methods and compositions are used to characterize and/or monitor BCR repertoire in a patient sample following cell-based therapies in order to assess and/or monitor the patient's response to the administered cell-based therapy. Samples for use in such characterizing and/or monitoring following cell-based therapy include, without limitation, circulating blood cells, circulating tumor cells, TILs, tissue, cfDNA, and tumor sample(s) from a patient.


In some embodiments, methods and compositions are provided for monitoring cell-based therapy for a patient receiving such therapy, comprising performing one or more multiplex amplification reactions with a patient's sample to amplify BCR repertoire nucleic acid template molecules having a J gene portion and a V gene portion using at least one set of primers directed to a majority of different V genes of at least one BCR coding sequence comprising at least a portion of FR1, FR2 or FR3 within the V gene, and a plurality of J gene primers directed to a majority of different J genes of the respective target BCR coding sequence, sequencing the resultant BCR amplicons, identifying immune repertoire sequences for the target BCR, and detecting the presence or absence of BCR sequence(s) in the sample associated with the cell-based therapy.


In some embodiments, methods and compositions are provided for monitoring a patient's response following administration of a cell-based therapy, comprising performing one or more multiplex amplification reactions with a patient's sample to amplify BCR repertoire nucleic acid template molecules having a J gene portion and a V gene portion using at least one set of primers directed to a majority of different V genes of at least one BCR coding sequence comprising at least a portion of FR1, FR2 or FR3 within the V gene, and a plurality of J gene primers directed to a majority of different J genes of the respective target BCR coding sequence, sequencing the resultant BCR amplicons, identifying immune repertoire sequences for the target BCR, and comparing the identified BCR repertoire to the immune receptor sequence(s) identified in samples obtained from the patient at a different time. Cell-based therapies suitable for such monitoring include, without limitation, TILs, and other enriched autologous cells. In various embodiments, the one or more multiplex amplification reactions performed in such methods may be a single multiplex amplification reaction or may be two or more multiplex amplification reactions performed in parallel, for example parallel, highly multiplexed amplification reactions performed with different primer pools. Samples for use in such monitoring include, without limitation, samples obtained prior to a diagnosis, samples obtained at any stage of diagnosis, samples obtained during a remission, samples obtained at any time prior to a treatment (pre-treatment sample), samples obtained at any time following completion of treatment (post-treatment sample), and samples obtained during the course of treatment.


In some embodiments, the methods and compositions for determining B cell receptor repertoires are used to measure and/or assess immunocompetence before, during, and/or following a treatment, including without limitation, solid organ transplant or bone marrow transplant.


In some embodiments, methods and compositions are provided for identifying and/or characterizing the BCR repertoire of a subject in response to a treatment, comprising obtaining a sample from the subject following initiation of a treatment, performing one or more multiplex amplification reactions with the sample or with cDNA prepared from the sample to amplify BCR nucleic acid template molecules having a J gene portion and a V gene portion using at least one set of primers comprising i) a plurality of V gene primers directed to a majority of different V genes of at least one BCR coding sequence comprising at least a portion of framework region 3 (FR3) within the V gene, and ii) a plurality of J gene primers directed to a majority of different J genes of the respective target BCR coding sequence, wherein each set of i) and ii) primers directed to the same target immune receptor sequences is selected from the group consisting of IgH, IgLkappa and IgLlambda, thereby generating BCR amplicon molecules. The method further comprises sequencing the resulting BCR amplicon molecules, determining the sequences of the BCR amplicon molecules, and identifying immune repertoire for the target BCR from the sample. In some embodiments, the method further comprises comparing the identified BCR repertoire from the sample obtained following treatment initiation to the BCR repertoire from a sample of the patient obtained prior to treatment. In particular embodiments, determining the sequence of the BCR amplicon molecules includes obtaining initial sequence reads, adding the inferred J gene sequence to the sequence read to create an extended sequence read, aligning the extended sequence read to a reference sequence and identifying productive reads, correcting one or more indel errors to generate rescued productive sequence reads, and determining the sequences of the resulting BCR molecules.
In other embodiments of such methods and compositions, the multiplex amplification reaction is performed using at least one set of primers comprising i) a plurality of V gene primers directed to a majority of different V genes of at least one BCR coding sequence comprising at least a portion of FR1 within the V gene, and ii) a plurality of J gene primers directed to a majority of different J genes of the respective target BCR coding sequence, wherein each set of i) and ii) primers directed to the same target immune receptor sequences is selected from the group consisting of IgH, IgLkappa and IgLlambda. In other embodiments of such methods and compositions, the multiplex amplification reaction is performed using at least one set of primers comprising i) a plurality of V gene primers directed to a majority of different V genes of at least one BCR coding sequence comprising at least a portion of FR2 within the V gene, and ii) a plurality of J gene primers directed to a majority of different J genes of the respective target BCR coding sequence, wherein each set of i) and ii) primers directed to the same target immune receptor sequences is selected from the group consisting of IgH, IgLkappa and IgLlambda.


In some embodiments, methods and compositions are provided for monitoring changes in the BCR repertoire of a subject in response to a treatment, comprising performing one or more multiplex amplification reactions with a subject's or patient's sample to amplify BCR nucleic acid template molecules having a J gene portion and a V gene portion using at least one set of primers directed to a majority of different V genes of at least one BCR coding sequence comprising at least a portion of FR1, FR2 or FR3 within the V gene, and a plurality of J gene primers directed to a majority of different J genes of the respective target BCR coding sequence, sequencing the resultant BCR amplicons, identifying immune repertoire sequences for the target BCR from the sample, and comparing the identified BCR repertoire to those identified in samples obtained from the subject at a different time. In various embodiments, the one or more multiplex amplification reactions performed in such methods may be a single multiplex amplification reaction or may be two or more multiplex amplification reactions performed in parallel, for example parallel, highly multiplexed amplification reactions performed with different primer pools. Samples for use in monitoring changes in BCR repertoire include, without limitation, samples obtained prior to a diagnosis, samples obtained at any stage of diagnosis, samples obtained during a remission, samples obtained at any time prior to a treatment (pre-treatment sample), samples obtained at any time following completion of treatment (post-treatment sample), and samples obtained during the course of treatment.


In certain embodiments, the methods and compositions provided are used to characterize and/or monitor BCR repertoires associated with immune system-mediated adverse event(s), including without limitation, those associated with inflammatory conditions, autoimmune reactions, and/or autoimmune diseases or disorders. In some embodiments, the methods and compositions provided are used to identify and/or monitor B cell, or B cell and T cell, immune repertoires associated with chronic autoimmune diseases or disorders including, without limitation, multiple sclerosis, Type I diabetes, narcolepsy, rheumatoid arthritis, ankylosing spondylitis, asthma, and SLE. In some embodiments, a systemic sample, such as a blood sample, is used to determine the immune repertoire(s) of an individual with an autoimmune condition. In some embodiments, a localized sample, such as a fluid sample from an affected joint or region of swelling, is used to determine the immune repertoire(s) of an individual with an autoimmune condition. In some embodiments, comparison of the immune repertoire found in a localized or affected area sample to the immune repertoire found in the systemic sample can identify clonal T or B cell populations to be targeted for removal.


In some embodiments, methods and compositions are provided for identifying and/or monitoring a BCR repertoire associated with progression and/or treatment of a patient's immune system-mediated adverse event(s), comprising performing one or more multiplex amplification reactions with a patient's sample to amplify BCR nucleic acid template molecules having a J gene portion and a V gene portion using at least one set of primers directed to a majority of different V genes of at least one BCR coding sequence comprising at least a portion of FR1, FR2 or FR3 within the V gene, and a plurality of J gene primers directed to a majority of different J genes of the respective target BCR coding sequence, sequencing the resultant BCR amplicons, identifying BCR sequences for the target immune receptor from the sample, and comparing the identified BCR repertoire to the BCR repertoire(s) identified in samples obtained from the patient at a different time. In various embodiments, the one or more multiplex amplification reactions performed in such methods may be a single multiplex amplification reaction or may be two or more multiplex amplification reactions performed in parallel, for example parallel, highly multiplexed amplification reactions performed with different primer pools. Samples for use in monitoring changes in immune repertoire associated with immune system-mediated adverse event(s) include, without limitation, samples obtained prior to a diagnosis, samples obtained at any stage of diagnosis, samples obtained during a remission, samples obtained at any time prior to a treatment (pre-treatment sample), samples obtained at any time following completion of treatment (post-treatment sample), and samples obtained during the course of treatment.


In some embodiments, the methods and compositions provided are used to characterize and/or monitor immune repertoires associated with passive immunity, including naturally acquired passive immunity and artificially acquired passive immunity therapies. For example, the methods and compositions provided may be used to identify and/or monitor protective antibodies that provide passive immunity to the recipient following transfer of antibody-mediated immunity to the recipient, including without limitation, antibody-mediated immunity conveyed from a mother to a fetus during pregnancy or to an infant through breast-feeding, or conveyed via administration of antibodies to a recipient. In another example, the methods and compositions provided may be used to identify and/or monitor B cell and/or T cell immune repertoires associated with passive transfer of cell-mediated immunity to a recipient, such as the administration of mature circulating lymphocytes to a recipient histocompatible with the donor. In some embodiments, the methods and compositions provided are used to monitor the duration of passive immunity in a recipient.


In some embodiments, the methods and compositions provided are used to characterize and/or monitor immune repertoires associated with active immunity or vaccination therapies. For example, following exposure to a vaccine or infectious agent, the methods and compositions provided may be used to identify and/or monitor protective antibodies or protective clonal B cell populations, or clonal B cell and T cell populations, that may provide active immunity to the exposed individual. In some embodiments, the methods and compositions provided are used to monitor the duration of B cell clones, or B cell and T cell clones, which contribute to immunity in an exposed individual. In some embodiments, the methods and compositions provided are used to identify and/or monitor B cell and/or T cell immune repertoires associated with exposure to bacterial, fungal, parasitic, or viral antigens. In some embodiments, the methods and compositions provided are used to identify and/or monitor B cell and/or T cell immune repertoires associated with bacterial, fungal, parasitic, or viral infection. Accordingly, in some embodiments, methods and compositions provided are for use in vaccine development, including without limitation identifying and/or characterizing one or more responses to a vaccine candidate, and assessing one or more responses to a vaccine for quality or regulatory purposes.


In some embodiments, methods and compositions are provided for monitoring changes in the BCR repertoire following exposure to a vaccine or infectious agent, comprising performing one or more multiplex amplification reactions with an exposed subject's sample to amplify BCR repertoire nucleic acid template molecules having a J gene portion and a V gene portion using at least one set of primers directed to a majority of different V genes of at least one BCR coding sequence comprising at least a portion of FR1, FR2 or FR3 within the V gene, and a plurality of J gene primers directed to a majority of different J genes of the respective target BCR coding sequence, sequencing the resultant BCR amplicons, identifying BCR sequences for the target immune receptor from the sample, and comparing the identified BCR repertoire to the BCR repertoire(s) identified in samples obtained from the patient at a different time. Accordingly, methods and compositions may be used to monitor changes in B cell repertoire (including isotype class switching) and assess a subject's response to vaccine exposure.


In some embodiments, the methods and compositions provided are used to screen or characterize lymphocyte populations which are grown and/or activated in vitro for use as immunotherapeutic agents or in immunotherapeutic-based regimens. In some embodiments, the methods and compositions provided are used to screen or characterize TIL populations or other harvested B cell populations which are grown and/or activated in vitro. In some embodiments, determining the IgH sequence of a BCR facilitates identification and production of antigen-specific B cells. In some embodiments, the methods and compositions provided are used to screen or characterize engineered B cell populations which are grown and/or activated in vitro, for use, for example, in immunotherapy or antibody production. In some embodiments, the methods and compositions provided are used to assess cell populations by monitoring BCR repertoires during ex vivo workflows for manufacturing engineered cell preparations, for example, for quality control or regulatory testing purposes.


In some embodiments, the sequences of novel or non-canonical BCR alleles identified as described herein may be used to generate recombinant BCR nucleic acids or molecules. In some embodiments, the methods and compositions provided are used in the screening and/or production of recombinant antibody libraries. Compositions provided which are directed to identifying BCRs can be used to rapidly evaluate recombinant antibody library size and composition to identify antibodies of interest.


In some embodiments, profiling immune receptor repertoires as provided herein may be combined with profiling immune response gene expression to provide characterization of the tumor microenvironment. In some embodiments, combining or correlating a tumor sample's BCR repertoire profile with a targeted immune response gene expression profile provides a more thorough analysis of the tumor microenvironment and may suggest or provide guidance for immunotherapy treatments.


Suitable cells for analysis include, without limitation, various hematopoietic cells, lymphocytes, and tumor cells, such as peripheral blood mononuclear cells (PBMCs), B cells, circulating tumor cells, and tumor infiltrating lymphocytes (TILs). Lymphocytes expressing immunoglobulin include pre-B cells, B-cells, e.g. memory B cells, and plasma cells. For example, in some embodiments, a sample comprising PBMCs may be used as a source for antibody immune repertoire analysis. The sample may contain, for example, lymphocytes, monocytes, and macrophages as well as antibodies and other biological constituents.


Analysis of the BCR repertoire is of interest for conditions involving cellular proliferation and antigenic exposure, including without limitation, the presence of cancer, exposure to cancer antigens, exposure to antigens from an infectious agent, exposure to vaccines, exposure to allergens, exposure to food stuffs, presence of a graft or transplant, and the presence of autoimmune activity or disease. Conditions associated with immunodeficiency are also of interest for analysis, including congenital and acquired immunodeficiency syndromes.


B cell lineage malignancies of interest include, without limitation, multiple myeloma; acute lymphocytic leukemia (ALL); relapsed/refractory B cell ALL; chronic lymphocytic leukemia (CLL); diffuse large B cell lymphoma; mucosa-associated lymphatic tissue lymphoma (MALT); small cell lymphocytic lymphoma; mantle cell lymphoma (MCL); Burkitt lymphoma; mediastinal large B cell lymphoma; Waldenström macroglobulinemia; nodal marginal zone B cell lymphoma (NMZL); splenic marginal zone lymphoma (SMZL); intravascular large B-cell lymphoma; primary effusion lymphoma; lymphomatoid granulomatosis, etc. Non-malignant B cell hyperproliferative conditions include monoclonal B cell lymphocytosis (MBL).


Other malignancies of interest include, without limitation, acute myeloid leukemia, head and neck cancers, brain cancer, breast cancer, ovarian cancer, cervical cancer, colorectal cancer, endometrial cancer, gallbladder cancer, gastric cancer, bladder cancer, prostate cancer, testicular cancer, liver cancer, lung cancer, kidney (renal cell) cancer, esophageal cancer, pancreatic cancer, thyroid cancer, bile duct cancer, pituitary tumor, Wilms tumor, Kaposi sarcoma, osteosarcoma, thymus cancer, skin cancer, heart cancer, oral and larynx cancer, neuroblastoma and non-Hodgkin lymphoma.


Neurological inflammatory conditions are of interest, e.g., Alzheimer's Disease, Parkinson's Disease, Lou Gehrig's Disease, etc., as are demyelinating diseases, such as multiple sclerosis, chronic inflammatory demyelinating polyneuropathy, etc., and inflammatory conditions such as rheumatoid arthritis. Systemic lupus erythematosus (SLE) is an autoimmune disease characterized by polyclonal B cell activation, which results in a variety of anti-protein and non-protein autoantibodies (see Kotzin et al. (1996) Cell 85:303-306). These autoantibodies form immune complexes that deposit in multiple organ systems, causing tissue damage. An autoimmune component may be ascribed to atherosclerosis, where candidate autoantigens include Hsp60, oxidized LDL, and β2-Glycoprotein I (β2GPI).


A sample for use in the methods described herein may be one that is collected from a subject with a malignancy or hyperproliferative condition, including lymphomas, leukemias, and plasmacytomas. A lymphoma is a solid neoplasm of lymphocyte origin, and is most often found in the lymphoid tissue. Thus, for example, a biopsy from a lymph node, e.g. a tonsil, containing such a lymphoma would constitute a suitable biopsy. Samples may be obtained from a subject or patient at one or a plurality of time points in the progression of disease and/or treatment of the disease.


In some embodiments, the disclosure provides methods for performing target-specific multiplex PCR on a cDNA sample having a plurality of expressed immune receptor target sequences using primers having a cleavable group.


In certain embodiments, libraries and/or templates to be sequenced are prepared automatically from a population of nucleic acid samples using the compositions provided herein and an automated system, e.g., the Ion Chef™ system.


As used herein, the term “subject” includes a person, a patient, an individual, someone being evaluated, etc.


As used herein, the terms “comprises,” “comprising,” “includes,” “including,” “has,” “having” or any other variation thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, article, or apparatus that comprises a list of features is not necessarily limited only to those features but may include other features not expressly listed or inherent to such process, method, article, or apparatus. Further, unless expressly stated to the contrary, “or” refers to an inclusive-or and not to an exclusive-or.


As used herein, “antigen” refers to any substance that, when introduced into a body, e.g., of a subject, can stimulate an immune response, such as the production of an antibody that recognizes the antigen. Antigens include molecules such as nucleic acids, lipids, ribonucleoprotein complexes, protein complexes, proteins, polypeptides, peptides and naturally occurring or synthetic modifications of such molecules against which an immune response involving T and/or B lymphocytes can be generated. With regard to autoimmune disease, the antigens herein are often referred to as autoantigens. With regard to allergic disease the antigens herein are often referred to as allergens. Autoantigens are any molecule produced by the organism that can be the target of an immunologic response, including peptides, polypeptides, and proteins encoded within the genome of the organism and post-translationally-generated modifications of these peptides, polypeptides, and proteins. Such molecules also include carbohydrates, lipids and other molecules produced by the organism. Antigens also include vaccine antigens, which include, without limitation, pathogen antigens, cancer associated antigens, allergens, and the like.


As used herein, “amplify”, “amplifying” or “amplification reaction” and their derivatives, refer to any action or process whereby at least a portion of a nucleic acid molecule (referred to as a template nucleic acid molecule) is replicated or copied into at least one additional nucleic acid molecule. The additional nucleic acid molecule optionally includes a sequence that is substantially identical or substantially complementary to at least some portion of the template nucleic acid molecule. The template nucleic acid molecule can be single-stranded or double-stranded and the additional nucleic acid molecule can independently be single-stranded or double-stranded. In some embodiments, amplification includes a template-dependent in vitro enzyme-catalyzed reaction for the production of at least one copy of at least some portion of the nucleic acid molecule or the production of at least one copy of a nucleic acid sequence that is complementary to at least some portion of the nucleic acid molecule. Amplification optionally includes linear or exponential replication of a nucleic acid molecule. In some embodiments, such amplification is performed using isothermal conditions; in other embodiments, such amplification can include thermocycling. In some embodiments, the amplification is a multiplex amplification that includes the simultaneous amplification of a plurality of target sequences in a single amplification reaction. At least some of the target sequences can be situated on the same nucleic acid molecule or on different target nucleic acid molecules included in the single amplification reaction. In some embodiments, “amplification” includes amplification of at least some portion of DNA- and RNA-based nucleic acids alone, or in combination. The amplification reaction can include single or double-stranded nucleic acid substrates and can further include any of the amplification processes known to one of ordinary skill in the art.
In some embodiments, the amplification reaction includes PCR.


As used herein, “amplification conditions” and its derivatives, refers to conditions suitable for amplifying one or more nucleic acid sequences. Such amplification can be linear or exponential. In some embodiments, the amplification conditions can include isothermal conditions or alternatively can include thermocycling conditions, or a combination of isothermal and thermocycling conditions. In some embodiments, the conditions suitable for amplifying one or more nucleic acid sequences include PCR conditions. Typically, the amplification conditions refer to a reaction mixture that is sufficient to amplify nucleic acids such as one or more target sequences, or to amplify an amplified target sequence ligated to one or more adapters, e.g., an adapter-ligated amplified target sequence. Amplification conditions include a catalyst for amplification or for nucleic acid synthesis, for example a polymerase; a primer that possesses some degree of complementarity to the nucleic acid to be amplified; and nucleotides, such as deoxyribonucleotide triphosphates (dNTPs) to promote extension of the primer once hybridized to the nucleic acid. The amplification conditions can require hybridization or annealing of a primer to a nucleic acid, extension of the primer and a denaturing step in which the extended primer is separated from the nucleic acid sequence undergoing amplification. Typically, but not necessarily, amplification conditions can include thermocycling; in some embodiments, amplification conditions include a plurality of cycles where the steps of annealing, extending and separating are repeated. Typically, the amplification conditions include cations such as Mg2+ or Mn2+ (e.g., MgCl2, etc.) and can also include various modifiers of ionic strength.
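By way of non-limiting illustration, the difference between linear and exponential amplification over a plurality of cycles can be sketched as follows. The sketch assumes an idealized model that is not taken from the disclosure: a fixed per-cycle efficiency between 0 and 1, with exponential amplification copying every molecule present in each cycle and linear amplification copying only the original templates. The function names and parameters are hypothetical.

```python
def _validate(initial_copies: float, cycles: int, efficiency: float) -> None:
    """Reject inputs that make no sense for the idealized model."""
    if initial_copies < 0 or cycles < 0:
        raise ValueError("initial_copies and cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")


def exponential_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Copies after `cycles` rounds when every molecule present is copied each round."""
    _validate(initial_copies, cycles, efficiency)
    return initial_copies * (1.0 + efficiency) ** cycles


def linear_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Copies after `cycles` rounds when only the original templates are copied."""
    _validate(initial_copies, cycles, efficiency)
    return initial_copies * (1.0 + efficiency * cycles)


if __name__ == "__main__":
    # Compare the two regimes for an assumed 100 starting templates.
    for n in (1, 10, 20):
        print(n, exponential_copies(100, n), linear_copies(100, n))
```

Under these assumptions the two regimes agree after a single cycle and diverge rapidly thereafter; actual yields depend on the reaction mixture and plateau effects that this model ignores.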


As used herein, “target sequence” or “target sequence of interest” and its derivatives, refers to any single or double-stranded nucleic acid sequence that can be amplified or synthesized according to the disclosure, including any nucleic acid sequence suspected or expected to be present in a sample. In some embodiments, the target sequence is present in double-stranded form and includes at least a portion of the particular nucleotide sequence to be amplified or synthesized, or its complement, prior to the addition of target-specific primers or appended adapters. Target sequences can include the nucleic acids to which primers useful in the amplification or synthesis reaction can hybridize prior to extension by a polymerase. In some embodiments, the term refers to a nucleic acid sequence whose sequence identity, ordering or location of nucleotides is determined by one or more of the methods of the disclosure.


As defined herein, “sample” and its derivatives, is used in its broadest sense and includes any specimen, culture and the like that is suspected of including a target. In some embodiments, the sample comprises cDNA, RNA, PNA, LNA, chimeric, hybrid, or multiplex-forms of nucleic acids. The sample can include any biological, clinical, surgical, agricultural, atmospheric or aquatic-based specimen containing one or more nucleic acids. The term also includes any isolated nucleic acid sample such as expressed RNA, fresh-frozen or formalin-fixed paraffin-embedded nucleic acid specimen.


As used herein, “contacting” and its derivatives, when used in reference to two or more components, refers to any process whereby the approach, proximity, mixture or commingling of the referenced components is promoted or achieved without necessarily requiring physical contact of such components, and includes mixing of solutions containing any one or more of the referenced components with each other. The referenced components may be contacted in any particular order or combination and the particular order of recitation of components is not limiting. For example, “contacting A with B and C” encompasses embodiments where A is first contacted with B then C, as well as embodiments where C is contacted with A then B, as well as embodiments where a mixture of A and C is contacted with B, and the like. Furthermore, such contacting does not necessarily require that the end result of the contacting process be a mixture including all of the referenced components, as long as at some point during the contacting process all of the referenced components are simultaneously present or simultaneously included in the same mixture or solution. Where one or more of the referenced components to be contacted includes a plurality (e.g., “contacting a target sequence with a plurality of target-specific primers and a polymerase”), then each member of the plurality can be viewed as an individual component of the contacting process, such that the contacting can include contacting of any one or more members of the plurality with any other member of the plurality and/or with any other referenced component (e.g., some but not all of the plurality of target specific primers can be contacted with a target sequence, then a polymerase, and then with other members of the plurality of target-specific primers) in any order or combination.


As used herein, the term “primer” and its derivatives refer to any polynucleotide that can hybridize to a target sequence of interest. In some embodiments, the primer can also serve to prime nucleic acid synthesis. Typically, the primer functions as a substrate onto which nucleotides can be polymerized by a polymerase; in some embodiments, however, the primer can become incorporated into the synthesized nucleic acid strand and provide a site to which another primer can hybridize to prime synthesis of a new strand that is complementary to the synthesized nucleic acid molecule. The primer may be comprised of any combination of nucleotides or analogs thereof, which may be optionally linked to form a linear polymer of any suitable length. In some embodiments, the primer is a single-stranded oligonucleotide or polynucleotide. (For purposes of this disclosure, the terms “polynucleotide” and “oligonucleotide” are used interchangeably herein and do not necessarily indicate any difference in length between the two.) In some embodiments, the primer is single-stranded but it can also be double-stranded. The primer optionally occurs naturally, as in a purified restriction digest, or can be produced synthetically. In some embodiments, the primer acts as a point of initiation for amplification or synthesis when exposed to amplification or synthesis conditions; such amplification or synthesis can occur in a template-dependent fashion and optionally results in formation of a primer extension product that is complementary to at least a portion of the target sequence. Exemplary amplification or synthesis conditions can include contacting the primer with a polynucleotide template (e.g., a template including a target sequence), nucleotides and an inducing agent such as a polymerase at a suitable temperature and pH to induce polymerization of nucleotides onto an end of the target-specific primer.
If double-stranded, the primer can optionally be treated to separate its strands before being used to prepare primer extension products. In some embodiments, the primer is an oligodeoxyribonucleotide or an oligoribonucleotide. In some embodiments, the primer can include one or more nucleotide analogs. The exact length and/or composition, including sequence, of the target-specific primer can influence many properties, including melting temperature (Tm), GC content, formation of secondary structures, repeat nucleotide motifs, length of predicted primer extension products, extent of coverage across a nucleic acid molecule of interest, number of primers present in a single amplification or synthesis reaction, presence of nucleotide analogs or modified nucleotides within the primers, and the like. In some embodiments, a primer can be paired with a compatible primer within an amplification or synthesis reaction to form a primer pair consisting of a forward primer and a reverse primer. In some embodiments, the forward primer of the primer pair includes a sequence that is substantially complementary to at least a portion of a strand of a nucleic acid molecule, and the reverse primer of the primer pair includes a sequence that is substantially identical to at least a portion of the strand. In some embodiments, the forward primer and the reverse primer are capable of hybridizing to opposite strands of a nucleic acid duplex. Optionally, the forward primer primes synthesis of a first nucleic acid strand, and the reverse primer primes synthesis of a second nucleic acid strand, wherein the first and second strands are substantially complementary to each other, or can hybridize to form a double-stranded nucleic acid molecule. In some embodiments, one end of an amplification or synthesis product is defined by the forward primer and the other end of the amplification or synthesis product is defined by the reverse primer.
In some embodiments, where the amplification or synthesis of lengthy primer extension products is required, such as amplifying an exon, coding region, or gene, several primer pairs can be created that span the desired length to enable sufficient amplification of the region. In some embodiments, a primer can include one or more cleavable groups. In some embodiments, primer lengths are in the range of about 10 to about 60 nucleotides, about 12 to about 50 nucleotides, or about 15 to about 40 nucleotides in length. Typically, a primer is capable of hybridizing to a corresponding target sequence and undergoing primer extension when exposed to amplification conditions in the presence of dNTPs and a polymerase. In some embodiments, the primer includes one or more cleavable groups at one or more locations within the primer.
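By way of non-limiting illustration, several of the primer properties mentioned above (length, GC content, and melting temperature) can be estimated from sequence alone. The sketch below assumes unmodified DNA primers and uses the Wallace rule of thumb, Tm = 2(A+T) + 4(G+C) in degrees Celsius, which is a rough approximation intended for short oligonucleotides and is not a method stated in the disclosure. The function names are hypothetical; the default length bounds follow the "about 15 to about 40 nucleotides" range recited above.

```python
def gc_content(primer: str) -> float:
    """Percentage of G and C bases in the primer."""
    seq = primer.upper()
    if not seq:
        raise ValueError("primer must be non-empty")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in degrees Celsius (Wallace rule, an assumption)."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def length_in_range(primer: str, minimum: int = 15, maximum: int = 40) -> bool:
    """True when the primer length falls within the inclusive range."""
    return minimum <= len(primer) <= maximum


if __name__ == "__main__":
    example = "ACGTACGTACGTACGTACGT"  # arbitrary sequence, for illustration only
    print(len(example), gc_content(example), wallace_tm(example), length_in_range(example))
```

Nearest-neighbor thermodynamic models generally give better Tm estimates for longer primers or primers containing analogs or cleavable groups; the rule of thumb is used here only to keep the example self-contained.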


As used herein, “target-specific primer” and its derivatives, refers to a single stranded or double-stranded polynucleotide, typically an oligonucleotide, that includes at least one sequence that is at least 50% complementary, typically at least 75% complementary or at least 85% complementary, more typically at least 90% complementary, more typically at least 95% complementary, more typically at least 98% or at least 99% complementary, or identical, to at least a portion of a nucleic acid molecule that includes a target sequence. In such instances, the target-specific primer and target sequence are described as “corresponding” to each other. In some embodiments, the target-specific primer is capable of hybridizing to at least a portion of its corresponding target sequence (or to a complement of the target sequence); such hybridization can optionally be performed under standard hybridization conditions or under stringent hybridization conditions. In some embodiments, the target-specific primer is not capable of hybridizing to the target sequence, or to its complement, but is capable of hybridizing to a portion of a nucleic acid strand including the target sequence, or to its complement. In some embodiments, the target-specific primer includes at least one sequence that is at least 75% complementary, typically at least 85% complementary, more typically at least 90% complementary, more typically at least 95% complementary, more typically at least 98% complementary, or more typically at least 99% complementary, to at least a portion of the target sequence itself; in other embodiments, the target-specific primer includes at least one sequence that is at least 75% complementary, typically at least 85% complementary, more typically at least 90% complementary, more typically at least 95% complementary, more typically at least 98% complementary, or more typically at least 99% complementary, to at least a portion of the nucleic acid molecule other than the target sequence. 
In some embodiments, the target-specific primer is substantially non-complementary to other target sequences present in the sample; optionally, the target-specific primer is substantially non-complementary to other nucleic acid molecules present in the sample. In some embodiments, nucleic acid molecules present in the sample that do not include or correspond to a target sequence (or to a complement of the target sequence) are referred to as “non-specific” sequences or “non-specific nucleic acids”. In some embodiments, the target-specific primer is designed to include a nucleotide sequence that is substantially complementary to at least a portion of its corresponding target sequence. In some embodiments, a target-specific primer is at least 95% complementary, or at least 99% complementary, or identical, across its entire length to at least a portion of a nucleic acid molecule that includes its corresponding target sequence. In some embodiments, a target-specific primer is at least 90%, at least 95% complementary, at least 98% complementary or at least 99% complementary, or identical, across its entire length to at least a portion of its corresponding target sequence. In some embodiments, a forward target-specific primer and a reverse target-specific primer define a target-specific primer pair that is used to amplify the target sequence via template-dependent primer extension. Typically, each primer of a target-specific primer pair includes at least one sequence that is substantially complementary to at least a portion of a nucleic acid molecule including a corresponding target sequence but that is less than 50% complementary to at least one other target sequence in the sample.
In some embodiments, amplification is performed using multiple target-specific primer pairs in a single amplification reaction, wherein each primer pair includes a forward target-specific primer and a reverse target-specific primer, each including at least one sequence that is substantially complementary or substantially identical to a corresponding target sequence in the sample, and each primer pair having a different corresponding target sequence. In some embodiments, the target-specific primer is substantially non-complementary at its 3′ end or its 5′ end to any other target-specific primer present in an amplification reaction. In some embodiments, the target-specific primer can include minimal cross-hybridization to other target-specific primers in the amplification reaction. In some embodiments, target-specific primers include minimal cross-hybridization to non-specific sequences in the amplification reaction mixture. In some embodiments, the target-specific primers include minimal self-complementarity. In some embodiments, the target-specific primers can include one or more cleavable groups located at the 3′ end. In some embodiments, the target-specific primers can include one or more cleavable groups located near or about a central nucleotide of the target-specific primer. In some embodiments, one or more target-specific primers include only non-cleavable nucleotides at the 5′ end of the target-specific primer. In some embodiments, a target-specific primer includes minimal nucleotide sequence overlap at the 3′ end or the 5′ end of the primer as compared to one or more different target-specific primers, optionally in the same amplification reaction. In some embodiments, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more target-specific primers in a single reaction mixture include one or more of the above embodiments. In some embodiments, substantially all of the plurality of target-specific primers in a single reaction mixture include one or more of the above embodiments.
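By way of non-limiting illustration, one way to screen a multiplex panel for primers that are complementary to one another at their 3′ ends can be sketched as follows. The sketch is a simplification under stated assumptions and is not the design procedure of the disclosure: it considers only unmodified DNA bases, compares only the terminal `tail` nucleotides of each primer in a single ungapped antiparallel register, and flags pairs using an arbitrary, hypothetical threshold. All names and sequences are illustrative.

```python
from itertools import combinations

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def three_prime_complementarity(primer_a: str, primer_b: str, tail: int = 5) -> float:
    """Percent of positions at which the 3' tails of two primers can base pair."""
    if len(primer_a) < tail or len(primer_b) < tail:
        raise ValueError("both primers must be at least `tail` nucleotides long")
    tail_a = primer_a.upper()[-tail:]
    # The 3' tail of primer_b pairs with tail_a when tail_a equals its reverse complement.
    tail_b_rc = reverse_complement(primer_b[-tail:])
    matches = sum(x == y for x, y in zip(tail_a, tail_b_rc))
    return 100.0 * matches / tail


def flag_pairs(primers: dict, tail: int = 5, threshold: float = 80.0) -> list:
    """Names of primer pairs whose 3' tails meet or exceed the (assumed) threshold."""
    flagged = []
    for (name_a, seq_a), (name_b, seq_b) in combinations(primers.items(), 2):
        if three_prime_complementarity(seq_a, seq_b, tail) >= threshold:
            flagged.append((name_a, name_b))
    return flagged


if __name__ == "__main__":
    panel = {"P1": "ACGTTGCAAGGCT", "P2": "TTGACCAAGCCT", "P3": "GGATCCGTTAAAA"}
    print(flag_pairs(panel))
```

A fuller screen would also slide the two primers against each other, examine the 5′ ends and self-complementarity, and account for cleavable groups; those steps are omitted here for brevity.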


As used herein, “polymerase” and its derivatives, refers to any enzyme that can catalyze the polymerization of nucleotides (including analogs thereof) into a nucleic acid strand. Typically but not necessarily, such nucleotide polymerization can occur in a template-dependent fashion. Such polymerases can include without limitation naturally occurring polymerases and any subunits and truncations thereof, mutant polymerases, variant polymerases, recombinant, fusion or otherwise engineered polymerases, chemically modified polymerases, synthetic molecules or assemblies, and any analogs, derivatives or fragments thereof that retain the ability to catalyze such polymerization. Optionally, the polymerase is a mutant polymerase comprising one or more mutations involving the replacement of one or more amino acids with other amino acids, the insertion or deletion of one or more amino acids from the polymerase, or the linkage of parts of two or more polymerases. Typically, the polymerase comprises one or more active sites at which nucleotide binding and/or catalysis of nucleotide polymerization can occur. Some exemplary polymerases include without limitation DNA polymerases and RNA polymerases. The term “polymerase” and its variants, as used herein, also refers to fusion proteins comprising at least two portions linked to each other, where the first portion comprises a peptide that can catalyze the polymerization of nucleotides into a nucleic acid strand and is linked to a second portion that comprises a second polypeptide. In some embodiments, the second polypeptide can include a reporter enzyme or a processivity-enhancing domain. Optionally, the polymerase can possess 5′ exonuclease activity or terminal transferase activity. In some embodiments, the polymerase is optionally reactivated, for example through the use of heat, chemicals or re-addition of new amounts of polymerase into a reaction mixture. 
In some embodiments, the polymerase can include a hot-start polymerase or an aptamer-based polymerase that optionally is reactivated.


As used herein, the term “nucleotide” and its variants comprises any compound, including without limitation any naturally occurring nucleotide or analog thereof, which can bind selectively to, or is polymerized by, a polymerase. Typically, but not necessarily, selective binding of the nucleotide to the polymerase is followed by polymerization of the nucleotide into a nucleic acid strand by the polymerase; occasionally however the nucleotide may dissociate from the polymerase without becoming incorporated into the nucleic acid strand. Such nucleotides include not only naturally occurring nucleotides but also any analogs, regardless of their structure, that can bind selectively to, or can be polymerized by, a polymerase. While naturally occurring nucleotides typically comprise base, sugar and phosphate moieties, the nucleotides of the present disclosure can include compounds lacking any one, some or all of such moieties. In some embodiments, the nucleotide can optionally include a chain of phosphorus atoms comprising three, four, five, six, seven, eight, nine, ten or more phosphorus atoms. In some embodiments, the phosphorus chain is attached to any carbon of a sugar ring, such as the 5′ carbon. The phosphorus chain can be linked to the sugar with an intervening O or S. In one embodiment, one or more phosphorus atoms in the chain can be part of a phosphate group having P and O. In another embodiment, the phosphorus atoms in the chain are linked together with intervening O, NH, S, methylene, substituted methylene, ethylene, substituted ethylene, CNH2, C(O), C(CH2), CH2CH2, or C(OH)CH2R (where R can be a 4-pyridine or 1-imidazole). In one embodiment, the phosphorus atoms in the chain have side groups having O, BH3, or S. In the phosphorus chain, a phosphorus atom with a side group other than O can be a substituted phosphate group. In the phosphorus chain, phosphorus atoms with an intervening atom other than O can be a substituted phosphate group.
Some examples of nucleotide analogs are described in U.S. Pat. No. 7,405,281. In some embodiments, the nucleotide comprises a label and is referred to herein as a “labeled nucleotide”; the label of the labeled nucleotide is referred to herein as a “nucleotide label.” In some embodiments, the label is in the form of a fluorescent dye attached to the terminal phosphate group, i.e., the phosphate group most distal from the sugar. Some examples of nucleotides that can be used in the disclosed methods and compositions include, but are not limited to, ribonucleotides, deoxyribonucleotides, modified ribonucleotides, modified deoxyribonucleotides, ribonucleotide polyphosphates, deoxyribonucleotide polyphosphates, modified ribonucleotide polyphosphates, modified deoxyribonucleotide polyphosphates, peptide nucleotides, modified peptide nucleotides, metallonucleosides, phosphonate nucleosides, and modified phosphate-sugar backbone nucleotides, analogs, derivatives, or variants of the foregoing compounds, and the like. In some embodiments, the nucleotide can comprise non-oxygen moieties such as, for example, thio- or borano-moieties, in place of the oxygen moiety bridging the alpha phosphate and the sugar of the nucleotide, or the alpha and beta phosphates of the nucleotide, or the beta and gamma phosphates of the nucleotide, or between any other two phosphates of the nucleotide, or any combination thereof. “Nucleotide 5′-triphosphate” refers to a nucleotide with a triphosphate ester group at the 5′ position, and is sometimes denoted as “NTP”, or “dNTP” and “ddNTP” to particularly point out the structural features of the ribose sugar. The triphosphate ester group can include sulfur substitutions for the various oxygens, e.g., alpha-thio-nucleotide 5′-triphosphates. For a review of nucleic acid chemistry, see: Shabarova, Z. and Bogdanov, A. Advanced Organic Chemistry of Nucleic Acids, VCH, New York, 1994.


The term “extension” and its variants, as used herein, when used in reference to a given primer, comprises any in vivo or in vitro enzymatic activity characteristic of a given polymerase that relates to polymerization of one or more nucleotides onto an end of an existing nucleic acid molecule. Typically but not necessarily such primer extension occurs in a template-dependent fashion; during template-dependent extension, the order and selection of bases is driven by established base pairing rules, which can include Watson-Crick type base pairing rules or alternatively (and especially in the case of extension reactions involving nucleotide analogs) by some other type of base pairing paradigm. In one non-limiting example, extension occurs via polymerization of nucleotides on the 3′OH end of the nucleic acid molecule by the polymerase.


The term “portion” and its variants, as used herein, when used in reference to a given nucleic acid molecule, for example a primer or a template nucleic acid molecule, comprises any number of contiguous nucleotides within the length of the nucleic acid molecule, including the partial or entire length of the nucleic acid molecule.


The terms “identity” and “identical” and their variants, as used herein, when used in reference to two or more nucleic acid sequences, refer to similarity in sequence of the two or more sequences (e.g., nucleotide or polypeptide sequences). In the context of two or more homologous sequences, the percent identity or homology of the sequences or subsequences thereof indicates the percentage of all monomeric units (e.g., nucleotides or amino acids) that are the same (i.e., about 70% identity, preferably 75%, 80%, 85%, 90%, 95%, 98% or 99% identity). The percent identity can be over a specified region, when compared and aligned for maximum correspondence over a comparison window, or designated region as measured using the BLAST or BLAST 2.0 sequence comparison algorithm with default parameters described below, or by manual alignment and visual inspection. Sequences are said to be “substantially identical” when there is at least 85% identity at the amino acid level or at the nucleotide level. Preferably, the identity exists over a region that is at least about 25, 50, or 100 residues in length, or across the entire length of at least one compared sequence. Typical algorithms for determining percent sequence identity and sequence similarity are the BLAST and BLAST 2.0 algorithms, which are described in Altschul et al., Nuc. Acids Res. 25:3389-3402 (1997). Other methods include the algorithms of Smith & Waterman, Adv. Appl. Math. 2:482 (1981), and Needleman & Wunsch, J. Mol. Biol. 48:443 (1970), etc. Another indication that two nucleic acid sequences are substantially identical is that the two molecules or their complements hybridize to each other under stringent hybridization conditions.
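By way of non-limiting illustration, percent identity over a designated region can be computed as the fraction of aligned positions that are the same. The sketch below assumes the two sequences have already been aligned to equal length without gaps; it does not perform the alignment itself and is not a substitute for BLAST, Smith-Waterman, or Needleman-Wunsch. The function names are hypothetical, and the 85% default reflects the “substantially identical” threshold recited above.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of positions that match between two pre-aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    matches = sum(1 for x, y in zip(seq_a.upper(), seq_b.upper()) if x == y)
    return 100.0 * matches / len(seq_a)


def is_substantially_identical(seq_a: str, seq_b: str, threshold: float = 85.0) -> bool:
    """True when percent identity meets or exceeds the threshold (85% by default)."""
    return percent_identity(seq_a, seq_b) >= threshold


if __name__ == "__main__":
    # Arbitrary sequences that differ at one of ten positions.
    print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
    print(is_substantially_identical("ACGTACGTAC", "ACGTACGTAA"))
```

Because the comparison is case-insensitive and position-by-position, the same functions apply equally to nucleotide and amino acid sequences once they have been aligned.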


The terms “complementary” and “complement” and their variants, as used herein, refer to any two or more nucleic acid sequences (e.g., portions or entireties of template nucleic acid molecules, target sequences and/or primers) that can undergo cumulative base pairing at two or more individual corresponding positions in antiparallel orientation, as in a hybridized duplex. Such base pairing can proceed according to any set of established rules, for example according to Watson-Crick base pairing rules or according to some other base pairing paradigm. Optionally there can be “complete” or “total” complementarity between a first and second nucleic acid sequence where each nucleotide in the first nucleic acid sequence can undergo a stabilizing base pairing interaction with a nucleotide in the corresponding antiparallel position on the second nucleic acid sequence. “Partial” complementarity describes nucleic acid sequences in which at least 20%, but less than 100%, of the residues of one nucleic acid sequence are complementary to residues in the other nucleic acid sequence. In some embodiments, at least 50%, but less than 100%, of the residues of one nucleic acid sequence are complementary to residues in the other nucleic acid sequence. In some embodiments, at least 70%, 80%, 90%, 95% or 98%, but less than 100%, of the residues of one nucleic acid sequence are complementary to residues in the other nucleic acid sequence. Sequences are said to be “substantially complementary” when at least 85% of the residues of one nucleic acid sequence are complementary to residues in the other nucleic acid sequence. In some embodiments, two complementary or substantially complementary sequences are capable of hybridizing to each other under standard or stringent hybridization conditions. “Non-complementary” describes nucleic acid sequences in which less than 20% of the residues of one nucleic acid sequence are complementary to residues in the other nucleic acid sequence. 
Sequences are said to be “substantially non-complementary” when less than 15% of the residues of one nucleic acid sequence are complementary to residues in the other nucleic acid sequence. In some embodiments, two non-complementary or substantially non-complementary sequences cannot hybridize to each other under standard or stringent hybridization conditions. A “mismatch” is present at any position in the sequences where two opposed nucleotides are not complementary. Complementary nucleotides include nucleotides that are efficiently incorporated by DNA polymerases opposite each other during DNA replication under physiological conditions. In a typical embodiment, complementary nucleotides can form base pairs with each other, such as the A-T/U and G-C base pairs formed through specific Watson-Crick type hydrogen bonding, or base pairs formed through some other type of base pairing paradigm, between the nucleobases of nucleotides and/or polynucleotides in positions antiparallel to each other. The complementarity of other artificial base pairs can be based on other types of hydrogen bonding and/or hydrophobicity of bases and/or shape complementarity between bases.
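
As a non-limiting sketch, the percentage thresholds recited above can be applied to two equal-length sequences as follows. The sketch assumes both sequences are written 5′ to 3′, considers only Watson-Crick A-T/U and G-C pairs, and returns the most specific label that applies; the function names are hypothetical.

```python
# Watson-Crick pairs only; other base pairing paradigms are outside this sketch.
WATSON_CRICK = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"),
                ("G", "C"), ("C", "G")}


def percent_complementarity(first: str, second: str) -> float:
    """Percent of positions that pair when the strands are antiparallel.

    Both sequences are given 5'->3'; the second is reversed so that each
    position faces its antiparallel partner. Equal lengths are assumed.
    """
    first = first.upper()
    second_reversed = second.upper()[::-1]
    if len(first) != len(second_reversed) or not first:
        raise ValueError("sequences must be non-empty and of equal length")
    paired = sum(1 for a, b in zip(first, second_reversed)
                 if (a, b) in WATSON_CRICK)
    return 100.0 * paired / len(first)


def classify_complementarity(percent: float) -> str:
    """Map a percentage onto the categories defined in the text."""
    if percent == 100.0:
        return "complete"
    if percent >= 85.0:
        return "substantially complementary"
    if percent >= 20.0:
        return "partial"
    if percent < 15.0:
        return "substantially non-complementary"
    return "non-complementary"  # at least 15% but less than 20%
```

Note that the categories in the text overlap (for example, a sequence that is less than 15% complementary is also less than 20% complementary), so returning a single label is a simplification made for this example.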


As used herein, “amplified target sequences” and its derivatives, refers to a nucleic acid sequence produced by amplifying the target sequences using target-specific primers and the methods provided herein. The amplified target sequences may be either of the same sense (i.e., the positive strand produced in the second round and subsequent even-numbered rounds of amplification) or antisense (i.e., the negative strand produced during the first and subsequent odd-numbered rounds of amplification) with respect to the target sequences. In some embodiments, the amplified target sequences are less than 50% complementary to any portion of another amplified target sequence in the reaction. In other embodiments, the amplified target sequences are greater than 50%, greater than 60%, greater than 70%, greater than 80%, or greater than 90% complementary to any portion of another amplified target sequence in the reaction.


As used herein, the terms “ligating”, “ligation” and their derivatives refer to the act or process for covalently linking two or more molecules together, for example, covalently linking two or more nucleic acid molecules to each other. In some embodiments, ligation includes joining nicks between adjacent nucleotides of nucleic acids. In some embodiments, ligation includes forming a covalent bond between an end of a first and an end of a second nucleic acid molecule. In some embodiments, for example embodiments wherein the nucleic acid molecules to be ligated include conventional nucleotide residues, the ligation can include forming a covalent bond between a 5′ phosphate group of one nucleic acid and a 3′ hydroxyl group of a second nucleic acid thereby forming a ligated nucleic acid molecule. In some embodiments, any means for joining nicks or bonding a 5′ phosphate to a 3′ hydroxyl between adjacent nucleotides can be employed. In an exemplary embodiment, an enzyme such as a ligase is used. For the purposes of this disclosure, an amplified target sequence can be ligated to an adapter to generate an adapter-ligated amplified target sequence.


As used herein, “ligase” and its derivatives, refers to any agent capable of catalyzing the ligation of two substrate molecules. In some embodiments, the ligase includes an enzyme capable of catalyzing the joining of nicks between adjacent nucleotides of a nucleic acid. In some embodiments, the ligase includes an enzyme capable of catalyzing the formation of a covalent bond between a 5′ phosphate of one nucleic acid molecule and a 3′ hydroxyl of another nucleic acid molecule thereby forming a ligated nucleic acid molecule. In some embodiments, the ligase is an isothermal ligase. In some embodiments, the ligase is a thermostable ligase. Suitable ligases may include, but are not limited to, T4 DNA ligase, T4 RNA ligase, and E. coli DNA ligase.


As used herein, “ligation conditions” and its derivatives, refers to conditions suitable for ligating two molecules to each other. In some embodiments, the ligation conditions are suitable for sealing nicks or gaps between nucleic acids. As defined herein, a “nick” or “gap” refers to a nucleic acid molecule that lacks a directly bound 5′ phosphate of a mononucleotide pentose ring to a 3′ hydroxyl of a neighboring mononucleotide pentose ring within internal nucleotides of a nucleic acid sequence. As used herein, the term nick or gap is consistent with the use of the term in the art. Typically, a nick or gap is ligated in the presence of an enzyme, such as ligase at an appropriate temperature and pH. In some embodiments, T4 DNA ligase can join a nick between nucleic acids at a temperature of about 70-72° C.


As used herein, “blunt-end ligation” and its derivatives, refers to ligation of two blunt-end double-stranded nucleic acid molecules to each other. A “blunt end” refers to an end of a double-stranded nucleic acid molecule wherein substantially all of the nucleotides in the end of one strand of the nucleic acid molecule are base paired with opposing nucleotides in the other strand of the same nucleic acid molecule. A nucleic acid molecule is not blunt ended if it has an end that includes a single-stranded portion greater than two nucleotides in length, referred to herein as an “overhang”. In some embodiments, the end of the nucleic acid molecule does not include any single stranded portion, such that every nucleotide in one strand of the end is base paired with opposing nucleotides in the other strand of the same nucleic acid molecule. In some embodiments, the ends of the two blunt ended nucleic acid molecules that become ligated to each other do not include any overlapping, shared or complementary sequence. Typically, blunt-end ligation excludes the use of additional oligonucleotide adapters to assist in the ligation of the double-stranded amplified target sequence to the double-stranded adapter, such as patch oligonucleotides as described in US Pat. Publication No. 2010/0129874. In some embodiments, blunt-end ligation includes a nick translation reaction to seal a nick created during the ligation process.


As used herein, the terms “adapter” or “adapter and its complements” and their derivatives, refer to any linear oligonucleotide which is ligated to a nucleic acid molecule of the disclosure. Optionally, the adapter includes a nucleic acid sequence that is not substantially complementary to the 3′ end or the 5′ end of at least one target sequence within the sample. In some embodiments, the adapter is substantially non-complementary to the 3′ end or the 5′ end of any target sequence present in the sample. In some embodiments, the adapter includes any single stranded or double-stranded linear oligonucleotide that is not substantially complementary to an amplified target sequence. In some embodiments, the adapter is substantially non-complementary to at least one, some or all of the nucleic acid molecules of the sample. In some embodiments, suitable adapter lengths are in the range of about 10-100 nucleotides, about 12-60 nucleotides and about 15-50 nucleotides in length. An adapter can include any combination of nucleotides and/or nucleic acids. In some embodiments, the adapter can include one or more cleavable groups at one or more locations. In another embodiment, the adapter can include a sequence that is substantially identical, or substantially complementary, to at least a portion of a primer, for example a universal primer. The structure and properties of universal amplification primers are well known to those skilled in the art and can be implemented for utilization in conjunction with provided methods and compositions to adapt to specific analysis platforms (e.g., as described herein universal P1 and A primers have been described in the art and utilized for sequencing on Ion Torrent sequencing platforms). Similarly, additional and other universal adaptor/primer sequences described and known in the art (e.g., Illumina universal adaptor/primer sequences, PacBio universal adaptor/primer sequences, etc.) can be used in conjunction with the methods and compositions provided herein. In some embodiments, the adapter can include a barcode or tag to assist with downstream cataloguing, identification or sequencing. In some embodiments, a single-stranded adapter can act as a substrate for amplification when ligated to an amplified target sequence, particularly in the presence of a polymerase and dNTPs under suitable temperature and pH.


In some embodiments, an adapter is ligated to a polynucleotide through a blunt-end ligation. In other embodiments, an adapter is ligated to a polynucleotide via nucleotide overhangs on the ends of the adapter and the polynucleotide. For overhang ligation, an adapter may have a nucleotide overhang added to the 3′ and/or 5′ ends of the respective strands if the polynucleotides to which the adapters are to be ligated (e.g., amplicons) have a complementary overhang added to the 3′ and/or 5′ ends of the respective strands. For example, adenine nucleotides can be added to the 3′ terminus of an end-repaired PCR product. Adapters having an overhang formed by thymine nucleotides can then dock with the A-overhang of the amplicon and be ligated to the amplicon by a DNA ligase, such as T4 DNA ligase.


As used herein, “reamplifying” or “reamplification” and their derivatives refer to any process whereby at least a portion of an amplified nucleic acid molecule is further amplified via any suitable amplification process (referred to in some embodiments as a “secondary” amplification or “reamplification”), thereby producing a reamplified nucleic acid molecule. The secondary amplification need not be identical to the original amplification process whereby the amplified nucleic acid molecule was produced; nor need the reamplified nucleic acid molecule be completely identical or completely complementary to the amplified nucleic acid molecule; all that is required is that the reamplified nucleic acid molecule include at least a portion of the amplified nucleic acid molecule or its complement. For example, the reamplification can involve the use of different amplification conditions and/or different primers, including different target-specific primers than the primary amplification.


As defined herein, a “cleavable group” refers to any moiety that once incorporated into a nucleic acid can be cleaved under appropriate conditions. For example, a cleavable group can be incorporated into a target-specific primer, an amplified sequence, an adapter or a nucleic acid molecule of the sample. In an exemplary embodiment, a target-specific primer can include a cleavable group that becomes incorporated into the amplified product and is subsequently cleaved after amplification, thereby removing a portion, or all, of the target-specific primer from the amplified product. The cleavable group can be cleaved or otherwise removed from a target-specific primer, an amplified sequence, an adapter or a nucleic acid molecule of the sample by any acceptable means. For example, a cleavable group can be removed from a target-specific primer, an amplified sequence, an adapter or a nucleic acid molecule of the sample by enzymatic, thermal, photo-oxidative or chemical treatment. In one embodiment, a cleavable group can include a nucleobase that is not naturally occurring. For example, an oligodeoxyribonucleotide can include one or more RNA nucleobases, such as uracil that can be removed by a uracil glycosylase. In some embodiments, a cleavable group can include one or more modified nucleobases (such as 7-methylguanine, 8-oxo-guanine, xanthine, hypoxanthine, 5,6-dihydrouracil or 5-methylcytosine) or one or more modified nucleosides (e.g., 7-methylguanosine, 8-oxo-deoxyguanosine, xanthosine, inosine, dihydrouridine or 5-methylcytidine). The modified nucleobases or nucleotides can be removed from the nucleic acid by enzymatic, chemical or thermal means. In one embodiment, a cleavable group can include a moiety that can be removed from a primer after amplification (or synthesis) upon exposure to ultraviolet light (e.g., bromodeoxyuridine). In another embodiment, a cleavable group can include methylated cytosine. Typically, methylated cytosine can be cleaved from a primer, for example after induction of amplification (or synthesis), upon sodium bisulfite treatment. In some embodiments, a cleavable moiety can include a restriction site. For example, a primer or target sequence can include a nucleic acid sequence that is specific to one or more restriction enzymes, and following amplification (or synthesis), the primer or target sequence can be treated with the one or more restriction enzymes such that the cleavable group is removed. Typically, one or more cleavable groups can be included at one or more locations within a target-specific primer, an amplified sequence, an adapter or a nucleic acid molecule of the sample.


As used herein, “cleavage step” and its derivatives, refers to any process by which a cleavable group is cleaved or otherwise removed from a target-specific primer, an amplified sequence, an adapter or a nucleic acid molecule of the sample. In some embodiments, the cleavage step involves a chemical, thermal, photo-oxidative or digestive process.


As used herein, the term “hybridization” is consistent with its use in the art, and refers to the process whereby two nucleic acid molecules undergo base pairing interactions. Two nucleic acid molecules are said to be hybridized when any portion of one nucleic acid molecule is base paired with any portion of the other nucleic acid molecule; it is not necessarily required that the two nucleic acid molecules be hybridized across their entire respective lengths and in some embodiments, at least one of the nucleic acid molecules can include portions that are not hybridized to the other nucleic acid molecule. The phrase “hybridizing under stringent conditions” and its variants refers to conditions under which hybridization of a target-specific primer to a target sequence occurs in the presence of high hybridization temperature and low ionic strength. In one exemplary embodiment, stringent hybridization conditions include an aqueous environment containing about 30 mM magnesium sulfate, about 300 mM Tris-sulfate at pH 8.9, and about 90 mM ammonium sulfate at about 60-68° C., or equivalents thereof. As used herein, the phrase “standard hybridization conditions” and its variants refers to conditions under which hybridization of a primer to an oligonucleotide (i.e., a target sequence) occurs in the presence of low hybridization temperature and high ionic strength. In one exemplary embodiment, standard hybridization conditions include an aqueous environment containing about 100 mM magnesium sulfate, about 500 mM Tris-sulfate at pH 8.9, and about 200 mM ammonium sulfate at about 50-55° C., or equivalents thereof.


As used herein, “GC content” and its derivatives, refers to the cytosine and guanine content of a nucleic acid molecule. The GC content of a target-specific primer (or adapter) of the disclosure is 85% or lower. More typically, the GC content of a target-specific primer or adapter of the disclosure is between 15-85%.
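
Purely for illustration, GC content can be expressed as the percentage of nucleotides in a sequence that are guanine or cytosine, and a candidate primer or adapter can be checked against the 15-85% range recited above. The function names and default bounds below are assumptions of this sketch.

```python
def gc_content(sequence: str) -> float:
    """Percentage of nucleotides in the sequence that are G or C."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    gc = sum(1 for base in sequence if base in "GC")
    return 100.0 * gc / len(sequence)


def within_typical_gc_range(sequence: str,
                            low: float = 15.0,
                            high: float = 85.0) -> bool:
    """True if the GC content falls in the 15-85% range stated in the text."""
    return low <= gc_content(sequence) <= high
```

For example, a 10-nucleotide primer containing four G or C residues has a GC content of 40% and falls within the typical range.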


As used herein, the term “end” and its variants, when used in reference to a nucleic acid molecule, for example a target sequence or amplified target sequence, can include the terminal 30 nucleotides, the terminal 20 and even more typically the terminal 15 nucleotides of the nucleic acid molecule. A linear nucleic acid molecule comprised of linked series of contiguous nucleotides typically includes at least two ends. In some embodiments, one end of the nucleic acid molecule can include a 3′ hydroxyl group or its equivalent, and is referred to as the “3′ end” and its derivatives. Optionally, the 3′ end includes a 3′ hydroxyl group that is not linked to a 5′ phosphate group of a mononucleotide pentose ring. Typically, the 3′ end includes one or more 5′ linked nucleotides located adjacent to the nucleotide including the unlinked 3′ hydroxyl group, typically the 30 nucleotides located adjacent to the 3′ hydroxyl, typically the terminal 20 and even more typically the terminal 15 nucleotides. One or more linked nucleotides can be represented as a percentage of the nucleotides present in the oligonucleotide or can be provided as a number of linked nucleotides adjacent to the unlinked 3′ hydroxyl. For example, the 3′ end can include less than 50% of the nucleotide length of the oligonucleotide. In some embodiments, the 3′ end does not include any unlinked 3′ hydroxyl group but can include any moiety capable of serving as a site for attachment of nucleotides via primer extension and/or nucleotide polymerization. In some embodiments, the term “3′ end” for example when referring to a target-specific primer, can include the terminal 10 nucleotides, the terminal 5 nucleotides, the terminal 4, 3, 2 or fewer nucleotides at the 3′end. In some embodiments, the term “3′ end” when referring to a target-specific primer can include nucleotides located at nucleotide positions 10 or fewer from the 3′ terminus.


As used herein, “5′ end”, and its derivatives, refers to an end of a nucleic acid molecule, for example a target sequence or amplified target sequence, which includes a free 5′ phosphate group or its equivalent. In some embodiments, the 5′ end includes a 5′ phosphate group that is not linked to a 3′ hydroxyl of a neighboring mononucleotide pentose ring. Typically, the 5′ end includes one or more linked nucleotides located adjacent to the 5′ phosphate, typically the 30 nucleotides located adjacent to the nucleotide including the 5′ phosphate group, typically the terminal 20 and even more typically the terminal 15 nucleotides. One or more linked nucleotides can be represented as a percentage of the nucleotides present in the oligonucleotide or can be provided as a number of linked nucleotides adjacent to the 5′ phosphate. For example, the 5′ end can be less than 50% of the nucleotide length of an oligonucleotide. In another exemplary embodiment, the 5′ end can include about 15 nucleotides adjacent to the nucleotide including the terminal 5′ phosphate. In some embodiments, the 5′ end does not include any unlinked 5′ phosphate group but can include any moiety capable of serving as a site of attachment to a 3′ hydroxyl group, or to the 3′ end of another nucleic acid molecule. In some embodiments, the term “5′ end” for example when referring to a target-specific primer, can include the terminal 10 nucleotides, the terminal 5 nucleotides, the terminal 4, 3, 2 or fewer nucleotides at the 5′ end. In some embodiments, the term “5′ end” when referring to a target-specific primer can include nucleotides located at positions 10 or fewer from the 5′ terminus. In some embodiments, the 5′ end of a target-specific primer can include only non-cleavable nucleotides, for example nucleotides that do not contain one or more cleavable groups as disclosed herein, or a cleavable nucleotide as would be readily determined by one of ordinary skill in the art.


As used herein, “DNA barcode” and its derivatives, refers to a unique short (e.g., 6-14 nucleotide) nucleic acid sequence within an adapter that can act as a ‘key’ to distinguish or separate a plurality of amplified target sequences in a sample. For the purposes of this disclosure, a DNA barcode can be incorporated into the nucleotide sequence of an adapter.
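
The role of a DNA barcode as a ‘key’ can be illustrated by the following minimal sketch, which sorts reads into per-sample groups. It assumes, for the purposes of the example only, that the barcode occupies the first nucleotides of each read, that all barcodes have the same length, and that only exact matches are assigned; the function name and the “unassigned” label are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Group reads by the barcode found at the start of each read.

    barcode_to_sample maps a barcode sequence to a sample name. The barcode
    is trimmed from each assigned read; reads whose leading nucleotides do
    not exactly match a known barcode are collected under 'unassigned'.
    """
    lengths = {len(barcode) for barcode in barcode_to_sample}
    if len(lengths) != 1:
        raise ValueError("this sketch assumes barcodes of one common length")
    barcode_length = lengths.pop()

    groups = defaultdict(list)
    for read in reads:
        key = read[:barcode_length].upper()
        sample = barcode_to_sample.get(key, "unassigned")
        groups[sample].append(read[barcode_length:])
    return dict(groups)
```

In practice, barcode matching often tolerates a limited number of sequencing errors; that refinement is omitted here for brevity.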


As used herein, the phrases “two rounds of target-specific hybridization” or “two rounds of target-specific selection” and their derivatives refer to any process whereby the same target sequence is subjected to two consecutive rounds of hybridization-based target-specific selection, wherein a target sequence is hybridized to a target-specific sequence. Each round of hybridization-based target-specific selection can include multiple target-specific hybridizations to at least some portion of a target-specific sequence. In one exemplary embodiment, a round of target-specific selection includes a first target-specific hybridization involving a first region of the target sequence and a second target-specific hybridization involving a second region of the target sequence. The first and second regions can be the same or different. In some embodiments, each round of hybridization-based target-specific selection can include use of two target-specific oligonucleotides (e.g., a forward target-specific primer and a reverse target-specific primer), such that each round of selection includes two target-specific hybridizations.


As used herein, “comparable maximal minimum melting temperatures” and its derivatives, refers to the melting temperature (Tm) of each nucleic acid fragment for a single adapter or target-specific primer after cleavage of the cleavable groups. The hybridization temperature of each nucleic acid fragment generated by a single adapter or target-specific primer is compared to determine the maximal minimum temperature required to prevent hybridization of any nucleic acid fragment from the target-specific primer or adapter to the target sequence. Once the maximal hybridization temperature is known, it is possible to manipulate the adapter or target-specific primer, for example by moving the location of the cleavable group along the length of the primer, to achieve a comparable maximal minimum melting temperature with respect to each nucleic acid fragment.
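
The comparison described above can be sketched as follows. The example assumes, hypothetically, that the cleavable group is a uracil written as “U” in the primer sequence and that cleavage removes each uracil, leaving the intervening fragments. The melting temperature estimate uses the Wallace rule (2° C. per A or T and 4° C. per G or C), a rough approximation for short oligonucleotides that is substituted here for illustration and is not asserted to be the method of the disclosure.

```python
def fragments_after_cleavage(primer: str, cleavable: str = "U"):
    """Split a primer at each cleavable position, dropping empty fragments."""
    return [frag for frag in primer.upper().split(cleavable) if frag]


def wallace_tm(fragment: str) -> float:
    """Approximate Tm in degrees C: 2*(A+T) + 4*(G+C) (Wallace rule)."""
    at = sum(fragment.count(base) for base in "AT")
    gc = sum(fragment.count(base) for base in "GC")
    return 2.0 * at + 4.0 * gc


def max_fragment_tm(primer: str, cleavable: str = "U") -> float:
    """Highest estimated Tm among the fragments left after cleavage.

    A temperature above this value would be expected, under this simple
    model, to disfavor hybridization of every fragment to the target.
    """
    fragments = fragments_after_cleavage(primer, cleavable)
    return max((wallace_tm(frag) for frag in fragments), default=0.0)
```

Comparing `max_fragment_tm` for alternative placements of the cleavable group is one simple way to see how moving the group along the primer changes the largest fragment melting temperature.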


As used herein, “addition only” and its derivatives, refers to a series of steps in which reagents and components are added to a first or single reaction mixture. Typically, the series of steps excludes the removal of the reaction mixture from a first vessel to a second vessel in order to complete the series of steps. An addition only process excludes the manipulation of the reaction mixture outside the vessel containing the reaction mixture. Typically, an addition-only process is amenable to automation and high-throughput.


As used herein, “synthesizing” and its derivatives, refers to a reaction involving nucleotide polymerization by a polymerase, optionally in a template-dependent fashion. Polymerases synthesize an oligonucleotide via transfer of a nucleoside monophosphate from a nucleoside triphosphate (NTP), deoxynucleoside triphosphate (dNTP) or dideoxynucleoside triphosphate (ddNTP) to the 3′ hydroxyl of an extending oligonucleotide chain. For the purposes of this disclosure, synthesizing includes the serial extension of a hybridized adapter or a target-specific primer via transfer of a nucleoside monophosphate from a deoxynucleoside triphosphate.


As used herein, “polymerizing conditions” and its derivatives, refers to conditions suitable for nucleotide polymerization. In typical embodiments, such nucleotide polymerization is catalyzed by a polymerase. In some embodiments, polymerizing conditions include conditions for primer extension, optionally in a template-dependent manner, resulting in the generation of a synthesized nucleic acid sequence. In some embodiments, the polymerizing conditions include PCR. Typically, the polymerizing conditions include use of a reaction mixture that is sufficient to synthesize nucleic acids and includes a polymerase and nucleotides. The polymerizing conditions can include conditions for annealing of a target-specific primer to a target sequence and extension of the primer in a template dependent manner in the presence of a polymerase. In some embodiments, polymerizing conditions are practiced using thermocycling. Additionally, polymerizing conditions can include a plurality of cycles where the steps of annealing, extending, and separating the two nucleic acid strands are repeated. Typically, the polymerizing conditions include a source of divalent cations such as MgCl2. Polymerization of one or more nucleotides to form a nucleic acid strand typically involves linking the nucleotides to each other via phosphodiester bonds; however, alternative linkages may be possible in the context of particular nucleotide analogs.


As used herein, the term “nucleic acid” refers to natural nucleic acids, artificial nucleic acids, analogs thereof, or combinations thereof, including polynucleotides and oligonucleotides. As used herein, the terms “polynucleotide” and “oligonucleotide” are used interchangeably and mean single-stranded and double-stranded polymers of nucleotides including, but not limited to, 2′-deoxyribonucleotides (DNA) and ribonucleotides (RNA) linked by internucleotide phosphodiester bond linkages, e.g. 3′-5′ and 2′-5′, inverted linkages, e.g. 3′-3′ and 5′-5′, branched structures, or analog nucleic acids. Polynucleotides have associated counter ions, such as H+, NH4+, trialkylammonium, Mg2+, Na+ and the like. An oligonucleotide can be composed entirely of deoxyribonucleotides, entirely of ribonucleotides, or chimeric mixtures thereof. Oligonucleotides can be comprised of nucleobase and sugar analogs. Polynucleotides typically range in size from a few monomeric units, e.g. 5-40, when they are more commonly referred to in the art as oligonucleotides, to several thousands of monomeric nucleotide units, when they are more commonly referred to in the art as polynucleotides; for purposes of this disclosure, however, both oligonucleotides and polynucleotides may be of any suitable length. Unless denoted otherwise, whenever an oligonucleotide sequence is represented, it will be understood that the nucleotides are in 5′ to 3′ order from left to right and that “A” denotes deoxyadenosine, “C” denotes deoxycytidine, “G” denotes deoxyguanosine, “T” denotes thymidine, and “U” denotes deoxyuridine. Oligonucleotides are said to have “5′ ends” and “3′ ends” because mononucleotides are typically reacted to form oligonucleotides via attachment of the 5′ phosphate or equivalent group of one nucleotide to the 3′ hydroxyl or equivalent group of its neighboring nucleotide, optionally via a phosphodiester or other suitable linkage.


As defined herein, the term “nick translation” and its variants comprise the translocation of one or more nicks or gaps within a nucleic acid strand to a new position along the nucleic acid strand. In some embodiments, a nick is formed when a double stranded adapter is ligated to a double stranded amplified target sequence. In one example, the primer can include at its 5′ end, a phosphate group that can ligate to the double stranded amplified target sequence, leaving a nick between the adapter and the amplified target sequence in the complementary strand. In some embodiments, nick translation results in the movement of the nick to the 3′ end of the nucleic acid strand. In some embodiments, moving the nick can include performing a nick translation reaction on the adapter-ligated amplified target sequence. In some embodiments, the nick translation reaction is a coupled 5′ to 3′ DNA polymerization/degradation reaction, or coupled to a 5′ to 3′ DNA polymerization/strand displacement reaction. In some embodiments, moving the nick can include performing a DNA strand extension reaction at the nick site. In some embodiments, moving the nick can include performing a single strand exonuclease reaction on the nick to form a single stranded portion of the adapter-ligated amplified target sequence and performing a DNA strand extension reaction on the single stranded portion of the adapter-ligated amplified target sequence to a new position. In some embodiments, a nick is formed in the nucleic acid strand opposite the site of ligation.


As used herein, the term “polymerase chain reaction” (“PCR”) refers to the method of K. B. Mullis U.S. Pat. Nos. 4,683,195 and 4,683,202, hereby incorporated by reference, which describe a method for increasing the concentration of a segment of a polynucleotide of interest in a mixture of expressed RNA or cDNA without cloning or purification. This process for amplifying the polynucleotide of interest consists of introducing a large excess of two oligonucleotide primers to the DNA mixture containing the desired polynucleotide of interest, followed by a precise sequence of thermal cycling in the presence of a DNA polymerase. The two primers are complementary to their respective strands of the double stranded polynucleotide of interest. To effect amplification, the mixture is denatured and the primers then annealed to their complementary sequences within the polynucleotide of interest molecule. Following annealing, the primers are extended with a polymerase to form a new pair of complementary strands. The steps of denaturation, primer annealing and polymerase extension can be repeated many times (i.e., denaturation, annealing and extension constitute one “cycle”; there can be numerous “cycles”) to obtain a high concentration of an amplified segment of the desired polynucleotide of interest. The length of the amplified segment of the desired polynucleotide of interest (amplicon) is determined by the relative positions of the primers with respect to each other, and therefore, this length is a controllable parameter. By virtue of repeating the process, the method is referred to as the “PCR”. Because the desired amplified segments of the polynucleotide of interest become the predominant nucleic acid sequences (in terms of concentration) in the mixture, they are said to be “PCR amplified”. As defined herein, target nucleic acid molecules within a sample including a plurality of target nucleic acid molecules are amplified via PCR. 
In a modification to the method discussed above, the target nucleic acid molecules are PCR amplified using a plurality of different primer pairs, in some cases, one or more primer pairs per target nucleic acid molecule of interest, thereby forming a multiplex PCR reaction. In some embodiments provided herein, multiplex PCR amplifications are performed using a plurality of different primer pairs, in typical cases, one primer pair per target nucleic acid molecule. Using multiplex PCR, it is possible to simultaneously amplify multiple nucleic acid molecules of interest from a sample to form amplified target sequences. It is also possible to detect the amplified target sequences by several different methodologies (e.g., quantitation with a bioanalyzer or qPCR, hybridization with a labeled probe; incorporation of biotinylated primers followed by avidin-enzyme conjugate detection; incorporation of 32P-labeled deoxynucleotide triphosphates, such as dCTP or dATP, into the amplified target sequence). Any oligonucleotide sequence can be amplified with the appropriate set of primers, thereby allowing for the amplification of target nucleic acid molecules from RNA, cDNA, formalin-fixed paraffin-embedded DNA, fine-needle biopsies and various other sources. In particular, the amplified target sequences created by the multiplex PCR process as disclosed herein, are themselves efficient substrates for subsequent PCR amplification or various downstream assays or manipulations.
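
Two quantitative points made above, namely that repeated cycles yield a high concentration of the amplified segment and that amplicon length is set by the relative positions of the primers, can be illustrated with the following idealized sketch. The per-cycle efficiency parameter, the coordinate convention (1-based, inclusive positions on one template strand) and the function names are assumptions of the example; real reactions depart from ideal doubling, particularly in later cycles.

```python
def ideal_copies(initial_copies: float, cycles: int,
                 efficiency: float = 1.0) -> float:
    """Expected copies after a number of cycles under an idealized model.

    efficiency = 1.0 means perfect doubling each cycle; lower values model
    incomplete extension. Plateau effects are not modeled.
    """
    if cycles < 0 or not 0.0 <= efficiency <= 1.0:
        raise ValueError("cycles must be >= 0 and efficiency within [0, 1]")
    return initial_copies * (1.0 + efficiency) ** cycles


def amplicon_length(forward_primer_start: int, reverse_primer_end: int) -> int:
    """Length of the segment bounded by the outer ends of the two primers.

    forward_primer_start is the template position of the 5' nucleotide of
    the forward primer; reverse_primer_end is the template position bound by
    the 5' nucleotide of the reverse primer (1-based, inclusive).
    """
    if reverse_primer_end < forward_primer_start:
        raise ValueError("reverse primer must lie downstream of forward primer")
    return reverse_primer_end - forward_primer_start + 1
```

Under perfect doubling, a single template yields 1,024 copies after 10 cycles, and primers whose outer ends sit at template positions 101 and 250 define a 150-nucleotide amplicon.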


As defined herein “multiplex amplification” refers to selective and non-random amplification of two or more target sequences within a sample using at least one target-specific primer. In some embodiments, multiplex amplification is performed such that some or all of the target sequences are amplified within a single reaction vessel. The “plexy” or “plex” of a given multiplex amplification refers to the number of different target-specific sequences that are amplified during that single multiplex amplification. In some embodiments, the plexy is about 12-plex, 24-plex, 48-plex, 74-plex, 96-plex, 120-plex, 144-plex, 168-plex, 192-plex, 216-plex, 240-plex, 264-plex, 288-plex, 312-plex, 336-plex, 360-plex, 384-plex, or 398-plex. In some embodiments, highly multiplexed amplification reactions include reactions with a plexy of greater than 12-plex.


In some embodiments, the amplified target sequences are formed via PCR. Extension of target-specific primers can be accomplished using one or more DNA polymerases. In one embodiment, the polymerase is any Family A DNA polymerase (also known as pol I family) or any Family B DNA polymerase. In some embodiments, the DNA polymerase is a recombinant form capable of extending target-specific primers with superior accuracy and yield as compared to a non-recombinant DNA polymerase. For example, the polymerase can include a high-fidelity polymerase or thermostable polymerase. In some embodiments, conditions for extension of target-specific primers can include ‘Hot Start’ conditions, for example Hot Start polymerases, such as Amplitaq Gold® DNA polymerase (Applied Biosciences), Platinum® Taq DNA Polymerase High Fidelity (Invitrogen) or KOD Hot Start DNA polymerase (EMD Biosciences). A ‘Hot Start’ polymerase includes a thermostable polymerase and one or more antibodies that inhibit DNA polymerase and 3′-5′ exonuclease activities at ambient temperature. In some instances, ‘Hot Start’ conditions can include an aptamer.


In some embodiments, the polymerase is an enzyme such as Taq polymerase (from Thermus aquaticus), Tfi polymerase (from Thermus filiformis), Bst polymerase (from Bacillus stearothermophilus), Pfu polymerase (from Pyrococcus furiosus), Tth polymerase (from Thermus thermophilus), Pow polymerase (from Pyrococcus woesei), Tli polymerase (from Thermococcus litoralis), Ultima polymerase (from Thermotoga maritima), KOD polymerase (from Thermococcus kodakaraensis), Pol I and II polymerases (from Pyrococcus abyssi) and Pab (from Pyrococcus abyssi). In some embodiments, the DNA polymerase can include at least one polymerase such as Amplitaq Gold® DNA polymerase (Applied Biosciences), Stoffel fragment of Amplitaq® DNA Polymerase (Roche), KOD polymerase (EMD Biosciences), KOD Hot Start polymerase (EMD Biosciences), Deep Vent™ DNA polymerase (New England Biolabs), Phusion polymerase (New England Biolabs), Klentaq1 polymerase (DNA Polymerase Technology, Inc), Klentaq Long Accuracy polymerase (DNA Polymerase Technology, Inc), Omni KlenTaq™ DNA polymerase (DNA Polymerase Technology, Inc), Omni KlenTaq™ LA DNA polymerase (DNA Polymerase Technology, Inc), Platinum® Taq DNA Polymerase (Invitrogen), Hemo KlenTaq™ (New England Biolabs), Platinum® Taq DNA Polymerase High Fidelity (Invitrogen), Platinum® Pfx (Invitrogen), Accuprime™ Pfx (Invitrogen), or Accuprime™ Taq DNA Polymerase High Fidelity (Invitrogen).


In some embodiments, the DNA polymerase is a thermostable DNA polymerase. In some embodiments, the mixture of dNTPs is applied concurrently, or sequentially, in a random or defined order. In some embodiments, the amount of DNA polymerase present in the multiplex reaction is significantly higher than the amount of DNA polymerase used in a corresponding single plex PCR reaction. As defined herein, the term “significantly higher” refers to an at least 3-fold greater concentration of DNA polymerase present in the multiplex PCR reaction as compared to a corresponding single plex PCR reaction.


In some embodiments, the amplification reaction does not include a circularization of amplification product, for example as disclosed by rolling circle amplification.


The practice of the present subject matter may employ, unless otherwise indicated, conventional techniques and descriptions of organic chemistry, molecular biology (including recombinant techniques), cell biology, and biochemistry, which are within the skill of the art. Such conventional techniques include, but are not limited to, preparation of synthetic polynucleotides, polymerization techniques, chemical and physical analysis of polymer particles, preparation of nucleic acid libraries, nucleic acid sequencing and analysis, and the like. Specific illustrations of suitable techniques can be used by reference to the examples provided herein. Other equivalent conventional procedures can also be used. Such conventional techniques and descriptions can be found in standard laboratory manuals such as Genome Analysis: A Laboratory Manual Series (Vols. I-IV), PCR Primer: A Laboratory Manual, and Molecular Cloning: A Laboratory Manual (all from Cold Spring Harbor Laboratory Press), Hermanson, Bioconjugate Techniques, Second Edition (Academic Press, 2008); Merkus, Particle Size Measurements (Springer, 2009); Rubinstein and Colby, Polymer Physics (Oxford University Press, 2003); and the like.


According to various exemplary embodiments, one or more features of any one or more of the above-discussed teachings and/or exemplary embodiments may be performed or implemented using appropriately configured and/or programmed hardware and/or software elements. Determining whether an embodiment is implemented using hardware and/or software elements may be based on any number of factors, such as desired computational rate, power levels, heat tolerances, processing cycle budget, input data rates, output data rates, memory resources, data bus speeds, etc., and other design or performance constraints.


Examples of hardware elements may include processors, microprocessors, input(s) and/or output(s) (I/O) device(s) (or peripherals) that are communicatively coupled via a local interface circuit, circuit elements (e.g., transistors, resistors, capacitors, inductors, and so forth), integrated circuits, application specific integrated circuits (ASIC), programmable logic devices (PLD), digital signal processors (DSP), field programmable gate array (FPGA), logic gates, registers, semiconductor device, chips, microchips, chip sets, and so forth. The local interface may include, for example, one or more buses or other wired or wireless connections, controllers, buffers (caches), drivers, repeaters and receivers, etc., to allow appropriate communications between hardware components. A processor is a hardware device for executing software, particularly software stored in memory. The processor can be any custom made or commercially available processor, a central processing unit (CPU), an auxiliary processor among several processors associated with the computer, a semiconductor based microprocessor (e.g., in the form of a microchip or chip set), a macroprocessor, or any device for executing software instructions. A processor can also represent a distributed processing architecture. The I/O devices can include input devices, for example, a keyboard, a mouse, a scanner, a microphone, a touch screen, an interface for various medical devices and/or laboratory instruments, a bar code reader, a stylus, a laser reader, a radio-frequency device reader, etc. Furthermore, the I/O devices also can include output devices, for example, a printer, a bar code printer, a display, etc. Finally, the I/O devices further can include devices that communicate as both inputs and outputs, for example, a modulator/demodulator (modem; for accessing another device, system, or network), a radio frequency (RF) or other transceiver, a telephonic interface, a bridge, a router, etc.


Examples of software may include software components, programs, applications, computer programs, application programs, system programs, machine programs, operating system software, middleware, firmware, software modules, routines, subroutines, functions, methods, procedures, software interfaces, application program interfaces (API), instruction sets, computing code, computer code, code segments, computer code segments, words, values, symbols, or any combination thereof. Software in memory may include one or more separate programs, which may include ordered listings of executable instructions for implementing logical functions. The software in memory may include a system for identifying data streams in accordance with the present teachings and any suitable custom made or commercially available operating system (O/S), which may control the execution of other computer programs such as the system, and which may provide scheduling, input-output control, file and data management, memory management, communication control, etc.


According to various exemplary embodiments, one or more features of any one or more of the above-discussed teachings and/or exemplary embodiments may be performed or implemented using appropriately configured and/or programmed non-transitory machine-readable medium or article that may store an instruction or a set of instructions that, if executed by a machine, may cause the machine to perform a method and/or operations in accordance with the exemplary embodiments. Such a machine may include, for example, any suitable processing platform, computing platform, computing device, processing device, computing system, processing system, computer, processor, scientific or laboratory instrument, etc., and may be implemented using any suitable combination of hardware and/or software. The machine-readable medium or article may include, for example, any suitable type of memory unit, memory device, memory article, memory medium, storage device, storage article, storage medium and/or storage unit, for example, memory, removable or non-removable media, erasable or non-erasable media, writeable or re-writeable media, digital or analog media, hard disk, floppy disk, read-only memory compact disc (CD-ROM), recordable compact disc (CD-R), rewriteable compact disc (CD-RW), optical disk, magnetic media, magneto-optical media, removable memory cards or disks, various types of Digital Versatile Disc (DVD), a tape, a cassette, etc., including any medium suitable for use in a computer. Memory can include any one or a combination of volatile memory elements (e.g., random access memory (RAM, such as DRAM, SRAM, SDRAM, etc.)) and nonvolatile memory elements (e.g., ROM, EPROM, EEROM, Flash memory, hard drive, tape, CDROM, etc.). Moreover, memory can incorporate electronic, magnetic, optical, and/or other types of storage media. Memory can have a distributed architecture where various components are situated remote from one another, but are still accessed by the processor. The instructions may include any suitable type of code, such as source code, compiled code, interpreted code, executable code, static code, dynamic code, encrypted code, etc., implemented using any suitable high-level, low-level, object-oriented, visual, compiled and/or interpreted programming language.


According to various exemplary embodiments, one or more features of any one or more of the above-discussed teachings and/or exemplary embodiments may be performed or implemented at least partly using a distributed, clustered, remote, or cloud computing resource.


According to various exemplary embodiments, one or more features of any one or more of the above-discussed teachings and/or exemplary embodiments may be performed or implemented using a source program, executable program (object code), script, or any other entity comprising a set of instructions to be performed. When a source program, the program can be translated via a compiler, assembler, interpreter, etc., which may or may not be included within the memory, so as to operate properly in connection with the O/S. The instructions may be written using (a) an object oriented programming language, which has classes of data and methods, or (b) a procedural programming language, which has routines, subroutines, and/or functions, which may include, for example, C, C++, Pascal, Basic, Fortran, Cobol, Perl, Java, and Ada.


According to various exemplary embodiments, one or more of the above-discussed exemplary embodiments may include transmitting, displaying, storing, printing or outputting to a user interface device, a computer readable storage medium, a local computer system or a remote computer system, information related to any information, signal, data, and/or intermediate or final results that may have been generated, accessed, or used by such exemplary embodiments. Such transmitted, displayed, stored, printed or outputted information can take the form of searchable and/or filterable lists of runs and reports, pictures, tables, charts, graphs, spreadsheets, correlations, sequences, and combinations thereof, for example.


Various additional exemplary embodiments may be derived by repeating, adding, or substituting any generically or specifically described features and/or components and/or substances and/or steps and/or operating conditions set forth in one or more of the above-described exemplary embodiments. Further, it should be understood that an order of steps or order for performing certain actions is immaterial so long as the objective of the steps or action remains achievable, unless specifically stated otherwise. Furthermore, two or more steps or actions can be conducted simultaneously so long as the objective of the steps or action remains achievable, unless specifically stated otherwise. Moreover, any one or more feature, component, aspect, step, or other characteristic mentioned in one of the above-discussed exemplary embodiments may be considered to be a potential optional feature, component, aspect, step, or other characteristic of any other of the above-discussed exemplary embodiments so long as the objective of such any other of the above-discussed exemplary embodiments remains achievable, unless specifically stated otherwise.


In certain embodiments, compositions of the invention comprise target BCR primer sets wherein the primers are directed to sequences of the same target BCR gene. In some embodiments the immune receptor is an antibody receptor selected from the group consisting of heavy chain alpha, heavy chain delta, heavy chain epsilon, heavy chain gamma, heavy chain mu, light chain kappa, and light chain lambda. In some embodiments, a target BCR primer set can be combined with a primer set directed to a TCR selected from the group consisting of TCR alpha, TCR beta, TCR gamma, and TCR delta.


In some embodiments, compositions of the invention comprise target BCR primer sets selected to have various parameters or criteria outlined herein. In some embodiments, compositions of the invention comprise a plurality of target-specific primers (e.g., V gene FR2- and FR3-directed primers, the J gene directed primers, the Cint-KDE directed primers) of about 15 nucleotides to about 40 nucleotides in length and having at least two of the following criteria: a cleavable group located at a 3′ end of substantially all of the plurality of primers, a cleavable group located near or about a central nucleotide of substantially all of the plurality of primers, substantially all of the plurality of primers at a 5′ end including only non-cleavable nucleotides, minimal cross-hybridization to substantially all of the primers in the plurality of primers, minimal cross-hybridization to non-specific sequences present in a sample, minimal self-complementarity, and minimal nucleotide sequence overlap at a 3′ end or a 5′ end of substantially all of the primers in the plurality of primers. In some embodiments, the composition can include primers with any 3, 4, 5, 6 or 7 of the above criteria.


In some embodiments, the compositions comprise a plurality of target-specific primers of about 15 nucleotides to about 40 nucleotides in length having two or more of the following criteria: a cleavable group located near or about a central nucleotide of substantially all of the plurality of primers, substantially all of the plurality of primers at a 5′ end including only non-cleavable nucleotides, substantially all of the plurality of primers having less than 20% of the nucleotides across the primer's entire length containing a cleavable group, at least one primer having a complementary nucleic acid sequence across its entire length to a target sequence present in a sample, minimal cross-hybridization to substantially all of the primers in the plurality of primers, minimal cross-hybridization to non-specific sequences present in a sample, and minimal nucleotide sequence overlap at a 3′ end or a 5′ end of substantially all of the primers in the plurality of primers. In some embodiments, the composition can include primers with any 3, 4, 5, 6 or 7 of the above criteria.
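
By way of non-limiting illustration, the positional criteria for cleavable groups recited above can be screened computationally. The following Python sketch is not part of the disclosed compositions; it assumes that uracil ('U') stands in for any nucleotide bearing a cleavable group, that "near or about a central nucleotide" means within two positions of the midpoint, and that the 5′ end is the first five nucleotides. The function names, these thresholds, and the example primer are hypothetical.

```python
# Illustrative sketch only. Assumption: 'U' marks a nucleotide with a cleavable group.
CLEAVABLE = "U"


def cleavable_positions(primer):
    """Return 0-based positions (5'->3') of cleavable nucleotides."""
    return [i for i, base in enumerate(primer.upper()) if base == CLEAVABLE]


def check_cleavable_layout(primer, center_window=2, five_prime_len=5,
                           max_fraction=0.20):
    """Evaluate a primer against the cleavable-group layout criteria.

    center_window and five_prime_len are assumed working definitions of
    "near or about a central nucleotide" and "a 5' end"; the source text
    does not fix numeric values for them.
    """
    seq = primer.upper()
    n = len(seq)
    pos = cleavable_positions(seq)
    center = (n - 1) / 2
    return {
        "length_15_to_40": 15 <= n <= 40,
        "cleavable_near_center": any(abs(p - center) <= center_window for p in pos),
        "five_prime_end_non_cleavable": all(p >= five_prime_len for p in pos),
        "cleavable_fraction_below_max": (len(pos) / n < max_fraction) if n else False,
    }


# Hypothetical 24-nt primer with uracils near the center and near the 3' end.
example = "GCTGACCTGCAUCGTAGCTGGAUC"
for criterion, satisfied in check_cleavable_layout(example).items():
    print(criterion, satisfied)
```

Such a screen addresses only the positional criteria; cross-hybridization and sequence-overlap criteria would require comparison against the other primers and the sample sequences.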


In some embodiments, target-specific primers (e.g., the V gene FR2- and FR3-directed primers, the J gene directed primers, and the Cint-KDE gene directed primers) used in the compositions of the invention are selected or designed to satisfy any one or more of the following criteria: (1) includes two or more modified nucleotides within the primer sequence, at least one of which is included near or at the termini of the primer and at least one of which is included at, or about the center nucleotide position of the primer sequence; (2) length of about 15 to about 40 bases; (3) Tm of from above 60° C. to about 70° C.; (4) low cross-reactivity with non-target sequences present in the sample; (5) at least the first four nucleotides (going from 3′ to 5′ direction) are non-complementary to any sequence within any other primer present in the composition; and (6) non-complementary to any consecutive stretch of at least 5 nucleotides within any other sequence targeted for amplification with the primers. In some embodiments, the target-specific primers used in the compositions are selected or designed to satisfy any 2, 3, 4, 5, or 6 of the above criteria. In some embodiments, the two or more modified nucleotides have cleavable groups. In some embodiments, each of the plurality of target-specific primers comprises two or more modified nucleotides selected from a cleavable group of methylguanine, 8-oxo-guanine, xanthine, hypoxanthine, 5,6-dihydrouracil, uracil, 5-methylcytosine, thymine-dimer, 7-methylguanosine, 8-oxo-deoxyguanosine, xanthosine, inosine, dihydrouridine, bromodeoxyuridine, uridine or 5-methylcytidine.
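
As a non-limiting sketch, criteria (2), (3) and (5) above lend themselves to a simple computational check. The Python example below is illustrative only and is not the design procedure of the disclosure. It assumes that the Tm is supplied by the caller (no particular Tm model is implied), that "about 70° C." is treated as a hard upper bound, that uracil pairs as thymine, and that "non-complementary" in criterion (5) means the reverse complement of the four 3′-terminal nucleotides does not occur anywhere in another primer. All function names and sequences are hypothetical.

```python
# Illustrative sketch only; names, thresholds and sequences are assumptions.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def normalize(seq):
    """Upper-case the sequence and treat uracil as thymine for pairing purposes."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq):
    return normalize(seq).translate(_COMPLEMENT)[::-1]


def three_prime_clash(primer, other_primers, k=4):
    """True if the k 3'-terminal nucleotides could anneal within any other primer."""
    site = reverse_complement(normalize(primer)[-k:])
    return any(site in normalize(other) for other in other_primers)


def check_design_criteria(primer, tm_celsius, other_primers):
    """Check criteria (2), (3) and (5); tm_celsius comes from any Tm model of choice."""
    return {
        "length_15_to_40": 15 <= len(primer) <= 40,
        "tm_above_60_to_70": 60.0 < tm_celsius <= 70.0,
        "three_prime_non_complementary": not three_prime_clash(primer, other_primers),
    }


# Hypothetical primers: the 3' tail AACG of `candidate` pairs with CGTT in `clashing`.
candidate = "ACGTGCTAGCTAGGCTAACG"
clashing = "TTTCGTTGGGAAACCCGGGTT"
harmless = "GGGAAACCCGGGAAACCC"
print(check_design_criteria(candidate, 64.5, [harmless]))
print(check_design_criteria(candidate, 64.5, [harmless, clashing]))
```

Criteria (1), (4) and (6) are omitted from the sketch because they depend on the modified-nucleotide chemistry and on the sample and target sequences.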


In some embodiments compositions are provided for analysis of an immune repertoire in a sample, comprising at least one set of i) a plurality of V gene primers directed to a majority of different V genes of at least one BCR coding sequence comprising at least a portion of framework region 3 (FR3) within the V gene; and ii) one or more J gene primers directed to at least a portion of a respective target C gene of the BCR coding sequence, wherein each set of i) and ii) primers directed to the same target immune receptor sequences is selected from the group consisting of IgH, IgLkappa, and IgLlambda and wherein each set of i) and ii) primers directed to the same target BCR is configured to amplify the target BCR repertoire. In certain embodiments a single set of primers comprising i) and ii) is encompassed within a composition. In more particular embodiments such set comprises primers directed to IgH. In still other embodiments at least two sets of primers are encompassed in a composition wherein the sets are directed to IgH and IgLkappa and IgLlambda.


In some embodiments compositions are provided for analysis of a BCR repertoire in a sample, comprising at least one set of i) a plurality of V gene primers directed to a majority of different V genes of at least one BCR coding sequence comprising at least a portion of framework region 3 (FR3) within the V gene; and ii) a plurality of J gene primers directed to a majority of different J genes of the respective target BCR coding sequence, wherein each set of i) and ii) primers directed to the same target immune receptor sequences is selected from the group consisting of IgH, IgLkappa and IgLlambda and wherein each set of i) and ii) primers directed to the same target immune receptor is configured to amplify the target BCR repertoire. In certain embodiments a single set of primers comprising i) and ii) is encompassed within a composition. In more particular embodiments such set comprises primers directed to IgH. In still other embodiments at least two sets of primers are encompassed in a composition wherein the sets are directed to IgH and IgLkappa and IgLlambda. In still other embodiments three sets of primers are encompassed in a composition wherein the sets are directed to IgH and IgLkappa and IgLlambda.


In particular embodiments, compositions provided include target BCR primer sets comprising V gene primers wherein the one or more of a plurality of V gene primers are directed to sequences over an FR3 region about 50 nucleotides in length. In other embodiments the one or more of a plurality of V gene primers are directed to sequences over an FR3 region about 70 nucleotides in length. In other particular embodiments the one or more of a plurality of V gene primers are directed to sequences over an FR3 region about 40 to about 60 nucleotides in length. In some embodiments a target BCR primer set comprises V gene primers comprising about 50 to about 85 different FR3-directed primers. In certain embodiments a target BCR primer set comprises V gene primers comprising about 55 to about 80 different FR3-directed primers. In some embodiments a target BCR primer set comprises V gene primers comprising about 62 to about 75 different FR3-directed primers. In some embodiments, a target BCR primer set comprises V gene primers comprising about 65, 66, 67, 68, 69, or 70 different FR3-directed primers. In some embodiments the target BCR primer set comprises a plurality of J gene primers. In some embodiments a target BCR primer set comprises at least 2 J gene primers wherein each is directed to at least a portion of a J gene within target polynucleotides. In certain embodiments a target BCR primer set comprises 2 to about 8 J gene primers wherein each is directed to at least a portion of a J gene within target polynucleotides. In some embodiments a target BCR primer set comprises about 3 to about 6 different J gene primers wherein each is directed to at least a portion of a J gene within target polynucleotides. In some embodiments a target BCR primer set comprises about 2, 3, 4, 5, 6, 7 or 8 different J gene primers wherein each is directed to at least a portion of a J gene within target polynucleotides. In some embodiments a target BCR primer set comprises about 4 J gene primers wherein each is directed to at least a portion of a J gene within target polynucleotides.


In particular embodiments, methods of the invention comprise the use of at least one set of primers comprising V gene primers of BCR IgH coding sequence and J gene primers of BCR IgH coding sequence i), and V gene primers of BCR IgLlambda coding sequence and J gene primers of BCR IgLlambda coding sequence ii), and V gene primers of BCR IgLkappa coding sequence and J gene primers of BCR IgLkappa coding sequence iii), and optionally Cint sequence primers and KDE sequence primers iv), selected from Tables 9 and 6 and Tables 3-4 and Tables 1-2 and Table 5, respectively.


In particular embodiments, methods of the invention comprise the use of at least one set of primers comprising V gene primers of BCR IgH FR2 coding sequence and J gene primers of BCR IgH coding sequence i), and/or V gene primers of BCR IgH distal FR3 coding sequence and J gene primers of BCR IgH coding sequence ii), selected from Tables 8 and 6 and Tables 7 and 6, respectively.


In certain embodiments compositions of the invention comprise at least one set of primers i) and ii) and iii), optionally iv) comprising primers selected from SEQ ID NOs: 1161-1446 and 973-988 and SEQ ID NOs: 597-910 and 911-950 and SEQ ID NOs: 1-548 and 549-596 and optionally selected from SEQ ID NOs: 951-972. In other certain embodiments compositions of the invention comprise at least one set of primers i) and ii) comprising primers selected from SEQ ID NOs: 1304-1446 and 981-988 and SEQ ID NOs: 785-816, 847-876 and 931-935, 941-945 and SEQ ID NOs: 406-456 and 557-580-596 and optionally selected from SEQ ID NOs: 960, 961 and 972.


In certain embodiments compositions of the invention comprise at least one set of primers i) and ii) comprising primers selected from SEQ ID NOs: 1065-1160 and 973-988 or selected from SEQ ID NOs: 1065-1112 and 981-988. In other certain embodiments compositions of the invention comprise at least one set of primers i) and ii) comprising primers selected from SEQ ID NOs: 989-1064 and 973-988 or selected from SEQ ID NOs: 1027-1064 and 981-988.


In some embodiments, multiple different primers including at least one modified nucleotide can be used in a single amplification reaction. For example, multiplexed primers including modified nucleotides can be added to the amplification reaction mixture, where each primer (or set of primers) selectively hybridizes to, and promotes amplification of different rearranged target nucleic acid molecules within the nucleic acid population. In some embodiments, the target specific primers can include at least one uracil nucleotide.


In some embodiments, multiplex amplification may be performed using PCR and cycles of denaturation, primer annealing, and polymerase extension steps at set temperatures for set times. In some embodiments, about 12 cycles to about 30 cycles are used to generate the amplicon library in the multiplex amplification reaction. In some embodiments, 13 cycles, 14 cycles, 15 cycles, 16 cycles, 17 cycles, 18 cycles, 19 cycles, preferably 20 cycles, 23 cycles, or 25 cycles are used to generate the amplicon library in the multiplex amplification reaction. In some embodiments, 17-25 cycles are used to generate the amplicon library in the multiplex amplification reaction.


In some embodiments, the amplification reactions are conducted in parallel within a single reaction phase (for example, within the same amplification reaction mixture within a single well or tube). In some instances, an amplification reaction can generate a mixture of products including both the intended amplicon product as well as unintended, unwanted, nonspecific amplification artifacts such as primer-dimers. Post amplification, the reactions are then treated with any suitable agent that will selectively cleave or otherwise selectively destroy the nucleotide linkages of the modified nucleotides within the excess unincorporated primers and the amplification artifacts without cleaving or destroying the specific amplification products. For example, the primers can include uracil-containing nucleobases that can be selectively cleaved using UNG/UDG (optionally with heat and/or alkali). In some embodiments, the primers can include uracil-containing nucleotides that can be selectively cleaved using UNG and Fpg. In some embodiments, the cleavage treatment includes exposure to oxidizing conditions for selective cleavage of dithiols, treatment with RNAse H for selective cleavage of modified nucleotides including RNA-specific moieties (e.g., ribose sugars, etc.), and the like. This cleavage treatment can effectively fragment the original amplification primers and non-specific amplification products into small nucleic acid fragments that include relatively few nucleotides each. Such fragments are typically incapable of promoting further amplification at elevated temperatures. Such fragments can also be removed relatively easily from the reaction pool through the various post-amplification cleanup procedures known in the art (e.g., spin columns, NaEtOH precipitation, etc).
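
The fragmentation of uracil-containing primers described above can be pictured with a deliberately simplified model. The following Python sketch is illustrative only: it assumes that each uracil is excised and the strand is broken at that position, so that the primer separates into the uracil-free stretches between cleavage sites. It does not model enzyme kinetics, incomplete digestion, or end chemistry, and the function name and example primer are hypothetical.

```python
# Simplified, illustrative model: every uracil is removed and the strand breaks there.
def cleave_at_uracil(seq):
    """Return the fragments (5'->3') left after cleavage at every uracil."""
    return [fragment for fragment in seq.upper().split("U") if fragment]


# Hypothetical 24-nt primer carrying two uracils.
primer = "GCTGACCTGCAUCGTAGCTGGAUC"
fragments = cleave_at_uracil(primer)
longest = max(len(fragment) for fragment in fragments)

print("fragments:", fragments)
print("longest fragment length:", longest)
```

Under this model, placing a cleavable nucleotide near the center of a primer roughly halves the longest surviving fragment, which is consistent with the statement that the resulting fragments are typically too short to prime further amplification at elevated temperatures.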


In some embodiments, amplification products following cleavage or other selective destruction of the nucleotide linkages of the modified nucleotides are optionally treated to generate amplification products that possess a phosphate at the 5′ termini. In some embodiments, the phosphorylation treatment includes enzymatic manipulation to produce 5′ phosphorylated amplification products. In one embodiment, enzymes such as polymerases can be used to generate 5′ phosphorylated amplification products. For example, T4 polymerase can be used to prepare 5′ phosphorylated amplicon products. Klenow can be used in conjunction with one or more other enzymes to produce amplification products with a 5′ phosphate. In some embodiments, other enzymes known in the art can be used to prepare amplification products with a 5′ phosphate group. For example, incubation of uracil nucleotide containing amplification products with the enzyme UDG, Fpg and T4 polymerase can be used to generate amplification products with a phosphate at the 5′ termini. It will be apparent to one of skill in the art that other techniques, other than those specifically described herein, can be applied to generate phosphorylated amplicons. It is understood that such variations and modifications that are applied to practice the methods, systems, kits, compositions and apparatuses disclosed herein, without resorting to undue experimentation are considered within the scope of the disclosure.


In some embodiments, primers that are incorporated in the intended (specific) amplification products are similarly cleaved or destroyed, resulting in the formation of “sticky ends” (e.g., 5′ or 3′ overhangs) within the specific amplification products. Such “sticky ends” can be addressed in several ways. For example, if the specific amplification products are to be cloned, the overhang regions can be designed to complement overhangs introduced into the cloning vector, thereby enabling sticky ended ligations that are more rapid and efficient than blunt ended ligations. Alternatively, the overhangs may need to be repaired (as with several next-generation sequencing methods). Such repair can be accomplished through secondary amplification reactions using only forward and reverse amplification primers (e.g., corresponding to A and P1 primers) composed of only natural bases. In this manner, subsequent rounds of amplification rebuild the double-stranded templates, with nascent copies of the amplicon possessing the complete sequence of the original strands prior to primer destruction. Alternatively, the sticky ends can be removed using some forms of fill-in and ligation processing, wherein the forward and reverse primers are annealed to the templates. A polymerase can then be employed to extend the primers, and then a ligase, optionally a thermostable ligase, can be utilized to connect the resulting nucleic acid strands. This could also be accomplished through various other reaction pathways, such as cyclical extend-ligation, etc. In some embodiments, the ligation step can be performed using one or more DNA ligases.


In some embodiments, the amplicon library prepared using target-specific primer pairs can be used in downstream enrichment applications such as emulsion PCR, bridge PCR or isothermal amplification. In some embodiments, the amplicon library can be used in an enrichment application and a sequencing application. For example, an amplicon library can be sequenced using any suitable DNA sequencing platform, including any suitable next generation DNA sequencing platform. In some embodiments, an amplicon library can be sequenced using an Ion PGM Sequencer or an Ion GeneStudio S5 Sequencer (Thermo Fisher Scientific). In some embodiments, a PGM Sequencer or an S5 Sequencer can be coupled to a server that applies parameters or software to determine the sequence of the amplified target nucleic acid molecules. In some embodiments, the amplicon library can be prepared, enriched and sequenced in less than 24 hours. In some embodiments, the amplicon library can be prepared, enriched and sequenced in approximately 9 hours.


In some embodiments, methods for generating an amplicon library can include: amplifying cDNA of immune receptor genes using V gene-specific and C gene-specific primers to generate amplicons; purifying the amplicons from the input DNA and primers; phosphorylating the amplicons; ligating adapters to the phosphorylated amplicons; purifying the ligated amplicons; nick-translating the amplified amplicons; and purifying the nick-translated amplicons to generate the amplicon library. In some embodiments, methods for generating an amplicon library can include: amplifying cDNA of immune receptor genes using V gene-specific and J gene-specific primers to generate amplicons; purifying the amplicons from the input DNA and primers; phosphorylating the amplicons; ligating adapters to the phosphorylated amplicons; purifying the ligated amplicons; nick-translating the amplified amplicons; and purifying the nick-translated amplicons to generate the amplicon library. In some embodiments, additional amplicon library manipulations can be conducted following the step of amplification of rearranged immune receptor gene targets to generate the amplicons. In some embodiments, any combination of additional reactions can be conducted in any order, and can include: purifying; phosphorylating; ligating adapters; nick-translating; amplification and/or sequencing. In some embodiments, any of these reactions can be omitted or can be repeated. It will be readily apparent to one of skill in the art that the method can repeat or omit any one or more of the above steps. It will also be apparent to one of skill in the art that the order and combination of steps may be modified to generate the required amplicon library, and is not therefore limited to the exemplary methods provided.


A phosphorylated amplicon can be joined to an adapter to conduct a nick translation reaction, subsequent downstream amplification (e.g., template preparation), or for attachment to particles (e.g., beads), or both. For example, an adapter that is joined to a phosphorylated amplicon can anneal to an oligonucleotide capture primer which is attached to a particle, and a primer extension reaction can be conducted to generate a complementary copy of the amplicon attached to the particle or surface, thereby attaching an amplicon to a surface or particle. Adapters can have one or more amplification primer hybridization sites, sequencing primer hybridization sites, barcode sequences, and combinations thereof. In some embodiments, amplicons prepared by the methods disclosed herein can be joined to one or more Ion Torrent™ compatible adapters to construct an amplicon library. Amplicons generated by such methods can be joined to one or more adapters for library construction to be compatible with a next generation sequencing platform. For example, the amplicons produced by the teachings of the present disclosure can be attached to adapters provided in the Ion AmpliSeq™ Library Kit 2.0 or Ion AmpliSeq™ Library Kit Plus (Thermo Fisher Scientific).


In some embodiments, amplification of immune receptor cDNA or rearranged gDNA can be conducted using a 5× Ion AmpliSeq™ HiFi Master Mix. In some embodiments, the 5× Ion AmpliSeq™ HiFi Master Mix can include glycerol, dNTPs, and a DNA polymerase such as Platinum™ Taq DNA polymerase High Fidelity. In some embodiments, the 5× Ion AmpliSeq™ HiFi Master Mix can further include at least one of the following: a preservative, magnesium chloride, magnesium sulfate, tris-sulfate and/or ammonium sulfate.


In some embodiments, the immune receptor rearranged gDNA multiplex amplification reaction further includes at least one PCR additive to improve on-target amplification, amplification yield, and/or the percentage of productive sequencing reads. In some embodiments, the at least one PCR additive includes at least one of potassium chloride or additional dNTPs (e.g., dATP, dCTP, dGTP, dTTP). In some embodiments, the dNTPs as a PCR additive is an equimolar mixture of dNTPs. In some embodiments, the dNTP mix as a PCR additive is an equimolar mixture of dATP, dCTP, dGTP, and dTTP. In some embodiments, about 0.2 mM to about 5.0 mM dNTPs is added to the multiplex amplification reaction. In some embodiments, amplification of rearranged immune receptor gDNA can be conducted using a 5× Ion AmpliSeq™ HiFi Master Mix and an additional about 0.2 mM to about 5.0 mM dNTPs in the reaction mixture. In some embodiments, amplification of rearranged immune receptor gDNA can be conducted using a 5× Ion AmpliSeq™ HiFi Master Mix and an additional about 0.5 mM to about 4 mM, about 0.5 mM to about 3 mM, about 0.5 mM to about 2.5 mM, about 0.5 mM to about 1.0 mM, about 0.75 mM to about 1.25 mM, about 1.0 mM to about 1.5 mM, about 1.0 to about 2.0 mM, about 2.0 mM to about 3.0 mM, about 1.25 to about 1.75 mM, about 1.3 to about 1.8 mM, about 1.4 mM to about 1.7 mM, or about 1.5 to about 2.0 mM dNTPs in the reaction mixture. In some embodiments, amplification of rearranged immune receptor gDNA can be conducted using a 5× Ion AmpliSeq™ HiFi Master Mix and an additional about 0.2 mM, about 0.4 mM, about 0.6 mM, about 0.8 mM, about 1.0 mM, about 1.2 mM, about 1.4 mM, about 1.6 mM, about 1.8 mM, about 2.0 mM, about 2.2 mM, about 2.4 mM, about 2.6 mM, about 2.8 mM, about 3.0 mM, about 3.5 mM, or about 4.0 mM dNTPs in the reaction mixture. In some embodiments, about 10 mM to about 200 mM potassium chloride is added to the multiplex amplification reaction. 
In some embodiments, amplification of rearranged immune receptor gDNA can be conducted using a 5× Ion AmpliSeq™ HiFi Master Mix and an additional about 10 mM to about 200 mM potassium chloride in the reaction mixture. In some embodiments, amplification of rearranged immune receptor gDNA can be conducted using a 5× Ion AmpliSeq™ HiFi Master Mix and an additional about 10 mM to about 60 mM, about 20 mM to about 70 mM, about 30 mM to about 80 mM, about 40 mM to about 90 mM, about 50 mM to about 100 mM, about 60 mM to about 120 mM, about 80 mM to about 140 mM, about 50 mM to about 150 mM, about 150 mM to about 200 mM or about 100 mM to about 200 mM potassium chloride in the reaction mixture. In some embodiments, amplification of rearranged immune receptor gDNA can be conducted using a 5× Ion AmpliSeq™ HiFi Master Mix and an additional about 10 mM, about 20 mM, about 30 mM, about 40 mM, about 50 mM, about 60 mM, about 70 mM, about 80 mM, about 90 mM, about 100 mM, about 120 mM, about 140 mM, about 150 mM, about 160 mM, about 180 mM, or about 200 mM potassium chloride in the reaction mixture.
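
As a worked illustration of the additive concentrations above, the volume of a stock solution needed to contribute a given additional concentration follows from the standard dilution relation C1·V1 = C2·V2. The sketch below is hedged: the 20 µL reaction volume and the 10 mM dNTP and 1 M potassium chloride stock concentrations are assumptions chosen for the example, not values from the disclosure, and the function and constant names are hypothetical. The range constants restate the "about 0.2 mM to about 5.0 mM" and "about 10 mM to about 200 mM" ranges as hard bounds for simplicity.

```python
# Ranges stated in the text for the additional additive concentration (mM).
DNTP_RANGE_MM = (0.2, 5.0)
KCL_RANGE_MM = (10.0, 200.0)


def additive_volume_ul(final_mm: float, stock_mm: float, reaction_ul: float) -> float:
    """Volume of stock (µL) that contributes `final_mm` to a reaction of
    total volume `reaction_ul`, using C1 * V1 = C2 * V2."""
    if final_mm <= 0 or stock_mm <= 0 or reaction_ul <= 0:
        raise ValueError("all quantities must be positive")
    if final_mm >= stock_mm:
        raise ValueError("stock must be more concentrated than the target")
    return final_mm * reaction_ul / stock_mm


def in_range(value_mm: float, bounds: tuple) -> bool:
    """True if value_mm lies within the inclusive (low, high) bounds."""
    low, high = bounds
    return low <= value_mm <= high


if __name__ == "__main__":
    reaction_ul = 20.0  # assumed total reaction volume
    # Assumed stocks: 10 mM dNTP mix, 1000 mM (1 M) potassium chloride.
    print(additive_volume_ul(1.5, 10.0, reaction_ul))     # dNTP stock volume, µL
    print(additive_volume_ul(50.0, 1000.0, reaction_ul))  # KCl stock volume, µL
    print(in_range(1.5, DNTP_RANGE_MM), in_range(50.0, KCL_RANGE_MM))
```

The calculation covers only the additive's own contribution; any dNTPs already supplied by the master mix are on top of this amount.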


In some embodiments, phosphorylation of the amplicons can be conducted using a FuPa reagent. In some embodiments, the FuPa reagent can include a DNA polymerase, a DNA ligase, at least one uracil cleaving or modifying enzyme, and/or a storage buffer. In some embodiments, the FuPa reagent can further include at least one of the following: a preservative and/or a detergent.


In some embodiments, phosphorylation of the amplicons can be conducted using a FuPa reagent. In some embodiments, the FuPa reagent can include a DNA polymerase, at least one uracil cleaving or modifying enzyme, an antibody and/or a storage buffer. In some embodiments, the FuPa reagent can further include at least one of the following: a preservative and/or a detergent. In some embodiments, the antibody is provided to inhibit the DNA polymerase and 3′-5′ exonuclease activities at ambient temperature.


In some embodiments, the amplicon library produced by the teachings of the present disclosure is sufficient in yield to be used in a variety of downstream applications including the Ion Chef™ instrument and the Ion S5™ Sequencing Systems (Thermo Fisher Scientific).


It will be apparent to one of ordinary skill in the art that numerous other techniques, platforms or methods for clonal amplification such as wildfire PCR and bridge amplification can be used in conjunction with the amplified target sequences of the present disclosure. It is also envisaged that one of ordinary skill in the art upon further refinement or optimization of the conditions provided herein can proceed directly to nucleic acid sequencing (for example using Ion PGM™ System or Ion S5™ System or Ion Proton™ System sequencers, Thermo Fisher Scientific) without performing a clonal amplification step.


In some embodiments, at least one of the amplified target sequences to be clonally amplified can be attached to a support or particle. The support can be comprised of any suitable material and have any suitable shape, including, for example, planar, spheroid or particulate. In some embodiments, the support is a scaffolded polymer particle as described in U.S. Published App. No. 20100304982, hereby incorporated by reference in its entirety.


In some embodiments, a kit is provided for amplifying multiple immune receptor expression sequences from a population of nucleic acid molecules in a single reaction. In some embodiments, the kit includes a plurality of target-specific primer pairs containing one or more cleavable groups, one or more DNA polymerases, a mixture of dNTPs and at least one cleaving reagent. In one embodiment, the cleavable group is 8-oxo-deoxyguanosine, deoxyuridine or bromodeoxyuridine. In some embodiments, the at least one cleaving reagent includes RNaseH, uracil DNA glycosylase, Fpg or alkali. In one embodiment, the cleaving reagent is uracil DNA glycosylase. In some embodiments, the kit is provided to perform multiplex PCR in a single reaction chamber or vessel. In some embodiments, the kit includes at least one DNA polymerase, which is a thermostable DNA polymerase. In some embodiments, the concentration of the one or more DNA polymerases is present in a 3-fold excess as compared to a single PCR reaction. In some embodiments, the final concentration of each target-specific primer pair is present at about 5 nM to about 2000 nM. In some embodiments, the final concentration of each target-specific primer pair is present at about 25 nM to about 50 nM or about 100 nM to about 800 nM. In some embodiments, the final concentration of each target-specific primer pair is present at about 50 nM to about 400 nM or about 50 nM to about 200 nM. In some embodiments, the final concentration of each target-specific primer pair is present at about 200 nM or about 400 nM. In some embodiments, the kit provides amplification of immune repertoire expression sequences from immunoglobulin heavy chain gamma, immunoglobulin heavy chain mu, immunoglobulin heavy chain alpha, immunoglobulin heavy chain delta, immunoglobulin heavy chain epsilon, immunoglobulin light chain lambda, or immunoglobulin light chain kappa from a population of nucleic acid molecules in a single reaction chamber. 
In particular embodiments, a provided kit is a test kit. In some embodiments, the kit further comprises one or more adapters, barcodes, and/or antibodies.









TABLE 1
IgL lambda V gene FR3

Sequence | SEQ ID NO | Sequence | SEQ ID NO
ATCTCTGGGCTCCAGGCTG | 1 | GAGCCCAAGCCGGGGATGAG | 285
ATCTCTGGGCTCTAGTCTG | 2 | GAGCCCAGGCTGGGGACGAG | 286
ATTACTGGACTCCAGCCTG | 3 | GGGCCCAGGCAGATGATGAA | 287
CTGTCAGGTGTGCAGCCTG | 4 | GAGTCCAGGCAGAAGACGAG | 288
ATCTCCAACCTCCAGTTAG | 5 | GACTGAAGACTGAGGACGAG | 289
ATCTCTGGGCTCCAGCCTG | 6 | CAATCCCGTCTGAGGATGGA | 290
ATCTCTGGGCTCCAGTCTG | 7 | GGCTCTGGGCTGAGGACAAG | 291
ATCAGCAGGGCTCAGACTG | 8 | ATGGGGGCACAGGATGGAAA | 292
CTTTTGGGTGCGCAGCCTG | 9 | ATGGGGCACAGGATGGAAA | 293
ATCACTGGACTCCAGTCTG | 10 | ACAGGGCCCAGGCTGGGA | 294
ATCAGTGGGCTCCAGTCTG | 11 | CTGAGCAGCCTGAGATCAAG | 295
ATCACTGGGGCCCAGGCTG | 12 | ACAGGGCCCAGGCTGGGGA | 296
ATCTCCAACCTCCAGTCTG | 13 | ATGGGCCCCAGGCTGGAAA | 297
ACCTCTGGGCTCCAGGCTG | 14 | GTGGGGCCCAGGCCAGGGA | 298
CTTTCGGGTGCGCAGCCTG | 15 | TGTGCAGCCCGAGAGGTGAA | 299
ATCTCTGGACTCCAGGCTG | 16 | GACTCCAGUCTGAGGAUGAG | 300
ATCTCCGGGCTCCAGTCTG | 17 | GGCTCCAGUCTGAGGAUGAG | 301
ATCACTGGGCTCCAGGCTG | 18 | ACCTCCAGUTAGAGGAUGAG | 302
ATCTCCAACCTCCAGTTTG | 19 | ACCTCCAGUTTGAGGAUGAG | 303
ATCTCGGGCCTCTAGCCTG | 20 | GGGCCCAGGCUGAGGAUGAG | 304
ATCATCAGGGCTCAGACTG | 21 | GCCTCCAGUCTGAGGAUGAG | 305
ATCTCCAGCCTCCAGTCTG | 22 | GGCTCCGGUCCGAGGAUGAG | 306
GTCTCTGGGCTCCAGGCTG | 23 | GGCTCTAGUCTGAGGAUGAG | 307
ATCAGTGGGCTCCGGTCCG | 24 | GGCTCCAGUCTGAAGAUGAG | 308
CCATCAGCAGGGTCCTGACCG | 25 | GUGTGCAGCCUGAGGACGAG | 309
CCATCAGTGGAGTCCAGGCAG | 26 | GCCUCTAGCCUGAGGACGAG | 310
CCATCAGCAGGATCGAGGCTG | 27 | GTGCGCAGCCUGAGGAUGAG | 311
TCATTTCTACAATCCCGTCTG | 28 | GGCUCCAGGCUGAGGACGAG | 312
CCATCACTGGGGCTCAGGCGG | 29 | GACUCCAGCCUGAGGACGAG | 313
TCATCTCTGGGCTCCAGCCTG | 30 | ACCTCCAGUCTGAGGAUGAG | 314
CCATCACTGGGATTCAGGTTG | 31 | GGCTCCAGGCUGAGGAUGAG | 315
CCATTAGCAGGGTCCTGACCA | 32 | GACUCCAGGCUGAGGACGAG | 316
CCATCAGCGGGGCCCAGGTTG | 33 | GGGCUCAGACUGAGGACGAG | 317
CCATCAGCAGGGTCGAAGCCG | 34 | GGCTCCAGCCUGAGGAUGAG | 318
CCATCTCTGGCCTCCAGACCA | 35 | GGGUCCUGACCGAAGACGAG | 319
CCATCACGGGGGCCCAGGCAG | 36 | GCCUCTGGCCUGAGGACGAG | 320
GCATCACCGGACTCCAGACTG | 37 | GCCTCTGGCCUGAGGACUAG | 321
CCATCAGTGGAGCCCAGGCTG | 38 | GGATCGAGGCUGGGGAUGAG | 322
GCATCTCTGAGCTGCAGCCTG | 39 | GGGCCCAGGUGGAGGAUGAA | 323
CCATCACTGGGGCTCAGGTTG | 40 | GGCTCAAGUCCGAGGTUGAG | 324
CCATCTCTGGACTGAAGACTG | 41 | GCCTCCAGACCAAGGACAAG | 325
CTATCAGTGGGGCCCAGGTGG | 42 | GGCUCCAGCCUGAGGACGAG | 326
CCATCAAGAACATCCAGGAAG | 43 | GGGCCCAGGUTGAGGAUGAG | 327
CCATCAGCAGAGCCCAAGCCG | 44 | GGGUCCUGACCAAAGGCGGG | 328
CCATCTCTGGGCTCAAGTCCG | 45 | ACCTCCAGUCTGACGAUGAG | 329
GCACCACTGGGCTCTGGGCTG | 46 | AGCUGCAGCCUGAGGACGAG | 330
GCATCACTGGCCTCTGGCCTG | 47 | GGACCCAGGCUATGGAUGAG | 331
CCATCAGCGGGACCCAGGCTA | 48 | GACUCCAGACUGGGGACGAG | 332
CCATCAGTGGGGCCCAGGTGG | 49 | ACAUCCAGGAAGAAGAUGAG | 333
CCTTCTCCAACCTCCAGTCTG | 50 | GGAUTCAGGTUGAAGACAAG | 334
CTGATAATCAATGGGCCCCAG | 51 | GGGUCGAAGCCGGGGAUGAG | 335
TGACCATTAGTGGGGCCCAG | 52 | GAGUCCAGGCAGAAGAUGAG | 336
CTGACCATTAGTGGGGCCCAG | 53 | ACAUCCAGGAAGAGGAUGAG | 337
GGCCATCAACAGGGCCCAG | 54 | GGGCUCAGGTTGAACAUGAA | 338
CTGCACATTTCTGAGCAGCCTG | 55 | GGGCUCAGGCGGAAGAUGAG | 339
TTGATTATTAATGGGGCACAG | 56 | GAGCCCAAGCCGGGGATGAG | 340
TTGATTATTAATGGGGGCACAG | 57 | GAGCCCAGGCTGGGGACGAG | 341
CTCTTGGGTGTGCAGCCCGA | 58 | GGGCCCAGGCAGAUGAUGAA | 342
ATCTCTGGGCUCCAGGCUG | 59 | GAGTCCAGGCAGAAGACGAG | 343
ATCTCTGGGCUCTAGTCUG | 60 | GACUGAAGACUGAGGACGAG | 344
ATTACTGGACUCCAGCCUG | 61 | CAATCCCGUCTGAGGAUGGA | 345
CTGTCAGGUGTGCAGCCUG | 62 | GGCUCTGGGCUGAGGACAAG | 346
ATCTCCAACCUCCAGTUAG | 63 | AUGGGGGCACAGGAUGGAAA | 347
ATCTCTGGGCUCCAGCCUG | 64 | AUGGGGCACAGGAUGGAAA | 348
ATCTCTGGGCUCCAGTCUG | 65 | ACAGGGCCCAGGCTGGGA | 349
ATCAGCAGGGCUCAGACUG | 66 | CTGAGCAGCCUGAGAUCAAG | 350
CTTTTGGGUGCGCAGCCUG | 67 | ACAGGGCCCAGGCTGGGGA | 351
ATCACTGGACUCCAGTCUG | 68 | AUGGGCCCCAGGCUGGAAA | 352
ATCAGTGGGCUCCAGTCUG | 69 | GTGGGGCCCAGGCCAGGGA | 353
ATCACUGGGGCCCAGGCUG | 70 | TGUGCAGCCCGAGAGGUGAA | 354
ATCTCCAACCUCCAGTCUG | 71 | TCCGAGGATGAGGCTGATTATTAC | 355
ACCTCTGGGCUCCAGGCUG | 72 | GCTGAGGACGAGGCTGATTATTAG | 356
CTTTCGGGUGCGCAGCCUG | 73 | TTAGAGGATGAGGCTGATTATTAC | 357
ATCTCTGGACUCCAGGCUG | 74 | TTTGAGGATGAGGCTGATTATTAC | 358
ATCTCCGGGCUCCAGTCUG | 75 | CCTGAGGACGAGGCTGACTATTAC | 359
ATCACTGGGCUCCAGGCUG | 76 | GCTGAGGATGAGGCTGATTATTAC | 360
ATCTCCAACCUCCAGTTUG | 77 | CCTGAGGATGAGGCTGAGTATTAC | 361
ATCTCGGGCCUCTAGCCUG | 78 | CCTGAGGACGAGGCTGATTATTAC | 362
ATCATCAGGGCUCAGACUG | 79 | CCTGAGGATGAGGCTGACTATTAC | 363
ATCTCCAGCCUCCAGTCUG | 80 | TCTGAGGATGAGGCTGACTATTAC | 364
GTCTCTGGGCUCCAGGCUG | 81 | CCTGAGGACGAGGCTGAGTATTAC | 365
ATCAGTGGGCUCCGGUCCG | 82 | TCTGAGGATGAGGCTGATTATTAC | 366
CCAUCAGCAGGGTCCUGACCG | 83 | GCTGAGGACGAGGCTGATTATTAC | 367
CCAUCAGTGGAGUCCAGGCAG | 84 | TCTGAAGATGAGGCTGACTATTAC | 368
CCATCAGCAGGAUCGAGGCUG | 85 | ACTGAGGACGAGGCTGACTATTAC | 369
TCATTTCTACAAUCCCGTCUG | 86 | CCTGAGGACTAGGCCGATTATTAC | 370
CCATCACUGGGGCUCAGGCGG | 87 | GTTGAAGACAAGGCTGACTATTAC | 371
TCATCTCTGGGCUCCAGCCUG | 88 | ACTGGGGACGAGGCCGATTATTAC | 372
CCATCACUGGGATTCAGGTUG | 89 | GCTGGGGATGAGGCTGACTATTAC | 373
CCATUAGCAGGGTCCUGACCA | 90 | GCCGGGGATGAGGCCGACTATTAC | 374
CCAUCAGCGGGGCCCAGGTUG | 91 | GTTGAGGATGAGGCTGACTATTAC | 375
CCAUCAGCAGGGUCGAAGCCG | 92 | GCTGAGGACAAGACTGATTATCAC | 376
CCATCUCTGGCCUCCAGACCA | 93 | TCTGACGATGAGGCTGAGTATCAC | 377
CCATCACGGGGGCCCAGGCAG | 94 | ACCGAAGACGAGGCTGACTATTAC | 378
GCATCACCGGACUCCAGACUG | 95 | TCCGAGGTTGAGGCTAATTATCAC | 379
CCATCAGUGGAGCCCAGGCUG | 96 | GTGGAGGATGAAGATGACTACTAC | 380
GCATCTCUGAGCTGCAGCCUG | 97 | CCTGAGGACGAGGCTATGTATTAC | 381
CCATCACUGGGGCTCAGGTUG | 98 | ACCAAGGACAAGCCTGCCTATTAC | 382
CCATCTCUGGACTGAAGACUG | 99 | GTTGAACATGAAGCTGACTATTAC | 383
CTATCAGUGGGGCCCAGGUGG | 100 | TCTGAGGATGGAGCTGACTATATC | 384
CCAUCAAGAACAUCCAGGAAG | 101 | GAAGAAGATGAGAGTGACTACCAC | 385
CCATCAGCAGAGCCCAAGCCG | 102 | GCCGGGGATGAGGCTGACTATTAC | 386
CCATCTCUGGGCTCAAGUCCG | 103 | GCTATGGATGAGGCTGACTATTAC | 387
GCACCACUGGGCTCTGGGCUG | 104 | GCAGATGATGAACTGATTATTAC | 388
GCATCACUGGCCTCTGGCCUG | 105 | GTGGAGGATGAAGCTGACTACTAC | 389
CCAUCAGCGGGACCCAGGCUA | 106 | ACTGAGGACGAGGCTGACTACTAC | 390
CCATCAGUGGGGCCCAGGUGG | 107 | ACCAAAGGCGGGGCTGACTATTAC | 391
CCTTCTCCAACCUCCAGTCUG | 108 | GCAGATGATGAATCTGATTATTAC | 392
CUGATAATCAAUGGGCCCCAG | 109 | GCTGGGGACGAGGCTTTCCTCT | 393
UGACCATTAGUGGGGCCCAG | 110 | GCAGAAGATGAGGCTGACTATTAC | 394
CUGACCATTAGUGGGGCCCAG | 111 | GAAGAGGATGAGAGTGACTACCAC | 395
GGCCATCAACAGGGCCCAG | 112 | GCGGAAGATGAGGCTGACTATTAC | 396
CTGCACATTTCUGAGCAGCCUG | 113 | CCTGAGGACGAGGCCGATTATTAC | 397
TUGATTATTAAUGGGGCACAG | 114 | GCAGAAGACGAGGCTGACTATTAC | 398
TUGATTATTAAUGGGGGCACAG | 115 | ACAGGATGGAAACAAGGCTATTAC | 399
CUCTTGGGTGUGCAGCCCGA | 116 | CCAGGCTGGAAACAAGGCTATTAC | 400
ATCATCAGGGCTCAGACTGAG | 117 | CCAGGCCAGGGACGAGGCTATTAC | 401
ATCTCTGGACTCCAGGCTGAG | 118 | CCAGGCTGGGGACCAGGCTATTAC | 402
ATCACTGGGGCCCAGGCTGAG | 119 | CCTGAGATCAAGTCCGACTATTAC | 403
ATCTCCAGCCTCCAGTCTGAG | 120 | CCAGGCTGGGACGAGGCTATTAC | 404
ATCTCTGGGCTCCAGTCTGAG | 121 | CCGAGAGGTGAAGCTGAGTACTAC | 405
ATCTCTGGGCTCCAGTCTGAA | 122 | TCCGAGGAUGAGGCTGATTATUAC | 406
GTCTCTGGGCTCCAGGCTGAG | 123 | GCTGAGGACGAGGCUGATTATUAG | 407
ATCTCCGGGCTCCAGTCTGAG | 124 | TTAGAGGAUGAGGCTGATTATUAC | 408
ATCACTGGACTCCAGTCTGAG | 125 | TTTGAGGAUGAGGCTGATTATUAC | 409
ATCTCTGGGCTCCAGCCTGAG | 126 | CCTGAGGACGAGGCUGACTATUAC | 410
ATCAGCAGGGCTCAGACTGAG | 127 | GCTGAGGAUGAGGCTGATTATUAC | 411
CTTTTGGGTGCGCAGCCTGAG | 128 | CCTGAGGAUGAGGCTGAGTATUAC | 412
CTTTCGGGTGCGCAGCCTGAG | 129 | CCTGAGGACGAGGCUGATTATUAC | 413
ATCTCTGGGCTCCAGGCTGAG | 130 | CCTGAGGAUGAGGCTGACTATUAC | 414
ATCTCGGGCCTCTAGCCTGAG | 131 | TCTGAGGAUGAGGCTGACTATUAC | 415
ATCTCCAACCTCCAGTTTGAG | 132 | CCTGAGGACGAGGCUGAGTATUAC | 416
ATCTCTGGGCTCTAGTCTGAG | 133 | TCTGAGGAUGAGGCTGATTATUAC | 417
ACCTCTGGGCTCCAGGCTGAG | 134 | GCTGAGGACGAGGCUGATTATUAC | 418
ATCAGTGGGCTCCAGTCTGAG | 135 | TCTGAAGAUGAGGCTGACTATUAC | 419
ATCAGTGGGCTCCGGTCCGAG | 136 | ACTGAGGACGAGGCUGACTATUAC | 420
ATCTCCAACCTCCAGTTAGAG | 137 | CCTGAGGACUAGGCCGATTATUAC | 421
ATTACTGGACTCCAGCCTGAG | 138 | GTTGAAGACAAGGCUGACTATUAC | 422
ATCTCCAACCTCCAGTCTGAG | 139 | ACUGGGGACGAGGCCGATTATUAC | 423
ATCACTGGGCTCCAGGCTGAG | 140 | GCTGGGGAUGAGGCTGACTATUAC | 424
CTGTCAGGTGTGCAGCCTGAG | 141 | GCCGGGGAUGAGGCCGACTATUAC | 425
CATCTCTGGGCTCAAGTCCGAG | 142 | GTTGAGGAUGAGGCTGACTATUAC | 426
CATCTCTGGCCTCCAGACCAAG | 143 | GCTGAGGACAAGACUGATTAUCAC | 427
CATTTCTACAATCCCGTCTGAG | 144 | TCTGACGAUGAGGCTGAGTAUCAC | 428
CATCACTGGCCTCTGGCCTGAG | 145 | ACCGAAGACGAGGCUGACTATUAC | 429
CTTCTCCAACCTCCAGTCTGAC | 146 | TCCGAGGTUGAGGCTAATTAUCAC | 430
CATCACTGGGGCTCAGGCGGAA | 147 | GTGGAGGAUGAAGATGACTACUAC | 431
CATCAAGAACATCCAGGAAGAG | 148 | CCTGAGGACGAGGCUATGTATUAC | 432
CATCACTGGGGCTCAGGTTGAA | 149 | ACCAAGGACAAGCCUGCCTATUAC | 433
CATTAGCAGGGTCCTGACCAAA | 150 | GTTGAACAUGAAGCTGACTATUAC | 434
CATCAGCAGGGTCCTGACCGAA | 151 | TCTGAGGAUGGAGCTGACTATAUC | 435
CATCAGCAGAGCCCAAGCCGGG | 152 | GAAGAAGAUGAGAGTGACUACCAC | 436
CATCTCTGGGCTCCAGCCTGAG | 153 | GCCGGGGAUGAGGCTGACTATUAC | 437
CATCACGGGGGCCCAGGCAGAT | 154 | GCTATGGAUGAGGCTGACTATUAC | 438
CATCAGCAGGGTCGAAGCCGGG | 155 | GCAGATGAUGAACTGATTATUAC | 439
CATCAGTGGAGTCCAGGCAGAA | 156 | GTGGAGGAUGAAGCTGACTACUAC | 440
CATCACTGGGATTCAGGTTGAA | 157 | ACTGAGGACGAGGCUGACTACUAC | 441
CATCAGCGGGGCCCAGGTTGAG | 158 | ACCAAAGGCGGGGCUGACTATUAC | 442
TATCAGTGGGGCCCAGGTGGAG | 159 | GCAGATGAUGAATCTGATTATUAC | 443
CATCAGTGGGGCCCAGGTGGAG | 160 | GCTGGGGACGAGGCUTTCCTCU | 444
CATCACCGGACTCCAGACTGGG | 161 | GCAGAAGAUGAGGCTGACTATUAC | 445
CATCTCTGGACTGAAGACTGAG | 162 | GAAGAGGAUGAGAGTGACUACCAC | 446
CATCAGTGGAGCCCAGGCTGGG | 163 | GCGGAAGAUGAGGCTGACTATUAC | 447
CATCTCTGAGCTGCAGCCTGAG | 164 | CCUGAGGACGAGGCCGATTATUAC | 448
CATCAGCAGGATCGAGGCTGGG | 165 | GCAGAAGACGAGGCUGACTATUAC | 449
CACCACTGGGCTCTGGGCTGAG | 166 | ACAGGAUGGAAACAAGGCTATUAC | 450
CATCAGCGGGACCCAGGCTATG | 167 | CCAGGCUGGAAACAAGGCTATUAC | 451
CATCAAGAACATCCAGGAAGAA | 168 | CCAGGCCAGGGACGAGGCUATUAC | 452
GGCCATCAACAGGGCCCAGGC | 169 | CCAGGCUGGGGACCAGGCTATUAC | 453
GACCATTAGTGGGGCCCAGGC | 170 | CCTGAGATCAAGUCCGACTATUAC | 454
GATTATTAATGGGGGCACAGGA | 171 | CCAGGCUGGGACGAGGCTATUAC | 455
GCACATTTCTGAGCAGCCTGAG | 172 | CCGAGAGGUGAAGCTGAGTACUAC | 456
GATTATTAATGGGGCACAGGA | 173 | GCTGAGGATGAGGCTGATTATTACT | 457
GATAATCAATGGGCCCCAGGC | 174 | CCTGAGGACGAGGCTGAGTATTACT | 458
CTTGGGTGTGCAGCCCGAGA | 175 | TCTGAGGATGAGGCTGACTATTACT | 459
ATCATCAGGGCUCAGACUGAG | 176 | TCCGAGGATGAGGCTGATTATTACT | 460
ATCTCTGGACUCCAGGCUGAG | 177 | CCTGAGGACGAGGCTGACTATTACT | 461
ATCACUGGGGCCCAGGCUGAG | 178 | ACTGAGGACGAGGCTGACTATTACT | 462
ATCTCCAGCCUCCAGTCUGAG | 179 | TCTGAAGATGAGGCTGACTATTACT | 463
ATCTCTGGGCUCCAGTCUGAG | 180 | TTAGAGGATGAGGCTGATTATTACT | 464
ATCTCTGGGCUCCAGTCUGAA | 181 | GCTGAGGACGAGGCTGATTATTAGT | 465
GTCTCTGGGCUCCAGGCUGAG | 182 | TCTGAGGATGAGGCTGATTATTACT | 466
ATCTCCGGGCUCCAGTCUGAG | 183 | CCTGAGGATGAGGCTGACTATTACT | 467
ATCACTGGACUCCAGTCUGAG | 184 | GCTGAGGACGAGGCTGATTATTACT | 468
ATCTCTGGGCUCCAGCCUGAG | 185 | CCTGAGGATGAGGCTGAGTATTACT | 469
ATCAGCAGGGCUCAGACUGAG | 186 | CCTGAGGACGAGGCTGATTATTACT | 470
CTTTTGGGUGCGCAGCCUGAG | 187 | TTTGAGGATGAGGCTGATTATTACT | 471
CTTTCGGGUGCGCAGCCUGAG | 188 | GCCTGAGGACGAGGCTATGTATTACT | 472
ATCTCTGGGCUCCAGGCUGAG | 189 | GGCAGAAGACGAGGCTGACTATTACT | 473
ATCTCGGGCCUCTAGCCUGAG | 190 | GTCTGACGATGAGGCTGAGTATCACT | 474
ATCTCCAACCUCCAGTTUGAG | 191 | GGCAGATGATGAATCTGATTATTACT | 475
ATCTCTGGGCUCTAGTCUGAG | 192 | GTCCGAGGTTGAGGCTAATTATCACT | 476
ACCTCTGGGCUCCAGGCUGAG | 193 | GGTTGAACATGAAGCTGACTATTACC | 477
ATCAGTGGGCUCCAGTCUGAG | 194 | GGCTATGGATGAGGCTGACTATTACT | 478
ATCAGTGGGCUCCGGUCCGAG | 195 | GGCGGAAGATGAGGCTGACTATTACT | 479
ATCTCCAACCUCCAGTUAGAG | 196 | GGAAGAGGATGAGAGTGACTACCACT | 480
ATTACTGGACUCCAGCCUGAG | 197 | GGTTGAGGATGAGGCTGACTATTACT | 481
ATCTCCAACCUCCAGTCUGAG | 198 | GACCAAAGGCGGGGCTGACTATTACT | 482
ATCACTGGGCUCCAGGCUGAG | 199 | GACCAAGGACAAGCCTGCCTATTACT | 483
CTGTCAGGUGTGCAGCCUGAG | 200 | GGCTGGGGATGAGGCTGACTATTACT | 484
CATCTCUGGGCTCAAGUCCGAG | 201 | GACTGGGGACGAGGCCGATTATTACT | 485
CAUCTCTGGCCUCCAGACCAAG | 202 | GGTTGAAGACAAGGCTGACTATTACT | 486
CATTTCTACAAUCCCGTCUGAG | 203 | GACCGAAGACGAGGCTGACTATTACT | 487
CATCACTGGCCUCTGGCCUGAG | 204 | GGTGGAGGATGAAGATGACTACTACT | 488
CTTCTCCAACCUCCAGTCUGAC | 205 | GACTGAGGACGAGGCTGACTACTACT | 489
CATCACUGGGGCUCAGGCGGAA | 206 | GTCTGAGGATGGAGCTGACTATATCT | 490
CAUCAAGAACAUCCAGGAAGAG | 207 | AGCCGGGGATGAGGCCGACTATTACT | 491
CATCACUGGGGCTCAGGTUGAA | 208 | GGCAGATGATGAACTGATTATTACT | 492
CATUAGCAGGGTCCUGACCAAA | 209 | GGTGGAGGATGAAGCTGACTACTACT | 493
CATCAGCAGGGUCCUGACCGAA | 210 | GGAAGAAGATGAGAGTGACTACCACT | 494
CATCAGCAGAGCCCAAGCCGGG | 211 | GGCTGAGGACAAGACTGATTATCACT | 495
CATCTCTGGGCUCCAGCCUGAG | 212 | GCCTGAGGACTAGGCCGATTATTACT | 496
CATCACGGGGGCCCAGGCAGAT | 213 | GGCAGAAGATGAGGCTGACTATTACT | 497
CAUCAGCAGGGUCGAAGCCGGG | 214 | GCCTGAGGACGAGGCCGATTATTACT | 498
CAUCAGTGGAGUCCAGGCAGAA | 215 | AGCCGGGGATGAGGCTGACTATTACT | 499
CATCACUGGGATTCAGGTUGAA | 216 | GGCTGGGGACGAGGCTTTCCTCT | 500
CAUCAGCGGGGCCCAGGTUGAG | 217 | ACAGGATGGAAACAAGGCTATTACT | 501
TATCAGUGGGGCCCAGGUGGAG | 218 | CCAGGCTGGGACGAGGCTATTACT | 502
CATCAGUGGGGCCCAGGUGGAG | 219 | CCAGGCCAGGGACGAGGCTATTACT | 503
CATCACCGGACUCCAGACUGGG | 220 | CCAGGCTGGGGACCAGGCTATTACT | 504
CATCTCUGGACTGAAGACUGAG | 221 | CCAGGCTGGAAACAAGGCTATTACT | 505
CATCAGUGGAGCCCAGGCUGGG | 222 | CCTGAGATCAAGTCCGACTATTACT | 506
CATCTCTGAGCUGCAGCCUGAG | 223 | CCGAGAGGTGAAGCTGAGTACTACT | 507
CATCAGCAGGAUCGAGGCUGGG | 224 | GCTGAGGAUGAGGCTGATTATUACT | 508
CACCACTGGGCUCTGGGCUGAG | 225 | CCTGAGGACGAGGCUGAGTATUACT | 509
CAUCAGCGGGACCCAGGCTAUG | 226 | TCTGAGGAUGAGGCTGACTATUACT | 510
CAUCAAGAACAUCCAGGAAGAA | 227 | TCCGAGGAUGAGGCTGATTATUACT | 511
GGCCATCAACAGGGCCCAGGC | 228 | CCTGAGGACGAGGCUGACTATUACT | 512
GACCAUTAGUGGGGCCCAGGC | 229 | ACTGAGGACGAGGCUGACTATUACT | 513
GAUTATTAAUGGGGGCACAGGA | 230 | TCTGAAGAUGAGGCTGACTATUACT | 514
GCACATTTCUGAGCAGCCUGAG | 231 | TTAGAGGAUGAGGCTGATTATUACT | 515
GAUTATTAAUGGGGCACAGGA | 232 | GCTGAGGACGAGGCUGATTATUAGT | 516
GAUAATCAAUGGGCCCCAGGC | 233 | TCTGAGGAUGAGGCTGATTATUACT | 517
CUTGGGTGUGCAGCCCGAGA | 234 | CCTGAGGAUGAGGCTGACTATUACT | 518
GACTCCAGTCTGAGGATGAG | 235 | GCTGAGGACGAGGCUGATTATUACT | 519
GGCTCCAGTCTGAGGATGAG | 236 | CCTGAGGAUGAGGCTGAGTATUACT | 520
ACCTCCAGTTAGAGGATGAG | 237 | CCTGAGGACGAGGCUGATTATUACT | 521
ACCTCCAGTTTGAGGATGAG | 238 | TTTGAGGAUGAGGCTGATTATUACT | 522
GGGCCCAGGCTGAGGATGAG | 239 | GCCTGAGGACGAGGCUATGTATUACT | 523
GCCTCCAGTCTGAGGATGAG | 240 | GGCAGAAGACGAGGCUGACTATUACT | 524
GGCTCCGGTCCGAGGATGAG | 241 | GTCTGACGAUGAGGCTGAGTAUCACT | 525
GGCTCTAGTCTGAGGATGAG | 242 | GGCAGATGAUGAATCTGATTATUACT | 526
GGCTCCAGTCTGAAGATGAG | 243 | GTCCGAGGTUGAGGCTAATTAUCACT | 527
GTGTGCAGCCTGAGGACGAG | 244 | GGTTGAACAUGAAGCTGACTATUACC | 528
GCCTCTAGCCTGAGGACGAG | 245 | GGCTATGGAUGAGGCTGACTATUACT | 529
GTGCGCAGCCTGAGGATGAG | 246 | GGCGGAAGAUGAGGCTGACTATUACT | 530
GGCTCCAGGCTGAGGACGAG | 247 | GGAAGAGGAUGAGAGTGACUACCACT | 531
GACTCCAGCCTGAGGACGAG | 248 | GGTTGAGGAUGAGGCTGACTATUACT | 532
ACCTCCAGTCTGAGGATGAG | 249 | GACCAAAGGCGGGGCUGACTATUACT | 533
GGCTCCAGGCTGAGGATGAG | 250 | GACCAAGGACAAGCCUGCCTATUACT | 534
GACTCCAGGCTGAGGACGAG | 251 | GGCTGGGGAUGAGGCTGACTATUACT | 535
GGGCTCAGACTGAGGACGAG | 252 | GACUGGGGACGAGGCCGATTATUACT | 536
GGCTCCAGCCTGAGGATGAG | 253 | GGTTGAAGACAAGGCUGACTATUACT | 537
GGGTCCTGACCGAAGACGAG | 254 | GACCGAAGACGAGGCUGACTATUACT | 538
GCCTCTGGCCTGAGGACGAG | 255 | GGTGGAGGAUGAAGATGACTACUACT | 539
GCCTCTGGCCTGAGGACTAG | 256 | GACTGAGGACGAGGCUGACTACUACT | 540
GGATCGAGGCTGGGGATGAG | 257 | GTCTGAGGAUGGAGCTGACTATAUCT | 541
GGGCCCAGGTGGAGGATGAA | 258 | AGCCGGGGAUGAGGCCGACTATUACT | 542
GGCTCAAGTCCGAGGTTGAG | 259 | GGCAGATGAUGAACTGATTATUACT | 543
GCCTCCAGACCAAGGACAAG | 260 | GGTGGAGGAUGAAGCTGACTACUACT | 544
GGCTCCAGCCTGAGGACGAG | 261 | GGAAGAAGAUGAGAGTGACUACCACT | 545
GGGCCCAGGTTGAGGATGAG | 262 | GGCUGAGGACAAGACTGATTAUCACT | 546
GGGTCCTGACCAAAGGCGGG | 263 | GCCTGAGGACUAGGCCGATTATUACT | 547
ACCTCCAGTCTGACGATGAG | 264 | GGCAGAAGAUGAGGCTGACTATUACT | 548
AGCTGCAGCCTGAGGACGAG | 265 | GCCUGAGGACGAGGCCGATTATUACT | 1447
GGACCCAGGCTATGGATGAG | 266 | AGCCGGGGAUGAGGCTGACTATUACT | 1448
GACTCCAGACTGGGGACGAG | 267 | GGCTGGGGACGAGGCUTTCCTCU | 1449
ACATCCAGGAAGAAGATGAG | 268 | ACAGGAUGGAAACAAGGCTATUACT | 1450
GGATTCAGGTTGAAGACAAG | 269 | CCAGGCUGGGACGAGGCTATUACT | 1451
GGGTCGAAGCCGGGGATGAG | 270 | CCAGGCCAGGGACGAGGCUATUACT | 1452
GAGTCCAGGCAGAAGATGAG | 271 | CCAGGCUGGGGACCAGGCTATUACT | 1453
ACATCCAGGAAGAGGATGAG | 272 | CCAGGCUGGAAACAAGGCTATUACT | 1454
GGGCTCAGGTTGAACATGAA | 273 | CCTGAGATCAAGUCCGACTATUACT | 1455
GGGCTCAGGCGGAAGATGAG | 274 | CCGAGAGGUGAAGCTGAGTACUACT | 1456
GACGGAGACCAAGGAUGTUGGA | 275 | GACGGAGACCAAGGATGTTGGA | 1457
GGTGGAGGCUGAGGATGTUGGA | 276 | GGTGGAGGCTGAGGATGTTGGA | 1458
GGTAGAGGCUGAGGACGTUGGG | 277 | GGTAGAGGCTGAGGACGTTGGG | 1459
GGTGGAGGCUGAGGATTTUGGA | 278 | GGTGGAGGCTGAGGATTTTGGA | 1460
GGTGGAGGCUGAGGATGTUGGG | 279 | GGTGGAGGCTGAGGATGTTGGG | 1461
GACGGAGACUAAGGATGTUGGA | 280 | GACGGAGACTAAGGATGTTGGA | 1462
AGTGGAGGCUGAGGATGTUGGG | 281 | AGTGGAGGCTGAGGATGTTGGG | 1463
GGTGGAAGCUGAGGATGUCGGG | 282 | GGTGGAAGCTGAGGATGTCGGG | 1464
GATGGATGCUGAGGATGTUGGG | 283 | GATGGATGCTGAGGATGTTGGG | 1465
GGTGGAGGCUGAGGATATUCGA | 284 | GGTGGAGGCTGAGGATATTCGA | 1466
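
Several entries in Table 1 appear as pairs that differ only in that some thymine (T) positions are written as uracil (U), for example SEQ ID NO: 275 and SEQ ID NO: 1457. The short sketch below shows one way to check that correspondence and to list the U positions in a primer. It assumes that "U" in the listing denotes a uracil-containing (cleavable) nucleotide, in line with the deoxyuridine cleavable group mentioned for the kit. The function names and the small `PAIRS` dictionary are hypothetical and introduced only for illustration.

```python
def canonical(primer: str) -> str:
    """Return the primer with every U written as T (upper case)."""
    return primer.upper().replace("U", "T")


def uracil_positions(primer: str) -> list:
    """1-based positions of U in the primer, counted from the 5' end."""
    return [i for i, base in enumerate(primer.upper(), start=1) if base == "U"]


# A few sequence pairs copied from Table 1, keyed by their SEQ ID NOs:
# (U-containing entry, T-only entry).
PAIRS = {
    (59, 1): ("ATCTCTGGGCUCCAGGCUG", "ATCTCTGGGCTCCAGGCTG"),
    (275, 1457): ("GACGGAGACCAAGGAUGTUGGA", "GACGGAGACCAAGGATGTTGGA"),
    (284, 1466): ("GGTGGAGGCUGAGGATATUCGA", "GGTGGAGGCTGAGGATATTCGA"),
}

if __name__ == "__main__":
    for (u_id, t_id), (with_u, without_u) in PAIRS.items():
        same = canonical(with_u) == without_u
        print(f"SEQ ID NO {u_id} vs {t_id}: same bases = {same}, "
              f"U at positions {uracil_positions(with_u)}")
```

Normalizing U to T in this way is also a convenient first step before aligning the listed primers against reference V gene sequences, which are written with T only.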
















TABLE 2
IgL lambda J gene

Sequence | SEQ ID NO
GACGGTCAGCTCCGTCCC | 549
GACGGTCAGCTCGGTCCC | 550
GACGGTGACCTTGGTCCC | 551
GACGGTCAGCTGGGTGCC | 552
AATGATCAGCTGGGTTCC | 553
GACGGTCACCTTGGTGCC | 554
GACGGTCAGCTTGGTCCC | 555
GGCGGTCAGCTGGGTGCC | 556
GACGGUCAGCTCCGUCCC | 557
GACGGUCAGCTCGGUCCC | 558
GACGGUGACCTTGGUCCC | 559
GACGGUCAGCTGGGUGCC | 560
AATGATCAGCUGGGTUCC | 561
GACGGUCACCTTGGUGCC | 562
GACGGUCAGCTTGGUCCC | 563
GGCGGUCAGCTGGGUGCC | 564
GGACGGTCAGCTGGGTGC | 565
GGGCGGTCAGCTGGGTGC | 566
AAATGATCAGCTGGGTTC | 567
GGACGGTGACCTTGGTCC | 568
GGACGGTCACCTTGGTGC | 569
GGACGGTCAGCTTGGTCC | 570
GGACGGTCAGCTCCGTCC | 571
GGACGGTCAGCTCGGTCC | 572
GGACGGUCAGCTGGGUGC | 573
GGGCGGUCAGCTGGGUGC | 574
AAATGAUCAGCTGGGTUC | 575
GGACGGUGACCTTGGUCC | 576
GGACGGUCACCTTGGUGC | 577
GGACGGUCAGCTTGGUCC | 578
GGACGGUCAGCTCCGUCC | 579
GGACGGUCAGCTCGGUCC | 580
CGAGGACGGTCAGCTGGGT | 581
CGAGGGCGGTCAGCTGGGT | 582
CTAGGACGGTGACCTTGGT | 583
CGAGGACGGTCACCTTGGT | 584
CTAGGACGGTCAGCTCCGT | 585
CTAGGACGGTCAGCTTGGT | 586
CTAAAATGATCAGCTGGGT | 587
CTAGGACGGTCAGCTCGGT | 588
CGAGGACGGUCAGCUGGGT | 589
CGAGGGCGGUCAGCUGGGT | 590
CTAGGACGGUGACCTUGGT | 591
CGAGGACGGUCACCTUGGT | 592
CTAGGACGGUCAGCUCCGT | 593
CTAGGACGGUCAGCTUGGT | 594
CTAAAAUGATCAGCUGGGT | 595
CTAGGACGGUCAGCUCGGT | 596

















TABLE 3
IgL kappa V gene

Sequence
SEQ ID NO

CTAGAGCCTGAAGATTTTGCAGTGTATTAC
597







CTGCAGCCTGAAGATTTTGCAACTTATTAC
598







CTGCAACCTGAAGATGTTATAACTTATTGC
599







CTGCAGCCTGAAGATTTTGCAACTTACTAT
600







CTGCAGCCTGAAGATTTTGCAGTTTATTAC
601







CTGCAGTCTGAAGATTTTGCAACTTATTAC
602







CTGGAGCCTGAAGATTTGCACTTCATCAC
603







CTGCAACCTGAAGATTTTGCAACTTATTAC
604







CTGGAGCCTGAAGATTTTGCAGTTTATTAC
605







CTGGAGCCTGAAGATTTTGCAGTGTATTAC
606







CTAGAGCCTGAAGATTTTGCAGTTTATTAC
607







CTGGAGCCTGAAGATTTTGCAGTCTATTAC
608







CTGGAAGCTGAAGATGCTGCAACATATTAC
609







CTGCAGGCTGAAGATGTGGCAGTTTATTAC
610







CTGCAGCCTGATGATTTTGCAACTTATTAC
611







CTGAAGCCTGAAGATTTTGCAGCTTATTAC
612







CTCCAGTCTGAAGTTGCTGCAACTTCTTAT
613







CTGCAGCCTAAAGATGTTGCAACTTATTAC
614







CTGGAAGCTGAAGATGCTGCAACGTATTAC
615







CTGCAGTCTGAAGATTTTGCAGTTTATTAC
616







CTAGACCCTGAAGATGTCACAATTTTATTAC
617







CTGCAGCCTGAAGATGTTGCAACTTATTAC
618







CTGCAGCCTGAAGATATTGCAACATATTAC
619







CTGCAGCCTAAAGATGTTGCAAGTTATTAC
620







CTGGAGCATGAAGATTTTGCACTTTAACAC
621







CTGGAAGCTGAAGATGCTGCAGCGTATTAC
622







ATAGAATCTGAGGATGCTGCATATTACTTC
623







GTGGAAGCTAATGATACTGCAAATTATTAC
624







CTGCAACCTGAAGATTTTGCAACTTACTAC
625







CTGCAACCTGAAGATGTTATAACTTATTAC
626







GCCTTCCCACACAGGTTCTCCC
627







AGAGCCTGAAGATTTTGCAGTGTATTACT
628







AGAGCCTGAAGATTTTGCAGTTTATTACT
629







AGAATCTGAGGATGCTGCATATTACTTCT
630







GGAGCATGAAGATTTTGCACTTTAACACT
631







GGAAGCTAATGATACTGCAAATTATTACT
632







GGAGCCTGAAGATTTTGCAGTTTATTACT
633







CCAGTCTGAAGTTGCTGCAACTTCTTATT
634







GCAGCCTGAAGATGTTGCAACTTATTACG
635







GCAGGCTGAAGATGTGGCAGTTTATTACT
636







GCAGCCTGATGATTTTGCAACTTATTACT
637







GCAACCTGAAGATTTTGCAACTTATTACT
638







GGAAGCTGAAGATGCTGCAGCGTATTACT
639







GCAACCTGAAGATTTTGCAACTTACTACT
640







GCAGTCTGAAGATTTTGCAACTTATTACT
641







GAAGCCTGAAGATTTTGCAGCTTATTACT
642







GCAGCCTGAAGATGTTGCAACTTATTACT
643







AGACCCTGAAGATGTCACAATTTTATTACC
644







GGAAGCTGAAGATGCTGCAACATATTACT
645







GCAGCCTAAAGATGTTGCAACTTATTACT
646







GGAGCCTGAAGATTTTGCAGTCTATTACT
647







GGAAGCTGAAGATGCTGCAACGTATTACT
648







GCAGCCTGAAGATATTGCAACATATTACT
649







GGAGCCTGAAGATTTGCACTTCATCACT
650







GCAGCCTAAAGATGTTGCAAGTTATTACT
651







GCAGCCTGAAGATTTTGCAGTTTATTACT
652







GCAACCTGAAGATGTTATAACTTATTGC
653







GCAGCCTGAAGATTTTGCAACTTATTACT
654







GCAACCTGAAGATGTTATAACTTATTACT
655







GGAGCCTGAAGATTTTGCAGTGTATTACT
656







GCAGCCTGAAGATTTTGCAACTTACTATT
657







GCAGTCTGAAGATTTTGCAGTTTATTACT
658







CTGCCTTCCCACACAGGTTCTCC
659







CTGGAGCCTGAAGATTTGCACTTCAT
660







GTGGAAGCTAATGATACTGCAAATTAT
661







CTGCAGCCTGAAGATTTTGCAGTTTAT
662







CTGGAGCCTGAAGATTTTGCAGTGTAT
663







CTGGAGCCTGAAGATTTTGCAGTTTAT
664







CTGGAAGCTGAAGATGCTGCAACATAT
665







CTGCAGCCTGAAGATTTTGCAACTTAC
666







CTAGAGCCTGAAGATTTTGCAGTTTAT
667







CTGCAGTCTGAAGATTTTGCAACTTAT
668







CTGCAGTCTGAAGATTTTGCAGTTTAT
669







CTAGACCCTGAAGATGTCACAATTTTAT
670







CTGGAAGCTGAAGATGCTGCAGCGTAT
671







CTGGAGCCTGAAGATTTTGCAGTCTAT
672







CTGGAGCATGAAGATTTTGCACTTTAA
673







CTGCAGCCTGATGATTTTGCAACTTAT
674







CTGCAGCCTAAAGATGTTGCAACTTAT
675







CTGCAGGCTGAAGATGTGGCAGTTTAT
676







CTGCAGCCTGAAGATGTTGCAACTTAT
677







CTGCAACCTGAAGATTTTGCAACTTAC
678







ATAGAATCTGAGGATGCTGCATATTAC
679







CTGCAACCTGAAGATTTTGCAACTTAT
680







CTGCAGCCTAAAGATGTTGCAAGTTAT
681







CTGCAGCCTGAAGATTTTGCAACTTAT
682







CTAGAGCCTGAAGATTTTGCAGTGTAT
683







CTGCAACCTGAAGATGTTATAACTTAT
684







CTGAAGCCTGAAGATTTTGCAGCTTAT
685







CTGGAAGCTGAAGATGCTGCAACGTAT
686







CTGCAGCCTGAAGATATTGCAACATAT
687







CTCCAGTCTGAAGTTGCTGCAACTTCT
688







CAGTTTTCTGCCTTCCCACACAGGTT
689







CCTGCAGCCTGAAGATTTTGCA
690







CCTGCAGGCTGAAGATGTGGCA
691







CCTGCAGCCTGAAGATGTTGCA
692







CCTGCAGTCTGAAGATTTTGCA
693







ACTGGAGCCTGAAGATTTTGCA
694







TGTGGAAGCTAATGATACTGCA
695







CCTGCAGCCTGATGATTTTGCA
696







CCTGGAAGCTGAAGATGCTGCA
697







CCTAGAGCCTGAAGATTTTGCA
698







CCTGCAGCCTGAAGATATTGCA
699







TCTGCAACCTGAAGATTTTGCA
700







GCTGGAGCATGAAGATTTTGCA
701







CCTGCAGCCTAAAGATGTTGCA
702







CCTCCAGTCTGAAGTTGCTGCA
703







CCTGCAACCTGAAGATGTTATA
704







GCTGGAGCCTGAAGATTTGCA
705







CCTAGACCCTGAAGATGTCACA
706







CATAGAATCTGAGGATGCTGCA
707







CCTGAAGCCTGAAGATTTTGCA
708







CAGTTTTCTGCCTTCCCACACAGGT
709







GGGTGGAGGCTGAGGATATTCG
710







GGACGGAGACCAAGGATGTTGG
711







GGGTGGAGGCTGAGGATGTTGG
712







GGACGGAGACTAAGGATGTTGG
713







GGATGGATGCTGAGGATGTTGG
714







AAGTGGAGGCTGAGGATGTTGG
715







GAGTGGAGGCTGAGGATGTTGG
716







GGGTGGAGGCTGAGGATTTTGG
717







GGGTGGAAGCTGAGGATGTCGG
718







GGGTAGAGGCTGAGGACGTTGG
719







ACATAGAATCTGAGGATGCTGC
720







CTGTGGAAGCTAATGATACTGC
721







ACCTGCAGCCTGAAGATTTTGC
722







GCCTGCAGCCTGAAGATTTTGC
723







GTCTGCAACCTGAAGATTTTGC
724







GCCTGCAACCTGAAGATGTTAT
725







GCCTGCAGGCTGAAGATGTGGC
726







TCCTGCAGTCTGAAGATTTTGC
727







TCCTCCAGTCTGAAGTTGCTGC
728







GCCTGGAAGCTGAAGATGCTGC
729







GCCTAGAGCCTGAAGATTTTGC
730







GCCTGCAGCCTGATGATTTTGC
731







GACTGGAGCCTGAAGATTTTGC
732







GCCTGAAGCCTGAAGATTTTGC
733







TCCTGCAGCCTAAAGATGTTGC
734







GCCTGCAGTCTGAAGATTTTGC
735







GGCTGGAGCATGAAGATTTTGC
736







GCCTAGACCCTGAAGATGTCAC
737







GCCTGCAGCCTGAAGATGTTGC
738







GCCTGCAGCCTGAAGATATTGC
739







GGCTGGAGCCTGAAGATTTGC
740







CAATCAGTTTTCTGCCTTCCCACACAGG
741







CGGGTGGAGGCTGAGGATTTTG
742







AGGGTGGAGGCTGAGGATGTTG
743







AGAGTGGAGGCTGAGGATGTTG
744







TGGGTGGAGGCTGAGGATGTTG
745







AGGATGGATGCTGAGGATGTTG
746







CGGGTGGAGGCTGAGGATGTTG
747







AGGACGGAGACTAAGGATGTTG
748







AAAGTGGAGGCTGAGGATGTTG
749







AGGGTGGAGGCTGAGGATATTC
750







AGGGTGGAAGCTGAGGATGTCG
751







AGGGTAGAGGCTGAGGACGTTG
752







AGGACGGAGACCAAGGATGTTG
753







CTAGAGCCTGAAGAUTTTGCAGTGTATUAC
754







CTGCAGCCTGAAGAUTTTGCAACTTATUAC
755







CTGCAACCTGAAGAUGTTATAACTTATUGC
756







CTGCAGCCTGAAGAUTTTGCAACTTACUAT
757







CTGCAGCCTGAAGAUTTTGCAGTTTATUAC
758







CTGCAGTCTGAAGAUTTTGCAACTTATUAC
759







CTGGAGCCUGAAGATTTGCACTTCAUCAC
760







CTGCAACCTGAAGAUTTTGCAACTTATUAC
761







CTGGAGCCTGAAGAUTTTGCAGTTTATUAC
762







CTGGAGCCTGAAGAUTTTGCAGTGTATUAC
763







CTAGAGCCTGAAGAUTTTGCAGTTTATUAC
764







CTGGAGCCTGAAGAUTTTGCAGTCTATUAC
765







CTGGAAGCTGAAGAUGCTGCAACATATUAC
766







CTGCAGGCTGAAGAUGTGGCAGTTTATUAC
767







CTGCAGCCTGAUGATTTTGCAACTTATUAC
768







CTGAAGCCTGAAGAUTTTGCAGCTTATUAC
769







CTCCAGTCTGAAGUTGCTGCAACTTCTUAT
770







CTGCAGCCTAAAGAUGTTGCAACTTATUAC
771







CTGGAAGCTGAAGAUGCTGCAACGTATUAC
772







CTGCAGTCTGAAGAUTTTGCAGTTTATUAC
773







CTAGACCCTGAAGAUGTCACAATTTTATUAC
774







CTGCAGCCTGAAGAUGTTGCAACTTATUAC
775







CTGCAGCCTGAAGAUATTGCAACATATUAC
776







CTGCAGCCTAAAGAUGTTGCAAGTTATUAC
777







CTGGAGCAUGAAGATTTTGCACTTUAACAC
778







CTGGAAGCTGAAGAUGCTGCAGCGTATUAC
779







ATAGAATCTGAGGAUGCTGCATATTACTUC
780







GTGGAAGCTAAUGATACTGCAAATTATUAC
781







CTGCAACCTGAAGAUTTTGCAACTTACUAC
782







CTGCAACCTGAAGAUGTTATAACTTATUAC
783







GCCTUCCCACACAGGTTCUCCC
784







AGAGCCTGAAGAUTTTGCAGTGTATUACT
785







AGAGCCTGAAGAUTTTGCAGTTTATUACT
786







AGAATCTGAGGAUGCTGCATATTACTUCT
787







GGAGCATGAAGAUTTTGCACTTUAACACT
788







GGAAGCTAATGAUACTGCAAATTATUACT
789







GGAGCCTGAAGAUTTTGCAGTTTATUACT
790







CCAGTCTGAAGTUGCTGCAACTTCTTAUT
791







GCAGCCTGAAGAUGTTGCAACTTATUACG
792







GCAGGCTGAAGAUGTGGCAGTTTATUACT
793







GCAGCCTGAUGATTTTGCAACTTATUACT
794







GCAACCTGAAGAUTTTGCAACTTATUACT
795







GGAAGCTGAAGAUGCTGCAGCGTATUACT
796







GCAACCTGAAGAUTTTGCAACTTACUACT
797







GCAGTCTGAAGAUTTTGCAACTTATUACT
798







GAAGCCTGAAGAUTTTGCAGCTTATUACT
799







GCAGCCTGAAGAUGTTGCAACTTATUACT
800







AGACCCTGAAGAUGTCACAATTTTATUACC
801







GGAAGCTGAAGAUGCTGCAACATATUACT
802







GCAGCCTAAAGAUGTTGCAACTTATUACT
803







GGAGCCTGAAGAUTTTGCAGTCTATUACT
804







GGAAGCTGAAGAUGCTGCAACGTATUACT
805







GCAGCCTGAAGAUATTGCAACATATUACT
806







GGAGCCTGAAGAUTTGCACTTCAUCACT
807







GCAGCCTAAAGAUGTTGCAAGTTATUACT
808







GCAGCCTGAAGAUTTTGCAGTTTATUACT
809







GCAACCTGAAGAUGTTATAACTTATUGC
810







GCAGCCTGAAGAUTTTGCAACTTATUACT
811







GCAACCTGAAGAUGTTATAACTTATUACT
812







GGAGCCTGAAGAUTTTGCAGTGTATUACT
813







GCAGCCTGAAGAUTTTGCAACTTACTAUT
814







GCAGTCTGAAGAUTTTGCAGTTTATUACT
815







CTGCCTUCCCACACAGGTTCUCC
816







CTGGAGCCUGAAGATTTGCACTUCAT
817







GTGGAAGCTAAUGATACTGCAAATUAT
818







CTGCAGCCUGAAGATTTTGCAGTTUAT
819







CTGGAGCCUGAAGATTTTGCAGTGUAT
820







CTGGAGCCUGAAGATTTTGCAGTTUAT
821







CTGGAAGCTGAAGAUGCTGCAACAUAT
822







CTGCAGCCUGAAGATTTTGCAACTUAC
823







CTAGAGCCUGAAGATTTTGCAGTTUAT
824







CTGCAGTCUGAAGATTTTGCAACTUAT
825







CTGCAGTCUGAAGATTTTGCAGTTUAT
826







CTAGACCCUGAAGATGTCACAATTTUAT
827







CTGGAAGCTGAAGAUGCTGCAGCGUAT
828







CTGGAGCCUGAAGATTTTGCAGTCUAT
829







CTGGAGCAUGAAGATTTTGCACTTUAA
830







CTGCAGCCTGAUGATTTTGCAACTUAT
831







CTGCAGCCUAAAGATGTTGCAACTUAT
832







CTGCAGGCUGAAGATGTGGCAGTTUAT
833







CTGCAGCCUGAAGATGTTGCAACTUAT
834







CTGCAACCUGAAGATTTTGCAACTUAC
835







ATAGAATCTGAGGAUGCTGCATATUAC
836







CTGCAACCUGAAGATTTTGCAACTUAT
837







CTGCAGCCUAAAGATGTTGCAAGTUAT
838







CTGCAGCCUGAAGATTTTGCAACTUAT
839







CTAGAGCCUGAAGATTTTGCAGTGUAT
840







CTGCAACCUGAAGATGTTATAACTUAT
841







CTGAAGCCUGAAGATTTTGCAGCTUAT
842







CTGGAAGCTGAAGAUGCTGCAACGUAT
843







CTGCAGCCUGAAGATATTGCAACAUAT
844







CTCCAGTCTGAAGUTGCTGCAACTUCT
845







CAGTTTTCTGCCUTCCCACACAGGUT
846







CCTGCAGCCUGAAGATTTUGCA
847







CCTGCAGGCUGAAGATGUGGCA
848







CCTGCAGCCUGAAGATGTUGCA
849







CCTGCAGUCTGAAGATTTUGCA
850







ACTGGAGCCUGAAGATTTUGCA
851







TGTGGAAGCUAATGATACUGCA
852







CCTGCAGCCUGATGATTTUGCA
853







CCTGGAAGCUGAAGATGCUGCA
854







CCTAGAGCCUGAAGATTTUGCA
855







CCTGCAGCCUGAAGATATUGCA
856







TCTGCAACCUGAAGATTTUGCA
857







GCTGGAGCAUGAAGATTTUGCA
858







CCTGCAGCCUAAAGATGTUGCA
859







CCTCCAGUCTGAAGTTGCUGCA
860







CCTGCAACCUGAAGATGTTAUA
861







GCTGGAGCCUGAAGATTUGCA
862







CCTAGACCCUGAAGATGUCACA
863







CATAGAATCUGAGGATGCUGCA
864







CCTGAAGCCUGAAGATTTUGCA
865







CAGUTTTCTGCCTUCCCACACAGGT
866







GGGTGGAGGCUGAGGATATUCG
867







GGACGGAGACCAAGGAUGTUGG
868







GGGTGGAGGCUGAGGATGTUGG
869







GGACGGAGACUAAGGATGTUGG
870







GGATGGATGCUGAGGATGTUGG
871







AAGTGGAGGCUGAGGATGTUGG
872







GAGTGGAGGCUGAGGATGTUGG
873







GGGTGGAGGCUGAGGATTTUGG
874







GGGTGGAAGCUGAGGATGUCGG
875







GGGTAGAGGCUGAGGACGTUGG
876







ACATAGAATCUGAGGATGCUGC
877







CTGTGGAAGCUAATGATACUGC
878







ACCTGCAGCCUGAAGATTTUGC
879







GCCTGCAGCCUGAAGATTTUGC
880







GTCTGCAACCUGAAGATTTUGC
881







GCCTGCAACCUGAAGATGTUAT
882







GCCTGCAGGCUGAAGATGUGGC
883







TCCTGCAGUCTGAAGATTTUGC
884







TCCTCCAGUCTGAAGTTGCUGC
885







GCCTGGAAGCUGAAGATGCUGC
886







GCCTAGAGCCUGAAGATTTUGC
887







GCCTGCAGCCUGATGATTTUGC
888







GACTGGAGCCUGAAGATTTUGC
889







GCCTGAAGCCUGAAGATTTUGC
890







TCCTGCAGCCUAAAGATGTUGC
891







GCCTGCAGUCTGAAGATTTUGC
892







GGCTGGAGCAUGAAGATTTUGC
893







GCCTAGACCCUGAAGATGUCAC
894







GCCTGCAGCCUGAAGATGTUGC
895







GCCTGCAGCCUGAAGATATUGC
896







GGCTGGAGCCUGAAGATTUGC
897







CAATCAGUTTTCTGCCTUCCCACACAGG
898







CGGGTGGAGGCUGAGGATTTUG
899







AGGGTGGAGGCUGAGGATGTUG
900







AGAGTGGAGGCUGAGGATGTUG
901







TGGGTGGAGGCUGAGGATGTUG
902







AGGATGGAUGCTGAGGATGTUG
903







CGGGTGGAGGCUGAGGATGTUG
904







AGGACGGAGACUAAGGATGTUG
905







AAAGTGGAGGCUGAGGATGTUG
906







AGGGTGGAGGCUGAGGATATUC
907







AGGGTGGAAGCUGAGGATGUCG
908







AGGGTAGAGGCUGAGGACGTUG
909







AGGACGGAGACCAAGGAUGTUG
910

















TABLE 4







IgL kappa J gene











Sequence
SEQ ID NO







GTTTGATCTCCACCTTGGTCCCT
911







GTTTGATTTCCACCTTGGTCCCT
912







GTTTGATCTCCAGCTTGGTCCCC
913







GTTTAATCTCCAGTCGTGTCCCT
914







GTTTGATATCCACTTTGGTCCCA
915







TTGATCTCCACCTTGGTCCCTCC
916







TTAATCTCCAGTCGTGTCCCTTG
917







TTGATATCCACTTTGGTCCCAGG
918







TTGATTTCCACCTTGGTCCCTTG
919







TTGATCTCCAGCTTGGTCCCCTG
920







TCCAGCTTGGTCCCCTGGC
921







TCCACCTTGGTCCCTCCGC
922







TCCACTTTGGTCCCAGGGC
923







TCCACCTTGGTCCCTTGGC
924







TCCAGTCGTGTCCCTTGGC
925







CAGTCGTGTCCCTTGGC
926







CAGCTTGGTCCCCTGGC
927







CACCTTGGTCCCTTGGC
928







CACTTTGGTCCCAGGGC
929







CACCTTGGTCCCTCCGC
930







GTTTGATCUCCACCTTGGUCCCT
931







GTTTGATTUCCACCTTGGUCCCT
932







GTTTGATCUCCAGCTTGGUCCCC
933







GTTTAATCUCCAGTCGTGUCCCT
934







GTTTGATAUCCACTTTGGUCCCA
935







TTGATCTCCACCUTGGTCCCUCC
936







TTAATCTCCAGUCGTGTCCCTUG
937







TTGATAUCCACTTTGGUCCCAGG
938







TTGATTTCCACCUTGGTCCCTUG
939







TTGATCTCCAGCUTGGTCCCCUG
940







TCCAGCTUGGTCCCCUGGC
941







TCCACCTUGGTCCCUCCGC
942







TCCACUTTGGUCCCAGGGC
943







TCCACCTUGGTCCCTUGGC
944







TCCAGTCGUGTCCCTUGGC
945







CAGTCGUGTCCCTUGGC
946







CAGCTUGGTCCCCUGGC
947







CACCTUGGTCCCTUGGC
948







CACUTTGGUCCCAGGGC
949







CACCTUGGTCCCUCCGC
950

















TABLE 5







KDE-Cint











Sequence
SEQ ID NO







CTTTGGTGGCCATGCCACCG
 951







CAGCCGCCTTGCCGCTAG
 952







CATGCCACCGCGCTCTTG
 953







CTTTGGUGGCCAUGCCACCG
 954







CAGCCGCCUTGCCGCUAG
 955







CAUGCCACCGCGCTCTUG
 956







AGCCGCGGTCTTTCTCGAT
 957







CGCGGTCTTTCTCGATTGAGT
 958







CCCTGTGTCTGCCCGATTG
 959







AGCCGCGGUCTTTCUCGAT
 960







CGCGGTCUTTCTCGATUGAGT
 961







CCCTGTGUCTGCCCGATUG
1837







CTGTAAATAAGCATTATCCTGGGCT
 962







GTAAATAAGCATTATCCTGGGCTGA
 963







CTGCTGTAAATAAGCATTATCCTGGG
 964







CTGTAAAUAAGCATTATCCUGGGCT
 965







GTAAATAAGCATUATCCTGGGCUGA
 966







CTGCTGTAAAUAAGCATTATCCUGGG
 967







CAGACTCATGAGGAGTCGCCCT
1838







ACTCATGAGGAGTCGCCCTG
 968







CAGCTGCAGACTCATGAGGAGTCG
 969







CAGACTCAUGAGGAGUCGCCCT
 970







ACTCATGAGGAGUCGCCCUG
 971







CAGCTGCAGACUCATGAGGAGUCG
 972

















TABLE 6







IGH J gene











Sequence
SEQ ID NO







GAGGAGACGGTGACCGTG
973







GAGACAGTGACCAGGGTGC
974







GAGACGGTGACCATTGTCC
975







TGAGGAGACGGTGACCAGG
976







CTTACCTGAGGAGACGGTGACC
977







GACTCACCTGAGGAGACGGTG
978







GACTCACCTGAGGAGACAGTG
979







TTCTTACCTGAGGAGACGGTG
980







GAGGAGACGGUGACCGUG
981







GAGACAGUGACCAGGGUGC
982







GAGACGGUGACCATTGUCC
983







TGAGGAGACGGUGACCAGG
984







CTTACCUGAGGAGACGGUGACC
985







GACTCACCUGAGGAGACGGUG
986







GACTCACCUGAGGAGACAGUG
987







TTCTTACCUGAGGAGACGGUG
988

















TABLE 7







IGH FR2 gene











Sequence
SEQ ID NO







GTGGGCTGGATCCGTCAGC
 989







GTGAGCTGGATCCGTCAGC
 990







GCGAGCTGGATCCGTCAGC
 991







GTGAGCTGGGTCCGTCAGC
 992







GCACTGGGTCCGTCAAG
 993







GTACTGGGTCCGCCAGG
 994







GCACTGGGTCCGCCAGG
 995







GGACTGGGTCCGCCAGG
 996







GAACTGGGTCCGCCAGG
 997







GAGCTGGTTCCGCCAGG
 998







GAGCTGGGTCCGCCAGG
 999







GCACTGGGTCCGGCAAG
1000







GCACTGGGTCCGCCAAG
1001







GAGCTGGATCCGCCAGG
1002







GAGCTGGGTCCGCCAGC
1003







GAGCTGGGTCCGCCAAG
1004







CTGCACTGGGTGCGACAGG
1005







ATCAACTGGGTGCGACAGG
1006







ATGAATTGGGTGCGACAGG
1007







ATCAGCTGGGTGCGACAGG
1008







ATGCACTGGGTGCGACAGG
1009







ATGCATTGGGTGCGCCAGG
1010







ATGCAGTGGGTGCGACAGG
1011







ATGCACTGGGTGCAACAGG
1012







GTGCAGTGGGTGCGACAGG
1013







GAGCTGGATCCGGCAGC
1014







GAGCTGGATCCGCCAGC
1015







GAGTTGGGTCCGCCAGC
1016







GAACTGGATCAGGCAGT
1017







GGGCTGGATCCGGCAGC
1018







GGGCTGGATCCGCCAGC
1019







GAGCTGGATCCGGCAGT
1020







GTGCTGGATCCGCCAGC
1021







GAGTTGGATCCGCCAGC
1022







ACCGGCTGGGTGCGCCAGA
1023







ATCAGCTGGGTGCGCCAGA
1024







ATCGGCTGGGTGCACCAGA
1025







ATCGGCTGGGTGCGCCAGA
1026







GTGGGCUGGATCCGUCAGC
1027







GTGAGCUGGATCCGUCAGC
1028







GCGAGCUGGATCCGUCAGC
1029







GTGAGCUGGGTCCGUCAGC
1030







GCACUGGGTCCGUCAAG
1031







GUACTGGGUCCGCCAGG
1032







GCACUGGGUCCGCCAGG
1033







GGACUGGGUCCGCCAGG
1034







GAACUGGGUCCGCCAGG
1035







GAGCUGGTUCCGCCAGG
1036







GAGCUGGGUCCGCCAGG
1037







GCACUGGGUCCGGCAAG
1038







GCACUGGGUCCGCCAAG
1039







GAGCUGGAUCCGCCAGG
1040







GAGCUGGGUCCGCCAGC
1041







GAGCUGGGUCCGCCAAG
1042







CUGCACTGGGUGCGACAGG
1043







AUCAACTGGGUGCGACAGG
1044







AUGAATTGGGUGCGACAGG
1045







AUCAGCTGGGUGCGACAGG
1046







AUGCACTGGGUGCGACAGG
1047







AUGCATTGGGUGCGCCAGG
1048







AUGCAGTGGGUGCGACAGG
1049







ATGCACUGGGUGCAACAGG
1050







GUGCAGTGGGUGCGACAGG
1051







GAGCUGGAUCCGGCAGC
1052







GAGCUGGAUCCGCCAGC
1053







GAGUTGGGUCCGCCAGC
1054







GAACUGGAUCAGGCAGT
1055







GGGCUGGAUCCGGCAGC
1056







GGGCUGGAUCCGCCAGC
1057







GAGCUGGAUCCGGCAGT
1058







GUGCTGGAUCCGCCAGC
1059







GAGUTGGAUCCGCCAGC
1060







ACCGGCUGGGUGCGCCAGA
1061







AUCAGCTGGGUGCGCCAGA
1062







ATCGGCUGGGUGCACCAGA
1063







ATCGGCUGGGUGCGCCAGA
1064

















TABLE 8







IGH FR3distal gene











Sequence
SEQ ID NO







TCTACAGCACAUCCCUGAAGACC
1065







TCTACAGCACAUCTCUGAAGACC
1066







ACTACAGCACAUCTCUGAACACC
1067







GCTACGGCCCAUCTCUGAAGAGC
1068







ACTACAGCACAUCTCUGAAGACC
1069







GCTACAGCCCAUCTCUGAAGAGC
1070







CCTACAGCACAUCTCUGAAGAGC
1071







TATCCAGGCUCCGUGAAGGGG
1072







TACACAGACUCCGUGAAGGGC
1073







TATGCAGACUCTGUGAAGGGC
1074







TACGGAGACUCCGUGAAGGGC
1075







TATGUGGACTCTGUGAAGGGC
1076







TATGCAGACUCCGUGAAGGGC
1077







TACACAGACUCTGUGAAGGGC
1078







TATGCAGACUCTGUGAAGGGT
1079







TACGCAGACUCAGUGAAGGGC
1080







TACGCGGACUCCGUGAAGGGC
1081







TATGCGGACUCTGUGAAGGGC
1082







TACGCAGACUCCGUGAAGGGC
1083







TATGCAAACUCTGUGAAGGGC
1084







TAUGCAGACUCCGCGAAGGGC
1085







TACGCAGACUCTGUGAAGGGC
1086







TATCCAGGCUCCGUGAAGGGC
1087







CGCUGCACCTGUGAAAGGC
1088







CGCCGCGUCTGUGAAAGGC
1089







TGCTGCGUCGGUGAAAGGC
1090







CACCGCGUCTGUGAAAGGC
1091







CGCUGCACCCGUGAAAGGC
1092







AACAACAACCCCUCCCUCAAGAGT
1093







AACTACAACCCGUCCCUCAAGAGT
1094







TACTACAACCCGUCCCUCAAGAGT
1095







AACAACAACCCGUCCCUCAAGAGT
1096







AACTACAACCCCUCCCUCAAGAGT
1097







ATATUCACAGGAGTUCCAGGGC
1098







ATATUCACAGAAGTUCCAGGGC
1099







CTAUGCACAGAAGTTUCAGGGC
1100







CTAUGCACAGAAGCUCCAGGGC
1101







AUACGCAGAGAAGTUCCAGGGC
1102







CUACGCACAGAAGTUCCAGGGC
1103







CUACGCACAGAAATUCCAGGAC
1104







CUACGCACAGAAGTUCCAGGAA
1105







CTAUGCACAGAAGTUCCAGGGC
1106







ATAUGCAGAGAAGTUCCAGGGC
1107







CUACGCACAGAAGTUGCAGGGC
1108







GATTATGCAGUATCTGUGAAAAGT
1109







ACGTAUGCCCAGGGCTUCACAGGA
1110







CTACAGCCCGUCCTUCCAAGGC
1111







ATACAGCCCGUCCTUCCAAGGC
1112







TCTACAGCACATCCCTGAAGACC
1113







TCTACAGCACATCTCTGAAGACC
1114







ACTACAGCACATCTCTGAACACC
1115







GCTACGGCCCATCTCTGAAGAGC
1116







ACTACAGCACATCTCTGAAGACC
1117







GCTACAGCCCATCTCTGAAGAGC
1118







CCTACAGCACATCTCTGAAGAGC
1119







TATCCAGGCTCCGTGAAGGGG
1120







TACACAGACTCCGTGAAGGGC
1121







TATGCAGACTCTGTGAAGGGC
1122







TACGGAGACTCCGTGAAGGGC
1123







TATGTGGACTCTGTGAAGGGC
1124







TATGCAGACTCCGTGAAGGGC
1125







TACACAGACTCTGTGAAGGGC
1126







TATGCAGACTCTGTGAAGGGT
1127







TACGCAGACTCAGTGAAGGGC
1128







TACGCGGACTCCGTGAAGGGC
1129







TATGCGGACTCTGTGAAGGGC
1130







TACGCAGACTCCGTGAAGGGC
1131







TATGCAAACTCTGTGAAGGGC
1132







TATGCAGACTCCGCGAAGGGC
1133







TACGCAGACTCTGTGAAGGGC
1134







TATCCAGGCTCCGTGAAGGGC
1135







CGCTGCACCTGTGAAAGGC
1136







CGCCGCGTCTGTGAAAGGC
1137







TGCTGCGTCGGTGAAAGGC
1138







CACCGCGTCTGTGAAAGGC
1139







CGCTGCACCCGTGAAAGGC
1140







AACAACAACCCCTCCCTCAAGAGT
1141







AACTACAACCCGTCCCTCAAGAGT
1142







TACTACAACCCGTCCCTCAAGAGT
1143







AACAACAACCCGTCCCTCAAGAGT
1144







AACTACAACCCCTCCCTCAAGAGT
1145







ATATTCACAGGAGTTCCAGGGC
1146







ATATTCACAGAAGTTCCAGGGC
1147







CTATGCACAGAAGTTTCAGGGC
1148







CTATGCACAGAAGCTCCAGGGC
1149







ATACGCAGAGAAGTTCCAGGGC
1150







CTACGCACAGAAGTTCCAGGGC
1151







CTACGCACAGAAATTCCAGGAC
1152







CTACGCACAGAAGTTCCAGGAA
1153







CTATGCACAGAAGTTCCAGGGC
1154







ATATGCAGAGAAGTTCCAGGGC
1155







CTACGCACAGAAGTTGCAGGGC
1156







GATTATGCAGTATCTGTGAAAAGT
1157







ACGTATGCCCAGGGCTTCACAGGA
1158







CTACAGCCCGTCCTTCCAAGGC
1159







ATACAGCCCGTCCTTCCAAGGC
1160

















TABLE 9







IGH FR3V gene











Sequence
SEQ ID NO







CATGCAGCTGAGCAGCC
1161







CATGGAGCTGAGCAGGC
1162







GGAGCTGAGCAGCCTGA
1163







AGAGCTGAGCAGCCTGA
1164







GGAGCTGAGGAGCCTGA
1165







CACAGACCTGAGCAGCCT
1166







TGGAGCTAAGCAGCCTGA
1167







CATGGAGCTGAGGAGCCTA
1168







ATGGAGCTAAGCAGTCTGAGATC
1169







TCCTTACCATGACCAACATGGAC
1170







GTGGTCCTTACAATGACCAACATG
1171







GGTTCTAACAGTGATCAACATGGAC
1172







ACACGGCGTATCTGCAAATGAA
1173







CACGCTGTATGTCCAAATGAGC
1174







AACACGCTCTACCTGCAAATGAA
1175







GAACTCACCGTATCTGCAAACGAA
1176







GAACACGCTGTATGTTCAAATGAG
1177







GAACACGCTGTTTCTGCAAATGAA
1178







AAGCATCGCCTATCTGCAAATGAA
1179







GAACACCCTGTATCTGCAAACGAA
1180







AGAACACGCTGTATCTGCAAATGAG
1181







GGAACTCCCTGTATCTGCAAAAGAA
1182







GCAAGTCCCTGTATCTGCAAAAGAA
1183







GAACACGCTGCATCTTCAAATGAAC
1184







GAACTCACTCCGTTTGCAAATGAAC
1185







GAACACGCTGTATCTGCAAATGAAC
1186







AAGAACACGCTGTATCTTCAAATGGG
1187







AAGAACTCCCTCTATCTGCAAGTGAA
1188







AAGAACAGGCTGTATCTGCAAATGAA
1189







CAAAAACTCCCTGTATCTGCAAATGAA
1190







TAAGAACTCACCGTATCTCCAAACGAA
1191







CCAGAATTCACTGTCTCTGCAAATGAA
1192







CAGGAACTTCCTGTATCAGCAAATGAA
1193







CAAGAACACGCTGTATCTTCAAATGAG
1194







CAAGAACTCCCTGTATCTGCAAATGAA
1195







CAAGAACTCACTCTGTTTGCAAATGAA
1196







CAAAAAACACGCTGTATCTGCAAATGAT
1197







CAAAGAACACGATGTATCTGCAAATGAG
1198







CAAGAACTCACTGTATCTGCAAATGAAC
1199







CCAAGAACTCACTGTATTTGCTAATGAA
1200







CCAAGAACTCACTGTATTTGCAAATGAA
1201







CCAATAACTCACCGTATCTGCAAATGAA
1202







CAAAAAACACGCTGTATCTGCAAATGAA
1203







CAAGAACACGCTGTATCTTCAAATGAAC
1204







CAAAAGCATCACCTATCTGCAAATGAAC
1205







CAAGAACACACTTCATCTGCAAATGAAC
1206







CAAAAGCATCACCTATCTGCAAATGAAG
1207







TCAAAGAACTCACTGTATCTGCAAATGAA
1208







GCTAAGAACTCTCTGTATCTGCAAATGAA
1209







CCAAGAACTCCTTGTATCTTCAAATGAAC
1210







CCAAGAAGTCCTTGTATCTTCAAATGAAC
1211







GCCAAGAAGTCCTTGTATCTTCATATGAAC
1212







CAGTTCCCCCTGAAGCTGA
1213







CAGTTCTCCCTGAAGCTGGG
1214







CAGTTCTCCCTGAAGCCGAG
1215







CCAGTTCTCCCTGAAGCTGAG
1216







AACCAGTTCTCCCTGAACCTGA
1217







AACCACTTCTCCCTGAAGCTGA
1218







GAACCAATTCTCCCTGAAGCTGA
1219







AGAACCAGTTCTACCTGAAGCTGA
1220







AGAAGCAGTTCTACCTGAAGCTGA
1221







GCAGTGGAGCAGCCTGA
1222







CAGTTCTCCCTGCAGCTGA
1223







ATCTGCAGATCAGCACGCTAA
1224







TATCTGCAGATCAGCAGCCTAA
1225







TGTCTTCAGATCAGCAGCCTAA
1226







TACCTGCAGATCAGCAGCCTAA
1227







TATCTGCAGATCTGCAGCCTAA
1228







TGGAGCTGAGCAGCCTGAGATCTGA
1229







CAATGACCAACATGGACCCTGTGGA
1230







TCTGCAAATGAACAGCCTGAGAGCC
1231







GAGCTCTGTGACCGCCGCGGACACG
1232







CAGCACCGCCTACCTGCAGTGGAGC
1233







GTTCTCCCTGCAGCTGAACTCTGTG
1234







CAGCACGGCATATCTGCAGATCAG
1235







CACAGTCTACATGGAGCTGAGC
1236







CACAGCCTACATGCAGCTGAGC
1237







CACAGCCTACACGGAGCTGAGC
1238







CACAGCCTACATGGAGCTGAGG
1239







CACAGCCTACACAGACCTGAGC
1240







CACAGCCTGCACGGAGCTGAGC
1241







CACAGCCTACATGGAGCTAAGC
1242







CACAGCCTACATGGAGCTGAGC
1243







GACAGCCTACATAGAGCTGAGC
1244







CAAGAATGAAGTGGTTCTAACAGTGA
1245







CAAAAACCAGGTGGTCCTTACAATGA
1246







CAAAAGCCAGGTGGTCCTTACCATGA
1247







AACACGCTCTACCTGCAAATGAACAGC
1248







AAGTCCTTGTATCTTCAAATGAACAGC
1249







AAGTCCCTGTATCTGCAAAAGAACAGA
1250







AACACGCTGTATCTGCAAATGAACAGC
1251







AACTCCCTGTATCTGCAAAAGAACAGA
1252







AACTCTCTGTATCTGCAAATGAACACT
1253







AATTCACTGTCTCTGCAAATGAACAGC
1254







AACTCACTGTATTTGCAAATGAACAGT
1255







AACTCACCGTATCTGCAAACGAACAGT
1256







AACACGCTGTATGTTCAAATGAGCAGT
1257







AACTCACTCTGTTTGCAAATGAACAGT
1258







AACACGCTGCATCTTCAAATGAACAGC
1259







AAGTCCTTGTATCTTCATATGAACAGC
1260







AACACGCTGTATCTGCAAATGATCAGC
1261







AACTCACTCCGTTTGCAAATGAACAGT
1262







AACACGCTGTTTCTGCAAATGAACAGC
1263







AACTCCTTGTATCTTCAAATGAACAGC
1264







AACTTCCTGTATCAGCAAATGAACAGC
1265







AACACGCTGTATCTTCAAATGAACAGC
1266







AACTCACCGTATCTGCAAATGAACAGC
1267







AACTCACTGTATCTGCAAATGAACAGC
1268







AACACGATGTATCTGCAAATGAGCAAC
1269







AGCATCACCTATCTGCAAATGAACAGC
1270







AACACGCTGTATCTTCAAATGAGCAGT
1271







AACTCTCTGTATCTGCAAATGAACAGT
1272







AACACGCTGTATGTCCAAATGAGCAGT
1273







AACACGGCGTATCTGCAAATGAACAGC
1274







AGCATCACCTATCTGCAAATGAAGAGC
1275







AACACGCTGTATCTTCAAATGAACAAC
1276







AACACGCTGTATCTGCAAATGAACAGT
1277







AACTCCCTCTATCTGCAAGTGAACAGC
1278







AACTCCCTGTATCTGCAAATGAACAGT
1279







AGCATCGCCTATCTGCAAATGAACAGC
1280







AACACGCTGTATCTGCAAATGAGCAAC
1281







AACACCCTGTATCTGCAAACGAATAGC
1282







AACACACTTCATCTGCAAATGAACAGC
1283







AACAGGCTGTATCTGCAAATGAACAGC
1284







AACTCACCGTATCTCCAAACGAACAGT
1285







AACACGCTGTATCTTCAAATGGGCAGC
1286







AACTCACTGTATTTGCTAATGAACAGT
1287







AACACGCTGTATCTGCAAATGAGCAGC
1288







CCTGAAGCTGAGCTCTGTGACTG
1289







CCTGAAGCCGAGCTCTGTGACTG
1290







CCTGAACCTGAGCTCTGTGACCG
1291







CCTGAAGCTGAGCTCTGTGACCG
1292







CCTGAAGCTGGGCTCTGTGACCG
1293







GACAAGTCCATCAGCACCGCCTACC
1294







GACAGCTCCAGCAGCACCGCCTACC
1295







GACAAGCCCATCAGCACCGCCTACC
1296







GACAAGTCCATCAGCACTGCCTACC
1297







CAGTTCTCCCTGCAGCTGAACTCT
1298







ACACCTCTGTCAGCATGGCGTA
1299







ACACCTCTGTCAGCACGGCATA
1300







ACACCTCTGCCAGCACAGCATA
1301







ACACGTCTGTCAGCACGGCGTG
1302







ACACCTCTGTCAGCATGGCATA
1303







CAUGCAGCUGAGCAGCC
1304







CAUGGAGCUGAGCAGGC
1305







GGAGCUGAGCAGCCUGA
1306







AGAGCUGAGCAGCCUGA
1307







GGAGCUGAGGAGCCUGA
1308







CACAGACCTGAGCAGCCT
1309







TGGAGCUAAGCAGCCUGA
1310







CATGGAGCUGAGGAGCCUA
1311







ATGGAGCUAAGCAGTCTGAGAUC
1312







TCCTTACCAUGACCAACAUGGAC
1313







GTGGTCCTUACAATGACCAACAUG
1314







GGTTCTAACAGUGATCAACAUGGAC
1315







ACACGGCGUATCTGCAAAUGAA
1316







CACGCTGUATGTCCAAAUGAGC
1317







AACACGCTCUACCTGCAAAUGAA
1318







GAACUCACCGTATCUGCAAACGAA
1319







GAACACGCUGTATGTTCAAAUGAG
1320







GAACACGCUGTTTCTGCAAAUGAA
1321







AAGCATCGCCUATCTGCAAAUGAA
1322







GAACACCCUGTATCUGCAAACGAA
1323







AGAACACGCUGTATCTGCAAAUGAG
1324







GGAACUCCCTGTATCUGCAAAAGAA
1325







GCAAGUCCCTGTATCUGCAAAAGAA
1326







GAACACGCUGCATCTTCAAAUGAAC
1327







GAACTCACUCCGTTTGCAAAUGAAC
1328







GAACACGCUGTATCTGCAAAUGAAC
1329







AAGAACACGCUGTATCTTCAAAUGGG
1330







AAGAACTCCCUCTATCTGCAAGUGAA
1331







AAGAACAGGCUGTATCTGCAAAUGAA
1332







CAAAAACTCCCUGTATCTGCAAAUGAA
1333







TAAGAACUCACCGTATCUCCAAACGAA
1334







CCAGAATTCACUGTCTCTGCAAAUGAA
1335







CAGGAACTTCCUGTATCAGCAAAUGAA
1336







CAAGAACACGCUGTATCTTCAAAUGAG
1337







CAAGAACTCCCUGTATCTGCAAAUGAA
1338







CAAGAACTCACUCTGTTTGCAAAUGAA
1339







CAAAAAACACGCUGTATCTGCAAAUGAT
1340







CAAAGAACACGAUGTATCTGCAAAUGAG
1341







CAAGAACTCACUGTATCTGCAAAUGAAC
1342







CCAAGAACTCACUGTATTTGCTAAUGAA
1343







CCAAGAACTCACUGTATTTGCAAAUGAA
1344







CCAATAACTCACCGUATCTGCAAAUGAA
1345







CAAAAAACACGCUGTATCTGCAAAUGAA
1346







CAAGAACACGCUGTATCTTCAAAUGAAC
1347







CAAAAGCATCACCUATCTGCAAAUGAAC
1348







CAAGAACACACUTCATCTGCAAAUGAAC
1349







CAAAAGCATCACCUATCTGCAAAUGAAG
1350







TCAAAGAACTCACUGTATCTGCAAAUGAA
1351







GCTAAGAACTCUCTGTATCTGCAAAUGAA
1352







CCAAGAACTCCUTGTATCTTCAAAUGAAC
1353







CCAAGAAGTCCUTGTATCTTCAAAUGAAC
1354







GCCAAGAAGTCCUTGTATCTTCATAUGAAC
1355







CAGTTCCCCCUGAAGCUGA
1356







CAGTTCUCCCTGAAGCUGGG
1357







CAGUTCTCCCUGAAGCCGAG
1358







CCAGTTCUCCCTGAAGCUGAG
1359







AACCAGTTCUCCCTGAACCUGA
1360







AACCACTTCUCCCTGAAGCUGA
1361







GAACCAATTCUCCCTGAAGCUGA
1362







AGAACCAGTTCUACCTGAAGCUGA
1363







AGAAGCAGTTCUACCTGAAGCUGA
1364







GCAGUGGAGCAGCCUGA
1365







CAGTTCUCCCTGCAGCUGA
1366







ATCTGCAGAUCAGCACGCUAA
1367







TATCTGCAGAUCAGCAGCCUAA
1368







TGTCTTCAGAUCAGCAGCCUAA
1369







TACCTGCAGAUCAGCAGCCUAA
1370







TATCTGCAGAUCTGCAGCCUAA
1371







TGGAGCUGAGCAGCCTGAGATCUGA
1372







CAATGACCAACAUGGACCCTGUGGA
1373







TCTGCAAAUGAACAGCCUGAGAGCC
1374







GAGCUCTGUGACCGCCGCGGACACG
1375







CAGCACCGCCUACCTGCAGUGGAGC
1376







GTTCTCCCUGCAGCTGAACTCTGUG
1377







CAGCACGGCAUATCTGCAGAUCAG
1378







CACAGTCUACATGGAGCUGAGC
1379







CACAGCCUACATGCAGCUGAGC
1380







CACAGCCUACACGGAGCUGAGC
1381







CACAGCCUACATGGAGCUGAGG
1382







CACAGCCUACACAGACCUGAGC
1383







CACAGCCUGCACGGAGCUGAGC
1384







CACAGCCUACATGGAGCUAAGC
1385







CACAGCCUACATGGAGCUGAGC
1386







GACAGCCUACATAGAGCUGAGC
1387







CAAGAATGAAGUGGTTCTAACAGUGA
1388







CAAAAACCAGGUGGTCCTTACAAUGA
1389







CAAAAGCCAGGUGGTCCTTACCAUGA
1390







AACACGCTCUACCTGCAAAUGAACAGC
1391







AAGTCCTTGUATCTTCAAAUGAACAGC
1392







AAGUCCCTGTATCUGCAAAAGAACAGA
1393







AACACGCTGUATCTGCAAAUGAACAGC
1394







AACUCCCTGTATCUGCAAAAGAACAGA
1395







AACTCTCTGUATCTGCAAAUGAACACT
1396







AATTCACTGUCTCTGCAAAUGAACAGC
1397







AACTCACTGUATTTGCAAAUGAACAGT
1398







AACUCACCGTATCUGCAAACGAACAGT
1399







AACACGCUGTATGTTCAAAUGAGCAGT
1400







AACTCACTCUGTTTGCAAAUGAACAGT
1401







AACACGCUGCATCTTCAAAUGAACAGC
1402







AAGTCCTTGUATCTTCATAUGAACAGC
1403







AACACGCTGUATCTGCAAATGAUCAGC
1404







AACTCACUCCGTTTGCAAAUGAACAGT
1405







AACACGCTGUTTCTGCAAAUGAACAGC
1406







AACTCCTTGUATCTTCAAAUGAACAGC
1407







AACTTCCTGUATCAGCAAAUGAACAGC
1408







AACACGCUGTATCTTCAAAUGAACAGC
1409







AACTCACCGUATCTGCAAAUGAACAGC
1410







AACTCACTGUATCTGCAAAUGAACAGC
1411







AACACGATGUATCTGCAAAUGAGCAAC
1412







AGCATCACCUATCTGCAAAUGAACAGC
1413







AACACGCUGTATCTTCAAAUGAGCAGT
1414







AACTCTCTGUATCTGCAAAUGAACAGT
1415







AACACGCTGUATGTCCAAAUGAGCAGT
1416







AACACGGCGUATCTGCAAAUGAACAGC
1417







AGCATCACCUATCTGCAAAUGAAGAGC
1418







AACACGCUGTATCTTCAAAUGAACAAC
1419







AACACGCTGUATCTGCAAAUGAACAGT
1420







AACTCCCTCUATCTGCAAGUGAACAGC
1421







AACTCCCTGUATCTGCAAAUGAACAGT
1422







AGCATCGCCUATCTGCAAAUGAACAGC
1423







AACACGCTGUATCTGCAAAUGAGCAAC
1424







AACACCCTGTAUCTGCAAACGAAUAGC
1425







AACACACTUCATCTGCAAAUGAACAGC
1426







AACAGGCTGUATCTGCAAAUGAACAGC
1427







AACUCACCGTATCUCCAAACGAACAGT
1428







AACACGCUGTATCTTCAAAUGGGCAGC
1429







AACTCACTGUATTTGCTAAUGAACAGT
1430







AACACGCTGUATCTGCAAAUGAGCAGC
1431







CCTGAAGCUGAGCTCTGTGACUG
1432







CCTGAAGCCGAGCUCTGTGACUG
1433







CCTGAACCUGAGCTCTGUGACCG
1434







CCTGAAGCUGAGCTCTGUGACCG
1435







CCTGAAGCUGGGCTCTGUGACCG
1436







GACAAGTCCAUCAGCACCGCCUACC
1437







GACAGCUCCAGCAGCACCGCCUACC
1438







GACAAGCCCAUCAGCACCGCCUACC
1439







GACAAGTCCAUCAGCACTGCCUACC
1440







CAGTTCTCCCUGCAGCTGAACUCT
1441







ACACCTCTGUCAGCATGGCGUA
1442







ACACCTCTGUCAGCACGGCAUA
1443







ACACCTCUGCCAGCACAGCAUA
1444







ACACGTCTGUCAGCACGGCGUG
1445







ACACCTCTGUCAGCATGGCAUA
1446

















TABLE 10







IGH Leader gene











Sequence
SEQ ID NO







GCACCTGGAGGATCCTCCTCTTG
1467







GCACCTGGAGGATCCTCTTCTTG
1468







GGATTTGGAGGATCCTCTTCTTG
1469







GGACCTGGAGGGTCTTCTGCTTG
1470







GGATTTGGAGGGTCCTCTTCTTG
1471







GGACCTGGAGAATCCTCTTCTTG
1472







GGACCTGGAGGTTCCTCTTTGTG
1473







GGACCTGGAGCATCCTTTTCTTG
1474







GGACCTGGAGGATCCTCTTTTTG
1475







GGACCTGGAGGATCCTCTTCTTG
1476







CTTTGTTCCACGCTCCTGCTG
1477







CTTTGCTACACACTCCTGCTG
1478







CTTTGCTCCACGCTCCTGCTG
1479







CTTTGTTCCACGCTCCTGCTA
1480







TTGGGCTGAGCTGGGTTTTCCTC
1481







TGGGGCTGAGCTGGGTTTTCCTT
1482







TGGGACTGAGCTGGATTTTCCTT
1483







TTGGGCTGAGCTGGCTTTTTCTT
1484







TTGGGCTGAGCTGGATTTTCCTT
1485







TTGGACTGAGCTGGGTTTTCCTT
1486







TGGGGCTGTGCTGGGTTTTCCTT
1487







TTGGGCTTAGCTGGGTTTTCCTT
1488







TTGGGCTGAGCTGGGTTTTCCTT
1489







TCTGGCTGAGCTGGGTTCTCCTT
1490







TTTGGCTGAGCTGGGTTTTCCTT
1491







TGGGGCTCCGCTGGGTTTTCCTT
1492







CATCTGTGGTTCTTCCTTCTCCTG
1493







CACCTGTGGTTTTTCCTCCTGCTG
1494







CACCTGTGGTTCTTCCTCCTGCTG
1495







CACCTGTGGTTCTTCCTCCTCCTG
1496







CACCTGTGGTTCTTTCTCCTCCTG
1497







CACCTGTGGTTCTTCCTGCTCCTG
1498







GAAAAGACTACATGATTGCTGAGCTGTT
1499







GGGCCTCTCCACTTAAACCCAGG
1500







CGCCATCCTCGCCCTCCTC
1501







CCTTCCTCATCTTCCTGCCCGTG
1502







CCTTCCTCATCTTCCTGCCGTG
1503







GCACCTGGAGGAUCCTCCTCTUG
1504







GCACCTGGAGGAUCCTCTTCTUG
1505







GGATTTGGAGGAUCCTCTTCTUG
1506







GGACCTGGAGGGUCTTCTGCTUG
1507







GGATTTGGAGGGUCCTCTTCTUG
1508







GGACCTGGAGAAUCCTCTTCTUG
1509







GGACCTGGAGGUTCCTCTTTGUG
1510







GGACCTGGAGCAUCCTTTTCTUG
1511







GGACCTGGAGGAUCCTCTTTTUG
1512







GGACCTGGAGGAUCCTCTTCTUG
1513







CTTTGTTCCACGCUCCTGCUG
1514







CTTTGCUACACACTCCTGCUG
1515







CTTTGCUCCACGCTCCTGCUG
1516







CTTTGTTCCACGCUCCTGCUA
1517







TTGGGCTGAGCUGGGTTTTCCUC
1518







TGGGGCTGAGCUGGGTTTTCCUT
1519







TGGGACTGAGCUGGATTTTCCUT
1520







TTGGGCTGAGCUGGCTTTTTCUT
1521







TTGGGCTGAGCUGGATTTTCCUT
1522







TTGGACTGAGCUGGGTTTTCCUT
1523







TGGGGCTGUGCTGGGTTTTCCUT
1524







TTGGGCTTAGCUGGGTTTTCCUT
1525







TTGGGCTGAGCUGGGTTTTCCUT
1526







TCTGGCTGAGCUGGGTTCTCCUT
1527







TTTGGCTGAGCUGGGTTTTCCUT
1528







TGGGGCTCCGCUGGGTTTTCCUT
1529







CATCTGTGGTUCTTCCTTCTCCUG
1530







CACCTGTGGTUTTTCCTCCTGCUG
1531







CACCTGTGGTUCTTCCTCCTGCUG
1532







CACCTGTGGTUCTTCCTCCTCCUG
1533







CACCTGTGGTUCTTTCTCCTCCUG
1534







CACCTGTGGTUCTTCCTGCTCCUG
1535







GAAAAGACTACAUGATTGCTGAGCTGUT
1536







GGGCCUCTCCACTUAAACCCAGG
1537







CGCCATCCUCGCCCTCCUC
1538







CCTTCCTCATCUTCCTGCCCGUG
1539







CCTTCCTCAUCTTCCTGCCGUG
1540

















TABLE 11







IgH V gene FR1











Sequence
SEQ ID NO







AGTGAAGGTTTCCTGCAAGGCAT
1541







AGTGAAGGTCTCCTGCAAGGTTT
1542







AGTGAAGGTTTCCTGCAAGGCTT
1543







AGTGAAGGTCTCCTGCAAGGCTT
1544







AGTGAAAATCTCCTGCAAGGTTT
1545







GGTGAAGGTCTCCTGCAAGGCTT
1546







ACGCTGACCTGCACCTTCT
1547







ACGCTGACCCGCACCTTCT
1548







ACACTGACCTGCGCCTTCT
1549







ACGCTGACCTGCACCGTCT
1550







ACACTGACCTGCACCTTCT
1551







CTGAAACTCTCCTGTGCAGCCT
1552







CTGAGACTCTCCTTTGCAGCCT
1553







CTGAGACTCTCCTGTTCAGCCT
1554







CTTAGACTCTCCTGTGCAGCCT
1555







CTGAGACTCTCCTGTGCAGCCT
1556







CTGAGACTCTCCTGTGCAGCGT
1557







CTGAGACTCTCCTGTACAGCTT
1558







GTCCCTCACCTGCGCTGTCT
1559







GTCCCTCACCTGTACTGTCT
1560







GTCCCTCACCTGCACTGTCT
1561







GTCCCTCACCTGCGCTATCT
1562







GTCCCTCACCTGCACTGTCA
1563







GTCTCTGAAGATCTCCTGTAAGGGTT
1564







GTCTCTGAGGATCTCCTGTAAGGGTT
1565







CCTCTCACTCACCTGTGCCATCT
1566







AGTGAAGGTTTCCTGCAAGGCTT
1567







GGGCTGAGGTGAAGAAGCTTG
1568







GAGCTGAGGTGAAGAAGCCTG
1569







GGTCTGAGTTGAAGAAGCCTG
1570







GGGCTGAGGTGAAGAAGACTG
1571







GGCCTGAGGTGAAGAAGCCTG
1572







GGGCTGAGGTGAAGAAGCCTG
1573







GTCTGGTCCTACGCTGGTAAAA
1574







GTCTGGTCCTGCGCTGGTGAAA
1575







GTCTGGTCCTGTGCTGGTGAAA
1576







GTCTGGTCCTACGCTGGTGAAA
1577







GGTCCCTTAGACTCTCCTGTG
1578







GGTCCCTGAGACTCTCCTGTA
1579







CGTCCCTGAGACTCTCCTGTA
1580







GGTCCCTGAAACTCTCCTGTG
1581







GGTCCCTGAGACTCTCCTTTG
1582







GGGCCCTGAGACTCTCCTGTG
1583







GGTCCCTGAGACTCTCCTGTT
1584







GGTCCCTGAGACTCTCCTGTG
1585







GGGCGCAGGACTGTTGAAG
1586







AGGTCCAGGACTGGTGAAG
1587







CGGCTCAGGACTGGTGAAG
1588







GGGCCCAGGACTGTTGAAG
1589







GGGCCCAGGACTGGTGAAG
1590







AGGTCCGGGACTGGTGAAG
1591







GCAGTCTGGAGCAGAGGTGAAA
1592







GCAGTCCGGAGCAGAGGTGAAA
1593







GGCTGAGGTGAAGAAGCCTGG
1594







AGCTGAGGTGAAGAAGCCTGG
1595







GCCTGAGGTGAAGAAGCCTGG
1596







GGCTGAGGTGAAGAAGCTTGG
1597







GGCTGAGGTGAAGAAGACTGG
1598







GTCTGAGTTGAAGAAGCCTGG
1599







GTCCTACGCTGGTGAAACCC
1600







GTCCTGTGCTGGTGAAACCC
1601







GTCCTGCGCTGGTGAAACCC
1602







GTCCTACGCTGGTAAAACCC
1603







GTCCAGGACTGGTGAAGCCC
1604







GCCCAGGACTGTTGAAGCCT
1605







GTCCGGGACTGGTGAAGCCC
1606







GCTCAGGACTGGTGAAGCCT
1607







GCCCAGGACTGGTGAAGCCT
1608







GCGCAGGACTGTTGAAGCCT
1609







GCAGTCCGGAGCAGAGGTGAAAA
1610







GCAGTCTGGAGCAGAGGTGAAAA
1611







GGTCTGAGTTGAAGAAGCCTGG
1612







GGCCTGAGGTGAAGAAGCCTGG
1613







GGGCTGAGGTGAAGAAGACTGG
1614







GGGCTGAGGTGAAGAAGCTTGG
1615







GAGCTGAGGTGAAGAAGCCTGG
1616







GGGCTGAGGTGAAGAAGCCTGG
1617







CCTGCGCTGGTGAAACCCAC
1618







CCTGTGCTGGTGAAACCCAC
1619







CCTACGCTGGTGAAACCCAC
1620







CCTACGCTGGTAAAACCCAC
1621







GGCTGAGGTGAAGAAGCCTGGG
1622







GCCTGAGGTGAAGAAGCCTGGG
1623







GGCTGAGGTGAAGAAGCTTGGG
1624







AGCTGAGGTGAAGAAGCCTGGG
1625







GTCTGAGTTGAAGAAGCCTGGG
1626







GGCTGAGGTGAAGAAGACTGGG
1627







TGCGCTGGTGAAACCCACAC
1628







TACGCTGGTAAAACCCACAC
1629







TGTGCTGGTGAAACCCACAG
1630







TACGCTGGTGAAACCCACAC
1631







TCCTCGGTGAAGGTCTCCTGCA
1632







GCCTCAGTGAAGGTCTCCTGCA
1633







TCCTCAGTGAAGGTCTCCTGCA
1634







ACCTCAGTGAAGGTCTCCTGCA
1635







GCCTCAGTGAAGGTTTCCTGCA
1636







TCCTCAGTGAAGGTTTCCTGCA
1637







GCTACAGTGAAAATCTCCTGCA
1638







CAGACCCTCACACTGACCTG
1639







GAGACCCTCACGCTGACCTG
1640







CAGACCCTCACGCTGACCTG
1641







CAGACCCTCACGCTGACCCG
1642







CTGTCCCTCACCTGTACTGT
1643







CCGTCCCTCACCTGCACTGT
1644







CTCTCACTCACCTGTGCCAT
1645







CTGTCCCTCACCTGCACTGT
1646







CTGTCCCTCACCTGCGCTAT
1647







CTGTCCCTCACCTGCGCTGT
1648







GGGAGTCTCTGAGGATCTCCTGTA
1649







GGGAGTCTCTGAAGATCTCCTGTA
1650







TCGGTGAAGGTCTCCTGCAAGG
1651







TCAGTGAAGGTTTCCTGCAAGG
1652







ACAGTGAAAATCTCCTGCAAGG
1653







TCAGTGAAGGTCTCCTGCAAGG
1654







TCACGCTGACCTGCACCGT
1655







TCACGCTGACCCGCACCTT
1656







TCACACTGACCTGCGCCTT
1657







TCACACTGACCTGCACCTT
1658







TCACGCTGACCTGCACCTT
1659







CCCTGAGACTCTCCTGTACAGC
1660







CCCTGAGACTCTCCTGTTCAGC
1661







CCCTGAGACTCTCCTTTGCAGC
1662







CCCTGAAACTCTCCTGTGCAGC
1663







CCCTGAGACTCTCCTGTGCAGC
1664







CCCTTAGACTCTCCTGTGCAGC
1665







GAGTCTCTGAAGATCTCCTGTAAGGG
1666







GAGTCTCTGAGGATCTCCTGTAAGGG
1667







GTGAAGGTCTCCTGCAAGGCTTCT
1668







GTGAAAATCTCCTGCAAGGTTTCT
1669







GTGAAGGTTTCCTGCAAGGCATCT
1670







GTGAAGGTTTCCTGCAAGGCTTCT
1671







GTGAAGGTTTCCTGCAAGGCTTCC
1672







GTGAAGGTCTCCTGCAAGGTTTCC
1673







ACGCTGACCCGCACCTTCTC
1674







ACGCTGACCTGCACCTTCTC
1675







ACGCTGACCTGCACCGTCTC
1676







ACACTGACCTGCGCCTTCTC
1677







ACACTGACCTGCACCTTCTC
1678







GTCCCTCACCTGCACTGTCTC
1679







GTCCCTCACCTGTACTGTCTC
1680







GTCCCTCACCTGCGCTATCTC
1681







GTCCCTCACCTGCACTGTCAC
1682







CTCACTCACCTGTGCCATCTC
1683







GTCCCTCACCTGCGCTGTCTA
1684







GTCCCTCACCTGCGCTGTCTC
1685







GTCTCTGAGGATCTCCTGTAAGGGTTC
1686







GTCTCTGAAGATCTCCTGTAAGGGTTC
1687







AGTGAAGGUTTCCUGCAAGGCAT
1688







AGTGAAGGTCUCCTGCAAGGTUT
1689







AGTGAAGGTTUCCTGCAAGGCUT
1690







AGTGAAGGTCUCCTGCAAGGCUT
1691







AGTGAAAATCUCCTGCAAGGTUT
1692







GGTGAAGGTCUCCTGCAAGGCUT
1693







ACGCTGACCUGCACCTUCT
1694







ACGCUGACCCGCACCTUCT
1695







ACACTGACCUGCGCCTUCT
1696







ACGCTGACCUGCACCGUCT
1697







ACACTGACCUGCACCTUCT
1698







CTGAAACUCTCCTGUGCAGCCT
1699







CTGAGACUCTCCTTUGCAGCCT
1700







CTGAGACUCTCCTGTUCAGCCT
1701







CTTAGACUCTCCTGUGCAGCCT
1702







CTGAGACUCTCCTGUGCAGCCT
1703







CTGAGACUCTCCTGUGCAGCGT
1704







CTGAGACTCUCCTGTACAGCUT
1705







GTCCCTCACCUGCGCTGUCT
1706







GTCCCUCACCTGTACTGUCT
1707







GTCCCTCACCUGCACTGUCT
1708







GTCCCTCACCUGCGCTAUCT
1709







GTCCCTCACCUGCACTGUCA
1710







GTCTCTGAAGAUCTCCTGTAAGGGUT
1711







GTCTCTGAGGAUCTCCTGTAAGGGUT
1712







CCTCTCACUCACCTGTGCCAUCT
1713







AGTGAAGGTTUCCTGCAAGGCUT
1714







GGGCTGAGGUGAAGAAGCTUG
1715







GAGCTGAGGUGAAGAAGCCUG
1716







GGTCTGAGTUGAAGAAGCCUG
1717







GGGCTGAGGUGAAGAAGACUG
1718







GGCCTGAGGUGAAGAAGCCUG
1719







GGGCTGAGGUGAAGAAGCCUG
1720







GTCTGGTCCUACGCTGGUAAAA
1721







GTCTGGTCCUGCGCTGGUGAAA
1722







GTCTGGTCCUGTGCTGGUGAAA
1723







GTCTGGTCCUACGCTGGUGAAA
1724







GGTCCCTUAGACTCTCCTGUG
1725







GGTCCCUGAGACTCTCCTGUA
1726







CGTCCCUGAGACTCTCCTGUA
1727







GGTCCCUGAAACTCTCCTGUG
1728







GGTCCCUGAGACTCTCCTTUG
1729







GGGCCCUGAGACTCTCCTGUG
1730







GGTCCCUGAGACTCTCCTGUT
1731







GGTCCCUGAGACTCTCCTGUG
1732







GGGCGCAGGACUGTUGAAG
1733







AGGUCCAGGACTGGUGAAG
1734







CGGCUCAGGACTGGUGAAG
1735







GGGCCCAGGACUGTUGAAG
1736







GGGCCCAGGACUGGUGAAG
1737







AGGUCCGGGACTGGUGAAG
1738







GCAGTCUGGAGCAGAGGUGAAA
1739







GCAGUCCGGAGCAGAGGUGAAA
1740







GGCTGAGGUGAAGAAGCCUGG
1741







AGCTGAGGUGAAGAAGCCUGG
1742







GCCTGAGGUGAAGAAGCCUGG
1743







GGCTGAGGUGAAGAAGCTUGG
1744







GGCTGAGGUGAAGAAGACUGG
1745







GTCTGAGTUGAAGAAGCCUGG
1746







GTCCUACGCTGGUGAAACCC
1747







GTCCTGUGCTGGUGAAACCC
1748







GTCCUGCGCTGGUGAAACCC
1749







GTCCUACGCTGGUAAAACCC
1750







GTCCAGGACUGGUGAAGCCC
1751







GCCCAGGACUGTUGAAGCCT
1752







GTCCGGGACUGGUGAAGCCC
1753







GCUCAGGACTGGUGAAGCCT
1754







GCCCAGGACUGGUGAAGCCT
1755







GCGCAGGACUGTUGAAGCCT
1756







GCAGUCCGGAGCAGAGGUGAAAA
1757







GCAGTCUGGAGCAGAGGUGAAAA
1758







GGTCTGAGTUGAAGAAGCCUGG
1759







GGCCTGAGGUGAAGAAGCCUGG
1760







GGGCTGAGGUGAAGAAGACUGG
1761







GGGCTGAGGUGAAGAAGCTUGG
1762







GAGCTGAGGUGAAGAAGCCUGG
1763







GGGCTGAGGUGAAGAAGCCUGG
1764







CCUGCGCTGGUGAAACCCAC
1765







CCUGTGCTGGUGAAACCCAC
1766







CCUACGCTGGUGAAACCCAC
1767







CCUACGCTGGUAAAACCCAC
1768







GGCTGAGGUGAAGAAGCCUGGG
1769







GCCTGAGGUGAAGAAGCCUGGG
1770







GGCTGAGGUGAAGAAGCTUGGG
1771







AGCTGAGGUGAAGAAGCCUGGG
1772







GTCTGAGTUGAAGAAGCCUGGG
1773







GGCTGAGGUGAAGAAGACUGGG
1774







TGCGCUGGUGAAACCCACAC
1775







TACGCUGGUAAAACCCACAC
1776







TGUGCTGGUGAAACCCACAG
1777







TACGCUGGUGAAACCCACAC
1778







TCCTCGGUGAAGGTCTCCUGCA
1779







GCCTCAGUGAAGGTCTCCUGCA
1780







TCCTCAGUGAAGGTCTCCUGCA
1781







ACCTCAGUGAAGGTCTCCUGCA
1782







GCCTCAGUGAAGGTTTCCUGCA
1783







TCCTCAGUGAAGGTTTCCUGCA
1784







GCTACAGUGAAAATCTCCUGCA
1785







CAGACCCUCACACTGACCUG
1786







GAGACCCUCACGCTGACCUG
1787







CAGACCCUCACGCTGACCUG
1788







CAGACCCUCACGCUGACCCG
1789







CTGTCCCUCACCTGTACUGT
1790







CCGTCCCUCACCTGCACUGT
1791







CTCTCACUCACCTGUGCCAT
1792







CTGTCCCUCACCTGCACUGT
1793







CTGTCCCUCACCTGCGCUAT
1794







CTGTCCCUCACCTGCGCUGT
1795







GGGAGTCTCUGAGGATCTCCTGUA
1796







GGGAGTCTCUGAAGATCTCCTGUA
1797







TCGGUGAAGGTCTCCUGCAAGG
1798







TCAGUGAAGGTTTCCUGCAAGG
1799







ACAGTGAAAAUCTCCUGCAAGG
1800







TCAGTGAAGGUCTCCUGCAAGG
1801







TCACGCUGACCUGCACCGT
1802







TCACGCUGACCCGCACCUT
1803







TCACACTGACCUGCGCCUT
1804







TCACACUGACCTGCACCUT
1805







TCACGCUGACCTGCACCUT
1806







CCCTGAGACUCTCCTGUACAGC
1807







CCCTGAGACUCTCCTGTUCAGC
1808







CCCTGAGACUCTCCTTUGCAGC
1809







CCCTGAAACUCTCCTGUGCAGC
1810







CCCTGAGACUCTCCTGUGCAGC
1811







CCCTTAGACUCTCCTGUGCAGC
1812







GAGTCTCUGAAGATCTCCTGUAAGGG
1813







GAGTCTCUGAGGATCTCCTGUAAGGG
1814







GTGAAGGTCUCCTGCAAGGCTUCT
1815







GTGAAAATCUCCTGCAAGGTTUCT
1816







GTGAAGGTTUCCTGCAAGGCAUCT
1817







GTGAAGGTTUCCTGCAAGGCTUCT
1818







GTGAAGGTTUCCTGCAAGGCTUCC
1819







GTGAAGGTCUCCTGCAAGGTTUCC
1820







ACGCUGACCCGCACCTTCUC
1821







ACGCTGACCUGCACCTTCUC
1822







ACGCTGACCUGCACCGTCUC
1823







ACACTGACCUGCGCCTTCUC
1824







ACACTGACCUGCACCTTCUC
1825







GTCCCTCACCUGCACTGTCUC
1826







GTCCCTCACCUGTACTGTCUC
1827







GTCCCTCACCUGCGCTATCUC
1828







GTCCCTCACCUGCACTGUCAC
1829







CTCACTCACCUGTGCCATCUC
1830







GTCCCTCACCUGCGCTGTCUA
1831







GTCCCTCACCUGCGCTGTCUC
1832







GTCTCTGAGGAUCTCCTGTAAGGGTUC
1833







GTCTCTGAAGATCUCCTGTAAGGGTUC
1834







AGTGAAGGTCUCCTGCAAGGTUT
1835







AGTGAAGGTTUCCTGCAAGGCUT
1836










The following description of various exemplary embodiments is exemplary and explanatory only and is not to be construed as limiting or restrictive in any way. Other embodiments, features, objects, and advantages of the present teachings will be apparent from the description and accompanying drawings, and from the claims.


Although the present description describes certain exemplary embodiments in detail, other embodiments are also possible and within the scope of the present invention. Variations and modifications will be apparent to those skilled in the art from consideration of the specification and FIGURES, from practice of the teachings described in the specification and FIGURES, and from the claims.


EXEMPLIFICATION

Provided immune repertoire compositions include, without limitation, reagents designed for library preparation and sequencing of rearranged genomic IgH, IGkappa and IGlambda sequences. Generally, gDNA and/or total RNA was extracted from samples (e.g., blood samples, sorted cell samples, normal tissue samples, and tumor samples of various types, whether fresh, frozen, or FFPE); libraries were generated; templates were prepared, e.g., using the Ion Chef™ or Ion OneTouch™ 2 System; prepared templates were sequenced using next generation sequencing technology, e.g., an Ion S5™ System or an Ion GeneStudio S5™ System; and sequence analysis was performed using Ion Reporter™ software. Kits suitable for extracting and/or isolating genomic DNA from biological samples are commercially available from, for example, Thermo Fisher Scientific and BioChain Institute Inc.


Example 1

In the examples herein, exemplary sets of forward and reverse primers comprising SEQ ID Nos 785-816, 847-876, 960-961, 972, 931-935, 941-945, 981-988, and 1304-1446 from Tables 1-9 were used. In one multiplex assay, sets of forward and reverse primers targeting the framework 3 (FR3) portion of the variable gene and the joining gene region of heavy- and light-chain loci (IGH, IGK, IGL) were included for amplifying sequences for alleles found within the IMGT database of B cell genomic DNA, enabling readout of the complementarity-determining region 3 (CDR3) sequence of each immunoglobulin chain. To maximize sensitivity, primers to amplify IGK loci rearrangements involving Kappa deletion and C intron elements were also included. In addition, reflex assays were used to assess extended regions of the IGH sequence (FR3distal-J and FR2distal-J). Performance of the assays was evaluated by clonality assessment and limit-of-detection testing following sequence analysis. Testing used gDNA from research samples representing common B cell malignancies, including B cell lines (ATCC, DSMZ) and clinical research samples (Cureline), including samples derived from peripheral blood, bone marrow, and FFPE-preserved tissues. Sequencing was performed on the Ion GeneStudio S5 and analysis was performed using Ion Reporter 5.16.


Briefly, multiplex amplification reactions were performed as follows. To a single well of a 96-well PCR plate were added 200 ng prepared gDNA, 4 microliters of 5× BCR IGH-IGK-IGL panel (200 nM IgH forward and reverse primer and 100 nM IgK/IgL final concentration of primer pool), 4 microliters of 5× Ion AmpliSeq™ HiFi Mix (an amplification reaction mixture that can include glycerol, dNTPs, and Platinum® Taq High Fidelity DNA Polymerase (Invitrogen, Catalog No. 11304)), 2 microliters dNTP Mix (6 mM each dNTP, prepared in advance), and 2 microliters DNase/RNase free water to bring the final reaction volume to 20 microliters. The single-pool FR2-J and FR3d-J reactions were prepared in the same manner.
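The reaction composition above can be checked with a simple dilution calculation. The following is a minimal sketch only; the function name is hypothetical, and the gDNA volume is inferred from the stated component volumes and the 20 microliter total rather than stated in the text.

```python
# Sketch: final concentrations in the 20 uL multiplex reaction of Example 1.
# The helper name is illustrative; volumes and stock strengths come from the text.
FINAL_VOLUME_UL = 20.0

def final_concentration(stock_conc, added_volume_ul, final_volume_ul=FINAL_VOLUME_UL):
    """Apply C_final = C_stock * V_added / V_final."""
    return stock_conc * added_volume_ul / final_volume_ul

panel_x = final_concentration(5.0, 4.0)   # 5x primer panel -> 1x
hifi_x = final_concentration(5.0, 4.0)    # 5x HiFi Mix -> 1x
dntp_mm = final_concentration(6.0, 2.0)   # contribution of the added dNTP Mix, in mM each

# Assumption: whatever volume is not taken by the listed reagents holds the gDNA.
gdna_volume_ul = FINAL_VOLUME_UL - (4.0 + 4.0 + 2.0 + 2.0)

print(panel_x, hifi_x, dntp_mm, gdna_volume_ul)
```

The dNTP figure covers only the separately added dNTP Mix; any dNTPs already present in the HiFi Mix are not counted.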


The PCR plate was sealed, and the reaction mixtures were mixed, loaded into a thermal cycler (e.g., Veriti™ 96-well thermal cycler (Applied Biosystems)), and run on the following temperature profile to generate the amplicon library. An initial holding stage was performed at 95° C. for 2 minutes, followed by about 20 cycles of a denaturing stage at 95° C. for 30 seconds, an annealing stage at 60° C. for 45 seconds, and an extending stage at 72° C. for 45 seconds. After cycling, a final extension at 72° C. for 10 minutes was performed and the amplicon library was held at 10° C. until proceeding. Typically, about 20 cycles are used to generate the amplicon library. For some applications, up to 30 cycles can be used.
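The temperature profile above can be written down as data, which makes the programmed run time easy to check. This is an illustrative sketch; the class and function names are assumptions, and ramp times and the final 10° C. hold are deliberately not counted.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int

def program_seconds(initial, cycle, n_cycles, final):
    """Programmed time only; ramping and the 10 C hold are excluded."""
    return initial.seconds + n_cycles * sum(s.seconds for s in cycle) + final.seconds

# Values taken from the Example 1 profile.
INITIAL = Step("initial hold", 95.0, 120)
CYCLE = (
    Step("denature", 95.0, 30),
    Step("anneal", 60.0, 45),
    Step("extend", 72.0, 45),
)
FINAL = Step("final extension", 72.0, 600)

minutes_20 = program_seconds(INITIAL, CYCLE, 20, FINAL) / 60  # typical cycle count
minutes_30 = program_seconds(INITIAL, CYCLE, 30, FINAL) / 60  # upper cycle count
print(minutes_20, minutes_30)
```

Under these assumptions the typical 20-cycle program runs for 52 minutes of programmed time and the 30-cycle variant for 72 minutes.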


The amplicon sample was briefly centrifuged to collect contents before proceeding. To the amplicon library (˜20 microliters), 2 microliters of FuPa reagent was added. The reaction mixture was sealed, mixed thoroughly to ensure uniformity and incubated at 50° C. for 10 minutes, 55° C. for 10 minutes, 60° C. for 20 minutes, then held at 10° C. for up to 1 hour. The sample was briefly centrifuged to collect contents before proceeding.


After incubation, the reaction mixture proceeded directly to a ligation step. Here, the reaction mixture now containing the phosphorylated amplicon library was combined with 2 microliters of Ion Select Barcode Adapters, 5 μM each (Thermo Fisher Scientific), 4 microliters of AmpliSeq Plus Switch Solution (sold as a component of the Ion AmpliSeq™ Library Kit Plus, Thermo Fisher Scientific) and 2 microliters of DNA ligase, added last (sold as a component of the Ion AmpliSeq™ Library Kit Plus, Thermo Fisher Scientific), then incubated as follows: 22° C. for 30 minutes, 68° C. for 5 minutes, 72° C. for 5 minutes, then held at 10° C. for up to 1 hour. The sample was briefly centrifuged to collect contents before proceeding.


After the incubation step, 45 microliters (1.5× sample volume) of room temperature AMPure® XP beads (Beckman Coulter) was added to ligated DNA and the mixture was pipetted thoroughly to mix the bead suspension with the DNA. The mixture was incubated at room temperature for 5 minutes, placed on a magnetic rack such as a DynaMag™-96 side magnet (Invitrogen, Part No. 12331D) for two minutes. After the solution had cleared, the supernatant was discarded. Without removing the plate from the magnetic rack, 150 microliters of freshly prepared 70% ethanol was introduced into the sample and incubated while gently rotating the tube on the magnetic rack. After the solution cleared, the supernatant was discarded without disturbing the pellet. A second ethanol wash was performed, the supernatant discarded, and any remaining ethanol was removed by pulse-spinning the tube and carefully removing residual ethanol while not disturbing the pellet. The pellet was air-dried for about 5 minutes at room temperature. The ligated DNA was eluted from the beads in 50 microliters of low TE buffer.
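The bead volume quoted above follows from the accumulated reaction volume and the stated bead-to-sample ratio. A minimal sketch, with hypothetical names, assuming no volume is lost between steps:

```python
# Sketch: bead volume for a cleanup at a given bead-to-sample ratio.
def bead_volume_ul(sample_volume_ul, ratio):
    """Volume of bead suspension to add for a given bead:sample ratio."""
    return sample_volume_ul * ratio

# Volumes from Example 1: amplicon library + FuPa + adapters + Switch Solution + ligase.
ligation_volume_ul = 20 + 2 + 2 + 4 + 2

beads_ul = bead_volume_ul(ligation_volume_ul, 1.5)
print(ligation_volume_ul, beads_ul)
```

The same helper applies to any other ratio; only the sample volume and the ratio change.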


Eluted libraries were quantitated by qPCR using the Ion Library TaqMan® Quantitation Kit (Ion Torrent, Cat. No. 4468802), according to manufacturer instructions. After quantification, the libraries were diluted to a concentration of about 100 pM.


Libraries were normalized to 20 pM and aliquots of the final libraries were used in template preparation and chip loading using the Ion Chef™ instrument according to the manufacturer instructions. Sequencing was performed using Ion 540™ chips on the Ion GeneStudio S5™ System according to manufacturer instructions, and gene sequence analysis was performed with the Ion Torrent Suite™ 5.16 software.
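Normalizing a quantified library to a target concentration is a C1V1 = C2V2 calculation. The sketch below is illustrative only; the function name and the 50 microliter working volume are assumptions, not values from the text.

```python
# Sketch: volumes needed to dilute a library to a target concentration.
def dilution_volumes(stock_pm, target_pm, final_volume_ul):
    """Return (library_ul, diluent_ul) so that the mix is at target_pm."""
    if target_pm > stock_pm:
        raise ValueError("target concentration exceeds stock concentration")
    library_ul = target_pm * final_volume_ul / stock_pm
    return library_ul, final_volume_ul - library_ul

# Example: a library already diluted to about 100 pM, normalized to 20 pM
# in an assumed 50 uL working volume.
library_ul, diluent_ul = dilution_volumes(100.0, 20.0, 50.0)
print(library_ul, diluent_ul)
```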


Exemplary sequencing data for two of the cell lines is shown in Table 12. Similar results were obtained across cell lines.
















TABLE 12

Analysis | Sample | Mean Read Length | Total Reads | Productive Read % | Clones Detected | Shannon Diversity | Evenness
IGK/L | BDCM_Cell_Line_PBL | 84 | 1,557,828 | 42.32 | 3110 | 10.2838 | 0.8863
IGK/L | CRL-2975_Cell_Line_PBL | 91 | 1,278,221 | 42.57 | 3179 | 10.1882 | 0.8757
IGH | BDCM_Cell_Line_PBL | 84 | 1,557,828 | 64.94 | 2668 | 10.5336 | 0.9255
IGH | CRL-2975_Cell_Line_PBL | 91 | 1,278,221 | 64.67 | 2755 | 10.6605 | 0.9329
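The Shannon Diversity and Evenness columns of Table 12 are numerically consistent with Shannon entropy computed in bits and evenness computed as entropy divided by log2 of the number of clones detected. That relationship is an inference from the tabulated values, not a definition given in the text, and the function names below are hypothetical.

```python
import math

def shannon_diversity(clone_counts):
    """Shannon entropy in bits over clone read counts."""
    total = sum(clone_counts)
    return -sum((c / total) * math.log2(c / total) for c in clone_counts if c > 0)

def evenness(diversity_bits, n_clones):
    """Normalized entropy: 1.0 when all clones are equally abundant."""
    return diversity_bits / math.log2(n_clones) if n_clones > 1 else 0.0

# Four equally abundant clones give 2 bits and an evenness of 1.0.
h = shannon_diversity([25, 25, 25, 25])
print(h, evenness(h, 4))

# Cross-check against the first row of Table 12 (IGK/L, BDCM_Cell_Line_PBL).
print(round(evenness(10.2838, 3110), 4))
```

A dominant clone pulls both values toward zero, which is why strongly clonal samples show low diversity and evenness.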









Clonality assessment provides a means to identify the dominating clone (>~10% frequency) and determine the sequence (e.g., CDR3 sequence) of a potential clone of interest. Twenty-seven B cell lines derived from a variety of B cell malignancies (including B-ALL, CLL, Multiple Myeloma, Non-Hodgkin's Lymphoma) were profiled. See Table 13. Cell line samples were diluted 1:100 in PBL gDNA and assessed using the Pan-Clonality IGH/K/L assay, as well as the FR3(d)-J and FR2-J reflex assays, as described above.
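The dominant-clone criterion above can be sketched as a frequency filter over clone read counts. The function name, the clone labels, and the use of a strict greater-than comparison at exactly 10% are assumptions made for illustration.

```python
# Sketch: flag clones whose read frequency exceeds a dominance threshold.
def dominant_clones(clone_counts, threshold=0.10):
    """Return (clone, frequency) pairs above threshold, most frequent first."""
    total = sum(clone_counts.values())
    if total == 0:
        return []
    hits = [(clone, n / total) for clone, n in clone_counts.items() if n / total > threshold]
    return sorted(hits, key=lambda pair: -pair[1])

# Hypothetical read counts keyed by placeholder clone labels.
counts = {"cloneA": 600, "cloneB": 300, "cloneC": 100}
for clone, freq in dominant_clones(counts):
    print(clone, round(freq, 3))
```

In practice the keys would be clonotype identifiers such as CDR3 sequences reported by the analysis software.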









TABLE 13

Cell Lines

Cell Line | Research Model
BDCM | B-ALL
MOLT-3 | T-ALL
MOLT-4 | T-ALL
Loucy | T-ALL
SUP-T1 | T-ALL
HT | B cell Lymphoma (unspecified)
JM1 | B cell Lymphoma (unspecified)
MC116 | B cell Lymphoma (unspecified)
NU-DUL-1 | B cell Lymphoma (unspecified)
RL | B cell Lymphoma (unspecified)
NALM-1 | Blast Phase CML
JVM-2 | CLL - small lymphocytic lymphoma
Pfeiffer | Diffuse large B cell lymphoma
SU-DHL-10 | Diffuse large B cell lymphoma
SU-DHL-6 | Diffuse large B cell lymphoma
SU-DHL-8 | Diffuse large B cell lymphoma
Hs 611.T | Hodgkin's Lymphoma
HuT 78 | Mycosis fungoides - Sezary Syndrome
CA-46 | Burkitt's Lymphoma
Ramos | Burkitt's Lymphoma
BCP-1 | Body cavity-based lymphoma
Daudi | Burkitt's Lymphoma
GA-10 | Burkitt's Lymphoma
Toledo | Non-Hodgkin's Lymphoma
DS-1B | lymphangiectasia
U266B1 | myeloma; plasmacytoma
IM-9 | Multiple Myeloma
WSU-NHL | Non-Hodgkin's Lymphoma
GM14952 | B-ALL










Results of clonality assessment are depicted in Table 14. Boxes indicate positive detection with the number of rearrangements detected. Positive detection of at least one rearrangement (IGH, IGK, IGL, KDE/Cint) was found in 25/27 cell lines tested, demonstrating a 93% positive detection rate.









TABLE 14

B Cell Line Clonality Detection

Cell Line Name | Published rearrangements | IGH-SR (V1) | IGH (V2) | IGK | IGL | KDE/Cint | IGH FR3(d)-J | IGH FR2-J
WSU-NHL | Unknown | | | 2 | 1 | 1 | |
CA46 | IGH, IGK [1] | 2 | 2 | 1 | | | 1 | 1
Toledo | IGK, IGL [1] | | | 1 | 2 | 1 | |
GA-10 | IGH, IGK [1] | 1 | 1 | 1 | | | 1 | 1
Daudi | IGH, IGK [1] | | | | | 1 (IGKV-KDE) | 1 | 1
U266B1 | IGH, IGK, IGL [1] | | | | 1 | 1 | |
GM14952 | Unknown | 1 | 1 | | 2 | | 1 | 1*
Ramos | IGK [2] | | 1** | | 2 | | 1 | 1
RL | IGL [1] IGH [7] | | | | | 1 | 1 | 1
HS611.T | IGH, IGK [1] | | 1 | 2 | | 1 | 1 |
SU-DHL-6 [8] | IGH, IGK [1] | | | 1* | | | |
BDCM | IGH, IGL, TRA [1] | | 1 | | 1 | 1 + 1 (IGKV-KDE) | 1 |
SU-DHL-8 | IGH, IGK, IGL, TRA [1] | | | | 1 | | | 1
GM04154 | Unknown | | 1 | | 1 | | 1 | 1
IM9 | IGH, IGK [3] | | | | 1* | | 1 |
MM.1R (CRL-2975) | IGL [4] | | 1 | | 1 | 1 | 1 |
NALM-1 | IGH, IGL, TRA [1] | | 2 | 1 | | | 1 | 1
DS-1B | IGH (IgG), IGK [5] | | | | | | |
HT | IGH, IGK [1] | | | 1 | | | 1 | 1
JVM-2 | IGH, IGL [1] | | 1 | | 1 | | 1 | 1
LP1 (ACC41) | IGL [6] | | | | 1* | 1 (IGKV-KDE) | 1 | 1*
JM1 | IGK, IGL, TRA [1] | | | | 1 | 1 | |
Pfeiffer | IGH, IGK, IGL [1] | | 1 | | | | 1 | 1
MC116 | IGH, IGK, IGL [1] | | | | 1 | | |
TMM-ACC | Unknown | | 1 | | | 1 (IGKV-KDE) | 1 |
NU-DUL-1 | IGH, IGK, IGL [1] | | | 1 | 1 | | |
BCP-1 | IGH [1] | | 1 | | | | 1 | 1





*Detection of rearrangement at low frequency indicating potential of SHM prohibiting efficient priming


**Analysis reports single clone with two entries possibly due to on-going SHM






Example 2

Detection of clonality in clinical research samples: 20 clinical research samples from a variety of B cell malignancies (including, e.g., MM (Multiple Myeloma), CLL (Chronic Lymphocytic Leukemia), B-ALL (B cell Acute Lymphoblastic Leukemia), and DLBCL (Diffuse Large B cell Lymphoma)) were profiled using the Pan-Clonality (IGH/K/L) assay and associated reflex assays using methods described in Example 1 above. Exemplary sequencing data for clinical samples is shown in Table 15. Similar results were obtained across samples.


Table 16 depicts the results of clonality assessment of the samples. Boxes indicate positive detection with the number of rearrangements detected. Positive detection of at least one rearrangement (IGH, IGK, IGL, KDE/Cint) was found in 19 of 20 samples assessed using the assay, demonstrating a 95% positive detection rate using the single assay approach.















TABLE 15

Sample | Sample Type | Input Amount (ng gDNA) | Total Productive Reads | Total Clones Detected | Shannon Diversity | Evenness
MM-11 | PBMC | 100 | 985384 | 2513 | 4.7529 | 0.4208
CLL-3 | BMMC | 100 | 449851 | 120 | 0.6297 | 0.0912
MM-13 | BM Aspirate | 30 | 34854 | 118 | 5.5671 | 0.8089
MM-3 | BMMC | 100 | 4842 | 43 | 4.3718 | 0.8057
CLL-2 | PBMC | 100 | 408200 | 45 | 0.0416 | 0.0076
MM-2 | BM Aspirate | 50 | 462509 | 3907 | 8.1646 | 0.6843
CLL-1 | BM Aspirate | 50 | 379073 | 43 | 0.0414 | 0.0076
CLL-2 | PBMC | 100 | 455316 | 159 | 0.5654 | 0.0773
DLBCL-1 | FFPE tissue | 100 | 72499 | 384 | 6.4051 | 0.7461
DLBCL-2 | FFPE tissue | 100 | 966 | 5 | 0.9221 | 0.3971
















TABLE 16

Clinical Sample Clonality Detection

Sample Name | IGH | IGK | IGL | KDE/Cint
MM-1_BMMC | | 2 | |
MM-2_BMasp_FF_StageIIIA | | | |
MM-3_BMMC_FF_StageI_IgA | | 2 | |
MM-4 | | 1 | 1 |
MM-5 | | 2 | |
MM-6 | 1 | 1 | |
MM-7 | 1 | 1 | | 1-IGKdel
MM-8 | 1 | | |
MM-9 | 1 | 2 | |
MM-10 | | 1 | 1 |
MM-11 | 1 | 1 | |
MM-12 | 1 | 1 | | 1-IGKdel
MM-13_BMA | | 2 | | 1
CLL-1_BMaspirate_FF | 1 | | 2 |
CLL-2_PMC_FF | 1 | | 2 |
CLL-3_BMMC_FF | 1 | 1 | 1 |
CLL-4_PBMC_FF | 1 | 1 | 1 | 1
B-ALL-1_PBMC | 1 | | |
DLBCL-1_FFPE | | | | 1
DLBCL-1_FFPE | | | 1 | 1
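The positive detection rate reported for Table 16 can be recomputed by counting the samples with at least one non-empty cell. The sketch below transcribes the table as read from the flattened source, so the column assignment of each value is an assumption; the rate itself does not depend on it.

```python
# Sketch: positive detection rate from Table 16.
# Each row: (sample, IGH, IGK, IGL, KDE/Cint); empty string means not detected.
TABLE_16 = [
    ("MM-1_BMMC", "", "2", "", ""),
    ("MM-2_BMasp_FF_StageIIIA", "", "", "", ""),
    ("MM-3_BMMC_FF_StageI_IgA", "", "2", "", ""),
    ("MM-4", "", "1", "1", ""),
    ("MM-5", "", "2", "", ""),
    ("MM-6", "1", "1", "", ""),
    ("MM-7", "1", "1", "", "1-IGKdel"),
    ("MM-8", "1", "", "", ""),
    ("MM-9", "1", "2", "", ""),
    ("MM-10", "", "1", "1", ""),
    ("MM-11", "1", "1", "", ""),
    ("MM-12", "1", "1", "", "1-IGKdel"),
    ("MM-13_BMA", "", "2", "", "1"),
    ("CLL-1_BMaspirate_FF", "1", "", "2", ""),
    ("CLL-2_PMC_FF", "1", "", "2", ""),
    ("CLL-3_BMMC_FF", "1", "1", "1", ""),
    ("CLL-4_PBMC_FF", "1", "1", "1", "1"),
    ("B-ALL-1_PBMC", "1", "", "", ""),
    ("DLBCL-1_FFPE", "", "", "", "1"),
    ("DLBCL-1_FFPE", "", "", "1", "1"),
]

def detection_rate(rows):
    """Return (positives, total, percent) for rows with any detected rearrangement."""
    positives = sum(1 for row in rows if any(cell for cell in row[1:]))
    return positives, len(rows), 100.0 * positives / len(rows)

positives, total, percent = detection_rate(TABLE_16)
print(positives, total, percent)
```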









Example 3

Linearity/Limit-of-detection of the single reaction Pan-Clonality (IGH/K/L) assay using a BDCM cell line. Linearity of response of detection of a BDCM cell line spike-in to a background of PBL gDNA was determined by preparing diluted samples and then determining detection of BDCM rearrangements using the Pan-Clonality assay as described in Example 1 above. Cell line gDNA was serially diluted in PBL gDNA from 1:10 to 1:10^6, then prepared samples were assessed using a single library reaction. Sequencing data for the prepared samples is shown in Table 17. Similar results were obtained across cell lines. The Pan-Clonality (IGH/K/L) assay detects 4 rearrangements in the BDCM cell line. See Example 1 and Table 14. All four rearrangements were detected by the assay from prepared diluted samples (data not shown). In addition, each of the four rearrangements was detected linearly in cell line dilutions down to a dilution level of 1:10^5, and two of the 4 rearrangements (IGH and IGK) were detected at a dilution level of 1:10^6.
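One common way to assess linearity over a serial dilution is to regress log10 observed clone frequency on log10 expected dilution and check that the slope is close to 1. The sketch below illustrates that idea; it is not stated in the text that this is how linearity was evaluated, the function names are hypothetical, and the observed values are synthetic placeholders rather than measured data.

```python
import math

def dilution_series(first_exp=1, last_exp=6):
    """Dilution fractions 1:10 through 1:10^6 as decimal fractions."""
    return [10.0 ** -k for k in range(first_exp, last_exp + 1)]

def loglog_slope(expected, observed):
    """Least-squares slope of log10(observed) against log10(expected)."""
    xs = [math.log10(e) for e in expected]
    ys = [math.log10(o) for o in observed]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

expected = dilution_series()
# Synthetic, perfectly proportional response used only to exercise the function.
observed = [0.5 * e for e in expected]
print(round(loglog_slope(expected, observed), 6))
```

A slope near 1 indicates that observed frequency scales proportionally with the spike-in level; a constant offset, as in the synthetic data, changes the intercept but not the slope.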















TABLE 17

Analysis | Sample | Productive Reads | Proportion of Productive Reads | Clones | Shannon Diversity | Evenness
IGH | BDCM cell line_PBL_gDNA_r1 | 270853 | 0.584659474 | 20714 | 14.078 | 0.9818
IGH | BDCM cell line_PBL_gDNA_r2 | 280881 | 0.582047082 | 20560 | 14.0842 | 0.983
IGH | BDCM cell line_PBL_gDNA_r3 | 331112 | 0.594597035 | 22443 | 14.1976 | 0.9823
IGH | BDCM cell line PBL_gDNA_r4 | 244344 | 0.583328154 | 20503 | 14.0649 | 0.9819
IGH | BDCM cell line_PBL_gDNA_r5 | 226216 | 0.588142327 | 19274 | 13.9777 | 0.982
IGH | BDCM cell line_PBL_gDNA_r6 | 247536 | 0.578067611 | 20609 | 14.0795 | 0.9825
IGH | BDCM cell line_PBL_gDNA_r7 | 322987 | 0.590917158 | 21399 | 14.1209 | 0.9816
IGH | BDCM cell line_PBL_gDNA_r8 | 390315 | 0.597050144 | 23446 | 14.2507 | 0.9817
IGKL | BDCM cell line_PBL_gDNA_r1 | 206742 | 0.328173124 | 9166 | 11.3775 | 0.8644
IGKL | BDCM cell line_PBL_gDNA_r2 | 214020 | 0.32020798 | 9601 | 11.4661 | 0.8667
IGKL | BDCM cell line_PBL_gDNA_r3 | 306028 | 0.332473132 | 10890 | 11.6062 | 0.8654
IGKL | BDCM cell line_PBL_gDNA_r4 | 233572 | 0.329465981 | 10090 | 11.5134 | 0.8656
IGKL | BDCM cell line_PBL_gDNA_r5 | 290676 | 0.341973699 | 11079 | 11.607 | 0.8639
IGKL | BDCM cell line_PBL_gDNA_r6 | 291667 | 0.332251739 | 10617 | 11.5381 | 0.8627
IGKL | BDCM cell line_PBL_gDNA_r7 | 277015 | 0.334880181 | 11027 | 11.6062 | 0.8643
IGKL | BDCM cell line_PBL_gDNA_r8 | 322256 | 0.333145098 | 11460 | 11.6494 | 0.8639









Example 4

Current NGS methods for analyzing SHM rely on multiplex primers targeting the framework 1 (FR1) or Leader regions of the IGH variable gene and joining gene primers to amplify rearranged IGH chains from gDNA templates. We have developed primer panels based on Ion AmpliSeq technology for SHM evaluation. Panels were compared using both DNA and RNA input. Performance was compared using SHM values obtained from RNA samples amplified using FR1 variable gene primers in combination with constant gene primers from the Oncomine™ BCR IGH LR Assay to determine each IGH isotype and subtype in a single PCR reaction. Comparison of SHM frequencies measured from matched RNA and DNA samples was used to determine the feasibility of using RNA in the study of SHM, and to compare the performance of the Leader-J and FR1-J assays in DNA studies.


The Oncomine™ BCR IGH LR Assay covers CDR1, CDR2, CDR3, and the CH1 domain of the constant gene with framework 1 and isotype-specific primers (FR1-C). This design enables accurate quantitation of somatic hypermutation, clonal expansion, and isotype switching, and identification of clonal lineages. Constant region primers are designed against all B cell isotypes and subtypes, with input requirements ranging from 25 ng to 2 μg of non-FFPE RNA.


Exemplary sets of forward and reverse primers comprising SEQ ID Nos 981-988, 1504-1540, and 1593-1740 from Tables 6 and 10-11 were designed to generate BCR IGHV Assay Designs (Leader-J/FR1-J); primers targeting leader-J and FR1-J regions in separate reactions can accurately measure clonal frequencies and/or somatic hypermutation frequencies across B cell rearrangements, with input requirements ranging from 200 ng to 2 μg of gDNA.


The IgH V gene FR1 primers of Table 11 and the IgH J gene primers of Table 6 were designed to amplify all of the currently known expressed human IgH rearrangements. A variety of primer sets for amplifying sequences from the V gene FR1 region to the J genes of IgH gDNA were generated using forward primers selected from Table 11 and reverse primers selected from Table 6.


The IgH Leader primers of Table 10 and the IgH J gene primers of Table 6 were designed to amplify all of the currently known expressed human IgH rearrangements. A variety of primer sets for amplifying sequences from the V gene leader region to the J genes of IgH gDNA were generated using forward primers selected from Table 10 and reverse primers selected from Table 6. Exemplary Leader-J primer set panels are described in Table A with each primer in the set at a 1 micromolar concentration.












TABLE A

Leader-J Primer Set | SEQ ID NOs
1 | 1504-1540, 981-984
2 | 1504-1540, 985-988
3 | 1504-1540, 981-988










For RNA experiments, RNA from human adult normal peripheral blood leukocytes (from BioChain Institute, Inc.) was reverse transcribed to cDNA with SuperScript™ IV VILO™ Master Mix (Thermo Fisher Scientific) according to manufacturer instructions.


To a single well of a 96-well PCR plate was added 10 microliters prepared cDNA or gDNA (50 ng), 4 microliters of 1 μM forward and reverse primer pool, 4 microliters of 5× Ion AmpliSeq™ HiFi Mix (Thermo Fisher Scientific), 2 microliters dNTP Mix (6.5 mM each dNTP) and 2 microliters DNase/RNase free water to bring the final reaction volume to 20 microliters. The PCR plate was sealed, and the reaction mixtures were mixed, loaded into a thermal cycler (e.g., Veriti™ 96-well thermal cycler (Applied Biosystems)), and run on the following temperature profile to generate the amplicon library: an initial holding stage was performed at 95° C. for 2 minutes, followed by about 32 cycles of a denaturing stage at 95° C. for 45 seconds, an annealing stage at 62° C. for 45 seconds, and an extending stage at 72° C. for 195 seconds. After cycling, a final extension at 72° C. for 10 minutes was performed and the amplicon library was held at 10° C. until proceeding. Typically, about 32 cycles are used to generate the amplicon library. For some applications (e.g., more or less DNA starting material, FFPE sourced DNA, quality or quantity of DNA questionable, etc.), cycle number may be increased (e.g., +3).


The amplicon sample was briefly centrifuged to collect contents before proceeding. To the amplicon library (~20 microliters), 2 microliters of FuPa reagent was added. The reaction mixture was sealed, mixed thoroughly to ensure uniformity and incubated at 50° C. for 10 minutes, 55° C. for 10 minutes, 60° C. for 20 minutes, then held at 10° C. for up to 1 hour. The sample was briefly centrifuged to collect contents before proceeding to a ligation step. The reaction mixture now containing the phosphorylated amplicon library was combined with 2 microliters of Ion Torrent™ Dual Barcode Adapters (Thermo Fisher Scientific), 4 microliters of AmpliSeq Plus Switch Solution (sold as a component of the Ion AmpliSeq™ Library Kit Plus, Thermo Fisher Scientific) and 2 microliters of DNA ligase, added last (sold as a component of the Ion AmpliSeq™ Library Kit Plus, Thermo Fisher Scientific), then incubated as follows: 22° C. for 30 minutes, 68° C. for 5 minutes, 72° C. for 5 minutes, then held at 10° C. for up to 1 hour. The sample was briefly centrifuged to collect contents before proceeding to a library purification step.


After the ligation step incubation, 24 microliters (0.8× sample volume) of room temperature AMPure® XP beads (Beckman Coulter) was added to ligated DNA and the mixture was pipetted thoroughly to mix the bead suspension with the DNA. The mixture was incubated at room temperature for 5 minutes, placed on a magnetic rack such as a DynaMag™-96 side magnet (Invitrogen, Part No. 12331D) for two minutes. After the solution had cleared, the supernatant was discarded. Without removing the plate from the magnetic rack, 150 microliters of freshly prepared 70% ethanol was introduced into the sample, and incubated while gently rotating the tube on the magnetic rack. After the solution cleared, the supernatant was discarded without disturbing the pellet. A second ethanol wash was performed, the supernatant discarded, and any remaining ethanol was removed by pulse-spinning the tube and carefully removing residual ethanol while not disturbing the pellet. The pellet was air-dried for about 5 minutes at room temperature.


The ligated DNA was eluted from the beads by suspension in 50 microliters of library amp mix and 2 microliters of 25× Library Amp Primers, the magnetic beads were removed, and the eluted DNA was amplified. An initial holding stage was performed at 98° C. for 2 minutes, followed by about 7 cycles of a denaturing stage at 98° C. for 15 seconds and an annealing/extending stage at 64° C. for 60 seconds. After cycling, the amplicon library was held at 10° C. until proceeding. Typically, about 7 cycles are used to generate the amplicon library. For some applications, cycle number may be reduced (e.g., −2 cycles) or increased (e.g., +2 cycles).


After the amplification step, 30 microliters (0.6× sample volume) of room temperature AMPure® XP beads (Beckman Coulter) was added to ligated DNA and the mixture was pipetted thoroughly to mix the bead suspension with the DNA. The mixture was incubated at room temperature for 5 minutes, placed on a magnetic rack such as a DynaMag™-96 side magnet (Invitrogen, Part No. 12331D) for two minutes. After the solution had cleared, the supernatant was discarded. Without removing the plate from the magnetic rack, 150 microliters of freshly prepared 70% ethanol was introduced into the sample, and incubated while gently rotating the tube on the magnetic rack. After the solution cleared, the supernatant was discarded without disturbing the pellet. A second ethanol wash was performed, the supernatant discarded, and any remaining ethanol was removed by pulse-spinning the tube and carefully removing residual ethanol while not disturbing the pellet. The pellet was air-dried for about 5 minutes at room temperature.


The amplified DNA was eluted from the beads in 50 microliters of low TE buffer and another purification carried out. Purified library was eluted from the beads in 50 microliters of low TE buffer.


The eluted libraries were quantitated by qPCR using the Ion Library TaqMan® Quantitation Kit (Ion Torrent, Cat. No. 4468802), according to manufacturer instructions. After quantification, the libraries were diluted to a concentration of about 25 pM.


The libraries were normalized to 25 pM and aliquots of the final libraries were used in template preparation and chip loading using the Ion Chef™ instrument according to the manufacturer instructions. Sequencing was performed using Ion 540™ chips on the Ion GeneStudio™ System according to manufacturer instructions, and gene sequence analysis was performed with the Ion Torrent Suite™ software. Since the sequences were generated from use of J gene primers, they were subjected to a J gene sequence inference process involving adding the inferred J gene sequence to the sequence read to create an extended sequence read, aligning the extended sequence read to a reference sequence, and identifying productive reads, as described herein. In addition, the generated sequence data was further subjected to the error identification and removal programs provided herein.
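The J gene inference step can be pictured as follows: the 3' end of each read identifies which J gene it came from, and the remainder of that J gene reference is appended to form the extended read. The sketch below is a toy illustration of that idea only; it is not the inference process of the disclosure, the function name and anchor length are assumptions, and the reference sequences are made-up placeholders rather than real J gene sequences.

```python
# Sketch: extend a read with the downstream portion of a matching J reference.
# TOY_J_REFS holds hypothetical placeholder sequences, not real J genes.
TOY_J_REFS = {
    "J_toy1": "AAACCCGGGTTTACGTACGT",
    "J_toy2": "AAACCGGGCTTTTGCATGCA",
}

def extend_read_with_j(read, j_refs, anchor_len=8):
    """Use the last anchor_len bases of the read to pick a unique J reference,
    then append the reference bases that follow the anchor."""
    anchor = read[-anchor_len:]
    hits = [(name, ref.find(anchor)) for name, ref in j_refs.items() if anchor in ref]
    if len(hits) != 1:
        # No match or an ambiguous match: leave the read unchanged.
        return None, read
    name, pos = hits[0]
    return name, read + j_refs[name][pos + anchor_len:]

j_name, extended = extend_read_with_j("GATTACACCCGGGTT", TOY_J_REFS)
print(j_name, extended)
```

The extended read would then be aligned to a reference to decide whether the rearrangement is productive; that alignment step is outside this sketch.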


Oncomine™ BCR-IGH LR libraries were prepared using plasmid constructs containing full-length IGH chains cloned from germline and CLL research samples, which were spiked into a PBL total RNA background. These libraries were sequenced using the Ion™ GeneStudio S5 530 chip and analyzed using Ion Reporter to evaluate the ability to quantify somatic hypermutation (SHM) and to identify the isotype and clonal structure of germline and CLL research samples. Measured and known SHM frequencies were compared using control plasmids. V-gene SHM frequencies for constructs were calculated over the entire V-gene (including the leader sequence).
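As an illustration of how a V-gene SHM frequency can be expressed, the sketch below counts substitutions between a read and its germline V gene over the aligned positions. It is a simplified assumption, not the analysis software's implementation: the inputs are taken to be two rows of an existing pairwise alignment, gap columns are ignored, and the function name and the 24-nucleotide sequences are hypothetical.

```python
def shm_frequency(read_aln: str, germline_aln: str) -> float:
    """Fraction of aligned V-gene positions where the read differs from germline.

    Both inputs are rows of one pairwise alignment ('-' marks a gap);
    gap columns are skipped, so only substitutions are counted.
    """
    if len(read_aln) != len(germline_aln):
        raise ValueError("aligned sequences must have equal length")
    compared = mismatches = 0
    for r, g in zip(read_aln.upper(), germline_aln.upper()):
        if r == "-" or g == "-":
            continue
        compared += 1
        mismatches += r != g
    if compared == 0:
        raise ValueError("no aligned positions to compare")
    return mismatches / compared


germline = "CAGGTGCAGCTGGTGCAGTCTGGG"  # hypothetical 24-nt stretch
mutated = "CAGGTGCAACTGGTGCAGTCAGGG"   # same stretch with 2 substitutions
print(round(shm_frequency(mutated, germline), 3))
```

A frequency of 0 from such a comparison corresponds to the "Germline" status used for the control constructs, and a nonzero frequency to "Mutated".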


Table 18 indicates that the observed SHM levels measured using the Oncomine™ BCR-IGH LR Assay are comparable to the known SHM frequencies of the CLL sequences that were designed into the synthetic plasmid controls.









TABLE 18

Qualifying SHM in Germline and CLL Research Samples

| Accession | Research Sample Status | Expected V-Gene SHM Frequency | Expected Isotype | Expected Clonal Structure | Observed V-Gene SHM Frequency | Observed Isotype | Observed Clonal Structure | Status |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| JX432218.1 | Mutated | 0.037 | IgA1 | Monoclonal | 0.048 | IgA1 | Monoclonal | PASS |
| AF021966.1 | Mutated | 0.088 | IgG2 | Monoclonal | 0.102 | IgG2 | Monoclonal | PASS |
| AF021964.1 | Mutated | 0.084 | IgG1 | Monoclonal | 0.088 | IgG1 | Monoclonal | PASS |
| JX432219.1 | Mutated | 0.058 | IgA2 | Monoclonal | 0.057 | IgA2 | Monoclonal | PASS |
| JX432222.1 | Germline | 0 | IgG3 | Monoclonal | 0 | IgG3 | Monoclonal | PASS |
| AF021958.1 | Germline | 0 | IgM | Monoclonal | 0 | IgM | Monoclonal | PASS |
| AF021967.1 | Germline | 0 | IgD | Monoclonal | 0 | IgD | Monoclonal | PASS |


Oncomine™ BCR-IGH LR SHM values were compared to those obtained by Sanger sequencing using IGH-Leader or FR1 and joining gene primer sets. High concordance between the BCR IGH-LR assay and Sanger sequencing was found when comparing the IGHV SHM frequencies (IGHV SHM Spearman concordance value = 0.849; data not shown).
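For readers unfamiliar with the statistic, a Spearman coefficient is the Pearson correlation computed on ranks rather than on raw values. The following minimal sketch shows that computation with tie handling; the function names and the paired SHM values are hypothetical, since the underlying NGS and Sanger data are not shown.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired SHM frequencies (percent) for five samples.
ngs_shm = [0.0, 0.9, 2.2, 5.7, 9.1]
sanger_shm = [0.0, 1.1, 2.0, 6.0, 8.8]
print(spearman(ngs_shm, sanger_shm))
```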


Libraries were prepared using the Oncomine™ BCR IGH-LR Assay from total RNA extracted from peripheral blood spiked with lymphoma cell line total RNA to a frequency of 10E-2 by mass ratio; and libraries were also prepared using the IGHV SHM Leader-J and FR1-J assays from genomic DNA extracted from peripheral blood spiked with lymphoma cell line genomic DNA to a frequency of 10E-2 by mass ratio. Libraries were sequenced via the Ion GeneStudio™ S5 System, followed by Ion Reporter analysis to identify clonotypes and evaluate B cell clone frequencies.









TABLE 19

Correlation between Ion Oncomine™ BCR IGH LR Assay and IGHV SHM Leader-J and FR1-J Assays

| Cell Line | SHM Frequency measured by BCR Pan-Clonality | SHM Frequency measured by IGH-LR (1) | SHM Frequency measured by IGH-LR (2) | SHM Frequency measured by Leader-J Assay (1) | SHM Frequency measured by Leader-J Assay (2) | SHM Frequency measured by FR1-J Assay (1) | SHM Frequency measured by FR1-J Assay (2) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MM.1R (CRL-2975) | 0 | | | 1.7 | 1.7 | 2.2 | 2.2 |
| JVM2 | 0 | 0.8 | 0.9 | 0.7 | 0.7 | 0.9 | 0.9 |
| BDCM | 2.2 | 5.7 | 5.7 | 5.1 | 5.4 | 5.7 | 5.7 |
| Pfeiffer | 5 | 2.2 | 2.2 | 1.7 | 1.7 | 2.2 | 2.2 |
| GA-10 | 8.3 | | | 6.1 | 6.1 | | |
| TMM | 15 | 9.1 | 9.1 | 7.4 | 7.4 | 9.1 | 9.1 |


Both RNA and DNA input assay workflows correctly determined the SHM status of all rearrangements tested. IGHV SHM values were highly concordant between the RNA and DNA approaches. With DNA input, SHM values derived from FR1-targeting variable gene primers were concordant with those derived from leader-targeting variable gene primers across the wide range of SHM frequencies tested. See Table 19.


Clone frequencies were obtained from the Ion Reporter clone summary analysis, and high concordance was observed for 5 research sample values when correlated between the BCR-IGH LR, IGHV Leader-J, and IGHV FR1-J approaches. See Table 20.









TABLE 20

Clone Frequencies

| | IGH-BCR-LR | IGHV Leader-J | IGHV FR1-J |
| --- | --- | --- | --- |
| IGH-BCR-LR | 1.0000 | 0.8101 | 0.9495 |
| IGHV Leader-J | 0.8101 | 1.0000 | 0.8201 |
| IGHV FR1-J | 0.9495 | 0.8201 | 1.0000 |


High concordance was observed when comparing SHM frequency values for 5 selected research cell lines, which correlated with an R² value of greater than 0.9 in comparison to the values derived from the IGH BCR-LR assay (data not shown).


These results support the ability of highly multiplexed long-read NGS assays to accurately quantify SHM in either DNA or RNA samples. The concordant results between FR1- and Leader-targeting primers using DNA input show the utility of both priming locations. RNA-based NGS methods benefit from lower sample requirements as well as the addition of isotype (and subtype) identification, opening new research areas for study of the B cell immune repertoire.


Example 5

After testing with cell lines as described above, we carried out similar analyses using extracts from CLL (chronic lymphocytic leukemia) clinical research samples obtained from Cureline, with SHM frequency determined by the IGH-LR (RNA), FR1-J (DNA), and Leader-J (DNA) assays. Libraries were prepared and analysis was carried out as described in the previous example, using the Oncomine™ BCR IGH-LR Assay with total RNA extracted from the clinical samples; libraries were also prepared using the IGHV SHM Leader-J and FR1-J assays from genomic DNA extracted from the samples. Libraries were sequenced via the Ion GeneStudio™ S5 System, and SHM frequencies in the CLL research samples were compared.


Our expectation was that the Leader-J assay would show lower somatic hypermutation (SHM) rates because of its longer overall amplicon length. However, the results from the SHM assays are in agreement. See Table 21.


Results support the ability of highly multiplexed long-read NGS assays to accurately quantify SHM in either DNA or RNA samples, in contrived samples and clinical samples. RNA based NGS methods benefit from lower sample requirements as well as the addition of isotype (and subtype) identification, opening new research areas for study of the B cell immune repertoire.









TABLE 21

Correlation between Ion Oncomine™ BCR IGH LR Assay and IGHV SHM Leader-J and FR1-J Assays using clinical samples

| Sample | Cureline Clinical Samples | SHM Frequency measured by IGH-LR (1) | SHM Frequency measured by IGH-LR (2) | SHM Frequency measured by Leader-J Assay (1) | SHM Frequency measured by Leader-J Assay (2) | SHM Frequency measured by FR1-J Assay (1) | SHM Frequency measured by FR1-J Assay (2) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | CLL-16-507/1215 PBMC | 10.40% | 10.40% | 8.10% | 8.10% | 10.40% | 10.40% |
| 2 | CLL-16-628/0318 PBMC | 0% | 0% | 0% | 0% | 0% | 0% |
| 3 | CLL-11-011 PBMC | 8% | 8% | | | 8% | 8% |
| 4 | CCL-11-010 PBMC | | | 0% | 0% | 0% | 0% |
| 5 | CLL-108/06 PBMC | 0% | 0% | 0% | 0% | 0% | 0% |
| 6 | CLL-17-577/0217 PBMC | 0.40% | 0.40% | 0.30% | 0.30% | 0.40% | 0.40% |
| | CLL-17-577/0217 BMMC | 0.40% | 0.40% | 0.30% | 0.30% | 0.40% | 0.40% |
| 7 | 16_5021115_2733930_CLL PBMC | 3.9%; 4%, 3.8% | 3.9%; 4%, 3.8% | 3.10% | 3.10% | 3.9%; 3.8%, 4% | 3.9%; 3.8%, 4% |
| | 16_5021115_2733930_CLL BMA | 3.9%; 4%, 3.8% | 3.9%; 4%, 3.8% | 3.10% | 3.10% | 3.9%; 3.8%, 4% | 3.9%; 3.8%, 4% |
| | 16_5021115_2733930_CLL BMMC | 3.9%; 4%, 3.8% | 3.9%; 4%, 3.8% | 3.10% | 3.10% | 3.9%; 3.8%, 4% | 3.9%; 3.8%, 4% |
| 8 | 16_5100116_2733930_CLL PBMC | 7.4%; 7.3% | 7.4%; 7.3% | 6.40% | 6.40% | 7.70% | 7.70% |
| | 16_5100116_2733930_CLL BMA | 7.4%; 7.3% | 7.4%; 7.3% | 6.40% | 6.40% | 7.70% | 7.70% |
| | 16_5100116_2733930_CLL BMMC | 7.4%; 7.3% | 7.4%; 7.3% | 6.40% | 6.40% | 7.70% | 7.70% |
| 9 | 16_6290318_2733930_CLL PBMC | 0% | 0% | 0% | 0% | 0% | 0% |
| 10 | 16_5681116_2733930_CLL PBMC | 0% | 0% | 0% | 0% | 0% | 0% |


Example 6

A recent publication (Ho, et al. doi.org/10.1016/j.jmoldx.2020.10.015) that tested clonality detection in 438 MM samples from 251 individuals using a combination of the Invivoscribe IGH FR3-, FR2-, FR1-, and Leader-J Assays, as well as the IGK Assay, found 93% positive detection (235/251). When clonality detection was compared using the assays provided herein, total positive detection reached similar or higher levels with a simpler, streamlined assay approach.









TABLE 22

Summary of clonality detection

| Samples | Total Tested | Total positive (IGH) [%] | Total positive (IGL) [%] | Total positive (IGH + IGL) [%] |
| --- | --- | --- | --- | --- |
| Cell Lines | 27 | 13 [48%] | 22 [81%] | 25 [93%] |
| Clinical Research Samples (MM, CLL, B-ALL, DLBCL) | 20 | 11 [55%] | 19 [95%] | 19 [95%] |
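The bracketed percentages in Table 22 follow directly from the positive counts and the number of samples tested. A short check of that arithmetic is sketched below; the dictionary layout and function name are illustrative, and rounding to a whole percent is an assumption about how the table values were reported.

```python
# Counts taken from Table 22.
counts = {
    "Cell Lines": {"tested": 27, "IGH": 13, "IGL": 22, "IGH + IGL": 25},
    "Clinical Research Samples": {"tested": 20, "IGH": 11, "IGL": 19, "IGH + IGL": 19},
}


def detection_rates(row):
    """Percent positive for each assay column, rounded to a whole percent."""
    return {k: round(100 * v / row["tested"]) for k, v in row.items() if k != "tested"}


rates = {name: detection_rates(row) for name, row in counts.items()}
for name, row in rates.items():
    print(name, row)
```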









REFERENCES



  • [1] Tan, K., Ding, L., Sun, Q. et al. BMC Cancer 18, 940 (2018). doi.org/10.1186/s12885-018-4840-5

  • [2] D Benjamin, et al. J Immunol 1982; 129:1336-1342

  • [3] van Boxel J A, Buell D N. Nature. 1974; 251(5474):443-444.

  • [4] web.expasy.org/cellosaurus/CVCL_8794

  • [5] web.expasy.org/cellosaurus/CVCL_5278

  • [6] web.expasy.org/cellosaurus/CVCL_0012; Pegoraro L, Malavasi F, Bellone G, et al. Blood. 1989; 73(4):1020-1027.

  • [7] www.atcc.org/Products/All/CRL-2261.aspx#characteristics

  • [8] SU-DHL-6 possesses a t(14;18)(q32;q21) translocation and demonstrates an unexpected recombination within its heavy chain gene locus that may be the interchromosomal breakpoint.

  • [9] Huet, R., et al. Leukemia (2020) 34: 2257-2259

  • [10] Davi, F., et al. Leukemia (2020), 34: 2545-2551

  • [11] Ho, et al. doi.org/10.1016/j.jmoldx.2020.10.015


Claims
  • 1. A method for amplification of rearranged genomic DNA (gDNA) sequences of a B cell receptor (BCR) repertoire in a sample, comprising: performing a single multiplex amplification reaction to amplify expressed target BCR nucleic acid template molecules using each of a set of: i) (a) a plurality of V gene primers directed to a majority of different V genes of BCR IgH coding sequence comprising at least a portion of framework region 3 (FR3) within the V gene, (b) a plurality of J gene primers directed to at least a portion of a majority of different J genes of the BCR IgH coding sequence; and ii) (a) a plurality of V gene primers directed to a majority of different V genes of BCR IgLlambda coding sequence comprising at least a portion of framework region 3 (FR3) within the V gene, (b) a plurality of J gene primers directed to at least a portion of a majority of different J genes of the BCR IgLlambda coding sequence; and iii) (a) a plurality of V gene primers directed to a majority of different V genes of BCR IgLkappa coding sequence comprising at least a portion of framework region 1 (FR3) within the V gene, (b) a plurality of J gene primers directed to at least a portion of a majority of different J genes of the BCR IgLkappa coding sequence; and optionally iv) (a) one or more gene primers directed to a IgLkappa Cintron sequence, and (b) one or more gene primers directed to a KDE sequence; wherein each set of i) and ii) and iii) primers is directed to coding sequences of the same target BCR gene selected from an IgH, IgLlambda, and IgLkappa gene, respectively, and wherein performing the amplification using the set of i) and ii) and iii) primers results in amplicon molecules representing the target BCR repertoire in the sample; thereby generating target BCR amplicon molecules comprising the expressed target BCR repertoire.
  • 2. The method of claim 1, wherein each of the plurality of primers has any one or more of the following criteria: (1) includes two or more modified nucleotides within the primer, at least one of which is included near or at the termini of the primer and at least one of which is included at, or about the center nucleotide position of the primer; (2) length is about 15 to about 40 bases in length; (3) Tm of from above 60° C. to about 70° C.; (4) has low cross-reactivity with non-target sequences present in the sample; (5) at least the first four nucleotides (going from 3′ to 5′ direction) are non-complementary to any sequence within any other primer present in the same reaction; and (6) are non-complementary to any consecutive stretch of at least 5 nucleotides within any other produced target amplicon.
  • 3. The method of claim 1, wherein each of the plurality of primers includes one or more cleavable groups, preferably located (i) near or at the termini of the primer or (ii) near or about the center nucleotide of the primer.
  • 4. The method of claim 1, wherein each of the plurality of primers includes two or more modified nucleotides having a cleavable group selected from a methylguanine, 8-oxo-guanine, xanthine, hypoxanthine, 5,6-dihydrouracil, uracil, 5-methylcytosine, thymine-dimer, 7-methylguanosine, 8-oxo-deoxyguanosine, xanthosine, inosine, dihydrouridine, bromodeoxyuridine, uridine or 5-methylcytidine.
  • 5. The method of claim 1, wherein the set of primers includes (a) one or more gene primers directed to a IgLkappa Cintron sequence, and (b) one or more gene primers directed to a KDE sequence.
  • 6. The method of claim 1, wherein the plurality of V primers anneal to at least a portion of the FR3 portion of the template molecules, and wherein the one or more J gene primers comprises at least five primers that anneal to at least a portion of the J gene portion of the template molecules.
  • 7. The method of claim 1, wherein the generated target BCR amplicon molecules include complementarity determining region CDR3 of the target BCR gene sequence.
  • 8. The method of claim 1, wherein the at least one set of i) and ii) and iii) and iv) is selected from primers of Tables 9 and 6, Tables 1 and 2, Tables 3 and 4, and Table 5, respectively.
  • 9. A method for screening for a biomarker for a disease or condition in a subject, comprising: performing a single multiplex amplification reaction to amplify target BCR nucleic acid template molecules from a sample from the subject according to claim 1; performing sequencing of the target BCR amplicon molecules and determining the sequence of the molecules, wherein determining the sequence includes obtaining initial sequence reads, aligning the initial sequence read to a reference sequence, identifying productive reads, and correcting one or more indel errors to generate rescued productive sequence reads; identifying BCR repertoire clonal populations from the determined target BCR sequences; and identifying the sequence of at least one BCR clone for use as a biomarker for the disease or condition in the subject.
  • 10. The method of claim 9, wherein the disease or condition is selected from cancer, autoimmune disease, infectious disease, allergy, response to vaccination, and response to an immunotherapy treatment.
  • 11. The method of claim 9, wherein the target BCR gene is IgH, IgLlambda or IgLkappa.
  • 12. The method of claim 9, wherein the sample comprises hematopoietic cells, lymphocytes, tumor cells, or cell-free DNA (cfDNA).
  • 13. The method of claim 9, wherein the sample is selected from the group consisting of peripheral blood mononuclear cells (PBMCs), B cells, circulating tumor cells, and tumor infiltrating lymphocytes.
  • 14. The method of claim 9, wherein the sample is formalin-fixed paraffin-embedded (FFPE) tissue, fresh tissue, frozen tissue, a blood sample, or a plasma sample.
  • 15. A composition for analysis of a B cell receptor (BCR) repertoire in a sample, comprising at least one set of: i) (a) a plurality of V gene primers directed to a majority of different V genes of BCR IgH coding sequence comprising at least a portion of framework region 3 (FR3) within the V gene, (b) a plurality of J gene primers directed to at least a portion of a majority of different J genes of the BCR IgH coding sequence; and ii) (a) a plurality of V gene primers directed to a majority of different V genes of BCR IgLlambda coding sequence comprising at least a portion of framework region 3 (FR3) within the V gene, (b) a plurality of J gene primers directed to at least a portion of a majority of different J genes of the BCR IgLlambda coding sequence; and iii) (a) a plurality of V gene primers directed to a majority of different V genes of BCR IgLkappa coding sequence comprising at least a portion of framework region 1 (FR3) within the V gene, (b) a plurality of J gene primers directed to at least a portion of a majority of different J genes of the BCR IgLkappa coding sequence; and optionally iv) (a) one or more gene primers directed to a IgLkappa Cintron sequence, and (b) one or more gene primers directed to a KDE sequence; wherein each set of i) and ii) and iii) primers is directed to coding sequences of the same target BCR gene selected from an IgH, IgLlambda, and IgLkappa gene, respectively, and wherein performing the amplification using the set of i) and ii) and iii) primers results in amplicon molecules representing the target BCR repertoire in the sample.
  • 16. The composition of claim 15, wherein each of the plurality of primers has any one or more of the following criteria: (1) includes two or more modified nucleotides within the primer, at least one of which is included near or at the termini of the primer and at least one of which is included at, or about the center nucleotide position of the primer; (2) length is about 15 to about 40 bases in length; (3) Tm of from above 60° C. to about 70° C.; (4) has low cross-reactivity with non-target sequences present in the sample; (5) at least the first four nucleotides (going from 3′ to 5′ direction) are non-complementary to any sequence within any other primer present in the same reaction; and (6) are non-complementary to any consecutive stretch of at least 5 nucleotides within any other produced target amplicon.
  • 17. The composition of claim 15, wherein each of the primers includes one or more cleavable groups located (i) near or at the termini of the primer or (ii) near or about the center nucleotide of the primer.
  • 18. The composition of claim 15, wherein each of the plurality of primers includes two or more modified nucleotides having a cleavable group selected from a methylguanine, 8-oxo-guanine, xanthine, hypoxanthine, 5,6-dihydrouracil, uracil, 5-methylcytosine, thymine-dimer, 7-methylguanosine, 8-oxo-deoxyguanosine, xanthosine, inosine, dihydrouridine, bromodeoxyuridine, uridine or 5-methylcytidine.
  • 19. The composition of claim 15, wherein the set of primers comprises (a) one or more gene primers directed to a IgLkappa Cintron sequence, and (b) one or more gene primers directed to a KDE sequence.
  • 20. The composition of claim 15, wherein the at least one set of i) and ii) and iii) and iv) is selected from primers of Tables 9 and 6, Tables 1 and 2, Tables 3 and 4, and Table 5, respectively.
RELATED APPLICATIONS

This application is a continuation of International Application No. PCT/US2021/072434, filed Nov. 16, 2021, which in turn claims priority to and the benefit under 35 USC § 119(e) of each of U.S. Provisional Application No. 63/198,843 filed Nov. 16, 2020, U.S. Provisional Application No. 63/201,048 filed Apr. 9, 2021, and U.S. Provisional Application No. 63/203,337, filed Jul. 17, 2021. The entire contents of each of the aforementioned applications are herein incorporated by reference in their entirety.

Provisional Applications (3)
Number Date Country
63203337 Jul 2021 US
63201048 Apr 2021 US
63198843 Nov 2020 US
Continuations (1)
Number Date Country
Parent PCT/US2021/072434 Nov 2021 US
Child 18317484 US