The present disclosure generally relates to immune-repertoire based disease diagnosis technology, and more particularly to a novel system and method for efficiently grouping similar T cell receptor (TCR) sequences and diagnosing a patient with a disease and determining his/her disease status with a peripheral blood TCR repertoire.
The adaptive immune repertoire is an important regulator of diverse human diseases, and over 10,000 TCR repertoire sequencing (TCR-seq) samples have been generated in recent years. However, interpretation of TCR data has been hindered by the scarcity of known antigen specificities. Recent studies demonstrated that similarity in the TCR hypervariable complementarity-determining region 3 (CDR3) implies structural resemblance for antigen recognition. Therefore, clustering of similar CDR3s has become an important way to identify antigen-specific receptors.
Methods, systems, and apparatuses are disclosed for transforming a T-cell receptor (TCR) repertoire sample into a fixed-length vector. Short peptide sequences with different lengths in each TCR may be encoded into a numeric vector with fixed dimensions. A large number of existing TCRs from healthy individuals may be pooled to generate a distribution of the encoding vector in a high-dimensional Euclidean space. Unsupervised clustering may be performed on the “points” in this space (each point is a TCR) to group them into antigen-specific clusters. The centroid of each cluster may be defined as a repertoire functional unit (“RFU”). For a new TCR repertoire sample, each TCR may be assigned to its most similar RFU group, and the RFU counts may be normalized by the number of sequences in the repertoire. The output data may be a fixed-length RFU vector, with each number representing the relative abundance of the given RFU in the repertoire.
The drawings described below are for illustration purposes only. The drawings are not intended to limit the scope of the present disclosure.
A number of conventional studies have applied TCR clustering to investigate antigen-specific T cell responses during disease progression or immunotherapy treatments. It is speculated that integrating a large number of TCR-seq samples from multiple studies will yield more insights into immune-disease interactions and create novel opportunities for prognosis and diagnosis. Nonetheless, high clustering specificity requires pairwise Smith-Waterman alignment on both the CDR3 sequences and the TCR variable gene (TRBV) alleles, which has quadratic computational complexity and usually cannot scale to the size of TCR repertoire samples (≥100K sequences). Motif-based clustering achieves higher speed, but has much lower specificity. Therefore, none of the existing TCR clustering methods is suitable for analyzing large cohorts of TCR-seq samples.
Unsupervised TCR clustering is a fundamental analysis of immune repertoire data. In the ideal scenario, all TCRs specific to the same epitope should be included in the same cluster. However, this is not feasible for sequence-similarity or motif-based clustering approaches, due to the putative diversity in TCR sequences of shared specificity. Such diversity is caused by the distinct docking strategies of T cell receptors. For example, TCRs specific to the influenza GIL epitope usually contain the classic RSS/RSA motif in the CDR3 region, yet a related study reported that the LGGW motif also elicits strong binding to GIL from a different direction. Such structural variation cannot be captured by simple Smith-Waterman alignment or motif grouping. Consequently, CDR3s with dissimilar motifs will be fragmented into smaller clusters despite their shared specificity, which is a common limitation of the current methods.
To address this challenge, a novel framework was developed to transform a TCR repertoire sample into a fixed-length “gene-expression-like” vector. First, each TCR sequence in the TCR repertoire may be numerically encoded. More specifically, short peptide sequences with different lengths may be encoded into a numeric vector with fixed dimensions. Second, a large number of existing TCRs from healthy individuals may be pooled to generate a distribution of the encoding vector in a high-dimensional Euclidean space. Unsupervised clustering may be performed on the “points” in this space (each point is a TCR) to group them into antigen-specific clusters. The centroid of each cluster may be defined as a Repertoire Functional Unit (“RFU”). For a new TCR repertoire sample, each TCR may be assigned to its most similar RFU group, and the RFU counts may be normalized by the number of sequences in the repertoire. The output data may be a fixed-length RFU vector, with each number representing the relative abundance of the given RFU in the repertoire.
The present disclosure will now be described more fully hereinafter with reference to the accompanying drawings, which form a part hereof, and which show, by way of non-limiting illustration, certain examples. Subject matter may, however, be described in a variety of different forms and, therefore, covered or claimed subject matter is intended to be construed as not being limited to any examples set forth herein. Among other things, subject matter may be described as methods, devices, components, or systems. Accordingly, examples may take the form of hardware, software, firmware or any combination thereof (other than software per se). The following detailed description is, therefore, not intended to be taken in a limiting sense.
In general, terminology may be understood at least in part from usage in context. For example, terms, such as “and”, “or”, or “and/or,” as used herein may include a variety of meanings that may depend at least in part upon the context in which such terms are used. Typically, “or” if used to associate a list, such as A, B or C, is intended to mean A, B, and C, here used in the inclusive sense, as well as A, B or C, here used in the exclusive sense. In addition, the term “one or more” as used herein, depending at least in part upon context, may be used to describe any feature, structure, or characteristic in a singular sense or may be used to describe combinations of features, structures or characteristics in a plural sense. Similarly, terms, such as “a,” “an,” or “the,” again, may be understood to convey a singular usage or to convey a plural usage, depending at least in part upon context. In addition, the term “based on” may be understood as not necessarily intended to convey an exclusive set of factors and may, instead, allow for existence of additional factors not necessarily expressly described, again, depending at least in part on context.
The present disclosure is described below with reference to block diagrams and operational illustrations of methods and devices. It is understood that each block of the block diagrams or operational illustrations, and combinations of blocks in the block diagrams or operational illustrations, may be implemented by means of analog or digital hardware and computer program instructions. These computer program instructions can be provided to a processor of a general purpose computer to alter its function as detailed herein, a special purpose computer, ASIC, or other programmable data processing apparatus, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, implement the functions/acts specified in the block diagrams or operational block or blocks. In some alternate implementations, the functions/acts noted in the blocks can occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved.
For the purposes of this disclosure a non-transitory computer readable medium (or computer-readable storage medium/media) stores computer data, which data can include computer program code (or computer-executable instructions) that is executable by a computer, in machine readable form. By way of example, and not limitation, a computer readable medium may comprise computer readable storage media, for tangible or fixed storage of data, or communication media for transient interpretation of code-containing signals. Computer readable storage media, as used herein, refers to physical or tangible storage (as opposed to signals) and includes without limitation volatile and non-volatile, removable and non-removable media implemented in any method or technology for the tangible storage of information such as computer-readable instructions, data structures, program modules or other data. Computer readable storage media includes, but is not limited to, RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, DVD, or other optical storage, cloud storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other physical or material medium which can be used to tangibly store the desired information or data or instructions and which can be accessed by a computer or processor.
For the purposes of this disclosure the term “server” should be understood to refer to a service point which provides processing, database, and communication facilities. By way of example, and not limitation, the term “server” can refer to a single, physical processor with associated communications and data storage and database facilities, or it can refer to a networked or clustered complex of processors and associated network and storage devices, as well as operating software and one or more database systems and application software that support the services provided by the server. Cloud servers are examples.
For the purposes of this disclosure, a “network” should be understood to refer to a network that may couple devices so that communications may be exchanged, such as between a server and a client device or other types of devices, including between wireless devices coupled via a wireless network, for example. A network may also include mass storage, such as network attached storage (NAS), a storage area network (SAN), a content delivery network (CDN) or other forms of computer or machine readable media, for example. A network may include the Internet, one or more local area networks (LANs), one or more wide area networks (WANs), wire-line type connections, wireless type connections, cellular or any combination thereof. Likewise, sub-networks, which may employ differing architectures or may be compliant or compatible with differing protocols, may interoperate within a larger network.
For purposes of this disclosure, a “wireless network” should be understood to couple client devices with a network. A wireless network may employ stand-alone ad-hoc networks, mesh networks, Wireless LAN (WLAN) networks, cellular networks, or the like. A wireless network may further employ a plurality of network access technologies, including Wi-Fi, Long Term Evolution (LTE), WLAN, Wireless Router (WR) mesh, or 2nd, 3rd, 4th or 5th generation (2G, 3G, 4G or 5G) cellular technology, Bluetooth, 802.11b/g/n, or the like. Network access technologies may enable wide area coverage for devices, such as client devices with varying degrees of mobility, for example.
In short, a wireless network may include virtually any type of wireless communication mechanism by which signals may be communicated between devices, such as a client device or a computing device, between or within a network, or the like.
A computing device may be capable of sending or receiving signals, such as via a wired or wireless network, or may be capable of processing or storing signals, such as in memory as physical memory states, and may, therefore, operate as a server. Thus, devices capable of operating as a server may include, as examples, dedicated rack-mounted servers, desktop computers, laptop computers, set top boxes, integrated devices combining various features, such as two or more features of the foregoing devices, or the like.
Referring now to
The system 100 of
The network 104 may be connected, for example, to one or more client devices 102, an application server 106, a content server 108, and a database 107 and their components with another network or device. The network 104 may be configured as a variety of wireless sub-networks that may further overlay stand-alone ad-hoc networks, and the like, to provide an infrastructure-oriented connection for the one or more client devices 102, the application server 106, the content server 108, and the database 107. The network 104 may be configured to employ any form of computer readable media or network for communicating information from one electronic device to another.
The one or more client devices 102 may, for example, include a desktop computer or a portable device, such as a cellular telephone, a smart phone, a display pager, a radio frequency (RF) device, an infrared (IR) device, a Near Field Communication (NFC) device, a Personal Digital Assistant (PDA), a handheld computer, a tablet computer, a phablet, a laptop computer, a set top box, a wearable computer, a smart watch, an integrated or distributed device combining various features, such as features of the foregoing devices, or the like.
The one or more client devices 102 may also include at least one client application that is configured to receive content from another computing device. The one or more client devices 102 may communicate over the network 104 with other devices or servers, and such communications may include sending and/or receiving messages, generating and providing TCR data, searching for, viewing and/or sharing TCR data, or any of a variety of other forms of communications. The one or more client devices 102 may be capable of processing or storing signals, such as in memory as physical memory states, and may, therefore, operate as a server.
The application server 106 and the content server 108 may include one or more devices that are configured to provide and/or generate any type or form of content via a network to another device. Devices that may operate as the application server 106 and/or the content server 108 may include personal computers, desktop computers, multiprocessor systems, microprocessor-based or programmable consumer electronics, network PCs, servers, and the like. The application server 106 and the content server 108 may store various types of data related to the content and services provided by each device in the database 107.
Users (e.g., patients, doctors, technicians, and the like) may be able to access services provided by the application server 106 and the content server 108. This may include, for example, application servers, authentication servers, search servers, exchange servers, via the network 104 using the one or more client devices 102. Thus, the application server 106, for example, may store various types of applications and application related information including application data and user profile information.
Although
Referring now to
In an example, the TCR engine 200 may be a conventional personal computer, and the methods described below may be performed using a single thread on a CPU. In another example, when clustering reference data of 10 million sequences, the TCR engine 200 may be a high-performance computing (HPC) super cluster (e.g., with 128G memory allocation and 8 CPU nodes).
The TCR engine 200 may be a stand-alone application that executes on a device (e.g., a user device or system/web-connected server/device). In another example, the TCR engine 200 may function as an application installed on the device and/or a web-based application accessed by the device over a network. The TCR engine 200 may be installed as an augmenting script, program or application (e.g., a plug-in or extension) to another application, such as, for example, a health care application that aggregates and shares patient related data.
The database 107 may be any type of database or memory, and may be associated with a server on a network (e.g., the application server 106 and the content server 108) or a user's device (e.g., the one or more client devices 102). The database 107 may include a dataset of data and metadata associated with local and/or network information related to users, services, applications, content and the like. Such information may be stored and indexed in the database 107 independently and/or as a linked or associated dataset. As discussed herein, it should be understood that the data (and metadata) in the database 107 can be any type of information and type, whether known or to be known, without departing from the scope of the present disclosure.
The database 107 may store data for users (e.g., user data). The stored user data may include, for example, information associated with reference TCR-seq data, a patient's cancer diagnosis, patient's chromosomal information, patient's DNA information, patient's blood information, patient demographic information, patient biographic information, and the like, or some combination thereof.
The data (and metadata) in the database 107 may be any type of information related to TCR-seq data, a patient, doctor, content, a device, an application, a service provider, a content provider, whether known or to be known, without departing from the scope of the present disclosure.
The data stored in the database 107 may be encrypted, for example, using 256-bit encryption, such that the data is private and controlled according to the Health Insurance Portability and Accountability Act of 1996 (HIPAA).
The database 107 may store and index the information as a linked set of data and metadata, where the data and metadata relationship can be stored as an n-dimensional vector. Such storage can be realized through any known or to be known vector or array storage, including, but not limited to, a hash tree, queue, stack, VList, or any other type of known or to be known dynamic memory allocation technique or technology. It should be understood that any known or to be known computational analysis technique or algorithm, such as, but not limited to, cluster analysis, data mining, Bayesian network analysis, Hidden Markov models, artificial neural network analysis, logical model and/or tree analysis, and the like, may be applied to determine, derive or otherwise identify vector information for patients and/or health care providers.
As discussed above with reference to
The principal processor, server, or combination of devices that include hardware programmed in accordance with the special purpose functions herein may be referred to for convenience as TCR engine 200. The TCR engine 200 may include a sample module 202, an AI module 204, an encoding module 206, a filtering module 208, an identification (ID) module 210, and a conversion module 212. The engine(s) and modules discussed herein are non-exhaustive, as additional or fewer engines and/or modules (or sub-modules) may be applicable to the examples of the systems and methods discussed. The operations, configurations and functionalities of each module, and their role within examples of the present disclosure are discussed below.
The principles described herein may be embodied in many different forms. T cells reactive to antigens are central mediators of immunity against various diseases and key targets of immunotherapies, yet as most disease antigens are unknown, experimental detection of disease-associated T cells remains difficult. The recent development of deep immune repertoire sequencing (TCR-seq) technology has placed an additional emphasis on the identification of such T cells, as it may open new opportunities for non-invasive clinical diagnosis, prognosis and longitudinal immune monitoring of patients. However, the human immune repertoire contains public T cells, naïve T cells, and memory/effector T cells specific to diverse antigens, and this complexity adds to the challenges that conventional systems are unable to solve (e.g., identifying cancer-associated T cells in the TCR-seq data).
Previous studies on the TCR repertoires of cancer patients reported that simple statistics, such as diversity and clonality, are associated with clinical outcome under certain conditions, substantiating the utilities of repertoire data as a potential prognostic factor. However, with the fast advancement of immunotherapies and rapid accumulation of TCR-seq data, more computational tools are required to bridge the gap between basic immunogenomics research and clinical applications beneficial to patients.
The disclosed systems and methods provide these needed tools through a novel framework executing ensemble machine learning software (referred to as TCRboost) that provides for de novo prediction of disease-associated immune repertoires using the TCR-seq data. Grouping of similar TCR sequences implies shared antigen-specificity, and can be used to discover novel therapeutic targets. Conventional methods suffer from high computational expense and cannot scale up to the magnitude of immune repertoire datasets.
Referring now to
In an example, each sample may be processed to ensure data quality. First, TCR clones without a defined variable gene (TRBV) may be removed. Second, CDR3 amino acid sequences containing non-productive characters, such as “*” or “_”, may be excluded. Third, clones may be ranked by their estimated frequencies from high to low, and a top amount (e.g., 10,000) or maximum number of remaining clones may be selected. Fourth, the format of the TRBV gene names may be modified to be consistent with IMGT (imgt.org) convention. Samples may be derived from peripheral blood.
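The first three quality-control steps above can be sketched as follows. This is a minimal illustration only: the record layout (the `cdr3`, `trbv`, and `freq` keys) and the function name `clean_repertoire` are hypothetical, and the fourth step (IMGT name harmonization) is omitted because the renaming rules are not specified here.

```python
from typing import Dict, List


def clean_repertoire(clones: List[Dict], top_n: int = 10_000) -> List[Dict]:
    """Filter and rank the clones of one TCR-seq sample (illustrative sketch)."""
    kept = [
        c for c in clones
        if c.get("trbv")                                  # step 1: TRBV must be defined
        and c.get("cdr3")
        and not any(ch in c["cdr3"] for ch in "*_")       # step 2: productive CDR3 only
    ]
    # Step 3: rank by estimated frequency (high to low) and keep the top clones.
    kept.sort(key=lambda c: c["freq"], reverse=True)
    return kept[:top_n]


# Toy records, invented for illustration only.
sample = [
    {"cdr3": "CASSLGQAYEQYF", "trbv": "TRBV7-9", "freq": 0.02},
    {"cdr3": "CASS*GQAYEQYF", "trbv": "TRBV7-9", "freq": 0.30},
    {"cdr3": "CASSPDRGNTIYF", "trbv": "", "freq": 0.10},
    {"cdr3": "CASSIRSSYEQYF", "trbv": "TRBV19", "freq": 0.05},
    {"cdr3": "CASRG_TEAFF", "trbv": "TRBV5-1", "freq": 0.01},
    {"cdr3": "CASSQDRGYGYTF", "trbv": "TRBV4-1", "freq": 0.01},
]
top_two = clean_repertoire(sample, top_n=2)
```

In this toy input, the clone with the highest frequency is discarded because its CDR3 contains a stop character, so the ranking is applied only to the clones that pass the first two filters.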
In an example, the sample TCR repertoire sequencing dataset 301 may include TCRs from over 2,000 samples covering cancer, infectious diseases, autoimmune disorders, and healthy controls merged into one file with over 20 million TCRs. Ultra-large-scale TCR clustering may be performed using the geometric isometry based antigen-specific TCR alignment (GIANA) process described in related Patent Cooperation Treaty (PCT) App. Pub. No. WO 2022/271566 entitled “TCR-Repertoire Framework for Multiple Disease Diagnosis” and filed on Jun. 17, 2022. The full disclosure of this application is incorporated herein by reference.
Referring now to
In step 402, the sample module 202 may identify CDR3 sequences from a TCR dataset. The sample module 202 may receive the TCR dataset from, for example, the database 107. In step 404, the encoding module 206 may encode each of the CDR3 sequences from the TCR dataset into numeric vectors. The numeric vectors may correspond to a sequence of amino acids in each of the CDR3 sequences.
In step 406, the conversion module 212 may convert the numeric vectors to coordinates in a high-dimensional Euclidean space. In step 408, the AI module 204 may generate a predictive model using a neural network. The neural network may learn to generate a tree data structure of the numeric vectors based on relative distances of the coordinates and may then group the coordinates into pre-clusters based on the relative distances. In step 410, the filtering module 208 may filter the CDR3 sequences in the pre-clusters. In step 412, the ID module 210 may identify antigen-specific CDR3 clusters from the filtered pre-clusters.
GIANA full mode (e.g., exact and variable gene included) may be implemented to identify highly similar TCR clusters. The returned clusters may be processed in one or more ways. For example, clusters with more than 5 TCRs may be removed, as smaller clusters tend to have higher antigen specificity. Further, clusters with identical sequences may be removed.
The GIANA process may be used to close the gap between speed and prediction accuracy, with better precision and sensitivity than conventional methods (e.g., TCRdist) at approximately 600 times their speed. GIANA may also allow ultrafast query of large reference cohorts, processing over 100 billion sequence comparisons within 3 minutes. In an example, GIANA may be able to compare 10+ TCRs against 10^7 reference sequences within 3 minutes. Applying GIANA to cluster large-scale TCR datasets may reveal novel insights into disease-specific receptors and provide a new solution to the repertoire classification task. Query of unseen TCR-seq samples against existing references using GIANA may achieve high accuracies and may be used to differentiate cancer, infectious disease, and autoimmune disorders. GIANA may be used as a TCR-based non-invasive multi-disease diagnostic platform.
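The post-processing of returned clusters can be sketched as below. This is an illustrative reading, not the GIANA implementation: the mapping from a cluster identifier to a list of CDR3 strings and the name `filter_clusters` are hypothetical, and "clusters with identical sequences" is interpreted here as clusters whose members are all the same CDR3.

```python
from typing import Dict, List


def filter_clusters(clusters: Dict[str, List[str]], max_size: int = 5) -> Dict[str, List[str]]:
    """Keep small clusters that contain at least two distinct CDR3 sequences."""
    kept = {}
    for cluster_id, cdr3s in clusters.items():
        if len(cdr3s) > max_size:
            continue  # large clusters are dropped (smaller ones tend to be more specific)
        if len(set(cdr3s)) < 2:
            continue  # ASSUMPTION: a cluster of identical sequences carries no grouping signal
        kept[cluster_id] = cdr3s
    return kept


# Toy clusters, invented for illustration only.
toy_clusters = {
    "c1": ["CASSIRSSYEQYF", "CASSIRSAYEQYF"],
    "c2": ["CASSLGQAYEQYF", "CASSLGQAYEQYF"],
    "c3": ["CASSA", "CASSB", "CASSC", "CASSD", "CASSE", "CASSF"],
}
filtered = filter_clusters(toy_clusters)
```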
Referring again to
The isometric embedding of trimers may be derived by symmetrizing the trimer substitution matrix by:
This may be performed because the replacement of trimer pairs is not ordered. Next, the Pearson's correlation matrix of Ms, denoted as Ps, may be calculated. The (i, j) entry of Ps may be the Pearson's correlation of trimer i and j (trimers are ordered alphabetically). The Euclidean Distance Matrix (EDM) may be defined using the following formula:
In step 306, multi-dimensional scaling (MDS) may be applied to the EDM. In an example, dimensionality may be set to be 500. In another example, dimensionality may be incremented to 1,000, 1,500, and 2,000. In step 307, the outcome of this analysis may be a length-500 numeric vector (β) for each of the 8,000 amino acid trimers.
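The chain from substitution matrix to trimer embedding can be sketched with classical (Torgerson) MDS. This is a sketch under stated assumptions, not the disclosed implementation: the two formulas referenced above are not reproduced in this text, so the symmetrization `(M + M.T) / 2` and the correlation-to-distance conversion `sqrt(2 * (1 - P))` are assumptions chosen because they are standard, and the function name `embed_from_substitution` is hypothetical. A 6-by-6 random matrix stands in for the 8,000-by-8,000 trimer matrix.

```python
import numpy as np


def embed_from_substitution(M: np.ndarray, dim: int) -> np.ndarray:
    """Return one `dim`-dimensional coordinate vector per row of M."""
    Ms = (M + M.T) / 2.0                      # ASSUMPTION: standard symmetrization
    Ps = np.corrcoef(Ms)                      # Pearson correlation between rows i and j
    # ASSUMPTION: correlation distance, which is Euclidean for standardized rows.
    D = np.sqrt(np.clip(2.0 * (1.0 - Ps), 0.0, None))
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:dim]   # keep the `dim` largest
    top = np.clip(eigvals[order], 0.0, None)  # guard against tiny negative values
    return eigvecs[:, order] * np.sqrt(top)


rng = np.random.default_rng(0)
M = rng.random((6, 6))                        # toy stand-in for the trimer matrix
beta = embed_from_substitution(M, dim=3)      # one length-3 vector per toy "trimer"
```

With the full trimer matrix, `dim=500` would correspond to the length-500 vector (β) described above. Classical MDS is used here because it needs only an eigendecomposition; an iterative MDS solver could be substituted.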
In step 308, mean pooling may be performed. In step 309, a process of numeric encoding of the CDR3 sequences may be performed. The numeric encoding process 309 may be able to incorporate TCRs with different lengths. Each CDR3 sequence may be stripped of the first two and last three amino acids (i.e., conserved motifs). The remaining sequence may be split into tiling trimers. For example, a sequence ASDTAGK may give ASD, SDT, DTA, TAG, and AGK. The n resulting trimers may be selected, and the numeric vectors of these n trimers may be averaged to obtain the numeric encoding vector with fixed dimensions of the TCR of interest:
A key desirable feature of the numeric encoding of TCRs is antigen-specificity (i.e., TCRs specific to the same antigen(s) are expected to have closely located coordinates in the high-dimensional Euclidean space, where distance is well-defined). To evaluate the performance of this new approach, a dataset of T cells with experimentally solved antigen-specificities was used. First, antigens with fewer than 100 associated TCRs in the dataset were selected. This filter was applied because the TCRs reported from some high-throughput tetramer sorting experiments contained a high rate of false positives. Next, 3,487 TCRs with unambiguously matched antigens were selected. The distances of each pair of TCRs were then calculated and used as predictors. The response vector was binary, being 1 when the TCRs in the pair were specific to the same antigen, and 0 otherwise.
A receiver operating characteristic (ROC) curve was made to visualize the prediction accuracy, and an AUC of 0.59 was observed. Despite the low overall AUC, this approach reached a sensitivity of 19.5% at 95% specificity, which is comparable to that of previously described TCR clustering methods. This is because neighboring TCRs share similar amino acid sequences, and similar TCRs may share antigen-specificity. Therefore, the new encoding method provides an absolute set of Euclidean coordinates that measures TCR similarity. Importantly, this embedding covers all the TCR lengths. The coordinate system has the quality of continuity: infinitesimally close TCRs are almost surely specific to the same antigen(s). This is simply because these TCRs will be “almost” identical. However, distal TCRs may also be specific to the same antigen(s), since it is repeatedly reported that TCRs with different motifs can recognize the same epitope. Therefore, it is expected that the sensitivity is low, but specificity is high.
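The tiling and mean-pooling steps can be sketched as follows. The function names and the two-dimensional toy trimer table are hypothetical and chosen only for illustration; in the approach described above, the table would hold a length-500 vector for each of the 8,000 trimers.

```python
from typing import Dict, List, Sequence


def tiling_trimers(core: str) -> List[str]:
    """Split a sequence into overlapping windows of three residues."""
    return [core[i:i + 3] for i in range(len(core) - 2)]


def encode_cdr3(cdr3: str, beta: Dict[str, Sequence[float]]) -> List[float]:
    """Mean-pool trimer vectors into one fixed-length vector for a CDR3."""
    core = cdr3[2:-3]                      # strip first two and last three residues
    trimers = tiling_trimers(core)
    if not trimers:
        raise ValueError("CDR3 too short to yield a trimer after trimming")
    dim = len(beta[trimers[0]])
    n = len(trimers)
    # Average each coordinate over the n trimer vectors.
    return [sum(beta[t][k] for t in trimers) / n for k in range(dim)]


# Hypothetical 2-dimensional trimer vectors, invented for illustration only.
toy_beta = {
    "ASD": [1.0, 0.0],
    "SDT": [0.0, 1.0],
    "DTA": [1.0, 1.0],
    "TAG": [2.0, 0.0],
    "AGK": [1.0, 3.0],
}
trimers = tiling_trimers("ASDTAGK")
vector = encode_cdr3("CAASDTAGKQYF", toy_beta)
```

Because the output length depends only on the dimension of the trimer vectors, CDR3 sequences of different lengths map to vectors of the same size.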
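The evaluation described above (pairwise distance as the predictor, shared antigen as the binary response) can be sketched in plain Python. The helper names and the toy coordinates are hypothetical; the values reported above (AUC of 0.59, 19.5% sensitivity at 95% specificity) come from the real dataset and are not reproduced by this toy input.

```python
from itertools import combinations
from math import dist
from typing import List, Sequence, Tuple


def pair_table(vectors: Sequence[Sequence[float]], antigens: Sequence[str]) -> Tuple[List[float], List[int]]:
    """Distance and same-antigen label for every unordered pair of TCRs."""
    distances, same = [], []
    for i, j in combinations(range(len(vectors)), 2):
        distances.append(dist(vectors[i], vectors[j]))
        same.append(1 if antigens[i] == antigens[j] else 0)
    return distances, same


def auc_from_distances(distances: List[float], same: List[int]) -> float:
    """Probability that a same-antigen pair is closer than a different-antigen pair."""
    pos = [d for d, y in zip(distances, same) if y == 1]
    neg = [d for d, y in zip(distances, same) if y == 0]
    wins = sum((p < n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_at_specificity(distances: List[float], same: List[int], specificity: float = 0.95) -> Tuple[float, float]:
    """Sensitivity when pairs closer than the cut-off are called 'same antigen'."""
    pos = [d for d, y in zip(distances, same) if y == 1]
    neg = sorted(d for d, y in zip(distances, same) if y == 0)
    allowed_fp = min(int((1.0 - specificity) * len(neg)), len(neg) - 1)
    cutoff = neg[allowed_fp]               # at most `allowed_fp` negatives lie below it
    return sum(d < cutoff for d in pos) / len(pos), cutoff


# Toy encodings and antigen labels, invented for illustration only.
toy_vectors = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (5.0, 5.0)]
toy_antigens = ["A", "A", "B", "B", "C"]
d, y = pair_table(toy_vectors, toy_antigens)
auc = auc_from_distances(d, y)
sens, cut = sensitivity_at_specificity(d, y)
```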
After the numeric encoding process 309, a second clustering process 310 may be performed using the “points” in this space (each point is a TCR) to group them into antigen-specific clusters. The second clustering process 310 may be performed using a dataset composed of TCRs from healthy donors. In an example, the second clustering process 310 may be different than the first clustering process 302 described above. The second clustering process 310 may cluster sequences with different lengths. The second clustering process 310 may use a novel encoding approach, which may be derived from a large TCR dataset, and may carry antigen-specificity information.
In an example, the dataset of TCRs from healthy donors may include approximately 500,000 TCRs, although larger numbers are contemplated. For example, the number of TCRs in the dataset from healthy donors may range from approximately 500,000 to over 1 million. In general, the more TCRs used in the dataset from healthy donors, the more accurate the resulting clustering may be. It is contemplated that the dataset from healthy donors may contain hundreds of millions or even billions of TCRs. Unsupervised k-means may be implemented in the 500-dimensional space, with, for example, 5,000 pre-defined centers (although any number may be chosen). The TCRs may be divided into 5,000 clusters because the top 10,000 abundant clones in each repertoire may cover most of the expanded TCRs. To ensure enough hits in each cluster in the downstream analysis, the number of clusters may not be very large. In an example, the average silhouette width reached 0.28, suggesting that the TCRs within each cluster were closer to their own cluster centroid than to other clusters (i.e., clustering was tight). This result suggests that although TCRs in the human immune repertoire display high diversity, they also show conserved distribution patterns in the high-dimensional encoding space, potentially related to the common antigenic challenges from the environment across different individuals.
In step 312, RFUs may be defined. In an example, the distribution of the distances between TCRs and their cluster centroids was measured, and it was observed that 99% of these distances were below 0.25, which is the cut-off for 95% specificity in the ROC curve. Therefore, it may be concluded that TCRs within the same cluster are mostly specific to the same antigens. This result suggested that the centroid of each TCR cluster can be viewed as a “functional unit” of an immune repertoire (i.e., an RFU), with each unit covering a spectrum of antigens. The immune repertoire may be viewed as patches of such units that cover all the possible pathogens to be encountered during a lifetime. The number of antigens that each unit responds to may be very large, considering the enormous number of internal and external immune challenges the human body will receive.
In step 314, TCR repertoire samples may be converted into RFU vectors. For a given TCR repertoire sample, the 500-dimensional encoding vector may be calculated for each TCR. The vector may then be compared to each of the 5,000 RFU centroids using Spearman's correlation. The vector may be designated to the RFU with the highest correlation. The rank correlation may be used instead of Euclidean distance to assign the RFUs to reduce the impact of outlier coordinates and to accelerate computational speed. In an example, over 70% of the TCRs in a sample had Spearman's correlation greater than 0.6, suggesting that most TCRs can be assigned to an RFU with a similar centroid. Processing all K TCRs in a sample may result in a length-5,000 vector, with each entry being the count of TCRs assigned to the related RFU. The final vector may be the vector normalized by K and multiplied by 10,000. The numeric encoding derived from the trimer replacement matrix may allow for an alternative way to visualize and compare repertoire samples.
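The clustering and its tightness check can be sketched with a plain Lloyd's-algorithm k-means and an average silhouette width. This is a toy-scale illustration, not the disclosed implementation: the function names are hypothetical, the initialization (random sampling of input points) is an assumption, and a production run on hundreds of thousands of 500-dimensional vectors would need an optimized library.

```python
import random
from math import dist
from typing import List, Sequence, Tuple


def kmeans(points: List[Sequence[float]], k: int, n_iter: int = 50, seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Lloyd's algorithm; returns (centroids, cluster label per point)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]   # ASSUMPTION: random init
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: each centroid becomes the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[c])                # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def mean_silhouette(points: List[Sequence[float]], labels: List[int]) -> float:
    """Average silhouette width; singleton clusters contribute 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1 or len(clusters) == 1:
            continue
        a = sum(dist(p, q) for q in own) / (len(own) - 1)   # dist(p, p) is 0
        b = min(sum(dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != lab)
        total += (b - a) / max(a, b)
    return total / len(points)


# Two well-separated toy groups, invented for illustration only.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(toy_points, k=2)
width = mean_silhouette(toy_points, labels)
```

In the setting described above, `k` would be 5,000 and each returned centroid would serve as a candidate RFU.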
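The conversion of a sample into an RFU vector can be sketched as below, using a small pure-Python Spearman correlation (Pearson correlation of average ranks). The function names and the four-dimensional toy vectors are hypothetical; in the described approach the encodings have 500 dimensions and there are 5,000 centroids.

```python
from math import sqrt
from typing import List, Sequence


def ranks(x: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            r[order[t]] = avg
        i = j + 1
    return r


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)       # undefined for constant vectors


def rfu_vector(encodings: List[Sequence[float]], centroids: List[Sequence[float]], scale: int = 10_000) -> List[float]:
    """Count TCRs per best-matching RFU, normalize by K, multiply by `scale`."""
    centroid_ranks = [ranks(c) for c in centroids]   # rank each centroid once
    counts = [0] * len(centroids)
    for v in encodings:
        rv = ranks(v)
        best = max(range(len(centroids)), key=lambda c: pearson(rv, centroid_ranks[c]))
        counts[best] += 1
    k_total = len(encodings)
    return [scale * n / k_total for n in counts]


# Toy centroids and encodings, invented for illustration only.
toy_centroids = [[1, 2, 3, 4], [4, 3, 2, 1], [1, 3, 2, 4]]
toy_encodings = [[0.1, 0.2, 0.3, 0.9], [9, 5, 2, 1], [0, 5, 1, 7], [2, 4, 6, 8]]
rfu = rfu_vector(toy_encodings, toy_centroids)
```

Ranking the centroids once up front reflects the speed motivation stated above: each TCR then needs only one ranking pass plus a correlation per centroid.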
Referring now to
To illustrate contrast visualization of immune repertoire samples, two samples from a recent study on B cell Hodgkin lymphoma were selected, one from a healthy control and the other from a lymphoma patient. Both samples were derived from the peripheral blood. After numeric transformation, each sample was converted into a 10,000-by-500 matrix, with each row corresponding to a TCR and each column to a dimension in the encoding space. The distribution of the TCRs in this space was then visualized using tSNE as the dimension reduction technique. Other methods, such as UMAP or PCA, may also serve the same purpose. Each TCR may be represented by a point on the 2-D tSNE plot shown in
Referring now to
Two datasets containing both CD4+ and CD8+ T cell repertoires were processed.
Referring to
As COVID-19 infection has affected over 16% of Americans, it is expected that a marked proportion of the “healthy” population will be COVID-19 patients. To test whether this new situation affects cancer prediction, a new dataset consisting of 121 non-small cell lung cancer (“NSCLC”) patients and 160 individuals recently infected with COVID-19 was built. Both datasets were profiled with the latest version of reagents by the immunoSEQ platform of AdaptiveBiotech. Using PCA on the RFU matrix, clear separation of the lung cancer group from the COVID-19 patients was observed, although the cancer versus non-cancer difference drives only PC2, with AUC reaching 96.3% as shown in
For the purposes of this disclosure a module is a software, hardware, or firmware (or combinations thereof) system, process or functionality, or component thereof, that performs or facilitates the processes, features, and/or functions described herein (with or without human interaction or augmentation). A module may include sub-modules. Software components of a module may be stored on a computer readable medium for execution by a processor. Modules may be integral to one or more servers, or be loaded and executed by one or more servers. One or more modules may be grouped into an engine or an application.
Those skilled in the art will recognize that the methods and systems of the present disclosure may be implemented in many manners and as such are not to be limited by the foregoing examples. In other words, functional elements being performed by single or multiple components, in various combinations of hardware and software or firmware, and individual functions, may be distributed among software applications at either the client level or server level or both. In this regard, any number of the features of the different examples described herein may be combined into single or multiple examples, and alternate examples having fewer than, or more than, all of the features described herein are possible.
Functionality may also be, in whole or in part, distributed among multiple components, in manners now known or to become known. Thus, myriad software/hardware/firmware combinations are possible in achieving the functions, features, interfaces and preferences described herein. Moreover, the scope of the present disclosure covers conventionally known manners for carrying out the described features and functions and interfaces, as well as those variations and modifications that may be made to the hardware or software or firmware components described herein as would be understood by those skilled in the art now and hereafter.
Furthermore, the examples of methods presented and described as flowcharts in this disclosure are provided by way of example in order to provide a more complete understanding of the technology. The disclosed methods are not limited to the operations and logical flow presented herein. Alternative examples are contemplated in which the order of the various operations is altered and in which sub-operations described as being part of a larger operation are performed independently.
While various examples have been described for purposes of this disclosure, such examples should not be deemed to limit the teaching of this disclosure to those examples. Various changes and modifications may be made to the elements and operations described above to obtain a result that remains within the scope of the systems and processes described in this disclosure.
This application claims the benefit of priority under 35 U.S.C. § 119(e) of U.S. Provisional Application No. 63/267,369, filed Jan. 31, 2022. The disclosure of the prior application is considered part of and is herein incorporated by reference in the disclosure of this application in its entirety.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2023/061531 | 1/30/2023 | WO | |

Number | Date | Country |
---|---|---|
63267369 | Jan 2022 | US |