The instant application contains a Sequence Listing which has been submitted electronically in ASCII format and is hereby incorporated by reference in its entirety. Said ASCII copy, created on Jan. 27, 2022, is named 53344-729_201_SL.txt and is 85,016 bytes in size.
Few methods exist for accurate neurodegenerative diagnosis. Primary screening for neurodegeneration is typically based on cognitive assessment (e.g., Mini-Mental State Examinations and Memory Impairment Screens), and therefore typically identifies cognitive decline without providing insight into underlying causes, pathologies, and risk factors. While medical imaging (e.g., Magnetic Resonance Imaging) and tissue analysis can, in certain cases, distinguish neurological conditions, these methods may struggle with early phase detection and tracking disease progression, and may be prohibitively invasive and cost intensive for routine use.
Responsive to the need for faster and less intensive methods for neurological disease diagnosis, aspects of the present disclosure provide compositions, systems, and methods for identifying pluralities of neurological disease biomarkers from biological samples. As individual biomarker analysis has typically proven ineffective for identifying neurological disease states, aspects of the present disclosure provide methods which can identify tens, hundreds, thousands, or tens of thousands of biomolecules from biological samples, as well as patterns of biomolecule abundances and biomolecule-particle binding. Further disclosed herein are computer-implemented systems for identifying biological state information, for example neurological disease information, from biological data.
In some aspects, the present disclosure describes a method, comprising: obtaining a data set comprising protein or peptide information from biomolecule coronas that correspond to physiochemically distinct particles incubated with a biofluid sample from a subject; and using a classifier to identify the biofluid sample as being indicative of a biological state comprising a healthy state, a neurocognitive disorder, or a neurodegenerative disease, in the subject, based on the data set.
In some embodiments, the neurocognitive disorder comprises a mild cognitive impairment (MCI). In some embodiments, the neurodegenerative disease comprises Alzheimer's disease (AD).
In some embodiments, the protein information comprises expression information for a protein provided in a table or figure included herein. In some embodiments, the peptide information comprises expression information for a protein provided in any table or figure included herein.
In some embodiments, obtaining a data set comprises contacting the biofluid sample with the physiochemically distinct particles to form the biomolecule coronas. In some embodiments, obtaining a data set comprises detecting proteins of the biomolecule coronas by mass spectrometry, chromatography, liquid chromatography, high-performance liquid chromatography, solid-phase chromatography, a lateral flow assay, an immunoassay, an enzyme-linked immunosorbent assay, a western blot, a dot blot, or immunostaining, or a combination thereof. In some embodiments, obtaining a data set comprises detecting the proteins of the biomolecule coronas by mass spectrometry. In some embodiments, obtaining a data set comprises measuring a readout indicative of the presence, absence or amount of proteins of the biomolecule coronas.
In some embodiments, the physiochemically distinct particles comprise lipid particles, metal particles, silica particles, or polymer particles. In some embodiments, the physiochemically distinct particles comprise polystyrene particles, magnetizable particles, dextran particles, silica particles, dimethylamine particles, carboxylate particles, amino particles, benzoic acid particles, or agglutinin particles.
In some embodiments, the method further comprises administering a neurocognitive disorder treatment or a neurodegenerative disease treatment to the subject based on the biological state.
In some embodiments, the biofluid comprises a blood sample, a serum sample, or a plasma sample. In some embodiments, the biofluid comprises a blood sample that has had red blood cells removed. In some embodiments, the biofluid is plasma.
In some aspects, the present disclosure describes a method of evaluating a status of a biological state, comprising: measuring biomarkers in a biofluid sample from a subject suspected of having the neurocognitive disorder or the neurodegenerative disease to obtain biomarker measurements, wherein the biomarkers comprise one or more biomarkers selected from a table or figure included herein.
In some embodiments, the biological state comprises a healthy state, a neurocognitive disorder, or a neurodegenerative disease. In some embodiments, the neurocognitive disorder comprises a mild cognitive impairment (MCI). In some embodiments, the neurodegenerative disease comprises Alzheimer's disease (AD).
In some embodiments, measuring the biomarkers comprises using a detection reagent that binds to a protein and yields a detectable signal.
In some embodiments, measuring the biomarkers comprises measuring a readout indicative of the presence, absence or amounts of the one or more biomarkers. In some embodiments, measuring the biomarkers comprises performing mass spectrometry, chromatography, liquid chromatography, high-performance liquid chromatography, solid-phase chromatography, a lateral flow assay, an immunoassay, an enzyme-linked immunosorbent assay, a western blot, a dot blot, or immunostaining, or a combination thereof. In some embodiments, measuring the biomarkers comprises performing mass spectrometry. In some embodiments, measuring the biomarkers comprises performing an immunoassay. In some embodiments, measuring the biomarkers comprises contacting the biofluid sample with a plurality of physiochemically distinct nanoparticles.
In some embodiments, the method further comprises applying a classifier to the biomarker measurements. In some embodiments, the classifier distinguishes any of the healthy state, the neurocognitive disorder, or the neurodegenerative disease, from each other.
In some embodiments, the method further comprises identifying the subject as having the neurocognitive disorder or the neurodegenerative disease based on the biomarker measurements.
In some embodiments, the method further comprises administering a neurocognitive disorder treatment or a neurodegenerative disease treatment to the subject.
In some embodiments, the biofluid comprises blood, plasma, or serum.
In some embodiments, the subject is human.
In some aspects, the present disclosure describes a method, comprising: (a) assaying a biological sample from a subject to identify biomolecules; (b) using a trained classifier to identify that the sample or the subject is positive or negative for Alzheimer's disease (AD) based on the biomolecules identified in (a), wherein the trained classifier is trained using data from training samples comprising known healthy samples and known Alzheimer's disease (AD) samples, and wherein the training samples were assayed using a plurality of particles having physicochemically distinct properties to yield the data.
In some aspects, the present disclosure describes a method, comprising: (a) assaying a biological sample from a subject to identify biomolecules; (b) using a trained classifier to identify that the sample or the subject is positive or negative for mild cognitive impairment (MCI) based on the biomolecules identified in (a), wherein the trained classifier is trained using data from training samples comprising known healthy samples and known mild cognitive impairment (MCI) samples, and wherein the training samples were assayed using a plurality of particles having physicochemically distinct properties to yield the data.
In some embodiments, the biomolecules comprise proteins.
In some embodiments, the proteins are selected from proteins included in a table or figure disclosed herein.
In some embodiments, the data comprises proteomic data identifying a presence or an absence of proteins in the training samples.
In some embodiments, the method further comprises obtaining a biological sample from a subject. In some embodiments, the biological sample is a complex biological sample. In some embodiments, the complex biological sample is a plasma sample or a serum sample.
In some embodiments, the plurality of particles having physicochemically distinct properties comprise two or more particles described herein.
In some embodiments, the assaying comprises performing mass spectrometry or ELISA, and the biomolecules comprise proteins. In some embodiments, the assaying comprises targeted mass spectrometry.
In some embodiments, the trained classifier is a trained algorithm.
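The trained-classifier workflow of the preceding aspects can be sketched as follows. This is a minimal illustration only, assuming presence/absence proteomic data pooled across particles; the nearest-centroid rule is an illustrative stand-in chosen for brevity and is not stated in the disclosure to be the classifier used. The particle labels (P1, P2), protein names (protA, protB, protC), and training samples are hypothetical.

```python
from math import sqrt

def to_vector(detected, features):
    """Encode a set of detected (particle, protein) pairs as a 0/1 vector."""
    return [1.0 if f in detected else 0.0 for f in features]

def centroid(vectors):
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def train(samples, features):
    """samples: list of (detected_set, label). Returns {label: centroid}."""
    by_label = {}
    for detected, label in samples:
        by_label.setdefault(label, []).append(to_vector(detected, features))
    return {label: centroid(vecs) for label, vecs in by_label.items()}

def classify(model, detected, features):
    """Return the label whose centroid is closest (Euclidean) to the sample."""
    v = to_vector(detected, features)

    def dist(c):
        return sqrt(sum((a - b) ** 2 for a, b in zip(v, c)))

    return min(model, key=lambda label: dist(model[label]))

# Hypothetical feature space: (particle, protein) pairs.
FEATURES = [("P1", "protA"), ("P1", "protB"), ("P2", "protA"), ("P2", "protC")]

# Hypothetical training samples with known status.
TRAINING = [
    ({("P1", "protA"), ("P2", "protA")}, "healthy"),
    ({("P1", "protA"), ("P2", "protA"), ("P1", "protB")}, "healthy"),
    ({("P1", "protB"), ("P2", "protC")}, "AD"),
    ({("P2", "protC")}, "AD"),
]

MODEL = train(TRAINING, FEATURES)
print(classify(MODEL, {("P1", "protA"), ("P2", "protA")}, FEATURES))
```

Keying each feature on a (particle, protein) pair rather than on the protein alone keeps the information about which particle captured the protein, which the disclosure treats as informative in its own right.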
Additional aspects and advantages of the present disclosure will become readily apparent to those skilled in this art from the following detailed description, wherein only illustrative embodiments of the present disclosure are shown and described. As will be realized, the present disclosure is capable of other and different embodiments, and its several details are capable of modifications in various obvious respects, all without departing from the disclosure. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.
All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference. To the extent publications and patents or patent applications incorporated by reference contradict the disclosure contained in the specification, the specification is intended to supersede and/or take precedence over any such contradictory material.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings (also “figure” and “FIG.” herein), of which:
From a molecular perspective, neurological disease progression is often difficult to assess, as neurodegeneration is typically associated with multiple underlying and often independent causes. For example, presently recognized mild cognitive impairment (MCI) and Alzheimer's disease (AD) risk factors and indicators may include vascular damage, hypertension, atherosclerosis, infection (including numerous forms of herpes simplex infections), personality changes, cognitive decline, or metabolic abnormalities, with some researchers even positing Alzheimer's disease as “Type 3” diabetes. As many neurological disease risk factors and indicators overlap with those of non-neurological conditions (e.g., liver disease and cirrhosis), identifying and distinguishing neurological diseases is often infeasible with standard pathological and biomarker analysis methods. Further complicating neurological disease analysis, neurological diseases may manifest negligible changes outside of affected tissues, rendering many forms of non-intensive (e.g., blood-based) neurological disease analysis poorly prognostic. Accordingly, options for neurological disease diagnoses absent expensive imaging and intensive nerve biopsy analyses have remained limited.
Responsive to the need for rapid, accurate, and minimally intensive neurological disease diagnostics, the present disclosure provides a range of compositions, systems, and methods for assessing neurological diseases from patient samples. In some cases, the compositions, systems, and methods may be configured to utilize blood or components thereof (e.g., whole blood, plasma, serum) to determine the presence of a neurological disease, such as Alzheimer's disease. The methods, systems, and compositions of the present disclosure may identify a plurality of biomolecules from a sample and may furthermore determine relative or absolute abundances of at least a subset of the biomolecules. This may be compared to other blood biomarker tests, some of which may be used to identify only a single biomolecule (e.g., a particular protein) from blood samples.
A method of the present disclosure may comprise contacting a biological sample (e.g., plasma) with a particle under conditions suitable for biomolecule collection (e.g., non-covalent adsorption of a protein) on the particle. The collection of biomolecules on the surface of the particle may be referred to as a ‘biomolecule corona’. The biomolecule corona that forms on a particle may comprise a complex mixture of biomolecules from the biological sample. A biomolecule corona may include nucleic acids, small molecules, proteins, lipids, polysaccharides, or any combination thereof. The biomolecule corona may compress the abundance ratios of biomolecules from a sample, thereby enabling analysis of dilute, and in many cases difficult to analyze, biomolecules.
A method of the present disclosure may comprise fractionating a biological sample with a particle. In some cases, the method comprises contacting the biological sample with the particle to form thereon a biomolecule corona which comprises biomolecules from the biological sample. The method may comprise separating the biomolecule corona from the biological sample, for example by immobilizing (e.g., magnetically trapping) the particle within a volume and removing unbound components of the biological sample from the volume (e.g., through a series of wash steps). The method may also comprise analyzing a biomolecule of the biomolecule corona. The analyzing may identify the biomolecule, determine an abundance of the biomolecule, identify a state (e.g., post-transcriptional processing of RNA or a post-translational modification of a protein) or form (e.g., a conformation) of the biomolecule, or identify a biomolecule-biomolecule interaction (e.g., a protein-protein interaction reflected, for example, by the formation of a multi-protein complex). As a biomolecule corona may comprise a compressed dynamic range relative to a sample, the analyzing may identify biomolecules over a broader dynamic range (in terms of biological sample concentrations of the biomolecules) than if the analyzing were performed directly on the biological sample (e.g., without particle-based fractionation of the biological sample).
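The notion of a compressed dynamic range can be made concrete with a short calculation. The following is a sketch under stated assumptions: dynamic range is expressed here as the number of orders of magnitude between the most and least abundant species, and the abundance values (arbitrary units) and protein names are hypothetical, chosen only to illustrate the comparison.

```python
from math import log10

def dynamic_range_decades(abundances):
    """Orders of magnitude between the most and least abundant species.

    abundances: dict mapping biomolecule name -> abundance (> 0 if present).
    Returns 0.0 when fewer than two species are present.
    """
    positive = [a for a in abundances.values() if a > 0]
    if len(positive) < 2:
        return 0.0
    return log10(max(positive) / min(positive))

# Hypothetical abundances (arbitrary units) in the neat sample and in a corona.
sample = {"protA": 1e9, "protB": 1e5, "protC": 1e1}
corona = {"protA": 1e5, "protB": 1e4, "protC": 1e2}

print(dynamic_range_decades(sample))  # spread in the neat sample
print(dynamic_range_decades(corona))  # narrower spread in the corona
```

In this hypothetical case the corona spans fewer decades than the neat sample, so an instrument with a fixed detection window would cover a larger share of the species present.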
In some cases, the method comprises contacting the biological sample with a plurality of particles. As biomolecule corona composition may depend on a number of factors, including biological sample composition, biological sample conditions (e.g., pH and salinity), particle concentration, and particle physicochemical properties (e.g., surface charge, hydrophilicity, density, roughness), contacting a sample with a plurality of particles may generate a plurality of biomolecule coronas which reflect different characteristics of the sample. For example, a biomolecule corona of a first particle may be sensitive to sample lipid levels, while a biomolecule corona of a second particle may be sensitive to nanomolar-scale changes in cytokine concentrations. Furthermore, two biomolecule coronas may comprise different subsets of biomolecules from a sample. Accordingly, the method may not only identify a plurality of biomolecules from a biological sample, but may also generate additional information by identifying one or more relationships between biomolecule corona composition, particle type, and sample conditions.
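As an illustration of how coronas from different particles may capture different subsets of biomolecules, the sketch below computes the combined set of identifications across a panel and the identifications unique to each particle. The particle labels and protein identifiers are hypothetical, and the function name is introduced here for illustration only.

```python
def panel_coverage(coronas):
    """Summarize identifications across a particle panel.

    coronas: dict mapping particle label -> set of identified biomolecules.
    Returns (all_ids, unique), where all_ids is the union over all particles
    and unique maps each particle to the identifications seen on it alone.
    """
    all_ids = set().union(*coronas.values())
    unique = {}
    for particle, ids in coronas.items():
        others = set().union(
            *(v for k, v in coronas.items() if k != particle)
        )
        unique[particle] = ids - others
    return all_ids, unique

# Hypothetical identifications from a three-particle panel.
coronas = {
    "P1": {"A", "B"},
    "P2": {"B", "C"},
    "P3": {"C", "D", "E"},
}

all_ids, unique = panel_coverage(coronas)
print(sorted(all_ids))
print({p: sorted(ids) for p, ids in unique.items()})
```

A particle with no unique identifications (P2 in this hypothetical panel) may still be informative, since the abundance of a shared biomolecule can differ from one corona to another.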
Aspects of the present disclosure provide compositions, systems, and methods for collecting biomolecules on particles, as well as particle panels of multiple distinct particle types, which may enrich proteins from a sample onto distinct biomolecule coronas formed on the surface of the distinct particle types. The particle panels disclosed herein can be used in methods of corona analysis to detect tens, hundreds, thousands, or tens of thousands of proteins across a wide dynamic range in the span of hours. In some cases, the composition, system, or method may utilize one particle. In some cases, the composition, system, or method may utilize at least two particles. In some cases, the composition, system, or method may utilize at least three particles. In some cases, the composition, system, or method may utilize at least four particles. In some cases, the composition, system, or method may utilize at least five particles. In some cases, the composition, system, or method may utilize at least six particles. In some cases, the composition, system, or method may utilize at least eight particles. In some cases, the composition, system, or method may utilize at least ten particles. In some cases, the composition, system, or method may utilize at least twelve particles. In some cases, the composition, system, or method may utilize at least fifteen particles. In some cases, the composition, system, or method may contact a sample with a particle under at least two conditions (e.g., at least two temperatures), and may compare the biomolecule corona formed under each of the at least two conditions. In some cases, the method may comprise identifying an abundance ratio of a biomolecule on two or more particles. In some cases, the method may comprise identifying an abundance ratio of a plurality of biomolecules on a particle. 
In some cases, the method may comprise identifying an abundance ratio of a first biomolecule on a first particle and a second biomolecule on a second particle.
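The three kinds of abundance ratio described above (one biomolecule across two particles, two biomolecules on one particle, and a first biomolecule on a first particle against a second biomolecule on a second particle) can all be expressed with a single helper. This is a minimal sketch; the function name, the particle labels, the protein names, and the abundance values are hypothetical.

```python
def abundance_ratio(coronas, particle_a, biomolecule_a, particle_b, biomolecule_b):
    """Ratio of biomolecule_a on particle_a to biomolecule_b on particle_b.

    coronas: dict mapping particle label -> {biomolecule: abundance}.
    Returns None when either measurement is missing or the denominator is zero.
    """
    num = coronas.get(particle_a, {}).get(biomolecule_a)
    den = coronas.get(particle_b, {}).get(biomolecule_b)
    if num is None or den is None or den == 0:
        return None
    return num / den

# Hypothetical corona abundances (arbitrary units).
coronas = {
    "P1": {"protA": 200.0, "protB": 50.0},
    "P2": {"protA": 100.0, "protC": 25.0},
}

same_protein_two_particles = abundance_ratio(coronas, "P1", "protA", "P2", "protA")
two_proteins_one_particle = abundance_ratio(coronas, "P1", "protA", "P1", "protB")
cross_particle_cross_protein = abundance_ratio(coronas, "P1", "protA", "P2", "protC")
print(same_protein_two_particles, two_proteins_one_particle, cross_particle_cross_protein)
```

Returning None for a missing measurement, rather than zero, keeps "not detected" distinct from "detected at negligible abundance" in any downstream analysis.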
In some cases, a method of the present disclosure may be used to identify a biological state, such as a neurological disease state. In some cases, the method may distinguish a healthy biological state from a diseased biological state, or may identify a stage of a biological state, for example early stage Alzheimer's disease, from biomolecule corona data of a biological sample. In some cases, the method may identify a subject or a biological sample as healthy. In some cases, a healthy state may exclude a disease state. For example, a healthy state may exclude having a neurological disorder. In some cases, a disease state may exclude being healthy.
Particle types consistent with the methods disclosed herein can be made from various materials. For example, particle materials of the present disclosure may include metals, polymers, magnetic materials, and lipids. Magnetic particles may be iron oxide particles. Examples of metals include any one of gold, silver, copper, nickel, cobalt, palladium, platinum, iridium, osmium, rhodium, ruthenium, rhenium, vanadium, chromium, manganese, niobium, molybdenum, tungsten, tantalum, iron, cadmium, any other material described in U.S. Pat. No. 7,749,299, or any combination thereof. In some cases, a particle may be a superparamagnetic iron oxide nanoparticle (SPION). A magnetic particle may be a ferromagnetic particle, a ferrimagnetic particle, a paramagnetic particle, a superparamagnetic particle, or any combination thereof (e.g., a particle may comprise a ferromagnetic material and a ferrimagnetic material). For example, a particle core may comprise superparamagnetic γ-ferric iron oxide. In some cases, a particle may comprise a distinct core (e.g., the innermost portion of the particle), shell (e.g., the outermost layer of the particle), and intermediate layer or layers (e.g., portions of the particle disposed between the core and the shell). In some cases, a core may comprise a metal, an oxide, a nitride, a ceramic, a carbon material, a silicon material, a polymer, or any combination thereof. In some cases, a shell may comprise a polymer, a saccharide, a lipid, a peptide, a self-assembled monolayer, a sol-gel, a hydrogel, a glass, or any combination thereof. In some cases, a shell may comprise polystyrene, N-(3-(Dimethylamino)propyl)methacrylamide (DMAPMA), or a combination thereof. In some cases, a shell material may comprise a small molecule functionalization. In some cases, a shell material may comprise a biomolecular functionalization (e.g., a peptide or saccharide functional appendage). In some cases, a particle may comprise a uniform composition.
In some cases, a core or a shell may comprise a plurality of materials comprising a degree of phase separation. For example, a shell may comprise two phase separated polymers. In some cases, a particle core and shell may comprise different densities. In some cases, a shell material may comprise a thickness of at least 2 nm, at least 4 nm, at least 5 nm, at least 8 nm, at least 10 nm, at least 15 nm, at least 20 nm, at least 25 nm, at least 30 nm, or at least 35 nm. In some cases, a shell material may comprise a thickness of at most 35 nm, at most 30 nm, at most 25 nm, at most 20 nm, at most 15 nm, at most 10 nm, at most 8 nm, at most 5 nm, at most 4 nm, or at most 2 nm.
In some cases, a particle may comprise a polymer. In some cases, the polymer may constitute a core material (e.g., the core of a particle may comprise a polymer), a layer (e.g., a particle may comprise a layer of a polymer disposed between its core and its shell), a shell material (e.g., the surface of the particle may be coated with a polymer), or any combination thereof. In some cases, the polymer may comprise a polyethylene, a polycarbonate, a polyanhydride, a polyhydroxyacid, a polypropylfumarate, a polycaprolactone, a polyamide, a polyacetal, a polyether, a polyester, a poly(orthoester), a polycyanoacrylate, a polyvinyl alcohol, a polyurethane, a polyphosphazene, a polyacrylate, a polymethacrylate, a polyurea, a polystyrene, a polyamine, a polyalkylene glycol (e.g., polyethylene glycol (PEG)), a polyester (e.g., poly(lactide-co-glycolide) (PLGA) or a polylactic acid), a copolymer of two or more polymers (e.g., a copolymer of a polyalkylene glycol (e.g., PEG) and a polyester (e.g., PLGA)), or any combination thereof. In some cases, the polymer may be a lipid-terminated polyalkylene glycol and a polyester, or any other material disclosed in U.S. Pat. No. 9,549,901.
In some cases, a particle may comprise a lipid. In some cases, a lipid-containing particle may comprise a lipid coupled to its surface (e.g., covalently attached to a surface amine of the particle or non-covalently bound by a particle-bound lipid binding protein). In some cases, a lipid-containing particle may comprise a lipid within a monolayer or bilayer comprising the lipid. In some cases, the lipid monolayer or bilayer may comprise non-lipidic biomolecules, including sterols, proteins (e.g., clathrins), and saccharides. In some cases, a plurality of lipids associated with a particle may be fully or partially polymerized. In some cases, a particle may comprise a liposome. Examples of lipids that can be used to form the particles of the present disclosure include cationic, anionic, and neutrally charged lipids. In some cases, particles can be made of any one of dioleoylphosphatidylglycerol (DOPG), diacylphosphatidylcholine, diacylphosphatidylethanolamine, ceramide, sphingomyelin, cephalin, cholesterol, cerebrosides and diacylglycerols, dioleoylphosphatidylcholine (DOPC), dimyristoylphosphatidylcholine (DMPC), dioleoylphosphatidylserine (DOPS), phosphatidylglycerol, cardiolipin, diacylphosphatidylserine, diacylphosphatidic acid, N-dodecanoyl phosphatidylethanolamines, N-succinyl phosphatidylethanolamines, N-glutarylphosphatidylethanolamines, lysylphosphatidylglycerols, palmitoyloleoylphosphatidylglycerol (POPG), lecithin, lysolecithin, phosphatidylethanolamine, lysophosphatidylethanolamine, dioleoylphosphatidylethanolamine (DOPE), dipalmitoyl phosphatidyl ethanolamine (DPPE), dimyristoylphosphoethanolamine (DMPE), distearoyl-phosphatidylethanolamine (DSPE), palmitoyloleoyl-phosphatidylethanolamine (POPE), palmitoyloleoylphosphatidylcholine (POPC), egg phosphatidylcholine (EPC), distearoylphosphatidylcholine (DSPC), dioleoylphosphatidylcholine (DOPC), dipalmitoylphosphatidylcholine (DPPC), dioleoylphosphatidylglycerol (DOPG), dipalmitoylphosphatidylglycerol (DPPG), palmitoyloleoylphosphatidylglycerol (POPG), 16-O-monomethyl PE, 16-O-dimethyl PE, 18-1-trans PE, palmitoyloleoyl-phosphatidylethanolamine (POPE), 1-stearoyl-2-oleoyl-phosphatidylethanolamine (SOPE), phosphatidylserine, phosphatidylinositol, sphingomyelin, cephalin, cardiolipin, phosphatidic acid, cerebrosides, dicetylphosphate, cholesterol, any other material listed in U.S. Pat. No. 9,445,994 (which is incorporated herein by reference in its entirety), or any combination thereof.
Examples of particles of the present disclosure are provided in TABLE 1.
A particle of the present disclosure may be synthesized, or a particle of the present disclosure may be purchased from a commercial vendor. For example, some particles of the present disclosure may be purchased from commercial vendors including Sigma-Aldrich, Life Technologies, Fisher Biosciences, nanoComposix, Nanopartz, Spherotech, and other commercial vendors. In some cases, a particle of the present disclosure may be purchased from a commercial vendor and further modified, coated, or functionalized.
An example of a particle type of the present disclosure may be a carboxylate (Citrate) superparamagnetic iron oxide nanoparticle (SPION), a phenol-formaldehyde coated SPION, a silica-coated SPION, a polystyrene coated SPION, a carboxylated poly(styrene-co-methacrylic acid) coated SPION, a N-(3-Trimethoxysilylpropyl)diethylenetriamine coated SPION, a poly(N-(3-(dimethylamino)propyl) methacrylamide) (PDMAPMA)-coated SPION, a 1,2,4,5-Benzenetetracarboxylic acid coated SPION, a poly(Vinylbenzyltrimethylammonium chloride) (PVBTMAC) coated SPION, a carboxylate, PAA coated SPION, a poly(oligo(ethylene glycol) methyl ether methacrylate) (POEGMA)-coated SPION, a carboxylate microparticle, a polystyrene carboxyl functionalized particle, a carboxylic acid coated particle, a silica particle, a carboxylic acid particle of about 150 nm in diameter, an amino surface microparticle of about 0.4-0.6 μm in diameter, a silica amino functionalized microparticle of about 0.1-0.39 μm in diameter, a Jeffamine surface particle of about 0.1-0.39 μm in diameter, a polystyrene microparticle of about 2.0-2.9 μm in diameter, a silica particle, a carboxylated particle with an original coating of about 50 nm in diameter, a particle coated with a dextran based coating of about 0.13 μm in diameter, or a silica silanol coated particle with low acidity. An example of a particle type of the present disclosure may be a mixed amide, carboxylate functionalized, silica-coated SPION having a mean size of about 280 nm and a zeta potential of about 50 mV. An example of a particle type of the present disclosure may be an epichlorohydrin crosslinked Dextran-coated SPION having a mean size of about 275+/−30 nm and a zeta potential of about 15 to 20 mV. An example of a particle type of the present disclosure may be a N1-(3-(trimethoxysilyl)propyl)hexane-1,6-diamine functionalized, silica-coated SPION having a mean size of about 280 nm and a zeta potential of about 40 mV.
Particles of the present disclosure can be made and used in methods of forming protein coronas after incubation in a biofluid at a wide range of sizes. In some cases, a particle of the present disclosure may be a nanoparticle. In some cases, a nanoparticle of the present disclosure may be from about 10 nm to about 1000 nm in diameter. In some cases, a nanoparticle may be at least 10 nm, at least 100 nm, at least 200 nm, at least 300 nm, at least 400 nm, at least 500 nm, at least 600 nm, at least 700 nm, at least 800 nm, at least 900 nm, from 10 nm to 50 nm, from 50 nm to 100 nm, from 100 nm to 150 nm, from 150 nm to 200 nm, from 200 nm to 250 nm, from 250 nm to 300 nm, from 300 nm to 350 nm, from 350 nm to 400 nm, from 400 nm to 450 nm, from 450 nm to 500 nm, from 500 nm to 550 nm, from 550 nm to 600 nm, from 600 nm to 650 nm, from 650 nm to 700 nm, from 700 nm to 750 nm, from 750 nm to 800 nm, from 800 nm to 850 nm, from 850 nm to 900 nm, from 100 nm to 300 nm, from 150 nm to 350 nm, from 200 nm to 400 nm, from 250 nm to 450 nm, from 300 nm to 500 nm, from 350 nm to 550 nm, from 400 nm to 600 nm, from 450 nm to 650 nm, from 500 nm to 700 nm, from 550 nm to 750 nm, from 600 nm to 800 nm, from 650 nm to 850 nm, from 700 nm to 900 nm, or from 10 nm to 900 nm in diameter. In some cases, a nanoparticle may be less than 1000 nm in diameter. In some cases, a particle may comprise a diameter of about 30 nm to about 800 nm. In some cases, a particle comprises a diameter of about 60 nm to about 600 nm. In some cases, a particle comprises a diameter of about 60 nm to about 500 nm. In some cases, a particle comprises a diameter of about 60 nm to about 400 nm. In some cases, a particle comprises a diameter of about 60 nm to about 300 nm. In some cases, a particle comprises a diameter of about 60 nm to about 200 nm. In some cases, a particle comprises a diameter of about 60 nm to about 150 nm. In some cases, a particle comprises a diameter of about 80 nm to about 500 nm. 
In some cases, a particle comprises a diameter of about 80 nm to about 400 nm. In some cases, a particle comprises a diameter of about 80 nm to about 300 nm. In some cases, a particle comprises a diameter of about 80 nm to about 200 nm. In some cases, a particle comprises a diameter of about 80 nm to about 150 nm. In some cases, a particle comprises a diameter of about 100 nm to about 500 nm. In some cases, a particle comprises a diameter of about 100 nm to about 400 nm. In some cases, a particle comprises a diameter of about 100 nm to about 300 nm. In some cases, a particle comprises a diameter of about 100 nm to about 200 nm. In some cases, a particle comprises a diameter of about 100 nm to about 150 nm. In some cases, a particle comprises a diameter of about 120 nm to about 600 nm. In some cases, a particle comprises a diameter of about 120 nm to about 500 nm. In some cases, a particle comprises a diameter of about 120 nm to about 400 nm. In some cases, a particle comprises a diameter of about 120 nm to about 350 nm. In some cases, a particle comprises a diameter of about 120 nm to about 300 nm. In some cases, a particle comprises a diameter of about 120 nm to about 200 nm. In some cases, a particle comprises a diameter of about 150 nm to about 600 nm. In some cases, a particle comprises a diameter of about 150 nm to about 500 nm. In some cases, a particle comprises a diameter of about 150 nm to about 400 nm. In some cases, a particle comprises a diameter of about 150 nm to about 300 nm. In some cases, a particle comprises a diameter of about 200 nm to about 400 nm. In some cases, a particle comprises a diameter of about 200 nm to about 600 nm. In some cases, a particle comprises a diameter of at least about 100 nm. In some cases, a particle comprises a diameter of at most 500 nm.
In some cases, a particle of the present disclosure may be a microparticle. A microparticle may be a particle that is from about 1 μm to about 1000 μm in diameter. For example, the microparticles disclosed here can be at least 1 μm, at least 10 μm, at least 100 μm, at least 200 μm, at least 300 μm, at least 400 μm, at least 500 μm, at least 600 μm, at least 700 μm, at least 800 μm, at least 900 μm, from 10 μm to 50 μm, from 50 μm to 100 μm, from 100 μm to 150 μm, from 150 μm to 200 μm, from 200 μm to 250 μm, from 250 μm to 300 μm, from 300 μm to 350 μm, from 350 μm to 400 μm, from 400 μm to 450 μm, from 450 μm to 500 μm, from 500 μm to 550 μm, from 550 μm to 600 μm, from 600 μm to 650 μm, from 650 μm to 700 μm, from 700 μm to 750 μm, from 750 μm to 800 μm, from 800 μm to 850 μm, from 850 μm to 900 μm, from 100 μm to 300 μm, from 150 μm to 350 μm, from 200 μm to 400 μm, from 250 μm to 450 μm, from 300 μm to 500 μm, from 350 μm to 550 μm, from 400 μm to 600 μm, from 450 μm to 650 μm, from 500 μm to 700 μm, from 550 μm to 750 μm, from 600 μm to 800 μm, from 650 μm to 850 μm, from 700 μm to 900 μm, or from 10 μm to 900 μm in diameter. In some cases, a microparticle may be less than 1000 μm in diameter. In some cases, a microparticle may comprise a diameter of about 1 μm to about 2 μm. In some cases, a microparticle may comprise a diameter of about 1 μm to about 1.5 μm.
A substrate (such as a particle) may comprise a degree of shape or size uniformity or non-uniformity. A physical measure of such heterogeneity may be polydispersity, which tracks size uniformity of a substrate, and may be defined as the square of the ratio of the standard deviation and the mean of substrate size (e.g., particle diameter). Alternatively, polydispersity may be a ratio of (1) weight average molecular weight to (2) number average molecular weight for a substrate (e.g., for a collection of particles), and therefore serves as a measure of mass variance for the substrate. A substrate may comprise a low polydispersity value, indicating a high degree of size uniformity. For example, a substrate (e.g., a collection of a substrate comprising a plurality of copies of the substrate) may comprise a polydispersity index of at most 1.6, at most 1.4, at most 1.2, at most 1, at most 0.8, at most 0.6, at most 0.5, at most 0.4, at most 0.3, at most 0.25, at most 0.2, at most 0.15, at most 0.1, at most 0.05, at most 0.03, or at most 0.02. Alternatively, a substrate may comprise a high polydispersity index, indicating a degree of size and/or mass variation. For example, a substrate (e.g., a collection of a substrate comprising a plurality of copies of the substrate) may comprise a polydispersity index of at least 0.3, at least 0.4, at least 0.5, at least 0.6, at least 0.8, at least 1, at least 1.2, at least 1.4, at least 1.6, at least 1.8, at least 2, at least 2.2, at least 2.5, or at least 3.
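As an illustration of the first definition above, a polydispersity index may be computed from measured particle diameters as the square of the ratio of the standard deviation to the mean. The following is a minimal sketch only: the function name and the example diameters are hypothetical and are not taken from the present disclosure, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def polydispersity_index(diameters_nm):
    """Return (standard deviation / mean) ** 2 for a collection of diameters.

    Assumption: the population standard deviation is used; a sample
    standard deviation could be substituted if preferred.
    """
    if len(diameters_nm) < 2:
        raise ValueError("at least two diameters are required")
    mu = mean(diameters_nm)
    if mu <= 0:
        raise ValueError("mean diameter must be positive")
    return (pstdev(diameters_nm) / mu) ** 2


# Hypothetical diameters (nm) for illustration only.
example_diameters_nm = [95.0, 100.0, 105.0, 100.0]
pdi = polydispersity_index(example_diameters_nm)
```

A value computed in this manner may be compared against the polydispersity index thresholds recited above.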
A particle may be substantially spherical. A particle may comprise an oblong geometry. A particle may comprise a surface feature, such as a well, a trench, or a substantially flat region.
A particle may be provided at a range of concentrations. A particle may be provided at a concentration of at least 10 pM. A particle may be provided at a concentration of at least 100 pM. A particle may be provided at a concentration of at least 1 nM. A particle may be provided at a concentration of at least 10 nM. A particle may be provided at a concentration of at most 100 nM. A particle may be provided at a concentration of at most 10 nM. A particle may be provided at a concentration of at most 1 nM. A particle may be provided at a concentration of at most 100 pM. A particle may be provided at a concentration of at most 10 pM. A particle may be provided at a concentration of at most 1 pM. A particle may be provided at a concentration between 100 fM and 100 nM. A particle may be provided at a concentration between 100 fM and 10 pM. A particle may be provided at a concentration between 1 pM and 100 pM. A particle may be provided at a concentration between 10 pM and 1 nM. A particle may be provided at a concentration between 100 pM and 10 nM. A particle may be provided at a concentration between 1 nM and 100 nM. A particle may be provided at a concentration of at least 10 ng/ml. A particle may be provided at a concentration of at least 100 ng/ml. A particle may be provided at a concentration of at least 1 μg/ml. A particle may be provided at a concentration of at least 10 μg/ml. A particle may be provided at a concentration of at least 100 μg/ml. A particle may be provided at a concentration of at least 1 mg/ml. A particle may be provided at a concentration of at least mg/ml. A particle may be provided at a concentration of at least 10 mg/ml. A particle may be provided at a concentration of at most 10 mg/ml. A particle may be provided at a concentration of at most 1 mg/ml. A particle may be provided at a concentration of at most 100 μg/ml. A particle may be provided at a concentration of at most 10 μg/ml. A particle may be provided at a concentration of at most 1 μg/ml. A particle may be provided at a concentration of at most 100 ng/ml. A particle may be provided at a concentration of at most 10 ng/ml.
A particle may be contacted to a biological sample at a range of volume ratios. A solution comprising a particle may be combined with a biological sample, at a volume ratio of greater than about 100:1, about 100:1, about 80:1, about 60:1, about 50:1, about 40:1, about 30:1, about 25:1, about 20:1, about 15:1, about 12:1, about 10:1, about 8:1, about 6:1, about 5:1, about 4:1, about 3:1, about 5:2, about 2:1, about 3:2, about 1:1, about 2:3, about 1:2, about 2:5, about 1:3, about 1:4, about 1:5, about 1:6, about 1:8, about 1:10, about 1:12, about 1:15, about 1:20, about 1:25, about 1:30, about 1:40, about 1:50, about 1:60, about 1:80, about 1:100, or less than about 1:100.
In some cases, the ratio between surface area and mass can be a determinant of a particle's properties. In some cases, the number and types of biomolecules that a particle adsorbs from a solution vary with the particle's surface area to mass ratio. In some cases, a particle can have a surface area to mass ratio of 3 to 30 cm2/mg, 5 to 50 cm2/mg, 10 to 60 cm2/mg, 15 to 70 cm2/mg, 20 to 80 cm2/mg, 30 to 100 cm2/mg, 35 to 120 cm2/mg, 40 to 130 cm2/mg, 45 to 150 cm2/mg, 50 to 160 cm2/mg, 60 to 180 cm2/mg, 70 to 200 cm2/mg, 80 to 220 cm2/mg, 90 to 240 cm2/mg, 100 to 270 cm2/mg, 120 to 300 cm2/mg, 200 to 500 cm2/mg, 10 to 300 cm2/mg, 1 to 3000 cm2/mg, 20 to 150 cm2/mg, 25 to 120 cm2/mg, or from 40 to 85 cm2/mg. In some cases, small particles (e.g., with diameters of 50 nm or less) can have significantly higher surface area to mass ratios, stemming in part from the higher order dependence on diameter by mass than by surface area. In some cases (e.g., for small particles), the particles can have surface area to mass ratios of 200 to 1000 cm2/mg, 500 to 2000 cm2/mg, 1000 to 4000 cm2/mg, 2000 to 8000 cm2/mg, or 4000 to 10000 cm2/mg. In some cases (e.g., for large particles), the particles can have surface area to mass ratios of 1 to 3 cm2/mg, 0.5 to 2 cm2/mg, 0.25 to 1.5 cm2/mg, or 0.1 to 1 cm2/mg.
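The dependence of surface area to mass ratio on diameter may be illustrated for an idealized particle. For a solid, smooth, non-porous sphere of uniform density, surface area scales with the square of the diameter while mass scales with the cube, so the ratio reduces to 6/(density × diameter). The following is a minimal sketch under that idealization only: the function name and the example density are hypothetical, and porous, rough, or core-shell particles may deviate substantially from this estimate.

```python
import math


def sphere_surface_area_to_mass_cm2_per_mg(diameter_nm, density_g_per_cm3):
    """Surface area to mass ratio (cm^2/mg) of an idealized solid sphere.

    Assumption: smooth, non-porous sphere of uniform density.
    """
    if diameter_nm <= 0 or density_g_per_cm3 <= 0:
        raise ValueError("diameter and density must be positive")
    diameter_cm = diameter_nm * 1e-7                    # 1 nm = 1e-7 cm
    area_cm2 = math.pi * diameter_cm ** 2               # surface area of a sphere
    volume_cm3 = math.pi * diameter_cm ** 3 / 6.0       # volume of a sphere
    mass_mg = volume_cm3 * density_g_per_cm3 * 1000.0   # 1 g = 1000 mg
    return area_cm2 / mass_mg


# Hypothetical density of 1 g/cm^3, for illustration only.
ratio_100_nm = sphere_surface_area_to_mass_cm2_per_mg(100.0, 1.0)
ratio_50_nm = sphere_surface_area_to_mass_cm2_per_mg(50.0, 1.0)
```

Under this idealization, halving the diameter at a fixed density doubles the surface area to mass ratio, consistent with smaller particles having higher ratios.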
In some cases, a plurality of particles (e.g., of a particle panel) used with the methods described herein may have a range of surface area to mass ratios. In some cases, the range of surface area to mass ratios for a plurality of particles is less than 100 cm2/mg, 80 cm2/mg, 60 cm2/mg, 40 cm2/mg, 20 cm2/mg, 10 cm2/mg, 5 cm2/mg, or 2 cm2/mg. In some cases, the surface area to mass ratios for a plurality of particles vary by no more than 40%, 30%, 20%, 10%, 5%, 3%, 2%, or 1% between the particles in the plurality. In some cases, the plurality of particles may comprise at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20, or more different types of particles.
In some cases, a plurality of particles (e.g., in a particle panel) may comprise a range of surface area to mass ratios. In some cases, the range of surface area to mass ratios for a plurality of particles is greater than 100 cm2/mg, 150 cm2/mg, 200 cm2/mg, 250 cm2/mg, 300 cm2/mg, 400 cm2/mg, 500 cm2/mg, 800 cm2/mg, 1000 cm2/mg, 1200 cm2/mg, 1500 cm2/mg, 2000 cm2/mg, 3000 cm2/mg, 5000 cm2/mg, 6000 cm2/mg, 7500 cm2/mg, 10000 cm2/mg, or more. In some cases, the surface area to mass ratios for a plurality of particles (e.g., within a panel) can vary by more than 100%, 200%, 300%, 400%, 500%, 1000%, 10000% or more. In some cases, the plurality of particles with a wide range of surface area to mass ratios may comprise at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 20, or more different types of particles.
A particle may comprise a wide range of physical properties. A physical property of a particle may comprise composition, size, surface charge, hydrophobicity, hydrophilicity, surface functionalization, surface topography, surface curvature, porosity, core material, shell material, shape, or any combination thereof.
A surface functionalization may comprise a polymerizable functional group, a positively or negatively charged functional group, a zwitterionic functional group, an acidic or basic functional group, a polar functional group, or any combination thereof. In some cases, a surface functionalization comprises a polar functional group, an acidic functional group, a basic functional group, a charged functional group, a polymerizable functional group, or any combination thereof. In some cases, a surface functionalization may comprise an aminopropyl functionalization, an amine functionalization, an amide functionalization, a boronic acid functionalization, a carboxylic acid functionalization, a methyl functionalization, an N-succinimidyl ester functionalization, a PEG functionalization, a streptavidin functionalization, a methyl ether functionalization, a triethoxylpropylaminosilane functionalization, a thiol functionalization, a PCP functionalization, a citrate functionalization, a lipoic acid functionalization, a BPEI functionalization, carboxyl functionalization, a hydroxyl functionalization, or any combination thereof. In some cases, a surface functionalization may comprise carboxyl groups, hydroxyl groups, thiol groups, cyano groups, nitro groups, ammonium groups, alkyl groups, imidazolium groups, sulfonium groups, pyridinium groups, pyrrolidinium groups, phosphonium groups, aminopropyl groups, amine groups, amide groups, boronic acid groups, N-succinimidyl ester groups, PEG groups, streptavidin, methyl ether groups, triethoxylpropylaminosilane groups, PCP groups, citrate groups, lipoic acid groups, BPEI groups, or any combination thereof. In some cases, a surface functionalization may be present at various ranges of densities on a particle. In some cases, a surface functionalization comprises an average density of at least about 1 functional group per 20 nm2 on a surface of a particle. 
In some cases, a surface functionalization may comprise an average density of at least about 1 functional group per 30 nm2 on a surface of a particle. In some cases, a surface functionalization may comprise an average density of at least about 1 functional group per 40 nm2 on a surface of a particle. In some cases, a surface functionalization may comprise an average density of at least about 1 functional group per 50 nm2 on a surface of a particle. In some cases, a surface functionalization may comprise an average density of at least about 1 functional group per 60 nm2 on a surface of a particle. In some cases, a surface functionalization may comprise an average density of at least about 1 functional group per 80 nm2 on a surface of a particle. In some cases, a surface functionalization may comprise an average density of at most about 1 functional group per 80 nm2 on a surface of a particle. In some cases, a surface functionalization may comprise an average density of at most about 1 functional group per 60 nm2 on a surface of a particle. In some cases, a surface functionalization may comprise an average density of at most about 1 functional group per 50 nm2 on a surface of a particle. In some cases, a surface functionalization may comprise an average density of at most about 1 functional group per 40 nm2 on a surface of a particle. In some cases, a surface functionalization may comprise an average density of at most about 1 functional group per 30 nm2 on a surface of a particle. In some cases, a surface functionalization may comprise an average density of at most about 1 functional group per 20 nm2 on a surface of a particle. In some cases, a surface functionalization may comprise an average density of about 1 functional group per 20 nm2 to about 1 functional group per 60 nm2 on a surface of a particle.
In some cases, a particle may be selected from the group consisting of: micelles, liposomes, iron oxide particles, silver particles, gold particles, palladium particles, quantum dots, platinum particles, titanium particles, silica particles, metal or inorganic oxide particles, synthetic polymer particles, copolymer particles, terpolymer particles, polymeric particles with metal cores, polymeric particles with metal oxide cores, polystyrene sulfonate particles, polyethylene oxide particles, polyoxyethylene glycol particles, polyethylene imine particles, polylactic acid particles, polycaprolactone particles, polyglycolic acid particles, poly(lactide-co-glycolide) polymer particles, cellulose ether polymer particles, polyvinylpyrrolidone particles, polyvinyl acetate particles, polyvinylpyrrolidone-vinyl acetate copolymer particles, polyvinyl alcohol particles, acrylate particles, polyacrylic acid particles, crotonic acid copolymer particles, polyethylene phosphonate particles, polyalkylene particles, carboxy vinyl polymer particles, sodium alginate particles, carrageenan particles, xanthan gum particles, gum acacia particles, Arabic gum particles, guar gum particles, pullulan particles, agar particles, chitin particles, chitosan particles, pectin particles, karaya gum particles, locust bean gum particles, maltodextrin particles, amylose particles, corn starch particles, potato starch particles, rice starch particles, tapioca starch particles, pea starch particles, sweet potato starch particles, barley starch particles, wheat starch particles, hydroxypropylated high amylose starch particles, dextrin particles, levan particles, elsinan particles, gluten particles, collagen particles, whey protein isolate particles, casein particles, milk protein particles, soy protein particles, keratin particles, polyethylene particles, polycarbonate particles, polyanhydride particles, polyhydroxyacid particles, polypropylfumarate particles, polycaprolactone particles, polyamine particles, polyacetal particles, polyether particles, polyester particles, poly(orthoester) particles, polycyanoacrylate particles, polyurethane particles, polyphosphazene particles, polyacrylate particles, polymethacrylate particles, polycyanoacrylate particles, polyurea particles, polyamine particles, polystyrene particles, poly(lysine) particles, chitosan particles, dextran particles, poly(acrylamide) particles, derivatized poly(acrylamide) particles, gelatin particles, starch particles, chitosan particles, dextran particles, gelatin particles, starch particles, poly-β-amino-ester particles, poly(amido amine) particles, poly lactic-co-glycolic acid particles, polyanhydride particles, bioreducible polymer particles, 2-(3-aminopropylamino)ethanol particles, and any combination thereof.
In some cases, particles of the present disclosure may differ by one or more physicochemical properties. The one or more physicochemical properties are selected from the group consisting of: composition, size, surface charge, hydrophobicity, hydrophilicity, roughness, density, surface functionalization, surface topography, surface curvature, porosity, core material, shell material, shape, and any combination thereof. The surface functionalization may comprise a macromolecular functionalization, a small molecule functionalization, or any combination thereof. A small molecule functionalization may comprise an aminopropyl functionalization, amine functionalization, an amide functionalization, boronic acid functionalization, carboxylic acid functionalization, alkyl group functionalization, N-succinimidyl ester functionalization, monosaccharide functionalization, phosphate sugar functionalization, sulfurylated sugar functionalization, ethylene glycol functionalization, streptavidin functionalization, methyl ether functionalization, trimethoxysilylpropyl functionalization, silica functionalization, triethoxylpropylaminosilane functionalization, thiol functionalization, PCP functionalization, citrate functionalization, lipoic acid functionalization, or ethyleneimine functionalization. A particle panel may comprise a plurality of particles with a plurality of small molecule functionalizations selected from the group consisting of silica functionalization, trimethoxysilylpropyl functionalization, dimethylamino propyl functionalization, phosphate sugar functionalization, amine functionalization, and carboxyl functionalization.
A small molecule functionalization may comprise a polar functional group. Non-limiting examples of polar functional groups comprise a carboxyl group, a hydroxyl group, a thiol group, a cyano group, a nitro group, an ammonium group, an imidazolium group, a sulfonium group, a pyridinium group, a pyrrolidinium group, a phosphonium group, or any combination thereof. In some cases, the functional group is an acidic functional group (e.g., sulfonic acid group, carboxyl group, and the like), a basic functional group (e.g., amino group, cyclic secondary amino group (such as pyrrolidyl group and piperidyl group), pyridyl group, imidazole group, guanidine group, etc.), a carbamoyl group, a hydroxyl group, an aldehyde group, and the like.
A small molecule functionalization may comprise an ionic or ionizable functional group. Non-limiting examples of ionic or ionizable functional groups comprise an ammonium group, an imidazolium group, a sulfonium group, a pyridinium group, a pyrrolidinium group, and a phosphonium group.
A small molecule functionalization may comprise a polymerizable functional group. Non-limiting examples of the polymerizable functional group include a vinyl group and a (meth)acrylic group. In some cases, the functional group is pyrrolidyl acrylate, acrylic acid, methacrylic acid, acrylamide, 2-(dimethylamino)ethyl methacrylate, hydroxyethyl methacrylate and the like.
A surface functionalization may comprise a charge. For example, a particle can be functionalized to carry a net neutral surface charge, a net positive surface charge, a net negative surface charge, or a zwitterionic surface. A zwitterionic particle surface may be zwitterionic over at least 1, at least 2, at least 3, at least 4, at least 5, at least 6 or more pH units. Surface charge can be a determinant of the types of biomolecules collected on a particle. Accordingly, optimizing a particle panel may comprise selecting particles with different surface charges, which may not only increase the number of different proteins collected on a particle panel, but also increase the likelihood of identifying a biological state of a sample. A particle panel may comprise a positively charged particle and a negatively charged particle. A particle panel may comprise a positively charged particle and a neutral particle. A particle panel may comprise a positively charged particle and a zwitterionic particle. A particle panel may comprise a neutral particle and a negatively charged particle. A particle panel may comprise a neutral particle and a zwitterionic particle. A particle panel may comprise a negatively charged particle and a zwitterionic particle. A particle panel may comprise a positively charged particle, a negatively charged particle, and a neutral particle. A particle panel may comprise a positively charged particle, a negatively charged particle, and a zwitterionic particle. A particle panel may comprise a positively charged particle, a neutral particle, and a zwitterionic particle. A particle panel may comprise a negatively charged particle, a neutral particle, and a zwitterionic particle. In some cases, a charge of a particle may be determined by measuring the zeta potential of the particle.
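The two-member and three-member panels recited above correspond to the unordered combinations of four surface charge classes. The following is a minimal sketch that enumerates those combinations; the constant and function names are hypothetical and are provided for illustration only.

```python
from itertools import combinations

# The four surface charge classes named in the text above.
CHARGE_CLASSES = (
    "positively charged",
    "negatively charged",
    "neutral",
    "zwitterionic",
)


def charge_combinations(size):
    """Return every unordered combination of `size` distinct charge classes."""
    return list(combinations(CHARGE_CLASSES, size))


pairs = charge_combinations(2)    # candidate two-particle panels
triples = charge_combinations(3)  # candidate three-particle panels
```

Such an enumeration may be used to confirm that a candidate set of panels covers each charge pairing of interest.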
The present disclosure provides compositions and methods of use thereof for assaying a sample for proteins. Compositions described herein may include particle panels comprising one or more than one distinct particle types. Particle panels described herein can vary in the number of particle types and the diversity of particle types in a single panel. For example, particles in a panel may vary based on size, polydispersity, shape and morphology, surface charge, surface chemistry and functionalization, and base material. Panels may be incubated with a sample to be analyzed for protein composition. Proteins in the sample may adsorb to the surface of the different particle types in the particle panel to form a protein corona. The types of proteins which adsorb to a certain particle type in the particle panel may depend on the composition, size, and surface charge of the particle type. Thus, each particle type in a panel may have different protein coronas due to adsorbing a different set of proteins, different concentrations of a particular protein, or a combination thereof. Each particle type in a panel may have mutually exclusive protein coronas or may have overlapping protein coronas. Overlapping protein coronas can overlap in protein identity, in protein concentration, or both.
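The distinction between mutually exclusive and overlapping protein coronas may be expressed as a comparison of the protein sets measured on two particle types. The following is a minimal sketch: the function name, the protein identifiers, and the abundance values are hypothetical placeholders rather than measured data, and each corona is assumed to be represented as a mapping from protein identifier to abundance.

```python
def compare_coronas(corona_a, corona_b):
    """Compare two coronas given as {protein identifier: abundance} mappings.

    Reports overlap in protein identity and, for shared proteins, the ratio
    of abundances as a simple indicator of overlap in concentration.
    """
    shared = set(corona_a) & set(corona_b)
    return {
        "mutually_exclusive": not shared,
        "shared_proteins": sorted(shared),
        "only_in_a": sorted(set(corona_a) - shared),
        "only_in_b": sorted(set(corona_b) - shared),
        "abundance_ratio_a_to_b": {
            protein: corona_a[protein] / corona_b[protein]
            for protein in sorted(shared)
            if corona_b[protein] != 0
        },
    }


# Hypothetical placeholder coronas for two particle types.
corona_particle_1 = {"protein_A": 10.0, "protein_B": 4.0}
corona_particle_2 = {"protein_B": 2.0, "protein_C": 7.0}
result = compare_coronas(corona_particle_1, corona_particle_2)
```

In this placeholder example, the two coronas overlap in one protein identity while differing in the abundance of that protein.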
The present disclosure also provides methods for selecting particle types for inclusion in a panel depending on the sample type. Particle types included in a panel may be a combination of particles that are optimized for removal of highly abundant proteins. Particle types also suitable for inclusion in a panel are those selected for adsorbing particular proteins of interest. In some cases, the particles may be nanoparticles. In some cases, the particles may be microparticles. In some cases, the particles may be a combination of nanoparticles and microparticles.
A particle panel including any number of distinct particle types disclosed herein, may enrich and identify a single protein or protein group. In some cases, the single protein or protein group may comprise proteins having different post-translational modifications. For example, a first particle type in the particle panel may enrich a protein or protein group having a first post-translational modification, a second particle type in the particle panel may enrich the same protein or same protein group having a second post-translational modification, and a third particle type in the particle panel may enrich the same protein or same protein group lacking a post-translational modification. In some cases, the particle panel including any number of distinct particle types disclosed herein, may enrich and identify a single protein or protein group by binding different domains, sequences, or epitopes of the single protein or protein group. For example, a first particle type in the particle panel may enrich a protein or protein group by binding to a first domain of the protein or protein group, and a second particle type in the particle panel may enrich the same protein or same protein group by binding to a second domain of the protein or protein group.
A particle panel may comprise a combination of particles with silica and polymer surfaces. For example, a particle panel may comprise a SPION coated with a thin layer of silica, a SPION coated with poly(dimethyl aminopropyl methacrylamide) (PDMAPMA), and a SPION coated with poly(ethylene glycol) (PEG). A particle panel of the present disclosure can also comprise two or more particles selected from the group consisting of a silica coated SPION, an N-(3-Trimethoxysilylpropyl) diethylenetriamine coated SPION, a PDMAPMA coated SPION, a carboxyl-functionalized polyacrylic acid coated SPION, an amino surface functionalized SPION, a polystyrene carboxyl functionalized SPION, a silica particle, and a dextran coated SPION. A particle panel of the present disclosure may also comprise two or more particles selected from the group consisting of a surfactant free carboxylate microparticle, a carboxyl functionalized polystyrene particle, a silica coated particle, a silica particle, a dextran coated particle, an oleic acid coated particle, a boronated nanopowder coated particle, a PDMAPMA coated particle, a Poly(glycidyl methacrylate-benzylamine) coated particle, and a Poly(N-[3-(Dimethylamino)propyl]methacrylamide-co-[2-(methacryloyloxy)ethyl]dimethyl-(3-sulfopropyl)ammonium hydroxide) (P(DMAPMA-co-SBMA)) coated particle. A particle panel of the present disclosure may comprise silica-coated particles, N-(3-Trimethoxysilylpropyl)diethylenetriamine coated particles, poly(N-(3-(dimethylamino)propyl) methacrylamide) (PDMAPMA)-coated particles, phosphate-sugar functionalized polystyrene particles, amine functionalized polystyrene particles, polystyrene carboxyl functionalized particles, ubiquitin functionalized polystyrene particles, dextran coated particles, or any combination thereof.
A particle panel of the present disclosure may comprise a silica functionalized particle, an amine functionalized particle, a silicon alkoxide functionalized particle, a carboxylate functionalized particle, and a benzyl or phenyl functionalized particle. A particle panel of the present disclosure may comprise a silica functionalized particle, an amine functionalized particle, a silicon alkoxide functionalized particle, a polystyrene functionalized particle, and a saccharide functionalized particle. A particle panel of the present disclosure may comprise a silica functionalized particle, an N-(3-Trimethoxysilylpropyl)diethylenetriamine functionalized particle, a PDMAPMA functionalized particle, a dextran functionalized particle, and a polystyrene carboxyl functionalized particle. A particle panel of the present disclosure may comprise 5 particles including a silica functionalized particle, an amine functionalized particle, and a silicon alkoxide functionalized particle.
A particle panel of the present disclosure may comprise a silica particle, an amine functionalized particle, and a polyethylene glycol-functionalized particle. The particle panel may further comprise a carboxylate functionalized particle, such as a carboxylate functionalized styrene particle. The particle panel may further comprise a saccharide-coated particle. In some cases, the saccharide-coated particle is a dextran-coated particle. The particle panel may further comprise a sulfuryl functionalized particle. The sulfuryl functionalized particle may comprise a positively charged surface functionalization such as an amine, and thereby may be zwitterionic. The particle panel may further comprise a particle with a boronated or boronic acid functionalized surface. The particle panel may further comprise a particle with an oleic acid functionalized surface. The particle panel may comprise at least one microparticle.
The present disclosure includes compositions (e.g., particle panels) and methods that comprise two or more particles differing in at least one physicochemical property. A composition or method of the present disclosure may comprise 3 to 6 particles differing in at least one physicochemical property. A composition or method of the present disclosure may comprise 4 to 8 particles differing in at least one physicochemical property. A composition or method of the present disclosure may comprise 4 to 10 particles differing in at least one physicochemical property. A composition or method of the present disclosure may comprise 5 to 12 particles differing in at least one physicochemical property. A composition or method of the present disclosure may comprise 6 to 14 particles differing in at least one physicochemical property. A composition or method of the present disclosure may comprise 8 to 15 particles differing in at least one physicochemical property. A composition or method of the present disclosure may comprise 10 to 20 particles differing in at least one physicochemical property. A composition or method of the present disclosure may comprise at least 2 distinct particle types, at least 3 distinct particle types, at least 4 distinct particle types, at least 5 distinct particle types, at least 6 distinct particle types, at least 7 distinct particle types, at least 8 distinct particle types, at least 9 distinct particle types, at least 10 distinct particle types, at least 11 distinct particle types, at least 12 distinct particle types, at least 13 distinct particle types, at least 14 distinct particle types, at least 15 distinct particle types, at least 20 distinct particle types, at least 25 particle types, or at least 30 distinct particle types.
An example of a particle panel of the present disclosure is provided in
A further example of particle panels is provided in
In some cases, a particle panel may comprise a particle listed in TABLE 2, below. A particle panel may comprise at least two particles listed in TABLE 2. In some cases, a particle panel may comprise at least three particles listed in TABLE 2. In some cases, a particle panel may comprise at least four particles listed in TABLE 2. In some cases, a particle panel may comprise the particles listed in TABLE 2.
In some cases, a particle panel may comprise a particle listed in TABLE 3, below. In some cases, a particle panel may comprise at least two particles listed in TABLE 3. In some cases, a particle panel may comprise at least three particles listed in TABLE 3. In some cases, a particle panel may comprise at least four particles listed in TABLE 3. In some cases, a particle panel may comprise the particles listed in TABLE 3.
In some cases, a particle panel may comprise a particle listed in TABLE 4, below. In some cases, a particle panel may comprise at least two particles listed in TABLE 4. In some cases, a particle panel may comprise at least three particles listed in TABLE 4. In some cases, a particle panel may comprise at least four particles listed in TABLE 4. In some cases, a particle panel may comprise the particles listed in TABLE 4.
In some cases, a particle panel may comprise a particle listed in TABLE 5, below. In some cases, a particle panel may comprise at least two particles listed in TABLE 5. In some cases, a particle panel may comprise at least three particles listed in TABLE 5. In some cases, a particle panel may comprise at least four particles listed in TABLE 5. In some cases, a particle panel may comprise the particles listed in TABLE 5.
In some cases, a particle panel of the present disclosure may comprise at least one, at least two, at least 3, at least 4, or at least 5 particles, each particle selected from the group consisting of a superparamagnetic iron oxide particle (SPION) comprising a silica surface, a SPION comprising an N-(3-Trimethoxysilylpropyl)diethylenetriamine surface, a SPION comprising a Poly(dimethyl aminopropyl methacrylamide) (Dimethylamine) surface, a SPION comprising a carboxyl functionalized polystyrene surface, and a SPION comprising a dextran coating. In some cases, a particle panel of the present disclosure may comprise a SPION comprising a poly(N-(3-(dimethylamino)propyl) methacrylamide) (PDMAPMA) surface. In some cases, a particle panel of the present disclosure may comprise a SPION comprising a poly(oligo(ethylene glycol) methyl ether methacrylate) (POEGMA) surface. In some cases, a particle panel of the present disclosure may comprise a SPION comprising an N-(3-Trimethoxysilylpropyl)diethylenetriamine surface. In some cases, a particle panel of the present disclosure may comprise a SPION comprising a Poly(dimethyl aminopropyl methacrylamide) (Dimethylamine) surface. In some cases, a particle panel of the present disclosure may comprise a SPION comprising a dextran surface. In some cases, a particle panel of the present disclosure may comprise a SPION comprising a surface with a mixed chemistry based on amine-epoxy chemistry. In some cases, a particle panel of the present disclosure may comprise a SPION comprising a Polyzwitterion coated (Poly(N-[3-(Dimethylamino)propyl]methacrylamide-co-[2-(methacryloyloxy)ethyl]dimethyl-(3-sulfopropyl)ammonium hydroxide, P(DMAPMA-co-SBMA)) surface. In some cases, a particle panel of the present disclosure may comprise a SPION comprising styrene surface comprising an oleic acid functionalization. In some cases, a particle panel of the present disclosure may comprise a SPION comprising a boronated styrene surface. 
In some cases, a particle panel of the present disclosure may comprise a SPION comprising a carboxylated styrene surface. In some cases, a particle panel of the present disclosure may comprise a SPION comprising a strongly acidic silica surface. A particle panel of the present disclosure may comprise at least one particle, at least 2 particles, at least 3 particles, or at least 4 particles, or at least 5 particles, each selected from the group consisting of a silica-coated SPION, a poly(dimethylaminopropylmethacrylamide)-coated SPION, an N-(3-Trimethoxysilylpropyl)diethylenetriamine-coated SPION, a 1,6-hexanediamine-coated SPION, and an N1-(3-(trimethoxysilyl)propyl)hexane-1,6-diamine functionalized, silica-coated SPION. A particle panel of the present disclosure may comprise a silica-coated SPION, a poly(dimethylaminopropylmethacrylamide)-coated SPION, an N-(3-Trimethoxysilylpropyl)diethylenetriamine-coated SPION, a 1,6-hexanediamine-coated SPION, and an N′-(3-(trimethoxysilyl)propyl)hexane-1,6-diamine functionalized, silica-coated SPION.
In some cases, particles of the present disclosure may be used to serially interrogate a sample by incubating a first particle type with the sample to form a biomolecule corona on the first particle type, separating the first particle type, incubating a second particle type with the sample to form a biomolecule corona on the second particle type, separating the second particle type, and repeating the interrogating (by incubation with the sample) and the separating for any number of particle types. In some cases, the biomolecule corona on each particle type used for serial interrogation of a sample may be analyzed by protein corona analysis. The biomolecule content of the supernatant may be analyzed following serial interrogation with one or more particle types.
The particle panels disclosed herein can be used to identify a number of proteins, peptides, or protein groups using a method disclosed herein. Feature intensities, as disclosed herein, may refer to the intensity of a discrete spike (“feature”) seen on a plot of mass to charge ratio versus intensity from a mass spectrometry run of a sample. These features can correspond to variably ionized fragments of peptides and/or proteins. Feature intensities can be sorted into protein groups. Protein groups refer to two or more proteins that are identified by a shared peptide sequence. Alternatively, a protein group can refer to one protein that is identified using a unique identifying sequence. For example, if in a sample, a peptide sequence is assayed that is shared between two proteins (Protein 1: XYZZX and Protein 2: XYZYZ), a protein group could be the “XYZ protein group” having two members (protein 1 and protein 2). Alternatively, if the peptide sequence is unique to a single protein (Protein 1), a protein group could be the “ZZX” protein group having one member (Protein 1). Each protein group can be supported by more than one peptide sequence. Protein detected or identified according to the instant disclosure can refer to a distinct protein detected in the sample (e.g., distinct relative to other proteins detected using mass spectrometry). Thus, analysis of proteins present in distinct coronas corresponding to the distinct particle types in a particle panel yields a high number of feature intensities. This number decreases as feature intensities are processed into distinct peptides, further decreases as distinct peptides are processed into distinct proteins, and further decreases as proteins are grouped into protein groups (two or more proteins that share a distinct peptide sequence).
Aspects of the present disclosure provide compositions, systems, and methods for collecting biomolecules on nanoparticles and microparticles (as well as other types of sensor elements such as polymer matrices, filters, rods, and extended surfaces). In some cases, a particle may adsorb a plurality of biomolecules upon contact with a biological sample, thereby forming a biomolecule corona on the surfaces of the particles. In some cases, the biomolecule corona may comprise proteins, lipids, nucleic acids, metabolites, saccharides, small molecules (e.g., sterols), and other biological species present in a sample. In some cases, a biomolecule corona comprising proteins may also be referred to as a ‘protein corona’, and may refer to all constituents adsorbed to a particle (e.g., proteins, lipids, nucleic acids, and other biomolecules), or may refer only to proteins adsorbed to the particle.
The composition of the biomolecule corona may depend on a property of the particle. In many cases, the composition of the biomolecule corona is strongly dependent on the surface of the particle. Characteristics such as particle surface material (e.g., ceramic, polymer, metal, metal oxide, graphite, silicon dioxide, etc.), surface texture (rough, smooth, grooved, etc.), surface functionalization (e.g., carboxylate functionalized, amine functionalized, small molecule (e.g., saccharide) functionalized, etc.), shape, curvature, and size can each independently serve as determinants for biomolecule corona composition. In addition to surface features, the particle core composition, particle density, and particle surface area to mass ratio may each influence biomolecule corona composition. For example, two particles comprising the same surfaces and different cores may form different biomolecule coronas upon contact with the same sample.
Biomolecule corona formation may also be influenced by sample composition. For example, a first sample condition (e.g., low salinity) might favor the solubility of a particular analyte (e.g., an isoform of Bone Morphogenic Protein 1 (BMP1)), and thereby disfavor its binding in a biomolecule corona, while a second sample condition (e.g., high salinity) may diminish the solubility of the analyte, thereby driving its incorporation into a biomolecule corona.
Biomolecule corona composition may also depend on molecular level interactions between the biomolecules, themselves. An energetically favorable interaction between two biomolecules may promote their co-incorporation into a biomolecule corona. For example, if a first protein adsorbed to a particle comprises an affinity for a second protein in solution, the first protein may bind to a portion of the second protein, thereby driving its binding to the particle or to other proteins of the biomolecule corona of the particle. Analogously, a first biomolecule disposed within a biomolecule corona may comprise an energetically unfavorable interaction with a second biomolecule in a biological sample, thereby disfavoring its incorporation into a biomolecule corona. In part owing to these inter-biomolecule dependencies, biomolecule coronas provide sensitive platforms for directly and indirectly sensing biomolecules from a biological sample.
Biomolecules collected on a particle may be subjected to further analysis. A method may comprise collecting a biomolecule corona or a subset of biomolecules from a biomolecule corona. The collected biomolecule corona or the collected subset of biomolecules from the biomolecule corona may be subjected to further particle-based analysis (e.g., particle adsorption). The collected biomolecule corona or the collected subset of biomolecules from the biomolecule corona may be purified or fractionated (e.g., by a chromatographic method). The collected biomolecule corona or the collected subset of biomolecules from the biomolecule corona may be analyzed (e.g., by mass spectrometry). Furthermore, as biomolecule corona composition is dependent on solution-phase and particle-bound biomolecules as well as sample conditions (e.g., pH, osmolarity, lipid concentration), biomolecule corona composition can provide a sensitive measure of biomolecules which are not bound to a particle and of sample conditions.
The particles and methods of use thereof disclosed herein can bind a large number of unique biomolecules (e.g., proteins) in a biological sample (e.g., a biofluid). For example, a particle or particle panel disclosed herein can be incubated with a biological sample to form a protein corona comprising at least 5 protein groups, at least 10 protein groups, at least 15 protein groups, at least 20 protein groups, at least 25 protein groups, at least 50 protein groups, at least 80 protein groups, at least 100 protein groups, at least 150 protein groups, at least 180 protein groups, at least 200 protein groups, at least 250 protein groups, at least 300 protein groups, at least 350 protein groups, at least 400 protein groups, at least 450 protein groups, at least 500 protein groups, at least 600 protein groups, at least 700 protein groups, at least 800 protein groups, at least 900 protein groups, at least 1000 protein groups, at least 1100 protein groups, at least 1200 protein groups, at least 1300 protein groups, at least 1400 protein groups, at least 1500 protein groups, at least 1600 protein groups, at least 1800 protein groups, at least 2000 protein groups, at least 2500 protein groups, at least 5000 protein groups, at least 10000 protein groups, at least 15000 protein groups, at least 20000 protein groups, at least 25000 protein groups, at least 30000 protein groups, at least 35000 protein groups, at least 45000 protein groups, at least 50000 protein groups, at least 60000 protein groups, at least 70000 protein groups, at least 80000 protein groups, at least 90000 protein groups, or at least 100000 protein groups.
A particle or particle panel disclosed herein can be incubated with a biological sample to form a protein corona comprising at most 5 protein groups, at most 10 protein groups, at most 20 protein groups, at most 30 protein groups, at most 40 protein groups, at most 50 protein groups, at most 60 protein groups, at most 80 protein groups, at most 100 protein groups, at most 150 protein groups, at most 200 protein groups, at most 250 protein groups, at most 300 protein groups, at most 400 protein groups, at most 500 protein groups, at most 600 protein groups, at most 800 protein groups, at most 1000 protein groups, at most 1200 protein groups, at most 1500 protein groups, at most 1800 protein groups, at most 2000 protein groups, at most 2500 protein groups, at most 3000 protein groups, at most 4000 protein groups, at most 5000 protein groups, at most 7500 protein groups, at most 10000 protein groups, at most 15000 protein groups, at most 20000 protein groups, at most 25000 protein groups, at most 50000 protein groups, at most 75000 protein groups, or at most 100000 protein groups. A particle disclosed herein can be incubated with a biological sample to form a protein corona comprising from 5 to 2500 protein groups. A particle or particle panel disclosed herein can be incubated with a biological sample to form a protein corona comprising from 5 to 50 protein groups. A particle disclosed herein can be incubated with a biological sample to form a protein corona comprising from 10 to 100 protein groups. A particle or particle panel disclosed herein can be incubated with a biological sample to form a protein corona comprising from 20 to 100 protein groups. A particle disclosed herein can be incubated with a biological sample to form a protein corona comprising from 20 to 400 protein groups. A particle or particle panel disclosed herein can be incubated with a biological sample to form a protein corona comprising from 50 to 500 protein groups. 
A particle disclosed herein can be incubated with a biological sample to form a protein corona comprising from 100 to 800 protein groups. A particle or particle panel disclosed herein can be incubated with a biological sample to form a protein corona comprising from 200 to 1000 protein groups. A particle or particle panel disclosed herein can be incubated with a biological sample to form a protein corona comprising from 300 to 1200 protein groups. A particle or particle panel disclosed herein can be incubated with a biological sample to form a protein corona comprising from 400 to 1500 protein groups. A particle or particle panel disclosed herein can be incubated with a biological sample to form a protein corona comprising from 500 to 2000 protein groups. A particle or particle panel disclosed herein can be incubated with a biological sample to form a protein corona comprising from 800 to 2500 protein groups. A particle or particle panel disclosed herein can be incubated with a biological sample to form a protein corona comprising from 1000 to 3000 protein groups. A particle or particle panel disclosed herein can be incubated with a biological sample to form a protein corona comprising from 1000 to 5000 protein groups. A particle or particle panel disclosed herein can be incubated with a biological sample to form a protein corona comprising from 2000 to 10000 protein groups. A particle or particle panel disclosed herein can be incubated with a biological sample to form a protein corona comprising from 5000 to 25000 protein groups. In some cases, several different types of particles can be used, separately or in combination, to identify large numbers of proteins in a particular biological sample. In other words, particles can be multiplexed in order to bind and identify large numbers of proteins in a biological sample. Protein corona analysis may compress the dynamic range of the analysis compared to a protein analysis of the original sample.
An assay utilizing a plurality of particles may distinguish the particle on which a specific biomolecule, biomolecule fragment (e.g., peptide generated by digesting a biomolecule corona protein), or signal corresponding to a biomolecule (e.g., one of ten mass spectrometric signals associated with a specific peptide fragment of a biomolecule corona protein) is observed. As biomolecule corona composition is dependent on sample conditions (e.g., salinity, temperature, pH), biomolecular composition, and particle physicochemical properties, two particles may develop different biomolecule coronas upon contacting a sample. Accordingly, the type or types of particles on which a particular biomolecule is observed comprise biological state information which may be utilized for analysis. A method may identify the type of particle on which a biomolecule, biomolecule fragment, or signal corresponding to a biomolecule is observed. A method may identify a ratio of abundances of a biomolecule or biomolecule fragment on a plurality of particles. A method may identify a ratio of signal intensities associated with a biomolecule identified on a plurality of particles.
Annotating biomolecules, biomolecule fragments, and signals by particle type can increase the amount of information derived from an assay. While many methods generate lists of biomolecules associated with samples, the present disclosure provides methods which differentiate the binding affinity of individual biomolecules across multiple particle types. As demonstrated in examples 1 and 2, differences in biomolecule abundance across two particles can comprise greater diagnostic utility than simple identification of a biomolecule within a sample. For example, 17 of the top 20 features in the trained Alzheimer's disease (AD) Random Forest classifier presented in example 2 are associated with proteins with OpenTarget Alzheimer's disease scores of less than 0.04, indicating that their sample-level abundances likely contain negligible diagnostic utility for Alzheimer's disease detection, but that their particle-specific detection can generate accurate Alzheimer's disease diagnoses.
A method (e.g., computer-implemented analysis with a trained classifier) of the present disclosure can comprise identifying a particle on which a biomolecule, biomolecule fragment, or signal was derived. A method of the present disclosure can comprise identifying an abundance ratio of a biomolecule or a biomolecule fragment across at least 2 particle types. A method of the present disclosure can comprise identifying an intensity ratio of a signal associated with a biomolecule or a biomolecule fragment across at least 2 particle types. A method of the present disclosure can comprise identifying an abundance ratio of a biomolecule or a biomolecule fragment across at least 3 particle types. A method of the present disclosure can comprise identifying an intensity ratio of a signal associated with a biomolecule or a biomolecule fragment across at least 3 particle types. A method of the present disclosure can comprise identifying an abundance ratio of a biomolecule or a biomolecule fragment across at least 4 particle types. A method of the present disclosure can comprise identifying an intensity ratio of a signal associated with a biomolecule or a biomolecule fragment across at least 4 particle types. A method of the present disclosure can comprise identifying an abundance ratio of a biomolecule or a biomolecule fragment across at least 5 particle types. A method of the present disclosure can comprise identifying an intensity ratio of a signal associated with a biomolecule or a biomolecule fragment across at least 5 particle types. A method of the present disclosure can comprise identifying an abundance ratio of a biomolecule or a biomolecule fragment across at least 6 particle types. A method of the present disclosure can comprise identifying an intensity ratio of a signal associated with a biomolecule or a biomolecule fragment across at least 6 particle types. 
A method of the present disclosure can comprise identifying an abundance ratio of a biomolecule or a biomolecule fragment across at least 8 particle types. A method of the present disclosure can comprise identifying an intensity ratio of a signal associated with a biomolecule or a biomolecule fragment across at least 8 particle types. A method of the present disclosure can comprise identifying an abundance ratio of a biomolecule or a biomolecule fragment across at least 10 particle types. A method of the present disclosure can comprise identifying an intensity ratio of a signal associated with a biomolecule or a biomolecule fragment across at least 10 particle types.
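As a non-limiting illustration, the abundance ratios of a single biomolecule across particle types described above might be computed as in the following minimal sketch. The function name, the particle labels, and the numeric values are hypothetical and are provided only for illustration; they are not taken from the examples of the present disclosure.

```python
from itertools import combinations


def particle_abundance_ratios(abundances):
    """Return pairwise abundance ratios for one biomolecule across particle types.

    `abundances` maps a particle-type label to the abundance (or signal
    intensity) measured for the same biomolecule on that particle type.
    Pairs whose denominator is zero are skipped rather than guessed.
    """
    ratios = {}
    for first, second in combinations(sorted(abundances), 2):
        denominator = abundances[second]
        if denominator > 0:
            ratios[(first, second)] = abundances[first] / denominator
    return ratios


# Hypothetical intensities of one peptide measured on three particle types.
example = {"particle_A": 400.0, "particle_B": 100.0, "particle_C": 50.0}
ratios = particle_abundance_ratios(example)
for pair, value in ratios.items():
    print(pair, value)
```

A panel of n particle types yields up to n(n-1)/2 such ratios per biomolecule, which may be supplied to a classifier alongside or in place of the individual abundances.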
A method of the present disclosure may also identify an abundance or signal intensity ratio associated with different biomolecules or biomolecule fragments. For example, rather than exclusively utilizing an individual biomolecule abundance as an input, a trained classifier of the present disclosure may utilize an abundance ratio of a first biomolecule observed on a first particle and a second biomolecule observed on a second particle. As many biomolecules, and in particular many blood biomolecules, are ubiquitous across healthy and neurodegenerative disease samples (for example albumin, globulins, iron storage proteins), changes in their abundances may not be diagnostic for neurodegenerative disease states or progressions. However, a change in a ratio of two biomolecules, such as the iron storage proteins ferritin and transferrin, can comprise information relevant for neurodegenerative disease and biological state diagnosis. Furthermore, as biomolecule particle adsorption can comprise a dependence on sample composition, an abundance or signal intensity ratio of two biomolecules on two particles can reflect biological state-relevant changes in a sample. Accordingly, a method of the present disclosure may identify an abundance ratio of a first biomolecule on a first particle and a second biomolecule on a second particle. A method of the present disclosure may also identify an intensity ratio of a first signal associated with a first biomolecule on a first particle and a second signal associated with a second biomolecule on a second particle.
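As a non-limiting illustration, a ratio of a first biomolecule on a first particle to a second biomolecule on a second particle might be expressed as a log-transformed feature, as in the following minimal sketch. The use of a base-2 logarithm, the function name, the particle labels, and the numeric values are assumptions made for illustration and are not specified by the present disclosure.

```python
import math


def cross_particle_log_ratio(measurements, first, second):
    """Return log2 of one (biomolecule, particle) measurement over another.

    `measurements` maps (biomolecule, particle_type) to an abundance or
    signal intensity. Returns None when either value is missing or
    non-positive, so that no ratio is invented for unobserved species.
    """
    numerator = measurements.get(first)
    denominator = measurements.get(second)
    if numerator is None or denominator is None:
        return None
    if numerator <= 0 or denominator <= 0:
        return None
    return math.log2(numerator / denominator)


# Hypothetical abundances; the particle labels are illustrative only.
measurements = {
    ("ferritin", "particle_A"): 800.0,
    ("transferrin", "particle_B"): 200.0,
}
feature = cross_particle_log_ratio(
    measurements,
    ("ferritin", "particle_A"),
    ("transferrin", "particle_B"),
)
print(feature)
```

A log-transformed ratio is symmetric about zero, so a two-fold increase and a two-fold decrease have equal magnitude; this is one common design choice and not the only possible one.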
Protein corona analysis may comprise an automated component. For example, an automated instrument may contact a sample with a particle or particle panel, identify proteins on the particle or particle panel (e.g., digest the proteins on the particle or particle panel and perform mass spectrometric analysis), and generate data for identifying a specific biomolecule or a biological state of a sample. The automated instrument may divide a sample into a plurality of volumes, and perform analysis on each volume or a subset of the plurality. The automated instrument may analyze multiple separate samples, for example by disposing multiple samples within multiple wells in a well plate, and performing parallel analysis on each sample or a subset of samples within the well plate.
The particle panels disclosed herein can be used to identify a number of proteins, peptides, protein groups, or protein classes using a protein analysis workflow described herein (e.g., a protein corona analysis workflow). Protein corona analysis may comprise contacting a sample to distinct particle types (e.g., a particle panel), forming biomolecule coronas on the distinct particle types, and identifying the biomolecules in the biomolecule coronas (e.g., by mass spectrometry). Feature intensities, as disclosed herein, refer to the intensity of a discrete spike (“feature”) seen on a plot of mass to charge ratio versus intensity from a mass spectrometry run of a sample. These features can correspond to variably ionized fragments of peptides and/or proteins. Using the data analysis methods described herein, feature intensities can be sorted into protein groups. Protein groups refer to two or more proteins that are identified by a shared peptide sequence. Alternatively, a protein group can refer to one protein that is identified using a unique identifying sequence. For example, if in a sample, a peptide sequence is assayed that is shared between two proteins (Protein 1: XYZZX and Protein 2: XYZYZ), a protein group could be the “XYZ protein group” having two members (protein 1 and protein 2) which share the identifiable XYZ motif. Alternatively, if the peptide sequence is unique to a single protein (Protein 1), a protein group could be the “ZZX” protein group having one member (Protein 1). A protein group can be supported by more than one peptide sequence. Protein detected or identified according to the instant disclosure can refer to a distinct protein detected in the sample (e.g., distinct relative to other proteins detected using mass spectrometry). Thus, analysis of proteins present in distinct coronas corresponding to the distinct particle types in a particle panel yields a high number of feature intensities.
In some cases, multiple features are associated with a single peptide, such that processing feature intensities yields a lower number of peptides. As an illustrative example, during data processing, 6000 feature intensities (e.g., mass spectrometric signals) may be assigned to 1200 peptides, yielding an average of one peptide per 5 feature intensities. Furthermore, in some cases, multiple peptides may be associated with individual proteins or protein groups, such that processing peptides yields a lower number of proteins or protein groups. As another illustrative example, 1200 peptides may be assigned to 300 protein groups, yielding an average of one protein group per 4 peptides. In some cases, a single feature intensity may identify a peptide. In some cases, a single peptide may identify a protein group. In some cases, a single feature intensity may be divided between multiple peptides. For example, tandem mass spectrometric analysis (MS/MS) of a feature intensity may identify that two separate peptides contribute to the feature intensity.
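As a non-limiting illustration, the assignment of peptides to protein groups and the reduction in count from feature intensities to peptides to protein groups might be sketched as follows. The sequences are the placeholder sequences of the example above (Protein 1: XYZZX and Protein 2: XYZYZ), and the function name is hypothetical. Simple substring matching is an assumption made for brevity and does not represent a complete protein inference method.

```python
def protein_groups(peptides, proteins):
    """Map each observed peptide to the set of proteins that contain it.

    A peptide found in two or more proteins defines a shared protein group;
    a peptide found in exactly one protein defines a single-member group.
    """
    groups = {}
    for peptide in peptides:
        members = frozenset(
            name for name, sequence in proteins.items() if peptide in sequence
        )
        if members:
            groups[peptide] = members
    return groups


proteins = {"Protein 1": "XYZZX", "Protein 2": "XYZYZ"}
groups = protein_groups(["XYZ", "ZZX"], proteins)
print(sorted(groups["XYZ"]))  # both proteins share the XYZ peptide
print(sorted(groups["ZZX"]))  # only Protein 1 contains ZZX

# Illustrative counts from the passage above.
features, peptide_count, group_count = 6000, 1200, 300
features_per_peptide = features / peptide_count
peptides_per_group = peptide_count / group_count
print(features_per_peptide, peptides_per_group)
```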
The methods disclosed herein include isolating one or more particle types from a sample or from more than one sample (e.g., a biological sample or a serially interrogated sample). The particle types can be isolated or separated from the sample using a magnet. Moreover, multiple samples that are spatially isolated can be processed in parallel. Thus, the methods disclosed herein provide for isolating or separating a particle type from unbound protein in a sample. A particle type may be separated using methods including but not limited to magnetic separation, centrifugation, filtration, or gravitational separation. Particle panels may be incubated with a plurality of spatially isolated samples, wherein each spatially isolated sample is in a well in a well plate (e.g., a 96-well plate, a 192-well plate, or a 384-well plate). After incubation, the particle types in each of the wells of the well plate can be separated from unbound protein present in the spatially isolated samples by placing the entire plate on a magnet. This pulls down the superparamagnetic particles in the particle panel. The supernatant in each sample can be removed to remove the unbound protein. These steps (incubate, pull down) can be repeated to effectively wash the particles, thus removing residual background unbound protein that may be present in a sample. This is one example, but one of skill in the art could envision numerous other scenarios in which superparamagnetic particles are rapidly isolated from one or more than one spatially isolated samples at the same time.
In some cases, the methods and compositions of the present disclosure may provide identification and measurement of particular proteins in the biological samples by processing of the proteomic data via digestion of coronas formed on the surface of particles. Examples of proteins that can be identified and measured include highly abundant proteins, proteins of medium abundance, and low-abundance proteins. A low abundance protein may be present in a sample at concentrations at or below about 10 ng/mL. A high abundance protein may be present in a sample at concentrations at or above about 10 μg/mL. A protein of moderate abundance may be present in a sample at concentrations between about 10 ng/mL and about 10 μg/mL. Examples of proteins that are highly abundant proteins include albumin, IgG, and the top 14 proteins in abundance that contribute 95% of the analyte mass in plasma. Additionally, any proteins that may be purified using a conventional depletion column may be directly detected in a sample using the particle panels disclosed herein. Examples of proteins may be any protein listed in published databases such as Keshishian et al. (Mol Cell Proteomics. 2015 September; 14(9):2375-93. doi: 10.1074/mcp.M114.046813. Epub 2015 Feb. 27.), Farr et al. (J Proteome Res. 2014 Jan. 3; 13(1):60-75. doi: 10.1021/pr4010037. Epub 2013 Dec. 6.), or Pernemalm et al. (Expert Rev Proteomics. 2014 August; 11(4):431-48. doi: 10.1586/14789450.2014.901157. Epub 2014 Mar. 24.).
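As a non-limiting illustration, the abundance categories described above might be assigned as in the following minimal sketch. The thresholds of "about 10 ng/mL" and "about 10 μg/mL" are treated here as exact cutoffs, which is a simplifying assumption, and the function and constant names are hypothetical.

```python
# Cutoffs expressed in ng/mL; 10 ug/mL equals 10,000 ng/mL.
LOW_MAX_NG_PER_ML = 10.0
HIGH_MIN_NG_PER_ML = 10_000.0


def abundance_class(concentration_ng_per_ml):
    """Classify a protein concentration (ng/mL) as low, moderate, or high."""
    if concentration_ng_per_ml < 0:
        raise ValueError("concentration must be non-negative")
    if concentration_ng_per_ml <= LOW_MAX_NG_PER_ML:
        return "low"
    if concentration_ng_per_ml >= HIGH_MIN_NG_PER_ML:
        return "high"
    return "moderate"


# Hypothetical concentrations in ng/mL.
for value in (0.5, 250.0, 4.0e7):
    print(value, abundance_class(value))
```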
The proteomic data of the biological sample can be identified, measured, and quantified using a number of different analytical techniques. For example, proteomic data can be generated using SDS-PAGE or any gel-based separation technique. Peptides and proteins can also be identified, measured, and quantified using an immunoassay, such as ELISA. Alternatively, proteomic data can be identified, measured, and quantified using mass spectrometry, high performance liquid chromatography, LC-MS/MS, Edman Degradation, immunoaffinity techniques, methods disclosed in EP3548652, WO2019083856, WO2019133892, each of which is incorporated herein by reference in its entirety, and other protein separation techniques.
An assay may comprise protein collection on particles, protein digestion, and mass spectrometric analysis (e.g., MS, LC-MS, LC-MS/MS). The digestion may comprise chemical digestion, such as by cyanogen bromide or 2-Nitro-5-thiocyanatobenzoic acid (NTCB). The digestion may comprise enzymatic digestion, such as by trypsin or pepsin. The digestion may comprise enzymatic digestion by a plurality of proteases. The digestion may comprise a protease selected from among the group consisting of trypsin, chymotrypsin, Glu C, Lys C, elastase, subtilisin, proteinase K, thrombin, factor X, Arg C, papain, Asp N, thermolysin, pepsin, aspartyl protease, cathepsin D, zinc metalloprotease, glycoprotein endopeptidase, proline, aminopeptidase, prenyl protease, caspase, kex2 endoprotease, or any combination thereof. The digestion may cleave peptides at random positions. The digestion may cleave peptides at a specific position (e.g., at methionines) or sequence (e.g., glutamate-histidine-glutamate). The digestion may enable similar proteins to be distinguished. For example, an assay may resolve 8 distinct proteins as a single protein group with a first digestion method, and as 8 separate proteins with distinct signals with a second digestion method. The digestion may generate an average peptide fragment length of 8 to 15 amino acids. The digestion may generate an average peptide fragment length of 12 to 18 amino acids. The digestion may generate an average peptide fragment length of 15 to 25 amino acids. The digestion may generate an average peptide fragment length of 20 to 30 amino acids. The digestion may generate an average peptide fragment length of 30 to 50 amino acids.
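As a non-limiting illustration, position-specific cleavage and the resulting average peptide fragment length might be modeled in silico as in the following minimal sketch. The sketch assumes cleavage on the C-terminal side of every listed residue, with no missed cleavages and no sequence-context exceptions, which is a simplification of real digestion chemistry. The input sequence and the function names are hypothetical.

```python
def digest(sequence, cleave_after):
    """Split `sequence` after every residue found in `cleave_after`."""
    peptides = []
    start = 0
    for index, residue in enumerate(sequence):
        if residue in cleave_after:
            peptides.append(sequence[start:index + 1])
            start = index + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # trailing fragment, if any
    return peptides


def mean_length(peptides):
    """Average fragment length, in residues."""
    return sum(len(peptide) for peptide in peptides) / len(peptides)


sequence = "AMKGGRTMLLK"  # hypothetical one-letter amino acid sequence

# Cleavage after methionine, as with cyanogen bromide.
after_met = digest(sequence, {"M"})
# Cleavage after lysine or arginine, a simplified trypsin-like rule.
after_lys_arg = digest(sequence, {"K", "R"})

print(after_met, mean_length(after_met))
print(after_lys_arg, mean_length(after_lys_arg))
```

The two rules yield different fragment sets from the same sequence, which is the property that may allow a second digestion method to distinguish proteins that a first digestion method reports as a single protein group.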
An assay may rapidly generate and analyze proteomic data. Beginning with an input biological sample (e.g., a buccal or nasal smear, plasma, or tissue), an assay of the present disclosure may generate and analyze proteomic data in less than 7 hours. Beginning with an input biological sample, an assay of the present disclosure may generate and analyze proteomic data in 5-7 hours. Beginning with an input biological sample, an assay of the present disclosure may generate and analyze proteomic data in less than 5 hours. Beginning with an input biological sample, an assay of the present disclosure may generate and analyze proteomic data in 3-5 hours. Beginning with an input biological sample, an assay of the present disclosure may generate and analyze proteomic data in 2-4 hours. Beginning with an input biological sample, an assay of the present disclosure may generate and analyze proteomic data in 2-3 hours. Beginning with an input biological sample, an assay of the present disclosure may generate and analyze proteomic data in less than 3 hours. Beginning with an input biological sample, an assay of the present disclosure may generate and analyze proteomic data in less than 2 hours. The analyzing may comprise identifying a protein group. The analyzing may comprise identifying a protein class. The analyzing may comprise quantifying an abundance of a biomolecule, a peptide, a protein, protein group, or a protein class. The analyzing may comprise identifying a ratio of abundances of two biomolecules, peptides, proteins, protein groups, or protein classes. The analyzing may comprise identifying a biological state.
The biomolecule corona analysis methods described herein may comprise assaying biomolecules in a sample of the present disclosure across a wide dynamic range. The dynamic range of biomolecules assayed in a sample may be a range of measured signals of biomolecule abundances as measured by an assay method (e.g., mass spectrometry, chromatography, gel electrophoresis, spectroscopy, or immunoassays) for the biomolecules contained within a sample. For example, an assay capable of detecting proteins across a wide dynamic range may be capable of detecting proteins of very low abundance to proteins of very high abundance. The dynamic range of an assay may be directly related to the slope of assay signal intensity as a function of biomolecule abundance. For example, an assay with a low dynamic range may have a low (but positive) slope of the assay signal intensity as a function of biomolecule abundance, e.g., the ratio of the signal detected for a high abundance biomolecule to the signal detected for a low abundance biomolecule may be lower for an assay with a low dynamic range than for an assay with a high dynamic range. In specific cases, dynamic range may refer to the dynamic range of proteins within a sample or assaying method.
The particle panels disclosed herein can be used to identify the number of distinct proteins disclosed herein, and/or any of the specific proteins disclosed herein, over a wide dynamic range. As used herein, a dynamic range may denote a log10 value of a ratio of the highest and lowest abundance species of a specified type. Enriching or assaying species over a dynamic range may refer to the abundances of those species in the sample from which they were assayed or derived. For example, the particle panels disclosed herein comprising distinct particle types, can enrich for proteins in a sample over the entire dynamic range at which proteins are present in a sample (e.g., a plasma sample). In some cases, a particle panel including any number of distinct particle types disclosed herein, enriches and identifies proteins over a dynamic range of about 2 to about 12. In some cases, a particle panel including any number of distinct particle types disclosed herein, enriches and identifies proteins over a dynamic range of about 3 to about 12. In some cases, a particle panel including any number of distinct particle types disclosed herein, enriches and identifies proteins over a dynamic range of about 4 to about 12. In some cases, a particle panel including any number of distinct particle types disclosed herein, enriches and identifies proteins over a dynamic range of about 5 to about 12. In some cases, a particle panel including any number of distinct particle types disclosed herein, enriches and identifies proteins over a dynamic range of about 6 to about 12. In some cases, a particle panel including any number of distinct particle types disclosed herein, enriches and identifies proteins over a dynamic range of about 7 to about 12. In some cases, a particle panel including any number of distinct particle types disclosed herein, enriches and identifies proteins over a dynamic range of about 8 to about 12.
In some cases, a particle panel including any number of distinct particle types disclosed herein, enriches and identifies proteins over a dynamic range of about 9 to about 12. In some cases, a particle panel including any number of distinct particle types disclosed herein, enriches and identifies proteins over a dynamic range of about 10 to about 12. In some cases, a particle panel including any number of distinct particle types disclosed herein, enriches and identifies proteins over a dynamic range of about 11 to about 12. In some cases, a particle panel including any number of distinct particle types disclosed herein, enriches and identifies proteins over a dynamic range of about 12. In some cases, a particle panel including any number of distinct particle types disclosed herein, enriches and identifies proteins over a dynamic range of from about 2 to about 6. In some cases, a particle panel including any number of distinct particle types disclosed herein, enriches and identifies proteins over a dynamic range of from about 3 to about 8. In some cases, a particle panel including any number of distinct particle types disclosed herein, enriches and identifies proteins over a dynamic range of from about 4 to 8. In some cases, a particle panel including any number of distinct particle types disclosed herein, enriches and identifies proteins over a dynamic range of from about 5 to about 10. In some cases, a particle panel including any number of distinct particle types disclosed herein, enriches and identifies proteins over a dynamic range of from about 6 to about 10. In some cases, a particle panel including any number of distinct particle types disclosed herein, enriches and identifies proteins over a dynamic range of from about 6 to about 12.
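The log10 definition of dynamic range given above can be illustrated with a minimal sketch. The function name `dynamic_range_log10` and the abundance values are hypothetical and are used only for illustration; they are not measured data.

```python
import math

def dynamic_range_log10(abundances):
    """Return log10(highest / lowest) over the strictly positive abundances."""
    positive = [a for a in abundances if a > 0]
    if len(positive) < 2:
        raise ValueError("need at least two positive abundance values")
    return math.log10(max(positive) / min(positive))

# Hypothetical protein abundances in arbitrary units (illustrative only).
example_abundances = [4.0e10, 2.5e7, 3.0e3, 4.0]
print(dynamic_range_log10(example_abundances))
```

Under this definition, a value of about 10 would indicate that the most abundant species is roughly ten orders of magnitude more abundant than the least abundant species.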
The biomolecule corona analysis methods described herein may compress the dynamic range of an assay. The dynamic range of an assay may be compressed relative to another assay if the slope of the assay signal intensity as a function of biomolecule abundance is lower than that of the other assay. For example, a plasma sample assayed using protein corona analysis with mass spectrometry may have a compressed dynamic range compared to a plasma sample assayed using mass spectrometry alone, directly on the sample or compared to provided abundance values for plasma proteins in databases (e.g., the database provided in Keshishian et al., Mol. Cell Proteomics 14, 2375-2393 (2015), also referred to herein as the “Carr database”). The compressed dynamic range may enable the detection of more low abundance biomolecules in a biological sample using biomolecule corona analysis with mass spectrometry than using mass spectrometry alone.
The dynamic range of a proteomic analysis assay may be the ratio of the signal produced by highest abundance proteins (e.g., the highest 10% of proteins by abundance) to the signal produced by the lowest abundance proteins (e.g., the lowest 10% of proteins by abundance). Compressing the dynamic range of a proteomic analysis may comprise decreasing the ratio of the signal produced by the highest abundance proteins to the signal produced by the lowest abundance proteins for a first proteomic analysis assay relative to that of a second proteomic analysis assay. The protein corona analysis assays disclosed herein may compress the dynamic range relative to the dynamic range of a total protein analysis method (e.g., mass spectrometry, gel electrophoresis, or liquid chromatography).
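As a non-limiting sketch of the ratio described above, the following compares the signal of the highest-abundance fraction of proteins to that of the lowest-abundance fraction. The function name `decile_signal_ratio`, the use of mean signal within each fraction, and the synthetic values are assumptions made for illustration.

```python
def decile_signal_ratio(records, fraction=0.10):
    """records: list of (abundance, signal) pairs.

    Returns mean signal of the top `fraction` of proteins by abundance
    divided by mean signal of the bottom `fraction` (assumed summary).
    """
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    if len(records) < 2:
        raise ValueError("need at least two records")
    ordered = sorted(records, key=lambda r: r[0])
    k = max(1, int(len(ordered) * fraction))
    mean_low = sum(signal for _, signal in ordered[:k]) / k
    mean_high = sum(signal for _, signal in ordered[-k:]) / k
    return mean_high / mean_low

# Synthetic data: a direct assay whose signal tracks abundance, and a
# hypothetical corona assay whose signal grows more slowly with abundance.
direct = [(10.0 ** i, 10.0 ** i) for i in range(10)]
corona = [(10.0 ** i, 10.0 ** (i / 3)) for i in range(10)]
print(decile_signal_ratio(direct) > decile_signal_ratio(corona))
```

In this synthetic case the second assay yields the smaller ratio, which corresponds to a compressed dynamic range in the sense described above.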
Provided herein are several methods for compressing the dynamic range of a biomolecular analysis assay to facilitate the detection of low abundance biomolecules relative to high abundance biomolecules. For example, a particle type of the present disclosure can be used to serially interrogate a sample. Upon incubation of the particle type in the sample, a biomolecule corona forms on the surface of the particle type. If biomolecules are directly detected in the sample without the use of the particle types, for example by direct mass spectrometric analysis of the sample, the dynamic range may span a wider range of concentrations, or more orders of magnitude, than if the biomolecules are detected on the surface of the particle type. Thus, the particle types disclosed herein may be used to compress the dynamic range of biomolecules in a sample. Without being limited by theory, this effect may be observed due to more capture of higher affinity, lower abundance biomolecules in the biomolecule corona of the particle type and less capture of lower affinity, higher abundance biomolecules in the biomolecule corona of the particle type.
A dynamic range of a proteomic analysis assay may be illustrated by the slope of a plot of a protein signal measured by the proteomic analysis assay as a function of total abundance of the protein in the sample. Compressing the dynamic range may comprise decreasing the slope of the plot of a protein signal measured by a proteomic analysis assay as a function of total abundance of the protein in the sample relative to the slope of the plot of a protein signal measured by a second proteomic analysis assay as a function of total abundance of the protein in the sample. The protein corona analysis assays disclosed herein may compress the dynamic range relative to the dynamic range of a total protein analysis method (e.g., mass spectrometry, gel electrophoresis, or liquid chromatography).
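The slope comparison described above can be sketched as an ordinary least-squares fit. Fitting on log10-transformed signal and abundance is an assumption made here for illustration, as are the function names `log_log_slope` and `is_compressed` and the synthetic values.

```python
import math

def log_log_slope(abundances, signals):
    """Least-squares slope of log10(signal) versus log10(abundance)."""
    if len(abundances) != len(signals) or len(abundances) < 2:
        raise ValueError("need two or more paired values")
    xs = [math.log10(a) for a in abundances]
    ys = [math.log10(s) for s in signals]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("abundances must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def is_compressed(slope_first_assay, slope_second_assay):
    """True if the first assay has the lower slope, i.e. is compressed."""
    return slope_first_assay < slope_second_assay

# Synthetic example: the second assay's signal tracks abundance directly,
# while the first (hypothetical corona) assay's signal grows as its square root.
abundances = [1e2, 1e4, 1e6, 1e8]
slope_direct = log_log_slope(abundances, abundances)
slope_corona = log_log_slope(abundances, [a ** 0.5 for a in abundances])
print(is_compressed(slope_corona, slope_direct))
```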
Provided herein are kits comprising compositions of the present disclosure that may be used to perform the methods of the present disclosure. A kit may comprise one or more particle types to interrogate a sample to identify a biological state of a sample. In some cases, a kit may comprise a particle type provided in TABLES 1-5. A kit may comprise a reagent for functionalizing a particle (e.g., a reagent for tethering a small molecule functionalization to a particle surface). The kit may be pre-packaged in discrete aliquots. In some cases, the kit can comprise a plurality of different particle types that can be used to interrogate a sample. The plurality of particle types can be pre-packaged where each particle type of the plurality is packaged separately. Alternatively, the plurality of particle types can be packaged together to contain a combination of particle types in a single package. A particle may be provided in dried (e.g., lyophilized) form, or may be provided in a suspension or solution. The particles may be provided in a well plate. For example, a kit may contain an 8 well plate or an 8-384 well plate with particles provided (e.g., sealed) within the wells. For example, a well plate may comprise at least 8, at least 16, at least 24, at least 32, at least 40, at least 48, at least 56, at least 64, at least 72, at least 80, at least 88, at least 96, at least 104, at least 112, at least 120, at least 128, at least 136, at least 144, at least 152, at least 160, at least 168, at least 176, at least 184, at least 192, at least 200, at least 208, at least 216, at least 224, at least 232, at least 240, at least 248, at least 256, at least 264, at least 272, at least 280, at least 288, at least 296, at least 304, at least 312, at least 320, at least 328, at least 336, at least 344, at least 352, at least 360, at least 368, at least 376, at least 384, at least 392, or at least 400 wells comprising particles.
Two wells in such a well plate may contain different particles or different concentrations of particles. Two wells may comprise different buffers or chemical conditions. For example, a well plate may be provided with different particles in each row of wells and different buffers in each column of wells. A well may be sealed by a removable covering. For example, a kit may comprise a well plate comprising a plastic slip covering a plurality of wells. A well may be sealed by a pierceable covering. For example, a well may be covered by a septum that a needle can pierce to facilitate sample movement into and out of the well.
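As one hedged illustration of a plate provided with different particles in each row and different buffers in each column, the layout can be represented as a mapping from well identifiers to conditions. The function name `plate_layout`, the letter-plus-number well naming, and the particle and buffer labels are hypothetical.

```python
from string import ascii_uppercase

def plate_layout(particle_types, buffers):
    """Map well IDs such as 'A1' to (particle type, buffer) pairs.

    Rows are lettered (one particle type per row) and columns are
    numbered from 1 (one buffer per column).
    """
    if len(particle_types) > len(ascii_uppercase):
        raise ValueError("this sketch supports at most 26 rows")
    layout = {}
    for row_index, particle in enumerate(particle_types):
        for column, buffer_name in enumerate(buffers, start=1):
            well_id = f"{ascii_uppercase[row_index]}{column}"
            layout[well_id] = (particle, buffer_name)
    return layout

# Hypothetical labels; not particle types or buffers named in this disclosure.
layout = plate_layout(["particle_1", "particle_2"],
                      ["buffer_a", "buffer_b", "buffer_c"])
for well_id in sorted(layout):
    print(well_id, layout[well_id])
```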
The present disclosure provides a range of samples that can be assayed using the particles and the methods provided herein. A sample may be a biological sample (e.g., a sample derived from a living organism). A sample may comprise a cell or be cell-free. A sample may comprise a biofluid, such as blood, serum, plasma, urine, or cerebrospinal fluid (CSF). Samples of the present disclosure include biological samples from a subject. A method may include analyzing a sample from a single subject, or analyzing samples from multiple subjects. The subject may be a human or a non-human animal. The biological samples can contain a plurality of proteins or proteomic data, which may be analyzed after adsorption of proteins to the surface of the various sensor element (e.g., particle) types in a panel and subsequent digestion of protein coronas. Proteomic data can comprise nucleic acids, peptides, or proteins. A biofluid may be a fluidized solid, for example a tissue homogenate, or a fluid extracted from a biological sample. A biological sample may be, for example, a tissue sample or a fine needle aspiration (FNA) sample. A biological sample may be a cell culture sample. For example, a biofluid may be a fluidized cell culture extract.
A wide range of samples are compatible for use within the methods and compositions of the present disclosure. The biological sample may comprise plasma, serum, urine, cerebrospinal fluid, synovial fluid, tears, saliva, whole blood, a blood component (e.g., plasma or white blood cells), milk, nipple aspirate, ductal lavage, vaginal fluid, nasal fluid, ear fluid, gastric fluid, pancreatic fluid, trabecular fluid, lung lavage, sweat, crevicular fluid, semen, prostatic fluid, sputum, fecal matter, bronchial lavage, fluid from swabbings, bronchial aspirants, fluidized solids, fine needle aspiration samples, tissue homogenates, lymphatic fluid, cell culture samples, or any combination thereof. The biological sample may comprise blood or a blood component. The biological sample may comprise multiple biological samples (e.g., pooled plasma from multiple subjects, or multiple tissue samples from a single subject). The biological sample may comprise a single type of biofluid or biomaterial from a single source. A biological sample may comprise a nerve biopsy.
Various methods of the present disclosure utilize blood or blood components (e.g., red blood cells, buffy coats, plasma). In contrast to many tissue biopsies, which can be damaging and cost intensive, blood collection is often relatively facile and benign, and is therefore suitable for routine and low-risk patient monitoring. Furthermore, as human blood is estimated to contain over 5000 types of protein groups whose abundances and forms (e.g., post-translational modifications and variant types) can be responsive to biological state, biological state changes are often evidenced by subtle changes in blood protein composition. A method of the present disclosure may use whole blood (e.g., untreated blood drawn from a subject). A method of the present disclosure may also use a treated or partitioned blood sample. In some cases, a sample comprises plasma, buffy coat, white blood cells, platelets, hematocrit, red blood cells, serum, blood clots or any combination thereof. In some cases, plasma, buffy coat, white blood cells, platelets, hematocrit, red blood cells, serum, blood clots or any combination thereof are extracted from a blood sample for use in a method disclosed herein.
In some cases, a method utilizes serum. As used herein, “serum” may denote the liquid fraction remaining after a blood sample clots. As a blood sample left at room temperature will typically clot within 15-60 minutes, serum may be prepared by incubating a blood sample at or above room temperature, for example at 25° C. or at 37° C., respectively. After at least about 10 minutes, at least about 15 minutes, at least about 20 minutes, at least about 30 minutes, at least about 40 minutes, at least about 50 minutes, or at least about 60 minutes, the blood clots may be separated from solution through centrifugation. While serum is often prepared non-hemolyzed (e.g., wherein blood cells remain intact through clotting and removal), some methods of the present disclosure may utilize serum derived from hemolyzed blood samples.
In some cases, a method utilizes plasma. As used herein, “plasma” may denote a fraction collected from blood pretreated with an anticoagulant and separated from blood cells and platelets. In contrast to serum, plasma typically contains an array of clotting factors, such as fibrinogen, prothrombin, and proaccelerin. As the concentrations and forms of these species can reflect certain health conditions, plasma analysis can provide greater diagnostic insight than serum analysis for some biological states. Plasma samples can be prepared by treating blood with an anticoagulant, and then centrifuging the treated blood. The anticoagulant may comprise citrate, ethylenediaminetetraacetic acid (EDTA), potassium oxalate, hirudin, argatroban, ximelagatran, heparin, fondaparinux, or any combination thereof.
Centrifugation parameters affect the proteins which remain in solution, and therefore may be modified depending on the biomolecules of interest for detection from plasma or serum. Centrifugation may be performed for at least 2 minutes, at least 4 minutes, at least 6 minutes, at least 8 minutes, at least 10 minutes, at least 12 minutes, at least 15 minutes, at least 20 minutes, or at least 30 minutes. Centrifugation may be performed for at most 30 minutes, at most 20 minutes, at most 15 minutes, at most 10 minutes, at most 8 minutes, at most 6 minutes, at most 4 minutes, or at most 2 minutes. Centrifugation may impart at least 100 gravitational force equivalents (g), at least 200 g, at least 300 g, at least 400 g, at least 500 g, at least 600 g, at least 800 g, at least 1000 g, at least 1200 g, at least 1500 g, at least 1800 g, at least 2000 g, at least 2500 g, at least 3000 g, at least 4000 g, at least 5000 g, at least 6000 g, at least 8000 g, or at least 10000 g. The centrifugation may impart at most 100 g, at most 200 g, at most 300 g, at most 400 g, at most 500 g, at most 600 g, at most 800 g, at most 1000 g, at most 1200 g, at most 1500 g, at most 1800 g, at most 2000 g, at most 2500 g, at most 3000 g, at most 4000 g, at most 5000 g, at most 6000 g, at most 8000 g, or at most 10000 g.
The biological sample may be diluted or pre-treated. The biological sample may undergo depletion (e.g., albumin removal from serum or plasma) prior to or following contact with a particle or plurality of particles. The biological sample may also undergo physical (e.g., homogenization or sonication) or chemical treatment prior to or following contact with a particle or plurality of particles. The biological sample may be diluted prior to or following contact with a particle or plurality of particles. The dilution medium may comprise buffer or salts, or be purified water (e.g., distilled water). Different partitions of a biological sample may undergo different degrees of dilution. A biological sample or a portion thereof may undergo a 1.1-fold, 1.2-fold, 1.3-fold, 1.4-fold, 1.5-fold, 2-fold, 3-fold, 4-fold, 5-fold, 6-fold, 8-fold, 10-fold, 12-fold, 15-fold, 20-fold, 30-fold, 40-fold, 50-fold, 75-fold, 100-fold, 200-fold, 500-fold, or 1000-fold dilution. For example, a plasma sample may be subjected to a 5-fold dilution with buffer prior to analysis.
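The fold dilutions listed above can be converted into volumes with a short sketch. It assumes the common convention that an N-fold dilution brings the final volume to N times the sample volume; the function name `diluent_volume` and the 40 microliter sample volume are hypothetical.

```python
def diluent_volume(sample_volume_ul, fold):
    """Volume of dilution medium needed for an N-fold dilution.

    Assumes final volume = fold * sample volume, so the diluent volume
    is (fold - 1) * sample volume.
    """
    if sample_volume_ul <= 0:
        raise ValueError("sample volume must be positive")
    if fold < 1:
        raise ValueError("fold must be at least 1")
    return sample_volume_ul * (fold - 1)

# Hypothetical example: a 5-fold dilution of 40 microliters of plasma.
print(diluent_volume(40, 5))
```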
The compositions and methods of the present disclosure can be used to measure, detect, and identify specific proteins from biological samples. Examples of proteins that can be identified and measured include highly abundant proteins, proteins of medium abundance, and low-abundance proteins. For example, a composition or method may identify at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 10, at least 12, at least 15, at least 18, at least 20, at least 25, at least 30, at least 35, at least 40, or at least 50 human plasma proteins from the group consisting of albumin, immunoglobulin G (IgG), lysozyme, carcinoembryonic antigen (CEA), receptor tyrosine-protein kinase erbB-2 (HER-2/neu), bladder tumor antigen, thyroglobulin, alpha-fetoprotein, prostate specific antigen (PSA), mucin 16 (CA125), carbohydrate antigen 19-9 (CA19.9), carcinoma antigen 15-3 (CA15.3), leptin, prolactin, osteopontin, insulin-like growth factor 2 (IGF-II), 4F2 cell-surface antigen heavy chain (CD98), fascin, sPigR, 14-3-3 eta, troponin I, B-type natriuretic peptide, breast cancer type 1 susceptibility protein (BRCA1), c-Myc proto-oncogene protein (c-Myc), interleukin-6 (IL-6), fibrinogen, epidermal growth factor receptor (EGFR), gastrin, PH, granulocyte colony-stimulating factor (G-CSF), desmin, enolase 1 (NSE), follicle-stimulating hormone (FSH), vascular endothelial growth factor (VEGF), P21, proliferating cell nuclear antigen (PCNA), calcitonin, pathogenesis-related proteins (PR), luteinizing hormone (LH), somatostatin, S100, insulin, alpha-prolactin, adrenocorticotropic hormone (ACTH), B-cell lymphoma 2 (Bcl 2), estrogen receptor alpha (ER alpha), antigen k (Ki-67), tumor protein (p53), cathepsin D, beta catenin, von Willebrand factor (VWF), CD15, k-ras, caspase 3, ENTH domain-containing protein (EPN), CD10, FAS, breast cancer type 2 susceptibility protein (BRCA2), CD30L, CD30, CGA, CRP, prothrombin, CD44, APEX, transferrin, GM-CSF, E-cadherin, interleukin-2 (IL-2), Bax, IFN-gamma, beta-2-MG, tumor necrosis factor alpha (TNF alpha), cluster of differentiation 340, trypsin, cyclin D1, MG B, XBP-1, HG-1, YKL-40, S-gamma, ceruloplasmin, NESP-55, netrin-1, geminin, GADD45A, CDK-6, CCL21, breast cancer metastasis suppressor 1 (BrMS1), 17betaHDI, platelet-derived growth factor receptor A (PDGFRA), P300/CBP-associated factor (Pcaf), chemokine ligand 5 (CCL5), matrix metalloproteinase-3 (MMP3), claudin-4, and claudin-3.
The compositions and methods disclosed herein can be used to identify various biological states of samples and subjects from which samples are derived. As an example, biological state can refer to an elevated or low level of a particular biomolecule or set of biomolecules, such as elevated blood glucose or misfolded alpha synuclein. Biological state may also refer to a particular pathology, such as Alzheimer's disease, or a stage of the pathology, such as early, middle, or late stage dementia. In other examples, a biological state can refer to identification of a disease, such as cancer. The particles and methods of use thereof can be used to distinguish between two biological states. The two biological states may be related disease states (e.g., mild cognitive impairment and Alzheimer's disease). The two biological states may be different phases of a disease, such as pre-Alzheimer's and mild Alzheimer's. The two biological states may be distinguished with a high degree of accuracy (e.g., the percentage of accurately identified biological states among a population of samples). For example, the compositions and methods of the present disclosure may distinguish two biological states with at least 60% accuracy, at least 70% accuracy, at least 75% accuracy, at least 80% accuracy, at least 85% accuracy, at least 90% accuracy, at least 95% accuracy, at least 98% accuracy, or at least 99% accuracy. The two biological states may be distinguished with a high degree of specificity (e.g., the rate at which negative results are correctly identified among a population of samples). For example, the compositions and methods of the present disclosure may distinguish two biological states with at least 60% specificity, at least 70% specificity, at least 75% specificity, at least 80% specificity, at least 85% specificity, at least 90% specificity, at least 95% specificity, at least 98% specificity, or at least 99% specificity.
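The accuracy and specificity measures described above can be computed from paired true and predicted labels, as in the following minimal sketch. The function name `accuracy_and_specificity`, the choice of which state is treated as the negative class, and the example labels are assumptions for illustration and are not results.

```python
def accuracy_and_specificity(true_labels, predicted_labels, negative_label):
    """Return (accuracy, specificity) for two-state classification.

    accuracy    = correctly identified samples / all samples
    specificity = true negatives / (true negatives + false positives)
    """
    if len(true_labels) != len(predicted_labels) or not true_labels:
        raise ValueError("label lists must be non-empty and equal in length")
    pairs = list(zip(true_labels, predicted_labels))
    correct = sum(1 for truth, pred in pairs if truth == pred)
    negatives = [(truth, pred) for truth, pred in pairs if truth == negative_label]
    if not negatives:
        raise ValueError("no samples carry the negative label")
    true_negatives = sum(1 for _, pred in negatives if pred == negative_label)
    return correct / len(pairs), true_negatives / len(negatives)

# Hypothetical labels: "MCI" is treated as the negative class, "AD" as positive.
truth = ["AD", "AD", "MCI", "MCI", "MCI"]
predicted = ["AD", "MCI", "MCI", "MCI", "AD"]
accuracy, specificity = accuracy_and_specificity(truth, predicted, "MCI")
print(accuracy, specificity)
```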
The methods, compositions, and systems of the present disclosure may detect a neurological disease state. The terms neurological disorder and neurological disease are used interchangeably and refer to diseases associated with neurological tissues, such as the brain, the spinal cord, and the nerves that connect them. Neurological diseases include, but are not limited to, brain tumors, epilepsy, Parkinson's disease, Alzheimer's disease, ALS, arteriovenous malformation, cerebrovascular disease, brain aneurysms, multiple sclerosis, peripheral neuropathy, post-herpetic neuralgia, stroke, frontotemporal dementia, demyelinating disease (including but not limited to multiple sclerosis, Devic's disease (i.e., neuromyelitis optica), central pontine myelinolysis, progressive multifocal leukoencephalopathy, leukodystrophies, Guillain-Barre syndrome, progressing inflammatory neuropathy, Charcot-Marie-Tooth disease, chronic inflammatory demyelinating polyneuropathy, and anti-MAG peripheral neuropathy), and the like. Neurological disorders also include immune-mediated neurological disorders (IMNDs), which include diseases in which at least one component of the immune system reacts against host proteins present in the central or peripheral nervous system and contributes to disease pathology. IMNDs may include, but are not limited to, demyelinating disease, paraneoplastic neurological syndromes, immune-mediated encephalomyelitis, immune-mediated autonomic neuropathy, myasthenia gravis, autoantibody-associated encephalopathy, and acute disseminated encephalomyelitis.
Methods, systems, and/or apparatuses of the present disclosure may be able to accurately distinguish between patients with or without Alzheimer's disease. These may also be able to detect patients who are pre-symptomatic and may develop Alzheimer's disease several years after the screening. This provides advantages of being able to treat a disease at a very early stage, even before development of the disease.
The methods, compositions, and systems of the present disclosure can detect a pre-disease stage of a disease or disorder. A pre-disease stage is a stage at which the patient has not developed any signs or symptoms of the disease. A pre-neurological disease stage would be a stage in which a person has not developed one or more symptoms of the neurological disease. The ability to diagnose a disease before one or more signs or symptoms of the disease are present allows for close monitoring of the subject and the ability to treat the disease at a very early stage, increasing the prospect of being able to halt progression or reduce the severity of the disease.
The methods, compositions, and systems of the present disclosure may detect the early stages of a disease or disorder. Early stages of the disease can refer to when the first signs or symptoms of a disease may manifest within a subject. The early stage of a disease may be a stage at which there are no outward signs or symptoms. For example, in Alzheimer's disease an early stage may be a pre-Alzheimer's stage in which no symptoms are detected yet the patient will develop Alzheimer's months or years later.
Identifying a disease in either pre-disease development or in the early stages may often lead to a higher likelihood of a positive outcome for the patient. For example, diagnosing dementia at an early stage (stage 0 or stage 1) can enable early stage interventions, which may slow or even halt its progression, and increase the quality of life and life expectancy of the patient.
In some cases, the methods, compositions, and systems of the present disclosure are able to detect intermediate stages of the disease. Intermediate stages of the disease describe stages that have passed the first signs and symptoms and in which the patient is experiencing one or more symptoms of the disease. Further, the methods, compositions, and systems of the present disclosure may be able to detect late or advanced stages of the disease. Late or advanced stages of the disease may also be called “severe” or “advanced” and usually indicate that the subject is suffering from multiple symptoms and effects of the disease.
The methods of the present disclosure can include processing the biomolecule corona data of a sample against a collection of biomolecule corona datasets representative of a plurality of diseases and/or a plurality of disease states to determine if the sample indicates a disease and/or disease state. For example, samples can be collected from a population of subjects over time. Once the subjects develop a disease or disorder, the present disclosure allows for the ability to characterize and detect the changes in biomolecule fingerprints over time in the subject by computationally comparing the biomolecule fingerprint of the sample from the same subject before they have developed a disease to the biomolecule fingerprint of the subject after they have developed the disease. Samples can also be taken from cohorts of patients who all develop the same disease, allowing for analysis and characterization of the biomolecule fingerprints that are associated with the different stages of the disease for these patients (e.g., from pre-disease to disease states).
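One simple way to process a sample against a collection of reference datasets, offered only as an illustrative sketch, is to compare the sample's fingerprint with the mean fingerprint of each reference state. The nearest-centroid approach, the function names `centroid` and `closest_state`, the state labels, and the numeric values are all assumptions and do not represent the classifier of the present disclosure.

```python
import math

def centroid(profiles):
    """Element-wise mean of equal-length abundance profiles."""
    count = len(profiles)
    return [sum(column) / count for column in zip(*profiles)]

def closest_state(sample, reference_sets):
    """Return (state, distance) for the reference state whose centroid
    is nearest to the sample in Euclidean distance.

    reference_sets: dict mapping a state label to a list of profiles.
    """
    best_state, best_distance = None, math.inf
    for state, profiles in reference_sets.items():
        distance = math.dist(sample, centroid(profiles))
        if distance < best_distance:
            best_state, best_distance = state, distance
    return best_state, best_distance

# Hypothetical fingerprints: each profile holds signals for three proteins.
references = {
    "pre-disease": [[1.0, 5.0, 2.0], [1.2, 4.8, 2.2]],
    "disease": [[3.0, 2.0, 6.0], [3.2, 2.2, 5.8]],
}
print(closest_state([2.9, 2.1, 6.1], references)[0])
```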
In some cases, the methods, compositions, and systems of the present disclosure are able to distinguish not only between different types of diseases, but also between the different stages of the disease (e.g., early stages of disease). This can comprise distinguishing healthy subjects from pre-disease state subjects. The pre-disease state may be, for example, a pre-disease state of a neurodegenerative disease or of dementia.
The present disclosure provides computer control systems that are programmed to implement methods of the disclosure.
The computer system 101 includes a central processing unit (CPU, also “processor” and “computer processor” herein) 105, which can be a single core or multi core processor, or a plurality of processors for parallel processing. The computer system 101 also includes memory or memory location 110 (e.g., random-access memory, read-only memory, flash memory), electronic storage unit 115 (e.g., hard disk), communication interface 120 (e.g., network adapter) for communicating with one or more other systems, and peripheral devices 125, such as cache, other memory, data storage and/or electronic display adapters. The memory 110, storage unit 115, interface 120 and peripheral devices 125 are in communication with the CPU 105 through a communication bus (solid lines), such as a motherboard. The storage unit 115 can be a data storage unit (or data repository) for storing data. The computer system 101 can be operatively coupled to a computer network (“network”) 130 with the aid of the communication interface 120. The network 130 can be the Internet, an internet and/or extranet, or an intranet and/or extranet that is in communication with the Internet. The network 130 in some cases is a telecommunication and/or data network. The network 130 can include one or more computer servers, which can enable distributed computing, such as cloud computing. The network 130, in some cases with the aid of the computer system 101, can implement a peer-to-peer network, which may enable devices coupled to the computer system 101 to behave as a client or a server.
The CPU 105 can execute a sequence of machine-readable instructions, which can be embodied in a program or software. The instructions may be stored in a memory location, such as the memory 110. The instructions can be directed to the CPU 105, which can subsequently program or otherwise configure the CPU 105 to implement methods of the present disclosure. Examples of operations performed by the CPU 105 can include fetch, decode, execute, and writeback.
The CPU 105 can be part of a circuit, such as an integrated circuit. One or more other components of the system 101 can be included in the circuit. In some cases, the circuit is an application specific integrated circuit (ASIC).
The storage unit 115 can store files, such as drivers, libraries and saved programs. The storage unit 115 can store user data, e.g., user preferences and user programs. The computer system 101 in some cases can include one or more additional data storage units that are external to the computer system 101, such as located on a remote server that is in communication with the computer system 101 through an intranet or the Internet.
The computer system 101 can communicate with one or more remote computer systems through the network 130. For instance, the computer system 101 can communicate with a remote computer system of a user. Examples of remote computer systems include personal computers (e.g., portable PC), slate or tablet PC's (e.g., Apple® iPad, Samsung® Galaxy Tab), telephones, Smart phones (e.g., Apple® iPhone, Android-enabled device, Blackberry®), or personal digital assistants. The user can access the computer system 101 via the network 130.
Methods as described herein can be implemented by way of machine (e.g., computer processor) executable code stored on an electronic storage location of the computer system 101, such as, for example, on the memory 110 or electronic storage unit 115. The machine executable or machine readable code can be provided in the form of software. During use, the code can be executed by the processor 105. In some cases, the code can be retrieved from the storage unit 115 and stored on the memory 110 for ready access by the processor 105. In some situations, the electronic storage unit 115 can be precluded, and machine-executable instructions are stored on memory 110.
The code can be pre-compiled and configured for use with a machine having a processor adapted to execute the code, or can be compiled during runtime. The code can be supplied in a programming language that can be selected to enable the code to execute in a pre-compiled or as-compiled fashion.
Aspects of the systems and methods provided herein, such as the computer system 101, can be embodied in programming. Various aspects of the technology may be thought of as “products” or “articles of manufacture” typically in the form of machine (or processor) executable code and/or associated data that is carried on or embodied in a type of machine readable medium. Machine-executable code can be stored on an electronic storage unit, such as memory (e.g., read-only memory, random-access memory, flash memory) or a hard disk. “Storage” type media can include any or all of the tangible memory of the computers, processors or the like, or associated modules thereof, such as various semiconductor memories, tape drives, disk drives and the like, which may provide non-transitory storage at any time for the software programming. All or portions of the software may at times be communicated through the Internet or various other telecommunication networks. Such communications, for example, may enable loading of the software from one computer or processor into another, for example, from a management server or host computer into the computer platform of an application server. Thus, another type of media that may bear the software elements includes optical, electrical and electromagnetic waves, such as used across physical interfaces between local devices, through wired and optical landline networks and over various air-links. The physical elements that carry such waves, such as wired or wireless links, optical links or the like, also may be considered as media bearing the software. As used herein, unless restricted to non-transitory, tangible “storage” media, terms such as computer or machine “readable medium” refer to any medium that participates in providing instructions to a processor for execution.
Hence, a machine readable medium, such as computer-executable code, may take many forms, including but not limited to, a tangible storage medium, a carrier wave medium or physical transmission medium. Non-volatile storage media include, for example, optical or magnetic disks, such as any of the storage devices in any computer(s) or the like, such as may be used to implement the databases, etc. shown in the drawings. Volatile storage media include dynamic memory, such as main memory of such a computer platform. Tangible transmission media include coaxial cables; copper wire and fiber optics, including the wires that comprise a bus within a computer system. Carrier-wave transmission media may take the form of electric or electromagnetic signals, or acoustic or light waves such as those generated during radio frequency (RF) and infrared (IR) data communications. Common forms of computer-readable media therefore include for example: a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a ROM, a PROM and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer may read programming code and/or data. Many of these forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to a processor for execution.
The computer system 101 can include or be in communication with an electronic display 135 that comprises a user interface (UI) 140 for providing, for example, a readout of the proteins identified using the methods disclosed herein. Examples of UIs include, without limitation, a graphical user interface (GUI) and a web-based user interface.
Methods and systems of the present disclosure can be implemented by way of one or more algorithms. An algorithm can be implemented by way of software upon execution by the central processing unit 105.
Determination, analysis or statistical classification can be performed using methods including, but not limited to, supervised and unsupervised data analysis and clustering approaches such as hierarchical cluster analysis (HCA), principal component analysis (PCA), Partial least squares Discriminant Analysis (PLS-DA), machine learning (e.g., Random Forest), logistic regression, decision trees, support vector machine (SVM), k-nearest neighbors, naive Bayes, linear regression, polynomial regression, SVM for regression, K-means clustering, and hidden Markov models, among others. The computer system can perform various aspects of analyzing the protein sets or protein corona of the present disclosure, such as, for example, comparing/analyzing the biomolecule coronas of several samples to determine with statistical significance what patterns are common between the individual biomolecule coronas, and thereby to determine a protein set that is associated with the biological state. The computer system can be used to develop classifiers to detect and discriminate different protein sets or protein coronas (e.g., characteristic of the composition of a protein corona). Data collected from the presently disclosed sensor array can be used to train a machine learning algorithm, specifically an algorithm that receives array measurements from a patient and outputs specific biomolecule corona compositions from each patient. Before training the algorithm, raw data from the array can first be denoised to reduce variability in individual variables.
Machine learning can be generalized as the ability of a learning machine to perform accurately on new, unseen examples/tasks after having experienced a learning data set. Machine learning may include the following concepts and methods. Supervised learning concepts may include AODE; Artificial neural network, such as Backpropagation, Autoencoders, Hopfield networks, Boltzmann machines, Restricted Boltzmann Machines, and Spiking neural networks; Bayesian statistics, such as Bayesian network and Bayesian knowledge base; Case-based reasoning; Gaussian process regression; Gene expression programming; Group method of data handling (GMDH); Inductive logic programming; Instance-based learning; Lazy learning; Learning Automata; Learning Vector Quantization; Logistic Model Tree; Minimum message length (decision trees, decision graphs, etc.), such as Nearest Neighbor Algorithm and Analogical modeling; Probably approximately correct (PAC) learning; Ripple down rules, a knowledge acquisition methodology; Symbolic machine learning algorithms; Support vector machines; Random Forests; Ensembles of classifiers, such as Bootstrap aggregating (bagging) and Boosting (meta-algorithm); Ordinal classification; Information fuzzy networks (IFN); Conditional Random Field; ANOVA; Linear classifiers, such as Fisher's linear discriminant, Linear regression, Logistic regression, Multinomial logistic regression, Naive Bayes classifier, Perceptron, Support vector machines; Quadratic classifiers; k-nearest neighbor; Boosting; Decision trees, such as C4.5, Random forests, ID3, CART, SLIQ, and SPRINT; Bayesian networks, such as Naive Bayes; and Hidden Markov models.
Unsupervised learning concepts may include Expectation-maximization algorithm; Vector Quantization; Generative topographic map; Information bottleneck method; Artificial neural network, such as Self-organizing map; Association rule learning, such as Apriori algorithm, Eclat algorithm, and FP-growth algorithm; Hierarchical clustering, such as Single-linkage clustering and Conceptual clustering; Cluster analysis, such as K-means algorithm, Fuzzy clustering, DBSCAN, and OPTICS algorithm; and Outlier Detection, such as Local Outlier Factor. Semi-supervised learning concepts may include Generative models; Low-density separation; Graph-based methods; and Co-training. Reinforcement learning concepts may include Temporal difference learning; Q-learning; Learning Automata; and SARSA. Deep learning concepts may include Deep belief networks; Deep Boltzmann machines; Deep Convolutional neural networks; Deep Recurrent neural networks; and Hierarchical temporal memory.
A computer system may be adapted to implement a method described herein. The system includes a central computer server that is programmed to implement the methods described herein. The server includes a central processing unit (CPU, also “processor”) which can be a single core processor, a multi core processor, or a plurality of processors for parallel processing. The server also includes memory (e.g., random access memory, read-only memory, flash memory); an electronic storage unit (e.g., hard disk); a communications interface (e.g., network adaptor) for communicating with one or more other systems; and peripheral devices which may include cache, other memory, data storage, and/or electronic display adaptors. The memory, storage unit, interface, and peripheral devices are in communication with the processor through a communications bus (solid lines), such as a motherboard. The storage unit can be a data storage unit for storing data.
The server is operatively coupled to a computer network (“network”) with the aid of the communications interface. The network can be the Internet, an intranet and/or an extranet, an intranet and/or extranet that is in communication with the Internet, a telecommunication or data network. The network in some cases, with the aid of the server, can implement a peer-to-peer network, which may enable devices coupled to the server to behave as a client or a server.
The storage unit can store files, such as subject reports, and/or communications with the data about individuals, or any aspect of data associated with the present disclosure.
The computer server can communicate with one or more remote computer systems through the network. The one or more remote computer systems may be, for example, personal computers, laptops, tablets, telephones, smartphones, or personal digital assistants.
In some applications the computer system includes a single server. In other situations, the system includes multiple servers in communication with one another through an intranet, extranet and/or the internet.
The server can be adapted to store measurement data or a database as provided herein, patient information from the subject, such as, for example, medical history, family history, demographic data and/or other clinical or personal information of potential relevance to a particular application. Such information can be stored on the storage unit or the server and such data can be transmitted through a network.
Methods as described herein can be implemented by way of machine (or computer processor) executable code (or software) stored on an electronic storage location of the server, such as, for example, on the memory, or electronic storage unit. During use, the code can be executed by the processor. In some cases, the code can be retrieved from the storage unit and stored on the memory for ready access by the processor. In some situations, the electronic storage unit can be precluded, and machine-executable instructions are stored on memory. Alternatively, the code can be executed on a second computer system.
Aspects of the systems and methods provided herein, such as the server, can be embodied in programming. Various aspects of the technology may be thought of as “products” or “articles of manufacture” typically in the form of machine (or processor) executable code and/or associated data that is carried on or embodied in a type of machine readable medium. Machine-executable code can be stored on an electronic storage unit, such as memory (e.g., read-only memory, random-access memory, flash memory) or a hard disk. “Storage” type media can include any or all of the tangible memory of the computers, processors or the like, or associated modules thereof, such as various semiconductor memories, tape drives, disk drives and the like, which may provide non-transitory storage at any time for the software programming. All or portions of the software may at times be communicated through the Internet or various other telecommunication networks. Such communications, for example, may enable loading of the software from one computer or processor into another, for example, from a management server or host computer into the computer platform of an application server. Thus, another type of media that may bear the software elements includes optical, electrical, and electromagnetic waves, such as used across physical interfaces between local devices, through wired and optical landline networks and over various air-links. The physical elements that carry such waves, such as wired or wireless links, optical links, or the like, also may be considered as media bearing the software. As used herein, unless restricted to non-transitory, tangible “storage” media, terms such as computer or machine “readable medium” can refer to any medium that participates in providing instructions to a processor for execution.
The computer systems described herein may comprise computer-executable code for performing any of the algorithms or algorithms-based methods described herein. In some applications the algorithms described herein will make use of a memory unit that is comprised of at least one database.
Data relating to the present disclosure can be transmitted over a network or connections for reception and/or review by a receiver. The receiver can be but is not limited to the subject to whom the report pertains; or to a caregiver thereof, e.g., a health care provider, manager, other health care professional, or other caretaker; a person or entity that performed and/or ordered the analysis. The receiver can also be a local or remote system for storing such reports (e.g. servers or other systems of a “cloud computing” architecture). In one embodiment, a computer-readable medium includes a medium suitable for transmission of a result of an analysis of a biological sample using the methods described herein.
Further disclosed herein are computer-implemented systems for identifying biological state information from biomolecule corona data. The computer-implemented system may comprise a communication interface configured to receive data, such as biomolecule corona data. The communication interface may receive data over a communication network, such as a cloud-based network or a computer server-based network, or a storage device such as a flash drive memory device or a compact disc. The computer-implemented system may comprise a computer in communication with the communication interface. The computer may comprise one or more processors, as well as computer readable medium comprising machine-executable code which may be executed by the one or more processors, and which may be configured to implement a method. The method may process biomolecule corona data, for example by filtering or baseline correcting a portion of the data. The method may identify a biomolecule (e.g., a protein, a protein group, a saccharide, a nucleic acid, or a metabolite). The method may identify an abundance of a biomolecule or an intensity of a signal (e.g., by performing a Gaussian or Lorentzian fit to a peak in the data). The method may identify a ratio of two or more biomolecule abundances or two or more signal intensities. The method may comprise a machine learning algorithm or a trained algorithm for biological state analysis. The method may identify a biological state based at least in part on the biomolecule corona data.
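The peak-fitting and ratio steps mentioned above can be illustrated with a minimal sketch. It is not the disclosed implementation: it swaps a full least-squares Gaussian fit for a closed-form three-point log-parabola estimate, which is exact only for noise-free, equally spaced samples of a Gaussian peak. The function names, the m/z values, and the intensities are hypothetical and chosen only for illustration.

```python
import math


def gaussian_three_point(x0, dx, y_left, y_center, y_right):
    """Estimate (center, sigma, amplitude) of a Gaussian peak from three
    equally spaced samples at x0 - dx, x0, and x0 + dx.

    Assumption: the samples are positive and bracket a local maximum.
    The log of a Gaussian is a parabola, so three points determine it.
    """
    l_left, l_center, l_right = (math.log(v) for v in (y_left, y_center, y_right))
    curvature = l_left - 2.0 * l_center + l_right  # equals -(dx / sigma) ** 2
    if curvature >= 0:
        raise ValueError("samples do not bracket a peak")
    center = x0 + 0.5 * dx * (l_left - l_right) / curvature
    sigma = math.sqrt(-dx * dx / curvature)
    amplitude = math.exp(l_center + (x0 - center) ** 2 / (2.0 * sigma ** 2))
    return center, sigma, amplitude


def intensity_ratio(amplitude_a, amplitude_b):
    """Ratio of two fitted peak amplitudes, usable as a derived feature."""
    return amplitude_a / amplitude_b
```

In practice a noisy spectrum would call for a least-squares fit over more points; the three-point form is shown only because it makes the relationship between the peak samples and the fitted parameters explicit.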
The computer may comprise one or more processors, as well as computer readable medium which may be executed by the one or more processors to communicate with an instrument through the communication interface, and to operate or provide parameters (e.g., temperatures, incubation times, number of wash cycles) to the instrument to perform biomolecule corona analysis (e.g., perform biological sample-particle incubation, wash, digestion, and solid-phase extraction). For example, upon input of a sample and reagents into an automated instrument for biomolecule corona analysis, the computer may prompt a user for information regarding the sample or intended assay, such as sample type or intended depth of sample coverage, and then execute a biomolecule corona analysis method based on the information provided by the user (e.g., in some cases, the length of particle-biological sample incubation times may affect the number of protein groups identified in an assay).
The computer may comprise one or more processors, as well as computer readable medium which may be executed by the one or more processors to communicate with an instrument configured to analyze a sample which has been subjected to biomolecule corona analysis through the communication interface, and to operate or provide parameters to the instrument, as well as computer readable medium which may be executed by the one or more processors to operate an instrument configured to perform biomolecule corona analysis. For example, the computer may provide parameters to a mass spectrometer for analysis of a protease digested biomolecule corona.
The method of determining a set of proteins associated with the disease or disorder and/or disease state includes the analysis of the coronas of the at least two samples. This determination, analysis or statistical classification can be performed using methods including, but not limited to, supervised and unsupervised data analysis, machine learning, deep learning, and clustering approaches including hierarchical cluster analysis (HCA), principal component analysis (PCA), Partial least squares Discriminant Analysis (PLS-DA), random forest, logistic regression, decision trees, support vector machine (SVM), k-nearest neighbors, naive Bayes, linear regression, polynomial regression, SVM for regression, K-means clustering, and hidden Markov models, among others. In other words, the proteins in the corona of each sample can be compared/analyzed with each other to determine with statistical significance what patterns are common between the individual coronas, and thereby to determine a set of proteins that is associated with the disease or disorder or disease state.
Generally, machine learning algorithms are used to construct models that accurately assign class labels to datasets or features within datasets based on a set of input features. In some cases it may be advantageous to employ machine learning and/or deep learning approaches for the methods described herein. For example, machine learning can be used to associate the protein corona with various disease states (e.g., no disease, precursor to a disease, having early or late stage of the disease, etc.). For example, in some cases, one or more machine learning algorithms are employed in connection with a method of the invention to analyze data obtained from the protein corona and sets of proteins derived therefrom. For example, a machine learning algorithm may be trained to distinguish subjects with Alzheimer's disease from healthy subjects.
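As a minimal sketch of assigning a class label from input features, the following uses k-nearest neighbors, one of the approaches listed above. It is illustrative only: the two-feature vectors are invented placeholder values rather than measured corona data, and the function name and labels are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Return the majority label among the k training rows closest to query
    (Euclidean distance)."""
    distances = sorted(
        (math.dist(row, query), label)
        for row, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Placeholder feature vectors (e.g., two normalized peptide intensities).
# These numbers are invented for illustration and are not study data.
toy_features = [(1.0, 0.2), (1.1, 0.3), (0.9, 0.25), (2.0, 1.1), (2.2, 0.9), (1.9, 1.0)]
toy_labels = ["healthy", "healthy", "healthy", "AD", "AD", "AD"]
```

A call such as `knn_predict(toy_features, toy_labels, (2.1, 1.0))` returns the label of the nearest cluster; real classifiers of this kind would be trained on many more features and validated on held-out subjects.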
A method or system (e.g., a computer-implemented system) may utilize biomolecule corona data for classifier training and as an input on which a trained classifier may perform analysis. The biomolecule corona data may comprise raw data (e.g., data acquired directly from an instrument such as a mass spectrometer, or data which has been subjected to basic pre-processing and filtering steps, such as baseline flattening), processed data (e.g., a list of mass spectrometry peaks identified above a baseline signal-to-noise threshold, a ratio of two mass spectrometry peak intensities), annotated data (e.g., a list of peptides identified from mass spectrometric data), or any combination thereof. As the present disclosure provides methods for identifying biomolecules spanning broad dynamic ranges, biomolecule corona data used for training or biological sample analysis may span about 2 to about 12, about 4 to about 12, about 5 to about 12, about 6 to about 12, about 7 to about 12, about 8 to about 12, about 4 to about 10, about 5 to about 10, about 6 to about 10, about 7 to about 10, about 8 to about 10, about 2 to about 8, about 4 to about 8, about 6 to about 8, about 2 to about 6, about 4 to about 6, about 2 to about 4, or about 2 to about 3 orders of magnitude in terms of biomolecule concentration in the biological sample. For example, the top 20 particle-specific protein biomarkers from the Random Forest model summarized in
Aspects of the present disclosure increase the amount of information derived from biological sample analysis. Some biological states are not distinguishable solely through biomolecule identification. For example, identifying concentrations for the thirty most abundant proteins in a plasma sample is often insufficient for distinguishing subjects afflicted with Alzheimer's disease from healthy subjects. The present disclosure provides a range of approaches for increasing the dimensionality of biological sample data, and for using the data to identify biological states. In some cases, biomolecule corona data may comprise a ratio of two or more biomolecule abundances or signal intensities. For example, a datapoint may be a ratio of three mass spectrometric peak intensities, which may have greater diagnostic utility than the three peak intensities taken individually.
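One simple way to derive ratio features of this kind is sketched below. It is an assumption-laden illustration, not the disclosed method: it forms every pairwise ratio of a set of peak intensities, and the peak names and values are hypothetical.

```python
from itertools import combinations


def pairwise_ratio_features(intensities):
    """Given a dict mapping peak name -> intensity, return a dict mapping
    'a/b' -> intensity ratio for every unordered pair of peaks.

    Pairs whose denominator is zero are skipped rather than guessed at.
    """
    features = {}
    for a, b in combinations(sorted(intensities), 2):
        if intensities[b] > 0:
            features[f"{a}/{b}"] = intensities[a] / intensities[b]
    return features
```

For n peaks this yields up to n(n-1)/2 additional features, which is one sense in which ratios increase the dimensionality of the data.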
In some cases, biomolecule corona data comprises particle-level annotations which identify the type of particle a biomolecule was identified on, and further may optionally comprise an abundance of or a signal intensity associated with the biomolecule. For example, in some cases, alpha-2-antiplasmin plasma levels may be weakly diagnostic for Alzheimer's disease, but alpha-2-antiplasmin abundance in biomolecule coronas of a (PDMAPMA)-coated SPION contacted to plasma may vary with a high degree of statistical significance between healthy and Alzheimer's disease samples. In some cases, biomolecule corona data comprises particle-level annotations which identify the type of particle a peptide was identified on. In some cases, a plurality of peptides from a single protein are identified on a single particle. In some cases, biomolecule corona data comprises an abundance ratio of two peptides associated with a single protein on two different particles. In some cases, biomolecule corona data comprises sample condition annotations which identify a condition under which the biomolecule was observed. For example, a datapoint may comprise an abundance of a peptide identified from a biological sample, a particle type on which the peptide was identified, and the osmolarity and pH of the sample.
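A datapoint carrying particle-level and sample-condition annotations might be represented as in the following sketch. The record layout, field names, and units are assumptions made for illustration; the disclosure does not prescribe a data structure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CoronaDatapoint:
    """One annotated observation (hypothetical schema)."""
    peptide: str
    protein: str
    particle_type: str
    abundance: float
    osmolarity_mosm: Optional[float] = None  # sample condition, if recorded
    ph: Optional[float] = None               # sample condition, if recorded


def cross_particle_ratio(points, protein, peptide_a, particle_a, peptide_b, particle_b):
    """Abundance ratio of two peptides of one protein observed on two particles."""
    def find(peptide, particle):
        for p in points:
            if (p.protein, p.peptide, p.particle_type) == (protein, peptide, particle):
                return p.abundance
        raise KeyError((protein, peptide, particle))

    return find(peptide_a, particle_a) / find(peptide_b, particle_b)
```

Keeping the particle type and conditions on each record lets the same peptide contribute distinct features depending on where and how it was observed.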
The present disclosure also identifies a number of proteins which can be diagnostic for neurological diseases. In some cases, a trained classifier utilizes a protein, a peptide fragment of a protein, or a signal associated with a protein in any one of TABLES 7-12. In some cases, a trained classifier utilizes at least two proteins (or associated peptides or signals) from any one of TABLES 7-12. In some cases, a trained classifier utilizes at least three proteins (or associated peptides or signals) from any one of TABLES 7-12. In some cases, a trained classifier utilizes at least four proteins (or associated peptides or signals) from any one of TABLES 7-12. In some cases, a trained classifier utilizes at least five proteins (or associated peptides or signals) from any one of TABLES 7-12. In some cases, a trained classifier utilizes about 2 to about 10, about 4 to about 10, about 5 to about 15, about 5 to about 20, about 8 to about 20, about 10 to about 25, or about 15 to about 30 proteins (or associated peptides or signals) from any one of TABLES 7-12. In some cases, a protein (or associated peptide or signal) is annotated with a particle type or condition used for its detection.
Unless otherwise defined, all technical terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. As used in this specification and the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise. Any reference to “or” herein is intended to encompass “and/or” unless otherwise stated.
Whenever the term “at least,” “greater than,” or “greater than or equal to” precedes the first numerical value in a series of two or more numerical values, the term “at least,” “greater than” or “greater than or equal to” applies to each of the numerical values in that series of numerical values. For example, greater than or equal to 1, 2, or 3 is equivalent to greater than or equal to 1, greater than or equal to 2, or greater than or equal to 3.
Whenever the term “no more than,” “less than,” “less than or equal to,” or “at most” precedes the first numerical value in a series of two or more numerical values, the term “no more than,” “less than” or “less than or equal to,” or “at most” applies to each of the numerical values in that series of numerical values. For example, less than or equal to 3, 2, or 1 is equivalent to less than or equal to 3, less than or equal to 2, or less than or equal to 1.
Where values are described as ranges, it will be understood that such disclosure includes the disclosure of all possible sub-ranges within such ranges, as well as specific numerical values that fall within such ranges irrespective of whether a specific numerical value or specific sub-range is expressly stated.
The following examples are illustrative and non-limiting to the scope of the compositions, devices, systems, kits, and methods described herein.
This example covers plasma biomarker identification for Alzheimer's disease (AD) and mild cognitive impairment (MCI). While Alzheimer's disease and mild cognitive impairment can affect homeostasis, expression, and morphology of nervous tissues, profiling these tissues is often intensive, expensive, and can impart permanent damage. The identification of clinically useful biomarkers for Alzheimer's disease and mild cognitive impairment from blood has thus been a long-standing goal. This example covers a particle-based assay for deep plasma proteomic profiling and candidate protein biomarker analysis for Alzheimer's disease and mild cognitive impairment. 200 subject plasma samples, comprising 50 Alzheimer's disease, 50 mild cognitive impairment, and 100 Controls were profiled with two separate 5-particle panels, summarized in TABLE 6 below. Using the 10-particle panel and 85 μL of plasma per nanoparticle, proteins were quantified by data-independent acquisition (DIA) liquid-chromatography mass-spectrometry (LC-MS) over about 6 weeks. Normalized peptide intensities were used in ten rounds of 10-fold cross-validation to develop random forest models for class discrimination.
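The ten rounds of 10-fold cross-validation can be sketched as an index-splitting routine. This is a minimal illustration that assumes the folds were stratified by class, which the text does not state; the random forest training itself is omitted, and the function name and seed are hypothetical.

```python
import random
from collections import defaultdict


def repeated_stratified_folds(labels, n_folds=10, n_rounds=10, seed=0):
    """Yield (round, fold, train_indices, test_indices) tuples.

    Within each round, every sample appears in exactly one test fold, and
    each class is spread as evenly as possible across folds (an assumption).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    for rnd in range(n_rounds):
        folds = [[] for _ in range(n_folds)]
        for members in by_class.values():
            shuffled = members[:]
            rng.shuffle(shuffled)
            for pos, idx in enumerate(shuffled):
                folds[pos % n_folds].append(idx)
        for fold_number, test_idx in enumerate(folds):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(labels)) if i not in held_out]
            yield rnd, fold_number, train_idx, sorted(test_idx)


# Class sizes taken from the study description above.
study_labels = ["AD"] * 50 + ["MCI"] * 50 + ["Control"] * 100
```

With these class sizes, each of the 100 resulting splits holds out 20 subjects and trains on the remaining 180.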
The data from all 200 subjects (comprising approximately 2,000 nanoparticle corona preparations and MS data acquisition runs) were collected over a period of approximately one month using the 10 particle panel outlined in TABLE 6 for sample processing. A total of 2,617 proteins were detected by the 10 particle panel, with 2,232 proteins present in at least 25% of the samples. Forty proteins with the highest possible Alzheimer's OpenTargets scores were part of this list, including Amyloid beta, ApoE and Clusterin. Median protein counts per nanoparticle ranged from 747 to 1,209. A total of 26,264 peptides were detected, with 16,323 peptides present in at least 25% of the samples. Median peptide counts per nanoparticle ranged from 5,273 to 8,785.
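The presence criterion used above (detected in at least 25% of the samples) can be expressed as a short filter. The input layout, a mapping from each protein to the set of samples in which it was detected, is an assumption for illustration.

```python
def filter_by_presence(detections, n_samples, min_fraction=0.25):
    """Return the sorted names of proteins detected in at least
    min_fraction of n_samples.

    detections: dict mapping protein name -> set of sample identifiers.
    """
    threshold = min_fraction * n_samples
    return sorted(
        protein for protein, samples in detections.items()
        if len(samples) >= threshold
    )
```

The same function applies unchanged to peptides, since only the detection counts matter.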
Inclusion criteria for participation in the study included a Mini-Mental State Examination (see Folstein et al. “Mini-mental state”. A practical method for grading the cognitive state of patients for the clinician. J Psychiatr Res. 1975 November; 12(3):189-98.) score of between 14 and 28, age of at least 50, a magnetic resonance imaging (MRI) or computerized tomography (CT) scan within the past two years excluding other pathologies, and a Hachinski score of less than 4. General exclusion criteria included evidence of multi-infarct dementia, drug intoxication, thyroid disease, pernicious anemia, tertiary syphilis, chronic infections of the nervous system, normal pressure hydrocephalus, Huntington's disease, Creutzfeldt-Jakob disease and brain tumors, polypharmacy, or Korsakoff's syndrome as a cause of dementia.
Sample annotations, provided after blinded sample processing, were evaluated in order to understand the study design and any potential issues with respect to between-sample or between-group comparisons. Probable Alzheimer's disease classifications were ascribed to subjects meeting NINCDS-ADRDA criteria (McKhann G, Drachman D, Folstein M, Katzman R, Price D, Stadlan E M (1984). “Clinical diagnosis of Alzheimer's disease: report of the NINCDS-ADRDA Work Group under the auspices of Department of Health and Human Services Task Force on Alzheimer's Disease”. Neurology. 34 (7): 939-44.), including Mini-Mental State Examination scores of between 14 and 26, and exhibiting progressive deterioration of specific cognitive functions, impaired activities of daily living and altered patterns of behavior. Probable mild cognitive impairment classifications were ascribed to subjects determined to be memory compliant, not demented, and with preserved cognitive function; with abnormal memory function below education adjusted cutoff on Logical Memory II subscale from the Wechsler Memory Scale-Revised; and with Mini-Mental State Examination scores of between 22 and 28.
The reported gender status for each subject was also used to ascertain significant differences between the comparative groups. In
Using a Fisher test for proportionality comparisons, that observation is confirmed, with the gender proportions for the CONTROL-v-MCI as well as the CONTROL-v-DISEASED comparisons being significantly different (
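A Fisher exact test on a 2x2 table of group-by-gender counts can be computed as in the sketch below. It assumes the common two-sided convention of summing the probabilities of all tables no more likely than the observed one; the source does not state which variant was used, and no gender counts from the study are reproduced here.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows could be two groups (e.g., control and MCI) and columns the two
    reported genders. Margins are held fixed (hypergeometric model).
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability of a table whose top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so ties with the observed table are included.
    return sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
```

A small p-value indicates that the two groups differ in gender proportion by more than chance would readily explain, which is the kind of imbalance that matters when comparing groups downstream.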
Protocols for processing the samples are generally described in Blume et al. Nature Communications. 2020; 11(1):3662. Briefly, the 10 particles were separately provided in dry form, and reconstituted with deionized water to final total particle concentrations of 2.5-15 mg/ml. The 200 plasma samples were subjected to 5-fold buffer dilutions, mixed with the particle solutions, and then sealed and incubated at 37° C. for 1 hour with shaking at 300 rpm to promote biomolecule corona formation. After incubation, the plate was placed on top of a magnetic collection device for 5 minutes to draw down the particles. While still magnetically immobilized, the particles were subjected to a series of wash steps with 150 mM KCl and 0.05% CHAPS in a pH 7.4 Tris EDTA buffer to remove non-biomolecule corona bound biomolecules. Next, Lyse buffer was added to each sample and heated at 95° C. for 10 min with agitation at 1000 rpm. Trypsin was added to the samples for protein digestion. After 3 hours at 37° C. and 500 rpm shaking, the trypsin digestion was stopped by lowering sample pH. The particles were magnetically removed from the digested samples. The digested samples were then twice eluted from the filter cartridge and combined. The peptides were analyzed with data-dependent liquid chromatography-tandem mass spectrometry (LC-MS/MS).
The experiments performed for this example used a 16 sample-per-plate configuration. Each sample was interrogated with one of two 5-particle panels, each of which is summarized in TABLE 6. The number of control, MCI, and AD samples per plate, as well as the identities of the particle panels used for interrogation, are provided in
Plasma samples for the 200 subjects were processed without prior knowledge of their diagnostic status using a randomization schema. The intent was to distribute the subject samples from the three classes across the sample preparation plates to avoid any systematic processing bias. The 200 samples in this study were randomized by class across sufficient plates (n=14). One automated biomolecule corona sample preparation instrument and one mass spectrometer were able to process and collect data from all 200 samples in about 6 weeks.
Sample preparation with the particle panels yielded digested peptides in solution, which were quantified using ThermoFisher peptide quant kits prior to drying and subsequent resuspension before mass spectrometric analysis. At least in part due to the differing physicochemical properties of the particles, peptide yields varied across the 10 particle types (both in terms of total peptide yield and peptide types). Nonetheless, the yields for each particle were fairly consistent across samples. Since constant sample volumes were used for each assay, differences in peptide yield across samples were taken as diagnostic of differences in plasma protein concentrations.
As is shown in
Each processing plate included control samples for various stages of the assay. These included an overall process control which went through the full assay with one nanoparticle as well as a digestion control, an MPE control for the filtration device, and a mass spectrometry control which comprised pre-prepared peptides for mass spectrometric data acquisition evaluation. The layout of the assay plate used in this example and the context of the controls are shown in
To control for measurement stochasticity and inter-sample variations not reflective of biological state, the results were filtered to exclude protein groups not observed in at least 25% of samples within the study.
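The detection-rate filter described above can be sketched as follows. This is a minimal illustration only, assuming intensities are held in a nested dictionary keyed by protein group and then by sample, with None marking a protein group that was not observed in a sample. The function name and data layout are hypothetical and are not taken from the study.

```python
from typing import Dict, Optional

# Hypothetical layout: protein group -> sample id -> intensity (None if not observed).
Intensities = Dict[str, Dict[str, Optional[float]]]


def filter_by_detection_rate(
    intensities: Intensities, min_fraction: float = 0.25
) -> Intensities:
    """Keep protein groups observed in at least `min_fraction` of samples."""
    kept: Intensities = {}
    for protein_group, by_sample in intensities.items():
        n_samples = len(by_sample)
        n_detected = sum(1 for value in by_sample.values() if value is not None)
        # Guard against empty rows, then apply the detection threshold.
        if n_samples and n_detected / n_samples >= min_fraction:
            kept[protein_group] = by_sample
    return kept
```

The threshold is inclusive in this sketch, so a protein group seen in exactly 25% of samples is retained.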
As reproducible measurement is often a key requirement in proteomics profiling and biomarker studies, a reasonably robust and relatively simple normalization strategy was implemented. First, the protein log intensity data were median normalized using reference proteins defined as those present in all samples in the study for each given particle type. Then a scaling factor for each sample for each given particle was calculated so that the medians of the reference proteins (or peptides) for each sample were adjusted to the mean of the medians across all samples.
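One way to read this normalization, for a single particle type, is sketched below. The sketch assumes that intensities are already log-transformed, so the per-sample scaling factor becomes an additive shift in log space, and that missing values are stored as None. The function name and data layout are hypothetical.

```python
from statistics import mean, median
from typing import Dict, Optional

# Hypothetical layout for one particle type: sample id -> protein -> log intensity.
LogMatrix = Dict[str, Dict[str, Optional[float]]]


def median_normalize(log_data: LogMatrix) -> LogMatrix:
    """Shift each sample so its reference-protein median equals the mean of medians."""
    samples = list(log_data)
    # Reference proteins are those observed (non-None) in every sample.
    reference = set.intersection(
        *({p for p, v in log_data[s].items() if v is not None} for s in samples)
    )
    if not reference:
        raise ValueError("no protein is present in all samples")
    sample_medians = {s: median(log_data[s][p] for p in reference) for s in samples}
    target = mean(sample_medians.values())
    normalized: LogMatrix = {}
    for s in samples:
        shift = target - sample_medians[s]  # additive in log space
        normalized[s] = {
            p: (None if v is None else v + shift) for p, v in log_data[s].items()
        }
    return normalized
```

Non-reference proteins are shifted by the same per-sample amount, and missing values stay missing.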
Coverage of a high-value, annotated list of Alzheimer's disease candidate biomarkers was evaluated against the full list of 2,617 protein groups detected across the study's 200 samples. 673 unique protein entries were selected from OpenTargets (https://www.opentargets.org) gene and protein annotations with Alzheimer's scores equal to 1. These entries include proteins from all tissues, not limited to blood, and represent a superset of potential targets from which a subset might be accessible in plasma. 40 high-value Alzheimer's targets were identified by overlapping the proteins detected in this study with the 673 protein entries from OpenTargets. Those proteins, and the fraction of the 200 samples in which those proteins were detected (column titled “Detected”), are shown in TABLE 7 below.
The particle assay profiled deep into the plasma dynamic range. Particle range compression enabled quantification of proteins spanning more than 8 orders of magnitude in concentration in the plasma samples.
Multiple peptide identifications per protein group generated rich datasets for proteomics and multifold validation for protein group assignments.
As a first assessment of the potential to discriminate between sample types (i.e., control, MCI, AD) using the peptide data, an initial univariate analysis was performed. The peptide data were median normalized as described above and filtered to include only those peptides present in at least 50% of the samples of at least one class, and a non-parametric Wilcoxon test was then performed on a feature-by-feature basis. As above, a feature in this context is a particle-peptide intersection, meaning that more than one particle may provide unique intensity values for the same identified peptide sequence.
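For illustration, a two-sample Wilcoxon rank-sum (Mann-Whitney) comparison for a single feature can be sketched as below. The sketch uses a normal approximation with tie correction and no continuity correction, which is an assumption made for brevity; the exact variant and software used in the study are not stated here, and statistical packages may return slightly different p-values for small groups.

```python
import math
from typing import Sequence, Tuple


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U statistic for x, two-sided p-value by normal approximation)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one observation")
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y], key=lambda t: t[0])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:  # all values tied: no evidence of a difference
        return u_x, 1.0
    z = (u_x - mu) / math.sqrt(var)
    return u_x, math.erfc(abs(z) / math.sqrt(2))
```

In a feature-by-feature analysis, this function would be called once per particle-peptide feature, with x and y holding the normalized intensities of the two classes being compared.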
Four sample group comparisons were performed: CONTROL v AD, CONTROL v MCI, AD v MCI, and CONTROL v DISEASED, where DISEASED is defined as the combination of the 50 AD and 50 MCI samples. Multiple testing correction (Benjamini-Hochberg 5% FDR) was performed using all of the features from the ten nanoparticles.
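The Benjamini-Hochberg adjustment can be sketched as follows. This is a generic implementation of the standard step-up procedure, not code from the study; the function name is hypothetical, and the input is assumed to be the pooled list of per-feature p-values from all ten nanoparticles for one comparison.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the adjustment.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_at_fdr(p_values: Sequence[float], fdr: float = 0.05) -> List[bool]:
    """Flag features whose adjusted p-value is at or below the FDR threshold."""
    return [q <= fdr for q in benjamini_hochberg(p_values)]
```

Because the correction depends on the total number of tests, pooling features across all particles gives a stricter threshold than correcting each particle separately.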
The peptide feature data summarized in
A total of 825 different protein groups were derived from the AD and MCI models. Of these protein groups, 151 were unique to AD, 222 were unique to MCI, and 452 were common to both sets. Given both the biological overlap in diagnosis of AD and MCI that might exist in these samples as well as the potential sample collection stratification factors highlighted above, this degree of overlap as well as the overall number of protein groups that overlap may not be unexpected. Nonetheless, the large numbers of protein groups unique to AD and MCI show that the particle panel interrogation of the present example is capable of distinguishing AD and MCI.
Given the overlap between the AD and MCI peptides outlined above, the identified protein groups were analyzed against previous annotations for Alzheimer's utility as annotated in the OpenTargets database.
The studies described in this example provide particle profiling data, as well as analyses of these data with respect to classification of AD and MCI diagnostic groups as compared to age- and gender-matched controls. The particle panel platforms detected 2,232 protein groups (present in at least 25% of the 200 samples) and 16,323 unique peptides (also present in at least 25% of the samples). Univariate analysis of the pair-wise comparisons of the study classes using the peptide-level data revealed a large number of protein groups with significantly different measured intensities. After multiple testing correction, 603, 674, and 930 protein groups were significantly different in the Control versus AD, Control versus MCI, and Control versus Diseased comparisons, respectively, with an overlap of 452 protein groups between the AD and MCI lists. The possibility of statistically significant subject sample blocking factors (i.e., age, gender, site, and time of collection) was reviewed, but the magnitude of the observed effects does not appear to be meaningful. However, there were no protein groups that achieved significant difference after multiple testing correction in the univariate MCI versus AD comparison.
This example demonstrates the potential for developing models based on biomolecule corona and mass spectrometric analysis, and outlines Random Forest (RF)-based models which use multiple rounds of cross-validation and accurately predict biological state. Peptide features (e.g., a specific peptide observed on a particle type) from Example 1 were used as the unit of data for training and development of a classifier to distinguish Alzheimer's disease (AD), mild cognitive impairment (MCI), and healthy (control) samples. As a feature is defined as a unique particle-peptide pair, the same peptide from a protein could be present on different particles and count as distinct inputs. Accordingly, the number of features significantly exceeds the number of peptides for any sample.
To prepare the data for classifier training, the data were median normalized using reference peptides as outlined in Example 1. The data were then filtered, such that only peptide features (i.e., nanoparticle-peptide pairs) that were present in at least 25% of the 200 samples were used for the classification analyses. After filtering, missing values were imputed by replacement with the lowest measured value for any feature from the given sample-particle combination. Although all missing peptide features from a given sample-particle combination therefore receive a common replacement value, Random Forest classification models are non-parametric, so this monotonic replacement is unlikely to affect model performance.
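The imputation step can be sketched as follows, assuming the data are organized per sample-particle combination with None marking a missing peptide feature. The function name and data layout are hypothetical and shown for illustration only.

```python
from typing import Dict, Optional, Tuple

# Hypothetical layout: (sample id, particle id) -> peptide -> intensity or None.
Key = Tuple[str, str]
FeatureTable = Dict[Key, Dict[str, Optional[float]]]


def impute_lowest_observed(table: FeatureTable) -> FeatureTable:
    """Replace missing values with the lowest value measured for that sample-particle pair."""
    imputed: FeatureTable = {}
    for key, features in table.items():
        observed = [v for v in features.values() if v is not None]
        if not observed:
            # Nothing was measured for this combination, so there is no floor to use.
            imputed[key] = dict(features)
            continue
        floor = min(observed)
        imputed[key] = {
            pep: (floor if v is None else v) for pep, v in features.items()
        }
    return imputed
```

The input table is left unmodified; a new table is returned with the gaps filled.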
After the data were prepared, ten rounds of 10-fold cross validation were performed for each class comparison, using a sparse tuning grid consisting of three different values for tree-node evaluation, namely
While this does not represent an exhaustive tuning of the classification modeling process, it is useful for overall appraisal for model potential.
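The repeated cross-validation scheme described above can be sketched as a generator of train and test index splits. Stratifying the folds by class is an assumption of this sketch and is not stated in the text; the function name, the seed parameter, and the class labels are likewise hypothetical. Any Random Forest implementation could be trained on the `train` indices and scored on the `test` indices of each split.

```python
import random
from collections import defaultdict
from typing import Hashable, Iterator, List, Sequence, Tuple


def repeated_stratified_kfold(
    labels: Sequence[Hashable], n_splits: int = 10, n_repeats: int = 10, seed: int = 0
) -> Iterator[Tuple[int, List[int], List[int]]]:
    """Yield (repeat, train_indices, test_indices) for repeated stratified k-fold CV."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    for repeat in range(n_repeats):
        folds: List[List[int]] = [[] for _ in range(n_splits)]
        # Deal each class out round-robin so every fold keeps the class balance.
        for indices in by_class.values():
            shuffled = indices[:]
            rng.shuffle(shuffled)
            for pos, idx in enumerate(shuffled):
                folds[pos % n_splits].append(idx)
        for k in range(n_splits):
            test = sorted(folds[k])
            train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
            yield repeat, train, test
```

With 100 control and 50 AD labels, ten rounds of 10-fold splitting produce 100 splits, each holding out 10 control and 5 AD samples.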
It is also worth noting that the large number of peptide features generated with the panel of 10 particles prevented all of the data from being used at one time for evaluation. There are many feature selection and reduction strategies that can be employed to reduce the dimensionality of classification problems (e.g., PCA transformations), but once again, Random Forests are relatively robust to correlated data for an initial approximation.
In the control versus AD classification model, the 100 control samples and 50 AD samples were used for training and analysis. As is shown in
The 20 top features from the 5 particle Random Forest AD versus control sample classifier are provided in
The control versus mild cognitive impairment classification models were trained using data from the 100 control samples and 50 MCI samples from Example 1. As outlined in
The 20 top features from the 5 particle Random Forest AD versus control sample classifier are provided in
Mild Cognitive Impairment versus Alzheimer's Disease
From the 10-particle panel experiments in Example 1, there was considerable overlap between protein groups which exhibited significant differences for MCI and AD (of the 825 protein groups which exhibited significant differences for MCI and AD samples, 222 were specific for MCI, 151 for AD, and 452 common to AD and MCI). Given the considerable overlap between the MCI and AD protein groups, the ability to significantly discriminate MCI v AD by Random Forest classification was anticipated to be somewhat challenging.
As shown in
Given the overlap observed in the univariate analyses of the control versus AD and control versus MCI comparisons, the top peptide features of the Random Forest classifiers for each of these comparisons can also be compared. Models trained with data collected from 10 particle panels were compared, as shown in
Analogous to the univariate analysis described in Example 1, the majority of the top 20 features for each classifier have very low or no annotated AD OpenTargets score, suggesting either that these represent novel, previously unappreciated candidate markers for AD and MCI (the favorable interpretation) or that they represent markers related to potential subject sample stratification as described above. The considerable number of top 20 features, with both high and low scores, shared across particle types in each model comparison suggests a higher degree of confidence in the results (i.e., lack of overfitting), since each classifier is independently built with its own particle peptide data. That being said, the number of top features with high OpenTargets scores that are not shared across particle types indicates that interrogation with a panel of particles rather than any one particle may generate greater degrees of profiling depth, reproducibility, and biological insight.
Using the high-resolution data described in Example 1, peptide-level univariate and cross-validated classification analyses were performed on the sample diagnostic groups, yielding high-performance models, with AD and MCI nanoparticle classifiers exceeding 0.90 AUC. The net result was the identification of both pre-existing and novel differences between the groups, with the classifiers combining both for predictive performance. While the results from these specific analyses represent novel opportunities for clinical test development with respect to AD and MCI, the results and analyses also highlight the potential for the methods disclosed herein to be deployed in even larger studies in a practicable and affordable format, resolving one of the key barriers (e.g., small study sizes constrained by complex workflows) to improving protein candidate biomarker discovery.
Using the 200 samples in the respective pairwise sample group comparisons, cross-validated classifiers were constructed by Random Forest machine learning, and high-performing classification was obtained with all nanoparticles. For the AD and MCI classifications versus controls, all cross-validated ROC AUCs were greater than or equal to 0.90. For MCI versus AD, classification performance was weaker, with individual nanoparticle ROC AUCs ranging from 0.50 to 0.63. Inspection of the top 20 features in each Random Forest-based classification highlighted the identification of novel combinations of pre-existing and unknown candidate biomarker protein groups, with several instances of the identification of the same protein on different nanoparticles. Taken together, the results of this collaborative study highlight at least two considerations for AD and MCI analysis. First, the particle panel platform is a superior workflow for the collection and identification of proteomics profiling data in a rapid and broad fashion, enabling large-scale studies with enhanced ability to detect novel insights. Second, the specific results from the univariate and cross-validation analyses identify novel candidate markers, both with and without prior appreciation of utility in AD testing, and thus suggest potential for the use of the particle panel platform in biomarker discovery for both diagnostic and therapeutic research and development.
In total, more than 600 peptide features contributed to the classification models. The top 20 peptide features identified on each particle for each biological state comparison (control versus AD, control versus MCI, and AD versus MCI), along with the plasma protein groups from which they are derived, are summarized in TABLE 12.
While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. It is not intended that the invention be limited by the specific examples provided within the specification. While the invention has been described with reference to the aforementioned specification, the descriptions and illustrations of the embodiments herein are not meant to be construed in a limiting sense. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. Furthermore, it shall be understood that all aspects of the invention are not limited to the specific depictions, configurations or relative proportions set forth herein which depend upon a variety of conditions and variables. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is therefore contemplated that the invention shall also cover any such alternatives, modifications, variations or equivalents. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.
The present application claims the benefit of U.S. Provisional Application No. 63/109,806, filed Nov. 4, 2020; and U.S. Provisional Application No. 63/149,047, filed Feb. 12, 2021, each of which is incorporated herein by reference in its entirety.
Number | Date | Country
---|---|---
63109806 | Nov 2020 | US
63149047 | Feb 2021 | US