The present invention relates to a method for preparing a nucleic acid derived from a skin cell of a subject, and a method for analyzing a skin of a subject using the nucleic acid.
Techniques for examining current and even future internal physiological conditions of human living bodies by analyzing molecules in biological samples (e.g. nucleic acids, proteins and metabolic substances) have been developed. In particular, analysis using nucleic acid molecules has the advantage that an abundance of information can be obtained by one analysis because an exhaustive analysis method has been established, and that it is easy to functionally link analysis results on the basis of many study reports related to single-nucleotide polymorphisms, RNA functions and the like.
Among various tissues of living bodies, skins receive attention as tissues which contact the outside, and therefore enable collection of biological samples with low invasiveness. As a method for non-invasively collecting a nucleic acid from a skin and analyzing the nucleic acid, a method has been reported in which a human skin sample is collected by wiping the skin surface with a wetted cotton swab, and RNA profiling is performed (Non-Patent Literature 1). Patent Literature 1 indicates that a nucleic acid derived from a skin cell of a subject, such as RNA, is separated from a skin surface lipid, and used as a sample for analysis of a living body.
In an embodiment, the present invention provides a method for preparing a nucleic acid derived from a skin cell of a subject, the method comprising preserving, at 0° C. or lower, an RNA-containing skin surface lipid collected from the subject.
In another embodiment, the present invention provides a method for preparing a nucleic acid derived from a skin cell of a subject, the method comprising: converting RNA contained in a skin surface lipid of the subject into cDNA by reverse transcription, and then subjecting the cDNA to multiplex PCR; and purifying a reaction product of the PCR.
In another embodiment, the present invention provides a method for analyzing a condition of a skin, a part other than the skin or the whole body of a subject, the method comprising analyzing the nucleic acid prepared by the above-described method.
In another embodiment, the present invention provides a method for evaluating the effect or the efficacy of a skin external preparation, an intracutaneously administered preparation, a patch, an oral preparation or an injection on a subject, the method comprising analyzing the nucleic acid prepared by the above-described method.
In another embodiment, the present invention provides a method for analyzing a concentration of a component in the blood of a subject, the method comprising analyzing the nucleic acid prepared by the above-described method.
All the patent literatures, non-patent literatures and other publications cited in the present description are incorporated herein by reference in their entirety.
The names of the genes disclosed in the present description follow Official Symbol described in NCBI (www.ncbi.nlm.nih.gov/). On the other hand, with regard to the gene ontology (GO), the names of the genes follow Pathway ID described in String (string-db.org/).
The method of the present invention enables stable preservation of a nucleic acid sample derived from a skin cell and contained in a skin surface lipid of a subject. Therefore, the present invention improves the accuracy of analysis using the nucleic acid sample (e.g. gene analysis and diagnosis). Further, since the concentration of a specific marker gene-derived component contained in the skin cell-derived nucleic acid sample prepared by the method of the present invention correlates with the concentrations of various components present in the blood, use of the nucleic acid sample enables non-invasive measurement of the concentration of a component in the blood.
RNA has the property of being easily decomposed, and is therefore usually preserved under a particular low-temperature condition of −80° C. except when the RNA is specifically treated. When a sample having a reduced amount of RNA due to decomposition is used, the accuracy of analysis decreases. Even when RNA is converted into cDNA by reverse transcription reaction and preserved, the accuracy of analysis decreases because a sufficient amount of cDNA cannot be obtained if the original RNA is unstable.
Previously, the present inventors found that a lipid present on a skin surface (skin surface lipid) contains RNA derived from a skin cell of a subject, and that use of the RNA enables biological analysis, and the present inventors applied for a patent (Patent Literature 1). Here, the present inventors further found that RNA contained in the skin surface lipid can be preserved under a general low-temperature condition, that is, it can be stably preserved under a condition other than the conventional particular low-temperature condition of −80° C. Further, the present inventors found that by subjecting RNA separated from the skin surface lipid to reverse transcription reaction and PCR under predetermined conditions, and then purifying the reaction product, a sufficient amount of a nucleic acid sample for analysis can be obtained even from a skin surface lipid having a low RNA content.
Accordingly, in an aspect, the present invention provides a method for preparing a nucleic acid derived from a skin cell of a subject. In an embodiment, the method for preparing a nucleic acid derived from a skin cell of a subject according to the present invention comprises preserving at 0° C. or lower an RNA-containing skin surface lipid collected from a subject. In another embodiment, the method for preparing a nucleic acid derived from a skin cell of a subject according to the present invention comprises converting RNA contained in the skin surface lipid of a subject into cDNA by reverse transcription, then subjecting the cDNA to multiplex PCR, and purifying a reaction product of the PCR.
In the present description, the “skin surface lipid (SSL)” refers to a lipid-soluble fraction present on a skin surface, and is sometimes referred to as sebum. In general, SSL mainly contains secretions secreted from an exocrine gland such as a sebaceous gland on the skin, and is present on the skin surface in the form of a thin layer covering the skin surface.
In the present description, the “skin” is a generic term for regions including tissues of the surface skin, the dermis, the follicle, the sweat gland, the sebaceous gland and other glands of the body surface, unless otherwise specified.
Examples of the nucleic acid derived from a skin cell of a subject and prepared by the method of the present invention include, without limitation, DNA and RNA, and RNA or DNA prepared from the RNA is preferable. Examples of RNA include mRNA, tRNA, rRNA, small RNA (e.g. microRNA (miRNA), small interfering RNA (siRNA) and Piwi-interacting RNA (piRNA)) and long intergenic non-coding (linc) RNA. The mRNA is RNA encoding a protein, and often has a length of 1,000 nt or more. Each of the miRNA, the siRNA, the piRNA and the lincRNA is non-coding (nc) RNA which does not encode a protein. The miRNA is small RNA having a length of from 19 to 30 nt among ncRNAs. The lincRNA is long non-coding RNA having poly-A like mRNA, and has a length of 200 nt or more (Non-Patent Literature 1). More preferably, the RNA prepared in the method of the present invention is RNA having a length of 200 nt or more. Still more preferably, the RNA prepared in the method of the present invention is at least one selected from the group consisting of mRNA and lincRNA. Examples of the DNA prepared in the present invention include cDNA prepared from the aforementioned RNA, and reaction products (e.g. PCR products and clone DNA) from the cDNA.
The subject in the method of the present invention may be an organism having SSL on the skin. Examples of the subject include mammals including humans and non-human mammals, with humans being preferable. Preferably, the subject is a human or a non-human mammal needing or desiring analysis of its nucleic acid. Preferably, the subject is a human or a non-human mammal needing or desiring analysis of gene expression on the skin, or analysis of the condition of the skin or a part other than the skin using a nucleic acid.
SSL collected from a subject includes RNA expressed in a skin cell of the subject, preferably RNA expressed in any of the surface skin, the sebaceous gland, the follicle, the sweat gland and the dermis of the subject, more preferably RNA expressed in any of the surface skin, the sebaceous gland, the follicle and the sweat gland (see Patent Literature 1). Therefore, the RNA derived from a skin cell of a subject and prepared by the method of the present invention is preferably RNA derived from at least one part selected from the group consisting of the surface skin, the sebaceous gland, the follicle, the sweat gland and the dermis of the subject, more preferably RNA derived from at least one part selected from the group consisting of the surface skin, the sebaceous gland, the follicle and the sweat gland.
In an embodiment, the method of the present invention may further comprise collecting SSL from a subject. Examples of the part of the skin, from which SSL is collected, include, but are not limited to, skins of any part of the body such as the head, the face, the neck, the body trunk or the limb, skins having a disease such as atopy, acne, dryness, inflammation (redness) or a tumor, and skins having a wound. Preferably, the part of the skin from which SSL is collected does not include the palm, the back, the sole of the foot, or the finger skin.
For collection of SSL from the skin of a subject, any means used for collecting or removing SSL from the skin can be employed. Preferably, an SSL absorbing material, an SSL bonding material or a device for scraping off SSL from the skin as described below can be used. The SSL absorbing material or the SSL bonding material is not limited as long as it is a material having affinity for SSL, and examples thereof include polypropylene and pulp. Specific examples of the procedure for collecting SSL from the skin include a method in which SSL is absorbed into a sheet-shaped material such as an oil blotting paper or an oil blotting film; a method in which SSL is bonded to a glass plate, a tape or the like; and a method in which SSL is scraped off with a spatula or a scraper. An SSL absorbing material containing a solvent with high lipid solubility beforehand may be used for improving the SSL adsorption property. On the other hand, when the SSL absorbing material contains a solvent with high water solubility or moisture, adsorption of SSL is inhibited, and therefore the content of a solvent with high water solubility or moisture is preferably low. It is preferable that the SSL absorbing material be used in a dried state.
In an embodiment of the present invention, the collected RNA-containing SSL is preserved under a low-temperature condition of 0° C. or lower. It is preferable that the collected RNA-containing SSL be preserved under a predetermined low-temperature condition as soon as possible after the collection for suppressing decomposition of RNA as much as possible. The temperature condition for preservation of the RNA-containing SSL in the present invention may be 0° C. or lower, and is preferably from −20±20° C. to −80±20° C., more preferably from −20±10° C. to −80±10° C., still more preferably from −20±20° C. to −40±20° C., even more preferably from −20±10° C. to −40±10° C., even more preferably −20±10° C., even more preferably −20±5° C. This temperature condition is much milder than a conventional general RNA preservation condition (e.g. −80° C.). Therefore, preservation of the RNA-containing SSL in the present invention under a preferred low-temperature condition does not require use of special equipment such as an ultracold freezer or a dedicated preservation container, and can be performed by using a usual freezer or a freezing chamber of a refrigerator. The period of preservation of the RNA-containing SSL in the present invention under the low-temperature condition is preferably 12 months or less, for example 6 hours or more and 12 months or less, more preferably 6 months or less, for example 1 day or more and 6 months or less, still more preferably 3 months or less, for example 3 days or more and 3 months or less, without limitation.
For separation of RNA from the collected RNA-containing SSL, a method which is normally used for extraction or purification of RNA from a biological sample can be used, for example a phenol/chloroform method, an AGPC (acid guanidinium thiocyanate-phenol-chloroform extraction) method, a method using a column such as TRIzol (registered trademark), RNeasy (registered trademark) or QIAzol (registered trademark), a method using special magnetic particles coated with silica, a method using solid phase reversible immobilization magnetic particles, or extraction with a commercially available RNA extraction reagent such as ISOGEN.
In another embodiment of the present invention, RNA separated from the RNA-containing SSL (SSL-derived RNA) can be used as it is for various analyses. In a preferred embodiment, the SSL-derived RNA is converted into DNA. Preferably, the SSL-derived RNA is converted into cDNA by reverse transcription, the cDNA is then subjected to PCR, and the resulting reaction product is purified. For the reverse transcription, a primer targeting specific RNA to be analyzed may be used, but it is preferable to use a random primer for more comprehensive preservation and analysis. In the PCR, only specific DNA may be amplified using a primer pair targeting the specific DNA to be analyzed, or a plurality of DNAs may be amplified using a plurality of primer pairs. Preferably, the PCR is multiplex PCR, which is a method for simultaneously amplifying a plurality of gene regions by simultaneously using a plurality of primer pairs in the PCR reaction system. The multiplex PCR can be performed using a commercially available kit (e.g. Ion AmpliSeq Transcriptome Human Gene Expression Kit; Life Technologies Japan Ltd.).
For the reverse transcription of RNA, a common reverse transcriptase or reverse transcription reagent kit can be used. Preferably, a reverse transcriptase or reverse transcription reagent kit with high accuracy and efficiency is used, and examples thereof include M-MLV reverse transcriptase and modified products thereof, or commercially available reverse transcriptases or reverse transcription reagent kits, for example PrimeScript (registered trademark) Reverse Transcriptase series (Takara Bio Inc.) and SuperScript (registered trademark) Reverse Transcriptase series (Thermo Scientific). SuperScript (registered trademark) III Reverse Transcriptase and SuperScript (registered trademark) VILO cDNA Synthesis Kit (each from Thermo Scientific), etc. are preferably used.
By adjusting the reaction conditions for the reverse transcription and PCR, the yield of the PCR reaction product is further improved, and hence the accuracy of analysis using the PCR reaction product is further improved. In the elongation reaction in the reverse transcription, the temperature is preferably adjusted to 42° C.±1° C., more preferably 42° C.±0.5° C., still more preferably 42° C.±0.25° C., and the reaction time is preferably adjusted to 60 minutes or more, more preferably from 80 to 100 minutes. The temperature for the annealing and elongation reactions in the PCR is preferably 62° C.±1° C., more preferably 62° C.±0.5° C., still more preferably 62° C.±0.25° C. Because the same temperature is used for both reactions, it is preferable that in the PCR, the annealing and elongation reactions be carried out in one step. The time for the step of annealing and elongation reaction can be adjusted according to the size of DNA to be amplified, etc., and is preferably from 14 to 18 minutes. The condition for the denaturation reaction in the PCR can be adjusted according to DNA to be amplified, and is preferably from 10 to 60 seconds at 95 to 99° C. Reverse transcription and PCR with the above-described temperature and time can be carried out using a thermal cycler which is commonly used in PCR.
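The preferred reaction conditions described above can be written down as a simple check on a thermal cycler program. The following is a minimal sketch only: the parameter names (`rt_temp_c`, `anneal_extend_time_min` and so on) and the function `out_of_range` are hypothetical and are not part of any instrument software, and only the widest tolerance stated for each condition is encoded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Closed interval [low, high] for one reaction parameter."""
    low: float
    high: float

    def contains(self, value):
        return self.low <= value <= self.high


# Preferred windows from the description (widest stated tolerance).
# The dictionary keys are hypothetical names chosen for this sketch.
PREFERRED = {
    "rt_temp_c": Window(41.0, 43.0),               # 42 C +/- 1 C
    "rt_time_min": Window(60.0, float("inf")),     # 60 minutes or more
    "anneal_extend_temp_c": Window(61.0, 63.0),    # 62 C +/- 1 C, one step
    "anneal_extend_time_min": Window(14.0, 18.0),  # 14 to 18 minutes
    "denature_temp_c": Window(95.0, 99.0),         # 95 to 99 C
    "denature_time_s": Window(10.0, 60.0),         # 10 to 60 seconds
}


def out_of_range(program):
    """Return the sorted names of parameters that are missing from the
    program or fall outside the preferred windows."""
    return sorted(
        name
        for name, window in PREFERRED.items()
        if name not in program or not window.contains(program[name])
    )


program = {
    "rt_temp_c": 42.0,
    "rt_time_min": 90.0,
    "anneal_extend_temp_c": 62.0,
    "anneal_extend_time_min": 16.0,
    "denature_temp_c": 99.0,
    "denature_time_s": 15.0,
}
print(out_of_range(program))
```

An empty list indicates that every condition lies within the preferred range; the numerical values in `program` are illustrative and are not taken from the Examples.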
It is preferable that purification of the reaction product obtained by the PCR be performed by size separation of the reaction product. The size separation enables separation of a desired PCR reaction product from the primer and other impurities contained in the PCR reaction liquid. The size separation of DNA can be performed with, for example, a size separation column, a size separation chip, magnetic beads usable for size separation, or the like. Preferred examples of the magnetic beads usable for size separation include solid phase reversible immobilization (SPRI) magnetic beads such as Ampure XP. When Ampure XP is mixed with a DNA solution, DNA is adsorbed to carboxy groups coated on the surfaces of the magnetic beads, and only the magnetic beads are recovered with a magnet to purify the DNA. When the mixing ratio of the Ampure XP solution to the DNA solution is changed, the molecular size of DNA adsorbed to the magnetic beads changes. By utilizing this principle, DNA with a specific molecular size, which is to be captured, can be recovered on the magnetic beads, while DNA with other molecular sizes and impurities are removed.
The purified PCR reaction product may be subjected to further treatment necessary for performing subsequent analysis. For example, for sequencing or fragment analysis, an appropriate buffer solution may be prepared from the purified PCR reaction product, PCR primer regions contained in DNA subjected to PCR amplification may be cut, or an adaptor sequence may be further added to the amplified DNA. Libraries for various analyses can be prepared by, for example, preparing a buffer solution from the purified PCR reaction product, subjecting the amplified DNA to removal of the PCR primer sequence and adaptor ligation, and amplifying the resulting reaction product if necessary. These operations can be carried out by, for example, using 5X VILO RT Reaction Mix attached to SuperScript (registered trademark) VILO cDNA Synthesis Kit (Life Technologies Japan Ltd.), 5X Ion AmpliSeq HiFi Mix attached to Ion AmpliSeq Transcriptome Human Gene Expression Kit (Life Technologies Japan Ltd.), and Ion AmpliSeq Transcriptome Human Gene Expression Core Panel and following the protocols attached to the kits.
The SSL-derived RNA which is subjected to the reverse transcription and PCR may be RNA derived from RNA-containing SSL immediately after collection from a living body, or RNA derived from RNA-containing SSL preserved at room temperature or refrigerated after collection from the living body, and is preferably RNA derived from RNA-containing SSL preserved at 0° C. or lower after collection from the living body. The preservation at 0° C. or lower may be preservation at −80° C., and is preferably preservation at −20±10° C., more preferably preservation at −20±5° C. The SSL-derived RNA may be used for the reverse transcription or PCR immediately after being separated from SSL, or may be stored by a usual method until being used.
A nucleic acid derived from a skin cell of a subject and prepared from SSL-derived RNA by the method of the present invention can be used for various analyses or diagnoses using nucleic acids. Accordingly, the present invention also provides a method for analyzing a nucleic acid, the method comprising analyzing a nucleic acid prepared by the method for preparing a nucleic acid according to the present invention. Examples of analysis and diagnosis which can be performed using the nucleic acid prepared according to the present invention include:
(i) analysis of gene expression related to the skin of the subject, analysis of other gene information, analysis of functions related to the skin of the subject, which is based on the above-mentioned analyses, and the like;
(ii) analysis of a disease or a condition of the skin of the subject, for example evaluation of a health condition of the skin (skin condition such as sebum secretion, moisture content, redness, atopic dermatitis or sensitive skin), estimation of a current skin condition or prediction of a future skin condition, prediction of past histories of the skin such as a cumulative ultraviolet exposure time, diagnosis or prognosis of skin disease, diagnosis or prognosis of skin cancer, evaluation of subtle change of the skin, and the like;
(iii) evaluation of effects or efficacy of a skin external preparation, an intracutaneously administered preparation, a patch, an oral preparation or an injection with the utilization of the analysis of a disease or a condition of the skin of the subject;
(iv) analysis of a condition of a part other than the skin, or the whole body of the subject, for example evaluation of a general health condition or prediction of a future general health condition, diagnosis or prognosis of various diseases such as neural disease, cardiovascular disease, metabolic disease and cancer, and the like; and
(v) analysis of the concentration of a component in the blood of the subject.
More specific examples of analysis and diagnosis using the nucleic acid prepared according to the present invention are described below.
As disclosed in Patent Literature 1, SSL contains an abundance of high-molecular-weight RNA such as mRNA derived from the subject. SSL, which is a supply source of mRNA which can be non-invasively collected from the subject, is useful as a biological sample for analysis of gene expression. Further, the mRNA in SSL reflects gene expression profiles of the sebaceous gland, the follicle and the surface skin (see Examples 1 to 4 of Patent Literature 1). Therefore, the nucleic acid prepared according to the present invention is suitable as a biological sample for analysis of gene expression of the skin, particularly the sebaceous gland, the follicle and the surface skin.
The skin of the subject can be analyzed by using as a sample the nucleic acid prepared according to the present invention. Examples of the analysis of the skin include the analysis of gene expression and the analysis of a skin condition. Examples of the analysis of a skin condition include detection of a skin with a predetermined disease or condition or a skin without the predetermined disease or condition. Examples of the predetermined disease or condition include, but are not limited to, deficiency or excess in amount of sebum, deficiency or excess in skin moisture content, redness, atopic dermatitis, and sensitive skin. For example, analysis of the expression level of a marker gene for a predetermined disease or condition such as an amount of sebum, a skin moisture content, redness, atopic dermatitis or sensitive skin in the skin of the subject from the nucleic acid prepared according to the present invention enables determination of whether or not the skin of the subject has the predetermined disease or condition. Preferably, comparison of the expression level of a marker gene for a predetermined disease or condition, which is obtained for the subject, with the expression level of the marker gene in the nucleic acid prepared by the method of the present invention from SSL of a group with the predetermined disease or condition (positive control) or a group without the predetermined disease or condition (negative control) enables determination of whether or not the skin of the subject has the predetermined disease or condition. As the marker gene, a known skin condition-related marker gene can be used.
Another example of analysis of the skin is prediction of a skin condition, and examples of prediction of the skin condition include prediction of a skin physical property, prediction of visual or palpatory evaluation of the skin, and prediction of a sebum composition. Examples of the skin physical property include the horn cell layer moisture content, the transepidermal water loss (TEWL), the amount of sebum, the amount of melanin and the amount of erythema. Examples of the visual or palpatory evaluation of the skin include evaluation of a skin condition which is usually performed visually or on palpation by a professional evaluator. More specific examples of the visual evaluation include evaluation of the existence or non-existence or the degree of “cleanness”, “clearness”, “lightness”, “luster”, “flecks”, “conspicuous dark circles”, “yellowness”, “overall redness”, “textured wrinkles on the cheek”, “drooping corners of the mouth”, “scale”, “acne”, “conspicuous pores on the cheek”, “conspicuous pores on the nose” and the like, and examples of the palpatory evaluation include evaluation of the existence or non-existence or the degree of “rough feeling”, “moist feeling” and the like. Examples of the sebum composition include the amounts of components such as free fatty acid (FFA), wax ester (WE), cholesterol ester (ChE), squalene (SQ), squalene epoxide (SQepo), squalene oxide (SQOOH), diacylglycerol (DAG) and triacylglycerol (TAG).
As shown in Examples below, by correlational analysis of an RNA expression profile obtained from analysis of SSL-derived RNA and data for the measured values and evaluation values of various skin conditions linked to the RNA expression profile, genes closely related to various skin conditions can be selected and used for construction of a prediction model. Specific related genes used for prediction of a skin condition include genes shown in Table 8.
When a large number of gene data are analyzed as expression data of closely related genes used for construction of the prediction model, the prediction model may be constructed after the data are compressed by principal component analysis if necessary.
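The compression of expression data by principal component analysis can be illustrated as follows. This is a minimal sketch under stated assumptions: it estimates only the first principal axis by power iteration in plain Python, the function names (`center`, `first_component`, `project`) are hypothetical, and a practical analysis would normally retain several components using an established numerical library.

```python
import math


def center(rows):
    """Subtract the per-gene (column) mean from every sample (row)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def first_component(rows, iterations=200):
    """Estimate the first principal axis by power iteration on X^T X.

    The starting vector is arbitrary; if it happens to be orthogonal to
    the true axis, the iteration will not converge to that axis.
    """
    x = center(rows)
    n_genes = len(x[0])
    start = [float(j + 1) for j in range(n_genes)]
    norm = math.sqrt(sum(c * c for c in start))
    v = [c / norm for c in start]
    for _ in range(iterations):
        scores = [sum(a * b for a, b in zip(row, v)) for row in x]  # X v
        w = [sum(s * row[j] for s, row in zip(scores, x))           # X^T (X v)
             for j in range(n_genes)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break  # no variance along the current direction
        v = [c / norm for c in w]
    return v


def project(rows, axis):
    """Compress each sample to one score along the given axis."""
    return [sum(a * b for a, b in zip(row, axis)) for row in center(rows)]


# Illustrative data: four samples, two genes whose levels move together.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
axis = first_component(data)
scores = project(data, axis)
```

Here each sample is reduced from two expression values to a single score, and the scores rather than the raw gene data would then serve as input to the prediction model.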
As an algorithm in construction of the prediction model, a known algorithm such as one that is used for machine learning can be used. Examples of the machine learning algorithm include algorithms such as those of linear regression model (Linear model), Lasso regression (Lasso), random forest (Random Forest), neural network (Neural net), linear kernel support vector machine (SVM (linear)) and rbf kernel support vector machine (SVM (rbf)). Data for verification is input to constructed prediction models to calculate predicted values. A model giving the smallest root-mean-square-error (RMSE) of a difference between a predicted value and a measured value can be selected as an optimum model.
Another example of analysis of the skin is prediction of a cumulative ultraviolet exposure time of the skin. In general, the cumulative ultraviolet exposure time is calculated with the ultraviolet exposure time predicted on the basis of questionnaire surveys on the lifestyle habit and outdoor leisure activity. As shown in Examples below, by correlational analysis of an RNA expression profile obtained from analysis of SSL-derived RNA and calculated data of the cumulative ultraviolet exposure time linked to the RNA expression profile, genes closely related to the cumulative ultraviolet exposure time can be selected to construct a prediction model. The procedure for constructing the model is the same as described above.
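The selection of an optimum model by RMSE on verification data can be sketched as follows. The three candidate models here (a mean baseline, a one-variable least-squares line and a nearest-neighbour lookup) are deliberately simple stand-ins written in plain Python; they are assumptions of this sketch and are not the algorithms listed above, and all function names are hypothetical. Only the selection step mirrors the description.

```python
import math


def rmse(predicted, measured):
    """Root-mean-square error between predicted and measured values."""
    return math.sqrt(
        sum((p - m) ** 2 for p, m in zip(predicted, measured)) / len(measured)
    )


def fit_mean(xs, ys):
    """Baseline: always predict the mean of the training values."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """One-variable ordinary least squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)


def fit_nearest(xs, ys):
    """Predict the value of the closest training sample."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def select_model(candidates, train, verify):
    """Fit every candidate on the training data, score it on the
    verification data, and return the name with the smallest RMSE
    together with all scores."""
    xs, ys = train
    vx, vy = verify
    scores = {}
    for name, fit in candidates.items():
        model = fit(xs, ys)
        scores[name] = rmse([model(x) for x in vx], vy)
    return min(scores, key=scores.get), scores


candidates = {"mean": fit_mean, "line": fit_line, "nearest": fit_nearest}
train = ([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])  # illustrative values
verify = ([1.5, 2.5], [3.0, 5.0])
best, scores = select_model(candidates, train, verify)
```

With these illustrative values the linear candidate reproduces the verification data exactly and is therefore selected; with real expression data each candidate would take many genes as explanatory variables.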
Alternatively, the expression level of the nucleic acid prepared from SSL of a group with the predetermined disease or condition (positive control) or a group without the predetermined disease or condition (negative control) is analyzed. A gene for which there is a significant difference in expression level between both the groups can be used as a skin condition-related marker gene. Specifically, as the marker gene for atopic dermatitis, mention is made of one or more genes selected from a group of 1911 genes ((A) of Tables 7-1 to 7-24) whose expression is significantly lower in atopic dermatitis patients than in healthy persons in Test Example 6 below; and one or more genes selected from a group of 370 OR genes ((B) of Tables 7-1 to 7-11) whose expression is lower in atopic dermatitis patients than in healthy persons and a group of 368 OR genes ((C) of Tables 7-1 to 7-11) and a group of 284 OR genes ((D) of Tables 7-1 to 7-11) whose expression decreases in response to the severity of dermatitis, among olfactory receptors (ORs) contained in GO: 0050911 which is a biological process (BP) found to be closely related to atopic dermatitis. As the marker gene for sensitive skin, mention is made of one or more genes selected from a group of 693 genes ((E) of Tables 7-1 to 7-20) whose expression is significantly lower in a group with subjective symptoms of sensitive skin than in a group without subjective symptoms of sensitive skin in Test Example 7 below; and one or more genes selected from a group of 344 OR genes ((F) of Tables 7-1 to 7-10) whose expression is lower in a group with subjective symptoms than in a group without subjective symptoms, among olfactory receptors (ORs) contained in GO: 0050911 which is a biological process (BP) found to be closely related to sensitive skin. 
As the marker gene for redness, mention is made of one or more genes selected from a group of 703 genes ((G) of Tables 7-1 to 7-20) for which there is a significant difference in expression between a group with intense skin redness and a group with mild skin redness in Test Example 8 below. As the marker gene for the skin moisture content, mention is made of one or more genes selected from a group of 553 genes ((H) of Tables 7-1 to 7-16) for which there is a significant difference in expression between a group with a high horn cell layer moisture content and a group with a low horn cell layer moisture content in Test Example 8 below. As the marker gene for the amount of sebum, mention is made of one or more genes selected from a group of 594 genes ((I) of Tables 7-1 to 7-17) for which there is a significant difference in expression between a group with a large amount of sebum and a group with a small amount of sebum in Test Example 8 below.
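The selection of marker genes by comparing expression levels between two groups can be sketched as follows. The statistical test is not specified in this passage, so the sketch assumes Welch's t statistic with a threshold on its absolute value as a stand-in for a significance test; the gene names `GENE_A` and `GENE_B`, the expression values and the function names are all hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent groups.

    Each group needs at least two values for the sample variance.
    """
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0.0:
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / se


def candidate_markers(positive, negative, min_abs_t):
    """Return {gene: t} for genes measured in both groups whose |t| is
    at least min_abs_t. Inputs map gene name -> expression per subject."""
    stats = {
        gene: welch_t(positive[gene], negative[gene])
        for gene in positive
        if gene in negative
    }
    return {gene: t for gene, t in stats.items() if abs(t) >= min_abs_t}


# Illustrative values only: GENE_A is lower in the positive control group,
# GENE_B is essentially unchanged.
positive = {"GENE_A": [1.0, 1.2, 0.8, 1.0], "GENE_B": [5.0, 5.5, 4.5, 5.0]}
negative = {"GENE_A": [3.0, 3.2, 2.8, 3.0], "GENE_B": [5.1, 4.9, 5.4, 4.6]}
markers = candidate_markers(positive, negative, min_abs_t=3.0)
```

A negative statistic corresponds to lower expression in the positive control group, as for the groups of genes described above whose expression is lower in patients than in healthy persons.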
Further, on the basis of the analysis of a disease or a condition of the skin of the subject, the effect or efficacy of a given skin external preparation, an intracutaneously administered preparation, a patch, an oral preparation, an injection or the like on the subject can be evaluated. For example, by examining expression of a marker gene for a disease or a condition of the skin of the subject, the effect or efficacy of use of the skincare product on the skin of the subject can be evaluated. The marker for a disease or a condition of the skin, which is used for the evaluation, is, for example, one or more genes selected from the group consisting of BNIP3, CALML3, GAL, HSPA5, JUNB, KIF13B, KRT14, KRT17, KRT6A, OVOL1, PPIF, PRDM1, RBM3, RPLP1, RPS4X, SEPT9, SOAT1, SPNS2, UBB, VCP, WIPI2 and YPEL3.
The concentrations of various components present in the blood of the subject can be analyzed by using as a sample the nucleic acid prepared according to the present invention. As shown in Examples below, it was possible to predict the concentration of a component in the blood of the subject from the expression level of related marker gene-derived RNA in SSL-derived RNA of the subject by using a machine learning model constructed on the basis of the expression level of related marker gene-derived RNA in SSL-derived RNA and data of the concentrations of various components in the blood. Therefore, the concentrations of various components in the blood can be determined on the basis of the expression level of related marker gene-derived RNA in SSL-derived RNA. The machine learning model can be constructed in accordance with the procedure for constructing a prediction model for the skin condition. Examples of various components present in the blood, which are analyzed according to the present invention, include hormones, insulin, neutral fat, γ-GTP and LDL-cholesterol. Examples of the hormone in the blood include androgens such as testosterone, dihydrotestosterone, androstenedione and dehydroepiandrosterone, estrogens such as estrone and estradiol, progesterone and cortisol. Of these, testosterone or cortisol is preferable. The related marker gene-derived RNA in SSL-derived RNA which is used for determination of the concentrations of various components in the blood can be selected from the group consisting of RNAs whose expression level has a relatively high correlation with the concentration of a component in the blood. Preferably, the expression level of SSL-derived RNA and the concentration of a target component in the blood are measured on a population, a correlation of the expression level of each RNA with the concentration of the component in the blood is examined, and RNA having a relatively high correlation is selected.
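The selection of RNAs having a relatively high correlation with the blood concentration of a component can be sketched as follows. The sketch assumes the Pearson correlation coefficient as the measure of correlation, which is not specified in this passage; the RNA names `RNA_1` to `RNA_3`, the values and the function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def top_correlated(expression, blood_level, k):
    """Rank RNAs by absolute correlation with the blood concentration
    measured on the same subjects (same order) and keep the top k."""
    ranked = sorted(
        expression,
        key=lambda name: abs(pearson(expression[name], blood_level)),
        reverse=True,
    )
    return ranked[:k]


# Illustrative population of four subjects.
expression = {
    "RNA_1": [1.0, 2.0, 3.0, 4.0],
    "RNA_2": [4.0, 1.0, 3.0, 2.0],
    "RNA_3": [8.0, 6.0, 4.0, 2.0],
}
blood_level = [10.0, 20.0, 30.0, 40.0]
selected = top_correlated(expression, blood_level, k=2)
```

Ranking by the absolute value keeps both positively and negatively correlated RNAs; the selected RNAs would then serve as explanatory variables of the machine learning model.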
As an example of related marker gene-derived RNA in SSL-derived RNA which is used for determination of the concentration of each of, for example, testosterone, insulin, neutral fat, γ-GTP and LDL-cholesterol in the blood, mention is made of at least one selected from a group of RNAs derived from human genes shown below, preferably all of the RNAs.
(Neutral fat) CCDC9, C6orf106, CERK, HSD3B2, SUN2, FNDC4, GRAMD1C, DGAT2, ALPL, HOMER3, MTHFS, ADIPOR1, RBM3, EXOC8 and ARHGEF37;
A preferred procedure for determining the concentration of a component in the blood using SSL-derived RNA will be described below with determination of the blood testosterone concentration taken as an example. First, by machine learning in which data of the expression level of RNA of each of the 10 genes (SCARNA16, PRSS27, RDBP, PSMB10, SBNO1, EMC3, MAR9, C20orf112, C14orf2 and CCDC90B) having a high correlation with the blood testosterone concentration and contained in SSL-derived RNA obtained from a human population serves as an explanatory variable and data of the concentration of the blood testosterone obtained from the population serves as an objective variable, an optimum prediction model for predicting the blood testosterone concentration is constructed from the expression level of the RNA. On the other hand, SSL-derived RNA is collected from a human subject whose blood testosterone concentration is to be examined. On the basis of the constructed model, the predicted value of the blood testosterone concentration of the subject can be calculated from the data of the expression level of RNA of each of the 10 genes in the SSL-derived RNA of the subject.
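The procedure above (expression levels as explanatory variables, blood concentration as the objective variable) can be sketched as follows. The present description does not specify the type of model, so ordinary least-squares linear regression is used here purely as a stand-in; the data are hypothetical toy values with two placeholder RNAs instead of ten.

```python
def fit_linear(X, y):
    """Least-squares fit of y ~ b0 + b1*x1 + ... via the normal equations."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    p = len(rows[0])
    # Augmented matrix [A^T A | A^T y]
    M = []
    for i in range(p):
        lhs = [sum(r[i] * r[j] for r in rows) for j in range(p)]
        rhs = sum(r[i] * t for r, t in zip(rows, y))
        M.append(lhs + [rhs])
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        pivot = M[c][c]
        if abs(pivot) < 1e-12:
            raise ValueError("design matrix is singular")
        M[c] = [v / pivot for v in M[c]]
        for r in range(p):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][p] for i in range(p)]

def predict(coef, x):
    """Predicted concentration for one subject's expression vector x."""
    return coef[0] + sum(b * v for b, v in zip(coef[1:], x))

# Hypothetical learning data: rows are subjects, columns are marker RNA levels.
X_train = [[1, 0], [0, 1], [1, 1], [2, 1], [1, 2]]
y_train = [3.0, 4.0, 6.0, 8.0, 9.0]  # hypothetical blood concentrations

coef = fit_linear(X_train, y_train)
predicted = predict(coef, [2, 2])  # subject whose concentration is to be examined
print(coef, predicted)
```

With real data, the model type and its settings would be selected by validation on subjects not used for learning.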
It has been recently reported that about 63% of a group of RNAs whose expression changes in cancer cells are mRNA encoding proteins (Cancer Res. 2016, 76, 216-226). Therefore, by measuring the expression state of mRNA, a change in physiological condition of cells due to a disease such as cancer can be more exactly detected, so that it is possible to more accurately diagnose a physical condition. SSL contains an abundance of mRNA, and contains mRNA of SOD2 reported to be related to cancer (Physiol genomics, 2003, 16, 29-37; Cancer Res, 2001, 61, 6082-6088). Therefore, SSL is useful as a biological sample for diagnosis or prognosis of cancers such as skin cancer.
In recent years, it has been reported that expression of molecules in the skin varies in patients with diseases in tissues other than the skin, such as obesity, Alzheimer's disease, breast cancer and cardiac disease, and therefore “the skin can be a window to body's health” (Eur. J. Pharm. Sci. 2013, 50, 546-556). Thus, it may be possible to analyze a physiological condition at a part other than the skin or a general physiological condition in the subject by measuring the expression state of mRNA in SSL.
In recent years, the involvement of non-coding (nc) RNA such as miRNA and lincRNA in gene expression in cells has been given attention, and actively studied. Non-invasive or low-invasive methods for diagnosing cancer or the like using miRNA in the urine or serum have been heretofore developed (e.g. Proc. Natl. Acad. Sci. USA, 2008, 105, 10513-10518; Urol Oncol, 2010, 28, 655-661). ncRNA prepared from SSL, such as miRNA and lincRNA, can be used as a sample for the studies and diagnoses.
A nucleic acid marker for a disease or a condition can be screened or detected by using as a sample the nucleic acid prepared from SSL. In the present description, the nucleic acid marker for a disease or a condition is a nucleic acid serving as an index for determination of a given disease or condition or determination of a risk thereof. Preferably, the nucleic acid marker is an RNA marker, and the RNA is preferably mRNA, miRNA or lincRNA. Examples of the disease or condition targeted by the nucleic acid marker include, but are not limited to, various skin diseases (e.g. atopic dermatitis); skin health conditions (sensitive skin, photoaging, inflammation (redness), dryness, moisture content or oil content, skin tenseness and dullness); and cancers such as skin cancer and diseases in tissues other than the skin, such as obesity, Alzheimer's disease, breast cancer and cardiac disease, as described in the section “Pathological diagnosis”. Analysis of expression of a nucleic acid can be performed by known means such as analysis of RNA expression using real-time PCR, microarrays or a next-generation sequencer.
An example is a method for selecting a nucleic acid marker for a disease or a condition. In the method, a population with a predetermined disease or condition or a risk thereof is taken as a subject, and a nucleic acid derived from a skin cell of the subject is prepared by the method for preparing a nucleic acid according to the present invention. The expression (e.g. expression level) of the nucleic acid prepared from the population is compared to the expression of a control. Examples of the control include a population without the predetermined disease or condition or a risk thereof, and associated data. A nucleic acid whose expression is different from that of the control can be selected as a marker for the predetermined disease or condition or a candidate thereof. Examples of the nucleic acid marker or candidate selected in this manner include marker genes described in Tables 7-1 to 7-24.
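The comparison with a control described above can be sketched as follows. This is an illustrative screen only: the log2 fold-change criterion, the threshold, the pseudocount and the placeholder names (`GENE_X` etc.) are assumptions of the sketch and are not taken from the present description, and a real selection would normally add a statistical test.

```python
from math import log2
from statistics import mean

def select_differential(case, control, min_abs_log2fc=1.0, pseudocount=1.0):
    """Return {gene: log2 fold change} for genes whose mean expression in the
    case population differs from the control by at least the threshold."""
    selected = {}
    for gene in case.keys() & control.keys():
        ratio = (mean(case[gene]) + pseudocount) / (mean(control[gene]) + pseudocount)
        fc = log2(ratio)
        if abs(fc) >= min_abs_log2fc:
            selected[gene] = fc
    return selected

# Hypothetical expression levels in a population with the condition ...
case = {"GENE_X": [7.0, 7.0, 7.0], "GENE_Y": [3.0, 3.0], "GENE_Z": [0.0, 0.0]}
# ... and in a control population without it.
control = {"GENE_X": [1.0, 1.0, 1.0], "GENE_Y": [3.0, 3.0], "GENE_Z": [3.0, 3.0]}

candidates = select_differential(case, control)
print(sorted(candidates.items()))
```

Genes returned by such a screen are candidates only; they would still need confirmation on an independent population.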
Another example is a method for detecting a nucleic acid marker for a disease or a condition, or a method for determining a disease or a condition on the basis of the detection of the marker, or determining a risk thereof. In the method, from a subject desiring or needing determination of a predetermined disease or condition or a risk thereof, a nucleic acid derived from a skin cell of the subject is prepared by the method for preparing a nucleic acid according to the present invention. Subsequently, a nucleic acid marker for the predetermined disease or condition is detected from the prepared nucleic acid. The disease or condition of the subject or a risk thereof is determined on the basis of existence or non-existence or the expression level of the nucleic acid marker.
Analysis of the nucleic acid prepared according to the present invention can be performed by a usual method used for analysis of nucleic acids, such as real-time PCR, RT-PCR, microarrays, sequencing and chromatography. The method for analyzing a nucleic acid according to the present invention is not limited thereto.
The present description discloses the following substances, production methods, uses, methods and the like as illustrative embodiments of the present invention. The present invention is not limited to these embodiments.
[1] A method for preparing a nucleic acid derived from a skin cell of a subject, the method containing preserving at 0° C. or lower an RNA-containing skin surface lipid collected from the subject.
[2] The method according to [1], wherein the temperature for the preservation is preferably from −20±20° C. to −80±20° C., more preferably from −20±10° C. to −80±10° C., still more preferably from −20±20° C. to −40±20° C., even more preferably from −20±10° C. to −40±10° C., even more preferably −20±10° C., even more preferably −20±5° C.
[3] The method according to [1] or [2], wherein the period for the preservation is preferably 12 months or less, for example 6 hours or more and 12 months or less, more preferably 6 months or less, for example 1 day or more and 6 months or less, still more preferably 3 months or less, for example 3 days or more and 3 months or less.
[4] A method for preparing a nucleic acid derived from a skin cell of a subject, the method containing: converting RNA which has been contained in a skin surface lipid of the subject into cDNA by reverse transcription, and then subjecting the cDNA to multiplex PCR; and purifying a reaction product of the PCR.
[5] The method according to [4], wherein a temperature for annealing and elongation reaction in the multiplex PCR is preferably 62° C.±1° C., more preferably 62° C.±0.5° C., still more preferably 62° C.±0.25° C.
[6] The method according to [4] or [5], wherein preferably, the elongation reaction in the reverse transcription is carried out under the following conditions:
42° C.±1° C. for 60 minutes or more;
42° C.±1° C. for from 80 to 100 minutes;
42° C.±0.5° C. for 60 minutes or more;
42° C.±0.5° C. for from 80 to 100 minutes;
42° C.±0.25° C. for 60 minutes or more; or
42° C.±0.25° C. for from 80 to 100 minutes.
[7] The method according to any one of [4] to [6], wherein preferably, the purification of the reaction product of the PCR is purification by size separation.
[8] The method according to any one of [4] to [7], wherein the RNA which has been contained in the skin surface lipid of the subject is prepared by separating the RNA from the skin surface lipid of the subject.
[9] The method according to any one of [4] to [8], wherein the skin surface lipid of the subject is one preserved at preferably 0° C. or lower, more preferably from −20±20° C. to −80±20° C., still more preferably from −20±10° C. to −80±10° C., even more preferably from −20±20° C. to −40±20° C., even more preferably from −20±10° C. to −40±10° C., even more preferably −20±10° C., even more preferably −20±5° C.
[10] The method according to [9], wherein the skin surface lipid of the subject is one preserved for preferably 12 months or less, for example 6 hours or more and 12 months or less, more preferably 6 months or less, for example 1 day or more and 6 months or less, still more preferably 3 months or less, for example 3 days or more and 3 months or less.
[11] A method for analyzing a condition of a skin, a part other than the skin, or the whole body in the subject, the method containing analyzing a nucleic acid prepared by the method according to any one of [1] to [10].
[12] The method according to [11], wherein the analysis is preferably analysis of a disease or a condition of the skin, more preferably detection of a skin with redness, sensitive skin or atopic dermatitis or a skin without redness, sensitive skin or atopic dermatitis; detection of a skin with a small or large amount of sebum or skin moisture content; estimation or prediction of a skin condition, for example prediction of a skin physical property, estimation or prediction of visual or palpatory evaluation of the skin, or estimation or prediction of the sebum composition; or estimation or prediction of the cumulative ultraviolet exposure time of the skin.
[13] The method according to [11], wherein the analysis is detection of a skin with atopic dermatitis or a skin without atopic dermatitis, and the nucleic acid is at least one selected from the group consisting of the genes described in (B), (C) and (D) of Tables 7-1 to 7-11, more preferably all of the genes;
the analysis is detection of a skin with mild or moderate atopic dermatitis or without atopic dermatitis, and the nucleic acid is at least one selected from the group consisting of the genes described in (C) and (D) of Tables 7-1 to 7-11, more preferably all of the genes;
the analysis is detection of a skin with sensitive skin or a skin without sensitive skin, and the nucleic acid is at least one selected from the group consisting of the genes described in (E) of Tables 7-1 to 7-20, more preferably all of the genes;
the analysis is detection of a skin with sensitive skin or a skin without sensitive skin, and the nucleic acid is at least one selected from the group consisting of the genes described in (F) of Tables 7-1 to 7-10, more preferably all of the genes;
the analysis is detection of a skin with redness or a skin without redness, and the nucleic acid is at least one selected from the group consisting of the genes described in (G) of Tables 7-1 to 7-20, more preferably all of the genes;
the analysis is detection of a skin with a large or small moisture content, and the nucleic acid is at least one selected from the group consisting of the genes described in (H) of Tables 7-1 to 7-16, more preferably all of the genes;
the analysis is detection of a skin with a large or small amount of sebum, and the nucleic acid is at least one selected from the group consisting of the genes described in (I) of Tables 7-1 to 7-17, more preferably all of the genes; or
the analysis is estimation or prediction of a skin physical property, estimation or prediction of visual or palpatory evaluation of the skin, or estimation or prediction of the sebum composition, and the nucleic acid is at least one selected from the group consisting of the genes described in Table 8, more preferably all of the genes.
[14] A method for evaluating an effect or efficacy of a skin external preparation, an intracutaneously administered preparation, a patch, an oral preparation or an injection on a subject, the method containing analyzing a nucleic acid prepared by the method according to any one of [1] to [10].
[15] The method according to [14], wherein the effect or efficacy of the skin external preparation, the intracutaneously administered preparation, the patch, the oral preparation or the injection on the subject is preferably an improving effect of a skincare product on a skin condition of the subject, and the nucleic acid is preferably at least one selected from the group consisting of BNIP3, CALML3, GAL, HSPA5, JUNB, KIF13B, KRT14, KRT17, KRT6A, OVOL1, PPIF, PRDM1, RBM3, RPLP1, RPS4X, SEPT9, SOAT1, SPNS2, UBB, VCP, WIPI2 and YPEL3, more preferably all of the genes.
[16] A method for analyzing a concentration of a component in the blood of a subject, the method containing analyzing a nucleic acid prepared by the method according to any one of [1] to [10].
[17] The method according to [16], wherein preferably, the component in the blood is a hormone, insulin, neutral fat, γ-GTP or LDL-cholesterol.
[18] The method according to [17], wherein the hormone is preferably testosterone, dihydrotestosterone, androstenedione, dehydroepiandrosterone, estrone, estradiol, progesterone or cortisol, more preferably testosterone or cortisol.
[19] The method according to [16], wherein the component in the blood is preferably testosterone, and the nucleic acid is preferably at least one selected from the group consisting of 10 RNAs derived from 10 genes consisting of SCARNA16, PRSS27, RDBP, PSMB10, SBNO1, EMC3, MAR9, C20orf112, C14orf2 and CCDC90B, more preferably the 10 RNAs.
[20] The method according to [16], wherein the component in the blood is preferably insulin, and the nucleic acid is preferably at least one selected from the group consisting of 10 RNAs derived from 10 genes consisting of EAPP, SDE2, LYAR, ZNF493, PSMB10, FAM71A, GPANK1, FGD4, MRPL43 and CMPK1, more preferably the 10 RNAs.
[21] The method according to [16], wherein the component in the blood is preferably neutral fat, and the nucleic acid is preferably at least one selected from the group consisting of 15 RNAs derived from 15 genes consisting of CCDC9, C6orf106, CERK, HSD3B2, SUN2, FNDC4, GRAMD1C, DGAT2, ALPL, HOMER3, MTHFS, ADIPOR1, RBM3, EXOC8 and ARHGEF37, more preferably the 15 RNAs.
[22] The method according to [16], wherein the component in the blood is preferably γ-GTP, and the nucleic acid is preferably at least one selected from the group consisting of 15 RNAs derived from 15 genes consisting of TMEM38A, BTN3A2, NAP1L2, ABCA2, ALPL, SECTM1, C17orf62, GNB2, R3HDM4, LRG1, SBNO2, CD14, MLLT1, NINJ2 and LIMD2, more preferably the 15 RNAs.
[23] The method according to [16], wherein the component in the blood is preferably LDL-cholesterol, and the nucleic acid is preferably at least one selected from the group consisting of 10 RNAs derived from 10 genes consisting of THTPA, LOC100506023, ZNF700, TAB3, PLEKHA1, ZNF845, FXC1, CUL4A, NDUFV1 and AMZ2, more preferably the 10 RNAs.
[24] A method for analyzing a concentration of a component in the blood of a subject, the method containing:
obtaining an expression level of RNA derived from a gene having a high correlation with the concentration of a component in the blood from the nucleic acid of a subject, which is prepared by the method according to any one of [1] to [10]; and
analyzing the concentration of the component in the blood of the subject by a machine learning model on the basis of the expression level of RNA derived from the gene having a high correlation with the concentration of the component in the blood,
the machine learning model being a machine learning model constructed so that the data of the expression level of RNA which is derived from the gene having a high correlation with the concentration of the component in the blood and which has been contained in skin surface lipid-derived RNA obtained from a human population serves as an explanatory variable and the data of the concentration of the component in the blood obtained from the human population serves as an objective variable.
[25] The method according to [24], wherein preferably, the component in the blood is a hormone, insulin, neutral fat, γ-GTP or LDL-cholesterol, and the hormone is preferably testosterone or cortisol.
[26] The method according to [25], wherein
the component in the blood is preferably testosterone, and the gene having a high correlation with the concentration of the component in the blood is preferably at least one selected from the group consisting of SCARNA16, PRSS27, RDBP, PSMB10, SBNO1, EMC3, MAR9, C20orf112, C14orf2 and CCDC90B, more preferably all of the genes;
the component in the blood is preferably insulin, and the gene having a high correlation with the concentration of the component in the blood is preferably at least one selected from the group consisting of EAPP, SDE2, LYAR, ZNF493, PSMB10, FAM71A, GPANK1, FGD4, MRPL43 and CMPK1, more preferably all of the genes;
the component in the blood is preferably neutral fat, and the gene having a high correlation with the concentration of the component in the blood is preferably at least one selected from the group consisting of CCDC9, C6orf106, CERK, HSD3B2, SUN2, FNDC4, GRAMD1C, DGAT2, ALPL, HOMER3, MTHFS, ADIPOR1, RBM3, EXOC8 and ARHGEF37, more preferably all of the genes;
the component in the blood is preferably γ-GTP, and the gene having a high correlation with the concentration of the component in the blood is preferably at least one selected from the group consisting of TMEM38A, BTN3A2, NAP1L2, ABCA2, ALPL, SECTM1, C17orf62, GNB2, R3HDM4, LRG1, SBNO2, CD14, MLLT1, NINJ2 and LIMD2, more preferably all of the genes; or
the component in the blood is preferably LDL-cholesterol, and the gene having a high correlation with the concentration of the component in the blood is preferably at least one selected from the group consisting of THTPA, LOC100506023, ZNF700, TAB3, PLEKHA1, ZNF845, FXC1, CUL4A, NDUFV1 and AMZ2, more preferably all of the genes.
[27] A database for constructing a machine learning model for analyzing a concentration of a component in the blood, the database containing:
data of the expression level of RNA which is derived from a gene having a high correlation with the concentration of the component in the blood and which has been contained in skin surface lipid-derived RNA obtained from a human population; and
data of the concentration of the component in the blood obtained from the human population, wherein
the component in the blood is preferably testosterone, and the gene having a high correlation with the concentration of the component in the blood is preferably at least one selected from the group consisting of SCARNA16, PRSS27, RDBP, PSMB10, SBNO1, EMC3, MAR9, C20orf112, C14orf2 and CCDC90B, more preferably all of the genes;
the component in the blood is preferably insulin, and the gene having a high correlation with the concentration of the component in the blood is preferably at least one selected from the group consisting of EAPP, SDE2, LYAR, ZNF493, PSMB10, FAM71A, GPANK1, FGD4, MRPL43 and CMPK1, more preferably all of the genes;
the component in the blood is preferably neutral fat, and the gene having a high correlation with the concentration of the component in the blood is preferably at least one selected from the group consisting of CCDC9, C6orf106, CERK, HSD3B2, SUN2, FNDC4, GRAMD1C, DGAT2, ALPL, HOMER3, MTHFS, ADIPOR1, RBM3, EXOC8 and ARHGEF37, more preferably all of the genes;
the component in the blood is preferably γ-GTP, and the gene having a high correlation with the concentration of the component in the blood is preferably at least one selected from the group consisting of TMEM38A, BTN3A2, NAP1L2, ABCA2, ALPL, SECTM1, C17orf62, GNB2, R3HDM4, LRG1, SBNO2, CD14, MLLT1, NINJ2 and LIMD2, more preferably all of the genes; or
the component in the blood is preferably LDL-cholesterol, and the gene having a high correlation with the concentration of the component in the blood is preferably at least one selected from the group consisting of THTPA, LOC100506023, ZNF700, TAB3, PLEKHA1, ZNF845, FXC1, CUL4A, NDUFV1 and AMZ2, more preferably all of the genes.
[28] A program for carrying out the method according to any one of [24] to [26].
[29] An apparatus for carrying out the method according to any one of [24] to [26].
Hereinafter, the present invention will be described in more detail on the basis of Examples, which should not be construed as limiting the present invention.
Sebum was collected from the entire face of a healthy person using an oil blotting film (5×8 cm, made of polypropylene, 3M Ltd.). The oil blotting film was transferred into a glass vial, and left standing at 4° C. for several hours, and RNA in SSL contained in the film was then purified. In the purification of RNA, the oil blotting film was cut to an appropriate size, and RNA was extracted in accordance with an attached protocol using QIAzol (registered trademark) Lysis Reagent (Qiagen). The extracted RNA was subjected to reverse transcription at 42° C. for 30 minutes with SuperScript (registered trademark) VILO cDNA Synthesis kit (Life Technologies Japan Ltd.) to synthesize cDNA. As a primer for the reverse transcription reaction, a random primer attached to the kit was used. From the obtained cDNA, a library containing cDNA derived from 20802 genes was prepared by multiplex PCR. The multiplex PCR was performed under the condition of [99° C., 2 min→(99° C., 15 sec→60° C., 16 min)×20 cycles→4° C., Hold] using Ion AmpliSeq Transcriptome Human Gene Expression Kit (Life Technologies Japan Ltd.). The prepared library was measured using TapeStation (Agilent Technologies) and High Sensitivity D1000 ScreenTape (Agilent Technologies), and the results showed that a peak derived from the library was not detected. The reason why the peak was not detected was considered to be that the amount of sebum collected from the subject was small, and that leaving the collected sebum standing at 4° C. after the collection and before the purification had accelerated decomposition of RNA, so that the amount of RNA purified was small.
For examining the effect of the preservation temperature on human RNA in SSL, the oil blotting film used for collecting the sebum in 1) was coated with 40 ng of a human surface skin cell-derived RNA solution as RNA, and then preserved for 4 days at (i) room temperature (RT), (ii) 4° C., (iii) −20° C. or (iv) −80° C. As the human surface skin cell-derived RNA solution, one obtained by dissolving RNA extracted from frozen NHEK (NB) (KURABO INDUSTRIES LTD.) in a 50% (v/v) ethanol solution was used. The oil blotting film after the preservation was cut to an appropriate size, and RNA was extracted in accordance with an attached protocol using QIAzol (registered trademark) Lysis Reagent (Qiagen). The extracted RNA was measured with TapeStation (Agilent Technologies) and High Sensitivity RNA Screen Tape (Agilent Technologies).
Using an oil blotting film (3M Ltd.), sebum was collected from the entire face of a healthy person with a small amount of sebum. From the oil blotting film, RNA was extracted in accordance with the same procedure as in Test Example 1. The extracted RNA was subjected to reverse transcription to synthesize cDNA. The reverse transcription reaction was carried out using SuperScript (registered trademark) VILO cDNA Synthesis kit (Thermo Scientific). As a primer for the reverse transcription reaction, a random primer attached to the kit was used. The condition of the elongation temperature and the time for the reverse transcription was set to (i) 40° C. for 60 minutes, (ii) 40° C. for 90 minutes, (iii) 42° C. for 60 minutes or (iv) 42° C. for 90 minutes (temperature accuracy: ±0.25° C.). Using the obtained cDNA, multiplex PCR was performed under the same conditions as in Test Example 1. The obtained PCR product was purified with Ampure XP (Beckman Coulter Inc.). The concentration of a PCR product in the solution of the obtained purified product was determined with TapeStation (Agilent Technologies) and High Sensitivity D1000 Screen Tape (Agilent Technologies). Table 1 shows the results. The PCR product was obtained in the largest amount when reverse transcription was performed at 42° C. for 90 minutes.
Using an oil blotting film (3M Ltd.), sebum was collected from the entire face of a healthy person with a small amount of sebum. From the oil blotting film, RNA was extracted in accordance with the same procedure as in Test Example 1. Using the extracted RNA, synthesis of cDNA and multiplex PCR were performed in the same manner as in 1) except that the temperature for annealing and elongation in PCR was changed. The condition for reverse transcription was set to 42° C. for 90 minutes. The temperature for annealing and elongation was set to (i) 60° C., (ii) 62° C., (iii) 63° C. or (iv) 64° C. (temperature accuracy: ±0.25° C.). The obtained PCR product was purified with Ampure XP (Beckman Coulter Inc.), and determined with TapeStation (Agilent Technologies) and High Sensitivity D1000 Screen Tape (Agilent Technologies). Table 2 shows the results. The PCR product was obtained in the largest amount when the temperature for annealing and elongation was 62° C.
Using an oil blotting film (3M Ltd.), sebum was collected from the entire face of a healthy person with a small amount of sebum. From the oil blotting film, RNA was extracted in accordance with the same procedure as in Test Example 1. Using the extracted RNA, synthesis of cDNA and multiplex PCR were performed in the same manner as in 1). The condition for reverse transcription was set to 42° C. for 30 minutes. The temperature for annealing and elongation was set to 60° C. (temperature accuracy: ±0.25° C.). The obtained PCR product was divided into two parts. One part was purified with Ampure XP (Beckman Coulter Inc.), and the other part was not purified. Each sample solution, 5× VILO RT Reaction Mix attached to SuperScript (registered trademark) VILO cDNA Synthesis kit, 5× Ion AmpliSeq HiFi Mix attached to Ion AmpliSeq Transcriptome Human Gene Expression Kit (Life Technologies Japan Ltd.), and Ion AmpliSeq Transcriptome Human Gene Expression Core Panel were mixed to reconstruct the buffer composition, and in accordance with protocols attached to kits, digestion of the primer sequence, adaptor ligation and purification, and amplification were performed to prepare a library. The concentration of the obtained library was determined with TapeStation (Agilent Technologies) and High Sensitivity D1000 Screen Tape (Agilent Technologies). Table 3 shows the results. In samples which were not purified, the library was not detected.
The results in 1) and 2) showed that when a nucleic acid sample was prepared from RNA in SSL, the optimum condition for reverse transcription reaction was approximately 42° C. for 90 minutes, and the optimum condition of the annealing and elongation temperature for multiplex PCR was approximately 62° C. It was considered that by performing multiplex RT-PCR under these conditions, the yield of the nucleic acid sample from RNA in SSL was increased. The results in 3) showed that addition of a purification step after PCR increased the yield of the nucleic acid sample, so that it was possible to prepare a nucleic acid sample even from SSL with a small RNA amount. Further, it was considered that as shown in Test Example 1, when RNA in SSL collected from the subject was preserved at −20° C. until being used for preparation of the nucleic acid sample, RNA was inhibited from denaturing, so that it was possible to further increase the yield of the nucleic acid sample.
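The conditions summarized above can be written down as a small checklist, for example for recording how a given preparation run was performed. This is a hypothetical sketch: the class and field names are invented for illustration, and the tolerances follow the broadest ranges stated in the embodiments (42° C.±1° C. for 60 minutes or more, 62° C.±1° C., preservation at 0° C. or lower).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PrepConditions:
    """Hypothetical record of one preparation run (defaults: optimum found above)."""
    rt_temp_c: float = 42.0        # reverse transcription temperature
    rt_minutes: float = 90.0       # reverse transcription time
    anneal_temp_c: float = 62.0    # annealing/elongation temperature in multiplex PCR
    purify_after_pcr: bool = True  # purification of the PCR product
    storage_temp_c: float = -20.0  # preservation temperature of the collected SSL

def check_conditions(c, temp_tol_c=1.0):
    """Return a list of deviations from the ranges stated in the embodiments."""
    problems = []
    if abs(c.rt_temp_c - 42.0) > temp_tol_c:
        problems.append("reverse transcription temperature outside 42 +/- tolerance")
    if c.rt_minutes < 60.0:
        problems.append("reverse transcription shorter than 60 minutes")
    if abs(c.anneal_temp_c - 62.0) > temp_tol_c:
        problems.append("annealing/elongation temperature outside 62 +/- tolerance")
    if not c.purify_after_pcr:
        problems.append("PCR product not purified")
    if c.storage_temp_c > 0.0:
        problems.append("SSL preserved above 0 degrees C")
    return problems

optimum = check_conditions(PrepConditions())
# A hypothetical run deviating in time, annealing temperature and storage.
deviating = check_conditions(
    PrepConditions(rt_minutes=30.0, anneal_temp_c=60.0, storage_temp_c=4.0)
)
print(optimum, deviating)
```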
The reverse transcriptase and the primer used during the reverse transcription reaction are SuperScript (registered trademark) III Reverse Transcriptase and random Primers, respectively, and the enzyme and the primer used at the time of performing PCR are AmpliSeq HiFi Mix Plus and AmpliSeq Transcriptome Panel Human Gene Expression CORE, respectively.
20 healthy persons (20 to 39-year-old males, BMI: 18.5 or more and less than 25.0) and 11 atopic dermatitis patients (ADs) (20 to 39-year-old males, BMI: 18.5 or more and less than 25.0) were selected as subjects. The healthy persons were confirmed to have no abnormality of the skin by a dermatologist in advance, and ADs were diagnosed with atopic dermatitis by a dermatologist in advance. Sebum was collected from the entire face of each subject using an oil blotting film (3M Ltd.) after the entire face was photographed. The oil blotting film was transferred into a glass vial, and preserved at −80° C. for about 1 month until being used for extraction of RNA. In the following test examples as well as this test example, SSL collected from the subject was preserved at −80° C., i.e. a common preservation condition, until being used for extraction of RNA. If the SSL is preserved under a condition enabling more stable preservation of RNA in SSL (at −20° C.) as shown in Test Example 1, at least comparable analysis results may be obtained because RNA expression analysis data can be more stably obtained.
From the preserved oil blotting film, RNA was extracted in accordance with the same procedure as in Test Example 1. The extracted RNA was subjected to reverse transcription at 42° C. for 90 minutes, and multiplex PCR was performed at an annealing and elongation temperature of 62° C. The obtained PCR product was purified with Ampure XP (Beckman Coulter Inc.), followed by performing reconstruction of the buffer, digestion of the primer sequence, adaptor ligation and purification, and amplification in accordance with the same procedure as in 3) of Test Example 2 to prepare a library. The prepared library was loaded into Ion 540 Chip, and subjected to sequencing using Ion S5/XL System (Life Technologies Japan Ltd.).
The expression levels of SSL-derived RNA species confirmed to be expressed through the sequencing were compared between the healthy persons and the ADs. As RNA species to be compared, 19 immune response-related RNAs and 17 keratinization-related RNAs were used. It is reported in a document (J Allergy Clin Immunol, 2011, 127: 954-964) that for these RNA species, the ratio of the expression level in AD to the expression level in the healthy person varies between an affected part and a non-affected part of the skin tissue of AD.
38 healthy males (20 to 50-year-old, BMI: 18.5 or more and less than 25.0) confirmed to have no abnormality of the skin by a dermatologist in advance were selected as subjects.
3 mL of blood was collected from the arm of each subject using a vacuum blood collection tube, and serum was separated and preserved at −80° C. From the preserved serum, the serum testosterone concentration was determined in accordance with an attached protocol using Testosterone ELISA Kit (Cayman Chemical). An external inspection organization (LSI Medience Corporation) was commissioned to determine the serum concentrations of insulin, neutral fat, γ-GTP and LDL-cholesterol.
Sebum was collected from the entire face of each subject using an oil blotting film (3M Ltd.) after the entire face was photographed. The oil blotting film was transferred into a glass vial, and preserved at −80° C. for about 1 month. From the preserved oil blotting film, RNA in SSL was extracted in accordance with the same procedure as in Test Example 3, a library was prepared, RNA species were identified through sequencing, and the expression levels of the RNA species were measured.
Data of the measured expression levels of SSL-derived RNAs from the subjects (reads per million mapped reads: RPM values) was randomly divided into data for 33 subjects and data for 5 subjects. On the basis of the SSL-derived RNA expression levels (RPM values) and the serum testosterone concentrations for a total of 33 subjects, a serum testosterone concentration prediction model based on machine learning was constructed. First, 10 RNAs having the highest correlation with the serum testosterone concentration (RNAs derived from SCARNA16, PRSS27, RDBP, PSMB10, SBNO1, EMC3, MARS, C20orf112, C14orf2 and CCDC90B) were selected on the basis of the RPM values.
As learning data, the expression levels (RPM values) of SSL-derived RNA for the selected 10 RNAs for the 33 subjects were used as explanatory variables, and the serum testosterone concentrations for the 33 subjects were used as objective variables to perform construction and selection of an optimum prediction model with Visual Mining Studio Software (NTT DATA Mathematical System Inc.).
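The selection of the 10 RNAs most correlated with the serum testosterone concentration can be illustrated by a minimal sketch. It assumes Pearson's correlation coefficient ranked by absolute value, which the description does not specify. The function names, gene labels and toy values are hypothetical and are unrelated to the software named above.

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant input is treated as uncorrelated (assumption)
    return sxy / math.sqrt(sxx * syy)

def select_top_correlated(expression, target, k=10):
    """Return the k genes whose RPM values correlate most strongly with target."""
    scores = {gene: abs(pearson(values, target)) for gene, values in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy RPM values for four hypothetical subjects
rpm = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],  # rises with the target
    "GENE_B": [4.0, 1.0, 3.0, 2.0],  # weakly related
    "GENE_C": [8.0, 6.0, 4.0, 2.0],  # falls as the target rises
}
testosterone = [2.0, 4.0, 6.0, 8.0]
top2 = select_top_correlated(rpm, testosterone, k=2)
```

Ranking by absolute value keeps RNAs that are negatively correlated with the target; whether the selection in this example did so is not stated.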
Using the selected prediction model, predicted values of blood testosterone concentrations were calculated from the SSL-derived RNA expression levels for the other 5 subjects. The results showed that the calculated predicted values had a high correlation with the measured values of serum testosterone concentrations (correlation coefficient=0.93) as shown in
Data of the measured expression levels of SSL-derived RNAs from the subjects (RPM values) was randomly divided into data for 31 subjects and data for 7 subjects. On the basis of the SSL-derived RNA expression levels (RPM values) and the serum concentrations of insulin, neutral fat, γ-GTP and LDL-cholesterol for a total of 31 subjects, prediction models for the serum concentrations of insulin, neutral fat, γ-GTP and LDL-cholesterol, which are based on machine learning, were constructed. First, on the basis of the RPM values, RNAs derived from the following molecules were selected as RNAs having the highest correlation with the serum concentrations of 1) insulin, 2) neutral fat, 3) γ-GTP and 4) LDL-cholesterol:
1) insulin: EAPP, SDE2, LYAR, ZNF493, PSMB10, FAM71A, GPANK1, FGD4, MRPL43 and CMPK1;
2) neutral fat: CCDC19, C6orf106, CERK, HSD3B2, SUN2, FNDC4, GRAMD1C, DGAT2, ALPL, HOMER3, MTHFS, ADIPOR1, RBM3, EXOC8 and ARHGEF37;
As learning data, the expression level (RPM value) of SSL-derived RNA for each of the selected RNAs for the 31 subjects was used as an explanatory variable, and the concentration of insulin, neutral fat, γ-GTP or LDL-cholesterol for the 31 subjects was used as an objective variable to perform construction and selection of an optimum prediction model with Visual Mining Studio Software (NTT DATA Mathematical System Inc.).
Using the selected prediction model, predicted values of concentrations of insulin, neutral fat, γ-GTP and LDL-cholesterol in the blood were calculated from the SSL-derived RNA expression levels for the other 7 subjects. The results showed that the calculated predicted values had a positive correlation with the measured values of concentrations of insulin, neutral fat, γ-GTP and LDL-cholesterol in the serum, as shown in
9 healthy males (20 to 39-year-old) were selected as subjects. As a test product, a facial cleanser having an effect of decomposing and removing horn plugs (Biore Ouchi de Esute, Kao corporation) was used. The subjects each washed the entire face twice a day (morning and night) for 1 week using an appropriate amount (about 1 g) of the test product. Before the start of use of the facial cleanser as the test product (day 0) and 2 days after the start of use of the facial cleanser, SSL was collected from the entire face of the subject using an oil blotting film (3M Ltd.).
The oil blotting film containing the collected SSL was transferred into a glass vial, and preserved at −80° C. for about 1 month. From the preserved oil blotting film, RNA in SSL was extracted in accordance with the same procedure as in Test Example 3, a library was prepared, RNA species were identified through sequencing, and the expression levels of the RNA species were measured.
The SSL-derived RNA expression levels (RPM values) measured before the start of use of the cleanser (day 0) were compared with those measured 2 days after the start of use. RNAs derived from 22 molecules consisting of BNIP3, CALML3, GAL, HSPA5, JUNB, KIF13B, KRT14, KRT17, KRT6A, OVOL1, PPIF, PRDM1, RBM3, RPLP1, RPS4X, SEPT9, SOAT1, SPNS2, UBB, VCP, WIPI2 and YPEL3 were identified as RNAs for which the p value in the Student's t-test was 0.05 or less and the RPM value 2 days after the start of use was 2 times or more of that on day 0 (Table 4). These molecules included molecules related to terminal keratinization of the skin, such as BNIP3, OVOL1, KRT14 and KRT17, and molecules related to anti-inflammatory action, such as JUNB and PRDM1. Because use of the cleanser increased the expression levels of these molecules, it was suggested that they could serve as markers indicating an improvement in skin condition.
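The two selection criteria can be sketched as a simple filter. The sketch reads the criteria as a Student's t-test p value of 0.05 or less together with an RPM value at least 2 times that on day 0, and it takes the p values as precomputed inputs. The function name, record layout and values are hypothetical.

```python
def select_increased_rnas(records, p_max=0.05, min_fold=2.0):
    """Select RNAs that increased after the start of use of the test product.

    records: iterable of (gene, mean_rpm_day0, mean_rpm_after, p_value).
    """
    selected = []
    for gene, rpm_day0, rpm_after, p_value in records:
        if rpm_day0 <= 0:
            continue  # fold change undefined; skipped in this sketch (assumption)
        if p_value <= p_max and rpm_after / rpm_day0 >= min_fold:
            selected.append(gene)
    return selected

# Hypothetical summary values, not data from this example
records = [
    ("RNA_1", 10.0, 25.0, 0.01),  # 2.5-fold, significant: selected
    ("RNA_2", 10.0, 15.0, 0.01),  # 1.5-fold: rejected on fold change
    ("RNA_3", 10.0, 40.0, 0.20),  # 4-fold: rejected on p value
    ("RNA_4", 0.0, 5.0, 0.01),    # not detected on day 0: skipped
]
increased = select_increased_rnas(records)
```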
18 healthy males (20 to 48-year-old) were selected as subjects. As a test product, a facial cleanser having an effect of decomposing and removing horn plugs (Biore Ouchi de Esute, Kao corporation) was used. The subjects each washed the entire face twice a day (morning and night) for 1 week using an appropriate amount (about 1 g) of the test product as in “(1) Identification of RNA species used for prediction of effect of facial cleanser on skin” above. Before the start of use of the facial cleanser as the test product (day 0) and 1 week after the start of use of the facial cleanser, SSL was collected from the subject and a skin condition was measured as described below.
i) SSL was collected from the entire face using an oil blotting film (3M Ltd.).
ii) The face was washed, and then conditioned in a constant-temperature room (20±5° C., 40% RH) for 15 minutes.
iii) A magnified image of the cheek was taken, and the horn cell layer moisture content of the left part of the cheek was then measured at one point using Corneometer (MPA580, Courage+Khazaka Electronic GmbH, Germany).
iv) Questionnaire studies on the skin condition were conducted.
The oil blotting film containing the collected SSL was transferred into a glass vial, and preserved at −80° C. for about 1 month. From the preserved oil blotting film, RNA in SSL was extracted in accordance with the same procedure as in Test Example 3, a library was prepared, RNA species were identified through sequencing, and the expression levels of the RNA species were measured.
On the basis of the RPM values on day 0 for the group of 22 RNAs (Table 4) selected in “(1) Identification of RNA species used for prediction of effect of facial cleanser on skin” above, the subjects were classified into two groups: a group in which expression of the 22 RNAs tended to be generally high (high-value group); and a group in which expression of the 22 RNAs tended to be generally low (low-value group) (
Francesco Iorio et al. reported “Signature reversion” as a method for improving a disease, containing applying a drug etc. having an effect of increasing expression of a group of RNAs whose expression decreases due to a disease or the like (Drug Discov Today, 2013, 18 (7-8): 350-357). This method was thought to ensure that in the low-value group, expression of the group of 22 RNAs would be increased more markedly by use of the test product than that in the high-value group, leading to improvement of a skin condition. In practice,
These results suggest that use of an SSL-derived RNA analysis technique enables prediction of the effect of a skin external preparation before the start of use of the product. For example, when SSL-derived RNAs (e.g. the 22 RNAs found in this example) which are characteristically expressed or not expressed in persons who can easily enjoy the effect of a certain skin external preparation are identified, and the expression levels of those SSL-derived RNAs in a subject are then examined, it is possible to predict whether or not the effect can be obtained when the subject uses the skin external preparation.
55 healthy persons (20 to 49-year-old males, BMI: 18.5 or more and less than 25.0), 15 mild atopic dermatitis patients (20 to 39-year-old males, BMI: 18.5 or more and less than 25.0) and 25 moderate atopic dermatitis patients (20 to 39-year-old males, BMI: 18.5 or more and less than 25.0) were selected as subjects. The healthy persons were confirmed to have no abnormality of the skin by a dermatologist in advance, and the atopic dermatitis patients were diagnosed with atopic dermatitis by a dermatologist in advance.
Sebum was collected from the entire face of each subject using an oil blotting film (3M Ltd.). The oil blotting film was transferred into a glass vial, and preserved at −80° C. for about 1 month until being used for extraction of RNA. From the preserved oil blotting film, RNA in SSL was extracted in accordance with the same procedure as in Test Example 3, a library was prepared, RNA species were identified through sequencing, and the expression levels of the RNA species were measured.
On the basis of SSL-derived RNA information (RPM value), RPM values converted into base-2 logarithmic values were subjected to data analysis. A group of 1911 genes was extracted for which the RPM value converted into the base-2 logarithmic value in the atopic dermatitis patients was half or less of that in the healthy persons and the p value in the Student's t-test between the atopic dermatitis patients and the healthy persons was 0.05 or less ((A) of Tables 7-1 to 7-24). Subsequently, on the basis of the values obtained by converting the RPM values into base-2 logarithmic values, and with the false discovery rate (FDR) level set to 5%, searching of biological processes (BPs) by gene ontology (GO) enrichment analysis was performed in accordance with published documents (Nature Protoc, 2009, 4: 44-57; Nucleic Acids Res, 2009, 37: 1-13). As a result, 19 BPs related to the group of genes whose expression decreased in the atopic dermatitis patients were obtained. Of these, GO: 0050911 (detection of chemical stimulus involved in sensory perception of smell) was shown to be most closely related (Table 5). RNAs forming GO: 0050911 included about 400 olfactory receptors (ORs), and expression of 370 ORs was statistically significantly lower in the atopic dermatitis patients than in the healthy persons. Such a decrease in expression contributed to the significance of GO. This suggested that the expression levels of the 370 ORs in the SSL-derived RNA information could serve as a useful marker for discriminating healthy persons from atopic dermatitis patients.
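The idea behind an enrichment search at a 5% FDR level can be sketched as follows. The sketch uses a plain one-sided hypergeometric test followed by Benjamini-Hochberg adjustment; this is an assumption for illustration and is not necessarily the statistic computed by the tool described in the cited documents. All function names and numbers are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_in_term, n_selected, n_overlap):
    """P(overlap >= n_overlap) when n_selected genes are drawn at random
    from a background containing n_in_term genes annotated to the term."""
    upper = min(n_in_term, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_in_term, i) * comb(n_background - n_in_term, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value down
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Toy case: 10 background genes, 4 in the term, 3 selected, 2 of them in the term
p_term = hypergeom_enrichment_p(10, 4, 3, 2)
# Terms are retained when the adjusted value is 0.05 or less
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.2])
significant = [q <= 0.05 for q in adjusted]
```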
Further, the results of comparing the healthy persons with the mild atopic dermatitis patients for RNA expression of ORs and comparing the mild atopic dermatitis patients with the moderate atopic dermatitis patients showed that the expression of 368 ORs was lower in the mild atopic dermatitis patients than in the healthy persons, and expression of 284 ORs was lower in the moderate atopic dermatitis patients than in the healthy persons ((B) to (D) of Tables 7-1 to 7-11). These results revealed that the expression levels of the ORs shown in (B) to (D) of Tables 7-1 to 7-11 in SSL decreased as the symptom of atopic dermatitis worsened, and it was suggested that the severity of atopic dermatitis could be known by using the expression levels of the ORs as an index.
42 healthy females confirmed to have no abnormality of the skin by a dermatologist in advance (20 to 59-year-old, BMI: 18.5 or more and less than 25.0) were selected as subject candidates. For these candidates, questionnaire studies were conducted on whether or not there are subjective symptoms of sensitive skin (one of the four feelings: “bothered”, “not so bothered”, “not bothered” and “not bothered at all”). 10 candidates showing the feeling of “not bothered” or “not bothered at all” were classified as a group without subjective symptoms of sensitive skin, and 13 candidates showing the feeling of “bothered” were classified as a group with subjective symptoms of sensitive skin. These candidates were selected as subjects.
Sebum was collected from the entire face of each subject using an oil blotting film (3M Ltd.). The oil blotting film was transferred into a glass vial, and preserved at −80° C. for about 1 month until being used for extraction of RNA. From the preserved oil blotting film, RNA in SSL was extracted in accordance with the same procedure as in Test Example 3, a library was prepared, RNA species were identified through sequencing, and the expression levels of the RNA species were measured.
On the basis of SSL-derived RNA information (RPM value), RPM values converted into base-2 logarithmic values (Log2 RPM values) were subjected to data analysis. A group of 693 genes was extracted for which the Log2 RPM value in the group with subjective symptoms of sensitive skin was half or less of that in the group without subjective symptoms of sensitive skin and the p value in the Student's t-test between the group with subjective symptoms and the group without subjective symptoms was 0.05 or less ((E) of Tables 7-1 to 7-20). Subsequently, on the basis of the Log2 RPM value, and with the FDR level set to 5%, searching of BPs by gene ontology enrichment analysis was performed in accordance with the above-described published documents. As a result, 4 BPs related to the group of genes whose expression decreased in the group with subjective symptoms of sensitive skin were obtained, and it was shown that GO: 0050911 was most closely related (Table 6). Expression of 344 ORs ((F) of Tables 7-1 to 7-10), among about 400 ORs in GO: 0050911, was statistically significantly lower in the persons with subjective symptoms of sensitive skin than in the persons without subjective symptoms of sensitive skin. Such a decrease in expression contributed to the significance of GO. This suggested that the expression levels of these ORs in the SSL-derived RNA information could serve as a useful marker for detecting a subjective symptom of sensitive skin.
38 healthy males (20 to 59-year-old, BMI: 18.5 or more and less than 25) confirmed to have no abnormality of the skin by a dermatologist in advance were selected as subjects.
The entire face was photographed, and the casual amount of sebum in the forehead of each subject before washing of the face was then measured using Sebumeter (MPA580, Courage+Khazaka Electronic GmbH, Germany). Thereafter, the face was washed, and conditioned for 15 minutes in a variable-environment room (temperature: 20° C. (±2° C.) and humidity: 50% (±5%)). After completion of the conditioning, the moisture content of the cheek was measured using Corneometer (MPA580, Courage+Khazaka Electronic GmbH, Germany).
After the casual amount of sebum was measured in the measurement of the skin physical properties, sebum was collected from the entire face of each subject using an oil blotting film (3M Ltd.). The oil blotting film was transferred into a glass vial, and preserved at −80° C. for about 1 month. From the preserved oil blotting film, RNA in SSL was extracted in accordance with the same procedure as in Test Example 3, a library was prepared, RNA species were identified through sequencing, and the expression levels of the RNA species were measured.
Persons in which the value of the forehead casual amount of sebum measured with Sebumeter was less than 100 were classified as a low-value group, and persons in which the value of the forehead casual amount of sebum was 150 or more were classified as a high-value group. Comparison of the expression level of SSL-derived RNA (RPM value) between the two groups (low-value group and high-value group) was performed in accordance with the same procedure as in Test Example 7, and the result showed that 594 RNAs statistically significantly varied in expression ((I) of Tables 7-1 to 7-17). As shown in
On the basis of the results of measurement by Corneometer, the top 15 persons in terms of the horn cell layer moisture content (high-value group) and the bottom 15 persons in terms of the horn cell layer moisture content (low-value group) were selected. Comparison of the expression level of SSL-derived RNA (RPM value) between the two groups (low-value group and high-value group) was performed in accordance with the same procedure as in Test Example 7, and the result showed that 553 RNAs statistically significantly varied in expression ((H) of Tables 7-1 to 7-16). A natural moisturizing factor has been reported to play an important role in maintenance of the skin moisturizing capacity (Dermatol Ther, 17 Suppl, 2004, 1: 43-48). Expression of a factor related to generation of the natural moisturizing factor was examined, and the result revealed that as shown in
On the basis of the results of visually evaluating face images, 8 persons with intense skin redness (high-value group) and 6 persons with mild skin redness (low-value group) were selected. Comparison of the expression level of SSL-derived RNA (RPM value) between the two groups (low-value group and high-value group) was performed in accordance with the same procedure as in Test Example 7, and the result showed that 703 RNAs statistically significantly varied in expression ((G) of Tables 7-1 to 7-20). It was evident that as shown in
39 healthy females (age: 30s) having no problem on the skin of the face, the fingers or the upper arms were selected as subjects.
Using an oil blotting film (5 cm×8 cm, 3M Ltd.), sebum was collected from the entire face of each subject before washing of the face, and preserved as a sample for analysis of SSL-derived RNA at −80° C. for about 1 month.
After the sebum was collected, the subjects each washed the face using a commercially available facial cleanser, and conditioned in a variable-environment room (temperature: 20° C.±1° C. and humidity: 40%±5%). During the conditioning, the skin condition of the face of each of the subjects was evaluated visually and on palpation.
Visual evaluation items: “cleanness”, “clearness”, “lightness”, “yellowness”, “overall redness”, “flecks”, “scale”, “luster”, “textured wrinkles on the cheek”, “conspicuous dark circles”, “drooping corners of the mouth”, “acne”, “conspicuous pores (cheek)” and “conspicuous pores (nose)”
Palpatory evaluation: “rough feeling” and “moist feeling”
For each evaluation item, three professional evaluators marked scores on the basis of criteria (3: very heavy, 2: heavy, 1: slightly heavy, 0: none), and an average of the scores by the three evaluators was defined as an evaluation value.
For each of the subjects after completion of the conditioning, the horn cell layer moisture content was measured with Corneometer (MPA580, Courage+Khazaka Electronic GmbH, Germany) and Skicon (YAYOI Co., Ltd.), the transepidermal water loss (TEWL) was measured with Tewameter (MPA580, Courage+Khazaka Electronic GmbH, Germany), the amount of sebum was measured with Sebumeter (MPA580, Courage+Khazaka Electronic GmbH, Germany), and the amount of melanin and the amount of erythema were measured with CM-2600d (KONICA MINOLTA, INC.). The amount of sebum was measured on the forehead, and all the others were measured.
After a lapse of 1 hour or more from the washing of the face, the subject was caused to lie supine, and two sheets of cigarette paper (1.7 cm×1.7 cm, RIZLA: RIZLA BLUE DOUBLE) degreased with chloroform/methanol=1/1 were arranged near the center of the forehead so as not to overlap each other, and lightly pressed against the forehead for 10 seconds to collect sebum. The cigarette paper containing the sebum was put into a screw tube, methanol was immediately added, and the cigarette paper was cryogenically preserved at −80° C. until analysis.
The solvent was removed from the screw tube by distillation under the nitrogen flow, and 1 mL of chloroform/methanol=1/1 was then added into the screw tube. After it was confirmed that the cigarette paper was sufficiently immersed in the solvent in the screw tube, sebum was extracted by ultrasonic treatment for 5 minutes to obtain a sebum solution. In a very small vial, 20 μL of a lipid internal standard solution for direct-MS/MS measurement at 100 μmol/L was solidified by drying, 100 μL of the sebum solution prepared in accordance with the above-described procedure was added thereto, dissolved and mixed to prepare a sebum sample solution containing an internal standard. From the prepared sample solution, the amounts of free fatty acid (FFA), wax ester (WE), cholesterol ester (ChE), squalene (SQ), squalene epoxide (SQepo), squalene oxide (SQOOH), diacylglycerol (DAG) and triacylglycerol (TAG) were measured for each subject by direct-MS/MS, and absolute amounts were calculated on the basis of the internal standard.
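The calculation of an absolute amount from the internal standard can be sketched as below. The sketch assumes single-point calibration against one internal standard with a response factor of 1, which the description does not state. The signal values and function names are hypothetical; the volumes and the concentration are those given above.

```python
def internal_standard_nmol(volume_ul=20.0, conc_umol_per_l=100.0):
    """Amount of internal standard dried in the vial.

    uL x umol/L gives pmol, so dividing by 1000 gives nmol:
    20 uL x 100 umol/L = 2000 pmol = 2 nmol.
    """
    return volume_ul * conc_umol_per_l / 1000.0

def absolute_amount_nmol(analyte_signal, standard_signal, standard_nmol,
                         response_factor=1.0):
    """Analyte amount in the measured aliquot (response factor is an assumption)."""
    return analyte_signal / standard_signal * standard_nmol / response_factor

standard_nmol = internal_standard_nmol()
# Hypothetical peak signals for one lipid class and the internal standard
aliquot_nmol = absolute_amount_nmol(5000.0, 10000.0, standard_nmol)
# 100 uL of the 1 mL sebum solution was measured, so scale by 1000 / 100
extract_nmol = aliquot_nmol * (1000.0 / 100.0)
```

Scaling by the aliquot fraction gives the amount in the whole 1 mL sebum solution, that is, the amount recovered from the two sheets of cigarette paper.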
In accordance with the method described in a document (JP-B-6482215), measurement was performed under the following conditions.
Instrument: LC/Agilent 1200 series, mass spectrometer/6460 triple quadrupole (manufactured by Agilent Technologies)
Mobile phase: 15 mmol/L ammonium acetate-containing chloroform/methanol=1/1
Flow rate: 0.2 mL/min
Injection volume: 1 μL
Detection conditions: ionization method=ESI, dry gas temperature=300° C., dry gas flow rate=5 L/min, nebulizer pressure=45 psi, sheath gas flow rate=11 L/min, nebulizer voltage=0 V, capillary voltage=3,500 V
FFA: Scan (Negative mode)
WE: Precursor Ion Scan for detecting molecules from constituent fatty acid-derived product ions
ChE: Precursor Ion Scan for detecting molecules from cholesterol backbone-derived product ions
SQ, SQepo, SQOOH: MRM
DAG: Neutral Loss Scan for detecting molecules from desorbed hydroxyl groups
TAG: Neutral Loss Scan for detecting molecules from fatty acids desorbed as neutral molecules
From the preserved oil blotting film, RNA in SSL was extracted in accordance with the same procedure as in Test Example 3, a library was prepared, RNA species were identified through sequencing, and the expression levels of the RNA species were measured.
Data Used
In the data of the expression levels of SSL-derived RNAs from the subjects (read count values), data with a read count of less than 10 was set as a missing value, and converted into an RPM value corrected for a difference in the total number of reads between samples, and the missing value was then supplemented by singular value decomposition (SVD) imputation. Only genes for which expression data that is not a missing value had been obtained in 80% or more of all the subjects were used for the following analysis. For construction of the machine learning model, RPM values converted into base-2 logarithmic values (Log2 RPM values) were used for approximating RPM values following a negative binomial distribution to a normal distribution. Evaluation values and measured values for the prediction target items (data from the visual and palpatory evaluation of the skin, the measurement of skin physical properties and the skin composition analysis; Table 8) were converted into deviation values in the target data sets, which were defined as target values.
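The masking, RPM conversion, 80% detection filter and Log2 conversion can be sketched as follows. The SVD imputation step is deliberately omitted, so missing values remain as None. The sketch also assumes that the total number of reads per subject is taken over all genes before masking, which the description does not specify. Names and counts are hypothetical.

```python
import math

def preprocess_counts(counts, min_count=10, min_detect_frac=0.8):
    """counts: dict mapping gene -> list of read counts, one per subject.

    Returns dict mapping gene -> list of Log2 RPM values (None where missing),
    keeping only genes detected in at least min_detect_frac of the subjects.
    """
    genes = list(counts)
    n_subjects = len(counts[genes[0]])
    # Total reads per subject, over all genes (assumption: before masking)
    totals = [sum(counts[g][j] for g in genes) for j in range(n_subjects)]
    result = {}
    for g in genes:
        row = [
            math.log2(c * 1e6 / totals[j]) if c >= min_count else None
            for j, c in enumerate(counts[g])
        ]
        detected = sum(v is not None for v in row)
        if detected / n_subjects >= min_detect_frac:
            result[g] = row
    return result

# Toy read counts for five hypothetical subjects
counts = {
    "GENE_A": [100, 200, 300, 400, 500],
    "GENE_B": [900, 800, 700, 600, 500],
    "GENE_C": [3, 50, 2, 60, 70],   # detected in 3 of 5 subjects: dropped
    "GENE_D": [20, 5, 30, 40, 50],  # detected in 4 of 5 subjects: kept
}
log2_rpm = preprocess_counts(counts)
```

The None entries that remain for retained genes are the values that an imputation step would fill in.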
Of the RNA profile data set obtained from the subjects, RNA profile data for 31 subjects, which amounts to 80% of the data set, was used as training data for skin condition prediction models, and RNA profile data for the other 8 subjects, which amounts to 20% of the data set, was used as test data for evaluation of model accuracy. For dividing the data set into the training data and the test data, a division method giving a uniform age distribution (division 1) and a division method giving uniform target values in prediction target items (division 2) were examined.
Selection of Feature Genes
In the training data, absolute values of Spearman's correlation coefficients (rho) between the target values in the prediction target items and the Log2 RPM values were calculated, and the top 10 genes (or 5 genes) shown in Table 8 were selected as feature genes in the prediction target items.
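Ranking genes by the absolute value of Spearman's rho can be sketched as below. Ties are given average ranks, which is the usual convention but is an assumption here. The function names, gene labels and values are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho computed as Pearson's correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return 0.0  # constant input is treated as uncorrelated (assumption)
    return sxy / math.sqrt(sxx * syy)

def rank_feature_genes(log2_rpm, target, k=10):
    """Return the k genes with the largest absolute rho against the target."""
    rho = {g: abs(spearman_rho(v, target)) for g, v in log2_rpm.items()}
    return sorted(rho, key=rho.get, reverse=True)[:k]

# Toy Log2 RPM values for four hypothetical subjects
toy = {"G1": [0.1, 0.5, 2.0, 9.0], "G2": [3.0, 1.0, 4.0, 2.0]}
top1 = rank_feature_genes(toy, [1.0, 2.0, 3.0, 4.0], k=1)
```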
Model Construction
Model construction was performed using the caret package of Statistical Analysis Environment R.
The data of the expression level of SSL-derived RNA (Log2 RPM value) in the training data was used as an explanatory variable, and the target value in each of the prediction target items was used as an objective variable to construct a prediction model.
For each prediction target item, the prediction model was made to learn by performing 10-fold cross validation using 6 algorithms, namely a linear regression model (Linear model), Lasso regression (Lasso), random forest (Random Forest), neural network (Neural net), linear kernel support vector machine (SVM (linear)) and rbf kernel support vector machine (SVM (rbf)).
For each algorithm, the expression level of SSL-derived RNA (Log2 RPM value) in the test data was input to the model after learning to calculate the target predicted value in each prediction item.
For each prediction item, the root-mean-square-error (RMSE) of a difference between a predicted value and a measured value was calculated, and the model giving the smallest value of RMSE was selected as an optimum prediction model.
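The selection of the model giving the smallest RMSE can be sketched as follows. The sketch assumes that predicted values for the test data are already available for each algorithm. The function names and all values are hypothetical and are not results of this example.

```python
import math

def rmse(predicted, measured):
    """Root-mean-square error between predicted and measured values."""
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))

def select_best_model(predictions_by_model, measured):
    """Return the name of the model with the smallest RMSE and all RMSE values."""
    scores = {name: rmse(pred, measured) for name, pred in predictions_by_model.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical target values (deviation values) for three test subjects
measured = [50.0, 60.0, 40.0]
predictions = {
    "Lasso": [52.0, 58.0, 41.0],
    "Random Forest": [55.0, 55.0, 45.0],
}
best_name, scores = select_best_model(predictions, measured)
```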
Table 9 shows the data division method giving the smallest RMSE, the algorithm used and RMSE for each prediction target item.
128 healthy females (age: 20s to 50s) having no problem on the skin of the face, the fingers or the upper arms were selected as subjects.
Using an oil blotting film (5 cm×8 cm, 3M Ltd.), sebum was collected from the entire face of each subject before washing of the face, and preserved as a sample for analysis of SSL-derived RNA at −80° C. for about 1 month. From the preserved oil blotting film, RNA in SSL was extracted in accordance with the same procedure as in Test Example 3, a library was prepared, RNA species were identified through sequencing, and the expression levels of the RNA species were measured.
Measurement of Blood Cortisol Concentration
15 mL of blood was collected from the arm of each subject using a vacuum blood collection tube, and serum was separated and preserved at −80° C. An external inspection organization (LSI Medience Corporation) was commissioned to determine the concentration of cortisol in the preserved serum by a chemiluminescent immunoassay method (CLIA method).
Data Used
As in Test Example 9, data with a read count of less than 10 in the data of the expression levels of SSL-derived RNAs from the subjects (read count values) was set as a missing value, and converted into an RPM value corrected for a difference in the total number of reads between samples, and the missing value was then supplemented by SVD imputation. Only genes for which expression data that is not a missing value had been obtained in 80% or more of all the subjects were used for analysis. Log2 RPM values were used as the expression level data.
Division of Data Set
From the RNA profile data set obtained from the subjects, RNA profile data for 102 subjects, which amounts to 80% of the data set, was randomly extracted, and used as training data for blood cortisol concentration prediction models. RNA profile data for the other 26 subjects, which amounts to 20% of the data set, was used as test data for evaluation of model accuracy.
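The random 80%/20% division can be sketched as below. The seed, the subject identifiers and the rounding down of the training size are assumptions made for reproducibility of the illustration.

```python
import random

def split_subjects(subject_ids, train_fraction=0.8, seed=0):
    """Shuffle the subject identifiers and split them into training and test sets."""
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * train_fraction)  # rounds down (assumption)
    return ids[:n_train], ids[n_train:]

# 128 hypothetical subject identifiers
subjects = [f"S{i:03d}" for i in range(1, 129)]
train_ids, test_ids = split_subjects(subjects)
```

With 128 subjects this yields 102 training subjects and 26 test subjects, which matches the division described above.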
Selection of Feature Genes
1,000 genes having a large Pearson's correlation coefficient with the blood cortisol concentration in the training data were extracted.
Algorithms Used (Hyperparameter Candidate Values)
Support vector machine (C: [0.1, 1, 10], kernel: [‘linear’, ‘rbf’, ‘poly’])
Random forest (max_depth: [1, 2, 3], max_features: [1, 2], n_estimators: [10, 100])
Multilayer perceptron (solver: [‘lbfgs’, ‘adam’], alpha: [0.1, 1, 10])
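The hyperparameter candidate values listed above define a grid for each algorithm. A minimal sketch of how such grids expand into individual candidate settings is given below. The dictionary keys naming the algorithms and the helper function are hypothetical, and the sketch only enumerates settings without training any model.

```python
from itertools import product

# Candidate values as listed above; the algorithm labels are hypothetical
candidate_grids = {
    "support_vector_machine": {
        "C": [0.1, 1, 10],
        "kernel": ["linear", "rbf", "poly"],
    },
    "random_forest": {
        "max_depth": [1, 2, 3],
        "max_features": [1, 2],
        "n_estimators": [10, 100],
    },
    "multilayer_perceptron": {
        "solver": ["lbfgs", "adam"],
        "alpha": [0.1, 1, 10],
    },
}

def expand_grid(grid):
    """Return every combination of candidate values as a list of dicts."""
    names = list(grid)
    return [dict(zip(names, combo)) for combo in product(*(grid[n] for n in names))]

all_candidates = {alg: expand_grid(grid) for alg, grid in candidate_grids.items()}
n_settings = {alg: len(settings) for alg, settings in all_candidates.items()}
```

Under these candidate values the grids contain 9, 12 and 6 settings respectively, 27 in total, each of which would be evaluated in the cross validation.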
Model Construction
Model construction was performed using the machine learning library scikit-learn of Python.
The prediction model was made to learn by performing 10-fold cross validation, where the data of the expression level of SSL-derived RNA (Log2 RPM value) in the training data was used as an explanatory variable, and the blood cortisol concentration was used as an objective variable.
In the cross validation, the expression level data for the 1,000 genes extracted as feature genes was compressed into the first to tenth principal components by principal component analysis, and the model was then made to learn while grid search was performed for each algorithm and hyperparameter candidate value.
The expression level of SSL-derived RNA (Log2 RPM value) in the test data was input to each model after learning to calculate the predicted value, and the model giving the smallest RMSE of the difference between the predicted value and the measured value was selected as an optimum prediction model.
130 healthy females (age: 20s to 50s) having no problem on the skin of the face, the fingers or the upper arms were selected as subjects.
Using an oil blotting film (5 cm×8 cm, 3M Ltd.), sebum was collected from the entire face of each subject before washing of the face, and preserved as a sample for analysis of SSL-derived RNA at −80° C. for about 1 month. From the preserved oil blotting film, RNA in SSL was extracted in accordance with the same procedure as in Test Example 3, a library was prepared, RNA species were identified through sequencing, and the expression levels of the RNA species were measured.
A standard time during which subjects in a certain range of ages had been exposed to sunlight was predicted on the basis of questionnaire studies on lifestyle habits and outdoor leisure activities, and the cumulative ultraviolet exposure time (hour) was calculated with consideration given to the actual age. The items for the questionnaire studies were prepared on the basis of the questionnaire on the light exposure history published by the National Cancer Institute (Arch. Dermatol. 144, 217-22 (2008)).
As in Test Example 9, data with a read count of less than 10 in the data of the expression levels of SSL-derived RNAs from the subjects (read count values) was set as a missing value, and converted into an RPM value corrected for a difference in the total number of reads between samples, and the missing value was then supplemented by SVD imputation. Only genes for which expression data that is not a missing value had been obtained in 80% or more of all the subjects were used for analysis. Log2 RPM values were used as the expression level data.
Division of Data Set
From the RNA profile data set obtained from the subjects, RNA profile data for 104 subjects, which amounts to 80% of the data set, was randomly extracted, and used as training data for cumulative ultraviolet exposure time prediction models. RNA profile data for the other 26 subjects, which amounts to 20% of the data set, was used as test data for evaluation of model accuracy.
Selection of Feature Genes
1,000 genes having a large Pearson's correlation coefficient with the cumulative ultraviolet exposure time in the training data were extracted. In addition to these 1,000 genes, the ages of the subjects were added as a feature.
Algorithms Used (Hyperparameter Candidate Values)
Algorithms identical to those in Test Example 10 were used.
Model Construction
10-fold cross validation was performed in the same manner as in Test Example 10, and the model giving the smallest RMSE of the difference from the measured value was selected as an optimum prediction model. In addition to the 1,000 genes selected above as features, the ages of the subjects were used.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2019/043040 | 11/1/2019 | WO | 00 |