The present invention relates to a space evaluation system.
As awareness of maintaining and improving people's health and their mental and physical functions grows, spaces that improve labor productivity and have a strong stress-reducing effect are attracting attention. For example, it is well known that living alongside plants can be expected to have a healing effect, and there is demand for spaces that incorporate biophilic design and let one “feel as if one were in a natural forest”. Biophilic design is the practice of designing a space based on the concept of biophilia, namely that “humans have an instinctive desire to connect with nature”. In space design such as biophilic design, it is important to ascertain how close the space is to a natural environment.
Approaches for objectively evaluating a natural environment have been proposed. Patent Literature 1 discloses a method of evaluating a forest area by analyzing images of tree-trunk shapes captured from above the forest area together with spectral analysis results. Patent Literature 2 discloses an approach for evaluating naturalness by ascertaining the state of material circulation from data on the amount of plants and data on microbial activity in a natural environment.
Evaluation approaches that focus on the degree of naturalness felt by humans have also been proposed. For example, Patent Literature 3 discloses a space evaluation approach in which physiological response information is acquired both in a space in a forest and in a space in an urban area, and whether the space in the forest is suitable for forest bathing is determined based on the difference in physiological response information between the two. Non Patent Literature 1 discloses a method for evaluating the degree of naturalness of a space based on evaluation items including the light and colors in an indoor space, fractal structures of a landscape, the presence or absence of living organisms in the space, and the like.
However, the approach disclosed in Patent Literature 1 mainly involves analysis of image data captured from the air, and is therefore limited to image-based evaluation. The approach disclosed in Patent Literature 2 is not applicable if there is no soil in the target space. According to the approach disclosed in Patent Literature 3, evaluating an unknown space requires acquiring and then analyzing relative changes in physiological response information between a plurality of different spaces, which takes significant effort and time. In addition, because the evaluation result depends largely on individual differences between the subjects providing the physiological response information, it is difficult to evaluate the space quantitatively. According to the approach disclosed in Non Patent Literature 1, because the evaluation items are mainly based on visual information and each item is rated on a three-level scale, the amount of information extracted is small. Further, because the approach is limited to indoor spaces, it is difficult to evaluate naturalness relative to a natural environment.
The present invention was made in view of the foregoing, and it is an object of the present invention to provide a novel space evaluation system capable of simply and quantitatively evaluating how close an unknown space to be evaluated is to a natural environment.
In a space design such as a biophilic design, it is important to ascertain the “naturalness” as an index of how close the space is to a natural environment. The inventors have found that the naturalness of a space is affected by the quality of air present in the space (hereafter also referred to as “air quality”). In particular, the inventors have found that the naturalness of a space is greatly affected by microbes present in the air in the space.
In order to solve the problem, a space evaluation system according to the present invention includes a setting unit in which naturalness as an index of how close a space is to a natural environment is set; and an estimating unit for estimating, from air quality data indicating a type of material including a microbe included in a sample collected from air in a target space to be evaluated and indicating an abundance of each material, the naturalness of the target space from which the sample has been collected.
Thus, the space evaluation system can estimate naturalness from the air quality data alone, simply by collecting a sample from air in a target space that may be freely determined and acquiring air quality data of the collected sample. That is, the space evaluation system can estimate naturalness without capturing an image of the target space from above, acquiring physiological response information in the target space, or performing sensory evaluation each time. In addition, the space evaluation system is applicable whether the target space is a space having no soil, such as an indoor space, or an outdoor space closer to a natural environment, and can estimate naturalness irrespective of the attributes of the target space. Accordingly, the space evaluation system can simply and quantitatively evaluate how close an unknown space is to a natural environment.
In a further preferred embodiment, in the setting unit, the naturalness may be set based on environment data indicating conditions of a plurality of specific spaces. The environment data may include data acquired in each of the plurality of specific spaces having different environments.
Thus, the space evaluation system can establish the naturalness as an index that enables objective evaluation of various spaces having different environments. Accordingly, the space evaluation system can accurately estimate naturalness by means of the estimating unit, and can therefore accurately evaluate how close an unknown space is to a natural environment.
In a further preferred embodiment, the environment data may include quantitative data acquired by a sensor in the specific spaces and qualitative data acquired in the specific spaces through sensory evaluation.
Thus, the space evaluation system can calculate and set naturalness by combining data from different perspectives, including quantitative data and qualitative data, and can therefore establish the naturalness as an index likely to enable comprehensive evaluation from various viewpoints. In particular, because the environment data includes qualitative data acquired through sensory evaluation, the space evaluation system can establish the naturalness as an index approximating human sensory evaluation results. Thus, the space evaluation system can more accurately estimate naturalness by means of the estimating unit, and can therefore more accurately evaluate how close an unknown space is to a natural environment.
In a further preferred embodiment, the estimating unit may be trained by machine learning to calculate the naturalness, using as training data a data set in which the air quality data of a sample for learning collected from air in each of the plurality of specific spaces is associated with the naturalness corresponding to each of the plurality of specific spaces.
Thus, the space evaluation system can more simply and accurately estimate naturalness only from the air quality data of the target space that may be freely determined, and can therefore more simply and accurately evaluate how close an unknown space is to a natural environment.
In a further preferred embodiment, the air quality data may be acquired by analyzing, by means of an analysis device, a sample collected by a collecting device. In the setting unit, one or both of the air quality data of the material present in the collecting device before the sample is collected and the air quality data of the material present in the analysis device before the sample is analyzed may be set as the air quality data of a negative control sample. The estimating unit may estimate a contaminated proportion of the air quality data of the negative control sample that contaminates the air quality data of the sample collected in the target space, and may estimate the naturalness of the target space from the air quality data of the target space from which the air quality data of the negative control sample has been removed.
Thus, the space evaluation system can estimate naturalness from the true air quality data of the collected sample. Accordingly, the space evaluation system can more accurately estimate naturalness by means of the estimating unit, and can therefore more accurately evaluate how close an unknown space is to a natural environment.
According to the present invention, it is possible to provide a novel space evaluation system capable of simply and quantitatively evaluating how close an unknown space to be evaluated is to a natural environment.
In the following, embodiments of the present invention will be described with reference to the drawings. Configurations referred to by like reference signs in the respective embodiments have like or similar functions in the respective embodiments unless otherwise noted, and their description may be omitted.
With reference to
The space evaluation system 1 is a system for evaluating how close various spaces, including outdoor spaces such as a forest or an urban area and indoor spaces such as an office or a residence, are to a natural environment. The space evaluation system 1 is effective in realizing a space that incorporates biophilic design. In space design for creating spaces where people coexist with plants and can experience nature, such as biophilic design, it is important to ascertain “naturalness” as an index of how close the space is to a natural environment. Further, in addition to sensory stimuli perceived visually and aurally, people are also affected by the air quality of a space. In such space designs, it is therefore important to evaluate the naturalness of a space with attention to air quality as well.
The present embodiment introduces a biophilic score (hereafter also referred to as “BPS”) as the naturalness of a space focusing also on air quality. The BPS is calculated by analyzing “environment data” indicating the condition of a space, such as its temperature and humidity, using a statistical approach. Details of the environment data and the calculation of the BPS will be described below with reference to
The space evaluation system 1 estimates the BPS of an unknown space to be evaluated (hereafter also referred to as a “target space”) from data indicating the air quality (hereafter also referred to as “air quality data”) of the target space. The target space is a space that may be freely determined, whether an indoor space or an outdoor space. The air quality data of the target space is data that indicates the type of materials including microbes contained in a sample collected from the air in the target space, and that indicates the abundance of each of the materials (relative abundance).
In addition to microbes, examples of the materials included in the samples used in the space evaluation system 1 include inorganic gases, volatile organic compounds, and allergens. Microbes are present in various environments and are known to affect, for example, material circulation and the health of a host. Microbes present in the air in the target space affect the quality of the air in the target space. In the present embodiment, attention is focused on microbes as the materials included in the samples used in the space evaluation system 1, and microbial community structure data of the target space is adopted as the air quality data of the target space. The microbial community structure data of the target space is data that indicates the types of microbes (microbial strains) belonging to the microbial community included in a sample collected from the air in the target space, and the abundance (relative abundance) of each of the microbes.
As illustrated in
The arithmetic processing device 10 includes an estimating unit 11 for estimating the BPS of a target space from the microbial community structure data of the target space, and a setting unit 12 in which the microbial community structure data and the BPS of reference spaces are set. The estimating unit 11 is comprised of a mathematical model (hereafter also referred to as an “estimation model”) for estimating the BPS of a target space from the microbial community structure data of the target space.
In the present embodiment, the estimating unit 11 has been trained by machine learning to calculate the BPS from the microbial community structure data of the target space, using as training data a data set in which the microbial community structure data of samples for learning collected from the air in each of a plurality of reference spaces is associated with a BPS corresponding to each of the plurality of reference spaces. Each of the plurality of reference spaces is a predetermined space for collecting the samples for learning. In the present embodiment, the spaces adopted as the plurality of reference spaces include various outdoor spaces such as a forest, a park, and an urban area; various indoor spaces such as an office, a laboratory, and a residence; and an experimentally fabricated indoor afforestation space. The reference spaces are an example of a “specific space” set forth in the claims.
Because the estimating unit 11 has been trained by machine learning on this data set to calculate the BPS from the microbial community structure data of the target space, the space evaluation system 1 can more simply and accurately estimate the BPS from the air quality data of the target space alone. Thus, the space evaluation system 1 can more simply and accurately evaluate how close an unknown space is to a natural environment.
A procedure for constructing the BPS estimation model constituting the estimating unit 11 will be described. In the estimation model learning stage, first, a sample for learning is collected from the air in each of a plurality of predetermined reference spaces. The structure of a microbial community included in each of the collected samples is analyzed to acquire the microbial community structure data for each of the plurality of reference spaces. Also, environment data is acquired in each of the plurality of reference spaces. Based on the acquired environment data, a BPS is calculated. Then, the microbial community structure data for each of the plurality of reference spaces is associated with the BPS corresponding to each of the plurality of reference spaces to create a data set. The created data set is set in the setting unit 12. The setting unit 12 sets the data set in the estimation model as training data, and trains the estimation model by machine learning to calculate the BPS with respect to the microbial community structure data of the target space. In this way, a trained estimation model is constructed. In the space evaluation system 1, the processing for implementing the setting of the training data and machine learning with respect to the estimation model may be performed by the setting unit 12.
In the estimation model learning stage, in addition to the data set, microbial community structure data of a negative control sample (hereafter also referred to as “NC sample”) is set in the estimation model. The NC sample is essentially material that exists in the air of neither the reference spaces nor the target space, but that could enter during the process of collecting samples from the air in the reference spaces or the target space and acquiring their microbial community structure data. The NC sample is, for example, material present in a collecting device such as an air sampler used for collecting a sample from the air, in an analysis device for the collected sample, or in a reagent or the like. In the present embodiment, microbial community structure data of microbes present in the collecting device before a sample is collected, and/or microbial community structure data of microbes present in the analysis device before a sample is analyzed, is set in the setting unit 12 in advance as the microbial community structure data of the NC sample. The setting unit 12 sets the microbial community structure data of the NC sample in the estimation model, and then performs the machine learning using the data set and the microbial community structure data of the NC sample to construct the trained estimation model. The acquisition of the microbial community structure data will be described below with reference to
A procedure for estimating the BPS of the target space by the BPS estimation model constituting the estimating unit 11 will be described. In the BPS estimation model utilization stage, first, a sample is collected from the air in the target space. The structure of microbial communities included in the collected sample is analyzed to acquire the microbial community structure data of the target space. The microbial community structure data of the target space is then input into the trained BPS estimation model to estimate the BPS of the target space. Specifically, in the trained BPS estimation model, the contaminated proportion of the microbial community structure data of the NC sample that contaminates the microbial community structure data of the sample collected in the target space is estimated, and the BPS of the target space is estimated from the microbial community structure data of the target space from which the microbial community structure data of the NC sample has been excluded.
Accordingly, the space evaluation system 1 can estimate the BPS from the true microbial community structure data of the sample collected in the target space. Conventionally, it has been difficult to appropriately estimate the contaminated proportion of the microbial community structure data of the NC sample, and therefore it has been difficult to acquire the true microbial community structure data of the sample collected in the target space. The space evaluation system 1 can estimate the contaminated proportion of the microbial community structure data of the NC sample that contaminates the microbial community structure data of the target space, and can estimate the BPS from the true microbial community structure data of the collected sample. Thus, the space evaluation system 1 can more accurately estimate the BPS by means of the estimating unit 11, and can therefore more accurately evaluate how close an unknown space is to a natural environment.
It is noted that the estimating unit 11 is not limited to an estimation model constructed by machine learning as described above. The estimating unit 11 may be comprised of a relational expression, a table, a graph, or the like describing the relationship between the microbial community structure data acquired in each of a plurality of reference spaces and the BPS.
With reference to
The BPS is calculated based on environment data acquired in each of a plurality of reference spaces having different environments. The plurality of reference spaces having different environments may comprise, for example, reference spaces containing different amounts of artificial objects, such as concrete buildings, and natural objects, such as forests. The BPS values calculated based on the environment data indicating the condition of each of the plurality of reference spaces are stored in the setting unit 12.
Thus, the space evaluation system 1 can establish the BPS as an index that enables objective evaluation of a plurality of reference spaces having different environments. Accordingly, the space evaluation system 1 can accurately estimate naturalness by means of the estimating unit 11, and can therefore accurately evaluate how close an unknown space is to a natural environment.
One environment data acquired in one reference space includes, as illustrated in
Thus, the space evaluation system 1 can calculate and set the BPS by combining data from different perspectives, including quantitative data and qualitative data. Accordingly, the space evaluation system 1 can establish the BPS as an index likely to enable comprehensive evaluation from various viewpoints. In particular, because the environment data includes qualitative data acquired through sensory evaluation, the space evaluation system 1 can establish the naturalness as an index approximating human sensory evaluation results. Accordingly, the space evaluation system 1 can more accurately estimate naturalness by means of the estimating unit 11, and can therefore more accurately evaluate how close an unknown space is to a natural environment.
The acquired environment data is associated with the sample collected in the reference space in which the environment data has been acquired, and is stored in a table shown at the top of
The BPS is calculated by performing multiple factor analysis (MFA) on the environment data. Specifically, principal component analysis is first performed on the quantitative data included in the environment data, and multiple correspondence analysis is performed on the qualitative data included in the environment data; singular value decomposition is then performed for each. As a scaling process for unifying the scales of the two data types, all of the quantitative data is divided by the first singular value obtained from the singular value decomposition of the quantitative data, and all of the qualitative data is divided by the first singular value obtained from the singular value decomposition of the qualitative data. The table storing the scaled quantitative data and the table storing the scaled qualitative data are then integrated, and principal component analysis is performed on all of the data in the integrated table. In this way, multi-dimensional environment data including a plurality of quantitative data items and a plurality of qualitative data items is dimensionally compressed into one-dimensional continuous-value data, illustrated by the number line shown at the bottom of
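The MFA steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the embodiment: quantitative items are standardized before PCA, and the qualitative group uses a simplified MCA-style indicator coding (one-hot columns centered and divided by the square root of each category's proportion) in place of a full multiple correspondence analysis. The function names are hypothetical.

```python
import numpy as np


def first_singular_value(x: np.ndarray) -> float:
    """Largest singular value of a matrix."""
    return float(np.linalg.svd(x, compute_uv=False)[0])


def indicator_block(qual: np.ndarray) -> np.ndarray:
    """One-hot code each qualitative column, centered and weighted MCA-style."""
    blocks = []
    for col in qual.T:
        levels = np.unique(col)
        onehot = (col[:, None] == levels[None, :]).astype(float)
        p = onehot.mean(axis=0)                   # proportion of each category
        blocks.append((onehot - p) / np.sqrt(p))  # center, then weight by 1/sqrt(p)
    return np.hstack(blocks)


def mfa_first_axis(quant: np.ndarray, qual: np.ndarray) -> np.ndarray:
    """Compress mixed environment data into one continuous score per sample."""
    # Quantitative group: center and standardize each item (PCA on correlations).
    xq = (quant - quant.mean(axis=0)) / quant.std(axis=0)
    # Qualitative group: simplified indicator coding in place of a full MCA.
    xc = indicator_block(qual)
    # Scaling: divide each whole group by its own first singular value.
    xq = xq / first_singular_value(xq)
    xc = xc / first_singular_value(xc)
    # Integrate the two tables and run PCA on the combined data.
    x = np.hstack([xq, xc])
    x = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, 0] * s[0]  # scores on the first axis; the sign is arbitrary
```

Because the sign of a principal axis is arbitrary, the resulting score would still need to be oriented (for example, so that forest spaces fall on one agreed side) before being read as a BPS; constant quantitative items would also need to be dropped to avoid division by zero.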
On the upper side of the number line illustrated in
The graph shown in
It is noted that, while the environment data shown in
With reference to
In step S501, a sample is first collected from the air in a reference space. Specifically, a collecting device such as the MD8 Airscan or AirPort from Sartorius AG is used with a gelatin filter to draw in 3000 L of air, so that the airborne microbial community is adsorbed onto the gelatin filter.
In step S502, DNA is extracted from the collected sample. Specifically, the gelatin filter is dissolved and filtered, and DNA is extracted using DNeasy PowerWater Kit from QIAGEN.
In step S503, a library is prepared. Specifically, primers targeting the V1-V2 region of the 16S rRNA gene are used, and PCR amplification is performed in accordance with the standard protocol of Illumina, Inc. to prepare the library.
In step S504, DNA sequencing is performed. Specifically, the iSeq 100 sequencer from Illumina, Inc. is used, and 2×150 bp paired-end sequencing is performed.
In step S505, metagenome analysis is performed. Taxonomic composition data of the microbial community is obtained by shotgun metagenomic sequencing or 16S rRNA amplicon sequencing. In the case of 16S rRNA amplicon sequencing in particular, the forward reads are analyzed with QIIME 2 after adapter sequences are removed. In this way, the microbial community structure data of the sample collected from the air in the reference space is acquired.
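The microbial community structure data ends up as a table of relative abundances per sample. A minimal sketch of that final conversion, assuming the per-sample sequence counts per microbial type have already been exported into a plain Python dictionary (the function name and example types are hypothetical, and no QIIME 2 API is used):

```python
import numpy as np


def relative_abundance(counts):
    """Convert per-sample sequence counts into relative abundances.

    counts maps sample ID -> {microbial type -> number of sequences}.
    Returns (sample IDs, microbial types, matrix whose rows sum to 1).
    """
    samples = sorted(counts)
    types = sorted({t for per_sample in counts.values() for t in per_sample})
    # Types absent from a sample get a count of zero.
    mat = np.array([[counts[s].get(t, 0) for t in types] for s in samples], dtype=float)
    totals = mat.sum(axis=1, keepdims=True)
    if np.any(totals == 0):
        raise ValueError("every sample needs at least one sequence")
    return samples, types, mat / totals
```

For example, `relative_abundance({"s1": {"A": 3, "B": 1}, "s2": {"B": 2}})` yields the rows `[0.75, 0.25]` and `[0.0, 1.0]` for types `A` and `B`.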
It is noted that the procedure for acquiring the microbial community structure data of a sample collected from the air in the target space involves steps similar to steps S501 to S505 described above. The procedure for acquiring the microbial community structure data of an NC sample involves steps similar to steps S502 to S505 described above, except that in step S501 no sample is collected from the air in the reference space or the target space.
With reference to
A number of machine learning approaches are available for learning a conversion from multivariate data, such as microbial community structure data, into numerical data, such as the BPS. Among them, non-linear transform approaches such as random forests and deep learning are known to have high prediction accuracy and have been widely used. However, the transform rules learned by these non-linear approaches are generally difficult to interpret. In the present embodiment, it is preferable to construct an estimation model that clearly indicates the relationship between the microbial community structure data and the BPS. For example, it is preferable to construct an estimation model that clearly indicates which partial community (a constituent unit of a microbial community; hereafter referred to as a “sub-community”) should be added to or removed from the microbial community structure data to change the BPS. Further, the process of acquiring microbial community structure data is inherently probabilistic. In general, the “true microbial community” contained in a sample cannot be observed directly; microbial community structure data is always obtained by probabilistic sampling from the sample. With a deterministic approach such as deep learning, it is not easy to capture this probabilistic property of the data.
Accordingly, in the present embodiment, supervised Latent Dirichlet Allocation (hereafter also referred to as “sLDA”), a type of topic model, is adopted as the machine learning approach for the BPS estimation model. In addition, in the present embodiment, the microbial community structure data of the NC samples is set in the estimation model in advance. sLDA is a modeling approach that learns auxiliary information and count data simultaneously to extract “topics”. In sLDA, each topic is linked with a regression coefficient for the auxiliary information (a one-dimensional continuous value). It is noted that while sLDA is adopted in the present embodiment as the machine learning approach for the BPS estimation model, other approaches may be adopted.
The variables used in mathematical expressions describing the BPS estimation model are defined as follows:
The generative process of the BPS estimation model is as follows:
5-2. $y_d \sim \mathcal{N}(\eta^{\top} z_d, 1.0)$
Bayesian inference of the unknown parameters is performed. First, consider a conventional sLDA in which the microbial community structure data of an NC sample is not set in the estimation model. The joint probability of the estimation model is described as follows:
where $B$ is the multinomial beta function. Integrating out $\theta$ and $\phi$ gives:
$N_d$ is the number of DNA sequences in the sample d, counted per topic, and $N_k$ is the number of DNA sequences assigned to the topic k, counted per microbial type. The quantity of interest is the posterior distribution of $z$ and $\eta$:
Since the computation of the denominator is intractable, the posterior distribution is approximated by Gibbs sampling.
The full conditional distribution of the topic $z_{dn}$ of the DNA sequence n in the sample d is described as follows:
First, the term $\prod_{k=1}^{K} B(N_k + \beta)$ is described as follows:
where the superscript $\setminus z_{dn}$ on a variable denotes the corresponding count computed with $z_{dn}$ removed. The property $\Gamma(x+1) = x\Gamma(x)$ of the gamma function is also used.
Likewise, the term $\prod_{d=1}^{D} B(N_d + \alpha)$ is described as follows:
Finally, the term $\prod_{d=1}^{D} \mathcal{N}(y_d \mid \eta^{\top} z_d, 1.0)$ is computed as follows:
where $\delta_{d,d'}$ is the Kronecker delta.
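As a worked step for the first term, write $w_{dn}$ for the microbial type of sequence n in sample d, $N_{kv}$ for the number of sequences of type v assigned to topic k, and $V$ for the number of microbial types, with a symmetric $\beta$ (this notation is assumed here, since the variable definitions are not reproduced). Using the gamma-function property, adding one count to component v of the argument of a multinomial beta function changes it by a simple ratio, and only the factor for the topic receiving $z_{dn}$ changes:

$$
B(\mathbf{a}) = \frac{\prod_{v=1}^{V}\Gamma(a_v)}{\Gamma\left(\sum_{v=1}^{V} a_v\right)}, \qquad
\frac{B(\mathbf{a}+\mathbf{e}_v)}{B(\mathbf{a})}
= \frac{\Gamma(a_v+1)}{\Gamma(a_v)} \cdot \frac{\Gamma\left(\sum_{u} a_u\right)}{\Gamma\left(\sum_{u} a_u + 1\right)}
= \frac{a_v}{\sum_{u=1}^{V} a_u}
$$

$$
\left.\frac{\prod_{k'=1}^{K} B(N_{k'}+\beta)}{\prod_{k'=1}^{K} B(N_{k'}^{\setminus z_{dn}}+\beta)}\right|_{z_{dn}=k}
= \frac{N_{k w_{dn}}^{\setminus z_{dn}} + \beta}{\sum_{v=1}^{V} N_{kv}^{\setminus z_{dn}} + V\beta}
$$

The same reasoning applied to $B(N_d+\alpha)$ gives a factor $N_{dk}^{\setminus z_{dn}} + \alpha$, up to a denominator that does not depend on k.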
From the above, the full conditional distribution of the topic $z_{dn}$ of the DNA sequence n in the sample d is described as follows:
Next, consider the full conditional distribution of the weight parameter $\eta_k$ of the topic k. Although the conditional distribution of $\eta$ can be derived exactly, it can be shown that, as a simple approximation based on a Bayesian linear regression model, the distribution is centered on the least-squares solution of the following equation:
where $Z$ is the matrix $Z = (z_1, \ldots, z_D)$ whose d-th column $z_d$ is the topic composition of the sample d.
Through the derivation above, a method for updating $z_{dn}$ and $\eta$ at each step of Gibbs sampling has been obtained. In implementation, random topics are first assigned to all DNA sequences of all samples; then all $z_{dn}$ are sampled and updated according to equation (1), and $\eta$ is updated by solving equation (2). This is repeated until the joint probability of the entire model converges.
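A minimal Python sketch of this update loop follows, for the conventional sLDA without NC samples. It assumes symmetric Dirichlet hyperparameters α and β, microbial types indexed 0 to V-1, non-empty samples, a fixed number of sweeps in place of the convergence check on the joint probability, and the least-squares solution used directly as the update of η; the function and argument names are hypothetical.

```python
import numpy as np


def slda_gibbs(docs, y, n_topics, n_types, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampler for sLDA with a unit-variance Gaussian response.

    docs: list of non-empty integer arrays; docs[d][n] is the microbial type
          of DNA sequence n in sample d.
    y:    BPS values, one per sample.
    """
    rng = np.random.default_rng(seed)
    D, K, V = len(docs), n_topics, n_types
    y = np.asarray(y, dtype=float)
    # Assign a random topic to every DNA sequence of every sample.
    z = [rng.integers(K, size=len(doc)) for doc in docs]
    ndk = np.zeros((D, K))  # sequences in sample d assigned to topic k
    nkv = np.zeros((K, V))  # sequences of type v assigned to topic k
    for d, doc in enumerate(docs):
        np.add.at(ndk[d], z[d], 1)
        np.add.at(nkv, (z[d], doc), 1)
    nk = nkv.sum(axis=1)    # total sequences assigned to topic k
    eta = np.zeros(K)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            nd = len(doc)
            for n, v in enumerate(doc):
                k_old = z[d][n]
                # Exclude the current assignment to obtain the "\z_dn" counts.
                ndk[d, k_old] -= 1
                nkv[k_old, v] -= 1
                nk[k_old] -= 1
                # LDA factors: sample-topic term times topic-type term (log scale).
                logp = (np.log(ndk[d] + alpha)
                        + np.log(nkv[:, v] + beta)
                        - np.log(nk + V * beta))
                # Response factor: row k is the composition of sample d if z_dn = k.
                zbar = (ndk[d] + np.eye(K)) / nd
                logp -= 0.5 * (y[d] - zbar @ eta) ** 2
                p = np.exp(logp - logp.max())
                k_new = rng.choice(K, p=p / p.sum())
                z[d][n] = k_new
                ndk[d, k_new] += 1
                nkv[k_new, v] += 1
                nk[k_new] += 1
        # Point update of eta: least-squares fit of y on the topic compositions.
        comp = ndk / ndk.sum(axis=1, keepdims=True)
        eta, *_ = np.linalg.lstsq(comp, y, rcond=None)
    return eta, ndk, nkv
```

Each sweep visits every DNA sequence of every sample once, so a practical implementation for hundreds of samples would vectorize or compile the inner loop.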
In the present embodiment, in addition to the sLDA learning, the microbial community structure data of the NC samples is set in the estimation model in advance. In this case, the part of the Gibbs sampling update that needs to be modified is the first term on the right-hand side of equation (1). Because the microbial community structure data of the NC samples remains fixed throughout the learning process, equation (1) is modified as follows:
In equation (3), the upper term of $L_k$ corresponds to conventional topic estimation, and the lower term of $L_k$ corresponds to the NC samples.
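The exact form of equation (3) is not reproduced here, so the following is only one possible reading, stated as an assumption: a dedicated “NC topic” whose profile over microbial types is estimated once from the NC samples and then held fixed, while all other topics use the usual collapsed counts. This per-topic factor would replace the topic-type factor of equation (1). The function names and the use of topic index 0 for the NC topic are hypothetical.

```python
import numpy as np


def nc_profile(nc_counts: np.ndarray, beta: float = 0.01) -> np.ndarray:
    """Fixed NC profile: smoothed relative abundance pooled over all NC samples.

    nc_counts: (number of NC samples, V) array of sequence counts per microbial type.
    """
    pooled = nc_counts.sum(axis=0) + beta
    return pooled / pooled.sum()


def type_term(nkv: np.ndarray, v: int, beta: float,
              profile: np.ndarray, nc_topic: int = 0) -> np.ndarray:
    """Per-topic factor for microbial type v in the Gibbs update.

    Ordinary topics use the collapsed counts (conventional topic estimation);
    the hypothetical NC topic uses the fixed profile, which never changes
    during learning.
    """
    V = nkv.shape[1]
    term = (nkv[:, v] + beta) / (nkv.sum(axis=1) + V * beta)
    term[nc_topic] = profile[v]
    return term
```

With this reading, sequences of a type never seen in the NC samples get near-zero probability of being attributed to contamination, which is the intended effect of fixing the NC data.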
In the BPS estimation model, the microbial community structure is partitioned into a set of sub-communities. For example, one sub-community is derived from humans, while another is derived from the natural environment. These sub-communities are the topics estimated in the model. In a sample collected from the air, these topics are mixed together. How the topics are mixed (which topics are dominant and to what degree) varies from sample to sample. Further, not all of the microbes belonging to a topic are observed in the sample; instead, what is observed is the result of stochastic sampling in accordance with the community structure (types of microbes and their abundance) of each topic.
In addition, each sample has a BPS calculated independently of the microbial community structure data. In the BPS estimation model, it is assumed that the BPS of each sample is determined by how the topics are mixed (their mixing proportions). For example, one topic may have a negative influence on the BPS (tending to decrease it), while another has a positive influence (tending to increase it). The parameter representing the influence of each topic on the increase or decrease of the BPS is the η parameter. In the BPS estimation model, it is assumed that the BPS of each sample is given by the inner product of the mixing proportions of the topics (topic composition) in that sample and the η parameter.
In the present embodiment, 585 samples were collected from the air in the reference spaces, and the microbial community structure data and the BPS of each sample were acquired. In addition, 27 NC samples were prepared, and their microbial community structure data was acquired. These data were set in the estimation model and machine learning was performed, whereby 12 topics, Topic #0 to Topic #11, were extracted. The number of topics (12) was chosen after verifying in advance that increasing the number of topics further would not significantly improve the model's estimation accuracy.
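The assumed relationship between topic composition, the η parameter, and the BPS reduces to a single inner product. A small sketch with hypothetical numbers (the function name is also hypothetical):

```python
import numpy as np


def bps_from_composition(theta: np.ndarray, eta: np.ndarray) -> float:
    """BPS of one sample: inner product of its topic composition and eta."""
    if theta.shape != eta.shape:
        raise ValueError("theta and eta must have one entry per topic")
    if not np.isclose(theta.sum(), 1.0):
        raise ValueError("a topic composition must sum to 1")
    return float(theta @ eta)


# Hypothetical three-topic example: topic 0 raises the BPS, topic 1 lowers it.
theta = np.array([0.5, 0.3, 0.2])
eta = np.array([2.0, -1.0, 0.5])
bps = bps_from_composition(theta, eta)  # 0.5*2.0 + 0.3*(-1.0) + 0.2*0.5, about 0.8
```

Because the model is linear in the composition, shifting weight from a topic with negative η to one with positive η raises the BPS by an amount that can be read directly from the η values, which is the interpretability the embodiment seeks.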
The graph of
As in the samples of “Sample #1” and “Sample #5” illustrated in
As illustrated in
In the present embodiment, the prediction accuracy of the model was estimated by 5-fold cross-validation. Specifically, the data sets (microbial community structure data and BPS) of the 585 samples were first divided into five subsets. Four of the five subsets were used for model training, and the remaining subset was used for testing: BPS estimates were predicted for the test samples and compared with the ground-truth BPS to estimate the accuracy of the model. This process was repeated five times, with each subset serving once as test data, to verify the estimation model.
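The 5-fold split can be sketched as follows, assuming the samples are shuffled once with a fixed seed and dealt into folds; how samples were actually assigned to folds is not stated in the embodiment, and the training and BPS prediction steps are left out. With 585 samples, each round uses 468 samples for training and 117 for testing.

```python
import random

def cross_validation_splits(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Indices are shuffled once and dealt into k folds; each fold serves once as
    test data while the remaining k - 1 folds form the training data.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

for train, test in cross_validation_splits(585):
    # Train on `train`, predict BPS for `test`, compare with ground truth (omitted).
    print(len(train), len(test))  # 468 117 in every round
```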
When estimating the BPS by inputting the test data into the trained estimation model, the parameters of the estimation model were first used to estimate the mixing proportions of the topics (topic composition) of each test sample from its microbial community structure data. Then, the inner product of the mixing proportions of the topics in each test sample and the η parameter was calculated and converted into a BPS.
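The two prediction steps can be sketched as follows. The inference method the embodiment uses for the topic composition of test data is not stated here; this sketch substitutes a simple expectation-maximization (EM) fit of the mixing proportions with the trained topic community structures held fixed, then takes the inner product with η. The function name, the topics, and the η values are hypothetical.

```python
def estimate_topic_composition(counts, phi, n_iter=200):
    """Estimate a test sample's topic mixing proportions with topics held fixed.

    counts: observed count of each microbe in the test sample.
    phi:    phi[k][v], community structure of topic k learned during training.
    EM updates for the mixture weights only (a stand-in, not the embodiment's method).
    """
    n_topics = len(phi)
    theta = [1.0 / n_topics] * n_topics
    for _ in range(n_iter):
        expected = [0.0] * n_topics
        for v, c in enumerate(counts):
            if c == 0:
                continue
            weights = [theta[k] * phi[k][v] for k in range(n_topics)]
            norm = sum(weights)
            if norm == 0.0:
                continue  # microbe absent from every topic: carries no information
            for k in range(n_topics):
                expected[k] += c * weights[k] / norm
        total = sum(expected)
        if total == 0.0:
            raise ValueError("no observed microbe is explained by the topics")
        theta = [x / total for x in expected]
    return theta

phi = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]  # hypothetical trained topics
eta = [-1.0, 2.0]                                    # hypothetical eta parameter
theta = estimate_topic_composition([30, 30, 20, 20], phi)
bps = sum(t * e for t, e in zip(theta, eta))         # inner product -> BPS
print([round(t, 3) for t in theta], round(bps, 3))   # [0.6, 0.4] 0.2
```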
The graph shown in
As described above, the space evaluation system 1 of the present embodiment includes the setting unit 12, in which the naturalness (BPS), an index of how close a space is to a natural environment, is set. The space evaluation system 1 of the present embodiment further includes the estimating unit 11, which estimates the naturalness (BPS) of a target space to be evaluated from the air quality data (microbial community structure data) of a sample collected from the air in that space, the air quality data indicating the types of materials, including microbes, contained in the sample and the abundance of each material.
Thus, the space evaluation system 1 of the present embodiment can estimate naturalness from air quality data alone, simply by collecting a sample from the air in a freely chosen target space and acquiring the air quality data of the collected sample. That is, there is no need to capture an image of the target space from above, acquire physiological response information in the target space, or perform sensory evaluation each time. In addition, the space evaluation system 1 of the present embodiment is applicable irrespective of the attributes of the target space, whether it is a space without soil, such as an indoor space, or an outdoor space closer to a natural environment. Conventionally, the degree of air contamination has been expressed as an index and evaluated in terms of inorganic gases, volatile organic compounds, and the like; however, air quality data has not been used to evaluate naturalness, and no model for estimating naturalness from air quality data has previously existed. Because the space evaluation system 1 of the present embodiment can estimate naturalness from the air quality data of a freely chosen target space alone, it can simply and quantitatively evaluate how close an unknown space is to a natural environment.
Further, in the space evaluation system 1 of the present embodiment, the machine learning for the naturalness estimation model constituting the estimating unit 11 is performed by sLDA (supervised latent Dirichlet allocation), a type of topic model.
Thus, the space evaluation system 1 of the present embodiment can extract, for example, the structure (i.e., topics) of a sub-community that exists in microbial community structure data and affects naturalness. Accordingly, the space evaluation system 1 of the present embodiment can more accurately estimate naturalness by means of the estimating unit 11, and can therefore more accurately evaluate how close an unknown space is to a natural environment.
As noted above, other machine learning approaches, such as random forest and deep learning, are also applicable to the estimation model. With such approaches, however, it is not easy to extract the structure of a subcommunity that exists in the microbial community structure data and affects naturalness. Further, because acquiring microbial community structure data is essentially a process of sampling from a "true microbial community", stochastic fluctuation in the data inevitably appears as noise. Deterministic approaches such as deep learning do not readily capture this probabilistic nature of the data, and it is not easy to model the probabilistic sampling process explicitly with them. In addition, depending on the microbial community structure data, sampling may be incomplete and the data may be highly sparse, which can make it difficult to select a regularization means for preventing overfitting in a deterministic approach such as deep learning. For these reasons, it is effective to use the sLDA-based approach of the present embodiment for the estimation model, because it is a stochastic model, can extract the structure of subcommunities, and learns a regression to numerical values.
In addition, because the space evaluation system 1 of the present embodiment is capable of extracting topics affecting naturalness as described above, it can be clearly shown what topics should be added or removed to change naturalness. Thus, with the space evaluation system 1 of the present embodiment, it is possible to simply and quantitatively ascertain the types and abundance of materials related to air quality necessary for obtaining desired naturalness. Accordingly, with the space evaluation system 1 of the present embodiment, it is possible to simply and quantitatively develop a guideline for designing a space having desired naturalness.
With reference to
In the foregoing embodiment, the BPS estimation model constituting the estimating unit 11 uses the above-described data set (microbial community structure data and BPS) and the microbial community structure data of an NC sample to perform machine learning by sLDA. The trained estimation model estimates the contaminated proportion of the microbial community structure data of the NC sample that contaminates the microbial community structure data of the sample collected in the target space, and estimates the BPS of the target space from the microbial community structure data of the target space from which the microbial community structure data of the NC sample has been excluded.
Here, the model itself for estimating the contaminated proportion of the microbial community structure data of the NC sample (hereafter also referred to as “NC estimation model”) can be constructed according to an approach different from the sLDA illustrated in
The variables used in the mathematical expressions for describing the NC estimation model by LDAnc are similar to those described above with reference to
The generative process of the NC estimation model is as follows:
Bayesian inference of the unknown parameters is then performed. As in conventional LDA, all unknown parameters are inferred by collapsed Gibbs sampling. As initial values, a number $k \in \{1, \dots, K, K+1, \dots, K+V\}$, corresponding to one of the K topics or one of the V NC samples, is randomly assigned to each DNA sequence. Gibbs sampling is repeated until the joint probability of the entire model converges. Unlike in conventional LDA, the community structures $\phi_k$ ($k = K+1, \dots, K+V$) of the NC samples are fixed and are not updated during the iterations.
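The sampling loop described above can be sketched as follows. Because the full conditional distribution of the embodiment is not reproduced in this text, the sketch assumes the standard collapsed-Gibbs form for LDA, modified so that the NC numbers use their fixed community structures: for a learned topic the weight is $(n_{dk} + \alpha)(n_{kw} + \beta)/(n_k + W\beta)$, and for an NC number it is $(n_{dk} + \alpha)\,\phi_{kw}$, where counts exclude the sequence being resampled and W is the number of microbe types. Indices are 0-based (0 to K-1 for topics, K to K+V-1 for NC samples), the function name and the hyperparameters α and β are hypothetical, and a fixed iteration count stands in for the convergence check on the joint probability.

```python
import random

def lda_nc_gibbs(docs, n_topics, nc_phi, n_words, alpha=0.1, beta=0.01,
                 n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA with V fixed NC community structures.

    docs:   list of samples, each a list of microbe ids (one per DNA sequence).
    nc_phi: nc_phi[j][w], fixed community structure of NC sample j.
    Returns the number assigned to every DNA sequence: 0..K-1 are learned
    topics, K..K+V-1 are NC samples (assumed full conditional, see lead-in).
    """
    rng = random.Random(seed)
    K, V = n_topics, len(nc_phi)
    n_all = K + V
    n_dk = [[0] * n_all for _ in docs]        # sequences per (sample, number)
    n_kw = [[0] * n_words for _ in range(K)]  # microbe counts, learned topics only
    n_k = [0] * K
    z = []

    def add(d, k, w, delta):
        n_dk[d][k] += delta
        if k < K:                             # NC structures are never counted
            n_kw[k][w] += delta
            n_k[k] += delta

    # Random initial assignment over all K + V numbers.
    for d, doc in enumerate(docs):
        z_d = [rng.randrange(n_all) for _ in doc]
        for w, k in zip(doc, z_d):
            add(d, k, w, 1)
        z.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                add(d, z[d][i], w, -1)        # remove the current assignment
                weights = []
                for k in range(n_all):
                    if k < K:                 # learned topic: collapsed estimate
                        word = (n_kw[k][w] + beta) / (n_k[k] + n_words * beta)
                    else:                     # NC sample: fixed, never updated
                        word = nc_phi[k - K][w]
                    weights.append((n_dk[d][k] + alpha) * word)
                k_new = rng.choices(range(n_all), weights=weights)[0]
                z[d][i] = k_new
                add(d, k_new, w, 1)
    return z

docs = [[0, 1, 0, 2, 3], [1, 1, 3, 2, 0], [2, 3, 3, 2]]  # hypothetical samples
nc_phi = [[0.0, 0.0, 0.5, 0.5]]                          # one hypothetical NC sample
z = lda_nc_gibbs(docs, n_topics=2, nc_phi=nc_phi, n_words=4, n_iter=50)
print(z)
```

Keeping the NC structures out of the `n_kw` counts is what makes them fixed: the sampler can assign sequences to an NC number, but never reshapes that NC community structure.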
The full conditional distribution of the topic $z_{dn}$ assigned to DNA sequence n of sample d is described as follows:
Finally, the number assigned to each DNA sequence is examined, and the DNA sequences assigned numbers corresponding to the NC samples are identified. The proportion of all DNA sequences in the sample that are assigned such numbers is then computed. In this way, the contaminated proportion of the NC samples can be estimated.
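Given the final number assigned to each DNA sequence of one sample, the contaminated proportion is a simple count. This minimal sketch assumes 0-based numbers where values from `n_topics` upward denote NC samples; both function names are hypothetical, and `proportion_by_nc` is an optional extra that splits the contamination by individual NC sample.

```python
from collections import Counter

def contaminated_proportion(assignments, n_topics):
    """Fraction of one sample's DNA sequences assigned an NC number (>= n_topics)."""
    if not assignments:
        raise ValueError("the sample contains no DNA sequences")
    n_nc = sum(1 for k in assignments if k >= n_topics)
    return n_nc / len(assignments)

def proportion_by_nc(assignments, n_topics):
    """Share of the sample attributed to each individual NC sample (0-based)."""
    counts = Counter(k for k in assignments if k >= n_topics)
    return {k - n_topics: c / len(assignments) for k, c in sorted(counts.items())}

z_d = [0, 1, 3, 0, 2, 3, 1, 4]  # hypothetical: K = 3 topics, NC numbers 3 and 4
print(contaminated_proportion(z_d, n_topics=3))  # 0.375
print(proportion_by_nc(z_d, n_topics=3))         # {0: 0.25, 1: 0.125}
```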
The model was validated by simulation using images. Specifically, 10 images were prepared as ground-truth data, and 30 images were prepared as test data. Each of the 10 ground-truth images contained a pattern of predetermined colors and shapes, corresponding to a subcommunity, placed in a different pixel region of the image. The 30 test images contained the patterns corresponding to the subcommunities randomly mixed together. The NC estimation model by LDAnc and an NC estimation model by conventional LDA were then used to estimate the ground-truth patterns from the test data. In the NC estimation model by conventional LDA, all 10 ground-truth patterns were treated as unknown; in the NC estimation model by LDAnc, two of the 10 ground-truth patterns were treated as known and the remaining eight as unknown. Then, the mean absolute error (hereafter also referred to as "MAE") between the estimated patterns and the ground-truth patterns was calculated. This process was repeated 100 times to determine the distribution of the MAE for each NC estimation model.
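The MAE comparison can be sketched as below, treating each pattern as a flat list of pixel values. This assumes the estimated patterns have already been matched to their ground-truth counterparts (with conventional LDA the topic order is arbitrary, so some matching step is needed); the function names are hypothetical, and `mae_distribution` simply summarizes the per-repetition MAEs.

```python
import statistics

def mean_absolute_error(estimated, truth):
    """Mean absolute difference between an estimated and a ground-truth pattern."""
    if len(estimated) != len(truth) or not truth:
        raise ValueError("patterns must be non-empty and of equal length")
    return sum(abs(e - t) for e, t in zip(estimated, truth)) / len(truth)

def mae_distribution(trials):
    """Summarize MAEs over repetitions as (min, median, max)."""
    maes = [mean_absolute_error(e, t) for e, t in trials]
    return min(maes), statistics.median(maes), max(maes)

print(mean_absolute_error([1.0, 0.0, 0.5, 0.5], [0.5, 0.0, 0.5, 1.0]))  # 0.25
```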
Thus, with the NC estimation model by LDAnc, it is possible to estimate the contaminated proportion of the microbial community structure data of NC samples that contaminates the microbial community structure data of the sample collected in the target space with higher estimation accuracy than by the NC estimation model by conventional LDA. With the NC estimation model by LDAnc, it is possible to acquire the true microbial community structure data of the collected sample by subtracting the estimated contaminated proportion of the microbial community structure data of the NC samples from the microbial community structure data of the sample collected in the target space.
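One way to read "subtracting the estimated contaminated proportion" at the level of relative abundances is sketched below. It assumes that the observed profile is a two-component mixture, observed = c × (NC profile) + (1 - c) × (true profile), where c is the estimated contaminated proportion; the true profile is recovered by subtracting the NC share, clipping small negative values, and renormalizing. The function name and this mixture form are assumptions. An alternative that stays closer to the sampler is to drop the DNA sequences assigned to NC numbers and count the rest.

```python
def decontaminate(observed_counts, nc_profile, c):
    """Estimate true relative abundances after removing an NC share c (hypothetical).

    Assumes observed = c * nc_profile + (1 - c) * true, in relative abundances.
    """
    if not 0.0 <= c < 1.0:
        raise ValueError("contaminated proportion must be in [0, 1)")
    total = sum(observed_counts)
    observed = [x / total for x in observed_counts]
    residual = [max(o - c * n, 0.0) for o, n in zip(observed, nc_profile)]
    s = sum(residual)  # equals 1 - c unless clipping occurred
    if s == 0.0:
        raise ValueError("nothing remains after removing the NC share")
    return [r / s for r in residual]

print(decontaminate([40, 40, 10, 10], [0.0, 0.0, 0.5, 0.5], c=0.2))
# [0.5, 0.5, 0.0, 0.0]
```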
It is noted that the NC estimation model by LDAnc is not limited to microbial community structure data and may be applied to count data other than microbial community structure data, such as air quality data and document data. The NC estimation model by LDAnc may constitute a part of the estimating unit 11 provided in the arithmetic processing device 10 of the space evaluation system 1.
While embodiments of the present invention have been described, the present invention is not limited to the foregoing embodiments, and various design changes may be made without departing from the spirit and scope of the claims. In the present invention, the configuration of a certain embodiment may be added to the configuration of another embodiment, the configuration of the certain embodiment may be substituted with another embodiment, or a part of the configuration of the certain embodiment may be deleted.
Number | Date | Country | Kind |
---|---|---|---|
2021-005128 | Jan 2021 | JP | national |

Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2022/001136 | 1/14/2022 | WO | |