This application claims priority under 35 U.S.C. § 119 to Korean Patent Application No. 10-2023-0116366 filed on Sep. 1, 2023, in the Korean Intellectual Property Office, the disclosure of which is incorporated by reference herein in its entirety.
Embodiments of the present disclosure described herein relate to a machine learning device, and more particularly, relate to a machine learning device using an adaptive feature selection technique and a method of operating the same.
Incremental machine learning methods continuously update a learning model, even without previous learning data, in a varying feature space environment in which the features that make up the data change, from a data stream input in real time or from multiple data sets provided regularly or irregularly.
However, in a dynamic environment with the varying feature space, new features may continuously emerge as incremental machine learning progresses. As new features continuously emerge, the complexity of the learning model increases due to the accumulation of features. As the complexity of the learning model increases, the computational complexity required for inference of the learning model may increase.
In addition, when features overlap with each other due to the accumulation of features or features unnecessary for inference remain in the learning model, noise may occur during the inference process of the learning model and the prediction performance of the learning model may decrease. Accordingly, a device and a method for selecting features advantageous for inference and prediction in an incremental machine learning environment with the varying feature space may be required.
Embodiments of the present disclosure provide a machine learning device and a method of operating the same that use an adaptive feature selection technique to improve inference performance in a dynamic environment with a feature space dimension that continuously increases as incremental machine learning progresses.
According to an embodiment of the present disclosure, a machine learning device includes a probability table generating device that generates, based on a data set received from an outside, a probability table with respect to a target feature of the data set and a conditional probability table with respect to the target feature of a plurality of input features of the data set, a correlation extraction device that generates correlation information associated with the target feature and the plurality of input features based on the data set, a feature weight extraction device that generates a feature weight for each of the plurality of input features based on at least some of first information among the correlation information, the probability table, and the conditional probability table, a feature selection device that calculates a feature importance based on second information among the correlation information and the feature weight and selects at least some of the plurality of input features based on the feature importance, and a model generating device that generates a prediction model including at least some of the selected input features.
According to an embodiment, after the data set is received, when a new data set is received from the outside, the probability table generating device may update the probability table and the conditional probability table with respect to the target feature based on the new data set.
According to an embodiment, the first information may include a degree of relevance of each of the plurality of input features with respect to the target feature and a degree of redundancy between the plurality of input features.
According to an embodiment, after the data set is received, when a new data set is received from the outside, the correlation extraction device may update the degree of relevance and the degree of redundancy based on the plurality of input features included in the new data set.
According to an embodiment, the second information may include a correlation between the plurality of input features conditioned on the target feature.
According to an embodiment, after the data set is received, when a new data set is received from the outside, the correlation extraction device may update the correlation based on the plurality of input features included in the new data set.
According to an embodiment, the feature selection device may include a feature importance extraction device that extracts the feature importance based on the second information and the feature weight, a feature importance comparison device that compares the extracted feature importance with a first threshold, and a feature extraction device that extracts the at least some of the plurality of input features based on the comparison result.
According to an embodiment, the feature extraction device may extract input features having a feature importance equal to or greater than the first threshold.
According to an embodiment, the feature importance extraction device may count, for combinations of input feature pairs, the other input features with which an input feature has second information equal to or greater than a second threshold, and may transmit, to the feature extraction device based on the count result, information with respect to a first input feature that does not have second information equal to or greater than the second threshold with any of the other input features.
According to an embodiment, the feature importance extraction device may calculate, based on the count result, an average value of the second information that a second input feature having second information greater than the second threshold has with the other input features, and may extract the feature importance based on the feature weight, the average value, and the count result with respect to the second input feature.
According to an embodiment, the feature weight extraction device may receive the prediction model from the model generating device, may receive the probability table and the conditional probability table with respect to the target feature of the at least some of the selected input features from the probability table generating device, may receive first information with respect to the at least some of the selected input features from the correlation extraction device, and may re-extract a feature weight for each of the selected at least some input features based on at least some of the probability table, the conditional probability table with respect to the target feature of the selected at least some input features, and the first information.
According to an embodiment, the model generating device may receive the feature weight re-extracted from the feature weight extraction device, may generate a completed prediction model based on the re-extracted feature weight, and may perform an inference on at least some of the data sets based on the completed prediction model.
According to an embodiment, the data set may include data for generating the prediction model and data for performing the inference.
According to an embodiment of the present disclosure, a method of operating a machine learning device includes generating, by the machine learning device, based on a data set received from an outside, a probability table with respect to a target feature of the data set and a conditional probability table with respect to the target feature of a plurality of input features of the data set, generating, by the machine learning device, correlation information associated with the target feature and the plurality of input features based on the data set, generating, by the machine learning device, a feature weight for each of the plurality of input features based on at least some of first information among the correlation information, the probability table, and the conditional probability table, calculating, by the machine learning device, a feature importance based on second information among the correlation information and the feature weight and selecting at least some of the plurality of input features based on the feature importance, and generating, by the machine learning device, a prediction model including at least some of the selected input features.
According to an embodiment, the first information may include a degree of relevance of each of the plurality of input features with respect to the target feature and a degree of redundancy between the plurality of input features, and the second information may include a correlation between the plurality of input features conditioned on the target feature.
According to an embodiment, the selecting of at least some of the plurality of input features may include extracting a feature importance based on the second information and the feature weight, comparing the extracted feature importance with a first threshold, and extracting the at least some of the plurality of input features based on the comparison result.
According to an embodiment, the extracting of the feature importance may include counting, for combinations of input feature pairs, the other input features with which an input feature has second information equal to or greater than a second threshold, calculating, based on the count result, an average value of the second information that a second input feature having second information greater than the second threshold has with the other input features, and extracting the feature importance based on the feature weight, the average value, and the count result with respect to the second input feature.
According to an embodiment, the generating of the feature weight may include re-extracting a feature weight for each of the selected at least some input features based on at least some of the probability table, the conditional probability table with respect to the target feature of the selected at least some input features, and the first information.
According to an embodiment, the generating of the prediction model may further include generating a completed prediction model based on the re-extracted feature weight, and performing, by the machine learning device, an inference on at least some of the data sets based on the completed prediction model.
According to an embodiment, the data set may include data for generating the prediction model and data for performing the inference.
The above and other objects and features of the present disclosure will become apparent by describing in detail embodiments thereof with reference to the accompanying drawings.
Hereinafter, embodiments of the present disclosure will be described in detail and clearly to such an extent that one of ordinary skill in the art may easily implement the present disclosure.
Each of the probability table generating device 110, the correlation extraction device 120, the feature weight extraction device 130, the feature selection device 140, the model generating device 150, and the inference device 160 may be implemented in the form of software or hardware, or a combination thereof.
For example, the software may be a machine code, firmware, an embedded code, and application software. For example, the hardware may include an electrical circuit, an electronic circuit, a processor, a computer, an integrated circuit, integrated circuit cores, a pressure sensor, an inertial sensor, a microelectromechanical system (MEMS), a passive element, or a combination thereof.
The machine learning device 100 may receive a data set from a database 10 located outside the machine learning device 100. The machine learning device 100 may generate a prediction model through incremental machine learning based on the received data set, and may perform inference and prediction on the data set using the prediction model.
In addition, the machine learning device 100 may gradually update the prediction model, even without previous data sets, when multiple data consisting of a plurality of data sets, or each data set of a data stream, is sequentially input. In detail, the machine learning device 100 may generate a new prediction model each time a data set is input.
The data set may include data for generating a prediction model and data that are the subject of inference and prediction. The data for generating a prediction model may include a plurality of input features.
The machine learning device 100 may be implemented with at least one of a central processing unit (CPU), a graphics processing unit (GPU), an accelerated processing unit (APU), or a neural network processing unit (NPU), but the present disclosure is not limited thereto.
The machine learning device 100 may include a prediction model for performing the above-described incremental machine learning. For example, the prediction model may include a convolutional neural network (CNN), but the present disclosure is not limited thereto.
The probability table generating device 110 may designate a target feature with respect to a plurality of input features based on the received data set. The probability table generating device 110 may generate a probability table for the target feature and a conditional probability table for each of the input features with the target feature as a condition.
The conditional probability table for each of the input features with the target feature as the condition may refer to a conditional probability table with respect to the target feature of the input features.
The probability table generating device 110 may designate a node (a target node) (not illustrated) corresponding to the target feature and may form feature nodes (not illustrated) with respect to all input features. The probability table generating device 110 may connect each of the feature nodes (not illustrated) to the target node (not illustrated).
The probability table generating device 110 may generate an incremental classifier based on a probability table for the target feature and a conditional probability table for input features that are conditioned on the target feature.
The incremental classifier may be updated through learning based only on the current data set, even without the previous data set, in a situation where data sets come in sequentially. In detail, when a previous incremental classifier model is generated using a previous data set, the incremental classifier model may be updated based only on the current data set and the previous incremental classifier model, even without the previous data set.
The probability table generating device 110 may use a machine learning algorithm capable of sequential learning to generate the incremental classifier. As an example, the probability table generating device 110 may generate the incremental classifier (hereinafter referred to as a Naive Bayes-based incremental classifier) using a Naive Bayes-based classifier learning algorithm.
The Naive Bayes-based classifier learning algorithm may be an algorithm that assumes conditional independence between features.
The probability table generating device 110 may train the Naive Bayes-based incremental classifier by generating a probability table for the target feature and a conditional probability table for the target feature of each input feature based on data for generating a prediction model.
The probability table generating device 110 may configure a Bayesian network based on a conditional probability table for each input feature that is conditioned on the target feature or one or more input features.
When the probability table generating device 110 receives a new data set from the database 10, the probability table generating device 110 may update the Naive Bayes-based incremental classifier generated based on the existing data set. In detail, the probability table generating device 110 may update the probability table for the target feature and the conditional probability table for the target feature of each input feature based on the new data set.
For example, when a new data set has at least one new input feature that is not in the existing data set, the probability table generating device 110 may add a feature node corresponding to the new input feature to the existing Naive Bayes-based incremental classifier, and may connect the new feature node to the target node.
When a new data set has input features that are in the existing data set, the probability table generating device 110 may update the probability table and the conditional probability table based on the target feature and the input features.
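As an illustrative sketch only (the class and method names below are assumptions, not part of the disclosure), a count-based Naive Bayes classifier of this kind can update its probability tables from each incoming data set alone, including when a new data set introduces a new input feature:

```python
import math
from collections import defaultdict

class IncrementalNB:
    """Sketch of a count-based Naive Bayes incremental classifier.

    Probability tables are stored as counts, so each update needs only
    the current data set, not any previous one."""

    def __init__(self):
        self.class_counts = defaultdict(int)                          # table for the target feature
        self.feature_counts = defaultdict(lambda: defaultdict(int))   # conditional tables per (feature, value)
        self.features = set()

    def update(self, rows, labels):
        # Update the tables using only the current data set.
        for row, y in zip(rows, labels):
            self.class_counts[y] += 1
            for feat, val in row.items():
                self.features.add(feat)  # a new input feature is simply added as a new node
                self.feature_counts[(feat, val)][y] += 1

    def predict(self, row):
        # Argmax over classes of log P(y) + sum of log P(x | y), with add-one smoothing.
        total = sum(self.class_counts.values())
        best, best_score = None, float("-inf")
        for y, cy in self.class_counts.items():
            score = math.log(cy / total)
            for feat, val in row.items():
                score += math.log((self.feature_counts[(feat, val)][y] + 1) / (cy + 2))
            if score > best_score:
                best, best_score = y, score
        return best
```

For instance, a first data set containing only a "color" feature can be followed by a second data set that adds a "shape" feature; the second update touches only the newly observed counts.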
The correlation extraction device 120 may extract the degree of relevance between the target feature and each input feature and the degree of redundancy between input features based on the received data set. The correlation extraction device 120 may extract the degree of relevance and the degree of redundancy for the input features based on a joint probability between the target node (not illustrated) and each feature node (not illustrated), and between feature nodes (not illustrated).
The degree of relevance for the input features is obtained by quantifying a correlation between the target node (not illustrated) and a specific feature node (not illustrated) and normalizing it for all input features, and may indicate a relative relevance between the target node (not illustrated) and a corresponding feature node.
The degree of redundancy for input features is obtained by quantifying a correlation between a specific input feature and the remaining input features and normalizing it for all input features, and may indicate how much the distribution characteristics of the corresponding input feature overlap with other input features.
The correlation extraction device 120 may use at least one of a Pearson Correlation Coefficient, a Spearman Correlation Coefficient, and mutual information to quantify the correlation. However, the correlation extraction device 120 is not limited thereto and may quantify the correlation using various statistical calculation methods.
Hereinafter, in this specification, the correlation extraction device 120 will be described as quantifying the correlation using the mutual information.
The correlation extraction device 120 may further extract correlations between input features conditioned on the target node (not illustrated) in addition to the degree of relevance and the degree of redundancy with respect to the input features.
When quantifying the correlation using the mutual information, the correlation between input features conditioned on the target node (not illustrated) may be referred to as inter-feature mutual information.
The value of the inter-feature mutual information between an input feature Xi and an input feature Xj conditioned on the target feature may be expressed as a conditional mutual information I(Xi; Xj | Y). When the value that a target feature (a target node) Y may have is y, and the values that the input features Xi and Xj may have are xi and xj, the inter-feature mutual information may be calculated as I(Xi; Xj | Y) = Σy Σxi Σxj p(xi, xj, y)·log(p(xi, xj | y) / (p(xi | y)·p(xj | y))).
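A conditional mutual information of this form can be estimated directly from empirical joint counts. The sketch below (the function name is an illustrative assumption) computes it from aligned samples of two input features and the target feature:

```python
import math
from collections import Counter

def conditional_mutual_information(xi, xj, y):
    """Estimate I(Xi; Xj | Y) from aligned samples using empirical
    probabilities: sum over (xi, xj, y) of p(xi, xj, y) *
    log(p(xi, xj | y) / (p(xi | y) * p(xj | y)))."""
    n = len(y)
    joint = Counter(zip(xi, xj, y))
    xi_y = Counter(zip(xi, y))
    xj_y = Counter(zip(xj, y))
    y_only = Counter(y)
    cmi = 0.0
    for (a, b, c), cnt in joint.items():
        p_joint = cnt / n
        # Ratio p(a, b | c) / (p(a | c) * p(b | c)), all conditioned on y = c.
        ratio = (cnt / y_only[c]) / ((xi_y[(a, c)] / y_only[c]) * (xj_y[(b, c)] / y_only[c]))
        cmi += p_joint * math.log(ratio)
    return cmi
```

Two identical input features yield a positive value, while features that are independent given the target yield approximately zero.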
Thereafter, when the correlation extraction device 120 receives a new data set from the database 10, the correlation extraction device 120 may update the relevance, redundancy, and inter-feature mutual information extracted based on the existing data set.
For example, when the new data set has at least one new input feature that is not in the existing data set, the relevance, redundancy, and inter-feature mutual information between the new input feature and the existing input features may be added to the existing correlation information.
When the new data set has input features that are in the existing data set, the relevance, redundancy, and inter-feature mutual information with respect to the input features may be updated.
The feature weight extraction device 130 may receive information on the degree of relevance and redundancy of features from the correlation extraction device 120. The feature weight extraction device 130 may receive a probability table associated with the target feature and a conditional probability table for each of the input features conditioned on the target feature from the probability table generating device 110.
The feature weight extraction device 130 may extract a feature weight for each of the input features based on the degree of relevance and redundancy of the features, a probability table, and a conditional probability table. Alternatively, the feature weight extraction device 130 may extract the feature weight for each input feature based only on the degree of relevance and redundancy of the features.
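The disclosure does not fix a specific combination rule at this point; purely as an illustrative assumption, an mRMR-style weight (normalized relevance minus normalized redundancy) is one common choice:

```python
def feature_weights(relevance, redundancy):
    """Illustrative mRMR-style weight: a feature that is strongly
    relevant to the target but weakly redundant with the other input
    features receives a high weight."""
    def normalize(d):
        total = sum(d.values()) or 1.0
        return {k: v / total for k, v in d.items()}
    rel = normalize(relevance)
    red = normalize(redundancy)
    return {f: rel[f] - red.get(f, 0.0) for f in rel}
```

With relevance {x1: 0.8, x2: 0.2} and redundancy {x1: 0.1, x2: 0.9}, the feature x1 receives the higher weight.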
When a new data set is input to the machine learning device 100, the feature weight extraction device 130 may receive information on updated relevance and redundancy from the correlation extraction device 120. In addition, the feature weight extraction device 130 may receive the updated probability table and conditional probability table from the probability table generating device 110.
The feature weight extraction device 130 may update the feature weight for each of the input features based on the updated relevance and redundancy, and the updated probability table and conditional probability table. Alternatively, the feature weight extraction device 130 may update the feature weight for each of the input features based only on the updated relevance and redundancy of the features.
For example, when the new data set includes at least one new input feature that is not in the existing data set, the feature weight extraction device 130 may extract the feature weight with respect to the new input feature.
When the new data set includes input features that are in the existing data set, the feature weight extraction device 130 may update the feature weight for each of the input features.
The feature selection device 140 may receive the inter-feature mutual information from the correlation extraction device 120. The feature selection device 140 may receive the feature weight for each of the input features from the feature weight extraction device 130.
The feature selection device 140 may calculate feature importance based on the feature weight and the inter-feature mutual information. The feature selection device 140 may include a feature importance calculation function. The feature selection device 140 may calculate feature importance based on the feature importance calculation function.
The feature selection device 140 may select at least some input features among a plurality of features included in the data set based on the feature importance.
When a new data set is input to the machine learning device 100, the feature selection device 140 may receive the added or updated inter-feature mutual information from the correlation extraction device 120. In addition, the feature selection device 140 may receive the added or updated feature weights from the feature weight extraction device 130.
The feature selection device 140 may update feature importance based on the updated feature weights and the updated inter-feature mutual information. The feature selection device 140 may update feature importance based on the feature importance calculation function.
For example, when the new data set includes at least one new input feature that is not in the existing data set, the feature selection device 140 may extract feature importance with respect to the new input feature.
When the new data set includes input features that are in the existing data set, the feature selection device 140 may update the feature importance for each of the input features. A detailed description of the configuration of the feature selection device 140 will be provided later.
The model generating device 150 may receive information on at least some input features from the feature selection device 140. The model generating device 150 may generate a prediction model including at least some input features. In this case, the remaining features that are not selected among the plurality of features included in the data set may not be included in the prediction model.
When a new data set is input to the machine learning device 100, the model generating device 150 may receive the updated feature importance from the feature selection device 140. The model generating device 150 may update the prediction model based on the updated feature importance.
For example, when the new data set includes at least one new input feature that is not in the existing data set, the model generating device 150 may generate a prediction model including the new input feature.
When the new data set includes the input features that are in the existing data set, the model generating device 150 may update the existing prediction model.
The feature weight extraction device 130 may receive a prediction model (or an updated prediction model) from the model generating device 150. The feature weight extraction device 130 may re-extract a feature weight for each of at least some input features included in the prediction model (or the updated prediction model) by utilizing a data set in an incremental machine learning process. A detailed description of this will be provided later.
The model generating device 150 may receive the re-extracted feature weight from the feature weight extraction device 130. The model generating device 150 may regenerate the prediction model (or the updated prediction model) based on the re-extracted feature weight (hereinafter, the prediction model generated based on the re-extracted feature weight refers to a completed prediction model).
The inference device 160 may receive a completed prediction model (or an updated completed prediction model) from the model generating device 150. The inference device 160 may receive a data set from the database 10 external to the machine learning device 100. The inference device 160 may perform inference and prediction on the data set based on a completed prediction model (or an updated completed prediction model).
The feature selection device 140 may receive inference and prediction results from the inference device 160. The feature selection device 140 may adjust a feature selection method based on the inference and prediction results.
For example, the feature selection device 140 may optimize the feature importance calculation function based on the inference and prediction results. A detailed description of this will be provided later.
Referring to
The feature importance extraction device 241 may receive an inter-feature mutual information from the correlation extraction device 120. The feature importance extraction device 241 may receive the feature weight for each of the input features from the feature weight extraction device 130.
The feature importance extraction device 241 may extract the feature importance for each of the input features based on the inter-feature mutual information and the feature weight. The detailed configuration of this will be described later.
The feature importance comparison device 242 may receive the feature importance for each of the input features from the feature importance extraction device 241. The feature importance comparison device 242 may compare the feature importance of each input feature with a threshold.
The feature extraction device 243 may receive a comparison result from the feature importance comparison device 242. The feature extraction device 243 may extract a set of input features of which feature importance is equal to or greater than the threshold.
Alternatively, the feature extraction device 243 may directly receive information on the input features from the feature importance extraction device 241. The detailed configuration of this will be described later.
Referring to
The feature counter 341a may receive the inter-feature mutual information from the correlation extraction device 120. The feature counter 341a may count the number of input features having inter-feature mutual information equal to or greater than a threshold.
For example, the pairwise inter-feature mutual information between two input features Xi and Xj may be denoted as Mij.
The feature counter 341a may identify the set of input features that satisfy Mij ≥ θ (θ: threshold, 0 ≤ θ ≤ 1) and are included in the combination of h input feature pairs.
For example, it is assumed that F = {X1, X2, X3, X4, X5, X6}.
Since the number of combinations of input feature pairs that include a given input feature Xi among the six input features is five, the feature counter 341a may count, among those pairs, the number of pairs whose inter-feature mutual information Mij is equal to or greater than the threshold θ.
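The counting step can be sketched as follows, assuming (for illustration only) that the pairwise values Mij are given as a dictionary keyed by feature pairs:

```python
def count_high_mi_partners(pairwise_mi, features, theta):
    """For each input feature, count the pairs that include it and whose
    inter-feature mutual information is equal to or greater than theta."""
    counts = {f: 0 for f in features}
    for (fi, fj), m in pairwise_mi.items():
        if m >= theta:
            counts[fi] += 1
            counts[fj] += 1
    return counts
```

With three features and θ = 0.5, the pairwise values {(X1, X2): 0.8, (X1, X3): 0.2, (X2, X3): 0.6} yield counts of 1, 2, and 1 for X1, X2, and X3, respectively.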
The average value calculation device 341b may receive the inter-feature mutual information between two input features and the count result from the feature counter 341a.
For example, when Mave(Xi) is the average value of Mkl over the input feature pairs that include the input feature Xi and satisfy Mkl ≥ θ, the average value calculation device 341b may calculate Mave(Xi) by dividing the sum of the corresponding Mkl values by the count result with respect to the input feature Xi.
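Assuming, for illustration, that the pairwise values are given as a dictionary keyed by feature pairs, the averaging step might be sketched as:

```python
def average_high_mi(pairwise_mi, feature, theta):
    """Average the inter-feature mutual information values >= theta over
    the pairs that include the given feature; 0.0 if no pair qualifies."""
    vals = [m for (fi, fj), m in pairwise_mi.items()
            if feature in (fi, fj) and m >= theta]
    return sum(vals) / len(vals) if vals else 0.0
```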
The feature importance calculation device 341c may receive the average value Mave(Xi) and the count result from the average value calculation device 341b. The feature importance calculation device 341c may include a feature importance calculation function that extracts the feature importance based on the feature weight, the average value, and the count result.
For example, the feature importance calculation function may be expressed as in Equations 3 to 6 below, each of which combines the feature weight, the average value, and the count result with respect to an input feature. In Equations 3 and 4, the value of the feature importance calculation function may be adjusted through each coefficient of the function. In Equations 5 and 6, the value of the feature importance calculation function may be adjusted through each index of the function. However, the feature importance calculation function is not limited thereto and may be implemented in various other forms.
Referring to
The prediction model receiving device 431 may receive a prediction model (or an updated prediction model) from the model generating device 150. The prediction model (or the updated prediction model) may include at least some input features selected by the feature selection device 140 or 240 among a plurality of features included in the data set.
The feature weight calculation device 432 may receive a prediction model (or an updated prediction model) from the prediction model receiving device 431. The feature weight calculation device 432 may re-extract a feature weight for each of at least some input features by using a data set (or a new data set) in an incremental machine learning process.
For example, the number of input features included in the data set (or the new data set) may be 100, and the number of at least some input features selected by the feature selection device 140 or 240 may be 30.
In this case, the feature weight calculation device 432 may receive information about the degree of relevance and redundancy with respect to 30 input features from the correlation extraction device 120. The feature weight calculation device 432 may receive a probability table for the target feature and a conditional probability table for each of 30 input features conditioned on the target feature from the probability table generating device 110.
The feature weight calculation device 432 may re-extract the feature weight for each of the 30 input features based on the probability table for the target feature, the degree of relevance and redundancy for the 30 input features, and the conditional probability table. Alternatively, the feature weight calculation device 432 may re-extract the feature weight for each of the 30 input features based only on the degree of relevance and redundancy of the 30 input features.
Referring to
In operation S120, the machine learning device 100 may extract probability information of features and correlations between features based on the data set. The machine learning device 100 may designate a target feature with respect to input features.
The probability information may include a probability table for the target feature and a conditional probability table for each input feature conditioned on the target feature.
The correlation between features may include the degree of relevance of each input feature to the target feature, the degree of redundancy between each input feature, and the inter-feature mutual information that is conditioned on the target feature.
In operation S130, the machine learning device 100 may extract a feature weight for each feature based on the probability information and the correlations. In this case, the correlation may refer to the degree of relevance of each input feature to the target feature and the degree of redundancy between each input feature.
In operation S140, the machine learning device 100 may extract feature importance based on the correlations and the feature weight. In this case, the correlation may refer to the inter-feature mutual information that is conditioned on the target feature.
In operation S150, the machine learning device 100 may select at least some of the input features based on the feature importance. The machine learning device 100 may select an input feature that has a feature importance equal to or greater than a threshold among the input features. In addition, the machine learning device 100 may select an input feature of which the inter-feature mutual information with respect to each of the other input features is less than a threshold.
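The two selection criteria in operation S150 can be combined in a filter like the following sketch (the function shape and names are illustrative assumptions):

```python
def select_inputs(importance, pairwise_mi, imp_threshold, mi_threshold):
    """Keep input features whose importance meets the importance
    threshold and whose inter-feature mutual information with every
    other input feature stays below the redundancy threshold."""
    keep = set()
    for f, score in importance.items():
        if score < imp_threshold:
            continue
        redundant = any(m >= mi_threshold
                        for (a, b), m in pairwise_mi.items()
                        if f in (a, b))
        if not redundant:
            keep.add(f)
    return keep
```

A feature passes only when it is both sufficiently important and not strongly duplicated by any other input feature.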
In operation S160, the machine learning device 100 may generate a prediction model based on at least some of the selected input features. The generated prediction model may not be immediately transmitted to the inference device 160.
The feature weights for each of at least some input features included in the generated prediction model may be re-extracted. The machine learning device 100 may generate a completed prediction model based on the re-extracted feature weights. A detailed description of this will be provided later in
Referring to
In operation S220, the machine learning device 100 may generate a completed prediction model based on the re-extracted feature weight.
In operation S230, the machine learning device 100 may perform inference on at least some of the data set based on the completed prediction model. This portion of the data set is data subject to inference and prediction, as distinct from the data used to generate the prediction model.
The machine learning device 100 may adjust a coefficient or an exponent of the feature importance calculation function by applying the inference results.
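The feedback step above might look like the following sketch, where a coefficient of the importance function is nudged in proportion to the gap between realized inference accuracy and a target accuracy. The parameter names, the target value, and the proportional update rule are all hypothetical; the disclosure does not specify the adjustment rule.

```python
def adjust_importance_params(params, accuracy, target=0.9, step=0.05):
    """Hypothetical feedback: move the coefficient of the importance
    function by an amount proportional to (target - accuracy), leaving
    the exponent unchanged in this minimal sketch."""
    return {"coef": params["coef"] + step * (target - accuracy),
            "exp": params["exp"]}
```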
Thereafter, when a new data set is input, the machine learning device 100 may update the completed prediction model even without the previous data set.
As described above, the machine learning device 100 may extract probability information of input features and correlations between input features based on the new data set.
The machine learning device 100 may extract the feature weight for each of the input features based on the probability information, the degree of relevance of each input feature to the target feature, and the degree of redundancy between input features.
The machine learning device 100 may extract feature importance based on the feature weight and the inter-feature mutual information conditioned on the target feature.
The machine learning device 100 may select at least some of the input features based on the feature importance.
The machine learning device 100 may generate a prediction model based on at least some selected input features.
The machine learning device 100 may re-extract feature weights based on probability information with respect to at least some of the input features included in the prediction model, the degree of relevance of at least some of the input features with respect to the target feature, and the degree of redundancy between at least some of the input features.
In detail, the machine learning device 100 may utilize information on at least some input features selected from among the input features included in the new data set.
The machine learning device 100 may generate a completed prediction model based on the re-extracted feature weights. In detail, the machine learning device 100 may update the completed prediction model corresponding to the previous data set to the completed prediction model corresponding to the new data set.
The machine learning device 100 may perform inference on at least some of the data sets based on the updated prediction model.
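The incremental cycle described above (extract statistics, weigh, score, select, then re-weigh only the selected subset and fit the completed model) can be sketched as one orchestration function. The stage functions are passed in as parameters because the disclosure does not fix their concrete forms; the signatures below are assumptions for illustration.

```python
def incremental_cycle(new_data, extract, weigh, score, pick, fit):
    """One model update driven by a new data set alone (no previous
    data sets are needed, only the new statistics).
    extract: data -> statistics (probability tables, correlations)
    weigh:   (stats, features)  -> {feature: weight}
    score:   (stats, weights)   -> {feature: importance}
    pick:    importance         -> selected feature list
    fit:     (features, weights) -> completed prediction model"""
    stats = extract(new_data)                        # S120
    weights = weigh(stats, list(stats["features"]))  # S130
    importance = score(stats, weights)               # S140
    selected = pick(importance)                      # S150
    re_weights = weigh(stats, selected)              # S160/S210: re-extract
    return fit(selected, re_weights)                 # S220: completed model
```

Because only `stats` from the new data set flows through the cycle, the previous data set never has to be retained, matching the incremental-learning setting described above.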
According to an embodiment of the present disclosure, the machine learning device and the method of operating the same may omit unnecessary features for inference and prediction by using an adaptive feature selection technique. Accordingly, the complexity of the learning model due to an increase in the feature space may be reduced, and the inference and prediction performance of the machine learning device may be improved.
The above descriptions are detailed embodiments for carrying out the present disclosure. The present disclosure may include not only the embodiments described above but also embodiments in which a design is simply or easily changed. In addition, the present disclosure may include technologies that are easily changed and implemented by using the above embodiments. While the present disclosure has been described with reference to embodiments thereof, it will be apparent to those of ordinary skill in the art that various changes and modifications may be made thereto without departing from the spirit and scope of the present disclosure as set forth in the following claims.
| Number | Date | Country | Kind |
|---|---|---|---|
| 10-2023-0116366 | Sep 2023 | KR | national |