The present invention relates to a machine learning technology. More particularly, the present invention relates to a machine learning technology for eliminating spurious correlation.
Technologies such as machine learning and neural networks are widely used in the technical field of artificial intelligence. Important applications of artificial intelligence include identifying objects (such as human faces, vehicle license plates, etc.) and predicting data (such as stock predictions, medical treatment predictions, etc.). Object detection and data prediction can be realized through feature extraction and feature classification.
However, spurious correlations often arise among the features used for feature extraction and feature classification, and such spurious correlations degrade the prediction accuracy of object detection and data prediction.
The disclosure provides a machine learning method, which includes the following steps: obtaining, by a processor, a model parameter from a memory, and performing, by the processor, a classification model according to the model parameter, wherein the classification model comprises a plurality of neural network structural layers; calculating, by the processor, a first loss and a second loss according to a plurality of training samples, wherein the first loss corresponds to an output layer of the plurality of neural network structural layers, and the second loss corresponds to one, which is before the output layer, of the plurality of neural network structural layers; and performing, by the processor, a plurality of updating operations for the model parameter according to the first loss and the second loss to train the classification model.
The disclosure provides a machine learning device, which includes a memory and a processor. The memory is configured to store a plurality of instructions and a model parameter; the processor is coupled with the memory. The processor is configured to run a classification model, and is configured to execute the instructions to: obtain the model parameter from the memory, and perform the classification model according to the model parameter, wherein the classification model comprises a plurality of neural network structural layers; calculate a first loss corresponding to an output layer of the plurality of neural network structural layers, and calculate a second loss corresponding to one, which is before the output layer, of the plurality of neural network structural layers; and perform a plurality of updating operations for the model parameter of the classification model according to the first loss and the second loss to train the classification model.
These and other features, aspects, and advantages of the present invention will become better understood with reference to the following description and appended claims.
It is to be understood that both the foregoing general description and the following detailed description are by way of example, and are intended to provide further explanation of the invention as claimed.
The invention can be more fully understood by reading the following detailed description of the embodiment, with reference made to the accompanying drawings as follows:
Reference will now be made in detail to the present embodiments of the invention, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers are used in the drawings and the description to refer to the same or like parts.
Reference is made to
In some embodiments, the machine learning device 100 can be established by a computer, a server or a processing center. In some embodiments, the processor 110 can be realized by a central processing unit or a computing unit. In some embodiments, the memory 120 can be realized by a flash memory, a read-only memory (ROM), a hard disk or any equivalent storage component.
In some embodiments, the machine learning device 100 is not limited to including the processor 110 and the memory 120. The machine learning device 100 can further include other components required to operate the machine learning device 100 in various applications. For example, the machine learning device 100 can further include an output interface (e.g., a display panel for displaying information), an input interface (e.g., a touch panel, a keyboard, a microphone, a scanner or a flash memory reader) and a communication circuit (e.g., a WiFi communication module, a Bluetooth communication module, a wireless telecommunication module, etc.).
As shown in
In some embodiments, the classification model 111 can classify input data, for example, detecting that an input image contains vehicles, faces, license plates, text, totems, or other image-feature objects, or predicting whether input stock data will rise or fall in the future. The classification model 111 is configured to generate a corresponding label according to a classification result. It should be noted that the classification model 111 refers to a model parameter MP while performing classification operations.
As shown in
In this embodiment, the classification model 111 includes multiple neural network structural layers. In some embodiments, each of the neural network structural layers corresponds to one weight parameter content (configured to determine the operation of that neural network structural layer) within the model parameter MP. On the other hand, each of the neural network structural layers of the classification model 111 corresponds to weight parameter content that is independent from the others. In other words, each of the neural network structural layers corresponds to one weight value set, where the weight value set includes multiple weight values.
In some embodiments, a neural network structural layer can be a convolution layer, a pooling layer, a linear rectification layer, a fully connected layer or another type of neural network structural layer. In some embodiments, the classification model 111 is based on neural networks (e.g., the classification model 111 is composed of a deep residual network (ResNet) and a fully connected layer, or composed of EfficientNet and a fully connected layer).
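As an illustration only, the following Python sketch (assuming PyTorch and torchvision; the class name and the choice of ResNet-18 are assumptions, not taken from the disclosure) shows a classification model of this kind, with a residual backbone producing the extraction features and a fully connected output layer producing the classification result:

```python
# A minimal sketch, assuming PyTorch/torchvision. Names are illustrative.
import torch
import torch.nn as nn
import torchvision.models as models

class ClassificationModel(nn.Module):
    """A residual backbone followed by a fully connected output layer."""
    def __init__(self, num_classes: int):
        super().__init__()
        backbone = models.resnet18(weights=None)   # deep residual network
        feature_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()                # expose the extraction features
        self.backbone = backbone
        self.output_layer = nn.Linear(feature_dim, num_classes)  # fully connected layer

    def forward(self, x: torch.Tensor):
        features = self.backbone(x)        # layer before the output layer
        logits = self.output_layer(features)
        return features, logits
```

Returning both the features and the logits makes it possible to compute one loss at the output layer and another loss at the layer before it, as described below.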
Reference is further made to
As shown in
In step S220, a first loss and a second loss are calculated according to multiple training samples, where the first loss corresponds to an output layer of the neural network structural layers, and the second loss corresponds to one, which is before the output layer, of the neural network structural layers. In an embodiment, the first loss is generated by the processor 110 from the output layer of the neural network structural layers of the classification model 111, and the second loss is generated by the processor 110 from the neural network structural layer before the output layer. In some embodiments, the output layer includes at least one fully connected layer. Further details about step S220 will be described in the following paragraphs with some examples.
In step S230, multiple updating operations are performed for the model parameter MP according to the first loss and the second loss to train the classification model 111. In an embodiment, the model parameter MP is updated by the processor 110 in the updating operations according to the first loss and the second loss to generate an updated model parameter MP, and the classification model is trained according to the updated model parameter MP to generate the classification model 111 after the training. Further details about step S230 will be described in the following paragraphs with some examples.
In this way, the classification model 111 after the training can be used to execute subsequent applications. For example, the classification model 111 after the training can be used for object recognition, face recognition, audio recognition, or motion detection within input pictures, images or streaming data, or can be used for data prediction about stock data or weather information.
Reference is further made to
As shown in
For example, the neural network structural layers SL1 and SL2 can be convolutional layers; the neural network structural layer SL3 can be a pooling layer; the neural network structural layers SL4 and SL5 can be convolutional layers; the neural network structural layer SL6 can be a pooling layer; the neural network structural layer SL7 can be a convolutional layer; the neural network structural layer SL8 can be a linear rectification layer; and the neural network structural layer SLt can be a fully connected layer, and the disclosure is not limited thereto.
In some embodiments, the classification model 111 can have multiple residual mapping blocks, and by using the structures of the residual mapping blocks, t can be decreased greatly. The following takes this structure of the classification model 111 as an example to further describe steps S221 to S224A.
It is added that, for brevity of description, the classification model 111 in
As shown in
As shown in
As shown in
It should be noted that spurious correlation may exist between the extraction features $\{H_{i,1}, H_{i,2}, \ldots, H_{i,m}\}_{i=1}^{n}$ and the training labels $\{y_i\}_{i=1}^{n}$. In detail, suppose a first extraction feature is causally related to both a second extraction feature and the training label $y_i$, but the second extraction feature and the training label $y_i$ are not causally related to each other. In that case, the second extraction feature and the training label $y_i$ may still be statistically associated. When the value of the second extraction feature increases linearly with the change of the labels, the second extraction feature is spuriously correlated with the training label $y_i$. The spurious correlation is explicit if the extraction feature that causes it can be observed (i.e., the relationship among the first extraction feature, the second extraction feature and the training label $y_i$). Otherwise, the spurious correlation is said to be implicit (i.e., the relationship between the second extraction feature and the training label $y_i$). The spurious correlation causes the predicted labels $\{\hat{y}_i\}_{i=1}^{n}$ to deviate more greatly from the training labels $\{y_i\}_{i=1}^{n}$.
For example, a patient clinical image often contains both the cell tissue of a lesion and a bone whose color is similar to that of the cell tissue, which causes an explicit spurious correlation between the extraction feature of the bone and the label of the lesion. For another example, the patient clinical image usually has a background, and the lesion in the patient clinical image is similar to the background; this causes an implicit spurious correlation between the extraction feature of the background and the label of the lesion.
To avoid the spurious correlation, the following paragraphs further describe the details of using statistical independence to eliminate the explicit spurious correlation and using the average treatment effect to eliminate the implicit spurious correlation.
As shown in
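One plausible form of formula (1), assuming the standard moment-based characterization of statistical independence between two random variables a and b, is:

$$E(a^{p}b^{q}) = E(a^{p})\,E(b^{q}) \quad \text{for all positive integers } p, q \tag{1}$$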
Where E(·) means an expected value of the random variables, a and b are the random variables, and p and q are positive integers. According to formula (1), an independent loss can be shown in the following formula (2).
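A plausible form of formula (2), which penalizes the deviation from the independence condition of formula (1) (the truncation orders P and Q are assumed here for illustration), is:

$$L_{\mathrm{ind}}(a,b) = \sum_{p=1}^{P}\sum_{q=1}^{Q}\Bigl(E(a^{p}b^{q}) - E(a^{p})\,E(b^{q})\Bigr)^{2} \tag{2}$$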
As shown in
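Assuming the independent loss of formula (2) is accumulated over every pair of extraction features, with the expectations estimated empirically over the n training samples, formula (3) could plausibly take the form:

$$L_{2} = \sum_{j=1}^{m}\,\sum_{k=j+1}^{m} L_{\mathrm{ind}}\bigl(H_{\cdot,j},\,H_{\cdot,k}\bigr) \tag{3}$$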
Where j and k are positive integers not more than m. By using formula (3), the second loss L2 is calculated according to the extraction features $\{H_{i,1}, H_{i,2}, \ldots, H_{i,m}\}_{i=1}^{n}$. In some embodiments, the result of formula (3) can further be multiplied by an importance value to generate the second loss L2, where the importance value is greater than zero and is a hyperparameter that controls the importance of the independent loss.
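As an illustration only, the following Python sketch (assuming PyTorch; the function name, the moment order, and the tensor layout of n samples by m features are assumptions) computes a pairwise independence loss of this kind:

```python
# A minimal sketch, assuming PyTorch. Names and defaults are illustrative.
import torch

def independence_loss(H: torch.Tensor, max_order: int = 2) -> torch.Tensor:
    """Accumulate (E(a^p b^q) - E(a^p)E(b^q))^2 over every pair of
    extraction features, estimating expectations over the n samples."""
    n, m = H.shape
    loss = H.new_zeros(())
    for j in range(m):
        for k in range(j + 1, m):
            a, b = H[:, j], H[:, k]
            for p in range(1, max_order + 1):
                for q in range(1, max_order + 1):
                    joint = (a.pow(p) * b.pow(q)).mean()
                    product = a.pow(p).mean() * b.pow(q).mean()
                    loss = loss + (joint - product).pow(2)
    return loss
```

Multiplying the returned value by the importance hyperparameter then yields the second loss L2 described above.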
Reference is further made to
It should be noted that difference between
As shown in
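Assuming the standard definition of the average treatment effect, estimated from the treated and untreated groups, formula (4) could plausibly take the form:

$$\mathrm{ATE} = E(Y_i \mid T_i = 1) - E(Y_i \mid T_i = 0) \approx \frac{1}{|T|}\sum_{i=1}^{n} T_i\,Y_i \;-\; \frac{1}{|\bar{T}|}\sum_{i=1}^{n} (1 - T_i)\,Y_i \tag{4}$$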
Where p(·) means a probability of a random variable, $Y_i$ and $T_i$ are random variables, $T_i \in \{0, 1\}$ represents a treatment, $Y_i \in \mathbb{R}$ is an observed outcome, $C_i \in \mathbb{R}^{v}$ is a covariate vector, $|T| = \sum_{i=1}^{n} T_i$, and $|\bar{T}| = \sum_{i=1}^{n} (1 - T_i)$.
As shown in
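Assuming the jth extraction feature is soft-binarized by the hard sigmoid σ to play the role of the treatment, formula (5) could plausibly take the form:

$$\mathrm{loss}_{j} = \left|\,\frac{\sum_{i=1}^{n}\sigma(H_{i,j})\,y_{i}}{\sum_{i=1}^{n}\sigma(H_{i,j})} \;-\; \frac{\sum_{i=1}^{n}\bigl(1-\sigma(H_{i,j})\bigr)\,y_{i}}{\sum_{i=1}^{n}\bigl(1-\sigma(H_{i,j})\bigr)}\,\right| \tag{5}$$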
Where the loss of the jth extraction feature means a causal loss (i.e., the average treatment effect loss) corresponding to the extraction features $H_{1,j}, H_{2,j}, \ldots, H_{n,j}$, and σ(x) means a hard sigmoid function (for example, $\sigma(x) = \max(0, \min(1, (x+1)/2))$ in one common formulation).
Based on formula (5), the second loss L3, which indicates the average treatment effect of $\{H_{i,1}, H_{i,2}, \ldots, H_{i,m}\}_{i=1}^{n}$, is shown as the following formula (6).
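Assuming the causal losses of formula (5) are accumulated over all m extraction features, formula (6) could plausibly take the form:

$$L_{3} = \sum_{j=1}^{m}\mathrm{loss}_{j} \tag{6}$$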
By using formula (6), the second loss L3 is calculated according to the extraction features $\{H_{i,1}, H_{i,2}, \ldots, H_{i,m}\}_{i=1}^{n}$ and the training labels $\{y_i\}_{i=1}^{n}$ of the training samples $\{x_i\}_{i=1}^{n}$. In some embodiments, the result of formula (6) can also be multiplied by another importance value to generate the second loss L3, where this importance value is also greater than zero and is another hyperparameter that controls the importance of the average treatment effect loss.
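As an illustration only, the following Python sketch (assuming PyTorch and binary labels y in {0, 1}; the function name, the hard-sigmoid form, and the eps guard are assumptions) computes an average treatment effect loss of this kind:

```python
# A minimal sketch, assuming PyTorch. Names and the sigmoid form are illustrative.
import torch

def ate_loss(H: torch.Tensor, y: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Treat each extraction feature, soft-binarized by a hard sigmoid, as a
    treatment T, and accumulate the absolute difference in mean outcome y
    between the treated and untreated groups over all m features."""
    t = torch.clamp((H + 1.0) / 2.0, 0.0, 1.0)   # hard sigmoid, shape (n, m)
    y = y.float().unsqueeze(1)                   # shape (n, 1)
    treated_mean = (t * y).sum(dim=0) / (t.sum(dim=0) + eps)
    untreated_mean = ((1.0 - t) * y).sum(dim=0) / ((1.0 - t).sum(dim=0) + eps)
    return (treated_mean - untreated_mean).abs().sum()
```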
Reference is further made to
As shown in
In addition, the loss difference can also be calculated according to the first loss, the second loss generated from step S224A in
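As an illustration, one plausible form of the loss difference, denoting the first loss by L1 and assuming the importance values λ2 and λ3 are the hyperparameters mentioned above, is:

$$L_{\mathrm{diff}} = L_{1} - \lambda_{2}L_{2} \qquad\text{or}\qquad L_{\mathrm{diff}} = L_{1} - \lambda_{2}L_{2} - \lambda_{3}L_{3}$$

where the second form applies when the third loss is used together with the second loss.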
In step S232, it is determined whether the loss difference has converged. In some embodiments, when the loss difference has converged, the loss difference approaches or equals a difference threshold, which is generated according to statistical experiment outcomes.
In this embodiment, if the loss difference has not converged, step S233 is performed. In step S233, a backpropagation operation is performed by the processor 110 on the classification model according to the first loss and the second loss to update the model parameter MP. In other words, an updated model parameter is generated from the model parameter MP through backpropagation based on the first loss and the second loss.
In this way, steps S233, S220 and S231A are repeated to gradually update the model parameter MP in an iterative manner. Accordingly, the loss difference is minimized gradually (i.e., the second loss is maximized gradually) until the loss difference approaches or equals the difference threshold. On the contrary, if the loss difference has converged, it means that the machine learning device 100 has completed the training, and the classification model 111 after the training can be used to execute subsequent applications.
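As an illustration only, the following Python sketch (assuming PyTorch; the loop structure, the importance weight, and the stopping rule are assumptions rather than the disclosure's exact procedure, and it reuses the independence_loss sketch above) shows this iterative update with a convergence check on the loss difference:

```python
# A minimal sketch, assuming PyTorch and the independence_loss sketch above.
import torch

def train(model, optimizer, loader, diff_threshold: float, importance: float):
    ce = torch.nn.CrossEntropyLoss()                 # first loss, from the output layer
    loss_difference = torch.tensor(float("inf"))
    while loss_difference.item() > diff_threshold:   # step S232: convergence check
        for x, y in loader:                          # step S220: compute the losses
            features, logits = model(x)
            first_loss = ce(logits, y)
            second_loss = independence_loss(features)  # layer before the output layer
            loss_difference = first_loss - importance * second_loss
            optimizer.zero_grad()
            loss_difference.backward()               # step S233: backpropagation
            optimizer.step()                         # update the model parameter MP
    return model
```

In practice, a maximum iteration count would guard against non-convergence; the sketch omits it for brevity.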
Based on aforesaid embodiments, by using the second loss in step S224A, the extraction features belonging to the explicit spurious correlation can be removed in step S230. In addition, by using the second loss in step S224B, the extraction features belonging to the implicit spurious correlation can be removed in step S230.
Reference is further made to
As shown in
Reference is further made to
It should be noted that difference between
As shown in
Based on the aforesaid embodiments, by using the second loss in step S224A and the third loss in step S220′ at the same time, the extraction features belonging to the explicit spurious correlation and the implicit spurious correlation can be removed in step S230.
As shown in
In the field of computer vision and computer prediction, the accuracy of deep learning mainly relies on a large quantity of labeled training data. As the quality, quantity, and variety of the training data increase, the performance of the classification model usually improves correspondingly. However, the classification model often has an explicit spurious correlation or an implicit spurious correlation between the extraction features and the training labels. If the explicit spurious correlation or the implicit spurious correlation can be removed, the model will be more efficient and more accurate. The aforesaid embodiments of the disclosure propose adjusting the model parameter according to the independent loss and the average treatment effect loss to remove the explicit spurious correlation or the implicit spurious correlation in the classification model. Therefore, adjusting the model parameter according to the independent loss and the average treatment effect loss can improve the overall model performance.
For practical applications, the machine learning method and the machine learning device in the disclosure can be utilized in various fields such as machine vision, image classification, data prediction or data classification. For example, the machine learning method can be used to classify medical images, such as classifying X-ray images as being in normal condition, with pneumonia, with bronchitis, or with heart disease. The machine learning method can also be used to classify ultrasound images showing normal fetuses or abnormal fetal positions. The machine learning method can also be used to predict whether stock data will rise or fall in the future. On the other hand, the machine learning method can also be used to classify images collected in automatic driving, such as distinguishing normal roads, roads with obstacles, and road condition images of other vehicles. The machine learning method can be utilized in other similar fields as well. For example, the machine learning method and the machine learning device in the disclosure can also be used in music spectrum recognition, spectral recognition, big data analysis, data feature recognition and other related machine learning fields.
Although the present invention has been described in considerable detail with reference to certain embodiments thereof, other embodiments are possible. Therefore, the spirit and scope of the appended claims should not be limited to the description of the embodiments contained herein.
It will be apparent to those skilled in the art that various modifications and variations can be made to the structure of the present invention without departing from the scope or spirit of the invention. In view of the foregoing, it is intended that the present invention cover modifications and variations of this invention provided they fall within the scope of the following claims.
This application claims priority to U.S. Provisional Application Ser. No. 63/120,216, filed Dec. 2, 2020, and U.S. Provisional Application Ser. No. 63/152,348, filed Feb. 23, 2021, all of which are herein incorporated by reference in their entireties.