The present invention relates to an information processing apparatus for detecting overfitting of a learned model.
In recent years, a technique has been known for detecting an object in an image by using a model (a machine learning model) learned by machine learning such as deep learning. In general, it is desirable to cause the machine learning model to be learned by using a large amount of training data. In a case where the machine learning model is learned by using a small amount of training data or biased training data, the machine learning model may be in an over-fitted state, and erroneous detection or detection omission may occur.
JP 7207565 B2 discloses a technique for detecting overfitting of a machine learning model, in which verification data are sequentially input to each of an inspector model, learned by using a limited amount of biased training data, and a target machine learning model, and overfitting is detected in a case where a matching rate of the output results (classification results) is greater than or equal to a threshold value. Furthermore, Mukund Sundararajan and two others, "Axiomatic Attribution for Deep Networks", [online], [searched on Sep. 1, 2023], <URL: https://arxiv.org/abs/1703.01365> discloses a technique for measuring and visualizing a contribution degree of a feature of each input value (an image or a text) for prediction by a deep network.
Incidentally, even if the overfitting of the machine learning model can be detected, it is difficult to objectively grasp what kind of factors may cause the overfitting. Furthermore, even in a case where the contribution degree of the feature of each input value is visualized, a person can grasp only what an input value having a high contribution degree refers to.
The present invention has been made in view of the above problems, and an object thereof is to provide a technique capable of suggesting what kind of factors may cause the overfitting of the machine learning model.
In order to solve the problem, for example, the information processing apparatus according to the present invention is an information processing apparatus for detecting overfitting of a machine learning model, the information processing apparatus including:
According to the present invention, it is possible to suggest what kind of factors may cause the overfitting of the machine learning model.
Embodiments will be described in detail below with reference to the accompanying drawings. Note that the following embodiments do not limit the invention according to the claims, and all combinations of the features described in the embodiments are not necessarily essential to the invention. Two or more features of a plurality of features described in the embodiments may be arbitrarily combined. Furthermore, the same or similar configurations are denoted by the same reference numerals, and redundant description will be omitted.
An example of an information processing system according to the present embodiment will be described with reference to
For example, the information processing apparatus 100 receives a photographed image from a communication device 101 and returns an object detection result. Note that the apparatus that transmits the photographed image and the apparatus that receives the object detection result may be separate devices, and for example, an image transmission device (for example, a monitoring camera) that transmits the photographed image and the communication device 101 may be separately configured. The information processing apparatus 100 can communicate with the communication device 101 (or the image transmission device) and the service provider terminal 102 via a network.
For example, as implementation of a predetermined service including object detection, the information processing apparatus 100 detects an object in the photographed image by using the machine learning model, and transmits an object detection result to another device. The object detection result may include information indicating the detected object (for example, a frame indicating a region of the detected object, a color or text indicating a name of the object, or a numerical value such as a score for the object during the detection). Note that, in the following description, a machine learning model (a detection model) executed to detect an object will be described as an example, but the machine learning model is not limited to a model that detects an object, and may be a classification model that classifies a subject. Note that the detection model includes, for example, a model using a deep neural network known as YOLO or the like. However, the detection model may be any of various models as long as the model includes an object detection function. In the present embodiment, examples of the object may include a person, an animal, a plant, a vehicle, a building, daily necessities, a food item, and the like. Therefore, examples of the detection model may include: a model that detects a person in an image; a model that detects an animal in an image; a model that detects a specific article (for example, a vehicle, daily necessities, or the like) in an image; or a model that detects a plurality of these types of objects. Furthermore, the detection model may be a model that detects an object having a specific feature (for example, a person performing a specific action, a person wearing a specific article, a person of a specific sex, or the like).
Examples of the predetermined service may include a service that detects and annotates at least one of a person, an animal, a plant, a vehicle, a building, daily necessities, a food item, and the like from a photographed still image or moving image, provides information related to a detected object, and provides navigation. Furthermore, examples of the predetermined service may also include a service that provides information for analyzing a still image or a moving image by detecting the above objects from the still image or the moving image transmitted from a fixed device such as a monitoring camera.
Furthermore, the information processing apparatus 100 executes the overfitting determination process to be described later in addition to detection of an object in an image using the detection model. Although details will be described later, the overfitting determination process is a process of determining the overfitting of the machine learning model by performing object detection using a mask image to be generated during the process.
In the overfitting determination process, an image and a set value are set by an operation in the service provider terminal 102, and the information processing apparatus 100 executes the overfitting determination process and transmits a processing result to the service provider terminal 102. The service provider terminal 102 is a terminal managed by a service provider, and for example, the service provider provides an object detection service for detecting an object in a photographed image. For the detection model, the service provider terminal 102 can perform, for example, setting of a hyperparameter of the detection model, learning and deployment of the detection model, confirmation of an operation state of the detection model, and the like. The information processing apparatus 100 transmits results of the overfitting determination process to the service provider terminal 102, and thus the service provider can grasp the presence of the overfitting (or the possibility of the overfitting) in the detection model being used. According to the results of the overfitting determination process, the service provider can relearn the detection model by preparing new training data or by adding the mask images generated in the overfitting determination process to the training data.
The information processing apparatus 100 is, for example, a server device. The information processing apparatus 100 may be, for example, a server device that implements a cloud service platform, and the object detection process and the overfitting determination process described above may be implemented on the cloud service platform. However, the information processing apparatus 100 may be an edge node disposed on the network, or a node constituting a P2P network. Alternatively, the information processing apparatus 100 may be a virtual machine configured on the cloud service platform.
Furthermore, in the present embodiment, a case where the object detection process and the overfitting determination process are executed on the information processing apparatus 100 will be described as an example. However, the present embodiment is also applicable to a case where these processes are executed by a plurality of server devices that implement the cloud service platform. Furthermore, the present embodiment is also applicable to a case where the overfitting determination process is executed in the service provider terminal 102.
The communication device 101 is, for example, a tablet device or a smartphone, but may be a personal computer, or the like. Furthermore, the service provider terminal 102 is, for example, a personal computer, but may be a tablet device or a smartphone.
A hardware configuration example of the information processing apparatus 100 will be described with reference to
The memory 202 is a volatile storage medium such as a DRAM, and temporarily stores data and a program. Furthermore, the storage 208 is a non-volatile storage medium that permanently stores the data and the program. The stored program includes one or more instructions executable by a processor. The storage 208 may be, for example, a semiconductor memory or a hard disk. The storage 208 can store training data for learning a neural network, test data for testing the learned neural network, and various data such as a photographed image received from the communication device 101.
The processor 204 includes, for example, an arithmetic circuit such as a central processing unit (CPU). The processor 204 may be configured by one or more processors. The processor 204 may further include an arithmetic circuit (for example, a GPU) and dedicated hardware for executing a statistical process such as machine learning at a higher speed, and may include a memory inside. The processor 204 deploys a program stored in the storage 208 to the memory 202, and executes the program to implement various functions of the information processing apparatus 100.
The communication interface 206 is an interface for transmitting and receiving data to and from a device outside the information processing apparatus 100. The communication interface 206 may include a communication circuit capable of communication in a communication scheme in conformity with various standards. The communication interface 206 is connected to the network, and exchanges the data with the service provider terminal 102 and the like illustrated in
The power supply 212 is a circuit or a module for providing power for each unit of the information processing apparatus 100 to operate. The power supply 212 may include a battery.
Next, a functional configuration example of the information processing apparatus 100 will be described with reference to
A detection model processing unit 310 includes a detection model execution unit 312 and a detection result output unit 314, and executes the object detection process with functional configurations of these units. Furthermore, the detection model processing unit 310 can include a data acquisition function of acquiring a photographed image from an external device. The photographed image may be included in a plurality of frames constituting the moving image. For example, the detection model processing unit 310 may acquire a moving image including images of a plurality of consecutive frames from the communication device 101. Furthermore, the detection model processing unit 310 executes the object detection process in the overfitting determination process. At this time, the detection model processing unit 310 can acquire the photographed image (an input image) temporarily or permanently stored in the storage 208 according to an instruction from the service provider terminal 102. Furthermore, the detection model processing unit 310 can acquire, as an input for the object detection process, a mask image generated by an overfitting detection unit 320 to be described later, according to an instruction from the service provider terminal 102.
The detection model execution unit 312 inputs the photographed image or the mask image acquired by the detection model processing unit 310, executes a process of a learned detection model, and detects an object in the image. The detection model outputs, for example, a frame indicating a region of the object detected, a color or text indicating a name of the object detected, and a numerical value indicating a probability during the detection. Note that the frame may include a plurality of coordinates for specifying the frame.
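For illustration only, one possible (hypothetical) representation of such a detection result is sketched below; the field names box, class_name, and score and the example values are assumptions for explanation and are not defined by the present embodiment.

```python
# A minimal illustrative sketch (not part of the embodiment) of a structure
# that a detection result might take; the field names are assumptions.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Detection:
    box: Tuple[float, float, float, float]  # frame coordinates (x0, y0, x1, y1)
    class_name: str                         # name of the detected object, e.g. "skis"
    score: float                            # probability output during the detection

# Example: one detected object ("skis") with a score of 0.87.
detections: List[Detection] = [
    Detection(box=(120.0, 340.0, 480.0, 395.0), class_name="skis", score=0.87)
]
```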
The detection result output unit 314 outputs information about the object detected by the detection model (that is, the object detection result). Using an output from the detection model execution unit 312, the detection result output unit 314 superimposes the frame indicating the region of the object on the photographed image to generate the object detection result. The detection result output unit 314 can provide the generated object detection result to, for example, the communication device 101, the service provider terminal 102, and the overfitting detection unit 320.
The overfitting detection unit 320 includes a setting unit 322, a basis region specification unit 324, a mask image generation unit 326, a condition determination unit 328, and a notification unit 330, and the overfitting detection unit 320 executes the overfitting determination process by functional configurations of these units.
The setting unit 322 sets a model, an image, a label, and the like to be used in the overfitting determination process in accordance with, for example, setting information to be transmitted from the service provider terminal 102. The setting information includes, for example, identification information of the model used in the overfitting determination process, a file name or a path of the input image, and a file name or a path of a file including a correct answer label. The identification information of the model is, for example, information for identifying a learned model stored in model data 331.
For example, the basis region specification unit 324 specifies, as a basis region (that is, a region regarded as important in object detection), a region in which a contribution degree of each of the image features is higher than a predetermined threshold value when the detection model outputs a specific object region (for example, skis) as a detection result. The basis region specification unit 324 can use integrated gradients, which is a known explanation method, to specify the basis region (see Axiomatic Attribution for Deep Networks). Of course, the present invention is not limited to this example, and the basis region specification unit 324 may use another explanation method.
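For illustration only, the following is a minimal sketch, assuming NumPy and a toy score function with an analytic gradient, of how a contribution map could be approximated by integrated gradients and thresholded into a basis region; the toy score function, the baseline, and the threshold value are assumptions for explanation, not the method of the embodiment itself.

```python
# A minimal sketch: Riemann-sum approximation of integrated gradients for a
# toy score function, followed by thresholding into a basis region. The toy
# gradient (grad_fn) and the threshold Th are assumptions for illustration.
import numpy as np

def integrated_gradients(x, baseline, grad_fn, steps=50):
    """Approximate IG: (x - baseline) * mean of gradients along the path."""
    alphas = np.linspace(0.0, 1.0, steps)
    avg_grad = np.mean([grad_fn(baseline + a * (x - baseline)) for a in alphas], axis=0)
    return (x - baseline) * avg_grad

grad_fn = lambda x: 2.0 * x          # gradient of a toy score sum(x**2)
image = np.random.rand(8, 8)         # stand-in for the input image
baseline = np.zeros_like(image)      # black image as the IG baseline

attribution = integrated_gradients(image, baseline, grad_fn)
Th = 0.1                             # threshold on the contribution degree
basis_region = attribution > Th      # boolean map of the basis region
```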
The mask image generation unit 326 generates an image (a mask image) obtained by invalidating, in the input image, the image features distributed, within the basis region, in a region (for example, a person region) different from the specific object region (for example, the skis). The mask image generation unit 326 can generate various mask images, and details will be described later.
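As a purely illustrative sketch continuing the example above, such masking could be realized by overwriting, with a fill value, the basis-region pixels that fall outside the box of the specific object; the box format and the fill value are assumptions and are not specified by the embodiment.

```python
# A minimal sketch of mask image generation: basis-region pixels outside the
# specific object's box (e.g. the ski region) are invalidated with a fill
# value. The box format (x0, y0, x1, y1) and the fill value are assumptions.
import numpy as np

def make_mask_image(image, basis_region, keep_box, fill=0.0):
    x0, y0, x1, y1 = keep_box
    keep = np.zeros(image.shape[:2], dtype=bool)
    keep[int(y0):int(y1), int(x0):int(x1)] = True   # features in the ski region survive
    masked = image.copy()
    masked[basis_region & ~keep] = fill             # hide contributing features elsewhere
    return masked

image = np.random.rand(8, 8)
basis_region = image > 0.5                          # stand-in for the specified basis region
mask_image = make_mask_image(image, basis_region, keep_box=(2, 2, 6, 6))
```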
By using the detection model, the condition determination unit 328 executes the object detection process on the mask image, and determines whether or not the specific object is detected again even in a case of using the mask image. In other words, the condition determination unit 328 determines whether or not the object region can be correctly detected in a case where the image features distributed in the region (for example, the person region) different from the object region are hidden during detection of the specific object region (for example, the skis). That is, in a case where the detection model cannot correctly detect the skis without other features such as those in the person region, the condition determination unit 328 determines that the overfitting is present in the detection of the skis. Accordingly, in a case where the skis are not detected again, it is possible to suggest that overfitting using the features of the masked region has occurred. Note that, in order to determine whether or not the specific object is detected again in the case of using the mask image, the condition determination unit 328 can make the determination based on, for example, a degree of coincidence between the detected specific object region and a region indicated by correct answer data, and coincidence between a detected class and a class indicated by the correct answer data.
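For illustration, the re-detection determination could be sketched as follows, assuming the Detection structure sketched earlier and an IoU threshold of 0.5 as the degree-of-coincidence criterion; both are assumptions for explanation, not values specified by the embodiment.

```python
# A minimal sketch of the re-detection check: the specific object counts as
# "detected again" when some detection matches the correct-answer class and
# overlaps the correct-answer region sufficiently. The IoU threshold of 0.5
# is an assumption for illustration.
def iou(a, b):
    """Intersection over union of two boxes (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def detected_again(detections, gt_box, gt_class, iou_th=0.5):
    """True if some detection coincides with the correct answer data."""
    return any(d.class_name == gt_class and iou(d.box, gt_box) >= iou_th
               for d in detections)
```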
In a case where the condition determination unit 328 determines that the overfitting is present, the notification unit 330 causes the service provider terminal 102 to display that the overfitting is present.
Next, the overfitting determination process will be specifically described with reference to
Moreover, a process of generating the mask image from the input image will be described with reference to
On the other hand, a mask image 520 is an image obtained by masking the region 511 in which the contribution degree of each of the image features is higher than the threshold value (Th=0.1), but a person region 521 (a region of a bounding box of the person) is not masked. In other words, in a case where the mask image 520 is used, the detection model can detect the object by using the image features distributed in the ski region 402 and the person region 521. On the other hand, in a case where the mask image 510 is used, the detection model cannot use the image features distributed in the person region 521 during the object detection.
Alternatively, the change in the result of the object detection may be determined by, for example, whether or not a recall rate (Recall) is equal to or less than a predetermined threshold value. The recall rate is calculated, for example, by dividing the number of correct answers (=TP), for which an actual correct answer value is "positive" and a predicted value is also "positive", by the total number of data (=TP+FN) for which the actual correct answer value is "positive", that is, Recall = TP/(TP+FN). At this time, the total number of the data can be, for example, the sum of the number of the generated mask images and the input image. For example, in a case where the detection model uses the mask images, the recall rate decreases when detection omission 701 or erroneous detection 702 occurs. The condition determination unit 328 may determine that the overfitting is present in a case where the recall rate is equal to or less than the predetermined threshold value.
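As a small illustrative sketch, the recall calculation and the comparison described above could look as follows; the counts and the threshold value of 0.8 are assumptions for explanation, not values specified by the embodiment.

```python
# A minimal sketch of the recall rate determination: Recall = TP / (TP + FN),
# compared with a predetermined threshold. The counts and the threshold value
# (0.8) are assumptions for illustration.
def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0

RECALL_TH = 0.8
# e.g. the object is detected again in 7 of 10 images -> recall 0.7
overfitting_suspected = recall(tp=7, fn=3) <= RECALL_TH  # True here
```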
Referring again to
Next, a series of operations of the overfitting determination process in the information processing apparatus 100 will be described with reference to
In S801, the setting unit 322 acquires the setting information from the service provider terminal 102. In S802, the setting unit 322 acquires the detection model, the input image, the correct answer label, and the like from the setting information.
The setting information screen 901 receives a user operation for designating a setting name 902, model information 903, an input image 904, and a correct answer label 906. An arbitrary name for the user to distinguish the setting can be set as the setting name 902. The model information 903 is model identification information. The input image 904 is an input image used for the overfitting determination process. When the input image 904 is designated, the designated input image may be read into an application and displayed in an image region 905 within the setting information screen 901.
Moreover, the correct answer label 906 is designated by a file including a class of a correct answer object included in the input image and a position of the correct answer object. The correct answer label 906 may include object classes and positions of a plurality of objects. When the correct answer label 906 is designated, the designated correct answer label may be read, and the content of the label may be displayed at 907. By displaying the selected input image and the correct answer label, the user can confirm whether or not the image and the label are as intended. When a save button 908 is pressed according to a user operation, the settings 902 to 906 are transmitted to the information processing apparatus 100 as the setting information. At this time, the service provider terminal 102 transmits the setting information to the information processing apparatus 100, and then displays an object detection screen illustrated in
In S803, the detection model processing unit 310 detects the object within the input image by using the detection model set in S802, and transmits the detection result to the service provider terminal 102. At this time, the service provider terminal 102 displays a detection result image 1004 as illustrated in
On the object detection screen 1001, the user can set a mask image parameter for generating the mask image. The mask image parameter can be used to set, for example, a target object 1007 when the mask image is generated. The target object corresponds to the specific object described above. Note that the target object 1007 may be explicitly designated by the user, or may be automatically set by the service provider terminal 102 or the information processing apparatus 100 (for example, from the person and the skis of the detection result). In a case where the target object 1007 (that is, the specific object) is set as the skis, the object different from the specific object is set as, for example, the person.
Although not explicitly illustrated in
In S804, the setting unit 322 sets the basis region parameter and the mask image parameter according to the transmission information from the service provider terminal 102. The basis region parameter includes, for example, the threshold value (Th) of the contribution degree of each of the image features described above, and the mask image parameter includes a setting as to whether or not to mask the region of the target object (the specific object) or of the object different from the specific object. In a case where the information from the service provider terminal 102 does not include these pieces of information, the setting unit 322 can determine and set a value of each parameter. Furthermore, in a case where each parameter is repeatedly set (that is, in a case where the processes from S804 onward are repeated), the setting unit 322 can change and update each parameter. For example, the setting unit 322 can gradually reduce the threshold value of the contribution degree from 1 to 0, switch the target object (from the skis to the person), and switch the setting as to whether or not to mask the region of the different object. In other words, the setting unit 322 can generate various mask images by setting various different parameters, as sketched below.
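For illustration only, one possible (hypothetical) enumeration of such parameter patterns is sketched below; the step size of 0.1 and the object names are assumptions for explanation.

```python
# A minimal sketch of how the setting unit might enumerate parameter patterns:
# the contribution threshold is stepped down from 1.0 toward 0, the target
# object is switched, and masking of the other object's region is toggled.
# The step size and the object names are assumptions for illustration.
import itertools

thresholds = [round(1.0 - 0.1 * i, 1) for i in range(10)]   # 1.0, 0.9, ..., 0.1
target_objects = ["skis", "person"]
mask_other_region = [True, False]

# Each (Th, target, mask_other) tuple drives one pass of S804 through S810.
parameter_patterns = list(itertools.product(thresholds, target_objects, mask_other_region))
```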
In S805, as described above, the basis region specification unit 324 specifies the basis region based on the basis region parameter. In S806, the mask image generation unit 326 generates the mask image based on the basis region specified in S805 and the mask image parameter, as described above. In S807, the detection model processing unit 310 executes the process of detecting the object in the mask image generated, by using the detection model.
In S808, the condition determination unit 328 determines whether or not the detection result of the object by the detection model satisfies determination conditions. The condition determination unit 328 may determine that the determination conditions are satisfied in a case where neither erroneous detection nor detection omission occurs, or in a case where the recall rate (Recall) is greater than or equal to a predetermined threshold value. In a case where the condition determination unit 328 determines that the detection result of the object satisfies the determination conditions, the process proceeds to S809. Otherwise, the process proceeds to S810.
In S810, the notification unit 330 notifies the service provider terminal 102 of the possibility of the overfitting.
In S809, the overfitting detection unit 320 determines whether or not the process has been executed for all the parameters. For example, in a case where the basis region parameter or the mask image parameter is changed in S804, the overfitting detection unit 320 determines whether or not all the patterns have been executed. In a case where it is determined that the process has been executed for all the parameters, the overfitting detection unit 320 ends the process. Otherwise, the overfitting detection unit 320 causes the process to return to S804 and repeats the processes from S804 to S810. Accordingly, the overfitting detection unit 320 can repeat specifying the basis region (S805) and determining whether or not the specific object is detected again (S808), while changing the threshold value, the target object, or the like set in S804.
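Combining the sketches above, the overall flow of S804 through S810 could be expressed, for illustration only, roughly as follows; detect() stands in for the detection model and, like the other helpers, is a hypothetical composition of the earlier sketches rather than the implementation of the embodiment. For brevity, the target-object and mask-toggle parameters are not wired into the helpers here.

```python
# A minimal illustrative sketch of the determination loop of S804-S810,
# composing the earlier sketches (integrated_gradients, make_mask_image,
# detected_again, parameter_patterns). detect() stands in for the detection
# model; for brevity, target and mask_other are not wired into the helpers.
def run_overfitting_check(image, baseline, grad_fn, detect, gt_box, gt_class):
    for th, target, mask_other in parameter_patterns:                  # S804
        attribution = integrated_gradients(image, baseline, grad_fn)
        basis_region = attribution > th                                # S805
        mask_image = make_mask_image(image, basis_region, gt_box)      # S806
        detections = detect(mask_image)                                # S807
        if not detected_again(detections, gt_box, gt_class):           # S808
            return True    # possibility of overfitting -> notify (S810)
    return False           # all patterns satisfied the conditions (S809)
```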
As described above, in the present embodiment, the detection model processing unit 310 outputs the region of the first object (for example, the skis) in the input image by using the detection model. Next, the basis region specification unit 324 specifies, as the basis region, the region in which the contribution degree of each of the image features is higher than the threshold value (Th) when the machine learning model outputs the region of the first object. Moreover, the condition determination unit 328 executes the object detection process on the mask image (that is, an image obtained by invalidating the image features distributed in a region other than the ski region within the basis region), and determines whether or not the first object is detected again by using the detection model. Accordingly, in a case where the first object is not detected again, it is possible to suggest that overfitting using the features of the masked region has occurred. In other words, it is possible to suggest what kind of factors may cause the overfitting of the machine learning model.
Furthermore, in the present embodiment, the detection model can detect a plurality of objects including the first object and the second object in the input image. Then, in determining whether or not the first object is detected again, the condition determination unit 328 executes the object detection process on the mask image, which is obtained by invalidating the image features distributed in the region of the second object and in regions other than the region of the first object within the basis region, and determines whether or not the first object is detected again. The condition determination unit 328 can thus suggest the possibility that the overfitting is caused in consideration of the presence or absence of the region of the second object.
The invention is not limited to the above embodiments, and various modifications and changes can be made within the scope of the gist of the invention.