The present invention relates to an information processing apparatus, an information processing method, and a storage medium.
In recent years, methods using neural networks have been developed in an image processing technology for improving the image quality of an image. For example, neural networks are used to achieve image processing for improving image quality such as noise removal, blur removal, and super resolution.
Additionally, for example, in Document 1 (Frequency Attention Network: Blind Noise Removal for Real Images, Hongcheng Mo et al., 2020.), an attention mechanism is proposed as one of the elements constituting the neural network. In the attention mechanism, when a feature amount is input, a weight of the feature amount is generated, and attention processing for emphasizing an important element in the feature amount is realized by using the generated weight.
By introducing this attention mechanism into a neural network, the performance of various image processing tasks can be improved, and image quality can also be improved in the aforementioned image processing for improving image quality.
However, in general, the processing load of the attention processing is large, and the processing speed is reduced when the attention processing is excessively introduced into the network.
In contrast, Japanese Unexamined Patent Application Publication No. 2019-212206 utilizes the fact that the attention processing generates a weight for the input feature amount: the calculation processing that generates a feature amount having a small weight is determined to be redundant, and that calculation processing is reduced.
However, the processing in Japanese Unexamined Patent Application Publication No. 2019-212206 is intended to reduce the number of filters and the number of dimensions of the feature amount, and is not intended to reduce a series of processing steps for the feature amount; therefore, the processing load of the attention mechanism remains large.
An information processing apparatus comprising: at least one processor; and a memory coupled to the at least one processor, the memory storing instructions that, when executed by the at least one processor, cause the at least one processor to: perform image processing by inputting an image to a first neural network in which an attention mechanism that performs image processing for improving image quality is included; detect a redundant attention mechanism by determining whether or not a weight of attention processing generated in a process of the image processing is active; acquire a second neural network by deleting the detected redundant attention mechanism from the first neural network; and perform machine learning with the second neural network.
Further features of the present invention will become apparent from the following description of embodiments with reference to the attached drawings.
Hereinafter, with reference to the accompanying drawings, favorable modes of the present invention will be described using embodiments. In each diagram, the same reference signs are applied to the same members or elements, and duplicate descriptions will be omitted or simplified.
The information processing apparatus 100 also includes a display control unit 105 that generates a display signal for displaying an image and the like on a liquid crystal display, an organic EL display, or the like, a communication unit 106 for communicating with the outside of the information processing apparatus 100, and the like.
However, some or all of them may be realized by hardware. As the hardware, a dedicated circuit (ASIC), a processor (reconfigurable processor, DSP), and the like can be used.
Additionally, each functional block as shown in
The information processing apparatus 100 as shown in
In the model storage unit 116, a neural network in which attention that has been learned in advance is incorporated is stored. The database unit 110 is a database that stores images for image quality evaluation and images for learning image processing for improving image quality.
The image processing unit 111 executes predetermined image processing when an input image is provided by using a neural network stored in the model storage unit 116. That is, the image processing unit 111 executes an image processing step that performs the image processing by inputting an image quality evaluation image to the first neural network in which the attention mechanism for executing the image processing for improving image quality is incorporated. The processing result storage unit 112 stores a weight of the attention processing generated in the process of the image processing that has been executed by the image processing unit 111.
The detection unit 113 performs the activation determination for a weight generated by the attention mechanism, and detects a redundant attention mechanism from the attention of the neural network stored in the model storage unit 116. Specifically, the detection unit 113 detects a redundant attention mechanism by determining whether or not the weight of the attention processing generated in the process of the image processing is active.
The deletion unit 114 deletes the redundant attention mechanism detected by the detection unit 113 from the first neural network stored in the model storage unit 116, and acquires a new second neural network. The learning unit 115 learns the second neural network acquired by the deletion unit 114.
Next, with reference to the functional block diagram of
First, the process of detecting a redundant attention mechanism in step S301 of
Here, a detailed processing flow of step S301 is shown in
As the image quality evaluation image stored in the database unit 110, an image is used in which a part to be emphasized in evaluating the image quality can be evaluated. For example, in a case in which the sharpness of an image is to be emphasized in the image quality of an image processing result, the sharpness and reproducibility of edges and colors are important.
In this case, a frequency chart indicating the fluctuation of the frequency band in the image and a color chart in which regions in the image are divided for each color are used as the image quality evaluation image in order to evaluate the sharpness and reproducibility of the edges and the colors.
In addition, for example, in a case in which the visibility of characters is to be emphasized, a character chart that reproduces characters is used. Additionally, in a case in which the visibility of a specific object, for example, the visibility of an automobile, is to be emphasized in image quality, an image in which an automobile is captured is used.
In step S312, the image quality evaluation image stored in the database unit 110 is acquired as an input image to the network for performing the image processing for improving image quality. In step S313, the image processing unit 111 executes the image processing for improving image quality on the input image acquired in step S312, by using the learned neural network to which the attention mechanism that is stored in the model storage unit 116 is introduced. Additionally, in step S313, the loop processing starts.
Note that the neural network according to the first embodiment executes noise removal as the image processing for improving image quality by using, for example, U-Net as described in Document 2 (Toward Convolutional Blind Denoising of Real Photographs, Shi Guo et al., 2019). That is, in Document 2, a Convolutional Neural Network (CNN) that realizes noise removal is described, and the CNN is composed of a large number of convolution layers and activation layers.
Additionally, in the network of Document 2, noise removal is performed by using a network referred to as “U-Net” having a U-shaped structure in order to realize, in particular, image processing for improving image quality such as noise removal and super-resolution. In the first embodiment, an example using the above U-Net will be explained.
That is, the first and second neural networks in the first embodiment have an encoder unit that generates a plurality of feature amounts, and a decoder unit that restores the plurality of feature amounts as an image of a desired image processing execution result. More specifically, the first and second neural networks have an encoder unit that generates feature amounts having a plurality of resolutions, and a decoder unit that restores the feature amounts of the plurality of resolutions as an image of a desired image processing execution result.
First, in the encoder 401, feature amounts having different resolutions and numbers of channels are generated from the input image 411. Thereafter, while deconvolution is performed by the decoder 402 on the feature amount that has been compressed to the end by the encoder 401, the number of channels is reduced and the resolution is increased, and the feature amount is restored as an image.
Finally, a noise-removed image 413 can be obtained as a high-quality image on which image processing including predetermined noise removal processing has been performed. That is, when an image with noise is input, the first and second neural networks output an image from which noise has been removed.
Although the network configured as described above is used in the first embodiment, the network may have any structure as long as it realizes high image quality, such as by generating feature amounts of a plurality of resolutions, and the position and the number of attention mechanisms are not limited thereto.
Details of the neural network according to the first embodiment as shown in
Note that the activation processing relu (Rectified Linear Unit) is processing using a function in which the output value is always 0 when the input value is 0 or less, and the output value is the same value as the input value when the input value is more than 0.
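As a minimal sketch, the relu function as described above can be written as follows (NumPy is used here only for illustration):

```python
import numpy as np

def relu(x):
    # Outputs 0 for inputs of 0 or less, and passes positive inputs
    # through unchanged, exactly as described above.
    return np.maximum(x, 0.0)

print(relu(np.array([-2.0, 0.0, 3.0])))  # [0. 0. 3.]
```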
As described above, in the network of the first embodiment, processing 421 of applying a plurality of convolutions and the activation process relu is performed on the input image 411 including noise to generate a high-resolution feature amount. Furthermore, the attention processing is executed on the generated feature amount by attention processing 405 to generate the feature amount 412.
In a low-resolution feature amount encoder unit 404A, the pooling processing on the feature amount 412 is performed and a plurality of convolutions and the activation processing relu are performed to generate a low-resolution feature amount. Furthermore, the attention processing 405A is executed on the feature amount generated by the low-resolution feature amount encoder unit 404A.
On the feature amount 412A on which the attention processing 405A has been executed, pooling processing is further performed in a low-resolution feature amount encoder unit 404B to reduce the resolution, and the feature amount is compressed. Additionally, a feature amount 412B is generated by performing the attention processing 405B.
Subsequently, the resolution is further reduced by the pooling processing in a low-resolution feature amount encoder unit 404C to compress the feature amount. Additionally, a feature amount 412C is generated by performing the attention processing 405C.
As described above, the low-resolution feature amount encoder unit 404 of the first embodiment includes the attention processing 405A to 405C at the end of the path for generating the feature amount. Note that the attention processing 405A to 405C is attention processing for weighting the feature amount in the spatial direction or attention processing for weighting the feature amount in the channel direction.
That is, the attention mechanism incorporated in the first neural network includes an attention mechanism that generates a weight in the spatial direction of the input feature amount or an attention mechanism that generates a weight for the channel direction of the input feature amount.
Subsequently, the processing 512 using a sigmoid function is applied to the generated feature amount, and a weight 513 in a spatial direction having the value of 0 to 1 is generated. Furthermore, the generated weight 513 is multiplied by the input feature amount 501, so that the feature amount 515 weighted in the spatial direction is acquired.
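For illustration only, the spatial-direction attention described above can be sketched as follows. The processing that generates the feature amount before the sigmoid is not detailed in the text, so a simple channel average stands in for it here; all shapes are hypothetical:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention(feat):
    """feat: (C, H, W). Returns the feature weighted in the spatial direction
    and the generated weight map with values in (0, 1)."""
    # Stand-in (assumption) for the processing that produces a single-channel
    # map from the feature amount: a simple average over channels.
    spatial_map = feat.mean(axis=0)          # (H, W)
    weight = sigmoid(spatial_map)            # weight 513: values in (0, 1)
    # Multiply the weight by the input feature amount (feature amount 515).
    return feat * weight[np.newaxis, :, :], weight

feat = np.random.randn(8, 4, 4)
weighted, w = spatial_attention(feat)
```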
In contrast,
On the generated feature amount, the sigmoid function 524 is further applied through the fully connected layer processing 523, and thereby, a weight 525 in the channel direction is generated. Then, the generated weight 525 is multiplied by the input feature amount 501, so that a feature amount 526 weighted in the channel direction is acquired. Note that, in the first embodiment, it is assumed that attention in the spatial direction is used.
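The channel-direction attention can be sketched similarly. The text names the fully connected layer processing 523 and the sigmoid 524; the global average pooling used here to obtain a per-channel vector, and the single square weight matrix standing in for the fully connected layer, are assumptions for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, fc_weight):
    """feat: (C, H, W); fc_weight: (C, C), a stand-in for the fully
    connected layer processing. Returns the feature weighted in the
    channel direction and the per-channel weight in (0, 1)."""
    pooled = feat.mean(axis=(1, 2))          # global average pool -> (C,)
    weight = sigmoid(fc_weight @ pooled)     # weight 525: one value per channel
    # Multiply the weight by the input feature amount (feature amount 526).
    return feat * weight[:, None, None], weight

C = 8
feat = np.random.randn(C, 4, 4)
fc = np.random.randn(C, C) * 0.1
weighted, w = channel_attention(feat, fc)
```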
Returning to the explanation of
In step S315, the detection unit 113 determines whether the weight of the attention processing is active or inactive for each attention mechanism, by using the set of the image quality evaluation image and the weight of the attention processing stored in the processing result storage unit 112.
The attention mechanism in which the weight of the attention processing is determined to be inactive is determined to be redundant, and the redundant attention mechanism is detected. Here, step S315 functions as a detection step of detecting a redundant attention mechanism by determining whether or not the weight of the attention processing generated in the process of the image processing step is active.
Note that, in the first embodiment, a statistic of the weight values of the attention processing is calculated, and whether or not the weight of the attention processing is active is determined according to the statistic. Specifically, as a method for determining whether the weight of the attention processing is active or inactive, the variance value of the weight generated by the attention mechanism is used, and this variance value is calculated for each attention mechanism. When the calculated variance value is equal to or less than a predetermined threshold, the weight is determined to be inactive, and when the calculated variance value exceeds the threshold, the weight is determined to be active.
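A minimal sketch of this variance-based activation determination; the threshold value is hypothetical:

```python
import numpy as np

def is_active(attention_weight, threshold=1e-3):
    """Determine whether an attention weight map is active.

    A nearly constant weight map (low variance) barely re-weights the
    feature amount, so the attention mechanism that produced it is a
    candidate for deletion. The threshold here is a hypothetical value.
    """
    return float(np.var(attention_weight)) > threshold

flat = np.full((4, 4), 0.5)                        # constant map -> inactive
varied = np.linspace(0.0, 1.0, 16).reshape(4, 4)   # varying map  -> active
```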
The weight of the attention processing of the attention mechanism incorporated in the neural network when the input image 601 is given is acquired. That is, for example, a weight 602 of the attention processing 405 and a weight 603 of the attention processing 405A are respectively acquired.
At this time, it is assumed that the values of all the elements of the weight 602 of the attention processing 405 are substantially constant. In contrast, it is assumed that, with respect to the weight 603 of the attention processing 405A, the weight increases as the frequency of the input image 601 decreases.
That is, even if the weight 602 of the attention processing 405 is used, the feature amount is not substantially re-weighted, and thus the attention mechanism that generates this weight can be determined to be redundant. As described above, depending on the input image, the value of the weight of the attention processing either varies or does not vary from element to element. Accordingly, in the first embodiment, the variance value of the weight is calculated: when the value varies from element to element, the variance value is large, and when the variation is small, the variance value is small.
Then, the active or inactive determination results for the weight of the attention processing are totaled for all the image quality evaluation images for each attention mechanism. In the first embodiment, it is determined whether or not the attention mechanism is redundant by a majority decision of active or inactive, and if there are more inactive weights of the attention processing, it is determined that the attention mechanism that generates the weights of the attention processing is redundant.
That is, among the weights of the attention processing acquired for each attention mechanism when one or more image quality evaluation images are given, an attention mechanism for which there are more inactive weights of the attention processing than active weights is detected as redundant.
Note that the following method may be used as another method for the activation determination of the weight of the attention processing and the detection of the redundant attention mechanism. That is, for example, first, the average value of the variances of the weights of the attention processing over all the image quality evaluation images is calculated for each attention mechanism. Next, a weight of the attention processing whose average variance value is equal to or less than a defined threshold may be determined to be inactive, and the corresponding attention mechanism may be determined to be redundant.
That is, the average value of the variances of the weights of the attention processing acquired when one or more image quality evaluation images are given may be calculated, and the attention mechanism that generates a weight of the attention processing whose average value is equal to or less than a predetermined threshold may be detected as redundant.
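Both detection rules described above, the majority decision over evaluation images and the average variance against a threshold, can be sketched as follows; the threshold is hypothetical:

```python
import numpy as np

def detect_redundant_by_majority(weight_maps, threshold=1e-3):
    """weight_maps: weight maps from ONE attention mechanism, one per image
    quality evaluation image. Redundant if inactive maps are the majority."""
    inactive = sum(np.var(w) <= threshold for w in weight_maps)
    return inactive > len(weight_maps) - inactive

def detect_redundant_by_mean_variance(weight_maps, threshold=1e-3):
    """Alternative rule: redundant if the average of the per-image variances
    is equal to or less than the threshold."""
    return np.mean([np.var(w) for w in weight_maps]) <= threshold

flat = np.full((4, 4), 0.5)                        # inactive-looking map
varied = np.linspace(0.0, 1.0, 16).reshape(4, 4)   # active-looking map
```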
Thus, in step S315, the activation determination for the weight of attention is performed and a redundant attention mechanism is detected, and then the process proceeds to step S302. In step S302, the deletion unit 114 deletes the redundant attention mechanism detected in step S315 from the structure of the neural network that is stored in the model storage unit 116.
As a result, a new network structure is acquired, and the acquired new network structure is stored in the model storage unit 116. Here, the step S302 functions as a deletion step of deleting the redundant attention mechanism detected in the detection step from the first neural network and acquiring a new second neural network.
Next in step S303 of
In step S321, the loop processing of the learning of the neural network starts. By this process, the parameters such as the weight and bias of the network are repeatedly updated. That is, at the start of learning, the neural network to which initial parameter values are given is read, and thereafter, while updating the parameters of the neural network by repeatedly learning, the parameters are stored in the model storage unit 116. At this time, the learning ends by repeating a predetermined number of times.
In step S322, the database unit 110 outputs an input image to the image processing unit 111 and a true value image to the learning unit 115, from among the set consisting of the input image and the true value image. In the first embodiment, a clean image without noise is used as the true value image so that noise removal is executed as the image processing, and a noise-added image obtained by adding noise to the clean image is used as the input image.
In step S323, the image processing unit 111 performs image processing by inputting the input image obtained in step S322 to a neural network, and obtains the result. The obtained image processing results are output to the learning unit 115.
In step S324, the learning unit 115 calculates an error value by using the true value image obtained in step S322 and the image processing result obtained in step S323. In step S325, the parameters of the neural network are updated by the back propagation method using the error value calculated in step S324.
The neural network having the updated parameters is stored in the model storage unit 116. The processes from step S321 to step S325 are repeatedly performed by loop processing, and the learning of a neural network having a new structure for executing the image processing for improving image quality is performed.
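The learning loop of steps S321 to S325 can be sketched as follows. This is a toy stand-in only: the second neural network in the embodiment is the attention-pruned U-Net, whereas here a single linear layer is trained to denoise random vectors, and all sizes, noise levels, and learning rates are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the second neural network: one linear layer mapping a
# noisy "image" (flattened vector) toward its clean version.
dim = 16
W = np.eye(dim) + rng.normal(0, 0.1, (dim, dim))  # initial parameters
lr = 0.02

for step in range(500):                            # S321: learning loop
    clean = rng.normal(0, 1, dim)                  # S322: true value image
    noisy = clean + rng.normal(0, 0.1, dim)        # S322: noise-added input
    out = W @ noisy                                # S323: processing result
    err = out - clean                              # S324: error value
    W -= lr * np.outer(err, noisy)                 # S325: parameter update

# After repeated updates, W should be close to the identity mapping,
# i.e. the toy network has learned to (approximately) pass the signal
# through while suppressing the added noise.
```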
The method for determining whether the attention layer is active or inactive based on, for example, the variance value of the weights generated by the attention mechanism and deleting a redundant attention layer has been explained above. Note that the score serving as an index for determining whether the attention layer is active or inactive is not limited to the variance value. Any index may be used as long as it can be used for a similar purpose.
For example, the standard deviation may be used as another index. Alternatively, as other indices for determining the degree of distribution, indices such as skewness and kurtosis may be used, or a combination thereof may be used as an index for achieving the purpose.
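For illustration, the alternative indices mentioned above can be computed as follows in plain NumPy; the exact definitions used here (e.g., excess kurtosis) are assumptions:

```python
import numpy as np

def dispersion_scores(w):
    """Candidate indices for the activation determination of a weight map."""
    w = np.asarray(w, dtype=float).ravel()
    mean, std = w.mean(), w.std()
    if std == 0.0:
        # A perfectly constant map: every dispersion index is zero.
        return {"variance": 0.0, "std": 0.0, "skewness": 0.0, "kurtosis": 0.0}
    z = (w - mean) / std
    return {
        "variance": float(std ** 2),
        "std": float(std),
        "skewness": float(np.mean(z ** 3)),
        "kurtosis": float(np.mean(z ** 4) - 3.0),  # excess kurtosis
    }

flat = np.full(16, 0.5)
varied = np.linspace(0.0, 1.0, 16)
```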
As described above, according to the first embodiment, in a neural network in which noise removal processing is performed, redundant attention can be deleted and a neural network with a faster processing speed can be acquired.
In the first embodiment, whether the weight of the attention processing is active or inactive is determined by using the variance value of the weight and the like. In Modification 1, another method for determining whether the weight of the attention processing in the spatial direction is active or inactive in a case in which a frequency chart is used for the image quality evaluation set will be explained with reference to
The input image 601 that is a frequency chart is given as an input to a neural network capable of executing the image processing for improving image quality. This frequency chart has a characteristic in which frequencies increase from the left end portion of the image toward the right end portion thereof. At this time, the weight 602 of attention processing 405 incorporated in the neural network, the weight 603 of the attention processing 405A, and the like are obtained.
Here, in Modification 1, with respect to the weight of the attention processing acquired when the frequency chart is given as an input of the first neural network, the region is divided for each frequency band, and a representative value of the weight is calculated for each region. The average value or maximum value of the weights is used as the representative value.
Next, an absolute difference value of the representative values between adjacent regions is calculated. When the absolute difference value is larger than a predetermined threshold (for example, 0) for any pair of adjacent regions, the weight of the attention processing is determined to be active. Conversely, when every absolute difference value is equal to or less than the threshold, the weight of the attention processing is determined to be inactive.
If such a determination method is used, the weight 602 of the attention processing 405 can be determined to be inactive because the weight value is constant regardless of the frequency band. Additionally, the weight 603 of the attention processing 405A can be determined to be active because the value of the weight varies depending on the frequency band.
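The determination of Modification 1 can be sketched as follows, assuming a chart whose frequency increases from left to right and vertical strips as the frequency-band regions; the number of bands and the threshold are hypothetical:

```python
import numpy as np

def is_active_frequency_chart(weight_map, num_bands=4, threshold=0.0,
                              use_max=False):
    """Split the weight map into vertical strips (one per frequency band),
    take a representative value (mean or max) per strip, and compare
    adjacent strips. Active if any adjacent pair differs by more than
    the threshold."""
    h, w = weight_map.shape
    reps = []
    for i in range(num_bands):
        region = weight_map[:, i * w // num_bands:(i + 1) * w // num_bands]
        reps.append(region.max() if use_max else region.mean())
    diffs = [abs(reps[i + 1] - reps[i]) for i in range(num_bands - 1)]
    return any(d > threshold for d in diffs)

constant = np.full((8, 8), 0.7)     # like weight 602: constant -> inactive
# Like weight 603: weight decreases as frequency increases left to right.
graded = np.tile(np.repeat([0.9, 0.7, 0.5, 0.3], 2), (8, 1))
```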
In Modification 2, a method for determining whether the weight of the attention processing is active or inactive in the spatial direction when a character chart is used in the image quality evaluation set will be explained with reference to
In
In this case, for example, a character detector is used in order to separate the character region and the background region. Additionally, the average value or the maximum value of the weights is used as the representative value. Next, the absolute difference value of the representative values between the character region and the background region is calculated. At this time, when the absolute difference value is larger than 0, it is determined that the weight of the attention processing is active.
If such a determination method is used, the weight 802 of the attention processing 405 is determined to be inactive because there is no difference in the weight value between the character region and the background region. Additionally, the weight 803 of the attention processing 405A is determined to be active because a difference in the weight value between the character region and the background region is produced. Such a determination method can also be realized by using, instead of a character chart and a character detector, an object image in which a specific object is reflected and an object detector that detects the specific object.
That is, the weight of attention processing acquired when a character chart or an object image is given as an input to the neural network is divided into a character region or an object region and a background region, and a representative value of the weight is calculated for each region. Additionally, a difference value of the representative values of the weights between the character region or the object region and the background region is calculated, and when the difference value is equal to or less than a predetermined threshold, it is determined that the weight of the attention processing is inactive.
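A minimal sketch of this determination, assuming the character (or object) region is already given as a boolean mask; in practice the mask would come from the character or object detector:

```python
import numpy as np

def is_active_character_chart(weight_map, char_mask, threshold=0.0):
    """Compare a representative weight value (here: the mean) between the
    character/object region and the background region. Active when the
    absolute difference exceeds the threshold."""
    char_rep = weight_map[char_mask].mean()
    bg_rep = weight_map[~char_mask].mean()
    return abs(char_rep - bg_rep) > threshold

mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True                  # hypothetical character region

flat = np.full((8, 8), 0.5)            # like weight 802: no difference
peaked = np.full((8, 8), 0.2)
peaked[2:6, 2:6] = 0.9                 # like weight 803: region stands out
```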
In Modification 3, a method for determining whether the weight of the attention processing in the spatial direction is active or inactive in a case in which a color chart is used in the image quality evaluation set will be explained with reference to
In the color chart 901 shown in
Specifically, among the weights of attention processing obtained when a color chart is given as an input to the neural network, regions are divided for each color, and a representative value of the weights is calculated for each region.
Each color region can be divided by using the fact that the pixel values of the image are constant. Next, two color regions are randomly sampled from all the color regions and a pair of color regions is formed. The absolute difference value of the representative value is calculated between the formed pairs of color regions.
At this time, if, among all the sampled pairs of color regions, there are many pairs whose absolute difference values are greater than 0, the weight of the attention processing is determined to be active. If such a determination method is used, the weight 902 of the attention processing 405 is determined to be inactive because there is no difference in the weight value between the color regions.
Additionally, the weight 903 of the attention processing 405A is determined to be active because a difference in the value of the weight in each color region is produced. Thus, the difference value of the representative values of the weights between the regions divided for each color is calculated, and when the difference value is equal to or less than a predetermined threshold, the weight of the attention processing is determined to be inactive.
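A sketch of this pair-sampling determination; the 2x2-block color chart and its region labels (assumed known from the constant pixel values within each region) are hypothetical:

```python
import numpy as np

def is_active_color_chart(weight_map, color_labels, num_pairs=20,
                          threshold=0.0, rng=None):
    """Representative weight (mean) per color region, then compare randomly
    sampled pairs of regions; active if most sampled pairs differ by more
    than the threshold."""
    rng = np.random.default_rng(0) if rng is None else rng
    region_ids = np.unique(color_labels)
    reps = {r: weight_map[color_labels == r].mean() for r in region_ids}
    differing = 0
    for _ in range(num_pairs):
        a, b = rng.choice(region_ids, size=2, replace=False)
        if abs(reps[a] - reps[b]) > threshold:
            differing += 1
    return differing > num_pairs // 2

# Hypothetical color chart with four color regions laid out in a 2x2 grid.
labels = np.repeat(np.arange(4).reshape(2, 2), 4, axis=0).repeat(4, axis=1)
flat = np.full((8, 8), 0.5)        # like weight 902: no regional difference
varied = labels * 0.2 + 0.1        # like weight 903: differs per region
```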
As described above, the image quality evaluation image input to the first neural network may be various images. That is, the image quality evaluation image includes at least one of a frequency chart indicating a change in a frequency band in an image, a character chart in which characters are written in the image, an object image in which a specific object whose image quality is desired to be improved is reflected in the image, and a color chart in which regions are divided for each color in the image.
In Modification 4, a method of deleting a redundant attention mechanism in consideration of the calculation cost of attention will be explained. First, the processing speeds of all the attention processing when an input image is provided to the neural network are measured.
The target value for deleting attention mechanisms is determined based on the measurement result of the processing speed. For example, a 20% reduction in the total processing time of the attention processing is set as the target value. Next, redundant attention is detected in ascending order of the average value of the variance values of the attention weights obtained when a plurality of image quality evaluation images is given.
At this time, when the processing speed of the neural network after deleting the target attention mechanisms reaches the target value, that is, a 20% reduction in the processing time of the attention processing, the detection processing of the redundant attention mechanism stops.
Then, the attention mechanisms detected so far, which have small average variance values of the attention weights, are determined to be redundant attention mechanisms. Thus, it is possible to improve the processing speed while suppressing a decrease in accuracy due to the deletion of attention determined to be redundant. In this way, the redundant attention mechanism may be detected based on both the statistic of the weight of the attention processing and the processing speed of the attention processing.
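The cost-aware detection of Modification 4 can be sketched as follows; the attention names, variance values, and processing times below are hypothetical:

```python
def select_attentions_to_delete(attn_stats, target_ratio=0.2):
    """attn_stats maps an attention name to (mean_variance, processing_time).
    Attentions are marked redundant in ascending order of mean variance
    until the deleted processing time reaches target_ratio of the total
    attention processing time."""
    total_time = sum(t for _, t in attn_stats.values())
    target = total_time * target_ratio
    deleted, saved = [], 0.0
    for name, (var, t) in sorted(attn_stats.items(), key=lambda kv: kv[1][0]):
        if saved >= target:          # target reduction reached: stop detecting
            break
        deleted.append(name)
        saved += t
    return deleted

stats = {
    "attn_405":  (0.0001, 5.0),      # lowest variance -> deleted first
    "attn_405A": (0.0200, 3.0),
    "attn_405B": (0.0150, 2.0),
    "attn_405C": (0.0002, 4.0),
}
```

With the numbers above, a 20% target is already met after deleting the first (lowest-variance) attention, while a 50% target additionally deletes the second-lowest.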
In the first embodiment, redundant attention is deleted from a noise removal network in which attention is incorporated, and a noise removal network having a faster processing speed is newly acquired.
In the second embodiment, an example will be explained in which redundant attention is deleted from a super-resolution network that executes super-resolution processing of converting a low-resolution image into a high-resolution image, and a super-resolution network having a faster processing speed is newly acquired. That is, in the second embodiment, when an image is input, the first and second neural networks output a high-resolution image.
The configuration of the functional blocks of the image processing apparatus in the second embodiment is similar to, for example, the configuration of the functional blocks of the first embodiment as shown in
In step S313, the neural network executes super-resolution processing using a model learned so as to perform super-resolution processing. When a low-resolution image is given to the neural network as an input image, a high-resolution image is output.
Additionally, in step S322, a low-resolution image is acquired as an input image and a high-resolution image is acquired as a true value image. The newly acquired neural network is learned in the subsequent processes by using the acquired low-resolution image and high-resolution image.
By using the second embodiment configured as described above, even in a neural network in which super-resolution processing is performed, redundant attention is removed and a faster processing speed neural network can be acquired.
In the first embodiment, the activation determination of the attention processing is performed with respect to a certain image quality evaluation item. In the third embodiment, an example will be explained in which a neural network is acquired in which a redundant attention mechanism is deleted while an attention mechanism that is effective for a certain image quality evaluation item remains, by performing the activation determination using a plurality of different image quality evaluation images.
The configuration of the functional blocks of the image processing apparatus in the third embodiment is similar to the configuration of the functional blocks of the first embodiment as shown in, for example,
In the process of generating the feature amounts 1001A, 1002A, and 1003A, attention processing is finally executed as attention processing 1011, 1012, and 1013, and thereby, the feature amounts are generated. Subsequently, the generated feature amounts are combined when being input to the decoder unit 1004, and are processed by the decoder unit 1004 so as to output a high-quality image 1006 having a desired resolution and a desired number of channels.
In the neural network configured as described above, it is desirable that feature amounts having different properties are generated in each of the encoder units. It is possible to generate feature amounts having different properties in each encoder unit by performing learning such that the weight of attention processing of each encoder unit is different using, for example, the method of Document 3 (Diversified Visual Attention Networks for Fine-Grained Object Classification, Bo Zhao et al., 2016.).
Alternatively, feature amounts having different properties in each encoder unit may be generated by the following method. First, one of the encoder units is connected to the decoder unit, and sufficient learning is performed so as to obtain a desired feature amount.
For example, a neural network consisting of the encoder unit 1001 and the decoder unit 1004 is configured, and learning is performed by giving, during learning, a noise-added image and a noise-free clean image, each being an image in which an automobile is reflected. As a result, the encoder unit 1001 generates a feature amount specialized for automobile images.
Similarly, the encoder unit 1002 generates a feature amount specialized for a character chart by giving a noise-added image and a noise-free clean image of the character chart at the time of learning and performing learning.
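The independent training of specialized encoders described above can be illustrated with a toy sketch. Here each "encoder-decoder" is reduced to a single linear denoising map trained on one synthetic data distribution; the datasets (a smooth, automobile-like distribution and a binary, character-chart-like distribution), the training hyperparameters, and all names are assumptions introduced only to show that training on different data yields different learned behavior.

```python
import numpy as np

rng = np.random.default_rng(1)

def train_encoder(clean_imgs, sigma=0.1, steps=200, lr=0.05):
    """Fit a single linear map W that denoises the given image distribution:
    a toy stand-in for training one encoder unit together with the decoder
    on one specialised dataset."""
    D = clean_imgs.shape[1]
    W = np.eye(D)
    for _ in range(steps):
        noisy = clean_imgs + sigma * rng.standard_normal(clean_imgs.shape)
        pred = noisy @ W.T
        # gradient of mean squared error ||pred - clean||^2 w.r.t. W
        grad = 2.0 * (pred - clean_imgs).T @ noisy / len(clean_imgs)
        W -= lr * grad
    return W

# two specialised "datasets": smooth signals (automobile-like regions) versus
# high-contrast binary patterns (character-chart-like)
D, N = 16, 256
smooth = np.cumsum(rng.standard_normal((N, D)), axis=1) / np.sqrt(D)
charts = rng.integers(0, 2, size=(N, D)).astype(float)

W_car = train_encoder(smooth)    # counterpart of encoder unit 1001
W_chart = train_encoder(charts)  # counterpart of encoder unit 1002

# the two maps differ, i.e. each "encoder" has specialised to its data
print(np.linalg.norm(W_car - W_chart) > 0)
```

Because each map is fitted to a different data distribution, the two learned weights diverge, mirroring the idea that independently trained encoder units generate feature amounts with different properties.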
It is possible to acquire a neural network in which noise removal is performed while each encoder unit generates a different feature amount, by sufficiently learning the encoder units independently of each other and then changing the structure to the network as shown in
Additionally, in the third embodiment, the activation determination for a plurality of attention mechanisms is performed. For example, in the third embodiment, the attention activation determination using a frequency chart as an image quality evaluation image is performed by using the method of Modification 1.
Additionally, the attention activation determination is performed by using, as an image quality evaluation image, an image in which a character chart is reflected, by using the method of Modification 2. Furthermore, the attention activation determination is also performed using an image in which, for example, an automobile as a specific object image is reflected. Thus, the activation determination for the weights of a plurality of types of attention processing is performed, and an attention mechanism determined to be inactive by every activation determination method is detected as a redundant attention mechanism.
For example, when the weight of the attention processing 1011 is determined to be inactive for all of the character chart, the specific object image, and the frequency chart, the attention mechanism is detected as a redundant attention mechanism.
If the weight of the attention processing 1012 is determined to be active for, for example, the character chart among the three attention activation determinations, it is determined that the attention mechanism is not redundant. In other words, if even one determination finds the weight active, it is determined that the attention mechanism is not a redundant attention mechanism. Then, the detected redundant attention mechanisms are deleted, and a new structure of the neural network is acquired.
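The redundancy rule described above (redundant only if inactive for every evaluation image; kept if active for at least one) might be sketched as follows. The activity criterion, the threshold, and all weight values are hypothetical placeholders; the embodiment does not fix a specific test.

```python
import numpy as np

def is_active(weights, threshold=0.1):
    # one plausible criterion: active if any generated weight is non-negligible
    return float(np.max(np.abs(weights))) > threshold

# attention weights observed when each evaluation image is fed to the network
# (all values are made up for illustration)
observed = {
    "attention_1011": {"freq_chart": [0.01, 0.02], "char_chart": [0.0, 0.03],
                       "automobile": [0.02, 0.01]},
    "attention_1012": {"freq_chart": [0.02, 0.04], "char_chart": [0.9, 0.7],
                       "automobile": [0.05, 0.02]},
    "attention_1013": {"freq_chart": [0.8, 0.6], "char_chart": [0.1, 0.0],
                       "automobile": [0.7, 0.9]},
}

# redundant iff inactive for EVERY evaluation image; kept if active for at
# least one determination
redundant = [name for name, per_image in observed.items()
             if not any(is_active(np.asarray(w)) for w in per_image.values())]
print(redundant)  # ['attention_1011']
```

In this example only attention_1011 is inactive for all three evaluation images and is therefore detected as redundant, while attention_1012 is kept because the character-chart determination finds it active.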
An example of the structure of the neural network newly acquired in this way is shown in
As described above, in the third embodiment, the activation determination is performed for a plurality of different image quality evaluation images. As a result, it is possible to acquire a neural network in which attention processing is executed only on feature amounts that are effective for at least one image quality evaluation item. Accordingly, it is possible to acquire a neural network in which the speed reduction caused by the attention processing is suppressed while the image quality is improved for the image quality items to be emphasized.
While the present invention has been described with reference to exemplary embodiments, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. The scope of the following claims is to be accorded the broadest interpretation to encompass all such modifications and equivalent structures and functions.
In addition, as a part or the whole of the control according to the embodiments, a computer program realizing the functions of the embodiments described above may be supplied to the information processing apparatus and the like through a network or various storage media. Then, a computer (or a CPU, an MPU, or the like) of the information processing apparatus and the like may be configured to read and execute the program. In such a case, the program and the storage medium storing the program constitute the present invention.
Additionally, the present invention also includes a configuration that can be realized by using, for example, at least one processor or a circuit configured to function as the embodiments described above. Note that a plurality of processors may be used to perform distributed processing.
This application claims the benefit of priority from Japanese Patent Application No. 2023-067814, filed on Apr. 18, 2023, which is hereby incorporated by reference herein in its entirety.
Number | Date | Country | Kind
---|---|---|---
2023-067814 | Apr 2023 | JP | national