The present invention relates to an information processing apparatus, an information processing method, and a non-transitory computer-readable storage medium.
In recent years, techniques using a neural network (NN) have been actively developed for data processing such as image processing for improving the image quality of an image. For example, there is a technique for achieving image quality enhancement processing such as noise removal, blur removal, and super-resolution by using an NN (Restormer: Efficient Transformer for High-Resolution Image Restoration, Inception Institute of AI, Nov. 18, 2021, (searched on Dec. 13, 2023), Internet <URL: https://openaccess.thecvf.com/content/CVPR2022/papers/Zamir_Restormer_Efficient_Transformer_for_High-Resolution_Image_Restoration_CVPR_2022_paper.pdf>).
Since a recent NN has a large number of layers and a large amount of calculation, a high-speed computer is used at the time of learning. However, in data processing during inference after learning, calculation resources are often limited, and a more efficient calculation method is required.
As a method for efficient calculation at the time of inference, a method of performing calculation after quantizing the weights and the feature amounts of the NN into low-precision numerical values is known. Quantization enables even equipment with limited calculation resources, such as embedded equipment, to run the NN. Even on a general-purpose computer, quantizing the weights of the NN and the like may make a high-throughput calculation instruction, such as a single instruction multiple data (SIMD) instruction, usable, so that a speedup can be expected.
However, when the weights, the feature amounts, and the like of the NN are quantized into low-precision numerical values, the precision of data, such as the value resolution of an image output by the NN, generally decreases. In particular, in a case where a weight or the like is quantized at a bit depth smaller than the bit depth of the original data, the precision of the output data, such as the gradation of an output image, becomes coarse, and the deterioration appears remarkably. For example, for an NN intended for image quality enhancement processing, in a case where the input image to the NN is a 12-bit to 14-bit RAW image, if the weights, the feature amounts, and the like of the NN are quantized to 8-bit, the enhanced image output by the NN is also output in 8-bit, and thus the gradation of the original image cannot be expressed. The RAW image is finally converted into an 8-bit JPEG image or the like by development processing, but since the gradation of the original RAW image is coarse, the converted 8-bit JPEG image also has a coarse gradation as a result, and an image with deteriorated image quality is output.
According to the present invention, deterioration of final data is suppressed even when quantization is performed at a bit depth smaller than the bit depth of processing target data in an NN used for data processing such as image quality enhancement processing.
According to one aspect of the present disclosure, there is provided an information processing apparatus configured to process target data by a neural network, the information processing apparatus comprising: an input data acquisition unit configured to acquire the target data; a supervisory data acquisition unit configured to acquire supervisory data; and a learning unit configured to perform learning so as to reduce an error between the supervisory data and output data obtained by inputting the target data to the neural network and processing the target data, and to update a parameter of the neural network, wherein, in a case where a bit depth of the supervisory data is a second bit depth smaller than a first bit depth of the target data, the supervisory data acquisition unit acquires the supervisory data subjected to depth conversion processing of converting a value of the supervisory data with a resolution matching a characteristic of the target data.
According to another aspect of the present disclosure, there is provided an information processing method of processing target data by a neural network, the information processing method comprising: acquiring the target data; acquiring supervisory data; performing learning so as to reduce an error between the supervisory data and output data obtained by inputting the target data to the neural network and processing the target data, and updating a parameter of the neural network; and, in the acquisition of the supervisory data, acquiring the supervisory data subjected to depth conversion processing of converting a value of the supervisory data with a resolution matching a characteristic of the target data in a case where a bit depth of the supervisory data is a second bit depth smaller than a first bit depth of the target data.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing a computer program that, when read and executed by a computer that processes target data by a neural network, causes the computer to: acquire the target data; acquire supervisory data; perform learning so as to reduce an error between the supervisory data and output data obtained by inputting the target data to the neural network and processing the target data, and update a parameter of the neural network; and, in the acquisition of the supervisory data, acquire the supervisory data subjected to depth conversion processing of converting a value of the supervisory data with a resolution matching a characteristic of the target data in a case where a bit depth of the supervisory data is a second bit depth smaller than a first bit depth of the target data.
Further features of the present invention will become apparent from the following description of exemplary embodiments (with reference to the attached drawings).
Hereinafter, embodiments will be described in detail with reference to the attached drawings. Note, the following embodiments are not intended to limit the scope of the claimed invention. Multiple features are described in the embodiments, but limitation is not made to an invention that requires all such features, and multiple such features may be combined as appropriate. Furthermore, in the attached drawings, the same reference numerals are given to the same or similar configurations, and redundant description thereof is omitted.
The present invention will be described below in detail based on preferred embodiments thereof with reference to the accompanying drawings. Note that the configurations illustrated in the following embodiments are merely examples, and the present invention is not limited to the illustrated configurations.
The CPU 11 is an abbreviation for central processing unit, and controls the information processing apparatus 1 by reading a control program stored in the ROM 12, loading the control program into the RAM 13, and executing the control program. The CPU 11 supports an SIMD instruction that collectively operates on 8-bit integer values, and uses the SIMD instruction in the inference processing described later. The information processing apparatus 1 may include other processors such as a micro processing unit (MPU), a graphics processing unit (GPU), and a quantum processing unit (QPU) in place of the CPU 11 or in addition to the CPU 11.
The ROM 12 is an abbreviation for read only memory, and is a nonvolatile memory. The ROM 12 stores a control program, various parameter data necessary for executing the program, and the like. The control program is executed by the CPU 11 and implements each processing described later.
The RAM 13 is an abbreviation for random access memory, and is a volatile memory. The RAM 13 temporarily stores an image, a control program, and an execution result thereof.
The secondary storage device 14 rewritably stores various programs and data such as image data used for the processing of the present embodiment. The secondary storage device 14 is, for example, a nonvolatile storage device such as a hard disk drive (HDD), a solid state drive (SSD), or a flash memory. The secondary storage device 14 stores, for example, an image used for calculation of the NN, a control program such as a model of the NN, a processing result of the control program, and the like. The information stored in the secondary storage device 14 is output to the RAM 13 in response to a request from the CPU 11 or the like, and is used by the CPU 11 to execute a program.
The input device 15 serves as an interface with the outside such as a user. The input device 15 may be a mouse, a keyboard, or the like that acquires input from the user.
The display device 16 is, for example, a monitor such as a liquid crystal display and an organic electro luminescence (EL) display. The display device 16 displays a processing result of the program, an image, and the like.
The connection bus 17 connects the CPU 11, the ROM 12, the RAM 13, the secondary storage device 14, the input device 15, and the display device 16, which constitute the information processing apparatus 1, so that they can perform data communication with one another.
In the present embodiment, the CPU 11 executes software or a program to implement processing and functions described later, but some or all of the processing and functions may be implemented by hardware. The hardware may be, for example, a dedicated circuit (application specific integrated circuit (ASIC), field programmable gate array (FPGA)), a processor (reconfigurable processor, digital signal processor (DSP)), and the like.
The information processing apparatus 1 may acquire, via a network or various storage media, software or a program describing functions and processing described later, and execute the software or the program on a processing apparatus (processor such as a CPU and a GPU) such as a personal computer.
Based on image data, the information processing apparatus 2 causes a model of the NN to learn, updates parameters such as a weight of the model of the NN, and quantizes the model. The image data is an example of target data. Note that the term “image” may be used as a term including a moving image, a still image, a video, and data thereof. The information processing apparatus 2 has functions of an input data acquisition unit 201, a model acquisition unit 202, a learning unit 203, a supervisory data acquisition unit 204, and a quantization unit 205.
The input data acquisition unit 201 acquires input data to be input to the model of the NN acquired by the model acquisition unit 202. For example, the input data acquisition unit 201 acquires an image as input data; specifically, it acquires, as input data, an image obtained by converting a 14-bit RAW image into an unsigned integer 16-bit type.
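As an illustrative sketch only (the function name and the use of NumPy are assumptions for illustration and not part of the described configuration), this type conversion can be written as follows; only the storage type is widened, so the pixel values remain in the 14-bit range.

```python
import numpy as np

def raw14_to_uint16(raw_pixels: np.ndarray) -> np.ndarray:
    """Store 14-bit RAW pixel values (0 to 16383) in an unsigned integer 16-bit array.

    Only the storage type is widened; the pixel values themselves are unchanged,
    so the maximum value remains 16383 rather than 65535.
    """
    assert raw_pixels.max() <= 16383, "expected 14-bit RAW pixel values"
    return raw_pixels.astype(np.uint16)
```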
The model acquisition unit 202 acquires a model of the NN.
The learning unit 203 has a function for learning the model of the NN acquired by the model acquisition unit 202. The learning unit 203 includes a quantization parameter acquisition unit 206 and a weight determination unit 207.
The quantization parameter acquisition unit 206 acquires a quantization parameter qi used for quantization of each layer of the model of the NN acquired by the model acquisition unit 202. When it is not necessary to distinguish which layer a quantization parameter belongs to, it is simply described as a quantization parameter q.
The weight determination unit 207 updates and determines, by learning, the weights of the model of the NN acquired by the model acquisition unit 202. For example, the weight determination unit 207 inputs an image acquired by the input data acquisition unit 201 to the model of the NN, performs inference processing, outputs an output image (output data), and calculates a loss (objective function) between the output image and the supervisory data acquired by the supervisory data acquisition unit 204. In the present embodiment, the objective function is a square error, which is an example of an error. Thereafter, the weight determination unit 207 calculates a gradient by the error back propagation method using the calculated loss, performs learning so as to reduce the loss, and calculates the update amounts of the weights of the model. The weight determination unit 207 then updates the weights of the model of the NN by the update amounts. As a method of learning for the weight update, a known NN learning method may be applied, and detailed description thereof will be omitted.
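As a minimal sketch of one weight-update step, assuming a PyTorch-style model and optimizer (the framework and function names are assumptions for illustration; the embodiment does not prescribe any particular framework):

```python
import torch
import torch.nn as nn

def weight_update_step(model: nn.Module,
                       optimizer: torch.optim.Optimizer,
                       input_batch: torch.Tensor,
                       supervisory_batch: torch.Tensor) -> float:
    """One update of the NN weights: inference, square-error loss, back propagation."""
    optimizer.zero_grad()
    output = model(input_batch)                                # inference on the mini-batch
    loss = nn.functional.mse_loss(output, supervisory_batch)   # square error (objective function)
    loss.backward()                                            # gradient by error back propagation
    optimizer.step()                                           # apply the update amounts to the weights
    return loss.item()
```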
The supervisory data acquisition unit 204 acquires and converts supervisory data. The supervisory data acquisition unit 204 includes a 16-bit supervisory data acquisition unit 209 and a supervisory data bit depth conversion unit 210.
The 16-bit supervisory data acquisition unit 209 acquires data to be input to the supervisory data bit depth conversion unit 210. The 16-bit supervisory data acquisition unit 209 acquires, for example, an image in which a 14-bit RAW image is converted into an unsigned integer type 16-bit. The image is a high-quality image from which noise of the image acquired by the input data acquisition unit 201 has been removed.
The supervisory data bit depth conversion unit 210 executes depth conversion processing of converting an unsigned integer type 16-bit image acquired by the 16-bit supervisory data acquisition unit 209 into an unsigned integer type 8-bit image. 16-bit is an example of the first bit depth, and 8-bit is an example of the second bit depth. The supervisory data bit depth conversion unit 210 converts an unsigned integer type 16-bit into an unsigned integer type 8-bit using a preset lookup table, for example.
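As a sketch of how such a preset lookup table can be applied per pixel (the linear placeholder table below is an assumption used only as a stand-in; the actual preset table allots finer steps to low luminance, as described later):

```python
import numpy as np

# Placeholder table: index = unsigned 16-bit input value, entry = unsigned 8-bit value.
# A linear mapping is used here only as a stand-in for the preset table.
lut_16_to_8 = (np.arange(65536) // 256).astype(np.uint8)

def convert_supervisory_16_to_8(image_u16: np.ndarray) -> np.ndarray:
    """Depth conversion of supervisory data: per-pixel lookup from 16-bit to 8-bit."""
    return lut_16_to_8[image_u16]
```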
Using the quantization parameter, the quantization unit 205 quantizes the weights and the output values, such as the feature amounts, of each layer of the NN learned by the learning unit 203. For example, the quantization unit 205 quantizes the weights and the output values to the same bit depth as the bit depth of the image data output by the supervisory data bit depth conversion unit 210. Therefore, in the present embodiment, the quantization unit 205 quantizes the weights and the output values into 8-bit. For details, a known NN quantization method may be applied, and description thereof will be omitted.
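As a sketch only, assuming a simple symmetric linear quantization scheme (the embodiment leaves the concrete scheme to known NN quantization methods), quantization with a per-layer parameter q could look as follows:

```python
import numpy as np

def quantize_to_int8(values: np.ndarray, q: float) -> np.ndarray:
    """Quantize weights or feature values to signed 8-bit using a per-layer
    quantization parameter q, interpreted here as a scale (an assumed scheme)."""
    return np.clip(np.round(values / q), -128, 127).astype(np.int8)

def dequantize_from_int8(q_values: np.ndarray, q: float) -> np.ndarray:
    """Approximate real values recovered from the signed 8-bit representation."""
    return q_values.astype(np.float32) * q
```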
The information processing apparatus 3 executes inference processing on a RAW image by using the model of the NN learned by the information processing apparatus 2, converts the bit depth of the image of the inference result, and thereafter converts the image into an image in a format such as a JPEG image, that is, develops the image. The information processing apparatus 3 includes an inference data acquisition unit 215, a quantization model acquisition unit 211, an inference data bit depth conversion unit 212, and a development unit 213. The inference data bit depth conversion unit 212 is an example of a depth conversion unit.
The inference data acquisition unit 215 acquires image data used for inference. For example, the inference data acquisition unit 215 acquires and passes, to the quantization model acquisition unit 211, an 8-bit or 16-bit RAW image.
The quantization model acquisition unit 211 acquires a model of the NN quantized by the quantization unit 205. For example, the quantization model acquisition unit 211 acquires a model of the NN including an 8-bit weight quantized by the quantization unit 205. The quantization model acquisition unit 211 executes inference processing on the RAW image acquired by the inference data acquisition unit 215 by the model of the NN having been acquired, and outputs an 8-bit image as an inference result.
The inference data bit depth conversion unit 212 executes depth conversion processing of converting, from an unsigned integer type 8-bit to an unsigned integer type 16-bit, an output value (e.g., a pixel value of the image) of the quantized NN model acquired by the quantization model acquisition unit 211. The inference data bit depth conversion unit 212 converts the bit depth from the unsigned integer type 8-bit to the unsigned integer type 16-bit using the same lookup table as the lookup table used by the supervisory data bit depth conversion unit 210. Note that the lookup tables used for the two bit depth conversions may be substantially the same. Here, the supervisory data bit depth conversion unit 210 performs conversion from the "unsigned integer type 16-bit" column to the "unsigned integer type 8-bit" column of the lookup table, and the inference data bit depth conversion unit 212 performs the conversion in the reverse direction.
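As a sketch of the reverse-direction conversion (again with a linear placeholder standing in for the preset table, which is an assumption for illustration), the reverse table can be derived from the forward table:

```python
import numpy as np

# Placeholder forward table (same stand-in as before); the actual preset table is assumed.
lut_16_to_8 = (np.arange(65536) // 256).astype(np.uint8)

# Reverse table built from the forward table: for each 8-bit code, take a
# representative 16-bit value (here the middle of the bin that maps to it).
lut_8_to_16 = np.zeros(256, dtype=np.uint16)
for code in range(256):
    bin_values = np.where(lut_16_to_8 == code)[0]
    if bin_values.size > 0:
        lut_8_to_16[code] = bin_values[bin_values.size // 2]

def convert_inference_8_to_16(image_u8: np.ndarray) -> np.ndarray:
    """Depth conversion of the NN inference result: per-pixel lookup from 8-bit to 16-bit."""
    return lut_8_to_16[image_u8]
```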
The development unit 213 performs image processing such as gradation change on a RAW image, and finally converts the RAW image into an image in any format such as a JPEG image or a PNG image. For example, the development unit 213 performs format conversion on the unsigned integer type 16-bit RAW image whose bit depth has been converted by the inference data bit depth conversion unit 212, and outputs a 16-bit image. The development unit 213 includes, for example, the gradation change unit 214. The gradation change unit 214 is an example of a data change unit.
The gradation change unit 214 performs the gradation change processing in the development unit 213. The gradation change unit 214 executes the gradation change processing on the 16-bit image using a preset tone curve.
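As an illustrative sketch, a tone curve can be applied as a per-pixel mapping; representing the tone curve as a lookup table is an assumption for illustration, not a requirement of the embodiment.

```python
import numpy as np

def apply_tone_curve(image_u16: np.ndarray, tone_curve: np.ndarray) -> np.ndarray:
    """Gradation change: map each unsigned 16-bit pixel through a tone curve
    given as a 65536-entry lookup table (typically with fine steps in dark tones)."""
    assert tone_curve.shape == (65536,)
    return tone_curve[image_u16]
```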
Next, the learning processing performed by the information processing apparatus 2 will be described along the flowchart procedure of S501 to S509.
In S501, the model acquisition unit 202 acquires and outputs, to the weight determination unit 207, the model of the NN.
In S502, the quantization parameter acquisition unit 206 acquires and outputs, to the weight determination unit 207, the quantization parameter q from the model acquisition unit 202.
In S503, the input data acquisition unit 201 acquires and outputs, to the weight determination unit 207, a mini-batch image of a learning data set as an unsigned integer type 16-bit input image.
In S504, the 16-bit supervisory data acquisition unit 209 acquires, as supervisory data, and outputs, to the supervisory data bit depth conversion unit 210, image data of the unsigned integer type 16-bit corresponding to the input image acquired in S503.
In S505, the supervisory data bit depth conversion unit 210 executes depth conversion processing of converting the bit depth of the supervisory data. For example, using the preset lookup table, the supervisory data bit depth conversion unit 210 converts the unsigned integer type 16-bit supervisory data into unsigned integer type 8-bit supervisory data, and outputs the converted supervisory data to the weight determination unit 207.
In S506, the weight determination unit 207 inputs the mini-batch image acquired from the input data acquisition unit 201 to the model of the NN acquired from the model acquisition unit 202, and performs inference processing. Using the inference result, the weight determination unit 207 calculates a loss (objective function) between the inference result and the unsigned integer type 8-bit supervisory data acquired from the supervisory data bit depth conversion unit 210.
In S507, the weight determination unit 207 calculates the gradient by the error back propagation method using the calculated loss, and calculates the update amounts of the weights of the model of the NN.
In S508, the weight determination unit 207 updates the weights of the NN based on the calculated update amounts.
In S509, the weight determination unit 207 generates and outputs a model of the NN in which the weights are updated.
As described above, the information processing apparatus 2 repeats the processing in the procedure from S501 to S509 until the learning loss converges, and determines the weights of the model of the NN. Note that when the learning loss converges and the learning ends, the weight determination unit 207 outputs the model of the NN to the quantization unit 205. Using the quantization parameter, the quantization unit 205 quantizes the model of the NN acquired from the weight determination unit 207. For example, using the quantization parameter, the quantization unit 205 quantizes the weights and the feature amounts of the model of the NN acquired from the weight determination unit 207 to 8-bit.
Next, the inference and development processing performed by the information processing apparatus 3 will be described along the flowchart procedure of S601 to S604.
In S601, the quantization model acquisition unit 211 acquires a model of the NN quantized to 8-bit, for example, by the quantization unit 205.
In S602, the inference data acquisition unit 215 acquires, as an input image for inference, a RAW image of the unsigned integer type 16-bit, for example, and outputs it to the quantization model acquisition unit 211. By this, the model of the NN acquired by the quantization model acquisition unit 211 executes inference processing on the image for inference acquired by the inference data acquisition unit 215. The model of the NN outputs an unsigned integer type 8-bit image as an inference result to the inference data bit depth conversion unit 212.
In S603, the inference data bit depth conversion unit 212 executes depth conversion processing of converting the bit depth of the inference result of the quantized model of the NN. For example, by depth conversion processing using the same lookup table as that used by the supervisory data bit depth conversion unit 210, the inference data bit depth conversion unit 212 converts the unsigned integer type 8-bit inference result into an unsigned integer type 16-bit image, and outputs the converted image to the development unit 213.
In S604, the development unit 213 performs development processing on the unsigned integer type 16-bit image, and converts the image into any image format such as a JPEG image or a PNG image. In this conversion processing, the gradation change unit 214 performs gradation change using the preset tone curve.
As described above, the information processing apparatus 3 performs the processing from inference to development with the quantized NN model after learning, in the procedure of S601 to S604.
The unsigned integer type 16-bit image output from the inference data bit depth conversion unit 212 is finally subjected to gradation processing by the tone curve in the development unit 213.
In the processing of converting unsigned integer type 16-bit data into unsigned integer type 8-bit data, the supervisory data bit depth conversion unit 210 converts the pixel values with a resolution that depends on the pixel value. Specifically, in the bit depth conversion processing, the supervisory data bit depth conversion unit 210 generates supervisory data by performing conversion with a fine gradation in the low luminance region, which requires fine gradation at the time of development, and performing conversion with a rough gradation in the luminance regions above it. Then, the model of the NN performs learning so as to output an unsigned integer type 8-bit image with similar gradation, using the 8-bit supervisory data whose bit depth has been converted. When converting the unsigned integer type 8-bit image output from the model of the NN into the unsigned integer type 16-bit, the inference data bit depth conversion unit 212 performs the conversion by the reverse procedure using a similar lookup table. Therefore, the gradation of the low luminance region of the unsigned integer type 16-bit data converted by the inference data bit depth conversion unit 212 becomes fine, and corresponds to the gradation of the tone curve of the development processing after the inference processing. As a result, fine gradation is maintained in the parts where the tone curve makes fine gradation changes, and deterioration of the final image quality can be suppressed.
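As a numerical sketch of this property, assuming purely for illustration that the conversion follows a gamma-type curve (the actual preset tables may differ), the round trip through the two conversions keeps a small error in the low luminance range while the high luminance range is represented more coarsely:

```python
import numpy as np

MAX_14BIT = 16383  # the 16-bit data originates from a type-converted 14-bit RAW image
GAMMA = 2.2        # assumed curve shape, for illustration only

x16 = np.arange(MAX_14BIT + 1, dtype=np.float64)
table_16_to_8 = np.round(255.0 * (x16 / MAX_14BIT) ** (1.0 / GAMMA)).astype(np.uint8)

x8 = np.arange(256, dtype=np.float64)
table_8_to_16 = np.round(MAX_14BIT * (x8 / 255.0) ** GAMMA).astype(np.uint16)

# Round trip 16-bit -> 8-bit -> 16-bit for a dark range and a bright range.
dark = np.arange(0, 64, dtype=np.uint16)
bright = np.arange(MAX_14BIT - 63, MAX_14BIT + 1, dtype=np.uint16)
err_dark = np.abs(table_8_to_16[table_16_to_8[dark]].astype(int) - dark.astype(int)).max()
err_bright = np.abs(table_8_to_16[table_16_to_8[bright]].astype(int) - bright.astype(int)).max()
print(err_dark, err_bright)  # the dark-range error stays small; the bright range is coarser
```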
Note that, in the present embodiment, in order to explain a general development processing procedure in which the development unit 213 performs development processing on a 16-bit image, an example has been described in which the development unit 213 performs the development processing after the inference data bit depth conversion unit 212 converts the unsigned integer type 8-bit data into unsigned integer type 16-bit data. However, if the development unit 213 is specified to process unsigned integer type 8-bit data, the effects of the present proposal can be obtained even if the inference data bit depth conversion unit 212 and the gradation change unit 214 are not provided.
In the present embodiment, after learning of the model of the NN having the quantization parameter, the quantization unit 205 performs 8-bit quantization on the model of the NN. However, the effects of the present proposal can also be obtained when the quantization unit 205 performs 8-bit quantization after learning of a normal NN model having no quantization parameter. The effects of the present proposal can also be obtained when a model of the NN already quantized to 8-bit performs learning in 8-bit as it is.
In the first embodiment described earlier, the lookup table is used for the bit depth conversion processing of the supervisory data bit depth conversion unit 210, but an equation may be used.
Since the target of the gradation processing is an unsigned integer type 16-bit image, the value calculated by Equation (1) is converted into an integer value.
Equation (2) is an equation that converts the pixel value with a resolution depending on the pixel value (luminance) in a case where a 16-bit image is converted into an 8-bit image by the bit depth conversion processing. Since Equation (2) converts an unsigned integer type 16-bit integer value into an unsigned integer type 8-bit integer value, the unsigned integer type 16-bit image is converted into the unsigned integer type 8-bit image by Equation (2). Since the unsigned integer type 16-bit image is originally an image obtained by type-converting a 14-bit RAW image, the maximum value of x in Equation (2) is 16383. As seen by comparing Equations (1) and (2), use of Equation (2) enables the supervisory data bit depth conversion unit 210 to convert the unsigned integer type 16-bit image into the unsigned integer type 8-bit image with a gradation resolution similar to that of the tone curve of Equation (1) used by the gradation change unit 214 of the present embodiment.
Equation (3) is an equation that converts the pixel value with a resolution depending on the pixel value (luminance) in a case where an 8-bit image is converted into a 16-bit image by the bit depth conversion processing. Since Equation (3) converts an unsigned integer type 8-bit integer value into an unsigned integer type 16-bit integer value, the integer values of the unsigned integer type 8-bit image are converted into unsigned integer type 16-bit integer values. Since the unsigned integer type 16-bit image is assumed to be an image obtained by type-converting a 14-bit RAW image, the value calculated by Equation (3) is at most 16383. Since Equation (3) is an inverse function of Equation (1), use of Equation (3) enables the inference data bit depth conversion unit 212 to convert the unsigned integer type 8-bit image into the unsigned integer type 16-bit image with a gradation resolution similar to that of the tone curve of Equation (1) used by the gradation change unit 214 of the present embodiment.
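The concrete forms of Equations (1) to (3) are not reproduced above; purely as an illustration of how such a matched forward/inverse pair of conversion equations could be implemented, the following sketch assumes a gamma-type form and exponent, which are assumptions and not the equations of the embodiment:

```python
import numpy as np

MAX_14BIT = 16383   # maximum value of x, since the 16-bit image is a type-converted 14-bit RAW
GAMMA = 2.2         # assumed exponent, for illustration only

def forward_16_to_8(x_u16: np.ndarray) -> np.ndarray:
    """Conversion in the role of Equation (2): unsigned 16-bit -> unsigned 8-bit,
    with finer steps allotted to small (dark) pixel values."""
    y = 255.0 * (x_u16.astype(np.float64) / MAX_14BIT) ** (1.0 / GAMMA)
    return np.round(y).astype(np.uint8)

def inverse_8_to_16(x_u8: np.ndarray) -> np.ndarray:
    """Conversion in the role of Equation (3): unsigned 8-bit -> unsigned 16-bit,
    taking a value of at most 16383."""
    y = MAX_14BIT * (x_u8.astype(np.float64) / 255.0) ** GAMMA
    return np.round(y).astype(np.uint16)
```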
As described above, according to the present embodiment, the effects of the present proposal can be obtained also by performing, with an equation, the bit depth conversion of the supervisory data bit depth conversion unit 210 and the inference data bit depth conversion unit 212.
Note that the equations and the lookup tables of the supervisory data bit depth conversion unit 210 and the inference data bit depth conversion unit 212 may be acquired by learning.
In the embodiments of the present proposal, the luminance of an image has been described as the processing target, but the present invention is also applicable to a frequency, and is also applicable to processing other than images, such as voice processing, a regression task, and the like.
The above-described embodiments may be combined. For example, the user may be allowed to select the lookup table of the first embodiment and the equation of the second embodiment.
In the above-described embodiments, the 8-bit and 16-bit images have been described, but the bit number of an image may be appropriately changed.
Embodiment(s) of the present invention can also be realized by a computer of a system or apparatus that reads out and executes computer executable instructions (e.g., one or more programs) recorded on a storage medium (which may also be referred to more fully as a ‘non-transitory computer-readable storage medium’) to perform the functions of one or more of the above-described embodiment(s) and/or that includes one or more circuits (e.g., application specific integrated circuit (ASIC)) for performing the functions of one or more of the above-described embodiment(s), and by a method performed by the computer of the system or apparatus by, for example, reading out and executing the computer executable instructions from the storage medium to perform the functions of one or more of the above-described embodiment(s) and/or controlling the one or more circuits to perform the functions of one or more of the above-described embodiment(s). The computer may comprise one or more processors (e.g., central processing unit (CPU), micro processing unit (MPU)) and may include a network of separate computers or separate processors to read out and execute the computer executable instructions. The computer executable instructions may be provided to the computer, for example, from a network or the storage medium. The storage medium may include, for example, one or more of a hard disk, a random-access memory (RAM), a read only memory (ROM), a storage of distributed computing systems, an optical disk (such as a compact disc (CD), digital versatile disc (DVD), or Blu-ray Disc (BD)™), a flash memory device, a memory card, and the like.
While the present invention has been described with reference to exemplary embodiments, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.
This application claims the benefit of Japanese Patent Application No. 2023-216193, filed Dec. 21, 2023, hereby incorporated by reference herein in its entirety.