The present invention relates to an image processing apparatus and method, and a storage medium, and more particularly to an image segmentation technique.
Conventionally, semantic segmentation is known in machine learning. Semantic segmentation is a task of dividing an image into a plurality of regions of objects; each pixel of the image is classified and a pixel-level result is output. Accordingly, the training data for semantic segmentation needs to be labeled for each pixel.
Therefore, in semantic segmentation, the annotation work (the work of labeling images to create training data) is very burdensome. To reduce this workload, superpixels are used. A superpixel is a small region formed by grouping pixels with similar colors and/or textures.
For example, PTL 1 discloses a method for segmenting an image using hybrid-scale superpixels. Specifically, a user replaces larger-scale superpixels in a region of interest (Region of Interest: ROI) with smaller-scale superpixels (the scale being the region size of a superpixel), thereby achieving better boundary delineation.
Further, PTL 2 discloses a region discrimination apparatus that discriminates regions based on a saliency map and superpixels.
However, according to the technique described in PTL 1, when superpixels in an ROI are replaced with superpixels of a smaller scale, the scale is based on a user's instruction, and therefore a good segmentation boundary may not always be obtained. That is, if the scale is too large, the boundary accuracy deteriorates, and if the scale is too small, the workload of selecting superpixels when segmenting an image increases.
In addition, in PTL 2, the scale of the superpixels is not changed. Therefore, if an object having a size different from an expected size, or a plurality of objects with different sizes, exist in the same image, superpixels of an appropriate scale cannot be generated, and a segmentation boundary with good accuracy cannot be attained.
The present invention has been made in consideration of the above problems, and aims to strike a balance between improvement of boundary accuracy using superpixels and reduction of the load of segmentation work when segmenting an image in annotation work.
According to the present invention, provided is an image processing apparatus comprising one or more processors and/or circuitry which function as: an input unit that inputs image data of an image; an acquisition unit that acquires a size of an object region including an object to be extracted that is included in the image; a setting unit that sets a division number into which the object region is divided; a determination unit that determines a segment size of a superpixel based on the size of the object region and the division number; and a generation unit that generates, using the image data, superpixels each having a size in a predetermined range that includes the segment size determined by the determination unit.
Further, according to the present invention, provided is an image processing method comprising: inputting image data of an image; acquiring a size of an object region including an object to be extracted that is included in the image; setting a division number into which the object region is divided; determining a segment size of a superpixel based on the size of the object region and the division number; and generating, using the image data, superpixels each having a size in a predetermined range that includes the determined segment size.
Furthermore, according to the present invention, provided is a non-transitory computer-readable storage medium, the storage medium storing a program that is executable by the computer, wherein the program includes program code for causing the computer to function as an image processing apparatus comprising: an input unit that inputs image data of an image; an acquisition unit that acquires a size of an object region including an object to be extracted that is included in the image; a setting unit that sets a division number into which the object region is divided; a determination unit that determines a segment size of a superpixel based on the size of the object region and the division number; and a generation unit that generates, using the image data, superpixels each having a size in a predetermined range that includes the segment size determined by the determination unit.
Further features of the present invention will become apparent from the following description of exemplary embodiments with reference to the attached drawings.
Hereinafter, embodiments will be described in detail with reference to the attached drawings. Note, the following embodiments are not intended to limit the scope of the claimed invention. Multiple features are described in the embodiments, but limitation is not made to an invention that requires all such features, and multiple such features may be combined as appropriate. Furthermore, in the attached drawings, the same reference numerals are given to the same or similar configurations, and redundant description thereof is omitted.
The image input unit 101 inputs an image 106 (image data) to be analyzed to the image processing system 100, and the image processing unit 102 executes application software to perform segmentation processing on the image 106 input to the image input unit 101. The operation input unit 103 is composed of a mouse, keyboard, tablet, etc., and an operator operates the operation input unit 103 to input information to the image processing unit 102. The display unit 104 interactively displays the image being processed by the image processing unit 102 and the operation results of the operation input unit 103, etc. The label output unit 105 outputs a segmentation result, which is the processing result of the image processing unit 102, as a label 107. The output label 107 is stored in a storage device (not shown).
Next, a functional configuration of the image processing unit 102 will be described.
When dividing the image 106 into superpixels, a target division number setting unit 121 sets the number of superpixels into which a target object in the image that is to be segmented (extracted) is to be divided (target division number). A condition determination unit 122 determines superpixel generation conditions based on the size of the region of the target object (object region) and the set target division number. Here, the average segment size (average scale) of the superpixels is calculated as the superpixel generation condition. A superpixel generation unit 123 generates superpixels based on the average segment size.
A superpixel extraction unit 124 performs segmentation using superpixels corresponding to a target object selected via a GUI tool of the application software using the operation input unit 103. A segmentation correction unit 125 corrects errors in the segmentation results produced by the superpixel extraction unit 124. Specifically, among the superpixels selected by segmentation, a region of a superpixel that protrudes from the target object, or a region where a superpixel is missing or insufficient, is corrected pixel by pixel using a pen tool or the like to bring the extracted region closer to the region of the target object.
The storage unit 212 is composed of a main storage unit 215 (ROM or RAM, etc.) and an auxiliary storage unit 216 (magnetic disk device, SSD: Solid State Drive, etc.).
The CPU 210 performs calculations and control, and executes programs stored in the storage unit 212, thereby functioning as the image processing unit 102 of the image processing system 100 shown in
The computer 200 may include one or more CPUs 210 and storage units 212. That is, if at least one processing device (CPU) is connected to at least one storage device, and the processing device(s) execute a program stored in the storage device(s), the computer 200 functions as the image processing unit 102. The configuration that functions as the image processing unit 102 is not limited to the CPU 210, and may be an FPGA (Field Programmable Gate Array), an ASIC (Application Specific Integrated Circuit), or the like.
Next, details of the target division number setting unit 121 in this embodiment will be described with reference to
In an image 310, an object 312 is a dental restoration, which is the target object of segmentation.
The superpixel generation unit 123 executes superpixel processing on the entire image to be analyzed. Images 320 and 330 are enlarged views of a region 311 around a target object, in which the result of the superpixel processing is superimposed on the input image. In the superpixel processing, an average segment size, which is a setting value indicating the size of the superpixels into which the image is divided, is set. The image 320 is an example in which the average segment size is small, and the image 330 is an example in which the average segment size is large.
The larger the average segment size, the larger the size of each superpixel, and the lighter the extraction load on the superpixel extraction unit 124. On the other hand, as shown in the image 330, the larger the average segment size, the more likely it is that superpixels will protrude from the target object, as in a protruding portion 331, or that part of the target object will not be included in the superpixels, as in a missing or insufficient portion 332, resulting in lower segmentation accuracy.
By contrast, the smaller the average segment size, the smaller the size of each superpixel, which increases the extraction workload on the superpixel extraction unit 124, but the higher the segmentation accuracy for the target object, as can be seen in the image 320.
Thus, there is a trade-off between the workload of superpixel extraction and segmentation accuracy.
The target division number setting unit 121 sets how many superpixels the target object, i.e., in this embodiment, the restoration in the dental image, is to be divided into (the target division number). For example, in the example of image 320 of
Next, an operation of the image processing system 100 in the first embodiment will be described with reference to the flowchart shown in
First, in step S400, the CPU 210 inputs an image to be analyzed via the image input unit 101.
In step S401, the CPU 210 uses the target division number setting unit 121 to set a target division number in the image processing unit 102 in accordance with the operator's operation of the operation input unit 103. As described above, there is a trade-off between the workload of superpixel extraction and the segmentation accuracy. The operator sets a target division number that does not impose a large workload of superpixel extraction and provides sufficient segmentation accuracy. In many cases, a balance between the workload and the segmentation accuracy can be struck by setting the target division number between 10 and 100. In this embodiment, the target division number is described as 50 as an example, but is not limited to this value. In a case where the target division number is set to 50, the operator selects approximately 50 superpixels that constitute the target object.
In step S402, the CPU 210 specifies a target object by the condition determination unit 122 based on the operator's operation of the operation input unit 103.
As an example, an input image 500 to be analyzed is a dental image, and target objects 511, 512, and 513 are dental restorations. Specifying the region of a target object means roughly surrounding the target object with a closed curve. As an example, the region of the target object 511 may be specified by a circle 501 using a circle drawing tool, and as another example, the region of the target object 512 may be specified by a rectangle 502 using a rectangle drawing tool. As yet another example, the region of the target object 513 may be specified by a closed curve 503 using a freehand tool. When the freehand tool is used, the rough shape of the target object 513 can be obtained.
As the CPU 210 specifies a target object by the method described above or the like, it obtains the number of pixels (size) of the region within the drawn closed curve as the approximate target size.
For target objects of the same size, the more complex the shape of the target object, the lower the segmentation accuracy by superpixels. Therefore, the value of the target division number set in step S401 may be corrected based on the complexity of the shape of the target object obtained by using the freehand tool. The complexity is calculated using curvature entropy, circularity, etc., and the larger this value is, the larger the corrected target division number becomes. In this way, it is possible to improve the accuracy of segmentation by superpixels by taking into account the shape of the target object. Note that a default value may be set in step S401 as the target division number, and the target division number may be re-determined in step S402 based on the obtained shape of the target object.
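For illustration only (not part of the disclosed embodiments), one such complexity measure, circularity, can be computed directly from the freehand contour. In the following sketch, the function name, the use of OpenCV contour operations, and the scaling rule are assumptions:

```python
import cv2
import numpy as np

def corrected_division_number(contour: np.ndarray, base_nt: int) -> int:
    """Hypothetical sketch: raise the target division number for complex shapes.

    Circularity is 4*pi*A / P**2 (1.0 for a perfect circle, smaller for more
    complex outlines), so its reciprocal serves as a complexity score here.
    """
    area = cv2.contourArea(contour)
    perimeter = cv2.arcLength(contour, True)
    if area <= 0 or perimeter <= 0:
        return base_nt
    circularity = 4.0 * np.pi * area / perimeter ** 2
    complexity = 1.0 / max(circularity, 1e-6)
    return int(round(base_nt * complexity))
```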
As another example, deep learning may be used to perform object detection of teeth and dental diseases, and the region of the target object may be specified by selecting one of the rectangular detection results obtained.
In step S403, an average segment size is calculated. Here, the CPU 210 uses the condition determination unit 122 to obtain the approximate number of pixels (size) of the region of the target object from the closed curve specified in step S402, and divides it by the target division number. In this way, the average segment size of a superpixel is calculated.
In step S404, superpixels are generated. Here, the CPU 210 uses the superpixel generation unit 123 to execute superpixel processing on the image to be analyzed using the average segment size calculated in step S403. In this embodiment, LSC (Linear Spectral Clustering) is used as the superpixel algorithm.
It should be noted that other algorithms may be used, such as SEEDS (Superpixel Extracted via Energy-Driven Sampling) or SLIC (Simple Linear Iterative Clustering). Depending on the implementation, the size of the image to be analyzed may be divided by the average segment size to obtain the number of superpixels in the entire image, which may then be used as input for the superpixel processing.
Furthermore, the segment size of each superpixel may be within a predetermined range including the average segment size calculated in step S403.
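As a concrete sketch of steps S403 and S404 (the embodiment does not specify an implementation), the calculation and the LSC processing might be written with OpenCV's `ximgproc` module (available in opencv-contrib-python). The conversion of the area-based average segment size to OpenCV's side-length `region_size` parameter, and the iteration count, are assumptions:

```python
import cv2
import numpy as np

def generate_superpixels(image_bgr, target_region_pixels, target_division_number,
                         min_element_size=25):
    # Step S403: average segment size = size of the object region / division number.
    avg_segment_size = target_region_pixels / target_division_number
    # Assumption: OpenCV's region_size is a segment's side length in pixels,
    # so the area-based average segment size is converted via a square root.
    region_size = max(2, int(round(np.sqrt(avg_segment_size))))

    # Step S404: LSC (Linear Spectral Clustering) superpixel processing.
    lsc = cv2.ximgproc.createSuperpixelLSC(image_bgr, region_size=region_size)
    lsc.iterate(10)
    # Segments smaller than min_element_size are absorbed into neighbors,
    # corresponding to the "MinElementSize" setting described later.
    lsc.enforceLabelConnectivity(min_element_size)
    return lsc.getLabels(), lsc.getNumberOfSuperpixels()
```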
In step S405, superpixels corresponding to the target object are extracted. Here, the CPU 210 uses the superpixel extraction unit 124 to extract superpixels corresponding to the target object based on the operator's work. In this process, the CPU 210 displays a GUI of the application software on the display unit 104, displays the input image on the GUI, and superimposes a layer of the segment map of the generated superpixels on the input image. Then, at a position on the image to be analyzed designated by the mouse or through a tablet, the superpixel corresponding to that position is displayed, and a region corresponding to the target object is extracted in units of superpixels.
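This per-click extraction reduces to a lookup in the label map produced by the superpixel processing. A minimal sketch (the function and variable names are illustrative, not from the embodiment):

```python
import numpy as np

def select_superpixel(labels: np.ndarray, selection: np.ndarray,
                      x: int, y: int) -> None:
    """Add the superpixel under the clicked position (x, y) to the selection.

    `labels` is the per-pixel superpixel label map from the generation step;
    `selection` is a boolean mask of the same shape.
    """
    clicked_label = labels[y, x]              # image arrays index as [row, col]
    selection[labels == clicked_label] = True
```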
In step S406, segment correction is performed. Here, the CPU 210 uses the segmentation correction unit 125 to correct the region extracted in step S405 based on the operator's work. Specifically, in the region extracted in units of superpixels, portions that protrude from the target object, or portions of the object that are not included in the superpixels, are corrected pixel by pixel. An eraser tool, pen tool, or the like is used for the correction.
If there are N types of approximate target sizes among the target objects to be segmented in the input image, the above-mentioned processes from step S402 to step S406 are performed N times, once for each approximate target size. In other words, if there are a plurality of target objects in the input image, the same superpixel processing is applied to target objects having approximately the same approximate target size, whereas for target objects of different sizes, the regions of the target objects are redesignated, the conditions for the superpixel processing are changed, and the superpixel processing is performed in separate cycles.
The types of approximate target sizes can be classified using a plurality of thresholds. Further, as described above, in a case where the region of the target object is specified by a predetermined shape such as a circle or a rectangle, circles and rectangles of a plurality of sizes may be prepared in advance, and the approximate target size may be classified by using a circle or rectangle of an appropriate size from among them.
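A minimal sketch of such threshold-based classification follows; the threshold values are hypothetical:

```python
import numpy as np

# Hypothetical thresholds (in pixels) separating size classes,
# e.g. small / medium / large target objects.
SIZE_THRESHOLDS = [1_000, 10_000]

def classify_target_size(approx_size_pixels: int) -> int:
    """Return a size-class index (0, 1, 2, ...) for an approximate target size."""
    return int(np.digitize(approx_size_pixels, SIZE_THRESHOLDS))
```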
In step S407, the processing results are output. Here, the segment map corrected in step S406 is output in association with the input image and saved as a label. The label is saved in IndexPNG format (also known as palette format), which is often used as training data for semantic segmentation.
The format of the label may be bitmap format or another format, and can be selected according to the application of the label.
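For reference, an indexed (palette) PNG of the kind described above can be written with Pillow's "P" image mode. This sketch assumes the label map fits in 256 indexes; the palette contents are illustrative:

```python
import numpy as np
from PIL import Image

def save_index_png(label_map: np.ndarray, palette: list, path: str) -> None:
    """Save a label map (values 0-255) as an indexed-color (palette) PNG."""
    img = Image.fromarray(label_map.astype(np.uint8), mode="P")
    # `palette` is a flat [R0, G0, B0, R1, G1, B1, ...] list of up to 256 colors.
    img.putpalette(palette)
    img.save(path)

# Example: index 0 = background (black), index 1 = restoration (red).
# save_index_png(label_map, [0, 0, 0, 255, 0, 0], "label.png")
```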
Next,
A GUI 600 of the application software is displayed on the display unit 104, and has various control areas for achieving segmentation using superpixels.
By pressing “Open” button 611, the directory of the image to be segmented can be selected.
“ImageList” 620 displays a list of images registered in the directory, and a selected image from among them is displayed in a picture box 630. In the picture box 630, a pointer 631 or a circular pointer 632 that can be controlled by the operation input unit 103 is displayed.
In “TargetSetting” 690, various tools used in specifying the region of the target object in step S402 are arranged. In “targetSelect” 691, which is a group of radio buttons, tools for specifying a closed curve for roughly specifying the region of a target object are arranged. In this embodiment, as an example, “rectangle” for specifying a rectangular region, “circle” for specifying a circular region, “FreeHand” for specifying a free region, and “Disable” are arranged. When “Disable” is selected, the rough specification of the region of a target object is disabled, and when a radio button other than “Disable” is selected, the rough specification of the region of a target object is enabled.
After obtaining a closed curve, the size of the region specified by the closed curve is displayed in “targetPixNum” 693.
“Nt” 692 is an area for inputting the target division number in step S401. Then, by pressing “targetAreaSet” 694, the average segment size calculated from the size of the region specified by the closed curve and the target division number is displayed in “AveSegSize” 643.
In “SuperPixelSetting” 640, input boxes for setting values of superpixel processing conditions are arranged. Using a superpixel algorithm selection tab 641, a superpixel algorithm can be selected. In the example of
“CFactor” 644 controls the shape of a superpixel; the higher the value, the more regular the shape of the superpixel.
“MinElementSize” 645 represents the minimum segment size of a superpixel; a superpixel smaller than this size is absorbed into a larger superpixel. Changing the “MinElementSize” setting affects the number of superpixel divisions, so the average segment size obtained in step S403 is calculated taking this effect into account.
Note that these specific settings are merely examples and can be changed to suit the characteristics of the image to be segmented.
Since “SuperPixelSetting” 640 includes a plurality of setting items, the setting values can be recorded in a JSON file and read in all at once by selecting the file with the “ReadParamFile” 646 button or the like.
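A sketch of such a parameter file and its one-shot loading; the key names mirror the GUI items above but are otherwise hypothetical:

```python
import json

# Hypothetical contents of a parameter file:
# {"Algorithm": "LSC", "AveSegSize": 400, "CFactor": 0.075, "MinElementSize": 25}

def read_param_file(path: str) -> dict:
    """Load all superpixel setting values from a JSON file at once."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```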
Then, by pressing “CalculateSuperPixel” button 647, the superpixel processing is performed on the image selected and displayed in the picture box 630.
An image 710 shows the result of the superpixel processing. An image 720 shows the input image on which a superpixel segment map is superimposed. The selected superpixels and corrected portion 721 are superimposed on a restoration 712. The corrected portion 721 is superimposed and displayed in a color corresponding to the selected index. An image 730 is an image in which the superpixels selected by the user in the image 720 and the correction result are output as labels.
When “CalculateSuperPixel” button 647 is pressed, the image 720 is displayed in the picture box 630.
In “PaintTool” 650 in
“SuperPixel” 651 is a tool used in the superpixel extraction process in step S405; by clicking in the picture box 630 using a mouse, tablet, etc., a superpixel corresponding to the designated coordinates is extracted and the corresponding superpixel is superimposed and displayed. An example of the superimposed display is the image 720 in
“Pen” 652 is a tool used in the segment correction process in step S406; by dragging in the picture box 630 using a mouse, tablet, etc., a region corresponding to the specified coordinates is extracted by freehand. Note that “PaintTool” is not limited to “Pen” 652 and “SuperPixel” 651, but may be a filling tool that fills in closed areas, or a tool for drawing shapes such as rectangles, circles, and triangles.
“ColorIndex” 660 is an area for specifying the index used when labeling in the superpixel extraction in step S405 and the segment correction in step S406.
“Eraser” 661 labels with Index 0. In this embodiment, Index 0 indicates a background label, and “Eraser” 661 is a so-called eraser tool.
When a color index is specified in a combo box 663, the color in a palette corresponding to that index is displayed in an area 664. Note that “Index” can be set to a value between 0 and 255, and, for example, the color map of a dataset such as PascalVOC2012 may be used to define the index values.
“DisplaySettings” 670 includes an area where the display of labels and superpixels can be turned ON/OFF to facilitate labeling in the superpixel extraction process in step S405 and the segment correction process in step S406.
When the check box of “OverlayLabel” 671 is OFF, the label is not displayed in the picture box, and when it is ON, the label is displayed in the picture box with the transparency specified in the numerical value setting box 672. This makes it possible to work while checking whether the labeling is successful. When the check box “OverlaySuperpixel” 673 is ON, the superpixels are displayed, and when it is OFF, the superpixels are not displayed.
“Save” button 680 converts the label, which is the processing result generated in the picture box 630, into a predetermined format in the output of step S407 and saves it.
These processes may be performed by resizing (reducing) the input image to a smaller size, and then resizing the output label back to the original size. The resizing algorithm may use, for example, the nearest neighbor method, and after resizing, the boundaries may be smoothed by performing “Erosion” and “Dilation” processes, which are the reduction and enlargement operations of morphology processing.
In this way, by once resizing an image to a smaller size, the processing load of the superpixel processing can be reduced. However, if the image is resized too small, the boundary accuracy deteriorates, so it is desirable to decide for each target object whether to perform resizing.
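A sketch of the restore-and-smooth step for a single binary label mask; the kernel size and iteration counts are illustrative assumptions:

```python
import cv2
import numpy as np

def upscale_and_smooth(mask_small: np.ndarray, out_width: int, out_height: int,
                       kernel_size: int = 3) -> np.ndarray:
    """Resize a binary label mask back to the original size and smooth it.

    Nearest-neighbor interpolation keeps label values intact; an erosion
    followed by a dilation then smooths the jagged boundaries introduced
    by upscaling.
    """
    mask = cv2.resize(mask_small, (out_width, out_height),
                      interpolation=cv2.INTER_NEAREST)
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    mask = cv2.erode(mask, kernel, iterations=1)
    mask = cv2.dilate(mask, kernel, iterations=1)
    return mask
```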
These configurations make it possible to balance the improvement of boundary accuracy using superpixels and the reduction of the load of segmentation work when segmenting images in annotation work.
Next, a second embodiment of the present invention will be described.
In the second embodiment, as shown in “targetSelectTool” 891, “SuperPixel” 805 is provided as a target area designation method used in the condition determination unit 122 in addition to “Disable”, “rectangle”, “circle”, and “FreeHand”.
Next, the operation of the image processing system 100 in the second embodiment will be described with reference to
When “SuperPixel” 805 is selected in step S402 of
By setting the first average segment size to a value larger than the second average segment size, the region of the target object can be roughly specified with coarse superpixels, reducing the workload, and by generating fine superpixels in step S404, high-precision processing can be achieved.
In addition, when the target object is specified by selecting large superpixels, the approximate shape of the target object can be known, so the value of the target division number set in step S401 may be corrected based on the complexity of the shape of the target object, as in the first embodiment. The complexity is calculated using curvature entropy, circularity, etc., and the larger this value is, the larger the corrected target division number becomes. In this way, by taking the shape of the target object into consideration, the segmentation accuracy of the superpixels can be improved.
As described above, according to the second embodiment, the approximate size of the region of the target object can be obtained with a smaller load, and the size can be obtained in a form that more closely reflects the shape of the object than using tools such as a rectangle or a circle. This makes it possible to realize superpixel processing with a more appropriate average segment size, and to further improve boundary accuracy.
Next, a third embodiment of the present invention will be described.
The division number correction unit 901 has a function of correcting the average segment size of superpixels. Specifically, it takes statistics on the usage of the eraser tool “Eraser” 661 for each index (object), that is, the number of corrections, and generates a correction amount for the target division number specified in step S401 based on the statistics. Furthermore, it has a function of correcting the target division number “Nt” 692 based on the correction amount when superpixels are next generated for the same index (object). Here, “Eraser” 661 indicates labeling with the index value of the background.
Next, the operation of the image processing system 900 in the third embodiment will be described with reference to the flowchart shown in
In the division number correction in step S1001, after the region of the target object is specified, the CPU 210 specifies the index of the object to which the region of the target object belongs, using the division number correction unit 901. The index is specified by the GUI of the application software (not shown). Then, if there is a division number correction amount to be acquired in step S1002 for the specified index, the target division number “Nt” 692 is corrected. The division number correction amount is calculated based on the usage statistics obtained by the segmentation correction unit 125. When the correction amount of index i is Hi, the target division number is corrected by multiplying the target division number “Nt” by Hi.
In step S1002, the division number correction amount is calculated based on the usage statistics obtained by the segmentation correction unit 125. Specifically, the CPU 210 uses the division number correction unit 901 to obtain statistics on the usage status of “Eraser” 661, and generates a correction amount for the target division number specified in step S401 for each index (object) based on the statistical values.
For example, if the average number of times the eraser tool is used per object for all indexes is Na, the average number of times the eraser tool is used per object for index i is Ni, and a coefficient for controlling the magnitude of the correction amount is k, then the correction amount Hi for index i may be calculated by Equation (1).
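Equation (1) itself is not reproduced in this text (it appears only in the drawings). As an assumption, one form consistent with the behavior described below, in which k scales the strength of the correction, would be:

```latex
H_i = 1 + k\left(\frac{N_i}{N_a} - 1\right)
```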
In this case, if the number of times the eraser tool is used for index i is average, that is, if Ni=Na, then Hi=1 and the target division number is not corrected. If the number of times the eraser tool is used for index i is higher than average, that is, if Ni>Na, then Hi becomes larger than 1, and the target division number “Nt” is corrected to a larger value, and the image segmentation becomes more accurate.
Conversely, if the number of times the eraser tool is used for index i is less than average, that is, if Ni<Na, then Hi becomes smaller than 1, the target division number “Nt” is corrected to a smaller value, and the load of image segmentation is reduced.
In addition, if Ni<Na, that is, if Hi is smaller than 1, correcting the target division number “Nt” would reduce it, and there is a concern that the image segmentation accuracy will drop. Therefore, the correction may be skipped when Ni<Na.
Note that, once an effective correction parameter has been acquired, it is not necessary to acquire the division number correction amount in step S1002 thereafter, and the acquired correction parameter may simply be used in step S1001.
As described above, according to the third embodiment, since it is considered that the segmentation accuracy is low for objects for which the eraser tool is used frequently, the target division number for the same object is corrected to be larger in the next segmentation. This enables segmentation with higher accuracy and makes it possible to reduce the number of times the eraser tool is used in the next segmentation.
Next, a fourth embodiment of the present invention will be described.
In the following explanation, it is assumed that the input image 106 is a dental image.
In dental images, specular reflection occurs due to saliva, and the color and texture of objects displayed in the intraoral photograph may appear different.
The reflecting region acquisition unit 1101 acquires a specular reflection region contained in a dental image, and highlights the reflecting region in the picture box 630 displayed on the display unit 104.
In the reflecting region highlighting display in step S1201, a pixel having a luminance equal to or greater than a certain threshold in the image histogram is determined to belong to a reflecting region, and the reflecting region is superimposed on the image in the picture box 630 so as to be highlighted. The reflecting region may instead be determined based on the RGB gradation values.
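A minimal sketch of this luminance-threshold determination; the threshold value and highlight color are assumptions, as the embodiment only requires a luminance equal to or greater than a certain threshold:

```python
import cv2
import numpy as np

def highlight_reflections(image_bgr: np.ndarray, threshold: int = 240):
    """Mark likely specular-reflection pixels by luminance and overlay them."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    reflecting = gray >= threshold                  # boolean reflection mask
    overlay = image_bgr.copy()
    overlay[reflecting] = (0, 255, 255)             # highlight in yellow (BGR)
    return reflecting, overlay
```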
As described above, according to the fourth embodiment, even if there is reflection in an intraoral photograph, by highlighting the specular reflection region, superpixel extraction and segment correction in steps S405 and S406 can be performed while paying attention to the specular reflection region. This makes it possible to realize highly accurate segmentation.
The present invention may be applied to a system made up of a plurality of devices, or to an apparatus made up of a single device.
Embodiment(s) of the present invention can also be realized by a computer of a system or apparatus that reads out and executes computer executable instructions (e.g., one or more programs) recorded on a storage medium (which may also be referred to more fully as a ‘non-transitory computer-readable storage medium’) to perform the functions of one or more of the above-described embodiment(s) and/or that includes one or more circuits (e.g., application specific integrated circuit (ASIC)) for performing the functions of one or more of the above-described embodiment(s), and by a method performed by the computer of the system or apparatus by, for example, reading out and executing the computer executable instructions from the storage medium to perform the functions of one or more of the above-described embodiment(s) and/or controlling the one or more circuits to perform the functions of one or more of the above-described embodiment(s). The computer may comprise one or more processors (e.g., central processing unit (CPU), micro processing unit (MPU)) and may include a network of separate computers or separate processors to read out and execute the computer executable instructions. The computer executable instructions may be provided to the computer, for example, from a network or the storage medium. The storage medium may include, for example, one or more of a hard disk, a random-access memory (RAM), a read only memory (ROM), a storage of distributed computing systems, an optical disk (such as a compact disc (CD), digital versatile disc (DVD), or Blu-ray Disc (BD)™), a flash memory device, a memory card, and the like.
While the present invention has been described with reference to exemplary embodiments, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.
Number | Date | Country | Kind |
---|---|---|---|
2022-175027 | Oct 2022 | JP | national |
This application is a Continuation of International Patent Application No. PCT/JP2023/035348, filed Sep. 28, 2023, which claims the benefit of Japanese Patent Application No. 2022-175027, filed Oct. 31, 2022, both of which are hereby incorporated by reference herein in their entirety.
 | Number | Date | Country
---|---|---|---
Parent | PCT/JP2023/035348 | Sep 2023 | WO
Child | 19175327 | | US