The technology described in the present specification relates to a biological image processing program, a biological image processing apparatus, and a biological image processing method.
A captured biological image (in other words, a medical image) may be denoised (in other words, enhanced in image quality) to segment a target site from the biological image.
For example, in Patent Literature 1, a medical image of a predetermined site of a subject is acquired, and a high-quality image is generated from the acquired medical image by using an image quality enhancement engine including a machine learning engine obtained by using learning data in which noise corresponding to a state of at least a partial region of the medical image is added to at least the partial region.
However, there is concern that denoising of a biological image may require complicated arithmetic processing and high-performance hardware resources.
In one aspect, the technology described in the present specification aims to accurately perform segmentation of a biological image through simple arithmetic processing.
According to one aspect, a biological image processing program causes a computer for performing segmentation of a biological image to perform processing of: reducing a size of the biological image that has been input and extracting a feature point, by adding an attention block for averaging channels to a convolution block of the biological image; and outputting information relating to a feature image obtained by segmenting the biological image including the extracted feature point, by contracting the channels in a state in which the attention block has been added to the convolution block.
As one aspect, segmentation of a biological image can be accurately performed through simple arithmetic processing.
Hereinafter, embodiments will be described with reference to the drawings. However, the embodiments described below are merely examples, and there is no intention to exclude various modifications and applications of techniques that are not explicitly described in the embodiments. That is, the present embodiment can be modified and implemented in various ways without departing from the scope of the present embodiment.
In addition, each drawing is not intended to include only components illustrated therein, and may include other components. Hereinafter, in the drawings, parts denoted by the same reference numerals indicate the same or similar parts unless otherwise specified.
A program receives an input of a retinal optical coherence tomography angiography (OCTA) image as an original image as indicated by reference numeral A1 (INPUT OCTA IMAGE).
The program segments and detects a foveal avascular zone (FAZ) region from the OCTA image and calculates FAZ parameters as indicated by reference numeral A2 (CALCULATE FAZ PARAMETERS). For the FAZ area (FAZ_AREA), the FAZ parameters may include, for example, an area (e.g., 0.4716 mm²), a circularity index (e.g., 0.6404), a circumference length (a perimeter of, e.g., 3.04111 mm), an upper half circumference length (an upper half of the perimeter of, e.g., 1.6503 mm), a lower half circumference length (a lower half of the perimeter of, e.g., 1.3908 mm), and circumference lengths of four quadrants (perimeters by angle of, e.g., 0.8164 mm at 0° to 90°, 0.8339 mm at 90° to 180°, 0.7101 mm at 180° to 270°, and 0.6808 mm at 270° to 360°). The circularity index indicates a shape closer to a perfect circle as its value approaches 1, and a shape closer to a straight line as its value approaches 0. The program determines the center of the FAZ area and calculates the perimeters of the upper and lower halves by dividing the boundary of the FAZ area into the upper and lower halves at that center.
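Purely as an illustrative aid (not part of the original disclosure), the following Python sketch shows one way such FAZ parameters could be computed from a segmented binary FAZ mask. The function name faz_parameters, the pixel-to-millimeter scale mm_per_px, and the use of the standard circularity index 4πA/P² (which is consistent with the example values above) are assumptions of this sketch.

```python
import math
import numpy as np
import cv2  # OpenCV; an assumed dependency of this sketch

def faz_parameters(mask: np.ndarray, mm_per_px: float) -> dict:
    """Compute FAZ parameters from a binary FAZ mask (uint8, 0/255)."""
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    faz = max(contours, key=cv2.contourArea)          # largest region = FAZ
    area = cv2.contourArea(faz) * mm_per_px ** 2      # FAZ_AREA in mm^2
    perimeter = cv2.arcLength(faz, True) * mm_per_px  # circumference length
    # Circularity index: 4*pi*A / P^2 -> 1 for a circle, -> 0 for a line.
    circularity = 4.0 * math.pi * area / perimeter ** 2
    # Split the boundary at the FAZ center into upper and lower halves.
    m = cv2.moments(faz)
    cy = m["m01"] / m["m00"]                          # y of the FAZ center
    pts = faz.reshape(-1, 2)                          # boundary (x, y) points
    seg = np.linalg.norm(np.diff(pts, axis=0, append=pts[:1]), axis=1)
    upper = seg[pts[:, 1] < cy].sum() * mm_per_px     # image y grows downward
    lower = seg[pts[:, 1] >= cy].sum() * mm_per_px
    return {"area": area, "circularity": circularity, "perimeter": perimeter,
            "upper_half": upper, "lower_half": lower}
```

Quadrant circumference lengths could be obtained in the same way by binning the boundary points into 90° sectors around the FAZ center.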
The program calculates a vascular density as indicated by reference numeral A3 (CALCULATION OF VASCULAR DENSITY). Details of the vascular density calculation process will be described below with reference to reference numerals A31 to A36.
The program applies a trained deep learning (DL) model to the original OCTA image to denoise the image, as indicated by reference numeral A31.
The program generates a binary image from the denoised image as indicated by reference numeral A32 to calculate a full image vascular density % wiVD (e.g., 40.1203) including the FAZ area and a full image vascular density % wiVD_without_FAZ (e.g., 42.3787) not including the FAZ area.
Since the center of the FAZ area is not exactly at the center point of the image as indicated by reference numeral A33, the program sets the center of the FAZ area as the center of the image. Then, the program divides the image into an upper half and a lower half from the center of the FAZ area, and calculates the vascular densities of the upper and lower halves with and without the FAZ area (UPPER AND LOWER HALF VASCULAR DENSITY). In the illustrated example, the vascular density of the upper half including the FAZ area (% UH_VD) is 39.7751, the vascular density of the upper half not including the FAZ area (% UH_VD noFAZ) is 42.3611, the vascular density of the lower half including the FAZ area (% LH_VD) is 38.8605, and the vascular density of the lower half not including the FAZ area (% LH_VD noFAZ) is 41.1779. The program calculates the para-foveal vascular density (% pfVD; e.g., 40.8791) as indicated by reference numeral A34.
The program calculates a foveal vascular density including the FAZ area % fVD (a foveal vascular density with faz of, for example, 28.1084) and a foveal vascular density not including the FAZ area % fVD_without_FAZ (a foveal vascular density without faz of, for example, 38.5901) as indicated by reference numeral A35.
The program calculates an angular region vascular density (VASCULAR DENSITY WITH ANGLE) for each of the six angular regions with or without the FAZ area as indicated by reference numeral A36. For example, the angular region vascular density at 0° to 60° including the FAZ area 0-60 with FAZ is 35.8756, and the angular region vascular density at 0° to 60° not including the FAZ area 0-60 without FAZ is 39.0671.
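As a rough, hedged illustration of the density calculations in A32 to A36 (the function and mask names are assumptions of this sketch), each density above can be viewed as the percentage of vessel pixels within a region of the binary image, with the FAZ mask optionally excluded from the denominator:

```python
import numpy as np

def vascular_density(vessels, region, faz=None):
    """Percentage of vessel pixels within `region` (boolean masks).

    If a FAZ mask is given, its pixels are excluded from the region,
    yielding the "_without_FAZ" variants (e.g., % wiVD_without_FAZ);
    otherwise the FAZ counts as avascular area in the denominator.
    """
    if faz is not None:
        region = region & ~faz
    return 100.0 * vessels[region].mean()

def sector_mask(shape, center, start_deg, end_deg):
    """Boolean mask of the angular region [start_deg, end_deg) around
    `center` (cy, cx), for the per-angle densities (e.g., 0° to 60°)."""
    yy, xx = np.mgrid[:shape[0], :shape[1]]
    ang = np.degrees(np.arctan2(center[0] - yy, xx - center[1])) % 360.0
    return (ang >= start_deg) & (ang < end_deg)

# Illustrative use: full-image density (% wiVD), upper half (yy < cy after
# re-centering on the FAZ center), and six 60-degree angular regions.
# vessels, faz = ...  # boolean masks from the binarized, denoised image
# wiVD = vascular_density(vessels, np.ones(vessels.shape, bool))
# sector_vds = [vascular_density(vessels,
#                                sector_mask(vessels.shape, (cy, cx), a, a + 60),
#                                faz)
#               for a in range(0, 360, 60)]
```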
An example of a DL model designed for biomedical image segmentation is Unet (see, e.g., Non-Patent Literature 1).
Unet has a contracting path (in other words, an encoder for extracting feature points) and an expanding path with skip connection (in other words, a decoder for image reconstruction).
The structure of Unet relies on a cascaded convolutional neural network (CNN) to extract a region of interest. The simplest way to improve segmentation is to stack more layers to make the network deeper.
In general, the deeper the network, the more parameter multiplications occur, requiring more computation power and storage capacity.
In the layer indicated by reference numeral B1, the size of the input image, which is 320×320 with three channels, remains 320×320, but the number of channels is increased to 128. The size contracts to 160×160 in the layer indicated by reference numeral B2, to 80×80 in the layer indicated by reference numeral B3, to 40×40 in the layer indicated by reference numeral B4, and to 20×20 in the layer indicated by reference numeral B5. Note that in each layer, the number of channels remains at 128.
Model parameters are trimmed by fixing the number of feature channels (filters) to 128 in both the contracting and expanding paths, replacing the concatenation used for the skip connections with an Add layer, and replacing the transposed 2D convolution operation layer (conv2D, activation-sigmoid) with upsampling.
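For illustration only, the following Keras-style sketch (an assumption-laden sketch, not the disclosed implementation) shows one encoder/decoder level with these trims: a fixed width of 128 channels, an Add layer in place of concatenation for the skip connection, and UpSampling2D in place of a transposed convolution. Batch normalization is omitted here; see the Conv-Block sketch further below.

```python
import tensorflow as tf
from tensorflow.keras import layers

F = 128  # fixed number of feature channels in both paths

inputs = layers.Input((320, 320, 3))
e1 = layers.Conv2D(F, 3, padding="same", activation="relu")(inputs)  # 320x320x128
p1 = layers.MaxPooling2D(2)(e1)                                      # 160x160x128
e2 = layers.Conv2D(F, 3, padding="same", activation="relu")(p1)
# ... deeper levels and the narrowed bottleneck are omitted ...
u1 = layers.UpSampling2D(2)(e2)          # upsampling instead of Conv2DTranspose
u1 = layers.Conv2D(F, 3, padding="same", activation="relu")(u1)
d1 = layers.Add()([u1, e1])              # Add-based skip connection (no concat)
out = layers.Conv2D(3, 1, activation="sigmoid")(d1)  # conv2D, activation-sigmoid
model = tf.keras.Model(inputs, out)
```

Because both paths keep the same channel count, the element-wise Add can replace concatenation, halving the number of channels entering each decoder convolution and thereby trimming parameters.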
Each convolution operation block (Conv-Block) is configured as described below with reference to steps S11 to S13.
Paying attention to the feature map layer in addition to the contracting path and the expanding path, an attention block created by modifying a squeeze-and-excitation (SE) block (see, for example, Non-Patent Literature 2) is added.
In the attention block, global averaging of the channels is performed, followed by a fully connected layer, activation by relu and sigmoid functions, and multiplication of the resulting weights by each channel. The combination of relu and sigmoid activation limits each channel attention weight to between 0.5 and 1.0. That is, channels with low priority are suppressed. Here, the channels with low priority are channels with no feature points, such as the outer shape of the FAZ area or vascular parts. A biological image in which channels with no feature points are suppressed and only channels with a large number of feature points remain may be referred to as a feature image.
That is, channels with low priority may be reduced by half instead of being eliminated completely, and the network may determine the importance of those channels in the next layer. In a Unet-type model, refinement of the high-level feature map at the center of the contracting and expanding paths is also important for accurate reconstruction of a segmented image.
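As a quick numerical check of this weight range (a sketch, not part of the original disclosure): relu clips negative inputs to zero, and sigmoid maps [0, ∞) onto [0.5, 1):

```python
import numpy as np

x = np.linspace(-5.0, 5.0, 11)
w = 1.0 / (1.0 + np.exp(-np.maximum(x, 0.0)))  # sigmoid(relu(x))
print(w.min(), w.max())  # stays within [0.5, 1): low-priority channels
                         # are halved at most, never zeroed out entirely
```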
In Non-Patent Literature 2 and the like, the feature map at the bottleneck was not examined. Therefore, the present embodiment introduces continuous narrowing of the channels by paying attention to the bottleneck. Here, only the decoder features required to reconstruct the necessary segmented image are retained. This is done by reducing the number of channels by half in successive layer blocks. For example, the number of channels decreases from 128 to 16 in four stages at the bottleneck (i.e., the middle block) indicated by reference numeral B5.
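A minimal Keras-style sketch of such bottleneck narrowing follows, assuming a 20×20×128 feature map at the middle block and the SE-style attention block described above; the exact ordering of convolution, normalization, and attention within each stage, and the helper names attention_block and narrowed_bottleneck, are assumptions of this sketch.

```python
import tensorflow as tf
from tensorflow.keras import layers

def attention_block(x, channels):
    # SE-style attention modified with relu+sigmoid (weights in [0.5, 1.0))
    w = layers.GlobalAveragePooling2D()(x)
    w = layers.Dense(channels)(w)
    w = layers.Activation("relu")(w)
    w = layers.Activation("sigmoid")(w)
    w = layers.Reshape((1, 1, channels))(w)
    return layers.Multiply()([x, w])

def narrowed_bottleneck(x):
    # Four successive stages narrow the channels: 128 -> 64 -> 32 -> 16
    for ch in (128, 64, 32, 16):
        x = layers.Conv2D(ch, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
        x = attention_block(x, ch)
    return x  # e.g., 20x20x16 for a 20x20x128 input

x_in = layers.Input((20, 20, 128))
model = tf.keras.Model(x_in, narrowed_bottleneck(x_in))
```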
In the expanding path from the middle block to the generation of the output image, contrary to the contracting path of the input image, the size is increased stepwise while the number of channels of the image is maintained at 128, and the output image has a size of 320×320 and three channels similar to the input image.
In the following, it will be described that bottleneck narrowing with an attention block (BNA) is significantly effective in identifying a fake FAZ area detected by the network in poor-quality OCTA images and helps to improve the overall segmentation accuracy of the FAZ.
The present embodiment utilizes reduction of the image size to extract important feature points, and reduction of the channels at the bottleneck to reconstruct an accurate segmented image in the decoder of the network. In the present embodiment, this network is referred to as a lightweight bottleneck-narrowing Unet with an attention block (LWBNA_Unet).
In the processing of the convolution operation blocks, a 2D convolution operation (conv2D) is first performed (step S11).
Batch normalization is performed on the convolution operation blocks (step S12).
The convolution operation blocks then apply a relu activation (Activation-relu) (step S13). Then, the processing for the convolution operation blocks ends.
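As a supplementary illustration (not part of the original flow description), steps S11 to S13 could be expressed as the following Keras-style sketch; the kernel size, padding, and filter count are assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters=128):
    x = layers.Conv2D(filters, 3, padding="same")(x)  # step S11: 2D convolution
    x = layers.BatchNormalization()(x)                # step S12: batch normalization
    return layers.Activation("relu")(x)               # step S13: Activation-relu
```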
In the processing of the attention blocks, global average pooling of the channels is first performed (step S21).
The attention blocks then apply a fully connected (dense) layer to the pooled channel values (step S22).
The attention blocks apply the relu and sigmoid activations (Activation relu+sigmoid) (step S23).
The resulting channel weights are multiplied by the original channels (step S24). Then, the processing for the attention blocks ends.
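A hedged Keras-style sketch of steps S21 to S24 follows; the dense-layer width and the reshape are standard SE-block idioms assumed here, not details taken from the original disclosure:

```python
import tensorflow as tf
from tensorflow.keras import layers

def attention_block(x, channels):
    w = layers.GlobalAveragePooling2D()(x)  # step S21: global channel averaging
    w = layers.Dense(channels)(w)           # step S22: fully connected (dense) layer
    w = layers.Activation("relu")(w)        # step S23: Activation relu+sigmoid,
    w = layers.Activation("sigmoid")(w)     #   bounding the weights to [0.5, 1.0)
    w = layers.Reshape((1, 1, channels))(w)
    return layers.Multiply()([x, w])        # step S24: weight each original channel
```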
A block input to the segmentation process contracts to X-pix*Y-pix*M*F as indicated by reference numeral D1. Here, "-pix" represents pixels, M is a value equal to or greater than 1 based on the image size, and F is a value equal to or greater than 8 based on the image size. The block further contracts to X-pix/2*Y-pix/2*2*M*F as indicated by reference numeral D2, to X-pix/4*Y-pix/4*4*M*F as indicated by reference numeral D3, and to X-pix/8*Y-pix/8*8*M*F as indicated by reference numeral D4.
The number of channels is reduced to 8*M*F, 4*M*F, 2*M*F, and M*F in four stages at the bottleneck (i.e., in the middle block) indicated by reference numeral D5. Then, an FC1 layer and an FC2 layer are output as indicated by reference numeral D6. The output may be classified into Category-1, Category-2, Category-3, and the like. In the example (EXAMPLE 1) indicated by reference numeral D61, the data is classified into three categories: "Category-1-bad", "Category-2-medium", and "Category-3-good". In addition, in the example (EXAMPLE 2) indicated by reference numeral D62, the data is classified into two categories: "Category-1-DH-" and "Category-3-DH+". Note that DH- represents no hemorrhage, and DH+ represents hemorrhage being present.
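This classification variant could be sketched as follows (illustrative only: the pooling choice, the FC1 width, the activation functions, and the particular values of M and F are assumptions that merely satisfy the constraints above):

```python
import tensorflow as tf
from tensorflow.keras import layers

M, F = 2, 16                 # illustrative values satisfying M >= 1, F >= 8
X = Y = 320                  # input size in pixels
NUM_CLASSES = 3              # e.g., Category-1-bad / -2-medium / -3-good

inputs = layers.Input((X, Y, 3))
x = layers.Conv2D(M * F, 3, padding="same", activation="relu")(inputs)      # D1
for mult in (2, 4, 8):       # D2 to D4: halve the size, double the channels
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(mult * M * F, 3, padding="same", activation="relu")(x)
for mult in (8, 4, 2, 1):    # D5: narrow the bottleneck 8MF -> 4MF -> 2MF -> MF
    x = layers.Conv2D(mult * M * F, 3, padding="same", activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(4 * M * F, activation="relu")(x)             # FC1 (width assumed)
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)  # FC2 -> categories
model = tf.keras.Model(inputs, outputs)
```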
An unknown image is input to the DL model as indicated by reference numeral C1.
A trained DL model (Unet Trained DL model) is referenced, as indicated by reference numeral C2.
The FAZ at the central portion of the image is corrected, and the fake FAZ at the bottom left portion of the image as indicated by reference numeral C3 is removed. As a result, an image indicated by reference numeral C4 is output.
Then, FAZ parameters are calculated as indicated by reference numeral C5.
Segmentation of the FAZ in OCTA images by the lightweight DL model (LWBNA_Unet) of the present embodiment was tested. In addition, for comparison, standard Unet models were constructed with and without attention blocks.
The test results of the respective models and example OCTA images are shown in the drawings.
The biological image processing apparatus 1 has a server function and includes a CPU 11, a memory unit 12, a display controller 13, a storage device 14, an input interface (IF) 15, an external recording medium processing unit 16, and a communication IF 17, as illustrated in the corresponding drawing.
The memory unit 12 is an example of a storage unit and includes, for example, a read only memory (ROM) and a random access memory (RAM). A program such as a basic input/output system (BIOS) may be written in the ROM of the memory unit 12. A software program in the memory unit 12 may be appropriately read and executed by the CPU 11. In addition, the RAM of the memory unit 12 may be used as a temporary recording memory or a working memory.
The display controller 13 is connected to a display device 131 and controls the display device 131. The display device 131 is a liquid crystal display, an organic light-emitting diode (OLED) display, a cathode ray tube (CRT), an electronic paper display, or the like, and displays various kinds of information for the operator. The display device 131 may be combined with an input device and may be, for example, a touch panel. In the present embodiment, the display device 131 displays information relating to segmented feature images. Further, output of the information relating to the segmented feature images may be realized by a printing apparatus (not illustrated) or the like.
The storage device 14 is a storage device with high I/O performance, and for example, a dynamic random access memory (DRAM), an SSD, a storage class memory (SCM), or an HDD may be used.
The input IF 15 may be connected to an input device such as a mouse 151 or a keyboard 152 to control the input device such as the mouse 151 or the keyboard 152. The mouse 151 and the keyboard 152 are examples of the input device, and an operator performs various input operations through these input devices.
The external recording medium processing unit 16 is configured such that a recording medium 160 can be mounted thereon. The external recording medium processing unit 16 is configured to be able to read information recorded in the recording medium 160, with the recording medium 160 mounted thereon. In the present example, the recording medium 160 is portable. For example, the recording medium 160 is a flexible disk, an optical disk, a magnetic disk, a magneto-optical disk, a semiconductor memory, or the like.
The communication IF 17 is an interface for enabling communication with an external apparatus.
The CPU 11 is an example of a processor and is a processing device that performs various types of control and calculation. The CPU 11 realizes various functions by executing an operating system (OS) and programs stored in the memory unit 12.
A device for controlling the overall operations of the biological image processing apparatus 1 is not limited to the CPU 11, and may be, for example, any one of an MPU, a DSP, an ASIC, a PLD, and an FPGA. In addition, the device for controlling the overall operations of the biological image processing apparatus 1 may be a combination of two or more of a CPU, an MPU, a DSP, an ASIC, a PLD, and an FPGA. Further, MPU is an abbreviation for micro processing unit, DSP is an abbreviation for digital signal processor, and ASIC is an abbreviation for application specific integrated circuit. In addition, PLD is an abbreviation for programmable logic device, and FPGA is an abbreviation for field programmable gate array.
The CPU 11 of the biological image processing apparatus 1 has functions as an extraction processing unit 111, an output processing unit 112, and a blood vessel density calculation unit 113.
The extraction processing unit 111 reduces the image size and extracts feature points, by adding an attention block for averaging channels to convolution blocks for the input biological image. In addition, the extraction processing unit 111 may extract feature points by preferentially reducing a region that does not include the outer shape or the vascular portion of the segmentation target site included in the biological image.
The output processing unit 112 outputs information relating to a feature image obtained by segmenting the biological image including the extracted feature points by contracting the channels in a state in which the attention block has been added to the convolution blocks. In addition, the output processing unit 112 may expand the channels to the size of the input biological image and output a feature image, after contracting the channels.
The blood vessel density calculation unit 113 calculates a vascular density for each of one or more regions in the eyeball based on the information relating to the feature image.
Thus, segmentation of a biological image can be accurately performed through simple arithmetic processing.
The disclosed technology is not limited to each of the embodiments, and can be carried out with various modifications without departing from the scope of each embodiment. Each configuration and processing operation of each embodiment can be selected as necessary, or may be appropriately combined.
Although an example in which the segmentation process is performed on the FAZ area from the biological image of an eyeball has been described in the above-described embodiment, the present invention is not limited thereto. The segmentation process according to the embodiment can be used to segment a specific site from various biological images by changing the DL model applied to LWBNA_Unet according to the embodiment. For example, the segmentation process may be used to segment a tumor such as a cancer cell from an image obtained by imaging an organ of a living body. In this case, the image of the organ with the tumor may be applied to LWBNA_Unet in the embodiment as a DL model.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/JP2021/037976 | 10/13/2021 | WO |