Multi-angle image semantic segmentation method for cadmium zinc telluride chips

Information

  • Patent Grant
  • Patent Number
    12,175,721
  • Date Filed
    Tuesday, August 27, 2024
  • Date Issued
    Tuesday, December 24, 2024
Abstract
Disclosed is a multi-angle image semantic segmentation method for cadmium zinc telluride chips, belonging to the field of image quality augmentation. First, an n+1 dataset is constructed from acquired CZT images; then pixel-level and latent-level knowledge representation is performed through a Pixel Aggregation Network PAN and a Latent Aggregation Network LAN in a Progressive Complementary Knowledge Aggregation network PCKA, which ultimately improves the quality and speed of CZT image segmentation. The method is suitable for applications that require multi-angle image acquisition and semantic segmentation, such as semiconductor material segmentation.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to Chinese Patent Application No. 202311112314.6, filed on Aug. 31, 2023, the content of which is incorporated herein by reference in its entirety.


TECHNICAL FIELD

The present disclosure herein belongs to the technical field of image segmentation methods, and in particular relates to a multi-angle image semantic segmentation method for cadmium zinc telluride chips.


BACKGROUND

Cadmium zinc telluride (CdZnTe, CZT) is a strategically significant third-generation semiconductor material composed mainly of three elements: tellurium, zinc, and cadmium. Its crystal structure belongs to the sphalerite structure, its bandgap is widely adjustable with an energy gap of about 1.57 eV, and it is a wide-bandgap semiconductor material. Due to its excellent chemical stability and optoelectronic properties, CZT has a wide range of applications in medical imaging, security inspection, nuclear radiation detection, and other fields. However, because the various elements in CZT have different growth periods, CZT crystals are prone to forming useless heterocrystals through lattice damage, which degrade the optoelectronic properties of the useful monocrystals. It is therefore necessary to cut away the heterocrystals. At present, CZT monocrystals and heterocrystals are cut apart manually. Given the harm caused to the human body by long-term exposure to CZT crystals, and the low efficiency and subjectivity of traditional manual recognition and cutting methods, the present disclosure conducts research on automatic cutting vision technology for CZT crystals based on image segmentation, aiming to use automated cutting technology to solve the problems of manual cutting.


In recent years, semantic segmentation methods based on deep learning have achieved excellent results in many machine vision tasks. For example, Tong Zhen et al. improved the U-Net network to segment standing tree images, effectively improving the accuracy of standing tree segmentation. However, this method is only applicable to standing tree images under specific lighting conditions; when the lighting is too bright or too dark, it produces significant errors. Gu Shiju et al. improved the U-Net network to segment muzzle flames, but owing to factors such as the difficulty of labeling muzzle flame samples, the segmentation accuracy of the model is not high, and problems such as inaccurate flame edge contours remain. Chen Tianhua et al. improved DeepLabv3 for semantic segmentation of remote sensing images. Although this method can complete image segmentation, errors remain in the segmentation results at edges, and the boundary segmentation results in some areas are still relatively blurry. The above methods provide ideas for segmenting the monocrystals and heterocrystals of CZT. However, they are all designed and optimized for specific tasks and scenarios. Due to the complexity of CZT images, applying these methods simply and directly leads to poor segmentation results and weak model robustness.


SUMMARY

The purpose of the present disclosure is to exploit the reflective characteristics of CZT crystals in combination with deep learning to provide a multi-angle CZT image acquisition method, a multi-angle image dataset construction method, and a CZT image semantic segmentation deep neural network model based on a multi-angle latent aggregation network, so as to ultimately improve the quality and speed of CZT image segmentation.


In order to achieve the above purpose, the present disclosure is implemented by adopting the following technical solution: a multi-angle image semantic segmentation method for cadmium zinc telluride chips, including:


step 1) construction of an n+1 dataset:


acquiring single-camera multi-angle CZT images I1, I2, . . . , In by using an acquisition system, manually selecting an image with the highest recognition of monocrystal and heterocrystal defect boundaries from the n images as an image to be labeled, and marking a monocrystal area, a heterocrystal area and a background area of the CZT image by using labeling software to generate pixel-level semantic labels;

    • step 2) construction and training of a Progressive Complementary Knowledge Aggregation network PCKA:
    • network structure
    • the PCKA including a Pixel Aggregation Network PAN and a Latent Aggregation Network LAN, feeding the multi-angle images I1, I2, . . . , In in step 1) into the PAN to obtain pixel-level aggregated images Apixel, then feeding the pixel-level aggregated images Apixel and the multi-angle images I1, I2, . . . , In into the LAN respectively through a feature embedding module FEM to obtain a defect semantic graph based on a general latent expression Alatent;
    • network training
    • inputting the acquired single-camera multi-angle CZT images into the PCKA for training, the n multi-angle CZT images being used as the input of the training dataset, and the one labeled image and the label data thereof being used as the network output true values.


Further, in step 1), the acquisition system is composed of one industrial camera and n angle light sources, and acquiring images includes:

    • step 1.1: adjusting the focal length of the camera;
    • step 1.2: turning on the n angle light sources, optimizing and adjusting light brightness, and acquiring the multi-angle CZT images according to formula (1):

$$I_i = k_{ambient}^i L_{ambient}^i + k_{diffuse}^i L_{diffuse}^i\,(h_i \cdot l_i) + k_{specular}^i L_{specular}^i\,(r_i \cdot v)^{\alpha_i}$$
$$r_i = 2(h_i \cdot l_i)\,h_i - l_i,\quad i = 1, 2, \ldots, n \qquad (1)$$


where $I_i$ represents the image acquired when the $i$th angle light source is lit alone, n represents the total number of multi-angle light sources, $L_{ambient}^i$, $L_{diffuse}^i$ and $L_{specular}^i$ respectively represent the $i$th-angle ambient, diffuse and specular lighting, $k_{ambient}$, $k_{diffuse}$ and $k_{specular}$ respectively represent the reflection coefficients corresponding to ambient, diffuse and specular lighting, $h$ represents the normal direction at the corresponding point of incidence, $l$ represents the incident light direction vector, $r$ represents the reflection direction vector, $\alpha$ represents the surface roughness, and $v$ represents the direction of the camera.


Further, a specific data processing process of the PAN includes:

    • feeding the multi-angle images I1, I2, . . . , In into a U-shaped weight calibrator to obtain corresponding weights w1, w2, . . . , wn, and expressing an encoder and a decoder in the calibrator respectively as PE and PDE, this process being expressed as:

$$O_1 = PE_1(\mathrm{Cat}(I_1, I_2, \ldots, I_n))$$
$$O_i = PE_i(O_{i-1}),\quad i = 2, \ldots, 5 \qquad (2)$$
    • where Cat(⋅) represents a splicing operation, PEi represents an ith encoding layer in the calibrator, and Oi represents an output of the ith encoding layer in the calibrator;

$$O'_1 = PDE_1(O_5)$$
$$O'_i = PDE_i(\mathrm{Cat}(O'_{i-1},\, O_{5-(i-1)})),\quad i = 2, \ldots, 5 \qquad (3)$$
    • where PDEi represents an ith decoding layer in the PAN and O′i represents an output of the ith decoding layer in the PAN;
    • finally obtaining n weights w1, w2, . . . , wn corresponding to I1, I2, . . . , In, then obtaining calibrated images I′i (i=1, . . . , n) according to $I'_i = I_i \times w_i$, and finally obtaining the pixel-level aggregated images Apixel through

$$A_{pixel} = \sum_{i=1}^{n} I'_i.$$





Further, the LAN includes Forward Extraction Modules FEMs, Augmentation Guide Modules AGMs and a Deep Projection Module DPM. The FEMs are configured to obtain feature expressions of I1, I2, . . . , In and Apixel; the AGMs are configured to adaptively extract latent clues under the guidance of Apixel; and the DPM is configured to project the aggregated features into a deeper feature space to obtain the general latent expression Alatent, from which a semantic graph based on Alatent is finally obtained.


Further, a specific process of the FEMs includes:

    • given Ii (i=1, . . . , n) and Apixel, using three convolutional layers, one pooling layer and the first three residual blocks of ResNet-101 as a shallow layer extractor, which is expressed as:

$$x_\alpha = \mathrm{Down}(f(I_\alpha)),\quad I_\alpha \in \{I_i,\, A_{pixel}\} \qquad (4)$$

    • where f(⋅) represents a latent feature extraction function, Down(⋅) represents a max-pooling operation, and $x_\alpha$ represents the output latent feature of $I_\alpha$, which is fed into the first three residual blocks of ResNet-101 to further extract latent residual features;
    • expressing the required bottom-layer mapping as $H(x_\alpha)$, and fitting another mapping by using several nonlinear layers:

$$F(x_\alpha) = H(x_\alpha) - x_\alpha \qquad (5)$$

    • where F(⋅) represents stacked convolutional layers; the latent residual features generated by the original mapping $F(x_\alpha) + x_\alpha$ are then obtained as the input of a subsequent module.


Further, a specific process of the AGMs includes using xpixel to guide, from a latent space, unique boundary clues coming from xi (i=1, . . . , n), that is, aggregating supplementary knowledge beneficial to segmentation, and suppressing useless redundant information:











$$L_i = \sigma\!\left(g_1\!\left(g_2\!\left(\mathrm{Cat}(x_i,\, x_{pixel})\right)\right)\right) \times x_i$$
$$L' = \sum_{i=1}^{n} L_i \qquad (6)$$







where $g_1(\cdot)$ and $g_2(\cdot)$ are implementable by using convolutional layers, $\sigma$ represents a Sigmoid function, $L_i$ represents the latent features extracted from I1, . . . , In, and L′ represents the aggregated complementary feature.


Further, a specific process of the DPM includes, given the output L′ of the AGMs, projecting L′ to a deeper feature space to obtain the deep latent feature Alatent, which is expressed as:

$$A_{latent} = D_3(D_2(D_1(L'))) \qquad (7)$$

    • where $D_1(\cdot)$, $D_2(\cdot)$ and $D_3(\cdot)$ are implementable by using the subsequent residual blocks of ResNet-101.


Compared with the traditional image methods, the present disclosure has the following innovative points and advantages.


Firstly, the present disclosure adopts a single-camera multi-angle acquisition mode that exploits the multi-angle reflection characteristics of CZT crystals. This acquisition mode not only avoids the errors caused by multi-camera image registration, but also fuses complementary boundary information from images captured under different lighting angles.


Secondly, the present disclosure proposes an n+1 dataset construction mode for CZT image segmentation. This dataset is composed of high-resolution and high-precision multi-angle CZT images and pixel-level semantic labels thereof, and provides a benchmark dataset for other industrial tasks that require information fusion to assist image analysis.


Lastly, the present disclosure also proposes a segmentation network based on multi-angle complementary knowledge aggregation for CZT image segmentation. This network adaptively aggregates boundary information from both pixel-level and feature-level information. The present disclosure is suitable for applications that require multi-angle image acquisition and semantic segmentation, such as semiconductor material segmentation.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 illustrates a schematic diagram of single-camera multi-angle lighting control.



FIG. 2 illustrates a schematic diagram of data labeling, where panel a of FIG. 2 shows the labeling process and panel b of FIG. 2 shows the generated labeled image; a dark area represents CZT monocrystal, a light area represents heterocrystal, and a black area represents background.



FIG. 3 illustrates a schematic diagram of a Progressive Complementary Knowledge Aggregation network PCKA.



FIG. 4 illustrates a schematic diagram of a Pixel Aggregation Network PAN.



FIG. 5 illustrates a schematic diagram of a Latent Aggregation Network LAN.



FIG. 6 illustrates ablation experiment data of each component of a PCKA based on U-Net according to the present disclosure.



FIG. 7 illustrates a data graph of comparisons between a method according to the present disclosure and different solutions.



FIG. 8 illustrates a performance comparison between 12-angle images from specific lighting angles.



FIG. 9 illustrates a flowchart of the multi-angle image semantic segmentation method for cadmium zinc telluride chips.





DETAILED DESCRIPTION OF THE EMBODIMENTS

The present disclosure will be further described below in detail in combination with the embodiments with reference to the drawings.


Referring to FIG. 9, a multi-angle image semantic segmentation method for cadmium zinc telluride chips includes the following steps:

    • Step 1) construction of an n+1 dataset


Single-camera multi-angle CZT images I1, I2, . . . , In are acquired by using an acquisition system. Referring to FIG. 1, the acquisition system is composed of one industrial camera and n angle light sources, and acquiring images includes the following steps:


In step 1.1, the focal length of the camera is adjusted.


In step 1.2, the n angle light sources are turned on, light brightness is optimized and adjusted, and the multi-angle CZT images are acquired according to formula (1):

$$I_i = k_{ambient}^i L_{ambient}^i + k_{diffuse}^i L_{diffuse}^i\,(h_i \cdot l_i) + k_{specular}^i L_{specular}^i\,(r_i \cdot v)^{\alpha_i}$$
$$r_i = 2(h_i \cdot l_i)\,h_i - l_i,\quad i = 1, 2, \ldots, n \qquad (1)$$

    • where $I_i$ represents the image acquired when the $i$th angle light source is lit alone, n represents the total number of multi-angle light sources, $L_{ambient}^i$, $L_{diffuse}^i$ and $L_{specular}^i$ respectively represent the $i$th-angle ambient, diffuse and specular lighting, $k_{ambient}$, $k_{diffuse}$ and $k_{specular}$ respectively represent the reflection coefficients corresponding to ambient, diffuse and specular lighting, $h$ represents the normal direction at the corresponding point of incidence, $l$ represents the incident light direction vector, $r$ represents the reflection direction vector, $\alpha$ represents the surface roughness, and $v$ represents the direction of the camera.
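For illustration, the following is a minimal NumPy sketch of formula (1) evaluated at a single surface point. The coefficient values, the example vectors, and the zero-clamping of the dot products are assumptions made for the example, not values fixed by the disclosure.

```python
import numpy as np

def acquire_intensity(h, l, v, L_ambient, L_diffuse, L_specular,
                      k_ambient=0.2, k_diffuse=0.6, k_specular=0.4, alpha=8.0):
    """Evaluate formula (1) at one surface point for the i-th light source.

    h: unit surface normal at the point of incidence
    l: unit direction vector of the i-th incident light
    v: unit viewing direction of the camera
    L_*: ambient/diffuse/specular light intensities (example values)
    k_*: corresponding reflection coefficients (assumed values)
    alpha: surface roughness exponent
    """
    h_dot_l = np.dot(h, l)
    r = 2.0 * h_dot_l * h - l                      # r_i = 2(h_i . l_i) h_i - l_i
    ambient = k_ambient * L_ambient
    # The zero clamps below are a standard practical guard, not part of formula (1).
    diffuse = k_diffuse * L_diffuse * max(h_dot_l, 0.0)
    specular = k_specular * L_specular * max(np.dot(r, v), 0.0) ** alpha
    return ambient + diffuse + specular

# Example: one point lit by a source at 60 degrees elevation, camera overhead.
h = np.array([0.0, 0.0, 1.0])
l = np.array([0.5, 0.0, np.sqrt(0.75)])
v = np.array([0.0, 0.0, 1.0])
print(acquire_intensity(h, l, v, L_ambient=0.1, L_diffuse=0.8, L_specular=1.0))
```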


Then, an image with the highest recognition of monocrystal and heterocrystal defect boundaries is manually selected from the n images as an image to be labeled, and a monocrystal area, a heterocrystal area and a background area of the CZT image are marked by using labeling software to generate a pixel-level semantic graph, as illustrated in FIG. 2.

    • Step 2) construction and training of a Progressive Complementary Knowledge Aggregation network PCKA


      Network Structure


Referring to FIG. 3, the PCKA includes a Pixel Aggregation Network PAN and a Latent Aggregation Network LAN. The multi-angle images I1, I2, . . . , In in step 1) are fed into the PAN to obtain pixel-level aggregated images Apixel. Then the pixel-level aggregated images Apixel and the multi-angle images I1, I2, . . . , In are fed into the LAN respectively through a feature embedding module (FEM) to obtain a defect semantic graph based on a general latent expression Alatent.


Referring to FIG. 4, a specific data processing process of the PAN includes the following steps.


The multi-angle images I1, I2, . . . , In are fed into a U-shaped weight calibrator to obtain corresponding weights w1, w2, . . . , wn. An encoder and a decoder in the calibrator are respectively expressed as PE and PDE. This process is expressed as:

$$O_1 = PE_1(\mathrm{Cat}(I_1, I_2, \ldots, I_n))$$
$$O_i = PE_i(O_{i-1}),\quad i = 2, \ldots, 5 \qquad (2)$$

    • where Cat(⋅) represents a splicing operation, PEi represents an ith encoding layer in the calibrator, and Oi represents an output of the ith encoding layer in the calibrator;

$$O'_1 = PDE_1(O_5)$$
$$O'_i = PDE_i(\mathrm{Cat}(O'_{i-1},\, O_{5-(i-1)})),\quad i = 2, \ldots, 5 \qquad (3)$$


where PDEi represents an ith decoding layer in the PAN and O′i represents an output of the ith decoding layer in the PAN.


Finally, n weights w1, w2, . . . , wn corresponding to I1, I2, . . . , In are obtained. Then, calibrated images I′i (i=1, . . . , n) are obtained according to $I'_i = I_i \times w_i$. Finally, the pixel-level aggregated images Apixel are obtained through

$$A_{pixel} = \sum_{i=1}^{n} I'_i.$$
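As a rough PyTorch sketch of this pipeline (eqs. (2)-(3) plus the weighted aggregation), one possible realization is given below. The channel widths, stride choices, bilinear upsampling, and the softmax normalization of the weight maps are assumptions not specified by the disclosure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelAggregationNetwork(nn.Module):
    """Sketch of the PAN: a five-level U-shaped weight calibrator that maps
    the concatenated views to n weight maps, followed by the pixel-level
    aggregation A_pixel = sum_i w_i * I_i."""

    def __init__(self, n_views=12, base=16):
        super().__init__()
        chs = [base * 2 ** k for k in range(5)]            # assumed widths: 16..256
        self.enc = nn.ModuleList()
        c_in = 3 * n_views                                 # Cat(I_1, ..., I_n) on channels
        for c in chs:                                      # PE_1..PE_5, each halves resolution
            self.enc.append(nn.Sequential(
                nn.Conv2d(c_in, c, 3, 2, 1), nn.BatchNorm2d(c), nn.ReLU(True)))
            c_in = c
        dec_out = [chs[3], chs[2], chs[1], chs[0], base]   # assumed decoder widths
        self.dec = nn.ModuleList()
        c_prev = chs[4]
        for i, c_out in enumerate(dec_out, start=1):       # PDE_1..PDE_5
            c_skip = chs[5 - i] if i >= 2 else 0           # skip O_{5-(i-1)} per eq. (3)
            self.dec.append(nn.Sequential(
                nn.Conv2d(c_prev + c_skip, c_out, 3, 1, 1),
                nn.BatchNorm2d(c_out), nn.ReLU(True)))
            c_prev = c_out
        self.head = nn.Conv2d(c_prev, n_views, 1)          # one weight map per view

    def forward(self, views):                              # views: (B, n, 3, H, W)
        b, n, c, h, w = views.shape
        x = views.reshape(b, n * c, h, w)                  # Cat(I_1, ..., I_n)
        feats = []
        for pe in self.enc:                                # eq. (2): O_i = PE_i(O_{i-1})
            x = pe(x)
            feats.append(x)                                # feats[k] holds O_{k+1}
        o = None
        for i, pde in enumerate(self.dec, start=1):        # eq. (3)
            if i == 1:
                o = pde(feats[4])                          # O'_1 = PDE_1(O_5)
            else:
                skip = feats[5 - i]                        # O_{5-(i-1)}
                o = F.interpolate(o, size=skip.shape[-2:], mode='bilinear',
                                  align_corners=False)
                o = pde(torch.cat([o, skip], dim=1))       # O'_i = PDE_i(Cat(...))
        o = F.interpolate(o, size=(h, w), mode='bilinear', align_corners=False)
        weights = torch.softmax(self.head(o), dim=1)       # w_1..w_n (softmax is assumed)
        return (views * weights.unsqueeze(2)).sum(dim=1)   # A_pixel = sum_i w_i * I_i

# Example: pan = PixelAggregationNetwork(); a_pixel = pan(torch.randn(1, 12, 3, 768, 768))
```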





Referring to FIG. 5, the LAN includes Forward Extraction Modules FEMs, Augmentation Guide Modules AGMs and a Deep Projection Module DPM. The FEMs are configured to obtain feature expressions of I1, I2, . . . , In and Apixel. The AGMs are configured to adaptively extract latent clues under the guidance of Apixel. The DPM is configured to project the aggregated features into a deeper feature space to obtain the general latent expression Alatent, from which a semantic graph based on Alatent is finally obtained.


A specific process of the FEMs includes the following steps.


Given Ii (i=1, . . . , n) and Apixel, three convolutional layers, one pooling layer and the first three residual blocks of ResNet-101 are used as a shallow layer extractor, which is expressed as:

$$x_\alpha = \mathrm{Down}(f(I_\alpha)),\quad I_\alpha \in \{I_i,\, A_{pixel}\} \qquad (4)$$

    • where f(⋅) represents a latent feature extraction function, Down(⋅) represents a max-pooling operation, and $x_\alpha$ represents the output latent feature of $I_\alpha$, which is fed into the first three residual blocks of ResNet-101 to further extract latent residual features.

The required bottom-layer mapping is expressed as $H(x_\alpha)$, and another mapping is fitted by using several nonlinear layers:

$$F(x_\alpha) = H(x_\alpha) - x_\alpha \qquad (5)$$

    • where F(⋅) represents stacked convolutional layers; the latent residual features generated by the original mapping $F(x_\alpha) + x_\alpha$ are then obtained as the input of a subsequent module.
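A hedged PyTorch sketch of one FEM under these equations might look as follows. The stem widths and the reading of ResNet-101 layer1's three bottlenecks as the "first three residual blocks" are assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet101

class ForwardExtractionModule(nn.Module):
    """One FEM per eqs. (4)-(5): three convolutions and a max-pooling stage
    as f(.) and Down(.), then early residual blocks of an ImageNet-pretrained
    ResNet-101 supplying the residual mapping F(x) + x."""

    def __init__(self):
        super().__init__()
        layers, c_in = [], 3
        for c_out, stride in ((32, 2), (32, 1), (64, 1)):  # assumed deep-stem widths
            layers += [nn.Conv2d(c_in, c_out, 3, stride, 1, bias=False),
                       nn.BatchNorm2d(c_out), nn.ReLU(inplace=True)]
            c_in = c_out
        self.f = nn.Sequential(*layers)                    # f(.) in eq. (4)
        self.down = nn.MaxPool2d(3, stride=2, padding=1)   # Down(.) in eq. (4)
        backbone = resnet101(weights="IMAGENET1K_V1")
        self.res_blocks = backbone.layer1                  # three bottleneck blocks

    def forward(self, img):                                # img: I_i or A_pixel
        x = self.down(self.f(img))                         # x_alpha = Down(f(I_alpha))
        return self.res_blocks(x)                          # residual features, eq. (5)

# Example: fem = ForwardExtractionModule(); x = fem(torch.randn(1, 3, 768, 768))
```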


A specific process of the AGMs includes the following steps.


xpixel is used to guide, from the latent space, the unique boundary clues coming from xi (i=1, . . . , n); that is, supplementary knowledge beneficial to segmentation is aggregated, and useless redundant information is suppressed:











$$L_i = \sigma\!\left(g_1\!\left(g_2\!\left(\mathrm{Cat}(x_i,\, x_{pixel})\right)\right)\right) \times x_i$$
$$L' = \sum_{i=1}^{n} L_i \qquad (6)$$









    • where $g_1(\cdot)$ and $g_2(\cdot)$ are implementable by using convolutional layers, $\sigma$ represents a Sigmoid function, $L_i$ represents the latent features extracted from I1, . . . , In, and L′ represents the aggregated complementary feature.
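Equation (6) maps directly onto a small gating module. The following PyTorch sketch assumes 3×3 convolutions and a channel bottleneck between g2 and g1, neither of which is fixed by the disclosure.

```python
import torch
import torch.nn as nn

class AugmentationGuideModule(nn.Module):
    """Gating per eq. (6): x_pixel guides each view feature x_i, and the
    gated features are summed into L'."""

    def __init__(self, channels=256, reduction=4):
        super().__init__()
        hidden = channels // reduction
        self.g2 = nn.Conv2d(2 * channels, hidden, 3, padding=1)  # g_2(Cat(x_i, x_pixel))
        self.g1 = nn.Conv2d(hidden, channels, 3, padding=1)      # g_1(.)

    def forward(self, xs, x_pixel):
        """xs: list of per-view features x_1..x_n; x_pixel: feature of A_pixel."""
        l_agg = 0.0
        for x_i in xs:
            gate = torch.sigmoid(self.g1(self.g2(torch.cat([x_i, x_pixel], 1))))
            l_agg = l_agg + gate * x_i   # L_i = sigma(g1(g2(Cat(x_i, x_pixel)))) * x_i
        return l_agg                     # L' = sum_i L_i
```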





A specific process of the DPM includes the following steps.


Given the output L′ of the AGMs, L′ is projected to a deeper feature space to obtain the deep latent feature Alatent, which is expressed as:

$$A_{latent} = D_3(D_2(D_1(L'))) \qquad (7)$$

    • where $D_1(\cdot)$, $D_2(\cdot)$ and $D_3(\cdot)$ are implementable by using the subsequent residual blocks of ResNet-101.
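Read together with the FEM sketch above, the DPM can be sketched as the remaining ResNet-101 stages; treating D1-D3 as layer2-layer4 is an assumption consistent with that sketch ending after layer1.

```python
import torch.nn as nn
from torchvision.models import resnet101

class DeepProjectionModule(nn.Module):
    """DPM per eq. (7): A_latent = D3(D2(D1(L')))."""

    def __init__(self):
        super().__init__()
        backbone = resnet101(weights="IMAGENET1K_V1")
        self.d1, self.d2, self.d3 = backbone.layer2, backbone.layer3, backbone.layer4

    def forward(self, l_prime):          # l_prime: L' from the AGMs, 256 channels
        return self.d3(self.d2(self.d1(l_prime)))
```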


      Network Training


The acquired single-camera multi-angle CZT images are input into the PCKA for training. The n multi-angle CZT images are used as the input of the training dataset, and the one labeled image and its label data are used as the network output ground truth.


Simulation experiments were conducted below on the multi-angle image semantic segmentation method for cadmium zinc telluride chips according to the present disclosure.


Experimental Setup


Our deep learning algorithm ran on the Ubuntu 20.04 LTS operating system, with an i9-10900X CPU at 3.7 GHz, 16 GB of memory, and two Quadro RTX 8000 GPUs with 48 GB of video memory each. We selected ImageNet-pretrained ResNet-101 as the backbone network. Training images were randomly scaled (0.75 to 2.0) and then randomly cropped to a size of 768×768. All experiments were conducted by using mini-batch stochastic gradient descent. We adopted a poly learning rate adjustment strategy, where the initial learning rate was multiplied by

$$1 - \left(\frac{iter}{max\_iter}\right)^{power},$$

where power=0.9. We used a momentum of 0.9 and a weight decay of 0.0001. The initial learning rate of the pixel aggregation network was 1e−3, and that of the latent aggregation network was 1e−2. All experimental results in the present disclosure were obtained after 40000 training iterations.
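These hyper-parameters translate into a few lines of PyTorch. The sketch below assumes the poly factor exactly as written above and that the scheduler is stepped once per training iteration.

```python
import torch

def make_optimizer_and_scheduler(params, base_lr, max_iter=40000, power=0.9):
    """SGD with momentum 0.9 and weight decay 0.0001, plus a poly schedule
    that multiplies the base learning rate by 1 - (iter / max_iter)^power."""
    opt = torch.optim.SGD(params, lr=base_lr, momentum=0.9, weight_decay=1e-4)
    sched = torch.optim.lr_scheduler.LambdaLR(
        opt, lr_lambda=lambda it: 1.0 - (it / max_iter) ** power)
    return opt, sched

# Per the disclosure: base_lr = 1e-3 for the pixel aggregation network
# and 1e-2 for the latent aggregation network.
```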


Ablation Experiment


In FIG. 6, we conducted ablation studies on each component of PCKA based on U-Net. Specifically, we used the first convolutional layer of ResNet-101 as a feature extractor (weight sharing) to project bottom-layer pixels into a latent space, abbreviated as CONV, and set CONV as our baseline. We input Apixel into the deep network, abbreviated as PAN. In addition, we aggregated xi,2, . . . , xi,12 and xpixel,2, denoted as LAN*, to demonstrate the effectiveness of feature-level information aggregation. Finally, we equipped the AGMs onto the LAN to use Apixel to guide information aggregation, abbreviated as LAN. FIG. 6 validates the effectiveness of each component on the n+1 dataset.


It can be seen that PAN, compared with the baseline, brings a clear performance gain, indicating that pixel-level information aggregation is necessary for 12-angle images. By performing feature-level aggregation on xi,2, . . . , xi,12 and xpixel,2, performance was further improved by 1.4% with LAN*, indicating that the aggregated latent clues provide performance gains. For the last row (PCKA) in FIG. 6, we demonstrate that combining pixel-level and feature-level clue aggregation yields a further mIoU improvement of 0.5%, thus validating the effectiveness of LAN.


Comparison with Different Solutions



FIG. 7 illustrates a performance comparison between different solutions. Bg, Cd and De respectively refer to background, monocrystal and heterocrystal. In FIG. 7, we conducted comprehensive experiments by using different learning frameworks on the n+1 dataset to show the effectiveness and efficiency of PCKA. In order to meet actual application demands, the inference time was measured on an actually deployed device (a single GeForce RTX 3060).


Specifically, 1) images from 12 different angles can be directly aggregated (averaged or concatenated) at the pixel level to aggregate pixel-level information, abbreviated as Avg and Concat, respectively; 2) we randomly selected an image during each iteration of the training phase to train the model; during the testing phase, images from 12 different angles were predicted to obtain 12 different segmentation results, and these results were then fused for model integration, abbreviated as Late Fusion.
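For reference, the two pixel-level baselines reduce to one-liners; the (B, n, 3, H, W) input layout is an assumption carried over from the PAN sketch above.

```python
import torch

def avg_baseline(views):
    """'Avg' baseline: average the 12 views into a single 3-channel image."""
    return views.mean(dim=1)              # views: (B, n, 3, H, W) -> (B, 3, H, W)

def concat_baseline(views):
    """'Concat' baseline: stack all views on the channel axis, giving the
    downstream network an n*3-channel input."""
    b, n, c, h, w = views.shape
    return views.reshape(b, n * c, h, w)  # (B, n*3, H, W)
```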


Obviously, all methods have achieved excellent performance in category “Bg”, indicating that for CdZnTe images, segmentation algorithms can easily distinguish between foreground and background. Therefore, the key issue in defect segmentation of CdZnTe crystals is how to adaptively extract clues that are beneficial for distinguishing the boundaries of “Cd” and “De”.


It can be seen that there is a significant gap between the pixel-level methods (Avg and Concat) and the other solutions, indicating that if we treat the images from 12 angles equally, it is difficult for deep models to directly learn boundary information. Owing to the powerful model integration capability of Late Fusion and the effective information aggregation capability of PCKA, both frameworks achieved excellent performance. However, Late Fusion requires a significant amount of computing resources to integrate models, resulting in processing times that are unacceptable in real-world industrial scenarios. In short, only our PCKA achieved satisfactory results in both time and segmentation performance.


Further, the present disclosure studied the performance of Late Fusion and PCKA under different numbers of input images, feeding Late Fusion and PCKA only with images from specific lighting angles.



FIG. 8 illustrates a performance comparison between 12-angle images from specific lighting angles. LF represents Late Fusion. The primary Y-axis represents mIoU, and the secondary Y-axis represents inference time.


For the different segmentation models in FIG. 8, although Late Fusion outperforms our PCKA, it requires more time to integrate more models compared to our PCKA. In addition, when the number of input images is 12, our PCKA achieves competitive performance while ensuring inference speed.


It can be seen that our PCKA gathers beneficial clues at the pixel and feature levels rather than relying on model integration, which yields consistent performance gains over Late Fusion as the number of input images increases, and that it effectively extracts useful boundary clues from different images. On the contrary, Late Fusion exhibits a "performance saturation effect" as the number of input images increases, and it consumes a significant amount of inference time in all cases, which is unacceptable in practical applications.


What are described above are only specific embodiments of the present disclosure, but the scope of protection of the present disclosure is not limited thereto. Any changes or replacements that can readily occur to those skilled in the art within the technical scope disclosed in the present disclosure shall fall within the scope of protection of the present disclosure. Therefore, the scope of protection of the present disclosure shall be subject to the scope of protection of the claims.

Claims
  • 1. A multi-angle image semantic segmentation method for cadmium zinc telluride chips, comprising:
    step 1) construction of an n+1 dataset:
    acquiring single-camera multi-angle Cadmium Zinc Telluride (CZT) images I1, I2, . . . , In by using an acquisition system, manually selecting an image with the highest recognition of monocrystal and heterocrystal defect boundaries from the n images as an image to be labeled, and marking a monocrystal area, a heterocrystal area and a background area of the CZT image by using labeling software to generate pixel-level semantic labels;
    step 2) construction and training of a Progressive Complementary Knowledge Aggregation network (PCKA):
    network structure
    the PCKA comprising a Pixel Aggregation Network (PAN) and a Latent Aggregation Network (LAN), feeding the multi-angle images I1, I2, . . . , In in step 1) into the PAN to obtain pixel-level aggregated images Apixel, then feeding the pixel-level aggregated images Apixel and the multi-angle images I1, I2, . . . , In into the LAN respectively through a Feature Embedding Module (FEM) to obtain a defect semantic graph based on a general latent expression Alatent;
    a specific data processing process of the PAN comprising:
    feeding the multi-angle images I1, I2, . . . , In into a U-shaped weight calibrator to obtain corresponding weights w1, w2, . . . , wn, and expressing an encoder and a decoder in the calibrator respectively as PE and PDE, this process being expressed as:
    $$O_1 = PE_1(\mathrm{Cat}(I_1, I_2, \ldots, I_n)),\qquad O_i = PE_i(O_{i-1}),\quad i = 2, \ldots, 5 \qquad (2)$$
    where Cat(⋅) represents a splicing operation, PEi represents an ith encoding layer in the calibrator, and Oi represents an output of the ith encoding layer in the calibrator;
    $$O'_1 = PDE_1(O_5),\qquad O'_i = PDE_i(\mathrm{Cat}(O'_{i-1},\, O_{5-(i-1)})),\quad i = 2, \ldots, 5 \qquad (3)$$
    where PDEi represents an ith decoding layer in the PAN and O′i represents an output of the ith decoding layer in the PAN;
    finally obtaining n weights w1, w2, . . . , wn corresponding to I1, I2, . . . , In, then obtaining calibrated images I′i (i=1, . . . , n) according to $I'_i = I_i \times w_i$, and finally obtaining the pixel-level aggregated images Apixel through $A_{pixel} = \sum_{i=1}^{n} I'_i$.
  • 2. The multi-angle image semantic segmentation method for cadmium zinc telluride chips according to claim 1, wherein in step 1), the acquisition system is composed of one industrial camera and n angle light sources, and acquiring images comprises:
    step 1.1: adjusting a focal length of the camera;
    step 1.2: turning on the n angle light sources, optimizing and adjusting light brightness, and acquiring the multi-angle CZT images according to formula (1):
    $$I_i = k_{ambient}^i L_{ambient}^i + k_{diffuse}^i L_{diffuse}^i\,(h_i \cdot l_i) + k_{specular}^i L_{specular}^i\,(r_i \cdot v)^{\alpha_i},\qquad r_i = 2(h_i \cdot l_i)\,h_i - l_i,\quad i = 1, 2, \ldots, n \qquad (1)$$
    where $I_i$ represents an image acquired when an ith angle light source is lit alone, n represents the total number of multi-angle light sources, $L_{ambient}^i$, $L_{diffuse}^i$ and $L_{specular}^i$ respectively represent ith-angle ambient, diffuse and specular lighting, $k_{ambient}$, $k_{diffuse}$ and $k_{specular}$ respectively represent reflection coefficients corresponding to ambient, diffuse and specular lighting, $h$ represents the normal direction of a corresponding point of incidence, $l$ represents an incident light direction vector, $r$ represents a reflection direction vector, $\alpha$ represents surface roughness, and $v$ represents the direction of the camera.
  • 3. The multi-angle image semantic segmentation method for cadmium zinc telluride chips according to claim 1, wherein a specific process of the Forward Extraction Modules (FEMs) comprises:
    given Ii (i=1, . . . , n) and Apixel, using three convolutional layers, one pooling layer and the first three residual blocks of ResNet-101 as a shallow layer extractor, which is expressed as:
    $$x_\alpha = \mathrm{Down}(f(I_\alpha)),\quad I_\alpha \in \{I_i,\, A_{pixel}\} \qquad (4)$$
    where f(⋅) represents a latent feature extraction function, Down(⋅) represents a max-pooling operation, and $x_\alpha$ represents an output latent feature of $I_\alpha$ and is fed into the first three residual blocks of ResNet-101 to further extract latent residual features;
    expressing a required bottom-layer mapping as $H(x_\alpha)$, and fitting another mapping by using several nonlinear layers:
    $$F(x_\alpha) = H(x_\alpha) - x_\alpha \qquad (5)$$
    where F(⋅) represents stacked convolutional layers; the latent residual features generated by the original mapping $F(x_\alpha) + x_\alpha$ are obtained as an input of a subsequent module.
  • 4. The multi-angle image semantic segmentation method for cadmium zinc telluride chips according to claim 3, wherein a specific process of the Augmentation Guide Modules (AGMs) comprises using xpixel to guide, from a latent space, unique boundary clues coming from xi (i=1, . . . , n), that is, aggregating supplementary knowledge beneficial to segmentation, and suppressing useless redundant information:
    $$L_i = \sigma\!\left(g_1\!\left(g_2\!\left(\mathrm{Cat}(x_i,\, x_{pixel})\right)\right)\right) \times x_i,\qquad L' = \sum_{i=1}^{n} L_i \qquad (6)$$
  • 5. The multi-angle image semantic segmentation method for cadmium zinc telluride chips according to claim 4, wherein a specific process of the DPM comprises projecting L′ to a deeper feature space to obtain a deep latent feature Alatent in a case that an output L′ of the AGMs is known, which is expressed as:
    $$A_{latent} = D_3(D_2(D_1(L'))) \qquad (7)$$
    wherein $D_1(\cdot)$, $D_2(\cdot)$ and $D_3(\cdot)$ are implementable by using subsequent residual blocks of ResNet-101.
Priority Claims (1)
Number Date Country Kind
202311112314.6 Aug 2023 CN national
US Referenced Citations (2)
Number Name Date Kind
20220120698 Norman Apr 2022 A1
20220215662 Yang Jul 2022 A1
Foreign Referenced Citations (5)
Number Date Country
113658188 Nov 2021 CN
113673529 Nov 2021 CN
114155481 Mar 2022 CN
115205672 Oct 2022 CN
116246065 Jun 2023 CN
Non-Patent Literature Citations (5)
Entry
First Office Action issued in counterpart Chinese Patent Application No. 202311112314.6, dated Jun. 21, 2024.
Jiang et al., Granticle segmentation based on semantic feature extraction, Science of China: Information Science, vol. 50, No. 1, pp. 109-127, dated Jan. 20, 2020.
Li et al., Leaf segmentation based on semantic segmentation and visible spectroscopy, Spectroscopy and Spectral Analysis, vol. 43, No. 4, pp. 1248-1253, dated Apr. 30, 2023.
Notification to Grant Patent Right for Invention issued in counterpart Chinese Patent Application No. 202311112314.6, dated Jul. 17, 2024.
Zhou et al., Semantic segmentation of images based on deep feature fusion, Computer Science, vol. 47, No. 2, pp. 126-134, Dec. 31, 2020.