This application relates to the field of computer vision, and in particular, to a method for determining a lesion region, and a model training method and apparatus.
At present, a specific lesion location can be determined from a pathological image.
In the related art, a histopathological examination begins with biopsy, where a doctor obtains tissue slices from the body of a user and makes slides through steps of embedding, staining or the like. Afterwards, a pathologist places the slides under a microscope for observation in order to find the specific lesion location from the pathological image.
However, in the above related art, the efficiency of determining a lesion location is relatively low.
Embodiments of this application provide a method for determining a lesion region, and a model training method and apparatus. The technical solutions are described as follows.
According to one aspect of the embodiments of this application, a method for determining a lesion region in a pathological image is provided, the method being performed by a computer device and including the following steps:
According to one aspect of the embodiments of this application, a computer device is provided, the computer device including a processor and a memory, the memory storing therein at least one program, the at least one program being loaded and executed by the processor and causing the computer device to implement the above method for determining a lesion region in a pathological image.
According to one aspect of the embodiments of this application, a non-transitory computer-readable storage medium is provided, the storage medium storing therein at least one program, the at least one program being loaded and executed by a processor of a computer device and causing the computer device to implement the above method for determining a lesion region in a pathological image.
The technical solutions provided in the embodiments of this application may have the following beneficial effects:
In addition, in the embodiments of this application, firstly, the candidate lesion region in the pathological image is determined based on the first sampling way, then the second instance image is acquired based on the second sampling way with a greater sampling overlap degree than that of the first sampling way, and based on the second instance image, the lesion region in the pathological image is determined from the candidate lesion region. Thus the area of a region in the pathological image that needs to be sampled by the second sampling way is reduced, thereby reducing the number of the second instance images that need to be collected and feature information extracted, reducing computational resources required to determine the lesion region, and improving the efficiency of determining the lesion region.
Moreover, the candidate lesion region is more likely to include the lesion region than other regions, and has a relatively high signal-to-noise ratio. Thus, by executing the second sampling way only for the candidate lesion region, an information loss can be reduced and perception of a lesion region with a relatively small area can be enhanced, thereby determining the lesion region in the pathological image more accurately.
To make the objectives, technical solutions, and advantages of this application clearer, implementations of this application will be further described in detail with reference to the accompanying drawings.
A method for training a sustainable learning model in this application involves the following technology:
Artificial intelligence (AI) involves a theory, a method, a technology, and an application system that use a digital computer or a machine controlled by the digital computer to simulate, extend, and expand human intelligence, perceive an environment, acquire knowledge, and use knowledge to obtain an optimal result. In other words, AI is a comprehensive technology in computer science and attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. AI is to study the design principles and implementation methods of various intelligent machines, to enable the machines to have the functions of perception, reasoning, and decision-making.
The AI technology is a comprehensive discipline, and relates to a wide range of fields including both hardware-level technologies and software-level technologies. The basic AI technologies generally include technologies such as a sensor, a dedicated AI chip, cloud computing, distributed storage, a big data processing technology, an operating/interaction system, and electromechanical integration. AI software technologies mainly include several major directions such as a computer vision (CV) technology, a speech processing technology, a natural language processing technology, and machine learning/deep learning.
The CV technology is a science that studies how to use a machine to “see”, and further, that uses a camera and a computer to replace human eyes to perform machine vision such as recognition and measurement on a target, and further performs graphic processing, so that the computer processes the target into an image more suitable for human eyes to observe, or an image transmitted to an instrument for detection. As a scientific discipline, CV studies related theories and technologies and attempts to establish an AI system that can obtain information from images or multidimensional data. The CV technology generally includes technologies such as image processing, image recognition, image semantic understanding, image retrieval, optical character recognition (OCR), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, three-dimensional (3D) technology, virtual reality, augmented reality, and map construction.
Machine learning (ML) is a multi-domain interdiscipline, and involves a plurality of disciplines such as probability theory, statistics, approximation theory, convex analysis, and algorithm complexity theory. ML specializes in studying how a computer simulates or implements a human learning behavior to obtain new knowledge or skills, and reorganizes an existing knowledge structure, so as to keep improving its performance. ML is the core of AI, is a basic way to make the computer intelligent, and is applied to various fields of AI. ML and deep learning generally include technologies such as an artificial neural network, a belief network, reinforcement learning, transfer learning, inductive learning, and learning from demonstrations.
With the research and progress of the AI technology, it has been researched and applied in multiple fields, such as common smart homes, smart wearable devices, virtual assistants, smart speakers, intelligent marketing, unmanned driving, autonomous driving, drones, robots, smart healthcare, and intelligent customer service. It is believed that with the development of the technology, the AI technology will be applied in more fields, and plays an increasingly important role.
The solution provided in embodiments of this application involves technologies such as ML and CV in AI, and a lesion region in a pathological image is determined using a trained lesion region determination model.
The technical solutions of this application will be described below with reference to several embodiments.
Refer to
The encoding network 10 is configured to perform feature encoding on an instance image to acquire feature information corresponding to the instance image. Exemplarily, in an embodiment of this application, the above instance image includes at least two first instance images and at least two second instance images. The first instance image is acquired by sampling a pathological image by the first sampling way, and the second instance image is acquired by sampling a candidate lesion region by the second sampling way. Moreover, in the embodiment of this application, an overlap degree between the second instance images is greater than that between the first instance images.
The first classification network 20 is configured to determine a first predicted probability corresponding to the first instance image and global feature information of the pathological image according to feature information corresponding to the first instance image, and then determine the lesion region in the pathological image, based on the first predicted probability. The first predicted probability refers to a probability of the presence of the lesion region in the first instance image.
The second classification network 30 is configured to determine local feature information of the pathological image for a candidate lesion region according to feature information corresponding to the second instance image.
The third classification network 40 is configured to determine lesion indication information of the pathological image according to the above global feature information and the above local feature information. The lesion indication information is used for indicating the lesion region in the pathological image.
In some embodiments, the above lesion region determination model can be applied to a lesion region determination system. Exemplarily, as shown in
The terminal device 50 may be an electronic device such as a mobile phone, a tablet, a wearable device, a personal computer (PC), an intelligent voice interaction device, a medical device and a medical assistive robot, which will not be limited in the embodiments of this application. In some embodiments, the terminal device 50 includes a client of an application program. The application program may be any application with a pathological image collection function. Exemplarily, the above application program may be an application program that needs to be downloaded and installed, or a click-to-run application program, including an application program in a web form and an application program in a mini program form, which will not be limited in the embodiments of this application.
The server 60 is configured to provide background services for the terminal device 50. The server 60 may be one server, a server cluster including a plurality of servers, or a cloud computing service center. In some embodiments, the server 60 may be a background server of the client of the above application program. In an exemplary embodiment, the server 60 provides background services for a plurality of terminal devices 50.
The above terminal device 50 and the above server 60 transmit data via a network. In some embodiments, the server 60 includes a lesion region determination model. The terminal device 50 collects and acquires a pathological image and sends the same to the server 60. Afterwards, the server 60 processes the pathological image, based on the lesion region determination model to determine the lesion region in the pathological image.
The above introduction in
A device for training the above lesion region determination model may be the above server 60, or other computer devices, which will not be limited in the embodiments of this application.
Refer to
step 310: Sample a pathological image by a first sampling way to obtain at least two first instance images.
The pathological image is a whole slide image (WSI), which is a type of medical image. In some embodiments, a pathological image is generated by slicing, embedding, staining, scanning, and other processing of tissue or organs in the body of a patient. In an embodiment of this application, after a computer device acquires the above pathological image, the pathological image is sampled by the first sampling way to obtain at least two first instance images. In some embodiments, the above pathological image may be a pathological image of teeth, arms, heart, liver, kidneys, lungs, prostate, stomach, and other parts. In some embodiments, the pathological image may be a computed tomography (CT) image, a magnetic resonance imaging (MRI) image, a B-scan ultrasonography image, or other types of pathological images, which will not be specifically limited in the embodiments of this application.
In some embodiments, the pathological image includes a background image and a foreground image. The foreground image refers to an image region corresponding to tissue or organ in the body of a patient, and the background image refers to an image region unrelated to the tissue or organ in the body of the patient. Alternatively, the foreground image refers to an image region of a body part that requires pathological analysis, and the background image refers to the remaining image regions, i.e., image regions other than the image region of the body part that requires the pathological analysis.
Exemplarily, the above first sampling way is uniform sampling. The uniform sampling may refer to a sampling way of partitioning the plane of a two-dimensional continuous image equidistantly in both horizontal and vertical directions. In an embodiment of this application, the uniform sampling refers to partitioning a pathological image into a plurality of lattices of the same shape and size, where the lattice may be a square, a rectangle, a triangle, a parallelogram or the like, which will not be specifically limited in the embodiment of this application, and these lattices do not overlap with each other. In some embodiments, a first instance image obtained by the uniform sampling is cropped into an instance image of 224×224 pixels at 10× magnification.
In some embodiments, step 310 further includes the following sub-steps:
1. Partition a background of the pathological image, and determine a background image and a foreground image in the pathological image.
In some embodiments, the pathological image is converted to a binary image, and two different colors are used for representing the background image and the foreground image, respectively. For example, there are only black and white colors in the pathological image, where a black image portion represents the background image and a white image portion represents the foreground image. Alternatively, the white image portion represents the background image, and the black image portion represents the foreground image.
2. Uniformly segment the pathological image to obtain at least two first candidate instance images.
In some embodiments, after the background image and the foreground image in the pathological image are determined, uniform sampling is performed on the pathological image, and the pathological image is uniformly segmented into a plurality of candidate instance images. In some embodiments, each candidate instance image obtained by segmentation has the same shape and size.
3. Determine a first candidate instance image including the foreground image from the at least two first candidate instance images as the first instance image.
By partitioning the pathological image into the background image and the foreground image, the image region of the body part that requires the pathological analysis is distinguished from an image region that does not require the pathological analysis, thereby avoiding interference from the image region that does not require the pathological analysis in a subsequent pathological analysis process and helping to reduce subsequent computational complexity.
In some embodiments, a first candidate instance image including the foreground image is determined from the at least two first candidate instance images, and the first candidate instance image including the foreground image is determined as the first instance image.
In some embodiments, for each first candidate instance image obtained after segmentation, if the proportion of the first candidate instance image occupied by the foreground image is equal to or greater than a first threshold, the first candidate instance image is determined as the first instance image; and if the proportion of the first candidate instance image occupied by the foreground image is less than the first threshold, it is determined that the first candidate instance image is not the first instance image (see the sketch following these sub-steps). The first threshold is equal to or greater than 0%, and less than or equal to 100%. The first threshold may be 0%, 5%, 8%, 10%, 15%, 24%, 30%, 45%, 50%, 62%, 70%, 100% or the like. Of course, the first threshold may also be other numerical values, and a specific numerical value of the first threshold can be set by a related technical person according to the actual situation, which will not be specifically limited in the embodiments of this application.
In some embodiments, there may be one or more first instance images determined from the at least two first candidate instance images, which will not be specifically limited in the embodiments of this application.
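For illustration, the following is a minimal Python sketch of sub-steps 1 to 3, assuming OpenCV Otsu binarization for the background partition, the 224×224 patch size mentioned above, and a hypothetical 10% first threshold; the function and variable names are illustrative only.

```python
import numpy as np
import cv2

PATCH = 224          # patch size in pixels (from the 224x224 example above)
FG_THRESHOLD = 0.10  # assumed first threshold: at least 10% foreground

def uniform_sample(slide: np.ndarray):
    """Uniformly partition a slide image into non-overlapping PATCH x PATCH
    lattices and keep those containing enough foreground (sub-steps 1-3)."""
    # Sub-step 1: partition the background via Otsu binarization; tissue is
    # assumed darker than the bright slide background
    gray = cv2.cvtColor(slide, cv2.COLOR_RGB2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    foreground = binary > 0

    instances, positions = [], []
    h, w = foreground.shape
    # Sub-step 2: uniform, non-overlapping segmentation into candidates
    for y in range(0, h - PATCH + 1, PATCH):
        for x in range(0, w - PATCH + 1, PATCH):
            # Sub-step 3: keep candidates whose foreground proportion is
            # equal to or greater than the first threshold
            if foreground[y:y + PATCH, x:x + PATCH].mean() >= FG_THRESHOLD:
                instances.append(slide[y:y + PATCH, x:x + PATCH])
                positions.append((x, y))
    return instances, positions
```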
In some embodiments, before step 310, the method further includes the following steps: acquiring an initial pathological image; and zooming the initial pathological image to a fixed size to obtain a pathological image. Since the size of the initial pathological image may not be fixed, after the initial pathological image is acquired, the initial pathological image can be firstly zoomed to a set fixed size to obtain a pathological image required in step 310. For example, if the size of the initial pathological image is smaller than the fixed size, the initial pathological image is zoomed-in to the fixed size; and if the size of the initial pathological image is greater than the fixed size, the initial pathological image is zoomed-out to the fixed size. The fixed size can be set by a related technical person according to the actual situation, which will not be specifically limited in the embodiments of this application.
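For illustration, a minimal sketch of this zooming step is given below; the particular fixed size and the use of PIL bilinear resampling are assumptions, not details given in the embodiments.

```python
from PIL import Image

FIXED_SIZE = (8192, 8192)  # hypothetical fixed size set by a related technical person

def to_fixed_size(initial_image: Image.Image) -> Image.Image:
    # Zoom in when the initial pathological image is smaller than the fixed
    # size and zoom out when it is larger; either way the result has the set
    # fixed size required by step 310.
    return initial_image.resize(FIXED_SIZE, Image.BILINEAR)
```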
Step 320: Determine a candidate lesion region in the pathological image, based on feature information extracted from the at least two first instance images.
In some embodiments, after a computer device acquires the above first instance image, feature extraction is performed on each first instance image, feature information corresponding to each first instance image is acquired, and based on the feature information corresponding to each first instance image, a candidate lesion region in a pathological image is determined.
The feature information extracted from the at least two first instance images can be used for representing pathological features of the at least two first instance images. The feature information extracted from the at least two first instance images includes first feature information corresponding to each first instance image, and global feature information of the pathological image. The first feature information is used for representing pathological features corresponding to each first instance image, and the global feature information is used for representing global pathological features of the pathological image. For example, feature vectors corresponding to each first instance image (i.e., the feature information of the first instance image) are extracted via a Swin Transformer backbone network.
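For illustration, the following sketch extracts such per-instance feature vectors with the timm library's Swin-Tiny model, whose pooled output happens to be 768-dimensional; the pretrained ImageNet weights and the random placeholder input stand in for whatever weights and data are actually used.

```python
import timm
import torch

# Swin-Tiny's pooled feature dimension is 768, matching the example above.
backbone = timm.create_model('swin_tiny_patch4_window7_224',
                             pretrained=True, num_classes=0)
backbone.eval()

patches = torch.randn(32, 3, 224, 224)  # K first instance images (placeholder data)
with torch.no_grad():
    h = backbone(patches)               # (K, 768) feature vectors h_k
```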
The feature information of the first instance images extracted under the first sampling way yields globally aggregated features of the pathological image and a coarse distribution of the lesion region.
The candidate lesion region refers to a region where there is likely a lesion region. The lesion region may refer to a region where there is a tumor, a region where there is carcinogenesis, a region where there is perforation, a region that has ulcerated or has a sign of ulceration, or a region that is clearly abnormal in color. The lesion region may also refer to a region where there are other types of lesions, and can be set by a related technical person, which will not be specifically limited in the embodiments of this application.
Step 330: Sample the candidate lesion region by the second sampling way to obtain at least two second instance images.
In some embodiments, after a computer device determines the above candidate lesion region, the candidate lesion region is sampled by the second sampling way to obtain at least two second instance images. In some embodiments, multiple samplings are performed around a given coordinate (such as the center point of the candidate lesion region) to obtain at least two second instance images. An overlap degree between the second instance images is greater than that between the first instance images. For example, there is overlap between the second instance images, and there is no overlap between the first instance images (that is, the first instance images are obtained by the uniform sampling way introduced above). For another example, there is an overlap degree between the second instance images, as well as between the first instance images, but the overlap degree between the second instance images is greater than that between the first instance images. In some embodiments, the second instance image is an image obtained through dense sampling, and different second instance images may be the same or different in size.
In an embodiment of this application, the overlap degree is used for indicating an overlap extent between images, such as indicating an overlap degree between instance images in a candidate lesion region. In some embodiments, the overlap degree can be worked out by an overlap rate corresponding to each instance image. For example, a mean of overlap rates corresponding to the instance images is determined as an overlap degree of these instance images in a candidate region. Alternatively, a median of overlap rates corresponding to the instance images can be determined as the overlap degree of these instance images in the candidate region.
For each instance image, the overlap rate may refer to a ratio of the total area of overlap regions between this instance image and other instance images to the area of this instance image. In some embodiments, for a second target instance image in at least two second instance images, there is at least one second instance image having an overlap rate with the second target instance image equal to or greater than an overlap threshold. Of course, a specific numerical value of the overlap threshold can also be set by a related technical person according to the actual situation, which will not be specifically limited in the embodiments of this application.
In some embodiments, the overlap degree can also be expressed as a ratio of the sum of the areas of all instance images in a candidate lesion region to the area of the candidate lesion region; as a ratio of the difference between the sum of the areas of all the instance images in the candidate lesion region and the area of the candidate lesion region to the area of the candidate lesion region; or as a ratio of the total area of the overlap regions between every two instance images in the candidate lesion region to the area of the candidate lesion region. If a certain region is both an overlap region between an instance image A and an instance image B and an overlap region between the instance image A and an instance image C, then this region is also an overlap region between the instance image B and the instance image C. In that case, when the total area of the overlap regions between every two instance images is calculated, this region is counted three times; that is, the area of this region is multiplied by 3 and the product is included in the total area. Therefore, the calculated total area of the overlap regions between every two instance images may be larger than the area of the candidate region. In some embodiments, the overlap degree is a numerical value equal to or greater than 0, and its value may also be greater than 1. There may also be other definitions and calculation ways for the overlap degree, and they can be set by a related technical person according to the actual situation, which will not be specifically limited in the embodiments of this application.
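For illustration, a minimal sketch of the mean-of-overlap-rates definition is given below, with instance images represented as axis-aligned boxes; the box representation and function names are assumptions.

```python
def intersection_area(a, b):
    """Overlap area of two axis-aligned boxes given as (x0, y0, x1, y1)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)

def overlap_degree(boxes):
    """Mean of per-instance overlap rates, one of the definitions above.
    Each rate is the total overlap area between this instance image and the
    other instance images divided by this instance image's own area; regions
    shared by several pairs are counted once per pair, as described above."""
    rates = []
    for i, a in enumerate(boxes):
        area = (a[2] - a[0]) * (a[3] - a[1])
        total = sum(intersection_area(a, b)
                    for j, b in enumerate(boxes) if j != i)
        rates.append(total / area)
    return sum(rates) / len(rates)
```

With this definition, the non-overlapping first instance images yield an overlap degree of 0, while densely sampled second instance images yield a value greater than 0, consistent with the relationship between the two sampling ways.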
In some embodiments, step 330 may further include the following sub-steps:
1. Extract candidate lesion images from the pathological image according to the candidate lesion region.
In some embodiments, after the candidate lesion region in the pathological image is determined, an image of the candidate lesion region can be directly determined as a candidate lesion image. Alternatively, an image of the candidate lesion region plus a surrounding region of the candidate lesion region can be determined as the candidate lesion image. In some embodiments, after the candidate lesion region is determined, the center of the candidate lesion region is taken as the center of the candidate lesion image, a region with the same shape as the pathological image and with the candidate lesion region included is determined, and an image within this region is determined as the candidate lesion image.
2. Zoom the candidate lesion image, based on the size of the pathological image to obtain a target lesion image, where the size of the target lesion image is consistent with that of the pathological image.
In some embodiments, since the size of the candidate lesion image is generally smaller than that of the pathological image (i.e., the above fixed size), the candidate lesion image can be zoomed-in to obtain a target lesion image whose size is consistent with that of the pathological image.
3. Sample the target lesion image by the second sampling way to obtain at least two second instance images.
In some embodiments, the target lesion image is sampled by the above second sampling way (such as dense sampling) to obtain at least two second instance images; a minimal sketch follows these sub-steps. In some embodiments, the number of the second instance images sampled from the target lesion image is equal to or greater than a second threshold, thereby ensuring a certain sampling number.
In the above embodiments, in a case where the shape of the candidate lesion image is the same as that of the pathological image, it is only necessary to zoom-in the candidate lesion image in all directions in the same proportion to obtain the target lesion image with the same size as the pathological image, without a need to stretch or shorten the candidate lesion image in a certain direction.
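For illustration, a minimal sketch of such dense sampling over the target lesion image is given below; the patch size, the stride (which controls the overlap degree: a stride smaller than the patch size yields overlapping patches), and the minimum count are assumptions.

```python
def dense_sample(target_image, patch=224, stride=112, min_count=16):
    """Densely sample overlapping patches from the zoomed target lesion
    image (second sampling way). stride < patch yields overlap between the
    second instance images; min_count plays the role of the second threshold."""
    h, w = target_image.shape[:2]
    patches, positions = [], []
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            patches.append(target_image[y:y + patch, x:x + patch])
            positions.append((x, y))
    if len(patches) < min_count:
        raise ValueError('fewer second instance images than the second threshold')
    return patches, positions
```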
Step 340: Determine lesion indication information of the pathological image, based on feature information extracted from the at least two second instance images.
The feature information extracted from the at least two second instance images can be used for representing pathological features of the at least two second instance images. In some embodiments, after a computer device acquires the above second instance images, lesion indication information of a pathological image is determined based on feature information extracted from the above at least two second instance images, where the lesion indication information is used for indicating the lesion region in the pathological image.
In summary, in the technical solution provided in the embodiments of this application, the pathological image is sampled to obtain the instance images, the feature information is extracted from the instance images, and the lesion region in the pathological image is automatically determined based on the feature information of the instance images, thereby reducing the consumption of human resources and saving costs required to determine the lesion region.
In addition, in the embodiments of this application, firstly, the candidate lesion region in the pathological image is determined based on the first sampling way, then the second instance image is acquired based on the second sampling way with a greater sampling overlap degree than that of the first sampling way, and based on the second instance image, the lesion region in the pathological image is determined from the candidate lesion region. That is, global processing is performed on the pathological image with the first instance image, and then local processing is performed on the pathological image with the second instance image. From the global processing to the local processing, the area of a region in the pathological image that needs to be sampled by the second sampling way is reduced, thereby reducing the number of the second instance images that need to be collected and feature information extracted, reducing computational resources required to determine the lesion region, and improving the efficiency of determining the lesion region.
Moreover, the candidate lesion region is more likely to include the lesion region than other regions, and has a relatively high signal-to-noise ratio. Thus, by executing the second sampling way only for the candidate lesion region, an information loss can be reduced and perception of a lesion region with a relatively small area can be enhanced, thereby determining the lesion region in the pathological image more accurately.
The above way to determine the candidate lesion region will be introduced below.
In some possible implementations, the above step 320 may further include the following steps (1 to 4):
1. Perform feature encoding on each first instance image to obtain first feature information corresponding to each first instance image.
In some embodiments, feature encoding is performed on each first instance image to obtain a feature vector corresponding to each first instance image, where the feature vector corresponding to the first instance image may be a 768-dimensional feature or a feature of other dimensions, which will not be specifically limited in the embodiments of this application.
In some embodiments, the first feature information is extracted from the first instance image via a backbone network, and the process of extracting the first feature information from the first instance image via the backbone network can be described as:

h_k = f(x_k)

where f represents the backbone network, x_k represents a kth first instance image, and h_k may be a 768-dimensional feature extracted from an instance.
2. Perform feature fusion on the first feature information corresponding to each first instance image to obtain global feature information of the pathological image.
In some embodiments, aggregation (i.e., feature fusion) is performed on the first feature information corresponding to each first instance image via a first classification network to obtain global feature information of the pathological image.
In some embodiments, the first feature information corresponding to each first instance image is processed by an attention mechanism to obtain a weight corresponding to each piece of first feature information; and weighted summation is performed on all pieces of first feature information according to the weight corresponding to each piece of first feature information to obtain global feature information of the pathological image.
In some embodiments, refer to the following formula for the process of obtaining the global feature information of the pathological image:

Logits(B) = c(g(h_1, h_2, . . . , h_K)), g(h_1, h_2, . . . , h_K) = Σ_k a_k h_k, a_k = exp{w^T tanh(V h_k^T)} / Σ_j exp{w^T tanh(V h_j^T)}

where Logits(B) is a global feature of the pathological image, c is a first classification network, and g is an attention mechanism-based aggregation network; and w ∈ R^{L×1}, V ∈ R^{L×M}.
In some embodiments, an attention module corresponding to the attention mechanism allocates a weight to each first instance image, and performs weighted summation thereon, and an obtained sum serves as a feature vector (i.e., global feature information) representing a package. The global feature information obtained by fusion can be fed into the first classification network to predict a classification label corresponding to each first instance image. The attention module can also be deemed as a binary classification network.
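For illustration, the following is a minimal PyTorch sketch of such an attention mechanism-based aggregation network; the feature dimension M = 768 follows the example above, while the hidden dimension L = 128 and the class name are assumptions.

```python
import torch
import torch.nn as nn

class AttentionAggregator(nn.Module):
    """Attention-based aggregation g: allocates a weight to each instance
    feature and returns their weighted sum as the bag-level (global) feature.
    Dimensions follow the formula above: V in R^{L x M}, w in R^{L x 1}."""
    def __init__(self, m: int = 768, l: int = 128):
        super().__init__()
        self.V = nn.Linear(m, l, bias=False)
        self.w = nn.Linear(l, 1, bias=False)

    def forward(self, h: torch.Tensor):
        # h: (K, M) instance features; scores: (K, 1) raw w^T tanh(V h_k^T)
        scores = self.w(torch.tanh(self.V(h)))
        a = torch.softmax(scores, dim=0)   # attention weight per instance
        z = (a * h).sum(dim=0)             # global feature information (M,)
        return z, scores.squeeze(-1)
```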
The classification networks (such as the first classification network, the second classification network, and the third classification network) involved in the embodiments of this application can be represented as:
where c_n(h_i) represents an nth classification network.
3. Determine a first predicted probability corresponding to each first instance image according to the global feature information and the first feature information corresponding to each first instance image, where the first predicted probability refers to a probability that the first instance image includes the lesion region.
In some embodiments, after the first feature information corresponding to each first instance image and the global feature information of the pathological image are obtained, a probability that each first instance image includes the lesion region can be determined via the first classification network according to the global feature information and the first feature information corresponding to each first instance image.
In some embodiments, the determining a first predicted probability corresponding to each first instance image according to the global feature information and the first feature information corresponding to each first instance image may include the following sub-steps (3.1 to 3.3):
3.1. Map the global feature information to generate a first probability coefficient, where the first probability coefficient refers to a probability that the pathological image includes the lesion region.
In some embodiments, the global feature information is mapped by Softmax to a probability distribution between 0 and 1. For example, a set (also referred to as a package) of all first instance images is defined as B, then B = {(x_0, y_0), (x_1, y_1), . . . , (x_k, y_k)}, where x_k and y_k represent a kth first instance image and a label thereof, respectively. Then a label Y(B) of B can be defined as:

Y(B) = 0, if Σ_k y_k = 0; Y(B) = 1, otherwise

where Y(B) = 0 represents the absence of a lesion region in a pathological image, and Y(B) = 1 represents the presence of the lesion region in the pathological image.
3.2. For each first instance image, map the first feature information corresponding to the first instance image to generate a second probability coefficient.
In some embodiments, the first feature information corresponding to each first instance image is mapped by Sigmoid to obtain the second probability coefficient corresponding to each first instance image, where the second probability coefficient refers to an initial probability that the first instance image includes the lesion region.
3.3. Determine a first predicted probability corresponding to the first instance image according to the first probability coefficient and the second probability coefficient.
In some embodiments, the first probability coefficient and the second probability coefficient are multiplied to obtain the first predicted probability corresponding to each first instance image. This process can be expressed as:

p(h_i) = softmax{c(h_i)} × sigmoid{w^T tanh(V h_i^T)}

where p(h_i) represents the first predicted probability of an ith first instance image, softmax{c(h_i)} represents the first probability coefficient, and sigmoid{w^T tanh(V h_i^T)} represents the second probability coefficient.
In some embodiments, the first predicted probability corresponding to the first instance image can also be expressed as:
By using the above method, the first probability coefficient and the second probability coefficient are multiplied to obtain the probability that the first instance image includes the lesion region. Then, a candidate lesion region can be determined according to whether the first predicted probability meets a first condition or not.
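For illustration, a minimal sketch of this multiplication is given below, reusing the raw attention scores produced by the aggregator sketched above; the two-class logit layout (index 1 denoting the presence of a lesion) is an assumption.

```python
import torch

def first_predicted_probability(bag_logits: torch.Tensor,
                                instance_scores: torch.Tensor) -> torch.Tensor:
    """p(h_i) = softmax{c(h_i)} * sigmoid{w^T tanh(V h_i^T)}: the bag-level
    lesion probability (first probability coefficient) scales each instance's
    initial probability (second probability coefficient).
    bag_logits: (2,) output of the first classification network;
    instance_scores: (K,) raw attention scores from the aggregator above."""
    first_coefficient = torch.softmax(bag_logits, dim=0)[1]   # P(lesion present)
    second_coefficient = torch.sigmoid(instance_scores)      # (K,) per instance
    return first_coefficient * second_coefficient            # (K,) p(h_i)
```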
4. Determine the candidate lesion region, based on a position of the first instance image corresponding to the first predicted probability that meets a first condition in the pathological image.
In some embodiments, after a first predicted probability corresponding to each first instance image is determined, the first instance image that meets a first condition is selected, and a candidate lesion region is determined based on this.
In some embodiments, the first condition may be that a first predicted probability corresponding to a first instance image is equal to or greater than a third threshold.
In some embodiments, the first condition may also be that, after the first instance images are sorted by their first predicted probabilities from largest to smallest, a first instance image ranks within the top x%, such as the top 2%. That is, the first instance images that meet the first condition refer to the x% of first instance images with the highest predicted probability of the presence of a lesion region. x may be 1, 2, 3 or the like, and a specific numerical value of x can be set by a related technical person according to the actual situation, which will not be specifically limited in the embodiments of this application.
In some embodiments, an image region composed of the first instance images that meet a first condition can be directly determined as a candidate lesion region, or the first instance images that meet the first condition and their surrounding regions can be determined as the candidate lesion region.
By determining a position of the first instance image that meets the first condition in a pathological image as the candidate lesion region, the range of the candidate lesion region is reduced, thereby reducing computational resources required to determine the lesion region and improving the efficiency of determining the lesion region.
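For illustration, a minimal sketch of selecting the first instance images that meet the first condition and deriving a candidate lesion region from their positions is given below; taking the top 2% and using a bounding box over the kept grid positions are assumptions consistent with the examples above.

```python
import numpy as np

def candidate_lesion_region(positions, probs, top_fraction=0.02, patch=224):
    """Keep the first instance images meeting the first condition (here the
    top 2% by first predicted probability, an assumed x) and take the bounding
    box of their grid positions as the candidate lesion region."""
    k = max(1, int(len(probs) * top_fraction))
    keep = np.argsort(probs)[::-1][:k]               # indices of top-k instances
    xs = np.array([positions[i][0] for i in keep])
    ys = np.array([positions[i][1] for i in keep])
    return xs.min(), ys.min(), xs.max() + patch, ys.max() + patch
```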
The above way to acquire the lesion indication information will be introduced below.
In a possible implementation, the above step 340 may further include the following steps (1 to 3):
By determining the lesion probability distribution information of the pathological image, the probability distribution of the lesion region in the pathological image is obtained, and thereby, the lesion region in the pathological image can be determined based on the lesion probability distribution information, which saves costs of determining the lesion region.
In some embodiments, the performing feature fusion on the second feature information corresponding to each second instance image to obtain local feature information of the pathological image for a candidate lesion region includes: processing the second feature information corresponding to each second instance image by an attention mechanism to obtain a weight corresponding to each piece of second feature information; and performing weighted summation on all pieces of second feature information according to the weight corresponding to each piece of second feature information to obtain the local feature information.
By performing the weighted summation on all the pieces of second feature information to obtain the local feature information, the local feature information obtained by fusion can be fed into a second classification network to predict a classification label corresponding to each second instance image. In some embodiments, the lesion indication information is obtained by a lesion region determination model, where the lesion region determination model includes an encoding network, a first classification network, a second classification network, and a third classification network; where:
Refer to the above embodiment for part of the content of each step in this implementation, which will not be repeated herein.
In some embodiments, sampling is performed by the second sampling way to obtain a second instance image, which can be expressed as:
where pu represents the first predicted probability corresponding to a first instance image, and represents the second instance image obtained by sampling by the second sampling way.
In some embodiments, after being spliced, the local feature information and the global feature information of the pathological image are inputted into the third classification network to obtain lesion probability distribution information of the pathological image. This process can be expressed as:
Logits(B) = c_3(concat(z_1, z_2))

where Logits(B) represents the lesion probability distribution information of the pathological image, c_3 represents the third classification network, z_1 represents the global feature information, and z_2 represents the local feature information.
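For illustration, a minimal PyTorch sketch of this splice-and-classify step is given below; the feature width of 768 per branch and the two-way output are assumptions.

```python
import torch
import torch.nn as nn

# z1 (global feature information) and z2 (local feature information) are
# spliced and fed to the third classification network c3.
c3 = nn.Linear(768 * 2, 2)

def lesion_probability_distribution(z1: torch.Tensor, z2: torch.Tensor):
    logits_b = c3(torch.cat([z1, z2], dim=-1))  # Logits(B)
    return torch.softmax(logits_b, dim=-1)
```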
In another possible implementation, the above step 340 may include the following steps (1 to 4):
In some embodiments, the second condition may be that the second predicted probability corresponding to the second instance image is equal to or greater than a fourth threshold. The second condition may also be that, after the second instance images are sorted by their second predicted probabilities from largest to smallest, a second instance image ranks within a top second proportion threshold, such as the top 15%.
In some embodiments, an image region composed of the second instance images that meet a second condition can be directly determined as the lesion region, or the second instance images that meet the second condition and their surrounding regions can be determined as the lesion region.
By determining a position of the second instance image that meets the second condition in a pathological image as lesion indication information, the lesion region in the pathological image is obtained, thereby reducing computational resources required to determine the lesion region and improving the efficiency of determining the lesion region.
In some embodiments, the determining a second predicted probability corresponding to each second instance image according to the local feature information and the second feature information corresponding to each second instance image further includes the following steps (3.1 to 3.3):
3.3. Determine a second predicted probability corresponding to the second instance image according to the third probability coefficient and the fourth probability coefficient.
In some embodiments, referring to the above embodiments, the local feature information is mapped by Softmax to generate the third probability coefficient; and the second feature information corresponding to the second instance image is mapped by Sigmoid to generate the fourth probability coefficient.
Refer to the above embodiment for part of the content of each step in this implementation, which will not be repeated herein.
In this implementation, after the candidate lesion region is determined, the lesion indication information can be predicted directly according to the second instance image, without a need for the global feature information or other information related to the first instance image, thereby simplifying a way to determine the lesion indication information, saving processing resources and time required to determine the lesion indication information, and improving the efficiency of determining the lesion indication information.
As shown in
Refer to
step 810: Acquire a training sample set, where the training sample set includes at least one sample pathological image.
In some embodiments, the sample pathological image is a pathological image in which a lesion region has been labeled.
Step 820: Sample the sample pathological image by a first sampling way and a second sampling way to obtain at least two first sample instances and at least two second sample instances corresponding to the sample pathological image, where an overlap degree between the second sample instances is greater than that between the first sample instances.
Step 830: Determine lesion probability distribution information of the sample pathological image, based on feature information extracted from the at least two first sample instances and feature information extracted from the at least two second sample instances, the lesion probability distribution information being used for indicating probability distribution of the lesion region in the sample pathological image.
In some embodiments, as shown in
Step 840: Train the lesion region determination model according to the lesion probability distribution information.
In some embodiments, as shown in
In some embodiments, the lesion region determination model is trained by a self-supervised model training method, such as a Moco V3 method. In theory, two branches of the lesion region determination model (i.e., a branch of processing a first instance image and a branch of processing a second instance image) can both acquire sufficient information to determine whether there is a lesion region in a pathological image or not. Therefore, a label (i.e., a label indicating the presence or absence of a lesion region in a pathological image) of the pathological image may serve as a label for final prediction or an auxiliary signal in a supervised model training process.
In some embodiments, in a training process of a lesion region determination model, a label of a sample pathological image may serve as both a training label and an auxiliary signal in a supervised training process of the lesion region determination model. In some embodiments, from the first or second sample instances, the k sample instances with a maximum probability of the presence of a lesion region are used as positive sample instances (also referred to as + sample instances), the k sample instances with a minimum probability of the presence of a lesion region are used as negative sample instances (also referred to as − sample instances), dummy labels are generated for them, and the predictions are then constrained with a cross-entropy loss. The losses of a lesion region determination model may be expressed as L_1 to L_4, where L_i represents a loss of an ith classification network, and L_1 to L_4 represent losses corresponding to the above first classification network, the above second classification network, the above third classification network, and the above encoding network, respectively.
A total loss of a lesion region determination model may be expressed as:

L_total = λ_1 L_1 + λ_2 L_2 + λ_3 L_3 + λ_4 L_4

where L_total represents the total loss of the lesion region determination model, and λ_1 to λ_4 represent the coefficients corresponding to the four losses L_1 to L_4.
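For illustration, a minimal PyTorch sketch of the dummy-label constraint and the weighted total loss is given below; the value of k, the two-class logit layout, and the function names are assumptions rather than details given in the embodiments.

```python
import torch
import torch.nn.functional as F

def dummy_label_loss(instance_logits: torch.Tensor, k: int = 8) -> torch.Tensor:
    """Constrain a classification branch with dummy labels: the k sample
    instances with the highest predicted probability of a lesion are treated
    as + sample instances and the k lowest as - sample instances, then a
    cross-entropy loss is applied over the selected instances."""
    probs = torch.softmax(instance_logits, dim=-1)[:, 1]   # P(lesion) per instance
    order = torch.argsort(probs, descending=True)
    selected = torch.cat([order[:k], order[-k:]])
    dummy = torch.cat([torch.ones(k, dtype=torch.long),
                       torch.zeros(k, dtype=torch.long)]).to(instance_logits.device)
    return F.cross_entropy(instance_logits[selected], dummy)

def total_loss(losses, lambdas):
    """L_total = sum_i lambda_i * L_i over the four branch losses."""
    return sum(lam * l for lam, l in zip(lambdas, losses))
```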
In summary, in the technical solution provided in the embodiments of this application, the global processing is performed on the pathological image with the first sample instance, thereby quickly determining the candidate lesion region. Then, the key region (i.e., the candidate lesion region) of the pathological image is processed in a more targeted manner with the second sample instance. The candidate lesion region is more likely to include the lesion region than other regions, and has a relatively high signal-to-noise ratio. Thus, by only processing the candidate lesion region in a second stage, an information loss can be reduced and perception of a lesion region with a relatively small area can be enhanced, so that the trained lesion region determination model can determine the lesion region in the pathological image more accurately, thereby improving the accuracy and precision of the lesion region determination model.
The method for training the lesion region determination model and the method for determining the lesion region provided in the embodiments of this application are a model training process and a model use process corresponding to each other. For details that are not explained in detail on one side, refer to the introduction on the other side.
Taking the prostate as an example, through comparative experiments and ablation experiments on a prostate dataset and its difficult sample subset, it can be proven that the technical solution provided in the embodiments of this application has superiority and a good visualization effect. As shown in
As shown in Table 1 below, on the test sets used in the comparative experiments, the indicator data of the technical solution provided in the embodiments of this application were significantly superior in all aspects to those of the comparative cases. It can be seen that, compared to the comparative cases, the technical solution provided in the embodiments of this application has higher accuracy and precision in determining the lesion region in the pathological image.
The following describes apparatus embodiments of this application, which can be used for executing the method embodiments of this application. For details not disclosed in the apparatus embodiments of this application, refer to the method embodiments of this application.
Refer to
The first image acquisition module 1110 is configured to sample a pathological image by a first sampling way to obtain at least two first instance images.
The lesion region determination module 1120 is configured to determine a candidate lesion region in the pathological image, based on feature information extracted from the at least two first instance images.
The second image acquisition module 1130 is configured to sample the candidate lesion region by the second sampling way to obtain at least two second instance images, where an overlap degree between the second instance images is greater than that between the first instance images.
The lesion information determination module 1140 is configured to determine lesion indication information of the pathological image, based on feature information extracted from the at least two second instance images, where the lesion indication information is used for indicating the lesion region in the pathological image.
In some embodiments, as shown in
The feature encoding sub-module 1121 is configured to perform feature encoding on each first instance image to obtain first feature information corresponding to each first instance image.
The feature fusion sub-module 1122 is configured to perform feature fusion on the first feature information corresponding to each first instance image to obtain global feature information of the pathological image.
The probability determination sub-module 1123 is configured to determine a first predicted probability corresponding to each first instance image according to the global feature information and the first feature information corresponding to each first instance image, where the first predicted probability refers to a probability that the first instance image includes the lesion region.
The region determination sub-module 1124 is configured to determine a candidate lesion region, based on a position of the first instance image corresponding to the first predicted probability that meets a first condition in the pathological image.
In some embodiments, as shown in
In some embodiments, as shown in
In some embodiments, as shown in
The feature encoding sub-module 1121 is further configured to perform feature encoding on each second instance image to obtain second feature information corresponding to each second instance image.
The feature fusion sub-module 1122 is further configured to perform feature fusion on the second feature information corresponding to each second instance image to obtain local feature information of the pathological image for the candidate lesion region.
The probability determination sub-module 1141 is configured to determine lesion probability distribution information of the pathological image, based on the local feature information, and global feature information of the pathological image, where the lesion probability distribution information is used for indicating probability distribution of the lesion region in the pathological image; and where the lesion indication information includes the lesion probability distribution information.
In some embodiments, as shown in
In some embodiments, as shown in
The feature encoding sub-module 1121 is further configured to perform feature encoding on each second instance image to obtain second feature information corresponding to each second instance image.
The feature fusion sub-module 1122 is further configured to perform feature fusion on the second feature information corresponding to each second instance image to obtain local feature information of the pathological image for the candidate lesion region.
The probability determination sub-module 1141 is configured to determine a second predicted probability corresponding to each second instance image according to the local feature information and the second feature information corresponding to each second instance image, where the second predicted probability refers to a probability that the second instance image includes the lesion region.
The lesion information determination sub-module 1142 is configured to determine lesion indication information of the pathological image, based on a position of the second instance image corresponding to the second predicted probability that meets a second condition in the pathological image.
In some embodiments, as shown in
In some embodiments, the first image acquisition module 1110 is configured to:
In some embodiments, the second image acquisition module 1130 is configured to:
In some embodiments, the lesion indication information is obtained by a lesion region determination model, where the lesion region determination model includes an encoding network, a first classification network, a second classification network, and a third classification network; where
In summary, in the technical solution provided in the embodiments of this application, the pathological image is sampled to obtain the instance images, the feature information is extracted from the instance images, and the lesion region in the pathological image is automatically determined based on the feature information of the instance images, thereby reducing the consumption of human resources and saving costs required to determine the lesion region.
In addition, in the embodiments of this application, firstly, the candidate lesion region in the pathological image is determined based on the first sampling way, then the second instance image is acquired based on the second sampling way with a greater sampling overlap degree than that of the first sampling way, and based on the second instance image, the lesion region in the pathological image is determined from the candidate lesion region. Thus the area of a region in the pathological image that needs to be sampled by the second sampling way is reduced, thereby reducing the number of the second instance images that need to be collected and feature information extracted, reducing computational resources required to determine the lesion region, and improving the efficiency of determining the lesion region.
Moreover, the candidate lesion region is more likely to include the lesion region than other regions, and has a relatively high signal-to-noise ratio. Thus, by executing the second sampling way only for the candidate lesion region, an information loss can be reduced and perception of a lesion region with a relatively small area can be enhanced, thereby determining the lesion region in the pathological image more accurately.
Refer to
The sample acquisition module 1310 is configured to acquire a training sample set, where the training sample set includes at least one sample pathological image.
The instance acquisition module 1320 is configured to sample the sample pathological image by a first sampling way and a second sampling way to obtain at least two first sample instances and at least two second sample instances corresponding to the sample pathological image, where an overlap degree between the second sample instances is greater than that between the first sample instances.
The information acquisition module 1330 is configured to determine lesion probability distribution information of the sample pathological image, based on feature information extracted from the at least two first sample instances and feature information extracted from the at least two second sample instances, the lesion probability distribution information being used for indicating probability distribution of the lesion region in the sample pathological image.
The model training module 1340 is configured to train the lesion region determination model according to the lesion probability distribution information.
In some embodiments, the lesion region determination model includes an encoding network, a first classification network, a second classification network, and a third classification network. The information acquisition module 1330 is configured to:
In some embodiments, the model training module 1340 is configured to:
In summary, in the technical solution provided in the embodiments of this application, the global processing is performed on the pathological image with the first sample instance, thereby quickly determining the candidate lesion region. Then, the key region (i.e., the candidate lesion region) of the pathological image is processed in a more targeted manner with the second sample instance. The candidate lesion region is more likely to include the lesion region than other regions, and has a relatively high signal-to-noise ratio. Thus, by only processing the candidate lesion region in a second stage, an information loss can be reduced and perception of a lesion region with a relatively small area can be enhanced, so that the trained lesion region determination model can determine the lesion region in the pathological image more accurately, thereby improving the accuracy and precision of the lesion region determination model.
When the apparatus provided in the above embodiments implements its functions, the division into the above functional modules is merely used as an example for illustration. In practical applications, the above functions may be allocated to and completed by different functional modules according to requirements; that is, the internal structure of the apparatus is divided into different functional modules to complete all or some of the functions described above. In addition, the apparatus provided in the above embodiments and the method embodiments fall within the same conception; for details of the specific implementation process, refer to the method embodiments, which are not repeated herein.
Refer to FIG. 14, which shows a schematic structural diagram of a computer device 1400 according to an embodiment of this application. The computer device 1400 includes a central processing unit (CPU) 1401, a system memory 1404, a system bus 1405 connecting the system memory 1404 to the CPU 1401, a basic input/output (I/O) system 1406, and a mass storage device 1407 configured to store an operating system, application programs, and other program modules.
The basic I/O system 1406 includes a display 1408 configured to display information and an input device 1409, such as a mouse or a keyboard, configured for a user to input information. The display 1408 and the input device 1409 are both connected to the CPU 1401 via an input/output controller 1410 connected to the system bus 1405. The input/output controller 1410 may further receive and process inputs from a plurality of other devices, such as a keyboard, a mouse, and an electronic stylus. Similarly, the input/output controller 1410 also provides output to a display screen, a printer, or other types of output devices.
The mass storage device 1407 is connected to the CPU 1401 via a mass storage controller (not shown) connected to the system bus 1405. The mass storage device 1407 and its associated computer-readable medium provide non-volatile storage for the computer device 1400. That is, the mass storage device 1407 may include a computer-readable medium (not shown) such as a hard disk or a compact disc read-only memory (CD-ROM) drive.
Without loss of generality, the computer-readable medium may include a computer storage medium and a communication medium. The computer storage medium includes volatile and non-volatile media, and removable and non-removable media, implemented with any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. The computer storage medium includes a RAM, a ROM, an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), a flash memory or other solid-state storage devices, a CD-ROM, a digital video disc (DVD) or other optical memories, a tape cartridge, a magnetic cassette, a magnetic disk memory, or other magnetic storage devices. Of course, a person skilled in the art knows that the computer storage medium is not limited to the above types. The above system memory 1404 and the above mass storage device 1407 may be collectively referred to as a memory.
According to the various embodiments of this application, the computer device 1400 may further be connected, through a network such as the Internet, to a remote computer on the network for operation. That is, the computer device 1400 may be connected to a network 1412 by using a network interface unit 1411 connected to the system bus 1405, or may be connected to another type of network or a remote computer system (not shown) by using the network interface unit 1411.
The memory further stores a computer program, and the computer program is configured to be executed by one or more processors to implement the above method for determining a lesion region or the above method for training a lesion region determination model.
In an exemplary embodiment, a non-transitory computer-readable storage medium is further provided, where the storage medium stores a computer program therein, and the computer program, when executed by a processor, implements the above method for determining the lesion region or the above method for training the lesion region determination model.
In some embodiments, the computer-readable storage medium may include: a read-only memory (ROM), a random access memory (RAM), a solid state drive (SSD), and an optical disc. The RAM may include a resistive random access memory (ReRAM) and a dynamic random access memory (DRAM).
In an exemplary embodiment, a computer program product is further provided, where the computer program product includes a computer program, and the computer program is stored in a computer-readable storage medium. A processor of the computer device reads the computer program from the computer-readable storage medium, and the processor executes the computer program such that the computer device executes the above method for determining the lesion region or executes the above method for training the lesion region determination model.
“A plurality of” mentioned herein refers to two or more. “And/or” describes an association relationship between associated objects and indicates that three relationships may exist. For example, “A and/or B” may represent the following three cases: only A exists, both A and B exist, and only B exists. The character “/” generally indicates an “or” relationship between the associated objects. In addition, the step numbers described herein merely show, by way of example, a possible execution sequence of the steps. In some other embodiments, the above steps may not be performed according to the number sequence; for example, two steps with different numbers may be performed simultaneously, or in a sequence contrary to that shown in the figure. This is not limited in the embodiments of this application. The term “module” in this application refers to a computer program, or part of a computer program, that has a predefined function and works together with other related parts to achieve a predefined goal, and may be implemented in whole or in part by software, hardware (e.g., processing circuitry and/or memory configured to perform the predefined functions), or a combination thereof. Each module can be implemented using one or more processors (or processors and memory); likewise, a processor (or processors and memory) can be used to implement one or more modules. Moreover, each module can be part of an overall module that includes the functionalities of the module.
The above descriptions are merely exemplary embodiments of this application, and are not intended to limit this application. Any modification, equivalent replacement, improvement, or the like made within the spirit and principle of this application shall fall within the protection scope of this application.
Number | Date | Country | Kind
---|---|---|---
202211056712.6 | Aug 2022 | CN | national
This application is a continuation application of PCT Patent Application No. PCT/CN2023/102592, entitled “IMAGE ENCODER TRAINING METHOD AND APPARATUS, DEVICE, AND MEDIUM” filed on Jun. 27, 2023, which claims priority to Chinese Patent Application No. 202211056712.6, entitled “METHOD FOR DETERMINING LESION REGION, AND MODEL TRAINING METHOD AND APPARATUS FOR PATHOLOGICAL IMAGE” filed on Aug. 31, 2022, the entire content of which is incorporated herein by reference.
 | Number | Date | Country
---|---|---|---
Parent | PCT/CN2023/102592 | Jun 2023 | WO
Child | 18641184 | | US