The following relates generally to the medical reporting arts, radiology arts, radiology reporting arts, radiology examination reading arts, histopathology reporting arts, and related arts.
Medical reports, such as radiology reports, histopathology reports, and the like, provide written summaries of clinical findings drawn from medical images such as magnetic resonance imaging (MRI) images, computed tomography (CT) images, positron emission tomography (PET) images, or the like in the case of radiology, or microscope images in the case of histopathology. Interpretation of image features in such medical disciplines to determine actionable clinical findings (sometimes referred to as “reading” a medical examination) is a medical specialty, and the reports are typically written by domain experts, e.g. radiologists in the radiology field, or pathologists in the histopathology field. Much of the medical report is usually written in freeform text in order to enable the domain expert to capture clinical findings with a high degree of precision. Skilled radiologists and pathologists are in high demand, and often maintain heavy workloads. This can impose time constraints on performing readings of medical examinations. For example, in the clinical radiology field, some medical institutions employ a metric such as Relative Value Unit (RVU), where (as a non-limiting example) a CT reading may be assigned 4 RVU points, an MRI reading 8 RVU points, an x-ray reading 1 RVU point, and so forth. The radiologist is expected to perform readings with a certain number of total RVU points per work shift.
These time constraints, if too severe, can adversely impact accuracy of radiology readings. Hence, there is interest in developing computer aided diagnostic (CADx) tools for providing assistive automated image interpretation. For example, a CADx tool can be trained to detect tumors or lesions in a radiology image. A CADx tool can also be used for other purposes. For example, if combined with an automated tool trained to extract clinical findings from a radiology report, the CADx tool and report findings extractor can be used in tandem to assess completeness and accuracy of radiology reports.
Training of machine learning (ML) models and deep learning (DL) models for use in analyzing medical images and/or medical reports usually requires a large amount of labeled data, which can be expensive to acquire (especially for radiology data). For example, training a CADx tool to detect whether a certain finding is present in an image usually requires a large number of labeled training medical images, with each image labeled by a domain expert (e.g., skilled radiologist) as to whether the finding is present. Likewise, training a report findings extractor usually requires a large corpus of labeled medical reports which are labeled with the findings contained in those reports. There exists a large amount of unstructured and free-text radiologic reports, which can be particularly valuable for the development of artificial intelligence techniques. However, most of the radiologic reports cannot be directly used for ML or DL model development for two main reasons. First, the labels for clinical findings need to be extracted from the reports. Second, the reports are not structured, and can have various styles depending on the radiologists. The unstructured and free-text nature of the reports makes it a challenging task to extract the labels accurately. The variability of the writing style of the various reports may also hinder effective analysis of the reports.
Regarding the latter issue of variable writing style, the main strategy for reducing the impact of style is structured reporting, which has been advocated to standardize the style of the report to improve reporting quality and data mining. However, structured reporting may interrupt the case viewing workflow. A structured report may also include unnecessary or redundant information, which reduces the clarity of the report. The rigid nature of structured reporting may also reduce the productivity of radiologists, and/or may limit the ability of the radiologist to convey subtleties of his or her clinical findings.
Recently developed automatic report annotation algorithms attempt to find better ways to extract labels from the free-text radiology reports directly. These algorithms can be rule-based algorithms (e.g., CheXpert Labeller), or DL-based algorithms (e.g., Amazon Comprehend Medical and Tie-Net). Particularly, the Tie-Net algorithm utilizes both the chest x-ray images and the reports to perform a multi-modal prediction and showed promising results for report annotation and label extraction of critical findings for chest x-rays. Compared to conventional algorithms, the better performance of DL algorithms for image and report annotation is attributed to the deep and complex features extracted through convolutional neural networks (CNNs).
Using extracted features from radiologic data, there have been a few algorithms proposed for automatic generation of radiology reports, typically based on Recurrent Neural Networks (RNNs). Particularly, hierarchical Long Short-Term Memory (LSTM), with one sentence LSTM predicting the topic of a sentence and a word LSTM generating the words for that sentence, has shown great potential in generating high quality radiology reports.
The following discloses certain improvements to overcome these problems and others.
In one aspect, a non-transitory computer readable medium stores instructions executable by at least one electronic processor to perform a method of analyzing a medical report presenting clinical content determined from one or more images. The method includes: extracting a text embedding from the medical report; extracting an image embedding from the one or more images; determining one or more content feature vectors from the text embedding and the image embedding, the one or more content feature vectors being indicative of clinical content presented in the medical report; determining one or more style feature vectors from the text embedding, the one or more style feature vectors being indicative of a style of the medical report; and at least one of: extracting one or more clinical findings contained in the medical report using the one or more content feature vectors; scoring the style of the medical report using the one or more style feature vectors; and/or converting the medical report to a target style using the one or more content feature vectors and one or more target style feature vectors different from the determined one or more style feature vectors.
In another aspect, an apparatus includes at least one electronic processor programmed to: extract a text embedding from a medical report; extract an image embedding from one or more images; determine one or more content feature vectors from the text embedding and the image embedding, the one or more content feature vectors being indicative of clinical content presented in the medical report; determine one or more style feature vectors from the text embedding, the one or more style feature vectors being indicative of a style of the medical report; and extract one or more clinical findings contained in the medical report using the one or more content feature vectors.
In another aspect, a method of analyzing a medical report presenting clinical content determined from one or more images includes: extracting a text embedding from the medical report; extracting an image embedding from the one or more images; determining one or more content feature vectors from the text embedding and the image embedding, the one or more content feature vectors being indicative of clinical content presented in the medical report; determining one or more style feature vectors from the text embedding, the one or more style feature vectors being indicative of a style of the medical report; and converting the medical report to a target style using the one or more content feature vectors and one or more target style feature vectors different from the determined one or more style feature vectors.
One advantage resides in providing more focused automated analyses of medical reports by separating content components and style components of the medical reports.
Another advantage resides in increasing medical report annotation accuracy, medical report effectiveness, and medical report clarity.
Another advantage resides in generating a medical report template for radiologists, pathologists, or other domain experts to use in preparing future medical reports.
Another advantage resides in learning a disentangled feature representation of the content and style components of the radiology report.
A given embodiment may provide none, one, two, more, or all of the foregoing advantages, and/or may provide other advantages as will become apparent to one of ordinary skill in the art upon reading and understanding the present disclosure.
The disclosure may take form in various components and arrangements of components, and in various steps and arrangements of steps. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the disclosure.
The following relates to improvements in automated analyses of the content of medical reports, with radiology reports being described as an illustrative example of medical reports. For example, a radiology report can be analyzed to identify clinical findings, with the results being used to assess report completeness (e.g., identify missed or erroneous findings) or to generate image labels to label training data for use in training artificial intelligence (AI) image analyzers (e.g., CADx finding detectors).
A radiologist's style in writing a report can adversely impact these tasks. That is, different radiologists may articulate the same clinical finding using different wordings, placed in different sections of the report, with different modifier words, and so forth. This style variability can adversely impact automated analyses of radiology report content. Differences in style can be reduced by using a structured radiology reporting form, but radiologists may not like being constrained to a predefined report structure.
As recognized herein, style features and content features should be independent. In particular, the content features should be the same for a given image set and should be independent of which radiologist prepared the report; whereas the style features should be the same for all reports written by a particular radiologist and should be independent of the content of those reports.
Based on the foregoing, feature disentanglement is performed to isolate a content feature vector and a style feature vector for each report. First, a report is processed to generate a text embedding. Corresponding images (that is, images that were reviewed by the radiologist, pathologist, or other domain expert and which are described or analyzed in the medical report) are processed to generate an image embedding. The text embedding can be done using a word embedding model, while the image embedding may be done using a Convolutional Neural Network (CNN), for example.
In some embodiments disclosed herein, a content encoder is trained on a training set consisting of both radiology report embeddings and the corresponding image embeddings, while a style encoder is trained on a training set consisting of only the report embeddings. Using the image embeddings in training the content encoder is expected to stabilize it. On the other hand, the style encoder is trained on only the report embeddings, because the style should be independent of the images which contain only content information. The content encoder and style encoder are typically artificial neural networks.
To ensure that the content encoder produces feature vectors that contain only content information (and not style information), in an illustrative embodiment the training includes simultaneously training the content encoder and a findings annotator by inputting the content feature vectors output by the content encoder to the findings annotator, which then outputs clinical finding label vectors (e.g., binary labels in which each vector element corresponds to a finding and has a “1” or “0” value; or probability labels in which each vector element stores a probability of the corresponding finding). The training data are labeled with ground truth finding values (e.g., annotated by an expert radiologist), and the difference between the clinical finding vectors output by the findings annotator and the corresponding ground truth finding vectors is fed back as an error to the content encoder training.
To ensure that the style encoder produces feature vectors that contain only style information (and not content information), in an illustrative embodiment the training includes simultaneously training the style encoder and a report generator. The report generator is an artificial neural network that is trained to receive a content feature vector and a style feature vector and to output a text embedding. By running the report generator on content/style vector combinations that are swapped, it can be assessed as to whether the style feature vector is independent of the content.
In some embodiments disclosed herein, the trained content encoder can be applied to an input radiology report and associated images to generate a content feature vector that is then input to the co-trained findings annotator to extract the findings contained in the report. This can be used in various ways, such as for assessing report quality in terms of the actual report content.
In other embodiments disclosed herein, the trained style encoder can be applied to an input radiology report to generate a style feature vector. A satisfaction predictor trained on a training set of such style feature vectors that are labeled with feedback assessments by receiving physicians can then be applied to the style feature vector to predict whether receiving physicians are likely to be satisfied with the style of the report.
In some embodiments disclosed herein, the trained content and style encoders can be used along with the report generator to transform a radiology report from one style to another style. For example, a “standard” style can be generated by running a number of reports that received positive feedback from receiving physicians through the style encoder, and taking an average of the resulting style feature vectors. Then, when a new report is received, it is processed by the content encoder to generate a corresponding content feature vector. The report generator then receives that corresponding content feature vector along with the standard style feature vector to reconstruct a report with the content of the new report but with the standard style.
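The averaging step described above can be sketched as follows. This is a minimal illustration only: the function name `mean_style_vector` and the three-element example vectors are hypothetical, and a plain element-wise mean is assumed as the averaging operation.

```python
from typing import List, Sequence


def mean_style_vector(style_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise average of equal-length style feature vectors."""
    if not style_vectors:
        raise ValueError("at least one style vector is required")
    dim = len(style_vectors[0])
    if any(len(v) != dim for v in style_vectors):
        raise ValueError("all style vectors must have the same length")
    n = len(style_vectors)
    return [sum(v[k] for v in style_vectors) / n for k in range(dim)]


# Hypothetical style vectors, as if produced by the style encoder for three
# reports that received positive feedback from receiving physicians.
well_received = [[0.2, 0.8, 0.5], [0.4, 0.6, 0.5], [0.3, 0.7, 0.5]]
standard_style = mean_style_vector(well_received)
```

The resulting `standard_style` would then play the role of the standard style feature vector supplied to the report generator together with the content feature vector of a new report.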
While the following focuses on radiology reports, a similar approach can be used in conjunction with other types of reports that analyze image data, such as histopathology reports.
With reference to
The at least one electronic processor 20 is configured as described above to perform a method or process 100 of analyzing a medical report 34 presenting clinical content determined from one or more images 38. The non-transitory storage medium 26 stores instructions which are readable and executable by the at least one electronic processor 20 to perform disclosed operations including performing the method or process 100. In some examples, the method 100 may be performed at least in part by cloud processing.
With reference to
At an operation 106, one or more content feature vectors 107 are determined from the text embedding and the image embedding. The one or more content feature vectors 107 are indicative of clinical content presented in the medical report 34. At an operation 108, one or more style feature vectors 109 are determined from the text embedding only, and not using the image embedding. The one or more style feature vectors 109 are indicative of a style of the medical report 34. The one or more content feature vectors 107 are not indicative of the style of the medical report 34, and the one or more style feature vectors 109 are not indicative of the clinical content presented in the medical report. The operations 106 and 108 can be performed in either order, or performed concurrently. It is also noted that the term “one or more feature vectors” as used herein is intended to encompass any data structure that stores information elements, e.g. the one or more feature vectors may be a single vector, multiple vectors, a concatenation of vectors, an array having rows and columns whose elements represent elements of a single vector or of multiple vectors, and/or so forth. Such a data structure may optionally be encrypted during storage, e.g., to ensure security of patient medical information, and optionally may be encrypted during use (e.g., if homomorphic encryption is used).
In some embodiments, the content feature vector(s) and/or the style feature vector(s) can be used in training operations. In one example embodiment, a content encoder 44 and a clinical findings annotator 46 can be implemented in the at least one electronic processor 20 (see
To co-train the content encoder 44 and the findings annotator 46, the text embedding(s) and the corresponding image embedding(s) are input to the content encoder 44, and the content feature vector 107 output by the content encoder 44 is then input to the findings annotator 46. The parameters (e.g., NN weights and activation functions) of the content encoder 44 and the findings annotator 46 are optimized using backpropagation and/or other NN training techniques so that the findings annotator 46 outputs findings that optimally match the finding labels of the training data, which serve as ground truth values.
In the illustrative embodiment, a style encoder 48 and a report generator 50 can be implemented in the at least one electronic processor 20. The style encoder 48, which can be a NN, is used in determining the one or more style feature vectors 109, and the report generator 50 is configured to generate the medical reports 34. The style encoder 48 and the report generator 50 are co-trained using training text embeddings of training medical reports 34 presenting clinical content determined from corresponding training images 38.
To co-train the style encoder 48 and the report generator 50, the text embedding (but not the image embedding) is input to the style encoder 48 to generate the style feature vector(s) 109. Content feature vector(s) 107 generated by the content encoder 44 for other reports are used with the style feature vector(s) 109 as inputs to the report generator 50. Text embeddings are output for these various content/style combinations by the report generator 50. A determination is made as to whether the one or more style feature vectors are independent of the one or more content feature vectors.
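The pairing of content feature vectors from one report with style feature vectors from another can be sketched as below. This is only an illustration of how swapped combinations might be enumerated; the function name `swapped_combinations` and the toy vectors are hypothetical, and the report generator itself is not modeled.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def swapped_combinations(
    contents: Sequence[Vector], styles: Sequence[Vector]
) -> List[Tuple[int, int, Vector, Vector]]:
    """Pair the content vector of report i with the style vector of report j, for all i != j."""
    if len(contents) != len(styles):
        raise ValueError("expected one content vector and one style vector per report")
    return [
        (i, j, contents[i], styles[j])
        for i in range(len(contents))
        for j in range(len(styles))
        if i != j
    ]


# Hypothetical content and style vectors for two reports.
contents = [[1.0, 0.0], [0.0, 1.0]]
styles = [[0.5], [0.9]]
combos = swapped_combinations(contents, styles)
```

Each tuple identifies which report supplied the content and which supplied the style, so the text embedding generated for that combination can be compared against reports of the same style.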
With reference back to
The apparatus 10 utilizes data from both the images 38 and the reports 34 as input. It is not required to have the same case reviewed by different radiologists, but it is assumed that the images 38 are of the same kind (e.g., modality and/or anatomy imaged, such as chest x-rays). Labels can be required, which are suitably M×1 binary vectors, to indicate whether a finding/concept is present in the report 34. “M” is the number of finding/concept categories. For image data, image embeddings 52 can be extracted using an image embedding Convolutional Neural Network (CNN) 42. For report data, text embeddings 54 can be obtained using the pre-trained word embedding models 40.
The image and text embeddings 52, 54 are input to the content encoder 44 to generate the content feature vectors 107, while the text embeddings 54 alone are input to the style encoder 48 to generate the style feature vectors 109. The joint use of image and report data for content feature extraction improves the reliability of the content and stabilizes the model training. These encoders 44, 48 are suitably NNs which map the embeddings to two vectors/features: a content feature and a style feature. The content encoder 44 is, in one illustrative example, composed of a CNN which extracts features from image data (here, the image embeddings 52), and a Transformer-based neural network which extracts features from text data (here, the text embeddings 54). The style encoder 48, in one illustrative embodiment, contains a Transformer-based neural network only.
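As an illustration of the M×1 binary label vectors, the following sketch maps a set of findings to a label vector. The category names and the function name `findings_to_label_vector` are hypothetical examples and are not taken from the disclosure.

```python
from typing import Iterable, List

# Hypothetical finding/concept categories; M is the length of this list.
FINDING_CATEGORIES = ["cardiomegaly", "edema", "pleural effusion", "pneumothorax"]


def findings_to_label_vector(
    present: Iterable[str], categories: List[str] = FINDING_CATEGORIES
) -> List[int]:
    """Return a binary vector with 1 where the category is present in the report."""
    present_set = set(present)
    unknown = present_set - set(categories)
    if unknown:
        raise ValueError(f"unknown finding categories: {sorted(unknown)}")
    return [1 if c in present_set else 0 for c in categories]


# A report annotated as containing edema and pneumothorax.
label = findings_to_label_vector(["edema", "pneumothorax"])
```

Here M equals 4, and each position of `label` corresponds to one finding/concept category.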
The content feature vectors 107 and style feature vectors 109 can be input to the report generator 50 (and optionally to an image generator 56) to recover the image and text embeddings 52, 54. Similar to the encoders 44, 48, the image generator 56 is a CNN and the input thereto is the content feature, while the report generator 50 is a Transformer-based neural network and the input thereto is the content features and style features. Next, the image embeddings 52 can be input to another CNN to reconstruct the images 38R with the original dimensions, and the text embeddings 54 can be fed into a look-up-table or another Transformer-based neural network to generate updated reports 34u. (In an alternative embodiment, the image generator 56 is omitted, and the original images 38 are retained rather than reconstructed. However, reconstructing the images using the image generator 56 can provide a check on performance of the image embedding process.)
where M represents the number of finding types, y_{i,c} is an indicator variable which equals 1 if sample i belongs to class c, p_{i,c} is the predicted probability that sample i belongs to class c, and N is the total number of samples in the training set.
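One loss consistent with the symbols defined above is the standard cross-entropy loss, sketched below. The averaging over N and the symbol name chosen for the loss are assumptions; the disclosure may normalize or name the loss differently.

```latex
\mathcal{L}_{\text{findings}}
  = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{M} y_{i,c} \, \log p_{i,c}
```

Under this form, the loss is zero when every sample is assigned probability 1 for its true class, and grows as the predicted probabilities for the true classes decrease.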
With continuing reference to
With continuing reference to
These are merely illustrative applications. In another example application (not shown), the style features of two reports 34 can also be used to compute a quantitative distance to reflect the similarity of two reports. In another example application (not shown), by comparing with a standard report style, a report outlier detection can be achieved. This can also be used for training radiology residents on report writing.
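The style distance and outlier detection applications can be sketched as follows. The choice of Euclidean distance, the fixed threshold, and the names `style_distance` and `is_style_outlier` are assumptions made for illustration; the disclosure does not specify a particular distance measure.

```python
import math
from typing import Sequence


def style_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two style feature vectors (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("style vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_style_outlier(
    style: Sequence[float], standard_style: Sequence[float], threshold: float
) -> bool:
    """Flag a report whose style lies farther than `threshold` from the standard style."""
    return style_distance(style, standard_style) > threshold


# Hypothetical style vectors.
standard = [0.0, 0.0, 1.0]
typical = [0.0, 0.0, 1.0]
unusual = [3.0, 4.0, 1.0]

distance = style_distance(unusual, standard)
flagged = is_style_outlier(unusual, standard, threshold=2.0)
```

A smaller distance indicates more similar reporting styles; in practice the threshold would need to be chosen from the distribution of distances observed in a reference set of reports.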
To train the style encoder 48 and the report generator 50, an adversarial loss algorithm can be used. Specifically, for a given pair of training samples, with one report 34 from, for example, radiologist A and another report 34 from, for example, radiologist B, their extracted style features can be swapped, and the reconstructed text embeddings should be indistinguishable from the original reports with the same style (but different content) according to Equation 2:
where D_i (i=1 or 2) is a discriminator for a report (r_i) written by radiologist i, and c_i and s_i are the content and style features extracted from that report.
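One adversarial loss consistent with these symbols follows the standard generative adversarial network objective, sketched below. This is an assumption rather than a reproduction of Equation 2: G is a hypothetical symbol for the report generator 50, and the log-likelihood form and the expectation notation are borrowed from the usual GAN formulation.

```latex
\mathcal{L}_a
  = \mathbb{E}\left[\log D_1(r_1)\right]
  + \mathbb{E}\left[\log\left(1 - D_1\big(G(c_2, s_1)\big)\right)\right]
  + \mathbb{E}\left[\log D_2(r_2)\right]
  + \mathbb{E}\left[\log\left(1 - D_2\big(G(c_1, s_2)\big)\right)\right]
```

In this sketch, G(c_2, s_1) is a generated report embedding carrying the content of the second report in the style of the first radiologist, and D_1 attempts to tell it apart from genuine reports r_1 written in that style.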
To train the report generator 50, a combination of reconstruction loss and adversarial loss can be used according to Equation 3:
where R is a loss function which evaluates the distance between the original (E_report) and reconstructed (Ê_report) text embeddings. Examples of R include the L1 norm and the L2 norm. L_a is the adversarial loss shown in Equation 2, and λ_a is a weighting constant. The image generator network can be trained using a loss function similar to that shown in Equation 3.
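The combination of reconstruction loss and adversarial loss can be sketched as a weighted sum, R plus λ_a times L_a. This additive form is an assumption based on the description of λ_a as a weighting constant; the function names and numeric values below are hypothetical.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def l1_distance(a: Vector, b: Vector) -> float:
    """L1 norm of the difference between two embeddings."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l2_distance(a: Vector, b: Vector) -> float:
    """L2 norm of the difference between two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def generator_loss(
    original: Vector,
    reconstructed: Vector,
    adversarial_loss: float,
    lambda_a: float,
    distance: Callable[[Vector, Vector], float] = l1_distance,
) -> float:
    """Reconstruction distance plus weighted adversarial loss (assumed additive form)."""
    if len(original) != len(reconstructed):
        raise ValueError("embeddings must have the same length")
    return distance(original, reconstructed) + lambda_a * adversarial_loss


# Hypothetical original and reconstructed text embeddings.
original = [1.0, 2.0, 3.0]
reconstructed = [1.0, 2.5, 2.0]
loss_l1 = generator_loss(original, reconstructed, adversarial_loss=0.5, lambda_a=2.0)
loss_l2 = generator_loss(
    original, reconstructed, adversarial_loss=0.5, lambda_a=2.0, distance=l2_distance
)
```

Increasing `lambda_a` places more weight on making swapped-style reconstructions indistinguishable from genuine reports, at the expense of exact reconstruction of the original embedding.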
With reference to
The apparatus 10 can be used for improving radiology data annotation in many applications, including generating high-quality labels which can be used for developing machine learning and deep learning techniques; generating radiology reports with standardized or customized style, while preserving the content; transferring the style of a report to the style of another report; measuring the similarity of two radiology reports based on their styles; training radiology residents on report writing by evaluating content completeness and reporting style; detecting radiology report outliers in terms of style; and developing report analytics models such as report content categorization, content completeness checking, and correlation between user satisfaction and reporting style; among others.
The disclosure has been described with reference to the preferred embodiments. Modifications and alterations may occur to others upon reading and understanding the preceding detailed description. It is intended that the exemplary embodiment be construed as including all such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.
| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/EP2022/052089 | 1/28/2022 | WO | |

| Number | Date | Country |
|---|---|---|
| 63143256 | Jan 2021 | US |