The present invention relates to similarity search generally and to X-ray image search in particular.
When radiologists encounter an ambiguous case, they typically search public or internal databases for similar cases that would help them in the diagnostic decision-making process. Such searches are a significant burden on their workflow and reduce the time available to diagnose other cases. It is therefore important to replace such a manually intensive search with an automatic content-based image retrieval system.
In their paper “Interpretability-Guided Content-Based Medical Image Retrieval”, presented at MICCAI 2020, Wilson Silva, Alexander Poellinger, Jaime S. Cardoso and Mauricio Reyes (Silva et al.) describe a medical image retrieval system 100 as shown in
KNN searcher 105 then performed a KNN search using candidate diagnosed embeddings 102 against a query partially diagnosed X-ray 107 which had similarly been encoded into a query partially diagnosed embedding 108. As a result, the K (for example 10) candidate diagnosed embeddings 102 that were most similar to the query partially diagnosed X-ray 107 were returned by KNN searcher 105. System 100 then returned the candidate diagnosed chest X-rays 101 associated with the K candidate diagnosed embeddings 102 to the operator, as the K cases in the database most similar to the partially diagnosed X-ray 107.
There is therefore provided, in accordance with a preferred embodiment of the present invention, a system to retrieve medical X-rays. The system includes a trained convolutional neural network (CNN), a balancing feature generator, a balancing type selector, and a K-Nearest Neighbor (KNN) classifier. The trained CNN encodes a plurality of diagnosed X-ray images into a plurality of candidate embeddings, and encodes a partially diagnosed X-ray image into a query embedding. The balancing feature generator produces a plurality of virtual candidate embeddings from the query embedding and the plurality of candidate embeddings. The balancing type selector selects a subset of the plurality of virtual candidate embeddings. The KNN classifier performs a KNN search between the query embedding and a plurality of the candidate embeddings and the subset of the plurality of virtual candidate embeddings.
Moreover, in accordance with a preferred embodiment of the present invention, the system includes a diagnosed X-ray image datastore, an embeddings datastore, and a balancing embeddings datastore. The diagnosed X-ray image datastore stores the plurality of diagnosed X-ray images, the embeddings datastore stores the plurality of candidate embeddings, and the balancing embeddings datastore stores the plurality of virtual candidate embeddings.
Further, in accordance with a preferred embodiment of the present invention, the system includes a target diagnosis selector which filters unwanted candidate embeddings stored in the embeddings datastore, from the KNN classifier, prior to the performance of the KNN search.
Still further, in accordance with a preferred embodiment of the present invention, the system includes a data visualizer which shows the quantity of the plurality of candidate embeddings stored in the embeddings datastore, and/or the quantity of the plurality of virtual candidate embeddings stored in the balancing embeddings datastore, that are associated with a plurality of diagnoses and a plurality of classes of the plurality of diagnoses.
Additionally, in accordance with a preferred embodiment of the present invention, the system includes an X-ray data retriever which retrieves diagnostic and image data, from the diagnosed X-ray image datastore, that is associated with the K nearest neighbor candidates returned by the KNN classifier during the KNN search.
Moreover, in accordance with a preferred embodiment of the present invention, the system is implemented in associative memory.
There is also provided, in accordance with a preferred embodiment of the present invention, a method to retrieve medical X-rays. The method includes encoding a plurality of diagnosed X-ray images into a plurality of candidate embeddings, and second encoding a partially diagnosed X-ray image into a query embedding, producing a plurality of virtual candidate embeddings from the query embedding and the plurality of candidate embeddings, selecting a subset of the plurality of virtual candidate embeddings, and performing a KNN search between the query embedding and a plurality of the candidate embeddings and the subset of the plurality of virtual candidate embeddings.
Moreover, in accordance with a preferred embodiment of the present invention, the method includes storing the plurality of diagnosed X-ray images in a diagnosed X-ray image datastore, storing the plurality of candidate embeddings in an embeddings datastore, and storing the plurality of virtual candidate embeddings in a balancing embeddings datastore.
Further, in accordance with a preferred embodiment of the present invention, the method includes filtering unwanted candidate embeddings stored in the embeddings datastore, from the KNN classifier, prior to the performance of the KNN search.
Still further, in accordance with a preferred embodiment of the present invention, the method includes showing the quantity of the plurality of candidate embeddings stored in the embeddings datastore, and/or the quantity of the plurality of virtual candidate embeddings stored in the balancing embeddings datastore, that are associated with a plurality of diagnoses and a plurality of classes of the plurality of diagnoses.
Additionally, in accordance with a preferred embodiment of the present invention, the method includes retrieving diagnostic and image data, from the diagnosed X-ray image datastore, that is associated with the K nearest neighbor candidates returned by the KNN classifier during the KNN search.
The subject matter regarded as the invention is particularly pointed out and distinctly claimed in the concluding portion of the specification. The invention, however, both as to organization and method of operation, together with objects, features, and advantages thereof, may best be understood by reference to the following detailed description when read with the accompanying drawings in which:
It will be appreciated that for simplicity and clarity of illustration, elements shown in the figures have not necessarily been drawn to scale. For example, the dimensions of some of the elements may be exaggerated relative to other elements for clarity. Further, where considered appropriate, reference numerals may be repeated among the figures to indicate corresponding or analogous elements.
In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the invention. However, it will be understood by those skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known methods, procedures, and components have not been described in detail so as not to obscure the present invention.
Applicant has realized that for accurate KNN search, the candidate dataset (against which a query will be searched) needs to be balanced. A balanced dataset does not have an overwhelming amount of data for only one, or only some, of the target candidate classes or groups. The problem with Silva et al.'s X-ray CNN/KNN system described hereinabove is that the dataset of candidate X-ray embeddings is unbalanced. The imbalance is reflected in that, for any particular diagnosis or class of diagnosis (which may be the class or group mentioned hereinabove), the number of records associated with each class or group is not equal. For example, if there are 5 diagnosis classes, 1 through 5, the number of X-ray records associated with each of the classes is unequal.
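By way of non-limiting illustration only, the imbalance described above may be quantified by counting the records per diagnosis class. The following is a minimal sketch; the function names `class_counts` and `imbalance_ratio`, the ratio metric itself, and the record counts are hypothetical and are not taken from the present description.

```python
from collections import Counter


def class_counts(labels):
    """Count how many candidate records carry each diagnosis class label."""
    return Counter(labels)


def imbalance_ratio(counts):
    """Largest class size divided by smallest; 1.0 means perfectly balanced."""
    sizes = list(counts.values())
    return max(sizes) / min(sizes)


# Hypothetical record counts for five diagnosis classes, 1 through 5.
labels = [1] * 500 + [2] * 120 + [3] * 40 + [4] * 25 + [5] * 5
counts = class_counts(labels)
ratio = imbalance_ratio(counts)
print(dict(counts), ratio)
```

In this hypothetical dataset, class 1 has one hundred times as many records as class 5, so a KNN search is far more likely to encounter class 1 neighbors.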
Such an imbalance in diagnosed candidate X-ray records leads to an imbalance in candidate X-ray embeddings. This imbalance leads to deterioration of the performance of Silva et al.'s KNN X-ray diagnosis method.
The article ‘Smote-variants: a Python Implementation of 85 Minority Oversampling Techniques’, in Neurocomputing Journal, June 2019, describes methods to create ‘virtual-embeddings’ from existing embeddings, so as to increase the number of available embeddings.
Applicant has realized that the methods used to create ‘virtual-embeddings’ described in the abovementioned article, may also be used to create ‘virtual candidate X-ray embeddings.’
Applicant has realized that by adding a ‘balancing system’ to an X-ray CNN/KNN system, the accuracy of prediction results may be improved.
Applicant has realized that by enabling users to choose between KNN search results both with and without additional virtual embeddings, they may choose the more accurate result.
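By way of non-limiting illustration only, one well-known minority oversampling technique, classic SMOTE interpolation, creates a synthetic sample on the line segment between a minority-class sample and one of its nearest minority-class neighbors. The sketch below is written from scratch under that assumption; it does not use the API of the package described in the abovementioned article, the function name `smote_sample` and its parameters are hypothetical, and it is not asserted to be the technique used by the present system.

```python
import math
import random


def smote_sample(minority, k=2, rng=None):
    """Create one synthetic embedding by classic SMOTE interpolation.

    minority: list of embeddings (lists of floats) from one minority class.
    k: number of nearest minority neighbors to choose from.
    """
    if rng is None:
        rng = random.Random()
    base = rng.choice(minority)
    others = [e for e in minority if e is not base]
    # The k minority-class embeddings closest to the chosen base sample.
    neighbors = sorted(others, key=lambda e: math.dist(base, e))[:k]
    neighbor = rng.choice(neighbors)
    gap = rng.random()  # random position along the segment, in [0, 1)
    return [b + gap * (n - b) for b, n in zip(base, neighbor)]


# Hypothetical two-dimensional minority-class embeddings.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
synthetic = smote_sample(minority, k=2, rng=random.Random(0))
print(synthetic)
```

Because the synthetic sample is a convex combination of two existing embeddings, it always lies within the region already spanned by the minority class.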
Reference is made to
Utilizing an image KNN system like that described in U.S. Pat. No. 10,929,751, entitled “FINDING K EXTREME VALUES IN CONSTANT PROCESSING TIME”, issued Feb. 23, 2021, owned by Applicant, and incorporated herein by reference, a plurality of known candidate X-ray images 116C from diagnosed X-ray datastore 101, and an unknown query X-ray image 117Q, may be encoded into candidate X-ray embeddings 116CE and query X-ray embedding 117QE respectively, by CNN feature extractor 102, and may be stored in an embeddings datastore 103. Candidate X-ray embeddings 116CE and query X-ray embedding 117QE may then be input into a KNN classifier 107 for identification.
It will be appreciated that diagnosed or candidate X-ray images 116C and their associated candidate X-ray embeddings 116CE may represent different classes of diagnoses such as cancers, viral infections, bacterial infections, etc. It will also be appreciated that diagnosed X-ray images 116C and their associated candidate X-ray embeddings 116CE may also represent different diagnoses within such classes of diagnoses, for example, different cancer types.
A radiologist who may suspect, for example, a particular cancer type, may want to exclude candidate X-ray embeddings 116CE associated with non-cancer diagnoses from KNN classifier 107. She may view a visualization of the candidate X-ray embeddings 116CE dataset contained in embeddings datastore 103 utilizing data visualizer 230. Such a visualization may show the number of candidate X-ray embeddings 116CE associated with a plurality of diagnoses and a plurality of classes of diagnoses. With a knowledge of such numbers of candidate X-ray embeddings 116CE, she may then exclude any unwanted candidate X-ray embeddings 116CE using target diagnosis selector 108. Target diagnosis selector 108 may select only candidate X-ray embeddings 116CE from embeddings datastore 103 that match, for example, the suspected or target diagnosis class, and may input such candidate X-ray embeddings 116CE into KNN classifier 107. It will be appreciated that the radiologist may alternatively choose not to filter the dataset, and hence may input no data requirements into target diagnosis selector 108.
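By way of non-limiting illustration only, the filtering performed by a target diagnosis selector may be sketched as follows. The record layout (`id`, `diagnosis_class`, `embedding`) and the function name `select_target` are hypothetical; passing no target classes models the case in which the radiologist chooses not to filter the dataset.

```python
def select_target(candidates, target_classes=None):
    """Keep only candidates whose diagnosis class is in target_classes.

    target_classes=None means no filtering: every candidate is passed on.
    """
    if target_classes is None:
        return list(candidates)
    wanted = set(target_classes)
    return [c for c in candidates if c["diagnosis_class"] in wanted]


# Hypothetical candidate records; embeddings are shortened to two dimensions.
candidates = [
    {"id": "c1", "diagnosis_class": "cancer", "embedding": [0.1, 0.9]},
    {"id": "c2", "diagnosis_class": "viral infection", "embedding": [0.8, 0.2]},
    {"id": "c3", "diagnosis_class": "cancer", "embedding": [0.2, 0.7]},
    {"id": "c4", "diagnosis_class": "bacterial infection", "embedding": [0.9, 0.4]},
]

cancer_only = select_target(candidates, target_classes=["cancer"])
unfiltered = select_target(candidates)
print([c["id"] for c in cancer_only])
```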
KNN classifier 107 may then find K candidate X-ray embeddings 116CE which are nearest neighbors to query X-ray embedding 117QE. X-ray data retriever 104 may then retrieve diagnostic and image data associated with the K nearest neighbor candidates from diagnosed X-ray datastore 101, and may then output the image and diagnostic information that corresponds to the K nearest neighbors returned by KNN classifier 107.
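By way of non-limiting illustration only, the search and retrieval steps described above may be sketched as a brute-force nearest-neighbor search followed by a lookup. The Euclidean distance metric, the function names `knn_search` and `retrieve_records`, and the sample data are assumptions made for this sketch; the description does not specify a distance metric, and the sketch is sequential rather than parallel.

```python
import math


def knn_search(query, embeddings, k):
    """Return the ids of the k embeddings nearest to the query (Euclidean)."""
    ranked = sorted(embeddings.items(), key=lambda item: math.dist(query, item[1]))
    return [record_id for record_id, _ in ranked[:k]]


def retrieve_records(record_ids, datastore):
    """Look up image and diagnostic data for the returned neighbor ids."""
    return [datastore[record_id] for record_id in record_ids]


# Hypothetical candidate embeddings, keyed by record id.
embeddings = {
    "x1": [0.0, 0.0],
    "x2": [1.0, 1.0],
    "x3": [0.2, 0.1],
    "x4": [5.0, 5.0],
}
# Hypothetical diagnosed X-ray records, keyed by the same ids.
datastore = {
    "x1": {"diagnosis": "pneumonia", "image": "x1.png"},
    "x2": {"diagnosis": "lung cancer", "image": "x2.png"},
    "x3": {"diagnosis": "pneumonia", "image": "x3.png"},
    "x4": {"diagnosis": "tuberculosis", "image": "x4.png"},
}

query = [0.1, 0.1]
neighbor_ids = knn_search(query, embeddings, k=2)
results = retrieve_records(neighbor_ids, datastore)
print(neighbor_ids, results)
```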
Balancing system 220 comprises a balancing embeddings generator 105, a balancing embeddings datastore 106, and a balancing type selector 110.
In the abovementioned operational scenario, after reviewing a visualization of candidate X-ray embeddings 116CE on data visualizer 230, the radiologist may consider that the number of candidate X-ray embeddings 116CE for any particular diagnosis or class (for example, a particular lung cancer type) in embeddings datastore 103 is too low to produce an accurate KNN calculation or classification. In such a case, she may choose to add a plurality of virtual candidate X-ray embeddings 116VCE to the plurality of candidate embeddings 116CE used by KNN classifier 107 in the KNN calculation.
To balance the candidate dataset, the radiologist may add a plurality of existing virtual candidate X-ray embeddings 116VCE from balancing embeddings datastore 106. She may enter the required number and type(s) of virtual candidate X-ray embeddings 116VCE on balancing type selector 110, which will add that number and type(s) from balancing embeddings datastore 106 to KNN classifier 107. The radiologist may then repeat the KNN classification, using the balanced data set, in a manner similar to that described above.
It will be appreciated that by changing the number and type of virtual candidate X-ray embeddings 116VCE input to KNN classifier 107 by balancing type selector 110 between ‘no additional virtual candidate X-ray embeddings 116VCE’ and a ‘desired number of additional virtual candidate X-ray embeddings 116VCE’, the radiologist may compare the KNN search result produced by the original unbalanced data set, using only selected candidate X-ray embeddings 116CE, with the result produced by the balanced data set, which includes additional virtual candidate X-ray embeddings 116VCE. She may then choose the more accurate result.
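By way of non-limiting illustration only, the comparison between an unbalanced search and a balanced search may be sketched by running the same KNN search twice, once on the real candidates alone and once with virtual candidates added. The two class labels, the sample embeddings, the Euclidean metric, and the majority-vote summary are all assumptions made for this sketch; in the description it is the radiologist who judges which result is more accurate.

```python
import math
from collections import Counter


def knn_labels(query, candidates, k):
    """Return the labels of the k candidates nearest to the query."""
    ranked = sorted(candidates, key=lambda c: math.dist(query, c[0]))
    return [label for _, label in ranked[:k]]


def majority(labels):
    """Most frequent label among the returned neighbors."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical (embedding, label) pairs: class "A" is over-represented.
real = [([0.6, 0.6], "A"), ([0.5, 0.7], "A"), ([0.2, 0.1], "A"), ([1.0, 1.1], "B")]
# Hypothetical virtual candidates generated for the minority class "B".
virtual = [([1.05, 1.0], "B"), ([0.95, 1.05], "B")]
query = [1.0, 1.0]

without_virtual = knn_labels(query, real, k=3)
with_virtual = knn_labels(query, real + virtual, k=3)
print(majority(without_virtual), majority(with_virtual))
```

In this hypothetical data, the single real "B" candidate is the nearest neighbor of the query but is outvoted by two "A" candidates until the virtual "B" candidates are included.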
If there are not enough virtual candidate X-ray embeddings 116VCE in balancing embeddings datastore 106, the radiologist may choose to create some new virtual candidate X-ray embeddings 116VCE. She may enter into balancing embeddings generator 105 the number of virtual candidate X-ray embeddings 116VCE she wishes to create and the type of candidate X-ray embedding 116CE from which she wishes them created. Balancing embeddings generator 105 may search in embeddings datastore 103 for the m (for example m=5) nearest neighbor candidate X-ray embeddings 116CE to query X-ray embedding 117QE. Balancing embeddings generator 105 may then generate a new virtual candidate X-ray embedding 116VCE whose feature vector is, for example but not limited to, an average of the m candidate X-ray embeddings 116CE found by the search.
Balancing embeddings generator 105 may store virtual candidate X-ray embedding 116VCE in balancing embeddings datastore 106. This process may be repeated as often as required. It will be appreciated that due to the random nature of KNN search, the generation of a plurality of virtual candidate X-ray embeddings 116VCE, from the same KNN search against the same query X-ray embedding 117QE by balancing embeddings generator 105, may not produce identical virtual candidate X-ray embeddings 116VCE.
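By way of non-limiting illustration only, the averaging example given above may be sketched as follows. The function names `nearest` and `virtual_embedding`, the Euclidean metric, and the sample data are hypothetical. The sketch uses an exact, deterministic search, so repeated calls return identical results, and it omits the selection by candidate type for brevity.

```python
import math


def nearest(query, embeddings, m):
    """Return the m embeddings closest to the query (Euclidean distance)."""
    return sorted(embeddings, key=lambda e: math.dist(query, e))[:m]


def virtual_embedding(query, embeddings, m=5):
    """Average the m nearest candidate embeddings, dimension by dimension."""
    neighbors = nearest(query, embeddings, m)
    dim = len(query)
    return [sum(e[d] for e in neighbors) / len(neighbors) for d in range(dim)]


# Hypothetical candidate embeddings of one diagnosis type.
candidate_embeddings = [
    [0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0], [10.0, 10.0],
]
query = [1.0, 1.0]

# A plain list stands in for the balancing embeddings datastore.
balancing_datastore = []
balancing_datastore.append(virtual_embedding(query, candidate_embeddings, m=4))
print(balancing_datastore)
```

With m=4, the distant embedding at (10, 10) is excluded and the virtual embedding is the centroid of the four remaining candidates.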
Balancing X-ray image system 200 may be implemented on an associative memory array within an associative processing unit, similar to the KNN system in U.S. Pat. No. 10,929,751 mentioned hereinabove. The massive parallel processing functionality of associative processing units may reduce data manipulation and KNN search times.
Reference is made to
KNN classifier 204 may operate on the plurality of candidate X-ray embeddings 116CE, the plurality of virtual candidate X-ray embeddings 116VCE, and query X-ray embedding 117QE in a massively parallel operation as described in U.S. Pat. No. 10,929,751, mentioned hereinabove. It will be appreciated that candidate embeddings 112 and virtual candidate embeddings 113 may be included or excluded as required by KNN classifier 204, by use of a marker row 301. When columns in marker row 301 are selected, then only those embeddings in those columns may be included in the KNN classification. Marker row 301 may be the implementation of target diagnosis selector 108 and balancing type selector 110, both of which are explained hereinabove.
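By way of non-limiting illustration only, the effect of a marker row may be sketched in software as a per-column mask applied before the search. The function name `masked_knn`, the Euclidean metric, and the sample data are hypothetical, and the sketch runs sequentially on a conventional processor; it does not model the parallel, in-memory operation of an associative processing unit.

```python
import math


def masked_knn(query, columns, marker_row, k):
    """Search only the columns whose marker bit is set.

    columns: one embedding per column.
    marker_row: one bit per column; 1 includes the column, 0 excludes it.
    Returns the indices of the k nearest included columns.
    """
    selected = [i for i, bit in enumerate(marker_row) if bit]
    ranked = sorted(selected, key=lambda i: math.dist(query, columns[i]))
    return ranked[:k]


# Hypothetical embeddings, one per column; column 1 is excluded by the marker.
columns = [[0.0, 0.0], [0.1, 0.1], [0.9, 0.9], [1.0, 1.0]]
marker_row = [1, 0, 1, 1]
query = [0.0, 0.1]

result = masked_knn(query, columns, marker_row, k=2)
print(result)
```

Column 1 is as close to the query as column 0, but it never appears in the result because its marker bit is cleared.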
Reference is made to
A query X-ray embedding 117QE, a selected plurality of candidate X-ray embeddings 116CE, and a selected plurality of virtual candidate X-ray embeddings 116VCE may be written to columns 302 of temporary store 308 before being operated on in parallel by KNN classifier 309.
It will be appreciated that, through balancing datasets, the accuracy of X-ray image identification in the medical image system described by Silva et al. hereinabove improved by 5% compared with the unbalanced results.
While certain features of the invention have been illustrated and described herein, many modifications, substitutions, changes, and equivalents will now occur to those of ordinary skill in the art. It is, therefore, to be understood that the appended claims are intended to cover all such modifications and changes as fall within the true spirit of the invention.
This application claims priority from U.S. provisional patent applications 63/246,854, filed Sep. 22, 2021, and 63/403,763, filed Sep. 4, 2022, both of which are incorporated herein by reference.
Number | Date | Country
---|---|---
63246854 | Sep 2021 | US
63403763 | Sep 2022 | US