The present invention relates generally to automatic analysis of a large patient population using medical imaging data, and more particularly to automatically processing medical imaging data using one or more machine learning algorithms to generate quantitative patient data for analyzing a large patient population.
Analyzing large patient populations may reveal medical trends or relationships that can be used for planning, diagnosis, decision support, and intervention. Such patient populations are typically analyzed based on medical images acquired for those patients. With the continued advancement of medical imaging technology, the number of medical images acquired, and the amount of information encapsulated in those images, are continuously increasing. Typically, medical images are analyzed by radiologists who interpret the medical images to generate medical imaging reports (e.g., radiology reports) on the medical images. This is a tedious process due to the large number of medical images, the large amount of information that the medical images encapsulate, the pressure to interpret the medical images quickly and efficiently, the continuous effort required to maintain reading consistency, the lower reimbursement amounts provided by insurance companies, the quality of the medical images, and the reporting requirements. As a result, radiologists interpreting the medical images typically do not extract as much information as is available in the medical images, and a large amount of information encapsulated in medical images is therefore not captured in the medical imaging reports in a quantitative format.
In addition, such medical images and medical imaging reports on the medical images are not quantitatively organized in a structured format in a manner that facilitates analysis of the patient population. Specifically, medical images are typically provided as raw image pixels and medical imaging reports are typically provided as free text. While medical images may be conventionally stored according to the DICOM (digital imaging and communications in medicine) standard, DICOM tags are populated before analysis of the medical images by the radiologist. As such, DICOM tags mostly include information relating to scanning procedures and include very little quantitative information relating to the content of the medical images. Similarly, medical imaging reports may be conventionally organized into fields, however fields relating to image interpretation are typically provided as free text. Information from medical images and medical imaging reports is typically partitioned, and it is therefore tedious to cross reference such information. Thus, it is difficult to analyze a large patient population where the medical images and the medical imaging reports of the patient population are not in a quantitative structured format.
One or more embodiments of the present invention provide for automatic processing of unstructured and non-quantitative medical images and/or medical imaging reports of a large patient population to generate structured patient data using one or more machine learning algorithms. Advantageously, analytic measures for the patient population as a whole may be generated by searching, parsing, or otherwise analyzing the structured patient data. Embodiments of the present invention enable the automatic transformation of raw data into new and actionable information that can be used for planning, diagnosis, decision support, and intervention.
In accordance with one or more embodiments, systems and methods are provided for determining an analytic measure of a patient population. A knowledge database comprising structured patient data for a patient population is maintained. The structured patient data is generated by processing unstructured medical imaging data for the patient population using one or more machine learning algorithms. An analytic measure of the patient population is determined based on the structured patient data of the knowledge database. The analytic measure of the patient population is output.
In accordance with one or more embodiments, the unstructured medical imaging data comprises medical images of the patient population and medical imaging reports of the medical images. The structured patient data comprises anatomical structures, one or more segmented anatomical structures, and quantitative patient data. The quantitative patient data may include a presence, a position, a distance, a diameter, a circumference, a volume, a surface area, and/or a score determined from medical records (e.g., patient related information, reconstructed medical images, detected, annotated, or segmented anatomical structures, other quantitative patient data, etc.) of the patient population. The unstructured medical imaging data for the patient population is processed to generate the structured patient data by detecting the anatomical structures in medical images of the unstructured medical imaging data for the patient population, segmenting the one or more of the anatomical structures from the medical images, and determining the quantitative patient data based on the medical records of the patient population.
In accordance with one or more embodiments, the analytic measure of the patient population is determined by generating a graph or a table based on the structured patient data. The analytic measure of the patient population may be determined in response to a user request. For example, the request may be a query for filtered structured patient data from the knowledge database. The analytic measure of the patient population may be displayed on a display device, or may be exported to a file.
These and other advantages of the invention will be apparent to those of ordinary skill in the art by reference to the following detailed description and the accompanying drawings.
The present invention generally relates to systems and methods for the automatic analysis of a large patient population using medical imaging data. Embodiments of the present invention are described herein to give a visual understanding of such systems and methods. A digital image is often composed of digital representations of one or more objects (or shapes). The digital representation of an object is often described herein in terms of identifying and manipulating the objects. Such manipulations are virtual manipulations accomplished in the memory or other circuitry/hardware of a computer system. Accordingly, it is to be understood that embodiments of the present invention may be performed within a computer system using data stored within the computer system.
Further, it should be understood that while the embodiments discussed herein may be discussed with respect to automatic analysis of a large patient population using medical imaging data, the present invention is not so limited. Embodiments of the present invention may be applied for the analysis of any type of data using any type of imaging data.
End users of computing device 102 may communicate via network 104 for interacting with a patient data analysis system 106 and knowledge database 108 to retrieve analytic measures of a large patient population, such as, e.g., a cohort of patients. End users may interact with patient data analysis system 106 and knowledge database 108 via an interface of a web browser executing on computing device 102, an application executing on computing device 102, an app executing on computing device 102, or any other suitable interface for interacting with patient data analysis system 106. In one example, end users of computing device 102 may interact with a software as a service (SaaS) application hosted by patient data analysis system 106.
Communication system 100 also includes medical imaging database 110 (such as, e.g., a picture archiving and communication system, PACS) for storing medical images of the patient population and electronic medical records 112 for storing other patient related information of the patient population, such as, e.g., medical imaging reports (e.g., radiology reports) of the medical images, as well as demographic information (e.g., age, gender, and weight), medical history, medication and allergies, immunization status, laboratory test results, vital signs, etc.
Conventionally, medical imaging data (e.g., medical images and medical imaging reports of the patient population) is not in a quantitative or structured format. Specifically, medical images are typically provided as raw image pixels and medical imaging reports are typically provided as free text. Medical images encapsulate a large amount of information that is not in a quantitative format. Medical images and medical imaging reports are also typically unstructured in that they are not quantitatively organized in a data structure. It is difficult to search, parse, or otherwise analyze a large patient population to determine analytic measures of the patient population where medical images or medical imaging reports of the patient population are not in a structured or quantitative format.
Patient data analysis system 106 is configured to automatically process unstructured medical imaging data (e.g., unstructured medical images and/or unstructured medical imaging reports) to extract structured patient data using one or more machine learning algorithms, and store the extracted patient data in knowledge database 108 in a structured format. The one or more machine learning algorithms facilitate the automatic processing of large amounts of patient data for the patient population. Advantageously, analytic measures for the patient population as a whole may be generated by searching, parsing, or otherwise analyzing the structured patient data. Patient data analysis system 106 enables the automatic transformation of raw data into new and actionable information that can be used for planning, diagnosis, decision support, and intervention.
At step 202, a knowledge database (e.g., knowledge database 108) comprising structured patient data for the patient population is maintained. The patient population represents a large group or cohort of patients. The structured patient data is generated by processing unstructured medical imaging data for the patient population using one or more machine learning algorithms. The structured patient data is organized and stored in the knowledge database in one or more data structures. The data structures may be of any suitable format to enable searching, parsing, or otherwise analyzing of the structured patient data. For example, the structured patient data may be organized and stored as one or more arrays, linked lists, tuples, tables, etc.
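As an illustrative sketch only (field names and values are assumptions, not the patent's schema), structured patient data organized as fixed-schema rows can be searched and parsed programmatically, unlike raw pixels or free text:

```python
# Hypothetical sketch of structured patient data organized as table rows.
# The record fields below are illustrative assumptions.
from dataclasses import dataclass, asdict

@dataclass
class StructuredPatientRecord:
    patient_id: str
    age: int
    gender: str
    structure: str    # detected anatomical structure, e.g. "left lung"
    volume_ml: float  # quantitative patient data, e.g. from a segmentation

records = [
    StructuredPatientRecord("P001", 64, "F", "left lung", 2410.0),
    StructuredPatientRecord("P002", 58, "M", "left lung", 2895.5),
]

# Because each record has a fixed schema, the collection behaves like a
# table that can be filtered, sorted, and aggregated.
table = [asdict(r) for r in records]
```

The same records could equally be stored as arrays, linked lists, tuples, or database tables, as the passage above notes.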
The structured patient data may comprise any type of data relating to the patient population. For example, the structured patient data may include patient related information (e.g., medical history, demographic information, etc.), medical images reconstructed based on acquisition information (e.g., DICOM information), anatomical structure information (e.g., landmarks, organs, organ sub-structures, systems, etc.) detected in the medical images using a machine learning algorithm, segmented anatomical structures generated using a machine learning algorithm, and/or quantitative patient data. The quantitative patient data is any quantitative measure of interest measured, calculated, or otherwise determined from any medical record. For example, the quantitative patient data may be patient related information, reconstructed medical images (reconstructed based on acquisition information), detected, annotated, or segmented anatomical structures, or other quantitative patient data. Examples of quantitative patient data include a presence or position of an anatomical structure, a distance between two anatomical structures, a volume or surface area of an anatomical structure, a surface of an anatomical structure, a diameter or circumference of an anatomical structure, a score associated with an anatomical structure (e.g., a lesion score), etc.
In one embodiment, the unstructured medical imaging data for the patient population is processed to generate the structured patient data by detecting the anatomical structures in medical images of the unstructured medical imaging data for the patient population, segmenting anatomical structures from the medical images, determining the quantitative patient data based on the segmented anatomical structures, reconstructing the medical image acquisition with extraction of scanning metadata present in medical imaging database 110, extracting related patient data (e.g., treatment and medical history, diagnosis from reports, etc.) from electronic medical records 112, and extracting non-radiology related data from other hospital information systems (HIS) integrating electronic medical records. In one embodiment, the structured patient data is generated by processing medical imaging data for the patient population according to the steps of method 300 of
At step 204, a request for an analytic measure of the patient population is received. The request may be received from a user, such as, e.g., an end user of computing device 102 of
The analytic measure of the patient population is any measure of interest for analyzing the patient population. In one embodiment, the analytic measure of the patient population is a statistical measure of the patient population. In another embodiment, the analytic measure of the patient population is patient data of the patient population resulting from an analysis (e.g., parsing, filtering, or searching). The analytic measure of the patient population may be of any suitable form. For example, the analytic measure of the patient population may be visually represented, e.g., as a graph, plot, or table, or may be represented as discrete values (e.g., an average, a standard deviation, etc.). Examples of the analytic measure of the patient population may include a table of data parsed or filtered according to the structured patient data and a comparison of two or more factors or variables according to the structured patient data shown in a graph or a plot.
At step 206, in response to receiving the request, the analytic measure of the patient population is determined based on the structured patient data of the knowledge database. In one embodiment, the structured patient data may be analyzed according to the parameters of the request to determine the analytic measure. For example, the structured patient data may be filtered based on an anatomical structure, quantitative patient data, or any other patient data (e.g., age, weight, gender, etc.) as defined by the parameters, and the filtered structured patient data may be represented by generating a table, chart, or any other suitable visual representation as the analytic measure of the patient population. In another example, two or more factors or variables of the structured patient data may be compared with each other as defined by the parameters. The comparison may be represented by generating a table, chart, graph, plot, or any other suitable visual representation as the analytic measure of the patient population. Examples of the analytic measure of the patient population are shown in
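The filtering and comparison described in step 206 can be sketched as follows; this is an illustrative toy, and the record fields and helper names are assumptions, not the patent's implementation:

```python
# Minimal sketch of step 206: filter structured patient data by request
# parameters and pair two variables for a graph or table. Field names and
# values below are illustrative assumptions.
records = [
    {"age": 64, "gender": "F", "structure": "left lung", "volume_ml": 2410.0},
    {"age": 58, "gender": "M", "structure": "left lung", "volume_ml": 2895.5},
    {"age": 71, "gender": "F", "structure": "liver",     "volume_ml": 1480.2},
]

def filter_records(records, **params):
    """Keep records whose fields match every request parameter."""
    return [r for r in records
            if all(r.get(k) == v for k, v in params.items())]

def compare_variables(records, x_key, y_key):
    """Pair two variables, e.g. for plotting as a graph or listing in a table."""
    return [(r[x_key], r[y_key]) for r in records]

lungs = filter_records(records, structure="left lung")
pairs = compare_variables(lungs, "age", "volume_ml")  # e.g. age vs. lung volume
```

The resulting pairs could then be rendered as a table, chart, graph, or plot, as the passage describes.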
At step 208, the analytic measure of the patient population is output. In one embodiment, the analytic measure of the patient population may be output by exporting the analytic measure to a file (e.g., a comma-separated values (CSV) file), which may be used as input to other analytic systems or tools for deeper analysis. In other embodiments, the analytic measure of the patient population can be output by displaying the analytic measure of the patient population on a display device of a computer system (e.g., computing system 102), storing the analytic measure of the patient population on a memory or storage of a computer system, or by transmitting the analytic measure of the patient population to a remote computer system.
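The CSV export mentioned in step 208 can be sketched in a few lines; the column names and values are illustrative assumptions:

```python
import csv
import io

# Sketch of exporting an analytic measure to CSV. An in-memory buffer
# stands in for a file opened with open(path, "w", newline="").
analytic_measure = [("age", "mean_volume_ml"), (58, 2895.5), (64, 2410.0)]

buf = io.StringIO()
csv.writer(buf).writerows(analytic_measure)
exported = buf.getvalue()
```

The exported text can then be consumed by downstream analytic systems or tools, as noted above.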
At step 302, medical images of a patient population are received. The medical images are not in a structured or quantitative format. The medical images may be of any suitable modality, such as, e.g., x-ray, magnetic resonance imaging (MRI), computed tomography (CT), DynaCT (DTC), ultrasound (US), single-photon emission computed tomography (SPECT), positron emission tomography (PET), or any other suitable modality or combination of modalities.
At step 304, anatomical structures are detected in the medical images of the patient population. The anatomical structures may include landmarks, organs, organ sub-structures, systems, etc. The anatomical structures may be detected according to any known approach. For example, anatomical structures may be detected using regression forests, heatmap regression with convolutional neural networks, or deep reinforcement learning.
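For the heatmap-regression approach mentioned above, the final step of reading a landmark location off a predicted heatmap can be sketched as follows; the heatmap here is synthetic, standing in for a convolutional network's output:

```python
import numpy as np

# Illustrative sketch: given a heatmap predicted by, e.g., a convolutional
# network performing heatmap regression, the detected landmark location is
# the coordinate of the heatmap's maximum response. The heatmap below is
# synthetic rather than a real network output.
heatmap = np.zeros((64, 64), dtype=np.float32)
heatmap[40, 22] = 1.0  # pretend the network fired strongly at (row=40, col=22)

row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
```

Regression forests and deep reinforcement learning would replace the heatmap with their own localization machinery, but likewise reduce to predicted coordinates per anatomical structure.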
At step 306, one or more of the anatomical structures are segmented from the medical images. The one or more segmented anatomical structures are represented as a segmentation mask having voxels (or pixels, as the case may be) associated with the one or more anatomical structures. The segmentation may be performed using any known approach. For example, the one or more anatomical structures may be segmented using encoder-decoder fully convolutional networks, which may be performed with adversarial training. In one embodiment, the one or more anatomical structures may be segmented using the adversarial deep image-to-image network 600 of
At step 308, quantitative patient data is determined based on medical records associated with the patient population. The medical records may include, e.g., patient related information (e.g., medical history, demographic information, etc.), medical images reconstructed based on acquisition information (e.g., DICOM information), anatomical structure information (e.g., landmarks, organs, organ sub-structures, systems, etc.) detected in the medical images using a machine learning algorithm, segmented anatomical structures generated using a machine learning algorithm, or any other suitable medical record.
The quantitative patient data may include any quantitative measure determined from the medical records associated with the patient population, such as, e.g., patient related information, reconstructed medical images (reconstructed based on acquisition information), detected, annotated, or segmented anatomical structures, or a presence, a position, a distance, a diameter, a circumference, a volume, a surface area, a score, etc. associated with the anatomical structures. For example, the quantitative patient data may be a distance between two anatomical structures, a distance (e.g., length, width, height, diameter, circumference, etc.) associated with an anatomical structure, a volume of an anatomical structure, or a lesion score associated with an anatomical structure.
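Two of the measures listed above, a volume and a distance between structures, can be sketched from a binary segmentation mask as follows; the mask, voxel spacing, and centroids are synthetic assumptions:

```python
import numpy as np

# Illustrative sketch of deriving quantitative patient data from a binary
# segmentation mask. All values below are synthetic assumptions.
mask = np.zeros((10, 10, 10), dtype=bool)
mask[2:6, 2:6, 2:6] = True          # a 4x4x4-voxel "anatomical structure"

voxel_spacing_mm = (1.0, 1.0, 1.0)  # assumed isotropic 1 mm voxels
voxel_volume_mm3 = float(np.prod(voxel_spacing_mm))

# Volume: number of foreground voxels times the physical voxel volume.
volume_mm3 = mask.sum() * voxel_volume_mm3

# Distance between two structures: Euclidean distance between centroids
# (the second centroid is an assumed placeholder for another structure).
centroid_a = np.array([3.5, 3.5, 3.5])
centroid_b = np.array([8.0, 3.5, 3.5])
distance_mm = float(np.linalg.norm(centroid_a - centroid_b))
```

Diameters, circumferences, and surface areas follow the same pattern: geometric quantities computed over the mask in physical (spacing-scaled) coordinates.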
At step 310, the detected anatomical structures, the one or more segmented anatomical structures, and the quantitative patient data are output as the structured patient data.
As shown in
Discriminator network 604 is utilized during the training stage. Discriminator network 604 is shown with blocks BLK14-BLK16, each of which includes one or more convolutional layers. Prediction 608 (e.g., a binary segmentation mask) is input together with a ground truth segmentation 610 to discriminator network 604. The role of discriminator network 604 is to classify one image as the generated image and the other image as the ground truth image. Training is successful if discriminator network 604 cannot distinguish between the generated image and the ground truth image.
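As a loose sketch of this adversarial objective (not the patent's network), the discriminator is typically trained with a binary cross-entropy loss that pushes its score toward 1 for ground-truth masks and 0 for generated masks; the function below is an illustrative assumption:

```python
import math

# Illustrative binary cross-entropy loss for a discriminator score in (0, 1).
# This is a generic adversarial-training sketch, not the patent's network.
def bce(prediction_score, is_ground_truth):
    target = 1.0 if is_ground_truth else 0.0
    eps = 1e-7  # clamp to avoid log(0)
    p = min(max(prediction_score, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

# When the generator succeeds, the discriminator's scores drift toward 0.5
# on both real and generated masks: it can no longer tell them apart, and
# its loss settles at the chance level of ln(2) ~ 0.693.
loss_confused = bce(0.5, True)
```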
Embodiments of the present invention were experimentally validated using the chronic obstructive pulmonary disease (COPDGene) database, which included clinical information, blood samples, test results, and CT medical images for a patient population of over 10,000 patients. The CT data included nearly 24,000,000 CT images comprising more than 40,000 three dimensional CT volumes. CT images are used to identify potential causes for COPD symptoms such as cough or breathlessness, or other causes that present similarly such as bronchiectasis, fibrosis, infections, or cancer. The COPDGene database includes lung lobe and airway segmentations that are used to quantify emphysema, potential thickening of the airways, or air trapping for each patient.
Embodiments of the present invention were applied to process CT images of the COPDGene database to determine and compare quantitative measurements from lung CT images with measurements relating to the effects of COPD. The quantitative measurements from the lung CT images were the percentage of low-attenuation areas, measured as areas below a predetermined Hounsfield unit (HU) threshold (such as −950 HU or −900 HU, denoted LAA950 and LAA900, respectively), and the measurements relating to the effects of COPD were the forced expiratory volume (FEV).
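The low-attenuation area percentage described above reduces to thresholding HU values within the lung; the sketch below uses synthetic HU values:

```python
import numpy as np

# Illustrative computation of LAA950: the percentage of lung voxels below
# the -950 HU threshold. The HU values below are synthetic assumptions,
# standing in for voxels inside a lung segmentation.
lung_hu = np.array([-980.0, -960.0, -940.0, -900.0, -850.0, -700.0])

threshold_hu = -950.0
laa950_percent = 100.0 * np.mean(lung_hu < threshold_hu)
```

LAA900 follows identically with a −900 HU threshold.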
As expected, patients with a current diagnosis for COPD and those with a specific diagnosis for emphysema had a statistically significant (p-value < 10⁻⁶) higher percentage of LAA950/LAA900 than those not currently diagnosed. Conversely, current smokers demonstrated a statistically significant (p-value < 10⁻⁶) lower LAA900, or in other words had "denser" lungs than non-current smokers.
With the automatic processing of a vast number of quantitative values from medical images enabled by embodiments of the present invention, quantities considered unrelated to the pathology of interest can be analyzed for opportunistic research studies or diagnostics. For example, images collected for the study of a primarily respiratory disease (COPD) were utilized to perform an unrelated skeletal tissue analysis: the study of bone mineral density (BMD) globally or of specific vertebrae in relation to age. When considering bone mineral density, a decrease in bone density was demonstrated with age for both male and female patients, albeit at different rates of decrease.
Embodiments of the present invention were experimentally shown to be able to quickly analyze complex relationships between such quantitative data and clinical data. A similar study of 10,000 patients using conventional manual data collection and bone segmentation would likely take months; however, this analysis using embodiments of the present invention took one day. Embodiments of the present invention enable the automatic transformation of raw data into new and actionable information that can be used for planning, diagnosis, decision support, and intervention.
Systems, apparatuses, and methods described herein may be implemented using digital circuitry, or using one or more computers using well-known computer processors, memory units, storage devices, computer software, and other components. Typically, a computer includes a processor for executing instructions and one or more memories for storing instructions and data. A computer may also include, or be coupled to, one or more mass storage devices, such as one or more magnetic disks, internal hard disks and removable disks, magneto-optical disks, optical disks, etc.
Systems, apparatus, and methods described herein may be implemented using computers operating in a client-server relationship. Typically, in such a system, the client computers are located remotely from the server computer and interact via a network. The client-server relationship may be defined and controlled by computer programs running on the respective client and server computers.
Systems, apparatus, and methods described herein may be implemented within a network-based cloud computing system. In such a network-based cloud computing system, a server or another processor that is connected to a network communicates with one or more client computers via a network. A client computer may communicate with the server via a network browser application residing and operating on the client computer, for example. A client computer may store data on the server and access the data via the network. A client computer may transmit requests for data, or requests for online services, to the server via the network. The server may perform requested services and provide data to the client computer(s). The server may also transmit data adapted to cause a client computer to perform a specified function, e.g., to perform a calculation, to display specified data on a screen, etc. For example, the server may transmit a request adapted to cause a client computer to perform one or more of the steps or functions of the methods and workflows described herein, including one or more of the steps or functions of
Systems, apparatus, and methods described herein may be implemented using a computer program product tangibly embodied in an information carrier, e.g., in a non-transitory machine-readable storage device, for execution by a programmable processor; and the method and workflow steps described herein, including one or more of the steps or functions of
A high-level block diagram of an example computer 1502 that may be used to implement systems, apparatus, and methods described herein is depicted in
Processor 1504 may include both general and special purpose microprocessors, and may be the sole processor or one of multiple processors of computer 1502. Processor 1504 may include one or more central processing units (CPUs), for example. Processor 1504, data storage device 1512, and/or memory 1510 may include, be supplemented by, or incorporated in, one or more application-specific integrated circuits (ASICs) and/or one or more field programmable gate arrays (FPGAs).
Data storage device 1512 and memory 1510 each include a tangible non-transitory computer readable storage medium. Data storage device 1512, and memory 1510, may each include high-speed random access memory, such as dynamic random access memory (DRAM), static random access memory (SRAM), double data rate synchronous dynamic random access memory (DDR RAM), or other random access solid state memory devices, and may include non-volatile memory, such as one or more magnetic disk storage devices such as internal hard disks and removable disks, magneto-optical disk storage devices, optical disk storage devices, flash memory devices, semiconductor memory devices, such as erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), compact disc read-only memory (CD-ROM), digital versatile disc read-only memory (DVD-ROM) disks, or other non-volatile solid state storage devices.
Input/output devices 1508 may include peripherals, such as a printer, scanner, display screen, etc. For example, input/output devices 1508 may include a display device such as a cathode ray tube (CRT) or liquid crystal display (LCD) monitor for displaying information to the user, a keyboard, and a pointing device such as a mouse or a trackball by which the user can provide input to computer 1502.
An image acquisition device 1514 can be connected to the computer 1502 to input image data (e.g., medical images) to the computer 1502. It is possible to implement the image acquisition device 1514 and the computer 1502 as one device. It is also possible that the image acquisition device 1514 and the computer 1502 communicate wirelessly through a network. In a possible embodiment, the computer 1502 can be located remotely with respect to the image acquisition device 1514.
Any or all of the systems and apparatus discussed herein, including elements of computing device 102, patient data analysis system 106, knowledge database 108, medical imaging database 110, and electronic medical records 112 of
One skilled in the art will recognize that an implementation of an actual computer or computer system may have other structures and may contain other components as well, and that
The foregoing Detailed Description is to be understood as being in every respect illustrative and exemplary, but not restrictive, and the scope of the invention disclosed herein is not to be determined from the Detailed Description, but rather from the claims as interpreted according to the full breadth permitted by the patent laws. It is to be understood that the embodiments shown and described herein are only illustrative of the principles of the present invention and that various modifications may be implemented by those skilled in the art without departing from the scope and spirit of the invention. Those skilled in the art could implement various other feature combinations without departing from the scope and spirit of the invention.
This application claims the benefit of U.S. Provisional Application No. 62/645,450, filed Mar. 20, 2018, and U.S. Provisional Application No. 62/645,454, filed Mar. 20, 2018, the disclosures of which are herein incorporated by reference in their entirety.