The present invention relates generally to artificial intelligence assisted computation in the smart medical field, and more particularly to a method and a system for reading a larynx image and calculating parameters with artificial intelligence assistance.
Normalized glottal gap area (NGGA) is an important medical parameter used to evaluate the phonation condition of the vocal folds in clinical research in phoniatrics. To obtain this parameter, a larynx image is first acquired by laryngeal stroboscopy or flexible endoscopy; the glottal gap is then marked in the larynx image and the glottal gap area is calculated; after the length of a vocal fold is measured, the normalized glottal gap area is obtained by manually substituting these data into the formula “(the glottal gap area/the square of the vocal fold length)×100”.
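For illustration, the formula can be expressed as a short computation; the numeric values below are arbitrary examples, not clinical data:

```python
def normalized_glottal_gap_area(gap_area: float, vocal_fold_length: float) -> float:
    """NGGA = (glottal gap area / vocal fold length squared) x 100.

    Both inputs must be measured in consistent units (e.g., pixels), so the
    result is a dimensionless, size-independent value.
    """
    return gap_area / (vocal_fold_length ** 2) * 100

# Example: a gap area of 450 px^2 with a vocal fold length of 120 px
print(normalized_glottal_gap_area(450.0, 120.0))  # 3.125
```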
The concept of the normalized glottal gap area was first introduced in the publication by Omori in 1996 and has been extensively cited in subsequent phoniatrics publications. Conventionally, however, the normalized glottal gap area is calculated by first downloading the larynx image to a computer and manually marking the glottal gap in the image; the glottal gap area is then estimated with the area measuring function of image processing software, such as ImageJ. Because manually marking the glottal gap and subsequently calculating the area of the marked region with image processing software is time-consuming, the value of this method is limited in clinical practice.
In view of the above, the primary objective of the present invention is to provide an artificial intelligence object detection technique in which, after features of the vocal folds in a medical image or a medical video are recognized and a glottis image is extracted, the edges and region of the glottis are marked by an artificial intelligence image recognition and segmentation technique and the values of medical features are thereby obtained, assisting doctors in quantitatively evaluating the vocal fold condition.
The present invention provides a method for calculating parameters in a larynx image with artificial intelligence assistance, including the following steps.

Training a model: a deep learning object detection software is trained on a plurality of larynx images with a manually marked glottis image region to extract a glottis image from a received larynx image, and a deep learning image recognition and segmentation software is trained on a plurality of glottis images with a manually marked membranous glottal gap to recognize a membranous glottal gap in a received glottis image; the membranous glottal gap includes structural features of a left vocal fold, a right vocal fold, and an anterior commissure.

Receiving a larynx image: a larynx image, a plurality of larynx images captured frame-by-frame, or a larynx video that is captured when the vocal folds are in a phonating state is received.

Recognizing a glottis image: at least one glottis image is extracted from the larynx image, the plurality of larynx images, or the larynx video by the deep learning object detection software.

Recognizing a membranous glottal gap in the glottis image: a membranous glottal gap in the at least one glottis image is recognized by the deep learning image recognition and segmentation software, and at least one membranous glottal gap filter corresponding to the at least one glottis image is output by the deep learning image recognition and segmentation software.

Obtaining a medical parameter: image processing of edge detection and image patching is performed on the at least one membranous glottal gap filter to clearly outline a membranous glottal gap in the at least one membranous glottal gap filter, and a medical parameter of a plurality of vocal fold anatomies is obtained from the clearly outlined membranous glottal gap.
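The steps described above can be summarized as a minimal sketch; the `detect_glottis` and `segment_gap` callables stand in for the trained object detection and segmentation models and are assumptions for illustration, not part of the disclosure:

```python
from typing import Callable, List, Tuple
import numpy as np

Box = Tuple[int, int, int, int]  # (x, y, width, height) of the detected glottis region

def extract_gap_masks(frames: List[np.ndarray],
                      detect_glottis: Callable[[np.ndarray], Box],
                      segment_gap: Callable[[np.ndarray], np.ndarray]) -> List[np.ndarray]:
    """Run the detection and segmentation steps on each larynx frame and
    return one binary membranous glottal gap filter (mask) per frame."""
    masks = []
    for frame in frames:
        x, y, w, h = detect_glottis(frame)    # deep learning object detection software
        glottis = frame[y:y + h, x:x + w]     # extracted glottis image
        masks.append(segment_gap(glottis))    # deep learning image recognition and segmentation software
    return masks
```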
The present invention further provides a system for calculating parameters in a larynx image with artificial intelligence assistance, including an input unit, a processing unit, and an output unit. The input unit receives a medical image. The medical image includes a larynx image, a plurality of larynx images captured frame-by-frame, or a larynx video that is captured when the vocal folds are in a phonating state. The processing unit is in signal connection with the input unit and is configured to perform a deep learning algorithm to process the medical image received by the input unit. The deep learning algorithm includes a deep learning object detection software and a deep learning image recognition and segmentation software.
The processing unit extracts at least one glottis image from the medical image by the deep learning object detection software. The processing unit recognizes a membranous glottal gap in the at least one glottis image by the deep learning image recognition and segmentation software and outputs at least one membranous glottal gap filter corresponding to the at least one glottis image. The processing unit performs image processing of edge detection and image patching on the at least one membranous glottal gap filter to clearly outline a membranous glottal gap in the at least one membranous glottal gap filter. At least one medical parameter, which corresponds to at least one vocal fold anatomy, is obtained from the clearly outlined membranous glottal gap in the at least one membranous glottal gap filter and at least one vocal fold anatomy mark is added to a position of the medical image corresponding to the at least one medical parameter. The output unit is in signal connection with the processing unit. The output unit receives the medical image having the at least one vocal fold anatomy mark and the at least one medical parameter from the processing unit and outputs the medical image and the at least one medical parameter as a medical parameter and image report.
With the aforementioned design, the vocal fold anatomies and features in the medical image, such as the larynx image and the glottis image, are recognized by artificial intelligence and computed rapidly or in real time, so that medical parameters such as the normalized membranous glottal gap area and the amplitude of vocal fold vibration are obtained, thereby assisting in evaluating the vocal fold condition. In the conventional approach, the position and region of the vocal fold anatomies are determined manually from image information, such as incomplete adduction of the vocal folds and the mucosal wave, obtained from larynx images captured frame-by-frame by laryngeal stroboscopy; the result is therefore relatively subjective and hard to standardize. The present invention avoids these problems of the conventional approach.
The present invention will be best understood by referring to the following detailed description of some illustrative embodiments in conjunction with the accompanying drawings, in which
Step S01: train a model:
As shown in
Referring to
Step S02: receive a larynx image: a medical image of a larynx in the phonating state is received, wherein the medical image includes a larynx image, a plurality of larynx images captured frame-by-frame, or a larynx video that is captured when the vocal folds are in the phonating state. In the current embodiment, a plurality of larynx images captured frame-by-frame is used as an example.
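Where the received input is a larynx video, a minimal sketch for loading it frame-by-frame (assuming a video file readable by OpenCV) could look as follows:

```python
import cv2

def read_larynx_frames(video_path: str):
    """Read a larynx video frame-by-frame into a list of images (Step S02)."""
    capture = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, frame = capture.read()
        if not ok:          # no more frames
            break
        frames.append(frame)
    capture.release()
    return frames
```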
Step S03: recognize a glottis image: referring to
Step S04: recognize a membranous glottal gap in the glottis image: referring to
Step S05: obtain a medical parameter: image processing of edge detection and image patching is performed on the membranous glottal gap filter 21A. As shown in
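The specific edge detection and patching operators are not named here; as one plausible implementation, morphological closing can patch small holes in the membranous glottal gap filter and Canny edge detection can extract its outline:

```python
import cv2
import numpy as np

def refine_gap_filter(mask: np.ndarray):
    """Patch small holes in the gap filter and extract a clear outline.

    Morphological closing and Canny edge detection are used here as an
    assumed implementation of "image patching" and "edge detection".
    """
    binary = (mask > 0).astype(np.uint8) * 255
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    patched = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)  # image patching
    edges = cv2.Canny(patched, 50, 150)                          # edge detection
    return patched, edges
```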
The way to obtain the normalized membranous glottal gap area is to calculate the membranous glottal gap area of each of the glottis images 21 from the clearly outlined membranous glottal gap 171 of each of the membranous glottal gap filters 21A and to obtain the vocal fold length L from the same clearly outlined membranous glottal gap 171. The vocal fold length L is the straight-line distance from the left vocal fold 10 to the anterior commissure 12 or from the right vocal fold 11 to the anterior commissure 12. The membranous glottal gap area and the vocal fold length L are substituted into the formula “(the membranous glottal gap area/the square of the vocal fold length L)×100”, so that the normalized membranous glottal gap area (NMGGA) is calculated.
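Assuming the clearly outlined gap is available as a binary mask and the two landmark points are given in pixel coordinates, this computation can be sketched as:

```python
import numpy as np

def nmgga(gap_mask: np.ndarray, anterior_commissure: np.ndarray,
          vocal_fold_end: np.ndarray) -> float:
    """Normalized membranous glottal gap area.

    The gap area is taken as the number of nonzero mask pixels; the vocal
    fold length L is the straight-line distance (in pixels) between the
    anterior commissure and the posterior end of one vocal fold.
    """
    gap_area = float(np.count_nonzero(gap_mask))
    length = float(np.linalg.norm(anterior_commissure.astype(float) -
                                  vocal_fold_end.astype(float)))
    return gap_area / (length ** 2) * 100.0
```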
Additionally, the amplitude of vocal fold vibration L1 is obtained by marking, in the clearly outlined membranous glottal gap 171 of the membranous glottal gap filters 21A, the line linking the left vocal fold 10 (or the right vocal fold 11) and the anterior commissure 12 during vocal fold adduction, and by calculating the longest distance, measured perpendicular to that line, from the side edge of the membranous glottal gap 171 to the line.
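A minimal sketch of this measurement, assuming the side-edge points of the gap and the two landmarks are given as pixel coordinates:

```python
import numpy as np

def vibration_amplitude(edge_points: np.ndarray, anterior_commissure: np.ndarray,
                        vocal_fold_point: np.ndarray) -> float:
    """Longest perpendicular distance from the gap's side edge to the line
    linking one vocal fold and the anterior commissure (in pixels).

    edge_points: (N, 2) array of (x, y) coordinates along the side edge.
    """
    p0 = anterior_commissure.astype(float)
    direction = vocal_fold_point.astype(float) - p0
    direction /= np.linalg.norm(direction)
    offsets = edge_points.astype(float) - p0
    # Perpendicular distance = magnitude of the 2D cross product with the unit direction.
    distances = np.abs(offsets[:, 0] * direction[1] - offsets[:, 1] * direction[0])
    return float(distances.max())
```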
Step S06: prepare a report and a graph: after obtaining the medical parameter in Step S05, a graph 30 is prepared with the time or the frame order as a horizontal axis and the normalized membranous glottal gap area as a vertical axis as shown in
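As a simple illustration of such a graph (matplotlib is assumed here purely as an example; the values would be the per-frame results of Step S05):

```python
import matplotlib.pyplot as plt

def plot_nmgga(values):
    """Plot the normalized membranous glottal gap area against frame order."""
    plt.plot(range(1, len(values) + 1), values, marker="o")
    plt.xlabel("Frame order")
    plt.ylabel("Normalized membranous glottal gap area")
    plt.show()
```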
The medical parameter and image report could include one or more medical parameters. Since the data of the medical parameters is already obtained in the step of obtaining the medical parameter (Step S05), Step S06 could be performed selectively. Step S06 outputs the medical parameter and the medical image obtained in Step S05 as a readable and user-friendly report.
In the current embodiment, the deep learning image recognition and segmentation software B is U-Net. Therefore, in Step S04 of recognizing a membranous glottal gap in the glottis image, before the glottal gap 171 is recognized by U-Net, the glottis images are compressed in advance into glottis images 21 of a small size, such as 128 pixels×128 pixels, as shown in
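A sketch of this pre-compression and the corresponding up-scaling of the predicted mask, where `unet_predict` stands for the trained U-Net and is an assumption for illustration:

```python
import cv2
import numpy as np

def segment_at_unet_size(glottis: np.ndarray, unet_predict) -> np.ndarray:
    """Compress the glottis image to the U-Net input size (128x128 here),
    segment the membranous glottal gap, and scale the mask back up."""
    h, w = glottis.shape[:2]
    small = cv2.resize(glottis, (128, 128), interpolation=cv2.INTER_AREA)
    mask_small = unet_predict(small)                 # binary 128x128 mask
    return cv2.resize(mask_small.astype(np.uint8), (w, h),
                      interpolation=cv2.INTER_NEAREST)
```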
The present invention further provides a system for calculating parameters in a larynx image with artificial intelligence assistance. Referring to
The input unit 40 could be an input interface, a card reader, or a network card and is configured to receive a medical image, wherein the medical image includes a larynx image 20, a plurality of larynx images 20 captured frame-by-frame, or a larynx video that is taken when the vocal folds are in the phonating state.
The processing unit 41 includes a computing, controlling, and memory module. The processing unit 41 is in signal connection with the input unit 40 to receive the medical image. The processing unit 41 performs a deep learning algorithm to process the medical image received from the input unit 40. The deep learning algorithm includes a deep learning object detection software A and a deep learning image recognition and segmentation software B. In the current embodiment, the deep learning object detection software A is YOLO and is trained by a plurality of larynx images 20 with a manually marked glottis image region; the deep learning image recognition and segmentation software B is U-Net and is trained by a plurality of glottis images 21 with a manually marked membranous glottal gap 171.
The processing unit 41 performs Step S03 of recognizing a glottis image. As shown in
The output unit 42 could be an output interface, a printer, or a display screen and is in signal connection with the processing unit 41. The output unit 42 receives the at least one medical parameter and the medical image having the at least one vocal fold anatomy mark from the processing unit 41, wherein the at least one medical parameter includes the normalized membranous glottal gap area of the membranous glottal gap 171 or the amplitude of vocal fold vibration L1. The output unit 42 performs Step S06 of preparing a report and a graph. The output unit 42 outputs the medical image, such as the glottis image(s) 21 having the at least one vocal fold anatomy mark, and the value of the at least one medical parameter corresponding to the glottis image(s) 21 as a medical parameter and image report. As shown in
It must be pointed out that the embodiments described above are only some preferred embodiments of the present invention. All equivalent structures which employ the concepts disclosed in this specification and the appended claims should fall within the scope of the present invention.
Foreign application priority data: Taiwan (TW) national application No. 111143975, filed November 2022.