The disclosure generally relates to artificial intelligence and, more particularly, to methods, computer program products and apparatuses for remotely diagnosing tongues based on deep learning.
Tongue diagnosis in Chinese medicine is a method of diagnosing disease and disease patterns by visual inspection of the tongue and its various features. The tongue provides important clues reflecting the conditions of the internal organs. Like other diagnostic methods, tongue diagnosis is based on the “outer reflects the inner” principle of Chinese medicine, which holds that external structures often reflect the conditions of the internal structures and can give important indications of internal disharmony. Conventionally, various image recognition algorithms are used to perform computer-implemented tongue diagnosis. However, these algorithms can only identify a limited set of tongue characteristics related to color. Thus, it is desirable to have methods, computer program products and apparatuses for remotely diagnosing tongues that identify more tongue characteristics than those recognized by the image recognition algorithms.
In an aspect of the invention, the invention introduces a method for remotely diagnosing tongues based on deep learning, performed by a processing unit, including: obtaining a medical-treatment request and medical-record information, which includes a shooting photo, from a client apparatus over a network; inputting the shooting photo to a plurality of partial-detection convolutional neural networks (CNNs) to obtain a plurality of classification results of a plurality of categories, which are associated with a tongue in the shooting photo, wherein a total number of the partial-detection CNNs equals a total number of the categories, and each partial-detection CNN is used to generate a classification result of one corresponding category; displaying a screen of a remote tongue-diagnosis application, which contains the classification results of the categories, on a display unit; obtaining medical advice corresponding to the classification results of the categories; and replying with the medical advice to the client apparatus over the network.
In another aspect of the invention, the invention introduces a non-transitory computer-readable storage medium for remotely diagnosing tongues based on deep learning, which includes program code that, when executed by a processing unit, performs the steps of the aforementioned method.
In still another aspect of the invention, the invention introduces an apparatus for remotely diagnosing tongues based on deep learning, which includes a communications interface; a display unit; and a processing unit. The processing unit is arranged operably to: obtain a medical-treatment request and medical-record information, which contains a shooting photo, from a client apparatus through the communications interface over a network; input the shooting photo to a plurality of partial-detection CNNs to obtain a plurality of classification results of a plurality of categories, which are associated with a tongue in the shooting photo, wherein a total number of the partial-detection CNNs equals a total number of the categories, and each partial-detection CNN is used to generate a classification result of one corresponding category; display a screen of a remote tongue-diagnosis application, which contains the classification results of the categories, on the display unit; obtain medical advice corresponding to the classification results of the categories; and reply with the medical advice to the client apparatus through the communications interface over the network.
Both the foregoing general description and the following detailed description are examples and explanatory only, and are not restrictive of the invention as claimed.
Reference is made in detail to embodiments of the invention, which are illustrated in the accompanying drawings. The same reference numbers may be used throughout the drawings to refer to the same or like parts, components, or operations.
The present invention will be described with respect to particular embodiments and with reference to certain drawings, but the invention is not limited thereto and is only limited by the claims. It will be further understood that the terms “comprises,” “comprising,” “includes” and/or “including,” when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Use of ordinal terms such as “first”, “second”, “third”, etc., in the claims to modify a claim element does not by itself connote any priority, precedence, or order of one claim element over another, or the temporal order in which acts of a method are performed; such terms are used merely as labels to distinguish one claim element having a certain name from another element having the same name (but for use of the ordinal term).
It will be understood that when an element is referred to as being “connected” or “coupled” to another element, it can be directly connected or coupled to the other element or intervening elements may be present. In contrast, when an element is referred to as being “directly connected” or “directly coupled” to another element, there are no intervening elements present. Other words used to describe the relationship between elements should be interpreted in a like fashion (e.g., “between” versus “directly between,” “adjacent” versus “directly adjacent,” etc.).
In some implementations, a tongue-diagnosis application may use various image recognition algorithms to identify characteristics of tongues in images. Conventionally, such algorithms have better recognition results for features that are highly related to colors, such as “tongue color,” “moss color,” etc. However, such algorithms less effectively identify the tongue characteristics that are not highly related to colors, such as “tongue shape,” “tongue coating,” “saliva,” “tooth-marked tongue,” “red spots,” “black spots,” “cracked tongue,” etc.
To overcome the drawbacks of the image recognition algorithms, an embodiment of the invention introduces the method for diagnosing tongues based on deep learning, including three phases: training, verification, and real-time judgment. Refer to
In the verification phase, the training apparatus 110 receives images 125 (also referred to as verification images) showing a variety of tongues, together with answers for each image, where each answer is associated with a specific category. Subsequently, the verification images 125 are input to the trained tongue-diagnosis model 130 to classify each verification image 125, after proper image pre-processing, into resulting items of different categories. The training apparatus 110 compares the answers associated with the verification images 125 with the classification results of the verification images 125 by the tongue-diagnosis model 130 to determine whether the accuracy of the tongue-diagnosis model 130 passes the examination. If so, the tongue-diagnosis model 130 is provided to the tablet computer 140; otherwise, the deep learning parameters are adjusted to retrain the tongue-diagnosis model 130.
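By way of illustration only, the accuracy check of the verification phase might resemble the following Python sketch, in which the model handle, the pre-processed verification images, the answer array and the acceptance threshold are hypothetical stand-ins rather than elements disclosed above:

```python
import numpy as np

def passes_verification(tongue_diagnosis_model, verification_images, answers,
                        threshold=0.9):
    """Compare the model's classification results with the known answers.

    verification_images: pre-processed images, shape (N, H, W, 3)
    answers:             ground-truth class indices, shape (N,)
    threshold:           hypothetical accuracy required to pass the examination
    """
    predictions = tongue_diagnosis_model.predict(verification_images)
    predicted_items = np.argmax(predictions, axis=-1)  # classified item per image
    accuracy = np.mean(predicted_items == answers)
    # If the accuracy is insufficient, the deep learning parameters would be
    # adjusted and the model retrained (not shown here).
    return accuracy >= threshold
```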
Refer to
Refer to
In the tablet computer 140, the input device 430 includes a camera module for sensing the R, G and B light strength at a specific focal length, and a digital signal processor (DSP) for generating the shooting photo 150 of a patient according to the sensed values. One surface of the tablet computer 140 may be provided with the display panel for displaying the screen 30 of the tongue-diagnosis application, and the other surface thereof may be provided with the camera module.
In some embodiments for the training phase, the outcome of deep learning (that is, the tongue-diagnosis model 130) may be a convolutional neural network (CNN). The CNN is a simplified artificial neural network (ANN) architecture, which filters out some parameters that are not actually used in image processing, so that it uses fewer parameters than a deep neural network (DNN) and thus improves training efficiency. The CNN is composed of convolution layers and pooling layers with associated weights, and a fully connected layer on the top.
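As a minimal sketch of such a stack (convolution and pooling layers followed by a fully connected layer on top), a Keras model might look as follows; the input size, filter counts and number of output classes are assumptions for the example, not values taken from the disclosure:

```python
import tensorflow as tf

# A simplified CNN: stacked convolution/pooling layers with a fully connected
# (Dense) layer on top, as described above. All hyper-parameters are examples.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, 3)),             # RGB shooting photo
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(5, activation="softmax"),   # e.g. 5 items in one category
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```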
In some embodiments for establishing the tongue-diagnosis models 130, the training images 120 and all the tags of different categories for each training image 120 are input to deep learning algorithms to generate a full-detection CNN for recognizing the shooting photo 150. Refer to
Step S510: The training images 120 are collected and each training image is attached with tags of different categories. For example, one training image carries tags of the nine categories as {“light white,” “normal,” “white,” “thin moss,” “averaged,” “no,” “yes,” “no,” “yes”}.
Step S520: The variable j is set to 1.
Step S531: The j-th (i.e. first) convolution operation is performed on the collected training images 120 according to their tags of different categories to generate convolution layers and the associated weights.
Step S533: The j-th max pooling operation is performed on the convolution results to generate pooling layers and the associated weights.
Step S535: It is determined whether the variable j equals MAX(j). If so, the process proceeds to step S550; otherwise, the process proceeds to step S537. MAX(j) is a preset constant used to indicate the maximum number of executions of convolution and max pooling operations.
Step S537: The variable j is set to j+1.
Step S539: The j-th convolution operation is performed on the max-pooling results to generate convolution layers and the associated weights.
In other words, steps S533 to S539 form a loop that is executed MAX(j) times.
Step S550: The previous calculation results (such as the convolution layers, the pooling layers, the associated weights, etc.) are flattened to generate the full-detection CNN. For example, the full-detection CNN is capable of determining the classified item of each of the aforementioned nine categories from one shooting photo.
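As one hypothetical way to realize steps S510 to S550, the Python/Keras sketch below applies MAX(j) rounds of convolution and max pooling, flattens the result, and attaches one softmax head per category, so that a single full-detection CNN produces the classification results of all nine categories from one photo. The category identifiers mirror the nine categories named in the disclosure, while the item counts per category and all hyper-parameters are illustrative assumptions:

```python
import tensorflow as tf

MAX_J = 3  # preset number of convolution/max-pooling rounds (illustrative)

# Nine categories; the number of classified items per category is assumed.
CATEGORIES = {
    "tongue_color": 5, "tongue_shape": 3, "moss_color": 4, "tongue_coating": 3,
    "saliva": 3, "tooth_marked": 2, "red_spot": 2, "black_spot": 2, "cracked": 2,
}

inputs = tf.keras.Input(shape=(224, 224, 3))
x = inputs
for j in range(MAX_J):                       # steps S531/S533 repeated MAX(j) times
    x = tf.keras.layers.Conv2D(32 * (j + 1), 3, activation="relu")(x)
    x = tf.keras.layers.MaxPooling2D(2)(x)
x = tf.keras.layers.Flatten()(x)             # step S550: flatten the results

# One softmax output per category, so all nine classification results are
# produced from one shooting photo.
outputs = {name: tf.keras.layers.Dense(n, activation="softmax", name=name)(x)
           for name, n in CATEGORIES.items()}

full_detection_cnn = tf.keras.Model(inputs=inputs, outputs=outputs)
full_detection_cnn.compile(optimizer="adam",
                           loss="sparse_categorical_crossentropy")
# Training would then pass the tags per category, keyed by output name:
# full_detection_cnn.fit(training_images, tags_per_category, epochs=10)
```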
In alternative embodiments for establishing the tongue-diagnosis models 130, multiple partial-detection CNNs are generated and each partial-detection CNN is capable of determining the classified item of one designated category. Refer to
Step S610: The variable i is set to 1.
Step S620: The training images 120 are collected and each training image is attached with a tag of the i-th category.
Step S630: The variable j is set to 1.
Step S641: The j-th (i.e. first) convolution operation is performed on the collected training images 120 according to their tags of the i-th category to generate convolution layers and the associated weights.
Step S643: The j-th max pooling operation is performed on the convolution results to generate pooling layers and the associated weights.
Step S645: It is determined whether the variable j equals MAX(j). If so, the process proceeds to step S650; otherwise, the process proceeds to step S647. MAX(j) is a preset constant used to indicate the maximum number of executions of convolution and max pooling operations.
Step S647: The variable j is set to j+1.
Step S649: The j-th convolution operation is performed on the max-pooling results to generate convolution layers and the associated weights.
Step S650: The previous calculation results (such as the convolution layers, the pooling layers, the associated weights, etc.) are flattened to generate the partial-detection CNN for the i-th category. The partial-detection CNN for the i-th category is capable of determining the classified item of the i-th category from one shooting photo.
Step S660: It is determined whether the variable i equals MAX(i). If so, the process ends; otherwise, the process proceeds to step S670. MAX(i) is a preset constant used to indicate the total number of the categories.
Step S670: The variable i is set to i+1.
In other words, steps S620 to S670 form an outer loop that is executed MAX(i) times and steps S643 to S649 form an inner loop that is executed MAX(j) times.
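For comparison, a hypothetical sketch of steps S610 to S670 is given below: one partial-detection CNN is built and trained per category, each with its own MAX(j) rounds of convolution and max pooling followed by a flatten operation. The helper function, item counts and hyper-parameters are assumptions rather than elements of the disclosure:

```python
import tensorflow as tf

MAX_J = 3
CATEGORY_ITEMS = {"tongue_color": 5, "tongue_shape": 3, "moss_color": 4,
                  "tongue_coating": 3, "saliva": 3, "tooth_marked": 2,
                  "red_spot": 2, "black_spot": 2, "cracked": 2}

def build_partial_detection_cnn(num_items):
    """Build one partial-detection CNN for a single category (steps S641-S650)."""
    inputs = tf.keras.Input(shape=(224, 224, 3))
    x = inputs
    for j in range(MAX_J):                   # convolution + max pooling, MAX(j) times
        x = tf.keras.layers.Conv2D(32 * (j + 1), 3, activation="relu")(x)
        x = tf.keras.layers.MaxPooling2D(2)(x)
    x = tf.keras.layers.Flatten()(x)         # step S650: flatten
    outputs = tf.keras.layers.Dense(num_items, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

partial_detection_cnns = {}
for name, num_items in CATEGORY_ITEMS.items():   # outer loop over the i-th category
    model = build_partial_detection_cnn(num_items)
    # training_images and tags_for(name) are hypothetical data loaders:
    # model.fit(training_images, tags_for(name), epochs=10)
    partial_detection_cnns[name] = model
```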
The processing unit 410 may execute various convolution algorithms known to those skilled in the art to realize steps S531, S539, S641 and S649, various max pooling algorithms known to those skilled in the art to realize steps S533 and S643, and various flattening algorithms known to those skilled in the art to realize steps S550 and S650; the detailed algorithms are omitted herein for brevity.
In the real-time judgment phase, if the storage device 440 of the tablet computer 140 stores the full-detection CNN established by the method as shown in
Step S710: The shooting photo 150 is obtained.
Step S720: The shooting photo 150 is input to the full-detection CNN to obtain the classification results of all categories. For example, the classification results of the aforementioned nine categories are {“light red,” “normal,” “white,” “thin moss,” “averaged,” “no,” “no,” “no,” “no”}.
Step S730: The classification results 360 of the screen 30 of the tongue-diagnosis application are updated accordingly.
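Assuming the full-detection CNN was trained with named outputs (as in the earlier sketch) and saved to disk, steps S710 to S720 might be realized as in the following sketch; the file name and the `preprocess` helper are hypothetical:

```python
import numpy as np
import tensorflow as tf

full_detection_cnn = tf.keras.models.load_model("full_detection_cnn.keras")

def classify_all_categories(shooting_photo):
    """Steps S710-S720: run one forward pass and collect every category's result."""
    batch = np.expand_dims(preprocess(shooting_photo), axis=0)  # shape (1, H, W, 3)
    predictions = full_detection_cnn.predict(batch)
    # With named dict outputs, one probability array is returned per category;
    # the most probable item is taken as that category's classification result.
    return {name: int(np.argmax(probs[0])) for name, probs in predictions.items()}

# results = classify_all_categories(shooting_photo_150)
# Step S730: the classification results 360 on screen 30 are then refreshed.
```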
In the real-time judgment phase, if the storage device 440 of the tablet computer 140 stores the partial-detection CNNs established by the method as shown in
Step S810: The shooting photo 150 is obtained.
Step S820: The variable i is set to 1.
Step S830: The shooting photo 150 is input to the partial-detection CNN for the i-th category to obtain the classification result of the i-th category.
Step S840: It is determined whether the variable i equals MAX(i). If so, the process proceeds to step S860; otherwise, the process proceeds to step S850. MAX(i) is a preset constant used to indicate the total number of the categories.
Step S850: The variable i is set to i+1.
Step S860: The classification results 360 of the screen 30 of the tongue-diagnosis application are updated accordingly.
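Similarly, a hypothetical sketch of steps S810 to S850 loops over the categories and queries the corresponding partial-detection CNN for each; the file-name pattern and the `preprocess` helper are assumptions:

```python
import numpy as np
import tensorflow as tf

CATEGORY_NAMES = ["tongue_color", "tongue_shape", "moss_color", "tongue_coating",
                  "saliva", "tooth_marked", "red_spot", "black_spot", "cracked"]

def classify_per_category(shooting_photo):
    batch = np.expand_dims(preprocess(shooting_photo), axis=0)
    results = {}
    for name in CATEGORY_NAMES:              # i = 1 .. MAX(i), steps S830-S850
        # In practice the models would be loaded once and kept in memory.
        model = tf.keras.models.load_model(f"partial_{name}.keras")
        probs = model.predict(batch)[0]
        results[name] = int(np.argmax(probs))  # classification result of category i
    return results

# Step S860: the classification results 360 on screen 30 are updated from `results`.
```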
Since the numbers of training and verification samples affect the accuracy and the learning time of deep learning, in some embodiments, for each partial-detection CNN, the ratio of the total numbers of the training images 120, the verification images 125 and the test photos may be set to 17:2:1.
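One possible way to realize such a 17:2:1 split is sketched below; the shuffling strategy and the function name are illustrative choices, not requirements of the embodiments:

```python
import random

def split_17_2_1(samples, seed=42):
    """Split labeled samples into training/verification/test subsets at 17:2:1."""
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    unit = len(samples) // 20                 # 17 + 2 + 1 = 20 parts in total
    training = samples[: 17 * unit]
    verification = samples[17 * unit: 19 * unit]
    test = samples[19 * unit:]
    return training, verification, test
```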
Refer to
Refer to
Refer to
If the storage device 440 in the remote tongue-diagnosis computer 910 stores the full-detection CNN generated by the method of
Step S1410: The medical-treatment request and the medical-record information are received from the client apparatus over the network 900 through the communications interface 460 in the remote tongue-diagnosis computer 910. The processing unit 410 in the remote tongue-diagnosis computer 910 may execute a background program routine to collect the medical-treatment request and the medical-record information, and store them in the storage device 440 in the remote tongue-diagnosis computer 910. When detecting that the “Open” button 1322 is pressed, the remote tongue-diagnosis application drives the display unit 420 in the remote tongue-diagnosis computer 910 to display a selection screen, which includes multiple entries each including a medical-treatment request with corresponding medical-record information, so that the doctor can choose one entry to deal with. When the doctor completes the selection, the process continues with the following steps.
Step S1422: The shooting photo is obtained from the medical-record information, and the obtained photo is displayed in the preview window 1312.
The technical details of step S1424 are similar to those of step S720, and will not be repeated for the sake of brevity.
Step S1426: The classification results of the screen 1300 of the remote tongue-diagnosis application are updated accordingly. The classification name prompts 1330 include, for example, “Tongue-color,” “Tongue-shape,” “Moss-color,” “Tongue-coating,” “Saliva,” “Tooth-marked tongue,” “Red-spot,” “Black-spot,” and “Cracked-tongue,” and the classification results 1340 are shown under the classification name prompts 1330. The comprehensive summary window 1314 displays a text description of the comprehensive analysis of the classification results 1340.
Step S1432: The QR code is obtained from the medical-record information, and the obtained QR code is displayed in the medication-history window 1360.
Step S1434: The medical prescription database stored in the storage device 440 in the remote tongue-diagnosis computer 910 is searched for the medical prescription associated with the QR code, and the screen 1300 of the remote tongue-diagnosis application is updated accordingly. The remote tongue-diagnosis application may display the associated medical prescription next to the QR code in the medication-history window 1360.
Step S1440: The symptoms of the patient are obtained from the medical-record information to update the screen 1300 of the remote tongue-diagnosis application. The remote tongue-diagnosis application may display the obtained symptoms in the symptom window 1350.
Step S1450: The medical advice is sent in reply to the client apparatus issuing the medical-treatment request over the network 900 through the communications interface 460 of the remote tongue-diagnosis computer 910. Regarding the content of the medical advice, in some embodiments, the doctor may refer to the updated information in the screen 1300 of the remote tongue-diagnosis application and input the medical advice for the patient in the medical-advice text-input box 1370. In other embodiments, in addition to the medical advice, the doctor may further provide a link to the appointment registration system in the medical-advice text-input box 1370, notifying the patient that he or she can enter the appointment registration system for online registration, so that the patient can register at an appropriate time to see the doctor. The link may be a hyperlink, and when the patient clicks or taps the hyperlink in the medical advice with a client apparatus, a browser or a proprietary application running on the client apparatus launches the appointment registration system.

Regarding the way of reply, in some embodiments, when the “reply to patient” button 1326 is pressed, the remote tongue-diagnosis application embeds the content of the medical-advice text-input box 1370 into a specific email template to generate a medical-advice email, searches the patient database stored in the storage device 440 in the remote tongue-diagnosis computer 910 for the email address of the patient, and sends the medical-advice email to that email address over the network 900. In other embodiments, when the “reply to patient” button 1326 is pressed, the remote tongue-diagnosis application embeds the content of the medical-advice text-input box 1370 into a specific message template to generate a medical-advice message, searches the patient database stored in the storage device 440 in the remote tongue-diagnosis computer 910 for the Internet Protocol (IP) address of the patient, and sends the medical-advice message to the message queue associated with that IP address over the network 900. In further embodiments, when the “reply to patient” button 1326 is pressed, the remote tongue-diagnosis application embeds the content of the medical-advice text-input box 1370 into a specific message template to generate a medical-advice message, searches the patient database stored in the storage device 440 in the remote tongue-diagnosis computer 910 for the mobile phone number of the patient, and sends the medical-advice message as a short message to that mobile phone number over the network 900.
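For the email-based way of reply described above, a minimal sketch using Python's standard library is given below; the email template, SMTP host, sender address and the way the patient's email address is obtained are hypothetical placeholders:

```python
import smtplib
from email.message import EmailMessage

# Hypothetical stand-in for the "specific email template" mentioned above.
ADVICE_TEMPLATE = "Dear patient,\n\n{advice}\n\nRemote tongue-diagnosis service"

def send_medical_advice_email(advice_text, patient_email,
                              smtp_host="smtp.example.org",
                              sender="clinic@example.org"):
    """Embed the doctor's advice into an email template and send it to the patient."""
    msg = EmailMessage()
    msg["Subject"] = "Your medical advice"
    msg["From"] = sender
    msg["To"] = patient_email
    msg.set_content(ADVICE_TEMPLATE.format(advice=advice_text))
    with smtplib.SMTP(smtp_host) as server:
        server.send_message(msg)

# The patient's email address would be looked up in the patient database, e.g.:
# send_medical_advice_email(text_from_box_1370, patient_email)
```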
Moreover, when the “Store” button 1324 is pressed, the remote tongue-diagnosis application stores relevant information appearing in the screen 1300 in the storage device 440 in the remote tongue-diagnosis computer 910 in a specific data structure.
If the storage device 440 in the remote tongue-diagnosis computer 910 stores the partial-detection CNNs generated by the method of
The difference between the methods of
Since the CNN theoretically has multi-dimensional classification capabilities, the technical solution described in
Some or all of the aforementioned embodiments of the method of the invention may be implemented in a computer program, such as program code in a specific programming language, or others. Other types of programs may also be suitable, as previously explained. Since the implementation of the various embodiments of the present invention into a computer program can be achieved by the skilled person using routine skills, such an implementation will not be discussed for reasons of brevity. The computer program implementing some or all embodiments of the method of the present invention may be stored on a suitable computer-readable data carrier such as a DVD, CD-ROM, USB stick, or a hard disk, which may be located in a network server accessible via a network such as the Internet, or on any other suitable carrier.
Although the embodiment has been described as having specific elements in
While the invention has been described by way of example and in terms of the preferred embodiments, it should be understood that the invention is not limited to the disclosed embodiments. On the contrary, it is intended to cover various modifications and similar arrangements (as would be apparent to those skilled in the art). Therefore, the scope of the appended claims should be accorded the broadest interpretation so as to encompass all such modifications and similar arrangements.
Number | Date | Country | Kind |
---|---|---|---|
202011187504.0 | Oct 2020 | CN | national |
202111058461.0 | Sep 2021 | CN | national |
This application is a Continuation-In-Part of and claims the benefit of priority to U.S. patent application Ser. No. 17/099,961, filed on Nov. 17, 2020, which claims the benefit of priority to Patent Application No. 202011187504.0, filed in China on Oct. 30, 2020; and this application also claims the benefit of priority to Patent Application No. 202111058461.0, filed in China on Sep. 10, 2021; the entirety of which is incorporated herein by reference for all purposes.
 | Number | Date | Country
---|---|---|---
Parent | 17099961 | Nov 2020 | US
Child | 17510541 |  | US