This application claims the benefit of priority to Patent Application No. 202011187504.0, filed in China on Oct. 30, 2020, the entirety of which is incorporated herein by reference for all purposes.
The disclosure generally relates to artificial intelligence and, more particularly, to methods, computer program products and apparatuses for diagnosing tongues based on deep learning.
Tongue diagnosis in Chinese medicine is a method of diagnosing disease and disease patterns by visual inspection of the tongue and its various features. The tongue provides important clues reflecting the conditions of the internal organs. Like other diagnostic methods, tongue diagnosis is based on the “outer reflects the inner” principle of Chinese medicine, namely that external structures often reflect the conditions of the internal structures and can give important indications of internal disharmony. Conventionally, various image recognition algorithms are used to perform computer-implemented tongue diagnosis. However, such algorithms can only identify a limited set of tongue characteristics related to color. Thus, it is desirable to have methods, computer program products and apparatuses for diagnosing tongues that identify more tongue characteristics than are recognized by these image recognition algorithms.
In an aspect of the invention, the invention introduces a method for diagnosing tongues based on deep learning, performed by a processing unit of a tablet computer, including: obtaining a shooting photo through a camera module of the tablet computer; inputting the shooting photo to a convolutional neural network (CNN) to obtain classification results of different categories, which are associated with a tongue of the shooting photo; and displaying a screen of a tongue-diagnosis application on a display panel of the tablet computer, where the screen includes the classification results of the categories.
In another aspect of the invention, the invention introduces a non-transitory computer program product for diagnosing tongues based on deep learning, including program code which, when executed by a processing unit of a tablet computer, performs the steps of the aforementioned method.
In still another aspect of the invention, the invention introduces an apparatus for diagnosing tongues based on deep learning to include a camera module; a display panel; and a processing unit. The processing unit is arranged operably to obtain a shooting photo through the camera module; input the shooting photo to a CNN to obtain classification results of different categories, which are associated with a tongue of the shooting photo; and display a screen of a tongue-diagnosis application on the display panel, where the screen includes the classification results of the categories.
Both the foregoing general description and the following detailed description are examples and explanatory only, and are not restrictive of the invention as claimed.
Reference is made in detail to embodiments of the invention, which are illustrated in the accompanying drawings. The same reference numbers may be used throughout the drawings to refer to the same or like parts, components, or operations.
The present invention will be described with respect to particular embodiments and with reference to certain drawings, but the invention is not limited thereto and is only limited by the claims. It will be further understood that the terms “comprises,” “comprising,” “includes” and/or “including,” when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Use of ordinal terms such as “first”, “second”, “third”, etc., in the claims to modify a claim element does not by itself connote any priority, precedence, or order of one claim element over another or the temporal order in which acts of a method are performed; such terms are used merely as labels to distinguish one claim element having a certain name from another element having the same name (but for use of the ordinal term).
It will be understood that when an element is referred to as being “connected” or “coupled” to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. In contrast, when an element is referred to as being “directly connected” or “directly coupled” to another element, there are no intervening elements present. Other words used to describe the relationship between elements should be interpreted in a like fashion (e.g., “between” versus “directly between,” “adjacent” versus “directly adjacent,” etc.).
In some implementations, a tongue-diagnosis application may use various image recognition algorithms to identify characteristics of tongues in images. Conventionally, such algorithms have better recognition results for features that are highly related to colors, such as “tongue color,” “moss color,” etc. However, such algorithms are less effective at identifying tongue characteristics that are not highly related to colors, such as “tongue shape,” “tongue coating,” “saliva,” “tooth-marked tongue,” “red spots,” “black spots,” “cracked tongue,” etc.
To overcome the drawbacks of the image recognition algorithms, an embodiment of the invention introduces the method for diagnosing tongues based on deep learning, including three phases: training, verification, and real-time judgment. Refer to
In the verification phase, the training apparatus 110 receives images 125 (also referred to as verification images) including a variety of tongues, and answers in each image, where each answer is associated with a specific category. Subsequently, the verification images 125 are input to the trained tongue-diagnosis model 130 to classify each verification image 125 after proper image pre-processing into resulting items of different categories. The training apparatus 110 compares the answers associated with the verification images 125 with the classification results of the verification images 125 by the tongue-diagnosis model 130 to determine whether the accuracy of the tongue-diagnosis model 130 has passed the examination accordingly. If so, the tongue-diagnosis model 130 is provided to the tablet computer 140; otherwise, the deep learning parameters are adjusted to retrain the tongue-diagnosis model 130.
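The following is a minimal sketch of the verification step described above for one category, assuming the trained tongue-diagnosis model exposes a Keras-style predict() method. The names verification_images, answers and ACCURACY_THRESHOLD are illustrative assumptions and do not appear in the disclosure.

```python
import numpy as np

ACCURACY_THRESHOLD = 0.9  # hypothetical pass mark for the examination

def passes_examination(model, verification_images, answers):
    """Compare the model's classification results with the known answers."""
    # verification_images: pre-processed images, shape (N, H, W, 3)
    # answers: integer class labels of one category, shape (N,)
    predictions = np.argmax(model.predict(verification_images), axis=-1)
    accuracy = float(np.mean(predictions == answers))
    return accuracy >= ACCURACY_THRESHOLD

# If passes_examination(...) returns False, the deep-learning parameters are
# adjusted and the tongue-diagnosis model is retrained, as described above.
```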
Refer to
Refer to
In the tablet computer 140, the input device 430 includes a camera module for sensing the R, G and B light strength at a specific focal length, and a digital signal processor (DSP) for generating the shooting photo 150 of a patient according to the sensed values. One surface of the tablet computer 140 may be provided with the display panel for displaying the screen 30 of the tongue-diagnosis application, and the other surface thereof may be provided with the camera module.
In some embodiments for the training phase, the outcome of deep learning (that is, the tongue-diagnosis model 130) may be a convolutional neural network (CNN). The CNN is a simplified artificial neural network (ANN) architecture, which filters out some parameters that are not actually used in image processing, so that it uses fewer parameters than a deep neural network (DNN) and improves training efficiency. The CNN is composed of convolution layers and pooling layers with associated weights, and a fully connected layer on the top.
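The following is a minimal Keras sketch of the CNN structure described above: stacked convolution and max-pooling layers, followed by a flatten step and a fully connected (dense) layer. The input size, filter counts and the nine-way output are illustrative assumptions, not values taken from the disclosure.

```python
from tensorflow.keras import layers, models

def build_cnn(input_shape=(224, 224, 3), num_classes=9):
    """Convolution and pooling layers topped by a fully connected layer."""
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), activation="relu"),    # convolution layer
        layers.MaxPooling2D((2, 2)),                     # pooling layer
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(num_classes, activation="softmax"), # fully connected layer
    ])
```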
In some embodiments for establishing the tongue-diagnosis models 130, the training images 120 and all the tags of different categories for each training image 120 are input to deep learning algorithms to generate a full-detection CNN for recognizing the shooting photo 150. Refer to
Step S510: The training images 120 are collected and each training image is attached with tags of different categories. For example, one training image carries tags of the nine categories as {“light white,” “normal,” “white,” “thin moss,” “averaged,” “no,” “yes,” “no,” “yes.”}
Step S520: The variable j is set to 1.
Step S531: The j-th (i.e. first) convolution operation is performed on the collected training images 120 according to their tags of different categories to generate convolution layers and the associated weights.
Step S533: The j-th max pooling operation is performed on the convolution results to generate pooling layers and the associated weights.
Step S535: It is determined whether the variable j equals MAX(j). If so, the process proceeds to step S541; otherwise, the process proceeds to step S537. MAX(j) is a preset constant used to indicate the maximum number of executions of convolution and max pooling operations.
Step S537: The variable j is set to j+1.
Step S539: The j-th convolution operation is performed on the max-pooling results to generate convolution layers and the associated weights.
In other words, steps S533 to S539 form a loop that is executed MAX(j) times.
Step S550: The previous calculation results (such as the convolution layers, the pooling layers, the associated weights, etc.) are flattened to generate the full-detection CNN. For example, the full-detection CNN is capable of determining the classified item of each of the aforementioned nine categories from one shooting photo.
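The following is a minimal sketch of a full-detection CNN corresponding to steps S510 to S550, assuming TensorFlow/Keras: a shared convolution/max-pooling stack is flattened and branched into one softmax head per category, so a single shooting photo yields the classified item of every category at once. The category names, item counts, layer sizes and MAX_J value are illustrative assumptions.

```python
from tensorflow.keras import layers, Model

CATEGORY_ITEMS = {  # hypothetical: number of classified items per category
    "tongue_color": 5, "moss_color": 4, "tongue_shape": 3,
    "tongue_coating": 4, "saliva": 3, "tooth_marked": 2,
    "red_spots": 2, "black_spots": 2, "cracked_tongue": 2,
}
MAX_J = 3  # MAX(j): rounds of convolution followed by max pooling

inputs = layers.Input(shape=(224, 224, 3))
x = inputs
for j in range(MAX_J):                              # steps S531/S539 and S533
    x = layers.Conv2D(32 * (2 ** j), (3, 3), activation="relu")(x)
    x = layers.MaxPooling2D((2, 2))(x)
x = layers.Flatten()(x)                             # step S550
outputs = [layers.Dense(n, activation="softmax", name=name)(x)
           for name, n in CATEGORY_ITEMS.items()]
full_detection_cnn = Model(inputs, outputs)
full_detection_cnn.compile(
    optimizer="adam",
    loss=["sparse_categorical_crossentropy"] * len(CATEGORY_ITEMS))
# Training would fit this model against the tags of all nine categories
# attached to each training image (step S510).
```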
In alternative embodiments for establishing the tongue-diagnosis models 130, multiple partial-detection CNNs are generated and each partial-detection CNN is capable of determining the classified item of one designated category. Refer to
Step S610: The variable i is set to 1.
Step S620: The training images 120 are collected and each training image is attached with a tag of the i-th category.
Step S630: The variable j is set to 1.
Step S641: The j-th (i.e. first) convolution operation is performed on the collected training images 120 according to their tags of the i-th category to generate convolution layers and the associated weights.
Step S643: The j-th max pooling operation is performed on the convolution results to generate pooling layers and the associated weights.
Step S645: It is determined whether the variable j equals MAX(j). If so, the process proceeds to step S650; otherwise, the process proceeds to step S647. MAX(j) is a preset constant used to indicate the maximum number of executions of convolution and max pooling operations.
Step S647: The variable j is set to j+1.
Step S649: The j-th convolution operation is performed on the max-pooling results to generate convolution layers and the associated weights.
Step S650: The previous calculation results (such as the convolution layers, the pooling layers, the associated weights, etc.) are flattened to generate the partial-detection CNN for the i-th category. The partial-detection CNN for the i-th category is capable of determining the classified item of the i-th category from one shooting photo.
Step S660: It is determined whether the variable i equals MAX(i). If so, the process ends; otherwise, the process proceeds to step S670. MAX(i) is a preset constant used to indicate the total number of the categories.
Step S670: The variable i is set to i+1.
In other words, steps S620 to S670 form an outer loop that is executed MAX(i) times and steps S643 to S649 form an inner loop that is executed MAX(j) times.
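The following is a minimal sketch of building the partial-detection CNNs of steps S610 to S670, assuming TensorFlow/Keras. The outer loop over categories builds one small CNN per category, and the inner convolution/max-pooling loop mirrors steps S641 to S649. The category names, item counts and layer sizes are illustrative assumptions.

```python
from tensorflow.keras import layers, models

CATEGORY_ITEMS = {  # hypothetical: number of classified items per category
    "tongue_color": 5, "moss_color": 4, "tongue_shape": 3,
    "tongue_coating": 4, "saliva": 3, "tooth_marked": 2,
    "red_spots": 2, "black_spots": 2, "cracked_tongue": 2,
}
MAX_J = 3  # MAX(j): rounds of convolution followed by max pooling

def build_partial_detection_cnn(num_items, input_shape=(224, 224, 3)):
    """One CNN that classifies a single designated category."""
    model = models.Sequential([layers.Input(shape=input_shape)])
    for j in range(MAX_J):                        # steps S641/S649 and S643
        model.add(layers.Conv2D(32 * (2 ** j), (3, 3), activation="relu"))
        model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Flatten())                   # step S650
    model.add(layers.Dense(num_items, activation="softmax"))
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    return model

# Outer loop of steps S620 to S670: one partial-detection CNN per category.
partial_detection_cnns = {
    name: build_partial_detection_cnn(num_items)
    for name, num_items in CATEGORY_ITEMS.items()
}
```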
The processing unit 410 may execute various convolution algorithms known to those skilled in the art to realize steps S531, S539, S641 and S649, execute various max pooling algorithms known to those skilled in the art to realize steps S533 and S643, and execute various flattening algorithms known to those skilled in the art to realize steps S550 and S650, and the detailed algorithms are omitted herein for brevity.
In the real-time judgment phase, if the storage device 440 of the tablet computer 140 stores the full-detection CNN established by the method as shown in
Step S710: The shooting photo 150 is obtained.
Step S720: The shooting photo 150 is input to the full-detection CNN to obtain the classification results of all categories. For example, the classification results of the aforementioned nine categories are {“light red,” “normal,” “white,” “thin moss,” “averaged,” “no,” “no,” “no,” “no.”}
Step S730: The classification results 360 of the screen 30 of the tongue-diagnosis application are updated accordingly.
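The following is a minimal sketch of steps S710 to S730 for the full-detection CNN, assuming the multi-output Keras model sketched earlier. The helper names capture_shooting_photo and update_screen stand in for the camera-module and screen-update code of the tongue-diagnosis application and are hypothetical.

```python
import numpy as np

def judge_with_full_detection_cnn(full_detection_cnn, shooting_photo):
    """Return the most probable item of every category for one shooting photo."""
    # shooting_photo: a pre-processed image batch of shape (1, H, W, 3)
    per_category_outputs = full_detection_cnn.predict(shooting_photo)  # step S720
    # One softmax vector per category; take the index of the most probable item.
    return [int(np.argmax(output, axis=-1)[0]) for output in per_category_outputs]

# classification_results = judge_with_full_detection_cnn(cnn, capture_shooting_photo())
# update_screen(classification_results)  # step S730
```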
In the real-time judgment phase, if the storage device 440 of the tablet computer 140 stores the partial-detection CNNs established by the method as shown in
Step S810: The shooting photo 150 is obtained.
Step S820: The variable i is set to 1.
Step S830: The shooting photo 150 is input to the partial-detection CNN for the i-th category to obtain the classification result of the i-th category.
Step S840: It is determined whether the variable i equals MAX(i). If so, the process proceeds to step S860; otherwise, the process proceeds to step S850. MAX(i) is a preset constant used to indicate the total number of the categories.
Step S850: The variable i is set to i+1.
Step S860: The classification results 360 of the screen 30 of the tongue-diagnosis application are updated accordingly.
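The following is a minimal sketch of steps S810 to S860: the shooting photo is fed to the partial-detection CNN of each category in turn, and the collected results update the screen. Dictionary iteration replaces the explicit counter i of the flowchart; the helper names are hypothetical.

```python
import numpy as np

def judge_with_partial_detection_cnns(partial_detection_cnns, shooting_photo):
    """Classify one shooting photo with the partial-detection CNN of each category."""
    results = {}
    for category, cnn in partial_detection_cnns.items():  # steps S820 to S850
        output = cnn.predict(shooting_photo)               # step S830
        results[category] = int(np.argmax(output, axis=-1)[0])
    return results                                          # used in step S860
```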
Since the numbers of training and verification samples affect the accuracy and the learning time of deep learning, in some embodiments, for each partial-detection CNN, the ratio of the total numbers of the training images 120, the verification images 125 and the test photos could be set to 17:2:1.
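The following is a minimal sketch of splitting a sample set into training, verification and test photos in the 17:2:1 ratio mentioned above, using scikit-learn's train_test_split. The variable names are illustrative.

```python
from sklearn.model_selection import train_test_split

def split_samples(images, labels, seed=0):
    """Split samples into training, verification and test sets at 17:2:1."""
    # 17:2:1 -> the test set is 1/20 of the whole; the verification set
    # is then 2/19 of the remaining samples.
    rest_x, test_x, rest_y, test_y = train_test_split(
        images, labels, test_size=1 / 20, random_state=seed)
    train_x, verify_x, train_y, verify_y = train_test_split(
        rest_x, rest_y, test_size=2 / 19, random_state=seed)
    return (train_x, train_y), (verify_x, verify_y), (test_x, test_y)
```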
Some or all of the aforementioned embodiments of the method of the invention may be implemented in a computer program, such as program code in a specific programming language, or others. Other types of programs may also be suitable, as previously explained. Since the implementation of the various embodiments of the present invention into a computer program can be achieved by the skilled person using routine skills, such an implementation will not be discussed for reasons of brevity. The computer program implementing one or more embodiments of the method of the present invention may be stored on a suitable computer-readable data carrier such as a DVD, CD-ROM, USB stick, or hard disk, which may be located in a network server accessible via a network such as the Internet, or any other suitable carrier.
Although the embodiment has been described as having specific elements in
While the invention has been described by way of example and in terms of the preferred embodiments, it should be understood that the invention is not limited to the disclosed embodiments. On the contrary, it is intended to cover various modifications and similar arrangements (as would be apparent to those skilled in the art). Therefore, the scope of the appended claims should be accorded the broadest interpretation so as to encompass all such modifications and similar arrangements.
Number | Date | Country | Kind
---|---|---|---
202011187504.0 | Oct 2020 | CN | national