The present invention generally relates to a chart processing technology, and more specifically, to a method, device and storage medium for recognizing charts.
For patients with hearing loss, the most basic and commonly used test in clinical hearing clinics is the pure tone hearing threshold test, and the test results are generally represented by audiograms. Based on the test results presented by the audiograms, whether a patient's hearing has deteriorated, and to what extent, can be accurately assessed.
Pure tone refers to a sound with a single frequency component, such as a 500 Hz single frequency tone, a 1000 Hz single frequency tone, etc.; and hearing threshold refers to the minimum loudness of sound that a patient can subjectively perceive in more than 50% of presentations during the test, which can be 30 dB, 40 dB, etc. At the same time, because sound can be transmitted through air conduction and bone conduction, audiograms sometimes include test results under both air conduction and bone conduction, but most results are obtained under air conduction. Air conduction is the conduction of sound through air, passing through the auricle, external auditory canal, tympanic membrane and ossicular chain to the oval window, and then into the inner ear. Bone conduction is the transmission of sound directly through the skull to the inner ear.
A standard audiogram usually contains an abscissa representing sound frequency and an ordinate representing loudness of sound. The abscissa usually contains a plurality of sound frequency coordinate axis labels each assigned with a fixed hertz number, and the ordinate usually contains a plurality of loudness of sound coordinate axis labels each assigned with a fixed decibel number. Audiograms generally contain one, two or four curves. In most cases, the audiogram may contain two curves, namely a left ear air conduction curve and a right ear air conduction curve. When the audiogram contains four curves, the four curves are air conduction curves for the left and right ears as well as bone conduction curves for the left and right ears. Each air conduction or bone conduction curve includes a plurality of characteristic labels, and the color and shape of the characteristic labels indicate different detection types. Under standard conditions, blue represents left ear, red represents right ear, “O” represents right ear air conduction, “X” represents left ear air conduction, “<” represents right ear bone conduction, and “>” represents left ear bone conduction.
Audiograms are usually used by physicians or audiologists to provide patients with hearing aids that are suitable for them. When physicians or audiologists obtain a patient's audiograms, they may need to read the characteristic label values on the curves in the audiograms themselves, and then manually input them into the fitting software of hearing aids from different manufacturers to obtain the parameters or values of the hearing aids. The obtained parameters can be written into the hearing aids to configure them. This process is cumbersome. Sometimes the labels are difficult to distinguish because they overlap with each other or are printed in black and white. The process of recognizing such images manually is slow and error-prone.
Therefore, it is desired to provide an improved method and device for recognizing audiograms.
An objective of the present application is to provide a method and a device for recognizing charts, especially audiograms, to solve the problem of manual reading of charts that is error-prone, time-consuming and labor-intensive.
In one aspect of the present application, a method for recognizing a chart is provided, wherein the method comprises: acquiring an object image containing a chart, wherein the chart comprises a labeled area defined by a first coordinate axis and a second coordinate axis that intersect with each other, first coordinate labels along the first coordinate axis, second coordinate labels along the second coordinate axis, and a plurality of characteristic labels within the labeled area; processing the object image with a trained first neural network to identify and separate the chart from the object image; processing the chart with a trained second neural network to identify the first coordinate labels, the second coordinate labels and the plurality of characteristic labels; generating a chart coordinate system based on the identified first coordinate labels and second coordinate labels, wherein the chart coordinate system fits the first coordinate axis and the second coordinate axis of the object image; and determining coordinate values of each of the plurality of characteristic labels based on an identified position of the characteristic label in the chart coordinate system.
In some embodiments, after the step of processing the object image with the trained first neural network, the method further comprises: rotating the chart to extend the first coordinate axis generally in a horizontal direction and the second coordinate axis generally in a vertical direction.
In some embodiments, the step of rotating the chart further comprises: determining a first angle to be rotated for the first coordinate axis and a second angle to be rotated for the second coordinate axis using Hough straight line transformation method; and rotating the first coordinate axis and the second coordinate axis based on the determined first and second angles to be rotated.
In some embodiments, the trained first neural network and the trained second neural network are trained with different data sets.
In some embodiments, the first neural network and the second neural network use the same neural network algorithm.
In some embodiments, the first neural network and the second neural network both use the Faster Region Based Convolutional Neural Network (Faster RCNN) algorithm in combination with the Feature Pyramid Network (FPN) algorithm.
In some embodiments, the second neural network is trained with a synthesized training data set, and the synthesized training data set comprises a plurality of synthesized audiograms each including a background image and coordinate labels superimposed on the background image, and wherein the coordinate labels are generated based on one or more character libraries.
In some embodiments, the synthesized audiogram further comprises interference labels superimposed on the background image.
In some embodiments, the step of generating a chart coordinate system based on the identified first coordinate labels and second coordinate labels further comprises: using Huber regression algorithm to fit the chart coordinate system to the first coordinate axis and the second coordinate axis.
In some embodiments, the step of generating a chart coordinate system based on the identified first coordinate labels and second coordinate labels further comprises: using the random sample consensus (RANSAC) algorithm to spatially fit the chart coordinate system to the first coordinate labels and to the second coordinate labels respectively; and using the RANSAC algorithm to numerically fit at least a part of the first coordinate labels and at least a part of the second coordinate labels so as to generate the first coordinate axis and the second coordinate axis.
In some embodiments, the step of determining coordinate values of each of the plurality of characteristic labels based on an identified position of the characteristic label in the chart coordinate system comprises: projecting each of the characteristic labels onto the first coordinate axis to determine a first coordinate value of the characteristic label; projecting each of the characteristic labels onto the second coordinate axis to determine a second coordinate value of the characteristic label; and combining the first coordinate value and the second coordinate value for each characteristic label.
In some embodiments, the chart is an audiogram, the first coordinate axis represents sound frequency, the second coordinate axis represents loudness of sound, the first coordinate axis labels are frequency values, the second coordinate axis labels are loudness values, and the coordinate values of each characteristic label comprise a pair of a frequency value and a loudness value.
In some embodiments, the characteristic labels further comprise left ear characteristic labels each representing left ear hearing and right ear characteristic labels each representing right ear hearing.
In some embodiments, the characteristic labels further comprise left ear air conduction characteristic labels or left ear bone conduction characteristic labels each representing left ear hearing, and right ear air conduction characteristic labels or right ear bone conduction characteristic labels each representing right ear hearing.
In another aspect of the present application, a device for automatically recognizing a chart is also provided, wherein the device comprises a non-transitory computer storage medium on which one or more executable instructions are stored, and the one or more instructions are executable by a processor to perform the steps as mentioned in the above aspect.
In another aspect of the present application, a non-transitory computer storage medium is also provided, wherein one or more executable instructions are stored thereon, and the one or more instructions are executable by a processor to perform the steps as mentioned in the above aspect.
The above is an overview of the present application, which may be simplified, generalized, and presented with details omitted. Therefore, those skilled in the art should realize that this part is only illustrative and is not intended to limit the scope of the application in any way. This summary is neither intended to identify the key features or essential features of the subject matter sought to be protected, nor is it intended to be used as an auxiliary means to determine the scope of the subject matter sought to be protected.
The above and other features of the content of the present application will be more fully understood through the following description and appended claims in combination with the drawings. It can be understood that these drawings only illustrate several implementations of the content of the present application, and therefore should not be considered as limiting the scope of the content of the present application. By adopting the drawings, the content of the present application will be explained more clearly and in detail.
Before explaining any embodiments of the present invention in detail, it shall be understood that the application of the present invention is not limited to the details of the configuration and the arrangement of components set forth in the following description or shown in the following drawings.
The present invention can have other embodiments and can be practiced or implemented in various ways. Moreover, it shall be understood that the wording and terminology used herein are for illustrative purposes and shall not be considered limitative.
In the following detailed description, reference is made to the drawings constituting a part thereof. In the drawings, similar symbols usually indicate similar components, unless the context specifies otherwise. The illustrative embodiments described in the detailed description, drawings, and claims are not intended to limit. Without departing from the spirit or scope of the subject matter of the present application, other embodiments may be applied, and other changes may be made. It can be understood that various aspects of the content of the present application described generally in the present application and illustrated in the drawings can be configured with various configurations, substitutions, combinations, and designs, all of which clearly constitute a part of the content of the present application.
In order to facilitate the processing of charts with standard formats such as audiograms, the inventors of the present application propose a method that uses neural network technology to process images and recognize the charts in the images as well as the values corresponding to the chart labels. This processing method can effectively reduce manual effort and improve processing efficiency. In some embodiments, the chart processing method is executable by an electronic device with computing and data processing capabilities to realize an automated process.
As shown in
Next, in step 204, the object image is processed using a trained first neural network to recognize and separate the chart from the object image.
As mentioned above, the object image may contain text, images or other background irrelevant to the chart, and the irrelevant information may affect the recognition of the chart. In principle, step 204 could be skipped and the object image could be recognized directly to obtain the audiogram. However, since the audiogram only occupies a part, or even a small part, of the entire object image, directly performing the subsequent processing steps would greatly reduce accuracy. Moreover, in some cases, the object image may contain a plurality of audiograms, and skipping step 204 would also make subsequent processing very complicated. Therefore, in order to better perform subsequent recognition of the content in the chart, the object image can be processed first to extract the chart from the area where it is located.
In order to improve the accuracy of extracting the chart, the embodiments of the present application use neural network technology to process the object image. In some embodiments, the first neural network for recognizing the chart may use the Faster-RCNN (Faster Region Based Convolutional Neural Network) model commonly used in target detection. In the process of extracting the chart from the object image, the detection target for the first neural network is a chart similar to an audiogram. It can be understood that the first neural network may be pre-trained on a data set including similar object images and audiograms, so that it can recognize audiograms in a targeted manner. For example, the first neural network can be trained on images with annotated audiogram locations. During training and testing, the audiogram is regarded as the only foreground category, and all other areas are regarded as background.
The first neural network first performs characteristic extraction on the acquired object image through a plurality of convolutional layers, and extracts a characteristic diagram of the entire image. In some embodiments, the first neural network using the Faster RCNN model is mainly composed of two parts, i.e., a first part which is an RPN (Region Proposal Network), and a second part which is Fast RCNN. The RPN is mainly used to extract candidate frames, while the Fast RCNN refines and classifies the extracted candidate frames. Compared with single-stage detection algorithms (such as the YOLO algorithm, etc.), although the processing speed of Faster RCNN is slower, its accuracy is higher, and it is especially suitable for detecting small targets.
The first neural network also uses a feature pyramid network (FPN) model. The FPN is a method that can efficiently extract characteristics of various dimensions in an object image. The FPN fuses the characteristic diagrams of different levels in the convolutional neural network, so that the final fused characteristic diagram contains both high-level semantic information and low-level, more fine-grained information. Since the FPN can effectively improve the detection accuracy, the Faster-RCNN and FPN models are both used in the first neural network of step 204.
It can be understood that although the Faster-RCNN model is adopted as a whole in the first neural network model, some parts of the model can be replaced with algorithms or models that achieve the same function. For example, the classification structure can be replaced by a support vector machine (SVM). In addition, other common target detection models such as Fast-RCNN and YOLO can be used. Moreover, in this type of target detection neural network architecture, the FPN model can be combined with the Faster-RCNN model to fuse the high-resolution information of the low-level characteristics and the high-semantic information of the high-level characteristics, further improving the detection result for the target chart.
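Two-stage detectors of this kind score many overlapping candidate frames and prune them by intersection-over-union (IoU) with non-maximum suppression. As an illustration only (this is not the application's code), a minimal numpy sketch of that pruning step:

```python
import numpy as np

def iou(box, boxes):
    """Intersection-over-union between one box and an array of boxes.
    Boxes are [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box,
    drop any remaining box overlapping it above iou_thresh, repeat."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        overlaps = iou(boxes[i], boxes[rest])
        order = rest[overlaps <= iou_thresh]
    return keep
```

In a full detector, the surviving frames are then passed to the classification head; production implementations use the library-provided NMS rather than a Python loop.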
Specifically, the Hough line detection method transforms an image into a parameter space. When the Hough line detection method is used to process an image, a point in the image is mapped to a curve in the parameter space, and a straight line in the image is mapped to a point in the parameter space. Points on the same straight line in the image correspond to a cluster of curves that intersect at one point in the parameter space, and that intersection point represents the straight line in the image. Since most of the straight lines in the audiogram are parallel to one of the two coordinate axes, the mode of the Hough transform parameter corresponding to the slopes of the straight lines in the audiogram gives the angle by which the audiogram is to be rotated. Preferably, non-maximum suppression can be performed after the Hough transform maps the image into the parameter space, retaining the straight lines with higher confidence in the parameter space.
After the straight line with higher confidence is determined, the chart area can be rotated based on the angle to be rotated between the coordinate axis it represents and the edge of the chart area, so as to compensate for the original deflection angle of the audiogram in the chart area.
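The Hough voting idea described above can be illustrated with a small numpy sketch. It assumes edge points have already been extracted from the chart area; the names and the discretization are illustrative, not taken from the application:

```python
import numpy as np

def dominant_angle(points, n_theta=180, rho_res=1.0):
    """Estimate the dominant line direction among 2-D points via a Hough
    accumulator: each point votes, for every candidate angle theta, for the
    line rho = x*cos(theta) + y*sin(theta) passing through it.  Returns the
    angle (in degrees) of the dominant line's normal."""
    pts = np.asarray(points, float)
    thetas = np.deg2rad(np.arange(n_theta))              # 0..179 degrees
    rho_max = np.abs(pts).sum(axis=1).max() + 1.0        # bound on |rho|
    n_rho = int(2 * rho_max / rho_res) + 1
    acc = np.zeros((n_theta, n_rho), int)
    # rho for every (point, theta) pair, then binned into the accumulator
    rhos = pts[:, 0:1] * np.cos(thetas) + pts[:, 1:2] * np.sin(thetas)
    idx = np.round((rhos + rho_max) / rho_res).astype(int)
    for t in range(n_theta):
        np.add.at(acc[t], idx[:, t], 1)
    t_best, _ = np.unravel_index(acc.argmax(), acc.shape)
    return int(t_best)
```

For grid lines that are nominally horizontal, a returned value of 90 + α indicates the chart is deflected by α degrees, so the chart area would be rotated by −α to compensate.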
It can be understood that, in some embodiments, the subsequent steps may be directly performed without rotating the chart area, or other processing may be performed on the chart area, such as image distortion correction processing.
Still referring to
Specifically, after obtaining the chart area with the audiogram therein, the information in the audiogram can be processed. First, the position and direction of the coordinate axis can be detected. Since the coordinate axis is close to the coordinate axis labels distributed thereon, the position and direction of the coordinate axis can usually be fitted according to the coordinate axis labels on the coordinate axis. Therefore, the coordinate axis labels can be recognized first when the chart is processed.
For standard charts such as audiograms, the coordinate axis labels on the coordinate axes are generally fixed numbers. Therefore, target detection can be performed based on the limited number of fixed types of coordinate axis labels used in the chart to determine the coordinate axis positions. In some embodiments, the second neural network may use the same algorithm or model as the first neural network used for chart detection in step 204, for example, the Faster RCNN in combination with the FPN model. The specific structure of these models will not be repeated here. The second neural network usually also needs to be trained on a specific data set, so that it has the ability to recognize coordinate axis labels and characteristic labels.
In some embodiments, a training data set may be constructed in advance to train the second neural network. For example, the training data set can be constructed in the following way.
First of all, various standard or non-standard label libraries can be used to generate a synthetic training data set. The reason for using a label library is that audiograms are printed, and the coordinate axis labels in an audiogram are also generated from commonly used label libraries. Therefore, once various coordinate axis labels are generated in advance from the label library, the correspondence between the numbers and the coordinate axis labels is directly available. In practical applications, coordinate axis labels in various required formats can be generated in batches without manually labeling the coordinate axis labels in actual audiograms, which reduces processing complexity. Specifically, labels for the various coordinate axis labels (covering a variety of fonts, rotational angles, sizes, etc.) can be generated first. Preferably, some common interference fonts (also covering a variety of fonts, rotational angles, sizes, etc.) can be generated at the same time and included in the synthetic training data set. Adding interference fonts to the synthetic training data set enhances the ability of the second neural network to distinguish the required coordinate axis labels from irrelevant interfering labels.
Afterwards, some relatively random backgrounds (mainly charts, solid color paper (such as white paper or gray paper), etc.) can be generated, and the previously generated fonts can be superimposed on the background at random positions.
Through the above means, a large synthetic training data set containing coordinate axis labels can be generated without time-consuming manual labeling. Not only does this effectively solve the problem of data scarcity caused by the high cost of labeling, it also enables the trained neural network to have good recognition capabilities for various fonts, sizes, and angles, thus further improving the accuracy of coordinate axis label recognition.
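The synthesis procedure above (render label glyphs, paste them on plain backgrounds at random positions, and keep the paste position as the ground-truth box) can be sketched as follows. The glyph here is a placeholder array; a real pipeline would rasterize fonts at varied sizes and rotations, for example with an imaging library:

```python
import numpy as np

rng = np.random.default_rng(0)

def synthesize_sample(glyph, bg_shape=(128, 128)):
    """Paste a pre-rendered label glyph (2-D grayscale array) at a random
    position on a plain background, returning the image and the ground-truth
    box [x1, y1, x2, y2] -- the annotation comes for free, with no manual
    labeling."""
    h, w = glyph.shape
    bg = np.full(bg_shape, 255, dtype=np.uint8)          # white "paper"
    y = rng.integers(0, bg_shape[0] - h)
    x = rng.integers(0, bg_shape[1] - w)
    # dark glyph pixels overwrite the lighter background
    bg[y:y + h, x:x + w] = np.minimum(bg[y:y + h, x:x + w], glyph)
    return bg, [x, y, x + w, y + h]

# a fake 8x12 all-black "glyph" standing in for a rendered coordinate label
glyph = np.zeros((8, 12), dtype=np.uint8)
image, box = synthesize_sample(glyph)
```

Interference fonts would be pasted the same way but recorded with a background (non-label) class, so the detector learns to ignore them.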
It can be understood that although there are relatively few types of characteristic labels, training data related to characteristic labels can also be similarly generated and be used to train the second neural network, which is not elaborated herein.
After the processing by the second neural network, the coordinate axis labels and characteristic labels and their respective positions in the audiogram can be determined. In some embodiments, a left ear air conduction characteristic label or left ear bone conduction characteristic label representing left ear hearing, and a right ear air conduction characteristic label or right ear bone conduction characteristic label representing right ear hearing can both be determined. In some other embodiments, the characteristic labels further include a left ear characteristic label representing left ear hearing and a right ear characteristic label representing right ear hearing. Referring to
In some embodiments, the first coordinate axis label and the second coordinate axis label can be detected using different sub-modules. These two sub-modules generally have the same algorithm and function, but the data sets for training the two sub-modules may not be exactly the same. For example, the abscissa axis labels mainly include 125, 250, 500, 1k, 2k, 4k, 8k, 16k, etc., while the ordinate axis labels mainly include −10, 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, etc. It should be noted that, since the labels that need to be recognized in the standard audiogram are mainly the above-mentioned labels, other numbers or characters usually do not need to be recognized, so the use of neural network technology to recognize such labels has a higher accuracy. In addition, when the neural network is used to recognize the coordinate axis labels, these coordinate axis labels themselves are considered as some image labels, and there is no need to recognize them as characters or numbers, which can save processing resources.
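Because the recognizable labels form a small closed set, the detector can treat each label purely as an image category and look up its numeric value afterwards. A hypothetical lookup helper (the names are illustrative; the table contents follow the label sets listed above):

```python
# Map the printed axis-label strings used in standard audiograms to numeric
# values, so labels can be classified as image categories and converted to
# numbers only after detection.
FREQ_LABELS = {"125": 125, "250": 250, "500": 500,
               "1k": 1000, "2k": 2000, "4k": 4000, "8k": 8000, "16k": 16000}
LOUDNESS_LABELS = {str(v): v for v in range(-10, 130, 10)}   # -10 .. 120 dB

def label_value(text, axis):
    """Return the numeric value of a detected label; axis is 'x' or 'y'."""
    table = FREQ_LABELS if axis == "x" else LOUDNESS_LABELS
    return table[text]
```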
In particular, referring to
It can be understood that, similar to the first neural network, some parts of the second neural network model can also be replaced with an algorithm structure that can achieve the same purpose. For example, the classification structure can be replaced by a support vector machine (SVM). In addition, other common target detection models can be used, such as Fast-RCNN and YOLO.
Next, in step 208, based on the recognized plurality of first coordinate axis labels and plurality of second coordinate axis labels, a chart coordinate system is generated, where the chart coordinate system is used to fit the first coordinate axis and the second coordinate axis. Specifically, after obtaining the positions of the coordinate axis labels, a robust fitting method can be used to fit the coordinate axes using the coordinate axis labels. A robust fitting method reduces the influence of coordinate axis label detection errors.
In some embodiments, the robust fitting method may be the Huber regression fitting method, and the objective function Obj(a) of the method is given by the following equation (1).
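The body of equation (1) is not reproduced in this text. For reference, the standard Huber objective for fitting parameters a with threshold δ, consistent with the description below (quadratic for small residuals, linear for large ones), is:

```latex
\mathrm{Obj}(a) \;=\; \sum_{i} \rho_{\delta}\!\bigl(y_i - f(x_i; a)\bigr),
\qquad
\rho_{\delta}(r) \;=\;
\begin{cases}
\tfrac{1}{2}\,r^{2}, & |r| \le \delta,\\[4pt]
\delta\,|r| \;-\; \tfrac{1}{2}\,\delta^{2}, & |r| > \delta.
\end{cases}
```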
Compared with the traditional linear regression method, which uses the mean square error as the objective function and is therefore sensitive to outliers, the Huber regression fitting method switches to an absolute value error for abnormal (large-residual) data points, giving it a better ability to suppress outliers. For example, the audiogram or other similar standard-format charts shown in
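A common way to minimize the Huber objective is iteratively reweighted least squares. The sketch below fits a straight line this way; it is an illustration of the technique, not the application's implementation:

```python
import numpy as np

def huber_line_fit(x, y, delta=1.0, n_iter=50):
    """Fit y ~ a*x + b with the Huber objective via iteratively reweighted
    least squares: small residuals get weight 1 (quadratic penalty), large
    residuals get weight delta/|r| (linear penalty)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    A = np.stack([x, np.ones_like(x)], axis=1)    # design matrix for a*x + b
    w = np.ones_like(y)
    for _ in range(n_iter):
        Aw = A * w[:, None]
        # weighted least-squares step: solve A^T W A coef = A^T W y
        coef = np.linalg.solve(Aw.T @ A + 1e-12 * np.eye(2), Aw.T @ y)
        r = y - A @ coef
        ar = np.abs(r)
        # Huber weights: 1 inside the quadratic zone, delta/|r| outside
        w = np.where(ar <= delta, 1.0, delta / np.maximum(ar, 1e-12))
    return coef  # (slope, intercept)
```

Libraries such as scikit-learn provide an equivalent `HuberRegressor`; the hand-rolled loop is shown only to make the reweighting explicit.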
Other methods can also be used to fit either coordinate axis. For example, the random sample consensus (RANSAC) algorithm can be used to perform coordinate axis fitting. The RANSAC algorithm is also a robust fitting method. It divides the data into in-group points (inliers) and outlying points (outliers); in an iterative manner, it repeatedly selects random subsets of the data points for fitting, and retains the model with the most in-group points as the final result.
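The RANSAC loop just described can be sketched in a few lines; again this is illustrative, using a two-point minimal sample for a line model:

```python
import numpy as np

def ransac_line(x, y, n_iter=200, tol=1.0, seed=0):
    """RANSAC line fit: repeatedly fit a line through two random points,
    count points within tol of that line (the consensus set), keep the
    model with the largest consensus set, and refit on its inliers."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, float), np.asarray(y, float)
    best_inliers = np.zeros(len(x), bool)
    for _ in range(n_iter):
        i, j = rng.choice(len(x), size=2, replace=False)
        if x[i] == x[j]:
            continue                      # skip vertical pairs in this sketch
        a = (y[j] - y[i]) / (x[j] - x[i])
        b = y[i] - a * x[i]
        inliers = np.abs(y - (a * x + b)) < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # final least-squares refit on the best consensus set
    a, b = np.polyfit(x[best_inliers], y[best_inliers], 1)
    return a, b, best_inliers
```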
As shown in
Then, as shown in
It can be understood that after obtaining the straight lines corresponding to the two coordinate axes by fitting, the coordinate axis labels recognized in step 206 can be combined with each other to generate the chart coordinate system, so that the frequency value or loudness value corresponding to each length of the abscissa axis or the ordinate axis is determined. In some embodiments, it is also possible to generate fitting coordinate lines (not shown in the figures) similar to the respective abscissa and ordinate lines that intersect with each other in the original audiogram in the chart coordinate system.
Next, in step 210, the coordinate values of each characteristic label can be determined based on an identified position of the characteristic label in the chart coordinate system.
After the positions of the coordinate axes and the positions of the characteristic labels are determined, each characteristic label can be projected onto the two coordinate axes determined in step 208 to determine the coordinate values of the characteristic label on the two coordinate axes. In the embodiment of the present application, the slope of the coordinate axis obtained by fitting can be used to calculate an equation of a straight line parallel to the coordinate axis passing through the characteristic label, and to obtain the position of the intersection point of the straight line with each of the coordinate axes. Finally, it can be determined that the coordinate closest to the intersection point on the coordinate axis corresponds to the coordinate value of the characteristic label on the coordinate axis.
Referring to
In some embodiments, respective distances between the above-mentioned intersection points and each frequency axis label can be compared, and the frequency axis label corresponding to the shortest distance is the frequency of the characteristic label m, because the hearing test is performed under the standard frequencies. Similarly, the projection of the characteristic label m on the loudness coordinate axis l can be obtained, that is, the coordinate of the intersection point of the straight line f′ and the straight line l is
In some embodiments, the distance between the respective intersection points and each loudness axis label may be compared, and the loudness axis label corresponding to the shortest distance may be determined as the loudness of the characteristic label m. In other embodiments, the loudness of the characteristic label m may be calculated proportionally based on the respective distances between the loudness coordinate axis labels and the intersection point, or the respective distances between at least two adjacent loudness coordinate axis labels and the intersection point. Therefore, using the above method, the frequency and loudness corresponding to each characteristic label can be calculated, and the audiogram can be completed.
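The projection-and-snap procedure above can be illustrated as follows; the function and argument names are hypothetical, and the axis origin and direction are assumed to come from the fit in step 208. (For the loudness axis, the text notes the value may instead be interpolated proportionally between adjacent labels.)

```python
import numpy as np

def snap_to_axis(point, axis_origin, axis_dir, labels):
    """Project a characteristic-label position onto a fitted axis and snap
    it to the nearest coordinate-axis label.

    axis_origin, axis_dir: a point on the fitted axis and its unit direction.
    labels: (position_along_axis, value) pairs for the recognized coordinate
            axis labels, e.g. [(0.0, 125), (40.0, 250), ...].
    Returns the value of the nearest label."""
    p = np.asarray(point, float) - np.asarray(axis_origin, float)
    t = p @ np.asarray(axis_dir, float)              # scalar projection
    positions = np.array([pos for pos, _ in labels])
    values = [val for _, val in labels]
    return values[int(np.argmin(np.abs(positions - t)))]
```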
In other embodiments, other means may be used to determine the coordinate values of the characteristic labels. For an embodiment that uses two RANSAC algorithm fittings as shown in
In some embodiments, the coordinate values of the characteristic labels can be combined with the object image to facilitate observation by the operator.
The chart recognition method of the present application can accurately and efficiently recognize charts such as audiograms, has strong robustness, and can cover most application scenarios. This application can also effectively promote the automated fitting of hearing aids, bringing convenience to the majority of patients.
The embodiments of the present invention may be implemented by hardware, software, or a combination of software and hardware. The hardware part can be implemented using dedicated logic; the software part can be stored in a memory and executed by an appropriate instruction execution system, such as a microprocessor or dedicated design hardware. Those skilled in the art can understand that the above-mentioned devices and methods can be implemented using computer-executable instructions and/or included in processor control codes; for example, such codes are provided on a carrier medium such as a disk, CD or DVD-ROM, on a programmable memory such as a read-only memory (firmware), or on a data carrier such as an optical or electronic signal carrier. The device and modules thereof of the present invention can be implemented by hardware circuits such as very large scale integrated circuits or gate arrays, by semiconductors such as logic chips and transistors, or by programmable hardware devices such as field programmable gate arrays and programmable logic devices. They can also be implemented by software executed by various types of processors, or by a combination of the above-mentioned hardware circuits and software, such as firmware.
It should be noted that although several steps or modules of the method, device, and storage medium for recognizing charts are mentioned in the above detailed description, this division is only exemplary and not mandatory. In fact, according to the embodiments of the present application, the features and functions of two or more modules described above can be embodied in one module. Conversely, the features and functions of one module described above can be further divided into multiple modules.
Those skilled in the art can understand and implement other changes to the disclosed embodiments by studying the description, the disclosed content, the drawings, and the appended claims. In the claims, the word “comprise” does not exclude other elements and steps, and the words “a” and “an” do not exclude plurals. In the practical application of the present application, one part may perform the functions of multiple technical features cited in the claims. Any reference numerals in the claims should not be construed as limiting the scope.
| Number | Date | Country | Kind |
|---|---|---|---|
| 202110614188.9 | Jun 2021 | CN | national |
| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/CN2022/094420 | 5/23/2022 | WO |