This application is a National Stage of International Application No. PCT/JP2018/029613 filed Aug. 7, 2018, claiming priority based on Japanese Patent Application No. 2017-156115 filed Aug. 10, 2017.
The present invention relates to a method and a device for recognizing a tire image.
Replacing a tire with a new one has been recommended when the tread rubber has decreased due to wear, or when damage due to an external wound or deterioration has occurred, in order to secure tire performance and safety. Information for determining these phenomena has been acquired mainly through appearance observation by visual inspection.
Although determining the amount of wear is important for the running performance and the safety performance of the tire, it is hard to say that drivers inspect their tires as frequently as needed on a daily basis.
Therefore, instead of visual inspection by a human, if the tire information such as the amount of wear can be recognized from images produced by a machine, such as a camera, not only labor-saving in the inspection, but also reduction in management costs can be expected.
In recent years, image processing technology and image recognition technology have advanced significantly, and studies have started on applying them to tire inspection, for example, capturing a tread pattern of a tire and analyzing its aspect ratio and tread groove depth to identify a tire wear amount (see, for example, Patent Document 1).
Patent Document 1: US 2016/0343126
However, Patent Document 1 has a problem: because feature amounts, which are characteristic geometric information such as edges and lines of a tread pattern, are set in advance with the intervention of a person such as a developer, not only are the analysis parameters limited to individual cases, but it also takes a lot of time to analyze a large number of tires.
In addition, the analysis accuracy has been affected by the state of the individual image used, such as its brightness, angle, or size.
The present invention has been made in view of the conventional problems, and aims at providing a method and a device that can easily and reliably recognize a tire type and a wear state from images of the tire.
The present invention provides a tire image recognition method, including: a step of obtaining a plurality of images of tires that differ from one another in either one of or both of a tire type and a tire condition, the obtained images being regarded as teacher images; a step of converting the respective teacher images into a size of a predetermined number of pixels; a step of learning by a convolution neural network using data of the plurality of converted teacher images as learning images, and setting parameters for the convolution neural network; a step of obtaining a tire image of a recognition-target tire and converting the obtained tire image into a size identical to that of the teacher images; and a step of inputting the converted tire image of the recognition-target tire to the convolution neural network and determining either one of or both of the tire type and the tire condition of the recognition-target tire.
Further, the present invention provides a tire image recognition device, including: a tire image capturing means that captures a plurality of teacher images and a recognition target image, the teacher images being images of tires that differ from one another in either one of or both of a tire type and a tire condition; an image data converting means that converts the teacher images and the recognition-target image into a size of a predetermined number of pixels; a feature amount extracting means that extracts feature amounts of the images converted by the image data converting means; and a determining means that compares feature amounts of the recognition target image with feature amounts of the teacher images to determine either one of or the both of the tire type and the tire condition of the target tire, in which the feature amount extracting means includes a convolution layer and a pooling layer of a convolutional neural network in which the teacher images are configured as learning images, and in which the determining means includes a fully connected layer of the convolutional neural network.
The summary of the invention does not enumerate all the features required for the present invention, and sub-combinations of these features may also become the invention.
The tire image recognition device 10 includes a tire image capturing means 11, an image data converting means 12, an image storage means 13, a tire recognizing and determining means 14, and a display means 15, and determines a wear condition of a tire from captured images of the tire.
As the tire image capturing means 11, for example, an image capturing device such as a digital camera or a smart phone is used, and the display means 15 is configured of a display or the like. Incidentally, by shooting a motion picture such as a video, still images of the motion picture may be used.
Further, each of the means, from the image data converting means 12 to the tire recognizing and determining means 14, is configured by storage devices such as a read only memory (ROM) and a random access memory (RAM) and a microcomputer program.
The tire image capturing means 11 obtains the images by capturing images of the surface of a tire 20. More specifically, the tire image capturing means 11 obtains a plurality of images by capturing a plurality of positions on the circumference of the tire 20 (for example, six positions).
The color tone may be either gray scale or RGB; in the present embodiment, however, gray scale images were used since the tire is black. Because only one channel is then required, the amount of image information can be reduced. Incidentally, as the image gradations, the 255 gray scale gradations normalized to the range of 0 to 1 were used.
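The gradation normalization described above can be sketched as follows. This is only a minimal illustration, assuming 8-bit gray scale values with a maximum level of 255 (the value given in the example conditions); the function name `normalize_gray` is hypothetical and not part of the embodiment.

```python
def normalize_gray(image, max_level=255):
    """Scale integer gray levels (0..max_level) into the range 0 to 1."""
    return [[pixel / max_level for pixel in row] for row in image]


# A 2x2 single-channel image: black, white, and two dark grays.
image = [[0, 255], [51, 102]]
normalized = normalize_gray(image)
```

Because a gray scale image has a single channel, each pixel is one number rather than an RGB triple.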
The tire image capturing means 11 captures a plurality of reference tires, which differ in wear amounts from each other, for obtaining learning data for a convolutional neural network and a recognition-target tire for recognizing and determining the wear amount.
In the present embodiment, the number of images of the reference tire was 6 sheets, and the number of images of the recognition-target tire was 2 sheets.
Further, the sizes and the number of pixels of the images are not particularly limited; in the present embodiment, however, the size of the captured images was set to 480×640 pixels. Though the image-capturing range is also not particularly limited, it is desirable that a part of the tread is captured across the entire image. If an object other than the tire 20, such as scenery or a vehicle, has been captured in the image, it is desirable to extract the tire part and use it as a new tire image.
The image data converting means 12 converts the captured images into images of a predetermined size.
More specifically, as illustrated in
In the present embodiment, the size of the tire image G0 was set to 480×640 pixels and the size of the converted images G1 to Gn was set to 256×256 pixels, and 6 sheets of the converted images G1 to G6 were cut out from one tire image G0. With 6 captured images per tire, the number of the converted images thus became 36 sheets.
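The cutting-out of converted images can be sketched as below. The text does not state how the 256×256 regions are positioned within the tire image, so random placement is purely an assumption here, as are the function name `cut_patches`, the seed, and the rows-by-columns reading of 480×640.

```python
import random


def cut_patches(image, patch=256, count=6, seed=0):
    """Cut `count` square patches of side `patch` from a 2-D image (list of rows)."""
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    patches = []
    for _ in range(count):
        # Assumed: patch positions are drawn at random inside the image.
        top = rng.randint(0, height - patch)
        left = rng.randint(0, width - patch)
        patches.append([row[left:left + patch] for row in image[top:top + patch]])
    return patches


# Dummy tire image G0 with 480 rows and 640 columns (assumed orientation).
g0 = [[0.0] * 640 for _ in range(480)]
converted = cut_patches(g0)        # corresponds to G1 to G6
total = 6 * len(converted)         # six captured images per tire
```

With six captured images per tire and six patches per image, `total` matches the 36 converted images stated in the embodiment.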
The image storage means 13 stores converted images GL1 to GLn of the learning data and converted images GS1 to GSn of the recognition data that were converted by the image data converting means 12. The converted images GL1 to GLn of the learning data are stored divided into teacher data GL1 to GLm, used for determining the filters of the convolution layer and the pooling layer and the parameters of the fully connected layer that are described later, and test data GLm+1 to GLn, used for confirming the determination accuracy of the convolutional neural network. It is desirable to set the number m of the teacher data to two thirds or more of the total number n of the learning data.
In the present embodiment, the number of levels was set to 2, and the number m was set to 27. That is, 27×2 sheets out of 36×2 sheets of the learning images were made as the teacher images, and the remaining 9×2 sheets were made as the test images.
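The division into teacher and test images can be illustrated with the short sketch below. Whether the division is random is not stated in the text, so the shuffle, the seed, and the function name `split_learning_data` are assumptions; the strings merely stand in for the converted images GL1 to GL36 of one wear level.

```python
import random


def split_learning_data(images, m, seed=0):
    """Split learning images into m teacher images and the remaining test images."""
    shuffled = list(images)
    random.Random(seed).shuffle(shuffled)   # assumed: random assignment
    return shuffled[:m], shuffled[m:]


# 36 converted images for one wear level, represented by placeholder labels.
level_images = [f"GL{i}" for i in range(1, 37)]
teacher, test = split_learning_data(level_images, m=27)
```

Here m = 27 is three quarters of n = 36, which satisfies the two-thirds guideline; repeating the split for both wear levels gives the 27×2 teacher images and 9×2 test images.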
The tire recognizing and determining means 14 includes a feature amount extracting unit 14A and a recognizing and determining unit 14B.
The feature amount extracting unit 14A includes a convolution layer having a convolution filter F1 (in this case, F11, F12) and a pooling layer having a rectangular filter F2 (in this case, F21, F22). The feature amount extracting unit 14A extracts, from the converted images GS (GS1 to GSn) of the recognition data converted by the image data converting means 12, feature amounts of the recognition-target images, which are images of the tire to be recognized, and thereafter flattens the pixel values into one dimension and sends them to the recognizing and determining unit 14B.
The recognizing and determining unit 14B includes three fully connected layers: an input layer, a hidden layer and an output layer. The recognizing and determining unit 14B compares the feature amounts of the recognition-target image with the feature amounts of the teacher images, recognizes and determines a wear condition of the recognition-target tire, and outputs the determination results, in the form of a “probability”, from the output layer to the display means 15.
Each of the fully connected layers is configured by a plurality of units (referred to as “neurons”) shown by circles in
Here, the number of levels of the wear condition was set to two levels of a new product (wear amount 0 mm) and a large wear amount (wear amount 11 mm).
In the meantime, the number of the fully connected layers may be two layers or four layers or more.
Also, the parameters (weights) for connecting the convolution filters F11, F12 with the rectangular filters F21, F22 and for connecting the units of the fully connected layers with one another can be obtained by deep learning using the teacher data GL1 to GLm.
The detail of the convolutional neural network and the deep learning will be described later.
The display means 15 displays the determination results of the tire recognizing and determining means 14 on a display screen 15G.
Next, an explanation is given as to the convolutional neural network.
The convolutional neural network is a feed-forward type neural network formed of a combination of the convolution layer, which outputs feature images by performing convolution processing on the input images using filters, and the pooling layer, which improves the recognition ability with respect to positional changes by lowering the positional sensitivity of the extracted features. In the convolutional neural network, the fully connected layer is arranged after the convolution layer and the pooling layer are repeated several times. The convolution layer and the pooling layer need not necessarily form a pair; for example, a combination of convolution layer, convolution layer, pooling layer may be employed.
The convolution layer is a layer for filtering (convoluting) the input images and in order to precisely grasp the features of the input images, it is desirable to use a plurality of filters.
The filter for convolution is for weighting each pixel value included in the area of a moderate size and adding them together, and may be expressed by a four-dimensional tensor.
On the other hand, the pooling layer shifts the rectangular filter within the input image, takes out the maximum value in the rectangle and outputs a new image (MAX pooling) so as to lower the positional sensitivity of the extracted features. Incidentally, average value pooling, which averages the values in the rectangle, may be performed instead.
Next, an explanation is given as to the operation of the convolutional layer. The explanation is given, taking as an example, operations up to the process in which the recognition-target image Gk is convolution-processed by the first convolution layer to obtain the first convolution image Gk (F11).
As the convolution filter F11, a square filter having a size of p×p is generally used. Each cell of the convolution filter F11 corresponds in size to one pixel of the recognition-target image Gk, and the numbers (filter values) $a_{1,1}$ to $a_{p,p}$ in the cells are parameters that can be updated by learning. That is, in the learning process, the parameters $a_{1,1}$ to $a_{p,p}$ are updated so that the feature amounts of the images can be extracted.
As illustrated in
Incidentally, the operation for obtaining, using the convolution filter F12, a second convolution image Gk (F12) from the first pooling image Gk (F21) to be described later is similar to the above-described operation.
As the convolution filters F11 and F12, a horizontal direction differential filter for detecting an edge in the horizontal direction or a vertical direction differential filter for detecting an edge in the vertical direction or the like is used.
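The convolution step can be illustrated with the sketch below, in which a p×p filter is slid over the image and each output pixel is the weighted sum of the pixels under the filter. A stride of 1 and no padding are assumptions, since the text does not give them, and the Prewitt-style kernel is only one illustrative choice of differential filter; in the embodiment the filter values are learned rather than fixed.

```python
def convolve2d(image, kernel, stride=1):
    """Weighted sum of the pixels under the kernel at each position (no padding)."""
    p, q = len(kernel), len(kernel[0])
    out_h = (len(image) - p) // stride + 1
    out_w = (len(image[0]) - q) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            for a in range(p):
                for b in range(q):
                    total += kernel[a][b] * image[i * stride + a][j * stride + b]
            row.append(total)
        out.append(row)
    return out


# Illustrative kernel that differences pixel values along the horizontal axis.
differential = [[-1, 0, 1]] * 3

# 3x5 image with a dark-to-bright step between the third and fourth columns.
step_image = [[0, 0, 0, 1, 1]] * 3
feature = convolve2d(step_image, differential)
```

The response is zero over the flat region and non-zero where the window straddles the step, which is the sense in which such a filter extracts an edge.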
Next, an explanation is given as to the operation of the pooling layer, taking as an example, operations up to the process in which the first convolution image Gk (F11) is pooling-processed by the first pooling layer to obtain the first pooling image which is the output image.
In the present embodiment, as illustrated in
The operation for obtaining, using the rectangular filter F22, a second pooling image Gk (F22) from the second convolution image Gk (F12) is similar to the above-described operation.
Incidentally, the parameters to be updated in the learning process do not exist in the pooling process.
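The pooling operation can be sketched as follows. The window size of 2×2 and a stride equal to the window size are assumptions made for illustration; the function name `pool` is hypothetical. Note that the function has no learnable parameters, in line with the remark above.

```python
def pool(image, size=2, mode="max"):
    """Slide a size x size window with stride `size` and reduce each window."""
    out = []
    for top in range(0, len(image) - size + 1, size):
        row = []
        for left in range(0, len(image[0]) - size + 1, size):
            window = [image[top + a][left + b]
                      for a in range(size) for b in range(size)]
            if mode == "max":
                row.append(max(window))                 # MAX pooling
            else:
                row.append(sum(window) / len(window))   # average value pooling
        out.append(row)
    return out


convolution_image = [
    [1, 2, 5, 6],
    [3, 4, 7, 8],
    [9, 10, 13, 14],
    [11, 12, 15, 16],
]
max_pooled = pool(convolution_image, size=2, mode="max")
avg_pooled = pool(convolution_image, size=2, mode="average")
```

Each 2×2 block collapses to a single value, so a feature that moves by one pixel inside a block leaves the MAX pooling output unchanged.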
The fully connected layer is a neural network having input layers, hidden layers and output layers, each configured of a plurality of units, and performs pattern classification on the input data, namely the second pooling image Gk (F22), which is a two-dimensional image that has been converted into a one-dimensional vector.
As illustrated in
If the numbers of units of the input layer and the hidden layer are $N_1$ and $N_2$, respectively, the weight that is a parameter connecting the m-th unit of the input layer from the top ($m = 1$ to $N_1$) with the n-th unit of the hidden layer from the top ($n = 1$ to $N_2$) is $W_{m,n}$, and the value of each unit of the input layer is $u_{1,k}$ ($k = 1$ to $N_1$), then the input value $u_{2,n}$ to the n-th unit of the hidden layer from the top becomes $u_{2,n} = W_{1,n} u_{1,1} + W_{2,n} u_{1,2} + \dots + W_{N_1,n} u_{1,N_1}$. In practice, a bias $b_{2,n}$ is added to the input value $u_{2,n}$. The bias $b_{2,n}$ is also a parameter that can be updated by learning.
In the neural network, passing the input value $u_{2,n}$ obtained in this way through an activation function enhances the non-linearity and improves the determination accuracy of the classification.
The same applies to a case where a plurality of hidden layers exist, and to the relationship between the hidden layer and the output layer.
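The weighted sum for the hidden layer can be written out as a short sketch. The function name `hidden_inputs` and the numerical values are illustrative assumptions; `weights[m][n]` plays the role of the weight connecting the m-th input unit with the n-th hidden unit, using 0-based indices.

```python
def hidden_inputs(u1, weights, bias):
    """Input value of each hidden unit: weighted sum of the input units plus a bias."""
    return [
        sum(weights[m][n] * u1[m] for m in range(len(u1))) + bias[n]
        for n in range(len(bias))
    ]


u1 = [1.0, 0.0, -1.0]                            # N1 = 3 input units
weights = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # N1 x N2 weights, N2 = 2
bias = [0.5, -0.5]                               # one bias per hidden unit
u2 = hidden_inputs(u1, weights, bias)
```

The same function applies between a hidden layer and the output layer by passing that layer's own weights and biases.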
As the activation function, tanh, the Sigmoid function or the like may be used; in the present embodiment, however, the Rectified Linear Unit (ReLU) function was used, which is faster and higher in performance than tanh.
Incidentally, in the output layer, the Softmax function is used as the activation function.
The Softmax function is a special activation function which is used only for the output layer and which converts the combination of the output values of the output layers into probability. Namely, it converts the output values of the output layers so that the output values become 0-1, and the total sum of the output values becomes 1 (100%).
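The two activation functions named above can be sketched as follows. Subtracting the maximum before exponentiation is a common numerical-stability measure and is an addition of this sketch, not something stated in the text; it does not change the resulting probabilities.

```python
import math


def relu(x):
    """Rectified Linear Unit: pass positive values, clip negative values to zero."""
    return max(0.0, x)


def softmax(values):
    """Convert output values into probabilities in 0-1 that sum to 1."""
    shift = max(values)                      # stability measure (assumption)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


hidden = [relu(v) for v in [-3.5, 2.0]]      # hidden-layer activations
probabilities = softmax([2.0, 0.0])          # e.g. two wear levels
```

With two output units, `probabilities` holds one value per level, and the larger value indicates the determined level.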
Next, an explanation is given of a method for self-updating, using the teacher images, parameters such as the filter values $a_{1,1}$ to $a_{p,p}$ and the weights $W_{m,n}$.
First, the difference between the “correct answer” output values for the respective levels and the output values obtained by inputting the teacher images is quantified by a loss function. In the present embodiment, since the teacher images are 27×2 sheets, the parameters are updated so that the sum of the errors occurring when the data of the 54 sheets are passed through the convolutional neural network is minimized. In the present embodiment, a cross-entropy loss function was used as the loss function.
Further, in the present embodiment, the stochastic gradient descent method (SGD) was used as the method for reducing the error, and the back propagation algorithm was used to compute the gradient of the loss function.
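The loss computation can be illustrated as below. The sketch assumes that each teacher image has exactly one correct level, so the cross entropy reduces to the negative logarithm of the probability assigned to that level; the function name and the sample probabilities are made up for illustration.

```python
import math


def cross_entropy_sum(probabilities, labels):
    """Sum over all samples of -log(probability assigned to the correct level)."""
    return sum(-math.log(p[label]) for p, label in zip(probabilities, labels))


# Softmax outputs for three teacher images and their correct levels
# (0 = new product, 1 = large wear amount).
probabilities = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
labels = [0, 1, 0]
loss = cross_entropy_sum(probabilities, labels)
```

The loss is zero only when every image receives probability 1 for its correct level, so reducing it pushes the outputs toward the correct answers.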
The stochastic gradient descent method extracts only a small number of samples in mini batch units from among all the data, and updates the parameters while regarding these samples as all the data.
Further, in the back propagation, by sequentially obtaining the gradient from the output to the input rather than computing the gradient directly, the gradient can be obtained at a high speed.
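The mini-batch update can be sketched on a deliberately small stand-in problem. Instead of a convolutional neural network, the example fits a single weight w in y = w·x with a squared-error loss, so the gradient can be written by hand and back propagation is not needed; the learning rate, batch size, epoch count and seed are all assumed values.

```python
import random


def sgd_fit(xs, ys, lr=0.1, batch_size=4, epochs=100, seed=0):
    """Fit y = w * x by mini-batch stochastic gradient descent."""
    rng = random.Random(seed)
    w = 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)                          # new random mini batches
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error over this mini batch only.
            grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= lr * grad                            # parameter update
    return w


xs = [0.1 * k for k in range(1, 11)]
ys = [3.0 * x for x in xs]
w = sgd_fit(xs, ys)
```

Each update uses only a few samples yet moves w toward the value that minimizes the error over all the data, which is the idea of treating a mini batch as if it were the whole data set.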
Incidentally, in a case where the amount of data is large, over-training can be prevented by using the Dropout technique, in which the calculation is performed as if some of the units did not exist, when calculating the fully connected layer.
Further, the number of learning cycles is not particularly limited; however, it is desirable to perform learning at least ten times. If the learning is performed correctly, the value of the loss function is reduced each time the learning is performed.
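A minimal sketch of Dropout follows. The text only says that some units are treated as absent; the drop rate of 0.5 and the rescaling of the surviving units by 1/(1 − rate), often called inverted dropout, are assumptions of this sketch.

```python
import random


def dropout(values, rate, seed=None):
    """Zero each unit with probability `rate`; rescale the rest (assumed variant)."""
    rng = random.Random(seed)
    keep = 1.0 - rate
    return [0.0 if rng.random() < rate else v / keep for v in values]


units = [1.0] * 1000
dropped = dropout(units, rate=0.5, seed=0)     # used during learning only
unchanged = dropout([1.0, 2.0], rate=0.0)      # rate 0 leaves the units as they are
```

Dropout is applied only while learning; at determination time all units are used.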
Next, an explanation is given as to a tire image recognition method by referring to the flowchart of
First, confirmation is made as to whether or not the learning of the convolution neural network (CNN) has been finished (step S10).
If the learning has not been finished, the process proceeds to step S11, whereas if the learning has been finished, the process proceeds to step S21.
In step S11, surfaces of a plurality of reference tires with different wear amounts are captured to obtain images of the reference tires.
Next, after converting the obtained images of the reference tires into a plurality of images of a predetermined size (step S12), the converted images are divided into a plurality of teacher images and test images (step S13).
Then, by using these plurality of teacher images, the deep learning is performed, and filter values of the convolution layer and the pooling layer, and the parameters of CNN such as the weight of the fully connected layer are self-updated to obtain learning parameters (step S14), and with the use of these obtained learned parameters, the wear amount determining device corresponding to the tire recognizing and determining means in
Then, at the time of completion of the learning, the determination accuracy of the wear amount determining device is confirmed with the use of the test images (step S16).
After confirming the determination accuracy, the process proceeds to step S21, and the surface of the tire, which is a target for recognizing and determining the amount of wear, is captured to obtain the image of the recognition-target tire.
Next, the obtained image of the recognition-target tire is converted into a plurality of images of a predetermined size (step S22).
Then, after the data of these converted images is inputted to the wear amount determination device configured in step S15 and the recognition-target tire is recognized and determined (step S23), the determination results are displayed on a display screen such as a display (step S24), and the process is terminated.
Incidentally, for recognition and determination of a next tire, it is sufficient to perform the processing of step S21 to step S24.
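The control flow of steps S10 to S24 can be summarized in the sketch below. Every callable in `steps` is a hypothetical placeholder for the corresponding processing (image capture, conversion, learning and so on), and the stub implementations exist only to make the flow executable; none of them reflects the actual processing of the embodiment.

```python
def run_recognition(classifier, steps):
    """Follow steps S10 to S24; `steps` maps placeholder names to callables."""
    if classifier is None:                                   # S10: not yet learned
        reference_images = steps["capture_reference"]()      # S11
        converted = steps["convert"](reference_images)       # S12
        teacher, test = steps["split"](converted)            # S13
        classifier = steps["train"](teacher)                 # S14, S15
        steps["evaluate"](classifier, test)                  # S16
    target_images = steps["capture_target"]()                # S21
    converted = steps["convert"](target_images)              # S22
    results = [classifier(image) for image in converted]     # S23
    steps["display"](results)                                # S24
    return classifier, results


log = []
steps = {
    "capture_reference": lambda: ["ref"],
    "capture_target": lambda: ["target"],
    "convert": lambda images: [f"{name}-patch{i}" for name in images for i in range(2)],
    "split": lambda patches: (patches[:1], patches[1:]),
    "train": lambda teacher: (lambda image: "new"),
    "evaluate": lambda clf, test: log.append("evaluated"),
    "display": log.append,
}
classifier, results = run_recognition(None, steps)
_, next_results = run_recognition(classifier, steps)   # next tire: S21 to S24 only
```

Passing the already configured classifier back in skips the learning branch, which corresponds to processing a next tire with steps S21 to S24 alone.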
Although the present invention has been described with the use of the embodiment, the technical scope of the present invention is not limited to the scope described in the above embodiment. It is apparent to those skilled in the art that various modifications and improvements may be added to the above-described embodiment. It is apparent from the claims that embodiments with such modifications or improvements may also belong to the technical scope of the present invention.
For example, in the above-described embodiment, the number of levels of the wear state was set to two levels, that is a new article (wear amount 0 mm) and a large wear amount (wear amount 11 mm), however, the number of levels may be three or more levels.
For example, when it is desired to determine the amount of wear instead of the wear state, it is sufficient to learn, as teacher data, images of worn tires of a plurality of levels labeled in increments of 1 to 2 mm, with one label for each level of the wear amount, and to determine the actual wear amount of the target tire with the use of the resulting parameters.
In the above embodiment, the tire condition was determined by the tread wear condition; however, whether or not there is a crack in the side tread and so on can also be determined by learning to distinguish a normal product from a defective product.
Further, when it is desired to identify a tire type from the tread pattern, it is sufficient to learn the number of tire types as labels and use the labels for determination.
It should be noted that if the outputted determination results are stored in a server or a cloud or the like, it is possible to use the outputted determination results for services such as announcement of result information to a site user or, depending on the results, recommendation of tire change.
An explanation is given as to identification results in a case where the tire wear amount was set to two levels, that is, a new tire and a large wear amount for tires of the same type.
Note that the identification method followed the flowchart illustrated in
The specifications of the tires are shown below.
Six pictures were taken at random positions around the circumference of each tire with the camera of a smart phone.
The sizes of the images are each 480×640 pixels.
Gray scale 255 gradations were normalized in the range of 0-1.
The images after data conversion are shown in
The sizes of the images after data conversion are each 256×256 pixels.
For two types of tires, the data were distributed as follows, respectively.
Deep Learning Condition
In the calculation from the first layer to the second layer, the dropout method was employed.
The output value was converted into a probability by the Softmax function.
As the loss function, the cross entropy function was used to estimate the error with respect to the teacher data.
The filter and the weighting function were updated by the gradient back propagation.
The learning cycle mentioned above was repeated ten times to obtain the learning parameters.
Results
Results of using the obtained learning parameters for the identification test of the test images are shown in the Table 1 below.
As shown in Table 1, it was confirmed that the error approached zero as the number of times of learning increased and the learning progressed.
Further, the determination correct answer rate was 100%.
In other words, a total of 18 sheets of 9 images of the test tire 1 and 9 images of the test tire 2 were all correctly recognized and classified.
The tire 2 with the large wear amount and a tire 3 with a medium wear amount were identified.
The determination correct answer rate was 96%.
In other words, out of a total 18 sheets of 9 images of the test tire 2 and 9 images of the test tire 3, 17 sheets were correctly recognized and classified.
The new tire 1, the tire 2 with the large wear amount and the tire 3 with the medium wear amount were identified.
Incidentally, the captured images of the tire 1 to the tire 3 and the images after data conversion are the same as those illustrated in
The determination correct answer rate was 96%.
In other words, out of a total 27 sheets of 9 images of the test tire 1, 9 images of the test tire 2 and 9 images of the test tire 3, 26 sheets were correctly recognized and classified.
The tire 1 and a tire 4 of different types were identified.
The implementation conditions conform to the Example 1.
The determination correct answer rate was 100%.
In other words, a total of 18 sheets of 9 images of the test tire 1 and 9 images of the test tire 4 were all correctly recognized and classified.
The tire 4 and a tire 5 of different tread patterns were identified.
The determination correct answer rate was 100%.
In other words, a total of 18 sheets of 9 images of the test tire 4 and 9 images of the test tire 5 were all correctly recognized and classified.
The new tire 1, the tire 2 with the large wear amount and the tire 3 with the medium wear amount were identified by a tire identification device that does not have the convolution structure.
The determination correct answer rate was 59%.
In other words, out of a total of 27 sheets of 9 images of the test tire 1, 9 images of the test tire 2 and 9 images of the test tire 3, only 16 sheets were correctly recognized.
In summary, the present invention can be described as follows. That is, the present invention provides a tire image recognition method, including: a step of obtaining a plurality of images of tires that differ from one another in either one of or both of a tire type and a tire condition, the obtained images being regarded as teacher images; a step of converting the respective teacher images into a size of a predetermined number of pixels; a step of learning by a convolution neural network using data of the plurality of converted teacher images as learning images, and setting parameters for the convolution neural network; a step of obtaining a tire image of a recognition-target tire and converting the obtained tire image into a size identical to that of the teacher images; and a step of inputting the converted tire image of the recognition-target tire to the convolution neural network and determining either one of or both of the tire type and the tire condition of the recognition-target tire.
As such, because the convolutional neural network, which determines by the fully connected layer (conventional neural network) after extracting the feature amounts from the inputted tire image data by the convolution layer and the pooling layer, was used when recognizing the tire images, not only the calculation speed is expedited as the parameters in the neural network are greatly reduced, but also the tire information such as tire types and wear amounts can be accurately recognized and determined without setting the characteristic geometrical information such as edges and lines of the tread pattern.
Further, in the convolutional neural network, at the time of learning, the parameters of the neural network are updated and optimized so as to minimize the errors with respect to the set of the teacher image data by the back propagation with the use of the gradient descent method (GD), the stochastic gradient descent method (SGD) or the like. Thus, the accuracy in the determination of the recognition-target tire can be improved greatly.
Further, since the tire type or the tire condition is any one of a tread pattern, a tread wear amount, damage in a bead and a crack in a side tread, tire information necessary for tire replacement can be accurately recognized and determined.
Further, since at least one pattern periodic structure is captured in the teacher images and the tire image of the recognition-target tire, the tire information can be accurately recognized and determined with less image information amount.
Further, since the teacher images and the tire image of the recognition-target tire are converted into gray scale, and gradations of the gray scale are normalized in the range of 0 to 1, the image information amount can be reduced, hence the calculation time can be shortened.
In addition, the present invention provides a tire image recognition device, including: a tire image capturing means that captures a plurality of teacher images and a recognition-target image, the teacher images being images of tires that differ from one another in either one of or both of a tire type and a tire condition; an image data converting means that converts the teacher images and the recognition-target image into a size of a predetermined number of pixels; a feature amount extracting means that extracts a feature amount of the images converted by the image data converting means; and a determining means that compares a feature amount of the recognition-target image with a feature amount of the teacher images to determine either one of or both of the tire type and the tire condition of the target tire, in which the feature amount extracting means includes a convolution layer and a pooling layer of a convolutional neural network in which the teacher images are configured as learning images, and in which the determining means includes a fully connected layer of the convolutional neural network.
By adopting such a configuration, it is possible to realize the tire image recognition device that can precisely recognize and determine the tire information such as tire types, wear amounts and so on.
10: Tire image recognition device, 11: Tire image capturing means, 12: Image data converting means, 13: Image storage means, 14: Tire recognizing and determining means, 14A: Feature amount extracting unit, 14B: Recognizing and determining unit, 15: Display means, 20: Tire.
Number | Date | Country | Kind |
---|---|---|---|
JP2017-156115 | Aug 2017 | JP | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2018/029613 | 8/7/2018 | WO | 00 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2019/031503 | 2/14/2019 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
9798946 | Lee | Oct 2017 | B2 |
20100100275 | Mian | Apr 2010 | A1 |
20120207340 | Bulan et al. | Aug 2012 | A1 |
20130129182 | Noyel | May 2013 | A1 |
20150227819 | Kimura | Aug 2015 | A1 |
20150278580 | Sato et al. | Oct 2015 | A1 |
20160343126 | Miller | Nov 2016 | A1 |
20170083796 | Kim | Mar 2017 | A1 |
Number | Date | Country |
---|---|---|
105913450 | Aug 2016 | CN |
10-255048 | Sep 1998 | JP |
2013-532315 | Aug 2013 | JP |
2015-148979 | Aug 2015 | JP |
2015-185034 | Oct 2015 | JP |
2017-129492 | Jul 2017 | JP |
2017096570 | Jun 2017 | WO |
Entry |
---|
International Preliminary Report on Patentability with Translation of Written Opinion of the International Searching Authority for PCT/JP2018/029613 dated Feb. 20, 2020. |
Communication dated Mar. 24, 2021 by the European Patent Office in application No. 18843613.3. |
Search Report dated Feb. 5, 2021 from The State Intellectual Property Office of P.R. China in Application No. 201880048281.4. |
Cui Xuehong et al., “Defect classification for tire X-ray images using convolutional neural network”, Electronic Measurement Technology, 5th term vol. 40, May 31, 2017. pp. 168-173 (6 pages). |
Zhang et al., “Road Crack Detection Using Deep Convolutional Neural Network”,2016 IEEE International Conference on Image Processing (ICIP),IEEE, Sep. 25, 2016,pp. 3708-3712 (5 pages total). |
International Search Report of PCT/JP2018/029613 dated Oct. 30, 2018. |
Number | Date | Country | |
---|---|---|---|
20200125887 A1 | Apr 2020 | US |