The disclosed technology relates to a recognition device, a recognition method, and a recognition program.
There is a technology of recognizing a shape of another object appearing in a video of an in-vehicle camera such as a drive recorder.
Here, assume that the other object being observed is a problem vehicle that repeatedly engages in dangerous behavior such as road rage. A technology of automatically detecting a problem vehicle when the vehicle enters a detection area has been developed (see Non Patent Literature 1). A problem vehicle is assumed to repeat dangerous behavior at other locations, at other times, and toward other vehicles. It is therefore considered highly necessary to notify other vehicles that the vehicle is dangerous, and for this purpose it is important to obtain information that can identify the vehicle. However, a vehicle that repeats dangerous behavior is unlikely to provide position information, a behavior history, a video of the inside of the vehicle, or the like as information that enables identification, so identification needs to be performed from the outside.
In order to identify a vehicle, for example, a vehicle type, a color of a vehicle body, and a number written on an automobile registration number tag (license plate) are useful information for identification.
As an existing technology, a technology of detecting a license plate by an object detection technology and reading the characters written thereon has been proposed (see Non Patent Literature 2 and Non Patent Literature 3). This method performs front/rear detection and then proceeds in stages through license plate detection and character recognition (character detection).
Here, in the alphabet-using countries assumed in Non Patent Literature 3, a license plate is composed of the 26 letters of the alphabet and the 10 digits, and the main character string is often written large on a single line, so character recognition is assumed to be relatively easy. However, in a country with a localized license plate format, such as Japan or China, characters including hiragana characters and Chinese characters divided into a plurality of columns need to be recognized, and the patterns are assumed to be complicated. Furthermore, in the example of Non Patent Literature 3, three-stage object detection is performed, and the calculation cost is large.
For example, there are a case where some characters are small, a case where Chinese characters are included, a case where some digits of the 1 to 3 digits are letters of the alphabet, a case where there is one hiragana character, a case where a dot (⋅) is included, and the like. An example of the case where a dot is included is a number of 3 digits or less, such as “⋅1-43”. Furthermore, there are a plurality of types of backgrounds according to classifications such as general use or business use, ordinary vehicle or light vehicle, and out-of-inspection, and the background of a license plate is determined regardless of the color of the vehicle body. Furthermore, variations in the background are increasing due to the influence of local license plates unique to a region (see Non Patent Literature 4). As described above, there are issues related to recognition of a license plate.
Furthermore, unlike an expensive high-performance camera, in an in-vehicle camera, the influence of blurring of the subject, that is, blurring caused by the influence of the relative speed between an observation vehicle and a target vehicle increases. Since capturing conditions including exposure, reflection, and the like are mainly different each time, there is a case where recognition is difficult by a conventional method.
For example, since there is no relationship between a vehicle body color and a license plate, in a case where the license plate and the vehicle body color are similar to each other, the boundary is unclear, and it is assumed that object detection and shape recognition are difficult. For example, a case where a white or silver vehicle body has a white license plate, a case where a black vehicle body has a black license plate, and the like are assumed. Furthermore, even if the region of a license plate is enlarged, a portion other than the number “XX-XX” may not be clear in some cases. As described above, there are issues related to blurring of the subject.
The disclosed technology has been made in view of the above circumstances, and an object thereof is to provide a recognition device, a recognition method, and a recognition program that enable evaluation of detected characters and recognition of a target even in a case where a specific target is difficult to recognize.
A first aspect of the present disclosure is a recognition device including an acquisition unit that acquires a time-series image acquired in an environment in which a vehicle travels, a detection unit that detects characters of a predetermined character string from the image, and a shape recognition unit that evaluates relationship between detected characters of the character string and recognizes a shape of a target including the character string.
A second aspect of the present disclosure is a recognition method for causing a computer to execute processing, the processing including acquiring a time-series image acquired in an environment in which a vehicle travels, detecting characters of a predetermined character string from the image, and evaluating relationship between detected characters of the character string and recognizing a shape of a target including the character string.
A third aspect of the present disclosure is a recognition program for causing a computer to execute processing, the processing including acquiring a time-series image acquired in an environment in which a vehicle travels, detecting characters of a predetermined character string from the image, and evaluating relationship between detected characters of the character string and recognizing a shape of a target including the character string.
According to the disclosed technology, even in a case where a specific target is difficult to recognize, detected characters can be evaluated and a target can be recognized.
Hereinafter, an example of an embodiment of the disclosed technology will be described with reference to the drawings. In the drawings, the same or equivalent components and parts will be denoted by the same reference signs. Further, dimensional ratios in the drawings are exaggerated for convenience of description and may be different from actual ratios.
First, an overview of the present disclosure will be described. Hereinafter, in the example described in the present embodiment, a case where a shape is recognized using a license plate of a target vehicle captured from an observation vehicle as a target will be described. In a method of the present embodiment, a pattern or a shape corresponding to a license plate is not detected, but a pattern in which a character string corresponding to a license plate is drawn is recognized as a number, and a region in which the pattern of the character string is present is recognized as a license plate. Hereinafter, the description regarding a character string is assumed to represent a pattern of the character string. This is because no matter what color and design the background of the license plate is, the drawn character string follows the description rule of a license plate. Furthermore, hereinafter, a portion of a character string of a license plate may be described as a number, and the license plate itself as an object may be described as a plate. Note that the method of the present embodiment can be applied not only to a license plate but also to a feature on which a sign and other characters are drawn.
In the following description of the embodiment, description will be given focusing on the portion “XX-XX”, which is the character string of the number portion of a license plate. Considering the particularity of the font and the description rule, it is assumed that the license plate can be recognized from the portion “XX-XX” alone. A license plate is subject to regulations, which restrict its installation position, its angle, the arrangement within the plate, and the like. Note that, in the present embodiment, the definition of characters of a character string includes numbers, symbols, hiragana, and Chinese characters.
An existing technology utilized in the present embodiment will be described.
For detection of each character of a character string, for example, an object detection technology described in Reference Literature 1 is utilized. In this technology, for example, an object and each character of alphabets and numbers are detected from an image, and coordinate information (upper left XY coordinates and lower right XY coordinates of rectangles) on circumscribed rectangular images is output, and each character of a character string can be detected. [Reference Literature 1] “YOLO: Real-Time Object Detection”, URL: “https://pjreddie.com/darknet/yolo/”
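The detector output described above can be pictured with a small data structure. The following is a minimal sketch, assuming each detection carries a character label together with the upper-left and lower-right coordinates of its circumscribed rectangle; the names `DetectionRect` and `read_left_to_right` are hypothetical and do not come from Reference Literature 1.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DetectionRect:
    """One detected character and its circumscribed rectangle (pixel coordinates)."""
    label: str  # detected character, e.g. "4" or "-"
    x1: float   # upper-left X
    y1: float   # upper-left Y
    x2: float   # lower-right X
    y2: float   # lower-right Y

    @property
    def width(self) -> float:
        return self.x2 - self.x1

    @property
    def height(self) -> float:
        return self.y2 - self.y1

    @property
    def center(self) -> Tuple[float, float]:
        return ((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)


def read_left_to_right(rects: List[DetectionRect]) -> str:
    """Concatenate the labels in order of the upper-left X coordinate."""
    return "".join(r.label for r in sorted(rects, key=lambda r: r.x1))
```

Sorting by the upper-left X coordinate is only an illustrative way to assemble a single-line string; a multi-row plate would need an additional grouping step.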
Furthermore, the principle of a camera as described in Reference Literature 2 is utilized. [Reference Literature 2] “Proximity”, URL: “http://www.persfreaks.jp/main/intro/pers/”
In a case where a monocular camera such as a general in-vehicle camera is used, a feature such as another vehicle or a building is drawn so as to converge with respect to the vanishing point. Furthermore, in a case where an image that is not easily affected by lens distortion is used, such as a case where a non-wide-angle lens is used or a case where a region having less distortion of a wide-angle lens is cut out, the size of an object appearing in an image can be approximated as in a perspective view, and changes according to a law with respect to a distance from a reference point.
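The dependence of apparent size on distance can be illustrated under an idealized pinhole camera model, in which the size in pixels is inversely proportional to the distance. This is a sketch under that assumption only: the text above does not name a specific law, and the function names and the focal length expressed in pixels are hypothetical.

```python
def apparent_size_px(real_size_m: float, distance_m: float, focal_length_px: float) -> float:
    """Apparent size in pixels of an object of real_size_m seen at distance_m
    (pinhole model, lens distortion ignored)."""
    if distance_m <= 0:
        raise ValueError("distance_m must be positive")
    return focal_length_px * real_size_m / distance_m


def estimate_distance_m(real_size_m: float, apparent_px: float, focal_length_px: float) -> float:
    """Inverse relation: distance at which an object of known size appears apparent_px tall."""
    if apparent_px <= 0:
        raise ValueError("apparent_px must be positive")
    return focal_length_px * real_size_m / apparent_px
```

Under this model, halving the distance doubles the apparent character height, which is one way the expected character size at a given image position could be bounded.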
On the basis of the above, the configuration of the present embodiment will be described.
As illustrated in
The CPU 11 is a central processing unit, and executes various programs and controls each unit. That is, the CPU 11 reads the programs from the ROM 12 or the storage 14 and executes the programs by using the RAM 13 as a work region. The CPU 11 controls the components described above and performs various types of calculation processing according to the program stored in the ROM 12 or the storage 14. In the present embodiment, a recognition program is stored in the ROM 12 or the storage 14.
The ROM 12 stores various programs and various types of data. The RAM 13 serving as a working region temporarily stores programs or data. The storage 14 includes a storage device such as a hard disk drive (HDD) or a solid state drive (SSD) and stores various programs including an operating system and various types of data.
The input unit 15 includes a keyboard and a pointing device such as a mouse, and is used to execute various inputs.
The display unit 16 is, for example, a liquid crystal display and displays various types of information. The display unit 16 may function as the input unit 15 by employing a touchscreen system.
The communication interface 17 is an interface communicating with another device such as a terminal. For the communication, for example, a wired communication standard such as Ethernet (registered trademark) or FDDI, or a wireless communication standard such as 4G, 5G, or Wi-Fi (registered trademark) is used.
Next, each functional unit of the recognition device 100 will be described.
As illustrated in
The acquisition unit 110 acquires a time-series image from a video captured by an in-vehicle camera of an observation vehicle.
The recognition unit 112 includes a detection unit 120 and a shape recognition unit 122.
The storage unit 114 stores correspondence information regarding a target feature. The correspondence information is, for example, information for identifying the type of a font of a license plate or a sign, and information of restrictions of the number interval and the like.
The detection unit 120 detects characters in a character string from an image. As illustrated in
Furthermore, in detection by the detection unit 120, characters of a character string of a specific font may be detected using a model learned in advance to detect characters of the specific font.
The shape recognition unit 122 evaluates relationship between detected characters of a character string, and recognizes a shape of a target including the character string.
An example of evaluation by the shape recognition unit 122 will be described. The shape recognition unit 122 evaluates positional relationship between detection rectangles corresponding to respective characters and evaluates relationship among pixels of the detection rectangles as evaluation regarding the relationship. Description will be made in the case of the example illustrated in
Furthermore, the shape recognition unit 122 determines whether fonts of respective characters of a character string are the same, and evaluates positional relationship among fonts determined to be the same. A target may be identified from a list of features for which use of fonts is known with reference to the correspondence information in the storage unit 114. The fonts may be obtained by detection rectangles being compared further using a feature such as an aspect ratio of characters.
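One possible form of the positional evaluation of detection rectangles is sketched below. It assumes that the characters of one string should have similar heights, lie on roughly the same horizontal line, and be separated by bounded gaps. The function name and all tolerance values are hypothetical; in practice they would be derived from the correspondence information, such as the restrictions on the number interval.

```python
from statistics import median
from typing import List, Tuple

# (x1, y1, x2, y2): upper-left and lower-right corners of a detection rectangle
Rect = Tuple[float, float, float, float]


def is_consistent_row(rects: List[Rect],
                      height_tol: float = 0.2,
                      center_tol: float = 0.2,
                      max_gap_ratio: float = 1.5) -> bool:
    """Return True if the rectangles plausibly form one horizontal character string."""
    if len(rects) < 2:
        return False
    ordered = sorted(rects, key=lambda r: r[0])
    heights = [y2 - y1 for _, y1, _, y2 in ordered]
    centers_y = [(y1 + y2) / 2 for _, y1, _, y2 in ordered]
    h_ref = median(heights)
    cy_ref = median(centers_y)
    if h_ref <= 0:
        return False
    # Characters of the same font should have similar heights.
    if any(abs(h - h_ref) > height_tol * h_ref for h in heights):
        return False
    # Characters of one string should share roughly the same vertical center.
    if any(abs(cy - cy_ref) > center_tol * h_ref for cy in centers_y):
        return False
    # Neighboring characters should neither overlap heavily nor be far apart.
    for left, right in zip(ordered, ordered[1:]):
        gap = right[0] - left[2]
        if gap < -0.1 * h_ref or gap > max_gap_ratio * h_ref:
            return False
    return True
```

The check is written for digit rectangles only; a hyphen or dot has a different height and would need its own rule.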
As described above, the shape recognition unit 122 can recognize a portion “XX-XX” corresponding to a number. Therefore, the shape recognition unit 122 may additionally recognize characters in the periphery of the portion “XX-XX” corresponding to the number. Furthermore, the shape of the target may be recognized and detected from the size of the characters. For example, in a case where it is attempted to recognize the shape of a license plate, pixels of the same color as the characters of “XX-XX” may be searched for as peripheral characters or as small elements that cannot be recognized as characters. Pixels other than the number that appear on the license plate may then be estimated in consideration of the presence of additional characters in the regions above and to the left of “XX-XX” and the character size of “XX-XX”, here, the number of vertical and horizontal pixels of the detection rectangles.
Furthermore, the shape recognition unit 122 may perform additional processing of searching the periphery of a character string of four characters corresponding to a number of a license plate, and determining that there is a region of the license plate if there is a pixel having the same color as the background of the license plate in the periphery. Note that the background of a license plate refers to the pixels, within the rectangles detected by coordinates, other than the pixels of the characters. Furthermore, a region in which the license plate is assumed to appear may be recognized from the number of pixels of the detection rectangles of the four characters.
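Estimating the plate region from the size of the detected characters might look like the following sketch. It assumes the region is obtained by expanding the bounding box of the four character rectangles by margins proportional to the character height, with larger margins above and to the left where additional characters are expected. The function name and the margin ratios are hypothetical placeholders, not values taken from the embodiment.

```python
from typing import List, Tuple

# (x1, y1, x2, y2): upper-left and lower-right corners
Rect = Tuple[float, float, float, float]


def estimate_plate_region(char_rects: List[Rect], image_w: int, image_h: int,
                          left_ratio: float = 1.0, right_ratio: float = 0.3,
                          top_ratio: float = 0.8, bottom_ratio: float = 0.2) -> Rect:
    """Expand the bounding box of the number characters into an assumed plate region."""
    if not char_rects:
        raise ValueError("at least one character rectangle is required")
    x1 = min(r[0] for r in char_rects)
    y1 = min(r[1] for r in char_rects)
    x2 = max(r[2] for r in char_rects)
    y2 = max(r[3] for r in char_rects)
    # Character height in pixels serves as the unit of scale.
    char_h = max(r[3] - r[1] for r in char_rects)
    # Margins are clipped so the region stays inside the image.
    return (max(0.0, x1 - left_ratio * char_h),
            max(0.0, y1 - top_ratio * char_h),
            min(float(image_w), x2 + right_ratio * char_h),
            min(float(image_h), y2 + bottom_ratio * char_h))
```

The background-color search described above could then be limited to this estimated region rather than the whole image.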
The shape recognition unit 122 recognizes a license plate as described above and identifies a region in an image in which the license plate is drawn. As described above, the recognition device 100 recognizes the shape of the license plate by the identified region and outputs the recognition result.
Note that, although a license plate has been described as an example, for example, a road sign or a road marking indicating a speed limit or the like may be set as a target. In a case of a sign illustrated in
Next, operation of the recognition device 100 will be described.
In step S100, the CPU 11 as the acquisition unit 110 acquires a time-series image from a video captured by an in-vehicle camera of an observation vehicle.
In step S102, the CPU 11 as the detection unit 120 detects each character of a character string from the image by detection rectangles.
In step S104, the CPU 11 as the shape recognition unit 122 evaluates positional relationship of the detection rectangles corresponding to respective characters of the character string.
In step S106, the CPU 11 as the shape recognition unit 122 evaluates pixel relationship of the detection rectangles.
In step S108, the CPU 11 as the shape recognition unit 122 identifies a region in the image in which a license plate is drawn on the basis of the evaluation result of the positional relationship and the evaluation result of the pixel relationship.
In step S110, the CPU 11 as the shape recognition unit 122 recognizes the shape of the license plate by the identified region and outputs the recognition result.
As described above, according to the recognition device 100 of the present embodiment, even in a case where a specific target is difficult to recognize, detected characters can be evaluated and the target can be recognized.
Furthermore, although an example of recognizing characters including numerical values using an object detection technology has been described, another method such as pattern matching may be used. For example, a plurality of patterns is prepared, and a portion similar to a pattern is searched for in an image.
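The pattern matching alternative can be sketched as an exhaustive search that scores each image position by the sum of absolute differences (SAD) against a prepared pattern. This is only an illustration on grayscale values held in nested lists; the function name is hypothetical, and SAD is one of several possible similarity measures, chosen here for simplicity.

```python
from typing import List, Tuple

Gray = List[List[int]]  # grayscale image as rows of pixel values


def match_template(image: Gray, template: Gray) -> Tuple[int, int, int]:
    """Return (row, col, score) of the best match; a lower SAD score is a better match."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    if th > ih or tw > iw:
        raise ValueError("template must not be larger than the image")
    best = None
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            # Sum of absolute differences over the window at (r, c).
            sad = sum(abs(image[r + dr][c + dc] - template[dr][dc])
                      for dr in range(th) for dc in range(tw))
            if best is None or sad < best[2]:
                best = (r, c, sad)
    return best
```

With a plurality of patterns, the same search would be repeated per pattern and the lowest score kept.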
Among the characters used on a license plate, dots are symbols that are used in numbers uniquely in Japan and are difficult to recognize. For example, a leading 0 is not written as in “00-08”; instead, the rule is to express a leading 0 with a dot, as in “ . . . 8”, “ . . . 28”, and “⋅1-28”. Therefore, as a modification, the object detection technology may be applied only to the numbers, and the dots may be searched for individually. In a case where one number character is detected, three pixel regions containing a pattern that may be regarded as a dot are searched for at the corresponding pixel positions to the left, in consideration of the hyphen region. Then, for example, the peripheral regions that are dot candidates are binarized, and a dot is detected by determining whether the center consists of black pixels and the rest consists of white pixels. In a case where the number of detected characters is two, the search range is two portions. In a case where the number of detected characters is three, the search range is one portion.
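The dot search of this modification can be sketched as follows. The sketch assumes dark characters on a light background, a fixed binarization threshold, and a central third of the candidate patch treated as the "center"; the function names, the threshold, and the ratio limits are hypothetical.

```python
from typing import List

Gray = List[List[int]]  # grayscale patch as rows of pixel values (0 = black, 255 = white)


def dot_search_count(num_detected_digits: int, total_positions: int = 4) -> int:
    """Number of dot candidate regions to search: 3, 2, 1 for 1, 2, 3 detected digits."""
    return max(0, total_positions - num_detected_digits)


def looks_like_dot(patch: Gray, threshold: int = 128,
                   min_core_black: float = 0.6, max_ring_black: float = 0.1) -> bool:
    """Binarize the patch and test whether the center is black and the surroundings white."""
    h, w = len(patch), len(patch[0])
    # Central third of the patch in each dimension is treated as the core.
    r0, r1 = h // 3, h - h // 3
    c0, c1 = w // 3, w - w // 3
    core_black = core_total = ring_black = ring_total = 0
    for r in range(h):
        for c in range(w):
            is_black = patch[r][c] < threshold  # binarization
            if r0 <= r < r1 and c0 <= c < c1:
                core_total += 1
                core_black += is_black
            else:
                ring_total += 1
                ring_black += is_black
    if core_total == 0 or ring_total == 0:
        return False  # patch too small to separate center from surroundings
    return (core_black / core_total >= min_core_black
            and ring_black / ring_total <= max_ring_black)
```

A plate with light characters on a dark background would require inverting the binarization before applying the same test.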
Furthermore, although an example of using an in-vehicle camera has been described, any camera may be used as long as the camera captures an environment in which a vehicle travels. A fixed monitoring camera installed above an intersection or in a parking lot, a capturing device for identifying a speeding vehicle, or a monitoring camera installed on a sidewalk or a storefront may be used. In this case, a license plate, a road sign, or the like may not be directly in front of the camera but inclined, so the shape may be recognized after a portion stretched vertically or horizontally or a bent portion is corrected in consideration of the aspect ratio of the detected characters and numbers. In evaluation of the relationship among detection rectangles, the aspect ratio per character of the target may be considered.
Note that the recognition processing executed by the CPU reading software (program) in the above embodiment may be executed by various processors other than the CPU. Examples of the processors in this case include a programmable logic device (PLD) whose circuit configuration can be changed after manufacturing, such as a field-programmable gate array (FPGA), and a dedicated electric circuit that is a processor having a circuit configuration exclusively designed for executing specific processing, such as an application specific integrated circuit (ASIC). In addition, the recognition processing may be executed by one of these various processors, or may be executed by a combination of two or more processors of the same type or different types (for example, a plurality of FPGAs, a combination of a CPU and an FPGA, or the like). Further, a hardware structure of the various processors is, more specifically, an electric circuit in which circuit elements such as semiconductor elements are combined.
In the above embodiment, the aspect in which the recognition program is stored (installed) in advance in the storage 14 has been described, but the present invention is not limited thereto. The program may be provided by being stored in a non-transitory storage medium such as a compact disk read only memory (CD-ROM), a digital versatile disk read only memory (DVD-ROM), and a universal serial bus (USB) memory. The program may be downloaded from an external device via a network.
Regarding the above embodiment, the following supplementary notes are further disclosed.
A recognition device including:
A non-transitory storage medium that stores a program executable by a computer to execute recognition processing,
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2021/028191 | 7/29/2021 | WO |