The present disclosure relates generally to the field of computer vision. More specifically, the present disclosure relates to computer vision systems and methods for information extraction from inspection tag images.
Inspection tags are paper or other tags that are attached to items that require periodic inspection, such as fire extinguishers, valves, hoses, and other equipment. Often, such tags include various indicia such as the name of the company performing the inspection, company address, date (e.g., year and/or month) of inspection, the name of the individual performing the inspection, information about the inspected object, and other relevant information. Additionally, one or more regions of inspection tags are often physically “punched” (e.g., portions of the tag are removed) to indicate a date (e.g., year/month) when the last inspection was performed. As can be appreciated, inspection tags record important information regarding the operational status and safety of associated equipment.
Information from inspection tags is generally obtained manually by insurance adjusters and other individuals performing site visits in connection with a dwelling/location, such that the adjuster or other individual reads the inspection tag and writes down relevant information from the tag, which information is subsequently used for various insurance adjusting and other functions. However, this process is time-consuming and prone to error. With the advent of computer vision and machine learning technology, it would be highly beneficial to provide a system which automatically processes an image of an inspection tag (e.g., taken by a camera of a smart phone or other device) and automatically extracts relevant information from the tag, so as to significantly speed up the process of acquiring important inspection information at a facility and to improve the accuracy of the information extracted from such tags.
Accordingly, what would be desirable are computer vision systems and methods for information extraction from inspection tag images which solve the foregoing and other needs.
The present disclosure relates to computer vision systems and methods for information extraction from inspection tag images. The system receives an image of an inspection tag, detects one or more tags in the image, crops and aligns the image to focus on the detected one or more tags, and processes the cropped and aligned image to automatically extract information from the depicted inspection tag. Each tag identified by the system can be bounded by a tag-box that bounds the detected tag, and a tag quality score can be calculated for each tag-box. A tag-box with the highest score can be selected for processing (e.g., for cropping and alignment of the tag depicted in the tag-box). One or more visual features can be extracted after cropping of the image, and pixel-level prediction can be performed on the image to predict and/or correct an orientation of the image. Word-level and line-level optical character recognition (OCR) is then performed on the cropped and aligned image of the tag in order to extract a plurality of items of information from the tag, such as the date (e.g., year/month) of inspection indicated on the tag, company name, address, phone number, information about the inspected object, and other relevant information.
The foregoing features of the disclosure will be apparent from the following Detailed Description, taken in connection with the accompanying drawings, in which:
The present disclosure relates to computer vision systems and methods for analyzing images of inspection tags, as described in detail below in connection with
It is noted that the computer system 12 could be any suitable computing device including, but not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a server, a cloud computing platform, an embedded processor, or any other suitable computing device. The image database 14 could also be stored in a memory of the computer system 12 and accessed by a processor of the computer system 12, or stored separately from (e.g., external to) the computer system 12, e.g., on one or more database servers or other computer systems in communication with the computer system 12. Additionally, the computer system 12 and the image database 14 could be in communication via a wired or wireless network, including, but not limited to, an intranet, the Internet, a local area network (LAN), a wide area network (WAN), or other form of communication.
If no tag-box is detected, the system exits the process. Otherwise, if only one tag-box is detected, the tag in that tag-box is identified as the tag of interest. If multiple tag-boxes are detected, the one with the highest confidence score becomes the tag of interest. Tag quality for each tag-box is computed as a ratio between the tag-box area and the area of the image, as follows:

tag quality = (hb × wb) / (H × W)

where hb and wb are the height and width of the tag-box, and H and W are the height and width of the image. If the image is completely focused on the tag, then the tag quality tends toward a value of 1.
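By way of non-limiting illustration, the following is a minimal Python sketch of the tag-box selection and tag quality computation described above. The box representation (a dictionary with width, height, and confidence fields) and the function names are illustrative assumptions, not part of the disclosure.

```python
# Minimal sketch: select the tag of interest and score tag quality.
# The box dictionary layout and function names are illustrative only.

def select_tag_of_interest(tag_boxes):
    """Return the tag-box with the highest detection confidence, or None if no tag was detected."""
    if not tag_boxes:
        return None  # no tag-box detected: the system exits the process
    return max(tag_boxes, key=lambda box: box["confidence"])

def tag_quality(box, image_height, image_width):
    """Ratio of tag-box area to image area; tends toward 1.0 when the image is completely focused on the tag."""
    return (box["h"] * box["w"]) / (image_height * image_width)
```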
In step 44, the system crops the image to generate a cropped image 46, such that the tag region is cropped out using the box-of-interest and is fed to subsequent modules. Next, in step 48, the system processes the cropped image to extract one or more features from the image, including, but not limited to, a tag mask, a tag hole (e.g., one or more physical holes in the tag depicted in the cropped image), one or more punches in the tag, one or more semantic regions of the tag, and one or more key points indicating a month or other time or date indicator. Specifically, features are extracted from the tag that are used to correct the tag alignment and to determine the punched date on the tag. It is noted that a DeepLabv3 model can be trained and utilized in this step to generate a plurality of visual features.
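The disclosure notes that a DeepLabv3 model can be trained for this feature-extraction step. The sketch below assumes a torchvision DeepLabv3 backbone fine-tuned on tag-specific classes; the class list, the checkpoint file name, and the input preprocessing are assumptions for illustration only.

```python
# Illustrative sketch: pixel-level feature extraction from the cropped tag image
# with a DeepLabv3 segmentation model. The class list and the fine-tuned
# checkpoint path below are hypothetical.
import torch
from torchvision import transforms
from torchvision.models.segmentation import deeplabv3_resnet50

TAG_CLASSES = ["background", "tag_mask", "tag_hole", "punch", "semantic_region", "month_keypoint"]

model = deeplabv3_resnet50(weights=None, num_classes=len(TAG_CLASSES))
model.load_state_dict(torch.load("tag_features_deeplabv3.pt"))  # hypothetical fine-tuned weights
model.eval()

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def extract_tag_features(cropped_image):
    """Return a per-pixel class map (H x W) over the cropped tag image (PIL image)."""
    batch = preprocess(cropped_image).unsqueeze(0)
    with torch.no_grad():
        logits = model(batch)["out"]           # shape: (1, num_classes, H, W)
    return logits.argmax(dim=1).squeeze(0)     # predicted class index per pixel
```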
Next, in step 50, the system performs a pixel-level prediction on the extracted features. Then, in step 52, the system predicts an orientation and a correction for the image, which is then processed in step 54 to perform an axis alignment on the cropped image to generate an axis-aligned image 56. Specifically, border point estimates can be used and mask prediction maps can be generated to estimate the four corners of a depicted tag, and a perspective projective transformation can be performed to align the corners of the tag to the corners of the image. Detected visual features can also be transformed using a transformation matrix. In step 58, the system performs word-level optical character recognition (OCR) on the axis-aligned image 56 in order to determine an inspection year (e.g., 2023) and other words indicated on the tag depicted in the image. Additionally, in step 60, the system performs line-level OCR on the axis-aligned image 56 to identify a company indicated on the tag depicted in the image, as well as other information.
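A minimal sketch of the corner-based axis alignment follows, using OpenCV and assuming the four tag corners have already been estimated from the mask prediction maps; the corner ordering (top-left, top-right, bottom-right, bottom-left) and the output dimensions are assumptions.

```python
# Illustrative axis alignment: warp the four estimated tag corners onto the
# corners of the output image via a perspective projective transformation.
# Corner ordering (TL, TR, BR, BL) and output size are assumptions.
import cv2
import numpy as np

def align_tag(cropped_image, tag_corners, out_w=600, out_h=1200):
    """Return the axis-aligned tag image and the 3x3 transformation matrix,
    which can also be used to transform detected visual features (punches,
    key points) into the aligned coordinate frame."""
    src = np.array(tag_corners, dtype=np.float32)          # [TL, TR, BR, BL]
    dst = np.array([[0, 0], [out_w - 1, 0],
                    [out_w - 1, out_h - 1], [0, out_h - 1]], dtype=np.float32)
    matrix = cv2.getPerspectiveTransform(src, dst)
    aligned = cv2.warpPerspective(cropped_image, matrix, (out_w, out_h))
    return aligned, matrix
```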
In step 62, the system combines the identified word-level and line-level information into one or more blocks of text. In step 64, the system extracts a punched year and a punched month. A “regex” filter can be used to detect all year-like text from all text segments detected via OCR. If only a single year is detected, that year can be identified by the system as the inspection year for the tagged object. If multiple years are detected, the system can compute distances between pairs of year boxes and punches on the tag, and the pair that has the minimum distance can be identified as indicating the punched (inspection) year. To identify the inspection month, a similar approach can be utilized, such that distances between the month key points and punches can be computed and the pair with the minimum distance can be identified as indicating the punched (inspection) month.
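The punched-year logic described above can be sketched as follows: a regular expression filters year-like OCR text, and when multiple candidate years are found, the year box nearest to a detected punch is chosen. The use of box centers and Euclidean distance is an assumption about how the distances are measured.

```python
# Illustrative punched-year extraction: regex-filter year-like OCR text, then
# match candidate year boxes to the nearest detected punch. Box centers and
# Euclidean distance are assumptions.
import math
import re

YEAR_PATTERN = re.compile(r"\b(19|20)\d{2}\b")

def center(box):
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)

def punched_year(ocr_segments, punch_centers):
    """ocr_segments: list of (text, box) tuples; punch_centers: list of (x, y) points."""
    candidates = [(m.group(0), box)
                  for text, box in ocr_segments
                  for m in [YEAR_PATTERN.search(text)] if m]
    if not candidates:
        return None
    if len(candidates) == 1 or not punch_centers:
        return candidates[0][0]            # single year detected: use it directly
    # Multiple years: choose the year box with the minimum distance to any punch.
    best = min(candidates,
               key=lambda c: min(math.dist(center(c[1]), p) for p in punch_centers))
    return best[0]
```

The inspection month can be extracted in the same manner by substituting the month key points for the candidate year boxes.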
In step 66, the system extracts a company name, address, and telephone number, and outputs the punched date as output 68 and company information as output 70. The system classifies all text segments generated by optical character recognition into classes of company name, address, phone number, and any other applicable information. Natural language processing (NLP)-based models can also be utilized to perform these steps, alone or in combination with visual structures that provide extraction cues. Probability maps can be computed for pixel-level locations of these classes of information, and class labels can be attached to each text segment based on the dominant pixel types in the probability maps.
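A sketch of attaching class labels to OCR text segments based on the dominant pixel type in the probability maps is shown below; the stacking of the maps into a single array, the class names, and the mean-over-box scoring rule are assumptions for illustration.

```python
# Illustrative label assignment: for each OCR text segment, find the dominant
# class inside its bounding box on the pixel-level probability maps.
import numpy as np

CLASSES = ["company_name", "address", "phone_number", "other"]

def label_segment(box, probability_maps):
    """box: (x, y, w, h); probability_maps: array of shape (num_classes, H, W)."""
    x, y, w, h = box
    region = probability_maps[:, y:y + h, x:x + w]           # per-class scores inside the box
    dominant = region.reshape(len(CLASSES), -1).mean(axis=1).argmax()
    return CLASSES[dominant]

def classify_segments(segments, probability_maps):
    """Group OCR text segments (text, box) by their dominant class label."""
    grouped = {name: [] for name in CLASSES}
    for text, box in segments:
        grouped[label_segment(box, probability_maps)].append(text)
    return grouped
```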
Finally, the outputs 42, 68, and 70 can be combined into an output table 72 that identifies a variety of information from the tag, including, but not limited to, a file name, an indication of the tagged object (e.g., fire extinguisher), a tag type (e.g., vertical or horizontal tag), a tag quality (e.g., a numeric score indicating the quality of the tag), a date the tag was punched, a number of months since the last inspection, a telephone number, a company name, a company address, and a website identifier (e.g., URL). It is noted that the processes described in
Although the systems and methods of the present disclosure have been described in connection with extracting information from inspection tags, it is noted that the systems and methods described herein could also be utilized in connection with identifying and extracting information from other types of tags and/or paper-based indicia. Additionally, such information need not be limited to inspection information, and could indeed be applied to a wide variety of information extraction of various types.
It is noted that the systems and methods of the present disclosure could be extended to identify the location at which a particular photo of an inspection tag was taken, such as by way of a geocode, global positioning system (GPS) coordinates, or other location information. Such location information could be useful in verifying that the photo of the inspection tag is genuine, and that the image was taken at the actual location of the inspection tag. Additionally, the system could compare one or more inspection tag images with images of inspection tags at the same location or another location, so as to verify the authenticity of the detected inspection tag. Further, additional attributes of the inspection tag could be detected, such as the approximate age of the inspection (e.g., due to detected conditions of the tag such as the condition of the paper or material forming the inspection tag, etc.). Still further, the system could utilize computer vision techniques to identify the type of a tagged object from the image (e.g., a fire extinguisher) and could compare the type of the detected object to an object type indicated on the inspection tag, in order to verify that the inspection tag corresponds to the correct object.
It is additionally noted that the systems and methods of the present disclosure could be utilized to resolve ambiguous punch locations or resolution in an image of an inspection tag. For example, if the location of the punch is not immediately clear (e.g., the tag is punched on a line between two different month boxes, or the punch spans two boxes), the system could utilize machine learning to resolve the correct location of the punch. For example, the system could determine which side of the line the punch is closer to using a model that is trained on multiple tags. Also, if there are multiple tags in the same building with the same punch date but in different spots, the system could select an inspection date as the date that is most clear and/or common and/or sensible.
Having thus described the system and method in detail, it is to be understood that the foregoing description is not intended to limit the spirit or scope thereof. It will be understood that the embodiments of the present disclosure described herein are merely exemplary and that a person skilled in the art can make any variations and modifications without departing from the spirit and scope of the disclosure. All such variations and modifications, including those discussed above, are intended to be included within the scope of the disclosure.
The present application claims priority of U.S. Provisional Patent Application Ser. No. 63/468,659 filed on May 24, 2023, the entire disclosure of which is expressly incorporated herein by reference.