The invention relates to the field of image analysis and a method for quantitatively rating the degree of similarity between images. The invention also relates to a device instructed to perform the method, and a program and a computer-readable medium with instructions for carrying out the method.
Digital documents such as files and images are widespread throughout society and industries. These digital documents may be classified in many ways regarding confidentiality and possible content of personal information, making exchange and storage of such documents a cumbersome task. Furthermore, digital documents are relatively easy to manipulate to make different modified versions of the same digital document, e.g., an image may be resized, skewed or elements may be added or removed.
For some purposes where such digital documents and images are to be used, it may be sufficient and more expedient to convert the digital documents and images into a reduced data format. While this reduced data format will still serve many purposes going forward, even in the absence of the digital document or image itself, it can more easily comply with the requirements for data compliance in relation to confidentiality, General Data Protection Regulation (GDPR), Health Insurance Portability and Accountability Act (HIPAA) and the like.
Fraud is a global problem that affects not least the insurance industry. It is estimated that 10% of all insurance pay-outs are made to fraudsters. To receive an insurance pay-out, various documents must be presented to validate the insurance claim. Even then, there are loopholes. Fraudsters seek to cheat insurers in a plethora of ways and today, fraud has moved into the digital arena too.
Digital documents introduce a variety of new ways to cheat and commit fraud, not least insurance fraud. Verifying the uniqueness, ownership and authenticity of documents and items is very difficult when documents are digital files. In the past, digital rights management has been used on some file types to ensure that they were not copied, although the inconvenience thereof made it infeasible, and so insecure documents are here to stay. Verifying the uniqueness, ownership and authenticity of documents and items is the job of insurance investigators, who make value-judgments about documents throughout their workday. The more thorough they are, the slower and more expensive insurance pay-outs and premiums get. Some insurance companies have decided to solve this by being slack with verification and accepting as high as 20% fraud, since this allows them to have fewer investigators and so keep operating costs low.
Digital documents can often be copied indiscriminately. Keeping track of documents can be difficult, and re-use of the same claim document for multiple cases across different insurance companies does happen. When a claim document is re-used in multiple insurance claims across different insurance companies, it is often due to insurance fraud. Fraudsters use either their own or others' images in multiple claims, e.g. images downloaded from the internet.
To combat re-use of images in insurance fraud, some systems utilize a method where an image is converted to a hash value and saved in a database. However, when a fraudster either mirrors, edits, rotates, resizes, crops or skews images, these traditional methods are no longer sufficient.
As fraudsters utilize the same images across different insurance companies, there is a need for a database comprising images from different insurers. This will help insurers combat fraud by raising an alert signal when an image has already been handled (used in an insurance claim) in an earlier claim. However, saving images directly to a database that can be accessed by different employees at insurance companies is not compliant with e.g. GDPR. To combat future fraud, a method is needed to save images into a database in a compliant manner, while still being able to correctly verify whether an image is used in other insurance claims, even when said image has been modified, e.g. cropped, skewed, rotated and/or mirrored.
There is thus a need for increased security in the pay-out process as well as in verifying document authenticity at large. There is also a need for reducing the amount of data stored over time such as to limit data storage capacity expansions and minimize energy expenditure. Additionally, there is a need for being able to store a reduced data format of digital images such that the data is applicable for certain purposes while at the same time not requiring the cumbersome handling in compliance with confidentiality and data protection regulations.
In a first aspect, the present invention provides a method for quantitatively rating the degree of similarity between two digital images (100), comprising the following steps for each of the digital images (100):
In a preferred embodiment the method further comprises the step of rotating the pixilated image (200) 180 degrees if the total pixel value of the pixilated image next row from top is less than the total pixel value of the pixilated image next row from bottom.
In a preferred embodiment the method further comprises the step of rotating the pixilated image (200) 180 degrees if the total pixel value of the pixilated image row V from top is less than the total pixel value of the pixilated image row V from bottom, where V is an integer larger than 1.
In a preferred embodiment the method further comprises the step of flipping the pixilated image over its vertical axis if the total pixel value of the pixilated image next column from first column is less than the total pixel value of the pixilated image next column from last column.
In a preferred embodiment the method further comprises the step of flipping the pixilated image over its vertical axis if the total pixel value of the pixilated image number Z column from first column is less than the total pixel value of the pixilated image of number Z column from last column, where Z is an integer larger than 1.
In a preferred embodiment the pixilated image is rotated in the above-mentioned procedure before it is flipped in the above-mentioned procedure.
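The rotation and flipping rules of the preceding embodiments can be sketched as follows. This is a minimal illustration only, under several assumptions that are not stated in the text: the pixilated image is represented as a list of rows of numbers, each number being one pixel's total value (for example R+G+B); "row V from top" is counted with the top row as row 1, so that V=2 corresponds to the "next row from top"; and the function name `normalize_orientation` and the parameters `v` and `z` are hypothetical.

```python
def normalize_orientation(grid, v=2, z=2):
    """Bring a pixilated image into a canonical orientation.

    grid: list of rows, each a list of per-pixel total values (assumed
    representation). v and z are the row/column offsets, counted from 1.
    """
    # Rotate 180 degrees if row v from the top sums to less than
    # row v from the bottom.
    if sum(grid[v - 1]) < sum(grid[-v]):
        grid = [row[::-1] for row in grid[::-1]]

    # Then flip over the vertical axis if column z from the first column
    # sums to less than column z from the last column.
    first = sum(row[z - 1] for row in grid)
    last = sum(row[-z] for row in grid)
    if first < last:
        grid = [row[::-1] for row in grid]
    return grid


canonical = [
    [1, 2, 3, 4],
    [9, 8, 1, 1],
    [1, 1, 1, 1],
    [5, 6, 7, 8],
]
rotated = [row[::-1] for row in canonical[::-1]]   # 180 degree rotation
mirrored = [row[::-1] for row in canonical]        # mirror image
```

In this sketch, the rotated and the mirrored variant are both brought back to the same grid as the unmodified image, which is what allows their image strings to be compared position by position. When the compared row or column sums are equal, the image is left as it is.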
In a second aspect, the present invention provides a method for quantitatively rating the degree of similarity between two digital images (100), comprising the following steps for each of the digital images (100):
Using this method, identical images and minor modifications thereof can be matched. If a high hit-score (180) is being calculated, then there is a high likelihood that two different images are indeed a set of an original image and a modified version of that original image.
For example, a prior loss image (an image used in a previous insurance claim) may be provided in a different format or with different metadata in connection with fraudulent claims across different insurance companies. Also, such prior loss images may be provided with edits to part of the image, e.g., by blurring certain areas to conceal them or by cropping part of the image out.
Hence, in one embodiment the modification of the image is a rotated image, a resized image, a skewed image, a cropped image, a mirrored image, an image with addition or elimination of one or more elements such as text or signs, or a combination thereof.
The present invention has several advantages, most notably that there is no need for storing previously handled images provided an image string (170) has been calculated by a known embodiment of the present method and stored. Since the image string (170) is a reduced data format that does not contain all the information from the image used for generating the image string (170), the confidentiality and data protection compliance challenges of storing images in a database (to e.g. see if an image has already been used in a previous insurance claim) are solved by the present invention.
Storing only the reduced data format in the form of the image string (170) takes up little space in a database, which slows the rate at which data storage capacity must be expanded over time and saves energy.
Furthermore, by comparing a digital image through the use of image strings (170) with a list of historically used/generated image strings (170), a portion of insurance fraud attempts can be stopped in their tracks, freeing up insurance investigators to investigate other cases, such as more complicated cases. Yet further, the investigators need to perform fewer mouse-clicks on average to process an insurance claim.
In a third aspect the present invention provides a method for determining whether an image (100) has already been handled as the same image (100) or a modification thereof, comprising the determination of an image string (170) by the method provided in the first aspect of the invention and calculating the hit-score as a percentage identity or homology between the image string (170) of said image and the image strings (170) in a database (205) comprising the image strings of previously handled images for which image strings (170) have been calculated and stored in the database (205).
An important advantage of the methods of both the first and second aspect of the present invention is that they do not require access to the previously handled images for which image strings (170) are stored in a database (205).
In a fourth aspect the present invention provides the use of the methods for the verification of the uniqueness of an image, such as from a digital image (100).
In a fifth aspect the present invention provides the use of the methods in the process of handling insurance claims for increased security in the pay-out process.
In a sixth aspect the present invention provides a computing device (210) having a processor (211) adapted to perform the steps of the methods.
In a further aspect the present invention provides a computer program comprising instructions which cause the computer (210) to carry out one of the methods, when the program is executed by a computer (210).
In a yet further aspect, the present invention provides a computer-readable medium comprising instructions which cause the computer (210) to carry out one of the methods, when executed by a computer (210).
In the following the invention is described in detail through exemplary embodiments which should not be considered as limiting to the scope of the invention.
Throughout the present text and figures it is noted that number identifiers are used to designate both the singular and the plural form of an item or concept. For instance, the identifier “100” is used both in singular for “digital image (100)” and in plural for “digital images (100)”.
In one embodiment the modification of the image is a rotated image, a resized image, a skewed image, a cropped image, a mirrored image, an image with addition or elimination of one or more elements such as text or signs, an image with a part being blurred, or a combination thereof.
In an aspect the present invention provides a method for quantitatively rating the degree of similarity between two digital images (100), comprising the following steps for each of the digital images (100):
In the practical use of the method of the invention the image string (170) of a particular digital image (100) can be calculated by the method of the invention and then compared by calculation of percentage identity or homology to a plurality of image strings (170) in a database where said plurality of image strings (170) are often generated previously by using the same method with the same parameters for resizing the digital image (100), dividing into sections (120) and pixels (130), generation of pixel strings (150) and section strings (160) ending up with the image string (170).
In this practical application of the method of the invention, the requestor of the quantitative image uniqueness analysis who submits the digital image will receive the highest hit-score (180) or a group of the highest hit-scores (180) calculated by comparing the image string (170) with a plurality of previously generated image strings (170) in a database.
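The lookup described above, in which the requestor receives the highest hit-score (180) or a group of the highest hit-scores, might be sketched as below. The dictionary of stored image strings, its keys (hypothetical claim identifiers) and the function names are assumptions made for illustration; percentage identity is used as the hit-score.

```python
import heapq


def percent_identity(a, b):
    """Percentage of positions at which two equal-length strings agree."""
    if len(a) != len(b) or not a:
        raise ValueError("image strings must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def top_hits(query, stored, k=3):
    """Return the k highest (hit_score, key) pairs, best first.

    stored: mapping from a hypothetical record key to a previously
    generated image string of the same length as the query.
    """
    scored = ((percent_identity(query, s), key) for key, s in stored.items())
    return heapq.nlargest(k, scored)


stored = {"A": "1111", "B": "1112", "C": "2222"}
best = top_hits("1111", stored, k=2)
```

Only image strings are read in this sketch; the original images are not needed at any point.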
In one exemplary embodiment the assignment of a score for the color code value set (140) being RGB (46, 117, 182) is as follows: For each of R(46), G(117) and B(182) an integer between 1-5 is assigned via the following system:
The same assignment for G and B. In this embodiment the pixel (130) having the color code value set RGB (46, 117, 182) will generate the pixel string (150) being “234”.
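A scoring step of this kind can be sketched as a lookup against a set of cut-off values. The assignment system of this embodiment is not reproduced in the present text, so the cut-offs in `UPPER_BOUNDS` below are purely hypothetical; they were chosen only so that the worked example RGB (46, 117, 182) yields "234". The function names are likewise illustrative.

```python
from bisect import bisect_left

# Hypothetical inclusive upper bounds for the scores 1, 2, 3, 4 and 5.
# These are NOT taken from the source; they merely reproduce its example.
UPPER_BOUNDS = (40, 100, 160, 220, 255)


def channel_score(value, bounds=UPPER_BOUNDS):
    """Map one 0-255 color channel value to an integer score 1..len(bounds)."""
    if not 0 <= value <= 255:
        raise ValueError("channel value must be in the range 0-255")
    return bisect_left(bounds, value) + 1


def pixel_string(rgb, bounds=UPPER_BOUNDS):
    """Concatenate the scores of the R, G and B values of one pixel."""
    return "".join(str(channel_score(c, bounds)) for c in rgb)
```

Whichever cut-offs are used, the same ones would have to be applied to every image whose image string is to be compared.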
Hence, in a section (120) having 3 rows and 3 columns there will be 9 pixels (130) and thus 9 pixel strings (150). When these 9 pixel strings (150) are assembled it will give a section string (160) which has a length of 27 integers, e.g.:
If the resized image (110) was divided by 4 rows and 4 columns into 16 sections (120), then the image string (170) has 16×27=432 integers.
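The length arithmetic above can be checked with a short sketch that assembles section strings (160) and the image string (170) from a grid of pixel strings (150). It assumes a row-by-row order starting from the top row, both within a section and across sections, and a 12×12 grid of pixel strings; the function names are hypothetical.

```python
def section_string(pixel_strings, top, left, rows=3, cols=3):
    """Join the pixel strings of one section, row by row from the top."""
    return "".join(
        pixel_strings[r][c]
        for r in range(top, top + rows)
        for c in range(left, left + cols)
    )


def image_string(pixel_strings, x=4, y=4, rows=3, cols=3):
    """Join the section strings of an image divided into x by y sections."""
    return "".join(
        section_string(pixel_strings, sr * rows, sc * cols, rows, cols)
        for sr in range(x)
        for sc in range(y)
    )


# 12 x 12 pixels, each with a three-character pixel string.
grid = [["234"] * 12 for _ in range(12)]
```

With three characters per pixel and nine pixels per section, each section string has 27 characters, and the 16 sections together give 432.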
In another exemplary embodiment
It is well known to the person skilled in the art how to calculate a hit-score (180) as a percentage identity or homology between the image strings (170) of the two digital images (100). Since the two image strings (170) used for calculating the percentage identity or homology are always the same length, the algorithm for calculating percentage identity is very simple: align the two image strings (170), count how many positions have an identical integer or letter, divide that number by the length of the image string (170), and multiply by 100 (unit: %). When calculating percentage homology the image strings (170) likewise have the same length, hence for homology the only addition is a rule for how big a discrepancy still qualifies as homologous. The calculation is then to sum the homologous and identical positions of the image strings (170), divide by the length of the image string (170), and multiply by 100 (unit: %).
Given that the resized images (110) are divided by X rows and Y columns into the sections (120), the sections (120) are either squares or rectangles. The choice between squares and rectangles is based on the size and format of the digital image, since the better match will give the best method/algorithm for the purpose.
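The two calculations can be sketched as follows. The identity calculation follows the description directly. For homology, the text leaves the discrepancy rule open, so the `tolerance` parameter below (positions count as homologous when their integer scores differ by at most that amount) is an assumption, and this sketch handles digit strings only.

```python
def percent_identity(a, b):
    """Share of aligned positions holding the same character, in percent."""
    if len(a) != len(b) or not a:
        raise ValueError("image strings must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def percent_homology(a, b, tolerance=1):
    """Share of positions that are identical or within `tolerance`, in percent.

    The tolerance rule is a hypothetical example of a discrepancy rule.
    """
    if len(a) != len(b) or not a:
        raise ValueError("image strings must be non-empty and of equal length")
    hits = sum(abs(int(x) - int(y)) <= tolerance for x, y in zip(a, b))
    return 100.0 * hits / len(a)
```

For instance, "1234" and "1244" differ in one of four positions, giving an identity of 75%, while with a tolerance of 1 the homology is 100%.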
In one embodiment, both X and Y are 3 or 4. In another embodiment (X, Y) is selected from (3,4), (4,3), (3,5) and (5,3).
Also, the division of the sections (120) into pixels (130) by rows and columns results in squares or rectangles of the pixels (130) in one given section (120). In one embodiment the number of pixels (130) in each section is in the range from 6 to 70, in the range from 8 to 49, in the range from 9 to 25, or in the range from 9 to 16.
In another embodiment the number of pixels (130) in each section is 8, 9, 12, 15, 16 or 20.
In another exemplary embodiment the assignment of a score for the color code value set (140) being RGB (46, 117, 182) is as follows: For each of R(46), G(117) and B(182) an integer between 1-6 is assigned via the following system:
The same assignment for G and B. In this exemplary embodiment the pixel (130) having the color code value set RGB (46, 117, 182) will generate the pixel string (150) being “345”.
In another aspect the present invention provides a method for quantitatively rating the degree of similarity between two digital images (100), comprising the following steps for each of the digital images (100):
In the practical use of the method of the invention the image string (170) of a particular digital image (100) is calculated by the method of the invention and then compared by calculation of percentage identity or homology to a plurality of image strings (170) in a database where said plurality of image strings (170) are often generated previously by using the same method with the same parameters for resizing the digital image (100), dividing into pixels (130), generation of pixel strings (202) ending up with the image string (170).
In this practical application of the method of the invention, the requestor of the quantitative image uniqueness analysis who submits the digital image will receive the highest hit-score (180) or a group of the highest hit-scores (180) calculated by comparing the image string (170) with a plurality of previously generated image strings (170) in a database.
In one exemplary embodiment the assignment of a score for the color code value set being RGB (230,172,109) is as follows. Calculate the distance to each of 140 predefined colors and select the closest; a sample is listed below:
The calculated closest color is RGB (244,164,96). The RGB pixel value is transformed from decimal to hexadecimal notation to provide a uniform data representation. Thus (244,164,96) is F4A460, which is the pixel string.
The pixel strings representing each pixel in the digital image are assembled into a single string, the image string (170), representing the digital image.
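The nearest-color step and the hexadecimal pixel string can be sketched as below. The list of 140 predefined colors is not reproduced in the present text, so the four-entry `PALETTE` is only a placeholder subset with illustrative names, and the squared Euclidean distance in RGB space is an assumed choice of distance measure. The function names are hypothetical.

```python
# Placeholder subset of predefined colors (assumption, not the source's list).
PALETTE = {
    "sandybrown": (244, 164, 96),
    "burlywood": (222, 184, 135),
    "tan": (210, 180, 140),
    "peru": (205, 133, 63),
}


def nearest_color(rgb, palette=PALETTE):
    """Return the palette color with the smallest squared RGB distance."""
    return min(
        palette.values(),
        key=lambda c: sum((a - b) ** 2 for a, b in zip(rgb, c)),
    )


def hex_pixel_string(rgb, palette=PALETTE):
    """Six-character hexadecimal pixel string of the nearest palette color."""
    r, g, b = nearest_color(rgb, palette)
    return f"{r:02X}{g:02X}{b:02X}"


def image_string(pixels, palette=PALETTE):
    """Join the pixel strings of all pixels, taken in a fixed order."""
    return "".join(hex_pixel_string(p, palette) for p in pixels)
```

Under these assumptions, RGB (230,172,109) is mapped to (244,164,96) and thus to the pixel string F4A460, in line with the example above.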
As will be evident from the above two exemplary embodiments using a different system for assignment of the score for the color code value set (160) being RGB (46, 117, 182), the method of the invention works for the comparison of two or more image strings (170) provided all of these image strings (170) are generated by the same method, i.e. same parameter for resizing the digital image (100), dividing into pixels (202), generation of pixel strings (203) ending up with the image string (170).
The color code value set (140) may be determined based on different color systems. In one embodiment the color code value set (140) is based on the RGB (red, green, blue) system giving a three digit/letter pixel string (150). The RGB system is widely used and commercial scanners can provide this format. Another color system is the CMYK (cyan, magenta, yellow and black) system giving a four digit/letter pixel string (150).
The score assigned to each pixel using the color code value set (140) can be any system which combines simplicity and discriminatory ability. In an embodiment, said score is an integer or a letter which is selected from an integer, a single digit integer, an integer in the range from 1 to 7, an integer in the range from 1 to 5 and an integer in the range from 1 to 3. In another embodiment, said score is an integer or a letter which is selected from a letter, a letter from a group of three letters, a letter from a group of five letters or a letter from a group of seven letters, such as (a, b, c) or (f, g, h, i, j).
In an embodiment, said fixed order for assembling the strings is row by row starting from the top row moving down or starting from the bottom row moving up.
In an embodiment, said fixed order for assembling the strings is column by column starting from the left hand column moving right or starting from the right hand column moving left.
In a further aspect the present invention provides the use of the method for the verification of the uniqueness of an image, such as from a digital image (100).
Such a use of the method of the invention for verification of the uniqueness of an image is advantageous in a number of industries, in particular in connection with the prevention of fraud, such as insurance fraud or fraud with product guarantees. In the latter situation a manufacturer of e.g. expensive engine parts may be faced with guarantee claims over a broken engine part, in which case it is advantageous to ensure that an image of the broken engine part is unique, i.e. it has not already been submitted via another engine parts retailer.
In a further aspect the present invention provides the use of the method in the process of handling insurance claims for increased security in the pay-out process.
As set out above, it is a great advantage that the method converts the digital image to an image string (170), since the actual image is not available from the latter and hence the image strings (170) can be stored in a database (200) without the set-up required for handling confidential documents and personal information under e.g. GDPR or HIPAA.
In a further aspect the present invention provides a computing device (210) having a processor (211) adapted to perform the steps of the present method.
In a yet further aspect the present invention provides a computer program comprising instructions which cause the computer (210) to carry out the method, when the program is executed by a computer (210).
In a yet further aspect the present invention provides a computer-readable medium comprising instructions which cause the computer (210) to carry out the method, when executed by a computer (210).
The networking interface (212) receives requests for digital image (100) uniqueness verification from various devices, such as over an organization's intranet, an insurance claim handling system, or the public internet. The networking interface (212) transmits the signal to the processor (211) for computations according to the steps of the method, see e.g.
The processor (211) performs the division of the resized image (110) into sections (120), and division of the sections (120) into pixels (130), followed by the assignment of a score for each of the individual color codes and assembling them into a pixel string (150). For all the sections (120), the processor (211) then generates a section string (160) by assembling all the pixel strings (150) from a section (120) and then generates the image string (170) by assembling all of the section strings (160). The processor (211) finally calculates a hit-score (180) as a percentage identity or homology between the image strings (170) of two digital images (100), said hit-score (180) being the rating of the similarity between the two digital images (100). This hit-score (180) is returned via the networking interface (212) to the device making the request for uniqueness verification (e.g., an insurance investigator). As described above the practical use of the method of the invention will often imply that an image string (170) is compared by the calculation using another, previously generated, image string (170) or plurality of image strings (170) from a database.
Alternatively as illustrated in
The hit-score (180) is a percentage in the range from 0%, indicating no similarity, to 100%, indicating an exact match under the parameters used in the method. The requestor for comparing digital images will determine one or more threshold levels for alerts initiating closer analysis and scrutiny of a digital image that may not be unique. For instance, for a hit-score above the threshold of 95% an alert is issued to the requestor, while for a hit-score between 85% and 95% an observance notice is issued to the requestor.
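The threshold example can be sketched as a simple classification of the hit-score (180). The two default thresholds are taken from the example above; the function name, the return labels and the treatment of a score of exactly 95% as an observance notice are assumptions.

```python
def classify_hit_score(score, alert_above=95.0, notice_from=85.0):
    """Map a hit-score in percent to the action issued to the requestor."""
    if not 0.0 <= score <= 100.0:
        raise ValueError("hit-score must be in the range 0-100")
    if score > alert_above:
        return "alert"
    if score >= notice_from:
        return "observance notice"
    return "no action"  # hypothetical label for scores below both thresholds
```

The thresholds are parameters here because, as stated above, they are determined by the requestor.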
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP22/55984 | 3/9/2022 | WO |
Number | Date | Country | |
---|---|---|---|
63200489 | Mar 2021 | US |