Disclosure of document images, such as check, invoice, and receipt images, and of related documents can be a significant privacy problem. Knowledge of who gave how much money to whom, and even for what, may be of interest to media organizations, business competitors, and criminals alike. Many steganographic techniques have been used to conceal tracking data within documents and document images. However, such techniques often fail due to poor quality reproduction of compromised documents and document images, awareness and redaction of steganographic markers, and the like.
Various embodiments herein each include at least one of systems, methods, and software that assist in identifying a source of an unauthorized disclosure of a document, such as a public disclosure of a check, receipt, or other document type.
One method embodiment includes receiving a request for a stored document image from a requestor and retrieving the requested stored document image. This method then applies a parameterized image deformation algorithm according to at least one input parameter to generate a watermarked image and stores the at least one parameter and an identifier of the requestor in association with an identifier of the requested stored document in a watermarked image log. The watermarked image is then transmitted to the requestor.
Another method embodiment includes receiving a compromised image and correlating the compromised image to the stored document image. This method may then process at least one of the compromised and stored images such that both the compromised and stored document images have the same vertical and horizontal properties. The method then retrieves each of the at least one parameters and respective requestor identifier for each entry in the watermarked image log including an identifier of the stored document image correlated to the compromised image. The parameterized image deformation algorithm is then applied against the stored document image for each of the at least one parameters retrieved to generate comparison images of the stored document image. This method then compares each comparison image to the processed compromised image to generate a matching score. When the comparing results in a score meeting a matching criterion, the method outputs at least the requestor identifier associated in the watermarked image log with the at least one parameter applied by the parameterized deformation algorithm in generating the respective comparison image.
In various embodiments herein, one objective is to hide a signature in plain view such that it is transferred even when a lower fidelity photo is taken or an image conversion is performed. As an example, the signature will remain even when a document image viewer uses a smartphone or other camera to take a picture of a document or of a document image presented on a computer screen. This type of action would not normally be traceable.
Some embodiments herein use a new steganography technique to add a unique signature to an image every time an image is downloaded by a computer system and presented either on a screen or in print form. The steganographic signature is recorded in an audit facility so that when an information disclosure is being investigated, the disclosed image may be traced to a specific access by a user at a recorded time and date.
In some such embodiments, exact matching is not necessary as even narrowing the field of candidate accesses can be of great help in an investigation. For example, a blurry cell phone picture may be difficult to match exactly, but maybe 70% of accesses are ruled out allowing the investigation to focus on the remaining 30%.
The steganography technique utilized in some embodiments herein includes geometrically deforming a retrieved document image, such as an image of a check or receipt, before providing the image to a requesting computer system user. The geometric deformation is typically performed by a parameterized process that is provided with unique parameters that specify how the geometric deformation is to be performed. This may include one or more values that indicate an amplitude of a particular deformation type (e.g., stretch, compress, swirl, Bezier curve, etc.) and may even identify one or more particular deformation types to apply. The parameters, an image identifier, a requesting user identifier, and a date-time stamp are then recorded in an audit trail or log to enable reproduction of the geometrically deformed document image and tracking back to the image access by the particular user. Note that the geometric deformation is made to provide a signature tracking mechanism while still maintaining legibility of the document image.
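By way of illustration only, a parameterized deformation of this kind might be sketched as follows. The function name `deform_bands`, the parameter names `amplitude`, `period`, and `phase`, and the use of a sinusoidal column offset are assumptions made for this sketch and are not taken from the embodiments themselves. The image is represented as a plain list of pixel rows so the example has no external dependencies.

```python
import math


def deform_bands(image, amplitude, period, phase=0.0):
    """Return a copy of `image` with vertical bands stretched and compressed.

    `image` is a list of equal-length rows of pixel values. Each output
    column x is sampled from source column
    x + amplitude * sin(2*pi*x/period + phase), rounded to the nearest
    column and clamped to the image bounds. (Hypothetical scheme.)
    """
    if not image:
        return []
    width = len(image[0])
    source_columns = []
    for x in range(width):
        offset = amplitude * math.sin(2.0 * math.pi * x / period + phase)
        src = int(round(x + offset))
        source_columns.append(min(max(src, 0), width - 1))
    # Every row uses the same column mapping, which produces vertical bands.
    return [[row[c] for c in source_columns] for row in image]


# Example: the same parameters always reproduce the same deformed image,
# which is what allows a logged signature to be regenerated later.
original = [list(range(10)) for _ in range(2)]
watermarked = deform_bands(original, amplitude=1.0, period=10)
```

In this sketch, keeping `amplitude * 2 * math.pi / period` below 1 keeps the column mapping from folding back on itself, which is one possible way to preserve legibility.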
Later, should a document image be compromised, an image of the compromised document can be matched to the original document and the audit trail can be utilized to regenerate the documents with their signatures to enable comparison with an image of the compromised document. This comparison may identify the user who disclosed the compromised document or narrow the possible users that disclosed the compromised document.
These and other embodiments are described and illustrated herein.
In the following detailed description, reference is made to the accompanying drawings that form a part hereof, and in which is shown by way of illustration specific embodiments in which the inventive subject matter may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice them, and it is to be understood that other embodiments may be utilized and that structural, logical, and electrical changes may be made without departing from the scope of the inventive subject matter. Such embodiments of the inventive subject matter may be referred to, individually and/or collectively, herein by the term “invention” merely for convenience and without intending to voluntarily limit the scope of this application to any single invention or inventive concept if more than one is in fact disclosed.
The following description is, therefore, not to be taken in a limited sense, and the scope of the inventive subject matter is defined by the appended claims.
The functions or algorithms described herein are implemented in hardware, software or a combination of software and hardware in one embodiment. The software comprises computer executable instructions stored on computer readable media such as memory or other type of storage devices. Further, described functions may correspond to modules, which may be software, hardware, firmware, or any combination thereof. Multiple functions are performed in one or more modules as desired, and the embodiments described are merely examples. The software is executed on a digital signal processor, ASIC, microprocessor, or other type of processor operating on a system, such as a personal computer, server, a router, or other device capable of processing data including network interconnection devices.
Some embodiments implement the functions in two or more specific interconnected hardware modules or devices with related control and data signals communicated between and through the modules, or as portions of an application-specific integrated circuit. Thus, the exemplary process flow is applicable to software, firmware, and hardware implementations.
The system 100 includes clients 102, 104, 106 that may be utilized to generate or capture document images, such as images of checks, invoices, receipts, specifications, contracts, and other documents. The clients 102, 104, 106 may then transmit captured or generated document images via a network 108, such as one or more of the Internet, a Local Area Network, or other network, to a backend system 110 that stores image copies and tracks accessing thereof.
The clients 102, 104, 106 are illustrated as a smartphone 102, a personal computer 104, and a tablet 106. However, some embodiments may include other client types, such as self-service terminals (SSTs), including automated teller machines (ATMs) and self-checkout terminals. Such SSTs may include imaging devices to generate images of presented checks, generated receipts, and the like. The clients may also include check imaging devices utilized to process checks by payment processing entities. Other client types may also be present in other embodiments.
The backend system 110 may be a standalone document image management system or may be a part of another system, such as a banking system, a customer relationship management system, an accounting system, an Enterprise Resource Management system, a document management system, and the like. The backend system 110 may be deployed to one or more physical or virtual computing devices in various embodiments. The backend system 110 may store image and log data locally thereon, on one or more databases 112, or a combination thereof.
When a client 102, 104, 106 user wants to view a document image, the user manipulates an application or app on their respective client 102, 104, 106 to request a document image. The document image may be retrieved by a unique document identifier, such as a check number, an index number, file name, or other identifier. The client 102, 104, 106 app or application, which may be a web browser-accessible application, upon receipt of input to request a document image generates and sends a request to the backend system 110 via the network 108. The request will include or be associated with data identifying the requested document and the user requesting the document. The backend system 110 then retrieves the requested document and geometrically deforms the document to watermark the document image that will be provided as is discussed in greater detail below. The backend system 110 then logs the access of the document in a watermarked image log with the document identifier, the user identifier, parameters utilized in geometrically deforming the document, and a date time stamp. The backend system 110 then transmits the geometrically deformed document image to the client 102, 104, 106 of the requesting user.
Later, the document image provided to the requesting user may be compromised. An image of the compromised document can be processed according to various embodiments herein to match the compromised document image to an instance where a user requested to view the original document, or at least to reduce the number of viewing users under consideration when investigating to identify the compromising source. At a high level, the matching process, which is also described in greater detail below, includes a user providing a compromised document image via a client 102, 104, 106. The user may then identify the original document stored by the backend system 110 that was compromised and submit the compromised document image and the original document identifier to the backend system 110. The backend system 110 then processes at least one of the compromised and original document images so that the two images have the same resolution and size and so that any skewing in the compromised document image is removed. Skewing refers to the document having a trapezoidal or other shape in the compromised document image, which can occur when a camera or other imaging device that captured the image was at an angle or was rotated when capturing the image. One way to consider this processing is as normalization of the images to enable a like-for-like comparison.
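One possible sketch of the size and resolution part of this normalization is given below. It covers only rescaling by nearest-neighbor sampling and conversion to a binary form; deskewing is not shown. The function names `resize_nearest` and `binarize` and the default threshold of 128 are assumptions for illustration.

```python
def resize_nearest(image, new_width, new_height):
    """Rescale a list-of-rows image to the given size by nearest-neighbor sampling."""
    old_height = len(image)
    old_width = len(image[0])
    resized = []
    for y in range(new_height):
        # Map each output row and column back to the closest source pixel.
        src_y = min(old_height - 1, y * old_height // new_height)
        source_row = image[src_y]
        resized.append([
            source_row[min(old_width - 1, x * old_width // new_width)]
            for x in range(new_width)
        ])
    return resized


def binarize(image, threshold=128):
    """Convert grayscale pixel values to 0 or 1 (assumed threshold)."""
    return [[1 if pixel >= threshold else 0 for pixel in row] for row in image]


# Example: bring a small compromised image up to the stored image's size.
compromised = [[0, 255], [255, 0]]
normalized = binarize(resize_nearest(compromised, 4, 4))
```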
The backend system 110 then retrieves watermark image log data associated with the original document. This retrieved data includes the parameters utilized in geometrically deforming the document image, a user identifier, and a date-time stamp. In some embodiments, the backend system 110 may eliminate retrieved data that has a date-time stamp after a date and time the compromised image was known to have been compromised. The remaining retrieved data, or all of the retrieved data when no data has been eliminated, is then utilized to generate a comparison image for each logged document viewing. The comparison images are then compared to the compromised document image to identify a match, or a likely match where a scoring algorithm yields a score at or above a matching threshold. The comparison, in some embodiments, includes an image subtraction. For example, the images may be in or converted to a binary black and white form. Since the images have been processed to be of the same size, shape, and resolution, one image may be subtracted from the other image (i.e., pixel by pixel subtraction). When the result of the subtraction is zero, this indicates an absolute match. Absolute matching is likely to be a rare occurrence, as reproduction of images, especially by taking pictures or scanning of documents and images of documents, rarely results in a perfect reproduction. However, when the result indicates 70% or greater matching, a match may be deemed quite likely. The percentage of matching pixels indicative of a likely match is a threshold in some embodiments, and that threshold may be adjusted in some embodiments.
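The pixel-by-pixel subtraction and thresholding described above can be sketched as follows, assuming both images are already binary (0 or 1) and of identical size. The function names and the choice to report the score as a fraction of matching pixels, so that a zero difference corresponds to a score of 1.0, are assumptions of this sketch.

```python
def match_score(image_a, image_b):
    """Return the fraction of pixels that agree between two binary images."""
    if len(image_a) != len(image_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(image_a, image_b)
    ):
        raise ValueError("images must be normalized to the same size first")
    total = 0
    differing = 0
    for row_a, row_b in zip(image_a, image_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += 1
            # For binary pixels, the absolute difference is 1 exactly
            # where the two images disagree.
            differing += abs(pixel_a - pixel_b)
    return 1.0 - differing / total if total else 0.0


def is_likely_match(score, threshold=0.70):
    """Apply an adjustable matching threshold (70% by default)."""
    return score >= threshold


# Example: three of four pixels agree, giving a score of 0.75.
score = match_score([[0, 1], [1, 0]], [[0, 1], [1, 1]])
```

Because document images are often mostly background, a practical implementation might restrict the score to foreground regions; that refinement is outside this sketch.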
In some instances, one image may not be aligned well with the pixels of the other image. Thus, in some embodiments, the processing of the images before the comparing is performed may include an alignment of the images for comparing. This may include shifting some or all pixels of one image up or down and left or right.
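A minimal sketch of such an alignment is shown below: it tries every small horizontal and vertical shift of one image and keeps the shift giving the highest pixel agreement with the other. The exhaustive search, the `max_shift` bound, and the function names are assumptions made for illustration.

```python
def shift_image(image, dx, dy, fill=0):
    """Shift an image right by dx and down by dy, padding with `fill`."""
    height = len(image)
    width = len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            src_x, src_y = x - dx, y - dy
            if 0 <= src_x < width and 0 <= src_y < height:
                shifted[y][x] = image[src_y][src_x]
    return shifted


def best_alignment(reference, candidate, max_shift=2):
    """Return (score, dx, dy) for the shift of `candidate` that best fits `reference`."""
    def agreement(a, b):
        same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
        return same / (len(a) * len(a[0]))

    best = (agreement(reference, candidate), 0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score = agreement(reference, shift_image(candidate, dx, dy))
            if score > best[0]:
                best = (score, dx, dy)
    return best


# Example: a candidate displaced one pixel to the right is realigned
# by shifting it one pixel back to the left.
reference = [[0] * 5 for _ in range(5)]
reference[2][2] = 1
displaced = shift_image(reference, 1, 0)
result = best_alignment(reference, displaced)
```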
When an actual, likely, or potential match is identified, the backend system outputs the watermark image log data associated therewith. The backend system may output the data back to the client 102, 104, 106 of the user that submitted the compromised document image or otherwise notify that user or one or more other users.
The top portion of
Note, however, that the above example is greatly exaggerated. The transition between deformations will typically be more gradual, and the amount of stretching will be below the threshold at which it becomes obviously visible.
Note as well that the signature, in some embodiments, is randomly generated according to well known random number and value generation techniques. The signature is recorded in a watermarked image log, as mentioned above, with time, date, viewing user, and image identifying data.
The compressing and stretching of an image such as is applied to arrive at the image 204 above is but one example of a type of geometric deformation that can be applied. Other embodiments may include modifying the axis of the bands to yield a completely different signature, such as orienting the stretching and compressing at an angle (e.g., 30, 45, or 90 degrees). Different angles and orientations also yield unique signatures and can be one parameter of the signature.
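As a hedged illustration of how a band angle could serve as one parameter of the signature, the sketch below displaces each pixel along a direction set by `angle_degrees`. The sinusoidal displacement, the nearest-neighbor sampling, and all names are assumptions of this sketch rather than details of the embodiments.

```python
import math


def deform_angled_bands(image, amplitude, period, angle_degrees):
    """Stretch and compress an image in bands oriented by `angle_degrees`.

    At 0 degrees, pixels move only horizontally; at 90 degrees, only
    vertically; other angles displace pixels along a diagonal direction.
    """
    height = len(image)
    width = len(image[0])
    theta = math.radians(angle_degrees)
    ux, uy = math.cos(theta), math.sin(theta)
    deformed = []
    for y in range(height):
        row = []
        for x in range(width):
            # Position of this pixel along the deformation direction.
            t = x * ux + y * uy
            d = amplitude * math.sin(2.0 * math.pi * t / period)
            src_x = min(max(int(round(x + d * ux)), 0), width - 1)
            src_y = min(max(int(round(y + d * uy)), 0), height - 1)
            row.append(image[src_y][src_x])
        deformed.append(row)
    return deformed


# Example: the same amplitude and period at two angles yields two
# different deformed images, i.e. two distinct signatures.
grid = [[x + 10 * y for x in range(8)] for y in range(8)]
horizontal = deform_angled_bands(grid, 1.5, 8, 0)
vertical = deform_angled_bands(grid, 1.5, 8, 90)
```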
Additionally, bands are not the only form of geometric deformation. Alternatives include circular, elliptical, along an irregular boundary or curve, and the like. The parameters of the curve form part of the signature that is recorded. Another example is a Bezier curve superimposed on the image where pixels are stretched away from the normal of the curve, such as illustrated by image 206.
Now with reference to the bottom portion of
The method 300, in some embodiments, includes the backend system 110 receiving 302 a request for a stored document image via the network 108 from a requestor (i.e., a user of a client 102, 104, 106). The backend system 110 then retrieves 304 the requested stored document image from storage and applies 306 a parameterized image deformation algorithm according to at least one input parameter to generate a watermarked image. The input parameter may be one or more randomly generated values, tracked incremented values, and the like that are programmatically obtained and provided as input arguments to the parameterized image deformation algorithm. The method 300 then stores 308 the at least one parameter and an identifier of the requestor in association with an identifier of the requested stored document in a watermarked image log. A date time stamp or similar data may also be stored 308 to the watermarked image log. The method 300 then transmits 310 the watermarked image to the requestor via the network 108.
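The flow of the method 300 might be sketched as follows. The in-memory `image_store` dictionary and `watermark_log` list, the log field names, the parameter ranges, and the sinusoidal `deform_bands` helper are all hypothetical stand-ins chosen for this sketch; transmission over the network 108 is represented simply by the return value.

```python
import math
import random
from datetime import datetime, timezone


def deform_bands(image, amplitude, period, phase=0.0):
    """Hypothetical parameterized deformation: sinusoidal column remapping."""
    width = len(image[0])
    columns = [
        min(max(int(round(x + amplitude * math.sin(2.0 * math.pi * x / period + phase))), 0), width - 1)
        for x in range(width)
    ]
    return [[row[c] for c in columns] for row in image]


def serve_document_image(image_store, watermark_log, document_id, requestor_id, rng=None):
    """Retrieve, watermark, log, and return a stored document image."""
    rng = rng or random.SystemRandom()
    stored_image = image_store[document_id]              # retrieving 304
    params = {                                           # randomly generated input parameters
        "amplitude": rng.uniform(0.5, 1.5),
        "period": rng.randint(8, 32),
        "phase": rng.uniform(0.0, 2.0 * math.pi),
    }
    watermarked = deform_bands(stored_image, **params)   # applying 306
    watermark_log.append({                               # storing 308
        "document_id": document_id,
        "requestor_id": requestor_id,
        "params": params,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    return watermarked                                   # transmitting 310


# Example usage with a toy image store.
store = {"check-1001": [[x % 7 for x in range(40)] for _ in range(3)]}
log = []
image_for_user = serve_document_image(store, log, "check-1001", "user-42")
```

Because the parameters are logged, applying `deform_bands` to the stored image with a log entry's `params` reproduces exactly the image that was sent to that requestor.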
In some embodiments of the method 300, an original document of the stored document image includes a representation of a unique identifier of the original document and the identifier of the requested stored document is the unique identifier of the original document. The unique identifier may be a check number which may also include an account number, a file name, and other unique data item identifier types.
In some embodiments, the parameterized image deformation algorithm geometrically deforms the stored document image to a degree as specified by each of the at least one parameters. In one such embodiment, at least one of the parameters specifies at least one geometric deformation method to be applied to deform the stored document image, such as a banded or Bezier curve geometric deformation method as described above. In some such embodiments, at least one parameter identifies an amplitude of geometric deformation to be applied to the stored document image.
The method 400, in some embodiments, includes receiving 402 a compromised image and correlating 404 the compromised image to the stored document image. The compromised image may be received 402 from a client 102, 104, 106 user and the correlating 404 may be based on received user input or by reading data from the compromised image, such as by performing optical character recognition. The method 400 then processes 406 at least one of the compromised and stored images such that both the compromised and stored document images have the same vertical and horizontal properties.
The method 400 then retrieves 408 each of the at least one parameters and respective requestor identifier for each entry in the watermarked image log including an identifier of the stored document image correlated to the compromised image. Next, the method 400 applies 410 the parameterized image deformation algorithm against the stored document image for each of the at least one parameters retrieved to generate comparison images of the stored document image. Comparing 412 is then performed with regard to each comparison image and the processed compromised image to generate a matching score. When the comparing results in a score meeting a matching criterion, the method 400 outputs 414 at least the requestor identifier associated in the watermarked image log with the at least one parameter applied by the parameterized deformation algorithm in generating the respective comparison image.
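The retrieving 408, applying 410, comparing 412, and outputting 414 steps could be sketched as below. The sketch assumes the compromised image has already been normalized and converted to the same binary form as the stored image. The log entry layout, the `deform_bands` helper, and the fraction-of-matching-pixels score are assumptions carried over for illustration only.

```python
import math


def deform_bands(image, amplitude, period, phase=0.0):
    """Hypothetical parameterized deformation: sinusoidal column remapping."""
    width = len(image[0])
    columns = [
        min(max(int(round(x + amplitude * math.sin(2.0 * math.pi * x / period + phase))), 0), width - 1)
        for x in range(width)
    ]
    return [[row[c] for c in columns] for row in image]


def fraction_matching(image_a, image_b):
    """Fraction of equal pixels between two same-sized images."""
    same = sum(pa == pb for ra, rb in zip(image_a, image_b) for pa, pb in zip(ra, rb))
    return same / (len(image_a) * len(image_a[0]))


def identify_candidates(stored_image, compromised_image, log_entries, threshold=0.70):
    """Return (score, requestor_id) pairs meeting the threshold, best first."""
    candidates = []
    for entry in log_entries:
        # Regenerate the image this requestor would have received.
        comparison = deform_bands(stored_image, **entry["params"])
        score = fraction_matching(comparison, compromised_image)
        if score >= threshold:
            candidates.append((score, entry["requestor_id"]))
    return sorted(candidates, reverse=True)


# Example: a stored striped image, two logged accesses, and a compromised
# image that is an exact copy of what the first requestor received.
stored = [[1 if (x // 2) % 2 == 0 else 0 for x in range(16)] for _ in range(4)]
entries = [
    {"requestor_id": "alice", "params": {"amplitude": 1.5, "period": 16, "phase": 0.0}},
    {"requestor_id": "bob", "params": {"amplitude": 1.5, "period": 16, "phase": 3.141592653589793}},
]
leaked = deform_bands(stored, **entries[0]["params"])
suspects = identify_candidates(stored, leaked, entries)
```

Lowering `threshold` returns more candidates, which corresponds to narrowing the field of accesses rather than requiring an exact match.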
Returning to the computer 510, memory 504 may include volatile memory 506 and non-volatile memory 508. Computer 510 may include or have access to a computing environment that includes a variety of computer-readable media, such as volatile memory 506 and non-volatile memory 508, removable storage 512 and non-removable storage 514. Computer storage includes random access memory (RAM), read only memory (ROM), erasable programmable read-only memory (EPROM) and electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD ROM), Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium capable of storing computer-readable instructions.
Computer 510 may include or have access to a computing environment that includes input 516, output 518, and a communication connection 520. The input 516 may include one or more of a touchscreen, touchpad, mouse, keyboard, camera, one or more device-specific buttons, one or more sensors integrated within or coupled via wired or wireless data connections to the computer 510, and other input devices. The computer 510 may operate in a networked environment using a communication connection 520 to connect to one or more remote computers, such as database servers, web servers, and other computing devices. An example remote computer may include a personal computer (PC), server, router, network PC, a peer device or other common network node, or the like. The communication connection 520 may be a network interface device such as one or both of an Ethernet card and a wireless card or circuit that may be connected to a network. The network may include one or more of a Local Area Network (LAN), a Wide Area Network (WAN), the Internet, and other networks. In some embodiments, the communication connection 520 may also or alternatively include a transceiver device, such as a BLUETOOTH® device that enables the computer 510 to wirelessly receive data from and transmit data to other BLUETOOTH® devices.
Computer-readable instructions stored on a computer-readable medium are executable by the processing unit 502 of the computer 510. A hard drive (magnetic disk or solid state), CD-ROM, and RAM are some examples of articles including a non-transitory computer-readable medium. For example, various computer programs 525 or apps, such as one or more applications and modules implementing one or more of the methods illustrated and described herein or an app or application that executes on a mobile device or is accessible via a web browser, may be stored on a non-transitory computer-readable medium.
It will be readily understood by those skilled in the art that various other changes in the details, material, and arrangements of the parts and method stages which have been described and illustrated in order to explain the nature of the inventive subject matter may be made without departing from the principles and scope of the inventive subject matter as expressed in the subjoined claims.