The present invention is related to determining the geographic location of a scanned digital image.
Consumers today are switching from film-based chemical photography to digital photography in increasing numbers. The instantaneous nature of image capture and review, the ease of use, numerous output and sharing options, multimedium capabilities, and on-line and digital medium storage capabilities have all contributed to consumer acceptance of this technological advancement. A hard drive, on-line account, or a DVD can store thousands of images, which are readily available for printing, transmitting, conversion to another format, conversion to another medium, or use in producing an image product. Since the popularity of digital photography is relatively new, the majority of images retained by a typical consumer still takes the form of hardcopy medium. These legacy images can span decades of time and have a great deal of personal and emotional importance to the collection's owner. In fact, these images often increase in value to their owners over time. Thus, even images that were once not deemed good enough for display are now cherished. These images are often stored in boxes, albums, frames, or even their original photofinishing return envelopes.
Getting a large collection of legacy images into a digital form is often a formidable task for a typical consumer. The user is required to sort through hundreds of physical prints and place them in some relevant order, such as chronology or sorting by event. Typically, events are contained on the same roll of film or across several rolls of film processed in the same relative time frame. After sorting the prints, the user would be required to scan the medium to make a digital version of the image. Scanning hardcopy image medium such as photographic prints to obtain a digital record is well known. Many solutions currently exist to perform this function and are available at retail from imaging kiosks and digital minilabs and at home with “all-in-one” scanner/printers or with personal computers equipped with medium scanners. Some medium scanning devices include medium transport structure, simplifying the task of scanning hardcopy medium. Using any of these systems requires that the user spend time or expense converting the images into a digital form only to be left with the problem of providing some sort of organizational structure to the collection of digital files generated.
The prior art teaches sorting scanned hardcopy images by physical characteristics and also utilizing information/annotation from the front and back of the image. This teaching permits grouping images in a specific chronological sequence, which can be adequate for very large image collections.
Hardcopy images exist from many areas of the world. It is desirable to identify the geographic location of a given image as this information assists in searching and organizing an image collection (e.g. an image collection viewer can view all images captured in Canada, or all images from California in the years 1950-1960). Current methods for identifying geographic location from an image (e.g. J. Hays, A. Efros, “IM2GPS: estimating geographic information from a single image”. Proceedings of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2008) rely solely on the information in a digital image but ignore valuable features such as watermarks, postage stamps, language, annotation, and date format. Therefore, current methods are not adequate for accurately determining a geolocation for a hardcopy image.
The present invention provides a method of determining the geographic location of a hardcopy medium having an image side and a non-image side, comprising:
(a) scanning a hardcopy medium to produce a scanned digital image;
(b) scanning the non-image side of the hardcopy medium;
(c) detecting a location feature from the scan of the non-image side of the hardcopy medium;
(d) using the location feature to determine the geographic location of the scanned digital image; and
(e) storing the determined geographic location of the scanned digital image.
The invention can be more completely understood by considering the detailed description of various embodiments of the invention which follows in connection with the accompanying drawings. Referring now to the drawings in which like reference numbers represent corresponding parts throughout:
Over time, these collections become large and unwieldy. Users typically store these collections in boxes, and it is difficult to find and gather images from certain events or time eras. It can require a significant time investment for the user to locate their images given the sorting requirement they can have at that time. For example, if you were looking for all images of your children, it would be extremely difficult to manually search your collection and look at each image to determine if it includes your child. If you are looking for images from the 1970s, you would again face the difficult process of looking at each image (either the front or the back) to find the year it was taken.
These unorganized collections of hardcopy medium 10 also include print medium of various sizes and formats. This unorganized hardcopy medium 10 can be converted to digital form with a medium scanner capable of duplex scanning (not shown). If the hardcopy medium 10 is provided in a “loose form,” such as with prints in a shoebox, it is preferable to use a scanner with an automatic print feed and drive system. If the hardcopy medium 10 is provided in albums or in frames, a page scanner or digital copy stand should be used so as not to disturb or potentially damage the hardcopy medium 10.
Once digitized, the resulting digitized images are separated into designated subgroups 20, 30, 40, 50 based on physical size and format determined from the image data recorded by the scanner. Existing medium scanners, such as the KODAK i600 Series Document Scanners, automatically transport and duplex scan hardcopy medium, and include image-processing software to provide automatic de-skewing, cropping, correction, text detection, and Optical Character Recognition (OCR). The first subgroup 20 represents images of bordered 3.5″×3.5″ (8.89 cm×8.89 cm) prints. The second subgroup 30 represents images of borderless 3.5″×5″ (8.89 cm×12.7 cm) prints with round corners. The third subgroup 40 represents images of bordered 3.5″×5″ (8.89 cm×12.7 cm) prints. The fourth subgroup 50 represents images of borderless 4″×6″ (10.16 cm×15.24 cm) prints. Even with this new organizational structure, any customer-provided grouping or sequence of images is maintained as a sort criterion. Each group, whether envelope, pile or box, should be scanned and tagged as a member of an “as received” group, and the sequence within the group should be recorded.
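The separation into subgroups by physical size and format can be illustrated with a minimal sketch. The `PrintScan` record, the half-inch snapping rule, and the function names are assumptions made for illustration only; the specification does not prescribe how the scanner-derived measurements are compared.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PrintScan:
    """Hypothetical record of physical characteristics measured by the scanner."""
    width_in: float
    height_in: float
    bordered: bool
    round_corners: bool = False


def subgroup_key(scan: PrintScan) -> Tuple[float, float, bool, bool]:
    """Return a key shared by prints of the same size and format.

    Sides are sorted so that portrait and landscape feeds match, and are
    snapped to the nearest half inch (an assumed tolerance).
    """
    short_side, long_side = sorted((scan.width_in, scan.height_in))
    snap = lambda x: round(x * 2) / 2
    return (snap(short_side), snap(long_side), scan.bordered, scan.round_corners)


def group_prints(scans: List[PrintScan]) -> Dict[tuple, List[int]]:
    """Group scan indices by subgroup key, preserving the as-received order."""
    groups: Dict[tuple, List[int]] = defaultdict(list)
    for index, scan in enumerate(scans):
        groups[subgroup_key(scan)].append(index)
    return dict(groups)
```

Because each subgroup stores indices in scan order, the customer-provided sequence remains available as a sort criterion within every subgroup.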
This dynamic digital metadata record is an organizational structure that becomes even more important as image collections grow in size and time frame. If the hardcopy image collection is large, including thousands of images, and is converted to digital form, an organizational structure such as a file structure, searchable database, or navigational interface is required in order to establish usefulness.
Photographic print medium 90 and the like have an image surface 91, a non-image surface 100, and often include a manufacturer's watermark 102 on the non-imaging surface 100 of the print medium 90. The manufacturer of the print medium 90 prints watermarks 102 on “master rolls” of medium, which are slit or cut into smaller rolls suitable for use in photo processing equipment such as kiosks, minilabs, and digital printers. Manufacturers change watermarks 102 from time to time as new medium types with new characteristics, features and brand designations are introduced to the market. Watermarks 102 are used for promotional activities such as advertising manufacturer sponsorships, to designate special photofinishing processes and services, and to incorporate market specific characteristics such as foreign language translations for sale in foreign markets. Watermarks 102 are typically non-photographically printed on the non-image surface 100 of the print medium 90 with a subdued density and can include text of various fonts, graphics, logos, color variations, multiple colors, and typically run diagonally to the medium roll and cut print shape.
Manufacturers also include slight variations to the master roll watermarks such as adding a line above or below a designated character in the case of an alphanumeric watermark. This coding technique is not obvious or even apparent to the user, but is used by the manufacturer in order to monitor manufacturing process control or to identify the location of a manufacturing process problem if a defect is detected. Different variations are printed at set locations across the master medium roll. When finished rolls are cut from the master roll they retain the specific coded watermark variant applied at that relative position along the master roll. In addition, manufacturers maintain records of the various watermark styles, coding methodologies, and when specific watermark styles were introduced into the market.
In testing with actual consumer hardcopy medium, it has been determined that watermark variations, including manufacturer watermarks with special process control coding, provided a very effective way to determine original film roll printing groupings. Once hardcopy medium images are separated into original roll printing groups, image analysis techniques can be used to further separate the roll groupings into individual events. Watermark analysis can also be used to determine printing sequence, printing image orientation, and the time frame in which the print was generated.
A typical photofinishing order, such as processing and printing a roll of film, will, under most circumstances, be printed on medium from the same finished medium roll. If a medium roll contains a watermark with a manufacturer's variant code and is used to print a roll of film negatives, the resulting prints will have a watermark that will most likely be unique within a user's hardcopy medium collection. An exception to this can be if a user had several rolls of film printed at the same time by the same photofinisher, as with film processed at the end of an extended vacation or significant event. However, even if the photofinisher had to begin a new roll of print paper during printing a particular customer's order, it is likely that the new roll will be from the same batch as the first. Even if that is not the case, the grouping of the event such as a vacation into two groups on the basis of differing back prints is not catastrophic.
The medium manufacturer, on an ongoing basis, releases new medium types with unique watermarks 102 to the market. Digital image scanning systems (not shown) can convert these watermarks 102 into digital records, which can be analyzed using Optical Character Recognition (OCR) or digital pattern matching techniques. This analysis is directed at identifying the watermark 102 so that the digital record can be compared to the contents of Look Up Tables (LUTs) provided by a manufacturer of the medium. Once identified, the scanned watermark 102 can be used to provide a date of manufacture or sale of the print medium. This date can be stored in the dynamic digital metadata record. The image obtained from the image surface 91 of the hardcopy medium 90 is sometimes provided with a date designation 92 such as the markings from a camera date back, which can be used to establish a time frame for a scanned hardcopy medium image 96 without intervention from the user.
If the hardcopy medium 90 has an unrecognized watermark style, that watermark pattern is recorded and stored as metadata in the dynamic digital metadata record and later used for sorting purposes. If a photofinisher or user applied date or other information indicative of an event, time frame, location, subject identification, or the like is detected, that information would be incorporated into the LUT and used to establish a chronology or other organizational structure for subsequent images including the previously unidentified watermark. If a user or photofinisher applied date is observed on that hardcopy medium 90, that date can be added to the LUT. The automatically updated LUT can now use this new associated date whenever this unknown watermark style is encountered. This technique can be deployed to establish a relative chronology for hardcopy image collections that can span decades.
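The self-updating behavior of the LUT can be sketched as follows. This is a minimal illustration, assuming watermarks have already been reduced to string identifiers and dates to years; the class name, method names, and the example watermark strings are hypothetical and are not taken from any manufacturer's table.

```python
from typing import Dict, Optional, Set


class WatermarkLUT:
    """Toy look-up table mapping a watermark identifier to an associated year."""

    def __init__(self, known: Dict[str, int]) -> None:
        self._years: Dict[str, int] = dict(known)
        self._unrecognized: Set[str] = set()

    def lookup(self, watermark: str) -> Optional[int]:
        """Return the associated year, or None while remembering the unknown style."""
        year = self._years.get(watermark)
        if year is None:
            # Record the unrecognized style so it can still be used for sorting.
            self._unrecognized.add(watermark)
        return year

    def observe_date(self, watermark: str, year: int) -> None:
        """Attach a user- or photofinisher-applied date to an unknown watermark.

        Manufacturer-provided entries are left unchanged.
        """
        if watermark not in self._years:
            self._years[watermark] = year
            self._unrecognized.discard(watermark)

    def unrecognized(self) -> Set[str]:
        """Return the styles seen so far that still have no associated date."""
        return set(self._unrecognized)
```

Once `observe_date` has been called for a previously unknown style, every later print carrying that watermark resolves to the same year, which is what permits a relative chronology across a collection.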
Another technique uses the physical format characteristics of hardcopy medium 90 and correlates these to the film systems that were used to create them and the time frames that these film systems were in general use. Examples of these formats and related characteristics include the INSTAMATIC (a trademark of the Eastman Kodak Company) Camera and 126 film cartridge introduced in 1963 which produced 3.5 inch×3.5 inch (8.89 cm×8.89 cm) prints and was available in roll sizes of 12, 20, and 24 frames.
The Kodak Instamatic camera 110 film cartridge was introduced in 1972 and produced 3.5″×5″ (8.89 cm×12.7 cm) prints and was available in roll sizes of 12, 20, and 24 frames. The Kodak Disc camera and Kodak Disc film cartridge were introduced in 1982 and produced 3.5″×4.5″ (8.89 cm×11.43 cm) prints with 15 images per Disc. Kodak, Fuji, Canon, Minolta and Nikon introduced the Advanced Photo System (APS) in 1996. The camera and film system had the capability for user selectable multiple formats including Classic, HDTV, and Pan, producing print sizes of 4″×6″, 4″×7″, and 4″×11″ (10.16 cm×15.24 cm, 10.16 cm×17.78 cm, 10.16 cm×27.94 cm). Film roll sizes were available in 15, 25, and 40 frames, and index prints containing imagettes of all images recorded on the film were a standard feature of the system.
The APS system has a data exchange system permitting the manufacturer, camera, and photofinishing system to record information on a clear magnetic layer coated on the film. An example of this data exchange was that the camera could record the time of exposure and the user selected format on the film's magnetic layer, which was read and used by the photofinishing system to produce the print in the desired format and record the time of exposure, frame number, and film roll ID# on the back of the print and on the front surface of a digitally printed index print. 35 mm photography has been available in various forms since the 1920s and has maintained popularity until the present in the form of “One Time Use Cameras.” 35 mm systems typically produce 3.5″×5″ (8.89 cm×12.7 cm) or 4″×6″ (10.16 cm×15.24 cm) prints, and roll sizes are available in 12, 24 and 36 frames. “One Time Use Cameras” have the unique characteristic that the film is “reverse wound,” meaning that the film is wound back into the film cassette as pictures are taken, producing a print sequence opposite to the normal sequence. Characteristics such as physical format, expected frame count, and imaging system time frame can all be used to organize scanned hardcopy medium into meaningful events, time frames, and sequences.
As with traditional photography, instant photography systems also changed over time. For example, the Instant film SX-70 format was introduced in the 1970s, and the Spectra, Captiva, and I-Zone systems were introduced in the 1990s, each of which had a unique print size, shape, and border configuration.
For cameras with a square format, the photographer had little incentive to rotate the camera. However, for image capture devices that produce rectangular hardcopy prints, the photographer sometimes rotates the image capture device by 90 degrees about the optical axis to capture a portrait format image (i.e. the image to be captured has a height greater than its width, to capture objects such as buildings that are taller than they are wide) rather than a landscape format image (i.e. the image to be captured has a width greater than its height).
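The correlation between print format and film system time frame described above can be sketched as a small table lookup. The sizes and introduction years below are the ones stated in this description; the table layout, the use of 1920 to stand for "the 1920s", the size tolerance, and the function names are assumptions for illustration.

```python
from typing import List, Optional, Set, Tuple

# (film system, approximate year of introduction, print sizes in inches)
FILM_SYSTEMS: List[Tuple[str, int, Set[Tuple[float, float]]]] = [
    ("126", 1963, {(3.5, 3.5)}),
    ("110", 1972, {(3.5, 5.0)}),
    ("Disc", 1982, {(3.5, 4.5)}),
    ("APS", 1996, {(4.0, 6.0), (4.0, 7.0), (4.0, 11.0)}),
    ("35 mm", 1920, {(3.5, 5.0), (4.0, 6.0)}),  # "since the 1920s"
]


def candidate_systems(width_in: float, height_in: float,
                      tolerance_in: float = 0.1) -> List[Tuple[str, int]]:
    """Return (system, introduction year) pairs whose print size matches."""
    short_side, long_side = sorted((width_in, height_in))
    matches = []
    for name, year, sizes in FILM_SYSTEMS:
        for s, l in sizes:
            if abs(short_side - s) <= tolerance_in and abs(long_side - l) <= tolerance_in:
                matches.append((name, year))
                break
    return matches


def earliest_possible_year(width_in: float, height_in: float) -> Optional[int]:
    """Lower bound on the print date implied by its format, if any system matches."""
    years = [year for _, year in candidate_systems(width_in, height_in)]
    return min(years) if years else None
```

A format that maps to a single system (for example 3.5″×4.5″) gives a tight lower bound on the print date, whereas a 4″×6″ print remains ambiguous between systems and needs other cues such as watermarks or frame counts.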
In
For example, once every hardcopy medium item has been scanned and an associated complete metadata record 200 has been created, powerful search queries can be constructed to permit the hardcopy medium to be organized in different and creative ways. Accordingly, large volumes of hardcopy medium images can be rapidly converted into digital form and the digital metadata record 200 is dynamically created to completely represent the metadata of the image. This dynamic digital metadata record 200 can then be used for, but not limited to, manipulating the digitized hardcopy images, such as organizing, orientating, restoring, archiving, presenting and enhancing digitized hardcopy images.
Referring now to
The hardcopy medium can be scanned by a scanner in any order in which the medium was received. The medium is prepared 210 and the front and back of the medium are scanned 215. The scanner creates information in the image file that can be used to extract the recorded metadata information 220. By using a Color/Black and White algorithm 225, a decision point is created 230 and the appropriate color map (non-flesh, i.e. black and white) 235, (flesh color) 240 is used to find, but is not limited to, faces in the image. If a face detector is applied to the map rotated to orientations of 0, 90, 180, and 270 degrees, the orientation of the image can be determined and the rotation angle (orientation) is recorded 245. The orientation will be used to automatically rotate the image before it is written (useful before writing to a CD/DVD or displaying one or more images on a display).
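The orientation step can be sketched as a search over the four rotations, keeping the one for which a face detector responds most strongly. The face detector itself is outside the scope of this sketch: `face_score` is a hypothetical callable supplied by the caller, and the image is assumed to be a plain list of pixel rows.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]


def rotate90_ccw(image: Image) -> Image:
    """Rotate an image (list of rows) by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*image)][::-1]


def best_orientation(image: Image,
                     face_score: Callable[[Image], float]) -> Tuple[int, Image]:
    """Try 0, 90, 180 and 270 degree counter-clockwise rotations.

    Returns the rotation angle with the highest detector score together
    with the image rotated by that angle. Ties keep the smallest angle.
    """
    best_angle, best_image = 0, image
    best_score = face_score(image)
    candidate = image
    for angle in (90, 180, 270):
        candidate = rotate90_ccw(candidate)
        score = face_score(candidate)
        if score > best_score:
            best_angle, best_image, best_score = angle, candidate, score
    return best_angle, best_image
```

The returned angle corresponds to the value that would be recorded as the orientation, and the returned image is the automatically rotated version that would be written or displayed.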
Using a border detector 250, a decision point is made if a border 255 is detected. If a border is detected, a minimum density (Dmin) 260 can be calculated by looking in the edge of the image near the border. After the border minimum density is calculated, it is recorded 265 in the derived metadata. Text information/annotation written in the border can be extracted 270. OCR can be used to convert the extracted text information to ASCII codes to facilitate searching. The border annotation is recorded 290 into the derived metadata. The border annotation bitmap can also be recorded 292 into the derived metadata. The border style such as scalloped, straight, rounded is detected 294 and recorded 296 into the derived metadata. If the image is an index print 275, information such as the index print number can be detected 280 and recorded 282. Index print events can also be detected 284 and recorded 286. If the image is not an index print 275, information such as a common event grouping can be detected 277 and recorded 279. The common event grouping is one or more images originating from the same event or a group of images having similar content. For example, a common event grouping can be one or more images originating from a fishing trip, birthday party or vacation for a single year or multiple years. The complete set of metadata In the present embodiment, the determine image transform step 506 uses derived metadata information 298 originally derived by scanning the non-image surface 100 of print medium 90 to determine an image transform 510. For example, the image transform 510 can be an image rotation such that the image is corrected in accordance with a determined image. An image transform 510 is applied to a particular image by the apply image transform step 514, producing an enhanced digital image.
The determine image transform step 506 can also use derived metadata 298 associated with other images from the same event grouping to determine the image transform 510. This is because an event grouping is detected 277 using watermarks 102 and recorded 279, as described above. In addition, the determine image transform 506 step can also use image information (i.e. pixel values) from the image and other image(s) from the same event grouping to determine the image transform 510. After application of the image transform, the improved rotated scanned digital image can be printed on any printer, or displayed on an output device, or transmitted to a remote location or over a computer network. Transmission can include placing the transformed image on a server accessible via the internet, or emailing the transformed image. Also, a human operator can supply operator input 507 to verify that the application of the image transform 510 provides a benefit. For example, the human operator views a preview of the image transform 510 applied to the image, and can decide to ‘cancel’ or ‘continue’ with the application of the image transform. Further, the human operator can override the image transform 510 by suggesting a new image transform (e.g. in the case of image orientation, the human operator indicates via operator input 507 a rotation of counter-clockwise, clockwise, or 180 degrees).
For example, the image transform 510 can be used to correct the orientation of an image based on the derived metadata associated with that image and the derived metadata associated with other images from the same event grouping. The image's orientation indicates which one of the image's four rectangular sides is “up”, from the photographer's point of view. An image having proper orientation is one that is displayed with the correct rectangular side “up”.
In
The geographic location of a hardcopy image is detected with the help of a location feature. A location feature 299 is any information extracted by one or more of a suite of recognizers (a text recognizer 209, a text language recognizer 214, a date recognizer 213, a postmark recognizer 211, a stamp recognizer 207, and a watermark recognizer 212) which operate upon the image and the non-image surfaces of a hardcopy image such that the information is useful in detecting the geographic location of an image. Some examples of a location feature are the format of the printed or handwritten date, the language of the handwritten or printed text, or location specific words extracted from one or more of the aforementioned recognizers. A location specific word is a word in any language which can be directly converted into geographic location(s) using available geographical knowledgebases. A location specific word can be as precise as “Paris, France” or as generic as “beach”. Location specific words specify the geographic location as a distribution over the entire world. The aforementioned recognizers and the location feature(s) which they produce will be described in detail below.
A collection of hardcopy medium 10 is scanned by a scanner 201. Preferably, the scanner 201 scans both the image side (producing a scanned digital image) and the non-image side of each photographic print. The collection of these scans makes up a digital image collection 203.
A text detector 205 is used to detect text on either the scanned digital image or the scan of the non-image side of each image. For example, text can be found with the method described by U.S. Pat. No. 7,177,472. In the present invention, there are two types of text that are of primary interest: handwritten annotations and machine annotations.
Handwritten annotations contain rich information, often describing the location of the photo, the people in the photo and the date of the photo. Recognizing handwritten text, of course, poses challenges due to large variations in handwriting, language, and grammar. There have been several attempts in the machine learning community to address the problem of handwritten character recognition. The published article of R. Plamondon and S. N. Srihari, On-line and off-line handwriting recognition: a comprehensive survey, IEEE Trans. Pattern Analysis and Machine Intelligence, 2000 discusses this field in detail. This problem is more generally covered in the field of Optical Character Recognition (OCR), which refers to the process of mechanical or electronic translation of images of handwritten, typewritten or printed text from a scanned print into machine-editable text. Examples of handwritten and printed text are shown as 1000 and 1006 respectively in
A date recognizer 213 analyzes the recognized text from a text recognizer 209. Text recognizer 209 is an OCR system. The recognized text is analyzed by the date recognizer 213, which searches the text for possible dates, or for features that relate to a date. Note that the image capture date can be precise (e.g. Jun. 26, 2002 at 19:15) or imprecise (e.g. December 2005 or 1975 or the 1960s), or can be represented as a continuous or discrete probability distribution function over time intervals. Features from the image itself give clues related to the date of the image. Additionally, features describing the actual photographic print (e.g. black and white and scalloped edges) are used to determine the date. Finally, annotations can be used to determine the date of the photographic print as well. When multiple features are found, a Bayesian network or another probabilistic model is used to arbitrate and determine the most likely date of the photographic print.
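The arbitration between multiple date features can be illustrated with one simple probabilistic model. The sketch below is not the Bayesian network of the embodiment; it assumes the features are conditionally independent given the decade (a naive Bayes simplification), and all names and numeric likelihoods used with it are hypothetical.

```python
from typing import Dict, Iterable

Distribution = Dict[int, float]  # decade (e.g. 1960) -> probability or likelihood


def combine_date_evidence(prior: Distribution,
                          likelihoods: Iterable[Distribution]) -> Distribution:
    """Multiply a prior over decades by per-feature likelihoods and normalize.

    A decade missing from a feature's likelihood table is treated as
    impossible under that feature (likelihood 0).
    """
    posterior = dict(prior)
    for likelihood in likelihoods:
        for decade in posterior:
            posterior[decade] *= likelihood.get(decade, 0.0)
    total = sum(posterior.values())
    if total == 0.0:
        raise ValueError("the features are mutually inconsistent")
    return {decade: p / total for decade, p in posterior.items()}


def most_likely_decade(posterior: Distribution) -> int:
    """Return the decade with the highest posterior probability."""
    return max(posterior, key=posterior.get)
```

For instance, a likelihood table derived from print characteristics (black and white, scalloped edges) and another derived from an annotation can be passed together; the decade supported by both receives the highest posterior probability.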
For determining the geographic location, the exact date is not as valuable as the format in which date has been written. There are three standard ways to express calendar dates in popular as well as formal use:
A complete list of calendar date formats and their usages can be obtained from any encyclopedia (for example Wikipedia http://en.wikipedia.org/wiki/Calendar_date). The format of writing the date (handwritten or printed date) can be a useful cue to determine where the picture was taken. It is possible that the format of the date alone may not be sufficient to determine the geographic region precisely. Ambiguities could result from errors in identifying the day, month, and year fields in a date represented in any of the aforementioned formats. However, the date format feature used in conjunction with other forms of inferences (for example determining date using front scans only) can be helpful in reducing ambiguities. Another possibility is that the handwritten or printed date could represent the geographic affiliation of the writer, photographer, or her place of residence rather than the geographic affiliation of the picture itself. The calendar date or the format of the calendar date can form part or all of a location feature 299 and is passed to the geographic location detector 300.
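The ambiguity in identifying the day, month, and year fields can be made concrete with a short sketch. It assumes the recognized text contains an all-numeric date with `/`, `-`, or `.` separators, and it labels the field orderings as `DMY`, `MDY`, and `YMD`; these labels, the regular expression, and the function name are illustrative assumptions. Mapping an ordering to a geographic region would require a separate table that is not part of this sketch.

```python
import re
from typing import Set

# Three numeric fields separated by '/', '-' or '.', e.g. 26/06/2002 or 2002-06-26.
_NUMERIC_DATE = re.compile(r"\b(\d{1,4})[./-](\d{1,2})[./-](\d{1,4})\b")


def date_format_candidates(text: str) -> Set[str]:
    """Return the field orderings consistent with the first numeric date found.

    More than one ordering is returned when the fields are ambiguous
    (for example 03/04/1975); an empty set means no numeric date was found.
    """
    match = _NUMERIC_DATE.search(text)
    if match is None:
        return set()
    first, second, third = (int(group) for group in match.groups())
    candidates: Set[str] = set()
    if len(match.group(1)) == 4:
        # A leading four-digit field is taken to be the year.
        if 1 <= second <= 12 and 1 <= third <= 31:
            candidates.add("YMD")
        return candidates
    if 1 <= first <= 31 and 1 <= second <= 12:
        candidates.add("DMY")
    if 1 <= first <= 12 and 1 <= second <= 31:
        candidates.add("MDY")
    return candidates
```

A date such as 26/06/2002 resolves to a single ordering because 26 cannot be a month, whereas 03/04/1975 stays ambiguous and has to be resolved using other location features.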
A postmark recognizer 211 analyzes the recognized text from the text recognizer 209. A postmark is a postal marking made on a letter, package, postcard or a back of a photo indicating the date, time, and place that the item was delivered into the care of the postal service. Postmarks may be applied by hand or by machines, using methods such as rollers or inkjets, while digital postmarks are a recent innovation. Postmarks are found on the back of photographs if they were mailed. An example postmark is shown as 1002 in
A text language recognizer 214 analyzes the recognized text from the text recognizer 209. The preprinted or handwritten text can correspond to one or more languages. For example, the text can be written in English and German. The language(s) of the text can be converted to one or more location specific word(s). A method to detect the language of text can be found in U.S. Patent Application Publication No. 2002/0095288, Text language detection. The language of the preprinted or handwritten text, or the location specific word(s) obtained from the language of the text, can form part or all of a location feature 299 and is passed to the geographic location detector 300.
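The conversion from detected language(s) to location specific words can be sketched as a table lookup. Language detection itself is assumed to have been done already. The two-letter language codes, the deliberately tiny and incomplete table, and the function name are all assumptions for illustration; a real system would draw on a geographical knowledgebase.

```python
from typing import Dict, Iterable, List

# Illustrative and incomplete: language code -> some countries where it is widely used.
LANGUAGE_TO_LOCATION_WORDS: Dict[str, List[str]] = {
    "de": ["Germany", "Austria", "Switzerland"],
    "fr": ["France", "Belgium", "Switzerland", "Canada"],
    "en": ["United States", "United Kingdom", "Canada", "Australia"],
}


def location_words_for_languages(languages: Iterable[str]) -> List[str]:
    """Return location specific words for the detected languages.

    Words are returned in first-seen order without duplicates; languages
    missing from the table contribute nothing.
    """
    seen = set()
    words: List[str] = []
    for language in languages:
        for word in LANGUAGE_TO_LOCATION_WORDS.get(language, []):
            if word not in seen:
                seen.add(word)
                words.append(word)
    return words
```

Text written in two languages yields the union of both word lists, which reflects that language alone gives a broad cue that is narrowed by the other recognizers.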
A stamp recognizer 207 analyses the collection 203. A postage stamp is an adhesive paper evidence of pre-paying a fee for postal services. Usually a small paper rectangle or square that is attached to the object being mailed, the postage stamp signifies that the person sending the letter or package may have either fully, or perhaps partly, pre-paid for delivery. An example postage stamp is shown as 1004 in
A watermark recognizer 212 analyses the collection 203. An example of a manufacturer watermark is shown as 102 in
A visual scene recognizer 206 analyses the collection 203. Visual scene recognition has been studied in the computer vision research area for a number of years. Scene recognition can range from recognizing activities/events in an image to pinpointing the exact place where the image was taken. Scene recognition can be helpful for refining the geographic location in association with other forms of inferences. For example, if the text recognizer 209 detects the text “Nice, France” (626 in
The invention has been described in detail with particular reference to certain preferred embodiments thereof, but it will be understood that variations and modifications can be effected within the spirit and scope of the invention.
Reference is made to commonly assigned U.S. patent application Ser. No. 11/511,798 filed Apr. 21, 2006 (now U.S. Patent Application Publication No. 2007/0250529) entitled “Method for Automatically Generating a Dynamic Digital Metadata Record From Digitized Hardcopy Media” by Louis J. Beato et al; U.S. patent application Ser. No. 12/136,820 filed Jun. 11, 2008, entitled “Finding Image Capture Date of Hardcopy Medium” by Andrew C. Gallagher et al and U.S. patent application Ser. No. 12/136,836 filed Jun. 11, 2008, entitled “Finding Orientation and Date of Hardcopy Medium” by Andrew C. Gallagher et al, the disclosures of which are incorporated herein.