The present invention relates to annotating photographs and more specifically, to a method and a system for associating annotations to photographs during a slide show and enabling a photograph search using the annotations.
In recent years, digital cameras have been replacing film cameras. A growing number of users are storing their photographs on computers. A number of software programs are available to manage these photographs.
Existing software programs enable users to organize and manage photographs into photo albums. The photographs in the photo albums can then be labeled with appropriate descriptions. Some of the existing methods of labeling photographs comprise annotating the photographs with text descriptions or with voice tags. These voice tags can then be transcribed and indexed.
Japanese Patent No. 2003087624, titled “Digital Camera” and assigned to Matsushita Electric Ind Co Ltd, discloses a method for annotating images with speech and indexing the transcribed speech.
Another method as described in U.S. Pat. No. 6,084,582, titled “Method and apparatus for recording a voice narration to accompany a slide show”, records a voice-description to accompany a slide show presentation. Further, the recorded voice-descriptions can be segmented such that segments of voice-descriptions can be digitized, stored and associated with each slide in the slide show.
There exist systems that enable photograph search in a photo album. Some of these systems employ ‘Automatic Image Annotation’ techniques and search based on keywords. Also, cameras often annotate photographs with the time the photograph was taken, and sometimes also with the location of the photograph. These time and location annotations can also be used to conduct a search. However, the kinds of queries that can be answered using these automatic annotations are very limited.
Additionally, in some of the existing systems, users can record short voice segments in which they describe each photograph. These voice segments can then be transcribed using automatic transcription software, and the resulting text can then be used for searching the photographs. For example, various modern digital cameras come with a voice annotation feature, with which users can record a short voice segment and attach the voice segment to a specific picture. Some photo-management systems can transcribe these annotations and conduct a search on them. Other photo-management systems, for example AT&T's Shoebox, let users record photograph descriptions after downloading the photographs to their computers, and these descriptions can be transcribed and made searchable.
However, annotating each photograph individually is inconvenient for a user, and as the collection of photographs grows, annotating the photographs takes a long time. Further, searching using voice annotations such as time, location and names of people facilitates only limited kinds of queries.
An object of the present invention is to provide a method and a system for associating annotations to photographs during a slide show and enabling a photograph search using the annotations.
In order to fulfill the above object, the method comprises displaying at least one photograph during the slide show. In response to displaying at least one photograph, a voice-description corresponding to the slide show is recorded. Thereafter, the voice-description is transcribed to form at least one transcribed-text corresponding to each photograph.
Further, the voice-description corresponding to the slide show can be filtered to remove periods of silence or noise. The voice-description can then be segmented on the basis of a photograph being displayed in the slide show to form a segmented-voice-description corresponding to the photograph. The segmented-voice-description can be saved in a compressed format and can be associated to its corresponding photograph.
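The filtering and segmenting steps above can be illustrated with a minimal Python sketch. It assumes the recording is available as a list of amplitude samples and that the slide show logs the interval during which each photograph was displayed. The names `SlideInterval` and `segment_by_slide`, and the peak-amplitude silence test, are hypothetical choices for illustration and are not taken from the specification. Compression of the saved segments is omitted.

```python
from dataclasses import dataclass


@dataclass
class SlideInterval:
    photo_id: str   # identifier of the photograph shown (hypothetical)
    start: float    # seconds from the start of the recording
    end: float      # seconds from the start of the recording


def segment_by_slide(samples, sample_rate, intervals, silence_threshold=0.02):
    """Split one recording into per-photograph segments.

    A chunk whose peak amplitude stays below silence_threshold is treated
    as silence and dropped (an assumed, deliberately simple filter).
    """
    segments = {}
    for interval in intervals:
        lo = int(interval.start * sample_rate)
        hi = int(interval.end * sample_rate)
        chunk = samples[lo:hi]
        if not chunk:
            continue
        if max(abs(s) for s in chunk) < silence_threshold:
            continue  # no usable speech while this photograph was shown
        # A photograph shown more than once accumulates all of its chunks.
        segments.setdefault(interval.photo_id, []).extend(chunk)
    return segments
```

In this sketch a photograph that was shown only during silence has no entry in the result, which matches the case in which no segmented-voice-description exists for a photograph.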
The system includes a displaying module for displaying photographs during a slide show and a recording module for recording a voice-description for each photograph. The system further includes a transcribing module for transcribing a voice-description for each photograph and a storing module for storing a transcribed-text for each photograph.
The system further includes a searching module that searches for a photograph based on keywords. The photographs found as a result of the search are displayed in a search-display module in order of a context score corresponding to each photograph.
The foregoing objects and advantages of the present invention for a method for annotating photographs during a slide show may be more readily understood by one skilled in the art with reference being had to the following detailed description of several preferred embodiments thereof, taken in conjunction with the accompanying drawings wherein like elements are designated by identical reference numerals throughout the several views, and in which:
Before describing in detail embodiments that are in accordance with the present invention, it should be observed that the embodiments reside primarily in combinations of method steps and system components related to annotating photographs during a slide show. Accordingly, the system components and method steps have been represented where appropriate by conventional symbols in the drawings, showing only those specific details that are pertinent to understanding the embodiments of the present invention so as not to obscure the disclosure with details that will be readily apparent to those of ordinary skill in the art having the benefit of the description herein. Thus, it will be appreciated that for simplicity and clarity of illustration, common and well-understood elements that are useful or necessary in a commercially feasible embodiment may not be depicted in order to facilitate a less obstructed view of these various embodiments.
In this document, relational terms such as first and second, top and bottom, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. The terms “comprises,” “comprising,” “has”, “having,” “includes”, “including,” “contains”, “containing” or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises, has, includes, contains a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. An element preceded by “comprises . . . a”, “has . . . a”, “includes . . . a”, “contains . . . a” does not, without more constraints, preclude the existence of additional identical elements in the process, method, article, or system that comprises, has, includes, contains the element. The terms “a” and “an” are defined as one or more unless explicitly stated otherwise herein. The terms “substantially”, “essentially”, “approximately”, “about” or any other version thereof, are defined as being close to as understood by one of ordinary skill in the art, and in one non-limiting embodiment the term is defined to be within 10%, in another embodiment within 5%, in another embodiment within 1% and in another embodiment within 0.5%. The term “coupled” as used herein is defined as connected, although not necessarily directly and not necessarily mechanically. A device or structure that is “configured” in a certain way is configured in at least that way, but may also be configured in ways that are not listed.
It will be appreciated that embodiments of the present invention described herein may be comprised of one or more conventional processors and unique stored program instructions that control the one or more processors to implement, in conjunction with certain non-processor circuits, some, most, or all of the functions of the method and system for annotating photographs during a slide show described herein. The non-processor circuits may include, but are not limited to, a transceiver, signal drivers, clock circuits and power source circuits. As such, these functions may be interpreted as steps of a method to annotate photographs during a slide show described herein. Alternatively, some or all functions could be implemented by a state machine that has no stored program instructions, or in one or more application specific integrated circuits (ASICs), in which each function or some combinations of certain of the functions are implemented as custom logic. Of course, a combination of the two approaches could be used. Thus, methods and means for these functions have been described herein. Further, it is expected that one of ordinary skill, notwithstanding possibly significant effort and many design choices motivated by, for example, available time, current technology, and economic considerations, when guided by the concepts and principles disclosed herein will be readily capable of generating such software instructions and programs and ICs with minimal experimentation.
Generally speaking, pursuant to the various embodiments, the present invention relates to associating annotations to photographs during a slide show. The annotations can comprise voice-descriptions and corresponding text descriptions. The photographs can be made searchable using these voice-descriptions or text descriptions. Those skilled in the art will realize that the above recognized advantages and other advantages described herein are merely exemplary and are not meant to be a complete rendering of all of the advantages of the various embodiments of the present invention.
A user often shows computerized photo albums to others, such as family and friends, generally accompanied by a verbal description of the photographs. A slide show mode is typically used to display photographs one after another, magnified to fill an entire screen. The present invention proposes a method and a system that enable the user to record the verbal description given while the photographs are displayed during such slide show sessions. The verbal description can then be segmented such that the segments of the verbal description are associated as annotations with corresponding photographs in the slide show.
Referring now to the drawings, and in particular
Referring back to
In an embodiment of the present invention, a user can annotate the same slide show more than one time. For example, a user can show the user's wedding photographs to the user's family, and during the slide show a segmented-voice-description and corresponding transcribed-text can be stored. The user can then show the same slide show to the user's friends, whereby another segmented-voice-description and corresponding transcribed-text are acquired. The different segmented-voice-descriptions and the corresponding transcribed-texts acquired at different times are managed and stored while omitting any repetition among them. The transcribed-text associated with a photograph is therefore updated each time a user displays the photograph in the slide show. Further, this improves the searching capability for the photographs, as the transcribed-text covers the various descriptions that were given during the more than one slide show.
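One way to combine the transcribed-text from successive slide shows while omitting repetition is sketched below in Python. The sentence-level comparison, the case-insensitive and whitespace-insensitive matching, and the function name `merge_transcripts` are assumptions made for illustration; the specification does not state how repetition is detected.

```python
import re


def merge_transcripts(existing, new_text):
    """Append sentences from new_text that are not already in existing."""

    def normalize(sentence):
        # Compare sentences ignoring case and runs of whitespace.
        return re.sub(r"\s+", " ", sentence).strip().lower()

    def split_sentences(text):
        # Split after '.', '!' or '?' followed by whitespace.
        parts = re.split(r"(?<=[.!?])\s+", text)
        return [p.strip() for p in parts if p.strip()]

    merged = split_sentences(existing)
    seen = {normalize(s) for s in merged}
    for sentence in split_sentences(new_text):
        key = normalize(sentence)
        if key not in seen:
            seen.add(key)
            merged.append(sentence)
    return " ".join(merged)
```

A production system would likely need a looser notion of repetition, since two spoken descriptions of the same photograph rarely match sentence for sentence.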
Turning now to
Turning now to
In another embodiment of the present invention, the transcribed-text corresponding to a photograph can be embedded in the photograph in a predefined form. The predefined form can be, for example, a Joint Photographic Experts Group comment (JPEG comment) or an Exchangeable Image File Format header (EXIF header). Transcribed-text embedded in the photographs can make it convenient for the user to see the transcribed-text along with the photographs during the slide show.
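As a rough illustration of the JPEG comment option, the following Python sketch inserts a comment (COM) segment into a JPEG byte stream. It relies only on the JPEG segment layout: a two-byte marker followed by a two-byte big-endian length that counts itself. The function name `add_jpeg_comment` and the choice to place the comment after any leading APPn segments are assumptions for illustration; a real implementation would more likely rely on an imaging library.

```python
def add_jpeg_comment(jpeg_bytes, text):
    """Return a copy of jpeg_bytes with text embedded as a COM segment."""
    if jpeg_bytes[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream (missing SOI marker)")

    payload = text.encode("utf-8")
    if len(payload) > 65533:
        # The 16-bit length field also counts its own two bytes.
        raise ValueError("comment too long for a single COM segment")

    # Skip leading APPn segments (0xFFE0..0xFFEF), such as JFIF or EXIF
    # headers, so that they remain directly after the SOI marker.
    pos = 2
    while (pos + 4 <= len(jpeg_bytes)
           and jpeg_bytes[pos] == 0xFF
           and 0xE0 <= jpeg_bytes[pos + 1] <= 0xEF):
        segment_length = int.from_bytes(jpeg_bytes[pos + 2:pos + 4], "big")
        pos += 2 + segment_length

    com_segment = b"\xff\xfe" + (len(payload) + 2).to_bytes(2, "big") + payload
    return jpeg_bytes[:pos] + com_segment + jpeg_bytes[pos:]
```

The image data itself is left untouched, so the photograph is displayed as before and the transcribed-text travels with the file.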
In yet another embodiment of the present invention, a database is maintained that stores a reference to a photograph and a transcribed-text corresponding to the photograph. A reference is unique identification information of a photograph. The unique identification information can be a unique file checksum corresponding to a photograph. When a photograph is accessed during a slide show, the corresponding transcribed-text is retrieved using the reference and is displayed to the user. Further, when a search is conducted using information related to the transcribed-text, the database is searched for the corresponding transcribed-text, and the photograph corresponding to the transcribed-text can be retrieved using the corresponding reference.
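The database embodiment can be sketched with Python's standard `sqlite3` and `hashlib` modules. The table layout, the use of SHA-256 as the file checksum, the substring search, and all function names below are assumptions for illustration only; the specification names neither a checksum algorithm nor a database schema.

```python
import hashlib
import sqlite3


def open_db(path=":memory:"):
    """Open the annotation database, creating the table if needed."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS annotations "
        "(checksum TEXT, photo_path TEXT, transcribed_text TEXT)"
    )
    return db


def checksum(photo_bytes):
    # SHA-256 of the file contents serves as the reference (assumed choice).
    return hashlib.sha256(photo_bytes).hexdigest()


def store_text(db, photo_bytes, photo_path, text):
    """Associate one transcribed-text with the photograph's reference."""
    db.execute(
        "INSERT INTO annotations VALUES (?, ?, ?)",
        (checksum(photo_bytes), photo_path, text),
    )
    db.commit()


def text_for_photo(db, photo_bytes):
    """Retrieve every transcribed-text stored for a photograph."""
    rows = db.execute(
        "SELECT transcribed_text FROM annotations WHERE checksum = ?",
        (checksum(photo_bytes),),
    )
    return [row[0] for row in rows]


def find_photos(db, keyword):
    """Return paths of photographs whose transcribed-text contains keyword."""
    rows = db.execute(
        "SELECT DISTINCT photo_path FROM annotations "
        "WHERE transcribed_text LIKE ? ORDER BY photo_path",
        ("%" + keyword + "%",),
    )
    return [row[0] for row in rows]
```

Keying on a checksum rather than a file name means the transcribed-text stays associated with a photograph even if the file is renamed or moved, although the stored path in this sketch would then be stale.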
Referring now to
Referring now to
Displaying module 505 is configured for displaying the slide show. The slide show can be displayed on a computer screen, a television screen, a personal digital assistant screen or a mobile phone screen. Photographs in the slide show can be displayed one after the other in a predefined interval, magnified to fill an entire screen for displaying.
Recording module 510 is configured for recording a voice-description given by a user corresponding to the slide show. In an embodiment of the present invention, when the user starts displaying the slide show, the system automatically initiates recording module 510 and the voice-description corresponding to the slide show can be recorded without user intervention. The user can give the voice-description by speaking into a microphone or a headset attached to the device displaying the slide show and recording module 510. Recording module 510 can take the voice-description as an input and forward it to transcribing module 515. Transcribing module 515 is configured to transcribe the voice-description into a corresponding transcribed-text. In an embodiment of the present invention, the voice-description is segmented to form segmented-voice-description. A segmented-voice-description can be associated with each photograph in the slide show. Essentially, a segmented-voice-description corresponding to a photograph can be a brief description of the photograph. As mentioned earlier, a segmented-voice-description may not exist for some photographs in case of a period of silence or noise during the slide show. The segmented-voice-descriptions can then be transcribed by transcribing module 515 to obtain a transcribed-text corresponding to each photograph in the slide show. However, if there is no segmented-voice-description associated with some photographs in the slide show, transcribed-text may not exist for those photographs.
The transcribed-text obtained from transcribing module 515 can be forwarded to storing module 520. Storing module 520 is configured to store the transcribed-text corresponding to each photograph. In an embodiment of the present invention, a transcribed-text corresponding to a photograph is saved under the same name as the corresponding photograph. In an embodiment of the present invention, more than one transcribed-text corresponding to a photograph can be saved. For example, if a user views a slide show more than one time, a different transcribed-text corresponding to a photograph can be generated each time. Therefore, there can be more than one transcribed-text associated with each photograph. In an embodiment of the present invention, the transcribed-text associated with a photograph in the slide show can be refined by deleting repeated text and adding new text. This enables a user to use a broader list of keywords for searching each photograph.
In yet another embodiment of the present invention, a database is maintained that stores a reference to a photograph and a transcribed-text corresponding to the photograph. A reference is unique identification information of a photograph. The unique identification information can be a unique file checksum corresponding to a photograph. When a photograph is accessed during a slide show, the corresponding transcribed-text is retrieved using the reference and is displayed to the user. Further, when a search is conducted using information related to the transcribed-text, the database is searched for the corresponding transcribed-text, and the photograph corresponding to the transcribed-text can be retrieved using the corresponding reference.
One embodiment of the present invention deploys the method described in
Searching module 525 can search a photograph based on at least one keyword given by the user. The at least one keyword can relate to at least one annotation, for example transcribed-text, corresponding to the photograph that the user wishes to display. In an embodiment of the present invention, searching module 525 further comprises a search-display module coupled to displaying module 505 and searching module 525. The search-display module can be configured to display the search results in an order of the context-score corresponding to each photograph.
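The keyword search and ordered display described above can be sketched in Python as follows. The sketch assumes that a context-score has already been computed and stored for each photograph; how that score is derived is not shown here. The `AnnotatedPhoto` record and the `search_photos` function are hypothetical names, and matching on any one of the given keywords is an assumed interpretation of "at least one keyword".

```python
import re
from dataclasses import dataclass


@dataclass
class AnnotatedPhoto:
    path: str
    transcribed_text: str
    context_score: float  # assumed to be precomputed elsewhere


def search_photos(photos, keywords):
    """Return photographs matching any keyword, highest context-score first."""
    wanted = {k.lower() for k in keywords}
    hits = []
    for photo in photos:
        # Tokenize on word characters so trailing punctuation is ignored.
        words = set(re.findall(r"\w+", photo.transcribed_text.lower()))
        if wanted & words:
            hits.append(photo)
    return sorted(hits, key=lambda p: p.context_score, reverse=True)
```

For example, searching for "beach" over photographs described as "Our wedding on the beach." (score 0.4) and "Beach picnic" (score 0.9) returns the picnic photograph first.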
Turning now to
Various embodiments of the present invention provide a system and a method for annotating photographs during a slide show. The system associates a voice-description with the photographs based on the description provided by the user during the slide show, thereby saving the user the extra effort of separately annotating the photographs with the voice-description. Further, the various embodiments of the present invention associate a plurality of transcribed-texts, a context and a context-score with each photograph, thereby enabling an efficient way of searching the photographs.
The method for annotating photographs during a slide show, as described in the invention or any of its components may be embodied in the form of a computing device. The computing device can be, for example, but not limited to, a general-purpose computer, a programmed microprocessor, a micro-controller, a peripheral integrated circuit element, and other devices or arrangements of devices, which are capable of implementing the steps that constitute the method of the invention.
The computing device executes a set of instructions that are stored in one or more storage elements, in order to process input data. The storage elements may also hold data or other information as desired. A storage element may be in the form of a database or a physical memory element present in the computing device.
The set of instructions may include various instructions that instruct the computing device to perform specific tasks such as the steps that constitute the method of the invention. The set of instructions may be in the form of a program or software. The software may be in various forms such as system software or application software. Further, the software might be in the form of a collection of separate programs, a program module within a larger program or a portion of a program module. The software might also include modular programming in the form of object-oriented programming. The processing of input data by the computing device may be in response to user commands, in response to results of previous processing, or in response to a request made by another computing device.
In the foregoing specification, specific embodiments of the present invention have been described. However, one of ordinary skill in the art appreciates that various modifications and changes can be made without departing from the scope of the present invention as set forth in the claims below. Accordingly, the specification and figures are to be regarded in an illustrative rather than a restrictive sense, and all such modifications are intended to be included within the scope of the present invention. The benefits, advantages, solutions to problems, and any element(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as critical, required, or essential features or elements of any or all the claims.