This invention relates to still picture cameras as well as to apparatus and method for creating relatively permanent facsimiles of the pictures taken by the cameras.
People who go on vacation often take still pictures of people and places at various points/times along the way. Typically, the pictures, when taken with a conventional film-type camera, are developed sometime after returning from the vacation, and thereafter the developed pictures are placed in an album. The same is true with digital pictures, although often the digital pictures are stored in a computer rather than, or in addition to, transferring the pictures to paper. Alas, all too often people forget to record the circumstances surrounding the particular pictures, and later forget those circumstances. This is a problem that would be highly desirable to overcome.
An advance in the art is achieved by providing means, associated with the still cameras (film, or digital), for recording a short message in association with each of the pictures that the still camera captures, essentially contemporaneously with the picture taking. When the pictures are regenerated for the viewing pleasure of people, each picture presents the previously recorded message in a manner that is perceivable by the people who are viewing the pictures.
In an illustrative embodiment for film cameras, the message comprises a spoken speech passage that is stored in a memory chip situated within the cartridge containing the camera's photographic film. When the pictures that are captured on the individual frames of the film are developed, the speech messages that are maintained in the memory chip and that are associated with respective frames of the film are converted to text by means of conventional speech-to-text processing, and the text is imaged on the picture that is printed for each of the frames.
In an illustrative embodiment for digital cameras, the recorded messages that are associated with each picture that is captured by the camera are stored in the memory in which the image is stored, and presented to whatever display means that a user chooses to use. In the case where the display means is a printer that memorializes the images on paper, the associated messages are printed in association with each of the respective printed pictures. In the case of the picture being downloaded to a computer, the associated message is downloaded as well and, either concurrently with the downloading or thereafter, the message is converted to text by means of conventional speech to text processing. Alternatively, the speech is outputted as speech.
In another illustrative embodiment (for either type of camera), the camera includes a processor that converts the speech to text and, optionally, embeds the text in the picture.
In a digital camera, shown in
Photographic Film Cameras
As indicated above, it is an objective of this invention to record information that is provided by a user, often contemporaneously with the taking of a picture. To meet this objective, the
Although memory 20 can be situated within the camera, the illustrated embodiment contemplates memory 20 to be part of the film cartridge; i.e., physically associated. As an aside, in the context of this disclosure, the term “cartridge” is broad, encompassing rolls, and other means of packaging film for still cameras. Accordingly,
In addition to memory 20, the
In addition to storing information that is provided by the user, it is an objective of this invention to create pictures that are augmented with the information that is provided by the user. The information may be processed (perhaps even to a significant degree) before it is stored, but that is not a requirement. For example, when means 22 is a microphone and the input is speech that is provided by the user, a viable embodiment results with the mere conversion of the speech to electrical signals and the storing of those signals in memory 20. In many applications, however, converting the speech into text and storing the text will prove to be perfectly satisfactory. It is realized, for example, that the interval between picture taking is typically many seconds, which is ample time for processing speech into text.
The general operational philosophy is that a user is given an opportunity to record an utterance of a relatively short duration (in the television arts, such utterances are sometimes called “sound bites,” and in the education arts such utterances might be called “speech strips”) shortly after taking a picture. Assuming that the recording is that of voice (which, for sake of simplicity, is assumed throughout the disclosure), the utterance of the user is recorded. This is accomplished by block 100 in
Eventually, the film is converted to photographs that are printed on paper, and it is the objective of this invention to create a human-readable facsimile of the utterance on the photographs. The process associated with the creation of the relatively permanent facsimile, i.e., creating the “hard copy” photographs printed on photographic paper, is shown in step 200.
Though the process of step 100 can be logically quite complex if many different options are given to the user (for example, allowing the user to record an utterance shortly before taking a picture, or after taking the picture), even a very simple process achieves the desired results.
In the illustrative embodiment of
The processing of step 104 may be simply a conversion of the audio signal to digital format, but it can be more than that. In fact, employing a speech-to-text algorithm yields a very attractive embodiment. Speech-to-text algorithms have existed for a number of years, so a number of options are available to the artisan for selecting software modules to be included in processor 15 for carrying out the processing of step 107. The speech-to-text engine of IBM's ViaVoice is an example of such algorithms. ScanSoft offers a similar software module.
It may be noted that conventional movie film recording, as well as video recording, comprises a collection of still images (frames) that appear at a rate that is faster than what a human eye can discriminate, resulting in a visual experience (upon playback) of continuous action. Nevertheless, there is a finite time interval between frames. However, the sound that is recorded in such movie film and video recording is essentially continuous, so one might assert that there is a correspondence between a frame and a snippet of recorded sound. That is not what this invention contemplates. Rather, the recorded sound of this invention is much longer in duration than the duration between successive captures of images. It is contemplated that the recording associated with an image will be on the order of a second or more, which is more than an order of magnitude longer than the sound snippets that may be said to be associated with frames of a movie or a video recording. For purposes of this disclosure, the duration of such a sound recording, being more than one second, is called label time.
The above description focuses on when the sound recording is initiated, rather than the duration of the label time. That duration can be fixed by the camera's manufacturer, preset by the user, or dynamically controlled. For example, the camera can be set so that the user presses button 21 to initiate the sound recording, and presses it again to stop recording. The difference between those two instants is the label time.
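By way of illustration only, the dynamically controlled option can be sketched in software as follows. This is a minimal sketch, not the disclosed implementation: the class name, the method name, and the use of timestamps expressed in seconds are all assumptions introduced here for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LabelRecorder:
    """Hypothetical model of the press-to-start, press-to-stop use of button 21."""

    started_at: Optional[float] = None   # instant of the first press, in seconds
    label_time: Optional[float] = None   # duration of the last completed recording

    def press_button_21(self, now: float) -> None:
        if self.started_at is None:
            # First press: the sound recording is initiated.
            self.started_at = now
            self.label_time = None
        else:
            # Second press: recording stops; the label time is the
            # difference between the two instants.
            self.label_time = now - self.started_at
            self.started_at = None


recorder = LabelRecorder()
recorder.press_button_21(now=10.0)   # start recording
recorder.press_button_21(now=13.5)   # stop recording
print(recorder.label_time)           # 3.5
```

A fixed or user-preset label time would replace the second press with a timer; the toggle shown here is only one of the options mentioned above.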
It is expected that, as is the case with all picture-taking sessions, eventually the last frame of the film is exposed, and the next step is to develop the film, and print pictures on photographic paper. In accord with the principles of this invention, the film to be developed advantageously is developed essentially contemporaneously with the retrieval and processing of the information contained in memory 20. Processing may be simply transcribing the speech information—for example, storing it in a computer memory. However, the processing may also encompass a conversion of the speech to text.
Step 200 of
In accord with the illustrated embodiment, the text of the utterance is printed at the picture's white border, or embedded in the picture area. Such printing can be effectively accomplished photographically, as shown, for example, in
The
It should be noted that the process of recording the user-supplied information that is described above needs to be modified so as not to provide the option of recording user-provided information after the taking of the picture. In such a modified process, which is simpler than the process described above, the user must first press button 21 and record the utterance, whereupon processor 15 converts the utterance to text. Then, when button 14 is pressed, processor 15 controls the shutter of the camera in a conventional way and, in addition, outputs the utterance to imaging element 23. The latter function is not unlike the placing of a time and date on a picture, except for the novel notion of having that text be user controlled.
Digital Cameras
To meet the above-mentioned objective for digital cameras (in contrast to the film cameras discussed above), no additional hardware is necessarily required over and above the hardware that digital cameras already have, except for means 22 and the functionality of button 21. Adding a microphone for means 22, for example, is a pretty trivial augmentation.
As in the above-described film camera, the information can be injected shortly after a picture is taken, whereupon the user-provided information is stored in the same memory that stores the actual pictures, perhaps in the form of .jpg files, for speech, and .txt files for text.
The user-provided information can be stored separately from the associated pictures, and it can also be stored within the file, such as in a trailer of the file that can be easily segregated. In this manner, whether an image of a picture is retrieved in the digital camera (for local previewing), or is retrieved from a computer to which pictures of the digital camera were downloaded, it is relatively easy to separate the image from the user-provided information, to process the user-provided information, and to display it.
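The trailer arrangement can be illustrated with a short sketch. The layout used here (the image bytes, followed by the label, a four-byte marker, and a four-byte length) is purely an assumption made for the example; the marker value `LBL1` and the function names are hypothetical and do not correspond to any standard file format.

```python
from typing import Tuple

# Assumed trailer layout: <image bytes><label bytes><MARKER><4-byte label length>
MARKER = b"LBL1"  # hypothetical marker, not part of any real image format


def attach_label(image: bytes, label: bytes) -> bytes:
    """Append the user-provided information to the image as a trailer."""
    return image + label + MARKER + len(label).to_bytes(4, "big")


def split_label(blob: bytes) -> Tuple[bytes, bytes]:
    """Return (image, label); the label is empty when no trailer is found."""
    if len(blob) < 8 or blob[-8:-4] != MARKER:
        return blob, b""
    size = int.from_bytes(blob[-4:], "big")
    end = len(blob) - 8          # index where the marker begins
    if size > end:
        return blob, b""         # length field is inconsistent; treat as no trailer
    return blob[: end - size], blob[end - size : end]


picture = b"\xff\xd8...image data...\xff\xd9"   # stand-in bytes, not a real picture
stored = attach_label(picture, "Harbor at sunset".encode("utf-8"))
image, label = split_label(stored)
print(label.decode("utf-8"))                     # Harbor at sunset
```

Because the trailer sits after the image data and carries its own length, the image and the label can be separated without decoding the image, whether in the camera or on the computer.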
One advantage of this approach is that the user-provided information can be altered at any time. This feature may be quite desirable to some users, for example when the user is away on vacation and wishes to employ unusual words (often foreign words, such as names of places). It may be desirable to be able to review the recorded speech and edit it, or to review the algorithmically generated text and edit the text. As for the former, it is straightforward to enable the camera to re-record a label. As to the latter, means would need to be provided to allow the user to enter/edit text; perhaps with a dial pad not unlike the dial pad of a cellular telephone.
On the other hand, processor 15 can include a pixel-salting software module that follows the module that converts speech to text, where the pixel-salting module converts text to pixel groupings that form letters. The signals developed by the pixel-salting module are added to the signals that are already present in the memory and, thus, after the digital camera takes a picture, the user-supplied text may be added to the stored image. This creates pictures with embedded text that can be previewed while still in the camera's memory, and can also be viewed when downloaded to a computer. Converting the digital images to paper images, of course, preserves the text.
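The operation of a pixel-salting module can be sketched as follows. This is an illustrative sketch under stated assumptions, not the module of processor 15: the image is modeled as a grayscale list of rows, the 3-by-5 bitmap font covers only the characters used in the demonstration, and the text pixels overwrite (rather than arithmetically add to) the stored pixel values. All names are hypothetical.

```python
from typing import Dict, List

# Hypothetical 3x5 bitmap font; only the glyphs needed for the demo are defined.
FONT: Dict[str, List[str]] = {
    "H": ["101", "101", "111", "101", "101"],
    "I": ["111", "010", "010", "010", "111"],
    " ": ["000", "000", "000", "000", "000"],
}


def salt_pixels(image: List[List[int]], text: str, top: int, left: int,
                value: int = 255) -> List[List[int]]:
    """Return a copy of the image in which pixel groupings form the text."""
    out = [row[:] for row in image]          # leave the stored image untouched
    x = left
    for ch in text:
        glyph = FONT[ch]
        for dy, bits in enumerate(glyph):
            for dx, bit in enumerate(bits):
                y, xx = top + dy, x + dx
                # Set a pixel only where the glyph is "on" and inside the image.
                if bit == "1" and 0 <= y < len(out) and 0 <= xx < len(out[y]):
                    out[y][xx] = value
        x += len(glyph[0]) + 1               # advance one glyph width plus a gap
    return out


blank = [[0] * 9 for _ in range(7)]          # tiny stand-in for a stored picture
labeled = salt_pixels(blank, "HI", top=1, left=1)
for row in labeled:
    print("".join("#" if p else "." for p in row))
```

A practical module would place the text in a chosen region of the picture (for example, near the border) and use a complete font; the sketch shows only the conversion of characters into pixel groupings.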
The above disclosure presents the principles of this invention by way of illustrative embodiments. It should be realized, however, that various other additions and modifications can be made without departing from the spirit and scope of this invention, as defined in the accompanying claims. To give an example of a modification, although the disclosure above speaks only of text as the compressed form of utterances, one should realize that other forms are also acceptable; for example, the phonemes of the utterance also form a usable, compressed representation of utterances. To give an example of an addition, there has been no mention of the duration of the utterance that may be recorded. That, of course, would be a design choice that relates to the size of memory 20, and to the manner in which the information is stored (raw, or compressed).
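The relationship between the size of memory 20, the storage format, and the recordable duration can be illustrated with simple arithmetic. The figures used below (a 64 KiB memory, speech sampled at 8000 samples per second with one byte per sample, and text labels of 60 one-byte characters) are assumptions chosen only for the example and are not taken from the disclosure.

```python
def max_raw_seconds(memory_bytes: int, sample_rate_hz: int,
                    bytes_per_sample: int) -> float:
    """Seconds of uncompressed speech that fit in the given memory."""
    return memory_bytes / (sample_rate_hz * bytes_per_sample)


def max_text_labels(memory_bytes: int, chars_per_label: int,
                    bytes_per_char: int = 1) -> int:
    """Number of whole text labels that fit in the given memory."""
    return memory_bytes // (chars_per_label * bytes_per_char)


MEMORY_BYTES = 64 * 1024   # assumed size of memory 20 (hypothetical)

print(max_raw_seconds(MEMORY_BYTES, sample_rate_hz=8000, bytes_per_sample=1))  # 8.192
print(max_text_labels(MEMORY_BYTES, chars_per_label=60))                       # 1092
```

Under these assumed figures, the same memory holds only a few seconds of raw speech in total but on the order of a thousand short text labels, which is why the manner of storage bears on the permissible duration.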
As indicated above, one feature of this invention is that the recording of information (for example, speech) is enabled for a relatively long time, compared to the time it takes to take a picture, or the time between successive pictures in a movie camera or a video camera. Another way to characterize this feature is that the recording of information is unrelated to any time constraints pertaining to the picture taking. In particular, the above mentions no upper limit on the duration of the information recording. In connection with film cameras, where the primary motivation is to create a label that makes up a permanent feature of the captured image, it is expected that users will record only short utterances; perhaps less than 10 words' worth. With digital cameras, however, this self-imposed limitation might not occur. A user who is planning to show the digital picture through a computer-controlled “slide show” might be willing to utter and record a number of complete sentences.