The present invention also relates to a method for image data recording and reproducing, in particular for automatically creating metadata for a digital image file.
It is noted that citation or identification of any document in this application is not an admission that such document is available as prior art to the present invention.
Apparatuses and methods for image data recording and reproducing are well known in the state of the art; in particular, said apparatuses comprise digital cameras apt to capture images and store them on a digital medium. It should be noted that, in the present text, the words “apparatus” and/or “camera” can be used to refer to digital still cameras, digital video cameras, mobile telephones having integrated digital cameras, and the like.
With the apparatuses known in the state of the art, between the time an image is captured and the time it is printed or otherwise displayed, the user (who is usually also the photographer) may forget or lose access to information related to the image, such as the time at which it was captured and/or the location in which it was captured and/or the persons depicted in it.
Some digital cameras allow text, such as text representing the date and the time on which an image was captured, to be associated with a photograph; this text is typically created by the camera and superimposed on the image at a predetermined location and in a predetermined format.
Said text contains only a small amount of information, and it conveys little or no information that would help the user of the digital camera distinguish one image from another.
The same problem arises with the default file naming scheme that is used in digital cameras in order to identify and track digital image files; in fact, said default file naming scheme only employs:
Therefore, the default file naming scheme likewise gives the user little or no useful information about the contents of a particular image file. In fact, the user must open and view each image file to determine whether said image file contains a desired image of a person, of a place, and so on. The user can, if need be, edit the naming scheme with the help of a computer, but this possibility is of practically no use when exercised some time after the images were recorded.
Document No. EP1876596 relates to an apparatus for image data recording and reproducing, said apparatus comprising:
According to what is described in document No. EP1876596, the metadata to be included in the image file are generated by using the text data converted by the speech recognition unit, so that it is possible to add reliable metadata (such as, for example, shooting locations or persons being displayed in the image) to the image file just after the capture of the image and/or while reviewing the image file.
In addition, the name of the folder in which the image file is to be stored is generated based on the text data that is converted by using speech recognition, so that it is possible to classify the image files at a time when the image is captured.
However, it has been observed that even the apparatus described in document No. EP1876596 suffers from some drawbacks, since it is adapted to recognize and convert only one predetermined language.
In fact, the programs and software for recognizing speech and converting the speech into text data are expensive and large, usually on the order of many megabytes (or a gigabyte) for each language that has to be recognized and converted into text; therefore, said programs and software cannot be used in an image data recording and reproducing apparatus without choosing a single predetermined language for each apparatus.
This implies that each apparatus realized in accordance with the teachings of the document No. EP1876596 needs to comprise a program apt to recognize and convert into text only one language.
This necessarily means that the apparatus cannot be versatile and eclectic, since it is necessary for the user to have an apparatus comprising a specific program for recognizing his own language, in order to convert said language into text.
This also means that the producer of the apparatus is not able to produce a single product that can be sold in different countries, where the users speak different languages. The consequences are an increased number of models for the same product and an increase in production cost.
It is noted that in this disclosure and particularly in the claims and/or paragraphs, terms such as “comprises”, “comprised”, “comprising” and the like can have the meaning attributed to them in U.S. Patent law; e.g., they can mean “includes”, “included”, “including”, and the like; and that terms such as “consisting essentially of” and “consists essentially of” have the meaning ascribed to them in U.S. Patent law, e.g., they allow for elements not explicitly recited, but exclude elements that are found in the prior art or that affect a basic or novel characteristic of the invention.
It is further noted that the invention does not intend to encompass within the scope of the invention any previously disclosed product, process of making the product or method of using the product, which meets the written description and enablement requirements of the USPTO (35 U.S.C. 112, first paragraph) or the EPO (Article 83 of the EPC), such that applicant(s) reserve the right to disclaim, and hereby disclose a disclaimer of, any previously described product, method of making the product, or process of using the product.
In this context, it is the main object of the present invention to overcome the above-mentioned drawbacks by providing an apparatus and a method for image data recording and reproducing which allow a plurality of languages to be recognized and converted into text.
It is a further object of the present invention to provide an apparatus and a method for image data recording and reproducing conceived in a manner to be versatile and eclectic.
It is a further object of the present invention to provide a single apparatus and method for image data recording and reproducing able to recognize and convert into text a plurality of different languages.
These objects are achieved by the present invention through an apparatus and a method for image data recording and reproducing, incorporating the features set out in the appended claims, which are intended as an integral part of the present description.
Further objects, features and advantages of the present invention will become apparent from the following detailed description and from the annexed drawings, which are supplied by way of non-limiting example, wherein:
It is to be understood that the figures and descriptions of the present invention have been simplified to illustrate elements that are relevant for a clear understanding of the present invention, while eliminating, for purposes of clarity, many other elements which are conventional in this art. Those of ordinary skill in the art will recognize that other elements are desirable for implementing the present invention. However, because such elements are well known in the art, and because they do not facilitate a better understanding of the present invention, a discussion of such elements is not provided herein.
The present invention will now be described in detail on the basis of exemplary embodiments.
In
The apparatus 1 for image data recording and reproducing according to the exemplary embodiment of the present invention may be a digital still camera, a digital video camera, a mobile telephone having an integrated or associated digital camera, and the like.
Said apparatus 1 comprises:
Said imaging system 10 may comprise a lens/shutter assembly 11, which directs and focuses light onto a sensor 12 for capturing images of a subject; in particular, said sensor 12 can comprise one or more CCD (Charge Coupled Device) or one or more CMOS (Complementary Metal-Oxide Semiconductor).
Therefore, said signal processor 20 controls the operations of the lens/shutter assembly 11 and processes image information received from the sensor 12 for generating an image file containing the captured image in a digital format.
When the image file includes still image data, the digital image file may be in Joint Photographic Experts Group (JPEG) or Tag Image File Format (TIFF) format; when the image file includes moving image data, the digital image file may be in Moving Picture Experts Group (MPEG) format or other video formats known in the state of the art.
Moreover, as known in the state of the art, each of the image files includes an area for storing the image data and an area for storing information regarding the image. This is done in accordance with international standards. In fact, there are some entities that have defined how to add metadata to image files, like:
As it can be seen from
In accordance with the present invention, said speech recognition unit 40 comprises a plurality of subsets 41 of words, each subset 41 having a limited number of words, in order to recognize and convert into text speech annotations acquired from a corresponding plurality of languages.
In particular, each subset 41 of words does not comprise a complete dictionary of a specific language; rather, each subset 41 of words comprises the translation into a given language of only a limited number of words, which are chosen and stored at the manufacturer site from among the words most frequently associated with an image.
In particular, said plurality of words may comprise:
This provision yields an apparatus and a method for image data recording and reproducing which can recognize and convert into text a plurality of languages, even if limited to a subset of words.
It is clear that if the word that the user wants to associate to a certain image is not provided by the limited subset of words memorized and recognizable by the apparatus, this particular word can be edited manually by making use of one of the several tools known in the state of the art for writing words: keyboards, touch screen systems, etc.
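The limited-vocabulary approach and the manual fallback can be illustrated with a minimal Python sketch. It assumes that an acoustic front end (not shown) has already produced candidate word tokens; the `SUBSETS` table, its example words and the `match_annotation` function are hypothetical names chosen for illustration and are not taken from the disclosure.

```python
# Hypothetical per-language subsets 41: each holds only a few words,
# not a complete dictionary of the language.
SUBSETS = {
    "en": {"birthday", "beach", "wedding", "mountain"},
    "it": {"compleanno", "spiaggia", "matrimonio", "montagna"},
}


def match_annotation(tokens, language):
    """Split tokens into words found in the chosen subset and words
    that are left for manual entry (keyboard, touch screen, etc.)."""
    subset = SUBSETS[language]
    recognized, unrecognized = [], []
    for token in tokens:
        word = token.lower()
        if word in subset:
            recognized.append(word)
        else:
            unrecognized.append(word)
    return recognized, unrecognized


# "Giulia" is not in the assumed English subset, so it is left for manual entry.
recognized, unrecognized = match_annotation(["Beach", "Giulia"], "en")
```

In this sketch the memory cost grows with the number of stored words rather than with the size of a full dictionary per language, which is the trade-off the description relies on.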
In particular, the apparatus 1 and the method according to the present invention allow speech to be recognized and converted into text data without the need for a speech recognition unit 40 that is expensive and large, usually on the order of many megabytes (or a gigabyte) for each language that has to be recognized and converted into text. Therefore, this solution can be implemented in consumer products like digital still cameras, digital video cameras, mobile telephones having integrated digital cameras, and the like, without burdening these products with a cost that cannot be accepted by the market.
It is therefore clear that said speech recognition unit 40 can be used in the apparatus 1 without a predetermined language having to be chosen at the manufacturer site, and that said speech recognition unit 40 makes it possible to provide one single apparatus 1 and method conceived in such a manner as to be extremely versatile and eclectic.
Preferably, said speech recognition unit 40 is associated to activating means 42 that allow the user to activate the speech recognition unit 40 in order to convert the speech annotation into text data.
In particular, said activating means 42 can be actuated by the user before the image is captured and/or displayed; otherwise, said activating means 42 can be actuated by the user after the image is captured, in particular when said image is displayed. For example, said activating means 42 may comprise a button (not shown in the drawings) preferably positioned on an external surface of the apparatus 1.
The apparatus 1 also comprises a memory 50 coupled to the signal processor 20 for storing the digital image file and/or the speech annotation and/or the speech annotation converted into text data. Said memory 50 can comprise a Random Access Memory (RAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), or the like.
Moreover, the apparatus 1 further comprises a display 60 associated to the signal processor 20. As known, said display 60 can be used for a plurality of purposes, in particular:
In a preferred embodiment of the present invention, said display 60 comprises an On Screen Display (OSD) system for choosing both a language, from among a plurality of languages, for displaying the operation of the apparatus 1, and one of said subsets 41 of words.
As said before, it is clear that the apparatus 1 can comprise input means (not shown in
In particular, said method comprises the following steps:
According to the present invention, said step 130 of recognising and converting the speech annotation into text data is performed by making use of one of the plurality of subsets 41 of words stored in said speech recognition unit 40 for recognising and converting into text speech annotations acquired from a corresponding plurality of languages.
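Step 130 can be sketched as follows, under stated assumptions. The `ImageFile` and `SpeechRecognitionUnit` classes, the `keywords` metadata key and the example words are all hypothetical; the sketch also assumes that the speech annotation has already been reduced to word tokens and models only the choice of one subset 41 and the storage of the resulting text in the information area of the image file.

```python
from dataclasses import dataclass, field


@dataclass
class ImageFile:
    image_data: bytes                             # area storing the image data
    metadata: dict = field(default_factory=dict)  # area storing information about the image


class SpeechRecognitionUnit:
    """Hypothetical stand-in for unit 40, holding several small subsets 41."""

    def __init__(self, subsets):
        self.subsets = subsets
        self.active = None

    def select_subset(self, language):
        # Only languages with a stored subset can be selected.
        if language not in self.subsets:
            raise KeyError(f"no subset stored for {language!r}")
        self.active = language

    def to_text(self, tokens):
        # Keep only the tokens that belong to the active subset.
        if self.active is None:
            raise RuntimeError("no subset selected")
        words = self.subsets[self.active]
        return [t.lower() for t in tokens if t.lower() in words]


unit = SpeechRecognitionUnit({"en": {"beach", "wedding"}, "fr": {"plage", "mariage"}})
unit.select_subset("fr")
photo = ImageFile(image_data=b"\xff\xd8")
photo.metadata["keywords"] = unit.to_text(["Plage", "beach"])
```

Because only the French subset is active here, the English word is ignored; selecting a different subset changes the outcome without loading a full dictionary for either language.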
In
In particular, the method according to the present invention is performed through the step 160 of actuating activating means 42 of the speech recognition unit 40, said activating means 42 allowing the user to activate the speech recognition unit 40 in order to convert the speech annotation into text data.
As can be seen in particular in
Alternatively, as can be appreciated in particular from
Moreover, the method according to the present invention comprises the further step 180 of choosing both a language, from among a plurality of languages, for displaying the operation of the apparatus 1, and one of said subsets 41 of words, by means of an On Screen Display (OSD) system comprised in said display 60.
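The two choices made in step 180 can be pictured as a small settings record. The following Python sketch is illustrative only: the `OsdSettings` class and the language lists are hypothetical, and treating the menu language and the word subset as independent choices is an assumption of the sketch rather than a statement of the disclosure.

```python
from dataclasses import dataclass

MENU_LANGUAGES = ("en", "it", "de")    # hypothetical languages for the on-screen menus
SUBSET_LANGUAGES = ("en", "it", "fr")  # hypothetical languages with a stored subset 41


@dataclass
class OsdSettings:
    menu_language: str = "en"
    subset_language: str = "en"

    def choose(self, menu_language, subset_language):
        """Set both choices; each one is validated against its own list."""
        if menu_language not in MENU_LANGUAGES:
            raise ValueError(f"unsupported menu language: {menu_language}")
        if subset_language not in SUBSET_LANGUAGES:
            raise ValueError(f"no word subset for: {subset_language}")
        self.menu_language = menu_language
        self.subset_language = subset_language


settings = OsdSettings()
# Menus in German, annotations recognized against the French subset.
settings.choose(menu_language="de", subset_language="fr")
```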
Preferably, with reference to the method of
Moreover, it should be noted that the present invention can also be embodied as computer readable metadata on a computer readable storage medium/data. The computer readable storage medium/data is any data storage device that can store data which can thereafter be read by a computer system. Examples of the computer readable storage medium include Electrically Erasable Programmable Read Only Memory (EEPROM), random-access memory (RAM), CD-ROMs, magnetic tapes, floppy disks, optical data storage devices, and the like.
The advantages offered by an apparatus and a method for image data recording and reproducing according to the present invention are apparent from the above description.
In particular, such advantages are due to the fact that the provision of a speech recognition unit 40 comprising a plurality of subsets 41 of words allows a plurality of languages to be recognized and converted into text; in particular, this can be done without the need for a speech recognition unit 40 that is expensive and large, usually on the order of many megabytes (or a gigabyte) for each language that has to be recognized and converted into text.
It is therefore clear that said speech recognition unit 40 can be used in the apparatus 1 without choosing a predetermined language that has to be recognized and converted into text; therefore, the particular realization of the speech recognition unit 40 according to the present invention makes it possible to provide an apparatus 1 and a method conceived in such a manner as to be versatile and eclectic.
The apparatus and method described herein by way of example may be subject to many possible variations without departing from the novel spirit of the inventive idea; it is also clear that in the practical implementation of the invention the illustrated details may have different devices or be replaced with other technically equivalent elements, and that different sequences of steps may be provided.
For instance with respect to the embodiments shown in
It can therefore be easily understood that the present invention is not limited to the above-described apparatus and method, but may be subject to many modifications, improvements or replacements of equivalent parts and elements without departing from the inventive idea, as clearly specified in the following claims.
While this invention has been described in conjunction with the specific embodiments outlined above, it is evident that many alternatives, modifications, and variations will be apparent to those skilled in the art. Accordingly, the preferred embodiments of the invention as set forth above are intended to be illustrative, not limiting. Various changes may be made without departing from the spirit and scope of the inventions as defined in the following claims.
The present application claims priority from PCT Patent Application No. PCT/EP2010/057747 filed on Jun. 2, 2010, the disclosure of which is incorporated herein by reference in its entirety.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/EP10/57747 | 6/2/2010 | WO | 00 | 2/25/2013 |