The present invention relates to a method for generating an image information, especially to a generation of an image information based on a light field information captured for example by a so-called light field camera or plenoptic camera. The present invention relates furthermore to a device implementing the method for generating an image information and to a light field camera.
In conventional cameras, so-called digital cameras, an image of an environment or scene to be captured is reproduced on an image sensor, for example a CCD sensor or a CMOS sensor, via a lens. Data from the image sensor comprises for example a plurality of pixel data each representing a color and brightness of the image reproduced on the image sensor. The image data captured by the image sensor can be directly reproduced by a display to a user.
A new type of camera which has been developed and researched in recent years is the so-called light field camera or plenoptic camera, which is one type of a so-called computational camera. In light field cameras, the image is not reproduced directly on the image sensor in such a way that the output of the image sensor essentially shows the captured scene. Instead, light rays from the scene or environment are guided to an image sensor arrangement in an unconventional manner. For example, light rays originating from a single object in the scene to be captured may be guided to different locations remote from each other on the image sensor arrangement, which corresponds to viewing the object from different directions. To this end, for example a conical mirror may be arranged in front of a lens. In other implementations, an optic used for guiding light from a scene to be recorded to the image sensor arrangement may be variable, for example by varying geometric or radiometric properties. Furthermore, light field cameras may comprise an array of sub-cameras capturing the scene from different perspectives.
Unlike conventional cameras, in light field cameras a more sophisticated processing of the data captured by the image sensors or the sub-cameras is necessary to provide the final image. On the other hand, in many cases there is a higher flexibility in setting parameters like focus plane of the final image. For example, by combining the images from the sub-cameras it is possible to achieve a number of attractive features, for example refocusing the image after capturing.
However, to increase the acceptance and user benefit of light field cameras, intuitive control means are required for controlling this new flexibility and these features. Therefore, there is a need for aiding a user in controlling the new features of light field cameras.
According to the present invention, this object is achieved by a method for generating an image information as defined in claim 1, a device as defined in claim 10 and a light field camera as defined in claim 15. The dependent claims define preferred and advantageous embodiments of the invention.
According to an aspect of the present invention, a method for generating image information is provided. According to the method, a light field information of an environment or a scene is captured and a gaze information is detected, which indicates a position on a display at which a user is gazing. In other words, when the user is looking at a certain position on the display unit, this certain position is detected as the gaze information. Based on the light field information and the gaze information an image information is generated. Using the gaze information for generating the image information from the light field information allows for example setting a focus on a specific object in the environment, zooming in or out in the image, or optimizing so-called high dynamic range information like a contrast or a color range for a certain object or area in the image.
According to an embodiment, based on the light field information and the gaze information a two-dimensional or a three-dimensional image is rendered. Depending on the display which is used for displaying the image information to the user, a two- or three-dimensional image may be generated and displayed. Light field information allows an image information to be reconstructed from different perspectives, and therefore two-dimensional as well as three-dimensional or stereoscopic images can be reconstructed.
According to a further embodiment, the light field information is captured as a four-dimensional light field information with a light field camera. Devices for capturing four-dimensional light field information may include a plurality of cameras arranged for example in an arc or in an array, or an optical system in which an array of microlenses is inserted in the optical path.
According to another embodiment, the generated image information is displayed on the display unit to the user. By changing the position the user is looking at, a new gaze information can be generated and used for generating a correspondingly changed image information based on the light field information. The light field information may be updated continuously such that the generated image information is a live video of the environment captured. Alternatively, the light field information may be captured at a certain point in time, for example on a user demand, and the image information may be generated based on the light field information captured at this certain point in time. Thus, by changing the position on the display unit the user is looking at, different image information having different properties, for example a different focus plane or a different high dynamic range information, can be generated from the same light field information.
According to some embodiments, the image information is generated by determining a position in the environment which corresponds to the position on the display unit the user is gazing at. For example, a focus plane for generating the image information can be set according to a distance between the position in the environment and the light field camera. Furthermore, a scaled up or scaled down image information containing at least the position in the environment can be generated. Moreover, high dynamic range information like a color information, a contrast information or a brightness information of the image information can be adapted based on a color information, contrast information and brightness information, respectively, of the light field information at the position in the environment. For example, the display unit may have a lower color depth than the color depth provided in the light field information. When the user is looking at a certain position, an area around this certain position may have color information which comprises only a part of the color depth provided by the light field information. The color information of this area where the user is looking at may be generated in the image information using the full available color depth provided for the image information thus providing a more detailed color representation of this area to the user. Similarly, a more detailed contrast and brightness information may be provided in the image information and displayed to the user.
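The adaptation of brightness to the gazed-at area can be illustrated with a minimal sketch. It assumes, purely for illustration, that the light field information has already been rendered to a single-channel image with a value range wider than the display offers, and that the gaze position is given in pixel coordinates of that image. The function name `gaze_tone_map` and the parameters `radius` and `out_max` are hypothetical and not taken from the description above.

```python
def gaze_tone_map(image, gaze_xy, radius=1, out_max=255):
    """Remap a wide-range image so the display range covers the gazed-at area.

    image:   2D list of luminance values (e.g. 12-bit), indexed image[y][x]
    gaze_xy: (x, y) pixel position the user is looking at (assumed known)
    radius:  half-size of the square window around the gaze position
    out_max: largest value the display can show (255 for 8 bit)
    """
    gx, gy = gaze_xy
    height, width = len(image), len(image[0])
    # Collect the values in a square window around the gaze, clipped to the image.
    window = [
        image[y][x]
        for y in range(max(0, gy - radius), min(height, gy + radius + 1))
        for x in range(max(0, gx - radius), min(width, gx + radius + 1))
    ]
    lo, hi = min(window), max(window)
    span = max(hi - lo, 1)  # avoid division by zero in a flat region

    def remap(value):
        # Values outside the gazed-at range saturate at 0 or out_max.
        clamped = min(max(value, lo), hi)
        return round((clamped - lo) * out_max / span)

    return [[remap(value) for value in row] for row in image]
```

In this sketch the full display range is spent on the values found around the gaze position, so that area is shown with finer gradation, while values outside that range saturate. A linear remapping is only one possible choice.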
According to a further embodiment, an audio information of the environment is captured with an array microphone or an array of microphones, and an audio output based on the audio information and the position in the environment is generated. Comparable to the light field camera, the array microphone captures an acoustic field information of the environment. Thus, audio information originating from a certain position in the environment can be generated as the audio output, while noise from other positions in the environment can be reduced. For example, when a crowd of people talking to each other is located in the environment, the user may gaze at a certain talking person. The gaze information indicates the position on the display unit where the person is displayed, and a corresponding position in the environment is determined. In the generated image information the head of the person may be focused. The audio output generated based on the audio information from the array microphone and the position in the environment therefore includes essentially audio information originating from the person, with noise from the other talking persons being reduced. Thus, the perceivability of the speech of the person can be increased.
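One common way to emphasize sound from a chosen position with a microphone array is delay-and-sum beamforming. The description does not name a specific technique, so the following is only a sketch under assumptions: the microphone positions and the target position are known in a common coordinate system in metres, delays are rounded to whole samples, and the function name `delay_and_sum` and the constant `SPEED_OF_SOUND` are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def delay_and_sum(signals, mic_positions, target, sample_rate):
    """Steer a microphone array towards `target` by aligning and averaging.

    signals:       one list of samples per microphone
    mic_positions: one coordinate tuple per microphone, in metres
    target:        coordinate tuple of the gazed-at position, in metres
    sample_rate:   samples per second
    """
    # Distance from the target position to each microphone.
    distances = [math.dist(position, target) for position in mic_positions]
    nearest = min(distances)
    # Extra travel time to each microphone, expressed in whole samples.
    lags = [
        round((distance - nearest) / SPEED_OF_SOUND * sample_rate)
        for distance in distances
    ]
    usable = min(len(signal) - lag for signal, lag in zip(signals, lags))
    # Advance each channel by its lag so sound from the target lines up,
    # then average; sound from other positions stays misaligned and is attenuated.
    return [
        sum(signal[i + lag] for signal, lag in zip(signals, lags)) / len(signals)
        for i in range(usable)
    ]
```

Sound arriving from the target position adds up coherently across the channels, whereas sound from other positions is averaged out to some degree. A practical system would typically use fractional delays and more microphones.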
According to an embodiment, a further gaze information is detected which indicates a further position on a further display unit at which a further user is gazing. Based on the light field information and the further gaze information a further image information is generated. The light field information comprises information from which different image information can be generated having for example a different focus plane. Thus, the light field information captured for example by a single light field camera can be provided to different display units of different users and for each user a specific image information can be generated depending on the gaze information of the respective user. For example, a first user may look at a first position on the display unit and the image information generated for the first user may be focused on an object at a corresponding first position in the environment. Based on the same light field information a second user may look at a second different position and a second image information may be generated focusing on an object at the position the second user is looking at. In other words, the same light field information can be provided to a plurality of users and for each user a specific image information can be generated taking into account the position the user is looking at.
According to a further embodiment, a plurality of gaze information can be detected over a period of time. Each gaze information indicates a respective position on the display unit the user is gazing at. The gaze information used for generating the image information is determined depending on the plurality of gaze information. For example, the focus in the generated image information may be changed only when the user is looking at a certain position for a predetermined amount of time. Furthermore, a zooming into the image, i.e. a generation of a scaled up image information, may be performed when the user looks continuously at the certain position for an even longer time. Moreover, a scaled down image information, i.e. a zoomed out image, may be generated when the user varies the position he is looking at more frequently. Thus, the generation of the image information can be controlled intuitively by just looking at the generated image on the display unit.
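The dwell-time behaviour described above can be sketched as a small classifier over a recent window of gaze samples. All thresholds (`radius`, `focus_after`, `zoom_after`, `max_jumps`) and the function name `classify_gaze` are hypothetical values chosen for illustration; the description only speaks of a predetermined amount of time, an even longer time, and a more frequently varying position.

```python
import math


def classify_gaze(samples, radius=40.0, focus_after=0.5, zoom_after=2.0, max_jumps=3):
    """Derive a control action from a recent window of gaze samples.

    samples: list of (time_s, x, y) tuples ordered by time, x and y in pixels
    Returns "zoom_out", "zoom_in", "focus" or "none".
    """
    if not samples:
        return "none"
    # Count large jumps between consecutive samples: a restless gaze.
    jumps = sum(
        1
        for (_, x0, y0), (_, x1, y1) in zip(samples, samples[1:])
        if math.hypot(x1 - x0, y1 - y0) > radius
    )
    if jumps > max_jumps:
        return "zoom_out"
    # Dwell time: how long the gaze has stayed within `radius` of its latest position.
    t_end, x_end, y_end = samples[-1]
    dwell_start = t_end
    for t, x, y in reversed(samples):
        if math.hypot(x - x_end, y - y_end) > radius:
            break
        dwell_start = t
    dwell = t_end - dwell_start
    if dwell >= zoom_after:
        return "zoom_in"
    if dwell >= focus_after:
        return "focus"
    return "none"
```

A caller would pass only the samples of the last few seconds, so that old jumps do not keep triggering a zoom out.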
According to a further aspect of the present invention, a device, for example a mobile phone, a personal digital assistant, a mobile music player, a tablet computer, a laptop computer, a notebook computer or a navigation system, is provided. The device comprises an input for receiving a light field information of an environment and a display unit for displaying image information to a user. The device comprises furthermore a detecting unit for detecting a gaze information indicating a position on the display unit the user is gazing at. The device comprises a processing unit which is configured to generate the image information based on the light field information and the gaze information. The device may be adapted to perform the above-described method and therefore provides the above-described advantages.
According to an embodiment, the detecting unit comprises an infrared camera. For detecting where a user is looking or gazing, the pupils of the user may be tracked by a camera. Pupils provide a much better reflection of infrared light than of visible light. Therefore, a tracking of the pupils can be reliably performed using infrared light. The device may additionally comprise an infrared illumination source or a plurality of infrared illumination sources for illuminating the face and the eyes of the user. The most widely used current designs are video-based eye trackers. A camera focuses on one or both eyes and records their movement as the viewer looks at some kind of stimulus. Most modern eye trackers use the centre of the pupil and infrared or near-infrared non-collimated light to create corneal reflections. The vector between the pupil centre and the corneal reflections can be used to compute the point of regard on a surface or the gaze direction. A simple calibration procedure of the individual is usually needed before using the eye tracker. Two general types of eye tracking techniques are used: bright pupil and dark pupil. Their difference is based on the location of the illumination source with respect to the optics. If the illumination is coaxial with the optical path, then the eye acts as a retroreflector as the light reflects off the retina, creating a bright pupil effect similar to red eye. If the illumination source is offset from the optical path, then the pupil appears dark because the retroreflection from the retina is directed away from the camera. Bright pupil tracking creates greater iris/pupil contrast, allowing for more robust eye tracking with all iris pigmentation, and greatly reduces interference caused by eyelashes and other obscuring features. It also allows for tracking in lighting conditions ranging from total darkness to very bright. However, bright pupil techniques are not effective for tracking outdoors, as extraneous infrared sources interfere with monitoring.
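The calibration step and the use of the pupil-to-reflection vector can be sketched as follows. This is a deliberately simplified model, not the procedure of any particular eye tracker: it assumes that the horizontal and vertical screen coordinates each depend linearly on the corresponding component of the vector, and it fits that line by least squares from a few calibration targets. The names `fit_axis`, `calibrate` and `point_of_regard` are hypothetical.

```python
def fit_axis(vector_components, screen_coords):
    """Least-squares line: screen = gain * vector_component + offset."""
    n = len(vector_components)
    mean_v = sum(vector_components) / n
    mean_s = sum(screen_coords) / n
    variance = sum((v - mean_v) ** 2 for v in vector_components)
    covariance = sum(
        (v - mean_v) * (s - mean_s)
        for v, s in zip(vector_components, screen_coords)
    )
    gain = covariance / variance
    return gain, mean_s - gain * mean_v


def calibrate(vectors, targets):
    """Fit one line per axis from calibration data.

    vectors: pupil-centre-minus-reflection vectors (vx, vy) measured while
             the user looked at the known screen positions in `targets`
    targets: screen positions (sx, sy) shown during calibration
    """
    fit_x = fit_axis([v[0] for v in vectors], [t[0] for t in targets])
    fit_y = fit_axis([v[1] for v in vectors], [t[1] for t in targets])
    return fit_x, fit_y


def point_of_regard(pupil, glint, calibration):
    """Map a measured pupil centre and corneal reflection to a screen position."""
    (gain_x, offset_x), (gain_y, offset_y) = calibration
    vx, vy = pupil[0] - glint[0], pupil[1] - glint[1]
    return gain_x * vx + offset_x, gain_y * vy + offset_y
```

For example, the user may be asked to look at the four corners of the display during calibration. Real eye trackers often use higher-order mappings to account for the curvature of the eye.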
In some embodiments the detecting unit may comprise a light field camera. Thus, a light field information of the user and an environment around the user may be provided to other users facilitating for example video conferencing. The light field camera may be configured to detect light in or near the infrared spectrum. For example, one or more sub cameras of the light field camera may be sensitive to light in or near the infrared spectrum, whereas other sub cameras of the light field camera may be sensitive to light in the visible spectrum. Furthermore, one or more infrared illumination sources may be provided for illuminating the environment to be captured by the light field camera, e.g. an environment where the user is located. Therefore, the light field camera may be used for detecting where the user is looking or gazing at.
According to another aspect of the present invention, a light field camera is provided. The light field camera comprises a sensor arrangement adapted to capture a light field information of an environment, and an input for receiving a gaze information indicating a position in the environment. The position in the environment may be determined based on a position on a display unit a user is gazing at. The light field camera comprises furthermore a processing unit configured to generate an image information based on the light field information and the gaze information.
As can be seen from the above-described device and light field camera, the processing for generating the image information based on the light field information and the gaze information may be performed in either the device or the light field camera. Which of the two performs the processing may depend on the available processing power or on the communication bandwidth between the device and the light field camera.
Although specific features described in the above summary and the following detailed description are described in connection with specific embodiments and aspects, it is to be understood that the features of the embodiments and aspects may be combined with each other unless specifically noted otherwise.
The invention will now be described in more detail with reference to the accompanying drawings.
In the following, exemplary embodiments of the invention will be described in more detail. It has to be understood that the features of the various exemplary embodiments described herein may be combined with each other unless specifically noted otherwise. Same reference signs in the various drawings and the following description refer to similar or identical components.
The light field camera 11 is coupled to the device 10 via a connection 12 which may comprise any kind of suitable data communication, for example an Ethernet connection or a wireless connection like Bluetooth or WLAN. The light field camera 11 may comprise an array camera for detecting a light field information of an environment 13. The environment 13 comprises in this exemplary embodiment a circular object 14 and a star-shaped object 15. The objects 14, 15 are located at different distances from the light field camera 11; for example, the star 15 may be located closer to the light field camera 11 than the circle 14.
The device 10 comprises a detecting unit 16, for example an infrared camera, a display unit 17 and an infrared illumination unit 18. On the display unit 17 the circle 14 and the star 15 are displayed based on the information received from the light field camera 11. A user 19 is looking at the display unit 17 of the device 10. The user 19, especially the eyes of the user 19, are illuminated by the infrared illumination unit 18. The camera 16 tracks the pupils of the user 19 to determine the direction 20 in which the user 19 is looking, thus determining a position on the display unit 17 at which the user is gazing. In the example shown in
The processing for generating the image information based on the light field information and the gaze information may be performed in either the light field camera 11 or the device 10. For example, the gaze information may be sent from the device 10 to the light field camera 11. The light field camera 11 detects the distance to the object gazed at from the information in the image grabbed by the light field camera 11. An image having a focus plane around that distance is generated, and a two-dimensional image is created, sent to the device 10 and displayed on the display unit 17. As an alternative, the complete light field information captured by the light field camera 11 may be sent from the light field camera 11 to the device 10, and the device 10 is responsible for detecting the distance at the gaze point, focusing around the distance, creating the image information and displaying the image information. In addition to using the gaze to control the focus, it is possible to zoom in or out of the image or to optimize high dynamic range information, for example color, contrast and brightness. A zooming out may be performed, for example, when the user rapidly varies the position at which he is gazing. When changing the gaze to the star, the focus plane may be set accordingly. Naturally, it is also possible to generate the image information based not only on the position the user is gazing at, but also on an area or areas the user is gazing at, for example with varying gaze intensity over a period of time, which is then used when displaying the image on the display unit 17.
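For an array of sub-cameras, setting the focus plane at the gazed-at distance can be illustrated with shift-and-add refocusing, a technique commonly used with camera arrays but not named in the description. The sketch assumes that each sub-camera delivers an equally sized grayscale image, that the sub-camera positions are given in units of the array spacing, that a per-pixel disparity map for the reference view is already available as a stand-in for the distance, and that the display shows the reference view pixel for pixel. The names `refocus`, `refocus_at_gaze` and `disparity_map` are hypothetical.

```python
def refocus(views, offsets, disparity):
    """Shift-and-add refocusing for a camera array.

    views:     equally sized 2D lists, one per sub-camera, indexed view[y][x]
    offsets:   (u, v) position of each sub-camera in units of the array spacing,
               with the reference camera at (0, 0)
    disparity: pixel shift per unit of camera offset for the plane to focus on
    """
    height, width = len(views[0]), len(views[0][0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total, count = 0.0, 0
            for view, (u, v) in zip(views, offsets):
                # Where a point on the chosen focus plane appears in this view.
                sx = x + round(u * disparity)
                sy = y + round(v * disparity)
                if 0 <= sx < width and 0 <= sy < height:
                    total += view[sy][sx]
                    count += 1
            out[y][x] = total / count if count else 0.0
    return out


def refocus_at_gaze(views, offsets, disparity_map, gaze_xy):
    """Pick the focus plane from the disparity found at the gaze position."""
    gx, gy = gaze_xy
    return refocus(views, offsets, disparity_map[gy][gx])
```

Objects on the selected plane line up across the shifted views and stay sharp in the average, while objects at other distances are blurred. The same function could run on the light field camera 11 or on the device 10, depending on where the light field information is available.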
Furthermore, it is also possible to control several remote light field cameras in the same way. It is also possible for multiple persons to control the same light field camera. Additionally, it is possible to control the direction of an array microphone using the gaze information in the same way. This may require some more information, concerning for example a placement and characteristics of the light field camera and the array microphone in order to align them. Again, it is possible to control the same remote array microphone by multiple users.
The camera 16 and the light field camera 21 may be realized as separate cameras as shown in
The embodiment shown in
While exemplary embodiments have been described above, various modifications may be implemented in other embodiments. For example, instead of a light field camera any other kind of computational camera may be used. Furthermore, the gaze tracking may be performed by any other devices, for example a camera tracking the pupils in the visible light range or a camera which is not arranged at the device 10 but which is arranged for example in glasses the user is wearing.
Number | Date | Country | Kind |
---|---|---|---|
12 007 049.5 | Oct 2012 | EP | regional |