The present invention relates to a method and apparatus for modifying a digital image.
Many techniques have been developed to improve the quality of digital imaging. In particular, techniques have been developed to make the imaging more real and less 2-dimensional. One known technique is to add shadows or shading generated by light sources. Numerous techniques have been developed, for example, as disclosed by GB2267409 and the paper IEEE Transactions on multimedia, vol. 1, No. 2, June 1999 “Augmented Reality with Automatic Illumination Control Incorporating Ellipsoidal Models” Jürgen Stauder.
However, these techniques do not go far enough. There is an increasing demand by users to feel more immersed in what they are viewing “as if they are there”, particularly in interactive systems such as video telephony.
With the growing number of people having access to broadband Internet, voice over IP (VoIP) applications are being used more and more, and Internet video telephony will become more widely available.
Much of the research to date in this area has focused on improving the audiovisual quality of video telephony by signal processing means. To date, high audiovisual quality and low communication delay have been recognized as key to a large-scale breakthrough of video telephony.
Even with perfect audiovisual quality, even with 3D images on a very large screen with 3D sound, and with minimal communication delay, there are fundamental differences between mediated person-to-person communication (video telephony) and person-to-person communication at the same place. One such aspect concerns the fact that if you are in a room with someone else, you influence the lighting conditions in the room by your very presence: depending on the location of light sources you create a shadow on walls and/or on the person with whom you are talking. With video telephony you do not influence the lighting conditions in the other room, and therefore there is not a feeling of “as if you are there” even with the highest audiovisual quality.
The present invention seeks to provide a system which displays a scene in an image and gives viewers the perception that they are present within the scene whilst they are viewing it.
This is achieved, according to an aspect of the invention, by a method of modifying a digital image being displayed to a user, the method comprising the steps of: detecting at least one characteristic of a user viewing a scene displayed by a digital image; modifying the digital image to include an aspect of the detected at least one characteristic to give the user the impression that the user is present within the scene displayed by the digital image.
This is also achieved, according to another aspect of the invention, by an apparatus for modifying a digital image being displayed to a user, the apparatus comprising: detecting means for detecting at least one characteristic of a user viewing a scene displayed by a digital image; and processing means for modifying the digital image to include an aspect of the detected at least one characteristic to give the user the impression that the user is present within the scene displayed by the digital image.
In an embodiment, the at least one characteristic of the user is at least one selected from: location of the user; silhouette of the user; viewpoint of the user. The aspect may be a shadow.
In this way the user viewing the scene sees their shadow in the scene giving them a feeling of being there and increased immersion within the scene whilst viewing it.
This effect may be achieved by providing a virtual light source in the vicinity of the user; determining a shadow of the user cast by the virtual light source; modifying a digital image being displayed to the viewer by adding the determined shadow to the digital image.
The digital image may be modified by determining whether or not each pixel of the digital image is within a region of the determined shadow; and then modifying the luminance of the pixel if it is determined that it is within the region of the determined shadow.
In an embodiment movement of the user is taken into account and the aspect is adjusted on the basis of this movement; and the digital image is modified to include the adjusted aspect. For example, the movement of the user is taken into account by modifying the image for any changes in a characteristic such as the silhouette of the user.
In this way the modified digital image maintains realism of the user being present within the scene whilst viewing it.
To increase the realism, depth information of objects within the scene may be determined, and modifying the digital image may include adjusting the aspect on the basis of the depth information.
The digital image may be a video image, which comprises a plurality of frames of the digital image, and the step of modifying the digital image includes modifying the image of each frame.
The aspects of the invention may be included in video telephony in which a digital image is generated at a far-side site and transmitted via an audio-visual communication link for display at a nearside site. The nearside site is remote from the far-side site. To improve the realism for the user at the nearside site and, possibly, also for the user at the far-side site, so that the experience is closer to face-to-face communication, at least one characteristic of an object at the far-side site included within the scene of the digital image is detected, and the digital image is modified to include an aspect of the at least one characteristic.
According to yet another aspect of the invention, there is provided a system for transmitting a digital image having associated audio from a far-side site to a nearside site, the system comprising: an audio-visual communication link between a far-side site and a nearside site; means for generating a digital image at the far-side site; transmitting means for transmitting the generated digital image to the nearside site via the audio-visual communication link; a virtual light source located at the nearside site; processing means for determining a shadow of a user at the nearside site cast by the virtual light source and modifying the transmitted digital image by adding the determined virtual shadow to the transmitted digital image; and display means for displaying the modified digital image to the user.
The processing means may further determine a shadow, cast by the virtual light source, of at least one object at the far-side site within the scene of the generated digital image, and modify the digital image to include the determined shadow.
For a more complete understanding of the present invention, reference is now made to the following description taken in conjunction with the accompanying drawings in which:
a-d show a simple schematic of a nearside site (room) with display in a video telephony application incorporating an embodiment of the invention;
a-c show a simple schematic of a nearside site with display in a video telephony application showing variation of “Ambilight” operation in a further embodiment of the present invention;
a illustrates an example of a far-side image;
b illustrates the result of segmentation of the image of
a illustrates a far-side image as seen in the nearside site in which a shadow of the far-side user is added.
b illustrates the far-side image of
An example of an application of the invention is shown in the figures. With reference to
At least one camera is placed at a fixed, predetermined location near the display 110 at the nearside site 100, and the location of the nearside user 118 is calculated from the camera video signal. From this information a virtual shadow 120 of the nearside user 118 cast by the virtual light source 114 is determined. The virtual shadow 120 of the nearside user 118 is added to the displayed image so that it falls on the far-side user 112 and on the wall behind him/her. To calculate the shadows, the position of the virtual light source 114 at the nearside site 100, the depth positions of the users 112, 118 at both sides relative to the display 110, and the depth position of the wall at the far-side site are taken into account, as explained below.
The shadow calculations occur in real-time at the nearside site 100, and are only visible to the nearside user 118. By seeing his/her shadow in the other room, an improved sense of presence is created for the nearside user. To make the effect more natural, the shadow 116 of the far-side user 112 is also provided in the image.
a-c illustrate the effect of additional light sources mounted inside a display screen that project light on a wall, such as those provided by Ambilight TV; the display may be provided on an Ambilight television set. The nearside room 200 incorporates an Ambilight television which generates the colour and intensity of its ambilights from an analysis of the video content, in such a way that the light impressions on the screen are extended on the wall in a perceptually pleasant way. The result is an increased sense of ‘immersiveness’. According to the embodiment, the calculation of the Ambilight settings takes into account the image of the nearside user 218 as recorded by a camera (not shown) mounted at a known position inside or near the display 210. The shadow impression 220 cast by the virtual light source 214 extends beyond the display 210.
With reference to
It can be appreciated that a virtual light source may also be provided at the far-side site 307 to provide a similar effect to the far-side user 305. For simplicity, the embodiment is described with reference to the effect being seen by the nearside user 301. Therefore, for the explanation of the embodiment, the nearside site receives an audiovisual signal from the far-side site, and for this reason is henceforth considered as the receive end of the communication set-up. The far-side site consequently is the transmit end.
As shown in
The shadow 319 that is cast is virtual and is added to the displayed image by digital video processing of the appropriate image pixels, so that it is visible only at the nearside site 303.
To create the virtual shadow, it is determined which image pixel values have to be modified. To do this the viewpoint 317 of the nearside user 301 is taken into account as shown in
When there are multiple nearside users then there are multiple viewpoints. In principle, from each different viewpoint the shadow rendered on the display should be different. This can be done with multi-view displays, for example based on a lenticular lens design, or based on a barrier design. The perceptual importance of the shadow effect lies mostly in the concept that the shadow of the user in the nearside room moves in correspondence with the motion of that same person. However, having a different shadow rendered for each different viewpoint is less important perceptually. Therefore, rendering of multi-viewpoints can be neglected and only one shadow need be computed. As a result, it suffices to use a normal 2D display.
As a result it is not necessary to take into account the precise viewpoint of a nearside user. Therefore, only the viewpoint distance value d2 is required and the viewpoint angle is set to 90 degrees (perpendicular to the display). Alternatively, a precise knowledge of the viewpoint location of the nearside user may be taken into account. To simplify the calculation a good approximation of the nearside viewpoint localization is made as follows: d2 is set to a fixed value. Next a well-known face detection algorithm is used such as that disclosed by P. Viola and M. J. Jones, “Robust real-time face detection,” International Journal of Computer Vision, vol. 57, no. 2, pp. 137-154, 2004, to find a rectangular face region in the camera image. The centre of this region is then taken as the angle towards the nearside user viewpoint.
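The step from a detected face rectangle to a viewpoint angle can be sketched as follows. This is only an illustration under assumptions that are not in the description: an ideal pinhole camera whose optical axis points straight out from the display, a known horizontal field of view, and a face rectangle given as (x, y, width, height) in pixels. The function name and parameters are hypothetical.

```python
import math

def face_centre_angle_deg(face_rect, image_width, horizontal_fov_deg):
    """Horizontal angle (degrees) from the camera axis to the face centre.

    face_rect: (x, y, w, h) of the detected face region in pixels (assumed layout).
    image_width: width of the camera image in pixels.
    horizontal_fov_deg: assumed horizontal field of view of the camera.
    """
    x, _, w, _ = face_rect
    centre_col = x + w / 2.0
    # Pinhole model: focal length in pixels derived from the field of view.
    focal_px = (image_width / 2.0) / math.tan(math.radians(horizontal_fov_deg) / 2.0)
    offset_px = centre_col - image_width / 2.0
    return math.degrees(math.atan2(offset_px, focal_px))

# A face centred in a 640-pixel-wide image lies on the camera axis.
angle_centre = face_centre_angle_deg((300, 100, 40, 40), 640, 60.0)
# A face centred on the right image border lies at half the field of view.
angle_edge = face_centre_angle_deg((620, 100, 40, 40), 640, 60.0)
```

In this sketch an angle of zero means the user is straight in front of the camera; how that maps onto the viewpoint angle measured relative to the display depends on how the camera is mounted.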
Next, in real-time, a map of the space that the nearside user occupies in the 3D environment is measured. This is simplified to a 2D segmentation map. The camera 601 mounted at a fixed and known position near the display is used, as indicated in
From the camera sequence the nearside segmentation mask is obtained using one of the known image object segmentation techniques that are based on the knowledge that the background object (i.e. the room) is stationary and the foreground object (i.e. the user) is non-stationary. One such technique is known in the literature as background subtraction; see, for example, P. Noriega and O. Bernier, “Real Time Illumination Invariant Background Subtraction Using Local Kernel Histograms,” British Machine Vision Conference BMVC06 (III:979).
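As a rough illustration of the idea behind background subtraction, the sketch below marks a pixel as foreground when its luminance differs from a stored image of the empty room by more than a threshold. This is a deliberately simple stand-in, not the local-kernel-histogram method of the cited paper; the function name and the threshold value are hypothetical.

```python
import numpy as np

def segment_foreground(frame, background, threshold=25):
    """Return a 2D map that is 1 where the frame differs from the background.

    frame, background: 2D uint8 luminance arrays of identical shape.
    threshold: assumed minimum absolute luminance difference for foreground.
    """
    # Use a signed type so the subtraction cannot wrap around.
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return (diff > threshold).astype(np.uint8)

# Toy example: a flat background and a frame with a bright 2x2 "user" patch.
background = np.full((4, 6), 100, dtype=np.uint8)
frame = background.copy()
frame[1:3, 2:4] = 180
D_ne = segment_foreground(frame, background)
```

A plain per-pixel threshold like this is sensitive to illumination changes, which is the weakness that more elaborate techniques aim to address.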
The near-end segmentation map (or image) is denoted by
$D_{ne}(c,r)$ with $c \in [0, W_{ne}-1]$, $r \in [0, H_{ne}-1]$, (1)
where c and r are the integer column and row indices respectively, and Wne and Hne are the respective horizontal and vertical pixel amounts of the near-end segmentation image. Dne(c,r) has non-zero values only at locations where in the camera image the nearside user was detected. In the situation depicted in
Also from the far-side site, next to the normal received image data, a segmentation image is needed at every image frame period. This segmentation mask is either calculated at the transmit end and sent to the other side, or it is calculated from the incoming video at the receive end (using for example background subtraction), an example is presented in
The far-side luminance image shown on the display 309 at the nearside site 303 is denoted by
$I_{fe}(c,r)$ with $c \in [0, W_{fe}-1]$, $r \in [0, H_{fe}-1]$, (2)
and the corresponding segmentation map (or image) is denoted by
$D_{fe}(c,r)$ with $c \in [0, W_{fe}-1]$, $r \in [0, H_{fe}-1]$, (3)
where again c and r are the integer column and row indices respectively. Wfe and Hfe are the respective horizontal and vertical pixel counts of the far-end image and the segmentation map. Dfe(c,r) has non-zero values at locations where in the camera image the far-end person was detected, and zero values for the background (the wall), see
Next the desired shadow effect in the image shown on the screen at the nearside site is calculated.
Each row of pixels in the image is scanned to determine whether or not each pixel in the row is a shadow pixel. If it is not a shadow pixel then the luminance Ife(c,r) is unchanged. If it is a shadow pixel the luminance Ife(c,r) is modified according to
$\tilde{I}_{fe}(c,r) = \max\{I_{fe}(c,r) - s,\ 0\}$, (4)
where s is a number that reflects the desired drop in luminance due to the shadow. For the greatest realism, s is chosen to depend on all geometry parameters of the scene, so that the shadow intensity changes depending on the distances between the virtual light source, the nearside and far-side users, and the far-side wall. For simplicity, while still providing a realistic effect, s may be chosen to be a small constant number s=sfe when the shadow comes from the far-side user and a slightly larger constant number s=sne when the shadow comes from the nearside user.
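The luminance modification of Equation (4) can be sketched as below, assuming the luminance image is an 8-bit array and that a Boolean mask of shadow pixels is already available. The function name and the mask argument are hypothetical; how the mask is obtained is the subject of the description that follows.

```python
import numpy as np

def apply_shadow(I_fe, shadow_mask, s):
    """Lower the luminance by s on shadow pixels, clamped at 0 as in Equation (4).

    I_fe: 2D uint8 luminance image.
    shadow_mask: Boolean array of the same shape, True for shadow pixels.
    s: non-negative luminance drop.
    """
    # Signed arithmetic so that I - s can go negative before clamping.
    I = I_fe.astype(np.int16)
    darkened = np.maximum(I - s, 0)
    # Non-shadow pixels keep their original luminance.
    return np.where(shadow_mask, darkened, I).astype(np.uint8)

I_fe = np.array([[10, 200], [50, 255]], dtype=np.uint8)
mask = np.array([[True, False], [True, True]])
shadowed = apply_shadow(I_fe, mask, s=30)
```

For a video image the same operation would be repeated for every frame with the mask of that frame.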
To determine whether or not a pixel in the far-side image is a shadow pixel, the pixel at the c1-th column and r1-th row in the far-side image Ife(c,r) is considered and the corresponding location in the (x,y,z)-coordinate system is calculated. This location is denoted by the vector a1, which is calculated as
Here |·| means the absolute value. In the example situation of
When the solved unknowns in (6) are denoted by κo, λo and μo, then a2 is given by
In the example situation of
Next it is verified whether or not the line through a2 and plight crosses the space occupied by the nearside user. The location a3 is calculated as the location of the cross-point of that line with the plane z=−d2, see
Solving for φ, ρ, and σ gives φo, ρo, and σo, and a3 becomes
Now the real-valued elements of a3 in the (x, y, z) coordinate system are translated into (c,r) coordinates for the nearside segmentation mask Dne(c,r), denoted by (c3, r3). It is assumed that the segmentation image Dne(c,r) lies exactly within the coordinate range x∈[xcr, xcl] and y∈[yct, ycb] in the vertical image plane defined by z=−d2, see
The values of c3 and r3 of Equation (10) are real, and are rounded to the nearest integer before being used as image indices. Alternatively, for more accuracy, c3 and r3 are not rounded and a bilinear image interpolation is applied when fetching a pixel value from an image at a non-integer location.
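The two ways of fetching a value at a real-valued location can be compared with the following sketch. It assumes images are stored as 2D arrays indexed as image[row, column] and that locations outside the image are clamped to the border; the clamping and the function names are choices made for this illustration only.

```python
import math
import numpy as np

def sample_nearest(image, c, r):
    """Fetch the pixel at the integer location nearest to (c, r)."""
    h, w = image.shape
    ci = min(max(int(math.floor(c + 0.5)), 0), w - 1)
    ri = min(max(int(math.floor(r + 0.5)), 0), h - 1)
    return image[ri, ci]

def sample_bilinear(image, c, r):
    """Bilinearly interpolate the image at real-valued column c and row r."""
    h, w = image.shape
    # Clamp so that the four neighbouring pixels always exist.
    c = min(max(c, 0.0), w - 1.0)
    r = min(max(r, 0.0), h - 1.0)
    c0, r0 = int(math.floor(c)), int(math.floor(r))
    c1, r1 = min(c0 + 1, w - 1), min(r0 + 1, h - 1)
    fc, fr = c - c0, r - r0
    top = (1 - fc) * image[r0, c0] + fc * image[r0, c1]
    bottom = (1 - fc) * image[r1, c0] + fc * image[r1, c1]
    return (1 - fr) * top + fr * bottom

image = np.array([[0.0, 10.0], [20.0, 30.0]])
v_nearest = sample_nearest(image, 0.8, 0.2)
v_bilinear = sample_bilinear(image, 0.5, 0.5)
```

With a binary segmentation map, the interpolated value is a fraction between 0 and the mask value, so a threshold would still be needed to decide whether the location counts as occupied.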
In the same way the location of the point a4 is calculated, which is the cross-point of the line through peye and a2 and the plane z=0, see
From these calculations it is now possible to make a decision whether or not a pixel at location (c1,r1) in the far-side image Ife(c,r) is a shadow pixel.
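The cross-points used in these calculations are intersections of a line with a plane of constant z. One conventional way to compute such a point is sketched below; it uses a single line parameter and is not necessarily the formulation of the equations referred to in the text. The coordinate values in the example are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def cross_point_with_z_plane(p, q, z_plane):
    """Intersection of the line through points p and q with the plane z = z_plane.

    Returns None when the line is parallel to the plane.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    dz = q[2] - p[2]
    if dz == 0.0:
        return None
    # Parameter t such that p + t * (q - p) has the requested z coordinate.
    t = (z_plane - p[2]) / dz
    return p + t * (q - p)

# Hypothetical geometry: display in the plane z = 0, nearside user in z = -d2.
a2 = (1.0, 0.0, 2.0)        # assumed point in the far-side scene
p_light = (0.0, 2.0, -3.0)  # assumed position of the virtual light source
d2 = 1.0
a3 = cross_point_with_z_plane(a2, p_light, -d2)
```

The same helper with z_plane set to 0 gives a cross-point with the display plane.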
A pixel Ife(c1,r1) is a nearside user's shadow pixel when a3 coincides with the location of the nearside user, hence when
$D_{ne}(c_3,r_3) \neq 0$. (11)
A pixel Ife(c1,r1) is a far-side user's shadow pixel when three conditions are satisfied, namely it is not a nearside user's shadow pixel, and it is not a pixel belonging to the far-side user (so a2 is a wall pixel), and the line through peye and a2 intersects with the location of the far-side user. Hence, a pixel Ife(c1,r1) is a far-side user's shadow pixel when
$D_{ne}(c_3,r_3) = 0$, and $D_{fe}(c_1,r_1) = 0$, and $D_{fe}(c_4,r_4) \neq 0$. (12)
When the condition in Equation (11) is satisfied, the shadow constant s in Equation (4) is set to s=sne=30 (assuming image luminance values in the range [0,255]). When the condition in Equation (12) is satisfied, s=sfe=20 is set.
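The decision of Equations (11) and (12) for a single pixel can be sketched as follows. The sketch assumes that the segmentation maps are 2D arrays indexed as map[row, column] and that the indices (c3, r3) and (c4, r4) have already been rounded to integers and lie inside the maps; the function and constant names are hypothetical. The constants are the example values 30 (nearside user's shadow) and 20 (far-side user's shadow).

```python
import numpy as np

S_NE = 30  # luminance drop for the nearside user's shadow
S_FE = 20  # luminance drop for the far-side user's shadow

def shadow_drop(D_ne, D_fe, c1, r1, c3, r3, c4, r4):
    """Return the luminance drop s for the far-side image pixel (c1, r1).

    Equation (11) gives S_NE, Equation (12) gives S_FE, otherwise 0 (no shadow).
    """
    if D_ne[r3, c3] != 0:
        return S_NE
    # Here D_ne[r3, c3] == 0, the first condition of Equation (12).
    if D_fe[r1, c1] == 0 and D_fe[r4, c4] != 0:
        return S_FE
    return 0

# Toy maps: nearside user at (c, r) = (0, 0), far-side user at (c, r) = (1, 1).
D_ne = np.array([[1, 0], [0, 0]], dtype=np.uint8)
D_fe = np.array([[0, 0], [0, 1]], dtype=np.uint8)
s_near = shadow_drop(D_ne, D_fe, c1=0, r1=0, c3=0, r3=0, c4=0, r4=0)
s_far = shadow_drop(D_ne, D_fe, c1=0, r1=0, c3=1, r3=1, c4=1, r4=1)
s_none = shadow_drop(D_ne, D_fe, c1=1, r1=1, c3=1, r3=1, c4=1, r4=1)
```

Checking the nearside condition first mirrors the requirement that a far-side user's shadow pixel is not already a nearside user's shadow pixel.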
Note in
For each time frame, a current silhouette of the user is detected as described above and the corresponding shadow of the current silhouette is determined such that as the user moves, the shadow correspondingly moves.
The invention may be applied in video telephony and video conferencing systems to create a sense of “being there”. Alternatively, it may also be applied for the creation of your own shadow in virtual environments (games, Second Life).
The invention may also be used within applications where the goal is to be able to capture, share, and (re-)live the experience “as if you were there”. An example is the lifelike full-sensory re-creation of recorded experiences: whilst a viewer is watching their holiday movie, their own live shadow is shown in the holiday scene in real time, giving them a feeling of being there and increased immersion.
Although embodiments of the present invention have been illustrated in the accompanying drawings and described in the foregoing description, it will be understood that the invention is not limited to the embodiments disclosed but is capable of numerous modifications without departing from the scope of the invention as set out in the following claims.
Number | Date | Country | Kind |
---|---|---|---|
08152535.4 | Mar 2008 | EP | regional |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/IB09/50843 | 3/3/2009 | WO | 00 | 8/30/2010 |