The present invention relates to a transmitting communication device comprising a camera for capturing an image sequence with a display range and a transmitting unit for transmitting the captured image sequence to a receiving communication device.
The present invention also relates to a receiving communication device comprising a receiving unit for receiving from a transmitting communication device through a communication network, video data corresponding to an image sequence to be displayed with a display range, and a display for displaying the video data.
This invention is, for example, relevant for video telephony and video conferencing systems.
With the growing amount of people having access to broadband internet, Voice Over Internet Protocol (VoIP) applications are used more and more. Also internet video telephony is coming, now that bandwidth limitations are decreasing.
Much research in the world focuses on improving the audiovisual quality of video telephony by signal processing means.
It is an object of the invention to propose a communication device which increases the user experience.
As a matter of fact, the invention is based on the following observations. With conventional video telephony and conferencing systems, the image reproduction is limited to the display itself. Naturally, the display is where people at the near-end location of the communication will look when they want to see what goes on at the far-end location. In this situation the central part (in space) of the human visual system is stimulated, where the eye has a high sensitivity to spatial detail. In normal face-to-face communication, when people “are really there”, not only the central part but also the peripheral part of the visual system is stimulated. In the peripheral regions the human visual system is not very sensitive to spatial detail, but it has a high sensitivity to temporal detail (moving objects). This means that people can quickly spot from the corner of their eyes when someone is moving. Visual information that stimulates people's peripheral viewing is not used at all in current video conferencing and video telephony systems, although it could add to the sense of “being there”. With proper peripheral viewing you could spot someone entering the room at the far-end location on the left or right side, even though that person is located outside the limited viewing range of the display. The fact that the peripheral viewing senses of humans are not exploited in video telephony or video conferencing is an issue which, to the knowledge of the author of this invention, has not yet been addressed. Solving this issue improves the sense of being there, which is the goal of the invention.
To this end, there is provided a transmitting communication device comprising a camera for capturing an image sequence with a display range; a sensor for capturing environmental events that are not perceptible within the display range, a processor for computing a control signal from the environmental events, and a transmitting unit for transmitting the captured image sequence and the control signal to a receiving communication device.
There is also provided a corresponding receiving communication device comprising a receiving unit for receiving separately, from a transmitting communication device through a communication network, video data corresponding to an image sequence to be displayed with a display range and a control signal representative of environmental events not perceptible within the display range; a display for displaying the video data; light sources next to the display; and a controller for interpreting the control signal so as to derive the environmental events and for controlling the light sources in dependence on the control signal.
The invention makes it possible to stimulate people's peripheral viewing senses in video telephony and video conferencing systems by extending the displayed visual information to the region outside the display. With the invention it is possible to spot events at the far-end location that happen outside the visible image shown on your screen. This provides an enhanced sense of “being there”.
According to a first embodiment of the invention, the sensor is a wide-angle camera and the control signal is computed from an image sequence captured by the wide-angle camera outside the display range. Beneficially, the processor comprises means for detecting the face or faces of people within a viewing range of the camera, and means for zooming around this face or these faces so as to obtain the image sequence with the display range to be transmitted to the receiving communication device. The processor may comprise means for estimating motion of an object entering into the viewing range of the wide-angle camera so as to derive a motion vector magnitude, which motion vector magnitude is included in the control signal. The processor may also comprise means for computing color information of pixels within the viewing range and outside the display range, said color information being included in the control signal.
At the receiving communication device side, the controller may be adapted to derive motion information representative of the motion of an object outside the display range at the transmitting communication device side, and a position of the object relative to the display, so as to switch on the corresponding light source. The controller may be adapted to derive a motion vector magnitude representative of the amount of motion of an object outside the display range at the transmitting communication device side, an intensity of the light source being controlled in proportion to the motion vector magnitude. The controller may also be adapted to derive color information representative of a position and/or motion of an object outside the display range at the transmitting communication device side, a color of the light source being controlled in dependence on the color information.
According to a second embodiment of the invention, the sensor is a microphone array arranged to detect sound outside the display range, and the control signal is computed from the microphone signals. In this case, the processor comprises means for performing acoustic source localization so as to detect a direction of a sound event, the direction of the sound event being included in the control signal.
At the receiving communication device side, the controller may be adapted to derive a magnitude of a sound emitted outside the display range at the transmitting communication device side, an intensity of the light source being controlled in proportion to the sound magnitude. The controller may also be adapted to derive a direction of a sound emitted outside the display range at the transmitting communication device side so as to switch on the light source corresponding to the direction where the sound comes from.
These and other aspects of the invention will be apparent from and will be elucidated with reference to the embodiments described hereinafter.
The present invention will now be described in more detail, by way of example, with reference to the accompanying drawings, wherein:
Referring to
It is to be noted that face tracking, whereby faces looking at the camera inside the camera's field of view are detected, and digital zoom, whereby the zoom is controlled such that all faces looking at the camera are selected (and not much more), are already known to a person skilled in the art, and are performed for example by the Philips Webcam SPC1300NC, which features face detection and tracking with digital zoom.
It will also be apparent for a skilled person that standard solutions exist already for the video encoder and decoder, and that these solutions (like H.264 for video compression for example) provide the means for carrying some low-bit rate private data (namely the event data which is low bit rate data).
The invention can be applied in video telephony and conferencing systems when next to the display external light sources are present. The invention requires at the transmit end some intelligence to capture and transmit (as private data next to the audio and video) some low bit rate event data. The invention will now be explained through two different and non-limitative examples.
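By way of a hypothetical illustration (the field layout below is an assumption for this sketch, not a format defined by H.264 or by the original disclosure), such a low-bit-rate event record could be serialized into a few bytes before being embedded as private data next to the audio and video:

```python
import struct

# Hypothetical wire format for one event record (14 bytes total):
#   2 x float32 : average motion vector magnitudes v_av,L and v_av,R
#   2 x 3 bytes : average RGB color of moving pixels in R_L and R_R
EVENT_FORMAT = "<ff3B3B"

def pack_event(v_av_l, v_av_r, color_l, color_r):
    """Serialize one event record for transmission as private data."""
    return struct.pack(EVENT_FORMAT, v_av_l, v_av_r, *color_l, *color_r)

def unpack_event(payload):
    """Parse an event record back into magnitudes and colors."""
    fields = struct.unpack(EVENT_FORMAT, payload)
    return fields[0], fields[1], fields[2:5], fields[5:8]
```

At roughly 14 bytes per record, even an event rate well above the video frame rate remains negligible next to the audio and video bit rates.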
Referring to
Due to the limitation of the communication bandwidth, it makes sense to reduce the high-resolution camera image to a lower-resolution image by video signal processing means in such a way that the faces of the people looking at the camera are left intact and the surroundings are cropped off (see
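A minimal sketch of this cropping step, assuming face bounding boxes are already provided by a face detector (the detector itself, e.g. the one featured in the SPC1300NC webcam, is not shown here):

```python
def crop_to_faces(faces, frame_w, frame_h, margin=0.15):
    """Given face bounding boxes (x, y, w, h) detected in the wide-angle
    frame, return a crop rectangle (x, y, w, h) that keeps all faces plus
    a relative margin, clipped to the frame boundaries."""
    if not faces:
        return (0, 0, frame_w, frame_h)        # nothing found: keep all
    left   = min(x for x, y, w, h in faces)
    top    = min(y for x, y, w, h in faces)
    right  = max(x + w for x, y, w, h in faces)
    bottom = max(y + h for x, y, w, h in faces)
    mx = int((right - left) * margin)          # widen by a relative margin
    my = int((bottom - top) * margin)
    left, top = max(0, left - mx), max(0, top - my)
    right, bottom = min(frame_w, right + mx), min(frame_h, bottom + my)
    return (left, top, right - left, bottom - top)
```

The returned rectangle is what would be scaled down to the transmit resolution; everything outside it is only used for event detection.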
In
The invention tackles the following issue. Imagine the situation shown in
Referring to
As shown in
The processor of the transmitting communication device performs motion estimation on the image sequence from the wide-angle camera view. To this end, the processor uses for example a 3D recursive search (3D-RS) algorithm for motion estimation. Such an algorithm is known to a person skilled in the art and is described for example in U.S. Pat. No. 6,996,175. Based on the 3D-RS algorithm, the processor obtains for each camera image a new motion vector field, which gives a motion vector v=(vx,vy) for each rectangular image block of R×C pixels (often square blocks are used, with R×C=8×8 pixels). One important image region is on the left of the zoom region in the camera image, denoted by RL, see
The average motion vector magnitude over a region is computed as

v_av,L = (1/N_L) · Σ √(vx² + vy²),   (1)

where the sum runs over the non-zero motion vectors of the blocks in RL (and analogously v_av,R over RR), or an approximation of the average motion vector magnitude which is easier to compute (fewer operations on the processor):

v_av,L ≈ (1/N_L) · Σ (|vx| + |vy|).   (2)
Here N_L and N_R are the number of non-zero motion vectors in RL and RR, respectively. With these equations (1) and (2), v_av,L or v_av,R has a large value when the moving object in either the region RL or RR is of a large size. When the moving object is small, v_av,L or v_av,R can still have a large value if that object moves rapidly. These average motion vector magnitudes are part of the control signal that is transmitted to the receiving communication device.
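A minimal sketch of this computation, assuming the per-block motion vector field is already available from the motion estimator:

```python
import math

def average_motion_magnitude(vectors, approximate=False):
    """Average motion vector magnitude over one side region (RL or RR).
    `vectors` is the list of per-block motion vectors (vx, vy) in that
    region; only non-zero vectors are counted (N_L / N_R in the text).
    With approximate=True the cheaper |vx|+|vy| form replaces the
    Euclidean norm, matching the low-cost variant described above."""
    nonzero = [(vx, vy) for vx, vy in vectors if (vx, vy) != (0, 0)]
    if not nonzero:
        return 0.0                              # no motion in the region
    if approximate:
        total = sum(abs(vx) + abs(vy) for vx, vy in nonzero)
    else:
        total = sum(math.hypot(vx, vy) for vx, vy in nonzero)
    return total / len(nonzero)
```

Averaging only over non-zero vectors keeps the magnitude high for a small but fast-moving object, as the text requires.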
As an extension to the invention, the event data can be extended with averaged color information of the pixels in a region (RR or RL). In this averaging, only those pixels whose associated motion vector magnitude exceeds a small fixed threshold are taken into account, so that only the pixels that move to a sufficient extent contribute to the average color.
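A sketch of this thresholded color averaging; the threshold value is an arbitrary placeholder:

```python
def average_moving_color(pixels, threshold=1.0):
    """Average RGB color over the pixels of a side region, counting only
    pixels whose motion vector magnitude exceeds a small fixed threshold.
    `pixels` is a list of ((r, g, b), magnitude) pairs."""
    moving = [rgb for rgb, mag in pixels if mag > threshold]
    if not moving:
        return (0, 0, 0)                        # no moving pixels: black
    n = len(moving)
    return tuple(sum(c[i] for c in moving) // n for i in range(3))
```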
The control signal representative of the event data is then received by the receiving communication device. Said device includes a controller for interpreting the control signal and for controlling the light at the near-end location.
In
I_L = f(v_av,L),   (3)
where f( ) is a monotonically increasing function of its argument. A similar principle is applied for the right side of the display. The intensity of a light source remains fixed according to equation (3) until new event data arrives. Alternatively, for a smoother behavior of the light source, a filter operation (like linear interpolation) is applied in the time direction to the event data. To guarantee that an abrupt movement still leads to an abrupt light change, this time filter should not smooth the data when the motion makes a sudden large change; it should only smooth the data when the absolute difference of subsequent average motion magnitudes is small (below a fixed threshold).
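One update step of this intensity control could be sketched as follows; the saturating linear choice of f( ) and all constants are assumptions for illustration, since the text only requires f( ) to be monotonically increasing:

```python
def update_intensity(previous, v_av, gain=0.1, max_level=1.0,
                     smooth=0.5, jump_threshold=0.3):
    """One update of a light source intensity from incoming event data.
    f() is taken here as a saturating linear map. The new target is
    blended with the previous level only when the change is small, so
    an abrupt movement still produces an abrupt light change."""
    target = min(max_level, gain * v_av)        # f(v_av): monotone, clipped
    if abs(target - previous) < jump_threshold:
        return smooth * previous + (1.0 - smooth) * target  # gentle change
    return target                               # large change: no smoothing
```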
Thanks to the invention, the people at the near-end side can quickly spot a video event from the corner of their eyes outside the display range, just like the people at the far-end side can quickly spot this event. By this measure, there is an increased sense of being there for the near-end people.
For more temporal accuracy (taking into account that the corner of the human eye perceives a higher temporal resolution than the center), the generation of the event data, and thus the control of the near-end light sources, occurs at a higher rate than the frame rate of the transmitted video. For this, the camera from which the event data is derived at the far-end location should run at a higher frame rate than the transmit frame rate.
As an extension of the invention, the event data contains average color information from the moving object, and the light source is activated with that color.
According to a second embodiment of the invention, the event data are generated from audio rather than video input at the transmit end location. A loud sound event is detected at the transmit end, and results in a sudden light flash with the light sources at the receive end. Advantageously, the people at the near-end location know immediately that the sudden sound comes from the other side and not from their own home. Also advantageously, the detection of the sudden sound event does not need to happen at the transmit end, it can also be performed on the incoming audio at the receive end where the light source is.
In a more refined method, the direction of the loud sound event is also detected and added to the event data at the transmit end. Then, at the receive end, the corresponding light source turns on for the duration of the loud sound event, see
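A sketch of how the receive-end controller might map such an event to a light command; the field names, units and thresholds are illustrative, not taken from the original disclosure:

```python
def sound_event(level_db, angle_deg, loud_threshold=70.0):
    """Map a detected far-end sound event to a light command.
    `level_db` is the sound level reported by the far-end localizer and
    `angle_deg` its direction (0 = straight at the display, negative =
    left of it). Returns None when the event is not loud enough,
    otherwise (side, flash intensity in 0..1)."""
    if level_db < loud_threshold:
        return None                             # not loud enough: no flash
    side = "left" if angle_deg < 0 else "right"
    # Flash intensity grows with loudness above the threshold.
    intensity = min(1.0, (level_db - loud_threshold) / 30.0)
    return (side, intensity)
```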
In
The invention may be implemented by means of hardware and/or dedicated software. A set of instructions corresponding to this software, which is loaded into a program memory, causes an integrated circuit of the communication device to carry out the method in accordance with the embodiments of the invention. The set of instructions may be stored on a data carrier such as, for example, a disk. The set of instructions can be read from the data carrier so as to load it into the program memory of the integrated circuit, which will then fulfil its role. For example, the software may be copied onto a CD-ROM, said CD-ROM being sold together with the communication device. Alternatively, the software can be made available through the Internet. Moreover, this dedicated software may also be integrated by default in a Flash memory or a Read-Only Memory (ROM) of the communication device.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be capable of designing many alternative embodiments without departing from the scope of the invention as defined by the appended claims. In the claims, any reference signs placed in parentheses shall not be construed as limiting the claims. The words “comprising” and “comprises”, and the like, do not exclude the presence of elements or steps other than those listed in any claim or the specification as a whole. The singular reference of an element does not exclude the plural reference of such elements and vice-versa.
Priority: Application No. 08300252.7, filed Aug 2008, EP (regional).

Filing: PCT/IB09/53312 (WO, kind 00), filed 7/30/2009; 371(c) date 1/31/2011.