The present invention relates to a method and apparatus enabling users participating in a video conference to control whether or not they are visible to other, remote users of the video conference system. In other words, it gives these users the possibility to put themselves on “visual mute”.
A simple solution to this problem is to leave the room. This is not always adequate, however, because the user may still want to follow the meeting passively, without actively participating in it.
Another solution is that a human meeting leader manually controls the visual state of each participant in the video conference call. This is however not feasible if many participants are present, or if many (de)mute requests are made to the system.
It is therefore an object of embodiments of the present invention to present a method of the known type but which does not show the aforementioned disadvantages.
According to embodiments of the present invention this object is achieved by a method for adapting video data recorded by a camera at a location during a video conference such as to hide the presence of a participant at said location of said video conference, said method comprising a step of registering a predefined gesture possibly to be performed by any participant of said video conference at said location, a step of detecting said gesture, and upon detection thereof, identifying the at least one participant having performed said gesture at said location, and adapting said video data such as to eliminate data relating to said at least one participant having performed said gesture from said video data, thereby generating adapted video data, for being transmitted to other participants of said video conference at other locations.
In this way, an automated and simple solution is provided enabling visual muting of a participant upon detection of this participant performing a predefined gesture, previously registered.
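The steps of the method (registering, detecting, identifying, adapting) can be sketched as follows. This is a minimal illustration only, assuming that some upstream analysis has already produced per-frame observations of the participants; the names `Observation`, `update_muted` and `adapt_frame` are hypothetical and are not part of the described embodiments.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Observation:
    """One participant as seen in a frame (hypothetical structure)."""
    participant: str          # identifier of the participant at this location
    gesture: Optional[str]    # gesture label recognised for this participant, if any


def update_muted(registered_gesture, observations, muted):
    """Detect the registered gesture and identify who performed it.

    Returns a new set holding everyone already muted plus every participant
    observed performing the registered gesture.
    """
    performers = {o.participant for o in observations
                  if o.gesture == registered_gesture}
    return set(muted) | performers


def adapt_frame(observations, muted):
    """Return the participants that remain in the adapted video data."""
    return [o.participant for o in observations if o.participant not in muted]
```

In this sketch the registration step reduces to choosing the value passed as `registered_gesture`; the adapted frame is represented simply by the list of participants still visible.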
In a variant said predefined gesture is detected by analyzing said video data.
In this variant gesture recognition can be performed via video analysis techniques such as image recognition or the like of the video data itself.
In another variant said predefined gesture is detected by means of receiving a trigger detection signal from and transmitted by an object on which said predefined gesture is performed by said at least one participant.
This variant allows the gesture to be detected in an alternative way, e.g. by receiving a signal from an object present at the conference location, such an object generally being adapted to communicate with a video conferencing client and to transmit a signal indicative of the gesture being performed by a participant. Upon receipt of such a signal, the video data may be further analyzed for recognizing the conference participant having performed this gesture. Alternatively, in other embodiments, the trigger detection signal itself may already comprise information with respect to the participant having performed the gesture, such that the identification of said at least one participant having performed said gesture at said location is performed by analyzing said trigger detection signal from said object.
In an embodiment the video data may be adapted by replacing video data pertaining to said at least one participant with background video data.
This presents a simple way for visual muting of the participant.
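The replacement with background video data can be illustrated by a per-pixel composite. The sketch below assumes that a segmentation mask of the muted participant and a background image of the location are already available; how either is obtained is not specified here, and the function name `replace_with_background` is hypothetical.

```python
def replace_with_background(frame, background, mask):
    """Return a copy of `frame` in which masked pixels come from `background`.

    frame, background: lists of rows of pixel values with identical shape.
    mask: list of rows of booleans, True where the muted participant appears.
    """
    if not (len(frame) == len(background) == len(mask)):
        raise ValueError("frame, background and mask must have the same height")
    adapted = []
    for frame_row, bg_row, mask_row in zip(frame, background, mask):
        if not (len(frame_row) == len(bg_row) == len(mask_row)):
            raise ValueError("rows must have the same width")
        # Take the background pixel wherever the participant is present.
        adapted.append([bg if hidden else px
                        for px, bg, hidden in zip(frame_row, bg_row, mask_row)])
    return adapted
```

The input frame is left untouched, so the unadapted video data remains available locally if needed.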
The present invention relates as well to embodiments of a video analysis and adaptation device for adapting video data recorded by a camera at a location during a video conference, said video analysis and adaptation device being adapted to receive said video data, to analyze said video data for detecting at least one participant of said video conference in said location having performed a predefined gesture and to, upon detection of said at least one participant having performed said predefined gesture, perform a step of adapting said video data such as to eliminate data relating to said at least one participant from said video data, thereby generating adapted video data and to provide said adapted video data on an output of said video adaptation device.
In a variant the video adaptation device is able to adapt said video data by replacing video data pertaining to said at least one participant with background video data.
In another variant the video analysis and adaptation device is further adapted to receive a trigger signal indicative of the presence of said predefined gesture, and, upon receipt of said trigger signal, to start detecting said at least one participant having performed said predefined gesture.
In another embodiment the video analysis and adaptation device is further adapted to analyze said video data for detecting said predefined gesture, and, upon detection of said predefined gesture, to start detecting said at least one participant performing said predefined gesture.
The present invention relates as well to embodiments of a video conferencing client adapted to receive video data from a camera recording a video conference at a location, characterized in that said video conferencing client further comprises a video analysis and adaptation device in accordance with any of the claims 6 to 8, said video conferencing client further being adapted to transmit the adapted video data towards at least one other video conferencing client serving other participants of said video conference at at least one other location.
In an embodiment the video conferencing client further comprises registration means being adapted to receive and store user information related to said predefined gesture.
In a variant said user information comprises gesture information performed by a human on an object.
In another embodiment the video conferencing client is further adapted to receive a trigger detection signal from and transmitted by an object on which said predefined gesture is performed by said at least one participant.
In these variants the video conferencing client is adapted to communicate with such an object such as to detect and recognize said predefined gesture.
The present invention relates as well to embodiments of an object being adapted to detect a predefined gesture performed by at least one participant of a video conference at a location, said object further being adapted to generate a trigger detection signal upon detection of said predefined gesture, and to provide said trigger detection signal to a video conferencing client in said location.
In an embodiment the object is further adapted to generate and transmit a registration request related to said predefined gesture to said video conferencing client.
In yet other embodiments said object is a portable communication device, such that said video conferencing client is able to receive from said portable communication device a registration request for providing said user information related to said predefined gesture possibly performed by the participant of said video conference handling said portable communication device during the time of the video conference.
In some variants the object comprises a communication unit and a movement detector or touch sensor.
In an embodiment said object may be a portable communication device such as a mobile phone, a game console, a tablet computer, a laptop etc.
Alternatively any tangible commodity object, e.g. a toaster, a coffee machine, etc., equipped with a small communication unit and a touch sensor, can be used in a location such as a meeting room.
The present invention relates as well to embodiments of a video conferencing server, comprising a video analysis and adaptation device in accordance with any of the previous claims 6-8.
The video conferencing server may further be able to receive from respective video conferencing clients the video data of the conference participants in these respective locations.
In a variant these video conferencing clients can also transmit the predefined gesture information towards the server. In other embodiments these predefined gestures can be the same for all clients, and can be centrally stored within the server. In some embodiments the conferencing server communicates said predefined gesture information towards the video conferencing clients in the different locations. For these embodiments, embodiments of the video conferencing clients are adapted, upon detection of the predefined gesture being performed by a participant, to transmit a signal to the video conferencing server, indicative of the gesture being performed. The server can then further analyze the video images received from this particular client, and adapt the video data accordingly.
The present invention relates as well to a computer program product comprising software adapted to perform the method steps in accordance with any of the claims 1 to 5, when executed on a data-processing apparatus.
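The interaction between the clients and a server holding a centrally stored gesture can be sketched as below. This is an illustrative model under stated assumptions, not the described server: the class `ConferencingServer`, its method names, and the callables `identify_performer` and `hide` (standing in for the video analysis and the video adaptation) are all hypothetical.

```python
class ConferencingServer:
    """Toy model of a server holding one centrally stored gesture."""

    def __init__(self, predefined_gesture):
        self.predefined_gesture = predefined_gesture  # same for all clients
        self.muted = {}  # client id -> set of muted participant ids

    def on_gesture_signal(self, client_id, frame, identify_performer):
        """A client signals the gesture; analyze its video to find who did it."""
        performer = identify_performer(frame)
        if performer is not None:
            self.muted.setdefault(client_id, set()).add(performer)

    def adapt(self, client_id, frame, hide):
        """Adapt video from clients with muted participants; pass others through."""
        hidden = self.muted.get(client_id, set())
        return hide(frame, hidden) if hidden else frame
```

Keeping the muted set per client means that only the video from the location where the gesture was signalled is analyzed and adapted.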
It is to be noticed that the term ‘coupled’, used in the claims, should not be interpreted as being limitative to direct connections only. Thus, the scope of the expression ‘a device A coupled to a device B’ should not be limited to devices or systems wherein an output of device A is directly connected to an input of device B. It means that there exists a path between an output of A and an input of B which may be a path including other devices or means.
It is to be noticed that the term ‘comprising’, used in the claims, should not be interpreted as being limitative to the means listed thereafter. Thus, the scope of the expression ‘a device comprising means A and B’ should not be limited to devices consisting only of components A and B. It means that with respect to the present invention, the only relevant components of the device are A and B.
The above and other objects and features of the invention will become more apparent and the invention itself will be best understood by referring to the following description of an embodiment taken in conjunction with the accompanying drawings wherein:
Each of the locations furthermore is equipped with a video conferencing client device, respectively denoted VCC1, VCC2 to VCCn, for the respective locations room 1, room 2 to room n. Such a video conferencing client is adapted to receive the video data recorded by the respective cameras coupled to it, at its respective location, to process the video data e.g. by compressing and encoding the data, or by mixing the data into one composed video in case several cameras are coupled to one video conferencing client, followed by encoding this resulting composed video. The encoded video from one location is then transmitted by the video conferencing client to the other video conferencing clients in the other locations. This also means that a particular video conferencing client, e.g. VCC1, will thus also receive the processed video from locations room 2 to room n and provide these to a display unit D, e.g. a screen at the particular location.
Embodiments of the present invention aim to provide a method for adapting video data recorded by one or more cameras during a video conference at a location such as to hide the presence of a participant of this video conference at this location, upon receiving a trigger from this participant, by means of this participant performing a predetermined gesture. This gesture can be anything, e.g. waving with a left hand, turning a wallet upside down, pushing a chair, or turning over or rotating a mobile device equipped with some movement detection and communication capabilities such as a game console, a tablet computer, a cell phone, etc. In one embodiment, where only one gesture is to be used by any participant at a certain location to mute himself/herself, this particular gesture is agreed upon by all participants within that location, and is registered by the video conferencing client at this location. In these embodiments each location can have a separate, location-specific agreed-upon gesture which will trigger the visual mute. Alternatively, a video conferencing system with a more centralized approach may use an initially defined or preconfigured gesture which is the same across all locations. In such situations each of the video conferencing clients may be preconfigured to be triggered by this gesture. In this case the step of registering a predefined gesture possibly to be performed by a participant of the video conference at the particular location can reduce to e.g. the mere storage of this gesture within the different video conferencing clients. In the earlier described situation, where the participants may initially agree upon the gesture to be recognized per location, the registration of this gesture is to be performed in an initializing step.
In other, more complex embodiments several gestures may be registered, enabling each person to mute him/herself based on a different gesture. In these embodiments the registration procedure will however be more complex.
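The difference between a single shared gesture per location and several personal gestures can be illustrated with a small registry. The sketch assumes gestures are represented by string labels; the class `GestureRegistry` and its methods are hypothetical names chosen for illustration.

```python
class GestureRegistry:
    """Stores which gestures trigger a visual mute at one location."""

    def __init__(self, shared_gesture=None):
        # shared_gesture: agreed upon per location, or preconfigured system-wide
        self.shared_gesture = shared_gesture
        self.personal = {}  # gesture label -> participant id

    def register_shared_gesture(self, gesture):
        self.shared_gesture = gesture

    def register_personal_gesture(self, participant, gesture):
        # Two participants cannot share one personal gesture, otherwise the
        # gesture would no longer identify who wants to be muted.
        if gesture in self.personal and self.personal[gesture] != participant:
            raise ValueError(f"gesture {gesture!r} is already registered")
        self.personal[gesture] = participant

    def lookup(self, gesture):
        """Return ('personal', participant), ('shared', None) or None."""
        if gesture in self.personal:
            return ("personal", self.personal[gesture])
        if gesture == self.shared_gesture:
            return ("shared", None)
        return None
```

A personal gesture already identifies the participant, whereas a shared gesture still requires identifying who performed it, which is why the personal case needs the more involved registration.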
In the embodiment on
For the more complex embodiments where each person may use a different gesture for being muted, each of these persons then has to register his/her particular gesture.
Within the video conferencing client a gesture registration means, denoted GR1 for VCC1, can be present to receive and store such information related to the predefined gesture G1. In the aforementioned embodiments this registration means can thus also receive video information from the camera with the gesture to be recognized later on. However in other embodiments such a separate gesture registration means is not necessary and its functionality can be incorporated within a memory or processing unit of the VCC itself.
In these embodiments, depicted in
The analysis and adaptation of the conference video data, denoted video room1 and recorded by the camera, denoted cam 1, in
The video analysis and adaptation device, denoted VAAS, generates adapted video data, for possibly being encoded and being transmitted to other participants at other locations participating in said video conference. In the embodiment depicted in
For the situation depicted in
Depending upon the gesture itself, its detection can thus be performed by a mere analysis of the conference video data. This is for instance the case if the gesture relates to a gesture to be performed solely by the human body itself, e.g. waving with a hand (as was depicted in
Alternatively the predefined gesture may be detected by the presence of a signal transmitted by an object upon being touched by a participant of the video conference. This may for instance be the case if the gesture relates to touching a particular button on a device with some motion sensing and communication capabilities such as a mobile phone or laptop, turning a mobile phone or tablet computer, touching a watch with a touch screen and communication capabilities such as Bluetooth, etc.
In these situations the predefined gesture can be simply detected by the object itself, e.g. in case this object is equipped with a movement sensor. Upon detection of this gesture, the object will then generate and transmit a particular trigger detection signal to the video conferencing client. In some cases e.g. when each video conferencing client comprises a video analysis and adaptation device VAAS, this object can also directly transmit this trigger detection signal to this VAAS. In such situations the registration of the gesture will then reduce to an initial transmission of such a signal by this particular object to the VCC or VAAS, which will then accordingly store this. This situation is shown in
The trigger detection signal can thus be a signal transmitted by an object such as a cell phone, which is generated by this object upon detecting a particular movement of this object (e.g. turning over or turning upside down). In another example it can be a signal transmitted by an object, e.g. a watch, upon detecting, by this watch, that a particular button is pushed, etc. Of course all types of objects having these capabilities can be used for this purpose.
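How an object might detect being turned over and then produce a trigger detection signal can be sketched as follows. The sensor convention (gravity along the z axis, +1 face up and -1 face down), the threshold, the JSON message layout and both function names are assumptions made purely for illustration.

```python
import json


def detect_turned_over(z_samples, threshold=0.8):
    """Return True if the readings go from face-up to face-down.

    z_samples: gravity component along the device's z axis, in units of g
    (assumed convention: about +1 face up, about -1 face down).
    """
    seen_face_up = False
    for z in z_samples:
        if z >= threshold:
            seen_face_up = True
        elif z <= -threshold and seen_face_up:
            return True
    return False


def make_trigger_signal(object_id, gesture):
    """Serialise a trigger detection signal (hypothetical message layout)."""
    return json.dumps({"type": "trigger_detection",
                       "object": object_id,
                       "gesture": gesture}, sort_keys=True)
```

Requiring a face-up reading before the face-down one avoids triggering on an object that was simply lying face down from the start.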
For all such cases, where a gesture detection signal is generated by an object, the video analysis and adaptation device will then only analyze and adapt the video from the camera upon being triggered by a trigger signal. This trigger signal can be transmitted from the VCC to the VAAS, in which case it is generated by the VCC upon having received a trigger detection signal from a touched object. Alternatively the trigger signal can be directly received from this object by the VAAS. As previously mentioned, in a decentralized situation, such as depicted in
In a particular implementation, the object communicating with the video conferencing client can thus be a portable communication device, such as a mobile phone, a laptop, a game console etc. Such devices are then adapted to generate, e.g. before the start of the video conference, a registration request for providing user information related to the predefined gesture possibly performed by a participant, e.g. a gesture of turning this device upside down. The generation of such a registration request then implies a detection of this particular movement by this device, followed by the generation of a particular trigger detection signal to the video conferencing client at the particular location.
In case every participant at that location wants to enjoy this feature, all these devices then need to send their registration signal to the conferencing client. This was schematically indicated in
In this embodiment, VAAS will thus immediately recognize the person having performed the gesture, as each person is linked to his/her personal object, with a specific trigger.
In the other embodiments, where there is no direct registration of which mobile phone belongs to which user, and any gesture detection signal from any mobile phone may lead to a visual muting of the person having touched this mobile phone, a detection of the particular person having performed the gesture is still needed, by means of the analysis of the video conferencing data.
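The two identification paths, a personal object linked to its user versus an object with no registered owner, can be sketched as a lookup with a fallback. The signal layout (a dict with an `"object"` key), the `owners` table and the `analyze_video` callable are hypothetical stand-ins for the registration data and for the video analysis.

```python
def identify_performer(signal, owners, analyze_video):
    """Identify who performed the gesture reported by `signal`.

    signal: dict with an "object" key naming the object that sent the trigger.
    owners: dict mapping registered object ids to participant ids.
    analyze_video: callable used as a fallback when the object is not linked
        to a participant; it returns a participant id or None.
    """
    owner = owners.get(signal.get("object"))
    if owner is not None:
        return owner          # personal object: no video analysis needed
    return analyze_video()    # shared or unregistered object
```

The fallback is only invoked when needed, which mirrors the point that video analysis is required only if the object itself does not identify the person.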
For a more centralized situation, where the VAAS is part of a video conferencing server which is depicted in
In other embodiments somewhat mixed architectures may exist, where at one location a video conferencing client also incorporates the features of a video conferencing server for the video conferencing clients at the other locations.
A more detailed embodiment of another video conferencing server is depicted in
In yet other embodiments it may even be possible to enable visual muting of a particular person, by another person performing a predefined gesture. In such cases the gesture registration database will contain cross-reference information about which gesture will trigger which person to be muted.
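Such cross-reference information can be pictured as a table from gesture to the person to be muted. The sketch below assumes string labels for gestures and participants, and assumes that a gesture without a cross-reference entry mutes the person who performed it; the function name `targets_to_mute` is hypothetical.

```python
def targets_to_mute(gesture, performer, cross_reference):
    """Return the set of participants to mute for a detected gesture.

    cross_reference: dict mapping a gesture to the participant it mutes.
    Gestures absent from the table mute the performer (assumed default).
    """
    target = cross_reference.get(gesture, performer)
    return {target}
```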
It should be appreciated by those skilled in the art that any block diagrams herein represent conceptual views of illustrative circuitry embodying the principles of the invention. Similarly, it will be appreciated that any flow charts, flow diagrams, state transition diagrams, pseudo code, and the like represent various processes which may be substantially represented in computer readable medium and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.
While the principles of the invention have been described above in connection with specific apparatus, it is to be clearly understood that this description is made only by way of example and not as a limitation on the scope of the invention, as defined in the appended claims. In the claims hereof any element expressed as a means for performing a specified function is intended to encompass any way of performing that function. This may include, for example, a combination of electrical or mechanical elements which performs that function or software in any form, including, therefore, firmware, microcode or the like, combined with appropriate circuitry for executing that software to perform the function, as well as mechanical elements coupled to software controlled circuitry, if any. The invention as defined by such claims resides in the fact that the functionalities provided by the various recited means are combined and brought together in the manner which the claims call for, and unless otherwise specifically so defined, any physical structure is of little or no importance to the novelty of the claimed invention. Applicant thus regards any means which can provide those functionalities as equivalent to those shown herein.
| Number | Date | Country | Kind |
|---|---|---|---|
| 12305965.1 | Aug 2012 | EP | regional |

| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/EP2013/065999 | 7/30/2013 | WO | 00 |