The present invention refers to a method for modifying video data during a video conference session according to claim 1, a computer program product for executing such a method according to claim 14 and a system for video conference sessions according to claim 15.
The technical field of the present invention refers to video conference systems. Such video conference systems enable visual communication via separate terminals, such as computers, laptops or smartphones, over a data connection, in particular an internet connection. In some video conference systems head-mounted displays (HMDs) are used. Such HMDs enable virtual reality (VR) and/or augmented reality (AR) and/or mixed reality (MR).
Besides video conferences, virtual meeting rooms, virtual spaces, role playing games and virtual environments in general are also carried out with HMDs, and HMDs are equally annoying and disturbing in such applications if the user's real face cannot be seen.
Document U.S. Pat. No. 6,806,898B1 discloses a system and method for automatically adjusting gaze and head pose in a videoconferencing environment, where each participant has a camera and display. The images of participants are rendered in a virtual 3D space. Head-pose orientation and eye-gaze direction are corrected so that a participant's image in the 3D space appears to be looking at the person they are looking at on the screen. If a participant is looking at the viewer, their gaze is set toward the “camera”, which gives the perception of eye-contact.
Document US20080252637 discloses virtual reality-based teleconferencing.
Document US20110202306 discloses an adjustable virtual reality system.
Video conferences with head-mounted displays can be disturbing, because the other people in the conference only see a small portion of the face of a person wearing said HMD, due to the large size of the HMD.
Thus, it is the object of the present invention to provide a method for video conferences and a video conference system that improves user comfort during a video conference session.
The aforementioned object is solved by a method for modifying video data during a video conference session according to claim 1. The inventive method comprises at least the steps: providing a first terminal comprising a first camera unit for capturing at least visual input and a first head-mounted display; providing a second terminal at least for outputting visual input; providing a server means or communication means or transfer medium, wherein said first terminal and said second terminal are connected via the transfer medium, in particular the server means, for data exchange; providing or capturing first basic image data or first basic video data of a head of a first person with the first camera unit; capturing, with the first camera unit, first process image data or first process video data of the head of said first person while said first person wears the first head-mounted display; determining first process data sections of the first process image data or first process video data representing the visual appearance of the first head-mounted display; and generating a first set of modified image data or modified video data by replacing the first process data sections of the first process image data or first process video data by first basic data sections, wherein the first basic data sections are part of the first basic image data or first basic video data and represent parts of the face of said first person, in particular the eyes of said first person, in particular for outputting the first modified image data or first modified video data, in particular representing a complete face of said first person, via at least one further terminal, in particular at least the second terminal.
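Purely by way of illustration, the replacing step can be sketched as follows. The sketch assumes, beyond what is stated above, that the process frame and the basic frame are already aligned to the same pixel grid, that frames are grayscale and held as nested lists, and that a boolean mask marking the first process data sections (the pixels showing the head-mounted display) has already been determined. The names `replace_hmd_sections`, `Frame` and `Mask` are hypothetical.

```python
from typing import List

Frame = List[List[int]]   # assumed: grayscale pixels, row-major
Mask = List[List[bool]]   # assumed: True where the HMD is visible


def replace_hmd_sections(process: Frame, basic: Frame, hmd_mask: Mask) -> Frame:
    """Build a modified frame: masked pixels come from the basic frame,
    all other pixels are kept from the process frame."""
    if not (len(process) == len(basic) == len(hmd_mask)):
        raise ValueError("frames and mask must have the same height")
    modified: Frame = []
    for p_row, b_row, m_row in zip(process, basic, hmd_mask):
        if not (len(p_row) == len(b_row) == len(m_row)):
            raise ValueError("frames and mask must have the same width")
        # Per pixel: take the basic data section where the HMD was detected.
        modified.append([b if m else p for p, b, m in zip(p_row, b_row, m_row)])
    return modified
```

A real system would additionally have to register the basic data to the current head pose before the replacement; that step is deliberately left out of this sketch.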
Thus, the present invention discloses a method to provide a preferably full-face video conference while one, two or multiple users wear head-mounted displays (HMDs) such as virtual reality (VR) glasses or augmented reality (AR) glasses. Where one, two or multiple parties use HMD/VR/AR devices, a significant portion of the user's face, especially the eyes, is covered by the HMD/VR/AR device, so that other users are not able to see the full face, which makes the video conference somewhat meaningless. With the novel method, one or multiple previously recorded face poses of a person, respectively a user, are overlaid (superimposed) on the real-time video and transferred to remote destinations to establish a video conference with a full-face view without any obstacle.
Further preferred embodiments are the subject-matter of the dependent claims and/or the following specification parts.
According to a preferred embodiment of the present invention the second terminal comprises a second camera unit and a second head-mounted display. The invention is further characterized by the steps of providing or capturing second basic image data or second basic video data of a head of a second person with the second camera unit, capturing, with the second camera unit, second process image data or second process video data of the head of said second person while said second person wears the second head-mounted display, determining second process data sections of the second process image data or second process video data representing the visual appearance of the second head-mounted display, and generating, respectively forming, a second set of modified image data or modified video data by replacing the second process data sections of the second process image data or second process video data by second basic data sections, wherein the second basic data sections are part of the second basic image data or second basic video data and represent parts of the face of said second person, in particular the eyes of said second person, in particular for outputting the second modified image data or second modified video data via the first terminal. This embodiment is beneficial since more than one HMD can be integrated into the video conference method. Thus, two or at least two persons or users can use the inventive video conference method while wearing HMDs.
The first modified image data or first modified video data and/or the second modified image data or second modified video data is/are according to a further preferred embodiment of the present invention outputted via at least one further terminal connected to the server means. This embodiment is beneficial since not every terminal needs to have a HMD. Thus, persons or users wearing or not wearing a HMD can interact in the same manner, in particular the faces, in particular the full faces, i.e. without HMD, of each user, respectively person, are displayed on the one, two or at least one or at least two or multiple terminals.
A terminal can be understood as every device having a screen or projecting a visual image on a surface or in space. Thus, terminals are preferably laptops, tablet PCs, desktop PC, smartphones, TVs, etc. It is further conceivable that terminal and HMD are one device.
A further terminal comprises according to a further preferred embodiment of the present invention a further camera unit and a further head-mounted display. The invention is further characterized by the steps of providing or capturing further basic image data or further basic video data of a head of a further person with the further camera unit, capturing, with the further camera unit, further process image data or further process video data of the head of said further person while said further person wears the further head-mounted display, determining further process data sections of the further process image data or further process video data representing the visual appearance of the further head-mounted display, and forming a further set of modified image data or modified video data by replacing the further process data sections of the further process image data or further process video data by further basic data sections, wherein the further basic data sections are part of the further basic image data or further basic video data and represent parts of the face of said further person, in particular the eyes of said further person, in particular for outputting the further modified image data or further modified video data via the first terminal and/or via the second terminal and/or any further terminal, in particular at the same time. This embodiment is beneficial since multiple users or persons, in particular more than two or three or more than three or four or more than four, can wear, respectively use, HMDs. It is also conceivable that different types of HMDs, in particular VR and AR devices, are utilized in the same video conference session. Thus, each HMD represented by process image data or process video data can be replaced with data representing a face part, in particular the eyes, of a user using said respective HMD.
First, second and/or further basic video data or first, second and/or further basic image data are according to a preferred embodiment of the present invention stored in a memory of the respective terminal and/or on the server means. First, second and/or further basic video data or first, second and/or further basic image data are captured once and processed in case first, second and/or further modified video data or first, second and/or further modified image data is required. Alternatively, first, second and/or further basic video data or first, second and/or further basic image data are captured each time said first, second and/or further person joins a video conference, and the first, second and/or further basic video data or first, second and/or further basic image data is updated or replaced and processed in case first, second and/or further modified video data or first, second and/or further modified image data is required.
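The two alternatives described above (capturing the basic data once, or recapturing it each time a person joins a conference) can be sketched, by way of example only, as a small store with a policy flag. The class `BasicDataStore`, its `on_join` method and the `capture` callback are hypothetical names introduced for this sketch; the basic data is represented as opaque bytes, and whether the store lives on the terminal or on the server means is left open.

```python
from typing import Callable, Dict


class BasicDataStore:
    """Keeps basic image/video data per person (hypothetical helper)."""

    def __init__(self, capture: Callable[[str], bytes], refresh_on_join: bool = False):
        self._capture = capture                  # assumed capture routine
        self._refresh_on_join = refresh_on_join  # False: capture once; True: recapture on every join
        self._data: Dict[str, bytes] = {}

    def on_join(self, person_id: str) -> bytes:
        """Return the basic data to use for this session."""
        if self._refresh_on_join or person_id not in self._data:
            # First join, or the policy asks for an update/replacement.
            self._data[person_id] = self._capture(person_id)
        return self._data[person_id]
```

With `refresh_on_join=False` the stored data is reused across sessions; with `refresh_on_join=True` it is replaced at every join.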
At least one terminal and preferably the majority of terminals or all terminals are according to a further preferred embodiment of the present invention comprising means for capturing and/or outputting audio data, wherein said captured audio data captured by one terminal is at least routed to one or multiple further terminals. Such a means can be e.g. a microphone. The audio capturing means can be arranged at the HMD or can be part of the terminal.
The position of the first head-mounted display with respect to the face of the first person is determined according to a further preferred embodiment of the present invention by means of object recognition. The shape of the first head-mounted display is preferably determined by object recognition and/or by identification data provided visually or electronically. Electronic identification data is provided via a data connection between the first head-mounted display and the first terminal. The position of the second head-mounted display with respect to the face of the second person is determined according to a further preferred embodiment of the present invention by means of object recognition. The shape of the second head-mounted display is preferably determined by object recognition and/or by identification data provided visually or electronically. Electronic identification data is provided via a data connection between the second head-mounted display and the second terminal. The position of the further head-mounted display with respect to the face of the further person is determined according to a further preferred embodiment of the present invention by means of object recognition. The shape of the further head-mounted display is preferably determined by object recognition and/or by identification data provided visually or electronically. Electronic identification data is provided via a data connection between the further head-mounted display and the further terminal.
Face movement data representing movements of skin portions of the face of the first person is according to a further preferred embodiment of the present invention generated, wherein the movements of the skin portions are captured by said first camera unit. Face movement data representing movements of skin portions of the face of the second person is preferably also generated, wherein the movements of the skin portions are captured by said second camera unit. Face movement data representing movements of skin portions of the face of the further person is preferably also generated, wherein the movements of the skin portions are captured by said further camera unit.
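As a hedged illustration of how a determined position might be represented, the following sketch derives an axis-aligned bounding box from a boolean mask. It assumes that some object recognition step, which is not shown and not specified here, has already marked the pixels belonging to the head-mounted display. The function name `hmd_bounding_box` and the box layout are assumptions of this sketch.

```python
from typing import List, Optional, Tuple

Mask = List[List[bool]]            # assumed: True where the HMD was recognised
Box = Tuple[int, int, int, int]    # (top, left, bottom, right), inclusive


def hmd_bounding_box(hmd_mask: Mask) -> Optional[Box]:
    """Return the smallest box enclosing all HMD pixels, or None if no HMD is visible."""
    rows = [r for r, row in enumerate(hmd_mask) if any(row)]
    cols = [c for row in hmd_mask for c, flag in enumerate(row) if flag]
    if not rows:
        return None
    return (min(rows), min(cols), max(rows), max(cols))
```

Where identification data is provided electronically, a known device shape could replace the mask as the source of the box; that variant is not shown.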
Eye movement data representing movements of at least one eye of the first person is generated according to a further preferred embodiment of the present invention, wherein the movements of the eye are captured by an eye tracking means. Eye movement data representing movements of at least one eye of the second person is also generated, wherein the movements of the eye are captured by a second eye tracking means. Eye movement data representing movements of at least one eye of the further person is also generated, wherein the movements of the eye are captured by a further eye tracking means. Skin movements of the face can be detected by an optional face movement detector, wherein said face movement detector can be provided in addition or alternatively to an eye tracking means. It is further conceivable that a combined eye tracking and face movement detector is provided, in particular arranged on or inside the HMD or as part of the HMD.
First basic data sections are according to a further preferred embodiment modified in dependency of said captured face movement data of the face of the first person and/or of said captured eye movement data of at least one eye of the first person. Second basic data sections are according to a further preferred embodiment modified in dependency of said captured face movement data of the face of the second person and/or of said captured eye movement data of at least one eye of the second person. Further basic data sections are according to a further preferred embodiment modified in dependency of said captured face movement data of the face of the further person and/or of said captured eye movement data of at least one eye of the further person.
Eye data representing shapes of the eyes of the first person as part of the first basic data sections is identified according to a further preferred embodiment. The eye data is preferably modified in dependency of said captured eye movement data and/or skin data representing the skin portions of the face of the first person above and/or below the eyes in the first basic data section is preferably identified. The skin data is preferably modified in dependency of said captured face movement data. Eye data representing shapes of the eyes of the second person as part of the second basic data sections is identified according to a further preferred embodiment. The eye data is preferably modified in dependency of said captured eye movement data and/or skin data representing the skin portions of the face of the second person above and/or below the eyes in the second basic data section is preferably identified. The skin data is preferably modified in dependency of said captured face movement data. Eye data representing shapes of the eyes of the further person as part of the further basic data sections is identified according to a further preferred embodiment. The eye data is preferably modified in dependency of said captured eye movement data and/or skin data representing the skin portions of the face of the further person above and/or below the eyes in the further basic data section is preferably identified. The skin data is preferably modified in dependency of said captured face movement data. This embodiment is beneficial since visual data representing the eye movement of the respective person using a HMD can be utilized to further enhance the usability and/or comfort of a video conference session respectively system.
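One very simple way to modify eye data in dependency of captured eye movement data is to translate the eye pixels by the measured gaze offset. The sketch below does only this and is a crude stand-in, not the modification used by the invention. It assumes that the eye data is a small grayscale patch, that the eye tracking means delivers an integer pixel offset (`dx`, `dy`), and that uncovered pixels are filled with a constant value; `shift_eye_patch` is a hypothetical name.

```python
from typing import List

Patch = List[List[int]]  # assumed: grayscale eye patch, row-major


def shift_eye_patch(patch: Patch, dx: int, dy: int, fill: int = 0) -> Patch:
    """Translate the patch content by (dx, dy) pixels; pixels shifted outside are dropped."""
    height = len(patch)
    width = len(patch[0]) if height else 0
    shifted = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < width:
                shifted[ny][nx] = patch[y][x]
    return shifted
```

Skin data above and below the eyes could be handled analogously with face movement data, although a realistic implementation would warp rather than translate.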
An eye tracking means is preferably a near eye PCCR (pupil centre corneal reflection) tracker. Said eye tracking means is preferably arranged on or inside the first head-mounted display and/or on or inside the second head-mounted display and/or on or inside the further head-mounted display.
According to a further preferred embodiment of the present invention the inventive method comprises the steps of receiving information, in particular by means of camera unit, relating to the pose of the head of the first person, orienting a virtual model of the head and facial gaze of the head according to the pose of the object, in particular head of first person, projecting visible pixels from a portion of the videoconference communication onto the virtual model, creating synthesized eyes of the head that produces a facial gaze at a desired point in space, orienting the virtual model according to the produced facial gaze; and projecting the virtual model onto a corresponding portion of the videoconference communication, wherein at least one part of the first set of modified image data or modified video data is replaced by said virtual model. This embodiment is beneficial since the respective (first, second and/or further) process image data or (first, second and/or further) process video data can be modified to further enhance the inventive method respectively inventive system.
Creating synthesized eyes preferably includes receiving segmentation information of the eyes and estimating iris and pupil information to create the synthetic eye. Synthesized eyes further preferably include digitally drawing the synthetic eyes on a corresponding portion of the video conference communication using the segmentation information to replace the original eyes with the synthetic eyes. Preferably, a step of digitally adjusting the synthesized eyes of the virtual model in real time during videoconference communication is provided. The video conference communication preferably occurs between at least two participants and is highly preferably facilitated by at least one of the Internet, integrated services digital network, or a direct communication link.
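The creation of a synthetic eye from estimated iris and pupil information can be illustrated, under strong simplifying assumptions, by drawing concentric discs. The sketch assumes that iris and pupil are circles sharing one centre, that their radii have already been estimated, and that the eye is rendered in grayscale on a rectangular patch; the function `draw_synthetic_eye` and its default intensity values are hypothetical.

```python
from typing import List


def draw_synthetic_eye(width: int, height: int, cx: int, cy: int,
                       iris_r: int, pupil_r: int,
                       sclera: int = 255, iris: int = 100, pupil: int = 0) -> List[List[int]]:
    """Render a grayscale eye patch: pupil disc inside iris disc on a sclera background."""
    eye = []
    for y in range(height):
        row = []
        for x in range(width):
            d2 = (x - cx) ** 2 + (y - cy) ** 2  # squared distance to the eye centre
            if d2 <= pupil_r ** 2:
                row.append(pupil)
            elif d2 <= iris_r ** 2:
                row.append(iris)
            else:
                row.append(sclera)
        eye.append(row)
    return eye
```

Moving `cx` and `cy` between frames gives a simple way to adjust the synthetic eye during the communication; clipping against the eye segmentation is not shown.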
The present invention further refers to a computer program product for executing a method according to claims 1-13.
The present invention further refers to a system for video conference sessions. Said system preferably comprises at least a first terminal, comprising a first camera unit for capturing at least visual input and a first head-mounted display, a second terminal at least for outputting visual input, and a server means, wherein said first terminal and said second terminal are connected via the server means for data exchange, wherein first basic image data or first basic video data of a head of a first person is provided or captured with the first camera unit, wherein first process image data or first process video data of the head of said first person is captured with the first camera unit while said first person wears the first head-mounted display, wherein first process data sections of the first process image data or first process video data representing the visual appearance of the first head-mounted display are determined, wherein a first set of modified image data or modified video data is formed by replacing the first process data sections of the first process image data or first process video data by first basic data sections, wherein the first basic data sections are part of the first basic image data or first basic video data and represent parts of the face of said first person, in particular the eyes of said first person, wherein the first modified image data or first modified video data, in particular representing a complete face of said first person, are outputted via the second terminal.
Further benefits, goals and features of the present invention will be described by the following specification of the attached figures, in which exemplary components of the invention are illustrated. Components of the systems and methods according to the invention which match at least essentially with respect to their function can be marked with the same reference sign, wherein such components do not have to be marked or described multiple times with respect to said figures.
In the following the invention is described, by way of example only, with respect to the attached figures.
Reference number 103Y indicates a second camera unit. The head of a second user 111 is captured by said second camera unit 103Y. The second camera unit 103Y is a camera unit that can be utilized during video-conference sessions. The second camera unit 103Y is preferably positioned externally in exemplary case but any other option is possible. It is also conceivable that second user 111 uses or wears a HMD during an inventive video conference session.
Reference number 103Z indicates a further camera unit. The head of a further user 114 is captured by said further camera unit 103Z. The further camera unit 103Z is a camera unit that can be utilized during video-conference sessions. The further camera unit 103Z is preferably positioned externally in exemplary case but any other option is possible.
Reference number A indicates a case in which first user 101 and second user 111 are communicating via the video conferencing system. In this case only first user 101 wears a HMD, second user 111 utilizes an optical output means which is different from a HMD, like a screen. In this case data captured by first camera 103X and by second camera 103Y are transferred to the other party via any transfer medium 105 for video-conferences. The transfer medium 105 is preferably a server unit, in particular the Internet, in exemplary case. It is further possible to send audio data as well as video data via path 104 from a first user terminal 100A on the side of the first user 101 to a second user terminal 100B on the side of the second user 111 and vice versa. Thus, video data captured with the first camera unit 103X is outputted via screen 109 to the second user 111. Thus, the second user 111 sees the first user wearing a HMD 110.
Reference number B indicates a case in which first user 101 and further user 114 are communicating via the video conferencing system. In this case both first user 101 and further user 114 are using, respectively wearing, HMDs. In this case data captured by first camera 103X and by further camera 103Z are transferred to the other party via any transfer medium 105 for video conferences. The transfer medium 105 is preferably a server unit, in particular the Internet, in the exemplary case. It is further possible to send audio data as well as video data via path 104 from a first user terminal 100A on the side of the first user 101 to a further user terminal 100C on the side of further user 114 and vice versa. Thus, video data captured with the first camera unit 103X is outputted via HMD 112 to the further user 114. Video data captured by first camera unit 103X is preferably outputted via the right screen of HMD 112 and the left screen of HMD 112. Thus, further user 114 sees the first user wearing a HMD 110, that means video of first user 101 is transferred 113A, 113B without any alteration to right screen 112A of HMD 112, respectively glass, and to left screen 112B of HMD 112, respectively glass.
Also in case B further user 114 sees the eyes, in particular the full face of first person 101.
Thus, reference number 203A represents image data that conveys a full-face view of first user 101, even though he or she uses a HMD, for the right display of the HMD, in particular VR glass. Likewise, reference number 203B represents image data that conveys a full-face view of first user 101, even though he or she uses a HMD, for the left display of the HMD, in particular VR glass.
Therefore, the present invention discloses a method for modifying video data during a video conference session or a method for providing an advanced video conference session. A first terminal 100A is provided and used, wherein first terminal 100A can be a laptop, desktop, mobile phone, tablet PC, TV or the like. Said terminal 100A preferably comprises a first camera unit 103X for capturing at least visual input and further comprises a first head-mounted display 102. Furthermore, a second terminal 100B is provided and used at least for outputting visual input. Preferably a data transfer means, in particular a server means 106, is provided, wherein said first terminal 100A and said second terminal 100B are connected via server means 106, respectively data transfer means, for data exchange. First basic image data or first basic video data 201 of a head of a first person 101 is preferably provided or captured with the first camera unit 103X. First process image data or first process video data of the head of said first person 101 is captured with first camera unit 103X while said first person 101 wears the first head-mounted display 102. First process data sections of the first process image data or first process video data representing the visual appearance of the first head-mounted display 102 are determined, wherein a first set of modified image data or modified video data is generated by replacing the first process data sections of the first process image data or first process video data by first basic data sections. The first basic data sections are preferably part of the first basic image data or first basic video data and represent parts of the face of said first person 101, in particular the eyes of said person 101. The first modified image data or first modified video data, in particular representing a complete face of said first person 101, can be outputted, respectively shown, via an output device, in particular a screen, of the second terminal 100B.
Thus, the invention relates to video conferencing (or any other tele-conferencing technique) while one or more users are using, respectively wearing, a HMD (head-mounted display). A HMD can be any virtual reality glasses, either stand-alone with their own display or a mobile phone attachment, augmented reality glasses, which superimpose augmented images (videos) over the real world, mixed reality devices and/or head-up displays (HUD).
Camera devices, respectively camera units 103X-103Z, can be any camera, for example external cameras or embedded cameras of a mobile phone or of any computer. The camera can be a single lens camera or a dual lens (multiple lens) camera, or even a light field camera.
A video conference can be held over the internet, with an instant messenger (IM) or in a VoIP environment. The term video conference preferably covers all kinds of videoconferencing activities such as teleconferences.
User head data 301 is taken either with a learning cycle or from recorded video or by a similar technique. Head data for the head of first user 101 is preferably captured from a front perspective of said head. Reference number 302 indicates a rotation movement of said head and/or of said camera unit 310, in particular in a range of at least 180° or of at least 270° or of 360°. Camera unit 310 is preferably a recording device, in particular a simple camera or a more complex device such as an immersive or light depth camera, preferably with scanning laser support. It is further conceivable that camera unit 310 is the first camera unit 103X or another camera unit. The size of first HMD 102 is detected and the user head data 301 is suitably cropped 307 for replacing the data representing the HMD in image or video data captured of first person 101 while using, respectively wearing, a HMD. The cropped data is transferred 308 to a unit for graphic, respectively image data or video data, modification.
Cropped data 305 is provided 309 for modifying image or video data captured by first camera unit 103X, in particular process image data or process video data.
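The cropping of the user head data to the detected size of the HMD can be sketched, by way of example only, as a rectangular crop. The sketch assumes that the head data is a grayscale frame held as nested lists and that the HMD region is given as an inclusive box `(top, left, bottom, right)` in the coordinates of that frame; `crop_head_data` is a hypothetical name.

```python
from typing import List, Tuple

Frame = List[List[int]]            # assumed: grayscale pixels, row-major
Box = Tuple[int, int, int, int]    # (top, left, bottom, right), inclusive


def crop_head_data(head: Frame, box: Box) -> Frame:
    """Cut the region that the HMD would cover out of the recorded head data."""
    top, left, bottom, right = box
    height = len(head)
    width = len(head[0]) if height else 0
    if not (0 <= top <= bottom < height and 0 <= left <= right < width):
        raise ValueError("box lies outside the frame")
    return [row[left:right + 1] for row in head[top:bottom + 1]]
```

The result corresponds to the cropped data that is then provided for modifying the process image data or process video data.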
Further possibilities, like data transfer via wireless technology like NFC or Bluetooth or RFID or WiFi or non-wireless respectively cable connection technology, in particular USB, are additionally or alternatively conceivable.
According to reference number 901 the inventive system, respectively a video conference session, is started. It is then checked in 902 whether the system has actually started; in case the system has not started 902N, nothing happens 904. In case the system has started 902Y, it is checked whether at least one HMD is detected 904. In case no HMD device is detected 904N, nothing happens. In case a HMD device is detected 904Y, recorded data (basic image data or basic video data) is requested, respectively loaded or generated. Then, the initiation of the system ends 908.
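The initiation sequence described above can be summarised, as a minimal sketch, by two checks followed by loading the recorded data. The function `initialise_session` and the callback `load_basic_data` are hypothetical; how the started state and the HMD detection are obtained is not shown.

```python
from typing import Callable, Optional


def initialise_session(system_started: bool, hmd_detected: bool,
                       load_basic_data: Callable[[], bytes]) -> Optional[bytes]:
    """Return the basic data if the system runs and an HMD is present, otherwise None."""
    if not system_started:
        return None   # system not started: nothing happens
    if not hmd_detected:
        return None   # no HMD detected: nothing happens
    # HMD detected: request, load or generate the recorded basic data.
    return load_basic_data()
```

A return value of `None` means that no replacement is needed or possible for the session.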
Thus, overlaying computations shown in
Thus, the present invention refers to a system and a method, wherein the method preferably serves for modifying video data during a video conference session and comprises at least the steps: providing a first terminal 100A, comprising a first camera unit 103X for capturing at least visual input and a first head-mounted display 102; providing a second terminal 100B at least for outputting visual input; providing a server means 105, wherein said first terminal 100A and said second terminal 100B are connected via the server means 105 for data exchange; providing or capturing first basic image data or first basic video data of a head of a first person 101 with the first camera unit 103X; capturing first process image data or first process video data of the head of said first person 101 with the first camera unit while said first person 101 wears the head-mounted display 102; determining first process data sections of the first process image data or first process video data representing the visual appearance of the first head-mounted display 102; generating a first set of modified image data or modified video data by replacing the first process data sections of the first process image data or first process video data by first basic data sections, wherein the first basic data sections are part of the first basic image data or first basic video data and represent parts of the face of said first person 101, in particular the eyes of said first person 101; and outputting the first modified image data or first modified video data, in particular representing a complete face of said first person 101, via the second terminal 100B.
Therefore, the inventive method first records or takes pictures of the user's face without VR/AR glasses. Any technique such as light depth photography or any other 3D or immersive video/photo technique can be used during this process. After the area of the user's face near the eyes is modelled, it is preferably stored on the device, respectively terminal. When the user starts to use VR/AR glasses in either video conferencing or video calls, the computer detects the edges of the VR/AR glasses (or identifies them by any other method such as identifiers etc.) and combines the video showing the VR/AR glasses with the stored data into a normal video; as a result, other users see a normal face without VR/AR glasses. As an additional feature, eye tracking inside the VR/AR glasses can also position the eyes correctly.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2017/052449 | 2/3/2017 | WO | 00 |