The present disclosure relates to a system and method for controlling an online meeting.
Online meeting services such as Teams, Zoom, and Skype are known. Typically, during an online meeting using such services, a camera implemented in a laptop captures and provides a video to the other attendees.
In a conventional meeting service, it may be easy to see the face or facial expression of each attendee who is located in front of a camera implemented in a laptop, but it may not be easy to see other information, such as a whiteboard in the meeting room, ROIs specified in the meeting room, the face of a presenter who is not facing a laptop, or the like.
An apparatus and control method for controlling an online meeting are provided for receiving, from a camera, a captured video of a meeting room; transmitting, via a first server, the captured video of the meeting room to an online meeting client; specifying an ROI (Region Of Interest) in the meeting room; controlling an optical zoom magnification of the camera for capturing a still image of the ROI in the meeting room; and transmitting, via a second server that is different from the first server processing the captured video of the meeting room, the still image that the camera captures after the control of the optical zoom magnification to the online meeting client. As a result, in an online meeting, the visibility of information other than the faces of attendees located in front of a PC is improved.
These and other objects, features, and advantages of the present disclosure will become apparent upon reading the following detailed description of exemplary embodiments of the present disclosure, when taken in conjunction with the appended drawings, and provided claims.
Throughout the figures, the same reference numerals and characters, unless otherwise stated, are used to denote like features, elements, components or portions of the illustrated embodiments. Moreover, while the subject disclosure will now be described in detail with reference to the figures, it is done so in connection with the illustrative exemplary embodiments. It is intended that changes and modifications can be made to the described exemplary embodiments without departing from the true scope and spirit of the subject disclosure as defined by the appended claims.
In an exemplary embodiment, the client computer A 106 and the client computer B 107 execute the same computer programs for an online meeting to work as the online meeting clients. However, for purposes of explanation, the client computer A 106 and the client computer B 107 are described by different names according to whether the computer is located in the meeting room 101.
However, this is not seen to be limiting. For example, the operation reference to
In S101, the control apparatus 103 receives, from the camera 102, a captured video of the meeting room 101. The camera 102 is set in the meeting room 101, performs real-time image capturing in the meeting room 101, and transmits the captured video to the control apparatus 103.
In S102, the control apparatus 103 performs a detection process for detecting one or more face regions of respective users from the captured video. More specifically, the control apparatus 103 identifies one or more video frames from among a plurality of video frames constituting the captured video, and performs a face detection process on the identified video frame(s). As illustrated in
In S103, the control apparatus 103 transmits the three cropped face regions (face images) to the user recognition service 121 to obtain usernames corresponding to the three face regions. The user recognition service 121 comprises the database 122, which stores facial data and username information associated with the respective facial data. The user recognition service 121 is able to compare the face images received from the control apparatus 103 with the facial data in the database 122 to identify the username corresponding to each face region detected from the video. The identified username is provided from the user recognition service 121 to the control apparatus 103.
In S104, the control apparatus 103 transmits, via the first server 104, to the client computers 106 and 107, (i) a video 113 of the meeting room 101, (ii) a video 114 of the face region of the person 108 which is cropped from the video of the meeting room 101, (iii) a video 115 of the face region of the person 109 which is cropped from the video of the meeting room 101 and (iv) a video 116 of the face region of the person 110 which is cropped from the video of the meeting room 101. The control apparatus 103 communicates with the first server 104 based on a first communication protocol (e.g. WebRTC) on which a bit rate of a media content is changed according to an available bandwidth of a communication path between the control apparatus 103 and the first server 104.
In S105, the control apparatus 103 transmits, via the second server 105, to the client computers 106 and 107, the name and position information 120 which contains name and position of each region. The control apparatus 103 communicates with the second server 105 based on a second communication protocol (e.g. HTTP) on which a bit rate of a media content is not changed according to an available bandwidth of a communication path between the control apparatus 103 and the second server 105.
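The two-server split described in S104 and S105 can be summarized as a routing decision per content type: bandwidth-adaptive transport for the videos, fixed-bit-rate transport for stills and metadata. The sketch below is purely illustrative; the server labels, content-type names, and routing table are assumptions and not any real WebRTC or HTTP API.

```python
# Illustrative routing of meeting content to the two servers (S104/S105).
# "first_server" stands for the adaptive path (e.g. WebRTC), where bit rate
# follows available bandwidth; "second_server" for the fixed path (e.g. HTTP).
ADAPTIVE = "first_server"
FIXED = "second_server"

ROUTING = {
    "meeting_room_video": ADAPTIVE,     # video 113
    "face_region_video": ADAPTIVE,      # videos 114-116
    "whiteboard_still": FIXED,          # still images 117-118
    "roi_still": FIXED,                 # still image 119
    "name_and_position_info": FIXED,    # information 120
}

def route(content_type: str) -> str:
    """Return which server (and hence which protocol) carries a content type."""
    return ROUTING[content_type]
```

A real implementation would attach each stream to the corresponding transport session rather than consult a static table, but the table makes the protocol split explicit.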
As shown in
The control apparatus 103 may also specify the type 405 of each video stream based on a face detection process, a presenter designation process, an ROI designation process, a whiteboard detection process, and the like. For example, if the face detection process detects a region, the type 405 of the region may be “Attendee”, and if the whiteboard detection process detects a region, the type 405 of the region may be “Whiteboard”. The control apparatus 103 obtains the name 406 of an “Attendee” from the user recognition service 121, which performs the user recognition process using the database 122 as described above. The name 406 of a “Whiteboard” is determined according to the detection or designation order. For example, the name 406 of a whiteboard that is detected (or designated) first may be “Whiteboard A” and the name 406 of a whiteboard that is detected (or designated) second may be “Whiteboard B”.
The name 406 of an “ROI” is determined according to the detection or designation order. For example, the name 406 of an ROI that is detected or designated first may be “ROI-1” and the name 406 of an ROI that is detected or designated second may be “ROI-2”. If the number of ROIs is limited to one, and a new ROI is detected or designated while an ROI already exists as shown in ID=07 in
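The ordinal naming of whiteboards and ROIs, and the replace-oldest behaviour when the ROI count is limited, might be sketched as follows. The function names and the record layout are hypothetical; only the name formats ("Whiteboard A", "ROI-1") and the replacement rule come from the text.

```python
from string import ascii_uppercase

def next_whiteboard_name(count_so_far: int) -> str:
    # First detected (or designated) whiteboard -> "Whiteboard A",
    # second -> "Whiteboard B", and so on.
    return f"Whiteboard {ascii_uppercase[count_so_far]}"

def register_roi(rois: list, position: tuple, limit: int = 1) -> list:
    """Add a new ROI record; once the limit is reached, overwrite the
    oldest ROI's position instead of adding a new entry (ID=07 case)."""
    if len(rois) >= limit:
        rois[0]["position"] = position  # oldest entry keeps its name/stream
    else:
        rois.append({"name": f"ROI-{len(rois) + 1}", "position": position})
    return rois
```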
Returning to
In S107, the control apparatus 103 determines whether the online meeting is closed. The online meeting will be closed in response to a trigger event in which the control apparatus 103 detects a predetermined gesture for finishing an online meeting from a video captured by the camera 102. In the present exemplary embodiment, when the control apparatus 103 keeps detecting a hand gesture showing a palm (see
In S108, the control apparatus 103 performs a process regarding the ROI. A detailed explanation of this ROI process will be provided later with reference to
In S109, the control apparatus 103 performs a process regarding whiteboards. A detailed explanation of this whiteboard process will be provided later with reference to
In S110, the control apparatus 103 performs a process regarding the presenter. A detailed explanation of this presenter process will be provided later with reference to
Each of the client computers 106 and 107 is able to display an online meeting window.
In T101, the client computer 106 receives, via the first server 104, from the control apparatus 103, (i) a video 113 of the meeting room 101, (ii) a video 114 of the face region of the person 108 which is cropped from the video of the meeting room 101, (iii) a video 115 of the face region of the person 109 which is cropped from the video of the meeting room 101 and (iv) a video 116 of the face region of the person 110 which is cropped from the video of the meeting room 101.
In T102, the client computer 106 receives, via the second server 105, from the control apparatus 103, the name and position information 120 which contains name and position of each region.
In T103, the client computer 106 receives, via the second server 105, from the control apparatus 103, (i) a still image 117 of the Whiteboard A 111 which is cropped by the control apparatus 103 from a video frame of the meeting room 101, (ii) a still image 118 of the
Whiteboard B 112 which is cropped by the control apparatus 103 from a video frame of the meeting room 101 and (iii) a still image 119 of the ROI (Region Of Interest) which is cropped by the control apparatus 103 from a video frame of the meeting room 101.
The client computer 106 may be able to display the online meeting window 900 based on the information received in T101-T103.
When the single view indicator 901 is selected, the online meeting window 900 may contain the display region 911. In the present exemplary embodiment, if the two view indicator 902 is selected as shown in
Also, the user of the client computer 106 is able to choose one or more icons from among a meeting room icon 906, a presenter icon 907, a whiteboard icon A 908, a whiteboard icon B 909 and an ROI icon 910 as shown in
In the present exemplary embodiment, a display order of the video icons 906-910 within the menu region 917 is determined based on a generation order of each media stream. For example, if the video 113 of the meeting room 101 is defined first among all the videos provided from the control apparatus 103 to the client computer 106, the meeting room icon 906 corresponding to the video 113 is located at the rightmost position within the menu region 917. Similarly, in the present exemplary embodiment,
In the present exemplary embodiment, the display order of the HR images within the menu region 917 is also determined based on a generation order of each HR image. In other words, as shown in
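The generation-order layout of the menu region 917 can be sketched as a sort: the earliest-created stream ends up rightmost, so newer icons appear further left. The stream records and field names below are illustrative assumptions.

```python
def icon_order(streams):
    """streams: list of dicts with "name" and "created" (generation order).
    Returns icon names left to right: newest first, oldest rightmost."""
    return [s["name"] for s in sorted(streams, key=lambda s: s["created"], reverse=True)]
```

For example, if the meeting room video is generated first, its icon lands at the rightmost position, matching the behaviour described for icon 906.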
Note that the client computer 106 may be able to remove any of the icons 906-910 and 914-916 as per user operations. In the present exemplary embodiment, when a mouse cursor 919 moves onto an arbitrary icon (e.g. icon 914), a removal button 920 for a remove instruction is displayed, as shown in
Returning to
In T105, the client computer 106 determines whether to display username/position on the online meeting window 900.
In T106, the client computer 106 updates display contents on the online meeting window 900 based on the process in T101-T105.
In T107, the client computer 106 determines whether to leave the online meeting. In the present exemplary embodiment, when the user of the client computer 106 clicks or taps the leave button 918 on the online meeting window 900, the client computer 106 determines to leave the online meeting. In addition, the client computer 106 may determine to leave the online meeting when the control apparatus 103 informs the client computer 106 that the meeting is over. If the client computer 106 determines not to leave the online meeting, flow returns to T101-T103. If the client computer 106 determines to leave the online meeting or that the meeting is over, flow proceeds to END.
ROI process described in S108 of
In A101, the control apparatus 103 determines whether the first predefined hand gesture is being detected for a first predetermined time period (e.g. 2 seconds). In the present exemplary embodiment, the control apparatus 103 detects an open-hand gesture (see
In A102, the control apparatus 103 performs control for outputting a predetermined sound E for notifying the user that the first predefined hand gesture has been detected for the first predetermined time period and that it is time to change the hand gesture to a second predefined hand gesture. After outputting the predetermined sound E, flow proceeds to A103.
In A103, the control apparatus 103 determines whether the second predefined hand gesture is detected within a second predetermined time period (e.g. 3 seconds) after outputting the predetermined sound E. In the present exemplary embodiment, the control apparatus 103 detects a closed-hand gesture (see
In A104, the control apparatus 103 determines whether the second predefined hand gesture is being detected for a third predetermined time period (e.g. 2 seconds). If the control apparatus 103 determines that it continuously detects the second predefined hand gesture for the third predetermined time period, flow proceeds to A105. When the control apparatus 103 determines “No” in any of A101, A103, and A104, flow proceeds to A111 and the control apparatus 103 notifies the user of an error during the ROI designation process.
In A105, the control apparatus 103 performs control for outputting a predetermined sound F for notifying the user that the second predefined hand gesture has been detected for the third predetermined time period and that the ROI designation process has been successfully completed. After outputting the predetermined sound F, flow proceeds to A106.
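The two-gesture sequence of A101-A105 (hold open hand, sound E, switch to closed hand within a window, hold, sound F) amounts to a small timed state machine. The sketch below assumes a stream of per-frame (timestamp, gesture) samples; the gesture labels and event format are assumptions for illustration, not the disclosed detection pipeline.

```python
OPEN_HOLD = 2.0     # first predetermined time period (A101)
CLOSE_WINDOW = 3.0  # second predetermined time period (A103)
CLOSE_HOLD = 2.0    # third predetermined time period (A104)

def roi_designation(events):
    """events: iterable of (timestamp_s, gesture) with gesture in
    {"open", "closed", None}. Returns "ok" (A105) or "error" (A111)."""
    open_start = None
    sound_e_time = None   # set when sound E would be output (A102)
    close_start = None
    for t, g in events:
        if sound_e_time is None:
            # A101: open hand must be held continuously for OPEN_HOLD
            if g == "open":
                open_start = t if open_start is None else open_start
                if t - open_start >= OPEN_HOLD:
                    sound_e_time = t  # A102: output sound E here
            else:
                open_start = None     # hold interrupted, restart
        else:
            # A103/A104: closed hand within CLOSE_WINDOW, held CLOSE_HOLD
            if g == "closed":
                close_start = t if close_start is None else close_start
                if t - close_start >= CLOSE_HOLD:
                    return "ok"       # A105: output sound F here
            else:
                if close_start is None and t - sound_e_time > CLOSE_WINDOW:
                    return "error"    # A103 window expired -> A111
                close_start = None
    return "error"
```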
In A106, the control apparatus 103 adds a new media stream according to the ROI designation. More specifically, the control apparatus 103 adds a new media stream 119 to periodically transmit the ROI images cropped from a video frame of the meeting room 101 to the client computers 106 and 107 via the second server 105. If the number of ROIs already designated by the user is larger than a threshold number, the control apparatus 103 may update the oldest ROI position with the new ROI position instead of adding the new media stream.
In A107, the control apparatus 103 suspends transmission of the video streams 113-116 and transmits image data that includes a message indicating that ROI capturing is in progress.
In A108, the control apparatus 103 controls an optical zoom magnification of the camera 102 according to the ROI position. In an exemplary embodiment, the center of the ROI is identical to the center of the second hand gesture detected in A104, and the dimension of the ROI is 20% of the field of view of the camera 102. For example, if an original captured image is 1280 [pixel]*960 [pixel], the ROI is a 256 [pixel]*192 [pixel] range within the captured image. In A108, the camera 102 performs zoom-in process into the ROI to improve a resolution of the ROI.
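The ROI geometry described for A108 (centered on the detected hand gesture, spanning 20% of the field of view) can be made concrete with a little pixel arithmetic. The frame size and the 20% factor come from the text; the clamping of the rectangle to the frame boundary is an added assumption.

```python
def roi_rect(center_x, center_y, frame_w=1280, frame_h=960, fraction=0.20):
    """Return (left, top, width, height) of the ROI in pixels, centred on
    the detected gesture and clamped to stay inside the frame."""
    w = int(frame_w * fraction)  # 1280 * 0.20 = 256
    h = int(frame_h * fraction)  # 960 * 0.20 = 192
    left = min(max(center_x - w // 2, 0), frame_w - w)
    top = min(max(center_y - h // 2, 0), frame_h - h)
    return left, top, w, h
```

With the text's 1280x960 example, a gesture at the frame centre yields the stated 256x192 range.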
In A109, the control apparatus 103 causes the camera 102 to perform a capturing process to obtain an HR (High Resolution) still image of the ROI. When the control apparatus 103 obtains the HR still image of the ROI from the camera 102, the control apparatus 103 transmits a URL to the client computers 106 and 107 via the second server 105. The client computers 106 and 107 are able to obtain the HR still image of the ROI via the second server 105 by accessing the URL. Also, the control apparatus 103 periodically crops the ROI from a video frame of the meeting room 101 and provides the ROI image to the client computers 106 and 107 via the second server 105 unless the ROI detected in A104 is deleted by the user.
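The URL-based delivery in A109 (publish the HR still, send only a URL, let clients fetch it) can be sketched with an in-memory stand-in for the second server 105. The class, host name, and method names are purely hypothetical.

```python
class SecondServer:
    """Toy stand-in for the second server 105: stores published images
    and serves them back when a client accesses the URL."""
    HOST = "https://second-server.example"  # hypothetical host

    def __init__(self):
        self.store = {}

    def publish(self, path, image_bytes):
        # Control apparatus side: store the HR still, return its URL.
        self.store[path] = image_bytes
        return self.HOST + path

    def fetch(self, url):
        # Client side: retrieve the HR still by accessing the URL.
        return self.store[url.removeprefix(self.HOST)]
```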
In A110, the control apparatus 103 returns the optical zoom magnification of the camera 102 to its original value. In other words, in A110, the optical zoom parameters of the camera 102 return to the parameters used before the optical zoom control in A108. After this returning process, the control apparatus 103 resumes transmitting the video streams to the client computers 106 and 107, and flow proceeds to S109 in
After the ROI designation in S108, the control apparatus 103 periodically crops the ROI within a video frame from the camera 102 and the ROI is provided to the client computers 106 and 107 via the second server 105.
The processing performed in A106-A109 is further described in
The processing performed in A106-A109 advantageously provides a combination view that uses both a digital zoom and capture of the ROI for live-view imaging and a static optical zoom and capture for a high-quality view of a particular area within the live-view frame captured using the digital zoom of the image capture device.
In order to take the high-quality static image, the system will take over the room camera, pan/zoom to the region of interest, which is identified in the manner described throughout the present disclosure, and capture a high-quality image of the identified ROI. Upon completing the capture, the camera will be controlled to return to the room view position as defined immediately preceding the static image capture. In doing so, a reposition time value representing the length of time required to reposition the camera (e.g. X seconds) is determined, and a buffering process that buffers the live video is started. The output frame rate of the live video is reduced to a predetermined frame rate less than the present frame rate. In one embodiment, the predetermined frame rate is substantially 50% of the normal frame rate. At the expiration of the reposition time value (e.g. when X seconds have elapsed), the control apparatus will send a control signal for controlling the PTZ camera to reposition such that the PTZ camera can optically zoom in and capture a still image of the predetermined region in the wide-angle view of the room as identified by the detected gesture, and take the high-quality image. In one embodiment, the high-quality image is captured at a maximum resolution of the image capture device. For example, the image may be captured at 4K resolution. The control apparatus will continue sending the buffered video at the predetermined frame rate to the remote computing devices while the repositioning of the camera is occurring. When the reposition and reset are complete, normal frame rate video will resume.
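The buffering behaviour during repositioning (buffer live frames, emit at roughly half the normal frame rate, drain after the camera returns) can be sketched as a simple queue. The frame representation, clocking, and class name are illustrative assumptions.

```python
from collections import deque

class RepositionBuffer:
    """Toy model of the reduced-rate buffered output used while the PTZ
    camera is away capturing the HR still."""
    def __init__(self, normal_fps=30, reduced_ratio=0.5):
        self.queue = deque()
        self.normal_fps = normal_fps
        self.out_fps = normal_fps * reduced_ratio  # "substantially 50%"

    def push(self, frame):
        # Live frames keep arriving at the normal rate and are buffered.
        self.queue.append(frame)

    def frames_to_emit(self, elapsed_s):
        """Pop the frames to send over an elapsed interval at the reduced
        output rate; the backlog that builds up drains after repositioning."""
        n = int(self.out_fps * elapsed_s)
        return [self.queue.popleft() for _ in range(min(n, len(self.queue)))]
```

Because frames enter at the normal rate but leave at the reduced rate, the queue grows during the reposition window, which is exactly what lets the live view continue uninterrupted.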
The algorithm for generating these dual views using the single camera is shown in
In an instance in which a user wants not only to present the live-view ROI to the remote user but also to obtain a higher-quality view of the particular ROI, the control apparatus 103 can control the camera 102 to capture a still image having an image quality higher than the image quality being captured via the live-view capture. In one embodiment, the gesture indicating that an ROI within a frame should be captured can initiate the still image capture process that follows. In another embodiment, a further gesture may be required to initiate the still image capture of the ROI at the higher image quality and, upon detection thereof in accordance with the manner described herein, still image capture can be initiated.
In response to control by the control apparatus 103 to capture a still image, the control apparatus 103 determines, in 2105, one or more camera control parameters that will be used to physically control the position of the camera so as to capture a still image of the identified ROI. In one embodiment, the one or more camera parameters include a pan direction, a pan speed, a tilt direction, a tilt speed, and an optical zoom amount required to capture a still image of the area within the ROI. The one or more camera parameters are obtained based on the pixel-by-pixel dimensions of the ROI that corresponds to the region surrounding a region within the frame that is identified by the detected gesture. The one or more camera parameters further include a reposition time value that represents an amount of time it will take the camera to move into the correct position and capture the particular ROI as determined by the detected gesture. In one embodiment, the reposition time value can be determined by calculating an X and Y reposition distance, which is possible because the position of the camera is known, as is the target position representing the ROI. This value is then multiplied by a constant factor representing the relocation time per unit of distance (i.e., 1 ms per pixel). The result is the reposition time value that represents the estimated time it would take the camera to reposition to the new location to capture the still image of the ROI.
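The reposition-time estimate of 2105 reduces to a distance-times-constant calculation. The text gives the per-pixel factor (1 ms per pixel) but not how the X and Y distances combine; the sketch below assumes simultaneous pan and tilt, so the larger axis dominates, and this combination rule is an assumption.

```python
def reposition_time_ms(cam_pos, roi_center, ms_per_pixel=1.0):
    """Estimate the reposition time from the X/Y distances between the
    current camera aim point and the ROI centre (both in pixels).
    Assumes pan and tilt move simultaneously, so the slower (larger)
    axis sets the total time."""
    dx = abs(roi_center[0] - cam_pos[0])
    dy = abs(roi_center[1] - cam_pos[1])
    return max(dx, dy) * ms_per_pixel
```

A drive that pans and tilts sequentially would instead use `(dx + dy) * ms_per_pixel`; either way the result feeds the buffering schedule described above.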
The control apparatus 103, after calculating the one or more camera parameters, causes the live video data of the ROI being captured during the live-view ROI capture processing to be sent to one or more video buffers in 2106. The control apparatus 103 causes the live-view ROI video data in the buffer to be output, in 2107, at a frame rate less than the current frame rate at which the live-view ROI is being captured. At a point in time substantially equal to half the reposition time value in 2108, the determined one or more camera parameters are provided, in 2109, by the control apparatus 103 to the camera 102, which causes the camera 102 to be controlled to reposition in 2110 based on the one or more camera parameters, and an image capture command is communicated in 2111 that causes the camera 102 to capture a still image having a higher image quality than the live-view ROI video image that is being output by the buffer at the lower frame rate. The control apparatus receives, in 2112, the captured still image having a higher image quality than an individual frame of the live-view ROI image and communicates this captured image to the remote computing devices. In one embodiment, the captured still image is transmitted in 2210 to the remote computing devices via a communication channel different from the live-view ROI video stream. In another embodiment, this still image is stored in a memory that is specific to a particular user or organization that controls the online meeting. After the high-resolution still image capture described above, the video capture rate is caused to return to the rate being captured prior to 2107 such that the live view of the meeting room can be captured and provided as described herein.
In a further embodiment, shown in
Turning back to
In B101, the control apparatus 103 determines whether a whiteboard region is detected. The control apparatus 103 may detect whiteboard regions based on image recognition process and/or based on user operations. A user is able to designate four corners of a whiteboard to designate a whiteboard region. If the control apparatus 103 determines that the whiteboard region is not detected, flow proceeds to S110. If the control apparatus 103 determines that the whiteboard region is detected, flow proceeds to B102.
In B102, the control apparatus 103 highlights the whiteboard region so that a user of the control apparatus 103 is able to see which region is designated as the whiteboard region.
In B103, the control apparatus 103 adds a new video stream according to the whiteboard detection. More specifically, the control apparatus 103 adds a new video stream to periodically send the whiteboard images cropped from a video frame of the meeting room 101 to the client computers 106 and 107 via the second server 105. If the number of whiteboards already detected is larger than a threshold number, the control apparatus 103 may update the oldest whiteboard position with the new whiteboard position instead of adding the new stream.
In B104, the control apparatus 103 performs keystone correction on a video frame of the video of the meeting room 101 and crops the whiteboard region from the keystone-corrected video frame to obtain the still image of the whiteboard; the cropped whiteboard region is then transmitted to the client computers 106 and 107. As illustrated in
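The B104 rectification step maps the four designated whiteboard corners onto an upright rectangle. Production systems typically use a true projective transform (e.g. OpenCV's `getPerspectiveTransform`/`warpPerspective`); the pure-Python sketch below substitutes an inverse bilinear warp as a simplified approximation so the corner-to-rectangle mapping is visible without any dependencies.

```python
def warp_quad(image, corners, out_w, out_h):
    """image: 2D list of pixel values; corners: [tl, tr, br, bl] as (x, y).
    Maps the quadrilateral onto an out_w x out_h rectangle by bilinearly
    blending the corner coordinates and sampling nearest-neighbour.
    Note: a bilinear warp only approximates a projective (keystone) map."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = corners
    out = []
    for v in range(out_h):
        t = v / max(out_h - 1, 1)
        row = []
        for u in range(out_w):
            s = u / max(out_w - 1, 1)
            # blend the four corners: tl, tr, br, bl
            x = (1-s)*(1-t)*x0 + s*(1-t)*x1 + s*t*x2 + (1-s)*t*x3
            y = (1-s)*(1-t)*y0 + s*(1-t)*y1 + s*t*y2 + (1-s)*t*y3
            row.append(image[round(y)][round(x)])
        out.append(row)
    return out
```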
The presenter processing of S110 of
In C101, the control apparatus 103 determines whether the presenter name has been switched since the previous determination. If the control apparatus 103 determines that the presenter name has not been changed, flow proceeds to each of S104-S106. If the control apparatus 103 determines that the presenter name has been changed, flow proceeds to C102. In an exemplary embodiment, the presenter name is able to be switched based on user operations on the control apparatus 103.
In C102, the control apparatus 103 identifies a username of a current presenter from the name and position information 120 and changes the type 405 of the identified username from “Presenter” to “Attendee”. In an exemplary embodiment as illustrated in
In C103, the control apparatus 103 searches for a username of the new presenter from the name and position information as illustrated in
In C104, the control apparatus 103 changes the Type 405 of the new presenter Dorothy Moore from “Attendee” to “Presenter”.
In C105, the control apparatus 103 crops the face region of the new presenter from each video frame of the video of the meeting room 101 and transmits the cropped video to the client computers 106 and 107 via the first server 104. Until the presenter is switched, the control apparatus 103 continuously crops the face region of Dorothy Moore from a video frame of the meeting room 101 and provides the cropped video to the client computers 106 and 107 via the first server 104. After C105, flow proceeds to each of S104-S106.
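The bookkeeping of C102-C104 (demote the current “Presenter” entry to “Attendee”, then promote the named new presenter) might look like the following. The record layout mirrors the name and position information 120, but the field names and function are illustrative assumptions.

```python
def switch_presenter(records, new_presenter_name):
    """records: list of dicts with "name" and "type" keys, mutated in place.
    Implements the C102 demotion followed by the C103/C104 promotion."""
    for rec in records:
        if rec["type"] == "Presenter":
            rec["type"] = "Attendee"      # C102: old presenter becomes Attendee
    for rec in records:
        if rec["name"] == new_presenter_name:
            rec["type"] = "Presenter"     # C103/C104: promote the new presenter
    return records
```

After this update, the cropping loop of C105 simply follows whichever record carries the “Presenter” type.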
As another exemplary embodiment of C105, the type 405 may not include the type “Presenter”, and the control apparatus 103 and the client computers 106 and 107 may identify the presenter by referring to presenter information 801 that is stored separately from the name and position information 120.
As described above, the control apparatus 103 may transmit a video of the meeting room 101 and videos of the face regions via the first server 104, and may transmit the images of the whiteboards, the images of the ROI, and the name and position information via the second server 105. However, this is not seen to be limiting. In another exemplary embodiment, the control apparatus 103 may transmit the video of the meeting room via the first server 104 and may transmit the videos of the face regions, the images of the whiteboards, the images of the ROI, and the name and position information via the second server 105. As another example, the control apparatus 103 may transmit the video of the meeting room, the videos of the face regions, the images of the whiteboards, and the images of the ROI cropped from the video frames of the meeting room 101 via the first server 104, and may transmit the HR image of the ROI and the name and position information via the second server 105.
The scope of the present disclosure includes a non-transitory computer-readable medium storing instructions that, when executed by one or more processors, cause the one or more processors to perform one or more embodiments of the invention described herein. Examples of a computer-readable medium include a hard disk, a floppy disk, a magneto-optical disk (MO), a compact-disk read-only memory (CD-ROM), a compact disk recordable (CD-R), a CD-Rewritable (CD-RW), a digital versatile disk ROM (DVD-ROM), a DVD-RAM, a DVD-RW, a DVD+RW, magnetic tape, a nonvolatile memory card, and a ROM. Computer-executable instructions can also be supplied to the computer-readable storage medium by being downloaded via a network.
The use of the terms “a” and “an” and “the” and similar referents in the context of this disclosure describing one or more aspects of the invention (especially in the context of the following claims) are to be construed to cover both the singular and the plural, unless otherwise indicated herein or clearly contradicted by context. The terms “comprising,” “having,” “including,” and “containing” are to be construed as open-ended terms (i.e., meaning “including, but not limited to,”) unless otherwise noted. Recitation of ranges of values herein are merely intended to serve as a shorthand method of referring individually to each separate value falling within the range, unless otherwise indicated herein, and each separate value is incorporated into the specification as if it were individually recited herein. All methods described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The use of any and all examples, or exemplary language (e.g., “such as”) provided herein, is intended merely to better illuminate the subject matter disclosed herein and does not pose a limitation on the scope of any invention derived from the disclosure unless otherwise claimed. No language in the specification should be construed as indicating any non-claimed element as essential.
It will be appreciated that the instant disclosure can be incorporated in the form of a variety of embodiments, only a few of which are disclosed herein. Variations of those embodiments may become apparent to those of ordinary skill in the art upon reading the foregoing description. Accordingly, this disclosure and any invention derived therefrom includes all modifications and equivalents of the subject matter recited in the claims appended hereto as permitted by applicable law. Moreover, any combination of the above-described elements in all possible variations thereof is encompassed by the disclosure unless otherwise indicated herein or otherwise clearly contradicted by context.
This application claims priority from U.S. Provisional Patent Application Ser. No. 63/292,271 filed on Dec. 21, 2021, the entirety of which is incorporated herein by reference.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/US2022/081860 | 12/16/2022 | WO |
Number | Date | Country
---|---|---
63292271 | Dec 2021 | US