This disclosure relates generally to videoconferencing and relates particularly to detection of whiteboards and individuals in one or more captured audio-visual streams.
Currently, whiteboards are treated primarily as content sources, so that the whiteboard is provided as a content stream. A presenter is seen in a video stream, even if the presenter moves around. In some cases, a camera is dedicated to a whiteboard, but then a user must switch the video source being provided to the far end between the whiteboard and the presenter. If the presenter is standing near or in front of the whiteboard, any framing with the whiteboard can become confusing. For example, the whiteboard is provided in the content stream and displayed on a content monitor, but the whiteboard is also present in the presenter video stream and on the main monitor.
For illustration, there are shown in the drawings certain examples described in the present disclosure. In the drawings, like numerals indicate like elements throughout. The full scope of the inventions disclosed herein is not limited to the precise arrangements, dimensions, and instruments shown. In the drawings:
Far end viewer comprehension is improved in examples according to the present disclosure. A near end videoconferencing endpoint determines if there is a whiteboard and if a presenter is near the whiteboard. If there is no whiteboard in view or the presenter is not near the whiteboard, any content from a camera focused on the whiteboard is continued and any presenter framing is done normally. If the presenter is in front of the whiteboard, any whiteboard content is ended, and appropriate portions of the whiteboard are included in the main video stream framed with the presenter. If the whiteboard is empty, framing is done without reference to the whiteboard. If the whiteboard is full or has writing away from the presenter, the entire whiteboard and the presenter are framed together. If the whiteboard only has writing near the presenter, only the relevant portion of the whiteboard is framed with the presenter. By including the whiteboard in the framing with the presenter and turning off any whiteboard content stream when the presenter is near the whiteboard, the far end viewer does not see the whiteboard in two different streams.
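As one illustrative sketch of the decision flow summarized above, the following Python fragment maps the described conditions to a framing choice and a content-stream choice. The names `Scene`, `Decision`, `Framing` and `decide` are hypothetical and are not taken from the disclosure, and the detection results are assumed to be supplied by earlier processing.

```python
from dataclasses import dataclass
from enum import Enum


class Framing(Enum):
    # Hypothetical labels for the three framing outcomes described above.
    NORMAL = "presenter framed without reference to the whiteboard"
    FULL_BOARD = "presenter framed with the entire whiteboard"
    PARTIAL_BOARD = "presenter framed with the adjacent written portion"


@dataclass
class Scene:
    # Assumed inputs; how each flag is detected is outside this sketch.
    whiteboard_in_view: bool
    presenter_near_whiteboard: bool
    whiteboard_empty: bool = True
    writing_full_or_away_from_presenter: bool = False


@dataclass
class Decision:
    framing: Framing
    keep_whiteboard_content: bool  # True: any existing content stream continues


def decide(scene: Scene) -> Decision:
    # No whiteboard, or presenter not near it: content continues, normal framing.
    if not scene.whiteboard_in_view or not scene.presenter_near_whiteboard:
        return Decision(Framing.NORMAL, keep_whiteboard_content=True)
    # Presenter is at the whiteboard: whiteboard content is ended in every case.
    if scene.whiteboard_empty:
        return Decision(Framing.NORMAL, keep_whiteboard_content=False)
    if scene.writing_full_or_away_from_presenter:
        return Decision(Framing.FULL_BOARD, keep_whiteboard_content=False)
    return Decision(Framing.PARTIAL_BOARD, keep_whiteboard_content=False)
```

For example, `decide(Scene(True, True, whiteboard_empty=False))` selects the partial-board framing and ends the whiteboard content stream.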
Referring now to
In
In
By including the presence of the whiteboard 16 and any writing on the whiteboard 16 in the decisions for framing the presenter P, and appropriately controlling the transmission of the whiteboard as content, viewer confusion is reduced.
Referring now to
In step 306, the audio streams from the microphone arrays are used for sound source localization (SSL), with the SSL results then used in combination with the video streams to find talkers. In the case of a presenter in front of a whiteboard, there is generally only a single talker to be framed.
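One way the SSL results might be combined with the video streams can be sketched as follows. This is an assumption-laden illustration, not the method of the disclosure: it assumes the SSL result is an azimuth in degrees relative to the camera's optical axis, that the camera follows a simple pinhole model with a known horizontal field of view, and that face bounding boxes in pixels are already available. The names `Box`, `azimuth_to_pixel_x` and `find_talker`, and the default values, are hypothetical.

```python
import math
from typing import NamedTuple, Optional, Sequence


class Box(NamedTuple):
    left: float
    top: float
    right: float
    bottom: float


def azimuth_to_pixel_x(azimuth_deg: float, frame_width: float, hfov_deg: float) -> float:
    """Map an azimuth (0 = optical axis, positive = right) to a pixel column.

    Assumes an ideal pinhole camera with horizontal field of view hfov_deg.
    """
    half_fov = math.radians(hfov_deg) / 2.0
    offset = math.tan(math.radians(azimuth_deg)) / math.tan(half_fov)
    return frame_width / 2.0 * (1.0 + offset)


def find_talker(azimuth_deg: float, faces: Sequence[Box], frame_width: float,
                hfov_deg: float = 90.0, max_offset_px: float = 150.0) -> Optional[int]:
    """Return the index of the face closest to the sound direction, or None.

    max_offset_px is an assumed tolerance; a face farther than this from the
    projected sound direction is not treated as the talker.
    """
    target_x = azimuth_to_pixel_x(azimuth_deg, frame_width, hfov_deg)
    best_index, best_offset = None, max_offset_px
    for index, face in enumerate(faces):
        offset = abs((face.left + face.right) / 2.0 - target_x)
        if offset <= best_offset:
            best_index, best_offset = index, offset
    return best_index
```

With a single presenter at the whiteboard, the function would be expected to return that presenter's face whenever the presenter is speaking and to return `None` when the sound direction matches no detected face.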
After the talkers are found in step 306, in step 308 the parties are framed as desired. Framing is usually based on the locations and numbers of talkers or participants to be framed. Examples according to the present disclosure add the location of a whiteboard into the framing considerations. Details of the framing according to examples of the present disclosure are provided in
If the presenter is near the whiteboard in step 404, in step 410 transmission of the whiteboard as content is discontinued. In step 412, it is determined if the whiteboard is empty. If so, the whiteboard need not be considered in framing determinations and operation proceeds to step 408, for framing as illustrated in
If the whiteboard is not empty, in step 414 it is determined if the whiteboard is substantially full of writing or if the writing is only in portions not adjacent to the presenter. If the whiteboard is full or the writing is not adjacent to the presenter, in step 416 framing is based on the presenter and the entire whiteboard, as in
In a simplified example, there is no evaluation of the amount or location of any writing on the whiteboard and the presenter is simply framed with the entire whiteboard when the presenter is near the whiteboard, so that the framing is as shown in
If the presenter is pacing, so that the whiteboard comes into and out of a framing view of the presenter, a situation might arise where the whiteboard content stream is rapidly and repeatedly turned on and off. This would be distracting to the viewer at the far end, so in some examples time delays are included after the determination of step 404 as shown in
If the talker is near the whiteboard in step 404, in step 452 it is determined if the talker has been near the whiteboard for a desired period, such as five seconds. If so, operation proceeds to step 410 and the provision of the whiteboard as content is discontinued. If the desired period has not elapsed, operation proceeds to step 406, where the whiteboard continues to be provided as content.
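The time delay can be sketched as a small hold-off gate. The five second period comes from the example above. Applying the same period when the talker moves away from the whiteboard is an assumption of this sketch, and the class name `ContentGate` and its interface are hypothetical.

```python
class ContentGate:
    """Debounce the whiteboard content stream against a pacing talker."""

    def __init__(self, hold_seconds: float = 5.0) -> None:
        self.hold_seconds = hold_seconds
        self.sending_content = True   # whiteboard initially provided as content
        self._pending_since = None    # time when a change was first requested

    def update(self, talker_near_whiteboard: bool, now: float) -> bool:
        """Feed one near/not-near determination; return whether content is sent."""
        desired = not talker_near_whiteboard
        if desired == self.sending_content:
            # Observation agrees with the current state: cancel any pending change.
            self._pending_since = None
        else:
            if self._pending_since is None:
                self._pending_since = now
            # Switch only after the new condition has persisted long enough.
            if now - self._pending_since >= self.hold_seconds:
                self.sending_content = desired
                self._pending_since = None
        return self.sending_content
```

Calling `update` once per analyzed frame with a monotonic timestamp keeps the content stream unchanged while the talker only briefly crosses in front of the whiteboard.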
Operation is similar even if the whiteboard is never provided as content, such as when there is no camera aimed at the whiteboard to operate as a content camera. This operation is shown in
While whiteboards have been discussed above, it is understood that other objects are similar to whiteboards, so the term whiteboard as used herein is not limited to just dry erase whiteboards per se but includes similar items, such as smart or interactive whiteboards, flip charts, extra-large sticky notes, bulletin boards with paper on them, boards (including Kanban boards and scrum boards), clusters of sticky notes, a wall with a projected image from an interactive projector, etc., all of which are broadly considered as interactive group presentation devices.
While writing on the whiteboard has been discussed above, it is understood that writing is used broadly, so that other information besides the illustrated textual information, such as graphical information, pre-printed materials, etc., placed on or displayed by the whiteboard is classified as writing, all of which is broadly considered as information.
In the examples of this disclosure, a content camera 511 has been described as capturing the whiteboard to be provided as content. If the whiteboard is a smart or interactive whiteboard, the whiteboard itself may be providing the content image. If the whiteboard is an image projected by an interactive projector, the projector may be providing the content image. The transmission of the content image in either case would be controlled as described in
While the use of neural networks has been described to determine the presence of a whiteboard and the amount of writing on a whiteboard, it is understood that more conventional computer vision techniques can also be used.
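As a minimal sketch of one such conventional technique, the amount and location of writing could be estimated by counting dark pixels in vertical bands of a grayscale image of the whiteboard region. The function names, the number of bands and both thresholds are assumptions chosen for illustration and do not come from the disclosure.

```python
from typing import List, Sequence


def ink_fraction_by_band(gray: Sequence[Sequence[int]], bands: int = 4,
                         ink_threshold: int = 100) -> List[float]:
    """Return the fraction of dark ("ink") pixels in each vertical band.

    gray is a rectangular grid of 0-255 grayscale values covering only the
    whiteboard region; its width is assumed to be at least `bands` pixels.
    """
    height, width = len(gray), len(gray[0])
    fractions = []
    for band in range(bands):
        x0 = band * width // bands
        x1 = (band + 1) * width // bands
        ink = sum(1 for row in gray for value in row[x0:x1] if value < ink_threshold)
        fractions.append(ink / (height * (x1 - x0)))
    return fractions


def classify_board(fractions: Sequence[float], min_ink: float = 0.01) -> str:
    """Classify the whiteboard as 'empty', 'full' or 'partial' from band fractions."""
    written = [fraction >= min_ink for fraction in fractions]
    if not any(written):
        return "empty"
    if all(written):
        return "full"
    return "partial"
```

The per-band fractions also indicate where the writing is located, which could then be compared with the presenter's position to decide whether the writing is adjacent to the presenter.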
In examples according to the present disclosure, the camera with the best view of the presenter P and whiteboard 16 is used for the framing operations and then transmitted to the far end. For example, in
While this disclosure has focused on the use of a whiteboard in a conference room, it is understood that the whiteboard and presenter may be in many different settings, including a classroom, an auditorium, a lecture hall, a theater and so on.
Additionally, while the whiteboard 16 has been shown mounted on a wall, the whiteboard may also be freestanding or a portion of another object.
By including the whiteboard in presenter or talker framing decisions when the presenter is near or in front of the whiteboard, the experience of viewers at the far end is improved because confusion with provision of the whiteboard as content is reduced. This is particularly so if the provision of whiteboard content is coordinated with the presenter framing decisions so that the whiteboard is not presented in both the normal video stream and the content stream at the same time.
The codec 500 is connected to a corporate or other local area network (LAN) 514. The corporate LAN 514 is connected to a firewall 516 and then the Internet 518 in a common configuration to allow communication with a remote endpoint 634 at a far end.
Details of the codec 500 are shown in
Cameras 510, 512 and content camera 511 are connected to the camera inputs module 618. The monitor and speaker 506 is connected to the HDMI output module 616. External DRAM 612 and a Wi-Fi/Bluetooth module 620 are connected to the SoC 600 to provide the needed bulk operating memory (RAM associated with each CPU and DSP is not shown) and additional I/O capabilities commonly used today. An audio codec 624 is connected to the SoC 600 to provide local analog line level capabilities. An analog microphone 508 is connected to the audio codec 624.
Preferably two network interface chips (NICs) 626, 628, such as Intel I210, are connected to the PCIe interfaces of the SoC 600. In the illustrated embodiment, NIC 626 is for connection to the corporate LAN 514 and then to IP microphones 632, the Internet 518 and remote or far end endpoints 634, while the other NIC 628 is used for local connection of IP-connected devices, such as IP microphones 630.
Flash memory 604 is connected to the SoC 600 to hold the programs that are executed by the CPUs 601 and DSPs 602 to provide the endpoint functionality of the codec 500, including the whiteboard and presenter framing discussed above. Illustrated modules include a video codec 650, camera control 652, face, body and ROI finding 653, neural network models 655, framing 654, other video processing 656, audio codec 658, audio processing 660, sound source localization 661, network operations 666, user interface 668 and operating system and various other modules 670. The RAM 608 and DRAM 612 are used for storing any of the modules in the flash memory 604 when the module is executing and for storing video images of video streams and audio samples of audio streams, and can be used for scratchpad operation of the SoC 600. The neural network models 655 and face, body and ROI finding 653 are used with the framing 654 to perform the whiteboard and presenter detection and framing as described above for
A graphics acceleration module 724 is connected to the high speed interconnect 708. A display subsystem as the HDMI output 616 is connected to the high speed interconnect 708 to allow operation with and connection to various video monitors. A system services block 732, which includes items such as DMA controllers, memory management units, general purpose I/O's, mailboxes, and the like, is provided for normal SoC 700 operation. A serial connectivity module 734 is connected to the high speed interconnect 708 and includes modules as normal in an SoC. A connectivity module 736 provides interconnects for external communication interfaces, such as PCIe block 738, USB block 740 and an Ethernet switch 742. A capture/MIPI module is the camera interface 618 and includes a four lane CSI 2 compliant transmit block 746 and a four lane CSI 2 receive module and hub.
An MCU island 760 is provided as a secondary subsystem and handles operation of the integrated SoC 700 when the other components are powered down to save energy. An MCU ARM processor 762, such as one or more ARM R5F cores, operates as a master and is coupled to the high speed interconnect 708 through an isolation interface 761. An MCU general purpose I/O (GPIO) block 764 operates as a slave. MCU RAM 766 is provided to act as local memory for the MCU ARM processor 762. A CAN bus block 768, an additional external communication interface, is connected to allow operation with a conventional CAN bus environment in a vehicle. An Ethernet MAC (media access control) block 770 is provided for further connectivity. External memory, generally non volatile memory (NVM) such as flash memory 604, is connected to the MCU ARM processor 762 via an external memory interface 769 to store instructions loaded into the various other memories for execution by the various appropriate processors. The MCU ARM processor 762 operates as a safety processor, monitoring operations of the SoC 700 to ensure proper operation of the SoC 700.
It is understood that this is one example of an SoC provided for explanation and many other SoC examples are possible, with varying numbers of processors, DSPs, accelerators and the like.
A system of one or more computers can be configured to perform particular operations or actions by virtue of having software, firmware, hardware, or a combination of them installed on the system that in operation causes or cause the system to perform the actions. One or more computer programs can be configured to perform particular operations or actions by virtue of including instructions that, when executed by data processing apparatus, cause the apparatus to perform the actions. One general aspect includes a method of presenting a talker and a whiteboard to a far end of a videoconference. The method also includes receiving at least one video stream containing both the talker and the whiteboard. The method also includes determining the presence of the talker near the whiteboard. The method also includes when the talker is near the whiteboard, framing the talker and the whiteboard together for provision to the far end. Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods.
Implementations may include one or more of the following features. The method may include determining the presence of writing on the whiteboard, where framing the talker and the whiteboard together is performed only when there is writing on the whiteboard. Determining the presence of writing on the whiteboard may include determining that the writing only partially fills the whiteboard and the writing is adjacent to the talker, where framing the talker and the whiteboard together frames the talker and only the portion of the whiteboard adjacent to the talker containing the writing when the writing only partially fills the whiteboard and the writing is adjacent to the talker. Determining the presence of writing on the whiteboard may include determining that the writing fills the whiteboard, where framing the talker and the whiteboard together frames the talker and the entire whiteboard when the writing is determined to fill the whiteboard. Where the near end environment further contains a camera for providing a view of the whiteboard as content in the videoconference, the method may include discontinuing provision of the whiteboard as content when the talker and the whiteboard are framed together. The method may include continuing provision of the whiteboard as content when the talker is not near the whiteboard. Determining the presence of the talker near the whiteboard may include detecting regions of interest in the at least one video stream and determining if a region of interest is a whiteboard. Implementations of the described techniques may include hardware, a method or process, or computer software on a computer-accessible medium.
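Framing the talker together with the whiteboard, or with only a portion of it, can be illustrated geometrically. The sketch below takes the union of two bounding boxes, adds a margin, grows the result to a target aspect ratio and keeps it inside the camera frame. The names `Box`, `union` and `frame_together`, the 16:9 aspect ratio and the ten percent margin are assumptions for illustration only.

```python
from typing import NamedTuple


class Box(NamedTuple):
    left: float
    top: float
    right: float
    bottom: float


def union(a: Box, b: Box) -> Box:
    """Smallest box containing both a and b."""
    return Box(min(a.left, b.left), min(a.top, b.top),
               max(a.right, b.right), max(a.bottom, b.bottom))


def frame_together(talker: Box, board_part: Box, frame_w: float, frame_h: float,
                   aspect: float = 16 / 9, margin: float = 0.1) -> Box:
    """Return a crop of the given aspect ratio covering the talker and board_part.

    board_part may be the entire whiteboard or only the written portion
    adjacent to the talker.
    """
    u = union(talker, board_part)
    width = (u.right - u.left) * (1 + 2 * margin)
    height = (u.bottom - u.top) * (1 + 2 * margin)
    # Grow the shorter dimension so the crop matches the target aspect ratio.
    if width / height < aspect:
        width = height * aspect
    else:
        height = width / aspect
    # Shrink uniformly if the crop would be larger than the camera frame.
    scale = min(1.0, frame_w / width, frame_h / height)
    width, height = width * scale, height * scale
    # Center on the union, then shift as needed to stay inside the frame.
    center_x = (u.left + u.right) / 2
    center_y = (u.top + u.bottom) / 2
    left = min(max(center_x - width / 2, 0.0), frame_w - width)
    top = min(max(center_y - height / 2, 0.0), frame_h - height)
    return Box(left, top, left + width, top + height)
```

Passing the full whiteboard box corresponds to framing the talker and the entire whiteboard, while passing only the written region adjacent to the talker corresponds to the partial framing.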
The above description is intended to be illustrative, and not restrictive. For example, the above-described embodiments may be used in combination with each other. Many other embodiments will be apparent to those of skill in the art upon reviewing the above description. The scope of the invention should, therefore, be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled. In the appended claims, the terms “including” and “in which” are used as the plain-English equivalents of the respective terms “comprising” and “wherein.”
This application claims priority to U.S. Provisional Application Ser. No. 63/161,133, filed Mar. 15, 2021, the contents of which are incorporated herein in their entirety by reference.
Number | Date | Country
---|---|---
63161133 | Mar 2021 | US