The present disclosure relates to video stream processing, and more particularly to transmission of images of a moving individual.
In many educational settings a presenter, such as a teacher, lecturer or speaker, is speaking to an audience that is local, remote or a combination of the two. Often, the presenter is more comfortable moving around than standing stationary. Because of that, videoconferencing endpoints include a mode for presenter tracking, where the camera pans to follow the presenter. Most videoconferencing endpoints, however, strive to place the presenter in the center of the frame. While this is acceptable if the presenter is not moving, when the presenter is moving it often results in the presenter appearing very close to the edge of the frame in the direction the presenter is moving, due to time lags in the presenter tracking software. This produces a cramped and uncomfortable feeling for the viewer, as it appears that the presenter is about to run into the edge of the frame or walk out of the frame. In some instances, the presenter tracking software is fast enough to keep the presenter near the center of the frame, but this still results in a cramped and uncomfortable feeling for the viewer.
In examples according to the present invention, the size of a rule of thirds frame covering a presenter and the zooming and panning of a videoconference camera are based on the motion of a presenter. The size of the frame, and thus provided walking space in the frame, varies with the speed and the pose of the presenter. The slower the presenter, the smaller and tighter the frame. The faster the presenter, the larger and the looser the frame. If the presenter is pacing, a pacing frame is developed that is centered on the limits of the pacing and is large enough to cover both ends of the pacing. The movement frame and the pacing frame provide a pleasant experience for a viewer, one where the presenter does not appear cramped or walking out of the frame.
Referring now to
In
In step 710, the movement of the presenter ROI is tracked based on comparison with previous or succeeding video frames. In some examples according to the present invention, the tracking uses all video frames, not just the sample video frames. In other examples, a reduced set of video frames is used, but one still much more frequent than the sample video frames. In some examples, movement is detected within 30 ms to 50 ms. The tracking is performed in three dimensions, with the lateral movement of the ROI providing the x and y directions and the change in the width of the ROI providing the z direction of a location vector. In some examples, the tracking of the ROI is performed using a neural network, but other methods can be used.
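The three-dimensional location vector described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the ROI representation, the units and the use of ROI width as a proxy for the z direction follow the text, while the data layout is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ROI:
    x: float      # ROI center x, in pixels
    y: float      # ROI center y, in pixels
    width: float  # ROI width, in pixels


def location_vector(prev: ROI, curr: ROI, dt_ms: float) -> tuple[float, float, float]:
    """Velocity of the presenter ROI in pixels/ms.

    Lateral ROI movement provides the x and y components; the change in
    ROI width approximates movement toward or away from the camera (z),
    since the ROI grows as the presenter approaches.
    """
    vx = (curr.x - prev.x) / dt_ms
    vy = (curr.y - prev.y) / dt_ms
    vz = (curr.width - prev.width) / dt_ms
    return (vx, vy, vz)
```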
In step 712, it is determined if the presenter's torso is moving, that is, whether the presenter as a whole is moving as opposed to just the presenter's arms. Torso movement is determined if the vector value of the ROI change exceeds a minimum movement threshold number of pixels over a selected number of video frames. Having the change in location of the ROI below a stillness threshold for a number of video frames indicates that the presenter is not moving. If the presenter is moving, the movement amount is stored to develop speed or velocity values in three dimensions, in units such as pixels/ms or pixels/sec.
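The two-threshold test of step 712 can be sketched as below. The specific threshold values and the name of the intermediate outcome are assumptions; the logic of a movement threshold over a frame window and a stillness threshold follows the text.

```python
def classify_motion(displacements: list[float],
                    move_threshold_px: float,
                    still_threshold_px: float) -> str:
    """Decide whether the presenter's torso is moving.

    displacements holds the per-frame magnitude of ROI location change,
    in pixels, over the selected window of video frames.
    """
    # Total ROI change over the window exceeding the minimum movement
    # threshold indicates torso movement.
    if sum(displacements) >= move_threshold_px:
        return "moving"
    # Change staying below the stillness threshold for every frame in
    # the window indicates the presenter is not moving.
    if all(d < still_threshold_px for d in displacements):
        return "still"
    # Otherwise the change is minor, e.g. only the arms moved.
    return "indeterminate"
```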
If there is no movement, in step 714 it is determined if pacing mode is in effect. Pacing mode is a state where the framing is determined based on the extent of pacing of the presenter. If pacing mode is in effect, in step 716 it is determined if a pacing mode non-moving wait time has been exceeded. If not, in step 718 pacing framing, as discussed with regard to
If the presenter is moving in step 712, in step 738 the movement speed is determined: low, medium or fast. If the movement speed is low, in step 740 medium framing is set, such as shown in
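The speed bucketing of step 738 can be sketched as follows. Only the low-speed-to-medium-framing pairing of step 740 comes from the text; the numeric thresholds and the names of the larger framings are assumptions, reflecting the rule that the faster the presenter, the looser the frame.

```python
def framing_for_speed(speed_px_s: float,
                      low_max: float = 80.0,
                      medium_max: float = 200.0) -> str:
    """Map presenter speed (pixels/sec) to a framing size.

    Thresholds low_max and medium_max are illustrative assumptions.
    """
    if speed_px_s <= low_max:
        return "medium framing"       # low speed (step 740)
    if speed_px_s <= medium_max:
        return "loose framing"        # assumed for medium speed
    return "extra-loose framing"      # assumed for fast movement
```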
If the pacing boundary has not been exceeded or pacing mode is not in effect, in step 754 it is determined if the speed of the presenter has changed between sample video frames. If the speed has not changed, in step 756 the framing size and the frame movement speed are maintained the same to provide proper walking space and to maintain the presenter in a desirable position according to the rule of thirds. If the speed has changed in step 754, in step 758 a determination is made if the presenter is moving faster. If not, in step 760 the allotted walking space is reduced. After steps 752 and 760, a determination is made in step 762 whether framing the presenter according to the looseness framing value meets the rule of thirds. If so, in step 764 the presenter is framed according to the looseness frame setting and the speed of movement of the frame is adjusted. If not, in step 766 the presenter is framed according to the rule of thirds, with walking space set to the allotted amount, which is dependent on the speed of the presenter. In some examples the allotted walking space is directly proportional to the movement speed. In some examples, the allotted walking space is allocated in discrete amounts, each amount applying to a range of movement speeds. It is understood that changing the framing size, based on either a looseness setting or the rule of thirds, is performed by changing the zoom of the camera, while matching the movement speed of the presenter is performed by panning the camera. This is true both for mechanical pan, tilt and zoom cameras and for electronic pan, tilt and zoom (ePTZ) cameras.
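The two walking-space allocation strategies named above, directly proportional and discrete per speed range, can be sketched in one helper. The proportionality gain, the range limits and the frame-width shares are illustrative assumptions.

```python
def walking_space(speed_px_s: float, frame_width: int,
                  k: float = 0.4, discrete: bool = False) -> int:
    """Pixels of walking space allotted ahead of the presenter.

    discrete=False: walking space directly proportional to movement
    speed (gain k is an assumption), capped at half the frame.
    discrete=True: fixed amounts, each applying to a speed range.
    """
    if discrete:
        for limit, share in ((50, 0.15), (150, 0.25), (float("inf"), 0.35)):
            if speed_px_s <= limit:
                return int(frame_width * share)
    return min(int(k * speed_px_s), frame_width // 2)
```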
If the direction of the presenter had changed in step 746, in step 768 the position of the presenter at the direction change is stored for later reference. In step 770, it is determined if the presenter was previously at this location. If not, in step 772 the direction of the location of the walking space in the frame is reversed to match the direction change of the presenter. Preferably this reversal is done smoothly using easing so that the presenter gradually has more walking space until the desired looseness or rule of thirds is met. In step 774, a determination is made whether framing the presenter according to the looseness framing setting meets the rule of thirds. If so, in step 776 the presenter is framed according to the looseness frame setting and the speed of movement of the frame is adjusted. If not, in step 778 the presenter is framed according to the rule of thirds, with walking space set to the allotted amount, which is dependent on the speed of the presenter.
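The smooth reversal of step 772 can be sketched with an easing curve. The text only requires that the reversal be eased; the smoothstep curve and the sign convention below are assumptions.

```python
def ease_in_out(t: float) -> float:
    """Smoothstep easing, mapping progress t in [0, 1] to [0, 1]."""
    return t * t * (3 - 2 * t)


def eased_offset(start_px: float, target_px: float, t: float) -> float:
    """Walking-space offset during a direction reversal.

    start_px is the offset on the old side of the presenter, target_px
    the desired offset on the new side; t in [0, 1] is the progress of
    the transition across video frames, so the presenter gradually
    gains walking space in the new direction.
    """
    return start_px + (target_px - start_px) * ease_in_out(t)
```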
If the presenter was previously at this location as determined in step 770, in step 780 this location is stored or confirmed as a pacing boundary, either right or left. In step 782, it is determined if both the left and right boundaries are set. If not, operation proceeds to step 772 to reverse the walking space and direction of movement of the frame. If both boundaries have been set in step 782, then in step 784 pacing mode is set and in step 786 the pacing frame is centered on the left and right boundaries of the pacing distance as illustrated in
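The pacing frame of step 786 can be sketched as follows: centered on the midpoint of the stored left and right pacing boundaries and wide enough to cover both ends. The extra margin value is an assumption.

```python
def pacing_frame(left_px: float, right_px: float,
                 margin_px: float = 60.0) -> tuple[float, float]:
    """Center and width of a pacing frame.

    The frame is centered on the centerline between the left and right
    pacing boundaries and is large enough to cover both ends of the
    pacing, plus an illustrative margin on each side.
    """
    center = (left_px + right_px) / 2.0
    width = (right_px - left_px) + 2.0 * margin_px
    return (center, width)
```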
By making the size of the presenter frame relate to the movement speed of the presenter, walking space is maintained in a moving frame when utilizing the appropriate one of the rule of thirds, a looseness setting or a pacing determination, even though the presenter is moving. When not in pacing mode, because the frame moves with the presenter and the walking space provided is related to the speed of the presenter, the viewer is never uncomfortable with the presenter appearing to walk off the edge of the frame. In this manner, the movement of the presenter is used to provide a comfortable viewing frame based on the speed of the presenter. If the presenter is pacing, a pacing frame is provided that maintains the presenter within the pacing frame without changing size, so the presenter comfortably walks inside the pacing frame.
The processing unit 802 can include digital signal processors (DSPs), central processing units (CPUs), graphics processing units (GPUs), dedicated hardware elements, such as neural network accelerators and hardware codecs, and the like in any desired combination.
The flash memory 804 stores modules of varying functionality in the form of software and firmware, generically programs, for controlling the codec 800. Illustrated modules include a video codec 850; camera control 852; face and body finding 853; neural network models 855; framing 854, which performs the operations of
The network interface 808 enables communications between the codec 800 and other devices and can be wired, wireless or a combination. In one example, the network interface 808 is connected or coupled to the Internet 830 to communicate with remote endpoints 840 in a videoconference. In one or more examples, the general interface 810 provides data transmission with local devices such as a keyboard, mouse, printer, projector, display, external loudspeakers, additional cameras, microphone pods, etc.
In one example, the camera 816 and the microphones 814A, 814B, 814C capture video and audio, respectively, in the videoconference environment and produce video and audio streams or signals transmitted through the bus 815 to the processing unit 802. In at least one example of this disclosure, the processing unit 802 processes the video and audio using algorithms in the modules stored in the flash memory 804. Processed audio and video streams can be sent to and received from remote devices coupled to network interface 808 and devices coupled to general interface 810. This is just one example of the configuration of a codec 800.
The processing unit 902 can include digital signal processors (DSPs), central processing units (CPUs), graphics processing units (GPUs), dedicated hardware elements, such as neural network accelerators and hardware codecs, and the like in any desired combination.
The flash memory 904 stores modules of varying functionality in the form of software and firmware, generically programs, for controlling the camera 900. Illustrated modules include camera control 952, sound source localization 960 and operating system and various other modules 970. The RAM 905 is used for storing any of the modules in the flash memory 904 when the module is executing, storing video images of video streams and audio samples of audio streams and can be used for scratchpad operation of the processing unit 902.
Other configurations, with differing components and arrangement of components, are well known for both videoconferencing endpoints and for devices used in other manners.
A graphics acceleration module 1024 is connected to the high-speed interconnect 1008. A display subsystem 1026 is connected to the high-speed interconnect 1008 to allow operation with and connection to various video monitors. A system services block 1032, which includes items such as DMA controllers, memory management units, general purpose I/O's, mailboxes and the like, is provided for normal SoC 1000 operation. A serial connectivity module 1034 is connected to the high-speed interconnect 1008 and includes modules as normal in an SoC. A vehicle connectivity module 1036 provides interconnects for external communication interfaces, such as PCIe block 1038, USB block 1040 and an Ethernet switch 1042. A capture/MIPI module 1044 includes a four lane CSI-2 compliant transmit block 1046 and a four lane CSI-2 receive module and hub.
An MCU island 1060 is provided as a secondary subsystem and handles operation of the integrated SoC 1000 when the other components are powered down to save energy. An MCU ARM processor 1062, such as one or more ARM R5F cores, operates as a master and is coupled to the high-speed interconnect 1008 through an isolation interface 1061. An MCU general purpose I/O (GPIO) block 1064 operates as a slave. MCU RAM 1066 is provided to act as local memory for the MCU ARM processor 1062. A CAN bus block 1068, an additional external communication interface, is connected to allow operation with a conventional CAN bus environment in a vehicle. An Ethernet MAC (media access control) block 1070 is provided for further connectivity. External memory, generally nonvolatile memory (NVM), such as flash memory 104, is connected to the MCU ARM processor 1062 via an external memory interface 1069 to store instructions loaded into the various other memories for execution by the various appropriate processors. The MCU ARM processor 1062 operates as a safety processor, monitoring operations of the SoC 1000 to ensure proper operation of the SoC 1000.
It is understood that this is one example of an SoC provided for explanation and many other SoC examples are possible, with varying numbers of processors, DSPs, accelerators and the like.
While the description above has focused on framing a presenter, such as a teacher or lecturer, in a videoconference, it is understood that the presenter can be any object that is moving and which is desired to be automatically tracked without operator intervention or control, such as an athlete or moving vehicle, in which case the performance or operation of the individual or object is considered the presentation and can apply to streaming, recording and broadcasting.
Some examples according to the present invention include a method of framing a presenter in a videoconference, by detecting a presenter in a video frame, tracking movement of the presenter, and developing a frame that has a size based on the movement of the presenter and that provides walking space for the presenter based on the movement of the presenter. In some examples, the frame size and walking space are proportional to the speed of the presenter. In some examples, the presenter is framed according to the rule of thirds, while in other examples the presenter is framed based on a looseness framing setting. In some examples, the presenter is pacing. In some examples, the pacing has a left end and a right end, and the frame encompasses the left end and the right end. In some examples, the frame is centered on the centerline between the left end and the right end.
Some examples according to the present invention include a videoconference endpoint having a processor, a network interface coupled to the processor for connection to a far end videoconference endpoint, a camera interface coupled to the processor for receiving at least one video stream of captured images containing a presenter, a video output interface coupled to the processor for providing a video stream to a display for presentation, and memory coupled to the processor for storing instructions executed by the processor to perform the operations of detecting the presenter in a video frame, tracking movement of the presenter, and developing a frame that has a size based on the movement of the presenter and that provides walking space for the presenter based on the movement of the presenter. In some examples of the videoconference endpoint, the frame size and walking space are proportional to the speed of the presenter. In some examples of the videoconference endpoint, the presenter is framed according to the rule of thirds, while in other examples the presenter is framed based on a looseness framing setting. In some examples of the videoconference endpoint, the movement of the presenter is pacing. In some examples of the videoconference endpoint, the pacing has a left end and a right end, and the frame encompasses the left end and the right end. In some examples of the videoconference endpoint, the frame is centered on the centerline between the left end and the right end.
Some examples according to the present invention include a non-transitory program storage device or devices, readable by one or more processors in a videoconference endpoint and comprising instructions stored thereon to cause the one or more processors to perform a method of detecting a presenter in a video frame, tracking movement of the presenter, and developing a frame that has a size based on the movement of the presenter and that provides walking space for the presenter based on the movement of the presenter. In some examples, the method performed according to the instructions in the non-transitory program storage device or devices includes the frame size and walking space being proportional to the speed of the presenter. In some examples, the method performed according to the instructions in the non-transitory program storage device or devices includes the presenter being framed according to the rule of thirds, while in other examples the presenter is framed based on a looseness framing setting. In some examples, the method performed according to the instructions in the non-transitory program storage device or devices includes the movement of the presenter being pacing. In some examples, the method performed according to the instructions in the non-transitory program storage device or devices includes the pacing having a left end and a right end, the frame encompassing the left end and the right end, and the frame being centered on the centerline between the left end and the right end.
The above description is illustrative and not restrictive. Many variations of the invention will become apparent to those skilled in the art upon review of this disclosure. The scope of the invention should therefore be determined not with reference to the above description, but instead with reference to the appended claims along with their full scope of equivalents.