This disclosure relates generally to videoconferencing and particularly to a hybrid approach to correcting for deformation in facial imaging.
Attempts to correct for both image distortion and image deformation for images captured by wide-angle lenses have not been wholly satisfactory. Thus, there is room for improvement in the art.
For illustration, there are shown in the drawings certain examples described in the present disclosure. In the drawings, like numerals indicate like elements throughout. The full scope of the inventions disclosed herein is not limited to the precise arrangements, dimensions, and instruments shown. In the drawings:
In the drawings and the description of the drawings herein, certain terminology is used for convenience only and is not to be taken as limiting the examples of the present disclosure.
Introduction
Images captured using a wide-angle lens inherently include distortion (405) effects and deformation effects. As used herein, distortion (405) refers to bending of light such that straight lines appear curved in an image. As used herein, deformation refers to “stretching” in a portion of an image such that objects appear larger in one or more dimensions than is natural. As used herein, the term deviation encompasses both distortion (405) and deformation. Distortion (405) and/or deformation may be corrected in an image by applying a transformation to the image. However, distortion-correction (508) can exacerbate deformation. Distortion (405) and deformation may be relatively more noticeable in different portions of an image. For example, in a cropped view of an image, deformation may be more noticeable than in a full view of the image. Further, deformation may be more noticeable at edges of the image (403) than in areas closer to the center (304). Disclosed are systems and methods (800) for selectively correcting distortion (405) and deformation in images. While the disclosed systems and methods (800) are described in connection with a teleconference system, it should be noted that the disclosed systems and methods (800) can be used in other contexts according to the disclosure.
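For concreteness, the kind of transformation referred to above can be illustrated with the common radial-polynomial lens model. The sketch below is a generic illustration in Python, not the particular transformation used by the disclosed systems, and its parameter names are assumptions.

```python
def radial_model(x: float, y: float, cx: float, cy: float,
                 k1: float, k2: float) -> tuple:
    """Forward radial-distortion model: maps an ideal (undistorted) point
    to where the lens actually images it. Correcting an image numerically
    inverts this mapping for every output pixel."""
    dx, dy = x - cx, y - cy
    r2 = dx * dx + dy * dy               # squared distance from image center
    scale = 1.0 + k1 * r2 + k2 * r2 * r2 # grows toward the edges of the image
    return cx + dx * scale, cy + dy * scale
```

Because the scale factor grows with distance from the center, inverting it pulls edge pixels inward; that is also why aggressive distortion-correction can stretch (deform) faces near the edges.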
Discussion
In general, the videoconferencing endpoint 10 can be a conferencing device, a videoconferencing device, a personal computer with audio or video conferencing abilities, or any similar type of communication device. The videoconferencing endpoint 10 is configured to generate near-end audio and video and to receive far-end audio and video from the remote videoconferencing endpoints 60. The videoconferencing endpoint 10 is configured to transmit the near-end audio and video to the remote videoconferencing endpoints 60 and to initiate local presentation of the far-end audio and video.
A microphone 120 captures audio and provides the audio to the audio module 30 and codec 32 for processing. The microphone 120 can be a table or ceiling microphone, a part of a microphone pod, a microphone integral to the videoconferencing endpoint 10, or the like. Additional microphones 121 can also be provided. Throughout this disclosure, all descriptions relating to the microphone 120 apply to any additional microphones 121, unless otherwise indicated. The videoconferencing endpoint 10 uses the audio captured with the microphone 120 primarily for the near-end audio. A camera 46 captures video and provides the captured video to the video module 40 and codec 42 for processing to generate the near-end video. For each frame (705) of near-end video captured by the camera 46, the control module 20 selects a view region, and the control module 20 or the video module 40 crops the frame (705) to the view region. The view region may be selected based on the near-end audio generated by the microphone 120 and the additional microphones 121, other sensor data, or a combination thereof. For example, the control module 20 may select an area of the frame (705) depicting a participant who is currently speaking as the view region. As another example, the control module 20 may select the entire frame (705) as the view region in response to determining that no one has spoken for a period of time. Thus, the control module 20 selects view regions based on a context of a communication session.
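A minimal sketch of this context-based view-region selection follows. The box format, the speaker index, and the ten-second silence timeout are illustrative assumptions, not the control module's actual interface.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # x, y, width, height

def select_view_region(frame_size: Tuple[int, int],
                       participant_boxes: List[Box],
                       active_speaker: Optional[int],
                       seconds_since_speech: float,
                       silence_timeout: float = 10.0) -> Box:
    """Choose the crop for the next outgoing frame from session context."""
    if active_speaker is None or seconds_since_speech > silence_timeout:
        # No recent speech: fall back to the entire frame.
        return (0, 0, frame_size[0], frame_size[1])
    # Someone is talking: crop to that participant.
    return participant_boxes[active_speaker]
```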
The camera 46 includes a wide-angle lens. Due to the nature of wide-angle lenses, video (and still images) captured by the camera 46 includes both distortion (405) and deformation (507) effects. The video module 40 includes deformation-reduction (1050) logic 72 and distortion-correction (508) logic 74. In some examples, the deformation-reduction (1050) logic 72 and the distortion-correction (508) logic 74 correspond to mapping tables (e.g., 807, 809, 811) that identify adjustments to make to images captured by the camera 46. In at least one example of this disclosure, the mapping tables are based on properties of a lens of the camera 46, such as focal length, etc. For each frame (705) of video captured by the camera 46, the video module 40 selects the deformation-reduction (1050) logic 72 or the distortion-correction (508) logic 74 based on a size of a view region selected by the control module 20 for that frame (705), as described further below. The video module 40 then applies the selected correction logic to the view region of the frame (705) to generate a corrected near-end video frame (705). Thus, each corrected near-end video frame (705) corresponds to a potentially cropped and corrected version of a video frame (705). The corrected near-end video frames (705), taken together, comprise the corrected near-end video.
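A simplified sketch of this per-frame selection is shown below, modeling a mapping table as a pair of integer source-coordinate arrays. Real tables (e.g., 807, 809, 811) would be derived from lens properties; the half-frame cutoff for "wide" view regions is an assumption for illustration.

```python
from typing import Tuple
import numpy as np

MapTable = Tuple[np.ndarray, np.ndarray]  # (map_y, map_x), same shape as the crop

def apply_table(crop: np.ndarray, table: MapTable) -> np.ndarray:
    """Apply a mapping table: out[i, j] = crop[map_y[i, j], map_x[i, j]]."""
    map_y, map_x = table
    return crop[map_y, map_x]

def correct_view_region(crop: np.ndarray, frame_width: int,
                        distortion_table: MapTable,
                        deformation_table: MapTable) -> np.ndarray:
    # Wider view regions get distortion-correction; tighter crops get
    # deformation-reduction, where each effect is most noticeable.
    if crop.shape[1] >= frame_width // 2:  # assumed "wide view" cutoff
        return apply_table(crop, distortion_table)
    return apply_table(crop, deformation_table)
```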
The videoconferencing endpoint 10 uses the codecs 32/42 to encode the near-end audio and the corrected near-end video according to any of the common encoding standards, such as MPEG-1, MPEG-2, MPEG-4, H.261, H.263 and H.264. Then, the network module 50 outputs the encoded near-end audio and corrected video to the remote videoconferencing endpoints 60 via the network 55 using any appropriate protocol. Similarly, the network module 50 receives the far-end audio and video via the network 55 from the remote videoconferencing endpoints 60 and sends these to their respective codec 32/42 for processing. Eventually, a loudspeaker 130 outputs the far-end audio (received from a remote videoconferencing endpoint), and a display 48 outputs the far-end video. The display 48 also outputs the corrected near-end video in some examples.
The processing unit 110 includes a CPU, a GPU, or both. The memory 140 can be any conventional memory such as SDRAM and can store modules 145 in the form of software and firmware for controlling the videoconferencing endpoint 10. The stored modules 145 include the various video and audio codecs 32/42 and software components of the other modules 20/30/40/50/200 discussed previously. Moreover, the modules 145 can include operating systems, a graphical user interface (GUI) that enables users to control the videoconferencing endpoint 10, and other algorithms for processing audio/video signals.
The network interface 150 provides communications between the videoconferencing endpoint 10 and remote videoconferencing endpoints (60). By contrast, the general I/O interface 160 can provide data transmission with local devices such as a keyboard, mouse, printer, overhead projector, display 48, external loudspeakers, additional cameras 46, microphones, etc.
As described above, the videoconferencing endpoint 10 captures frames (705) of video, selectively crops the frames (705) to view regions, and selectively applies deformation-reduction (1050) or distortion-correction (508) to the view regions based on the size of the view regions. Because distortion (405) may be more noticeable in relatively larger view regions and deformation (507) may be more noticeable in relatively smaller view regions, selectively using one of the correction techniques enhances the quality of video during a communication session by addressing the irregularities that are most noticeable to a communication session participant. Thus, the correction applied to each frame (705) is tailored to the view in which that frame (705) will be seen.
In many artificial intelligence-based cameras 46, when an active speaker is identified, an active speaker view is formed by cropping the region containing the speaker's face (308) from a full view captured by the camera 46. A person at the far end who looks at both the feed containing the wide view and the feed containing the active talker will tend to notice (and be distracted by) distortion (405) in the wide view, and will tend to notice (and be distracted and/or disturbed by) any deformation (507) of the face (308), if present, in the active talker view. In a full view, the viewer cares more about the geometric accuracy of the background, since the background occupies most of the image. In an active speaker view, the viewer cares more about the proper depiction of the single person who is the subject of that view. If the lens distortion-correction (508) formula used for a full view is also used in a corresponding active speaker view, the deformation (507) of the facial features can become more noticeable. In at least one example of this disclosure, systems and methods (800) are described which address this problem.
In at least one example of this disclosure, in the central region (301) of a full view image, distortion (405) and deformation (507) can both be corrected well at the same time. In corner regions, however, deformation (507) of a human face (308) will increase if the corrective measures applied to the central region (301) are applied in the same way to the corner regions.
In at least one example of this disclosure, for most wide-angle lenses, faces in the central area of the view captured by a wide-angle lens are noticeably visually pleasing after lens distortion-correction 508, as demonstrated in the drawings.
In at least one example of this disclosure, the radius 300 of the central portion of a view 400 captured by a wide-angle camera 46 for which lens distortion-correction 508 does not cause noticeable deformation 507 of faces is approximately 700 pixels. In at least one example of this disclosure, the “minimal-deformation” zone of a lens with a wider angle of view will have a radius that is smaller than 700 pixels.
Whether a facial image is located in the minimal-deformation zone is thus a major determiner of whether distortion-correction 508 will induce deformation 507, such as occurred for Hailin in the drawings.
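That determination reduces to a radius check. A minimal sketch, assuming pixel coordinates and the approximately 700-pixel radius noted above:

```python
import math

def in_minimal_deformation_zone(face_center: tuple, image_center: tuple,
                                radius_px: float = 700.0) -> bool:
    """True when the face center lies within the zone where full
    distortion-correction does not noticeably deform faces."""
    dx = face_center[0] - image_center[0]
    dy = face_center[1] - image_center[1]
    return math.hypot(dx, dy) <= radius_px
```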
In at least one example of this disclosure, a lens distortion 405 correction formula will be determined based on the location of a face 308 in an image 400, 306 with respect to the center 304 (e.g., the distance 300 of the face 308 from the center 304), and the distance 300 of the face 308 from the camera 46. As noted, both the distance 300 of the face 308 from the center 304 and the distance 300 of the face 308 from the camera 46 cause variations in deformations 507 of the face 308 in question. At least one example of this disclosure includes a computationally efficient way of balancing the need for distortion-correction with the need for deformation 507 reduction/minimization (1050). At least one example of this disclosure includes a method of switching among three different lens distortion-correction tables based on the distance 300 of the face 308 from the center 304 and the distance 300 of the face 308 from the camera 46.
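One plausible reading of this three-table switch is sketched below, using apparent face width as a proxy for distance from the camera (a larger face image implies a nearer face). The pairing of cases to the background (807), blended (811), and large-face (813) tables, and the 250-pixel width cutoff (taken from Example 7 below), are assumptions for illustration.

```python
import math

def select_correction_table(face_center: tuple, image_center: tuple,
                            face_width_px: float, tables: dict,
                            zone_radius_px: float = 700.0,
                            large_face_px: float = 250.0):
    """Pick one of three lens distortion-correction tables for a face."""
    dx = face_center[0] - image_center[0]
    dy = face_center[1] - image_center[1]
    if math.hypot(dx, dy) <= zone_radius_px:
        # Central region: full distortion-correction does not deform faces.
        return tables["background"]
    if face_width_px >= large_face_px:
        # Off-center and near the camera: favor deformation-reduction.
        return tables["large_face"]
    # Off-center and far from the camera: use the compromise mapping.
    return tables["blended"]
```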
Values of an example background lookup table 807 are shown below in Table 1. Values of an example blended lookup table 811 are shown below in Table 2. Values of an example large face lookup table 813 are shown below in Table 3. Some values from the tables 807, 811, and 813 are plotted in the lens distortion-correction 508 with deformation-reduction (1050) map 900 of the drawings.
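Example 5 below indicates that some lookup-table values are derived by interpolating or extrapolating values from the other tables. A minimal sketch of entry-wise interpolation between two tables, assuming the tables are stored as numeric arrays of remap coordinates; the fixed 0.5 weight is an assumption.

```python
import numpy as np

def build_blended_table(background_table: np.ndarray,
                        large_face_table: np.ndarray,
                        weight: float = 0.5) -> np.ndarray:
    """Interpolate entry-wise between two correction tables: weight=0.0
    reproduces the background table, weight=1.0 the large-face table."""
    return (1.0 - weight) * background_table + weight * large_face_table
```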
The system bus 1210 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. A basic input/output system (BIOS) stored in ROM 1240 or the like, may provide the basic routine that helps to transfer information between elements within the device 1200, such as during start-up. The device 1200 further includes storage devices 1260 such as a hard disk drive, a magnetic disk drive, an optical disk drive, tape drive or the like. The storage device 1260 can include software modules 1262, 1264, 1266 for controlling the processor 1220 (110). Other hardware or software modules are contemplated. The storage device 1260 is connected to the system bus 1210 by a drive interface. The drives and the associated computer readable storage media provide nonvolatile storage of computer readable instructions, data structures, program modules and other data for the device 1200. In at least one example, a hardware module that performs a function includes the software component stored in a non-transitory computer-readable medium coupled to the hardware components—such as the processor 1220 (110), bus 1210, output device 1270, and so forth—necessary to carry out the function.
For clarity of explanation, the device 1200 is presented as including individual functional blocks; such blocks may be implemented in hardware, software, or a combination of the two.
Examples of this disclosure include:
Example 1. A method 800, 1300 for reducing deviations in images 400 captured by a wide-angle camera 46, comprising: receiving 701, at a processor 110, 1220, a first frame 705 corresponding to a first view 401; rendering, using the processor 110, 1220, a first wide-view image 403 corresponding to the first frame 705, the first wide-view image 403 having a central region 301; detecting, using the processor 110, 1220, a face 308 in a first face-portion 309 of the first wide-view image 403, the first face-portion 309 having a center 312; determining 1308, using the processor 110, 1220, that the center 312 of the first face-portion 309 is external of the central region 301 of the first wide-view image 403; determining 1310, using the processor 110, 1220 and based on the determination 1308 that the center 312 of the first face-portion 309 is external of the central region 301 of the first wide-view image 403, a dimension of the first face-portion; determining 1312, using the processor 110, 1220, that the dimension of the first face-portion 309 is less than a predetermined threshold (e.g., 250 pixels); and rendering 1314, using the processor 110, 1220, a first focus-view image 1011, 1019 corresponding to the first face-portion 309, wherein rendering 1314 the first focus-view image 1011, 1019 includes imposing a degree of distortion-correction 508 on the first face-portion 309 and imposing a degree of deformation-reduction 1050 on the first face-portion 309.
Example 2. The method 800, 1300 of example 1, further comprising: receiving 701, at the processor 110, 1220, a second frame 705 corresponding to a second view; rendering, using the processor 110, 1220, a second wide-view image 403 corresponding to the second frame 705, the second wide-view image 403 having a central region 301; detecting, using the processor 110, 1220, a second face 308 in a second face-portion 309 of the second wide-view image 403, the second face-portion 309 having a center; determining, using the processor 110, 1220, that the center 312 of the second face-portion 309 is external of the central region 301 of the second wide-view image 403; determining, using the processor 110, 1220 and based on the determination that the center 312 of the second face-portion 309 is external of the central region 301 of the second wide-view image 403, a dimension of the second face-portion; determining, using the processor 110, 1220, that the dimension of the second face-portion 309 is greater than or equal to the predetermined threshold; and rendering 1314, using the processor 110, 1220, a second focus-view image 1011, 1019 corresponding to the second face-portion, wherein rendering 1314 the second focus-view image 1011, 1019 includes imposing a degree of distortion-correction 508 on the second face-portion 309 and imposing a degree of deformation-reduction 1050 on the second face-portion, wherein the degree of distortion-correction 508 imposed on the second face-portion 309 is lower than the degree of distortion-correction 508 imposed on the first face-portion, and wherein the degree of deformation-reduction 1050 imposed on the second face-portion 309 is greater than the degree of deformation-reduction 1050 imposed on the first face-portion 309.
Example 3. The method 800, 1300 of example 2, further comprising: receiving 701, at the processor 110, 1220, a third frame 705 corresponding to a third view; rendering, using the processor 110, 1220, a third wide-view image 403 corresponding to the third frame 705, the third wide-view image 403 having a central region 301; detecting, using the processor 110, 1220, a third face 308 in a third face-portion 309 of the third wide-view image 403, the third face-portion 309 having a center; determining, using the processor 110, 1220, that the center 312 of the third face-portion 309 is internal to the central region 301 of the third wide-view image 403; rendering 1314, using the processor 110, 1220 and based on the determination that the center 312 of the third face-portion 309 is internal to the central region 301 of the third wide-view image 403, a third focus-view image 1011, 1019 corresponding to the third face-portion, wherein rendering 1314 the third focus-view image 1011, 1019 includes imposing a degree of distortion-correction 508 on the third face-portion 309 and imposing a degree of deformation-reduction 1050 on the third face-portion, wherein the degree of distortion-correction 508 imposed on the third face-portion 309 is greater than the degree of distortion-correction 508 imposed on the first face-portion, and wherein the degree of deformation-reduction 1050 imposed on the third face-portion 309 is lower than the degree of deformation-reduction 1050 imposed on the first face-portion.
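Taken together, Examples 1 through 3 order the two correction degrees across three cases. The sketch below reflects that ordering; the specific weight values are illustrative assumptions, while the orderings themselves follow from the examples.

```python
import math

def correction_degrees(face_center: tuple, face_width_px: float,
                       image_center: tuple,
                       zone_radius_px: float = 700.0,
                       width_threshold_px: float = 250.0) -> tuple:
    """Return assumed (distortion-correction, deformation-reduction)
    weights for the three cases of Examples 1-3."""
    dist = math.hypot(face_center[0] - image_center[0],
                      face_center[1] - image_center[1])
    if dist <= zone_radius_px:
        return (1.0, 0.0)   # Example 3: center internal to the central region
    if face_width_px < width_threshold_px:
        return (0.7, 0.3)   # Example 1: small face outside the central region
    return (0.3, 0.7)       # Example 2: large face outside the central region
```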
Example 4. The method 800, 1300 of example 3, wherein the first frame 705, the second frame 705, and the third frame 705 are the same, and wherein the first wide-view image 403, the second wide-view image 403, and the third wide-view image 403 are different.
Example 5. The method 800, 1300 of example 3, wherein: imposing the degree of distortion-correction 508 to the first face-portion 309 and imposing the degree of deformation-reduction 1050 to the first face-portion 309 comprise fetching values from a first lookup table; imposing the degree of distortion-correction 508 on the second face-portion 309 and imposing the degree of deformation-reduction 1050 on the second face-portion 309 comprise fetching values from a second lookup table; imposing the degree of distortion-correction 508 on the third face-portion 309 and imposing the degree of deformation-reduction 1050 on the third face-portion 309 comprise fetching values from a third lookup table, and wherein some values in the first lookup table are based on extrapolation of some values in the third lookup table and some values in the first lookup table are based on interpolation of some values in the second lookup table.
Example 6. The method 800, 1300 of example 1, wherein the central region 301 of the first wide-view image 403 has a radius of 700 pixels centered in the first wide-view image 403.
Example 7. The method 800, 1300 of example 1, wherein the dimension of the first face-portion 309 is a width and the predetermined threshold is 250 pixels.
Example 8. The method 800, 1300 of example 1, further comprising capturing image data corresponding to the first frame 705 using a wide-angle lens.
Example 9. The method 800, 1300 of example 1, further comprising capturing image data corresponding to the first frame 705 using an image sensor with a field of view greater than one hundred and fifty-nine degrees, and less than one hundred and eighty degrees.
Example 10. The method 800, 1300 of example 1, wherein rendering the first wide-view image 403 comprises displaying the first wide-view image 403 using a first display device 48, and wherein rendering 1314 the first focus-view image 1011, 1019 comprises displaying at least some of the first focus-view image 1011, 1019 using a second display device 1270.
Example 11. The method 800, 1300 of example 10, wherein the first display device 48 and the second display device 1270 are different.
Example 12. A videoconferencing endpoint 10, comprising: a wide-angle camera 46; a display device 48; a processor 110, 1220 coupled to the wide-angle camera 46 and the display device 48; a memory storing instructions executable by the processor 110, 1220, wherein the instructions comprise instructions to: receive a first frame 705 corresponding to a first view 401; render a first wide-view image 403, the first wide-view image 403 corresponding to the first frame 705 and having a central region 301; detect a face 308 in a first face-portion 309 of the first wide-view image 403, the first face-portion 309 having a center; determine that the center 312 of the first face-portion 309 is external of the central region 301 of the first wide-view image 403; determine, using the processor 110, 1220 and based on the determination that the center 312 of the first face-portion 309 is external of the central region 301 of the first wide-view image 403, a dimension of the first face-portion; determine that the dimension of the first face-portion 309 is less than a predetermined threshold; and render, using the display device 48, a focus-view image 1011, 1019 corresponding to the first face-portion, wherein the instructions to render, using the display device 48, the focus-view image 1011, 1019 include instructions to impose a degree of distortion-correction 508 on the first face-portion 309 and impose a degree of deformation-reduction 1050 on the first face-portion.
Example 13. The videoconferencing endpoint 10 of example 12, wherein the instructions further comprise instructions to: receive a second frame 705 corresponding to a second view; render, using the display device 48, a second wide-view image 403 corresponding to the second frame 705, the second wide-view image 403 having a central region 301; detect a second face 308 in a second face-portion 309 of the second wide-view image 403, the second face-portion 309 having a center; determine that the center 312 of the second face-portion 309 is external of the central region 301 of the second wide-view image 403; determine, using the processor 110, 1220 and based on the determination that the center 312 of the second face-portion 309 is external of the central region 301 of the second wide-view image 403, a dimension of the second face-portion; determine that the dimension of the second face-portion 309 is greater than or equal to the predetermined threshold; render, using the display device 48, a second focus-view image 1011, 1019 corresponding to the second face-portion, wherein the instructions to render the second focus-view image 1011, 1019 include instructions to impose a degree of distortion-correction 508 on the second face-portion 309 and impose a degree of deformation-reduction 1050 on the second face-portion 309, whereby the degree of distortion-correction 508 imposed on the second face-portion 309 is lower than the degree of distortion-correction 508 imposed on the first face-portion, and whereby the degree of deformation-reduction 1050 imposed on the second face-portion 309 is greater than the degree of deformation-reduction 1050 imposed on the first face-portion.
Example 14. The videoconferencing endpoint 10 of example 13, the instructions further comprising instructions to: receive a third frame 705 corresponding to a third view; render, using the display device 48, a third wide-view image 403 corresponding to the third frame 705, the third wide-view image 403 having a central region 301; detect a third face 308 in a third face-portion 309 of the third wide-view image 403, the third face-portion 309 having a center; determine that the center 312 of the third face-portion 309 is internal to the central region 301 of the third wide-view image 403; render, using the display device 48 and based on the determination that the center 312 of the third face-portion 309 is internal to the central region 301 of the third wide-view image 403, a third focus-view image 1011, 1019 corresponding to the third face-portion, wherein the instructions to render the third focus-view image 1011, 1019 include instructions to impose a degree of distortion-correction 508 on the third face-portion 309 and impose a degree of deformation-reduction 1050 on the third face-portion, whereby the degree of distortion-correction 508 imposed on the third face-portion 309 is greater than the degree of distortion-correction 508 imposed on the first face-portion, and whereby the degree of deformation-reduction 1050 imposed on the third face-portion 309 is lower than the degree of deformation-reduction 1050 imposed on the first face-portion.
Example 15. The videoconferencing endpoint 10 of example 14, wherein the first frame 705, the second frame 705, and the third frame 705 are the same, and wherein the first wide-view image 403, the second wide-view image 403, and the third wide-view image 403 are different.
Example 16. The videoconferencing endpoint 10 of example 14, wherein: the instructions to impose the degree of distortion-correction 508 to the first face-portion 309 and impose the degree of deformation-reduction 1050 to the first face-portion 309 comprise instructions to fetch values from a first lookup table; the instructions to impose the degree of distortion-correction 508 on the second face-portion 309 and impose the degree of deformation-reduction 1050 on the second face-portion 309 comprise instructions to fetch values from a second lookup table; the instructions to impose the degree of distortion-correction 508 on the third face-portion 309 and impose the degree of deformation-reduction 1050 on the third face-portion 309 comprise instructions to fetch values from a third lookup table, wherein the first lookup table, the second lookup table, and the third lookup table are different.
Example 17. The videoconferencing endpoint 10 of example 12, wherein the central region 301 of the first wide-view image 403 has a radius of 700 pixels centered in the first wide-view image 403.
Example 18. The videoconferencing endpoint 10 of example 12, wherein the dimension of the first face-portion 309 is a width and the predetermined threshold is fifteen percent of the width of the wide-view image 403.
Example 19. The videoconferencing endpoint 10 of example 12, wherein the wide-angle camera 46 comprises a wide-angle lens.
Example 20. The videoconferencing endpoint 10 of example 12, wherein the wide-angle camera 46 comprises an image sensor with a field of view greater than one hundred and fifty-nine degrees, and less than one hundred and eighty degrees.
The examples described above are provided by way of illustration only and should not be construed to limit the scope of the disclosure. Various modifications and changes can be made to the principles and examples described herein without departing from the scope of the disclosure and without departing from the claims which follow.
Number | Date | Country | Kind
---|---|---|---
201910706478.9 | Aug 2019 | CN | national

Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/US2020/044288 | 7/30/2020 | WO |