The disclosure relates to image processing techniques.
In a meeting room or conference center, there are generally writing surfaces upon which meeting participants are able to write using writing instruments and which allow participants to view information that one or more participants believe is important. One such writing surface is a whiteboard, on which participants are able to write with, for example, erasable whiteboard markers. This is a valuable collaboration tool used by participants.
In certain instances, there is a need for people who are physically unable to be in the meeting room or conference center to be able to participate remotely. There are a plurality of remote online meeting solutions that can effectuate this process. Further, during these meetings where one or more participants are remotely located, it is desirable for them to be able to view what is being written on the writing surface so as to feel as though they are part of the collaboration. Tools such as electronic whiteboards, which digitize written information, exist to allow this to occur, but they are expensive and difficult to integrate with IT networks. Other mechanisms, such as image capture systems, also exist which allow an image capture device to capture an image of the writing surface and transmit those images to the remote users. However, in a remote meeting scenario where a single camera is used to display the meeting room, remote users usually have difficulty viewing or reading the contents of a whiteboard shown by the single camera. A possible solution to the problem would be the addition of a dedicated camera that focuses on the whiteboard, incurring additional cost in the meeting room setup for remote meetings. A system and method according to the present disclosure remedies the drawbacks identified above.
An image processing apparatus is provided and includes one or more processors and one or more memories storing instructions that, when executed, configure the one or more processors to receive a captured video from a camera capturing a meeting room, extract and store a predefined region of the video as extracted image data, generate a first corrected image by performing first image correction processing on the extracted image data to correct noise, generate a binary mask of the first corrected image, generate a filtered image based on the binary mask of the first corrected image and the first corrected image, generate a second corrected image by performing second image correction processing on the filtered image, and perform blending processing that combines the second corrected image with the first corrected image to generate a final corrected image.
In another embodiment, the instructions, when executed, further configure the one or more processors to store, in memory, a predetermined number of masks of the first corrected image, compute the average of the stored predetermined number of masks, and generate an average mask, wherein the filtered image is generated using the average mask.
These and other objects, features, and advantages of the present disclosure will become apparent upon reading the following detailed description of exemplary embodiments of the present disclosure, when taken in conjunction with the appended drawings, and provided claims.
Throughout the figures, the same reference numerals and characters, unless otherwise stated, are used to denote like features, elements, components or portions of the illustrated embodiments. Moreover, while the subject disclosure will now be described in detail with reference to the figures, it is done so in connection with the illustrative exemplary embodiments. It is intended that changes and modifications can be made to the described exemplary embodiments without departing from the true scope and spirit of the subject disclosure as defined by the appended claims.
Exemplary embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings. It is to be noted that the following exemplary embodiment is merely one example for implementing the present disclosure and can be appropriately modified or changed depending on individual constructions and various conditions of apparatuses to which the present disclosure is applied. Thus, the present disclosure is in no way limited to the following exemplary embodiment, and the embodiments described below and illustrated in the figures can be applied or performed in situations other than the situations described below as examples.
In an online meeting environment where a writing surface such as a whiteboard is being utilized by one or more participants in a meeting room, it is important that those attending the meeting remotely, and thus online, are able to clearly visualize the information being written on the writing surface. One way this is accomplished is by using an image capture system such as a camera that can capture high resolution video image data of the writing surface so that those images can be communicated via a network to the remote participants for display on a computing device such as a laptop, tablet, and/or phone. However, in an exemplary environment as shown in
The image capture device 102 is controlled to capture the in-room data stream by a control apparatus 110. The control apparatus 110 is a computing device that may be located locally within the meeting room or deployed as a server that is in communication with the image capture device 102. The control apparatus 110 is hardware as described herein below with respect to
The control apparatus 110 is further configured to transmit video image data representing the real time in-room video via a communication network 120 to which at least one remote client using a computing device 130 is connected. In one embodiment, the communication network 120 is a wide area network (WAN) or local area network (LAN) that is further connected to a WAN such as the internet. The remote client device 130 can selectively access the in-room video data using a meeting application that controls an online meeting between participants in the room 101 and the at least one remote client device 130. The remote client device 130 may use a defined access link to obtain at least the in-room video data captured by the image capture device 102 via the control apparatus 110. In one embodiment, the access link enables the at least one remote client device 130 to obtain both the in-room video data and the predetermined region of the in-room video data that has been enhanced according to the image processing algorithm described hereinbelow.
In exemplary operation, the present disclosure advantageously enhances the writing surface 106 (e.g. whiteboard image) by selecting the writing surface area on which a first image correction is performed to generate, and store in memory, a first corrected image. In one embodiment, the first image correction is a keystone correction. Thereafter, a mask is computed based on the first corrected image and stored in a mask queue in memory which is set to store a predetermined (and configurable) number of computed masks. When the mask queue is full, the oldest mask is dropped and a new one is added to the end of the queue. A computed mask image that is used in performing the remaining image enhancement on the writing surface is computed based on all the masks in the mask queue at a given time. The purpose of this process is to reduce the variation due to noise/compression across consecutive frames. Finally, the mask is applied to the first corrected image to filter out unwanted artifacts and generate a second corrected image on which color enhancement is applied, thereby generating a third corrected image. This algorithm, which is realized by one or more processors (CPU 501) of the control apparatus 110 reading and executing a pre-determined program stored in a memory (ROM 503), is described in greater detail below.
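As a minimal illustrative sketch (not the claimed implementation), the drop-oldest behavior of the mask queue described above can be modeled in Python with a fixed-length deque; the queue length and helper name below are assumptions for illustration only.

```python
from collections import deque

QUEUE_SIZE = 5  # hypothetical value for the configurable queue length

mask_queue = deque(maxlen=QUEUE_SIZE)

def push_mask(mask):
    """Append the newest binary mask; with maxlen set, the deque drops
    the oldest mask automatically once the queue is full."""
    mask_queue.append(mask)
```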
An exemplary image processing algorithm that improves the visual look of a predetermined region that is extracted from a video data stream, and which is performed by one or more processors that execute a set of stored instructions (e.g. a program), is described below. In one embodiment, the predetermined region includes a writing surface. The exemplary algorithm includes obtaining information representing predetermined corner positions of the writing surface to be corrected. These corner positions may be input via a user selection using a user interface whereby the user selects corner positions therein. In another embodiment, the writing surface (e.g. whiteboard) is automatically detected using known whiteboard detection processing. For example, a user may view the in-room image that shows field of view 104 and identify points representing the four corners of the whiteboard. This may be done using an input device such as a mouse or via a touchscreen if the device displaying the video data is capable of receiving touch input.
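By way of a hedged example, one conceivable way to gather the four corner points with OpenCV mouse events is sketched below; the window name, file name, and click handling are illustrative assumptions, not the disclosed user interface.

```python
import cv2

corners = []

def on_mouse(event, x, y, flags, param):
    # Record a corner on each left-button click, up to four points.
    if event == cv2.EVENT_LBUTTONDOWN and len(corners) < 4:
        corners.append((x, y))

frame = cv2.imread("room_frame.png")   # hypothetical saved in-room frame
cv2.namedWindow("select corners")
cv2.setMouseCallback("select corners", on_mouse)
while len(corners) < 4:
    cv2.imshow("select corners", frame)
    if cv2.waitKey(30) == 27:          # Esc aborts the selection
        break
cv2.destroyAllWindows()
```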
Thereafter, a first image correction processing is performed on the data extracted from the region identified above. The first image correction processing is keystone correction on the whiteboard area based on the 4 defined corners in order to compute the smallest rectangle that will contain the 4 corners as shown in
The perspective transform is computed using the four user-defined corners as the source and the four corners of the computed rectangle as the target. The perspective transform is calculated from the four pairs of corresponding points, whereby the function calculates the 3×3 map_matrix of the perspective transform in Equation (1) so that:

t_i · [x′_i, y′_i, 1]^T = map_matrix · [x_i, y_i, 1]^T  (1)

where src represents the 4 corners defined and dst represents the 4 corners of the smallest rectangle according to the following equation:

dst(i) = (x′_i, y′_i), src(i) = (x_i, y_i), i = 0, 1, 2, 3  (2)
The algorithm obtains the coefficients of the map_matrix (C_ij), which are computed according to the algorithm illustrated in
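A minimal sketch of this keystone correction step using OpenCV follows, assuming the corners are supplied in top-left, top-right, bottom-right, bottom-left order; cv2.boundingRect yields the smallest axis-aligned containing rectangle and cv2.getPerspectiveTransform yields the 3×3 map_matrix.

```python
import cv2
import numpy as np

def keystone_correct(frame, corners):
    # Four user-defined corners, assumed ordered TL, TR, BR, BL.
    src = np.float32(corners)
    # Smallest rectangle containing the 4 corners.
    x, y, w, h = cv2.boundingRect(np.array(corners, dtype=np.int32))
    dst = np.float32([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]])
    # 3x3 map_matrix of Equation (1), from the 4 corresponding point pairs.
    map_matrix = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(frame, map_matrix, (w, h))
```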
A binary mask is created to filter out noise and illumination artifacts. The KC Image is converted to grayscale and adaptive thresholding is applied to the grayscale image to create the binary mask. Adaptive thresholding is a method in which the threshold value is calculated for smaller regions, so different regions have different threshold values. In one embodiment, the threshold value is the mean of the neighborhood area; pixel values above the threshold are set to 1 and pixel values below the threshold are set to 0 (for example, the neighborhood is a block/neighborhood size of 21). The created mask is added to a queue of masks.
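A sketch of this step using OpenCV's adaptive thresholding is given below; the subtraction constant C is an assumption, and the threshold polarity follows the text's convention (above the local mean maps to 1), though in practice it may be inverted with THRESH_BINARY_INV so that pen strokes become the enabled pixels.

```python
import cv2

def compute_mask(kc_image):
    gray = cv2.cvtColor(kc_image, cv2.COLOR_BGR2GRAY)
    # Mean adaptive threshold, block size 21 as in the text's example;
    # maxValue=1 so the mask holds 0/1 values; C=2 is a hypothetical offset.
    mask = cv2.adaptiveThreshold(
        gray, 1, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 21, 2)
    return mask
```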
Using the masks in the queue of masks, an updated binary mask is created whereby, for each pixel in the updated binary mask, the pixel value is determined such that, if the sum of the values for that pixel across all the masks in the queue is greater than or equal to the number of masks in the queue divided by 2, the pixel value in the updated mask is set to 1 (enabled). Otherwise, the pixel value is set to 0. The calculation performed for each pixel in the updated mask is performed using Equation 4:

m_xy = 1 if (p_xy1 + p_xy2 + … + p_xyN) ≥ N/2; otherwise m_xy = 0  (4)

where N is the number of masks in the queue, p_xyq is the value for a respective pixel (x, y) in mask q, that value being 0 or 1, and m_xy is the value of the pixel (x, y) in the final mask.
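Equation 4 can be expressed as a vectorized NumPy sketch: a pixel is enabled in the updated mask when at least half of the queued masks enable it. The helper name and the queue object are carried over from the earlier sketches.

```python
import numpy as np

def update_mask(mask_queue):
    # Stack the queued 0/1 masks into shape (N, H, W).
    stacked = np.stack(list(mask_queue))
    n = stacked.shape[0]
    # Majority vote per pixel: sum over masks >= N/2 enables the pixel.
    return (stacked.sum(axis=0) >= n / 2).astype(np.uint8)
```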
Next, the saturation and intensity are adjusted based on user configuration to provide more or less color saturation/intensity. This adjustment is performed for each pixel in the KC Image while applying the updated binary mask. To do this, the image color space is converted from RGB to HSV in order to adjust saturation and intensity values. Once converted, for all pixels in the HSV Image, if the mask value for the pixel is 1 (enabled), the pixel S value is updated using the configured saturation setting and the pixel V value is updated with the configured intensity setting. On the other hand, if the mask value for the pixel is 0 (disabled), then the pixel is set to the white HSV value (0, 0, 255). Once the settings for each pixel in the HSV Image are applied, the HSV Image is converted back to RGB color space as an updated RGB image. An alpha blend is applied to the updated RGB image using the KC Image (background). The alpha value for blending is configurable, which advantageously enables control of how strongly the unfiltered frame is merged into the filtered frame. For example, if the alpha value is configured as 0, only the filtered frame will be visible, whereas if the alpha value is configured as 1, only the unfiltered frame will be visible. This allows the user to configure the blending that will ultimately be performed. The following computation in Equation 5 is applied for each pixel in the result image (p_r) using the updated RGB image (p_u) and the KC Image (p_kc) in order to return the resulting updated RGB image:

p_r = alpha · p_kc + (1 − alpha) · p_u  (5)
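A hedged sketch of this adjustment and blend is shown below. The disclosure only says the configured settings are "applied," so the multiplicative gains (sat_gain, val_gain) and the default alpha are illustrative assumptions.

```python
import cv2
import numpy as np

def enhance(kc_image, mask, sat_gain=1.4, val_gain=1.1, alpha=0.3):
    # Convert to HSV (uint8 ranges: H 0-179, S/V 0-255), then to float for math.
    hsv = cv2.cvtColor(kc_image, cv2.COLOR_BGR2HSV).astype(np.float32)
    enabled = mask.astype(bool)
    # Enabled pixels: apply the configured saturation and intensity settings.
    hsv[enabled, 1] = np.clip(hsv[enabled, 1] * sat_gain, 0, 255)
    hsv[enabled, 2] = np.clip(hsv[enabled, 2] * val_gain, 0, 255)
    # Disabled pixels: set to white, HSV (0, 0, 255).
    hsv[~enabled] = (0, 0, 255)
    updated = cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)
    # Equation 5: alpha = 0 shows only the filtered frame,
    # alpha = 1 shows only the unfiltered keystone-corrected frame.
    return cv2.addWeighted(kc_image, alpha, updated, 1 - alpha, 0)
```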
In 303, the writing surface region is extracted from image 302. The extracted writing surface region is defined using the points identified in image 302 whereby the points are positioned at respective corners of the writing surface. First image correction processing 303 is performed on the extracted writing surface region and generates a first corrected image 304 in
The second corrected image 310 in
In operation, the above algorithm is performed in real time as new video image data representing the in-room video stream is received by the control apparatus 110. During the online meeting, the in-room video stream is transmitted over a first communication path (e.g. channel) in a first format and caused to be displayed on a display of the remote computing device. The extracted region representing the writing surface is not transmitted in the same first format. Rather, as the above algorithm extracts data from video frames, the enhanced writing surface region is transmitted in a second format. In one example, the second format is still image data transmitted at a particular rate so that the transmitted enhanced writing surface region appears as video but is actually a series of sequentially processed still images which are communicated to the remote client device over a second, different communication path (channel). This advantageously enables the control apparatus to cause simultaneous display of both the live video data captured by the image capture device and an enhanced region of that video data generated in accordance with the algorithm described herein. The algorithm advantageously creates a binary mask based on the keystone corrected image (based on a number of past masks) to filter out noise and then performs saturation and intensity enhancements after applying the mask to the original image in order to alpha blend the keystone corrected image and enhanced image to produce the final result.
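Tying the steps together, a per-frame flow might look like the sketch below, reusing the hypothetical helpers defined in the earlier sketches (keystone_correct, push_mask, mask_queue, compute_mask, update_mask, enhance); transport of the result over the second channel is omitted.

```python
def process_frame(frame, corners):
    kc_image = keystone_correct(frame, corners)  # first corrected image
    push_mask(compute_mask(kc_image))            # newest mask into the queue
    mask = update_mask(mask_queue)               # Equation 4 majority vote
    return enhance(kc_image, mask)               # HSV adjust + Equation 5 blend
```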
The scope of the present invention includes a non-transitory computer-readable medium storing instructions that, when executed by one or more processors, cause the one or more processors to perform one or more embodiments of the invention described herein. Examples of a computer-readable medium include a hard disk, a floppy disk, a magneto-optical disk (MO), a compact-disk read-only memory (CD-ROM), a compact disk recordable (CD-R), a CD-Rewritable (CD-RW), a digital versatile disk ROM (DVD-ROM), a DVD-RAM, a DVD-RW, a DVD+RW, magnetic tape, a nonvolatile memory card, and a ROM. Computer-executable instructions can also be supplied to the computer-readable storage medium by being downloaded via a network.
The use of the terms “a” and “an” and “the” and similar referents in the context of this disclosure describing one or more aspects of the invention (especially in the context of the following claims) is to be construed to cover both the singular and the plural, unless otherwise indicated herein or clearly contradicted by context. The terms “comprising,” “having,” “including,” and “containing” are to be construed as open-ended terms (i.e., meaning “including, but not limited to,”) unless otherwise noted. Recitation of ranges of values herein is merely intended to serve as a shorthand method of referring individually to each separate value falling within the range, unless otherwise indicated herein, and each separate value is incorporated into the specification as if it were individually recited herein. All methods described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The use of any and all examples, or exemplary language (e.g., “such as”) provided herein, is intended merely to better illuminate the subject matter disclosed herein and does not pose a limitation on the scope of any invention derived from the disclosure unless otherwise claimed. No language in the specification should be construed as indicating any non-claimed element as essential.
It will be appreciated that the instant disclosure can be incorporated in the form of a variety of embodiments, only a few of which are disclosed herein. Variations of those embodiments may become apparent to those of ordinary skill in the art upon reading the foregoing description. Accordingly, this disclosure and any invention derived therefrom includes all modifications and equivalents of the subject matter recited in the claims appended hereto as permitted by applicable law. Moreover, any combination of the above-described elements in all possible variations thereof is encompassed by the disclosure unless otherwise indicated herein or otherwise clearly contradicted by context.
This application claims priority from U.S. Provisional Patent Application Ser. No. 63/291,650 filed on Dec. 20, 2021, the entirety of which is incorporated herein by reference.
Filing Document: PCT/US2022/081936 | Filing Date: 12/19/2022 | Country: WO

Priority Application: 63291650 | Date: Dec 2021 | Country: US