Embodiments of the present disclosure relate generally to image capture and processing, computer science, and machine learning and, more specifically, to techniques for generating mattes for images.
Oftentimes, when generating visual effects, the foreground of an image needs to be separated from the background of that image. For example, in a situation where a video frame of image data includes an actor in the foreground in front of a particular background, the foreground that includes the actor may need to be separated from the background in order to generate a desired visual effect involving the actor. Continuing with this particular example, once separated, the foreground that includes the actor can be composited with a different background to generate a composite video frame in which the actor appears to be at a different location.
Separating the foreground of an image from the background of the image is, as a general matter, quite challenging technically because each pixel of the image can belong to both the foreground and the background. For example, a given pixel can belong in part to the foreground and in part to the background when that pixel is at the edge of an object, part of a wispy or transparent structure, or within a defocused or motion-blurred area of the image, to name a few examples. Image data indicating, via a degree of opacity or transparency for each pixel, whether each pixel in an image belongs to the foreground or the background is commonly referred to as the “alpha channel” and also is sometimes referred to as a “matte” or an “alpha matte.” The alpha channel can account for foreground pixels, background pixels, and pixels that belong to both the foreground and background. In this regard, the alpha channel can indicate that foreground pixels within an image are opaque, background pixels are transparent, and pixels belonging to both the foreground and background are semi-transparent.
One conventional approach for generating an alpha matte is to capture an image of a subject, such as an actor, in front of a green screen. Green pixels within the captured image are considered the background of the image for an alpha matte. One drawback of this approach is that the green screen background is oftentimes not perfectly green (i.e., captured pixels of the green screen can have nonzero red and/or blue values), and foreground elements can also include green (i.e., captured pixels of the foreground can have nonzero green values). Consequently, the foreground can be difficult to disambiguate from the background, and a complex matting algorithm typically has to be used to generate an alpha matte that indicates pixels belonging to the foreground and the background. As a general matter, conventional matting algorithms have many parameters that need to be tuned manually for any given image. Accordingly, having to implement conventional matting algorithms is usually computationally complex, tedious, and labor-intensive.
Another conventional approach for generating a matte is to directly capture the matte using a second camera that captures a different type of light than is captured using an RGB (red, green, blue) camera. For example, a beam splitter can be used to split incoming light between the RGB camera and the second camera, and the second camera can include a filter to capture infrared (IR) or sodium vapor light that is used to distinguish the background from the foreground of an image. One drawback of this approach is that the images captured using the second camera need to be precisely aligned with images that are captured using the RGB camera, which is oftentimes difficult to achieve. In addition, any camera setup that includes a second camera is necessarily more cumbersome and more difficult from an operational perspective relative to using only an RGB camera.
As the foregoing illustrates, what is needed in the art are more effective techniques for generating alpha mattes for images.
One embodiment of the present disclosure sets forth a computer-implemented method for generating mattes for images. The method includes receiving an image that includes a foreground having a first color and a background having a second color, wherein the second color is a complement of the first color. The method further includes generating a matte based on the second color included in the image.
Another embodiment of the present disclosure sets forth a system. The system includes at least one first light source configured to emit light having a first color. The system further includes a background having a second color that is a complement of the first color. In addition, the system includes a camera configured to capture at least a portion of the background and at least a portion of one or more objects illuminated by the at least one first light source.
Other embodiments of the present disclosure include, without limitation, one or more computer-readable media including instructions for performing one or more aspects of the disclosed techniques as well as a computing device for performing one or more aspects of the disclosed techniques.
At least one technical advantage of the disclosed techniques relative to the prior art is that the disclosed techniques can be used to generate relatively high quality mattes for images, without requiring the use of any complex matting algorithms or requiring the manual tuning of any algorithm parameters. Accordingly, the disclosed techniques can be more computationally efficient and less labor intensive than prior art approaches. In addition, the disclosed techniques utilize images captured by a single camera rather than images captured by two separate cameras where the resulting images need to be precisely aligned with each other. Thus, the disclosed techniques can be more accurate and can generate images without artifacts resulting from misalignments relative to what can be achieved using prior art approaches. These technical advantages represent one or more technological improvements over prior art approaches.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
So that the manner in which the above recited features of the present disclosure can be understood in detail, a more particular description of the disclosure, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only typical embodiments of this disclosure and are therefore not to be considered limiting of its scope, for the disclosure may admit to other equally effective embodiments.
As described, separating the foreground of an image from the background of the image is, as a general matter, quite challenging technically because each pixel of the image can belong to both the foreground and the background. One conventional approach for generating an alpha matte is to capture an image of a subject, such as an actor, in front of a green screen. Green pixels within the captured image are considered the background of the image for an alpha matte. However, the green screen background is oftentimes not perfectly green, and foreground elements can also include green. Because the foreground can be difficult to disambiguate from the background, a complex matting algorithm typically has to be used to generate an alpha matte that indicates pixels belonging to the foreground and the background. As a general matter, conventional matting algorithms have many parameters that need to be tuned manually for any given image. Accordingly, having to implement conventional matting algorithms is usually computationally complex, tedious, and labor-intensive. Another conventional approach for generating a matte is to directly capture the matte using a second camera that captures a different type of light than is captured using an RGB (red, green, blue) camera. However, images captured using the second camera need to be precisely aligned with images that are captured using the RGB camera, which is oftentimes difficult to achieve. In addition, any camera setup that includes a second camera is necessarily more cumbersome and more difficult from an operational perspective relative to using only an RGB camera.
The disclosed techniques determine alpha channels for images. In some embodiments, an image, which can be a standalone image or a frame of a video, is captured using foreground lighting of a particular color and a background having a complement color. The image is pre-processed to correct for color crosstalk. The complement color in the pre-processed image is converted to grayscale to generate a holdout matte, which can be inverted to obtain the alpha channel (i.e., matte) that indicates pixels of the image belonging to the foreground and/or background. Bounce light is also removed by subtracting the bounce light, which can be determined during calibration, multiplied by the holdout matte. Then, a trained machine learning model can be applied to convert a foreground of the image having the particular color into a colorized foreground image that also includes the complement color. The colorized foreground image can then be composited with an image of another background. In addition, the image and corresponding alpha channel can be used to train a machine learning model to predict an alpha channel given an image.
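By way of a non-limiting illustration only, the following sketch summarizes how the above steps could be organized in code. All names in the sketch (e.g., generate_matte_and_composite, calibration_matrix, colorization_model) are hypothetical, the scale of the green channel is assumed to have been normalized so that the background green value equals one, and the individual steps are described in greater detail below.

```python
import numpy as np

def generate_matte_and_composite(image, calibration_matrix, bounce_light,
                                 colorization_model, new_background):
    """Illustrative sketch of the disclosed pipeline; not a definitive implementation."""
    # Pre-process the captured image to correct for color crosstalk.
    corrected = np.clip(image.reshape(-1, 3) @ calibration_matrix.T, 0, None).reshape(image.shape)

    # The complement (green) channel, converted to grayscale, is the holdout matte;
    # inverting it yields the alpha channel (matte).
    holdout = corrected[..., 1]
    alpha = 1.0 - holdout

    # Remove bounce light (measured during calibration) scaled by the holdout matte.
    foreground = np.clip(corrected - bounce_light * holdout[..., None], 0, None)

    # Convert the magenta-lit foreground into a fully colored foreground image.
    colorized = colorization_model(foreground)

    # Composite the colorized foreground over a new background.
    composite = colorized + new_background * holdout[..., None]
    return alpha, composite
```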
Advantageously, the disclosed techniques can be used to generate relatively high quality mattes for images, without requiring the use of any complex matting algorithms or requiring the manual tuning of any algorithm parameters. Accordingly, the disclosed techniques can be more computationally efficient and less labor intensive than prior art approaches. In addition, the disclosed techniques utilize images captured by a single camera rather than images captured by two separate cameras where the resulting images need to be precisely aligned with each other. Thus, the disclosed techniques can be more accurate and can generate images without artifacts resulting from misalignments relative to what can be achieved using prior art approaches.
Illustratively, in the LED volume 102, walls of the LED volume 102 in front and to the side of the objects 120 are used for the foreground lighting 104, and an area of the wall of the LED volume 102 behind the objects 120 is used for the background 106. In some embodiments, the background 106 can be within, and cropped tightly around, the camera frustum of the camera 110, to minimize spill light.
In operation, the camera 110 captures images of the objects 120 that are (1) lit by the foreground lighting 104 of one color, shown as magenta; and (2) in front of the background 106 that is a complement color of the foreground color, shown as green. Any suitable color and complement color can be used in some embodiments, such as yellow foreground lighting and a blue background, cyan foreground lighting and a red background, etc. Further, the camera 110 can be any suitable digital camera, such as a camera that is typically used for digital filmmaking.
As described in greater detail below in conjunction with
As shown, a model trainer 216 executes on a processor 212 of the machine learning server 210 and is stored in a system memory 214 of the machine learning server 210. The processor 212 receives user input from input devices, such as a keyboard, a mouse, a joystick, a touchpad, or a touchscreen. In operation, the processor 212 is the master processor of the machine learning server 210, controlling and coordinating operations of other system components. In particular, the processor 212 may issue commands that control the operation of a graphics processing unit (GPU) that incorporates circuitry optimized for graphics and video processing, including, for example, video output circuitry. The GPU may deliver pixels to a display device that may be any conventional cathode ray tube, liquid crystal display, light-emitting diode display, or the like.
The memory 214 of the machine learning server 210 stores content, such as software applications and data, for use by the processor 212 and the GPU. The memory 214 may be any type of memory capable of storing data and software applications, such as a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash ROM), or any suitable combination of the foregoing. In some embodiments, a storage (not shown) may supplement or replace the memory 214. The storage may include any number and type of external memories that are accessible to the processor 212 and/or the GPU. For example, and without limitation, the storage may include a Secure Digital Card, an external Flash memory, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
It will be appreciated that the machine learning server 210 shown herein is illustrative and that variations and modifications are possible. For example, the number of processors 212, the number of GPUs, the number of system memories 214, and the number of applications included in the memory 214 may be modified as desired. Further, the connection topology between the various units in
The model trainer 216 is configured to train machine learning models, including a colorization model 250, as discussed in greater detail below in conjunction with
Subsequent to training, the trained machine learning models can be deployed to any suitable applications. Illustratively, a compositing application 246 that utilizes the colorization model 250 is stored in a memory 244, and executes on a processor 242, of the computing device 240. In some embodiments, the compositing application 246 determines alpha channels for images captured by the camera 110 of the system 100, described above in conjunction with
The number of machine learning servers and computing devices may be modified as desired in some embodiments. Further, the functionality included in any of the applications may be divided across any number of applications or other software that are stored and executed via any number of devices that are located in any number of physical locations.
As shown, the image 302 includes a magenta foreground and a green background. The alpha channel module 304 processes the image 302 to determine an alpha channel associated with the image 302, shown as a holdout matte 306 that is output by the alpha channel module 304 and that is an inverse of the alpha channel. Illustratively, the holdout matte 306 includes white pixels corresponding to a background in the image 302, black pixels corresponding to a foreground in the image 302, and various gray pixels that indicate semi-transparency at boundaries between the foreground and background. Optionally, the compositing application 246 can invert the holdout matte 306 to generate an alpha matte (not shown). As shown, the alpha channel module 304 can also generate a foreground image 308 in which the background from the image 302 has been replaced with black.
The alpha channel generation module 406 takes as input the color-corrected image 404 and outputs the holdout matte 306. As described, the holdout matte 306 is an inverse of the alpha channel that includes white pixels corresponding to a background in the image 302, black pixels corresponding to a foreground in the image 302, and various gray pixels that indicate semi-transparency at boundaries between the foreground and background. In some embodiments, the alpha channel generation module 406 can generate the holdout matte 306 by taking the background (green) color values in the color-corrected image 404 to be the monochromatic alpha channel, which essentially converts the background color values to grayscale values of the holdout matte 306.
More formally, let the background color of a pixel be denoted by the RGB triple [BR, BG, BB], the foreground subject of the pixel be denoted by [FR, FG, FB], and the composited appearance of the pixel be denoted by [CR, CG, CB]. Assuming a single alpha transparency α for all color channels, the matting equations are:
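$$
\begin{aligned}
C_R &= \alpha F_R + (1 - \alpha) B_R \\
C_G &= \alpha F_G + (1 - \alpha) B_G \\
C_B &= \alpha F_B + (1 - \alpha) B_B
\end{aligned}
\tag{1}
$$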
It should be noted that the matting equations (1) include seven total unknowns for a given image: BR, BG, BB, FR, FG, FB, α, as the known pixel values of the image include only CR, CG, CB. In cases where the background color BR, BG, BB can be measured (e.g., by photographing a clean plate without the foreground subject), there are four unknowns: FR, FG, FB, α. If a foreground subject reflects no blue light, then FB=0, and the blue channel of the subject in front of a blue screen gives a direct measurement of 1−α, which allows FR, FG, and α to be determined easily. The system 100, described above in conjunction with
Rearranging the equations (2) to solve for the three remaining unknowns yields
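$$
\alpha = 1 - \frac{C_G}{B_G}, \qquad
F_R = \frac{C_R - (1 - \alpha)\,B_R}{\alpha}, \qquad
F_B = \frac{C_B - (1 - \alpha)\,B_B}{\alpha}
\tag{3}
$$

where equations (3) are written here under the assumption that equations (2) are the matting equations (1) specialized to FG=0, with a measured background color [BR, BG, BB].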
It should be noted that equations (3) are solvable only if BG>0, as otherwise the first equation is undefined. Furthermore, if α is zero, the foreground colors FR and FB are undefined. The intuition behind equations (3) is that the background color (e.g., green) channel is guaranteed to be nonzero only in the background, and so the background color channel is essentially just a silhouette image of the subject, with pixel values of zero everywhere in the foreground. Accordingly, the background color channel is the inverse of the alpha channel, up to a scale factor. Although equations (3) assume a foreground that is guaranteed to have FG=0, FR=0 or FB=0 (or another color channel being zero) could instead have been chosen.
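By way of a non-limiting illustration, the following sketch evaluates equations (3) per pixel. The function name, the handling of near-zero denominators, and the clamping of α to [0, 1] are assumptions made for this example rather than requirements of the disclosed techniques.

```python
import numpy as np

def solve_matting(C, B, eps=1e-6):
    """Sketch of per-pixel matte extraction assuming FG = 0 and a measured background B.

    C -- HxWx3 color-corrected composited image, channels in RGB order
    B -- background color, either an HxWx3 clean plate or a length-3 RGB triple
    Returns (alpha, F), where F is the foreground with its green channel forced to zero.
    """
    C = C.astype(np.float64)
    B = np.broadcast_to(np.asarray(B, dtype=np.float64), C.shape)

    # The green channel of the background must be nonzero for the division below.
    alpha = 1.0 - C[..., 1] / np.maximum(B[..., 1], eps)
    alpha = np.clip(alpha, 0.0, 1.0)

    # Recover the red and blue foreground values; undefined where alpha == 0.
    a = np.maximum(alpha, eps)
    F = np.zeros_like(C)
    F[..., 0] = (C[..., 0] - (1.0 - alpha) * B[..., 0]) / a
    F[..., 2] = (C[..., 2] - (1.0 - alpha) * B[..., 2]) / a
    return alpha, F
```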
The bounce light subtraction module 408 takes as input the color-corrected image 404 and the holdout matte 306. Given such inputs, the bounce light subtraction module 408 generates the foreground image 308. As used herein, bounce light refers to light from the foreground that has reflected onto the background. As a result of bounce light, the foreground will not be seen against a perfect field of black, as is required for the foreground to be self-matting with a premultiplied alpha, but rather against a field of dim reflected foreground light. In some embodiments, the bounce light can be determined during calibration prior to capturing of the image 302. For example, for the LED volume 102 of
Returning to
The compositing module 314 takes the colorized foreground image 312 and the holdout matte 306 as inputs and generates the composite image 316, in which the background in the colorized foreground image 312, as indicated by the holdout matte 306, has been replaced by a portion of another image, shown as an image of an outdoor scene. The compositing module 314 can perform any technically feasible compositing, including known compositing techniques, in some embodiments.
In addition to generating composite images, alpha channels and/or holdout mattes (e.g., holdout matte 306) that are generated by the alpha channel module 304 can be used, along with corresponding images, to train a machine learning model to output alpha channels (or holdout mattes) given input images. As discussed in greater detail below in conjunction with
In some embodiments, when videos are captured using the system 100 described above in conjunction with
In some embodiments, given time-multiplexed video frames, such as the video frames 502 and 504, a set of the video frames that includes one color foreground lighting and a complement color background can be used to generate an alpha channel (and/or holdout matte) and composite image, as described above in conjunction with
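As a minimal sketch, and assuming for illustration only that the captured frames simply alternate between the two lighting conditions (the actual interleaving may differ), the time-multiplexed frames could be separated into the two sets as follows, with each set then processed as described above:

```python
def split_time_multiplexed(frames):
    """Hypothetical helper: separate alternating frames of a time-multiplexed capture.

    Assumes even-indexed frames were captured with the first foreground color over the
    complement background, and odd-indexed frames with the colors swapped.
    """
    first_color_frames = frames[0::2]
    swapped_color_frames = frames[1::2]
    return first_color_frames, swapped_color_frames
```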
Illustratively, the training module 610 takes the images 606 and 608 as input and trains a colorization model 250 using the images 606 and 608 as training data. In some embodiments, the training module 610 uses the image 606 as the model input and the image 608 as the expected output during training. In some embodiments, any technically feasible training technique can be employed by the training module 610, such as backpropagation with gradient descent or a variation thereof. For example, in some embodiments, the Adam optimizer can be used during training. In some embodiments, the colorization model 250 can either be trained from scratch or trained by fine-tuning a pre-trained model for a particular scene, with the pre-trained model having been previously trained on other images of the world. In some embodiments, data augmentation can also be employed, such as by generating croppings of the input image 602 and randomly perturbing the image luminance and color balance in the croppings.
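The disclosure does not mandate any particular framework; the following is a minimal PyTorch-style sketch of such a training loop, in which the model interface, the use of an L1 loss, and the data-loader format are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F
from torch.optim import Adam

def train_colorization(model, loader, epochs=10, lr=1e-4, device="cuda"):
    """Sketch of training the colorization model 250 (hypothetical interface).

    loader yields (input_img, target_img) pairs corresponding to the images 606 and 608;
    random crops and luminance/color-balance perturbations can be applied inside the
    loader as data augmentation.
    """
    model = model.to(device)
    optimizer = Adam(model.parameters(), lr=lr)  # the Adam optimizer mentioned above
    for _ in range(epochs):
        for inputs, targets in loader:
            inputs, targets = inputs.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = F.l1_loss(model(inputs), targets)  # the loss choice is an assumption
            loss.backward()
            optimizer.step()
    return model
```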
As shown, a method 800 begins at step 802, where the compositing application 246 receives an image that includes foreground lighting of a first color and a background having a complement color of the first color. In some embodiments, the image can be captured by a camera in a system such as the system 100, described above in conjunction with
At step 804, the compositing application 246 removes color crosstalk from the received image based on a color calibration to generate a color-corrected image. In some embodiments, the compositing application 246 removes the color crosstalk by applying, to the received image, a color calibration transformation. In such cases, the color calibration transformation can be a matrix determined during a calibration phase in which a color chart including a white square is captured while being illuminated separately by red, blue, and green light, as described above in conjunction with
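As an illustrative sketch (the function and variable names are hypothetical), applying such a color calibration matrix to every pixel of the received image could be implemented as follows:

```python
import numpy as np

def remove_color_crosstalk(image, calibration_matrix):
    """Apply a 3x3 color-calibration matrix to each pixel of an HxWx3 image (a sketch).

    The matrix itself is assumed to have been measured during the calibration phase,
    e.g., from a color chart lit separately by red, green, and blue light.
    """
    h, w, _ = image.shape
    flat = image.reshape(-1, 3).astype(np.float64)
    corrected = flat @ np.asarray(calibration_matrix, dtype=np.float64).T
    return np.clip(corrected, 0.0, None).reshape(h, w, 3)
```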
At step 806, the compositing application 246 generates a matte (i.e., alpha channel) based on an inverse of the complement color in the pre-processed image. In addition or alternatively, the compositing application 246 can generate a holdout matte based on the complement color in the pre-processed image. As described, a holdout matte can be generated by taking the background (green) color values in the pre-processed image to be the monochromatic alpha channel, which essentially converts the background color values into grayscale values of the holdout matte. A matte can be obtained by inverting the holdout matte.
At step 808, the compositing application 246 optionally processes the matte via the alpha channel colorization model 704 to generate a colorized matte. As described above in conjunction with
As shown, a method 900 begins at step 902, where the compositing application 246 generates a foreground image based on (1) an image that includes foreground lighting of a first color and background having a complement color, and (2) a corresponding matte. The compositing application 246 can generate the foreground image by removing the background, as indicated by the matte, from the image.
At step 904, the compositing application 246 subtracts bounce light from the foreground image. In some embodiments, bounce light that was previously measured during calibration is multiplied by the holdout matte, and the result is subtracted from the foreground image at step 904.
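A minimal sketch of this step, assuming the bounce light was measured during calibration as an RGB value (or a per-pixel image) and that the holdout matte is normalized to [0, 1]:

```python
import numpy as np

def subtract_bounce_light(foreground, holdout_matte, bounce_light):
    """Subtract calibrated bounce light, weighted by the holdout matte (a sketch).

    foreground    -- HxWx3 foreground image with the background removed
    holdout_matte -- HxW matte that is 1 in the background and 0 in the foreground
    bounce_light  -- length-3 RGB triple or HxWx3 array measured during calibration
    """
    bounce = np.asarray(bounce_light, dtype=np.float64)
    result = foreground - bounce * holdout_matte[..., None]
    return np.clip(result, 0.0, None)
```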
At step 906, the compositing application 246 processes the foreground image (from which bounce light has been subtracted) using the colorization model 250 to generate a colorized foreground image. As described above in conjunction with
At step 908, the compositing application 246 composites the colorized foreground image with a background image to generate a composite image. Any technically feasible compositing, including known compositing techniques, can be performed in some embodiments.
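Because the colorized foreground is premultiplied against black, the compositing at step 908 can reduce to the standard "over" operation. A minimal sketch, assuming the holdout matte is normalized to [0, 1] and the images have matching dimensions:

```python
import numpy as np

def composite_over(colorized_foreground, holdout_matte, background):
    """Composite a premultiplied foreground over a new background (a sketch).

    Because the foreground is premultiplied (black where fully transparent), the composite
    is the foreground plus the new background scaled by the holdout matte (i.e., 1 - alpha).
    """
    return colorized_foreground + background * holdout_matte[..., None]
```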
At step 910, the compositing application 246 optionally adds motion blur to the composite image. Any technically feasible techniques for adding motion blur, including known techniques that generate synthetic motion blur based on optical flow, can be performed in some embodiments.
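The disclosure does not specify any particular motion-blur technique. As a rough, non-limiting sketch, one flow-based approach using OpenCV could average samples taken backwards along each pixel's optical-flow vector; the flow parameters and sampling scheme below are assumptions.

```python
import cv2
import numpy as np

def add_motion_blur(prev_frame, frame, num_samples=8):
    """Rough sketch of flow-based synthetic motion blur on 8-bit BGR frames (an assumption).

    Dense optical flow is estimated between the previous and current composite frames, and
    the current frame is averaged over samples taken backwards along each pixel's flow vector.
    """
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)

    h, w = gray.shape
    xs, ys = np.meshgrid(np.arange(w, dtype=np.float32), np.arange(h, dtype=np.float32))
    accum = np.zeros_like(frame, dtype=np.float64)
    for i in range(num_samples):
        t = i / max(num_samples - 1, 1)
        map_x = xs - t * flow[..., 0]
        map_y = ys - t * flow[..., 1]
        accum += cv2.remap(frame, map_x, map_y, cv2.INTER_LINEAR)
    return (accum / num_samples).astype(frame.dtype)
```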
As shown, a method 1000 begins at step 1002, where the model trainer 216 receives images captured using white light and a black background.
At step 1004, the model trainer 216 trains the colorization model 250 using one color of the images as input and a complement color of the images as expected output. Any technically feasible training techniques, such as the training techniques described above in conjunction with
As shown, a method 1100 begins at step 1102, where the model trainer 216 receives images and corresponding mattes. In some embodiments, the mattes can be generated for the images according to the method 800 described above in conjunction with
At step 1104, the model trainer 216 trains a machine learning model to generate mattes using the images as input and the corresponding mattes as expected output. Any suitable machine learning model, such as a deep neural network, can be trained in some embodiments. Any technically feasible training techniques, such as backpropagation with gradient descent or a variant thereof, can be used in some embodiments. Subsequent to training, the trained machine learning model can be deployed for use in any suitable application, such as a videoconferencing application, an image or video editing application, or the like.
In sum, techniques are disclosed for determining alpha channels for images. In some embodiments, an image, which can be a standalone image or a frame of a video, is captured using foreground lighting of a particular color and a background having a complement color. The image is pre-processed to correct for color crosstalk. The complement color in the pre-processed image is converted to grayscale to generate a holdout matte, which can be inverted to obtain the alpha channel (i.e., matte) that indicates pixels of the image belonging to the foreground and/or background. Bounce light is also removed by subtracting the bounce light, which can be determined during calibration, multiplied by the holdout matte. Then, a trained machine learning model can be applied to convert a foreground of the image having the particular color into a colorized foreground image that also includes the complement color. The colorized foreground image can then be composited with an image of another background. In addition, the image and corresponding alpha channel can be used to train a machine learning model to predict an alpha channel given an image.
At least one technical advantage of the disclosed techniques relative to the prior art is that the disclosed techniques can be used to generate relatively high quality mattes for images, without requiring the use of any complex matting algorithms or requiring the manual tuning of any algorithm parameters. Accordingly, the disclosed techniques can be more computationally efficient and less labor intensive than prior art approaches. In addition, the disclosed techniques utilize images captured by a single camera rather than images captured by two separate cameras where the resulting images need to be precisely aligned with each other. Thus, the disclosed techniques can be more accurate and can generate images without artifacts resulting from misalignments relative to what can be achieved using prior art approaches. These technical advantages represent one or more technological improvements over prior art approaches.
Any and all combinations of any of the claim elements recited in any of the claims and/or any elements described in this application, in any fashion, fall within the contemplated scope of the present disclosure and protection.
The descriptions of the various embodiments have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments.
Aspects of the present embodiments may be embodied as a system, method or computer program product. Accordingly, aspects of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “module” or “system.” Furthermore, aspects of the present disclosure may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
Aspects of the present disclosure are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine. The instructions, when executed via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions/acts specified in the flowchart and/or block diagram block or blocks. Such processors may be, without limitation, general-purpose processors, special-purpose processors, application-specific processors, or field-programmable gate arrays.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
While the preceding is directed to embodiments of the present disclosure, other and further embodiments of the disclosure may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.
This application claims priority benefit of the United States Provisional Patent Application titled, “TECHNIQUES FOR SPECTRALLY AND/OR TEMPORALLY MULTIPLEXED ALPHA MATTING,” filed on May 12, 2023, and having Ser. No. 63/502,034. The subject matter of this related application is hereby incorporated herein by reference.