The present invention relates to the field of video processing and, in particular, to a system and a method for achieving high quality foreground segmentation using an active background.
Chroma key compositing, or chroma keying, is a special effects/post-production technique for compositing (layering) two images or video streams together based on color hues (chroma range). The technique has been used heavily to remove a background from the subject of a photo or video, particularly in the newscasting, motion picture and video game industries.
This technique still has some limitations. Obviously, the foreground objects and persons cannot feature a color similar to the background. The main issue is to achieve even lighting: the whole background has to appear to the camera as a single uniform color and has to be free of shadows cast by the foreground onto the colored background. Reflective materials can also provoke segmentation errors when the background color spills onto them. Even a difference in the focal length of the lenses used can affect the success of chroma keying.
The proposed invention describes not only how to achieve high quality and smooth foreground segmentation but also how to achieve proper lighting compensation and resilience. The system is able to operate in a wide range of lighting conditions and can potentially be used outdoors with indirect sunlight. The invention could lead to breakthrough capabilities and cost reduction for applications in the fields covered by, but not limited to:
The present invention is embodied in a system and a method capable of achieving high quality foreground segmentation using an active background, wherein the foreground is any object or person located between a camera and a background. The system comprises an active background, one or several multispectral cameras, a hardware synchronizer, an invisible light driver and a main computer.
The system uses the active background to illuminate in the invisible light spectrums and the multispectral cameras to simultaneously record images in both visible and invisible spectrums. Comparing the invisible light images to a reference invisible light image set makes it possible to identify the pixels that are occupied by the foreground, and thus to generate a foreground mask. According to an embodiment of the invention, the main computer can further compute gradients from foreground to background pixels, used to interpolate color and alpha values. According to some embodiments of the present invention, the active background is electronically controlled and can trigger invisible light flashes only when required by the system. This way, the system can reduce artifacts by comparing the amount of natural ambient invisible light to the invisible light emitted by the active background. The system finally generates a live stream of images containing an alpha channel ensuring a smooth spatial transition with the foreground.
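As an illustrative sketch (not part of the original disclosure), the mask generation step can be expressed as a per-pixel comparison of the current invisible light frame against a stored background reference; the array names, the threshold factor `k` and the standard deviation floor are assumptions:

```python
import numpy as np

def foreground_mask(il_frame, ref_mean, ref_std, k=3.0):
    """Mark a pixel as foreground when its IL intensity deviates from the
    stored background reference by more than k standard deviations.
    k and the minimum std floor of 1.0 are illustrative assumptions."""
    deviation = np.abs(il_frame.astype(np.float32) - ref_mean)
    return deviation > k * np.maximum(ref_std, 1.0)

# Hypothetical 4x4 IL frame: the active background reads ~200; a foreground
# object occludes the backlight over a 2x2 region and reads ~40.
ref_mean = np.full((4, 4), 200.0)
ref_std = np.full((4, 4), 2.0)
frame = ref_mean.copy()
frame[1:3, 1:3] = 40.0
mask = foreground_mask(frame, ref_mean, ref_std)  # True on the 2x2 region
```

In practice the threshold would be tuned per deployment, since sensor noise and residual ambient IL vary with the setup.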
The invisible light (further referred to as IL) is light that is not visible to the human eye but that can be filmed by dedicated cameras. This includes any spectrum that is below or above what the human eye can perceive (such as, but not limited to, the infrared or ultraviolet spectrums). The visible light will be further referred to as VL. Ambient invisible light is IL that does not come from the active background. This can be natural infrared light from the sun, for example, or non-visible light emitted by halogen light bulbs. This non-controlled light generates noise for the system and will be further referred to as ambient IL.
The active background is one or several surfaces emitting or reflecting IL towards one or several cameras. According to one of the embodiments of the invention, the background comprises an IL emitter (a source of invisible light such as projectors or LED strips) and a background surface that can either reflect or be transparent to IL. In such a configuration at least two setups can be used for the background. For example, the IL emitter faces the background surface; in that case, the IL is emitted towards the surface and is reflected by the opaque, reflective background surface. Another example is a setup wherein the IL emitter is behind the background surface and faces the camera. In this case, the IL travels through the background surface towards the camera. This time, the background surface is not opaque and reflective, but translucent to IL. The background surface may be painted or printed provided it respects the requirements explained above (IL reflectivity or translucency). Translucency may be achieved using micro-perforated printed material.
According to some embodiments of the present invention, electronically controlled IL emitting devices are used that can be switched on and off at a very high pace by the Hardware Synchronizer. The background is also called “active background” because it emits IL on demand.
The multispectral camera is able to record, for each pixel, a red, green and blue value (RGB color model) from the visible spectrum and also a value for a specific narrow band in the IL spectrums corresponding to the active background. In order to improve and widen the resilience to external lighting changes, the camera is able to apply different gain settings to the visible light sensors and the IL sensors. External synchronization signals for each of the channels (RGB and IL) are available through general purpose input/output pins. Prism-based multi-CCD cameras often feature independent sensors for the IL and visible spectrums.
The Hardware Synchronizer acts as a metronome that synchronizes the IL flashes with the camera's image capture by means of at least three signals that can be generated:
It can be a simple microcontroller board running program logic to generate the proper synchronization signals, logic that will be described in detail later in the document. Each of the signals can contain additional information, such as timing and aperture time information for the camera signals.
According to some embodiments of the present invention, the hardware synchronizer can work in different modes such as Flash, Strobe or Double Strobe, depending on the available functionalities of the camera and the desired balance between latency and resilience. However, when ambient IL is very low (generally indoors with a controlled lighting system), the Hardware Synchronizer can be removed and the IL emitter left always switched on. This allows the usage of a camera that does not feature a synchronization signal.
The IL Driver powers up and controls the IL emitter according to the system requirements. As an example, if the lighting system is based on LED technology, then its main purpose is to provide the highest supported electric current to the lighting system with less than 10 µs ramp up and down.
The main computer runs the video signal processing and performs the segmentation procedure. It has to be able to read the stream of incoming image data and have enough computational power, both in terms of CPU and GPU, to achieve real time processing.
A general scheme of the components is presented in
The main features of the system consist of one or several of the following:
The first sub-system (a) comprises:
The second sub-system (b) comprises:
The third sub-system (c) comprises:
The last sub-system (d) comprises:
Other features are inherent in the methods and systems disclosed or will become apparent to those skilled in the art from the following detailed description of embodiments and accompanying drawings.
In a preferred embodiment, the system is composed of an active background (110), one or several multispectral cameras (120), a hardware synchronizer (130), an invisible light driver (140) and a main computer (150). A general scheme of the components is presented in
In one embodiment of the invention, a system setup (312) is initially performed. The components are placed in a configuration allowing the subject to be located between the camera (120) and the active background (110). One example of such a system configuration is schematically illustrated in
In one embodiment of the invention, subsequent to the system setup (312), the system can be triggered to start a new recording (314). In a preferred embodiment the whole system can start up by itself once plugged in: the main computer (150) automatically boots a client application dedicated to the real time processing and the hardware synchronizer (130) always emits its trigger signals. The client application will further send the startup sequence and change the hardware synchronizer (130) mode if needed.
In one embodiment of the invention, the real time processing is preceded by a calibration step during which background reference images (316) are acquired and stored in the main computer (150). This step requires that no foreground object is present. The reference images are acquired both with and without illumination in the VL and IL spectrums. The main computer (150) creates 3 reference maps, storing for each pixel a mean value and a standard deviation, as follows: the first reference map for the 3 color channels of the visible spectrum (RGB), the second with the IL spectrum values with the backlighting on and the third in the IL spectrum with the backlighting off. According to some embodiments of the present invention, such a recalibration (330) can be executed during the video recording, the reference images being automatically recomputed when no foreground object is detected for a given period of time. For example, the recalibration is mostly needed when the lighting conditions are changing drastically. Other situations that could require recalibration include scenarios where the background has been moved slightly (happening, for example, when the subject collides with the background while playing).
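A minimal sketch of the reference map construction, assuming each map is built from a short stack of calibration frames (the number of frames and the array shapes are illustrative assumptions, not part of the disclosure):

```python
import numpy as np

def build_reference_map(frames):
    """Per-pixel mean and standard deviation over a stack of calibration
    frames. One such map would be built per modality: RGB, IL with the
    backlighting on, and IL with the backlighting off."""
    stack = np.stack([f.astype(np.float32) for f in frames])
    return stack.mean(axis=0), stack.std(axis=0)

# Hypothetical calibration run: 8 noisy IL frames of the empty background.
rng = np.random.default_rng(0)
frames = [200.0 + rng.normal(0.0, 2.0, size=(4, 4)) for _ in range(8)]
mean_map, std_map = build_reference_map(frames)
```

The per-pixel standard deviation is what later allows a deviation threshold to adapt to locally noisy regions of the background.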
In a preferred embodiment, the real time processing is performed frame by frame (320) by the main computer (150) based on the frame images (318) acquired by the camera (120) in both visible and invisible spectrums. During the IL frame image acquisition (318) the background lighting is ensured either by keeping the IL emitter always switched on or by using the hardware synchronizer (130) to set the timing for the invisible lighting system and the camera exposure, wherein the IL exposure time is kept very short compared to the visible light exposure time. Considering the usage of the hardware synchronizer (130), different scenarios will be further described.
A step-by-step method for performing the real time frame processing (320) is schematically illustrated in
In one embodiment of the invention, in order to compute the alpha channel (327) the system estimates the ratio of foreground visibility (“alpha”) as follows: “alpha”=(il−bg_il)/(fg_il−bg_il), where bg_il is an estimate of the IL intensity that would be measured if the pixel showed only the background, and where fg_il is an estimate of the IL intensity that would be measured if the pixel totally showed the foreground object. One possibility to estimate bg_il is to use the proper reference map taken when no foreground is present (with either the backlighting on or off). Another option is to search for the closest pixel marked as background in the trimap and to use the current frame IL value at this location as an estimate for bg_il. One possibility to estimate fg_il is to search for the closest pixel marked as foreground in the trimap, and to use the IL value at this location as fg_il. When searching for the nearest pixel, different methodologies can be applied. One possibility is to accelerate the search for the nearest pixel satisfying some properties by using a Distance Transform algorithm, modified to keep track of which pixel is the closest [Felzenszwalb 2004].
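The alpha estimation formula can be sketched directly; the epsilon guard against division by zero and the clipping to [0, 1] are implementation assumptions added for robustness against sensor noise:

```python
import numpy as np

def alpha_ratio(il, bg_il, fg_il, eps=1e-6):
    """alpha = (il - bg_il) / (fg_il - bg_il), per the formula above.
    eps and the clipping to [0, 1] are assumptions, not part of the text."""
    denom = np.asarray(fg_il - bg_il, dtype=np.float64)
    denom = np.where(np.abs(denom) < eps, eps, denom)
    return np.clip((np.asarray(il, dtype=np.float64) - bg_il) / denom, 0.0, 1.0)

# Three pixels: pure background (il == bg_il), an even mix, and pure
# foreground (il == fg_il), with the backlight at 200 and occlusion at 40.
il = np.array([200.0, 120.0, 40.0])
alpha = alpha_ratio(il, bg_il=200.0, fg_il=40.0)  # -> [0.0, 0.5, 1.0]
```

Note that bg_il exceeds fg_il when the active background emits IL and the foreground occludes it; the ratio handles either sign of the denominator.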
In one embodiment of the invention, the visibility ratio “alpha” can be used to remove the influence of the background on “unknown” pixels. In one of the embodiments the color “fg” without background influence can be computed in the following way: “fg”=measured/alpha−estimatedBg*(1/alpha−1), where “measured” is the measured RGB pixel from the input image, and where estimatedBg is an estimate of the RGB color of the background at this location. One possible method to estimate the background color for a pixel within the unknown zone of the trimap is to look at the nearest pixel marked as background in the trimap and use its color, or to use an average of neighboring colors. Another option is to rely on a color background model. In a preferred embodiment a reference color model showing the background is acquired. It cannot be used directly, since illumination and camera acquisition settings might have changed between the background acquisition time and the current frame; it is therefore necessary to estimate illumination locally to correct the color background model. This consists of multiplying the background model pixel at the estimated location by the ratio between a current frame pixel at a nearby known background location and the background model pixel at that same location.
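The de-contamination formula can be verified on a synthetic mixed pixel; the alpha floor and the clamp to the 8-bit range are assumptions added for numerical safety:

```python
import numpy as np

def foreground_color(measured, alpha, estimated_bg):
    """fg = measured/alpha - estimatedBg*(1/alpha - 1), per the formula
    above. The alpha floor and clamp to [0, 255] are assumptions."""
    a = max(float(alpha), 1e-6)
    fg = (np.asarray(measured, dtype=np.float64) / a
          - np.asarray(estimated_bg, dtype=np.float64) * (1.0 / a - 1.0))
    return np.clip(fg, 0.0, 255.0)

# A pixel that is 50% foreground: measured = 0.5*fg + 0.5*bg, so the
# formula should recover the original foreground color exactly.
bg = np.array([0.0, 200.0, 0.0])
true_fg = np.array([200.0, 40.0, 40.0])
measured = 0.5 * true_fg + 0.5 * bg
recovered = foreground_color(measured, 0.5, bg)  # -> [200.0, 40.0, 40.0]
```

For small alpha values the division amplifies noise, which is why such pixels are usually rendered mostly transparent anyway.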
According to some embodiments of the present invention, the hardware synchronizer (130) is used in either Strobe or Double Strobe mode so that noise reduction (321) can be performed as the first step in frame processing. This is achieved by comparing two consecutive IL images (one with the background IL on and the other with the background IL off), subtracting one from the other and storing the absolute value for each pixel. This value has the ambient IL influence removed and can be further used in the real time processing. For example, when in Double Strobe mode one possibility is to acquire the two IL frames with a very short aperture time (for example close to 200 µs) and a negligible delay between them (for example less than 1 ms). Additionally, when in Strobe mode, the last 2 frames are stored; by comparing the current frame with the one taken under the same lighting conditions, movement compensation can be further achieved. In one embodiment of the invention, camera movement compensation can also be achieved by using, for example, external augmented reality engines to compute the camera position and movement and find the proper pixel-to-pixel matching to the reference background model.
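A minimal sketch of the subtraction step, assuming negligible motion between the two short exposures (the frame sizes and intensity values are hypothetical):

```python
import numpy as np

def ambient_free_il(frame_on, frame_off):
    """Absolute per-pixel difference of two consecutive IL frames (active
    background on vs. off): the ambient IL term, present in both frames,
    cancels out. Assumes negligible motion between the two exposures."""
    diff = frame_on.astype(np.int32) - frame_off.astype(np.int32)
    return np.abs(diff).astype(np.uint16)

# Ambient IL of 30 everywhere; the backlight adds 150 except where occluded.
ambient = np.full((4, 4), 30, dtype=np.uint8)
frame_off = ambient.copy()
frame_on = (ambient + 150).astype(np.uint8)
frame_on[1, 1] = ambient[1, 1]                # foreground blocks the backlight
clean = ambient_free_il(frame_on, frame_off)  # 150 on background, 0 on foreground
```

The signed intermediate (int32) avoids unsigned wrap-around before the absolute value is taken.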
Referring to
The processes described herein are not limited to use with the hardware and software of
The system may be implemented, at least in part, via a computer program product (e.g., in a non-transitory machine-readable storage medium such as, for example, a non-transitory computer-readable medium), for execution by, or to control the operation of, data processing apparatus (e.g., a programmable processor, a computer, or multiple computers). Each such program may be implemented in a high level procedural or object-oriented programming language to communicate with a computer system. However, the programs may be implemented in assembly or machine language. The language may be a compiled or an interpreted language and it may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program may be deployed to be executed on one computer or on multiple computers at one site or distributed across multiple sites interconnected by a communication network. A computer program may be stored on a non-transitory machine-readable medium that is readable by a general or special purpose programmable computer for configuring and operating the computer when the non-transitory machine-readable medium is read by the computer to perform the processes described herein. For example, the processes described herein may also be implemented as a non-transitory machine-readable storage medium, configured with a computer program, where upon execution, instructions in the computer program cause the computer to operate in accordance with the processes. A non-transitory machine-readable medium may include but is not limited to a hard drive, compact disc, flash memory, non-volatile memory, volatile memory, magnetic diskette and so forth, but does not include a transitory signal per se.
The processing blocks (for example, in the processes described herein) associated with implementing the system may be performed by one or more programmable processors executing one or more computer programs to perform the functions of the system. All or part of the system may be implemented as special purpose logic circuitry (e.g., an FPGA (field-programmable gate array) and/or an ASIC (application-specific integrated circuit)). All or part of the system may be implemented using electronic hardware circuitry that includes electronic devices such as, for example, at least one of a processor, a memory, a programmable logic device or a logic gate.
Elements of different embodiments described herein may be combined to form other embodiments not specifically set forth above. Various elements, which are described in the context of a single embodiment, may also be provided separately or in any suitable subcombination. Other embodiments not specifically described herein are also within the scope of the following claims.