1. Field of the Invention
This invention relates generally to the field of motion capture. More particularly, the invention relates to an improved apparatus and method for performing motion capture and image reconstruction.
2. Description of the Related Art
“Motion capture” refers generally to the tracking and recording of human and animal motion. Motion capture systems are used for a variety of applications including, for example, video games and computer-generated movies. In a typical motion capture session, the motion of a “performer” is captured and translated to a computer-generated character.
As illustrated in
By contrast, in an optical motion capture system, such as that illustrated in
A motion tracking unit 150 coupled to the cameras is programmed with the relative position of each of the markers 101, 102 and/or the known limitations of the performer's body. Using this information and the visual data provided from the cameras 120-122, the motion tracking unit 150 generates artificial motion data representing the movement of the performer during the motion capture session.
A graphics processing unit 152 renders an animated representation of the performer on a computer display 160 (or similar display device) using the motion data. For example, the graphics processing unit 152 may apply the captured motion of the performer to different animated characters and/or include the animated characters in different computer-generated scenes. In one implementation, the motion tracking unit 150 and the graphics processing unit 152 are programmable cards coupled to the bus of a computer (e.g., the PCI and AGP buses found in many personal computers). One well-known company which produces motion capture systems is Motion Analysis Corporation (see, e.g., www.motionanalysis.com).
A system and method are described for performing motion capture on a subject using transparent makeup, paint, dye or ink that is visible to certain cameras, but invisible to other cameras. For example, a system according to one embodiment of the invention comprises the application of makeup, paint, dye or ink on a subject in a random pattern that contains a phosphor that is transparent in the visible light spectrum, but is emissive in a non-visible spectrum such as the infrared (IR) or ultraviolet (UV) spectrum; using visible light such as ambient light or daylight to illuminate the subject; using a first plurality of cameras sensitive in the visible light spectrum to capture the normal coloration of the subject; and using a second plurality of cameras sensitive in a non-visible spectrum to capture the random pattern.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
A better understanding of the present invention can be obtained from the following detailed description in conjunction with the drawings, in which:
a illustrates one embodiment of the invention during a time interval when the light panels are lit.
b illustrates one embodiment of the invention during a time interval when the light panels are dark.
a illustrates a prior art stop-motion animation stage.
b illustrates one embodiment of the invention where stop-motion characters and the set are captured together.
c illustrates one embodiment of the invention where the stop-motion set is captured separately from the characters.
d illustrates one embodiment of the invention where a stop-motion character is captured separately from the set and other characters.
e illustrates one embodiment of the invention where a stop-motion character is captured separately from the set and other characters.
a-b illustrate one embodiment of the invention for capturing images using two different types of light panels.
a-33b illustrate one embodiment of the invention for capturing images of surfaces with transparent IR-emissive makeup.
Described below is an improved apparatus and method for performing motion capture using shutter synchronization and/or phosphorescent makeup, paint or dye. In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced without some of these specific details. In other instances, well-known structures and devices are shown in block diagram form to avoid obscuring the underlying principles of the invention.
The assignee of the present application previously developed a system for performing color-coded motion capture and a system for performing motion capture using a series of reflective curves painted on a performer's face. These systems are described in the co-pending applications entitled “A
The assignee of the present application also previously developed a system for performing motion capture of random patterns applied to surfaces. This system is described in the co-pending applications entitled “A
The assignee of the present application also previously developed a system for performing motion capture using shutter synchronization and phosphorescent paint. This system is described in the co-pending application entitled “A
As described in these co-pending applications, by analyzing curves or random patterns applied as makeup on a performer's face rather than discrete marked points or markers on a performer's face, the motion capture system is able to generate significantly more surface data than traditional marked point or marker-based tracking systems. The random patterns or curves are painted on the face of the performer using retro-reflective, non-toxic paint or theatrical makeup. In one embodiment of the invention, non-toxic phosphorescent makeup is used to create the random patterns or curves. By utilizing phosphorescent paint or makeup combined with synchronized lights and camera shutters, the motion capture system is able to better separate the patterns applied to the performer's face from the normally-illuminated image of the face or other artifacts of normal illumination such as highlights and shadows.
a and 2b illustrate an exemplary motion capture system described in the co-pending applications in which a random pattern of phosphorescent makeup is applied to a performer's face and the motion capture system is operated in a light-sealed space. When the synchronized light panels 208-209 are on as illustrated
Grayscale dark cameras 204-205 are synchronized to the light panels 208-209 using the synchronization signal generator PCI Card 224 (an exemplary PCI card is a PCI-6601 manufactured by National Instruments of Austin, Tex.) coupled to the PCI bus of synchronization signal generator PC 220, which is coupled to the data processing system 210 so that all of the systems are synchronized together. Light Panel Sync signal 222 provides a TTL-level signal to the light panels 208-209 such that when the signal 222 is high (i.e., ≥2.0V), the light panels 208-209 turn on, and when the signal 222 is low (i.e., ≤0.8V), the light panels turn off. Dark Cam Sync signal 221 provides a TTL-level signal to the grayscale dark cameras 204-205 such that when signal 221 is low the camera 204-205 shutters open and each camera 204-205 captures an image, and when signal 221 is high the shutters close and the cameras transfer the captured images to camera controller PCs 225. The synchronization timing (explained in detail below) is such that the camera 204-205 shutters open to capture a frame when the light panels 208-209 are off (the “dark” interval). As a result, grayscale dark cameras 204-205 capture images of only the output of the phosphorescent makeup. Similarly, Lit Cam Sync signal 223 provides a TTL-level signal to color lit cameras 214-215 such that when signal 223 is low the camera 214-215 shutters open and each camera 214-215 captures an image, and when signal 223 is high the shutters close and the cameras transfer the captured images to camera controller computers 225. Color lit cameras 214-215 are synchronized (as explained in detail below) such that their shutters open to capture a frame when the light panels 208-209 are on (the “lit” interval). As a result, color lit cameras 214-215 capture images of the performer's face illuminated by the light panels.
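The signal conventions in the preceding paragraph can be summarized in a short Python sketch. It is illustrative only: the function names and dictionary keys are hypothetical, the lit cameras are assumed to follow their own Lit Cam Sync signal with the same low-means-open convention as the dark cameras, and voltages between the two TTL thresholds are treated as undefined.

```python
from typing import Optional

TTL_HIGH_MIN_V = 2.0  # at or above this voltage the signal reads as high
TTL_LOW_MAX_V = 0.8   # at or below this voltage the signal reads as low


def ttl_level(volts: float) -> Optional[bool]:
    """Return True for high, False for low, None for the undefined band."""
    if volts >= TTL_HIGH_MIN_V:
        return True
    if volts <= TTL_LOW_MAX_V:
        return False
    return None


def shutter_open(sync_volts: float) -> Optional[bool]:
    """Camera shutters open while their sync signal is low."""
    level = ttl_level(sync_volts)
    return None if level is None else not level


def device_states(light_sync_v: float, dark_cam_sync_v: float,
                  lit_cam_sync_v: float) -> dict:
    """Map the three sync voltages to panel and shutter states."""
    return {
        "light_panels_on": ttl_level(light_sync_v),  # panels on while high
        "dark_shutters_open": shutter_open(dark_cam_sync_v),
        "lit_shutters_open": shutter_open(lit_cam_sync_v),
    }
```

Under these assumptions, the lit interval corresponds to the light panel signal high, the dark camera signal high, and the lit camera signal low; the dark interval is the reverse.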
As used herein, grayscale cameras 204-205 may be referenced as “dark cameras” or “dark cams” because their shutters are normally open only when the light panels 208-209 are dark. Similarly, color cameras 214-215 may be referenced as “lit cameras” or “lit cams” because their shutters are normally open only when the light panels 208-209 are lit. While grayscale and color cameras are used specifically for each lighting phase in one embodiment, either grayscale or color cameras can be used for either light phase in other embodiments.
In one embodiment, light panels 208-209 are flashed rapidly at 90 flashes per second (as driven by a 90 Hz square wave from Light Panel Sync signal 222), with the cameras 204-205 and 214-215 synchronized to them as previously described. At 90 flashes per second, the light panels 208-209 are flashing at a rate faster than can be perceived by the vast majority of humans. As a result, the performer (as well as any observers of the motion capture session) perceives the room as being steadily illuminated and is unaware of the flashing, and the performer is able to proceed with the performance without distraction from the flashing light panels 208-209.
As described in detail in the co-pending applications, the images captured by cameras 204-205 and 214-215 are recorded by camera controllers 225 (coordinated by a centralized motion capture controller 206) and the images and image sequences so recorded are processed by data processing system 210. The images from the various grayscale dark cameras are processed so as to determine the geometry of the 3D surface of the face 207. Further processing by data processing system 210 can be used to map the color lit images captured onto the geometry of the surface of the face 207. Yet further processing by the data processing system 210 can be used to track surface points on the face from frame-to-frame.
In one embodiment, each of the camera controllers 225 and central motion capture controller 206 is implemented using a separate computer system. Alternatively, the camera controllers and motion capture controller may be implemented as software executed on a single computer system or as any combination of hardware and software. In one embodiment, the camera controller computers 225 are rack-mounted computers, each using a 945GT Speedster-A4R motherboard from MSI Computer Japan Co., Ltd. (C&K Bldg. 6F 1-17-6, Higashikanda, Chiyoda-ku, Tokyo 101-0031 Japan) with 2 Gbytes of random access memory (RAM) and a 2.16 GHz Intel Core Duo central processing unit from Intel Corporation, and a 300 GByte SATA hard disk from Western Digital, Lake Forest Calif. The cameras 204-205 and 214-215 interface to the camera controller computers 225 via IEEE 1394 cables.
In another embodiment the central motion capture controller 206 also serves as the synchronization signal generator PC 220. In yet another embodiment the synchronization signal generator PCI card 224 is replaced by using the parallel port output of the synchronization signal generator PC 220. In such an embodiment, each of the TTL-level outputs of the parallel port is controlled by an application running on synchronization signal generator PC 220, switching each TTL-level output to a high state or a low state in accordance with the desired signal timing. For example, bit 0 of the PC 220 parallel port is used to drive synchronization signal 221, bit 1 is used to drive signal 222, and bit 2 is used to drive signal 223. However, the underlying principles of the invention are not limited to any particular mechanism for generating the synchronization signals.
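As a rough illustration of the parallel port embodiment, the following Python sketch computes the data byte an application might hold on the port during each phase. It is a sketch under stated assumptions, not the described implementation: the constant and function names are hypothetical, bit 2 is assumed to carry the Lit Cam Sync signal, the timing is idealized with no margins between phases, and the hardware-specific step of writing the byte to the port is omitted.

```python
# Hypothetical bit assignments following the example in the text.
DARK_CAM_SYNC_BIT = 0     # Dark Cam Sync signal 221
LIGHT_PANEL_SYNC_BIT = 1  # Light Panel Sync signal 222
LIT_CAM_SYNC_BIT = 2      # Lit Cam Sync signal (assumed to be the third signal)


def port_byte(panels_on: bool) -> int:
    """Return the parallel port data byte for the lit or dark phase.

    Light panels turn on when their signal is high; camera shutters
    open when their signal is low.
    """
    light_level = 1 if panels_on else 0
    dark_cam_level = 1 if panels_on else 0  # dark cams closed while lit
    lit_cam_level = 0 if panels_on else 1   # lit cams open while lit
    return ((dark_cam_level << DARK_CAM_SYNC_BIT)
            | (light_level << LIGHT_PANEL_SYNC_BIT)
            | (lit_cam_level << LIT_CAM_SYNC_BIT))


LIT_PHASE_BYTE = port_byte(True)    # binary 011
DARK_PHASE_BYTE = port_byte(False)  # binary 100
```

An application would alternate between the two bytes at the desired flash rate.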
The synchronization between the light sources and the cameras employed in one embodiment of the invention is illustrated in
As a result, during the first time interval 301, a normally-lit image is captured by the color lit cameras 214-215, and the phosphorescent makeup is illuminated (and charged) with light from the light panels 208-209. During the second time interval 302, the light is turned off and the grayscale dark cameras 204-205 capture an image of the glowing phosphorescent makeup on the performer. Because the light panels are off during the second time interval 302, the contrast between the phosphorescent makeup and any surfaces in the room without phosphorescent makeup is extremely high (i.e., the rest of the room is pitch black or at least quite dark, and as a result there is no significant light reflecting off of surfaces in the room, other than reflected light from the phosphorescent emissions), thereby improving the ability of the system to differentiate the various patterns applied to the performer's face. In addition, because the light panels are on half of the time, the performer will be able to see around the room during the performance, and also the phosphorescent makeup is constantly recharged. The frequency of the synchronization signals is 1/(time interval 303) and may be set at such a high rate that the performer will not even notice that the light panels are being turned on and off. For example, at a flashing rate of 90 Hz or above, virtually all humans are unable to perceive that a light is flashing and the light appears to be continuously illuminated. In psychophysical parlance, when a high frequency flashing light is perceived by humans to be continuously illuminated, it is said that “fusion” has been achieved. In one embodiment, the light panels are cycled at 120 Hz; in another embodiment, the light panels are cycled at 140 Hz, both frequencies far above the fusion threshold of any human. However, the underlying principles of the invention are not limited to any particular frequency.
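The relationship between flash frequency and interval lengths can be worked through with a small Python sketch. It assumes that time interval 303 is one full cycle made up of the lit interval 301 followed by the dark interval 302, and that the two are equal in length (the text states the panels are on half of the time); the function name is hypothetical.

```python
def cycle_timing(flash_hz: float) -> tuple:
    """Return (period, lit interval, dark interval) in seconds.

    Assumes a 50% duty cycle: the cycle period (interval 303) is split
    evenly between the lit interval (301) and the dark interval (302).
    """
    period_s = 1.0 / flash_hz  # frequency = 1 / (time interval 303)
    lit_s = period_s / 2.0
    dark_s = period_s - lit_s
    return period_s, lit_s, dark_s
```

For example, at 90 Hz the period is about 11.1 ms, leaving about 5.6 ms each for the lit and dark intervals; at 120 Hz the period is about 8.3 ms, with about 4.2 ms per interval.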
Lit Image 401 shows an image of the performer captured by one of the color lit cameras 214-215 during lit interval 301, when the light panels 208-209 are on and the color lit camera 214-215 shutters are open. Note that the phosphorescent makeup is quite visible on the performer's face, particularly the lips.
Dark Image 402 shows an image of the performer captured by one of the grayscale dark cameras 204-205 during dark interval 302, when the light panels 208-209 are off and the grayscale dark camera 204-205 shutters are open. Note that only the random pattern of phosphorescent makeup is visible on the surfaces where it is applied. All other surfaces in the image, including the hair, eyes, teeth, ears and neck of the performer, are completely black.
3D Surface 403 shows a rendered image of the surface reconstructed from the Dark Images 402 from grayscale dark cameras 204-205 (in this example, 8 grayscale dark cameras were used, each producing a single Dark Image 402 from a different angle) pointed at the model's face from a plurality of angles. One reconstruction process which may be used to create this image is detailed in co-pending application A
Textured 3D Surface 404 shows the Lit Image 401 used as a texture map and mapped onto 3D Surface 403 and rendered at an angle. Although Textured 3D Surface 404 is a computer-generated 3D image of the model's face, to the human eye it appears real enough that when it is rendered at an angle, such as it is in image 404, it creates the illusion that the model is turning her head and actually looking at an angle. Note that no phosphorescent makeup was applied to the model's eyes and teeth, and the images of the eyes and teeth are mapped onto flat surfaces that fill those cavities in the 3D surface. Nonetheless, the rest of the 3D surface is reconstructed so accurately that the resulting Textured 3D Surface 404 approaches photorealism. When this process is applied to create successive frames of Textured 3D Surfaces 404 and the frames are played back in real-time, the level of realism is such that, to the untrained eye, the successive frames look like actual video of the model, even though it is a computer-generated 3D image of the model viewed from a side angle.
Since the Textured 3D Surfaces 404 are computer-generated 3D images, such images can be manipulated with far more flexibility than actual video captured of the model. With actual video it is often impractical (or impossible) to show the objects in the video from any camera angles other than the angle from which the video was shot. With computer-generated 3D, the image can be rendered as if it is viewed from any camera angle. With actual video it is generally necessary to use a green screen or blue screen to separate an object from its background (e.g. so that a TV meteorologist can be composited in front of a weather map), and then that green- or blue-screened object can only be presented from the point of view of the camera shooting the object. With the technique just described, no green/blue screen is necessary. Phosphorescent makeup, paint, or dye is applied to the areas desired to be captured (e.g. the face, body and clothes of the meteorologist) and then the entire background will be separated from the object. Further, the object can be presented from any camera angle. For example, the meteorologist can be shown from a straight-on shot, or from a side angle shot, but still composited in front of the weather map.
Further, a 3D generated image can be manipulated in 3D. For example, using standard 3D mesh manipulation tools (such as those in Maya, sold by Autodesk, Inc.) the nose can be shortened or lengthened, either for cosmetic reasons if the performer feels her nose would look better in a different size, or as a creature effect, to make the performer look like a fantasy character like Gollum of “Lord of the Rings.” More extensive 3D manipulations could add wrinkles to the performer's face to make her appear to be older, or smooth out wrinkles to make her look younger. The face could also be manipulated to change the performer's expression, for example, from a smile to a frown. Although some 2D manipulations are possible with conventional 2D video capture, they are generally limited to manipulations from the point of view of the camera. If the model turns her head during the video sequence, the 2D manipulations applied when the head is facing the camera would have to be changed when the head is turned. 3D manipulations do not need to be changed, regardless of which way the head is turned. As a result, the techniques described above for creating successive frames of Textured 3D Surface 404 in a video sequence make it possible to capture objects that appear to look like actual video, but nonetheless have the flexibility of manipulation as computer-generated 3D objects, offering enormous advantages in production of video, motion pictures, and also video games (where characters may be manipulated by the player in 3D).
Note that in
However, there is a notable difference between the images of
Note that mixing the phosphorescent makeup with makeup base does reduce the brightness of the phosphorescence during the Dark interval 302. Despite this, the phosphorescent brightness is still sufficient to produce Dark Image 502, and there is enough dynamic range in the dark images from the 8 grayscale dark cameras to reconstruct 3D Surface 503. As previously noted, some applications do not require an accurate capture of the skin color of the model, and in that case it is advantageous to not mix the phosphorescent makeup with base, and then get the benefit of higher phosphorescent brightness during the Dark interval 302 (e.g. higher brightness allows for a smaller aperture setting on the camera lens, which allows for larger depth of field). But some applications do require an accurate capture of the skin color of the model. For such applications, it is advantageous to mix the phosphorescent makeup with base makeup (in a color suited to the model's skin tone), and work within the constraints of lower phosphorescent brightness. Also, there are applications where some phosphor visibility is acceptable, but not the level of visibility seen in Lit Image 401. For such applications, a middle ground can be found in terms of skin color accuracy and phosphorescent brightness by mixing a higher percentage of phosphorescent makeup relative to the base.
In another embodiment, luminescent zinc sulfide (ZnS:Cu) in its raw form is mixed with base makeup and applied to the model's face.
A disadvantage of using phosphorescent makeup, with or without base makeup mixed in, as described above and illustrated in
The phosphor makeup used in
One embodiment is illustrated in
As previously described above and in the co-pending applications, the multiple views of the random patterns of the makeup 3003 (in this case, the transparent UV makeup, rather than the phosphorescent or visible light makeup) captured by the grayscale cameras 3004 and 3005 are processed by data processing system 3010 to produce the 3D surface 3007. Then, when the images 3002 captured by the color cameras 3014-3015 are texture mapped onto the 3D surface 3007, the textured 3D surface 3017 is generated, which at sufficient resolution and viewed from the same angles is effectively indistinguishable from the color images 3002.
The timing diagram showing the sync signals generated by the Sync Generator PCI card to achieve the light and camera operation described in the previous paragraphs is shown in
In one embodiment, the alternation of the Visible Light Panels 3008-3009 and the UV Light Panels 3038 occurs 90 times per second or higher, which places the flashing above the threshold of human perception, so that the flashing is not perceptible to either the performer or viewers.
In another embodiment, the Visible Light Panels 3008-3009 are left on all the time (e.g. effectively Visible Light Panel sync signal 3022 is in the “On” state 3133 all the time). Alternatively, or in addition, the same effect can be achieved without a sync signal by using any form of ambient lighting or by shooting in daylight. Regardless of the type of visible lighting used, only the UV light panels are flashed on and off in this embodiment. The camera shutter synchronization is the same as described above. In this case, the color cameras 3014-3015 capture the natural skin coloring when their shutters are opened since the UV lights are off during that time. The images captured by the grayscale cameras show the performer illuminated by both visible light and UV light. In practice, there is still significant contrast between the bright emissive random pattern of the transparent UV makeup and the reflective background skin color. A significant advantage of this embodiment is that the visible lighting does not need to be flashed, and as a result, the normal ambient lighting (whether indoors or outdoors) can be used.
In some special effects situations, the natural skin color is not needed. In another embodiment, both the UV lighting and the visible lighting are left on all of the time (e.g. Sync Signals 3022 and 3026 are in On states 3133 and 3151 constantly, or simply ambient lighting is left on (or daylight is used) and the UV Light Panels 3038 are left on), and the color and grayscale cameras are synchronized, but their shutters are open for the entire frame interval, or for as much of the frame interval as desired by the camera operator (i.e. they are operated as typical video cameras). In this embodiment, the color cameras will capture the random pattern of the transparent makeup, and as a result the natural skin coloring will not be captured. Indeed, in one embodiment, no color cameras are used at all, and just the random pattern is captured by the grayscale cameras. In another embodiment, no grayscale cameras are used at all, as the random pattern captured by the color cameras is used. And, in another embodiment as previously described, a random pattern of visible light makeup that contrasts with the skin color (e.g., either dark makeup on light skin or light makeup on dark skin) is used and no UV light is used at all.
In embodiments employing UV Light Panels, one problem is that UV light will not only be absorbed by the transparent UV makeup, but it will also reflect off of surfaces on the performer. For example, white areas of the eyes and teeth are good reflectors of UV light. Many cameras are sensitive to UV light as well as visible light, and as a result, the cameras will capture not only the visible light emitted by the transparent UV makeup, but also the reflected UV light. Moreover, the reflected UV light can be of higher intensity than visible light, thereby dominating the captured image. Camera lenses typically will have a different focal length for UV light than for visible light. So, if the cameras are focused for visible light to capture the random emissive pattern of the makeup, they will typically be out of focus in capturing areas strongly reflecting UV light such as eyes and teeth. In one embodiment, the images of surfaces that do not have makeup on them (e.g. eyes and teeth) are used in creating a 3D model of the performance (e.g. by tracking the eye position or the teeth position, either automatically by computers performing image processing, by human animators, or a combination of both). If such features are blurry, then it will be more difficult to accurately track such surface features.
In one embodiment, the cameras whose shutters are open when the UV lights are on are outfitted with UV-blocking filters. Such filters are quite commonly available from optical or photographic suppliers. In this way, the cameras only capture the visible light emitted by the transparent UV makeup and the visible light reflected by the surfaces that do have UV makeup on them. And, since only visible light is captured, it can all be captured sharply with the same focus setting of the cameras.
One disadvantage of using the transparent UV makeup is that UV lights typically have to be on during the capture of the random pattern, and indeed, in some embodiments, the ambient lights are on as well. As a result, the cameras will capture not only the random pattern of the transparent UV makeup, but whatever else is illuminated in the scene by whatever lights are on. When the captured images are processed in Data Processing system 3010, the processing system may find pattern correlations in areas without the transparent UV makeup and try to reconstruct 3D surfaces in those areas. Although there are situations where this may be acceptable, or even useful, in other situations this is not useful and in fact may result in 3D surface data that is not accurate, not desired, or both.
The undesired inaccurately-reconstructed surfaces can be removed through various means, resulting in the relatively smooth desired surface of Trimmed 3D Surface 3202. In one embodiment the undesired surfaces are removed by hand, using any of many common 3D modeling applications, such as Maya from Autodesk. In another embodiment, the surface reconstruction system in Data Processing system 3010 rejects any 3D surface for which the pattern correlation is low. Since there is typically a low correlation in areas without the transparent UV makeup, this eliminates much of the undesired surface. In another embodiment, filters that only pass the color of the transparent UV phosphor emission (e.g. blue) are placed on the cameras capturing the random pattern, so as to attenuate the brightness of non-blue areas in the camera view. And, the surface reconstruction system in Data Processing system 3010 converts any captured pixels below an established brightness threshold to black. This serves to cut out most of the image that is not part of the transparent UV phosphor emission. In another embodiment, using any or several of the embodiments described herein, the first frame of a sequence of captured frames is “trimmed” of the undesired 3D surface. Then, in subsequent frames, the surface reconstruction system in Data Processing system 3010 rejects random patterns that (a) are not found within the trimmed first frame AND (b) are not found within the perimeter of the trimmed first frame (e.g. if the face moves and skin unfolds, new random patterns may be revealed, but such patterns must still be within the perimeter of the first trimmed frame, or they will be rejected).
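Two of the automatic trimming approaches above, brightness thresholding and rejection of low-correlation or out-of-perimeter patterns, can be sketched in Python as follows. This is a simplified illustration rather than the reconstruction system itself: images are plain nested lists of grayscale values, each candidate pattern is a hypothetical dictionary holding a pixel position and a correlation score, the perimeter of the trimmed first frame is assumed to be available as a boolean mask, and the separate embodiments are combined into one filter purely for brevity.

```python
def threshold_to_black(image, min_brightness):
    """Set every pixel below the brightness threshold to black (0)."""
    return [[p if p >= min_brightness else 0 for p in row] for row in image]


def keep_patterns(patterns, min_correlation, perimeter_mask):
    """Keep patterns that correlate well and lie inside the perimeter.

    patterns: list of {"xy": (x, y), "correlation": float} (hypothetical).
    perimeter_mask: perimeter_mask[y][x] is True inside the perimeter
    of the trimmed first frame.
    """
    kept = []
    for pattern in patterns:
        x, y = pattern["xy"]
        inside = perimeter_mask[y][x]
        if inside and pattern["correlation"] >= min_correlation:
            kept.append(pattern)
    return kept
```

The threshold and minimum correlation values are left to the caller because the text does not specify them.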
In another embodiment, transparent UV makeup with a light emission color other than blue is used. This can be helpful, for example, in the processing of the transparent UV makeup random patterns if a scene has a predominantly blue background (e.g. if the background is blue, and the transparent UV makeup emission is blue, then a blue filter on the cameras would not attenuate the background, and may result in undesirable surface reconstruction of the background). Or, conversely, if the background color in the scene is used for visual effects, it may be helpful to have the transparent UV makeup be a different color (e.g. if blue screens or blue objects are used in the background for the purposes of identifying certain areas, perhaps for compositing with other image elements, then a blue emission from the transparent UV makeup might interfere with such identification). Transparent UV makeup is available that emits in many different colors, such as red, white, yellow, purple, orange, and green.
In addition, in one embodiment, transparent UV makeup is used which emits electromagnetic radiation (EMR) in the ultraviolet spectrum. In this embodiment, cameras sensitive to UV light are used, preferably with filters that block visible and IR light, and with lenses that are focused for the UV spectrum. Moreover, in one embodiment, transparent UV makeup is used which emits electromagnetic radiation (EMR) in the infrared (IR) spectrum. In this embodiment, cameras sensitive to IR light are used, preferably with filters that block visible and UV light, and with lenses that are focused for the IR spectrum.
An embodiment which uses transparent makeup that emits EMR in the IR spectrum may be excited by various forms of EMR including UV light or visible light. While such makeup is generally not commercially available, it can be formulated using transparent makeup base (e.g., that of transparent UV makeup or that of many other transparent makeup base formulations) combined with phosphor that has the characteristic of emitting IR light when excited by UV or visible light. Such phosphors are commonly used, for example, in anti-forgery inks. For example, the VIS/IR ink offered by Allami Nyomda Plc., H-1102 Budapest, Halom u. 5., Hungary at http://www.allaminyomda.hu/file/1000354 (code IF 01) is excited by visible light at 480 nm, and emits near IR light.
In this embodiment, a transparent IR-emissive makeup made with such phosphor is applied to the performer in a random pattern, and then the performer is illuminated constantly by ambient lighting on the set (or daylight). In
In one embodiment, color cameras are used that are not sensitive to IR light, and as a result do not require filters. In another embodiment, color cameras are used with sensors that can capture Red, Green, Blue and IR light (e.g. by having Red, Green, Blue and IR filters in a 2×2 pattern over each 4 pixels of the sensor), and these color cameras are used both for capturing the visible light in the Red, Green and Blue spectrum as well as the IR light, rather than having separate grayscale cameras for capturing the IR light.
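To illustrate how a single sensor with Red, Green, Blue and IR filters in a 2×2 pattern could supply both the color image and the IR random pattern, the following Python sketch separates a raw mosaic into four planes. The arrangement of the filters within each 2×2 cell (R and G on the top row, B and IR on the bottom row) is an assumption made for the example, as is the function name; the text specifies only that the four filters cover each group of 4 pixels.

```python
def split_rgbir_mosaic(raw):
    """Split a raw mosaic (nested lists, even dimensions) into four planes.

    Assumed cell layout (hypothetical):
        R  G
        B  IR
    Each returned plane has half the width and half the height of raw.
    """
    planes = {"R": [], "G": [], "B": [], "IR": []}
    for y in range(0, len(raw), 2):
        top, bottom = raw[y], raw[y + 1]
        planes["R"].append(top[0::2])
        planes["G"].append(top[1::2])
        planes["B"].append(bottom[0::2])
        planes["IR"].append(bottom[1::2])
    return planes
```

The R, G and B planes would feed the color image, while the IR plane would play the role otherwise filled by a separate grayscale camera.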
In one embodiment, the ambient lighting sources are either chosen to be sources that do not emit significant IR light (e.g. Red, Green, Blue LEDs), or they are outfitted with IR filters that attenuate their IR emission. In this way the amount of IR light that reflects from the performer is minimized, resulting in higher contrast for the random pattern of the transparent IR-emitting makeup. Also, if a lighting source is within view of one of the cameras capturing the random pattern emitted in IR, that lighting source will be less likely to overdrive the camera sensors.
In one embodiment, the transparent makeup contains an IR-emitting phosphor which is excited by IR light. Such phosphors are commercially available for biological applications, such as IRDye® Infrared Dyes from Li-Cor Biosciences of Lincoln, Nebr., and for various security, consumer and other applications from Microtrace of Minneapolis, Minn. In this embodiment an IR light source is directed at the random pattern of transparent IR-emitting makeup in addition to any (or no) ambient or outdoor lighting. The advantage of this approach is that if the ambient or outdoor lighting is dim or is inconsistent (e.g. contains shadows) for any reason (e.g. for artistic lighting effects), the transparent IR-emitting makeup can still be illuminated by a bright and uniform IR light source without impacting the visible lighting of the scene. In other embodiments similarly applied, the transparent makeup is excited and/or emissive with only UV light or UV and IR light, and is illuminated with lights in the excitation spectrum and the random pattern is captured by cameras sensitive in the emission spectrum. And, in other embodiments, the transparent makeup does not fluoresce, but absorbs or reflects either UV or IR light, and is used to create a random pattern in non-visible light spectra, which is illuminated by non-visible light and captured by cameras sensitive to the non-visible light.
The embodiments described above with respect to
It should be noted that the term “light” is used in different contexts herein to refer to both visible EMR (EMR within the visible spectrum) and non-visible EMR (light outside of the visible spectrum). For example, the terms “IR light” or “UV light” recited above refer to non-visible EMR in the IR spectrum and UV spectrum, respectively; whereas “visible light,” “ambient light,” or “daylight” refer to visible EMR.
In another embodiment, the techniques described above are used to capture cloth.
Because the phosphor charges from any light incident upon it, including diffused or reflected light that is not directly from the light panels 208-209, even phosphor within folds gets charged (unless the folds are so tightly sealed no light can get into them, but in such cases it is unlikely that the cameras can see into the folds anyway). This illustrates a significant advantage of utilizing phosphorescent makeup (or paint or dye) for creating patterns on (or infused within) surfaces to be captured: the phosphor is emissive and is not subject to highlights and shadows, producing a highly uniform brightness level for the patterns seen by the grayscale dark cameras 204-205, that neither has areas too dark nor areas too bright.
Another advantage of dyeing or painting a surface with phosphorescent dye or paint, respectively, rather than applying phosphorescent makeup to the surface is that with dye or paint the phosphorescent pattern on the surface can be made permanent throughout a motion capture session. Makeup, by its nature, is designed to be removable, and a performer will normally remove phosphorescent makeup at the end of a day's motion capture shoot, and if not, almost certainly before going to bed. Frequently, motion capture sessions extend across several days, and as a result, normally a fresh application of phosphorescent makeup is applied to the performer each day prior to the motion capture shoot. Typically, each fresh application of phosphorescent makeup will result in a different random pattern. One of the techniques disclosed in co-pending applications is the tracking of vertices (“vertex tracking”) of the captured surfaces. Vertex tracking is accomplished by correlating random patterns from one captured frame to the next. In this way, a point on the captured surface can be followed from frame-to-frame. And, so long as the random patterns on the surface stay the same, a point on a captured surface even can be tracked from shot-to-shot. In the case of random patterns made using phosphorescent makeup, it is typically practical to leave the makeup largely undisturbed (although it is possible for some areas to get smudged, the bulk of the makeup usually stays unchanged until removed) during one day's-worth of motion capture shooting, but as previously mentioned it normally is removed at the end of the day. So, it is typically impractical to maintain the same phosphorescent random pattern (and with that, vertex tracking based on tracking a particular random pattern) from day-to-day. But when it comes to non-skin objects like fabric, phosphorescent dye or paint can be used to create a random pattern. 
Because dye and paint are essentially permanent, random patterns will not get smudged during the motion capture session, and the same random patterns will be unchanged from day-to-day. This allows vertex tracking of dyed or painted objects with random patterns to track the same random pattern through the duration of a multi-day motion capture session (or in fact, across multiple motion capture sessions spread over long gaps in time if desired).
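The idea of following a surface point by correlating its surrounding random pattern from one frame to the next can be sketched as follows. This is only an illustrative 2D patch matcher using normalized cross-correlation; the function names, patch size and search radius are assumptions for the example, and the co-pending applications referenced above may use a different correlation method.

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two equal-length brightness lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    return num / den if den else 0.0  # flat patches carry no pattern


def patch(frame, top, left, size):
    """Flatten the size x size block whose upper-left corner is (top, left)."""
    return [frame[top + dy][left + dx]
            for dy in range(size) for dx in range(size)]


def track_point(prev, curr, top, left, size=3, search=2):
    """Find where the patch at (top, left) in `prev` moved to in `curr`.

    Every offset within +/- `search` pixels is tried and the position with
    the highest correlation score is returned as (top, left).
    """
    ref = patch(prev, top, left, size)
    best_score, best_pos = -2.0, (top, left)
    height, width = len(curr), len(curr[0])
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            t, l = top + dy, left + dx
            if 0 <= t <= height - size and 0 <= l <= width - size:
                score = ncc(ref, patch(curr, t, l, size))
                if score > best_score:
                    best_score, best_pos = score, (t, l)
    return best_pos
```

Because the match depends on the pattern itself, the same point can only be followed across days if the pattern is unchanged, which is why a permanent dye or paint pattern matters for multi-day tracking.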
Skin is also subject to shadows and highlights when viewed with reflected light. There are many concave areas (e.g., eye sockets) that often are shadowed. Also, skin may be shiny and cause highlights, and even if the skin is covered with makeup to reduce its shininess, performers may sweat during a physical performance, resulting in shininess from sweaty skin. Phosphorescent makeup emits uniformly both from shiny and matte skin areas, and both from convex areas of the body (e.g. the nose bridge) and concavities (e.g. eye sockets). Sweat has little impact on the emission brightness of phosphorescent makeup. Phosphorescent makeup also charges while folded up in areas of the body that fold up (e.g. eyelids) and when it unfolds (e.g. when the performer blinks) the phosphorescent pattern emits light uniformly.
Returning back to
In additional embodiments, rather than using phosphorescent paint or dye, as described above, transparent UV- or transparent IR-emissive paint, ink or dye is used on clothing, props or other objects in the scene. Phosphor with the same properties as those previously described with makeup is used, and the same lighting, camera, filtering and other capture and processing techniques are used.
In another embodiment, phosphor is embedded in silicone or a moldable material such as modeling clay in characters, props and background sets used for stop-motion animation. Stop-motion animation is a technique used in animated motion pictures and in motion picture special effects. An exemplary prior art stop-motion animation stage is illustrated in
There are many difficulties with the stop-motion animation process that limit the expressive freedom of the animators, limit the degree of realism in motion, and add to the time and cost of production. One of these difficulties is animating many complex characters 702-703 within a complex set 701 on a stop-motion animation stage such as that shown in
In one embodiment illustrated by the stop-motion animation stage in
At low concentrations of zinc sulfide in the various embodiments described above, the zinc sulfide is not significantly visible under the desired scene illumination when light panels 208-209 are on. The exact percentage of zinc sulfide depends on the particular material it is mixed with or applied to, the color of the material, and the lighting circumstances of the character 702-703, prop or set 701. But, experimentally, the zinc sulfide concentration can be continually reduced until it is no longer visually noticeable in lighting situations where the character 702-703, prop or set 701 is to be used. This may result in a very low concentration of zinc sulfide and very low phosphorescent emission. Although this normally would be a significant concern with live action frame capture of dim phosphorescent patterns, with stop-motion animation, the dark frame capture shutter time can be extremely long (e.g. 1 second or more) because by definition, the scene is not moving. With a long shutter time, even very dim phosphorescent emission can be captured accurately.
Once the characters 702-703, props and the set 701 in the scene are thus prepared, they look almost exactly as they otherwise would look under the desired scene illumination when light panels 208-209 are on, but they phosphoresce in random patterns when the light panels 208-209 are turned off. At this point all of the characters 702-703, props and the set 701 of the stop-motion animation can now be captured in 3D using a configuration like that illustrated in
In one embodiment, the light panels 208-209 are left on while the animators adjust the positions of the characters 702-703, props or any changes to the set 701. Note that the light panels 208-209 could be any illumination source, including incandescent lamps, because there is no requirement in stop-motion animation for rapidly turning on and off the illumination source. Once the characters 702-703, props and set 701 are in position for the next frame, lit cam sync signal 223 is triggered (by a falling edge transition in the presently preferred embodiment) and all of the lit cameras 214-215 capture a frame for a specified duration based on the desired exposure time for the captured frames. In other embodiments, different cameras may have different exposure times based on individual exposure requirements.
Next, light panels 208-209 are turned off (either by sync signal 222 or by hand) and the lamps are allowed to decay until the scene is in complete darkness (e.g. incandescent lamps may take many seconds to decay). Then, dark cam sync signal 221 is triggered (by a falling edge transition in the presently preferred embodiment) and all of the dark cameras 204-205 capture a frame of the random phosphorescent patterns for a specified duration based on the desired exposure time for the captured frames. Once again, different cameras may have different exposure times based on individual exposure requirements. As previously mentioned, in the case of very dim phosphorescent emissions, the exposure time may be quite long (e.g., a second or more). The upper limit of exposure time is primarily limited by the noise accumulation of the camera sensors. The captured dark frames are processed by data processing system 210 to produce 3D surface 207 and then to map the images captured by the lit cameras 214-215 onto the 3D surface 207 to create textured 3D surface 217. Then, the light panels 208-209 are turned back on again, the characters 702-703, props and set 701 are moved again, and the process described in this paragraph is repeated until the entire shot is completed.
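The per-frame sequence described in the two preceding paragraphs (lit capture, lights off, wait for the lamps to decay, dark capture, lights back on) can be sketched as below. The `LoggingRig` class and its method names are hypothetical stand-ins for the sync signals and cameras; the rig only records what it is asked to do and does not control any hardware.

```python
class LoggingRig:
    """Stand-in for real hardware: records each action instead of doing it."""

    def __init__(self):
        self.log = []
        self.lights_on = True  # panels are on while animators pose the scene

    def set_lights(self, on):
        self.lights_on = on
        self.log.append("lights on" if on else "lights off")

    def wait(self, seconds):
        self.log.append(f"wait {seconds}s")

    def capture_lit(self, exposure_s):
        if not self.lights_on:
            raise RuntimeError("lit frame requested while lights are off")
        self.log.append(f"lit capture {exposure_s}s")
        return "lit-frame"

    def capture_dark(self, exposure_s):
        if self.lights_on:
            raise RuntimeError("dark frame requested while lights are on")
        self.log.append(f"dark capture {exposure_s}s")
        return "dark-frame"


def capture_stop_motion_frame(rig, lit_exposure_s, dark_exposure_s, lamp_decay_s):
    """Capture one stop-motion frame pair in the order the text describes."""
    lit = rig.capture_lit(lit_exposure_s)     # textures, panels still on
    rig.set_lights(False)                     # panels off
    rig.wait(lamp_decay_s)                    # let the lamps decay to darkness
    dark = rig.capture_dark(dark_exposure_s)  # phosphorescent pattern only
    rig.set_lights(True)                      # ready for the next pose
    return lit, dark


rig = LoggingRig()
frames = capture_stop_motion_frame(rig, lit_exposure_s=0.1,
                                   dark_exposure_s=1.0, lamp_decay_s=5.0)
```

The exposure and decay values in the usage lines are illustrative only; in practice they would be chosen for the lamps and phosphor concentration actually used.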
The resulting output is the successive frames of textured 3D surfaces of all of the characters 702-703, props and set 701 with areas of surfaces embedded or painted with phosphor that are in view of at least 2 dark cameras 204-205 at a non-oblique angle (e.g., <30 degrees from the optical axis of a camera). When these successive frames are played back at the desired frame rate (e.g., 24 fps), the animated scene will come to life, but unlike frames of a conventional stop-motion animation, the animation will be able to be viewed from any camera position, just by rendering the textured 3D surfaces from a chosen camera position. Also, if the camera position of the final animation is to be in motion during a frame sequence (e.g. if a camera is following a character 702-703), it is not necessary to have a physical camera moving in the scene. Rather, for each successive frame, the textured 3D surfaces of the scene are simply rendered from the desired camera position for that frame, using a 3D modeling/animation application software such as Maya (from Autodesk, Inc.).
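The coverage condition mentioned above (a surface point must be in view of at least 2 dark cameras within roughly 30 degrees of each camera's optical axis) can be checked with a small geometric test. This sketch is an assumption-laden simplification: it ignores occlusion and lens field of view, and the function names and camera representation (position plus optical-axis vector) are invented for the example.

```python
import math


def angle_from_axis_deg(camera_pos, camera_axis, point):
    """Angle in degrees between a camera's optical axis and the ray to `point`."""
    ray = [p - c for p, c in zip(point, camera_pos)]
    dot = sum(r * a for r, a in zip(ray, camera_axis))
    norm = (math.sqrt(sum(r * r for r in ray)) *
            math.sqrt(sum(a * a for a in camera_axis)))
    if norm == 0:
        raise ValueError("point coincides with the camera or the axis is zero")
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def is_capturable(point, cameras, max_angle_deg=30.0, min_cameras=2):
    """True if enough cameras see `point` within the angular limit.

    `cameras` is a list of (position, optical_axis) tuples.
    """
    count = sum(1 for pos, axis in cameras
                if angle_from_axis_deg(pos, axis, point) < max_angle_deg)
    return count >= min_cameras


cameras = [
    ((0.0, 0.0, -10.0), (0.0, 0.0, 1.0)),
    ((5.0, 0.0, -10.0), (0.0, 0.0, 1.0)),
]
near_center = is_capturable((0.0, 0.0, 0.0), cameras)
far_off_axis = is_capturable((20.0, 0.0, 0.0), cameras)
```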
In another embodiment, illustrated in
This approach provides significant advantages to stop-motion animation. The following are some of the advantages of this approach: (a) individual characters 702-703 may be manipulated individually without worrying about the animator bumping into another character 702-703 or the characters 702-703 bumping into each other, (b) the camera position of the rendered frames may be chosen arbitrarily, including having the camera position move in successive frames, (c) the rendered camera position can be one where it would not be physically possible to locate a camera 705 in a conventional stop-motion configuration (e.g. directly between 2 characters 702-703 that are close together, where there is no room for a camera 705), (d) the lighting, including highlights and shadows can be controlled arbitrarily, including creating lighting situations that are not physically possible to realize (e.g. making a character glow), (e) special effects can be applied to the characters 702-703 (e.g. a ghost character 702-703 can be made translucent when it is rendered into the scene), (f) a character 702-703 can remain in a physically stable position on the ground while in the scene it is not (e.g. a character 702-703 can be captured in an upright position, while it is rendered into the scene upside down in a hand stand, or rendered into the scene flying above the ground), (g) parts of the character 702-703 can be held up by supports that do not have phosphor on them, and as such will not be captured (and will not have to be removed from the shot later in post-production), (h) detail elements of a character 702-703, like mouth positions when the character 702-703 is speaking, can be rendered in by the 3D modeling/animation application, so they do not have to be attached and then removed from the character 702-703 during the animation, (i) characters 702-703 can be rendered into computer-generated 3D scenes (e.g. the man with leash 702 and dog 703 can be animated as clay animations, but the city street set 701 can be a computer-generated scene), (j) 3D motion blur can be applied to the objects as they move (or as the rendered camera position moves), resulting in a smoother perception of motion to the animation, and also making possible faster motion without the perception of jitter.
In additional embodiments, rather than using phosphorescent paint, dye or powder, as described previously, transparent UV- or transparent IR-emissive paint, ink, dye or powder is used on or embedded within stop motion objects in the scene. Phosphor with the same properties as that previously described with makeup is used, and the same lighting, camera, filtering and other capture and processing techniques are used.
In another embodiment, different phosphors other than ZnS:Cu are used as pigments with dyes for fabrics or other non-skin objects. ZnS:Cu is the preferred phosphor to use for skin applications because it is FDA-approved as a cosmetic pigment. But a large variety of other phosphors exist that, while not approved for use on the skin, are in some cases approved for use within materials handled by humans. One such phosphor is SrAl2O4:Eu2+,Dy3+. Another is SrAl2O4:Eu2+. Both phosphors have a much longer afterglow than ZnS:Cu for a given excitation.
Many phosphors that phosphoresce or fluoresce in visible light spectra are charged more efficiently by ultraviolet light than by visible light. This can be seen in chart 800 of
Other phosphors that may be used for non-skin phosphorescent use (e.g. for dyeing fabrics) also are excited best by ultraviolet light. For example, SrAl2O4:Eu2+,Dy3+ and SrAl2O4:Eu2+ are both excited more efficiently with ultraviolet light than visible light, and in particular, are excited quite efficiently by UVA (black light).
As can be seen in
But, in addition to these disadvantages, the only very bright LEDs currently available are white or RGB LEDs. In the case of both types of LEDs, the wavelengths of light emitted by the LED do not overlap with wavelengths where the zinc sulfide is efficiently excited. For example, in
Other lighting sources exist that output light at wavelengths that are more efficiently absorbed by zinc sulfide. For example, fluorescent lamps (e.g. 482-S9 from Kino-Flo, Inc. 2840 North Hollywood Way, Burbank, Calif. 91505) are available that emit UVA (black light) centered around 350 nm with an emission curve similar to 821, and Blue/violet fluorescent lamps (e.g. 482-S10-S from Kino-Flo) exist that emit bluish/violet light centered around 420 nm with an emission curve similar to 822. The emission curves 821 and 822 are much closer to the peak of the zinc sulfide excitation curve 811, and as a result the light energy is far more efficiently absorbed, resulting in a much higher phosphorescent emission 812 for a given excitation brightness. Such fluorescent bulbs are quite inexpensive (typically $15/bulb for a 48″ bulb), produce very little heat, and are very light weight. They are also available in high wattages. A typical 4-bulb fluorescent fixture produces 160 Watts or more. Also, theatrical fixtures are readily available to hold such bulbs in place as staging lights. (Note that UVB and UVC fluorescent bulbs are also available, but UVB and UVC exposure is known to present health hazards under certain conditions, and as such would not be appropriate to use with human or animal performers without suitable safety precautions.)
The primary issue with using fluorescent lamps is that they are not designed to switch on and off quickly. In fact, ballasts (the circuits that ignite and power fluorescent lamps) typically turn the lamps on very slowly, and it is common knowledge that fluorescent lamps may take a second or two until they are fully illuminated.
Standard fluorescent ballasts are not designed to switch fluorescent lamps on and off quickly, but it is possible to modify an existing ballast so that it does.
For the moment, consider only the prior art ballast circuit 1002 of
Synchronization control circuit 1001 is added to modify the prior art ballast circuit 1002 described in the previous paragraph to allow rapid on-and-off control of the fluorescent lamp 1003 with a sync signal. In the illustrated embodiment in
This process repeats as the sync signal coupled to SYNC+ oscillates between high and low level. The sync control circuit 1001 combined with prior art ballast 1002 will switch fluorescent lamp 1003 on and off reliably, well in excess of 120 flashes per second. It should be noted that the underlying principles of the invention are not limited to the specific set of circuits illustrated in
Although the modified circuit shown in
There exists a wide range of decay periods for different brands and types of fluorescent lamps, from as short as 200 microseconds, to as long as over a millisecond. To address this property of fluorescent lamps, one embodiment of the invention adjusts signals 221-223. This embodiment will be discussed shortly.
Another property of fluorescent lamps that impacts their usability with a motion capture system such as that illustrated in
Fluorescent lamp 1370 is a lamp in the same state as prior art lamp 1350, 10 milliseconds after the bulb 1370 has been shut off, with its electrodes 1371-1372 still glowing and producing illuminated regions 1381-1382 near the ends of the bulb of fluorescent lamp 1370, but unlike prior art lamp 1350, wrapped around the ends of lamp 1370 is opaque tape 1391 and 1392 (shown as see-through with slanted lines for the sake of illustration). In the presently preferred embodiment black gaffers' tape is used, such as 4″ P-665 from Permacel, A Nitto Denko Company, US Highway No. 1, P.O. Box 671, New Brunswick, N.J. 08903. The opaque tape 1391-1392 serves to block almost all of the light from glowing electrodes 1371-1372 while blocking only a small amount of the overall light output of the fluorescent lamp when the lamp is on during lit interval 301. This allows the fluorescent lamp to become much darker during dark interval 302 when being flashed on and off at a high rate (e.g. 90 Hz). Other techniques can be used to block the light from the glowing electrodes, including other types of opaque tape, painting the ends of the bulb with an opaque paint, or using an opaque material (e.g. sheets of black metal) on the light fixtures holding the fluorescent lamps so as to block the light emission from the parts of the fluorescent lamps containing electrodes.
Returning now to the light decay property of fluorescent lamps illustrated in
At the end of lit time interval 1401, the falling edge 1432 of sync signal 222 turns off light panels 208-209 and is roughly coincident with the rising edge 1412 of lit cam sync signal 223, which closes the shutters of the lit cameras 214-215. Note, however, that the light output of the light panels 208-209 does not drop from lit to dark immediately, but rather slowly drops to dark as the fluorescent lamp phosphor decays as shown by edge 1442. When the light level of the fluorescent lamps finally reaches dark level 1441, dark cam sync signal 221 is dropped from high to low as shown by edge 1422, and this opens the shutters of dark cameras 204-205. This way the dark cameras 204-205 only capture the emissions from the phosphorescent makeup, paint or dye, and do not capture the reflection of light from any objects illuminated by the fluorescent lamps during the decay interval 1442. So, in this embodiment the dark interval 1402 is shorter than the lit interval 1401, and the dark camera 204-205 shutters are open for a shorter period of time than the lit camera 214-215 shutters.
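The effect of lamp decay on the shutter intervals can be worked through numerically. The sketch below assumes, purely for illustration, that the light panels are driven with a 50% duty cycle and that the dark cameras open their shutters once the decay time has elapsed and close them when the panels turn back on; the function name and parameters are hypothetical.

```python
def flash_cycle_timing(flash_rate_hz, lamp_decay_s, lit_fraction=0.5):
    """Split one light-panel flash cycle into lit and dark shutter times.

    Assumptions (not taken from the figures): the panels are on for
    `lit_fraction` of each cycle, and the dark shutter opens only after
    `lamp_decay_s` has passed since the panels were switched off.
    """
    if flash_rate_hz <= 0 or lamp_decay_s < 0:
        raise ValueError("flash rate must be positive and decay non-negative")
    cycle_s = 1.0 / flash_rate_hz
    lit_shutter_s = cycle_s * lit_fraction
    lights_off_s = cycle_s - lit_shutter_s
    dark_shutter_s = lights_off_s - lamp_decay_s
    if dark_shutter_s <= 0:
        raise ValueError("lamp decay leaves no time for a dark exposure")
    return {
        "cycle_s": cycle_s,
        "lit_shutter_s": lit_shutter_s,
        "dark_delay_s": lamp_decay_s,
        "dark_shutter_s": dark_shutter_s,
    }


# Illustrative numbers only: 90 flashes per second and a 1 ms lamp decay.
timing = flash_cycle_timing(90.0, 0.001)
```

With any nonzero decay time the dark shutter interval comes out shorter than the lit shutter interval, matching the relationship between intervals 1402 and 1401 described above.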
Another embodiment is illustrated in
Note that in the embodiments shown in both
In yet another embodiment the lit cameras 214-215 leave their shutters open for some or all of the dark time interval 1502. In this case, the phosphorescent areas in the scene will appear very prominently relative to the non-phosphorescent areas since the lit cameras 214-215 will integrate the light during the dark time interval 1502 with the light from the lit time interval 1501.
Because fluorescent lamps are generally not sold with specifications detailing their phosphor decay characteristics, it is necessary to determine the decay characteristics of fluorescent lamps experimentally. This can be readily done by adjusting the falling edge 1522 of sync signal 221 relative to the falling edge 1532 of sync signal 222, and then observing the output of the dark cameras 204-205. For example, in the embodiment shown in
In another embodiment the decay of the phosphor in the fluorescent lamps is such that even after edge 1532 is delayed as long as possible after 1522 to allow for the dark cameras 204-205 to have a long enough shutter time to capture a bright enough image of phosphorescent patterns in the scene, there is still a small amount of light from the fluorescent lamp illuminating the scene such that non-phosphorescent objects in the scene are slightly visible. Generally, this does not present a problem for the pattern processing techniques described in the co-pending applications identified above. So long as the phosphorescent patterns in the scene are substantially brighter than the dimly-lit non-phosphorescent objects in the scene, the pattern processing techniques will be able to adequately correlate and process the phosphorescent patterns and treat the dimly lit non-phosphorescent objects as noise.
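The experimental adjustment described above, in which the dark camera shutter opening is moved later until lamp light no longer shows up in the dark frames, can be sketched as a simple sweep. The callback `measure_background`, the brightness threshold and the exponential lamp model used in the usage lines are all hypothetical; a real setup would measure the brightness of a non-phosphorescent reference area in actual dark camera output.

```python
import math


def find_dark_shutter_delay(measure_background, delays_s, max_background):
    """Return the shortest delay at which background brightness is acceptable.

    `measure_background(delay)` is assumed to return the mean brightness of a
    non-phosphorescent reference area seen by a dark camera whose shutter
    opens `delay` seconds after the light panels are switched off.
    Returns None if no candidate delay is dark enough.
    """
    for delay in sorted(delays_s):
        if measure_background(delay) <= max_background:
            return delay
    return None


def simulated_lamp(delay_s):
    """Made-up lamp for demonstration: exponential decay from level 200."""
    return 200.0 * math.exp(-delay_s / 0.0003)


candidates = [0.0, 0.0005, 0.001, 0.0015, 0.002]
chosen = find_dark_shutter_delay(simulated_lamp, candidates, max_background=5.0)
```

When no candidate delay reaches the threshold, the function returns None, which corresponds to the case in this paragraph where some residual lamp light must be tolerated and treated as noise.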
While the following discussion focuses on the embodiments illustrated in
In another embodiment the lit cameras 214-215 and dark cameras 204-205 are operated at a lower frame rate than the flashing rate of the light panels 208-209. For example, the capture frame rate may be 30 frames per second (fps), but so as to keep the flashing of the light panels 208-209 above the threshold of human perception, the light panels 208-209 are flashed at 90 flashes per second. This situation is illustrated in
In another embodiment where the lit cameras 214-215 and dark cameras 204-205 are operated at a lower frame rate than the flashing rate of the light panels 208-209, sync signal 223 causes the lit cameras 214-215 to open their shutters after sync signal 221 causes the dark cameras 204-205 to open their shutters. This is illustrated in
In another embodiment where the lit cameras 214-215 and dark cameras 204-205 are operated at a lower frame rate than the flashing rate of the light panels 208-209, the light panels 208-209 are flashed with varying light cycle intervals so as to allow for longer shutter times for either the dark cameras 204-205 or lit cameras 214-215, or to allow for longer shutter times for both cameras. An example of this embodiment is illustrated in
Unlike the previously described embodiments, where there is one sync signal 221 for the dark cameras and one sync signal 223 for the lit cameras, in the embodiment illustrated in
In this embodiment, as shown in
Sync signal 222 transitions with edge 2032 from a high to low state 2031. Low state 2031 turns off light panels 208-209, which gradually decay to a dark state 2041 following decay curve 2042. When the light panels are sufficiently dark for the purposes of providing enough contrast to separate the phosphorescent makeup, paint, or dye from the non-phosphorescent surfaces in the scene, sync signal 1921 transitions to low state 2021. This causes dark cameras 1931-1932 to open their shutters and capture a dark frame. After the time interval 2002, sync signal 222 transitions with edge 2034 to high state 2033 which causes the light panels 208-209 to transition with edge 2044 to lit state 2043. Just prior to light panels 208-209 becoming lit, sync signal 1921 transitions to high state 2051 closing the shutter of dark cameras 1931-1932. Just after the light panels 208-209 become lit, sync signal 1924 transitions to low state 2024, causing the shutters on the lit cameras 1941-1942 to open during time interval 2001 and capture a lit frame. Sync signal 222 transitions to a low state, which turns off the light panels 208-209, and sync signal 1924 transitions to a high state at the end of time interval 2001, which closes the shutters on lit cameras 1941-1942.
The sequence of events described in the preceding paragraphs repeats 2 more times, but during these repetitions sync signals 1921 and 1924 remain high, keeping their cameras' shutters closed. For the first repetition, sync signal 1922 opens the shutter of dark cameras 1933-1934 while light panels 208-209 are dark and sync signal 1925 opens the shutter of lit cameras 1943-1944 while light panels 208-209 are lit. For the second repetition, sync signal 1923 opens the shutter of dark cameras 1935-1936 while light panels 208-209 are dark and sync signal 1926 opens the shutter of lit cameras 1945-1946 while light panels 208-209 are lit.
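The round-robin order in which the three camera groups are triggered can be summarized in a few lines. The sync signal numbers below are the ones named in the text; the function names and the 90 flashes per second figure in the usage line are illustrative assumptions.

```python
# Sync signal reference numerals taken from the description above.
DARK_SYNC = [1921, 1922, 1923]  # dark camera groups 1931-1932, 1933-1934, 1935-1936
LIT_SYNC = [1924, 1925, 1926]   # lit camera groups 1941-1942, 1943-1944, 1945-1946


def active_sync_signals(cycle_index):
    """Return the (dark, lit) sync signals that go low on a given flash cycle.

    Cycle 0 is the initial pass; cycles 1 and 2 are the two repetitions,
    after which the pattern starts over.
    """
    group = cycle_index % len(DARK_SYNC)
    return DARK_SYNC[group], LIT_SYNC[group]


def per_group_frame_rate(flash_rate_hz, num_groups=3):
    """Frame rate of each camera group when groups take turns on each flash."""
    return flash_rate_hz / num_groups


schedule = [active_sync_signals(i) for i in range(6)]
group_rate = per_group_frame_rate(90.0)
```

Each group fires on every third flash, so the aggregate capture rate equals the flash rate while each individual camera runs at one third of it.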
Then, the sequence of events described in the prior 2 paragraphs continues to repeat while the motion capture session illustrated in
Although the “cascading” timing sequence illustrated in
When a scene is shot conventionally using prior art methods and cameras are capturing only 2D images of that scene, the “cascading” technique to use multiple slower frame rate cameras to achieve a higher aggregate frame rate as illustrated in
Ideally, the full dynamic range, but not more, of dark cameras 204-205 should be utilized to achieve the highest quality pattern capture. For example, if a pattern is captured that is too dark, noise patterns in the sensors in cameras 204-205 may become as prominent as captured patterns, resulting in incorrect 3D reconstruction. If a pattern is too bright, some areas of the pattern may exceed the dynamic range of the sensor, and all pixels in such areas will be recorded at the maximum brightness level (e.g. 255 in an 8-bit sensor), rather than at the variety of brightness levels that actually make up that area of the pattern. This also will result in incorrect 3D reconstruction. So, prior to capturing a pattern, per the techniques described herein, it is advantageous to try to make sure the brightness of the pattern throughout is not too dark, nor too bright (e.g. not reaching the maximum brightness level of the camera sensor).
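A simple numeric check of whether a captured pattern stays within the sensor's dynamic range might look like the following sketch. The function name and the `dark_level` threshold of 32 are assumptions chosen for illustration; the text does not specify what counts as too dark.

```python
def exposure_report(pixels, max_level=255, dark_level=32):
    """Summarize how much of a grayscale frame is clipped or very dark.

    `pixels` is a list of rows of brightness values (0..max_level).
    `dark_level` is a hypothetical cutoff below which sensor noise is
    assumed to compete with the pattern.
    """
    flat = [p for row in pixels for p in row]
    if not flat:
        raise ValueError("frame is empty")
    total = len(flat)
    saturated = sum(1 for p in flat if p >= max_level)
    too_dark = sum(1 for p in flat if p < dark_level)
    return {
        "saturated_fraction": saturated / total,
        "dark_fraction": too_dark / total,
        "peak": max(flat),
    }


report = exposure_report([[0, 10, 100, 255]])
```

A nonzero saturated fraction indicates areas recorded at the maximum level, and a large dark fraction indicates areas where noise may be as prominent as the pattern.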
When phosphorescent makeup is applied to a performer, or when phosphorescent makeup, paint or dye is applied to an object, it is difficult for the human eye to evaluate whether the phosphor application results in a pattern captured by the dark cameras 204-205 that is bright enough in all locations or too bright in some locations.
Image 2202 shows such an objective measure. It shows the same cylinder as image 2201, but instead of showing the brightness of each pixel of the image as a grayscale level (in this example, from 0 to 255), it shows it as a color. Each color represents a range of brightness. For example, in image 2202 blue represents brightness ranges 0-31, orange represents brightness ranges 192-223 and dark red represents brightness ranges 224-255. Other colors represent other brightness ranges. Area 2211, which is blue, is now clearly identifiable as an area that is very dark, and area 2221, which is dark red, is now clearly identifiable as an area that is very bright. These determinations can be readily made by the human eye, even if the dynamic range of the display monitor is less than that of the sensor, or if the display monitor is incorrectly adjusted, or if the brain of the observer adapts to the brightness of the display. With this information the human observer can change the application of phosphorescent makeup, dye or paint. The human observer can also adjust the aperture and/or the gain setting on the cameras 204-205 and/or the brightness of the light panels 208-209.
In one embodiment image 2202 is created by application software running on one camera controller computer 225 and is displayed on a color LCD monitor attached to the camera controller computer 225. The camera controller computer 225 captures a frame from a dark camera 204 and places the pixel values of the captured frame in an array in its RAM. For example, if the dark camera 204 is a 640×480 grayscale camera with 8 bits/pixel, then the array would be a 640×480 array of 8-bit bytes in RAM. Then, the application takes each pixel value in the array and uses it as an index into a lookup table of colors, with as many entries as the number of possible pixel values. With 8 bits/pixel, the lookup table has 256 entries. Each of the entries in the lookup table is pre-loaded (by the user or the developer of the application) with the desired Red, Green, Blue (RGB) color value to be displayed for the given brightness level. Each brightness level may be given a unique color, or a range of brightness levels can share a single color. For example, for image 2202, lookup table entries 0-31 are all loaded with the RGB value for blue, entries 192-223 are loaded with the RGB value for orange and entries 224-255 are loaded with the RGB value for dark red. Other entries are loaded with different RGB color values. The application uses each pixel value from the array (e.g. 640×480 of 8-bit grayscale values) of the captured frame as an index into this color lookup table, and forms a new array (e.g. 640×480 of 24-bit RGB values) of the looked-up colors. This new array of looked-up colors is then displayed, producing a color image such as 2202.
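The lookup-table mapping described in this paragraph can be sketched as follows. The three named ranges (0-31 blue, 192-223 orange, 224-255 dark red) come from the text; the specific RGB triples and the gray placeholder used for the unspecified middle ranges are assumptions for the example.

```python
# RGB triples are illustrative choices for the colors named in the text.
BLUE = (0, 0, 255)
ORANGE = (255, 165, 0)
DARK_RED = (139, 0, 0)


def build_lookup_table():
    """Build a 256-entry table mapping 8-bit brightness to an RGB color."""
    table = []
    for level in range(256):
        if level <= 31:
            table.append(BLUE)        # very dark
        elif 192 <= level <= 223:
            table.append(ORANGE)      # bright
        elif level >= 224:
            table.append(DARK_RED)    # very bright
        else:
            # Placeholder for the "other colors" the text leaves unspecified.
            table.append((level, level, level))
    return table


def false_color(gray_frame, table):
    """Replace each grayscale pixel with its looked-up RGB color."""
    return [[table[p] for p in row] for row in gray_frame]


table = build_lookup_table()
colored = false_color([[0, 31, 100, 200, 255]], table)
```

Because the mapping is a single table index per pixel, it is cheap enough to apply to every captured frame for live inspection.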
If a color camera (either lit camera 214 or dark camera 204) is used to capture the image to generate an image such as 2202, then one step is first performed after the image is captured and before it is processed as described in the preceding paragraph. The captured RGB output of the camera is stored in an array in camera controller computer 225 RAM (e.g. 640×480 with 24 bits/pixel). The application running on camera controller computer 225 then calculates the average brightness of each pixel by averaging the Red, Green and Blue values of each pixel (i.e. Average=(R+G+B)/3), and places those averages in a new array (e.g. 640×480 with 8 bits/pixel). This array of Average pixel brightnesses (the “Average array”) will soon be processed as if it were the pixel output of a grayscale camera, as described in the prior paragraph, to produce a color image such as 2202. But, first there is one more step: the application examines each pixel in the captured RGB array to see if any color channel of the pixel (i.e. R, G, or B) is at a maximum brightness value (e.g. 255). If any channel is, then the application sets the value in the Average array for that pixel to the maximum brightness value (e.g. 255). The reason for this is that it is possible for one color channel of a pixel to be driven beyond maximum brightness (but only output a maximum brightness value), while the other color channels are driven by relatively dim brightness. This may result in an average calculated brightness for that pixel that is a middle-range level (and would not be considered to be a problem for good-quality pattern capture). But, if any of the color channels has been overdriven in a given pixel, then that will result in an incorrect pattern capture. 
So, setting the pixel value in the Average array to maximum brightness produces a color image 2202 in which that pixel is shown at the highest brightness, which alerts a human observer of image 2202 to a potential problem for high-quality pattern capture.
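The color-camera preprocessing step described above may be sketched as follows. The function name is hypothetical, and the use of integer division for the average is an assumption, since the rounding behavior is not specified above.

```python
MAX_BRIGHTNESS = 255  # maximum value of an 8-bit color channel


def average_array(rgb_frame):
    """Build the "Average array" from a frame of (R, G, B) pixels.

    A pixel with any channel at maximum brightness is forced to the
    maximum value so that it is flagged in the resulting color image.
    """
    result = []
    for row in rgb_frame:
        out_row = []
        for r, g, b in row:
            if max(r, g, b) >= MAX_BRIGHTNESS:
                out_row.append(MAX_BRIGHTNESS)    # possibly overdriven channel
            else:
                out_row.append((r + g + b) // 3)  # Average = (R+G+B)/3
        result.append(out_row)
    return result


# The second pixel would average to 91, but its red channel is saturated.
averages = average_array([[(30, 60, 90), (255, 10, 10)]])
```

The resulting array can then be indexed into the color lookup table exactly as a grayscale frame would be.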
It should be noted that the underlying principles of the invention are not limited to the specific color ranges and color choices illustrated in
Correlating lines or random patterns captured by one camera with images from other cameras as described above provides range information for each camera. In one embodiment of the invention, range information from multiple cameras is combined in three steps: (1) treat the 3D capture volume as a scalar field; (2) use a “Marching Cubes” (or a related “Marching Tetrahedrons”) algorithm to find the isosurface of the scalar field and create a polygon mesh representing the surface of the subject; and (3) remove false surfaces and simplify the mesh. Details associated with each of these steps are provided below.
The scalar value of each point in the capture volume (also called a voxel) is the weighted sum of the scalar values from each camera. The scalar value for a single camera for points near the reconstructed surface is the best estimate of the distance of that point to the surface. The distance is positive for points inside the object and negative for points outside the object. However, points far from the surface are given a small negative value even if they are inside the object.
The weight used for each camera has two components. Cameras that lie in the general direction of the normal to the surface are given a weight of 1. Cameras that lie 90 degrees to the normal are given a weight of 0. A function is used of the form: $n_i = \cos^2 a_i$, where $n_i$ is the normal weighting function, and $a_i$ is the angle between the camera's direction and the surface normal. This is illustrated graphically in
The second weighting component is a function of the distance. The farther the volume point is from the surface the less confidence there is in the accuracy of the distance estimate. This weight decreases significantly faster than the distance increases. A function is used of the form: $w_i = 1/(d_i^2 + 1)$, where $w_i$ is the weight and $d_i$ is the distance. This is illustrated graphically in
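The two weighting components and the weighted sum for a voxel may be sketched as follows. The function names are hypothetical. Combining the two components by multiplication, and leaving the sum unnormalized, are assumptions made for this example; the description above states only that the weight has two components and that the voxel value is a weighted sum.

```python
import math


def normal_weight(angle_rad):
    """Normal weighting: 1 for a camera along the normal, 0 at 90 degrees."""
    return math.cos(angle_rad) ** 2


def distance_weight(distance):
    """Distance weighting: 1 at the surface, falling off as 1/(d^2 + 1)."""
    return 1.0 / (distance ** 2 + 1.0)


def voxel_scalar(camera_estimates):
    """Weighted sum of per-camera scalar values for one voxel.

    camera_estimates is an iterable of (signed_distance, angle_rad) pairs,
    one per camera. Assumption: the two weights are multiplied together.
    """
    total = 0.0
    for signed_distance, angle_rad in camera_estimates:
        weight = normal_weight(angle_rad) * distance_weight(signed_distance)
        total += weight * signed_distance
    return total


# One camera along the normal, one at 90 degrees (which contributes ~0).
value = voxel_scalar([(1.0, 0.0), (2.0, math.pi / 2)])
```

In this example the camera at 90 degrees to the normal is effectively ignored, so the voxel value is determined by the camera facing the surface.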
It should be noted that other known functions with similar characteristics to the functions described above may also be employed. For example, rather than a cosine-squared function as described above, a cosine squared function with a threshold may be employed. In fact, virtually any other function which produces a graph shaped similarly to those illustrated in
In one embodiment of the invention, the “Marching Cubes” algorithm and its variant “Marching Tetrahedrons” find the zero crossings of a scalar field and generate a surface mesh. See, e.g., Lorensen, W. E. and Cline, H. E., Marching Cubes: a high resolution 3D surface reconstruction algorithm, Computer Graphics, Vol. 21, No. 4, pp 163-169 (Proc. of SIGGRAPH), 1987, which is incorporated herein by reference. A volume is divided up into cubes. The scalar field is known or calculated as above for each corner of a cube. When some of the corners have positive values and some have negative values it is known that the surface passes through the cube. The standard algorithm interpolates where the surface crosses each edge. One embodiment of the invention improves on this by using an improved binary search to find the crossing to a high degree of accuracy. In so doing, the scalar field is calculated for additional points. The added computational load is incurred only along the surface, and the refinement greatly improves the quality of the resulting mesh. Polygons are added to the surface according to tables. The “Marching Tetrahedrons” variation divides each cube into six tetrahedrons. The tables for tetrahedrons are much smaller and easier to implement than the tables for cubes. In addition, Marching Cubes has an ambiguous case not present in Marching Tetrahedrons.
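Locating the crossing along a cube edge by repeatedly halving the edge may be sketched as follows. This is a plain bisection, shown only to illustrate the idea; the particular improvements used in the embodiment's binary search are not detailed above. The function name, the iteration count, and the unit-sphere test field are assumptions.

```python
import math


def find_crossing(field, p0, p1, iterations=20):
    """Bisect the edge p0-p1 to locate the zero crossing of a scalar field.

    field is a callable taking an (x, y, z) tuple and returning a float.
    The field values at p0 and p1 must not have the same sign.
    """
    f0 = field(p0)
    if f0 * field(p1) > 0:
        raise ValueError("the surface does not cross this edge")
    for _ in range(iterations):
        mid = tuple((a + b) / 2 for a, b in zip(p0, p1))
        f_mid = field(mid)  # one additional field evaluation per step
        if (f_mid > 0) == (f0 > 0):
            p0, f0 = mid, f_mid   # crossing lies in the half nearer p1
        else:
            p1 = mid              # crossing lies in the half nearer p0
    return tuple((a + b) / 2 for a, b in zip(p0, p1))


def unit_sphere(point):
    """Example field: positive inside a unit sphere, negative outside."""
    return 1.0 - math.dist(point, (0.0, 0.0, 0.0))


crossing = find_crossing(unit_sphere, (0.25, 0.0, 0.0), (2.0, 0.0, 0.0))
```

Each iteration halves the uncertainty in the crossing position, so twenty iterations narrow a cube edge to roughly one millionth of its length.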
The resulting mesh often has a number of undesirable characteristics. Often there is a ghost surface behind the desired surface. There are often false surfaces forming a halo around the true surface. And finally the vertices in the mesh are not uniformly spaced. The ghost surface and most of the false surfaces can be identified and hence removed with two similar techniques. Each vertex in the reconstructed surface is checked against the range information from each camera. If the vertex is close to the range value for a sufficient number of cameras (e.g., 1-4 cameras) confidence is high that this vertex is good. Vertices that fail this check are removed. Range information generally doesn't exist for every point in the field of view of the camera. Either that point isn't on the surface or that part of the surface isn't painted. If a vertex falls in this “no data” region for too many cameras (e.g., 1-4 cameras), confidence is low that it should be part of the reconstructed surface. Vertices that fail this second test are also removed. This test makes assumptions about, and hence restrictions on, the general shape of the object to be reconstructed. It works well in practice for reconstructing faces, although the underlying principles of the invention are not limited to any particular type of surface. Finally, the spacing of the vertices is made more uniform by repeatedly merging the closest pair of vertices connected by an edge in the mesh. The merging process is stopped when the closest pair is separated by more than some threshold value. Currently, 0.5 times the grid spacing is known to provide good results.
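The final vertex-merging step may be sketched as follows. The function name and data layout are hypothetical, and placing the merged vertex at the midpoint of the pair is an assumption, since the position of the merged vertex is not specified above. The search for the shortest edge is deliberately naive; a production implementation would likely use a priority queue.

```python
import math


def merge_close_vertices(vertices, edges, threshold):
    """Repeatedly merge the closest pair of edge-connected vertices.

    vertices maps a vertex id to an (x, y, z) tuple; edges is an iterable
    of (id, id) pairs. Merging stops once the closest connected pair is
    separated by more than threshold (e.g. 0.5 times the grid spacing).
    """
    verts = dict(vertices)
    edge_set = {frozenset(e) for e in edges if e[0] != e[1]}
    while edge_set:
        shortest = min(edge_set,
                       key=lambda e: math.dist(*(verts[v] for v in e)))
        keep, drop = sorted(shortest)
        if math.dist(verts[keep], verts[drop]) > threshold:
            break
        # Assumption: the merged vertex is placed at the midpoint.
        verts[keep] = tuple((p + q) / 2
                            for p, q in zip(verts[keep], verts[drop]))
        del verts[drop]
        # Redirect edges from the dropped vertex; discard collapsed edges.
        redirected = set()
        for e in edge_set:
            e2 = frozenset(keep if v == drop else v for v in e)
            if len(e2) == 2:
                redirected.add(e2)
        edge_set = redirected
    return verts, edge_set


merged_verts, merged_edges = merge_close_vertices(
    {0: (0.0, 0.0, 0.0), 1: (0.1, 0.0, 0.0), 2: (1.0, 0.0, 0.0)},
    [(0, 1), (1, 2)],
    threshold=0.5,
)
```

In this example vertices 0 and 1 are merged, and the remaining edge is longer than the threshold, so merging stops.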
“Vertex tracking” as used herein is the process of tracking the motion of selected points in a captured surface over time. In general, one embodiment utilizes two strategies for tracking vertices. The Frame-to-Frame method tracks the points by comparing images taken a very short time apart. The Reference-to-Frame method tracks points by comparing an image to a reference image that could have been captured at a very different time or possibly acquired by some other means. Both methods have strengths and weaknesses. Frame-to-Frame tracking does not give perfect results. Small tracking errors tend to accumulate over many frames. Points drift away from their nominal locations. In Reference-to-Frame, the subject in the target frame can be distorted from the reference. For example, the mouth in the reference image might be closed and in the target image it might be open. In some cases, it may not be possible to match up the patterns in the two images because they have been distorted beyond recognition.
To address the foregoing limitations, in one embodiment of the invention, a combination of Reference-to-Frame and Frame-to-Frame techniques is used. A flowchart describing this embodiment is illustrated in
In one embodiment, for both Reference-to-Frame and Frame-to-Frame tracking, the camera closest to the normal of the surface is chosen. Correlation is used to find the new x,y locations of the points. See, e.g., A
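The use of correlation to find the new x,y location of a point may be sketched as follows. This is an illustrative block-matching search using normalized cross-correlation, which is one common form of correlation and is not necessarily the form used in the embodiment. The function names, patch size and search radius are assumptions, and the tracked point is assumed to lie far enough from the image border for a full patch to be cropped.

```python
import math


def ncc(patch_a, patch_b):
    """Normalized cross-correlation of two equal-sized patches (lists of rows)."""
    a = [v for row in patch_a for v in row]
    b = [v for row in patch_b for v in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0  # a flat patch carries no pattern to correlate
    return sum(x * y for x, y in zip(da, db)) / denom


def crop(image, x, y, half):
    """Square patch of side 2*half+1 centered on (x, y)."""
    return [row[x - half:x + half + 1] for row in image[y - half:y + half + 1]]


def track_point(prev_image, next_image, x, y, half=2, search=3):
    """Return the (x, y) in next_image whose patch best matches prev_image at (x, y)."""
    template = crop(prev_image, x, y, half)
    height, width = len(next_image), len(next_image[0])
    best_score, best_x, best_y = -2.0, x, y
    for ny in range(max(half, y - search), min(height - half, y + search + 1)):
        for nx in range(max(half, x - search), min(width - half, x + search + 1)):
            score = ncc(template, crop(next_image, nx, ny, half))
            if score > best_score:
                best_score, best_x, best_y = score, nx, ny
    return best_x, best_y
```

For Frame-to-Frame tracking, prev_image would be the immediately preceding frame; for Reference-to-Frame tracking, it would be the reference image.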
At times the reconstruction of a surface is imperfect. It can have holes or extraneous bumps. The location of every point is checked by estimating its position from its neighbors' positions. If the tracked location differs too much from this estimate, it is suspected that something has gone wrong with either the tracking or the surface reconstruction. In either case the point is corrected to a best estimate location.
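The neighbor-based consistency check may be sketched as follows. The function name and data layout are hypothetical. Using the mean of the neighbors' positions as the best estimate, and a fixed distance tolerance as the test, are assumptions; the description above does not specify how the estimate is formed.

```python
import math


def correct_outliers(positions, neighbors, tolerance):
    """Replace tracked points that disagree with their neighbors.

    positions maps a point id to its tracked (x, y, z) location; neighbors
    maps a point id to a list of neighboring point ids.
    """
    corrected = dict(positions)
    for point_id, neighbor_ids in neighbors.items():
        if not neighbor_ids:
            continue  # no neighbors, so no estimate can be formed
        # Assumption: the estimate is the mean of the neighbors' positions.
        estimate = tuple(
            sum(positions[n][axis] for n in neighbor_ids) / len(neighbor_ids)
            for axis in range(3)
        )
        if math.dist(positions[point_id], estimate) > tolerance:
            corrected[point_id] = estimate
    return corrected


# Point 2 sits 5 units off the line joining its two neighbors.
fixed = correct_outliers(
    {0: (0.0, 0.0, 0.0), 1: (2.0, 0.0, 0.0), 2: (1.0, 0.0, 5.0)},
    {2: [0, 1]},
    tolerance=1.0,
)
```

Estimates are computed from the original tracked positions, so one corrected point does not influence the check applied to another within the same pass.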
Many prior art motion capture systems (e.g. the Vicon MX40 motion capture system) utilize markers of one form or another that are attached to the objects whose motion is to be captured. For example, for capturing facial motion one prior art technique is to glue retroreflective markers to the face. Another prior art technique to capture facial motion is to paint dots or lines on the face. Since these markers remain in a fixed position relative to the locations where they are attached to the face, they track the motion of that part of the face as it moves.
Typically, in a production motion capture environment, locations on the face are chosen by the production team where they believe they will need to track the facial motion when they use the captured motion data in the future to drive an animation (e.g. they may place a marker on the eyelid to track the motion of blinking). The problem with this approach is that it often is not possible to determine the ideal location for the markers until after the animation production is in process, which may be months or even years after the motion capture session where the markers were captured. At such time, if the production team determines that one or more markers is in a sub-optimal location (e.g. located at a location on the face where there is a wrinkle that distorts the motion), it is often impractical to set up another motion capture session with the same performer and re-capture the data.
In one embodiment of the invention, users specify the points on the capture surfaces that they wish to track after the motion capture data has been captured (i.e. retrospectively relative to the motion capture session, rather than prospectively). Typically, the number of points specified by a user to be tracked for production animation will be far fewer than the number of vertices of the polygons captured in each frame using the surface capture system of the present embodiment. For example, while over 100,000 vertices may be captured in each frame for a face, typically 1000 tracked vertices or fewer are sufficient for most production animation applications.
For this example, a user may choose a reference frame, and then select 1000 vertices out of the more than 100,000 vertices on the surface to be tracked. Then, utilizing the vertex tracking techniques described previously and illustrated in
Embodiments of the invention may include various steps as set forth above. The steps may be embodied in machine-executable instructions which cause a general-purpose or special-purpose processor to perform certain steps. Various elements which are not relevant to the underlying principles of the invention, such as computer memory, hard drives and input devices, have been left out of the figures to avoid obscuring the pertinent aspects of the invention.
Alternatively, in one embodiment, the various functional modules illustrated herein and the associated steps may be performed by specific hardware components that contain hardwired logic for performing the steps, such as an application-specific integrated circuit (“ASIC”) or by any combination of programmed computer components and custom hardware components.
Elements of the present invention may also be provided as a machine-readable medium for storing the machine-executable instructions. The machine-readable medium may include, but is not limited to, flash memory, optical disks, CD-ROMs, DVD ROMs, RAMs, EPROMs, EEPROMs, magnetic or optical cards, propagation media or other type of machine-readable media suitable for storing electronic instructions. For example, the present invention may be downloaded as a computer program which may be transferred from a remote computer (e.g., a server) to a requesting computer (e.g., a client) by way of data signals embodied in a carrier wave or other propagation medium via a communication link (e.g., a modem or network connection).
Throughout the foregoing description, for the purposes of explanation, numerous specific details were set forth in order to provide a thorough understanding of the present system and method. It will be apparent, however, to one skilled in the art that the system and method may be practiced without some of these specific details. Accordingly, the scope and spirit of the present invention should be judged in terms of the claims which follow.
This application is a continuation-in-part of the following U.S. patent applications:
U.S. Ser. No. 11/888,377, filed Jul. 31, 2007, entitled, “System And Method For Performing Motion Capture And Image Reconstruction” which claims the benefit of U.S. Provisional Ser. No. 60/834,771, filed Jul. 31, 2006, entitled, “System And Method For Performing Motion Capture And Image Reconstruction”;
U.S. Ser. No. 11/449,127, filed Jun. 7, 2006, entitled, “System And Method For Performing Motion Capture Using Phosphor Application Techniques”;
U.S. Ser. No. 11/449,043, filed Jun. 7, 2006, entitled, “System And Method For Performing Motion Capture By Strobing A Fluorescent Lamp”;
U.S. Ser. No. 11/449,131, filed Jun. 7, 2006, entitled, “System And Method For Three Dimensional Capture Of Stop-Motion Animated Characters”;
U.S. Ser. No. 11/255,854, filed Oct. 20, 2005, entitled, “Apparatus And Method For Performing Motion Capture Using A Random Pattern On Capture Surfaces” which claims the benefit of U.S. Provisional Ser. No. 60/724,565, filed Oct. 7, 2005 entitled, “Apparatus And Method For Performing Motion Capture Using A Random Pattern On Capture Surfaces”;
U.S. Ser. No. 11/077,628, filed Mar. 10, 2005, entitled, “Apparatus And Method For Performing Motion Capture Using Shutter Synchronization”;
U.S. Ser. No. 11/066,954, filed Feb. 25, 2005, entitled, “Apparatus And Method Improving Marker Identification Within A Motion Capture System”;
U.S. Ser. No. 10/942,413, filed Sep. 15, 2004, entitled, “Apparatus And Method For Capturing The Expression Of A Performer”;
U.S. Ser. No. 10/942,609, filed Sep. 15, 2004, entitled, “Apparatus And Method For Capturing The Motion Of A Performer”.
These applications are collectively referred to as the “co-pending applications” and are incorporated herein by reference.
Number | Date | Country
---|---|---
60834771 | Jul 2006 | US

Relation | Number | Date | Country
---|---|---|---
Parent | 11888377 | Jul 2007 | US
Child | 12455771 | | US