Reference will be made to embodiments of the invention, examples of which may be illustrated in the accompanying figures. These figures are intended to be illustrative, not limiting. Although the invention is generally described in the context of these embodiments, it should be understood that it is not intended to limit the scope of the invention to these particular embodiments.
In the following description, for purposes of explanation, specific details are set forth in order to provide an understanding of the invention. It will be apparent, however, to one skilled in the art that the invention may be practiced without these details. One skilled in the art will recognize that embodiments of the present invention, some of which are described below, may be incorporated into a number of different systems and devices, including projection systems, theatre systems, televisions, home entertainment systems, and other types of audio/visual entertainment systems. The embodiments of the present invention may also be present in software, hardware, firmware, or combinations thereof. Structures and devices shown below in block diagram form are illustrative of exemplary embodiments of the invention and are meant to avoid obscuring the invention. Furthermore, connections between components and/or modules within the figures are not intended to be limited to direct connections. Rather, data between these components and modules may be modified, re-formatted, or otherwise changed by intermediary components and modules.
Reference in the specification to “one embodiment” or “an embodiment” means that a particular feature, structure, characteristic, or function described in connection with the embodiment is included in at least one embodiment of the invention. The appearances of the phrase “in one embodiment” or “in an embodiment” in various places in the specification are not necessarily all referring to the same embodiment.
Systems and methods are disclosed for animating one or more objects in a surround visual field. In an embodiment, a surround visual field is a synthesized, or generated, display that may be shown in conjunction with a main audio/visual presentation in order to enhance the presentation. A surround visual field may comprise one or more elements including, but not limited to, images, patterns, colors, shapes, textures, graphics, texts, objects, characters, and the like.
In an embodiment, one or more elements within the surround visual field may relate to, or be responsive to, the main audio/visual presentation. In one embodiment, one or more elements within the surround visual field, or the surround visual field itself, may visually change in relation to the audio/visual content or the environment in which the audio/visual content is being displayed. For example, elements within a surround visual field may move or change in relation to motion, sounds, and/or color within the audio/video content being displayed.
Returning to
The present invention discloses exemplary frameworks, or systems, for animating elements within a surround visual field. Also disclosed are some illustrative methods for utilizing the system to generate a surround visual field.
A. Surround Visual Field System or Framework
Embodiments of the present invention present a scalable, real-time framework, or system, for creating a surround visual field that is responsive to an input stream. In an embodiment, the framework may be used to affect foreground objects in a surround video field. In an embodiment, the framework may also be used to affect background elements, including but not limited to terrain, lighting, sky, water, background objects, and the like, using one or more control signals, or cues, extracted from the input stream.
Examples of the control signals obtained from the audio include, but are not limited to, phase differences between audio channels, volume levels, audio frequency characteristics, and the like. Examples of control signals from the video include, but are not limited to, motion, color, lighting (such as, for example, identifying the light source in the video or an out of frame light source), and the like. Content recognition techniques may also be used to obtain information about the input stream content.
In an embodiment, the control signal extractor 220 may create a model of motion between successive video frame pairs. In an alternative embodiment, the control signal extractor 220 or the coupling rules module 240 may extrapolate the motion model beyond the boundaries of the input stream video frame and use that extrapolated motion to control the surround visual field. In one embodiment, optic flow vectors may be identified between successive video frame pairs and used to build a global motion model. In an embodiment, an affine model may be used to model motion in the input stream.
In an embodiment, the control signal extractor 220 analyzes motion between an input stream video frame pair and creates a model from which motion between the frame pair may be estimated. The accuracy of the model may depend on a number of factors including, but not limited to, the accuracy of the estimated optical flow, the density of the optic flow vector field used to generate the model, the type of model used and the number of parameters within the model, and the amount and consistency of movement between the video frame pair. The embodiment below is described in relation to successive video frames; however, the present invention may estimate and extrapolate motion between any two or more frames within a video signal and use this extrapolated motion to control a surround visual field.
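By way of a non-limiting sketch, a six-parameter affine motion model may be fit to a set of optic flow vectors by least squares and then evaluated at positions outside the frame boundary to obtain the extrapolated motion mentioned above. The function and variable names below are illustrative assumptions and are not drawn from any particular embodiment.

```python
import numpy as np

def fit_affine_motion(points, flows):
    """Fit a 6-parameter affine motion model u = a1*x + a2*y + a3,
    v = a4*x + a5*y + a6 to sparse optic flow vectors by least squares.

    points : (N, 2) array of (x, y) positions in the frame
    flows  : (N, 2) array of (u, v) flow vectors at those positions
    Returns the parameter vector (a1, ..., a6).
    """
    x, y = points[:, 0], points[:, 1]
    u, v = flows[:, 0], flows[:, 1]
    ones, zeros = np.ones_like(x), np.zeros_like(x)
    # Stack the linear system for both flow components.
    A = np.vstack([
        np.column_stack([x, y, ones, zeros, zeros, zeros]),
        np.column_stack([zeros, zeros, zeros, x, y, ones]),
    ])
    b = np.concatenate([u, v])
    params, *_ = np.linalg.lstsq(A, b, rcond=None)
    return params

def extrapolate_flow(params, points):
    """Evaluate the fitted model, e.g., at positions beyond the frame boundary."""
    a1, a2, a3, a4, a5, a6 = params
    x, y = points[:, 0], points[:, 1]
    return np.column_stack([a1 * x + a2 * y + a3, a4 * x + a5 * y + a6])
```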
In one example, motion vectors that are encoded within a video signal may be extracted and used to identify motion trajectories between video frames. One skilled in the art will recognize that these motion vectors may be encoded and extracted from a video signal using various types of methods including those defined by various video encoding standards (e.g. MPEG, H.264, etc.). In another example, optic flow vectors may be identified that describe motion between video frames. Various other types of methods may also be used to identify motion within a video signal; all of which are intended to fall within the scope of the present invention.
In one embodiment of the invention, the control signal extractor may identify a plurality of optic flow vectors between a pair of frames. The vectors may be defined at various motion granularities including pixel-to-pixel vectors and block-to-block vectors. These vectors may be used to create an optic flow vector field describing the motion between the frames.
The vectors may be identified using various techniques including correlation methods, extraction of encoded motion vectors, gradient-based detection methods of spatio-temporal movement, feature-based methods of motion detection and other methods that track motion between video frames.
Correlation methods of determining optical flow may include comparing portions of a first image with portions of a second image having similarity in brightness patterns. Correlation is typically used to assist in the matching of image features or to find image motion once features have been determined by alternative methods.
Motion vectors that were generated during the encoding of video frames may be used to determine optic flow. Typically, motion estimation procedures are performed during the encoding process to identify similar blocks of pixels and describe the movement of these blocks of pixels across multiple video frames. These blocks may be various sizes including a 16×16 macroblock, and sub-blocks therein. This motion information may be extracted and used to generate an optic flow vector field.
Gradient-based methods of determining optical flow may use spatio-temporal partial derivatives to estimate the image flow at each point in the image. For example, spatio-temporal derivatives of an image brightness function may be used to identify the changes in brightness or pixel intensity, which may partially determine the optic flow of the image. Using gradient-based approaches to identify optic flow may result in the observed optic flow deviating from the actual image flow in areas where image gradients are not strong (i.e., away from edges). However, this deviation may still be tolerable in developing a global motion model for video frame pairs.
Feature-based methods of determining optical flow focus on computing and analyzing the optic flow at a small number of well-defined image features, such as edges, within a frame. For example, a set of well-defined features may be mapped and motion identified between two successive video frames. Other methods are known which may map features through a series of frames and define a motion path of a feature through a larger number of successive video frames.
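As one illustrative gradient-based estimate (only one of the several methods contemplated above), a per-block least-squares solution of the brightness-constancy constraint may be computed from spatio-temporal derivatives. The sketch below assumes grayscale frames supplied as floating-point NumPy arrays; the block size and function name are illustrative.

```python
import numpy as np

def block_flow_gradient(frame0, frame1, block=16):
    """Estimate one optic flow vector per block from spatio-temporal
    image derivatives (a gradient-based, Lucas-Kanade-style method).

    frame0, frame1 : grayscale frames as float arrays of equal shape
    Returns an array of shape (H//block, W//block, 2) of (u, v) vectors.
    """
    Iy, Ix = np.gradient(frame0)          # spatial derivatives
    It = frame1 - frame0                  # temporal derivative
    h, w = frame0.shape
    flow = np.zeros((h // block, w // block, 2))
    for by in range(h // block):
        for bx in range(w // block):
            sl = (slice(by * block, (by + 1) * block),
                  slice(bx * block, (bx + 1) * block))
            A = np.column_stack([Ix[sl].ravel(), Iy[sl].ravel()])
            b = -It[sl].ravel()
            # Least-squares solution of A [u, v]^T = b for this block.
            uv, *_ = np.linalg.lstsq(A, b, rcond=None)
            flow[by, bx] = uv
    return flow
```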
In an embodiment, the control signals obtained from the input stream may represent a characteristic value (e.g., color, motion, audio level, etc.) at a specific instant in time in the input stream or over a relatively short period of time. These local signals allow elements in the surround visual field to correlate with events in the video. For example, an instantaneous event, such as an explosion, in the input stream can correlate via a local signal to a contemporaneous or relatively contemporaneous change in the surround visual field. In an embodiment, the nature, extent, and duration of the effect these local signals have on the surround visual field may be determined by one or more coupling rules.
B. Coupling Rules
The coupling rules represent the linking between the local control signals and how a foreground or background element in the surround visual field will be affected. As shown in
For example, in an embodiment, an aspect of the present invention may involve the synthesizing of three-dimensional environments for a surround visual field. In one embodiment, physics-based simulation techniques known to those skilled in the art of computer animation may be used not only to synthesize the surround visual field, but also as coupling rules. In an embodiment, to generate interactive content to display in the surround visual field, the parameters of two-dimensional and/or three-dimensional simulations may be coupled to or provided with control signals obtained from the input stream.
For purposes of illustration, consider the following embodiments of 3D simulations in which dynamics are approximated by a Perlin noise function. Perlin noise functions have been widely used in computer graphics for modeling terrain, textures, and water, as discussed by Ken Perlin in “An image synthesizer,” Computer Graphics (Proceedings of SIGGRAPH 1985), Vol. 19, pages 287-296, July 1985; by Claes Johanson in “Real-time water rendering,” Master of Science Thesis, Lund University, March 2004; and by Ken Perlin and Eric M. Hoffert in “Hypertexture,” Computer Graphics (Proceedings of SIGGRAPH 1989), Vol. 23, pages 253-262, July 1989, each of which is incorporated herein by reference in its entirety. It shall be noted that the techniques presented herein may be extended to other classes of 3D simulations, including without limitation, physics-based systems.
A one-dimensional Perlin function is obtained by summing up several noise generators Noise(x) at different amplitudes and frequencies:
The function Noise(x) is a seeded random number generator, which takes an integer as the input parameter and returns a random number based on the input. The number of noise generators may be controlled by the parameter octaves, and the frequency at each level increases by a factor of two. The parameter α controls the amplitude at each level, and β controls the overall scaling. A two-dimensional version of Equation (1) may be used for simulating natural-looking terrain. A three-dimensional version of Equation (1) may be used to create water simulations.
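Because Equation (1) is not reproduced here, the sketch below reflects one plausible reading of the summed-octave construction described above, in which α scales the amplitude at each level, the frequency doubles per level, and β scales the overall result. The helper names and the particular integer hash used for Noise(x) are illustrative assumptions only.

```python
import math

def seeded_noise(x, seed=0):
    """Seeded pseudo-random generator: integer in, repeatable value in [-1, 1]."""
    n = (x * 374761393 + seed * 668265263) & 0xFFFFFFFF
    n = (n ^ (n >> 13)) * 1274126177 & 0xFFFFFFFF
    return ((n & 0x7FFFFFFF) / 0x3FFFFFFF) - 1.0

def smooth_noise(x, seed=0):
    """Linearly interpolate the integer-lattice noise for non-integer x."""
    x0 = math.floor(x)
    t = x - x0
    return (1 - t) * seeded_noise(x0, seed) + t * seeded_noise(x0 + 1, seed)

def perlin_1d(x, octaves=4, alpha=0.5, beta=1.0):
    """One plausible reading of the summed-octave form: sum `octaves` noise
    generators, doubling the frequency and scaling the amplitude by alpha at
    each level, with beta controlling the overall scale."""
    total = 0.0
    for i in range(octaves):
        frequency = 2 ** i            # frequency doubles at each level
        amplitude = alpha ** i        # alpha controls the per-level amplitude
        total += amplitude * smooth_noise(x * frequency, seed=i)
    return beta * total
```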
The parameters of a real-time water simulation may be driven using an input video stream to synthesize a responsive three-dimensional surround field. The camera motion, the light sources, and the dynamics of the three-dimensional water simulation may be coupled through coupling rules to motion vectors, colors, and audio signals sampled from the video.
In an embodiment, the motion of a virtual camera may be governed by dominant motions from the input video stream. To create a responsive “fly-through” of the three-dimensional simulation, an affine motion model may be fit to motion vectors from the input stream. An affine motion field may be decomposed into the pan, tilt, and zoom components about the image center (cx, cy). These three components may be used to control the direction of a camera motion in simulation.
The pan component may be obtained by summing the horizontal components of the velocity vector (ui, vi) at four symmetric points (xi, yi) 360A-360D around the image center 350:
The tilt component may be obtained by summing the vertical components of the velocity vector at the same four points:
The zoom component may be obtained by summing the projections of the velocity vectors along the radial direction (rix, riy):
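The three sums described above may be sketched as follows, assuming an affine motion model (for example, one fitted as sketched earlier) is evaluated at four symmetric points around the image center. The exact equations of the embodiment are not reproduced, so the sampling offset, normalization, and names below are illustrative assumptions.

```python
import numpy as np

def camera_cues_from_affine(params, cx, cy, offset):
    """Derive pan, tilt, and zoom cues by sampling an affine motion model
    at four symmetric points around the image center (cx, cy).

    params : (a1, ..., a6) affine parameters, e.g., from fit_affine_motion()
    offset : distance of the four sample points from the center (illustrative)
    """
    a1, a2, a3, a4, a5, a6 = params
    # Four symmetric sample points around the image center.
    points = np.array([[cx - offset, cy], [cx + offset, cy],
                       [cx, cy - offset], [cx, cy + offset]], dtype=float)
    x, y = points[:, 0], points[:, 1]
    u = a1 * x + a2 * y + a3              # horizontal velocity components
    v = a4 * x + a5 * y + a6              # vertical velocity components
    v_pan = u.sum()                       # sum of horizontal components
    v_tilt = v.sum()                      # sum of vertical components
    # Unit radial directions from the image center to each sample point.
    r = points - np.array([cx, cy])
    r /= np.linalg.norm(r, axis=1, keepdims=True)
    v_zoom = (u * r[:, 0] + v * r[:, 1]).sum()   # projections onto the radial direction
    return v_pan, v_tilt, v_zoom
```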
In an embodiment, control signals may be used to control light sources in the three-dimensional synthesis. A three-dimensional simulation typically has several rendering parameters that control the final colors of the rendered output. The coloring in a synthesized environment may be controlled or affected by one or more color values extracted from the input stream. In an embodiment, a three-dimensional environment may be controlled or affected by a three-dimensional light source Clight, the overall brightness Cavg, and the ambient color Camb. In one embodiment, for each frame in the video, the average intensity, the brightest color, and the median color may be computed and these values assigned to Cavg, Clight, and Camb, respectively. One skilled in the art will recognize that other color values or frequencies of color sampling may be employed.
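One way to compute the three color cues named above from a video frame is sketched below. The use of a luminance weighting to select the brightest pixel and of a per-channel median are illustrative assumptions, not requirements of the framework.

```python
import numpy as np

def color_cues(frame):
    """Extract the three color cues described above from an RGB frame.

    frame : (H, W, 3) array of RGB values
    Returns (C_avg, C_light, C_amb): average intensity, brightest color,
    and median color.  The luminance weighting used to pick the brightest
    pixel is an illustrative choice.
    """
    pixels = frame.reshape(-1, 3).astype(float)
    c_avg = pixels.mean(axis=0)                       # overall brightness
    luminance = pixels @ np.array([0.299, 0.587, 0.114])
    c_light = pixels[luminance.argmax()]              # brightest color -> light source
    c_amb = np.median(pixels, axis=0)                 # median color -> ambient color
    return c_avg, c_light, c_amb
```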
In an embodiment, the dynamics of a simulation may be controlled by the parameters α and β in Equation (1). By way of illustration, in a water simulation, the parameter α controls the amount of ripples in the water, whereas the parameter β controls the overall wave size. In an embodiment, these two simulation parameters may be coupled to the audio amplitude Aamp and motion amplitude Mamp as follows:
where Mamp=Vpan+Vtilt+Vzoom; f(.) and g(.) are linear functions that vary the parameters between their acceptable intervals (αmin, αmax) and (βmin, βmax). The above coupling rules or equations result in the simulation responding to both the audio and motion events in the input video stream.
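A minimal sketch of this coupling is shown below, assuming f(.) and g(.) simply remap amplitudes, normalized to [0, 1], onto the acceptable parameter intervals. The interval values and the normalization are illustrative assumptions.

```python
def linear_map(value, lo, hi):
    """Map a value in [0, 1] linearly onto the interval (lo, hi)."""
    value = min(max(value, 0.0), 1.0)   # clamp to the acceptable range
    return lo + value * (hi - lo)

def couple_simulation_params(a_amp, v_pan, v_tilt, v_zoom,
                             alpha_range=(0.3, 0.9), beta_range=(0.5, 2.0)):
    """Couple the audio amplitude A_amp and the motion amplitude M_amp to the
    simulation parameters alpha (ripples) and beta (overall wave size).
    Amplitudes are assumed pre-normalized to [0, 1]; the ranges are illustrative."""
    m_amp = v_pan + v_tilt + v_zoom
    alpha = linear_map(a_amp, *alpha_range)   # alpha = f(A_amp)
    beta = linear_map(m_amp, *beta_range)     # beta  = g(M_amp)
    return alpha, beta
```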
It should be noted that the above discussion was presented to illustrate how control signals obtained from the input stream may be used to couple with the generating of the surround visual field in the framework 200, such as, for example, using one or more parameters of a model to have one or more elements within the surround visual field respond to the input stream. Those skilled in the art will recognize that other implementations may be employed to generate surround visual fields, and such implementations fall within the scope of the present invention.
C. Articulated Elements
Another aspect of the present invention is its ability to animate one or more articulated elements, for example fish, birds, people, machines, etc., that may be made to move and/or to behave in response to the input stream. As explained in more detail below, the framework 200 enables elements in the surround visual field to exhibit a wide range of rich and expressive behaviors. Additionally, the framework allows for easy control of global characteristics, such as motion and behavior, using a small number of control parameters.
1. Model
Elements such as animals, insects, people, machines, and even plants have a frame or skeleton. Modeling the frame or skeleton is beneficial in modeling how inputs, such as input forces, affect the element. Consider, by way of example, animals and people. These moving elements have articulated musculoskeletal frameworks for locomotion. The element's musculoskeletal frame determines the type and range of motions for the element.
The same principles of skeleton-based locomotion may be applied to virtual elements. In an embodiment, each character element may be represented using a triangular mesh with an underlying skeletal bone structure.
By way of example,
In
2. Animating Articulated Character Elements
In an embodiment, a character element may be animated by varying the root position and joint angles over time. The motion of the root joint controls the overall pose, including position and orientation, of the element, and the motion of the other joints creates different behaviors. In an embodiment, these joint angles may be animated by an artist by posing the skeleton. In one embodiment, the framework 200 computes deformations of the mesh in response to the changes in the skeleton poses. This process of deforming the mesh in response to the changes in joint angles is called skinning. Examples of skinning are discussed by J. P. Lewis, Matt Cordner, and Nickson Fong in "Pose space deformations: A unified approach to shape interpolation and skeleton-driven deformation," Proceedings of ACM SIGGRAPH 2000, Computer Graphics Proceedings, Annual Conference Series, pages 165-172, July 2000, which is incorporated by reference herein in its entirety.
In one embodiment, skinning may involve associating one or more regions of the mesh of the character with its underlying frame segment/bone, and updating these mesh regions (vertex positions) as the frame segments/bones move.
In an embodiment, to achieve real-time performance, portions of the animation framework may be implemented on a graphics processing unit (GPU) or graphics card. For example, embodiments of the present invention were performed using an NVIDIA® GeForce 6800 processor with 256 megabytes (MB) of texture memory. One skilled in the art will recognize that no particular graphics processing unit is critical to the practice of the present invention.
In an embodiment, the skinning process may be implemented on a graphics card. That is, in an embodiment, the framework may implement skinning on hardware using vertex or pixel shaders. Each vertex on the base mesh may be influenced by a maximum number of bones. To compute the final, deformed position of a given vertex, the shader program may compute the deformation caused by all the joints affecting that particular vertex. The final position of the vertex may be a weighted average of these deformations. Because the deformation of each vertex is independent of the other vertices in the mesh, the skinning step may be implemented on the GPU.
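The weighted-average skinning step described above may be sketched as follows. A CPU version using NumPy is shown for clarity; a GPU vertex-shader implementation would perform the same per-vertex computation. The per-vertex bone indices and weights are assumed to be authored with the mesh, and the names are illustrative.

```python
import numpy as np

def skin_vertices(rest_positions, bone_matrices, bone_indices, bone_weights):
    """Linear blend skinning: each vertex's final position is the weighted
    average of the positions produced by every bone (joint) that influences it.

    rest_positions : (V, 3) vertex positions in the rest pose
    bone_matrices  : (B, 4, 4) current bone transforms (rest-to-posed)
    bone_indices   : (V, K) indices of the K bones influencing each vertex
    bone_weights   : (V, K) weights for those bones (rows sum to 1)
    """
    V = rest_positions.shape[0]
    homogeneous = np.hstack([rest_positions, np.ones((V, 1))])   # (V, 4)
    skinned = np.zeros((V, 3))
    K = bone_indices.shape[1]
    for k in range(K):
        # Transform every vertex by its k-th influencing bone ...
        mats = bone_matrices[bone_indices[:, k]]                 # (V, 4, 4)
        moved = np.einsum('vij,vj->vi', mats, homogeneous)[:, :3]
        # ... and accumulate the weighted contribution.
        skinned += bone_weights[:, k:k + 1] * moved
    return skinned
```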
One skilled in the art will recognize that these and other modeling and animation techniques may be used for any of a number of objects, including without limitation, plants, animals, people, insects, machines, and the like.
3. Behavioral Model
In an embodiment, the motion of an element's frame may be designed by an artist using existing animation packages, such as Maya or Blender3D. The motion may be designed such that a sequence of joint angles, called a motion clip, for the element corresponds to a unique behavior. These motion clips may be stored for retrieval by the framework 200. In an embodiment, the motion clips may be stored in the “.x,” “.bmp,” and/or “.jpeg” file formats and accessed by the framework 200. As noted previously, it shall be noted that no particular file format is critical to the present invention, and that the motion clips and other elements of the surround visual field may be stored in any file format now existing or later developed.
It shall be noted that the motion clips need not be linked to emotional traits, but may be applied to any animation or motion, such as a machine performing specific tasks or a plant swaying, blooming, shedding its leaves, etc.
In an embodiment, the overall behavior of the element may be modeled using a collection of motion clips describing different behaviors. The collection may include one or more specific motion sequences.
4. Markov Model For Transitions
In one embodiment, the motion clips may be combined to create a combination of behaviors by using Markov models for transitions between motion clips. Markov models provide a simple mechanism for the element to change its behavior based on the events in an input stream.
Markov models may be used for capturing the overall element behavior using the collection of motion clips. In an embodiment, a Markov model represents each motion clip as a node in a graph. Transitions between these nodes may be controlled by one or more control signals, or cues, derived from the input audio-visual stream.
In an embodiment, it is assumed that the next state of the element depends only on the current state of the element and not on its history. In one embodiment, each element may have multiple states (e.g., happy, sad, scared, jump, run, hop, eat, etc.) and may have an uncertainty associated with the actions (e.g., by assigning a probability to each action), which allows for a rich set of object variations. In such cases, the element behavior may be explained mathematically using a Markov Decision Process (MDP).
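A minimal sketch of such a Markov model is shown below, in which each motion clip is a node and the transition probabilities are selected by a cue derived from the input stream. The state names, cue names, and probability values are illustrative only and do not describe any particular embodiment.

```python
import random

class MarkovBehavior:
    """Markov model over motion clips: the next clip depends only on the
    current clip and on a control signal (cue) derived from the input stream."""

    def __init__(self, transitions):
        # transitions[state][cue] -> list of (next_state, probability)
        self.transitions = transitions

    def next_state(self, current, cue):
        choices = self.transitions[current][cue]
        r, cumulative = random.random(), 0.0
        for state, prob in choices:
            cumulative += prob
            if r <= cumulative:
                return state
        return choices[-1][0]

# Illustrative behavior graph for a "calm" fish (states and cues are hypothetical).
calm_fish = MarkovBehavior({
    "swim":   {"quiet": [("swim", 0.9), ("eat", 0.1)],
               "loud":  [("scared", 0.7), ("swim", 0.3)]},
    "eat":    {"quiet": [("eat", 0.6), ("swim", 0.4)],
               "loud":  [("scared", 0.9), ("eat", 0.1)]},
    "scared": {"quiet": [("swim", 0.8), ("scared", 0.2)],
               "loud":  [("scared", 1.0)]},
})

state = "swim"
for cue in ["quiet", "quiet", "loud", "quiet"]:   # cues derived from the audio level
    state = calm_fish.next_state(state, cue)
```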
It should be noted that an embodiment of the behavioral model may be based on transitions within a clip and between other clips. To synthesize smooth animations, transitions may be made continuous. In an embodiment, continuity may be achieved by smoothly morphing the vertex positions from the last pose of the previous clip to the first pose of the new clip. In an embodiment, this step may be implemented on a graphics processing unit as a vertex shader program.
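The continuity step may be sketched as a per-vertex morph from the last pose of the outgoing clip to the first pose of the incoming clip; the smoothstep weighting below is an illustrative choice, and a vertex-shader version would perform the same interpolation per vertex.

```python
def morph_poses(last_pose, first_pose, t):
    """Blend vertex positions (NumPy arrays of shape (V, 3)) from the last pose
    of the previous clip to the first pose of the new clip; t runs from 0 to 1
    over the transition.  A smoothstep weight eases in and out of the morph."""
    t = min(max(t, 0.0), 1.0)
    s = t * t * (3.0 - 2.0 * t)          # smoothstep easing
    return (1.0 - s) * last_pose + s * first_pose
```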
As mentioned previously, the coupling rules may also include uncertainty or variability associated with the behavior by assigning a probability to each action. For example, one or more puffer fish in a pool of fish may be assigned as “calm” fish, meaning that they have a predisposition to stay in a calm state of swimming. And, one or more puffer fish may be assigned as “easily agitated” fish, wherein they are more likely to get scared. For purposes of illustration,
One skilled in the art will recognize that a benefit of the framework 200 is its ability to allow a user to alter which control signals are extracted, the coupling rules, the probabilities, or more than one of these items, thereby giving greater control over the responsiveness of the synthesized surround visual field.
5. Global Motion Model
As noted previously, an embodiment of the behavioral model uses the motion clips, which describe the variation of joint angles of the frame or skeleton. For example, the two motion clips in
D. Control
As noted previously, a beneficial aspect of the framework is its ability to easily control and program the global motion and behavior of the objects in the surround visual field. The character element model presented above has been designed to be easily controllable using a small set of parameters. Presented below are some of the different control parameters that may be used in the framework.
1. Global Motion Control
As mentioned previously, the root joint may be animated independently to generate a desired global trajectory. In an embodiment, the framework may use a key-framing approach to set the root joint trajectory. In one embodiment, the user may specify one or more control points. Given the key frame points, the framework 200 may interpolate these points to generate a smooth trajectory for the root joint. Quantities specified in the control points may include root positions X∈R3, orientations θ∈R4, scale s∈R3, and time—each of which may, in an embodiment, be interpolated along a trajectory. In an alternative embodiment, the framework may allow the root joint trajectory to be disturbed in response to one or more local control signals obtained from the input stream. In an embodiment, this result may be achieved by adding a noise displacement to the control points of the interpolated trajectory.
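A sketch of such key-framed root-joint control is shown below, using simple per-component linear interpolation of user-specified control points and an optional noise displacement. A production implementation might instead use splines and quaternion interpolation for orientation; all names and the use of Gaussian noise are illustrative assumptions.

```python
import numpy as np

def interpolate_root_trajectory(times, positions, query_times, noise_scale=0.0):
    """Interpolate key-framed root-joint positions along a trajectory.

    times       : (N,) key-frame times (increasing)
    positions   : (N, 3) root positions X at those times
    query_times : (M,) times at which to evaluate the trajectory
    noise_scale : amplitude of an optional displacement of the control points,
                  e.g., driven by a local control signal from the input stream
    """
    positions = np.asarray(positions, dtype=float)
    if noise_scale > 0.0:
        # Disturb the control points in response to a local control signal.
        positions = positions + noise_scale * np.random.randn(*positions.shape)
    return np.column_stack([
        np.interp(query_times, times, positions[:, d]) for d in range(3)
    ])
```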
2. Behavior Control
In an embodiment, behavior control may be achieved by building a state-action graph or graphs for the given element. The framework allows for a wide range of control, from fully scripted character element responses to highly stochastic character element behavior. Typically, once a set of motion clips has been designed, a list of possible transitions between the different states may be defined. In an embodiment, the possible transitions between the different states may be weighted by probabilities. In an embodiment, a list of actions, or control signals and coupling rules, corresponding to these transitions may also be specified; these actions correspond to the various control signals derived from the input stream. The ability to custom-build the Markov graph allows for control of the element behavior for a wide range of control signals from the input stream.
E. Coupling with Audio Video Signals
To demonstrate the various features of the animation framework, a fish tank simulation is depicted in
Depicted in
1. Coupling Light Sources with Video Color:
In an embodiment, the color of the fish tank may be designed to relate with colors of the input video 910. In the depicted embodiment, the fish tank simulation has six point light sources, four at the corners, one behind the tank and one in front of the tank. The colors of the light sources may be obtained by sampling colors from the corresponding video frame. For example, the light source on the top left corner samples its color from the upper left quadrant pixels of the input video stream 910. Additionally, the fish tank simulator has a fog source whose density may be coupled to the image colors.
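A sketch of the corner-light coupling described above is shown below, assuming each corner light samples the mean color of the corresponding quadrant of the current video frame; the use of the mean (rather than another sampling scheme) and the function name are illustrative choices.

```python
import numpy as np

def corner_light_colors(frame):
    """Sample one color per corner light source from the matching quadrant
    of the input video frame (top-left light <- upper-left quadrant, etc.).

    frame : (H, W, 3) RGB array of the current video frame
    """
    h, w = frame.shape[0] // 2, frame.shape[1] // 2
    quadrants = {
        "top_left": frame[:h, :w], "top_right": frame[:h, w:],
        "bottom_left": frame[h:, :w], "bottom_right": frame[h:, w:],
    }
    return {name: q.reshape(-1, 3).mean(axis=0) for name, q in quadrants.items()}
```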
2. Coupling Character Motion with Audio:
In an embodiment, the fish motion (global direction, speed, orientation, etc.) and behavior (swim, scared, etc.) may be controlled by the audio intensity. In the last frame 900C, the fish 940 and 945 are scared by a loud noise in the input video 910C.
In an embodiment, the speed of the fish motion may be coupled with the audio intensity, such that the fish swim faster when there is a lot of audio action in the stream. In order to achieve this, the simulation time step may be varied as follows:
t = t0 × α^k × vol  (8)
where t0 is the initial value of the time step; vol is the local control signal representing audio intensity; and α and k are tunable parameters. In the simulation depicted in
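Expressed as code, and assuming the exponent reading of Equation (8) given above, the coupling is simply the following; the parameter values passed in are illustrative.

```python
def simulation_time_step(t0, vol, alpha, k):
    """Vary the simulation time step with the audio intensity so that the
    fish swim faster when the sound track is loud (per Equation (8))."""
    return t0 * (alpha ** k) * vol

# Illustrative usage with hypothetical tuning values.
dt = simulation_time_step(t0=0.02, vol=0.8, alpha=1.5, k=2)
```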
In an embodiment, other methods for coupling the surround visual field with control signals from the input stream may include using motion vectors to affect the object motion. The above examples were provided for purposes of illustration only and shall not be used to narrow the invention. One skilled in the art will recognize other control signals that may be obtained from the input stream and other coupling rules for linking the control signals to the surround visual field.
F. Surround Visual Field Framework with Growth Model
In one embodiment, the global control signals 224 may be provided to one or more growth coupling rules 270, such as a growth model. The growth coupling rules may possess coupling rules for linking foreground 250 and/or background 260 elements in the surround visual field to one or more growth models. In an embodiment, a growth coupling rule may be used to allow the surround visual field to evolve over the course of the presentation.
It should be noted that the addition of one or more growth coupling rules allows for even more robustness and responsiveness of the surround visual field. In addition to instantaneous changes from local control signals and their associated coupling rules, longer term aspects or patterns in the input stream 210 may be introduced into the surround visual field through one or more growth coupling rules.
In an embodiment, the growth coupling rule may consider the “age” of an element or elements in the surround visual field. Consider, for example, the surround visual field 1130 presented in
In an embodiment, the global control signal and/or growth coupling rules may represent patterns in the input stream. Consider for example the evolving surround visual field depicted in
It should be noted that no particular implementation of the growth model 270 or the framework 200 is critical to the present invention. One skilled in the art will recognize other implementations and uses of the surround visual field framework 200, which are within the scope of the present invention.
G. Exemplary Method for Generating a Surround Visual Field According to an Embodiment
Turning now to
An input stream may be analyzed to obtain (1210) a control signal that is related to the input stream. As discussed above, the control signal may relate to the input stream by extracting or obtaining a characteristic from the input stream, such as, for example, motion, color, audio signal, and/or content. The control signal may then be supplied to a coupling rule to generate an effect that may be applied (1215) to at least one element of the surround visual field. In an embodiment, the effect may be applied to multiple elements in the surround visual field. In one embodiment, an element may have more than one effect applied to it, wherein the resulting effect may be the superposition of all the effects applied to the element.
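The method may be summarized in the following non-limiting sketch, in which coupling rules are represented as callables and effects applied to the same element superpose because the rules are applied in sequence; all names and data structures are illustrative placeholders.

```python
def apply_coupling_rules(elements, control_signals, coupling_rules):
    """Apply (1215) coupling-rule effects to surround visual field elements.

    elements        : list of dicts holding element state (illustrative)
    control_signals : dict of control signals obtained (1210) from the stream
    coupling_rules  : list of callables; each takes (element, control_signals)
                      and mutates the element.  Effects superpose because the
                      rules are applied in sequence to the same element.
    """
    for element in elements:
        for rule in coupling_rules:
            rule(element, control_signals)
    return elements

# Illustrative usage: two rules affecting the same fish element superpose.
fish = {"speed": 1.0, "state": "swim"}
rules = [
    lambda e, s: e.update(speed=e["speed"] * (1.0 + s["audio"])),
    lambda e, s: e.update(state="scared" if s["audio"] > 0.8 else e["state"]),
]
apply_coupling_rules([fish], {"audio": 0.9}, rules)
```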
It should also be noted that the effect on elements within the surround visual field, particularly like elements, may be different. Consider, by way of illustration, a school of fish in a surround visual field. A coupling rule may receive audio control signals as an input and output the motion of the fish. Given the same input, the reaction of each element (i.e., each fish in the school of fish) may be different. The reaction may be different due to additional control signal inputs, parameters, probabilities, or the like. The fish may scatter in different directions and create different flocking groups. The flocking behavior may be part of the local coupling rule and/or part of a growth coupling rule.
Finally, the surround visual field may be displayed (1220) in an area that surrounds or partially surrounds an area displaying the input stream, thereby enhancing the viewing experience for a user or users.
In the embodiment depicted in
It shall be noted that embodiments of the present invention may further relate to computer products with a computer-readable medium that have computer code thereon for performing various computer-implemented operations. The media and computer code may be those specially designed and constructed for the purposes of the present invention, or they may be of the kind known or available to those having skill in the relevant arts. Examples of computer-readable media include, but are not limited to: magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and holographic devices; magneto-optical media; and hardware devices that are specially configured to store or to store and execute program code, such as application-specific integrated circuits (ASICs), programmable logic devices (PLDs), flash memory devices, and ROM and RAM devices. Examples of computer code include machine code, such as produced by a compiler, and files containing higher level code that are executed by a computer using an interpreter.
While the invention is susceptible to various modifications and alternative forms, a specific example thereof has been shown in the drawings and is herein described in detail. It should be understood, however, that the invention is not to be limited to the particular form disclosed, but to the contrary, the invention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the appended claims.
This application is related to co-pending and commonly-assigned U.S. patent application Ser. No. 11/294,023, filed on Dec. 5, 2005, entitled “IMMERSIVE SURROUND VISUAL FIELDS,” listing inventors Kar-Han Tan and Anoop K. Bhattacharjya, which is incorporated by reference in its entirety herein. This application is related to co-pending and commonly-assigned U.S. patent application Ser. No. 11/390,932, filed on Mar. 28, 2006, entitled “SYSTEMS AND METHODS FOR UTILIZING IDLE DISPLAY AREA,” listing inventors Kiran Bhat and Anoop K. Bhattacharjya, which is incorporated by reference in its entirety herein. This application is related to co-pending and commonly-assigned U.S. patent application Ser. No. 11/390,907, filed on Mar. 28, 2006, entitled “SYNTHESIZING THREE-DIMENSIONAL SURROUND VISUAL FIELD,” listing inventors Kiran Bhat, Kar-Han Tan, and Anoop K. Bhattacharjya, which is incorporated by reference in its entirety herein.