The present application relates generally to video graphics processing, and more particularly, to merging simulated entities, such as people and vehicles, into live-video feeds.
When training in a network centric environment, users may desire a combination of real and virtual platforms. Within existing training centers, however, it is difficult to show virtual people or platforms involved in a training exercise in combination with a live video feed from a camera.
Existing systems typically film live action in front of a “blue” or “green” screen, and then insert a computer-generated image or virtual entity behind the live action, that is, in any area of the live video feed containing the blue or green color. In this example, a person can appear to be standing on a beach when the person is actually in a film studio in front of a large blue or green background. Different backgrounds can be added in those parts of the image where the color is blue. However, if the person wears blue clothing, for example, the clothing will be replaced with the background video as well. Blue and green are often used because these colors are considered least like skin tone.
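By way of illustration only, the following is a minimal sketch of such conventional chroma keying in Python with NumPy; the key color, tolerance value, and array names are illustrative assumptions rather than part of any particular existing system.

```python
import numpy as np

def chroma_key(foreground, background, key_color=(0, 0, 255), tol=60):
    """Replace pixels near the key color in `foreground` with `background`.

    foreground, background: HxWx3 uint8 arrays of the same shape.
    key_color: the screen color to remove (here blue, in RGB order);
               an illustrative assumption.
    tol: per-channel tolerance; pixels within this distance of the key
         color are treated as background.
    """
    fg = foreground.astype(np.int16)
    key = np.array(key_color, dtype=np.int16)
    # Mask of pixels close enough to the key color to be replaced.
    mask = np.all(np.abs(fg - key) <= tol, axis=-1)
    out = foreground.copy()
    out[mask] = background[mask]
    return out
```

As the paragraph above notes, any foreground pixel that happens to match the key color (such as blue clothing) is replaced along with the screen, which is a limitation of this approach.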
Such techniques work well in instances in which it is not necessary to insert simulated action into the live video feed with a desired depth and obscuration. Existing techniques, however, may not accurately place simulated objects that are part of both the foreground and the background. For example, existing techniques may not allow a virtual entity to change from being fully obscured by a real live object, to partially obscured by the object, to not obscured by the object at all.
A system and method for video graphics processing is described. The present application describes a manner of integrating virtual entities and live video streams (or camera snapshots). For example, this method may be used to enable training with live assets to include virtual assets in the scenario. Virtual entities can be made to disappear (or partially disappear) behind terrain objects, such as walls, doors, and tables, in the same manner that a view of a live person would be obscured by such objects.
In one aspect, the present application includes a method of integrating virtual entities within live video. The method includes receiving a live video feed of a terrain from a camera, receiving any updates from computer generated forces (CGF) entities, and rendering a three-dimensional model of the terrain and the CGF entities in a manner synchronized with the live video feed from the camera. The method further includes merging the rendered terrain, including the CGF entities, with the live video feed so that the CGF entities are seen in the live video feed, and outputting the merged data to a display.
In another aspect, the present application includes a system for integrating virtual entities within live video. The system includes a live camera video feed of a terrain and a computer operable to execute instructions for rendering a three-dimensional model of the terrain with a virtual entity within a simulated world. The system further includes an overlay controller coupled to the live camera video feed and the computer. The overlay controller merges the three-dimensional model with the live camera video feed so that the virtual entity is seen in the live camera video feed. The overlay controller also synchronizes the rendered virtual world view with the live camera video feed.
In still another aspect, the present application includes a method of integrating virtual entities with the view of a person. The method includes accessing a simulated model of a terrain in the field of view of the person from a database, and inserting a virtual entity into the simulated model of the terrain. The virtual entity is positioned within the simulated model of the terrain accurately with respect to background and static objects in the simulated model of the terrain. The method also includes rendering the simulated model of the terrain, including the virtual entity, in a manner synchronized with the field of view of the person. The simulated model of the terrain is rendered in a monochrome color and the virtual entity is rendered in multi-color. The method further includes displaying the simulated model of the terrain, including the virtual entity, in front of the person, with the monochrome color set to be transparent.
These as well as other aspects and advantages will become apparent to those of ordinary skill in the art by reading the following detailed description, with reference where appropriate to the accompanying drawings. Further, it is understood that this summary is merely an example and is not intended to limit the scope of the invention as claimed.
The present application provides a method and system for video graphics processing. In an exemplary embodiment, a computer rendered mimic of a live video feed with additional virtual entities is generated, so that the virtual entities can be integrated accurately within the live video feed. The computer rendered mimic of the live video feed is created using known attributes of the video feed (such as the location, orientation, and field of view of the camera). Additionally, the computer rendered mimic of the live video feed uses a three-dimensional terrain model in which the location, orientation, and shape of static objects present in the real world are accurately positioned. This allows virtual entities to appropriately interact with static objects in the virtual world so that, once merged with the live video feed, the virtual entities will appear to appropriately interact with static objects in the real world. Interaction of the virtual entities with the virtual world may be performed using known techniques that are commercially available and used in computer games, video games, and the simulation industry.
The three-dimensional terrain model of the terrain within the view, or possible view, of the live camera is first created to map out objects in the image. Objects are positioned in the three-dimensional database relative to each other. A final or exact location of the camera is not necessary at this point, as long as the three-dimensional terrain model contains an accurate model of the terrain within the view of the camera. Only objects between the camera position and the virtual entity's position need to be accurately represented in this three-dimensional terrain model. When the virtual entities exhibit less dynamic behavior, the high fidelity sections of the three-dimensional terrain model can be more narrowly focused. For cameras with a dynamic location or orientation, more of the terrain may need to be accurately represented in the three-dimensional terrain model. All physical static objects present in the real world will be represented in a mono-chrome default color within the three-dimensional terrain model. Virtual entities or objects that are not present in the real world will be represented in full color. The mono-chrome default color should not be present in these virtual entities. During runtime, the three-dimensional terrain and virtual entities are rendered from the point of view of the live camera. This results in a simulated video feed in which the virtual entities are properly placed, sized, and obscured, while the rest of the simulated video feed is the default mono-chrome color. From this point, the simulated video is merged with the live video feed so that the parts of the simulated video feed that are not the mono-chrome default color are overlaid on top of the live video feed. Thus, for example, a virtual entity representing and resembling a person may be inserted with portions in front of a tree, but behind a portion of a plant that is present in the video feed.
In exemplary embodiments, the virtual entities rendered with the three-dimensional terrain model can be merged with the live video feed, so that the virtual entities are seen in the live video feed and are accurately represented with respect to objects in the live video feed. The merging can occur in real-time, so that the virtual entities are overlaid on top of the live video feed in real-time.
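By way of illustration only, a minimal sketch of this merging step is shown below in Python with NumPy; the frame arrays and the choice of green as the default mono-chrome color are illustrative assumptions.

```python
import numpy as np

DEFAULT_COLOR = (0, 255, 0)  # illustrative mono-chrome terrain color (green)

def merge_simulated_over_live(simulated_frame, live_frame,
                              default_color=DEFAULT_COLOR):
    """Overlay every simulated pixel that is NOT the default mono-chrome
    color on top of the live frame.

    Because the mono-chrome terrain already obscured the virtual entities
    during rendering, only the visible portions of the entities survive
    into the merged frame, so the entities appear properly placed, sized,
    and obscured within the live video.
    """
    key = np.array(default_color, dtype=simulated_frame.dtype)
    # True wherever the simulated pixel differs from the default color.
    mask = np.any(simulated_frame != key, axis=-1)
    merged = live_frame.copy()
    merged[mask] = simulated_frame[mask]
    return merged
```

Applying this function once per frame pair yields the real-time overlay described above.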
Turning to the figures,
It should be understood that the system 100 and other arrangements described herein are for purposes of example only. As such, those skilled in the art will appreciate that other arrangements and other elements (e.g., machines, interfaces, functions, orders, and groupings of functions, etc.) can be used instead, and some elements may be omitted altogether according to the desired results. Additionally, other methods of overlaying video may be used. Further, many of the elements that are described are functional entities that may be implemented as discrete or distributed components or in conjunction with other components, in any suitable combination and location.
The operator 202 may operate an asset controller 204 to control the asset 208 and the simulated asset 210. The asset controller 204 may be a computer that includes software that may be executed to control the asset 208, and to view merged video from the asset 208 and the simulated asset 210. The asset controller 204 may be a standard laptop or desktop computer, for example. The asset controller 204 may include software to convert input received from the asset operator 202 into commands that are understood by the asset 208.
The asset controller 204 includes a location/PTZ CMD (pan/tilt/zoom command) application that can be executed by a processor to send commands to the asset 208 and the simulated asset 210 via an overlay controller 206. The format of the commands will depend on the type of asset. The commands may be interpreted to modify and control a location of the asset 208 (in the event that the asset 208 can change locations) and/or modify and control a configuration of the asset 208, such as controlling a pan-tilt-zoom function of a camera. The asset controller 204 may send commands in the same format as if the asset controller 204 were directly coupled to the asset 208. The asset controller 204 may receive a response from the asset 208 via the overlay controller 206 indicating receipt of the commands and the actual location and orientation of the asset 208. The asset controller 204 will receive a merged video stream from the overlay controller 206 and display the video. The merged video stream comprises the asset 208 video overlaid with the simulated asset 210 video after setting a mono-chrome color of the simulated asset 210 video to transparent, as discussed above.
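By way of illustration only, the following hypothetical Python record sketches the kind of location/pan-tilt-zoom command the asset controller 204 might forward; the field names and units are assumptions, not a defined command format.

```python
from dataclasses import dataclass

@dataclass
class AssetCommand:
    """Hypothetical command record sent from the asset controller to both
    the live asset and the simulated asset via the overlay controller.
    Field names and units are illustrative assumptions only."""
    pan_deg: float = 0.0    # relative pan, degrees (positive = right)
    tilt_deg: float = 0.0   # relative tilt, degrees (positive = up)
    zoom: float = 1.0       # zoom factor
    x: float = 0.0          # commanded location, if the asset is mobile
    y: float = 0.0
    z: float = 0.0

# Example: the operator turns the camera 45 degrees to the left.
cmd = AssetCommand(pan_deg=-45.0)
```

The same record would be interpreted by the live asset as a hardware command and by the simulated asset as a change to the rendered viewpoint.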
The overlay controller 206 may take the form of a computer that is coupled to the asset controller 204 (either through a wired or wireless connection), to the asset 208 (either through a wired or wireless connection), and to the simulated asset 210 (either through a wired or wireless connection). The overlay controller 206 operates to merge simulated and live video streams, and to pass location and pan-tilt-zoom command information from the asset controller 204 on to both the asset 208 and the simulated asset 210.
The overlay controller 206 will forward commands from the asset controller 204 to both the asset 208 and the simulated asset 210 using a command resolution application. The overlay controller 206 also resolves any differences between functions and views shown by the asset 208 and the simulated asset 210. The simulated asset 210 operates to mimic the asset 208. For example, if the asset 208 is a camera, then the simulated asset 210 will render the same viewpoint as the asset 208. The simulated asset 210 may receive commands in the same format as the live asset 208, so that if the camera is instructed to turn 45° to the left, then a display shown by the simulated asset 210 should change in a substantially corresponding fashion as the field of view of the camera changes. The commands may be the same as received by the live asset 208, or the overlay controller 206 may make modifications to the commands to synchronize the simulated asset 210 with the live asset 208.
The simulated asset 210 may take the form of a computer executing applications, and will render a simulated world using a rendering application. The rendering application will utilize a three-dimensional model of the terrain in which everything is set to a single mono-chrome color, such as green or blue. A location and orientation at which the simulation of the terrain is rendered will be determined by interpreting commands received from the asset controller 204 via the overlay controller 206.
As mentioned, the simulated asset 210 uses a three-dimensional terrain database as well as three-dimensional models of any entities to render the simulated camera view. A background of the simulated view will be set to a single monochrome color, such as blue or green. Virtual entities in the simulated view will be inserted and rendered in multi-color as normal. Virtual entities will be positioned accurately within the simulated view, as the entities would be positioned in real life, such as in front of or behind an object. Virtual entities that are further away will be rendered as smaller than those close up. Virtual entities will not simply be overlaid onto the simulated video, but rather will be positioned within the simulated video in front of and behind objects, for example.
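By way of illustration only, the following Python sketch shows how a depth comparison can produce a simulated frame that is mono-chrome everywhere except where a full-color virtual entity is visible in front of the terrain; the arrays stand in for one rendered frame, and a real rendering engine would perform this test per fragment rather than on pre-rendered depth images.

```python
import numpy as np

GREEN = np.array([0, 255, 0], dtype=np.uint8)  # illustrative default color

def composite_simulated_view(terrain_depth, entity_depth, entity_rgb):
    """Toy depth-buffer composite of one simulated camera frame.

    terrain_depth, entity_depth: HxW float arrays of distance from the
        camera (np.inf where nothing is present).
    entity_rgb: HxWx3 uint8 array of the full-color entity rendering.

    Start with a frame that is entirely the mono-chrome terrain color,
    then write entity pixels only where the entity is closer to the
    camera than the terrain, so entities behind objects are obscured.
    """
    h, w = terrain_depth.shape
    frame = np.tile(GREEN, (h, w, 1))
    visible = entity_depth < terrain_depth   # entity in front of terrain
    frame[visible] = entity_rgb[visible]
    return frame
```

Distant entities occupy fewer pixels in `entity_rgb`, so the relative sizing described above falls out of the rendering step itself.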
The overlay controller 206 merges video streams from the asset 208 and the simulated asset 210. The simulated asset 210 will send a video stream with a mono-chrome background (such as blue or green) to the overlay controller 206, which will remove the entire mono-chrome background color, and then place the video stream on top of the remaining data in the asset 208 video stream. The merged video stream can then be sent to the asset controller 204 for viewing by the operator 202.
The overlay controller 206 will ensure that the simulated asset 210 is substantially in synchronization with the asset 208 so that the simulated asset 210 mimics the asset 208. For example, if a location or orientation of the simulated asset 210 differs from that of the asset 208, the overlay controller 206 may contain software that can modify commands being sent to the simulated asset 210 in order to realign the simulated asset 210 with the asset 208. The overlay controller 206 may also receive data from additional sensors attached to the asset 208 in order to accurately synchronize the asset 208 and the simulated asset 210.
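By way of illustration only, the following hypothetical Python sketch shows one way the overlay controller 206 might detect drift and issue an adjustment; the pose fields, tolerance, and snap-to-pose correction are illustrative assumptions.

```python
def realignment_command(live_pose, simulated_pose, tolerance=0.5):
    """Compare the pose reported by the live asset with the pose reported
    by the simulated asset.  If they have drifted apart by more than
    `tolerance` in any field, return a hypothetical adjustment command
    that snaps the simulated asset back onto the live asset; otherwise
    return None.  Pose fields and tolerance units are illustrative."""
    drift = {key: live_pose[key] - simulated_pose[key] for key in live_pose}
    if all(abs(value) <= tolerance for value in drift.values()):
        return None
    return {"set_pose": live_pose}   # instantaneous jump to realign

# Example usage with poses given as simple dictionaries.
live = {"x": 10.2, "y": 4.0, "heading_deg": 92.0}
sim  = {"x": 10.9, "y": 4.1, "heading_deg": 90.0}
print(realignment_command(live, sim))   # drift exceeds tolerance -> correction
```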
The rendering application of the simulated asset 210 can be connected to a simulation network 212 via a distributed interactive simulation (DIS) or high level architecture (HLA) protocol, or other protocols. In rendering the simulated world, the simulated asset 210 may receive information and instructions from the simulation network 212, such as the location, orientation, and behavior of a virtual entity.
The simulation network 212 includes any number of computers that may be located on a local area network (LAN) or wide area network (WAN). The simulation network 212 can include high to low fidelity simulations that are either autonomous or human-in-the-loop, for example.
Steps of the method of
During setup, three-dimensional models of terrain will be created, as shown in blocks 302 and 304; these steps are often referred to as content creation. During content creation, as shown in block 302, a three-dimensional terrain model is created to match the physical world; however, the three-dimensional terrain model will be of a single monochrome color. To do so, measurements are taken to determine locations of objects in the terrain, or a detailed survey of the terrain may be performed to identify locations of objects. Pre-existing drawings of buildings may be used to obtain measurements. Commercial off the Shelf (COTS) tools can be used for creating a three-dimensional model of the terrain. Examples of COTS tools include Creator, available from Presagis of Richardson, Tex.; XSI, available from Softimage® of Montreal, Canada; and 3D Studio MAX, available from Autodesk® of San Rafael, Calif.
The extent and fidelity of the terrain created will depend upon the application. The highest fidelity of the terrain may be required where virtual entities and objects interact within the view of the camera. Objects that will partially obscure virtual entities will typically require precise measurements. The monochrome three-dimensional terrain will be used for rendering the simulated world by the simulated asset 210. A full color version of the terrain may be used by other applications that are part of the simulation network 212. The simulation network 212 may impose other requirements on the extent and fidelity of the terrain.
During the content creation phase, three-dimensional models of all virtual entities, in full color, and their actions are obtained or created, as shown in block 304. For example, these might include human models, such as the models 118 and 120, or vehicle models. The models may be static or have joints and moving parts. Actions, such as the ability of the human models to walk or kneel, will be created during the content creation phase as well.
The computer model simulation and the live-video feed are then linked together, or synchronized, as shown at block 306, so that the computer model simulation mimics the live-video feed. For example, if the camera were to receive a command to turn 45° to the left, the live-video feed would correspondingly change, and because the simulated view is linked with the live-video feed, the simulated view would also receive the command to turn 45° to the left and would correspondingly change to mimic the live-video feed.
Next, as shown at block 308, updates will be received for the locations of simulated entities. The updates may be internally generated by the simulated camera 110 or generated by computer generated forces and other simulations on the network 212. These updates will include position, orientation, and any behavior information required to render the entity in the virtual world. The network 212 may also send information on detonations, explosions, or other actions for rendering, for example.
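By way of illustration only, the following Python structure is an illustrative stand-in for such an entity update; a real DIS Entity State PDU carries comparable position and orientation fields, but this is not the DIS wire format, and the field names here are assumptions.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class EntityUpdate:
    """Illustrative per-entity update received from the simulation
    network; the fields mirror the information needed to render the
    entity in the virtual world (position, orientation, behavior)."""
    entity_id: int
    position: Tuple[float, float, float]      # world coordinates
    orientation: Tuple[float, float, float]   # heading, pitch, roll (degrees)
    behavior: str = "idle"                    # e.g. "walking", "kneeling"

update = EntityUpdate(entity_id=42,
                      position=(120.0, 35.5, 0.0),
                      orientation=(90.0, 0.0, 0.0),
                      behavior="walking")
```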
As shown in block 310, the simulated world video 112 will be rendered by the simulated camera 110 or simulated asset 210 with full color virtual entities blended into a mono-chrome terrain. At about the same time as the simulated video is being rendered, the live camera 106 or asset 208 would send real video to the overlay controller 104, as shown in block 312. The real video includes the physical world video 108.
As shown in block 314, the video from block 310 is merged with the video from block 312. This is performed by setting the mono-chrome background of the simulated video 112 from block 310 to transparent and then overlaying the simulated video 112 on top of the physical world video 108 from block 312. Other methods of merging video may be employed as well. At this stage, the merged video can be displayed and the steps 306, 308, 310, 312, and 314 can be repeated for a next video frame. In this respect, the merging may occur on a frame-by-frame basis, and in real-time, so as to enable a realistic simulation of a virtual entity present within the live video feed.
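By way of illustration only, the following schematic Python loop ties blocks 306 through 314 together on a frame-by-frame basis; every object and method named here is a hypothetical placeholder for the components described above, not an existing API.

```python
def run_overlay_loop(live_camera, simulated_asset, overlay, display):
    """Schematic per-frame loop for blocks 306-314.  The arguments are
    hypothetical placeholders: `live_camera` supplies real video frames,
    `simulated_asset` renders the mono-chrome terrain with full-color
    entities, `overlay` synchronizes and merges, and `display` shows
    the merged result to the operator."""
    while display.is_open():
        overlay.synchronize(live_camera, simulated_asset)    # block 306
        simulated_asset.apply_entity_updates()               # block 308
        sim_frame = simulated_asset.render_frame()           # block 310
        live_frame = live_camera.capture_frame()             # block 312
        merged = overlay.merge(sim_frame, live_frame)        # block 314
        display.show(merged)
```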
The example illustration in
A simulated robot 510 includes a simulation that receives commands in the same format as the robot 508. Differences in location and orientation between the robot 508 and the simulated robot 510 will occur over time due to real world physics, such as friction of the surface that the robot 508 is crossing, of which the simulated robot 510 is not aware. The simulated robot 510 will send a location, orientation, camera orientation, and FOV to the overlay controller 506 via the interface 522. The overlay controller 506 will compare responses from the robot 508 and the simulated robot 510 and send any adjustments needed to the simulated robot 510 via the interface 522. The adjustment command is an extra command that is accepted by the simulated robot 510 and is not available to the robot 508; such a command may produce unrealistic behavior, like an instantaneous jump, from the simulated robot 510 in order to realign it with the robot 508. In addition to adjustment commands, the overlay controller 506 will also forward the commands received from the robot controller 504 across the interface 514 to the simulated robot 510 across the interface 522. The commands are the same as the commands sent to the robot 508 across the interface 518.
The simulated robot 510 receives updates from computer generated forces that are part of the simulation network 512 in a Distributed Interactive Simulation (DIS) format across an interface 526. Likewise, the simulated robot 510 reports a position and orientation, which is also the position and orientation of the robot 508, in a DIS format to the simulation network 512 across the interface 526. After updates from CGF entities are received by the simulated robot 510, then the simulated robot 510 renders a simulated camera view. The camera view is rendered using a three-dimensional model of the terrain in a mono-chrome green color. The simulated robot 510 may render the video using the Virtual Environment Software Sandbox (VESS) available for Windows® products, for example. Other technologies or products may be used to render the video, such as MÄK Stealth available from VT MÄK of Cambridge, Mass. Video from the simulated robot 510 will be sent out via the interface 524.
As the simulated robot 510 sends video to the overlay controller 506 over the interface 524, the robot 508 sends a camera feed to the overlay controller 506 over the interface 520. The overlay controller 506 merges the two video streams by setting the mono-chrome green color of the simulated robot video to transparent and laying the simulated video on top of the robot video stream. The overlay controller 506 then sends the merged video to the robot controller 504 across the interface 516 for viewing by the robot operator 502.
The present application has been described as inserting virtual entities into a simulated model of a terrain and merging the simulated model with a live camera feed so as to output the merged data onto a display or recording. However, in another respect, the simulated model can be conceptually merged with a live view.
As a specific example, the user 604 may wear glasses or a head-mounted apparatus that displays the simulated view 614, and by looking through the glasses, the physical world 606 will fill a remainder of a viewing space of the user 604. In this manner, the virtual entities 616 and 618 are inserted into the view of the user 604.
The simulated eye 612 may also have access to sensors 622 that determine a location and field of view of the human eye 602. For example, sensors may be mounted to a head gear apparatus of the user and sense a location of the user's eye 602, and a direction of view of the user's eye 602. The sensors 622 can forward this information to the simulated eye view engine for rendering the simulated eye view 614 appropriately. The simulated eye view 614 is rendered by accessing databases of the three-dimensional terrain model and three-dimensional entity models, such that a mono-chrome terrain is rendered so as to properly obscure the entities. The simulated eye view 614 is then displayed with the mono-chrome color displayed transparently. Because the human eye 602 sees the simulated view 614 in front of the physical world 606, with the background of the simulated view transparent and the virtual entities in full-color, the user 604 will see the merged image 620.
As mentioned, the system 600 may be used with Head Mounted Display (HMD) technology. The HMD can be worn by the field user 604, and could perform the functions of both the transparent display 614 and the sensors 622.
In the present application, the video graphics processing has been described as overlaying the simulated video feed onto the live video feed to perform the merging process; alternatively, the live video feed may be placed behind the transparent portions of the simulated video feed. Either method may be used, or the processing may include additional or alternative steps when using one or the other method. In each method, the background of the simulated video will be made transparent prior to the overlay, and after the overlay, any virtual entities inserted into the simulated video will be seen within the live video feed.
It should be understood that the arrangements described herein are for purposes of example only. As such, those skilled in the art will appreciate that other arrangements and other logic or circuit elements can be used instead, and some elements may be omitted altogether according to the desired results. Further, many of the elements that are described are functional entities that may be implemented as discrete or distributed components or in conjunction with other components, in any suitable combination and location.
It is intended that the foregoing detailed description be regarded as illustrative rather than limiting, and it should be understood that the following claims, including all equivalents, define the scope of the invention.