Two dimensional to three dimensional moving image converter

Description

FIELD OF THE INVENTION

This invention relates to a two dimensional to three dimensional motion image converter. It is useful, for example, for viewing arbitrary two dimensional cable programs or arbitrary DVDs or video cassettes in three dimensions on a three dimensional television.

BACKGROUND OF THE INVENTION

The simplest modern televisions are two dimensional. These televisions consist of a screen and a means for connecting to a cable or similar broadcast network, as well as means for connecting to the internet and means for connecting to a laptop or desktop to view online streaming videos. However, recently, several advanced televisions have been developed.

For example, three dimensional televisions, such as the 3D HDTV manufactured by Samsung Corporation, optimize the viewing experience of 3D videos. However, there are relatively few movies that are designed to be viewed in 3D, as opposed to the plethora of traditional videos available. Also, currently available cable and telephone company based broadcast services do not provide any 3D content, thereby reducing the value of 3D televisions to the user.

A 3D television system and method is described in detail in US App. 2005/0185711, incorporated herein by reference. See also US App. 2006/0007301, US App. 2006/0184966, U.S. Pat. No. 4,740,836, expressly incorporated herein by reference.

3D images can be generated if a 3D model of the environment exists. See, e.g., US App. 2006/0061651, incorporated herein by reference. These images could be used in 3D video games or movies.

Other 3D imaging techniques are known in the art and used in a broad range of fields ranging from medicine to architecture. See, e.g., US App. Nos 20090128551; 20090141024; 20090144173; 20090146657; 20090148070; 20090153553; 20090154794; 20090161944; 20090161989; 20090164339; 20090167595; 20090169076; 20090179896; 20090181769; 20090184349; 20090185750; 20090189889; 20090195640; 20090213113; 20090237327; 20090262108; 20090262184; 20090272015; 20090273601; 20090279756; 20090295801; 20090295805; 20090297010; 20090297011; 20090310216; 20090315979; 20090322742; 20100007659; 20100026789; 20100026909; 20100034450; 20100039573; 20100045696; 20100060857; 20100061603; 20100063992; 20100085358; 20100086099; 20100091354; 20100097374; 20100110070; 20100110162; 20100118125; 20100123716; 20100124368; and U.S. Pat. Nos. 7,719,552; 7,715,609; 7,712,961; 7,710,115; 7,702,064; 7,699,782; 7,697,748; 7,693,318; 7,692,650; and 7,689,019; all expressly incorporated herein by reference.

Many different automatic pattern recognition techniques are also known in the art. See, e.g., US App. Nos 20100121798; 20100115347; 20100099198; 20100092075; 20100082299; 20100061598; 20100047811; 20100046796; 20100045461; 20100034469; 20100027611; 20100027606; 20100026642; 20100016750; 20090326841; 20090324107; 20090297021; 20090297000; 20090290800; 20090290788; 20090287624; 20090268964; 20090254496; 20090232399; 20090226183; 20090220155; 20090208112; 20090169118; 20090152356; 20090149156; 20090144213; 20090122979; 20090087084; 20090087040; 20090080778; 20090080757; 20090076347; 20090049890; 20090035869; 20090034366; 20090010529; 20090006101; 20080319568; 20080317350; 20080281591; 20080273173; 20080270338; 20080270335; 20080256130; 20080246622; and U.S. Pat. Nos. 7,707,128; 7,702,599; 7,702,155; 7,697,765; 7,693,333; 7,689,588; 7,685,042; 7,684,934; 7,684,623; and 7,677,295; all expressly incorporated herein by reference.

In addition, Commons teaches a hierarchal stacked neural network that is useful in pattern recognition in U.S. Pat. No. 7,613,663, incorporated herein by reference.

Video cards or graphics cards, which separate graphics processing from the CPU in laptop and desktop computers, are also known in the art. Lower end video cards are recommended and function efficiently for simple computer use that is not graphics intensive, such as word processing, reading email, and occasionally watching an online or computer-disk-based video. However, individuals who frequently play picture and video-based computer games often require more complex, higher end video cards. See en.wikipedia.org/wiki/Video_card, last accessed May 7, 2010, incorporated herein by reference, for a more detailed discussion of video card technology.

In single instruction multiple data (SIMD) technology, a computer with multiple processing elements performs the same operation on multiple data simultaneously. Many video cards use SIMD because similar transformations might need to be applied to multiple pixels simultaneously. In older computers where the graphics processor is part of the central processing unit (CPU), SIMD is typically used for the graphics processing. Young, U.S. Pat. No. 6,429,903, incorporated herein by reference, describes a video card that is optimized by using shading techniques before ascertaining the color change on a pixel on the screen.

Several methods of 2D to 3D image conversion are known in the art. See, e.g., U.S. Pat. No. 7,573,475, expressly incorporated herein by reference. Many of these methods utilize techniques to review and analyze 2D images and employ algorithms to determine distance in the image by way of brightness, manual judgment, and rotoscoping algorithms. As a result, these methods are poorly suited for use in 3D televisions and often cannot convert images seamlessly and in real time, as required by many 3D television viewers. See also U.S. Pat. No. 7,573,489; US App. Nos. 20090322860; 20080150945; 20080101109; 20070279415; 20070279412; and 20040165776; each of which is expressly incorporated herein by reference.

Currently known methods of 2D to 3D conversion are not very practical, and filmmakers typically spend excessive amounts of financial and human resources to recreate 2D movies in 3D. For example, in spite of Disney's great investment of both talent and money in creating a 3D version of Chicken Little, the depth perception by viewers of the movie was still very poor. See, generally, Wikipedia: Chicken Little (2005 film), en.wikipedia.org/wiki/Chicken_Little_(2005_film), last accessed May 21, 2010, discussing the process of producing Chicken Little; and Dipert, Brian, “3-D Stop Motion: Well-Deserved Promotion,” EDN, Oct. 31, 2007, discussing the poor viewer experience in the 3D version of Chicken Little.

Samsung Corporation provides a system and method for 2D to 3D conversion of substantially arbitrary television programs in the UN55C7000 1080p 3D LED HDTV. See www.samsung.com/us/consumer/tv-video/televisions/led-tv/UN55C7000WFXZA/index.idx?pagetype=prd_detail, last accessed Jun. 2, 2010. However, Samsung's system and method is not optimal because it has a high error rate, provides inconsistent images to the right eye and the left eye (where the user is wearing 3D glasses), and has a tendency to give viewers headaches and motion sickness or otherwise discomfort them. See, generally, mashable.com/2010/03/09/samsung-3d-tv-starter-kit/, last accessed Jun. 2, 2010. Samsung's patent application on the topic, US Pat. App. 20090237327, incorporated herein by reference, notes that the right eye signal in the glasses repeats part of the left eye signal. See also, US 2009/0290811, incorporated herein by reference.

3D televisions have the potential to improve viewer experience by providing an additional dimension in which viewers can view scenes. For example, viewing a 2D sportscast is a much lower quality experience than viewing a game in a stadium in part because the 2D TV viewer cannot appreciate depth. 3D TV has the potential to solve this problem. However, a major negative feature of 3D TVs is the lack of content. What is needed in the art is an effective system and method to convert substantially arbitrary content from two dimensions to three dimensions.

SUMMARY DESCRIPTION OF THE INVENTION

Due to the limited number of videos made in three dimensions, and the lack of cable or broadcast programs in three dimensions, the utility of a three dimensional television to a typical person is very limited. This invention proposes a method of changing a substantially arbitrary television program or recording into a viewing format that is optimized for a three dimensional screen. This would allow the users of three dimensional televisions to watch substantially arbitrary programs and videos in a format optimized for 3D viewing.

It is an object of the invention to provide a method comprising: receiving as input a representation of an ordered set of two dimensional images; analyzing the ordered set of two dimensional images to determine at least one first view of an object in at least two dimensions and at least one motion vector; analyzing the combination of the first view of the object in at least two dimensions, the motion vector, and the ordered set of two dimensional images to determine at least a second view of the object; generating a three dimensional representation of the ordered set of two dimensional images on the basis of at least the first view of the object and the second view of the object; and providing as output an indicia of the three dimensional representation.

Optionally, the ordered set of two dimensional images comprises a video. Optionally, at least one image in the ordered set of two dimensional images is taken by a single, stationary camera. Optionally, the motion vector corresponds to an object in the image. Optionally, a processing speed of the method is real-time.
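By way of illustration only, a motion vector for an object can be estimated by exhaustive block matching between two consecutive frames. The following is a minimal sketch, assuming grayscale frames stored as lists of rows; the function names, the sum-of-absolute-differences cost, and the search window are illustrative assumptions and are not drawn from the specification.

```python
def sad(prev, curr, top, left, dy, dx, size):
    """Sum of absolute differences between a block in prev and the shifted block in curr."""
    total = 0
    for r in range(size):
        for c in range(size):
            total += abs(prev[top + r][left + c] - curr[top + dy + r][left + dx + c])
    return total


def block_motion_vector(prev, curr, top, left, size, search):
    """Return the (dy, dx) shift that best maps the block at (top, left) in prev into curr."""
    height, width = len(curr), len(curr[0])
    best = (0, 0)
    best_cost = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidate shifts that would fall outside the frame.
            if not (0 <= top + dy and top + dy + size <= height):
                continue
            if not (0 <= left + dx and left + dx + size <= width):
                continue
            cost = sad(prev, curr, top, left, dy, dx, size)
            if best_cost is None or cost < best_cost:
                best_cost, best = cost, (dy, dx)
    return best
```

Exhaustive search is chosen here only for clarity; a real-time implementation would more likely reuse motion vectors already present in a compressed video stream or use a faster search strategy.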

Optionally, the method further comprises predicting a shape and color of at least one object that is not visible in the two dimensional image but is visible in the three dimensional model on the basis of at least one of an Internet lookup, a database lookup, and a table lookup.
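A table lookup of this kind might be sketched as follows. The table contents, the object labels, and the fallback rule for unknown objects are hypothetical and serve only to show the shape of such a lookup.

```python
# Hypothetical lookup table: object label -> assumed shape primitive and typical RGB color.
HIDDEN_SURFACE_TABLE = {
    "soccer ball": {"shape": "sphere", "color": (255, 255, 255)},
    "brick wall": {"shape": "plane", "color": (150, 60, 50)},
}


def predict_hidden_surface(label, visible_color):
    """Predict the shape and color of the unseen portion of an identified object."""
    entry = HIDDEN_SURFACE_TABLE.get(label)
    if entry is None:
        # Assumption: for an unknown object, the hidden side matches the visible color.
        return {"shape": "unknown", "color": visible_color}
    return {"shape": entry["shape"], "color": entry["color"]}
```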

It is an object of the invention to provide a method comprising: receiving as input a two dimensional image taken by a camera; developing a depth representation of at least one object in the two dimensional image through the steps of: calculating an interpolation function for estimating the three dimensional position of items in the two dimensional image on the basis of at least an estimated height of the camera and an estimated angle relative to a horizontal plane of the camera, using said interpolation function to calculate a distance of the at least one object from the camera, and converting said distance of the at least one object from the camera into a depth of the at least one object in the scene; predicting a shape and color of at least a portion that is not visible in the two dimensional image of the at least one object on the basis of at least one of an Internet lookup, a database lookup, and a table lookup; converting said depth of the at least one object and said shape and color of at least a portion that is not visible in the two dimensional image of the at least one object in the scene into a three dimensional model of said at least one object; and providing a representation of the three dimensional model of said at least one object.

Optionally, the interpolation function is one of a Newton divided difference interpolation function and a Lagrange interpolation function. Optionally, the two dimensional image is taken by a single, stationary camera. Optionally, a focal length of the camera is unknown. Optionally, a processing speed of the method is real-time. Optionally, the three dimensional model is expressed in a format configured to be displayed on a three dimensional screen.
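As a non-limiting sketch, a Newton divided difference interpolation function can be built from a few sample points and then evaluated to estimate distance. The example assumes a flat ground plane, so that a point seen at a given angle below the horizontal lies at a distance equal to the camera height divided by the tangent of that angle; this geometric model, the sample angles, and all function names are assumptions made for illustration.

```python
import math


def ground_distance(camera_height, depression_angle_deg):
    """Flat-ground assumption: distance to the point seen at this angle below horizontal."""
    return camera_height / math.tan(math.radians(depression_angle_deg))


def newton_coefficients(xs, ys):
    """Compute Newton divided difference coefficients for the sample points (xs, ys)."""
    coeffs = [float(y) for y in ys]
    n = len(xs)
    for level in range(1, n):
        # Update in place from the end so lower-order differences are still available.
        for i in range(n - 1, level - 1, -1):
            coeffs[i] = (coeffs[i] - coeffs[i - 1]) / (xs[i] - xs[i - level])
    return coeffs


def newton_evaluate(xs, coeffs, x):
    """Evaluate the Newton-form polynomial at x using nested multiplication."""
    result = coeffs[-1]
    for i in range(len(coeffs) - 2, -1, -1):
        result = result * (x - xs[i]) + coeffs[i]
    return result


# Hypothetical usage: camera 1.5 m above the ground, sampled at four angles.
angles = [20.0, 30.0, 45.0, 60.0]
distances = [ground_distance(1.5, a) for a in angles]
coeffs = newton_coefficients(angles, distances)
estimated_distance = newton_evaluate(angles, coeffs, 40.0)
```

The interpolated value at an intermediate angle only approximates the geometric distance; its accuracy depends on how many sample points are used and how they are spaced.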

It is an object of the invention to provide a method comprising: receiving a representation of a two dimensional image; classifying at least one region of the two dimensional image; extracting at least one vanishing line and at least one vanishing point from the two dimensional image; extracting at least one depth gradient in the image on the basis of at least one of said at least one vanishing line and said at least one vanishing point; predicting a shape and color of at least one object that is not visible in the two dimensional image but is visible in the three dimensional model on the basis of at least one of an Internet lookup, a database lookup, and a table lookup; creating a three dimensional model of at least a portion of the two dimensional image on the basis of said at least one depth gradient and said prediction of a shape and color of at least one object; and providing the three dimensional model of at least a portion of the two dimensional image.

Optionally, the at least one region of the two dimensional image is one of sky, land, floor, and wall. Optionally, the extraction of at least one vanishing line and at least one vanishing point is on the basis of whether the image is an indoor image, an outdoor image with geometric features, or an outdoor image without geometric features. Optionally, the three dimensional model is expressed in a format configured to be displayed on a three dimensional screen. Optionally, the two dimensional image is taken by a single, stationary camera. Optionally, a processing speed of the method is real-time.
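For illustration, a vanishing point can be located as the intersection of two vanishing lines, and a simple depth gradient can then be assigned relative to that point. The sketch below represents lines in homogeneous coordinates and assumes that relative depth increases linearly as a pixel approaches the vanishing point; that linear rule and the function names are illustrative assumptions rather than the method of the specification.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def line_through(p, q):
    """Homogeneous line through two image points p and q."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))


def vanishing_point(line_a, line_b):
    """Intersection of two lines, or None if they are parallel in the image."""
    x, y, w = cross(line_a, line_b)
    if abs(w) < 1e-12:
        return None
    return (x / w, y / w)


def relative_depth(point, vp, max_distance):
    """Assumed gradient: 1.0 at the vanishing point, falling to 0.0 at max_distance away."""
    d = math.hypot(point[0] - vp[0], point[1] - vp[1])
    return max(0.0, 1.0 - d / max_distance)
```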

It is an object of the invention to provide a method of presenting a three dimensional film to a viewer comprising: calculating a distance and an angle from the viewer to a screen; applying at least one transform to a representation of a scene to produce a three dimensional model corresponding to the distance and the angle from the viewer to the screen; and presenting on the screen a three dimensional image corresponding to the three dimensional model.

Optionally, at least one of the distance and the angle from the viewer to the screen is calculated on the basis of an article of clothing or an accessory worn by the viewer. Optionally, at least one of the distance and the angle from the viewer to the screen is calculated on the basis of at least one image taken by a camera connected to the screen. Optionally, at least one of the distance and the angle from the viewer to the screen is calculated on the basis of camera parameters. Optionally, at least one of the distance and the angle from the viewer to the screen is calculated on the basis of image parameters not related to the camera. Optionally, the at least one transform is a 2D to 3D transform. Optionally, the at least one transform is a 3D to 3D transform. Optionally, a processing speed of the method is real-time. Optionally, the screen is configured to be used as a touch screen.
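One way to picture the distance and angle calculation is a pinhole camera model applied to an accessory of known size, such as 3D glasses, detected in an image taken by a camera at the screen. The sketch below assumes that the physical width of the glasses and the focal length of the camera in pixels are known, and that the glasses have already been located in the image; these inputs and the function name are hypothetical.

```python
import math


def viewer_distance_and_angle(glasses_width_m, glasses_width_px,
                              glasses_center_x_px, image_width_px, focal_length_px):
    """Estimate viewer distance (meters, along the optical axis) and horizontal angle (degrees)."""
    # Pinhole model: apparent size shrinks in proportion to distance.
    distance = focal_length_px * glasses_width_m / glasses_width_px
    # Horizontal offset of the glasses from the image center, in pixels.
    offset_px = glasses_center_x_px - image_width_px / 2.0
    angle = math.degrees(math.atan2(offset_px, focal_length_px))
    return distance, angle
```

The resulting distance and angle could then parameterize the transform applied to the scene, so that the three dimensional image is rendered for the actual position of the viewer.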

Other embodiments of the invention involve at least one processor and a non-transitory computer readable medium with instructions for the same.

It is an object to provide a method, and system for carrying out that method, and a computer readable medium storing instructions adapted for controlling a programmable processor to carry out the method, comprising: receiving as input a representation of an ordered set of images; analyzing the ordered set of images to determine at least one first view of an object in at least two dimensions; automatically identifying the object and obtaining information extrinsic to the ordered set of two dimensional images describing the object; analyzing the combination of the first view of the object in at least two dimensions, and the information describing the object, to infer a state of a hidden surface in the ordered set of two dimensional images; and generating an output representing the object and at least a portion of the hidden surface.

The ordered set of two dimensional images may comprise a video, e.g., a compressed digital video file such as MPEG-1, MPEG-2, MPEG-4, etc.

At least one image in the ordered set of two dimensional images may be taken by a single, stationary camera. The object may be associated with a motion vector automatically extracted from the ordered set of images. The object may be identified by image pattern recognition. The object may be identified by metadata within an information stream accompanying the ordered set of two dimensional images.

It is also an object to provide a method, and system for carrying out that method, and a computer readable medium storing instructions adapted for controlling a programmable processor to carry out the method, comprising: receiving as input an image; developing a depth representation of at least one object in the image, comprising: calculating an interpolation function for estimating the three dimensional position of items in the two dimensional image on the basis of at least an estimated height of the camera and an estimated angle relative to a horizontal plane of the camera, using said interpolation function to calculate a distance of the at least one object from the camera, and converting said distance of the at least one object from the camera into a depth of the at least one object in the scene; predicting a shape and color of at least a portion that is not visible in the two dimensional image of the at least one object on the basis of at least one of an Internet lookup, a database lookup, and a table lookup; converting said depth of the at least one object and said shape and color of at least a portion that is not visible in the two dimensional image of the at least one object in the scene into a three dimensional model of said at least one object; and storing in memory a representation of the three dimensional model of said at least one object.

The interpolation function may be one of a Newton divided difference interpolation function and a Lagrange interpolation function.

The two dimensional image may be taken by a single camera, multiple cameras, and/or stationary or moving camera(s). The focal length of the camera may be known or unknown, or vary (zoom) between the various images.

The processing speed of the method may be real-time.

The three dimensional model may be expressed in a format configured to be displayed on a three dimensional screen.

It is further an object to provide a method, and system for carrying out that method, and a computer readable medium storing instructions adapted for controlling a programmable processor to carry out the method, comprising: receiving a representation of a two dimensional image; classifying at least one region of the two dimensional image; extracting at least one vanishing line and at least one vanishing point from the two dimensional image; extracting at least one depth gradient in the image on the basis of at least one of said at least one vanishing line and said at least one vanishing point; predicting a shape and color of at least one object that is not visible in the two dimensional image but is visible in the three dimensional model on the basis of at least one of an Internet lookup, a database lookup, and a table lookup; creating a three dimensional model of at least a portion of the two dimensional image on the basis of said at least one depth gradient and said prediction of a shape and color of at least one object; and storing in a memory the three dimensional model of at least a portion of the two dimensional image.

The at least one region of the two dimensional image may be one of sky, land, floor, or wall. The extraction of at least one vanishing line and at least one vanishing point may be on the basis of whether the image is an indoor image, an outdoor image with geometric features, or an outdoor image without geometric features. The three dimensional model may be expressed in a format configured to be displayed on a three dimensional screen. The two dimensional image may be taken by a single, stationary camera. The processing speed may approach real-time, that is, the processing burden is within the capabilities of the processor to avoid generally increasing backlog, and the latency is sufficiently low to avoid a lag that is disruptive to the user.
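The notion of approaching real-time described above, namely no generally increasing backlog and a latency low enough not to disrupt the user, can be checked with a simple simulation. The sketch below assumes frames arrive at a fixed interval and are processed one at a time in order; the function name and the latency threshold are illustrative assumptions.

```python
def meets_real_time(processing_times, frame_interval, max_latency):
    """Return True if every frame finishes within max_latency of its arrival time."""
    finish = 0.0
    for index, cost in enumerate(processing_times):
        arrival = index * frame_interval
        # A frame cannot start before it arrives or before the previous frame is done.
        start = max(arrival, finish)
        finish = start + cost
        if finish - arrival > max_latency:
            return False
    return True
```

If the per-frame processing time consistently exceeds the frame interval, the latency grows with every frame and the check eventually fails, which corresponds to the increasing backlog the text seeks to avoid.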

It is another object to provide at least one processor comprising: an input for receiving a representation of an ordered set of two dimensional images; a memory comprising computer instructions for analyzing the ordered set of two dimensional images to determine at least one first view of an object in at least two dimensions and at least one motion vector; a memory comprising computer instructions for analyzing the combination of the first view of the object in at least two dimensions, the motion vector, and the ordered set of two dimensional images to determine at least a second view of the object; a memory comprising computer instructions for generating a three dimensional representation of the ordered set of two dimensional images on the basis of at least the first view of the object and the second view of the object; and an output providing an indicia of the three dimensional representation. The ordered set of two dimensional images may comprise a video. At least one image in the ordered set of two dimensional images may be taken by a single, stationary camera. The motion vector may correspond to an object in the image. The at least one processor may be configured to operate in real-time.

The processor may further comprise a predictor for a shape and color of at least one object that is not visible in the image but is visible in the three dimensional model on the basis of image pattern recognition.

It is a still further object to provide at least one processor comprising: an input for receiving an image; a memory comprising computer instructions for developing a depth representation of at least one first object in the image comprising: computer instructions for calculating an interpolation function for estimating the three dimensional position of items in the two dimensional image on the basis of at least an estimated height of the camera and an estimated angle relative to a horizontal plane of the camera, computer instructions for using said interpolation function to calculate a distance of the at least one first object from the camera, and computer instructions for converting said distance of the at least one object from the camera into a depth of the at least one first object in the scene; a predictor for a shape and color of at least a portion that is not visible in the two dimensional image of the at least one object on the basis of at least one of an Internet lookup, a database lookup, and a table lookup; a memory comprising computer instructions for converting said depth of the at least one object and said shape and color of at least a portion that is not visible in the two dimensional image of the at least one object in the scene into a three dimensional model of said at least one object; and an output for providing a representation of the three dimensional model of said at least one object.

The interpolation function may be one of a Newton divided difference interpolation function and a Lagrange interpolation function. The image may be taken by a single, two dimensional camera. The processor in some cases may operate without the focal length of the camera being provided. The at least one processor may be configured to operate in real-time. The three dimensional model may be expressed in a format configured to be displayed on a three dimensional screen.

It is another object to provide at least one processor comprising: an input for receiving a representation of a two dimensional image; a memory configured to store: machine instructions for classifying at least one region of the two dimensional image; machine instructions for extracting at least one vanishing line and at least one vanishing point from the two dimensional image; machine instructions for extracting at least one depth gradient in the image on the basis of at least one of said at least one vanishing line and said at least one vanishing point; machine instructions for predicting a shape and color of at least one object that is not visible in the two dimensional image but is visible in the three dimensional model on the basis of at least one of an Internet lookup, a database lookup, and a table lookup; machine instructions for creating a three dimensional model of at least a portion of the two dimensional image on the basis of said at least one depth gradient and said prediction of a shape and color of at least one object; and an output for at least one of storing and providing the three dimensional model of at least a portion of the two dimensional image.

The at least one region of the two dimensional image may be one of sky, land, floor, and wall. The machine instructions for extracting at least one vanishing line and at least one vanishing point may operate on the basis of whether the image is an indoor image, an outdoor image with geometric features, or an outdoor image without geometric features. The three dimensional model may be expressed in a format configured to be displayed on a three dimensional screen. The image may be taken by a single, two dimensional camera. The at least one processor may be configured to operate in real-time.

Another object provides a non-transitory computer readable medium comprising instructions for: receiving as input a representation of an ordered set of images; analyzing the ordered set of images to determine at least one first view of an object in at least two dimensions; automatically identifying the object and obtaining information extrinsic to the ordered set of two dimensional images describing the object; analyzing the combination of the first view of the object in at least two dimensions, and the information describing the object, to infer a state of a hidden surface in the ordered set of two dimensional images; and generating an output representing the object and at least a portion of the hidden surface. The ordered set of images may comprise a video. At least one image in the ordered set of two dimensional images may be taken by a single, two dimensional camera. The object may be associated with a motion vector automatically extracted from the ordered set of images. The object may be identified by image pattern recognition. The object may also be identified by metadata within an information stream accompanying the ordered set of two dimensional images.

A further object provides a non-transitory computer readable medium comprising instructions for: receiving as input an image taken by a camera; developing a depth representation of at least one object in the image through the steps of: calculating an interpolation function for estimating the three dimensional position of items in the image on the basis of at least an estimated height of the camera and an estimated angle relative to a horizontal plane of the camera, using said interpolation function to calculate a distance of the at least one object from the camera, and converting said distance of the at least one object from the camera into a depth of the at least one object in the scene; predicting a shape and color of at least a portion that is not visible in the image of the at least one object by image pattern recognition; converting said depth of the at least one object and said shape and color of at least a portion that is not visible in the two dimensional image of the at least one object in the scene into a three dimensional model of said at least one object; and providing a representation of the three dimensional model of said at least one object. The interpolation function may be one of a Newton divided difference interpolation function and a Lagrange interpolation function. The image may be taken by a single, stationary camera. A focal length of the camera may be provided or absent from an input signal. The instructions may be processed in real-time. The three dimensional model may be expressed in a format configured to be displayed on a three dimensional screen.

A still further object provides a non-transitory computer readable medium comprising instructions for: receiving a representation of an image; classifying at least one region of the image; extracting at least one vanishing line and at least one vanishing point from the image; extracting at least one depth gradient in the image on the basis of at least one of said at least one vanishing line and said at least one vanishing point; predicting a shape and color of at least one object that is not visible in the two dimensional image but is visible in the three dimensional model by image pattern recognition; creating a three dimensional model of at least a portion of the image on the basis of said at least one depth gradient and said prediction of a shape and color of at least one object; and providing the three dimensional model of at least a portion of the image.

The at least one region of the image may be one of sky, land, floor, and wall. The extraction of at least one vanishing line and at least one vanishing point may be on the basis of whether the image is an indoor image, an outdoor image with geometric features, or an outdoor image without geometric features. The three dimensional model may be expressed in a format configured to be displayed on a three dimensional screen. The two dimensional image may be taken by a single, stationary camera. The instructions may be processed in real-time.

Another object provides a method of presenting a three dimensional film to a viewer comprising: calculating a distance and an angle from the viewer to a screen; applying at least one transform to a representation of a scene to produce a three dimensional model corresponding to the distance and the angle from the viewer to the screen; and presenting on the screen a three dimensional image corresponding to the three dimensional model.

Another object provides at least one processor configured to present a three dimensional film to a viewer comprising: an input port configured to receive information representing at least a relative position of a viewer with respect to a display screen; a computational unit configured to calculate a distance and an angle from the viewer to the screen, to apply at least one transform to a representation of a scene to produce a three dimensional model corresponding to the distance and the angle from the viewer to the screen, and to generate an output signal representing a three dimensional image corresponding to the three dimensional model; and an output port configured to present the output signal.

A further object provides a non-transitory computer readable medium comprising instructions for presenting a three dimensional film to a viewer comprising: calculating a distance and an angle from the viewer to a screen; applying at least one transform to a representation of a scene to produce a three dimensional model corresponding to the distance and the angle from the viewer to the screen; and presenting on the screen a three dimensional image corresponding to the three dimensional model.

At least one of the distance and the angle from the viewer to the screen may be calculated on the basis of an article of clothing or an accessory worn by the viewer. At least one of the distance and the angle from the viewer to the screen may also be calculated on the basis of at least one image taken by a camera connected to the screen. At least one of the distance and the angle from the viewer to the screen may be calculated on the basis of camera parameters. At least one of the distance and the angle from the viewer to the screen may be calculated on the basis of image parameters not related to the camera.

The at least one transform may be a 2D to 3D transform and/or a 3D to 3D transform. A processing speed of the method may be real-time. The screen may be configured to be used as a touch screen.

A further object provides a system and method of converting a 2D video file to a 3D video file comprising: at least one of receiving and extracting sound data and image data from a 2D video file; calculating a characteristic delay of a sound in the 2D video file coming from a source in at least one image associated with the 2D video file; auto-correlating sound data associated with the 2D image file between channels; ascertaining amplitude and equalization features to calculate a likely position of a source of at least one sound in the 2D video file; and at least one of providing as output and storing in memory a representation of the likely position of the source of at least one sound in the 2D video file.

Another object provides a processor configured for converting a 2D video file to a 3D video file comprising: an input configured to receive a 2D video file; a memory comprising computer instructions to extract sound and image data from the 2D video file; a memory comprising computer instructions to calculate a characteristic delay of a sound in the 2D video file coming from a source in at least one image associated with the 2D video file; a memory comprising computer instructions to auto-correlate sound data associated with the 2D image file between channels; a memory comprising computer instructions to ascertain amplitude and equalization features; a memory comprising computer instructions to calculate a likely position of a source of at least one sound in the 2D video file; and an output configured to provide a representation of the likely position of the source of at least one sound in the 2D video file.

A further object provides a non-transitory computer readable medium configured to convert a 2D video file to a 3D video file comprising computer instructions for: at least one of receiving and extracting sound data and image data from a 2D video file; calculating a characteristic delay of a sound in the 2D video file coming from a source in at least one image associated with the 2D video file; auto-correlating sound data associated with the 2D image file between channels; ascertaining amplitude and equalization features to calculate a likely position of a source of at least one sound in the 2D video file; and providing a representation of the likely position of the source of at least one sound in the 2D video file.

An output may be provided representing an error in the calculation in response to detecting at least one of an echo or an inconsistency between the sound data from the 2D video file and the image data from the 2D video file. The 2D video file may comprise a compressed digital video file, e.g., an MPEG-1, an MPEG-2, an MPEG-4, an MOV, a QT, a Divx, a Xvid, a WMV, a WMP, an FLV, and an h.264 format. The representation of the error may comprise a Boolean value.

It is also an object to provide a method comprising: receiving a representation of a two dimensional audiovisual presentation; selecting at least one sound in the two dimensional audiovisual presentation; associating the at least one sound with at least one visual object in the two dimensional audiovisual presentation; creating a three dimensional spatial model of the visual object consistent with an inferred spatial origin of the at least one sound; and outputting a representation in dependence on the three dimensional spatial model of the visual object.

The associating the at least one sound with at least one visual object in the two dimensional audiovisual presentation may comprise: calculating at least one characteristic delay of the at least one sound; auto-correlating at least a portion of the sound data with at least a portion of the visual data; ascertaining amplitude and equalization features to calculate a likely position of a source of the at least one sound; associating the at least one sound with an object in the likely position of the source of the at least one sound; and providing an output representing said object in the likely position of the source of the at least one sound.

A further object provides a processor comprising: an input configured to receive a representation of a two dimensional audiovisual presentation; a computational unit configured to select at least one sound in the two dimensional audiovisual presentation, to associate at least one visual object in the two dimensional audiovisual presentation as an inferred source of the at least one sound, and to create a three dimensional spatial model of the visual object consistent with the visual object being the inferred source of the at least one sound; and an output configured to provide a representation of the three dimensional model of the visual object associated with the at least one sound.

The audiovisual presentation may comprise a vector quantized, extracted motion vector, compressed digital video file.

The computational unit may be further configured to calculate a characteristic delay of the at least one sound; auto-correlate at least a portion of the sound data with at least a portion of the visual data; ascertain amplitude and equalization features to calculate a likely position of a source of the at least one sound; and associate the at least one sound with an object in the likely position of the source of the at least one sound.

A still further object provides a non-transitory computer readable medium comprising computer instructions for: receiving a representation of a two dimensional audiovisual presentation; selecting at least one sound in the two dimensional audiovisual presentation; associating the at least one sound with at least one visual object in the two dimensional audiovisual presentation, wherein the object at least one of emits and modifies the at least one sound; creating a three dimensional spatial model of the visual object associated with the at least one sound wherein the model is derived in part from, and is consistent with, the object at least one of emitting and modifying the sound; and providing an output selectively dependent on a representation of the three dimensional spatial model of the visual object associated with the at least one sound.

The audiovisual presentation may comprise a representation of a live sports event.

The instructions for associating the at least one sound with at least one visual object in the two dimensional audiovisual presentation may comprise instructions for: auto-correlating the sound data with the visual data; ascertaining audio echo, amplitude and equalization features to calculate a likely position of a source of the at least one sound; and associating the at least one sound with an object as its inferred source in the likely position of the source of the at least one sound.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 provides a geometric model for extracting depth from a moving camera, according to one embodiment of the invention.

FIG. 2 illustrates a search window and a matching window, according to one embodiment of the invention.

FIG. 3 illustrates an example of calculating Φ(u_i,v_i) for three regions.

FIG. 4 illustrates a computer system that could be used to implement the invention.

FIG. 5 is a set of equations applied in the specification.

FIG. 6 illustrates an image that can be analyzed in accordance with an embodiment of the present invention.

FIG. 7 is a set of equations applied in the specification.

FIGS. 8A and 8B are sets of equations applied in the specification.

FIG. 9 illustrates a depth gradient assignment graph for a plane generated by two vanishing lines, in accordance with an embodiment of the invention.

FIG. 10 illustrates parallax relations with respect to a screen in accordance with an embodiment of the invention.

FIG. 11 illustrates stereoscopic image pair generation in accordance with an embodiment of the invention.

FIG. 12 is a flow chart of an embodiment of the present invention involving creating three dimensional representations.

FIG. 13 is a flow chart of an embodiment of the present invention involving a method of presenting three dimensional images.

FIGS. 14A and 14B illustrate the operation of a touch screen machine, in accordance with one embodiment of the invention.

FIG. 15 illustrates a mechanism by which a 3D touch screen device ascertains the position of a user, in accordance with one embodiment of the invention.

FIG. 16 illustrates a search engine for 3D models, according to one embodiment of the invention.

FIG. 17 illustrates a flow chart for a method of calculating a position of a sound source from sound and image data, which may be available in a MPEG or similar video file, according to an embodiment of the invention.

FIG. 18 illustrates a method of image segmentation, according to one embodiment of the invention.

FIG. 19 illustrates a method of creating a 3D representation of at least a portion of a 2D video, according to an embodiment of the invention.

FIG. 20 illustrates a method of developing a three dimensional video from a two dimensional video, according to an embodiment of the invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

Several methods of 2D to 3D image and video conversion are known in the art. See, e.g., Sullivan, U.S. Pat. No. 7,573,475, expressly incorporated herein by reference. These methods generally require the selection of two images and converting one image into a “left eye view” and another image into a “right eye view.” This technique is useful when there is a moving camera and the region to be viewed is relatively stationary. For example, a 3D model of a mesa in the Grand Canyon or Mount Washington in New Hampshire could be generated through this method.

However, in some situations only a single camera is available and there is movement in the scene to be imaged, for example, in a 2D recording of a college football game. Alternatively, there may be only a single 2D photograph of an image and a 3D model is desired. For example, a person involved in a car accident might have taken a single 2D photograph on his cell phone camera.

Depth Detection

Two dimensional imaging involves placing pixels on a screen and assigning each pixel a color. Three dimensional imaging is more complicated to the degree that the location (in three rather than two dimensions) and depth of each object must be known in order to properly model and render the objects in the viewing area.

Murphey describes a depth detection system that can be used for machine-aided driving. Murphey, “Depth Finder, A Real-time Depth Detection System for Aided Driving,” IEEE, 2000. Murphey's system, with several modifications, could be used to provide depth detection for 2D to 3D image conversion in television as well. This system could be run on a substantially arbitrary television coupled with a processor, video card or graphics processing unit (GPU). Alternatively, it could be run on a substantially arbitrary modern computer, such as an HP Pavilion DV3T laptop running a Microsoft Windows 7 operating system. Persons skilled in the art will recognize that other operating systems, for example Apple Macintosh OS X or Linux, could be used instead of Windows 7.

Many military and civil applications require the distance information from a moving vehicle to targets from video image sequences. For indirect driving, lack of perception of depth in view hinders steering and navigation. A real-time depth detection system, a system that finds the distances of objects through a monocular vision model, is disclosed herein. This depth detection system can be used with a camera mounted either at the front or side of a moving vehicle. A real-time matching algorithm is introduced to improve the matching performance by several orders of magnitude.

The application of computer vision and image processing can provide significant advantages in a number of military and civil applications including global picture generation and aided driving. Real-time depth detection is a significant component in these applications, in particular in vision-aided driving. Much research in depth detection has been conducted using stereo vision techniques. Stereo vision establishes correspondence between a pair of images acquired from two well-positioned cameras. Depth finding from monocular image sequences is described herein. Monocular image sequences can be used when two cameras or two different views of an image are not available.

Monocular vision is interesting to a number of military and civil applications. For example, monocular vision is necessary if a substantially arbitrary sequence of 2D images, such as a 2D video recording, is to be automatically converted to 3D. In other examples, indirect vision through the use of cameras can allow the crew of a military tank to operate the vehicle under full armor protection. In order to provide a full view of entire surroundings to a tank crew, we need to have a suite of cameras mounted at the front, rear, and sides of the tank with each camera providing the coverage for a specific area of the scene. Due to practical limitations on channel bandwidth and cost, depth finding using stereo cameras is an unlikely alternative. For indirect driving, lack of perception of depth in a monoscopic view hinders steering and navigation. Furthermore, depth finding from a monocular image sequence can be used as a fault tolerant solution when one camera in a stereo system is damaged.

This application describes, in part, a real-time depth detection system developed for in-vehicle surveillance. A video camera can be mounted at the front or the side of a vehicle or placed in a substantially arbitrary location. The major task of the depth finding system is to provide the current distance from a queried object to the vehicle. Computational speed is a critical issue in a scenario where both vehicle and the object can be moving.

Real Time Depth Finding

The computation of depth from a monocular image sequence obtained in the time domain is based on the geometric model shown in FIG. 1.

FIG. 1 illustrates the variables used in the formulas in FIG. 5 and discussed herein. A camera is moved from point O₁ 110 (at time t₁) to point O₂ 120 (at time t₂). The camera is being used to view an image at point P 130. The axis 140 along which the camera is moved is termed the optical axis. H is the distance between P 130 and the optical axis 140. R₁ is the projection of the distance between P 130 and the lens at moment t₁ on the optical axis. R₂ is the projection of the distance between P 130 and the lens at moment t₂ on the optical axis. I₁ is the distance between the lens and the image plane at moment t₁. I₂ is the distance between the lens and the image plane at moment t₂. I₁ and I₂ should be very close to one another. D₁ is the location of P on the image plane at moment t₁. D₂ is the location of P on the image plane at moment t₂. θ₁ is the location of P on the image plane at moment t₁. θ₂ is the location of P on the image plane at moment t₂.

From geometric optics, we have equation (1) in FIG. 5, where f is the focal length of the camera lens. Equation (2) in FIG. 5 can be derived from equation (1), where L=R₁−R₂ is the distance that the vehicle has moved during the time period in which the two image frames are captured and D₂−D₁ is the disparity between the two images taken at times t₁ and t₂. Due to the compact size of the camera, we can assume that R₂>>I₂ (i.e. the distance between the object and the lens is much greater than the distance between the lens and the image plane inside the camera) and H>>D₂ (i.e. the actual size of the object is much greater than its image on the image plane). Thus, equation (2) becomes equation (3) in FIG. 5.
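By way of non-limiting illustration, the geometric relationship described above can be sketched in code. Because FIG. 5 is not reproduced in this passage, the sketch assumes a simple pinhole model with I₁ ≈ I₂ ≈ f, under which D₁R₁ = D₂R₂ = fH and, with L = R₁ − R₂, R₂ = L·D₁/(D₂ − D₁). This form is an assumption of the sketch and may differ in presentation from equation (3). The function and argument names are hypothetical.

```python
def depth_from_disparity(travel, d1, d2):
    """Estimate R2, the distance along the optical axis from the lens to the
    object at time t2.

    travel -- L = R1 - R2, the distance the camera moved toward the object
    d1, d2 -- image-plane offsets D1 and D2 of the object at t1 and t2

    Assumes a pinhole model (D = f * H / R); this is an assumption of the
    sketch, not a restatement of equation (3).
    """
    disparity = d2 - d1
    if disparity <= 0:
        raise ValueError("object must appear farther from the axis at t2")
    return travel * d1 / disparity


# Synthetic check: f = 1, H = 2, R1 = 10, R2 = 8, so D1 = 0.2 and D2 = 0.25.
estimated_r2 = depth_from_disparity(travel=2.0, d1=0.2, d2=0.25)
```

Note that the focal length cancels out of this form, so only the distance traveled and the two image-plane positions are needed.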

According to equation (3), the computation of the distance from an object to the vehicle involves two stages. First, two sequential images are searched to find matching objects. Finding matching objects gives us the disparity, (Δx,Δy), or relative movement of the object from frame to frame. Second, we use the camera parameters and the disparity from the first stage to calculate the depth for each object of interest. The first stage involves finding the correspondence between objects in the two image frames. There are a number of approaches to the correspondence problem, such as matching edges, object contours, or corners. These approaches depend very much on the outcome of the image feature extraction algorithms, which are also computationally demanding. In one embodiment, an intensity feature is used to match corresponding pixels in the two adjacent image frames. In order to have accurate and efficient matching, a number of motion heuristics including maximum velocity change, small change in orientation, coherent motion and continuous motion are provided. Based on these heuristics, for a pair of images I_t and I_t+1, we define a matching window and a search window to solve the correspondence problem (see FIG. 2). The match window 210, the smaller square in FIG. 2, is used to compute the similarity between the two portions in I_t and I_t+1. The search window 220, the two shaded triangles, is used to limit the search for the possible location of a particular pixel in the image frame I_t+1.

The disparity between two images is computed as follows. For a pixel (x,y) in I_t, its corresponding location in I_t+1 is found by using the maximum likelihood function given in equation (4) of FIG. 5, where p and q should be within the matching window, and Φ(u_i′,v_j′)≥Φ(u_i,v_j) for all (u_i′,v_j′) within the search window. The use of the effective window reduces both the computation time and matching error. However, for an image of 540×480 and at the frame rate of 15 fps, the determination of disparity is perhaps still too computationally intensive. A brute force implementation of the algorithm requires, in the worst case, computation on the order of O(L*H*Dx*Dy*p*q), where L and H are the image width and height, Dx and Dy are the maximum horizontal and vertical disparity values, and p and q are the matching region width and height as defined above. For a 540×480 image with maximum disparity values Dx=Dy=4 and an average-sized matching window of 32×32, the number of comparisons (differences) that must be taken approaches 2 billion. In order to determine depth accurately while the vehicle is in motion, this computation time must be reduced by at least three orders of magnitude. To produce this computational speedup we use three techniques. First, we apply a dynamic programming algorithm to eliminate the matching window size (p,q) from the complexity. Second, we target the modern cache-dependent processor by localizing data access in the computation. Third, if the computation on a particular processor or image size is still not fast enough, we only calculate the depth of certain subregion(s) of the image.
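By way of non-limiting illustration, the brute force search described above may be sketched as follows. Equation (4) is not reproduced in this passage, so the sketch assumes that Φ is a sum of squared intensity differences over the matching window and that the best match minimizes Φ. The function names, the `half` and `max_disp` parameters, and the tiny synthetic frames are hypothetical.

```python
def window_ssd(img_a, img_b, x, y, u, v, half):
    """Sum of squared intensity differences between the window centered at
    (x, y) in img_a and the window centered at (x + u, y + v) in img_b."""
    total = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            diff = img_a[y + dy][x + dx] - img_b[y + v + dy][x + u + dx]
            total += diff * diff
    return total


def best_disparity(img_a, img_b, x, y, max_disp=2, half=1):
    """Return the (u, v) within the search window that minimizes the SSD.

    The caller is assumed to pass an (x, y) whose own window lies inside
    img_a; candidates whose window would leave img_b are skipped.
    """
    height, width = len(img_b), len(img_b[0])
    best = None
    for v in range(-max_disp, max_disp + 1):
        for u in range(-max_disp, max_disp + 1):
            inside_x = half <= x + u < width - half
            inside_y = half <= y + v < height - half
            if not (inside_x and inside_y):
                continue
            score = window_ssd(img_a, img_b, x, y, u, v, half)
            if best is None or score < best[0]:
                best = (score, u, v)
    return best[1], best[2]


# Synthetic frames: one bright pixel that moves one column to the right.
frame_t = [[0] * 9 for _ in range(9)]
frame_t1 = [[0] * 9 for _ in range(9)]
frame_t[4][4] = 9
frame_t1[4][5] = 9
disparity = best_disparity(frame_t, frame_t1, 4, 4)
```

Every candidate disparity recomputes every squared difference in its window, which is the source of the p*q factor in the complexity estimate above.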

The dynamic programming algorithm is based on the fact that many of the computations (i.e. intensity difference calculations) are repetitive. We use FIG. 3 to illustrate the algorithm.

FIG. 3 illustrates an example of calculating Φ(u_i,v_i) for three regions centered at pixels (12, y) 330, (13, y) 340, and (14, y) 350, respectively. The region size is q=5, p=1, and u_i=v_i=0 is fixed. Note that each row of pixels in the figure is the same row (y) of the image; however, the region has shifted. The calculation is completed for both images I₁ 310 and I₂ 320.

For the convenience of description, we superimpose a linear window (i.e. p=1) on images I₁ and I₂. We denote each difference and square operation with a pair of x coordinates, (x,x′) where x is the coordinate in image I₁ and x′ is the corresponding coordinate in I₂. As the matching window shifts across I₁, we re-calculate the square difference for the same pixel exactly q−1 times (ignoring the boundary conditions). When the window becomes rectangular (i.e. p>1), we perform (p−1) repetitive calculations in the vertical direction. Therefore, we can implement the matching algorithm as follows. We first vary the center pixel before varying the disparity (u_i,v_i). This allows us to store the results for each square difference calculation in a table and look them up as needed. The data being stored are the sum and difference calculations for a single row of the image at a time (or possibly a q×L), and the minimum Φ(u_i,v_i) and the associated disparity, (u_i,v_i), in a table for every pixel in the image. This implementation reduces the computational complexity by a factor of the window size, p×q, while the extra storage is proportional to the size of the image.
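By way of non-limiting illustration, the table-based reuse described above may be sketched for a single image row and a fixed horizontal disparity. Each squared difference is computed once and stored, and each window sum is then obtained from the previous one by adding the entering term and subtracting the leaving term. The sketch assumes a squared-difference measure; the function name and the sample rows are hypothetical.

```python
def row_window_ssd(row_a, row_b, u, q):
    """For a fixed horizontal disparity u, return the sum of squared
    differences for every q-wide window along one row.

    Each squared difference is computed exactly once and stored in `sq`;
    window sums are then updated incrementally instead of being recomputed.
    """
    n = len(row_a)
    # Only positions x for which x + u also lies inside row_b are usable.
    lo, hi = max(0, -u), min(n, n - u)
    sq = [(row_a[x] - row_b[x + u]) ** 2 for x in range(lo, hi)]
    if len(sq) < q:
        return []
    sums = [sum(sq[:q])]
    for i in range(q, len(sq)):
        # Slide the window one pixel: add the new term, drop the oldest one.
        sums.append(sums[-1] + sq[i] - sq[i - q])
    return sums


row_1 = [3, 1, 4, 1, 5, 9, 2, 6]
row_2 = [2, 7, 1, 8, 2, 8, 1, 8]
sums_for_zero_disparity = row_window_ssd(row_1, row_2, u=0, q=3)
```

After the first window, each further window costs two operations regardless of q, which is how the window size drops out of the complexity. Extending the same idea vertically removes the dependence on p.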

The implementation of the dynamic programming fully utilizes the localization of main memory address. It is well known in the art that localizing address requests to small regions of memory for long periods of time maximizes cache hits and can significantly increase performance. In an embodiment of the invention, address requests are localized in two ways. First, the invention involves making copies of the sub-region of the image needed in the calculation, and operating on those copies. Second, the invention involves exhaustively calculating all possible calculations (including saving intermediate results necessary for the dynamic method) on each row of the image before proceeding to the next row.

Combining the dynamic programming algorithm and cache-targeted optimizations, the invention reduces the computation time for an input image.

The final calculation speed is dependent on the application. If the video is being streamed in at 15-30 frames per second (fps) and depth calculations are required at least every 2-5 frames, the inventive system and method targets the depth calculations to cover small regions of interest. In one embodiment, the size of these regions is defined by the user and their selection is indicated by using a pointing device. Alternatively, a pre-selected region of interest, e.g. the center of the television screen, could be used. Because the depth calculation includes motion estimation of image objects, we are able to track objects from frame to frame while displaying their depth. For applications that require infrequent updates of depth information (approximately once a second), as in the case of a slow moving robotic vehicle, the invention provides a system and method to calculate the depth across the entire image and display depths at user request for particular objects of interest. Murphey, “Depth Finder, A Real-time Depth Detection System for Aided Driving,” IEEE, 2000.

Persons skilled in the art will recognize that there are other methods of depth perception with a single camera. For example, if the camera is not moving, interpolation techniques can be used. In order to find the parameters of the interpolation function, a set of lines with predefined distance from camera is used, and then the distance of each line from the bottom edge of the picture (as the origin line) is calculated. The results of implementation of this method show higher accuracy and less computation complexity with respect to the other methods. Moreover, two famous interpolation functions namely, Lagrange and Divided Difference are compared in terms of their computational complexity and accuracy in depth detection by using a single camera. M. Mirzabaki and A. Aghagolzadeh, “Introducing a New Method for Depth Detection by Camera Using LaGrange Interpolation,” The Second Iranian Conference on Machine Vision, Image Processing & Applications, 2003.

Depth finding using a camera and image processing has various applications, including industry and the navigation and control of robots and vehicles. This issue has been examined from different viewpoints, and a number of researchers have conducted valuable studies in this field. All of the introduced methods can be categorized into six main classes.

The first class includes all methods that are based on using two cameras. These methods originate from the earliest research in this field, which employs the characteristics of human eye functions. In these methods, two separate cameras are placed on a horizontal line with a specified distance from each other and are focused on a particular object. Then the angles between the cameras and the horizontal line are measured, and by using triangulation methods, the vertical distance of the object from the line connecting two cameras is calculated. The main difficulty of these methods is the need for mechanical movement and adjustment of the cameras in order to provide proper focusing on the object. Another drawback is the need for two cameras, which increases the cost, and the system needs to be replaced if one of the cameras fails.
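By way of non-limiting illustration, the triangulation used by the first class of methods may be sketched as follows. The sketch assumes that the two cameras lie on a common baseline of known length, that the object lies between the two lines of sight, and that each angle is measured between the baseline and the corresponding line of sight. Under those assumptions the baseline equals h·cot(α) + h·cot(β), where h is the perpendicular distance of the object from the baseline. The function and parameter names are hypothetical.

```python
import math


def perpendicular_distance(baseline, angle_left, angle_right):
    """Distance of the object from the line joining the two cameras.

    baseline    -- distance between the two cameras
    angle_left  -- angle (radians) between the baseline and the left
                   camera's line of sight
    angle_right -- the same angle for the right camera
    """
    if not (0 < angle_left < math.pi / 2 and 0 < angle_right < math.pi / 2):
        raise ValueError("angles must lie strictly between 0 and 90 degrees")
    # baseline = h / tan(angle_left) + h / tan(angle_right), solved for h
    return baseline / (1.0 / math.tan(angle_left) + 1.0 / math.tan(angle_right))


# Two cameras 2 units apart, each sighting the object at 45 degrees.
distance = perpendicular_distance(2.0, math.radians(45), math.radians(45))
```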

The second class emphasizes using only one camera. In these methods, the basis of the measurement is the amount by which the image is resized in proportion to the camera movement. These methods need to know the actual size of the object subjected to distance measurement and the camera's parameters such as the focal length of its lens.

The methods in the third class are used for measuring the distance of moving targets. In these methods, a camera is mounted on a fixed station. Then the moving object(s) is (are) indicated, based on the four scenarios: maximum velocity, small velocity changes, coherent motion, and continuous motion. Finally, the distance of the specified target is calculated. The major problem in these methods is the large amount of the necessary calculations.

The fourth class includes the methods which use a sequence of images captured with a single camera for depth perception based on the geometrical model of the object and the camera. In these methods, the results will be approximated. In addition, using these methods for the near field (for the objects near to the camera) is impossible.

The fifth class of algorithms performs depth finding by using blurred edges in the image. In these cases, the basic framework is as follows: The observed image of an object is modeled as a result of convolving the focused image of the object with a point spread function. This point spread function depends both on the camera parameters and the distance of the object from the camera. The point spread function is considered to be rotationally symmetric (isotropic). The line spread function corresponding to this point spread function is computed from a blurred step edge. The measure of the spread of the line spread function is estimated from its second central moment. This spread is shown to be related linearly to the inverse of the distance. The constants of this linear relation are determined through a single camera calibration procedure. Having computed the spread, the distance of the object is determined from the linear relation.
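By way of non-limiting illustration, the blurred-edge framework described above may be sketched as follows. The sketch assumes that the line spread function is approximated by differencing a one-dimensional intensity profile taken across a step edge, that the spread is the square root of the second central moment, and that the two constants of the linear relation are fitted from two calibration measurements. All names and the sample numbers are hypothetical and chosen only for illustration.

```python
def edge_spread(edge_profile):
    """Spread (square root of the second central moment) of the line spread
    function obtained by differencing a blurred step-edge profile."""
    lsf = [b - a for a, b in zip(edge_profile, edge_profile[1:])]
    total = sum(lsf)
    if total == 0:
        raise ValueError("profile contains no edge")
    mean = sum(i * w for i, w in enumerate(lsf)) / total
    variance = sum(w * (i - mean) ** 2 for i, w in enumerate(lsf)) / total
    return variance ** 0.5


def calibrate(sample_a, sample_b):
    """Fit spread = m * (1 / distance) + c from two (distance, spread) pairs."""
    (d_a, s_a), (d_b, s_b) = sample_a, sample_b
    m = (s_a - s_b) / (1.0 / d_a - 1.0 / d_b)
    c = s_a - m / d_a
    return m, c


def distance_from_spread(spread, m, c):
    """Invert the linear relation to recover the distance."""
    return m / (spread - c)


spread = edge_spread([0, 0, 1, 3, 4, 4])
m, c = calibrate((1.0, 3.0), (2.0, 2.0))
estimated_distance = distance_from_spread(1.5, m, c)
```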

In the last class, auxiliary devices are used for depth perception. One such method uses a laser pointer with three LEDs placed on its optical axis, built into a pen-like device. When a user scans the laser beam over the surface of the object, the camera captures the image of the three spots (one from the laser, and the others from the LEDs), and then the triangulation is carried out using the camera's viewing direction and the optical axis of the laser.

The main problem of these methods is the need for the auxiliary devices, in addition to the camera, and consequently the increased complexity and cost.

Proposed Method

In one embodiment, two steps are provided. The first step is calculating an interpolation function based on the height and the horizontal angle of the camera. The second step involves using this function to calculate the distance of the object from the camera.

In the first step, named the primitive evaluation phase, the camera is located in a position with a specified height and a horizontal angle. Then, from this position, we take a picture from some lines with equal distance from each other. Then, we provide a table in which the first column is the number of pixels counted from each line to the bottom edge of the captured picture (as the origin line), and the second column is the actual distance of that line from the camera position.

Now, by assigning an interpolation method (e.g. Lagrange method) to this table, the related interpolation polynomial, equation (5) in FIG. 5, is calculated.

In this formula, x is the distance of the object from the camera, and n is the number of considered lines in the evaluation environment in the first step.

In the second step of this method—with the same height and horizontal angle of the camera—the number of the pixels between the bottom edge of the target in the image (the nearest edge of an object in the image to the base of the camera) and the bottom edge of the captured image is counted and considered as x values in the interpolation function.

The output of this function will be the real distance between the target in the image and the camera.
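By way of non-limiting illustration, the two steps described above may be sketched as follows. Equation (5) is not reproduced in this passage, so the sketch uses the standard Lagrange form, with the pixel count from the bottom edge of the picture as the input and the measured distance as the output. The function name is hypothetical, and the calibration table is synthetic, generated from distance = pixels² purely so that the result is easy to check.

```python
def lagrange_interpolate(pixel_counts, distances, query_pixels):
    """Evaluate the Lagrange polynomial through the calibration table.

    pixel_counts -- first column of the table (pixels from the bottom edge)
    distances    -- second column (measured distance of each line)
    query_pixels -- pixel count for the bottom edge of the target object
    """
    total = 0.0
    n = len(pixel_counts)
    for i in range(n):
        numer, denom = 1.0, 1.0
        for j in range(n):
            if j != i:
                numer *= query_pixels - pixel_counts[j]
                denom *= pixel_counts[i] - pixel_counts[j]
        # One division per table row; the rows are independent of each other.
        total += distances[i] * numer / denom
    return total


# Synthetic evaluation phase: three reference lines.
table_pixels = [0.0, 1.0, 2.0]
table_distances = [0.0, 1.0, 4.0]
estimated = lagrange_interpolate(table_pixels, table_distances, 3.0)
```

Because the table is fixed after the evaluation phase, the denominators could also be computed once and reused for every query.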

This method has several advantages in comparison to the previous methods, such as the one discussed above.

a) Only one stationary camera is involved.

b) There is no direct dependency on the camera parameters, such as focal length, etc.

c) There is a small number of uncomplicated calculations, allowing for fast processing speed.

d) No auxiliary devices are required. Thus, the method can be applied to substantially arbitrary images taken by substantially arbitrary cameras.

e) The method has a constant response time, as it comprises a fixed number of calculations. Therefore, the method is useful in applications like 2D to 3D television program conversion, where response time is important. (For example, many sports viewers prefer to view sportscasts in “real time” as the game is being played, rather than with some delay. Also, many viewers like to watch movies recorded on DVD or similar media immediately after they insert the disk into the player.)

f) The method exhibits low fault in calculating distances in the evaluation domain.

g) This method can be used for both stationary and moving targets. Therefore, a moving camera is not necessary, as in the example above.

Why is the LaGrange Method Used?

There are two well known interpolation methods: The LaGrange and the Divided difference method of Newton. But for the purpose of the method proposed above, the LaGrange method is preferred for the following reasons.

1) In the method of the divided difference of Newton, by adding new points before the first point or after the last point of the table, a few extra operations are needed to correct and adjust the previous interpolation polynomial with the new situation. In the LaGrange method, on the other hand, all of the operations must be recommenced. For the purposes of the method, this feature is not important as the number of points determined in the evaluation phase and after that time will be constant.

2) Although the error of both methods is approximately equal, the number of division operations in the Newton method is greater than in the Lagrange method. In the Lagrange method, for n points there are n division operations, but in the Newton method, there are n(n−1)/2 such operations. Thus, for more than three points the number of divisions in the Newton case exceeds that in the Lagrange case. Division introduces floating point error in digital computers, so the error in the Newton method will be greater than the error in the Lagrange method.

3) In Newton's divided difference method, each phase needs the result of the previous phase to complete its calculation. The terms of the Lagrange interpolation, by contrast, can be computed in parallel. Therefore, although the number of operations in the Lagrange interpolation may be greater than in the Newton one, its total computation time will be less.

Reviewing the above reasons, it can be concluded that the LaGrange interpolation method is preferred over the Newton method. However, the invention is not limited to LaGrange interpolation, and either of these methods, as well as other methods of interpolation, may be applied. M. Mirzabaki and A. Aghagolzadeh, “Introducing a New Method for Depth Detection by Camera Using LaGrange Interpolation,” The Second Iranian Conference on Machine Vision, Image Processing & Applications, 2003.
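By way of non-limiting illustration, the division count and the phase-by-phase dependency attributed above to Newton's divided difference method may be sketched as follows. The sketch uses the standard in-place divided difference table and Horner-style evaluation; the function names and the synthetic sample points (taken from y = x²) are hypothetical.

```python
def newton_coefficients(xs, ys):
    """Return the divided-difference coefficients and the number of
    divisions performed while building them."""
    coef = list(ys)
    divisions = 0
    n = len(xs)
    for level in range(1, n):
        # Each level depends on the results of the previous level.
        for i in range(n - 1, level - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - level])
            divisions += 1
    return coef, divisions


def newton_evaluate(xs, coef, x):
    """Evaluate the Newton-form polynomial at x using nested multiplication."""
    result = coef[-1]
    for i in range(len(coef) - 2, -1, -1):
        result = result * (x - xs[i]) + coef[i]
    return result


sample_x = [0.0, 1.0, 2.0, 3.0]
sample_y = [0.0, 1.0, 4.0, 9.0]
coefficients, division_count = newton_coefficients(sample_x, sample_y)
value_at_five = newton_evaluate(sample_x, coefficients, 5.0)
```

For the four points above the table requires 4·3/2 = 6 divisions, matching the n(n−1)/2 count stated in reason 2.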

The methods of Mirzabaki and Murphey, substantially described above, can be used to detect the depth in substantially arbitrary images taken by single cameras. Murphey works best when the camera is moving and the surrounding scene is substantially immobile, for example a camera moving around a room in a video for a real estate advertisement. Mirzabaki is best applied when there is at least one object in the scene whose height can be approximated and there is at least one length that can be approximated. This is true of most television images. For example, most adult men are within 10% of two meters in height. Most television clips feature at least one scene where an adult man is involved. Therefore, an object having a fixed height can be obtained fairly easily. In addition, many video clips feature objects of known length, width, and height. For example, a small sedan, such as the Toyota Corolla S 2010, has a height of 57.7 inches, a width of 69.3 inches, and a length of 178.7 inches. Toyota Corolla Performance & Specs, www.toyota.com/corolla/specs.html, last accessed May 20, 2010. The dimensions of other common objects and scenes can also easily be determined and stored in the memory of a substantially arbitrary computer processor, such as an HP Pavilion DV3 laptop running a Microsoft Windows 7 operating system. Of course, other machines and operating systems can also store this information. Persons skilled in the art will recognize that there are many known automatic facial and object recognition techniques. For example, Commons, U.S. Pat. No. 7,613,663, incorporated herein by reference, presents a method of facial and object recognition using hierarchical stacked neural networks. Other methods of facial recognition are discussed in Steffens, U.S. Pat. No. 6,301,370, incorporated herein by reference. Object recognition systems and methods are disclosed in detail by McQueen, U.S. Pat. No. 6,069,696, incorporated herein by reference.
The methods presented herein, when coupled with the facial and object recognition techniques of Commons, Steffens, McQueen, and others, can be used to provide a representation of the depth of the objects in a substantially arbitrary set of consecutive 2D images or single 2D image.

The next step is to provide a representation of all of the objects in the image and their depths and provide this information to a module that would produce a left eye view and a right eye view of the image. These left eye and right eye views are preferably distinct and may be viewed either directly on the television screen or through special 3D glasses designed for 3D viewing on the screen.

In another embodiment, depth detection may be accomplished by providing a 3D model of a portion of the objects in a scene. For example, in a video of a college football game played in the United States during a given season, such as fall 2009, 3D models could be developed of all of the college football players in the country, all of the stadiums, and of the ball used in the game. Such a 3D model can be stored in 10 GB or less of processor RAM, allowing the method described herein to be implemented on a substantially arbitrary modern computer, such as an HP Pavilion DV3 running a Microsoft Windows 7 operating system. Alternatively, an Apple or Linux operating system can be used instead of Windows. In one embodiment, the data is stored on a video card or GPU internal to the television. In another embodiment, the data is stored on a video game system external to the monitor, such as a Microsoft Xbox 360, Nintendo Wii or a Sony PlayStation.

It is noted that, while college football is used as an example here, other data sets can also be modeled with the system and method described herein. For example, World Cup American soccer games can also be modeled by generating a 3D model of the soccer stadium in which the game is played, all of the members of the opposing teams, and of the soccer ball. Alternatively, a 3D representation of a figure skating show could be provided by generating a 3D model of all of the skaters and of the ice rink.

In another embodiment of the invention, a 3D representation of a television soap opera, such as “Desperate Housewives,” could be generated from the original 2D recording. The 3D model would store a representation of all or most of the actors in the recording. In addition, a 3D model of common scenes in the soap opera would be stored. For example, in “Desperate Housewives,” it would be desirable to have a 3D model of the buildings on Wisteria Lane, the interior of the homes of main characters, and of items typically found inside houses, such as a bed, a sofa, a table, a chair, etc. As in the sports examples above, such a 3D model can be stored in 10 GB or less of processor RAM, allowing the system and method described here to be implemented in a substantially arbitrary modern computer, such as an HP Pavilion DV3 running a Microsoft Windows 7 operating system, or an Apple or Linux computer. Alternatively, the invention could be implemented in a monitor with a GPU internal to the monitor or a monitor connected to an external GPU in a video game machine, such as a Sony PlayStation or similar device.

It is noted that, while “Desperate Housewives” is provided as an example here, other television serials, such as “Monk,” “Modern Family” or “The Middle,” could be modeled through a mechanism similar to the one described for “Desperate Housewives,” as all of these shows feature repeating actors and repeating scenes.

It is further noted that, while in all of the above examples the 3D models were stored locally to the monitor, this is not necessary. In one embodiment, the 3D models are stored on the Internet or on a remote database and the processor, monitor, or video game system implementing the invention is provided with a method of accessing the Internet or the remote database. In another embodiment, the video is stored on the Internet, and audio and images from the video are processed through the Internet to generate a 3D model thereof to display to a viewer.

For example, home videos are typically recorded by individuals and shared with friends and family over YouTube.com or a similar video-sharing website. These videos may be recorded with 2D video cameras and uploaded in 2D format. Many home videos feature similar scenes and subjects, for example, babies, young children, water, tropical vacations, and life events (e.g. weddings, baby showers, Bar Mitzvahs, etc.) are common video subjects. See, generally, This American Life #225: Home Movies, originally aired Nov. 9, 2002 by Public Radio International, available at www.thisamericanlife.org/radio-archives/episode/225/home-movies?bypass=true, last accessed Jun. 15, 2010. A database storing 3D models of the common subjects of home videos could be stored on the Internet or provided for upload to interested parties. Such a database would store, among other data, a 3D model of a baby, a 3D model of a swimming pool, a 3D model of a palm tree, etc. The home videos could then be converted from 2D to 3D either when the video files are transferred from the camera to the laptop or desktop computer, when the video files are uploaded to the Internet, or when the video files are downloaded to the viewer's computer or 3D screen. In another embodiment, the 2D-to-3D conversion could be completed in the video camera. This is especially useful if the video camera is connected to a processor, for example a video camera in a smart phone such as the iPhone 3GS or iPhone 4G.

Three Dimensional Image Generation

Various systems and methods for generating a 3D image of a scene of which a 3D model exists are known in the art. For example, Tetterington, US App. 2006/0061651, incorporated herein by reference, discusses one 3D image generator configured for use in a video game system. Tetterington requires that the video game system have a model of a 3D “world” in which the game is played. A separate left image for the left eye and a separate right image for the right eye are generated by shifting the player's look-at position slightly to the left and slightly to the right from the monocular 2D position, resulting in a 3D view of the scene. Liquid crystal glasses worn by the user of the video game system alternate between clear and dark in synchronization with the vertical refresh rate of the television screen or monitor while synchronized left and right images are generated, thereby allowing each of the viewer's eyes to independently view a separate image.

Battiato also discusses the generation of 3D stereoscopic image pairs from depth maps. Battiato, “3D Stereoscopic Image Pairs by Depth-Map Generation,” Association for Computing Machinery, 2004.

Battiato presents a new unsupervised technique that generates stereoscopic views by estimating depth information from a single input image. Using the single input image, vanishing lines and points are extracted using a few heuristics to generate an approximated depth map. The depth map is then used to generate stereo pairs. However, persons skilled in the art will note that other 3D representations and models may be used instead of stereo pairs. The overall method is well suited for real time application and works on color filter array (CFA) data acquired by consumer imaging devices or on professionally-made photographs or videos.

Traditionally, different tools have been required to generate a 3D view from a single 2D view or image. These methods are not fully automatic and require expensive computational resources beyond the means of a typical household having a television. The present invention proposes, in one embodiment, a single framework that obtains the stereoscopic view while avoiding user interaction and reducing the computational complexity. Moreover, the depth map generation step is able to work directly on a Bayer pattern image (CFA), further reducing bandwidth, memory and complexity requirements. Alternatively, a sub-sampled image can be used.

This proposed technique is based on a novel algorithm able to generate a depth map from a single image. The depth map produced by the method detailed below is then used to reconstruct the left and right views. The two steps are pipelined to obtain a single automatic 3D generation procedure. In order to obtain the depth map, image pre-processing is required. It is composed of image classification, followed by extraction of the vanishing lines and the vanishing point.

The stereoscopic image pair is then generated by calculating the parallax value of each object in the image, extracting information only from the grey level depth map. The final left and right eye images give the user a 3D perspective of the scene. The process is fully automatic and well suited for real-time application. The effectiveness of the proposed processing pipeline has been validated by an exhaustive set of experiments.

Image Pre-Processing

To generate the stereoscopic pair image (left and right view) the depth information of the objects inside the scene has to be estimated. In order to obtain the depth information, a preliminary image pre-processing is applied to extract the relevant information from the input image. The image is first classified as: Outdoor/Landscape, Outdoor with Geometric Elements or Indoor. According to each specific class, the relevant vanishing lines and the related vanishing point are then selected.

Image Classification

The main steps of the classification are summarized as follows:

1. Semantic region detection involves locating regions of the image, such as: Sky, Farthest Mountain, Far Mountain, Near Mountain, Land and Other. A preliminary color-based segmentation, which identifies chromatically homogeneous regions, helps to reduce incorrect region detection. To each detected region a fixed grey level is assigned.

2. Comparison of N sampled columns of the semantic regions detection output with a set of typical strings containing allowed region sequences.

3. Final classification is the step where the output of the previous step is used to classify the image according to heuristics.
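The column comparison and final classification steps above can be sketched as follows. This is only an illustration: the one-letter region codes, the set of allowed strings, the voting threshold, and the rule that an image without any sky is treated as indoor are all hypothetical heuristics, since the actual strings and heuristics are not given in the text.

```python
from itertools import groupby

# Hypothetical one-letter codes for the semantic regions named in the text.
CODES = {"Sky": "s", "Farthest Mountain": "m", "Far Mountain": "m",
         "Near Mountain": "m", "Land": "l", "Other": "x"}

# Hypothetical top-to-bottom region sequences treated as typical of a landscape.
LANDSCAPE_STRINGS = {"sl", "sm", "sml", "ml", "slx", "smx", "smlx"}


def column_string(column):
    """Collapse a column of region labels (top to bottom) into a short string."""
    codes = [CODES[label] for label in column]
    return "".join(code for code, _ in groupby(codes))


def classify(label_map, n_columns=5, min_fraction=0.6):
    """Classify a region label map by voting over N evenly spaced columns."""
    width = len(label_map[0])
    xs = [int((k + 0.5) * width / n_columns) for k in range(n_columns)]
    strings = [column_string([row[x] for row in label_map]) for x in xs]
    matches = sum(1 for s in strings if s in LANDSCAPE_STRINGS)
    if matches >= min_fraction * n_columns:
        return "Outdoor/Landscape"
    # Assumed heuristic: sky visible but no landscape layering means geometry.
    if any("s" in s for s in strings):
        return "Outdoor with Geometric Elements"
    return "Indoor"
```

A column such as Sky, Sky, Near Mountain, Land, Land collapses to the string "sml", which the sketch accepts as a landscape sequence.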

Vanishing Lines Detection

The image classification result can also be used to properly detect some image features, like vanishing lines and related Vanishing Point (VP).

If the input image is classified as outdoor without geometric elements, as in FIG. 6, the lowest point in the boundary between the region A = Land ∪ Other and the other regions is located. Using such a boundary point (x_b, y_b) 610, the coordinates of the VP are fixed to (W/2, y_b), where W is the image's width. Moreover, the method generates a set of standard vanishing lines 620.

When the image is classified as outdoor with geometric appearance or indoor, the VP detection is conducted as follows. FIG. 7 illustrates some equations that are useful for the calculations discussed here.

1. Edge detection using a 3×3 Sobel mask. The resulting images, I_Sx and I_Sy, are then normalized and converted into a binary image I_E, eliminating redundant information.

2. Noise reduction of I_Sx and I_Sy using a standard 5×5 lowpass filter.

3. Detection of the main straight lines, using I_Sx and I_Sy, passing through each edge point of I_E, where m is the slope and q is the intersection with the y-axis of the straight line defined by equation (1) and equation (2) of FIG. 7.

4. Each pair of parameters (m,q) is properly sampled and stored in an accumulation matrix, according to equation (3) of FIG. 7, where higher values correspond to the main straight lines of the original image.

5. The intersection between each pair of main straight lines is computed.

6. The VP is chosen as the intersection point with the greatest number of intersections around it, while the vanishing lines detected are the main straight lines passing close to VP.
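Steps 5 and 6 above can be sketched as follows, assuming the main straight lines have already been extracted as (m, q) pairs for lines of the form y = m·x + q. The neighborhood radius and the line-to-point tolerance are hypothetical parameters, and the function names are illustrative only.

```python
from itertools import combinations
from math import hypot


def intersection(line_a, line_b, eps=1e-9):
    """Intersect two lines y = m*x + q; return None if they are nearly parallel."""
    (m1, q1), (m2, q2) = line_a, line_b
    if abs(m1 - m2) < eps:
        return None
    x = (q2 - q1) / (m1 - m2)
    return (x, m1 * x + q1)


def vanishing_point(lines, radius=10.0):
    """Pick the intersection with the most intersections within `radius` of it."""
    points = []
    for line_a, line_b in combinations(lines, 2):
        point = intersection(line_a, line_b)
        if point is not None:
            points.append(point)
    if not points:
        return None

    def support(point):
        return sum(1 for other in points
                   if hypot(point[0] - other[0], point[1] - other[1]) <= radius)

    return max(points, key=support)


def vanishing_lines(lines, vp, tolerance=5.0):
    """Keep the main lines whose perpendicular distance to the VP is small."""
    x0, y0 = vp
    return [(m, q) for m, q in lines
            if abs(m * x0 - y0 + q) / hypot(m, 1.0) <= tolerance]
```

For example, three lines that all pass through the point (100, 50) together with one unrelated horizontal line yield a vanishing point at (100, 50), and the unrelated line is discarded as a vanishing line.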

Depth Map Generation

Taking into account the information collected in the pre-process analysis a series of intermediate steps are used to recover the final depth map. These steps can be summarized as: (1) gradient planes generation; (2) depth gradient assignment; (3) consistency verification of detected region; (4) final depth map generation.

FIGS. 8A and 8B show equations that are useful in this process.

Gradient Planes Generation

During this processing step, the position of the vanishing point in the image is analyzed. Five different cases can be distinguished, as illustrated in Table 1 of FIG. 8A, where Xvp and Yvp are the vanishing point coordinates onto the image plane and H and W the image height and width.

For each case, a set of heuristics (Table 1 of FIG. 8A), based on the slope of the vanishing lines and the origin of the vanishing lines on the image plane, allows the generation of horizontal and/or vertical planes (gradient planes) used to gradually set the depth variation.

Preferably, at least two vanishing lines 620 are detected prior to the operation of the method.

Depth Gradient Assignment

A grey level (corresponding to a depth level) is assigned to every pixel belonging to depth gradient planes.

Two main assumptions are used: (1) Higher depth level corresponds to lower grey values; and (2) the vanishing point is the most distant point from the observer.

In most cases, in horizontal planes the depth level is constant along the rows, while in vertical planes it is constant along the columns. The depth level is approximated by a piece-wise linear function, illustrated in Table 2 of FIG. 8B, depending on slopes m₁ and m₂ of vanishing lines generating the depth gradient plane.

FIG. 9 illustrates a graph 910 comprising two vanishing lines 912 from an image (not illustrated). These are converted to a depth gradient assignment graph 920.

This choice is justified by the consideration that human vision is more sensitive to depth variations of close objects than of far ones. Thus the depth levels have a slope that increases going from the closest position to the farthest one (VP).
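A minimal sketch of depth gradient assignment for a single horizontal plane is given below. It follows the two stated assumptions (lower grey values mean greater depth, and the vanishing point is the farthest point), but the knot values of the piece-wise linear function are hypothetical and do not reproduce Table 2 of FIG. 8B.

```python
def piecewise_linear(t, knots):
    """Interpolate linearly between (t_i, value_i) knots sorted by t."""
    for (t0, v0), (t1, v1) in zip(knots, knots[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError("t lies outside the knot range")


def horizontal_gradient_plane(width, height, y_vp,
                              knots=((0.0, 255), (0.5, 160), (1.0, 0))):
    """Grey levels for a ground plane running from the bottom row up to row y_vp.

    t = 0 is the bottom row (closest, grey 255) and t = 1 is the row of the
    vanishing point (farthest, grey 0). The grey level is constant along each
    row, and rows above the vanishing point are left as None.
    """
    plane = [[None] * width for _ in range(height)]
    bottom = height - 1
    for y in range(y_vp, height):
        t = (bottom - y) / (bottom - y_vp) if bottom != y_vp else 1.0
        grey = round(piecewise_linear(t, knots))
        plane[y] = [grey] * width
    return plane
```

With the hypothetical knots shown, the grey level falls by 95 over the nearer half of the plane and by 160 over the farther half, so the slope grows toward the vanishing point.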

The output image obtained by regions detection step (qualitative depth map) is analyzed to verify the consistency of the detected regions. In fact, the regions have been detected only by color information. It is preferable, therefore, to analyze the positions, inside the image, of each region with respect to the others checking their dimensions. Using a set of heuristics, the columns of the image are properly scanned to produce some sequences of “regions” which are checked and, if necessary, modified for “consistency verification.” In this way false regions are eliminated from the image.

For example, if between two regions of the image classified as “Sky” there is a different region (e.g. mountain or land) with a vertical size more than a fixed threshold, the second “Sky” region is recognized as a false “Sky” region and is changed to the same type of the upper one.
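The “Sky” rule in this example can be sketched for a single image column as follows. The function name and the representation of a column as a top-to-bottom list of region labels are assumptions made for illustration.

```python
from itertools import groupby


def fix_false_sky(column, threshold):
    """Relabel a lower "Sky" run separated from the sky above by a tall region.

    `column` lists region labels from the top of the image to the bottom.
    """
    # Run-length encode the column as [label, run_length] pairs.
    runs = [[label, len(list(group))] for label, group in groupby(column)]
    seen_sky = False
    tallest_gap = 0  # tallest non-sky run since the last accepted Sky run
    for i, (label, length) in enumerate(runs):
        if label != "Sky":
            if seen_sky:
                tallest_gap = max(tallest_gap, length)
        elif seen_sky and tallest_gap > threshold:
            # False sky: take the type of the region directly above it.
            runs[i][0] = runs[i - 1][0]
        else:
            seen_sky = True
            tallest_gap = 0
    return [label for label, length in runs for _ in range(length)]
```

For a column of three “Sky” rows, five “Near Mountain” rows, two “Sky” rows and two “Land” rows, a threshold of 3 turns the lower “Sky” rows into “Near Mountain”, while a threshold of 5 leaves the column unchanged.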

Similar rules are used to verify the consistency of the other image regions.

Depth Map Generation by Fusion

In this step, the qualitative depth map and the geometric depth map are “fused” together to generate the final depth map M. Let M₁(x,y) be the geometric depth map and M₂(x,y) the qualitative depth map after the consistency verification analysis of the regions. The “fusion” between M₁(x,y) and M₂(x,y) depends on the image category.

1. If the image belongs to the indoor category, then M(x,y) coincides with M₁(x,y).

M(x,y)=M₁(x,y) for all (x,y) 0≤x≤W−1 and 0≤y≤H−1.

2. If the image is classified as outdoor with absence of meaningful geometric components (landscape, e.g. FIG. 6) then the image M(x,y) is obtained as follows:

M(x,y)=M₁(x,y) for all (x,y) in land or (x,y) in other.

M(x,y)=M₂(x,y) for all (x,y) in not land and (x,y) in not other.

3. If the image is classified as outdoor with geometric characteristics, then the image M(x,y) is obtained as follows:

M(x,y)=M₂(x,y) for all (x,y) in sky.

M(x,y)=M₁(x,y) for all (x,y) in not sky.
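The three fusion rules can be sketched as a per-pixel selection between the two maps. The sketch assumes that, in the third case, the geometric map M₁ is used everywhere outside the sky region and the qualitative map M₂ inside it. The category strings and the `regions` label map are hypothetical names.

```python
def fuse_depth_maps(m1, m2, regions, category):
    """Combine geometric map m1 and qualitative map m2 into the final map M.

    m1, m2 and regions are equally sized lists of rows; regions holds the
    semantic label of each pixel. category is one of "indoor", "landscape"
    or "outdoor_geometric" (hypothetical names for the three image classes).
    """
    height, width = len(m1), len(m1[0])
    fused = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            region = regions[y][x]
            if category == "indoor":
                use_geometric = True
            elif category == "landscape":
                use_geometric = region in ("Land", "Other")
            elif category == "outdoor_geometric":
                # Assumption: geometric depth everywhere except the sky.
                use_geometric = region != "Sky"
            else:
                raise ValueError("unknown image category: %r" % (category,))
            fused[y][x] = m1[y][x] if use_geometric else m2[y][x]
    return fused
```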

Stereoscopic Pair Image Generation

Above, a method to reconstruct the binocular view of an image from a monocular view of the image has been proposed. The stereoscopic image pair is obtained by extracting the parallax values from the generated depth map and applying them to the single view input image. The parallax values can be thought of as functions of the inter-ocular distance, or baseline B, considering that:

a large difference introduced between consecutive depth layers, and thus between foreground and background, leaves the images with unresolved occlusions;

human eyes are accustomed to converging on the point on which they focus, whereas with a parallax close to B they have to stay parallel; and

the max parallax should be less than B and, consequently, the depth effect into the screen will be less visible. So, by fixing a max depth into the screen, the max allowed parallax can be derived.

Moreover, the viewer distance from the screen plays a fundamental role. The human vision system has 46 degrees as diagonal aperture; therefore, the minimum distance allowing a comfortable vision of the screen can be evaluated. FIG. 10 illustrates parallax relations with respect to a screen. If the distance D 1050, between the viewer's left eye 1010, right eye 1020 and the screen, is equal to the max depth effect into the screen P 1060, the achievable parallax is equal to B/2, thus more comfortable than B 1030.

Considering FIG. 10, and exploiting the correlation between similar triangles, we have equation (4) of FIG. 7, where M 1040 is the max parallax, B 1030 is the inter-ocular distance, P 1060 is the max depth effect into the screen, and D 1050 is the user-to-screen distance. Starting from equation (4) of FIG. 7, equations (5) and (6) of FIG. 7 can be derived, where the depth_value is the depth map pixel value and N is a reduction factor. By tuning the value of N, the max parallax value changes and optimized 3D images are obtained.
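The similar-triangles relation can be illustrated with a short sketch. The equations of FIG. 7 are not reproduced in the text, so the form M = B·P/(D + P) used here is an assumption, chosen because it yields the stated value of B/2 when D equals P. The per-pixel mapping, including the way the reduction factor N is applied and the 255-level grey scale, is likewise hypothetical.

```python
def max_parallax(baseline, screen_depth, viewer_distance):
    """Assumed similar-triangles relation: M / B = P / (D + P)."""
    return baseline * screen_depth / (viewer_distance + screen_depth)


def pixel_parallax(depth_value, m_max, reduction=1.0, levels=255):
    """Hypothetical mapping from a grey level to a parallax value.

    Grey 255 (nearest) maps to zero parallax and grey 0 (farthest) maps to
    m_max / reduction, following the convention that lower grey is deeper.
    """
    return (m_max / reduction) * (1.0 - depth_value / levels)


# Example with B = 6.5 cm and a viewer 200 cm from the screen, with the
# depth effect also limited to 200 cm: the max parallax is B / 2.
m = max_parallax(6.5, 200.0, 200.0)
```

Increasing the reduction factor in this sketch lowers every parallax value proportionally, which flattens the perceived depth.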

In order to reconstruct the left and right views, equation (7) of FIG. 7 is used.

For each pixel of the input image, the value of the parallax is obtained from its depth_value. Considering the input image as a virtual central view, the left and the right views are then obtained, as shown in FIG. 11, by shifting the input image pixels by a value equal to parallax/2 1110 for each view. Battiato, “3D Stereoscopic Image Pairs by Depth-Map Generation,” Association for Computing Machinery, 2004.

In some cases, when the 3D model of a 2D image is created, surfaces and objects which were previously blocked may become visible. In this situation, one embodiment of the invention leaves those areas in a solid color, e.g. gray. However, this is not preferred as it does not optimize the view for the user. A preferred embodiment would try to make a “best guess” as to the shapes and features of the uncovered surface. For example, suppose a 2D image shows a room where the floor is covered with a tapestry having a fixed pattern, and a shift to 3D causes a part of the floor, which was previously obscured by a dining table, to be uncovered. The preferred embodiment would copy the pattern of the tapestry onto the uncovered surface. Persons skilled in the art will recognize many automatic pattern recognition techniques that could accomplish this result. For example, Matsagu teaches a pattern recognition system in U.S. Pat. No. 7,697,765, incorporated herein by reference. A preferred embodiment of the invention would train a hierarchal stacked neural network, such as that taught by Commons in U.S. Pat. No. 7,613,663, incorporated herein by reference, to predict the uncovered patterns. This way, even unusual patterns, such as wall clocks, paintings, oddly-shaped furniture, etc., which were partially covered in the 2D view could be fully developed in the 3D view.
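As a much simpler stand-in for the neural network approach, the idea of copying a repeating pattern onto an uncovered area can be sketched for one image row. The sketch searches for the shortest period consistent with the visible pixels and extends it into the gaps. It is not the method of Commons or Matsagu, and the function name is hypothetical.

```python
def fill_holes_periodic(row, max_period=None):
    """Fill None entries by extending the shortest period that fits the row.

    Returns an unchanged copy of the row if no consistent period is found.
    """
    n = len(row)
    known = [x for x in range(n) if row[x] is not None]
    max_period = max_period or n // 2
    for period in range(1, max_period + 1):
        # A period fits if all known pixels `period` apart agree.
        if all(row[x] == row[x + period] for x in known
               if x + period < n and row[x + period] is not None):
            break
    else:
        return list(row)
    filled = list(row)
    for x in range(n):
        if filled[x] is None:
            # Copy from any known pixel a whole number of periods away.
            for src in known:
                if (src - x) % period == 0:
                    filled[x] = row[src]
                    break
    return filled
```

A row such as 1, 2, 3, gap, 2, 3, 1, 2, 3 is completed with the value 1, whereas a row with no repeating structure is returned unchanged and would need one of the other techniques mentioned above.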

The method presented here could be used for single frame 2D to 3D conversion at a high rate of speed. When this method is implemented on a modern computer, for example an Intel Core i7 965 processor with one or more nVidia Tesla M2050 graphic processing units, the process should be able to run in an amount of time on the order of a few seconds or less, making the process useful for watching substantially arbitrary cable, broadcast, or DVD programs.

Motion Vector Image Change Analysis

FIG. 12 illustrates a flow chart for another embodiment of the present invention.

In step 1210, the processor receives as input a representation of an ordered set of two dimensional images, such as a video. Any arbitrary sequence of two dimensional images may be provided as input. For example, in one embodiment, a scene from Steven Spielberg's film Jurassic Park (1993), which was shot in two dimensions by a single camera, may be used. Note that the camera may be stationary or mobile. Either a single camera or multiple cameras may be used. In another embodiment, a set of two dimensional photographs taken by a security camera is used. The processor may be a processor from an arbitrary computer, such as an HP Pavilion dv3 running Windows 7. Alternatively, a computer running the Apple Mac OS X or Linux operating system may be used. In another embodiment, the processor is in a video card, which might be located internal to a TV screen or monitor or in a video game system connected to the TV, such as the Microsoft Xbox 360, Nintendo Wii or the Sony PlayStation. The connection may be through wires, or over a wireless means such as WiFi, Bluetooth, infrared or microwave.

In step 1220, the processor analyzes the ordered set of two dimensional images to determine a first view of an object in two dimensions 1222 and a motion vector 1224. The first view of an object 1222 can be any scene depicting an object. For example, an image might show a woman standing in front of a Toyota Corolla S 2010, where the woman obscures a portion of the view of the car. In the discussion below, the object will be the Toyota Corolla S 2010, although in another embodiment of the invention, the object might be the woman or any other object in the scene. The motion vector 1224 can be any motion of an object, the camera, or air, light, etc. in the scene. For example, if the woman is moving to the left, this might constitute the motion vector. Alternatively, the car might be moving, or the wind could be blowing the woman's hair or blouse. In another embodiment, the motion vector represents the motion of the camera.
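The text does not name a technique for finding the motion vector. One common possibility, shown here only as an assumed example, is exhaustive block matching between two consecutive greyscale frames using the sum of absolute differences (SAD). The function names, block size and search range are hypothetical.

```python
def block_sad(prev, curr, top, left, dy, dx, size):
    """Sum of absolute differences between a block in prev and its shifted copy in curr."""
    return sum(abs(prev[top + r][left + c] - curr[top + r + dy][left + c + dx])
               for r in range(size) for c in range(size))


def motion_vector(prev, curr, top, left, size=2, search=2):
    """Return the (dy, dx) shift within +/- search pixels that minimizes the SAD."""
    height, width = len(curr), len(curr[0])
    best, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= top + dy and top + dy + size <= height
                      and 0 <= left + dx and left + dx + size <= width)
            if not inside:
                continue  # candidate block would fall outside the frame
            cost = block_sad(prev, curr, top, left, dy, dx, size)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best
```

If a small textured block sits at row 2, column 2 in the first frame and at row 2, column 1 in the next, the sketch reports a motion of one pixel to the left for that block.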

In step 1230, the processor analyzes the combination of the first view of the object in two dimensions and one dimension of time, the motion vector, and the ordered set of two dimensional images to determine a second view of the object 1232. For example, if the motion vector represents the woman moving to the left, a second view of the car, in which a different part of the car is visible, is provided. In another embodiment, if the motion vector represents the camera moving to a different angle relative to the car, a second view of the car, with the woman obscuring a different part of the car, is provided.

In step 1240, the processor generates a three dimensional representation of the ordered set of two dimensional images based on at least the first view of the object and the second view of the object. The two views of the Toyota Corolla S 2010 reveal more of the features of the vehicle, allowing a three dimensional model to be built. In one embodiment, the three dimensional model is a stereoscopic view, and two views of the scene, a left eye view and a right eye view, are provided. In another embodiment, a full three dimensional model may be provided, and the viewer would be able to see different parts of the Toyota as he moves his head or his body around the television set or monitor. There may be different views as the viewer moves to the left and to the right, as well as up and down.

Step 1250 illustrates that in one embodiment of the invention, for example, where a part of a shape is always invisible in the two dimensional view (e.g. if the woman is always obscuring a side mirror of the Toyota), the processor could predict a shape and color of at least one object that is not visible in the two dimensional image but is visible in the three dimensional model on the basis of an Internet lookup, a database lookup or a table lookup. The database or table may be local or remote to the processor. In the example with the woman and the Toyota above, the Internet (e.g., Google images), database or table contains a representation of a Toyota Corolla S 2010. From the visible part of the image, the processor recognizes that one of the objects in the image is a Toyota Corolla S 2010 and uses the three dimensional model on the Internet, database or table to extract the features of this vehicle (e.g. side mirror) that need to be presented on the screen, as well as the location, color, and shape of these objects. The color and shape may be dependent on the other colors and shapes in the scene. For example, if the Toyota is white, then the covers of the mirrors thereof also need to be white. The scene in the mirror should include a reflection of the objects to which the mirror is pointing.

In step 1260, the processor provides, as an output, indicia of the three dimensional representation. These indicia may be a display on a screen or a representation from which a screen display can be created.

Three Dimensional Screen Based on Viewer Position

In another embodiment, the present invention provides a three dimensional television based on viewer position.

There are several ways to determine the position of a viewer of a television. If the viewer of the television is wearing 3D glasses or other clothing or accessories that communicate with the television, the 3D glasses can communicate the distance and angle of the viewer to the television. The distance and angle can be determined through a triangulation technique. Alternatively, the television could have a camera attached thereto. The camera could be connected to a facial recognition module, which would ascertain the position of the user's face in the scene. The distance and angle of the user's face to the television could then be calculated using techniques similar to those described above in the 2D to 3D conversion method. Alternatively, because the focal length of the camera connected to the television may be known, the camera's focal length can be used as the basis of calculating the distance and angle of the user to the television. Persons skilled in the art will note that this system and method is useful not only for native 2D films that are converted to 3D, e.g. Jurassic Park, but also for native 3D films, e.g. Avatar. This system and method is an improvement over traditional 3D viewing techniques, at least in part, because user fatigue can be reduced to the degree that the user no longer has to have inconsistent information provided to her eyes if she is sitting at a distance and angle from the television for which the 3D show was not designed.

Persons skilled in the art would further note that this technique could further be used to implement a 3D touch screen. Basic touch screen programs with two dimensional screens are known in the art. For example, the Apple iPhone 3G has a touch screen user interface, as do a significant number of computerized ATM machines in the United States. A two dimensional television can implement a similar touch screen device. One embodiment of a two dimensional touch screen machine is disclosed by Heidal, U.S. Pat. No. 5,342,047, incorporated herein by reference.

However, the touch screen is a problem for three dimensional televisions. In three dimensional screens, unlike two dimensional screens, each point on the screen does not map to an image. Rather, the image is based on three factors: (1) the location of the viewer's left eye, (2) the location of the viewer's right eye, and (3) the position on the screen. Thus, the three dimensional television must locate the left eye and right eye of the viewer in order to ascertain which object the viewer is attempting to manipulate with the touch screen. The left and right eyes of the viewer can be located using a camera communicatively connected to the screen.

The distance to the eyes can then be calculated using the focal length, thereby rendering a representation of the position of the person's eyes.
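One way to carry out such a calculation is with a pinhole camera model, sketched below. The sketch assumes that the viewer faces the camera, that the focal length is expressed in pixels, and that the viewer's eyes are a typical 63 mm apart. The eye spacing constant and the function name are assumptions and are not taken from the text.

```python
from math import atan2, degrees, hypot

ASSUMED_EYE_SPACING_MM = 63.0  # assumed typical adult interpupillary distance


def viewer_position(eye_a_px, eye_b_px, focal_length_px, principal_point_px,
                    eye_spacing_mm=ASSUMED_EYE_SPACING_MM):
    """Estimate viewer distance (mm) and horizontal/vertical angles (degrees).

    Pinhole model: an object of real size S at distance Z spans
    s = f * S / Z pixels in the image, so Z = f * S / s.
    """
    spacing_px = hypot(eye_b_px[0] - eye_a_px[0], eye_b_px[1] - eye_a_px[1])
    if spacing_px == 0:
        raise ValueError("eye positions coincide")
    distance_mm = focal_length_px * eye_spacing_mm / spacing_px
    # Angles of the midpoint between the eyes relative to the optical axis.
    mid_x = (eye_a_px[0] + eye_b_px[0]) / 2
    mid_y = (eye_a_px[1] + eye_b_px[1]) / 2
    cx, cy = principal_point_px
    horizontal = degrees(atan2(mid_x - cx, focal_length_px))
    vertical = degrees(atan2(mid_y - cy, focal_length_px))
    return distance_mm, horizontal, vertical
```

With a focal length of 1000 pixels and eyes detected 100 pixels apart and centered in the image, the sketch places the viewer 630 mm from the camera, directly on the optical axis.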

FIG. 15 illustrates the three dimensional screen idea in detail. A camera 1520 is placed on top of a three dimensional screen 1510. The camera 1520 can ascertain the position of the eyes 1532 and 1534 of the user 1530. If this embodiment is to be used as a touch screen, the screen knows where the user touches it through the conventional touch screen technology. See, e.g., Heidal, U.S. Pat. No. 5,342,047, incorporated herein by reference, describing a touch screen machine.

FIG. 13 is a flow chart depicting how a system, such as the one illustrated in FIG. 15, operates according to one embodiment of the invention.

In step 1310, the camera on the top of the screen takes a picture of the viewer.

In step 1320, the processor calculates a distance and an angle from the viewer to the screen on the basis of the focal length of the camera and other camera parameters. In another embodiment, image parameters not related to the camera could be used instead of camera parameters. This is explained in detail in the discussion of 2D to 3D conversion herein. In yet another embodiment, the viewer could be wearing 3D glasses or some other clothing or accessory that could signal the viewer's position to the processor. Under the latter embodiment, a camera taking a picture, as in step 1310, is not necessary.

In step 1330, the processor applies a transform to a scene in the 3D film Avatar on the basis of the distance and angle from the viewer to the screen in order to produce a new three dimensional model of the scene. Persons skilled in the art will note that a 3D-to-3D transform is involved here. In another embodiment, the base film could be a 2D film, such as Spielberg's Jurassic Park. In this case, a 2D-to-3D transform would be involved. In yet another embodiment, a 2D or 3D photograph, rather than a film, could be transformed and displayed on the screen in three dimensions.

In step 1340, the processor presents an image corresponding to the three dimensional model on the screen.

FIG. 14, copied from FIG. 10 in Ningrat, US App. 2010/0066701, expressly incorporated herein by reference, is a flowchart illustrating methods of implementing an exemplary process for identifying multiple touches in a multi array capacitance based touch screen.

In the following description, it will be understood that each block of the flowchart illustrations, and combinations of blocks in the flowchart illustrations, can be implemented by computer program instructions. These computer program instructions may be loaded onto a computer or other programmable apparatus to produce a machine such that the instructions that execute on the computer or other programmable apparatus create means for implementing the functions specified in the flowchart block or blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable apparatus to function in a particular manner such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the function specified in the flowchart block or blocks. The computer program instructions may also be loaded onto a computer or other programmable apparatus to cause a series of operational steps to be performed in the computer or on the other programmable apparatus to produce a computer implemented process such that the instructions that execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart block or blocks.

Accordingly, blocks of the flowchart illustrations support combinations of means for performing the specified functions and combinations of steps for performing the specified functions. It will also be understood that each block of the flowchart illustrations, and combinations of blocks in the flowchart illustrations, can be implemented by special purpose hardware-based computer systems that perform the specified functions or steps, or combinations of special purpose hardware and computer instructions.

As one skilled in the relevant art will recognize, electronically stored data can be used by any type of microprocessor or similar computing system. For example, one or more portions of the present invention can be implemented in software. Software programming code which embodies the present invention is typically accessed by the microprocessor from long-term storage media of some type, such as a CD-ROM drive or hard drive. The software programming code may be embodied on any of a variety of known media for use with a data processing system, such as a diskette, hard drive, CD-ROM or the like. The code may be distributed on such media, or may be distributed from the memory or storage of one computer system over a network of some type to other computer systems for use by such other systems. Alternatively, the programming code may be embodied in the memory, and accessed by the microprocessor. The techniques and methods for embodying software programming code in memory, on physical media, and/or distributing software code via networks are well known and will not be further discussed herein.

Search Engine for 3D Models

Thomas Funkhouser discloses a search engine for 3d models. Thomas Funkhouser et al., “A Search Engine for 3D Models,” ACM Transactions on Graphics, Vol. 22, Issue 1, pg. 83 (January 2003).

As the number of 3D models available on the Web grows, there is an increasing need for a search engine to help people and automatic processors find them. Unfortunately, traditional text-based search techniques are not always effective for 3D data. Funkhouser et al. investigated new shape-based search methods. The key challenges are to develop query methods simple enough for novice users and matching algorithms robust enough to work for arbitrary polygonal models. Funkhouser et al. present a web-based search engine system that supports queries based on 3D sketches, 2D sketches, 3D models, and/or text keywords. For the shape-based queries, they developed a new matching algorithm that uses spherical harmonics to compute discriminating similarity measures without requiring repair of model degeneracies or alignment of orientations. It provides 46-245% better performance than related shape matching methods during precision-recall experiments, and it is fast enough to return query results from a repository of 20,000 models in under a second. The net result is a growing interactive index of 3D models available on the Web (i.e., a search engine for 3D models, which operates in a manner similar to Google or Microsoft Bing for text).

An important question then is how people will search for 3D models. Of course, the simplest approach is to search for keywords in filenames, captions, or context. However, this approach can fail: (1) when objects are not annotated (e.g., “B19745.wrl”), (2) when objects are annotated with specific or derivative keywords (e.g., “yellow.wrl” or “sarah.wrl”), (3) when all related keywords are so common that the query result contains a flood of irrelevant matches (e.g., searching for “faces”—i.e., human not polygonal), (4) when relevant keywords are unknown to the user (e.g., objects with misspelled or foreign labels), or (5) when keywords of interest were not known at the time the object was annotated.

In these cases and others, shape-based queries may be helpful for finding 3D objects. For instance, shape can combine with function to define classes of objects (e.g., round coffee tables). Shape can also be used to discriminate between similar objects (e.g., desk chairs versus lounge chairs). There are even instances where a class is defined entirely by its shape (e.g., things that roll). In these instances, “a picture is worth a thousand words.”

Funkhouser investigates methods for automatic shape-based retrieval of 3D models. The challenges are two-fold. First, computational representations of 3D shape (shape descriptors) must be developed for which indices can be built and similarity queries can be answered efficiently. 3D databases may be searched using orientation invariant spherical harmonic descriptors. Second, user interfaces are provided in which untrained or novice users can specify shape-based queries, for example by 3D sketching, 2D sketching, text, and interactive refinement based on shape similarity.
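The idea of an orientation invariant shape descriptor can be illustrated with a much cruder stand-in than the spherical harmonic descriptor of Funkhouser et al. The following minimal sketch, which is an assumption for illustration only and not the published algorithm, describes a point-sampled model by the fraction of its points falling in each concentric shell around the centroid. The names shell_histogram and descriptor_distance, the number of shells, and the radial range are all hypothetical.

```python
import numpy as np

def shell_histogram(points, n_shells=8):
    """Crude orientation invariant descriptor (illustrative stand-in only).

    points : (N, 3) array of surface samples of a 3D model.
    Returns the fraction of points in each concentric shell around the centroid.
    """
    p = np.asarray(points, dtype=float)
    p = p - p.mean(axis=0)                  # remove translation
    r = np.linalg.norm(p, axis=1)           # distances are unchanged by rotation
    r = r / r.mean()                        # remove overall scale
    # Samples farther than 2.5 mean radii fall outside the range and are ignored.
    hist, _ = np.histogram(r, bins=n_shells, range=(0.0, 2.5))
    return hist / len(p)

def descriptor_distance(d1, d2):
    """Smaller values indicate more similar shapes."""
    return float(np.linalg.norm(np.asarray(d1) - np.asarray(d2)))
```

Because only distances from the centroid are used, rotating or translating a model leaves the descriptor essentially unchanged, which is the property that lets a query be matched without first aligning orientations.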

In one embodiment, the 3D model search engine takes as input a query using any combination of typed keywords and sketches. For example, if a user wants a 3D model of a 2008 Volkswagen Beetle, the user could type “2008 Volkswagen Beetle” into the search engine or provide a standard 2D photograph or drawing of the vehicle as input to the search engine. In one embodiment, the first results of the search engine could be improved by calling a “find similar shape” or similar command.

Prior art content-based image retrieval (CBIR) systems, such as Query by Image Content developed by IBM Corporation in Armonk, N.Y., allow users to input a black and white or color image and find similar images in a database or on the Internet. These systems can be extended into 3D to allow users to search for 3D shapes and models.

FIG. 16 illustrates a search engine for 3D models, according to one embodiment. A user 1660 communicates what she is looking for to a query interface 1650. In another embodiment (not illustrated) the user is an electronic machine rather than a human. The query interface 1650 converts the data into text, which is processed by a text matcher 1644, 2D image data, which is processed by a 2D matcher 1645, and 3D image data, which is processed by a 3D matcher 1646.

Information is then obtained from the World Wide Web 1610 by a crawler 1620 and stored in a repository of 3D models 1630. The indexer 1640 then relies on a text index 1641, 2D index 1642, and 3D index 1643 to determine a match to the input to the query interface 1650, which is returned to the user 1660.
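The flow of FIG. 16 can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the implementation of Funkhouser et al.: the ModelEntry record, the scoring functions, and the way the text and shape scores are added together are all hypothetical, and the descriptor is assumed to be any fixed-length vector produced by a shape matcher.

```python
import math
from dataclasses import dataclass

@dataclass
class ModelEntry:
    name: str            # file name of the 3D model in the repository
    keywords: set        # words gathered by the crawler (may be empty)
    descriptor: list     # hypothetical fixed-length shape descriptor

def text_score(query_words, entry):
    """Fraction of the query words found among the entry's keywords."""
    if not query_words:
        return 0.0
    return len(query_words & entry.keywords) / len(query_words)

def shape_score(query_desc, entry):
    """Similarity in (0, 1]; 1.0 means identical descriptors."""
    if query_desc is None:
        return 0.0
    return 1.0 / (1.0 + math.dist(query_desc, entry.descriptor))

def search(repository, text="", query_desc=None, top_k=3):
    """Rank models by the sum of the text score and the shape score."""
    words = set(text.lower().split())
    scored = [(text_score(words, e) + shape_score(query_desc, e), e.name)
              for e in repository]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for score, name in scored[:top_k] if score > 0]

repository = [
    ModelEntry("beetle.wrl", {"volkswagen", "beetle", "car"}, [0.9, 0.1, 0.3]),
    ModelEntry("chair.wrl", {"desk", "chair"}, [0.2, 0.8, 0.5]),
    ModelEntry("B19745.wrl", set(), [0.88, 0.12, 0.31]),   # not annotated
]
results = search(repository, text="volkswagen beetle", query_desc=[0.9, 0.1, 0.3])
```

In this toy repository the unannotated file "B19745.wrl" is still returned for the combined query because its descriptor is close to that of the query, which is the case where keyword search alone fails.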

Three Dimensional Spatial Fourier Transform

J. R. Fienup discloses that an object may be reconstructed from the modulus of its Fourier transform. J. R. Fienup, “Reconstruction of an Object from the Modulus of its Fourier Transform,” Optics Letters, Vol. 3, No. 1 (July 1978). Mitsuo Takeda describes a method of determining the topography of a scene using spatial Fourier transforms. Mitsuo Takeda, “Fourier-Transform Method of Fringe Pattern Analysis for Computer-Based Topography and Interferometry,” J. Opt. Soc. Am., Vol. 72, No. 1 (January 1982).

Takeda proposes that in various optical measurements, we find a fringe pattern of the form:

g(x,y)=a(x,y)+b(x,y)·cos[2πf₀x+Φ(x,y)] (Eq. 1)

where the phase Φ(x,y) contains the desired information and a(x,y) and b(x,y) represent unwanted irradiance variations arising from the nonuniform light reflection or transmission by a test object; in most cases a(x,y), b(x,y) and Φ(x,y) vary slowly compared with the variation introduced by the spatial-carrier frequency f₀.

The conventional technique has been to extract the phase information by generating a fringe-contour map of the phase distribution. In interferometry, for which Eq. (1) represents the interference fringes of tilted wave fronts, the tilt is set to zero to obtain a fringe pattern of the form:

g₀(x,y)=a(x,y)+b(x,y)·cos[Φ(x,y)] (Eq. 2)

which gives a contour map of Φ(x,y) with a contour interval 2π. In the case of moiré topography, for which Eq. (1) represents a deformed grating image formed on an object surface, another grating of the same spatial frequency is superposed to generate a moiré pattern that has almost the same form as Eq. (2) except that it involves other high-frequency terms that are averaged out in observation. Although these techniques provide us with a direct means to display a contour map of the distribution of the quantity to be measured, they have the following drawbacks: (1) The sign of the phase cannot be determined, so that one cannot distinguish between depression and elevation from a given contour map. (2) The sensitivity is fixed at 2π because phase variations of less than 2π create no contour fringes. (3) Accuracy is limited by the unwanted variations a(x,y) and b(x,y), particularly in the case of broad-contour fringes. Fringe-scanning techniques have been proposed to solve these problems, but they require moving components, such as a moving mirror mounted on a translator, which must be driven with great precision and stability.

Takeda proposes a new technique that can solve all these problems by a simple Fourier-spectrum analysis of a non-contour type of fringe pattern, as given in Eq. (1).

First, a non-contour type of fringe pattern of the form given in Eq. (1) is put into a computer by an image-sensing device that has enough resolution to satisfy the sampling-theory requirement, particularly in the x direction. The input fringe pattern is rewritten in the following form for convenience of explanation:

g(x,y)=a(x,y)+c(x,y)·exp(2πif₀x)+c*(x,y)·exp(−2πif₀x) (Eq. 3)
with
c(x,y)=0.5·b(x,y)·exp[iΦ(x,y)] (Eq. 4)

where * denotes a complex conjugate.

Next, Eq. (3) is Fourier transformed with respect to x by the use of a Fast-Fourier-Transform (FFT) algorithm, which gives:

G(f,y)=A(f,y)+C(f−f₀,y)+C*(f+f₀,y) (Eq. 5)

where the capital letters denote the Fourier spectra and f is the spatial frequency in the x direction. Since the spatial variations of a(x,y), b(x,y), and Φ(x,y) are slow compared with the spatial frequency f₀, the Fourier spectra in Eq. (5) are separated by the carrier frequency f₀. We make use of either of the two spectra on the carrier, say C(f−f₀,y), and translate it by f₀ on the frequency axis toward the origin to obtain C(f,y). Note that the unwanted background variation a(x,y) has been filtered out in this stage. Again using the FFT algorithm, we compute the inverse Fourier transform of C(f,y) with respect to f and obtain c(x,y), defined by Eq. (4). Then we calculate a complex logarithm of Eq. (4):

log[c(x,y)]=log[0.5 b(x,y)]+iΦ(x,y) (Eq. 6)
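The sequence of Eq. (3) through Eq. (6) can be sketched numerically. The following is a minimal illustration, assuming the carrier frequency is known and falls exactly on an FFT bin; the function name takeda_phase, the rectangular window, and the synthetic test pattern are assumptions made for the example and are not taken from Takeda's paper.

```python
import numpy as np

def takeda_phase(g, f0_bin, half_width):
    """Recover the wrapped phase of each row of a fringe image g(x, y).

    g          : 2D array, rows indexed by y and columns by x
    f0_bin     : carrier frequency as an FFT bin index (assumed known)
    half_width : half-width, in bins, of the window kept around the carrier
    """
    G = np.fft.fft(g, axis=1)                    # Eq. 5: spectrum along x
    C = np.zeros_like(G)
    lo, hi = f0_bin - half_width, f0_bin + half_width + 1
    C[:, lo:hi] = G[:, lo:hi]                    # keep only the lobe C(f - f0, y)
    C = np.roll(C, -f0_bin, axis=1)              # translate by f0 toward the origin
    c = np.fft.ifft(C, axis=1)                   # c(x, y) of Eq. 4
    return np.angle(c)                           # imaginary part of log c, Eq. 6

# Synthetic fringe pattern of the form of Eq. 1 (all values are illustrative).
n = 256
x = np.arange(n)
theta = 2 * np.pi * x / n
phi = np.tile(0.8 * np.sin(theta), (4, 1))       # slowly varying phase, 4 rows
a = 2.0 + 0.5 * np.cos(theta)                    # background a(x, y)
b = 1.0 + 0.3 * np.cos(theta)                    # fringe amplitude b(x, y)
f0_bin = 32                                      # carrier: 32 cycles per row
g = a + b * np.cos(2 * np.pi * f0_bin * x / n + phi)
recovered = takeda_phase(g, f0_bin, half_width=16)
```

In this example the phase stays within −π to π, so the recovered values match the phase used to build the pattern without any further correction. The background a(x,y) never enters the result because its spectrum lies outside the window around the carrier.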

Now we have the phase Φ(x,y) in the imaginary part completely separated from the unwanted amplitude variation b(x,y) in the real part. The phase so obtained is indeterminate to within an integer multiple of 2π. In most cases, a computer-generated function subroutine gives a principal value ranging from −π to π, so the computed phase distribution contains discontinuities. These discontinuities can be corrected by the following algorithm. We determine an offset phase distribution Φ₀(x,y) that should be added to the discontinuous phase distribution Φ_d(x,y) to convert it to a continuous distribution Φ_c(x,y):

Φ_c(x,y)=Φ_d(x,y)+Φ₀(x,y) (Eq. 7)

The first step in making this determination is to compute the phase difference:

ΔΦ_d(x_i,y)=Φ_d(x_i,y)−Φ_d(x_{i−1},y)

between the ith sample point and the point preceding it, with the suffix i running from 1 to N to cover all the sample points. Since the variation of the phase is slow compared with the sampling interval, the absolute value of the phase difference |ΔΦ_d(x_i,y)| is much less than 2π at points where the phase distribution is continuous, but it becomes almost 2π at points where a 2π phase jump occurs. Hence, by setting an appropriate criterion for the absolute phase difference, say 0.9×2π, we can specify all the points at which the 2π phase jump takes place and also the direction of each phase jump, positive or negative, which is defined as corresponding to the sign of ΔΦ_d(x_i,y).

The second step is to determine the offset phase at each sample point sequentially, starting from the point x₀=0. Since only a relative phase distribution needs to be determined, we initially set Φ₀^x(x₀,y)=0. Then we set Φ₀^x(x_i,y)=Φ₀^x(x₀,y) for i=1, 2, 3, . . . , k−1 until the first phase jump is detected at the kth sample point. If the direction of the phase jump is positive, we set Φ₀^x(x_k,y)=Φ₀^x(x_{k−1},y)−2π, and if it is negative, we set Φ₀^x(x_k,y)=Φ₀^x(x_{k−1},y)+2π. Again, we start to set Φ₀^x(x_i,y)=Φ₀^x(x_k,y) for i=k+1, k+2, . . . , m−1, until the next phase jump occurs at the mth sample point, where we perform the same 2π addition or subtraction as at the kth sample point, with k now being replaced with m. Repeating this procedure of 2π phase addition or subtraction at the points of phase jump, we can determine the offset phase distribution, the addition of which to Φ_d(x,y) gives a continuous phase distribution Φ_c(x,y).

In the case of measurement over a full two-dimensional plane, a further phase-continuation operation in the y direction is necessary because we initially set Φ₀^x(x₀,y)=0 for all y without respect to the phase distribution in the y direction. It is sufficient to determine an additional offset phase distribution in the y direction, Φ₀^y(x,y), on only one line along the y axis, say, on the line through the point x=x_L, L being arbitrary. This can be done by the same procedure as was described for the x direction, with the initial value now being set at Φ₀^y(x_L,y₀)=0. The two-dimensional offset phase distribution is then given by:

Φ₀(x,y)=Φ₀^x(x,y)−Φ₀^x(x_L,y)+Φ₀^y(x_L,y) (Eq. 8)

In Eq. (8), Φ₀^x(x,y)−Φ₀^x(x_L,y) represents the difference of the offset phase between the points (x,y) and (x_L,y), and Φ₀^y(x_L,y) that between points (x_L,y) and (x_L,y₀), so that Φ₀(x,y) gives a relative offset phase distribution defined as the difference from the initial value at (x_L,y₀).
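The phase-continuation procedure of Eq. (7) and Eq. (8) can be sketched as follows. This is a minimal illustration assuming the wrapped phase is held in a 2D array whose rows are indexed by y and whose columns are indexed by x; the function names offset_1d and unwrap_2d and the parameter col_l (the index of the line x=x_L) are hypothetical.

```python
import numpy as np

def offset_1d(phi_d, threshold=0.9 * 2 * np.pi):
    """Offset phase along one line of wrapped samples.

    A positive jump larger than the threshold lowers the offset by 2*pi,
    a negative jump raises it by 2*pi; otherwise the offset is carried over.
    """
    offset = np.zeros(len(phi_d))
    for i in range(1, len(phi_d)):
        delta = phi_d[i] - phi_d[i - 1]
        offset[i] = offset[i - 1]
        if delta > threshold:
            offset[i] -= 2 * np.pi
        elif delta < -threshold:
            offset[i] += 2 * np.pi
    return offset

def unwrap_2d(phi_d, col_l=0):
    """Continuous phase per Eq. 7, using the two-dimensional offset of Eq. 8."""
    off_x = np.array([offset_1d(row) for row in phi_d])   # x-direction offsets
    off_y = offset_1d(phi_d[:, col_l])                    # y-direction, on x = x_L
    offset = off_x - off_x[:, [col_l]] + off_y[:, None]   # Eq. 8
    return phi_d + offset                                 # Eq. 7
```

The result is a relative phase: it agrees with the true phase up to a single constant fixed by the starting point (x_L,y₀), which is consistent with the statement that only a relative phase distribution needs to be determined.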

This method developed by Takeda can be modified from producing a topography of a region to producing a full 3D depth model of a region. The view of two dimensions of space is already known, and the third dimension can be derived from the topographic depth model. This topographic model can be converted into a full (x,y,z) depth model or a combination of a left eye view and a right eye view of a region.
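One simple way to turn a depth model into a left eye view and a right eye view is to shift each pixel horizontally by an amount that grows with its nearness. The following minimal sketch assumes a grayscale image, a depth map normalized so that 0 is far and 1 is near, and a hypothetical max_disparity parameter in pixels; it is an illustration of the idea rather than a method prescribed by this description, and it leaves uncovered pixels as zero-valued holes.

```python
import numpy as np

def stereo_from_depth(image, depth, max_disparity=4):
    """Forward-warp a grayscale image into left eye and right eye views.

    image, depth : 2D arrays of equal shape; depth in [0, 1], 1 = nearest.
    """
    h, w = image.shape
    left = np.zeros_like(image)
    right = np.zeros_like(image)
    shift = np.rint(depth * max_disparity / 2).astype(int)
    # Paint far pixels first so that nearer pixels overwrite them.
    order = np.argsort(depth, axis=None)
    ys, xs = np.unravel_index(order, depth.shape)
    for y, x in zip(ys, xs):
        s = shift[y, x]
        if 0 <= x + s < w:
            left[y, x + s] = image[y, x]    # near objects move right for the left eye
        if 0 <= x - s < w:
            right[y, x - s] = image[y, x]   # and move left for the right eye
    return left, right
```

A practical system would also fill the holes left behind near objects, for example from neighboring background pixels.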

Three Dimensional Audio Processing

The audio of many movies is developed by professional engineers to make the sound appear as though it is being heard at the viewer's position from a certain location where a speaker appears to be standing or from where a sound (e.g. siren, car engine, etc.) is coming. These calculations can be reverse engineered to ascertain an object representing a source of the sound and the position, velocity, and acceleration of the source of the sound. In one embodiment, these calculations are done in a graphics processing unit (GPU). In another embodiment, these calculations are done in the CPU of a computer running an operating system such as Windows 7, Apple Mac OS X, or Linux.

Emmanuel Gallo discusses 3D audio processing to provide improved audio rendering in Emmanuel Gallo, “Efficient 3D Audio Processing with the GPU,” ACM Workshop on General Purpose Computing on Graphics Processors (2004).

Gallo notes that audio processing applications are among the most compute-intensive and often rely on additional DSP resources for real time performance. However, programmable audio digital signal processors (DSPs) are in general only available to product developers. Professional audio boards with multiple DSPs usually support specific effects and products while consumer “game-audio” hardware still only implements fixed-function pipelines which evolve at a rather slow pace.

The widespread availability and increasing processing power of GPUs offer an alternative solution. GPU features, like multiply-accumulate instructions or multiple execution units, are similar to those of most DSPs. Besides, 3D audio rendering applications require a significant number of geometric calculations, which are a perfect fit for the GPU.

GPU-Accelerated Audio Rendering

Gallo considered a combination of two simple operations commonly used for 3D audio rendering: variable delay-line and filtering. The signal of each sound source was first delayed by the propagation time of the sound wave. This involved resampling the signal at non-integer index values and automatically accounts for Doppler shifting. The signal was then filtered to simulate the effects of source and listener directivity functions, occlusions and propagation through the medium. Gallo resampled the signals using linear interpolation between the two closest samples. On the GPU this is achieved through texture resampling. Filtering may be implemented using a simple 4-band equalizer. Assuming that input signals are band-pass filtered in a pre-processing step, the equalization is efficiently implemented as a 4-component dot product. For GPU processing, Gallo stored the sound signals as RGBA textures, each component holding a band-passed copy of the original sound. Binaural stereo rendering requires applying this pipeline twice, using a direction-dependent delay and equalization for each ear, derived from head-related transfer functions (HRTFs). Similar audio processing was used to generate dynamic sub-mixes of multiple sound signals prior to spatial audio rendering (e.g. perceptual audio rendering).
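The two operations described above, a variable delay line read with linear interpolation and a 4-band equalizer applied as a dot product, can be sketched on the CPU as follows. This is a minimal illustration and not Gallo's GPU code; the function names, the array layout (four band-passed copies stacked as rows), and the zero padding outside the signal are assumptions.

```python
import numpy as np

def delay_line(signal, delay_samples):
    """Read the signal at fractional positions n - delay using linear
    interpolation between the two closest samples.

    delay_samples may be a scalar or a per-sample array; a delay that changes
    over time resamples the signal and so produces a Doppler shift.
    """
    n = np.arange(len(signal), dtype=float)
    return np.interp(n - delay_samples, n, signal, left=0.0, right=0.0)

def equalize(bands, gains):
    """bands: (4, N) band-passed copies of a sound; gains: 4 band gains.
    Each output sample is a 4-component dot product."""
    return np.asarray(gains, dtype=float) @ np.asarray(bands, dtype=float)

def render_source(bands, delay_samples, gains):
    """Delay every band by the propagation time, then equalize."""
    delayed = np.array([delay_line(band, delay_samples) for band in bands])
    return equalize(delayed, gains)
```

Binaural rendering would call render_source twice per source, once per ear, with a delay and a set of gains chosen for that ear.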

Gallo compared an optimized SSE (Intel's Streaming SIMD Extensions) assembly code running on a Pentium 4 3 GHz processor and an equivalent Cg/OpenGL implementation running on an nVidia GeForce FX 5950 Ultra graphics board on AGP 8x. Audio was processed at 44.1 kHz using 1024-sample long frames. All processing was 32-bit floating point.

The SSE implementation achieves real-time binaural rendering of 700 sound sources, while the GPU renders up to 580 in one timeframe (approximately 22.5 ms). However, resampling floating-point textures requires two texture fetches and a linear interpolation in the shader. If floating-point texture resampling were available in hardware, GPU performance would increase. A simulation of this functionality on the GPU using a single texture-fetch achieved real-time performance for up to 1050 sources. For mono processing, the GPU treats up to 2150 (1 texture fetch)/1200 (2 fetches and linear interpolation) sources, while the CPU handles 1400 in the same amount of time.

Thus, although on average the GPU implementation was about 20% slower than the SSE implementation, it would become 50% faster if floating-point texture resampling was supported in hardware. The latest graphics architectures are likely to significantly improve GPU performance due to their increased number of pipelines and better floating-point texture support.

The huge pixel throughput of the GPU can also be used to improve audio rendering quality without reducing frame-size by recomputing rendering parameters (source-to-listener distance, equalization gains, etc.) on a per-sample rather than per-frame basis. This can be seen as an audio equivalent of per-pixel vs. per-vertex lighting. By storing directivity functions in cube-maps and recomputing propagation delays and distances for each sample, the GPU implementation can still render up to 180 sources in the same time-frame. However, more complex texture addressing calculations are needed in the fragment program due to limited texture size. By replacing such complex texture addressing with a single texture-fetch, Gallo also estimated that direct support for large 1D textures would increase performance by at least a factor of two.

It is noted that current GPU systems are far faster than those analyzed by Gallo.

It is possible to reverse Gallo's method to determine the position, velocity, and acceleration of an object emitting a sound from the sound itself (instead of deriving the audio signal from the position, velocity, and acceleration, as discussed by Gallo). Therefore, one can use the GPU to determine the position of objects emitting sounds in two or three dimensions and construct a 2D or 3D model of a scene on this basis. In a typical television scene, there are many objects that emit sounds, for example, humans speak, dogs bark, cars emit engine noises, gun shots result in a sound coming from the source, etc. Thus, the reversal of Gallo's method is particularly useful.

In step 1710, the processor implementing the method receives sound and image data from a 2D movie. It should be noted that the processor implementing the invention could be a modern computer, such as an HP Pavilion DV3 running a Microsoft Windows 7 operating system, or a computer running Apple Mac OS X or Linux. In other embodiments, a graphics card having a GPU, internal to the monitor or an external graphics card in a Sony PlayStation, other video game console, or similar device is used.

In step 1720, the processor calculates a characteristic delay of a sound coming from a source.

In step 1730, the processor auto-correlates the audio signal data between channels in order to optimize the audio output and determine characteristic time-constants.

In step 1740, the processor optimizes the audio output and ascertains the amplitude and equalization features of the audio signal. This data is then used to calculate a most likely position of the sound source. It should be noted that, in some cases, two or more microphones might be recording the same or different sounds having the same characteristics. In this case, two or more positions, rather than one position, may be calculated.

In step 1750, the processor applies a correction to account for echoes and other errors. It is noted that some of the errors might be intentional. For example, in some music video recordings, the lead singer's voice is equalized across all of the speakers while the lead singer is positioned in an arbitrary location of the screen, not corresponding to the source of his/her voice. In a preferred embodiment, the system recognizes these inconsistencies and uses other methods to ascertain a 3D model of the scene under these conditions. It is noted that, in some cases, a sound may be coming from two locations. For example, a rock band could include two guitarists playing at the same frequency but standing at different locations. There could also be two different microphones near the lead singer.

In step 1760, the processor provides as output a representation of the position of the sound source. This output is submitted either to another processor or to another module in the same processor for further processing, for example, to develop a 3D screen representation of the scene.
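The delay and amplitude measurements of steps 1720 through 1740 can be illustrated for the simplest case of a two-channel soundtrack with a single dominant source. The following minimal sketch finds the inter-channel delay as the peak of the correlation between the channels (computed here as a cross-correlation over a limited range of lags) and derives a left/right position cue from the channel levels. The function names, the max_lag parameter, and the mapping of levels to a value between −1 and 1 are assumptions for illustration; mapping these cues to a screen position is not shown.

```python
import numpy as np

def interchannel_delay(left, right, max_lag):
    """Lag in samples by which `right` trails `left` (negative if it leads)."""
    lags = np.arange(-max_lag, max_lag + 1)
    scores = []
    for lag in lags:
        if lag >= 0:
            a, b = left[:len(left) - lag], right[lag:]
        else:
            a, b = left[-lag:], right[:len(right) + lag]
        scores.append(np.dot(a, b))          # correlation at this lag
    return int(lags[int(np.argmax(scores))])

def pan_from_levels(left, right):
    """Value in [-1, 1] from RMS levels: -1 fully left, +1 fully right."""
    l = np.sqrt(np.mean(np.square(left)))
    r = np.sqrt(np.mean(np.square(right)))
    if l + r == 0:
        return 0.0
    return float((r - l) / (r + l))
```

Dividing the returned lag by the sampling rate gives the delay in seconds. A source that is louder and earlier in the left channel would, under these assumptions, be placed toward the left of the scene.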

Image Segmentation

FIG. 18 illustrates a method of image segmentation according to one embodiment.

In step 1810, the processor implementing the invention receives a 2D image or video. According to a preferred embodiment, the processor is a graphics processing unit (GPU) that can run single instruction, multiple data (SIMD) processing, such as in a Sony PlayStation connected to a monitor. In another embodiment, a full laptop computer, such as an HP Pavilion DV3 running Microsoft Windows 7, is used. In yet another embodiment, the GPU is internal to the monitor.

In step 1820, the processor applies edge boundary extraction techniques to detect a set of different objects in the 2D image or video. Edge boundary extraction techniques are described, for example, in U.S. Pat. Nos. 6,716,175, and 5,457,744, each of which is expressly incorporated herein by reference.

In step 1830, the processor separates the set of objects into a set of background objects and a set of foreground objects.

In step 1840, the processor processes the background image. Step 1842 involves inferring the shapes and color of parts of the background that are obscured in the 2D image by the foreground. This inference is made on the basis of nearby repeating patterns. For example, if a man is standing in front of a red brick wall with white clay between the bricks, it is very likely that the part of the wall that is obscured by the man has the same pattern. In step 1844, the processor generates a left eye view of the background by removing a part of the rightmost edge from the 2D representation of the background developed in step 1842. In step 1846, the processor generates a right eye view of the background by removing a part of the leftmost edge from the 2D representation of the background developed in step 1842.

In step 1850, the processor processes the objects in the foreground. For each object in the foreground, the processor creates or obtains a 3D model of the object 1852. Methods of creating or obtaining 3D models, either from a 2D image alone or in combination with a database or Internet search, are discussed elsewhere in this document. On the basis of this 3D model, the processor can now create a left eye view 1854 and a right eye view 1856 of the object.

In step 1860, the processor combines the left eye views of the objects in the foreground with the left eye view of the background to create a left eye view of the image or video. It is noted that some objects in the left eye view may obstruct other objects or the background.

In step 1870, the processor combines the right eye views of the objects in the foreground with the right eye view of the background to create a right eye view of the image or video. It is noted that some objects in the right eye view may obstruct other objects or the background.

In step 1880, the processor provides as output the left eye view and the right eye view of the image or video.
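The background handling of steps 1844 and 1846 and the combining of steps 1860 and 1870 can be sketched with array slicing and masking. This is a minimal illustration assuming images are 2D arrays, each foreground object is given as a pixel array with a boolean mask and a depth value already rendered for the eye in question, and a hypothetical crop width in pixels; none of these data structures are specified in this description.

```python
import numpy as np

def background_views(background, crop):
    """Steps 1844 and 1846: remove `crop` columns from the rightmost edge for
    the left eye view and from the leftmost edge for the right eye view."""
    if crop <= 0:
        raise ValueError("crop must be a positive number of columns")
    return background[:, :-crop], background[:, crop:]

def composite(background_view, objects):
    """Steps 1860 and 1870: paste foreground objects over a background view.

    objects : list of (pixels, mask, depth) with arrays shaped like the view.
    Farther objects are pasted first so that nearer ones obstruct them.
    """
    out = background_view.copy()
    for pixels, mask, _depth in sorted(objects, key=lambda o: o[2], reverse=True):
        out[mask] = pixels[mask]
    return out
```

The same composite function would be called once with the left eye views of the objects and once with the right eye views to produce the two outputs of step 1880.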

FIG. 19 illustrates a method of creating a 3D representation of at least a portion of a 2D video. The method of FIG. 19 takes advantage of the fact that many of the “interesting” objects in a video that need to be modeled in three dimensions usually produce sounds, e.g., speaking, barking, engine revving, etc., or are stationary, e.g. buildings, furniture, sculptures, etc.

In step 1910, the processor implementing the method receives a 2D video. According to a preferred embodiment, the processor is a graphics processing unit (GPU) that can run single instruction, multiple data (SIMD) processing, such as in a Sony PlayStation III connected to a monitor. In another embodiment, a full laptop computer, such as an HP Pavilion DV3 running Microsoft Windows 7, is used. In yet another embodiment, the GPU is internal to the monitor.

In step 1920, the processor selects at least one sound source in the 2D video. A method of ascertaining a sound source is described herein, for example, in FIG. 17. A 3D model of the sound source is then generated 1922. This model may be generated from the audio program, or from the 2D image, as noted herein, for example, in FIG. 16, and the discussion associated therewith.

In step 1930, the processor notes a movement of the camera (rather than of an object in the scene external to the camera). Such a movement would provide at least two different views of stationary objects, and therefore would allow for a 3D model of the stationary objects in the scene to be developed. In step 1932, such a 3D model is developed on the basis of the change in the image resulting from the movement of the camera.

Finally, in step 1940, the sound source models and camera movement-based models are combined to create a 3D representation of the scene.
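For the camera-movement case of steps 1930 and 1932, the simplest geometry is a camera that translates sideways by a known distance between two frames. Under that assumption, which this description does not require, the depth of a stationary point follows from how far its image shifts: depth = focal length × baseline / disparity. The following minimal sketch applies that relation to an array of measured shifts; the parameter names and the use of pixel and meter units are assumptions.

```python
import numpy as np

def depth_from_parallax(disparity_px, focal_px, baseline_m):
    """Depth in meters of stationary points seen from two camera positions.

    disparity_px : image shift of each point between the two frames, in pixels
    focal_px     : focal length expressed in pixels (assumed known)
    baseline_m   : sideways camera translation between the frames, in meters
    Points with no measurable shift are reported as infinitely far away.
    """
    d = np.asarray(disparity_px, dtype=float)
    depth = np.full(d.shape, np.inf)
    valid = d > 0
    depth[valid] = focal_px * baseline_m / d[valid]
    return depth
```

Points that shift more between the two frames are nearer the camera, which is the cue that allows stationary objects such as buildings or furniture to be placed in depth.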

FIG. 20 illustrates a method of developing a three dimensional video from a two dimensional video.

In step 2010, the processor receives a 2D video. In step 2020, the processor applies physical edge boundary extraction techniques to detect a set of objects. These steps and the types of processors that can implement the method have been described in detail above.

In step 2020, the processor recognizes an object as a two dimensional picture, photograph, wallpaper, television or computer screen, etc. In a preferred embodiment, these images would still appear in two dimensions. For example, if a video is being taken in an office where there is a photograph of a man and a woman at a beach on the desk, the desk, chair, computer, etc. in the office would all appear in 3D, but the photograph and the people and objects depicted therein would remain 2D. Therefore, in a preferred embodiment, the processor suppresses the three dimensional modeling of the object that should remain in two dimensions 2022.

A known problem in the art of automatic 2D-to-3D conversion is ascertaining the size and position of an object when there are no visual cues available. For example, when a person sees a white sphere in an otherwise completely dark, black scene it is impossible to tell whether the sphere is a small ball that is close to the viewer or a large ball that is far away from the viewer. Only the ratio of the size of the sphere to the distance from the viewer can be readily ascertained. In other words, it is impossible to tell whether the sphere is a fly-size sphere a few feet away or an airplane-size sphere a mile away. There needs to be a method of ascertaining the identity of the object (fly or airplane) in order to ascertain its likely size, and to predict its distance from the viewer in light of this knowledge of size.

In step 2030, an object is recognized as an airplane. Object recognition techniques based on 2D models are known in the art. See, e.g., U.S. Pat. Nos. 6,069,696, and 7,403,641, each expressly incorporated herein by reference. The processor “knows,” either based on a local database or an Internet search, the typical size of an airplane 2032. The processor also “knows” the apparent length of the airplane in the image. Based on this information, the distance from the viewer to the airplane can be calculated by applying basic geometric techniques 2034.

In step 2040, an object is recognized as a fly. The processor “knows,” either based on a local database or an Internet search, the typical size of a fly 2042. The processor also “knows” the apparent length of the fly in the image. Based on this information, the distance from the viewer to the fly can be calculated by applying basic geometric techniques 2044. A representation of the fly can then be added to the 3D model of the scene.
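The geometric calculation of steps 2034 and 2044 can be illustrated with a pinhole camera model, in which distance = focal length × real size / apparent size. The following minimal sketch assumes a focal length expressed in pixels and a small lookup table of typical sizes; the table values are illustrative placeholders, not measured data, and the function name is hypothetical.

```python
# Typical object lengths in meters (illustrative placeholder values only).
TYPICAL_SIZE_M = {"fly": 0.007, "airplane": 40.0}

def distance_from_size(label, apparent_px, focal_px):
    """Estimate distance in meters from the recognized object's typical size
    and its apparent length in the image, using a pinhole camera model."""
    if apparent_px <= 0:
        raise ValueError("apparent size must be positive")
    return focal_px * TYPICAL_SIZE_M[label] / apparent_px

# The same 50-pixel blob is placed very differently depending on its identity.
fly_distance = distance_from_size("fly", apparent_px=50, focal_px=1000)
airplane_distance = distance_from_size("airplane", apparent_px=50, focal_px=1000)
```

With these placeholder values the fly is estimated to be a fraction of a meter away and the airplane several hundred meters away, even though both occupy the same number of pixels, which resolves the ambiguity described for the white sphere.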

Hardware Overview

FIG. 4, copied from U.S. Pat. No. 7,702,660, issued to Chan, is a block diagram that illustrates a computer system 400 upon which an embodiment of the invention may be implemented. Computer system 400 includes a bus 402 or other communication mechanism for communicating information, and a processor 404 coupled with bus 402 for processing information. Computer system 400 also includes a main memory 406, such as a random access memory (RAM) or other dynamic storage device, coupled to bus 402 for storing information and instructions to be executed by processor 404. Main memory 406 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 404. Computer system 400 further includes a read only memory (ROM) 408 or other static storage device coupled to bus 402 for storing static information and instructions for processor 404. A storage device 410, such as a magnetic disk or optical disk, is provided and coupled to bus 402 for storing information and instructions.

Computer system 400 may be coupled via bus 402 to a display 412, such as a cathode ray tube (CRT), for displaying information to a computer user. An input device 414, including alphanumeric and other keys, is coupled to bus 402 for communicating information and command selections to processor 404. Another type of user input device is cursor control 416, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 404 and for controlling cursor movement on display 412. This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane.

The invention is related to the use of computer system 400 for implementing the techniques described herein. According to one embodiment of the invention, those techniques are performed by computer system 400 in response to processor 404 executing one or more sequences of one or more instructions contained in main memory 406. Such instructions may be read into main memory 406 from another machine-readable medium, such as storage device 410. Execution of the sequences of instructions contained in main memory 406 causes processor 404 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement the invention. Thus, embodiments of the invention are not limited to any specific combination of hardware circuitry and software.

The term “machine-readable medium” as used herein refers to any medium that participates in providing data that causes a machine to operate in a specific fashion. In an embodiment implemented using computer system 400, various machine-readable media are involved, for example, in providing instructions to processor 404 for execution. Such a medium may take many forms, including but not limited to, non-volatile media, volatile media, and transmission media. Non-volatile media include, for example, optical or magnetic disks, such as storage device 410. Volatile media include dynamic memory, such as main memory 406. Transmission media include coaxial cables, copper wire and fiber optics, including the wires that comprise bus 402. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications. All such media must be tangible to enable the instructions carried by the media to be detected by a physical mechanism that reads the instructions into a machine.

Common forms of machine-readable media include, for example, a floppy disk, a flexible disk, a hard disk, magnetic tape, or any other magnetic medium, a CD-ROM, any other optical medium, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave as described hereinafter, or any other medium from which a computer can read.

Various forms of machine-readable media may be involved in carrying one or more sequences of one or more instructions to processor 404 for execution. For example, the instructions may initially be carried on a magnetic disk of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to computer system 400 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal. An infra-red detector can receive the data carried in the infra-red signal and appropriate circuitry can place the data on bus 402. Bus 402 carries the data to main memory 406, from which processor 404 retrieves and executes the instructions. The instructions received by main memory 406 may optionally be stored on storage device 410 either before or after execution by processor 404.

Computer system 400 also includes a communication interface 418 coupled to bus 402. Communication interface 418 provides a two-way data communication coupling to a network link 420 that is connected to a local network 422. For example, communication interface 418 may be an Integrated Services Digital Network (ISDN) card or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, communication interface 418 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links may also be implemented. In any such implementation, communication interface 418 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.

Network link 420 typically provides data communication through one or more networks to other data devices. For example, network link 420 may provide a connection through local network 422 to a host computer 424 or to data equipment operated by an Internet Service Provider (ISP) 426. ISP 426 in turn provides data communication services through the world wide packet data communication network now commonly referred to as the “Internet” 428. Local network 422 and Internet 428 both use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks and the signals on network link 420 and through communication interface 418, which carry the digital data to and from computer system 400, are exemplary forms of carrier waves transporting the information.

Computer system 400 can send messages and receive data, including program code, through the network(s), network link 420 and communication interface 418. In the Internet example, a server 430 might transmit a requested code for an application program through Internet 428, ISP 426, local network 422 and communication interface 418.

The received code may be executed by processor 404 as it is received, and/or stored in storage device 410, or other non-volatile storage for later execution. In this manner, computer system 400 may obtain application code in the form of a carrier wave.

In this description, several preferred embodiments of the invention were discussed. Persons skilled in the art will, undoubtedly, have other ideas as to how the systems and methods described herein may be used. It is understood that this broad invention is not limited to the embodiments discussed herein. Rather, the invention is limited only by the following claims.

Claims

1. A method comprising: receiving a representation of an ordered set of two dimensional visual images of an object; estimating a motion vector for the object represented in the ordered set of two dimensional visual images of the object; classifying the object; predicting a spatial model of the object comprising at least one classification-dependent feature not visible in the ordered set of two dimensional visual images of the object, based on at least the classification of the object and the ordered set of two dimensional visual images of the object; representing a dynamically changing perspective view of the object in a synthetic motion image based on at least the predicted spatial model and the estimated motion vector; and outputting the perspective view of the object in the synthetic image.
2. The method according to claim 1, further comprising performing an Internet lookup dependent on the classification, to supply shape information for the object, comprising the at least one classification-dependent feature not visible in the ordered set of two dimensional visual images of the object.
3. The method according to claim 1, wherein the perspective view of the object in the synthetic image is further dependent on a vanishing point.
4. The method according to claim 1, wherein the perspective view of the object comprises a stereoscopic visual image.
5. The method according to claim 1, wherein the at least one classification-dependent feature not visible in the ordered set of two dimensional visual images of the object comprises a color of a region of the object obscured in the ordered set of two dimensional visual images of the object.
6. The method according to claim 1, wherein said classifying the object is performed with at least one stacked neural network.
7. The method according to claim 1, wherein said classifying the object is performed using a single-instruction, multiple data processor integrated circuit.
8. The method according to claim 1, wherein the at least one classification-dependent feature not visible in the ordered set of two dimensional visual images of the object comprises a hidden surface of the object not visualized in the ordered set of two dimensional visual images of the object.
9. The method according to claim 1, wherein said predicting the spatial model of the object is performed using a single-instruction, multiple data processor integrated circuit.
10. The method according to claim 1, further comprising: receiving audio signals associated with the ordered set of two dimensional visual images of the object; modifying the received audio signals selectively dependent on at least differences between parameters extracted from the ordered set of two dimensional visual images of the object and parameters associated with the synthetic image; and outputting the modified received audio signals in conjunction with the synthetic image.
11. The method according to claim 10, wherein the modified received audio signals comprise a spatial audio rendering of the received audio signals selectively dependent on a head-related transfer function of a listener.
12. The method according to claim 1, further comprising: modifying the estimated motion vector of the object to produce a modified motion vector, wherein the perspective view of the object in the synthetic image is further based on at least the modified motion vector of the object.
13. The method according to claim 1, wherein the object is classified dependent on metadata within an information stream accompanying the ordered set of two dimensional visual images.
14. A method comprising: receiving a representation of an object in an environment in an ordered set of images; classifying the object; predicting a spatial model of the object, comprising at least one classification-dependent feature not visible in the ordered set of images of the object, based on at least the classification of the object and the ordered set of images of the object; estimating a motion vector for the object represented in the ordered set of images; modifying the estimated motion vector of the object represented in the ordered set of images to produce a modified motion vector; representing a perspective view of the object in the environment in a synthetic motion image based on at least the received representation of the object in the environment, the predicted spatial model, and the modified motion vector; outputting the perspective view of the object in the synthetic image comprising the environment with a modified representation of the object having a different motion of the object with respect to the environment than in the received representation.
15. The method according to claim 14, further comprising: receiving audio signals associated with the ordered set of images; modifying the received audio signals selectively dependent on at least differences between parameters extracted from the ordered set of images and parameters associated with the synthetic image; and outputting the modified received audio signals in conjunction with the synthetic image.
16. The method according to claim 15, wherein the modified received audio signals comprise a spatial audio rendering of the received audio signals selectively dependent on a head-related transfer function of a listener.
17. The method according to claim 16, further comprising: modifying the spatial audio rendering selectively dependent on changes in a position of the object within the synthetic image.
18. A system for presenting an audiovisual program to a viewer comprising: a first input port configured to receive a representation of an ordered set of visual images of an object; at least one automated processor configured to: classify the object; predict a spatial model of the object comprising at least one classification-dependent feature not visible in the ordered set of two dimensional visual images of the object, based on at least the classification of the object and the ordered set of two dimensional visual images of the object; estimate a motion of the object within the ordered set of visual images; modify the estimated motion of the object; represent a perspective view of the object in a synthetic image based on at least the predicted spatial model and the modified estimated motion of the object, such that a position of the object in the synthetic image differs from a position of the object within the ordered set of visual images; and generate an output signal representing the perspective view of the object in the synthetic image; and an output port configured to present the output signal.
19. The system according to claim 18, wherein the at least one automated processor is further configured to modify audio signals associated with the object in the ordered set of visual images selectively dependent on a head-related transfer function of a listener and a position of the object within the synthetic image.
20. The system according to claim 18, wherein the at least one automated processor is further configured to: define a dynamically changing perspective view of the object in the synthetic motion image; and represent the perspective view of the object in the synthetic image further based on at least the dynamically changing perspective view of the object.
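
The motion vector estimation recited in claims 1 and 14 can be illustrated with a minimal sketch. The claims do not specify an estimation technique; the example below assumes exhaustive block matching with a sum-of-absolute-differences cost, which is only one of several possible approaches. The function names (`sad`, `estimate_motion_vector`, `make_frame`) and the synthetic 8x8 grayscale frames are hypothetical and chosen purely for illustration.

```python
def sad(prev, curr, top, left, dy, dx, size):
    """Sum of absolute differences between a block in `prev` and the
    block displaced by (dy, dx) in `curr`."""
    total = 0
    for r in range(size):
        for c in range(size):
            total += abs(prev[top + r][left + c]
                         - curr[top + dy + r][left + dx + c])
    return total


def estimate_motion_vector(prev, curr, top, left, size, search):
    """Return the (dy, dx) displacement, within +/- `search` pixels, that
    best maps the block at (top, left) in `prev` onto `curr`."""
    height, width = len(curr), len(curr[0])
    best, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip displacements that would move the block outside the frame.
            if not (0 <= top + dy and top + dy + size <= height):
                continue
            if not (0 <= left + dx and left + dx + size <= width):
                continue
            cost = sad(prev, curr, top, left, dy, dx, size)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best


def make_frame(height, width, top, left, size):
    """Build a hypothetical grayscale frame: a bright square (the object)
    on a dark background."""
    frame = [[0] * width for _ in range(height)]
    for r in range(top, top + size):
        for c in range(left, left + size):
            frame[r][c] = 255
    return frame


# Two consecutive frames of an assumed ordered set of images.
frame_a = make_frame(8, 8, top=2, left=2, size=2)  # object at row 2, column 2
frame_b = make_frame(8, 8, top=3, left=4, size=2)  # moved down 1, right 2

motion = estimate_motion_vector(frame_a, frame_b,
                                top=2, left=2, size=2, search=2)
```

Here `motion` is the displacement `(1, 2)`, that is, one row down and two columns right between the two frames. A practical converter would likely estimate vectors for many blocks or for a segmented object, and could substitute a different cost function or search strategy.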

CROSS REFERENCE TO RELATED APPLICATIONS

The present application is a Continuation of U.S. patent application Ser. No. 16/025,347, filed Jul. 2, 2018, now U.S. Pat. No. 10,715,793, issued Jul. 14, 2020, which is a Continuation of U.S. patent application Ser. No. 14/733,224, filed Jun. 8, 2015, now U.S. Pat. No. 10,015,478, issued Jul. 3, 2018, which is a Continuation of U.S. patent application Ser. No. 13/161,866, filed Jun. 16, 2011, now U.S. Pat. No. 9,053,562, issued Jun. 9, 2015, which claims benefit of priority under 35 U.S.C. § 119(e) to U.S. Provisional Patent Application No. 61/358,244, filed Jun. 24, 2010, each of which is expressly incorporated herein by reference in its entirety.

US Referenced Citations (899)

Number	Name	Date	Kind
4549275	Sukonick	Oct 1985	A
4646075	Andrews et al.	Feb 1987	A
4694404	Meagher	Sep 1987	A
4740836	Craig	Apr 1988	A
4862392	Steiner	Aug 1989	A
4879652	Nowak	Nov 1989	A
4935879	Ueda	Jun 1990	A
4953107	Hedley et al.	Aug 1990	A
4958147	Kanema et al.	Sep 1990	A
4992780	Penna et al.	Feb 1991	A
5077608	Dubner	Dec 1991	A
5247587	Hasegawa et al.	Sep 1993	A
5342047	Heidel et al.	Aug 1994	A
5343395	Watts	Aug 1994	A
5416848	Young	May 1995	A
5440682	Deering	Aug 1995	A
5454371	Fenster et al.	Oct 1995	A
5457744	Stone et al.	Oct 1995	A
5493595	Schoolman	Feb 1996	A
5495576	Ritchey	Feb 1996	A
5506607	Sanders, Jr. et al.	Apr 1996	A
5510832	Garcia	Apr 1996	A
5548667	Tu	Aug 1996	A
5574836	Broemmelsiek	Nov 1996	A
5673081	Yamashita et al.	Sep 1997	A
5682437	Okino et al.	Oct 1997	A
5682895	Ishiguro	Nov 1997	A
5734383	Akimichi	Mar 1998	A
5734384	Yanof et al.	Mar 1998	A
5742294	Watanabe et al.	Apr 1998	A
5748199	Palm	May 1998	A
5774357	Hoffberg	Jun 1998	A
5777666	Tanase et al.	Jul 1998	A
5781146	Frederick	Jul 1998	A
5842473	Fenster et al.	Dec 1998	A
5872575	Segal	Feb 1999	A
5875108	Hoffberg et al.	Feb 1999	A
5901246	Hoffberg et al.	May 1999	A
5959672	Sasaki	Sep 1999	A
5964707	Fenster et al.	Oct 1999	A
5977987	Duluk, Jr.	Nov 1999	A
5988862	Kacyra et al.	Nov 1999	A
5990900	Seago	Nov 1999	A
6016150	Lengyel et al.	Jan 2000	A
6020901	Lavelle et al.	Feb 2000	A
6023263	Wood	Feb 2000	A
6046745	Moriya et al.	Apr 2000	A
6055330	Eleftheriadis et al.	Apr 2000	A
6059718	Taniguchi et al.	May 2000	A
6064393	Lengyel et al.	May 2000	A
6069696	McQueen et al.	May 2000	A
6072903	Maki et al.	Jun 2000	A
6081750	Hoffberg et al.	Jun 2000	A
6094237	Hashimoto	Jul 2000	A
6118887	Cosatto et al.	Sep 2000	A
6124859	Horii et al.	Sep 2000	A
6124864	Madden et al.	Sep 2000	A
6151026	Iwade et al.	Nov 2000	A
6154121	Cairns et al.	Nov 2000	A
6157396	Margulis et al.	Dec 2000	A
6166748	Van Hook et al.	Dec 2000	A
6169552	Endo et al.	Jan 2001	B1
6173372	Rhoades	Jan 2001	B1
6206691	Lehmann et al.	Mar 2001	B1
6208348	Kaye	Mar 2001	B1
6208350	Herrera	Mar 2001	B1
6212132	Yamane et al.	Apr 2001	B1
6229553	Duluk, Jr. et al.	May 2001	B1
6232974	Horvitz et al.	May 2001	B1
6239810	Van Hook et al.	May 2001	B1
6246468	Dimsdale	Jun 2001	B1
6268875	Duluk, Jr. et al.	Jul 2001	B1
6271860	Gross	Aug 2001	B1
6275234	Iwaki	Aug 2001	B1
6285378	Duluk, Jr.	Sep 2001	B1
6301370	Steffens et al.	Oct 2001	B1
6326964	Snyder et al.	Dec 2001	B1
6330523	Kacyra et al.	Dec 2001	B1
6331856	Van Hook et al.	Dec 2001	B1
6334847	Fenster et al.	Jan 2002	B1
6340994	Margulis et al.	Jan 2002	B1
6342892	Van Hook et al.	Jan 2002	B1
6362822	Randel	Mar 2002	B1
6375782	Kumar et al.	Apr 2002	B1
6400996	Hoffberg et al.	Jun 2002	B1
6405132	Breed et al.	Jun 2002	B1
6418424	Hoffberg et al.	Jul 2002	B1
6420698	Dimsdale	Jul 2002	B1
6429903	Young	Aug 2002	B1
6445833	Murata et al.	Sep 2002	B1
6456340	Margulis	Sep 2002	B1
6461298	Fenster et al.	Oct 2002	B1
6473079	Kacyra et al.	Oct 2002	B1
6476803	Zhang et al.	Nov 2002	B1
6489955	Newhall, Jr.	Dec 2002	B1
6496183	Bar-Nahum	Dec 2002	B1
6512518	Dimsdale	Jan 2003	B2
6512993	Kacyra et al.	Jan 2003	B2
6515659	Kaye et al.	Feb 2003	B1
6518965	Dye et al.	Feb 2003	B2
6522336	Yuasa	Feb 2003	B1
6525722	Deering	Feb 2003	B1
6526352	Breed et al.	Feb 2003	B1
6538658	Herrera	Mar 2003	B1
6549203	Randel	Apr 2003	B2
6556197	Van Hook et al.	Apr 2003	B1
6580430	Hollis et al.	Jun 2003	B1
6587112	Goeltzenleuchter et al.	Jul 2003	B1
6593926	Yamaguchi et al.	Jul 2003	B1
6593929	Van Hook et al.	Jul 2003	B2
6597363	Duluk, Jr. et al.	Jul 2003	B1
6603476	Paolini et al.	Aug 2003	B1
6614444	Duluk, Jr. et al.	Sep 2003	B1
6618048	Leather	Sep 2003	B1
6625607	Gear	Sep 2003	B1
6636214	Leather et al.	Oct 2003	B1
6640145	Hoffberg et al.	Oct 2003	B2
6654018	Cosatto et al.	Nov 2003	B1
6664958	Leather et al.	Dec 2003	B1
6664962	Komsthoeft et al.	Dec 2003	B1
6704018	Mori et al.	Mar 2004	B1
6707458	Leather et al.	Mar 2004	B1
6716175	Geiser et al.	Apr 2004	B2
6738065	Even-Zohar	May 2004	B1
6747642	Yasumoto	Jun 2004	B1
6754373	de Cuetos et al.	Jun 2004	B1
6756986	Kuo et al.	Jun 2004	B1
6778252	Moulton et al.	Aug 2004	B2
6798390	Sudo et al.	Sep 2004	B1
6807290	Liu et al.	Oct 2004	B2
6840107	Gan	Jan 2005	B2
6850252	Hoffberg	Feb 2005	B1
6888549	Bregler et al.	May 2005	B2
6904163	Fujimura et al.	Jun 2005	B1
6937245	Van Hook et al.	Aug 2005	B1
6944320	Liu et al.	Sep 2005	B2
6950537	Liu et al.	Sep 2005	B2
6956578	Kuo et al.	Oct 2005	B2
6980218	Demers et al.	Dec 2005	B1
6980671	Liu et al.	Dec 2005	B2
6988991	Kim et al.	Jan 2006	B2
6993163	Liu et al.	Jan 2006	B2
6999069	Watanabe et al.	Feb 2006	B1
6999100	Leather et al.	Feb 2006	B1
7006085	Acosta et al.	Feb 2006	B1
7006881	Hoffberg et al.	Feb 2006	B1
7015931	Cieplinski	Mar 2006	B1
7020305	Liu et al.	Mar 2006	B2
7039219	Liu et al.	May 2006	B2
7046841	Dow et al.	May 2006	B1
7061488	Randel	Jun 2006	B2
7061502	Law et al.	Jun 2006	B1
7065233	Liu et al.	Jun 2006	B2
7071970	Benton	Jul 2006	B2
7075545	Van Hook et al.	Jul 2006	B2
7082212	Liu et al.	Jul 2006	B2
7088362	Mori et al.	Aug 2006	B2
7098809	Feyereisen et al.	Aug 2006	B2
7098908	Acosta et al.	Aug 2006	B2
7116323	Kaye et al.	Oct 2006	B2
7116324	Kaye et al.	Oct 2006	B2
7116335	Pearce et al.	Oct 2006	B2
7130490	Elder et al.	Oct 2006	B2
7133041	Kaufman et al.	Nov 2006	B2
7133540	Liu et al.	Nov 2006	B2
7136710	Hoffberg et al.	Nov 2006	B1
7142698	Liu et al.	Nov 2006	B2
7149329	Liu et al.	Dec 2006	B2
7149330	Liu et al.	Dec 2006	B2
7156655	Sachdeva et al.	Jan 2007	B2
7158658	Liu et al.	Jan 2007	B2
7174035	Liu et al.	Feb 2007	B2
7181051	Liu et al.	Feb 2007	B2
7184059	Fouladi et al.	Feb 2007	B1
7212656	Liu et al.	May 2007	B2
7228279	Chaudhari et al.	Jun 2007	B2
7242460	Hsu et al.	Jul 2007	B2
7248258	Acosta et al.	Jul 2007	B2
7248718	Comaniciu et al.	Jul 2007	B2
7251603	Connell et al.	Jul 2007	B2
7256791	Sullivan et al.	Aug 2007	B2
7286119	Yamaguchi et al.	Oct 2007	B2
7307640	Demers et al.	Dec 2007	B2
7319955	Deligne et al.	Jan 2008	B2
7330198	Yamaguchi et al.	Feb 2008	B2
7391418	Pulli et al.	Jun 2008	B2
7403641	Nakamoto et al.	Jul 2008	B2
7404645	Margulis	Jul 2008	B2
7446775	Hara et al.	Nov 2008	B2
7451005	Hoffberg et al.	Nov 2008	B2
7471291	Kaufman et al.	Dec 2008	B2
7477360	England et al.	Jan 2009	B2
7480617	Chu et al.	Jan 2009	B2
7502026	Acosta et al.	Mar 2009	B2
7532220	Barenbrug et al.	May 2009	B2
7538772	Fouladi et al.	May 2009	B1
7573475	Sullivan et al.	Aug 2009	B2
7573489	Davidson et al.	Aug 2009	B2
7576748	Van Hook et al.	Aug 2009	B2
7613663	Commons et al.	Nov 2009	B1
7647087	Miga et al.	Jan 2010	B2
7650319	Hoffberg et al.	Jan 2010	B2
7671857	Pulli et al.	Mar 2010	B2
7677295	Fulton et al.	Mar 2010	B2
7684623	Shen et al.	Mar 2010	B2
7684934	Shvartsburg et al.	Mar 2010	B2
7685042	Monroe et al.	Mar 2010	B1
7689019	Boese et al.	Mar 2010	B2
7689588	Badr et al.	Mar 2010	B2
7692650	Ying et al.	Apr 2010	B2
7693318	Stalling et al.	Apr 2010	B1
7693333	Ryu et al.	Apr 2010	B2
7697748	Dimsdale et al.	Apr 2010	B2
7697765	Matsugu et al.	Apr 2010	B2
7699782	Angelsen et al.	Apr 2010	B2
7702064	Boese et al.	Apr 2010	B2
7702155	Glickman et al.	Apr 2010	B2
7702599	Widrow	Apr 2010	B2
7702660	Chan et al.	Apr 2010	B2
7706575	Liu et al.	Apr 2010	B2
7707128	Matsugu	Apr 2010	B2
7710115	Hargreaves	May 2010	B2
7712961	Horndler et al.	May 2010	B2
7715609	Rinck et al.	May 2010	B2
7719552	Karman	May 2010	B2
7777761	England et al.	Aug 2010	B2
7796790	McNutt et al.	Sep 2010	B2
7813822	Hoffberg	Oct 2010	B1
7831087	Harville	Nov 2010	B2
7843429	Pryor	Nov 2010	B2
7853053	Liu et al.	Dec 2010	B2
7859551	Bulman et al.	Dec 2010	B2
7868892	Hara et al.	Jan 2011	B2
7884823	Bertolami et al.	Feb 2011	B2
7904187	Hoffberg et al.	Mar 2011	B2
7940279	Pack	May 2011	B2
7966078	Hoffberg et al.	Jun 2011	B2
7974461	England et al.	Jul 2011	B2
7974714	Hoffberg	Jul 2011	B2
7987003	Hoffberg et al.	Jul 2011	B2
7990397	Bukowski et al.	Aug 2011	B2
7995069	Van Hook et al.	Aug 2011	B2
8009897	Xu et al.	Aug 2011	B2
8031060	Hoffberg et al.	Oct 2011	B2
8046313	Hoffberg et al.	Oct 2011	B2
8068095	Pryor	Nov 2011	B2
8078396	Meadow et al.	Dec 2011	B2
8089506	Takayama et al.	Jan 2012	B2
8139111	Oldroyd	Mar 2012	B2
8165916	Hoffberg et al.	Apr 2012	B2
8177551	Sachdeva et al.	May 2012	B2
8179393	Minear et al.	May 2012	B2
8207964	Meadow et al.	Jun 2012	B1
8294949	Shitara et al.	Oct 2012	B2
8305426	Matsubara	Nov 2012	B2
8306316	Kameyama	Nov 2012	B2
8364136	Hoffberg et al.	Jan 2013	B2
8369967	Hoffberg et al.	Feb 2013	B2
8370873	Shintani	Feb 2013	B2
8401276	Choe et al.	Mar 2013	B1
8428304	Kato et al.	Apr 2013	B2
8447098	Cohen et al.	May 2013	B1
8467133	Miller	Jun 2013	B2
8472120	Border et al.	Jun 2013	B2
8477425	Border et al.	Jul 2013	B2
8482859	Border et al.	Jul 2013	B2
8488246	Border et al.	Jul 2013	B2
8500558	Smith	Aug 2013	B2
8503539	Tran	Aug 2013	B2
8514267	Underwood et al.	Aug 2013	B2
8516266	Hoffberg et al.	Aug 2013	B2
8538136	Wilkinson et al.	Sep 2013	B2
8538139	Kameyama	Sep 2013	B2
8548230	Kameyama	Oct 2013	B2
8553782	Pace	Oct 2013	B2
8558848	Meadow et al.	Oct 2013	B2
8565518	Kameyama	Oct 2013	B2
8572642	Schraga	Oct 2013	B2
8583263	Hoffberg et al.	Nov 2013	B2
8594180	Yang et al.	Nov 2013	B2
8605995	Kameyama	Dec 2013	B2
8614668	Pryor	Dec 2013	B2
8634476	Tran	Jan 2014	B2
8659592	Wang et al.	Feb 2014	B2
8666081	Oh et al.	Mar 2014	B2
8666657	Meadow et al.	Mar 2014	B2
8696464	Smith	Apr 2014	B2
8736548	Pryor	May 2014	B2
8739202	Schraga	May 2014	B2
8755837	Rhoads et al.	Jun 2014	B2
8760398	Pryor	Jun 2014	B2
8761445	Shamir et al.	Jun 2014	B2
8787655	Tatsumi	Jul 2014	B2
8797620	Yankov et al.	Aug 2014	B2
8805007	Zhang	Aug 2014	B2
8814691	Haddick et al.	Aug 2014	B2
8842165	Wada	Sep 2014	B2
8847887	Pryor	Sep 2014	B2
8861836	Wei et al.	Oct 2014	B2
8878897	Huang et al.	Nov 2014	B2
8890866	Meadow et al.	Nov 2014	B2
8890937	Skubic et al.	Nov 2014	B2
RE45264	Meadow et al.	Dec 2014	E
8902226	Meadow et al.	Dec 2014	B2
8902248	Bidarkar et al.	Dec 2014	B1
8902971	Pace et al.	Dec 2014	B2
8907968	Tanaka et al.	Dec 2014	B2
8913107	Huang	Dec 2014	B2
8934759	Kikuchi	Jan 2015	B2
8941782	Kim et al.	Jan 2015	B2
8947441	Hodgins et al.	Feb 2015	B2
8947605	Eichenlaub	Feb 2015	B2
8963928	Koike	Feb 2015	B2
8964298	Haddick et al.	Feb 2015	B2
9014507	Mack et al.	Apr 2015	B2
9025859	Venkatraman et al.	May 2015	B2
9038098	Schraga	May 2015	B2
9053562	Rabin	Jun 2015	B1
9055277	Katayama et al.	Jun 2015	B2
9067133	Smith	Jun 2015	B2
9094615	Aman et al.	Jul 2015	B2
9094667	Hejl	Jul 2015	B1
9097890	Miller et al.	Aug 2015	B2
9097891	Border et al.	Aug 2015	B2
9098870	Meadow et al.	Aug 2015	B2
9104096	Koike	Aug 2015	B2
9106977	Pace	Aug 2015	B2
9128281	Osterhout et al.	Sep 2015	B2
9129295	Border et al.	Sep 2015	B2
9132352	Rabin et al.	Sep 2015	B1
9134534	Border et al.	Sep 2015	B2
9152886	Sakai	Oct 2015	B2
9177413	Tatarinov et al.	Nov 2015	B2
9182596	Border et al.	Nov 2015	B2
9223134	Miller et al.	Dec 2015	B2
9229227	Border et al.	Jan 2016	B2
9239951	Hoffberg et al.	Jan 2016	B2
9285589	Osterhout et al.	Mar 2016	B2
9292953	Wrenninge	Mar 2016	B1
9292954	Wrenninge	Mar 2016	B1
9298320	McCaughan et al.	Mar 2016	B2
9311396	Meadow et al.	Apr 2016	B2
9311397	Meadow et al.	Apr 2016	B2
9311737	Wrenninge	Apr 2016	B1
9329689	Osterhout et al.	May 2016	B2
9330483	Du et al.	May 2016	B2
9341843	Border et al.	May 2016	B2
9363535	Chen et al.	Jun 2016	B2
9363576	Schraga	Jun 2016	B2
9366862	Haddick et al.	Jun 2016	B2
9384277	Meadow et al.	Jul 2016	B2
9386291	Tomioka et al.	Jul 2016	B2
9386294	Luthra et al.	Jul 2016	B2
9402072	Onishi et al.	Jul 2016	B2
9407896	Lam et al.	Aug 2016	B2
9407939	Schraga	Aug 2016	B2
9438892	Onishi et al.	Sep 2016	B2
9456131	Tran	Sep 2016	B2
9462257	Zurek et al.	Oct 2016	B2
9489762	Jenkins	Nov 2016	B2
9495791	Maleki et al.	Nov 2016	B2
9532069	Pace et al.	Dec 2016	B2
9535563	Hoffberg et al.	Jan 2017	B2
9538219	Sakata et al.	Jan 2017	B2
RE46310	Hoffberg et al.	Feb 2017	E
9578234	Tran	Feb 2017	B2
9578345	DeForest et al.	Feb 2017	B2
9582924	McNabb	Feb 2017	B2
9592445	Smith	Mar 2017	B2
9614892	Bidarkar et al.	Apr 2017	B2
9621901	Hejl	Apr 2017	B1
9661215	Sivan	May 2017	B2
9665800	Kuffner, Jr.	May 2017	B1
9743078	DeForest et al.	Aug 2017	B2
9749532	Hinkel et al.	Aug 2017	B1
9759917	Osterhout et al.	Sep 2017	B2
9787899	Hinkel et al.	Oct 2017	B1
9795882	Rabin et al.	Oct 2017	B1
9846961	Haimovitch-Yogev et al.	Dec 2017	B2
9850109	Schoonmaker	Dec 2017	B2
9852538	Jenkins	Dec 2017	B2
9866748	Sivan	Jan 2018	B2
9875406	Haddick et al.	Jan 2018	B2
9881389	Higgs et al.	Jan 2018	B1
9892519	Vejarano et al.	Feb 2018	B2
9916538	Zadeh et al.	Mar 2018	B2
9947134	Komenczi et al.	Apr 2018	B2
9961376	Schraga	May 2018	B2
20010052899	Simpson et al.	Dec 2001	A1
20020012454	Liu et al.	Jan 2002	A1
20020036617	Pryor	Mar 2002	A1
20020059042	Kacyra et al.	May 2002	A1
20020063807	Margulis	May 2002	A1
20020067355	Randel	Jun 2002	A1
20020080143	Morgan et al.	Jun 2002	A1
20020085000	Sullivan et al.	Jul 2002	A1
20020085219	Ramamoorthy	Jul 2002	A1
20020097380	Moulton et al.	Jul 2002	A1
20020102010	Liu et al.	Aug 2002	A1
20020145607	Dimsdale	Oct 2002	A1
20020149585	Kacyra et al.	Oct 2002	A1
20020151992	Hoffberg et al.	Oct 2002	A1
20020158865	Dye et al.	Oct 2002	A1
20020158870	Brunkhart et al.	Oct 2002	A1
20020158872	Randel	Oct 2002	A1
20020176619	Love	Nov 2002	A1
20020181765	Mori	Dec 2002	A1
20020186217	Kamata et al.	Dec 2002	A1
20030001835	Dimsdale et al.	Jan 2003	A1
20030051255	Bulman et al.	Mar 2003	A1
20030080963	Van Hook et al.	May 2003	A1
20030164829	Bregler et al.	Sep 2003	A1
20030197737	Kim	Oct 2003	A1
20040006273	Kim et al.	Jan 2004	A1
20040041813	Kim	Mar 2004	A1
20040104915	Mori et al.	Jun 2004	A1
20040109608	Love et al.	Jun 2004	A1
20040114800	Ponomarev et al.	Jun 2004	A1
20040125103	Kaufman et al.	Jul 2004	A1
20040157662	Tsuchiya	Aug 2004	A1
20040164956	Yamaguchi et al.	Aug 2004	A1
20040164957	Yamaguchi et al.	Aug 2004	A1
20040165776	Brouwer	Aug 2004	A1
20040179107	Benton	Sep 2004	A1
20040197727	Sachdeva et al.	Oct 2004	A1
20040208344	Liu et al.	Oct 2004	A1
20040213438	Liu et al.	Oct 2004	A1
20040213453	Liu et al.	Oct 2004	A1
20040240725	Xu et al.	Dec 2004	A1
20050007374	Kuo et al.	Jan 2005	A1
20050008196	Liu et al.	Jan 2005	A1
20050024378	Pearce et al.	Feb 2005	A1
20050030311	Hara et al.	Feb 2005	A1
20050047630	Liu et al.	Mar 2005	A1
20050053276	Curti et al.	Mar 2005	A1
20050053277	Liu et al.	Mar 2005	A1
20050074145	Liu et al.	Apr 2005	A1
20050099414	Kaye et al.	May 2005	A1
20050129315	Liu et al.	Jun 2005	A1
20050135660	Liu et al.	Jun 2005	A1
20050146521	Kaye et al.	Jul 2005	A1
20050162436	Van Hook et al.	Jul 2005	A1
20050168461	Acosta et al.	Aug 2005	A1
20050171456	Hirschman et al.	Aug 2005	A1
20050185711	Pfister et al.	Aug 2005	A1
20050190962	Liu et al.	Sep 2005	A1
20050195200	Chuang et al.	Sep 2005	A1
20050195210	Demers et al.	Sep 2005	A1
20050195901	Pohjola et al.	Sep 2005	A1
20050207623	Liu et al.	Sep 2005	A1
20050213820	Liu et al.	Sep 2005	A1
20050243323	Hsu et al.	Nov 2005	A1
20060007301	Cho et al.	Jan 2006	A1
20060009978	Ma et al.	Jan 2006	A1
20060033713	Pryor	Feb 2006	A1
20060061566	Verma et al.	Mar 2006	A1
20060061651	Tetterington	Mar 2006	A1
20060064716	Sull et al.	Mar 2006	A1
20060104490	Liu et al.	May 2006	A1
20060104491	Liu et al.	May 2006	A1
20060110027	Liu et al.	May 2006	A1
20060126924	Liu et al.	Jun 2006	A1
20060146049	Pulli et al.	Jul 2006	A1
20060155398	Hoffberg et al.	Jul 2006	A1
20060184966	Hunleth et al.	Aug 2006	A1
20060197768	Van Hook et al.	Sep 2006	A1
20060200253	Hoffberg et al.	Sep 2006	A1
20060200258	Hoffberg et al.	Sep 2006	A1
20060200259	Hoffberg et al.	Sep 2006	A1
20060200260	Hoffberg et al.	Sep 2006	A1
20060232598	Barenbrug et al.	Oct 2006	A1
20060238613	Takayama et al.	Oct 2006	A1
20060244746	England et al.	Nov 2006	A1
20060244749	Kondo et al.	Nov 2006	A1
20060279569	Acosta et al.	Dec 2006	A1
20070016476	Hoffberg et al.	Jan 2007	A1
20070018979	Budagavi	Jan 2007	A1
20070035706	Margulis	Feb 2007	A1
20070053513	Hoffberg	Mar 2007	A1
20070060359	Smith	Mar 2007	A1
20070061022	Hoffberg-Borghesani et al.	Mar 2007	A1
20070061023	Hoffberg et al.	Mar 2007	A1
20070061735	Hoffberg et al.	Mar 2007	A1
20070070038	Hoffberg et al.	Mar 2007	A1
20070070083	Fouladi et al.	Mar 2007	A1
20070081718	Rubbert et al.	Apr 2007	A1
20070099147	Sachdeva et al.	May 2007	A1
20070133848	McNutt et al.	Jun 2007	A1
20070195087	Acosta et al.	Aug 2007	A1
20070206008	Kaufman et al.	Sep 2007	A1
20070279412	Davidson et al.	Dec 2007	A1
20070279415	Sullivan et al.	Dec 2007	A1
20070279494	Aman et al.	Dec 2007	A1
20070280528	Wellington et al.	Dec 2007	A1
20080024490	Loop et al.	Jan 2008	A1
20080101109	Haring-Bolivar et al.	May 2008	A1
20080150945	Wang et al.	Jun 2008	A1
20080152216	Meadow et al.	Jun 2008	A1
20080168489	Schraga	Jul 2008	A1
20080170067	Kim et al.	Jul 2008	A1
20080192116	Tamir et al.	Aug 2008	A1
20080198920	Yang et al.	Aug 2008	A1
20080225046	Pulli et al.	Sep 2008	A1
20080225047	Pulli et al.	Sep 2008	A1
20080226123	Birtwistle et al.	Sep 2008	A1
20080228449	Birtwistle et al.	Sep 2008	A1
20080246622	Chen	Oct 2008	A1
20080256130	Kirby et al.	Oct 2008	A1
20080267582	Yamauchi	Oct 2008	A1
20080270335	Matsugu	Oct 2008	A1
20080270338	Adams	Oct 2008	A1
20080273173	Grotehusmann et al.	Nov 2008	A1
20080281591	Droppo et al.	Nov 2008	A1
20080304707	Oi et al.	Dec 2008	A1
20080317350	Yamaguchi et al.	Dec 2008	A1
20080319568	Berndlmaier et al.	Dec 2008	A1
20090006101	Rigazio et al.	Jan 2009	A1
20090010529	Zhou et al.	Jan 2009	A1
20090015590	Hara et al.	Jan 2009	A1
20090027402	Bakalash et al.	Jan 2009	A1
20090034366	Mathiszik et al.	Feb 2009	A1
20090035869	Scuor	Feb 2009	A1
20090049890	Zhong et al.	Feb 2009	A1
20090076347	Anderson et al.	Mar 2009	A1
20090080757	Roger et al.	Mar 2009	A1
20090080778	Lee et al.	Mar 2009	A1
20090080803	Hara et al.	Mar 2009	A1
20090087040	Torii et al.	Apr 2009	A1
20090087084	Neigovzen et al.	Apr 2009	A1
20090097722	Dekel et al.	Apr 2009	A1
20090122979	Lee et al.	May 2009	A1
20090128551	Bakalash et al.	May 2009	A1
20090141024	Lee et al.	Jun 2009	A1
20090144173	Mo et al.	Jun 2009	A1
20090144213	Patil et al.	Jun 2009	A1
20090144448	Smith	Jun 2009	A1
20090146657	Hebrank et al.	Jun 2009	A1
20090148070	Hwang et al.	Jun 2009	A1
20090149156	Yeo	Jun 2009	A1
20090152356	Reddy et al.	Jun 2009	A1
20090153553	Kim et al.	Jun 2009	A1
20090154794	Kim et al.	Jun 2009	A1
20090161944	Lau et al.	Jun 2009	A1
20090161989	Sim	Jun 2009	A1
20090164339	Rothman	Jun 2009	A1
20090167595	Cross et al.	Jul 2009	A1
20090169076	Lobregt et al.	Jul 2009	A1
20090169118	Eichhorn et al.	Jul 2009	A1
20090179896	Rottger	Jul 2009	A1
20090181769	Thomas et al.	Jul 2009	A1
20090184349	Dungan	Jul 2009	A1
20090185750	Schneider	Jul 2009	A1
20090189889	Engel et al.	Jul 2009	A1
20090195640	Kim et al.	Aug 2009	A1
20090196492	Jung et al.	Aug 2009	A1
20090208112	Hamamura et al.	Aug 2009	A1
20090213113	Sim et al.	Aug 2009	A1
20090220155	Yamamoto et al.	Sep 2009	A1
20090225073	Baker	Sep 2009	A1
20090226183	Kang	Sep 2009	A1
20090231327	Minear et al.	Sep 2009	A1
20090232355	Minear et al.	Sep 2009	A1
20090232388	Minear et al.	Sep 2009	A1
20090232399	Kawahara et al.	Sep 2009	A1
20090237327	Park et al.	Sep 2009	A1
20090254496	Kanevsky et al.	Oct 2009	A1
20090262108	Davidson et al.	Oct 2009	A1
20090262184	Engle et al.	Oct 2009	A1
20090268964	Takahashi	Oct 2009	A1
20090272015	Schnuckle	Nov 2009	A1
20090273601	Kim	Nov 2009	A1
20090279756	Gindele et al.	Nov 2009	A1
20090287624	Rouat et al.	Nov 2009	A1
20090290788	Bogan et al.	Nov 2009	A1
20090290800	Lo	Nov 2009	A1
20090290811	Imai	Nov 2009	A1
20090295801	Fritz et al.	Dec 2009	A1
20090295805	Ha et al.	Dec 2009	A1
20090297000	Shahaf et al.	Dec 2009	A1
20090297010	Fritz et al.	Dec 2009	A1
20090297011	Brunner et al.	Dec 2009	A1
20090297021	Islam et al.	Dec 2009	A1
20090309966	Chen et al.	Dec 2009	A1
20090310216	Roh et al.	Dec 2009	A1
20090315979	Jung et al.	Dec 2009	A1
20090322742	Muktinutalapati et al.	Dec 2009	A1
20090322860	Zhang	Dec 2009	A1
20090324014	Kato et al.	Dec 2009	A1
20090324107	Walch	Dec 2009	A1
20090326841	Zhang et al.	Dec 2009	A1
20100007659	Ludwig et al.	Jan 2010	A1
20100014781	Liu et al.	Jan 2010	A1
20100016750	Anderson et al.	Jan 2010	A1
20100020159	Underwood et al.	Jan 2010	A1
20100026642	Kim et al.	Feb 2010	A1
20100026722	Kondo	Feb 2010	A1
20100026789	Balogh	Feb 2010	A1
20100026909	Yoon	Feb 2010	A1
20100027606	Dai et al.	Feb 2010	A1
20100027611	Dai et al.	Feb 2010	A1
20100034450	Mertelmeier	Feb 2010	A1
20100034469	Thorpe et al.	Feb 2010	A1
20100039573	Park et al.	Feb 2010	A1
20100045461	Caler et al.	Feb 2010	A1
20100045696	Bruder et al.	Feb 2010	A1
20100046796	Pietquin	Feb 2010	A1
20100047811	Winfried et al.	Feb 2010	A1
20100060857	Richards et al.	Mar 2010	A1
20100061598	Seo	Mar 2010	A1
20100061603	Mielekamp et al.	Mar 2010	A1
20100063992	Ma et al.	Mar 2010	A1
20100066701	Ningrat	Mar 2010	A1
20100073366	Tateno	Mar 2010	A1
20100073394	Van Hook et al.	Mar 2010	A1
20100076642	Hoffberg et al.	Mar 2010	A1
20100082299	Dhanekula et al.	Apr 2010	A1
20100085358	Wegbreit et al.	Apr 2010	A1
20100086099	Kuzmanovic	Apr 2010	A1
20100086220	Minear	Apr 2010	A1
20100091354	Nam et al.	Apr 2010	A1
20100092075	Lee et al.	Apr 2010	A1
20100097374	Fan et al.	Apr 2010	A1
20100099198	Zhao et al.	Apr 2010	A1
20100110070	Kim et al.	May 2010	A1
20100110162	Yun et al.	May 2010	A1
20100115347	Noyes	May 2010	A1
20100118053	Karp et al.	May 2010	A1
20100118125	Park	May 2010	A1
20100121798	Matsugu et al.	May 2010	A1
20100123716	Li et al.	May 2010	A1
20100124368	Ye et al.	May 2010	A1
20100142748	Oldroyd	Jun 2010	A1
20100157425	Oh	Jun 2010	A1
20100158099	Kalva	Jun 2010	A1
20100189310	Liu et al.	Jul 2010	A1
20100207936	Minear et al.	Aug 2010	A1
20100208981	Minear et al.	Aug 2010	A1
20100209013	Minear et al.	Aug 2010	A1
20100218228	Walter	Aug 2010	A1
20100220893	Lee et al.	Sep 2010	A1
20100309209	Hodgins et al.	Dec 2010	A1
20100315415	Asami	Dec 2010	A1
20100328436	Skubic et al.	Dec 2010	A1
20110018867	Shibamiya et al.	Jan 2011	A1
20110026811	Kameyama	Feb 2011	A1
20110026849	Kameyama	Feb 2011	A1
20110043540	Fancher et al.	Feb 2011	A1
20110052045	Kameyama	Mar 2011	A1
20110058028	Sakai	Mar 2011	A1
20110059798	Pryor	Mar 2011	A1
20110063410	Robert	Mar 2011	A1
20110069152	Wang et al.	Mar 2011	A1
20110109722	Oh et al.	May 2011	A1
20110115812	Minear et al.	May 2011	A1
20110117530	Albocher et al.	May 2011	A1
20110156896	Hoffberg et al.	Jun 2011	A1
20110167110	Hoffberg et al.	Jul 2011	A1
20110188780	Wang et al.	Aug 2011	A1
20110200249	Minear et al.	Aug 2011	A1
20110211036	Tran	Sep 2011	A1
20110211094	Schraga	Sep 2011	A1
20110213664	Osterhout et al.	Sep 2011	A1
20110214082	Osterhout et al.	Sep 2011	A1
20110221656	Haddick et al.	Sep 2011	A1
20110221657	Haddick et al.	Sep 2011	A1
20110221658	Haddick et al.	Sep 2011	A1
20110221659	King et al.	Sep 2011	A1
20110221668	Haddick et al.	Sep 2011	A1
20110221669	Shams et al.	Sep 2011	A1
20110221670	King et al.	Sep 2011	A1
20110221671	King et al.	Sep 2011	A1
20110221672	Osterhout et al.	Sep 2011	A1
20110221793	King et al.	Sep 2011	A1
20110221896	Haddick et al.	Sep 2011	A1
20110221897	Haddick et al.	Sep 2011	A1
20110222745	Osterhout et al.	Sep 2011	A1
20110225536	Shams et al.	Sep 2011	A1
20110225611	Shintani	Sep 2011	A1
20110227812	Haddick et al.	Sep 2011	A1
20110227813	Haddick et al.	Sep 2011	A1
20110227820	Haddick et al.	Sep 2011	A1
20110231757	Haddick et al.	Sep 2011	A1
20110255746	Berkovich et al.	Oct 2011	A1
20110268426	Kikuchi	Nov 2011	A1
20110311128	Wilkinson et al.	Dec 2011	A1
20120007950	Yang	Jan 2012	A1
20120033873	Ozeki et al.	Feb 2012	A1
20120036016	Hoffberg et al.	Feb 2012	A1
20120040755	Pryor	Feb 2012	A1
20120062445	Haddick et al.	Mar 2012	A1
20120075168	Osterhout et al.	Mar 2012	A1
20120076358	Meadow et al.	Mar 2012	A1
20120086782	Wada	Apr 2012	A1
20120114226	Kameyama	May 2012	A1
20120120190	Lee	May 2012	A1
20120120191	Lee	May 2012	A1
20120121164	Tatsumi	May 2012	A1
20120127159	Jeon et al.	May 2012	A1
20120128238	Kameyama	May 2012	A1
20120134579	Kameyama	May 2012	A1
20120146997	Ishimaru et al.	Jun 2012	A1
20120147135	Matsubara	Jun 2012	A1
20120147154	Matsubara	Jun 2012	A1
20120150651	Hoffberg et al.	Jun 2012	A1
20120154529	Kobayashi	Jun 2012	A1
20120162363	Huang et al.	Jun 2012	A1
20120162396	Huang	Jun 2012	A1
20120169843	Luthra et al.	Jul 2012	A1
20120176477	Givon	Jul 2012	A1
20120179413	Hasse et al.	Jul 2012	A1
20120182387	Enenkl et al.	Jul 2012	A1
20120183202	Wei et al.	Jul 2012	A1
20120194418	Osterhout et al.	Aug 2012	A1
20120194419	Osterhout et al.	Aug 2012	A1
20120194420	Osterhout et al.	Aug 2012	A1
20120194549	Osterhout et al.	Aug 2012	A1
20120194550	Osterhout et al.	Aug 2012	A1
20120194551	Osterhout et al.	Aug 2012	A1
20120194552	Osterhout et al.	Aug 2012	A1
20120194553	Osterhout et al.	Aug 2012	A1
20120200488	Osterhout et al.	Aug 2012	A1
20120200499	Osterhout et al.	Aug 2012	A1
20120200601	Osterhout et al.	Aug 2012	A1
20120200680	So et al.	Aug 2012	A1
20120206322	Osterhout et al.	Aug 2012	A1
20120206323	Osterhout et al.	Aug 2012	A1
20120206334	Osterhout et al.	Aug 2012	A1
20120206335	Osterhout et al.	Aug 2012	A1
20120206485	Osterhout et al.	Aug 2012	A1
20120212398	Border et al.	Aug 2012	A1
20120212399	Border et al.	Aug 2012	A1
20120212400	Border et al.	Aug 2012	A1
20120212406	Osterhout et al.	Aug 2012	A1
20120212414	Osterhout et al.	Aug 2012	A1
20120212484	Haddick et al.	Aug 2012	A1
20120212499	Haddick et al.	Aug 2012	A1
20120218172	Border et al.	Aug 2012	A1
20120218301	Miller	Aug 2012	A1
20120223944	Koike	Sep 2012	A1
20120229445	Jenkins	Sep 2012	A1
20120229462	Eichenlaub	Sep 2012	A1
20120235883	Border et al.	Sep 2012	A1
20120235884	Miller et al.	Sep 2012	A1
20120235885	Miller et al.	Sep 2012	A1
20120235886	Border et al.	Sep 2012	A1
20120235887	Border et al.	Sep 2012	A1
20120235900	Border et al.	Sep 2012	A1
20120235988	Karafin	Sep 2012	A1
20120236030	Border et al.	Sep 2012	A1
20120236031	Haddick et al.	Sep 2012	A1
20120242678	Border et al.	Sep 2012	A1
20120242697	Border et al.	Sep 2012	A1
20120242698	Haddick et al.	Sep 2012	A1
20120249754	Akashi	Oct 2012	A1
20120249797	Haddick et al.	Oct 2012	A1
20120287233	Wang et al.	Nov 2012	A1
20120293505	Meadow et al.	Nov 2012	A1
20120307004	Budagavi	Dec 2012	A1
20120314038	Murayama et al.	Dec 2012	A1
20120320050	Koike	Dec 2012	A1
20130010061	Matsubara	Jan 2013	A1
20130022111	Chen et al.	Jan 2013	A1
20130034266	Shamir et al.	Feb 2013	A1
20130042259	Urbach	Feb 2013	A1
20130044108	Tanaka et al.	Feb 2013	A1
20130050451	Shintani	Feb 2013	A1
20130083164	Engelbert et al.	Apr 2013	A1
20130091515	Sakata et al.	Apr 2013	A1
20130094696	Zhang	Apr 2013	A1
20130100132	Katayama et al.	Apr 2013	A1
20130127980	Haddick et al.	May 2013	A1
20130129190	Cohen et al.	May 2013	A1
20130135447	Kim	May 2013	A1
20130155206	Lazarski et al.	Jun 2013	A1
20130155477	Yankov et al.	Jun 2013	A1
20130169527	Pryor	Jul 2013	A1
20130173235	Freezer	Jul 2013	A1
20130176393	Onishi et al.	Jul 2013	A1
20130179034	Pryor	Jul 2013	A1
20130182108	Meadow et al.	Jul 2013	A1
20130182130	Tran	Jul 2013	A1
20130187949	Meadow et al.	Jul 2013	A1
20130188008	Meadow et al.	Jul 2013	A1
20130188052	Meadow et al.	Jul 2013	A1
20130191252	Meadow et al.	Jul 2013	A1
20130191292	Meadow et al.	Jul 2013	A1
20130191295	Meadow et al.	Jul 2013	A1
20130191359	Meadow et al.	Jul 2013	A1
20130191725	Meadow et al.	Jul 2013	A1
20130194397	Kim et al.	Aug 2013	A1
20130194502	Kim et al.	Aug 2013	A1
20130201340	Meadow et al.	Aug 2013	A1
20130201341	Meadow et al.	Aug 2013	A1
20130212052	Yu	Aug 2013	A1
20130215115	Jenkins	Aug 2013	A1
20130215241	Onishi et al.	Aug 2013	A1
20130229488	Ishimaru et al.	Sep 2013	A1
20130235154	Salton-Morgenstern et al.	Sep 2013	A1
20130249791	Pryor	Sep 2013	A1
20130266292	Sandrew	Oct 2013	A1
20130273968	Rhoads et al.	Oct 2013	A1
20130278631	Border et al.	Oct 2013	A1
20130278727	Tamir et al.	Oct 2013	A1
20130296052	Smith	Nov 2013	A1
20130302013	Takeshita	Nov 2013	A1
20130308800	Bacon	Nov 2013	A1
20130314303	Osterhout et al.	Nov 2013	A1
20130321577	Kobayashi	Dec 2013	A1
20130328858	Meadow et al.	Dec 2013	A9
20130342513	Kim et al.	Dec 2013	A1
20140002441	Hung et al.	Jan 2014	A1
20140019301	Meadow et al.	Jan 2014	A1
20140019302	Meadow et al.	Jan 2014	A1
20140026164	Schraga	Jan 2014	A1
20140029837	Venkatraman et al.	Jan 2014	A1
20140035934	Du et al.	Feb 2014	A1
20140062999	Syu et al.	Mar 2014	A1
20140063054	Osterhout et al.	Mar 2014	A1
20140063055	Osterhout et al.	Mar 2014	A1
20140064567	Kim et al.	Mar 2014	A1
20140085501	Tran	Mar 2014	A1
20140089241	Hoffberg et al.	Mar 2014	A1
20140089811	Meadow et al.	Mar 2014	A1
20140114630	Brave	Apr 2014	A1
20140173452	Hoffberg et al.	Jun 2014	A1
20140184738	Tomioka et al.	Jul 2014	A1
20140192147	Mack et al.	Jul 2014	A1
20140201126	Zadeh et al.	Jul 2014	A1
20140201770	Schraga	Jul 2014	A1
20140211015	Meadow et al.	Jul 2014	A1
20140218358	Mack et al.	Aug 2014	A1
20140226722	Izvorski et al.	Aug 2014	A1
20140240293	McCaughan et al.	Aug 2014	A1
20140267280	Zurek et al.	Sep 2014	A1
20140300758	Tran	Oct 2014	A1
20140347362	Maleki et al.	Nov 2014	A1
20140354548	Lee	Dec 2014	A1
20150002508	Tatarinov et al.	Jan 2015	A1
20150002682	Tran	Jan 2015	A1
20150016748	Ko et al.	Jan 2015	A1
20150057982	Erdman et al.	Feb 2015	A1
20150084979	Bidarkar et al.	Mar 2015	A1
20150103081	Bae et al.	Apr 2015	A1
20150138322	Kawamura	May 2015	A1
20150161818	Komenczi et al.	Jun 2015	A1
20150181195	Bruls et al.	Jun 2015	A1
20150229974	Schraga	Aug 2015	A1
20150264258	Bervoets et al.	Sep 2015	A1
20150269737	Lam et al.	Sep 2015	A1
20150269770	Jenkins	Sep 2015	A1
20150289015	Jung	Oct 2015	A1
20150290544	Smith	Oct 2015	A1
20150293525	Yamamoto et al.	Oct 2015	A1
20150297949	Aman et al.	Oct 2015	A1
20150309316	Osterhout et al.	Oct 2015	A1
20150317822	Haimovitch-Yogev et al.	Nov 2015	A1
20150319424	Haimovitch-Yogev et al.	Nov 2015	A1
20150324840	Ramnath Krishnan	Nov 2015	A1
20150338524	Ben Moshe et al.	Nov 2015	A1
20150356789	Komatsu et al.	Dec 2015	A1
20160005214	Jenkins	Jan 2016	A1
20160012855	Krishnan	Jan 2016	A1
20160042552	Mcnabb	Feb 2016	A1
20160093065	Vejarano et al.	Mar 2016	A1
20160109940	Lyren et al.	Apr 2016	A1
20160150223	Hwang et al.	May 2016	A1
20160176686	Schoonmaker	Jun 2016	A1
20160180193	Masters et al.	Jun 2016	A1
20160180441	Hasan et al.	Jun 2016	A1
20160182894	Haimovitch-Yogev et al.	Jun 2016	A1
20160187654	Border et al.	Jun 2016	A1
20160189421	Haimovitch-Yogev et al.	Jun 2016	A1
20160209648	Haddick et al.	Jul 2016	A1
20160217327	Osterhout et al.	Jul 2016	A1
20160241828	Richards et al.	Aug 2016	A1
20160261793	Sivan	Sep 2016	A1
20160277397	Watanabe	Sep 2016	A1
20160284121	Azuma	Sep 2016	A1
20160286196	Luthra et al.	Sep 2016	A1
20160328604	Bulzacki	Nov 2016	A1
20160330457	Ye et al.	Nov 2016	A1
20160345034	Schraga	Nov 2016	A1
20170094165	Meadow et al.	Mar 2017	A1
20170136360	Smith	May 2017	A1
20170168566	Osterhout et al.	Jun 2017	A1
20170237897	Sivan	Aug 2017	A1
20170262045	Rouvinez et al.	Sep 2017	A1
20170344114	Osterhout et al.	Nov 2017	A1
20180017501	Trenholm et al.	Jan 2018	A1
20180040119	Trenholm et al.	Feb 2018	A1
20180048820	Hinkel et al.	Feb 2018	A1
20180048858	Tran	Feb 2018	A1
20180077380	Tran	Mar 2018	A1
20180089091	Akenine-Moller et al.	Mar 2018	A1
20180179029	Schoonmaker	Jun 2018	A1
20180204078	Seng et al.	Jul 2018	A1
20180204111	Zadeh et al.	Jul 2018	A1
20180210442	Guo et al.	Jul 2018	A1
20180210997	Ashra et al.	Jul 2018	A1

Non-Patent Literature Citations (131)

Entry
John S. Boreczky et al, A hidden Markov model framework for video segmentation using audio and image features, 1998, Palo Alto, CA.
Gerasimos Potamianos et al, An Image Transform Approach for HMM Based Automatic Lipreading, 1998, vol. III, pp. 173-177, Chicago.
Gerasimos Potamianos et al, Recent Advances in the Automatic Recognition of Audio-Visual Speech, Proceedings of the IEEE, vol. 91, No. 9, Sep. 2003.
Michael Isard et al, Condensation—Conditional Density Propagation for Visual Tracking, Int. J. Computer Vision, 1998.
Carlo Tomasi, Shape and Motion from Image Streams under Orthography: a Factorization Method, International Journal of Computer Vision, 9:2, 137-154, 1992.
Brian Babcock et al, Models and Issues in Data Stream Systems, Department of Computer Science, Stanford, CA, 2002.
Simon Lucey et al, Integration Strategies for Audio-Visual Speech Processing: Applied to Text Dependent Speaker Recognition, Pittsburgh, PA, 2005.
Yu-Fei Ma et al, A User Attention Model for Video Summarization, Beijing, China 2002.
Brian Clarkson et al, Unsupervised Clustering of Ambulatory Audio and Video, Cambridge, MA, 1999.
Christoph Bregler et al, Video rewrite: Driving visual speech with audio, 1997.
Thomas Sikora, The MPEG-4 video standard verification model, IEEE Transactions on Circuits and Systems for Video Technology, vol. 7, No. 1, Feb. 1997.
Zhu Liu et al, Classification of TV programs based on audio information using hidden Markov model, Brooklyn, NY 1998.
George W. Uetz et al, Multisensory cues and multimodal communication in spiders: insights from video/audio playback studies, Brain Behav Evol, 59:222-230, 2002.
Robert W. Frischholz et al, BioID: A multimodal biometric identification system, Feb. 2000.
Samy Bengio, An asynchronous hidden Markov model for audio-visual speech recognition, Switzerland, 2003.
Nael Hirzalla et al, A temporal model for interactive multimedia scenarios, IEEE Multimedia, vol. 2, No. 3, pp. 24-31, Fall 1995.
Matthew J. Beal et al, Audio-video sensor fusion with probabilistic graphical models, Redmond, WA, 2002.
Tanzeem Choudhury et al, Multimodal person recognition using unconstrained audio and video, Cambridge, MA, 1999.
Yu-Fei Ma et al, A generic framework of user attention model and its application in video summarization, Beijing, China, 2005.
Min Xu et al, Creating audio keywords for event detection in soccer video, Singapore, 2003.
Ricardo Machado Leite De Barros et al, A Method to Synchronise Video Cameras Using the Audio Band, Brazil, Dec. 2004.
Sumedh Mungee et al, The design and performance of a CORBA audio/video streaming service, St. Louis, MO 1999.
P. Venkat Rangan et al, Designing file systems for digital video and audio, San Diego, CA 1991.
Malcolm Slaney et al., Facesync: A linear operator for measuring synchronization of video facial images and audio tracks, San Jose, CA 2001.
Isao Hara et al, Robust speech interface based on audio and video information fusion for humanoid HRP-2, IEEE Int. Conf. on Intelligent Robots and Systems, Sendai, Japan, Sep. 28-Oct. 2, 2004.
Robin Mason, Models of Online Courses, ALN Magazine, United Kingdom, 1998.
Isao Otsuka et al, A highlight scene detection and video summarization system using audio feature for a personal video recorder, Cambridge, MA, 2005.
W. H. Adams et al, Semantic indexing of multimedia content using visual, audio, and text cues, EURASIP Journal on Applied Signal Processing, pp. 170-185, 2003.
Ara V. Nefian et al, A coupled HMM for audio-visual speech recognition, Santa Clara, CA 2002.
Juergen Luettin, Asynchronous stream modeling for large vocabulary audio-visual speech recognition, Switzerland, 2001.
Klaus Havelund et al, Formal modeling and analysis of an audio/video protocol: an industrial case study using UPPAAL, Nov. 1997.
Victor Kulesh et al, Video clip recognition using joint audio-visual processing model, Rochester, MI 2002.
Valerie Gay et al, Specification of multiparty audio and video interaction based on the Reference Model of Open Distributed Processing, Computer Networks and ISDN Systems 27, pp. 1247-1262, 1995.
Shi-Fu Chang et al, Combining text and audio-visual features in video indexing, Massachusetts, 2005.
J. Heere et al, The reference model architecture for MPEG spatial audio coding, Barcelona, Spain, 2005.
Alexander G. Hauptmann et al, Text, Speech, and Vision for Video Segmentation: The Informedia TM Project, Pittsburgh, PA 1995.
Ara V. Nefian et al, Dynamic Bayesian networks for audio-visual speech recognition, EURASIP Journal on Applied Signal Processing 2002:11, pp. 1274-1288, 2002.
Rui Cai et al, Highlight sound effects detection in audio stream, Beijing, China 2003.
Xiaoxing Liu et al, Audio-visual continuous speech recognition using a coupled hidden Markov model, 2002.
Peter K. Doenges et al, MPEG-4: Audio/video and synthetic graphics/audio for mixed media, Salt Lake City, Utah 1997.
Dov Te'Eni, Review: A cognitive-affective model of organizational communication for designing IT, MIS Quarterly vol. 25, No. 2, pp. 251-312, Jun. 2001.
D. A. Sadlier et al, Event detection in field sports video using audio-visual features and a support vector machine, IEEE Transactions on Circuits and Systems for Video Technology, vol. 15, No. 10, Oct. 2005.
Zhihai He, A linear source model and a unified rate control algorithm for DCT video coding, IEEE Transactions on Circuits and Systems for Video Technology, vol. 12, No. 11, Nov. 2002.
Iain Matthews et al, A comparison of model and transform-based visual features for audio-visual LVCSR, Pittsburgh, PA 2001.
W. Zajdel et al, CASSANDRA: audio-video sensor fusion for aggression detection, 2007.
B. Ugur Toreyin et al, HMM based falling person detection using both audio and video, Ankara, Turkey 2005.
M. R. Naphade et al, Probabilistic multimedia objects (multijects): A novel approach to video indexing and retrieval in multimedia systems, Chicago, Illinois 1998.
Hao Jiang et al, Video segmentation with the support of audio segmentation and classification, Beijing, China 2000.
Yao Wang et al, Multimedia content analysis—using both audio and visual clues, IEEE Signal Processing Magazine, Nov. 2000.
Chen-Nee Chuah et al, Characterizing packet audio streams from internet multimedia applications, Davis, CA 2002.
Hao Jiang et al, Video segmentation with the assistance of audio content analysis, Beijing, China 2000.
Timothy J. Hazen, Visual model structures and synchrony constraints for audio-visual speech recognition, 2006.
Michael A. Smith et al, Video skimming for quick browsing based on audio and image characterization, Pittsburgh, PA, Jul. 30, 1995.
Dongge Li et al, Classification of general audio data for content-based retrieval, Detroit, MI 2001.
M. Baillie et al, Audio-based event detection for sports video, 2003.
Kunio Kashino, A quick search method for audio and video signals based on histogram pruning, IEEE Transactions on Multimedia, vol. 5, No. 3, Sep. 2003.
Liwei He et al, Auto-summarization of audio-video presentations, Redmond, WA 1999.
Ramesh Jain et al, Metadata in video databases, La Jolla, CA 1994.
John W. Fisher et al, Learning joint statistical models for audio-visual fusion and segregation, Cambridge, MA 2001.
Stefan Eickeler et al, Content-based video indexing of TV broadcast news using hidden Markov models, Duisburg, Germany 1999.
M. J. Tomlinson et al, Integrating audio and visual information to provide highly robust speech recognition, United Kingdom 1996.
Arun Hampapur et al, Production model based digital video segmentation, Multimedia Tools and Applications, 1, 9-46, 1995.
Maia Garau et al, The impact of eye gaze on communication using humanoid avatars, United Kingdom, 2001.
Nobuhiko Kitawaki et al, Multimedia opinion model based on media interaction of audio-visual communications, Japan, 2005.
Glorianna Davenport et al, Cinematic primitives for multimedia, 1991.
Martin Cooke et al, An audio-visual corpus for speech perception and automatic speech recognition, Acoustical Society of America, 2006.
Herve Glotin et al, Weighting schemes for audio-visual fusion in speech recognition, Martigny, Switzerland 201.
T.D.C. Little et al, A digital on-demand video service supporting content-based queries, Boston, MA 1993.
Andrew Singer et al, Tangible progress: less is more in Somewire audio spaces, Palo Alto, CA 1999.
Howard Wactlar et al, Complementary video and audio analysis for broadcast news archives, 2000.
Wei Qi et al, Integrating visual, audio and text analysis for news video, Beijing, China 2000.
Lie Lu et al, A robust audio classification and segmentation method, China 2001.
Mark Barnard et al, Multi-modal audio-visual event recognition for football analysis, Switzerland 2003.
Aljoscha Smolic, 3D video and free viewpoint video: technologies, applications and MPEG standards, 2006.
Guillaume Gravier et al, Maximum entropy and MCE based HMM stream weight estimation for audio-visual ASR, Yorktown Heights, NY 2002.
Jonathan Foote et al, Finding presentations in recorded meetings using audio and video features, Palo Alto, CA, 1999.
Sofia Tsekeridou et al, Content-based video parsing and indexing based on audio-visual interaction, Thessaloniki, Greece, 2001.
Zhang et al, Hierarchical classification of audio data for archiving and retrieving, Los Angeles, CA 1999.
John N. Gowdy et al, DBN based multi-stream models for audio-visual speech recognition, Clemson, SC 2004.
Ross Cutler et al, Look who's talking: Speaker detection using video and audio correlation, Maryland, 2000.
Chalapathy V. Neti et al, Audio-visual speaker recognition for video broadcast news, Yorktown Heights, NY 1999.
G. Iyengar et al, Audio-visual synchrony for detection of monologues in video archives, Yorktown Heights, NY 2003.
Trevor Darrell et al, Audio-visual Segmentation and “The Cocktail Party Effect”, Cambridge, MA, 2000.
Girija Chetty et al, Liveness verification in audio-video speaker authentication, Sydney, Australia, 2004.
Hayley Hung et al, Using audio and video features to classify the most dominant person in a group meeting, Germany, 2007.
E. Kijak et al, HMM based structuring of tennis videos using visual and audio cues, France, 2003.
Marco Cristani et al, Audio-visual event recognition in surveillance video sequences, Italy, 2006.
Stephane Dupont et al, Audio-Visual speech modeling for continuous speech recognition, IEEE Transactions on Multimedia, vol. 2, No. 3, Sep. 2000.
Benoit Maison et al, Audio-visual speaker recognition for video broadcast news: some fusion techniques, Yorktown Heights, NY 1999.
Shih-Fu Chang et al, Overview of the MPEG-7 standard, IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, No. 6, Jun. 2001.
Luhong Liang et al, Speaker independent audio-visual continuous speech recognition, Santa Clara, CA 2002.
Min Xu et al, Affective content analysis in comedy and horror videos by audio emotional event detection, Singapore, 2005.
Ming-Chieh Lee et al, A layered video object coding system using sprite and affine motion model, IEEE Transactions on Circuits and Systems for Video Technology, vol. 7, No. 1, Feb. 1997.
Ajay Divakaran et al, Video summarization using mpeg-7 motion activity and audio descriptors, Cambridge, MA, May 2003.
Mingli Song et al, Audio-visual based emotion recognition—a new approach, China, 2004.
Hyoung-Gook Kim et al, Audio classification based on MPEG-7 spectral basis representations, IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, No. 5, May 2004.
Tianming Liu et al, A novel video key-frame-extraction algorithm based on perceived motion energy model, IEEE Trans. on Circuits and Systems for Video Technology, vol. 13, No. 10, Oct. 2003.
Ling-Yu Duan et al, A mid-level representation framework for semantic sports video analysis, Singapore, 2003.
Shih-Fu Chang, The holy grail of content-based media analysis, 2002.
S. Shyam Sundar, Multimedia effects on processing and perception of online news: A study of picture, audio, and video downloads, Journalism and Mass Comm. Quarterly, Autumn 2007.
G. Iyengar et al, Semantic indexing of multimedia using audio, text and visual cues, Yorktown Heights, NY 2002.
M. Baillie et al, An audio-based sports video segmentation and event detection algorithm, United Kingdom, 2004.
Michael G. Christel et al, Evolving video skims into useful multimedia abstractions, 1998.
Jonathan Foote, An overview of audio information retrieval, Multimedia Systems 7:2-10, 1999.
Erling Wold et al, Content-based classification, search, and retrieval of audio, 1996.
Munish Gandhi et al, A data model for audio-video data, Oct. 1994.
Regunathan Radhakrishnan et al, Generation of sports highlights using a combination of supervised & unsupervised learning in audio domain, Japan, 2003.
Hari Sundaram et al, A utility framework for the automatic generation of audio-visual skims, New York, NY 2002.
Tong Zhang et al, Heuristic approach for generic audio data segmentation and annotation, Los Angeles, CA, 1999.
Hari Sundaram et al, Determining computable scenes in films and their structures using audio-visual memory models, New York, NY 2000.
Min Xu et al, Audio keyword generation for sports video analysis, Singapore, 2004.
U. Iurgel et al, New approaches to audio-visual segmentation of TV news for automatic topic retrieval, Duisburg, Germany, 2001.
Mingli Song et al, Audio-Visual based emotion recognition using tripled hidden Markov model, China, 2004.
Van-Thinh Vu et al, Audio-video event recognition system for public transport security, France, 2006.
A. Ghias et al, Query by humming: musical information retrieval in an audio database, Feb. 9, 2005.
Juergen Luettin et al, Continuous audio-visual speech recognition, Belgium, 1998.
Nevenka Dimitrova et al, Motion recovery for video content classification, Arizona, 1995.
Nuria Oliver et al, Layered representations for human activity recognition, Redmond, WA 2002.
Michael A. Smith et al, Video skimming and characterization through the combination of image and language understanding, Pittsburgh, PA 1998.
Matthew Roach et al, Classification of video genre using audio, United Kingdom, 2001.
Hideki Asoh et al, An application of a particle filter to Bayesian multiple sound source tracking with audio and video information fusion, Japan, 2004.
C. Neti et al, Perceptual interfaces for information interaction: Joint processing of audio and visual information for human-computer interaction, Yorktown Heights, NY 2000.
Chalapathy Neti et al, Large-vocabulary audio-visual speech recognition: A summary of the Johns Hopkins Summer 2000 Workshop, Pittsburgh, PA 2001.
Shih-Fu Chang et al, VideoQ: an automated content based video search system using visual cues, Seattle, WA, 1997.
Shin Ichi Satoh et al, Name-It: Naming and Detecting Faces in Video the Integration of Image and Natural Language Processing, Pittsburgh, PA 1997.
Silvia Pfeiffer et al, Automatic audio content analysis, Mannheim, Germany 1997.
Tong Zhang et al, Hierarchical system for content-based audio classification and retrieval, Los Angeles, CA 1998.
Jane Hunter, Adding multimedia to the semantic web: Building an mpeg-7 ontology, Australia, 2011.
Yong Rui et al, Relevance feedback techniques in interactive content-based image retrieval, Urbana, IL 1998.
Jia-Yu Pan et al, Automatic multimedia cross-modal correlation discovery, 2004.
Ziyou Xiong et al, Generation of sports highlights using motion activity in combination with a common audio feature extraction framework, Cambridge, MA Sep. 2003.

Provisional Applications (1)

	Number	Date	Country
	61358244	Jun 2010	US

Continuations (3)

	Number	Date	Country
Parent	16025347	Jul 2018	US
Child	16927929		US
Parent	14733224	Jun 2015	US
Child	16025347		US
Parent	13161866	Jun 2011	US
Child	14733224		US
