Embodiments of the present invention generally relate to the field of augmented reality. More specifically, embodiments of the present invention relate to systems and methods for determining orientation and position for augmented reality content.
There is a growing need, in the field of Augmented Reality, to track the location and orientation of a device with a high degree of precision. GPS systems typically used in small-scale systems tend to offer only a limited degree of precision and are not generally usable for real-time Augmented Reality applications. While smoothing the raw output of GPS systems using specialized software may improve these systems in some situations, the results are still not accurate enough to support many Augmented Reality applications, particularly in real time.
Augmented Reality applications typically supplement live video with computer-generated sensory input such as sound, video, graphics or GPS data. It is necessary to keep track of both the position and orientation of a device during an Augmented Reality session to accurately represent the position of known objects and locations within the Augmented Reality application.
Unfortunately, modern GPS systems offer only a limited degree of accuracy when implemented in small-scale systems. For example, a user may travel several feet before the movement is recognized by the GPS system and the content of the Augmented Reality application is updated to reflect the new position. In some scenarios, the GPS system may depict the user rapidly jumping between two or more positions when the user is actually stationary. Furthermore, some sensors common in conventional mobile devices (e.g., magnetometers) are susceptible to drift when tracking a device's orientation, rendering them unreliable unless the drift is detected and compensated for.
The limited accuracy of these GPS systems and sensors makes them difficult to use effectively in Augmented Reality applications, where a low level of precision is detrimental to the overall user experience. Thus, what is needed is a device capable of determining and tracking absolute position and orientation of a small-scale device with a high degree of accuracy and precision.
A method of determining the absolute position and orientation of a mobile computing device is disclosed herein. The method includes capturing a live video feed on the mobile computing device. A first object, a second object, and a third object are detected in one or more frames of the live video feed, where the first object is associated with a first set of location coordinates, the second object is associated with a second set of location coordinates, and the third object is associated with a third set of location coordinates and is non-collinear with respect to the first and second objects. The absolute position and orientation of the mobile computing device are determined based on the sets of location coordinates associated with the first, second, and third objects.
More specifically, a computer usable medium is disclosed having computer-readable program code embodied therein for causing a mobile computer system to execute a method of determining the absolute position and orientation of the mobile computing device. The method captures a live video feed on the mobile computing device. First, second, and third objects are detected in one or more frames of the live video feed, where the first object is associated with a first set of location coordinates, the second object is associated with a second set of location coordinates, and the third object is associated with a third set of location coordinates and is non-collinear with respect to the first and second objects. The absolute position and orientation of the mobile computing device are automatically determined based on the sets of location coordinates associated with the first, second, and third objects.
A mobile computing device is also disclosed. The device includes a display screen, a general purpose processor, a system memory, and a camera configured to capture a live video feed and store the video feed in the system memory, e.g., using a bus. The general purpose processor is configured to analyze the live video feed to locate first, second, and third objects in one or more frames of the live video feed. The first object is associated with a first set of location coordinates, the second object is associated with a second set of location coordinates, and the third object is associated with a third set of location coordinates and is non-collinear with respect to the first and second objects. The general purpose processor is further configured to compute an absolute position and orientation of the mobile computing device based on the sets of location coordinates associated with the first, second, and third objects.
The accompanying drawings, which are incorporated in and form a part of this specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention:
Reference will now be made in detail to several embodiments. While the subject matter will be described in conjunction with the alternative embodiments, it will be understood that they are not intended to limit the claimed subject matter to these embodiments. On the contrary, the claimed subject matter is intended to cover alternatives, modifications, and equivalents, which may be included within the spirit and scope of the claimed subject matter as defined by the appended claims.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the claimed subject matter. However, it will be recognized by one skilled in the art that embodiments may be practiced without these specific details or with equivalents thereof. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects and features of the subject matter.
Portions of the detailed description that follows are presented and discussed in terms of a method. Although steps and sequencing thereof are disclosed in a figure herein (e.g.,
Some portions of the detailed description are presented in terms of procedures, steps, logic blocks, processing, and other symbolic representations of operations on data bits that can be performed on computer memory. These descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. A procedure, computer-executed step, logic block, process, etc., is here, and generally, conceived to be a self-consistent sequence of steps or instructions leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated in a computer system. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussions, it is appreciated that throughout, discussions utilizing terms such as “accessing,” “writing,” “including,” “storing,” “transmitting,” “traversing,” “associating,” “identifying” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
Exemplary Mobile Computing Device with Touch Screen
Embodiments of the present invention are drawn to mobile computing devices having at least one camera system and a touch sensitive screen or panel. The following discussion describes one such exemplary mobile computing device.
In the example of
A communication or network interface 108 allows the mobile computing device 112 to communicate with other computer systems, networks, or devices via an electronic communications network, including wired and/or wireless communication and including an Intranet or the Internet. The touch sensitive display device 110 may be any device capable of displaying visual information in response to a signal from the mobile computing device 112 and may include a flat panel touch sensitive display. The components of the mobile computing device 112, including the CPU 101, memory 102/103, data storage 104, user input devices 106, and the touch sensitive display device 110, may be coupled via one or more data buses 100.
In the embodiment of
Some embodiments may be described in the general context of computer-executable instructions, such as program modules, executed by one or more computers or other devices. Generally, program modules include routines, programs, objects, components, data structures, etc., that perform particular tasks or implement particular abstract data types. Typically, the functionality of the program modules may be combined or distributed as desired in various embodiments.
The framework implemented herein detects known objects within the frames of a video feed. The video feed is received in real time from a camera connected to a mobile computing device, such as a smartphone or tablet computer, and may be stored in memory. Location coordinates (e.g., latitude and longitude or GPS-based coordinates) are associated with one or more known objects detected in the video feed. Based on the coordinates of the known objects, the user's absolute position and orientation are triangulated with a high degree of precision.
Object detection is performed using cascaded classifiers. A cascaded classifier describes an object as a visual set of items. According to some embodiments, the cascaded classifiers are based on Haar wavelet features of an image (e.g., a video frame). The output of the classifier may be noisy and produce a number of false-positive objects. Therefore, a set of heuristic procedures is performed to clean up the noise in the object classifier's output. The heuristic procedures attempt to distinguish between objects that are accurately detected and false positives from the object classifier. According to some embodiments, the image is converted to grayscale before object detection is performed.
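By way of a non-limiting illustration, this detection step might be implemented with an off-the-shelf Haar cascade classifier such as the one provided by OpenCV (an assumption; the embodiments are not tied to any particular library). The classifier file name and tuning parameters below are hypothetical:

```python
import cv2

# Hypothetical cascade file for one known object; per the description,
# each known object has its own trained classifier.
classifier = cv2.CascadeClassifier("landmark_a_cascade.xml")

def detect_candidates(frame):
    """Return candidate bounding boxes (x, y, w, h) for one known object."""
    # The frame is converted to grayscale before detection, as described.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # The raw detector output is noisy and may include false positives;
    # the heuristic procedures described below filter these candidates.
    return classifier.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=3)
```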
Embodiments of the present invention select true objects from a potentially large group of candidates detected during object detection. Selecting true objects in the scene may be performed using three main steps.
In the first step, bounding boxes are placed around candidate objects for each frame and are grouped such that no two candidate objects are close together. In the second step, small areas around each of the candidate boxes are marked, and the frames before and after the current frame (e.g., forward and backward in time) are searched within the marked boxes. The standard deviation of pixel values is computed over the pixels in each candidate bounding box; it is considered more likely that the bounding box contains an object if the standard deviation of pixel values is high. Candidate objects can be rejected quickly based on the size and/or dimensions of the detected object compared to the size and/or dimensions of known objects.
A final score is calculated for each candidate based on a weighted sum of the number of times the object appears in the frames just before and after the current frame and the standard deviation of the pixels from frame to frame. Typically, if a candidate object appears in multiple frames, it is more likely to represent a known object. If the final score calculated for an object is below a certain threshold, that object can be disregarded. This method has been observed to successfully detect objects in over 90% of the frames.
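As a minimal sketch of how such a score might be computed (the weights and threshold below are hypothetical; the description does not specify values):

```python
import numpy as np

# Hypothetical weights and threshold; actual values would be tuned.
W_APPEARANCE, W_STDDEV, SCORE_THRESHOLD = 1.0, 0.05, 3.0

def box_stddev(gray_frame, box):
    """Standard deviation of pixel values inside a candidate bounding box;
    high variation suggests a real object rather than a flat false positive."""
    x, y, w, h = box
    return float(np.std(gray_frame[y:y + h, x:x + w]))

def candidate_score(box, neighbor_frames, appearances):
    """Weighted sum of (a) the number of neighboring frames (just before and
    after the current frame) in which the candidate appears and (b) the pixel
    standard deviation inside the box across those frames."""
    mean_std = np.mean([box_stddev(f, box) for f in neighbor_frames])
    return W_APPEARANCE * appearances + W_STDDEV * mean_std

def keep_candidate(box, neighbor_frames, appearances):
    # Candidates scoring below the threshold are disregarded.
    return candidate_score(box, neighbor_frames, appearances) >= SCORE_THRESHOLD
```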
Prior to object detection, the system may be trained to detect objects using a large set of positive examples. The initial training process helps mitigate long detection times when the frames are processed for known objects. Object training uses a utility to mark, by hand, the locations of the objects in a large set of positive examples. A large database of random images that do not represent the object may be provided to serve as negative examples. The training utility then automatically generates a data file (e.g., an XML file) that can be provided as a classifier to the framework. According to some embodiments, each object will have its own classifier because each object is assumed to be unique within the scene.
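At runtime, the generated data files might be loaded one per known object, for example as follows (file names are hypothetical, again assuming an OpenCV-style cascade XML as the data file):

```python
import cv2

# One classifier per known object, since each object is assumed to be
# unique within the scene; file names are illustrative only.
classifiers = {
    "fountain": cv2.CascadeClassifier("fountain_cascade.xml"),
    "statue":   cv2.CascadeClassifier("statue_cascade.xml"),
    "doorway":  cv2.CascadeClassifier("doorway_cascade.xml"),
}
```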
Auto-localization is performed based on the location of the known objects relative to the user. At least three non-collinear objects are detected in order to successfully triangulate the position of the user, and the three objects need not be detected simultaneously. For example, according to one embodiment, object locations with corresponding camera orientations and location coordinates may be cached and matched according to timestamps. Once the cache contains three valid corresponding objects, the location computation is automatically triggered and reported to the user application based on the cached data. It is therefore unnecessary to perform a manual location registration or to continuously poll for results.
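A minimal sketch of such a cache follows; the class name, staleness window, and callback protocol are illustrative assumptions rather than part of the disclosed embodiments:

```python
import time

def collinear(p1, p2, p3, eps=1e-9):
    # Cross-product test for collinearity of three planar points.
    return abs((p2[0] - p1[0]) * (p3[1] - p1[1])
               - (p2[1] - p1[1]) * (p3[0] - p1[0])) < eps

class LocalizationCache:
    """Caches detected objects with their location coordinates and camera
    bearings, keyed by timestamp. Once three valid, non-collinear objects
    are cached, the location computation fires automatically: no manual
    registration and no polling by the user application."""

    def __init__(self, on_localize, max_age_s=5.0):
        self.entries = {}  # object_id -> (timestamp, coords, bearing)
        self.on_localize = on_localize
        self.max_age_s = max_age_s

    def add(self, object_id, coords, bearing):
        now = time.monotonic()
        self.entries[object_id] = (now, coords, bearing)
        # Drop stale entries so only detections close in time are matched.
        self.entries = {k: v for k, v in self.entries.items()
                        if now - v[0] <= self.max_age_s}
        fresh = list(self.entries.values())[:3]
        if len(fresh) == 3 and not collinear(*(e[1] for e in fresh)):
            self.on_localize(fresh)  # report the computed fix to the application
```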
With regard to
With regard to
With regard to
With regard to
With regard to
With regard to
With regard to
Given three points of a triangle with known absolute latitudes and longitudes, it is possible, using trigonometry, to determine the absolute position and orientation of a fourth point located outside of the triangle if the angles to those points, as measured from the fourth point, are known. Therefore, using these techniques, the locations of three known objects and the angles between the objects may be used to derive an absolute position and orientation of the user device. These techniques may offer greater accuracy than the GPS position data in consumer devices, which typically provides accuracy of only 6-8 meters. With respect to the exemplary calculation illustrated in
With respect to the exemplary object locations of
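As a non-limiting numerical illustration of this computation (the classical three-point resection problem), the sketch below recovers a device's position and heading from bearings to three known objects. It assumes NumPy/SciPy and a local planar approximation of latitude and longitude; all coordinates are invented:

```python
import numpy as np
from scipy.optimize import least_squares

# Three known objects in a local planar frame (meters east/north of a
# reference point); positions here are invented for illustration.
objects = np.array([[0.0, 0.0], [50.0, 5.0], [20.0, 40.0]])

# Simulate measurements: bearings to each object as seen from a device
# whose position and heading the solver will then recover.
true_pos, true_heading = np.array([25.0, 15.0]), np.radians(30.0)
measured = np.arctan2(objects[:, 1] - true_pos[1],
                      objects[:, 0] - true_pos[0]) - true_heading

def residuals(params):
    x, y, heading = params
    pred = np.arctan2(objects[:, 1] - y, objects[:, 0] - x) - heading
    # Wrap angular differences into (-pi, pi] so the solver behaves.
    return np.arctan2(np.sin(pred - measured), np.cos(pred - measured))

# Three bearings, three unknowns (x, y, heading): an exact resection.
sol = least_squares(residuals, x0=[10.0, 10.0, 0.0])
x, y, heading = sol.x
print(f"position ~ ({x:.2f}, {y:.2f}) m, heading ~ {np.degrees(heading):.1f} deg")
```

With noisy detections, the same formulation generalizes to more than three objects as an over-determined least-squares fit.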
Embodiments of the present invention are thus described. While the present invention has been described in particular embodiments, it should be appreciated that the present invention should not be construed as limited by such embodiments, but rather construed according to the following claims.
This patent application is related to, and incorporates by reference herein in its entirety, the following patent application, which is co-owned and concurrently filed herewith: (1) U.S. patent application Ser. No. ______, entitled "Automatic Absolute Orientation and Position Calibration," by Abbott et al., Attorney Docket No. NVID-PDU-130525US01.