In augmented reality (AR) applications, a real world object is imaged and displayed on a screen along with computer generated information, such as an image or textual information. AR can be used to provide information, either graphical or textual, about a real world object, such as a building or product.
In vision based Augmented Reality (AR) systems, the position of a camera relative to an object in the real world (called a target) is tracked, and a processing unit overlays content on top of an image of the object displayed on a screen. Tangible interaction can be used to allow a user to manipulate the object in the real world, with the result of the manipulation changing the overlaid content on the screen, and in this way the user can interact with the mixed reality world.
During such an interaction, the user partially occludes parts of the scene in the real world from the camera, and also occludes the target used by the camera for tracking. The occlusion of the target as seen by a camera may be detected for use in so-called virtual buttons. Whenever a region of the object that is displayed as a virtual button on the screen happens to be covered by a user's finger, detection of the occlusion triggers an event in the processing unit. While virtual buttons are a powerful tool for user input to the processing unit, the ability of a user to specify a value within a given range is limited and non-intuitive. Thus, what is needed is an improved way to identify a location of an occlusion on a target, as described below.
A mobile platform captures a scene that includes a real world object, wherein the real world object has a non-uniform pattern in a predetermined region. The mobile platform determines an area in an image of the real world object in the scene corresponding to the predetermined region. The mobile platform compares intensity differences between pairs of pixels in the area with known intensity differences between pairs of pixels in the non-uniform pattern, to identify any portion of the area that differs from a corresponding portion of the predetermined region. The mobile platform then stores in its memory a value indicative of a location of any such portion relative to the area. The stored value may be used in any application running in the mobile platform.
In accordance with the described embodiments, a real world object 101 shown in
A processor 114 is programmed with software to identify the location of a region of pattern 102, formed on the above-described real world object 101, that is occluded from a camera 100 (i.e. hidden from view of the camera). Specifically, in act 200 (
In some embodiments, pattern 102 is designed to be sufficiently non-uniform in intensity between the two ends 102L and 102R, so as to be able to identify a location of an occlusion therein, up to a resolution of 1/N. For example, pattern 102 may be formed of pixels that have a predetermined maximum intensity at one end 102L, a predetermined minimum intensity at the other end 102R, and intensities that change between the two ends, as shown in
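For illustration only, the following is a minimal Python sketch of one way such a non-uniform pattern could be generated as a horizontal intensity gradient with N discrete bands; the dimensions, intensity range, and function name are assumptions made for this example, not values taken from the embodiments described above.

```python
import numpy as np

def make_gradient_pattern(width=256, height=32, n_bands=16,
                          max_intensity=255, min_intensity=0):
    """Sketch of a pattern whose intensity falls from max_intensity at one
    end (e.g. 102L) to min_intensity at the other end (e.g. 102R) in
    n_bands discrete steps, so an occlusion can be localized to within
    1/n_bands of the pattern's width."""
    levels = np.linspace(max_intensity, min_intensity, n_bands)
    band_index = (np.arange(width) * n_bands) // width  # band of each column
    row = levels[band_index]
    return np.tile(row, (height, 1)).astype(np.uint8)
```

A pattern of this kind makes each band distinguishable from its neighbors by intensity alone, which is what the pairwise comparisons described below rely on.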
Depending on the embodiment, two pixels in each pair (described above) in an area may or may not be adjacent to one another. In many embodiments, the intensities of the two pixels in each pair in an area are different from one another, and the differences are described in a descriptor, e.g. by a bit in a binary string. In some embodiments, a number N of areas in a newly captured image are classified, based on results of pair-wise intensity comparisons of pixels at predetermined orientations to identify a match or no match. Multiple results of comparisons in an area are combined and used in determining whether the area is a part of an occlusion. Such comparisons may be performed by use of binary robust independent elementary features (BRIEF) descriptors, as described below. Other descriptors of pixel intensities or differences in pixel intensities in an area of object 101 imprinted with pattern 102 may be used to detect an occlusion of the area, depending on the embodiment.
Various other parameters that are initialized in act 200 depend on a specific tracking method that is implemented in the software to track real world object 101 across multiple frames of video. For example, if natural feature tracking is used in the software, processor 114 initializes in act 200 the parameters that are normally used to track one or more natural features of the real world object 101. As another example, one or more digital markers (not shown) may be imprinted on object 101 and, if so, one or more parameters normally used to track the digital marker(s) are initialized in act 200. Other such parameter initializations may also be performed in act 200, as will be readily apparent to the skilled artisan in view of the following description.
In accordance with the described embodiments, a camera 100 may be used to image a scene within its field of view 111 (
Next, as per act 203 (
In some embodiments, as per act 204, processor 114 subdivides the area 103 (
Subsequently, in act 205 (
Thereafter, in act 207, processor 114 compares an intensity difference ΔIs between pixels 103A and 103B in image area 103 with a corresponding difference ΔIp between a pair of pixels in the non-uniform pattern that is back projected to the camera plane based on the real world position of object 101. Hence, in some embodiments, processor 114 determines a location of occlusion of a predetermined pattern, based on results of either comparing intensities or comparing intensity differences, because both intensities and intensity differences in areas that are occluded on pattern 102 on real world object 101 do not match corresponding intensities and intensity differences when the areas are not occluded.
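For illustration, a minimal Python sketch of the kind of comparison act 207 describes is shown below. The exact definitions of the ratio R and difference D referenced later are not reproduced in this description, so the formulas used here (D as an absolute difference of the two intensity differences, R as their ratio) are assumptions for the example only.

```python
def compare_pixel_pair(image_area, pattern_area, p1, p2, eps=1e-6):
    """Compare the intensity difference between pixels p1 and p2 in the
    captured image area against the difference between the corresponding
    pixels of the back-projected pattern.  Returns (d, r), hypothetical
    stand-ins for the difference D and ratio R discussed in the text."""
    (y1, x1), (y2, x2) = p1, p2
    delta_s = float(image_area[y1, x1]) - float(image_area[y2, x2])      # ΔIs
    delta_p = float(pattern_area[y1, x1]) - float(pattern_area[y2, x2])  # ΔIp
    d = abs(delta_s - delta_p)
    r = delta_s / (delta_p if abs(delta_p) > eps else eps)
    return d, r
```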
For example, as shown in
Specifically, in some illustrative aspects of the described embodiments of act 207, processor 114 uses descriptors of intensities of pixels in pattern 102, of the type described in an article entitled “BRIEF: Binary Robust Independent Elementary Features” by Michael Calonder, Vincent Lepetit, Christoph Strecha, and Pascal Fua, published as Lecture Notes in Computer Science at the website obtained by replacing “%” with “/” and replacing “+” with “.” in the following string “http:%%cvlab+epfl+ch%~calonder%CalonderLSF10+pdf”. The just-described article is incorporated by reference herein in its entirety. Use of descriptors of differences in intensities of pixels in pattern 102 (such as binary robust independent elementary features descriptors or “BRIEF” descriptors) enables comparison of images of pattern 102 (as per act 207) across different poses, lighting conditions, etc. Alternative embodiments may use other descriptors of intensities or other descriptors of intensity differences of a type that will be readily apparent in view of this detailed description.
In some embodiments, processor 114 is programmed to smooth the image before comparing pixel intensities or intensity differences of pairs of pixels. Moreover, in such embodiments, processor 114 is programmed to use binary strings as BRIEF descriptors, wherein each bit in a binary string is a result of comparison of two pixels in an area of pattern 102. Specifically, in these embodiments each area in pattern 102 is represented by, for example, a 16-bit (or 32-bit) binary string, which holds the results of 16 comparisons (or 32 comparisons) in the area. When a result of a comparison indicates that a first pixel is of higher intensity than a second pixel, the corresponding bit is set to 1; otherwise that bit is set to 0. In this example, 16 pairs of pixels (or 32 pairs of pixels) are chosen in each area, and the pixels are selected in a predetermined manner, e.g. sampled from a Gaussian distribution centered on the area.
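As one illustration of the binary strings just described, the following Python sketch builds a 16-bit BRIEF-style descriptor from pixel pairs drawn from a Gaussian distribution centered on the area; the sampling parameters and the use of a fixed random seed (so that the same pairs are used for the stored pattern and for each captured image) are assumptions of this example.

```python
import numpy as np

def sample_pairs(patch_size, n_pairs=16, sigma_frac=0.2, seed=0):
    """Pick n_pairs pixel-pair locations from a Gaussian centered on the
    middle of the patch, clipped to the patch bounds.  A fixed seed keeps
    the pairs identical across the patches being compared."""
    rng = np.random.default_rng(seed)
    center = patch_size / 2.0
    pts = rng.normal(loc=center, scale=sigma_frac * patch_size,
                     size=(n_pairs, 2, 2))
    return np.clip(pts, 0, patch_size - 1).astype(int)

def brief_descriptor(patch, pairs):
    """Binary string (held in a Python int) where bit i is 1 if the first
    pixel of pair i is brighter than the second pixel, else 0."""
    descriptor = 0
    for i, ((y1, x1), (y2, x2)) in enumerate(pairs):
        if patch[y1, x1] > patch[y2, x2]:
            descriptor |= 1 << i
    return descriptor
```

With 16 pairs the result fits in a 16-bit value; with 32 pairs it fits in a 32-bit value, as noted above.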
In some aspects of the described embodiments, descriptors of areas in pattern 102 that is to be occluded during use as a virtual slider as described herein are pre-calculated (e.g. based on real world position of object 101 and its pose that is expected during normal use) and stored in memory by processor 114 to enable fast comparison (relative to calculation during each comparison in act 204). Moreover, in several embodiments, similarity between a descriptor of an area in a newly captured image and a descriptor of a corresponding area in pattern 102 is evaluated by computing a Hamming distance between two binary strings (that constitute the two descriptors), to determine whether the binary strings match one another or not. In some embodiments, such descriptors are compared by performance of a bitwise XOR operation on the two binary strings, followed by a bit count operation on the result of the XOR operation. Alternative embodiments use other methods to compare a descriptor of a pattern 102 to a descriptor of an area in the newly-generated image, as will be readily apparent in view of this detailed description.
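A minimal Python sketch of the XOR-and-bit-count comparison just described follows; the matching threshold of 3 differing bits is an illustrative assumption, not a value from the text.

```python
def hamming_distance(desc_a, desc_b):
    """Hamming distance between two binary-string descriptors: bitwise XOR
    followed by a count of the set bits in the result."""
    return bin(desc_a ^ desc_b).count("1")

def descriptors_match(desc_a, desc_b, max_distance=3):
    """Treat the descriptors as matching when only a few bits differ
    (max_distance is an assumed threshold)."""
    return hamming_distance(desc_a, desc_b) <= max_distance
```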
Next, in act 208, processor 114 checks if M comparisons have been performed in the selected sampling area 191I. If the answer is no, then processor 114 returns to act 206 to select another pair of pixels. If the answer is yes, then processor 114 goes to act 209, described below. In some aspects of the described embodiments, the number M is predetermined and identical for each sampling area 191I. For example, the number M can be predetermined to be 4 for all sampling areas 191A-191I, in which case four comparisons are performed (by repeating act 207 four times) in each selected sampling area 191I. In other examples, M may be randomly selected within a range and still be identical for each selected sampling area 191I. In still other examples, M may be randomly selected for each sampling area 191I.
In act 209, processor 114 stores in memory 119 one or more results based on the comparison performed in act 207. For example, M values of the above-described ratio R or the difference D may be stored to memory 119, one value for each pair of pixels that was compared in act 207, for each sampling area 191I. As another example, the ratio R or the difference D may be averaged across all M pixel pairs in a selected sampling area 191A, and the average may be stored to memory 119.
In one illustrative embodiment, processor 114 computes a probability pI of occlusion of each sampling area 191I, based on the M results of comparison for that sampling area, as follows. If a difference D (or ratio R) for a pixel pair is greater than a predetermined threshold, the binary value 1 is used for that pixel pair; otherwise the binary value 0 is used. These binary values are added up for all M pixel pairs in sampling area 191I and divided by M to obtain the probability pI. The probability pI that is computed is then stored to memory 119 (
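For illustration, a Python sketch of the probability computation just described, assuming the M per-pair differences D (or ratios R) for a sampling area have already been computed; the function name and threshold are placeholders.

```python
def occlusion_probability(pair_results, threshold):
    """Given the M comparison results (differences D or ratios R) for one
    sampling area, count how many exceed the threshold (binary value 1)
    and divide by M to obtain the probability pI of occlusion."""
    m = len(pair_results)
    votes = sum(1 for value in pair_results if value > threshold)
    return votes / m
```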
When comparison results (e.g. probabilities pA . . . pI . . . pN) have been calculated for all sampling areas 191A, 191I, 191N, processor 114 goes to act 211 to select one or more sampling areas for use in computation of location of an occlusion of pattern 102 in image area 103. The specific manner in which sampling areas are selected in act 211 for occlusion location computation can be different, depending on the aspect of the described embodiments. For example, some embodiments compare intensities of pixels in a newly captured image with corresponding intensity ranges of another real world object (also called “occluding object”) predetermined for use in forming an occlusion of pattern 102 on object 101, such as a human finger 112 (
In the case of a human finger 112, certain embodiments compare pixel intensities in the sampling areas with known intensity ranges of human skin to determine whether or not to filter out (i.e. eliminate) one or more sampling areas when selecting sampling areas in act 211, for computation of location of an occlusion. Similarly, a total width of a group of contiguous sampling areas may be compared to a predetermined limit that is selected ahead of time, based on the size of an adult human's finger, to filter out sampling areas. Depending on the embodiment, the known intensities of human skin that are used in act 211 as described herein are predetermined, e.g. by requiring a user to provide sample images of their fingers during initialization. Hence, in such embodiments, two sets of known intensities are compared, e.g. one set of pattern 102 in act 207 and another set of human finger 112 in act 211. Other embodiments may select sampling areas (thereby eliminating unselected areas) in act 211 based on BRIEF descriptors that are found to not match any BRIEF descriptors of pattern 102, by use of predetermined criteria in such matching, thereby using just a single set of known intensities (of pattern 102).
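For illustration, a minimal Python sketch of the two filters just described, a skin-intensity test and a finger-width limit; the intensity range and width limit used here are assumptions, stand-ins for values calibrated during initialization.

```python
def filter_by_skin_intensity(areas, gray_image, low=80, high=200):
    """Keep sampling areas whose mean intensity falls within an assumed
    skin-intensity range; each area is given as a (x_start, x_end) span
    of pixel columns within image area 103."""
    return [(x0, x1) for (x0, x1) in areas
            if low <= gray_image[:, x0:x1].mean() <= high]

def filter_by_finger_width(contiguous_areas, max_width_px=60):
    """Reject a contiguous group of sampling areas wider than an assumed
    adult-finger limit; otherwise keep the whole group."""
    if not contiguous_areas:
        return contiguous_areas
    total_width = contiguous_areas[-1][1] - contiguous_areas[0][0]
    return contiguous_areas if total_width <= max_width_px else []
```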
Next, as per act 212, processor 114 uses probabilities of sampling areas that were selected in act 211 and are contiguous to one another to compute a location of occlusion 105 relative to image area 103. For example, by use of such areas, an occlusion's location may be computed as being Δx1 away from a left edge 103L (
In one illustrative embodiment of act 212, processor 114 computes a probability weighted average of the locations of the selected sampling areas, as follows. For example, sampling areas 191J, 191K and 191L (see
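A minimal Python sketch of a probability-weighted average of this kind, assuming each selected sampling area is summarized by the x coordinate of its center and its occlusion probability; the names are placeholders.

```python
def weighted_occlusion_location(centers_x, probabilities):
    """Probability-weighted average of the x coordinates of the selected
    contiguous sampling areas, i.e. the occlusion's location relative to
    image area 103."""
    total = sum(probabilities)
    if total == 0:
        return None  # no evidence of an occlusion in the selected areas
    return sum(x * p for x, p in zip(centers_x, probabilities)) / total
```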
Note that the just-described weighted average as well as the just-described simple average (see previous paragraph) both provide more precision than identification of a single digital marker, from among a sequence of digital markers of the type described in an article entitled “Occlusion based Interaction Methods for Tangible Augmented Reality Environments” by Lee, G. A. et al., published in the Proceedings of the 2004 ACM SIGGRAPH International Conference on Virtual Reality Continuum and Its Applications in Industry (VRCAI '04), pp. 419-426, which is incorporated by reference herein in its entirety.
Note that in some embodiments of the type described herein, although markers are used to identify the location of an object in an image and/or the location of an area that corresponds to the predetermined region (as per act 203), the markers are not used to compute the location of occlusion in act 212. Instead, in several embodiments of the type described herein, an occlusion's location is computed in act 212 using the results of comparing two intensity differences, namely a first intensity difference between two pixels within the identified area that corresponds to the predetermined region, and a second intensity difference between two pixels within the non-uniform pattern that correspond to the two pixels used to compute the first intensity difference. As noted above, in many such embodiments, the two pixels used in the second intensity difference have locations that differ from each other (e.g. by Δx, Δy) by an amount identical to the corresponding difference in locations of the two pixels used in the first intensity difference.
Referring back to
Next, processor 114 returns to act 201 (described above) and repeats the just-described acts, to update the value in storage element 115 based on changes in location of occlusion 105 relative to image area 103, e.g. when the user moves finger 112 across region 102 on real world object 101 (
Use of descriptors of intensity differences (e.g. BRIEF descriptors) by processor 114 in the comparison in act 207, in combination with use of a tracking method in act 202, enables a location of an occlusion to be identified precisely, relative to an end (e.g. end 102L) of a predetermined area (wherein the pattern 102 is included) on a real world object 101 (also called “target”). Specifically, use of natural features and/or digital markers on real world object 101, with appropriate programming of processor 114, can track object 101 even after a portion of pattern 102 goes out of the field of view 111 of camera 100. For example, translation between camera 100 and object 101 may cause left edge 103L to disappear from the field of view 111 and therefore be absent from an image 117 (
Although a single row of sampling areas 191A-191N has been illustrated in
In such embodiments, in act 204, the area 103 may be subdivided into a two-dimensional array of sampling areas. In the example illustrated in
In such embodiments, acts 204-212 are performed by processor 114, appropriately programmed to use the multiple rows of sampling areas in such a two-dimensional array formed in electronic memory 119. For example, in computing an occlusion's location, a weighted average of probabilities of sampling areas 192KA . . . 192KI . . . 192KZ (
A value in storage element 115 can be used as an output of a slider control i.e. as a virtual slider. Hence, such a value can control (as per act 213 in
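For illustration, a minimal Python sketch of how the stored occlusion location might be mapped to a slider value in a caller-chosen range; the normalization and clamping shown are assumptions of this example, not a prescribed mapping.

```python
def slider_value(occlusion_x, area_left_x, area_right_x,
                 value_min=0.0, value_max=1.0):
    """Map the occlusion's x coordinate within image area 103 to a value in
    [value_min, value_max], the way a slider control reports its position."""
    span = float(area_right_x - area_left_x)
    t = (occlusion_x - area_left_x) / span
    t = min(max(t, 0.0), 1.0)  # clamp to the extent of the pattern
    return value_min + t * (value_max - value_min)
```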
Thus, the output of a virtual slider formed by user input via storage element 115 as described herein can be used similarly to user input from physically touching a real world slider on a touch screen of a mobile device. However, note that pattern 102 is located directly on the real world object 101 (also called “target”), so that the user can work directly with object 101 without putting their finger 112 back on a touch screen 1001 of a mobile platform 1000 (
Several embodiments of the type described herein are implemented by processor 114 included in mobile platform 1000 (
In an Augmented Reality environment, different interaction metaphors may be used. Tangible interaction allows a user to reach into the scene and manipulate objects directly (as opposed to embodied interaction, where users interact directly on the device). Use of a virtual slider as described herein eliminates the need to switch between the two metaphors, thereby eliminating any user confusion arising from switching. Specifically, when tangible interaction is chosen as an input technique, virtual sliders (together with virtual buttons) allow a user to use his hands in the real world with his attention focused in the virtual 3D world, even when the user needs to scroll to input a continuously changing value.
Virtual sliders as described herein can have a broad range of usage patterns. Specifically, virtual sliders can be used in many of the same cases and applications as real world sliders on touch screens. Moreover, virtual sliders can be used in an AR setting even when no touch screen is available on a mobile phone. Also, use of virtual sliders allows a user to select between different tools very easily and to use the UI of the interaction device to specify tool parameters, which leads to much faster manipulation times. Virtual sliders as described herein cover a broad range of activities, so it is possible to use virtual sliders as the only interaction technique for a whole application (or even for many different applications). This means that once a user has learned to use virtual sliders, he will not need to learn any other tool.
A mobile platform 1000 of the type described above may include functions to perform various position determination methods, and other functions, such as object recognition using “computer vision” techniques. The mobile platform 1000 may also include circuitry for controlling real world object 101 in response to user input via occlusion detected and stored in storage element 115, such as a transmitter in transceiver 1010, which may be an IR or RF transmitter, or a wireless transmitter enabled to transmit one or more signals over one or more types of wireless communication networks such as the Internet, WiFi, a cellular wireless network, or another network. The mobile platform 1000 may further include, in a user interface, a microphone and a speaker (not labeled) in addition to touch screen 1001 and/or screen 1002 which is not touch sensitive, used for displaying captured scenes and rendered AR objects. Of course, mobile platform 1000 may include other elements unrelated to the present disclosure, such as a read-only-memory 1007 which may be used to store firmware for use by processor 114.
Although the embodiments described herein are illustrated for instructional purposes, various embodiments are not limited thereto. For example, although item 1000 shown in
Memory 119 of several embodiments of the type described above includes software instructions for a detection module 119D that are also executed by one or more processors 114 to detect presence of human finger 112 overlaid on pattern 102 of real world object 101. Depending on the embodiment, such software instructions (e.g. to perform the method of
In addition to module 119D described in the preceding paragraph, memory 119 of several embodiments also includes software instructions of a tracking module 119T that are executed by one or more processors 114 to track movement over time of a location of occlusion, specifically an occlusion caused by presence of finger 112 on pattern 102 of object 101. Such a tracking module 119T is also used by a mobile platform 1000 to track digital marker(s), as described above. In several embodiments, an occlusion's location data output by tracking module 119T (e.g. the x coordinate of an occlusion) is used by one or more processors 114 to control information displayed to a user, by execution of instructions in a rendering module 119R. Hence, instructions in rendering module 119R render different information on screen 1002 (or touch screen 1001), depending on an occlusion's location as determined in detection module 119D and/or tracking module 119T.
In one such example, an embodiment of real world object 101 described above is a pad 501 (
In the example shown in
Accordingly, pattern 102H (
Although in some embodiments, the above-described software modules 119D, 119T and 119R are all present in a common memory 119 of a single device 1000, in other embodiments one or more such software modules 119D, 119T and 119R are present in different memories that are in turn included in different electronic devices and/or computers as will be readily apparent in view of this detailed description. Moreover, instead of modules 119D, 119T and 119R being implemented in software, as instructions stored in memory 119, one or more such modules are implemented in hardware logic in other embodiments.
Various adaptations and modifications may be made without departing from the scope of the embodiments. Therefore, the spirit and scope of the appended claims should not be limited to the foregoing description.
This application claims priority under 35 USC §119 (e) from U.S. Provisional Application No. 61/511,002 filed on Jul. 22, 2011 and entitled “VIRTUAL SLIDERS: Specifying Values by Occluding a Pattern on a Target”, which is assigned to the assignee hereof and which is incorporated herein by reference in its entirety.