1. Field of the Invention
The invention is generally related to the area of augmented reality (AR). In particular, the invention is related to techniques for optically detecting a blocked area in a target image, where whether an AR target is being blocked, and how it is being blocked by an object, is evaluated in real time to generate different input commands for user interactions.
2. The Background of Related Art
Augmented Reality (AR) is a type of virtual reality that aims to duplicate the world's environment in a computer device. An augmented reality system generates a composite view for a user that is the combination of a real scene viewed by the user and a virtual scene generated by the computer device, which augments the real scene with additional information. The virtual scene generated by the computer device is designed to enhance the user's sensory perception of the virtual world the user is seeing or interacting with. The goal of augmented reality is to create a system in which the user cannot tell the difference between the real world and the virtual augmentation of it. Today, augmented reality is used in entertainment, military training, engineering design, robotics, manufacturing and other industries.
The recent development of computing devices such as smart phones or tablet PCs, together with cloud computing services, allows software developers to create many augmented reality application programs by overlaying virtual objects and/or additional 2D/3D multimedia information on an image captured by a video camera. When an interactive user interface is required in an AR application, a typical interface design is to generate input commands by finger gestures on the surface of a touch screen of the computing device. However, interacting with an AR display through a large touch screen can be very inconvenient for users. To overcome this sort of ergonomic difficulty, some AR applications have introduced sophisticated algorithms to recognize hand/finger gestures in free space. Image sensing devices, such as the Kinect from Microsoft or the Intel 3-D depth sensor, are gaining popularity as a new input method for real-time 3-D interaction with AR applications. However, these interaction methods require highly sophisticated image processing mechanisms involving a specific device along with various software drivers, where an example of the specific device includes a 3-D depth sensor or an RGB video camera. Thus there is a need for techniques of generating input commands based on simple motions of an object or intuitive gestures of the object, where the object may be a hand or something held by a user.
This section is for the purpose of summarizing some aspects of the present invention and briefly introducing some preferred embodiments. Simplifications or omissions may be made to avoid obscuring the purpose of the section. Such simplifications or omissions are not intended to limit the scope of the present invention.
In general, the present invention is related to techniques of allowing users of computer devices to interact with any augmented reality (AR) based multi-media information using simple and intuitive hand gestures. According to one aspect of the present invention, an image capturing device (e.g., a video or photo camera) is used to generate images from which a pre-defined hand gesture is identified on a target image for displaying AR information. One of the advantages, objects and benefits of the present invention is to allow a user to interact with a single target to display significant amounts of AR information. Depending on implementation, the target may be a marker or a markerless image. The image of the target is referred to herein as a target image.
According to another aspect of the present invention, a photo or video camera is employed to take images of a target. As a hand moves with respect to the target and blocks some or all of the target, a hand motion is detected based on how much of the target is being blocked in the images.
According to still another aspect of the present invention, each motion corresponds to an input command. There are a plurality of simple motions that may be made with respect to the target. Thus different input commands may be provided by simply moving a hand with respect to the target.
According to yet another aspect of the present invention, an audio feedback function is provided to confirm an expected command given by a hand gesture. For example, a simple swipe gesture of a hand from left to right across a target could produce a piano sound when the moving speed of the hand gesture is slow, resulting in the target being blocked for a relatively long period. When the same swipe gesture is fast, resulting in the target being blocked for a relatively short period, the audio feedback can be set to a whistle sound.
The present invention may be implemented as an apparatus, a method or a part of a system. Different implementations may yield different merits in the present invention. According to one embodiment, the present invention is a system for providing augmented reality (AR) content, the system comprising: a physical target; a computing device loaded with a module related to augmented reality; and a video camera, aiming at the physical target, coupled to the computing device, wherein the module is executed in the computing device to cause the computing device to display a first object when the physical target in a target image is fully detected and to cause the computing device to display a second object or conceal the first object when the physical target in the target image is partially detected or missing.
According to another embodiment, the present invention is a portable device for providing augmented reality (AR) content, the portable device comprising: a camera aiming at a physical target; a display screen; a memory space for a module; and a processor, coupled to the memory space, executing the module to cause the camera to generate a sequence of target images while a user of the portable device moves a hand with respect to the physical target, wherein the module is configured to cause the processor to determine from the target images whether or how the physical target is being blocked by the hand, the processor is further caused to display an object on the display screen when the physical target is detected in the images, and to determine a motion of the hand when the physical target is partially detected in the images, where the motion corresponds to an input command.
According to yet another embodiment, the present invention is a method for providing augmented reality (AR) content, the method comprising: providing a module to be loaded in a computing device for execution, the module requiring a video camera to aim at a physical target, the video camera coupled to the computing device, wherein the computing device is caused to display a first object when the physical target in a target image is fully detected, and to display a second object or conceal the first object when the physical target in the target image is partially detected.
One of the objects, features and advantages of the present invention is to provide a mechanism for interacting with an AR module. Other objects, features, benefits and advantages, together with the foregoing, are attained in the exercise of the invention in the following description and result in the embodiments illustrated in the accompanying drawings.
These and other features, aspects, and advantages of the present invention will become better understood with regard to the following description, appended claims, and accompanying drawings where:
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will become obvious to those skilled in the art that the present invention may be practiced without these specific details. The description and representation herein are the common means used by those experienced or skilled in the art to most effectively convey the substance of their work to others skilled in the art. In other instances, well-known methods, procedures, components, and circuitry have not been described in detail to avoid unnecessarily obscuring aspects of the present invention.
Reference herein to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Further, the order of blocks in process flowcharts or diagrams representing one or more embodiments of the invention does not inherently indicate any particular order nor imply any limitations in the invention.
Embodiments of the present invention are discussed herein with reference to
In one embodiment, an application module 114, referred to herein as an AR application or module, is designed to perform a set of functions that are to be described further herein. The application module 114 implements one embodiment of the present invention and may be implemented in software. A general computer would not perform the functions or achieve the results desired in the present invention unless it is installed with the application module and executes it in a way specified herein. In other words, a new machine is created using a general computer as a base component thereof. As used herein, whenever such a module or an application is described, a phrase such as “the module is configured to, designed to, intended to, or adapted to perform a function” means that the newly created machine has to perform the function unconditionally.
In particular, when the AR module 114 is executed, the computing device 110 receives the images or video from the video interface 122 and processes the images or video to determine if there is a target image or not, and further to overlay one or more AR objects on a real scene image or video when such a target image is detected. It should be noted that a general computer is not able to perform such functions unless the specially designed AR module 114 is loaded or installed and executed, thus creating a new machine.
According to one embodiment, the AR marker 104 may be in a certain pattern and in dark color.
Depending on implementation, the image 112 may be displayed on the computing device taking the pictures or video of a natural scene, or on any designated display device. As shown in
Referring now to
According to one embodiment, the AR module is configured to display a first AR object after it successfully captures the marker 200 in
Referring now to
According to one embodiment, a marker image is identified from a captured image provided from an image capturing device (e.g., a camera). The marker image is then processed to detect the edge and contour of the marker. Algorithms that may be used in processing a marker image are well known to those skilled in the art and will not be further described herein to avoid obscuring aspects of the present invention.
Once the edge and contour of the marker are extracted from an image, the parameters representing the edge and contour of the marker are matched with or compared to those of the original marker image (i.e., a marker descriptor, reference or template). When the process 220 determines that there is no match between the detected marker and the marker template, the process 220 goes to 226, where the hand gesture is to be detected. For example, when a hand blocks a significant portion of the marker as shown in
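By way of illustration only, the edge/contour comparison described above may be sketched as follows, assuming OpenCV; the Canny thresholds, the matchShapes-based score and the 0.1 cutoff are illustrative assumptions and not part of the disclosed process 220.

import cv2

def marker_matches_template(frame_gray, template_contour, score_cutoff=0.1):
    # Extract edges and candidate contours from the captured frame.
    edges = cv2.Canny(frame_gray, 50, 150)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for contour in contours:
        # Compare each candidate contour against the stored marker template.
        score = cv2.matchShapes(contour, template_contour, cv2.CONTOURS_MATCH_I1, 0.0)
        if score < score_cutoff:      # a lower score means a closer shape match
            return True               # marker found: proceed to 230 (display AR object)
    return False                      # no match: proceed to 226 (detect hand gesture)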
As will be described below, the hand motion is detected at 226 to determine how the hand is moving. The motion of the hand, when detected from a sequence of images, can be interpreted as a command. More details of detecting the motion will be further described herein.
When the process 220 determines that there is a match between the detected marker and the marker template, the process 220 goes to 230 to display a predefined AR object. For example, when the marker as shown in
According to one embodiment, given the above sequence of image events, the AR module or a separate module is designed to record or estimate the timing of the display stage and the suppress stage of the AR object depending on the degree to which the marker has been blocked by the hand. Based on the progress of blocking the marker, from little to significant and then back to little, the module is designed to detect or estimate the motion direction of the hand using the sequence of locations at which the marker is being blocked.
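A minimal sketch of such timing bookkeeping follows; the class and field names are illustrative assumptions, and the blocked ratio is assumed to be supplied by the marker-detection step described above.

import time

class BlockingTimeline:
    """Records, per frame, how much of the marker is blocked and when."""
    def __init__(self):
        self.samples = []                       # (timestamp, blocked_ratio, blocked_center)

    def record(self, blocked_ratio, blocked_center=None):
        self.samples.append((time.time(), blocked_ratio, blocked_center))

    def blocked_duration(self, ratio_threshold=0.5):
        # Length of the suppress stage: time during which the marker was
        # blocked beyond the given ratio.
        times = [t for t, ratio, _ in self.samples if ratio > ratio_threshold]
        return times[-1] - times[0] if len(times) > 1 else 0.0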
Referring now to
According to one embodiment, the process 310 starts when there is a mismatch between a detected marker and a marker template, or when a marker is missing from a captured image. The process 310 may be used at 226 of
Once the moving direction is determined or estimated, the hand gesture is inferred at 318. Depending on implementation, a set of predefined commands may be determined per the hand motions. For example, a first kind of AR object is displayed when a hand is moving from left to right, and a second kind of AR object is displayed when a hand is moving downwards. At 320, the AR module is designed to receive a corresponding input and react to the input (e.g., display a corresponding 3D AR object among a set of predefined objects).
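By way of illustration, such a mapping from an inferred motion to a predefined command can be as simple as a lookup table; the motion labels and command names below are illustrative assumptions.

GESTURE_COMMANDS = {
    "left_to_right": "display_ar_object_1",   # e.g., a first kind of AR object
    "downward":      "display_ar_object_2",   # e.g., a second kind of AR object
    "right_to_left": "display_ar_object_3",
    "upward":        "conceal_ar_object",
}

def command_for_motion(motion_label):
    # Returns None when the detected motion matches no predefined command.
    return GESTURE_COMMANDS.get(motion_label)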
The calculation for tracking the center of the lost edge/contour area continues until the camera resumes successful image capture of the marker. Using the tracking data of the center of the lost edge/contour area, the process 310 can identify the moving direction of the hand (e.g., the hand is moving from left to right, or forward to backward, and so on).
According to one embodiment,
According to one embodiment,
According to one embodiment, when the target is blocked by a hand, the corresponding missing distinctive feature points in the captured images are noted or tracked as the hand moves over the target image. By tracking how many feature points are remaining or missing from one image to another, the motion of the hand can be detected. In other words, given the number of feature points in a target image, the motion of the hand can be fairly well detected by detecting the remaining feature points in a sequence of images (some of the feature points would not be detected due to the blocking by the hand). In one embodiment, an example of a distinctive feature point may be a tiny pixel region in a reference image (i.e., a template of the feature points) that has graphical properties of a sharp edge/corner or strong contrast, similar to a bright spot on a dark background.
In general, the image is in color, represented in three primary colors (e.g., red, green and blue). To reduce the image processing complexity, the color image is first converted into a corresponding grey image (represented in intensity or brightness). Through an image algorithm, distinctive feature points in the image are extracted. To avoid obscuring the aspects of the present invention, the description of the image algorithm and the way to convert a color image into a grey image is omitted herein. Once the distinctive feature points are extracted from the image, a template of the markerless image can be generated. Depending on implementation, the template may include a reference image with the locations of the extracted feature points or a table of descriptions of the extracted feature points.
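For illustration only, one possible way to build such a template is sketched below using OpenCV's ORB detector; the choice of ORB and the number of features are assumptions, as the invention does not prescribe a particular feature algorithm.

import cv2

def build_markerless_template(color_image):
    # Convert the color image to a grey (intensity) image to reduce complexity.
    gray = cv2.cvtColor(color_image, cv2.COLOR_BGR2GRAY)
    # Extract distinctive feature points (sharp corners, strong contrast).
    orb = cv2.ORB_create(nfeatures=500)
    keypoints, descriptors = orb.detectAndCompute(gray, None)
    # The template keeps the feature locations and their descriptions.
    return {
        "locations": [kp.pt for kp in keypoints],
        "descriptors": descriptors,
    }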
Referring now back to 422, after a natural image including the magazine page is taken, the captured image is processed at 422 to detect the feature points in the region containing the magazine page. If needed, the captured image may be warped before being processed to detect the feature points in that region. At 424, the detected feature points are then compared with the template. If there is no match, or there is an indication that some of the feature points are missing, the process 420 goes to 426, where the hand gesture is recognized and a corresponding command is interpreted by tracking the positions of the remaining feature points in the captured images. In one embodiment, the image may be warp-transformed and processed again if there is no match between the detected feature points and the template at 424 or if a comparison ratio is near a threshold. The details of tracking the remaining feature points in the captured images will be further described in
It is now assumed that there is a match between the detected feature points and the template at 424, which means the markerless image is detected in the image; the process 420 then goes to 428 to call the AR module to display a predefined AR object. It should be noted that the match does not have to be perfect; a match is declared if the comparison or the similarity exceeds a certain percentile (e.g., 70%). While the AR object is being displayed, the process 420 goes to 430 to determine if another image is received or an action from a user is received.
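A minimal sketch of the comparison at 424 is given below, assuming the ORB-based template from the earlier sketch; the 70% figure mirrors the example above, while the brute-force matcher is an illustrative choice.

import cv2

def markerless_target_detected(frame_gray, template, ratio_threshold=0.7):
    orb = cv2.ORB_create(nfeatures=500)
    _, frame_descriptors = orb.detectAndCompute(frame_gray, None)
    if frame_descriptors is None or template["descriptors"] is None:
        return False
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(template["descriptors"], frame_descriptors)
    # Declare a match when enough of the template's feature points are found.
    matched_ratio = len(matches) / max(1, len(template["descriptors"]))
    return matched_ratio >= ratio_threshold   # below threshold: partially blocked or missing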
Referring now to
According to one embodiment, the process 440 starts when there is a mismatch between a detected marker and a marker template, or when certain feature points are missing from a captured image. When the process 440 is used at 426 of
Once the moving direction is determined or estimated, the hand gesture is inferred at 448. Depending on implementation, a set of predefined commands may be determined per the hand motions. For example, a first kind of AR object is displayed when a hand is moving from left to right, and a second kind of AR object is displayed when a hand is moving downwards. At 450, the AR module is designed to receive a corresponding input and react to the input (e.g., display a corresponding 3D AR object among a set of predefined objects). The calculation for tracking the center of the lost edge/contour area continues until the camera resumes successful image capture of the target image. Using the tracking data of the center of the lost edge/contour area, the process 440 can identify the moving direction of the hand (e.g., the hand is moving from left to right, or forward to backward, and so on).
A point C(Xav, Yav) is defined as the center of the key points that are lost from the currently captured image. For example, suppose j key points, K1(x1, y1), K2(x2, y2), . . . , Kj(xj, yj), are lost from a captured image at time t1; then:
Average x location of C1: Xav=(x1+x2+ . . . +xj)/j at time t1; and
Average y location of C1: Yav=(y1+y2+ . . . +yj)/j at time t1.
Using the above equations, the center of the lost key points C2(Xav, Yav) at time t2 can be computed in the same way using the set of lost key points at time t2. The center locations can then be iteratively computed until time tk: C1(Xav, Yav) at time t1, C2(Xav, Yav) at time t2, . . . , Ck(Xav, Yav) at time tk. It should be noted that C1 should be observed at the beginning of the hand blocking the target (at time t1) and Ck should be observed at the end of the hand blocking the target (at time tk).
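The computation of each center Ci is a direct transcription of the averages above, sketched here in Python for illustration.

def center_of_lost_keypoints(lost_points):
    # lost_points: the (x, y) locations of the template key points that are
    # missing from the currently captured image.
    j = len(lost_points)
    if j == 0:
        return None                               # nothing is blocked in this frame
    x_av = sum(x for x, _ in lost_points) / j
    y_av = sum(y for _, y in lost_points) / j
    return (x_av, y_av)                           # the center C(Xav, Yav) at this time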
If the sum of the absolute changes of the X coordinates of C1 . . . Ck is greater than the sum of the absolute changes of the Y coordinates of C1 . . . Ck, and the difference is greater than a user-specified threshold value, then the X movement of C dominates and the movement direction is along the X axis.
If the sum of the absolute changes of the Y coordinates of C1 . . . Ck is greater than the sum of the absolute changes of the X coordinates of C1 . . . Ck, and the difference is greater than a user-specified threshold value, then the Y movement of C dominates and the movement direction is along the Y axis.
If the sum of the absolute changes of the X coordinates of C1 . . . Ck is greater than a user-specified threshold value_x and the sum of the absolute changes of the Y coordinates of C1 . . . Ck is also greater than a user-specified threshold value_y, then the movement direction of C is diagonal in the X-Y coordinates.
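For illustration, the three tests above may be sketched as follows; treating the user-specified threshold as one value per axis in every branch is an assumption of this sketch.

def classify_movement_axis(centers, threshold_x, threshold_y):
    # centers: the list [C1, ..., Ck] of (x, y) centers of the lost key points.
    sum_dx = sum(abs(centers[i + 1][0] - centers[i][0]) for i in range(len(centers) - 1))
    sum_dy = sum(abs(centers[i + 1][1] - centers[i][1]) for i in range(len(centers) - 1))
    if sum_dx > threshold_x and sum_dy > threshold_y:
        return "diagonal"                         # significant movement on both axes
    if sum_dx > sum_dy and (sum_dx - sum_dy) > threshold_x:
        return "x_axis"                           # horizontal movement dominates
    if sum_dy > sum_dx and (sum_dy - sum_dx) > threshold_y:
        return "y_axis"                           # vertical movement dominates
    return "undetermined"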
Specifically, for
Specifically, for
Specifically, for
If the movement of C is in a diagonal direction and meets the following condition: if the change of the X coordinates from C1 to the midpoint Ci is negative, the change of the Y coordinates from C1 to Ci is also negative, the change of the X coordinates from Ci+1 to Ck is positive, and the change of the Y coordinates from Ci+1 to Ck is positive, then the U-turn hand gesture along the diagonal direction is from the upper right corner to the center, then back to the upper right corner.
If the movement of C is in a diagonal direction and meets the following condition: if the change of the X coordinates from C1 to the midpoint Ci is negative, the change of the Y coordinates from C1 to Ci is positive, the change of the X coordinates from Ci+1 to Ck is positive, and the change of the Y coordinates from Ci+1 to Ck is negative, then the U-turn hand gesture along the diagonal direction is from the lower right corner to the center, then back to the lower right corner.
If the movement of C is in a diagonal direction and meets the following condition: if the change of the X coordinates from C1 to the midpoint Ci is positive, the change of the Y coordinates from C1 to Ci is negative, the change of the X coordinates from Ci+1 to Ck is negative, and the change of the Y coordinates from Ci+1 to Ck is positive, then the U-turn hand gesture along the diagonal direction is from the upper left corner to the center, then back to the upper left corner.
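A minimal sketch of the three diagonal U-turn tests above follows; for simplicity the return leg is measured from the midpoint Ci rather than from Ci+1, and the label names are illustrative.

def classify_diagonal_u_turn(centers):
    # centers: the list [C1, ..., Ck] of (x, y) centers of the lost key points.
    if len(centers) < 3:
        return None
    mid = len(centers) // 2                       # index of the midpoint Ci
    dx_out = centers[mid][0] - centers[0][0]      # change in X from C1 to Ci
    dy_out = centers[mid][1] - centers[0][1]      # change in Y from C1 to Ci
    dx_back = centers[-1][0] - centers[mid][0]    # change in X on the return leg
    dy_back = centers[-1][1] - centers[mid][1]    # change in Y on the return leg
    if dx_out < 0 and dy_out < 0 and dx_back > 0 and dy_back > 0:
        return "u_turn_from_upper_right"
    if dx_out < 0 and dy_out > 0 and dx_back > 0 and dy_back < 0:
        return "u_turn_from_lower_right"
    if dx_out > 0 and dy_out < 0 and dx_back < 0 and dy_back > 0:
        return "u_turn_from_upper_left"
    return None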
Furthermore, another dozen new input commands could be created by specifying a different time window of image blocking. In other words, each hand gesture shown in
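As a minimal illustration, the same swipe can be split into a "slow" command and a "fast" command by the length of the blocking window; the cutoff value below is an assumption, and the pairing with piano and whistle sounds follows the example in the summary above.

SLOW_SWIPE_CUTOFF_SECONDS = 1.5    # assumed boundary between a slow and a fast swipe

def feedback_for_swipe(block_start_time, block_end_time):
    # The target was blocked from block_start_time to block_end_time (in seconds).
    blocked_for = block_end_time - block_start_time
    return "piano" if blocked_for >= SLOW_SWIPE_CUTOFF_SECONDS else "whistle"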
According to one embodiment,
A markerless image could be any printed paper, large poster, game card (e.g., a thick plate with cartoon printing) and so on.
The invention is preferably implemented in software, but can also be implemented in hardware or a combination of hardware and software. The invention can also be embodied as computer readable code on a computer readable medium. The computer readable medium is any data storage device that can store data which can thereafter be read by a computer system. Examples of the computer readable medium include read-only memory, random-access memory, CD-ROMs, DVDs, magnetic tape, optical data storage devices, and carrier waves. The computer readable medium can also be distributed over network-coupled computer systems so that the computer readable code is stored and executed in a distributed fashion.
The processes, sequences or steps and features discussed above are related to each other and each is believed independently novel in the art. The disclosed processes and sequences may be performed alone or in any combination to provide a novel and unobvious system or a portion of a system. It should be understood that the processes and sequences in combination yield an equally independently novel combination as well, even if combined in their broadest sense; i.e. with less than the specific manner in which each of the processes or sequences has been reduced to practice.
The present invention has been described in sufficient detail with a certain degree of particularity. It is understood by those skilled in the art that the present disclosure of embodiments has been made by way of examples only and that numerous changes in the arrangement and combination of parts may be resorted to without departing from the spirit and scope of the invention as claimed. Accordingly, the scope of the present invention is defined by the appended claims rather than the foregoing description of embodiments.
This application claims the benefits of U.S. Provisional Application No. 61/964,190, filed Dec. 27, 2013, and entitled “Method and Apparatus to Provide Hand Gesture Based Interaction with Augmented Reality Application”, which is hereby incorporated by reference for all purposes.