Benefit is claimed under 35 U.S.C. 119(a)-(d) to Foreign application Serial No. 556/CHE/2010 entitled “SYSTEM AND METHOD FOR POINT, SELECT AND TRANSFER HAND GESTURE BASED USER INTERFACE” by Hewlett-Packard Development Company, L.P., filed on Mar. 3, 2010, which is herein incorporated in its entirety by reference for all purposes.
In the pursuit of human-computer interfaces (HCI) beyond touch-based interfaces, hand-based gestures, such as those created by the movement of a hand, are being considered as the next mode of interaction. Such hand-based gestures are sometimes preferred over a touch-based interface, especially when users wish to avoid touching a computer display surface, as in the case of a public-display terminal, due to concerns about infections spread through touching, or in a greasy-hand scenario, due to concerns about leaving messy imprints on the display surface.
There are numerous gesture-based recognition systems and techniques for HCI. A majority of these systems use a computer vision system to acquire an image of a user for the purpose of enacting a user input function. In a known system, a user may point at one of a plurality of selection options on a display. The system, using one or more image acquisition devices, such as a single image camera or a motion image camera, acquires one or more images of the user pointing at the one of the plurality of selection options. Utilizing these one or more images, the system determines an angle of the pointing. The system then utilizes the angle of pointing, together with determined distance and height data, to determine which of the plurality of selection options the user is pointing to. Such systems may determine the intended selection option inaccurately, because the location of each selection option on a given display must be precisely known for the system to make the determination. Further, these systems have difficulty accurately determining the precise angle of pointing, the height, and the like that are required for making a reliable determination.
There are numerous other gesture-based interaction systems that use depth data obtained using time-of-flight based infrared depth sensors. However, these systems are typically designed for specific applications, such as gaming, entertainment, healthcare, and so on. Further, some of these systems require the user to carry a remote-control-like device.
Various embodiments are described herein with reference to the drawings, wherein:
The drawings described herein are for illustration purposes only and are not intended to limit the scope of the present disclosure in any way.
A system and method for a point, select and transfer hand gesture based user interface is disclosed. In the following detailed description of the embodiments of the invention, reference is made to the accompanying drawings that form a part hereof, and in which are shown by way of illustration specific embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention, and it is to be understood that other embodiments may be utilized and that changes may be made without departing from the scope of the present invention. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present invention is defined by the appended claims.
The terms “user interface” and “human-computer interface” are used interchangeably throughout this document.
In some embodiments, the in-front camera captures the depth image of the hand gesture made within a predefined interaction volume for performing various operations associated with select objects. The predefined interaction volume may extend substantially in front of a display screen of the display device by a distance in the range of about 0.5 meter to 1 meter. The select objects may be digital content displayed on the display screen of the display device, such as files, folders, and the like, and the various operations associated with the select objects may include selecting, cutting, copying, and pasting one or more of the select objects using a hand gesture vocabulary.
At step 110, a nearest point of the hand gesture to the display screen is found using a substantially nearest depth value in the captured depth image for each frame. In some embodiments, each pixel in the captured depth image may be assigned a depth value. In these embodiments, a pixel associated with a nearest depth value may be found. If the captured depth image is a non-inverted depth image, then pixels associated with an object nearer to the in-front camera may appear brighter in the captured depth image and hence a pixel with a highest depth value may be considered as the nearest depth value. In case the captured depth image is inverted, pixels associated with the object nearer to the in-front camera may appear darker in the captured depth image and hence a pixel with a lowest depth value may be considered as the nearest depth value. Accordingly, a location (X, Y) of the pixel associated with the nearest depth value in the captured depth image may be found and thus the location may be used to find the nearest point of the hand gesture.
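As an illustrative sketch of this step (not the claimed implementation), the following Python function finds the pixel with the nearest depth value in a single frame, assuming the depth image is available as a two-dimensional NumPy array; the function name and the inverted flag are assumptions introduced here for clarity.

```python
import numpy as np

def find_nearest_point(depth_image, inverted=False):
    """Return (x, y) of the pixel nearest to the camera and its depth value.

    Assumes a 2D array of per-pixel depth values. In a non-inverted depth
    image, nearer objects appear brighter (higher values); in an inverted
    image, nearer objects appear darker (lower values).
    """
    flat_index = np.argmin(depth_image) if inverted else np.argmax(depth_image)
    y, x = np.unravel_index(flat_index, depth_image.shape)  # row, column
    nearest_depth = depth_image[y, x]
    return (int(x), int(y)), nearest_depth
```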
At step 115, a depth variance is computed using depth values associated with pixels substantially surrounding the pixel with the nearest depth value for each frame. At step 120, it is determined whether the computed depth variance is within a predefined range of variance threshold. If the computed depth variance is within the predefined range of variance threshold, then at step 125, the found nearest point is validated as associated with a hand of a user in the captured depth image. The found nearest point may be a tip of a finger or a part of the hand. If the computed depth variance is not within the predefined range of variance threshold, then it implies that the found nearest point is not associated with the hand in the captured depth image and step 105 is repeated.
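One way the variance test might be realized is sketched below: the variance of the depth values in a small window around the candidate pixel is computed, and the candidate is accepted only if that variance falls within a configured range. The window size and threshold values are illustrative assumptions, not values from the disclosure.

```python
import numpy as np

def validate_nearest_point(depth_image, x, y, window=5,
                           var_min=10.0, var_max=500.0):
    """Accept the candidate pixel as part of a hand only if the local
    depth variance lies within a predefined range of variance thresholds."""
    half = window // 2
    h, w = depth_image.shape
    # Clip the window to the image borders before computing the variance.
    patch = depth_image[max(0, y - half):min(h, y + half + 1),
                        max(0, x - half):min(w, x + half + 1)]
    variance = float(np.var(patch))
    return var_min <= variance <= var_max
```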
At step 130, an image-to-screen mapping of the captured depth image and the found nearest point to the display screen is performed. For example, consider that a depth image is of width Xmax and breadth Ymax, and the display screen is of width Umax and breadth Vmax. Then, an X co-ordinate on the display screen may be computed as:
U = (X / Xmax) * Umax, and

a Y co-ordinate on the display screen may be computed as:

V = (Y / Ymax) * Vmax,
where X and Y are the co-ordinates associated with the location of the pixel with the nearest depth value, and U and V are the co-ordinates associated with a location on the display screen. In this manner, the image-to-screen mapping of the captured depth image and the found nearest point to the display screen may be performed by mapping the X and Y co-ordinates associated with the location of the pixel to the U and V co-ordinates associated with the location on the display screen. As a result, an estimated pointing location (U, V) on the display screen may be obtained by performing the image-to-screen mapping of the captured depth image and the found nearest point to the display screen.
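Because the mapping reduces to a proportional scaling of each axis, it can be expressed compactly; the sketch below follows the formulas above, with the argument names chosen here only for readability.

```python
def image_to_screen(x, y, x_max, y_max, u_max, v_max):
    """Map a depth-image coordinate (x, y) to a screen coordinate (u, v)
    by scaling each axis proportionally to the screen dimensions."""
    u = (x / x_max) * u_max
    v = (y / y_max) * v_max
    return u, v
```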
At step 135, the estimated pointing location is smoothed by temporal averaging. The estimated pointing location may be smoothed to eliminate jerky pointing due to quantization and to produce a smooth interaction experience for the user. At step 140, it is determined whether the found nearest point is within a first predetermined threshold range. If the found nearest point is within the first predetermined threshold range, then step 145 is performed. At step 145, the found nearest point is declared as a pointing hand gesture and one of the select objects associated with the estimated pointing location is highlighted. If the found nearest point is not within the first predetermined threshold range, then step 150 is performed. At step 150, it is determined whether the found nearest point is within a second predetermined threshold range. In one example embodiment, the user may continue to make a selecting hand gesture following the pointing hand gesture. In such a case, step 150 is performed after step 145.
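A simple way to perform the temporal averaging of step 135 is to average the estimated pointing location over the last few frames; the following sketch uses a fixed-length history whose window size is an assumption made for illustration.

```python
from collections import deque

class PointerSmoother:
    """Smooth the estimated pointing location by averaging it over the
    last few frames, reducing jitter caused by quantization."""

    def __init__(self, window_size=5):
        self.history = deque(maxlen=window_size)

    def update(self, u, v):
        # Add the latest estimate and return the running average.
        self.history.append((u, v))
        n = len(self.history)
        avg_u = sum(p[0] for p in self.history) / n
        avg_v = sum(p[1] for p in self.history) / n
        return avg_u, avg_v
```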
If it is determined that the found nearest point is within the second predetermined threshold range, then step 155 is performed, else step 105 is performed. At step 155, the found nearest point is declared as a selecting hand gesture or a pecking hand gesture and the highlighted one of the select objects is selected. The term ‘pecking hand gesture’ may be defined as a pecking action made with a pointed finger within the second predetermined threshold range. In one example embodiment, the selected one of the select objects may be displayed as a full screen mode view on the display screen. In another example embodiment, the selected one of the select objects may be transferred from a source location to a destination location. In one example, the source location and the destination location may be on the display screen of the display device. In another example, the source location may be within the display device with the in-front camera disposed around it, while the destination location may be within another display device such as a desktop, a laptop, a mobile phone, a smart phone and the like, connected to the display device using wired or wireless networks and located within the field-of-view of the in-front camera.
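The threshold logic of steps 140 through 155 can be summarized as classifying each frame from the nearest depth value. The sketch below assumes the two predetermined threshold ranges are supplied as (low, high) depth-value pairs, with the select (pecking) range assumed here to lie closer to the display than the pointing range; the actual range values are not specified in the disclosure.

```python
def classify_gesture(nearest_depth, point_range, select_range):
    """Classify a frame as a pointing gesture, a selecting (pecking)
    gesture, or neither, based on the nearest depth value."""
    low, high = select_range
    if low <= nearest_depth <= high:
        return "select"   # pecking gesture: select the highlighted object
    low, high = point_range
    if low <= nearest_depth <= high:
        return "point"    # pointing gesture: highlight the object at (u, v)
    return "none"         # hand outside both ranges; keep capturing frames
```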
For transferring the one of the select objects, the selected one of the select objects is grabbed using a grabbing hand gesture. The grabbing operation may include copying or cutting the selected one of the select objects. The in-front camera captures a depth image associated with the grabbing hand gesture to perform the grabbing operation. Subsequently, the grabbed one of the select objects may be transferred to the destination location from the source location. In some embodiments, the grabbed one of the select objects may be transferred by moving the forearm with the grabbing hand gesture towards the destination location and then a release hand gesture may paste the grabbed one of the select objects to the destination location. The detailed process of transferring the one or more select objects is described in greater detail in
It can be seen from
In an example operation, the in-front camera 310 may capture the depth image 200B of a gesture made by the hand 210 within a predefined interaction volume 325. As shown in
In some embodiments, the processor 302 may validate the found nearest point 215 as associated with the hand 210 if a depth variance is within a predefined range of variance threshold. In these embodiments, the processor 302 may compute the depth variance using depth values associated with pixels substantially surrounding the pixel with the nearest depth value for each frame. Upon validation of the hand 210, the processor 302 may perform an image-to-screen mapping of the found nearest point 215 to the display screen 315.
The processor 302 then determines whether the found nearest point 215 is within the first predetermined threshold range 330 or within the second predetermined threshold range 335. In the example embodiment illustrated in
At step 515, a pose of the hand gesture in the captured depth image is identified based on the segmented region of the hand. For example, the pose of the hand gesture may be a select pose, a grab pose or a release pose. In some embodiments, the pose of the hand gesture is identified using a representation or a feature of the region of the hand. At step 520, a location of the hand associated with the pose of the hand gesture in the captured depth image is obtained from the segmented region of the hand.
At step 525, it is determined whether a number of frames associated with the captured depth image is equal to a predetermined number of frames. The determination in step 525 may be performed to determine the length of a time window, as hand gestures related to select, copy/cut, and paste actions are performed by the user at coarser time intervals compared with the video frame rate. If it is determined that the number of frames associated with the captured depth image is not equal to the predetermined number of frames, then step 505 is repeated. If it is determined that the number of frames associated with the depth image is equal to the predetermined number of frames, then step 530 is performed.
At step 530, the identified pose of the hand gesture and the location of the hand are temporally integrated for the predetermined number of frames. The temporal integration may be performed using an averaging time window. At step 535, a sequence of poses of the hand gestures, such as point, grab, and release, is recognized based on the temporally integrated pose of the hand gesture and the location of the hand. In one example embodiment, the sequence of poses is recognized using a finite state machine. At step 540, the digital content is transferred from the source location to the destination location in the display screen of the display device by executing actions corresponding to the recognized sequence of poses.
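The sequence recognition of step 535 may, as the example embodiment suggests, be realized with a finite state machine; the minimal sketch below tracks the point, grab, and release poses and emits a paste action when the sequence completes. The state names and the returned action tuple are assumptions made here for illustration.

```python
class TransferStateMachine:
    """A minimal finite state machine recognizing the point -> grab ->
    release sequence from per-window pose labels."""

    def __init__(self):
        self.state = "idle"
        self.source_location = None

    def step(self, pose, location):
        if self.state == "idle" and pose == "point":
            self.state = "pointed"
        elif self.state == "pointed" and pose == "grab":
            self.state = "grabbed"
            self.source_location = location      # where the copy/cut occurred
        elif self.state == "grabbed" and pose == "release":
            self.state = "idle"
            # Signal that the grabbed content should be pasted here.
            return ("paste", self.source_location, location)
        return None
```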
Following the selection of the file 608, the user performs a half-grabbing hand gesture 610 towards the location of the selected file 608 on the display screen 315 to copy the selected file 608, as shown in
Further, the user moves his/her hand with the half closed fist towards the destination location 604 on the display screen 315 as shown in
In one example embodiment, the copied file 608 may be transferred when the found nearest point of the release hand gesture 612 is within one or more predetermined threshold ranges and based on the outcome of the image-to-screen mapping, as discussed in
At step 710, a region of the hand is identified and segmented from the captured depth image of the hand gesture for each frame. In some embodiments, the hand region may be segmented based on depth information obtained from the depth image of the hand gesture. At step 715, a pose of the hand gesture in the captured depth image is obtained based on the segmented region of the hand. For example, the pose of the hand gesture may be a select pose, a grab pose or a release pose. In some embodiments, the pose of the hand gesture is identified using a representation or a feature of the region of the hand. At step 720, a location of the hand associated with the pose of the hand gesture in the captured depth image is obtained from the segmented region of the hand.
At step 725, a presence of a pre-registered destination device is detected within the field-of-view of the in-front camera. At step 730, a direction of the hand during the pose of the hand gesture is detected. For example, it may be detected whether the hand is directed towards the source device or towards the destination device.
At step 735, it is determined whether a number of frames associated with the captured depth image is equal to a predetermined number of frames. The determination in step 735 is performed to determine the length of a time window, as hand gestures related to select, copy/cut, and paste actions are performed by the user at coarser time intervals compared with the video frame rate. If it is determined that the number of frames associated with the captured depth image is not equal to the predetermined number of frames, then step 705 is repeated. If it is determined that the number of frames associated with the captured depth image is equal to the predetermined number of frames, then step 740 is performed.
At step 740, the identified pose of the hand gesture, the location of the hand, presence of the pre-registered destination device and the direction of the hand are temporally integrated for the predetermined number of frames. The temporal integration may be performed using an averaging time window. At step 745, a sequence of poses of the hand gestures, such as point, grab, and release, is recognized based on the temporally integrated pose of the hand gesture and the location of the hand over the time window. In one example embodiment, the sequence of poses is recognized using a finite state machine. At step 750, the digital content is transferred from the source device to the destination device by executing actions corresponding to the recognized sequence of poses.
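One possible form of the temporal integration in step 740 is a majority vote over the poses and hand directions observed in the window of frames, together with an averaged hand location; the per-frame dictionary layout below is an assumption introduced for this sketch.

```python
from collections import Counter

def integrate_window(frames):
    """Temporally integrate per-frame observations over a fixed window.

    Each frame is assumed to be a dict such as:
        {"pose": "grab", "direction": "destination", "location": (u, v)}
    Returns the dominant pose, the dominant direction, and the averaged
    hand location over the window.
    """
    poses = Counter(f["pose"] for f in frames)
    directions = Counter(f["direction"] for f in frames)
    n = len(frames)
    avg_u = sum(f["location"][0] for f in frames) / n
    avg_v = sum(f["location"][1] for f in frames) / n
    return (poses.most_common(1)[0][0],
            directions.most_common(1)[0][0],
            (avg_u, avg_v))
```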
Following the selection of the file 808, the user performs a full-grabbing hand gesture 810 towards the location of the selected file 808 on the computer 802 to cut the selected file 808, as shown in
Further, the user moves his/her hand with the fully closed fist towards the mobile device 804 as shown in
A general computing device 902, such as the point, select and transfer hand gesture based user interface system 300, in the form of a personal computer or a laptop, may include the processor 302, the memory 304, a removable storage 916, and a non-removable storage 918. The computing device 902 additionally includes a bus 912 and a network interface 914. The computing device 902 may include or have access to the computing environment 900 that includes one or more user input devices 920, one or more output devices 922, and one or more communication connections 924, such as a network interface card or a universal serial bus connection.
The one or more user input devices 920 may be the in-front camera 310, a keyboard, a trackball, and the like. The one or more output devices 922 may be the display device 305 of the personal computer or the laptop. The communication connections 924 may include a local area network, a wide area network, and/or other networks.
The memory 304 may include volatile memory 904 and non-volatile memory 906. A variety of computer-readable storage media may be stored in and accessed from the memory elements of the computing device 902, such as the volatile memory 904 and the non-volatile memory 906, the removable storage 916 and the non-removable storage 918. Computer memory elements may include any suitable memory device(s) for storing data and machine-readable instructions, such as read only memory, random access memory, erasable programmable read only memory, electrically erasable programmable read only memory, hard drive, removable media drive for handling compact disks, digital video disks, diskettes, magnetic tape cartridges, memory cards, Memory Sticks™, and the like.
The processor 302, as used herein, means any type of computational circuit, such as, but not limited to, a microprocessor, a microcontroller, a complex instruction set computing microprocessor, a reduced instruction set computing microprocessor, a very long instruction word microprocessor, an explicitly parallel instruction computing microprocessor, a graphics processor, a digital signal processor, or any other type of processing circuit. The processor 302 may also include embedded controllers, such as generic or programmable logic devices or arrays, application specific integrated circuits, single-chip computers, smart cards, and the like.
Embodiments of the present subject matter may be implemented in conjunction with program modules, including functions, procedures, data structures, and application programs, for performing tasks, or defining abstract data types or low-level hardware contexts. Machine-readable instructions stored on any of the above-mentioned storage media may be executable by the processor 302 of the computing device 902. For example, a computer program 908 may include machine-readable instructions capable of providing a point, select and transfer hand gesture based user interface, according to the teachings and herein-described embodiments of the present subject matter. In one embodiment, the computer program 908 may be included on a compact disk-read only memory (CD-ROM) and loaded from the CD-ROM to a hard drive in the non-volatile memory 906. The machine-readable instructions may cause the computing device 902 to operate according to the various embodiments of the present subject matter.
As shown, the computer program 908 includes a point, select and transfer hand gesture based user interface module 910 to capture a depth image 200B of a hand gesture using the in-front camera 310 substantially on a frame-by-frame basis within the predefined interaction volume 325. The in-front camera 310 is substantially disposed around the display device 305, which is designed to display a plurality of select options. Further, the point, select and transfer hand gesture based user interface module 910 may find a nearest point of the hand gesture to the display screen 315 of the display device 305 using a substantially nearest depth value in the captured depth image 200B for each frame.
In addition, the point, select and transfer hand gesture based user interface module 910 may perform an image-to-screen mapping of the captured depth image 200B and the found nearest point to the display screen 315 upon validating the found nearest point as associated with the hand for each frame. Moreover, the point, select and transfer hand gesture based user interface module 910 may point and select one of the plurality of displayed select options on the display screen 315 of the display device 305 when the nearest depth value is within one or more predetermined threshold values or ranges, and based on the outcome of the image-to-screen mapping.
In one exemplary implementation, the point, select and transfer hand gesture based user interface module 910 may point and select digital content displayed on the display screen 315 of the source display device 305 when the nearest depth value associated with a grabbing hand gesture is within one or more predetermined threshold values or ranges, and based on the outcome of the image-to-screen mapping. The point, select and transfer hand gesture based user interface module 910 may then grab the digital content upon pointing and selecting the digital content displayed on the display screen 315. Moreover, the point, select and transfer hand gesture based user interface module 910 may transfer the digital content to a destination display device when the nearest depth value associated with a release hand gesture is within one or more predetermined threshold values or ranges, and based on the outcome of the image-to-screen mapping.
For example, the point, select and transfer hand gesture based user interface module 910 described above may be in the form of instructions stored on a non-transitory computer-readable storage medium. The non-transitory computer-readable storage medium having the instructions that, when executed by the computing device 902, may cause the computing device 902 to perform the one or more methods described in
In various embodiments, the methods and systems described in
Although the present embodiments have been described with reference to specific example embodiments, it will be evident that various modifications and changes may be made to these embodiments without departing from the broader spirit and scope of the various embodiments. Furthermore, the various devices, modules, analyzers, generators, and the like described herein may be enabled and operated using hardware circuitry, for example, complementary metal oxide semiconductor based logic circuitry, firmware, software, and/or any combination of hardware, firmware, and/or software embodied in a machine-readable medium. For example, the various electrical structures and methods may be embodied using transistors, logic gates, and electrical circuits, such as an application specific integrated circuit.
Number | Date | Country | Kind |
---|---|---|---
556/CHE/2010 | Mar 2010 | IN | national |