One of the challenges for digital merchandising is how to bridge the gap between attracting the attention of potential customers and engaging with those customers. One attempt to bridge this gap is the Tesco Virtual Supermarket, which allows customers to order groceries for later delivery by using their mobile devices to capture QR codes associated with virtual products, which are represented by images of the products. This method works well for people buying basic products, such as groceries, in a fast-paced environment, with time savings being one benefit.
For discretionary purchases, however, it can be a challenge to convert a potential customer or hesitant shopper to a confident buyer. One example is the photo kiosk operation at public attractions such as theme parks, where customers can purchase photos of themselves on a theme park ride. While the operators have invested in equipment and personnel trying to sell these high-quality photos to customers, empirical evidence suggests that their photo purchase rate is low, resulting in a low return on investment. The main reason appears to be that most customers opt to use their mobile devices to take snapshots from the photo preview displays, instead of purchasing the photos.
For a typical digital sign or kiosk, the visual representation of merchandise is intended to help promote the merchandise. For certain merchandise, however, such a visual representation could actually impede sales. In the case of the preview display at theme parks, while the preview is necessary for potential customers to decide whether to purchase the merchandise (a digital or physical photo), it also exposes the merchandise to copying, albeit at lower quality, by customers using their own cameras. Such copying renders the original content valueless to the operator despite the investment.
A system for human interaction based upon intention detection, consistent with the present invention, includes a display device for electronically displaying information, a sensor for providing information relating to a posture of a person detected by the sensor, and a processor electronically connected with the display device and sensor. The processor is configured to receive the information from the sensor and process the received information in order to determine if an event occurred. This processing involves determining whether the posture of the person indicates a particular intention. If the event occurred, the processor is configured to provide an interaction with the person via the display device.
A method for human interaction based upon intention detection, consistent with the present invention, includes receiving from a sensor information relating to a posture of a person detected by the sensor and processing the received information in order to determine if an event occurred. This processing step involves determining whether the posture of the person indicates a particular intention. If the event occurred, the method includes providing an interaction with the person via a display device.
The accompanying drawings are incorporated in and constitute a part of this specification and, together with the description, explain the advantages and principles of the invention. In the drawings,
Embodiments of the present invention include a human interaction system that is capable of identifying potential people of interest in real-time and interacting with such people through real-time or time-shifted communications. The system includes a dynamic display device, a sensor, and a processor device that can capture and detect certain postures in real-time. The system can also include a server software application run by the service providers and client software run on users' mobile devices. Such a system enables service providers to identify, engage, and transact with potential customers, who also benefit from targeted and nonintrusive services.
In operation, system 10 via depth sensor 22 detects, as represented by arrow 25, a user having a mobile device 24 with a camera. Depth sensor 22 provides information to computer 12 relating to the user's posture. In particular, depth sensor 22 provides information concerning the position and orientation of the user's body, which can be used to determine the user's posture. System 10 using processor 16 analyzes the user's posture to determine if the user appears to be taking a photo, for example. If such posture (intention) is detected, computer 12 can provide particular content on display device 20 relating to the detected intention, for example a QR code can be displayed. The user upon viewing the displayed content may interact with the system using mobile device 24 and a network connection 26 (e.g., Internet web site) to web server 14.
Display device 20 can optionally display the QR code with the content at all times while monitoring for the intention posture. The QR code can be displayed in the bottom corner, for example, of the displayed picture such that it does not interfere with the viewing of the main content. If intention is detected, the QR code can be moved and enlarged to cover the displayed picture.
In this exemplary embodiment, the principle of detecting a photo taking intention (or posture) is based on the following observations. The photo taking posture is uncommon; therefore, it is possible to differentiate it from normal postures such as customers walking by or simply watching a display. The photo taking postures of different people share some universal characteristics, such as the three-dimensional position of a camera relative to the head and eye and the object being photographed, despite the different types of cameras and ways to use them. In particular, different people use their cameras differently, such as taking a photo with one hand versus two hands, and using an optical versus an electronic viewfinder. However, as illustrated in
This observation is abstracted in
Embodiments of the present invention can simplify the task of sensing those positions through an approximation, as shown in
The camera viewfinder position is approximated with the position(s) of the camera held by the photo taker's hand(s), Pviewfinder≈Phand (Prhand and Plhand). The eye position is approximated with the head position, Phead≈Peye. The object position 48 (center of display) for the object being photographed is calculated with the sensor position and a predetermined offset between the sensor and the center of display, Pdisplay=Psensor+Δsensor
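The approximations above can be illustrated with a minimal sketch. It assumes the depth sensor reports head and hand joints as three-dimensional coordinates in the sensor's own frame; the `Point3D` type, the function name, and the parameter names are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Point3D:
    """Hypothetical 3D point in the depth sensor's coordinate frame."""
    x: float
    y: float
    z: float

    def __add__(self, other: "Point3D") -> "Point3D":
        return Point3D(self.x + other.x, self.y + other.y, self.z + other.z)


def approximate_positions(
    head: Point3D,
    right_hand: Optional[Point3D],
    left_hand: Optional[Point3D],
    sensor: Point3D,
    sensor_to_display_offset: Point3D,
) -> Tuple[Point3D, List[Point3D], Point3D]:
    """Apply the three approximations described in the text."""
    # Eye position is approximated with the head position.
    eye = head
    # Viewfinder position is approximated with whichever hand(s) were tracked.
    viewfinders = [p for p in (right_hand, left_hand) if p is not None]
    # Center of display is the sensor position plus a predetermined offset.
    display = sensor + sensor_to_display_offset
    return eye, viewfinders, display
```

In this sketch a hand that the sensor failed to track is passed as `None`, so the caller receives only the usable viewfinder candidates.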
Therefore, the system determines that the event (photo taking) has occurred when the head (Phead) and at least one hand (Prhand or Plhand) of the user form a straight line pointing to the center of display (Pdisplay). Additionally, more qualitative and quantitative constraints can be added in the spatial and temporal domains to increase the accuracy of the detection. For example, when both hands are aligned with the head-display direction, the likelihood of correct detection of photo taking is significantly higher. As another example, when the hands are either too close to or too far away from the head, the posture may be something other than photo taking (e.g., pointing at the display). Therefore, a hand range parameter can be set to reduce false positives. Moreover, since the photo-taking action is not instantaneous, a “persistence” period can be added after the first positive posture detection to ensure that the detection was not the result of a momentary false body or joint recognition by the depth sensor. The detection algorithm can check whether the user remains in the photo-taking posture for a particular time period, for example 0.5 seconds, before determining that an event has occurred.
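The hand range and persistence constraints can be sketched as a small per-frame state machine. This is an illustration under stated assumptions, not the specification's own code: the class name, the method names, and the hand range limits (0.15 m and 0.60 m) are hypothetical, and only the 0.5 second persistence period comes from the text. The `aligned` flag is assumed to be computed elsewhere from the head, hand, and display positions.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class PersistenceDetector:
    persistence_s: float = 0.5    # example period given in the text
    min_hand_dist: float = 0.15   # hypothetical lower hand range limit, meters
    max_hand_dist: float = 0.60   # hypothetical upper hand range limit, meters
    _first_positive: Optional[float] = None  # time of first positive frame

    def hand_in_range(self, head: Sequence[float], hand: Sequence[float]) -> bool:
        """Reject hands that are too close to or too far from the head."""
        d = math.dist(head, hand)
        return self.min_hand_dist <= d <= self.max_hand_dist

    def update(self, timestamp: float, head: Sequence[float],
               hand: Sequence[float], aligned: bool) -> bool:
        """Feed one sensor frame; return True once the posture has persisted."""
        positive = aligned and self.hand_in_range(head, hand)
        if not positive:
            # Any negative frame resets the persistence timer.
            self._first_positive = None
            return False
        if self._first_positive is None:
            self._first_positive = timestamp
        return timestamp - self._first_positive >= self.persistence_s
```

Resetting the timer on any negative frame is one possible design choice; a more tolerant variant could allow a few dropped frames before resetting.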
In the real world the three points (object, hand, head) are not perfectly aligned. Therefore, the system can consider the variations and noise when conducting the intention detection. One effective method to quantify the detection is to use the angle between the two vectors formed by the left or right hand, head, and the center of display as illustrated in
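An angle-based alignment test of this kind can be sketched as follows. The sketch assumes the two vectors are head-to-hand and head-to-display, which is one plausible reading of the description; the function names and the 15 degree threshold are hypothetical values chosen for illustration only.

```python
import math
from typing import Optional, Sequence


def alignment_angle_deg(head: Sequence[float], hand: Sequence[float],
                        display: Sequence[float]) -> Optional[float]:
    """Angle in degrees between head->hand and head->display vectors.

    Returns None if either vector has zero length.
    """
    v1 = [a - b for a, b in zip(hand, head)]
    v2 = [a - b for a, b in zip(display, head)]
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    if n1 == 0.0 or n2 == 0.0:
        return None
    cos_angle = sum(a * b for a, b in zip(v1, v2)) / (n1 * n2)
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.degrees(math.acos(cos_angle))


def is_aligned(head: Sequence[float], hand: Sequence[float],
               display: Sequence[float],
               threshold_deg: float = 15.0) -> bool:  # hypothetical threshold
    """True when the hand lies close to the head-display line of sight."""
    angle = alignment_angle_deg(head, hand, display)
    return angle is not None and angle <= threshold_deg
```

A smaller threshold makes the detection stricter, while a larger one tolerates more sensor noise at the cost of more false positives.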
System 10 processes the received information from sensor 22 in order to determine if an event occurred (step 64). As described in the exemplary embodiment above, the system can determine if a person in the monitored space is attempting to take a photo based upon the person's posture as interpreted by analyzing the information from sensor 22. If an event occurred (step 66), such as detection of a photo taking posture, system 10 provides interaction based upon the occurrence of the event (step 68). For example, system 10 can provide on display device 20 a QR code, which when captured by the user's mobile device 24 provides the user with a connection to a network site, such as an Internet web site, where system 10 can interact with the user via the user's mobile device. Aside from a QR code, system 10 can display on display device 20 other indications of a web site, such as its address. System 10 can also optionally display a message on display device 20 to interact with the user when an event is detected. As another example, system 10 can remove content from display device 20, such as an image of the user, when an event is detected.
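The flow of steps 64, 66, and 68 can be sketched as a simple loop. This is only an illustration: the function and parameter names are hypothetical, and the detection and interaction behaviors are passed in as callbacks so that the sketch makes no assumption about a particular sensor or display API.

```python
from typing import Any, Callable, Iterable, Optional


def run_interaction_loop(
    frames: Iterable[Any],
    detect_event: Callable[[Any], Optional[str]],
    provide_interaction: Callable[[str], None],
) -> int:
    """Process sensor frames and interact when an event is detected.

    Returns the number of interactions provided.
    """
    interactions = 0
    for frame in frames:                  # information received from the sensor
        event = detect_event(frame)       # step 64: determine if an event occurred
        if event is not None:             # step 66: did an event occur?
            provide_interaction(event)    # step 68: e.g. show a QR code or message
            interactions += 1
    return interactions
```

Here `provide_interaction` could display a QR code, show a message, or remove content from the display, corresponding to the examples given above.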
Although the exemplary embodiment has been described with respect to a potential customer, the intention detection method can be used to detect the intention of others and interact with them as well.
Table 1 provides sample code for implementing the event detection algorithm in software for execution by a processor such as processor 16.