The present invention is directed to the use of avatars at interactive kiosks. More particularly, the present invention is directed to methods and apparatus for selecting and customizing avatars based on visual appearance and gait analysis of a user.
Interactive kiosks are becoming increasingly prevalent in today's society. Conventional kiosks range from purely informational to fully transactional, with countless combinations in between. Conventional kiosks typically include a keyboard, a trackball or mouse-type device, a touchscreen, and/or a card reader for paging through menus, inputting data, and completing transactions.
Given that a portion of the population prefers not to interact with a kiosk in an impersonal, computer-oriented environment, it may be desirable to provide a kiosk having a mechanism to personalize the interaction with users. For example, it may be desirable to provide a kiosk with an avatar for interacting with users. Motion of the avatar can be controlled so as to mimic human motions and behavior.
Still, avatars may not always attract new users because certain portions of the population may be reluctant to interact with other portions of the population with which they are uncomfortable. For example, a young, contemporary college student may not be inclined to interact with a kiosk having an avatar that mimics an older, traditional businessman. It should be appreciated that virtually every facet of an avatar's appearance can appeal to or offend a potential user. Features such as age, gender, race, hair length, glasses, piercings, tattoos, attire, gait, and other aspects of appearance can influence whether a user is more or less willing to interact with an avatar-based kiosk.
Some users may be more attracted to an interactive kiosk if the avatar has an appearance and/or behavior that reflects the general characteristics of a user. For example, a more youthful user may be more inclined to interact with a kiosk having a similarly youthful-looking avatar, and a more elderly person may be more inclined to interact with a kiosk having a similarly elderly-looking avatar. Thus, it may be desirable to provide a system and method for observing the appearance and/or behavior of a user prior to initiation of interaction with the kiosk and to select an avatar for interaction based on the observations.
According to various aspects of the disclosure, a method of generating an avatar for a user may include receiving image data of a user from a camera, generating feature vectors for a plurality of features of a user, associating the user with a likely user group selected from a number of defined user groups based on the feature vectors, and assigning an avatar based on the associated user group.
In accordance with some aspects of the disclosure, an apparatus for avatar generation may comprise a video interface configured to receive image data of a user, and an avatar generation engine configured to receive the image data from the video interface, generate feature vectors for a plurality of features of a user, associate the user with a likely user group selected from a number of defined user groups based on the feature vectors, and assign an avatar based on the associated user group.
In various aspects of the disclosure, a method of incrementally training a user group classifier may comprise receiving image data of a user from a camera, generating an aggregate feature vector from a plurality of feature vectors associated with a plurality of features of a user, receiving personal information and/or personal preferences input by the user, and determining a target user group for the user based on the user input. The method may include associating the aggregate feature vector with the determined target user group and training a user group classifier based on the association of the aggregate feature vector with the determined target user group.
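Purely as an illustration of the data flow summarized above, the following Python sketch shows one possible shape for the avatar-assignment step; the container fields, function names, and the use of concatenation to form the aggregate feature vector are assumptions made here for clarity and are not mandated by the disclosure.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class AvatarAssignment:
    """Illustrative container for the outputs named in the summary above."""
    user_group: str                      # likely user group selected from the defined groups
    base_avatar_id: str                  # avatar representative of that group
    prominent_features: list[str] = field(default_factory=list)   # optional customizations


def assign_avatar(feature_vectors: list[np.ndarray], classify, lookup_avatar) -> AvatarAssignment:
    """Sketch of the summarized flow: feature vectors -> user group -> avatar."""
    aggregate = np.concatenate(feature_vectors)      # aggregate feature vector
    group = classify(aggregate)                      # likely user group
    return AvatarAssignment(user_group=group, base_avatar_id=lookup_avatar(group))
```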
In order to describe the manner in which the above-recited and other advantages and features of the invention can be obtained, a more particular description of the invention briefly described above will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. Understanding that these drawings depict only typical embodiments of the invention and are not therefore to be considered to be limiting of its scope, the invention will be described and explained with additional specificity and detail through the use of the accompanying drawings in which:
The kiosk system 100 may include the computer 102, a video display 116, and input devices 120, 122, 124. In addition, the kiosk system 100 can have any of a number of other output devices including line printers, laser printers, plotters, and other reproduction devices connected to the computer 102. The kiosk system 100 can be connected to one or more other computers via a communication interface 108 using an appropriate communication channel 130 such as a modem communications path, a computer network, or the like. The computer network may include a local area network (LAN), a wide area network (WAN), an Intranet, and/or the Internet.
The computer 102 may comprise a processor 104, a memory 106, input/output interfaces 108, 118, a video interface 110, an avatar generation engine 112, and a bus 114. Bus 114 may permit communication among the components of the computer 102.
Processor 104 may include at least one conventional processor or microprocessor that interprets and executes instructions. Memory 106 may be a random access memory (RAM) or another type of dynamic storage device that stores information and instructions for execution by processor 104. Memory 106 may also include a read-only memory (ROM) which may include a conventional ROM device or another type of static storage device that stores static information and instructions for processor 104.
The video interface 110 is connected to the video display 116 and provides video signals from the computer 102 for display on the video display 116. User input to operate the computer 102 can be provided by one or more input devices 120, 122, 124 via the input/output interface 118. For example, an operator can use the keyboard 124 and/or a pointing device such as the mouse 122 to provide input to the computer 102. In some aspects, the camera 120 may provide video data to the computer 102.
The kiosk system 100 and computer 102 may perform such functions in response to processor 104 executing sequences of instructions contained in a computer-readable medium, such as, for example, memory 106. Such instructions may be read into memory 106 from another computer-readable medium, such as a storage device, or from a separate device via communication interface 108.
The kiosk system 100 and computer 102 illustrated in
Referring now to
As shown in
The gait module may observe step size and/or frequency, body tilt, or the like. The physical features module may perform height and weight estimation, for example, via a calibrated camera. The age/gender module may determine whether a user is young, middle-aged, or old based on determined thresholds, as well as determine the gender of the user.
The facial features module may observe iris color, emotion, a mustache, or the like, while the skin features module may observe skin tone. The hair features module may observe hair tone and texture, length of hair, and the like. The dressing features module may observe clothing tone and texture, the amount of exposed skin area, and the type of attire (e.g., t-shirt, jeans, or suit). The accessories module may observe glasses, piercings, tattoos, or the like, while the shoe module may differentiate between athletic, casual, and formal shoes.
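As a rough, hypothetical illustration of how individual modules might each emit a fixed-length feature vector, the toy functions below compute simple color statistics over regions of a frame; the region heuristics and the module registry are assumptions for illustration only, and practical modules would rely on far more capable computer-vision models.

```python
import numpy as np


def hair_features(frame: np.ndarray) -> np.ndarray:
    """Toy hair-feature vector: mean color of the top rows of an H x W x 3 frame."""
    top = frame[: frame.shape[0] // 8]
    return top.reshape(-1, 3).mean(axis=0) / 255.0


def dressing_features(frame: np.ndarray) -> np.ndarray:
    """Toy dressing-feature vector: mean color and color variance of the torso region."""
    torso = frame[frame.shape[0] // 3 : 2 * frame.shape[0] // 3]
    pixels = torso.reshape(-1, 3) / 255.0
    return np.concatenate([pixels.mean(axis=0), pixels.var(axis=0)])


# Hypothetical registry; gait, age/gender, accessories, and shoe modules would be added here.
VISUAL_ANALYSIS_MODULES = [hair_features, dressing_features]
```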
The avatar generation engine 112 may include a user group classifier module 252 and a prominent feature filter 254. The user group classifier module 252 receives the aggregate feature vector and determines, using pattern classification techniques such as nearest-neighbor classification or K-means clustering, the user group to which the user most likely belongs. The determination of the user group may be a selection among a number of user groups stored in an avatar database 256 along with at least one avatar representative of each user group. The number of user groups, as well as the group with which a given aggregate feature vector is associated, can be modified dynamically as more information is gathered from users or as input by a system administrator.
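The classification step could be realized, for example, with an off-the-shelf nearest-neighbor classifier. The sketch below uses scikit-learn for this purpose; the vector length, the group labels, the neighbor count, and the randomly generated stand-in training data are all placeholders rather than details taken from the disclosure.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Placeholder training data: aggregate feature vectors labeled with user groups
# (in practice, labels could come from the avatar database 256 or registered users).
rng = np.random.default_rng(0)
X_train = rng.random((60, 16))
y_train = rng.choice(["group_a", "group_b", "group_c"], size=60)

classifier = KNeighborsClassifier(n_neighbors=5)     # nearest-neighbor user group classifier
classifier.fit(X_train, y_train)

aggregate_vector = rng.random((1, 16))               # aggregate vector for the current user
likely_group = classifier.predict(aggregate_vector)[0]
```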
The avatars representative of each user group may also be dynamically updated as more users are associated with each group. For example, if a certain percentage of users associated with a user group include the same prominent features, as determined by the prominent feature filter 254 (discussed below), the avatar associated with that user group may be modified to include that prominent feature. The avatars may also be updated from time to time by the system administrator to more accurately reflect the always-changing identity of each user group.
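By way of a toy example of such an update rule, the snippet below flags a prominent feature for inclusion in a group's avatar when it has been observed in at least a chosen fraction of that group's users; the threshold, the feature labels, and the counting approach are illustrative assumptions.

```python
from collections import Counter


def features_to_add(observed: list[str], total_users: int, threshold: float = 0.5) -> list[str]:
    """Return prominent features seen in at least `threshold` of a group's users."""
    counts = Counter(observed)
    return [feature for feature, n in counts.items() if n / total_users >= threshold]


# Hypothetical observation history for one user group.
print(features_to_add(["glasses", "glasses", "nose_piercing", "glasses"], total_users=4))
# -> ['glasses']
```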
The prominent feature filter 254 also receives the aggregate feature vector. The prominent feature filter 254 is configured to determine prominent features of the user based on the aggregate feature vector representative of the image data from the camera 120. A number of agents can be designed to detect, for example, the unusual or distinguishing features of the user, such as green hair, a nose piercing, etc. The avatar generation engine 112 may be configured to customize the avatar selected by the user group classifier module 252 by adding the prominent features of the user identified by the prominent feature filter 254. The avatar generation engine 112 can then output the customized avatar to the display 116 of the kiosk system 100 for presentation to and interaction with the user.
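One hedged way to implement such agents is as a collection of simple predicates over named positions in the aggregate feature vector, as sketched below; the vector layout, thresholds, and feature names are hypothetical and chosen only to make the example concrete.

```python
import numpy as np

# Hypothetical layout: named positions within the aggregate feature vector.
HAIR_HUE, HAIR_SATURATION, NOSE_PIERCING_SCORE = 0, 1, 2


def green_hair_agent(v: np.ndarray) -> str | None:
    return "green_hair" if 0.2 < v[HAIR_HUE] < 0.45 and v[HAIR_SATURATION] > 0.6 else None


def piercing_agent(v: np.ndarray) -> str | None:
    return "nose_piercing" if v[NOSE_PIERCING_SCORE] > 0.8 else None


PROMINENT_FEATURE_AGENTS = [green_hair_agent, piercing_agent]


def prominent_features(aggregate_vector: np.ndarray) -> list[str]:
    """Collect the prominent features flagged by each agent."""
    hits = (agent(aggregate_vector) for agent in PROMINENT_FEATURE_AGENTS)
    return [h for h in hits if h is not None]
```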
For illustrative purposes, the avatar generation process of the avatar generation engine 112 will be described below in relation to the block diagrams shown in
In step 3300, the visual analysis modules 250 each generate a feature vector. It should be appreciated that each feature vector can be generated based on a single frame of image data or based on a series of frames of image data. One skilled in the art will recognize the benefit of considering at least a nominal number of frames when generating the feature vectors. The feature vectors are combined into an aggregate feature vector that is input to the user group classifier module 252.
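The disclosure does not prescribe how the per-module vectors are combined; one straightforward possibility, sketched below, is to average each module's output over a short window of frames and then concatenate the results into the aggregate feature vector.

```python
import numpy as np


def aggregate_feature_vector(frames: list[np.ndarray], modules) -> np.ndarray:
    """Average each module's per-frame vector over the window, then concatenate."""
    per_module = []
    for module in modules:
        vectors = np.stack([module(frame) for frame in frames])   # shape: (n_frames, dim)
        per_module.append(vectors.mean(axis=0))                   # smooth over the frames
    return np.concatenate(per_module)
```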
The process continues to step 3400, where the user group classifier module 252 associates the user with a user group that is determined to be the most likely group for that user based on the aggregate feature vector. Control then continues to step 3500, where the avatar generation engine 112 retrieves the avatar for the associated user group from the database 256 of avatars and associates the retrieved avatar with the user. Control proceeds to step 3600.
Next, in step 3600, the prominent feature filter 254 determines whether the user displays any prominent features based on the aggregate feature vector compiled from the feature vectors of the visual analysis modules 250. The feature vectors, and thus the aggregate feature vector, may be continuously updated throughout this process. The process then goes to step 3700.
If, in step 3700, the avatar generation engine 112 determines that the user possesses one or more prominent features, control proceeds to step 3800. In step 3800, the avatar generation engine 112 customizes the user's avatar with prominent feature information recommended by the prominent feature filter 254. Control then goes to step 3900, where the customized avatar is output for user interaction, for example, via the display 116 of the kiosk system 100. Control then proceeds to step 4000, where control returns to step 3600.
If, in step 3700, the avatar generation engine 112 determines that the user does not possess one or more prominent features, control goes to step 3900 without customization to the retrieved avatar. In step 3900, the avatar is output for user interaction, and control goes to step 4000, where control returns to step 3600.
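Putting the steps together, a control loop along the following lines could keep refining the avatar as fresh image data arrives; every callable, as well as the `customized_with` method on the avatar object, is a hypothetical stand-in for the components described above rather than part of the disclosure.

```python
def avatar_interaction_loop(frame_source, compute_aggregate, classify,
                            lookup_avatar, find_prominent, display):
    """Illustrative loop mirroring steps 3300-4000 described above."""
    frames = [next(frame_source)]
    aggregate = compute_aggregate(frames)              # step 3300: aggregate feature vector
    avatar = lookup_avatar(classify(aggregate))        # steps 3400-3500: group -> avatar

    while True:
        features = find_prominent(aggregate)           # step 3600: prominent features
        if features:                                   # step 3700: any prominent features?
            avatar = avatar.customized_with(features)  # step 3800: customize (hypothetical API)
        display(avatar)                                # step 3900: present for interaction
        frames.append(next(frame_source))              # step 4000: return to step 3600
        aggregate = compute_aggregate(frames)          # updated with the latest frames
```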
As the feature vectors and aggregate feature vector are continuously updated based on the latest frames of image data, the prominent feature filter 254 may determine, in step 3600, additional prominent features of the user that may be used to further customize the avatar in step 3700. It should be appreciated that, in some exemplary embodiments, the process of
Referring now to
Classifier A 460 may be configured to determine a target user group for the user based on the inputted personal information and preferences. The training module 464 may be configured to attempt to associate the aggregate feature vector, received from the video tracking input (e.g., camera 120) via the visual analysis modules 250, with the target user group determined by classifier A 460. As a result of this association of profile information and video data, the training module 464 may provide the parameters for classifier B 462.
Classifier A 460 may be dedicated to offline training, such as, for example, via user registration information, and can therefore provide reliable user group classification. For a first-time user, however, the user's personal information and preferences are not available. Thus, the user group classifier module 252 may rely on classifier B 462 to provide a most likely user group classification based solely on visual features received via the visual analysis modules 250.
After a user is registered and new personal information and preferences are input, classifier B's determination may need to be slightly adjusted. This adjustment may be referred to as incremental online training. Again, the detailed user profile information and/or user preferences are given to classifier A 460. If the output of classifier A 460 differs from that of classifier B 462, then classifier B 462 is adjusted accordingly toward the target user group determined by classifier A 460.
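The disclosure leaves the learning algorithm open; as one possible realization of the incremental online training, the sketch below treats classifier B as a scikit-learn SGDClassifier updated with partial_fit only when its prediction disagrees with the target user group supplied by classifier A. The group labels, vector length, and bootstrap data are placeholders.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

GROUPS = np.array(["group_a", "group_b", "group_c"])          # hypothetical user groups
rng = np.random.default_rng(0)

# Classifier B: trained incrementally on visual features alone (bootstrapped on placeholder data).
classifier_b = SGDClassifier(random_state=0)
classifier_b.partial_fit(rng.random((30, 16)), rng.choice(GROUPS, 30), classes=GROUPS)


def incremental_update(aggregate_vector: np.ndarray, target_group: str) -> None:
    """Nudge classifier B toward the target group determined by classifier A."""
    prediction = classifier_b.predict(aggregate_vector.reshape(1, -1))[0]
    if prediction != target_group:                            # outputs differ
        classifier_b.partial_fit(aggregate_vector.reshape(1, -1), [target_group])
```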
Embodiments within the scope of the present disclosure may also include computer-readable media for carrying or having computer-executable instructions or data structures stored thereon. Such computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to carry or store desired program code means in the form of computer-executable instructions or data structures. When information is transferred or provided over a network or another communications connection (hardwired, wireless, or a combination thereof) to a computer, the computer properly views the connection as a computer-readable medium. Thus, any such connection is properly termed a computer-readable medium. Combinations of the above should also be included within the scope of the computer-readable media.
Computer-executable instructions include, for example, instructions and data which cause a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. Computer-executable instructions also include program modules that are executed by computers in stand-alone or network environments. Generally, program modules include routines, programs, objects, components, and data structures, etc. that perform particular tasks or implement particular abstract data types. Computer-executable instructions, associated data structures, and program modules represent examples of the program code means for executing steps of the methods disclosed herein. The particular sequence of such executable instructions or associated data structures represents examples of corresponding acts for implementing the functions described in such steps.
It will be apparent to those skilled in the art that various modifications and variations can be made in the devices and methods of the present disclosure without departing from the scope of the invention. Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. It is intended that the specification and examples be considered as exemplary only.