In general, the present invention relates to systems and methods that are used to create virtual avatars and/or virtual objects that are displayed when a user interacts with a computer interface. More particularly, the present invention relates to virtual avatars and/or virtual objects that appear to project vertically above or in front of a display screen and are viewed while listening and/or speaking during an audio broadcast or other audio communication.
People interact with computers for a wide variety of reasons. As computer software becomes more sophisticated and processors become more powerful, computers are being integrated into many parts of everyday life. In the past, people had to sit at a computer keyboard or engage a touch screen to interact with a computer. In today's environment, many people interact with computers merely by talking to them. Various companies have developed voice recognition interfaces. For example, Apple Inc. has developed Siri® to enable people to verbally interact with their iPhones®. Amazon Inc. has developed Alexa® to enable people to search the world wide web and order products through Amazon®.
Although interacting with a computer via a voice recognition interface is far more dynamic than a keypad or touch pad, it still has drawbacks. When two humans communicate face to face, many of the communication cues used in the conversation are visual in nature. The manner in which people move their eyes or tilt their heads provides additional meaning to words that are being spoken. When communications are purely based on audio signals, such as during a phone call, much of the nuance is lost. Likewise, when a computer communicates with a human through an audio interface, nuanced information is lost.
In order for a computer to provide a visual communication cue or response, it must provide an image of a person or object through which it can communicate or provide an active response. A virtual image of a person in a computer-generated environment is commonly called an avatar.
In the prior art, there are many systems that use avatars to transmit visual communication cues. In U.S. Patent Application Publication No. 2006/0294465 to Ronene, an avatar system is provided for a smart phone. The avatar system provides a face that changes expression in the context of a conversation. The avatar can be customized and personalized by a user.
A similar system is found in U.S. Patent Application Publication No. 2006/0079325 to Trajkovic, which shows an avatar system for smart phones. The avatar can be customized, with aspects of the avatar selected from a database.
U.S. Patent Application Publication No. 2013/0212501 to Anderson presents an avatar system that enables a computer, such as a personal computer, to provide visual cues to a user who is interacting with the computer. The avatar is customizable and changes with changing context in the communication.
An obvious problem with such prior art avatar systems is that the avatar is two-dimensional. Furthermore, the avatar is displayed on a screen that may be less than two inches wide. Accordingly, many of the visual cues that can be performed by the avatar can be difficult to see and easy to miss.
Little can be done to change the screen size on many devices such as smart phones. However, many of the disadvantages of a small two-dimensional avatar can be minimized by presenting an avatar that is three-dimensional. This is especially true if the three-dimensional effects designed into the avatar cause the avatar to appear to project out of the plane of the display. In this manner, the avatar will appear to project above or forward of the smart phone or other device during a conversation.
The best avatar would be a virtual avatar that appears as a stereoscopic or auto-stereoscopic image that projects forward of or in front of the plane of a display screen. The display screen can be placed in a vertical position common to televisions or desktop computer displays whereby the viewer would look straight ahead at the display. Alternatively, a display screen can be placed horizontally in a flat position somewhat in front of the viewer, whereby the viewer would look downward at the display. In this position, the avatar would appear to project vertically from, or above the plane of the display screen.
Three-dimensional images that are presented in this manner are particularly useful in creating avatars or objects that can be functionally viewed and manipulated during cellular phone calls, video calls, cellular or video phone conferences, cellular or video business presentations, cellular or video product presentations, cellular or video instructional and/or training presentations, acting as a virtual receptionist, a virtual museum guide and more. The virtual image of the avatar or object appears to float in front of, or to stand atop the screen, as though the image is projected into the space in front of, or above the screen.
In the prior art, there are many systems that exist for creating stereoscopic and auto-stereoscopic images that appear three-dimensional. However, most prior art systems create three-dimensional images that appear to exist behind or below the plane of the electronic screen. That is, the three-dimensional effect would cause an avatar to appear to be behind the screen of a smart phone. The screen of the smart phone would appear as a window onto the underlying three-dimensional virtual environment. With a small screen, this limits the effect of the avatar and its ability to provide visual communication cues.
A need therefore exists for creating an avatar that can be used to provide visual communication cues, wherein the avatar appears three-dimensional and also appears to extend out from the electronic display from which it is shown. That is, the three-dimensional avatar would appear to be projected forward of or vertically above the screen of the electronic display, depending upon the orientation of the display. This need is met by the present invention as described and claimed below.
The present invention is a system and method of providing a virtual avatar or object to accompany audio signals being broadcast or to enhance any other form of audio communication from or to an electronic device that has a display screen. In the system, a virtual avatar model is created. The virtual avatar model is altered in real time in response to audio signals being broadcast from or to the electronic device. A 3D stereoscopic or auto-stereoscopic video file is created using the virtual avatar model while the virtual avatar model is responding to the audio signals.
The 3D video file is played on the display screen of the electronic device. When viewed, the 3D video file shows an avatar that appears, at least in part, to a viewer to be three-dimensional. Furthermore, the avatar appears to extend out from the display screen. The result is a three-dimensional avatar that appears to extend out of a display screen, wherein movements of the avatar are synchronized or nearly synchronized to audio signals that are being broadcast.
For a better understanding of the present invention, reference is made to the following description of exemplary embodiments thereof, considered in conjunction with the accompanying drawings, in which:
Although the present invention system and method can be used to create and display virtual avatars and/or objects of many types, the illustrated embodiments show the system creating an avatar of an exemplary person for the purposes of description and discussion. Additionally, although the avatar can be displayed on any type of electronic display, the illustrated embodiments show the avatar displayed on the screen of a smart phone and on a screen of a stationary display. These two embodiments are selected for the purposes of description and explanation only. The illustrated embodiments, however, are merely exemplary and should not be considered a limitation when interpreting the scope of the appended claims.
Referring to
As will be explained, the application software 13 generates a 3D video stream 15 that is either stereoscopic or auto-stereoscopic in nature depending upon the display screen 12 where it is being viewed. The 3D video stream 15 presents a virtual scene 14 when viewed on the display screen 12. The virtual scene 14 includes an avatar 16. The virtual scene 14 has features that appear three-dimensional to a person viewing the virtual scene 14 on the display screen 12. The 3D video stream 15 can be generated using many methods. Most methods involve imaging an element in a virtual environment from two stereoscopic viewpoints. The stereoscopic images are superimposed and are varied in color, polarity or another manner that enables the stereoscopic images to be viewed differently between the left and right eye. For the stereoscopic images to appear to be three-dimensional to a viewer, the stereoscopic images must be viewed with specialized 3D glasses or viewed on a specialized display, such as an auto-stereoscopic display. In this manner, different aspects of the stereoscopic images can be perceived by the left and right eyes, therein creating a three-dimensional effect.
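By way of example and not limitation, the superimposition of two stereoscopic viewpoints can be sketched in code as follows. The sketch assumes a red-cyan anaglyph color scheme and assumes the left-eye and right-eye views of the virtual scene 14 have already been rendered as RGB image arrays; it illustrates only one of the many methods noted above.

```python
import numpy as np

def compose_anaglyph(left_rgb: np.ndarray, right_rgb: np.ndarray) -> np.ndarray:
    """Superimpose two stereoscopic views as a red-cyan anaglyph.

    left_rgb and right_rgb are (height, width, 3) uint8 arrays.
    The left view supplies the red channel and the right view supplies
    the green and blue channels, so red-cyan 3D glasses deliver a
    different image to each eye, creating the three-dimensional effect.
    """
    anaglyph = np.empty_like(left_rgb)
    anaglyph[..., 0] = left_rgb[..., 0]    # red channel from the left eye
    anaglyph[..., 1] = right_rgb[..., 1]   # green channel from the right eye
    anaglyph[..., 2] = right_rgb[..., 2]   # blue channel from the right eye
    return anaglyph
```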
Referring to
Referring to
The purpose of the avatar 16 being displayed is to provide a means of adding visual cues to what would otherwise be merely audio communications, such as a phone call. In order for the avatar 16 to provide relevant visual cues, the avatar 16 must be updated in real time and remain in sync with the changing audio signals 26 being heard by the person viewing the avatar 16. Adapting an avatar 16 to provide visual cues to audible communications is a three-step process.
Referring to
Once the software application 13 is downloaded onto the electronic device 10, the second step is to create or select a virtual avatar model 20 for use as the virtual subject of the 3D video stream 15. It will be understood that the general steps of selecting an avatar, Block 21, and creating a virtual avatar model 20 contain sub-steps. The software application 13, through the electronic device 10, instructs a user to choose a virtual avatar model 20. The virtual avatar model 20 can have a generic form 34, a semi-custom form 36, or a full custom form 38.
The generic form 34 of the avatar model 20 would be a selection from a menu of generic avatar models that are stored in an avatar catalog database 40 at the server 28. The generic form 34 can be a man, a woman, or any other creature or object, including licensed fantasy characters, virtual animals and virtual pets. The apparel and other accessories for the generic form 34 may be provided or may be selected. If not provided, various types of clothing, uniforms, equipment, and accessories may be selected from an accessory database 42 at the server 28.
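By way of example and not limitation, an entry in such a catalog might be represented as sketched below. The class name VirtualAvatarModel and its fields are hypothetical and are chosen only to mirror the base form, face, and accessory options drawn from the avatar catalog database 40 and the accessory database 42.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class VirtualAvatarModel:
    base_form: str                      # e.g. "man", "woman", "virtual_pet"
    face_texture: Optional[str] = None  # left blank for the semi-custom form
    accessories: List[str] = field(default_factory=list)

# Selecting a generic form and dressing it from the accessory database:
catalog = [VirtualAvatarModel("man"), VirtualAvatarModel("woman")]
chosen = catalog[0]
chosen.accessories += ["business_suit", "glasses"]
```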
The semi-custom form 36 is selected in the same manner as is the generic form 34. However, the face of the semi-custom form 36 is left blank on the virtual avatar model 20, or may be made to appear blank. A user then downloads one or more images of a face. This process can be dynamic, where different face images are used for different purposes. The images are modeled onto the blank face of the semi-custom form 36 using image integration software 44. See Block 45. There are several commercially available image integration software programs that enable a person to wrap a two-dimensional image of a face onto a three-dimensional avatar model. Such applications are exemplified by U.S. Patent Application Publication No. 2012/0113106 to Choi, entitled Method And Apparatus For Generating Face Avatar, the disclosure of which is herein incorporated by reference.
For generic forms 34 and semi-custom forms 36 of the virtual avatar model 20, the application software 13 provides a user with the ability to detail, personalize, and change the virtual avatar model 20 as desired, and as described above. Using the accessory database 42, a user can select hair length, hairstyle, hair color, skin color, and various other clothing and accessory options. Once the virtual avatar model 20 is complete, the virtual avatar model 20 is saved for use in animation and then for the generation of the 3D video stream 15.
The full custom form 38 of the virtual avatar model 20 can be created by downloading a full body scan or a picture set of the body of the user. After downloading such images of the user, the scans or pictures are virtually wrapped around the full custom form 38 using the image integration software 44. Such avatar creation techniques are disclosed in U.S. Patent Application Publication No. 2012/0086783 to Sareen, entitled System And Method For Body Scanning And Avatar Creation, the disclosure of which is incorporated by reference. The full custom form 38 is dressed and has the general appearance of the specific user, including such details as the appropriate hair length, hair color and skin color, since it is created from scans or photo files. Accessories can be added to the full custom form 38 using the accessory database 42.
The third step in adapting an avatar 16 to provide visual cues to audible communications is to create the 3D video stream 15 from the virtual avatar model 20 in real-time or near-real-time synchronization with the audio signals 26. The virtual avatar model 20 itself has no artificial intelligence programming. Rather, the virtual avatar model 20 is a digital puppet that must be linked to a separate control element to control movement. The control elements for the virtual avatar model 20 are the audio signals 26 that the avatar 16 is being used to help communicate. Sound synchronizing programs 46 and/or word recognition programs 48 are used to create changes in the virtual avatar model 20. Changes in the virtual avatar model 20 may include changes in facial expressions and/or changes in body movement. In a simple embodiment, the virtual avatar model 20 is provided with a mouth 50. A sound synchronizing program 46 can be used to move the mouth 50 on the virtual avatar model 20 in synchronization with a voice in a conversation. Similarly, the volume and tone of the words being communicated can be detected. Depending on whether a person is speaking calmly or is yelling, preprogrammed movements in the head and body of the virtual avatar model 20 can be triggered. As such, a person can tell if a caller is speaking calmly or yelling just by looking at the body movements or facial expressions of the avatar 16 being displayed. Likewise, if music is playing, simple body movements in the virtual avatar model 20 can be set to the beat of the music. Accordingly, a person can tell that they have been placed on hold by viewing the avatar 16 dance to the on-hold music being played.
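By way of example and not limitation, the mouth synchronization described above can be sketched by mapping the loudness of the audio signals 26 to how far the mouth 50 is opened in each video frame. The function names and the avatar interface in the sketch are hypothetical; a production sound synchronizing program 46 would be considerably more sophisticated.

```python
import numpy as np

def mouth_openness_per_frame(samples: np.ndarray, sample_rate: int, fps: int = 30):
    """Yield one mouth-openness value (0.0 to 1.0) per video frame.

    samples holds mono audio as floats in the range [-1.0, 1.0].
    The loudness (RMS) of each frame-sized slice of audio is mapped to
    how far the virtual mouth is opened, so the mouth moves in rough
    synchronization with the voice being heard.
    """
    hop = sample_rate // fps                       # audio samples per video frame
    for start in range(0, len(samples) - hop, hop):
        rms = float(np.sqrt(np.mean(samples[start:start + hop] ** 2)))
        # A sustained rms above an empirical threshold could likewise
        # trigger the preprogrammed "yelling" head and body movements.
        yield min(1.0, rms * 10.0)                 # scale factor chosen empirically

# Hypothetical usage, posing the digital puppet before each frame is imaged:
#   for openness in mouth_openness_per_frame(audio, 44100):
#       avatar_model.set_mouth_openness(openness)
```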
Using word recognition software 48, certain trigger words or phrases, such as “I love you”, can be identified. This can likewise trigger certain movement algorithms for the avatar model 20, and/or trigger various graphic effects to be added to the virtual three-dimensional scene along with the avatar model 20. The graphic effects that are added may include word balloons, emoticons, or other graphic images visually communicating the underlying tone and meaning of the speaker related to the message being verbally communicated, or to enhance the virtual scene in any other way. Animation software for avatars that is based upon audio signals is exemplified by U.S. Pat. No. 8,125,485 to Brown, entitled Animating Speech Of An Avatar Representing A Participant In A Mobile Communication, the disclosure of which is herein incorporated by reference.
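By way of example and not limitation, the trigger-word mechanism can be sketched as a lookup from recognized phrases to preprogrammed movements and graphic effects. The transcript is assumed to come from any speech-to-text engine, and the trigger_animation and add_graphic hooks, along with the phrase-to-effect pairings, are hypothetical.

```python
# Hypothetical mapping from trigger phrases to preprogrammed responses.
TRIGGERS = {
    "i love you": ("blow_kiss", "floating_hearts"),
    "congratulations": ("clap", "confetti"),
}

def handle_transcript(transcript: str, avatar, scene) -> None:
    """Fire preprogrammed movements and graphic effects for trigger phrases."""
    text = transcript.lower()
    for phrase, (animation, effect) in TRIGGERS.items():
        if phrase in text:
            avatar.trigger_animation(animation)  # preprogrammed movement algorithm
            scene.add_graphic(effect)            # word balloon, emoticon, etc.
```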
The sound synchronization software 46 and the word recognition software 48 trigger preprogrammed changes in the virtual avatar model 20. However, the virtual avatar model 20 is a virtual digital construct. The virtual avatar model 20 must be imaged to create the 3D video stream 15 as the virtual avatar model 20 changes with the audio signals 26. Accordingly, as the virtual avatar model 20 changes, it is virtually imaged at a video frame rate of at least 30 frames per second. The result is the production of the 3D video stream 15. It is the 3D video stream 15 that is displayed on the display screen 12 of the electronic device 10. The 3D video stream 15 is either a stereoscopic video stream or an auto-stereoscopic video stream, depending upon the design of the display screen 12. As such, when the 3D video stream 15 is viewed, the avatar 16 being presented appears three-dimensional, either when viewed with 3D glasses or when displayed on an auto-stereoscopic display without specialized glasses. In either case, the avatar 16 will appear to extend forward of or above the display screen 12 when viewed in the proper manner.
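By way of example and not limitation, the imaging loop described above can be sketched as follows. The avatar_model, audio_source, and display objects are hypothetical stand-ins for the components already described, and compose_anaglyph is the function sketched earlier.

```python
import time

def stream_avatar(avatar_model, audio_source, display, fps: int = 30) -> None:
    """Image the changing avatar model at a video frame rate of at least 30 fps.

    Each pass applies the latest audio-driven pose to the model, renders
    the two stereoscopic viewpoints, and hands the composed frame to the
    display screen.
    """
    frame_period = 1.0 / fps
    while audio_source.is_active():                      # hypothetical source API
        t0 = time.monotonic()
        avatar_model.apply_audio(audio_source.latest())  # pose the digital puppet
        left = avatar_model.render(eye="left")           # first stereoscopic viewpoint
        right = avatar_model.render(eye="right")         # second stereoscopic viewpoint
        display.show(compose_anaglyph(left, right))      # sketched earlier
        time.sleep(max(0.0, frame_period - (time.monotonic() - t0)))
```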
The avatar 16 is particularly useful when communicating between computers or between smart phones. The avatar 16 does not monitor the exact movements of a caller. Rather, the avatar 16 moves in response to the words and/or message communicated. The activation of the avatar 16 may be linked to a smart phone application so that every time a certain person calls, the avatar 16 for that person is displayed. When a user calls another smart phone over the cellular network 30, the avatar 16 of the caller can be transmitted with the call as a data file. Alternatively, the avatar 16 can be retrieved by the recipient of the call from data stored in a previously downloaded software application. In this case, the recipient of the call has previously loaded the proper application software 13 onto his/her phone. The caller's avatar 16 is selected and retrieved from the pre-installed application software 13, and appears when the call is answered, or when triggered by the recipient of the call. The avatar 16 of the person who placed the call will therefore appear on the smart phone of the person who was called. Likewise, either when placing the call or when the call is answered, the avatar of the recipient of the call will appear on the caller's smart phone.
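By way of example and not limitation, the two delivery paths described above can be sketched as a simple lookup. The call object and the local_store mapping are hypothetical stand-ins for the incoming call data and the pre-installed application software 13.

```python
def avatar_for_incoming_call(call, local_store):
    """Return the caller's avatar, preferring locally stored application data."""
    if call.caller_id in local_store:       # previously downloaded with the app
        return local_store[call.caller_id]
    return call.attached_avatar_data        # transmitted with the call as a data file
```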
In the earlier embodiment, the avatar 16 is shown in use with a smart phone 11. Although the avatar 16 is well suited to providing visual cues to what would otherwise be purely verbal communication, other applications exist. Referring to
In this embodiment, it will be noted that the avatar 64 is merely a bust and not a full body. This makes the features of the face more noticeable. The information unit 60 may be provided with a limited selection of informative answers. As such, when a person presses a "play" button 65 on the information unit 60, the information will play. Additionally, the information can be triggered to play by other methods, such as voice activation by viewers or sensors built into or near the information unit 60 that detect possible viewers. The avatar 64 can be synchronized with the information being played, including realistic lip movements matched to the words and facial expressions in context with the information relayed.
Alternatively, the information unit 60 can be integrated with a computer system 66 that is linked to the worldwide web 68. The computer system 66 can be loaded with an interactive computer interface 70, such as Siri® by Apple Inc. or Alexa® by Amazon Inc. This will enable the information unit 60 to answer a large variety of questions. Since the questions and the replies are not known in advance, the system uses voice synchronization and word recognition software to alter the avatar 64 as the avatar 64 interacts with a user.
Additionally, in the same manner as described above, the avatar 64 can be scaled in size to display arms and hands. Word recognition algorithms can be used to trigger pre-programmed “signing” motions of the hands of the avatar 64, or of a set of hands only, to facilitate communications with a person who has a hearing deficit.
It will be understood that the embodiments of the present invention that are illustrated and described are merely exemplary and that a person skilled in the art can make many variations to those embodiments. All such embodiments are intended to be included within the scope of the present invention as defined by the claims.
This application claims the benefit of U.S. Provisional Patent Application No. 62/319,792, filed Apr. 8, 2016.