This disclosure relates to virtual reality devices and systems for providing virtual encounters using virtual devices for communication, observation, and contact.
People can be separated by physical distances and yet can interact by conventional technologies such as telephones and teleconferencing. More recently, with the advent of networking and especially the Internet, people can hear each other's voices and see each other's images. Other developments have increased the perception of physical closeness.
For example, various types of virtual encounters are described in my published patent application US 2005-0130108 A1 published Jun. 16, 2005. In the published application, a mannequin or a humanoid-type robot can be deployed as a surrogate for a human. In one type of encounter, a mannequin can be paired with a remote set of goggles. In another type, the surrogate is configured such that a human with sensors can produce actuation signals that are sent to actuators of a remote robot to control movement of the robot remotely through the actuation signals. Conversely, in another type of encounter, a humanoid robot can be configured with sensors for sending sensor signals to a body suit having actuators that receive the sensor signals, such that a person wearing the body suit feels what the humanoid robot senses.
Also disclosed in other types of encounters is the use of a camera supported by a surrogate sending video images that are overlaid with a virtual scene, which images are rendered by goggles worn by a user, or the video images can be morphed into a different image that is rendered by the goggles.
Also disclosed in my published application is the use of a pair of surrogates and a pair of humans that are configured such that a first one of the pair of humans in a first location has its own surrogate in a remote second location and through the surrogate can experience stimuli that occur at the second location, whether those stimuli are tactile, auditory, visual, etc., and vice versa.
The virtual encounters disclosed in the above mentioned published application involve pairings. Another type of virtual encounter is a multiple-pairing type of virtual encounter that involves several (more than two) people at two locations interacting in the locations simultaneously in a common session. At each location there would be some number of surrogates (mannequin or robotic types). Each user would select or be assigned a remote surrogate, hereinafter referred to as a surrogate. Thus, each user will see out of that surrogate's eyes (e.g., camera), hear out of that surrogate's ears (microphone) and feel out of that surrogate's tactile sensors that are positioned anywhere and everywhere on the surrogate's body.
One problem with the multiple-pairing type of virtual encounter is that if there are one or more additional surrogates at a given remote location (beyond the one surrogate that the user selected, e.g., the surrogate that the user sees/hears/feels out of and controls), then that user will see those other surrogates rather than the humans they represent. Described below are techniques that modify what a given user will see when the additional people (more than two) are also represented by a surrogate/robot. The techniques address the problem of a person who, via the “eyes,” i.e., cameras, of the surrogate that represents the person, sees one of the other surrogates. The person's view is modified such that real time image modification replaces the image of that other surrogate with a corresponding image of the person that the surrogate represents.
According to an aspect, a virtual reality encounter system includes a first surrogate supporting at least one first camera that captures image data from a first physical location in which the first surrogate is disposed to produce a first image signal, a second surrogate supporting at least one second camera that captures second image data from the first physical location in which the second surrogate is disposed to produce a second image signal, a processor configured to receive the first image signal, detect an image of the second surrogate in the first image signal, replace the image data of the second surrogate in the first physical location, with image data of a user in the first physical location to form a transformed image that substitutes the image data of the user for the image data of the second surrogate, and a user device comprising a display and transducer, the user device disposed in the second location, with the display configured to receive the transformed image.
Other aspects include methods and computer program products stored on hardware storage devices that are non-transitory, and which include volatile and/or non-volatile memory devices and storage devices.
A solution to the above problem is to apply real time image transformation, so that rather than users seeing surrogates (whether mannequins or robotic) at the remote location, users see the humans that the surrogates represent. In other words, the image is changed in real time so that the image of the surrogate is replaced with an image of the human that the surrogate represents. The image replacement can include producing a series of images corresponding to movements of the associated human. One or more of the aspects above have one or more of the following advantages. The virtual encounter system adds a higher level of perception for groups of several people being perceived as being in the same place. Aspects of the system allow groups of two people to touch and to feel each other as well as manipulate objects in each other's environment. People can change their physical appearance in the virtual environment so that they seem taller or thinner to the other person or become any entity of their own choosing.
Referring to
As will be explained below, when user 14a interacts with surrogate 12a in location 11a by seeing and hearing through the surrogate 12a, the user 14a actually perceives seeing user 14b and hearing user 14b in location 11b. Likewise, user 14b listens and sees through surrogate 12b, but perceives listening and seeing user 14a in location 11a. Details of the gateways 16a and 16b are discussed below. Suffice it to say that the gateways 16a and 16b execute processes to process and transport raw data produced by devices, for instance, when users 14a and 14b interact with respective surrogates 12a and 12b. Cameras and microphones carried on surrogates provide images and audio that are sent to user goggles, which allow a user to see and hear what a corresponding surrogate sees and hears.
In the discussion below, a user is considered “paired” with a surrogate when the user and paired surrogate are in different locations (i.e., the surrogate in one location acts as a “stand in” at that location in place of the user in the different location), and the user is considered “associated” with a surrogate when that user and surrogate are physically in the same location and the user interacts with that surrogate in that same physical location.
Thus in
Also shown in
With respect to user 14a at location 11a, user 14a will see user 14b as above through surrogate 12b, but at times user 14a will also see surrogate 13b through surrogate 12b. It is desired that, rather than seeing surrogate 13b, user 14a instead see user 15a, who is paired with surrogate 13b. That is, user 14a sees surrogate 13b because user 14a, while interacting with surrogate 12a in location 11a, sees and hears what the surrogate 12b sees and hears. Thus, when surrogate 12b has surrogate 13b in its field of view, user 14a perceives seeing surrogate 13b at location 11b (and, if user 15b is also in the field of view, also sees user 15b). In this instance, surrogate 12b sees surrogate 13b, but not user 15a.
To address this problem, the virtual encounter system 10 includes aliasing-substitution processing. In one implementation there is one aliasing-substitution processing module for the two sets of locations.
In another implementation, there is an aliasing-substitution processing module at each gateway. In this latter implementation, each gateway system 16a, 16b includes an aliasing-substitution processing module 17a, 17b, respectively.
Aliasing-substitution processing modules 17a, 17b, process images received from surrogates in respective locations and perform a real time image transformation, so that rather than seeing a surrogate of another user at a remote location, the user sees the user that the surrogate represents. Essentially, the aliasing-substitution processing works in a similar manner whether there is one or multiple aliasing-substitution processing modules.
In other words, in the context of
In either case, images of the surrounding scene (and in particular regions of intersection between a background and the image of the person) may need to be repaired so that the images do not look jagged or unusual. A pixel-based aliasing processing can be used to repair these intersections, removing jagged edges and blending the image in with the background. The images that are rendered by the goggles worn by user 14a while interacting with surrogate 12a, and seeing through surrogate 12b in location 11b, would render not the surrogate 13b but the user 15a. Techniques to accomplish this are described below.
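By way of illustration only, the repair of the intersection between the substituted image and the background can be sketched as a feathered blend. The description does not specify a particular repair algorithm, so the neighbor-averaging mask and the function names `feather_mask` and `blend` below are assumptions chosen for brevity, operating on single-channel images held as nested lists.

```python
from typing import List

Gray = List[List[float]]  # single-channel image or mask, row-major


def feather_mask(mask: Gray, passes: int = 1) -> Gray:
    """Soften a hard 0/1 mask by averaging each pixel with its 4-neighbours.

    Pixels outside the image are ignored, so borders average fewer samples.
    """
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    for _ in range(passes):
        src = [row[:] for row in out]
        for y in range(h):
            for x in range(w):
                total, count = src[y][x], 1
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        total += src[ny][nx]
                        count += 1
                out[y][x] = total / count
    return out


def blend(background: Gray, person: Gray, alpha: Gray) -> Gray:
    """Per-pixel mix: alpha = 1 shows the person, alpha = 0 the background."""
    return [
        [a * p + (1.0 - a) * b for b, p, a in zip(b_row, p_row, a_row)]
        for b_row, p_row, a_row in zip(background, person, alpha)
    ]
```

In such a sketch the hard mask marks where the image of the user replaces the image of the surrogate, and the softened mask makes the transition gradual so that the boundary does not appear jagged.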
With respect to user 14b, user 14b will see user 14a at location 11a through surrogate 12a in location 11a and user 14b will also see surrogate 13a rather than user 15b. Again, this problem can be addressed by the virtual encounter system 10 performing aliasing-substitution processing with aliasing-substitution processing module 17a to perform a real time image transformation, so that rather than the user 14b seeing the surrogate 13a of the user 15b at remote location 11b, the user 14b sees the user 15b that is paired with the surrogate 13a.
In the implementation of a single aliasing-substitution processing module (not shown) that module would perform the functions that are performed by aliasing-substitution processing module 17a and aliasing-substitution processing module 17b.
As with the aliasing-substitution processing 17b, aliasing-substitution processing 17a receives images from the surrogate 12a and transforms the images received from the surrogate 12a in real time with either a static or a dynamic replacement. In a static replacement the same image could be used in all replacements, whereas in a dynamic replacement the replacement would capture movement of the associated human user. In either case, again the surrounding scene may need to be repaired so that the images do not look jagged or unusual. Thus, the images that are rendered by the goggles worn by user 14b while interacting with surrogate 12b, and seeing through surrogate 12a in location 11a, would render not the surrogate 13a but the user 15b.
Referring to
Referring to
Referring to
Referring to
Referring to
Referring to
Likewise, communication gateway 16a and communication gateway 16b work in the opposite direction through network 24, so that the video images from location 11a recorded by camera 30a are rendered on to display 56b. The video images recorded by camera 36a are rendered on display 60b. The sounds received by microphone 42a in location 11a are transmitted to earphone 24b and sounds received in location 11a by microphone 52a are transmitted to earphone 26b. The sounds received by microphone 42b in location 11b are transmitted to earphone 24a and sounds received in location 11b by microphone 52b are transmitted to earphone 26a. Using system 10, two people can have a conversation where each of the persons perceives that the other is in the same location as them.
Also shown in
In operation, camera 30b and camera 36b record video images from location 11b. The video images are transmitted wirelessly to communication gateway 16b as video signals. Communication gateway 16b sends the video signals through network 28 to communication gateway 16a. Communication gateway 16a transmits the video signals wirelessly to set of goggles 20a. The video images recorded by camera 30b are rendered on to display 56a, and the video images recorded by camera 36b are rendered on to display 60a.
Similar considerations apply for channels 50c and 50d with respect to users 15a, 15b and surrogates 13a and 13b.
Referring now to
For example, as shown in
As shown in
As shown in
In other embodiments, user 14a can receive an image of a user 14b, but the actual background behind user 14b is altered. For example, user 14b is in a room but user 14a perceives user 14b on a beach or on a mountaintop (not shown). Using conventional video image editing techniques, the communication gateway 16a processes the signals received from location 11b and removes or blanks out the video image except for the portion that has the user 14b. For the blanked out areas on the image, the communication gateway 16a overlays a replacement background, e.g., a virtual environment, to have the user 14b appear to user 14a in a different environment, as generally described in the above incorporated by reference published application. Generally, the system can be configured so that either user 14a or user 14b can control how the user 14b is perceived by the user 14a. Communication gateway 16a using conventional techniques can supplement the audio signals received with stored virtual sounds. For example, waves are added to a beach scene, or eagles screaming are added to a mountaintop scene, as generally described in the above incorporated by reference published application.
In addition, gateway 16a can also supplement tactile sensations with stored virtual tactile sensations. For example, a user can feel the sand on her feet in the beach scene or a cold breeze on her cheeks in a mountain top scene, as generally described in the above incorporated by reference published application. In this embodiment, storage media store data for generating a virtual environment including virtual visual images, virtual audio signals, and virtual tactile signals. Computer instructions executed by processor out of memory combine the visual, audio, and tactile signals received with the stored virtual visual, virtual audio and virtual tactile signals in data, as generally described in the above incorporated by reference published application.
In other embodiments, a user 14a can receive a morphed image of user 14b. For example, an image of user 14b is transmitted through network 24 to communications gateway 16a. User 14b has brown hair, brown eyes and a large nose. Communications gateway 16a, again using conventional image morphing techniques, alters the image of user 14b so that user 14b has blond hair, blue eyes and a small nose and sends that image to goggles 20a to be rendered. Communication gateway 16a also changes the sound user 14b makes as perceived by user 14a. For example, user 14b has a high-pitched squeaky voice. Communication gateway 16a using conventional techniques can alter the audio signal representing the voice of user 14b to be a low deep voice. In addition, communication gateway 16a can alter the tactile sensation. For example, user 14b has cold, dry and scaling skin. Communications gateway 16a can alter the perception of user 14a by sending tactile signals that make the skin of user 14b seem smooth and soft, as generally described in the above incorporated by reference published application.
In this embodiment, storage media store data for generating a morph personality. Computer instructions executed by a processor out of memory combine the visual, audio, and tactile signals received with the stored virtual visual, virtual audio and virtual tactile signals of a personality in data. Thus using system 10 anyone can assume any other identity if the identity data are stored in the storage media. In other embodiments, earphones are connected to the goggles. The goggles and the earphones are hooked by a cable to a port (not shown) on the communication gateway.
Aliasing-substitution processing 17b will now be described. Aliasing-substitution processing 17a would be similar. In the processing discussed below, the image data that will substitute for images captured by surrogates is communicated over the network to the proper aliasing-substitution processing module 17a, 17b, etc.
Referring to
That is, this aliasing-substitution processing 17b can substitute already captured images of the user, e.g., user 15a paired with the particular surrogate 13b, and modify the images to represent that user 15a at a viewing angle determined from the compass data, etc. in images that are returned to user 14a, so that user 14a at times sees user 15a rather than the surrogate 13b that is paired with user 15a. The viewing angle is an angular two dimensional (or three dimensional) direction between the particular surrogate 12b and the surrogate 13b. This viewing angle is determined via the compass data. While this approach may not fully capture the real time movement and expressions of the human (unless a high degree of image modification were used), it would address the problem of viewing of surrogates in a multi-surrogate environment.
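As a minimal sketch of the two dimensional case, the viewing angle can be computed from a compass heading of the viewing surrogate and planar positions of the two surrogates. The availability of planar positions, the coordinate convention (north along +y, bearings measured clockwise), and the function names are assumptions made for illustration and are not taken from the description.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (x = east, y = north), arbitrary units


def bearing_deg(from_xy: Point, to_xy: Point) -> float:
    """Compass bearing in [0, 360) from one point to another (0 = north)."""
    dx = to_xy[0] - from_xy[0]
    dy = to_xy[1] - from_xy[1]
    return math.degrees(math.atan2(dx, dy)) % 360.0


def relative_viewing_angle(
    viewer_heading_deg: float, viewer_xy: Point, target_xy: Point
) -> float:
    """Signed angle in (-180, 180] of the target relative to the viewer's heading.

    Positive values mean the target lies to the viewer's right.
    """
    diff = (bearing_deg(viewer_xy, target_xy) - viewer_heading_deg) % 360.0
    return diff - 360.0 if diff > 180.0 else diff
```

Here the viewer would correspond to surrogate 12b and the target to surrogate 13b; the resulting angle could then be used to choose or warp a stored image of user 15a.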
As described here aliasing-substitution processing 17b substitutes image data going to location 11a. In other implementations aliasing-substitution processing 17b could be configured to substitute for image data coming from location 11a. Similar considerations apply for aliasing-substitution processing 17a.
In one embodiment, in addition to providing the aliasing-substitution processing 17b, the system provides morph processing (not referenced) to generate from the received images of location 11b, a real-world image depicting the environment or a morphed or virtual depiction of the environment.
An alternative mechanism for producing the generated image frames augmented with a transformation of that portion of the image containing the surrogate 13b will now be described.
Referring now to
Image frames are received 132. In this embodiment, aliasing-substitution processing 17b is invoked for each image frame. Any of several well-known recognition techniques can be applied 134 to determine whether 134a and where 134b in the image the surrogate 13b appears. The processing 134 detects the image of the surrogate and retrieves 136 an image of the user, which is scaled (and can also be cropped) according to the orientation data and the data corresponding to a current view of the environment, e.g., a room containing the location, as viewed by the surrogate 12b, to fit the retrieved image into the image frame and replace the image of the surrogate 13b. This aliasing-substitution processing 17b′ generates 138 a modified image frame transformed by substitution of the image of the surrogate at the identified location in the image frame with the scaled and/or cropped image of user 15a.
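The per-frame flow just described (locate the surrogate, scale the retrieved image of the user, substitute it into the frame) can be sketched as follows. This is an illustrative sketch only: the recognition step is assumed to have already produced a bounding box, nearest-neighbor scaling stands in for whatever scaling is actually used, and the names `scale_nearest` and `substitute` are hypothetical.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]           # row-major grid of pixel values
Box = Tuple[int, int, int, int]   # (top, left, height, width) of the surrogate


def scale_nearest(img: Image, out_h: int, out_w: int) -> Image:
    """Resize an image to out_h x out_w by nearest-neighbour sampling."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def substitute(frame: Image, box: Optional[Box], user_img: Image) -> Image:
    """Return a copy of frame with the boxed region replaced by the user image.

    If no surrogate was detected (box is None) the frame is returned unchanged.
    Pixels of the box that fall outside the frame are cropped.
    """
    out = [row[:] for row in frame]
    if box is None:
        return out
    top, left, h, w = box
    scaled = scale_nearest(user_img, h, w)
    for dy in range(h):
        for dx in range(w):
            y, x = top + dy, left + dx
            if 0 <= y < len(out) and 0 <= x < len(out[0]):
                out[y][x] = scaled[dy][dx]
    return out
```

A rectangular box is the simplest choice for a sketch; substituting along the actual outline of the surrogate would require a per-pixel mask instead.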
For facial recognition one approach would be to compare selected facial features that are retrieved from the image to stored facial features of the surrogate. The aliasing-substitution processing upon detecting in the image the recognized surrogate will use the real-world view of the environment to delineate the extent of the surrogate and substitute that data for data corresponding to an image of the user.
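One simple way to compare retrieved features to stored features is a nearest-match test with a distance threshold. The sketch below assumes the features have already been extracted as fixed-length numeric vectors; the Euclidean metric, the threshold value, and the identifier strings are illustrative assumptions rather than details of the described system.

```python
import math
from typing import Dict, Optional, Sequence


def match_surrogate(
    features: Sequence[float],
    stored: Dict[str, Sequence[float]],
    max_distance: float = 0.5,
) -> Optional[str]:
    """Return the id of the closest stored surrogate, or None if none is close.

    A match requires the Euclidean distance to be at most max_distance.
    """
    best_id: Optional[str] = None
    best_dist = max_distance
    for surrogate_id, reference in stored.items():
        dist = math.dist(features, reference)
        if dist <= best_dist:
            best_id, best_dist = surrogate_id, dist
    return best_id
```

For example, with stored vectors keyed by labels such as "13b" and "12b", a feature vector extracted from the frame that lies close to the "13b" entry would be recognized as surrogate 13b, while a vector far from every entry would yield no match.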
Additional positional information can be obtained via beacons that send out r.f., optical or acoustic signals. Using conventional triangulation techniques through a receiver on each of the surrogates, the positional information of the receiver, and hence of the surrogate, can be determined from the set of beacons, from which the relative positions of the surrogates can be determined. The aliasing-substitution processing can receive this positional information to determine the relative position of the surrogates, whether the surrogate appears in a view, and where in an image frame the image of the surrogate would be located in the environment, so as to position the substituted image of the user 15a into the image frame.
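As one conventional example of such a position fix, the sketch below uses range-based trilateration in two dimensions: given three beacons at known positions and a measured distance to each, the receiver position follows from a pair of linear equations. The use of ranges rather than angles, the restriction to exactly three beacons, and the function name are assumptions made for illustration.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def trilaterate(beacons: Sequence[Point], distances: Sequence[float]) -> Point:
    """Solve for (x, y) from three beacon positions and the ranges to them.

    Subtracting the first circle equation from the other two removes the
    quadratic terms and leaves a 2x2 linear system, solved by Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = beacons
    r1, r2, r3 = distances
    a11, a12 = 2.0 * (x2 - x1), 2.0 * (y2 - y1)
    a21, a22 = 2.0 * (x3 - x1), 2.0 * (y3 - y1)
    b1 = r1**2 - r2**2 - x1**2 + x2**2 - y1**2 + y2**2
    b2 = r1**2 - r3**2 - x1**2 + x3**2 - y1**2 + y3**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("beacons are collinear; position is not unique")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y
```

Applying the same computation to the receiver on each surrogate gives two positions whose difference is the relative position of the surrogates. With noisy measurements or more than three beacons, a least-squares fit would be the usual extension.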
Referring now to
In one implementation, the compass data can be used to select which camera is used to supply the real-time moving image data. In another implementation, the cameras can be mounted on a swivel mount and can either be manually controlled or automatically controlled to track movements of the user. Either approach can be used in order to obtain the correct viewing angle with regard to the user 15b.
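A minimal sketch of the first implementation follows, assuming the cameras are evenly spaced around the user with camera 0 at 0 degrees and indices increasing clockwise. The even spacing, the indexing convention, and the function name are assumptions made for illustration.

```python
def select_camera(viewing_angle_deg: float, num_cameras: int) -> int:
    """Pick the index of the evenly spaced camera nearest the viewing angle."""
    if num_cameras < 1:
        raise ValueError("at least one camera is required")
    spacing = 360.0 / num_cameras
    # Wrap the angle into [0, 360), snap to the nearest slot, wrap the index.
    return int(round((viewing_angle_deg % 360.0) / spacing)) % num_cameras
```

With eight cameras, for instance, a viewing angle of 44 degrees selects camera 1 (at 45 degrees), and a viewing angle of 350 degrees wraps around to camera 0.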
The arrangement thus can be either a single video camera or a set of video cameras, suitably arranged, such as in a ring of cameras. The selected real-time image data is then used to substitute for images of the surrogate as discussed above. In this processing 156, the aliasing-substitution module would determine 156a whether the surrogate 13b is within the field of view of the cameras in the surrogate 12b and determine 156b where in the image frames the image of surrogate 13b is located.
The aliasing-substitution module can scale 158 the moving image and generate 159 a modified image that depicts movement of the user 15b rather than the surrogate 13b associated with the user 15b. This second approach would more fully capture the real time movement and expressions of the human (albeit at the expense of more complexity) than the approach discussed above.
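The two determinations (whether the surrogate is in view, and where it falls in the frame) can be sketched geometrically when the angle of the surrogate relative to the camera's optical axis is known. The pinhole camera model, the horizontal-only treatment, and the function name below are assumptions made for illustration; an image recognition step could be used instead of, or together with, such geometry.

```python
import math
from typing import Optional


def locate_in_frame(
    relative_angle_deg: float, fov_deg: float, frame_width: int
) -> Optional[int]:
    """Return the pixel column where a target appears, or None if out of view.

    relative_angle_deg is the target's angle from the optical axis
    (negative = left). fov_deg is the full horizontal field of view.
    """
    if not 0.0 < fov_deg < 180.0:
        raise ValueError("field of view must be between 0 and 180 degrees")
    half_fov = fov_deg / 2.0
    if abs(relative_angle_deg) > half_fov:
        return None
    # Pinhole projection: offset is -1 at the left edge, +1 at the right edge.
    offset = math.tan(math.radians(relative_angle_deg)) / math.tan(
        math.radians(half_fov)
    )
    return int(round((frame_width - 1) / 2.0 * (1.0 + offset)))
```

The returned column would give the horizontal center at which the substituted image is placed; a vertical position could be obtained in the same way from an elevation angle.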
Referring now to
Referring now to
Referring now to
Referring now to
Referring now to
In those instances, when location 11b has other surrogates from different locations, the system 10 can execute alias processing 17c to replace the images of those other surrogates, i.e., surrogate 15cc which could be in the frame of
Similar arrangements are provided for perception by any of the users.
In other embodiments, the paired user could be another user in another location (not shown) or an existing user in the session in one of locations 11a, 11b. That is, a user, e.g., user 14b, could have two paired surrogates 12a and 12c in two different locations 11a and 11c respectively. In this instance, the user 14b would select which of the surrogates 12a and 12c to interact with during a session and could change the selection during the session or could interact with both. Selection could be made using various techniques such as through a user interface presented to the user via the goggles prior to and/or during a session. However, irrespective of the selection made by user 14b, the user's paired surrogate in location 11c could be replaced by the user's image.
Referring now to
While eyeglasses or a display device can be used, other types of augmenting media devices can be configured to receive the generated image. User devices, e.g., goggles, body suits, etc. can include a computing device capable of taking input from a user and communicating over a network (not shown) with a server and/or with other user devices. For example, a user device can be a mobile device, a laptop, a cell phone, a personal digital assistant (“PDA”), as well as the goggles, and so forth. User devices include monitors which render images. Gateways can include server computers that can be any of a variety of computing devices capable of receiving information, such as a server, a distributed computing system, a desktop computer, a laptop, a cell phone, a rack-mounted server, and so forth.
The aliasing-substitution processing modules can be programmed computing devices that are part of the gateway devices or can be separate computing devices such as computers and/or server computer systems. Servers may be a single server or a group of servers that are at a same location or at different locations. These server systems can be dedicated systems, e.g., traditional servers and/or virtual servers running in a “cloud computing” environment and networked using appropriate networking technologies such as Internet connections. Applications running on those servers may communicate using XML/SOAP, RESTful web service, and/or other appropriate application layer technologies such as HTTP and ATOM.
Servers receive information from client devices, e.g., user devices, via interfaces. Specific implementation of interfaces can be any type of interface capable of receiving information over a network, such as an Ethernet interface, a wireless networking interface, a fiber-optic networking interface, and so forth. Servers also include a processor and memory. A bus system including, for example, a data bus and a motherboard, can be used to establish and to control data communication between the components of the server.
Processors may include one or more microprocessors. Generally, processor may include any appropriate processor and/or logic that is capable of receiving and storing data, and of communicating over a network (not shown). Memory can include a hard drive and a random access memory storage device, such as a dynamic random access memory, machine-readable media, or other types of non-transitory machine-readable storage devices.
Components also include storage devices configured to store information including data and software. Embodiments can be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or in combinations thereof. Apparatus of the invention can be implemented in a computer program product tangibly embodied or stored in a machine-readable storage device and/or machine readable media for execution by a programmable processor; and method actions can be performed by a programmable processor executing a program of instructions to perform functions and operations of the invention by operating on input data and generating output. The invention can be implemented advantageously in one or more computer programs that are executable on a programmable system including at least one programmable processor coupled to receive data and instructions from, and to transmit data and instructions to, a data storage system, at least one input device, and at least one output device. Each computer program can be implemented in a high-level procedural or object oriented programming language, or in assembly or machine language if desired; and in any case, the language can be a compiled or interpreted language.
Suitable processors include, by way of example, both general and special purpose microprocessors. Generally, a processor will receive instructions and data from a read-only memory and/or a random access memory. Generally, a computer will include one or more mass storage devices for storing data files; such devices include magnetic disks, such as internal hard disks and removable disks; magneto-optical disks; and optical disks. Storage devices suitable for tangibly embodying computer program instructions and data include all forms of non-volatile memory, including by way of example semiconductor memory devices, such as EPROM, EEPROM, and flash memory devices; magnetic disks such as internal hard disks and removable disks, etc. Any of the foregoing can be supplemented by, or incorporated in, ASICs (application-specific integrated circuits).
Other embodiments are within the scope and spirit of the description and claims. For example, due to the nature of software, functions described above can be implemented using software, hardware, firmware, hardwiring, or combinations of any of these. Features implementing functions may also be physically located at various positions, including being distributed such that portions of functions are implemented at different physical locations.
Number | Name | Date | Kind |
---|---|---|---|
613809 | Tesla | Nov 1898 | A |
5103404 | McIntosh | Apr 1992 | A |
5111290 | Gutierrez | May 1992 | A |
5845540 | Rosheim | Dec 1998 | A |
5980256 | Carmein | Nov 1999 | A |
5984880 | Lander et al. | Nov 1999 | A |
6368268 | Sandvick et al. | Apr 2002 | B1 |
6583808 | Boulanger et al. | Jun 2003 | B2 |
6695770 | Choy et al. | Feb 2004 | B1 |
6726638 | Ombrellaro | Apr 2004 | B2 |
6741911 | Simmons | May 2004 | B2 |
6771303 | Zhang et al. | Aug 2004 | B2 |
6786863 | Abbasi | Sep 2004 | B2 |
6832132 | Ishida et al. | Dec 2004 | B2 |
6958746 | Anderson et al. | Oct 2005 | B1 |
7095422 | Shouji | Aug 2006 | B2 |
7124186 | Piccionelli | Oct 2006 | B2 |
7164969 | Wang et al. | Jan 2007 | B2 |
7164970 | Wang et al. | Jan 2007 | B2 |
7333622 | Algazi et al. | Feb 2008 | B2 |
9259282 | Azizian | Feb 2016 | B2 |
9479732 | Saleh | Oct 2016 | B1 |
9841809 | Kurzweil | Dec 2017 | B2 |
9971398 | Kurzweil | May 2018 | B2 |
10223821 | Kurzweil | Mar 2019 | B2 |
20020049566 | Friedrich et al. | Apr 2002 | A1 |
20020080094 | Biocca et al. | Jun 2002 | A1 |
20020188186 | Abbasi | Dec 2002 | A1 |
20030030397 | Simmons | Feb 2003 | A1 |
20030036678 | Abbasi | Feb 2003 | A1 |
20030093248 | Vock et al. | May 2003 | A1 |
20030229419 | Ishida et al. | Dec 2003 | A1 |
20040046777 | Tremblay et al. | Mar 2004 | A1 |
20040088077 | Jouppi et al. | May 2004 | A1 |
20040104935 | Williamson et al. | Jun 2004 | A1 |
20050014560 | Blumenthal | Jan 2005 | A1 |
20050062869 | Zimmermann | Mar 2005 | A1 |
20050130108 | Kurzweil | Jun 2005 | A1 |
20050131580 | Kurzweil | Jun 2005 | A1 |
20090312871 | Lee | Dec 2009 | A1 |
20120038739 | Welch | Feb 2012 | A1 |
20120045742 | Meglan | Feb 2012 | A1 |
20120167014 | Joo | Jun 2012 | A1 |
20130250034 | Kang | Sep 2013 | A1 |
20140057236 | Meglan | Feb 2014 | A1 |
20150258432 | Stafford | Sep 2015 | A1 |
20160041581 | Piccionelli | Feb 2016 | A1 |
20170085834 | Kim | Mar 2017 | A1 |
20170206710 | Touma | Jul 2017 | A1 |
20170236288 | Sundaresan | Aug 2017 | A1 |
20170296897 | Simpson | Oct 2017 | A1 |
20170334066 | Levine | Nov 2017 | A1 |
20180040133 | Srinivasan | Feb 2018 | A1 |
20180220048 | Tamir | Aug 2018 | A1 |
20180342098 | Chang | Nov 2018 | A1 |
Number | Date | Country |
---|---|---|
WO0059581 | Oct 2000 | WO |
Entry |
---|
“Development of Teleoperation Master System with a Kinesthetic Sensation of Presence,” Hasumuma et al., ICAT '99, retrieved from the Internet, (1999), pp. 1-7. |
“Virtual Humanoid Robot Platform to Develop Controllers of Real Humanoid Robots without Porting,” Kanehiro et al., Proceedings of the 2001 IEEE/RSJ, Int'l Conference on Intelligent Robots and Systems, Maui, Hawaii (2001), pp. 1093-1099. |
“Teleoperation Characteristics and Human Response Factor in Relation to a Robotic Welding System,” Hou et al., Proc. IROS 96, IEEE (1996), pp. 1195-1202. |
“Real-Time Animation of Realistic Virtual Humans,” Kalra et al., IEEE Computer Graphics and Applications, (1998), pp. 42-56. |
Number | Date | Country | |
---|---|---|---|
20190188894 A1 | Jun 2019 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 15496213 | Apr 2017 | US |
Child | 16283954 | US |