Real-time interactive communication can take on many forms. Videoconferencing and other applications that employ high-fidelity user imagery can provide contextual information about the participants, which can promote robust and informative communication. However, even with smart whiteboards and similar media, it can be challenging to support remote collaboration in which different participants work on the same thing at the same time. In addition, being on-screen via full motion video can be difficult for participants, especially during long sessions or back-to-back meetings, since it is not easy to disengage from being an active participant while continuing in the session. Connectivity issues and other bandwidth concerns, background distractions, and information overload from multiple video feeds can also hinder collaboration.
The technology relates to methods and systems that provide tactile copresence for participants working in a real-time remote collaborative arrangement. This enhanced user experience enables different users to work remotely through their own local physical media such as whiteboards, large screen displays or even poster boards. While participants may be in different physical locations, they are all able to view a common workspace in the same way (e.g., a 1:1 scale so that all participants see the same information presented equivalently and none is cut off or otherwise lost). The participants are also able to experience tactile copresence via silhouette representations (“presence shadows”) of other people, which may be generated by respective imaging devices that provide three-dimensional pose tracking at the different locations.
Such body profile-based silhouette representations, which require much less bandwidth than full motion video and can minimize connectivity issues (e.g., dropped packets, bandwidth constraints, etc.), can show gestures and positioning as well as the relative proximity of a given participant to the common workspace. This provides rich contextual information about the participants, while at the same time allowing someone to both physically and metaphorically “step away” from the workspace in order to think about something being discussed without totally disengaging from the other participants. Spatial acoustics can be used to complement the silhouette representations and enrich the copresence experience. The technology can be employed in a wide variety of interactive applications. This can include videoconferencing, personal tutorials (e.g., for calligraphy or music), games (e.g., charades, tic-tac-toe, hangman), etc. With a minimal amount of hardware and software infrastructure, a tactile copresence setup can be a simple, cost-effective solution for a workspace of almost any size that a user may have. This can enable wide deployment with minimal technical restrictions.
According to one aspect of the technology, a method comprises accessing, via a first computing device associated with a first participant at a first physical location, a copresence program configured to support multiple participants, the first physical location including a first physical medium configured to display information of the copresence program; receiving, by one or more processors associated with the first physical location, depth map information of a second participant at a second physical location, the depth map information being derived from a raw image associated with the second participant captured at the second physical location; generating, by the one or more processors associated with the first physical location, a presence shadow corresponding to the second participant, the presence shadow reprojecting aspects of the second participant according to the depth map information where the aspects are blurred according to a proximity of each aspect to a second physical medium at the second physical location; and displaying using the copresence program, on the first physical medium, the presence shadow corresponding to the second participant.
One or more of the aspects closest to the second physical medium may be generated to have a first amount of blurring while one or more of the aspects farthest from the second physical medium are generated to have a second amount of blurring greater than the first amount. The presence shadow may be configured to illustrate at least one of gesturing or positioning of the second participant. Generating the presence shadow may include inverting the depth map information prior to blurring. The presence shadow may only be displayed when the second participant is within a threshold distance of the second physical medium.
The method may further comprise receiving collaborative information added to the second physical medium; and displaying the received collaborative information on the first physical medium along with the presence shadow corresponding to the second participant. Here, the method may further comprise receiving additional collaborative information added to the first physical medium; and integrating the additional collaborative information with the received collaborative information to form a set of integrated collaborative information. This may further include generating one or more bookmarks of the set of integrated collaborative information. The received collaborative information may be displayed at a first resolution while the presence shadow is displayed at a second resolution lower than the first resolution.
The depth map information of the second participant may further include three-dimensional positioning information of a writing implement, and displaying the received collaborative information may include generating a scaled representation of the received collaborative information based on the three-dimensional positioning information of the writing implement. The raw image of the second participant may be taken from a viewpoint where the second participant is between the second physical medium and a second image capture device at the second physical location.
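By way of illustration only, one way to obtain such a scaled representation is to convert the pixel length of a captured stroke into a physical length using the writing implement's depth, under a simple pinhole camera assumption. The functions and model below are illustrative assumptions rather than part of the described method.

```python
def physical_stroke_length(pixel_length: float,
                           implement_depth_m: float,
                           focal_length_px: float) -> float:
    """Approximate the real-world length of a stroke from its pixel length.

    Pinhole assumption: an object of length L at depth d projects to
    f * L / d pixels, so L = pixel_length * d / f.
    """
    return pixel_length * implement_depth_m / focal_length_px


def rescale_to_medium(pixel_length: float,
                      implement_depth_m: float,
                      medium_depth_m: float) -> float:
    """Hypothetical correction: map a stroke measured at the implement's
    depth onto the plane of the physical medium so it can be displayed
    at a 1:1 scale on the remote medium."""
    return pixel_length * implement_depth_m / medium_depth_m
```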
The method may further comprise capturing, at the first physical location, a raw image of the first participant; and transmitting depth map information derived from the raw image of the first participant to a computing device associated with the second physical location. Here, a presence shadow of the first participant may not be displayed on the first physical medium at the first physical location.
The method may further comprise receiving audio information of the second participant; and playing the received audio information at the first physical location in conjunction with displaying the presence shadow of the second participant. The method may further comprise using the depth map information to separate images of the second participant at the second physical location from collaborative information added to the second physical medium. The presence shadow may be generated based on a set of skeleton coordinates derived from an image of the second participant.
The method may further include saving an interactive collaboration of the copresence program in memory. Saving the interactive collaboration may include saving one or more snapshots of collaborative information added to either the first physical medium or the second physical medium.
The method may further comprise identifying tactile input onto either the first physical medium or the second physical medium; associating the identified tactile input with an action; and either (i) changing how selected information is displayed on the first physical medium based on the action, or (ii) performing a computing operation in response to the action. Changing how the selected information is displayed may include at least one of moving, scaling, translating or rotating a piece of collaborative information. The computing operation may be at least one of copying, pasting, saving or transmitting.
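By way of example only, the association between identified tactile input and an action could be implemented as a simple dispatch table. The gesture names and handler functions below are hypothetical; the technology only specifies that tactile input may be mapped to display changes (moving, scaling, translating, rotating) or computing operations (copying, pasting, saving, transmitting).

```python
from typing import Callable, Dict

def move_item(canvas: dict, item_id: str, dx: float, dy: float) -> None:
    """Display change: translate a piece of collaborative information."""
    canvas[item_id]["x"] += dx
    canvas[item_id]["y"] += dy

def scale_item(canvas: dict, item_id: str, factor: float) -> None:
    """Display change: scale a piece of collaborative information."""
    canvas[item_id]["w"] *= factor
    canvas[item_id]["h"] *= factor

def copy_item(canvas: dict, item_id: str) -> None:
    """Computing operation: duplicate a piece of collaborative information."""
    canvas[item_id + "_copy"] = dict(canvas[item_id])

# Hypothetical mapping from recognized tactile input to an action.
GESTURE_ACTIONS: Dict[str, Callable] = {
    "drag": move_item,
    "pinch": scale_item,
    "circle_then_drag": copy_item,
}

def handle_tactile_input(gesture: str, canvas: dict, item_id: str, *args) -> None:
    action = GESTURE_ACTIONS.get(gesture)
    if action is not None:
        action(canvas, item_id, *args)
```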
According to another aspect of the technology, a system associated with a first participant at a first physical location is provided. The system comprises a projector configured to display information on a first physical medium located at the first physical location and a first computing device having one or more processors. The first computing device is operatively coupled to the projector, and the one or more processors are configured to: access a copresence program configured to support multiple participants; receive depth map information of a second participant at a second physical location, the depth map information being derived from a raw image associated with the second participant captured at the second physical location; generate a presence shadow corresponding to the second participant, the presence shadow reprojecting aspects of the second participant according to the depth map information where the aspects are blurred according to a proximity of each aspect to a second physical medium at the second physical location; and cause the projector to display on the first physical medium, using the copresence program, the presence shadow corresponding to the second participant.
The one or more processors may be further configured to: receive collaborative information added to the second physical medium; and cause the projector to display the received collaborative information on the first physical medium along with the presence shadow corresponding to the second participant. The system may further include an image capture device configured to capture imagery associated with the first participant at the first physical location, in which the image capture device has a field of view encompassing the first physical medium. Here, the image capture device may be configured to capture a raw image of the first participant; the one or more processors may be configured to generate depth map information of the first participant based on the captured raw image; and the one or more processors may be configured to transmit the depth map information to a computing device associated with the second physical location.
According to a further aspect of the technology, a system associated with a first participant at a first physical location is provided. The system comprises an image capture device configured to capture imagery associated with the first participant at the first physical location, in which the image capture device has a field of view encompassing a first physical medium; a projector configured to display information on the first physical medium located at the first physical location; and a first computing device having one or more processors. The first computing device is operatively coupled to the image capture device and the projector, and the one or more processors are configured to implement any of the methods of operation described herein.
The ease and physical connection of using real markers on a whiteboard, riffing or otherwise brainstorming with people, and sketching creatively without thinking about the tools can be irreplaceable in many interactive contexts. One area of focus of the technology is enabling working with people who are remote while getting the value and ease of working together on things like whiteboarding or hands-on tutorials. This is done while also leveraging the value of digital media, including having a number of different users actively working on one spot in a drawing or the same section of text in a document simultaneously, while providing the ability to undo, save, or rewind.
One approach uses tools such as a user's mobile phone, a projector and a television, whiteboard, poster board or other physical presentation medium. The mobile phone captures selected information about the user, the projector presents information about other collaborators, and the physical medium provides a focal point for a shared workspace. For instance, the participants can appear as if they were writing on frosted glass from opposing sides of the shared workspace. Participants at each location see writing correctly—no one would see the text, drawings or other details backwards. Distractions in the background environment (e.g., pets or other household members) are not shown. In addition, privacy concerns and video conferencing fatigue can be addressed by the ease of stepping in and out of the shared interactive experience.
Regardless of equipment differences, the tic-tac-toe board is presented on each physical medium with an equivalent aspect ratio. As shown, a silhouette representation (presence shadow) 106 represents the second participant as presented to the first participant at the first participant's location. In this example, the second participant's representation is shown holding a stylus 108, with which the second participant can mark an “O” in a selected spot while playing the game, as shown in view 110.
The image capture device 204 has a field of view 242 configured to capture details of participants standing in front of or near the physical medium 208. Imagery captured by the image capture device 204 can be streamed to the computer 202 and the computer 222 via a network 244 as shown by dash-dot line 246, e.g., using a real-time communication program such as WebRTC. As shown, computer 202 connects to the network 244 via a wired or wireless link 248. The projector 206 has a field of view 250, and is configured to present information from the computer 202 on the physical medium 208 via wired (e.g., an HDMI connection) or wireless link 252. In this configuration, the collaboration can be shown on a display device of the computer and mirrored on the physical medium 208.
In one example, computing device 282 may include one or more server computing devices having a plurality of computing devices, e.g., a load balanced server farm or cloud computing system, that exchange information with different nodes of a network for the purpose of receiving, processing and transmitting the data to and from other computing devices. For instance, computing device 282 may include one or more server computing devices that are capable of communicating with any of the computing devices 288-298 via the network 286. This may be done as part of hosting one or more collaborative apps that incorporate copresence in accordance with aspects of the technology.
The processors may be any conventional processors, such as commercially available CPUs. Alternatively, each processor may be a dedicated device such as an ASIC, graphics processing unit (GPU), tensor processing unit (TPU) or other hardware-based processor.
The computing devices may include all of the components normally used in connection with a computing device, such as the processor and memory described above, as well as a user interface subsystem for receiving input from a user and presenting information to the user (e.g., text, imagery and/or other graphical elements). The user interface subsystem may include one or more user inputs (e.g., at least one front (user) facing camera and/or an externally facing camera to obtain imagery of the physical medium, a mouse, keyboard, touch screen and/or microphone) and one or more display devices that are operable to display information (e.g., text, imagery and/or other graphical elements). Other output devices, such as speaker(s), may also provide information to users.
The user-related computing devices (e.g., 288-298) may communicate with a back-end computing system (e.g., server 282) via one or more networks, such as network 286. The user-related computing devices may also communicate with one another without also communicating with a back-end computing system. The network 286, and intervening nodes, may include various configurations and protocols including short-range communication protocols such as Bluetooth™ and Bluetooth LE™, the Internet, World Wide Web, intranets, virtual private networks, wide area networks, local networks, private networks using communication protocols proprietary to one or more companies, Ethernet, WiFi and HTTP, and various combinations of the foregoing. Such communication may be facilitated by any device capable of transmitting data to and from other computing devices, such as modems and wireless interfaces.
In one example, each participant's computing device may connect with the computing devices of each of the other participants via the network. Here, the copresence app may be run locally on each computing device. In another example, the server computing device may host a copresence service, and each computing device may communicate with the other computing devices via the server.
There are many ways that the present technology may be employed in tactile copresence situations.
Another participant at a remote location can also run the copresence program on their computer, as shown in view 310.
According to another aspect of the technology, information shared among the participants is not limited to text or images drawn on the physical medium. Anything that can be captured by the image capture device can be employed, as shown in view 340.
In contrast to conventional video calls, this feature promotes quiet, slow thinking and doing with a calmer interaction mode. Here, if a participant steps back so that they are outside the field of view of their image capture device, their presence shadow would not appear on any other participant's physical medium of the shared canvas. Once a participant is ready to rejoin the collaboration, they can simply return to within the field of view, and their presence shadow reappears at the physical medium, as shown in view 360.
Due to the nature of the physical medium, nothing is set in stone, and erasing is just as natural as drawing. Anything that any participant adds to their respective physical medium is integrated into the visualization presented on the shared canvas, and anything that is erased by a participant is removed from the visualization.
One highly beneficial aspect of the technology is that the content shown on the shared workspace is presented with high fidelity in high resolution (e.g., at least 250-300 pixels per inch), while the users' representations are intentionally rendered at a lower resolution so that they appear as silhouettes (e.g., no more than 150-250 pixels per inch). Even so, the silhouettes convey meaningful information about how the users are interacting in the collaborative experience, for instance as shown in view 500.
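By way of illustration only, one way to achieve this dual-resolution presentation is to downsample the silhouette mask before compositing it beneath the full-resolution collaborative content. The specific downscale factor, shadow color and opacity below are assumptions.

```python
import cv2
import numpy as np

def composite_canvas(content_bgr: np.ndarray,
                     shadow_mask: np.ndarray,
                     downscale: float = 0.25,
                     shadow_color=(60, 60, 60),
                     opacity: float = 0.5) -> np.ndarray:
    """Overlay a deliberately low-resolution presence shadow under
    full-resolution collaborative content (illustrative sketch)."""
    h, w = content_bgr.shape[:2]
    # Downsample then upsample the silhouette mask so the shadow has a
    # coarse, soft appearance compared to the sharp shared content.
    small = cv2.resize(shadow_mask, (int(w * downscale), int(h * downscale)),
                       interpolation=cv2.INTER_AREA)
    coarse = cv2.resize(small, (w, h), interpolation=cv2.INTER_LINEAR)
    alpha = (coarse.astype(np.float32) / 255.0) * opacity
    out = content_bgr.astype(np.float32)
    for c in range(3):
        out[..., c] = out[..., c] * (1.0 - alpha) + shadow_color[c] * alpha
    return out.astype(np.uint8)
```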
To create the visuals for the presence shadows, the image capture device includes a depth camera. This can be a depth camera on a mobile phone, a stand-alone camera unit that has two or more imaging units, or separate cameras that in combination can provide the depth information. In addition to being used to generate the presence shadows, the information from the depth camera can be used to detect (and correct for) occlusion of a portion of the physical medium.
In other scenarios, in addition to information from the image capture device, other information regarding depth may also be provided by a radar sensor, a lidar sensor, an acoustical sensor (e.g., sonar) and/or any combination thereof. By way of example, such information may be used to provide additional details about user gestures either at or away from the shared canvas (e.g., the positioning and/or spacing of the fingers, how a pencil or brush is being held, etc.), used to overcome possible occlusion of the image capture device, or otherwise enrich the information used to create a participant's presence shadow. Thus, using one of these other types of sensors can help detect and visualize a more abstract approximation of presence. For instance, location and pose information about the person can be used to generate a visualization in which their shadow changes from a humanoid shape to a dot or blob as the person moves away (and the reverse as the person moves closer), which will still move in sync with the person at the other end. This type of visualization would retain a sense of copresence but without the fidelity of a full-body shadow.
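By way of example only, the transition from a full-body shadow to a dot or blob could be implemented as a distance-dependent cross-fade. The distance thresholds and blob sizing below are illustrative assumptions.

```python
import cv2
import numpy as np

def abstract_presence(shadow_mask: np.ndarray,
                      distance_m: float,
                      near_m: float = 1.0,
                      far_m: float = 3.0) -> np.ndarray:
    """Blend between a full-body silhouette (near) and a blob centered on
    the participant (far) as one way to abstract presence with distance."""
    h, w = shadow_mask.shape[:2]
    ys, xs = np.nonzero(shadow_mask)
    if len(xs) == 0:
        return np.zeros((h, w), np.uint8)
    cx, cy = int(xs.mean()), int(ys.mean())

    # Blob whose radius shrinks somewhat as the participant moves away.
    t = np.clip((distance_m - near_m) / (far_m - near_m), 0.0, 1.0)
    radius = max(5, int(0.2 * h * (1.0 - 0.5 * t)))
    blob = np.zeros((h, w), np.uint8)
    cv2.circle(blob, (cx, cy), radius, 255, thickness=-1)
    blob = cv2.GaussianBlur(blob, (0, 0), sigmaX=radius / 3.0)

    # Cross-fade: silhouette dominates up close, blob dominates far away.
    out = (1.0 - t) * shadow_mask.astype(np.float32) + t * blob.astype(np.float32)
    return out.astype(np.uint8)
```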
Presence shadow generation, illustrated in view 560, can be done as follows.
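By way of illustration only, one possible implementation consistent with the depth-based reprojection and blurring described above is sketched below; the depth bands, blur strengths and function names are assumptions rather than a definitive implementation.

```python
import cv2
import numpy as np

def generate_presence_shadow(depth_m: np.ndarray,
                             medium_depth_m: float,
                             person_mask: np.ndarray,
                             max_sigma: float = 12.0) -> np.ndarray:
    """Render a participant's presence shadow from a depth map, blurring
    each part in proportion to its distance from the physical medium so
    that the closest parts (e.g., a fingertip on the board) stay crisper."""
    # Distance of each pixel from the medium plane; the participant stands
    # between the camera and the medium, so medium_depth_m - depth >= 0.
    dist = np.clip(medium_depth_m - depth_m, 0.0, None)
    dist[person_mask == 0] = 0.0
    max_d = float(dist.max()) if dist.max() > 0 else 1.0

    # "Invert" the depth so parts nearer to the medium appear denser.
    intensity = np.where(person_mask > 0,
                         255.0 * (1.0 - dist / max_d), 0.0).astype(np.uint8)

    # Blur in a few depth bands: parts farther from the medium receive a
    # stronger Gaussian blur than parts touching it.
    shadow = np.zeros_like(intensity, dtype=np.float32)
    bands = [(0.0, 0.2, 1.0), (0.2, 0.6, 0.5 * max_sigma), (0.6, 1.01, max_sigma)]
    for lo, hi, sigma in bands:
        band_mask = (dist / max_d >= lo) & (dist / max_d < hi) & (person_mask > 0)
        layer = np.where(band_mask, intensity, 0).astype(np.float32)
        layer = cv2.GaussianBlur(layer, (0, 0), sigmaX=sigma)
        shadow = np.maximum(shadow, layer)
    return shadow.astype(np.uint8)
```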
In addition to visual changes to the presence shadow, changes may also be made to the audio input associated with a given participant. For instance, when a participant is not within a threshold distance (e.g., 2-5 feet, or more or less) from the shared canvas, or is otherwise outside the field of view of their image capture device, the audio feed for that person may be automatically muted. Once the person comes within the threshold distance or the field of view, then the audio may be turned on. The audio volume may also be tuned based on the actual distance to the physical medium, so that the volume for a particular person becomes louder the closer they are and/or fades as they move back from the physical medium. The balance of acoustical signals produced by left and right speakers could also track the lateral position of the person. In another example, the participant may turn their audio on or off using the image capture device as a tool, for instance by looking at a camera or making a gesture to enable the microphone or mute it.
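By way of illustration only, distance-dependent volume and lateral panning could be computed as follows; the distance thresholds and the constant-power pan law are assumptions.

```python
import math

def spatial_audio_gains(distance_m, lateral_pos, near_m=0.6, far_m=3.0):
    """Derive left/right speaker gains for a remote participant from their
    distance to the physical medium (volume fades with distance and is
    effectively muted beyond far_m) and their lateral position
    (-1.0 = far left of the medium, +1.0 = far right)."""
    if distance_m >= far_m:
        return 0.0, 0.0  # participant is too far away; mute their feed
    # Louder the closer the participant is to the medium.
    closeness = 1.0 - max(0.0, distance_m - near_m) / (far_m - near_m)
    volume = max(0.0, min(1.0, closeness))
    # Constant-power panning across the left/right speakers.
    pan = max(-1.0, min(1.0, lateral_pos))
    theta = (pan + 1.0) * math.pi / 4.0  # 0 .. pi/2
    return volume * math.cos(theta), volume * math.sin(theta)
```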
It is desirable for the participants at each location to see the same shared content on their physical medium as the other participants, in particular the textual and/or graphical information that everyone is collaborating on. Everyone should ideally see this information in the same way (e.g., with a 1:1 scale), so that everyone has an equivalent view and no information is lost. The people at each location will also see the presence shadows of the other participants (but not their own). However, imaging issues such as visual echo and feedback can adversely impact streaming video in imaging device and projector setups. Thus, calibration may be performed prior to launching a copresence app, or upon launching the app.
For instance, view 700 illustrates one such calibration, which may involve color correction of the imagery presented via the projector.
In addition to color correction, the projected image should be calibrated so that digital keystoning (image skew) is avoided.
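By way of example only, keystone correction can be expressed as a four-corner perspective transform; detection of the medium's corners (e.g., from projected fiducial markers) is assumed here and not detailed in this description.

```python
import cv2
import numpy as np

def correct_keystone(frame: np.ndarray, corners_px) -> np.ndarray:
    """Rectify a captured or projected image so the shared workspace
    appears as an undistorted rectangle. `corners_px` holds the four
    corners of the physical medium in image coordinates, ordered
    top-left, top-right, bottom-right, bottom-left."""
    tl, tr, br, bl = [np.float32(c) for c in corners_px]
    width = int(max(np.linalg.norm(tr - tl), np.linalg.norm(br - bl)))
    height = int(max(np.linalg.norm(bl - tl), np.linalg.norm(br - tr)))
    src = np.float32([tl, tr, br, bl])
    dst = np.float32([[0, 0], [width - 1, 0],
                      [width - 1, height - 1], [0, height - 1]])
    M = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(frame, M, (width, height))
```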
As noted above, when using a copresence app to collaborate with remote participants, the system may create bookmarks—snapshots of the shared canvas—at different points in time. The bookmarks may be stored, e.g., in database 284 that is accessible to the participants, or locally in memory of a given participant's computing device. The bookmarks could be viewed by later-joining participants to see the evolution of the collaboration without having to ask the other participants what happened previously. In one example, any participant would be able to manually create a bookmark. For instance, this could be done with a hand gesture or certain body pose, pressing a key (e.g., the spacebar) on the participant's computer, using a separate tool such as a programmable button that is connected to the computer (or physical medium) via a Bluetooth connector, speaking a command, etc.
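By way of illustration only, a bookmark mechanism could snapshot the shared canvas whenever any of the triggers described above fires; the storage layout and file naming below are assumptions.

```python
import os
import time
import cv2

class BookmarkStore:
    """Illustrative sketch of local bookmark storage for canvas snapshots."""

    def __init__(self, directory: str = "bookmarks"):
        self.directory = directory
        self.entries = []  # list of (timestamp string, file path)
        os.makedirs(self.directory, exist_ok=True)

    def create(self, canvas_image, created_by: str) -> str:
        """Save a snapshot of the shared canvas, e.g., when a hand gesture,
        spacebar press, hardware button or voice command is detected."""
        ts = time.strftime("%Y%m%d-%H%M%S")
        path = os.path.join(self.directory, f"{ts}_{created_by}.png")
        cv2.imwrite(path, canvas_image)
        self.entries.append((ts, path))
        return path
```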
In another scenario, information from the collaboration may be automatically captured and imported into a task list or calendar program. Here, for example, the system may identify one or more action items for at least one participant, and automatically add such action item(s) to a corresponding app, such as by syncing the action item(s) with the corresponding app.
In a further scenario, a transcript of a copresence session may be generated using a speech-to-text feature. The transcript may be timestamped along with the timing for when information was added to, removed from or modified on the shared canvas. In this way, it could be easy to determine the context for why a particular decision was made.
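By way of example only, transcript entries and canvas events could be kept in a single time-ordered log so the context for a change can be reviewed later; the event names below are assumptions.

```python
from bisect import insort

class SessionLog:
    """Interleave speech-to-text transcript entries with timestamped
    canvas events (illustrative sketch)."""

    def __init__(self):
        self._entries = []  # (timestamp_s, kind, payload), kept sorted

    def add_transcript(self, timestamp_s: float, speaker: str, text: str):
        insort(self._entries, (timestamp_s, "speech", f"{speaker}: {text}"))

    def add_canvas_event(self, timestamp_s: float, action: str, detail: str):
        # action: "added", "removed" or "modified"
        insort(self._entries, (timestamp_s, "canvas", f"{action}: {detail}"))

    def dump(self):
        return [f"[{t:9.2f}s] {kind:>6}: {payload}"
                for t, kind, payload in self._entries]
```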
Depending on the configuration of the copresence app, in yet another scenario the system could allow participants to create the user interface they need, when they need it, e.g., by drawing it on the shared canvas. For example, like a form of programming, this approach would allow users to customize their experience, while reducing computational overhead associated with maintaining elements of a UI that aren't functional—instead they are created when needed. By way of example, they could create a portal to see through to the “other side” of the shared canvas and have a higher fidelity view of the person they are collaborating with. This could be particularly useful for complex discussions where facial expressions would add additional contextual information. Here, written details (e.g., “portal to shell”) associated with circling a portion of the shared canvas would indicate what the user interface should do. Alternatively, tactile-based tools could allow for copying and pasting, such as by circling the desired information and using a dragging action across the physical medium to create a copy in another part of the shared canvas.
Although the technology herein has been described with reference to particular embodiments, it is to be understood that these embodiments are merely illustrative of the principles and applications of the present technology. It is therefore to be understood that numerous modifications may be made to the illustrative embodiments and that other arrangements may be devised without departing from the spirit and scope of the present technology as defined by the appended claims.
As discussed herein, tactile copresence among two or more remote participants can be employed in various real-time interactive applications. Videoconferencing apps, brainstorming meetings, personal tutorial or training sessions, games and other activities can be implemented with simple physical setups. By only sending body profile information instead of transmitting full-motion video, significant reductions in bandwidth can be achieved. By using skeleton coordinates or other depth-map information received by the other computing devices to form a 3D reprojection of the participant as a presence shadow, full image details of the participant and other items in the background environment are not shown. In addition to reducing bandwidth, this also helps minimize unnecessary details, which can enhance user comfort with being on-screen.
Filing Document: PCT/US21/43694; Filing Date: 7/29/2021; Country: WO.