The introduction of Augmented Reality (AR) Head Mounted Displays (HMDs) into collaboration between remote and local workers introduces new challenges, given that camera views are now mobile. The present disclosure provides an AR HMD-based collaborative system designed for remote instruction over live mobile views during physical tasks. The collaborative system includes a world-stabilized area where remote helpers can dynamically place a pointer and annotations on the physical environment, and an indirect input mechanism with an absolute position mapping to the world-stabilized area. Examples provided within show how the described system worked for participants engaged in a remote instructional task and how the pointer and annotations supported effective and efficient communication.
The Summary is provided to introduce a selection of concepts that are further described below in the Detailed Description. This Summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used as an aid in limiting the scope of the claimed subject matter.
Embodiments of the present disclosure include a collaborative work system comprising: a head mounted device, sized and configured to be worn by a local worker, including a worker display and a camera configured to capture a video feed from the perspective of the local worker; an input device configured to accept inputs from a remote helper, including a helper display; and a shared virtual workspace displayed on the worker display and the helper display, comprising the video feed and annotations corresponding to the received inputs.
In some embodiments, the virtual workspace further comprises a fixed coordinate space within a world-stabilized two-dimensional plane.
In some embodiments, the virtual workspace includes a cursor configured to move according to the inputs from the remote helper.
In some embodiments, the input device includes a touchpad configured to accept touch inputs from the remote helper.
In some embodiments, the input device includes an erase function configured to erase select annotations.
In some embodiments, the system further comprises a communication device configured to send and receive instructions between the local worker and the remote helper.
In some embodiments, the communication device is a two-way audio-based communication device.
Embodiments of the present disclosure include a method for collaborative work, the method comprising: capturing a video feed from the perspective of a local worker; projecting a virtual workspace over the video feed; displaying said video feed to a remote helper; recording inputs from the remote helper in the form of annotations; combining the video feed and annotations to create an annotated video; and displaying the annotated video to the local worker.
In some embodiments, the method further comprises projecting a fixed coordinate space within a world-stabilized two-dimensional plane within the workspace.
In some embodiments, the method further comprises erasing annotations after the annotated video is displayed to the local worker.
In some embodiments, the method further comprises manipulating a cursor within the workspace.
In some embodiments, the method further comprises sending audio instructions between the local worker and the remote helper.
In some embodiments, the annotated video is displayed to the local worker using an augmented reality headset.
In some embodiments, the inputs are recorded using a touchpad.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee. The accompanying figures are provided by way of illustration and not by way of limitation.
Section headings as used in this section and the entire disclosure herein are merely for organizational purposes and are not intended to be limiting.
The terms “comprise(s),” “include(s),” “having,” “has,” “can,” “contain(s),” and variants thereof, as used herein, are intended to be open-ended transitional phrases, terms, or words that do not preclude the possibility of additional acts or structures. The singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise. By way of example, “an element” means at least one element and can include more than one element. The present disclosure also contemplates other embodiments “comprising,” “consisting of” and “consisting essentially of,” the embodiments or elements presented herein, whether explicitly set forth or not. As used herein, “and/or” refers to and encompasses any and all possible combinations of one or more of the associated listed items, as well as the lack of combinations where interpreted in the alternative (“or”).
For the recitation of numeric ranges herein, each intervening number there between with the same degree of precision is explicitly contemplated. For example, for the range of 6-9, the numbers 7 and 8 are contemplated in addition to 6 and 9, and for the range 6.0-7.0, the numbers 6.0, 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8, 6.9, and 7.0 are explicitly contemplated. Recitation of ranges of values herein is merely intended to serve as a shorthand method of referring individually to each separate value falling within the range, unless otherwise indicated herein, and each separate value is incorporated into the specification as if it were individually recited herein. For example, if a numerical range is stated as 1% to 50%, it is intended that values such as 2% to 40%, 10% to 30%, or 1% to 3%, etc., are expressly enumerated in this specification. These are only examples of what is specifically intended, and all possible combinations of numerical values between and including the lowest value and the highest value enumerated are to be considered to be expressly stated in this disclosure. “About” or “approximately” is used to provide flexibility to a numerical range endpoint by providing that a given value may be “slightly above” or “slightly below” the endpoint without affecting the desired result.
Unless otherwise defined herein, scientific and technical terms used in connection with the present disclosure shall have the meanings that are commonly understood by those of ordinary skill in the art. The meaning and scope of the terms should be clear; in the event, however, of any latent ambiguity, definitions provided herein take precedence over any dictionary or extrinsic definition. Further, unless otherwise required by context, singular terms shall include pluralities and plural terms shall include the singular.
Collaborative systems that provide shared visual information through shared workspaces, as provided in the present disclosure, improve situation awareness and serve as resources for conversational grounding, improving communication between collaborators. Virtual pointing in shared workspaces further improves performance on physical tasks by reducing movement quantity, as implicit guidance becomes explicit visual cues. Using Augmented Reality (AR) and a Head Mounted Display (HMD), a remote helper can view a local worker's physical workspace and directly augment the local worker's view.
However, numerous challenges arise when designing for remote instruction through an AR HMD. One significant challenge is the shift from a fixed view to a mobile view: the camera is no longer in a fixed position but moves with the worker's position through an HMD or any mobile device. Referencing objects is difficult with a mobile camera, so typically a local worker is required to move carefully to a suitable location where the remote helper can then refer to a target object. Without this coordinated manual stabilization of the view, it is not clear how remote helpers can dynamically create stabilized annotations for instruction—i.e., how to point or draw a circle on a real-world object while the camera moves about the space. The primary obstacle with a moving camera is that, as a remote expert clicks on a pixel at a given moment to draw, the next moment that pixel corresponds to another location in the local person's real world. One approach to overcome this obstacle is to require the remote expert to take a snapshot of the view before interacting with it, essentially "freezing" the shared workspace in time and drawing on it, after which the system either (a) presents the result as a floating image, or (b) transfers the finished drawing as a stabilized annotation on the local worker's HMD. However, the "freeze" approach limits the remote helper's ability to engage in deictic referencing—i.e., dynamically pointing or annotating while providing verbal utterances such as "here"—which is important for facilitating clear as well as efficient communication.
Addressing these challenges is a pressing concern as AR HMDs become more common and available as consumer technology. The present disclosure provides an AR HMD-based collaborative system (sometimes referred to as the "HoloMentor" herein) designed for remote instruction over live mobile views during a local worker's completion of physical tasks. The present disclosure provides at least two advantages: (1) dynamically aligning a pointer on a live mobile view, i.e., without the need for the remote helper to freeze the view, and (2) providing an accurate and intuitive interaction mechanism for a remote helper to point and annotate on those live mobile views. Experimental results using the disclosed system, involving a local worker completing a building-blocks task under the instruction of a remote helper, are provided within.
Experimental evaluations show that the pointer and the annotations were sufficiently accurate for the remote helper to use during instruction and clear enough for the local worker to understand their reference. In addition, the remote helper was able to understand how to reliably use the disclosed input method for moving the pointer over the live mobile view. Both of these aspects enabled the remote helper to provide clear instructions through dynamic pointing and annotating over live mobile views.
The disclosed system includes several design characteristics.
Local worker. Local workers are responsible for the live mobile view, as they wear the AR HMD on their heads. The local worker's head movement, and corresponding view, may be constrained or free. Constraining head movement entails a social rather than technical solution: it includes having the remote helper ask the local person to stay still when annotating. The advantage of this approach is that there is no synchrony problem (both remote and local collaborators see the same view), and minimal movement means any pointing or annotating is accurately placed on that view. However, in environments that are already cognitively and physically taxing, such as surgical telementoring or paramedic teleconsulting, this approach places an additional coordinative burden on the collaborators. In fact, interacting with supporting technologies for secondary tasks while performing a complex primary task like surgery can interfere with the cognitive and physical demands of the latter, which in turn has been shown to lead to errors. Thus, the disclosed system aims to avoid unnecessarily constraining the local worker's movements by supporting dynamic tasks.
Remote helper. Remote helpers are responsible for pointing and creating annotations, to which the system provides feedback. Most important to note is that existing solutions do not provide a pointer that can represent the remote helper's dynamic cursor movements over a snapshot or a live view. Such deictic referencing is an important part of remote instruction. Thus, the disclosed system includes a mechanism for both pointing as well as annotating a live mobile view as one would naturally do on a static video image.
Output. Output—visualization of the pointer or annotation locations—could happen on the shared view or elsewhere. In order to support live mobile views, and dynamic worlds, the helpers need to focus their attention on the shared view; thus, all output resides there. The pointer and annotations are fixed on the real world by making them world-stabilized. Because the volume where the pointer and annotations can live is potentially infinite (and mostly irrelevant to the task), the output space is constrained to a 2D plane where the task is executed, such as a table, rather than a 3D volume. The output space is then further constrained to a portion of this plane, where most of the task takes place.
Input. The input mechanism needs to interact accurately and intuitively with the 2D input surface showing the live mobile view. The helper could use the desktop mouse to click on the video and then project that point into the output space to produce an annotation. However, as soon as the view changes from the worker moving their head, the pixel that the remote helper clicks also changes. Therefore, if the view shifts while the helper clicks to annotate, the result is an unintended annotation line that follows the head movement. The disclosed system creates a fixed coordinate space for both users by transforming the XY coordinates of the mouse cursor in the video window to the corresponding XY coordinates of the output space, assuming border alignment. The output space is defined as a rectangle on the table in front of the worker. As such, when a remote helper moves their mouse toward the upper right corner of the video display window, the AR pointer correspondingly moves to the upper right corner of the output space. As this would result in two cursors/pointers on the remote helper's video display window, which would be confusing for a remote helper, the mouse cursor is made invisible when within the video display window's bounds.
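The border-aligned mapping may be implemented, for example, as in the following minimal sketch; the function and type names are illustrative assumptions, not the disclosed implementation.

```python
from dataclasses import dataclass

@dataclass
class Rect:
    left: float
    top: float
    width: float
    height: float

def video_to_output(mouse_x: float, mouse_y: float,
                    video: Rect, output: Rect) -> tuple[float, float]:
    """Map a cursor position in the video display window to the
    world-stabilized output space, assuming border alignment."""
    # Normalize the cursor position within the video window (0..1).
    u = (mouse_x - video.left) / video.width
    v = (mouse_y - video.top) / video.height
    # Scale into the output rectangle defined on the worker's table.
    return (output.left + u * output.width,
            output.top + v * output.height)
```

Under this mapping, the helper's cursor position depends only on the window geometry, not on the current camera frame, which is why head movement cannot drag an in-progress annotation.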
The disclosed system is an AR HMD-based collaborative system designed for remote instruction over live mobile views during a local worker's completion of physical tasks. The disclosed system addresses two significant challenges: (1) dynamically aligning a pointer on a live mobile view, i.e., without the need for the remote helper to freeze the view, and (2) providing an interaction mechanism for a remote helper to point and annotate on those live mobile views. The basis of the disclosed system consists of two parts: a desktop application for the remote helper and an AR application running on the local worker's HMD.
a. Actionport
The Actionport provides remote helpers with pointing and annotating functionality over a live stream from the local worker's head-mounted camera and is anchored in a fixed position as a virtual overlay in the local worker's physical environment. By fixing the position of the Actionport, there is a stable, defined space in the local worker's environment with a one-to-one mapping to a defined space on the remote helper's display. This defined shared workspace is not affected by the movement of the camera, so a remote helper can control a cursor in the space to dynamically point or draw. For instance, when a remote technician points to a wire on a control panel and the local technician then moves their head to the right by 2 inches, the pointer remains over the wire.
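This stability follows from storing the pointer in world coordinates and re-projecting it into the camera view each frame, as in the following sketch under a simple pinhole-camera model (all names, and the camera model itself, are illustrative assumptions).

```python
import numpy as np

def project_pointer(world_point, cam_rot, cam_pos, fx, fy, cx, cy):
    """Re-project a world-stabilized pointer into the current camera
    view. cam_rot is a camera-to-world rotation matrix, cam_pos the
    camera position; the camera looks along its +z axis."""
    # Express the world-anchored pointer in the camera frame.
    p_cam = cam_rot.T @ (world_point - cam_pos)
    # Perspective projection onto the image plane.
    x = fx * p_cam[0] / p_cam[2] + cx
    y = fy * p_cam[1] / p_cam[2] + cy
    return x, y
```

Because world_point never changes when the worker moves their head, only cam_rot and cam_pos do, so the rendered pointer tracks the same physical spot, such as the wire in the example above.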
A feedforward mechanism is provided in order to facilitate remote and local worker coordination in the placement of the Actionport. Even though the remote helper cannot move the position of the Actionport themselves, they can coordinate with the local worker to place the Actionport through this feedforward mechanism. This enables both collaborators to preview where an Actionport will be placed by using a raycast along the local worker's gaze direction.
In some embodiments, the spatial mapping feature of the open-source Mixed Reality Toolkit is used to stabilize the Actionport and its content (i.e., the pointer and annotations) in the local worker's environment. The spatial mapping feature creates triangle meshes on real-world surfaces in the environment around the Microsoft HoloLens device. First, a raycast along the camera's view orientation identifies the nearest spatial surface on which the Actionport will be overlaid. Then, a position and a normal vector (i.e., a vector perpendicular to a plane) of the triangle mesh that the ray hit are calculated. The Actionport is placed at this position with its plane parallel to the triangle mesh. Also, to rotate the bottom edge of the Actionport toward the local worker, a vector between the position of the local worker and the Actionport is calculated, and the x angle of the Actionport is then adjusted accordingly.
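The placement geometry may be sketched as follows; the vector math and names are illustrative assumptions, not the Mixed Reality Toolkit API.

```python
import numpy as np

def place_actionport(hit_pos, hit_normal, worker_pos):
    """Illustrative placement math: anchor the Actionport at the
    raycast hit point, align its plane with the surface, and turn
    its bottom edge toward the local worker."""
    n = hit_normal / np.linalg.norm(hit_normal)  # surface normal
    to_worker = worker_pos - hit_pos
    # Project the worker direction onto the surface plane so the
    # Actionport's bottom edge points toward the worker along it.
    down = to_worker - np.dot(to_worker, n) * n
    down = down / np.linalg.norm(down)
    right = np.cross(down, n)  # completes an orthonormal basis
    # Columns give the Actionport's local axes in world space.
    rotation = np.column_stack((right, down, n))
    return hit_pos, rotation
```

The returned position and rotation together define a world-stabilized pose for the Actionport plane that faces the local worker.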
b. Actionpad
The Actionpad, an indirect input mechanism, is provided for remote helpers to act on a tablet through touch and perceive the effect in the Actionport on the desktop application's video display window.
In some embodiments, a web application loaded on a tablet serves as the Actionpad.
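For example, such an application might map an absolute touch position on the tablet to normalized Actionport coordinates, optionally rotating the coordinates to mirror the rotate-screen function; the transform below is a sketch under those assumptions, not the disclosed implementation.

```python
import math

def touch_to_actionport(tx, ty, pad_w, pad_h, rotation_deg=0.0):
    """Map an absolute touch position (tx, ty) on a pad of size
    pad_w x pad_h to normalized Actionport coordinates in [0, 1]."""
    # Normalize to [-0.5, 0.5] so rotation happens about the center.
    u, v = tx / pad_w - 0.5, ty / pad_h - 0.5
    theta = math.radians(rotation_deg)
    ru = u * math.cos(theta) - v * math.sin(theta)
    rv = u * math.sin(theta) + v * math.cos(theta)
    return ru + 0.5, rv + 0.5  # back to [0, 1]
```

The absolute (rather than relative) mapping is what lets the helper keep pointing even when the pointer is outside the worker's current view, as described in the evaluation below.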
The disclosed system was experimentally evaluated using the method described below.
a. Participants
Sixteen participants were recruited in June 2021 through targeted email solicitation. Once participants replied with their interest to participate, they were manually paired based on their availability. All of the participants were between 18 and 30 years of age; four identified as female and twelve as male, and none had any prior experience with the Microsoft HoloLens.
b. Procedure
Each participant was placed in a separate room. The participant who played the role of the remote helper was seated in front of a desktop computer and provided a mouse, keyboard, and the Actionpad tablet. The participant who played the role of the local worker was seated at a table and provided with the HoloLens 2 AR glasses. Communication between the remote helper and the local worker to complete the task was accomplished solely by using HoloMentor's two-way audio channel, video from the AR HMD system, and augmented reality visual overlays.
After consent was obtained, a pre-questionnaire was distributed. After completion of the pre-questionnaire, each participant watched the same introductory presentation that outlined how the system worked (e.g., how to set the Actionport, how to use the Actionpad) and how HoloMentor could be used for remote instruction. The participants then engaged in a practice task with HoloMentor. The remote helper was provided with a drawing of a house and a guide that ensured the participants would gain experience using each functionality at least once. The local worker was provided a pen and a plain sheet of paper. During and after the practice task, both participants were free to ask the researchers any clarifying questions.
The participants were then introduced to the main task: building a blocks structure. The remote helper was presented with a pre-built structure situated on a base board. The base board was fixed to the table such that the remote helper could view the structure but not move it in any direction. The local worker was presented with several building blocks and a fixed base board in front of them. The base board of the local worker was fixed horizontally and the base board of the remote helper was fixed vertically. Fixing the two base boards at orientations 90 degrees apart was a deliberate design choice to encourage the remote helper to use the provided rotate-screen functionality. This also provided the opportunity to see if and how the remote helper oriented the Actionpad to align with the orientation of the Actionport.
When the main task began, the remote helper instructed the local worker using HoloMentor. After successfully completing the task, each participant was presented with a post-questionnaire. The participants then switched rooms and thereby switched roles. Another practice task was completed with a different drawing, followed by the main task again. For the second round of the main task, a different building-blocks structure was assigned to the remote helper to avoid a learning effect. A second post-questionnaire was filled out by the participants to reflect on their new roles as remote helper and local worker.
Lastly, a semi-structured interview was conducted where both researchers and both participants gathered in one large room. The interview contained questions from what the researchers observed during the main tasks and other questions from an interview script. Each participant took turns to answer each question in their role as a remote helper and a local worker.
c. Data Collection
Audio-Video Recordings & Notes: The interaction between participants during the main task was audio and video recorded using GoPro cameras in each room, and screen recorder software captured the use of HoloMentor on the desktop. The placement of the cameras captured the desktop application's display, the interactions with the Actionpad, the profile of the remote helper, and the upper body and workspace of the local worker. The researchers also took observation notes during the main task.
Post-Interview: After participants performed two main tasks, the researchers and the participants came together for a semi-structured interview that was audio recorded. The interview consisted of semi-structured questions as well as additional questions added from what the researchers observed during the main tasks. Each participant took turns to answer each question in their role as a remote helper and a local worker.
Questionnaires: (1) A pre-questionnaire that included demographic questions and information about participants' previous experience with AR and the Microsoft HoloLens equipment. (2) Two post-questionnaires, one for the remote helper and one for the local worker. The questions were designed to gather information about the usefulness of the disclosed system's functionality and how or why participants chose to use any functionality when performing each role. The remote helper post-questionnaire specifically asked how useful the system's functions were (Q1, Q2, and Q3) and how participants used the functions to convey remote instructions (Q4 and Q5). The local worker post-questionnaire specifically asked whether the features were useful for performing physical tasks (Q1 and Q2) and how easy it was to understand the remote helper's actions (Q3 and Q4). The questions were in the form of 5-point Likert scales (1=Strongly Disagree, 5=Strongly Agree).
d. Data Analysis
Three sets of data were analyzed and triangulated. First, the interaction videos captured during the main task were analyzed. Each significant moment when a remote helper used the Actionport or Actionpad to instruct on the main task was first highlighted in this data. This practice includes a detailed reflective analysis that captures utterances, pauses, overlaps, intonations, and visible actions. In the presented vignettes, the participant number and the role the participant was playing are indicated through notation (e.g., P5RH for Participant 5 playing the Remote Helper, P6LW for Participant 6 playing the Local Worker). The usability of the Actionport and the Actionpad was the focus: specifically, the accuracy of use of the pointer, understanding of the relationship between the Actionport and the Actionpad, and how and when the remote helper used the pointer and annotations instead of verbal instructions. The participants' behaviors were carefully observed, such as how they communicated through the Actionport, how the remote helper used the Actionpad, and how local workers responded to the information they received.
Second, the interview data was transcribed using Otter.ai and then manually checked for missing data. Using NVivo software, evaluators open-coded the transcribed data along with the questionnaire's open-ended answers to identify themes regarding the tools' use. Using selective coding, they then categorized those themes based on common patterns. A detailed description of the particular qualitative themes that reflect the participants' thoughts regarding the dynamic pointer, annotations, the Actionpad, and the Actionport was the focus. For example, evaluators identified how the dynamic pointer affects the remote helper's instructions. By integrating the conversation analysis and the interview analysis, evaluators generated 32 initial open codes that helped mark the content and organize it in a meaningful way. The final high-level themes were associated with the quality of deictic reference by the pointer and annotations, the relationship between the Actionpad and the Actionport, and pointing/annotating on live mobile views.
Third, the questionnaires' Likert scale responses were descriptively analyzed, with counts reported as medians (M), standard deviations (SD), and diverging bar charts. Although the design of the study does not allow for inferential statistical analysis, descriptive statistics are presented as evidence in support of the primary qualitative analysis.
The disclosed system helps remote helpers overcome the challenges of pointing and annotating over live mobile views during the local worker's completion of physical tasks, as shown in the following vignettes.
a. Alignment of Pointer and Annotations with Actionports
The first research question was whether the Actionport was able to sufficiently align the remote helper's dynamic pointer and annotations and maintain that alignment with a mobile camera view. Evidence of this is presented through two types of examples: (1) the remote helper's demonstrated satisfaction with the alignment of the pointer and the intended target; and, (2) the local worker's demonstrated ability to understand the locations referred to and instructions provided by the remote helper.
(1) Actionport Provides Remote Helpers with Good Alignment of a Dynamic Pointer and Annotations on the Intended Physical World Target.
To investigate pointer accuracy during instruction, quantitative data is presented from the post-questionnaire, followed by qualitative analysis of the video data and interviews. Participants acting as remote helpers highly rated the ease with which they could convey where they were looking (M=5, SD=0.63 on a 5-point scale), where they were pointing (M=5, SD=0.51), when articulating a spatial measure (M=4, SD=0.68), motion (M=4, SD=0.95), and specific objects (M=5, SD=0.74).
Through analysis of the video data, one sees that as the remote helpers used the pointer and annotation tools to provide instructions, the sixteen participants showed no evidence of being dissatisfied with the alignment of the dynamic pointer/annotation on the view. Overall, the evidence shows seamless use of the pointer and annotations to share location or placement instructions.
In Vignette 1, the remote helper moves his pointer toward the location of the next block placement. He is able to show and verbally reference the location without any pause to correct the alignment. In addition, he affirms that the local worker placed the block in the location he indicated.
In Vignette 1, the remote helper dynamically moves the pointer over the live mobile view while simultaneously talking to the local worker. Because of the stability of the Actionport over the live mobile view, there was a reference frame for the pointer to move within and accurately point to a location on the live view. This was perceived to be sufficiently accurate by all of the remote helper participants; thus, they mainly used the pointer for the instructions they provided the local workers. Evaluators counted every instance remote helpers used the pointer (113), annotations (83), or verbal communication only (83) at each instruction moment; the pointer was used for 40.5% of instructions.
Likewise, during the post-study interviews, no participants presented examples or complaints that the pointer was not accurate enough for their use. When discussed at all, the participants focused on the ability the pointer provided them in making their references clear. For instance, the following are examples of participants' reflections on the accuracy of the pointer.
“If I was pointing [at] a block with a pointer, he was very easily going to pick it up. And we could move forward to the task. I overall thought that he understood what I was trying to convey.”-P5RH
“As [a remote helper], I found it much easier to convey what I wanted along with the ability to point to the object that I want.”-P11RH
In addition to pointing, annotations were also perceived to be correctly aligned with the intended target. Vignette 2 depicts an occasion where a remote helper drew four dots to guide the local worker in placing a square block. The four dots were fixed to the correct position on the green block, so the local worker placed the square block in that position. After drawing the four dots, the remote helper instructed the placement of the block by moving the pointer to its center.
A key assessment of the suitability of the disclosed solution is whether it can accurately maintain the placement of a pointer or annotation despite the camera view coming from an HMD, which can introduce both slight natural head movements and more pronounced head re-orientation. Thus, the user's perception of placement accuracy must be evaluated. The overall suitability of the placement is shown through three different data types. Suitability is used because the evaluation method cannot definitively measure whether the system was 100% accurate. However, users did not perceive any significant deviation that would cause them to consider the accuracy unsupportive of what they wished to accomplish. Considering the exactness of some of the intended references and annotations, the system was perceived to work to a high degree even for the fine referencing required.
(2) Actionport Provides Local Workers with Good Alignment so they can Understand Location References.
The post-questionnaire quantitative data analysis shows that local workers also responded with high ratings when asked if they could easily understand where the remote helper was looking (M=4.5, SD=0.98) and pointing (M=5, SD=0.62), and whether they could interpret spatial measures (M=4, SD=0.70), motion (M=5, SD=0.69), and specific objects (M=5, SD=0.47).
In addition, all local workers answered that they were able to understand where the remote helper was indicating through the pointer and/or annotations.
From the qualitative analysis of the video data, local workers were able to understand the particular targets being referenced by the remote helpers in the course of task completion, for example, when the remote helper referenced a concrete element in the real world.
In Vignette 3, the local worker is able to act in accordance with the instructions being provided. He knows exactly which block to move and where to move it. There is no need for clarifying questions by the local worker; more to the point, the collaborative pair communicates more efficiently because the remote helper can use words such as "this" and "here," which are unambiguous statements for the local worker.
Likewise, in Vignette 4, the remote helper uses the annotation functionality to indicate the orientation of a block's placement and the local worker demonstrates the exactness of the annotation location by his ability to satisfy the request without any clarifying questions.
In the example from Vignette 4, the annotation was a strong indication for the local worker. The remote helper only said “here” while drawing the annotation, but the local worker saw the starting position and direction of the annotation and placed the block correctly. There was no need for a conversation to confirm the location or direction.
Finally, this next example shows one unique benefit of the disclosed system. The Actionport is meant to support pointing over a moving camera view in order to support complex collaborative interactions. Vignette 5 shows a pointing action performed while the local worker moves their head, changing the view. The result is nonetheless an interlacing of talk and action by both parties that is efficient and effective in getting the work done, without significant overlap. This reinforces the communication benefit of being able to point/annotate on live mobile views during task execution.
In this case, the pointer is shown to have sufficient alignment through both side-to-side head movement and movement closer to or further away from the physical space. This example also shows how useful this is for the collaborators, as the remote helper can continue to provide deictic referencing while the local worker is moving their head. In other words, by supporting pointing over live mobile video during instruction, the disclosed system supports the interlacing of talk and deictic referencing by the remote helper with natural head/body movement for closer inspection, as well as physical object movement for the task at hand, by the local worker.
In Vignette 6, the worker correctly places a building block given an annotation. The remote helper then gives a new instruction, but the old annotation now sits on top of the newly placed block; because the local worker associates the new instruction with the old annotation, the result is a misplacement of the next building block. Thus, the dynamic nature of the physical world being augmented with instructions brings to light a new challenge to overcome when designing for remote annotation of live mobile views.
b. Use of the Actionpad to Point and Annotate
The remote helpers were able to effectively understand how to use the Actionpad to move the pointer and draw annotations. This is shown through examples of the remote helpers demonstrating the ability to point and annotate on the Actionpad independently of the live mobile view. Further findings are provided on how the remote helpers established their understanding of the relationship between the Actionpad screen and the resulting movements of the pointer in the Actionport, as well as the relationship of the orientation of the Actionpad to the orientation of the Actionport.
(1) Independent Interaction for Pointer and Annotations through Actionpad.
The remote helpers controlled the Actionport pointer with the Actionpad consistently, and their actions were not affected by the live mobile view. No moments were observed in which remote helpers complained of disorientation when interacting with the Actionpad. In addition, when the pointer was not visible in the live mobile view due to the local worker's head movement, the remote helpers could continue pointing and annotating while asking the local worker to move their head back to make the pointer visible again.
Vignette 7 illustrates this independent interaction.
(2) Remote Workers Establish Alignment between Actionport and Actionpad.
In all cases, remote helpers first aimed to establish an alignment between their interactions on the Actionpad and the movement of their pointer in the Actionport before providing an instruction. When they wanted to move the pointer in the Actionport, they first checked whether their finger was positioned where they wanted on the Actionpad. Then, while moving the pointer, they watched whether the pointer actually moved to the location they intended in the Actionport.
P16RH: ((Looks back up towards the screen)) (.) ((Moves the pointer to the left))
In Vignette 8, the remote helper looked at the screen first, then the Actionpad, and then moved his finger while looking at the screen again before he began deictic referencing. Thus, for a moment, the remote helper took the time to first check whether the interaction through the Actionpad corresponded to the movement of the Actionport's pointer. However, after learning how the two interfaces aligned, the remote helpers rarely looked at the tablet again when pointing, keeping their gaze fixed on the desktop monitor when moving the pointer or drawing via the Actionpad.
(3) Coupling Rotated Actionport with Actionpad.
Remote helpers understood the relationship between the Actionpad and the Actionport. In Vignette 9, the remote helper had previously rotated the video view counterclockwise and rotated the Actionpad accordingly to align. After a while, the remote helper decided to rotate the video view back to the original orientation.
At this moment, the remote helper rotated back to the original view but did not rotate her tablet at the same time. However, as soon as she moved the pointer and saw that it moved in a different direction than intended, she recognized she needed to also rotate the tablet, and then checked again whether the pointer in the Actionport moved in the direction she wanted. Vignette 9 shows the way the remote helper couples the direction of the Actionport with the Actionpad. All remote helpers who used the rotation function went through the same coupling process.
The disclosed system, consisting of an Actionport and an Actionpad, enables remote helpers to produce world-stabilized pointing and annotations without the need to "freeze" the remote view. The main takeaway is the demonstrated advantage over current approaches for producing world-stabilized annotations: in contrast to interfaces where remote helpers can interact only by first taking a snapshot of the view, the disclosed approach (1) supports pointing, and thus deictic referencing, in instruction, and (2) supports live pointing and annotation while the view changes, and thus seamless communication without the need to disrupt the local work.
First, experimental results show the Actionport provided a suitable alignment of pointers and annotations to the remote helper's intended target. In addition, none of the remote helpers indicated a loss of control of the pointer as the local workers moved their heads to explore the space. Because of this, the world-stabilized Actionport provided a means for the remote helper to accurately provide deictic referencing and movement direction without being affected by the local worker's head movement. Above all, pointing and annotating in the live mobile view had a positive effect on maintaining shared understanding between the remote helper and the local worker. For example, when the remote helper referred to a block and moved the pointer to a specific position, the local worker followed the pointer movement while holding the block almost simultaneously and then placed it in the intended position. Notably, the evidence shows that when given the option of either pointing or annotating, users chose dynamic pointing for many references. More so, the Actionport provided a mechanism for achieving accurate deictic referencing; in essence, providing only a pointer that participants did not perceive to be accurate enough for effective use would not have been sufficient.
Second, the Actionpad enabled remote helpers to continue manipulating the pointer regardless of changes in the live mobile view. Moreover, this remained true even when the pointer was off-screen: the remote helper asked the local worker to look back at the pointer in order to see its position, but in the meantime still continued interacting on the Actionpad. Interestingly, remote helpers quickly and intuitively understood the relation between their actions on the Actionpad and the result in the Actionport. For example, as there is a one-to-one correspondence between the XY coordinates on the Actionpad and the Actionport, when the remote helper rotated the live mobile view, the Actionpad could also be rotated in the same direction to ease the mapping process. Even if the remote helper rotated the video view first and then forgot to rotate the Actionpad, they did so immediately upon realizing that the movement (pointing or annotating) on the Actionpad did not match the expected outcome for the Actionport pointer.
a. Interaction for Creating World-Stabilized Annotations
Previous work proposed systems to support world-stabilized annotations by freezing the live view. However, the gap that occurs when the video stops and then restarts can cause disorientation and confusion for the remote helper. The disclosed interaction technique supports stabilized annotating on the live mobile view without freezing the view, which the study shows enabled smooth communication between collaborators and enabled the remote helper to immediately perceive the local worker's actions. It is necessary to provide remote experts with mechanisms to navigate the local worker's space independently of the local worker's view, because when the local worker uses an HMD, they control the view. As beneficial as this may be for visualizing the remote world, it may actually be disadvantageous when acting on the remote world, creating misalignment in communication and difficulty in providing immediate feedback.
Additional approaches and modifications to the disclosed system have been considered, for example: (1) a feedforward mechanism: a hovering function where the remote helper first adjusts their pointer position in a personal view and, when ready to communicate with the local worker, presses the screen (e.g., force touch) to perform the action; and (2) showing a 2D reflection of the physical workspace on the Actionpad using 3D reconstruction: the image of the physical environment in which the Actionport is placed is reconstructed and reflected on the Actionpad in real time. Lastly, formerly placed annotations can be misinterpreted as the physical world changes beneath them. This may be mitigated by, for example, automatically removing an annotation when the real world is updated, or by mechanisms that create semantic links between annotations and objects (e.g., by drawing a line between them).
b. Dynamic Interaction for Remote Instruction in Live Mobile View
The Actionport provided a stable frame for the dynamic pointer to reference the live mobile view. Because of this, the remote helper frequently used the dynamic pointer to provide deictic referencing. When given the option of pointing or annotating, the remote helpers chose dynamic pointing for many references. The dynamic pointer is more beneficial than stabilized annotations for transient, short, procedural collaboration tasks. In addition, in situations where the live mobile view continues to change dynamically, immediate interaction is needed rather than stabilized annotation. An automatic erase function for remote instruction, in which annotations disappear after a few seconds, has been considered as potentially more efficient than a manual erase function. Drawn annotations may need to be erased after completing each step of a task because remaining annotations may cause confusion.
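As a sketch of how such an automatic erase function might work, the following keeps only annotations younger than a fixed time-to-live; the data structure and timeout value are assumptions, not the disclosed implementation.

```python
import time

class AnnotationStore:
    """Annotations expire a few seconds after creation."""

    def __init__(self, ttl_seconds: float = 5.0):
        self.ttl = ttl_seconds
        self.annotations = []  # (created_at, stroke) pairs

    def add(self, stroke):
        self.annotations.append((time.monotonic(), stroke))

    def visible(self):
        now = time.monotonic()
        # Drop annotations older than the time-to-live, then
        # return the strokes that should still be rendered.
        self.annotations = [(t, s) for t, s in self.annotations
                            if now - t < self.ttl]
        return [s for _, s in self.annotations]
```

Calling visible() each rendered frame would clear stale annotations automatically, addressing the confusion observed in Vignette 6 without requiring the helper to erase manually.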
The disclosed system provided a manual erase function through the desktop application's toolbar, but some remote helpers did not use it, even though confusion could arise from the remaining annotations. This may be because remote helpers interacted mainly through the Actionpad. Therefore, an automatic deletion function, or functionality for editing or erasing previous annotations directly in the Actionpad, has been considered. Finally, while the disclosed system lets the remote helper reliably control the dynamic pointer in the live mobile view, the local worker does not have control over the pointer or annotations. The system may be modified to enable equal ability to interact on both sides.
c. Supporting Conversational Grounding and Situational Awareness
HoloMentor mainly supports maintaining conversational grounding and situational awareness through real-time pointing and annotating on the live mobile view. The use of the four types of gestures (deictic, iconic representations, spatial/distance, and kinetic/motion) facilitated conversational grounding. HoloMentor provides deictic and kinetic/motion gestures through the dynamic pointer, and spatial/distance and kinetic/motion gestures through annotations in the Actionport. Shared visual cues facilitated situational awareness, enabling the remote helpers to provide better instruction. The experimental evaluation shows that the remote helper can perceive the local worker's actions and adapt their pointing/annotating through the live mobile view, which supports the remote helper in maintaining situational awareness in a complex and dynamic environment.
Additionally, HoloMentor not only supports deictic instruction in the live mobile view but is also compatible with the commonly used setup in which a remote helper works with a desktop computer and monitor. In remote instruction where the local worker is working with physical objects, AR can be more beneficial than VR because the local worker can see the objects directly. To see the physical objects through VR, not only would the system need to reconstruct the environment, but the reconstructed environment in VR is also not as accurate as the real world.
The present disclosure provides an AR HMD-based collaborative system designed for remote instruction over live mobile views during a local worker's completion of physical tasks, along with empirical results of a laboratory study evaluating its effectiveness and use. First, HoloMentor realizes pointing and annotating on a world-stabilized Actionport that lets the remote helper communicate with the local worker smoothly even though the view of the local worker's environment is not fixed. The Actionport is able to sufficiently align the remote helper's dynamic pointer and annotations and maintain that alignment with a mobile camera view. Second, the Actionpad entails a decoupling of interaction on a tablet (action) from its visualization in the live mobile view (perception). These two innovations tackle fundamental challenges that need addressing for collaborative systems based on AR through HMDs, where the view of the local worker's environment is mobile. The provided approach stands in contrast with prior work, whose solution was to freeze the view and thus could not support deictic reference.
This application claims priority to and the benefit of U.S. Provisional Patent Application No. 63/482,373 filed Jan. 31, 2023, which is incorporated herein by reference in its entirety for all purposes.
This invention was made with Government support under Federal Grant Nos. 1552837 and BCS-2026510 awarded by the National Science Foundation. The Federal Government has certain rights in the invention.