The present invention relates to methods and systems for dynamic virtual convergence in video display systems. More particularly, the present invention relates to methods and systems for dynamic virtual convergence for a video-see-through head mountable display.
A video-see-through head mounted display (VSTHMD) gives a user a view of the real world through one or more video cameras mounted on the display. Synthetic imagery may be combined with the images captured through the cameras. The combined images are sent to the HMD. This yields a somewhat degraded view of the real world due to artifacts introduced by cameras, processing, and redisplay, but also provides significant advantages for implementers and users alike.
Most commercially available head-mounted displays have been manufactured for virtual reality applications, or, increasingly, as personal movie viewing systems. Using these off-the-shelf displays is appealing because of the relative ease with which they can be modified for video-see-through use. However, depending on the intended application, the characteristics of the displays frequently are at odds with the requirements for an augmented reality (AR) display.
One application for augmented reality displays is in the field of medicine. One particular medical application for AR displays is ultrasound-guided needle breast biopsies. This example is illustrated in
Most commercially available HMDs are designed to look straight ahead. However, as the object of interest (either real or virtual) is brought closer to the viewer's eyes, the region of stereo overlap dedicated to this object shrinks toward the nasal side of each eye's display. Since the image content being presented to each eye is very different, the user is presumably unable to get any depth cues from the stereo display in such situations. Users of conventional parallel display HMDs have been observed to move either the object of interest or their head so that the object of interest becomes visible primarily in their dominant eye. From this configuration they can apparently resolve the stereo conflict by ignoring their non-dominant eye.
In typical implementations of video-see-through displays, cameras and displays are preset at a fixed angle. Researchers have previously designed VST-HMDs while making assumptions about the normal working distance. In one design discussed below, the video cameras are preset to converge slightly in order to allow the wearer sufficient stereo overlap when viewing close objects. In another design, the convergence of the cameras and displays can be selected in advance to an angle most appropriate for the expected working distance. Converging the cameras or both the cameras and the displays is only practical if the user need not view distant objects, as there is often not enough stereo overlap or too much disparity to fuse distant objects.
In the pioneering days of VST AR work, researchers improvised (successfully) by mounting a single lipstick camera onto a commercial VR HMD. In such systems, careful consideration was given to issues, such as calibration between tracker and camera [Bajura 1992]. In 1995, researchers at the University of North Carolina at Chapel Hill developed a stereo AR HMD [State 1996]. The device consisted of a commercial VR-4 unit and a special plastic mount (attached to the VR-4 with Velcro™), which held two Panasonic lipstick cameras equipped with oversized C-mount lenses. The lenses were chosen for their extremely low distortion characteristics, since their images were digitally composited with perfect perspective CG imagery. Two important flaws of the device emerged: (1) mismatch between the fields of view of camera (28° horizontal) and display (ca. 40° horizontal) and (2) eye-camera offset or parallax (see [Azuma 1997] for an explanation), which gave the wearer the impression of being taller and closer to the surroundings than she actually was. To facilitate close-up work, the cameras were not mounted parallel to each other, but at a fixed 4° convergence angle, which was calculated to also provide sufficient stereo overlap when looking at a collaborator across the room while wearing the device.
Today many video-see-through AR systems in labs around the world are built with stereo lipstick cameras mounted on top of typical VR (opaque) or optical-see-through HMDs operated in opaque mode (for example, [Kanbara 2000]). Such designs will invariably suffer from the eye-camera offset problem mentioned above. [Fuchs 1998] describes a device that was designed and built from individual LCD display units and custom-designed optics. The device had two identical “eye pods.” Each pod consisted of an ultra-compact display unit and a lipstick camera. The camera's optical path was folded with mirrors, similar to a periscope, making the device “parallax-free” [Takagi 2000]. In addition, the fields of view of camera and display in each pod were matched. Hence, by carefully aligning the device on the wearer's head, one could achieve near perfect registration between the imagery seen in the display and the peripheral imagery visible to the naked eye around each of the compact pods. Thus, this VST-HMD can be considered orthoscopic [Drascic 1996], meaning that the view seen by the user through and around the displays appears consistent. Since each pod could be moved separately, the device (characterized by small field of view and high angular resolution) could be adjusted to various degrees of convergence (for close-up work or room-sized tasks), albeit not dynamically but on a per-session basis. The reason for this was that moving the pods in any way required inter-ocular recalibration. A head tracker was rigidly mounted on one of the pods, so there was no need to recalibrate between head tracker and eye pods. The movable pods also allowed exact matching of the wearer's IPD.
Other researchers have attacked the parallax problem by building devices in which mirrors or optical prisms bring the cameras “virtually” closer to the wearer's eyes. Such a design is described in detail in [Takagi 2000], together with a geometrical analysis of the stereoscopic distortion of space and thus deviation from orthostereoscopy that results when specific parameters in a design are mismatched. For example, there can be a mismatch between the convergence of the cameras and the display units (such as in the device from [State 1996]), or a mismatch between inter-camera distance and user IPD. While [Takagi 2000] advocates rigorous orthostereoscopy, other researchers have investigated how quickly users adapt to dynamic changes in stereo parameters. [Milgram 1992] investigated users' judgment errors when subjected to unannounced variations in intercamera distance. The authors in [Milgram 1992] determined that users adapted surprisingly quickly to the distorted space when presented with additional visual cues (virtual or real) to aid with depth scaling. Consequently, they advocate dynamic changes of parameters, such as inter-camera distance or convergence distance, for specific applications. [Ware 1998] describes experiments with dynamic changes in virtual camera separation within a fish tank VR system. They used a z-buffer sampling method to heuristically determine an appropriate inter-camera distance for each frame and a dampening technique to avoid abrupt changes. Their results indicate that users do not experience “large perceptual distortions,” allowing them to conclude that such manipulations can be beneficial in certain VR systems.
Finally, [Matsunaga 2000] describes a teleoperation system using live stereoscopic imagery (displayed on a monitor to users wearing active polarizers) acquired by motion-controlled cameras. The results indicate that users' performance was significantly improved when the cameras dynamically converged onto the target object (peg to be inserted into a hole) compared to when the cameras' convergence was fixed onto a point in the center of the working area.
Thus, one problem that emerges with conventional head mounted display systems is the inability to converge on objects close to the viewer's eyes. Conventional display systems address this problem using movable cameras or cameras preset to a fixed convergence angle. Using movable cameras increases the expense of head mounted display systems and decreases reliability. Using cameras that are preset to a fixed convergence angle only allows accurate viewing of objects at one distance. Accordingly, in light of the problems associated with conventional head mounted display systems, there exists a need for improved methods and systems for maintaining maximum stereo overlap for close range work using head mounted display systems.
The present invention includes methods and systems for dynamic virtual convergence for a video see through head mountable display. The present invention also includes a head mountable display with an integrated position tracker and a unitary main mirror. The head mountable display may also have a unitary secondary mirror. The dynamic virtual convergence algorithm and the head mountable display may be used in augmented reality visualization systems to maintain maximum stereo overlap in close-range work areas.
According to one aspect of the invention, a dynamic virtual convergence algorithm for a video-see-through head mountable display includes sampling an image with two cameras. The cameras each have a field of view that is larger than the field of view of the displays used to display the images sampled by the cameras. A heuristic is used to estimate the gaze distance of a viewer. The display frustums are transformed such that they converge at the estimated gaze distance. The images sampled by the cameras are then reprojected into the transformed display frustums. The reprojected images are displayed to the user to simulate viewing of close-range objects. Since conventional displays do not have pixels close to the viewer's nose, stereoscopic viewing of close-range images is otherwise not possible. Dynamic virtual convergence according to the present invention thus allows conventional displays to be used for stereoscopic viewing of close-range images without requiring the displays to have pixels near the viewer's nose.
According to yet another aspect of the invention, a method for estimating the convergence distance of a viewer's eyes when viewing a scene through a video-see-through head mounted display is disclosed. According to the method, cameras sample the scene geometry for each of the viewer's eyes. Depth buffer values are obtained for each pixel in the sampled images using information known about stationary and tracked objects in the scene. Next, the depth buffers for each scene are analyzed along predetermined scan lines to determine a closest pixel for each eye. The closest pixel depth values for each eye are then averaged to produce an estimated gaze distance. The estimated gaze distance is then compared with the distances of points on tracked objects to determine whether the distances of points on any of the tracked objects override the estimated gaze distance. Whether a point on a tracked object should override the estimated gaze distance depends on the particular application. For example, in breast cancer biopsies guided using augmented reality visualization systems, the position of the ultrasound probe is important and may override the estimated gaze distance if that distance does not correspond to a point on the probe. The final gaze distance may be filtered to dampen high-frequency changes in the gaze distance and avoid high-frequency oscillations. This filtering may be accomplished by temporally averaging a predetermined number of recent calculated gaze distance values. Although this filtering step adds latency in producing the final displayed image, it removes undesirable effects, such as jitter and oscillation of the displayed image, caused by rapid changes in the gaze distance.
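The estimation and filtering steps described above may be sketched as follows. This is an illustrative sketch only; the scan-line layout, the depth-buffer representation, the override rule, and the averaging window size are assumptions rather than details taken from the disclosed implementation.

```python
# Hypothetical sketch of the gaze-distance heuristic: per-eye closest depth
# along chosen scan lines, averaging, tracked-object override, and temporal
# averaging to dampen high-frequency changes.
from collections import deque

def closest_depth(zbuffer, scan_lines):
    """Return the smallest depth found along the given scan lines.
    zbuffer is assumed to be a list of rows of depth values."""
    return min(min(zbuffer[y]) for y in scan_lines)

class GazeEstimator:
    def __init__(self, window=8):
        # recent gaze distances for temporal averaging (adds latency,
        # removes jitter)
        self.history = deque(maxlen=window)

    def estimate(self, zbuf_left, zbuf_right, scan_lines, tracked_points=()):
        z_left = closest_depth(zbuf_left, scan_lines)
        z_right = closest_depth(zbuf_right, scan_lines)
        gaze = 0.5 * (z_left + z_right)   # average per-eye closest depths
        # Application-specific override, e.g. snap to a tracked probe tip
        # when it is closer than the heuristic estimate.
        for z in tracked_points:
            if z < gaze:
                gaze = z
        self.history.append(gaze)
        return sum(self.history) / len(self.history)
```

The deque with a fixed `maxlen` implements the "predetermined number of recent calculated gaze distance values" as a sliding window.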
Once the final gaze distance is determined, the dynamic virtual convergence algorithm transforms the display frustums to converge on the estimated gaze distance and reprojects the image onto the transformed display frustums. The reprojected image is displayed to the viewer on parallel display screens to simulate what the viewer would see if the viewer were actually converging his or her eyes at the estimated gaze distance. However, actual convergence of the viewer's eyes is not required.
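For a given eye separation, the angle by which each display frustum is toed in so that the two frustum axes intersect at the estimated gaze distance can be sketched as follows. The symmetric toe-in, the clamp to a maximum convergence angle, and the function names are illustrative assumptions, not the disclosed transformation itself.

```python
import math

def convergence_angle_deg(gaze_distance_mm, eye_separation_mm=62.0):
    """Angle (degrees) by which each display frustum is toed in so that
    the two frustum axes intersect at the estimated gaze distance."""
    return math.degrees(math.atan2(0.5 * eye_separation_mm, gaze_distance_mm))

def clamped_convergence_deg(gaze_distance_mm, eye_separation_mm=62.0,
                            gamma_max_deg=24.0):
    """Clamp to the maximum convergence the camera imagery can support,
    since each display shows only a sub-image of its camera's view."""
    return min(convergence_angle_deg(gaze_distance_mm, eye_separation_mm),
               gamma_max_deg)
```

At large gaze distances the angle approaches zero, recovering parallel (unconverged) viewing.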
According to another aspect of the invention, a head mountable display includes either a single main mirror or two mirrors positioned closely to each other to allow camera fields of view to overlap. The head mountable display also includes an integrated position tracker that tracks the position of the user's head. The cameras include wide-angle lenses so that the camera fields of view will be greater than the fields of view of the displays used to display the image. The head mountable display includes a display unit for displaying sampled images to the user. The display unit includes one display for each of the user's eyes.
Accordingly, it is an object of the invention to provide a method for dynamic virtual convergence to allow viewing of close range objects using a head mountable display system.
It is another object of the invention to provide a video-see-through head mountable display with a unitary main mirror.
It is yet another object of the invention to provide a video-see-through head mountable display with an integrated tracker to allow tracking of a viewer's head.
Some of the objects of the invention having been stated hereinabove, and which are addressed in whole or in part by the present invention, other objects will become evident as the description proceeds when taken in connection with the accompanying drawings as best described hereinbelow.
Preferred embodiments of the invention will now be explained with reference to the accompanying drawings, of which:
The present invention includes methods and systems for dynamic virtual convergence for a video see-through head mounted or head mountable display system.
In order to allow the user to view images that are close to the user's eyes without moving parts, computer 202 includes a dynamic virtual convergence module 218. Dynamic virtual convergence module 218 estimates the viewer's gaze distance, transforms the images sampled by cameras 210 to simulate convergence of the viewer's eyes at the estimated gaze distance, and reprojects the transformed images onto display screens 212. The result of displaying the transformed images to the user is that the images viewed by the user will appear as if the user's eyes were converging on a close-range object. However, the user is not required to cross or converge his or her eyes to view the close-range object. As a result, user comfort is increased.
Dynamic Virtual Convergence System Implementation
The [Fuchs 1998] device described above had two eye pods that could be converged physically. As each pod was toed in for better stereo overlap at close range, the pod's video camera and display were “yawed” together (since they were co-located within the pod), guaranteeing continuous alignment between display and peripheral imagery. The present embodiment deliberately violates that constraint but preferably uses “no moving parts,” and can be implemented fully in software. Hence, there is no need for recalibration as convergence is changed. It is important to note that sometimes VR or AR implementations mistakenly mismatch camera and display convergence, whereas the present embodiment intentionally decouples camera and display convergence in order to allow AR work in situations where an ortho-stereoscopic VST-HMD does not reach (because there are usually no display pixels close to the user's nose).
As described above, the present implementation uses a VST HMD with video cameras that have a larger field of view than the display unit. Only a fraction of a camera's image (proportional to the display's field of view) is actually shown in the corresponding display via re-projection. The cameras acquire enough imagery to allow full stereo overlap from close range to infinity (parallel viewing).
By enlarging the cameras' fields of view, the present invention removes the need to physically toe in the cameras to change convergence. To preserve the above-mentioned alignment between display content and peripheral vision, the displays would have to physically toe in for close-up work, together with the cameras, as with the device described in [Fuchs 1998]. While this may be desirable, it has been determined that it is in fact possible to operate a device with fixed, parallel-mounted displays in this way, at least for most users. This surprising finding might be easier to understand by considering that if the displays converged physically while performing a near-field task, the user's eyes would also verge inward to view the task-related objects (presumably located just in front of the user's nose). With fixed displays, however, the user's eyes are viewing the very same retinal image pair, but in a configuration which requires the eyes not to verge in order for stereoscopic fusion to be achieved.
Thus, virtual convergence according to the present embodiment provides images that are aligned for parallel viewing. By eliminating the need for the user to converge her eyes, the present invention allows stereoscopic fusion of extremely close objects even in display units that have little or no stereo overlap at close range. This fusion is akin to wall-eyed fusion of certain stereo pairs in printed matter or to the horizontal shifting of stereo image pairs on projection screens in order to reduce ghosting when using polarized glasses. This fusion creates a disparity-vergence conflict (not to be confused with the well-known accommodation-vergence conflict present in most stereoscopic displays [Drascic 1996]). For example, if converging cameras are pointed at an object located 1 m in front of the cameras and the resulting image pair is then presented to a user in an HMD with parallel displays, the user will not converge his eyes to fuse the object but will nevertheless perceive it as being much closer than infinitely far away, due to the disparity present in the image pair. This indicates that the disparity depth cue dominates vergence in such situations. The present invention takes advantage of this fact. Also, by centering the object of interest in the camera images and presenting it on parallel displays, the present invention eliminates the accommodation-vergence conflict for the object of interest, assuming that the display is collimated. In reality, HMD displays are built so that their images appear at finite but rather large (compared to the close range targeted by the present invention) distances to the user, for example, two meters in the Sony Glasstron device used in one embodiment of the invention (described below).
Even so, users of a virtual convergence system will experience a significant reduction of the accommodation-vergence conflict, since virtual convergence reduces screen disparities (in one implementation of the invention, the screen is the virtual screen visible within the HMD). Reducing screen disparities is often recommended [Akka 1992] if one wishes to reduce potential eye strain caused by the accommodation-vergence conflict. Table 1 below shows the relationships between the three depth cues accommodation, disparity and vergence for a VST-HMD according to the present invention with and without virtual convergence, assuming the user is attempting to perform a close-range task.
By eliminating moving parts, the present embodiment makes it possible to change the virtual convergence dynamically. The present embodiment allows the computer system to make an educated guess as to what the convergence distance should be at any given time and then set the display reprojection transformations accordingly. The following sections describe a hardware and software implementation of the invention and present some application results as well as informal user reactions to this technology.
Exemplary Hardware Implementation
Referring to
In one non-orthoscopic embodiment, display 200 comprises a Sony Glasstron LDI-D100B stereo HMD with full-color SVGA (800×600) stereo displays, a device found to be very reliable and characterized by excellent image quality even when compared to considerably more expensive commercial units. (Dynamic virtual convergence module 218 is operable with both orthoscopic and non-orthoscopic displays.) The display has a horizontal field of view of α=26°. The display-lens elements are built d=62 mm apart and cannot be moved to match a user's inter-pupillary distance (IPD). However, the displays' exit pupils are large enough [Robinett 1992] for users with IPDs between roughly 50 and 75 mm. Nevertheless, users with extremely small or extremely large IPDs will perceive a prismatic depth plane distortion (curvature) since they view images through off-center portions of the lenses; this issue is not described in further detail herein. Cameras 210 may be Toshiba IK-M43S miniature lipstick cameras mounted on display 200. The cameras are mounted parallel to each other, and the distance between them is also 62 mm. There are no mirrors or prisms; hence there is a significant eye-camera offset (about 60-80 mm horizontally and about 20-30 mm vertically, depending on the wearer). In addition, there is an IPD mismatch for any user whose IPD is significantly larger or smaller than 62 mm.
The head-mounted cameras 210 are fitted with 4 mm focal-length lenses providing a field of view of approximately β=50° horizontal, nearly twice the displays' field of view. It is typical for small wide-angle lenses to exhibit barrel distortion; in one embodiment of the invention, the barrel distortion is nonnegligible and must be eliminated (in software) before any synthetic imagery can be registered to the camera images. The entire head-mounted device, consisting of the Glasstron display, lenses, and an aluminum frame on which cameras and infrared LEDs for tracking are mounted, weighs well under 250 grams. (Weight was an important issue in this design since the device is used in extended medical experiments and is often worn by a medical doctor for an hour or longer without interruption.) AR software suitable for use with embodiments of the present invention runs on an SGI Reality Monster equipped with InfiniteReality2 (IR2) graphics pipes and digital video capture boards. The HMD cameras' video streams are converted from S-video to a 4:2:2 serial digital format via Miranda picoLink ASD-272p decoders and then fed to two video capture boards. HMD tracking information is provided by an Image-Guided Technologies FlashPoint 5000 opto-electronic tracker. A graphics pipe in the SGI delivers the stereo left-right augmented images in two SVGA 60 Hz channels. These images are combined into the single-channel left-right alternating 30 Hz SVGA format required by the Glasstron with the help of a Sony CVI-D10 multiplexer.
Exemplary Software Implementation
AR applications designed for use with embodiments of the present invention are largely single-threaded, using a single IR2 pipe and a single processor. For each synthetic frame, a frame is captured from each camera 210 via the digital video capture boards. When it is important to ensure maximum image quality for close-up viewing, cameras 210 are used to capture two successive National Television Standards Committee (NTSC) fields, even though that may lead to the well-known visible horizontal tearing effect during rapid user head motion.
Captured video frames are initially deposited in main memory, from where they are transferred to texture memory of computer 202. Before any graphics can be superimposed onto the camera imagery, the imagery must be rendered on textured polygons. Dynamic virtual convergence module 218 uses a 2D polygonal grid which is radially stretched (its corners are pulled outward) to compensate for the above-mentioned lens distortion, analogous to the pre-distortion technique described in [Watson 1995].
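A minimal sketch of such a radially stretched grid follows. The single-coefficient polynomial distortion model, its value, and the normalized coordinate range are illustrative assumptions; the actual correction would be driven by calibrated camera parameters.

```python
# Hypothetical sketch: build a 2D vertex grid whose vertices are pushed
# radially outward so that a barrel-distorted camera image, texture-mapped
# onto the stretched grid, appears rectilinear.
def predistort_grid(n, k=0.2):
    """Return an (n+1) x (n+1) grid of (x, y) vertices over [-1, 1]^2.
    k is an assumed radial stretch coefficient."""
    grid = []
    for j in range(n + 1):
        row = []
        for i in range(n + 1):
            x = 2.0 * i / n - 1.0
            y = 2.0 * j / n - 1.0
            r2 = x * x + y * y        # squared distance from image center
            s = 1.0 + k * r2          # corners (largest r2) move outward most
            row.append((x * s, y * s))
        grid.append(row)
    return grid
```

The center vertex is left in place while the corners are pulled outward, matching the description of the grid's corners being stretched radially.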
In a conventional video-see-through application one would use parallel display frustums to render the video textures since the cameras are parallel (as recommended by [Takagi 2000]). Also, the display frustums should have the same field of view as the cameras. However, for virtual convergence, dynamic virtual convergence module 218 uses display frustums that are verged in. Their fields of view are equal to the displays' fields of view. As a result of that, the user ends up seeing a reprojected (and distortion-corrected) sub-image in each eye.
The maximum convergence angle is γ=β−α, which in the present implementation is approximately 24°. At that convergence angle, the stereo overlap region of space begins at a distance zover,min=0.5d/tan(β/2), which in the present implementation was approximately 66 mm, and full stereo overlap is achieved at a distance zover,full=d/(tan(β/2)−tan(α−β/2)), which in the present implementation was about 138 mm. At the latter distance, the field of view subtends an area that is d+2·zover,full·tan(α−β/2) wide, or approximately 67 mm in the implementation described herein.
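The quoted distances can be checked numerically. The following short sketch, using the implementation's parameters (α=26°, β=50°, d=62 mm), reproduces the approximate values of 24°, 66 mm, 138 mm, and 67 mm:

```python
import math

alpha = 26.0   # display horizontal field of view, degrees
beta = 50.0    # camera horizontal field of view, degrees
d = 62.0       # inter-camera distance, mm

gamma = beta - alpha                          # maximum convergence angle
tb = math.tan(math.radians(beta / 2))
ta = math.tan(math.radians(alpha - beta / 2))
z_over_min = 0.5 * d / tb                     # stereo overlap begins here
z_over_full = d / (tb - ta)                   # full stereo overlap from here
width_at_full = d + 2 * z_over_full * ta      # overlap width at that distance

print(round(gamma), round(z_over_min), round(z_over_full),
      round(width_at_full))                   # prints: 24 66 138 67
```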
After setting the display frustum convergence, application-dependent synthetic elements are rasterized using the same verged, narrow display frustums. For some parts of the real world registered geometric models are stored in computer 202, and these models may be rasterized in Z only, thereby priming the Z-buffer for correct mutual occlusion between real and synthetic elements [State 1996].
Sheared vs. Rotated Display Frustums
One issue considered early on during the implementation phase of this technique was the question of whether the verged display frustums should be sheared or rotated.
Shearing the frustums keeps the image planes for the left and right eyes coplanar, thus eliminating vertical disparity or dipvergence [Rolland 1995] between the two images. At high convergence angles (i.e., for extreme close-up work), viewing such a stereo pair in the present system would be akin to wall-eyed fusion of images specifically prepared for cross-eyed fusion.
On the other hand, rotating the display frustums with respect to the camera frustums, while introducing dipvergence between corresponding features in stereo images, presents to each eye the very same retinal image it would see if the display were capable of physically toeing in (as discussed above), thereby also stimulating the user's eyes to toe in.
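The two frustum constructions can be contrasted with a minimal projection sketch. The pinhole model, unit focal length, and coordinate conventions are illustrative assumptions. For an off-center point, the rotated construction divides the vertical coordinate by a per-eye depth, producing the dipvergence discussed above; the sheared construction does not.

```python
import math

def project_sheared(x, y, z, eye_x, z_conv, f=1.0):
    """Sheared (off-axis) frustum: image planes stay coplanar, so the
    vertical coordinate is the same for both eyes (no dipvergence)."""
    u = f * (x - eye_x) / z + f * eye_x / z_conv  # shear centers the convergence point
    v = f * y / z
    return u, v

def project_rotated(x, y, z, eye_x, z_conv, f=1.0):
    """Rotated (toed-in) frustum: yaw the eye toward the convergence
    point, then project; the depth zr differs per eye (dipvergence)."""
    theta = math.atan2(-eye_x, z_conv)            # toe-in angle for this eye
    xr = (x - eye_x) * math.cos(theta) - z * math.sin(theta)
    zr = (x - eye_x) * math.sin(theta) + z * math.cos(theta)
    return f * xr / zr, f * y / zr
```

In both constructions a point on the central axis at the convergence distance projects to the image center of each eye; the constructions differ only for off-center points.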
To compare these two methods for display frustum geometry, an interactive control (slider) was implemented in the user interface of dynamic virtual convergence module 218. For a given virtual convergence setting, blending between sheared and rotated frustums can be achieved by moving the slider. When that happens, the HMD user perceives a curious distortion of space, similar to a dynamic prismatic distortion. A controlled user study was not conducted to determine whether sheared or rotated frustums are preferable; rather, an informal group of testers was used, and there was a definite preference for the rotated frustums method overall. However, none of the testers found the sheared frustum images more difficult to fuse than the rotated frustum images, which is understandable given that sheared frustum stereo imagery has no dipvergence (as opposed to rotated frustum imagery). It is of course difficult to quantify the stereo perception experience without a carefully controlled study; for the present implementation, users' preferences were used as guidance for further development.
Automating Virtual Convergence
One goal of the present invention was to achieve on-the-fly convergence changes under algorithmic control to allow users to work comfortably at different depths. Tests were performed to determine whether a human user could in fact tolerate dynamic virtual convergence changes at all. To this end, a user interface slider for controlling convergence was implemented. A human operator continually adjusted the slider while a user was viewing AR imagery in the VST-HMD. The convergence slider operator viewed the combined left-right (alternating at 60 Hz) SVGA signal fed to the Glasstron HMD on a separate monitor. This signal appears similar to a blend between the left and right eye images, and any disparity between the images is immediately apparent. The operator continuously adjusted the convergence slider, attempting to minimize the visual disparity between the images (thereby maximizing stereo overlap). This means that if most of the image consists of objects located close to the HMD user's head, the convergence slider operator tended to verge the display frustums inward. With practice, the operators became quite skilled; most test users had positive reactions, with only one user reporting extreme discomfort.
Another object of the invention was to create a real-time algorithmic implementation capable of producing a numeric value for display frustum convergence for each frame in the AR system. Three distinct approaches were considered for this:
(1) Image content based: This is the algorithmic version of the “manual” method described above. An attractive possibility would be to use a maximization of mutual information algorithm [Viola 1995]. An image-based method could run as a separate process and could be expected to perform relatively quickly since it need only optimize a single parameter. This method should be applied to the mixed reality output rather than the real world imagery to ensure that the user can see virtual objects that are likely to be of interest. Under some conditions, such as repeating patterns in the images, a mutual information method would fail by finding an “optimal” depth value with no rational basis in the mixed reality. Under most conditions however, including color and intensity mismatches between the cameras, a mutual information algorithm would appropriately maximize the stereo overlap in the left and right eye images.
(2) Z-buffer based: This approach inspects values in the Z-buffer of each stereo image pair and (heuristically) determines a likely depth value to which the convergence should be set. [Ware 1998] gives an example for such a technique.
(3) Geometry based: This approach is similar to (2) but uses geometry data (models as opposed to pixel depths) to (again heuristically) compute a likely depth value to which the convergence should be set. In other words, this method works on pre-rasterization geometry, whereas (2) uses post-rasterization geometry.
Approaches (1) and (2) both operate on finished images. Thus, they cannot be used to set the convergence for the current frame but only to predict a convergence value for the next frame. Conversely, approach (3) can be used to immediately compute a convergence value (and thus the final viewing transformations for the left and right display frustums) for the current frame, before any geometry is rasterized. However, as will be explained below, this does not automatically exclude (1) and (2) from consideration. Rather, approach (1) was eliminated on the grounds that it would require significant computational resources. A hybrid of methods (2) and (3) was developed, characterized by inspection of only a small subset of all Z-buffer values, and aided by geometric models and tracking information for the user's head as well as for handheld objects. The following steps describe a hybrid algorithm for determining a convergence distance according to an embodiment of the present invention:
The highlighted points in each scan line represent the point in the scene that is closest to the user. The closest depths for the two eyes are averaged, zmin=(zmin,l+zmin,r)/2, and the convergence distance z is provisionally set to zmin. This step is only performed if in the previous frame the convergence distance was virtually unchanged (a threshold of 0.01° may be used); otherwise z is left unchanged from the previous frame.
The conditional update of z in Step 2 prevents most self-induced oscillations in convergence distance. Such oscillations can occur if the system continually switches between two (rarely more) different convergence settings, with the z-buffer calculated for one setting resulting in the other convergence setting being calculated for the next frame. Such a configuration may be encountered even when the user's head is perfectly still and none of the other tracked objects (such as handheld probe, pointers, needle, etc.) are moved.
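The hybrid method and its conditional update can be sketched as follows. This is an illustrative Python rendering under stated assumptions, not the embodiment itself: the scan-line sampling pattern, the function signature, and the application of the stability threshold `eps` directly to distance (the text above uses an angular threshold of 0.01 degrees) are all simplifications:

```python
import numpy as np

def hybrid_convergence(zbuf_left, zbuf_right, z_prev, z_prev2,
                       rows=8, eps=0.01):
    """Sketch of the hybrid of methods (2) and (3): inspect only a small
    subset of Z-buffer scan lines per eye, take the closest depth among
    the sampled lines for each eye, average the two per-eye minima, and
    accept the new convergence distance only if the setting was
    virtually unchanged over the previous frame, damping self-induced
    oscillations between two alternating settings."""
    h = zbuf_left.shape[0]
    sample = np.linspace(0, h - 1, rows).astype(int)  # a few scan lines
    zmin_l = zbuf_left[sample].min()
    zmin_r = zbuf_right[sample].min()
    z_candidate = 0.5 * (zmin_l + zmin_r)
    if abs(z_prev - z_prev2) < eps:   # previous frame was stable
        return z_candidate
    return z_prev                     # otherwise hold the old setting
```

Holding the previous value whenever the convergence is still in motion is what prevents the frame-to-frame flip-flop described above, even with the user's head perfectly still.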
Results
The dynamic virtual convergence subsystem has been applied to two different AR applications. Both applications use the same modified Sony Glasstron HMD and the hardware and software described above. The first is an experimental AR system designed to aid physicians in performing minimally invasive procedures such as ultrasound-guided needle biopsies of the breast. This system and a number of recent experiments conducted with it are described in detail in [Rosenthal 2001]. A physician used the system on numerous occasions, often for one hour or longer without interruption, while the dynamic virtual convergence algorithm was active. She did not report any discomfort during or after use of the system. With her help, a series of experiments were conducted yielding quantitative evidence that AR-based guidance for the breast biopsy procedure is superior to the conventional guidance method in artificial phantoms [Rosenthal 2001]. A number of other physicians and researchers have used this system, albeit for shorter periods of time, without discomfort (except for one individual previously mentioned, who experiences discomfort whenever the virtual convergence is changed dynamically).
The second AR application to use dynamic virtual convergence is a system for modeling real objects using AR.
Conclusions
Other authors have previously noted the conflict introduced in VST-HMDs when the camera axes are not properly aligned with the displays. While this conflict is significant, violating this constraint may nevertheless be advantageous in systems requiring the operator to use stereoscopic vision at several distances.
Mathematical models such as those developed by [Takagi 2000] demonstrate the distortion of the visual world. These models do not, however, characterize the volume of the visual world that is actually stereo-visible (i.e., visible to both eyes and within 1-2 degrees of the center of stereo-fused content). Dynamically converging the cameras, whether they are real cameras as in [Matsunaga 2000] or virtual cameras (i.e., display frustums) pointed at video-textured polygons as in embodiments of the present invention, makes a greater portion of the near field around the point of convergence stereoscopically visible at all times. Most users have successfully used the AR system with dynamic virtual convergence described herein to place biopsy and aspiration needles with high precision or to model objects with complex shapes. The distortion of the perceived visual world is not as severe as predicted by the mathematical models if the user's eyes converge at the distance selected by the system. (If they converge at a different distance, stereo overlap is reduced, and increased spatial distortion and/or eye strain may result. The largely positive experience with this technique is attributed to a well-functioning convergence depth estimation algorithm.) Indeed, a substantial degree of perceived distortion is eliminated if one assumes that the operator has approximate knowledge of the distance to the point being converged on (experimental results in [Milgram 1992] support this statement). Given the intensive hand-eye coordination required for medical applications, it seems reasonable to conjecture that users' perception of their visual world may be rectified by other sources of information, such as seeing their own hand. Indeed, the hand may act as a “visual aid” as defined by [Milgram 1992]. This type of adaptation is apparently well within the abilities of the human visual system, as evidenced by the ease with which individuals adapt to new eyeglasses and to using binocular magnifying systems.
Future Work
Dynamic virtual convergence reduces the accommodation-vergence conflict while introducing a disparity-vergence conflict. First, it may be worthwhile to investigate smoothly blending between zero and full virtual convergence; should that blend be a parameter set on a per-user basis, on a per-session basis, or adjusted dynamically? Second, a thorough investigation of sheared versus rotated frustums (should the choice between them be changed dynamically as well?), as well as a controlled user study of the entire system with the goal of obtaining quantitative results, seems desirable.
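For reference, the sheared-frustum alternative can be sketched as the familiar off-axis stereo construction. The following Python function is an illustrative sketch, not the embodiment described herein; it assumes the shared projection plane is placed at the convergence distance, and all parameter names are hypothetical. It returns glFrustum-style bounds at the near plane for one eye:

```python
def sheared_frustum(eye_offset_x, convergence_z, near, far,
                    half_w, half_h):
    """Asymmetric (sheared) frustum for one eye in an off-axis stereo
    pair. eye_offset_x is +half the interpupillary distance for the
    right eye and -half for the left; half_w and half_h are the half
    extents of the shared image plane located at convergence_z.
    Returns (left, right, bottom, top, near, far) bounds, with the
    lateral bounds scaled back to the near plane."""
    scale = near / convergence_z          # similar-triangles scaling
    left   = (-half_w - eye_offset_x) * scale
    right  = ( half_w - eye_offset_x) * scale
    bottom = -half_h * scale
    top    =  half_h * scale
    return left, right, bottom, top, near, far
```

Unlike rotating the frustums toward the convergence point, shearing keeps the two image planes coplanar, which avoids keystone distortion; whether the shear amount should itself be updated dynamically is the open question raised above.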
The references listed below as well as all references cited in the specification are incorporated herein by reference to the extent that they supplement, explain, provide a background for or teach methodology, techniques and/or embodiments described herein.
It will be understood that various details of the invention may be changed without departing from the scope of the invention. Furthermore, the foregoing description is for the purpose of illustration only, and not for the purpose of limitation, as the invention is defined by the claims as set forth hereinafter.
This application is a continuation of U.S. patent application Ser. No. 10/492,582, filed Apr. 14, 2004, which is a national stage application under 35 U.S.C. §371 of PCT Application No. PCT/US02/33957, filed Oct. 18, 2002, and which further claims the benefit of U.S. Provisional Patent Application Ser. No. 60/335,052, filed Oct. 19, 2001, the disclosures of which are incorporated by reference herein in their entireties.
This invention was made with Government support under Grant No. CA47287 awarded by the National Institutes of Health and Grant No. ASC8920219 awarded by the National Science Foundation. The Government has certain rights in the invention.
Provisional application data:

Number | Date | Country
---|---|---
60/335,052 | Oct 2001 | US

Parent case data:

 | Number | Date | Country
---|---|---|---
Parent | 10/492,582 | Jul 2004 | US
Child | 12/609,915 | | US