Over time, people transform the areas surrounding their desktop computers into rich landscapes of information and interaction cues. While some may refer to such items as clutter, to any particular person the items are often invaluable and enhance productivity. Of the variety of at-hand physical media, perhaps none is as flexible and ubiquitous as the sticky note. Sticky notes can be placed on nearly any surface, as prominent or as peripheral as desired, and can be created, posted, updated, and relocated according to the flow of one's activities.
When a person engages in mobile computing, however, she loses the benefit of an inhabited interaction context. The sticky notes created at her kitchen table may be cleaned away and, while they remain at the kitchen table, they are not visible from the living room sofa. Moreover, a person's willingness to share notes with family and colleagues typically does not extend to passing strangers in public places such as coffee shops and libraries. Users of shared computers experience a similar problem: the absence of a physically customizable, personal information space.
Physical sticky notes have a number of characteristics that help support user activities. They are persistent—situated in a particular physical place—making them both at-hand and glanceable. Their physical immediacy and separation from computer-based interactions make the use of physical sticky notes preferable when information needs to be recorded quickly, on the periphery of a user's workspace and attention, for future reference and reminding.
With respect to computer-based “sticky” notes, a web application provides for creating and placing so-called “sticky” notes on a screen, where typed contents are stored and then restored when the “sticky” note application is restarted. This particular approach merely places typed notes in a two-dimensional flat space. As such, they are not as at-hand as physical notes; nor are they as glanceable (e.g., once the user's desktop becomes a “workspace” filled with layers of open application interfaces, the user must intentionally switch to the sticky note application in order to refer to her notes). For the foregoing reasons, the computer-based “sticky” note can be seen as a more private form of sticky note, visible only at the user's discretion.
As described herein, various exemplary methods, devices, systems, etc., allow for creation of media landscapes in mixed reality that provide a user with a wide variety of options and functionality.
An exemplary method includes accessing geometrically located data that represent one or more virtual items with respect to a three-dimensional coordinate system; generating a three-dimensional map based at least in part on real image data of a three-dimensional space as acquired by a camera; rendering to a physical display a mixed reality scene that includes the one or more virtual items at respective three-dimensional positions in a real image of the three-dimensional space acquired by the camera; and re-rendering to the physical display the mixed reality scene upon a change in the field of view of the camera. Other methods, devices, systems, etc., are also disclosed.
Non-limiting and non-exhaustive examples are described with reference to the following figures:
An exemplary application relies on camera images to build a map of a physical environment while essentially simultaneously calculating the camera's position relative to the map. Virtual items are treated as graphics to be positioned with respect to the map and rendered as graphics in conjunction with real camera images to provide a mixed reality scene.
Various examples described herein demonstrate techniques that allow a person to access the same media and information in a variety of locations and across a wide range of devices, from PCs to mobile phones and from projected to head-mounted displays. Such techniques can provide users with a consistent and convenient way of interacting with information and media of special importance to them (reminders, social and news feeds, bookmarks, etc.). As explained, an exemplary system allows a user to smoothly switch away from her focal activity (e.g., watching a film, writing a document, browsing the web) to interact periodically with any of a variety of things of special importance to her.
In various examples, techniques are shown that provide a user various ways to engage with different kinds of digital information or media (e.g., displayed as “sticky note”-like icons that appear to float in the 3D space around the user). Such items can be made visible through an “augmented reality” (AR) where real-time video of the real world is modified by various exemplary techniques before being displayed to the user.
In a particular example, a personal media landscape of augmented reality sticky notes is referred to as a “NoteScape”. In this example, a user can establish an origin of her NoteScape by pointing her camera in a direction of interest (e.g. towards her computer display) and triggering the construction of a map of her local environment (e.g. by pressing the spacebar). As the user moves her camera through space, the system extends its map of the environment and inserts images of previously created notes. Whenever the user accesses her NoteScape, wherever she is, she can see the same notes in the same relative location to the origin of the established NoteScape in her local environment.
Various methods provide for a physical style of interaction that is both convenient and consistent across different devices, supporting periodic interactions (e.g. every 5-15 minutes) with one or more augmented reality items that may represent things of special or ongoing importance to the user (e.g. social network activity).
As explained herein, an exemplary system can bridge the gap between regular computer use and augmented reality in a way that supports seamless transitions and information flow between the two. Whether a person uses a PC, laptop, mobile phone, or head-mounted device, it is the display of applications (e.g., word processor, media player, web browser) in a device-displayed 2D “virtual” workspace (e.g., the WINDOWS® desktop) that typically forms the focus of the user's attention. In a particular implementation using a laptop computer and a webcam, motion of the webcam (directly or indirectly) switches the laptop computer display between a 2D workspace and a 3D augmented reality. In other words, when the webcam is stationary, the laptop functions normally; when the user picks up the webcam, the laptop display transforms into a view of augmented reality, as seen, at least in part, through the webcam.
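One way to drive such a mode switch is to estimate camera motion directly from the video feed, for example by computing optical flow between successive frames. The following is a minimal, non-limiting sketch in Python, assuming OpenCV is available; the threshold values, the still-frame count, and the in_ar_mode flag are illustrative assumptions rather than part of any particular implementation:

```python
import cv2
import numpy as np

MOTION_THRESHOLD = 1.5     # mean pixel displacement per frame; tune per camera
STILL_FRAMES_TO_EXIT = 30  # roughly a second or two of stillness before leaving AR mode

def mean_flow(prev_gray, gray):
    # Dense optical flow between consecutive frames; the mean magnitude serves as a motion cue.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    return float(np.linalg.norm(flow, axis=2).mean())

cap = cv2.VideoCapture(0)
ok, frame = cap.read()
if not ok:
    raise RuntimeError("no camera available")
prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
in_ar_mode, still_frames = False, 0

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    moving = mean_flow(prev_gray, gray) > MOTION_THRESHOLD
    prev_gray = gray
    if moving:
        still_frames = 0
        in_ar_mode = True          # camera picked up: show the mixed reality view
    else:
        still_frames += 1
        if still_frames >= STILL_FRAMES_TO_EXIT:
            in_ar_mode = False     # camera set down: return to the 2D workspace
```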
A particular feature in the foregoing implementation allowed whatever the user was last viewing on the actual 2D workspace to remain on the laptop display when the user switched to the augmented reality. This approach allowed for use of the webcam to drag and drop virtual content from the 2D workspace into the 3D augmented reality around the laptop, and also to select between many notes in the augmented reality NoteScape to open in the workspace. For example, consider a user browsing the web on her laptop at home. When this user comes across a webpage she would like to have more convenient access to in future, she can pick up her webcam and point it at her laptop. In the augmented reality she can see, through the webcam image, that her laptop is still showing the same webpage; however, she can also see many virtual items (e.g., sticky-note icons) “floating” in the space around her laptop. Upon pointing the crosshairs of the webcam at the browser tab (e.g., while holding down the spacebar of her laptop), she can “grab” the browser tab as a new item and drag it outside of the laptop screen. In turn, she can position the item, for example, high up to the left of her laptop, near other related bookmarks. The user can then set down the webcam and continue browsing. Then, a few days later, when she wants to access that webpage again, she can pick up the webcam, point it at the note that links to that webpage (e.g., which is still in the same place high up and to the left of her laptop) and enter a command (e.g., press the spacebar). Upon entry of the command, the augmented reality scene disappears and the webpage is opened in a new tab inside her web browser in the 2D display of her laptop.
Another aspect of various techniques described herein pertains to portability of virtual items (e.g., items in a personal “NoteScape”) that a user can access wherever he is located (e.g., with any combination of an appropriate device plus camera). For example, a user may rely on a PC or laptop with webcam (or mobile camera phone acting as a webcam), an ultra-mobile PC with a consumer head-mounted display (e.g., the WRAP 920AV video eyewear device, marketed by Vuzix Corporation, Rochester, N.Y.), or a sophisticated mobile camera phone device with appropriate on-board resources. As explained, depending on particular settings or preferences, the style of interaction may be made consistent across various devices as a user's virtual items are rendered and displayed in the same spatial relationship to her focus (e.g., a laptop display), essentially without regard to the user's actual physical environment. For example, consider a user sitting at her desk PC using a webcam like a flashlight to scan the space around her, with the video feed from the webcam shown on her PC monitor. If she posts a note in a particular position (e.g., eye-level, at arm's length 45 degrees to her right), the note can be represented as geometrically located data such that it always appears in the same relative position when she accesses her virtual items. So, in this example, if the user is later sitting on her sofa and wants to access the note again, pointing her mobile camera phone towards the same position as before (e.g., eye-level, at arm's length 45 degrees to her right) would let her view the same note, but this time on the display of her mobile phone. In the absence of a physical device to point at (such as with a mobile camera phone, in which the display is fixed behind the camera), a switch to augmented reality may be triggered by some action other than camera motion (e.g., a touch gesture on the screen). In an augmented reality mode, the last displayed workspace may then be projected at a distance in front of the camera, acting as a “virtual” display from which the user can drag and drop content into her mixed reality scene (e.g., personal “NoteScape”).
Various exemplary techniques described herein allow a user to build up a rich collection of “peripheral” information and media that can help her to live, work, and play wherever she is, using the workspace of any computing device with camera and display capabilities. For example, upon command, an exemplary application executing on a computing device can transition from a configuration that uses a mouse to indirectly browse and organize icons on a 2D display to a configuration that uses a camera to directly scan and arrange items in a 3D space; where the latter can aim to give the user the sense that the things of special importance to her are always within reach.
Various examples can address static arrangement of such things as text notes, file and application shortcuts, and web bookmarks, but also the dynamic projection of media collections (e.g. photos, album covers) onto real 3D space, and the dynamic creation and rearrangement of notes according to the evolution of news feeds from social networks, news sites, collaborative file spaces, and more. At work, notifications from email and elsewhere may be presented spatially (e.g., always a flick of a webcam away). At home, alternative TV channels may play in virtual screens around a real TV screen where the virtual screens may be browsed and selected using a device such as a mobile phone.
In various implementations, there is no need for special physical markers (e.g., a fiducial marker or markers, a standard geometrical structure or feature, etc.). In such an implementation, a user with a computing device, a display, and a camera can generate a map and a mixed reality scene where rather than positioning “augmentations” relative to physical markers, items are positioned relative to a focus of the user. At a dedicated workspace such as a table, this focus might be the user's laptop PC. In a mobile scenario, however, the focus might be the direction in which the user is facing. Various implementations can accurately position notes in a 3D space without using any special printed markers through use of certain computer vision techniques that allow for building a map of a local environment, for example, as a user moves the camera around. In such a manner, the same augmentations can be displayed whatever the map happens to be—as the map is used to provide a frame of reference for stable positioning of the augmentations relative to the user. Accordingly, such an approach provides a user with consistent and convenient access to items (e.g., digital media, information, applications, etc.) that are of special importance through use of nearly any combination of display and camera, in any location.
As shown in
As described herein, geometrically located data is data that has been assigned a location in a space defined by a map. Such data may be text data, image data, link data (e.g., URL or other), video data, audio data, etc. As described herein, geometrically located data (which may simply specify an icon or marker in space) may be rendered on a display device in a location based on a map. Importantly, the map need not be the same map that was originally used to locate the data. For example, the text “Hello World!” may be located at coordinates x1, y1, z1 using a map of a first environment. The text “Hello World!” may then be stored with the coordinates x1, y1, z1 (i.e., to be geometrically located data). In turn, a new map may be generated in the first environment or in a different environment and the text displayed on a monitor according to the coordinates x1, y1, z1 of the geometrically located data.
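A minimal sketch of how geometrically located data might be represented and stored is given below in Python; the GeolocatedItem fields, the file name, and the coordinate values are illustrative assumptions, not a prescribed format:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class GeolocatedItem:
    kind: str        # e.g., "text", "image", "link", "video", "audio"
    content: str     # e.g., the note text or a URL
    x: float         # coordinates in the map's 3D coordinate system
    y: float
    z: float

def save_items(items, path="notescape.json"):
    # Persist items (content plus coordinates) for later recreation in any environment.
    with open(path, "w") as f:
        json.dump([asdict(i) for i in items], f)

def load_items(path="notescape.json"):
    with open(path) as f:
        return [GeolocatedItem(**d) for d in json.load(f)]

# The text "Hello World!" located at coordinates (x1, y1, z1) relative to a map origin:
save_items([GeolocatedItem("text", "Hello World!", 0.4, 0.2, -0.1)])
```

Because the coordinates are stored with the content, a map generated later, in the same environment or a different one, can place the item at the same relative position.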
To more clearly explain geometrically located data, consider the mixed reality space 103 and the items 132 and 134 rendered in the view on the monitor 128. These items may or may not exist in the “real” environment 110; however, they do exist as geometrically located data 130. Specifically, the items 132 are shown as documents such as “sticky notes” or posted memos while the item 134 is shown as a calendar. As described herein, a user associates data with a location and then causes the geometrically located data to be stored for future use. In various examples, so-called “future use” is triggered by a device such as the device 112. For example, as the device 112 captures information from a field of view (FOV), the computer 120 renders the FOV on the monitor 128 along with the geometrically located data 132 and 134. Hence, in
In the example of
Another example is shown in
An example of commercially available goggles is the Joint Optical Reflective Display (JORDY) goggles, which are based on the Low Vision Enhancement System (LVES), a video headset developed through a joint research project between NASA's Stennis Space Center, Johns Hopkins University, and the U.S. Department of Veterans Affairs. Worn like a pair of goggles, LVES includes two eye-level cameras, one with an unmagnified wide-angle view and one with magnification capabilities. The system manipulates the camera images to compensate for a person's low vision limitations. The LVES was marketed by Visionics Corporation (Minnetonka, Minn.).
As described herein, a mixed reality view adaptively changes with respect to field of view (FOV) and/or view point (e.g., perspective). For example, when the user 107 moves in the environment, the virtual objects 132, based on geometrically located data 130, are rendered with respect to a map and displayed to match the change in the view point. In another example, the user 107 rotates a few degrees and causes the video camera (or cameras) to zoom (i.e., to narrow the field of view). In this example, the virtual objects 132, based on geometrically located data, are rendered with respect to a map and displayed to match the change in the rotational direction of the user 107 (e.g., goggles 185) and to match the change in the field of view. As described herein, zoom actions may be manual (e.g. using a handheld control, voice command, etc.) or automatic, for example, based on a heuristic (e.g. if a user gazes at the same object for approximately 5 seconds, then steadily zoom in).
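As one illustration of such an automatic heuristic, the following Python sketch steadily narrows the zoom while the targeted object remains unchanged for about 5 seconds; the class name, zoom step, and bounds are assumptions for illustration, and the caller is assumed to apply the returned zoom factor through whatever camera or rendering interface is in use:

```python
import time

DWELL_SECONDS = 5.0   # dwell time before auto-zoom begins
ZOOM_STEP = 1.02      # multiplicative zoom applied each frame while dwelling
MAX_ZOOM = 4.0        # upper bound on the zoom factor

class AutoZoom:
    def __init__(self):
        self.current_target = None
        self.dwell_start = None
        self.zoom = 1.0

    def update(self, targeted_item):
        # Call once per frame with whatever item (or None) the targeting mark is over.
        now = time.monotonic()
        if targeted_item is not self.current_target:
            self.current_target = targeted_item
            self.dwell_start = now
            self.zoom = 1.0                     # target changed: reset zoom
        elif targeted_item is not None and now - self.dwell_start >= DWELL_SECONDS:
            self.zoom = min(self.zoom * ZOOM_STEP, MAX_ZOOM)   # steadily zoom in
        return self.zoom
```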
With respect to lenses, a video camera (e.g., webcam) may include any of a variety of lenses, which may be interchangeable or have one or more moving elements. Hence, a video camera may be fitted with a zoom lens as explained with respect to
As mentioned, various exemplary methods include generating a map from images and then rendering virtual objects with respect to the map. An approach to map generation from images was described in 2007 by Klein and Murray (“Parallel tracking and mapping for small AR workspaces”, ISMAR 2007, which is incorporated by reference herein). In this article, Klein and Murray specifically describe a technique that uses keyframes and that splits tracking and mapping into two separate tasks that are processed in parallel threads on a dual-core computer, where one thread tracks erratic hand-held motion and the other thread produces a 3D map of point features from previously observed video frames. This approach produces detailed maps with thousands of landmarks that can be tracked at frame rate. The approach of Klein and Murray is referred to herein as PTM; another approach, referred to as EKF-SLAM (simultaneous localization and mapping based on an extended Kalman filter), is also described. Klein and Murray indicate that PTM is more accurate and robust and provides for faster tracking than EKF-SLAM. Use of the techniques described by Klein and Murray allows for tracking without a prior model of an environment.
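The structural split described by Klein and Murray can be sketched as two cooperating loops that share a map and a keyframe queue. The following Python outline is a non-limiting illustration of that architecture only; track_frame(), is_good_keyframe(), and extend_map() are placeholders standing in for the actual tracking and mapping algorithms:

```python
import threading
import queue

keyframes = queue.Queue()     # frames the tracker selects for the mapper
map_lock = threading.Lock()
point_map = []                # shared 3D point features

def track_frame(frame, point_map):
    return None               # placeholder: estimate the camera pose against the map

def is_good_keyframe(frame, pose):
    return False              # placeholder: baseline/quality test for adding a keyframe

def extend_map(point_map, frame, pose):
    pass                      # placeholder: triangulate and add new map points

def tracking_thread(get_frame):
    # Runs at frame rate, tracking erratic hand-held motion against the current map.
    while True:
        frame = get_frame()
        with map_lock:
            pose = track_frame(frame, point_map)
        if is_good_keyframe(frame, pose):
            keyframes.put((frame, pose))

def mapping_thread():
    # Runs in parallel, extending the map from keyframes supplied by the tracker.
    while True:
        frame, pose = keyframes.get()          # blocks until a new keyframe arrives
        with map_lock:
            extend_map(point_map, frame, pose)

# The two loops would be started as parallel threads, e.g.:
# threading.Thread(target=mapping_thread, daemon=True).start()
```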
The mapping thread 310 includes a stereo initialization block 312 that may use a five-point-pose algorithm. The stereo initialization block 312 relies on, for example, two frames and feature correspondences and provides an initial map. A user may cause two keyframes to be acquired for purposes of stereo initialization or two frames may be acquired automatically. Regarding the latter, such automatic acquisition may occur, at least in part, through use of fiducial markers or other known features in an environment. For example, in the environment 110 of
The mapping thread 310 includes a wait block 314 that waits for a new keyframe. In a particular example, keyframes are added only if there is a baseline to other keyframes and tracking quality is deemed acceptable. When a keyframe is added, an assurance is made that (i) all points in the map are measured in the keyframe and that (ii) new map points are found and added to the map per an addition block 316. In general, the thread 310 performs more accurately as the number of points is increased. The addition block 316 performs a search in neighboring keyframes (e.g., an epipolar search) and triangulates matches to add to the map.
As shown in
A map maintenance block 320 acts to maintain the map. For example, where there is a lack of camera motion, the mapping thread 310 has idle time that may be used to improve the map. Hence, the block 320 may re-attempt outlier measurements, try to measure new map features in all old keyframes, etc.
The tracking thread 340 is shown as including a coarse pass 344 and a fine pass 354, where each pass includes a project points block 346, 356, a measure points block 348, 358 and an update camera pose block 350, 360. Prior to the coarse pass 344, a pre-process frame block 342 can create a monochromatic version and a polychromatic version of a frame and create four “pyramid” levels of resolution (e.g., 640×480, 320×240, 160×120 and 80×60). The pre-process frame block 342 also performs pattern detection on the four levels of resolution (e.g., corner detection).
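A short Python sketch of this pre-processing step is shown below, using OpenCV's FAST detector and pyrDown as illustrative stand-ins for the corner detector and downsampling scheme described above (the particular detector, threshold, and downsampling method are assumptions):

```python
import cv2

def preprocess(frame_bgr):
    # Build four pyramid levels (e.g., 640x480 down to 80x60) and detect corners on each.
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)   # monochromatic version of the frame
    fast = cv2.FastFeatureDetector_create(threshold=20)
    levels, corners = [], []
    img = gray
    for _ in range(4):
        levels.append(img)
        corners.append(fast.detect(img, None))           # corner detection at this level
        img = cv2.pyrDown(img)                            # halve the resolution
    return levels, corners
```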
In the coarse pass 344, the point projection block 346 uses a motion model to update the camera pose, where all map points are projected to an image to determine which points are visible and at what pyramid level. The subset to measure may be about the 50 biggest features for the coarse pass 344 and about 1000 randomly selected features for the fine pass 354.
The point measurement blocks 348, 358 can be configured, for example, to generate an 8×8 matching template (e.g., warped from a source keyframe). The blocks 348, 358 can search a fixed radius around a projected position (e.g., using zero-mean SSD, searching only at FAST corner points) and perform, for example, up to about 10 inverse composition iterations for each subpixel position (e.g., for some patches) to find about 60% to about 70% of the patches.
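For illustration, a zero-mean SSD comparison of an 8×8 template against candidate patches (e.g., centered on corner points within a fixed radius of the projected position) might be sketched in Python as follows; the score threshold and the form of the candidate list are assumptions, and warping of the template from a source keyframe is omitted:

```python
import numpy as np

def zero_mean_ssd(template, patch):
    # Subtract each patch's mean before comparing, so the score is robust to brightness offsets.
    t = template.astype(np.float32) - template.mean()
    p = patch.astype(np.float32) - patch.mean()
    return float(((t - p) ** 2).sum())

def best_match(template, image, candidates, max_score=4000.0):
    # candidates: (x, y) positions (e.g., FAST corners) near the projected map point.
    best, best_xy = None, None
    for (x, y) in candidates:
        patch = image[y - 4:y + 4, x - 4:x + 4]    # 8x8 patch around the candidate
        if patch.shape != (8, 8):
            continue                               # skip candidates too close to the border
        score = zero_mean_ssd(template, patch)
        if best is None or score < best:
            best, best_xy = score, (x, y)
    return best_xy if best is not None and best < max_score else None
```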
The camera pose update block 350, 360 typically operates to solve a problem with six degrees of freedom. Depending on the circumstances (or requirements), a problem with fewer degrees of freedom may be solved.
With respect to the rendering block 380, the data thread 370 includes a retrieval block 374 to retrieve geometrically located data and an association block 378 that may associate geometrically located data with one or more objects. For example, the geometrically located data may specify a position for an object and, when this information is passed to the render block 380, the object is rendered according to the geometry to generate a virtual object in a scene observed by a camera. As described herein, the method 300 is capable of operating in “real time”. For example, consider a frame rate of 24 fps: a frame is presented to the user approximately every 0.042 seconds (about 42 ms). Most humans consider a frame rate of 24 fps acceptable to replicate real, smooth motion as would be observed naturally with one's own eyes.
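To make the association between geometrically located data and rendered graphics concrete, the following Python fragment sketches a pinhole projection of an item's map coordinates into the current camera image, given a tracked pose (R, t) and camera intrinsics K; the function name and the assumption of a simple pinhole model are illustrative only:

```python
import numpy as np

def project_item(item_xyz, R, t, K):
    """item_xyz: 3-vector in map coordinates; R (3x3), t (3,): world-to-camera pose;
    K: 3x3 intrinsic matrix. Returns pixel (u, v) or None if the item is behind the camera."""
    p_cam = R @ np.asarray(item_xyz, dtype=float) + t
    if p_cam[2] <= 0:
        return None                    # behind the camera: do not draw this item
    uvw = K @ p_cam
    return (uvw[0] / uvw[2], uvw[1] / uvw[2])

# At 24 fps, tracking, projection, and compositing of all items must complete within
# roughly 42 ms per frame for the mixed reality scene to appear smooth.
```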
In the example of
While the foregoing example mentions targeting via crosshairs, other techniques may include 3D “liquid browsing” that can, for example, be capable of causing separation of overlapping items within a particular FOV (e.g., peek behind, step aside, lift out of the way, etc.). Such an approach could be automatic, triggered by a camera gesture (e.g., a spiral motion), a command, etc. Other 3D pointing schemes could also be applied.
In the state diagram 400 of
As the user continues with her session, the virtual content normally persists with respect to the map. Such an approach allows for quick reloading of content when the user once again picks up the camera (e.g., “camera motion detected”). Depending on the specifics of how the map exists in the underlying application, a matching process may occur that acts to recognize one or more features in the camera's FOV. If one or more features are recognized, then the application may rely on the pre-existing map. However, if recognition fails, then the application may act to reinitialize a map. Where a user relies on a mobile device, the latter may occur automatically and be optionally triggered by information (e.g., roaming information, IP address, GPS information, etc.) that indicates the user is no longer in a known environment or an environment with a pre-existing map.
An exemplary application may include an initialization control (e.g., keyboard, mouse, other command) that causes the application to remap an environment. As explained herein, a user may be instructed how to pan, tilt, zoom, etc., a camera to acquire sufficient information for map generation. An application may present various options as to map resolution or other aspects of a map (e.g., coordinate system).
In various examples, an application can generate personal media landscapes in mixed reality to present both physical and virtual items such as sticky notes, calendars, photographs, timers, tools, etc.
A particular exemplary system for so-called sticky notes is referred to herein as a NoteScape system. The NoteScape system allows a user to create a mixed reality scene that is a digital landscape of “virtual” media or notes in a physical environment. Conventional physical sticky notes have a number of qualities that help users to manage their work and daily lives. Primarily, they provide a persistent context of interaction, which means that new notes are always at hand, ready to be used, and old notes are spread throughout the environment, providing a glanceable display of the information that is of special importance to the user.
In the NoteScape system, virtual sticky notes exist as digital data that include geometric location. Virtual sticky notes can be portable and assignable to a user or a group of users. For example, a manager may email or otherwise transmit a virtual sticky note to a group of users. Upon receipt and camera motion, the virtual sticky note may be displayed in a mixed reality scene of a user according to some predefined geometric location. In this example, an interactive sticky note may then allow the user to link to some media content (e.g., an audio file or video file from the manager). Privacy can be maintained as a user can have control over when and how a note becomes visible.
The NoteScape system allows a user to visualize notes in a persistent and portable manner, both at hand and interactive, and glanceable yet private. The NoteScape system allows for mixed reality scenes that reinterpret how a user can organize and engage with any kind of digital media in a physical space (e.g., a physical environment). As with paper notes, the NoteScape system provides a similar kind of peripheral support for primary tasks performed in a workspace having a focal computer (e.g., a monitor with workspace).
The NoteScape system can optionally be implemented using a commodity webcam and a flashlight style of interaction to bridge the physical and virtual worlds. In accordance with the flashlight metaphor, a user points the webcam like a flashlight and observes the result on his monitor. Having decided where to set the origin of his “NoteScape”, the user may simply press the space bar to initiate creation of a map of the environment. In turn, the underlying NoteScape system application may begin positioning previously stored sticky notes as appropriate (e.g., based on geometric location data associated with the sticky notes). Further, the user may introduce new notes along with specified locations.
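A minimal Python sketch of this start-up flow is shown below, assuming OpenCV for capture and display; initialize_map(), load_items(), and place_item() are hypothetical hooks standing in for the mapping and rendering machinery, not a real API:

```python
import cv2

def initialize_map(frame):
    pass        # placeholder: set the origin and trigger map building from this view

def load_items():
    return []   # placeholder: read previously stored geometrically located data

def place_item(item):
    pass        # placeholder: register the item with the renderer at its saved coordinates

def run_notescape(camera_index=0):
    cap = cv2.VideoCapture(camera_index)
    mapping_started = False
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imshow("NoteScape", frame)           # flashlight-style view of the webcam feed
        key = cv2.waitKey(1) & 0xFF
        if key == ord(' ') and not mapping_started:
            initialize_map(frame)                # space bar sets the origin and starts mapping
            for item in load_items():            # previously stored sticky notes
                place_item(item)
            mapping_started = True
        elif key == 27:                          # Esc to quit
            break
    cap.release()
    cv2.destroyAllWindows()
```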
As described herein, notes or other items may be associated with a user or group of users (e.g., rather than any particular computing device). Such notes or other items can be readily accessed and interactive (e.g., optionally linking to multiple media types) while being simple to create, position, and reposition.
Once a map of sufficient breadth and detail has been generated, in a location block 524, the application locates one or more virtual items with respect to the map. As mentioned, a virtual item typically includes content and geometrical location information. For example, a data file for a virtual sticky note may include size, color and text as well as coordinate information to geometrically locate the sticky note with respect to a map. Characteristics such as size, color, text, etc., may be static or defined dynamically in the form of an animation. As discussed further below, such data may represent a complete interactive application fully operable in mixed reality. According to the method 500, a rendition block 528 renders a mixed reality scene to include one or more items geometrically positioned in a camera scene (e.g., a real video scene with rendered graphics). The rendition block 528 may rely on z-buffering (or other buffering techniques) for management of depth of virtual items and for POV (e.g., optionally including shadows, etc.). Transparency or other graphical image techniques may also be applied to one or more virtual items in a mixed reality scene (e.g., fade a note to 100% transparency over 2 weeks). Accordingly, a virtual item may be a multi-dimensional graphic, rendered with respect to a map and optionally animated in any of a variety of manners. Further, the size of any particular virtual item is essentially without limit. For example, a very small item may be secretly placed and zoomed into (e.g., using a macro lens) to reveal content or to activate.
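As a concrete illustration of such a time-based presentation rule, the Python sketch below fades a note's opacity linearly to zero over two weeks; the linear curve mirrors the example above, and other decay functions could equally be used:

```python
from datetime import datetime, timedelta

FADE_PERIOD = timedelta(weeks=2)   # note reaches 100% transparency after two weeks

def note_opacity(created_at, now=None):
    # Returns 1.0 for a brand-new note, decreasing linearly to 0.0 at FADE_PERIOD.
    now = now or datetime.now()
    remaining = 1.0 - (now - created_at) / FADE_PERIOD
    return max(0.0, min(1.0, remaining))

# Example: a note created 7 days ago renders at roughly 50% opacity.
print(note_opacity(datetime.now() - timedelta(days=7)))
```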
As described herein, the exemplary method 500 may be applied in almost any environment that lends itself to map generation. In other words, while initial locations of virtual items may be set in one environment, a user may represent these virtual items in essentially the same locations in another environment (see, e.g., environments 110 and 160 of
Depending on available computing resources or settings, a user may have an ability to extend an environment, for example, to build a bigger map. For example, at first a user may rely on a small FOV and few POVs (e.g., a one meter by one meter by one meter space). If this space becomes cluttered physically or virtually, a user may extend the environment, typically in width, for example, by sweeping a broader angle from a desk chair. In such an example, fuzziness may appear around the edges of an environment, indicating uncertainty in the map that has been created. As the user pans around their environment, the map is extended to incorporate these new areas and the uncertainty is reduced. Unlike conventional sticky notes, which adhere to physical surfaces, virtual items can be placed anywhere within a three-dimensional space.
As indicated in the state diagram of
As mentioned, virtual items may include any of a variety of content. For example, consider the wall art 114 in the environment 110 of
With respect to linked media content, a user may provide a link to a social networking site where the user or another user has loaded media files. For example, various social networking sites allow a user to load photos and to share the photos with other users (e.g., invited friends). Referring again to the mixed reality scene 103 of the monitor 128 of
In another example, a virtual item may be a message “wall”, such as a message wall associated with a social networking site that allows others to periodically post messages viewable to linked members of the user's social network.
An exemplary application may present one or more specialized icons for use in authoring content, for example, upon detection of camera motion. A specialized icon may be for text authoring where upon selection of the icon in a mixed reality scene, the display returns to a workspace with an open notepad window. A user may enter text in the notepad and then return to a display of the mixed reality scene to position the note. Once positioned, the text and the position are stored to memory (e.g., as geometrically located data, stored locally or remotely) to thereby allow for recreation of the note in a mixed reality scene for the same environment or a different environment. Such a process may automatically color code or date the note.
A user may have more than one set of geometrically located data. For example, a user may have a personal set of data, a work set of data, a social network set of data, etc. An application may allow a user to share a set of geometrically located data with one or more others (e.g., in a virtual clubhouse where position of virtual items relies on a local map of an actual physical environment). Users in a network may be capable of adding geometrically located data, editing geometrically located data, etc., in the context of a game, a spoof, a business purpose, etc. With respect to games and spoofs, a user may add or alter data to plant treats, toys, timers, send special emoticons, etc. An application may allow a user to respond to such virtual items (e.g., to delete, comment, etc.). An application may allow a user to finger or baton draw in a real physical environment where the finger or baton is tracked in a series of camera images to allow the finger or baton drawing to be extracted and then stored as being associated with a position in a mixed reality scene.
With respect to entertainment, virtual items may provide for playing multiple videos at different positions in a mixed reality scene, internet browsing at different positions in a mixed reality scene, or channel surfing of cable TV channels at different positions in a mixed reality scene.
As described herein, various types of content may be suitable for presentation in a mixed reality scene. For example, a gallery of media, of videos, of photos, and galleries of bookmarks of websites may be projected into a three dimensional space and rendered as a mixed reality scene. A user may organize any of a variety of files or file space for folders, applications, etc., in such a manner. Such techniques can effectively extend a desktop in three dimensions. As described herein, a virtual space can be decoupled from any particular physical place. Such an approach makes a mixed reality space shareable (e.g., two or more users can interact in the same conceptual space, while situated in different places), as well as switchable (the same physical space can support the display of multiple such mixed realities).
As described herein, various tasks may be performed in a cloud as in “cloud computing”. Cloud computing is an Internet-based model in which scalable resources are typically provided as a service in real time. A mixed reality system may be implemented in part in a “software as a service” (SaaS) framework where resources accessible via the Internet act to satisfy various computational and/or storage needs. In a particular example, a user may access a website via a browser and rely on a camera to scan a local environment. In turn, the information acquired via the scan may be transmitted to a remote location for generation of a map. Geometrically located data may be accessed (e.g., from a local and/or a remote location) to allow for rendering a mixed reality scene. While part of the rendering necessarily occurs locally (e.g., screen buffer to display device), the underlying virtual data or real data to populate a screen buffer may be generated or packaged remotely and transmitted to a user's local device.
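Purely as an illustration of such a client/cloud split, the Python sketch below uploads a captured frame for remote processing and fetches the user's geometrically located data for local rendering; the service URL, endpoint paths, payload fields, and authentication scheme are hypothetical assumptions, not an actual service:

```python
import requests

SERVICE = "https://example.com/mixed-reality"   # placeholder service URL

def submit_frame(jpeg_bytes, session_id):
    # Send a camera frame to the remote service, which may contribute to map generation
    # and return, e.g., an estimated camera pose relative to the map.
    r = requests.post(f"{SERVICE}/frames",
                      files={"frame": ("frame.jpg", jpeg_bytes, "image/jpeg")},
                      data={"session": session_id},
                      timeout=2.0)
    r.raise_for_status()
    return r.json()

def fetch_items(user_token):
    # Retrieve the user's geometrically located data for local rendering.
    r = requests.get(f"{SERVICE}/items",
                     headers={"Authorization": f"Bearer {user_token}"},
                     timeout=2.0)
    r.raise_for_status()
    return r.json()
```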
In various trials, a local computing device performed parallel tracking and mapping as well as providing storage for geometrically located data sufficient to render graphics in a mixed reality scene. Particular trials operated with a frame rate of 15 fps on a monitor with a 1024×768 screen resolution using a webcam at 640×480 image capture resolution. A particular computing device relied on a single core processor with a speed of about 3 GHz and about 2 GB of RAM. Another trial relied on a portable computing device (e.g., laptop computer) with a dual core processor having a speed of about 2.5 GHz and about 512 MB of graphics memory, and operated with a frame rate of 15 fps on a monitor with a 1600×1050 screen resolution using a webcam at 800×600 image capture resolution.
In the context of a webcam, camera images may be transmitted to a remote site for various processing in near real-time and geometrically located data may be stored at one or more remote sites. Such examples demonstrate how a system may operate to render a mixed reality scene. Depending on capabilities, parameters such as resolution, frame rate, FOV, etc., may be adjusted to provide a user with suitable performance (e.g., minimal delay, sufficient map accuracy, minimal shakiness, minimal tracking errors, etc.).
Given sufficient processing and memory, an exemplary application may render a mixed reality scene while executing on a desktop PC, a notebook PC, an ultra mobile PC, or a mobile phone. With respect to a mobile phone, many mobile phones are already equipped with a camera. Such an approach can assist a fully mobile user.
As described herein, virtual items represented by geometrically located data can be persistent and portable for display in a mixed reality scene. From a user's perspective, the items (e.g., notes or other items) are “always there”, even if not always visible. Given suitable security, the items cannot readily be moved or damaged. Moreover, the items can be made available to a user wherever the user has an appropriate camera, display device, and, in a cloud context, authenticated connection to an associated cloud-based service. In an offline context, standard version control techniques may be applied based on a most recent dataset (e.g., a most recently downloaded dataset).
As described herein, an application that renders a mixed reality scene provides a user with glanceable and private content. For example, a user can “glance at his notes” by simply picking up a camera and pointing it. Since the user can decide when, where, and how to do this, the user can keep content “private” if necessary.
As described herein, an exemplary system may operate according to a flashlight metaphor where a view from a camera is shown full-screen on a user's display and, at the center of the display, is a targeting mark (e.g., a crosshair or reticle). A user's actions (e.g., pressing a keyboard key, moving the camera) can have different effects depending on the position of the targeting mark relative to virtual items (e.g., virtual media). A user may activate a corresponding item by any of a variety of commands (e.g., a keypress). Upon activation, an item that is a text-based note might open on-screen for editing, an item that is a music file might play in the background, an item that is a bookmark might open a new web-browser tab, a friend icon (composed of, e.g., name, photo and status) might open that person's profile in a social network, and so on.
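A simple, non-limiting Python dispatch of these activation effects might look like the following, reusing the illustrative "kind" field from the earlier geometrically located data sketch; open_note_editor(), play_audio(), and open_profile() are placeholder hooks, while webbrowser is the standard library module:

```python
import webbrowser

def activate(item):
    # The effect of activation depends on the kind of virtual item targeted.
    if item.kind == "text":
        open_note_editor(item.content)          # open the note on-screen for editing
    elif item.kind == "audio":
        play_audio(item.content)                # play the music file in the background
    elif item.kind == "link":
        webbrowser.open_new_tab(item.content)   # open the bookmark in a new browser tab
    elif item.kind == "friend":
        open_profile(item.content)              # open the person's social network profile

def open_note_editor(text):
    print("editing note:", text)                # placeholder editor

def play_audio(path):
    print("playing:", path)                     # placeholder player

def open_profile(profile_url):
    webbrowser.open_new_tab(profile_url)        # placeholder: open the profile page
```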
As described with respect to
When the camera is embedded within the computing device (such as with a mobile camera phone, camera-enabled Ultra-Mobile PC, or a “see through” head mounted display), camera motion alone cannot be used to enter the personal media landscape. In such situations, a different user action (e.g. touching or stroking the device screen) may trigger the transition to mixed reality. In such an implementation, an application may still insert a representation of the display at the origin (or other suitable location) of the established mixed reality scene to facilitate, for example, drag-and-drop interaction between the user's workspace and the mixed reality scene.
As explained, an exemplary application relies on camera images to build a map of a physical environment while essentially simultaneously calculating the camera's position relative to the map. Virtual items are typically treated as graphics to be positioned with respect to the map and rendered as graphics in conjunction with real camera images to provide a mixed reality scene.
As indicated in
In the example of
As described herein, an item rendered in a mixed reality scene may optionally be an application. For example, an item may be a calculator application that is fully functional in a mixed reality scene by entry of commands (e.g., voice, keyboard, mouse, finger, etc.). As another example, consider a card game such as solitaire. A user may select a solitaire item in a mixed reality scene that, in turn, displays a set of playing cards where the cards are manipulated by issuance of one or more commands. Other examples may include a browser application, a communication application, a media application, etc.
The other modules shown in
The preferences module 848 allows a user to rely on default values or user selected or defined preferences. For example, a user may select frame rate and resolution for a desktop computer with superior video and graphics processing capabilities and select a different frame rate and resolution for a mobile computing device with lesser capabilities. Such preferences may be stored in conjunction with geometrically located data such that upon access of the data, an application operates with parameters to ensure acceptable performance. Again, such data may be stored on a portable memory device, memory of a computing device, memory associated with and accessible by a server, etc.
As mentioned, an application may rely on various modules, for example, including some or all of the modules 800 of
In the foregoing application, the mapping module may be configured to access real image data of a three-dimensional space as acquired by a camera such as a webcam, a mobile phone camera, a head-mounted camera, etc. As mentioned, a camera may be a stereo camera.
As described herein, an exemplary system can include a camera with a changeable field of view; a display; and a computing device with at least one processor, memory, an input for the camera, an output for the display and control logic to generate a three-dimensional map based on real image data of a three-dimensional space acquired by the camera via the input, to locate one or more virtual items with respect to the three-dimensional map, to render a mixed reality scene to the display via the output where the mixed reality scene includes the one or more virtual items along with real image data of the three-dimensional space acquired by the camera and to re-render the mixed reality scene to the display via the output upon a change in the field of view of the camera. In such a system, the camera can have a field of view changeable, for example, by manual movement of the camera, by head movement of the camera or by zooming (e.g., an optical zoom and/or a digital zoom). Tracking or sensing techniques may be used as well, for example, by sensing movement by computing optical flow, by using one or more gyroscopes mounted on a camera, by using position sensors that compute the relative position of the camera (e.g., to determine the field of view of the camera), etc. Such techniques may be implemented by a tracking module of an exemplary application for generating mixed reality scenes.
Such a system may include control logic to store, as geometrically located data, data representing one or more virtual items located with respect to a three-dimensional coordinate system. As mentioned, a system may be a mobile computing device with a built in camera and a built in display.
As described herein, an exemplary method can be implemented at least in part by a computing device and include accessing geometrically located data that represent one or more virtual items with respect to a three-dimensional coordinate system; generating a three-dimensional map based at least in part on real image data of a three-dimensional space as acquired by a camera; rendering to a physical display a mixed reality scene that includes the one or more virtual items at respective three-dimensional positions in a real image of the three-dimensional space acquired by the camera; and re-rendering to the physical display the mixed reality scene upon a change in the field of view of the camera. Such a method may include issuing a command to target one of the one or more virtual items in the mixed reality scene and/or locating another virtual item in the mixed reality scene and storing data representing the virtual item with respect to a location in a three-dimensional coordinate system. As described herein, a module or method action may be in the form of one or more processor-readable media that include processor-executable instructions.
Computing device 900 may have additional features or functionality. For example, computing device 900 may also include additional data storage devices (removable and/or non-removable) such as, for example, magnetic disks, optical disks, or tape. Such additional storage is illustrated in
Computing device 900 may also contain communication connections 916 that allow the device to communicate with other computing devices 918, such as over a network. Communication connections 916 are one example of communication media. Communication media may typically be embodied by computer readable instructions, data structures, program modules, or other data forms. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.