The following relates generally to the display of data generated from or representing spatial coordinates.
In order to investigate an object or structure, it is known to interrogate the object or structure and collect data resulting from the interrogation. The nature of the interrogation will depend on the characteristics of the object or structure. The interrogation will typically be a scan by a beam of energy propagated under controlled conditions. Other types of scanning include passive scans, such as algorithms that recover point cloud data from video or camera images. The results of the scan are stored as a collection of data points, and the position of the data points in an arbitrary frame of reference is encoded as a set of spatial coordinates. In this way, the relative positioning of the data points can be determined and the required information extracted from them.
Data having spatial coordinates may include data collected by electromagnetic sensors of remote sensing devices, which may be of either the active or the passive types. Non-limiting examples include LiDAR (Light Detection and Ranging), RADAR, SAR (Synthetic-aperture RADAR), IFSAR (Interferometric Synthetic Aperture Radar) and Satellite Imagery. Other examples include various types of 3D scanners and may include sonar and ultrasound scanners.
LiDAR refers to a laser scanning process which is usually performed by a laser scanning device from the air, from a moving vehicle or from a stationary tripod. The process typically generates spatial data encoded with three dimensional spatial data coordinates having XYZ values and which together represent a virtual cloud of 3D point data in space or a “point cloud”. Each data element or 3D point may also include an attribute of intensity, which is a measure of the level of reflectance at that spatial data coordinate, and often includes attributes of RGB, which are the red, green and blue color values associated with that spatial data coordinate. Other attributes such as first and last return and waveform data may also be associated with each spatial data coordinate. These attributes are useful both when extracting information from the point cloud data and for visualizing the point cloud data. It can be appreciated that data from other types of sensing devices may also have similar or other attributes.
The visualization of point cloud data can reveal to the human eye a great deal of information about the various objects which have been scanned. Information can also be manually extracted from the point cloud data and represented in other forms such as 3D vector points, lines and polygons, or as 3D wire frames, shells and surfaces. These forms of data can then be input into many existing systems and workflows for use in many different industries including for example, engineering, architecture, construction and surveying.
A common approach for extracting these types of information from 3D point cloud data involves subjective manual pointing at points representing a particular feature within the point cloud data either in a virtual 3D view or on 2D plans, cross sections and profiles. The collection of selected points is then used as a representation of an object. Some semi-automated software and CAD tools exist to streamline the manual process including snapping to improve pointing accuracy and spline fitting of curves and surfaces. Such a process is tedious and time consuming. Accordingly, methods and systems that better semi-automate and automate the extraction of these geometric features from the point cloud data are highly desirable.
Automation of the process is, however, difficult as it is necessary to recognize which data points form a certain type of object. For example, in an urban setting, some data points may represent a building, some data points may represent a tree, and some data points may represent the ground. These points coexist within the point cloud and their segregation is not trivial.
Automation may also be desired when there are many data points in a point cloud. It is not unusual to have millions of data points in a point cloud. Displaying the information generated from the point cloud can be difficult, especially on devices with limited computing resources such as mobile devices.
From the above it can be understood that efficient and automated methods and systems for extracting features from 3D spatial coordinate data, as well as displaying the generated data, are highly desirable.
Embodiments of the invention or inventions will now be described by way of example only with reference to the appended drawings wherein:
FIGS. 5(a) to 5(h) are schematic diagrams illustrating example stages for generating a height map from data points having spatial coordinates.
FIGS. 18(a) and 18(b) are schematic diagrams illustrating example stages in the method of clipping in a 3D UI window.
It will be appreciated that for simplicity and clarity of illustration, where considered appropriate, reference numerals may be repeated among the figures to indicate corresponding or analogous elements. In addition, numerous specific details are set forth in order to provide a thorough understanding of the embodiments described herein. However, it will be understood by those of ordinary skill in the art that the embodiments described herein may be practiced without these specific details. In other instances, well-known methods, procedures and components have not been described in detail so as not to obscure the embodiments described herein. Also, the description is not to be considered as limiting the scope of the embodiments described herein.
The proposed systems and methods display the data generated from the data points having spatial coordinates. The processing and display of the data may be carried out automatically by a computing device.
As discussed above, the data may be collected from various types of sensors. A non-limiting example of such a sensor is the LiDAR system built by Ambercore Software Inc, and available under the trade-mark TITAN.
Turning to
Each of the collected data points is associated with respective spatial coordinates which may be in the form of three dimensional spatial data coordinates, such as XYZ Cartesian coordinates (or, alternatively, a radius and two angles representing spherical polar coordinates). Each of the data points also has numeric attributes indicative of a particular characteristic, such as intensity values, RGB values, first and last return values and waveform data, which may be used as part of the filtering process. In one example embodiment, the RGB values may be measured from an imaging camera and matched to a data point sharing the same coordinates.
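Where a point is recorded as a radius and two angles, it can be converted to XYZ values before further processing. The following is a minimal sketch only; the angle convention (azimuth measured in the XY plane from the +X axis, elevation measured up from the XY plane) is an assumption, since sensors differ in how they define these angles.

```python
import math

def spherical_to_cartesian(r, azimuth, elevation):
    """Convert a range and two angles (in radians) to XYZ coordinates.

    Assumed convention (illustrative): azimuth is measured in the XY plane
    from the +X axis; elevation is measured upward from the XY plane.
    """
    horizontal = r * math.cos(elevation)  # projection of the range onto XY
    return (horizontal * math.cos(azimuth),
            horizontal * math.sin(azimuth),
            r * math.sin(elevation))
```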
The determination of the coordinates for each point is performed using known algorithms that combine location data of the sensor (e.g. GPS data) with the sensor readings to obtain the location of each point within an arbitrary frame of reference.
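As a simplified illustration of combining sensor location data with a sensor reading, the sketch below rotates a point measured in the sensor's own frame by the sensor heading and then translates it by the sensor position. It assumes the GPS position has already been projected into a local Cartesian frame and models attitude as a single yaw angle; a real georeferencing pipeline would also apply roll, pitch and lever-arm corrections. The function name is hypothetical.

```python
import math

def georeference(sensor_xyz, yaw, point_in_sensor_frame):
    """Place a sensor-frame point into the common frame of reference.

    Simplifying assumptions (illustrative only): sensor_xyz is the GPS-derived
    sensor position in a local Cartesian frame, and the sensor attitude is a
    single yaw angle (radians) about the Z axis.
    """
    px, py, pz = point_in_sensor_frame
    c, s = math.cos(yaw), math.sin(yaw)
    # Rotate about Z by the heading, then translate by the sensor position.
    return (sensor_xyz[0] + c * px - s * py,
            sensor_xyz[1] + s * px + c * py,
            sensor_xyz[2] + pz)
```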
Turning to
It can be appreciated that the data 26 may be processed according to various computer executable operations or instructions stored in the software. In this way, the features may be extracted from the data 26.
Continuing with
It can be appreciated that there may be many other different modules for extracting features from the data having spatial coordinates 26.
Continuing with
Also shown in the memory 24 is a database 520 storing one or more base models. There is also a database 522 storing one or more enhanced base models. Each base model within the base model database 520 comprises a set of data having spatial coordinates, such as those described with respect to data 26. A base model may also include extracted features 30, which have been extracted from the data 26. As will be discussed later below, a base model from the database 520 may be enhanced with external data 524, thereby creating enhanced base models (stored in the database 522). Enhanced base models also comprise a set of data having spatial coordinates, although some aspect of the data is enhanced (e.g. more data points, different data types, etc.). The external data 524 can include images 526 (e.g. 2D images) and ancillary data having spatial coordinates 528.
An objects database 521 is also provided to store objects associated with certain base models. An object, comprising a number of data points, a wire frame, or a shell, has a known shape and known dimensions. Non-limiting examples of objects include buildings, wires, trees, cars, shoes, light poles, boats, etc. The objects may include those features that have been extracted from the data having spatial coordinates 26 and stored in the extracted features database 30. The objects may also include extracted features from a base model or enhanced base model.
Many of the above modules are described in further detail in U.S. Patent Application No. 61/319,785 and U.S. Patent Application No. 61/353,939, both of which are incorporated herein by reference in their entirety.
Turning to
The display modules described herein provide methods for encoding, transmitting, and displaying highly detailed data on display systems with limited computing resources, such as mobile devices, smart phones, PDAs, etc. Highly detailed point cloud models can consist of hundreds of thousands, or even millions, of data points. It is recognized that using such detailed models on a viewing device that has limited computing and graphics power is a significant challenge.
It will be appreciated that any module or component exemplified herein that executes instructions or operations may include or otherwise have access to computer readable media such as storage media, computer storage media, or data storage devices (removable and/or non-removable) such as, for example, magnetic disks, optical disks, or tape. Computer storage media may include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, or other data, except transitory propagating signals per se. Examples of computer storage media include RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by an application, module, or both. Any such computer storage media may be part of the computing device 20 or accessible or connectable thereto. Any application or module herein described may be implemented using computer readable/executable instructions or operations that may be stored or otherwise held by such computer readable media.
Details regarding the different display systems and methods that may be associated with the various modules in the display software 516 will now be discussed.
In the display of data, three-dimensional detail can be represented using parametric means, such as representing surface contours using NURBS (Non-Uniform Rational B-Spline) and other curved surface parameters. However, this approach is difficult to compute and expensive to render, and is most suitable for character rendering. Oftentimes, artificial detail is ‘created’ via the use of fractals, to give the appearance of detail where it does not exist. However, while this might make a pleasing visual picture, it does not represent the true object. Other means to represent detail include representing successively higher resolution datasets as a ‘pyramid’, whereby high resolution data is transmitted when a closer ‘zoom’ level is desired. This method breaks down when the best (e.g. highest) level of detail exceeds the capacity of the transmission link or the ability of the computer to support the data volume. Moreover, higher resolution data is very large and not well suited to compression. Many systems also employ ‘draping’ of a two-dimensional image over three-dimensional surfaces. This gives a visual appearance that may resemble a realistic 3D surface, but suffers from visual artifacts. For example, when a 2D image of a building with trees in the foreground is draped over a 3D model of the building, the result is flattened trees along the sides of the 3D building. Furthermore, this is only suitable for basic visual rendering techniques such as daytime lighting. By providing systems and methods for height mapping and color mapping, one or more of the above issues can be addressed.
Turning to
In another aspect, turning to
Different stages or operations are shown in
At
At
The above operations shown in
The above operations allow an image of an object to include surface detail. For example, a point cloud of a building may be provided, whereby the building has protrusions (e.g. gargoyles, window ledges, pipes, etc.) that are raised above the building's wall surface. The point cloud may have data points representing such protrusions. A dense polygonal representation may also reveal the shape of the protrusions. However, when the dense polygonal representation of the point cloud is reduced to decrease the data size, the building may appear to have a flat surface; in other words, a large polygon may represent one wall of the building, and the surface height detail is lost. Although this reduction decreases the data size and image resolution, it is desirable to maintain the height detail. By implementing the above operations (e.g. determining a height value for each pixel in the image based on the point cloud data), the height detail for the protrusions can be maintained. Therefore, the polygon representing the wall of the building may be flat, but still maintain surface height information from the height or bump mapping. Based on the height or bump mapping, the image can be rendered, for example, such that pixels with lower height values are darker and pixels with higher height values are brighter. Therefore, window ledges on a building that protrude out from the wall surface would be represented with brighter pixels, and window recesses that are sunken within the wall surface would be represented with darker pixels. There are many other known visualization or image rendering methods for displaying pixels with height values which can be applied to the principles described herein.
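The "lower is darker, higher is brighter" rendering mentioned above can be illustrated with a linear normalization of per-pixel heights to 8-bit gray levels. This is a minimal sketch under the assumption that the height map is a list of rows of numbers; it is one of many possible visualizations, not a prescribed one.

```python
def heights_to_gray(height_map):
    """Map per-pixel heights to 8-bit gray levels (0 = lowest, 255 = highest).

    height_map is a list of rows of height values above the polygon plane.
    Recessed pixels (negative heights) come out darker than protrusions.
    """
    flat = [h for row in height_map for h in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # A perfectly flat surface carries no relief: render it mid-gray.
        return [[128 for _ in row] for row in height_map]
    return [[round(255 * (h - lo) / span) for h in row] for row in height_map]
```

For a wall with a protruding ledge and a sunken window recess, the ledge pixels map toward 255 and the recess pixels toward 0.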
Turning to
At block 140, a shell surface of the extracted object is generated. The shell surface is a dense polygon representation (e.g. it comprises many polygons). The shell surface can, for example, be generated by applying a Delaunay triangulation algorithm. Other known methods for generating wire frames or 3D models are also applicable. At block 142, the number of polygons of the shell surface is reduced. Methods and tools for polygon reduction in the area of 3D modelling and computer aided design are known and can be used herein. It can be appreciated that polygonization (e.g. the calculation of polygon mesh surfaces) is known. For example, an algorithm such as Marching Cubes may be used to create a polygonal representation of surfaces. These polygons may be further reduced by computing surface meshes with fewer polygons. An underlying ‘skeleton’ model representing the underlying object structure (such as is used in video games) may also be employed to assist the polygonization process. Other examples of polygonization include a convex hull algorithm for computing a triangulation of points from the voxel space, which gives a representation of the outer edges of the point volume. Upon establishing the polygons or meshes, the number of polygons can be reduced using known mesh simplification techniques (e.g. simplification using quadric error metrics, simplification envelopes, parallel mesh simplification, distributed simplification, vertex collapse, edge collapse, etc.). A reduction in polygons decreases the level of detail, as well as the data size, which is suitable for devices with limited computing resources.
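To make the polygon reduction at block 142 concrete, the sketch below uses vertex clustering, a simple simplification technique that is not one of those listed above and is shown only because it fits in a few lines: all vertices falling in the same grid cell are merged into their centroid, and triangles that collapse are discarded. The function name and the `cell_size` parameter are hypothetical.

```python
def simplify_by_vertex_clustering(vertices, triangles, cell_size):
    """Reduce a triangle mesh by merging all vertices that share a grid cell.

    vertices: list of (x, y, z); triangles: list of (i, j, k) vertex indices.
    Returns (new_vertices, new_triangles); each cell is represented by the
    centroid of its vertices, and triangles that collapse are dropped.
    """
    cell_index = {}  # grid cell -> index of its merged vertex
    sums, counts, remap = [], [], []
    for x, y, z in vertices:
        key = (int(x // cell_size), int(y // cell_size), int(z // cell_size))
        if key not in cell_index:
            cell_index[key] = len(sums)
            sums.append([0.0, 0.0, 0.0])
            counts.append(0)
        idx = cell_index[key]
        for axis, value in enumerate((x, y, z)):
            sums[idx][axis] += value
        counts[idx] += 1
        remap.append(idx)
    new_vertices = [tuple(s / n for s in acc) for acc, n in zip(sums, counts)]

    seen, new_triangles = set(), []
    for i, j, k in triangles:
        tri = (remap[i], remap[j], remap[k])
        if len(set(tri)) < 3:
            continue  # collapsed to an edge or a point
        if frozenset(tri) not in seen:  # skip duplicates of an existing face
            seen.add(frozenset(tri))
            new_triangles.append(tri)
    return new_vertices, new_triangles
```

A larger `cell_size` yields fewer polygons and a smaller data size, at the cost of geometric detail, which is the trade-off the height map is meant to compensate for.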
At block 144, the reduced set of polygons is represented as a collection of pixels that compose an image. In one embodiment, at block 146, for each pixel, the closest data point to the given pixel is identified. At block 148, the height of the closest data point above the polygonal plane with which the pixel is associated is determined. The height may be measured as the distance along the normal (i.e. perpendicular) to the polygonal plane.
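Blocks 146 and 148 can be sketched as follows, assuming each pixel's centre is known as a 3D position on its polygon and that the polygon is a triangle. The nearest point is found by brute force for clarity; a spatial index such as a k-d tree would normally be used for large point clouds. All function names are illustrative.

```python
import math

def plane_from_triangle(p0, p1, p2):
    """Return (origin, unit normal) of the plane through three vertices."""
    u = [p1[i] - p0[i] for i in range(3)]
    v = [p2[i] - p0[i] for i in range(3)]
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    length = math.sqrt(sum(c * c for c in n))
    return p0, [c / length for c in n]

def signed_height(point, plane):
    """Distance of a point from the plane, measured along the plane normal."""
    origin, normal = plane
    return sum((point[i] - origin[i]) * normal[i] for i in range(3))

def nearest_point_height(pixel_xyz, points, plane):
    """Height above the plane of the data point closest to the pixel centre."""
    nearest = min(points, key=lambda p: math.dist(p, pixel_xyz))
    return signed_height(nearest, plane)
```

The sign of the result distinguishes protrusions (positive) from recesses (negative) relative to the polygon.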
In another embodiment, at block 150, for each pixel, the closest n data points to the given pixel are identified. Then, at block 152, the average height of the closest n data points measured above the polygonal plane(s) is determined.
In another embodiment, at block 154, for each pixel in the image, the data points within distance or range x of the given pixel are identified. Then, at block 156, the average height of the data points (within the distance x) is determined.
It can be appreciated that there are various ways of calculating the height attribute that is to be associated with a pixel. The determined height is then associated with the given pixel (block 158). From the process, the output 160 of the image of the object is generated, whereby each pixel in the image has an associated height value.
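The averaging embodiments of blocks 150 to 156 differ only in which neighbouring points contribute to a pixel's height. The sketch below assumes a plane given by an origin and a unit normal, pixel centres given as 3D positions keyed by (row, column), and brute-force neighbour search; returning `None` for a pixel with no points in range is a design choice of this sketch, not something the description specifies.

```python
import math

def signed_height(point, origin, normal):
    # Distance along the (unit) plane normal: positive above, negative below.
    return sum((p - o) * n for p, o, n in zip(point, origin, normal))

def mean_height_n_nearest(pixel_xyz, points, origin, normal, n):
    """Blocks 150-152: average height of the n points closest to the pixel."""
    closest = sorted(points, key=lambda p: math.dist(p, pixel_xyz))[:n]
    return sum(signed_height(p, origin, normal) for p in closest) / len(closest)

def mean_height_within(pixel_xyz, points, origin, normal, x):
    """Blocks 154-156: average height of the points within distance x.

    Returns None when no point lies within range, leaving the pixel unmapped.
    """
    near = [p for p in points if math.dist(p, pixel_xyz) <= x]
    if not near:
        return None
    return sum(signed_height(p, origin, normal) for p in near) / len(near)

def build_height_map(pixel_centres, points, origin, normal, n):
    """Block 158: associate a height with every pixel (n-nearest rule here)."""
    return {pix: mean_height_n_nearest(xyz, points, origin, normal, n)
            for pix, xyz in pixel_centres.items()}
```

Averaging several points smooths out sensor noise at the cost of blurring sharp edges; the choice of `n` or `x` controls that trade-off.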
A similar process can be applied to map other attributes of the data points in the point cloud. For example, in addition to mapping the height of a point above a surface, other attributes, such as color, intensity, the number of reflections, etc., can also be associated with pixels in an image.
Turning to
At block 174, for each pixel, the closest data point to the given pixel is identified. At block 176, the color value (e.g. the RGB value) of the closest data point is identified and then associated with the given pixel (block 178). The output 180 from the process is an image of the object, whereby each pixel in the image is associated with a color value (e.g. RGB value).
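Blocks 174 to 178 follow the same nearest-point pattern as the height mapping, but copy an attribute instead of computing a distance. A minimal sketch, assuming pixel centres are given as 3D positions keyed by (row, column) and each data point carries an (r, g, b) tuple; the search is brute force for readability.

```python
import math

def color_map(pixel_centres, colored_points):
    """Give each pixel the RGB value of its nearest data point.

    pixel_centres: {(row, col): (x, y, z)}
    colored_points: list of ((x, y, z), (r, g, b))
    """
    result = {}
    for pix, xyz in pixel_centres.items():
        _, rgb = min(colored_points, key=lambda item: math.dist(item[0], xyz))
        result[pix] = rgb
    return result
```

Swapping the `(r, g, b)` tuple for intensity or return count gives the other attribute mappings mentioned earlier.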
It is appreciated that the images with height mapping or color mapping, or both, can be compressed using known wavelet-based compression methods to allow for multi-resolution extraction of the data. Other compression methods may support multi-resolution extraction of the data.
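The multi-resolution property that makes wavelet methods attractive here can be shown with a plain 2-D Haar decomposition. This sketch covers only the transform: a real codec would also quantize and entropy-code the detail bands, which is omitted. It assumes image dimensions divisible by 2 to the power of the number of levels and uses the averaging (unnormalized) Haar variant.

```python
def haar_decompose(img, levels):
    """Multi-level 2-D Haar decomposition (averaging variant).

    img: list of rows whose height and width are divisible by 2**levels.
    Returns the coarsest approximation and the detail bands (finest first).
    """
    bands = []
    for _ in range(levels):
        approx, detail = [], []
        for r in range(0, len(img), 2):
            arow, drow = [], []
            for col in range(0, len(img[0]), 2):
                a, b = img[r][col], img[r][col + 1]
                c, d = img[r + 1][col], img[r + 1][col + 1]
                arow.append((a + b + c + d) / 4)  # 2x2 block average
                drow.append(((a - b + c - d) / 4,   # horizontal detail
                             (a + b - c - d) / 4,   # vertical detail
                             (a - b - c + d) / 4))  # diagonal detail
            approx.append(arow)
            detail.append(drow)
        bands.append(detail)
        img = approx
    return img, bands

def haar_reconstruct(approx, bands, levels_to_apply):
    """Rebuild an image using only the coarsest `levels_to_apply` detail bands.

    0 returns the coarse preview; len(bands) restores full resolution.
    """
    img = approx
    for detail in reversed(bands[len(bands) - levels_to_apply:]):
        up = []
        for arow, drow in zip(img, detail):
            top, bottom = [], []
            for ll, (lh, hl, hh) in zip(arow, drow):
                top += [ll + lh + hl + hh, ll - lh + hl - hh]
                bottom += [ll + lh - hl - hh, ll - lh - hl + hh]
            up += [top, bottom]
        img = up
    return img
```

A viewer can therefore start from the tiny approximation and refine it level by level as more detail bands arrive or are needed.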
The compressed image files can be reconstructed. At a first stage, different types of data are gathered. In particular, the compressed image files for the height maps and the surface color maps, the approximate model which references these maps, and any surface classification parameters are transmitted to the rendering module or processor (not shown).
At a second stage, based on the view distance and angle (e.g. zoom views, side view, etc.), the images are extracted to an appropriate resolution. This, for example, is done using wavelet-based extraction. This extraction can change as the view zooms to maintain visually appealing detail.
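How many resolution levels to extract for a given view is left open above. The sketch below uses a hypothetical rule based on view distance only (view angle is ignored): full detail within a reference distance, and one wavelet level fewer for each doubling of the distance beyond it. The function and parameter names are assumptions for illustration.

```python
import math

def detail_levels_for_view(distance, full_detail_distance, max_levels):
    """Choose how many wavelet detail levels to reconstruct for a view.

    Hypothetical rule: all max_levels within full_detail_distance, then one
    level fewer for each doubling of the view distance beyond it.
    """
    if distance <= full_detail_distance:
        return max_levels
    dropped = math.floor(math.log2(distance / full_detail_distance))
    return max(0, max_levels - dropped)
```

Re-evaluating this as the user zooms lets the extracted resolution track the view, so distant objects never request more data than the screen can show.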
At a third stage, the height maps, color maps, and/or parametric surface material textures are passed to a pixel shader based rendering algorithm through use of texture memory. A pixel shader can be considered a software application that can operate on individual pixels of an image in a parallel manner, through a graphics processing unit, to produce rendering effects. Texture memory is considered dedicated fast access memory for a GPU to use. In other words, the pixel shader, using the texture memory, is able to store data in high speed memory and use a special pixel-processing program to render the building model to provide detail that is visible to the eye.
At a fourth stage, the per-pixel, light-based height map and RGB texturing are used to render the approximate model. User interaction or inputs may retrieve height information by reversing the texture interpolation to recover texel values (i.e. values of texture elements) from the height map, for precision measurement or to provide haptic feedback of the surface texture. The compression and decompression described above can be used to generate real-time rendering of the images. In one embodiment, real-time rendering can be performed in the GPU by setting up the parameters for geometry transformation and then invoking the rendering commands (e.g. for the pixel shader).
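The per-pixel lighting described above normally runs in a pixel shader on the GPU. To show the arithmetic without a graphics API, the sketch below evaluates the same idea on the CPU: a normal is derived from the height map by finite differences and shaded with a simple Lambertian model (an assumed lighting model). A bilinear lookup is also included as one way to read a height back at a picked texture coordinate for measurement. All names are illustrative.

```python
import math

def _slope(values, i):
    """Finite-difference slope at index i (central inside, one-sided at borders)."""
    lo, hi = max(i - 1, 0), min(i + 1, len(values) - 1)
    return (values[hi] - values[lo]) / (hi - lo) if hi > lo else 0.0

def shade_height_map(height, light_dir, texel_size=1.0):
    """Per-pixel Lambertian shading of a flat polygon using its height map.

    A pixel shader would evaluate the loop body for all pixels in parallel;
    here it is a plain CPU loop. light_dir points towards the light.
    """
    rows, cols = len(height), len(height[0])
    norm = math.sqrt(sum(c * c for c in light_dir))
    lx, ly, lz = (c / norm for c in light_dir)
    shaded = []
    for r in range(rows):
        row_out = []
        for c in range(cols):
            dzdx = _slope(height[r], c) / texel_size
            dzdy = _slope([height[k][c] for k in range(rows)], r) / texel_size
            # Normal of the surface z = h(x, y) is proportional to (-dz/dx, -dz/dy, 1).
            nx, ny, nz = -dzdx, -dzdy, 1.0
            length = math.sqrt(nx * nx + ny * ny + nz * nz)
            row_out.append(max(0.0, (nx * lx + ny * ly + nz * lz) / length))
        shaded.append(row_out)
    return shaded

def sample_height(height, u, v):
    """Bilinearly interpolated height at fractional texel coords (u = column, v = row)."""
    c0, r0 = int(math.floor(u)), int(math.floor(v))
    c1, r1 = min(c0 + 1, len(height[0]) - 1), min(r0 + 1, len(height) - 1)
    fu, fv = u - c0, v - r0
    top = height[r0][c0] * (1 - fu) + height[r0][c1] * fu
    bottom = height[r1][c0] * (1 - fu) + height[r1][c1] * fu
    return top * (1 - fv) + bottom * fv
```

Because the polygon itself stays flat, only the texture data grows with detail; the geometry sent to the device remains the reduced model.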
The height mapping and the color mapping can also be applied to determine or classify the materials of objects. Generally, based on the color of a surface, the height or texture of surface, and the type of object, the type of material can be determined. For example, if the object is known to be a wall that is red and bumpy, then it can be inferred or classified that the wall material is brick.
Turning to
At block 188, in the image of the object, the height properties (e.g. if there is a height or bump mapping) or the color properties (e.g. if there is a color mapping), or both, are identified for the object. In other words, it is determined whether there are any bumps or depressions in the object, or what the color patterns on the object are. At block 190, based on at least the type of object, the computing device 20 selects an appropriate material classification algorithm from a material classification database (not shown). The material classification database contains different classification algorithms, some of which are more suited for certain types of objects. At block 192, the selected classification algorithm is applied. The classification algorithm takes into account the color mapping or height mapping, or both, to determine the material of the object. At block 194, the determined material classification is associated with the object.
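The selection step at blocks 190 to 194 amounts to a lookup keyed by object type. A minimal sketch, where a dictionary stands in for the material classification database and the only registered classifier encodes the "red and bumpy wall is brick" example given earlier; the feature flags, dictionary layout and names are hypothetical.

```python
from typing import Callable, Dict, Optional

# Hypothetical classifier signature: receives feature flags derived from the
# object's color and height maps and returns a material name.
Classifier = Callable[[dict], str]

def classify_wall(features: dict) -> str:
    # Example rule from the description: a red, bumpy wall is brick.
    return "brick" if features.get("red") and features.get("bumpy") else "unknown"

# Stand-in for the material classification database, keyed by object type.
MATERIAL_CLASSIFIERS: Dict[str, Classifier] = {
    "wall": classify_wall,
}

def classify_material(obj: dict) -> Optional[str]:
    """Blocks 190-194: select a classifier by type, apply it, store the result."""
    classifier = MATERIAL_CLASSIFIERS.get(obj["type"])
    if classifier is None:
        return None  # no classifier registered for this object type
    obj["material"] = classifier(obj["features"])
    return obj["material"]
```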
In general, it is recognized that the color mapping, or height mapping, or both can be used to classify the material of the object. Further, once the material is classified (e.g. brick material for a wall surface), then the object can be displayed having that material.
An example of material classification for wall and roof surfaces is provided in
At block 204, it is determined if the surface is a wall or a roof. If the surface is a wall, then at block 206, if the image has color mapping, it is determined whether there are straight and parallel lines that are approximately parallel to the ground. If not, at block 208, the wall surface material is classified as stucco. If there are straight and parallel lines, then at block 210, it is determined if there are segments of straight lines that are perpendicular to the parallel lines. If not, in other words if there are only straight parallel lines on the wall, then the wall surface material is classified as siding (block 212). If there are segments of straight and perpendicular lines, then at block 214, the wall surface is classified as stone or brick material.
In addition, or in the alternative, if the image has height mapping as well, then at block 216 it is determined if there are rectangular-shaped depressions or elevations in the wall. If not, no action is taken (block 218). If so, then at block 220, the rectangular-shaped depressions or elevations are outlined, and the surface within the outlines is classified as windows.
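The wall rules of blocks 206 to 220 form a small decision tree. The sketch below encodes them over pre-computed feature flags; the line detection and the outlining of rectangular depressions or elevations are assumed to have been performed elsewhere and are not shown. Parameter names are hypothetical.

```python
def classify_wall_surface(has_color_map, horizontal_parallel_lines,
                          perpendicular_segments, has_height_map=False,
                          rectangular_regions=()):
    """Blocks 206-220 as a decision function over pre-computed features.

    rectangular_regions: outlines of rectangular depressions or elevations
    found in the height map. Returns (material or None, window outlines).
    """
    material = None
    if has_color_map:
        if not horizontal_parallel_lines:
            material = "stucco"            # block 208
        elif not perpendicular_segments:
            material = "siding"            # block 212
        else:
            material = "stone or brick"    # block 214
    # Blocks 216-220: rectangular regions in the height map become windows.
    windows = list(rectangular_regions) if has_height_map else []
    return material, windows
```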
If, from block 204, the surface of the object relates to a roof, then the process continues to
Continuing with
If there are straight and parallel lines, at block 224, it is determined if there are segments of straight lines that are perpendicular to the parallel lines. If not, at block 228, the roof surface material is classified as tiling. If there are straight and perpendicular line segments, then the roof material is classified as shingles (block 226).
In addition, or in the alternative, if the image has a height or bump mapping, then at block 236 it is determined if the height variance for the image is lower than a given threshold x. If the height variance for the roofing surface is below x, then the roof surface is classified as one of shingles, asphalt or gravel (block 238). Otherwise, the roof surface is classified as tiling (block 238).
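The roof rules for blocks 224 to 238 can be sketched in the same style. Because the branch taken when no parallel lines are found is not described in this passage, the sketch simply adds no color-based candidate in that case. The population variance from Python's `statistics` module stands in for "height variance"; the inputs and names are assumptions.

```python
from statistics import pvariance

def classify_roof_surface(heights, variance_threshold,
                          parallel_lines=None, perpendicular_segments=False):
    """Roof rules of blocks 224-238 over pre-computed inputs.

    heights: per-pixel heights from the height map (empty if none).
    parallel_lines: True/False if the color map was analysed, else None.
    Returns the candidate materials produced by whichever rules applied.
    """
    candidates = []
    if parallel_lines:
        # Blocks 224-228: perpendicular segments suggest shingles, else tiling.
        candidates.append("shingles" if perpendicular_segments else "tiling")
    if heights:
        # Blocks 236-238: a smooth roof (low height variance) versus tiling.
        if pvariance(heights) < variance_threshold:
            candidates.append("shingles, asphalt or gravel")
        else:
            candidates.append("tiling")
    return candidates
```

When both color and height mappings are available, agreement between the candidates can increase confidence in the classification.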
The above algorithms are examples only, and other variations, alternatives, additions, etc. for classifying materials based on color mapping or height mapping, or both, are applicable to the principles described herein.
Other example classification methodologies include the use of geometric parameters. As discussed above, the angle of an object's geometry relative to a ground surface can be used to determine the type of object and, furthermore, the type of material. Objects on the same plane as the ground (e.g. a road) can be determined based on known parameters (e.g. feature extraction). The object's recognized features can also be compared with known materials.
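As an illustration of using geometry relative to the ground, the sketch below classifies a planar surface from the tilt of its normal away from vertical and from its elevation. The categories follow the wall, roof and road examples in this description, but the threshold values and parameter names are illustrative assumptions, not values from the source.

```python
import math

def surface_type(normal, height_above_ground, wall_min_tilt_deg=80.0,
                 ground_max_height=0.5):
    """Guess a surface category from its orientation and elevation.

    tilt is the angle between the surface normal and the vertical axis.
    Threshold defaults are illustrative assumptions.
    """
    nx, ny, nz = normal
    cos_tilt = min(1.0, abs(nz) / math.sqrt(nx * nx + ny * ny + nz * nz))
    tilt = math.degrees(math.acos(cos_tilt))
    if tilt >= wall_min_tilt_deg:
        return "wall"        # near-vertical surface
    if height_above_ground <= ground_max_height:
        return "ground"      # e.g. a road lying in the ground plane
    return "roof"            # raised, flat or pitched surface
```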
Other classification approaches include using color patterns or image patterns from the image. In particular, regular patterns (e.g. bricks, wood) can be identified based on a set of pixels and a known set of possibilities. Road stripes and airfield markings can also be identified based on their pattern. A window can be identified based on reflections and their contrast. Lights can also be identified by their contrast to surroundings. Crops, land coverings, and bodies of water can be identified by color.
Occluded information can also be synthesized or reproduced using classification techniques, based on the height mapping and color mapping. For example, when an environment containing a wall and a tree (in front of the wall) is interrogated using LiDAR from only one angle, a 2D image may give the perception that the tree is pasted on the wall. In other words, the tree may appear to be a picture on a wall, rather than an object in front of the wall. An image with a height mapping would readily show the tree as a protrusion relative to the wall surface. Therefore, if only the wall is to be displayed, then any protrusions relative to the wall surface (based on the height mapping) can be removed. Removal of the tree also produces visual artifacts, whereby the absence of the tree leaves a void (e.g. no data) in the image of the wall. This void can be synthesized by applying the same color pattern as the wall's color mapping. Alternatively, if the wall has been given a certain material classification, and if a known pattern is associated with that material classification, then the known pattern can be used to “fill” the void. Naturally, the pattern would be scaled to correspond with the proportions of the wall when filling the void. These approaches for artifacts can also be applied to top-down views of cars on a roof.
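The protrusion removal and void filling described above can be sketched on a per-pixel grid. Pixels whose height above the wall plane exceeds a threshold are treated as occluders; their height is reset and their color is replaced from a repeating pattern tile aligned to the wall's pixel grid. The sketch assumes the tile has already been scaled to the wall's proportions; the threshold and names are hypothetical.

```python
def remove_protrusions(height_map, color_map, max_height, fill_pattern):
    """Remove pixels that stand out from the wall and refill the void.

    height_map / color_map: equally sized lists of rows.
    fill_pattern: a small tile (list of rows of colors), assumed to be
    pre-scaled to the wall and repeated across it.
    Returns new (heights, colors); the inputs are left unmodified.
    """
    ph, pw = len(fill_pattern), len(fill_pattern[0])
    heights = [row[:] for row in height_map]
    colors = [row[:] for row in color_map]
    for r, row in enumerate(height_map):
        for c, h in enumerate(row):
            if h > max_height:  # e.g. a tree standing in front of the wall
                colors[r][c] = fill_pattern[r % ph][c % pw]
                heights[r][c] = 0.0  # back onto the wall plane
    return heights, colors
```

Indexing the tile by pixel position, rather than starting it at the void, keeps the synthesized pattern continuous with the surrounding wall.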
Other classification methods can use different inputs, such as the signal strength of return associated with points in a point cloud, and IR or other imagery spectrums.
The applications for the above classification methods include allowing the detailed display of objects without the need for a detailed RGB or bump map for an approximate model. The surfaces of the object could be more easily displayed by draping the surfaces with the patterns and textures that correspond to the object's materials. For example, instead of showing a brick wall composed of a height mapping and a color mapping, a brick pattern can be laid over the wall surface to show a similar effect. This would involve: encoding surfaces with a material classification code; potentially encoding a color (or transparency or opacity level) so the surface can be accurately rendered; and encoding parametric information (such as the scale or frequency of a brick pattern or road markings).
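The three encoded items listed above (material code, color with opacity, and a pattern parameter) fit in a very small record. The sketch below packs them into a fixed 9-byte layout with Python's `struct` module; the material code table, the byte layout and the class name are all assumptions for illustration, as the description does not define a format.

```python
import struct
from dataclasses import dataclass

# Hypothetical material code table; the description does not define one.
MATERIAL_CODES = {"brick": 1, "stucco": 2, "siding": 3, "asphalt": 4}

@dataclass
class SurfaceMaterial:
    material: str
    rgba: tuple           # base color plus opacity, each 0-255
    pattern_scale: float  # e.g. brick course height, in model units

    # Assumed little-endian layout: code, R, G, B, A (1 byte each), scale (float32).
    _FORMAT = "<BBBBBf"

    def encode(self) -> bytes:
        return struct.pack(self._FORMAT, MATERIAL_CODES[self.material],
                           *self.rgba, self.pattern_scale)

    @classmethod
    def decode(cls, data: bytes) -> "SurfaceMaterial":
        code, r, g, b, a, scale = struct.unpack(cls._FORMAT, data)
        names = {v: k for k, v in MATERIAL_CODES.items()}
        return cls(names[code], (r, g, b, a), scale)
```

A surface described this way costs a few bytes per polygon instead of a full-resolution height map and color map, at the price of showing a generic pattern rather than the measured surface.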
The rendering process can use classification information to create more realistic renderings of the objects. For example, lighting can be varied based on modeling the material's interaction with lighting in a pixel shader. Material classification can also be used in conjunction with haptic effects for a touch UI. Material classification can also be used for 3D search parameters, estimation, emergency response, etc. Material classification can also be used to predict what sensor images of a feature might look like. This can be used for active surveillance, real time sensor 3D search, etc.
In another aspect of the systems and methods described herein, the display of the data is interactive. A user, for example, may want to view a 3D model of one or more objects from different perspectives. The user may also want to extract different types of information from the model. As can be understood from above, a large amount and variety of spatial data is available. However, displaying the data in a convenient and interactive manner can be difficult. The difficulties of relaying the spatial data to a user are also recognized when displaying data on a 2D display screen, or on a computing device with limited computing resources (e.g. mobile devices). Typically, user interface systems that are natively designed for 2D screens are not suitable for the display of rich spatial data.
A 3D UI is provided to address some of these difficulties. A 3D UI is a user interface that can present objects using a 3D or perspective view. UI objects typically fall into three categories. In a first category, there are items intended for ‘control’ of the computer application, such as push buttons, menus, drag regions, etc. In a second category, there are items intended for data display, such as readouts, plots, dynamic moving objects, etc. In a third category, there are 3D items, typically objects representing a 3D rendering of a model or other object. The 3D models or objects, as described earlier, may be generated or extracted from point cloud data that, for example, has been gathered through LiDAR.
A 3D UI is composed of 3D objects and provides a user interface to a computer application. 3D objects or models do not need to necessarily look 3D to a user. In other words, 3D objects may look 2D, since they are typically displayed on a 2D screen. However, whether the resulting images (of the 3D objects) are 2D or 3D, the generating of the images involves the use of 3D rendering for display.
In one aspect, a 3D UI system is provided to allow haptic feedback (e.g. tactile or force feedback) to be integrated with the display of 3D objects. This allows 3D spatial information, including depth, to be a part of the user experience. In another aspect, a 3D UI is provided for mapping typical 2D widget constructs into a 3D system, allowing more powerful UIs to be constructed and used in a natively 3D environment. For example, 2D widgets (e.g. a drop box, a clipped edit window, etc.) can be displayed on 2D planes in a 3D scene. In another aspect, the 3D UI allows ‘smart’ 3D models that contain interactive elements. For example, a 3D building model can be displayed with interactive UI widgets encoded within it. The UI widgets allow a user to manipulate or extract information from the building model. The 3D UI can operate in various environments, such as different classes of OpenGL-based devices, OpenGL Web clients, etc.
In another aspect, the above 3D UI approaches may be integrated into a software library to manage the creation and display of these functions. Thus, the 3D UIs may be more easily displayed on different types of devices. The above 3D UI approaches also enable future applications on less typical displays, such as head mounted displays, 3D projectors, or other future display technologies.
In yet another aspect, the 3D UI provides navigation tools allowing the point of view of a 3D model to be manipulated relative to points or objects of interest.
Turning to
Based on the above, the outputs from the model convertor module 246 include geometric objects (e.g. definitions, instances (copies)); logic objects related to the dynamic display of data, interactive display panels, and haptics; and texture objects. These outputs may be stored in the processed 3D models and UI database 250.
Turning to
Continuing with
As can be seen from
Turning to
The display 272 shows an image of a building 292 beside a road 300. It can be appreciated that the images of the building 292 and road 300 are generated or derived from a 3D model of point cloud data. In other words, the three-dimensional shapes of the building 292 and the road 300 are known. The building 292 includes a roof 294, which in this case is tiled. Adjacent to the roof 294 is one of the building's walls 296. Located on the wall 296 are several protruding vents 298. As described earlier with respect to
Based on the position of the pointer 304 on the display 272, a haptic response is accordingly produced. In particular, the position of the pointer 304 on the display 272, represents a position on the image of the building 292 being displayed. The position on the image of the building 292 corresponds with a position on the surface of the 3D model of the building 292. Therefore, as the pointer moves across the display 272, it is also considered to be moving along the surface of a 3D model of the building 292.
It can be appreciated that the 3D UI software engine module 266 coordinates the user input for pointing or directing the position of the pointer 304 with the 3D GPU module 268. The 3D GPU module 268 then integrates the 3D model of the building 292, the position of the pointer 304, and the appropriate haptic response 290. The result is that the user can “feel” the features of the building 292, such as the corners, edges, and textured surfaces, through the haptic response 290.
Continuing with
In another example, if the position of the pointer 304 on the display 272 were to move from the wall 296 to the adjacent roof 294, then the pointer 304 would cross over the roof's edge defined by the wall 296 and the roof 294. The edge would also be represented in the 3D model of the building 292 and would be defined by the surface of the wall 296 in one plane and the surface of the roof 294 in another plane (e.g. in a plane perpendicular to the wall's plane). The pixels on the display 272 representing the edge would then be associated with a haptic response, so that when the pointer 304 moves over the edge, the 3D GPU would detect the edge and provide a haptic response. In an example embodiment, the haptic response would be a short and intense vibration to tactilely represent the sudden change in orientation between the planes of the wall 296 and the roof 294.
In another example, the material or texture classification (e.g. based on color mapping and height mapping) and the height map associated with a polygon surface on the building model can also be tactilely represented. When the pointer 304 moves over a bumpy surface, the computing device 258 will provide a haptic response (e.g. intermittent vibrations).
In a specific example shown in
In another example, also shown in
Continuing with
At block 332, that is, if the pointer 304 moves along the same polygon, it is further determined if the position of the pointer 304 in the 3D model changes in depth. In other words, it is determined if the pointer 304 is moving closer to or further away from the point of view of the 3D model as shown on the display 272. If so, at block 334, a haptic response is activated. The haptic response may vary depending on whether the pointer 304 is moving closer or further away, and on the rate at which the depth is changing. If the depth is not changing along the polygon, then no action is taken (block 336).
At block 338, it is determined if there is a height map associated with the polygon. If not, no action is taken (block 344). If so, it is then determined if the pointer 304 is moving over a pixel that is raised or lowered relative to the polygon surface. If the pointer 304 is detected moving over such a pixel, then a haptic response is activated (block 342). The haptic response can vary depending on the height value of the pixel. If no height value or difference is detected, then no action is taken (block 344).
If the pointer 304 is moving along, or within, the same polygon, then the computing device 258 may also determine if there is a material classification associated with the polygon (block 346). If so, and the material is detected to be textured, a haptic response representing the texture of the material is generated at block 348. If there is no material classification, no action is taken (block 350).
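The per-step decision described in blocks 332 to 350, together with the edge response described earlier for the wall and roof, can be sketched as a single function. This is a minimal illustration under stated assumptions: the `Polygon` and `PointerSample` types, the set of "textured" materials, and the string labels standing in for actual actuator commands are all hypothetical, not part of the described system.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical set of material classes treated as "textured".
TEXTURED_MATERIALS = {"tiles", "brick", "gravel"}

@dataclass
class Polygon:
    poly_id: int
    height_map: Optional[Dict[Tuple[int, int], float]] = None  # pixel -> raised/lowered offset
    material: Optional[str] = None                             # None = no material classification

@dataclass
class PointerSample:
    pixel: Tuple[int, int]   # pointer position on the display
    polygon: Polygon         # model polygon under the pointer
    depth: float             # distance from the viewpoint to the surface point

def haptic_responses(prev: PointerSample, cur: PointerSample) -> List[str]:
    """Return hypothetical labels for the haptic responses triggered by one pointer step."""
    if cur.polygon.poly_id != prev.polygon.poly_id:
        # Crossing onto another polygon means crossing an edge (e.g. wall to roof).
        return ["edge: short intense vibration"]
    responses: List[str] = []
    # Blocks 332-336: depth change along the same polygon.
    d_depth = cur.depth - prev.depth
    if d_depth != 0.0:
        direction = "away" if d_depth > 0 else "closer"
        responses.append(f"depth {direction}: {abs(d_depth):.2f} per step")
    # Blocks 338-344: raised or lowered pixel in the polygon's height map.
    if cur.polygon.height_map is not None:
        height = cur.polygon.height_map.get(cur.pixel, 0.0)
        if height != 0.0:
            responses.append(f"height {height:+.2f}")
    # Blocks 346-350: textured material classification.
    if cur.polygon.material in TEXTURED_MATERIALS:
        responses.append("texture: intermittent vibration")
    return responses
```

Returning a list lets the depth, height and texture checks each contribute independently when the pointer stays on one polygon, matching the parallel branches of the flow described above.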
Continuing with
In another aspect of the user interface, traditional two-dimensional planes may be displayed as windows in a 3D environment. This operation is generally referred to as windowing, which enables a computer to display several programs at the same time, each running in its own window. Typically, although not necessarily, a window is a rectangular area of the screen where data or information is displayed in 2D. Furthermore, the data or information in a window is displayed within the boundary of the window but not outside it (also called clipping). Further, data or information in a window is occluded by other windows on top of it, for example when windows overlap according to their Z-order (i.e. the order of objects along the z axis). Data or information within a window can also be resized by zooming in or out, while the window size remains the same. In many cases, the data or information within the window is interactive, allowing a user to interact with logical buttons or menus within the window. A well-known example of a windowing system is Microsoft Windows™, which allows one or more windows to be shown. As described above, windows are considered to be a 2D representation of information, which makes displaying this 2D data in a 3D environment difficult.
The desired effect is to present a 2D window so it visually appears on a 3D plane within a 3D scene or environment. A typical approach is to render the window content to a 2D pixel buffer, which is then used as a texture map within the Graphics Processing Unit (GPU) to present the window in a scene. In particular, the clipping of data or information is done through 2D rectangles in a pixel buffer. Further, the Z-order and the resizing of information or data in the window are also computed within the frame of reference of the 2D pixel buffer. The interactive pointer location is also typically computed by projecting a 3D location onto the 2D pixel buffer. These typical approaches, which map 2D content as a texture map in 3D, can slow down processing due to the number of operations, and can also limit other capabilities characteristic of 3D graphics. Use of a 2D pixel buffer is considered an indirect approach and requires more processing resources due to the additional frame buffer used for rendering. It also requires ‘context switching’: the GPU has to interrupt its current 3D state to draw the 2D content and then switch back to the 3D state or context. The indirect approach also requires more pixel processing, because the pixels are filled once for 3D and then again when the textured surface is drawn.
By contrast, the present 3D user interface (UI) windowing mechanism, as described further below, directly renders the widgets from a 2D window into a 3D scene without the use of a 2D pixel buffer. The present 3D UI windowing mechanism uses the concept of a 3D scene graph, whereby each widget, although originally 2D, is considered a 3D object. Matrix transformations are used so the GPU interprets the 2D points or 2D widgets directly in a 3D context. This, for example, is similar to looking at a 2D business card from an oblique angle. Matrix commands are passed to the GPU to achieve the 3D rendering effect.
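The matrix idea can be illustrated with a single 4x4 transform that places window-local 2D widget coordinates directly onto a plane in the 3D scene, with no intermediate pixel buffer. This is a sketch under assumptions: the window plane is described by a hypothetical origin and two axis vectors, and in a real renderer the matrix would be handed to the GPU as part of the model transform rather than applied on the CPU as done here for illustration.

```python
import numpy as np

def widget_to_scene_matrix(origin, u_axis, v_axis):
    """4x4 matrix mapping window-local (x, y, 0, 1) points into the 3D scene.

    origin: 3D position of the window's (0, 0) corner.
    u_axis, v_axis: 3D vectors for one unit along the window's x and y axes.
    """
    o, u, v = (np.asarray(a, dtype=float) for a in (origin, u_axis, v_axis))
    n = np.cross(u, v)
    n = n / np.linalg.norm(n)   # window normal fills the third column (keeps the matrix invertible)
    m = np.identity(4)
    m[:3, 0] = u
    m[:3, 1] = v
    m[:3, 2] = n
    m[:3, 3] = o
    return m

def transform_points(m, points_2d):
    """Apply the matrix to 2D widget vertices, returning 3D scene positions."""
    pts = np.asarray(points_2d, dtype=float)
    homo = np.column_stack([pts, np.zeros(len(pts)), np.ones(len(pts))])
    return (homo @ m.T)[:, :3]
```

Like viewing a business card at an oblique angle, the widget keeps its 2D layout while the matrix decides where, and at what tilt, it appears in the scene.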
Turning to
The window 360 is defined by a series of vertices 361, 362, 363, 364 that are used to define a plane. In this case, there are four vertices to represent the four corners of a rectangle or trapezoid. Lines 365, 366, 367, 368 connect the vertices 361, 362, 363, 364, whereby the lines 365, 366, 367, 368 define the boundary of the window 360. Four clipping planes 373, 374, 375, 376 are formed as a border to the window 360. The clipping planes 373, 374, 375, 376 protrude from the boundary lines 365, 366, 367, 368.
In particular, to form the clipping planes, at each vertex the cross product of the boundary lines intersecting that corner is calculated to determine a normal vector. For example, at vertex 362, the cross product of the two vectors defined by lines 366 and 367 is computed to determine the normal vector 370. In a similar manner, the vectors 371, 372, and 369 are computed. These four vectors 369, 370, 371, 372 are normal to the plane of the window 360. A clipping plane, for example clipping plane 375, can be computed from the normal vector 370 and the line 367, both of which pass through vertex 362. In this way, the plane equation of the clipping plane 375 can be calculated.
Turning to
At block 382, four vertices comprising x, y, z coordinates are received. These vertices (e.g. vertices 361, 362, 363, 364) define the corners of a rectangular or trapezoidal window, which is a plane in 3D space. It can be appreciated that other shapes can be used to define the window 360, whereby the number of vertices will vary accordingly.
At block 384, the lines (e.g. lines 365, 366, 367, 368) defining the window boundary are computed from the four vertices using line geometry. At block 386, a vector normal to the window's plane is computed at each vertex by taking the vector cross product of the boundary lines intersecting that vertex. This results in four vectors (e.g. vectors 369, 370, 371, 372), one at each corner, normal to the window plane. At block 400, for each boundary line, a clipping plane is computed, defined by the vector of the boundary line and at least one normal vector at a vertex lying on that boundary line. This results in four clipping planes, one intersecting each of the boundary lines. At block 402, the “3D” objects are displayed in the window plane.
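Blocks 384 to 400 can be sketched as follows, assuming the window is convex and its corners are given in order. Each clipping plane is returned as a hypothetical `(normal, d)` pair in the form n·p + d = 0, with "inside" meaning n·p + d ≥ 0; that representation is an assumption of this sketch, not something the method prescribes.

```python
import numpy as np

def window_clip_planes(vertices):
    """Return (normal, d) for each boundary clipping plane; inside means n.p + d >= 0.

    vertices: corners of a convex window, in order (e.g. 361, 362, 363, 364).
    """
    v = [np.asarray(p, dtype=float) for p in vertices]
    k = len(v)
    # Block 384: boundary lines as direction vectors from vertex i to vertex i+1.
    edges = [v[(i + 1) % k] - v[i] for i in range(k)]
    # Block 386: normal to the window plane at each vertex, from the cross product
    # of the two boundary lines meeting there (outgoing edge x reversed incoming edge).
    corner_normals = [np.cross(edges[i], -edges[i - 1]) for i in range(k)]
    planes = []
    # Block 400: each clipping plane contains its boundary line and the corner normal,
    # so its own normal is perpendicular to both. With this ordering of the cross
    # products the normal points into the window for either winding direction.
    for i in range(k):
        n = np.cross(corner_normals[i], edges[i])
        n /= np.linalg.norm(n)
        planes.append((n, -float(n @ v[i])))
    return planes

def inside_window(planes, point, eps=1e-9):
    """True if the point lies on the inner side of every clipping plane."""
    p = np.asarray(point, dtype=float)
    return all(float(n @ p) + d >= -eps for n, d in planes)
```

Because the clipping planes stand perpendicular to the window, a point in front of or behind the window plane is still classified by whether it projects inside the window outline.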
The objects (e.g. buttons, panels in the calendar, a pop-up reminder, etc.) are composed of fragments or triangle surfaces. Some objects, such as those at the edge of the window 360, have one or more vertices outside the window boundary. In other words, a portion of the object is outside the window 360 and needs to be clipped. Clipping means that the portion of the object outside the window is not rendered, thereby reducing processing time and operations. To clip the portion of the object outside the window 360, a boundary line is used to draw a line through the surface of the object. The triangle surfaces representing the object are recalculated so that all vertices of the object that have not been clipped remain within the window boundary. Additionally, the triangles are recalculated so that their edges are flush with the boundary lines (i.e. they do not cross over into the area outside the window). At block 406, only those triangles that are drawn completely within the window are rendered.
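One way to recalculate triangles so that their edges are flush with the window boundary is Sutherland–Hodgman polygon clipping against each clipping plane, followed by fan re-triangulation. This is a standard technique substituted here for illustration; the description above does not name a specific algorithm. Planes use the same hypothetical `(normal, d)` convention as before, with the inside satisfying n·p + d ≥ 0.

```python
import numpy as np

def clip_polygon(poly, normal, d):
    """Sutherland-Hodgman step: keep the part of a convex polygon with n.p + d >= 0."""
    out = []
    k = len(poly)
    for i in range(k):
        a, b = poly[i], poly[(i + 1) % k]
        da, db = float(normal @ a) + d, float(normal @ b) + d
        if da >= 0:
            out.append(a)                      # vertex is inside: keep it
        if da * db < 0:
            t = da / (da - db)                 # edge strictly crosses the plane
            out.append(a + t * (b - a))        # new vertex lies on the boundary
    return out

def clip_triangle(triangle, planes):
    """Clip one triangle against all window planes and re-triangulate the remainder."""
    poly = [np.asarray(p, dtype=float) for p in triangle]
    for normal, d in planes:
        poly = clip_polygon(poly, np.asarray(normal, dtype=float), d)
        if len(poly) < 3:
            return []                          # entirely outside: nothing to render
    # Fan triangulation: new edges run along the clipping planes, flush with the window.
    return [(poly[0], poly[i], poly[i + 1]) for i in range(1, len(poly) - 1)]
```

A triangle with one vertex outside a single boundary becomes a quadrilateral after clipping, which the fan step turns back into two triangles, so only geometry inside the window reaches the renderer.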
FIGS. 18(a) and 18(b) illustrate an example of the triangle recalculation. The window 410 defines boundaries, and the object 412 has crossed over the boundaries. The object 412 is represented by two triangles 414, 416, a typical approach in 3D surface rendering. The triangles 414, 416 are drawn as if there were no clipping planes. A vertex common to both triangles 414, 416 is outside the boundary of the window 410. Therefore, as per
The effects of zooming and scrolling are created using techniques similar to clipping. Appropriate matrix transformations are applied to the geometry of the objects to either change the size of the objects (e.g. zooming in or out) or move the location of the objects (e.g. scrolling). After the matrix transformations have been completed, if one or more vertices are outside the window 360, then clipping operations are performed, as described above.
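A minimal sketch of the zoom-and-scroll step, assuming the objects are expressed in the window's own 2D frame (before the plane transform places the window in the scene) and that the window spans `0..width` by `0..height`. The function names and that coordinate convention are hypothetical.

```python
import numpy as np

def zoom_scroll_matrix(zoom, scroll_x, scroll_y):
    """Homogeneous 2D transform in window-local coordinates: scale, then shift."""
    return np.array([[zoom, 0.0, -scroll_x],
                     [0.0, zoom, -scroll_y],
                     [0.0, 0.0, 1.0]])

def needs_clipping(vertices, matrix, width, height):
    """True if any transformed vertex falls outside the window rectangle."""
    pts = np.column_stack([np.asarray(vertices, dtype=float), np.ones(len(vertices))])
    xy = (pts @ matrix.T)[:, :2]
    inside = (xy[:, 0] >= 0) & (xy[:, 0] <= width) & (xy[:, 1] >= 0) & (xy[:, 1] <= height)
    return not bool(inside.all())
```

Only objects flagged by `needs_clipping` would then go through the clipping-plane step, so the window size stays fixed while its content grows, shrinks or moves.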
Turning to
At block 422, the Z-order of each object that will be displayed in the window is identified. Typically, the object with the highest Z-order number is arranged at the front, although other Z-order conventions can be used. At block 424, a virtual shape or stencil is rendered for each object. The stencil has the same outline as the object and is represented by fragments or triangles; the content of the object (e.g. colors, textures, shading, text) is not shown. At block 426, the stencils corresponding to the objects are arranged in a stencil buffer from back to front according to the Z-order. At block 428, for each stencil in the stencil buffer, the parts or fragments of the stencil that are not occluded (e.g. overlapped) are identified using the Z-ordering data and the shapes of the objects. At block 430, if required (e.g. for more accuracy), the fragments of the stencil are recalculated to more closely represent the part of the stencil that is not occluded. At block 432, for each object, pixels showing the content are rendered only for the fragments of the stencil that are not occluded. It can be appreciated that this ‘stencil’ and Z-ordering method allows 3D objects to be correctly depth buffered.
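The stencil idea can be illustrated with a CPU analogue, where a small grid plays the role of the stencil buffer and each object's outline is a hypothetical axis-aligned rectangle. The described method works on triangle fragments in the GPU stencil buffer, so this is only a sketch of the back-to-front ordering and the "unoccluded fragments" test, not of the GPU path.

```python
from typing import Dict, List, Set, Tuple

Rect = Tuple[int, int, int, int]  # hypothetical outline: x0, y0, x1, y1 (x1, y1 exclusive)

def visible_fragments(objects: Dict[str, Tuple[int, Rect]],
                      width: int, height: int) -> Dict[str, Set[Tuple[int, int]]]:
    """objects maps a name to (z_order, outline); higher Z-order is in front."""
    stencil: List[List[str]] = [[""] * width for _ in range(height)]
    # Block 426: lay the stencils down back to front, so later (higher Z) ones overwrite.
    for name, (_z, (x0, y0, x1, y1)) in sorted(objects.items(), key=lambda kv: kv[1][0]):
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                stencil[y][x] = name
    # Block 428: a cell is an unoccluded fragment of whichever object is on top there.
    visible: Dict[str, Set[Tuple[int, int]]] = {name: set() for name in objects}
    for y in range(height):
        for x in range(width):
            if stencil[y][x]:
                visible[stencil[y][x]].add((x, y))
    return visible
```

Block 432 would then render content only for the cells returned for each object, so the calendar area hidden behind a pop-up is never shaded.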
Turning to
At stage 440, a modified calendar stencil 437 is recalculated with the fragments or triangles drawn to be flush against the border of the occluded area defined by the pop-up stencil 438. As can be best seen in the exploded views 442, 446, the pop-up stencil 438 is one object and the calendar stencil 437 is another object, whereby fragments are absent in the location of the pop-up reminder. Based on the stencils, the content can now be rendered. In particular, the pop-up stencil 438 is rendered with content to produce the pop-up reminder object 444, and the modified calendar stencil 437 is rendered with content to produce the calendar object 448. It is noted that the calendar content located behind the pop-up reminder object 444 is not rendered in order to reduce processing operations. At stage 450, the pop-up reminder object 444 is shown above the calendar object 448. It can be seen that the Z-ordering method described here directly renders the objects within the window of a 3D scene and does not rely on a pixel buffer.
Turning to
At block 458, a bounding circle or bounding polygon is centered around the ray. This acts as a filter: at block 460, any objects outside the bounding circle or polygon are not considered. For objects within the bounding circle or polygon, it is determined which of the triangle surfaces intersect the ray. At block 462, the triangle intersecting the ray that is closest to the camera's point of view (e.g. the user's point of view on the display screen) is considered to be the triangle with the focus. The object associated with that triangle also has the focus. At block 464, if the object that has the focus is interactive, an action is performed upon receiving a user input associated with the pointer. It can be appreciated that the above operations apply to both windowing and non-windowing 3D UIs. However, as the objects in the 3D UI window do not have depth and are coplanar with the window, the topmost object (e.g. the object with the highest Z-order) has the input focus if it intersects the ray.
It can be seen that, by rendering the objects in a window plane as 3D objects, a 2D buffer is not required when clipping, Z-ordering, or interacting with the objects in the window.
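Blocks 458 to 462 can be sketched with the Möller–Trumbore ray/triangle test, a well-known intersection algorithm used here as a stand-in since the text does not name one. The bounding-circle filter is approximated by a per-object bounding sphere compared against a hypothetical `filter_radius` around the ray. For coplanar window objects the hit distances tie, and the Z-order rule above would be needed to break the tie; that step is not shown.

```python
import numpy as np

def ray_triangle(origin, direction, triangle, eps=1e-9):
    """Moller-Trumbore ray/triangle test; returns the hit distance t, or None."""
    v0, v1, v2 = (np.asarray(p, dtype=float) for p in triangle)
    e1, e2 = v1 - v0, v2 - v0
    p = np.cross(direction, e2)
    det = float(e1 @ p)
    if abs(det) < eps:
        return None                        # ray parallel to the triangle's plane
    inv = 1.0 / det
    s = origin - v0
    u = float(s @ p) * inv
    if u < 0.0 or u > 1.0:
        return None
    q = np.cross(s, e1)
    v = float(direction @ q) * inv
    if v < 0.0 or u + v > 1.0:
        return None
    t = float(e2 @ q) * inv
    return t if t > eps else None          # only hits in front of the camera count

def pick(origin, direction, objects, filter_radius=0.0):
    """Return the name of the object whose nearest triangle the pointer ray hits.

    objects maps a name to a list of triangles (three xyz points each).
    """
    origin = np.asarray(origin, dtype=float)
    direction = np.asarray(direction, dtype=float)
    direction = direction / np.linalg.norm(direction)
    best_t, best_name = np.inf, None
    for name, triangles in objects.items():
        # Blocks 458-460: skip objects whose bounding sphere does not come within
        # filter_radius of the ray.
        pts = np.asarray(triangles, dtype=float).reshape(-1, 3)
        centre = pts.mean(axis=0)
        radius = float(np.linalg.norm(pts - centre, axis=1).max())
        w = centre - origin
        if np.linalg.norm(w - (w @ direction) * direction) > radius + filter_radius:
            continue
        # Block 462: the intersected triangle closest to the viewpoint gets the focus.
        for tri in triangles:
            t = ray_triangle(origin, direction, tri)
            if t is not None and t < best_t:
                best_t, best_name = t, name
    return best_name
```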
In another aspect of the 3D UI, a data structure is provided to more easily organize and manipulate the interactions between objects in a 3D visualization. Specifically, the images that represent objects or components in a 3D visualization can be represented as a combination of 3D objects. For example, if a 3D visualization on a screen shows a building, two trees in front of the building and a car driving by, each of these can be considered objects.
A 3D UI modeling tool is provided to create definitions or models of each of the objects. The definitions include geometry characteristics and behaviors (e.g. logic, or associated software), among other data types.
The application accesses these definitions in order to create instances of the objects. The instances do not duplicate the geometry or behavioral specifications, but create a data structure so each model can have a unique copy of the variables. Further details regarding the structure of the definitions, instances and variables are described below.
During operation, variable values and events, such as user inputs, are supplied to each instance of the object. The processing also includes interpreting the behaviors (e.g. associated computer executable instructions) while rendering the geometry. Therefore, each instance of the model may render differently from other instances, depending on the values of its variables.
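The definition/instance split can be sketched with two small data classes: a definition holds the geometry and behavior once, and each instance holds only a reference to it plus its own copy of the variables. The class names, the `defaults` field and the `toggle_on_click` behavior are illustrative assumptions, not the data layout used by the modeling tool.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple

Vec3 = Tuple[float, float, float]

@dataclass(frozen=True)
class ModelDefinition:
    """Created once by the modeling tool; shared by every instance."""
    name: str
    vertices: Tuple[Vec3, ...]                            # geometry, never copied
    defaults: Tuple[Tuple[str, Any], ...]                 # variable names and default values
    behaviour: Callable[[Dict[str, Any], str], None]      # updates variables on an event

@dataclass
class ModelInstance:
    definition: ModelDefinition
    variables: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Each instance gets its own copy of the variables, seeded from the defaults.
        merged = dict(self.definition.defaults)
        merged.update(self.variables)
        self.variables = merged

    def handle(self, event: str) -> None:
        self.definition.behaviour(self.variables, event)

def toggle_on_click(variables: Dict[str, Any], event: str) -> None:
    """Hypothetical behavior: a click toggles the highlight flag."""
    if event == "click":
        variables["highlighted"] = not variables["highlighted"]

tree = ModelDefinition("tree", ((0.0, 0.0, 0.0), (0.0, 5.0, 0.0), (1.0, 0.0, 0.0)),
                       (("highlighted", False), ("height", 5.0)), toggle_on_click)
```

Two trees created from `tree` share one vertex tuple, but clicking one changes only that instance's `highlighted` variable, which is what lets the instances render differently.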
Turning to
Turning to
Continuing with
The logic definition 494 receives inputs that can be values associated with variables or events. The logic is defined as binary data structures holding conditional parameters, jumps (e.g. “goto” functions), and intended mathematical operations. Outputs of the logic populate variables, initiate actions modifying the geometry of the object, or initiate actions intended to invoke external actions. External actions can include manipulation of variables in other objects.
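Logic expressed as data, with conditionals, jumps and arithmetic, can be executed by a small interpreter. The sketch below uses tuples in place of binary records and a hypothetical instruction set (`set`, `add`, `mul`, `jump_if_lt`, `goto`, `emit`). The step limit is an added assumption showing one way the ‘safe’ execution mentioned later could be enforced.

```python
from typing import Dict, List, Tuple, Union

Operand = Union[str, float]   # a variable name or a literal number

def _val(variables: Dict[str, float], x: Operand) -> float:
    return variables[x] if isinstance(x, str) else x

def run_logic(program: List[Tuple], variables: Dict[str, float],
              max_steps: int = 1000) -> List[str]:
    """Interpret a hypothetical opcode list; returns the actions it emits."""
    actions: List[str] = []
    pc = 0
    steps = 0
    while pc < len(program):
        steps += 1
        if steps > max_steps:
            raise RuntimeError("step limit exceeded")   # guards against runaway logic
        op, *args = program[pc]
        pc += 1
        if op == "set":
            variables[args[0]] = _val(variables, args[1])
        elif op == "add":
            variables[args[0]] = _val(variables, args[1]) + _val(variables, args[2])
        elif op == "mul":
            variables[args[0]] = _val(variables, args[1]) * _val(variables, args[2])
        elif op == "jump_if_lt":                         # conditional jump
            if _val(variables, args[0]) < _val(variables, args[1]):
                pc = args[2]
        elif op == "goto":                               # unconditional jump
            pc = args[0]
        elif op == "emit":                               # action for geometry or other objects
            actions.append(args[0])
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return actions

# Hypothetical door logic: open in 50% steps, rotating the door geometry each time.
door = [
    ("jump_if_lt", "open_pct", 100, 3),  # 0: still closed? go to step 3
    ("emit", "done"),                    # 1
    ("goto", 6),                         # 2: end of program
    ("add", "open_pct", "open_pct", 50), # 3
    ("emit", "rotate_door"),             # 4
    ("goto", 0),                         # 5
]
```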
The geometry definition 496 contains data structures representing vertices, polygons, lines and textures.
Turning to
The APIs 494 issue commands to set the value of a variable or standard variable (block 504), as well as set the values in model instances (block 506). These commands to determine the values are passed to the model instances creator 496. In order to create a model instance, the model definitions are loaded (block 508). Then, the model instances creator 496 uses the values of the variables and commands received from the APIs 494 to create instances of the model definitions (block 510). In other words, the model instances are populated with the variable values provided by the APIs 494. As the model instance is typically considered an object in 3D space, at block 512, the location (e.g. spatial coordinates) of the model instance is then established based on the API commands.
Upon creating a model instance, the logic execution engine 498 parses through the logic definition (e.g. computer executable instructions) related to the model instance (block 514). Based on the logic definition, the logic execution engine 498 implements the logic using the variable values associated with the model instance (block 516). In some cases, the logic definitions may alter or manipulate the standard variable values (block 518). Standard variables can refer to variables that are always present for a given type of object. Additional variables may exist that are used for additional logic, etc., for variants of the object. It can be appreciated, however, that the notions of standard variables and general variables are flexible and can be altered based on the objects being displayed in a 3D scene.
The render execution engine 500 then renders or visually displays the model instances, according to the applied logic transformations and the variable values. At block 520, the render execution engine 500 parses through the model instances. Those model instances that are within the view of the display (e.g. from the perspective of the virtual “camera”) and have not been turned off (e.g. made invisible) by standard variables, are rendered (block 522). The transformations that have been determined by the logic execution engine 498 and API commands altering the state variables are applied (block 524). In other words, matrices are read from memory and passed to GPU commands (e.g. “set current matrix”). Similarly, color values, etc. are read from memory and passed via the API to the GPU. At block 526, the API commands can also be used to render the geometry, whereby the geometry in the data structure exists as a set of vertex, normal, and texture coordinates. These API commands, such as “draw this list of vertices now”, are passed to the GPU.
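The render pass of blocks 520 to 526 can be sketched as a loop that skips hidden or out-of-view instances and emits a list of commands in place of real GPU calls. The command names (`set_matrix`, `set_color`, `draw_vertices`), the `Instance` fields and the crude "in front of the camera and within range" test standing in for frustum culling are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

import numpy as np

@dataclass
class Instance:
    name: str
    position: Tuple[float, float, float]
    color: Tuple[float, float, float]
    visible: bool            # standard variable; False hides the instance
    vertices: np.ndarray     # (n, 3) geometry shared with the definition

def in_view(pos, cam_pos, cam_dir, max_dist):
    """Crude visibility test standing in for a real frustum cull (cam_dir is unit length)."""
    w = np.asarray(pos, dtype=float) - np.asarray(cam_pos, dtype=float)
    return 0 < float(w @ np.asarray(cam_dir, dtype=float)) <= max_dist

def render_pass(instances: List[Instance], cam_pos, cam_dir, max_dist=100.0):
    commands = []
    for inst in instances:                                 # block 520: walk the instances
        if not inst.visible or not in_view(inst.position, cam_pos, cam_dir, max_dist):
            continue                                       # block 522: skip hidden/out of view
        m = np.identity(4)
        m[:3, 3] = inst.position                           # block 524: per-instance transform
        commands.append(("set_matrix", m))
        commands.append(("set_color", inst.color))
        commands.append(("draw_vertices", inst.vertices))  # block 526: draw the geometry
    return commands
```

Keeping the output as an abstract command list mirrors the point made below that geometry can be represented independently of any particular GPU API.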
The interaction controller 502 allows for a user input to interact with the rendered objects, or model instances. In the example of a pointer or cursor, at block 528, it is determined which object is intersected by a pointer or cursor position. This is carried out by creating a 3D ray from the pointer and determining where the ray intersects (block 530). Once interaction with a selected model instance is recognized, events may be triggered based on the logic associated with the selected model instance (block 532).
Another example of a scene management configuration 534 is shown in
The scene management strategy described here also provides many advantages. The logic of an application is expressed as data instead of compiled source code, which allows for ‘safe’ execution. This has similarities to interpreted languages such as Java, but has a far smaller data-size and higher performance.
The scene management strategy also provides the ability to represent geometry of an application in a GPU-independent manner. In other words, geometric commands can be rendered on almost any graphics API, which is very different from APIs that allow geometry rendering commands to be contained within Java. Further, by representing geometry in a GPU-independent manner, optimization of rendering can be implemented to suit back-end applications.
The scene management strategy can represent intended user interaction of an application without code. The existing or known systems are typically weak in their ability to represent the full dynamics of an application. However, the data structures (e.g. definitions and instances) of the models allow for logic to be encoded, enabling the models to react to user stimulus or inputs. Although some known web languages can encode logic, they are not able to correlate the logic to 3D geometry and their logic is limited to use within an internet browser. Additionally, such web language systems are data intensive, while the scene management strategy requires few data resources.
The scene management strategy also has the ability to ‘clone’ a single object definition to support a collection of similar objects (e.g. instances). ‘Smart’ widget libraries can thus exist entirely as data structures and instances, or as tailored hand code within a smart UI system. This efficiently organizes the definitions and the instances, thereby reducing the memory footprint and application size. It also eases development from a collection of 3D model objects.
Applications of the scene management strategy are varied because it is a fundamental data strategy that is not market specific. It also supports content-driven application development chains where an execution engine can be embedded inside a larger system. For example, the 3D UI execution engine 492 can be embedded inside a gaming environment to produce user-programmable components of a larger application engine. It can also be used to support new device architectures. For example, UI or graphics logic generated using the scene management strategy can be supplied by an embedded system with no physical screen, and then transmitted to another device (e.g. a handheld tablet) which can show the UI. This would be useful for displaying data on portable medical devices.
The scene management strategy can also be used to offer ‘application’ GUIs within a larger context, beyond computer desktops. An example would be a set of building models in a geographic UI, where each building model offered is customized to the building itself (e.g. an instance of the building model definition). For example, when a user selects a building, a list of restaurants in the building will be displayed. When a restaurant is selected, the restaurant's menu will be displayed. All of this related information is encoded in the building model.
In another aspect, a method is provided for enhancing a 3D representation by combining video data with 3D objects. Typically, due to the complexity of geospatial data (e.g. LiDAR data), generating a 3D model and creating a visual rendering of the 3D model can be difficult and involve substantial computing resources. Therefore, 3D models tend to be static. Although there are dynamic or moving 3D models, these also typically involve extensive pre-computations. Therefore, the method provided herein addresses these issues and provides a 3D representation that can be updated with live video data. In this way, the 3D representation becomes dynamic, being updated to correspond with the video data.
Generally, the method involves obtaining video data, such as image frames from a camera sensor, and correlating the images with surfaces of a 3D model (also referred to as the encoding stage). This data is then combined to generate or update surfaces of a 3D model that correspond with the video images, whereby the surfaces are visually rendered and displayed on a screen (also referred to as the decoding stage).
The video data and 3D objects are also treated as a single seamless stream, such that live video data has the effect of ‘coating’ 3D surfaces. This provides several advantages. Since video data is associated with the 3D surfaces, and the 3D objects are the unit of display, the video data can be viewed from any angle or location. Furthermore, the method accounts for distortion by taking into account the angle of the camera relative to the surface when the image was captured. Therefore, different viewing angles can be determined and used to render the perspective at which the video images are displayed. In another advantage, since video data and surface data can be computed or processed in a continuous stream, the problem of static 3D scenes is overcome. The method also allows computed surfaces to be retained, meaning that only the changes to the 3D scene or geometry (e.g. the deltas) need to be transmitted, if transmission is required. This reduces the transmission bandwidth.
Turning to
Alternatively, the modules, components, and databases shown in
Continuing with
There are several approaches for extracting or generating surfaces and 3D models from 2D video data. In one approach, voxel calculations are used to match points in images taken from different camera angles, or in some cases from a single camera angle. Points found in both images are matched based on color and pattern matching. This forms a 3D ‘voxel’ (volume pixel) representation of the object. The change in point location over a set of frames may be used to assist surface reconstruction, as is done in the POSIT algorithm used in video game tracking technology. Pose estimation, i.e. the task of determining the pose of an object in an image (or in stereo images or an image sequence), can be used to recover camera geometry.
Another approach for extracting surfaces from a 2D video is polygonization, also referred to as surface calculation. A known algorithm such as “Marching Cubes” may be used to create a polygonal representation of surfaces. These polygons may be further reduced by computing surface meshes with fewer polygons. An underlying ‘skeleton’ model representing the underlying object structure (such as is used in video games) may be employed to assist the polygonization process. A convex hull algorithm may be used to compute a triangulation of points from the voxel space, giving a representation of the outer edges of the point volume. Mesh simplification may also be used to reduce the data requirements for rendering the surfaces. Once the polygons are formed, these constitute the surfaces used to generate the 3D model 704, which is used as input to the 3D model video encoding algorithm.
Surface recognition is another approach used to extract or generate 3D surfaces from 2D video. Once a polygonization is computed to a given level of simplification, the surfaces can be matched to the prior set of surfaces from an existing 3D model. The matching of surfaces can be computed by comparing vertices, size, color, or other factors. Computed camera geometry as discussed above can be used to determine what view changes have occurred to assist in the recognition.
Continuing with
The video surface mapping module 710 outputs a data stream 712 of raster image fragments associated with each surface. In particular, the data stream includes the surface 716 being modified (e.g. the location and shape of the surface on the 3D model) as well as the related processed video data 714. The processed video data 714 includes the extracted raster image fragments corresponding to the surface 716, as well as the angle of incidence between the camera sensor and the surface of the real object. The angle of incidence is used to determine the amount of distortion and the type of distortion of the raster image fragment, so that, if desired, the raster image fragment can be mapped onto the 3D model surface 716 and viewed from a variety of perspective viewpoints without being limited to the distortions of the original image.
As discussed above, the data stream 712, in one embodiment, can be compressed and sent to another computing device 258, such as a mobile device. If so, the computing device 258 decompresses the data stream 712 before further processing. Alternatively, the data stream 712 can be processed by the same computing device 20.
It can be appreciated that the process of updating a 3D model with video data is an iterative and continuous process. Therefore, there are previously stored raster image fragments (e.g. from previous iterations) stored in database 720 and previously stored surface polygons (e.g. from previous iterations) stored in database 724. The data stream 712 is used to update the databases 720 and 724.
The raster image fragments and angle of incidence data 714 are processed by a surface fragment selector module 718. The module 718 selects the higher quality raster image data. In this case, higher quality data may refer to image data that is larger (e.g. has more pixels) and is less distorted. As per line 722, the previously stored raster image fragments from database 720 can be compared with the incoming raster data by module 718, whereby module 718 determines if the incoming raster data is of higher quality than the previous raster data. If so, the incoming raster data is used to update database 720.
The surfaces 716 from the data stream 712 are also used to update the surface polygons database 724. The GPU 268 then maps the raster image data and the angle of incidence from database 720 onto the corresponding surface stored in database 724. As described earlier, the GPU 268 may also use the angle of incidence to change the distortion of the raster image fragment so that it suits the surface it is being mapped onto. The GPU 268 then displays the 3D model, whereby the surfaces of the 3D model are updated to reflect the information of the video data. If the video data is live, then the updated 3D model will represent live data. Additionally, the 3D model is able to display the video-enhanced live scene from various angles, including angles different from that of the video sensor.
From the above, it can be seen that as video frames are continuously obtained, the 3D model can also be continuously updated to reflect the video input. This provides a “live” or “dynamic” feel to the 3D model.
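The selector's "larger and less distorted" criterion can be sketched with a single score. The score used here, pixel count multiplied by the cosine of the angle of incidence (a foreshortening factor), is a hypothetical choice for illustration; the description does not specify how size and distortion are weighed, and a dictionary stands in for database 720.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class RasterFragment:
    surface_id: int
    width: int
    height: int
    incidence_deg: float   # 0 = camera looks straight at the surface

def quality(f: RasterFragment) -> float:
    # Hypothetical score: pixel count discounted by foreshortening.
    return f.width * f.height * math.cos(math.radians(f.incidence_deg))

def select_fragment(store: Dict[int, RasterFragment], incoming: RasterFragment) -> bool:
    """Keep the incoming fragment only if it beats the stored one for that surface."""
    current: Optional[RasterFragment] = store.get(incoming.surface_id)
    if current is None or quality(incoming) > quality(current):
        store[incoming.surface_id] = incoming
        return True
    return False
```

With this score, a smaller but nearly head-on fragment can replace a larger one captured at a grazing angle, which is the trade-off the selector is meant to make.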
Turning to
At block 736, preferably, although not necessarily, persistent surfaces in the video images or frames are detected. For example, surfaces that appear over a series of video frames are considered persistent surfaces. These surfaces are considered to be more meaningful data since they likely represent surfaces of larger or stationary objects. Persistent surfaces can be used to determine the context for the 3D scene as it moves. For example, if the same wall, an example of a persistent surface, is identified in two separate image frames, then the wall can be used as a reference to characterize the surrounding geometry.
At block 738, it is determined which of the persistent surfaces correspond to the surfaces existing in the 3D model. The shape of a persistent surface is compared to surfaces of the 3D model. If their shapes are similar, then the persistent surface is considered to be a positive match to a surface in the 3D model.
At block 740, optionally, if the number of persistent surfaces that do not correspond with the 3D model exceeds a given threshold, then the overall match between the video input data and the 3D model is considered to be poor. In other words, the data sets are considered to have low similarity. If so, then the process returns to block 734 and a new set of surfaces is derived from the video data.
If the data sets are similar enough, then at block 742, for each persistent surface, a 2D fragment of raster data is extracted. The fragment of raster data are the pixels of the video image that compose the persistent surface. Therefore, the raster image covers the persistent surfaces. At block 744, for each persistent surface, the angle of incidence between the video or camera sensor and the persistent surface is determined and is associated with the persistent surface. The angle of incidence can be determined using known method. For example, points in the images can be triangulated, and the triangulated points can be used to estimate a camera pose using known computer vision methods. Upon determining the pose and the surface geometry, the angle between the camera sensor and the surface triangles is examined and used to determine an angle of incidence. The angle of incidence can be used to determine how the raster image is distorted, and to what degree. At block 746, the surface of the 3D model, and the associated raster image and angle of incidence can optionally be compressed and sent to another computing device 258 (e.g. a mobile device) for decoding and display. Optionally, the data can be displayed by the same computing device.
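Once the camera pose and the surface triangles are known, the angle of incidence at block 744 reduces to the angle between a triangle's normal and the line of sight to the camera. A minimal sketch, assuming the camera position and triangle vertices are already expressed in the same world frame; the helper names are hypothetical.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def angle_of_incidence(camera_pos, tri):
    """Degrees between the surface normal and the line of sight from the
    triangle's centroid to the camera (0 = head-on, 90 = grazing)."""
    a, b, c = tri
    normal = cross(sub(b, a), sub(c, a))
    centroid = tuple((a[i] + b[i] + c[i]) / 3.0 for i in range(3))
    view = sub(camera_pos, centroid)  # from the surface toward the camera
    cos_theta = abs(dot(normal, view)) / math.sqrt(dot(normal, normal) * dot(view, view))
    return math.degrees(math.acos(min(1.0, cos_theta)))
```

Taking the absolute value of the cosine makes the result independent of the triangle's vertex winding; values near 90 degrees indicate grazing views, whose raster fragments are the most foreshortened.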
Turning to
At block 754, the selected raster images, associated angles of incidence, and associated surfaces in the 3D model are sent to the GPU 268. At block 756, each of the persistent surfaces in the 3D model is covered with its respective raster image. The surfaces are “coated” or “covered” with the new raster images if the new raster images have been selected, as per block 752.
At block 758, each raster image covering a persistent surface is interpolated so as to better cover the persistent surface in the 3D model. The interpolation may take into account both the angle of incidence of the video sensor and the perspective viewing angle that will be displayed to the user on the display 272.
Regarding block 758, it can be appreciated that in standard or known perspective texture map rendering, texture coordinates are specified as U and V coordinates corresponding to the linear distances across the texture in the horizontal and vertical directions. By way of background, with perspective-correct texturing, vertex locations of the textured object are transformed into depth values (e.g. values along the Z-axis) based on their distance from the viewer. The virtual camera location is used to compute vertex locations in screen space through matrix transformation of the vertices. Individual pixels of the rendered, textured object on the screen are computed by taking the texel value obtained by interpolating U and V based on the interpolated Z location. This has the effect of compressing the texture data as rendered.
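The standard perspective-correct interpolation described above is commonly implemented by interpolating U/Z, V/Z and 1/Z linearly in screen space and dividing per pixel. The sketch below (function names are illustrative) contrasts it with naive affine interpolation along a single screen-space span between two vertices.

```python
def lerp(a, b, t):
    return a + (b - a) * t


def affine_uv(uv0, uv1, t):
    """Naive screen-space interpolation: ignores depth entirely."""
    return (lerp(uv0[0], uv1[0], t), lerp(uv0[1], uv1[1], t))


def perspective_correct_uv(uv0, z0, uv1, z1, t):
    """Interpolate U/Z, V/Z and 1/Z linearly in screen space, then divide.

    t is the screen-space fraction along the span; z0, z1 are view depths.
    """
    inv_z = lerp(1.0 / z0, 1.0 / z1, t)
    u_over_z = lerp(uv0[0] / z0, uv1[0] / z1, t)
    v_over_z = lerp(uv0[1] / z0, uv1[1] / z1, t)
    return (u_over_z / inv_z, v_over_z / inv_z)
```

For a span running from depth 1 to depth 3, the screen-space midpoint samples U = 0.25 rather than 0.5, so the far half of the span on screen covers three quarters of the texture. This is the compression effect referred to above.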
However, in the present approach in block 758, the texture map as transmitted in the video encoding will not be adjusted to be a flat map. Instead, it will contain data that already includes the real-world perspective effect of the surface raster fragment. The perspective effect depends on the angle of incidence at which the real-world camera filmed the surfaces. This perspective data is associated with each surface triangle within a texture map. If the scene were rendered from the original camera's perspective, the texture mapping algorithm could be simplified by excluding the step of interpolating U and V, and just obtaining the texel corresponding to each fragment's interpolated Z location. This means the compression effect of perspective correction would not be applied, because the data already contains the perspective effect. This can also be accomplished by modifying the Z coordinate to eliminate its effect in the perspective calculation. In order to adjust the viewing angle so the surface fragment data can be viewed from a different camera location, a matrix calculation can be used to compute deltas to the modified Z coordinates to account for the different camera angles. Therefore, the interpolation would contain an adjustment based on the original camera sensor angle (e.g. the angle of incidence between the camera and the surface). The interpolated screen pixel would reflect the original perspective in the camera image plus adjustments to account for different viewing angles from the viewer's perspective. This is similar to algorithms used in orthorectification and photogrammetry to recover building surface images from photographs, with the difference that it is applied in real time to the video reconstruction process. Furthermore, the algorithms may use modified vertex and pixel shader programs in a GPU.
Continuing with
At block 762, the raster images are displayed on the 3D model surfaces. As the raster images update, surfaces on the 3D model can change. This allows the 3D model to have a dynamic and “live” behaviour, which corresponds to the video data.
It is appreciated that 3D model video encoding has many applications. By way of background, it is known that 2D imagery can be presented on planes within a 3D scene. However, known methods do not work well when the surface planes in the 2D image are viewed from oblique angles. The present 3D model video encoding method has the advantage of processing 2D video images, correcting those surfaces that are hard to view due to perspective angles, and displaying those surfaces in 3D more clearly from various angles. This technique can also be combined with virtual 3D objects to assist in placing video objects in context.
A ‘pseudo’ 3D scene can also be created. This is akin to the methods used to present ‘street views’ based on video cameras. Video imagery is captured using a set of cameras arranged in a pattern and stored. The video frames can be presented within a 3D view that shows the frames from the vantage point of the viewer, and the view can be rotated because video frames exist from multiple angles for a given view. The 3D view is not constrained to be presented from viewpoints and camera angles that correspond to the original sensor angles.
2D video images can also be used to statically paint a 3D model. In this case, georeferenced video frames are used to create static texture maps. This allows a virtual view from any angle, but does not show dynamically updating (live) data.
In an example application, a street scene is being rendered in 3D on a computer screen. This scene could be derived, for instance, from building models extracted from video or LiDAR, using the methods described above. The building models are stored in a database and transmitted over a network to a remote viewing device. A user would ‘virtually’ view the scene from a viewpoint standing on the street, in front of one of the buildings. In the real world, a car is going down the actual street, which is the same street corresponding to the virtual street depicted in the 3D scene. A video or camera sensor mounted on one of the buildings is imaging the real car. The 3D model video encoding method is able to process the video images; derive a series of surfaces that make up the car; encode a 3D model of that car's surfaces with imagery from the video mapped to the surfaces; and transmit the 3D model of the car as a live video ‘avatar’ to the remote viewer. Therefore, the car can be displayed in the 3D remote scene and viewed from different angles in addition to those angles captured by the original video camera. In other words, the remote viewer, from the vantage point of the street, could display the car moving down the street, even though the original video camera that identified the car was in a different location than the virtual viewpoint.
In another example, there is a conference with a set of participants, some of whom are attending ‘virtually’. One participant's ‘virtual’ vantage point is at the head of a table. A set of sensors images the room from opposite corners of the ceiling. Algorithms associated with the sensor data would identify the room's contents and the participants in the conference. The algorithms would then encode a set of 3D objects for transmission to a remote viewer. The virtual attendee could ‘attend’ the conference by displaying the 3D room and its participants on his large-screen TV. By attaching a simple tracking device to the participant's headset (e.g. those used for simulation games), the participant could turn their head and look at each of the other participants as they spoke. The remote viewer would display the participants' 3D avatars, whereby the 3D avatars would be correctly positioned in the room according to their actual positions in the conference room. The scene, as displayed on the remote viewer, would move as the virtual attendee moved, giving the virtual attendee a realistic sensation of being at the table in the room.
It can therefore be seen that encoding a 3D model with 2D video has many applications and advantages, which are not limited to the examples provided herein.
In another aspect, systems and methods are also provided for allowing a user to determine how a 3D scene is viewed (e.g. using module 70). Navigation tools are provided, whereby upon receiving user inputs associated with the navigation buttons, the view of the 3D scene being displayed on a screen changes.
This proposed system and method for geospatial navigation facilitates user interaction with geospatial datasets in 3D space, particularly on mobile devices (e.g. smart phones, PDAs, mobile phones, pagers, tablet computers, net books, laptops, etc.) and embedded systems where user interaction is not performed on a desktop computer through a mouse. Some of the innovations are however also useful on the desktop, and the description is not meant to exclude it.
By way of background, geospatial data refers to polygonal data comprising ground elevation, potentially covering a wide area. It can also refer to imagery data providing ground covering; 3D features and building polygonal models; volumetric data such as point clouds, densities, and data fields; vector datasets such as networks of roadways, area delineations, etc.; and combinations of the above.
Most 3D UI navigation systems make use of several methods to enable movement throughout a 3D dataset. These can include a set of UI widgets (e.g. software buttons) that enable movement or view direction rotation (e.g. look left, look right). These widgets may also provide a viewer with location awareness and the ability to specify a new location via dragging, pointing, or clicking. These methods are difficult to use when trying to precisely position a viewpoint relative to a point of interest. The navigation is typically performed relative to a user's perspective and, therefore, can be imprecise when attempting to focus the virtual camera's view on an object.
Other known navigation methods include a pointing device, such as a mouse, which may be enabled to provide movement or view direction rotation. These methods are good for natural interaction, but again do not facilitate focus on a certain object.
One of the limitations with most navigation methods is that, although some may support ‘fly through’, they do not provide methods that allow a user to rapidly look at objects of interest. Another difficulty with most navigation interfaces is that they give poor awareness as to what is behind a viewer.
The proposed geospatial navigation system and method includes the behaviour of a ‘camera’ on a boom, similar to a camera boom used to film movies. Camera booms, also called camera jibs or cranes, allow a camera to move in many degrees of freedom, often simultaneously. This navigation behaviour allows for many different navigation movements. In the geospatial navigation system, objects, preferably all objects, in the 3D scene become interactive. In other words, objects can be selected through a pointer or cursor.
The pointer or cursor can be controlled through a touch screen, mouse, trackball, track pad, scroll wheel, or other pointing devices. Selection may also be done via discrete means (e.g. jumping from target to target based on directional inputs). Upon selecting an object, the viewpoint of the display can be precisely focused on the selected object. Navigation buttons are provided for manipulating a camera direction and motion relative to a selected object or focus point, thereby displaying different angles and perspectives of the selected object or focus point. Navigation buttons are also provided for changing the camera's focus point by selecting a new object and centering the camera focus on the new object.
Inputs may also be used to manipulate ‘boom rotations’ about the focus object (azimuth and elevation), either smoothly or in discrete jumps through intervals or preset values. This uses the camera boom approach. These rotations can be initiated by selecting widgets, using a pointing device input, or through touch screen controls. The length of the camera boom may also be controlled, thereby controlling the zoom (e.g. the size of the object relative to the display area). The length of the boom may be manipulated using a widget, a mouse wheel, or a pinch-to-zoom touch screen gesture, or in discrete increments tied to buttons or menus. It can be appreciated that the representation of the navigation interfaces can vary while producing similar navigation effects.
Examples include activating a forward motion button, thereby translating or moving the virtual camera along the terrain, or up the side of a building. These motions take into account the intersection of the camera's boom with the 3D scene.
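The boom behaviour amounts to placing the camera on a sphere around the focus point: azimuth and elevation choose the direction of the boom, and its length sets the zoom. A minimal sketch, assuming x = East, y = North, z = up, azimuth measured clockwise from North as the direction the camera faces, and elevation measured upward from the horizontal (these conventions are assumptions, not taken from the text).

```python
import math


def boom_camera_position(focus, azimuth_deg, elevation_deg, boom_length):
    """Camera position at the end of a boom anchored at `focus`.

    Assumed conventions: x = East, y = North, z = up; azimuth is the compass
    direction the camera faces (clockwise from North); elevation is the boom's
    angle above the horizontal. The camera always looks back at `focus`.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    horizontal = boom_length * math.cos(el)  # boom length projected on the ground
    # The camera sits behind the focus point, opposite its facing direction.
    return (focus[0] - horizontal * math.sin(az),
            focus[1] - horizontal * math.cos(az),
            focus[2] + boom_length * math.sin(el))
```

Because the camera always lies exactly `boom_length` from the focus point, changing azimuth or elevation (smoothly or in discrete steps) never moves the focus point itself, and changing the boom length changes only the apparent size of the focused object.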
Other controls include elevating the virtual camera's location above the height of the ground, as a camera might be manipulated in a movie by elevating its platform.
Other camera motions that are interactive can be supported, such as moving the virtual camera along a virtual ‘rail’ defined by a vector or polygonal feature.
Navigation may be enhanced by linking a top-down view of a 2D map to the 3D scene, to present correlated situational awareness. For instance, a top-down or plan view of the 3D scene may be displayed in the 2D map, whereby the map is centered on the same focal point as the virtual camera's 3D focal point. As the camera's focal point moves, the correlated plan view in the 2D map moves along with it. Additionally, as the virtual camera rotates, the azimuth of the camera's view is matched to the azimuth of the top-down view. In other words, the top-down view is rotated so that the upwards direction on the top-down view is aligned with the facing direction of the virtual camera. For example, if the virtual camera rotates to face East, then the top-down view rotates so that the East facing direction is aligned with the upwards direction of the top-down view. The range of the 2D map, that is, the distance displayed in the plan view, can be controlled by altering the virtual camera boom length or the height of the virtual camera above the map in the 2D mode. This allows the 2D map to show a wide area while the 3D perspective view is close up.
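The heading-up behaviour of the 2D map is a 2D rotation of ground offsets about the shared focal point. A minimal sketch, using assumed conventions (x = East, y = North, azimuth clockwise from North); `metres_per_pixel` stands for the map range and could, as the text suggests, be derived from the boom length or camera height, e.g. through a hypothetical `metres_per_pixel = k * boom_length`.

```python
import math


def world_to_map(point, focus, camera_azimuth_deg, metres_per_pixel, map_size):
    """Pixel position of ground point (x=East, y=North) on a heading-up 2D map
    centred on the camera's focus point."""
    dx, dy = point[0] - focus[0], point[1] - focus[1]
    az = math.radians(camera_azimuth_deg)
    # Components along the camera's facing direction and its right-hand side.
    forward = dx * math.sin(az) + dy * math.cos(az)
    right = dx * math.cos(az) - dy * math.sin(az)
    cx, cy = map_size[0] / 2.0, map_size[1] / 2.0
    # Screen rows grow downward, so "forward" (up on the map) decreases the row.
    return (cx + right / metres_per_pixel, cy - forward / metres_per_pixel)
```

With the camera facing East (azimuth 90), a point 10 m East of the focus is drawn 10 pixels above the map centre, and a point 10 m South is drawn 10 pixels to its right, matching the heading-up rule described above.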
This method advantageously allows for precise and intuitive navigation around 3D geospatial data. Further, since the navigation method allows both continuous and discrete motions, a viewpoint can be precisely positioned and adjusted more conveniently. The method also allows both wide areas and small areas to be navigated smoothly, allowing, for instance, a viewer to transition from viewing an entire state to a street-level walk through view easily. Finally, the method is not reliant on specialized input devices or fine user motions based on clicking devices. This makes it suitable for embedded applications such as touch screens, in-vehicle interfaces, devices with limited inputs (e.g. pilot hat switch), or displays with slow refresh rates where controlling smooth motion is difficult.
Turning to
Turning to
Azimuth buttons or controls 804 and 802 change the azimuth of the viewing angle, showing focus point 788 from different angles while keeping it at the center of the screen. For example, upon receiving an input associated with azimuth button 804, the perspective viewing angle of the focus point 788 rotates counterclockwise. Upon receiving an input associated with azimuth button 802, the perspective viewing angle rotates clockwise about the focus point 788. For both elevation and azimuth changes, the geospatial location of the focus point within the 3D scene remains the same.
Zoom buttons or controls 792 and 804 allow for the screen view to zoom in to (e.g. using zoom button 792) and zoom out from (e.g. using zoom button 804) the focus point 788. Although the zoom settings may change, the geospatial location of the focus point 788 within the 3D scene remains the same.
In order to change focus points, forward translation button 790 and backward translation button 808 can be used to advance the camera viewpoint forward and backward, respectively. This is similar to moving a camera boom forward or backward along a rail. For example, upon receiving an input associated with forward translation button 790, the screen view translates forward, including the focus point 788. In other words, a new focus point having different location coordinates is selected, whereby the new focus point is at the center of the screen 786. Similarly, the spatial coordinates of the focus point 788 change when selecting either of the sideways translation buttons 798 and 800. When selecting the right translation button 800, the screen view shifts to the right, including the location of the focus point 788.
Turning to
Control interface 812 also has navigation controls for reorienting the azimuth and elevation viewing angles. Receiving an input associated with elevation control 820 (e.g. the upward arrow) causes the elevation angle of the screen view to increase, while receiving an input associated with elevation control 822 (e.g. the downward arrow) causes the elevation angle to decrease. Receiving an input associated with azimuth control 816 (e.g. the right arrow) causes the azimuth angle of the screen view to rotate in one direction, while receiving an input associated with azimuth control 818 (e.g. the left arrow) causes the azimuth angle of the screen view to rotate in the other direction. Changes in the azimuth and elevation viewing angles are centered on a focus point.
A virtual joystick 824, shown by the circle between the arrows, allows the screen view to translate forward, backward, left and right. This also changes the 3D coordinates of the focus point. As described earlier, the focus point can be an object. Therefore, as a user moves through a 3D scene, new points or objects can be selected as the screen's focus, and the screen view can be rotated around the focus point or object using the controls described here.
Control interface 812 also includes a vertical translation control 826, which can be used to vertically raise or lower the screen view. Conceptually, this effect is generated by placing the virtual camera 780 on an “elevator” that is able to move up and down. Moving a pointer, or in a touch screen sliding a finger, up the vertical translation control 826 translates the screen view upwards, while moving the pointer or sliding the finger downwards translates the screen view downwards. This control 826 can be used, for example, to ascend or descend the wall of a building in the 3D scene. For example, if a user wished to scan the side of a building from top to bottom, the user can set the building as the focus point. Then, from the top of the building, the user can use the vertical translation control 826 to move the screen view of the building downwards, while still maintaining a view of the building wall in the screen view.
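The virtual joystick 824 and the vertical translation control 826 can both be expressed as offsets applied to the focus point in a frame aligned with the camera heading. A minimal sketch, assuming x = East, y = North, z = up and azimuth measured clockwise from North; the function name and parameters are hypothetical.

```python
import math


def translate_focus(focus, azimuth_deg, forward=0.0, right=0.0, up=0.0):
    """Offset the focus point relative to the camera heading.

    forward/right: joystick deflection already scaled to metres (assumed).
    up: vertical "elevator" offset from the vertical translation control.
    """
    az = math.radians(azimuth_deg)
    heading = (math.sin(az), math.cos(az))      # ground direction the camera faces
    right_dir = (math.cos(az), -math.sin(az))   # 90 degrees clockwise of heading
    return (focus[0] + forward * heading[0] + right * right_dir[0],
            focus[1] + forward * heading[1] + right * right_dir[1],
            focus[2] + up)
```

Since the camera is attached to the focus point by the boom, moving the focus point in this way carries the camera along with it, which gives the translation behaviour described for the joystick and the elevator.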
Continuing with
The top-down view 828 can also be used as a control interface to select new focus points or focus objects. For example, both the top-down view 828 and the perspective screen view may be centered on a first object. Upon receiving an input on the top-down view 828 associated with a second object shown on the top-down view 828, the focus point of the top-down view 828 and the perspective screen view shift to center on the location coordinates of the second object. In a more specific example, the perspective screen view and top-down view may be centered on a bridge. However, the top-down view 828 may be able to show more objects, such as a nearby building located outside the perspective screen view. When a user selects the building in the top-down view 828 (e.g. clicks on or taps the building), the focus point of the top-down view 828 and the perspective screen view shift to be centered on the building. The user can then use the azimuth and elevation controls to view the building from different angles. It can therefore be seen that the top-down view 828 facilitates quick navigation between different objects.
It can be appreciated that the above-described user interfaces can vary. The buttons and controls can be activated by using a pointer, a touch screen, or other known user interface methods and systems. It can also be appreciated that the above geospatial navigation advantageously allows for precise navigation and viewing around a 3D scene. Further, although the above examples typically relate to continuous or smooth navigation, the same principles can be used to implement discrete navigation. For example, controls or buttons for “ratchet” zooming (e.g. changing the zoom between discrete intervals) or ratchet azimuth and elevation angle shifts can be used to navigate a 3D scene.
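Ratchet navigation can be implemented by snapping the current value to the nearest stop and then stepping a whole number of stops. A minimal sketch; the boom-length stops and the 15-degree azimuth increment are hypothetical values, and the wrap-around in `ratchet_angle` is intended for azimuth only (elevation would be clamped instead).

```python
# Hypothetical boom-length stops in metres (shortest boom = most zoomed in).
ZOOM_STOPS = (10.0, 25.0, 50.0, 100.0, 250.0, 500.0, 1000.0)


def ratchet_zoom(current_length, steps, stops=ZOOM_STOPS):
    """Snap to the nearest stop, then move `steps` stops (negative = zoom in)."""
    nearest = min(range(len(stops)), key=lambda i: abs(stops[i] - current_length))
    index = max(0, min(len(stops) - 1, nearest + steps))  # clamp to the ends
    return stops[index]


def ratchet_angle(current_deg, steps, increment_deg=15.0):
    """Snap an azimuth to a multiple of increment_deg, then step whole notches."""
    snapped = round(current_deg / increment_deg) * increment_deg
    return (snapped + steps * increment_deg) % 360.0
```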
In general, a method is provided for displaying data having spatial coordinates, the method comprising: obtaining a 3D model, the 3D model comprising the data having spatial coordinates; generating a height map from the data; generating a color map from the data; identifying and determining a material classification for one or more surfaces in the 3D model based on at least one of the height map and the color map; based on at least one of the 3D model, the height map, the color map, and the material classification, generating one or more haptic responses, the haptic responses able to be activated on a haptic device; generating a 3D user interface (UI) data model comprising one or more model definitions derived from the 3D model; generating a model definition for a 3D window, the 3D window able to be displayed in the 3D model; actively updating the 3D model with video data; displaying the 3D model; and receiving an input to navigate a point of view through the 3D model to determine which portions of the 3D model are displayed.
In general, a method is provided for generating a height map from data points having spatial coordinates, the method comprising: obtaining a 3D model from the data points having spatial coordinates; generating an image of at least a portion of the 3D model, the image comprising pixels; for a given pixel in the image, identifying one or more data points based on proximity to the given pixel; determining a height value based on the one or more data points; and associating the height value with the given pixel.
In another aspect, the 3D model is obtained from the data points having spatial coordinates by generating a shell surface of an object extracted from the data points having spatial coordinates. In another aspect, the shell surface is generated using the Delaunay triangulation algorithm. In another aspect, the 3D model comprises a number of polygons, and the method further comprises reducing the number of polygons. In another aspect, the 3D model comprises a number of polygons, and the image is of at least one polygon of the number of polygons. In another aspect, the one or more data points based on the proximity to the given pixel comprise a predetermined number of data points closest to the given pixel. In another aspect, the predetermined number of data points is one. In another aspect, the one or more data points based on the proximity to the given pixel are located within a predetermined distance of the given pixel. In another aspect, every pixel in the image is associated with a respective height value.
In general, a method is provided for generating a color map from data points having spatial coordinates, the method comprising: obtaining a 3D model from the data points having spatial coordinates; generating an image of at least a portion of the 3D model, the image comprising pixels; for a given pixel in the image, identifying a data point located closest to the given pixel; determining a color value of the data point located closest to the given pixel; and associating the color value with the given pixel.
In another aspect, the color value is a red-green-blue (RGB) value. In another aspect, the 3D model is obtained from the data points having spatial coordinates by generating a shell surface of an object extracted from the data points having spatial coordinates. In another aspect, the shell surface is generated using the Delaunay triangulation algorithm. In another aspect, the 3D model comprises a number of polygons, and the method further comprises reducing the number of polygons. In another aspect, the 3D model comprises a number of polygons, and the image is of at least one polygon of the number of polygons. In another aspect, every pixel in the image is associated with a respective color value.
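The per-pixel height assignment, including the nearest-k and predetermined-distance variants, can be sketched as below. This is a brute-force illustration, assuming the data points have already been transformed into the image's frame as (s, t, h) tuples, with (s, t) the in-plane position and h the offset along the polygon normal; a spatial index such as a k-d tree would normally replace the per-pixel sort for real point clouds.

```python
import math


def height_map(points, width, height, pixel_size, k=1, max_dist=None):
    """Assign each pixel the mean height of its k nearest data points.

    points: (s, t, h) tuples in the image frame (assumed pre-transformed).
    max_dist: if given, ignore points farther than this from the pixel centre.
    Pixels with no qualifying point are left as None.
    """
    pts = list(points)
    grid = []
    for row in range(height):
        cy = (row + 0.5) * pixel_size
        line = []
        for col in range(width):
            cx = (col + 0.5) * pixel_size
            # Rank all points by distance to this pixel centre (brute force).
            ranked = sorted(pts, key=lambda p: math.hypot(p[0] - cx, p[1] - cy))
            if max_dist is not None:
                ranked = [p for p in ranked if math.hypot(p[0] - cx, p[1] - cy) <= max_dist]
            chosen = ranked[:k]
            line.append(sum(p[2] for p in chosen) / len(chosen) if chosen else None)
        grid.append(line)
    return grid
```

With `k=1` and no distance limit, every pixel receives a height, which corresponds to the aspect in which every pixel in the image is associated with a respective height value.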
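The color-map method differs from the height map only in copying the color of the single closest data point, so every pixel receives a value. A minimal sketch under the same assumed image-frame (s, t) coordinates, with an RGB triple as each point's attribute.

```python
def color_map(points, width, height, pixel_size):
    """points: non-empty list of (s, t, (r, g, b)) in the image frame (assumed)."""
    pts = list(points)
    if not pts:
        raise ValueError("at least one data point is required")
    image = []
    for row in range(height):
        cy = (row + 0.5) * pixel_size
        line = []
        for col in range(width):
            cx = (col + 0.5) * pixel_size
            # The single closest point supplies the pixel's color.
            nearest = min(pts, key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2)
            line.append(nearest[2])
        image.append(line)
    return image
```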
In general, a method is provided for determining a material classification for a surface in a 3D model, the method comprising: providing a type of an object corresponding to the 3D model; providing an image corresponding to the surface in the 3D model, the image associated with a height mapping and a color mapping; and determining the material classification of the surface based on the type of the object, and at least one of the height mapping and the color mapping.
In another aspect, the material classification is associated with the object. In another aspect, the method further comprises selecting a material classification algorithm from a material classification database based on the type of the object. In another aspect, the method further comprises applying the material classification algorithm, which includes analyzing at least one of the height mapping and the color mapping. In another aspect, the 3D model is generated from data points having spatial coordinates. In another aspect, the type of the object is any one of a building wall, a building roof, and a road. In another aspect, the type of the object is the building wall if the object is approximately perpendicular to a ground surface in the 3D model; the type of the object is the building roof if the object is approximately perpendicular to the building wall; and the type of the object is the road if the object is approximately parallel to the ground surface. In another aspect, the method further comprises increasing a contrast in color of the color mapping of the image. In another aspect, the type of the object is a wall, and the method further comprises, if there are no straight and parallel lines in the color mapping that are approximately horizontal relative to a ground surface in the 3D model, determining the material classification for the surface to be stucco.
In another aspect, the type of the object is a wall, and the method further comprises: if there are straight and parallel lines in the color mapping that are approximately horizontal relative to a ground surface in the 3D model, and if there are straight lines perpendicular to the straight and parallel lines, determining the material classification for the surface to be brick; and if there are straight and parallel lines in the color mapping that are approximately horizontal relative to a ground surface in the 3D model, and if there are no straight lines perpendicular to the straight and parallel lines, determining the material classification for the surface to be siding. In another aspect, the type of the object is a wall, and the method further comprises, if there are rectangular shaped elevations or depressions in the height mapping, determining the material classification to be windowing material. In another aspect, the type of the object is a roof, and the method further comprises: if there are no straight and parallel lines in the color mapping, and if the surface is gray, determining the material classification to be gravel; and if there are no straight and parallel lines in the color mapping, and if the surface is black, determining the material classification to be asphalt. In another aspect, the type of the object is a roof, and the method further comprises: if there are straight and parallel lines in the color mapping, and if there are straight lines perpendicular to the straight and parallel lines, determining the material classification for the surface to be shingles; and if there are straight and parallel lines in the color mapping, and if there are no straight lines perpendicular to the straight and parallel lines, determining the material classification for the surface to be tiles.
In another aspect, the type of the object is a roof, and the method further comprises: if a height variance of the height mapping is lower than a threshold, determining the material classification for the surface to be any one of shingles, asphalt and gravel; and if not, determining the material classification for the surface to be tiling.
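The wall and roof rules above form a small decision tree once the line and relief features have been detected in the color and height mappings. A minimal sketch, assuming those detections are already available as booleans plus a coarse color label; the `SurfaceFeatures` fields are hypothetical, as is the choice to test for rectangular relief before the line rules. The height-variance rule for roofs is shown as a separate alternative.

```python
from dataclasses import dataclass


@dataclass
class SurfaceFeatures:
    parallel_lines: bool       # straight, parallel lines (for walls: ~horizontal)
    perpendicular_lines: bool  # straight lines perpendicular to those lines
    rectangular_relief: bool   # rectangular elevations/depressions in the height map
    color: str                 # coarse color label, e.g. "gray" or "black"


def classify_material(object_type: str, f: SurfaceFeatures) -> str:
    if object_type == "wall":
        if f.rectangular_relief:
            return "windowing material"
        if not f.parallel_lines:
            return "stucco"
        return "brick" if f.perpendicular_lines else "siding"
    if object_type == "roof":
        if f.parallel_lines:
            return "shingles" if f.perpendicular_lines else "tiles"
        if f.color == "gray":
            return "gravel"
        if f.color == "black":
            return "asphalt"
    return "unclassified"


def classify_roof_by_variance(height_variance: float, threshold: float) -> str:
    # Alternative roof rule: smooth roofs are shingles, asphalt or gravel.
    return "shingles, asphalt or gravel" if height_variance < threshold else "tiling"
```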
In general, a method of providing a haptic response is provided, the method comprising: displaying on a display screen a 2D image of a 3D model; detecting a location of a pointer on the display screen; correlating the location of the pointer on the 2D image with a 3D location on the 3D model; and, if the 3D location corresponds with one or more features of the 3D model, providing the haptic response.
In another aspect, the one or more features of the 3D model comprises at least a first polygon and a second polygon that are not co-planar with each other, and as the pointer moves from the first polygon to the second polygon, providing the haptic response. In another aspect, the one or more features comprises a change in depth of a surface on the 3D model, and as the pointer moves across the surface, providing the haptic response. In another aspect, the one or more features comprises a height map associated with the 3D model, the height map comprising one or more pixels each associated with a height, and as the pointer moves over a pixel in the height map that is raised or lowered over a surface of the 3D model, providing the haptic response. In another aspect, the one or more features of the 3D model comprises a surface that has a textured material classification, and as the pointer moves over the surface, providing the haptic response. In another aspect, the haptic response is provided by a haptic device. In another aspect, the haptic device comprises any one of a buzzer and a piezoelectric strip actuator.
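The first haptic aspect, triggering when the pointer crosses from one polygon to another that is not co-planar with it, can be tested by comparing the two polygons' planes. A minimal sketch; the tolerances and function names are assumptions, and triangles stand in for general polygons.

```python
import math


def plane_of(tri):
    """Unit normal n and offset d of the plane n . x = d through a triangle."""
    (ax, ay, az), (bx, by, bz), (cx, cy, cz) = tri
    ux, uy, uz = bx - ax, by - ay, bz - az
    vx, vy, vz = cx - ax, cy - ay, cz - az
    n = (uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx)
    length = math.sqrt(sum(c * c for c in n))
    n = tuple(c / length for c in n)
    return n, n[0] * ax + n[1] * ay + n[2] * az


def coplanar(t1, t2, angle_tol_deg=1.0, dist_tol=1e-6):
    n1, d1 = plane_of(t1)
    n2, d2 = plane_of(t2)
    cos_angle = sum(a * b for a, b in zip(n1, n2))
    if cos_angle < 0:  # opposite winding: flip the second plane to match
        cos_angle, d2 = -cos_angle, -d2
    angle = math.degrees(math.acos(min(1.0, cos_angle)))
    return angle <= angle_tol_deg and abs(d1 - d2) <= dist_tol


def haptic_on_move(prev_tri, curr_tri):
    """True when the pointer has crossed onto a non-co-planar polygon."""
    if prev_tri is None or curr_tri is None or prev_tri == curr_tri:
        return False
    return not coplanar(prev_tri, curr_tri)
```

Comparing the plane offsets as well as the normals means that a step between two parallel surfaces (e.g. a curb) also triggers a response, not only a change in surface orientation.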
In general, a method is provided for displaying a window on a display screen, the window defined by a polygon in a plane located in a 3D space, the method comprising: computing clipping planes projecting from each edge of the polygon, the clipping planes normal to the polygon; providing a 3D object in the window, a portion of the 3D object located within a space defined by the clipping planes and the polygon, and another portion of the 3D object located outside the space defined by the clipping planes and the polygon; computing a surface using a surface triangulation algorithm for the portion of the 3D object located within a space defined by the clipping planes and the polygon, the surface comprising triangles; and when displaying the 3D object on the display screen, rendering the triangles of the surface.
In another aspect, the polygon comprises vertices and boundary lines forming the edges of the polygon; at each vertex, a vector that is normal to the plane is computed; and each clipping plane is defined by at least one vector that is normal to the plane and at least one edge. In another aspect, at least one edge of at least one of the triangles located within the portion of the 3D object within the space defined by the clipping planes and the polygon is flush with at least one edge of the polygon.
In general, a method is provided for displaying at least two 3D objects in a window on a display screen, the window defined by a polygon in a plane located in a 3D space, and a first 3D object having a higher Z-order than a second 3D object, the method comprising: rendering a first virtual shape having a first outline matching the first 3D object, the first virtual shape comprising a first set of triangles; rendering a second virtual shape having a second outline matching the second 3D object, the second virtual shape comprising a second set of triangles; determining a portion of the second 3D object that is not occluded by the first 3D object; applying a surface triangulation algorithm for the portion of the second 3D object; and rendering the portion of the second 3D object.
In another aspect, the surface triangulation algorithm is a Delaunay triangulation algorithm. In another aspect, a Z-order of a third 3D object is higher than the Z-order of the first 3D object, the method further comprising: determining a portion of the first 3D object that is not occluded by the third 3D object; applying the surface triangulation algorithm for the portion of the first 3D object; and rendering the portion of the first 3D object.
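The Z-order occlusion steps can be illustrated in screen space with a deliberately simplified sketch. It assumes that each object's outline is an axis-aligned rectangle and that a higher Z-order means the object is drawn on top, and it replaces the Delaunay triangulation named above with a trivial split of each visible rectangle into two triangles; all names are hypothetical. Processing objects from highest to lowest Z-order also covers the third-object case, since each object is clipped by every object above it.

```python
from typing import Dict, List, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) in screen space
Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]


def subtract(r: Rect, occluder: Rect) -> List[Rect]:
    """Split r into up to four rectangles that the occluder does not cover."""
    x0, y0, x1, y1 = r
    ox0, oy0, ox1, oy1 = occluder
    if ox0 >= x1 or ox1 <= x0 or oy0 >= y1 or oy1 <= y0:
        return [r]  # no overlap: r is fully visible
    pieces = []
    if oy0 > y0:
        pieces.append((x0, y0, x1, oy0))  # strip below the occluder
    if oy1 < y1:
        pieces.append((x0, oy1, x1, y1))  # strip above the occluder
    my0, my1 = max(y0, oy0), min(y1, oy1)
    if ox0 > x0:
        pieces.append((x0, my0, ox0, my1))  # strip left of the occluder
    if ox1 < x1:
        pieces.append((ox1, my0, x1, my1))  # strip right of the occluder
    return pieces


def triangulate(r: Rect) -> List[Triangle]:
    """Two triangles per rectangle (a stand-in for Delaunay triangulation)."""
    x0, y0, x1, y1 = r
    return [((x0, y0), (x1, y0), (x1, y1)), ((x0, y0), (x1, y1), (x0, y1))]


def visible_triangles(objects: Sequence[Tuple[str, int, Rect]]) -> Dict[str, List[Triangle]]:
    """Triangles of the non-occluded part of each object; higher Z-order is on top."""
    result: Dict[str, List[Triangle]] = {}
    above: List[Rect] = []
    for name, _z, rect in sorted(objects, key=lambda o: o[1], reverse=True):
        pieces = [rect]
        for occ in above:  # remove everything covered by higher Z-order objects
            pieces = [p for piece in pieces for p in subtract(piece, occ)]
        result[name] = [t for p in pieces for t in triangulate(p)]
        above.append(rect)
    return result


res = visible_triangles([("first", 2, (0.0, 0.0, 2.0, 2.0)),
                         ("second", 1, (1.0, 1.0, 3.0, 3.0)),
                         ("third", 0, (0.5, 0.5, 1.0, 1.0))])
print({name: len(tris) for name, tris in res.items()})
```

Real outlines are arbitrary polygons, so a production version would use general polygon clipping followed by a constrained Delaunay triangulation of the visible region; the rectangle case is used here only to keep the occlusion logic visible.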
In general, a method is provided for interacting with one or more 3D objects displayed on a display screen, the 3D objects located in a 3D space, the method comprising: determining a 2D location of a pointer on the display screen; computing a 3D ray from the 2D location to a 3D point in the 3D space; generating a 3D boundary around the 3D ray; identifying the one or more 3D objects that intersect the 3D boundary; identifying a 3D object, of the one or more 3D objects, that is closest to a point of view of the 3D space being displayed on the display screen; and providing a focus for interaction on the 3D object that is closest to the point of view.
In another aspect, if the 3D object that is closest to the point of view is interactive, an action is performed upon receiving a user input associated with the pointer.
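The picking method can be sketched by approximating each 3D object with a bounding sphere and using a cylinder of fixed radius around the ray as the 3D boundary. The unprojection from the 2D pointer location to a ray direction is assumed to have been done already, closeness to the point of view is measured to each sphere's center as a simplification, and all names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class SceneObject:
    name: str
    center: Vec3      # hypothetical bounding-sphere center
    radius: float     # hypothetical bounding-sphere radius
    interactive: bool = True


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _length(v: Vec3) -> float:
    return math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)


def distance_to_ray(p: Vec3, origin: Vec3, direction: Vec3) -> float:
    """Distance from p to the ray origin + t * direction, t >= 0 (direction is unit)."""
    v = _sub(p, origin)
    t = max(0.0, sum(a * b for a, b in zip(v, direction)))
    closest = tuple(o + t * d for o, d in zip(origin, direction))
    return _length(_sub(p, closest))


def pick(objects: Sequence[SceneObject], eye: Vec3, direction: Vec3,
         boundary_radius: float) -> Optional[SceneObject]:
    """Object intersecting the cylindrical boundary around the ray, nearest the eye."""
    length = _length(direction)
    d = (direction[0] / length, direction[1] / length, direction[2] / length)
    hits = [o for o in objects
            if distance_to_ray(o.center, eye, d) <= o.radius + boundary_radius]
    if not hits:
        return None
    return min(hits, key=lambda o: _length(_sub(o.center, eye)))


objs = [SceneObject("A", (0.0, 0.0, -10.0), 1.0),
        SceneObject("B", (0.5, 0.0, -5.0), 0.2),
        SceneObject("C", (0.0, 0.0, 5.0), 1.0)]
target = pick(objs, (0.0, 0.0, 0.0), (0.0, 0.0, -1.0), 0.5)
print(target.name if target else None)  # B
```

Widening the boundary radius makes small or distant objects easier to hit. If the returned object's `interactive` flag is set, a click or other user input associated with the pointer would then trigger that object's action.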
In general, a method is provided for organizing data for visualizing one or more 3D objects in a 3D space on a display screen, the method comprising: associating the one or more 3D objects with the 3D space; associating with the 3D space a point of view for viewing the 3D space, the point of view defined by at least a location in the 3D space; and associating a model definition with each of the one or more 3D objects, the model definition comprising a variable definition, a geometry definition, and a logic definition.
In another aspect, the variable definition comprises names of one or more variables and data types of the one or more variables. In another aspect, the logic definition comprises inputs, logic algorithms, and outputs. In another aspect, the geometry definition comprises data structures representing at least one of vertices, polygons, lines and textures. In another aspect, each of the one or more 3D objects is an instance of the model definition, the instance comprising a reference to the model definition and one or more variable values corresponding to the variable definition.
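The data organization described above maps naturally onto a few record types. The following is a minimal sketch using Python dataclasses; all class and field names are hypothetical, the point of view is reduced to a location as the text permits, and the logic algorithm is modeled as a plain function from input values to output values.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class GeometryDefinition:
    vertices: List[Vec3] = field(default_factory=list)
    polygons: List[Tuple[int, ...]] = field(default_factory=list)  # vertex indices
    lines: List[Tuple[int, int]] = field(default_factory=list)
    textures: List[str] = field(default_factory=list)  # e.g. texture file names


@dataclass
class LogicDefinition:
    inputs: List[str]
    algorithm: Callable[[Dict[str, Any]], Dict[str, Any]]
    outputs: List[str]


@dataclass
class ModelDefinition:
    name: str
    variables: Dict[str, type]  # variable definition: name -> data type
    geometry: GeometryDefinition
    logic: LogicDefinition


@dataclass
class ModelInstance:
    model: ModelDefinition  # reference to the shared model definition
    values: Dict[str, Any]  # per-instance variable values

    def __post_init__(self) -> None:
        # Check each value against the data type in the variable definition.
        for name, typ in self.model.variables.items():
            if not isinstance(self.values.get(name), typ):
                raise TypeError(f"{name!r} must be of type {typ.__name__}")

    def evaluate(self) -> Dict[str, Any]:
        """Run the model's logic on this instance's inputs and return its outputs."""
        logic = self.model.logic
        result = logic.algorithm({k: self.values[k] for k in logic.inputs})
        return {k: result[k] for k in logic.outputs}


@dataclass
class Space3D:
    point_of_view: Vec3  # location of the point of view
    objects: List[ModelInstance] = field(default_factory=list)


# A hypothetical "door" model with one boolean variable.
door = ModelDefinition(
    name="door",
    variables={"open": bool},
    geometry=GeometryDefinition(
        vertices=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 0.0, 2.0), (0.0, 0.0, 2.0)],
        polygons=[(0, 1, 2, 3)]),
    logic=LogicDefinition(inputs=["open"],
                          algorithm=lambda v: {"open": not v["open"]},
                          outputs=["open"]),
)
space = Space3D(point_of_view=(0.0, -5.0, 1.5),
                objects=[ModelInstance(door, {"open": False})])
print(space.objects[0].evaluate())  # {'open': True}
```

Because each instance holds only a reference to its model definition plus its own variable values, many objects can share one geometry and logic definition.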
In general, a method is provided for encoding video data for a 3D model, the method comprising: detecting a surface in the video data that persistently appears over multiple video frames; determining a surface of the 3D model that corresponds with the surface in the video data; extracting 2D image data from the surface in the video data; and associating the 2D image data with an angle of incidence between a video sensor and the surface in the video data, wherein the video sensor has captured the video data.
In another aspect, the method further comprises deriving one or more surfaces from the video data, the surface in the video data being one of the one or more surfaces. In another aspect, the method further comprises detecting multiple surfaces in the video data that persistently appear over the multiple video frames and, if the number of those surfaces that correspond to the 3D model is less than a threshold, deriving new surfaces from the video data.
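Two of the encoding steps, finding surfaces that persist across frames and computing the angle of incidence between the video sensor and a surface, can be sketched as follows. Surface detection, matching to surfaces of the 3D model and cropping of the 2D image data are assumed to happen elsewhere; the integer surface IDs, the `EncodedPatch` record and the threshold test are hypothetical illustrations rather than a prescribed format.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple

Vec3 = Tuple[float, float, float]


def persistent_surfaces(frames: Iterable[Set[int]], min_frames: int) -> Set[int]:
    """IDs of video surfaces detected in at least min_frames frames."""
    counts: Dict[int, int] = {}
    for detected in frames:
        for sid in detected:
            counts[sid] = counts.get(sid, 0) + 1
    return {sid for sid, c in counts.items() if c >= min_frames}


def angle_of_incidence_deg(sensor_pos: Vec3, surface_point: Vec3,
                           surface_normal: Vec3) -> float:
    """Angle between the surface normal and the direction from surface to sensor."""
    to_sensor = tuple(s - p for s, p in zip(sensor_pos, surface_point))
    dot = sum(a * b for a, b in zip(to_sensor, surface_normal))
    norms = (math.sqrt(sum(a * a for a in to_sensor))
             * math.sqrt(sum(b * b for b in surface_normal)))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norms))))


@dataclass
class EncodedPatch:
    model_surface_id: int  # surface of the 3D model matched to the video surface
    image: bytes           # 2D image data cropped from the video (placeholder)
    incidence_deg: float   # angle of incidence when the data was captured


def needs_new_surfaces(matched: int, threshold: int) -> bool:
    """Derive fresh surfaces from the video when too few persistent ones match."""
    return matched < threshold


patch = EncodedPatch(7, b"\x00" * 16,
                     angle_of_incidence_deg((10.0, 0.0, 10.0), (0.0, 0.0, 0.0),
                                            (0.0, 0.0, 1.0)))
print(round(patch.incidence_deg, 6))  # 45.0
```

Storing the angle alongside each image lets a decoder choose or blend between captures of the same surface taken from different directions.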
In general, a method is provided for decoding video data encoded for a 3D model, the video data comprising a 2D image and an angle associated with a surface in the 3D model, the method comprising: covering the surface in the 3D model with the 2D image; and interpolating the 2D image based on at least the angle.
In another aspect, the angle is an angle of incidence between a video sensor and a surface in the video data, the surface in the video data corresponding to the surface in the 3D model, wherein the video sensor captured the video data. In another aspect, the 2D image is interpolated also based on an angle at which the 3D model is viewed.
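On the decoding side, the interpolation step can be illustrated by blending stored 2D images captured at different angles of incidence according to the angle at which the surface is currently viewed. This sketch assumes that each image is a flat list of grayscale values of equal length and that linear blending between the two nearest captured angles is acceptable; both are simplifying assumptions, and the texture-mapping step that covers the model surface with the result is not shown. The function name `interpolate_image` is hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple

# (angle of incidence in degrees, image as a flat list of pixel values)
Sample = Tuple[float, Sequence[float]]


def interpolate_image(samples: Sequence[Sample], view_angle: float) -> List[float]:
    """Linearly blend the two captured images whose angles bracket view_angle."""
    ordered = sorted(samples, key=lambda s: s[0])
    angles = [a for a, _ in ordered]
    if view_angle <= angles[0]:
        return list(ordered[0][1])   # clamp below the smallest captured angle
    if view_angle >= angles[-1]:
        return list(ordered[-1][1])  # clamp above the largest captured angle
    i = bisect_left(angles, view_angle)
    (a0, img0), (a1, img1) = ordered[i - 1], ordered[i]
    w = (view_angle - a0) / (a1 - a0)
    return [(1.0 - w) * p0 + w * p1 for p0, p1 in zip(img0, img1)]


captured = [(0.0, [0.0, 0.0]), (60.0, [60.0, 120.0])]
print(interpolate_image(captured, 30.0))  # [30.0, 60.0]
```

A linear blend is the simplest choice; weighting by the cosine of the angles or correcting for foreshortening would be natural refinements.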
In general, a method is provided for controlling a point of view when displaying a 3D space, the method comprising: selecting a focus point in the 3D space, the point of view having a location in the 3D space; computing a distance, an elevation angle and an azimuth angle between the focus point and the location of the point of view; receiving an input to change at least one of the distance, the elevation angle and the azimuth angle; and computing a new location of the point of view based on the input while maintaining the focus point.
In another aspect, the method further comprises selecting a new focus point in the 3D space for the point of view.
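The point-of-view control can be expressed with spherical coordinates centered on the focus point: convert the current location to distance, elevation and azimuth, apply the input, and convert back. This sketch assumes a Z-up coordinate system and angles in radians, and it clamps the elevation just short of the poles to avoid a degenerate view direction; the function names are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def to_orbit(focus: Vec3, eye: Vec3) -> Tuple[float, float, float]:
    """Distance, elevation and azimuth (radians) of the eye relative to the focus."""
    dx, dy, dz = (e - f for e, f in zip(eye, focus))
    distance = math.sqrt(dx * dx + dy * dy + dz * dz)
    elevation = math.asin(max(-1.0, min(1.0, dz / distance)))
    azimuth = math.atan2(dy, dx)
    return distance, elevation, azimuth


def from_orbit(focus: Vec3, distance: float, elevation: float, azimuth: float) -> Vec3:
    """Eye location for the given spherical offset from the focus point."""
    ce = math.cos(elevation)
    return (focus[0] + distance * ce * math.cos(azimuth),
            focus[1] + distance * ce * math.sin(azimuth),
            focus[2] + distance * math.sin(elevation))


def orbit(focus: Vec3, eye: Vec3, d_distance: float = 0.0, d_elevation: float = 0.0,
          d_azimuth: float = 0.0, max_elevation: float = math.radians(89.0)) -> Vec3:
    """New eye location after applying the input deltas; the focus is unchanged."""
    distance, elevation, azimuth = to_orbit(focus, eye)
    distance = max(1e-6, distance + d_distance)  # never pass through the focus
    elevation = max(-max_elevation, min(max_elevation, elevation + d_elevation))
    return from_orbit(focus, distance, elevation, azimuth + d_azimuth)


new_eye = orbit((0.0, 0.0, 0.0), (10.0, 0.0, 0.0), d_azimuth=math.pi / 2)
print([round(c, 6) for c in new_eye])  # [0.0, 10.0, 0.0]
```

Selecting a new focus point only changes the `focus` argument; subsequent inputs then orbit, zoom or tilt around that point.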
The above principles for viewing 3D spatial data may be applied to a number of industries including, for example, mapping, surveying, architecture, environmental conservation, power-line maintenance, civil engineering, real estate, building maintenance, forestry, city planning, traffic surveillance, animal tracking, clothing and product shipping. The different software modules may be used alone or combined together.
The steps or operations in the flow charts described herein are provided by way of example only. There may be many variations to these steps or operations without departing from the spirit of the invention or inventions. For instance, the steps may be performed in a differing order, or steps may be added, deleted, or modified.
While the basic principles of this invention or these inventions have been herein illustrated along with the embodiments shown, it will be appreciated by those skilled in the art that variations in the disclosed arrangement, both as to its details and the organization of such details, may be made without departing from the spirit and scope thereof. Accordingly, it is intended that the foregoing disclosure and the showings made in the drawings will be considered only as illustrative of the principles of the invention or inventions, and not construed in a limiting sense.
The present application claims priority from U.S. Provisional Application No. 61/382,408 filed on Sep. 13, 2010, the entire contents of which are hereby incorporated by reference.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/US2011/051445 | 9/13/2011 | WO | 00 | 8/1/2013 |
Number | Date | Country |
---|---|---|
61382408 | Sep 2010 | US |