This application is related to co-pending and jointly-owned U.S. patent application Ser. No. 11/415,960, entitled “Coverage Mask Generation For Large Images,” filed May 2, 2006, which application is incorporated by reference herein in its entirety.
The subject matter of this application is generally related to image processing.
Dramatic improvements in computer processing power and broadband streaming technology have led to the development of interactive three-dimensional (3D) map systems for navigating the Earth. Interactive 3D map systems typically provide a user interface (UI) with navigation controls for dynamically navigating cities, neighborhoods and other terrain. The navigation controls enable users to tilt, pan, rotate and activate 3D terrain and buildings for different perspectives at a point of interest.
The production imagery used by interactive 3D map systems is typically derived by processing large pieces of geo-located imagery or “assets,” which can be taken from a single pass of a satellite or stitched together from multiple aerial photos. Once the imagery is processed it can be moved to datacenters where it can be distributed to other devices. To ensure that accurate 3D maps are generated, the production imagery is periodically updated in the datacenters. Unfortunately, the updating of large scale satellite imagery (and terrain data) for the entire Earth can be a time consuming and laborious process.
Assets of raw geo-located imagery can be divided into tiles or other shapes and coverage masks can be generated for each tile. For each tile, fragments of pixels from coverage masks of neighboring tiles can be extracted and tagged. The fragments can be sorted and stored in a data structure so that fragments having the same tag can be grouped together in the data structure. The fragments can be used to feather the coverage mask of the tile to produce a blend mask.
Multi-resolution imagery and mask pyramids can be generated by extracting fragments from tiles and minifying (e.g., down-sampling) the fragments. The minified fragments can be tagged (e.g., by ancestor tile name), sorted and stored in a data structure, so that fragments having like tags can be stored together in the data structure. The fragments can be assembled into fully minified tiles for each level in the image pyramids.
Output tiles from the processes described above can be output into a first projection (e.g., latitude/longitude). The first projection can be re-projected into a second projection (e.g., a Mercator projection) using techniques that minimize distortion in the re-projected imagery.
In some implementations, a method of generating a blend mask includes: dividing imagery into tiles; retrieving a coverage mask associated with a tile defining the location of imagery in the tile, where the tile is at least partially surrounded by neighboring tiles; extracting image fragments from coverage masks associated with the neighboring tiles; tagging the image fragments; organizing the image fragments into groups by tags; and feathering the boundaries of the coverage masks using fragments having the same tags to generate a blend mask for the tile.
In some implementations, a method of generating a multi-resolution image pyramid includes: dividing imagery into tiles; for each resolution level in the image pyramid, generating a minified fragment from each tile; tagging the minified fragments with ancestor tile identifiers; grouping fragments with the same tag; and compositing minified fragments with the same tag to produce minified tiles at each level of the image pyramid.
In some implementations, an imagery re-projection method includes: dicing imagery in a first projection into tiles; for each tile, determining output resolution levels to which the tile will be mapped; generating fragments from the tiles for each output level; tagging the fragments with output tile identifiers; grouping fragments with the same tag; and assembling fragments with the same tag to produce output tiles in a second projection.
Other implementations of large-scale image processing using mass parallelization techniques are disclosed, including implementations directed to systems, methods, apparatuses and computer-readable mediums.
In some implementations, the UI 100 is generated and presented by a user device. A client application on the user device can communicate with one or more datacenters over a network (e.g., the Internet, intranet, wireless network, etc.) to retrieve imagery and associated meta-data from one or more server systems in the datacenters. Such a client/server architecture reduces the amount of data that a user device stores locally to operate the interactive 3D map system. The imagery provided by datacenters can be generated from raw satellite imagery and other information (e.g., terrain data, vector data, etc.) which is processed before being served to user devices, as described with respect to
In some implementations, the blending process 204 orders and blends together processed images generated by the ingestion process 202. The blended image products are made available to datacenters 210 through a file system 206 and a delivery channel 208. The preproduction phase can be implemented using mass parallelization techniques, as described with respect to
In the production phase, one or more datacenters 210 retrieve the image products from the file system 206 and deliver the image products to user devices 212 through a network 214 (e.g., Internet, intranet, Ethernet, wireless network, etc.). The image products can include imagery and associated meta-data for one or more locations on the Earth. An exemplary file system 206 can be Google Inc.'s Google File System (GFS), as described in Ghemawat, Sanjay et al., “The Google File System,” Association For Computing Machinery (ACM), 19th Symposium On Operating System Principles (SOSP), Oct. 19-22, 2003, Lake George, N.Y., which article is incorporated by reference herein in its entirety.
User devices 212 can be any electronic device capable of displaying a map, including but not limited to: personal computers (portable or desktop), mobile phones, smart phones, personal digital assistants (PDAs), game consoles, high definition televisions, set-top boxes, navigation systems (e.g., global positioning system (GPS)), avionics displays, etc. The system 200 is exemplary and other configurations and arrangements for image processing and delivery are possible. For example, the ingestion and blending processes could be performed in the datacenters. Also, the tile imagery and meta-data could be provided to the datacenters by different sources.
Large pieces of geo-located imagery are taken from a single pass of a satellite or are stitched together from multiple aerial photos. These raw images or “assets” can be received from one or more sources and can have a variety of orientations. The assets can be re-projected 302 into a suitable coordinate system for the map system (e.g., a geospatial coordinate system) and stored in one or more data structures 312 (e.g., database table). In some implementations, the re-projected assets are divided 304 into tiles which are processed independently in a parallel processing infrastructure. Ideally, the tiles are stored so that tiles containing imagery for geographic locations that are close to each other have a high probability of being stored on the same machine or in the same machine cluster, which reduces the overhead associated with accessing information located on multiple machines. To achieve this ideal condition, the tiles can be sized to fall within the storage constraints of the machines or a cluster of machines. The assets can be divided into any desired shape. A tile shape, however, typically requires less computational and/or representational overhead during processing.
After an asset is re-projected and divided into tiles, the tiles can be minified (i.e., down-sampled) and stored in the data structure 312. The size of a minified tile can be selected so that the minified tile can fit into the memory of a single machine to facilitate efficient parallel processing, as previously described.
A coverage mask can be generated 306 for each minified tile and stored in the data structure 312. A coverage mask is essentially a 2D mask through which the imagery can be seen. Optionally, an “alpha” value can be associated with each mask for fading the imagery during the blending process 204. A coverage mask can be, for example, a binary file that contains a binary number for each pixel in a tile. A binary “1” can indicate the presence of imagery in a pixel and a binary “0” can indicate the absence of imagery in a pixel. When the coverage mask is mapped to its associated processing tile, the pixels that do not contain imagery can be masked out.
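By way of illustration only, the binary coverage mask described above can be sketched in Python as follows. The function names `coverage_mask` and `apply_mask`, and the use of `None` to denote a pixel without imagery, are assumptions made for this sketch and are not part of the described system.

```python
def coverage_mask(tile, no_data=None):
    """Return a binary mask: 1 where the tile holds imagery, 0 elsewhere."""
    return [[0 if px is no_data else 1 for px in row] for row in tile]


def apply_mask(tile, mask, fill=0):
    """Mask out pixels that do not contain imagery, replacing them with `fill`."""
    return [[px if m else fill for px, m in zip(row, mrow)]
            for row, mrow in zip(tile, mask)]


# Hypothetical 3x3 tile; None marks pixels with no imagery.
tile = [[None, 12, 40],
        [None, None, 35],
        [7, 9, 21]]
mask = coverage_mask(tile)
masked = apply_mask(tile, mask)
```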
After the coverage masks are generated they can be feathered into blend masks and stored in the data structure 312. The blend masks can be used to feather the boundaries of high-resolution imagery against coarser resolution imagery during the blending process 204. The feathering of coverage masks can be done by replacing each pixel with a function of the average of other pixels in a small box around the pixel, as described with respect to
During feathering, the fragments 504a, 504b, 504c, 504d, 504f, 504g, 504h, 504i that contribute are assembled with the coverage mask for Tile E. In this example, the fragments 504a, 504b, 504c, 504d, 504f, 504g, 504h, 504i, extracted from coverage masks for neighboring Tiles A-D and F-I, are assembled around the coverage mask for Tile E. A mask feathering filter 606 is used to average or smooth all the pixel values. The pixels that are surrounded by neighboring pixels of the same value can remain unchanged, while pixels that are surrounded by neighboring pixels with value 0 and neighboring pixels with value 1 can be assigned a fractional value between 0 and 1, as a result of the averaging process. Note that each of the fragments 504a, 504b, 504c, 504d, 504f, 504g, 504h, 504i can contribute a value to the average calculation. Any suitable known mask feathering filter can be used (e.g., Gaussian blur filter).
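The assembly of neighbor fragments around a tile's coverage mask can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names, the keying of fragments by a (row offset, column offset) pair, and the treatment of a missing neighbor as "no data" are all hypothetical choices made for the example.

```python
def assemble_neighborhood(center, frags, r):
    """Surround an n x n center mask with r-pixel-wide neighbor fragments.

    `frags` maps (dy, dx) offsets in {-1, 0, 1} to fragment pixel arrays;
    neighbors that are absent are treated as having no data (0).
    """
    n = len(center)
    size = n + 2 * r
    out = [[0] * size for _ in range(size)]
    for y in range(n):
        for x in range(n):
            out[y + r][x + r] = center[y][x]
    spans = {-1: (0, r), 0: (r, n), 1: (r + n, r)}  # offset -> (start, length)
    for (dy, dx), frag in frags.items():
        y0, h = spans[dy]
        x0, w = spans[dx]
        for y in range(h):
            for x in range(w):
                out[y0 + y][x0 + x] = frag[y][x]
    return out


def crop(padded, r):
    """Remove the r-pixel border again once filtering is complete."""
    return [row[r:-r] for row in padded[r:-r]]


center = [[1, 1], [1, 1]]
east = {(0, 1): [[1], [1]]}          # fragment from the tile to the right
padded = assemble_neighborhood(center, east, r=1)
```

A feathering filter would be run over `padded` before `crop` restores the original tile dimensions.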
In some implementations, simple averaging can be performed using summed area tables or other acceleration structures. In the simple averaging process, pixel values are not directly averaged, since directly averaging pixel values can transform pixels that have no image data (e.g., coverage mask=0) into pixels with fractional blend values, which could blend in regions with no data. Instead, only pixel values within the coverage mask are averaged. If the average pixel value is below a threshold value (e.g., 0.6), the pixel can become fully transparent (e.g., blend mask=0). If the average pixel value is within a predetermined range (e.g., between 0.6 and 1.0), then the resulting blend mask can be set to a value between 0 and 1. Upon completion of the feathering process, the fragments 504a, 504b, 504c, 504d, 504f, 504g, 504h, 504i are extracted and the coverage mask for Tile E (with feathered edges) is stored in the data structure 312 for use as a blend mask in the blending process 204.
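One possible reading of the thresholded averaging described above is sketched below. The sketch makes several assumptions that are not stated in the text: a square box of radius 1, a direct window sum in place of a summed area table, a window clipped at the array edge, and a linear remapping of averages in the range 0.6 to 1.0 onto blend values in the range 0 to 1.

```python
def feather(mask, radius=1, threshold=0.6):
    """Turn a binary coverage mask into a blend mask with tapered edges."""
    h, w = len(mask), len(mask[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x] == 0:
                continue  # pixels with no data stay fully transparent
            total = count = 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += mask[yy][xx]
                    count += 1
            avg = total / count
            if avg >= threshold:
                # Assumed linear ramp from 0 at the threshold to 1 at full coverage.
                out[y][x] = (avg - threshold) / (1.0 - threshold)
    return out


# Imagery occupies the right three columns of a 5x5 mask.
mask = [[0, 0, 1, 1, 1] for _ in range(5)]
blend = feather(mask)
```

In this example, pixels outside the coverage remain 0, pixels well inside become 1, and the column at the coverage boundary receives an intermediate value.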
Each fragment can be tagged by destination using a neighboring tile ID or key (706). In the example of
After the fragments are tagged and sorted, the fragments can be used by a machine in a parallel processing infrastructure to feather the boundaries of a tile coverage mask using, for example, a mask feathering filter (710). The mask feathering filter can be, for example, a low pass filter that averages pixels in the coverage mask and the fragments so that the boundaries of the coverage masks gradually taper from fully opaque to fully transparent.
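The tagging and sorting of fragments by destination tile can be illustrated with the following sketch. The 3×3 layout of tile IDs and the function names are hypothetical, and the source tile ID stands in for the fragment's pixel data, which a real system would carry alongside the tag.

```python
from collections import defaultdict

GRID = ["ABC", "DEF", "GHI"]  # hypothetical 3x3 layout of tile IDs


def emit_fragments(grid):
    """Each tile emits one fragment per neighbor, tagged with the neighbor's ID."""
    for r, row in enumerate(grid):
        for c, src in enumerate(row):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr or dc) and 0 <= rr < len(grid) and 0 <= cc < len(row):
                        yield grid[rr][cc], src  # (tag = destination ID, source ID)


def group_by_tag(pairs):
    """Sort the tagged fragments so that equal tags end up together."""
    groups = defaultdict(list)
    for tag, frag in sorted(pairs):
        groups[tag].append(frag)
    return dict(groups)


groups = group_by_tag(emit_fragments(GRID))
```

After grouping, the machine that feathers Tile E finds the fragments from all eight neighbors under the single key `"E"`.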
The feathered coverage mask (i.e., a blend mask) can be stored separately from its associated imagery to avoid damage to the imagery caused by, for example, additional compression. The blend mask can be stored losslessly since it usually consists of low-frequency data and compresses well. By storing the blend mask separately, the feathering parameters (e.g., the width of the feathered region) can be changed without affecting the imagery.
In some implementations, the blend mask can be used in an extra culling round before retrieving the imagery for blending. For example, at blend time if only a small portion of a tile is left un-opaque after a few high-resolution assets have been blended, it is possible that by looking at the blend mask a determination can be made that blending a lower-resolution asset will not touch the “available” part of the tile, and thereby avoid reading or decoding its associated imagery.
An approximation A′ of image A can be constructed with an upsampler 806 and an interpolator filter 808. The approximation image A′ can then be subtracted from the original image A to generate a prediction residual $E_J = A - A'$. The original image A can be reconstructed using $A_{J-1}$ and $E_J$. Other levels can be generated by downsampling or upsampling as needed.
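The relationship between an image, its coarser level and the prediction residual can be shown with a one-dimensional sketch. The pair-averaging downsampler and the sample-repeating upsampler used here are assumptions chosen for brevity; any upsampler and interpolation filter pair could be substituted.

```python
def downsample(a):
    """Halve the resolution by averaging adjacent pairs of samples."""
    return [(a[i] + a[i + 1]) / 2 for i in range(0, len(a), 2)]


def upsample(a):
    """Double the resolution by repeating each sample (nearest neighbor)."""
    return [v for v in a for _ in range(2)]


A = [4.0, 6.0, 10.0, 2.0]            # original image (one row)
A_coarse = downsample(A)             # next coarser pyramid level
A_approx = upsample(A_coarse)        # approximation A' of the original
E = [x - y for x, y in zip(A, A_approx)]                    # residual A - A'
reconstructed = [x + y for x, y in zip(upsample(A_coarse), E)]
```

The original row is recovered exactly from the coarser level and the residual.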
The pyramid building process is best described by example using actual numbers for dimensions. It should be apparent, however, that the numbers described are only examples and that actual dimensions can differ. During image pyramid generation, a 1024×1024 image can be divided into tiles having 8×8 pixels to form a grid of 128×128 tiles. To get the original 1024×1024 image downsampled once, the 8×8 tiles can be composited back into the 1024×1024 image and downsampled to a 512×512 image. The 512×512 image can then be divided into 8×8 tiles forming a grid of 64×64 tiles.
This approach can be improved by downsampling each of the original 8×8 tiles, producing a 128×128 grid of fragments, where each fragment includes 4×4 pixels. Each 2×2 group of fragments can be composited together. Note that in this example each fragment includes 4×4 pixels, so a 2×2 composite of fragments will include 8×8 pixels. Thus, a fragment is an entire finest-level tile downsampled to a lower resolution.
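The equivalence between the two approaches can be checked with a small sketch. To keep the example short, it uses a 16×16 image rather than 1024×1024, and it assumes a 2×2 box filter as the downsampler; the function names are illustrative only.

```python
def downsample2(img):
    """Halve resolution by averaging each 2x2 block (box filter)."""
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4
             for x in range(0, len(img[0]), 2)]
            for y in range(0, len(img), 2)]


def split(img, t):
    """Divide an image into t x t tiles keyed by (row, column)."""
    return {(r, c): [row[c * t:(c + 1) * t] for row in img[r * t:(r + 1) * t]]
            for r in range(len(img) // t) for c in range(len(img[0]) // t)}


def composite(frags, rows, cols):
    """Paste a rows x cols block of equally sized fragments into one image."""
    out = []
    for r in range(rows):
        for y in range(len(frags[(r, 0)])):
            out.append([v for c in range(cols) for v in frags[(r, c)][y]])
    return out


image = [[(7 * y + 3 * x) % 11 for x in range(16)] for y in range(16)]
tiles = split(image, 8)                                   # 2x2 grid of 8x8 tiles
fragments = {k: downsample2(t) for k, t in tiles.items()}  # 4x4 fragments
minified_tile = composite(fragments, 2, 2)                 # one 8x8 tile
```

Because the box filter never straddles a tile boundary, compositing the independently downsampled fragments gives the same result as downsampling the whole image, without ever reassembling the full-resolution image.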
In this example, each original, high-resolution 8×8 tile knows exactly which ancestor it will contribute to. For example, the tile at row 48, column 33 (numbered from the top left in the 128×128 grid) will contribute to the upper right quadrant of the tile at row 24, column 16 in the downsampled 64×64 grid. That is, to construct a downsampled tile at row 24, column 16, in the 64×64 grid, the system needs to downsample the imagery from tiles (48, 32), (48, 33), (49, 32) and (49, 33) in the full resolution grid (the 128×128 grid), where the numbers in the parentheses are the row and column of the tile in the full resolution grid.
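The row and column arithmetic in this example can be expressed as a short sketch. The function names `parent` and `children` are hypothetical; rows and columns are numbered from zero at the top left, as in the example.

```python
def parent(row, col):
    """Return the ancestor tile one level up and the quadrant this tile fills."""
    vertical = "upper" if row % 2 == 0 else "lower"
    horizontal = "left" if col % 2 == 0 else "right"
    return (row // 2, col // 2), f"{vertical} {horizontal}"


def children(row, col):
    """Return the four finer-level tiles that make up a downsampled tile."""
    return [(2 * row + dr, 2 * col + dc) for dr in (0, 1) for dc in (0, 1)]


ancestor, quadrant = parent(48, 33)
sources = children(24, 16)
```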
Since each finest level tile knows which downsampled ancestor tile it will contribute to, the downsampled fragment can be tagged with the ancestor tile ID (e.g., a unique string identifying the tile). The tagged fragments can be sorted to group or cluster together fragments with the same tags (i.e., all the fragments for a particular ancestor tile), and the fragments can be composited into a fully minified tile. Note that each fragment also knows its respective position (i.e., quadrant) in the ancestor tile. An exemplary sorter that is suitable for use in a parallel processing infrastructure is described in Dean, Jeffrey et al. “MapReduce: Simplified Data Processing on Large Clusters.”
The process described above produces one level of the multi-resolution image pyramid. To generate an entire pyramid, each 8×8 tile outputs a 2×2 fragment in addition to the 4×4 fragment, where the 2×2 fragment has been downsampled twice, and also a 1×1 fragment (downsampled three times), and so forth. In our example, for the tile at row 48, column 33 in the original 128×128 grid, the 2×2 fragment would contribute to tile (12, 8) in the 32×32 grid (corresponding to the original image downsampled twice, or 256×256 total resolution), and the 1×1 fragment would go to tile (6, 4) in the 16×16 grid (original image downsampled three times, or 128×128 total resolution). The sorting step works the same as before, except that for successively higher levels in the image pyramid, more and more fragments contribute to each output tile. For example, fragments from 16 original tiles contribute to each output tile in the 32×32 grid, and fragments from 64 original tiles contribute to each output tile in the 16×16 grid.
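The figures in this example can be reproduced with the following sketch, which assumes the 8×8 finest-level tiles and 128×128 grid used above. The function names are illustrative only.

```python
from collections import Counter


def ancestors(row, col, levels):
    """List (level, ancestor tile, fragment width) for one 8x8 finest-level tile."""
    return [(k, (row >> k, col >> k), 8 >> k) for k in range(1, levels + 1)]


def contributors(grid, level):
    """Count how many finest-level tiles feed each output tile at a level."""
    return Counter((r >> level, c >> level)
                   for r in range(grid) for c in range(grid))


chain = ancestors(48, 33, 3)
level2 = contributors(128, 2)   # 32x32 grid of output tiles
level3 = contributors(128, 3)   # 16x16 grid of output tiles
```

Here `chain` lists tile (24, 16) with a 4×4 fragment, tile (12, 8) with a 2×2 fragment, and tile (6, 4) with a 1×1 fragment.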
In some implementations, the blend masks associated with the tiles are also minified using known averaging techniques, and the imagery is downsampled using weighting from the minified blend mask. The weighting prevents no-data pixels from contributing artifacts (e.g., “black creep”) to the minified tiles, while fully retaining and anti-aliasing the data present in the original image.
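A minimal sketch of mask-weighted downsampling follows. It assumes a 2×2 box, a blend mask with values between 0 and 1, and an output value of 0 where no input pixel carries data; these choices and the function name are assumptions made for illustration.

```python
def weighted_downsample(img, mask):
    """Downsample imagery by 2, weighting each pixel by its blend mask value."""
    out_img, out_mask = [], []
    for y in range(0, len(img), 2):
        img_row, mask_row = [], []
        for x in range(0, len(img[0]), 2):
            cells = [(y, x), (y, x + 1), (y + 1, x), (y + 1, x + 1)]
            wsum = sum(mask[j][i] for j, i in cells)
            vsum = sum(mask[j][i] * img[j][i] for j, i in cells)
            img_row.append(vsum / wsum if wsum else 0.0)  # no-data pixels get no vote
            mask_row.append(wsum / 4)                      # minified blend mask
        out_img.append(img_row)
        out_mask.append(mask_row)
    return out_img, out_mask


# Left column holds imagery (value 100); right column has no data.
img = [[100, 0], [100, 0]]
mask = [[1, 0], [1, 0]]
small_img, small_mask = weighted_downsample(img, mask)
```

An unweighted average would produce 50 for the output pixel, darkening the imagery; the weighted average keeps the value at 100 while the minified mask records that only half of the pixel is covered.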
FIGS. 10a through 10c illustrate distortion of imagery when mapping between two coordinate systems.
Converting between projections for large imagery databases is challenging since the division into tiles is different. Several fragments from multiple tiles in the original projection can combine to form a tile in the final projection. Likewise, one original tile can contribute fragments to multiple output tiles. Since converting between projections can involve distorting the imagery, applications may need to adjust the resolution level during the conversion process so pixels are never “stretched” by more than a predetermined factor (e.g., a factor of 2); otherwise, the resulting images can look distorted (e.g., blurry, stretched, pixelated, etc.).
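As a rough illustration of bounding the stretch, the sketch below uses the standard Mercator scale factor, which grows as the secant of the latitude. The rule for choosing the number of 2× minifications is a hypothetical one introduced only for this example; the text does not specify how levels are selected.

```python
import math


def mercator_stretch(lat_deg):
    """Mercator scale factor relative to the equator: 1 / cos(latitude)."""
    return 1.0 / math.cos(math.radians(lat_deg))


def minifications(lat_deg, max_stretch=2.0):
    """Hypothetical rule: halve until the residual stretch is below max_stretch."""
    stretch = mercator_stretch(lat_deg)
    k = 0
    while stretch / (2 ** k) >= max_stretch:
        k += 1
    return k, stretch / (2 ** k)


equator = minifications(0.0)
lat70 = minifications(70.0)
lat80 = minifications(80.0)
```

Under this assumed rule, imagery at 70 degrees latitude would be minified once and imagery at 80 degrees twice, leaving a residual stretch below 2 in both cases.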
Once the output levels are determined, then tile fragments are generated for the output levels (1106). This can be conceptualized as stretching the input tile according to the Mercator re-projection formula, then minifying it by each of the factors of the formula to produce several minified versions. For example, for a magnification of 2, a 2×2 box of input pixels can be transformed into a 1×2.71 box by minifying horizontally and stretching vertically. A box filter can be used on each horizontal row of pixels, and the filtered results can be interpolated vertically. For a magnification factor of 4, a 4×4 box of input pixels will map to a 1×2.71 box in output space. Or equivalently, one output pixel receives contributions from 4×1.476 input pixels. An average of the pixels in a 4×2 box can be used. The minified versions can be intersected with the output tile boundaries on those output levels. In practice, the needed fragments can be created as empty images with the proper dimensions, then filled pixel by pixel. For each pixel in an output fragment, the input pixels that contribute to the output pixel are averaged together to fill the output pixel.
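The pixel-by-pixel filling of an output fragment can be sketched as follows, using the factor-of-4 figures from the example above (4 input columns and about 1.476 input rows per output pixel, averaged over a 4×2 box). The function name, the way the starting row is rounded down, and the clipping at the bottom edge are assumptions of this sketch.

```python
def fill_fragment(src, h_factor=4, v_step=1.476, box_h=2):
    """Create an empty output fragment, then fill it by averaging input boxes."""
    in_h, in_w = len(src), len(src[0])
    out_h, out_w = int(in_h / v_step), in_w // h_factor
    out = [[0.0] * out_w for _ in range(out_h)]   # empty fragment of proper size
    for j in range(out_h):
        y0 = int(j * v_step)                      # first contributing input row
        rows = range(y0, min(y0 + box_h, in_h))
        for i in range(out_w):
            cols = range(i * h_factor, (i + 1) * h_factor)
            vals = [src[y][x] for y in rows for x in cols]
            out[j][i] = sum(vals) / len(vals)
    return out


flat = [[5] * 8 for _ in range(8)]
ramp = [[x for x in range(8)] for _ in range(8)]
flat_out = fill_fragment(flat)
ramp_out = fill_fragment(ramp)
```

An 8×8 input yields a fragment 5 pixels tall and 2 pixels wide, reflecting horizontal minification by 4 combined with a smaller vertical reduction.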
Each fragment is tagged with the label or name of its corresponding output tile (1108) and sorted so that fragments with the same tag are grouped together (1110). Fragments with the same tag are assembled together to produce the output tile in the Mercator projection (1112). Some fragments may overlap if they are generated from different input levels. In such cases, the fragments generated from higher resolution input levels are placed on top of ones from lower resolution input levels.
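The assembly of same-tag fragments into an output tile, with fragments from higher resolution input levels placed on top, can be sketched as follows. The tuple layout of a fragment and the use of `None` for unfilled pixels are assumptions of this example.

```python
def assemble_output_tile(size, fragments):
    """Paint fragments into a size x size tile, finest input level last.

    Each fragment is (input_level, y0, x0, pixels), where a larger
    input_level means higher resolution source imagery.
    """
    out = [[None] * size for _ in range(size)]
    for _level, y0, x0, pixels in sorted(fragments, key=lambda f: f[0]):
        for dy, row in enumerate(pixels):
            for dx, value in enumerate(row):
                out[y0 + dy][x0 + dx] = value
    return out


fragments = [
    (5, 0, 0, [[9, 9]]),                    # higher resolution input level
    (3, 0, 0, [[1, 1, 1], [1, 1, 1]]),      # lower resolution input level
]
tile = assemble_output_tile(3, fragments)
```

Where the two fragments overlap, the pixels from input level 5 replace those from input level 3.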
In the sorting phase, some overlapping fragments may end up completely obscured. This condition can be detected using a pre-pass optimization process. The optimization process performs a mock transformation on the input tile and computes fragment dimensions without filling the fragments with pixels. These hidden fragments can be tagged with a special marker (e.g., a marker in a table in the data structure 312) so that the pixel filling step can be skipped.
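One way such a pre-pass might detect hidden fragments from their dimensions alone is sketched below. The rectangle representation, the coverage grid and the function name are hypothetical, and the sketch does not address overlaps between fragments from the same input level.

```python
def hidden_fragments(size, rects):
    """Return names of fragments fully obscured by higher resolution ones.

    Each rect is (name, input_level, y0, x0, height, width); no pixel data
    is needed, only the computed fragment dimensions.
    """
    covered = [[False] * size for _ in range(size)]
    hidden = []
    for name, _level, y0, x0, h, w in sorted(rects, key=lambda r: -r[1]):
        cells = [(y, x) for y in range(y0, y0 + h) for x in range(x0, x0 + w)]
        if all(covered[y][x] for y, x in cells):
            hidden.append(name)       # pixel filling can be skipped
        for y, x in cells:
            covered[y][x] = True
    return hidden


rects = [
    ("lo", 3, 0, 0, 2, 2),
    ("hiA", 5, 0, 0, 2, 1),
    ("hiB", 5, 0, 1, 2, 1),
    ("mid", 4, 0, 0, 4, 4),
]
skipped = hidden_fragments(4, rects)
```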
Various modifications may be made to the disclosed implementations and still be within the scope of the following claims.