Many mobile devices, such as cellular phones and tablets, include cameras to obtain images of scenes. Such mobile devices are convenient for acquiring images since they are frequently used for other communications, the image quality is sufficient for many purposes, and the acquired image can typically be shared with others in an efficient manner. The three dimensional quality of the scene is apparent to the viewer of the image, while only two dimensional image content is actually captured.
Other mobile devices, such as cellular phones and tablets, with a pair of imaging devices are capable of obtaining images of the same general scene from slightly different viewpoints. The acquired pair of images obtained from the pair of imaging devices of generally the same scene may be processed to extract three dimensional content of the image. Determining the three dimensional content is typically done by using active techniques, passive techniques, single view techniques, multiple view techniques, single pair of images based techniques, multiple pairs of images based techniques, geometric techniques, photometric techniques, etc. In some cases, object motion is used for processing the three dimensional content. The resulting three dimensional image may then be displayed on the display of the mobile device for the viewer. This is especially suitable for mobile devices that include a three dimensional display.
The foregoing and other objectives, features, and advantages of the invention will be more readily understood upon consideration of the following detailed description of the invention, taken in conjunction with the accompanying drawings.
While the determination of one or more properties of a three-dimensional scene by a mobile device is advantageous, it is further desirable that the manner of making the selection provide a pleasant user experience. For example, the user preferably interacts with a touch screen display on the mobile device to indicate the desired action. In addition, the mobile device may include two-way connectivity to provide data to, and receive responsive data from, a server connected to a network. The server may include, for example, a database and other processing capabilities. In addition, the mobile device may include a local database together with processing capabilities.
The three dimensional characteristics of an image may be determined in a suitable manner. The mobile device typically includes a pair of cameras which have parallel optical axes and share the same imaging sensor characteristics. In this case, the three-dimensional depth (Z3D) is inversely proportional to the two-dimensional disparity (e.g., disp). With a pair of cameras having parallel optical axes (for simplicity purposes), the coordinate system may be referenced to the left camera. The result of the determination is an estimated depth of the position P in the image. The process may be repeated for a plurality of different points in the image. In another embodiment, the mobile device may use a pair of cameras with non-parallel optical axes, which are either converging or diverging. The 3D coordinates of the matched image points are computed as the intersection point of the 3D rays extended from the original 2D pixels in each image. This process may be referred to as "triangulation". The three dimensional coordinates of the object of interest (namely, x, y, and z) may be determined in any suitable manner. The process may be repeated for a plurality of different points in the image. Accordingly, based upon this information, the distance, length, surface area, volume, etc. may be determined for the object of interest.
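For the non-parallel (converging or diverging) case, the rays extended from the two matched pixels rarely intersect exactly, so a practical triangulation takes the midpoint of the shortest segment between the two rays. The following is a minimal Python sketch of that idea, assuming calibrated cameras whose centers and ray directions are already expressed in a common coordinate system; the function name and the example geometry are illustrative rather than part of the described system.

```python
import numpy as np

def triangulate_midpoint(o1, d1, o2, d2):
    """Estimate a 3D point from two camera rays.

    Each ray is given by an origin (camera center) and a direction in a
    common world coordinate system.  Because the two rays rarely
    intersect exactly, the midpoint of the shortest segment connecting
    them is returned as the estimate.
    """
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    # Solve for the parameters t1, t2 of the closest points
    # o1 + t1*d1 and o2 + t2*d2 on the two rays.
    b = o2 - o1
    a11, a12, a22 = d1 @ d1, d1 @ d2, d2 @ d2
    denom = a11 * a22 - a12 * a12
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel")
    t1 = (a22 * (b @ d1) - a12 * (b @ d2)) / denom
    t2 = (a12 * (b @ d1) - a11 * (b @ d2)) / denom
    p1 = o1 + t1 * d1
    p2 = o2 + t2 * d2
    return (p1 + p2) / 2.0

# Example: two converging cameras 10 cm apart observing the same point.
p = triangulate_midpoint(np.array([0.0, 0.0, 0.0]), np.array([0.05, 0.0, 1.0]),
                         np.array([0.1, 0.0, 0.0]), np.array([-0.05, 0.0, 1.0]))
print(p)   # -> approximately [0.05, 0.0, 1.0]
```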
The user of the mobile device may capture a stereo image pair with active guidance 310 that includes the object of interest.
In many cases, the camera functionality of the mobile device may be operated in a normal fashion to obtain pictures. However, when the three dimensional image capture and determination feature is initiated, the active guidance 310 together with the horizontal line 130 is shown, which is different in appearance from other markings that may occur on the screen of the display during normal camera operation.
Typically, the imaging sensors on mobile devices have a relatively small size with high pixel resolution. This tends to result in images with a substantial amount of noise, especially in low light environments. The high amount of image noise degrades the pixel matching accuracy between the corresponding pair of images, thus reducing the accuracy of the three dimensional position estimation. To reduce the noise, the system checks if the image is noisy 340, and if sufficiently noisy, a noise reduction process 350 is performed, using any suitable technique. Otherwise, the noise reduction process 350 is omitted. The noise reduction technique may include a bilateral filter. The bilateral filter is an edge-preserving and texture-preserving, noise-reducing smoothing filter. The intensity value at each pixel in an image is replaced by a weighted average of the intensity values of nearby pixels. The weight may be based on a Gaussian distribution, and may depend not only on the Euclidean distance between the pixels but also on the radiometric differences (differences in the range, e.g., color intensity). This preserves sharp edges by systematically looping through each pixel and assigning weights to the adjacent pixels accordingly.
The weight assigned to a pixel q in a spatial neighborhood S of the pixel p being filtered may be computed as

wq = G(‖p − q‖) G(|Ip − Iq|)/Wp

where p and q are spatial pixel locations, Ip and Iq are pixel values of pixels p and q, G is a Gaussian distribution function, and Wp is a normalization factor so that the weights sum to one. The new value for pixel p may be computed as

Ip = Σq∈S wq Iq
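As a concrete illustration of this weighting, the following is a minimal brute-force Python/NumPy sketch of a bilateral filter; the window radius and sigma values are illustrative defaults, and a production implementation would more likely call an optimized routine such as OpenCV's cv2.bilateralFilter.

```python
import numpy as np

def bilateral_filter(img, radius=3, sigma_s=2.0, sigma_r=25.0):
    """Brute-force bilateral filter for a single-channel image.

    Each output pixel is a weighted average of its neighbors, where the
    weight is the product of a spatial Gaussian (distance between pixel
    locations) and a range Gaussian (difference in intensity),
    normalized so the weights sum to one.
    """
    img = img.astype(np.float64)
    pad = np.pad(img, radius, mode='reflect')
    out = np.zeros_like(img)

    # Precompute the spatial Gaussian over the (2r+1) x (2r+1) window.
    ax = np.arange(-radius, radius + 1)
    yy, xx = np.meshgrid(ax, ax, indexing='ij')
    spatial = np.exp(-(xx**2 + yy**2) / (2.0 * sigma_s**2))

    h, w = img.shape
    for i in range(h):
        for j in range(w):
            window = pad[i:i + 2*radius + 1, j:j + 2*radius + 1]
            # Range Gaussian: penalize radiometric differences.
            rng = np.exp(-(window - img[i, j])**2 / (2.0 * sigma_r**2))
            weights = spatial * rng
            out[i, j] = np.sum(weights * window) / np.sum(weights)
    return out
```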
After the noise reduction process 350, if applied, the user may touch the screen to identify the points of interest of the object 360. When a user's finger touches the screen, it is preferable that a magnified view of the current finger location is displayed. Since the pixel touched by the finger may not be the exact point that the user desired, it is desirable to refine the user selection 370 by searching a local neighborhood around the selected point to estimate the most salient pixel, which is preferably on an object edge and/or object corner. The matching point in the other view is preferably computed by using a disparity technique.
The refinement of the user selection 370 may be based upon a corner measure computed from the local image structure. An energy function may be defined, for example, as

E(u, v) = Σx,y w(x, y)[I(x+u, y+v) − I(x, y)]²

where w(x, y) is a weighting function and I(x, y) is the image intensity. By Taylor series approximation, E may be, for example,

E(u, v) ≈ [u v] M [u v]T

where M is a 2×2 matrix computed from image derivatives, for example,

M = Σx,y w(x, y)[Ix² IxIy; IxIy Iy²]

where Ix and Iy are the horizontal and vertical image derivatives. The score of a pixel may be computed as

S = det(M) − k(trace(M))²

where k is an empirically determined constant between 0.04 and 0.06.
The pixel with the maximum score 376 is selected to replace the user's selected point 378.
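The corner score above is the well-known Harris measure, so the refinement 370-378 can be sketched as follows; the neighborhood radius, the 5x5 box window standing in for w(x, y), and the function names are illustrative assumptions, and a real implementation might instead call a library routine such as OpenCV's cv2.cornerHarris.

```python
import numpy as np

def refine_selection(gray, x, y, radius=10, k=0.05):
    """Snap a touch point to the most corner-like pixel nearby.

    Computes the corner score S = det(M) - k * trace(M)^2 at every pixel
    of a grayscale image and returns the location of the best score
    within `radius` pixels of the user-selected point (x, y).
    """
    gray = gray.astype(np.float64)
    # Image derivatives (central differences); axis 0 is y, axis 1 is x.
    Iy, Ix = np.gradient(gray)

    def box(img, r=2):
        # Unnormalized box window as a simple choice of w(x, y).
        pad = np.pad(img, r, mode='reflect')
        out = np.zeros_like(img)
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                out += pad[r + dy : r + dy + img.shape[0],
                           r + dx : r + dx + img.shape[1]]
        return out

    Sxx, Syy, Sxy = box(Ix * Ix), box(Iy * Iy), box(Ix * Iy)
    score = Sxx * Syy - Sxy**2 - k * (Sxx + Syy)**2   # det(M) - k*trace(M)^2

    # Restrict the search to the neighborhood of the touched pixel.
    h, w = gray.shape
    y0, y1 = max(0, y - radius), min(h, y + radius + 1)
    x0, x1 = max(0, x - radius), min(w, x + radius + 1)
    local = score[y0:y1, x0:x1]
    dy, dx = np.unravel_index(np.argmax(local), local.shape)
    return x0 + dx, y0 + dy
```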
Based upon the identified points, as refined, the system may determine the matching points 380 for the pair of images. For example, given one pixel in the left image, xl, a matching technique may find its corresponding pixel in the right image, xr.
The extracted reference block 386 is compared with the candidate image blocks 384 to determine a cost value associated with each 387, representative of a similarity measure. The candidate with the smallest cost value is selected 388 as the corresponding pixel location in the other image. The disparity d may be computed as d = xl − xr.
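A minimal Python sketch of this matching 380-388, assuming rectified images so that the search runs along the same row (the epipolar line) and using a sum-of-absolute-differences cost; the window size and search range are illustrative.

```python
import numpy as np

def match_pixel(left, right, x, y, half=4, max_disp=64):
    """Find the pixel in the right image matching (x, y) in the left.

    Extracts a (2*half+1)^2 reference block around (x, y) in the left
    image and slides it along the same row of the right image, scoring
    each candidate with the sum of absolute differences (SAD).  Returns
    the matching x coordinate in the right image and the disparity.
    Assumes (x, y) is at least `half` pixels from the image borders.
    """
    left = left.astype(np.float64)
    right = right.astype(np.float64)
    ref = left[y - half : y + half + 1, x - half : x + half + 1]

    best_cost, best_x = np.inf, None
    for d in range(0, max_disp + 1):          # search leftward in the right image
        xr = x - d
        if xr - half < 0:
            break
        cand = right[y - half : y + half + 1, xr - half : xr + half + 1]
        cost = np.abs(ref - cand).sum()       # SAD cost of this candidate block
        if cost < best_cost:
            best_cost, best_x = cost, xr
    return best_x, x - best_x                 # matching column, disparity d = xl - xr
```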
A limitation on the quantitative accuracy of three dimensional measurements is that the error of the estimated depth is proportional to the square of the absolute depth value and to the disparity error, and thus additional accuracy in the disparity estimate is desirable. The location of the matching point 380 may therefore be further refined for sub-pixel accuracy 390.
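The specific sub-pixel technique 390 is not spelled out here; a common choice, shown below as an assumption, fits a parabola through the matching cost at the best integer disparity and its two neighbors and takes the parabola's minimum.

```python
def subpixel_disparity(costs, d_best):
    """Refine an integer disparity to sub-pixel precision.

    Fits a parabola through the matching costs at the best integer
    disparity and its two neighbors; the parabola's minimum gives a
    sub-pixel offset in the range (-0.5, 0.5).
    """
    c_minus = costs[d_best - 1]
    c_zero = costs[d_best]
    c_plus = costs[d_best + 1]
    denom = c_minus - 2.0 * c_zero + c_plus
    if denom <= 0:                  # degenerate (flat or inverted) fit
        return float(d_best)
    offset = 0.5 * (c_minus - c_plus) / denom
    return d_best + offset
```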
The three dimensional coordinates of the identified points of interest are calculated 400. For the parallel camera configuration, the depth may be computed from the disparity as

Z3D = B·f/disp

where B is the baseline length between the stereo cameras and f is the focal length of both cameras.
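Putting the pieces together, the following sketches the coordinate computation 400 under the parallel-axis assumption; the principal point parameters (cx, cy) and the example numbers are illustrative.

```python
import numpy as np

def point_from_disparity(xl, y, disp, B, f, cx, cy):
    """Recover 3D coordinates from a matched pixel pair.

    Uses the parallel-axis stereo relation Z = B*f/disp, with the
    coordinate system referenced to the left camera.  (xl, y) is the
    pixel in the left image, disp the (possibly sub-pixel) disparity,
    B the baseline in meters, f the focal length in pixels, and
    (cx, cy) the principal point of the left camera.
    """
    Z = B * f / disp                 # depth is inversely proportional to disparity
    X = (xl - cx) * Z / f            # back-project the pixel using the depth
    Y = (y - cy) * Z / f
    return np.array([X, Y, Z])

# Example: 10 cm baseline, 1400-pixel focal length, 20-pixel disparity.
p = point_from_disparity(xl=900, y=620, disp=20.0, B=0.10, f=1400.0, cx=960, cy=540)
print(p)    # Z = 0.10 * 1400 / 20 = 7.0 meters
```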
A measurement error value for the computed 3D coordinates can be predicted 410 for each measurement and visualized on the image (if desired), to indicate how reliable the estimated 3D coordinate values are. It can be represented as a percentage relative to the original absolute value, e.g., +/−5% of 5 meters. The geometric object parameters may be calculated and displayed 420.
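Differentiating Z3D = B·f/disp suggests how such a prediction 410 can be made: the depth error grows with the square of the depth and linearly with the disparity error, consistent with the accuracy discussion above. The quarter-pixel disparity error used below is an assumed value.

```python
def predicted_depth_error(Z, B, f, disp_error=0.25):
    """Predict the depth measurement error for a computed depth Z.

    Differentiating Z = B*f/disp with respect to the disparity gives
    |dZ| ~= Z**2 * disp_error / (B * f): the error is proportional to
    the square of the absolute depth and to the disparity error.
    Returns (absolute error, percentage of Z).
    """
    dZ = (Z ** 2) * disp_error / (B * f)
    return dZ, 100.0 * dZ / Z

# Example: a point 5 m away, 10 cm baseline, f = 1400 pixels.
err, pct = predicted_depth_error(Z=5.0, B=0.10, f=1400.0)
print(f"+/-{err:.2f} m ({pct:.1f}% of 5 meters)")
```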
The terms and expressions which have been employed in the foregoing specification are used therein as terms of description and not of limitation, and there is no intention, in the use of such terms and expressions, of excluding equivalents of the features shown and described or portions thereof, it being recognized that the scope of the invention is defined and limited only by the claims which follow.