The following presents a simplified summary of the disclosure in order to provide a basic understanding to the reader. This summary is not an exhaustive or limiting overview of the disclosure. The summary is not provided to identify key and/or critical elements of the invention, delineate the scope of the invention, or limit the scope of the invention in any way. Its sole purpose is to present some of the concepts disclosed in a simplified form, as an introduction to the more detailed description that is presented later.
Typical stereo camera algorithms assume that the signals coming in from the different cameras are synchronized and that the cameras are rectified. Perfectly synchronized camera signals mean that the cameras take pictures at exactly the same time. In reality, perfect synchronization is typically not possible; however, practical implementations rely on an assumption that the camera signals are substantially synchronized. Substantially synchronized cameras may be synchronized to within approximately a few milliseconds of each other. Rectified camera signals mean that the images are co-planar, i.e., rectified cameras have co-planar imaging planes. In many cases, stereo algorithms may be run with only ‘loose’ synchronization; however, they may still require accurate rectification.
Images from multiple cameras may be transformed, such as through projective-geometry-based techniques, into new images which behave as if they had been acquired by co-planar cameras. This geometric transformation of the images is called epipolar rectification. This process is typically performed on synchronized cameras; however, using unsynchronized cameras may often be cheaper or more convenient.
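For illustration only, epipolar rectification may be sketched as the application of one homography per camera, so that the resulting images behave as if they came from co-planar cameras. The sketch below assumes the OpenCV library and assumes that rectifying homographies H1 and H2 have already been estimated (for example, as described later with respect to the homography engine); the function and variable names are illustrative and do not limit the description.

```python
import cv2

def apply_rectification(img_left, img_right, H1, H2):
    """Warp each image by its rectifying homography so that the two image
    planes behave as if they were co-planar (epipolar rectification).
    H1 and H2 are 3x3 homographies assumed to have been estimated elsewhere,
    e.g., by the homography engine described below."""
    h, w = img_left.shape[:2]
    rect_left = cv2.warpPerspective(img_left, H1, (w, h))
    rect_right = cv2.warpPerspective(img_right, H2, (w, h))
    return rect_left, rect_right
```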
A matching system may be used to match unsynchronized camera signals. In one example, images from the cameras may be matched by synchronizing those images for which the depicted scene is determined to be static. Alternatively or additionally, the images from one camera may be compared to images from another camera to determine the best synchronous match. These matched, synchronized image portions may be processed to generate a transformation structure that may be used to rectify the images, e.g., to transform images from the cameras as if the cameras had co-planar image planes, or that may be used in any other process, such as calibration of the cameras. Camera calibration means estimating the internal and external parameters of a camera, such as the focal length and/or the relative geometry of the multiple cameras.
The foregoing aspects and many of the attendant advantages of this invention will become more readily appreciated as the same become better understood by reference to the following detailed description, when taken in conjunction with the accompanying drawings, wherein:
Exemplary Operating Environment
Although not required, the matching system will be described in the general context of computer-executable instructions, such as program modules, being executed by one or more computers or other devices. Generally, program modules include routines, programs, objects, components, data structures, etc., that perform particular tasks or implement particular abstract data types. Typically, the functionality of the program modules may be combined or distributed as desired in various environments.
With reference to
Device 100 may also contain communication connection(s) 112 that allow the device 100 to communicate with other devices. Communications connection(s) 112 is an example of communication media. Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term ‘modulated data signal’ means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency, infrared, and other wireless media. The term computer readable media as used herein includes both storage media and communication media.
Device 100 may also have input device(s) 114 such as keyboard, mouse, pen, voice input device, touch input device, laser range finder, infra-red cameras, video input devices, and/or any other input device. Output device(s) 116 such as display, speakers, printer, and/or any other output device may also be included.
As noted above, the images 212, 214 to be transformed must be of the same scene 250, which as shown in
However, some cameras 212, 214 may be un-synchronized in time. For example,
In addition, with a moving image of a person and/or a moving background, the matched images, e.g., image 326 and image 348, may not depict the same scene content, e.g., some portion of the scene may have changed between the time image 326 is taken and the time image 348 is taken. If the camera signals are not synchronized, the feature point matcher engine 232 may not be able to accurately match features of the images 326, 348. However, rectification of unsynchronized cameras can be achieved from a set of matched image portions of a static scene. For example, if nothing is moving, then there is no difference between synchronized and unsynchronized cameras. Therefore, unsynchronized cameras may be rectified by automatically detecting instances where the scene is static.
An example matching system 400 is shown in
Each camera may provide a plurality of images which are spaced over time, e.g., a ‘movie’ of the scene 450. For example, camera 402 may provide two images 412, 414 which are spaced apart in time, and camera 404 may provide two images 416, 418 which are also spaced apart in time.
The sequence of images 412, 414, 416, 418 from each camera may be output to a motion detector engine 460. The motion detector engine may detect images where at least a portion of the scene is moving. For example, the motion detector engine 460 may use optical flow techniques to detect when a portion of an image from one of the cameras 402, 404 moves over time relative to one or more other images from the same camera. In one example, the motion detector engine may compare the image 412 with the image 414, such as by image differencing. More particularly, image 412 may be subtracted from image 414, and the image difference between image 412 and image 414 may indicate a movement in the detected scene 450 represented in the images from camera 402. In this manner, the motion detector engine may detect images indicating scene movement for any combination of the cameras providing image input to the motion detector engine. In one example, the motion detector engine may detect images from camera 402 indicating scene movement, and may detect images from camera 404 indicating scene movement. In an alternative example, the motion detector engine may detect movement in images from one camera (or a subset of the input cameras) and assume that images from the other cameras with time frames sufficiently close to the time of movement will also show movement.
The difference between two images from the same camera may be corrupted by electronic noise. Accordingly, a predetermined threshold value may be compared to the image difference to determine if the indicated scene is sufficiently static. Any appropriate threshold may be used and may depend on the quality of the cameras. For example, for the Logitech QuickCam for Notebooks Pro, a suitable threshold may be 8 levels.
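For illustration only, the image differencing and thresholding described above may be sketched as follows. The sketch assumes the OpenCV and NumPy libraries, color input images, and the example threshold of 8 intensity levels mentioned above; the additional parameter limiting the fraction of changed pixels is an illustrative assumption and not part of the description.

```python
import cv2
import numpy as np

def is_static(img_a, img_b, noise_threshold=8, max_changed_fraction=0.01):
    """Return True if two consecutive images from the same camera appear
    to depict a static scene.

    noise_threshold: per-pixel difference (in intensity levels) below which
        a change is attributed to electronic noise, e.g., 8 levels for the
        camera mentioned above.
    max_changed_fraction: illustrative assumption -- the fraction of pixels
        allowed to exceed the noise threshold before the scene is treated
        as moving.
    """
    gray_a = cv2.cvtColor(img_a, cv2.COLOR_BGR2GRAY)
    gray_b = cv2.cvtColor(img_b, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(gray_a, gray_b)
    changed = np.count_nonzero(diff > noise_threshold)
    return changed <= max_changed_fraction * diff.size
```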
Image portions with detected movement may be discarded. In some cases, only a portion of the scene may have moved as indicated by the image difference between image 412 and 414. Accordingly, it may be appropriate to discard only those portions of images where that part of the represented scene is moving, e.g., the static portion of the image may be retained.
The motion detector engine 460 may associate an indicator with each image portion, which may identify the image in temporal space or may indicate the particular static image represented. For example, a time stamp may be associated with each image 462, 464 to indicate the time or relative time that the image was taken of the scene. For example, the images may be numbered or listed over time as 1st image, 2nd image, and the like. Alternatively, the motion detector engine may detect a time of first movement in the images from one camera, and associate a first static group identifier with the static images occurring before that time of first movement. Similarly, the motion detector engine may associate a second static group identifier with the static images occurring between the time of first detected movement and the second time of detected movement in the images from that camera. The motion detector engine may provide a similar indicator for the static images from each camera.
Since the images from a single camera indicate a static scene in the time between detected movements, at least a portion of those images may be approximately identical. Accordingly, placement in a static image data structure, such as a storage array, may indicate the image's placement in temporal space. More particularly, for each camera, the motion detector engine may discard all but one of the static images in each group of static images occurring in the time between detected movements, and may store those representative images in temporal order. Accordingly, each camera may be associated with a storage array of images, with each static image representing a group of static images between times of detected movement. In this manner, the static image portions 462, 464 may be stored in temporal order, and/or each image portion may be identified with an appropriate indicator.
The static image portions 462, 464 from each respective set of images from the cameras 402, 404 may be output to a synchrony engine 470. The synchrony engine may examine and match the image portion 462 from the first camera with the image portion 464 from the second camera representing a static scene from a similar time. The matched images 462, 464 from each camera may form a synchronized group of images. More particularly, the synchrony engine may match an image identifier of the first image 462 with the image identifier of the second image 464. For example, the image from the first camera associated with a particular time of non-movement may be matched to the image from the second camera from the same time of non-movement, or more particularly, an image from the first camera from the first time of non-movement may be matched to an image from the second camera from the first time of non-movement. In this manner, the lack of time synchronization between the images from different cameras is reduced, since the image portions in a synchronized group represent a static scene portion such that synchronization of the images is not dependent on the exact time that each image of the scene was taken.
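For illustration only, the grouping of static images and the matching performed by the synchrony engine may be sketched as follows. The sketch builds on the illustrative is_static() function above; it keeps one representative image per static group for each camera, in temporal order, and then pairs groups by their temporal index. All function and variable names are illustrative assumptions.

```python
def static_representatives(frames, is_static_fn):
    """For each run of consecutive static frames from one camera, keep a
    single representative image in temporal order; frames where movement
    is detected are discarded and end the current static group."""
    groups = []
    in_static_run = False
    previous = None
    for frame in frames:
        if previous is not None and is_static_fn(previous, frame):
            if not in_static_run:
                groups.append(frame)   # representative of a new static group
                in_static_run = True
        else:
            in_static_run = False      # movement detected; discard this frame
        previous = frame
    return groups

def synchronize(groups_cam1, groups_cam2):
    """Pair the i-th static group from camera 1 with the i-th static group
    from camera 2, forming synchronized groups of image portions."""
    return list(zip(groups_cam1, groups_cam2))
```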
The synchronized image portions 462, 464 may then be output to a feature detector engine 230, such as that used in the prior art transformation system 200 of
The matched image features for each synchronized image set may be stored in any suitable format. For example, the matched features for a group of synchronized images may be stored as a vector of matched features. With multiple synchronized image groups, which may be pairs, the vectors of matched features may be concatenated into a matrix.
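For illustration only, feature detection, feature matching, and the storage of matched features for each synchronized image group may be sketched as follows. The sketch assumes the OpenCV and NumPy libraries and uses ORB features with a brute-force matcher merely as stand-ins for the feature detector engine and feature point matcher engine; the description is not limited to any particular feature type, and all names are illustrative.

```python
import cv2
import numpy as np

def matched_points(img1, img2):
    """Detect features in a synchronized pair of static images and return
    the matched point coordinates as two N x 2 arrays."""
    orb = cv2.ORB_create()
    kp1, des1 = orb.detectAndCompute(cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY), None)
    kp2, des2 = orb.detectAndCompute(cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY), None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    return pts1, pts2

def concatenate_matches(groups):
    """Concatenate the per-group vectors of matched features into a matrix,
    as described above.  groups: list of (pts1, pts2) tuples, one tuple per
    synchronized image pair."""
    all1 = np.vstack([p1 for p1, _ in groups])
    all2 = np.vstack([p2 for _, p2 in groups])
    return all1, all2
```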
The matched points of the synchronized images may be output to a homography engine 234 to determine the transformation of each image to a projective plane, e.g., the transformation to convert the actual image planes of cameras 402, 404 to be co-planar. The determined transformation parameters or transformation structure may be output to a transformation engine 236 to be applied against the images 412, 414, 416, 418, or any other images from the cameras 402, 404, to convert them to transformed images 422, 424 as if they were received from cameras having co-planar image planes.
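For illustration only, one possible realization of the homography engine and transformation engine is sketched below. The sketch assumes the OpenCV library and estimates the fundamental matrix and a pair of rectifying homographies with cv2.stereoRectifyUncalibrated() from the matched points of the synchronized images; the description is not limited to this particular estimation technique. The resulting homographies may then be applied with the apply_rectification() sketch above; all names are illustrative.

```python
import cv2

def estimate_rectifying_homographies(pts1, pts2, image_size):
    """Determine a transformation structure (here, a pair of 3x3
    homographies) that maps the two image planes to a common plane.
    pts1, pts2: N x 2 arrays of matched points; image_size: (width, height)."""
    F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC)
    ok, H1, H2 = cv2.stereoRectifyUncalibrated(pts1, pts2, F, image_size)
    if not ok:
        raise RuntimeError("rectification did not converge; capture more static scenes")
    return H1, H2
```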
Since the determination of transformation parameters by the homography engine may improve with information from multiple static images, the scene 450 viewed by the cameras may be changed. In one example, the cameras may be moved to point at another static scene; however, to provide accurate transformation data, the relative geometry of the cameras must remain fixed with respect to one another. Thus, to ‘move the cameras’, both cameras may be moved together while preserving their relative positions, such as by moving a camera rig to which both cameras are attached.
In other cases, the scene 450 itself may be changed while the cameras 402, 404 are maintained in their original positions. For example, if the scene is of a person, the person may remain stationary while the cameras gather the image data 412, 414 through camera 402 and the image data 416, 418 through camera 404. The motion detector engine may detect when the images are static and capture the static images 462, 464 to pass on to the homography engine 234.
To change the scene and provide additional image data to the homography engine, objects within the scene 450 may be moved to a different position and then maintained as shown in the scene 550 of
As noted above, the motion detection engine may detect where the scene 450 is moving, and may discard those frames or portions of images where portions of the scene are moving. The scene may be repeatedly modified and then paused or maintained in a static position, e.g., the person may move slightly and pause, and then move slightly and pause again, to gather multiple matched groups of corresponding feature points. In one example, the user may move her head or another viewable object to different quadrants of the viewing area of the cameras, e.g., the upper-left quadrant, upper-right quadrant, lower-left quadrant, lower-right quadrant, and/or center of the viewing area, to provide various static image groups to be used by the homography engine. It is to be appreciated that any scene-changing technique may be used, including changing the object closest to the cameras (e.g., a person, book, box, and the like), changing a static image (e.g., holding up different calibration charts similar to eye charts, and the like), and the like.
In some cases, the user may modify the scene in a suggested or predetermined way, e.g., move her body and/or another viewable object to different sections of the viewing area of the cameras, and/or move the cameras while maintaining the relative geometry of the cameras. The timing of the scene changes may be suggested to the user in advance, such that the user changes the scene in a predetermined timing sequence of modifying and maintaining the scene. For example, a user may be instructed to remain stationary for approximately 1 second, then move, and remain stationary for another second, and repeat a predetermined number of times. Alternatively, the user may be instructed to maintain a static scene until the motion detection engine detects a sufficiently static scene, e.g., until the image difference is below a predetermined threshold. The motion detection engine may then indicate to the user that the user should modify the scene and maintain that new static scene. The motion detection engine may provide any suitable indication to the user, such as an audible signal, a visual signal, and the like. The user may repeat this process a predetermined number of times. In one alternative example, the static images of the scene from the unsynchronized cameras may be processed through the synchrony engine, the feature detector engine, the feature point matcher engine, and the homography engine. If the image data is insufficient to form the transformation structure, e.g., the transformation calibration has not converged, then the motion detection engine may provide an indication to the user to modify the scene and provide another static scene for analysis by the matching system. In this manner, the transformation may be analyzed in real time as the user is providing the static images for processing.
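For illustration only, the guided capture loop described above may be sketched as follows. The sketch assumes the OpenCV library and builds on the illustrative is_static(), matched_points(), concatenate_matches(), and estimate_rectifying_homographies() functions above; the convergence test (simply requiring a minimum number of synchronized static groups), the assumption that camera reads succeed, and the console prompt standing in for an audible or visual signal are assumptions made for the sketch only.

```python
def guided_capture(cap1, cap2, image_size, min_groups=5):
    """Prompt the user to hold still, collect one synchronized pair of
    static images per pose, and repeat until enough matched groups have
    been gathered to estimate the rectifying homographies.
    cap1, cap2: cv2.VideoCapture objects for two unsynchronized cameras."""
    groups = []
    while len(groups) < min_groups:
        print("Please hold still...")        # stand-in for an audible/visual signal
        _, prev1 = cap1.read()
        _, prev2 = cap2.read()
        while True:
            _, img1 = cap1.read()
            _, img2 = cap2.read()
            # the scene is treated as static once neither camera detects movement
            if is_static(prev1, img1) and is_static(prev2, img2):
                break
            prev1, prev2 = img1, img2
        groups.append(matched_points(img1, img2))
        print("Captured - please change the scene, then hold still again.")
    pts1, pts2 = concatenate_matches(groups)
    return estimate_rectifying_homographies(pts1, pts2, image_size)
```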
In one example, the motion detection engine 460 of
In operation, the matching system 400 of
The images indicating a static scene from each camera may be matched 621 or synchronized with appropriate static scene images from the other cameras. The features of each static image portion may be detected 622, which may be matched 624 to the detected features of the other image in the synchronized pair of images. A transformation structure, such as a transformation matrix, may be determined 626. The transformation structure may be used to transform 628 images from the at least two cameras to rectify the images from each camera.
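For illustration only, the overall operation just described may be sketched as a short driver that ties together the illustrative functions introduced above; cv2.VideoCapture, the camera indices, and the capture resolution are assumptions made for the sketch.

```python
import cv2

# Illustrative end-to-end use of the sketches above:
cap1, cap2 = cv2.VideoCapture(0), cv2.VideoCapture(1)   # two unsynchronized cameras
image_size = (640, 480)                                  # assumed capture resolution

# detect static scenes, synchronize them, match features, and determine
# the transformation structure (a pair of rectifying homographies)
H1, H2 = guided_capture(cap1, cap2, image_size)

# transform subsequent images as if the cameras had co-planar image planes
_, img1 = cap1.read()
_, img2 = cap2.read()
rect1, rect2 = apply_rectification(img1, img2, H1, H2)
```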
In another example, the matching system 400 of
To reduce the effect of the above problems, the matching system 700 of
While the preferred embodiment of the invention has been illustrated and described, it will be appreciated that various changes can be made therein without departing from the spirit and scope of the invention. For example, the homography engine 234 and the transformation engine 236 discussed above with reference to