The present invention relates to image processing. More particularly, the invention relates to the creation of movies using mobile apparatus. Still more particularly, the invention relates to the automatic creation of a video that comprises a virtual tour.
By Virtual Tour (VT) it is normally intended to refer to a mode that enables a user to look at a certain place and walk around it through non-linear data content. The most famous VT product available today on the web is Google's "Street View", where data is captured by a Google van using dedicated cameras and hardware. There are companies that provide services to create VT content, mainly for real estate agents. However, according to existing solutions, the creation of a VT requires dedicated hardware and/or the use of offline editing tools.
The current options for a user to create VT content are:
According to existing solutions, the various editors present many limitations:
It is therefore clear that a solution is needed that overcomes the drawbacks of the prior art and, inter alia:
The invention is directed to a method for generating a virtual tour (VT), comprising, while shooting a video, identifying three distinct states of the motion of an image-capturing device, consisting of:
In one embodiment of the invention the method comprises providing to the user an indication as to the map scene and the current capturing mode.
In an embodiment of the invention the image-capturing device is a smart phone. In another embodiment of the invention the image-capturing device is a tablet PC.
In still another embodiment of the invention a map of the area being captured is created “on the fly”.
In yet a further embodiment of the invention editing tools are provided for editing the map, which may comprise means for associating an image with a location on the map.
In the drawings:
In accordance with the invention suitable software is provided on (or otherwise associated with) a camera device, which guides the user in the process of capturing the VT content, thus enabling the online VT creation.
In one embodiment of the invention, inter alia, the following elements are provided:
Acquisition of the VT:
The software associated with the camera device, using image processing technology and/or other sensors' data, analyzes the captured scene while shooting the VT. The capturing process is divided into two main features:
According to the invention, and based on sensor data and image processing, it is possible to recognize the three different situations mentioned above and to combine them together online in order to create a virtual tour. The user has a GUI, which gives him an indication as to the map scene and the current capturing mode.
A map of the captured tour is created while capturing. The map is used for the virtual tour viewer at a later stage.
The abovementioned classification is based on a combination of image processing and of sensor data, because:
Differentiating between turning around and going forward is based on a combination of parameters from the sensors and from the imaging.
To further illustrate this point, outdoors, where GPS works well, there is a simple way to differentiate: if the camera location changed, the user moved forward; if it did not change, he was either standing or turning around.
The gyro, compass and/or accelerometers, on the other hand, cannot tell standing from moving forward. This differentiation must be supported by image processing.
The sensors also suffer from environmental noise. For example, the compass is affected by electrical devices found in its vicinity.
The optical flow of consecutive frames can provide the full needed information. For example:
The aforementioned "optical flow" method is one of the suitable methods to determine camera movement between two frames. Other methods, known in the art and not discussed herein in detail for the sake of brevity, can also be used.
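By way of a non-limiting illustration, the following sketch shows how the optical flow of tracked points between two consecutive frames could be used to distinguish standing, turning around and moving forward. It assumes the OpenCV library is available; the function classifyMotion and the threshold values are illustrative assumptions only, not the claimed implementation.

#include <cmath>
#include <vector>
#include <opencv2/imgproc.hpp>
#include <opencv2/video/tracking.hpp>

enum class MotionState { Standing, TurningAround, MovingForward };

// Classify the camera motion between two consecutive grayscale frames from the
// optical flow of tracked feature points (illustrative sketch only).
MotionState classifyMotion(const cv::Mat& prevGray, const cv::Mat& currGray)
{
    std::vector<cv::Point2f> prevPts, currPts;
    cv::goodFeaturesToTrack(prevGray, prevPts, 200, 0.01, 10);
    if (prevPts.empty())
        return MotionState::Standing;              // no visual information

    std::vector<unsigned char> status;
    std::vector<float> err;
    cv::calcOpticalFlowPyrLK(prevGray, currGray, prevPts, currPts, status, err);

    // Average flow magnitude and mean flow vector over successfully tracked points.
    double meanMag = 0.0;
    cv::Point2f meanVec(0.f, 0.f);
    int n = 0;
    for (size_t i = 0; i < prevPts.size(); ++i) {
        if (!status[i]) continue;
        cv::Point2f v = currPts[i] - prevPts[i];
        meanMag += std::hypot(v.x, v.y);
        meanVec += v;
        ++n;
    }
    if (n == 0) return MotionState::Standing;
    meanMag /= n;
    meanVec *= 1.0f / n;

    // Illustrative thresholds (pixels): almost no flow means standing; coherent,
    // mostly horizontal flow means turning around; otherwise we assume walking.
    if (meanMag < 0.5) return MotionState::Standing;
    double coherence = std::hypot(meanVec.x, meanVec.y) / meanMag;
    if (coherence > 0.8 && std::fabs(meanVec.x) > std::fabs(meanVec.y))
        return MotionState::TurningAround;
    return MotionState::MovingForward;
}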
The capturing of data for the VT may also be non-continuous. For example a user may start at point A, go to point B and then want to capture the VT route from point A to point C. In this case the user may stop capturing at point B, and restart capturing when he is back in point A.
The invention enables the creation and editing of the VT tour map. As shown in the figures and further discussed in detail below, a schematic map is created during the capture process. This map can be presented to the user while capturing the VT online, as shown in the illustrative example of
The following illustrative description makes reference to the technical features and the GUI implemented on an Android mobile phone with compass and accelerometers (but without a gyro). As said, the purpose of the invention is to allow a user to create and share a virtual tour. Once created, a virtual tour allows a person to explore a place without actually being there. In one embodiment of the invention, an illustrative implementation is divided into three stages:
The invention was tested on an Android Galaxy Tab (P1 device), as well as on a Galaxy S phone. The Android version was Froyo (2.2).
The capture engine receives, for each frame, the following inputs:
It will then output:
From VirtualTourApi.h:
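The header itself is not reproduced in this description. Purely by way of illustration, a per-frame capture-engine interface of the kind described above could be declared along the following lines; all names, fields and types below are hypothetical assumptions, not the actual contents of VirtualTourApi.h.

#pragma once
#include <cstdint>

// Per-frame sensor readings accompanying the camera frame (hypothetical sketch).
struct SensorSample {
    float  accel[3];      // accelerometer readings
    float  azimuthDeg;    // compass azimuth, degrees
    double timestampMs;   // frame timestamp, milliseconds
};

// Capturing mode recognized by the engine for the current frame.
enum class CaptureMode { Walking, Scanning, RedundantScan };

// Per-frame output: whether the frame is kept and where it fits in the tour.
struct FrameResult {
    bool        keepFrame;    // true if the frame becomes a row in the view table
    CaptureMode mode;         // state detected for this frame
    float       azimuthDeg;   // estimated viewing direction of the frame
    int         nodeId;       // scan (room) or path segment the frame belongs to
};

class CaptureEngine {
public:
    // Feed one grayscale camera frame together with its sensor data.
    FrameResult processFrame(const std::uint8_t* gray, int width, int height,
                             const SensorSample& sensors);
};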
The process result is a view table where each kept frame is represented as a single row (see detailed description below).
Capture Engine processing description (for an illustrative, specific embodiment)
Rigid Registration
The motion estimation of the camera is done by SAD (Sum of Absolute Difference) minimization on a set of significant points.
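A minimal sketch of such a SAD minimization is given below, assuming the significant points have already been selected (for example, corner points); the search window, names and types are illustrative assumptions only.

#include <climits>
#include <cstdlib>
#include <vector>

struct Pt { int x, y; };

// Find the 2D translation (dx, dy) that minimizes the sum of absolute
// differences between two grayscale frames, sampled at significant points only.
Pt estimateTranslationSAD(const unsigned char* prev, const unsigned char* curr,
                          int width, int height,
                          const std::vector<Pt>& points, int searchRadius)
{
    long bestSad = LONG_MAX;
    Pt best = {0, 0};
    for (int dy = -searchRadius; dy <= searchRadius; ++dy) {
        for (int dx = -searchRadius; dx <= searchRadius; ++dx) {
            long sad = 0;
            for (const Pt& p : points) {
                int x = p.x + dx, y = p.y + dy;
                if (x < 0 || y < 0 || x >= width || y >= height) { sad = LONG_MAX; break; }
                sad += std::abs(int(curr[y * width + x]) - int(prev[p.y * width + p.x]));
            }
            if (sad < bestSad) { bestSad = sad; best = {dx, dy}; }
        }
    }
    return best;   // best stays {0, 0} if no valid candidate translation exists
}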
Detection Stage
First, the variance of the accelerometer inputs is checked over the last few frames. Hand shakes while walking are clearly seen in the accelerometer input. Scanning happens on frames 65-240, 420-600, 650-890, 1020- (
If the variance is big and we are currently in "walking" mode, continue walking.
If the variance is small and we are currently in "scanning" mode, continue scanning.
Else, check by 2D registration:
If the camera movement according to the last few frames' visual information is smooth and horizontal, we are in "scan" mode. Otherwise, we are in "walk" mode.
If no visual information exists on the last few frames (that is, scanning or walking against a white wall), compass data will replace the visual information in the detection stage.
In general, the visual information is considered more reliable throughout the analysis, and the azimuth is used only as a fallback and for sanity checks. This is a result of the unreliable sensor inputs encountered while developing and testing.
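A minimal sketch of this decision flow is given below; the threshold values and the names detectMode and DetectInput are illustrative assumptions and not the tested implementation.

// Hypothetical sketch of the walk/scan detection flow described above.
enum class Mode { Walking, Scanning };

struct DetectInput {
    double accelVariance;     // variance of accelerometer readings over the last frames
    bool   hasVisualInfo;     // enough image content for 2D registration
    bool   smoothHorizontal;  // 2D registration shows smooth, horizontal motion
    bool   compassTurning;    // fallback: compass azimuth changing steadily
};

Mode detectMode(Mode current, const DetectInput& in)
{
    const double kHighVariance = 1.5;   // assumed value: strong hand shake while walking
    const double kLowVariance  = 0.3;   // assumed value: calm hand while scanning

    // Sensor shortcut: big variance keeps us walking, small variance keeps us scanning.
    if (in.accelVariance > kHighVariance && current == Mode::Walking)  return Mode::Walking;
    if (in.accelVariance < kLowVariance  && current == Mode::Scanning) return Mode::Scanning;

    // Otherwise decide by 2D registration of the last few frames.
    if (in.hasVisualInfo)
        return in.smoothHorizontal ? Mode::Scanning : Mode::Walking;

    // No visual information (e.g. a white wall): fall back to compass data.
    return in.compassTurning ? Mode::Scanning : Mode::Walking;
}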
Frame Handling
Each scan has a “marker” frame near its start, which is compared and matched to the coming frames when the scan is closed.
When a scan starts, each coming frame is checked. Once a frame with sufficient visual information is detected, it is set to be the marker. Frames of the scan before the marker frame are not kept.
Scan frames are accumulated. Frames are kept so that the gap between them is about ⅙ of the frame size. This means that if the field of view, in the scanning direction, is 45 degrees, a frame will be kept every 7.5 degrees, and a full scan will hold about 48 frames.
After scanning about 270 degrees, we start comparing the current frames to the marker to try "closing" the scan. The 270-degree threshold represents the unreliability of the azimuth estimate, which is based on sensor and image registration inputs.
Once a frame is matched to the marker, the scan is closed. At this stage we need to connect the last frame from the walking stage to the relevant scan frame, to create an accurate junction point at scan entry.
This is done by comparing the last frame on the path (walking stage) to scan frames.
The user gets feedback that the scan is closed, and should continue scanning until he leaves the room.
When the user starts walking out of the room, scan stop is detected, and the first frame of the path is compared to scan frames, to create an accurate junction point at scan exit.
If a scan closing point was missed, a new marker is chosen from among scan frames, the scan beginning is erased, and scanning continues. This is rare, and happens mostly if the scene changed, if the user moved while scanning, or if the device was tilted.
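The frame-keeping rule and the closing threshold described above can be sketched as follows; the names and the way the azimuth is obtained are illustrative assumptions.

#include <cmath>

// Hypothetical sketch of the scan frame-keeping rule: keep a frame roughly every
// 1/6 of the field of view, and start trying to close the scan after about 270 degrees.
struct ScanState {
    double prevAzimuthDeg   = 0.0;  // azimuth of the previous frame (set at scan start)
    double sinceLastKeptDeg = 0.0;  // angle accumulated since the last kept frame
    double totalScanDeg     = 0.0;  // total angle covered by the scan so far
    bool   tryClosing       = false;
};

// fovDeg: field of view in the scanning direction, e.g. 45 degrees, giving one
// kept frame per 7.5 degrees and about 48 frames for a full scan.
// Returns true if the current frame should be kept.
bool onScanFrame(ScanState& s, double azimuthDeg, double fovDeg)
{
    const double keepStepDeg   = fovDeg / 6.0;
    const double closeAfterDeg = 270.0;   // start matching frames against the marker

    double step = azimuthDeg - s.prevAzimuthDeg;
    while (step > 180.0)   step -= 360.0; // normalize the step to (-180, 180]
    while (step <= -180.0) step += 360.0;
    s.prevAzimuthDeg = azimuthDeg;

    s.sinceLastKeptDeg += std::fabs(step);
    s.totalScanDeg     += std::fabs(step);
    if (s.totalScanDeg >= closeAfterDeg)
        s.tryClosing = true;

    if (s.sinceLastKeptDeg >= keepStepDeg) {
        s.sinceLastKeptDeg = 0.0;
        return true;
    }
    return false;
}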
Fallbacks and Detection Errors
Incomplete Scan
If a scan was stopped before being closed, the engine decides whether to keep it as “partial scan” or to convert all the frames to path frames. A scan is kept if it already holds more than 180 degrees.
Very Short Path
If we moved from "scan" to "path" state, and shortly thereafter detected a scan again, we conclude that the scan stop was wrong and probably resulted from a shaking of the user's hand. In this case we continue the previously stopped scan. If we were in "redundant scan" mode, that is, the scan was already closed, we have no problem continuing. If the scan was not yet closed, we must restart the scan (that is, erase the incomplete scan just created) and choose a new marker.
Image Matching
Matching two images is done whenever we need to "connect" two frames that are not consecutive. This can be a scan frame versus either the scan marker, the scan entry, or the scan exit.
Matching is done as follows:
First, the two images are registered by 2D translation, as done for consecutive frames. The "significance" of the minimum found while registering, that is, the ratio of the SAD value at the best match to the SAD values at other translations, is calculated and kept as a first score to evaluate the match.
Next, a homography transformation is found that best transforms one image to the other. This homography represents a slight distortion between the images, resulting from camera tilt, or from the user moving slightly closer to or farther from the scene.
A warped image is created according to the 2D translation and the homography. The similarity of the warped image to the other one is evaluated by two independent calculations: SAD on a set of selected grid points uniformly distributed in the image, and cross-correlation on downsampled images.
The three scores (SAD minimum significance, SAD on grid points and cross-correlation) are combined and thresholded to get a final decision as to whether the two images match.
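A sketch of such a matching step is given below, assuming OpenCV and a set of point correspondences obtained from the registration step; the concrete thresholds and the function name imagesMatch are illustrative assumptions rather than the exact values and names used.

#include <algorithm>
#include <cstdlib>
#include <vector>
#include <opencv2/imgproc.hpp>
#include <opencv2/calib3d.hpp>

// Decide whether two grayscale frames match, combining three scores:
// the SAD-minimum significance from the 2D registration, SAD on a uniform grid
// of points of the warped image, and cross-correlation on downsampled images.
bool imagesMatch(const cv::Mat& imgA, const cv::Mat& imgB,
                 double sadMinSignificance,
                 const std::vector<cv::Point2f>& ptsA,
                 const std::vector<cv::Point2f>& ptsB)
{
    // Slight distortion between the frames (camera tilt, small forward/backward
    // movement) is modeled by a homography found from the matched points; the
    // homography also absorbs the 2D translation found earlier.
    cv::Mat H = cv::findHomography(ptsA, ptsB, cv::RANSAC);
    if (H.empty()) return false;

    cv::Mat warped;
    cv::warpPerspective(imgA, warped, H, imgB.size());

    // Score 2: SAD on a uniform grid of points.
    double sadGrid = 0.0;
    int count = 0;
    for (int y = 16; y < imgB.rows - 16; y += 32)
        for (int x = 16; x < imgB.cols - 16; x += 32) {
            sadGrid += std::abs(int(warped.at<unsigned char>(y, x)) -
                                int(imgB.at<unsigned char>(y, x)));
            ++count;
        }
    sadGrid /= std::max(count, 1);

    // Score 3: normalized cross-correlation on downsampled images.
    cv::Mat smallA, smallB, ncc;
    cv::resize(warped, smallA, cv::Size(), 0.25, 0.25, cv::INTER_AREA);
    cv::resize(imgB,   smallB, cv::Size(), 0.25, 0.25, cv::INTER_AREA);
    cv::matchTemplate(smallA, smallB, ncc, cv::TM_CCORR_NORMED);
    double correlation = ncc.at<float>(0, 0);

    // Combine the three scores with illustrative thresholds.
    return sadMinSignificance > 2.0 && sadGrid < 20.0 && correlation > 0.9;
}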
The example shown in
View Table Structure
This table structure is defined in VirtualTourTypes.H:
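The structure itself is not reproduced in this description. Purely for illustration, a view-table row of the kind described (one row per kept frame) could be defined along the following lines, where all field names are hypothetical assumptions and not the actual contents of VirtualTourTypes.H.

#pragma once
#include <cstdint>

// Hypothetical sketch of a single view-table row; every kept frame is one row.
enum class FrameKind { ScanFrame, PathFrame };

struct ViewTableRow {
    std::int32_t frameIndex;   // index of the kept frame in the captured video
    FrameKind    kind;         // room scan frame or corridor (path) frame
    std::int32_t nodeId;       // scan (room) or path segment the frame belongs to
    float        azimuthDeg;   // estimated viewing direction of the frame
    float        mapX, mapY;   // position on the automatically created map
    std::int32_t linkedRow;    // matching row at a scan entry/exit junction, -1 if none
};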
Creating a Virtual Tour—Application+GUI Description
Capture Stage
Overview
Before starting the capture stage, the user should plan a route that will cover all places of interest in the tour (
1) Room scan—360 deg scan.
2) Corridor scan—scan of a walk from room to room.
In the example of
UI
In the capture stage all the user has to do is to press the start button (“a” in
Other than the start/stop button, the screen has two more GUI indications:
In addition, in this specific illustrative example, the user can use the Android menu button in order to perform the following operations:
Map Edit Stage
After finishing the capture stage, a database containing all the images and their calculated locations is available. From that database the application automatically creates a map. Since the sensor data is not accurate enough, manual map editing is needed. This is done in the map edit stage. The map edit stage allows the user to make the following changes to the map:
Room Movement
Long press on a room and then drag it to the desired new position. By doing so, the user can fix the room locations to match their actual locations. The example of
In order to move a room the user needs to long press on a room and drag it to the new desired position. Note that at the end of all changes the user needs to save the new vtour file (via options menu→save).
Room Merge
If the same room is scanned twice, it is possible to merge the two rooms. For example, in the floor plan shown in
In order to actually merge rooms, the user long presses on a room and then drags it over the target room. The user will be asked if he wants to merge the rooms, and once he presses OK the merge will be done. Note that at the end of all changes the user needs to save the new vtour file (via options menu→save).
Corridor Split
The map view always shows a corridor as a straight line between two rooms. Sometimes, when a corridor is not straight, it is desired to add a split point in the corridor line. That new split point can be moved to create a corridor which is not straight.
In order to create a corridor split point the user needs to long press on the corridor at the location of the desired split point. In order to move a split point the user long presses the point and then moves it. Note that at the end of all changes the user needs to save the new vtour file (via options menu→save).
Corridor Split Merge
It is possible to merge a corridor split point with a room. This is needed in cases where a user maps several rooms while walking in one direction and then returns just in order to record the corridors' view in the other direction.
Note that at the end of all changes the user needs to save the new vtour file (via options menu→save).
Options Menu
Pressing the Android options menu allows the user to perform the following operations:
All the above description and examples have been provided for the purpose of illustration and are not meant to limit the invention in any way. Many alternative sensors and sensor analyses can be provided, as well as many other viewing and editing options, all without exceeding the scope of the invention.