The present invention relates generally to the art of sensors and displays. It finds particular application in vision systems for operators of manned and unmanned vehicles and is illustrated and described herein primarily with reference thereto. However, it will be appreciated that the present invention is also amenable to surveillance and other tele-observation or tele-presence applications and all manner of other panoramic or wide-angle video photography applications.
Although it has been possible to collect panoramic images and even spherical images for a number of years, it has not been possible to simultaneously acquire and display data panoramically, at its true resolution, in real-time, as three-dimensional (3-D) stereoscopic images. Nor has it been possible to share non-coincident stereo views of the outside of a vehicle. The lack of these capabilities has severely hampered the ability to implement adequate operator interfaces in those vehicles that do not allow the operator to have direct view of the outside world, such as fighting vehicles like tanks and armored personnel carriers, among many other applications. Personnel often prefer to have themselves partially out of the vehicle hatches in order to gain the best visibility possible, putting them at risk of casualty. In the case of tanks, the risk to such personnel includes being hit by shrapnel, being shot by snipers, getting pinned by the vehicle when it rolls, as well as injuring others and property due to poor visibility around the vehicle as it moves.
Previous attempts at mitigating these problems include the provision of windows, periscopes, various combinations of displays and cameras, but none of these has provided a capability that mitigates the lack of view for the operators. Hence, operators still prefer direct viewing, with its inherent dangers. Windows must be small and narrow since they will not withstand ballistics and hence provide only a narrow field of view. Windows also let light out, which at night pinpoints areas for enemy fire. Periscopes have a narrow field of view and expose the operator to injury, e.g., by being struck by the periscope when the vehicle tosses around. Periscopes may also induce nausea when operators look through them for more than very short periods. Previous attempts with external cameras and internal displays similarly induce nausea, provide a narrow or limited field of view, do not easily accommodate collaboration among multiple occupants, endure significant lag times between image capture and display thereby causing disorientation for the users, do not provide adequate depth perception, and, in general, do not replicate the feeling of directly viewing the scenes in question. Further, when a sensor is disabled, the area covered by that sensor is no longer visible to the operator. Hence as of 2005, vehicle operators are still being killed and injured in large numbers.
In addition, display systems for remotely operated unmanned surface, sub-surface, and air vehicles suffer from similar deficiencies, thereby limiting the utility, survivability, and lethality of these systems.
The current state of the art involves the use of various types of camera systems to develop a complete view of what is around the sensor. For example, the Ladybug camera from Point Grey Research, the Dodeca camera from Immersive Media Corporation, and the SVS-2500 from iMove, Inc., all do this with varying degrees of success. These and other companies have also developed camera systems where the individual sensors are separated from each other by distances of many feet and the resulting data from the dispersed cameras is again “stitched” together to form a spherical or semi-spherical view of what is around the vehicle. Most of these cameras have accompanying software that allows a user to “stitch” together the images from a number of image sensors that make up the spherical camera, into a seamless spherical image that is updated from 5 to 30 times per second. Accompanying software also allows one to “de-warp” portions of the spherical image for users to view in a “flat” view, without the distortion caused by the use of very wide-angle lenses on the cameras that make up the spherical sensors. These systems are generally non-real-time and require a post-processing step to make the images appear as a spherical image, although progress is being made in making this process work in real-time. Unfortunately, tele-observation situations such as viewing what is going on outside of a tank as it is being operated can tolerate at most a few hundred milliseconds of latency from image capture to display. Present systems do not provide a stereo 3-D view and, hence, cannot replicate the stereoscopic depth that humans use in making decisions and perceiving their surroundings.
Furthermore, the fielded current state of the art still generally involves the use of pan-tilt type camera systems. These pan-tilt camera systems do not allow for multiple users to access different views around the sensor and all users must share the view that the “master” who is controlling the device is pointing the sensor towards.
Accordingly, the present invention contemplates a new and improved vision system and method wherein a complete picture of the scene outside a vehicle or similar enclosure is presented to any number of operators in real-time stereo 3-D, and which overcome the above-referenced problems and others.
In accordance with one aspect, a panoramic camera system includes a plurality of camera units mounted and arranged in a circumferential, coplanar array. Each camera unit includes one or more lenses for focusing light from a field of view onto an array of light-sensitive elements. A panoramic image generator combines electronic image data from the multiplicity of the fields of view to generate electronic image data representative of a first 360-degree panoramic view and a second 360-degree panoramic view, wherein the first and second panoramic views are angularly displaced. A stereographic display system is provided to retrieve operator-selectable portions of the first and second panoramic views and to display the user selectable portions in human viewable form.
In accordance with another aspect, a method of providing a video display of a selected portion of a panoramic region comprises acquiring image data representative of a plurality of fields of view with a plurality of camera units mounted in a common plane and arranged in a circumferential array. Electronic image data from the multiplicity of the fields of view is combined to generate electronic image data representative of a first 360-degree panoramic view and a second 360-degree panoramic view, said first and second panoramic views being angularly displaced with respect to each other. Selected portions of said first and second panoramic views are retrieved and converted into human viewable form.
One advantage of the present development resides in its ability to provide a complete picture of what is outside a vehicle or similar enclosure, to any desired number of operators in the vehicle or enclosure in real-time stereo 3-D.
Another advantage of the present vision system is that it provides image comprehension by the operator that is similar to, or in some cases better than, comprehension by a viewer outside the vehicle or enclosure. For example, the depicted system allows viewing the uninterrupted scene around the vehicle/enclosure, and it provides high-resolution stereoscopic images that convey a perception of depth, color, and fine detail. In some instances, image comprehension may be enhanced due to the ability to process the images of the outside world and to enhance the view with multiple spectral inputs, brightness adjustments, the ability to see through obstructions on the vehicle, etc.
Another advantage of the present invention is found in the near-zero lag time between the time the scene is captured and the time it is presented to the operator(s), irrespective of the direction(s) the operator(s) may be looking in.
Still another advantage of the present development resides in its ability to calculate the coordinates (e.g., x, y, z) of an object or objects located within the field of view.
Still another advantage of the present invention is the ability to link the scene presented to the operator, the location of objects in the stereo scenes via image processing or operator cueing, the calculation of x, y, z position from the stereo data and, finally, the automated cueing of weapons systems to the exact point of interest. This is a critical capability that allows the very rapid return of fire, while allowing an operator to make the final go/no-go decision, thereby reducing collateral or unintended damage.
Still further advantages and benefits of the present invention will become apparent to those of ordinary skill in the art upon reading and understanding the following detailed description of the preferred embodiments.
The invention may take form in various components and arrangements of components, and in various steps and arrangements of steps. The drawings are only for purposes of illustrating preferred embodiments and are not to be construed as limiting the invention.
Referring now to the drawing figures,
Other vision system embodiments may employ two or more sub-arrays of 1 to n sensors such that the combined fields of view for the sensors cover the entire 360-degree area around the vehicle, structure, or enclosure. The images from the sensors can then be fused together to obtain the panoramic view. Such embodiments allow the sensor sub-arrays to be distributed within a limited area and still provide the panoramic views necessary for stereo viewing. For example,
As best seen in the schematic depiction in
Preferably, the image sensing elements 134 are color sensors, e.g., in accordance with a red-green-blue or other triadic color scheme. Optionally, additional sensor elements, sensitive to other wavelengths of radiation such as ultraviolet or infrared, may be provided for each pixel. In this manner, infrared and/or ultraviolet images can be acquired concurrently with color images.
In the embodiment of
An image-processing module 142 collects and sorts the video images from the multiple cameras 112. As is best seen in
A panoramic image processor 146 generates two angularly displaced panoramic images. The angularly displaced images may be generated by a number of methods. In certain embodiments, as best illustrated in
An alternative method of generating the stereo panoramic images from the sensors 112 is shown in
The left eye perspective image is presented to the left eye of the operator and the right eye perspective image is presented to the right eye of the operator via a stereoscopic display 126. The differences between the left eye and right eye images provide depth information or cues which, when processed in the visual center of the brain, provide the viewer with a perception of depth. In the preferred embodiment, the stereoscopic display 126 is a head-mounted display of a type having a left-eye display and a right-eye display mounted on a head-worn harness. Other types of stereoscopic displays are also contemplated, as are conventional two-dimensional displays.
In operation, the display 126 tracks the direction in which the wearer is looking and sends head tracking data 148 to the processor 142. A stereo image generator module 150 retrieves the corresponding portions of the left and right eye panoramic images to generate a stereoscopic image. A graphics processor 152 presents the stereoscopic video images in human viewable form via the display 126. The video signal 154 viewable on the display 126 can be shared with displays worn by other users.
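By way of non-limiting illustration, the retrieval of the portions of the left and right eye panoramic images that correspond to the viewing direction might be sketched as follows. The sketch assumes that each panorama is stored as rows of pixels spanning exactly 360 degrees, that the head tracking data reduces to a single yaw angle, and that the function names `select_view_columns` and `stereo_view` are hypothetical; none of these details is specified above.

```python
def select_view_columns(yaw_deg, view_fov_deg, pano_width_px):
    """Return the panorama column indices covering a view centered on yaw_deg.

    Assumes the panorama spans exactly 360 degrees across pano_width_px
    columns; indices wrap around the seam at 0/360 degrees.
    """
    px_per_deg = pano_width_px / 360.0
    center = (yaw_deg % 360.0) * px_per_deg
    half = view_fov_deg * px_per_deg / 2.0
    start = int(round(center - half))
    count = int(round(2 * half))
    return [(start + i) % pano_width_px for i in range(count)]


def stereo_view(left_pano, right_pano, yaw_deg, view_fov_deg):
    """Crop the same angular window from the left and right eye panoramas."""
    width = len(left_pano[0])
    cols = select_view_columns(yaw_deg, view_fov_deg, width)

    def crop(pano):
        return [[row[c] for c in cols] for row in pano]

    return crop(left_pano), crop(right_pano)
```

Because the column indices wrap modulo the panorama width, a viewer looking across the seam of the panorama receives a continuous image, and each viewer may supply a different yaw angle against the same pair of panoramas.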
In a preferred embodiment, one or more client computer-based information handling systems 156 may be connected to the host system 124. Each client computer 156 includes a processor 158 and a graphics card 160. Head tracking data 148 generated by the client display 126 is received by the processor 158. The client computer 156 requests those portions of the left and right panoramic images needed to generate a stereo view which corresponds to the direction in which the user is viewing. The corresponding video images are forwarded to the computer 156 and output via the graphics card 160.
In this manner, multiple viewers may access and view portions of the panoramic images independently. In the embodiment of
In certain embodiments, an image representation of the user's location, such as the vehicle 116, which may be a 2-D or 3-D representation, such as an outline, wire frame, or other graphic representation of the vehicle 116, may be superimposed over the display image so that the relative positions of the vehicle 116 versus other objects in the video streams can be determined by the driver or others in the crew. This is important, as it is now the case that drivers routinely collide with people and objects due to an inability to perceive the impending collision, which may be due to a lack of view or the inability to perceive the relative depth of objects in the field of view. This is of particular concern for large land vehicles such as tanks, sea vehicles such as ships, and air vehicles such as helicopters. Preferably, the vehicle overlay is selectively viewable, e.g., via an operator control 162.
The views are preferably made available in real-time to one or more operators via a panoramic (e.g., wide field of view), ultra high-resolution head mount display (tiled near eye displays with N per eye) while tracking where they are looking (the direction the head is pointed relative to the sensor array 110) in order to provide the appropriate view angle. This may be accomplished using OpenGL or other graphics image display techniques. As used herein, the term “real-time” is not intended to preclude relatively short processing times.
In the depicted preferred embodiment of
In certain embodiments, a distance calculation module 164 may also utilize the stereoscopic images to calculate the coordinates of one or more objects located within the field of view. In the preferred embodiment wherein the cameras are substantially aligned horizontally, horizontal pixel offsets of an imaged object in the field of view of adjacent cameras 112 can be used to measure the distance to that object. It will be recognized that, in comparing adjacent images to determine the horizontal pixel offset, some vertical offset may be present as well, for example, when the vehicle is on an inclined surface. Depending on the type of vehicle, enclosure, etc., non-horizontal camera arrays may also be employed.
By way of non-limiting example, the calculation of the coordinates is particularly useful where the vehicle is being fired upon by a sniper or other source and the vehicle operator attempts to return fire. A vehicle embodying or incorporating the present vision system may acquire angularly displaced images of the flash of light from the sniper's weapon, which may then be located in real-time within the 3-D stereo view. The coordinates of the flash can then be calculated to give the vehicle operator(s) the approximate x, y, and z data for the target. This distance to the target can then be factored in with other ballistic parameters to sight in the target.
Object Distance(166)=Camera Separation(170)×Factor(174)/Offset(172).
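The relation above may be evaluated as in the following minimal sketch. The function name `object_distance` is hypothetical, and the interpretation of the factor 174 as a camera-dependent constant expressed in pixels (for example, a focal length in pixel units) is an assumption made for illustration only.

```python
def object_distance(camera_separation, offset_px, factor_px):
    """Distance = camera separation x factor / horizontal pixel offset.

    camera_separation: baseline between adjacent cameras (any length unit).
    offset_px: horizontal pixel offset of the object between the two images.
    factor_px: camera-dependent constant, ASSUMED here to be in pixels.
    The result is in the same length unit as camera_separation.
    """
    if offset_px <= 0:
        raise ValueError("offset must be positive; zero offset implies infinite distance")
    return camera_separation * factor_px / offset_px
```

For example, with a separation of 0.5 m, a factor of 800 pixels, and an offset of 20 pixels, the sketch yields a distance of 20 m. Because the distance varies inversely with the offset, a one-pixel error in the offset has a larger effect for distant objects than for near ones.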
In certain embodiments, objects in the acquired images may be modeled in 3-D using a 3-D model processor 176. By using the x and y coordinates of an object of interest (e.g., as calculated using the position of the object on the 2-D sensors 134 of the cameras 112 in combination with the distance to the object, or, the z coordinate), the position of the object of interest relative to the observer can be determined. By determining the three-dimensional coordinates of one or more objects of interest, a 3-D model of the imaged scene or portions thereof may be generated. In certain embodiments, the generated 3-D models may be superimposed over the displayed video image.
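One possible way of combining the position of an object on the 2-D sensor with the calculated distance is sketched below. The sketch assumes an idealized pinhole camera model with a known focal length in pixels and a known image center; the function name `pixel_to_xyz` and all parameters are hypothetical and are not taken from the description above.

```python
def pixel_to_xyz(u, v, z, focal_px, cx, cy):
    """Convert a pixel position (u, v) and a distance z to camera-frame x, y, z.

    ASSUMES a pinhole camera: focal_px is the focal length in pixels and
    (cx, cy) is the pixel where the optical axis meets the sensor.
    x and y are returned in the same length unit as z.
    """
    x = (u - cx) * z / focal_px
    y = (v - cy) * z / focal_px
    return (x, y, z)
```

Under these assumptions, an object imaged 400 pixels to the right of the image center, at a distance of 20 m with a focal length of 800 pixels, lies 10 m to the right of the optical axis. Coordinates obtained in this way for several objects could serve as the points of a 3-D model.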
In some configurations, the cameras 112 may be used in landscape mode, giving a greater horizontal field of view (FOV) than vertical FOV. Such configurations will generally produce cylindrical panoramic views. However, it will be recognized that the cameras can also be used in portrait mode, giving a greater vertical FOV than horizontal FOV. This configuration may be used to provide spherical or partial spherical views when the vertical FOV is sufficient to supply the necessary pixel data. This configuration will generally require more cameras because of the smaller horizontal field of view of the cameras.
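The effect of the horizontal FOV on the number of cameras can be estimated with the following rough sketch. It assumes identical, evenly spaced cameras, ignores the physical offset of each camera from the center of the array, and uses a hypothetical `coverage` parameter for the number of cameras that should see each direction (two for stereo viewing). The FOV values in the example are illustrative only.

```python
import math


def cameras_needed(hfov_deg, coverage=2):
    """Minimum number of evenly spaced cameras so that every direction in a
    360-degree ring falls within the horizontal FOV of `coverage` cameras.

    Idealized estimate: identical cameras, no allowance for lens distortion
    or for the cameras' displacement from the array center.
    """
    return math.ceil(coverage * 360.0 / hfov_deg)
```

Under these assumptions, cameras with a 60-degree horizontal FOV in landscape mode would require 12 units for double coverage, whereas the same cameras turned to portrait mode with a 45-degree horizontal FOV would require 16.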
The sensors may be of various types (e.g., triadic color, electro-optical, infrared, ultraviolet, etc.) and resolutions. In certain embodiments, sensors with higher resolution than is needed for 1:1 viewing of the scenes may be employed to allow for digital zoom without losing the resolution needed to provide optimum perception by the user. Without such higher resolution, digital zoom pixelates the image, which then looks rough to the eye and reduces the ability to perceive features in the scene. In addition to allowing stereo viewing, embodiments in which there is overlap between adjacent cameras 112 provide redundant views so that if a sensor is lost, the view can still be seen from another sensor that covers the same physical area of interest.
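The redundancy afforded by overlapping fields of view can be illustrated with the following sketch, which lists the cameras that still cover a given bearing after one or more sensors have been lost. The even angular spacing of the cameras, the numbering of the cameras from zero, and the function name `cameras_covering` are assumptions made for the example.

```python
def cameras_covering(bearing_deg, num_cameras, hfov_deg, failed=()):
    """Return indices of working cameras whose horizontal FOV contains bearing_deg.

    ASSUMES camera i points at i * 360 / num_cameras degrees and that all
    cameras share the same horizontal FOV.
    """
    spacing = 360.0 / num_cameras
    covering = []
    for i in range(num_cameras):
        center = i * spacing
        # Smallest angular difference between the bearing and the camera axis.
        diff = abs((bearing_deg - center + 180.0) % 360.0 - 180.0)
        if diff <= hfov_deg / 2.0 and i not in failed:
            covering.append(i)
    return covering
```

With twelve cameras of 60-degree FOV, a bearing of 40 degrees is covered by cameras 1 and 2 in this sketch; if camera 1 is lost, camera 2 still covers that bearing, although stereo viewing in that direction would then lack a second perspective.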
In certain embodiments, the present invention utilizes a tiled display so that a very wide FOV which is also at a high resolution can be presented to the user, thereby allowing the user to gain peripheral view and the relevant and very necessary visual cues that this enables. Since the human eye only has the ability to perceive high resolution in the center of the FOV, the use of high resolution for peripheral areas can be a significant waste of system resources and an unnecessary technical challenge. In certain embodiments, the peripheral areas of the FOV can be displayed at a lower resolution than the direct forward or central portion of the field of view. In this manner, the amount of data that must be transmitted to the head set is significantly reduced while maintaining the wide FOV and high resolution in the forward or central portion of the view.
The functional components of the computer system 124 have been described in terms of functional processing modules. It will be recognized that such modules may be implemented in hardware, software, firmware, or combinations thereof. Furthermore, it is to be appreciated that any or all of the functional or processing modules described herein may employ dedicated processing circuitry or may be implemented as software or firmware sharing common hardware.
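The data reduction obtained by lowering the peripheral resolution can be estimated as in the sketch below. The angular sizes, the pixel density, and the peripheral scale factor are illustrative assumptions, and the function name `foveated_pixel_count` is hypothetical.

```python
def foveated_pixel_count(fov_deg, central_deg, px_per_deg, peripheral_scale):
    """Compare pixel counts for a uniform display and a reduced-periphery display.

    fov_deg, central_deg: (width, height) in degrees of the whole view and of
        the central high-resolution region.
    px_per_deg: linear pixel density of the central region.
    peripheral_scale: linear resolution of the periphery relative to the
        center (e.g., 0.5 for half the pixel density in each direction).
    Returns (uniform_pixels, foveated_pixels).
    """
    fov_w, fov_h = fov_deg
    cen_w, cen_h = central_deg
    full_area = fov_w * fov_h
    central_area = cen_w * cen_h
    peripheral_area = full_area - central_area
    uniform = full_area * px_per_deg ** 2
    foveated = (central_area * px_per_deg ** 2
                + peripheral_area * (px_per_deg * peripheral_scale) ** 2)
    return uniform, foveated
```

For an assumed 120 by 60 degree view with a 40 by 30 degree central region at 20 pixels per degree and a periphery at half that density, the sketch gives 2,880,000 pixels for a uniform display and 1,080,000 pixels for the reduced-periphery display, i.e., 37.5 percent of the uniform figure.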
Referring now to
If one or more processing steps are to be performed, e.g., based on user-selectable settings, the process proceeds to step 212 where it is determined if the coordinates of an imaged object are to be calculated. If one or more objects are to be located, the process proceeds to step 216 and the coordinates of the object of interest are calculated based on the horizontal offset between adjacent sensor units 112, e.g., as detailed above by way of reference to
At step 224, it is determined whether a 3-D model is to be generated, e.g., based on user selectable settings. If a 3-D model is to be generated at step 224, the process proceeds to generate the 3-D model at step 228. If the 3-D model is to be stored at step 232, the model data is stored in a memory 178 at step 236. The process then proceeds to step 240 where it is determined if the 3-D model is to be viewed. If the model is to be viewed, e.g., as determined via a user-selectable parameter, the 3-D model is prepared for output in human-viewable form at step 244 and the process proceeds to step 252.
If a 3-D model is not to be created at step 224, or if the 3-D model is not to be viewed at step 240, the process proceeds to step 248 and left eye and right eye panoramic stereo views are generated. If the field of view of the selected image, i.e., the panoramic stereo image or 3-D model image, is to be selected based on head tracking at step 252, then head tracker data is used to select the desired portion of the panoramic images for display at step 256. If it is determined that head tracking is not employed at step 252, then mouse input or other operator input means is used to select the desired FOV at step 260. Once the desired field of view is selected at step 256 or step 260, a stereo image is output to the display 126 at step 264. The process then repeats to provide human viewable image output at a desired frame rate.
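The decision flow from step 224 through step 264 may be summarized by the following sketch, which returns the sequence of steps visited in one pass for a given set of user-selectable settings. The function name `steps_from_224` and its boolean parameters are hypothetical, the viewing decision is taken to occur at step 240, and the sketch covers only the steps described in the two preceding paragraphs.

```python
def steps_from_224(make_model, store_model, view_model, head_tracking):
    """Return the step numbers visited from step 224 through step 264."""
    steps = [224]                      # is a 3-D model to be generated?
    showing_model = False
    if make_model:
        steps.append(228)              # generate the 3-D model
        steps.append(232)              # is the model to be stored?
        if store_model:
            steps.append(236)          # store model data in memory 178
        steps.append(240)              # is the model to be viewed?
        if view_model:
            steps.append(244)          # prepare model for human-viewable output
            showing_model = True
    if not showing_model:
        steps.append(248)              # generate left/right panoramic stereo views
    steps.append(252)                  # is the FOV selected by head tracking?
    steps.append(256 if head_tracking else 260)
    steps.append(264)                  # output stereo image to display 126
    return steps
```

For example, with no 3-D model requested and head tracking enabled, the sketch visits steps 224, 248, 252, 256, and 264 before the process repeats for the next frame.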
The invention has been described with reference to the preferred embodiments. Obviously, modifications and alterations will occur to others upon reading and understanding the preceding detailed description. It is intended that the invention be construed as including all such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.
This application is a continuation of U.S. application Ser. No. 11/265,584 filed Nov. 2, 2005. The aforementioned application is incorporated herein by reference in its entirety.
Relation | Number | Date | Country |
---|---|---|---|
Parent | 11265584 | Nov 2005 | US |
Child | 14986987 | | US |