The present invention relates to an apparatus and method for panoramic video imaging.
Panoramic imaging systems including optical devices, unwarping software, displays and various applications are disclosed in U.S. Pat. Nos. 6,963,355; 6,594,448; 7,058,239; 7,399,095; 7,139,440; 6,856,472; and 7,123,777 assigned to Eyesee360, Inc. All of these prior patents are incorporated herein by reference.
In one aspect, the invention provides an apparatus including a housing, a concave panoramic reflector, a support structure configured to hold the concave panoramic reflector in a fixed position with respect to the housing, and a mounting device for positioning the housing in a fixed orientation with respect to a computing device such that light reflected by the concave panoramic reflector is directed to a light sensor in the computing device.
The housing 12 further includes a projection 42 extending from the second portion and shaped to couple to a case or other mounting structure that is used to couple the optical device to a computing device and to hold the optical device in a fixed orientation with respect to the computing device.
The optical device housing further includes a generally triangularly shaped portion 52 extending between sides of the first and second portions. The triangular portion can function as an enlarged finger hold for insertion and removal of the optical device.
The housing 62 further includes a plurality of protrusions 96, 98, 100 and 102 extending from a flat surface 104 of the second portion and shaped to couple to a plurality of recesses in a case or other mounting structure that is used to couple the optical device to a computing device and to hold the optical device in a fixed orientation with respect to the computing device. The housing further includes a generally triangularly shaped portion 106 extending between sides of the first and second portions. The rotational symmetry of the protrusions allows the mount to interface in up to four different orientations for operation.
The curvature of the panoramic mirror can be altered to provide different fields of view. The gap 84 may provide a further constraint based on which rays of light it occludes from reflection. Possible fields of view may range from 90 degrees below the horizon to about 70 degrees above it, or any range in between.
The mirror 86 is sized to reflect light encompassed by the field of view of a camera in the computing device. In one example, the camera vertical field of view is 24°. However, the size and configuration of the components of the optical device can be changed to accommodate cameras having other fields of view.
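As a rough illustration of how mirror size relates to the camera's field of view, the following sketch uses a simple pinhole model; the 24° figure comes from the example above, while the 40 mm lens-to-mirror spacing is a hypothetical value chosen only for the calculation.

```python
import math

def required_mirror_diameter(vertical_fov_deg: float, lens_to_mirror_mm: float) -> float:
    """Approximate diameter a mirror must have to fill a camera's vertical
    field of view at a given distance, assuming a simple pinhole model."""
    half_fov = math.radians(vertical_fov_deg) / 2.0
    return 2.0 * lens_to_mirror_mm * math.tan(half_fov)

# Example: 24 degree vertical field of view and a hypothetical 40 mm
# spacing between the camera lens and the mirror rim.
print(round(required_mirror_diameter(24.0, 40.0), 1), "mm")  # ~17.0 mm
```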
The case includes a smoothly contoured lip, symmetric on both parts and formed continuously over a curved path. It is designed to provide a positive “snap” action when attached, and an equal removal and insertion force. The smooth contour is designed to avoid wear from repeated cycles. It also imparts a tension that pulls the two sections together to form a tight fit around the phone, which aids in keeping alignment between the camera opening 132 and the iPhone® camera. The opening 132 can be slightly undersized with respect to the protruding barrel on the optic. This provides an interference fit which increases the holding force of the case. Additionally, the profile of the barrel could bulge outwards to fit into the opening. The opening 132 may taper out towards the phone, which would provide additional holding force.
Light from 360 degrees of the horizontal environment surrounding the optic, and from a subset of the vertical environment (for example, ±45° from the horizon), is reflected by a curved mirror in the optic. This reflection can then be recorded by a camera, or by a recording device capable of receiving image data from a camera, to capture a panoramic still or motion image.
One or more flat, secondary mirrors can be included within the optic to accommodate a more convenient form factor or direction of capture. Secondary mirror(s) could also be curved for purposes of magnification or focus.
The parameters used in the mirror equations are defined as follows.
In the equations, A is the angle between the direction of a ray ro and a line parallel to the camera axis 294, in radians; Rex is the angle between the camera axis and a point on the mirror that reflects ray ro, in radians; Rce is the angle between the camera axis and an edge of the mirror, in radians; ro is the inner radius in millimeters; α is the gain factor; θ is the angle between the camera axis and the reflected ray r, in radians; and k is defined in terms of α in the first equation.
In Embodiment #1, the mirror equation has been extended to take into account a camera start angle (Rcs, expressed in radians). In the case of the Embodiment #2 mirror design, the camera start angle would be zero. Evaluating the additional terms in Embodiment #1 with Rcs set to zero, the equation reduces to the Embodiment #2 form.
A microphone 314 is provided to detect sound. The microphone output is stored in an audio buffer 316 and compressed 318 before being recorded. The computing device may include a global positioning system (GPS) sensor, an accelerometer, a gyroscope, and a compass that produce data 320 simultaneously with the optical and audio data. This data is encoded 322 and recorded.
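One way such simultaneous sensor readings might be timestamped and serialized alongside the audio and video streams is sketched below; the record layout and field names are illustrative assumptions, not a format prescribed by the description.

```python
import json
import time

def encode_sensor_sample(gps, accel, gyro, compass_deg):
    """Pack one set of simultaneous sensor readings into a timestamped
    record (illustrative JSON layout, not a prescribed format)."""
    return json.dumps({
        "t": time.time(),                # capture timestamp, seconds
        "gps": {"lat": gps[0], "lon": gps[1], "alt": gps[2]},
        "accel": {"x": accel[0], "y": accel[1], "z": accel[2]},  # m/s^2
        "gyro": {"x": gyro[0], "y": gyro[1], "z": gyro[2]},      # rad/s
        "compass": compass_deg,          # heading, degrees from north
    })

# Hypothetical readings from the GPS, accelerometer, gyroscope, and compass.
record = encode_sensor_sample((40.44, -79.95, 300.0),
                              (0.0, 0.0, 9.81),
                              (0.01, 0.0, 0.02),
                              187.5)
```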
A touch screen 324 is provided to sense touch actions 326 provided by a user. User touch actions and sensor data are used to select a particular viewing direction, which is then rendered. The computing device can interactively render the texture mapped video data in combination with the user touch actions and/or the sensor data to produce video for a display 330.
Many mobile computing devices, such as the iPhone®, contain built-in touch or touch screen input sensors that can be used to receive user commands. In usage scenarios where a software platform does not contain a built-in touch or touch screen sensor, externally connected input devices can be used. User input such as touching, dragging, and pinching can be detected as touch actions by touch and touch screen sensors through the use of off-the-shelf software frameworks.
Many mobile computing devices, such as the iPhone®, also contain built-in cameras that can receive light reflected by the panoramic mirror. In usage scenarios where a mobile computing device does not contain a built-in camera, an externally connected off the shelf camera can be used. The camera can capture still or motion images of the apparatus's environment as reflected by the mirror(s) in one of the optical devices described above. These images can be delivered to a video frame buffer for use by the software application.
Many mobile computing devices, such as the iPhone®, also contain built-in GPS, accelerometer, gyroscope, and compass sensors. These sensors can be used to provide the orientation, position and motion information used to perform some of the image processing and display functions described herein. In usage scenarios where a computing device does not contain one or more of these, externally connected off the shelf sensors can be used. These sensors provide geospatial and orientation data relating to the apparatus and its environment, which are then used by the software.
Many mobile computing devices, such as the iPhone®, also contain built-in microphones. In usage scenarios where a mobile computing device does not contain a built-in microphone, an externally connected off the shelf microphone can be used. The microphone can capture audio data from the apparatus's environment which is then delivered to an audio buffer for use by the software application.
In the event that multiple channels of audio data are recorded from a plurality of microphones in a known orientation, the audio field may be rotated during playback to synchronize spatially with the interactive renderer display.
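A minimal sketch of such a playback-time rotation is shown below. It assumes microphones at known azimuths around the device and uses a constant-power stereo pan to keep the sound field aligned with the rendered view; the microphone layout and pan law are assumptions made only for illustration.

```python
import numpy as np

def rotate_audio_field(channels, mic_azimuths_deg, view_yaw_deg):
    """Re-pan multi-microphone audio into stereo so that the sound field
    stays spatially aligned with the rendered view direction.

    channels:          array of shape (num_mics, num_samples)
    mic_azimuths_deg:  azimuth of each microphone relative to the device
    view_yaw_deg:      current rendering yaw
    """
    channels = np.asarray(channels, dtype=float)
    left = np.zeros(channels.shape[1])
    right = np.zeros(channels.shape[1])
    for signal, azimuth in zip(channels, mic_azimuths_deg):
        # Source azimuth relative to the viewer, folded into [-180, 180).
        rel = ((azimuth - view_yaw_deg + 180.0) % 360.0) - 180.0
        # Constant-power pan: -90 deg is full left, +90 deg is full right.
        pan = np.clip(rel / 90.0, -1.0, 1.0)
        theta = (pan + 1.0) * np.pi / 4.0
        left += np.cos(theta) * signal
        right += np.sin(theta) * signal
    return np.stack([left, right])
```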
User input, in the form of touch actions, can be provided to the software application by hardware abstraction frameworks on the software platform. These touch actions enable the software application to provide the user with an interactive presentation of prerecorded media, shared media downloaded or streamed from the internet, or media which is currently being recorded or previewed.
The video frame buffer is a hardware abstraction that can be provided by an off the shelf software framework, storing one or more frames of the most recently captured still or motion image. These frames can be retrieved by the software for various uses.
The audio buffer is a hardware abstraction that can be provided by one of the known off the shelf software frameworks, storing some length of audio representing the most recently captured audio data from the microphone. This data can be retrieved by the software for audio compression and storage (recording).
The texture map is a single frame retrieved by the software from the video buffer. This frame may be refreshed periodically from the video frame buffer in order to display a sequence of video.
The system can retrieve position information from GPS data. Absolute yaw orientation can be retrieved from compass data, acceleration due to gravity may be determined through a 3-axis accelerometer when the computing device is at rest, and changes in pitch, roll and yaw can be determined from gyroscope data. Velocity can be determined from GPS coordinates and timestamps from the software platform's clock; finer precision values can be achieved by incorporating the results of integrating acceleration data over time.
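The sketch below shows one way velocity could be estimated from two successive GPS fixes and platform clock timestamps, using an equirectangular approximation that is adequate over short intervals; the refinement obtained by integrating accelerometer data is omitted for brevity.

```python
import math

EARTH_RADIUS_M = 6371000.0

def velocity_from_gps(lat1, lon1, t1, lat2, lon2, t2):
    """Estimate speed (m/s) and heading (degrees from north) from two GPS
    fixes, using an equirectangular approximation of the local geometry."""
    lat1r, lat2r = math.radians(lat1), math.radians(lat2)
    east = math.radians(lon2 - lon1) * math.cos((lat1r + lat2r) / 2.0) * EARTH_RADIUS_M
    north = (lat2r - lat1r) * EARTH_RADIUS_M
    dt = t2 - t1
    speed = math.hypot(east, north) / dt
    heading = math.degrees(math.atan2(east, north)) % 360.0
    return speed, heading
```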
The interactive renderer 328 combines user input (touch actions), still or motion image data from the camera (via a texture map), and movement data (encoded from geospatial/orientation data) to provide a user controlled view of prerecorded media, shared media downloaded or streamed over a network, or media currently being recorded or previewed. User input can be used in real time to determine the view orientation and zoom. As used in this description, real time means that the display shows images at essentially the same time the images are being sensed by the device (or at a delay that is not obvious to a user) and/or that the display responds to user input at essentially the same time the input is received. By coupling the panoramic optic to a mobile computing device having a built-in camera, the internal signal processing bandwidth can be sufficient to achieve the real time display.
The texture map can be applied to a spherical, cylindrical, cubic, or other geometric mesh of vertices, providing a virtual scene for the view, correlating known angle coordinates from the texture with the desired angle coordinates of each vertex. In addition, the view can be adjusted using orientation data to account for changes in the pitch, yaw, and roll of the apparatus.
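The following sketch builds such a mesh for a spherical view, correlating each vertex's yaw and pitch with texture coordinates in the annular (donut-shaped) mirror image. The linear mapping from elevation angle to image radius is an assumed calibration used only for illustration; it stands in for the actual mirror equations.

```python
import numpy as np

def sphere_mesh_uvs(yaw_steps=64, pitch_steps=32,
                    pitch_min=-45.0, pitch_max=45.0,
                    r_inner=0.25, r_outer=0.48, center=(0.5, 0.5)):
    """Vertices of a partial sphere plus texture coordinates pointing back
    into the annular mirror image.  Assumes the image radius varies linearly
    with elevation angle (a stand-in for the real mirror calibration)."""
    yaw = np.radians(np.linspace(0.0, 360.0, yaw_steps, endpoint=False))
    pitch = np.radians(np.linspace(pitch_min, pitch_max, pitch_steps))
    yaw_g, pitch_g = np.meshgrid(yaw, pitch)

    # Unit-sphere vertex positions for the virtual scene.
    x = np.cos(pitch_g) * np.cos(yaw_g)
    y = np.cos(pitch_g) * np.sin(yaw_g)
    z = np.sin(pitch_g)
    vertices = np.stack([x, y, z], axis=-1)

    # Texture coordinates: yaw maps to angle around the donut image,
    # elevation maps (here, linearly) to radius within the donut.
    frac = (np.degrees(pitch_g) - pitch_min) / (pitch_max - pitch_min)
    radius = r_inner + frac * (r_outer - r_inner)
    u = center[0] + radius * np.cos(yaw_g)
    v = center[1] + radius * np.sin(yaw_g)
    uvs = np.stack([u, v], axis=-1)
    return vertices, uvs
```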
An unwarped version of each frame can be produced by mapping still or motion image textures onto a flat mesh correlating desired angle coordinates of each vertex with known angle coordinates from the texture.
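A CPU-side version of the same idea is sketched below, remapping the annular image to a rectangular panorama with nearest-neighbor sampling; the image center and inner/outer radii are assumed calibration values.

```python
import numpy as np

def unwarp(donut, center, r_inner, r_outer, out_w=1024, out_h=256):
    """Map an annular mirror image (H x W x 3 array) to a rectangular
    panorama.  center, r_inner, r_outer are pixel-space calibration values."""
    theta = np.arange(out_w) / out_w * 2.0 * np.pi          # horizontal angle
    radius = r_inner + np.arange(out_h) / (out_h - 1) * (r_outer - r_inner)
    theta_g, radius_g = np.meshgrid(theta, radius)
    src_x = np.clip((center[0] + radius_g * np.cos(theta_g)).round().astype(int),
                    0, donut.shape[1] - 1)
    src_y = np.clip((center[1] + radius_g * np.sin(theta_g)).round().astype(int),
                    0, donut.shape[0] - 1)
    # Flip rows so that the outer edge of the donut (assumed here to image
    # the upper part of the scene) ends up at the top of the panorama.
    return donut[src_y, src_x][::-1]
```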
Many software platforms provide a facility for encoding sequences of video frames using a compression algorithm. One common algorithm is AVC (H.264). This compressor may be implemented as a hardware feature of the mobile computing device, or through software which runs on the general CPU, or a combination thereof. Frames of unwarped video can be passed to such a compression algorithm to produce a compressed data stream. This data stream can be suitable for recording on the device's internal persistent memory, or transmitted through a wired or wireless network to a server or another mobile computing device.
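As an illustration only, the sketch below packages a directory of unwarped frames into an H.264 stream using the ffmpeg command-line encoder; ffmpeg is not part of the original disclosure, and on the mobile platforms described here the equivalent step would use the platform's own hardware-accelerated encoder.

```python
import subprocess

# Encode a directory of unwarped frames (frame_0001.png, frame_0002.png, ...)
# into an H.264 (AVC) video file at 30 frames per second.
subprocess.run([
    "ffmpeg",
    "-framerate", "30",
    "-i", "frames/frame_%04d.png",
    "-c:v", "libx264",
    "-pix_fmt", "yuv420p",
    "unwarped.mp4",
], check=True)
```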
Many software platforms provide a facility for encoding sequences of audio data using a compression algorithm. One common algorithm is AAC. The compressor may be implemented as a hardware feature of the mobile computing device, or through software which runs on the general CPU, or a combination thereof. Frames of audio data can be passed to such a compression algorithm to produce a compressed data stream. The data stream can be suitable for recording on the computing device's internal persistent memory, or transmitted through a wired or wireless network to a server or another mobile computing device. The stream may be interleaved with a compressed video stream to produce a synchronized movie file.
Display views from the interactive render can be produced using either an integrated display device such as the screen on an iPhone®, or an externally connected display device. Further, if multiple display devices are connected, each display device may feature its own distinct view of the scene.
Video, audio, and geospatial/orientation/motion data can be stored to either the mobile computing device's local storage medium, an externally connected storage medium, or another computing device over a network.
Software for the apparatus provides an interactive display, allowing the user to change the viewing region of a panoramic video in real time. Interactions include touch based pan, tilt, and zoom; orientation based pan and tilt; and orientation based roll correction. These interactions can be made available as touch input only, orientation input only, or a hybrid of the two where the inputs are treated additively. These interactions may be applied to live preview, capture preview, and pre-recorded or streaming media. As used in this description, “live preview” refers to a rendering originating from the camera on the device, and “capture preview” refers to a rendering of the recording as it happens (i.e., after any processing). Pre-recorded media may come from a video recording resident on the device or one being actively downloaded from the network to the device. Streaming media refers to a panoramic video feed being delivered over the network in real time, with only transient storage on the device.
Sometimes it is desirable to use an arbitrary North value even when recorded compass data is available. It is also sometimes desirable not to have the pan angle change 1:1 with the device. In some embodiments, the rendered pan angle may change at a user-selectable ratio relative to the device. For example, if a user chooses 4x motion controls, then rotating the device through 90° will allow the user to see a full rotation of the video, which is convenient when the user does not have the freedom of movement to spin around completely.
In cases where touch based input is combined with an orientation input, the touch input can be added to the orientation input as an additional offset, which effectively avoids conflict between the two input methods.
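A minimal sketch of this additive combination, including the user-selectable motion ratio described above, is shown below; the function and parameter names are illustrative.

```python
def rendered_pan(device_yaw_deg, touch_offset_deg,
                 north_offset_deg=0.0, motion_ratio=1.0):
    """Combine orientation input and touch input additively.

    device_yaw_deg:   device heading from the compass/gyroscope pipeline
    touch_offset_deg: accumulated horizontal drag, in degrees
    north_offset_deg: arbitrary "north" chosen by the user
    motion_ratio:     e.g. 4.0 lets a 90 degree physical turn span the
                      full 360 degree panorama
    """
    pan = (device_yaw_deg - north_offset_deg) * motion_ratio + touch_offset_deg
    return pan % 360.0
```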
On mobile devices where gyroscope data is available and offers better performance, gyroscope data, which measures changes in rotation along multiple axes over time, can be integrated over the time interval between the previously rendered frame and the current frame. This total change in orientation can be added to the orientation used to render the previous frame to determine the new orientation used to render the current frame. In cases where both gyroscope and compass data are available, gyroscope data can be synchronized to compass positions periodically or as a one-time initial offset.
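One simple realization of this integration and periodic synchronization is sketched below: the gyroscope yaw rate is integrated over the frame interval and, when a compass reading is available, the estimate is nudged toward it to limit drift. The blend factor is an assumed tuning value.

```python
def update_yaw(prev_yaw_deg, gyro_yaw_rate_dps, dt_s,
               compass_yaw_deg=None, blend=0.02):
    """Integrate the gyroscope yaw rate over the frame interval and, when a
    compass reading is available, pull the estimate toward it."""
    yaw = prev_yaw_deg + gyro_yaw_rate_dps * dt_s
    if compass_yaw_deg is not None:
        # Shortest angular difference, folded into [-180, 180).
        error = ((compass_yaw_deg - yaw + 180.0) % 360.0) - 180.0
        yaw += blend * error
    return yaw % 360.0
```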
In cases where touch based input is combined with orientation input, touch input can be added to orientation input as an additional offset.
On mobile devices where gyroscope data is available and offers better performance, gyroscope data, which measures changes in rotation along multiple axes over time, can be integrated over the time interval between the previously rendered frame and the current frame. This total change in orientation can be added to the orientation used to render the previous frame to determine the new orientation used to render the current frame. In cases where both gyroscope and accelerometer data are available, gyroscope data can be synchronized to the gravity vector periodically or as a one-time initial offset.
The touch screen 542 is a display found on many mobile computing devices, such as the iPhone®. The touch screen contains built-in touch or touch screen input sensors that are used to implement touch actions 544. In usage scenarios where a software platform does not contain a built-in touch or touch screen sensor, externally connected off-the-shelf sensors can be used. User input in the form of touching, dragging, pinching, etc., can be detected as touch actions by touch and touch screen sensors through the use of off-the-shelf software frameworks.
User input in the form of touch actions can be provided to a software application by hardware abstraction frameworks on the software platform to provide the user with an interactive presentation of prerecorded media, shared media downloaded or streamed from the internet, or media which is currently being recorded or previewed.
Many software platforms provide a facility for decoding sequences of video frames using a decompression algorithm, as illustrated in block 546. One common algorithm is AVC (H.264). Decompression may be implemented as a hardware feature of the mobile computing device, or through software which runs on the general CPU, or a combination thereof. Decompressed video frames are passed to the video frame buffer 548.
Many software platforms provide a facility for decoding sequences of audio data using a decompression algorithm, as shown in block 550. One common algorithm is AAC. Decompression may be implemented as a hardware feature of the mobile computing device, or through software which runs on the general CPU, or a combination thereof. Decompressed audio frames are passed to the audio frame buffer 552 and output to a speaker 554.
The video frame buffer 548 is a hardware abstraction provided by any of a number of off the shelf software frameworks, storing one or more frames of decompressed video. These frames are retrieved by the software for various uses.
The audio buffer 552 is a hardware abstraction that can be implemented using known off the shelf software frameworks, storing some length of decompressed audio. This data can be retrieved by the software for playback through the speaker.
The texture map 556 is a single frame retrieved by the software from the video buffer. This frame may be refreshed periodically from the video frame buffer in order to display a sequence of video.
The functions in the Decode Position, Orientation, and Velocity block 558 retrieve position, orientation, and velocity data from the media source for the current time offset into the video portion of the media source.
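A small sketch of such a lookup is shown below: recorded orientation samples are assumed to be stored as timestamped tuples, and the value at the current playback offset is obtained by linear interpolation (angle wrap-around is ignored for brevity).

```python
import bisect

def orientation_at(samples, t):
    """samples: list of (timestamp, yaw, pitch, roll) sorted by timestamp.
    Returns the orientation linearly interpolated at playback time t.
    Note: interpolation across the 359->0 degree wrap is ignored here."""
    times = [s[0] for s in samples]
    i = bisect.bisect_left(times, t)
    if i == 0:
        return samples[0][1:]
    if i >= len(samples):
        return samples[-1][1:]
    t0, *a = samples[i - 1]
    t1, *b = samples[i]
    w = (t - t0) / (t1 - t0)
    return tuple((1 - w) * x + w * y for x, y in zip(a, b))
```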
An interactive renderer 560 combines user input (touch actions), still or motion image data from the media source (via a texture map), and movement data from the media source to provide a user controlled view of prerecorded media, shared media downloaded or streamed over a network. User input is used in real time to determine the view orientation and zoom. The texture map is applied to a spherical, cylindrical, cubic, or other geometric mesh of vertices, providing a virtual scene for the view, correlating known angle coordinates from the texture with the desired angle coordinates of each vertex. Finally, the view is adjusted using orientation data to account for changes in the pitch, yaw, and roll of the original recording apparatus at the present time offset into the media.
Information from the interactive render can be used to produce a visible output on either an integrated display device 562, such as the screen on an iPhone®, or an externally connected display device.
The speaker provides sound output from the audio buffer, synchronized to video being displayed from the interactive render, using either an integrated speaker device such as the speaker on an iPhone®, or an externally connected speaker device. In the event that multiple channels of audio data are recorded from a plurality of microphones in a known orientation, the audio field may be rotated during playback to synchronize spatially with the interactive renderer display.
Examples of some applications and uses of the system include: motion tracking; social networking; 360° mapping and touring; security and surveillance; and military applications.
For motion tracking, the processing software can be written to detect and track the motion of subjects of interest (people, vehicles, etc.) and display views following these subjects of interest.
For social networking and entertainment or sporting events, the processing software may provide multiple viewing perspectives of a single live event from multiple devices. Using geo-positioning data, software can display media from other devices within close proximity at either the current or a previous time. Individual devices can be used for n-way sharing of personal media (much like YouTube® or Flickr®). Some examples of events include concerts and sporting events where users of multiple devices can upload their respective video data (for example, images taken from the user's location in a venue), and the various users can select desired viewing positions for viewing images in the video data. Software can also be provided for using the apparatus for teleconferencing in a one-way (presentation style—one or two-way audio communication and one-way video transmission), two-way (conference room to conference room), or n-way configuration (multiple conference rooms or conferencing environments).
For 360° mapping and touring, the processing software can be written to perform 360° mapping of streets, buildings, and scenes using geospatial data and multiple perspectives supplied over time by one or more devices and users. The apparatus can be mounted on ground or air vehicles as well, or used in conjunction with autonomous/semi-autonomous drones. Resulting video media can be replayed as captured to provide virtual tours along street routes, building interiors, or flying tours. Resulting video media can also be replayed as individual frames, based on user requested locations, to provide arbitrary 360° tours (frame merging and interpolation techniques can be applied to ease the transition between frames in different videos, or to remove temporary fixtures, vehicles, and persons from the displayed frames).
For security and surveillance, the apparatus can be mounted in portable and stationary installations, serving as low profile security cameras, traffic cameras, or police vehicle cameras. One or more devices can also be used at crime scenes to gather forensic evidence in 360° fields of view. The optic can be paired with a ruggedized recording device to serve as part of a video black box in a variety of vehicles; mounted either internally, externally, or both to simultaneously provide video data for some predetermined length of time leading up to an incident.
For military applications, man-portable and vehicle mounted systems can be used for muzzle flash detection, to rapidly determine the location of hostile forces. Multiple devices can be used within a single area of operation to provide multiple perspectives of multiple targets or locations of interest. When mounted as a man-portable system, the apparatus can be used to provide its user with better situational awareness of his or her immediate surroundings. When mounted as a fixed installation, the apparatus can be used for remote surveillance, with the majority of the apparatus concealed or camouflaged. The apparatus can be constructed to accommodate cameras in non-visible light spectrums, such as infrared for 360 degree heat detection.
Whereas particular embodiments of this invention have been described above for purposes of illustration, it will be evident to those skilled in the art that numerous variations of the details of the described embodiments may be made without departing from the invention.
This application is a divisional application of U.S. patent application Ser. No. 13/448,673, filed Apr. 17, 2012, which claims the benefit of U.S. Provisional Application Ser. No. 61/476,634, filed Apr. 18, 2011. Both of these applications are hereby incorporated by reference.
Provisional application: 61/476,634, Apr. 2011, US.
Parent application: 13/448,673, Apr. 2012, US; child application: 14/700,775, US.