This disclosure relates to video editing, and more specifically, to simulating a virtual lens in a cropped image or video.
It is often desirable to perform crop or zoom operations on high resolution images or video frames to extract a reduced field of view sub-frame. Particularly, for wide angle or spherical images or video, subjects in the originally captured content may appear very small. Furthermore, much of the captured field of view may be of little interest to a given viewer. Thus, cropping or zooming the content can beneficially obtain an image or video with the subject more suitably framed. Wide angle lenses used to capture wide angle or spherical content may introduce the perception of distortion that tends to increase near the edges and corners of the captured frames because the cameras are projecting content from a spherical world onto a rectangular display. Thus, cropping an image to extract a sub-frame near an edge or corner of a wide angle image capture may result in an image having significantly different distortion than a sub-frame extracted from a center of the image. Furthermore, the cropped image will have a different overall distortion effect than the original image. These distortion variations may be undesirable, particularly when combining cropped sub-frames corresponding to different regions of a video (e.g., to track movement of a subject of interest), or combining cropped sub-frames with uncropped frames (e.g., to produce a zoom effect).
The disclosed embodiments have other advantages and features which will be more readily apparent from the following detailed description of the invention and the appended claims, when taken in conjunction with the accompanying drawings, in which:
The figures and the following description relate to preferred embodiments by way of illustration only. It should be noted that from the following discussion, alternative embodiments of the structures and methods disclosed herein will be readily recognized as viable alternatives that may be employed without departing from the principles of what is claimed.
Reference will now be made in detail to several embodiments, examples of which are illustrated in the accompanying figures. It is noted that wherever practicable similar or like reference numbers may be used in the figures and may indicate similar or like functionality. The figures depict embodiments of the disclosed system (or method) for purposes of illustration only. One skilled in the art will readily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles described herein.
In an image or video capture system, a virtual lens is simulated when applying a crop or zoom effect to an input image or video. An input image or video frame is received that has a first field of view of a scene. The input image or video frame depicts the scene with an input lens distortion caused by lens characteristics of a lens used to capture the input image or video frame. A selection of a sub-frame representing a portion of the input image or video frame is obtained that has a second field of view of the scene smaller than the first field of view. The sub-frame is processed to remap the input lens distortion centered in the first field of view to a desired lens distortion in the sub-frame centered in the second field of view. The processed sub-frame is then outputted.
When producing an output video or images from original content that includes cropping, zooming, re-pointing, and/or panning, it may be desirable for the output video or images to exhibit consistent lens characteristics. Thus, for example, it may be desirable for cropped sub-frames extracted from different portions of an original video to exhibit similar lens characteristics. Furthermore, it may be desirable for cropped sub-frames of different size to exhibit similar lens characteristics to each other and to the original uncropped video. Thus, to achieve this effect, a virtual lens model is applied to each of the extracted sub-frames 104 to produce consistent lens characteristics across each output image. As a result, the output images may simulate the same effect that would have been achieved by a camera operator manually re-orienting and/or physically moving the camera to produce the panning, re-pointing, cropping, and/or zooming effects. In one embodiment, the output images 104 may be processed so that the lens characteristics in the output images 104 match the characteristics naturally appearing in the original images 102. For example, each of the sub-frames 104-B, 104-C, 104-D may be processed to have a similar fisheye effect as the sub-frame 104-A as if the scenes depicted in sub-frames 104-B, 104-C, 104-D were natively captured in the same way as the original images 102. Alternatively, any desired lens characteristic may be applied that does not necessarily match the lens characteristic of the original image 102. In this way, a cohesive output video or set of images may be generated with consistent lens characteristics from frame-to-frame so that it is not apparent to the viewer that the panning, re-pointing, or zooming effects were created in post-processing instead of during capture. 
This process may be applied to any type of lens distortion including, for example, lens distortion characteristic of conventional lenses, wide angle lenses, fisheye lenses, zoom lenses, hemispherical lenses, flat lenses or other types of camera lenses.
The camera 330 can include a camera body, one or more camera lenses, various indicators on the camera body (such as LEDs, displays, and the like), various input mechanisms (such as buttons, switches, and touch-screen mechanisms), and electronics (e.g., imaging electronics, power electronics, metadata sensors, etc.) internal to the camera body for capturing images via the one or more lenses and/or performing other functions. In one embodiment, the camera 330 may be capable of capturing spherical or substantially spherical content. As used herein, spherical content may include still images or video having a spherical or substantially spherical field of view. For example, in one embodiment, the camera 330 may capture an image or video having a 360 degree field of view in the horizontal plane and a 180 degree field of view in the vertical plane. Alternatively, the camera 330 may capture substantially spherical images or video having less than 360 degrees in the horizontal direction and less than 180 degrees in the vertical direction (e.g., within 10% of the field of view associated with fully spherical content). In other embodiments, the camera 330 may capture images or video having a non-spherical wide angle field of view.
As described in greater detail in conjunction with
The media server 340 may receive and store images or video captured by the camera 330 and may allow users to access images or videos at a later time. In one embodiment, the media server 340 may provide the user with an interface, such as a web page or native application installed on the client device 335, to interact with and/or edit the stored images or videos and to generate output images or videos relevant to a particular user from one or more stored images or videos. At least some of the output images or video frames may have a reduced field of view relative to the original images or video frames so as to produce zooming, re-pointing, and/or panning effects. To generate the output images or video, the media server 340 may extract a sequence of relevant sub-frames having the reduced field of view from the original images or video frames. For example, sub-frames may be selected from one or more input images or video frames to generate output images or video that track a path of a particular individual or object. In one embodiment, the media server 340 can automatically identify sub-frames by identifying spherical images or video captured near a particular location and time where a user was present (or other time and location of interest). In another embodiment, a time-varying path (e.g., a sequence of time-stamped locations) of a target (e.g., a person, object, or other scene of interest) can be used to automatically find spherical video having time and location metadata closely matching the path. Furthermore, by correlating the relative location of the camera 330 with a location at each time point in the path of interest, the media server 340 may automatically determine a direction between the camera 330 and the target and thereby automatically select the appropriate sub-frames depicting the target. In other embodiments, the media server 340 can automatically identify sub-frames of interest based on the image or video content itself or an associated audio track.
For example, facial recognition, object recognition, motion tracking, or other content recognition or identification techniques may be applied to the video to identify sub-frames of interest. Alternatively, or in addition, a microphone array may be used to determine directionality associated with a received audio signal, and the sub-frames of interest may be chosen based on the direction between the camera and the audio source. These embodiments beneficially can be performed without any location tracking of the target of interest. Furthermore, in one embodiment, after the media server 340 identifies sub-frames of interest, the media server 340 automatically obtains a sub-frame center location, a sub-frame size, and a scaling factor for transforming the input image based on the metadata associated with the input image or based on image characteristics of the input image (e.g., time and location of interest, target of interest, the image or video content itself or an associated audio track). The scaling factor is defined as a ratio of a size of the input image to the sub-frame size. The media server 340 applies the crop or zoom effect to the input image based on the sub-frame center location, sub-frame size, and the scaling factor to generate the sub-frame. Further still, any of the above techniques may be used in combination to automatically determine which sub-frames to select for generating output images or video. In other embodiments, the selection of sub-frames may be performed manually using post-processing tools, e.g., image or video editing tools. In some embodiments, the media server 340 obtains metadata associated with the input image. The metadata at least specifies the lens characteristics of the lens used to capture the input image. The media server 340 processes the sub-frame using the lens characteristics specified in the metadata.
For example, the media server 340 processes the sub-frame to remap the input lens distortion centered in a first field of view of the input image to a desired lens distortion in the sub-frame centered in a second field of view of the sub-frame. The second field of view of the sub-frame is smaller than the first field of view. The desired lens distortion exhibits consistent lens characteristics with those in the input image. The media server 340 outputs the processed sub-frame with the same size as the input image.
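The sub-frame parameters described above (center location, size, and scaling factor) can be illustrated with a minimal Python sketch. The names `SubFrame`, `scaling_factor`, and `crop_bounds` are hypothetical and do not appear in this disclosure. The sketch assumes the "size" used in the ratio is the width in pixels and, as a further assumption, shifts a sub-frame that would extend past the image border back inside the image.

```python
from dataclasses import dataclass


@dataclass
class SubFrame:
    """Hypothetical container for the selected sub-frame parameters."""
    center_x: float  # sub-frame center, in input-image pixel coordinates
    center_y: float
    width: int       # sub-frame size in pixels
    height: int


def scaling_factor(input_width: int, sub_frame_width: int) -> float:
    """Ratio of the input image size to the sub-frame size (widths assumed)."""
    return input_width / sub_frame_width


def crop_bounds(sub: SubFrame, input_width: int, input_height: int):
    """Return (left, top, right, bottom) of the crop rectangle.

    Assumption: a sub-frame that would extend past the image border is
    shifted back inside the image rather than padded.
    """
    left = int(round(sub.center_x - sub.width / 2))
    top = int(round(sub.center_y - sub.height / 2))
    left = max(0, min(left, input_width - sub.width))
    top = max(0, min(top, input_height - sub.height))
    return left, top, left + sub.width, top + sub.height
```

Under these assumptions, a 960 by 540 sub-frame taken from a 1920 by 1080 input has a scaling factor of 2, so enlarging the processed sub-frame by that factor yields an output with the same size as the input image.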
A user can interact with interfaces provided by the media server 340 via the client device 335. The client device 335 may be any computing device capable of receiving user inputs as well as transmitting and/or receiving data via the network 320. In one embodiment, the client device 335 may comprise a conventional computer system, such as a desktop or a laptop computer. Alternatively, the client device 335 may comprise a device having computer functionality, such as a personal digital assistant (PDA), a mobile telephone, a smartphone or another suitable device. The user can use the client device 335 to view and interact with or edit videos or images stored on the media server 340. For example, the user can view web pages including summaries for a set of videos or images captured by the camera 330 via a web browser on the client device 335.
One or more input devices associated with the client device 335 may receive input from the user. For example, the client device 335 can include a touch-sensitive display, a keyboard, a trackpad, a mouse, a voice recognition system, and the like. In some embodiments, the client device 335 can access videos, images, and/or metadata from the camera 330 or one or more metadata sources 310, and can transfer the accessed metadata to the media server 340. For example, the client device 335 may retrieve videos or images and metadata associated with the videos or images from the camera 330 via a universal serial bus (USB) cable coupling the camera 330 and the client device 335. The client device 335 can then upload the retrieved videos and metadata to the media server 340. In one embodiment, the client device 335 may interact with the media server 340 through an application programming interface (API) running on a native operating system of the client device 335, such as IOS® or ANDROID™. While
The media server 340 may communicate with the client device 335, the metadata sources 310, and the camera 330 via the network 320, which may include any combination of local area and/or wide area networks, using wired and/or wireless communication systems. In one embodiment, the network 320 may use standard communications technologies and/or protocols. In some embodiments, the processes attributed to the client device 335 or media server 340 herein may instead be performed within the camera 330.
Various components of the environment 300 of
The lens 412 can be, for example, a wide angle lens, hemispherical, or hyperhemispherical lens that focuses light entering the lens to the image sensor 414 which captures images and/or video frames. As described above, different lenses may produce different lens distortion effects in different portions of the image or video frame due to different lens characteristics. For example, the lens characteristics may cause straight lines in the image of a scene to appear as curved lines in at least a portion of the image or video frame. In another example, the lens characteristics may change orientations of straight lines in an image of the scene. In such an example, the vertical or horizontal straight lines may appear to be oblique lines in the image of the scene. In another example, the lens characteristics may cause lines of the same length in the scene to appear to be different lengths in different portions of the image or video frame. The lens characteristics may be based on an optical design of the lens. Examples of lens characteristics that may affect the lens distortion may include, for example, a focal length, an f-number, a field of view, a magnification, a numerical aperture, a resolution, a working distance, an aperture size, lens materials, lens coatings, or other lens characteristics. Different types of lenses may have different lens characteristics causing different distortions. For example, a conventional lens may have a fixed focal length (e.g., greater than 50 mm) and may produce a “natural” field of view that may look natural to observers from a normal view distance. A wide angle lens may have a shorter focal length (e.g., less than 40 mm) than that of a conventional lens and may produce a wide field of view (also referred to as an expanded field of view). The types of wide angle lens may include a rectilinear wide-angle lens and a fisheye lens.
The rectilinear wide-angle lens may produce a wide field of view that yields images of a scene in which straight lines in the scene appear as straight lines in the image. The fisheye lens produces a wider field of view than the rectilinear wide-angle lens and may cause straight lines in the scene to appear as curved lines in at least a portion of the image. A hemispherical lens (which may be a type of fisheye lens) may produce a hemispherical field of view. A zoom lens may magnify a scene so that objects in the scene appear larger in the image. A flat lens may have a flat shape that introduces other types of distortion into the image.
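The difference between a rectilinear lens and a fisheye lens can be illustrated with two commonly used idealized projection models. This disclosure does not name a specific model, so the choice of the ideal rectilinear projection and the equidistant fisheye projection below is an assumption made for illustration only, and the function names are hypothetical. Each function maps the angle between an incoming ray and the optical axis to a radial distance from the image center.

```python
import math


def rectilinear_radius(theta: float, focal_length: float) -> float:
    """Ideal rectilinear projection: r = f * tan(theta).

    Straight lines in the scene stay straight, but the radius grows without
    bound as theta approaches 90 degrees.
    """
    return focal_length * math.tan(theta)


def equidistant_fisheye_radius(theta: float, focal_length: float) -> float:
    """Equidistant fisheye projection: r = f * theta.

    The radius grows linearly with the angle, which allows a much wider
    field of view but bends straight lines away from the image center.
    """
    return focal_length * theta
```

Near the optical axis the two models nearly agree, because tan(theta) is approximately theta for small angles. They diverge toward the edges of the frame, which is consistent with the observation that distortion differences are most visible near edges and corners.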
The image sensor 414 may capture high-definition images or video having a resolution of, for example, 720p, 1080p, 4k, or higher. In one embodiment, spherical video or images may be captured as 5760 pixels by 2880 pixels with a 360 degree horizontal field of view and a 180 degree vertical field of view. For video, the image sensor 414 may capture video at frame rates of, for example, 30 frames per second, 60 frames per second, or higher. The image processor 416 may perform one or more image processing functions on the captured images or video. For example, the image processor 416 may perform a Bayer transformation, demosaicing, noise reduction, image sharpening, image stabilization, rolling shutter artifact reduction, color space conversion, compression, or other in-camera processing functions. Processed images and video may be temporarily or persistently stored to system memory 430 and/or to a non-volatile storage, which may be in the form of internal storage or an external memory card.
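As a worked illustration of the 5760 by 2880 example, the following Python sketch converts a viewing direction into pixel coordinates. It assumes an equirectangular layout in which yaw 0 lies at the horizontal center of the frame and pitch +90 degrees lies at the top row. The disclosure does not specify this layout, and the function names are hypothetical.

```python
WIDTH, HEIGHT = 5760, 2880  # frame size from the example in the text


def pixels_per_degree() -> float:
    """Horizontal angular resolution: 5760 pixels over 360 degrees."""
    return WIDTH / 360.0


def direction_to_pixel(yaw_deg: float, pitch_deg: float):
    """Map a direction to (x, y) pixel coordinates.

    Assumed layout: yaw in [-180, 180] increases to the right, pitch in
    [-90, 90] increases upward, and the frame wraps around horizontally.
    """
    x = (yaw_deg + 180.0) / 360.0 * WIDTH
    y = (90.0 - pitch_deg) / 180.0 * HEIGHT
    # Wrap horizontally; clamp vertically to the last valid row.
    return x % WIDTH, min(max(y, 0.0), HEIGHT - 1.0)
```

With these dimensions the frame holds 16 pixels per degree in both directions, since 2880 pixels over 180 degrees gives the same ratio.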
An input/output (I/O) interface 460 may transmit and receive data from various external devices. For example, the I/O interface 460 may facilitate receiving or transmitting video or audio information through an I/O port. Examples of I/O ports or interfaces may include USB ports, HDMI ports, Ethernet ports, audio ports, and the like. Furthermore, embodiments of the I/O interface 460 may include wireless ports that can accommodate wireless connections. Examples of wireless ports include Bluetooth, Wireless USB, Near Field Communication (NFC), and the like. The I/O interface 460 may also include an interface to synchronize the camera 330 with other cameras or with other external devices, such as a remote control, a second camera, a smartphone, a client device 335, or a media server 340.
A control/display subsystem 470 may include various control and display components associated with operation of the camera 330 including, for example, LED lights, a display, buttons, microphones, speakers, and the like. The audio subsystem 450 may include, for example, one or more microphones and one or more audio processors to capture and process audio data correlated with video capture. In one embodiment, the audio subsystem 450 may include a microphone array having two or more microphones arranged to obtain directional audio signals.
Sensors 440 may capture various metadata concurrently with, or separately from, video or image capture. For example, the sensors 440 may capture time-stamped location information based on a global positioning system (GPS) sensor, and/or an altimeter. Other sensors 440 may be used to detect and capture orientation of the camera 330 including, for example, an orientation sensor, an accelerometer, a gyroscope, or a magnetometer. Sensor data captured from the various sensors 440 may be processed to generate other types of metadata. For example, sensor data from the accelerometer may be used to generate motion metadata, comprising velocity and/or acceleration vectors representative of motion of the camera 330. Furthermore, sensor data from the orientation sensor may be used to generate orientation metadata describing the orientation of the camera 330. Sensor data from the GPS sensor provides GPS coordinates identifying the location of the camera 330, and the altimeter measures the altitude of the camera 330. In one embodiment, the sensors 440 may be rigidly coupled to the camera 330 such that any motion, orientation or change in location experienced by the camera 330 may also be experienced by the sensors 440. The sensors 440 furthermore may associate a time stamp representing when the data was captured by each sensor. In one embodiment, the sensors 440 may automatically begin collecting sensor metadata when the camera 330 begins recording a video or captures an image.
In an embodiment, the media server 340 may enable users to create and manage individual user accounts. User account information is stored in the user storage 505. A user account may include information provided by the user (such as biographic information, geographic information, and the like) and may also include additional information inferred by the media server 340 (such as information associated with a user's historical use of a camera and interactions with the media server 340). Examples of user information may include a username, contact information, a user's hometown or geographic region, other location information associated with the user, other users linked to the user as “friends,” and the like. The user storage 505 may include data describing interactions between a user and videos captured by the user. For example, a user account can include a unique identifier associating videos uploaded by the user with the user's user account.
The image/video storage 510 may store videos or images captured and uploaded by users of the media server 340. The media server 340 may access videos or images captured using the camera 330 and store the videos or images in the image/video storage 510. In one example, the media server 340 may provide the user with an interface executing on the client device 335 that the user may use to upload videos or images to the image/video storage 510. In one embodiment, the media server 340 may index images and videos retrieved from the camera 330 or the client device 335, and may store information associated with the indexed images and videos in the image/video storage 510. For example, the media server 340 may provide the user with an interface to select one or more index filters used to index images or videos. Examples of index filters may include but are not limited to: the time and location that the image or video was captured, the type of equipment used by the user (e.g., ski equipment, mountain bike equipment, etc.), the type of activity being performed by the user while the image or video was captured (e.g., snowboarding, mountain biking, etc.), or the type of camera 330 used to capture the content.
In some embodiments, the media server 340 generates a unique identifier for each image or video stored in the image/video storage 510 which may be stored as metadata associated with the image or video in the metadata storage 525. In some embodiments, the generated identifier for a particular image or video may be unique to a particular user. For example, each user can be associated with a first unique identifier (such as a 10-digit alphanumeric string), and each image or video captured by a user may be associated with a second unique identifier made up of the first unique identifier associated with the user concatenated with an image or video identifier (such as an 8-digit alphanumeric string unique to the user). Thus, each image or video identifier may be unique among all images and videos stored at the image/video storage 510, and can be used to identify the user that captured the image or video.
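The concatenated identifier scheme described above can be sketched in Python as follows. The alphabet (uppercase letters and digits), the use of random generation, and all function names are assumptions made for illustration. The text specifies only the lengths of the two parts and that the second part is unique to the user.

```python
import secrets
import string

# Assumed alphabet; the text only says "alphanumeric".
ALPHABET = string.ascii_uppercase + string.digits
USER_ID_LENGTH = 10   # length of the per-user identifier
MEDIA_ID_LENGTH = 8   # length of the per-user image or video identifier


def random_id(length: int) -> str:
    """Return a random alphanumeric string of the given length."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def new_media_id(user_id: str, used_suffixes: set) -> str:
    """Concatenate the user identifier with a suffix unique to that user.

    `used_suffixes` holds the suffixes already issued to this user and is
    updated in place.
    """
    while True:
        suffix = random_id(MEDIA_ID_LENGTH)
        if suffix not in used_suffixes:
            used_suffixes.add(suffix)
            return user_id + suffix


def owner_of(media_id: str) -> str:
    """Recover the user identifier from a full image or video identifier."""
    return media_id[:USER_ID_LENGTH]
```

Because the user identifier forms a fixed-length prefix, the owner of any image or video can be recovered from its identifier alone, as long as user identifiers are themselves unique.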
The metadata storage 525 may store metadata associated with images or videos stored by the image/video storage 510 and with users stored in the user storage 505. Particularly, for each image or video, the metadata storage 525 may store metadata including time-stamped location information associated with each image or frame of the video to indicate the location of the camera 330 at any particular moment during capture of the content. Additionally, the metadata storage 525 may store other types of sensor data captured by the camera 330 in association with an image or video frame including, for example, gyroscope data indicating motion and/or orientation of the device. In some embodiments, metadata corresponding to an image or video may be stored within an image or video file itself, and not in a separate storage module. The metadata storage 525 may also store time-stamped location information associated with a particular user so as to represent a user's physical path during a particular time interval. This data may be obtained from a camera held by the user, a mobile phone application that tracks the user's path, or another metadata source. Furthermore, in one embodiment, the metadata storage 525 stores metadata specifying the lens characteristics with the image or video and metadata associated with the input image or image characteristics of the input image (e.g., time and location of interest, target of interest, the image or video content itself or an associated audio track).
The web server 530 may provide a communicative interface between the media server 340 and other entities of the environment of
A pre-processing module 560 may pre-process and index uploaded images or videos. For example, in one embodiment, uploaded images or videos may be automatically processed by the pre-processing module 560 to conform the images or videos to a particular file format, resolution, etc. Furthermore, in one embodiment, the pre-processing module 560 may automatically parse the metadata associated with images or videos upon being uploaded.
The image/video generation module 540 may automatically generate output images or videos relevant to a user or to a particular set of inputs. For example, the image/video generation module 540 may generate an output video or sequence of images including content that tracks a sequence of locations representing a physical path over a particular time interval. Alternatively, the image/video generation module 540 may generate an output video or sequence of images including content that tracks a particular face or object identified in the images or video, tracks an area of motion having particular motion characteristics, tracks an identified audio source, etc. The output images or videos may have a reduced field of view (e.g., a standard non-spherical field of view) and represent relevant sub-frames to provide an image or video of interest. For example, the image or video may track a particular path of an individual, object, or other target so that each sub-frame depicts the target as the target moves through a given scene.
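Tracking a path of time-stamped locations requires the direction from the camera to the target at each time point. One possible way to compute that direction from two GPS coordinates is the standard initial great-circle bearing formula, sketched below in Python. The use of this formula, the assumption that the camera heading is known from orientation metadata, and the function names are all illustrative and not taken from this disclosure.

```python
import math


def bearing_deg(cam_lat: float, cam_lon: float,
                tgt_lat: float, tgt_lon: float) -> float:
    """Initial great-circle bearing from camera to target.

    Inputs are in degrees; the result is in degrees clockwise from north,
    in the range [0, 360).
    """
    phi1, phi2 = math.radians(cam_lat), math.radians(tgt_lat)
    dlon = math.radians(tgt_lon - cam_lon)
    y = math.sin(dlon) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(y, x)) % 360.0


def relative_yaw_deg(bearing: float, camera_heading: float) -> float:
    """Yaw of the target relative to the camera heading, in [-180, 180)."""
    return (bearing - camera_heading + 180.0) % 360.0 - 180.0
```

The relative yaw could then be used to choose the horizontal position of the sub-frame center within a spherical frame, given a known mapping from yaw to pixel column.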
In some embodiments, the image/video generation module 540 obtains metadata associated with the input image from the metadata storage 525 and identifies sub-frames of interest. The image/video generation module 540 automatically obtains a sub-frame center location, a sub-frame size, and a scaling factor for transforming the input image based on the metadata associated with the input image or image characteristics of the input image (e.g., time and location of interest, target of interest, the image or video content itself or an associated audio track). The image/video generation module 540 processes the sub-frame using the lens characteristics specified in the metadata and outputs the processed sub-frame with the same size as the input image.
In an embodiment, the media server 340 may enable the user to select from predefined image or video generation templates. For example, the user can request that the media server 340 generate a video or set of images based on location tracking, based on facial recognition, gesture recognition, audio tracking, motion detection, voice recognition, or other techniques. Various parameters used by the media server 340 to select relevant frames such as thresholds governing proximity distance and clip duration can be adjusted or pre-set.
In an embodiment, the user interface may also provide an interactive viewer that enables the user to pan around within the content being viewed. This may allow the user to search for significant moments to incorporate into the output video or image and manually edit the automatically generated video or image. In one embodiment, the user interface enables various editing effects to be added to a generated output image or video. For example, the editing interface may enable effects such as, cut-away effects, panning, tilting, rotations, reverse angles, image stabilization, zooming, object tracking,
In an alternative embodiment, a two-step transformation may be used instead of a direct mapping. For example, based on known characteristics of the lens and the location and size of the selected sub-frame, an appropriate inverse function may be performed to remove the lens distortion present in the sub-frame. For example, if the original input image or video frame is captured with a fisheye lens, curvature in the areas of the sub-frame corresponding to the edges and corners of the original input image or video frame may be removed. The inverse function of the input lens distortion may be applied centered on the field of view of the original input image or video frame. As a result of applying the inverse function, the sub-frame may be transformed to a rectilinear image in which straight lines in the portion of the scene depicted in the sub-frame appear straight. Then, a desired lens distortion function centered at the center of the sub-frame may be applied to the rectilinear image to re-introduce a lens distortion effect.
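The two-step transformation can be sketched for a single point as follows. The sketch assumes, purely for illustration, that both the input lens distortion and the desired lens distortion follow an equidistant fisheye model (r = f * theta) and that the intermediate image is an ideal rectilinear projection (r = f * tan(theta)). Neither model nor any of the function names is specified in this disclosure.

```python
import math


def undistort_point(x, y, cx, cy, f):
    """Step 1: remove fisheye distortion centered at (cx, cy).

    Converts an equidistant fisheye radius (r = f * theta) into a
    rectilinear radius (r = f * tan(theta)).
    """
    dx, dy = x - cx, y - cy
    r = math.hypot(dx, dy)
    if r == 0.0:
        return x, y
    theta = r / f
    if theta >= math.pi / 2:
        raise ValueError("point lies outside the rectilinear field of view")
    scale = f * math.tan(theta) / r
    return cx + dx * scale, cy + dy * scale


def distort_point(x, y, cx, cy, f):
    """Step 2: apply fisheye distortion centered at (cx, cy)."""
    dx, dy = x - cx, y - cy
    r = math.hypot(dx, dy)
    if r == 0.0:
        return x, y
    theta = math.atan(r / f)
    scale = f * theta / r
    return cx + dx * scale, cy + dy * scale


def remap_point(x, y, input_center, sub_center, f_in, f_out):
    """Remap one input-image point through both steps."""
    # Step 1: the inverse function is centered on the original field of view.
    rx, ry = undistort_point(x, y, *input_center, f_in)
    # The sub-frame center moves under step 1 as well, so locate it in the
    # rectilinear image before re-centering the distortion on it.
    scx, scy = undistort_point(*sub_center, *input_center, f_in)
    # Step 2: the desired distortion is centered on the sub-frame center.
    return distort_point(rx, ry, scx, scy, f_out)
```

A practical implementation would typically evaluate the mapping in the opposite direction, from each output pixel back to an input location, and interpolate, so that every output pixel receives a value. The forward form is used here only because it follows the order of the steps as described.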
Throughout this specification, some embodiments have used the expression “coupled” along with its derivatives. The term “coupled” as used herein is not necessarily limited to two or more elements being in direct physical or electrical contact. Rather, the term “coupled” may also encompass two or more elements that are not in direct contact with each other, but yet still co-operate or interact with each other, or are structured to provide a thermal conduction path between the elements.
Likewise, as used herein, the terms “comprises,” “comprising,” “includes,” “including,” “has,” “having” or any other variation thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, article, or apparatus that comprises a list of elements is not necessarily limited to only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
In addition, “a” or “an” is employed to describe elements and components of the embodiments herein. This is done merely for convenience and to give a general sense of the invention. This description should be read to include one or at least one and the singular also includes the plural unless it is obvious that it is meant otherwise.
Finally, as used herein any reference to “one embodiment” or “an embodiment” means that a particular element, feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment.
Upon reading this disclosure, those of skill in the art will appreciate still additional alternative structural and functional designs for the described embodiments as disclosed from the principles herein. Thus, while particular embodiments and applications have been illustrated and described, it is to be understood that the disclosed embodiments are not limited to the precise construction and components disclosed herein. Various modifications, changes and variations, which will be apparent to those skilled in the art, may be made in the arrangement, operation and details of the method and apparatus disclosed herein without departing from the scope defined in the appended claims.
Number | Date | Country
---|---|---
62164409 | May 2015 | US

Relation | Number | Date | Country
---|---|---|---
Parent | 16713839 | Dec 2019 | US
Child | 16884962 | | US
Parent | 16535940 | Aug 2019 | US
Child | 16713839 | | US
Parent | 16229512 | Dec 2018 | US
Child | 16535940 | | US
Parent | 15157207 | May 2016 | US
Child | 16229512 | | US