This relates generally to image capturing including still and motion picture capture.
Generally, a shutter is used in a still imaging device such as a camera to select a particular image for capture and storage. Similarly in movie cameras, a record button is used to capture a series of frames to form a clip of interest.
Of course, one problem with both of these techniques is that a certain degree of skill is required to time the capture so as to obtain exactly the sequence that is desired.
In accordance with some embodiments, no shutter or button needs to be operated in order to select a frame or group of frames for image capture, a technique referred to herein as “buttonless frame selection.” This frees the user from having to operate the camera to select frames of interest. In addition, it reduces the amount of skill needed to time the operation of a button so as to capture exactly the frame or group of frames that is really of interest.
Thus, referring to
Thus, in some embodiments, a frame or group of frames is selected without the user ever having operated a button to indicate which frame or frames the user wants to record. In some embodiments, post-capture analysis may be done to find those frames that are of interest. This may be done using audio or video analytics to find features or sounds within the captured media that indicate that the user wishes to record a frame or group of frames. In other embodiments, specific image features may be found in order to identify the frame or frames of interest in real time during image capture.
Referring to
In order to identify those frames, rules may be stored as indicated at 30. These rules indicate how to determine what it is that the user wants to get from the captured frames. For example, after the fact, a user may indicate that really what he or she was interested in recording was the depiction of friends at the end of a trip. The analytics engine 28 may analyze the completed audio or video recorded content in order to find that specific frame or frames of interest.
Thus, in some embodiments, a continuous sequence of frames is recorded and then, after the fact, the frames may be analyzed, using video or audio analytics together with user input, to find the frame or frames of interest. It is also possible after the fact to find particular gestures or sounds within the continuously captured frames. For example, proximate in time to the frame or frames of interest, the user may make a known sound or gesture which can be searched for thereafter in order to find the frame or frames of interest.
In accordance with another embodiment shown in
The sensors 32 may be coupled to a media encoding device 40 which is coupled to the storage 20 and provides the media 22 for storage in the storage 20. Also coupled to the sensors is the analytics engine 28, which is itself coupled to the rules engine 34. The analytics engine may be coupled to the metadata 24 and the moments of interest 26. The analytics engine may be used to identify those moments of interest signaled by the user in the content being recorded.
A common time or sequencing 38 may provide an indication of a time for a time stamp so that the time or moment of interest can be identified.
In both embodiments, post-capture and real-time identification of frames of interest, the frame closest to the designated moment of interest serves as the first approximation of the intended or optimal frame. Having selected a moment of interest by either technique, a second set of analytic criteria may be used to improve frame selection. Frames within a window of time before and after the initial selection may be scored against the criteria and a local maximum within the moment window may be selected. In some embodiments, a manual control may be provided to override the automatic frame selection.
A number of different capture scenarios may be contemplated. Capture may be initiated by sensor data. Examples of sensor-data-based capture include capture based on global positioning system coordinates, acceleration, or time data. The capture of images may be based on data sensed on the person carrying the camera or on characteristics of movement or other features of an object depicted in an imaged scene or a set of frames.
Thus, when the user crosses the finish line, he or she may be at a particular global positioning point that causes a body mounted camera to snap a picture. Similarly, the acceleration of the camera itself may trigger a picture so that a picture of the scene as observed by a ski jumper may be captured. Alternatively, the video frames may be analyzed for objects moving with a certain acceleration, which may trigger capture. Since many cameras include onboard accelerometers and other sensors whose data may be included in the metadata associated with the captured image or frames, this information is readily available. Capture can also be triggered by time, which may also be recorded with the captured frame.
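The sensor-based triggers described above can be sketched as follows. This is a minimal illustration, assuming hypothetical threshold values and a sensor sample containing a GPS coordinate and an acceleration magnitude; a real camera would read these values from its onboard sensors.

```python
import math

# Hypothetical trigger configuration; real values would be set per use case.
FINISH_LINE = (40.7128, -74.0060)   # target GPS coordinate (lat, lon)
GPS_RADIUS_M = 10.0                  # trigger when within 10 m of the target
ACCEL_THRESHOLD = 15.0               # trigger on acceleration above 15 m/s^2

def haversine_m(a, b):
    """Approximate great-circle distance in meters between (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 6371000 * 2 * math.asin(math.sqrt(h))

def sensor_trigger(sample):
    """Return True when the frame tagged with this sensor sample should be captured."""
    if haversine_m(sample["gps"], FINISH_LINE) <= GPS_RADIUS_M:
        return True                  # e.g., runner crossing the finish line
    if sample["accel"] >= ACCEL_THRESHOLD:
        return True                  # e.g., ski jumper leaving the ramp
    return False
```

Either condition alone suffices, so the two sensors act as independent triggers feeding the same capture decision.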
In other embodiments, objects may be detected, objects may be recognized, and spoken commands or speech may be detected or actually understood and recognized as the capture trigger. For example, when the user says “capture,” the frame may be captured. When the user's voice is recognized in the captured audio, that may be the trigger to capture a frame or set of frames. Likewise, when a particular statement is made, that may trigger image capture. As still another example, a statement that has a certain meaning may trigger image capture. In still other examples, image capture may be initiated when particular objects are recognized within the image.
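A speech-based trigger of this kind can be sketched as follows, assuming that upstream speech recognition and speaker identification have already produced a transcript and a speaker identifier; the command vocabulary and enrolled-speaker list shown are hypothetical.

```python
TRIGGER_WORDS = {"capture", "snap"}   # hypothetical command vocabulary
KNOWN_SPEAKERS = {"user_1"}           # speaker IDs enrolled during set-up

def speech_trigger(transcript, speaker_id):
    """Trigger capture when a command word is spoken or a known voice is heard.

    `transcript` and `speaker_id` are assumed outputs of upstream speech
    recognition / speaker identification, which this sketch does not implement.
    """
    # Normalize the transcript into a set of lowercase words.
    words = {w.strip(".,!?").lower() for w in transcript.split()}
    return bool(words & TRIGGER_WORDS) or speaker_id in KNOWN_SPEAKERS
```

Either the command word or the recognized voice acts as the trigger, mirroring the alternative triggers described above.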
In some embodiments, training may be associated with the detection, recognition or understanding embodiments. Thus a system may be trained to recognize a voice, to understand the user's speech, or to associate given objects with capture triggering. This may be done during a set-up phase using graphical user interfaces in some embodiments.
In other embodiments, there may be intelligence in the selection of the actual captured frame. When the trigger is received, a frame proximate to the trigger point may be selected based on a number of criteria including the quality of the actual captured image frame. For example, overexposed or underexposed frames proximate the trigger point may be skipped to obtain the closest-in-time frame of good image quality.
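The skip-over of poorly exposed frames near the trigger point can be sketched as follows, using mean luminance as a stand-in for a real image-quality metric; the exposure bounds shown are illustrative assumptions.

```python
def nearest_good_frame(frames, trigger_index, lo=40, hi=215):
    """Return the index of the frame closest to the trigger point whose mean
    luminance falls inside [lo, hi], skipping over- or underexposed frames.

    `frames` is a list of per-frame mean-luminance values (0-255), a stand-in
    for whatever image-quality metric a real implementation would compute.
    """
    # Visit frames in order of distance from the trigger, earlier frame first on ties.
    order = sorted(range(len(frames)), key=lambda i: (abs(i - trigger_index), i))
    for i in order:
        if lo <= frames[i] <= hi:
            return i                  # closest-in-time frame of good quality
    return trigger_index              # fall back to the trigger frame itself
```

For example, if the frame at the trigger point is overexposed, the search steps outward one frame at a time until an acceptably exposed frame is found.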
Thus referring to
The sequence 42 proceeds by directing the imaging device 10 to continuously capture frames as indicated in block 44. Real time capture of moments of interest is facilitated by an audio or video analytics unit 46 that analyzes the captured video and audio for cues that indicate that a particular sequence is to be captured. For example, an eye-blinking gesture or a hand gesture may be used to signal a moment of interest. Similarly, a particular sound may be made to indicate a moment of interest. Once the analytics identifies the signal, a hit may be indicated as determined in diamond 48. Then the time may be flagged as of interest in block 50. In some embodiments, instead of flagging a particular frame, a time may be indicated, using a time stamp for example. Then frames proximate to the time of interest may be flagged so that the user does not have to provide the indication with a high degree of timing accuracy.
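The real-time flow of blocks 44 through 50 can be sketched as follows; `detect_cue` is a hypothetical stand-in for the audio/video analytics unit, and each frame carries a time stamp so that a time, rather than a specific frame, is flagged.

```python
def realtime_flagging(frames, detect_cue):
    """Continuously 'capture' frames and flag the time stamps at which the
    analytics callback detects a user cue (a gesture or a sound).

    Each frame is a (timestamp, data) pair; `detect_cue` stands in for the
    audio/video analytics engine, which this sketch does not implement.
    """
    moments = []
    for timestamp, data in frames:    # block 44: continuous capture
        if detect_cue(data):          # diamond 48: analytics hit?
            moments.append(timestamp) # block 50: flag the time, not a frame
    return moments
```

Flagging times rather than frames defers the actual frame choice to the later refinement step, so the cue itself need not be precisely timed.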
Referring next to
The sequence 52 also performs continuous capture of a series of frames as indicated in block 54. A check at diamond 56 determines whether a request to find a moment of interest has been received. If so, analytics may be used as indicated in block 58 to analyze the recorded content to identify a moment of interest having particular features. The content may be audio and/or video content. The features can be any audio or video analytically determinable signal that the user may have deliberately made at the time, or may recall having been made at the time, that is useful to identify a particular moment of interest. If a hit is detected at diamond 60, a time frame corresponding to the time of the hit may be flagged as a moment of interest as indicated at block 62. Again, instead of flagging a particular frame, a time may be used in some embodiments to make the identification of frames less skill dependent.
Finally turning to
The sequence 64 begins by locating that frame which is closest to the recorded time of interest as indicated in block 66. A predetermined number of frames may be collected before and after the located frame as indicated in block 68.
Next as indicated in block 70, the frames may be scored. The frames may be scored based on their similarity as determined by video or audio analytics to the features that were specified as the basis for identifying moments of interest.
Then the best frame may be selected as indicated in block 72 and used as an index into the set of frames. In some cases, only the best frame may be used. In other cases, a clip may be defined as a set of sequential frames whose scores are close to the ideal.
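The steps of blocks 66 through 72 can be sketched as follows; the window size and clip threshold are illustrative assumptions, and the per-frame scores are assumed to come from the analytics comparison described above.

```python
def select_best(timestamps, scores, t_interest, window=5, clip_threshold=0.8):
    """Refine an initial moment of interest into a best frame and a clip.

    `timestamps[i]` and `scores[i]` describe frame i; `scores` is assumed to
    come from analytics comparing each frame to the features that identified
    the moment. The window size and clip threshold are illustrative.
    """
    # Block 66: locate the frame closest to the recorded time of interest.
    anchor = min(range(len(timestamps)), key=lambda i: abs(timestamps[i] - t_interest))
    # Block 68: collect a window of frames before and after the located frame.
    lo, hi = max(0, anchor - window), min(len(timestamps), anchor + window + 1)
    # Blocks 70 and 72: score the window and select the best frame as the index.
    best = max(range(lo, hi), key=lambda i: scores[i])
    # Optionally grow a clip of contiguous frames scoring close to the best.
    start = end = best
    while start > lo and scores[start - 1] >= clip_threshold * scores[best]:
        start -= 1
    while end < hi - 1 and scores[end + 1] >= clip_threshold * scores[best]:
        end += 1
    return best, (start, end)
```

The clip grows outward from the best frame only while neighboring scores stay near the maximum, so a single strong frame yields a one-frame "clip" while a sustained event yields a longer one.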
The graphics processing techniques described herein may be implemented in various hardware architectures. For example, graphics functionality may be integrated within a chipset. Alternatively, a discrete graphics processor may be used. As still another embodiment, the graphics functions may be implemented by a general purpose processor, including a multicore processor.
References throughout this specification to “one embodiment” or “an embodiment” mean that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one implementation encompassed within the present invention. Thus, appearances of the phrase “one embodiment” or “in an embodiment” are not necessarily referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be instituted in suitable forms other than the particular embodiment illustrated, and all such forms may be encompassed within the claims of the present application.
While the present invention has been described with respect to a limited number of embodiments, those skilled in the art will appreciate numerous modifications and variations therefrom. It is intended that the appended claims cover all such modifications and variations as fall within the true spirit and scope of this present invention.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/US11/67457 | 12/28/2011 | WO | 00 | 6/13/2013 |