Scene and activity identification in video summary generation

Information

  • Patent Grant
  • Patent Number
    10,074,013
  • Date Filed
    Monday, October 13, 2014
  • Date Issued
    Tuesday, September 11, 2018
Abstract
Video and corresponding metadata are accessed. Events of interest within the video are identified based on the corresponding metadata, and best scenes are identified based on the identified events of interest. A video summary can be generated including one or more of the identified best scenes. The video summary can be generated using a video summary template with slots corresponding to video clips selected from among sets of candidate video clips. Best scenes can also be identified by receiving an indication of an event of interest within the video from a user during the capture of the video. Metadata patterns representing activities identified within video clips can be identified within other videos, which can subsequently be associated with the identified activities.
Description
BACKGROUND

Technical Field


This disclosure relates to a camera system, and more specifically, to processing video data captured using a camera system.


Description of the Related Art


Digital cameras are increasingly used to capture videos in a variety of settings, for instance outdoors or in a sports environment. However, as users capture increasingly more and longer videos, video management becomes increasingly difficult. Manually searching through raw videos (“scrubbing”) to identify the best scenes is extremely time consuming. Automated video processing to identify the best scenes can be very resource-intensive, particularly with high-resolution raw-format video data. Accordingly, an improved method of automatically identifying the best scenes in captured videos and generating video summaries including the identified best scenes can beneficially improve a user's video editing experience.





BRIEF DESCRIPTIONS OF THE DRAWINGS

The disclosed embodiments have other advantages and features which will be more readily apparent from the following detailed description of the invention and the appended claims, when taken in conjunction with the accompanying drawings, in which:



FIG. 1 is a block diagram of a camera system environment according to one embodiment.



FIG. 2 is a block diagram illustrating a camera system, according to one embodiment.



FIG. 3 is a block diagram of a video server, according to one embodiment.



FIG. 4 is a flowchart illustrating a method for selecting video portions to include in a video summary, according to one embodiment.



FIG. 5 is a flowchart illustrating a method for generating video summaries using video templates, according to one embodiment.



FIG. 6 is a flowchart illustrating a method for generating video summaries of videos associated with user-tagged events, according to one embodiment.



FIG. 7 is a flowchart illustrating a method of identifying an activity associated with a video, according to one embodiment.



FIG. 8 is a flowchart illustrating a method of sharing a video based on an identified activity within the video, according to one embodiment.





DETAILED DESCRIPTION

The figures and the following description relate to preferred embodiments by way of illustration only. It should be noted that from the following discussion, alternative embodiments of the structures and methods disclosed herein will be readily recognized as viable alternatives that may be employed without departing from the principles of what is claimed.


Reference will now be made in detail to several embodiments, examples of which are illustrated in the accompanying figures. It is noted that wherever practicable similar or like reference numbers may be used in the figures and may indicate similar or like functionality. The figures depict embodiments of the disclosed system (or method) for purposes of illustration only. One skilled in the art will readily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles described herein.


Example Camera System Configuration



FIG. 1 is a block diagram of a camera system environment, according to one embodiment. The camera system environment 100 includes one or more metadata sources 110, a network 120, a camera 130, a client device 135 and a video server 140. In alternative configurations, different and/or additional components may be included in the camera system environment 100. Examples of metadata sources 110 include sensors (such as accelerometers, speedometers, rotation sensors, GPS sensors, altimeters, and the like), camera inputs (such as an image sensor, microphones, buttons, and the like), and data sources (such as external servers, web pages, local memory, and the like). Although not shown in FIG. 1, it should be noted that in some embodiments, one or more of the metadata sources 110 can be included within the camera 130.


The camera 130 can include a camera body having a camera lens structured on a front surface of the camera body, various indicators on the front surface of the camera body (such as LEDs, displays, and the like), various input mechanisms (such as buttons, switches, and touch-screen mechanisms), and electronics (e.g., imaging electronics, power electronics, metadata sensors, etc.) internal to the camera body for capturing images via the camera lens and/or performing other functions. As described in greater detail in conjunction with FIG. 2 below, the camera 130 can include sensors to capture metadata associated with video data, such as motion data, speed data, acceleration data, altitude data, GPS data, and the like. A user uses the camera 130 to record or capture videos in conjunction with associated metadata, which the user can edit at a later time.


The video server 140 receives and stores videos captured by the camera 130 allowing a user to access the videos at a later time. In one embodiment, the video server 140 provides the user with an interface, such as a web page or native application installed on the client device 135, to interact with and/or edit the videos captured by the user. In one embodiment, the video server 140 generates video summaries of various videos stored at the video server, as described in greater detail in conjunction with FIG. 3 and FIG. 4 below. As used herein, “video summary” refers to a generated video including portions of one or more other videos. A video summary often includes highlights (or “best scenes”) of a video captured by a user. In some embodiments, best scenes include events of interest within the captured video, scenes associated with certain metadata (such as an above threshold altitude or speed), scenes associated with certain camera or environment characteristics, and the like. For example, in a video captured during a snowboarding trip, the best scenes in the video can include jumps performed by the user or crashes in which the user was involved. In addition to including one or more highlights of the video, a video summary can also capture the experience, theme, or story associated with the video without requiring significant manual editing by the user. In one embodiment, the video server 140 identifies the best scenes in raw video based on the metadata associated with the video. The video server 140 may then generate a video summary using the identified best scenes of the video. The metadata can either be captured by the camera 130 during the capture of the video or can be retrieved from one or more metadata sources 110 after the capture of the video.


Metadata includes information about the video itself, the camera used to capture the video, the environment or setting in which a video is captured, or any other information associated with the capture of the video. For example, metadata can include acceleration data representative of the acceleration of a camera 130 attached to a user as the user captures a video while snowboarding down a mountain. Such acceleration metadata helps identify events representing a sudden change in acceleration during the capture of the video, such as a crash the user may encounter or a jump the user performs. Thus, metadata associated with captured video can be used to identify best scenes in a video recorded by a user without relying on image processing techniques or manual curation by a user.


Examples of metadata include: telemetry data (such as motion data, velocity data, and acceleration data) captured by sensors on the camera 130; location information captured by a GPS receiver of the camera 130; compass heading information; altitude information of the camera 130; biometric data such as the heart rate of the user, breathing of the user, eye movement of the user, body movement of the user, and the like; vehicle data such as the velocity or acceleration of the vehicle, the brake pressure of the vehicle, or the rotations per minute (RPM) of the vehicle engine; or environment data such as the weather information associated with the capture of the video. The video server 140 may receive metadata directly from the camera 130 (for instance, in association with receiving video from the camera), from a client device 135 (such as a mobile phone, computer, or vehicle system associated with the capture of video), or from external metadata sources 110 such as web pages, blogs, databases, social networking sites, or servers or devices storing information associated with the user (e.g., a user may use a fitness device recording fitness data).
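

For purposes of illustration only, the following sketch shows one possible in-memory representation of such timestamped metadata samples. The data structure, field names, and source labels are assumptions made for this example; the system is not limited to any particular metadata format.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class MetadataSample:
    """One timestamped reading from a metadata source (sensor, GPS receiver, biometric device, etc.)."""
    timestamp: float          # seconds since the start of video capture
    source: str               # e.g. "accelerometer", "gps", "heart_rate"
    values: Dict[str, float]  # e.g. {"ax": 0.1, "ay": -9.8, "az": 0.3}

@dataclass
class VideoMetadata:
    """All metadata associated with a single captured video."""
    video_id: str
    samples: List[MetadataSample] = field(default_factory=list)

    def by_source(self, source: str) -> List[MetadataSample]:
        """Return the samples for one metadata source, ordered by time."""
        return sorted((s for s in self.samples if s.source == source),
                      key=lambda s: s.timestamp)

# Example usage: acceleration captured while snowboarding
meta = VideoMetadata(video_id="example-video-01")
meta.samples.append(MetadataSample(2.0, "accelerometer", {"ax": 0.2, "ay": -9.8, "az": 0.1}))
meta.samples.append(MetadataSample(2.1, "accelerometer", {"ax": 6.5, "ay": -2.3, "az": 4.0}))
print(len(meta.by_source("accelerometer")))  # -> 2
```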


A user can interact with interfaces provided by the video server 140 via the client device 135. The client device 135 is any computing device capable of receiving user inputs as well as transmitting and/or receiving data via the network 120. In one embodiment, the client device 135 is a conventional computer system, such as a desktop or a laptop computer. Alternatively, the client device 135 may be a device having computer functionality, such as a personal digital assistant (PDA), a mobile telephone, a smartphone or another suitable device. The user can use the client device to view and interact with or edit videos stored on the video server 140. For example, the user can view web pages including video summaries for a set of videos captured by the camera 130 via a web browser on the client device 135.


One or more input devices associated with the client device 135 receive input from the user. For example, the client device 135 can include a touch-sensitive display, a keyboard, a trackpad, a mouse, a voice recognition system, and the like. In some embodiments, the client device 135 can access video data and/or metadata from the camera 130 or one or more metadata sources 110, and can transfer the accessed metadata to the video server 140. For example, the client device may retrieve videos and metadata associated with the videos from the camera via a universal serial bus (USB) cable coupling the camera 130 and the client device 135. The client device can then upload the retrieved videos and metadata to the video server 140.


In one embodiment, the client device 135 executes an application allowing a user of the client device 135 to interact with the video server 140. For example, a user can identify metadata properties using an application executing on the client device 135, and the application can communicate the identified metadata properties selected by a user to the video server 140 to generate and/or customize a video summary. As another example, the client device 135 can execute a web browser configured to allow a user to select video summary properties, which in turn can communicate the selected video summary properties to the video server 140 for use in generating a video summary. In one embodiment, the client device 135 interacts with the video server 140 through an application programming interface (API) running on a native operating system of the client device 135, such as IOS® or ANDROID™. While FIG. 1 shows a single client device 135, in various embodiments, any number of client devices 135 may communicate with the video server 140.


The video server 140 communicates with the client device 135, the metadata sources 110, and the camera 130 via the network 120, which may include any combination of local area and/or wide area networks, using both wired and/or wireless communication systems. In one embodiment, the network 120 uses standard communications technologies and/or protocols. In some embodiments, all or some of the communication links of the network 120 may be encrypted using any suitable technique or techniques. It should be noted that in some embodiments, the video server 140 is located within the camera 130 itself.


Example Camera Configuration



FIG. 2 is a block diagram illustrating a camera system, according to one embodiment. The camera 130 includes one or more microcontrollers 202 (such as microprocessors) that control the operation and functionality of the camera 130. A lens and focus controller 206 is configured to control the operation and configuration of the camera lens. A system memory 204 is configured to store executable computer instructions that, when executed by the microcontroller 202, perform the camera functionalities described herein. A synchronization interface 208 is configured to synchronize the camera 130 with other cameras or with other external devices, such as a remote control, a second camera 130, a smartphone, a client device 135, or a video server 140.


A controller hub 230 transmits and receives information from various I/O components. In one embodiment, the controller hub 230 interfaces with LED lights 236, a display 232, buttons 234, microphones such as microphones 222, speakers, and the like.


A sensor controller 220 receives image or video input from an image sensor 212. The sensor controller 220 receives audio inputs from one or more microphones, such as microphones 222a and 222b. Metadata sensors 224, such as an accelerometer, a gyroscope, a magnetometer, a global positioning system (GPS) sensor, or an altimeter, may be coupled to the sensor controller 220. The metadata sensors 224 each collect data measuring the environment and aspect in which the video is captured. For example, the accelerometer collects motion data, comprising velocity and/or acceleration vectors representative of motion of the camera 130, the gyroscope provides orientation data describing the orientation of the camera 130, the GPS sensor provides GPS coordinates identifying the location of the camera 130, and the altimeter measures the altitude of the camera 130. The metadata sensors 224 are rigidly coupled to the camera 130 such that any motion, orientation, or change in location experienced by the camera 130 is also experienced by the metadata sensors 224. The sensor controller 220 synchronizes the various types of data received from the various sensors connected to the sensor controller 220. For example, the sensor controller 220 associates a time stamp representing when the data was captured by each sensor. Thus, using the time stamp, the measurements received from the metadata sensors 224 are correlated with the corresponding video frames captured by the image sensor 212. In one embodiment, the sensor controller begins collecting metadata from the metadata sources when the camera 130 begins recording a video. In one embodiment, the sensor controller 220 or the microcontroller 202 performs operations on the received metadata to generate additional metadata information. For example, the microcontroller may integrate the received acceleration data to determine the velocity profile of the camera 130 during the recording of a video.
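

As an illustrative sketch of the time-stamp correlation described above, the following pairs each metadata sample with the nearest video frame on a shared clock. The function name, signature, and sample rates are assumptions for this example and do not represent the camera's actual firmware.

```python
import bisect
from typing import List, Tuple

def correlate_samples_to_frames(frame_times: List[float],
                                sample_times: List[float]) -> List[Tuple[int, int]]:
    """Pair each metadata sample with the nearest video frame by timestamp.

    frame_times and sample_times are in seconds on a shared clock, as both
    streams are described as being stamped by the sensor controller.
    Returns a list of (sample_index, frame_index) pairs.
    """
    pairs = []
    for i, t in enumerate(sample_times):
        j = bisect.bisect_left(frame_times, t)
        # Choose whichever neighbouring frame is closest in time.
        if j > 0 and (j == len(frame_times) or
                      abs(frame_times[j - 1] - t) <= abs(frame_times[j] - t)):
            j -= 1
        pairs.append((i, j))
    return pairs

# 30 fps video, sensor sampled at roughly 10 Hz
frames = [k / 30.0 for k in range(300)]
samples = [k / 10.0 for k in range(100)]
print(correlate_samples_to_frames(frames, samples)[:3])  # -> [(0, 0), (1, 3), (2, 6)]
```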


Additional components connected to the microcontroller 202 include an I/O port interface 238 and an expansion pack interface 240. The I/O port interface 238 may facilitate receiving or transmitting video or audio information through an I/O port. Examples of I/O ports or interfaces include USB ports, HDMI ports, Ethernet ports, audio ports, and the like. Furthermore, embodiments of the I/O port interface 238 may include wireless ports that can accommodate wireless connections. Examples of wireless ports include Bluetooth, Wireless USB, Near Field Communication (NFC), and the like. The expansion pack interface 240 is configured to interface with camera add-ons and removable expansion packs, such as a display module, an extra battery module, a wireless module, and the like.


Example Video Server Architecture



FIG. 3 is a block diagram of an architecture of the video server. The video server 140 in the embodiment of FIG. 3 includes a user storage module 305 (“user store” hereinafter), a video storage module 310 (“video store” hereinafter), a template storage module 315 (“template store” hereinafter), a video editing module 320, a metadata storage module 325 (“metadata store” hereinafter), a web server 330, an activity identifier 335, and an activity storage module 340 (“activity store” hereinafter). In other embodiments, the video server 140 may include additional, fewer, or different components for performing the functionalities described herein. Conventional components such as network interfaces, security functions, load balancers, failover servers, management and network operations consoles, and the like are not shown so as to not obscure the details of the system architecture.


Each user of the video server 140 creates a user account, and user account information is stored in the user store 305. A user account includes information provided by the user (such as biographic information, geographic information, and the like) and may also include additional information inferred by the video server 140 (such as information associated with a user's previous use of a camera). Examples of user information include a username, a first and last name, contact information, a user's hometown or geographic region, other location information associated with the user, and the like. The user store 305 may include data describing interactions between a user and videos captured by the user. For example, a user account can include a unique identifier associating videos uploaded by the user with the user's user account.


The video store 310 stores videos captured and uploaded by users of the video server 140. The video server 140 may access videos captured using the camera 130 and store the videos in the video store 310. In one example, the video server 140 may provide the user with an interface executing on the client device 135 that the user may use to upload videos to the video store 310. In one embodiment, the video server 140 indexes videos retrieved from the camera 130 or the client device 135, and stores information associated with the indexed videos in the video store. For example, the video server 140 provides the user with an interface to select one or more index filters used to index videos. Examples of index filters include but are not limited to: the type of equipment used by the user (e.g., ski equipment, mountain bike equipment, etc.), the type of activity being performed by the user while the video was captured (e.g., snowboarding, mountain biking, etc.), the time and date at which the video was captured, or the type of camera 130 used by the user.


In some embodiments, the video server 140 generates a unique identifier for each video stored in the video store 310. In some embodiments, the generated identifier for a particular video is unique to a particular user. For example, each user can be associated with a first unique identifier (such as a 10-digit alphanumeric string), and each video captured by a user is associated with a second unique identifier made up of the first unique identifier associated with the user concatenated with a video identifier (such as an 8-digit alphanumeric string unique to the user). Thus, each video identifier is unique among all videos stored at the video store 310, and can be used to identify the user that captured the video.
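

A minimal sketch of the concatenated identifier scheme described in this example follows, assuming a 10-character alphanumeric user identifier and an 8-character per-user video suffix; the helper functions shown are hypothetical.

```python
import secrets
import string

ALPHANUMERIC = string.ascii_uppercase + string.digits

def new_user_id() -> str:
    """10-character alphanumeric identifier, unique per user (lengths taken from the example above)."""
    return "".join(secrets.choice(ALPHANUMERIC) for _ in range(10))

def new_video_id(user_id: str, existing: set) -> str:
    """Concatenate the user's identifier with an 8-character suffix unique to that user."""
    while True:
        suffix = "".join(secrets.choice(ALPHANUMERIC) for _ in range(8))
        video_id = user_id + suffix
        if video_id not in existing:
            existing.add(video_id)
            return video_id

user = new_user_id()
videos = set()
vid = new_video_id(user, videos)
# The prefix identifies the user that captured the video; the full string is globally unique.
assert vid.startswith(user) and len(vid) == 18
```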


The metadata store 325 stores metadata associated with videos stored by the video store 310. For instance, the video server 140 can retrieve metadata from the camera 130, the client device 135, or one or more metadata sources 110, can associate the metadata with the corresponding video (for instance by associating the metadata with the unique video identifier), and can store the metadata in the metadata store 325. The metadata store 325 can store any type of metadata, including but not limited to the types of metadata described herein. It should be noted that in some embodiments, metadata corresponding to a video is stored within a video file itself, and not in a separate storage module.


The web server 330 provides a communicative interface between the video server 140 and other entities of the environment of FIG. 1. For example, the web server 330 can access videos and associated metadata from the camera 130 or the client device 135 to store in the video store 310 and the metadata store 325, respectively. The web server 330 can also receive user input provided to the client device 135, can request video summary templates or other information from a client device 135 for use in generating a video summary, and can provide a generated video summary to the client device or another external entity.


Event of Interest/Activity Identification


The video editing module 320 analyzes metadata associated with a video to identify best scenes of the video based on identified events of interest or activities, and generates a video summary including one or more of the identified best scenes of the video. The video editing module 320 first accesses one or more videos from the video store 310, and accesses metadata associated with the accessed videos from the metadata store 325. The video editing module 320 then analyzes the metadata to identify events of interest in the metadata. Examples of events of interest can include abrupt changes or anomalies in the metadata, such as a peak or valley in the metadata, maximum or minimum values within the metadata, metadata exceeding or falling below particular thresholds, metadata within a threshold of predetermined values (for instance, within 20 meters of a particular location), and the like. The video editing module 320 can identify events of interest in videos based on any other type of metadata, such as a heart rate of a user, orientation information, and the like.


For example, the video editing module 320 can identify any of the following as an event of interest within the metadata: a greater than threshold change in acceleration or velocity within a pre-determined period of time, a maximum or above-threshold velocity or acceleration, a maximum or local maximum altitude, a maximum or above-threshold heart rate or breathing rate of a user, a maximum or above-threshold audio magnitude, a user location within a pre-determined threshold distance from a pre-determined location, a threshold change in or pre-determined orientation of the camera or user, a proximity to another user or location, a time within a threshold of a pre-determined time, a pre-determined environmental condition (such as a particular weather event, a particular temperature, a sporting event, a human gathering, or any other suitable event), or any other event associated with particular metadata.
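

The following sketch illustrates two of the cues listed above, an above-threshold value and a greater-than-threshold change within a short window, applied to a single metadata series. The thresholds, units, and function names are illustrative assumptions rather than values prescribed by this description.

```python
from typing import List

def events_of_interest(times: List[float],
                       values: List[float],
                       threshold: float,
                       min_jump: float,
                       window: float = 1.0) -> List[float]:
    """Return timestamps that look like events of interest in one metadata series.

    Flags (a) samples exceeding an absolute threshold and (b) greater-than-min_jump
    changes within `window` seconds. Times must be sorted.
    """
    events = []
    for i, (t, v) in enumerate(zip(times, values)):
        if v >= threshold:
            events.append(t)
            continue
        # Look back over the preceding `window` seconds for an abrupt change.
        j = i
        while j > 0 and t - times[j - 1] <= window:
            j -= 1
        if i > j and abs(v - values[j]) >= min_jump:
            events.append(t)
    return events

# Speed (m/s) sampled once per second: spike at t=3 and t=4, abrupt slowdown at t=5.
speeds = [4.0, 4.2, 4.1, 11.0, 12.5, 4.3]
print(events_of_interest(list(range(6)), speeds, threshold=12.0, min_jump=5.0))
# -> [3, 4, 5]: jump at t=3, above-threshold speed at t=4, sudden slowdown at t=5
```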


In some embodiments, a user can manually indicate an event of interest during capture of the video. For example, a user can press a button on the camera or a camera remote or otherwise interact with the camera during the capture of video to tag the video as including an event of interest. The manually tagged event of interest can be indicated within metadata associated with the captured video. For example, if a user is capturing video while snowboarding and presses a camera button associated with manually tagging an event of interest, the camera creates metadata associated with the captured video indicating that the video includes an event of interest, and indicating a time or portion within the captured video at which the tagged event of interest occurs. In some embodiments, the manual tagging of an event of interest by a user while capturing video is stored as a flag within a resulting video file. The location of the flag within the video file corresponds to a time within the video at which the user manually tags the event of interest.


In some embodiments, a user can manually indicate an event of interest during capture of the video using a spoken command or audio signal. For instance, a user can say “Tag” or “Tag my moment” during the capture of video to tag the video as including an event of interest. The audio-tagged event of interest can be indicated within metadata associated with the captured video. The spoken command can be pre-programmed, for instance by a manufacturer, programmer, or seller of the camera system, or can be customized by a user of the camera system. For instance, a user can speak a command or other audio signal into a camera during a training period (for instance, in response to configuring the camera into a training mode, or in response to the selection of a button or interface option associated with training a camera to receive a spoken command). The spoken command or audio signal can be repeated during the training mode a threshold number of times (such as once, twice, or any number of times necessary for the purposes of identifying audio patterns as described herein), and the camera system can identify an audio pattern associated with the spoken commands or audio signals received during the training period. The audio pattern is then stored at the camera, and, during a video capture configuration, the camera can identify the audio pattern in a spoken command or audio signal received from a user of the camera, and can manually tag an event of interest during the capture of video in response to detecting the stored audio pattern within the received spoken command or audio signal. In some embodiments, the audio pattern is specific to spoken commands or audio signals received from a particular user and can be detected only in spoken commands or audio signals received from the particular user. In other embodiments, the audio pattern can be identified within spoken commands or audio signals received from any user. It should be noted that manually identified events of interest can be associated with captured video by the camera itself, and can be identified by a system to which the captured video is uploaded from the camera without significant additional post-processing.


As noted above, the video editing module 320 can identify events of interest based on activities performed by users when the videos are captured. For example, a jump while snowboarding or a crash while skateboarding can be identified as events of interest. Activities can be identified by the activity identifier module 335 based on metadata associated with the video captured while performing the activities. Continuing with the previous example, metadata associated with a particular altitude and a parabolic upward and then downward velocity can be identified as a “snowboarding jump”, and a sudden slowdown in velocity and accompanying negative acceleration can be identified as a “skateboarding crash”.


The video editing module 320 can identify events of interest based on audio captured in conjunction with the video. In some embodiments, the video editing module identifies events of interest based on one or more spoken words or phrases in captured audio. For example, if audio of a user saying “Holy Smokes!” is captured, the video editing module can determine that an event of interest just took place (e.g., within the previous 5 seconds or other threshold of time), and if audio of a user saying “Oh no! Watch out!” is captured, the video editing module can determine that an event of interest is about to occur (e.g., within the next 5 seconds or other threshold of time). In addition to identifying events of interest based on captured dialogue, the video editing module can identify an event of interest based on captured sound effects, captured audio exceeding a magnitude or pitch threshold, or captured audio satisfying any other suitable criteria.
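

A sketch of the dialogue-based heuristic described above, assuming speech recognition has already produced timestamped phrases upstream; the phrase table and window lengths simply mirror the five-second examples given in the text.

```python
from typing import List, Tuple

# Hypothetical mapping from recognised phrases to where the event lies relative to the utterance.
REACTIVE_PHRASES = {"holy smokes": -5.0}    # event ended up to 5 s before the phrase
ANTICIPATORY_PHRASES = {"watch out": 5.0}   # event expected up to 5 s after the phrase

def events_from_transcript(transcript: List[Tuple[float, str]]) -> List[Tuple[float, float]]:
    """Turn (timestamp, recognised phrase) pairs into candidate event windows.

    Returns (start, end) windows in video time; speech recognition itself is
    assumed to happen elsewhere, and only the before/after heuristic is applied here.
    """
    windows = []
    for t, phrase in transcript:
        key = phrase.lower().strip()
        if key in REACTIVE_PHRASES:
            windows.append((t + REACTIVE_PHRASES[key], t))
        elif key in ANTICIPATORY_PHRASES:
            windows.append((t, t + ANTICIPATORY_PHRASES[key]))
    return windows

print(events_from_transcript([(42.0, "Holy smokes"), (90.0, "watch out")]))
# -> [(37.0, 42.0), (90.0, 95.0)]
```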


In some embodiments, the video editing module 320 can identify video that does not include events of interest. For instance, the video editing module 320 can identify video that is associated with metadata patterns determined to not be of interest to a user. Such patterns can include metadata associated with a below-threshold movement, a below-threshold luminosity, a lack of faces or other recognizable objects within the video, audio data that does not include dialogue or other notable sound effects, and the like. In some embodiments, video determined to not include events of interest can be disqualified from consideration for inclusion in a generated video summary, or can be hidden from a user viewing captured video (in order to increase the chance that the remaining video presented to the user does include events of interest).


The activity identifier module 335 can receive a manual identification of an activity within videos from one or more users. In some embodiments, activities can be tagged during the capture of video. For instance, if a user is about to capture video while performing a snowboarding jump, the user can manually tag the video being captured or about to be captured as “snowboarding jump”. In some embodiments, activities can be tagged after the video is captured, for instance during playback of the video. For instance, a user can tag an activity in a video as a skateboarding crash upon playback of the video.


Activity tags in videos can be stored within metadata associated with the videos. For videos stored in the video store 310, the metadata including activity tags associated with the videos is stored in the metadata store 325. In some embodiments, the activity identifier module 335 identifies metadata patterns associated with particular activities and/or activity tags. For instance, metadata associated with several videos tagged with the activity “skydiving” can be analyzed to identify similarities within the metadata, such as a steep increase in acceleration at a high altitude followed by a high velocity at decreasing altitudes. Metadata patterns associated with particular activities are stored in the activity store 340.
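

To make the skydiving example concrete, the following is a rough, hand-written metadata pattern check. In practice the activity identifier module 335 could derive such patterns from aggregated tagged metadata; every threshold in this sketch is an assumption.

```python
from typing import List

def matches_skydive_pattern(altitude: List[float],
                            velocity: List[float],
                            accel: List[float],
                            high_alt: float = 1000.0,
                            accel_jump: float = 5.0,
                            fast: float = 40.0) -> bool:
    """Rough check for the example skydiving pattern: a sharp acceleration increase
    at high altitude, followed by high velocity while altitude is decreasing.
    All series are sampled on the same timeline; thresholds are illustrative."""
    n = len(altitude)
    for i in range(1, n):
        if altitude[i] > high_alt and accel[i] - accel[i - 1] >= accel_jump:
            # Look for high speed with falling altitude afterwards.
            for j in range(i + 1, n):
                if velocity[j] >= fast and altitude[j] < altitude[j - 1]:
                    return True
    return False

alt = [3000, 3000, 2990, 2800, 2500, 2100]
vel = [0, 1, 15, 45, 55, 55]
acc = [0, 0, 9, 9, 1, 0]
print(matches_skydive_pattern(alt, vel, acc))  # -> True
```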


In some embodiments, metadata patterns associated with particular activities can include audio data patterns. For instance, particular sound effects, words or phrases of dialogue, or the like can be associated with particular activities. For example, the spoken phrase “nice wave” can be associated with surfing, and the sound of a revving car engine can be associated with driving or racing a vehicle. In some embodiments, metadata patterns used to identify activities can include the use of particular camera mounts associated with the activities in capturing video. For example, a camera can detect that it is coupled to a snowboard mount, and video captured while coupled to the snowboard mount can be associated with the activity of snowboarding.


Once metadata patterns associated with particular activities are identified, the activity identifier module 335 can identify metadata patterns in metadata associated with other videos, and can tag or associate other videos associated with metadata including the identified metadata patterns with the activities associated with the identified metadata patterns. The activity identifier module 335 can identify and store a plurality of metadata patterns associated with a plurality of activities within the activity store 340. Metadata patterns stored in the activity store 340 can be identified within videos captured by one user, and can be used by the activity identifier module 335 to identify activities within videos captured by the user. Alternatively, metadata patterns can be identified within videos captured by a first plurality of users, and can be used by the activity identifier module 335 to identify activities within videos captured by a second plurality of users including at least one user not in the first plurality of users. In some embodiments, the activity identifier module 335 aggregates metadata for a plurality of videos associated with an activity and identifies metadata patterns based on the aggregated metadata. As used herein, “tagging” a video with an activity refers to the association of the video with the activity. Activities tagged in videos can be used as a basis to identify best scenes in videos (as described above), and to select video clips for inclusion in video summary templates (as described below).
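

A sketch of applying stored patterns to the metadata of other videos and recording the resulting activity tags, using plain dictionaries and a hypothetical per-activity predicate in place of the activity store 340.

```python
from typing import Callable, Dict, List

# Activity name -> predicate over a video's metadata (patterns like the sketch above).
ActivityPattern = Callable[[dict], bool]

def tag_videos(videos: List[dict], patterns: Dict[str, ActivityPattern]) -> None:
    """Associate ('tag') each video with every activity whose stored metadata pattern
    is found in that video's metadata. Tags are written back into the video record."""
    for video in videos:
        tags = video.setdefault("activity_tags", [])
        for activity, pattern in patterns.items():
            if pattern(video["metadata"]) and activity not in tags:
                tags.append(activity)

# Illustrative pattern: any speed sample above 20 m/s counts as "downhill".
patterns = {"downhill": lambda md: max(md.get("speed", [0])) > 20.0}
library = [{"id": "vid1", "metadata": {"speed": [3.0, 25.0, 7.0]}},
           {"id": "vid2", "metadata": {"speed": [2.0, 4.0]}}]
tag_videos(library, patterns)
print([v["activity_tags"] for v in library])  # -> [['downhill'], []]
```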


Videos tagged with activities can be automatically uploaded to or shared with an external system. For instance, if a user captures video, the activity identifier module 335 can identify a metadata pattern associated with an activity in metadata of the captured video, in real-time (as the video is being captured), or after the video is captured (for instance, after the video is uploaded to the video server 140). The video editing module 320 can select a portion of the captured video based on the identified activity, for instance a threshold amount of time or frames around a video clip or frame associated with the identified activity. The selected video portion can be uploaded or shared to an external system, for instance via the web server 330. The uploading or sharing of video portions can be based on one or more user settings and/or the activity identified. For instance, a user can select one or more activities in advance of capturing video, and captured video portions identified as including the selected activities can be uploaded automatically to an external system, and can be automatically shared via one or more social media outlets.


Best Scene Identification and Video Summary Generation


The video editing module 320 identifies best scenes associated with the identified events of interest for inclusion in a video summary. Each best scene is a video clip, portion, or scene (“video clips” hereinafter), and can be an entire video or a portion of a video. For instance, the video editing module 320 can identify video clips occurring within a threshold amount of time of an identified event of interest (such as 3 seconds before and after the event of interest), within a threshold number of frames of an identified event of interest (such as 24 frames before and after the event of interest), and the like. The amount or length of a best scene can be pre-determined, and/or can be selected by a user.
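

A minimal sketch of carving a best scene out of a video as a fixed window of frames around an event of interest, using the 24-frames-before-and-after example; clamping the window to the bounds of the video is an assumption.

```python
from typing import List, Tuple

def best_scene_frames(event_frame: int,
                      total_frames: int,
                      frames_before: int = 24,
                      frames_after: int = 24) -> Tuple[int, int]:
    """Frame range (inclusive start, exclusive end) for a best scene built around
    one event of interest, clamped to the bounds of the video."""
    start = max(0, event_frame - frames_before)
    end = min(total_frames, event_frame + frames_after + 1)
    return start, end

def best_scenes(event_frames: List[int], total_frames: int) -> List[Tuple[int, int]]:
    """One best scene per identified event of interest, in chronological order."""
    return [best_scene_frames(f, total_frames) for f in sorted(event_frames)]

print(best_scenes([10, 500], total_frames=900))  # -> [(0, 35), (476, 525)]
```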


The amount or length of video clip making up a best scene can vary based on an activity associated with captured video, based on a type or value of metadata associated with captured video, based on characteristics of the captured video, based on a camera mode used to capture the video, or any other suitable characteristic. For example, if an identified event of interest is associated with an above-threshold velocity, the video editing module 320 can identify all or part of the video corresponding to above-threshold velocity metadata as the best scene. In another example, the length of a video clip identified as a best scene can be greater for events of interest associated with maximum altitude values than for events of interest associated with proximity to a pre-determined location.


For events of interest manually tagged by a user, the length of a video clip identified as a best scene can be pre-defined by the user, can be manually selected by the user upon tagging the event of interest, can be longer than automatically-identified events of interest, can be based on a user-selected tagging or video capture mode, and the like. The amount or length of video clips making up best scenes can vary based on the underlying activity represented in captured video. For instance, best scenes associated with events of interest in videos captured while boating can be longer than best scenes associated with events of interest in videos captured while skydiving.


The identified video portions make up the best scenes as described herein. The video editing module 320 generates a video summary by combining or concatenating some or all of the identified best scenes into a single video. The video summary thus includes video portions of events of interest, beneficially resulting in a playable video including scenes likely to be of greatest interest to a user. The video editing module 320 can receive one or more video summary configuration selections from a user, each specifying one or more properties of the video summary (such as a length of a video summary, a number of best scenes for inclusion in the video summary, and the like), and can generate the video summary according to the one or more video summary configuration selections. In some embodiments, the video summary is a renderable or playable video file configured for playback on a viewing device (such as a monitor, a computer, a mobile device, a television, and the like). The video summary can be stored in the video store 310, or can be provided by the video server 140 to an external entity for subsequent playback. Alternatively, the video editing module 320 can serve the video summary from the video server 140 by serving each best scene directly from a corresponding best scene video file stored in the video store 310 without compiling a singular video summary file prior to serving the video summary. It should be noted that the video editing module 320 can apply one or more edits, effects, filters, and the like to one or more best scenes within the video summary, or to the entire video summary during the generation of the video summary.


In some embodiments, the video editing module 320 ranks identified best scenes. For instance, best scenes can be ranked based on activities with which they are associated, based on metadata associated with the best scenes, based on length of the best scenes, based on a user-selected preference for characteristics associated with the best scenes, or based on any other suitable criteria. For example, longer best scenes can be ranked higher than shorter best scenes. Likewise, a user can specify that best scenes associated with above-threshold velocities can be ranked higher than best scenes associated with above-threshold heart rates. In another example, best scenes associated with jumps or crashes can be ranked higher than best scenes associated with sitting down or walking. Generating a video summary can include identifying and including the highest ranked best scenes in the video summary.


In some embodiments, the video editing module 320 classifies scenes by generating a score associated with each of one or more video classes based on metadata patterns associated with the scenes. Classes can include but are not limited to: content-related classes (“snow videos”, “surfing videos”, etc.), video characteristic classes (“high motion videos”, “low light videos”, etc.), video quality classes, mode of capture classes (based on capture mode, mount used, etc.), sensor data classes (“high velocity videos”, “high acceleration videos”, etc.), audio data classes (“human dialogue videos”, “loud videos”, etc.), number of cameras used (“single-camera videos”, “multi-camera videos”, etc.), activity identified within the video, and the like. Scenes can be scored for one or more video classes, the scores can be weighted based on a pre-determined or user-defined class importance scale, and the scenes can be ranked based on the scores generated for the scenes.
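

An illustrative sketch of ranking scenes by weighted per-class scores as described above. The class names, scores, and weights are invented for the example, and the computation of each per-class score is left out.

```python
from typing import Dict, List

def rank_scenes(scene_scores: List[Dict[str, float]],
                class_weights: Dict[str, float]) -> List[int]:
    """Rank scenes by the weighted sum of their per-class scores.

    scene_scores[i] maps class name -> score for scene i; class_weights is the
    pre-determined or user-defined importance scale. Returns scene indices, best first.
    """
    def weighted(scores: Dict[str, float]) -> float:
        return sum(class_weights.get(cls, 0.0) * val for cls, val in scores.items())
    return sorted(range(len(scene_scores)),
                  key=lambda i: weighted(scene_scores[i]),
                  reverse=True)

scores = [{"high motion": 0.9, "low light": 0.2},
          {"high motion": 0.3, "human dialogue": 0.8}]
weights = {"high motion": 2.0, "human dialogue": 1.0, "low light": 0.5}
print(rank_scenes(scores, weights))  # -> [0, 1]
```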


In one example, the video editing module 320 analyzes metadata associated with accessed videos chronologically to identify an order of events of interest presented within the video. For example, the video editing module 320 can analyze acceleration data to identify an ordered set of video clips associated with acceleration data exceeding a particular threshold. In some embodiments, the video editing module 320 can identify an ordered set of events occurring within a pre-determined period of time. Each event in the identified set of events can be associated with a best scene; if the identified set of events is chronologically ordered, the video editing module 320 can generate a video summary by combining video clips associated with each identified event in the order of the ordered set of events.


In some embodiments, the video editing module 320 can generate a video summary for a user using only videos associated with (or captured by) the user. To identify such videos, the video editing module 320 can query the video store 310 to identify videos associated with the user. In some embodiments, each video captured by all users of the video server 140 includes a unique identifier identifying the user that captured the video and identifying the video (as described above). In such embodiments, the video editing module 320 queries the video store 310 with an identifier associated with a user to identify videos associated with the user. For example, if all videos associated with User A include a unique identifier that starts with the sequence “X1Y2Z3” (an identifier unique to User A), the video editing module 320 can query the video store 310 using the identifier “X1Y2Z3” to identify all videos associated with User A. The video editing module 320 can then identify best scenes within such videos associated with a user, and can generate a video summary including such best scenes as described herein.


In addition to identifying best scenes, the video editing module 320 can identify one or more video frames that satisfy a set of pre-determined criteria for inclusion in a video summary, or for flagging to a user as candidates for saving as images/photograph stills. The pre-determined criteria can include metadata criteria, including but not limited to: frames with high motion (or blur) in a first portion of a frame and low motion (or blur) in another portion of a frame, frames associated with particular audio data (such as audio data above a particular magnitude threshold or audio data associated with voices or screaming), frames associated with above-threshold acceleration data, or frames associated with metadata that satisfies any other metadata criteria as described herein. In some embodiments, users can specify metadata criteria for use in flagging one or more video frames that satisfy pre-determined criteria. Similarly, in some embodiments, the video editing module 320 can identify metadata patterns or similarities in frames selected by a user to save as images/photograph stills, and can identify subsequent video frames that include the identified metadata patterns or similarities for flagging as candidates to save as images/photograph stills.


Video Summary Templates


In one embodiment, the video editing module 320 retrieves video summary templates from the template store 315 to generate a video summary. The template store 315 includes video summary templates, each describing a sequence of video slots for inclusion in a video summary. In one example, each video summary template may be associated with a type of activity performed by the user while capturing video or the equipment used by the user while capturing video. For example, a video summary template for generating video summaries of a ski trip can differ from the video summary template for generating video summaries of a mountain biking trip.


Each slot in a video summary template is a placeholder to be replaced by a video clip or scene when generating a video summary. Each slot in a video summary template can be associated with a pre-defined length, and the slots collectively can vary in length. The slots can be ordered within a template such that once the slots are replaced with video clips, playback of the video summary results in the playback of the video clips in the order of the ordered slots replaced by the video clips. For example, a video summary template may include an introductory slot, an action slot, and a low-activity slot. When generating the video summary using such a template, a video clip can be selected to replace the introductory slot, a video clip of a high-action event can replace the action slot, and a video clip of a low-action event can replace the low-activity slot. It should be noted that different video summary templates can be used to generate video summaries of different lengths or different kinds.
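

One possible representation of a video summary template and its ordered slots is sketched below; the slot names, lengths, and the per-slot requirements field are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Slot:
    """Placeholder in a video summary template, replaced by a clip at generation time."""
    name: str
    length_s: float             # pre-defined slot length in seconds
    requires: Dict[str, float]  # metadata minimums a replacement clip should meet

@dataclass
class SummaryTemplate:
    """Ordered sequence of slots; playback follows slot order once clips are inserted."""
    activity: str
    slots: List[Slot]

ski_trip = SummaryTemplate(
    activity="ski trip",
    slots=[Slot("introductory", 4.0, {}),
           Slot("action", 6.0, {"speed_mph": 15.0}),
           Slot("low-activity", 5.0, {})])
print([s.name for s in ski_trip.slots])  # -> ['introductory', 'action', 'low-activity']
```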


In some embodiments, video summary templates include a sequence of slots associated with a theme or story. For example, a video summary template for a ski trip may include a sequence of slots selected to present the ski trip narratively or thematically. In some embodiments, video summary templates include a sequence of slots selected based on an activity type. For example, a video summary template associated with surfing can include a sequence of slots selected to highlight the activity of surfing.


Each slot in a video summary template can identify characteristics of a video clip to replace the slot within the video summary template, and a video clip can be selected to replace the slot based on the identified characteristics. For example, a slot can identify one or more of the following video clip characteristics: motion data associated with the video clip, altitude information associated with the video clip, location information associated with the video clip, weather information associated with the clip, or any other suitable video characteristic or metadata value or values associated with a video clip. In these embodiments, a video clip having one or more of the characteristics identified by a slot can be selected to replace the slot.


In some embodiments, a video clip can be selected based on a length associated with a slot. For instance, if a video slot specifies a four-second length, a four-second (give or take a pre-determined time range, such as 0.5 seconds) video clip can be selected. In some embodiments, a video clip shorter than the length associated with a slot can be selected, and the selected video clip can replace the slot, reducing the length of time taken by the slot to be equal to the length of the selected video clip. Similarly, a video clip longer than the length associated with a slot can be selected, and either 1) the selected video clip can replace the slot, expanding the length of time associated with the slot to be equal to the length of the selected video clip, or 2) a portion of the selected video clip equal to the length associated with the slot can be selected and used to replace the slot. In some embodiments, the length of time of a video clip can be increased or decreased to match the length associated with a slot by adjusting the frame rate of the video clip to slow down or speed up the video clip, respectively. For example, to increase the amount of time taken by a video clip by 30%, 30% of the frames within the video clip can be duplicated. Likewise, to decrease the amount of time taken by a video clip by 60%, 60% of the frames within the video clip can be removed.
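

A sketch of matching a clip's playback time to a slot's length at a fixed frame rate by duplicating or dropping frames. The even-resampling strategy shown is one simple choice, not a prescribed method.

```python
from typing import List

def retime_clip(frames: List[int], target_len: int) -> List[int]:
    """Stretch or shrink a clip to `target_len` frames by evenly duplicating or
    dropping frames, so its playback time matches a slot's length at a fixed
    frame rate. Frames are represented here by their indices."""
    if not frames or target_len <= 0:
        return []
    n = len(frames)
    # Evenly resample the original frame indices onto the target timeline.
    return [frames[min(n - 1, (i * n) // target_len)] for i in range(target_len)]

clip = list(range(10))          # a 10-frame clip
print(retime_clip(clip, 13))    # +30%: three frames appear twice
print(retime_clip(clip, 4))     # -60%: six frames dropped
```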


To generate a video summary using a video summary template, the video editing module 320 accesses a video summary template from the template store 315. The accessed video summary template can be selected by a user, can be automatically selected (for instance, based on an activity type or based on characteristics of metadata or video for use in generating the video summary), or can be selected based on any other suitable criteria. The video editing module 320 then selects a video clip for each slot in the video summary template, and inserts the selected video clips into the video summary in the order of the slots within the video summary template.


To select a video clip for each slot, the video editing module 320 can identify a set of candidate video clips for each slot, and can select from the set of candidate video clips (for instance, by selecting the determined best video from the set of candidate video clips according to the principles described above). In some embodiments, selecting a video clip for a video summary template slot identifying a set of video characteristics includes selecting a video clip from a set of candidate video clips that include the identified video characteristics. For example, if a slot identifies a video characteristic of “velocity over 15 mph”, the video editing module 320 can select a video clip associated with metadata indicating that the camera or a user of the camera was traveling at a speed of over 15 miles per hour when the video was captured, and can replace the slot within the video summary template with the selected video clip.
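

A sketch of selecting a candidate clip whose metadata satisfies a slot's identified characteristics, using the velocity-over-15-mph example; treating each identified characteristic as a minimum metadata value, and representing slots and clips as plain dictionaries, are assumptions for brevity.

```python
from typing import Dict, List, Optional

def fill_slot(slot: Dict, candidates: List[Dict]) -> Optional[Dict]:
    """Pick the first candidate clip whose metadata satisfies every characteristic
    the slot identifies; returns None if no candidate is an adequate match
    (in which case the slot could be dropped, as noted below)."""
    for clip in candidates:
        md = clip["metadata"]
        if all(md.get(key, float("-inf")) >= minimum
               for key, minimum in slot["requires"].items()):
            return clip
    return None

slot = {"name": "action", "requires": {"speed_mph": 15.0}}
candidates = [{"id": "clip-a", "metadata": {"speed_mph": 9.0}},
              {"id": "clip-b", "metadata": {"speed_mph": 22.0}}]
print(fill_slot(slot, candidates)["id"])  # -> clip-b
```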


In some embodiments, video summary template slots are replaced by video clips identified as best scenes (as described above). For instance, if a set of candidate video clips is identified for each slot in a video summary template and one of the candidate video clips identified for a slot is determined to be a best scene, the best scene is selected to replace the slot. In some embodiments, multiple best scenes are identified for a particular slot; in such embodiments, one of the best scenes can be selected for inclusion into the video summary based on characteristics of the best scenes, characteristics of the metadata associated with the best scenes, a ranking of the best scenes, and the like. It should be noted that in some embodiments, if a best scene or other video clip cannot be identified as an above-threshold match for clip requirements associated with a slot, the slot can be removed from the template without replacing the slot with a video clip.


In some embodiments, instead of replacing a video summary template slot with a video clip, an image or frame can be selected and can replace the slot. In some embodiments, an image or frame can be selected that satisfies one or more pre-determined criteria for inclusion in a video summary as described above. In some embodiments, an image or frame can be selected based on one or more criteria specified by the video summary template slot. For example, if a slot specifies one or more characteristics, an image or frame having one or more of the specified characteristics can be selected. In some embodiments, the video summary template slot can specify that an image or frame is to be selected to replace the slot. When an image or frame is selected and used to replace a slot, the image or frame can be displayed for the length of time associated with the slot. For instance, if a slot is associated with a four-second period of display time, an image or frame selected and used to replace the slot can be displayed for the four-second duration.


In some embodiments, when generating a video summary using a video summary template, the video editing module 320 can present a user with a set of candidate video clips for inclusion into one or more video summary template slots, for instance using a video summary generation interface. In such embodiments, the user can be presented with a pre-determined number of candidate video clips for a particular slot, and, in response to a selection of a candidate scene by the user, the video editing module 320 can replace the slot with the selected candidate video clip. In some embodiments, the candidate video clips presented to the user for each video summary template slot are the video clips identified as best scenes (as described above). Once a user has selected a video clip for each slot in a video summary template, the video editing module 320 generates a video summary using the user-selected video clips based on the order of slots within the video summary template.


In one embodiment, the video editing module 320 generates video summary templates automatically, and stores the video summary templates in the template store 315. Alternatively, video summary templates can be generated manually by experts in the field of video creation and video editing. The video editing module 320 may provide a user with a user interface allowing the user to generate video summary templates. Video summary templates can be received from an external source, such as an external template store. Video summary templates can be generated based on video summaries manually created by users, or based on an analysis of popular videos or movies (for instance by including a slot for each scene in a video).


System Operation



FIG. 4 is a flowchart illustrating a method for selecting video portions to include in a video summary, according to one embodiment. A request to generate a video summary is received 410. The request can identify one or more videos for which a video summary is to be generated. In some embodiments, the request can be received from a user (for instance, via a video summary generation interface on a computing device), or can be received from a non-user entity (such as the video server 140 of FIG. 1). In response to the request, video and associated metadata are accessed 420. The metadata includes data describing characteristics of the video, the context or environment in which the video was captured, characteristics of the user or camera that captured the video, or any other information associated with the capture of the video. As described above, examples of such metadata include telemetry data describing the acceleration or velocity of the camera during the capture of the video, location or altitude data describing the location of the camera, environment data at the time of video capture, biometric data of a user at the time of video capture, and the like.


Events of interest within the accessed video are identified 430 based on the accessed metadata associated with the video. Events of interest can be identified based on changes in telemetry or location data within the metadata (such as changes in acceleration or velocity data), based on above-threshold values within the metadata (such as a velocity threshold or altitude threshold), based on local maximum or minimum values within the data (such as a maximum heart rate of a user), based on the proximity between metadata values and other values, or based on any other suitable criteria. Best scenes are identified 440 based on the identified events of interest. For instance, for each event of interest identified within a video, a portion of the video corresponding to the event of interest (such as a threshold amount of time or a threshold number of frames before and after the time in the video associated with the event of interest) is identified as a best scene. A video summary is then generated 450 based on the identified best scenes, for instance by concatenating some or all of the best scenes into a single video.



FIG. 5 is a flowchart illustrating a method for generating video summaries using video templates, according to one embodiment. A request to generate a video summary is received 510. A video summary template is selected 520 in response to receiving the request. The selected video summary template can be a default template, can be selected by a user, can be selected based on an activity type associated with captured video, and the like. The selected video summary template includes a plurality of slots, each associated with a portion of the video summary. The video slots can specify video or associated metadata criteria (for instance, a slot can specify a high-acceleration video clip).


A set of candidate video clips is identified 530 for each slot, for instance based on the criteria specified by each slot, based on video clips identified as “best scenes” as described above, or based on any other suitable criteria. For each slot, a candidate video clip is selected 540 from among the set of candidate video clips identified for the slot. In some embodiments, the candidate video clips in each set of candidate video clips are ranked, and the most highly ranked candidate video clip is selected. The selected candidate video clips are combined 550 to generate a video summary. For instance, the selected candidate video clips can be concatenated in the order of the slots of the video summary template with which the selected candidate video clips correspond.



FIG. 6 is a flowchart illustrating a method for generating video summaries of videos associated with user-tagged events, according to one embodiment. Video is captured 610 by a user of a camera. During video capture, an input is received 620 from the user indicating an event of interest within the captured video. The input can be received, for instance, through the selection of a camera button, a camera interface, or the like. An indication of the user-tagged event of interest is stored in metadata associated with the captured video. A video portion associated with the tagged event of interest is selected 630, and a video summary including the selected video portion is generated 640. For instance, the selected video portion can be a threshold number of video frames before and after a frame associated with the user-tagged event, and the selected video portion can be included in the generated video summary with one or more other video portions.



FIG. 7 is a flowchart illustrating a method 700 of identifying an activity associated with a video, according to one embodiment. A first video and associated metadata is accessed 710. An identification of an activity associated with the first video is received 720. For instance, a user can identify an activity in the first video during post-processing of the first video, or during the capture of the first video. A metadata pattern associated with the identified activity is identified 730 within the accessed metadata. The metadata pattern can include, for example, a defined change in acceleration metadata and altitude metadata.


A second video and associated metadata is accessed 740. The metadata pattern is identified 750 within the metadata associated with the second video. Continuing with the previous example, the metadata associated with the second video is analyzed and the defined change in acceleration metadata and altitude metadata is identified within the analyzed metadata. In response to identifying the metadata pattern within the metadata associated with the second video, the second video is associated 760 with the identified activity.
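
A simplified sketch of matching such a metadata pattern in a second video follows; treating the pattern as an acceleration spike combined with an altitude drop inside a short window is an assumption made for the example, as are the thresholds and names.

```python
# Minimal sketch of FIG. 7: the "metadata pattern" is modeled here as a required
# acceleration spike together with a drop in altitude inside a sliding window.

def pattern_present(accel, altitude, window=5, accel_spike=8.0, altitude_drop=2.0):
    """Return True if any window shows the defined acceleration/altitude change."""
    for i in range(len(accel) - window + 1):
        a = accel[i:i + window]
        h = altitude[i:i + window]
        if max(a) - min(a) >= accel_spike and h[0] - h[-1] >= altitude_drop:
            return True
    return False

def associate_activity(videos, activity="jump"):
    """Tag each (accel, altitude) metadata pair whose pattern matches (steps 740-760)."""
    return {name: activity for name, (a, h) in videos.items() if pattern_present(a, h)}

videos = {"second_video": ([1, 2, 12, 3, 1, 1], [10, 10, 9, 8, 7, 7])}
print(associate_activity(videos))
```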



FIG. 8 is a flowchart illustrating a method 800 of sharing a video based on an identified activity within the video, according to one embodiment. Metadata patterns associated with one or more pre-determined activities are stored 810. Video and associated metadata are subsequently captured 820, and a stored metadata pattern associated with an activity is identified 830 within the captured metadata. A portion of the captured video associated with the metadata pattern is selected 840, and is outputted 850 based on the activity associated with the identified metadata pattern and/or one or more user settings. For instance, a user can select “snowboarding jump” and “3 seconds before and after” as an activity and video portion length, respectively. In such an example, when a user captures video, a metadata pattern associated with a snowboarding jump can be identified, and a video portion consisting of 3 seconds before and 3 seconds after the moment in the video associated with the snowboarding jump can automatically be uploaded to a social media outlet.
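
The following sketch illustrates how per-activity user settings could drive the selection and output of such a portion; the settings dictionary, the upload callback, and the detected pattern timestamp are hypothetical placeholders rather than names from this disclosure.

```python
# Sketch of FIG. 8's output step: applying a user's per-activity settings to the
# portion of video around a detected metadata pattern.

USER_SETTINGS = {
    "snowboarding jump": {"pad_seconds": 3.0, "destination": "social_media"},
}

def select_portion(pattern_time, settings):
    """Step 840: pick the clip boundaries around the detected pattern."""
    pad = settings["pad_seconds"]
    return max(0.0, pattern_time - pad), pattern_time + pad

def output_portion(activity, pattern_time, upload):
    """Step 850: output the clip according to the user's settings for the activity."""
    settings = USER_SETTINGS.get(activity)
    if settings is None:
        return None                      # no user preference for this activity
    start, end = select_portion(pattern_time, settings)
    upload(settings["destination"], (start, end))
    return start, end

# Example: a snowboarding jump detected 42 seconds into the captured video.
print(output_portion("snowboarding jump", 42.0, lambda dest, clip: None))
```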


ADDITIONAL CONFIGURATION CONSIDERATIONS

Throughout this specification, some embodiments have used the expression “coupled” along with its derivatives. The term “coupled” as used herein is not necessarily limited to two or more elements being in direct physical or electrical contact. Rather, the term “coupled” may also encompass two or more elements that are not in direct contact with each other but that still co-operate or interact with each other, or that are structured to provide a thermal conduction path between the elements.


Likewise, as used herein, the terms “comprises,” “comprising,” “includes,” “including,” “has,” “having” or any other variation thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, article, or apparatus that comprises a list of elements is not necessarily limited to only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.


In addition, the terms “a” or “an” are employed to describe elements and components of the embodiments herein. This is done merely for convenience and to give a general sense of the invention. This description should be read to include one or at least one, and the singular also includes the plural unless it is obvious that otherwise is meant.


Finally, as used herein any reference to “one embodiment” or “an embodiment” means that a particular element, feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment.


Upon reading this disclosure, those of skill in the art will appreciate still additional alternative structural and functional designs for a video summary generation system as disclosed from the principles herein. Thus, while particular embodiments and applications have been illustrated and described, it is to be understood that the disclosed embodiments are not limited to the precise construction and components disclosed herein. Various modifications, changes and variations, which will be apparent to those skilled in the art, may be made in the arrangement, operation and details of the method and apparatus disclosed herein without departing from the spirit and scope defined in the appended claims.

Claims
  • 1. A method of generating a video summary of a video, the method comprising: accessing metadata associated with a video, the accessed metadata representative of one or more aspects of the capture of the video as a function of time during capture of the video; identifying patterns in the metadata as a function of time that correspond to performance of one or more activities being performed by a subject of the video; determining the one or more activities being performed by the subject of the video during specific portions of the video based on the identifications of the patterns in the metadata as a function of time, the one or more activities including a first activity performed by the subject of the video during a first portion of the video, the first activity being of a given type of activity; identifying moments within the video at which events of interest are captured in the video based on the accessed metadata, the moments including a first moment during performance of the first activity by the subject of the video at which a first event of interest occurs, the first event being of a given type of event; identifying individual highlight scenes in the video for the individual events of interest, wherein lengths of footage in the video included in the highlight scenes before and after the moments at which the events of interest occur are based on types of the activities being performed by the subject of the video at the moments in the video at which the events of interest are captured and types of the events such that a first scene in the video is identified for the first event of interest and a length of footage included in the first scene before and after the first moment is a first length based on the given type of activity being of a first type of activity and the given type of event being of a first type of event, a second length based on the given type of activity being of the first type of activity and the given type of event being of a second type of event, a third length based on the given type of activity being of a second type of activity and the given type of event being of the first type of event, and a fourth length based on the given type of activity being of the second type of activity and the given type of event being of the second type of event, the first length different from the second length and the third length, the third length different from the fourth length; and generating a video summary of the video for playback, the video summary including at least one of the highlight scenes.
  • 2. The method of claim 1, wherein generating the video summary comprises concatenating a plurality of the highlight scenes.
  • 3. The method of claim 1, wherein the accessed metadata is generated by a camera during the capture of the video.
  • 4. The method of claim 3, wherein the accessed metadata comprises telemetry data describing a motion of the camera during the capture of the video.
  • 5. The method of claim 3, wherein the accessed metadata comprises location data describing a location of the camera during the capture of the video.
  • 6. The method of claim 3, wherein the accessed metadata comprises biometric data describing characteristics of a user of the camera during the capture of the video.
  • 7. The method of claim 1, wherein the accessed metadata is accessed from an external entity after the capture of the video.
  • 8. The method of claim 7, wherein the accessed metadata comprises environment data describing characteristics of an environment in which the video was captured.
  • 9. The method of claim 1, further comprising: ranking the identified highlight scenes based on a likelihood that the identified highlight scenes will be of interest to a user; wherein the at least one highlight scene included in the video summary is selected based on the ranking.
  • 10. The method of claim 1, wherein the video includes multiple video segments captured at non-contiguous times.
  • 11. A system that generates a video summary of a video, the system comprising: a non-transitory computer-readable storage medium storing instructions configured to, when executed: access metadata associated with a video, the accessed metadata representative of one or more aspects of capture of the video as a function of time during capture of the video; identify patterns in the metadata as a function of time that correspond to performance of one or more activities being performed by a subject of the video; determine the one or more activities being performed by the subject of the video during specific portions of the video based on the identifications of the patterns in the metadata as a function of time, the one or more activities including a first activity performed by the subject of the video during a first portion of the video, the first activity being of a given type of activity; identify moments within the video at which events of interest are captured within the video based on the accessed metadata, the moments including a first moment during performance of the first activity by the subject of the video at which a first event of interest occurs, the first event being of a given type of event; identify individual highlight scenes in the video for the individual events of interest, wherein lengths of footage in the video included in the highlight scenes before and after the moments at which the events of interest occur are based on types of the activities being performed by the subject of the video at the moments in the video at which the events of interest are captured and types of the events such that a first scene in the video is identified for the first event of interest and a length of footage included in the first scene before and after the first moment is a first length based on the given type of activity being of a first type of activity and the given type of event being of a first type of event, a second length based on the given type of activity being of the first type of activity and the given type of event being of a second type of event, a third length based on the given type of activity being of a second type of activity and the given type of event being of the first type of event, and a fourth length based on the given type of activity being of the second type of activity and the given type of event being of the second type of event, the first length different from the second length and the third length, the third length different from the fourth length; and generate a video summary of the video for playback, the video summary including at least one of the highlight scenes; and a processor configured to execute the instructions.
  • 12. The system of claim 11, wherein the accessed metadata is generated by a camera during the capture of the video and comprises one or more of: telemetry data describing a motion of the camera during the capture of the video, location data describing a location of the camera during the capture of the video, and biometric data describing characteristics of a user of the camera during the capture of the video.
  • 13. The system of claim 11, wherein the accessed metadata is accessed from an external entity after the capture of the video and comprises environment data describing characteristics of an environment in which the video was captured.
  • 14. The system of claim 11, wherein the instructions are further configured to: rank the identified highlight scenes based on a likelihood that the identified highlight scenes will be of interest to a user; wherein the at least one highlight scene included in the video summary is selected based on the ranking.
  • 15. A non-transitory computer-readable storage medium storing instructions for identifying scenes in captured video for inclusion in a video summary, the instructions configured to, when executed: access metadata associated with a video, the accessed metadata representative of one or more aspects of capture of the video as a function of time during capture of the video; identify patterns in the metadata as a function of time that correspond to performance of one or more activities being performed by a subject of the video; determine the one or more activities being performed by the subject of the video during specific portions of the video based on the identifications of the patterns in the metadata as a function of time, the one or more activities including a first activity performed by the subject of the video during a first portion of the video, the first activity being of a given type of activity; identify moments within the video at which events of interest are captured within the video based on the accessed metadata, the moments including a first moment during performance of the first activity by the subject of the video at which a first event of interest occurs, the first event being of a given type of event; identify individual highlight scenes in the video for the individual events of interest, wherein lengths of footage in the video included in the highlight scenes before and after the moments at which the events of interest occur are based on types of the activities being performed by the subject of the video at the moments in the video at which the events of interest are captured and types of the events such that a first scene in the video is identified for the first event of interest and a length of footage included in the first scene before and after the first moment is a first length based on the given type of activity being of a first type of activity and the given type of event being of a first type of event, a second length based on the given type of activity being of the first type of activity and the given type of event being of a second type of event, a third length based on the given type of activity being of a second type of activity and the given type of event being of the first type of event, and a fourth length based on the given type of activity being of the second type of activity and the given type of event being of the second type of event, the first length different from the second length and the third length, the third length different from the fourth length; and generate a video summary of the video for playback, the video summary including at least one of the highlight scenes.
  • 16. The computer-readable storage medium of claim 15, wherein the accessed metadata is generated by a camera during the capture of the video and comprises one or more of: telemetry data describing a motion of the camera during the capture of the video, location data describing a location of the camera during the capture of the video, and biometric data describing characteristics of a user of the camera during the capture of the video.
  • 17. The computer-readable storage medium of claim 15, wherein the accessed metadata is accessed from an external entity after the capture of the video and comprises environment data describing characteristics of an environment in which the video was captured.
  • 18. The computer-readable storage medium of claim 15, wherein the instructions are further configured to: rank the identified highlight scenes based on a likelihood that the identified highlight scenes will be of interest to a user; wherein the at least one highlight scene included in the video summary is selected based on the ranking.
CROSS REFERENCE TO RELATED APPLICATIONS

This application claims priority to, and the benefit of, U.S. Provisional Application No. 62/039,849, filed Aug. 20, 2014, which is incorporated by reference herein in its entirety.

Related Publications (1)
Number Date Country
20160027470 A1 Jan 2016 US
Provisional Applications (2)
Number Date Country
62039849 Aug 2014 US
62028254 Jul 2014 US