This disclosure relates to systems and methods configured to identify activities and/or events represented in a video.
Videos may be analyzed based on their visual content to identify an activity being performed during video capture. Analyzing visual content may involve techniques that are computationally expensive.
This disclosure relates to systems and methods configured to identify activities and/or events represented in a video. An activity and/or event may be represented in a video by virtue of one or both of an entity moving with a capture device during capture of the video performing the activity and/or event, or the video portraying one or more entities performing the activity and/or event. Activity types may be characterized by one or more of common movements, equipment, spatial context, and/or other features. Events may be characterized by one or both of individual movements and/or sets of movements that may routinely occur during performance of an activity.
A system that identifies activities and/or events represented in a video may include one or more physical processors, and/or other components. The one or more physical processors may be configured by machine-readable instructions. Executing the machine-readable instructions may cause the one or more physical processors to facilitate identifying activities and/or events represented in a video. The machine-readable instructions may include one or more computer program components. The computer program components may include one or more of a video component, a sensor component, a transform component, an encoding component, a performance component, and/or other computer program components.
The video component may be configured to obtain information defining one or more videos, and/or other information. Information defining a video may include, for example, a video file. A video may include one or more of visual content, audio content, and/or other content. The visual content may be presented in the form of individual frame images in a set of multiple frame images of a video. The multiple frame images may be presented in an ordered sequence. The audio content may include recorded and/or provided audio that may accompany visual content. The audio content may be synchronized with visual content.
The sensor component may be configured to obtain sensor output signals generated from one or more sensors. Sensor output signals may be generated contemporaneously with capture of a video by a capture device. Sensor output signals may span a time duration. A given sensor may include one or more of a motion sensor, a sound transducer, and/or other sensors. Sensor output signals of a motion sensor may characterize motion of a capture device over time. Sensor output signals of a sound transducer may characterize an audio component of a video.
The transform component may be configured to transform sensor output signals to a frequency domain to generate information defining individual spectrogram representations and/or individual sets of spectrogram representations of the sensor output signals. The information defining individual spectrogram representations in a set of spectrogram representations may be generated based on successions of transforms of the sensor output signals within time windows along a time duration of the sensor output signals. Time windows for an individual spectrogram representation may have an individual time length that may set an individual time resolution of the individual spectrogram representation.
The encoding component may be configured to encode information defining individual spectrogram representations and/or individual sets of spectrogram representations into an image file.
The performance component may be configured to identify one or more activities and/or events represented in a video. The identification may be based on one or both of an individual spectrogram representation of the sensor output signals or an image file having information defining a set of spectrogram representations encoded therein.
These and other objects, features, and characteristics of the system and/or method disclosed herein, as well as the methods of operation and functions of the related elements of structure and the combination of parts and economies of manufacture, will become more apparent upon consideration of the following description and the appended claims with reference to the accompanying drawings, all of which form a part of this specification, wherein like reference numerals designate corresponding parts in the various figures. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended as a definition of the limits of the invention. As used in the specification and in the claims, the singular form of “a”, “an”, and “the” include plural referents unless the context clearly dictates otherwise.
System 10 may include one or more of a processor 11, electronic storage 12, interface 13 (e.g., bus, wireless interface, etc.), and/or other components. Electronic storage 12 may include an electronic storage medium that electronically stores information. Electronic storage 12 may store software algorithms, information determined by processor 11, information received remotely, and/or other information that enables system 10 to function properly. For example, electronic storage 12 may store information related to one or more of images, videos, image exemplars, and/or other information.
Processor 11 may be configured to provide information processing capabilities in system 10. As such, processor 11 may comprise one or more of a digital processor, an analog processor, a digital circuit designed to process information, a central processing unit, a graphics processing unit, a microcontroller, an analog circuit designed to process information, a state machine, and/or other mechanisms for electronically processing information. Processor 11 may be configured by machine-readable instructions 100. Executing machine-readable instructions 100 may cause processor 11 to identify activities and/or events represented in a video. Machine-readable instructions 100 may include one or more computer program components, such as one or more of a video component 102, a sensor component 104, a transform component 106, an encoding component 108, a performance component 110, and/or other computer program components.
In some implementations, processor 11 may be included in one or more of a server (not shown), a computing platform (not shown), a capture device (not shown), and/or other devices. By way of non-limiting illustration, a server may include processor 11 and may communicate with computing platforms via client/server architecture and/or other communication scheme. The server may be configured to provide features and/or functions of processor 11 to users via computing platforms. In some implementations, one or more features and/or functions of processor 11 may be attributed to individual computing platforms associated with users. By way of non-limiting illustration, individual computing platforms may obtain machine-readable instructions that may be the same or similar to machine-readable instructions 100 such that features and/or functions of processor 11 may be carried out locally at the individual computing platforms. In some implementations, one or more features and/or functions of processor 11 may be attributed to individual capture devices. By way of non-limiting illustration, individual capture devices may obtain machine-readable instructions that may be the same or similar to machine-readable instructions 100 such that features and/or functions of processor 11 may be carried out locally at the individual capture devices. A computing platform may include one or more of a desktop computer, a laptop computer, a smartphone, a tablet computer, and/or other computing platform. A capture device may include an action camera, a camera-enabled computing platform, and/or other devices. It is noted that in some implementations, system 10 may include one or more of one or more servers, one or more computing platforms, one or more capture devices, and/or other components described herein yet not explicitly shown in
A capture device may be configured for one or both of video capture and/or image capture. A capture device may include one or more sensors coupled to the capture device, and/or other components. A sensor may be coupled to a capture device by virtue of being attached to the capture device and/or in communication with the capture device. The sensor output signals generated by an individual sensor may span an individual time duration. In some implementations, a time duration associated with generation of sensor output signals may correspond to a duration of a video captured by a capture device. For example, sensor output signals may be generated over the same or similar duration as video capture by a capture device.
In some implementations, sensors coupled to a capture device may include one or more of an image sensor, a geolocation sensor, a motion sensor, a sound transducer, an environment sensor, and/or other sensors.
An image sensor may be configured to generate output signals conveying light and/or electromagnetic radiation incident on the image sensor, and/or other information. In some implementations, an image sensor may comprise one or more of a photosensor array (e.g., an array of photosites), a charge-coupled device sensor, an active pixel sensor, a complementary metal-oxide semiconductor sensor, an N-type metal-oxide-semiconductor sensor, and/or other image sensors.
A geolocation sensor may be configured to generate output signals conveying a location of a capture device, and/or other information. By way of non-limiting illustration, a geolocation sensor may comprise a GPS, and/or other sensors.
A motion sensor may be configured to generate output signals characterizing motion of a capture device over time. The motion of the capture device characterized by the output signals of the motion sensor may include one or more of speed, acceleration, rotation (e.g., pitch, roll, and/or yaw), orientation, and/or other motion. A motion sensor may include an inertial measurement unit, and/or other devices. By way of non-limiting illustration, a motion sensor may include one or more of an accelerometer, a gyroscope, a magnetometer, and/or other sensors.
A sound transducer may be configured to generate output signals conveying changes in pressure indicative of sound waves incident on the sound transducer. The output signals may characterize audio content of a video. By way of non-limiting illustration, a sound transducer may include a microphone.
An environment sensor may be configured to generate output signals conveying ambient environment information. Ambient environment information may include one or more of altitude, depth, ambient light, and/or other information. By way of non-limiting illustration, an environment sensor may include one or more of an altimeter, a pressure sensor, a light sensor, and/or other sensors.
The video component 102 may be configured to obtain information defining one or more videos, and/or other information. Information defining a video may include, for example, a video file. A video may include one or more of visual content, audio content, and/or other content. The visual content may be presented in the form of individual frame images in a set of multiple frame images of a video. The multiple frame images may be presented in an ordered sequence. The audio content may include recorded and/or provided audio that may accompany visual content. The audio content may be synchronized with visual content.
The video component 102 may be configured to obtain information defining one or more videos from one or more storage locations. A storage location may include electronic storage 12, electronic storage of one or more capture devices (not shown in
The video component 102 may be configured to obtain information defining one or more videos during acquisition of the information and/or after acquisition of the information by one or more capture devices. For example, video component 102 may obtain information defining one or more videos while the one or more videos are being captured by one or more capture devices. The video component 102 may obtain information defining one or more videos after the one or more videos have been captured and/or stored in memory (e.g., electronic storage 12, etc.). In some implementations, one or more videos may be characterized by one or more encoded framerates. An encoded framerate may define a number of frame images within a video per a time duration (e.g., number of frame images per second, etc.).
In some implementations, visual content may be defined by one or more of real-world visual information, electronic information, playback information, and/or other information. Real-world visual information may comprise information related to light and/or electromagnetic radiation incident on an image sensor of a capture device, and/or other information. Electronic information may comprise information related to information stored in electronic storage that conveys the light and/or electromagnetic radiation incident on an image sensor and may constitute a conversion of the real-world visual information to information suitable for electronic storage. Playback information may comprise information that may facilitate visual reproduction of the captured real-world visual information on a computing platform and/or other display device for viewing by a user, and/or other information. By way of non-limiting example, playback information may comprise a different format of the electronic information that may be readable by a playback device.
In some implementations, audio content may be defined by one or more of real-world audio information, electronic information, playback information, and/or other information. Real-world audio information may comprise information related to sound waves incident on a sound transducer and/or other sensor of a capture device, and/or other information. Electronic information may comprise information stored in electronic storage that may constitute a digital conversion of the real-world audio information to electronic information (e.g., an audio file). Playback information may comprise information that facilitates audible reproduction of captured real-world audio information on a computing platform and/or other audio reproduction device, and/or other information. By way of non-limiting example, playback information may comprise a different format of the electronic information that may be readable by a playback device.
The sensor component 104 may be configured to obtain sensor output signals generated by one or more sensors, and/or other information. The sensor component 104 may be configured to obtain sensor output signals from one or more storage locations. A storage location may include electronic storage 12, electronic storage of one or more capture devices (not shown in
The sensor component 104 may be configured to obtain sensor output signals during acquisition of the sensor output signals and/or after acquisition of the sensor output signals by one or more capture devices. For example, sensor component 104 may obtain sensor output signals from one or more sensors while the one or more videos and/or sensor output are being captured by one or more capture devices. The sensor component 104 may obtain sensor output signals after the one or more videos and/or sensor output signals have been captured and/or stored in memory (e.g., electronic storage 12, etc.).
The transform component 106 may be configured to transform sensor output signals and/or other information to a frequency domain. Transforming sensor output signals and/or other information to a frequency domain may generate information defining one or more frequency domain representations of the sensor output signals. In some implementations, a frequency domain representation may comprise a spectrogram representation and/or other frequency domain representations.
In some implementations, transforming sensor output signals and/or other information to a frequency domain may include applying one or more Fourier transforms to the sensor output signals. A Fourier transform may include one or more of a short-time Fourier transform (STFT) (alternatively, a short-term Fourier transform), a continuous-time STFT, a discrete-time STFT, a sliding DFT, and/or other transforms.
In some implementations, information defining individual spectrogram representations may be generated based on successions of transforms of sensor output signals within time windows along a time duration of the sensor output signals. The time windows used for transformation of an individual spectrogram representation may have a time length. The time length of the time windows may set an individual time resolution of an individual spectrogram representation. In some implementations, a transform of sensor output signals may be generated as a time window is slid along the time axis of the sensor output signals over the time duration of the sensor output signals. In some implementations, a transform of sensor output signals may be generated based on individual time segments within the sensor output signals. Individual time segments may have a time length that is the same or similar to a time window used for the transformation.
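By way of non-limiting illustration, the following Python sketch shows one way such a sliding-window transform might be computed; the sample rate, window length, and hop length are assumptions chosen only for illustration and do not limit the disclosure.

```python
import numpy as np

def sliding_window_spectrogram(samples, window_length, hop_length):
    """Slide a window of `window_length` samples along the signal in steps of
    `hop_length` samples and take the magnitude of the FFT of each segment."""
    window = np.hanning(window_length)
    columns = []
    for start in range(0, len(samples) - window_length + 1, hop_length):
        segment = samples[start:start + window_length] * window
        columns.append(np.abs(np.fft.rfft(segment)))
    # Rows are frequency bins, columns are time steps along the signal.
    return np.array(columns).T

# Assumed 1 kHz motion-sensor trace containing a 5 Hz oscillation plus noise.
fs = 1000
t = np.arange(0, 10, 1 / fs)
accel = np.sin(2 * np.pi * 5 * t) + 0.1 * np.random.randn(t.size)
spec = sliding_window_spectrogram(accel, window_length=200, hop_length=50)
print(spec.shape)  # (frequency bins, time steps)
```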
By way of non-limiting illustration, transform component 106 may be configured to transform sensor output signals to a frequency domain to generate information defining a set of spectrogram representations of the sensor output signals. The information defining individual spectrogram representations in the set of spectrogram representations may be generated based on successions of transforms of the sensor output signals within time windows along a time duration of the sensor output signals. The time windows for an individual spectrogram representation in the set of spectrogram representations may have an individual time length. An individual time length of the time windows of a transform may set an individual time resolution of an individual spectrogram representation.
In some implementations, a set of spectrogram representations of sensor output signals may include one or more of a first spectrogram representation, a second spectrogram representation, a third spectrogram representation, and/or other spectrogram representations. The first spectrogram representation may be generated based on successions of transforms of the sensor output signals within time windows having a first time length. The second spectrogram representation may be generated based on successions of transforms of the sensor output signals within time windows having a second time length. The third spectrogram representation may be generated based on successions of transforms of the sensor output signals within time windows having a third time length. The first spectrogram representation may have a first time resolution based on using the time windows of the first time length. The second spectrogram representation may have a second time resolution based on using the time windows of the second time length. The third spectrogram representation may have a third time resolution based on using the time windows of the third time length. In some implementations, an individual time length may be one of 10 milliseconds, 200 milliseconds, 1 second, and/or other time lengths. It may generally be desired that the time windows be different enough to capture signal changes at different time scales. By way of non-limiting illustration, the first time length may be 10 milliseconds, the second time length may be 200 milliseconds, and the third time length may be 1 second.
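By way of non-limiting illustration, the following Python sketch (using SciPy, with an assumed sensor sample rate and placeholder motion-sensor samples) shows how a set of spectrogram representations with 10-millisecond, 200-millisecond, and 1-second windows might be generated; shorter windows yield finer time resolution.

```python
import numpy as np
from scipy import signal

fs = 1000                                  # assumed sensor sample rate in Hz
motion = np.random.randn(fs * 60)          # placeholder for 60 s of sensor output

# Window lengths matching the illustration above: 10 ms, 200 ms, and 1 s,
# expressed as numbers of samples at the assumed sample rate.
window_lengths = {"10ms": int(0.010 * fs),
                  "200ms": int(0.200 * fs),
                  "1s": int(1.0 * fs)}

spectrogram_set = {}
for name, nperseg in window_lengths.items():
    freqs, times, sxx = signal.spectrogram(motion, fs=fs, nperseg=nperseg,
                                           noverlap=nperseg // 2)
    spectrogram_set[name] = sxx            # shorter windows -> finer time resolution
    print(name, sxx.shape)                 # (frequency bins, time steps)
```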
Returning to
The encoding component 108 may be configured to encode information defining individual spectrogram representations of a set of spectrogram representations into individual color channels of an image file. The encoded information defining the set of spectrogram representations may then be handled by processes that conventionally operate on image files (see, e.g., performance component 110).
By way of non-limiting illustration, information defining individual spectrogram representations in a set of spectrogram representations may be encoded into individual color channels of an image file such that information defining a first spectrogram representation in the set of spectrogram representations may be encoded into a first color channel of the image file, information defining a second spectrogram representation in the set of spectrogram representations may be encoded into a second color channel of the image file, information defining a third spectrogram representation in the set of spectrogram representations may be encoded into a third color channel of the image file, and/or other information defining other individual spectrogram representations in the set spectrogram representations may be encoded into other channels of the image file.
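By way of non-limiting illustration, the following Python sketch (using NumPy and Pillow; the normalization, image size, and file name are assumptions) shows how three spectrogram representations might be encoded into the red, green, and blue channels of a single image file.

```python
import numpy as np
from PIL import Image

def to_uint8(spectrogram):
    """Log-scale a spectrogram and normalize it to the 0-255 range."""
    log_s = np.log1p(spectrogram)
    log_s = (log_s - log_s.min()) / (log_s.max() - log_s.min() + 1e-9)
    return (log_s * 255).astype(np.uint8)

def encode_spectrogram_set(spec_a, spec_b, spec_c, path, size=(224, 224)):
    """Encode three spectrograms into the R, G, and B channels of one image file.
    Each spectrogram is resized to a common size so the channels align."""
    channels = [Image.fromarray(to_uint8(s)).resize(size)
                for s in (spec_a, spec_b, spec_c)]
    Image.merge("RGB", channels).save(path)

# Hypothetical usage with the spectrogram set computed in the prior sketch:
# encode_spectrogram_set(spectrogram_set["10ms"], spectrogram_set["200ms"],
#                        spectrogram_set["1s"], "motion_spectrograms.png")
```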
The performance component 110 may be configured to identify one or more activities and/or events represented in one or more videos. In some implementations, identification may be based on one or more spectrogram representations of sensor output signals generated by one or more sensors. In some implementations, identification may be based on an image file that includes information defining a set of spectrogram representations encoded into the image file.
In some implementations, identification may be based on one or more spectrogram representations of sensor output signals generated by one or more motion sensors. In some implementations, identification may be based on one or more spectrogram representations of sensor output signals generated by a single sensor. In some implementations, the single sensor may include one of a motion sensor, a sound transducer, and/or other sensor.
An activity and/or event may be represented in a video by virtue of one or both of an entity moving with a capture device during capture of the video performing the activities and/or events, or the video portraying one or more entities performing the activities and/or events. Individual activities may be of one or more activity types. Activity types may be characterized by one or more of common movements, equipment, spatial context, and/or other features. Common movements may refer to movements of entities performing the activity that may conventionally define the activity. Equipment may refer to objects conventionally used in an activity. Spatial context may refer to a spatial relationship between an entity moving with a capture device and people and/or objects depicted in a video captured by the capture device. Events may be characterized by one or both of individual movements and/or sets of movements that may routinely occur during performance of an activity.
An activity type may include one or more of a sport type, a leisure type, and/or other types.
A sport type activity may include one or more sports characterized by one or more of common movements, equipment, spatial context, and/or other features that may be specific to individual ones of the one or more sports. By way of non-limiting illustration, sports of the sport activity type may include one or more of individual sports (e.g., tennis, track and field, golf, boxing, swimming, gymnastics, skiing, bowling, wrestling, powerlifting, mixed martial arts, archery, cycling, surfing, snowboarding, motorcycling, auto racing, and/or other individual sports), team sports (e.g., baseball, basketball, football, hockey, volleyball, tennis, and/or other team sports), and/or other sports that may be distinguishable based on one or more of common movements, equipment, spatial context, and/or other features that may be specific to individual sports. By way of non-limiting illustration, baseball may be characterized by one or more of common movements of players on a baseball diamond (e.g., running bases), equipment (e.g., bats, balls, gloves, bases, etc.), spatial context (e.g., arrangement of players on a field, a spatial relationship between a pitcher and a batter, etc.), and/or other features that may be specific to baseball.
A leisure type activity may include one or more leisure activities characterized by one or more of common movements, equipment, spatial context, and/or other features that may be specific to individual ones of the one or more leisure activities. By way of non-limiting illustration, activities of the leisure type may include one or more of walking, running, gamboling, swinging (on a swing), playing games (e.g., board games, video games, arcade games, etc.), and/or leisure activities that may be distinguishable based on one or more of common movements, equipment, spatial context, and/or other features that may be specific to individual leisure activities. By way of non-limiting illustration, swinging on a swing may be characterized by one or more of common movements of a person swinging (e.g., pendulum motion), equipment (e.g., a support structure, a swing, ropes, tree limb, etc.), spatial context, and/or other features that may be specific to swinging on a swing.
An event may be characterized by one or both of individual movements and/or sets of movements that may routinely occur during performance of an activity of a given activity type. It is noted that the number of individual movements and/or sets of movements that may routinely occur during performance of an activity may be quite large. As such, while the following illustrates various examples of individual movements and/or sets of movements that may routinely occur during performance of an activity of a given activity type, it is to be understood that these examples are for illustrative purposes only. One skilled in the art may ascertain other individual movements and/or sets of movements that may routinely occur during performance of one or more activities of one or more activity types that may be within the scope of the present disclosure.
Events occurring during performance of a sport type activity may include one or both of individual movements and/or sets of movements that may routinely occur during performance of the sport type activity. The following examples are provided for illustrative purposes.
By way of non-limiting illustration, events occurring during performance of the sport of baseball may include one or both of individual movements and/or sets of movements that may routinely occur during performance of the sport of baseball. Individual movements that may routinely occur during performance of the sport of baseball may include, for a batter, swinging a bat, and/or other individual movements. A set of movements that may routinely occur during performance of the sport of baseball, for a batter, may include dropping the bat and running to first base. Individual movements that may routinely occur during performance of the sport of baseball may include, for a fielder or baseman, catching a ball with the closure of a mitt, and/or other individual movements. A set of movements that may routinely occur during performance of the sport of baseball, for a fielder or baseman, may include running for a ball, sliding for a catch, and/or making a catch.
By way of non-limiting illustration, events occurring during performance of the sport of surfing may include one or both of individual movements and/or sets of movements that may routinely occur during performance of the sport of surfing. Individual movements that may routinely occur during performance of the sport of surfing may include one or more of paddling with one or both arms, standing up on the surfboard, falling into the water, and/or other individual movements. A set of movements that may routinely occur during performance of the sport of surfing may include one or more of pumping down a wave, performing a maneuver of duck diving under an approaching wave, and/or other sets of movements.
Events occurring during performance of a leisure type activity may include one or both of individual movements and/or sets of movements that may routinely occur during performance of the leisure type activity. The following examples are provided for illustrative purposes.
By way of non-limiting illustration, events occurring during performance of the leisure activity of walking may include one or both of individual movements and/or sets of movements that may routinely occur during performance of the leisure activity of walking. Individual movements that may routinely occur during performance of the leisure activity of walking may include one or more of taking a step, stopping, turning around, performing a skip (or hop or bounce), falling, and/or other individual movements. A set of movements that may routinely occur during performance of the leisure activity of walking may include one or more of taking a series of steps while increasing speed, falling then getting back up, and/or other sets of movements.
In some implementations, one or more activities and/or events may be identified by performance component 110 using one or more machine learning techniques, and/or other techniques. Machine learning techniques may include one or more of a convolutional neural network, decision tree learning, supervised learning, minimax algorithm, unsupervised learning, semi-supervised learning, reinforcement learning, deep learning, artificial neural networks, support vector machines, clustering algorithms, genetic algorithms, random forest, and/or other techniques. A machine learning technique may be trained by providing exemplary inputs and specifying desired outputs.
In some implementations, one or more user-provided exemplars of sensor output signals from one or more sensors, one or more user-identified activities and/or events associated with the sensor output signals, and/or other information may be utilized at a training stage of a machine learning process. One or more spectrogram representations of the exemplar sensor output signals may be determined and used as exemplary inputs. The user-identified activities and/or events may be specified as the desired outputs.
In some implementations, information input into a trained machine learning process to identify one or more activities and/or events represented in a video may include one or more of an individual spectrogram representation of sensor output signals generated contemporaneously with capture of the video by a capture device, individual sets of spectrogram representations of sensor output signals generated contemporaneously with capture of the video by a capture device, individual image files including encoded information defining individual sets of spectrogram representations of sensor output signals generated contemporaneously with capture of the video by a capture device, and/or other information. The trained machine learning process may be configured to output identifications of one or more activities and/or events represented in the video.
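By way of non-limiting illustration, the following Python sketch (using PyTorch; the network architecture and activity labels are hypothetical assumptions) shows how an image file having an encoded set of spectrogram representations might be fed to a small convolutional neural network to obtain an activity and/or event identification.

```python
import numpy as np
import torch
import torch.nn as nn
from PIL import Image

LABELS = ["surfing", "baseball", "walking", "swinging"]  # hypothetical outputs

class SpectrogramClassifier(nn.Module):
    """Small convolutional network over a 3-channel spectrogram image."""
    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1))
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

def load_encoded_image(path, size=(224, 224)):
    """Read the RGB spectrogram image and convert it to a normalized tensor."""
    img = Image.open(path).convert("RGB").resize(size)
    arr = np.asarray(img, dtype=np.float32) / 255.0
    return torch.from_numpy(arr).permute(2, 0, 1).unsqueeze(0)  # (1, 3, H, W)

model = SpectrogramClassifier(num_classes=len(LABELS))
# At a training stage, exemplar spectrogram images and user-identified
# activities would serve as exemplary inputs and desired outputs (e.g., with a
# cross-entropy loss); the untrained model here only illustrates the inference path.
# logits = model(load_encoded_image("motion_spectrograms.png"))
# print(LABELS[logits.argmax(dim=1).item()])
```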
Returning to
Although processor 11 is shown in
It should be appreciated that although computer components are illustrated in
The description of the functionality provided by the different computer program components described herein is for illustrative purposes, and is not intended to be limiting, as any of computer program components may provide more or less functionality than is described. For example, one or more of computer program components 102, 104, 106, 108, and/or 110 may be eliminated, and some or all of its functionality may be provided by other computer program components. As another example, processor 11 may be configured to execute one or more additional computer program components that may perform some or all of the functionality attributed to one or more of computer program components 102, 104, 106, 108, and/or 110 described herein.
The electronic storage media of electronic storage 12 may be provided integrally (i.e., substantially non-removable) with one or more components of system 10 and/or removable storage that is connectable to one or more components of system 10 via, for example, a port (e.g., a USB port, a Firewire port, etc.) or a drive (e.g., a disk drive, etc.). Electronic storage 12 may include one or more of optically readable storage media (e.g., optical disks, etc.), magnetically readable storage media (e.g., magnetic tape, magnetic hard drive, floppy drive, etc.), electrical charge-based storage media (e.g., EPROM, EEPROM, RAM, etc.), solid-state storage media (e.g., flash drive, etc.), and/or other electronically readable storage media. Electronic storage 12 may include one or more virtual storage resources (e.g., cloud storage, a virtual private network, and/or other virtual storage resources). Electronic storage 12 may be a separate component within system 10, or electronic storage 12 may be provided integrally with one or more other components of system 10 (e.g., processor 11). Although electronic storage 12 is shown in
In some implementations, method 200 may be implemented in a computer system comprising one or more of one or more processing devices (e.g., a digital processor, an analog processor, a digital circuit designed to process information, a central processing unit, a graphics processing unit, a microcontroller, an analog circuit designed to process information, a state machine, and/or other mechanisms for electronically processing information), non-transitory electronic storage storing machine-readable instructions, and/or other components. The one or more processing devices may include one or more devices executing some or all of the operations of method 200 in response to instructions stored electronically on one or more electronic storage media. The one or more processing devices may include one or more devices configured through hardware, firmware, and/or software to be specifically designed for execution of one or more of the operations of method 200.
Referring to
At operation 202, sensor output signals may be transformed to a frequency domain to generate information defining one or more spectrogram representations of the sensor output signals. In some implementations, operation 202 may be performed by a processor component the same as or similar to transform component 106 (shown in
At operation 203, one or more activities and/or events represented in a video may be identified from one or more spectrogram representations of sensor output signals of one or more motion sensors. In some implementations, operation 203 may be performed by a processor component the same as or similar to performance component 110 (shown in
In some implementations, method 300 may be implemented in a computer system comprising one or more of one or more processing devices (e.g., a digital processor, an analog processor, a digital circuit designed to process information, a central processing unit, a graphics processing unit, a microcontroller, an analog circuit designed to process information, a state machine, and/or other mechanisms for electronically processing information), non-transitory electronic storage storing machine-readable instructions, and/or other components. The one or more processing devices may include one or more devices executing some or all of the operations of method 300 in response to instructions stored electronically on one or more electronic storage media. The one or more processing devices may include one or more devices configured through hardware, firmware, and/or software to be specifically designed for execution of one or more of the operations of method 300.
Referring to
At operation 302, sensor output signals may be transformed to a frequency domain to generate information defining a set of spectrogram representations of the sensor output signals. The information defining individual spectrogram representations in the set of spectrogram representations may be generated based on successions of transforms of the sensor output signals within time windows along a time duration of the sensor output signals. Time windows for an individual spectrogram representation may have an individual time length that sets an individual time resolution of the individual spectrogram representation. The set of spectrogram representations of the sensor output signals may include one or more of a first spectrogram representation, a second spectrogram representation, a third spectrogram representation, and/or other spectrogram representations. The first spectrogram representation may be generated based on successions of transforms of the sensor output signals within time windows having a first time length. The second spectrogram representation may be generated based on successions of transforms of the sensor output signals within time windows having a second time length. The third spectrogram representation may be generated based on successions of transforms of the sensor output signals within time windows having a third time length. In some implementations, operation 302 may be performed by a processor component the same as or similar to transform component 106 (shown in
At operation 303, information defining a set of spectrogram representations may be encoded into an image file. The information defining individual spectrogram representations in a set of spectrogram representations may be encoded into individual color channels of the image file. By way of non-limiting illustration, information defining a first spectrogram representation may be encoded into a first color channel of the image file. Information defining a second spectrogram representation may be encoded into a second color channel of the image file. Information defining a third spectrogram representation may be encoded into a third color channel of the image file. In some implementations, operation 303 may be performed by a processor component the same as or similar to encoding component 108 (shown in
At operation 304, one or more activities and/or events represented in a video may be identified from an image file having information defining one or more spectrogram representations encoded therein. In some implementations, operation 304 may be performed by a processor component the same as or similar to performance component 110 (shown in
Although the system(s) and/or method(s) of this disclosure have been described in detail for the purpose of illustration based on what is currently considered to be the most practical and preferred implementations, it is to be understood that such detail is solely for that purpose and that the disclosure is not limited to the disclosed implementations, but, on the contrary, is intended to cover modifications and equivalent arrangements that are within the spirit and scope of the appended claims. For example, it is to be understood that the present disclosure contemplates that, to the extent possible, one or more features of any implementation can be combined with one or more features of any other implementation.