This disclosure relates to systems and methods that classify events monitored by sensors.
Convolutional neural networks may be trained to classify activities using multiple feature sources (e.g., image sensor, audio sensor) by concatenating features from the multiple feature sources into a single combined feature and processing the single combined feature using a standard loss function. However, such a training scheme fails to explicitly capture another piece of information that can improve training: that each individual feature source may have enough information on its own to classify activities.
This disclosure relates to classifying events monitored by sensors. A set of sensor information conveyed by sensor output signals may be accessed. The sensor output signals may be generated by a set of sensors. The set of sensor information may characterize an event monitored by the set of sensors. The set of sensor information may be processed through a multi-feature convolutional neural network. The multi-feature convolutional neural network may be trained using a branch-loss function. The branch-loss function may include individual loss functions for individual sensor information and one or more combined loss functions for combined sensor information. A classification of the event may be obtained from the multi-feature convolutional neural network based on the set of sensor information.
A system that classifies events monitored by sensors may include one or more processors, and/or other components. The processor(s) may be configured by machine-readable instructions. Executing the machine-readable instructions may cause the processor(s) to facilitate classifying events monitored by sensors. The machine-readable instructions may include one or more computer program components. The computer program components may include one or more of an access component, a process component, an obtain component, and/or other computer program components.
The access component may be configured to access a set of sensor information conveyed by sensor output signals. The sensor output signals may be generated by a set of sensors. The set of sensor information may characterize an event monitored by the set of sensors. The set of sensor information may include first sensor information, second sensor information, and/or other sensor information. The first sensor information may be conveyed by first sensor output signals. The first sensor output signals may be generated by a first sensor. The first sensor information may characterize the event monitored by the first sensor. The second sensor information may be conveyed by second sensor output signals. The second sensor output signals may be generated by a second sensor. The second sensor information may characterize the event monitored by the second sensor.
In some implementations, the set of sensor information may further include third sensor information. The third sensor information may be conveyed by third sensor output signals. The third sensor output signals may be generated by a third sensor. The third sensor information may characterize the event monitored by the third sensor.
In some implementations, the first sensor information may include first visual information and/or other information. The first sensor output signals may include first visual output signals and/or other output signals. The first sensor may include a first image sensor and/or other sensors.
In some implementations, the second sensor information may include second visual information and/or other information. The second sensor output signals may include second visual output signals and/or other output signals. The second sensor may include a second image sensor and/or other sensors.
In some implementations, the second sensor information may include audio information and/or other information. The second sensor output signals may include audio output signals and/or other output signals. The second sensor may include an audio sensor and/or other sensors.
In some implementations, the second sensor information may include motion information and/or other information. The second sensor output signals may include motion output signals and/or other output signals. The second sensor may include a motion sensor and/or other sensors.
In some implementations, the second sensor information may include location information and/or other information. The second sensor output signals may include location output signals and/or other output signals. The second sensor may include a location sensor and/or other sensors.
The process component may be configured to process the set of sensor information through a multi-feature convolutional neural network. The multi-feature convolutional neural network may be trained using a branch-loss function. The branch-loss function may include individual loss functions for individual sensor information, one or more combined loss functions for combined sensor information, and/or other loss functions.
In some implementations, individual loss functions for individual sensor information may include a first sensor information loss function, a second sensor information loss function, and/or other sensor information loss functions. The first sensor information loss function may include the first sensor information processed through a first fully connected layer, a first softmax layer, and a first loss function. The second sensor information loss function may include the second sensor information processed through a second fully connected layer, a second softmax layer, and a second loss function.
In some implementations, individual loss functions for the individual sensor information may further include a third sensor information loss function. The third sensor information loss function may include the third sensor information processed through a third fully connected layer, a third softmax layer, and a third loss function.
In some implementations, one or more of the first loss function, the second loss function, and the third loss function may include a cross-entropy loss function, a quadratic loss function, or an exponential loss function.
In some implementations, one or more combined loss functions for combined sensor information may include a first combined loss function and/or other combined loss functions. The first combined loss function may include a combination of a first output of the first fully connected layer and a second output of the second fully connected layer processed through a first combined fully connected layer, a first combined softmax layer, and a first combined loss.
In some implementations, one or more combined loss functions may further include a second combined loss function, a third combined loss function, and a fourth combined loss function. The second combined loss function may include a combination of the second output of the second fully connected layer and a third output of the third fully connected layer processed through a second combined fully connected layer, a second combined softmax layer, and a second combined loss. The third combined loss function may include a combination of the first output of the first fully connected layer and the third output of the third fully connected layer processed through a third combined fully connected layer, a third combined softmax layer, and a third combined loss. The fourth combined loss function may include a combination of the first output of the first fully connected layer, the second output of the second fully connected layer, and the third output of the third fully connected layer processed through a fourth combined fully connected layer, a fourth combined softmax layer, and a fourth combined loss.
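By way of non-limiting illustration, the individual and combined loss branches described above may be arranged as in the following sketch for a two-feature case (the use of PyTorch, the name BranchLossHead, and the layer sizes are editorial assumptions rather than requirements of this disclosure; cross-entropy is used so that the softmax layer and the loss function are applied in one step):

    import torch
    import torch.nn as nn

    class BranchLossHead(nn.Module):
        """Sketch of a two-feature branch-loss head: one fully connected
        layer per feature, plus a combined fully connected layer over the
        concatenated individual outputs."""

        def __init__(self, dim_a, dim_b, num_classes):
            super().__init__()
            self.fc_a = nn.Linear(dim_a, num_classes)              # first fully connected layer
            self.fc_b = nn.Linear(dim_b, num_classes)              # second fully connected layer
            self.fc_ab = nn.Linear(2 * num_classes, num_classes)   # first combined fully connected layer
            # CrossEntropyLoss applies the softmax layer and the loss function in one step.
            self.loss_fn = nn.CrossEntropyLoss()

        def forward(self, feat_a, feat_b):
            out_a = self.fc_a(feat_a)                              # first output
            out_b = self.fc_b(feat_b)                              # second output
            out_ab = self.fc_ab(torch.cat([out_a, out_b], dim=1))  # combined output
            return out_a, out_b, out_ab

        def branch_loss(self, out_a, out_b, out_ab, labels):
            # Branch loss: individual losses for each feature plus the combined loss.
            return (self.loss_fn(out_a, labels)
                    + self.loss_fn(out_b, labels)
                    + self.loss_fn(out_ab, labels))

During training, the summed branch loss may be backpropagated through the network so that each individual feature, as well as the combined feature, is pushed to carry enough information to classify the event on its own.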
The obtain component may be configured to obtain a classification of the event from the multi-feature convolutional neural network. The classification of the event may be obtained based on the set of sensor information and/or other information.
These and other objects, features, and characteristics of the system and/or method disclosed herein, as well as the methods of operation and functions of the related elements of structure and the combination of parts and economies of manufacture, will become more apparent upon consideration of the following description and the appended claims with reference to the accompanying drawings, all of which form a part of this specification, wherein like reference numerals designate corresponding parts in the various figures. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended as a definition of the limits of the invention. As used in the specification and in the claims, the singular form of “a”, “an”, and “the” include plural referents unless the context clearly dictates otherwise.
Electronic storage 12 may be configured to include one or more electronic storage media that electronically store information. Electronic storage 12 may store software algorithms, information determined by processor 11, information received remotely, and/or other information that enables system 10 to function properly. For example, electronic storage 12 may store information relating to set of sensors 14, first sensor 15, second sensor 16, third sensor 17, sensor output signals, sensor information, the multi-feature convolutional neural network, the branch-loss function, classification of events, and/or other information.
Set of sensors 14 may be configured to generate sensor output signals conveying a set of sensor information. The set of sensor information may characterize an event monitored by set of sensors 14. Set of sensors 14 may include first sensor 15, second sensor 16, and/or other sensors. In some implementations, set of sensors 14 may include third sensor 17. Two or more sensors of set of sensors 14 may be located at same or different locations. For example, first sensor 15 and second sensor 16 may include the same type of sensor (e.g., image sensor) monitoring an event from the same location (e.g., located within a body of a camera and having different viewing directions/fields of view of the event). First sensor 15 and second sensor 16 may include the same type of sensor (e.g., image sensor) monitoring an event from different locations (e.g., capturing visuals of the event from different locations). First sensor 15 and second sensor 16 may include different types of sensors (e.g., image sensor and motion sensor) monitoring an event from the same location (e.g., located within a body of a camera). First sensor 15 and second sensor 16 may include different types of sensors (e.g., image sensor and audio sensor) monitoring an event from different locations (e.g., capturing visuals and sounds of the event from different locations).
First sensor 15 may generate first sensor output signals. The first sensor output signals may convey first sensor information. The first sensor information may characterize the event monitored by first sensor 15. Second sensor 16 may generate second sensor output signals. The second sensor output signals may convey second sensor information. The second sensor information may characterize the event monitored by second sensor 16. Third sensor 17 may generate third sensor output signals. The third sensor output signals may convey third sensor information. The third sensor information may characterize the event monitored by third sensor 17.
One or more of first sensor 15, second sensor 16, third sensor 17, and/or other sensors may include an image sensor, an audio sensor, a motion sensor, a location sensor, and/or other sensors. An image sensor may generate visual output signals conveying visual information within the field of view of the image sensor. Visual information may define one or more images or videos of the event. An audio sensor may generate audio output signals conveying audio information. Audio information may define one or more audio/sound clips of the event. A motion sensor may generate motion output signals conveying motion information. Motion information may define one or more movements and/or orientations of the motion sensor/object monitored by the motion sensor (e.g., camera in which the motion sensor is located). A location sensor may generate location output signals conveying location information. The location information may define one or more locations of the location sensor/object monitored by the location sensor (e.g., camera in which the location sensor is located). Other types of sensors are contemplated.
Processor 11 may be configured to provide information processing capabilities in system 10. As such, processor 11 may comprise one or more of a digital processor, an analog processor, a digital circuit designed to process information, a central processing unit, a graphics processing unit, a microcontroller, an analog circuit designed to process information, a state machine, and/or other mechanisms for electronically processing information. Processor 11 may be configured to execute one or more machine readable instructions 100 to facilitate classifying events monitored by sensors. Machine readable instructions 100 may include one or more computer program components. Machine readable instructions 100 may include one or more of access component 102, process component 104, obtain component 106, and/or other computer program components.
Access component 102 may be configured to access a set of sensor information conveyed by sensor output signals. Access component 102 may be configured to access the set of sensor information characterizing the event monitored by set of sensors 14. Access component 102 may access one or more sensor information (e.g., visual information, audio information, motion information, location information) from one or more storage locations. A storage location may include electronic storage 12, electronic storage of one or more sensors, and/or other locations. For example, access component 102 may access visual information (from one or more image sensors) stored in electronic storage 12.
Access component 102 may be configured to access one or more sensor information during the acquisition of the sensor information and/or after the acquisition of the sensor information by one or more sensors. For example, access component 102 may access visual information defining an image while the image is being captured by one or more image sensors. Access component 102 may access visual information defining an image after the image has been captured and stored in memory (e.g., electronic storage 12).
Process component 104 may be configured to process the set of sensor information through a multi-feature convolutional neural network. Individual sensor information may provide individual features for processing by the multi-feature convolutional neural network. For example, visual sensor information may provide one or more images/videos as features for processing by the multi-feature convolutional neural network. Non-visual information (e.g., audio information, motion information, location information) may be converted into one or more visual representations (e.g., spectrogram) for processing by the multi-feature convolutional neural network. A multi-feature convolutional neural network may include a one-dimensional convolutional neural network, a two-dimensional convolutional neural network, a three-dimensional convolutional neural network, and/or a convolutional neural network of other dimensions.
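By way of non-limiting illustration, audio information may be converted into a spectrogram as in the following sketch (the function name, sampling rate, and use of SciPy are editorial assumptions):

    import numpy as np
    from scipy import signal

    def audio_to_spectrogram(audio_samples, sample_rate=44100):
        # Convert a one-dimensional audio clip into a two-dimensional
        # time-frequency representation that the multi-feature convolutional
        # neural network can process like an image.
        frequencies, times, spectrogram = signal.spectrogram(audio_samples, fs=sample_rate)
        # Log scaling compresses the large dynamic range of the raw spectrogram.
        return np.log1p(spectrogram)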
The multi-feature convolutional neural network may be trained using a branch-loss function. The branch-loss function may include individual loss functions for individual sensor information, one or more combined loss functions for combined sensor information, and/or other loss functions. Training of the multi-feature convolutional neural network using a branch-loss function enables the multi-feature convolutional neural network to classify activities using one or more individual features, and/or one or more combined features. Training of the multi-feature convolutional neural network using a branch-loss function increases the accuracy of the classification performed by the multi-feature convolutional neural network.
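By way of non-limiting illustration, for a two-feature case the branch-loss function may take the form of the sum below, where $\mathcal{L}_{A}$ and $\mathcal{L}_{B}$ denote the individual losses and $\mathcal{L}_{AB}$ denotes the combined loss (the symbols are editorial shorthand rather than notation taken from this disclosure):

$$\mathcal{L}_{\text{branch}} = \mathcal{L}_{A} + \mathcal{L}_{B} + \mathcal{L}_{AB}$$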
Combined features A-B loss function (FAB 330) may include a combination of outputs of fully connected layer 314 and fully connected layer 324 (combined features 332) processed through fully connected layer 334, softmax 336, and loss 338. One or more of losses 318, 328, 338 may include a cross-entropy loss, a quadratic loss, an exponential loss, and/or other loss.
Combined features A-B loss function (FAB 440) may include a combination of outputs of the fully connected layer for feature A 412 and the fully connected layer for feature B 422 processed through a fully connected layer, a softmax layer, and a loss function. Combined features B-C loss function (FBC 450) may include a combination of outputs of the fully connected layer for feature B 422 and the fully connected layer for feature C 432 processed through a fully connected layer, a softmax layer, and a loss function. Combined features A-C loss function (FAC 460) may include a combination of outputs of the fully connected layer for feature A 412 and the fully connected layer for feature C 432 processed through a fully connected layer, a softmax layer, and a loss function. Combined features A-B-C loss function (FABC 470) may include a combination of outputs of the fully connected layer for feature A 412, the fully connected layer for feature B 422, and the fully connected layer for feature C 432 processed through a fully connected layer, a softmax layer, and a loss function. One or more loss functions may include a cross-entropy loss, a quadratic loss, an exponential loss, and/or other loss.
In some implementations, one or more weighting factors may be introduced into a branch-loss function. Weighting factors may change the influence of different loss functions in a branch-loss function. For example,
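one possible weighted form (the weight symbols are editorial shorthand, not notation from this disclosure) is

$$\mathcal{L}_{\text{branch}} = w_{A}\,\mathcal{L}_{A} + w_{B}\,\mathcal{L}_{B} + w_{AB}\,\mathcal{L}_{AB},$$

where increasing a given weight increases the influence of the corresponding individual or combined loss during training.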
The multi-feature convolutional neural network may be trained using branch-loss functions for other numbers of features. For example,
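extending the three-feature pattern above, a branch-loss function for four features may include four individual loss functions and combined loss functions for the pairwise, three-way, and four-way feature combinations. One way to enumerate such branches for an arbitrary number of features is sketched below (the helper name and use of Python's itertools are editorial assumptions):

    from itertools import combinations

    def enumerate_branches(feature_names):
        # One loss branch per individual feature, plus one combined loss
        # branch per combination of two or more features.
        branches = [(name,) for name in feature_names]
        for size in range(2, len(feature_names) + 1):
            branches.extend(combinations(feature_names, size))
        return branches

    # Four features yield 4 individual branches and 11 combined branches.
    print(enumerate_branches(["A", "B", "C", "D"]))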
Obtain component 106 may be configured to obtain a classification of the event from the multi-feature convolutional neural network. The classification of the event may be obtained based on the set of sensor information (e.g., features) and/or other information. At inference time, the classification of the event may be obtained from softmax values (e.g., values of softmax 336, values of the softmax of FABC 470). A classification of an event obtained from a multi-feature convolutional neural network may have greater accuracy than a classification of an event obtained from a convolutional neural network trained using standard loss functions for concatenated features.
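By way of non-limiting illustration, reusing the hypothetical BranchLossHead from the earlier sketch, the classification may be read off the combined softmax values as follows (the feature dimensions and class count are editorial assumptions):

    import torch

    # Hypothetical trained head and per-event feature vectors.
    head = BranchLossHead(dim_a=128, dim_b=64, num_classes=10)
    feat_a = torch.randn(1, 128)   # one event's first-feature vector
    feat_b = torch.randn(1, 64)    # one event's second-feature vector

    head.eval()
    with torch.no_grad():
        _, _, out_ab = head(feat_a, feat_b)
        softmax_values = torch.softmax(out_ab, dim=1)    # per-class softmax values
        classification = softmax_values.argmax(dim=1)    # index of the classified event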
For example, a person may be surfing while using multiple sensors to monitor the surfing activity. The multiple sensors used by the surfer may include a camera mounted on the surfboard and/or a camera mounted on the person's body (e.g., a head-mounted or chest-mounted camera). One or both of the cameras may additionally include one or more audio sensors to record sounds, motion sensors to measure motion of the person/surfboard, location sensors to identify locations of the person/surfboard, and/or other sensors. The multi-feature convolutional neural network trained using a branch-loss function (for two or more of visual features, audio features, motion features, location features, other features), which processes multiple features for classification, may more accurately classify the person's activity as “surfing” than a convolutional neural network trained using a standard loss function (for concatenated features), which processes concatenated features for classification. Uses of other sensors/sensor information and identification of other activities by the multi-feature convolutional neural network are contemplated.
Implementations of the disclosure may be made in hardware, firmware, software, or any suitable combination thereof. Aspects of the disclosure may be implemented as instructions stored on a machine-readable medium, which may be read and executed by one or more processors. A machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computing device). For example, a tangible computer readable storage medium may include read only memory, random access memory, magnetic disk storage media, optical storage media, flash memory devices, and others, and a machine-readable transmission media may include forms of propagated signals, such as carrier waves, infrared signals, digital signals, and others. Firmware, software, routines, or instructions may be described herein in terms of specific exemplary aspects and implementations of the disclosure, and as performing certain actions.
Although processor 11 and electronic storage 12 are shown to be connected to interface 13 in
Although processor 11 is shown in
It should be appreciated that although computer components are illustrated in
The description of the functionality provided by the different computer program components described herein is for illustrative purposes, and is not intended to be limiting, as any of the computer program components may provide more or less functionality than is described. For example, one or more of computer program components 102, 104, and/or 106 may be eliminated, and some or all of their functionality may be provided by other computer program components. As another example, processor 11 may be configured to execute one or more additional computer program components that may perform some or all of the functionality attributed to one or more of computer program components 102, 104, and/or 106 described herein.
The electronic storage media of electronic storage 12 may be provided integrally (i.e., substantially non-removable) with one or more components of system 10 and/or as removable storage that is connectable to one or more components of system 10 via, for example, a port (e.g., a USB port, a Firewire port, etc.) or a drive (e.g., a disk drive, etc.). Electronic storage 12 may include one or more of optically readable storage media (e.g., optical disks, etc.), magnetically readable storage media (e.g., magnetic tape, magnetic hard drive, floppy drive, etc.), electrical charge-based storage media (e.g., EPROM, EEPROM, RAM, etc.), solid-state storage media (e.g., flash drive, etc.), and/or other electronically readable storage media. Electronic storage 12 may be a separate component within system 10, or electronic storage 12 may be provided integrally with one or more other components of system 10 (e.g., processor 11). Although electronic storage 12 is shown in
In some implementations, method 200 may be implemented in one or more processing devices (e.g., a digital processor, an analog processor, a digital circuit designed to process information, a central processing unit, a graphics processing unit, a microcontroller, an analog circuit designed to process information, a state machine, and/or other mechanisms for electronically processing information). The one or more processing devices may include one or more devices executing some or all of the operations of method 200 in response to instructions stored electronically on one or more electronic storage media. The one or more processing devices may include one or more devices configured through hardware, firmware, and/or software to be specifically designed for execution of one or more of the operations of method 200.
Referring to
At operation 202, the set of sensor information may be processed through a multi-feature convolutional neural network. The multi-feature convolutional neural network may be trained using a branch-loss function. The branch-loss function may include individual loss functions for individual sensor information and one or more combined loss functions for combined sensor information. In some implementations, operation 202 may be performed by a processor component the same as or similar to process component 104 (shown in
At operation 203, a classification of the event may be obtained from the multi-feature convolutional neural network. The classification may be obtained based on the set of sensor information. In some implementations, operation 203 may be performed by a processor component the same as or similar to obtain component 106 (shown in
Although the system(s) and/or method(s) of this disclosure have been described in detail for the purpose of illustration based on what is currently considered to be the most practical and preferred implementations, it is to be understood that such detail is solely for that purpose and that the disclosure is not limited to the disclosed implementations, but, on the contrary, is intended to cover modifications and equivalent arrangements that are within the spirit and scope of the appended claims. For example, it is to be understood that the present disclosure contemplates that, to the extent possible, one or more features of any implementation can be combined with one or more features of any other implementation.