This disclosure relates generally to neural networks and, more particularly, to a spectral nonlocal block for a neural network and methods, apparatus, and articles of manufacture to control the same.
A neural network typically includes multiple layers of nodes, which include an input layer, one or more intermediate layers, and an output layer of the neural network, also referred to as the classification layer of the neural network. The training of the neural network typically includes varying the node weights in the layers of the neural network to meet a classification performance target. Some neural network initialization techniques focus on maintaining the magnitudes of the weights of the layers within a target range, which helps ensure convergence of the neural network.
The figures are not to scale. In general, the same reference numbers will be used throughout the drawing(s) and accompanying written description to refer to the same or like parts, elements, etc.
Descriptors “first,” “second,” “third,” etc., are used herein when identifying multiple elements or components which may be referred to separately. Unless otherwise specified or understood based on their context of use, such descriptors are not intended to impute any meaning of priority or ordering in time but merely as labels for referring to multiple elements or components separately for ease of understanding the disclosed examples. In some examples, the descriptor “first” may be used to refer to an element in the detailed description, while the same element may be referred to in a claim with a different descriptor such as “second” or “third.” In such instances, it should be understood that such descriptors are used merely for ease of referencing multiple elements or components.
As noted above, a neural network typically includes multiple layers of nodes, which include an input layer, one or more intermediate layers, and an output layer of the neural network, also referred to as the classification layer of the neural network. The training of the neural network includes varying the node weights in the layers of the neural network to meet a classification performance target. Neural networks (e.g., convolutional neural networks (CNNs), deep neural networks, etc.) are increasingly used in many fields, including computer vision tasks. Traditional neural networks have a limited field of view when classifying data, which hinders the capture of long-range dependencies in the rich, structured information used in computer vision tasks. Long-range dependencies correspond to a rate of decay of statistical dependence between two points with increasing time interval or spatial distance between the two points. Some neural networks include convolutional layer(s) that focus on a small section of input data (e.g., a 3 by 3 kernel of an image). In such neural networks, a larger receptive field can be obtained by stacking multiple convolution layers. However, stacking multiple layers creates a damping effect caused by interference between a large number of positional pairs. Examples disclosed herein utilize the full range of input data (e.g., an image) to avoid stacking deeper layers, thereby resulting in a flexible layer that avoids the damping effect caused by the interference between the large number of positional pairs in traditional techniques.
To capture long-range dependencies in related data (e.g., one or more images captured by an image and/or video sensor), nonlocal blocks have been introduced into neural networks to create a dense affinity matrix that includes a relation between every pair of positions and to use the affinity matrix as an attention map to aggregate features. However, such nonlocal blocks diminish the differentiated features due to a damping effect resulting from interference between the large number of position pairs. Examples disclosed herein include an efficient nonlocal block, including a spectral nonlocal block (SNL) and/or a general SNL (gSNL). The nonlocal blocks disclosed herein can be inserted into neural network backbones (e.g., as plug-and-play components) to capture long-range dependencies with better efficiency than traditional nonlocal blocks.
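As background for the affinity-matrix-as-attention idea described above, the following is a minimal sketch of a generic nonlocal block, assuming a PyTorch environment. The class name, the softmax normalization, and the 1 by 1 convolution choices are illustrative assumptions and do not represent the specific SNL/gSNL structure disclosed herein.

```python
# Minimal sketch of a generic nonlocal (self-attention style) block, assuming PyTorch.
# It illustrates the dense affinity matrix used as an attention map; it is not the
# specific SNL/gSNL structure disclosed herein.
import torch
import torch.nn as nn

class SimpleNonlocalBlock(nn.Module):
    def __init__(self, channels: int, reduced: int):
        super().__init__()
        # 1x1 convolutions produce reduced-channel embeddings of the input features.
        self.theta = nn.Conv2d(channels, reduced, kernel_size=1)
        self.phi = nn.Conv2d(channels, reduced, kernel_size=1)
        self.g = nn.Conv2d(channels, reduced, kernel_size=1)
        self.out = nn.Conv2d(reduced, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        theta = self.theta(x).flatten(2).transpose(1, 2)   # (N, HW, Cs)
        phi = self.phi(x).flatten(2)                        # (N, Cs, HW)
        g = self.g(x).flatten(2).transpose(1, 2)            # (N, HW, Cs)
        affinity = torch.softmax(theta @ phi, dim=-1)       # dense (N, HW, HW) attention map
        y = (affinity @ g).transpose(1, 2).reshape(n, -1, h, w)
        return x + self.out(y)                              # residual connection
```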
Examples disclosed herein process a full range of the input data to provide increased efficiency in object detection, segmentation, etc. Although interference increases as the range of the input data increases, examples disclosed herein achieve better context encoding by processing a full range of dependencies while suppressing the interference using the SNL and gSNL blocks. Accordingly, examples disclosed herein utilize an SNL block and a gSNL block to process a full range of dependencies using first-order and/or full-order Chebyshev polynomials to approximate a filter of a fully-connected graph, and the blocks can be implemented in existing models. The examples disclosed herein achieve better performance in multiple computer vision tasks, including image/video classification, compared to prior models.
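For added context on the graph-filter view invoked above, a spectral filter on a graph is commonly approximated with a truncated Chebyshev expansion. The display below is standard background notation rather than a quotation of the disclosure, with A denoting the affinity matrix of the fully-connected graph and Z the input features:

$$g_\theta \star Z \approx \sum_{k=0}^{K} \theta_k\, T_k(A)\, Z, \qquad T_0(A) = I, \quad T_1(A) = A, \quad T_k(A) = 2A\, T_{k-1}(A) - T_{k-2}(A)$$

Truncating at K = 1 gives a first-order form with one term in Z and one term in AZ, which is the structure of Equation 1 below; retaining higher-order terms under the stable hypothesis yields the full-order form discussed in connection with Equation 2.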
Artificial intelligence (AI), including machine learning (ML), deep learning (DL), and/or other artificial machine-driven logic, enables machines (e.g., computers, logic circuits, etc.) to use a model to process input data to generate an output based on patterns and/or associations previously learned by the model via a training process. For instance, the model may be trained with data to recognize patterns and/or associations and follow such patterns and/or associations when processing input data such that other input(s) result in output(s) consistent with the recognized patterns and/or associations.
Many different types of machine learning models and/or machine learning architectures exist. In examples disclosed herein, a neural network model is used. In general, machine learning models/architectures that are suitable to use with the example approaches disclosed herein include neural network based models (e.g., convolutional neural networks (CNNs), deep neural networks (DNNs), etc.). However, other types of machine learning models could additionally or alternatively be used, such as deep learning and/or any other type of AI model.
In general, implementing an ML/AI system involves two phases, a training phase (also referred to as a learning phase) and an inference phase. In the training phase, a training algorithm is used to train a model to operate in accordance with patterns and/or associations based on, for example, training data, also referred to herein as training samples. In general, the model includes internal parameters that guide how input data is transformed into output data, such as through a series of nodes and connections within the model to transform input data into output data. In some examples, hyperparameters are used as part of the training process to control how the learning is performed (e.g., a learning rate, a number of layers to be used in the machine learning model, etc.). Hyperparameters are defined to be training parameters that are determined prior to initiating the training process.
Different types of training may be performed based on the type of ML/AI model and/or the expected output. For example, supervised training uses training samples that include inputs and corresponding expected (e.g., labeled) outputs to select parameters (e.g., by iterating over combinations of select parameters) for the ML/AI model that reduce model error. As used herein, labelling refers to an expected output of the machine learning model (e.g., a classification, an expected output value, etc.). Alternatively, unsupervised training (e.g., used in deep learning, a subset of machine learning, etc.) involves inferring patterns from inputs to select parameters for the ML/AI model (e.g., without the benefit of expected (e.g., labeled) outputs).
In examples disclosed herein, ML/AI models are trained using any training algorithm and/or any type of training data. In examples disclosed herein, training is performed until an acceptable amount of error is achieved. Training is performed using hyperparameters that control how the learning is performed (e.g., a learning rate, a number of layers to be used in the machine learning model, etc.). In some examples, re-training may be performed. Such re-training may be performed in response to obtaining additional training data, for example.
In some examples, training is performed using training data. Because supervised training is used, the training data is labeled. Labeling is applied to the training data by an audience measurement entity, a server, and/or a human.
Once training is complete, the model is deployed for use as an executable construct that processes an input and provides an output based on the network of nodes and connections defined in the model. The model may be stored locally or remotely. The model may then be executed by a model generator or other device to perform classifications of input data.
Once trained, the deployed model may be operated in an inference phase to process data. In the inference phase, data to be analyzed (e.g., live data) is input to the model, and the model executes to create an output. This inference phase can be thought of as the AI “thinking” to generate the output based on what the AI model learned from the training (e.g., by executing the model to apply the learned patterns and/or associations to the live data). In some examples, input data undergoes pre-processing before being used as an input to the machine learning model. Moreover, in some examples, the output data may undergo post-processing after it is generated by the AI model to transform the output into a useful result (e.g., a display of data, an instruction to be executed by a machine, etc.).
In some examples, output of the deployed AI model may be captured and provided as feedback. By analyzing the feedback, an accuracy of the deployed model can be determined. If the feedback indicates that the accuracy of the deployed model is less than a threshold or other criterion, training of an updated model can be triggered using the feedback and an updated training data set, hyperparameters, etc., to generate an updated, deployed model.
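The following is a minimal sketch of the feedback-driven retraining decision described above. The accuracy metric, the threshold value, and the retrain() routine are illustrative assumptions, not part of the disclosure.

```python
# Sketch of the feedback loop: compare deployed-model accuracy against a threshold
# and trigger retraining when it falls below. The threshold, metric, and retrain()
# routine are illustrative assumptions.
def maybe_retrain(model, feedback_inputs, feedback_labels, retrain, threshold=0.9):
    correct = sum(
        1 for x, y in zip(feedback_inputs, feedback_labels) if model(x) == y
    )
    accuracy = correct / max(len(feedback_labels), 1)
    if accuracy < threshold:
        # Retrain with the feedback folded into an updated training data set.
        return retrain(model, feedback_inputs, feedback_labels)
    return model
```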
The feature extraction block 105 of
In the illustrated example of
The example classification block 110 of
The example feature extraction block 105 of
The example classification block 110 of
The example input features 200 of
The example convolutor(s) 202 of
The example affinity matrix generator 208 of
The example matrix applicator 210 of
F(A,Z)=O1+O2 (Equation 1)
In Equation 1, O1 is the output of the example convolutor 202 (e.g., the fourth weighted input features) and O2 is the output of the example matrix applicator 210 (e.g., the connected weighted graph). The example accumulator 230 generates the first-order Chebyshev polynomial approximation defined in Equation 1 by summing O1 and O2, as further described below.
To generate O2, the example matrix multiplier 212 of the matrix applicator 210 of
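The following is a minimal functional sketch of the first-order operator of Equation 1, assuming PyTorch. The parameter names w1 through w5 and the use of 1 by 1 convolutions are illustrative assumptions mapped onto the elements above (the convolutor 202, the affinity matrix generator 208, the matrix applicator 210, and the accumulator 230); any normalization of the affinity matrix described elsewhere in the disclosure is omitted here.

```python
# Minimal sketch of the first-order operator of Equation 1, assuming PyTorch.
# w1, w2, w3 are assumed 1x1 kernels of shape (Cs, C, 1, 1) that reduce the channel
# count; w4 and w5 are assumed 1x1 kernels of shape (C, Cs, 1, 1) that restore it.
import torch
import torch.nn.functional as F

def snl_operator(x, w1, w2, w3, w4, w5):
    n, c, h, w = x.shape
    z = F.conv2d(x, w1)                      # first weighted input features, (N, Cs, H, W)
    phi = F.conv2d(x, w2).flatten(2)         # second weighted input features, (N, Cs, HW)
    psi = F.conv2d(x, w3).flatten(2)         # third weighted input features, (N, Cs, HW)
    a = phi.transpose(1, 2) @ psi            # affinity matrix A = phi * psi^T, (N, HW, HW)
    o1 = F.conv2d(z, w4)                     # output of the convolutor 202 (O1)
    z_flat = z.flatten(2).transpose(1, 2)    # reduce Z to (N, HW, Cs) before multiplying
    az = (a @ z_flat).transpose(1, 2).reshape(n, -1, h, w)  # A * Z, reshaped to (N, Cs, H, W)
    o2 = F.conv2d(az, w5)                    # connected weighted graph (O2)
    return o1 + o2                           # Equation 1: F(A, Z) = O1 + O2
```

In the residual form described below, the output features add the input features back to this operator.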
When added into an early stage of a network (e.g., when the features may not be well aggregated), the nonlocal block should have the ability to be consecutively stacked into the network to form a deeper nonlocal structure to exploit the full-range dependencies. Accordingly, the example full-order spectral nonlocal block 218 corresponds to the steady-state characteristics of consecutively connecting multiple spectral nonlocal blocks. The example full-order spectral nonlocal block 218 generates an additional term to approximate the full-order Chebyshev polynomials corresponding to a stable hypothesis (e.g., when more than two consecutively-connected SNL blocks with the same affinity matrix A are added into a network structure, the SNL blocks are stable when the affinity matrix satisfies A^k = A). The example full-order spectral nonlocal block 218 leverages the stable hypothesis to simplify the kth-order Chebyshev polynomial (e.g., T_k(A)) into a piece-wise function, as shown below in Equation 2.
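Under the stable hypothesis (A^k = A), the Chebyshev recurrence T_k(A) = 2A T_(k-1)(A) - T_(k-2)(A), with T_0(A) = I and T_1(A) = A, collapses into a short repeating cycle. A piecewise form consistent with that derivation and with the 2A-I term discussed next is:

$$T_k(A) = \begin{cases} I, & k \bmod 4 = 0 \\ A, & k \bmod 4 = 1 \text{ or } 3 \\ 2A - I, & k \bmod 4 = 2 \end{cases} \qquad \text{(Equation 2)}$$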
In Equation 2, I is the identity matrix. Accordingly, the example full-order spectral nonlocal block 218 generates 2A-I (e.g., a Chebyshev approximation matrix), which is used to generate the Chebyshev approximation graph corresponding to the full-order spectral nonlocal operator.
The example Chebyshev matrix approximator 220 of
The example matrix multiplier 224 of the Chebyshev matrix applicator 222 of
After the full-order spectral nonlocal operator (e.g., O) has been generated, the example accumulator 232 of
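Building on the earlier sketch (and again assuming PyTorch and illustrative parameter names), the full-order operator adds a term driven by the Chebyshev approximation matrix 2A-I and then adds the input features back in, corresponding to the accumulator 232.

```python
# Sketch of the full-order (gSNL) block, assuming PyTorch. It adds a term driven by
# the Chebyshev approximation matrix 2A - I and then adds the input features back
# (accumulator 232). w6 is an assumed sixth set of 1x1 kernels of shape (C, Cs, 1, 1).
import torch
import torch.nn.functional as F

def gsnl_block(x, w1, w2, w3, w4, w5, w6):
    n, c, h, w = x.shape
    z = F.conv2d(x, w1)                                  # first weighted input features
    phi = F.conv2d(x, w2).flatten(2)
    psi = F.conv2d(x, w3).flatten(2)
    a = phi.transpose(1, 2) @ psi                        # affinity matrix, (N, HW, HW)
    cheb = 2.0 * a - torch.eye(h * w, device=x.device)   # Chebyshev approximation matrix 2A - I
    z_flat = z.flatten(2).transpose(1, 2)                # reduced Z, (N, HW, Cs)
    az = (a @ z_flat).transpose(1, 2).reshape(n, -1, h, w)
    cz = (cheb @ z_flat).transpose(1, 2).reshape(n, -1, h, w)   # Chebyshev approximation product
    o = F.conv2d(z, w4) + F.conv2d(az, w5) + F.conv2d(cz, w6)   # full-order spectral nonlocal operator
    return x + o                                         # output features: input plus the operator
```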
While an example manner of implementing the full spectral nonlocal block 107 of
Flowcharts representative of example hardware logic, machine readable instructions, hardware implemented state machines, and/or any combination thereof for implementing the full spectral nonlocal block 107 of
The machine readable instructions described herein may be stored in one or more of a compressed format, an encrypted format, a fragmented format, a compiled format, an executable format, a packaged format, etc. Machine readable instructions as described herein may be stored as data (e.g., portions of instructions, code, representations of code, etc.) that may be utilized to create, manufacture, and/or produce machine executable instructions. For example, the machine readable instructions may be fragmented and stored on one or more storage devices and/or computing devices (e.g., servers). The machine readable instructions may require one or more of installation, modification, adaptation, updating, combining, supplementing, configuring, decryption, decompression, unpacking, distribution, reassignment, compilation, etc. in order to make them directly readable, interpretable, and/or executable by a computing device and/or other machine. For example, the machine readable instructions may be stored in multiple parts, which are individually compressed, encrypted, and stored on separate computing devices, wherein the parts when decrypted, decompressed, and combined form a set of executable instructions that implement a program such as that described herein.
In another example, the machine readable instructions may be stored in a state in which they may be read by a computer, but require addition of a library (e.g., a dynamic link library (DLL)), a software development kit (SDK), an application programming interface (API), etc. in order to execute the instructions on a particular computing device or other device. In another example, the machine readable instructions may need to be configured (e.g., settings stored, data input, network addresses recorded, etc.) before the machine readable instructions and/or the corresponding program(s) can be executed in whole or in part. Thus, the disclosed machine readable instructions and/or corresponding program(s) are intended to encompass such machine readable instructions and/or program(s) regardless of the particular format or state of the machine readable instructions and/or program(s) when stored or otherwise at rest or in transit.
The machine readable instructions described herein can be represented by any past, present, or future instruction language, scripting language, programming language, etc. For example, the machine readable instructions may be represented using any of the following languages: C, C++, Java, C#, Perl, Python, JavaScript, HyperText Markup Language (HTML), Structured Query Language (SQL), Swift, etc.
As mentioned above, the example processes of
“Including” and “comprising” (and all forms and tenses thereof) are used herein to be open ended terms. Thus, whenever a claim employs any form of “include” or “comprise” (e.g., comprises, includes, comprising, including, having, etc.) as a preamble or within a claim recitation of any kind, it is to be understood that additional elements, terms, etc. may be present without falling outside the scope of the corresponding claim or recitation. As used herein, when the phrase “at least” is used as the transition term in, for example, a preamble of a claim, it is open-ended in the same manner as the term “comprising” and “including” are open-ended. The term “and/or” when used, for example, in a form such as A, B, and/or C refers to any combination or subset of A, B, C such as (1) A alone, (2) B alone, (3) C alone, (4) A with B, (5) A with C, (6) B with C, and (7) A with B and with C. As used herein in the context of describing structures, components, items, objects and/or things, the phrase “at least one of A and B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B. Similarly, as used herein in the context of describing structures, components, items, objects and/or things, the phrase “at least one of A or B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B. As used herein in the context of describing the performance or execution of processes, instructions, actions, activities and/or steps, the phrase “at least one of A and B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B. Similarly, as used herein in the context of describing the performance or execution of processes, instructions, actions, activities and/or steps, the phrase “at least one of A or B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B.
As used herein, singular references (e.g., “a”, “an”, “first”, “second”, etc.) do not exclude a plurality. The term “a” or “an” entity, as used herein, refers to one or more of that entity. The terms “a” (or “an”), “one or more”, and “at least one” can be used interchangeably herein. Furthermore, although individually listed, a plurality of means, elements or method actions may be implemented by, e.g., a single unit or processor. Additionally, although individual features may be included in different examples or claims, these may possibly be combined, and the inclusion in different examples or claims does not imply that a combination of features is not feasible and/or advantageous.
At block 302, the example convolutor 202 (
At block 306, the example affinity matrix generator 208 and the example reshaper 204 (
At block 310, the example affinity matrix generator 208 generates the affinity matrix based on the second reduced weighted input features and the third reduced weighted input features (e.g., ϕ∈R^(WH×Cs) and ψ∈R^(WH×Cs)). For example, the affinity matrix generator 208 reduces the dimensions of the second weighted input features (e.g., ϕ∈R^(W×H×Cs)) and the third weighted input features (e.g., ψ∈R^(W×H×Cs)) from three dimensions to two dimensions (e.g., ϕ∈R^(WH×Cs) and ψ∈R^(WH×Cs)). In this manner, the example affinity matrix generator 208 can calculate the affinity matrix by multiplying the second reduced weighted input features by the transpose of the third reduced weighted input features (e.g., A=(ϕ)(ψ)^T). At block 312, the example matrix multiplier 212 (
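The dimension bookkeeping for blocks 310 and 312 can be sketched as follows, assuming PyTorch tensors; the values of N, Cs, H, and W are illustrative.

```python
# Dimension bookkeeping for the affinity matrix computation at blocks 310 and 312,
# assuming PyTorch tensors. N, Cs, H, and W are illustrative values; WH = W * H.
import torch

N, Cs, H, W = 2, 16, 7, 7
phi = torch.randn(N, Cs, H, W).flatten(2).transpose(1, 2)   # phi in R^(WH x Cs)
psi = torch.randn(N, Cs, H, W).flatten(2).transpose(1, 2)   # psi in R^(WH x Cs)
z1  = torch.randn(N, Cs, H, W).flatten(2).transpose(1, 2)   # reduced first weighted input features

A = phi @ psi.transpose(1, 2)          # A = (phi)(psi)^T -> (N, WH, WH), block 310
affinity_product = A @ z1              # (N, WH, Cs), block 312
restored = affinity_product.transpose(1, 2).reshape(N, Cs, H, W)  # back to (N, Cs, H, W)
```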
At block 316, the example Chebyshev matrix approximator 220 (
At block 324, the example accumulator 230 (
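As a usage illustration of the plug-and-play property described above (again a sketch under the earlier assumptions), the block preserves the shape of its input, so it can be dropped between existing backbone stages.

```python
# Usage sketch: the gsnl_block() outlined earlier preserves the (N, C, H, W) shape,
# so it can be inserted between existing backbone stages as a plug-and-play layer.
# Kernel shapes are illustrative: 1x1 kernels mapping C -> Cs for w1-w3 and Cs -> C
# for w4-w6.
import torch

N, C, Cs, H, W = 2, 64, 32, 14, 14
x = torch.randn(N, C, H, W)                               # input features from a backbone stage
w_in = [torch.randn(Cs, C, 1, 1) for _ in range(3)]       # w1, w2, w3
w_out = [torch.randn(C, Cs, 1, 1) for _ in range(3)]      # w4, w5, w6
y = gsnl_block(x, *w_in, *w_out)
assert y.shape == x.shape                                 # output can feed the next backbone stage
```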
The processor platform 400 of the illustrated example includes a processor 412. The processor 412 of the illustrated example is hardware. For example, the processor 412 can be implemented by one or more integrated circuits, logic circuits, microprocessors, GPUs, DSPs, or controllers from any desired family or manufacturer. The hardware processor 412 may be a semiconductor based (e.g., silicon based) device. In
The processor 412 of the illustrated example includes a local memory 413 (e.g., a cache). In
The processor platform 400 of the illustrated example also includes an interface circuit 420. The interface circuit 420 may be implemented by any type of interface standard, such as an Ethernet interface, a universal serial bus (USB), a Bluetooth® interface, a near field communication (NFC) interface, and/or a PCI express interface.
In the illustrated example, one or more input devices 422 are connected to the interface circuit 420. The input device(s) 422 permit(s) a user to enter data and/or commands into the processor 412. The input device(s) can be implemented by, for example, an audio sensor, a microphone, a camera (still or video), a keyboard, a button, a mouse, a touchscreen, a track-pad, a trackball, a trackbar (such as an isopoint), a voice recognition system and/or any other human-machine interface. Also, many systems, such as the processor platform 400, can allow the user to control the computer system and provide data to the computer using physical gestures, such as, but not limited to, hand or body movements, facial expressions, and face recognition.
One or more output devices 424 are also connected to the interface circuit 420 of the illustrated example. The output devices 424 can be implemented, for example, by display devices (e.g., a light emitting diode (LED), an organic light emitting diode (OLED), a liquid crystal display (LCD), a cathode ray tube display (CRT), an in-place switching (IPS) display, a touchscreen, etc.), a tactile output device, a printer and/or speaker(s). The interface circuit 420 of the illustrated example, thus, typically includes a graphics driver card, a graphics driver chip and/or a graphics driver processor.
The interface circuit 420 of the illustrated example also includes a communication device such as a transmitter, a receiver, a transceiver, a modem, a residential gateway, a wireless access point, and/or a network interface to facilitate exchange of data with external machines (e.g., computing devices of any kind) via a network 426. The communication can be via, for example, an Ethernet connection, a digital subscriber line (DSL) connection, a telephone line connection, a coaxial cable system, a satellite system, a line-of-sight wireless system, a cellular telephone system, etc.
The processor platform 400 of the illustrated example also includes one or more mass storage devices 428 for storing software and/or data. Examples of such mass storage devices 428 include floppy disk drives, hard drive disks, compact disk drives, Blu-ray disk drives, redundant array of independent disks (RAID) systems, and digital versatile disk (DVD) drives.
Machine executable instructions 432 corresponding to the instructions of
Example methods, apparatus, systems, and articles of manufacture to implement a spectral nonlocal block for a neural network and methods, apparatus, and articles of manufacture to control the same are disclosed herein. Further examples and combinations thereof include the following: Example 1 includes an apparatus comprising a first convolution filter to perform a first convolution using input features and first weighted kernels to generate first weighted input features, the input features corresponding to data input into a neural network, an affinity matrix generator to perform a second convolution using the input features and second weighted kernels to generate second weighted input features, perform a third convolution using the input features and third weighted kernels to generate third weighted input features, and generate an affinity matrix based on the second and third weighted input features, a second convolution filter to perform a fourth convolution using the first weighted input features and fourth weighted kernels to generate fourth weighted input features, a first accumulator to generate a spectral nonlocal operator by adding the fourth weighted input features to a connected weighted graph corresponding to the affinity matrix, and a second accumulator to transmit output features corresponding to the spectral nonlocal operator to a subsequent component of the neural network.
Example 2 includes the apparatus of example 1, wherein the first convolution filter is the second convolution filter.
Example 3 includes the apparatus of example 1, wherein the affinity matrix generator is to generate the affinity matrix by decreasing dimensions of the second weighted input features and the third weighted input features, and multiplying the second weighted input features by a transpose of the third weighted input features.
Example 4 includes the apparatus of example 1, further including a multiplier to multiply the affinity matrix with the first weighted input features to generate an affinity product, the first weighted input features having dimensions reduced prior to the multiplication, a reshaper to increase the dimensions of the affinity product, and a third convolution filter to perform a fifth convolution using the affinity product and fifth weighted kernels to generate the connected weighted graph.
Example 5 includes the apparatus of example 1, wherein the second accumulator is to generate the output features by adding the spectral nonlocal operator and the input features.
Example 6 includes the apparatus of example 1, wherein the apparatus is implemented as a layer in the neural network.
Example 7 includes the apparatus of example 1, wherein the second accumulator is to transmit the output features to a classifier of the neural network.
Example 8 includes the apparatus of example 1, further including a Chebyshev matrix approximator to generate a Chebyshev approximation matrix by multiplying the affinity matrix by a scalar, and subtracting an identity matrix from the scaled affinity matrix.
Example 9 includes the apparatus of example 8, further including a multiplier to multiply the Chebyshev approximation matrix with the first weighted input features to generate a Chebyshev approximation product, the first weighted input features having dimensions reduced prior to the multiplication, a reshaper to increase dimensions of the Chebyshev approximation product, and a third convolution filter to perform a fifth convolution using the Chebyshev approximation product and fifth weighted kernels to generate a Chebyshev approximation graph.
Example 10 includes the apparatus of example 9, wherein the first accumulator is to generate a full order spectral nonlocal operator by adding the spectral nonlocal operator with the Chebyshev approximation graph, the output features corresponding to the full order spectral nonlocal operator.
Example 11 includes a non-transitory computer readable storage medium comprising instructions which, when executed, cause one or more processors to at least perform a first convolution using input features and first weighted kernels to generate first weighted input features, the input features corresponding to data input into a neural network, perform a second convolution using the input features and second weighted kernels to generate second weighted input features, perform a third convolution using the input features and third weighted kernels to generate third weighted input features, and generate an affinity matrix based on the second and third weighted input features, perform a fourth convolution using the first weighted input features and fourth weighted kernels to generate fourth weighted input features, generate a spectral nonlocal operator by adding the fourth weighted input features to a connected weighted graph corresponding to the affinity matrix, and transmit output features corresponding to the spectral nonlocal operator to a subsequent component of the neural network.
Example 12 includes the non-transitory computer readable storage medium of example 11, wherein the instructions cause the one or more processors to generate the affinity matrix by decreasing dimensions of the second weighted input features and the third weighted input features, and multiplying the second weighted input features by a transpose of the third weighted input features.
Example 13 includes the non-transitory computer readable storage medium of example 11, wherein the instructions cause the one or more processors to multiply the affinity matrix with the first weighted input features to generate an affinity product, the first weighted input features having dimensions reduced prior to the multiplication, increase the dimensions of the affinity product, and perform a fifth convolution using the affinity product and fifth weighted kernels to generate the connected weighted graph.
Example 14 includes the non-transitory computer readable storage medium of example 11, wherein the instructions cause the one or more processors to generate the output features by adding the spectral nonlocal operator and the input features.
Example 15 includes the non-transitory computer readable storage medium of example 11, wherein the one or more processors are implemented as a layer in the neural network.
Example 16 includes the non-transitory computer readable storage medium of example 11, wherein the instructions cause the one or more processors to transmit the output features to a classifier of the neural network.
Example 17 includes the non-transitory computer readable storage medium of example 11, wherein the instructions cause the one or more processors to generate a Chebyshev approximation matrix by multiplying the affinity matrix by a scalar, and subtracting an identity matrix from the scaled affinity matrix.
Example 18 includes the non-transitory computer readable storage medium of example 17, wherein the instructions cause the one or more processors to multiply the Chebyshev approximation matrix with the first weighted input features to generate a Chebyshev approximation product, the first weighted input features having dimensions reduced prior to the multiplication, increase dimensions of the Chebyshev approximation product, and perform a fifth convolution using the Chebyshev approximation product and fifth weighted kernels to generate a Chebyshev approximation graph.
Example 19 includes the non-transitory computer readable storage medium of example 18, wherein the instructions cause the one or more processors to generate a full order spectral nonlocal operator by adding the spectral nonlocal operator with the Chebyshev approximation graph, the output features corresponding to the full order spectral nonlocal operator.
Example 20 includes an apparatus comprising means for performing a first convolution using input features and first weighted kernels to generate first weighted input features, the input features corresponding to data input into a neural network, means for performing a second convolution using the input features and second weighted kernels to generate second weighted input features, the means for performing the second convolution to, perform a third convolution using the input features and third weighted kernels to generate third weighted input features, and generate an affinity matrix based on the second and third weighted input features, means for performing a fourth convolution using the first weighted input features and fourth weighted kernels to generate fourth weighted input features, means for generating a spectral nonlocal operator by adding the fourth weighted input features to a connected weighted graph corresponding to the affinity matrix, and means for transmitting output features corresponding to the spectral nonlocal operator to a subsequent component of the neural network.
Example 21 includes the apparatus of example 20, wherein the means for performing the first convolution is the means for performing the fourth convolution.
Example 22 includes the apparatus of example 20, wherein the means for generating the affinity matrix is to decrease dimensions of the second weighted input features and the third weighted input features, and multiply the second weighted input features by a transpose of the third weighted input features.
Example 23 includes the apparatus of example 20, further including means for multiplying the affinity matrix with the first weighted input features to generate an affinity product, the first weighted input features having dimensions reduced prior to the multiplication, means for increasing the dimensions of the affinity product, and means for performing a fifth convolution using the affinity product and fifth weighted kernels to generate the connected weighted graph.
Example 24 includes the apparatus of example 20, wherein the means for transmitting is to generate the output features by adding the spectral nonlocal operator and the input features.
Example 25 includes the apparatus of example 20, wherein the apparatus is implemented as a layer in the neural network.
Example 26 includes the apparatus of example 20, wherein the means for transmitting is to transmit the output features to a classifier of the neural network.
Example 27 includes the apparatus of example 20, further including means for generating a Chebyshev approximation matrix by multiplying the affinity matrix by a scalar, and subtracting an identity matrix from the scaled affinity matrix.
Example 28 includes the apparatus of example 27, further including means for multiplying the Chebyshev approximation matrix with the first weighted input features to generate a Chebyshev approximation product, the first weighted input features having dimensions reduced prior to the multiplication, means for increasing dimensions of the Chebyshev approximation product, and means for performing a fifth convolution using the Chebyshev approximation product and fifth weighted kernels to generate a Chebyshev approximation graph.
Example 29 includes the apparatus of example 28, wherein the means for generating the spectral nonlocal operator is to generate a full order spectral nonlocal operator by adding the spectral nonlocal operator with the Chebyshev approximation graph, the output features corresponding to the full order spectral nonlocal operator.
Example 30 includes a method comprising performing, by executing an instruction using a processor, a first convolution using input features and first weighted kernels to generate first weighted input features, the input features corresponding to data input into a neural network, performing, by executing an instruction with the processor, a second convolution using the input features and second weighted kernels to generate second weighted input features, performing, by executing an instruction with the processor, a third convolution using the input features and third weighted kernels to generate third weighted input features, and generating, by executing an instruction with the processor, an affinity matrix based on the second and third weighted input features, performing, by executing an instruction with the processor, a fourth convolution using the first weighted input features and fourth weighted kernels to generate fourth weighted input features, generating, by executing an instruction with the processor, a spectral nonlocal operator by adding the fourth weighted input features to a connected weighted graph corresponding to the affinity matrix, and transmitting output features corresponding to the spectral nonlocal operator to a subsequent component of the neural network.
Example 31 includes the method of example 30, wherein the generating of the affinity matrix includes decreasing dimensions of the second weighted input features and the third weighted input features, and multiplying the second weighted input features by a transpose of the third weighted input features.
Example 32 includes the method of example 30, further including multiplying the affinity matrix with the first weighted input features to generate an affinity product, the first weighted input features having dimensions reduced prior to the multiplication, increasing the dimensions of the affinity product, and performing a fifth convolution using the affinity product and fifth weighted kernels to generate the connected weighted graph.
Example 33 includes the method of example 30, further including generating the output features by adding the spectral nonlocal operator and the input features.
Example 34 includes the method of example 30, further including transmitting the output features to a classifier of the neural network.
Example 35 includes the method of example 30, further including generating a Chebyshev approximation matrix by multiplying the affinity matrix by a scalar, and subtracting an identity matrix from the scaled affinity matrix.
Example 36 includes the method of example 35, further including multiplying the Chebyshev approximation matrix with the first weighted input features to generate a Chebyshev approximation product, the first weighted input features having dimensions reduced prior to the multiplication, increasing dimensions of the Chebyshev approximation product, and performing a fifth convolution using the Chebyshev approximation product and fifth weighted kernels to generate a Chebyshev approximation graph.
Example 37 includes the method of example 36, further including generating a full order spectral nonlocal operator by adding the spectral nonlocal operator with the Chebyshev approximation graph, the output features corresponding to the full order spectral nonlocal operator.
From the foregoing, it will be appreciated that example technical solutions to implement a spectral nonlocal block for a neural network, and methods, apparatus, and articles of manufacture to control the same, have been disclosed. Disclosed examples improve neural network classifications using the disclosed spectral nonlocal block and/or the disclosed full-order spectral nonlocal block. The disclosed spectral nonlocal block and/or the disclosed full-order spectral nonlocal block capture long-range dependencies without diminishing differentiated features due to a damping effect caused by interference between a large number of position pairs. When examples disclosed herein are implemented in a neural network with transferred channels on an image classification data set (e.g., a CIFAR1000 dataset, an ImageNet dataset, etc.), examples disclosed herein correspond to accuracy improvements eight times greater than traditional techniques. Likewise, examples disclosed herein correspond to accuracy improvements for a fine-grained image classification dataset (e.g., a CUB dataset) and/or an action recognition dataset (e.g., a UCF101 dataset). When examples disclosed herein are implemented in a neural network with different positions on a CIFAR1000 dataset, examples disclosed herein correspond to accuracy improvements two times greater than traditional techniques. Examples disclosed herein further increase accuracy for different network types (e.g., different position 3, same position 2, same position 5) by 2.3 to 4.7 times more than traditional techniques. Additionally, the computation costs and memory size corresponding to the SNL block disclosed herein are lower than or comparable with those of traditional techniques. Accordingly, disclosed examples are directed to one or more improvement(s) in the functioning of a neural network.
Although certain example methods, apparatus and articles of manufacture have been disclosed herein, the scope of coverage of this patent is not limited thereto. On the contrary, this patent covers all methods, apparatus and articles of manufacture fairly falling within the scope of the claims of this patent.