This application claims the benefit under 35 USC 119(a) of Korean Patent Application No. 10-2019-0098810 filed on Aug. 13, 2019, and Korean Patent Application No. 10-2019-0127258 filed on Oct. 14, 2019, in the Korean Intellectual Property Office, the entire disclosures of which are incorporated herein by reference for all purposes.
The following description relates to a neural network method and apparatus.
The technological automation of processes such as recognition, for example, voice and speech recognition, has been implemented through processor-implemented neural network models, as specialized computational architectures, which, after substantial training, may provide computationally intuitive mappings between input patterns and output patterns. The trained capability of generating such mappings may be referred to as a learning capability of the neural network. Further, because of the specialized training, such a specially trained neural network may thereby have a generalization capability of generating a relatively accurate output with respect to an input pattern that the neural network may not have been trained for, for example.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
In a general aspect, a processor-implemented data processing method includes receiving a first input plane corresponding to a first input channel from among a plurality of input planes of an input feature map, receiving a first weight plane corresponding to the first input channel among a plurality of weight planes of a weight kernel, generating first cumulative data by accumulating multiplication results from multiplication operations between at least a portion of first input elements in the first input plane and at least a portion of first weight elements in the first weight plane; and generating a first output plane corresponding to a first output channel among a plurality of output planes of an output feature map based on the first cumulative data, wherein each of the plurality of input planes and each of the plurality of weight planes respectively correspond to an input channel, and wherein each of the plurality of output planes corresponds to an output channel.
The generating of the first output plane may include generating the first output plane based on a sum of cumulative data for each input channel including the first cumulative data.
The method may include receiving a second input plane corresponding to a second input channel among the input planes, receiving a second weight plane corresponding to the second input channel among the plurality of weight planes; and generating second cumulative data by accumulating multiplication results from multiplications between at least a portion of second input elements in the second input plane, and at least a portion of second weight elements in the second weight plane.
The generating of the first output plane may include generating the first output plane based on a sum of the first cumulative data and the second cumulative data.
The generating of the first cumulative data may include extracting, from the first input plane, first input element vectors corresponding to the portion of the first weight elements, generating first weighted input element vectors corresponding to multiplication results from multiplication operations between the first input element vectors and the portion of the first weight elements; and generating the first cumulative data by accumulating the first weighted input element vectors.
The extracting of the first input element vectors may include determining offsets corresponding to the first input element vectors based on indices of the portion of the first weight elements; and extracting the first input element vectors from the first input plane based on the determined offsets.
A size of the first input element vectors and a size of the first weighted input element vectors may correspond to a single instruction multiple data (SIMD) operation unit.
When the first cumulative data is generated, an operation of multiplying zero weight elements corresponding to a value of zero among the portion of the first weight elements and the portion of the first input elements may be skipped.
The method may further include determining a number of non-zero weight elements not corresponding to zero among the first weight elements; and selecting an operation type corresponding to the determined number of non-zero weight elements from among a plurality of operation types to perform a preset type of operation.
The generating of the first cumulative data may include generating the first cumulative data by accumulating the multiplication results from the multiplication operations between the portion of the first input elements and the non-zero weight elements corresponding to the portion of the first weight elements based on the selected operation type.
The generating of the first cumulative data may include extracting, from the first input plane, first input element vectors corresponding to the non-zero weight elements based on indices of the non-zero weight elements, generating first weighted input element vectors corresponding to multiplication results from multiplication operations between the first input element vectors and the non-zero weight elements corresponding to the portion of the first weight elements; and generating the first cumulative data by accumulating the first weighted input element vectors.
The method may further include separately multiplying respective weight elements of each of the weight planes by plural elements of the first input plane.
In a general aspect, a data processing apparatus includes one or more processors configured to receive a first input plane corresponding to a first input channel from among a plurality of input planes of an input feature map, receive a first weight plane corresponding to the first input channel among a plurality of weight planes of a weight kernel, generate first cumulative data by accumulating multiplication results from multiplication operations between at least a portion of first input elements in the first input plane and at least a portion of first weight elements in the first weight plane; and generate a first output plane corresponding to a first output channel among a plurality of output planes of an output feature map respectively corresponding to output channels based on the first cumulative data, wherein each of the plurality of input planes, and each of the plurality of weight planes respectively correspond to an input channel, and wherein each of the plurality of output planes corresponds to an output channel.
The processor may further be configured to generate the first output plane based on a sum of cumulative data for each input channel including the first cumulative data.
The processor may be further configured to receive a second input plane corresponding to a second input channel among the input planes, receive a second weight plane corresponding to the second input channel among the plurality of weight planes; and generate second cumulative data by accumulating multiplication results from multiplications between at least a portion of second input elements in the second input plane and at least a portion of second weight elements in the second weight plane.
The processor may be further configured to generate the first output plane based on a sum of the first cumulative data and the second cumulative data.
The processor may be further configured to extract, from the first input plane, first input element vectors corresponding to the portion of the first weight elements; generate first weighted input element vectors corresponding to multiplication results from multiplication operations between the first input element vectors and the portion of the first weight elements; and generate the first cumulative data by accumulating the first weighted input element vectors.
The processor may be further configured to determine offsets corresponding to the first input element vectors based on indices of the portion of the first weight elements; and extract the first input element vectors from the first input plane based on the determined offsets.
A size of the first input element vectors and a size of the first weighted input element vectors may correspond to a single instruction multiple data (SIMD) operation unit.
When the first cumulative data is generated, an operation of multiplying zero weight elements corresponding to a value of zero among the portion of the first weight elements and the portion of the first input elements may be skipped.
The processor may be further configured to determine a number of non-zero weight elements not corresponding to zero among the first weight elements; and select an operation type corresponding to the determined number of non-zero weight elements from among a plurality of operation types to perform a preset type of operation.
The processor may be further configured to generate the first cumulative data by accumulating the multiplication results from the multiplication operations between the portion of the first input elements and the non-zero weight elements corresponding to the portion of the first weight elements based on the selected operation type.
The processor may be further configured to extract, from the first input plane, first input element vectors corresponding to the non-zero weight elements based on indices of the non-zero weight elements, generate first weighted input element vectors corresponding to multiplication results from multiplication operations between the first input element vectors and the non-zero weight elements corresponding to the portion of the first weight elements; and generate the first cumulative data by accumulating the first weighted input element vectors.
The apparatus may include a memory storing instructions that, when executed by the one or more processors, configure the one or more processors to perform the receiving of the first input plane, the receiving of the first weight plane, the generating of the first cumulative data, and the generating of the first output plane.
In a general aspect, a processor-implemented method performed by a processor of an electronic apparatus includes receiving an input plane of a layer of a neural network including a plurality of input elements, receiving a weight plane corresponding to the input plane of the layer, the weight plane including a plurality of weight elements; and generating an output plane by accumulating multiplication results obtained by performing a multiplication operation between each of the weight elements in the weight plane and a corresponding input element of the input elements in the input plane.
When a zero weight element corresponding to a value of zero is present among the weight elements, a multiplication between the zero weight element and an input element corresponding to the zero weight element may be skipped.
A convolution operation associated with the layer of the neural network may be performed based on single instruction multiple data (SIMD).
The input plane and the weight plane may correspond to a single input channel, and the output plane may correspond to a single output channel.
The input plane may be one of a plurality of input planes corresponding to an input feature map of the layer, and the weight plane is one of a plurality of weight planes corresponding to a weight kernel of the layer, and wherein an output feature map of the layer is determined based on the output plane, and one or more output planes generated based on one or more other input planes excluding the input plane among the plurality of input planes, and one or more other weight planes excluding the weight plane among the plurality of weight planes.
In a general aspect, a processor-implemented method includes receiving an input feature map including a plurality of input planes, receiving a weight kernel including a plurality of weight planes, performing a cumulative convolution operation between the input feature map and the weight kernel, and generating an output plane based on the cumulative convolution operation.
The method may further include generating cumulative planes by performing multiply and accumulate (MAC) operations between the plurality of input planes and the plurality of weight planes.
The output plane may be generated by accumulating outputs of the cumulative planes.
Other features and aspects will be apparent from the following detailed description, the drawings, and the claims.
Throughout the drawings and the detailed description, unless otherwise described or provided, the same drawing reference numerals will be understood to refer to the same elements, features, and structures. The drawings may not be to scale, and the relative size, proportions, and depiction of elements in the drawings may be exaggerated for clarity, illustration, and convenience.
The following detailed description is provided to assist the reader in gaining a comprehensive understanding of the methods, apparatuses, and/or systems described herein. However, various changes, modifications, and equivalents of the methods, apparatuses, and/or systems described herein will be apparent after an understanding of the disclosure of this application. For example, the sequences of operations described herein are merely examples, and are not limited to those set forth herein, but may be changed as will be apparent after an understanding of the disclosure of this application, with the exception of operations necessarily occurring in a certain order. Also, descriptions of features that are known after an understanding of the disclosure of this application may be omitted for increased clarity and conciseness.
The features described herein may be embodied in different forms and are not to be construed as being limited to the examples described herein. Rather, the examples described herein have been provided merely to illustrate some of the many possible ways of implementing the methods, apparatuses, and/or systems described herein that will be apparent after an understanding of the disclosure of this application.
As used herein, the term “and/or” includes any one and any combination of any two or more of the associated listed items.
Throughout the specification, when an element, such as a layer, region, or substrate, is described as being “on,” “connected to,” or “coupled to” another element, it may be directly “on,” “connected to,” or “coupled to” the other element, or there may be one or more other elements intervening therebetween. In contrast, when an element is described as being “directly on,” “directly connected to,” or “directly coupled to” another element, there can be no other elements intervening therebetween.
Although terms such as “first,” “second,” and “third” may be used herein to describe various members, components, regions, layers, or sections, these members, components, regions, layers, or sections are not to be limited by these terms. Rather, these terms are only used to distinguish one member, component, region, layer, or section from another member, component, region, layer, or section. Thus, a first member, component, region, layer, or section referred to in examples described herein may also be referred to as a second member, component, region, layer, or section without departing from the teachings of the examples.
The terminology used herein is for describing various examples only and is not to be used to limit the disclosure. The articles “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. The terms “comprises,” “includes,” and “has” specify the presence of stated features, numbers, operations, members, elements, and/or combinations thereof, but do not preclude the presence or addition of one or more other features, numbers, operations, members, elements, and/or combinations thereof.
Unless otherwise defined, all terms, including technical and scientific terms, used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains and based on an understanding of the disclosure of the present application. Terms, such as those defined in commonly used dictionaries, are to be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and the disclosure of the present application and are not to be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Also, in the description of example embodiments, detailed description of structures or functions that are known after an understanding of the disclosure of the present application will be omitted when it is deemed that such description will cause ambiguous interpretation of the example embodiments.
Hereinafter, examples will be described in detail with reference to the accompanying drawings, and like reference numerals in the drawings refer to like elements throughout.
The one or more operations may be implemented through processor-implemented neural network models, as specialized computational architectures that, after substantial training, may provide computationally intuitive mappings between input data or patterns and output data or patterns or pattern recognitions of input patterns. The trained capability of generating such mappings or performing such pattern recognitions may be referred to as a learning capability of the neural network. Such trained capabilities may also enable the specialized computational architecture to classify such an input pattern, or portion of the input pattern, as a member that belongs to one or more predetermined groups. Further, because of the specialized training, such a specially trained neural network may thereby have a generalization capability of generating a relatively accurate or reliable output with respect to an input pattern that the neural network may not have been trained for, for example.
In an example, the neural network 110 may be a deep neural network (DNN), as a non-limiting example. The DNN may include a plurality of layers. For example, the deep neural network may include an input layer to which input data is applied, an output layer for outputting a result derived through prediction based on training and the input data, and a plurality of hidden layers for performing a neural network operation between the input layer and the output layer.
In an example, the input layer may correspond to, or may be referred to as, the lowest layer of the neural network, and the output layer may correspond to, or may be referred to as, the highest layer of the neural network. A layer order may be assigned and named or referred to sequentially from the output layer, that is the highest layer, to the input layer that is the lowest layer. For example, a Hidden Layer 2 may correspond to a layer higher than a Hidden Layer 1 and the Input Layer, but lower than the Output Layer.
The DNN may include one or more convolutional layers, and may further include one or more of fully connected layers, a recurrent neural network, and the like, or may include different or overlapping neural network portions respectively with such full, convolutional, or recurrent connections, according to machine learning used to process information.
As noted, the neural network 110 may perform the one or more operations, for example, the object recognition operation or the user verification operation by mapping input data and output data that are in a nonlinear relationship based on deep learning approaches, such as in a convolutional neural network or a recurrent neural network. The deep learning approach may refer to a machine learning method used to recognize, as non-limiting examples, an image or a voice (or speech) from a big dataset. The deep learning approach may be construed as a problem-solving process in optimization to locate a point at which energy or loss is minimized while training the neural network 110 using prepared training data. The deep learning approach may be classified into supervised or unsupervised learning, through which weights corresponding to an architecture or model of the neural network 110 may be obtained. Through such obtained weights or elements of kernel(s), the input data and the output data may be mapped according to a trained objective of the neural network 110.
The neural network 110 may be a deep neural network (DNN) including a plurality of layers which includes an input layer, at least one hidden layer, and an output layer. For example, as illustrated in
In the CNN or CNN portion, data input to each layer may be referred to as an input feature map or volume, and data output from each layer may be referred to as an output feature map or volume. The input feature map from a previous layer and the output feature map of a current layer may be referred to as activation data. In addition, an input feature map in an input layer may correspond to input data.
To process the operation associated with the neural network 110, the data processing apparatus 100 may perform a convolution operation between an input feature map and a weight kernel for each convolutional layer, and generate an output feature map based on a result of the convolution operation. The weight kernel may have multiple channels, corresponding to the number of channels of the input feature map, and there may further be multiple weight kernels resulting in the generation of an output feature map of multiple channels. The neural network 110 may have a capacity sufficient to implement a function, when a width and a depth of the neural network 110 are sufficiently large. The neural network 110 may achieve optimal performance when the neural network 110 learns or is trained with a sufficiently large amount of training data through a training process, as discussed above.
A weight kernel may be predetermined, e.g., the weight kernel includes trained weight elements, which indicates that it is determined before the neural network 110 is initiated (implemented). The initiation of the neural network 110 may indicate that the neural network 110 is ready for inference. In an example, the initiation of the neural network 110 may indicate that the neural network 110 is loaded in a memory, or that input data for the inference is input to the neural network 110 after the neural network 110 is loaded in the memory. Inference is the process of applying a trained neural network to an input to produce an output.
As is further described below, a convolution operation may be performed by accumulating, in an output feature map, intermediate results of the convolution operation, and may not require a buffering operation of converting a weight kernel or an input feature map to a form suitable for a convolution and storing it in a buffer. That is, the convolution operation may use data of the input feature map stored in a planar form. Thus, efficiency of the convolution operation may be improved greatly. Additionally, in the convolution operation, a unit operation may correspond to multiplying one weight element corresponding to a scalar and one input plane corresponding to a matrix. Thus, for weight elements having a value of zero, zero-skipping may be effectively processed through software.
Each weight plane and each output plane may include elements of a preset bit-width. For example, each weight plane may have a size of K*K, and each input plane and each output plane may have a size of W*H, in which W, K, and H indicate respective numbers of elements. An element of a weight plane may also be referred to as a weight element, and an element of an input plane and an element of an output plane may also be referred to as an input element and an output element, respectively. In an example, a convolution operation may be performed elementwise.
Hereinafter, it may be assumed for convenience of description that a width and a height of a weight plane are the same as K, and a size of an input plane and a size of an output plane are the same as W*H. However, a width and a height of a weight plane, and a size of an input plane and a size of an output plane may differ from each other according to an example.
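By way of illustration only, the following sketch lays out these dimensions with assumed example values (the array names and sizes are hypothetical and not part of the disclosure): an input feature map with C input planes of size W*H, D weight kernels each with C weight planes of size K*K, and an output feature map with D output planes of size W*H.

```python
import numpy as np

# Hypothetical example dimensions (illustrative only).
C, D = 3, 2          # number of input channels / output channels
K = 3                # width and height of each weight plane
W, H = 8, 6          # width and height of each input plane and output plane

# Input feature map: C input planes, each of size H x W.
input_feature_map = np.random.rand(C, H, W).astype(np.float32)

# D weight kernels, each including C weight planes of size K x K.
weight_kernels = np.random.rand(D, C, K, K).astype(np.float32)

# Output feature map: D output planes, each of size H x W
# (the same W*H size as the input planes when padding is applied).
output_feature_map = np.zeros((D, H, W), dtype=np.float32)
```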
Referring to
Such a sliding window convolution operation may be a typical way of implementing a convolution operation, and differs from the cumulative convolution operation described herein. For example, in the sliding window convolution operation, a buffering operation may be performed on the input feature map 320 to generate column vectors. However, in one or more examples, a cumulative convolution operation herein may accumulate, in the output feature map 330, intermediate results of a convolution operation, and thus there may not be a need to perform operations such as a buffering operation as in the sliding window convolution operation.
By the sliding window convolution operation, an operation between the weight kernel 310 and data stored at noncontinuous addresses of the input feature map 320 may be performed while the weight kernel 310 is sliding across the input feature map 320, and thus the input feature map 320 may be converted to a suitable form of continuous data to increase a speed of processing the operation. In the example of
A column vector may be buffered into a column buffer from the input feature map 320, which may be stored in a planar structure or an interleaved structure. In an example of the planar structure, while the input feature map 320 is being buffered as a column vector, noncontinuous memory accesses may occur up to a maximum of the height K of the kernel multiplied by the number C of input channels to determine one output element. In an example of the interleaved structure, while the input feature map 320 is being buffered as a column vector, the noncontinuous memory accesses may occur up to a maximum of the height K of the kernel to determine one output element.
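For comparison only, the column buffering described above can be sketched as follows; this is a minimal illustration with assumed dimensions and names, showing the additional column buffer that a sliding window (im2col-style) convolution builds from noncontinuous addresses, and which the cumulative convolution described next avoids.

```python
import numpy as np

# Assumed example dimensions (illustrative only).
C, K, H, W = 2, 3, 6, 8
padded = np.random.rand(C, H + 2, W + 2).astype(np.float32)   # padded input feature map
weight_kernel = np.random.rand(C, K, K).astype(np.float32)     # one weight kernel

# Column buffer: one column vector of length C*K*K per output element,
# gathered from noncontinuous addresses of the input feature map.
columns = np.empty((H * W, C * K * K), dtype=np.float32)
for y in range(H):
    for x in range(W):
        columns[y * W + x] = padded[:, y:y + K, x:x + K].ravel()

# One output plane then follows from a single matrix-vector product
# between the column buffer and the flattened weight kernel.
output_plane = (columns @ weight_kernel.ravel()).reshape(H, W)
```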
In contrast, in one or more examples, by a cumulative convolution operation discussed herein, the intermediate results of the convolution operation may be accumulated in the output feature map 330, and thus such additional buffering operation may not be needed to convert the input feature map 320 to such planar or interleaved structure. Thus, the cumulative convolution operation may minimize a memory access and maximize a speed of processing the convolution operation.
As previously discussed, in an example, an output feature map may include D output planes.
Referring to
Referring to
The cumulative plane 531 may be generated through a multiply and accumulate (MAC) operation between the input plane 511 and the weight plane 521. The cumulative plane 532 may be generated through a MAC operation between the input plane 512 and the weight plane 522. The cumulative plane 533 may be generated through a MAC operation between the input plane 513 and the weight plane 523. The MAC operation will be described hereinafter in greater detail. When the cumulative planes 530 are generated, the output plane 540 may be generated based on the cumulative planes 530. For example, the output plane 540 may be generated through a sum of the cumulative planes 530.
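As a rough sketch of this per-channel accumulation (the function and variable names below are assumptions for illustration, not the described apparatus), each cumulative plane may be produced by a MAC operation between one input plane and its corresponding weight plane, and the output plane may then be formed as the sum of the cumulative planes.

```python
import numpy as np

def mac_plane(input_plane, weight_plane):
    """Hypothetical MAC operation between one input plane (H x W) and one
    weight plane (K x K): each weight element multiplies a shifted view of
    the zero-padded input plane, and the products are accumulated."""
    H, W = input_plane.shape
    K = weight_plane.shape[0]
    pad = K // 2
    padded = np.pad(input_plane, pad)                 # (H + 2*pad) x (W + 2*pad)
    cumulative = np.zeros((H, W), dtype=input_plane.dtype)
    for a in range(K):                                # row index of the weight element
        for b in range(K):                            # column index of the weight element
            cumulative += weight_plane[a, b] * padded[a:a + H, b:b + W]
    return cumulative

# Hypothetical three-channel example mirroring the three cumulative planes.
H, W, K, C = 6, 8, 3, 3
input_planes = np.random.rand(C, H, W).astype(np.float32)
weight_planes = np.random.rand(C, K, K).astype(np.float32)

cumulative_planes = [mac_plane(input_planes[c], weight_planes[c]) for c in range(C)]
output_plane = np.sum(cumulative_planes, axis=0)      # sum of the cumulative planes
```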
Referring to
Referring to
Considering a sliding window approach between the weight plane including the weight elements w1 through w9 and the input plane 712, in one or more examples, response regions 721 through 729 that respectively respond to the weight elements w1 through w9 may be defined in the input plane 712. For example, input elements in the response region 721 respond to the weight element w1, input elements in the response region 722 respond to the weight element w2, and input elements in the response region 729 respond to the weight element w9.
A size of each of the response regions 721 through 729 is the same as a size of the input plane 711. In addition, respective offsets of the response regions 721 through 729 are determined based on respective indices of the weight elements w1 through w9. For example, when a width of the input plane 711 is W+2, the offset of a response region is defined as (W+2)*a+b, in which “a” denotes a quotient obtained by dividing (i−1) by K, “b” denotes a remainder obtained by dividing (i−1) by K, i denotes an index of a weight element, and K denotes a width of a weight kernel. In this example, an offset may be determined based on an input plane, for example, an original point of the input plane to which padding is applied. Thus, the offset of the response region 721 is 0, the offset of the response region 722 is 1, and the offset of the response region 729 is (W+2)*2+2.
Multiplication results 731 through 739 are generated from respective multiplications between input elements in the response regions 721 through 729 and the weight elements w1 through w9. The cumulative plane 740 is generated by accumulating each of the multiplication results 731 through 739. In an example, an output plane may be generated through a sum of C cumulative planes. In this example, the cumulative plane 740 of
As described above, an output feature map may be generated by accumulating multiplication results, for example, the multiplication results 731 through 739, that correspond to intermediate results of a convolution operation. Accordingly, an operation of converting an input feature map to continuous data, and storing the continuous data in a buffer, may not be needed. Thus, it is possible to reduce an amount of time used for such conversion and buffering, and accelerate an operational speed of the convolution operation and save memory space used to store the converted data.
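The offset arithmetic and accumulation described above may be sketched as follows (a minimal illustration with assumed dimensions; the variable names are hypothetical): the padded input plane is flattened, the offset of each response region is computed as (W+2)*a + b from the index i of the weight element, and the multiplication results for all weight elements are accumulated into one cumulative plane.

```python
import numpy as np

K = 3                       # width and height of the weight plane
W, H = 8, 6                 # size of the unpadded input plane / output plane
# Padded input plane of width W + 2 and height H + 2 (padding of 1 on each side).
padded = np.random.rand(H + 2, W + 2).astype(np.float32)
weights = np.random.rand(K * K).astype(np.float32)     # weight elements w1 .. w9

flat = padded.ravel()
cumulative = np.zeros(H * W, dtype=np.float32)          # flattened cumulative plane

for i in range(1, K * K + 1):                            # index i of the weight element
    a, b = divmod(i - 1, K)                              # a = (i-1) // K, b = (i-1) % K
    offset = (W + 2) * a + b                              # offset of the response region
    # Response region: W x H elements read from the padded plane starting at the
    # offset, with consecutive rows spaced W + 2 elements apart.
    region = np.concatenate([flat[offset + r * (W + 2): offset + r * (W + 2) + W]
                             for r in range(H)])
    cumulative += weights[i - 1] * region                 # accumulate the multiplication result

cumulative_plane = cumulative.reshape(H, W)
```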
Referring to
Referring to
From the response regions 911 through 919, input element vectors are extracted and stored in registers r1 through r9. For example, a first input element vector of the response region 911 is stored in the register r1, and a second input element vector of the response region 912 is stored in the register r2. Similarly, the input element vectors are respectively stored in the registers r1 through r9 in sequential order.
Each of the input element vectors may be multiplied elementwise by a corresponding weight element among the weight elements w1 through w9, and thus weighted input element vectors are generated. In an example, the first input element vector of the response region 911 is stored in the register r1 and multiplied by the weight element w1, and thus a first weighted input element vector is generated. Similarly, the second input element vector of the response region 912 is stored in the register r2 and multiplied by the weight element w2, and thus a second weighted input element vector is generated. A size of the response regions 911 through 919, the input element vectors, and the weighted input element vectors may correspond to a SIMD operation unit.
The weighted input element vectors generated through such processes described above may be accumulated, and a cumulative vector corresponding to the sliding region 910 may be generated. The process may be repeated for each of sliding regions, and cumulative vectors respectively corresponding to the sliding regions may be generated. The generated cumulative vectors may form a cumulative plane. Here, a cumulative plane and a cumulative vector may refer to different forms of cumulative data, and may collectively be referred to as cumulative data.
Referring to
In the example of
When cumulative vectors are repeatedly stored in the output region 1011 based on the number of input channels (e.g., the number of accumulations is one less than the number of the input channels), an output element vector corresponding to the output region 1011 is determined. When such processing for the output region 1011 is additionally performed on the remaining output regions in the output plane 1010, the output plane 1010 may then be determined. Thus, a cumulative convolution operation may be implemented through SIMD.
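The register-level behavior is only emulated below with fixed-length slices; the SIMD lane count and the names are assumptions for illustration. For each output region of SIMD-unit length, the input element vectors corresponding to every weight element of every input channel are multiplied by the respective scalar weight elements and accumulated into the same output element vector.

```python
import numpy as np

SIMD_UNIT = 4               # assumed number of SIMD lanes (e.g., four 32-bit lanes)
K, C = 3, 2                 # weight plane size and number of input channels
W, H = 8, 6                 # output plane width and height (W divisible by SIMD_UNIT)

padded_inputs = np.random.rand(C, H + 2, W + 2).astype(np.float32)
weight_planes = np.random.rand(C, K, K).astype(np.float32)
output_plane = np.zeros((H, W), dtype=np.float32)

for y in range(H):
    for x in range(0, W, SIMD_UNIT):                     # one output region per SIMD unit
        acc = np.zeros(SIMD_UNIT, dtype=np.float32)      # accumulator "register"
        for c in range(C):                               # accumulate over input channels
            for a in range(K):
                for b in range(K):
                    # Input element vector corresponding to weight element (a, b).
                    vec = padded_inputs[c, y + a, x + b:x + b + SIMD_UNIT]
                    acc += weight_planes[c, a, b] * vec  # weighted input element vector
        output_plane[y, x:x + SIMD_UNIT] = acc           # store the output element vector
```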
In an example, a convolution operation may be performed for each input plane as a unit, or for each response region in an input plane as a unit, and thus may effectively process zero-skipping through software, or a combination of software and hardware.
Referring to
Referring to the example illustrated in
Referring to
In operation 1220, an operation type corresponding to the determined number of non-zero weight elements may be selected from among operation types, and data corresponding to the non-zero weight elements is loaded into a register. In the example of
Data to be loaded to a register may correspond to at least a portion of an input plane. For example, an input element vector corresponding to a non-zero weight element may be loaded to the register. An offset corresponding to the input element vector may be determined based on an index of the non-zero weight element, and the input element vector may be extracted from the input plane based on the determined offset and stored in the register. In the example of
A preset type of operation may include a type of operation that performs a MAC operation between non-zero weight elements and data loaded to a register, and generates cumulative data. The data may be loaded to the register based on the number of the non-zero weight elements and an offset. For example, a MAC operation between a non-zero weight element and an input element vector stored in a register may be performed. In the example of
In operation 1230, a source code corresponding to the selected operation type may be executed. In an example, a source code corresponding to each of operation types 0 through 9, as only examples, may be stored in a memory code area, and the source code corresponding to the selected operation type may be loaded from the memory code area and executed. In the example of
Referring to
In operation 1302, an input plane ic is obtained. In operation 1303, a weight plane wcd is obtained. In an example, “c” denotes an index of an input channel, and may be a natural number, for example, 1 through C, with an initial value of 1. Input planes and weight planes may respectively correspond to input channels. In an example, an input plane i1 and a weight plane w1d correspond to a first input channel, and an input plane i2 and a weight plane w2d correspond to a second input channel.
In operation 1306, a MAC operation is performed. In an example, cumulative data is generated by accumulating multiplication results from multiplications between at least a portion of input elements in the input plane ic and at least a portion of weight elements in the weight plane wcd. In this example, input element vectors corresponding to at least a portion of the weight elements are extracted from the input plane ic, weighted input element vectors corresponding to multiplication results from multiplications between the extracted input element vectors and at least a portion of the weight elements are generated, and then the cumulative data is generated by accumulating the weighted input element vectors. In this example, offsets corresponding to the input element vectors may be determined based on indices of at least a portion of the weight elements, and the input element vectors are extracted from the input plane ic based on the determined offsets.
In operations 1304 and 1305, zero-skipping is performed. Specifically, in operation 1304, zero encoding is performed. In operation 1305, an operation type is selected. When the number of non-zero weight elements is determined through the zero encoding, an operation type corresponding to the determined number of non-zero weight elements is selected, and input elements corresponding to the non-zero weight elements are loaded to a register. For example, input element vectors corresponding to the non-zero weight elements are loaded to the register.
When the operation type is selected, operations based on a preset operation type may be performed. For example, the operations may include multiplying non-zero weight elements and the input elements, or the input element vectors, in the register, and generating the cumulative data, for example, a cumulative vector, by accumulating results of the multiplying. Thus, when the cumulative data is generated, an operation of multiplying zero weight elements and the input elements may be skipped.
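A software-level sketch of this zero-skipping dispatch is given below; the operation-type table and all names are assumptions for illustration. The non-zero weight elements of a weight plane are counted (zero encoding), a routine corresponding to that count is selected, and only the input element vectors matching the non-zero weight elements are loaded and accumulated.

```python
import numpy as np

K = 3                       # weight plane width and height
W, H = 8, 6                 # unpadded input plane / output plane size

def make_mac_routine(n):
    """Hypothetical 'operation type n': a MAC routine for exactly n non-zero
    weight elements (all counts share one generic body in this sketch)."""
    def run(flat_padded, nonzero_weights, offsets):
        cumulative = np.zeros(H * W, dtype=np.float32)
        for w, off in zip(nonzero_weights, offsets):     # only non-zero weight elements
            region = np.concatenate([flat_padded[off + r * (W + 2): off + r * (W + 2) + W]
                                     for r in range(H)])
            cumulative += w * region
        return cumulative.reshape(H, W)
    return run

# One routine per possible count of non-zero weight elements (types 0 .. K*K).
OPERATION_TYPES = {n: make_mac_routine(n) for n in range(K * K + 1)}

weight_plane = np.array([[0.0, 1.5, 0.0],
                         [2.0, 0.0, 0.0],
                         [0.0, 0.0, -1.0]], dtype=np.float32)
padded = np.random.rand(H + 2, W + 2).astype(np.float32)

# Zero encoding: indices (i - 1) of the non-zero weight elements.
nonzero_idx = np.flatnonzero(weight_plane.ravel())
offsets = [(W + 2) * (i // K) + (i % K) for i in nonzero_idx]
routine = OPERATION_TYPES[len(nonzero_idx)]              # select the operation type by count

cumulative_plane = routine(padded.ravel(), weight_plane.ravel()[nonzero_idx], offsets)
```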
In operation 1307, an output is accumulated. For example, cumulative data corresponding to an output of a MAC operation may be accumulated. For example, when a first repetition (c=1) is performed, an input plane i1 is obtained and a weight plane w1d is obtained, and first cumulative data is generated by accumulating multiplication results from multiplications between at least a portion of first input elements in the obtained input plane i1 and at least a portion of first weight elements in the obtained weight plane w1d. When a second repetition (c=2) is performed, an input plane i2 is obtained and a weight plane w2d is obtained, and second cumulative data is generated by accumulating multiplication results from multiplications between at least a portion of second input elements in the obtained input plane i2 and at least a portion of second weight elements in the obtained weight plane w2d. In this example, the generated first cumulative data and second cumulative data are accumulated. When a Cth repetition (c=C) is performed, an output plane is generated based on a sum of cumulative data for each input channel.
In operation 1308, c and C are compared. When c and C are different, for example, when c is less than C, c is increased by 1 in operation 1309, and operation 1302 is performed. When c is equal to C, d and D are compared in operation 1310. When d and D are different, for example, when d is less than D, d is increased by 1 in operation 1311 and operation 1301 is performed. A convolution operation may be performed on all input channels while an output channel is set or fixed through operations 1308 and 1309, and the convolution operation may be performed on all output channels by changing the output channel through operations 1310 and 1311.
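Pulling the flow of operations 1301 through 1311 together, a simplified reference sketch might look as follows (names, shapes, and the padding choice are assumptions; SIMD registers and operation-type dispatch are omitted): for each output channel d, cumulative data from all C input channels is accumulated into one output plane while zero weight elements are skipped.

```python
import numpy as np

def cumulative_convolution(input_planes, weight_kernels):
    """Sketch of the cumulative convolution: input_planes has shape (C, H, W),
    weight_kernels has shape (D, C, K, K); returns an output feature map of
    shape (D, H, W), accumulating per-channel MAC results and skipping zeros."""
    C, H, W = input_planes.shape
    D, _, K, _ = weight_kernels.shape
    pad = K // 2
    padded = np.pad(input_planes, ((0, 0), (pad, pad), (pad, pad)))
    output = np.zeros((D, H, W), dtype=input_planes.dtype)

    for d in range(D):                       # operation 1301: set output channel d
        for c in range(C):                   # operations 1302/1303: input/weight plane
            weight_plane = weight_kernels[d, c]
            for a in range(K):
                for b in range(K):
                    w = weight_plane[a, b]
                    if w == 0:               # zero-skipping (operations 1304/1305)
                        continue
                    # Operations 1306/1307: MAC and accumulation into the output.
                    output[d] += w * padded[c, a:a + H, b:b + W]
    return output

# Usage example with assumed dimensions.
inputs = np.random.rand(3, 6, 8).astype(np.float32)      # C=3, H=6, W=8
kernels = np.random.rand(2, 3, 3, 3).astype(np.float32)   # D=2, C=3, K=3
out = cumulative_convolution(inputs, kernels)
print(out.shape)   # (2, 6, 8)
```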
Referring to
The data processing apparatus 1500 may receive input data, and process an operation of a neural network associated with the received input data. The operation of the neural network may include, as non-limiting examples, an object recognition operation and a user verification operation. The data processing apparatus 1500 may perform one or more of the operations or methods described herein in relation to processing by the neural network, and provide a user with a result of the processing by the neural network. The data processing apparatus 1500 may perform a cumulative convolution operation as described above while processing the operation of the neural network.
Referring to
The processor 1510 may execute instructions to perform one or more of the operations or methods described above with reference to
The electronic apparatus 1600 may receive input data, and process an operation of a neural network associated with the received input data. The operation of the neural network may include, as non-limiting examples, an object recognition operation and a user verification operation. The electronic apparatus 1600 may perform a cumulative convolution operation as described above while processing the operation of the neural network. The electronic apparatus 1600 may include the processing apparatus described above with reference to
Referring to
The one or more processors 1610 may execute a function and an instruction in the electronic apparatus 1600. For example, the processor 1610 may process instructions stored in the memory 1620 or the storage device 1640. The processor 1610 may perform one or more of the operations or methods described above with reference to
The memory 1620 may store information to be used to process the operation of the neural network. The memory 1620 may include a computer-readable storage medium or a computer-readable storage device. The memory 1620 may store instructions to be executed by the processor 1610, and store related information while software or an application is being executed by the electronic apparatus 1600.
The camera 1630 may capture a still image, a moving or video image, or both images. The camera 1630 may capture an image of a facial region to be input by a user for facial verification or recognition. The camera 1630 may also provide a three-dimensional (3D) image including depth information of objects.
The storage device 1640 may include a computer-readable storage medium or a computer-readable storage device. The storage device 1640 may store a greater amount of information for a longer period of time, compared to the memory 1620. The storage device 1640 may include, for example, a magnetic hard disk, an optical disc, a flash memory, a floppy disk, and other types of nonvolatile memory that are well-known in the related technical field.
The input device 1650 may receive an input from a user through a traditional input method, including, as non-limiting examples, a keyboard and a mouse, and a new input method, for example, a touch input, a voice input, and an image input. The input device 1650 may include, for example, a keyboard, a mouse, a touchscreen, a microphone, and other devices that may detect the input from the user and transmit the detected input to the electronic apparatus 1600.
The output device 1660 may provide an output of the electronic apparatus 1600 to a user through a visual, auditory, or tactile channel. The output device 1660 may include, for example, a display, a touchscreen, a speaker, a vibration generator, and other devices that may provide the output to the user. The network interface 1670 may communicate with an external device through a wired or wireless network.
The neural network apparatuses, data processing apparatuses, the electronic apparatus, data processing apparatus 100, processor 1510, memory 1520, processor 1610, memory 1620, camera 1630, storage device 1640, input device 1650, output device 1660, network interface 1670, and other devices, and other components described herein with respect to
The methods illustrated in
Instructions or software to control computing hardware, for example, one or more processors or computers, to implement the hardware components and perform the methods as described above may be written as computer programs, code segments, instructions or any combination thereof, for individually or collectively instructing or configuring the one or more processors or computers to operate as a machine or special-purpose computer to perform the operations that are performed by the hardware components and the methods as described above. In one example, the instructions or software include machine code that is directly executed by the one or more processors or computers, such as machine code produced by a compiler. In another example, the instructions or software include higher-level code that is executed by the one or more processors or computers using an interpreter. The instructions or software may be written using any programming language based on the block diagrams and the flow charts illustrated in the drawings and the corresponding descriptions in the specification, which disclose algorithms for performing the operations that are performed by the hardware components and the methods as described above.
The instructions or software to control computing hardware, for example, one or more processors or computers, to implement the hardware components and perform the methods as described above, and any associated data, data files, and data structures, may be recorded, stored, or fixed in or on one or more non-transitory computer-readable storage media. Examples of a non-transitory computer-readable storage medium include read-only memory (ROM), random-access programmable read only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random access memory (DRAM), static random access memory (SRAM), flash memory, non-volatile memory, CD-ROMs, CD-Rs, CD+Rs, CD-RWs, CD+RWs, DVD-ROMs, DVD-Rs, DVD+Rs, DVD-RWs, DVD+RWs, DVD-RAMs, BD-ROMs, BD-Rs, BD-R LTHs, BD-REs, blu-ray or optical disk storage, hard disk drive (HDD), solid state drive (SSD), flash memory, a card type memory such as multimedia card micro or a card (for example, secure digital (SD) or extreme digital (XD)), magnetic tapes, floppy disks, magneto-optical data storage devices, optical data storage devices, hard disks, solid-state disks, and any other device that is configured to store the instructions or software and any associated data, data files, and data structures in a non-transitory manner and provide the instructions or software and any associated data, data files, and data structures to one or more processors or computers so that the one or more processors or computers can execute the instructions. In one example, the instructions or software and any associated data, data files, and data structures are distributed over network-coupled computer systems so that the instructions and software and any associated data, data files, and data structures are stored, accessed, and executed in a distributed fashion by the one or more processors or computers.
While this disclosure includes specific examples, it will be apparent, after an understanding of the disclosed application, that various changes in form and details may be made in these examples without departing from the spirit and scope of the claims and their equivalents. The examples described herein are to be considered in a descriptive sense only, and not for purposes of limitation. Descriptions of features or aspects in each example are to be considered as being applicable to similar features or aspects in other examples. Suitable results may be achieved if the described techniques are performed in a different order, and/or if components in a described system, architecture, device, or circuit are combined in a different manner, and/or replaced or supplemented by other components or their equivalents. Therefore, the scope of the disclosure is defined not by the detailed description, but by the claims and their equivalents, and all variations within the scope of the claims and their equivalents are to be construed as being included in the disclosure.