This application claims the benefit under 35 USC § 119(a) of Korean Patent Application No. 10-2020-0102600, filed on Aug. 14, 2020, in the Korean Intellectual Property Office, the entire disclosure of which is incorporated herein by reference for all purposes.
The following description relates to a method and apparatus with convolution operation processing based on redundancy reduction.
Technical automation of a recognition process may be implemented using, for example, a neural network model implemented by a processor as a special calculation structure, which may provide a computationally intuitive mapping between an input pattern and an output pattern after considerable training. An ability to be trained to generate such a mapping may be referred to as a “training ability of a neural network.” Moreover, owing to such specialized training, a trained neural network may have a generalization ability to generate a relatively accurate output for an input pattern on which it has not been trained.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
In one general aspect, a processor-implemented neural network layer convolution operation method includes: obtaining a first input plane of an input feature map and a first weight plane of a weight kernel; generating base planes, corresponding to an intermediate operation result of the first input plane, based on at least a portion of available weight values of the weight kernel; generating first accumulation data based on at least one plane corresponding to weight element values of the first weight plane among the first input plane and the base planes; and generating a first output plane of an output feature map based on the first accumulation data.
The generating of the first accumulation data may include: determining a first target plane corresponding to a weight value of a first weight element of the first weight plane among the first input plane and the base planes; determining a first target region in the first target plane based on an offset of the first weight element; and generating the first accumulation data by performing an accumulation operation based on target elements of the first target region.
The determining of the first target region may include determining the first target region using a first pointer pointing to the first target region among pointers pointing to different regions of the first target plane based on the offset of the first weight element.
Each of the base planes may correspond to a respective available weight value among the portion of available weight values, and the determining of the first target plane may include determining, as the first target plane, a base plane corresponding to an available weight value equal to an absolute value of the weight value of the first weight element.
The generating of the first accumulation data may further include: determining a second target plane corresponding to a weight value of a second weight element of the first weight plane among the first input plane and the base planes; and determining a second target region in the second target plane based on an offset of the second weight element, and the performing of the accumulation operation may include accumulating target elements of the first target region and corresponding target elements of the second target region.
The first target region may correspond to one-dimensional (1D) vector data of a single-instruction multiple-data (SIMD) operation.
The offset of the first weight element may correspond to a position of the first weight element in the first weight plane.
A number of the available weight values may be determined based on a bit precision of the weight kernel.
A bit precision of the weight kernel may be less than or equal to 3 bits.
The intermediate operation result of the first input plane may correspond to a multiplication result of the first input plane, and the generating of the base planes may include generating the base planes corresponding to the multiplication result through a shift operation and an addition operation instead of performing a multiplication operation.
The first input plane and the first weight plane may correspond to a first input channel among a plurality of input channels, and the first output plane may correspond to a first output channel among a plurality of output channels.
The method may include: generating second accumulation data based on a second input plane of the input feature map and a second weight plane of the weight kernel, wherein the generating of the first output plane may include generating the first output plane by accumulating the first accumulation data and the second accumulation data.
A non-transitory computer-readable storage medium may store instructions that, when executed by a processor, configure the processor to perform the method.
In another general aspect, a neural network layer convolution operation apparatus includes: a processor configured to: obtain a first input plane of an input feature map and a first weight plane of a weight kernel; generate base planes, corresponding to an intermediate operation result of the first input plane, based on at least a portion of available weight values of the weight kernel; generate first accumulation data based on at least one plane corresponding to weight element values of the first weight plane among the first input plane and the base planes; and generate a first output plane of an output feature map based on the first accumulation data.
For the generating of the first accumulation data, the processor may be configured to determine a first target plane corresponding to a weight value of a first weight element of the first weight plane among the first input plane and the base planes, determine a first target region in the first target plane based on an offset of the first weight element, and generate the first accumulation data by performing an accumulation operation based on target elements of the first target region.
The processor may be configured to determine the first target region by determining a first pointer pointing to the first target region among pointers pointing to different regions of the first target plane based on the offset of the first weight element.
A bit precision of the weight kernel may be less than or equal to 3 bits.
The intermediate operation result of the first input plane may correspond to a multiplication result for the first input plane, and the processor may be configured to generate the base planes corresponding to the multiplication result through a shift operation and an addition operation instead of performing a multiplication operation.
The apparatus may include a memory storing instructions that, when executed by the processor, configure the processor to perform the obtaining of the first input plane, the generating of the base planes, the generating of the first accumulation data, and the generating of the first output plane.
An electronic apparatus may include: the apparatus above and a camera configured to generate an input image based on detected visual information, wherein the apparatus above is a processor, and the input feature map may correspond to the input image.
In another general aspect, an electronic apparatus includes: a camera configured to generate an input image based on detected visual information; and a processor configured to obtain a first input plane of an input feature map corresponding to the input image and a first weight plane of a weight kernel, generate base planes, corresponding to an intermediate operation result of the first input plane, based on at least a portion of available weight values of the weight kernel, generate first accumulation data based on at least one plane corresponding to weight element values of the first weight plane among the first input plane and the base planes, and generate a first output plane of an output feature map based on the first accumulation data.
For the generating of the first accumulation data, the processor may be configured to determine a first target plane corresponding to a weight value of a first weight element of the first weight plane among the first input plane and the base planes, determine a first target region in the first target plane based on an offset of the first weight element, and generate the first accumulation data by performing an accumulation operation based on target elements of the first target region.
The processor may be configured to determine the first target region by determining a first pointer pointing to the first target region among pointers pointing to different regions of the first target plane based on the offset of the first weight element.
The intermediate operation result of the first input plane may correspond to a multiplication result for the first input plane, and the processor may be configured to generate the base planes corresponding to the multiplication result through a shift operation and an addition operation instead of performing a multiplication operation.
The processor may be configured to generate any one or any combination of a classification result, a detection result, a tracking result, an identification result, a recognition result, and an authentication result of the input image, based on the output feature map.
In another general aspect, a processor-implemented neural network layer convolution operation method includes: obtaining an input plane of an input feature map and a weight plane of a weight kernel; generating base planes corresponding to multiplication results between the input plane and available weight values of the weight kernel; determining target regions among the base planes and the input plane that correspond to weight elements of the weight plane, based on weight values of the weight elements and positions of the weight elements in the weight plane; and generating a portion of an output plane of an output feature map by accumulating the target regions.
The generating of the base planes may include generating a base plane for each absolute value of the available weight values greater than one.
Other features and aspects will be apparent from the following detailed description, the drawings, and the claims.
Throughout the drawings and the detailed description, unless otherwise described or provided, the same drawing reference numerals will be understood to refer to the same elements, features, and structures. The drawings may not be to scale, and the relative size, proportions, and depiction of elements in the drawings may be exaggerated for clarity, illustration, and convenience.
The following detailed description is provided to assist the reader in gaining a comprehensive understanding of the methods, apparatuses, and/or systems described herein. However, various changes, modifications, and equivalents of the methods, apparatuses, and/or systems described herein will be apparent after an understanding of the disclosure of this application. For example, the sequences of operations described herein are merely examples, and are not limited to those set forth herein, but may be changed as will be apparent after an understanding of the disclosure of this application, with the exception of operations necessarily occurring in a certain order. Also, descriptions of features that are known may be omitted for increased clarity and conciseness.
The features described herein may be embodied in different forms, and are not to be construed as being limited to the examples described herein. Rather, the examples described herein have been provided merely to illustrate some of the many possible ways of implementing the methods, apparatuses, and/or systems described herein that will be apparent after an understanding of the disclosure of this application.
Hereinafter, some example embodiments will be described in detail with reference to the accompanying drawings. Various modifications may be made to the example embodiments. Here, the example embodiments are not construed as limited to the disclosure and should be understood to include all changes, equivalents, and replacements within the idea and the technical scope of the disclosure.
The terminology used herein is for the purpose of describing example embodiments only and is not intended to be limiting of the present disclosure. As used herein, the singular forms (for example, “a”, “an”, and “the”) are intended to include the plural forms as well, unless the context clearly indicates otherwise. As used herein, the term “and/or” includes any one and any combination of any two or more of the associated listed items. As used herein, the terms “include,” “comprise,” and “have” specify the presence of stated features, integers, steps, operations, elements, components, numbers, and/or a combination thereof, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, numbers, and/or combinations thereof. The use of the term “may” herein with respect to an example or embodiment (for example, as to what an example or embodiment may include or implement) means that at least one example or embodiment exists where such a feature is included or implemented, while all examples are not limited thereto.
Unless otherwise defined herein, all terms used herein including technical or scientific terms have the same meanings as those generally understood by one of ordinary skill in the art to which this disclosure pertains consistent with and after an understanding of the present disclosure. Terms, such as those defined in commonly used dictionaries, should be construed to have meanings matching contextual meanings in the relevant art and the present disclosure, and are not to be construed as an ideal or excessively formal meaning unless otherwise defined herein.
When describing the example embodiments with reference to the accompanying drawings, like reference numerals refer to like constituent elements and a repeated description related thereto will be omitted. In the description of example embodiments, detailed description of well-known related structures or functions will be omitted when it is deemed that such description will cause ambiguous interpretation of the present disclosure.
Also, the terms “first,” “second,” “A,” “B,” “(a),” “(b),” and the like may be used herein to describe components according to example embodiments. Each of these terminologies is not used to define an essence, order or sequence of a corresponding component but used merely to distinguish the corresponding component from other component(s). Although terms of “first” or “second” are used herein to describe various members, components, regions, layers, or sections, these members, components, regions, layers, or sections are not to be limited by these terms. Rather, these terms are only used to distinguish one member, component, region, layer, or section from another member, component, region, layer, or section. Thus, a first member, component, region, layer, or section referred to in examples described herein may also be referred to as a second member, component, region, layer, or section without departing from the teachings of the examples.
Throughout the specification, when an element, such as a layer, region, or substrate, is described as being “on,” “connected to,” or “coupled to” another element, it may be directly “on,” “connected to,” or “coupled to” the other element, or there may be one or more other elements intervening therebetween. In contrast, when an element is described as being “directly on,” “directly connected to,” or “directly coupled to” another element, there can be no other elements intervening therebetween. Likewise, expressions, for example, “between” and “immediately between” and “adjacent to” and “immediately adjacent to” may also be construed as described in the foregoing.
A component having a common function with a component included in one example embodiment is described using a like name in another example embodiment. Unless otherwise described, description made in one example embodiment may be applicable to another example embodiment and detailed description within a redundant range is omitted.
The neural network 110 may correspond to a deep neural network (DNN) including a plurality of layers. The plurality of layers may include an input layer, at least one hidden layer, and an output layer. A first layer, a second layer and an n-th layer of
In the CNN, data input to each layer may be referred to as an “input feature map” and data output from each layer may be referred to as an “output feature map”. The input feature map and the output feature map may also be referred to as activation data. An output feature map of a layer may be, or may be used to generate, an input feature map of a subsequent layer. When a convolutional layer corresponds to an input layer, an input feature map of the input layer may correspond to input data. For example, the input data may be an input image or data resulting from an initial processing of the input image.
The neural network 110 may be trained based on deep learning, and may perform inference suitable for the purpose of training, by mapping input data and output data that are in a nonlinear relationship. The deep learning may be a machine learning scheme for solving an issue such as image or voice recognition from a big data set. The deep learning may be understood as a process of solving an optimization issue to find a point at which energy is minimized while training the neural network 110 based on prepared training data.
Through supervised or unsupervised learning of the deep learning, a structure of the neural network 110 or a weight corresponding to a model may be obtained or determined, and input data and output data may be mapped to each other through the weight. For example, when a width and a depth of the neural network 110 are sufficiently large, the neural network 110 may have a capacity large enough to implement an arbitrary function. When the neural network 110 is trained on a sufficiently large quantity of training data through an appropriate training process, an optimal performance may be achieved.
In the following description, the neural network 110 may be expressed as being “pre-trained”, where “pre-” may indicate a state before the neural network 110 is “started”. The “started” neural network 110 may indicate that the neural network 110 may be ready for inference. For example, “start” of the neural network 110 may include loading of the neural network 110 in a memory, or an input of input data for inference to the neural network 110 after the neural network 110 is loaded in the memory.
The processing apparatus 100 may perform a convolution operation between an input feature map of each convolutional layer and a weight kernel to process an operation associated with each convolutional layer, and may generate an output feature map based on an operation result of the convolution operation. To process an operation associated with the neural network 110, a plurality of operations including a multiplication and accumulation (MAC) operation may be processed. Also, a large amount of computing resources and time may be consumed to process an operation. The processing apparatus 100 of one or more embodiments may lighten the neural network 110 and perform high-speed operation processing, thereby reducing the above consumption of the computing resources and the time so that the neural network 110 may also be effectively implemented in a resource-limited environment, such as a sensor environment or an embedded environment (for example, a mobile terminal).
For a high-speed and low-power operation of the neural network 110, a low-bit precision may be applied to an activation and a weight. A relatively low bit precision may be assigned to the weight in comparison to the activation. A reduction in the bit precisions of both the activation and the weight may negatively influence a network performance (for example, an accuracy). To effectively maintain the network performance, the bit precision of the activation may be maintained while the bit precision of the weight is lowered. For example, the activation may be expressed in 8 bits and the weight may be expressed in 3 bits or less (for example, 3 bits or 2 bits).
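For illustration only, and not as a limitation of the examples described herein, the following sketch lists the weight values representable at a given bit precision, assuming a signed two's-complement coding (an assumption; the function name is hypothetical):

```python
# Illustration only: weight values representable at a signed bit precision S,
# assuming two's-complement coding (the description only states that the
# weight may be expressed in 3 bits or less).
def representable_weights(bits: int) -> list:
    return list(range(-(1 << (bits - 1)), 1 << (bits - 1)))

print(representable_weights(3))  # [-4, -3, -2, -1, 0, 1, 2, 3] -> 2^3 = 8 values
print(representable_weights(2))  # [-2, -1, 0, 1]               -> 2^2 = 4 values
```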
When the weight is expressed with a low-bit precision, a large number of redundant operations may occur during a convolution operation, which will be further described below. Expressing the weight with the low-bit precision may indicate that a small number of weight values may be expressed with the low-bit precision, and accordingly an operation based on the same weight value may be repeatedly performed. Due to a characteristic of a convolution operation that repeatedly performs a MAC operation, such redundancy may occupy a large portion of the convolution operation. In one or more embodiments, intermediate operation results to be used in the convolution operation may be secured in advance, and may be used to perform the convolution operation, and thus the processing apparatus 100 of one or more embodiments may reduce redundancy that may occur during the convolution operation.
The weight kernels 210, the input feature map 220, and the output feature map 230 may each include a plurality of planes. For example, each of the weight kernels 210 may include “C” weight planes, the input feature map 220 may include “C” input planes, and the output feature map 230 may include “D” output planes. In this example, the “C” weight planes and “C” input planes may respectively correspond to input channels, and the “D” output planes may respectively correspond to output channels (and the “D” output planes may correspond to the “D” number of weight kernels 210, for example). In other words, “C” may correspond to a number of input channels, and “D” may correspond to a number of output channels.
The output feature map 230 may be generated based on the convolution operation between the weight kernels 210 and the input feature map 220. Each of the weight kernels 210 may have a size of “K×K×C” and the input feature map 220 may have a size of “W×H×C”, and accordingly the convolution operation between the weight kernels 210 and the input feature map 220 may correspond to a three-dimensional (3D) convolution operation. For example, a first output plane 231 may be generated as an operation result of a 3D convolution operation between a first weight kernel 211 and the input feature map 220.
Each of the weight kernels 210 may be divided into a plurality of weight planes, and the input feature map 220 may be divided into a plurality of input planes, and accordingly the 3D convolution operation may be reconstructed with a combination of a plurality of 2D convolution operations. For example, the 3D convolution operation between the first weight kernel 211 and the input feature map 220 may be reconstructed by accumulating operation results of 2D convolution operations between weight planes of the first weight kernel 211 and input planes of the input feature map 220, to generate an output plane of the output feature map 230. In this example, a 2D convolution operation between a weight plane and an input plane corresponding to the same input channel, for example, a first weight plane 212 and a first input plane 221, may be performed.
For example, a first operation result may be generated through a convolution operation between the first weight plane 212 and the first input plane 221. When an operation result is expressed in a form of a two-dimensional (2D) plane, the operation result may be referred to as an “accumulation plane”. When the operation result is expressed in a form of a one-dimensional (1D) vector, the operation result may be referred to as an “accumulation vector”. Also, the accumulation plane and the accumulation vector may be collectively referred to as “accumulation data”.
Other accumulation planes may be generated through convolution operations between the other input planes of the input feature map 220 and the other weight planes of the first weight kernel 211. All “C” accumulation planes associated with the input feature map 220 and the first weight kernel 211 may be generated and accumulated, to generate the first output plane 231. Another output plane of the output feature map 230 may be generated through a convolution operation between another weight kernel among the weight kernels 210 and the input feature map 220. When convolution operations for all the weight kernels 210 are completed, the output feature map 230 may be completely generated.
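For reference, the decomposition described above may be sketched as follows, assuming “valid” padding and a stride of 1; the helper names and sizes are illustrative and are not the disclosed implementation:

```python
import numpy as np

def conv2d_valid(x, w):
    """Plain 2D cross-correlation with 'valid' padding (stride of 1 assumed)."""
    K = w.shape[0]
    H, W = x.shape
    out = np.zeros((H - K + 1, W - K + 1), dtype=np.int64)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + K, j:j + K] * w)
    return out

def output_plane_from_2d_convs(input_planes, weight_planes):
    """Accumulate "C" per-channel 2D convolution results into one output plane."""
    acc = None
    for x, w in zip(input_planes, weight_planes):  # one (input plane, weight plane) pair per input channel
        plane = conv2d_valid(x, w)                 # "accumulation plane" of this channel
        acc = plane if acc is None else acc + plane
    return acc

# Hypothetical sizes: C=2 input channels, K=3, H=W=5
rng = np.random.default_rng(0)
inputs = rng.integers(0, 8, size=(2, 5, 5))     # 8-bit-style activations
weights = rng.integers(-4, 4, size=(2, 3, 3))   # 3-bit-style weights in [-4, 3]
out_plane = output_plane_from_2d_convs(inputs, weights)
```

The sketch produces one output plane; repeating it for each of the “D” weight kernels would produce the complete output feature map.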
Referring to
Although the path 325 is abstractly shown in
In the example of
Referring to
When weight elements have the same weight value, a large quantity of redundant data may be generated between intermediate planes for the weight elements. For example, when the weight elements W1 and W2 have the same weight value as “2”, the intermediate planes 312 and 314 may be generated by multiplying input elements of the respective regions 311 and 313 in the same input plane 310 by the weight value of “2”. The regions 311 and 313 may overlap in a relatively wide area (for example, may overlap a substantially similar area of the input plane 310), and a multiplication by the same weight value may be performed, and accordingly a large quantity of redundant data may be generated between the intermediate planes 312 and 314. Weight values of weight elements may be briefly referred to as “weight element values”.
The weight elements W1 to W9 may be expressed with a predetermined bit precision. For example, the bit precision of the weight elements W1 to W9 may be indicated by S. In this example, the weight elements W1 to W9 may have “2^S” available weight values. When S decreases, a number of available weight values that may be expressed with S may decrease, and accordingly a probability that weight elements have the same value may increase. In this example, the processing apparatus 100 of one or more embodiments may reduce a redundancy of operations by recombining a convolution operation process. Depending on examples, multiplication results for the input plane 310 may be secured in advance as intermediate operation results, and a convolution operation may be performed based on the multiplication results, and thus the processing apparatus 100 of one or more embodiments may reduce redundancy that may occur during the convolution operation.
Depending on examples, to perform a convolution operation, multiplication results for an input plane 410 may be secured (for example, determined) in advance as intermediate operation results of the convolution operation. For example, based on at least a portion of the available weight values 420, the multiplication results for the input plane 410 may be secured in advance. The multiplication results secured in advance may be referred to as “base planes”, and a weight value used to derive a base plane may be referred to as a “base weight value”.
In an example, to minimize a number of operations used to secure multiplication results and a buffer space used to store the secured multiplication results, a minimum number of multiplication results may be secured. For example, zero skipping may be performed on a weight value of “0”, and the input plane 410 may be used for a weight value of “1” without a change. Thus, multiplication results for the weight values of “0” and “1” may not be separately secured. Also, in an example of weight values (for example, “−2” and “2”, “−3” and “3”, and/or “−4” and “4”) having the same absolute value, one of two multiplication results may be secured. Sign-related processing may be performed on the secured multiplication result, and accordingly the multiplication result on which the sign-related processing is performed may be used as the other multiplication result.
Thus, a multiplication operation for “2^S” available weight values may be covered through multiplication results secured based on “2^(S−1)−1” weight values. Accordingly, a first base plane 431, a second base plane 432 and a third base plane 433 corresponding to multiplication results for the input plane 410 may be generated based on base weight values of “2”, “3” and “4”, as shown in
Each base plane may correspond to a multiplication result between a base weight value and the input plane 410. For example, the first base plane 431 may correspond to a multiplication result between the base weight value of “2” and the input plane 410. Also, the second base plane 432 may correspond to a multiplication result between the base weight value of “3” and the input plane 410, and the third base plane 433 may correspond to a multiplication result between the base weight value of “4” and the input plane 410. In this example, each multiplication operation may correspond to an element-wise operation.
In an example, each multiplication result corresponding to a base plane may be generated through a shift operation and an addition operation, instead of performing a multiplication operation (for example, a direct multiplication operation). Since the multiplication operation requires a larger number of computations than the shift operation and the addition operation, the processing apparatus 100 of one or more embodiments may reduce an amount of computations used to secure a base plane by replacing the multiplication with the shift operation and the addition operation. For example, when the input plane 410 is denoted by I, the first base plane 431 may be generated through I<<1 corresponding to a shift operation. Also, the second base plane 432 may be generated through I+(I<<1) corresponding to a shift operation and an addition operation, and the third base plane 433 may be generated through I<<2 corresponding to a shift operation. In this example, each shift operation and each
When base planes are secured, the base planes may be stored in the buffer space and used for a convolution operation. For example, the first base plane 431 may be stored in a first buffer, the second base plane 432 may be stored in a second buffer, and the third base plane 433 may be stored in a third buffer.
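A minimal sketch, assuming integer activations and a 3-bit weight precision, of how the base planes for the base weight values of “2”, “3”, and “4” may be derived with shift and addition operations and kept in buffers (the dictionary-based buffering is an illustrative choice):

```python
import numpy as np

def build_base_planes(input_plane):
    """Base planes for base weight values 2, 3, and 4, derived with shifts and
    additions instead of multiplications (integer activations assumed)."""
    i = np.asarray(input_plane, dtype=np.int32)
    return {
        2: i << 1,        # 2*I via one left shift
        3: i + (i << 1),  # 3*I via one shift and one addition
        4: i << 2,        # 4*I via one left shift
    }                     # each entry plays the role of one buffer

# With a weight bit precision of S = 3, 2**(S - 1) - 1 = 3 base planes suffice.
S = 3
assert len(build_base_planes(np.arange(9).reshape(3, 3))) == 2 ** (S - 1) - 1
```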
At least one plane among the input plane 410 and the first base plane 431 through the third base plane 433 may be selectively used based on an actual weight value of a weight plane. In an example, when a weight value is “2”, at least a portion of a region (for example, a target region) may be extracted from the first base plane 431 stored in the first buffer, and may be used as intermediate data. In another example, when a weight value is “−3”, at least a portion of a region (for example, a target region) may be extracted from the second base plane 432 stored in the second buffer, and may be used as intermediate data. In this example, a sign of the intermediate data may be inverted, or a subtraction operation instead of an addition operation may be applied in a subsequent accumulation process. As described above, intermediate data may be generated using each base plane based on each weight value of the weight plane, and accumulation data may be generated by accumulating the intermediate data.
Referring to
In an example, for a multiplication operation between a weight element and the input plane 510, a target plane corresponding to a weight value of the weight element may be selected from the input plane 510 and the first base plane 520 through the third base plane 540, and a target point of the target plane may be determined based on an offset of the weight element. In the above process, a pointer may be determined to point to the target plane and the target point. For example, a pointer pointing to a predetermined target region may be selected from pointers pointing to different regions of a target plane based on an offset of a weight element. In an example, when a weight element has a weight value of “−2” and an offset of “5”, a target region may be pointed to by a pointer P1_5. For example, a target point may correspond to a start address of a target region. When target regions for all weight elements of a weight plane are determined through the above process, accumulation data for the weight plane may be completed through an accumulation operation of target elements of the target regions.
Also, the input plane 610 corresponding to a weight value of “1” of a fifth weight element of the weight plane 620 among the input plane 610 and the base planes related to the input plane 610 may be determined as a second target plane, and a second target region may be determined in the second target plane based on an offset “5” of the fifth weight element. The second target plane and the second target region may be pointed to by a pointer P0_5 for the fifth weight element.
Similarly to the first weight element and the fifth weight element, a pointer P2_6 for a sixth weight element may be derived, a pointer P1_7 for a seventh weight element may be derived, and a pointer P1_9 for a ninth weight element may be derived. Since the other weight elements have a weight value “0”, zero skipping may be performed on the other weight elements. The operation result 630 may include a pointer function f. The pointer function f may map a pointer and a target region.
When target regions are derived using the pointer function, accumulation data for the weight plane 620 may be completed through an accumulation operation of target elements of the target regions. The accumulation operation may correspond to an element-wise operation. Target elements of target regions corresponding to positive weight values, for example, “3”, “1” and “2”, may be accumulated based on an addition operation, and target elements of target regions corresponding to negative weight values, for example, “−3” and “−2”, may be accumulated based on a subtraction operation.
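The accumulation for one weight plane may be sketched as follows, reusing the build_base_planes sketch above; zero skipping, target-plane selection by absolute weight value, target-region selection by offset, and sign-dependent addition or subtraction are shown, with the pointer arithmetic abstracted into array views (an illustrative simplification):

```python
import numpy as np

def accumulate_weight_plane(input_plane, weight_plane, base_planes, acc=None):
    """Accumulation-based 2D convolution for one (input plane, weight plane) pair
    ('valid' padding and a stride of 1 assumed; names are illustrative)."""
    K = weight_plane.shape[0]
    H, W = input_plane.shape
    out_h, out_w = H - K + 1, W - K + 1
    if acc is None:
        acc = np.zeros((out_h, out_w), dtype=np.int64)
    for dy in range(K):
        for dx in range(K):
            w = int(weight_plane[dy, dx])
            if w == 0:                                     # zero skipping
                continue
            plane = input_plane if abs(w) == 1 else base_planes[abs(w)]
            region = plane[dy:dy + out_h, dx:dx + out_w]   # target region selected by the offset (dy, dx)
            acc = acc + region if w > 0 else acc - region  # addition or subtraction by sign
    return acc
```

Under these assumptions, the result equals the plain per-channel convolution sketched earlier (conv2d_valid), since each nonzero weight element contributes its signed multiple of the correspondingly shifted input region.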
Referring to
Referring to
In addition, a fourth target region 851 of the third base plane 850 pointed to by a pointer P3_3 may be determined corresponding to a third weight element “−4”, and a fifth target region 852 of the third base plane 850 pointed to by a pointer P3_7 may be determined corresponding to a seventh weight element “−4”. Target elements of the fourth target region 851 and the fifth target region 852 may be stored in registers r4 and r5, respectively. The target elements of each of the first target region 841, the second target region 825, the third target region 831, the fourth target region 851 and the fifth target region 852 may correspond to 1D vector data.
An accumulation operation based on the target elements of each of the first target region 841, the second target region 825, the third target region 831, the fourth target region 851 and the fifth target region 852 may be performed. For example, the target elements stored in the registers r1 through r3 may be accumulated in a register r6 based on an addition operation. Also, the target elements stored in the registers r4 and r5 may be accumulated in the register r6 based on a subtraction operation. When accumulation operations associated with the registers r1 through r5 are completed, the accumulation vector 801 corresponding to an operation result of the accumulation-based convolution operation between the first input region 821 and the weight plane 810 of
When the first output region 861 of the output plane 860 includes a prestored accumulation vector, the accumulation vector may be loaded into the register r6, and the target elements in the registers r1 through r5 may be accumulated in the register r6. For example, when a convolution operation for a weight plane other than the weight plane 810 is previously performed for an output channel, an accumulation vector may be previously stored in the first output region 861. In an example, when a convolution operation for the weight plane 810 of
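The register-level flavor of this accumulation may be sketched as follows; r_add, r_sub, and prestored stand in for the 1D target-region vectors held in the registers r1 through r5 and for an accumulation vector already stored in the output region (hypothetical names):

```python
import numpy as np

def accumulate_output_row(r_add, r_sub, prestored=None):
    """Accumulate 1D target-region vectors into one output row (register r6 analogue)."""
    r6 = np.zeros_like(r_add[0], dtype=np.int64) if prestored is None else prestored.astype(np.int64)
    for v in r_add:   # target regions of positive weight values: addition
        r6 = r6 + v
    for v in r_sub:   # target regions of negative weight values: subtraction
        r6 = r6 - v
    return r6         # written back to the output region
```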
The processing apparatus obtains a weight plane wcd in operation 930 and determines pointers in operation 940. For example, the processing apparatus may determine a first target plane corresponding to a weight value of a first weight element of the weight plane wcd among the c-th input plane ic and the base planes bc and may determine a first target region in the first target plane based on an offset of the first weight element. The processing apparatus may determine a pointer based on an identifier of the first target plane and an identifier (for example, an offset identifier) of the first target region. Thus, the processing apparatus may determine a pointer of each of weight elements of the weight plane wcd. For example, zero skipping may be applied to a weight element having a value of “0”.
In operation 950, the processing apparatus performs an accumulation operation. For example, an element with a positive weight value may be accumulated through an addition operation, and an element with a negative weight value may be accumulated through a subtraction operation. In operation 960, the processing apparatus accumulates outputs. For example, the processing apparatus may accumulate accumulation data accumulated in operation 950 in a corresponding region of an output plane.
In operation 970, the processing apparatus compares d and D. When d and D are different, that is, when a weight kernel for which operations on the c-th input plane ic and the base planes bc have not yet been completed remains among the “D” weight kernels, d may be increased by “1” and operation 930 may be performed. When d and D are the same, that is, when operations on the c-th input plane ic and the base planes bc for all the “D” weight kernels are completed, an accumulation process of the c-th input plane ic may be terminated. In operation 980, the processing apparatus compares c and C. When c and C are different, that is, when the accumulation process for all input channels is not yet completed, c may be increased by “1” and operations 910 and 930 may be performed. When c and C are the same, that is, when the accumulation for all the input channels is completed, an accumulation convolution operation of a corresponding layer may be terminated, which may indicate that an output feature map of the layer is completed.
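Tying the operations together, the loop structure described above (base planes built once per input channel and reused across all “D” weight kernels, with results accumulated per output channel) may be sketched as follows, reusing the build_base_planes and accumulate_weight_plane sketches above:

```python
import numpy as np

def accumulation_convolution_layer(input_planes, weight_kernels):
    """input_planes: C planes of shape (H, W); weight_kernels: D kernels, each a
    list of C weight planes of shape (K, K). Returns D output planes."""
    C, D = len(input_planes), len(weight_kernels)
    K = weight_kernels[0][0].shape[0]
    H, W = input_planes[0].shape
    outputs = [np.zeros((H - K + 1, W - K + 1), dtype=np.int64) for _ in range(D)]
    for c in range(C):                       # per input channel (operation 910): build base planes once
        bases = build_base_planes(input_planes[c])
        for d in range(D):                   # per weight kernel (operations 930-960): accumulate into output d
            outputs[d] = accumulate_weight_plane(
                input_planes[c], weight_kernels[d][c], bases, acc=outputs[d])
    return outputs                           # the output feature map of the layer
```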
The processor 1110 may execute instructions to perform at least one or all of the operations described above with reference to
The electronic apparatus 1200 includes a processor 1210 (e.g., one or more processors), a memory 1220 (e.g., one or more memories), a camera 1230, a storage device 1240, an input device 1250, an output device 1260, and a network interface 1270. The processor 1210, the memory 1220, the camera 1230, the storage device 1240, the input device 1250, the output device 1260, and the network interface 1270 may communicate with each other via a communication bus 1280. For example, the electronic apparatus 1200 may be implemented as at least a portion of, for example, a mobile device such as a mobile phone, a smartphone, a personal digital assistant (PDA), a netbook, a tablet computer or a laptop computer, a wearable device such as a smartwatch, a smart band or smart glasses, a computing device such as a desktop or a server, home appliances such as a television (TV), a smart TV or a refrigerator, a security device such as a door lock, or a vehicle such as a smart vehicle.
The processor 1210 may execute instructions and functions in the electronic apparatus 1200. For example, the processor 1210 may process instructions stored in the memory 1220 or the storage device 1240. The processor 1210 may perform at least one of the operations described above with reference to
The memory 1220 may store data for processing of a convolution operation. The memory 1220 may include a non-transitory computer-readable storage medium or a non-transitory computer-readable storage device. The memory 1220 may store instructions that are to be executed by the processor 1210, and also store information associated with software and/or applications when the software and/or applications are being executed by the electronic apparatus 1200.
The camera 1230 may detect visual information and may generate an input image, for example, a photo and/or a video, based on the detected visual information. For example, the camera 1230 may generate a user image including a user (for example, a face). In an example, the camera 1230 may provide a three-dimensional (3D) image including depth information associated with objects.
The storage device 1240 may include a non-transitory computer-readable storage medium or a non-transitory computer-readable storage device. In an example, the storage device 1240 may store a greater amount of information than that of the memory 1220 for a relatively long period of time. For example, the storage device 1240 may include magnetic hard disks, optical disks, flash memories, floppy disks, or other forms of non-volatile memories known in the art.
The input device 1250 may receive an input from a user through a traditional input scheme using a keyboard and a mouse, and through a new input scheme such as a touch input, a voice input and an image input. The input device 1250 may include, for example, a keyboard, a mouse, a touch screen, a microphone, or other devices configured to detect an input from a user and transmit the detected input to the electronic apparatus 1200.
The output device 1260 may provide a user with an output of the electronic apparatus 1200 through a visual channel, an auditory channel, or a tactile channel. The output device 1260 may include, for example, a display, a touchscreen, a speaker, a vibration generator, or any other device configured to provide a user with the output. The network interface 1270 may communicate with an external device via a wired or wireless network.
The processing apparatuses, processors, memories, electronic apparatuses, cameras, storage devices, input devices, output devices, network interfaces, communication buses, processing apparatus 100, processing apparatus 1100, processor 1110, memory 1120, electronic apparatus 1200, processor 1210, memory 1220, camera 1230, storage device 1240, input device 1250, output device 1260, network interface 1270, communication bus 1280, apparatuses, units, modules, devices, and other components described herein with respect to
The methods illustrated in
Instructions or software to control computing hardware, for example, one or more processors or computers, to implement the hardware components and perform the methods as described above may be written as computer programs, code segments, instructions or any combination thereof, for individually or collectively instructing or configuring the one or more processors or computers to operate as a machine or special-purpose computer to perform the operations that are performed by the hardware components and the methods as described above. In one example, the instructions or software include machine code that is directly executed by the one or more processors or computers, such as machine code produced by a compiler. In another example, the instructions or software include higher-level code that is executed by the one or more processors or computers using an interpreter. The instructions or software may be written using any programming language based on the block diagrams and the flow charts illustrated in the drawings and the corresponding descriptions used herein, which disclose algorithms for performing the operations that are performed by the hardware components and the methods as described above.
The instructions or software to control computing hardware, for example, one or more processors or computers, to implement the hardware components and perform the methods as described above, and any associated data, data files, and data structures, may be recorded, stored, or fixed in or on one or more non-transitory computer-readable storage media. Examples of a non-transitory computer-readable storage medium include read-only memory (ROM), random-access programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random access memory (DRAM), static random access memory (SRAM), flash memory, non-volatile memory, CD-ROMs, CD-Rs, CD+Rs, CD-RWs, CD+RWs, DVD-ROMs, DVD-Rs, DVD+Rs, DVD-RWs, DVD+RWs, DVD-RAMs, BD-ROMs, BD-Rs, BD-R LTHs, BD-REs, Blu-ray or optical disk storage, hard disk drive (HDD), solid state drive (SSD), a card type memory such as multimedia card micro or a card (for example, secure digital (SD) or extreme digital (XD)), magnetic tapes, floppy disks, magneto-optical data storage devices, optical data storage devices, hard disks, solid-state disks, and any other device that is configured to store the instructions or software and any associated data, data files, and data structures in a non-transitory manner and provide the instructions or software and any associated data, data files, and data structures to one or more processors or computers so that the one or more processors or computers can execute the instructions. In one example, the instructions or software and any associated data, data files, and data structures are distributed over network-coupled computer systems so that the instructions and software and any associated data, data files, and data structures are stored, accessed, and executed in a distributed fashion by the one or more processors or computers.
While this disclosure includes specific examples, it will be apparent after an understanding of the disclosure of this application that various changes in form and details may be made in these examples without departing from the spirit and scope of the claims and their equivalents. The examples described herein are to be considered in a descriptive sense only, and not for purposes of limitation. Descriptions of features or aspects in each example are to be considered as being applicable to similar features or aspects in other examples. Suitable results may be achieved if the described techniques are performed in a different order, and/or if components in a described system, architecture, device, or circuit are combined in a different manner, and/or replaced or supplemented by other components or their equivalents.