The disclosure relates to an electronic device and a method for controlling the electronic device, and more particularly, to an electronic device capable of efficiently performing a convolution operation and a method for controlling the electronic device.
Recently, with the development of the Artificial Intelligence (AI) field, the development of neural network accelerators and deep learning chipsets for efficiently implementing and executing AI functions has accelerated.
In the case of a neural network accelerator that performs convolution operations, there is a need for a technology that enables a single neural network accelerator to efficiently process both a three-dimensional (3D) convolution operation and a depthwise convolution operation.
However, in the case of a related-art neural network accelerator having a hardware structure for parallel processing in an input channel direction, all operators may be used when a 3D convolution operation is performed, whereas when a depthwise convolution operation is performed, the operators may not be efficiently utilized even though a relatively small amount of computation is required compared to the 3D convolution operation.
An aspect of the disclosure is to provide an electronic device capable of efficiently performing a convolution operation by using a parallel hardware structure of a neural network accelerator, and a method for controlling the electronic device.
According to an aspect of the disclosure, an electronic device includes: a memory configured to store three-dimensional input data including (i) a plurality of input values divided based on a plurality of channels, (ii) first kernel information on a kernel including a plurality of weights for each of the plurality of channels, and (iii) second kernel information generated by converting the plurality of weights configured in a two-dimensional matrix form for each of the plurality of channels to a three-dimensional matrix form. A processor includes a plurality of multiplication modules corresponding to the plurality of channels. The processor is configured to perform a convolution operation based on the plurality of input values and the plurality of weights through the plurality of multiplication modules. The processor is further configured to: based on the convolution operation being a depthwise convolution operation, control an input selection module to (a) configure the plurality of input values to correspond to a first channel among the plurality of channels and (b) input the plurality of input values to two or more multiplication modules among the plurality of multiplication modules, input a first set of weights corresponding to the first channel, one by one, to the two or more multiplication modules based on the second kernel information, obtain a plurality of intermediate values based on each of the multiplication operation results by performing a multiplication operation with each of the plurality of weights for each of the plurality of input values through the two or more multiplication modules, and obtain a plurality of output values based on each of a summed result by summing intermediate values respectively corresponding to locations of the kernel from among the plurality of intermediate values through a first intermediate value accumulation module.
Each of the plurality of input values corresponding to the first channel is input to the input selection module in each preset cycle, and the input selection module is configured to transmit each of the plurality of input values to the two or more multiplication modules in each preset cycle.
The two or more multiplication modules correspond to two or more channels including the first channel and at least one channel adjacent to the first channel, and a number of the two or more multiplication modules corresponds to a number of the plurality of weights included in the first kernel information.
The kernel is a two-dimensional kernel, and the processor further includes a buffer storing intermediate values corresponding to a row of the kernel among the plurality of intermediate values, and the processor obtains the plurality of output values by summing intermediate values corresponding to each of locations of the kernel among the intermediate values stored in the buffer through the first intermediate value accumulation module.
The processor is further configured to obtain the plurality of output values by performing the convolution operation by using the two or more multiplication modules corresponding to the number of the plurality of weights included in the kernel in parallel.
The processor is further configured to: based on the convolution operation being a three-dimensional convolution operation, control the input selection module to bypass the input values that are input to the input selection module to the plurality of multiplication modules, and input a second set of weights corresponding to each of the plurality of multiplication modules to the plurality of multiplication modules based on the first kernel information.
The processor further includes a second intermediate value accumulation module configured to sum intermediate values for each of the plurality of channels obtained through the plurality of multiplication modules.
According to another aspect of the disclosure, a method of controlling an electronic device includes: performing a convolution operation, through a plurality of multiplication modules corresponding to a plurality of channels, based on three-dimensional input data including (i) a plurality of input values divided based on the plurality of channels, (ii) first kernel information on a kernel including a plurality of weights for each of the plurality of channels, and (iii) second kernel information generated by converting the plurality of weights configured in a two-dimensional matrix form for each of the plurality of channels to a three-dimensional matrix form. Based on the convolution operation being a depthwise convolution operation, the method further includes controlling an input selection module to (a) configure a plurality of input values corresponding to a first channel among the plurality of channels and (b) input the plurality of input values to two or more multiplication modules among the plurality of multiplication modules; inputting a first set of weights corresponding to the first channel, one by one, to the two or more multiplication modules based on the second kernel information; obtaining a plurality of intermediate values based on each of the multiplication operation results by performing a multiplication operation with each of the plurality of weights for each of the plurality of input values through the two or more multiplication modules; and obtaining a plurality of output values based on each of a summed result by summing intermediate values respectively corresponding to locations of the kernel from among the plurality of intermediate values through a first intermediate value accumulation module.
Each of the plurality of input values corresponding to the first channel is input to the input selection module in each preset cycle, and the input selection module transmits each of the input values to the two or more multiplication modules in each preset cycle.
The two or more multiplication modules correspond to two or more channels including the first channel and at least one channel adjacent to the first channel, and a number of the two or more multiplication modules corresponds to a number of the plurality of weights included in the first kernel information.
The method further includes obtaining the plurality of output values by summing intermediate values corresponding to each of locations of the kernel among intermediate values stored in a buffer through the first intermediate value accumulation module. The buffer is configured to store intermediate values corresponding to a row of the kernel among the plurality of intermediate values.
The method further includes obtaining the plurality of output values by performing the convolution operation by using two or more multiplication modules corresponding to the number of the plurality of weights included in the kernel in parallel.
The method further includes: based on the convolution operation being a three-dimensional convolution operation, controlling the input selection module to bypass the plurality of input values input to the input selection module to the plurality of multiplication modules; and inputting a second set of weights corresponding to each of the plurality of multiplication modules to the plurality of multiplication modules based on the first kernel information.
According to another aspect of the disclosure, a non-transitory computer readable recording medium includes a program for executing a control method of an electronic device. The electronic device performs a convolution operation, through a plurality of multiplication modules corresponding to a plurality of channels, based on three-dimensional input data including (i) a plurality of input values divided based on the plurality of channels, (ii) first kernel information on a kernel including a plurality of weights for each of the plurality of channels, and (iii) second kernel information generated by converting the plurality of weights configured in a two-dimensional matrix form for each of the plurality of channels to a three-dimensional matrix form. The method of controlling the electronic device includes: based on the convolution operation being a depthwise convolution operation, controlling an input selection module such that a plurality of input values corresponding to a first channel among the plurality of channels are input to all of two or more multiplication modules among the plurality of multiplication modules; inputting a first set of weights corresponding to the first channel, one by one, to the two or more multiplication modules based on the second kernel information; obtaining a plurality of intermediate values based on each of the multiplication operation results by performing a multiplication operation with each of the plurality of weights for each of the plurality of input values through the two or more multiplication modules; and obtaining a plurality of output values based on each of a summed result by summing intermediate values respectively corresponding to locations of the kernel from among the plurality of intermediate values through a first intermediate value accumulation module.
According to another aspect of the disclosure, a method of accelerating calculation of convolution operations by using a parallel hardware structure of a neural network accelerator including a plurality of multiplication modules and an input selection module is provided. The method includes: receiving, by the plurality of multiplication modules, three-dimensional input data including (i) a plurality of input values divided based on a plurality of channels, (ii) first kernel information on a kernel including a plurality of weights for each of the plurality of channels, and (iii) second kernel information generated by converting the plurality of weights configured in a two-dimensional matrix form for each of the plurality of channels to a three-dimensional matrix form; and performing a convolution operation corresponding to the plurality of channels, through the plurality of multiplication modules, based on the three-dimensional input data. The performing includes: controlling, based on the convolution operation being a depthwise convolution operation, the input selection module to (a) configure a plurality of input values corresponding to a first channel among the plurality of channels, and (b) input the plurality of input values to two or more multiplication modules among the plurality of multiplication modules; inputting a first set of weights corresponding to the first channel, one by one, to the two or more multiplication modules based on the second kernel information; obtaining a plurality of intermediate values based on each of the multiplication operation results by performing a multiplication operation with each of the plurality of weights for each of the plurality of input values through the two or more multiplication modules; obtaining a plurality of output values based on each of a summed result by summing intermediate values respectively corresponding to locations of the kernel from among the plurality of intermediate values through a first intermediate value accumulation module; and transmitting the obtained plurality of output values to a device connected to the neural network accelerator.
The above and other aspects, features, and advantages of certain embodiments of the present disclosure will be more apparent from the following description taken in conjunction with the accompanying drawings, in which:
The disclosure may have various modifications and includes various embodiments, some of which are illustrated in the drawings and described in detail in the detailed description. However, the disclosure is not intended to be limited to the embodiments described herein, and includes various modifications, equivalents, and/or alternatives. In the description of the drawings, like reference numerals may be used for similar components.
In describing the disclosure, well-known functions or constructions are not described in detail since they would obscure the disclosure with unnecessary detail. In addition, the embodiments described below may be modified in various different forms, and the scope of the technical concept of the disclosure is not limited to the following embodiments. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The terms used in this disclosure are used merely to describe a particular embodiment, and are not intended to limit the scope of the claims. A singular expression includes a plural expression, unless the context clearly indicates otherwise.
The terms “have”, “may have”, “include”, and “may include” used in the example embodiments of the disclosure indicate the presence of corresponding features (for example, elements such as numerical values, functions, operations, or parts), and do not preclude the presence of additional features.
In the description, the term “A or B”, “at least one of A or/and B”, or “one or more of A or/and B” may include all possible combinations of the items that are enumerated together. For example, the term “at least one of A or/and B” includes (1) including at least one A, (2) including at least one B, or (3) including both at least one A and at least one B.
In addition, expressions “first”, “second”, or the like, used in the disclosure may indicate various components regardless of a sequence and/or importance of the components, may be used to distinguish one component from the other components, and do not limit the corresponding components.
When any component (for example, a first component) is (operatively or communicatively) coupled with/to or is connected to another component (for example, a second component), it is to be understood that any component may be directly coupled with/to another component or may be coupled with/to another component through the other component (for example, a third component).
On the other hand, when any component (for example, a first component) is “directly coupled with/to” or “directly connected to” another component (for example, a second component), it is to be understood that the other component (for example, a third component) is not present between the directly coupled components.
The expression “configured to” used in the disclosure may be interchangeably used with other expressions such as “suitable for,” “having the capacity to,” “designed to,” “adapted to,” “made to,” and “capable of,” depending on cases. The term “configured to” does not necessarily refer to a device being “specifically designed to” in terms of hardware.
Instead, under some circumstances, the expression “a device configured to” may refer, for example, to the device being “capable of” performing an operation together with another device or component. For example, the phrase “a processor configured to perform A, B, and C” may refer, for example, to a dedicated processor (e.g., an embedded processor) for performing the corresponding operations, or a generic-purpose processor (e.g., a central processing unit (CPU) or an application processor) that can perform the corresponding operations by executing one or more software programs stored in a memory device.
The term such as “module,” “unit,” “part”, and so on may refer, for example, to an element that performs at least one function or operation, and such element may be implemented as hardware or software, or a combination of hardware and software. Further, except for when each of a plurality of “modules”, “units”, “parts”, and the like needs to be realized in an individual hardware, the components may be integrated in at least one module or chip and be realized in at least one processor.
It is understood that various elements and regions in the figures may be shown out of scale. Accordingly, the scope of the disclosure is not limited by the relative sizes or spacing drawn from the accompanying drawings.
Hereinafter, an embodiment according to the disclosure will be described in detail with reference to the accompanying drawings so as to be easily carried out by a person skilled in the art to which the disclosure belongs.
An electronic device 100 is a device for performing a convolution operation. To be specific, the electronic device 100 may obtain output data by performing the convolution operation based on input values included in input data and weights included in kernels.
“Input data” may be three-dimensional data including input values distinguished according to a plurality of channels. Specifically, the input data may be a three-dimensional matrix including a plurality of input values divided according to a row, a column, and a depth, and may be divided into a plurality of channels corresponding to each depth. The term “input data” may be replaced with the term “input feature map” or the like, and the term “input value” may be replaced with the term “input activation value.”
The “kernel” may be a matrix including a plurality of weights for performing a multiplication operation with input values. Specifically, a kernel may be constructed in the form of a one-dimensional, two-dimensional, or three-dimensional matrix according to the type of convolution operation to be performed. The size of the kernel may be determined according to the horizontal length (i.e., the number of columns), the vertical length (i.e., the number of rows), and the depth (i.e., the number of channels) of the kernel, and the plurality of weights included in the kernel may be divided according to a plurality of channels corresponding to the depth of the kernel. The term “kernel” may be replaced with terms such as a filter or a mask.
The “convolution operation” refers to an operation of multiplying input values included in input data and weights included in a kernel, respectively, and then summing each of the multiplication results. In particular, the convolution operation may include a three-dimensional (3D) convolution operation and a depthwise convolution operation. The 3D convolution operation refers to a convolution operation of obtaining 3D output data by using 3D input data and a 3D kernel, and the depthwise convolution operation refers to a convolution operation of obtaining 3D output data by using 3D input data and a one-dimensional kernel or two-dimensional kernel. A detailed calculation process based on each type of the convolution operation will be described below together with the description of an embodiment according to the disclosure. The convolution operation may be performed through a neural network model such as a Convolutional Neural Network (CNN). However, the type of a neural network model to which the disclosure may be applied is not limited thereto.
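The difference between the two operation types described above can be illustrated with a minimal sketch (all values and function names are hypothetical, and a channel-first layout is assumed): a 3D convolution accumulates across channels into a single scalar per kernel position, while a depthwise convolution keeps one result per channel.

```python
def conv3d_single_output(patch, kernel):
    # 3D convolution: multiply-accumulate across channels, rows, AND columns,
    # producing a single scalar per kernel position.
    return sum(
        patch[c][r][col] * kernel[c][r][col]
        for c in range(len(patch))
        for r in range(len(patch[0]))
        for col in range(len(patch[0][0]))
    )

def depthwise_single_position(patch, kernels_2d):
    # Depthwise convolution: each channel is convolved with its own 2D kernel;
    # there is no summation across channels, so one value per channel remains.
    return [
        sum(patch[c][r][col] * kernels_2d[c][r][col]
            for r in range(len(patch[c]))
            for col in range(len(patch[c][0])))
        for c in range(len(patch))
    ]

# toy 3-channel 3x3 input patch (values 0..26) and all-ones kernels
patch = [[[c * 9 + r * 3 + col for col in range(3)] for r in range(3)] for c in range(3)]
ones = [[[1] * 3 for _ in range(3)] for _ in range(3)]

print(conv3d_single_output(patch, ones))       # 351: one scalar across all channels
print(depthwise_single_position(patch, ones))  # [36, 117, 198]: one value per channel
```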
The “output data” may be a 3D matrix including a plurality of output values divided according to rows, columns, and depths, and may be divided into a plurality of channels corresponding to respective depths. The row, column, and depth of the output data do not necessarily correspond to the row, column, and depth of the input data, and the row, column, and depth of the output data may vary depending on the size, stride, padding, etc. of the kernel used for the convolution operation. The term “output data” may be replaced with terms such as an “output feature map”, and the term “output value” may be replaced with the term “output activation value”.
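The dependence of the output size on kernel size, stride, and padding mentioned above follows the standard convolution output-length relation along each axis, sketched below (the function name is illustrative only):

```python
def conv_output_length(in_len, kernel_len, stride=1, padding=0):
    # Standard convolution output-size relation along one axis:
    # out = floor((in + 2*padding - kernel) / stride) + 1
    return (in_len + 2 * padding - kernel_len) // stride + 1

# a 5-wide input convolved with a 3-wide kernel, stride 1, no padding -> 3 outputs
print(conv_output_length(5, 3))                        # 3
print(conv_output_length(7, 3, stride=2, padding=1))   # 4
```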
As shown in
At least one instruction regarding the electronic device 100 may be stored in the memory 110. In addition, an operating system (OS) for driving the electronic device 100 may be stored in the memory 110. The memory 110 may store various software programs or applications for operating the electronic device 100 according to various embodiments. The memory 110 may include a semiconductor memory such as a flash memory, a magnetic storage medium (such as a hard disk), or the like.
Specifically, the memory 110 may store various software modules for operating the electronic device 100, and the processor 120 may control the operation of the electronic device 100 by executing various software modules that are stored in the memory 110. That is, the memory 110 may be accessed by the processor 120, and reading, recording, modifying, deleting, and updating of data in the memory 110 may be performed by the processor 120.
The memory 110 may include a non-volatile memory 110A capable of maintaining stored information even when power supply is interrupted, and a volatile memory 110B requiring continuous power supply to maintain the stored information. For example, the non-volatile memory 110A may be implemented with at least one of One Time Programmable ROM (OTPROM), Programmable ROM (PROM), Erasable and Programmable ROM (EPROM), Electrically Erasable and Programmable ROM (EEPROM), mask ROM, or flash ROM, and the volatile memory 110B may be implemented with at least one of Dynamic RAM (DRAM), Static RAM (SRAM), or Synchronous Dynamic RAM (SDRAM). In this disclosure, the term “memory” may be used to include the memory 110, the ROM and RAM in the processor 120, or a memory card (e.g., a micro Secure Digital (SD) card, memory stick) mounted to the electronic device 100.
In various embodiments according to the disclosure, the memory 110 may store input data, output data, information on a weight, and the like according to the disclosure. Although information on a weight may be simply stored in
The “first kernel information” refers to information on a kernel including a weight for each of a plurality of channels. For example, the first kernel information may be information on a kernel including a weight of a 3 * 3 (horizontal * vertical) matrix for each of a plurality of channels. The “second kernel information” refers to information on a kernel converted so that weights for each of a plurality of channels of the first kernel information are arranged in the direction of the plurality of channels. For example, the second kernel information may be information on a kernel generated by converting weights configured in the form of a 3 * 1 (horizontal * vertical) matrix for each of a plurality of channels into a multi-channel form of 1 * 1 * 3 (horizontal * vertical * depth).
In other words, the first kernel information is a term for referring to information on a kernel of a typical form used for a convolution operation, that is, information on a kernel composed of a matrix for each of a plurality of channels. The second kernel information is a term for referring to information on a kernel generated by converting information on a kernel composed of a matrix for each of a plurality of channels into a kernel in a multi-channel form in order to perform a depthwise convolution operation in parallel among convolution operations according to the disclosure.
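The conversion from the first kernel information to the second kernel information can be sketched as follows (a minimal illustration with hypothetical weight values: the 3 * 1 row of weights for one channel is rearranged along the depth direction into a 1 * 1 * 3 multi-channel kernel):

```python
# Hypothetical first kernel information for one channel: a 3 * 1
# (horizontal * vertical) row of weights W0, W1, W2.
first_kernel_row = [1.0, 2.0, 3.0]

# Second kernel information: the same three weights rearranged along the
# depth (channel) direction, i.e. a 1 * 1 * 3 multi-channel kernel indexed
# as second_kernel[depth][vertical][horizontal], so that each weight can be
# supplied to a different multiplication module in parallel.
second_kernel = [[[w]] for w in first_kernel_row]

print(second_kernel)  # [[[1.0]], [[2.0]], [[3.0]]]
```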
Various information required within a range for achieving the purpose of the disclosure may be stored in the memory 110, and the information stored in the memory 110 may be received from an external device or updated through input by a user.
The processor 120 controls overall operations of the electronic device 100. Specifically, the processor 120 is connected to a configuration of the electronic device 100 including the memory 110 as described above, and controls overall operations of the electronic device 100 by executing at least one instruction stored in the memory 110 as described above.
The processor 120 may be implemented in various ways. For example, the processor 120 may be implemented as at least one of an Application Specific Integrated Circuit (ASIC), an embedded processor, a microprocessor, a hardware control logic, a hardware Finite State Machine (FSM), a Digital Signal Processor (DSP), or the like. Further, the processor 120 may include at least one of a Central Processing Unit (CPU), a Graphic Processing Unit (GPU), a Main Processing Unit (MPU), or the like.
The processor 120 may load data required for performing various operations from the non-volatile memory 110A to the volatile memory 110B. The loading refers to an operation of loading and storing data stored in the non-volatile memory 110A in the volatile memory 110B so that the processor 120 may access the data. The volatile memory 110B may be implemented as a component included in the processor 120, but this is merely an embodiment, and the volatile memory 110B may be implemented as a component separate from the processor 120.
In particular, the processor 120 according to the disclosure may be implemented as one or more processors. The processor 120 may include a neural network accelerator for efficiently controlling an operation process of a convolutional neural network model, and a central processing unit (CPU) for controlling operations of various components including the neural network accelerator. In addition, the neural network accelerator may include a plurality of Micro Processor Units (MPUs) and the like, and the plurality of MPUs may include a plurality of modules for implementing one or more embodiments according to the disclosure.
As shown in
“The input selection module 121” refers to a module for transmitting input values to a plurality of multiplication modules 122 in different manners according to the type of convolution operation. When an input value included in the input data is received, the input selection module 121 may transmit the input values to the plurality of multiplication modules 122 based on the type of the convolution operation. The input selection module 121 may be implemented as a software module as well as a hardware module included in the processor 120. Hereinafter, for convenience of description, a case where the input selection module 121 is implemented as a hardware module included in the processor 120 will be described.
“The plurality of multiplication modules 122” refers to modules for performing a multiplication operation between an input value and a weight. Specifically, when an input value is received from the input selection module 121 and a weight is received from the processor 120, the plurality of multiplication modules 122 may multiply the received input value and the weight to transmit an intermediate value, which is a result value of the multiplication, to the intermediate value accumulation module 123.
The “intermediate value accumulation module 123” refers to a module for obtaining an output value by summing intermediate values obtained in a convolution operation process. Specifically, when a plurality of intermediate values are received from the plurality of multiplication modules 122, the intermediate value accumulation module 123 may obtain and output an output value based on a plurality of intermediate values. The intermediate value accumulation module 123 may include a first intermediate value accumulation module 123-1 (e.g., as shown in
Here, a depthwise convolution process using the first intermediate value accumulation module 123-1 will be described first, and a 3D convolution process using the second intermediate value accumulation module 123-2 will be described with reference to
The processor 120 may identify the type of a convolution operation to be performed, and control the plurality of modules in a different manner according to whether the convolution operation to be performed is a 3D convolution operation or a depthwise convolution operation. In one embodiment, the processor 120 may use first kernel information when performing a 3D convolution operation and use second kernel information when performing a depthwise convolution operation. Hereinafter, an embodiment related to a depthwise convolution operation will be described with reference to
Specifically,
The processor 120 may input, to a plurality of MPUs for each preset cycle, input values having the same row and column and different depths among input values included in the input data. However, in the case of a depthwise convolution operation, unlike a 3D convolution operation as described below, a process of adding intermediate values for input values of different channels in order to obtain one output value is not necessary. Accordingly, in describing an embodiment of a depthwise convolution operation, an operation process for input values corresponding to a first channel among a plurality of channels will be mainly described.
When the depthwise convolution operation is performed, the processor 120 may control the input selection module 121 such that a plurality of input values corresponding to a first channel among the plurality of channels are input to all of two or more multiplication modules 122 among the plurality of multiplication modules 122. In particular, the number of the two or more multiplication modules 122 may correspond to the number of the plurality of weights included in the kernel. In the following description of the disclosure, the term “two or more multiplication modules 122” is used for specifying the multiplication modules 122 used in a depthwise convolution operation according to the disclosure among the plurality of multiplication modules 122.
In other words, whenever each of a plurality of input values corresponding to the first channel is input to the input selection module 121, the processor 120 may control the input selection module 121 such that the input value is input to all of the multiplication modules 122, the number of which equals the number of the plurality of weights included in the kernel. In contrast, in the related art, the input values corresponding to the first channel are input only to the multiplication module 122 corresponding to the first channel in the depthwise convolution operation.
For example, as in the operation process of time T2 at
The processor 120 may input a plurality of weights corresponding to the first channel, one by one, to two or more multiplication modules 122 based on the second kernel information.
In other words, the processor 120 according to the disclosure may input a plurality of weights corresponding to the first channel, one by one, to two or more multiplication modules 122 based on the second kernel information converted into the kernel in the multi-channel form. In contrast, in the related art, a plurality of weights corresponding to the first channel are input only to the multiplication module 122 corresponding to the first channel in a depthwise convolution operation.
For example, as in an operation process at time T2 in
A plurality of weights (W0, W1, and W2) may be constructed in the form of a multi-channel form of 1 * 1 * 3 (horizontal * vertical * depth) and stored in the memory 110 as second kernel information.
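The per-cycle behavior described above can be sketched as follows (hypothetical weight and input values): at one cycle, a single input value of the first channel is broadcast to all of the two or more multiplication modules, each of which holds one weight of the second kernel information.

```python
# Hypothetical weights W0, W1, W2 held one-per-module (second kernel information).
weights = [0.5, 1.0, 2.0]

# Input value of the first channel arriving at cycle T2; the input selection
# module broadcasts it to all of the two or more multiplication modules.
f2 = 4.0

# Each multiplication module produces one intermediate value in parallel:
# F2 * W0, F2 * W1, F2 * W2.
intermediates = [f2 * w for w in weights]
print(intermediates)  # [2.0, 4.0, 8.0]
```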
As described above, when the plurality of input values and the plurality of weights are input to the plurality of multiplication modules 122, the processor 120 may perform a multiplication operation with each of the plurality of weights for each of the plurality of input values through the plurality of multiplication modules 122 to obtain a plurality of intermediate values based on each multiplication operation result.
For example, as in the operation process at time T2 of
Referring to
As described above, when a plurality of intermediate values are obtained, the processor 120 may sum intermediate values corresponding to each location of a kernel among the plurality of intermediate values through the intermediate value accumulation module 123 to obtain a plurality of output values according to each sum result. Specifically, the processor 120 may obtain a plurality of output values according to a sum of intermediate values by using a first intermediate value accumulation module 123-1 among the intermediate value accumulation modules 123. The “intermediate values respectively corresponding to the positions of the kernels” refer to intermediate values obtained according to a multiplication result by multiplying a plurality of weights included in a kernel and a plurality of input values corresponding thereto while sequentially moving a kernel according to a predetermined interval (i.e., stride) on a matrix of input data.
For example, when the operation process of T3 and T4 is sequentially performed in
An output value O2 is obtained by summing an intermediate value F2 * W0 obtained at time T2, an intermediate value F3 * W1 obtained at time T3, and an intermediate value F4 * W2 obtained at time T4, and the output values O0 and O1 may be obtained by a similar method. If an intermediate value F0 * W0 is obtained at time T0 through the plurality of multiplication modules 122, and intermediate values F1 * W0 and F1 * W1 are obtained at time T1, then the intermediate values F0 * W0, F1 * W1, and F2 * W2 may be summed at time T2 to obtain the output value O0. In addition, the intermediate values F1 * W0, F2 * W1, and F3 * W2 may be summed at time T3 to obtain the output value O1.
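The per-cycle accumulation described above can be sketched in software as follows. This is a hedged simulation, not the hardware itself: each input value is broadcast to as many multiplier slots as there are weights, the products are accumulated into per-output partial-sum registers, and one output completes each cycle from the third cycle (T2) onward. All names are illustrative assumptions.

```python
# Hedged simulation of the pipelined depthwise operation: at cycle t the single
# input value F[t] is broadcast to all K multiplication modules, module j
# multiplies it by weight W[j], and partial-sum registers accumulate so that
# output O[i] = F[i]*W0 + F[i+1]*W1 + F[i+2]*W2 completes at cycle i + K - 1.

def depthwise_1d(F, W):
    K = len(W)                      # number of multiplication modules used
    psr = [0] * len(F)              # partial-sum registers, one per output slot
    outputs = []
    for t, f in enumerate(F):       # cycle t: broadcast F[t] to all modules
        for j in range(K):          # module j holds weight W[j]
            i = t - j               # output index this product contributes to
            if 0 <= i <= len(F) - K:
                psr[i] += f * W[j]  # partial-sum adder accumulates
        if t >= K - 1:              # from the K-th cycle on, one output per cycle
            outputs.append(psr[t - K + 1])
    return outputs

# F0..F4 with kernel (W0, W1, W2):
print(depthwise_1d([1, 2, 3, 4, 5], [1, 10, 100]))  # [321, 432, 543]
```

Note that `O0 = 1*1 + 2*10 + 3*100 = 321` matches the summation of F0*W0, F1*W1, and F2*W2 described above, and that an output value is produced every cycle starting from T2.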
The intermediate values obtained through the plurality of multiplication modules 122 may be temporarily stored in a register included in the intermediate value accumulation module 123, and the intermediate values stored in the register may be used to obtain a final output value. Each of “A” and “B” of
Among the configurations included in the first intermediate value accumulation module 123-1, the registers may be referred to as so-called Partial Sum Registers (PSRs), and the configuration illustrated with a “+” symbol, which is an adder included in the first intermediate value accumulation module 123-1, may be referred to as a so-called Partial Sum Adder (PSA).
The processor 120 may obtain intermediate values F0*W1 and F0*W2 as well as the intermediate value F0*W0 at time T0. The processor 120 may also obtain an intermediate value F1*W2 as well as the intermediate values F1*W0 and F1*W1 at time T1. However, the intermediate values F0*W1, F0*W2, and F1*W2 do not correspond to locations of the kernel and do not need to be obtained.
For convenience of description, the above description has been made on the basis of the calculation process for input values corresponding to the first channel among the plurality of channels, but the operation process according to an embodiment may also be applied to the other channels among the entire plurality of channels. Accordingly, the electronic device 100 may obtain output data including output values for the entire input data.
The first kernel information used for the 3D convolution operation and the second kernel information used for the depthwise convolution operation may be respectively preconstructed and stored in the memory 110, according to an embodiment. Alternatively, in a state where only the first kernel information is stored in the memory 110, the processor 120 may convert the weights for each of the plurality of channels of the first kernel information in the direction of the plurality of channels, input the converted weights to the plurality of multiplication modules 122, and perform the depthwise convolution operation.
According to the embodiment described above with reference to
Specifically, when performing a depthwise convolution operation, the electronic device 100 may arrange, in the channel direction, a kernel that would be arranged in the horizontal direction in a 3D convolution operation, and simultaneously multiply one input value by each of a plurality of weights through the plurality of multiplication modules 122, thereby obtaining an output value every cycle from time T2 corresponding to the third cycle. Accordingly, it is possible to achieve an improvement in operation efficiency (e.g., acceleration of convolution operations) of three times over the related art, which requires a calculation process of three cycles for each output value.
An example as shown in
The electronic device 100 according to the disclosure may perform a depthwise convolution operation as well as a 3D convolution operation, and a detailed process of performing a 3D convolution operation under an architecture according to the disclosure as illustrated in
As shown in
In detail, when performing a one-dimensional convolution operation with reference to
The processor 120 according to an embodiment of the disclosure may further include the buffer 123-3, and the buffer 123-3 may be used to store some of intermediate values obtained for each row of the kernel. Although
In
Referring to
As shown in
Furthermore, the processor 120 may obtain a third intermediate value according to an operation result of a third row input value F2 and a third row weight K2 through the plurality of multiplication modules 122, and may obtain one output value O0 by adding the third intermediate value to a sum value of the first intermediate value and the second intermediate value stored in the buffer 123-3. Here, “one output value” refers to one output value among output values corresponding to each case in which a two-dimensional kernel is located on a matrix of input data.
The processor 120 may obtain the other output value O1 by adding an intermediate value according to the calculation result of the input value F1 of the second row and the weight K0 of the first row, an intermediate value according to the operation result of the input value F2 of the third row and the weight K1 of the second row, and an intermediate value according to the operation result of the input value F3 of the fourth row and the weight K2 of the third row through the first intermediate value accumulation module 123-1.
In
Specifically, when an input value F0 of the first row among a plurality of input values corresponding to a first channel is input to the input selection module 121, the processor 120 may control the input selection module 121 such that the F0 is input to the multiplication module 122 corresponding to the first channel, and also the multiplication module 122 corresponding to each of the second channel and the third channel adjacent to the first channel. The processor 120 may input a plurality of weights W0,0, W0,1, and W0,2, one by one, corresponding to the first row to each of three multiplication modules 122 corresponding to each of the first channel, the second channel, and the third channel.
When the input value F0 is input to all three multiplication modules 122 corresponding to each of the first channel, the second channel, and the third channel, and the weights W0,0, W0,1, and W0,2 are input to the three multiplication modules 122, one by one, the processor 120 may perform a multiplication operation between F0 and W0,0 through the multiplication module 122 corresponding to the first channel to obtain an intermediate value F0*W0,0, perform a multiplication operation between F0 and W0,1 through the multiplication module 122 corresponding to the second channel to obtain an intermediate value F0*W0,1, and perform a multiplication operation between F0 and W0,2 through the multiplication module 122 corresponding to the third channel to obtain an intermediate value F0*W0,2.
The processor 120 may obtain an output value corresponding to the input value F0 of the first row by summing the intermediate values through the first intermediate value accumulation module 123-1, and store it in the buffer 123-3 as an intermediate value of the output value O0.
When an output value corresponding to the input value F1 of the second row is obtained, in the same manner as the process of obtaining an output value corresponding to the input value F0 of the first row, the processor 120 may add an output value corresponding to the input value F1 of the second row to an intermediate value stored in the buffer 123-3 through the first intermediate value accumulation module 123-1 and store the sum value in the buffer 123-3.
Furthermore, when an output value corresponding to the input value F2 of the third row is obtained in the same manner as the process of obtaining the output value corresponding to the input value F0 of the first row and the output value corresponding to the input value F1 of the second row, the processor 120 may obtain one output value O0 by summing the output value corresponding to the input value F2 of the third row with the summation value stored in the buffer 123-3 through the first intermediate value accumulation module 123-1.
According to the embodiment described above with reference to
The electronic device 100 may perform a two-dimensional convolution operation by only adding one buffer 123-3 for cumulatively storing each result of the one-dimensional convolution operation; accordingly, the two-dimensional convolution operation may be performed with only a small additional hardware area compared to the embodiment of
As shown in (1) of
Next, as shown in (2) of
When the input value F2 of the third row is input, the processor 120 may obtain the output value O0 by adding a result obtained by performing an operation with the weight K2 of the third row, as shown in (3) of
The processor 120 obtains intermediate values corresponding to the rows of the kernel, stores the intermediate values in the buffer 123-3, and accumulates the intermediate values obtained for each row in the column direction to perform a two-dimensional convolution operation. However, according to another embodiment, the processor 120 may perform a two-dimensional convolution operation by obtaining intermediate values corresponding to the columns of the kernel, storing the intermediate values in the buffer 123-3, and accumulating the intermediate values obtained for each column in the row direction.
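Under the assumption of the row-wise scheme described above, the two-dimensional operation can be sketched as follows: each arriving input row is convolved one-dimensionally with each kernel row, and the row results are accumulated in the column direction in a buffer of partial sums until an output row completes. Function and variable names are illustrative assumptions, not from the disclosure.

```python
# Hedged sketch of the two-dimensional operation built from one-dimensional row
# operations: input rows arrive one at a time, each row is convolved with the
# kernel rows it overlaps, and partial row sums are accumulated (column
# direction) in a buffer until the full kernel height has been covered.

def conv1d_row(row, krow):
    """One-dimensional convolution of one input row with one kernel row."""
    n = len(row) - len(krow) + 1
    return [sum(row[i + j] * krow[j] for j in range(len(krow))) for i in range(n)]

def depthwise_2d(F, K):
    kh, kw = len(K), len(K[0])
    out_h = len(F) - kh + 1
    out_w = len(F[0]) - kw + 1
    partial = [[0] * out_w for _ in range(out_h)]  # buffer of partial row sums
    for r, row in enumerate(F):                    # input rows arrive in order
        for i in range(kh):                        # kernel row i
            o = r - i                              # output row it contributes to
            if 0 <= o < out_h:
                line = conv1d_row(row, K[i])       # 1D convolution of this row
                for c in range(out_w):
                    partial[o][c] += line[c]       # accumulate in column direction
    return partial

F = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
K = [[1, 0], [0, 1]]
print(depthwise_2d(F, K))  # [[6, 8], [12, 14]]
```

Here the `partial` buffer plays the role attributed to the buffer 123-3 above: it holds the cumulative row results until the last kernel row contributes, at which point one output row is complete.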
According to the embodiment described above with reference to
Various embodiments of the disclosure have been made on the basis of the case in which a depthwise convolution operation is performed, but as described above, the processor 120 may perform a 3D convolution operation by using a plurality of modules according to the disclosure.
As shown in
In the 3D convolution operation, operations between input values included in three-dimensional input data and weights included in a three-dimensional kernel are performed. All of a plurality of multiplication modules 122 corresponding to each of the plurality of channels are used. Specifically, in a 3D convolution operation, a convolution operation is performed between a set of input values having the same row and column and different depths among input values included in input data and a set of weights having the same rows and columns and different depths among the weights included in the kernel. Accordingly, in inputting weights to a plurality of multiplication modules 122 in a 3D convolution operation, first kernel information may be used instead of second kernel information converted into a multi-channel form.
Specifically,
In particular, in the description of
At time T0, the processor 120 may control the input selection module 121 such that the input values included in F0 are input to the multiplication module 122 of the channel corresponding to each input value. That is, the processor 120 may control the input selection module 121 such that the input value of each of the plurality of channels is input to the multiplication module 122 of the corresponding channel: the input value corresponding to the first channel among the input values included in F0 is input to the multiplication module 122 corresponding to the first channel, the input value corresponding to the second channel is input to the multiplication module 122 corresponding to the second channel, and so on.
The processor 120 may input the weights included in W0 to the multiplication module 122 of the channel corresponding to each weight. For example, the weight corresponding to the first channel among the weights included in W0 is input to the multiplication module 122 corresponding to the first channel, the weight corresponding to the second channel is input to the multiplication module 122 corresponding to the second channel, and so on for the remaining channels.
When input values included in F0 and weights included in W0 are input to a plurality of multiplication modules 122, the processor 120 may obtain a first intermediate value according to a multiplication operation result of an input value (F0(0)) of a first channel included in the F0 and a weight value (W0(0)) of a first channel included in the W0 through a multiplication module 122 corresponding to the first channel, obtain a second intermediate value according to a multiplication operation result of a second channel input value (F0(1)) included in the F0 and a weight value (W0(1)) of a second channel included in the W0 through the multiplication module 122 corresponding to the second channel, and obtain intermediate values corresponding to each of the third channel to the 64th channel in a similar manner.
When the intermediate values corresponding to each of the first to 64th channels are obtained, the processor 120 may sum the intermediate values corresponding to each of the first to 64th channels through the second intermediate value accumulation module to obtain a first sum value (O0(1)) according to the summation result.
At time T1, the processor 120 may obtain a second sum value O0(2), and add the first sum value O0(1) and the second sum value O0(2) to obtain a sum value O0(1~2) according to the summation result.
In addition, at time T2, the processor 120 may obtain a third sum value O0(3), sum the first sum value O0(1), the second sum value O0(2), and the third sum value O0(3), and obtain one output value O0 according to the summation result in the same manner as the process of obtaining the first summation value O0(1) and the second summation value O0(2). Here, one output value refers to one output value among output values corresponding to each case where a three-dimensional kernel is located on a matrix of input data.
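The 3D convolution path described above can be sketched as follows. This is a hedged simulation under stated assumptions: at each time step, the per-channel input values and per-channel weights are multiplied in the multiplication module of each channel, the products are summed across channels into one sum value, and the sum values over the kernel's horizontal positions are accumulated into one output value. Names are illustrative, not from the disclosure.

```python
# Hedged sketch of the 3D convolution path: per cycle (T0, T1, T2, ...) each
# channel's multiplication module computes F_t(c) * W_t(c); the products are
# summed across channels into a sum value O0(t); the sum values are then
# accumulated over cycles into the single output value O0.

def conv3d_output(F_steps, W_steps):
    """F_steps[t][c], W_steps[t][c]: per-cycle, per-channel inputs and weights."""
    output = 0
    for F_t, W_t in zip(F_steps, W_steps):            # cycles T0, T1, T2, ...
        sum_t = sum(f * w for f, w in zip(F_t, W_t))  # sum across channels
        output += sum_t                               # accumulate across cycles
    return output

# Toy case with 2 channels and 3 kernel positions (all weights 1):
print(conv3d_output([[1, 2], [3, 4], [5, 6]],
                    [[1, 1], [1, 1], [1, 1]]))  # 21
```

Unlike the depthwise path, every channel's multiplication module is active here, and the cross-channel summation is what distinguishes the 3D convolution from the per-channel depthwise case.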
According to the embodiment described above with reference to
Referring to
In other words, the electronic device 100 according to the disclosure may control the input selection module 121 such that, whenever each of the plurality of input values corresponding to the first channel is input to the input selection module 121, the input value is input to as many multiplication modules 122 as there are weights included in the kernel, unlike the related-art depthwise convolution operation in which the input values corresponding to the first channel are input only to the multiplication module 122 corresponding to the first channel.
The electronic device 100 may input a plurality of weights corresponding to a first channel, one by one, to two or more multiplication modules 122 in operation S1230. In other words, the electronic device 100 according to the disclosure may input a plurality of weights corresponding to a first channel one by one to two or more multiplication modules 122 on the basis of second kernel information in the form of a multi-channel, unlike the depthwise convolution operation in the related art in which a plurality of weights corresponding to the first channel are input only to the multiplication module 122 corresponding to the first channel.
The electronic device 100 may perform a multiplication operation with each of a plurality of weights for each of a plurality of input values through two or more multiplication modules 122 to obtain a plurality of intermediate values according to each multiplication operation result in operation S1240. In addition, the electronic device 100 may sum intermediate values corresponding to each location of a kernel among a plurality of intermediate values through the first intermediate value accumulation module 123-1 to obtain a plurality of output values according to each addition result in operation S1250.
The operations of the control method of the electronic device 100 according to the disclosure have been briefly described above, but this is only to omit the redundant description of the same contents, and various embodiments related to the control process by the processor 120 may be applied to the control method of the electronic device 100 as well.
The control method of the electronic device 100 according to the above-described embodiment may be implemented as a program and provided to the electronic device 100. In particular, a program including a control method of the electronic device 100 may be stored and provided in a non-transitory computer readable medium.
Specifically, in a non-transitory computer-readable recording medium including a program for executing a control method of the electronic device 100, when a convolution operation is a depthwise convolution operation, the method for controlling the electronic device 100 may include controlling the input selection module 121 such that each of a plurality of input values corresponding to a first channel among a plurality of channels for distinguishing input data is input to all of a plurality of multiplication modules 122 corresponding to each of two or more channels among a plurality of channels; inputting each of the plurality of weights corresponding to the first channel to the plurality of multiplication modules 122 one by one; obtaining a plurality of intermediate values according to each of multiplication operation results by performing a multiplication operation with each of the plurality of weights for each of the plurality of input values through the plurality of multiplication modules 122; and obtaining a plurality of output values according to the sum result by summing intermediate values corresponding to each of the plurality of weights among the plurality of intermediate values through the first intermediate value accumulation module 123-1.
The non-transitory computer readable medium may include a medium that stores data semi-permanently rather than storing data for a very short time, such as a register, a cache, a memory, etc., and is readable by an apparatus (i.e., executable by at least one processor). For example, the aforementioned various applications or programs may be stored in the non-transitory computer readable medium, for example, a Compact Disc (CD), a Digital Versatile Disc (DVD), a hard disc, a Blu-ray disc, a Universal Serial Bus (USB), a memory card, a Read Only Memory (ROM), and the like, and may be provided.
The controlling method of the electronic device 100 and the non-transitory computer-readable recording medium including a program for executing a controlling method of the electronic device 100 are described in brief, but this is merely to avoid repetitive description, and the various embodiments of the electronic device 100 may be applied to the controlling method of the electronic device 100, and a computer-readable recording medium including a program executing a controlling method of the electronic device 100.
According to one or more embodiments of the disclosure as described above, an electronic device may efficiently perform a depthwise convolution operation. A function related to the neural network model and a convolution operation process may be performed through the memory 110 and the processor 120.
The processor 120 may include one or a plurality of processors. In this case, the one or plurality of processors may be a general-purpose processor such as a Central Processing Unit (CPU) or an Application Processor (AP), a graphics-only processing unit such as a Graphics Processing Unit (GPU) or a Visual Processing Unit (VPU), or an AI-dedicated processor such as a Neural Processing Unit (NPU) or an MPU.
The one or a plurality of processors 120 control the processing of the input data in accordance with a predefined operating rule or Artificial Intelligence (AI) model stored in the non-volatile memory 110A and the volatile memory 110B. The predefined operating rule or artificial intelligence model is provided through training or learning.
Being provided through learning may refer, for example, to, by applying a learning algorithm to a plurality of learning data, a predefined operating rule or AI model of a desired characteristic being made. The learning may be performed in a device itself in which AI according to an embodiment is performed, and/or may be implemented through a separate server/system.
The AI model may include a plurality of neural network layers. Each layer has a plurality of weight values, and performs a layer operation through calculation of a previous layer and an operation of a plurality of weights. Examples of neural networks include, but are not limited to, CNN, Deep Neural Network (DNN), Recurrent Neural Network (RNN), Restricted Boltzmann Machine (RBM), Deep Belief Network (DBN), Bidirectional Recurrent Deep Neural Network (BRDNN), Generative Adversarial Networks (GAN), and deep Q-networks.
The learning algorithm may include a method for training a predetermined target device (for example, a robot) using a plurality of learning data to cause, allow, or control the target device to make a determination or prediction. Examples of learning algorithms include, but are not limited to, supervised learning, unsupervised learning, semi-supervised learning, or reinforcement learning.
The machine-readable storage medium may be provided in the form of a non-transitory storage medium. Here, the “non-transitory” storage medium is a tangible medium and may not include a signal, but this term does not distinguish whether data is stored semi-permanently or temporarily in the storage medium. For example, the “non-transitory storage medium” may include a buffer 123-3 in which data is temporarily stored.
According to various embodiments, a method disclosed herein may be provided in a computer program product. A computer program product may be traded between a seller and a purchaser as a commodity. A computer program product may be distributed in the form of a machine-readable storage medium (e.g., compact disc ROM (CD-ROM)), distributed online through an application store (e.g., PlayStore™), or distributed online (e.g., downloaded or uploaded) directly between two user devices (e.g., smartphones). In the case of online distribution, at least a portion of the computer program product (e.g., a downloadable app) may be at least temporarily stored, or temporarily generated, in a storage medium such as a manufacturer's server, a server of an application store, or the memory of a relay server.
Alternatively or in addition, each of the components (e.g., modules or programs) according to one or more embodiments may include a single entity or a plurality of entities, and some of the sub-components described above may be omitted, or other sub-components may be further included in the various embodiments. Alternatively or additionally, some components (e.g., modules or programs) may be integrated into one entity to perform the same or similar functions performed by the respective components prior to the integration.
The operations performed by the module, the program, or other component, in accordance with various embodiments may be performed in a sequential, parallel, iterative, or heuristic manner, or at least some operations may be executed in a different order or omitted, or other operations may be added.
The term “unit” or “module” used in the disclosure includes a unit including hardware, software, or firmware, or any combination thereof, and may be used interchangeably with terms such as, for example, logic, logic blocks, parts, or circuits. A “unit” or “module” may be an integrally constructed component or a minimum unit or part thereof that performs one or more functions. For example, the module may be configured as an Application-Specific Integrated Circuit (ASIC).
Embodiments may be implemented as software that includes instructions stored in a machine-readable storage medium readable by a machine (e.g., a computer). A device that can call the instructions stored in the storage medium and operate in accordance with the called instructions may include the electronic device (e.g., the electronic device 100) according to the embodiments.
When the instructions are executed by a processor, the processor may perform the function corresponding to the instructions, either directly or by using other components under the control of the processor. The instructions may include code generated by a compiler or code executable by an interpreter.
While the disclosure has been illustrated and described with reference to various example embodiments, it will be understood that the various example embodiments are intended to be illustrative, not limiting. One of ordinary skill in the art will understand that various changes in form and detail may be made without departing from the true spirit and full scope of the disclosure, including the appended claims and their equivalents.
Number | Date | Country | Kind |
---|---|---|---|
10-2020-0133509 | Oct 2020 | KR | national |
10-2021-0005465 | Jan 2021 | KR | national |
This application is a by-pass continuation application of International Application No. PCT/KR2021/013802, filed on Oct. 7, 2021, which based on and claims priority to Korean Patent Application No. 10-2020-0133509, filed on Oct. 15, 2020 and Korean Patent Application No. 10-2021-0005465, filed on Jan. 14, 2021 in the Korean Intellectual Property Office, the disclosures of which are incorporated by reference herein in their entireties.
Number | Date | Country | |
---|---|---|---|
Parent | PCT/KR2021/013802 | Oct 2021 | WO |
Child | 18120241 | US |