This application is based on and claims priority under 35 U.S.C. § 119(a) to Korean Patent Application No. 10-2020-0026010, which was filed in the Korean Intellectual Property Office on Mar. 2, 2020, the entire disclosure of which is incorporated herein by reference.
The disclosure relates generally to an electronic apparatus and a method for controlling the electronic apparatus, and more particularly, to an electronic apparatus that operates based on artificial intelligence (AI) technology, and a method for controlling the electronic apparatus.
AI systems that implement human-level intelligence are being developed. Unlike conventional rule-based smart systems, an AI system may be a system in which a machine learns and makes determinations by itself. AI systems are being used in various areas, such as voice recognition, image recognition, and future prediction.
More recently, AI systems for resolving a given problem through a deep neural network based on deep learning are being developed.
A deep neural network includes a plurality of hidden layers between an input layer and an output layer, and provides a model implementing an AI technology through the neurons included in each layer. Such a deep neural network generally includes a large number of neurons in order to derive accurate result values. However, although a large number of neurons may increase the accuracy of the output value for a given input value, deriving the output value takes a long time. In addition, because of the large number of neurons and the resulting capacity requirements, such a deep neural network cannot be used in mobile devices with limited memory, such as smartphones.
The disclosure is provided to address at least the aforementioned problems, and to provide at least the advantages described below.
An aspect of the disclosure is to provide an electronic apparatus that accurately derives an output value within a short time and allows an AI technology to be implemented in mobile devices having limited hardware and memory resources.
In accordance with an aspect of the disclosure, an electronic apparatus is provided for performing an operation of a neural network model. The electronic apparatus includes a memory configured to store weight data including quantized weight values of the neural network model; and a processor configured to obtain operation data based on input data and binary data having at least one bit value different from each other, generate a lookup table by matching the operation data with the binary data, identify operation data corresponding to the weight data from the lookup table, and perform an operation of the neural network model based on the identified operation data.
In accordance with an aspect of the disclosure, a method is provided for controlling an electronic apparatus to perform an operation of a neural network model. The method includes obtaining operation data based on input data and binary data including at least one bit value different from each other; generating a lookup table by matching the operation data with the binary data; identifying operation data corresponding to weight data including quantized weight values of the neural network model from the lookup table; and performing an operation of the neural network model based on the identified operation data.
The above and other aspects, features, and advantages of certain embodiments of the disclosure will be more apparent from the following detailed description taken in conjunction with the accompanying drawings, in which:
Hereinafter, various embodiments of the disclosure will be described with reference to the accompanying drawings. However, the various embodiments do not limit the technology described in the disclosure to a specific embodiment, but should be interpreted to include various modifications, equivalents and/or alternatives of the embodiments of the disclosure. With respect to the detailed description of the drawings, similar components may be designated by similar reference numerals.
Expressions such as “have,” “may have,” “include” and “may include” should be construed as denoting that there are such characteristics (e.g. elements such as numerical values, functions, operations, and components), and the expressions are not intended to exclude the existence of additional characteristics.
The expressions “A or B,” “at least one of A and/or B,” or “one or more of A and/or B”, etc., may include all possible combinations of the listed items. For example, “A or B,” “at least one of A and B,” or “at least one of A or B” may refer to all of the following cases: (1) including at least one A, (2) including at least one B, or (3) including at least one A and at least one B.
Further, the expressions “first,” “second,” etc., may be used to describe various elements regardless of any order and/or degree of importance. Such expressions are used only to distinguish one element from another element, and are not intended to limit the elements.
A description that one element (e.g., a first element) is “(operatively or communicatively) coupled with/to” or “connected to” another element (e.g., a second element) should be interpreted to include both the case where the one element is directly coupled to the other element and the case where the one element is coupled to the other element through still another element (e.g., a third element). In contrast, a description that one element (e.g., a first element) is “directly coupled” or “directly connected” to another element (e.g., a second element) can be interpreted to mean that no other element (e.g., a third element) exists between the two elements.
The expression “configured to” may be interchangeably used with other expressions such as “suitable for,” “having the capacity to,” “designed to,” “adapted to,” “made to,” and “capable of,” depending on cases. The term “configured to” does not necessarily mean that a device is “specifically designed to” in terms of hardware. Instead, under some circumstances, the expression “a device configured to” may mean that the device “is capable of” performing an operation together with another device or component. For example, the phrase “a processor configured to perform A, B, and C” may mean a dedicated processor (e.g. an embedded processor) for performing the corresponding operations, or a generic-purpose processor (e.g. a central processing unit (CPU) or an application processor (AP)) that can perform the corresponding operations by executing one or more software programs stored in a memory device.
In addition, “a module” or “a part” performs at least one function or operation, and may be implemented as hardware or software, or as a combination of hardware and software. Further, a plurality of “modules” or “parts” may be integrated into at least one module and implemented as at least one processor, except “modules” or “parts” that are described as being necessarily implemented as specific hardware.
Referring to
The electronic apparatus derives output data from input data by using a neural network model (or an AI model), and may be a desktop personal computer (PC), a laptop computer, a smartphone, a tablet PC, a server, etc. Alternatively, the electronic apparatus may be a system in which a cloud computing environment is constructed. However, the disclosure is not limited thereto, and the electronic apparatus may be any suitable apparatus capable of performing an operation of a neural network model.
The memory 110 may include a hard disk, a non-volatile memory, a volatile memory, etc. A non-volatile memory may be a one-time programmable read only memory (OTPROM), a programmable ROM (PROM), an erasable and programmable ROM (EPROM), an electrically erasable and programmable ROM (EEPROM), a mask ROM, a flash ROM, etc., and a volatile memory may be a dynamic RAM (DRAM), a static RAM (SRAM), a synchronous dynamic RAM (SDRAM), etc.
Meanwhile, in
Although
The memory 110 may store weight data of a neural network model. The weight data may be used for operations of the neural network model, and the memory 110 may store a plurality of pieces of weight data corresponding to a plurality of layers constituting the neural network model.
The memory 110 may store weight data including quantized weight values. A quantized weight value may be −1 or 1, and the weight data may be expressed as an m×n matrix whose elements are −1 or 1. Alternatively, each weight value of −1 may be replaced with 0 before being stored in the memory 110; that is, the memory 110 may store weight data consisting of 0s and 1s. Weight data including weight values of −1 or 1 may be stored in a first memory (e.g., a hard disk), and weight data including weight values of 0 or 1 may be stored in a second memory (e.g., an SDRAM).
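The substitution of 0 for −1 described above can be sketched as follows. This is a minimal NumPy illustration; the function names and the use of `np.packbits` to pack eight bit-weights into each byte are illustrative assumptions, not the storage format of the disclosure.

```python
import numpy as np

def to_bits(w_pm1):
    """Map quantized weights in {-1, +1} to bits in {0, 1} (-1 -> 0, +1 -> 1)."""
    w = np.asarray(w_pm1)
    if not np.all(np.isin(w, (-1, 1))):
        raise ValueError("quantized weights must be -1 or +1")
    return ((w + 1) // 2).astype(np.uint8)

def from_bits(bits):
    """Inverse mapping used when the weights are needed again: 0 -> -1, 1 -> +1."""
    return 2 * np.asarray(bits, dtype=np.int8) - 1

def pack_rows(bits):
    """Pack each row of 0/1 weights into bytes (8 weights per byte, first weight in the MSB).

    A row whose length is not a multiple of 8 is zero-padded on the right.
    """
    return np.packbits(np.asarray(bits, dtype=np.uint8), axis=1)
```

For example, `pack_rows([[1, 0, 0, 0, 0, 0, 0, 1]])` stores eight weights in the single byte 129, which is why a 0/1 representation is far more compact than storing each −1/1 value separately.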
Quantization of a neural network model may be performed by the processor 120 of the electronic apparatus 100 or by an external apparatus (e.g., a server). When quantization of a neural network model is performed by an external apparatus, the processor 120 may receive weight data including quantized weight values from the external apparatus, and store the weight data in the memory 110.
A neural network model as described above may be based on a neural network. For example, a neural network model may be based on a recurrent neural network (RNN), i.e., a kind of deep learning model for learning data that changes over time, such as time-series data. However, the disclosure is not limited thereto, and a neural network model may be based on various networks, such as a convolutional neural network (CNN), a deep neural network (DNN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a bidirectional recurrent deep neural network (BRDNN), etc.
Alternatively, the memory 110 may store a model generated based on rules rather than a model trained through an AI algorithm. That is, the model stored in the memory 110 is not particularly limited.
The processor 120 controls the overall operations of the electronic apparatus. The processor 120 may include one processor or a plurality of processors, which may be generic-purpose processors such as a CPU, graphics-dedicated processors such as a graphics processing unit (GPU), or AI-dedicated processors such as a neural network processing unit (NPU). The processor 120 may also be implemented as a System on Chip (SoC) (e.g., an on-device AI chip), large scale integration (LSI), or a field programmable gate array (FPGA).
The processor 120 may quantize weight values of a neural network model. Specifically, when quantizing weight values to k bits, the processor 120 may quantize weight values of a neural network model through various quantization algorithms satisfying Equation (1).
In Equation (1), w is a weight value before quantization and α is a scaling factor. Additionally, b is a quantized weight value, which may be −1 or +1.
The processor 120 may quantize a weight value through a greedy algorithm. The processor 120 may obtain a scaling factor and a quantized weight value for k=1 in Equation (1) based on Equation (2).
In Equation (2), w is a weight value before quantization and α* is a scaling factor when k=1. Additionally, b* is a quantized weight value when k=1, and it may be −1 or +1. Further, n may be an integer greater than or equal to 1.
The processor 120 may obtain a scaling factor and a quantized weight value for each subsequent bit i (1<i≤k) by repeatedly applying Equation (3). That is, starting from r, the difference between the weight value before quantization and the quantized weight value obtained for k=1, the processor 120 may obtain the scaling factor and the quantized weight value for the ith bit.
In Equation (3), w is a weight value before quantization and α is a scaling factor. Additionally, b is a quantized weight value, which may be −1 or +1. Further, r represents the difference between the weight value before quantization and the quantized weight value when k=1.
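Equations (1) to (3) are not reproduced in this text. The sketch below assumes the widely used greedy binary-coding form w ≈ α_1 b_1 + α_2 b_2 + ... + α_k b_k, in which each step sets b_i to the sign of the current residual r and α_i to the mean absolute value of r, applied row by row. It illustrates that assumed formulation and is not a reconstruction of the exact equations; the function names are hypothetical.

```python
import numpy as np

def greedy_quantize(W, k):
    """Greedy k-bit binary-coding quantization, row by row (assumed formulation).

    Approximates each row w of W as sum_i alpha_i * b_i with b_i in {-1, +1}^n.
    Returns alphas with shape (k, m) and signs with shape (k, m, n).
    """
    W = np.asarray(W, dtype=np.float64)
    residual = W.copy()                                   # r starts as the unquantized weights
    alphas, signs = [], []
    for _ in range(k):
        b = np.where(residual >= 0, 1, -1).astype(np.int8)  # b_i = sign(r), sign(0) taken as +1
        alpha = np.abs(residual).mean(axis=1)                # alpha_i = mean |r| for each row
        residual = residual - alpha[:, None] * b             # r <- r - alpha_i * b_i
        alphas.append(alpha)
        signs.append(b)
    return np.stack(alphas), np.stack(signs)

def dequantize(alphas, signs):
    """Reconstruct the approximation sum_i alpha_i * b_i for every row."""
    return np.einsum("km,kmn->mn", alphas, signs.astype(np.float64))
```

With this choice of α_i and b_i, each step reduces the squared residual of a row by n·α_i², so adding bits never increases the approximation error.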
The electronic apparatus may store weight data including a scaling factor, and weight values quantized to −1 or 1 in the memory 110. Although
The processor 120 may derive output data from the input data based on quantized weight values of a neural network model. The input data may be text, an image, a user voice, etc. For example, the text may be text input through an input such as a keyboard or a touch pad, and the image may be an image photographed through a camera of the electronic apparatus. The user voice may be spoken into a microphone of the electronic apparatus.
The output data may be different according to the kind of input data and/or the neural network model. That is, the output data may differ according to what kind of input data is input into what kind of neural network model. For example, when the neural network model is for language translation, the processor 120 may derive output data expressed in a second language from input data expressed in a first language. When the neural network model is for image analysis, the processor 120 may receive an image as input data of the neural network model, and derive information on an object detected from the image as output data. When the neural network model is for voice recognition, the processor 120 may receive a user voice as input data, and derive text corresponding to the user voice as output data. The aforementioned examples of output data are not limiting, and the kinds of output data may vary.
When input data is received, the processor 120 may express the input data as a matrix (or a vector or a tensor) including a plurality of input values. The method for doing so may vary according to the kind and type of the input data. For example, when the input data is text (or text converted from a user voice), the processor 120 may express the text as a vector through one-hot encoding or word embedding. One-hot encoding expresses the value at the index of a specific word as 1 and the values at all remaining indices as 0, and word embedding expresses a word as a real-valued vector whose dimension is set by a user (e.g., 128 dimensions). Word2Vec, FastText, GloVe, etc., may be used for word embedding. When the input data is an image, the processor 120 may express the pixels of the image as a matrix. For example, the processor 120 may express each pixel as values of 0 to 255 for each of the red, green, and blue (RGB) colors, or may express the image as a matrix of values obtained by dividing those 0-to-255 values by a predetermined value (e.g., 255).
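The two input encodings mentioned above, one-hot vectors for words and scaled pixel values for images, can be sketched in a few lines of NumPy. The vocabulary in the usage example is hypothetical, and real systems would more likely use a learned word embedding than one-hot vectors for large vocabularies.

```python
import numpy as np

def one_hot(index, vocab_size):
    """Vector with 1 at the word's index and 0 at every other index."""
    v = np.zeros(vocab_size, dtype=np.float32)
    v[index] = 1.0
    return v

def encode_words(words, vocab):
    """Stack one one-hot row per word; `vocab` maps word -> index (hypothetical)."""
    return np.stack([one_hot(vocab[w], len(vocab)) for w in words])

def normalize_pixels(image_uint8, scale=255.0):
    """Divide 0-255 channel values by a predetermined value (255 by default)."""
    return np.asarray(image_uint8, dtype=np.float32) / scale
```

For instance, with the hypothetical vocabulary `{"a": 0, "b": 1, "c": 2}`, `encode_words(["b", "a"], vocab)` yields the 2×3 matrix with rows `[0, 1, 0]` and `[1, 0, 0]`.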
The processor 120 may derive at least one piece of intermediate data from the input data based on the quantized weight values and the input values of the input data, and then derive output data from the at least one piece of intermediate data.
Unlike a conventional electronic apparatus, which derives output data from input data by performing matrix multiplication (matmul) operations on a plurality of quantized weight values and a plurality of input values, the processor 120 may derive output data from input data by using a lookup table. This avoids the latency that conventionally arises when many matmul operations are performed, and also avoids memory overload.
As described above with reference to
The processor 120 may generate a lookup table based on input values of input data and binary data. The binary data may include n bit values, each having a value of 0 or 1, so that the number of distinct binary data items may be 2^n. For example, binary data of 2 bits may include two bit values, each having a value of 0 or 1, and may be 00, 01, 10, or 11. As another example, binary data of 4 bits may include four bit values, such as 0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101, 1110, or 1111.
The processor 120 may obtain n input values from each column of an input matrix, and obtain operation data based on the bit values of the binary data and the obtained n input values. As described above, each bit value of the binary data is 0 or 1; the processor 120 may apply −1 to the corresponding input value (i.e., negate it) when the bit value is 0, apply 1 to the corresponding input value when the bit value is 1, and sum the results. Thereafter, the processor 120 may generate a lookup table by matching the obtained operation data with each of the binary data.
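A direct way to build one lookup table from n input values, following the description above, is sketched below. It assumes that the first bit of each binary data item pairs with the first input value, so that for n=2 the entry for binary data 10 holds x1 − x2 and the entry for 01 holds −x1 + x2; this matches the sign symmetry of the 0.20 and −0.20 entries of the first lookup table 311 in the worked example. Binary data items are represented as integer indices 0 to 2^n − 1, and `build_lut` is a hypothetical name.

```python
import numpy as np

def build_lut(x):
    """Build a lookup table for the n input values in x.

    Entry p (0 <= p < 2**n) holds sum_j s_j * x[j], where s_j = +1 if bit j of p
    (counting from the most significant bit, so the first bit pairs with x[0]) is 1,
    and s_j = -1 if that bit is 0.
    """
    x = np.asarray(x, dtype=np.float64)
    n = x.size
    lut = np.empty(2 ** n)
    for p in range(2 ** n):
        total = 0.0
        for j in range(n):
            bit = (p >> (n - 1 - j)) & 1
            total += x[j] if bit else -x[j]   # bit 1 -> +x_j, bit 0 -> -x_j
        lut[p] = total
    return lut
```

For example, `build_lut([1.0, 2.0])` gives `[-3.0, 1.0, -1.0, 3.0]` for the binary data 00, 01, 10, and 11. Because complementing every bit negates every term, entry p and entry 2^n − 1 − p always have opposite signs.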
Referring to
The processor 120 may generate at least one lookup table for each column of the input matrix. As described above, when the input matrix is a 4×3 matrix, and n=2, the processor 120 may generate two lookup tables for each column of the input matrix, i.e., six lookup tables in total as illustrated in
As described above, the processor 120 may derive output data based on quantized weight values and input values of input data. Specifically, the processor 120 may derive output data for input data X based on an operation of weight data W (this may include a scaling factor A and a quantized weight value B) and the input data X. When weight values of the weight data are quantized to 3 bits, the processor 120 may derive output data from input data X based on Equation (4) below.
$$WX \approx (A_0 B_0 + A_1 B_1 + A_2 B_2)\,X \qquad (4)$$
Equation (4) may be expressed in matrix form, as illustrated in
Referring to
Referring to
The processor 120 may identify n weight values corresponding to n input values in each row of a weight matrix including weight values of 0 or 1. That is, when lookup tables are generated by obtaining n input values in each column of the input matrix, the processor 120 may identify n weight values corresponding to n input values in each row of the weight matrix. When n=2, the processor 120 may identify two weight values corresponding to two input values for each row in a matrix including weight values.
Referring to
The processor 120 may identify binary data corresponding to the identified weight values among the binary data. The identified binary data includes the same bit values as the weight values. As illustrated in
The processor 120 may obtain, from a lookup table, the operation value corresponding to the identified binary data. More specifically, the processor 120 may determine, among the plurality of lookup tables 311 and 312, the lookup table that includes the operation values corresponding to the identified weight values. If the identified weight values are included in the kth columns of the weight matrix, the processor 120 may determine, as that lookup table, the lookup table generated based on the input values of the kth rows of the input matrix. For example, if the identified weight values are included in the first and second columns of the weight matrix, the processor 120 may determine the first lookup table 311, generated based on the input values of the first and second rows of the input matrix, as the lookup table including the corresponding operation values. If the identified weight values are included in the third and fourth columns of the weight matrix, the processor 120 may determine the second lookup table 312, generated based on the input values of the third and fourth rows of the input matrix, as the lookup table including the corresponding operation values.
The processor 120 may obtain the operation value 0.20 matched with the binary data 10 identified from the first lookup table 311, obtain the operation value −0.37 matched with the binary data 01 identified from the second lookup table 312, and obtain the y1 value of the output matrix by summing the operation values 0.20 and −0.37 (i.e., −0.17). For the second row of the weight matrix, the processor 120 may obtain the operation value −0.20 matched with the binary data 01 identified from the first lookup table 311, obtain the operation value 0.37 matched with the binary data 10 identified from the second lookup table 312, and obtain the y4 value of the output matrix by summing the operation values −0.20 and 0.37 (i.e., 0.17). For the nth row of the weight matrix, the processor 120 may obtain an output value in a similar manner.
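The lookup procedure described above (form the binary data from n weight bits of a row, fetch the matched operation value from the lookup table built on the corresponding n input rows, and sum over the lookup tables of a column) can be sketched end to end and checked against an ordinary matrix product with weights in {−1, +1}. The most-significant-bit-first ordering, the `lut_matmul` name, and the vectorized table construction are assumptions of this sketch.

```python
import numpy as np

def build_lut(x):
    """LUT over all 2**n sign patterns; the MSB of the index pairs with x[0]."""
    n = len(x)
    idx = np.arange(2 ** n)[:, None]
    shifts = np.arange(n - 1, -1, -1)[None, :]
    signs = np.where((idx >> shifts) & 1, 1.0, -1.0)   # bit 1 -> +1, bit 0 -> -1
    return signs @ np.asarray(x, dtype=np.float64)

def lut_matmul(B_bits, X, n=2):
    """Compute (2*B_bits - 1) @ X with table lookups instead of multiplications.

    B_bits: (m, K) array of 0/1 weight bits; X: (K, c) input matrix; K divisible by n.
    """
    m, K = B_bits.shape
    assert K % n == 0 and X.shape[0] == K
    place = 1 << np.arange(n - 1, -1, -1)               # bit place values, MSB first
    # Binary data for each weight row and each group of n columns: shape (m, K // n)
    codes = (B_bits.reshape(m, K // n, n) * place).sum(axis=2)
    Y = np.zeros((m, X.shape[1]))
    for col in range(X.shape[1]):                       # lookup tables per input column
        for g in range(K // n):                         # one table per group of n input rows
            lut = build_lut(X[g * n:(g + 1) * n, col])
            Y[:, col] += lut[codes[:, g]]               # lookup, then accumulate
    return Y
```

With a 4×3 input matrix and n=2, this builds two tables per column (six in total), as in the example above. Only additions and table reads touch the input values; the multiplications by ±1 disappear.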
Referring to
Referring to
After the output values of the output matrix are obtained, the processor 120 may perform an operation on the output matrix and the scaling factor, and thereby obtain the result value of Equation (4). The processor 120 may output the final output data by using the obtained result value. As described above, if the neural network model is for language translation, the output data may be text in a language different from that of the input text, and if the neural network model is for image analysis, the output data may include information on objects included in an input image. However, the output data is not limited to these examples.
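Equation (4) can be applied after the lookup step by scaling each per-bit output matrix B_i X and summing. The sketch assumes row-wise scaling factors, i.e., that each A_i acts as a diagonal matrix on the rows of B_i X; the per-bit products are passed in as already computed (for example, by a lookup-table routine), and the function name is hypothetical.

```python
import numpy as np

def apply_scaling(alphas, binary_products):
    """Combine per-bit products B_i X with per-row scaling factors, per Equation (4).

    alphas: (k, m) array, the diagonal of each A_i (row-wise scaling is assumed);
    binary_products: (k, m, c) array holding B_i X for each bit plane i.
    Returns sum_i A_i (B_i X), an (m, c) array.
    """
    alphas = np.asarray(alphas, dtype=np.float64)
    binary_products = np.asarray(binary_products, dtype=np.float64)
    return np.einsum("km,kmc->mc", alphas, binary_products)
```

Because each A_i is diagonal, scaling after the lookups gives the same result as first forming (A_0 B_0 + A_1 B_1 + A_2 B_2) and multiplying by X, while keeping the expensive part of the computation in the binary domain.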
As described above, output values are obtained through lookup tables without performing matmul operations on the quantized weight values and the input values of the input data. Thus, the latency caused by a large number of operations, as well as memory overload, can be avoided.
Referring to
Referring to
Referring to
Referring to
Referring to
Although
As described above, when operation expressions share the same intermediate operation expression, one operation expression may be computed by reusing the operation value of another, and accordingly, the number of operations the processor performs to generate lookup tables can be greatly reduced.
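One way to exploit shared intermediate expressions, assumed here for illustration, follows from the fact that two binary data items differing in a single bit produce operation values differing by exactly 2·x_j. Each entry can then be derived from a previously computed entry with one addition, so a table of 2^n entries costs about 2^n additions instead of n − 1 additions per entry when every sum is formed from scratch. The specific recurrence below is an assumption; the figures of the disclosure may organize the reuse differently.

```python
import numpy as np

def build_lut_incremental(x):
    """Build the 2**n-entry LUT (MSB of the index pairs with x[0]) by reusing entries.

    Entry p differs from entry p-with-its-highest-set-bit-cleared only in the sign of
    one input value, so it is obtained from that earlier entry with a single addition.
    """
    x = np.asarray(x, dtype=np.float64)
    n = x.size
    lut = np.empty(2 ** n)
    lut[0] = -x.sum()                      # binary data 00...0: every input negated
    twice = 2.0 * x                        # changing -x_j to +x_j adds 2*x_j
    for p in range(1, 2 ** n):
        hb = p.bit_length() - 1            # position of the highest set bit of p
        lut[p] = lut[p ^ (1 << hb)] + twice[n - 1 - hb]
    return lut
```

For `x = [1.0, 2.0]` this produces `[-3.0, 1.0, -1.0, 3.0]`, the same table as summing each entry directly. A further saving, also only an option, is to compute half of the table and fill the rest by negation, since complementary binary data give opposite values.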
Referring to
Specifically, the processor 120 may divide the input matrix into a first input matrix and a second input matrix based on predetermined rows, and divide the weight matrix into a third weight matrix and a fourth weight matrix based on predetermined columns. The predetermined rows may be n/2 rows if the number of rows of the input matrix is n, and the predetermined columns may be n/2 columns if the number of columns of the weight matrix is n. However, the examples are non-limiting.
The processor 120 may generate a plurality of lookup tables based on the input values of each column of the first input matrix, obtain operation data corresponding to each row of the third weight matrix from the plurality of lookup tables, generate a plurality of lookup tables based on the input values of each column of the second input matrix, and obtain operation data corresponding to each row of the fourth weight matrix from the plurality of lookup tables. The aforementioned technical idea can be applied to the method of generating lookup tables and obtaining operation data based on the lookup tables, and thus detailed explanation will be omitted.
When the number of rows of the input matrix and the number of columns of the weight matrix are 512, respectively, the processor 120 may divide the input matrix into an input matrix X1 and an input matrix X2 based on 256 rows, and divide the weight matrix into a weight matrix W1 and a weight matrix W2 based on 256 columns. The processor 120 may generate lookup tables based on the input values of the input matrix X1, obtain operation values corresponding to the weight matrix W1 from the lookup tables, generate lookup tables based on the input values of the input matrix X2, and obtain operation values corresponding to the weight matrix W2 from the lookup tables.
As described above, a plurality of operation values are obtained incrementally through divided matrices, and accordingly, the problem of memory overload may be avoided and the memory may be effectively used.
When the processor 120 is implemented as a plurality of processors, operation values corresponding to the weight matrix W1 may be obtained from the lookup tables based on the input matrix X1 and operation values corresponding to the weight matrix W2 may be obtained from the lookup tables based on the input matrix X2 in parallel through the plurality of processors. Accordingly, the time spent for the operations can be reduced.
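The division into X1/X2 and W1/W2 relies on the identity W·X = W1·X1 + W2·X2 when W is split by columns and X by rows at the same boundary. The sketch below checks this with a plain ±1 matrix product standing in for the lookup-table routine (`partial_product` is a hypothetical placeholder) and can optionally run the blocks on a thread pool to mirror the multi-processor case. Whether threads speed this up in Python depends on the underlying routine releasing the interpreter lock, so the sketch illustrates the structure rather than the performance.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def partial_product(B_bits, X):
    """Placeholder for the LUT-based product of one weight block and one input block."""
    return (2 * B_bits.astype(np.int64) - 1) @ X

def split_matmul(B_bits, X, parts=2, parallel=False):
    """Split weight columns / input rows into `parts` blocks and sum the partial products."""
    K = X.shape[0]
    bounds = np.linspace(0, K, parts + 1, dtype=int)          # e.g., [0, 256, 512] for K=512
    blocks = [(B_bits[:, a:b], X[a:b]) for a, b in zip(bounds[:-1], bounds[1:])]
    if parallel:
        with ThreadPoolExecutor(max_workers=parts) as pool:
            partials = list(pool.map(lambda blk: partial_product(*blk), blocks))
    else:
        partials = [partial_product(Wb, Xb) for Wb, Xb in blocks]
    return sum(partials)                                      # W1.X1 + W2.X2 + ...
```

With 512 rows and `parts=2`, the blocks correspond to X1/W1 (first 256) and X2/W2 (last 256), and only one block's lookup tables need to be held in memory at a time in the sequential case.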
Referring to
The first memory 110 may store an input matrix, a scaling factor for operations of a neural network model, and a weight matrix. The input matrix may include a plurality of input values, and the weight matrix may include weight values quantized to 0 or 1 as described above.
The LUT generator 130 may load the input matrix from the first memory 110 and obtain, for each of the binary data, an operation value for the input values of the input matrix. Specifically, when generating lookup tables for binary data of n bits, the LUT generator 130 may obtain n input values from each column of the input matrix, and obtain an operation value for each of the binary data based on the binary data and the n input values. The LUT generator 130 may store each lookup table together with information on the column and rows of the input matrix on which that lookup table is based. The column information may be used to determine, among the plurality of lookup tables generated for the columns of the input matrix, the lookup tables corresponding to each column of the output matrix. The row information may be used to determine, among the lookup tables corresponding to each column of the output matrix, the lookup table including the operation values corresponding to each column of the weight matrix.
The LUT generator 130 may generate lookup tables based on binary data of 8 bits. Specifically, the LUT generator 130 may obtain eight input values from each column of the input matrix, and obtain operation data for each of the binary data based on the 8-bit binary data and the eight input values. This takes into account that the processor 120 (e.g., a CPU) processes data in byte units; because no shift operations are needed to process data at the bit level, overload of the processor may be prevented.
The second memory 140 may store at least one lookup table. The second memory 140 may be a scratch pad memory (SPM) that temporarily stores data such as a lookup table.
The processor 120 may load the weight matrix from the first memory 110, and load lookup tables from the second memory 140. The processor 120 may obtain operation values of the weight values of the weight matrix from the lookup tables, accumulate the operation values in the accumulator, and obtain output values of the output matrix (i.e., the weight matrix B*the input matrix X) based on the summation of the operation values accumulated in the accumulator. The processor 120 may store information on the output values of the output matrix in the first memory 110. Afterwards, the multiplier 150 may load the output values stored in the first memory 110 and the scaling factor, and perform a multiplication operation of the output values and the scaling factor.
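The pipeline of the LUT generator 130, the accumulator, and the multiplier 150 with 8-bit binary data can be sketched in software as below. Packing eight weight bits into one byte lets the byte itself serve as the index into a 256-entry table, so no per-bit shifting is needed at lookup time. Per-row scaling factors, `np.packbits` with its default most-significant-bit-first order, and the function names are assumptions of this sketch, which stands in for the hardware blocks rather than modeling them.

```python
import numpy as np

def lut256(x8):
    """256-entry LUT for 8 input values; the MSB of the index pairs with x8[0]."""
    idx = np.arange(256)[:, None]
    shifts = np.arange(7, -1, -1)[None, :]
    signs = np.where((idx >> shifts) & 1, 1.0, -1.0)
    return signs @ np.asarray(x8, dtype=np.float64)

def byte_lut_layer(B_bits, X, alpha):
    """Generator / accumulator / multiplier pipeline with 8-bit binary data.

    B_bits: (m, K) 0/1 weights, K a multiple of 8; X: (K, c) inputs;
    alpha: (m,) per-row scaling factor (assumed layout).
    """
    m, K = B_bits.shape
    assert K % 8 == 0 and X.shape[0] == K
    packed = np.packbits(B_bits.astype(np.uint8), axis=1)   # (m, K // 8): one byte per 8 weights
    acc = np.zeros((m, X.shape[1]))                          # accumulator for B X
    for col in range(X.shape[1]):
        for g in range(K // 8):
            table = lut256(X[8 * g:8 * g + 8, col])          # LUT generator
            acc[:, col] += table[packed[:, g]]               # the byte value is the table index
    return np.asarray(alpha)[:, None] * acc                  # multiplier: apply scaling factor
```

The design trade-off is table size: 8-bit binary data needs 256 entries per group of eight input values, compared with 4 entries for 2-bit data, in exchange for byte-aligned indexing.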
Although
In the above-described embodiments, lookup tables are generated based on input values of input data. However, an electronic apparatus according to an embodiment may generate lookup tables based on weight values of weight data by applying the aforementioned method in a reverse way. That is, the processor 120 may leave weight values as they are (i.e., processing them as real number values), and quantize input values of the input data. Thereafter, the processor 120 may generate lookup tables wherein operation values are matched with each of the binary data based on the weight values and the binary data of n bits, and obtain operation data corresponding to the input data from the lookup tables. Such lookup tables based on weight data may be used in operations of a language model wherein the size of input data is small and the size of weight data is big. The lookup tables based on input data described above may be used in operations of an image model wherein the size of input data is big and the size of weight data is small.
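The reverse arrangement can be sketched by swapping roles: lookup tables are built from n consecutive real-valued weights of each row, and the binary data used as indices comes from input values quantized to 0/1 (standing for −1/+1). The function names and the grouping are hypothetical; the result is compared against W·(2·X_bits − 1).

```python
import numpy as np

def build_lut(v):
    """LUT over all 2**n sign patterns of the n values in v (MSB pairs with v[0])."""
    n = len(v)
    idx = np.arange(2 ** n)[:, None]
    shifts = np.arange(n - 1, -1, -1)[None, :]
    return np.where((idx >> shifts) & 1, 1.0, -1.0) @ np.asarray(v, dtype=np.float64)

def weight_lut_matmul(W, X_bits, n=4):
    """Compute W @ (2*X_bits - 1) with LUTs built from real-valued weights.

    W: (m, K) real weights; X_bits: (K, c) inputs quantized to 0/1; K % n == 0.
    """
    m, K = W.shape
    assert K % n == 0 and X_bits.shape[0] == K
    place = 1 << np.arange(n - 1, -1, -1)
    # Binary data for each group of n input rows and each input column: shape (K // n, c)
    codes = (X_bits.reshape(K // n, n, -1) * place[None, :, None]).sum(axis=1)
    Y = np.zeros((m, X_bits.shape[1]))
    for row in range(m):
        for g in range(K // n):
            lut = build_lut(W[row, g * n:(g + 1) * n])   # table from real-valued weights
            Y[row] += lut[codes[g]]                       # indexed by quantized inputs
    return Y
```

Here the number of tables grows with the weight matrix rather than the input, which is consistent with the text's suggestion that this variant suits models with small inputs and large weight data.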
Referring to
In step S720, the electronic apparatus generates lookup tables in which the operation data is matched with the binary data.
In step S730, the electronic apparatus acquires operation data corresponding to the weight data from the lookup tables. The weight data may include the plurality of weight values of the matrix. The electronic apparatus may identify n weight values corresponding to the n input values in each row of the weight matrix, and identify binary data corresponding to the identified n weight values among the binary data. The electronic apparatus may obtain operation data corresponding to the identified binary data from the lookup tables.
In step S740, the electronic apparatus performs operations of the neural network model based on the obtained operation data.
Methods according to the aforementioned various embodiments of the disclosure may be implemented in the form of software or an application that can be installed on conventional display apparatuses.
A non-transitory computer readable medium storing a program sequentially performing the controlling method of an electronic apparatus according to the disclosure may also be provided.
A non-transitory computer readable medium refers to a medium that stores data semi-permanently and is readable by machines, as opposed to a medium that stores data for a short moment, such as a register, a cache, or a memory. Specifically, the aforementioned various applications or programs may be provided while being stored in a non-transitory computer readable medium such as a compact disc (CD), a digital versatile disc (DVD), a hard disc, a Blu-ray disc, a universal serial bus (USB) memory, a memory card, a ROM, etc.
While the disclosure has been particularly shown and described with reference to certain embodiments thereof, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the disclosure as defined by the following claims and their equivalents.
Number | Date | Country | Kind |
---|---|---|---|
10-2020-0026010 | Mar 2020 | KR | national |