This application claims the benefit under 35 USC § 119(a) of Korean Patent Application No. 10-2023-0196506, filed on Dec. 29, 2023, in the Korean Intellectual Property Office, the entire disclosure of which is incorporated herein by reference for all purposes.
The following disclosure relates to a method and apparatus with variable parameter expression.
Number formats used in deep learning quantization include fixed point and floating point formats. Fixed point is a number representation that uses all bits to express the mantissa, with an implicit fixed exponent value. Floating point is a number representation that expresses a value with a fixed-length exponent and a mantissa, where the significand, expressed without considering the position of the decimal point, and the exponent, which indicates that position, are stored separately.
Compared to fixed point representation, floating point representation may express a larger range of numbers and may have a larger bit range, but its computation may be slow enough that a separate floating point arithmetic unit is often used.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
In one general aspect, a neural processor apparatus includes: a comparator configured to read a value of a fixed-length exponent of a previously-converted parameter value and obtain mantissa-length information of a mantissa, wherein the mantissa-length information is obtained from a mapping table based on being mapped to the value of the exponent; a shifter configured to read the mantissa of the previously-converted parameter value and use the mantissa-length information to convert a structure of the previously-converted parameter value; and the mapping table, in which the mantissa-length information of the mantissa is mapped to the value of the exponent.
In the mapping table, the mapping of the mantissa-length information to the value of the exponent may be determined according to a distribution of neural network parameters.
The shifter may be further configured to shift the position of the value of the mantissa based on the mantissa-length information of the mantissa.
The shifter may be further configured to add “0” according to a number of bits allocated to express the previously-converted parameter value.
Based on a method of expressing the previously-converted parameter value, the comparator may be further configured to obtain exponent shift information of a fixed-length real part mapped to a value of the real part of the parameter value, and the shifter may be further configured to shift the value of the real part according to the exponent shift information of the real part.
The shifter may be further configured to sum a value of a decimal part of the parameter value and the value of the real part shifted by the exponent shift information of the real part.
The comparator and the shifter may be configured to read the exponent and the mantissa of the previously-converted parameter from a static random-access memory (SRAM), respectively.
For a multiply and accumulate (MAC) operation, the parameter value with the converted structure and the value of the exponent may be input into an arithmetic unit.
In another aspect, an operating method of a processor apparatus includes: reading parameters of a trained model; based on a distribution of the parameters, determining a mapping table mapping mantissa-lengths to respectively corresponding fixed-length exponents; and based on the mapping table, converting values of the parameters to have a number format including a fixed-length exponent and a variable-length mantissa.
In the mapping table, the mantissa-lengths may be determined according to the distribution of the parameters.
The operating method may further include: recording into a memory the exponent values and the mantissa values of the parameters as converted to the number format.
The converting of the values of the parameters to the number format including the fixed-length exponent value and the variable-length mantissa may include converting remaining values after the first occurrence of a “1” in the variable-length mantissa to the number format.
A non-transitory computer-readable storage medium may store instructions that, when executed by a processor, cause the processor to perform any of the operating methods.
In another general aspect, a processor apparatus includes: one or more processors; a memory; and one or more programs stored in the memory and configured to be executed by the one or more processors to cause the one or more processors to perform a process including: reading parameters of a trained model; based on a statistical distribution of the parameters, determining a mapping table mapping mantissa-lengths to respectively corresponding fixed-length exponents; and based on the mapping table, converting values of the parameters to have a number format including a fixed-length exponent and a variable-length mantissa, wherein the converting is performed according to the values of the parameters.
In the mapping table, the mantissa-lengths may be determined according to the statistical distribution of the parameters.
The process may further include: recording into the memory the exponent values and the mantissa values of the parameters as converted to the number format.
The converting of the values of the parameters to the number format including the fixed-length exponent value and the variable-length mantissa may include converting remaining values after the first occurrence of a “1” in the variable-length mantissa to the number format. Other features and aspects will be apparent from the following detailed description, the drawings, and the claims.
Throughout the drawings and the detailed description, unless otherwise described or provided, the same or like drawing reference numerals will be understood to refer to the same or like elements, features, and structures. The drawings may not be to scale, and the relative size, proportions, and depiction of elements in the drawings may be exaggerated for clarity, illustration, and convenience.
The following detailed description is provided to assist the reader in gaining a comprehensive understanding of the methods, apparatuses, and/or systems described herein. However, various changes, modifications, and equivalents of the methods, apparatuses, and/or systems described herein will be apparent after an understanding of the disclosure of this application. For example, the sequences of operations described herein are merely examples, and are not limited to those set forth herein, but may be changed as will be apparent after an understanding of the disclosure of this application, with the exception of operations necessarily occurring in a certain order. Also, descriptions of features that are known after an understanding of the disclosure of this application may be omitted for increased clarity and conciseness.
The features described herein may be embodied in different forms and are not to be construed as being limited to the examples described herein. Rather, the examples described herein have been provided merely to illustrate some of the many possible ways of implementing the methods, apparatuses, and/or systems described herein that will be apparent after an understanding of the disclosure of this application.
The terminology used herein is for describing various examples only and is not to be used to limit the disclosure. The articles “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. As used herein, the term “and/or” includes any one and any combination of any two or more of the associated listed items. As non-limiting examples, terms “comprise” or “comprises,” “include” or “includes,” and “have” or “has” specify the presence of stated features, numbers, operations, members, elements, and/or combinations thereof, but do not preclude the presence or addition of one or more other features, numbers, operations, members, elements, and/or combinations thereof.
Throughout the specification, when a component or element is described as being “connected to,” “coupled to,” or “joined to” another component or element, it may be directly “connected to,” “coupled to,” or “joined to” the other component or element, or there may reasonably be one or more other components or elements intervening therebetween. When a component or element is described as being “directly connected to,” “directly coupled to,” or “directly joined to” another component or element, there can be no other elements intervening therebetween. Likewise, expressions, for example, “between” and “immediately between” and “adjacent to” and “immediately adjacent to” may also be construed as described in the foregoing.
Although terms such as “first,” “second,” and “third”, or A, B, (a), (b), and the like may be used herein to describe various members, components, regions, layers, or sections, these members, components, regions, layers, or sections are not to be limited by these terms. Each of these terminologies is not used to define an essence, order, or sequence of corresponding members, components, regions, layers, or sections, for example, but used merely to distinguish the corresponding members, components, regions, layers, or sections from other members, components, regions, layers, or sections. Thus, a first member, component, region, layer, or section referred to in the examples described herein may also be referred to as a second member, component, region, layer, or section without departing from the teachings of the examples.
Unless otherwise defined, all terms, including technical and scientific terms, used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains and based on an understanding of the disclosure of the present application. Terms, such as those defined in commonly used dictionaries, are to be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and the disclosure of the present application and are not to be interpreted in an idealized or overly formal sense unless expressly so defined herein. The use of the term “may” herein with respect to an example or embodiment, e.g., as to what an example or embodiment may include or implement, means that at least one example or embodiment exists where such a feature is included or implemented, while all examples are not limited thereto.
The number format described below is a general-purpose number format with a diverse range of expressions. The number format may have the form shown in the corresponding figure: a fixed-length exponent followed by a variable-length mantissa.
A mapping table may be determined depending on the range and space of a set of data values to be expressed with the number format, and the range and space of the data values to be expressed may be determined differently depending on the application.
A method of expressing a data value in the format shown in the figure is described next.
Since the length of the exponent is fixed, when the number of possible exponent cases is e, ceil(log2 e) bits are needed (here, e is a count rather than the mathematical constant e). For example, if the number of exponent cases is “8”, then “3” bits are needed.
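As a minimal illustration of the bit count above (a sketch for clarity, not the claimed implementation), the following computes the number of exponent bits for a given number of exponent cases:

```python
import math

# Bits needed for a fixed-length exponent with e possible cases
# (e is a count here, not Euler's number).
def exponent_bits(e: int) -> int:
    return math.ceil(math.log2(e))

print(exponent_bits(8))  # -> 3
```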
In the mantissa, the numbers after the first-occurring “1” may be continued after the exponent as they are, and “0” may be stored in the mapping table as a predetermined exponent code. Depending on the value of the exponent, the number of bits the mantissa has may be indicated by a separate mapping table or may be fixed in hardware.
A neural processor apparatus (hereinafter, the “apparatus”) may convert parameters of a neural network, e.g., to the format described above.
In operation 210, the neural processor apparatus reads the parameters of a trained model. The parameters may correspond to parameters of at least one layer of the trained model, for example. In some embodiments, the parameters may be weights of respective inter-node connections.
In operation 220, the neural processor apparatus determines a mapping table containing mantissa-length information of mantissas respectively corresponding to fixed-length exponents. That is to say, the mapping table may map different fixed-length exponents to respectively corresponding mantissa-lengths. The mantissa-length information of a mantissa may indicate the number of bits with which to express a corresponding mantissa.
In the mapping table, the mantissa-lengths mapped to the exponents may be automatically determined based on the distribution of the parameters, or may be determined through experimental analysis. For example, for parameters with a high distribution (i.e., more frequently occurring values), the mapping table may map the corresponding exponents to greater mantissa lengths; more mantissa bits may be allocated so that more net mantissa information is expressed, while small numbers may be omitted within large numbers.
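The following is a hypothetical sketch of one way such a table could be derived from the parameter distribution; the frequency-ranking heuristic and the bit budgets (max_bits, min_bits) are assumptions for illustration, not the disclosed method:

```python
from collections import Counter

# Hypothetical heuristic: exponents that occur more often in the parameter
# distribution are mapped to longer mantissas; rarer exponents get fewer bits.
def build_mapping_table(exponents, max_bits=8, min_bits=2):
    ranked = [e for e, _ in Counter(exponents).most_common()]
    return {e: max(min_bits, max_bits - rank) for rank, e in enumerate(ranked)}

table = build_mapping_table([15, 15, 15, 16, 16, 14])
print(table)  # -> {15: 8, 16: 7, 14: 6}
```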
In operation 230, the neural processor apparatus converts values of the parameters from their common source format (e.g., a standard format such as FP16) to a number format with a fixed-length exponent value and a variable-length mantissa. Regarding how the exponent may be converted, reduction may be based on the possible combinations of the exponent part. For example, if there are 8 possible values of the exponent part, 3-bit combinations such as 000, 001, 010, and so on may be generated, and a one-to-one mapping may be established for each, resulting in a new, shorter exponent.
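A minimal sketch of that exponent encoding, assuming the distinct exponent values occurring in the parameters are known in advance:

```python
import math

# Assign each distinct exponent value a short fixed-length binary code,
# one-to-one, as in the 8-cases -> 3-bit example above.
def build_exponent_codes(exponent_values):
    distinct = sorted(set(exponent_values))
    width = max(1, math.ceil(math.log2(len(distinct))))
    return {e: format(i, f"0{width}b") for i, e in enumerate(distinct)}

codes = build_exponent_codes(range(12, 20))  # 8 cases -> 3-bit codes
print(codes[15])  # -> '011'
```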
A parameter may be converted to the variable-length-mantissa number format described above.
Thereafter, the mantissas of respective parameters may be expressed according to the number of mantissa bits indicated by referring to the mapping table, which indicates how many bits (excluding the first occurrence of a “1”, from the left) are to be used to express the mantissas for the respectively corresponding exponent values. In this way, a number format may be formed with a set of the encoded exponent and the mantissa excluding the first “1”.
As a result of the conversion, a mapping table indicating the length of the mantissa, encoded bits, and values corresponding to the number format may be output, in cases where the mapping table is not computed or provided beforehand. Here, “value” is a value represented by the set that includes the encoded exponent part and the mantissa part excluding the first 1 (“value” does not refer to the mapped value).
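Under one reading of the conversion (a sketch; the 8-bit width is illustrative, and this reading is consistent with the 11001 shifter example described later), the bits after the first-occurring “1” are kept as-is, and their count is what the mapping table records for the exponent:

```python
# Keep only the bits after the first-occurring '1' of the mantissa; the
# mapping table records, per exponent code, how many such bits are kept.
# Assumes a nonzero mantissa (index() would fail on all zeros).
def convert_mantissa(mantissa_bits: str) -> str:
    i = mantissa_bits.index("1")   # position of the first '1'
    return mantissa_bits[i + 1:]   # remaining bits, stored as-is

print(convert_mantissa("00111001"))  # -> '11001' (5 bits for this exponent)
```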
To summarize, the mapping table may map specific statistically-determined bit patterns of exponents to respectively corresponding mantissa lengths. It is possible, depending on the distribution of exponent values, that the converted exponents are compressed as compared to their original form.
A neural network processor apparatus 300 may include a comparator 310 configured to read the 3-bit (for example) exponent of a previously-converted parameter (a parameter in the variable-mantissa format), use the exponent as an index into a mapping table 301, and thus obtain the mantissa-length information corresponding to (mapped to) the value of the exponent. A shifter 320 may be configured to read the mantissa (e.g., 7 bits) of the previously-converted parameter and convert the value of the mantissa back to its pre-conversion form based on the obtained mantissa-length information. For example, the 7-bit converted mantissa may be converted, by corresponding shifting, to its original 8-bit mantissa.
To summarize, the comparator 310 may obtain the mantissa-length information of the mantissa according to the value of its exponent by referring to the mapping table 301. Conversion of a parameter from the variable-length-mantissa format to the original/standard format may be referred to as reverse conversion, or, reversion.
The value of the exponent and the value of the mantissa (from the parameter in its converted (variable-length-mantissa) form) may be provided to the comparator 310 and the shifter 320, respectively, through a memory (e.g., static random-access memory (SRAM), etc.) in which the exponent and the mantissa are stored separately.
The comparator 310 may verify how much the mantissa is compressed by looking up the value of the exponent in the mapping table 301. The mapping table 301 indicates with how many bits the mantissa is expressed (in converted form), according to the value of the exponent. The mapping table 301 may indicate how many bits are needed to express the remaining values after the first occurrence of “1” in the mantissa before conversion. To elaborate, as explained above, based on the bit combinations generated according to the possible values of the exponent part, each is mapped on a one-to-one basis; using this mapping information, the encoded exponent may then be converted back to the original exponent part.
The shifter 320 may revert the mantissa by converting the value of the mantissa (from its previously-converted form) to the form the mantissa had before being converted (e.g., a fixed-length mantissa, possibly according to a standard floating point format) by shifting the value of the previously-converted mantissa according to the mantissa-length information mapped to the value of the exponent. To explain further, the mantissa read by the shifter has a fixed length (the maximum mantissa length). If the table indicates that a shorter mantissa length is needed, the shifter pads with zeros to reach the maximum length (the length increases, but only with leading zeros; the mantissa itself is unchanged). For shorter mantissa lengths, the read value may contain multiple mantissas of different values (data) that are not aligned (starting from the rightmost bit, the LSB). For example, if a 7-bit value 11001 01 is input and only 11001 is needed, the shifter shifts it (along with a leading 1) to produce 001 11001.
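A sketch of that reversion, matching the 11001 example above: re-insert the leading “1” and left-pad with zeros to the fixed maximum mantissa length (assumed here to be 8 bits):

```python
# Revert a variable-length mantissa: restore the leading '1' that was dropped
# during conversion, then left-pad with zeros up to the fixed maximum width.
def revert_mantissa(stored_bits: str, max_len: int = 8) -> str:
    return ("1" + stored_bits).rjust(max_len, "0")

print(revert_mantissa("11001"))  # -> '00111001'
```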
Thereafter, the converted value of the mantissa and the converted value of the exponent may be input into an arithmetic unit 330 and be in a form suitable for computation thereon by the arithmetic unit 330 (e.g., a standard form such as FP16). The arithmetic unit 330 may obtain the reverted pre-conversion parameter value using the values of the mantissa and exponent of the parameter as converted by the shifter 320, and, for example, may perform a MAC operation with the reverted parameter.
A structure for parameters converted by fixed point, shown in the corresponding figure, is described next.
A neural network processor apparatus 400 may include a comparator 410 configured to (i) read the real (data) part of a parameter converted by fixed point and (ii) obtain, from a mapping table 401, exponent shift information of the real part according to the value of the real part. The apparatus may also include a shifter 420 configured to read the decimal (fractional) part of the parameter converted by fixed point and convert the value to a pre-conversion parameter by summing the value of the decimal part and the value of the real part whose exponent has been shifted by the exponent shift information.
The mapping table 401 may map exponent shift information for the real part according to the value of the real part. The exponent shift information may be provided in a form where it decreases by one digit as the value of the real part decreases. For example, for an 8-bit parameter, a mapping table may be provided that decreases in the form of 1xxxxxxx, 1xxxxxx, 1xxxxx, . . . , 1 as the value of the real part decreases from 111 to 000.
Alternatively, the mapped value of the mapping table 401 may indicate the number of digits to be moved to the right of the real part, depending on the value of the real part. For example, if the value of the real part is 010, the value of the decimal part may be expressed after 1 digit of the real part.
The shifter 420 may shift the exponent of the real part based on the value of the real part and the exponent shift information of the real part obtained from the comparator 410, and obtain the 16-bit pre-conversion parameter by summing the value of the decimal part as it is.
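A hypothetical sketch of this fixed-point path; the bit widths and the interpretation of the exponent shift information as a left-shift amount are assumptions for illustration:

```python
# Shift the real (integer) part by the table-supplied amount, then sum the
# decimal (fractional) part as it is, yielding the pre-conversion value.
def revert_fixed_point(real_bits: str, decimal_bits: str, shift: int) -> int:
    return (int(real_bits, 2) << shift) + int(decimal_bits, 2)

# e.g., real part 010 shifted by a hypothetical table entry of 4:
print(revert_fixed_point("010", "1011", 4))  # -> 43 (0b101011)
```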
Information on the real part and the decimal part may be separately stored in a memory such as SRAM, and the separately stored information may be input into the comparator 410 and the shifter 420, respectively.
The reverted parameters, in their pre-conversion, computable form, may be input into an arithmetic unit 430 for a MAC operation. The arithmetic unit 430 may perform the MAC operation using the parameters reverted by the shifter 420.
A method of performing a MAC operation using a 16-bit parameter by an arithmetic unit will be described.
In the following example, a MAC operation is performed on a weight and an activation.
Assuming that the data format of the weight and activation is a binary format of half-precision floating point (FP16, as per an IEEE standard, for example), the weight may be assumed to be 1.25 (0011_1101_0000_0000), and the activation may be assumed to be 2.125 (0100_0000_1000_0000).
Analyzing the example weight: the sign is indicated by the leftmost (most significant) bit and is 0; the exponent is indicated by the next 5 bits and is 01111; and the mantissa is represented by the remaining bits and is 0100000000. This value may be expressed as 01111 for the exponent and 0000000101 for the converted mantissa in a mapping table. From the mapping table, the mantissa may be converted to 01 and stored in a memory.
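The field split above can be checked with a short script (standard FP16 layout: 1 sign bit, 5 exponent bits, 10 mantissa bits); this is only a verification aid, not part of the disclosed apparatus:

```python
import struct

# Split a Python float into FP16 sign / exponent / mantissa bit fields.
def fp16_fields(x: float):
    (bits,) = struct.unpack("<H", struct.pack("<e", x))  # raw FP16 bits
    return bits >> 15, format((bits >> 10) & 0x1F, "05b"), format(bits & 0x3FF, "010b")

print(fp16_fields(1.25))   # -> (0, '01111', '0100000000')
print(fp16_fields(2.125))  # -> (0, '10000', '0001000000')
```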
To be input into an arithmetic unit, the pre-conversion parameter is obtained from the converted parameter. The exponent 01111 may be obtained from the converted parameter, and it may be verified that the shift information of the mantissa is 8 digits according to the mapping table.
After converting the weight back to the pre-conversion value, an operation may be performed as described next.
An XOR operation may be performed on the sign, an addition may be performed on the exponent, and a multiplication may be performed on the mantissa. Thereafter, an output may be obtained by performing normalization and rounding on the operation result of the exponent and the operation result of the mantissa.
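A simplified sketch of that multiply path (normal numbers only; truncation instead of round-to-nearest, and no subnormal, infinity, or NaN handling):

```python
# Multiply two FP16 bit patterns: XOR the signs, add the biased exponents
# (re-biasing by 15), multiply the significands with their implicit leading
# 1s, then renormalize if the product is >= 2.0.
def fp16_mul(a: int, b: int) -> int:
    sign = (a >> 15) ^ (b >> 15)
    exp = ((a >> 10) & 0x1F) + ((b >> 10) & 0x1F) - 15
    sig = ((a & 0x3FF) | 0x400) * ((b & 0x3FF) | 0x400)  # 11-bit x 11-bit
    if sig & (1 << 21):      # product >= 2.0: normalize and bump exponent
        sig >>= 1
        exp += 1
    return (sign << 15) | (exp << 10) | ((sig >> 10) & 0x3FF)

# 1.25 (0x3D00) x 2.125 (0x4080) = 2.65625 (0x4150)
print(hex(fp16_mul(0x3D00, 0x4080)))  # -> 0x4150
```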
The obtained output may be the activation for the next layer, or may be the final output value.
By configuring an NPU arithmetic unit to compute a number format expressed by a fixed exponent and a variable mantissa, multiplication operations may be processed for the formats defined in the description above.
Based on the distribution of the deep learning parameters, the parameters may be stored in an on-chip memory such as SRAM in an efficient size using a number format expressed by a fixed exponent and a variable mantissa, and the accuracy, computational efficiency, storage efficiency, and space efficiency may be flexibly adjusted.
Even when the same fixed-point expression space is covered, due to the distribution characteristics of deep learning parameters, the parameters may be stored in the memory in a small size.
Since the NPU of a smartphone consumes a significant amount of power to read parameter information from an external memory when performing a neural network operation (e.g., an inference), reading converted parameters expressed with a small number of bits, as in the examples above, may reduce the traffic and power consumption associated with the parameters.
When the neural network processing rate would otherwise decrease due to reading parameters, it is possible to improve the processing rate with respect to reading parameters.
Referring to the corresponding figure, a processor apparatus 800 may include a communication interface 810, a processor 830, and a memory 850.
The communication interface 810 may receive parameters.
The processor 830 may convert the parameters received through the communication interface 810 to a predetermined number format. The processor 830 may convert the obtained parameters into a number format including a fixed exponent and a variable mantissa by referring to a mapping table.
The memory 850 may store a variety of information generated in a processing operation of the processor 830 described above. In addition, the memory 850 may store a variety of data and programs. The memory 850 may include a volatile memory or a non-volatile memory. The memory 850 may include a large-capacity storage medium such as a hard disk to store a variety of data.
In addition, the processor 830 may perform the at least one method described above or a scheme corresponding to the at least one method.
The processor 830 may execute the program (code/instructions) and control the processor apparatus 800. Program code to be executed by the processor 830 may be stored in the memory 850.
The computing apparatuses, the processors, the memories, the image sensors, the displays, the information output system and hardware, the storage devices, and other apparatuses, devices, units, modules, and components described herein are implemented by or representative of hardware components.
The methods illustrated in the figures that perform the operations described in this application are performed by computing hardware, for example, by one or more processors or computers, implemented as described above executing instructions or software to perform the operations described in this application that are performed by the methods.
Instructions or software to control computing hardware, for example, one or more processors or computers, to implement the hardware components and perform the methods as described above may be written as computer programs, code segments, instructions or any combination thereof, for individually or collectively instructing or configuring the one or more processors or computers to operate as a machine or special-purpose computer to perform the operations that are performed by the hardware components and the methods as described above. In one example, the instructions or software include machine code that is directly executed by the one or more processors or computers, such as machine code produced by a compiler. In another example, the instructions or software include higher-level code that is executed by the one or more processors or computers using an interpreter. The instructions or software may be written using any programming language based on the block diagrams and the flow charts illustrated in the drawings and the corresponding descriptions herein, which disclose algorithms for performing the operations that are performed by the hardware components and the methods as described above.
The instructions or software to control computing hardware, for example, one or more processors or computers, to implement the hardware components and perform the methods as described above, and any associated data, data files, and data structures, may be recorded, stored, or fixed in or on one or more non-transitory computer-readable storage media. Examples of a non-transitory computer-readable storage medium include read-only memory (ROM), random-access programmable read only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random access memory (DRAM), static random access memory (SRAM), flash memory, non-volatile memory, CD-ROMs, CD-Rs, CD+Rs, CD-RWs, CD+RWs, DVD-ROMs, DVD-Rs, DVD+Rs, DVD-RWs, DVD+RWs, DVD-RAMs, BD-ROMs, BD-Rs, BD-R LTHs, BD-REs, Blu-ray or optical disk storage, hard disk drive (HDD), solid state drive (SSD), flash memory, a card type memory such as multimedia card micro or a card (for example, secure digital (SD) or extreme digital (XD)), magnetic tapes, floppy disks, magneto-optical data storage devices, optical data storage devices, hard disks, solid-state disks, and any other device that is configured to store the instructions or software and any associated data, data files, and data structures in a non-transitory manner and provide the instructions or software and any associated data, data files, and data structures to one or more processors or computers so that the one or more processors or computers can execute the instructions. In one example, the instructions or software and any associated data, data files, and data structures are distributed over network-coupled computer systems so that the instructions and software and any associated data, data files, and data structures are stored, accessed, and executed in a distributed fashion by the one or more processors or computers.
While this disclosure includes specific examples, it will be apparent after an understanding of the disclosure of this application that various changes in form and details may be made in these examples without departing from the spirit and scope of the claims and their equivalents. The examples described herein are to be considered in a descriptive sense only, and not for purposes of limitation. Descriptions of features or aspects in each example are to be considered as being applicable to similar features or aspects in other examples. Suitable results may be achieved if the described techniques are performed in a different order, and/or if components in a described system, architecture, device, or circuit are combined in a different manner, and/or replaced or supplemented by other components or their equivalents.
Therefore, in addition to the above disclosure, the scope of the disclosure may also be defined by the claims and their equivalents, and all variations within the scope of the claims and their equivalents are to be construed as being included in the disclosure.
Number | Date | Country | Kind |
---|---|---|---
10-2023-0196506 | Dec 2023 | KR | national |