The present disclosure relates to an information processing device and an information processing method.
Recently, neural networks, which are mathematical models simulating the mechanism of the cerebral nervous system, have attracted attention. In addition, various techniques for reducing the processing load of operations in neural networks have been proposed. For example, Non Patent Literature 1 discloses a quantization function that realizes accurate quantization of intermediate values and weights during learning.
However, the quantization function described in Non Patent Literature 1 does not sufficiently consider the dynamic range related to quantization. It is therefore difficult to optimize the dynamic range with that quantization function.
Therefore, the present disclosure proposes a novel, improved information processing device and information processing method which are capable of reducing processing load of operation and realizing learning with higher accuracy.
According to the present disclosure, an information processing device is provided that includes a learning unit that optimizes parameters that determine a dynamic range by error back propagation and stochastic gradient descent in a quantization function of a neural network in which the parameters that determine the dynamic range are arguments.
Moreover, according to the present disclosure, an information processing method, by a processor, is provided that includes optimizing parameters that determine a dynamic range by error back propagation and stochastic gradient descent in a quantization function of a neural network in which the parameters that determine the dynamic range are arguments.
As described above, according to the present disclosure, it is possible to reduce processing load of operation and to realize learning with higher accuracy.
It is to be noted that the above effect is not necessarily limited, and any of the effects presented in the present description or other effects that can be understood from the present description may be achieved in addition to or in place of the above effect.
Hereinafter, a preferred embodiment of the present disclosure will be described in detail with reference to the accompanying drawings. It is to be noted that in the present description and drawings, components having substantially an identical functional configuration are given an identical reference sign, and redundant description thereof is omitted.
It is to be noted that the explanation will be made in the following order.
1. Embodiment
1.1. Overview
1.2. Functional configuration example of information processing device 10
1.3. Details of optimization
1.4. Effects
1.5. API Details
2. Hardware configuration example
3. Summary
Recently, learning techniques using neural networks, such as deep learning, have been widely studied. While learning techniques using neural networks achieve high accuracy, the processing load of their operations is large, and hence operation methods that effectively reduce this load are required.
For this reason, in recent years, many quantization techniques have been proposed in which, for example, parameters such as a weight and a bias are quantized into a few bits to improve operation efficiency and save memory. Examples of such quantization techniques include linear quantization and power quantization.
For example, in the case of linear quantization, an input value x given as a float is converted into an int by quantization using the quantization function indicated in the following Expression (1), and effects such as improved operation efficiency and memory saving can be achieved. Expression (1) may be a quantization function used when the quantized value is not permitted to be a negative value (sign=False). In addition, in Expression (1), n represents a bit length and δ represents a step size.
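Expression (1) itself is not reproduced in this text. A minimal sketch of a linear (fixed-point) quantization function of the kind described, assuming the common form δ·clip(round(x/δ), 0, 2^n − 1), might look as follows; the exact expression in the disclosure may differ.

```python
def linear_quantize(x, n, delta, sign=False):
    """Linearly quantize x with bit length n and step size delta.

    A sketch of a quantization function of the kind referred to as
    Expression (1); not the disclosure's exact formula.
    """
    if sign:
        # one bit is spent on the sign
        lo, hi = -(2 ** (n - 1) - 1), 2 ** (n - 1) - 1
    else:
        lo, hi = 0, 2 ** n - 1
    k = min(max(round(x / delta), lo), hi)  # round to nearest step, then clip
    return k * delta

# Example: n = 4 bits, step size 0.25, unsigned
print(linear_quantize(1.13, 4, 0.25))   # 1.25: k = round(4.52) = 5, and 5 * 0.25
print(linear_quantize(10.0, 4, 0.25))   # 3.75: k = 40 is clipped to 2**4 - 1 = 15
```

Out-of-range inputs saturate at the clip bounds, which is exactly why the dynamic range (here set by n and δ) matters.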
In addition, the quantization function given in the following Expression (2) can be used in power-of-two quantization, for example. It is to be noted that Expression (2) may be a quantization function used when the quantized value is not permitted to be a negative value or 0 (sign=False, zero=False). In addition, in Expression (2), n represents a bit length and m represents an upper (lower) limit value.
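Expression (2) is likewise not reproduced here. As a non-limiting illustration, a power-of-two quantization of the kind described (sign=False, zero=False) can be sketched as snapping a positive input to a power of two whose exponent is clipped to [m − 2^n + 1, m]; the disclosure's exact expression (for example, its rounding offset, discussed later) may differ.

```python
import math

def pow2_quantize(x, n, m):
    """Power-of-two quantization (sign=False, zero=False).

    Sketch of a function of the kind referred to as Expression (2):
    a positive input is snapped to a power of two 2**k with the
    exponent clipped to [m - 2**n + 1, m].
    """
    assert x > 0, "negative values and zero are not permitted in this variant"
    k = round(math.log2(x))              # nearest exponent
    k = min(max(k, m - 2 ** n + 1), m)   # clip to the dynamic range
    return 2.0 ** k

# Example: n = 3 bits, upper limit m = 2, so k is clipped to [-5, 2]
print(pow2_quantize(5.0, 3, 2))    # 4.0: log2(5) ~ 2.32 rounds to 2
print(pow2_quantize(0.01, 3, 2))   # 0.03125: exponent clipped at -5
```

Multiplication by such a quantized value reduces to a bit shift, which is the source of the operation-efficiency gain mentioned above.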
As described above, according to quantization techniques such as linear quantization and power quantization, it is possible to realize operation efficiency improvement and memory saving by expressing the input value with a smaller bit length.
In recent years, however, neural networks generally have tens to hundreds of layers. Here, for example, a neural network with 20 layers is assumed in which the weight coefficient, the intermediate value, and the bias are quantized by power-of-two quantization, with the bit length tried over [2, 8] and the upper limit value over [−16, 16]. In this case, there are (7×33)×2=462 ways of quantizing the parameters (weight and bias) and 7×33=231 ways of quantizing the intermediate value, and thus there are (462×231)^20 patterns in total.
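The count above can be restated in a few lines; the figures below simply reproduce the arithmetic in the text.

```python
# Bit length tried over [2, 8] -> 7 values; upper limit over [-16, 16] -> 33 values.
n_choices = 8 - 2 + 1          # 7
m_choices = 16 - (-16) + 1     # 33

per_tensor = n_choices * m_choices      # 231 ways for one tensor
params = per_tensor * 2                 # weight and bias: 462 ways
intermediate = per_tensor               # 231 ways for the intermediate value

per_layer = params * intermediate       # combinations within one layer
total_patterns = per_layer ** 20        # 20 layers

print(params, intermediate)             # 462 231
print(total_patterns > 10 ** 100)       # True: the search space is astronomical
```

A search space larger than 10^100 configurations is what makes the manual hyperparameter search described next impractical.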
Therefore, it has been practically difficult to manually obtain truly optimal hyperparameters.
The technical idea according to the present disclosure was conceived by focusing on the above point, and makes it possible to automatically search for a hyperparameter that realizes highly accurate quantization. For this purpose, an information processing device 10 that realizes an information processing method according to an embodiment of the present disclosure includes a learning unit 110 that optimizes parameters that determine a dynamic range by error back propagation and stochastic gradient descent in a quantization function of a neural network in which the parameters that determine the dynamic range are arguments.
Here, the parameters that determine the dynamic range may include at least a bit length at the time of quantization.
The parameters that determine the dynamic range may include various parameters that affect the determination of the dynamic range together with the bit length at the time of quantization. Examples of the parameter include an upper limit value or a lower limit value at the time of power quantization and a step size at the time of linear quantization.
That is, the information processing device 10 according to the present embodiment is capable of optimizing a plurality of parameters that affect the determination of the dynamic range in various quantization functions without depending on a specific quantization technique.
In addition, the information processing device 10 according to the present embodiment may locally or globally optimize the parameters described above on the basis of the setting made by a user, for example.
On the other hand, the information processing device 10 according to the present embodiment may optimize, for a plurality of layers in common, the parameters that determine the dynamic range. The information processing device 10 according to the present embodiment may optimize the bit length n and the upper limit value m in the power-of-two quantization for the entire neural network in common, as illustrated in the lower part of
In addition, as illustrated in
Hereinafter, the function described above of the information processing device 10 according to the present embodiment will be described in detail.
First, the functional configuration example of the information processing device 10 according to an embodiment of the present disclosure will be described.
The network 30 may include a public line network such as the Internet, a telephone line network, or a satellite communication network; various local area networks (LANs), including Ethernet (registered trademark); and a wide area network (WAN). In addition, the network 30 may include a dedicated line network such as an Internet protocol virtual private network (IP-VPN). The network 30 may also include a wireless communication network such as Wi-Fi (registered trademark) or Bluetooth (registered trademark).
(Learning Unit 110)
The learning unit 110 according to the present embodiment has a function of performing various types of learning using the neural network. In addition, the learning unit 110 according to the present embodiment performs quantization of weights and biases at the time of learning using a quantization function.
At this time, one of the characteristics of the learning unit 110 according to the present embodiment is to optimize parameters that determine the dynamic range by error back propagation and stochastic gradient descent in a quantization function of a neural network in which the parameters that determine the dynamic range are arguments. The function of the learning unit 110 according to the present embodiment will be described in detail separately.
(Input/Output Control Unit 120)
The input/output control unit 120 according to the present embodiment controls the API for the user to perform settings related to learning and quantization by the learning unit 110. The input/output control unit 120 according to the present embodiment acquires the various values input by the user via the API and delivers them to the learning unit 110. In addition, the input/output control unit 120 according to the present embodiment can present parameters optimized on the basis of those values to the user via the API. The function of the input/output control unit 120 according to the present embodiment will be described in detail separately later.
(Storage Unit 130)
The storage unit 130 according to the present embodiment has a function of storing programs and data used in each configuration included in the information processing device 10. The storage unit 130 according to the present embodiment stores various parameters used for learning and quantization by the learning unit 110, for example.
A functional configuration example of the information processing device 10 according to the present embodiment has been described above. It is to be noted that the configuration described above with reference to
Next, optimization of the parameter by the learning unit 110 according to the present embodiment will be described in detail. First, an object to be quantized by the learning unit 110 according to the present embodiment will be described.
As illustrated in
In addition, the learning unit 110 according to the present embodiment performs backward propagation by calculating partial derivatives with respect to the learning parameters, such as the weight and the bias, on the basis of a parameter gradient output from a downstream layer in the backward direction, as illustrated in the lower part of
In addition, the learning unit 110 according to the present embodiment updates learning parameters such as the weight and the bias so as to minimize the error by the stochastic gradient descent. At this time, the learning unit 110 according to the present embodiment can update the learning parameter using the following Expression (3), for example. Although Expression (3) indicates an expression for updating the weight w, other parameters can be updated by the similar calculation. In Expression (3), C represents cost and t represents iteration.
w_t ← w_{t-1} − η·∂C/∂w (3)
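Expression (3) is the standard stochastic-gradient-descent update. As a minimal sketch (with a hand-written gradient standing in for the back-propagated ∂C/∂w):

```python
def sgd_step(w, grad, lr=0.1):
    """One SGD update, w_t <- w_{t-1} - eta * dC/dw, as in Expression (3).

    `grad` stands in for the gradient delivered by backward propagation;
    `lr` is the learning rate eta.
    """
    return w - lr * grad

# Minimize the toy cost C(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w = 0.0
for _ in range(100):
    w = sgd_step(w, 2 * (w - 3), lr=0.1)
# After repeated updates, w converges toward the minimizer w = 3.
```

The same update rule is applied to every learning parameter, including, as described below, the parameters that determine the dynamic range of the quantization function.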
Thus, the learning unit 110 according to the present embodiment advances learning by performing forward propagation, backward propagation, and update of learning parameters. At this time, the learning unit 110 according to the present embodiment quantizes the learning parameters such as the weight w and the bias described above using the quantization function, thereby allowing the operation load to be reduced.
At this time, the learning unit 110 according to the present embodiment can similarly quantize the float-type weight w into the int-type weight wq on the basis of a bit length nq and an upper limit value mq that have themselves been quantized from the float type to the int type.
The learning and quantization by the learning unit 110 according to the present embodiment have been outlined above. Subsequently, the optimization of the parameters that determine the dynamic range by the learning unit 110 according to the present embodiment will be described in detail.
It is to be noted that in the following, an example of calculation in the case where the learning unit 110 according to the present embodiment performs optimization of parameters that determine the dynamic range in linear quantization and power-of-two quantization will be described.
In addition, in the following, the value quantized in the linear quantization is expressed by the following Expression (4). At this time, the learning unit 110 according to the present embodiment optimizes the bit length n and the step size δ as parameters that determine the dynamic range.
k·δ with k ∈ [0, 2^n − 1] ('sign=False')
or ±k·δ with k ∈ [0, 2^(n−1) − 1] ('sign=True') (4)
In addition, in the following, the value to be quantized in the power-of-two quantization is expressed by the following Expression (5). At this time, the learning unit 110 according to the present embodiment optimizes the bit length n and the upper (lower) limit value as parameters that determine the dynamic range.
2^k with k ∈ [m − 2^n + 1, m] ('with_zero=False', 'sign=False')
±2^k with k ∈ [m − 2^(n−1) + 1, m] ('with_zero=False', 'sign=True')
{0, 2^k} with k ∈ [m − 2^(n−1) + 1, m] ('with_zero=True', 'sign=False')
{0, ±2^k} with k ∈ [m − 2^(n−2) + 1, m] ('with_zero=True', 'sign=True') (5)
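The four variants of Expression (5) can be made concrete by enumerating the representable values for given n and m. The sketch below follows the exponent ranges as printed above; it is an illustration, not code from the disclosure.

```python
def pow2_values(n, m, with_zero=False, sign=False):
    """Enumerate the representable values of Expression (5) for bit
    length n and upper limit m, following the exponent ranges above."""
    if not with_zero and not sign:
        ks = range(m - 2 ** n + 1, m + 1)
        return sorted(2.0 ** k for k in ks)
    if not with_zero and sign:
        ks = range(m - 2 ** (n - 1) + 1, m + 1)
        vals = [2.0 ** k for k in ks]
        return sorted([-v for v in vals] + vals)
    if with_zero and not sign:
        ks = range(m - 2 ** (n - 1) + 1, m + 1)
        return sorted([0.0] + [2.0 ** k for k in ks])
    ks = range(m - 2 ** (n - 2) + 1, m + 1)
    vals = [2.0 ** k for k in ks]
    return sorted([-v for v in vals] + [0.0] + vals)

# n = 3, m = 0, strictly positive: the 8 values 2**-7 ... 2**0
print(pow2_values(3, 0))
```

Raising m shifts the whole grid toward larger magnitudes while raising n widens it, which is precisely the dynamic-range trade-off the optimization below resolves.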
In addition, the quantization and the optimization of the parameters that determine the dynamic range are performed in the affine layer or the convolution layer.
In addition, for a scalar input/output, the gradient of the cost function C with respect to each parameter λ ∈ {n, m, δ} is given by the chain rule.
Here, when the output y ∈ R for a scalar input x ∈ R is also a scalar value, the gradient of the cost function C with respect to the parameter is expressed by the following Expression (6).
In addition, when the output y ∈ R^I for a vector input x ∈ R^I is also a vector value, the gradient of the cost function C with respect to the parameter is expressed by the following Expression (7), as a sum over all the outputs y_i that depend on λ.
The premise related to the optimization of the parameters that determine the dynamic range according to the present embodiment has been described above. Subsequently, the optimization of the parameters in each quantization technique will be described in detail.
First, the optimization of the parameters related to linear quantization where negative values are not permitted by the learning unit 110 according to the present embodiment will be described. Here, let the bit length n and the step size δ in the forward propagation be [minn, maxn] and [minδ, maxδ], respectively, and let the bit length n quantized to the int type by a round function be nq. At this time, the quantization of the input value is expressed by the following Expression (8).
In addition, the gradient of the bit length n and the gradient of the step size δ are expressed in the backward propagation by the following Expressions (9) and (10), respectively.
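Expressions (9) and (10) themselves are not reproduced in this text. As a non-limiting illustration only, one straight-through-style choice of these gradients, which is an assumption here and not necessarily the disclosure's exact expressions, is sketched below: the round function is treated as the identity, so gradients arise from the remaining explicit dependence on δ and, for n, from the clipping boundary.

```python
import math

def linear_quantize_grads(x, n, delta):
    """Hypothetical backward pass for unsigned linear quantization
    y = delta * clip(round(x / delta), 0, 2**n - 1).

    NOT the disclosure's Expressions (9)/(10); a straight-through-style
    sketch. Returns (dy/dn, dy/ddelta).
    """
    k_max = 2 ** n - 1
    k = round(x / delta)
    if k < 0:                       # clipped at 0: y = 0, no dependence on n or delta
        return 0.0, 0.0
    if k > k_max:                   # clipped at the top: y = delta * (2**n - 1)
        dy_dn = delta * (2 ** n) * math.log(2)   # d/dn of delta*(2**n - 1)
        dy_ddelta = k_max
        return dy_dn, dy_ddelta
    # in range: y = delta * round(x / delta), round treated as identity
    return 0.0, k - x / delta
```

Note that n receives a nonzero gradient only when the input saturates, which matches the intuition that the bit length matters exactly where the dynamic range is too small.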
Next, the optimization of the parameters related to linear quantization where negative values are permitted by the learning unit 110 according to the present embodiment will be described. Here, let the bit length n and the step size δ in the forward propagation be [minn, maxn] and [minδ, maxδ], respectively, and let the bit length n quantized to the int type by a round function be nq. At this time, the quantization of the input value is expressed by the following Expression (11).
In addition, the gradient of the bit length n and the gradient of the step size δ are expressed in the backward propagation by the following Expressions (12) and (13), respectively.
Next, the optimization of the parameters related to power-of-two quantization where negative values and 0 are not permitted by the learning unit 110 according to the present embodiment will be described. Here, let the bit length n and the upper (lower) limit value m in the forward propagation be [minn, maxn] and [minm, maxm], respectively, and let the bit length n and the upper (lower) limit value m that are quantized to the int type by the round function be nq and mq, respectively. At this time, the quantization of the input value is expressed by the following Expression (14).
It is to be noted that the value of 0.5 in Expression (14) described above and in the subsequent expressions related to power-of-two quantization is used to differentiate from the lower limit value; it is not limited to 0.5 and may be, for example, log2 1.5.
In addition, in the backward propagation, the gradient of the bit length n is all 0 except the condition presented by the following Expression (15), and the gradient of the upper (lower) limit value m is expressed by the following Expression (16).
Next, the optimization of the parameters related to power-of-two quantization where negative values are permitted and 0 is not permitted by the learning unit 110 according to the present embodiment will be described. Here, let the bit length n and the upper (lower) limit value m in the forward propagation be [minn, maxn] and [minm, maxm], respectively, and let the bit length n and the upper (lower) limit value m that are quantized to the int type by the round function be nq and mq, respectively. At this time, the quantization of the input value is expressed by the following Expression (17).
In addition, in the backward propagation, the gradient of the bit length n is all 0 except the condition indicated by the following Expression (18), and the gradient of the upper (lower) limit value m is expressed by the following Expression (19).
Next, the optimization of the parameters related to power-of-two quantization where negative values are not permitted and 0 is permitted by the learning unit 110 according to the present embodiment will be described. Here, let the bit length n and the upper (lower) limit value m in the forward propagation be [minn, maxn] and [minm, maxm], respectively, and let the bit length n and the upper (lower) limit value m that are quantized to the int type by the round function be nq and mq, respectively. At this time, the quantization of the input value is expressed by the following Expression (20).
In addition, in the backward propagation, the gradient of the bit length n is all 0 except the condition indicated by the following Expression (21), and the gradient of the upper (lower) limit value m is expressed by the following Expression (22).
Next, the optimization of the parameters related to power-of-two quantization where both negative values and 0 are permitted by the learning unit 110 according to the present embodiment will be described. Here, let the bit length n and the upper (lower) limit value m in the forward propagation be [minn, maxn] and [minm, maxm], respectively, and let the bit length n and the upper (lower) limit value m that are quantized to the int type by the round function be nq and mq, respectively. At this time, the quantization of the input value is expressed by the following Expression (23).
In addition, in the backward propagation, the gradient of the bit length n is all 0 except the condition presented by the following Expression (24), and the gradient of the upper (lower) limit value m is expressed by the following Expression (25).
Next, effects of optimization of parameters that determine the dynamic range according to the present embodiment will be described. First, results of classification using CIFAR-10 will be described. It is to be noted that ResNet-20 was adopted as the neural network.
In addition, here, as the initial value of the bit length n in all layers, 4 bits or 8 bits was set, and three experiments were conducted in which the weight w was quantized by linear quantization, power-of-two quantization where 0 is not permitted, and power-of-two quantization where 0 is permitted.
In addition, as the initial value of the upper limit value m of the power-of-two quantization, the value calculated by the following Expression (26) was used for all layers.
In addition, as the initial value of the step size δ of the linear quantization, the value of the power of 2 calculated by the following Expression (27) was used for all layers.
In addition, as the permissible range of each parameter, n∈[2, 8], m∈[−16, 16], and δ∈[2−12, 2−2] were set.
First, the result of the best validation error in each condition is illustrated in
It is to be noted that the detailed error values under each condition are as follows.
Float Net: 7.84%
FixPoint, Init4: 9.49%
FixPoint, Init8: 9.23%
Pow2, Init4: 8.42%
Pow2, Init8: 8.40%
Pow2wz, Init4: 8.74%
Pow2wz, Init8: 8.28%
Next, the optimization result of the parameters in each layer will be presented.
In addition,
Referring to
In addition,
In addition,
Referring to
As described above, according to the optimization of the parameters that determine the dynamic range according to the present embodiment, it is possible to automatically optimize each parameter for each layer regardless of the quantization technique, and it is possible to dramatically reduce the load of manual searching and to greatly reduce the operation load in a huge neural network.
Next, experiment results when quantization of the intermediate value was performed are presented. Here, ReLU was replaced with power-of-two quantization where 0 is permitted and negative values are not permitted. In addition, as the data set, CIFAR-10 was used, similarly to the quantization of the weight.
In addition, as the settings of the parameters, n∈[3, 8] with an initial value of 8 bits and m∈[−16, 16] were set.
In addition,
Referring to
Subsequently, experiment results when quantization of the weight w and quantization of the intermediate value were performed simultaneously are presented. Also in this experiment, as the data set, CIFAR-10 was used similarly to the quantization of the weight. In addition, as the setting of each parameter, n∈[2, 8] and the initial value of 2, 4, or 8 bits, m∈[−16, 16] and the initial value m=0 were set.
It is to be noted that the experiments were conducted with initial learning rates of 0.1 and 0.01.
In addition,
Referring to
In addition,
Referring to
In addition,
Referring to
In addition,
Referring to
The effects of optimization of parameters that determine the dynamic range according to the present embodiment have been described above. According to the optimization of the parameters that determine the dynamic range according to the present embodiment, it is possible to automatically optimize each parameter for each layer regardless of the quantization technique, and it is possible to dramatically reduce the load of manual searching and to greatly reduce the operation load in a huge neural network.
Next, the API controlled by the input/output control unit 120 according to the present embodiment will be described in detail. As described above, the input/output control unit 120 according to the present embodiment controls the API for the user to perform settings related to learning and quantization by the learning unit 110. The API according to the present embodiment is used, for example, for the user to input, for each layer, initial values of the parameters that determine the dynamic range and various settings related to quantization, such as whether or not to permit negative values or 0.
At this time, the input/output control unit 120 according to the present embodiment can acquire the set value input by the user via the API, and return, to the user, the parameters that determine the dynamic range optimized by the learning unit 110 on the basis of the set value.
Here, focusing on the upper part of
On the other hand, in the API of the linear quantization according to the present embodiment illustrated in the lower part of
At this time, the user can obtain, in addition to the output value h of the corresponding layer, the optimized bit length n and the step size δ that are stored in each variable described above. As described above, according to the API controlled by the input/output control unit 120 according to the present embodiment, the user can input the initial value, setting, and the like of each parameter related to quantization, and can easily acquire the value of the optimized parameter.
It is to be noted that although the API when inputting the step size δ is indicated in the example illustrated in
Here, focusing on the upper part of
On the other hand, in the API of the power-of-two quantization according to the present embodiment illustrated in the lower part of
At this time, the user can obtain, in addition to the output value h of the corresponding layer, the optimized bit length n and the upper limit value m that are stored in each variable described above. As described above, the API according to the present embodiment allows the user to perform any setting for each layer and to optimize, for each layer, the parameters that determine the dynamic range.
It is to be noted that if it is desired to perform quantization using the identical parameter described above in a plurality of layers, the user may set the identical variable defined upstream in a function corresponding to each layer, as illustrated in
As described above, the API according to the present embodiment allows the user to freely set whether to use a different parameter for each layer or a parameter common to any plurality of layers (for example, a block or all target layers). For example, the user can use the identical n and n_q in a plurality of layers while using m and m_q that differ from layer to layer.
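A sketch of the sharing mechanism described above follows. The names (Parameter, QuantizedAffine) are illustrative and not from the disclosure; the point is only that passing the same parameter object to several layers makes them share, and thus co-optimize, that parameter, while distinct objects keep the layers independent.

```python
class Parameter:
    """A hypothetical learnable scalar, standing in for a variable such as n or m."""
    def __init__(self, value):
        self.value = float(value)
        self.grad = 0.0  # would be filled in by backward propagation

class QuantizedAffine:
    """A hypothetical quantized layer holding its dynamic-range parameters."""
    def __init__(self, n, delta):
        self.n = n          # bit length (possibly shared across layers)
        self.delta = delta  # step size (here kept per layer)

# One bit-length parameter shared by both layers; step sizes kept per layer.
n_shared = Parameter(8)
layer1 = QuantizedAffine(n=n_shared, delta=Parameter(2 ** -6))
layer2 = QuantizedAffine(n=n_shared, delta=Parameter(2 ** -4))

assert layer1.n is layer2.n              # common bit length: one update affects both
assert layer1.delta is not layer2.delta  # step size optimized per layer
```

Under this scheme, a gradient step on n_shared adjusts the bit length for every layer that references it, which corresponds to the common optimization across a block or the entire network described above.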
Next, a hardware configuration example of the information processing device 10 according to an embodiment of the present disclosure will be described.
(Processor 871)
The processor 871 functions as, for example, an arithmetic processing unit or a control unit, and controls the overall operation of each component or a part thereof on the basis of various programs recorded in the ROM 872, the RAM 873, the storage 880, or a removable recording medium 901.
(ROM 872, RAM 873)
The ROM 872 is a means for storing a program to be read into the processor 871, data to be used for operation, and the like. The RAM 873 temporarily or permanently stores, for example, a program to be read into the processor 871 and various parameters which change appropriately when the program is executed.
(Host Bus 874, Bridge 875, External Bus 876, Interface 877)
The processor 871, the ROM 872, and the RAM 873 are interconnected via the host bus 874 capable of high-speed data transmission, for example. On the other hand, the host bus 874 is connected to the external bus 876 having a relatively low data transmission speed via the bridge 875, for example. In addition, the external bus 876 is connected to various components via the interface 877.
(Input Device 878)
As the input device 878, for example, a mouse, a keyboard, a touch screen, a button, a switch, a lever, or the like is used. Furthermore, as the input device 878, a remote controller capable of transmitting a control signal using infrared rays or other radio waves is sometimes used. In addition, the input device 878 includes a voice input device such as a microphone.
(Output Device 879)
The output device 879 is a device capable of visually or aurally notifying the user of acquired information, such as a display device (for example, a cathode ray tube (CRT), an LCD, or an organic EL display), an audio output device (for example, a speaker or headphones), a printer, a mobile phone, or a facsimile. In addition, the output device 879 according to the present disclosure also includes various vibration devices capable of outputting tactile stimulation.
(Storage 880)
The storage 880 is a device for storing various types of data. As the storage 880, for example, a magnetic storage device such as a hard disk drive (HDD), a semiconductor storage device, an optical storage device, a magneto-optical storage device, or the like is used.
(Drive 881)
The drive 881 is a device that reads information recorded on the removable recording medium 901 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, or writes information on the removable recording medium 901.
(Removable Recording Medium 901)
The removable recording medium 901 is, for example, a DVD medium, a Blu-ray (registered trademark) medium, an HD DVD medium, various semiconductor storage media, or the like. It is a matter of course that the removable recording medium 901 may be, for example, an IC card equipped with a non-contact IC chip, an electronic device, or the like.
(Connection Port 882)
The connection port 882 is a port, such as a universal serial bus (USB) port, an IEEE 1394 port, a small computer system interface (SCSI) port, an RS-232C port, or an optical audio terminal, for connecting an external connection device 902.
(External Connection Device 902)
The external connection device 902 is, for example, a printer, a portable music player, a digital camera, a digital video camera, an IC recorder, or the like.
(Communication Device 883)
The communication device 883 is a communication device for connecting to a network, such as a communication card for a wired or wireless LAN, Bluetooth (registered trademark), or wireless USB (WUSB), a router for optical communication, a router for asymmetric digital subscriber line (ADSL), or a modem for various types of communication.
As described above, the information processing device 10 that realizes the information processing method according to an embodiment of the present disclosure includes the learning unit 110 that optimizes parameters that determine the dynamic range by error back propagation and stochastic gradient descent in the quantization function of the neural network in which the parameters that determine the dynamic range are arguments. According to such a configuration, it is possible to reduce the processing load of operations and to realize learning with higher accuracy.
While the preferred embodiment of the present disclosure has been described above in detail with reference to the accompanying drawings, the technical scope of the present disclosure is not limited to such examples. It is obvious that a person having ordinary knowledge in the technical field of the present disclosure can conceive of various variations or modifications within the scope of the technical idea set forth in the claims, and it is understood that such variations or modifications also fall within the technical scope of the present disclosure.
In addition, the effects described herein are illustrative or exemplary and not restrictive. That is, the technology according to the present disclosure can achieve other effects obvious to those skilled in the art from the description herein in addition to the above effects or in place of the above effects.
In addition, it is also possible to create a program for causing hardware such as a CPU, a ROM, and a RAM that are incorporated in the computer to exert functions equivalent to those of the configuration of the information processing device 10, and it is also possible to provide a computer-readable recording medium in which the program is recorded.
It is to be noted that the following structure also falls within the technical scope of the present disclosure.
(1)
An information processing device, comprising:
a learning unit that optimizes parameters that determine a dynamic range by an error back propagation and a stochastic gradient descent in a quantization function of a neural network in which the parameters that determine the dynamic range are arguments.
(2)
The information processing device according to (1), wherein
the parameters that determine the dynamic range include at least a bit length at a time of quantization.
(3)
The information processing device according to (2), wherein
the parameters that determine the dynamic range include an upper limit value or a lower limit value at a time of power quantization.
(4)
The information processing device according to (2) or (3), wherein
the parameters that determine the dynamic range include a step size at a time of linear quantization.
(5)
The information processing device according to any one of (1) to (4), wherein
the learning unit optimizes, for each layer, the parameters that determine the dynamic range.
(6)
The information processing device according to any one of (1) to (5), wherein
the learning unit optimizes, for a plurality of layers in common, the parameters that determine the dynamic range.
(7)
The information processing device according to any one of (1) to (6), wherein
the learning unit optimizes, for an entire neural network in common, the parameters that determine the dynamic range.
(8)
The information processing device according to any one of (1) to (7), further comprising:
an input/output control unit that controls an interface that outputs the parameters that determine the dynamic range optimized by the learning unit.
(9)
The information processing device according to (8), wherein
the input/output control unit acquires an initial value input by a user via the interface, and outputs the parameters that determine the dynamic range optimized on a basis of the initial value.
(10)
The information processing device according to (9), wherein
the input/output control unit acquires an initial value of a bit length input by the user via the interface, and outputs a bit length at a time of quantization optimized on a basis of the initial value of the bit length.
(11)
The information processing device according to any one of (8) to (10), wherein
the input/output control unit acquires setting related to quantization input by a user via the interface, and outputs the parameters that determine the dynamic range optimized on a basis of the setting.
(12)
The information processing device according to (11), wherein
setting related to the quantization includes setting of whether or not to permit a quantized value to be a negative value.
(13)
The information processing device according to (11) or (12), wherein
setting related to the quantization includes setting of whether or not to permit a quantized value to be 0.
(14)
The information processing device according to any one of (1) to (13), wherein
the quantization function is used for quantization of at least any of a weight, a bias or an intermediate value.
(15)
An information processing method, by a processor, comprising:
optimizing parameters that determine a dynamic range by an error back propagation and a stochastic gradient descent in a quantization function of a neural network in which the parameters that determine the dynamic range are arguments.
Priority application: JP 2018-093327, filed May 2018 (national).
International filing: PCT/JP2019/010101, filed Mar. 12, 2019 (WO).