This application claims the benefit under 35 USC § 119(a) of Korean Patent Application No. 10-2021-0115470, filed on Aug. 31, 2021, and Korean Patent Application No. 10-2022-0066180, filed on May 30, 2022, in the Korean Intellectual Property Office, the entire disclosures of which are incorporated herein by reference for all purposes.
The following description relates to an apparatus and method with multi-format data support.
To support operations on multi-format data, one approach is to provide a separate operation apparatus for each data format. Another approach is to distribute the input of an operation apparatus that supports a maximum data type across a plurality of sub-type data and to concatenate the results into the output.
In the case of performing an operation on floating point data, a floating point adder used for accumulation may require a long processing time. Therefore, a pipeline data hazard may occur in a high-speed operation.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
In one general aspect, an apparatus with multi-format data support includes: a receiver configured to receive a plurality of data corresponding to a plurality of data formats; and one or more processors configured to: multiply the plurality of data using one or more multipliers; perform a first alignment on a result of the multiplication based on an exponent value of the plurality of data; add a result of the first alignment; and perform a second alignment on a result of the addition based on the exponent value and an operation result of a previous cycle.
For the multiplying, the one or more processors may be configured to: multiply a first bit input and a second bit input included in the plurality of data; convert a sign of a result of the multiplication of the first bit input and the second bit input; and combine the result of the multiplication of the first bit input and the second bit input with the converted sign to generate the result of the multiplying of the plurality of data.
For the multiplying, the one or more processors may be configured to multiply a plurality of first bit inputs of the plurality of data.
The one or more processors may be configured to: add the exponent value; obtain a maximum exponent value based on the exponent value; determine a sum of remaining exponent values; and determine a difference between the maximum exponent value and the sum.
For the performing of the first alignment, the one or more processors may be configured to shift the result of the multiplication based on a difference between a maximum exponent value obtained based on the exponent value and a sum of remaining exponent values.
For the performing of the second alignment, the one or more processors may be configured to shift the result of the addition based on a maximum exponent value obtained based on the exponent value and the operation result of the previous cycle.
For the shifting of the result of the addition, the one or more processors may be configured to shift the result of the addition based on a difference between the maximum exponent value and an exponent value stored according to the operation result of the previous cycle.
For the performing of the second alignment, the one or more processors may be configured to: extend a sign bit of the plurality of data based on a predetermined radix point; and add the extended sign bit to the exponent value.
The one or more processors may be configured to accumulate a result of the second alignment.
The one or more processors may be configured to: remove a sign bit with a predetermined length from an output of a result of the accumulation; and perform normalization on the output in which the sign bit is removed.
The one or more processors may include: one or more multipliers configured to perform the multiplying of the plurality of data; a first aligner configured to perform the first alignment on a result of the multiplication; an adder tree configured to perform the adding of the result of the first alignment; and a second aligner configured to perform the second alignment on the result of the addition.
In another general aspect, a processor-implemented method with multi-format data support includes: receiving a plurality of data corresponding to a plurality of data formats; multiplying the plurality of data using one or more multipliers; performing a first alignment on a result of the multiplication of the plurality of data based on an exponent value of the plurality of data; adding a result of the first alignment; and performing a second alignment on a result of the addition based on the exponent value and an operation result of a previous cycle.
The multiplying may include: multiplying a first bit input and a second bit input included in the plurality of data; converting a sign of a result of the multiplication of the first bit input and the second bit input; and combining the result of the multiplication of the first bit input and the second bit input with the converted sign to generate the result of the multiplying of the plurality of data.
The multiplying may include multiplying a plurality of first bit inputs of the plurality of data.
The method may include: adding the exponent value; obtaining a maximum exponent value based on the exponent value; determining a sum of remaining exponent values; and determining a difference between the maximum exponent value and the sum.
The performing of the first alignment may include shifting the result of the multiplication based on a difference between a maximum exponent value obtained based on the exponent value and a sum of remaining exponent values.
The performing of the second alignment may include shifting the result of the addition based on a maximum exponent value obtained based on the exponent value and the operation result of the previous cycle.
The shifting of the result of the addition may include shifting the result of the addition based on a difference between the maximum exponent value and an exponent value stored according to the operation result of the previous cycle.
The performing of the second alignment may include: extending a sign bit of the plurality of data based on a predetermined radix point; and adding the extended sign bit to the exponent value.
The method may include accumulating a result of the second alignment.
The method may include: removing a sign bit with a predetermined length from an output of a result of the accumulation; and performing normalization on the output in which the sign bit is removed.
In another general aspect, one or more embodiments include a non-transitory computer-readable storage medium storing instructions that, when executed by one or more processors, configure the one or more processors to perform any one, any combination, or all operations and methods described herein.
In another general aspect, an apparatus with multi-format data support includes: one or more processors configured to: multiply a plurality of data corresponding to a plurality of data formats using one or more multipliers; perform a first alignment on a result of the multiplication based on a difference between a maximum exponent value among exponent values of the plurality of data and a sum of remaining exponent values; add a result of the first alignment; and perform a second alignment on a result of the addition based on a difference between the maximum exponent value and an exponent value of an operation result of a previous cycle.
The first alignment may include a right-shift and the second alignment may include a left-shift.
The one or more processors may be configured to: add a predetermined value to an exponent value of an output of a result of an accumulation of a result of the second alignment; and perform normalization on the output in which the sign bit is removed.
In another general aspect, an apparatus with multi-format data support includes: one or more processors configured to: multiply the plurality of data by routing data of a plurality of data corresponding to a plurality of data formats to one or more corresponding multipliers of a multiplier-accumulator (MAC) array determined based on the plurality of data formats; perform a first alignment on a result of the multiplication based on an exponent value of the plurality of data; add a result of the first alignment; and perform a second alignment on a result of the addition based on the exponent value and an operation result of a previous cycle.
The multipliers of the MAC array may comprise a plurality of multipliers corresponding to a larger bit input and another multiplier corresponding to a smaller bit input.
Other features and aspects will be apparent from the following detailed description, the drawings, and the claims.
Throughout the drawings and the detailed description, unless otherwise described or provided, the same drawing reference numerals will be understood to refer to the same elements, features, and structures. The drawings may not be to scale, and the relative size, proportions, and depiction of elements in the drawings may be exaggerated for clarity, illustration, and convenience.
The following detailed description is provided to assist the reader in gaining a comprehensive understanding of the methods, apparatuses, and/or systems described herein. However, various changes, modifications, and equivalents of the methods, apparatuses, and/or systems described herein will be apparent after an understanding of the disclosure of this application. For example, the sequences of operations described herein are merely examples, and are not limited to those set forth herein, but may be changed as will be apparent after an understanding of the disclosure of this application, with the exception of operations necessarily occurring in a certain order. Also, descriptions of features that are known in the art, after an understanding of the disclosure of this application, may be omitted for increased clarity and conciseness.
Although terms of “first,” “second,” and the like are used to explain various members, components, regions, layers, or sections, these members, components, regions, layers, or sections are not limited to such terms. Rather, these terms are used only to distinguish one member, component, region, layer, or section from another member, component, region, layer, or section. For example, a first member, component, region, layer, or section referred to in examples described herein may also be referred to as a second member, component, region, layer, or section without departing from the teachings of the examples.
Throughout the specification, when an element, such as a layer, region, or substrate, is described as being “on,” “connected to,” or “coupled to” another element, it may be directly “on,” “connected to,” or “coupled to” the other element, or there may be one or more other elements intervening therebetween. In contrast, when an element is described as being “directly on,” “directly connected to,” or “directly coupled to” another element, there can be no other elements intervening therebetween. Likewise, expressions, for example, “between” and “immediately between” and “adjacent to” and “immediately adjacent to” may also be construed as described in the foregoing.
The terminology used herein is for the purpose of describing particular examples only and is not to be limiting of the present disclosure. As used herein, the singular forms “a”, “an”, and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. As used herein, the term “and/or” includes any one and any combination of any two or more of the associated listed items. As used herein, the terms “include,” “comprise,” and “have” specify the presence of stated features, integers, steps, operations, elements, components, numbers, and/or combinations thereof, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, numbers, and/or combinations thereof. The use of the term “may” herein with respect to an example or embodiment (for example, as to what an example or embodiment may include or implement) means that at least one example or embodiment exists where such a feature is included or implemented, while all examples are not limited thereto.
Unless otherwise defined herein, all terms used herein including technical or scientific terms have the same meanings as those generally understood by one of ordinary skill in the art to which this disclosure pertains after an understanding of the present disclosure. Terms defined in dictionaries generally used should be construed to have meanings matching contextual meanings in the related art and the present disclosure, and are not to be construed as an ideal or excessively formal meaning unless otherwise defined herein.
Hereinafter, the examples are described in detail with reference to the accompanying drawings. Like reference numerals illustrated in the respective drawings refer to like elements and further description related thereto is omitted.
Referring to
The neural network may refer to an overall model having a problem solution capability in such a manner that nodes forming a network through a synaptic combination change bonding strength of connections through learning. While the network may be referred to as a “neural” network, such reference is not intended to impart any relatedness with respect to how the network computationally maps or thereby intuitively recognizes information and how a biological brain operates. That is, the term “neural network” is merely a term of art referring to the hardware-implemented network.
A node of the neural network may include a combination of weights or biases. The neural network may include a layer including neurons or nodes. The neural network may infer a result desired to be predicted from an arbitrary input by changing a weight of a node through learning.
The neural network may include a deep neural network. The neural network may include a convolutional neural network (CNN), a recurrent neural network (RNN), a perceptron, a multilayer perceptron, a feed forward (FF), a radial basis network (RBN), a deep feed forward (DFF), a long short term memory (LSTM), a gated recurrent unit (GRU), an auto encoder (AE), a variational auto encoder (VAE), a denoising auto encoder (DAE), a sparse auto encoder (SAE), a Markov chain (MC), a Hopfield network (HN), a Boltzmann machine (BM), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a deep convolutional network (DCN), a deconvolutional network (DN), a deep convolutional inverse graphics network (DCIGN), a generative adversarial network (GAN), a liquid state machine (LSM), an extreme learning machine (ELM), an echo state network (ESN), a deep residual network (DRN), a differentiable neural computer (DNC), a neural Turing machine (NTM), a capsule network (CN), a Kohonen network (KN), and/or an attention network.
The operation apparatus 10 may be implemented as or in a personal computer (PC), a data server, and/or a portable device.
The portable device may be implemented as a laptop computer, a mobile phone, a smart phone, a tablet PC, a mobile Internet device (MID), a personal digital assistant (PDA), an enterprise digital assistant (EDA), a digital still camera, a digital video camera, a portable multimedia player (PMP), a personal navigation device or a portable navigation device (PND), a handheld game console, an e-book, and/or a smart device. The smart device may be implemented as a smart watch, a smart band, and/or a smart ring.
The operation apparatus 10 of one or more embodiments may maintain precision, may improve operation efficiency, and may support various types of data formats. The operation apparatus 10 of one or more embodiments may remove a performance degradation factor in a floating point multiply-accumulation and may enable a high-performance, high-precision neural processor application by smoothly performing a trade-off between a performance and a precision of an application stage. The operation apparatus 10 may support multiple data formats and reduce a data hazard by a latency occurring in floating point accumulation.
The operation apparatus 10 of one or more embodiments may perform a one-cycle operation without requiring postprocessing support and without a performance loss. To this end, the operation apparatus 10 may provide a feedback loop having a shorter latency and a simple structure through a pseudo-floating addition with a fixed radix point, using a second alignment (for example, a global alignment) and an exponent update.
When packing a partial accumulation result, the operation apparatus 10 may reduce the size of a large shifter using a coarse normalization and may accordingly be configured with simpler operation hardware. Depending on examples, the operation apparatus 10 may be modified to implement a 1-bit unit normalization. Even in this case, the simplification of reloading and packing may be maintained.
The operation apparatus 10 may include a receiver 110, a processor 130 (e.g., one or more processors), and a memory 150 (e.g., one or more memories).
The receiver 110 may receive and/or store a plurality of data corresponding to a plurality of data formats. The receiver 110 may include a receiving interface and may output the plurality of data to the processor 130.
The processor 130 may process data stored in the memory 150 and/or received from the receiver 110. The processor 130 may execute a computer-readable code (e.g., software) stored in the memory 150 and instructions induced by the processor 130.
The processor 130 may refer to a data processing device implemented as hardware having, for example, circuitry in a physical structure for executing desired operations. For example, the desired operations may be performed by a code or instructions included in a program.
For example, the data processing device implemented as hardware may include a microprocessor, a central processing unit, a processor core, a multi-core processor, a multiprocessor, an application-specific integrated circuit (ASIC), and a field programmable gate array (FPGA).
The processor 130 may include a multiplier module 131, a first aligner 133, an adder tree 135, and a second aligner 137. The processor 130 may further include an accumulator 139, an exponent module (not shown), and a custom formatter (not shown).
The multiplier module 131 may include at least one multiplier configured to multiply the plurality of data. The multiplier module 131 may include a first multiplier, a second multiplier, a sign converter, and a combiner.
The first multiplier and the second multiplier may multiply data. The first multiplier and the second multiplier may each multiply bit inputs of predetermined lengths.
The first aligner 133 and the second aligner 137 may align input data. The first aligner 133 and the second aligner 137 may align data by shifting stored data by a bit number.
The adder tree 135 may include a plurality of adders configured in a tree structure. The adder tree 135 may add input data using a plurality of adders.
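The tree-structured addition described above can be illustrated with a minimal software sketch. The function names `adder_tree` and `tree_depth` are hypothetical and are not taken from this disclosure; the sketch only models pairwise reduction and is not a description of the actual circuit.

```python
def adder_tree(values):
    """Sum integers by pairwise reduction, one tree level at a time."""
    level = list(values)
    if not level:
        return 0
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)  # pad an odd-sized level with a zero operand
        # Each adder of this level adds one pair of neighbouring operands.
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0]


def tree_depth(n):
    """Number of adder levels needed to reduce n inputs (n >= 1)."""
    depth = 0
    while n > 1:
        n = (n + 1) // 2
        depth += 1
    return depth
```

Under this model, 16 inputs are reduced in four adder levels, which is why a tree is typically preferred over a serial chain of 15 additions.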
The accumulator 139 may accumulate input data.
The memory 150 may store data for an operation or an operation result. The memory 150 may store instructions or a program executable by the processor 130. For example, the instructions may include instructions for executing an operation of the processor 130 and/or an operation of each component of the processor 130.
The memory 150 may be implemented as a volatile memory device or a non-volatile memory device.
The volatile memory device may be implemented as a dynamic random access memory (DRAM), a static random access memory (SRAM), a thyristor RAM (T-RAM), a zero capacitor RAM (Z-RAM), or a twin transistor RAM (TTRAM).
The non-volatile memory device may be implemented as an electrically erasable programmable read-only memory (EEPROM), a flash memory, a magnetic RAM (MRAM), a spin-transfer torque (STT)-MRAM, a conductive bridging RAM (CBRAM), a ferroelectric RAM (FeRAM), a phase change RAM (PRAM), a resistive RAM (RRAM), a nanotube RRAM, a polymer RAM (PoRAM), a nano floating gate memory (NFGM), a holographic memory, a molecular electronic memory device, or an insulator resistance change memory.
Referring to
The first multiplier may multiply a first bit input and a second bit input included in a plurality of data. The second multiplier may multiply a plurality of first bit inputs included in the plurality of data. The first bit input and the second bit input may differ from each other.
The sign converter 216 may convert a sign of an output of a multiplier. The combiner 217 may combine an output of the sign converter 216.
The operation apparatus 10 may perform an operation using the multiplier module 210 to support a half-precision floating point and integer 8-bit and 4-bit operations. The multiplier module 210 may include an array that includes four multipliers, for example, the multiplier 211, the multiplier 212, the multiplier 213, and the multiplier 214, each having an 8×4-bit input and a single multiplier, for example, the multiplier 215, having a 4×4-bit input. The multiplier module 210 may include the sign converter 216 configured to change a sign of a multiplication result and the combiner 217 configured to combine an operation result according to a data format as a primary adder tree.
The multiplier module 210 may receive inputs of two absolute values of 2 bytes and may perform a multiplication on 4-bit, 8-bit, 16-bit integers and a mantissa of a 16-bit floating point using four 8×4-bit multipliers 211 to 214 and a single 4×4-bit multiplier 215.
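As a non-limiting illustration of how small multipliers may be combined, the following sketch builds a 12×12-bit unsigned product (wide enough for the 11-bit mantissa, including the hidden bit, of a half-precision floating point value) from four 8×4-bit products and one 4×4-bit product. This particular decomposition and the names `mul_8x4`, `mul_4x4`, and `mul_12x12` are assumptions made for illustration; the actual routing of the multipliers 211 to 215 is not specified here.

```python
def mul_8x4(a8, b4):
    """Model of an unsigned multiplier with an 8-bit and a 4-bit input."""
    assert 0 <= a8 < (1 << 8) and 0 <= b4 < (1 << 4)
    return a8 * b4


def mul_4x4(a4, b4):
    """Model of an unsigned multiplier with two 4-bit inputs."""
    assert 0 <= a4 < (1 << 4) and 0 <= b4 < (1 << 4)
    return a4 * b4


def mul_12x12(a, b):
    """12x12-bit unsigned product from four 8x4 products and one 4x4 product."""
    assert 0 <= a < (1 << 12) and 0 <= b < (1 << 12)
    a_lo, a_hi = a & 0xFF, a >> 8            # 8-bit low part, 4-bit high part
    b_lo, b_hi = b & 0xFF, b >> 8
    b_lo_lo, b_lo_hi = b_lo & 0xF, b_lo >> 4  # split b_lo into two nibbles
    p0 = mul_8x4(a_lo, b_lo_lo)   # weight 2^0
    p1 = mul_8x4(a_lo, b_lo_hi)   # weight 2^4
    p2 = mul_8x4(a_lo, b_hi)      # weight 2^8
    p3 = mul_8x4(b_lo, a_hi)      # weight 2^8
    p4 = mul_4x4(a_hi, b_hi)      # weight 2^16
    # Combine the partial products according to their weights.
    return p0 + (p1 << 4) + ((p2 + p3) << 8) + (p4 << 16)
```

In this sketch an 8×8-bit product uses only `p0` and `p1`, which suggests how the same small multipliers may serve several integer widths when the inputs are routed differently.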
Referring to
The operation apparatus 10 may allocate an input to the same operation apparatus provided in a small scale according to a data format used for a vector input having a multi-data format.
The operation apparatus 10 may encode a multiplication result according to a sign of input data and may perform a sign extension to minimize switching. The operation apparatus 10 may combine and output an encoding result to support all of various integer-type and floating point-type data.
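The sign handling described above may be sketched as follows, assuming that each input is supplied as an absolute value with a separate sign bit and that the encoded result is a two's-complement number. The helper names `signed_product`, `sign_extend`, and `to_int` and the chosen bit widths are hypothetical.

```python
def signed_product(mag_a, sign_a, mag_b, sign_b, width):
    """Two's-complement encoding (width bits) of a sign-magnitude product."""
    magnitude = mag_a * mag_b                 # unsigned multiplier output
    negative = (sign_a ^ sign_b) == 1 and magnitude != 0
    value = -magnitude if negative else magnitude
    assert -(1 << (width - 1)) <= value < (1 << (width - 1))
    return value & ((1 << width) - 1)


def sign_extend(encoded, from_width, to_width):
    """Replicate the sign bit of a from_width-bit value up to to_width bits."""
    sign = (encoded >> (from_width - 1)) & 1
    if sign:
        encoded |= ((1 << to_width) - 1) & ~((1 << from_width) - 1)
    return encoded


def to_int(encoded, width):
    """Interpret a width-bit two's-complement encoding as a Python int."""
    return encoded - (1 << width) if encoded >> (width - 1) else encoded
```

For example, magnitudes 3 and 5 with differing sign bits encode to 0xF1 in 8 bits, and extending that encoding to 16 bits gives 0xFFF1, which still represents -15.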
The operation apparatus 10 may perform a data hazard-free operation. The operation apparatus 10 of one or more embodiments may provide a feedback loop having a shorter latency and a simpler structure through a pseudo-floating addition with a fixed radix point through a global alignment and an exponent update using the second aligner 137, compared to a typical operation apparatus.
The operation apparatus 10 may provide a floating point representation in a custom format. The operation apparatus 10 may use the floating point representation including an exponent part and an encoded (e.g., may include an integer part) mantissa part. The operation apparatus 10 of one or more embodiments may reduce an operation processing time by transmitting a partial operation result to an outside and by simplifying a reloading structure.
The operation apparatus 10 may perform multiplication and accumulation operations of various lengths using unsigned multipliers of K n×m-bit inputs. The operation apparatus 10 may receive (K/2)×2n-bit and (K/2)×m-bit absolute values and a (K/2)-bit sign and may generate an m-bit output.
Each input may be routed to a plurality of small-scale multipliers according to a data format to be operated and a result of a multiplier according to a corresponding input may be encoded according to each sign and combined with outputs of a plurality of multipliers.
The operation apparatus 10 may perform a data hazard-free floating point operation. The operation apparatus 10 of one or more embodiments may simplify an alignment process by applying a custom format for independent intermediate data and by improving an operating scheme for an internal floating point operation, compared to a typical operation apparatus.
The operation apparatus 10 of one or more embodiments may achieve a data hazard-free structure through a single-cycle accumulation by performing an addition capable of responding to an overflow, by not requiring a separate normalization during an accumulation, and by enabling a small latency accumulation.
The operation apparatus 10 may fetch independent custom format data and standard floating point data. The operation apparatus 10 may store custom format data in an accumulation device and may perform multiply-accumulation on input data, may convert again an accumulation result to the custom format data, and then may transmit the custom format data to an external device.
The operation apparatus 10 may include a multiplier module 131, a first aligner 133, an exponent module 310, an adder tree (AT) 135, a second aligner 137, an accumulator 139, and a custom formatter 330.
The multiplier module 131 may support an operation on multi-format data. The operations of the multiplier module 131 may be the same as what has been described above with reference to
The first aligner 133 may right-shift a plurality of outputs of the multiplier module 131 according to an exponent difference.
The first aligner 133 may perform a first alignment on a multiplication result of the plurality of data based on an exponent value of the plurality of data. The first aligner 133 may shift the multiplication result based on a difference between a maximum exponent value obtained based on the exponent value and a sum of remaining exponent values.
The exponent module 310 may add the exponent values of each input pair of the multiplier module 131 and may calculate and output a maximum exponent value among the resulting exponent values and a difference between the maximum exponent value and a sum of remaining exponent values.
The adder tree 135 may add a first alignment result. The adder tree 135 may add a right-shifted value.
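A minimal sketch of the first alignment followed by the addition is given below. It assumes one reading of the description above: each product carries its own exponent sum, and each product is right-shifted by the difference between the maximum exponent sum and its own exponent sum before the addition. The function name `local_align_and_add` is hypothetical, and Python integers stand in for signed fixed-width hardware values.

```python
def local_align_and_add(products, exponents):
    """Right-shift each signed product to the maximum exponent, then add.

    products  : signed integer mantissa products, one per multiplier lane
    exponents : exponent sum associated with each product
    Returns (sum of aligned products, maximum exponent).
    """
    if not products:
        return 0, 0
    max_exp = max(exponents)
    # Python's >> on a negative int is an arithmetic shift (rounds toward
    # negative infinity), which mirrors a sign-preserving hardware shifter.
    aligned = [p >> (max_exp - e) for p, e in zip(products, exponents)]
    return sum(aligned), max_exp
```

Because low-order bits are dropped by the right shift, the result is an approximation whose precision depends on how many guard bits are appended before shifting; the sketch appends none.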
The second aligner 137 may perform a second alignment on an addition result based on the exponent value and an operation result of a previous cycle. The second aligner 137 may shift the addition result based on a maximum exponent value obtained based on the exponent value and the operation result of the previous cycle.
The second aligner 137 may extend a sign bit of the plurality of data based on a predetermined radix point. The second aligner 137 may add the extended sign bit to the exponent value.
The second aligner 137 may calculate a difference between the maximum exponent value delivered from the exponent module 310 and an exponent value stored according to an operation result of a previous cycle, and may left-shift an output result of the adder tree 135 and/or an output result of the accumulator 139 having a relatively smaller exponent value between the output result of the adder tree 135 and the internal exponent result of the accumulator 139.
The accumulator 139 may accumulate a second alignment result. The accumulator 139 may perform a signed fixed-point addition with an output result of the adder tree 135 or an output result of the accumulator 139 having a relatively larger exponent value. The accumulator 139 may modify and store existing information using an addition result and the larger exponent value.
The custom formatter 330 may perform normalization on an output in which a sign bit is removed. When data being accumulated is to be delivered to an external device, the custom formatter 330 may remove a sign bit of a multiple length of k having a value of 1 or more from an accumulation value and may add a multiple value of k to an exponent value stored in the accumulator 139. The custom formatter 330 may pack the added exponent value and a signed output of the accumulator 139 subject to a coarse normalization process in which a partial sign bit is removed as a signed mantissa and may deliver the same to an external storage device.
Here, when data having the same format as one packed by the custom formatter 330 is delivered from the external device and reloaded to the accumulator 139, the second aligner 137 may perform a sign extension on the signed mantissa according to a radix point pre-specified in the accumulator 139, may perform a compensation by adding a length of an extended sign bit to the delivered exponent value, and may align an accumulation result.
An exponent value input to and output from the operation apparatus 10 may include a bias value, such as bias = 2^(n-1) − 1, for bit length n.
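The bias relation above, read as two to the power of (n − 1), minus one, can be checked with a short sketch. The function names are hypothetical. For n = 6 the relation gives 31, and for n = 5 it gives 15, which is the exponent bias of the standard half-precision format.

```python
def exponent_bias(n):
    """Bias for an n-bit exponent field: 2**(n - 1) - 1."""
    return (1 << (n - 1)) - 1


def encode_exponent(e, n):
    """Store a signed exponent e as an unsigned n-bit biased value."""
    biased = e + exponent_bias(n)
    assert 0 <= biased < (1 << n), "exponent out of range for this width"
    return biased


def decode_exponent(biased, n):
    """Recover the signed exponent from its n-bit biased encoding."""
    return biased - exponent_bias(n)
```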
The example of
Referring to
In the example of
In the example of
In the example of
Referring to
A combiner (e.g., the combiner 217 of
In the example of
In the example of
Referring to
The operation apparatus 10 may include a register 1129, a swap module (SWAP) 1131, a minimum/maximum extractor (min/MAX) 1133, and a difference value extractor (DIFF) 1135, a global aligner 1137 (for example, the second aligner 137 of
The exponent extractor 1113 may extract an exponent value from input data.
The MAC array 1115 may multiply input data. The MAC array 1115 may operate in the same manner as the multiplier module 131 of
The exponent module 1117 may add exponent values extracted by the exponent extractor 1113. The exponent module 1117 may calculate a difference between a maximum exponent value among exponent values and a sum of remaining exponent values, and output the calculated difference. The exponent module 1117 may calculate the addition of exponent parts that accompanies a multiplication. For example, the exponent module 1117 may add exponent values Ea and Eb obtained from the exponent extractor 1113 based on a bias. In this example, when the bias is 1 with respect to a multiplication result, the exponent module 1117 may calculate an exponent value as Ea+Eb+1. The exponent module 1117 may output sft_seq by searching for a maximum sum. The exponent module 1117 may output 16 sft_amt values for the local aligner 1123 by calculating a difference from the maximum sum.
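The exponent path described above may be sketched as follows, under the assumption that each lane's shift amount is the difference between the maximum exponent sum and that lane's exponent sum. The function names and the `adjust` parameter (modeling the "+1" term of Ea+Eb+1) are hypothetical; only sft_amt is a name used in this description.

```python
def product_exponents(ea, eb, adjust=1):
    """Per-lane exponent of a product: Ea + Eb + adjust."""
    return [a + b + adjust for a, b in zip(ea, eb)]


def shift_amounts(sums):
    """Return (maximum sum, per-lane difference from the maximum sum)."""
    max_sum = max(sums)
    sft_amt = [max_sum - s for s in sums]  # one shift amount per lane
    return max_sum, sft_amt
```

With 16 input pairs, `shift_amounts` returns 16 shift amounts, at least one of which is zero because the lane holding the maximum sum needs no shift.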
The register 1119, the register 1125, the register 1129, and the register 1141 may store therein data.
The adder 1121 may add an output of the MAC array 1115. The adder 1121 may add 7 zero bits to a tail. The local aligner 1123 may align an output of the adder 1121. The local aligner 1123 may perform an alignment in the same manner as the first aligner 133 of
The adder tree 1127 may add 16 inputs. The adder tree 1127 may operate in the same manner as the adder tree 135 of
The swap module 1131 may receive an output of the adder tree 1127 and replace a portion of data. The minimum/maximum extractor 1133 may extract a minimum or maximum value based on an output of the swap module 1131, an output of the exponent module 1117, and psum_in.
The global aligner 1137 may align an operation result. The global aligner 1137 may operate in the same manner as the second aligner 137 of
The accumulator 1139 may accumulate an output of the global aligner 1137. The accumulator 1139 may operate in the same manner as the accumulator 139 of
A subtotal generator may generate a partial sum. The subtotal generator may include the normalizer 1143 and the coarse normalization detector 1145. The normalizer 1143 may perform a byte-wise left shift. Here, a shift amount may be a multiple of 8. The coarse normalization detector 1145 may perform a leading sign detection having an encoded output. The coarse normalization detector 1145 may calculate a shift factor. For example, when a leading sign is one_pos_inc, the coarse normalization detector 1145 may calculate a shift factor as follows.
casex(one_pos_inc+1)
The coarse normalization detector 1145 may calculate a difference between a shift factor and a global shift sequence.
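As an illustration of byte-wise coarse normalization, the following sketch counts the redundant leading sign bits of a two's-complement value and shifts left by the largest multiple of 8 that does not exceed that count. The mapping from the detected position to the shift factor is an assumption made for this example, and the names redundant_sign_bits and coarse_normalize are hypothetical. The 32-bit default width is likewise chosen only for illustration.

```python
def redundant_sign_bits(value, width):
    """Count bits below the sign bit that merely repeat the sign bit."""
    value &= (1 << width) - 1
    sign = (value >> (width - 1)) & 1
    count = 0
    for pos in range(width - 2, -1, -1):
        if (value >> pos) & 1 != sign:
            break
        count += 1
    return count


def coarse_normalize(value, width=32):
    """Shift left by a multiple of 8 bits without changing the sign bit.

    Returns (normalized_value, shift), both as non-negative integers;
    normalized_value is the raw width-bit pattern.
    """
    mask = (1 << width) - 1
    # Assumed rule: largest multiple of 8 not exceeding the redundant count.
    shift = (redundant_sign_bits(value, width) // 8) * 8
    return ((value & mask) << shift) & mask, shift


if __name__ == "__main__":
    normalized, shift = coarse_normalize(0x000001FF)
    print(hex(normalized), shift)
```

Because the shift is restricted to whole bytes, up to 7 redundant sign bits may remain after this step.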
A final output 1147 of the operation apparatus 10 may include exp, which is an exponent part, and mantissa, which is a mantissa part. For example, the exponent part may include 6 bits and the mantissa part may include 26 bits. Unlike a standard floating point format, the mantissa part may include a hidden bit and may be a signed number. An exponent may be biased by 31 and, similar to the standard floating point format, a partial area may be reserved for INF, overflow, underflow, and zero.
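The example field widths can be illustrated with a small packing sketch. It assumes that the 6-bit exponent occupies the upper bits of a 32-bit word, and that the 26-bit mantissa is stored below it in two's complement. The field order, the two's-complement encoding, and the names pack and unpack are assumptions made for illustration; the reserved codes for INF, overflow, underflow, and zero are not modeled.

```python
EXP_BITS = 6    # exponent width from the example in the text
MAN_BITS = 26   # mantissa width from the example in the text
EXP_BIAS = 31   # exponent bias from the example in the text


def pack(exp_unbiased, mantissa):
    """Pack an unbiased exponent and a signed mantissa into one word."""
    biased = exp_unbiased + EXP_BIAS
    if not 0 <= biased < (1 << EXP_BITS):
        raise ValueError("exponent out of range")
    half = 1 << (MAN_BITS - 1)
    if not -half <= mantissa < half:
        raise ValueError("mantissa out of range")
    # Assumed layout: exponent in the upper bits, mantissa in the lower bits.
    return (biased << MAN_BITS) | (mantissa & ((1 << MAN_BITS) - 1))


def unpack(word):
    """Recover (unbiased exponent, signed mantissa) from a packed word."""
    biased = (word >> MAN_BITS) & ((1 << EXP_BITS) - 1)
    mantissa = word & ((1 << MAN_BITS) - 1)
    if mantissa >= 1 << (MAN_BITS - 1):
        mantissa -= 1 << MAN_BITS  # undo two's-complement wrap
    return biased - EXP_BIAS, mantissa


if __name__ == "__main__":
    word = pack(-3, -5)
    print(hex(word), unpack(word))
```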
Referring to
In operation 1210, a receiver, for example, the receiver 110 of
In operation 1230, a multiplier module, for example, the multiplier module 210 of
In operation 1250, a first aligner, for example, the first aligner 133 of
In operation 1270, an adder tree, for example, the adder tree 135 of
In operation 1290, a second aligner, for example, the second aligner 137 of
A second multiplier, for example, the multiplier 215 of
An exponent module, for example, the exponent module 310 of
An accumulator, for example, the accumulator 139 of
A custom formatter, for example, the custom formatter 330, may remove a sign bit with a predetermined length from an output of the accumulator. The custom formatter 330 may perform normalization on an output in which the sign bit is removed.
The operation apparatuses, receivers, processors, memories, multiplier modules, first aligners, adder trees, second aligners, accumulators, multipliers, sign converters, combiners, exponent modules, custom formatters, input registers, exponent extractors, MAC arrays, registers, adders, local alignment modules, swap modules, minimum/maximum extractors, difference value extractors, global aligners, coarse normalization detectors, normalizers, operation apparatus 10, receiver 110, processor 130, memory 150, multiplier module 131, first aligner 133, adder tree 135, second aligner 137, accumulator 139, multiplier module 210, multiplier 211, multiplier 212, multiplier 213, multiplier 214, multiplier 215, sign converter 216, combiner 217, multiplier module 310, first aligner 320, exponent module 330, adder tree (AT) 340, second aligner 350, accumulator 360, custom formatter 370, input register 1111, exponent extractor 1113, MAC array 1115, exponent module (EXP) 1117, register 1119, adder (Add) 1121, local alignment module 1123, register 1125, adder tree 1127, register 1129, swap module (SWAP) 1131, minimum/maximum extractor (min/MAX) 1133, difference value extractor (DIFF) 1135, global aligner 1137, accumulator 1139, register 1141, coarse normalization detector 1145, normalizer 1143, and other apparatuses, devices, units, modules, and components described herein with respect to
The methods illustrated in
Instructions or software to control computing hardware, for example, one or more processors or computers, to implement the hardware components and perform the methods as described above may be written as computer programs, code segments, instructions or any combination thereof, for individually or collectively instructing or configuring the one or more processors or computers to operate as a machine or special-purpose computer to perform the operations that are performed by the hardware components and the methods as described above. In one example, the instructions or software include machine code that is directly executed by the one or more processors or computers, such as machine code produced by a compiler. In another example, the instructions or software include higher-level code that is executed by the one or more processors or computers using an interpreter. The instructions or software may be written using any programming language based on the block diagrams and the flow charts illustrated in the drawings and the corresponding descriptions in the specification, which disclose algorithms for performing the operations that are performed by the hardware components and the methods as described above.
The instructions or software to control computing hardware, for example, one or more processors or computers, to implement the hardware components and perform the methods as described above, and any associated data, data files, and data structures, may be recorded, stored, or fixed in or on one or more non-transitory computer-readable storage media. Examples of a non-transitory computer-readable storage medium include read-only memory (ROM), random-access programmable read only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random access memory (DRAM), static random access memory (SRAM), flash memory, non-volatile memory, CD-ROMs, CD-Rs, CD+Rs, CD-RWs, CD+RWs, DVD-ROMs, DVD-Rs, DVD+Rs, DVD-RWs, DVD+RWs, DVD-RAMs, BD-ROMs, BD-Rs, BD-R LTHs, BD-REs, Blu-ray or optical disk storage, hard disk drive (HDD), solid state drive (SSD), flash memory, a card type memory such as multimedia card micro or a card (for example, secure digital (SD) or extreme digital (XD)), magnetic tapes, floppy disks, magneto-optical data storage devices, optical data storage devices, hard disks, solid-state disks, and any other device that is configured to store the instructions or software and any associated data, data files, and data structures in a non-transitory manner and provide the instructions or software and any associated data, data files, and data structures to one or more processors or computers so that the one or more processors or computers can execute the instructions. In one example, the instructions or software and any associated data, data files, and data structures are distributed over network-coupled computer systems so that the instructions and software and any associated data, data files, and data structures are stored, accessed, and executed in a distributed fashion by the one or more processors or computers.
While this disclosure includes specific examples, it will be apparent after an understanding of the disclosure of this application that various changes in form and details may be made in these examples without departing from the spirit and scope of the claims and their equivalents. The examples described herein are to be considered in a descriptive sense only, and not for purposes of limitation. Descriptions of features or aspects in each example are to be considered as being applicable to similar features or aspects in other examples. Suitable results may be achieved if the described techniques are performed in a different order, and/or if components in a described system, architecture, device, or circuit are combined in a different manner, and/or replaced or supplemented by other components or their equivalents.
Number | Date | Country | Kind |
---|---|---|---|
10-2021-0115470 | Aug 2021 | KR | national |
10-2022-0066180 | May 2022 | KR | national |