The present technique relates to the field of data processing, and more particularly to techniques for generating an output value representing a shifted input value.
There are many situations when performing data processing operations where it is required to perform a shift operation on an input value. However, when employing typical shift logic circuitry, the time taken to perform the required shift operation on an input value can be significant, and this can adversely affect performance.
It would be desirable to enable the time taken to perform a shift operation to be reduced, so as to reduce the delay associated with performing shift operations on data.
In accordance with a first example arrangement, there is provided an apparatus for performing a computation equivalent to applying a shift to an input value to generate an output value, comprising: mask generation circuitry to generate an N-bit mask in dependence on a provided shift amount indication, where N is a number of possible bit positions that a given bit of the input value may be located within the output value after the shift is performed, wherein the mask generation circuitry is arranged to perform N independent logical operations on bits forming the shift amount indication, each logical operation producing a mask bit value for a corresponding bit position of the N-bit mask, and the N logical operations being arranged such that, for any given shift amount indication, only one bit position in the generated N-bit mask will have its mask bit value indicating a set state; and output value generation circuitry to apply the N-bit mask to the given bit of the input value in order to determine a corresponding location of the given bit within the output value, and to determine a location within the output value of each other bit of the input value in dependence on the corresponding location of the given bit.
In accordance with a second example arrangement, there is provided a system comprising: the apparatus in accordance with the first example arrangement, implemented in at least one packaged chip; at least one system component; and a board, wherein the at least one packaged chip and the at least one system component are assembled on the board.
In accordance with a further example arrangement, there is provided a chip-containing product comprising the system in accordance with the second example arrangement, assembled on a further board with at least one other product component.
In accordance with a yet further example arrangement, there is provided a computer-readable medium to store computer-readable code for fabrication of an apparatus in accordance with the first example arrangement discussed above. The computer-readable medium may be a transitory computer-readable medium or a non-transitory computer-readable medium.
In accordance with a still further example arrangement, there is provided a method of performing a computation equivalent to applying a shift to an input value to generate an output value, comprising: employing mask generation circuitry to generate an N-bit mask in dependence on a provided shift amount indication, where N is a number of possible bit positions that a given bit of the input value may be located within the output value after the shift is performed, the mask generation circuitry performing N independent logical operations on bits forming the shift amount indication, each logical operation producing a mask bit value for a corresponding bit position of the N-bit mask, and the N logical operations being arranged such that, for any given shift amount indication, only one bit position in the generated N-bit mask will have its mask bit value indicating a set state; and applying the N-bit mask to the given bit of the input value in order to determine a corresponding location of the given bit within the output value, and determining a location within the output value of each other bit of the input value in dependence on the corresponding location of the given bit.
Further aspects, features and advantages of the present technique will be apparent from the following description of examples, which is to be read in conjunction with the accompanying drawings, in which:
In accordance with the techniques described herein, an apparatus is provided that can perform a computation equivalent to applying a shift to an input value in order to generate an output value. However, rather than using conventional shift logic circuitry to perform the shift, the technique described herein instead makes use of a mask that can be applied to the input value in order to generate the required output value.
In particular, the apparatus has mask generation circuitry to generate an N-bit mask in dependence on a provided shift amount indication, where N is a number of possible bit positions that a given bit of the input value may be located within the output value after the shift is performed. The mask generation circuitry is arranged to perform N independent logical operations on bits forming the shift amount indication, each logical operation producing a mask bit value for a corresponding bit position of the N-bit mask. Since the N logical operations are independent of each other, there is a great deal of flexibility as to how the logical operations are performed, and in one example implementation the logical operations are performed in parallel so that all of the N mask bit values required for the mask can be generated in parallel.
The N logical operations are arranged such that, for any given shift amount indication, only one bit position in the generated N-bit mask will have its mask bit value indicating a set state. The value used to indicate the set state may be varied dependent on implementation. In one example implementation, a value of 1 is used to indicate the set state, and a value of 0 is used to indicate the unset (or clear) state, and hence in such an implementation only one bit position in the N-bit mask will have its mask bit value set to 1. However, it will be appreciated that in an alternative implementation a value of 0 could be used to indicate the set state and a value of 1 could be used to indicate the clear state, in which case only one bit position in the N-bit mask would have its mask bit value set to 0.
The apparatus further has output value generation circuitry that is used to apply the N-bit mask to the given bit of the input value in order to determine a corresponding location of the given bit within the output value, and to determine a location within the output value of each other bit of the input value in dependence on the corresponding location of the given bit.
By such an approach, it has been found that the delay associated with applying a shift to an input value can be significantly reduced, thereby enabling significant performance improvements to be realised in an apparatus where the application of shifts to data values is often required.
The shift indication amount can take a variety of forms. However, in one example implementation, the shift indication amount is an X-bit value, and each logical operation uses X inputs chosen from non-inverted and inverted versions of each bit of the shift indication amount, such that each logical operation has a different set of X inputs from every other logical operation of the N logical operations. The logical operations are chosen such that, for all combinations of possible values of the X bits forming the shift indication amount, only one of the logical operations will produce a mask bit value that is in the set state.
The logical operations performed can take a variety of forms, dependent on implementation. However, in one example implementation, each logical operation is arranged to perform a logical AND of the X inputs provided for that logical operation. In such an example implementation, only one of the logical operations will receive as inputs all 1s, and hence only one of the logical operations will produce an output of 1, which in one example implementation can be used to indicate the set state.
The number of bits forming the output value, and the bit length of the output value relative to the input value, may vary dependent on implementation. In one example implementation, the output value comprises at least N bits, and hence is at least as large as the mask. In one such example implementation, the output value generation circuitry is then arranged to align the N-bit mask with N bit positions of the output value such that the corresponding location of the given bit within the output value is identified by the one bit position in the generated N-bit mask that has the mask bit value indicating the set state.
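By way of illustration only, the following Python sketch models the mask generation in software. It assumes that the set state is indicated by a value of 1 and that N = 2^X. The function name generate_mask, and the particular assignment of non-inverted and inverted inputs to each mask bit position, are hypothetical choices made for the example rather than details taken from the description above.

```python
def generate_mask(shift_amount, x_bits):
    """Return an N-bit one-hot mask (N = 2**x_bits) as a list of mask bit values.

    mask[k] is produced by its own AND of x_bits inputs, each input being either
    a bit of the shift amount or its inverse. No mask bit depends on another,
    so in hardware all N ANDs could be evaluated in parallel.
    """
    # Non-inverted and inverted versions of every bit of the shift amount.
    plain = [(shift_amount >> i) & 1 for i in range(x_bits)]
    inverted = [1 - b for b in plain]

    mask = []
    for k in range(1 << x_bits):
        # Assumed input selection for position k: use the non-inverted bit
        # where k has a 1 and the inverted bit where k has a 0.
        inputs = [plain[i] if (k >> i) & 1 else inverted[i] for i in range(x_bits)]
        mask.append(int(all(inputs)))  # logical AND of the X inputs
    return mask
```

With this input selection, generate_mask(5, 3) produces a mask whose only set bit is at index 5, because position 5 is the only one whose three inputs are all 1.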
There are various ways in which the location within the output value of each other bit of the input value can be determined, in dependence on the corresponding location of the earlier-mentioned given bit. However, in one example implementation, the output value generation circuitry is arranged to apply a shifted version of the N-bit mask to each other bit of the input value other than the given bit in order to determine a corresponding location of each other bit within the output value. For any selected other bit of the input value the shifted version of the N-bit mask is arranged to adjust for a number of bit positions between the given bit and the selected other bit. By such an approach, it has been found that the same N-bit mask can be reused, albeit in a logically shifted form, to work out the position within the output value of each bit of the input value. This can lead to a particularly efficient implementation.
Indeed, in one example implementation, each shifted version of the N-bit mask is achieved by rewiring of the N-bit mask. Hence no logic is required in order to generate each shifted version of the N-bit mask, and instead simple wiring patterns can be used to move the N-bit mask to align it appropriately with respect to the bit positions of the output value so as to cause each bit of the input value to be located correctly within the output value in order to produce a result equivalent to applying a shift to the input value. This can lead to a significant reduction in the time taken to produce the output value, when compared with techniques using conventional shift logic circuits.
In one example implementation, a given shifted version of the N-bit mask used for a given other bit of the input value comprises the N-bit mask aligned with N bit positions of the output value such that the corresponding location of the given other bit within the output value is identified by the one bit position in the given shifted version of the N-bit mask that has the mask bit value indicating the set state. Hence, a simple application of that shifted version of the N-bit mask can be used to propagate the value of the given other bit from the input value to the corresponding location in the output value. For example, in situations where the single bit of the N-bit mask that is in the set state has a value of 1, the given other bit of the input value can be replicated N times to produce an N-bit value which is then logically AND-ed with the given shifted version of the N-bit mask in order to propagate the value of that given other bit into the corresponding location within the output value.
In one example implementation, the input value comprises L bits, and L−1 shifted versions of the N-bit mask are formed, each shifted version being associated with one of the L−1 other bits of the input value.
In such an example implementation, the output value generation circuitry may be arranged to generate L intermediate results, each intermediate result being generated by applying a version of the N-bit mask to its associated bit of the input value, and the output value generation circuitry may comprise combining circuitry to logically combine the L intermediate results to generate the output value.
As mentioned earlier, the number of bits forming the output value may vary dependent on implementation. However, in one example implementation the output value has P bits, where P is greater than N. In such an implementation, the output value generation circuitry may be arranged to generate each intermediate result such that that intermediate result includes P-N padding bit values provided in at least one of most significant bit positions and least significant bit positions of the intermediate result. The number of padding bit values added in most significant bit positions and the number of padding bit values added in least significant bit positions will depend on the bit of the input value associated with each intermediate result in question.
The given bit of the input value that the N-bit mask is applied to may vary dependent on implementation, but in one example implementation the given bit is the most significant bit of the input value. In such an implementation, the intermediate result generated by applying the N-bit mask to the given bit has padding bits provided in the P-N least significant bit positions, and the intermediate result generated for each successive less significant bit of the input value has one more padding bit in the most significant bit positions than the intermediate result generated for an adjacent more significant bit of the input value.
In one particular example implementation P=N+L−1, where as mentioned earlier L represents the number of bits in the input value. In this particular example implementation, the most significant bit of the input value is positioned within one of the N bit positions of the output value identified by the N-bit mask, and each less significant bit of the input value is positioned in a corresponding less significant bit position of the output value. By arranging for the output value to have N+L−1 output bits, and aligning the N-bit mask with the most significant N bit positions of the output value, this means that even in situations where the most significant bit of the input value is placed within the output value in the least significant bit position identified by the N-bit mask, the remaining L−1 bits of the input value can be positioned within the output value at corresponding less significant bit positions than the bit position of the output value used to accommodate the most significant bit of the input value.
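The arrangement with P=N+L−1 can be modelled in software as a minimal sketch. The sketch assumes a set state of 1, a padding value of 0, and a logical OR as the combining operation. The function name shift_via_mask and the convention that mask bit j places the most significant input bit at output position (L−1)+j are assumptions made for the example. Each left shift by the constant i in the code stands in for fixed wiring, since i is known per input bit and does not depend on the shift amount.

```python
def shift_via_mask(value, length, mask, n):
    """Place the `length`-bit `value` within an (n + length - 1)-bit output.

    `mask` is an n-bit one-hot integer. If mask bit j is set, the result
    equals `value` shifted left by j places.
    """
    all_ones = (1 << n) - 1
    output = 0
    for i in range(length):              # one intermediate result per input bit
        bit = (value >> i) & 1
        replicated = all_ones if bit else 0   # the input bit replicated n times
        selected = replicated & mask          # AND with the one-hot mask
        # Fixed offset i models the rewired (shifted) mask for bit i:
        # i zero padding bits below, (length - 1 - i) zero padding bits above.
        intermediate = selected << i
        output |= intermediate                # combine with logical OR
    return output
```

For example, shift_via_mask(0b1011, 4, 1 << 3, 8) returns 0b1011000, which matches shifting 0b1011 left by three places.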
The number of bits in the output value and the number of bits in the input value may vary dependent on implementation. For example, in some implementations there may be the same number of bits in the output value as in the input value. However, in one example implementation the output value has more bits than the input value, the input value comprises L bits, and the output value generation circuitry is arranged to apply the N-bit mask so as to replicate the input value in a sequence of L bit positions within the output value. The N-bit mask is hence effectively used to decide where within the output value that sequence should be placed.
In one such example implementation, the output value generation circuitry is arranged to generate the output value such that each bit position other than the L bit positions has its value set to a predetermined value. The predetermined value can take a variety of forms. For example, in one implementation the predetermined value may be a 0, such that the input value is replicated in a sequence of L bit positions within the output value, with all other bit positions being set to 0. However, in an alternative implementation the predetermined value could be 1, and in that case the input value would be replicated in a sequence of L bit positions within the output value, with all other bit positions being set to 1.
The techniques described herein can be used in a wide variety of situations where it is desired to perform an effective shift on an input value when producing an output value. In one particular example implementation, such an approach may be used when converting a floating point value into a fixed point value. In particular, whilst a variety of computations may be performed using floating point values, it is often the case that at some point it is desired to convert the floating point value into fixed point form. It is possible to convert any finite floating point number into a wider fixed point number.
In one such example implementation, the input value may be a significand of a floating point value, and the shift indication amount may be an indication of an exponent of the floating point value. The effective shift required of the significand will be dependent on the value of the exponent of the floating point value, and hence the exponent can be used to provide a shift indication amount to the mask generation circuitry in order to cause an appropriate mask to be generated. Furthermore, in such an implementation, the output value may be a fixed point value comprising more bits than are provided by the significand of the floating point value.
The techniques described herein can in principle be applied in association with any floating point format. However, they are particularly beneficially implemented in association with FP8 floating point format, where the relatively constrained range of finite values that can be represented in the floating point format can readily be accommodated within a reasonably sized fixed point number. In one particular example implementation, the technique is used when converting into fixed point form the product generated by multiplying two finite FP8 floating point numbers. Such a product may have a significand that is 8 bits long and the product exponent may be represented by 6 bits. A 6-bit exponent means that the first bit of the product significand can be in any of 64 (2^6) bit positions with the remaining seven bits of the product significand immediately following the first bit, resulting in a 71-bit fixed point number. In actual fact, due to biased FP exponents 0 and 1 representing the same true exponent, and due to the fact that the maximum input exponents do not indicate finite floating point numbers, it is possible to represent any finite FP8 product as a 68-bit fixed point number. Within such an implementation, it is readily possible to generate a mask to implement the required shift functionality, and as a result to significantly reduce the time taken to create the fixed point form representing the earlier-mentioned floating point product, when compared with the use of traditional shift logic components.
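The 71-bit figure can be checked with a short software sketch. The sketch assumes a 64-bit one-hot mask with a set state of 1, a direct mapping from the 6-bit exponent value to the set mask bit, and zero padding. The function names one_hot_64 and product_to_fixed are hypothetical. The treatment of exponent 0 as exponent 1, and the trimming to 68 bits, are deliberately left out of the sketch.

```python
def one_hot_64(exponent_6b):
    """64-bit one-hot mask built from 64 independent 6-input ANDs."""
    mask = 0
    for k in range(64):
        # Each comparison selects either an exponent bit or its inverse,
        # depending on the corresponding bit of k (assumed input selection).
        inputs = [((exponent_6b >> i) & 1) == ((k >> i) & 1) for i in range(6)]
        if all(inputs):
            mask |= 1 << k
    return mask


def product_to_fixed(significand_8b, exponent_6b):
    """Place an 8-bit product significand in a 71-bit fixed point value."""
    mask = one_hot_64(exponent_6b)
    all_ones = (1 << 64) - 1
    fixed = 0
    for i in range(8):
        bit = (significand_8b >> i) & 1
        # Replicate the bit, AND with the mask, then apply the fixed wiring
        # offset i for this significand bit, and OR into the result.
        fixed |= ((all_ones if bit else 0) & mask) << i
    return fixed
```

Under these assumptions the largest result, product_to_fixed(0xFF, 63), occupies bit positions 63 to 70, so every result fits in 64+7=71 bits.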
As mentioned earlier, the shift indication amount may be derived from an indication of the exponent of the floating point value. In one particular implementation, the indication of the exponent provided for this purpose is the biased exponent used in floating point representations, but in alternative implementations the real exponent value could be used instead.
As mentioned earlier, whilst in some implementations the output value may have more bits than the input value, the techniques can still be used even if this is not the case. For instance, in one example implementation the output value has a same number of bits as the input value. The number of bits used to form the input value and the output value may in one such example implementation also equal the number of bits in the mask, i.e. P=L=N. Alternatively, in some implementations, it may be possible that the number of bits used for both the input value and the output value exceeds the number of bits used for the mask, and in such an application the mask may be repeated in whole or in part to create a larger value.
In one particular example implementation where the output value has the same number of bits as the input value, the shift may be used to implement a rotate such that the output value is a rotated version of the input value.
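One way to picture the rotate case in software is sketched below, for P=L=N. The sketch assumes a set state of 1 and assumes that each rewired version of the mask wraps around the ends of the N-bit value. That wrap-around wiring and the function name rotate_via_mask are assumptions made for the example, not details confirmed by the description above.

```python
def rotate_via_mask(value, n, mask):
    """Rotate the n-bit `value` left by r places, where `mask` is an n-bit
    one-hot integer with bit r set."""
    all_ones = (1 << n) - 1
    output = 0
    for i in range(n):
        bit = (value >> i) & 1
        selected = (all_ones if bit else 0) & mask   # replicate and AND
        # Assumed wrap-around rewiring of the mask by the constant offset i.
        wrapped = ((selected << i) | (selected >> (n - i))) & all_ones
        output |= wrapped                            # combine with logical OR
    return output
```

For instance, rotate_via_mask(0b10000001, 8, 1 << 1) returns 0b00000011.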
Particular example implementations of the techniques described herein will now be discussed with reference to the figures.
Output value generation circuitry 20 is arranged to receive the N-bit mask produced by the mask generation circuitry, along with the input value, which in the example of
By using the above approach, it has been found that the delay associated with applying a shift to an input value can be significantly reduced when compared with the use of conventional shift logic circuits.
Each logical operation instance 40, 42, 44, 46 will comprise logic circuitry used to generate a single corresponding mask bit of the N-bit mask 50 based on the provided X input bits received from the bit selection circuitry 30. Each logical operation instance can perform its associated logical operation independently of the other logical operation instances, with the bit selection circuitry 30 providing each logical operation instance with the relevant X bits to be used as inputs to the logical operation performed by that instance. Hence each of the logical operation instances 40, 42, 44, 46 may, in one example implementation, operate in parallel so that the N bits of the N-bit mask are produced in parallel.
In one example implementation, each logical operation instance 40, 42, 44, 46 is constructed identically to each other instance, but no two logical operation instances will receive the same X inputs. The logical operation performed by each logical operation instance is such that, for all combinations of possible values of the X bits forming the shift indication amount, only one of the logical operations will produce a mask bit value that is in the set state. In one example implementation, the set state is indicated by a value of 1, and hence in one example implementation the N-bit mask will have one bit set to 1 with all other bits having a value of 0. However, it will be appreciated that the value used for the set state could in an alternative implementation be a value of 0, and hence in that implementation the N-bit mask will have one bit set to 0 with all other bits having a value of 1.
In one example implementation where the set state is indicated by a value of 1, each logical operation instance 40, 42, 44, 46 may be arranged to perform a logical AND of the X bits provided as an input. Only one of the logical operation instances will receive X inputs all having a value of 1, and hence only one of the logical operation instances will produce a mask bit having a value of 1.
Whilst the required one-bit right shifts 80, 82, 84 could be performed in a variety of ways, in one particular example implementation the effective shifting of the mask is achieved merely by appropriate rewiring of the N-bit mask. Hence, no additional logic is required in order to generate each shifted version of the N-bit mask, and instead simple wiring patterns can be used to move the N-bit mask to align it appropriately for use by each intermediate result generator 70, 72, 74, 76.
Each intermediate result generator 70, 72, 74, 76 is arranged to determine, based on the provided version of the mask, the appropriate bit position within the output value that the associated bit of the input value provided to that intermediate result generator should reside within. In one particular example implementation, this is achieved by replicating the provided bit of the input value N times and then logically combining it with the provided version of the mask to propagate the input bit value into the bit position of an intermediate result that corresponds with the appropriate bit position within the output value. In particular, each version of the mask used will only have one bit position in the set state, and that bit position can be used to propagate the relevant bit of the input value into the bit position of the intermediate result that corresponds with the appropriate bit position within the output value. Combining circuitry 90 can then be used to combine the various intermediate results in order to generate an output value where each bit of the input value is at the correct location within the output value taking into account the desired shift indicated by the shift amount indication that was used to generate the mask.
As noted earlier, the output value may in one example implementation comprise P bits, and in one particular example implementation P is greater than N. In such an implementation, each intermediate result generator 70, 72, 74, 76 may be arranged to generate its intermediate result such that the intermediate result includes P-N padding bit values provided in at least one of most significant bit positions and least significant bit positions of the intermediate result. In the example illustrated in
This is illustrated schematically in
Once the various intermediate results have been generated in the above manner, then the combining circuitry 90 can be arranged to logically combine the various individual intermediate results to form a final output value. In one example implementation, a logical OR operation is performed to combine the various intermediate results to form the output value.
At step 165, N independent logical operations are performed (for example using the logical operation instances 40, 42, 44, 46 discussed earlier with reference to
At step 170, for an L-bit input value, L intermediate result generation functions are performed, each using a version of the N-bit mask and one bit of the input value. This step can for example be performed using the intermediate result generator circuits 70, 72, 74, 76 discussed earlier with reference to
The techniques described herein can be used in a wide variety of situations where it is desired to perform an effective shift on an input value when producing an output value. However, as noted earlier, in one particular example implementation such an approach may be used when converting a floating point value into a fixed point value. The following discussion of floating point numbers is provided by way of background.
Floating-point (FP) is a useful way of approximating real numbers using a small number of bits. The IEEE 754-2008 FP Standard defines multiple different formats for FP numbers, for example binary64 (also known as double precision, or DP), binary32 (also known as single precision, or SP), and binary16 (also known as half precision, or HP). The numbers 64, 32, and 16 refer to the number of bits required for each format.
FP numbers are quite similar to the “scientific notation” taught in science classes, where instead of negative two million one would write −2.0×10^6. The parts of this number are the sign (in this case negative), the significand (2.0), the base of the exponent (10), and the exponent (6). All of these parts have analogs in FP numbers, although there are differences, the most important of which is that the constituent parts are stored as binary numbers, and the base of the exponent is always 2.
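More precisely, FP numbers all consist of a sign bit, some number of biased exponent bits, and some number of fraction bits. In particular, the above-mentioned formats consist of the following bits:
Format | Sign | Exponent | Fraction | Exponent bias
DP [63:0] | 63 | 62:52 (11 bits) | 51:0 (52 bits) | 1023
SP [31:0] | 31 | 30:23 (8 bits) | 22:0 (23 bits) | 127
HP [15:0] | 15 | 14:10 (5 bits) | 9:0 (10 bits) | 15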
The sign is 1 for negative numbers and 0 for positive numbers. Every number, including zero, has a sign.
The exponent is biased, which means that the true exponent differs from the one stored in the number. For example, biased SP exponents are 8-bits long and range from 0 to 255. Exponents 0 and 255 are special cases, but all other exponents have bias 127, meaning that the true exponent is 127 less than the biased exponent. The smallest biased exponent is 1, which corresponds to a true exponent of −126. The maximum biased exponent is 254, which corresponds to a true exponent of 127. HP and DP exponents work the same way, with the biases indicated in the table above.
SP exponent 255 (or DP exponent 2047, or HP exponent 31) is reserved for infinities and special symbols called NaNs (not a number). Infinities (which can be positive or negative) have a zero fraction. Any number with exponent 255 and a non-zero fraction is a NaN. Infinity provides a saturation value, so it actually means something like “this computation resulted in a number that is bigger than what we can represent in this format.” NaNs are returned for operations that are not mathematically defined on the real numbers, for example dividing zero by zero or taking the square root of a negative number.
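Exponent zero, in any of the formats, is reserved for subnormal numbers and zeros. A normal number represents the value (−1)^sign × 1.fraction × 2^e, where e is the true exponent computed from the biased exponent. A subnormal number, which has a biased exponent of zero, instead has a significand of the form 0.fraction and a true exponent equal to 1 − bias for the given format.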
Numbers with both exponent and fraction equal to zero are zeros.
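The rules above for normal numbers, subnormal numbers, zeros, infinities and NaNs can be summarised in a short sketch for the HP format (1 sign bit, 5 exponent bits, 10 fraction bits, bias 15). The function name decode_half is hypothetical, and the sketch is intended only as an illustration of the encoding, not as part of the described apparatus.

```python
def decode_half(bits):
    """Decode a 16-bit HP (binary16) bit pattern into a Python float."""
    sign = (bits >> 15) & 1
    exponent = (bits >> 10) & 0x1F   # 5-bit biased exponent
    fraction = bits & 0x3FF          # 10-bit fraction
    sign_factor = -1.0 if sign else 1.0

    if exponent == 31:
        # Maximum exponent: infinity if the fraction is zero, otherwise NaN.
        return sign_factor * float("inf") if fraction == 0 else float("nan")
    if exponent == 0:
        # Subnormal or zero: significand 0.fraction, true exponent 1 - 15 = -14.
        return sign_factor * (fraction / 1024) * 2.0 ** -14
    # Normal: significand 1.fraction, true exponent = biased exponent - 15.
    return sign_factor * (1 + fraction / 1024) * 2.0 ** (exponent - 15)
```

For example, decode_half(0x3C00) returns 1.0 and decode_half(0x0001) returns the smallest positive subnormal, 2^−24.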
The following table has some example numbers in HP format. The entries are in binary, with ‘_’ characters added to increase readability. Notice that the subnormal entry (4th line of the table, with zero exponent) produces a different significand than the normal entry in the preceding line.
The FP way of handling signs is called sign-magnitude, and it is different from the usual way integers are stored in a computer (two's complement). In sign-magnitude representation, the positive and negative versions of the same number differ only in the sign bit. A 4-bit sign-magnitude integer, consisting of a sign bit and 3 significand bits, would represent plus and minus one as: +1 = 0001 and −1 = 1001.
In two's complement representation, an n-bit integer i is represented by the low order n bits of the binary n+1-bit value 2^n + i, so a 4-bit two's complement integer would represent plus and minus one as: +1 = 0001 and −1 = 1111.
The two's complement format is practically universal for signed integers because it simplifies computer arithmetic.
A fixed-point number looks exactly like an integer, but actually represents a value that has a certain number of fractional bits. Sensor data is often in fixed-point format, and there is a great deal of fixed-point software that was written before the widespread adoption of FP. Fixed-point numbers are quite tedious to work with because a programmer has to keep track of the “binary point”, i.e. the separator between the integer and fractional parts of the number, and also has to constantly shift the number to keep the bits in the correct place. FP numbers don't have this difficulty, so it is desirable to be able to convert between fixed-point numbers and FP numbers. Being able to do conversions also means that we can still use fixed-point software and data, but we are not limited to fixed-point when writing new software.
More recently newer formats of floating point numbers have been developed, for example:
These continue to follow the IEEE exponent biasing rules, where an n-bit exponent 0 < e < 2^n − 1 has bias 2^(n−1) − 1 and exponent zero is for subnormals and zeros. For most formats the exponent 2^n − 1 is for infinities and NaNs, but the 1-4-3 format is likely to use the maximum exponent for biased numerical values.
It has been realised that it may be useful to use the FP8 formats as storage formats. This does not mean that they can't be used in computations, but that they tend not to be intermediate or final results of computations, just inputs to computations. The most common computation in machine learning is matrix multiplication, where each entry of the product matrix is computed as a sum of many products.
Some particular computations that may be usefully performed are a sum of an SP value with four FP8 products, and a sum of an HP value with two FP8 products.
The FP8 inputs can be any combination of the two FP8 formats. A suitable way to handle this is to convert any FP8 inputs to a hybrid 1-5-3 format. Conversion from 4-bit exponents to 5-bit exponents can be performed as follows.
Let e be the true exponent, let e_4b be the biased 4-bit exponent, and let e_5b be the corresponding biased 5-bit exponent. Because of the way biasing works, e_4b=e+7 and e_5b=e+15. The underlying true exponent e does not change, which makes converting from 4-bit to 5-bit exponents straightforward: e_4b−7=e_5b−15, or e_5b=e_4b+8. Adding 8 to a 4-bit exponent does not require an adder:
Hence, converting to a common 5-bit biased exponent involves hardly any gates, just one inverter.
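One way to see why a single inverter suffices is the following sketch, which is an illustrative reconstruction rather than a statement of the exact circuit used. Adding 8 to a 4-bit value only affects bit 3 and the new bit 4: the original top bit becomes bit 4, its inverse becomes bit 3, and the three low-order bits pass through unchanged. The function name widen_exponent is hypothetical, and the special handling of zero and maximum exponents is outside the scope of the sketch.

```python
def widen_exponent(e_4b):
    """Convert a biased 4-bit exponent (bias 7) to a biased 5-bit exponent
    (bias 15), i.e. compute e_4b + 8 without an adder."""
    top = (e_4b >> 3) & 1        # most significant bit of the 4-bit exponent
    low = e_4b & 0b111           # three low-order bits, passed through
    inverted_top = top ^ 1       # the single inverter
    return (top << 4) | (inverted_top << 3) | low
```

For example, widen_exponent(0b0101) returns 0b01101, which is 5+8=13.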
It is necessary to deal with zero exponents, and for computing shift distances they are best treated as exponent 1 (because biased exponents zero and one represent the same real exponent). Infinities and NaNs are handled outside of the normal exponent computations.
As usual, the product of two numbers with true 5-bit exponents ea and eb has exponent ea+eb. For most multipliers the computation is not that simple because the sum of two biased exponents gives an exponent that has twice the bias: ea_5b+eb_5b=ea+15+eb+15=ea+eb+30. When using the technique described herein to convert each product to a fixed point number, it would be convenient to use a 6-bit biased exponent for the sum. By the IEEE exponent biasing convention, a 6-bit biased exponent would have bias 31. It has already been shown above that the sum of the two biased 5-bit exponents has bias 30, so the correct bias can readily be achieved by adding a carry-in bit to the sum of the exponents:
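The bias arithmetic can be checked with a short Python sketch. The names `BIAS_5`, `BIAS_6` and `product_exponent_6b` are hypothetical, and the integer addition here merely stands in for an adder with its carry-in tied to 1.

```python
BIAS_5 = 15  # bias of a 5-bit exponent, 2^(5-1) - 1
BIAS_6 = 31  # bias of a 6-bit exponent, 2^(6-1) - 1

def product_exponent_6b(ea_5b, eb_5b):
    """Sum two biased 5-bit exponents with a carry-in of 1.

    The plain sum carries bias 30; the carry-in raises it to the 6-bit bias of 31.
    """
    return ea_5b + eb_5b + 1

# For true exponents ea and eb, the result equals (ea + eb) + BIAS_6.
for ea in range(-14, 16):
    for eb in range(-14, 16):
        assert product_exponent_6b(ea + BIAS_5, eb + BIAS_5) == ea + eb + BIAS_6
```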
As discussed earlier, when using the techniques described herein to convert a floating point value into a fixed point value, the input value can take the form of the significand of the floating point value, and the shift indication amount may be an indication of an exponent of the floating point value.
In one particular example implementation, the techniques described herein are used when converting into fixed point form the product generated by multiplying two finite FP8 floating point numbers. Such a product may have a significand that is 8 bits long and the product exponent may be represented by 6 bits. A 6-bit exponent means that the first bit of the product significand can be any of 64 (2^6) bit positions with the remaining seven bits of the product significand immediately following the first bit, resulting in a 71-bit fixed point number. In actual fact, due to biased FP exponent 0 and 1 representing the same true exponent, and due to the fact that the maximum input exponents do not indicate finite floating point numbers, it is possible to represent any finite FP8 product as a 68-bit fixed point number. Within such an implementation, it is readily possible to generate a 64-bit mask to implement the required shift functionality, and the following illustrates how the biased exponent bits may be used as inputs to the logical operations used to produce each bit of the mask:
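As a hedged software model of this kind of mask generation, the Python sketch below computes each mask bit as an independent AND of six inputs, each input being either a bit of the exponent or its inverse. The function name `generate_mask` and the mapping of mask index i to exponent value i are assumptions for illustration; the actual assignment of mask positions may differ.

```python
def generate_mask(shift_amount, x_bits=6):
    """Return a list of 2**x_bits mask bits, exactly one of which is set.

    Mask bit i is the AND of x_bits inputs: for each bit k of the shift
    amount, the non-inverted bit is used if bit k of i is 1, otherwise the
    inverted bit is used. No mask bit depends on any other mask bit.
    """
    bits = [(shift_amount >> k) & 1 for k in range(x_bits)]
    inverted = [b ^ 1 for b in bits]
    mask = []
    for i in range(1 << x_bits):
        term = 1
        for k in range(x_bits):
            term &= bits[k] if (i >> k) & 1 else inverted[k]
        mask.append(term)
    return mask

# Every 6-bit exponent yields a 64-bit mask with a single set bit.
for e in range(64):
    m = generate_mask(e)
    assert sum(m) == 1 and m[e] == 1
```

Because each mask bit is a single AND of the same depth, all 64 bits can be produced in parallel, which is the property that distinguishes this approach from a multi-stage shifter.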
As will be apparent from the above discussion, using the mask, an intermediate result can be generated for each of the bits of the input value, which as discussed above will in this case be an 8-bit significand. Hence eight intermediate results will be generated, which can then be combined by a logical OR operation to generate the final 68-bit fixed point representation. This can be illustrated as follows:
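The combination of intermediate results can be modelled in Python as below. This is a sketch under stated assumptions: the names `one_hot_mask` and `shift_via_mask` are hypothetical, mask bit i is assumed to correspond to exponent value i, and the output uses the straightforward 71-bit layout (64 mask positions plus 7 trailing significand bits) rather than the reduced 68-bit form.

```python
def one_hot_mask(shift_amount, x_bits=6):
    """Model of the mask generator: an integer with exactly one bit set."""
    mask = 0
    for i in range(1 << x_bits):
        # Each mask bit is an AND over x_bits inputs (bit k or its inverse).
        hit = all(((shift_amount >> k) & 1) == ((i >> k) & 1) for k in range(x_bits))
        mask |= int(hit) << i
    return mask

def shift_via_mask(significand, shift_amount, l_bits=8, x_bits=6):
    """OR together one intermediate result per significand bit."""
    mask = one_hot_mask(shift_amount, x_bits)
    result = 0
    for j in range(l_bits):
        bit = (significand >> j) & 1
        # Offsetting the mask by j positions models the rewired (shifted) mask
        # for bit j; the unoccupied positions act as zero padding.
        intermediate = (mask << j) if bit else 0
        result |= intermediate
    return result

# The OR of the intermediate results matches a conventional left shift.
for e in range(64):
    assert shift_via_mask(0b10110011, e) == 0b10110011 << e
```

In this model the mask for the most significant significand bit occupies the top 64 positions of the 71-bit result, and each less significant bit uses the same mask displaced by one further position, so no arithmetic is needed beyond AND gating and a final OR.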
Whilst in the above example the output value has more bits than the input value, the techniques described herein can still be used even if this is not the case. For instance, in one example implementation the output value may have the same number of bits as the input value. The number of bits used to form the input value and the output value may in one such example implementation also equal the number of bits in the mask, i.e. P=L=N. Alternatively, in some implementations, it may be possible that the number of bits used for both the input value and the output value exceeds the number of bits used for the mask, and in such an application the mask may be repeated in whole or in part to create a larger value.
In one particular example implementation where the output value has the same number of bits as the input value, the shift may be used to implement a rotate such that the output value is a rotated version of the input value, as illustrated schematically in
Hence, it can be seen that the techniques described herein have a wide variety of applications, and can significantly improve performance by reducing the delay associated with performing shifts on data values.
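For the rotate case mentioned above, where the input value, output value and mask all have the same number of bits, one possible software model is sketched below in Python. The wrap-around treatment of the mask for each input bit, the 8-bit width, and the names `rotl` and `rotate_via_mask` are assumptions made for illustration; the one-hot mask is modelled directly as an integer rather than being built from AND operations.

```python
def rotl(value, amount, width):
    """Reference rotate-left of a width-bit value."""
    amount %= width
    full = (1 << width) - 1
    return ((value << amount) | (value >> (width - amount))) & full

def rotate_via_mask(value, shift_amount, width=8):
    """Rotate by OR-ing one rotated copy of the one-hot mask per set input bit."""
    mask = 1 << (shift_amount % width)  # stand-in for the generated one-hot mask
    result = 0
    for j in range(width):
        if (value >> j) & 1:
            # Input bit j uses the mask rewired by j positions, wrapping around.
            result |= rotl(mask, j, width)
    return result

# The mask-based result agrees with the reference rotate for every amount.
for s in range(8):
    assert rotate_via_mask(0b10010110, s) == rotl(0b10010110, s, 8)
```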
Concepts described herein may be embodied in a system comprising at least one packaged chip. The apparatus described earlier is implemented in the at least one packaged chip (either being implemented in one specific chip of the system, or distributed over more than one packaged chip). The at least one packaged chip is assembled on a board with at least one system component. A chip-containing product may comprise the system assembled on a further board with at least one other product component. The system or the chip-containing product may be assembled into a housing or onto a structural support (such as a frame or blade).
As shown in
In some examples, a collection of chiplets (i.e. small modular chips with particular functionality) may itself be referred to as a chip. A chiplet may be packaged individually in a semiconductor package and/or together with other chiplets into a multi-chiplet semiconductor package (e.g. using an interposer, or by using three-dimensional integration to provide a multi-layer chiplet product comprising two or more vertically stacked integrated circuit layers).
The one or more packaged chips 400 are assembled on a board 402 together with at least one system component 404 to provide a system 406. For example, the board may comprise a printed circuit board. The board substrate may be made of any of a variety of materials, e.g. plastic, glass, ceramic, or a flexible substrate material such as paper, plastic or textile material. The at least one system component 404 comprises one or more external components which are not part of the one or more packaged chip(s) 400. For example, the at least one system component 404 could include any one or more of the following: another packaged chip (e.g. provided by a different manufacturer or produced on a different process node), an interface module, a resistor, a capacitor, an inductor, a transformer, a diode, a transistor and/or a sensor.
A chip-containing product 416 is manufactured comprising the system 406 (including the board 402, the one or more chips 400 and the at least one system component 404) and one or more product components 412. The product components 412 comprise one or more further components which are not part of the system 406. As a non-exhaustive list of examples, the one or more product components 412 could include a user input/output device such as a keypad, touch screen, microphone, loudspeaker, display screen, haptic device, etc.; a wireless communication transmitter/receiver; a sensor; an actuator for actuating mechanical motion; a thermal control device; a further packaged chip; an interface module; a resistor; a capacitor; an inductor; a transformer; a diode; and/or a transistor. The system 406 and one or more product components 412 may be assembled on to a further board 414.
The board 402 or the further board 414 may be provided on or within a device housing or other structural support (e.g. a frame or blade) to provide a product which can be handled by a user and/or is intended for operational use by a person or company.
The system 406 or the chip-containing product 416 may be at least one of: an end-user product, a machine, a medical device, a computing or telecommunications infrastructure product, or an automation control system. As a non-exhaustive list of examples, the chip-containing product could be any of the following: a telecommunications device, a mobile phone, a tablet, a laptop, a computer, a server (e.g. a rack server or blade server), an infrastructure device, networking equipment, a vehicle or other automotive product, industrial machinery, consumer device, smart card, credit card, smart glasses, avionics device, robotics device, camera, television, smart television, DVD player, set top box, wearable device, domestic appliance, smart meter, medical device, heating/lighting control device, sensor, and/or a control system for controlling public infrastructure equipment such as smart motorway or traffic lights.
Concepts described herein may be embodied in computer-readable code for fabrication of an apparatus that embodies the described concepts. For example, the computer-readable code can be used at one or more stages of a semiconductor design and fabrication process, including an electronic design automation (EDA) stage, to fabricate an integrated circuit comprising the apparatus embodying the concepts. The above computer-readable code may additionally or alternatively enable the definition, modelling, simulation, verification and/or testing of an apparatus embodying the concepts described herein.
For example, the computer-readable code for fabrication of an apparatus embodying the concepts described herein can be embodied in code defining a hardware description language (HDL) representation of the concepts. For example, the code may define a register-transfer-level (RTL) abstraction of one or more logic circuits for defining an apparatus embodying the concepts. The code may define an HDL representation of the one or more logic circuits embodying the apparatus in Verilog, SystemVerilog, Chisel, or VHDL (Very High-Speed Integrated Circuit Hardware Description Language) as well as intermediate representations such as FIRRTL. Computer-readable code may provide definitions embodying the concept using system-level modelling languages such as SystemC and SystemVerilog or other behavioural representations of the concepts that can be interpreted by a computer to enable simulation, functional and/or formal verification, and testing of the concepts.
Additionally or alternatively, the computer-readable code may define a low-level description of integrated circuit components that embody concepts described herein, such as one or more netlists or integrated circuit layout definitions, including representations such as GDSII. The one or more netlists or other computer-readable representation of integrated circuit components may be generated by applying one or more logic synthesis processes to an RTL representation to generate definitions for use in fabrication of an apparatus embodying the invention. Alternatively or additionally, the one or more logic synthesis processes can generate from the computer-readable code a bitstream to be loaded into a field programmable gate array (FPGA) to configure the FPGA to embody the described concepts. The FPGA may be deployed for the purposes of verification and test of the concepts prior to fabrication in an integrated circuit or the FPGA may be deployed in a product directly.
The computer-readable code may comprise a mix of code representations for fabrication of an apparatus, for example including a mix of one or more of an RTL representation, a netlist representation, or another computer-readable definition to be used in a semiconductor design and fabrication process to fabricate an apparatus embodying the invention. Alternatively or additionally, the concept may be defined in a combination of a computer-readable definition to be used in a semiconductor design and fabrication process to fabricate an apparatus and computer-readable code defining instructions which are to be executed by the defined apparatus once fabricated.
Such computer-readable code can be disposed in any known transitory computer-readable medium (such as wired or wireless transmission of code over a network) or non-transitory computer-readable medium such as semiconductor, magnetic disk, or optical disc. An integrated circuit fabricated using the computer-readable code may comprise components such as one or more of a central processing unit, graphics processing unit, neural processing unit, digital signal processor or other components that individually or collectively embody the concept.
Some example configurations are set out in the following numbered clauses:
1. An apparatus to perform a computation equivalent to applying a shift to an input value to generate an output value, comprising:
2. An apparatus as in Clause 1, wherein the shift indication amount is an X-bit value, and each logical operation uses X inputs chosen from non-inverted and inverted versions of each bit of the shift indication amount, such that each logical operation has a different X inputs to any other logical operation of the N logical operations.
3. An apparatus as in Clause 2, wherein each logical operation is arranged to perform a logical AND of the X inputs provided for that logical operation.
4. An apparatus as in any preceding clause, wherein:
5. An apparatus as in Clause 4, wherein the output value generation circuitry is arranged to apply a shifted version of the N-bit mask to each other bit of the input value other than the given bit in order to determine a corresponding location of each other bit within the output value, where for any selected other bit of the input value the shifted version of the N-bit mask is arranged to adjust for a number of bit positions between the given bit and the selected other bit.
6. An apparatus as in Clause 5, wherein application of the shifted version of the N-bit mask is achieved by rewiring of the N-bit mask.
7. An apparatus as in Clause 5 or Clause 6, wherein a given shifted version of the N-bit mask used for a given other bit of the input value comprises the N-bit mask aligned with N bit positions of the output value such that the corresponding location of the given other bit within the output value is identified by the one bit position in the given shifted version of the N-bit mask that has the mask bit value indicating the set state.
8. An apparatus as in any of clauses 5 to 7, wherein the input value comprises L bits, and L−1 shifted versions of the N-bit mask are formed, each shifted version being associated with one of the L−1 other bits of the input value.
9. An apparatus as in Clause 8, wherein the output value generation circuitry is arranged to generate L intermediate results, each intermediate result being generated by applying a version of the N-bit mask to its associated bit of the input value, and the output value generation circuitry comprises combining circuitry to logically combine the L intermediate results to generate the output value.
10. An apparatus as in Clause 9, wherein the output value has P bits, where P is greater than N, and each intermediate result includes P-N padding bit values provided in at least one of most significant bit positions and least significant bit positions of the intermediate result.
11. An apparatus as in Clause 10, wherein the given bit is the most significant bit of the input value, the intermediate result generated by applying the N-bit mask to the given bit has padding bits provided in the P-N least significant bit positions, and the intermediate result generated for each successive less significant bit of the input value has one more padding bit in the most significant bit positions than the intermediate result generated for an adjacent more significant bit of the input value.
12. An apparatus as in any preceding clause, wherein the output value has more bits than the input value, the input value comprises L bits, and the output value generation circuitry is arranged to apply the N-bit mask so as to replicate the input value in a sequence of L bit positions within the output value.
13. An apparatus as in Clause 12, wherein the output value generation circuitry is arranged to generate the output value such that each bit position other than the L bit positions has its value set to a predetermined value.
14. An apparatus as in any preceding clause, wherein the input value is a significand of a floating point value, and the shift indication amount is an indication of an exponent of the floating point value.
15. An apparatus as in Clause 14, wherein the output value is a fixed point value comprising more bits than are provided by the significand of the floating point value.
16. An apparatus as in any of clauses 1 to 11, wherein the output value has a same number of bits as the input value.
17. An apparatus as in Clause 16, wherein the shift implements a rotate such that the output value is a rotated version of the input value.
18. A system comprising:
19. A chip-containing product comprising the system of Clause 18 assembled on a further board with at least one other product component.
20. A computer-readable medium to store computer-readable code for fabrication of the apparatus of any of clauses 1 to 17.
21. A method of performing a computation equivalent to applying a shift to an input value to generate an output value, comprising:
In the present application, the words “configured to . . . ” are used to mean that an element of an apparatus has a configuration able to carry out the defined operation. In this context, a “configuration” means an arrangement or manner of interconnection of hardware or software. For example, the apparatus may have dedicated hardware which provides the defined operation, or a processor or other processing device may be programmed to perform the function. “Configured to” does not imply that the apparatus element needs to be changed in any way in order to provide the defined operation.
In the present application, lists of features preceded with the phrase “at least one of” mean that any one or more of those features can be provided either individually or in combination. For example, “at least one of: [A], [B] and [C]” encompasses any of the following options: A alone (without B or C), B alone (without A or C), C alone (without A or B), A and B in combination (without C), A and C in combination (without B), B and C in combination (without A), or A, B and C in combination.
Although illustrative embodiments of the invention have been described in detail herein with reference to the accompanying drawings, it is to be understood that the invention is not limited to those precise embodiments, and that various changes and modifications can be effected therein by one skilled in the art without departing from the scope of the invention as defined by the appended claims.