This disclosure relates to data processing.
Some data processing arrangements allow for vector processing operations, involving applying a single vector processing instruction to data items of a data vector having a plurality of data items at respective positions in the data vector. By contrast, scalar processing operates on, effectively, single data items rather than on data vectors.
In an example arrangement there is provided data processing apparatus comprising:
In another example arrangement there is provided a data processing method comprising:
In another example arrangement there is provided a virtual machine comprising a data processor to execute a computer program comprising machine readable instructions, in which execution of the computer program causes the data processor to operate as a data processing apparatus comprising:
Further respective aspects and features of the disclosure are defined by the appended claims.
The present technique will be described further, by way of example only, with reference to embodiments thereof as illustrated in the accompanying drawings, in which:
Referring now to the drawings,
The processor 20 can access a storage array 90 of at least n×n storage locations. This is drawn in broken line to illustrate that it may or may not be provided as part of the processor 20. In various examples, the storage array can be implemented as any one or more of the following: architecturally-addressable registers; non-architecturally-addressable registers; a scratchpad memory; and a cache.
The processing circuitry 60 may be, for example, vector processing circuitry and/or scalar processing circuitry. A general distinction between scalar processing and vector processing is as follows. Vector processing involves applying a single vector processing instruction to data items of a data vector having a plurality of data items at respective positions in the data vector. Scalar processing operates on, effectively, single data items rather than on data vectors. Vector processing can be useful in instances where processing operations are carried out on many different instances of the data to be processed. In a vector processing arrangement, a single instruction can be applied to multiple data items (of a data vector) at the same time. This can improve the efficiency and throughput of data processing compared to scalar processing.
The present techniques relate to processing two-dimensional arrays of data items, stored in, for example, the storage array 90. The two-dimensional storage arrays may, in at least some examples, be accessed as vectors, for example of n elements.
In example embodiments, the storage array 90 may store a square array portion of a larger or even higher-dimensioned array or matrix of data items in memory.
Multiple instances of the storage array 90 (that is to say, two or more instances) may be provided so as to store multiple respective arrays of data items.
The discussion below relates to example program instructions 34. Embodiments of the present disclosure include an apparatus, for example of the type shown in
Optionally, where a vector processor is in use, the vector processing operations may be under the control of so-called predicates. Here, a respective predicate can control whether or not a particular vector function is applied in respect of one of the data item positions within the linear arrays (which could be treated as data vectors in this example arrangement).
As discussed above, the processing circuitry 60 is arranged, under control of instructions decoded by decoder circuitry 50, to access the registers 70 and/or the storage array 90. Further details of this latter arrangement will now be described with reference to
In the present examples, the storage array 90 is arranged as an array 205 of at least n×n storage locations 200, where n is an integer greater than 1. In the present example, n is 16 which implies that the granularity of access to the storage locations 200 is 1/16th of the total storage in either horizontal or vertical array directions. This aspect will be discussed further below.
From the point of view of the processing circuitry, the array of n×n locations is accessible as n linear (one-dimensional) arrays in a first direction (for example, a horizontal direction as drawn) and n linear arrays in a second array direction (for example, a vertical direction as drawn). Each linear array has n elements so that each of the storage arrays stores a linear array of n data items. In other words, the n×n storage locations are arranged or at least accessible, from the point of view of the processing circuitry 60, as 2n linear arrays, each of n data items.
Therefore, this provides an example in which the array of n×n storage locations comprises an array of storage elements accessible by the instruction processing circuitry as 2n linear arrays, the 2n linear arrays comprising n linear arrays in the first array direction and n linear arrays in the second array direction, each linear array containing n data items (for example, though this is not a requirement, as a data vector register). Example instructions discussed below may specify one or more of the 2n linear arrays.
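By way of illustration only, the access pattern described above can be modelled in software. The following Python sketch is not part of the described apparatus; the function name `read_linear`, the direction labels "H" and "V" and the 4×4 example array are assumptions made purely for the purposes of the illustration.

```python
from typing import List


def read_linear(array: List[List[int]], direction: str, m: int) -> List[int]:
    """Return linear array m of an n x n array: row m for "H", column m for "V"."""
    n = len(array)
    if any(len(row) != n for row in array):
        raise ValueError("array must be square (n x n)")
    if not 0 <= m < n:
        raise IndexError("linear array index out of range")
    if direction == "H":
        # First array direction: one whole row.
        return list(array[m])
    if direction == "V":
        # Second array direction: one element from each row.
        return [array[r][m] for r in range(n)]
    raise ValueError("direction must be 'H' or 'V'")


# Hypothetical 4 x 4 array in which the value at (row r, column c) is 10*r + c.
tile = [[10 * r + c for c in range(4)] for r in range(4)]
row_2 = read_linear(tile, "H", 2)
col_1 = read_linear(tile, "V", 1)

# The same n x n locations are reachable as 2n linear arrays of n items each.
all_linear = [read_linear(tile, d, m) for d in ("H", "V") for m in range(4)]
```

In this model the two directions address the same underlying storage locations, so no data is duplicated between the row view and the column view.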
Implementations of these techniques will be discussed with reference to
In further examples, rather than simply accessing a linear array as discussed above, access, for a vector of n vector elements, is made to a set of n storage locations each having a respective array location in the array register, the array location accessed for a given vector element of the vector being defined by one or more coordinates associated with the given vector element by one or more parameters of the array access instruction. Such a parameter may be, for example, a reference to an index vector or to a register storing such a vector of indices.
For example, the array location accessed for the given vector element of the vector may be defined by at least a pair of coordinates associated with the given vector element of the vector by parameters of the array access instruction. In examples, the pair of coordinates may define, for the given vector element of the vector, an array location in each of a first array direction and a second array direction different to the first array direction (for example, x and y directions as presented schematically below).
One of these coordinates could be implied by an element position or lane, in that the array location accessed for the given vector element of the vector may be defined by a coordinate in a first array direction dependent upon a vector position of the given vector element, and a coordinate in a second array direction different to the first array direction defined by a parameter of the array access instruction. This arrangement applies to the examples (such as that shown in
In various examples, the second array direction may be orthogonal to the first array direction. Examples of horizontal rows and vertical columns are discussed here.
Where indexing is used in a direction or coordinate axis, the instruction decoder circuitry may be configured to select the first array direction and the second array direction from two candidate array directions in response to a parameter of the array access instruction. For example, the “HV” parameter discussed below can be used.
Predicated control of operation may optionally be used, in which the instruction processing circuitry is responsive to one or more sets of predicates associated with respective vector elements to control accessing of the array register in respect of the respective vector elements.
In example arrangements, techniques such as those defined by the so-called Scalable Vector Extension (SVE) or SVE2 arrangements by Arm Limited can be used so that the vector processing circuitry is configured to select the vector length applicable to the vector processing circuitry.
Implementations of these techniques will be discussed with reference to
The array of storage locations 200 is accessible by access circuitry 210, 220, column selection circuitry 230 and row selection circuitry 240, under the control of control circuitry 250 in communication with at least the processing circuitry and optionally with the decoder circuitry 50.
In connection with a given access, it may be that only the access circuitry 210 or only the access circuitry 220 is active.
The types of access under discussion, applicable to any of the types of array access discussed here, including those discussed with reference to
In other words, the array access instruction may comprise an instruction selected from the list consisting of: a vector storage instruction to store data items to respective locations in the array register; and a vector retrieval instruction to retrieve data items from respective locations in the array register. The array access instruction may comprise a vector storage instruction to store vector elements of an input data vector to respective locations in the array register; or a vector retrieval instruction to retrieve data items of a set of memory locations of the main memory to respective vector elements of a destination data vector. Where the data processing apparatus comprises a main memory accessible by the vector processing circuitry, the vector retrieval instruction may comprise an instruction selected from the list consisting of: a first vector retrieval instruction to retrieve vector elements of an output data vector from respective locations in the array register; and a second vector retrieval instruction to retrieve data items to a set of memory locations of the main memory from respective locations in the array register.
Each of these types of access may be selectable by the use of a separate respective instruction or op-code, and/or by the use of respective parameters of an instruction. For example, a single instruction may provide for any of these accesses, with the direction (write or read as set out above) being defined by an instruction parameter, and with the source/destination (vector register or set of memory locations) being defined and/or identified by another parameter.
The writing to or reading from the register 260 or the set 270 of locations can be performed serially, for example one data item or element at a time in a predefined order, in parallel (all at substantially the same time) or in groups of, for example, 4 or 8 data items or elements in parallel. The routing of data items to or from vector elements or memory locations can be under the control of the access circuitry 210, 220 and/or the control circuitry 250.
In the drawings which follow, the register 260 and the set 270 of memory locations are not drawn, purely for clarity of the representations.
With reference to
In example arrangements second array direction (for example vertical or horizontal as drawn in
In order to access one of the linear arrays A1H0 . . . A1H15 in the first direction, for example the horizontal direction as drawn, reference is made to
Similarly, with reference to
The so-called granularity of the arrangement of
Note that this is just an example arrangement for the purposes of the present explanation. In general, an example architecture may support a scalable or processor-selectable vector length as discussed above, for example with the processor 20 maintaining a variable VL indicative of the vector length in use. The value of VL is established or selected using techniques defined by the SVE/SVE2 arrangements discussed above. So while in this particular example an example vector length of 512 bits is used, in general, A1Hm represents (VL/32) items each of 32 bits, or even more generally (VL/ELEM_SIZE) items of ELEM_SIZE bits.
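The relationship between vector length and the number of items in a linear array can be checked with a short calculation. The Python sketch below is illustrative only; the function name `items_per_linear_array` is an assumption, and the requirement that the vector length be a whole multiple of the element size is assumed for the sake of the model.

```python
def items_per_linear_array(vl_bits: int, elem_size_bits: int) -> int:
    """Number of items in one linear array: VL / ELEM_SIZE."""
    if elem_size_bits <= 0 or vl_bits <= 0:
        raise ValueError("sizes must be positive")
    if vl_bits % elem_size_bits != 0:
        # Assumption of this sketch: VL is a whole multiple of the element size.
        raise ValueError("vector length must be a multiple of the element size")
    return vl_bits // elem_size_bits


# The 512-bit example with 32-bit items gives 16 items per linear array.
n_words = items_per_linear_array(512, 32)
# The same vector length with 8-bit items gives 64 items per linear array.
n_bytes = items_per_linear_array(512, 8)
```

For the 512-bit example with 32-bit items this gives 16 items per linear array, which agrees with the value n = 16 used in the present example.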
In example arrangements the instruction processing circuitry 60 is configured to store an input vector or linear array to the array of storage locations as a group (A1Hm) of n storage locations arranged in the first array direction; and is responsive to a data retrieval instruction, to retrieve, as a linear array, a set of n storage locations arranged in an array direction (A1Hm or A1Vm for example) selected, under control of the data retrieval instruction, from the set of candidate array directions; and the first array direction is a predetermined array direction (for example, horizontal as drawn). In other words, data writes are constrained to the first direction whereas data reads are allowed in either direction. But of course, another example arrangement could be provided in which data writes and data reads are allowed in either direction.
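As a purely illustrative software model of the arrangement just described, the Python sketch below constrains stores to the first (horizontal) direction while allowing retrieval in either direction. The class name `StorageArray` and its method names are assumptions made for this sketch and do not correspond to any named element of the apparatus.

```python
from typing import List


class StorageArray:
    """n x n array: stores are row-wise only; retrievals may be by row or column."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.cells = [[0] * n for _ in range(n)]

    def _check(self, m: int) -> None:
        if not 0 <= m < self.n:
            raise IndexError("linear array index out of range")

    def store(self, m: int, values: List[int]) -> None:
        """Store an input linear array as row m (the predetermined first direction)."""
        self._check(m)
        if len(values) != self.n:
            raise ValueError("input must contain exactly n data items")
        self.cells[m] = list(values)

    def retrieve(self, direction: str, m: int) -> List[int]:
        """Retrieve row m ("H") or column m ("V") as a linear array."""
        self._check(m)
        if direction == "H":
            return list(self.cells[m])
        if direction == "V":
            return [self.cells[r][m] for r in range(self.n)]
        raise ValueError("direction must be 'H' or 'V'")


array = StorageArray(3)
for m, row in enumerate([[1, 2, 3], [4, 5, 6], [7, 8, 9]]):
    array.store(m, row)

first_column = array.retrieve("V", 0)
second_row = array.retrieve("H", 1)
```

Storing by row and retrieving by column in this way yields the columns of the stored data as linear arrays, which is one reason a selectable retrieval direction may be useful.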
The examples discussed above relate to accessing multi-dimensional storage arrays such as two-dimensional storage arrays in either the horizontal or the vertical directions so as to store or retrieve linear arrays with respect to the two-dimensional storage arrays.
In contrast to the linear array system discussed above, the array location accessed for a given vector element of the vector may be defined by one or more coordinates associated with the given vector element by one or more parameters of the array access instruction. Such a parameter may be, for example, a reference to an index vector or to a register storing such a vector of indices.
Examples of the present techniques, to be discussed below, provide further instructions or variants of instructions which, when decoded and executed, provide for the use of one or more vectors of indices to perform indexed accesses by row (horizontal) and/or column (vertical) in order to store or gather individual elements from the storage array into (or from) a destination/source vector in a register or in memory.
In these examples, the apparatus of
For the purposes of the following discussion,
In order to generate elements of the destination vector Zd, each element is derived from a respective column ZA0V of the array ZA0 in
It will be understood that instead of a vertically-indexed access, a horizontally-indexed access arrangement could be used so that (again assuming that the predication allows for all lanes to be processed), the destination vector Zd is assembled by gathering four respective data values from storage locations defined by a vertical position applicable to the position within the destination vector of the respective destination vector element and a horizontal position defined by the respective element of the index vector Zc. An example of such an arrangement is illustrated schematically by
The syntax used in these examples is as follows:
Here, MOVA represents an array move instruction. Zd is a destination vector register. B indicates a byte format in this example. Pg is a predicate register which controls operation for each vector lane. M is a modifier relating to the predicate operation, for example defining whether an inactive predicate indicates that the element at that lane should be set to zero or maintained at its previous value. ZAt defines the array in use (such as ZA0 as drawn). H or V defines whether the indexed access is in a horizontal or vertical direction. Once again, B indicates a byte format. Zc is a vector of indices.
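The behaviour of such an indexed move can be modelled in software for illustration. The Python sketch below is a behavioural model only, not a definition of the instruction: the function name `indexed_gather` and the `zeroing` flag are assumptions, with the flag standing in for the choice that the predicate modifier is described as making between zeroing an inactive lane and keeping its previous value. For vertical indexing the column is taken from the lane position and the row from the index vector; for horizontal indexing the roles are swapped.

```python
from typing import List, Sequence


def indexed_gather(array: Sequence[Sequence[int]], direction: str,
                   indices: Sequence[int], predicate: Sequence[bool],
                   previous: Sequence[int], zeroing: bool = False) -> List[int]:
    """Gather one element per lane from an n x n array using an index vector."""
    n = len(array)
    result = []
    for lane in range(n):
        if not predicate[lane]:
            # Inactive lane: either zero it or keep the previous register value.
            result.append(0 if zeroing else previous[lane])
            continue
        idx = indices[lane]
        if not 0 <= idx < n:
            raise IndexError("index vector entry out of range")
        if direction == "V":
            row, col = idx, lane   # column implied by lane, row from index vector
        elif direction == "H":
            row, col = lane, idx   # row implied by lane, column from index vector
        else:
            raise ValueError("direction must be 'H' or 'V'")
        result.append(array[row][col])
    return result


# Hypothetical 4 x 4 array in which the value at (row r, column c) is 10*r + c.
za0 = [[10 * r + c for c in range(4)] for r in range(4)]
zc = [3, 0, 2, 1]                      # index vector
pg = [True, True, False, True]         # lane 2 is inactive
old = [-1, -1, -1, -1]                 # previous contents of the destination

zd_vertical = indexed_gather(za0, "V", zc, pg, old)
zd_horizontal = indexed_gather(za0, "H", zc, pg, old, zeroing=True)
```

In this model, `zd_vertical` retains the previous value in the inactive lane, while `zd_horizontal` sets that lane to zero.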
In a first operation, MOVA Zd.B, Pg/M, ZA0V.B [Zb], a destination register Zd is populated by [a b e f] using vertically indexed access to the array ZA0 according to the index vector Zb.
In a second operation, MOVA Ze.B, Pg/M, ZA0V.B [Zc], a destination register Ze is populated by [c d g h] using vertically indexed access to the array ZA0 according to the index vector Zc.
A previously proposed primary zip operation (ZIP1) populates a vector register Zf. The ZIP1 instruction reads adjacent vector elements from the lower half of two source registers (in this case Zd and Ze) as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination register Zf. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
A previously proposed secondary zip operation (ZIP2) populates a vector register Zg. The ZIP2 operation reads adjacent vector elements from the upper half of two source vector registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination vector register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.
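The pairwise interleaving described for the two zip operations can be sketched as follows. This Python model follows only the textual description given here; the function name `zip_pairs` and the restriction to vector lengths that are a multiple of four elements (so that each half contains whole pairs) are assumptions of the sketch.

```python
from typing import List, Sequence


def zip_pairs(src1: Sequence, src2: Sequence, upper: bool) -> List:
    """Interleave adjacent pairs from the lower (or upper) halves of two vectors."""
    n = len(src1)
    if len(src2) != n:
        raise ValueError("source vectors must have the same length")
    if n % 4 != 0:
        # Assumption of this sketch: each half must hold a whole number of pairs.
        raise ValueError("vector length must be a multiple of 4 elements")
    half = n // 2
    start = half if upper else 0
    out = []
    for p in range(start, start + half, 2):
        out.extend(src1[p:p + 2])   # pair from the first source register
        out.extend(src2[p:p + 2])   # pair from the second source register
    return out


# The destination registers of the two indexed moves discussed above.
zd = ["a", "b", "e", "f"]
ze = ["c", "d", "g", "h"]

zf = zip_pairs(zd, ze, upper=False)   # models the primary zip (ZIP1)
zg = zip_pairs(zd, ze, upper=True)    # models the secondary zip (ZIP2)
```

With the example register contents given above, the modelled primary zip yields [a b c d] and the modelled secondary zip yields [e f g h].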
A potential use of such instructions could be the extraction of small patches from a storage array, for example to provide processing for a patch of pixels from an image, or to extract the rows of small matrices produced by predicated outer product instructions, for example those smaller than the streaming vector length divided by the element size.
In a variant of the above operations, a move operation could be provided from a source vector register into the storage array, with the storage array locations to which data is moved from respective vector elements of the source vector register being defined by a horizontally or vertically indexed access using these same techniques. Here, the syntax may be similar to that discussed above, but using a “store” command with the destination being defined by indexed access into an array (an example being <ZAt><HV>.B [<Zc>] as used above) and the source being defined by a vector register (an example being <Zd>.B as used above). Once again, predication can be used in the same manner as described above.
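A software model of this variant, in which vector elements are moved into the storage array at indexed locations, might look as follows. The Python sketch is illustrative only; the function name `indexed_scatter` is an assumption, and the model simply leaves the array location untouched for any lane whose predicate is inactive.

```python
from typing import List, Sequence


def indexed_scatter(array: List[List[int]], direction: str,
                    indices: Sequence[int], source: Sequence[int],
                    predicate: Sequence[bool]) -> None:
    """Write one source element per active lane into an n x n array, in place."""
    n = len(array)
    for lane in range(n):
        if not predicate[lane]:
            continue                   # inactive lane: no array location is written
        idx = indices[lane]
        if not 0 <= idx < n:
            raise IndexError("index vector entry out of range")
        if direction == "V":
            row, col = idx, lane       # column implied by lane, row from index vector
        elif direction == "H":
            row, col = lane, idx       # row implied by lane, column from index vector
        else:
            raise ValueError("direction must be 'H' or 'V'")
        array[row][col] = source[lane]


za = [[0] * 3 for _ in range(3)]
indexed_scatter(za, "V", indices=[2, 0, 1], source=[7, 8, 9],
                predicate=[True, True, True])
```

Because each lane targets a different column (or row) of the array in this model, no two active lanes write to the same storage location.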
In the present examples, the move instruction as discussed (MOVA) can operate in either direction between a vector register and a set of array locations, with the sense or direction of the operation (reading from the array locations or writing to the array locations) being defined by the ordering of the operands defining the origin and destination of the data (destination defined first, then origin, in the example syntax shown here). At least in the examples discussed here, however, a store operation to be described below is always from a set of array locations to memory.
Another variant is illustrated by the following example instruction:
Here, ST1W is a word-based store instruction. The suffix .S indicates a data element size in the vector, being selected from a list which may include (for example):
The data destination is defined in the same way as discussed above, in that <ZAt><HV>.S [<Zc>] defines access into the array ZAt, either horizontally (H) or vertically (V) indexed by an index vector stored in Zc. Optional predication is provided by <Pg>. The expression [<Xn|SP> {,<Xm>, LSL #2}] provides a known definition (in the context of the SVE system) of a corresponding set of memory locations, by defining a base address (Xn or SP), then an offset (Xm) defined as a number of elements.
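The address calculation and indexed store can be modelled together for illustration. The Python sketch below makes several assumptions that are not stated above: memory is modelled as a `bytearray`, 32-bit words are written in little-endian order, and the words for successive lanes are assumed to occupy consecutive word positions starting at the base address plus the offset shifted left by two bits. The function name `store_words_indexed` is likewise hypothetical.

```python
import struct
from typing import Sequence


def store_words_indexed(memory: bytearray, base: int, offset_elems: int,
                        array: Sequence[Sequence[int]], direction: str,
                        indices: Sequence[int], predicate: Sequence[bool]) -> None:
    """Store indexed 32-bit array elements to memory at base + (offset << 2)."""
    if direction not in ("H", "V"):
        raise ValueError("direction must be 'H' or 'V'")
    n = len(array)
    start = base + (offset_elems << 2)     # LSL #2: element offset scaled to bytes
    for lane in range(n):
        if not predicate[lane]:
            continue                       # inactive lane: memory is not written
        idx = indices[lane]
        if not 0 <= idx < n:
            raise IndexError("index vector entry out of range")
        row, col = (idx, lane) if direction == "V" else (lane, idx)
        # Assumed layout: lane i occupies the i-th consecutive 32-bit word.
        struct.pack_into("<I", memory, start + 4 * lane, array[row][col])


memory = bytearray(64)
za = [[1, 2], [3, 4]]
store_words_indexed(memory, base=8, offset_elems=2, array=za, direction="V",
                    indices=[1, 0], predicate=[True, True])
stored = struct.unpack_from("<2I", memory, 16)
```

Here a base address of 8 and an offset of 2 elements give a starting byte address of 16, since the shift by two bits multiplies the element offset by the four-byte word size.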
In the examples described above, the instruction decoder circuitry is responsive to an array access instruction (such as the move or store instructions discussed above), to control the instruction processing circuitry to access (for example read from or write to), for a vector of n vector elements (whether embodied in a vector register or in memory), a set of n storage locations each having a respective array location in the array register, the array location accessed for a given vector element of the vector being defined by one or more coordinates associated with the given vector element by one or more parameters of the array access instruction. Examples of such parameters include <ZAt><HV>.B [<Zc>], namely the index vector, for example in conjunction with the definition <HV> of horizontal or vertical indexing. In other examples given below, examples of such parameters may define respective indices (such as a pair of indices) in each of two (or indeed more) coordinate directions.
Any of the techniques discussed above may be used in the context of two-dimensional indexing. An example is illustrated schematically in
Here, the syntax may be similar to that discussed above, except that both the H and V parameters may be set (to indicate indexing in both directions) and the pair of index registers defined as the source of the indexing information:
In various embodiments the arrays can be implemented as any one or more of the following: architecturally-addressable registers; non-architecturally-addressable registers; a scratchpad memory; and a cache.
More than Two Dimensions—Examples
The techniques described here can be extended to arrays having more than two dimensions, for example 3-dimensional arrays such as n×n×n storage arrays. Here, one or more coordinates of storage locations to be accessed in respect of a vector element position or lane can be defined by an entry in an index vector, zero or more coordinates can be implied by the vector element position or lane, and zero or more coordinates can be specified by one or more parameters of the access instruction.
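The three ways of defining a coordinate can be modelled generically, for any number of dimensions. In the Python sketch below, the function name `resolve_locations` and the labels "index", "lane" and "param" are assumptions chosen for illustration; each coordinate of each accessed location is taken from an index vector entry, from the lane position, or from a fixed instruction parameter.

```python
from typing import List, Optional, Sequence, Tuple, Union

# One entry per coordinate axis: (kind, value), where kind is "index", "lane" or "param".
CoordSource = Tuple[str, Optional[Union[int, Sequence[int]]]]


def resolve_locations(n_lanes: int,
                      sources: Sequence[CoordSource]) -> List[Tuple[int, ...]]:
    """Return, for each lane, the coordinates of the storage location accessed."""
    locations = []
    for lane in range(n_lanes):
        coords = []
        for kind, value in sources:
            if kind == "index":
                coords.append(value[lane])   # coordinate from an index vector entry
            elif kind == "lane":
                coords.append(lane)          # coordinate implied by the lane position
            elif kind == "param":
                coords.append(value)         # coordinate fixed by an instruction parameter
            else:
                raise ValueError("unknown coordinate source: " + str(kind))
        locations.append(tuple(coords))
    return locations


# Three-dimensional example: first coordinate indexed, second implied by lane,
# third fixed by a parameter.
locations_3d = resolve_locations(
    3, [("index", [2, 0, 1]), ("lane", None), ("param", 1)])

# Two-dimensional indexing with a pair of index vectors.
locations_2d = resolve_locations(
    3, [("index", [0, 2, 1]), ("index", [1, 1, 0])])
```

The two-dimensional case with one indexed coordinate and one lane-implied coordinate corresponds, in this model, to the horizontally or vertically indexed accesses discussed earlier.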
The techniques described above may be implemented by the processing circuitry (which may comprise or may control the control circuitry) causing the control circuitry to control the access and selection circuitry 210, 220, 230, 240 to access the appropriate elements in the storage array.
By way of summary,
In the present application, the words “configured to . . . ” are used to mean that an element of an apparatus has a configuration able to carry out the defined operation. In this context, a “configuration” means an arrangement or manner of interconnection of hardware or software. For example, the apparatus may have dedicated hardware which provides the defined operation, or a processor or other processing device may be programmed to perform the function. “Configured to” does not imply that the apparatus element needs to be changed in any way in order to provide the defined operation.
Although illustrative embodiments of the present techniques have been described in detail herein with reference to the accompanying drawings, it is to be understood that the present techniques are not limited to those precise embodiments, and that various changes, additions and modifications can be effected therein by one skilled in the art without departing from the scope and spirit of the techniques as defined by the appended claims. For example, various combinations of the features of the dependent claims could be made with the features of the independent claims without departing from the scope of the present techniques.
| Number | Date | Country | Kind |
|---|---|---|---|
| 2200675.3 | Jan 2022 | GB | national |

| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/GB2022/053215 | 12/14/2022 | WO | |