The present disclosure generally relates to computer systems, and more particularly to methods and systems for providing indexed or indirect load and store operations in a computer environment utilizing vertical and horizontal processing modes.
As is known, to improve the efficiency of multi-dimensional computations, Single-Instruction, Multiple Data (SIMD) architectures have been developed. A typical SIMD architecture enables one instruction to operate on several operands simultaneously. In particular, SIMD architectures take advantage of packing many data elements within one register or memory location. With parallel hardware execution, multiple operations can be performed with one instruction, resulting in significant performance improvement and simplification of hardware through reduction in program size and control. Traditional SIMD architectures perform mainly “vertical” operations where the corresponding elements in separate operands are operated upon in parallel and independently.
Although many applications currently in use can take advantage of such vertical operations, a number of important applications require rearrangement of the data elements before vertical operations can be applied to realize the application. Exemplary applications include many of those frequently used in graphics and signal processing. In contrast with applications that benefit from vertical operations, many others are more efficient when performed using horizontal mode operations.
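For purposes of illustration only, the scalar C sketch below models this distinction; the function names vertical_add and horizontal_sum are illustrative and do not appear in the disclosure. A vertical operation combines corresponding lanes of two packed operands independently, while a horizontal operation combines the lanes of a single operand with one another.

```c
#include <stdio.h>

#define LANES 4

/* Vertical operation: corresponding elements of two packed operands are
 * combined lane by lane, independently and in parallel. */
static void vertical_add(const int a[LANES], const int b[LANES], int out[LANES])
{
    for (int i = 0; i < LANES; i++)
        out[i] = a[i] + b[i];
}

/* Horizontal operation: elements within a single packed operand are
 * combined with one another, e.g. a running sum across the lanes. */
static int horizontal_sum(const int a[LANES])
{
    int sum = 0;
    for (int i = 0; i < LANES; i++)
        sum += a[i];
    return sum;
}

int main(void)
{
    int a[LANES] = {1, 2, 3, 4};
    int b[LANES] = {10, 20, 30, 40};
    int v[LANES];

    vertical_add(a, b, v);
    printf("vertical add:   %d %d %d %d\n", v[0], v[1], v[2], v[3]); /* 11 22 33 44 */
    printf("horizontal sum: %d\n", horizontal_sum(a));               /* 10 */
    return 0;
}
```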
For example, in many operations, the performance of a graphics pipeline is enhanced by utilizing vertical processing techniques, where portions of the graphics data are processed in independent parallel channels. Other operations, however, benefit from horizontal processing techniques, where blocks of graphics data are processed in a serial manner. The use of both vertical mode and horizontal mode processing, also referred to as dual mode processing, presents challenges in data loading and storing operations. The challenges are amplified with indexed or indirect operations, where operands are specified as relative address locations. For example, indexed operations generally require one or more separate operations to accomplish an otherwise basic load or store. For at least these reasons, the above-discussed computer processing functions are data and instruction intensive and would therefore realize improved efficiencies from systems, methods and apparatuses that provide indexed load and store operations in a dual mode computer processing environment.
Embodiments of the present disclosure provide a computer system, comprising: array logic configured to store a plurality of vectors, wherein each of the plurality of vectors comprises a horizontal array; index logic configured to store offset data, relative to a base address, corresponding to each of the plurality of vectors; loading logic configured to retrieve each of the plurality of vectors; transposition logic configured to transpose the plurality of vectors into a vertical configuration using the offset data; and register logic configured to receive the plurality of vectors, wherein each of the plurality of vectors comprises a vertical array.
Embodiments of the present disclosure can also be viewed as providing methods of indexed loading in a dual mode computer processor, comprising: retrieving a plurality of vectors from an array, the array comprising a plurality of array rows and a plurality of array columns and the array configured to store each of the plurality of vectors in one of the plurality of array rows; generating a plurality of offset values, each of the plurality of offset values corresponding to a position of one of the plurality of array rows relative to a base address; transposing the plurality of vectors into a vertical orientation utilizing the plurality of offset values; and storing the transposed plurality of vectors, wherein each of the plurality of vectors is configured as a corresponding one of a plurality of columns.
Embodiments of the present disclosure can also be viewed as providing a computer processing apparatus for loading indexed operations in a dual mode processing environment, comprising: a data array, having at least one dimension, configured to store a plurality of data sets; an index register configured to store a plurality of offset values, each corresponding to an address within the data array; an accumulator configured to receive the plurality of data sets from the data array; and a destination register configured to receive the plurality of data sets in a transposed configuration.
Embodiments of the present disclosure can also be viewed as providing computer hardware for loading indexed operations in a dual mode processing environment, comprising: a means for storing a plurality of vectors in a first register, wherein each of the vectors comprises a plurality of components and wherein the plurality of components are vertically oriented; a means for retrieving the plurality of vectors from the first register; a means for generating a plurality of offset values corresponding to the plurality of vectors; and a means for receiving the plurality of vectors into a second register, wherein each of the plurality of components within each of the plurality of vectors is received utilizing a corresponding one of the plurality of offset values.
Other systems, methods, features, and advantages of the present disclosure will be or become apparent to one with skill in the art upon examination of the following drawings and detailed description. It is intended that all such additional systems, methods, features, and advantages be included within this description, be within the scope of the present disclosure, and be protected by the accompanying claims.
Many aspects of the disclosure can be better understood with reference to the following drawings. The components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present disclosure. Moreover, in the drawings, like reference numerals designate corresponding parts throughout the several views.
Having summarized various aspects of the present disclosure, reference will now be made in detail to the description of the disclosure as illustrated in the drawings. While the disclosure will be described in connection with these drawings, there is no intent to limit it to the embodiment or embodiments disclosed herein. On the contrary, the intent is to cover all alternatives, modifications and equivalents included within the spirit and scope of the disclosure as defined by the appended claims.
It is noted that the drawings presented herein have been provided to illustrate certain features and aspects of the embodiments of the disclosure. It will be appreciated from the description provided herein that a variety of alternative embodiments and implementations may be realized, consistent with the scope and spirit of the present disclosure.
As summarized above, the present application is directed to embodiments of apparatus, systems and methods of providing indexed load and store operations in a dual mode computer environment. Although exemplary embodiments are presented in the context of a computer graphics system, one of ordinary skill in the art will appreciate that the apparatus, systems and methods herein are applicable in any computer system using vertical mode and horizontal mode processing.
Reference is briefly made to
Reference is briefly made to
Reference is now made to
Reference is now made to
An accumulator 540 is provided for collecting the vectors 512-515. The accumulator 540 is configured such that the vectors 512-515 remain in the same horizontal orientation as when stored in the array 510. As discussed above, the accumulator 540 may be a memory location or may be implemented in logic within a processor. Transposition logic 550 is applied to the accumulated vector data to generate a vertical orientation for loading and storage in the destination register 530. The vertical orientation or configuration in the destination register 530 is such that each column shares the offset value that corresponds to a particular vector and each row constitutes a different vector component. In an embodiment, each column constitutes data provided for a single process, also referred to as a process thread. The vertical configuration facilitates vertical SIMD computations involving the processing of multiple data elements, such as those found in image processing, three-dimensional graphics, and multi-dimensional data manipulations.
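A minimal sketch of this transposition step follows, assuming four vectors of four components each; the type and function names are illustrative only and are not part of the disclosure.

```c
#define NUM_VECTORS     4   /* e.g., vectors 512-515            */
#define NUM_COMPONENTS  4   /* components per vector            */

/* Accumulator 540: vectors held as rows, in the same horizontal
 * orientation used by the array 510. */
typedef float accumulator_t[NUM_VECTORS][NUM_COMPONENTS];

/* Destination register 530: each column is one vector (one process
 * thread), each row is one vector component. */
typedef float dest_register_t[NUM_COMPONENTS][NUM_VECTORS];

/* Transposition logic 550: rows of the accumulator become columns of
 * the destination register. */
static void transpose_to_vertical(const accumulator_t acc, dest_register_t dst)
{
    for (int v = 0; v < NUM_VECTORS; v++)
        for (int c = 0; c < NUM_COMPONENTS; c++)
            dst[c][v] = acc[v][c];
}
```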
Reference is now made to
Reference is now made to
The vectors 724 in the temporary data storage location 730 are oriented in the same horizontal configuration as in the source data storage device 720, such that each row consists of the individual vector components 736 of a single vector. The configuration of the four vectors 724, each having four vector components 736, creates a four-by-four matrix in the temporary data storage 730. A transposition function 740 is applied to the four-by-four matrix and the result is stored in a destination register 750. The four vectors 724 are stored in the destination register 750 at consecutive register addresses 752 in a vertical orientation such that each column contains a vector 724 and each row contains the same component value 736 for all of the vectors 724. In this manner, the vectors are configured for efficient vertical mode processing.
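The four-by-four case can be made concrete with the hypothetical sketch below, in which each element is encoded as ten times its vector number plus its component number: before transposition each row of the temporary storage holds one vector, and afterwards each row of the destination holds the same component of all four vectors.

```c
#include <stdio.h>

#define N 4

int main(void)
{
    /* Temporary storage 730: one vector 724 per row; components 736
     * encoded as 10 * vector + component for readability. */
    int tmp[N][N], dst[N][N];

    for (int v = 0; v < N; v++)
        for (int c = 0; c < N; c++)
            tmp[v][c] = 10 * v + c;

    /* Transposition function 740: rows become columns. */
    for (int v = 0; v < N; v++)
        for (int c = 0; c < N; c++)
            dst[c][v] = tmp[v][c];

    /* Destination register 750: row r now holds component r of every
     * vector, e.g. row 0 = {0, 10, 20, 30}. */
    for (int r = 0; r < N; r++) {
        for (int col = 0; col < N; col++)
            printf("%3d ", dst[r][col]);
        printf("\n");
    }
    return 0;
}
```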
Reference is now made to
In summary,
Reference is now made to
Offset values related to a relative address of each vector are generated in block 920. The offset values provide array location information for each of the vectors relative to a base address. The base address may be a fixed reference within the array or may be assigned to an array location for a particular set of vectors. Any indexed or indirect operation will utilize the combination of the base address and the offset value to determine the actual location of data.
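A hypothetical sketch of the address calculation follows; the names are illustrative only. Every indexed access resolves to the base address plus the offset value generated for the corresponding vector.

```c
#include <stddef.h>

/* Effective address of vector i: the base address of the array plus the
 * offset value (expressed here in elements) generated for that vector. */
static const float *vector_address(const float *base,
                                   const ptrdiff_t *offsets, int i)
{
    return base + offsets[i];   /* base + offset -> actual data location */
}
```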
The horizontally-oriented vectors that are retrieved and accumulated are then transposed into a vertical orientation in block 930. The transposition entails converting the rows of horizontally oriented data into columns of vertically oriented data such that each column of transposed data represents one of the vectors. Accordingly, each row of transposed data represents a particular component of the vectors. In the vertical configuration, each of the offset values corresponds to one of the columns of data or vectors. After transposition, the vertically oriented data is stored in a destination register as shown in block 940. The vertical orientation of the data in the destination register permits the vectors to be processed in multiple parallel threads.
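Putting the steps of blocks 910-940 together, a minimal sketch of the indexed load might look like the following; the function indexed_load_transpose and its parameters are assumptions for illustration, not part of the disclosure.

```c
#include <stddef.h>

#define NUM_VECTORS     4
#define NUM_COMPONENTS  4

/* Retrieve NUM_VECTORS horizontally stored vectors from 'base' using the
 * per-vector 'offsets' (blocks 910 and 920), transpose them (block 930),
 * and store the result in 'dst' in vertical orientation (block 940):
 * column v of 'dst' holds vector v, row c holds component c. */
static void indexed_load_transpose(const float *base,
                                   const ptrdiff_t offsets[NUM_VECTORS],
                                   float dst[NUM_COMPONENTS][NUM_VECTORS])
{
    for (int v = 0; v < NUM_VECTORS; v++) {
        const float *row = base + offsets[v];   /* base + offset          */
        for (int c = 0; c < NUM_COMPONENTS; c++)
            dst[c][v] = row[c];                 /* row becomes a column   */
    }
}
```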
Reference is now made to
Also provided is hardware, software, or some combination thereof for retrieving the vectors from the source register, as shown in block 1020, and for receiving the vectors into a destination register, as shown in block 1040. Although retrieving the vectors and generating the offset values are essentially independent operations, the combined results of both are necessary to receive the vectors into the destination register. Because the source register and the destination register both store the vectors in a vertical configuration, no transposition is required.
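Because source and destination are both vertically oriented in this case, a sketch of the operation reduces to a per-column gather with no transpose. The model below, in which each offset value selects a source column, is an assumption for illustration only.

```c
#include <stddef.h>

#define NUM_VECTORS     4
#define NUM_COMPONENTS  4

/* Both registers use the same vertical layout: column v is vector v,
 * row c is component c. Each destination column is filled from the
 * source column selected by the corresponding offset value; no
 * transposition is performed. Offsets are assumed to be valid column
 * indices into the source register. */
static void indexed_load_vertical(const float src[NUM_COMPONENTS][NUM_VECTORS],
                                  const size_t offsets[NUM_VECTORS],
                                  float dst[NUM_COMPONENTS][NUM_VECTORS])
{
    for (int v = 0; v < NUM_VECTORS; v++) {
        size_t col = offsets[v];             /* column selected by offset */
        for (int c = 0; c < NUM_COMPONENTS; c++)
            dst[c][v] = src[c][col];         /* straight copy, no transpose */
    }
}
```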
The methods of the present disclosure can be implemented in hardware, software, firmware, or a combination thereof. In some embodiments, the methods are implemented in software or firmware that is stored in a memory and that is executed by a suitable instruction execution system. If implemented in hardware, as in an alternative embodiment, the logic can be implemented with any or a combination of the following technologies, which are all well known in the art: a discrete logic circuit(s) having logic gates for implementing logic functions upon data signals, an application specific integrated circuit (ASIC) having appropriate combinational logic gates, a programmable gate array(s) (PGA), a field programmable gate array (FPGA), etc.
Any process descriptions or blocks in flow charts should be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps in the process. Alternate implementations are included within the scope of an embodiment of the present disclosure in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present disclosure.
It should be emphasized that the above-described embodiments of the present disclosure are merely possible examples of implementations, set forth for a clear understanding of the principles of the disclosure. Many variations and modifications may be made to the above-described embodiment(s) without departing substantially from the spirit and principles of the disclosure. All such modifications and variations are intended to be included herein within the scope of this disclosure and protected by the following claims.