Claims
- 1. An apparatus for use in a computer system comprising:
a memory having stored therein a first packed data and a second packed data; and
a processor coupled to said memory to receive said first packed data and said second packed data, said processor performing operations on data elements in said first packed data and said second packed data to generate a plurality of data elements in a third packed data in response to receiving an instruction, at least two of said plurality of data elements in said third packed data storing the result of multiply-add operations.
- 2. The apparatus of claim 1, said first packed data including a first data element, a second data element, a third data element, and a fourth data element;
said second packed data containing at least a fifth data element, a sixth data element, a seventh data element, and an eighth data element; and
said third packed data containing at least a ninth data element and a tenth data element, said ninth data element representing the result of:
(said first data element multiplied by said fifth data element) added to (said second data element multiplied by said sixth data element),
said tenth data element representing the result of:
(said third data element multiplied by said seventh data element) added to (said fourth data element multiplied by said eighth data element).
- 3. The apparatus of claim 2, said first data element, said second data element, said third data element, said fourth data element, said fifth data element, said sixth data element, said seventh data element, and said eighth data element each comprising a first predetermined number of bits; and
said ninth data element and said tenth data element each comprising a second predetermined number of bits, said second predetermined number of bits being greater than said first predetermined number of bits.
- 4. The apparatus of claim 1, wherein said multiply-add operation is performed with saturation.
- 5. An apparatus for use in a computer system comprising:
a first storage area; and
a circuit coupled to said first storage area, said circuit multiplying a value A by a value B to generate a first intermediate result, multiplying a value C by a value D to generate a second intermediate result, multiplying a value E by a value F to generate a third intermediate result, multiplying a value G by a value H to generate a fourth intermediate result, adding said first intermediate result to said second intermediate result to generate a value I, adding said third intermediate result to said fourth intermediate result to generate a value J, and storing said value I and said value J in said first storage area as elements of a first packed data in response to an enable signal.
- 6. The apparatus of claim 5, said computer system further comprising:
a second storage area coupled to said circuit, said second storage area for storing said value A, said value B, said value C, and said value D as data elements of a second packed data; and
a third storage area coupled to said circuit, said third storage area for storing said value E, said value F, said value G, and said value H as data elements of a third packed data.
- 7. The apparatus of claim 5, said value I and said value J providing a higher precision than at least one of said value A, said value B, said value C, said value D, said value E, said value F, said value G, and said value H.
- 8. A computer system comprising:
a processor; and
a storage area coupled to said processor having stored therein,
a multiply-add instruction for operating on a first packed data and a second packed data, said first packed data containing at least data elements A, B, C, and D each including a predetermined number of bits, said second packed data containing at least data elements E, F, G, and H each including said predetermined number of bits, said processor generating a third packed data containing at least data elements I and J in response to receiving said multiply-add instruction, said data element I equal to (A×E)+(B×F), said data element J equal to (C×G)+(D×H).
- 9. The computer system of claim 8, said processor further including a first register, said processor, in addition to generating said third packed data, also storing said third packed data in said first register in response to receiving said multiply-add instruction.
- 10. The computer system of claim 8, said processor further including:
a first register having stored therein said first packed data; and
a second register having stored therein said second packed data.
- 11. The computer system of claim 8, said storage area further having stored therein said first packed data and said second packed data.
- 12. The computer system of claim 8, said data elements I and J providing a higher precision than at least one of said data elements A, B, C, D, E, F, G, and H.
- 13. The computer system of claim 8, said data elements I and J including two times said predetermined number of bits.
- 14. The computer system of claim 8, said data elements A, B, C, D, E, F, G, H, I and J are either unsigned or signed data elements.
- 15. A processor comprising:
a first storage area for storing a first packed data containing at least an A, a B, a C, and a D data element;
a second storage area for storing a second packed data containing at least an E, an F, a G, and an H data element;
a multiply-add circuit including:
a first multiplier coupled to said first storage area to receive said A and coupled to said second storage area to receive said E;
a second multiplier coupled to said first storage area to receive said B and coupled to said second storage area to receive said F;
a third multiplier coupled to said first storage area to receive said C and coupled to said second storage area to receive said G;
a fourth multiplier coupled to said first storage area to receive said D and coupled to said second storage area to receive said H;
a first adder coupled to said first multiplier and said second multiplier;
a second adder coupled to said third multiplier and said fourth multiplier; and
a third storage area coupled to said first adder and said second adder, said third storage area having at least a first field and a second field, said first field for storing the output of said first adder as a first data element of a third packed data, said second field for storing the output of said second adder as a second data element of said third packed data.
- 16. An apparatus for use in a computer system comprising:
a first storage area having at least a first field and a second field; and
a circuit, coupled to said first storage area, operating in response to a signal, said circuit comprising:
a multiplication means for multiplying a value A by a value B to generate a first intermediate result, multiplying a value C by a value D to generate a second intermediate result, multiplying a value E by a value F to generate a third intermediate result, and multiplying a value G by a value H to generate a fourth intermediate result; and
an arithmetic means for adding said first intermediate result and said second intermediate result to generate a value I, and adding said third intermediate result and said fourth intermediate result to generate a value J; and
a storage means for storing said value I in said first field and said value J in said second field as a first packed data.
- 17. An apparatus for use in a computer system comprising:
a memory having stored therein a first packed data and a second packed data each containing initial data elements, each of said initial data elements in said first packed data having a corresponding initial data element in said second packed data;
a circuit, coupled to said first storage area, operating in response to a signal, said circuit comprising:
a multiplication means for multiplying together said corresponding initial data elements in said first packed data and said second packed data to generate corresponding intermediate data elements, said intermediate data elements being divided into a number of sets;
an arithmetic means for generating a plurality of result data elements, a first of said plurality of result data elements representing the sum of said intermediate result data elements in a first of said number of sets, a second of said plurality of result data elements representing the sum of said intermediate result data elements in a second of said number of sets; and
a storage means for storing said result data elements as a third packed data in said memory.
- 18. The apparatus of claim 17, wherein said memory includes a register for storing said third packed data.
- 19. The apparatus of claim 17, wherein said first packed data and said second packed data each contain at least four initial data elements, and wherein each of said sets contains at least two intermediate data elements.
- 20. The apparatus of claim 17, wherein said arithmetic operations are performed with saturation.
- 21. The apparatus of claim 17, wherein said initial data elements, said intermediate data elements, and said result data elements are each either signed or unsigned values.
- 22. The apparatus of claim 17, wherein said intermediate data elements and said result data elements contain twice as many bits as said initial data elements.
- 23. An apparatus for use in a computer system comprising:
a memory having stored therein a first packed data and a second packed data, said first packed data storing a first plurality of sets of data elements, each of said first plurality of sets of data elements having a corresponding set of data elements in said second packed data; and
a processor coupled to said memory to receive said first packed data and said second packed data, said circuit storing in a third storage area a plurality of data elements as a third packed data in response to receiving an instruction, each of said plurality of data elements storing the dot product of one of said first plurality of sets of data elements in said first packed data and said corresponding set of data elements in said second packed data.
- 24. In a computer system, a method comprising the steps of:
A) receiving an instruction; and
B) performing the following steps in response to receiving said instruction,
B1) multiplying together a first value and a second value to generate a first intermediate result,
B2) multiplying together a third value and a fourth value to generate a second intermediate result,
B3) multiplying together a fifth value and a sixth value to generate a third intermediate result,
B4) multiplying together a seventh value and an eighth value to generate a fourth intermediate result,
B5) adding together said first intermediate result and said second intermediate result to generate a first data element in a first packed data,
B6) adding together said third intermediate result and said fourth intermediate result to generate a second data element in said first packed data;
B7) storing said first packed data in a first storage area.
- 25. In a computer system, a method for manipulating a first packed data and a second packed data, said first packed data including A1, A2, A3, and A4 as data elements, said second packed data including B1, B2, B3, and B4 as data elements, said method comprising the steps of:
receiving an instruction; and
performing the following steps in response to receiving said instruction,
performing the operation (A1×B1)+(A2×B2) to generate a first data element in a third packed data;
performing the operation (A3×B3)+(A4×B4) to generate a second data element in said third packed data;
storing said third packed data in a first storage area.
- 26. In a computer system having stored therein a first packed data and a second packed data each containing initial data elements, each of said initial data elements in said first packed data having a corresponding initial data element in said second packed data, a method for performing multiply add operations, said method comprising the steps of:
receiving an instruction; and
performing the following steps in response to receiving said instruction,
multiplying together said corresponding initial data elements in said first packed data and said second packed data to generate corresponding intermediate data elements, said intermediate data elements being divided into a number of sets;
generating a plurality of result data elements, a first of said plurality of result data elements representing the sum of said intermediate result data elements in a first of said number of sets, a second of said plurality of result data elements representing the sum of said intermediate result data elements in a second of said number of sets; and
storing said plurality of result data elements as a third packed data in a memory.
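ILLUSTRATIVE SKETCH (NOT PART OF THE CLAIMS)
The multiply-add operation recited in claims 25 and 26 can be sketched in software as follows. This is a minimal Python model for illustration only, not the claimed circuit. The function name packed_multiply_add and the set_size parameter are hypothetical names introduced here. The sketch assumes that each set consists of consecutive products, as in the (A1×B1)+(A2×B2) layout of claim 25, and it uses unbounded Python integers, so it does not model element bit widths, signedness, or saturation.

```python
from typing import List, Sequence


def packed_multiply_add(first: Sequence[int], second: Sequence[int],
                        set_size: int = 2) -> List[int]:
    """Multiply corresponding elements, then sum the products set by set.

    `first` and `second` stand in for the first and second packed data.
    `set_size` (an assumption of this sketch) is the number of
    intermediate products summed into each result element.
    """
    if len(first) != len(second):
        raise ValueError("packed operands must hold the same number of elements")
    if set_size <= 0 or len(first) % set_size != 0:
        raise ValueError("element count must be a positive multiple of set_size")

    # Step 1: one intermediate product per pair of corresponding elements.
    products = [x * y for x, y in zip(first, second)]

    # Step 2: sum each consecutive set of products into one result element.
    return [sum(products[i:i + set_size])
            for i in range(0, len(products), set_size)]


# Claim 25 layout: (A1*B1)+(A2*B2) and (A3*B3)+(A4*B4).
result = packed_multiply_add([1, 2, 3, 4], [5, 6, 7, 8])
print(result)  # [17, 53]
```

With the default set_size of 2, four input elements per operand yield two result elements. Passing set_size=4 would instead sum all four products into a single result element.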
CROSS-REFERENCE TO RELATED APPLICATION
[0001] Ser. No. ______, titled “A Method and Apparatus for Performing Multiply-Subtract Operations on Packed Data,” filed , by Alexander D. Peleg, Millind Mittal, Larry M. Mennemeier, Benny Eitan, Andrew F. Glew, Carole Dulong, Eiichi Kowashi, and Wolf Witt.
Continuations (1)
| | Number | Date | Country |
| --- | --- | --- | --- |
| Parent | 08522067 | Aug 1995 | US |
| Child | 09989736 | Nov 2001 | US |