Claims
- 1. A method comprising:
decoding an instruction of a first instruction format identifying a horizontal arithmetic operation, a first source having a first plurality of packed data elements including elements X0 and X1 and a second source having a second plurality of packed data elements including elements Y0 and Y1;
executing the horizontal arithmetic operation on the first plurality of packed data elements and the second plurality of packed data elements to produce a first arithmetic result element from X0 and X1 and a second arithmetic result element from Y0 and Y1; and
storing a third plurality of packed data elements including a first element to represent said first arithmetic result element, and a second element to represent said second arithmetic result element.
- 2. The method of claim 1 further comprising
overwriting said first plurality of packed data elements in the first source with said third plurality of packed data elements.
- 3. The method of claim 1, said third plurality of packed data elements storing elements to represent horizontal addition operations in a register specified by bits three through five of the instruction.
- 4. The method of claim 3, said third plurality of packed data elements storing elements to represent saturated arithmetic sums, Y0+Y1 and X0+X1.
- 5. The method of claim 3, said third plurality of packed data elements comprising 16-bit elements to represent sums, Y0+Y1 and X0+X1.
- 6. The method of claim 3, said third plurality of packed data elements comprising 32-bit elements to represent sums, Y0+Y1 and X0+X1.
- 7. The method of claim 1, said third plurality of packed data elements storing elements to represent horizontal subtraction operations.
- 8. The method of claim 7, said third plurality of packed data elements storing elements to represent saturated arithmetic differences, Y0−Y1 and X0−X1.
- 9. The method of claim 1, said third plurality of packed data elements storing elements to represent horizontal floating-point arithmetic operations.
- 10. A processor comprising:
a storage area to store a first packed data operand and a second packed data operand; and
an execution unit coupled to said storage area, the execution unit in response to receiving a single instruction to perform operations on data elements in said first packed data operand and said second packed data operand to generate a plurality of data elements in a packed data result, a first of said plurality of data elements in said packed data result being the result of an intra-arithmetic operation performed by the execution unit using a first pair of data elements of said first packed data operand and a second of said plurality of data elements in said packed data result being the result of an intra-arithmetic operation performed by the execution unit using a second pair of data elements of said second packed data operand.
- 11. The processor of claim 10, wherein
each of said plurality of data elements in said packed data result being the result of an intra-add operation.
- 12. The processor of claim 11, wherein
each of said plurality of data elements in said packed data result being the result of an intra-add operation with signed saturation.
- 13. The processor of claim 10, wherein
each of said plurality of data elements in said packed data result being the result of an intra-subtract operation.
- 14. The processor of claim 13, wherein
each of said plurality of data elements in said packed data result being the result of an intra-subtract operation with signed saturation.
- 15. The processor of claim 10, the execution unit, in response to said single instruction, overwriting said first packed data operand with said packed data result.
- 16. An apparatus comprising:
a first storage area for storing a first packed data operand, containing at least an A data element and a B data element packed together;
a second storage area for storing a second packed data operand containing at least a C data element and a D data element packed together; and
an arithmetic circuit responsive to execution of a single instruction to arithmetically combine the A data element and the B data element to generate a first result element of a third packed data, and to arithmetically combine the C data element and the D data element to generate a second result element of the third packed data.
- 17. The apparatus of claim 16 further comprising:
a multiplexer circuit to align at least one of the A data element and the B data element and to align at least one of the C data element and the D data element for an intra-arithmetic operation; and
an operation control unit, coupled with the multiplexer circuit, to signal for the alignment of said at least one of the A data element and the B data element and said at least one of the C data element and the D data element responsive to execution of said single instruction.
- 18. The apparatus of claim 17 wherein
said operation control unit is coupled with the arithmetic circuit to signal for the combination of the A data element and the B data element and for the combination of the C data element and the D data element according to an intra-add operation.
- 19. The apparatus of claim 17 wherein
said operation control unit is coupled with the arithmetic circuit to signal for the combination of the A data element and the B data element and for the combination of the C data element and the D data element according to an intra-subtract operation.
- 20. The apparatus of claim 17 wherein
said operation control unit is coupled with the arithmetic circuit to signal for the combination of the A data element and the B data element and for the combination of the C data element and the D data element according to a saturating arithmetic operation.
- 21. The apparatus of claim 16 further comprising:
a decoder to decode said single instruction and to enable execution of said single instruction; and
a register file comprising said first storage area and said second storage area, to provide the A data element, the B data element, the C data element and the D data element responsive to the execution of said single instruction.
- 22. The apparatus of claim 21 further comprising:
a wireless communication device to send and receive digital data over a wireless network;
a memory to store digital data and software including the single instruction and to supply the single instruction to said decoder; and
an input output system responsive to said software to interface with the wireless communication device receiving data to process or sending data processed at least in part by said single instruction.
- 23. A system comprising:
a first storage area for storing a first packed data operand, containing at least an A data element and a B data element packed together;
a second storage area for storing a second packed data operand containing at least a C data element and a D data element packed together;
a decoder to decode a single instruction and to enable execution of said single instruction;
an arithmetic circuit responsive to enabling execution of said single instruction to arithmetically combine the A data element and the B data element to generate a first result element of a third packed data, and to arithmetically combine the C data element and the D data element to generate a second result element of the third packed data;
a wireless communication device to send and receive digital data over a wireless network;
a memory to store digital data and software including the single instruction and to supply the single instruction to said decoder; and
an input output system responsive to said software to interface with the wireless communication device receiving data to process or sending data processed at least in part by said single instruction.
- 24. The system of claim 23, wherein
each of said first and second result elements of the third packed data being the result of an intra-add operation.
- 25. The system of claim 24, wherein
each of said first and second result elements of the third packed data being the result of an intra-add operation with signed saturation.
- 26. The system of claim 23, wherein
each of said first and second result elements of the third packed data being the result of an intra-subtract operation.
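For illustration only, the horizontal (intra-operand) arithmetic recited in the claims above can be modeled in software. The following Python sketch is not part of the claimed subject matter and does not describe any particular hardware implementation. The function names `saturate` and `horizontal_op`, the 16-bit default element width, the wraparound behavior of the non-saturating case, and the placement of the first source's results in the low-order positions of the result are all assumptions made for the example.

```python
def saturate(value, bits=16):
    """Clamp value to the signed range of a `bits`-wide element."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def wrap(value, bits=16):
    """Reduce value modulo 2**bits and reinterpret it as signed (assumed behavior)."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= (1 << (bits - 1)) else value


def horizontal_op(first, second, op="add", bits=16, saturating=False):
    """Model a horizontal add or subtract over two packed operands.

    `first` holds X0, X1, ... and `second` holds Y0, Y1, ..., lowest element
    first. Adjacent pairs within each operand are combined (X0 op X1,
    Y0 op Y1, ...). Results from `first` are placed ahead of results from
    `second`; that ordering is an assumption of this sketch.
    """
    if op not in ("add", "sub"):
        raise ValueError("op must be 'add' or 'sub'")
    if len(first) != len(second) or len(first) % 2 != 0:
        raise ValueError("operands must have equal, even element counts")

    def combine(a, b):
        result = a + b if op == "add" else a - b
        return saturate(result, bits) if saturating else wrap(result, bits)

    def pairs(source):
        return [combine(source[i], source[i + 1])
                for i in range(0, len(source), 2)]

    return pairs(first) + pairs(second)
```

With four elements per operand, `horizontal_op([1, 2, 3, 4], [10, 20, 30, 40])` combines X0 with X1 and X2 with X3 from the first operand, then Y0 with Y1 and Y2 with Y3 from the second, so the result has the same element count as each source. Passing `saturating=True` models the signed-saturation variants by clamping instead of wrapping.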
RELATED APPLICATIONS
[0001] This is a continuation-in-part application claiming, under 35 U.S.C. § 120, the benefit of the filing dates of U.S. application Ser. No. 09/952,891, filed Oct. 29, 2001, currently pending; and of U.S. application Ser. No. 10/193,645, filed Jul. 9, 2002, currently pending, which is a continuation of application Ser. No. 09/053,401, filed Mar. 31, 1998, now U.S. Pat. No. 6,418,529.
Continuations (1)
| Relation | Number | Date | Country |
| --- | --- | --- | --- |
| Parent | 09053401 | Mar 1998 | US |
| Child | 10193645 | Jul 2002 | US |
Continuation in Parts (2)
| Relation | Number | Date | Country |
| --- | --- | --- | --- |
| Parent | 09952891 | Oct 2001 | US |
| Child | 10610784 | Jun 2003 | US |
| Parent | 10193645 | Jul 2002 | US |
| Child | 10610784 | Jun 2003 | US |