Claims
- 1. A method comprising:
responsive to a first single instruction, identifying a first operand including four packed data elements, (r3, r2, r1 and r0), and identifying a second operand including four packed coefficients, (w3, w2, w1 and w0), generating four packed first products, (r3w3, r2w2, r1w1 and r0w0) and storing said four packed first products at a first destination identified by said first single instruction; responsive to a second single instruction, identifying a third operand including four packed data elements, (s3, s2, s1 and s0), and identifying a fourth operand including four packed coefficients, (w7, w6, w5 and w4), generating four packed second products, (s3w7, s2w6, s1w5 and s0w4) and storing said four packed second products at a second destination identified by said second single instruction; and responsive to a third single instruction, identifying a fifth operand including the four packed first products and identifying a sixth operand including the four packed second products, generating four packed sums, (s2w6+s3w7, s0w4+s1w5, r2w2+r3w3, and r0w0+r1w1) and storing them at a third destination identified by said third single instruction.
- 2. The method of claim 1 further comprising:
responsive to the first single instruction, overwriting said four packed data elements in the first operand with said four packed first products; responsive to the second single instruction, overwriting said four packed data elements in the second operand with said four packed second products; and responsive to the third single instruction, overwriting said four packed first products in the fifth operand with said four packed sums.
- 3. The method of claim 1 further comprising:
responsive to a fourth single instruction identifying a seventh operand including the four packed first products and identifying an eighth operand including the four packed second products, generating four packed differences, (s2w6−s3w7, s0w4−s1w5, r2w2−r3w3, and r0w0−r1w1) and storing them at a fourth destination identified by said fourth single instruction.
- 4. The method of claim 3, said fourth destination storing elements to represent saturated packed differences, (s2w6−s3w7, s0w4−s1w5, r2w2−r3w3, and r0w0−r1w1).
- 5. The method of claim 1, said third destination storing elements to represent horizontal addition operations in a register specified by bits three through five of the third single instruction.
- 6. The method of claim 5, said third destination storing elements to represent saturated arithmetic sums, (s2w6+s3w7, s0w4+s1w5, r2w2+r3w3, and r0w0+r1w1).
- 7. The method of claim 5, said third destination comprising packed 16-bit elements to represent said four packed sums, (s2w6+s3w7, s0w4+s1w5, r2w2+r3w3, and r0w0+r1w1).
- 8. The method of claim 5, said third destination comprising packed 32-bit elements to represent said four packed sums, (s2w6+s3w7, s0w4+s1w5, r2w2+r3w3, and r0w0+r1w1).
- 9. The method of claim 8, said third destination storing elements to represent horizontal floating-point arithmetic operations.
- 10. A processor comprising:
a storage area to store a first packed data operand, a second packed data operand and a third packed data operand; and an execution unit coupled to said storage area, the execution unit to execute a first single instruction on data elements in said first packed data operand and said second packed data operand to generate a plurality of data elements in a first packed data result, at least one of said plurality of data elements in said first packed data result being the result of an intra-add operation performed on a first pair of data elements of said first packed data operand and at least one other of said plurality of data elements in said first packed data result being the result of an intra-add operation performed on a second pair of data elements of said second packed data operand, the execution unit to execute a second single instruction on data elements in said third packed data operand and said second packed data operand to generate a plurality of data elements in a second packed data result, at least one of said plurality of data elements in said second packed data result being the result of an intra-subtract operation performed on a third pair of data elements of said third packed data operand and at least one other of said plurality of data elements in said second packed data result being the result of an intra-subtract operation performed on the second pair of data elements of said second packed data operand.
- 11. The processor of claim 10, wherein
each of said plurality of data elements in said first packed data result is the result of an intra-add operation with signed saturation.
- 12. The processor of claim 11, wherein
each of said plurality of data elements in said second packed data result is the result of an intra-subtract operation with signed saturation.
- 13. The processor of claim 10, the execution unit, in response to said first single instruction, overwriting said first packed data operand with said first packed data result.
- 14. An apparatus comprising:
a first storage area for storing a first packed data operand, containing at least an A data element and a B data element packed together; a second storage area for storing a second packed data operand containing at least a C data element and a D data element packed together; and an arithmetic circuit responsive to execution of a first single instruction to add the A data element and the B data element to generate a first result element of a third packed data, and to add the C data element and the D data element to generate a second result element of the third packed data, the arithmetic circuit responsive to execution of a second single instruction to subtract the A data element and the B data element to generate a third result element of a fourth packed data, and to subtract the C data element and the D data element to generate a fourth result element of the fourth packed data.
- 15. The apparatus of claim 14 wherein
each of said plurality of data elements in said third packed data and said fourth packed data are the result of an intra-add operation or an intra-subtract operation with signed saturation.
- 16. The apparatus of claim 14 further comprising:
a decoder to decode said first single instruction and said second single instruction and to enable execution of said first single instruction and said second single instruction; and a register file comprising said first storage area and said second storage area, to provide the A data element, the B data element, the C data element and the D data element responsive to the execution of said first or second single instruction.
- 17. A system comprising:
a first storage area for storing a first packed data operand, containing at least an A data element and a B data element packed together; a second storage area for storing a second packed data operand containing at least a C data element and a D data element packed together; a third storage area for storing a third packed data operand containing at least an E data element and an F data element packed together; a decoder to decode a first single instruction and to enable execution of said first single instruction, the decoder to decode a second single instruction and to enable execution of said second single instruction; an arithmetic circuit responsive to enabling execution of said first single instruction to add the A data element and the B data element to generate a first result element of a fourth packed data, and to add the C data element and the D data element to generate a second result element of the fourth packed data, the arithmetic circuit responsive to enabling execution of said second single instruction to subtract the E data element and the F data element to generate a third result element of a fifth packed data, and to subtract the C data element and the D data element to generate a fourth result element of the fifth packed data; a wireless communication device to send and receive digital data over a wireless network; a memory to store digital data and software including the first and second single instructions and to supply the first and second single instructions to said decoder; and an input output system responsive to said software to interface with the wireless communication device receiving data to process or sending data processed at least in part by said first and second single instructions.
- 18. The system of claim 17, wherein
each of said first and second result elements of the fourth packed data are the result of an intra-add operation with signed saturation.
- 19. The system of claim 18, wherein
each of said third and fourth result elements of the fifth packed data are the result of an intra-subtract operation with signed saturation.
- 20. The system of claim 17, wherein
each of said first and second result elements of the fourth packed data and said third and fourth result elements of the fifth packed data are 16-bit results.
- 21. The system of claim 17, wherein
each of said first and second result elements of the fourth packed data and said third and fourth result elements of the fifth packed data are 32-bit results.
- 22. The system of claim 21, wherein
each of said first and second result elements of the fourth packed data and said third and fourth result elements of the fifth packed data are floating point results.
- 23. The system of claim 21, wherein
each of said first and second result elements of the fourth packed data and said third and fourth result elements of the fifth packed data are unsaturated signed integer results.
- 24. A computer software product including one or more recordable media having executable instructions stored thereon including a first instruction and a second instruction which, when executed by a processing device, cause the processing device to:
access a first packed data operand, containing at least an A data element and a B data element packed together at a first storage area; access a second packed data operand containing at least a C data element and a D data element packed together at a second storage area; add the A data element and the B data element to generate a first result element of a third packed data in response to said first instruction; add the C data element and the D data element to generate a second result element of the third packed data in response to said first instruction; access a fourth storage area for storing a fourth packed data operand containing at least an E data element and an F data element packed together; access the second packed data operand containing at least the C data element and the D data element packed together at the second storage area; subtract the E data element and the F data element to generate a third result element of a fifth packed data in response to said second instruction; and subtract the C data element and the D data element to generate a fourth result element of the fifth packed data in response to said first instruction.
- 25. The computer software product of claim 24 which, when executed by a processing device, further cause the processing device to:
overwrite said first packed data operand with said third packed data in response to said first instruction.
- 26. The computer software product of claim 24 which, when executed by a processing device, further cause the processing device to:
overwrite said fourth packed data operand with said fifth packed data in response to said second instruction.
- 27. The system of claim 24, wherein
each of said first and second result elements of the third packed data and said third and fourth result elements of the fifth packed data are saturated signed values.
- 28. The system of claim 24, wherein
each of said first and second result elements of the third packed data and said third and fourth result elements of the fifth packed data are 16-bit integer values.
- 29. The system of claim 24, wherein
each of said first and second result elements of the third packed data and said third and fourth result elements of the fifth packed data are 32-bit integer values.
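By way of illustration only, the element ordering recited in claims 1 and 3, and the signed saturation recited in claims 4, 6, 11 and 12, can be modeled in software. The following Python sketch is not part of the claims and is not the claimed hardware. The function names (`packed_multiply`, `horizontal_add`, `horizontal_sub`, `saturate16`) are hypothetical, operands are written high element first as in the claims, and intermediate products are held in unbounded Python integers rather than fixed-width lanes, which is a simplifying assumption.

```python
from typing import List, Sequence, Tuple

# Signed 16-bit limits, assumed here for the saturating variants.
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1


def saturate16(x: int) -> int:
    """Clamp x to the signed 16-bit range (signed saturation)."""
    return max(INT16_MIN, min(INT16_MAX, x))


def packed_multiply(data: Sequence[int], coeffs: Sequence[int]) -> List[int]:
    """Element-wise products of two packed operands, e.g. (r3w3, r2w2, r1w1, r0w0)."""
    return [d * w for d, w in zip(data, coeffs)]


def _adjacent_pairs(op: Sequence[int]) -> List[Tuple[int, int]]:
    # (x3, x2, x1, x0) -> [(x2, x3), (x0, x1)]
    return [(op[i + 1], op[i]) for i in range(0, len(op), 2)]


def horizontal_add(first: Sequence[int], second: Sequence[int],
                   saturate: bool = False) -> List[int]:
    """Intra-add: sum adjacent pairs inside each operand.

    Results derived from `second` fill the high half of the result and
    results derived from `first` fill the low half, as in claim 1.
    """
    sums = [a + b for a, b in _adjacent_pairs(second) + _adjacent_pairs(first)]
    return [saturate16(s) for s in sums] if saturate else sums


def horizontal_sub(first: Sequence[int], second: Sequence[int],
                   saturate: bool = False) -> List[int]:
    """Intra-subtract: lower element minus higher element of each pair (claim 3)."""
    diffs = [a - b for a, b in _adjacent_pairs(second) + _adjacent_pairs(first)]
    return [saturate16(d) for d in diffs] if saturate else diffs


# Example values are arbitrary and chosen only to make the ordering visible.
r = [4, 3, 2, 1]            # (r3, r2, r1, r0)
w_lo = [40, 30, 20, 10]     # (w3, w2, w1, w0)
s = [8, 7, 6, 5]            # (s3, s2, s1, s0)
w_hi = [4, 3, 2, 1]         # (w7, w6, w5, w4)

first_products = packed_multiply(r, w_lo)    # [160, 90, 40, 10]
second_products = packed_multiply(s, w_hi)   # [32, 21, 12, 5]

packed_sums = horizontal_add(first_products, second_products)   # [53, 17, 250, 50]
packed_diffs = horizontal_sub(first_products, second_products)  # [-11, -7, -70, -30]
```

With `saturate=True`, a pair sum such as 30000 + 30000 is clamped to 32767 instead of wrapping around, which is the behavior the saturated variants are understood to describe.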
RELATED APPLICATIONS
[0001] This is a continuation-in-part application claiming, under 35 U.S.C. § 120, the benefit of the filing dates of U.S. application Ser. No. 09/952,891, filed Oct. 29, 2001, currently pending; and of U.S. application Ser. No. 10/193,645, filed Jul. 9, 2002, currently pending; which is a continuation of application Ser. No. 09/053,401, filed Mar. 31, 1998, now U.S. Pat. No. 6,418,529.
Continuations (1)
| Relation | Number | Date | Country |
|---|---|---|---|
| Parent | 09053401 | Mar 1998 | US |
| Child | 10193645 | Jul 2002 | US |
Continuation in Parts (2)
| Relation | Number | Date | Country |
|---|---|---|---|
| Parent | 09952891 | Oct 2001 | US |
| Child | 10611326 | Jun 2003 | US |
| Parent | 10193645 | Jul 2002 | US |
| Child | 10611326 | Jun 2003 | US |