Claims
- 1. An arithmetic processor comprising:
- a first data register means having a bit length of R1 bits;
- means for combining first and second multi-bit data numbers into a first packed word having a first high-order portion and a first low-order portion and being stored in said first register means and combining third and fourth data numbers into a second packed word having a second high-order portion and a second low-order portion, said first and second multi-bit data numbers having bit lengths of N1 and N2, said first packed word having a bit length of P1 such that P1≥N1+N2, said third and fourth multi-bit data numbers having bit lengths of N3 and N4, said second packed word having a bit length of P2 such that P2≥N3+N4, said first data number being directed to said first low-order portion of said first packed word, said second data number being directed to said first high-order portion of said first packed word, said third data number being directed to said second low-order portion of said second packed word, and said fourth data number being directed to said second high-order portion of said second packed word;
- a single-instruction single-data arithmetic logic unit of bit length L1 for additions and subtractions with an unbroken carry chain of contents of said first packed word in said first multi-bit register means with said second packed word to produce a third packed word of bit length P3, where L1≥P1, L1≥P2, and L1≥P3;
- means for directing said first packed word stored in said first register means and said second packed word to said single-instruction single-data arithmetic logic unit to produce said third packed word; and
- means for extracting a first output number of bit length N5 from a third low-order portion of said third packed word and extracting a second output number of bit length N6 from a third high-order portion of said third packed word where P3≥N5+N6.
- 2. The system as in claim 1 wherein said first and third low-order and high-order portions of said first and third packed words, respectively, comprise a first half and a second half thereof.
- 3. The system as in claim 2 wherein P1=P2=P3=32.
- 4. The system as in claim 3 wherein N1=N2=N3=N4=N5=N6=16.
- 5. An apparatus comprising:
- a first multi-bit data register means of bit length R1;
- means for combining first and second multi-bit data numbers into a first packed word in said first register means and combining third and fourth multi-bit data numbers into a second packed word, said first packed word having a bit length of P1 and said first and second multi-bit data numbers having bit lengths of N1 and N2, respectively, such that P1≥N1+N2, said second packed word having a bit length of P2 and said third and fourth multi-bit data numbers having bit lengths of N3 and N4, respectively, such that P2≥N3+N4;
- a single-instruction single-data arithmetic logic unit of bit length L1 for adding or subtracting with an unbroken carry chain contents of said first packed word in said first multi-bit register means and said second packed word to produce a third packed word of bit length P3, where L1≥P1, L1≥P2, and L1≥P3, functional relationships between each pair of functionally adjacent bit processors in said arithmetic logic unit being the same;
- means for directing said first packed word stored in said first register means and said second packed word to said arithmetic logic unit to generate said third packed word; and
- means for extracting first and second output numbers from said third packed word, said first output number having a bit length of N5 and said second output number having a bit length of N6, and P3≥N5+N6.
- 6. A single-instruction multiple-data arithmetic processor for performing a functional operation f() on a first number A to provide a first result X=f(A), and performing said functional operation f() on a second number B to provide a second result Y=f(B), said functional operation f() being a shift operation, said first number being an n-bit number and said second number being an m-bit number, comprising:
- means for doublevector production producing a q-bit doublevector C from said first number A and said second number B, where q≥m+n;
- a p-bit single-operation single-data arithmetic logic unit providing said functional operation f() on said doublevector C to provide an r-bit output doublevector Z=f(C), functional relationships between functionally adjacent pairs of bit processors in said arithmetic logic unit being the same, where p≥q and p≥r;
- means for number extraction from said output doublevector Z to provide said first result X and said second result Y, whereby said first number A is related to said first result X, and said second number B is related to said second result Y by a multiplication or division by a power of two, said first result X is an s-bit number and said second result Y is a t-bit number, and r≥s+t.
- 7. The processor of claim 6 wherein said functional operation f() is a left-shift operation.
- 8. The processor of claim 6 wherein n=m=s=t=16.
- 9. The processor of claim 6 wherein said doublevector C is generated from said first number A and said second number B according to the relation
- C=A*2^n+B.
- 10. The processor of claim 9 wherein said first result X and said second result Y are extracted from said output doublevector Z according to the relations
- Y=Z-((Z>>n)*2^n),
- and
- X=(Z-Y)/2^n.
- 11. The processor of claim 6 wherein said doublevector C is generated from said first number A and said second number B according to the relation
- C=(A<<n)|B,
- where "|" represents a bitwise OR operation.
- 12. The processor of claim 11 wherein said first result X and said second result Y are extracted from said output doublevector Z according to the relations
- Y=Z&(2^n-1),
- where "&" represents a bitwise AND operation, and
- X=Z>>n.
- 13. The processor of claim 6 wherein said functional operation f() is a right-shift operation.
- 14. The processor of claim 6 wherein said functional operation f() is a right shift and said doublevector C includes extra bits to the right of said first and second numbers A and B in said doublevector C so q>m+n.
- 15. The processor of claim 6 wherein said functional operation f() is a left shift and said doublevector C includes extra bits to the left of said first and second numbers A and B in said doublevector C so q>m+n.
- 16. An arithmetic processor for performing a functional operation f() on a first n-bit number A1 and a second n-bit number A2 to provide a first n-bit result X=f(A1, A2), and performing said functional operation f() on a third m-bit number B1 and a fourth m-bit number B2 to provide a second m-bit result Y=f(B1, B2), comprising:
- means for doublevector production, said means producing a first doublevector C1 from said first number A1 and said third number B1, and a second doublevector C2 from said second number A2 and said fourth number B2, said first and second doublevectors C1 and C2 having bit lengths of p and q, respectively, where p≥n+m and q≥m+n;
- a single-instruction single-data r-bit arithmetic logic unit providing said functional operation f() on said first doublevector C1 and said second doublevector C2 to provide an output doublevector Z=f(C1,C2) of bit length s, where said functional operation f() is an addition or subtraction with an unbroken carry chain, and r≥p, r≥q, and r≥s;
- means for number extraction from said output doublevector Z to provide said first result X and said second result Y, whereby said processor functions as a single-instruction multiple-data machine.
- 17. The processor of claim 16 wherein said functional operation f() is addition.
- 18. The processor of claim 16 wherein said functional operation f() is subtraction.
- 19. The processor of claim 16 wherein n=16 and m=16.
- 20. The processor of claim 16 wherein said first doublevector C1 is generated from said first number A1 and said third number B1 according to the relation
- C1=A1*2^n+B1,
- and said second doublevector C2 is generated from said second number A2 and said fourth number B2 according to the relation
- C2=A2*2^n+B2.
- 21. The processor of claim 20 wherein said first result X and said second result Y are extracted from said output doublevector Z according to the relations
- Y=Z-((Z>>n)*2^n),
- and
- X=(Z-Y)/2^n.
- 22. The processor of claim 16 wherein said first doublevector C1 is generated from said first number A1 and said third number B1 according to the relation
- C1=(A1<<n)|B1,
- and said second doublevector C2 is generated from said second number A2 and said fourth number B2 according to the relation
- C2=(A2<<n)|B2,
- where "|" represents a bitwise OR operation.
- 23. The processor of claim 22 wherein said first result X and said second result Y are extracted from said output doublevector Z according to the relations
- Y=Z&(2^n-1),
- where "&" represents a logical AND operation, and
- X=Z>>n.
- 24. The processor of claim 16 wherein said functional operation f() is an addition or subtraction, said first doublevector C1 includes extra bits to the left of said first and third numbers A1 and B1 so p>m+n, and said second doublevector C2 includes extra bits to the left of said second and fourth numbers A2 and B2 so q>m+n.
- 25. A data processor for computation of a transform on a first input data array X_1 of numbers of bit length N1 to generate a first output data array of numbers of bit length M1, and computation of said transform on a second input data array X_2 of numbers of bit length N2 to generate a second output data array of numbers of bit length M2, comprising:
- means for production of an input doublevector array Y of numbers of bit length P1 from said first data array X_1 and said second data array X_2, where P1≥N1+N2;
- means for computation of said transform on said input doublevector array Y in a single-instruction single-data L1-bit arithmetic logic unit to produce an output doublevector data array of numbers of bit length P2 by a series of arithmetic and Boolean operations, said transform utilizing additions and subtractions with unbroken carry chains, where L1≥P1, L1≥P2, and P2≥M1+M2; and
- means for extraction of said first and second output data arrays from said output doublevector data array.
- 26. The processor of claim 25 wherein said series of operations includes additions, subtractions and shifts, but includes no multiplications.
- 27. The processor of claim 26 wherein said transform is a Generalized Chen Transform.
- 28. The processor of claim 27 wherein said means for production produces said input doublevector Y according to
- Y=X_1*2^n+X_2.
- 29. The processor of claim 27 wherein said means for production produces said input doublevector Y according to
- Y=(X_1<<n)|X_2,
- where "|" represents a bitwise OR operation.
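As an informal illustration of the doublevector production, single-ALU operation, and number extraction recited in claims 6, 11, 12, 16, 22 and 23, the following Python sketch may be helpful. It is not taken from the specification. The function names `pack`, `unpack`, `alu_add`, `alu_sub` and `alu_shl` are hypothetical, the numbers are assumed to be unsigned with n=m=16, and a Python integer masked to 32 bits stands in for the arithmetic logic unit.

```python
N = 16                          # assumed lane width (n = m = 16)
LANE_MASK = (1 << N) - 1        # 2^n - 1
WORD_MASK = (1 << (2 * N)) - 1  # models a 32-bit ALU word (assumption)


def pack(high, low):
    """Build a doublevector C = (high << n) | low from two unsigned N-bit numbers."""
    if not (0 <= high <= LANE_MASK and 0 <= low <= LANE_MASK):
        raise ValueError("operands must be unsigned and fit in N bits")
    return (high << N) | low


def unpack(z):
    """Extract X = Z >> n (high-order portion) and Y = Z & (2^n - 1) (low-order portion)."""
    return z >> N, z & LANE_MASK


def alu_add(c1, c2):
    """One ordinary add over the whole word; the carry chain is not broken."""
    return (c1 + c2) & WORD_MASK


def alu_sub(c1, c2):
    """One ordinary subtract over the whole word; borrows also cross the lane boundary."""
    return (c1 - c2) & WORD_MASK


def alu_shl(c, k):
    """One ordinary left shift of the whole word by k bits."""
    return (c << k) & WORD_MASK


c1 = pack(1000, 200)
c2 = pack(234, 55)
print(unpack(alu_add(c1, c2)))             # (1234, 255)
print(unpack(alu_sub(c1, c2)))             # (766, 145)
print(unpack(alu_shl(pack(3, 5), 2)))      # (12, 20)

# A carry out of the low-order portion propagates into the high-order portion:
print(unpack(alu_add(pack(1, 0xFFFF), pack(1, 1))))  # (3, 0), not (2, 0)
```

The last case shows the consequence of the unbroken carry chain: the two results are independent only while the low-order result stays within its own bits, so in this sketch the operands must be bounded, or spare bits reserved in the packed word, for the extraction to return the intended values.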
CROSS-REFERENCE TO RELATED APPLICATIONS
This is a continuation of application Ser. No. 08/004,904, filed Jan. 21, 1993, now abandoned; which is a continuation-in-part of application Ser. No. 07/743,474, filed Aug. 9, 1991, now abandoned. The present application is related to U.S. Pat. No. 5,129,015, issued Jul. 7, 1992; to application Ser. No. 07/743,517, filed Aug. 9, 1991, now U.S. Pat. No. 5,319,724, which is a continuation-in-part of the aforementioned patent; and to application Ser. No. 07/811,468, filed Dec. 19, 1991, now U.S. Pat. No. 5,172,237, which is a continuation-in-part of the aforementioned application, all of which are entitled An Apparatus and Method for Compressing Still Images.
Continuations (1)
Parent: application Ser. No. 08/004,904, filed Jan. 1993
Continuation in Parts (1)
Parent: application Ser. No. 07/743,474, filed Aug. 1991