Method and system for selecting and using source operands in computer system instructions

Description

CROSS-REFERENCES TO RELATED APPLICATIONS

The following applications, including this one, are being filed concurrently, and the disclosures of the other applications are incorporated by reference into this application in their entirety for all purposes:

U.S. patent application Ser. No. 09/410,633, entitled “AN INTEGER INSTRUCTION SET ARCHITECTURE AND IMPLEMENTATION”;

U.S. patent application Ser. No. 09/690,340, entitled “A METHOD FOR LOADING AND STORING DATA IN A COMPUTER SYSTEM”;

U.S. patent application Ser. No. 09/411,600, entitled “A FLOATING POINT INSTRUCTION SET ARCHITECTURE AND IMPLEMENTATION”.

U.S. patent application Ser. No. 09/410,675, entitled “A METHOD FOR ENCODING COMPUTER INSTRUCTION DATA FIELD”.

BACKGROUND OF THE INVENTION

The present invention relates generally to computer instruction set architectures, and particularly to the setting of selected operand fields.

In the past decade RISC (Reduced Instruction Set Computer) architectures, in which each instruction is ideally performed in a single operational cycle, have become popular. RISC architecture computers present several advantages over standard architecture computers. For instance, RISC instruction sets are capable of much higher data processing speeds due to their ability to perform frequent operations in shorter periods of time. The RISC devices began with 16-bit instruction sets, and grew to 32-bit instruction set architectures.

Pipelining techniques have been used in conjunction with RISC architectures to increase data throughput. Pipelining brought the need for data dependency checking, where the output of one instruction is the expected input to a following instruction. In some cases, instructions are divided into monadic (single source) and dyadic (dual source) instructions, each having its own unique dependency logic.

In addition to the complexities introduced by pipelining, applications have also contributed to the increasing complexity of RISC architectures. Frequently used constants, such as zero, can be set in different places from different sources.

Thus there is a need for simplifying dependency logic without adding complexity to the hardware. In addition, there is a need to have a centralized, known source for zero to simplify the use of this frequently accessed constant.

SUMMARY OF THE INVENTION

According to the present invention, techniques for setting selected operand fields in pipelined architectures are provided. Methods and systems for efficiently selecting operand fields according to the present invention can be operative on a variety of computer architectures, including RISC architectures.

In a specific embodiment, the present invention provides a method for performing dependency checking on computer instructions in a pipeline of a computer system including determining if a first computer instruction has an opcode operating on only a first source operand. The computer instruction can have an opcode and a plurality of source operands, for example. Next, additional source operands can be replaced with the first source operand or the constant zero operand. Dependencies can be detected between the operands in the computer instruction and operands in other computer instructions in the pipeline. In a present embodiment, detecting can use the dyadic dependency checking for monadic instructions.

In another embodiment, the present invention provides a computer system for executing a computer instruction in a pipeline. The system can include a memory containing the computer instruction. The computer instruction can have a plurality of data fields, for example. A register that can return all zeros and a computer processor for executing the computer instruction stored in memory can also be part of the computer system. In a presently preferable embodiment, the register can be a 64-bit read only register, for example. The computer system can place one operand into the register while executing the computer instruction, for example.

Numerous advantages are provided by select embodiments according to the present invention. Embodiments can provide for setting selected operand fields in pipelined instructions for select computer architectures. In some embodiments, dependency checking for pipelined instructions can be provided. Many embodiments can be operable with RISC type computer architectures. Select embodiments can provide a standard hardware source for frequently used constant values and the like.

These and other advantages and features of the present invention will become apparent to those skilled in this art upon a reading of the following detailed description, which should be taken in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a simplified block diagram of a representative top level partitioning of a core in a specific embodiment of the present invention;

FIG. 2 illustrates a simplified block diagram of a representative Instruction Flow Unit in a specific embodiment of the present invention;

FIG. 3 illustrates a simplified diagram of a representative computer instruction format in a specific embodiment of the present invention;

FIG. 4A illustrates a simplified diagram of a representative pipeline with no data dependency between instructions in a specific embodiment according to the present invention;

FIG. 4B illustrates a simplified diagram of a representative pipeline with data dependencies in a specific embodiment of the present invention; and

FIG. 4C illustrates a simplified diagram of a representative pipeline with data dependencies causing a stall in a specific embodiment of the present invention.

DESCRIPTION OF THE SPECIFIC EMBODIMENTS

Embodiments according to the present invention can provide techniques for setting selected operand fields in pipelined architectures. Methods and systems for efficiently selecting operand fields according to the present invention can be operative on a variety of computer architectures, including RISC architectures.

In a specific embodiment, the present invention may be implemented in a CPU having a core unit which may include six units and a detachable Floating-Point Unit (FPU).

FIG. 1 illustrates a simplified block diagram of a representative top level partitioning of a core of the present invention. This diagram is merely an example which should not limit the scope of the claims herein. One of ordinary skill in the art would recognize many other variations, alternatives, and modifications. Table 1 describes some of the functions of the units illustrated in core 100 of FIG. 1.

TABLE 1

Unit	Acronym	Description
S5 Core 200	S5	Top level core block
Bus interface unit 205	BIU	Controls bus access to external modules such as peripheral modules and external memory interface.
Instruction Flow Unit 210	IFU	The front end of the CPU pipe: fetch, decode, issue & branch. Also contains mode B emulation.
Instruction multimedia unit 220	IMU	Handles all integer and multimedia instructions. The main CPU datapath.
Instruction cache Unit 230	ICU	Comprises the Instruction Cache and the Instruction Translation Lookaside Buffer (TLB)
Load Store Unit 240	LSU	Handles all memory instructions and Data cache control.
Data cache Unit 250	DCU	Comprises the Data Cache and the Data Translation Lookaside Buffer (TLB)
Floating Point Unit 265	FPU	Detachable Floating point unit (not shown in FIG. 1).

FIG. 2 illustrates a simplified block diagram of an Instruction Flow Unit (IFU) 210 in a specific embodiment of the present invention. This diagram is merely an example which should not limit the scope of the claims herein. One of ordinary skill in the art would recognize many other variations, alternatives, and modifications.

FIG. 2 illustrates instructions entering a Fetch Unit (FE) 242 from an Instruction Cache Unit (ICU) 130. A Decoder (DEC) 244 can identify logical locations of the source and destination operands. Logical locations can include general-purpose register, floating-point register, target address register, control register, embedded immediate constant, the PC, and the like. Decoder 244 can pass its identification information to a Pipeline Control Unit (PPC) 246 that can select the proper source operands from the instructions. The Pipeline Control Unit 246 can also monitor the execution of the instruction through other stages of the instruction pipeline. PPC 246 can ensure that instructions are executed smoothly and correctly, for example. Instructions may be held in the decode stage until all the source operands are ready or can be ready when needed for execution of the instruction. An Operand File (OF) 248 can comprise source registers, i.e., General Purpose Registers (GPRs). Further reference may be had to Appendix 1 for a detailed description of a specific embodiment of IFU 210.

FIG. 3 illustrates a simplified diagram of a representative example computer instruction format 260 in a specific embodiment of the present invention. This diagram is merely an example which should not limit the scope of the claims herein. One of ordinary skill in the art would recognize many other variations, alternatives, and modifications. Instruction format 260 is an example of a dyadic instruction including an opcode 262, a register source 1 264, a register source 2 268, and a destination register 270. Optionally, an extension 266 to opcode 262 and reserved bits 272 may be provided. In alternative embodiments, source 2 268 can be replaced by a 6-bit immediate address. Extension 266 and source 2 268 can be replaced with a 10-bit immediate address. Source 1 264, extension 266, and source 2 268 can be replaced with a 16-bit immediate address.
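The field widths implied by the immediate sizes above (6 bits for source 2, 10 bits for extension plus source 2, 16 bits for source 1 plus extension plus source 2) can be sketched as a simple pack and unpack routine. This is only an illustrative sketch: the 32-bit word size, the 6-bit opcode, the 6-bit destination, the 4-bit reserved field, the exact bit positions, and the names `FIELDS`, `encode`, `decode`, `imm10`, and `imm16` are all assumptions made for the example and are not taken from FIG. 3.

```python
# Hypothetical field layout for a 32-bit instruction word (bit 31 is the MSB).
# Only the 6/4/6-bit widths of source 1, extension, and source 2 follow from
# the immediate sizes in the text; every other number here is an assumption.
FIELDS = {            # name: (lowest bit, width)
    "opcode":   (26, 6),
    "src1":     (20, 6),
    "ext":      (16, 4),
    "src2":     (10, 6),
    "dest":     (4, 6),
    "reserved": (0, 4),
}

def encode(**values):
    """Pack named field values into one instruction word."""
    word = 0
    for name, value in values.items():
        low, width = FIELDS[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << low
    return word

def decode(word):
    """Unpack an instruction word into a dict of field values."""
    return {name: (word >> low) & ((1 << width) - 1)
            for name, (low, width) in FIELDS.items()}

def imm10(word):
    """Read extension and source 2 together as one 10-bit immediate."""
    return (word >> 10) & 0x3FF

def imm16(word):
    """Read source 1, extension, and source 2 together as one 16-bit immediate."""
    return (word >> 10) & 0xFFFF
```

Under this assumed layout, a 6-bit register specifier can name 64 registers, which is consistent with a register numbered 63 being the highest general purpose register.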

FIG. 3 also illustrates a general purpose register 63 (GPR63) 280 which can be a read-only register storing the value zero. Any of the source registers 264, 268 or the destination register 270 can be set to GPR63.

In a specific embodiment of the present invention, there are two general categories of instructions: the floating-point instructions (or FP instructions) and the remaining integer, multimedia, load/store, and flow-control instructions (or simply integer instructions). The former operate on floating-point registers, which do not have a constant register, while the latter operate on, among others, the general-purpose registers, which include a constant-zero register R63. For FP instructions, all unused 2nd source operand specifiers, i.e., the contents of the field in the instruction identifying the 2nd source register, may be encoded the same as the 1st source operand specifier so that generic dependency checking logic can be used to detect instruction dependencies without knowing whether the instruction is monadic or dyadic. For integer instructions, all unused 2nd source operand specifiers may be encoded as binary “63” (i.e., 111111). This is because R63, as a constant register, has no read-after-write dependency: there can be no writing into R63 and then reading the written value from it. Since this property is true for both monadic and dyadic instructions, forcing all unused 2nd operand specifiers to be encoded as 63 allows a generic dependency checker to be used on the integer side to check for read-after-write dependencies.
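The encoding rule above can be sketched in a few lines. The following is a minimal illustration, not the circuit of any embodiment: the names `Instr`, `make_monadic`, and `raw_hazard` are hypothetical, and the sketch assumes for simplicity that FP instructions touch only floating-point registers and integer instructions touch only general-purpose registers.

```python
from dataclasses import dataclass

R63 = 63  # constant-zero general-purpose register per the text

@dataclass
class Instr:
    is_fp: bool   # True for floating-point instructions
    src1: int     # first source register specifier
    src2: int     # second source register specifier (always populated)
    dest: int     # destination register specifier

def make_monadic(is_fp, src, dest):
    """Build a monadic instruction whose unused 2nd specifier is filled in.

    FP instructions repeat the first source; integer instructions use R63.
    """
    filler = src if is_fp else R63
    return Instr(is_fp, src, filler, dest)

def raw_hazard(producer, consumer):
    """Generic dyadic read-after-write check with no monadic special case."""
    if producer.is_fp != consumer.is_fp:
        return False   # assumption: separate register files in this sketch
    if not producer.is_fp and producer.dest == R63:
        return False   # a write to R63 never creates a dependency
    return producer.dest in (consumer.src1, consumer.src2)
```

Because the filler specifier either repeats a register that is already being checked or names a register that can never carry a dependency, comparing both source fields gives the same answer a dedicated monadic checker would give.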

In one representative example, general purpose register 63 (GPR63) is used by the instruction PTABS. The PTABS instruction, shown in Table 2, gives a target address specified by the source register Rn. The target address is stored in the target address register TRa. The reserved bits 20-25 may be implemented as “111111” or 63. Thus the hardware for a dyadic dependency checker for read-after-write dependencies may be used on PTABS, a monadic instruction.

TABLE 2

In another representative example, general purpose register 63 (GPR63) is used by the instruction GETTR. This instruction sign-extends a 32-bit target register (TR) into a 64-bit value. Table 3 illustrates a format for a GETTR instruction in a particular embodiment according to the present invention. Execution of a GETTR instruction moves the value held in a target address register TRb into a general register Rd. The value returned by GETTR ensures that any unimplemented high-order bits of the source target register are seen as sign extensions of the highest implemented bit. Table 3 illustrates a machine code representation of the instruction, followed by an assembly language mnemonic. Next is shown the functional algorithm, which may be implemented in software, hardware, or both.

TABLE 3

Table 4 illustrates an implementation of the GETTR instruction described in Table 3. In the specific embodiment of Table 4, the GETTR instruction is implemented as an ADD.L with the 2nd operand (Rn=63) being 0 to get the sign-extension. During execution of the ADD.L instruction, the low 32 bits of Rm are added to the low 32 bits of Rn. The sign-extended 32-bit result can be stored in Rd. Thus by having the 2nd source operand specifier, Rn, encoded as 63, the 32-bit sign-extension operation for a monadic instruction, such as GETTR, can be implemented with the same circuit that implements the addition then sign-extension operation for a dyadic instruction, such as ADD.L.

TABLE 4
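The reuse of the add-then-sign-extend datapath for GETTR can be sketched as follows. This is only a software model under stated assumptions: 64-bit register values are represented as non-negative Python integers, and the function names `sign_extend32`, `add_l`, and `gettr` are hypothetical labels for the example rather than names from the tables.

```python
MASK32 = 0xFFFFFFFF

def sign_extend32(value):
    """Treat the low 32 bits of value as signed and widen them to 64 bits."""
    value &= MASK32
    if value & 0x80000000:
        value |= 0xFFFFFFFF00000000
    return value

def add_l(rm, rn):
    """ADD.L as described: add the low 32 bits, then sign-extend the 32-bit sum."""
    return sign_extend32((rm & MASK32) + (rn & MASK32))

def gettr(target_register_value):
    """GETTR modeled as ADD.L whose second operand is read from R63 (zero)."""
    r63 = 0
    return add_l(target_register_value, r63)
```

Adding zero leaves the low 32 bits unchanged, so the only visible effect of the shared circuit is the sign extension that GETTR requires.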

In some embodiments, GPR63 can be a read-only register always having all zeroes stored in it. During hazard detection, PPC 246 can check if the current instruction has R63 as a destination. If this is the case, PPC 246 marks the instruction as having a non-valid destination. In this way, subsequent instructions never find hazards on register 63, and its value is always read from the register file. To explain further: since R63 is a constant register, designating R63 as the target register can neither (i) change the value of R63 nor (ii) cause dependencies for subsequent instructions that use R63 as a source operand. The specific embodiment of the architecture takes advantage of this property and uses it to provide a prefetch hint to the cache. For example, when a Load instruction is decoded, the IFU checks if the target register is R63 (this logic already exists for the dependency checking). If the target register is R63, the LSU is informed that this is a cache hint and that the result will not be used, so the LSU does not need to stall the pipeline if the operand cannot be found in the cache. In addition, the LSU will not raise any exception if the load address is bad. Since R63 cannot be written into, the load (into R63) will proceed down the pipeline like a NOP, except that the cache control is informed to load the operand into the cache if it is not already there.
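The decode-time treatment of R63 as a destination can be sketched as a small decision function. This is a hedged illustration only: `DecodeInfo`, `classify`, `lsu_action`, and the action strings are hypothetical names invented for the example, and the behavior of ordinary loads (stall on a miss, exception on a bad address) is an assumed baseline for contrast rather than something the text specifies.

```python
from dataclasses import dataclass

R63 = 63  # constant-zero register; writes to it are discarded

@dataclass
class DecodeInfo:
    dest_valid: bool     # False: later instructions never see a hazard on dest
    prefetch_hint: bool  # True: the LSU treats the load as a cache hint

def classify(is_load, dest):
    """Decide how an instruction's destination is treated at decode."""
    if dest != R63:
        return DecodeInfo(dest_valid=True, prefetch_hint=False)
    return DecodeInfo(dest_valid=False, prefetch_hint=is_load)

def lsu_action(info, cache_hit, address_ok):
    """Return what the load/store unit does for a load in this sketch."""
    if info.prefetch_hint:
        if not address_ok:
            return "ignore"                 # no exception for a bad address
        return "none" if cache_hit else "fill-cache"   # never stalls
    if not address_ok:
        return "exception"                  # assumed baseline behavior
    return "deliver" if cache_hit else "stall"
```

The same comparison against 63 serves both purposes: it suppresses hazard tracking for the destination and, for loads, selects the prefetch path.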

FIG. 4A shows a pipeline with no data dependency between instructions op1 310, op2 314, and op3 318. This diagram is merely an example which should not limit the scope of the claims herein. One of ordinary skill in the art would recognize many other variations, alternatives, and modifications.

FIG. 4A illustrates instruction op1 310 having an opcode, op1, followed by R1, a position of the first source register (264 in FIG. 3), R2, the position of the second source register (268 in FIG. 3), and R3, the position of the destination register (270 in FIG. 3). The execution timeline 312 illustrates the execution cycle for instruction 310 having op1. Execution timeline 312 has a decode stage, D, and a write stage, W. There are three execution stages in 312, E1, E2, and E3. The next instruction 314 is executed as shown by execution timeline 316. Comparing execution timeline 316 with execution timeline 312, it can be seen that instruction 314 can be decoded, D of 316, while instruction 310 is in execution stage E1 of 312. The arrangement of decode stages (D) in execution timelines 312, 316 and 320 illustrates that a new instruction can be decoded in times i, i+1, and i+2, where each vertical arrangement of blocks in execution timelines 312, 316 and 320 represents the same time cycle. For example at time “i+1” 322, execution cycle 312 is in the E1 stage and execution cycle 316 is in the D stage.
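The overlap of the execution timelines in FIG. 4A can be reproduced with a short sketch. It assumes the five stages named above in the order D, E1, E2, E3, W, takes time i as cycle 0, and assumes each instruction advances one stage per cycle; `STAGES`, `stage_at`, and `timeline` are hypothetical names used only for this illustration.

```python
STAGES = ["D", "E1", "E2", "E3", "W"]

def stage_at(instr_index, cycle):
    """Stage occupied by instruction number instr_index at a given cycle.

    Instruction k is decoded in cycle i+k (with i taken as 0 here) and
    advances one stage per cycle; None means it is not in the pipeline.
    """
    offset = cycle - instr_index
    if 0 <= offset < len(STAGES):
        return STAGES[offset]
    return None

def timeline(num_instrs, num_cycles):
    """Rows of stage names, one row per instruction, one column per cycle."""
    return [[stage_at(k, c) or "--" for c in range(num_cycles)]
            for k in range(num_instrs)]

if __name__ == "__main__":
    for row in timeline(3, 7):
        print(" ".join(f"{s:>2}" for s in row))
```

Reading one column of the printed grid corresponds to reading one vertical arrangement of blocks in the figure: in the column for cycle 1, the first instruction is in E1 while the second is in D.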

FIG. 4B illustrates a simplified diagram of a representative pipeline with data dependencies in a specific embodiment of the present invention. This diagram is merely an example which should not limit the scope of the claims herein. One of ordinary skill in the art would recognize many other variations, alternatives, and modifications.

FIG. 4B illustrates a first instruction 330 having an opcode op4 and a destination register R3. Destination register R3 is used as a source in subsequent instruction 334, having opcode op5. If the destination data can be produced in one cycle for instruction 330 (op4), then execution timeline 332 shows that at E1, the data is available to the decode cycle, D, of execution timeline 336 for subsequent instruction 334. Similarly, the result of instruction 334 (op5) is produced by execution stage E1 depicted by execution timeline 336. This result is passed to decode stage D of instruction 338 (op6) illustrated by execution timeline 340.

FIG. 4C illustrates a simplified diagram of a representative pipeline with data dependencies causing a stall in a specific embodiment of the present invention. This diagram is merely an example which should not limit the scope of the claims herein. One of ordinary skill in the art would recognize many other variations, alternatives, and modifications. In this example, instruction 350 (op7) takes two execution cycles E1, E2 after decode stage D, as illustrated by execution timeline 352. The R3 result of instruction 350 is used as an input to instruction 354 (op8). Therefore, the PPC “stalls” one cycle 358 in execution timeline 356, in order to obtain the correct value of R3 from processing of instruction 350 (op7). As instruction 360 (op6) uses result R6 of instruction 354 (op8), processing of this instruction is also delayed, as indicated by execution timeline 362. Thus, in order to prevent a hazard, there may be a data dependency check between the result register R3 of instruction 350 (op7) and the two input source registers R3 and R5 of instruction 354 (op8). Both sources from instruction 354 (op8) may need to be checked, as both source values are required to compute the result R6 of the instruction.
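The difference between the no-stall case of FIG. 4B and the one-cycle stall of FIG. 4C can be sketched with a small timing model. This is an assumption-laden sketch rather than a description of the PPC: it assumes in-order decode, assumes that a consumer's final decode cycle may coincide with the producer's last needed execution cycle, and uses the hypothetical name `decode_cycles`. Register names other than R3, R5, and R6 are invented for the example.

```python
def decode_cycles(instrs):
    """Compute the cycle in which each instruction completes decode.

    instrs is a list of (dest, sources, latency) tuples in program order,
    where latency is the number of execution cycles needed before the
    result can be passed to a dependent instruction's decode stage.
    """
    ready = {}      # register -> earliest cycle a consumer may finish decode
    cycles = []
    next_slot = 0   # earliest cycle allowed by in-order decode
    for dest, sources, latency in instrs:
        # Both source registers are checked, as in a dyadic checker.
        d = max([next_slot] + [ready.get(r, 0) for r in sources])
        cycles.append(d)
        ready[dest] = d + latency
        next_slot = d + 1
    return cycles

# FIG. 4B style: every result is available after one execution cycle.
no_stall = decode_cycles([
    ("R3", ["R1", "R2"], 1),
    ("R6", ["R3", "R5"], 1),
    ("R9", ["R6", "R7"], 1),
])

# FIG. 4C style: the first instruction needs two execution cycles.
with_stall = decode_cycles([
    ("R3", ["R1", "R2"], 2),
    ("R6", ["R3", "R5"], 1),
    ("R9", ["R6", "R7"], 1),
])
```

In this model the second and third instructions of the stalled sequence each finish decode one cycle later than in the sequence without a stall, which mirrors the delay of both dependent instructions described above.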

In a specific embodiment of the present invention, monadic instructions may have a format such as format 260 of FIG. 3, with one source register 264 or 268 left unused. If the unused source register were set to be equal to the used source register, then the dependency checking illustrated by the pipeline diagram of FIG. 4C could be used. There would not be a need for a separate monadic dependency checking circuit.

An example of a monadic instruction with a dyadic format in a specific embodiment of the present invention is the FABS.D instruction. Table 5 illustrates instruction FABS.D, which computes the absolute value of a double-precision floating-point number. It reads the value of DRg, clears its sign bit and stores the result in DRf. The second source register is represented by reserved bits 10-15 or “r1.”

TABLE 5

Thus setting “r1” in the above instruction to the value of the used source register, DRg, in FABS.D, would allow use of the dyadic dependency checking as described hereinabove with reference to FIG. 4C.

Another advantage of the replicated source operand in a monadic instruction may be that this gives more flexibility to the design. In a typical design there will be source operand buses that pass through the pipeline. The replicated source operand allows that source to be accessed on either the source 1 or source 2 bus to best suit the design. This leads to more flexibility in the physical layout of the design (it may be more physically convenient to take the operand from one bus rather than the other). Also, if the two buses are not equally utilized (i.e., one is loaded more heavily than the other), then the replicated source operand allows the less loaded bus to be used, hence equalizing their loading.

CONCLUSION

In the foregoing specification, the invention has been described with reference to specific exemplary embodiments thereof. Other embodiments will be apparent to those of ordinary skill in the art. For example, the instructions may be 16, 64, or 128 bits or more in length, there may be three source operands of which only one is used (hence copied into the other operands), or the pipeline may contain more or fewer than three stages. Thus it is evident that various modifications and changes may be made thereunto without departing from the broader spirit and scope of the invention as set forth in the appended claims and their full scope of equivalents.

Claims

1. A method for performing dependency checking on instructions in a pipeline of a computer system, said pipeline containing a first instruction and subsequent instructions, said instructions comprising an opcode, a first source operand and a second source operand, said instructions comprising monadic instructions and dyadic instructions, said monadic instructions having an opcode that operates only on the first source operand, said dyadic instructions having an opcode that operates on the first source operand and the second source operand, said method comprising: determining if said first instruction comprises a monadic instruction; if said first instruction comprises a monadic instruction, replacing said second source operand with a token; and detecting any dependencies between operands in said first instruction and operands in said subsequent instructions in said pipeline.
2. The method of claim 1 wherein said detecting further comprises: performing dyadic dependency checking for said monadic instructions and said dyadic instructions.
3. The method of claim 1 wherein said token further comprises said first operand.
4. The method of claim 1 further comprising: detecting whether said first instruction comprises a floating point instruction; if said first instruction comprises a floating point instruction, using said first operand as said token; otherwise using an integer value as said token.
5. The method of claim 4 further comprising: detecting if a third operand comprises an integer value; and if a third operand comprises an integer value, providing a pre-fetch signal to the cache.
6. The method of claim 1 wherein said instructions comprise RISC instructions.
7. A computer system for executing instructions, said instructions comprising a first instruction and subsequent instructions, said instructions comprising an opcode, a first operand and subsequent operands, said instructions comprising monadic instructions and dyadic instructions, said monadic instructions having an opcode that operates only on a first operand, said dyadic instructions having an opcode that operates on a first operand and a second operand, said system comprising: a memory, said memory holding said instructions; a processor, said processor operative to execute said instructions; a pipeline, said pipeline unit operative to control processing by said processor of said instructions retrieved from said memory; wherein said pipeline unit is operatively disposed to: determine if said first instruction comprises a monadic instruction; if said first instruction comprises a monadic instruction, replace said subsequent source operands with a token; and detect any dependencies between operands in said first instruction and operands in said subsequent instructions in said pipeline.
8. The system of claim 7 wherein said detecting further comprises: performing dyadic dependency checking for said monadic instructions and said dyadic instructions.
9. The system of claim 7 wherein said token further comprises said first operand.
10. The system of claim 7, wherein said pipeline unit is further operative to: detect whether said first instruction comprises a floating point instruction; if said first instruction comprises a floating point instruction, use said subsequent source operands as said token; otherwise use an integer value as said token.
11. The system of claim 7 wherein said instructions comprise RISC instructions.
12. The system of claim 7 further comprising: a 64-bit register which is read-only and returns all zeros.
13. The system of claim 7 wherein said processor is a 64-bit computer processor for executing the instruction stored in memory; wherein said executing comprises one data field using said 64-bit register.
14. The computer system of claim 12 wherein said pipeline unit marks said 64-bit register if said 64-bit register is a destination register in said instruction.
15. A method for performing dependency checking on computer instructions in a pipeline of a computer system comprising: determining if a first computer instruction, comprising an opcode and a plurality of source operands, has the opcode operating on only a first source operand; replacing an unused operand with the first source operand; and detecting any dependencies between the operands in the first computer instruction and operands in another computer instruction in the pipeline by performing dyadic dependency checking using the unused operand and first source operand.
16. A method for performing dependency checking on computer instructions in a pipeline of a computer system comprising: determining if a first computer instruction, comprising an opcode and a plurality of source operands, has the opcode operating on only a first source operand; replacing an unused operand with a special operand, wherein the special operand references a read-only register comprising zeros; and detecting any dependencies between the operands in the first computer instruction and operands in another computer instruction in the pipeline by performing dyadic dependency checking using the unused operand and first source operand.

