The present invention relates generally to processor instruction sets and, more particularly, to an instruction set for processing micro-controller type instructions and digital signal processor instructions from a single instruction stream.
Processors, including microprocessors, digital signal processors and microcontrollers, operate by running software programs that are embodied in one or more series of instructions stored in a memory. The processors run the software by fetching the instructions from the series of instructions, decoding the instructions and executing them.
In addition to program instructions, data is also stored in memory that is accessible by the processor. Generally, the program instructions process data by accessing data in memory, modifying the data and storing the modified data into memory.
The instructions themselves also control the sequence of functions that the processor performs and the order in which the processor fetches and executes the instructions. For example, the order for fetching and executing each instruction may be inherent in the order of the instructions within the series. Alternatively, instructions such as branch instructions, conditional branch instructions, subroutine calls and other flow control instructions may cause instructions to be fetched and executed out of the inherent order of the instruction series.
The program instructions that comprise a software program are taken from an instruction set that is designed for each processor. The instruction set includes a plurality of instructions, each of which specifies operations of one or more functional components of the processor. The instructions are decoded in an instruction decoder which generates control signals distributed to the functional components of the processor to perform the operation(s) specified in the instruction.
The instruction set itself, in terms of breadth, flexibility and simplicity, dictates the ease with which programmers may generate programs. The instruction set also reflects the processor architecture and accordingly the functional and performance capability of the processor.
There is a need for a processor and an instruction set that includes a robust and efficient set of instructions for a wide variety of applications. Given the rapid growth of digital signal processing (DSP) applications, there is a further need for an instruction set that incorporates DSP type instructions and micro-controller type instructions. There is a further need to provide a processor having a tightly coupled DSP engine and a microcontroller arithmetic logic unit (ALU) for many types of applications conventionally handled separately by either a microcontroller or a digital signal processor, including motor control, soft modems, automotive body computers, speech recognition, echo cancellation and fingerprint recognition.
According to embodiments of the present invention, an instruction set is provided that features ninety-four instructions and eleven address modes to deliver a mixture of flexible micro-controller-like instructions and specialized digital signal processor (DSP) instructions that execute from a single instruction stream.
According to an embodiment of the present invention, a processor executes instructions within the designated instruction set. The processor includes a program memory, a program counter, registers and at least one execution unit. The program memory stores program instructions, including instructions from the designated instruction set. The program counter determines the current instruction for processing. The registers store operand data specified by the program instructions and the execution unit(s) execute the current instruction. The execution unit may include a DSP engine and arithmetic logic unit. Each designated instruction is identified to the processor by designated encoding and to programmers by a designated mnemonic.
A more complete understanding of the present disclosure and advantages thereof may be acquired by referring to the following description taken in conjunction with the accompanying drawings, wherein:
a is a block diagram illustrating a stack pointer at initialization according to an embodiment of the present disclosure.
b is a block diagram illustrating a stack pointer after a PUSH operation according to an embodiment of the present disclosure.
c is a block diagram illustrating a stack pointer after a PUSH operation according to an embodiment of the present disclosure.
d is a block diagram illustrating a stack pointer after a POP operation according to an embodiment of the present disclosure.
While the present invention is susceptible to various modifications and alternative forms, specific exemplary embodiments thereof have been shown by way of example in the drawings and are herein described in detail. It should be understood, however, that the description herein of specific embodiments is not intended to limit the invention to the particular forms disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the appended claims.
In order to describe the instruction set and its relationship to a processor for executing the instruction set, an overview of pertinent processor elements is first presented with reference to
Overview of Processor Elements
The processor 100 includes a program memory 105, an instruction fetch/decode unit 110, instruction execution units 115, data memory and registers 120, peripherals 125, data I/O 130, and a program counter and loop control unit 135. The bus 150, which may include one or more common buses, communicates data between the units as shown.
The program memory 105 stores software embodied in program instructions for execution by the processor 100. The program memory 105 may comprise any type of nonvolatile memory such as a read only memory (ROM), a programmable read only memory (PROM), an electrically programmable or an electrically programmable and erasable read only memory (EPROM or EEPROM) or flash memory. In addition, the program memory 105 may be supplemented with external nonvolatile memory 145 as shown to increase the complexity of software available to the processor 100. Alternatively, the program memory may be volatile memory which receives program instructions from, for example, an external non-volatile memory 145. When the program memory 105 is nonvolatile memory, the program memory may be programmed at the time of manufacturing the processor 100 or prior to or during implementation of the processor 100 within a system. In the latter scenario, the processor 100 may be programmed through a process called in-circuit serial programming.
The instruction fetch/decode unit 110 is coupled to the program memory 105, the instruction execution units 115 and the data memory 120. Coupled to the program memory 105 and the bus 150 is the program counter and loop control unit 135. The instruction fetch/decode unit 110 fetches the instructions from the program memory 105 specified by the address value contained in the program counter 135. The instruction fetch/decode unit 110 then decodes the fetched instructions and sends the decoded instructions to the appropriate execution unit 115. The instruction fetch/decode unit 110 may also send operand information including addresses of data to the data memory 120 and to functional elements that access the registers.
The program counter and loop control unit 135 includes a program counter register (not shown) which stores an address of the next instruction to be fetched. During normal instruction processing, the program counter register may be incremented to cause sequential instructions to be fetched. Alternatively, the program counter value may be altered by loading a new value into it via the bus 150. The new value may be derived based on decoding and executing a flow control instruction such as, for example, a branch instruction. In addition, the loop control portion of the program counter and loop control unit 135 may be used to provide repeat instruction processing and repeat loop control as further described below.
The instruction execution units 115 receive the decoded instructions from the instruction fetch/decode unit 110 and thereafter execute the decoded instructions. As part of this process, the execution units may retrieve one or two operands via the bus 150 and store the result into a register or memory location within the data memory 120. The execution units may include an arithmetic logic unit (ALU) such as those typically found in a microcontroller. The execution units may also include a digital signal processing engine, a floating point processor, an integer processor or any other convenient execution unit. A preferred embodiment of the execution units and their interaction with the bus 150, which may include one or more buses, is presented in more detail below with reference to
The data memory and registers 120 are volatile memory and are used to store data used and generated by the execution units. The data memory 120 and program memory 105 are preferably separate memories for storing data and program instructions respectively. This format is known generally as a Harvard architecture. It is noted, however, that according to the present invention, the architecture may be a Von Neumann architecture or a modified Harvard architecture which permits the use of some program space for data space. A dotted line is shown, for example, connecting the program memory 105 to the bus 150. This path may include logic for aligning data reads from program space such as, for example, during table reads from program space to data memory 120.
Referring again to
The data I/O unit 130 may include transceivers and other logic for interfacing with the external devices/systems 140. The data I/O unit 130 may further include functionality to permit in-circuit serial programming of the program memory through the data I/O unit 130.
The W registers 240 are general purpose address and/or data registers. The DSP engine 230 is coupled to both the X and Y memory buses and to the W registers 240. The DSP engine 230 may simultaneously fetch data from each of the X and Y memories, execute instructions which operate on the simultaneously fetched data, write the result to an accumulator (not shown) and write a prior result to X or Y memory or to the W registers 240 within a single processor cycle.
In one embodiment, the ALU 270 may be coupled only to the X memory bus and may only fetch data from the X bus. However, the X and Y memories 210 and 220 may be addressed as a single memory space by the X address generator in order to make the data memory segregation transparent to the ALU 270. The memory locations within the X and Y memories may be addressed by values stored in the W registers 240.
Any processor clocking scheme may be implemented for fetching and executing instructions. A specific example follows, however, to illustrate an embodiment of the present invention. Each instruction cycle is comprised of four Q clock cycles Q1-Q4. The four phase Q cycles provide timing signals to coordinate the decode, read, process data and write data portions of each instruction cycle.
According to one embodiment of the processor 100, the processor 100 concurrently performs two operations—it fetches the next instruction and executes the present instruction. Accordingly, the two processes occur simultaneously. The following sequence of events may comprise, for example, the fetch instruction cycle:
The following sequence of events may comprise, for example, the execute instruction cycle for a single operand instruction:
The following sequence of events may comprise, for example, the execute instruction cycle for a dual operand instruction using a data pre-fetch mechanism. These instructions pre-fetch the dual operands simultaneously from the X and Y data memories and store them into registers specified in the instruction. They simultaneously allow instruction execution on the operands fetched during the previous cycle.
The multiplier 300 has inputs coupled to the W registers 240 and an output coupled to the input of a multiplexer 305. The multiplier 300 may also have inputs coupled to the X and Y bus. The multiplier may be any size; however, for convenience, a 16×16 bit multiplier is described herein which produces a 32 bit output result. The multiplier may be capable of signed and unsigned operation and can multiplex its output using a scaler to support either fractional or integer results.
The output of the multiplier 300 is coupled to one input of a multiplexer 305. The multiplexer 305 has another input coupled to zero backfill logic 310, which is coupled to the X Bus. The zero backfill logic 310 is included to illustrate that 16 zeros may be concatenated onto the 16 bit data read from the X bus to produce a 32 bit result fed into the multiplexer 305. The 16 zeros are generally concatenated into the least significant bit positions.
The multiplexer 305 includes a control signal controlled by the instruction decoder of the processor which determines which input, either the multiplier output or a value from the X bus is passed forward. For instructions such as multiply and accumulate (MAC), the output of the multiplier is selected. For other instructions such as shift instructions, the value from the X bus (via the zero backfill logic) may be selected. The output of the multiplexer 305 is fed into the sign extend unit 315.
The sign extend unit 315 sign extends the output of the multiplexer from a 32 bit value to a 40 bit value. The sign extend unit 315 is illustrative only and this function may be implemented in a variety of ways. The sign extend unit 315 outputs a 40 bit value to a multiplexer 320.
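By way of illustration only, the zero backfill and sign extension steps described above may be sketched in Python as follows. This is a minimal software model, not the hardware implementation; the function names are hypothetical and do not appear in the instruction set.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def zero_backfill(x_bus_word):
    """Concatenate 16 zeros below a 16-bit X bus value to form a 32-bit value."""
    return ((x_bus_word & MASK16) << 16) & MASK32


def sign_extend_32_to_40(value32):
    """Copy bit 31 into bits 39..32 and return the 40-bit bit pattern."""
    value32 &= MASK32
    if value32 & 0x80000000:
        return value32 | 0xFF00000000
    return value32
```

For example, a negative X bus word such as 0x8001 is first backfilled into a 32 bit value and then extended so that the eight upper bits repeat the sign.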
The multiplexer 320 receives inputs from the sign extend unit 315 and the accumulators 345 and 350. The multiplexer 320 selectively outputs values to the input of a barrel shifter 330 based on control signals derived from the decoded instruction. The accumulators 345 and 350 may be any length. According to the embodiment of the present invention selected for illustration, the accumulators are 40 bits in length. A multiplexer 360 determines which accumulator 345 or 350 is output to the multiplexer 320 and to the input of an adder 340.
The instruction decoder sends control signals to the multiplexers 320 and 360, based on the decoded instruction. The control signals determine which accumulator is selected for either an add operation or a shift operation and whether a value from the multiplier or the X bus is selected for an add operation or a shift operation.
The barrel shifter 330 performs shift operations on values received via the multiplexer 320. The barrel shifter may perform arithmetic and logical left and right shifts and circular shifts where bits rotated out one side of the shifter reenter through the opposite side of the shifter. In the illustrated embodiment, the barrel shifter is 40 bits in length and may perform a 15 bit arithmetic right shift and a 16 bit left shift in a single cycle. The shifter uses a signed binary value to determine both the magnitude and the direction of the shift operation. The signed binary value may come from a decoded instruction, such as a shift instruction or a multi-precision shift instruction. According to one embodiment of the invention, a positive signed binary value produces a right shift and a negative signed binary value produces a left shift.
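The signed shift convention may be sketched as follows, by way of illustration only. The sketch models only arithmetic shifts on a 40 bit value and does not enforce the single cycle shift limits of the illustrated embodiment; the function names are hypothetical.

```python
MASK40 = (1 << 40) - 1


def to_signed40(bits):
    """Interpret a 40-bit pattern as a two's complement integer."""
    bits &= MASK40
    return bits - (1 << 40) if bits & (1 << 39) else bits


def barrel_shift(value40, shift):
    """Arithmetic shift: a positive shift value shifts right, a negative one shifts left."""
    signed = to_signed40(value40)
    if shift >= 0:
        result = signed >> shift       # Python's >> on integers preserves the sign
    else:
        result = signed << (-shift)
    return result & MASK40
```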
The output of the barrel shifter 330 is sent to the multiplexer 355 and the multiplexer 370. The multiplexer 355 also receives inputs from the accumulators 345 and 350. The multiplexer 355 operates under control of the instruction decoder to selectively apply the value from one of the accumulators or the barrel shifter to the adder/subtractor 340 and the round and saturate logic 365.
The adder/subtractor 340 may select either accumulator 345 or 350 as a source and/or a destination. In the illustrated embodiment, the adder/subtractor 340 has 40 bits. The adder receives an accumulator input and an input from another source such as the barrel shifter 330, the X bus or the multiplier. The value from the barrel shifter 330 may come from the multiplier or the X bus and may be scaled in the barrel shifter prior to its arrival at the other input of the adder/subtractor 340. The adder/subtractor 340 adds to or subtracts a value from the accumulator and stores the result back into one of the accumulators. In this manner values in the accumulators represent the accumulation of results from a series of arithmetic operations. The round and saturate logic 365 is used to round 40 bit values from the accumulator or the barrel shifter down to 16 bit values that may be transmitted over the X bus for storage into a W register or data memory. The round and saturate logic has an output coupled to a multiplexer 370. The multiplexer 370 may be used to select either the output of the round and saturate logic 365 or the output from a selected 16 bits of the barrel shifter 330 for output to the X bus.
Description of the Instruction Set
The designated instruction set according to the present invention is set forth in the following tables, in which the instructions are listed in alphabetical order using mnemonics. The mnemonics are merely illustrative, and one of ordinary skill in the art will understand that alternate mnemonics may be used to achieve the same result. The designated instruction set and descriptions of each designated instruction are presented in the following tables. To simplify the definition, each variant of an instruction is given a different “PLA mnemonic.” The detailed definitions of the instructions are listed by the PLA mnemonic in each table, which lists the illustrative assembly syntax of each mnemonic, gives examples of usage of that syntax and gives the PLA mnemonic. Symbols used in the definitions of the tables of the instruction set are defined in Table 1.
(Wnd)
(Wn)<7:0>
(Wn)<3:0>
Instruction Operation Details
An explanation of the instruction operation details is enhanced by reference to several figures, specifically
Implied W Register Utilization
Certain W registers have implied utilization in the instruction set. W0-W3 are used as the operands for DSP instructions. W4-W7 are used as the prefetch addresses for DSP instructions. W14 is the frame pointer utilized by the LNK and ULNK instructions. W15 acts as the stack pointer.
Default Ww
W0 serves as the default Ww register for file register instructions. In this capacity, Ww acts as the W register in C16 and C18 compatible instructions.
Byte Operations
When a byte is moved into a W register, the byte is written into the LSbyte of the register and the MSbyte is left alone. Byte operations on the registers will operate on the LSbyte of the register. The MSbyte of the register is left alone. For byte operations, the status flags will be adjusted to respond to the <7:0> bits of the register. For example, the carry bit will originate from ALU<7>. When a byte is moved from a W register, the source is the LSbyte and it overwrites the target byte in the memory. Other bytes are not affected.
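By way of illustration only, the byte behavior described above may be modeled as follows. The helper names are hypothetical; the sketch shows a byte write and a byte add on a 16 bit W register value, with the carry taken from bit 7.

```python
def write_byte_to_w(w_value, byte):
    """Write a byte into the LSbyte of a 16-bit W register; the MSbyte is left alone."""
    return (w_value & 0xFF00) | (byte & 0xFF)


def add_byte(w_value, byte):
    """Byte-mode add: the result goes to the LSbyte and the carry originates from bit 7."""
    total = (w_value & 0xFF) + (byte & 0xFF)
    carry = 1 if total > 0xFF else 0
    return (w_value & 0xFF00) | (total & 0xFF), carry
```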
Byte Operations in Bit Instructions—W Registers
The bit operation instructions that use the W registers can address bytes or words without the requirement for a B bit. These instructions include BCLR, BSET, BSW.C, BSW.Z, BTG, BTST.C, BTST.Z, BTSTS.C and BTSTS.Z. This works by making the bit field selection look at the LSB of the address of the word or byte being addressed by the W register. If that address LSB is one, the LSB is zeroed and the MSB of the bit selection field is set.
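The address adjustment may be sketched as follows, by way of illustration only. The sketch assumes a 4 bit selection field whose MSB is bit 3, and assumes that an odd byte address refers to the upper byte of the word; neither assumption is stated explicitly above, and the function name is hypothetical.

```python
def resolve_bit_address(address, bit_select):
    """Map a byte or word address plus a bit number onto a word-aligned
    address and a bit number in the range 0..15."""
    if address & 1:            # odd address: the byte sits in the upper half of the word
        address &= ~1          # zero the address LSB
        bit_select |= 0x8      # set the MSB of the (assumed 4-bit) bit selection field
    return address, bit_select & 0xF
```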
The instructions that have 10-bit literals have byte and word modes. For byte instructions, the literal is truncated at 8 bits. If the user specifies a signed value {−128 . . . −1}, the truncated 2's complement is coded. Unsigned values may range from {0 . . . 255}. For word instructions, the literal is sign extended to 16 bits.
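By way of illustration only, the two literal modes may be modeled as follows. The function names are hypothetical, and the range check is an assumption about how an assembler might reject out-of-range byte literals.

```python
def encode_byte_literal(value):
    """Byte mode: accept -128..255 and keep the low 8 bits (truncated two's complement)."""
    if not -128 <= value <= 255:
        raise ValueError("byte literal out of range")
    return value & 0xFF


def extend_word_literal(lit10):
    """Word mode: sign extend a 10-bit literal to a 16-bit pattern."""
    lit10 &= 0x3FF
    if lit10 & 0x200:          # bit 9 is the sign bit of the 10-bit literal
        return lit10 | 0xFC00
    return lit10
```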
Program Memory Addressing
Program memory contains a user space and a test space. The most significant bit (PMA<23>) of the program memory address selects user/test space. The least significant bit (PMA<0>) selects a byte for data addressing and table addressing modes.
Program memory addresses coded into instructions are coded in a lit23 or Slit16 format. The lit23 format encodes a direct address that represents PMA<22:0>. PMA<23> is not part of the user space address and is not encoded. The Slit16 format encodes an instruction count offset. The offset is added to the PC to generate the next address. The Slit16 format does not encode the PMA<0> bit as it represents an instruction count. The Slit16<15> bit is sign extended when the offset is added to the PC.
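A relative address calculation of this kind may be sketched as follows, by way of illustration only. The sketch assumes that the instruction count offset is scaled by two because the PMA<0> bit is not encoded, and that the result is confined to PMA<22:0>; both are assumptions, and the function names are hypothetical.

```python
def sign_extend16(value):
    """Interpret a 16-bit pattern as a two's complement integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc, slit16):
    """Add a sign-extended instruction-count offset to the PC.
    Assumes the offset is scaled by 2 because PMA<0> is not encoded."""
    return (pc + (sign_extend16(slit16) << 1)) & 0x7FFFFF   # keep PMA<22:0>
```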
Shadows
Shadow registers are 1 level deep mini-stack registers attached to several key user registers. A PUSH.S will copy the user registers to the shadows and a POP.S will copy the shadows back to the user registers. Shadow registers are attached to W0 . . . W15, the STATUS register, and the LCR, LSR and LER registers used by DO and REPEAT instructions.
MAC
The MAC instruction is a pipelined instruction. The first pipeline stage generates the effective addresses of the X and Y data and fetches the X and Y data. The second pipeline stage computes the multiply and accumulate, storing the results into the accumulator.
Forms
The MAC instruction, and variants, can have several formats. Fundamentally, it must specify a target accumulator and a multiplicand and multiplier (ACC=X*Y). For Example:
The MAC can also specify a prefetch for the next X or Y operand. The assembler can discriminate the X or Y prefetch based on the register used as the indirect address. [W4] or [W5] specifies the X prefetch and [W6] or [W7] specifies the Y prefetch. If a prefetch is specified, it must have a prefetch destination register. Legal forms of prefetch include:
A write back can be specified. The write back uses the W9 register as the destination address. In this way, the assembler can discern the write back option.
Squaring in the DSP engine is done with the square PLA opcodes. These are variants of the MAC and MPY opcodes.
For example:
This instruction will multiply W0 times W0 and write the result in ACCB while doing the prefetch and write back. The assembler can tell that a MAC or MPY should translate to SQRAC or SQR instructions by finding the Wm*Wm format.
File Registers
File registers include parts of user RAM area and the Special Function Registers (SFR). The file register space is 8192 bytes. The file registers are directly addressable using the f field in the file register instructions.
All data addresses are byte addresses. When using byte instructions, the bytes are addressed directly. When using word instructions, the address must be word aligned. The least significant address bit must be 0.
Carry and Borrow in PIC Instructions
The PIC uses one unified carry and borrow bit, the C bit in the status register. The following examples show the functionality of the carry/borrow.
If a normal add generates a carry out of the 15th bit, the carry bit is set.
An add with carry will use the carry bit as an additional input. If the add generates a carry out of the 15th bit, the carry bit is set.
A subtract instruction inverts the bits of the subtrahend, forces the carry in to 1 and does an add. This has the effect of generating the 2's complement of the subtrahend. If the add generates a carry out of the 15th bit, the carry bit is set. However, in the case of a subtract, the carry bit is viewed as a BORROW bit. So a 1 in the carry bit indicates no borrow. A 0 in the carry bit indicates a borrow.
Subtracting 3−2 generates no borrow, so the C bit is 1.
Subtracting 3−3 generates no borrow, so the C bit is 1. The Z bit indicates a zero result.
Subtracting 2−3 generates a borrow, so the C bit is 0. The N bit indicates a negative result.
A subtract with borrow instruction inverts the bits of the subtrahend, leaves the carry at its previous state and does an add. This has the effect of generating the 2's complement of the subtrahend while inputting a BORROW bit.
Subtract/borrow 3−2 with no borrow in generates no borrow, so the C bit is 1.
Subtract/borrow 3−2 with borrow in generates no borrow, so the C bit is 1. The result is 0, so the Z bit is set.
Subtract/borrow 2−3 with borrow in generates a borrow, so the C bit is 0. The N bit indicates a negative result.
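By way of illustration only, the unified carry and borrow behavior described above may be modeled in a few lines. The function names and the dictionary of flags are hypothetical; only the C, Z and N flags are modeled.

```python
MASK16 = 0xFFFF


def add16(a, b, carry_in=0):
    """16-bit add; returns (result, flags) with the C, Z and N flags."""
    total = (a & MASK16) + (b & MASK16) + carry_in
    result = total & MASK16
    flags = {"C": total >> 16, "Z": int(result == 0), "N": result >> 15}
    return result, flags


def sub16(a, b):
    """Subtract: invert the subtrahend, force the carry in to 1, then add."""
    return add16(a, ~b & MASK16, 1)


def subb16(a, b, carry):
    """Subtract with borrow: invert the subtrahend and reuse the existing C bit.
    carry = 1 means no borrow in; carry = 0 means borrow in."""
    return add16(a, ~b & MASK16, carry)
```

Calling sub16(3, 2), sub16(3, 3) and sub16(2, 3) reproduces the three subtract cases listed above, and subb16 with a carry of 1 or 0 reproduces the cases without and with a borrow in.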
Overflow Conditions
When doing 2's complement mathematics, the OV flag indicates an overflow. When doing multi-word math, the overflow is ignored until the most significant operation.
Branch Conditions
Conditional branch instructions are valid after compare or subtract instructions. The compare is minuend-subtrahend and the condition tests are in the same order. For example, BGT will be true if the minuend is greater than the subtrahend or (minuend>subtrahend).
Stack Operation
The dsPIC stack is a software stack implemented in user RAM area. While the device has provisions to allow pointer manipulation on any of the 16 W registers, W15 is the assumed stack pointer.
The stack starts at lower memory and grows towards high memory. The stack pointer points to the next available location. The stack pointer is manipulated with the source and destination addressing modes as shown in Table 192 and Table 193. With respect to
a shows a block diagram illustrating a stack pointer at initialization.
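The stack behavior may be sketched as follows, by way of illustration only. The sketch assumes word-sized entries of two bytes, a post-increment on PUSH and a pre-decrement on POP, which follow from the stack pointer pointing to the next available location; the class name and the base address used in the usage example are hypothetical.

```python
class SoftwareStack:
    """Software stack in RAM; W15 points to the next free word and grows upward."""

    def __init__(self, base):
        self.ram = {}
        self.w15 = base            # stack pointer at initialization

    def push(self, word):
        self.ram[self.w15] = word & 0xFFFF
        self.w15 += 2              # post-increment by one word (2 bytes)

    def pop(self):
        self.w15 -= 2              # pre-decrement back to the last pushed word
        return self.ram[self.w15]


stack = SoftwareStack(0x0800)      # hypothetical base address
stack.push(0x1111)
stack.push(0x2222)
```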
Multi-Word Move Operations
The multi-word move instructions are manipulated with the source and destination addressing modes as shown in Table 192 and Table 193.
Link and Unlink Instructions
The link and unlink instructions assume that W15 is a stack pointer and W14 is a frame pointer. The link instruction is used during a calling sequence.
The LNK instruction will push the calling routine's FP onto the stack. The new FP will be set to point to the current stack pointer. Then the literal is added to the stack pointer, which reserves the amount of memory allocated.
Inside of the routine, the stack is used to save values. [W14+n] will access the Temp locations used by the routine. [W14−n] is used to access the parameters.
At the end of the routine, the ULNK instruction will copy the FP to the stack pointer and then POP the caller's FP back to the FP.
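By way of illustration only, the link and unlink sequence may be modeled as follows. The sketch assumes that the stack grows toward high memory, as stated in the Stack Operation section, so that reserving local storage moves W15 upward and the temporary locations lie at [W14+n]. The class name and the addresses in the usage example are hypothetical.

```python
class Frame:
    """Minimal model of W15 (stack pointer) and W14 (frame pointer)."""

    def __init__(self, sp, fp=0):
        self.ram = {}
        self.w15 = sp
        self.w14 = fp

    def lnk(self, size):
        self.ram[self.w15] = self.w14   # push the caller's FP
        self.w15 += 2
        self.w14 = self.w15             # new FP points to the current stack pointer
        self.w15 += size                # reserve the local storage

    def ulnk(self):
        self.w15 = self.w14             # copy the FP to the stack pointer
        self.w15 -= 2
        self.w14 = self.ram[self.w15]   # pop the caller's FP


frame = Frame(sp=0x0800, fp=0x07F0)     # hypothetical addresses
frame.lnk(8)
linked = (frame.w14, frame.w15)
frame.ulnk()
unlinked = (frame.w14, frame.w15)
```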
This returns the stack back to the state in
A return instruction will return to the caller. The caller is responsible for removing the parameters from the stack.
This returns the stack back to the state in
Multi-Word Shift Instructions
The CARRY1 and CARRY0 registers hold the temporary values of the shift.
32-Bit Left Shifts
The multi-word left shift instructions utilize the shifter associated with the ACCn registers. The instruction can shift 0 to 31 positions. Although the shifter can only implement shifts of up to 15 positions to the left, by rearranging the storing into the destination registers an apparent shift of 31 positions may be obtained.
The Multi-Word Left Shift By 4 Instruction Execution (see
The Multi-Word Left Shift By 20 Instruction Execution (see
Note the shifter is shifting (20-16), making the shift equivalent to the previous example. When the instruction detects a shift value greater than 15, it is only necessary to realign the result registers and perform a smaller shift.
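The realignment described above may be sketched as follows, by way of illustration only. The sketch models a 32 bit value as two 16 bit words and never asks the shifter for more than 15 positions; the function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def shift_left_32(value, count):
    """Left shift a 32-bit value by 0..31 positions using a shift of at most 15:
    a count of 16 or more first realigns by one 16-bit word, then shifts by count - 16."""
    if not 0 <= count <= 31:
        raise ValueError("shift count must be 0..31")
    hi, lo = (value >> 16) & 0xFFFF, value & 0xFFFF
    if count >= 16:
        hi, lo = lo, 0             # realign: the low word becomes the high word
        count -= 16
    combined = ((hi << 16) | lo) << count   # remaining shift of at most 15 positions
    return combined & MASK32
```

A shift by 20 therefore performs the same 4 position shift as the previous example, applied after the words have been realigned.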
32-Bit Right Shifts
The multi-word right shift instructions are similar to the left shifts. The Multi-Word Right Shift By 4 Instruction Execution (see
The Multi-Word Right Shift By 20 Instruction Execution (see
Note that the examples given show arithmetic shifts. If logical shifts are used, zeros would replace the sign bits.
16-Bit Shifts
The ASR, LSR and SL instructions allow for shifts of 16-bit words. The shift value should be limited to 15 positions by the user for useful results.
Multi-Word Shifts on Words Longer Than 32 Bits
The MSL and MSR instructions allow for shifts of words greater than 32 bits. This may be useful for IP addresses or encryption keys. Note that the shift is still limited to 31 positions. For example, to shift a 64 bit word:
Multi-Word Rotates
Because the CARRY registers are readable, the multi-word shift instructions may be used for rotates. For example, to left rotate a 16 bit word:
For example, to left rotate a 32 bit word:
Using the MSL and MSR instructions, rotates of greater word lengths may be achieved.
DSP Data Formats
Integer and Fractional Data
The dsPIC DSP core supports integer and fractional data operations. Data format selection is made by the IF bit in the DSP control register CORCON<0>. Setting this bit to “1” selects integer mode; setting this bit to “0” selects fractional mode.
Integer data is inherently represented as a signed two's-complement value, where the MSB is defined as a sign bit. Generally speaking, the range of an N-bit two's complement integer is −2^(N−1) to 2^(N−1)−1. For a 16-bit integer, the data range is −32768 (0x8000) to 32767 (0x7FFF), including 0 (see
When the dsPIC is in fractional mode, data is represented as a two's complement fraction where the MSB is defined as a sign bit and the radix point is implied to lie just after the sign bit (Q1.X format). The range of an N-bit two's complement fraction with this implied radix point is −1.0 to (1 − 2^(1−N)). For a 16-bit fraction, the Q1.15 data range is −1.0 (0x8000) to 0.999969482 (0x7FFF), including 0 (see
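By way of illustration only, the Q1.15 interpretation may be sketched as follows. The function names are hypothetical, and the saturation applied when converting from a floating point value is an assumption made so that out-of-range inputs map to the range limits.

```python
def q15_to_float(word):
    """Interpret a 16-bit pattern as a Q1.15 two's complement fraction."""
    word &= 0xFFFF
    signed = word - 0x10000 if word & 0x8000 else word
    return signed / 32768.0


def float_to_q15(x):
    """Convert a float to Q1.15, saturating at -1.0 and at 1 - 2**-15 (an assumption)."""
    scaled = int(round(x * 32768))
    scaled = max(-32768, min(32767, scaled))
    return scaled & 0xFFFF
```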
Super Saturation Mode
The SATMOD bit, CORCON<3>, enables Super Saturation mode and expands the dynamic range of the accumulators by using 8 guard bits. When the SATMOD bit is set to “1”, Super Saturation mode is enabled and the 40-bit accumulators support an integer range of −5.498×10^11 (0x80 0000 0000) to 5.498×10^11 (0x7F FFFF FFFF). In fractional mode, the guard bits of the accumulator do not modify the location of the radix point and the 40-bit accumulators use Q9.31 fractional format. Note that all fractional operation results are stored in the 40-bit accumulator justified with a Q1.31 radix point. As in integer mode, the guard bits merely increase the dynamic range of the accumulator. Q9.31 fractions have a range of −256.0 (0x80 0000 0000) to (256.0 − 4.65661×10^−10) (0x7F FFFF FFFF). See Section 2.3.3 of the Core DOS for a description of the dsPIC overflow and saturation modes.
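The range limits quoted above follow directly from the 40 bit width and the 31 fractional bits, as the following short calculation shows by way of illustration only. The constant and variable names are hypothetical.

```python
ACC_BITS = 40          # accumulator width including 8 guard bits
FRACTION_BITS = 31     # bits to the right of the radix point in Q9.31

int_min = -(1 << (ACC_BITS - 1))        # bit pattern 0x80 0000 0000
int_max = (1 << (ACC_BITS - 1)) - 1     # bit pattern 0x7F FFFF FFFF
frac_min = int_min / (1 << FRACTION_BITS)
frac_max = int_max / (1 << FRACTION_BITS)
```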
Scaling and Normalizing With FBCL Instruction
To minimize quantization errors that are associated with data processing using DSP instructions, it is important to utilize the complete available resolution of the dsPIC register set. This may require scaling data up to avoid underflows (e.g., when processing data from a 12-bit ADC) or scaling data down to avoid overflows (e.g., when sending data to a 10-bit DAC). The scaling which must be performed to minimize quantization errors depends on the dynamic range of the input data which is operated on, and the requirements of the dynamic range of the output data. At times these conditions may be known a priori and fixed scaling may be employed. Other times, scaling conditions may not be fixed or known, and then dynamic scaling must be used to process data.
The Find First Bit Change Left (FBCL) instruction can efficiently be used to perform dynamic scaling. The FBCL function determines the exponent of the byte or word which it operates on (namely the amount which the value may be shifted before overflowing), and stores the exponent such that it may be used to later scale the value by shifting. The exponent is determined by detecting the first bit change starting from the sign bit and working towards the LSB. The Scaling Examples table shows data with various dynamic ranges, their exponents, and the value after scaling each data to maximize the dynamic range.
*A “hole” where FBCL fails to detect the correct exponent
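The exponent search may be sketched as follows, by way of illustration only. The sketch returns a non-negative count of positions by which the value may be left shifted before overflowing; the sign convention and the handling of all-zero or all-one inputs in the actual instruction may differ, and the function names are hypothetical.

```python
def redundant_sign_bits(value, width=16):
    """Scan from the sign bit toward the LSB and count the bits that repeat
    the sign before the first bit change."""
    value &= (1 << width) - 1
    sign = (value >> (width - 1)) & 1
    count = 0
    for bit in range(width - 2, -1, -1):
        if (value >> bit) & 1 != sign:
            break
        count += 1
    return count


def scale_up(value, width=16):
    """Left shift a value by its exponent; returns (scaled value, exponent)."""
    exponent = redundant_sign_bits(value, width)
    return (value << exponent) & ((1 << width) - 1), exponent
```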
As a practical example, assume that block processing is performed on a sequence of data with very low dynamic range stored in Q1.15 fractional format. To minimize quantization errors, the data may be scaled up to prevent any quantization loss which may occur as it is processed. The FBCL instruction can be executed on the sample with the largest magnitude to determine the optimal scaling value for processing the data. Note that scaling the data up is performed by left shifting the data (see Section 2.2 of the Core DOS for a description of the Barrel Shifter). This is demonstrated with the code snippet below.
Accumulator Normalization With FBCL
The process of scaling a quantized value for its maximum dynamic range is known as normalization (the data in the third column in Table 169: Scaling Examples, contains normalized data). Accumulator normalization is a technique used to ensure that the accumulator is properly aligned before storing data from the accumulator, and the FBCL instruction facilitates this function.
The two 40-bit accumulators each have 8 guard bits which expand the accumulator from Q1.31 to Q9.31 when operating in Super Saturation mode. Even in Super Saturation mode the Store Accumulator (SAC) instruction only stores 16-bit data (in Q1.15 format) from ACC<31:16>.
Proper data alignment for storing the contents of the accumulator may be achieved by scaling the accumulator down if the guard bits are in use, or scaling the accumulator up if all of the accumulator high bits are not being used. To perform such scaling, the FBCL instruction must operate on the guard bits in byte mode and it must operate on the high accumulator in word mode. If a shift is required, the ALU's 40-bit shifter is employed using the SFTAC instruction to perform the scaling. Listed below is a code snippet for accumulator normalization.
The above code assumes that negative values are returned by FBCL to facilitate scaling up.
DO Operations
The DO instructions implement simple looping. The instruction executes a set of instructions a specified number of times. The loop count is selected with a constant or a W register, and for a loop count of n the loop is executed n+1 times. For a W register, only the 14 LS-bits are significant. The DO instruction loads the LSR register with the value of the PC after the DO instruction. It adds the loop offset to that PC and loads the result into the LER register. It then continues to execute code starting with PC+2 until the PC matches the LER. When the PC matches the LER, the loop count is tested for a negative value. If it is not negative, the PC is loaded with the LSR value to branch back to the loop start. The loop count is decremented on each pass. When the loop count tests negative, the next sequential instruction executes. The instructions in the loop need not be consecutive.
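The n+1 iteration count can be illustrated with a small Python model. The function name `run_do_loop` is hypothetical, and the order of the decrement and the negative test is an assumption chosen so that the model reproduces the stated n+1 passes.

```python
def run_do_loop(loop_count, body, from_w_register=False):
    """Model a DO loop and return how many times the body executed."""
    if from_w_register:
        loop_count &= 0x3FFF  # only the 14 LS-bits of a W register are significant
    count = loop_count
    passes = 0
    while True:
        body()        # instructions from the loop start (LSR) to the loop end (LER)
        passes += 1
        count -= 1    # assumed order: decrement, then test for negative
        if count < 0:
            break     # continue with the next sequential instruction
    return passes
```

A loop count of 0 therefore executes the body once, and a loop count of 4 executes it five times.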
The instruction set coding is illustrated with reference to Tables 2 through 162 which depict the PLA mnemonic for each instruction, its assembly syntax, a corresponding description and its corresponding 24 bit opcode. Each of these opcodes is unique and provides a basis for the instruction fetch/decode 110 to derive and transmit different control signals to each processor element to selectively involve that element in the instruction processing. Table 188 sets forth status flag operations for the instruction set.
The instruction set may be grouped into the following functional categories: move instructions; math instructions; rotate/shift instructions; bit instructions; DSP instructions; skip instructions; flow instructions and stack instructions.
Table 190 depicts addressing modes for source registers. Table 191 depicts addressing modes for destination registers. Table 192 depicts offset addressing modes for WSO source registers. Table 193 depicts offset addressing modes for WSO destination registers. Tables 194 through 199 depict examples of prefetch operations and MAC operations. Collectively, the Tables illustrate the composition of the instruction op-code, the mnemonics that are assigned to the opcodes and details of the operation of the instruction.
The following terms, used in the Appendices, are intended to specify an illustrative embodiment of a processor, such as a digital signal controller, that may be used to implement the instruction set according to the present invention: “RoadRunner” and “dsPIC.” Other embodiments may be implemented as a matter of design choice.
Address Generator Units
The following description is enhanced by reference to
The dsPIC core contains two independent address generator units. The X AGU is for MCU and DSP instructions. The Y AGU is for DSP MAC class of instructions only. They are capable of supporting three types of data addressing:
Linear and modulo data addressing modes can be applied to data space or program space. Although bit reversed addressing will work with any EA calculation, by definition it is only applicable to data space.
Data Space Organization
Although the data space memory is organized as 16-bit words, all effective addresses (EAs) point to bytes. Instructions can thus access any byte or any aligned word (a data word at an even address). Misaligned word accesses are not supported, and if attempted will initiate an address error trap. The LS-bit of the EA is used to determine upper or lower byte access. The LS-bit becomes a ‘don't care’ for word accesses. Each memory (or register where appropriate) must provide independent upper and lower byte write lines to support byte writes. In addition, a multiplexer must be included to route the LS byte of an operand to the upper or lower byte of the target EA word for both reads and writes.
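The byte and word access rules can be sketched as a simple read model in Python. The names `read_data` and `AddressErrorTrap` are hypothetical, and mapping an odd EA to the upper byte of the word is an assumption, since the text states only that the LS-bit selects between the upper and lower byte.

```python
class AddressErrorTrap(Exception):
    """Stands in for the address error trap (hypothetical name)."""


def read_data(memory_words, ea, byte_access):
    """Read a byte or an aligned word from a list of 16-bit words."""
    word = memory_words[ea >> 1]  # the EA is a byte address; memory is word organized
    if byte_access:
        # Assumed mapping: odd EA selects the upper byte, even EA the lower byte.
        return (word >> 8) & 0xFF if ea & 1 else word & 0xFF
    if ea & 1:
        raise AddressErrorTrap(f"misaligned word access at 0x{ea:04X}")
    return word
```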
When executing instructions which require just one source operand to be fetched from data space, the X AGU is used to calculate the effective address. The AGU can generate an address to point to anywhere in the 64K byte data space. It supports all addressing modes, modulo addressing for low overhead circular buffers, and bit reversed addressing to facilitate FFT data reorganization.
When executing instructions which require two source operands to be concurrently fetched (i.e. the MAC class of DSP instructions), both the X and Y AGUs are used simultaneously and the data space is split into two independent address spaces, X and Y. The Y AGU supports register indirect post-modified and modulo addressing only. Note that the data write phase of the MAC class of instruction does not split X and Y address space. The write EA is calculated using the X AGU and the data space is configured for full 64Kbyte access.
In the split data space mode, some W register address pointers are dedicated to AGU X, others to AGU Y (see
Instruction Addressing Modes
While alternate addressing modes are possible with the present invention, the basic set of addressing modes for this illustrative example are shown in Table 170. Note that, ‘Wn+=’ indicates that the contents of Wn is added to something to form the effective address which is then written back into Wn. ‘Wn+’ indicates that the contents of Wn is added to something to form the effective address but the contents of Wn remain unchanged.
The addressing modes in Table 170 form the basis of three groups of addressing modes optimized to support specific instruction features. They are Mode 1, Mode 2 and Mode 3. The DSP MAC and derivative instructions are an exception where the addressing modes are encoded differently. This set of addressing modes is referred to as Mode 4. Refer to dsPIC Instruction Set DOS for full details.
EA = effective address
All address modification values (except Wb) are scaled for word access
All but a few instructions support both 8-bit and 16-bit operand data sizes. In order to efficiently accommodate this requirement, all effective addresses are byte aligned. As the data space is 16-bits wide, the following consequences must be understood.
2. The LS-bit of the effective address is used to select which byte (upper or lower) is multiplexed onto bits [7:0] of the data bus for byte sized accesses.
3. Post and pre-modification of a register by a constant value to create a new effective address must take into account the data size accessed. All constant values, whether implied (e.g. post-inc) or declared (e.g. post-modify with Slit5), are scaled by a factor of 2 for word accesses. For example:
[Ws]+=1 will post-modify data source pointer Ws by 1 for a byte access, and by 2 for a word access. [Ws]+=Slit5 will post-modify data source pointer Ws by Slit5 for byte accesses and by Slit5<<1 (shift left by 1) for word accesses. Finally, register offsets are not scaled.
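The scaling rule for constant modifiers can be sketched in Python as follows. The function name `post_modify` and its parameters are hypothetical, and the 16-bit wrap of the pointer is an assumption based on the 64K byte data space.

```python
def post_modify(ws, modifier, word_access, register_offset=False):
    """Return the updated pointer after a post-modify by a constant or register."""
    if word_access and not register_offset:
        modifier <<= 1  # constants are scaled by 2 for word accesses
    # Register offsets are used as-is; the result is kept to 16 bits.
    return (ws + modifier) & 0xFFFF
```

For instance, a post-increment of 1 moves the pointer from 0x1000 to 0x1001 for a byte access and to 0x1002 for a word access.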
Unless otherwise noted, it is assumed that all addresses and addressing modes refer to byte size accesses. All addressing modes which have to calculate the EA (pre-modified, register offset and constant offset) have very tight timing requirements which may require some instruction addressing sequence restrictions.
Mode 1
Mode 1 determines the addressing mode for one of the two operand sources required for the three operand instructions (found in categories ‘MATH’ and ‘SKIP’). These instructions are of the form:
Operand1 is always a register (i.e. the addressing mode can only be register direct) which is referred to as Wb. Operand 2 is fetched from data memory based upon the addressing mode selected by Mode 1. Mode 1 therefore defines one of the source operand addressing modes and implies that of the other source operand.
In addition, Mode 1 may also provide a signed 5-bit constant (literal) as the operand. In this case, the instruction is of the form:
Operand 1 is always a register (i.e. the addressing mode can only be register direct) which is selected from the Ws field in the instruction. The 4-bit Wb field forms the 4 LS-bits of a signed constant. It is concatenated with the LS-bit of the three bit Mode 1 field to form the 5-bit signed constant value.
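The construction of the 5-bit signed constant can be sketched in Python. The function name `slit5` is hypothetical; placing the Mode 1 LS-bit in the most significant (sign) position follows from the statement that the Wb field supplies the 4 LS-bits, and the two's complement interpretation is an assumption.

```python
def slit5(mode_field, wb_field):
    """Build the signed 5-bit literal from the Mode 1 and Wb fields."""
    raw = ((mode_field & 0x1) << 4) | (wb_field & 0xF)
    # Interpret bit 4 as the sign bit of a two's complement value.
    return raw - 32 if raw & 0x10 else raw
```

Under these assumptions a Mode 1 field of 6 (binary 110) yields literals 0 through 15, and a Mode 1 field of 7 (binary 111) yields literals -16 through -1.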
In summary, Mode 1 supports the addressing modes shown in Table 171.
Mode 1, Register Direct
Addressing Mode 1, Submode 0 is register direct. The implied effective address is the memory mapped address of register Ws. Rather than executing a memory fetch, it may be preferable to perform two W-array fetches if bussing allows. The operand is contained in Ws as shown in
Mode 1, Register Indirect
Addressing Mode 1, Submode 1 is register indirect. The effective address contained in register Ws points to the operand as shown in
Mode 1, Register Indirect with Post Decrement
Addressing Mode 1, Submode 2 is register indirect with post decrement. The effective address contained in register Ws points to the operand. Ws is then post decremented as shown in
Mode 1, Register Indirect with Post Increment
Addressing Mode 1, Submode 3 is register indirect with post increment. The effective address contained in register Ws points to the operand. Ws is then incremented as shown in
Mode 1, Register Indirect with Pre Decrement
Addressing Mode 1, Submode 4 is register indirect with pre-decrement. Register Ws is decremented to form the effective address which points to the operand as shown in
Mode 1, Register Indirect with Pre Increment
Addressing Mode 1, Submode 5 is register indirect with pre increment. Register Ws is incremented to form the effective address which points to the operand as shown in
Mode 1, Register Direct with 5-Bit Signed Literal
Addressing Mode 1, Submode 6/7 is register direct with 5-bit signed literal. As shown in
Mode 2
Mode 2 determines the addressing mode for either the result destination or a source operand, depending upon instruction requirements. It follows the same definition for each encoding as Mode 1 except that it applies to only one operand. The Mode 1 signed 5-bit constant value mode makes little sense where Mode 2 is used, and is therefore not supported. In summary, Mode 2 supports the addressing modes shown in Table 172.
Mode 2, Register Direct
Addressing Mode 2, Submode 0 is register direct. The implied effective address is the memory mapped address of register Wsrc or Wdst. The operand is contained in Wsrc as shown in
Mode 2, Register Indirect
Addressing Mode 2, Submode 1 is register indirect. The effective address contained in register Wsrc points to the operand as shown in
Mode 2, Register Indirect with Post Decrement
Addressing Mode 2, Submode 2 is register indirect with post decrement. The effective address contained in register Wsrc points to the operand, or the effective address contained in register Wdst points to the result destination. Wsrc or Wdst is then post decremented as shown in
Mode 2, Register Indirect with Post Increment
Addressing Mode 2, Submode 3 is register indirect with post increment. The effective address contained in register Wsrc points to the source operand, or the effective address contained in register Wdst points to the result destination. Wsrc or Wdst is then incremented as shown in
Mode 2, Register Indirect with Pre Decrement
Addressing Mode 2, Submode 4 is register indirect with pre decrement. Register Wsrc or Wdst is decremented to form the effective address which points to the operand as shown in
Mode 2, Register Indirect with Pre Increment
Addressing Mode 2, Submode 5 is register indirect with pre increment. Register Wsrc or Wdst is incremented to form the effective address which points to the operand as shown in
Mode 3
Mode 3 is used by ‘MOVE’ and some of the DSP class instructions where addressing flexibility is important. It follows the same definition for each encoding as Mode 1 except that it uses the Wb field as an address operand (instead of a data operand). In addition, Mode 3 also supports register with register offset addressing mode, sometimes referred to as register indexed.
The 5-bit signed constant required by Submode 6/7 is created by concatenating the Wb field with the LS-bit of the 3-bit Mode 3 field. For the MOV instruction, the Mode 3 addressing modes can differ for the source and destination EA. However, the 4-bit Wb field is shared between both source and destination (but typically only used by one). In summary, Mode 3 supports the addressing modes shown in Table 173.
Mode 3, Register Direct
Addressing Mode 3, Submode 0 is register direct. The implied effective address is the memory mapped address of register Wsrc or Wdst. The operand is contained in Wsrc as shown in
Mode 3, Register Indirect
Addressing Mode 3, Submode 1 is register indirect. The effective address contained in register Wsrc points to the operand as shown in
Mode 3, Register Indirect with Post Decrement
Addressing Mode 3, Submode 2 is register indirect with post decrement. The effective address contained in register Wsrc points to the operand, or the effective address contained in register Wdst points to the result destination. Wsrc or Wdst is then post decremented as shown in
Mode 3, Register Indirect with Post Increment
Addressing Mode 3, Submode 3 is register indirect with post increment. The effective address contained in register Wsrc points to the operand, or the effective address contained in register Wdst points to the result destination. Wsrc or Wdst is then incremented as shown in
Mode 3, Register Indirect with Pre Decrement
Addressing Mode 3, Submode 4 is register indirect with pre decrement. Register Wsrc or Wdst is decremented to form the effective address which points to the operand as shown in
Mode 3, Register Indirect with Register Offset
Addressing Mode 3, Submode 5 is register indirect with register offset. For an operand read, the effective address of the operand is formed by adding the contents of Wsrc and Wb as shown in
Mode 3, Register Indirect with Constant Offset
Addressing Mode 3, Submode 6/7 is register indirect with constant offset. For an operand read, the effective address of the operand is formed by adding the contents of Wsrc and a 5-bit signed literal, as shown in
Mode 4
The dual source operand DSP instructions (MAC, CLRAC, MPYAC & MOVAC) utilize a simplified set of addressing modes (Mode 4) to allow the user to effectively manipulate the data pointers through register indirect tables.
Wsrc must be a member of the set {W4, W5, W6, W7}. For data reads, W4 and W5 will always be directed to the X AGU and W6 and W7 will always be directed to the Y AGU. The effective addresses generated (before and after modification) must therefore be valid addresses within X data space for W4 and W5, and Y data space for W6 and W7. Register indirect with register offset addressing is only available for W5 (in X space) and W7 (in Y space).
In summary, Mode 4 supports the addressing modes shown in Table 174 for X data space and those shown in Table 175 for Y data space.
Mode 4 instructions are word sized only, so post-modification values are already scaled appropriately
Addressing mode defined by read address space
Mode 4, Register Indirect
Addressing Mode 4, Submodes 0 & 8 are register indirect. The effective address contained in register Wsrc points to the operand as shown in
Mode 4, Register Indirect with Post Increment
Addressing Mode 4, Submodes 1, 2, 3, 9, 10 & 11 are register indirect with post increment. The effective address contained in register Wsrc points to the operand. Wsrc is then post incremented by 2, 4 or 6 as shown in
Mode 4, Pre-Fetch Inhibit
Addressing Mode 4, Submode 4 inhibits the data fetch from X or Y address space. No target registers are modified.
Mode 4, Register Indirect with Register Offset
Addressing Mode 4, Submode 12 is register indirect with register offset. The effective address of the operand is formed by adding the contents of Wsrc (W5 or W7) and W8 as shown in
Mode 4, Register Indirect with Post Decrement
Addressing Mode 4, Submodes 5, 6, 7, 13, 14 & 15 are register indirect with post decrement. The effective address contained in register Wsrc points to the operand. Wsrc is then post decremented by 2, 4 or 6 as shown in
X AGU
The X AGU supports all addressing modes including modulo addressing and bit reversed addressing. A block diagram is shown in
Effective Address Adder
The effective address (EA) adder generates the effective addresses for all instructions using X data space prior to modification by modulo addressing. It supports all addressing modes including bit reversed addressing. The adder accepts the source or destination W register on the A input and one of the following on the B input, based upon which addressing mode is required.
The Modulo and Bit Reversed Addressing Controller block enables or disables these addressing modes, and provides the appropriate control signals to the rest of the AGU. If modulo and bit reversed addressing are disabled, the EA adder result passes unmodified to the AGU output.
Modulo Addressing Comparator/Subtractor
Modulo addressing relies on automatic correction of any generated EA such that it is forced back into the selected circular buffer address range. For an incrementing buffer, the offset sign is positive. The end address is therefore routed to the subtractor, and subtracted from the new EA. If the result is negative, the address is within the buffer boundaries and will propagate unchanged. If the result is positive (including zero), indicating the EA has passed the end address, it is logically ORed with the start address. This is equivalent to adding it to the start address to create the wrap address for a start address on a ‘zero’ power of two boundary.
For a decrementing buffer, the offset sign is negative. The start address is therefore routed to the subtractor, and subtracted from the new EA. If the result is positive, the address is within the buffer boundaries and will propagate unchanged. If the result is negative, indicating the EA has passed the start address, it is logically AND'ed with the end address. This is equivalent to adding it (a negative value) to the end address to create the wrap address for an end address on a ‘ones’ address boundary.
Y AGU
As the Y AGU is only used by the MAC class of DSP instructions, its function is restricted to supporting post-modified register indirect (using a constant modifier) and modulo addressing. A block diagram is shown in
Effective Address Adder
The effective address (EA) adder generates the effective addresses for all instructions using Y data space prior to modification by modulo addressing. It supports post-modified register indirect addressing (using a constant modifier). It does not support bit reversed addressing. The adder accepts the source or destination W register on the A input and a constant (0, +2, +4, +6, −2, −4 or −6) on the B input, depending upon the post-modify constant declared in the instruction.
Modulo Addressing Controller
The Modulo Addressing Controller block enables or disables modulo addressing, and provides the appropriate control signals to the rest of the AGU. If modulo addressing is disabled, the EA adder result passes unmodified to the AGU output.
Modulo Addressing Comparator/Subtractor
Modulo addressing relies on automatic correction of any generated EA such that it is forced back into the selected circular buffer address range. For an incrementing buffer, the offset sign is positive. The end address is therefore routed to the subtractor, and subtracted from the new EA. If the result is negative, the address is within the buffer boundaries and will propagate unchanged. If the result is positive (including zero), indicating the EA has passed the end address, it is logically ORed with the start address. This is equivalent to adding it to the start address to create the wrap address for a start address on a ‘zero’ power of two boundary.
For a decrementing buffer, the offset sign is negative. The start address is therefore routed to the subtractor, and subtracted from the new EA. If the result is positive, the address is within the buffer boundaries and will propagate unchanged. If the result is negative, indicating the EA has passed the start address, it is logically AND'ed with the end address. This is equivalent to adding it (a negative value) to the end address to create the wrap address for an end address on a ‘ones’ address boundary.
Modulo Addressing
Modulo addressing is a method of providing an automated means to support circular data buffers using hardware. The objective is to remove the need for software to perform data address boundary checks when executing tightly looped code as is typical in many DSP algorithms.
dsPIC modulo addressing can operate in either data or program space (since the data pointer mechanism is essentially the same for both). One circular buffer can be supported in each of the X (which also provides the pointers into Program space) and Y data spaces. Modulo addressing can operate on any W register pointer.
In order to minimize the hardware size for modulo addressing support, certain usage restrictions may be imposed. In summary, any one circular buffer can only be allowed to operate in one direction as the buffer start address (for incrementing buffers) or end address (for decrementing buffers) is restricted based upon the direction of the buffer. The direction is determined from the address offset sign.
Start and End Address
The modulo addressing scheme requires that either a starting or an end address be specified and loaded into the 16-bit modulo buffer address registers, XMODSRT, XMODEND, YMODSRT, YMODEND.
The data buffer start address is arbitrary but must be at a ‘zero’, power of two boundary for incrementing address buffers. It can be any address for decrementing address buffers. For example, if the buffer size (modulus value) is chosen to be 100 bytes (0x64), then the buffer start address for an incrementing buffer must contain 7 least significant zeros. Valid start addresses may therefore be 0xXX00 and 0xXX80 where ‘X’ is any hexadecimal value. Adding the buffer length to this value will give the end address to be written into X/YMODEND. For example, if the start address was chosen to be 0x2000, then X/YMODEND would be set to (0x2000+0x0064)=0x2064. Note that the last physical address of the buffer will be at end address −1 because the buffer range is 0 to 0x63. ‘Starting address’ refers to the smallest address boundary of the circular buffer. The initial entry address (first access of the buffer) may point to any address within the modulus range.
The data buffer end address is arbitrary but must be at a ‘ones’ boundary for decrementing buffers. It can be at any address for an incrementing buffer. For example, if the buffer size (modulus value) is chosen to be 100 bytes (0x64), then the buffer end address for a decrementing buffer must contain 7 least significant ones. Valid end addresses may therefore be 0xXXFF and 0xXX7F where ‘X’ is any hexadecimal value. Subtracting the buffer length from this value and then adding 1 will give the start address to be written into X/YMODSRT. For example, if the end address was chosen to be 0x207F, then the start address would be (0x207F−0x0064+1)=0x201C, which is the first physical address of the buffer.
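The two worked examples above can be reproduced with a short Python sketch. The function names are hypothetical, and deriving the number of boundary bits from the smallest power of two that holds the buffer length is an assumption that matches the 7-bit figure given for a 100 byte buffer.

```python
def boundary_bits(length):
    """Number of LS-bits that must be all zeros (or all ones) for this length."""
    return (length - 1).bit_length()


def incrementing_buffer(start, length):
    """Return (start, end) register values for an incrementing buffer."""
    mask = (1 << boundary_bits(length)) - 1
    if start & mask:
        raise ValueError("start address is not on a 'zero' power of two boundary")
    return start, start + length


def decrementing_buffer(end, length):
    """Return (start, end) register values for a decrementing buffer."""
    mask = (1 << boundary_bits(length)) - 1
    if end & mask != mask:
        raise ValueError("end address is not on a 'ones' boundary")
    return end - length + 1, end
```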
In an incrementing buffer, the modulo addressing hardware performs the address correction by subtracting the buffer end address from the EA and, if the result is positive, adding it to the start address. As the start address is on a ‘zero’, power of two boundary, the addition may be performed by a logical OR operation.
In a decrementing buffer, the modulo addressing hardware performs the address correction by subtracting the buffer start address from the EA and, if the result is negative, adding it to the end address. As the end address is on a ‘ones’ boundary, the addition may be performed by a logical AND operation. All modulo addressing EA calculations assume word size data (LS-bit of every EA is always clear). The XM value may be scaled accordingly to generate compatible (byte) addresses, leaving the LS-bit of all EAs clear.
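The correction performed for each buffer direction can be modeled in Python as follows. This is a behavioral sketch under the stated boundary restrictions, not a description of the hardware; the function names and the 16-bit two's complement treatment of the negative difference are assumptions.

```python
def wrap_incrementing(ea, start, end):
    """Correct an EA for an incrementing buffer (start on a 'zero' boundary)."""
    diff = ea - end
    if diff < 0:
        return ea            # still inside the buffer, passes unchanged
    return start | diff      # OR stands in for start + diff


def wrap_decrementing(ea, start, end):
    """Correct an EA for a decrementing buffer (end on a 'ones' boundary)."""
    diff = ea - start
    if diff >= 0:
        return ea            # still inside the buffer, passes unchanged
    return end & (diff & 0xFFFF)  # AND with the 16-bit two's complement difference
```

With start 0x2000 and end 0x2064, an EA of 0x2066 is corrected to 0x2002. With start 0x201C and end 0x207F, an EA of 0x201A is corrected to 0x207E.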
Buffer Length
The data buffer length can be any value up to 64K words. The buffer length is not used in this scheme to correct buffer addresses or determine modulo range.
W Address Register Selection
The modulo and bit reversed addressing control register MODCON<15:0> contains enable flags plus W register fields to specify the W address registers. The XWM and YWM fields select which registers will operate with modulo addressing. If XWM=15, AGU X modulo addressing is disabled. Similarly, if YWM=15, AGU Y modulo addressing is disabled.
Modulo addressing and bit reversed addressing should not be enabled together. In the event that the user attempts to do this, bit reversed addressing will assume priority when active and X modulo addressing will be disabled.
The X address space pointer W register (XWM) to which modulo addressing is to be applied is stored in MODCON<3:0> (see Table 176). Modulo addressing is enabled for X data space when XWM is set to any value other than 15 and the XMODEN bit is set at MODCON[15].
The Y address space pointer W register (YWM) to which modulo addressing is to be applied is stored in MODCON<7:4> (see Table 177). Modulo addressing is enabled for Y data space when YWM is set to any value other than 15 and the YMODEN bit is set at MODCON[14].
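The field layout described above can be decoded with a short Python sketch. The function name `decode_modcon` and the returned dictionary keys are hypothetical; only the bit positions of the pointer fields and of the X and Y enable bits at MODCON[15] and MODCON[14] come from the text.

```python
def decode_modcon(modcon):
    """Return the W register selected for X and Y modulo addressing, or None."""
    xwm = modcon & 0xF              # MODCON<3:0>
    ywm = (modcon >> 4) & 0xF       # MODCON<7:4>
    x_enabled = bool(modcon & 0x8000)  # X enable bit, MODCON[15]
    y_enabled = bool(modcon & 0x4000)  # Y enable bit, MODCON[14]
    return {
        # A field value of 15 disables modulo addressing for that AGU.
        "x_pointer": xwm if x_enabled and xwm != 15 else None,
        "y_pointer": ywm if y_enabled and ywm != 15 else None,
    }
```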
Modulo Addressing Applicability
Modulo addressing can be applied to the effective address (EA) calculation associated with any W register. It is important to realize that the address boundary checks look for addresses less than or greater than the upper (for incrementing buffers) and lower (for decrementing buffers) boundary addresses (not just equal to). Address changes may therefore jump over boundaries and still be adjusted correctly.
Legend
R = Readable bit
W = Writable bit
U = Unimplemented bit, read as ‘0’
−n = Value at POR
1 = bit is set
0 = bit is cleared
x = bit is unknown
Modulo Addressing Restrictions
As stated above, for an incrementing buffer the circular buffer start address (lower boundary) is arbitrary but must be at a ‘zero’, power of two boundary. For a decrementing buffer, the circular buffer end address is arbitrary but must be at a ‘ones’ boundary. With this scheme, there is no restriction on how far an EA calculation can exceed the address boundary being checked and still be successfully corrected. Once configured, the direction of successive addresses into a buffer cannot be changed. Although all EAs will continue to be generated correctly irrespective of offset sign, only one address boundary is checked for each type of buffer. Accessing an incrementing buffer with a decrementing address could result in the address decrementing through the start address. If this occurs, an out of range address will be detected but the address wrap operation will fail unless the end address is on a ‘ones’ address boundary (because the addition is simplified to an OR operation). For example, if the start address=0x2000, end addresses that will support a bi-directional buffer include 0x200F, 0x203F or that of any other power of two length buffer. A similar argument applies to accessing a decrementing buffer with an incrementing address.
Modulo Addressing Timing
Modulo addressing can operate on both source and destination operands (i.e. for data reads and writes). Consequently, it must meet timing for the standard instruction cycle timing. Ideally, all AGU adder results should be stable by the end of Q1 (for reads and stack writes) or Q3 (for writes or stack reads). Effective address selection should occur on rising Q2 or Q4. The W address register update (when required) should occur during Q2.
Alternatively, each AGU could be built as an asynchronous block allowing the address calculation and selection to ripple through. However, it is highly likely that this will result in many spurious address transitions which could affect power consumption if allowed to propagate too far.
Bit Reversed Addressing
Bit reversed addressing is intended to simplify data re-ordering for radix-2 FFT algorithms. It is supported by the X AGU only. The carry propagation direction for a bit reversed EA calculation is changed to most significant bit to least significant bit. The modifier (a constant value or register contents) must also be regarded as having its bit order reversed. For example, for a 16 entry buffer (words & byte data size implications are discussed later), the address pointer and result are bit re-ordered as shown in
This example shows a pointer being incremented by one by an adder with a conventional carry direction. The modifier is presented in normal bit order (LS-bit to the right). The address pointer is a bit reversed EA and is presented in reversed bit order (LS-bit to the left). The address and result must be flipped around a pivot point in the middle of the address length in order for this to work with a conventional adder. A problem arises when the buffer length is variable, because the pivot point then varies and the bit swap operation becomes unreasonably complex. An alternative is to keep the address source and destination in reversed order and use a bit reversed modifier with a reversed carry adder as shown in
Table 181 shows the result of traversing the entire buffer, starting at address 0. Other modifier values will produce a bit-reversed address sequence, but only this one is reported to be of any real use.
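A reversed carry addition can be modeled in Python by reversing both operands, adding conventionally, and reversing the sum. The sketch works on buffer indices rather than byte addresses; the function names are hypothetical, and using half the buffer size as the modifier is an assumption (it is the bit reversed form of an increment of one).

```python
def reverse_bits(value, width):
    """Reverse the order of the low `width` bits of value."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def reverse_carry_add(pointer, modifier, width):
    """Add with the carry propagating from the MS-bit towards the LS-bit."""
    mask = (1 << width) - 1
    total = (reverse_bits(pointer, width) + reverse_bits(modifier, width)) & mask
    return reverse_bits(total, width)


def bit_reversed_sequence(entries):
    """Index sequence for a power of two sized buffer, starting at 0."""
    width = entries.bit_length() - 1
    modifier = entries >> 1  # assumed modifier: half the buffer size
    pointer, sequence = 0, []
    for _ in range(entries):
        sequence.append(pointer)
        pointer = reverse_carry_add(pointer, modifier, width)
    return sequence
```

For a 16 entry buffer this model visits the indices 0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15.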
Bit reversed addressing is only supported by the X AGU. The address adder carry reverse signal (see
XB<14:0> is the bit reversed address modifier which is typically a constant, indirectly representing the size of the FFT data buffer. The XB values required to provide the correct bit reversal ‘pivot’ points for various size buffers are shown in Table 182. All bit reversed EA calculations assume word size data (LS-bit of every EA is always clear). The XB value is scaled accordingly to generate compatible (byte) addresses.
As can be seen from
See in particular
Many applications require significant amounts of fixed data (e.g. MELP) which can only be held in non-volatile memory. This data can also exceed the 32K word limit of data space memory. Consequently, this data will have to reside in on-chip program FLASH, ROM or in external program space. In order to accommodate this requirement, two addressing options are provided.
The operation of these addressing options is discussed elsewhere in this specification. The following sections revisit the table instructions, in particular the addressing modes supported.
Table Instruction Operation
There are four ‘table’ instructions as shown in Table 184 that operate with Mode 2 addressing modes for both operand source and destination. They operate in a manner similar to that for data space access except that the EA for program space (source or destination) is concatenated with an 8-bit page register, TABPAG<7:0>, to create a 24-bit address. All table instructions treat the program memory as 16-bit wide, byte addressable (i.e. same as data space). Program space EA[23:1] forms the 23-bit program memory address and EA[0] becomes a byte select bit. The TBLRDL and TBLWTL instructions are dedicated to accessing the LS program word.
The program word is viewed as a 32-bit entity which consists of a 24-bit program word plus an 8-bit ‘phantom’ byte (MS-byte). This allows TBLRDH and TBLWTH instructions (which are dedicated to accessing the MS program word) to maintain orthogonality with TBLRDL and TBLWTL. For TBLRDH and TBLWTH instructions, EA[0] remains a byte select bit but physical memory is only present in the LS-byte (EA[0]=0). A byte read of the MS-byte (EA[0]=1) will return 0x00.
Table Read Operation
The program memory is always read as 24-bit long words. The LS-bit of the EA is used by TBLRDL and TBLWTL (if required) to select the required byte of the LS program word. Table 184 indicates which instruction and data width will access the various parts of the program word.
MS-byte read will return 0x00
TBLRDH.w reads a data word from [EAsrc]<31:16>, though [EAsrc]<31:24> will equal 0x00. TBLRDH.b reads a data byte from [EAsrc]<31:24> (always equal to 0x00) or [EAsrc]<23:16> based on the state of EA[0]. The data byte is transferred into destination EA[7:0].
TBLRDL.w reads a data word from [EAsrc]<15:0>. TBLRDL.b reads a data byte from [EAsrc]<15:8> or [EAsrc]<7:0> based on the state of EA[0]. The data byte is transferred into destination EA[7:0].
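The table read behaviour can be illustrated with a small Python model. It is a sketch only: the program word is passed in directly rather than fetched from an array, the function names are hypothetical, and it assumes EA[0]=1 selects the upper byte of the addressed 16-bit half (so that a TBLRDH byte read with EA[0]=1 returns the phantom byte).

```python
PHANTOM_BYTE = 0x00  # the unimplemented MS-byte of the 32-bit view reads as 0x00


def as_32bit_view(program_word):
    """Extend a 24-bit program word with the phantom MS-byte."""
    return (PHANTOM_BYTE << 24) | (program_word & 0xFFFFFF)


def select(half_word, ea, byte_mode):
    """Return the whole 16-bit half, or the byte chosen by EA[0]."""
    if not byte_mode:
        return half_word          # EA[0] is ignored for word accesses
    if ea & 1:
        return (half_word >> 8) & 0xFF
    return half_word & 0xFF


def tblrdl(program_word, ea, byte_mode=False):
    """Model of TBLRDL.w / TBLRDL.b: read from bits <15:0>."""
    return select(as_32bit_view(program_word) & 0xFFFF, ea, byte_mode)


def tblrdh(program_word, ea, byte_mode=False):
    """Model of TBLRDH.w / TBLRDH.b: read from bits <31:16>."""
    return select((as_32bit_view(program_word) >> 16) & 0xFFFF, ea, byte_mode)
```

With a program word of 0xABCDEF, the low word reads as 0xCDEF and the high word reads as 0x00AB.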
For most applications, it is assumed that only the LS word of the program word will be used for data storage. The MS byte of the program word would then typically contain an illegal instruction trap to prevent the machine from ever inadvertently attempting to execute data. However, TBLRDH is provided to allow the use of all program memory for data storage if desired.
Table Writes
Mode 2 Addressing for Program Space
Mode 2 determines the addressing mode for the operand source/destination in program space or the operand source/destination from data space, depending upon instruction requirements. It follows the same definition for each encoding as Mode 1 except that it applies to only one operand. The Mode 1 signed 5-bit constant value mode makes little sense where Mode 2 is used, and is therefore not supported.
In summary, Mode 2 for program space data accesses supports the addressing mode shown in Table 185. Mode 2 Submode 0 is meaningless for TBLRD source and TBLWT destination operands as the program memory must be addressed with a pointer. The following addressing mode descriptions are for table read operations.
Note: this is not meaningful for TBLRD or TBLWT instructions.
Mode 2, Register Direct
Addressing Mode 2, Submode 0 is register direct. The implied effective address is the memory mapped address of register Wdst. The table read result is written to Wdst as shown in
Mode 2, Register Indirect
Addressing Mode 2, Submode 1 is register indirect. The effective address contained in register Wsrc points to the operand as shown in
Mode 2, Register Indirect with Post Decrement
Addressing Mode 2, Submode 2 is register indirect with post decrement. The effective address contained in register Wsrc points to the operand, or the effective address contained in register Wdst points to the result destination. Wsrc or Wdst is then post decremented as shown in
Mode 2, Register Indirect with Post Increment
Addressing Mode 2, Submode 3 is register indirect with post increment. The effective address contained in register Wsrc points to the source operand, or the effective address contained in register Wdst points to the result destination. Wsrc or Wdst is then post incremented as shown in
Mode 2, Register Indirect with Pre Decrement
Addressing Mode 2, Submode 4 is register indirect with pre decrement. Register Wsrc or Wdst is decremented to form the effective address which points to the operand as shown in
Mode 2, Register Indirect with Pre Increment
Addressing Mode 2, Submode 5 is register indirect with pre increment. Register Wsrc or Wdst is incremented to form the effective address which points to the operand as shown in
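The pointer-based submodes (1 through 5) differ only in whether and when the pointer register is modified. The following Python sketch models that behaviour under stated assumptions: the register file is a plain dictionary, the pointer wraps at 16 bits, and the step is 1 for byte operations and 2 for word operations. The function name is hypothetical.

```python
def mode2_effective_address(registers, reg, submode, byte_mode=False):
    """Return the EA for Mode 2 submodes 1-5 and update the pointer register."""
    step = 1 if byte_mode else 2      # assumed scaling: bytes step 1, words step 2
    pointer = registers[reg]
    if submode == 1:                  # register indirect
        ea = pointer
    elif submode == 2:                # post decrement: use, then subtract
        ea = pointer
        pointer -= step
    elif submode == 3:                # post increment: use, then add
        ea = pointer
        pointer += step
    elif submode == 4:                # pre decrement: subtract, then use
        pointer -= step
        ea = pointer
    elif submode == 5:                # pre increment: add, then use
        pointer += step
        ea = pointer
    else:
        raise ValueError("submode must be 1..5 for pointer-based access")
    registers[reg] = pointer & 0xFFFF
    return ea & 0xFFFF
```

Submode 0 (register direct) is left out because it does not form a memory pointer.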
Architectural Description
The foregoing illustrative example utilizes the disclosure provided above for various descriptions. The illustrative example of the central processing unit disclosed herein is a 16-bit (data) modified Harvard architecture with a greatly enhanced instruction set including significant support for digital signal processing (DSP). The foregoing description is better understood with reference to
Core Overview
The core has a 24-bit instruction word, with a variable length opcode field. The PC is 24-bits wide (with the LS-bit always clear) addressing up to 8M long words (23-bits). A ‘C18-like’ instruction prefetch mechanism is used to help maintain throughput. Deeper levels of pipelining have been intentionally avoided to maintain good real-time performance. Unconditional overhead free program loop constructs are supported using the DO and REPEAT instructions, both of which are interruptible at any point.
The working register array has been extended to 16×16-bit registers, each of which can act as data, address or offset registers. One working register (W15) operates as a software stack for interrupts and calls.
The data space is 32K words of word or byte addressable space which is split into two blocks referred to as X and Y data memory. Each block has its own independent Address Generation Unit (AGU). Most instructions operate solely through the X memory AGU which will make it appear as one linear space encompassing all data space. The MAC class of DSP instructions will operate through both the X and Y AGUs, splitting the data address space into two parts. The X and Y data space boundary is arbitrary and defined through the address decode of each memory array. See
The upper 32K bytes of data space memory can optionally be mapped into the lower half (user space) of program space at any 16K program word boundary defined by the 8-bit Data Space Program PAGe (DSPPAG) register. This lets any instruction access program space as if it were data space (other than the additional access cycle it consumes), and allows external RAM hooked onto the external program space to be mapped into data space, effectively providing an external data space bus.
Overhead free circular buffers (modulo addressing) are supported in both X and Y address spaces. They are intended to remove the loop overhead for DSP algorithms but X modulo addressing can be universally applied using any instructions.
The X AGU also supports bit reverse addressing to greatly simplify input or output data reordering for radix-2 FFT algorithms.
The core supports inherent (no operand), relative, literal, memory direct and four groups of addressing modes (MODE 1, MODE 2, MODE 3 and MODE 4) for register direct and register indirect modes. Each group offers up to six addressing modes. Instructions are associated with predefined addressing modes depending upon their functional requirements.
For most instructions, the core is capable of executing a data (or program data) memory read, a working register (data) read, a data memory write and a program (instruction) memory read per instruction cycle. As a result, three operand instructions can be supported, allowing A+B=C operations to be executed in a single cycle.
A DSP engine has been included to significantly enhance the core arithmetic capability and throughput. It features a high speed 16-bit by 16-bit multiplier, a 40-bit ALU, two 40-bit saturating accumulators and a 40-bit bidirectional barrel shifter. The barrel shifter is capable of shifting a 40-bit value up to 15 bits right or up to 16-bits left in a single cycle. The DSP instructions operate seamlessly with all other instructions and have been designed for optimal real-time performance. The MAC class of instructions can concurrently fetch two data operands from memory while multiplying two W registers. This requires that the data space be split for these instructions and linear for all others. This is achieved in a transparent and flexible manner through dedicating certain working registers to each address space for the MAC class of instructions.
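As an illustration of a multiply-accumulate step into a saturating 40-bit accumulator, consider the following Python sketch. The source does not state the saturation limits or the operand format, so the sketch assumes signed 16-bit integer operands and clamping at the full signed 40-bit range; both are assumptions, and the names are hypothetical.

```python
ACC_BITS = 40
ACC_MAX = (1 << (ACC_BITS - 1)) - 1    # largest signed 40-bit value
ACC_MIN = -(1 << (ACC_BITS - 1))       # smallest signed 40-bit value


def to_signed16(value):
    """Interpret the low 16 bits of `value` as a two's complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def saturating_mac(acc, x, y):
    """Return acc + x*y, clamped to the assumed signed 40-bit range."""
    result = acc + to_signed16(x) * to_signed16(y)
    return max(ACC_MIN, min(ACC_MAX, result))
```

A product that would overflow the accumulator leaves it pinned at the nearest limit rather than wrapping around.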
The core features a vectored exception scheme with 15 individually prioritized vectors. The exceptions consist of reset, seven traps and eight interrupts. One interrupt level may be selected (typically the highest one) to execute as a fast (1 cycle entry, 1 cycle exit) interrupt. This function is actually an extension of the logic required to allow a REPEAT instruction loop to be interrupted which can significantly reduce latency in some applications. A block diagram of the core is shown in
Compiler Driven Enhancements
In addition to DSP performance requirements, the core architecture was strongly influenced by recommendations which would lead to a more efficient (code size and speed) C compiler.
For most instructions, the core is capable of executing a data (or program data) memory read, a working register (data) read, a data memory write and a program (instruction) memory read per instruction cycle. As a result, three operand instructions can be supported, allowing A+B=C operations to be executed in a single cycle. Instruction addressing modes are significantly more flexible than those of other processors, and are matched closely to compiler needs. The working register array has been extended to 16×16-bit registers, each of which can act as data, address or offset registers. One working register (W15) operates as a software stack for interrupts and calls.
Linear indirect access of all data space is possible, plus the memory direct address range has been extended to 8 Kbytes (256 bytes in C18). This, together with the addition of 16-bit direct address LOAD and STORE instructions, has allowed the C18 data space memory banking scheme to be eliminated. Linear indirect access of 32K word (64K byte) pages within program space (user and test space) is possible using any working register via new table read and write instructions. Part of data space can be mapped into program space, allowing constant data to be accessed as if it were in data space.
Instruction Fetch Mechanism
The core does not support an instruction pipeline. A pre-fetching mechanism accesses instructions a cycle ahead to maximize available execution time. Most instructions execute in a single cycle. Exceptions are:
Most instructions access data as required during instruction execution. Instructions which utilize the multiplier array must have data available at the beginning of the instruction cycle. Consequently, this data must be prefetched, usually by the preceding instruction, resulting in a simple out of order data processing model.
Data Address Space
The core features one program space and two data spaces. The data spaces can be considered either separately (for some DSP instructions) or together as one linear address range (for MCU instructions). The data spaces are accessed using two Address Generation Units (AGUs) and separate data paths.
Data Spaces
The X AGU is used by all instructions and supports all addressing modes. It also supports modulo and bit reversed addressing for any instructions subject to addressing mode restrictions (see Modulo and Bit Reversed Addressing Controller). The X data path is the return data path for all single data space access instructions.
The Y AGU and data path are used in concert with the X AGU by the MAC class of instructions to provide two concurrent data read paths. No writes occur across the Y-bus. This class of instructions dedicates two W register pointers, W6 and W7, to always operate through the Y AGU and address Y data space independently from X data space. Note that during accumulator write-back, the data address space is considered combined X and Y, so the write will occur across the X-bus. Consequently, it can be to any address irrespective of where the EA is directed.
The Y AGU only supports MODE 4 post modification addressing modes associated with the MAC class of instructions. It also supports modulo addressing for automated circular buffers. Of course, all other instructions can access the Y data address space through the X AGU when it is regarded as part of the composite linear space.
The boundary between the X and Y data spaces is arbitrary and is defined by the memory address decode only (the CPU has no knowledge of the physical location of X or Y memory). The boundary is not user programmable but may change from variant to variant. Obviously, to present a linear data space to the MCU instructions, the address spaces of X and Y data spaces must be contiguous but this is not an architectural necessity. Note that any memory located between 0x8000 and 0xFFFF will not be accessible when program space visibility is enabled for this address space. It should be noted that as address space 0x8000 to 0xFFFF can map to a single memory in program space, it may need to be assigned to either X or Y space (but not both since concurrent accesses from the same space are not possible).
All effective addresses (EAs) are 16-bits wide and point to bytes within the data space to facilitate backward compatibility with previous processors. Consequently, the data space address range is 64K bytes or 32K words.
Data Space Width
The core data width is 16-bits. All internal registers and data space memory are organized as 16-bits wide (some CPU registers are not 16-bits wide, see
Data Alignment
To help maintain backward compatibility and improve data space memory usage efficiency, the ISA supports both word and byte operations. Referring to
Byte reads will always read the entire word, so mechanisms to clear or set peripheral status bits when read (e.g. quick flag clearing mechanisms) are not allowed. As a consequence of this byte accessibility, all effective address calculations (including those generated by the DSP operations which are restricted to word size) must be scaled to step through word aligned memory. For example, the core must recognize that post modified register indirect addressing mode, [Ws]+=1, will result in a value of Ws+1 for byte operations and Ws+2 for word operations.
All word accesses must be aligned (to an even address). Mis-aligned word data fetches are not supported, so care must be taken when mixing byte and word operations or translating from C18 code. Should a mis-aligned read or write be attempted, an address fault trap will be forced. Depending upon where the fault occurred in the instruction cycle, the Q1/Q2 access (typically a read) and/or the Q3/Q4 access (typically a write) for the instruction underway will be inhibited, and the PC will not be incremented. The trap will then be taken, allowing the system and/or user to examine the machine state prior to execution of the address fault.
All byte loads into any W register are loaded into the LS-byte. The MS-byte is not modified. It should be noted that byte operations use the 16-bit ALU and can produce results in excess of 8-bits. However, to maintain C18 backwards compatibility, the ALU result from all byte operations is written back as a byte (i.e. MS byte not modified), and the status register is updated based only upon the state of the LS-byte of the result.
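The byte write-back rule can be shown with a short Python sketch of a byte-sized addition. It is illustrative only: the source says the status register reflects the LS-byte of the result but does not list the flags in this section, so the Z, N and C flags modelled here are assumptions, as is the function name.

```python
def byte_add(w_reg, operand):
    """Add the LS-bytes of two values; write back only the LS-byte.

    Returns the updated 16-bit register value and a dictionary of
    assumed status flags derived from the LS-byte of the result.
    """
    full = (w_reg & 0xFF) + (operand & 0xFF)     # may exceed 8 bits
    ls_byte = full & 0xFF
    new_reg = (w_reg & 0xFF00) | ls_byte         # MS-byte is not modified
    flags = {
        "Z": ls_byte == 0,                       # zero, judged on the LS-byte only
        "N": bool(ls_byte & 0x80),               # sign of the LS-byte
        "C": full > 0xFF,                        # carry out of bit 7
    }
    return new_reg, flags
```

Adding 0x01 to a register holding 0x12FF therefore yields 0x1200, not 0x1300.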
A sign extend (SE) instruction is provided to allow users to translate 8-bit signed data to 16-bit signed values. Alternatively, for 16-bit unsigned data, users can clear the MS-byte of any W register by executing a CLR.b instruction on the appropriate address.
Although most instructions are capable of operating on word or byte data sizes, it should be noted that the DSP and some other new instructions operate on words only.
Data Space Memory Map
The data space memory is split into two blocks, X and Y data space. A key element of this architecture is that Y space is a subset of X space, and is fully contained within X space. In order to provide an apparent linear addressing space, X and Y space would typically have contiguous addresses (though this is not an architectural necessity).
When executing any instruction other than a MAC class one, the X block consists of the entire 64Kbyte data address space (including all Y addresses). When executing a MAC class of instruction, the X block consists of the entire 64Kbyte data address space less the Y address block for data reads (only). In other words, the full address space is available to all instructions other than the MAC class. During Q1/Q2 data reads, the MAC class of instructions extracts the Y address space from data space and addresses it using EA's sourced from W6 and W7. The remaining data space is referred to as X space but could more accurately be described as “X-Y” space, and is concurrently addressed using W4 and W5 during the same Q1/Q2 data read portion of the cycle. Both “X-Y” and Y address spaces are concurrently accessed only by the MAC class of instruction.
Note that it is the register number (and instruction class) that determine which address space is accessed for data reads and not the EA. Consequently, the data space partitioning of Y address space is arbitrary. In all cases, should an EA point to unoccupied space, all zeros will be returned. For example, although Y address space is visible by all non-MAC class instructions using any addressing mode, an attempt by a MAC instruction to fetch data from that space using W4 or W5 (X space pointers) will return 0x0000.
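The register-based selection of address space during MAC class data reads can be modelled as follows. This Python sketch is not the address decode logic: the Y block boundaries are hypothetical values chosen for the example, memory is a sparse dictionary, and it assumes that a Y pointer (W6 or W7) aimed outside the Y block also reads as unoccupied space.

```python
Y_START, Y_END = 0x1800, 0x1FFF   # hypothetical Y block; the real boundary is device specific


def in_y_space(ea):
    """True when the EA falls inside the assumed Y block."""
    return Y_START <= ea <= Y_END


def mac_operand_read(memory, reg, ea):
    """Model a MAC class data read; the register number picks the space."""
    if reg in ("W4", "W5"):          # X pointers see data space less the Y block
        visible = not in_y_space(ea)
    elif reg in ("W6", "W7"):        # Y pointers see only the Y block (assumption)
        visible = in_y_space(ea)
    else:
        raise ValueError("MAC class reads use W4..W7 only")
    return memory.get(ea, 0x0000) if visible else 0x0000


def mcu_read(memory, ea):
    """Any non-MAC instruction sees one linear space, Y block included."""
    return memory.get(ea, 0x0000)
```

A location inside the Y block is readable through `mcu_read` and through W6 or W7, but a MAC read through W4 or W5 at the same EA returns 0x0000.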
An example data space memory map is shown in
An 8Kbyte access space is reserved in X address memory space between 0x0000 and 0x1FFF which is directly addressable via a 13-bit absolute address field within all memory direct instructions. The remaining X address space and all of the Y address space is addressable indirectly. The whole of X data space is additionally addressable using LDW and STW instructions which support memory direct addressing with a 16-bit address field.
Program Space Visibility from Data Space
The upper 32Kbytes of data space may optionally be mapped into any 16Kword program space page. This provides transparent access of stored constant data from X data space without the need to use special instructions (i.e. TBLRD, TBLWT instructions). Granularity of program space window may change, subject to conclusions of code security analysis.
This feature also allows the user to map the upper half of data space into an unused area of program memory and thus to the external bus (all unused internal addresses will be mapped externally). Through the placement of an external RAM at this address, external data space support is also provided. Data reads and writes must therefore be supported to this address space. Note that the external address map is now essentially no longer strictly Harvard as program and data memory are combined.
Program space access through the data space occurs if the MS-bit of the data space EA is set and program space visibility is enabled by setting the PSV bit in the Core Control register (“CORCON”). Most of the CORCON functions relate to DSP operation. Depending upon FLASH setup and access time, the instruction may need to be at least partially pre-decoded during Q4 of the prior instruction. Even so, this will remain a critical path, as the source EA cannot be evaluated until the data write completes in the prior instruction.
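One plausible reading of the mapping is sketched below in Python. The exact bit assignment is not given in the surviving text, so this is an assumption: the 8-bit DSPPAG value supplies program address bits 22:15, the lower 15 bits of the data EA supply bits 14:0 (bit 0 acting as the byte select), and bit 23 is zero because the window lies in user space. The function name is hypothetical.

```python
def psv_program_address(data_ea, dsppag, psv_enabled=True):
    """Map a data space EA into program space, or return None.

    None means the access is an ordinary data space access: either
    visibility is disabled or the MS-bit of the EA is clear.
    """
    if not psv_enabled or not (data_ea & 0x8000):
        return None
    page = (dsppag & 0xFF) << 15       # assumed placement of the page bits
    offset = data_ea & 0x7FFF          # 32K byte window offset
    return page | offset
```

Under these assumptions each DSPPAG step moves the window by 0x8000 program addresses, which is 16K program words because the program address advances by two per word.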
Data accesses to this area will add an additional cycle to the instruction being executed since two program memory fetches will be required. The data is fetched in the first cycle, which, other than for some instruction decode, is essentially a NOP. The next instruction is prefetched in the second cycle while the current instruction completes execution (i.e. normal operation) as shown in
Furthermore, instructions executing from internal program memory but accessing external data memory RAM will sustain additional delay due to wait state insertion. Read-modify-write operations will sustain twice the delay. The External Bus Interface (EBI) definition is not complete at this time; however, it is expected that the device will be required to insert an even number of Q clocks into the instruction cycle between Q2 and Q3, and between Q4 and Q1 (of the next cycle) for external data space accesses.
Although not an architectural necessity, a typical data space configuration would define Y data space to be outside this re-mappable area, making the visible program space map to X data space. Y data space will typically contain state (variable) data for DSP operations, and must therefore be RAM. X data space will typically contain coefficient (constant) data which could be NVM or initialized RAM.
Although each transparent data space address will map directly into a program address (see
For external accesses, data space would only require a 16-bit data path, with the trap instruction being automatically concatenated onto any 16-bit data reads.
The data space address is mapped into program memory as shown in
Data Pre-Fetch from Program Space within a REPEAT Loop
When prefetching data resident in program space via the data space window from within a REPEAT loop, all iterations of the repeated instruction will reload the instruction from the Instruction Latch without re-fetching it, thereby releasing the program bus for a data prefetch as shown in
It is important to note that only the MAC class of instructions, which operate with prefetched data, will operate in this manner. All other instructions (e.g. MOV) which require data to be read by the end of Q2 will require the additional cycle in order to complete the data read prior to execution of the instruction during the second cycle.
Program Address Space
The program address space is 8M long words. It is addressable by a 24-bit value from either the PC, table instruction EA or data space EA when program space is mapped into data space as defined by Table 186. Note that the program space address is incremented by two between successive program words in order to provide compatibility with data space addressing. Consequently, the LS-bit of the program space address is always 0, resulting in 23-bits (8M) of address. Program space data accesses use the LS-bit of the program space address as a byte select (same as data space). Memory mapped or stacked PC may need to include the zero LS-bit.
The address space is split into two 4M long word spaces, one for user space the other for test and vector memory space as shown in
The program memory width is 24-bits (long word). To support data storage and FLASH programming, the array must support both word wide access from bits 0-15 and byte wide access from bits 16-23. An instruction fetch example is shown in
There are two methods by which program space can be accessed—via special TABLE instructions or through the remapping of a 16Kword program space page into the upper half of data space (see
Table Instructions
A set of TABLE instructions is provided to move byte or word sized data to and from program space. The instructions are orthogonal even though the MS byte will always read zeros. See dsPIC Instruction Set DOS for more details.
The PC is incremented by two for each successive 24-bit program word. This allows program memory addresses to directly map to data space addresses as shown in
For all the table instructions, the calculated EA (using MODE 2 addressing modes) is concatenated with the 8-bit data table page register, TABPAG<7:0>, to form a 23-bit effective program space address plus a byte select for program memory as shown in
The LS-bit of the calculated EA becomes the byte select and is used by TBLRDL and TBLWRL (see Program Memory DOS-00204) to select which byte is accessed. The TBLRDL and TBLWRL instructions therefore view program space as byte or aligned word addressable, 16-bit wide, 64K byte pages (i.e. same as data space). EA[0] is ignored for word wide accesses.
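The address formation for table instructions can be sketched in Python as a simple concatenation. This is an illustrative model with a hypothetical function name; it assumes TABPAG occupies the upper eight bits of the 24-bit value and the 16-bit EA the lower sixteen.

```python
def table_address(tabpag, ea):
    """Concatenate TABPAG<7:0> with a 16-bit EA.

    Returns the 24-bit program space address, the 23-bit program word
    index (address bits 23:1) and the byte select (address bit 0).
    """
    address = ((tabpag & 0xFF) << 16) | (ea & 0xFFFF)
    return {
        "address": address,
        "word_index": address >> 1,
        "byte_select": address & 1,
    }
```

Each TABPAG value therefore selects one 64K byte page, within which the EA behaves exactly like a data space address.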
The TBLRDH and TBLWRH instructions are used to access the high order byte of the program address. These instructions also support word or byte access for orthogonality but the high order byte of the program address can only be read from the LS byte as shown in
It is assumed that, for most applications, the high byte (P[23:16]) will not be used for data, making the program memory appear 16-bits wide for data storage. It is intended that the high byte contain an illegal opcode trap to protect the device from accidental execution of stored data. The TBLRDH and TBLWRH instructions are primarily provided for array program/verification purposes and for those applications that need to compress data storage.
HEX Data File Compatibility
The program space data access described above can be made compatible with HEX format data files by regarding the program memory as 32-bits wide. Inserting the ‘phantom’ byte as shown in
HEX File Compatibility
External Bus Support
As discussed herein, program space is 24-bits wide which will require either a mix of external FLASH devices to provide all 24-bits in one bus cycle, or several cycles to fetch the 24-bit word in either 8-bit or 16-bit sections. The External Bus Interface (EBI) module will attempt to provide the user maximum flexibility in this area.
Data access is potentially somewhat simpler as the fundamental data size is 16-bits. To permit single (bus) cycle, 16-bit wide external memory access, the EBI may optionally be configured to read from a 16-bit external bus and then automatically concatenate an 8-bit trap field prior to passing the 24-bit pword to the CPU. A 16-bit external data bus can therefore be provided for data storage without compromising device robustness. The unused portion of the external bus data path can also revert back to I/O.
Clocking Scheme
Each instruction cycle (Tcy) is comprised of four Q cycles (Q1-Q4). These Q clocks are derived using simple logic (i.e. there is no requirement to make them non-overlapping) within the core (and each peripheral module) from global QA and QB quadrature clocks. The quadrature clocks are generated by the PLL module. Maintaining minimal skew between QA and QB across the device will be a critical factor in attaining the target performance. The four phase Q cycles provide the timing/designation for the Decode, Read, Process Data, Write etc., of each instruction cycle.
Each instruction will show the detailed Q cycle operation for the instruction. Although most instructions follow the scheme above, some issue two reads, others two writes per cycle. From a Q cycle perspective, the DSP instructions differ from MCU instructions inasmuch as a DSP instruction can perform two simultaneous source data reads during the Q1/Q2 access from X and Y data space.
Instruction Cycle Timing
Internally, the program address latch is updated at the start of every Q1, and the instruction is fetched from the program memory and latched into the ROMLATCH using Q4. The PC is actually adjusted (incremented or loaded) during Q4 of the previous cycle but not transferred into the program address latch until the next instruction has started.
The instruction is decoded and executed during the following Q1 through Q4. The instruction is decoded during Q1, though some pre-decode of register and addressing mode bit fields during the prior Q4 may be necessary to speed up execution. Care should be taken with any pre-decoding of the instruction to avoid issues (e.g., having to add extra cycles) during interrupt or call returns.
There are two independent data space accesses to (possibly) two different addresses during each instruction cycle. During Q1, the remainder of the instruction decode is performed and the source operand EA is calculated. During Q2, the source operand data is fetched from memory or peripherals. The ALU performs the computation during Q3 at the same time as the destination EA is calculated in one of the AGUs. During Q4, the results are written to the destination location.
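The four-phase sequence for a simple read-modify-write instruction can be traced with the following Python sketch. It is only a behavioural model: memory is a dictionary, the ALU operation is passed in as a function, and the trace format and function name are hypothetical.

```python
def execute_cycle(memory, src_ea, dst_ea, operation):
    """Step one instruction through Q1..Q4 and return a phase trace."""
    trace = []

    # Q1: finish decode and calculate the source operand EA.
    trace.append(("Q1", "source EA", src_ea))

    # Q2: fetch the source operand from memory or peripherals.
    operand = memory.get(src_ea, 0)
    trace.append(("Q2", "read", operand))

    # Q3: ALU computes while the destination EA is calculated.
    result = operation(operand) & 0xFFFF
    trace.append(("Q3", "destination EA", dst_ea))

    # Q4: write the result to the destination location.
    memory[dst_ea] = result
    trace.append(("Q4", "write", result))
    return trace
```

Calling `execute_cycle(memory, 0x100, 0x102, lambda v: v + 1)` reads one location and writes the incremented value to another within a single modelled cycle.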
The clocks and instruction execution flow are shown in
Instruction Flow/Pipelining
An “Instruction Cycle” consists of four Q cycles (Q1, Q2, Q3, and Q4). The instruction fetch and execute are pipelined such that fetch takes one instruction cycle while decode and execute takes another instruction cycle. However, due to this prefetch mechanism, each instruction effectively executes in one cycle.
Instruction Flow Types
There are five types of instruction flows.
The dsPIC core supports both REPEAT and DO instruction constructs to provide unconditional automatic program loop control.
The REPEAT instruction will cause the instruction immediately following to be repeated a fixed number of times as defined by a 14-bit literal encoded in the instruction. The REPEATW instruction will cause the instruction immediately following it to be repeated a fixed number of times as defined by the contents of a W register declared within the instruction, enabling the loop count to be a variable. The loop count is held in the 16-bit RCOUNT register (which is memory mapped) and is thus user accessible. It is initialized by the REPEAT[W] instruction during Q2.
The instruction to be repeated is prefetched during the REPEAT[W] instruction and held in the ROMLATCH. It is not fetched again for all subsequent iterations, and the Instruction Register is loaded from the locked ROMLATCH.
For a loop count value equal to 1, REPEAT[W] has the effect of a NOP (other than RCOUNT being loaded with 1). The RA (Repeat Active) status bit in the SR is not set during execution of REPEAT[W] and the PC is incremented as would normally be the case during Q4 of an instruction. The repeat loop is essentially disabled before it begins, allowing the next instruction to execute only once while pre-fetching the subsequent instruction (i.e. normal execution flow).
For loop count values greater than 1, the PC is not incremented as would normally be the case during Q4 of an instruction (and will therefore continue to point to the instruction to be repeated). Further PC increments are inhibited until the loop ends. The RA (Repeat Active) status bit in the SR is also set during execution of REPEAT[W]. See
The RCOUNT register is decremented then tested during each instruction iteration. It will equal two at the beginning of the penultimate instruction. The subsequent decrement will make RCOUNT=1, signifying the end of the repeat loop, which causes the RA bit in the SR to be cleared. In addition, the PC increment inhibit is released and the PC bumps in Q4 of this instruction to point to the instruction after the repeated instruction. The last instruction to be repeated is then executed as a normal instruction (i.e. includes an instruction prefetch & PC bump). Testing for the end of loop during the penultimate instruction is required to allow a normal instruction prefetch to occur during the last iteration (i.e. no delays due to ‘end of loop’ tests).
A consequence of executing the last instruction outside the repeat loop is that the loop will effectively iterate [loop count+1] times (i.e. a loop count of 0 is not possible). Choosing the loop termination count value to equal one enables the loop count and number of iterations to match for all but RCOUNT equal to zero. For a loop count value of 0, REPEAT will iterate the next instruction 16384 times and REPEATW will iterate the next instruction 65536 times.
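The counting behaviour described for REPEAT[W] can be checked with a small Python simulation. It is a sketch under stated assumptions: the counter is modelled as wrapping at the width of the loop count source (14 bits for the REPEAT literal, 16 bits for REPEATW), which is inferred from the iteration totals quoted for a count of 0, and the function name is hypothetical.

```python
def repeat_iterations(loop_count, counter_bits):
    """Count how many times the repeated instruction executes."""
    modulus = 1 << counter_bits
    rcount = loop_count % modulus

    # A count of 1 never sets RA: the next instruction runs once.
    if rcount == 1:
        return 1

    iterations = 0
    repeat_active = True               # RA bit set by REPEAT[W]
    while repeat_active:
        iterations += 1                # one execution inside the loop
        rcount = (rcount - 1) % modulus
        if rcount == 1:                # end of loop detected; RA cleared
            repeat_active = False

    # The last iteration executes as a normal instruction outside the loop.
    return iterations + 1
```

Under this model the iteration total equals the loop count for every nonzero count, and a count of 0 wraps to the maximum.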
The combined instruction flow diagram for REPEAT[W] and DO[W] is shown in
A REPEAT instruction loop may be interrupted at any time. As is the case for all instructions, the PC update is arranged such that it will not be incremented during the instruction when an exception is acknowledged. For a repeated instruction, the PC update is already inhibited (by the RA bit) which ensures that, upon return, the RETFIE instruction will correctly prefetch said instruction (i.e. the stacked PC will point to the instruction to be repeated).
Exception processing proceeds as normal, except for a fast interrupt acknowledgment where the contents of the Instruction Latch are transferred into a temp register (IR Temp). This occurs irrespective of the state of the RA bit and is not related to the REPEAT operation. Standard exception processing completes and the ISR is executed as normal in either case.
Note that, in order to interrupt a REPEAT in progress, the LS-byte of the SR (SRL, which includes the RA bit) is stacked during exception processing. This preserves the state of the RA bit prior to interruption. The RA bit in the SR is then cleared, also during exception processing. In addition, the RCOUNT register has a shadow register associated with it which is loaded during exception processing (any exception, not just for a fast interrupt). This, in conjunction with the preservation of the RA bit (SRL stacked), permits another REPEAT instruction to be executed within the initial interrupt service routine (i.e. any ISR provided interrupt nesting is not enabled).
Should interrupt nesting be enabled, subsequent interrupts must stack the RCOUNT register before another REPEAT loop may be executed from within the ISR. If RCOUNT is stacked, the RA preservation feature will also operate for all subsequent nested interrupts. Note that RCOUNT must be restored prior to returning from the ISR. The RA bit is restored automatically during interrupt return processing. Also note that for nested interrupts, the most efficient method of handling REPEAT instructions within ISRs will be to always stack RCOUNT.
Interrupt return operates as normal and requires no special handling for returning into a REPEAT[W] loop. Normal interrupts will prefetch the repeated instruction during the second cycle of the RETFIE. Return from a fast interrupt will reload the Instruction Latch from the IR Temp register and execute the next repeat iteration during the second cycle. The stacked RA bit will be restored when the SRL register is popped and, if set, the interrupted REPEAT loop will be resumed. Clearing the RA bit in the stacked SR from within an ISR is a method to force an interrupted loop to terminate (subject to one more iteration) after the interrupt returns. RA is not software modifiable within the SR.
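The preservation of REPEAT state across an exception may be sketched in C as follows. This is a simplified model under stated assumptions: the type `core_t`, the function names, and the bit position `SRL_RA` are hypothetical, a single variable stands in for the stacked copy of SRL, and copying the shadow back into RCOUNT on return is assumed rather than stated in the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SRL_RA 0x10u  /* hypothetical bit position of RA within SRL */

typedef struct {
    uint8_t  srl;            /* lower status byte, holds the RA bit */
    uint16_t rcount;         /* repeat loop counter */
    uint16_t rcount_shadow;  /* loaded on every exception */
    uint8_t  stacked_srl;    /* stands in for the SRL copy on the stack */
} core_t;

/* Exception entry: stack SRL, clear RA, capture RCOUNT in its shadow. */
void exception_enter(core_t *c)
{
    assert(c != 0);
    c->stacked_srl   = c->srl;
    c->srl          &= (uint8_t)~SRL_RA;
    c->rcount_shadow = c->rcount;
}

/* Exception return: popping SRL restores RA. Restoring RCOUNT from the
 * shadow is an assumption of this sketch. */
void exception_return(core_t *c)
{
    assert(c != 0);
    c->srl    = c->stacked_srl;
    c->rcount = c->rcount_shadow;
}

/* ISR-side technique from the text: clear RA in the stacked copy so the
 * interrupted loop does not resume after the return. */
void isr_cancel_repeat(core_t *c)
{
    assert(c != 0);
    c->stacked_srl &= (uint8_t)~SRL_RA;
}

bool repeat_active(const core_t *c)
{
    assert(c != 0);
    return (c->srl & SRL_RA) != 0;
}
```

In this model an ISR may run its own REPEAT (changing `rcount`) without disturbing the interrupted loop, provided interrupt nesting is not enabled.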
The DO & DOW instructions will execute instructions following the DO[W] until an end address is reached at which time instruction execution will start again at the instruction immediately following the DO[W]. This will be repeated a finite number of times as defined by either a 14-bit literal encoded in the 1st word of the instruction (for DO) or by the contents of a W register declared within the instruction (for DOW), enabling the loop count to be a variable. The instruction execution order need not be sequential, nor does the loop end address have to be greater than the start address.
Referring to
The loop start address (PC) is stored in the DOSTART register during Q2 of the second cycle. The two cycle DO[W] instruction then calculates the end address by executing a 23-bit signed addition of the current PC[23:1] (which points to the first loop instruction) and a signed 16-bit literal offset encoded within the 2nd word of the DO[W] instruction. This is executed using the MCU ALU during Q1 and Q3 of the 2nd cycle. The loop end address is stored in the DOEND register during Q4. The DOEND and DOSTART registers are closely coupled with the PC as shown in
The DO[W] literal address offset is such that the end address is calculated to be the last instruction within the loop. This will cause a valid PC address compare during the Q1 compare operation of the penultimate instruction (i.e. during the prefetch of the last instruction). This will then enable the loop counter to be decremented and tested, and the result combined with the address compare during Q3 of the same instruction.
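The end address calculation may be illustrated with a short C sketch. The function name `do_end_address` is hypothetical, and the sketch assumes that the signed 16-bit literal is expressed in the same units as PC[23:1]; the description does not state the units explicitly.

```c
#include <assert.h>
#include <stdint.h>

#define PC_WORD_MASK 0x7FFFFFu  /* 23 bits, corresponding to PC[23:1] */

/* pc_word: PC[23:1] of the first loop instruction.
 * offset:  signed 16-bit literal from the second word of DO[W]; assumed
 *          here to use the same units as PC[23:1]. */
uint32_t do_end_address(uint32_t pc_word, int16_t offset)
{
    assert(pc_word <= PC_WORD_MASK);

    /* Sign-extend the literal to 32 bits, add, then keep 23 bits. */
    uint32_t sum = pc_word + (uint32_t)(int32_t)offset;
    return sum & PC_WORD_MASK;
}
```

Because the offset is signed, the computed end address may be lower than the start address, consistent with the statement that the loop end address need not be greater than the start address.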
If the loop counter after decrement does not equal 1 (as shown in
If the loop counter after decrement equals 1 (as shown in
The DO loop is equivalent to the ‘C’ construct DO-WHILE which implies that the loop will be executed at least once. Choosing the loop termination count value to equal one enables the loop count and number of iterations to match for all DCOUNT values except zero.
For a DCOUNT loop count value of 0, DO will iterate the loop 16384 times and DOW will iterate the loop 65536 times. The loop end comparison may be an equality test only. The loop end address must be pre-fetched in order for the end of loop condition to be recognized. That is, exiting the loop to a PC value greater than the end address (or less than the start address) will not cause the loop count to change.
The combined instruction flow diagram for REPEAT[W] and DO[W] is shown in
The DOSTART, DOEND and DCOUNT loop registers each have a shadow register associated with them, which permits a single level of nesting. In addition, as the DOSTART, DOEND and DCOUNT registers are user accessible, they may be manually saved to permit additional nesting. However, it should be noted that the overhead associated with manually saving these registers outweighs the benefits of additional DO loop nesting with the possible exception of a DO loop within an interrupt.
When a DO is executed, the DOSTART, DOEND and DCOUNT registers are transferred into the shadow registers prior to being updated with the new loop values. The DA bit is also shadowed prior to being set during DO execution. These operations occur for all DO instruction executions, whether nested or not. Similarly, during all loop exits, the shadow contents of the DOSTART, DOEND and DCOUNT registers and the DA bit are transferred back into their respective host registers.
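The shadowing described above may be sketched in C as a pair of copy operations. The types `do_regs_t` and `do_unit_t` and the function names are hypothetical. The sketch models only the transfers named in the description (live to shadow on DO execution, shadow to live on loop exit) and makes no claim about what the shadow registers hold after an exit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t dostart;  /* loop start address */
    uint32_t doend;    /* loop end address */
    uint16_t dcount;   /* loop counter */
    bool     da;       /* DO Active status bit */
} do_regs_t;

typedef struct {
    do_regs_t live;    /* DOSTART, DOEND, DCOUNT and DA */
    do_regs_t shadow;  /* the single shadow level */
} do_unit_t;

/* DO[W] execution: shadow the live set first, then load the new loop. */
void do_enter(do_unit_t *u, uint32_t start, uint32_t end, uint16_t count)
{
    assert(u != 0);
    u->shadow = u->live;
    u->live = (do_regs_t){ start, end, count, true };
}

/* Loop exit: the shadow contents return to the live registers. */
void do_exit(do_unit_t *u)
{
    assert(u != 0);
    u->live = u->shadow;
}
```

With one shadow level, an inner loop can exit and leave the outer loop's registers intact; deeper nesting would require the registers to be saved manually.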
DO Loops and Interrupts
A DO[W] loop may be interrupted at any time without penalty. Note that, in order to suspend an interrupted DO loop during execution of an ISR, the LS-byte of the SR (SRL, which includes the DA bit) is stacked then cleared (in the SR) during exception processing. Although this is not essential because the DO loop end address is unlikely to be encountered during the ISR, it is consistent with REPEAT operation. If a background DO loop was active (stacked DA bit set), the DOSTART, DOEND and DCOUNT registers must then be stacked before another DO loop may be executed from within the ISR. This applies to any interrupt class. These registers must be restored prior to returning from the ISR. Executing a DO within an interrupt therefore requires stacking and restoring five words of data. This overhead may mean DO is not the most efficient means for loop control within an ISR.
Interrupt return operates as normal and requires no special handling for returning into a DO[W] loop. The stacked DA bit will be restored into the SRL register and, if set, the interrupted DO loop will resume. Clearing the DA bit in the stacked SR from within an ISR is a method to force an interrupted loop to terminate early after the interrupt returns. The loop will complete the iteration underway and then terminate. If the interrupt occurs during the penultimate or last instruction of the loop, one more iteration of the loop will occur. DA is not software modifiable within the SR.
DO and REPEAT Restrictions
Any instruction can follow a REPEAT except for:
As it is not especially useful to execute any of these instructions within a repeat loop, the restrictions on this instruction are minimal.
REPEAT is interruptible and can then be nested from within an initial (first, unnested) ISR. If interrupt nesting is enabled, REPEAT can be nested from within any ISR but only after the user stacks the appropriate registers manually (all REPEAT control registers are user accessible).
All DO loops must contain at least 2 instructions because the loop termination tests are performed in the penultimate instruction. REPEAT should be used for single instruction loops. All other restrictions with regard to the DO loop revolve around the last instruction. With the notable exception of CALLW, the last instruction should not be:
If at all possible, the assembler should be capable of flagging these instructions if placed at the end of a DO loop.
The (one word) CALLW will function correctly at the end of a DO loop because the stacked PC will address the start of loop instruction (to fetch upon return).
PC relative instructions (e.g. RCALL, branches) won't work correctly at the end of a loop because the PC calculation will be performed using the current PC value which will be the loop start address. That is, the assembler pseudo-PC and the real PC do not match at this point.
Should execution of a REPEAT[W] instruction as the last loop instruction be attempted, the DO[W] loop counter will take priority and the REPEAT target instruction will never be executed before the DO[W] loop jumps to the loop start. Should the last loop instruction be the instruction being repeated within a REPEAT loop, the DO[W] loop counter will also take priority and the REPEAT target instruction will only execute once with no change to RCOUNT before the DO[W] loop jumps to the loop start.
Two-word instructions will fail if placed at the end of a DO loop because the PC is adjusted in the penultimate instruction in order to accommodate the instruction prefetch (without a dead cycle). Consequently, the second word of a two-word instruction would be incorrectly fetched from the loop start address.
RETURN and RETLW will work correctly when used as the last instruction of a DO loop, but the user is responsible for returning into the loop to complete it.
Programmer Model
The programmer's model is shown in
Most of these registers have a shadow register associated with them as shown in
Byte instructions which target the working register array only affect the least significant byte of the target register. However, a consequence of memory mapped working registers is that both the least and most significant bytes can be manipulated through byte wide data memory space accesses.
Uninitialized W Register Trap
The W register array (except W15) is not affected by a reset and therefore must be considered uninitialized until written to. An attempt to read an uninitialized register for an address access will generate an address error trap (fetch of an uninitialized address). In this situation, the user will most likely choose to reset the application, though recovery may be possible through an examination of the problematic instruction (via the stacked return address).
This function is achieved through the addition of a single latch to each W register (W0 through W14). The latch is cleared by reset and set by the first write to the associated register, as shown in [See Uninitialized W Register Trap]. When the latch is clear, a read of the corresponding register to either AGU will force an address error trap. W15 is initialized during reset (see [See Software Stack Pointer]) and consequently does not require this feature.
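The latch mechanism may be modeled in C along the following lines. The type `wregs_t`, the function names, and the `address_error` flag (which stands in for the address error trap) are hypothetical; the sketch checks only reads made for address generation, as described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_W 16

typedef struct {
    uint16_t w[NUM_W];        /* W0..W15 */
    bool     written[NUM_W];  /* one latch per register; W15 needs none */
    bool     address_error;   /* stands in for the address error trap */
} wregs_t;

/* Reset clears every latch but leaves W0..W14 contents untouched.
 * W15 is initialized to 0x0200, so it is treated as always written. */
void w_reset(wregs_t *r)
{
    assert(r != 0);
    for (int i = 0; i < NUM_W; i++) {
        r->written[i] = false;
    }
    r->w[15] = 0x0200;
    r->written[15] = true;
    r->address_error = false;
}

/* The first write to a register sets its latch. */
void w_write(wregs_t *r, unsigned n, uint16_t value)
{
    assert(r != 0 && n < NUM_W);
    r->w[n] = value;
    r->written[n] = true;
}

/* Read of Wn by an AGU: traps if the register was never written. */
uint16_t w_read_as_address(wregs_t *r, unsigned n)
{
    assert(r != 0 && n < NUM_W);
    if (!r->written[n]) {
        r->address_error = true;
    }
    return r->w[n];
}
```

One latch per register keeps the check independent of register contents, so any value, including zero, is accepted once it has been written.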
Default W Register Selection
The default W register for all file register instructions is defined by the WD[3:0] field in the CORCON (Core Control register). This field is reset to 0x0000, corresponding to register W0.
Software Stack Pointer
W15 has been dedicated as the software stack pointer, and will be automatically modified by exception processing and subroutine calls and returns. However, W15 can be referenced by any instruction in the same manner as all other W registers. This simplifies reading, writing and manipulating the stack pointer (e.g. creating stack frames). In order to protect against misaligned stack accesses, W15[0] is always clear.
W15 is initialized to 0x0200 during a reset. This will point to valid RAM in all derivatives and will guarantee stack availability for non-maskable trap exceptions or priority level 7 interrupts which may occur before the SP is set to where the user desires it. The user may reprogram the SP during initialization to any location within data space.
W14 has been dedicated as a stack frame pointer as defined by the LNK and ULNK instructions. However, W14 can be referenced by any instruction in the same manner as all other W registers.
The stack pointer always points to the first available free word and fills working from lower towards higher addresses. It pre-decrements for stack pops (reads) and post-increments for stack pushes (writes) as shown in
There is a stack limit register (SPLIM) associated with the stack pointer that is uninitialized at reset. SPLIM[15:1] is a 15-bit register. As is the case for the stack pointer, SPLIM[0] is forced to 0 because all stack operations must be word aligned.
The stack overflow check will not be enabled until a word write to SPLIM occurs after which time it can only be disabled by a reset. All EA's generated using W15 as Wsrc or Wdst (but not Wb) are compared against the value in SPLIM. Should the EA be greater than the contents of SPLIM, then a stack error trap is generated. This comparison is a subtraction, so the trap will occur for any SP greater than SPLIM. In addition, should the SP EA calculation wrap over the end of data space (0xFFFF), AGU X will generate a carry signal which will also cause a stack error trap (if the SPLIM register has been initialized).
The stack is initialized to 0x0200 during reset. A simple stack underflow mechanism is provided which will initiate a stack error trap should the stack pointer address ever be less than 0x0200.
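The stack pointer behavior and the overflow and underflow checks may be sketched in C as follows. The type `sw_stack_t`, the function names, and the `stack_error` flag (standing in for the stack error trap) are hypothetical. The sketch assumes the check is applied to the effective address used for each access and does not model the carry generated when the address wraps past 0xFFFF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SP_RESET 0x0200u  /* reset value of W15, also the underflow bound */

typedef struct {
    uint16_t mem[0x8000];    /* data space viewed as 16-bit words */
    uint16_t sp;             /* W15; bit 0 is always clear */
    uint16_t splim;          /* SPLIM; bit 0 forced to 0 */
    bool     splim_written;  /* overflow check armed by first SPLIM write */
    bool     stack_error;    /* stands in for the stack error trap */
} sw_stack_t;

void stack_reset(sw_stack_t *s)
{
    s->sp = SP_RESET;
    s->splim_written = false;  /* only a reset disarms the overflow check */
    s->stack_error = false;
}

void stack_set_limit(sw_stack_t *s, uint16_t limit)
{
    s->splim = limit & 0xFFFEu;
    s->splim_written = true;
}

/* Compare an effective address generated from W15 against both bounds. */
static void check_ea(sw_stack_t *s, uint16_t ea)
{
    if (ea < SP_RESET) s->stack_error = true;                      /* underflow */
    if (s->splim_written && ea > s->splim) s->stack_error = true;  /* overflow */
}

/* Push: write at SP, then post-increment by one word. */
void push(sw_stack_t *s, uint16_t value)
{
    assert((s->sp & 1u) == 0);
    check_ea(s, s->sp);
    s->mem[s->sp >> 1] = value;
    s->sp = (uint16_t)(s->sp + 2u);
}

/* Pop: pre-decrement by one word, then read at SP. */
uint16_t pop(sw_stack_t *s)
{
    assert((s->sp & 1u) == 0);
    s->sp = (uint16_t)(s->sp - 2u);
    check_ea(s, s->sp);
    return s->mem[s->sp >> 1];
}
```

In this model the stack grows toward higher addresses from 0x0200, and a pop from an empty stack or a push beyond SPLIM sets the error flag.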
The dsPIC core has a 16-bit status register (SR), the LS-byte of which is referred to as the lower status register (SRL). A detailed description is shown in Table 187. SRL contains all the MCU ALU operation status flags (including the new ‘sticky Z’ (SZ) bit) plus the REPEAT and DO loop active status bits. During exception processing, SRL is concatenated with the MS-byte of the PC to form a complete word value which is then stacked. The upper byte of the SR contains the DSP Adder/Subtractor status bits.
All SR bits are read/write except for the DA and RA bits which are read only because accidentally setting them could cause erroneous operation (including inhibiting PC increments). When the memory mapped SR is the destination address for an operation which affects any of the SR bits, data writes are disabled to all bits.
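The read-only treatment of the DA and RA bits may be illustrated with a masked write in C. The bit positions `SR_RA` and `SR_DA` and the function name `sr_software_write` are hypothetical, chosen only for illustration; the actual bit layout is given in the register description.

```c
#include <assert.h>
#include <stdint.h>

#define SR_RA 0x0010u  /* hypothetical position of the RA bit */
#define SR_DA 0x0020u  /* hypothetical position of the DA bit */
#define SR_READ_ONLY (SR_RA | SR_DA)

/* Software write to the SR: every bit takes the written value except
 * RA and DA, which keep whatever the hardware last set. */
void sr_software_write(uint16_t *sr, uint16_t value)
{
    assert(sr != 0);
    *sr = (uint16_t)((*sr & SR_READ_ONLY) | (value & ~SR_READ_ONLY));
}
```

A write of all ones therefore cannot start a loop-active state, and a write of all zeros cannot cancel one.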
Legend
R = Readable bit
W = Writable bit
U = Unimplemented bit, read as ‘0’
−n = Value at POR
1 = bit is set
0 = bit is cleared
x = bit is unknown
Exceptions and Stack
The core supports a prioritized interrupt and trap exception scheme. There are up to eight levels of interrupt priority, each of which has an interrupt vector associated with it. Each interrupt source is user programmable with regard to what priority (and therefore vector address) it uses. The highest priority interrupt is non-maskable. There are seven traps available to improve operational robustness, all of which are non-maskable. They adhere to a predefined priority scheme.
Stacking associated with exceptions and subroutine calls is executed on a software stack. Register W15 is dedicated as the stack pointer and has the LSB=0.
Note:
1: For byte operations, add or subtract 1.
2: For word operations, add or subtract 2.
3: Wd assumed to be in register direct mode (qqq = 000).
Note:
1: For byte operations, add or subtract 1.
2: For word operations, add or subtract 2.
3: Ws assumed to be in register direct mode (ppp = 000).
Note:
1: For byte operations, add or subtract 1.
2: For word operations, add or subtract 2.
3: For byte and word operations, add 2's complement Wb.
4: For byte operations, add or subtract gwwww.
5: For word operations, add or subtract (2 * gwwww) or gwwww0.
For byte operations, add or subtract 1.
For word operations, add or subtract 2.
For byte and word operations, add 2's complement Wb.
For byte operations, add or subtract hwwww.
For word operations, add or subtract (2 * hwwww) or hwwww0.
The invention, therefore, is well adapted to carry out the objects and attain the ends and advantages mentioned, as well as others inherent therein. While the invention has been depicted, described, and is defined by reference to exemplary embodiments of the invention, such references do not imply a limitation on the invention, and no such limitation is to be inferred. The invention is capable of considerable modification, alteration, and equivalents in form and function, as will occur to those ordinarily skilled in the pertinent arts and having the benefit of this disclosure. The depicted and described embodiments of the invention are exemplary only, and are not exhaustive of the scope of the invention. Consequently, the invention is intended to be limited only by the spirit and scope of the appended claims, giving full cognizance to equivalents in all respects.
This application is a continuation-in-part of, and claims priority to, U.S. patent application Ser. No. 09/870,457 which was filed on Jun. 1, 2001 by the same inventors and assigned to the same entity, and is herein incorporated by reference for all purposes. This application is related to the following applications: U.S. application for “Repeat Instruction with Interrupt” on Jun. 1, 2001 by M. Catherwood, et al. (MTI-1665); U.S. application for “Low Overhead Interrupt” on Jun. 1, 2001 by M. Catherwood, et al. (MTI-1666); U.S. application for “Find First Bit Value Instructions” on Jun. 1, 2001 by M. Catherwood (MTI-1667); U.S. application for “Bit Replacement and Extraction Instructions” on Jun. 1, 2001 by B. Boles, et al. (MTI-1668); U.S. application for “Shadow Register Array Control Instructions” on Jun. 1, 2001 by M. Catherwood, et al. (MTI-1669); U.S. application for “Multi-Precision Barrel Shifting” on Jun. 1, 2001 by J. Conner, et al. (MTI-1670); U.S. application for “Dynamically Reconfigurable Data Space” on Jun. 1, 2001 by M. Catherwood, et al. (MTI-1735); U.S. application for “Modified Harvard Architecture Processor Having Data Memory Space Mapped to Program Memory Space” on Jun. 1, 2001 by J. Grosbach, et al. (MTI-1736); U.S. application for “Modified Harvard Architecture Processor Having Data Memory Space Mapped to Program Memory Space with Erroneous Execution Protection” on Jun. 1, 2001 by M. Catherwood (MTI-1737); U.S. application for “Dual Mode Arithmetic Saturation Processing” on Jun. 1, 2001 by M. Catherwood (MTI-1738); U.S. application for “Compatible Effective Addressing With a Dynamically Reconfigurable Data Space Word Width” on Jun. 1, 2001 by M. Catherwood, et al. (MTI-1739); U.S. application for “Maximally Negative Signed Fractional Number Multiplication” on Jun. 1, 2001 by M. Catherwood (MTI-1754); U.S. application for “Euclidean Distance Instructions” on Jun. 1, 2001 by M. Catherwood (MTI-1755); U.S. application for “Sticky Z Bit” on Jun. 1, 2001 by J. Elliot (MTI-1756); U.S. application for “Variable Cycle Interrupt Disabling” on Jun. 1, 2001 by B. Boles, et al. (MTI-1757); U.S. application for “Register Pointer Trap” on Jun. 1, 2001 by M. Catherwood (MTI-1758); U.S. application for “Modulo Addressing Based on Absolute Offset” on Jun. 1, 2001 by M. Catherwood (MTI-1759); U.S. application for “Dual Dead Time Unit for PWM Module” on Jun. 1, 2001 by S. Bowling (MTI-1789); U.S. application for “Fault Pin Priority” on Jun. 1, 2001 by S. Bowling (MTI-1790); U.S. application for “Extended Resolution Mode for PWM Module” on Jun. 1, 2001 by S. Bowling (MTI-1791); U.S. application for “Configuration Fuses for Setting PWM Options” on Jun. 1, 2001 by S. Bowling (MTI-1792); U.S. application for “Automatic A/D Sample Triggering” on Jun. 1, 2001 by B. Boles (MTI-1794); U.S. application for “Reduced Power Option” on Jun. 1, 2001 by M. Catherwood (MTI-1796) which are all hereby incorporated herein by reference for all purposes.
Relation | Number | Date | Country |
---|---|---|---|
Parent | 09870457 | Jun 2001 | US |
Child | 10969338 | Oct 2004 | US |