The present invention relates to digital interfaces for microcontrollers, and in particular to programmable interfaces capable of driving a wide variety of outputs.
Single board computing devices, such as the Raspberry Pi, have become very popular for a range of hobbyist and industrial uses. The various models of Raspberry Pi are provided with a set of general purpose input/output (GPIO) pins, controlled by the System on Chip (SoC) of the computing device. These pins can be used for many purposes, such as transmitting or receiving data using a variety of interface standards or controlling external devices. It is desirable to provide extended capabilities for input and output of data and control of external devices.
According to the present invention, there is provided an I/O circuit block comprising:
Optionally, the state machine is configured to execute instructions of an instruction set, the instruction set including: an IN instruction to transfer a specified number of bits into the register unit; and an OUT instruction to transfer a specified number of bits from the register unit.
Optionally, the instruction set consists of: the IN instruction; the OUT instruction; a JMP instruction; a WAIT instruction; a PUSH instruction; a PULL instruction; a MOV instruction; an IRQ instruction; and a SET instruction.
Optionally, instructions executed by the state machine include an operation to set a value to at least one of the plurality of terminals concurrently with execution of another operation defined by the instruction.
Optionally, the register unit comprises an input FIFO register and an output FIFO register, the input FIFO register and the output FIFO register being configurable as a single double-length FIFO register for input or output.
Optionally, the state machine comprises an input shift register and an output shift register and is operable in an auto pull mode wherein a first predetermined number of bits are transferred from the register unit to the output shift register when the content of the output shift register is less than a first threshold and/or in an auto push mode wherein a second predetermined number of bits are transferred to the register unit from the input shift register when the content of the input shift register is greater than a second threshold.
Optionally, the state machine is configured to also execute instructions from one or more of: the register unit, a shift register within the state machine and a configuration register within the state machine.
Optionally, the state machine comprises a wrap control field specifying a sequence of instructions which are executed repeatedly.
Optionally, the state machine comprises a configurable clock divider to enable the state machine to operate at a lower clock rate than an external clock.
Optionally, there are a plurality of state machines and a corresponding plurality of register units.
Optionally, the instruction memory is shared between the plurality of state machines and configured for multiple simultaneous reads.
Optionally, the input/output unit comprises a multiplexer configured to enable each of the state machines to be connected to any of the terminals.
Optionally, the I/O block has a plurality of IRQ flags and wherein each of the state machines can set and read each of the IRQ flags.
According to the present invention there is also provided an integrated circuit comprising at least one I/O circuit block as described above.
Optionally, the integrated circuit has a plurality of clock sources and wherein the integrated circuit is operable in a DORMANT mode wherein all the clock sources are halted.
Optionally, the circuit is configured to wake from DORMANT mode on receipt of an input without a clock running.
Optionally, the integrated circuit has a plurality of peripheral registers and wherein the peripheral registers are addressable bitwise in at least one of the following modes:
Optionally, the peripheral register comprises a read/write accessible register and a bus interposer which translates upstream atomic writes into downstream read-modify-write sequences.
Optionally, the integrated circuit has at least one general purpose CPU.
According to the present invention there is further provided an assembler for a state machine having an instruction set consisting of: an IN instruction; an OUT instruction; a JMP instruction; a WAIT instruction; a PUSH instruction; a PULL instruction; a MOV instruction; an IRQ instruction; and a SET instruction.
According to the present invention there is further provided a program executable by a state machine having an instruction set consisting of: an IN instruction; an OUT instruction; a JMP instruction; a WAIT instruction; a PUSH instruction; a PULL instruction; a MOV instruction; an IRQ instruction; and a SET instruction.
Embodiments of the present invention can provide a low cost microcontroller device with the quality, cost and simplicity of the Raspberry Pi. Much like the Raspberry Pi is an accessible computer, embodiments of the invention can provide an accessible chip with everything you need to build a product. The present invention can be used in a wide variety of applications, such as providing data interfaces, controlling devices and converting streams of data (e.g. audio data, video data or sensor data) from one format to another. The present invention can be used in consumer products and industrial control settings.
The invention will be described further below with reference to exemplary embodiments and the accompanying schematic drawings, in which:
In the various drawings, like parts are denoted by like references.
The present invention is described below with reference to an embodiment included in a microcontroller referred to herein as the RP2040. It will be appreciated that an interface according to the invention can be used in other microcontrollers and other devices. For example, the interface of the invention may be incorporated in a System on Chip in particular but not exclusively for use in single board computers. The present invention can also be embodied as a separate IC or combined with other modules, such as memory, processors, etc.
A system overview of the RP2040 1 is shown in
The RP2040 is supported with both C/C++ and MicroPython cross-platform development environments, including easy access to runtime debugging. It has UF2 boot and floating-point routines baked into the chip. The in-built USB can act as both device and host. It has two symmetric cores and high internal bandwidth, making it useful for signal processing and video. The chip has a large amount of internal RAM but uses external flash, allowing you to choose how much memory you need.
RP2040 has a dual processor complex, internal memory and peripheral blocks connected via AHB/APB bus fabric. Code may be executed directly from external memory through a dedicated SPI, DSPI or QSPI interface. A small cache improves performance for typical applications. Debug is available via the SWD interface. Internal SRAM is arranged in banks which can contain code or data and is accessed via dedicated AHB bus fabric connections, allowing bus masters to access separate bus slaves without being stalled. DMA bus masters are available to offload repetitive data transfer tasks from the processors.
GPIO pins 13 can be driven directly, or from a variety of dedicated logic functions. Dedicated hardware is provided for fixed functions such as SPI, I2C and UART. As discussed in detail below, flexible configurable PIO controllers can be used to provide a wide variety of IO functions.
A USB controller with embedded PHY can be used to provide FS/LS Host or Device connectivity under software control. Four ADC inputs are shared with GPIO pins. Two PLLs provide a fixed 48 MHz clock for USB or ADC, and a flexible system clock of up to 133 MHz. An internal voltage regulator supplies the core voltage, so the end product only needs to supply the IO voltage.
The RP2040 bus fabric routes addresses and data across the chip, with a maximum sustained throughput of four 32-bit transfers per system clock cycle. This provides access to code for processor instruction fetch, data, and memory-mapped IO.
The bus fabric connects 4 AHB-Lite masters, i.e. devices which generate addresses:
The four bus masters can access any four different crossbar ports simultaneously, and the bus fabric does not add wait states to any AHB-Lite slave access, so at a system clock of 125 MHz, the maximum sustained bus bandwidth is 2.0 GB/s. The system address map has been arranged to make this parallel bandwidth available to as many software use cases as possible; for example, the striped SRAM alias (SRAM) scatters main memory accesses across four crossbar ports (SRAM0 to SRAM3), so that more memory accesses can proceed in parallel.
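The 2.0 GB/s figure follows directly from four 32-bit (4-byte) transfers per system clock cycle at 125 MHz. The following minimal C sketch reproduces the arithmetic. The helper name is hypothetical, and it assumes decimal gigabytes (10^9 bytes) and one wait-state-free transfer per master per cycle.

```c
#include <stdint.h>

/* Hypothetical helper: peak sustained crossbar bandwidth in bytes per second,
 * assuming every bus master completes one transfer per clock cycle with no
 * wait states, as described for the bus fabric above. */
static uint64_t crossbar_bandwidth_bytes_per_s(uint32_t masters,
                                               uint32_t transfer_bits,
                                               uint64_t clk_hz)
{
    uint64_t bytes_per_transfer = transfer_bits / 8u;
    return (uint64_t)masters * bytes_per_transfer * clk_hz;
}
```

For example, `crossbar_bandwidth_bytes_per_s(4, 32, 125000000)` evaluates to 4 × 4 × 125,000,000 = 2,000,000,000 bytes per second.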
At the centre of the RP2040 bus fabric is a 4:10 fully-connected crossbar. Its 4 upstream ports are connected to the 4 system bus masters, and the 10 downstream ports connect to the highest-bandwidth AHB-Lite slaves (namely the memory interfaces) and to lower layers of the fabric.
The crossbar is built from two components:
The main crossbar on RP2040 consists of four 1:10 splitters and ten 4:1 arbiters, with a mesh of 40 AHB-Lite bus channels between them. Note that, as AHB-Lite is a pipelined bus, the splitter may be routing back a response to an earlier request from downstream port A, whilst a new request to downstream port B is already in progress. This does not incur any cycle penalty.
As shown in
Two Cortex-M0+ processors are each provided with a dedicated 32-bit AHB-Lite bus port, for code fetch, loads and stores. The SIO is connected to the single-cycle IOPORT bus of each processor, and provides GPIO access, two-way communications, and other core-local peripherals. Both processors can be debugged via a single multi-drop Serial Wire Debug bus. 26 interrupts (plus NMI) are routed to the NVIC and WIC on each processor.
The processors use a number of interfaces to communicate with the rest of the system:
The Single-cycle IO block (SIO), shown in
The SIO appears as memory-mapped hardware within the IOPORT space, and contains the hardware which the processors must be able to access quickly. The FIFOs and spinlocks support message passing and synchronisation between the two cores. The shared GPIO registers provide fast and concurrency-safe direct access to GPIO-capable pins.
Some core-local arithmetic hardware can be used to accelerate common tasks on the processors.
All IOPORT reads and writes (and therefore all SIO accesses) take place in exactly one cycle, unlike the main AHB-Lite system bus, where the Cortex-M0+ requires two cycles for a load or store, and may have to wait longer due to contention from other system bus masters. This is vital for interfaces such as GPIO, which have tight timing requirements.
SIO registers are mapped to word-aligned addresses in the range 0xd0000000 to 0xd000017c. The remainder of the IOPORT space is reserved for future use.
RP2040 has 36 multi-functional General Purpose Input/Output (GPIO) pins 13, divided into two banks. In a typical use case, the pins in the QSPI bank (QSPI_SS, QSPI_SCLK and QSPI_SD0 to QSPI_SD3) are used to execute code from an external flash device, leaving the User bank (GPIO0 to GPIO29) for the programmer to use. All GPIOs support digital input and output, but GPIO26 to GPIO29 can also be used as inputs to the chip's Analogue to Digital Converter (ADC). Each GPIO can be controlled directly by software running on the processors, or by a number of other functional blocks.
The User GPIO bank supports the following functions:
The QSPI bank supports the following functions:
The logical structure of an example IO is shown in
Each GPIO is connected to the off-chip world via a “pad”. Pads are the electrical interface between the chip's internal logic and external circuitry. They translate signal voltage levels, support higher currents and offer some protection against electrostatic discharge (ESD) events. Pad electrical behaviour can be adjusted to meet the requirements of the external circuitry. The following adjustments are available:
The pad's Output Enable, Output Data and Input Data ports are connected, via the IO mux, to the function controlling the pad. All other ports are controlled from the pad control register. The register also allows the pad's output driver to be disabled, by overriding the Output Enable signal from the function controlling the pad. See GPIO0 for an example of a pad control register.
Both the output signal level and acceptable input signal level at the pad terminal 71 are determined by the digital IO supply (IOVDD).
IOVDD can be any nominal voltage between 1.8V and 3.3V, but to meet specification when powered at 1.8V, the pad input thresholds must be adjusted by writing a 1 to the pad VOLTAGE_SELECT registers. By default the pad input thresholds are valid for an IOVDD voltage between 2.5V and 3.3V. Using a voltage of 1.8V with the default input thresholds is a safe operating mode, though it will result in input thresholds that don't meet specification.
Pad input thresholds are adjusted on a per-bank basis, with separate VOLTAGE_SELECT registers for the pads associated with the User IO bank (IO Bank 0) and the QSPI IO bank. However, both banks share the same digital IO supply (IOVDD), so both registers should always be set to the same value.
Programmable Input/Output (PIO)
The programmable input/output block (PIO) is a versatile hardware interface. It can support a variety of IO standards, including:
A PIO is programmable in the same sense as a processor. There are two PIO blocks with four state machines each, that can independently execute sequential programs to manipulate GPIOs and transfer data. Unlike a general purpose processor, PIO state machines are highly specialised for IO, with a focus on determinism, precise timing, and close integration with fixed-function hardware. Each state machine is equipped with:
The registers and bus FIFO may be longer or shorter than 32 bits, e.g. 16 or 64 bits. Each state machine, along with its supporting hardware, occupies approximately the same silicon area as a standard serial interface block, such as an SPI or I2C controller. However, PIO state machines can be configured and reconfigured dynamically to implement numerous different interfaces.
Making state machines programmable in a software-like manner, rather than a fully configurable logic fabric like a complex programmable logic device (CPLD), allows more hardware interfaces to be offered in the same cost and power envelope. This also presents a more familiar programming model, and simpler tool flow, to those who wish to exploit PIO's full flexibility by programming it directly, rather than using a premade interface from the PIO library.
PIO is highly performant as well as flexible, thanks to a carefully selected set of fixed-function hardware inside each state machine. For example, video data can be output at a rate of 360 Mb/s during the active scanline period when running from a 48 MHz system clock. To achieve this, one state machine is used to handle frame/scanline timing and generate the pixel clock, while another handles the pixel data, and unpacks run-length-encoded scanlines.
State machines' inputs and outputs are mapped to up to 32 GPIOs, and all state machines have independent, simultaneous access to any GPIO. For example, the standard UART code allows TX, RX, CTS and RTS to be any four arbitrary GPIOs, and I2C permits the same for SDA and SCL. The amount of freedom available depends on how exactly a given PIO program chooses to use PIO's pin mapping resources, but at the minimum, an interface can be freely shifted up or down by some number of GPIOs.
The four state machines execute from a shared instruction memory. System software loads programs into this memory, configures the state machines and IO mapping, and then sets the state machines running. PIO programs come from various sources: assembled directly by the user, drawn from the PIO library, or generated programmatically by user software.
From this point on, state machines are generally autonomous, and system software interacts through DMA, interrupts and control registers, as with other peripherals on RP2040. For more complex interfaces, PIO provides a small but flexible set of primitives which allow system software to be more hands-on with state machine control flow.
PIO state machines execute short, binary programs. Programs for common interfaces, such as UART, SPI, or I2C, are available in the PIO library, so in many cases, it is not necessary to write PIO programs. However, the PIO is much more flexible when programmed directly, supporting a wide variety of interfaces which may not have been foreseen by its designers.
The PIO has a total of nine instructions: JMP, WAIT, IN, OUT, PUSH, PULL, MOV, IRQ and SET, which are discussed in more detail below. More, fewer or different instructions may be implemented, but the present inventors have determined that this set achieves a desirable balance between the complexity of the state machine and the flexibility and efficiency of programming. Fewer instructions may reduce functionality and/or require longer programs, whereas more instructions would require a more complex state machine, which might then have to run at a lower clock speed or consume more power.
Although the PIO has only nine instructions, editing PIO program binaries by hand would be difficult. PIO assembly is a textual format describing a PIO program, in which each command corresponds to one instruction in the output binary. An example program in PIO assembly is available at: https://github.com/raspberrypi/pico-examples/blob/master/pio/squarewave/squarewave.pio
The PIO assembler is included with the Pico SDK, and is called pioasm. This program processes a PIO assembly input text file, which may contain multiple programs, and writes out the assembled programs ready for use. For the Pico SDK these assembled programs are emitted in form of C headers, containing constant arrays.
On every system clock cycle, each state machine fetches, decodes and executes one instruction. Each instruction takes precisely one cycle, unless it explicitly stalls (such as the WAIT instruction). Instructions may also insert a delay of up to 31 cycles before the next instruction is executed to aid the writing of cycle-exact programs.
The program counter, or PC, points to the location in the instruction memory being executed on this cycle. Generally, PC increments by one each cycle, wrapping at the end of the instruction memory. Jump instructions are an exception and explicitly provide the next value that PC will take. An example program can be found at https://github.com/raspberrypi/pico-examples/blob/master/pio/squarewave/squarewave.pio
Our example assembly program shows both of these concepts in practice. It drives a 50/50 duty cycle square wave onto a GPIO, with a period of four cycles. Using some other features (e.g. side-set, discussed below) this can be made as low as two cycles. The system has write-only access to the instruction memory, which is used to load programs: https://github.com/raspberrypi/pico-examples/blob/master/pio/squarewave/squarewave.c
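Because every non-stalling instruction takes one cycle plus its delay, the four-cycle period can be checked by simple counting. The sketch below assumes that the loop in the referenced squarewave.pio consists of `set pins, 1 [1]`, `set pins, 0` and `jmp again`, so the pin is high for 1 + 1 cycles and low for 1 + 1 cycles. The type and function names are hypothetical and model timing only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical timing-only view of one PIO instruction: it executes in one
 * cycle and is then followed by `delay` idle cycles (0..31). Stalls such as
 * WAIT are not modelled. */
typedef struct {
    uint8_t delay;
} pio_timing_t;

/* Total cycles taken by a straight-line sequence of non-stalling instructions. */
static unsigned sequence_cycles(const pio_timing_t *prog, size_t len)
{
    unsigned cycles = 0;
    for (size_t i = 0; i < len; i++)
        cycles += 1u + prog[i].delay;
    return cycles;
}
```

With the loop described as `{ {1}, {0}, {0} }`, `sequence_cycles` returns 4, and the first instruction alone accounts for the 2 high cycles.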
The clock divider slows the state machine's execution by a constant factor, represented as a 16.8 fixed-point fractional number. Using the above example, if a clock division of 2.5 were programmed, the square wave would have a period of 4×2.5=10 cycles. This is useful for setting a precise baud rate for a serial interface, such as a UART, as shown in line 47 of the previously referenced program.
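A 16.8 fixed-point divider holds a 16-bit integer part and an 8-bit fraction in units of 1/256, so 2.5 is stored as integer 2 and fraction 128. The following is a sketch of that conversion and of the resulting average period. The names are hypothetical, the fraction is truncated rather than rounded, and real firmware would normally use the Pico SDK's own configuration functions instead.

```c
#include <stdint.h>

/* Hypothetical 16.8 fixed-point clock divisor. */
typedef struct {
    uint16_t int_part;   /* 16-bit integer part */
    uint8_t frac_part;   /* 8-bit fraction, units of 1/256 */
} clkdiv_16_8_t;

/* Assumes 1.0 <= div < 65536.0; the fraction is truncated. */
static clkdiv_16_8_t clkdiv_from_double(double div)
{
    clkdiv_16_8_t d;
    d.int_part = (uint16_t)div;
    d.frac_part = (uint8_t)((div - d.int_part) * 256.0);
    return d;
}

/* Average period, in system clock cycles, of a loop that takes `sm_cycles`
 * state machine cycles once the divider is applied. */
static double divided_period(unsigned sm_cycles, clkdiv_16_8_t d)
{
    return sm_cycles * (d.int_part + d.frac_part / 256.0);
}
```

For the example above, `divided_period(4, clkdiv_from_double(2.5))` gives 10.0 cycles. With a non-integer divisor this is an average, because individual state machine cycles are spread over whole system clock cycles.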
The system can start and stop each state machine at any time, via the CTRL register. Multiple state machines can be started simultaneously, and the deterministic nature of PIO means they can stay perfectly synchronised, as demonstrated in line 67 of the referenced program.
The above code fragments are part of a complete application which drives a 12 MHz square wave out of GPIO 0.
Most instructions are executed from the instruction memory, but there are other sources, which can be freely mixed:
The last of these is particularly versatile: instructions can be embedded in the stream of data passing through the FIFO. The I2C example (below) uses this to embed, for example, STOP and RESTART line conditions alongside normal data. In the case of MOV EXEC and OUT EXEC, the MOV/OUT itself executes in one cycle, and the executed instruction runs on the next.
Each state machine possesses a small number of internal registers. These hold input or output data, and temporary values such as loop counter variables.
The Output Shift Register (OSR) holds and shifts output data, between the TX FIFO and the pins (or other destinations, such as the scratch registers).
For example, to stream data through the FIFO and output to the pins at a rate of one byte per two clocks, see https://datasheets.raspberrypi.com/rp2040/rp2040-datasheet.pdf, page 334—pull_example1
Autopull allows the hardware to automatically refill the OSR in the majority of cases, with the state machine stalling if it tries to OUT from an empty OSR. This has two benefits:
After configuring autopull, the above referenced program can be simplified to the following, which behaves identically: https://datasheets.raspberrypi.com/rp2040/rp2040-datasheet.pdf, page 334—pull_example2
Program wrapping (discussed below) allows further simplification and, if desired, an output of 1 byte every system clock cycle, demonstrated at https://datasheets.raspberrypi.com/rp2040/rp2040-datasheet.pdf, page 334—pull_example3
Some peripherals, like UARTs, must shift in from the left to get correct bit order, since the wire order is LSB-first; however, the processor may expect the resulting byte to be right-aligned. This is solved by the special null input source, which allows the programmer to shift some number of zeroes into the ISR, following the data.
State machines remember how many bits, in total, have been shifted out of the OSR via OUT instructions, and into the ISR via IN instructions. This information is tracked at all times by a pair of hardware counters, capable of holding values from 0 to 32 inclusive (the width of a shift register). The state machine can be configured to perform certain actions when the IN or OUT count reaches a configurable threshold:
On PIO reset, or the assertion of CTRL_SM_RESTART, the ISR shift counter is cleared to 0 (nothing yet shifted in), and the OSR shift counter is initialised to 32 (nothing remaining to be shifted out). Some other instructions affect the shift counters:
Each state machine has two 32-bit internal scratch registers 1203, 1204, called X and Y. They are used as:
For example, suppose we wanted to produce a long pulse for “1” data bits, and a short pulse for “0” data bits: https://datasheets.raspberrypi.com/rp2040/rp2040-datasheet.pdf, page 336—ws2812_led
Here X is used as a loop counter, and Y is used as a temporary variable for branching on single bits from the OSR. This program can be used to drive a WS2812 LED interface, although more compact implementations are possible (as few as 3 instructions). MOV allows the use of the scratch registers to save/restore the shift registers if, for example, you would like to repeatedly shift out the same sequence.
Each state machine has a pair of 4-word deep FIFOs, one for data transfer from system to state machine (TX) 121a, and the other for state machine to system (RX) 121b. The TX FIFO is written to by system bus masters, such as a processor or DMA controller, and the RX FIFO is written to by the state machine. FIFOs decouple the timing of the PIO state machines and the system bus, allowing state machines to go for longer periods without processor intervention.
FIFOs also generate data request (DREQ) signals, which allow a system DMA controller to pace its reads/writes based on the presence of data in an RX FIFO, or space for new data in a TX FIFO. This allows a processor to set up a long transaction, potentially involving many kilobytes of data, which will proceed with no further processor intervention.
Often, a state machine is only transferring data in one direction. In this case the SHIFTCTRL_FJOIN option can merge the two FIFOs into a single 8-entry FIFO going in one direction only. This is useful for high-bandwidth interfaces such as DPI.
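As a behavioural sketch of the FIFO arrangement, including DREQ pacing and the joined mode, the following models each FIFO as a ring buffer whose depth is 4, or 8 for TX when the two are joined. This is a hypothetical software model, not the hardware design. The names are illustrative, and the join is modelled for the TX direction only.

```c
#include <stdbool.h>
#include <stdint.h>

#define FIFO_DEPTH_MAX 8u

/* Hypothetical ring-buffer model of one PIO FIFO. */
typedef struct {
    uint32_t data[FIFO_DEPTH_MAX];
    unsigned head, count, depth;
} fifo_t;

static void fifo_init(fifo_t *f, unsigned depth)
{
    f->head = 0;
    f->count = 0;
    f->depth = depth;
}

static bool fifo_push(fifo_t *f, uint32_t v)
{
    if (f->count == f->depth)
        return false;                      /* full: the writer must wait */
    f->data[(f->head + f->count) % f->depth] = v;
    f->count++;
    return true;
}

static bool fifo_pop(fifo_t *f, uint32_t *v)
{
    if (f->count == 0)
        return false;                      /* empty: the reader must wait */
    *v = f->data[f->head];
    f->head = (f->head + 1) % f->depth;
    f->count--;
    return true;
}

/* Models the join option for the TX direction: TX gets 8 entries, RX none. */
static void configure_fifos(fifo_t *tx, fifo_t *rx, bool join_tx)
{
    fifo_init(tx, join_tx ? 8u : 4u);
    fifo_init(rx, join_tx ? 0u : 4u);
}

/* DREQ model: DMA may write TX while it has space, and read RX while it has data. */
static bool tx_dreq(const fifo_t *tx) { return tx->count < tx->depth; }
static bool rx_dreq(const fifo_t *rx) { return rx->count > 0; }
```

In the unjoined configuration the fifth push to TX fails. After joining, eight pushes succeed and the RX FIFO accepts nothing.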
State machines may momentarily pause execution for a number of reasons:
In this case, the program counter does not advance, and the state machine will continue executing this instruction on the next cycle. If the instruction specifies some number of delay cycles before the next instruction starts, these do not begin until after the stall clears.
Side-set is not affected by stalls, and always takes place on the first cycle of the attached instruction.
PIO controls the output level and direction of up to 32 GPIOs, and can observe their input levels. On every system clock cycle, each state machine may do none, one, or both of the following:
Each of these operations is on some contiguous range of GPIOs, with the base and count configured via each state machine's PINCTRL register. OUT, SET, IN and side-set have their own independent mappings, which are allowed to overlap.
For each individual GPIO output (level and direction separately), PIO considers all 8 writes that may have occurred on that cycle, and applies the write from the highest-numbered state machine. If the same state machine performs a SET/OUT and a side-set on the same GPIO simultaneously, the side-set is used. If no state machine writes to this GPIO output, its value does not change from the previous cycle.
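The priority rule can be sketched as a function over one cycle's writes to a single GPIO output (level and direction are resolved the same way). The eight writes are read here as four state machines in one PIO block, each making at most one OUT/SET write and one side-set write. The record type and function name are hypothetical.

```c
#include <stdbool.h>

#define NUM_SM 4 /* state machines per PIO block; 2 writes each = 8 writes */

/* Hypothetical record of what one state machine wrote to a single GPIO
 * output (level or direction) during one cycle. */
typedef struct {
    bool wrote_outset;   /* an OUT or SET targeted this output */
    bool outset_value;
    bool wrote_sideset;  /* a side-set targeted this output */
    bool sideset_value;
} sm_write_t;

/* The write from the highest-numbered state machine wins. Within one state
 * machine, side-set beats OUT/SET. With no writes, the old value is held. */
static bool resolve_gpio_output(const sm_write_t writes[NUM_SM], bool previous)
{
    for (int sm = NUM_SM - 1; sm >= 0; sm--) {
        if (writes[sm].wrote_sideset)
            return writes[sm].sideset_value;
        if (writes[sm].wrote_outset)
            return writes[sm].outset_value;
    }
    return previous;
}
```

If state machine 1 drives the output high and state machine 3 drives it low on the same cycle, the result is low. If state machine 3 also side-sets the output high, the side-set takes precedence.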
Generally each state machine's outputs are mapped to a distinct group of GPIOs, implementing some peripheral interface.
IRQ flags are state bits which can be set or cleared by state machines or the system. There are 8 in total: all 8 are visible to all state machines, and the lower 4 can also be masked into one of PIO's interrupt request lines, via the IRQ0_INTE and IRQ1_INTE control registers. They have two main uses:
The instruction memory is implemented as a 1-write 4-read register file, so all four state machines can read an instruction on the same cycle, without stalling.
There are three ways to apply the multiple state machines:
In this embodiment, state machines cannot communicate data between themselves, but they can synchronise with one another by using the IRQ flags. Omitting provision for direct communication of data between the state machines reduces their complexity. Instead data can be moved between the FIFOs using the on-chip DMA. There are 8 flags total (the lower four of which can be masked for use as system IRQs), and each state machine can set or clear any flag using the IRQ instruction, and can wait for a flag to go high or low using the WAIT IRQ instruction. This allows cycle-accurate synchronisation between state machines.
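The following is a minimal model of the eight shared IRQ flags used for synchronisation. One state machine sets a flag with IRQ, and another proceeds past WAIT once the flag has the requested polarity. Only the lower four flags can reach a system interrupt line, through an enable mask. The names are hypothetical. This sketch does not model any side effect of a satisfied WAIT on the flag, or the exact bit layout of the IRQ0_INTE/IRQ1_INTE registers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the eight IRQ flags of one PIO block. */
typedef struct {
    uint8_t flags;
} pio_irq_model_t;

static void irq_set(pio_irq_model_t *p, unsigned n)   { p->flags |= (uint8_t)(1u << n); }
static void irq_clear(pio_irq_model_t *p, unsigned n) { p->flags &= (uint8_t)~(1u << n); }

/* WAIT <polarity> IRQ n: true if the waiting state machine may proceed this
 * cycle, false if it stalls. */
static bool irq_wait_satisfied(const pio_irq_model_t *p, unsigned n, bool polarity)
{
    return (((p->flags >> n) & 1u) != 0) == polarity;
}

/* Only flags 0..3 can be masked into a system interrupt line. */
static bool system_irq_asserted(const pio_irq_model_t *p, uint8_t enable_mask)
{
    return (p->flags & enable_mask & 0x0fu) != 0;
}
```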
PIO Assembler (pioasm)
The PIO Assembler parses a PIO source file and outputs the assembled version ready for inclusion in a program. Pioasm currently supports output for the Pico SDK and MicroPython.
A description of the command line arguments can be obtained by running:
pioasm -?
giving:
Within the Pico SDK you do not need to invoke pioasm directly, as the CMake function pico_generate_pio_header(TARGET PIO_FILE) takes care of invoking pioasm and adding the generated header to the include path of the target TARGET for you.
Table 1 below lists pioasm directives to control the assembly of PIO programs:
The following types of values can be used to define integer numbers or branch targets.
Line comments are supported with // or ;
C-style block comments are supported via /* and */.
Labels are of the form:
All pioasm instructions follow a common pattern:
Pioasm instruction names, keywords and directives are case insensitive; lower case is used in the Assembly Syntax sections below as this is the style used in the Pico SDK.
Commas appear in some Assembly Syntax sections below, but are entirely optional, e.g. out pins, 3 may be written out pins 3, and jmp x-- label may be written as jmp x--, label. The Assembly Syntax sections below use the first style in each case, as this is the style used in the Pico SDK.
Text in the PIO file may be passed, unmodified, to the output based on the language generator being used.
For example the following (comment and function) would be included in the generated header when the default c-sdk language generator is used.
The general format is
with targets being recognized by a particular language generator (note that target is usually the language generator name, e.g. c-sdk, but could potentially be some_language.some_some_group if the language generator supports different classes of pass thru with different output locations).
This facility allows you to encapsulate both the PIO program and the associated setup required in the same source file.
The following example shows a source file containing multiple programs, which we will use to highlight c-sdk and python output features.
The c-sdk language generator produces a single header file with all the programs in the PIO source file:
The pass thru sections (% c-sdk {) are embedded in the output, and the PUBLIC defines are available via #define.
A method is created for each program (e.g. ws2812_program_get_default_config()) which sets up a pio_sm_config based on the .side_set, .wrap and .wrap_target settings of the program, which you can then use as a basis for configuring the PIO state machine. https://github.com/raspberrypi/pico-examples/blob/master/pio/ws2812/generated/ws2812.pio.h
The python language generator produces a single python file with all the programs in the PIO source file. The pass thru sections (% python {) would be embedded in the output, and the PUBLIC defines are available as python variables. Also note the use of lang_opt python to pass initializers for the @pico.asm_pio decorator.
The python language output is provided as a utility. MicroPython supports programming with the PIO natively, so you may only want to use pioasm when sharing PIO code between the Pico SDK and MicroPython. No effort is currently made to preserve label names, symbols or comments, as it is assumed that you use either the PIO file or the python file as your source, not both. The python language output can of course be used to bootstrap your MicroPython PIO development based on an existing PIO file. https://github.com/raspberrypi/pico-examples/blob/master/pio/ws2812/generated/ws2812.pio.h
The hex generator only supports a single input program, as it just dumps the raw instructions, one per line, each as a 4-digit (16-bit) hexadecimal number. Given the program referenced below: https://github.com/raspberrypi/pico-examples/blob/master/pio/squarewave/squarewave.pio
The hex output produced is detailed at: https://github.com/raspberrypi/pico-examples/blob/master/pio/squarewave/generated/squarewave.hex
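To illustrate what the hex generator emits, the following sketch hand-encodes the four instructions of the squarewave program (`set pindirs, 1`, `set pins, 1 [1]`, `set pins, 0`, `jmp again` with `again` at address 1) and prints one 4-digit word per line. The field positions used (3-bit opcode in bits 15 to 13, 5-bit delay/side-set in bits 12 to 8, SET destination in bits 7 to 5, 5-bit data or address in bits 4 to 0) follow the RP2040 instruction encoding. The helpers themselves are hypothetical and cover only SET and an unconditional JMP with no side-set.

```c
#include <stdint.h>
#include <stdio.h>

enum { OP_JMP = 0u, OP_SET = 7u };                            /* bits 15..13 */
enum { SET_PINS = 0u, SET_X = 1u, SET_Y = 2u, SET_PINDIRS = 4u }; /* bits 7..5 */

/* Hypothetical encoder: the delay fills bits 12..8, assuming no side-set. */
static uint16_t encode(unsigned opcode, unsigned delay, unsigned arg1, unsigned arg2)
{
    return (uint16_t)((opcode << 13) | ((delay & 0x1fu) << 8) |
                      ((arg1 & 0x7u) << 5) | (arg2 & 0x1fu));
}

static uint16_t pio_set(unsigned dest, unsigned value, unsigned delay)
{
    return encode(OP_SET, delay, dest, value);
}

/* JMP with condition 0 ("always") to a 5-bit address. */
static uint16_t pio_jmp(unsigned addr, unsigned delay)
{
    return encode(OP_JMP, delay, 0u, addr);
}

/* One 4-digit lowercase hex word per line. */
static void dump_hex(FILE *out, const uint16_t *prog, size_t len)
{
    for (size_t i = 0; i < len; i++)
        fprintf(out, "%04x\n", prog[i]);
}
```

Encoding the four instructions in order and passing them to `dump_hex` prints `e081`, `e101`, `e000` and `0001` on separate lines.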
PIO instructions are 16 bits long, and have the following encoding:
All PIO instructions execute in one clock cycle.
The Delay/side-set field is present in all instructions. Its exact use is configured for each state machine by PINCTRL_SIDESET_COUNT:
Operation: Set program counter to Address if Condition is true, otherwise no operation.
Delay cycles on a JMP always take effect, whether Condition is true or false, and they take place after Condition is evaluated and the program counter is updated.
WAIT
Operation: Stall until some condition is met. As with all stalling instructions, delay cycles begin only after the instruction completes. That is, if any delay cycles are present, they do not begin counting until after the wait condition is met.
WAIT 1 IRQ x should not be used with IRQ flags presented to the interrupt controller, to avoid a race condition with a system interrupt handler.
Operation: Shift Bit count bits from Source into the Input Shift Register (ISR). Shift direction is configured for each state machine by SHIFTCTRL_IN_SHIFTDIR. Additionally, increase the input shift count by Bit count, saturating at 32.
If automatic push is enabled, IN will also push the ISR contents to the RX FIFO if the push threshold is reached (SHIFTCTRL_PUSH_THRESH). IN still executes in one cycle, whether an automatic push takes place or not. The state machine will stall if the RX FIFO is full when an automatic push occurs. An automatic push clears the ISR contents to all-zeroes, and clears the input shift count.
IN always uses the least significant Bit count bits of the source data. For example, if PINCTRL_IN_BASE is set to 5, the instruction IN 3, PINS will take the values of pins 5, 6 and 7, and shift these into the ISR. First the ISR is shifted to the left or right to make room for the new input data, then the input data is copied into the gap this leaves. The bit order of the input data is not dependent on the shift direction.
NULL can be used for shifting the ISR's contents. For example, UARTs receive the LSB first, so must shift to the right. After 8 IN PINS, 1 instructions, the input serial data will occupy bits 31...24 of the ISR. An IN NULL, 24 instruction will shift in 24 zero bits, aligning the input data at ISR bits 7...0. Alternatively, the processor or DMA could perform a byte read from FIFO address+3, which would take bits 31...24 of the FIFO contents.
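The shifting rules above can be modelled in a few lines of C. This is a simplified sketch under stated assumptions: autopush is ignored, the source value is passed in directly rather than read from pins, and the names are hypothetical. It reproduces the UART case, where eight single-bit right shifts leave the byte in bits 31...24 and IN NULL, 24 moves it down to bits 7...0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the ISR side of IN (autopush not modelled). */
typedef struct {
    uint32_t isr;
    unsigned count;       /* input shift count, saturating at 32 */
    bool     shift_right; /* SHIFTCTRL_IN_SHIFTDIR */
} isr_state;

/* IN <source>, <bit_count>: only the least significant bit_count bits of
   src are used, whatever the shift direction. */
static void pio_in(isr_state *s, uint32_t src, unsigned bit_count)
{
    assert(bit_count >= 1u && bit_count <= 32u);
    uint32_t data = bit_count == 32u ? src : src & ((1u << bit_count) - 1u);

    if (bit_count == 32u)
        s->isr = data;                                  /* whole register replaced */
    else if (s->shift_right)
        s->isr = (s->isr >> bit_count) | (data << (32u - bit_count)); /* new bits at top */
    else
        s->isr = (s->isr << bit_count) | data;          /* new bits at bottom */

    s->count = s->count + bit_count > 32u ? 32u : s->count + bit_count;
}
```

Feeding the bits of 0xA5 LSB first with shift_right set leaves 0xA5000000 in the ISR, and a further 24-bit shift of zeroes leaves 0x000000A5.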
in <source>, <bit_count>
where:
Operation: Shift Bit count bits out of the Output Shift Register (OSR), and write those bits to Destination. Additionally, increase the output shift count by Bit count, saturating at 32.
A 32-bit value is written to Destination: the lower Bit count bits come from the OSR, and the remainder are zeroes. This value is the least significant Bit count bits of the OSR if SHIFTCTRL_OUT_SHIFTDIR is to the right, otherwise it is the most significant bits.
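A matching sketch for OUT, under the same caveats (autopull ignored, vacated OSR bits assumed to fill with zeroes, hypothetical names), shows how the shift direction decides whether the low or the high end of the OSR is consumed first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the OSR side of OUT (autopull not modelled). */
typedef struct {
    uint32_t osr;
    unsigned count;        /* output shift count, saturating at 32 */
    bool     shift_right;  /* SHIFTCTRL_OUT_SHIFTDIR */
} osr_state;

/* OUT <destination>, <bit_count>: returns the 32-bit value written to the
   destination, with the data in its low bit_count bits and zeroes above. */
static uint32_t pio_out(osr_state *s, unsigned bit_count)
{
    assert(bit_count >= 1u && bit_count <= 32u);
    uint32_t value;
    if (bit_count == 32u) {
        value = s->osr;
        s->osr = 0u;
    } else if (s->shift_right) {
        value = s->osr & ((1u << bit_count) - 1u);   /* least significant bits */
        s->osr >>= bit_count;
    } else {
        value = s->osr >> (32u - bit_count);         /* most significant bits */
        s->osr <<= bit_count;
    }
    s->count = s->count + bit_count > 32u ? 32u : s->count + bit_count;
    return value;
}
```

With the OSR holding 0x12345678, two 8-bit OUTs return 0x78 then 0x56 when shifting right, but 0x12 then 0x34 when shifting left.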
PINS and PINDIRS use the OUT pin mapping.
If automatic pull is enabled, the OSR is automatically refilled from the TX FIFO if the pull threshold, SHIFTCTRL_PULL_THRESH, is reached. The output shift count is simultaneously cleared to 0. In this case, the OUT will stall if the TX FIFO is empty, but otherwise still executes in one cycle. The specifics are given in Section 3.5.4.
OUT EXEC allows instructions to be included inline in the FIFO datastream. The OUT itself executes in one cycle, and the instruction from the OSR is executed on the next cycle. There are no restrictions on the types of instructions which can be executed by this mechanism. Delay cycles on the initial OUT are ignored, but the executee may insert delay cycles as normal.
OUT PC behaves as an unconditional jump to an address shifted out from the OSR.
out <destination>, <bit_count>
where:
Operation: Push the contents of the ISR into the RX FIFO, as a single 32-bit word. Clear ISR to all-zeroes.
PUSH IFFULL helps to make programs more compact, like autopush. It is useful in cases where the IN would stall at an inappropriate time if autopush were enabled, e.g. if the state machine is asserting some external control signal at this point.
push (iffull)
push (iffull) block
push (iffull) noblock
where:
Operation: Load a 32-bit word from the TX FIFO into the OSR.
Some peripherals (UART, SPI, etc.) should halt when no data is available, and pick it up as it comes in; others (I2S) should clock continuously, and it is better to output placeholder or repeated data than to stop clocking. This can be achieved with the Block parameter.
A nonblocking PULL on an empty FIFO has the same effect as MOV OSR, X. The program can either preload scratch register X with a suitable default, or execute a MOV X, OSR after each PULL NOBLOCK, so that the last valid FIFO word will be recycled until new data is available.
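The recycling idiom can be sketched as follows. The four-entry ring buffer and the sm_model type are illustrative stand-ins for the TX FIFO and state machine registers, not hardware or SDK structures; the only behaviour taken from the text is that PULL NOBLOCK on an empty FIFO copies X into the OSR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TX_DEPTH 4u

/* Hypothetical model: TX FIFO as a ring buffer, plus the OSR and X. */
typedef struct {
    uint32_t fifo[TX_DEPTH];
    unsigned head, level;    /* read index and fill level */
    uint32_t osr, x;
} sm_model;

/* System side: write a word into the TX FIFO; false if it is full. */
static bool tx_push(sm_model *m, uint32_t word)
{
    if (m->level == TX_DEPTH)
        return false;
    m->fifo[(m->head + m->level) % TX_DEPTH] = word;
    m->level++;
    return true;
}

/* PULL NOBLOCK: take a FIFO word if available, else act as MOV OSR, X. */
static void pull_noblock(sm_model *m)
{
    assert(m->level <= TX_DEPTH);
    if (m->level > 0u) {
        m->osr = m->fifo[m->head];
        m->head = (m->head + 1u) % TX_DEPTH;
        m->level--;
    } else {
        m->osr = m->x;
    }
}

/* PULL NOBLOCK followed by MOV X, OSR: X always holds the last word that
   reached the OSR, so it is repeated while the FIFO stays empty. */
static uint32_t next_word(sm_model *m)
{
    pull_noblock(m);
    m->x = m->osr;
    return m->osr;
}
```

With X preloaded to a default, the first empty pull yields that default; after the system writes 1 and 2, successive calls yield 1, 2 and then 2 again until new data arrives.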
PULL IFEMPTY is useful if an OUT with autopull would stall in an inappropriate location when the TX FIFO is empty. For example, a UART transmitter should not stall immediately after asserting the start bit. IfEmpty permits some of the same program simplifications as autopull, but the stall occurs at a controlled point in the program.
pull (ifempty)
pull (ifempty) block
pull (ifempty) noblock
where:
Operation: Copy data from Source to Destination.
MOV PC causes an unconditional jump. MOV EXEC has the same behaviour as OUT EXEC (Section 3.4.5), and allows register contents to be executed as an instruction. The MOV itself executes in 1 cycle, and the instruction in Source on the next cycle. Delay cycles on MOV EXEC are ignored, but the executee may insert delay cycles as normal.
The STATUS source has a value of all-ones or all-zeroes, depending on some state machine status such as FIFO full/empty, configured by EXECCTRL_STATUS_SEL.
MOV can manipulate the transferred data in limited ways, specified by the Operation argument. Invert sets each bit in Destination to the logical NOT of the corresponding bit in Source, i.e. 1 bits become 0 bits, and vice versa. Bit reverse sets each bit n in Destination to bit 31-n in Source, assuming the bits are numbered 0 to 31.
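Both operations are easy to express in C, which can be handy when checking what a MOV with an operation will produce. This is a sketch with hypothetical function names; in pioasm source the invert operation is usually written with ~ (or !) and bit-reverse with ::, but treat that syntax note as an assumption to verify against the assembler documentation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the MOV "invert" operation: bitwise NOT. */
static uint32_t mov_invert(uint32_t src)
{
    return ~src;
}

/* Hypothetical model of the MOV "bit-reverse" operation:
   bit n of the result is bit 31-n of the source. */
static uint32_t mov_bit_reverse(uint32_t src)
{
    uint32_t dst = 0u;
    for (unsigned n = 0; n < 32u; n++)
        if (src & (1u << (31u - n)))
            dst |= 1u << n;
    return dst;
}
```

Bit-reversing twice returns the original value, and bit-reversing 1 gives 0x80000000.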
mov <destination>, (op)<source>
where:
<destination> Is one of the destinations specified above.
<op> If present, is:
Operation: Set or clear the IRQ flag selected by Index argument.
IRQ flags 4-7 are visible only to the state machines; IRQ flags 0-3 can be routed out to system level interrupts, on either of the PIO's two external interrupt request lines, configured by IRQ0_INTE and IRQ1_INTE.
The modulo addition bit allows relative addressing of ‘IRQ’ and ‘WAIT’ instructions, for synchronising state machines which are running the same program. Bit 2 (the third LSB) is unaffected by this addition.
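A sketch of the relative index calculation, assuming a plain IRQ number from 0 to 7 and a state machine ID from 0 to 3 (the function name is hypothetical):

```c
#include <assert.h>

/* Hypothetical helper: resolve a relative IRQ index for state machine
   sm_id. The ID is added modulo 4 to the two LSBs; bit 2 passes through. */
static unsigned irq_rel_index(unsigned irq_num, unsigned sm_id)
{
    assert(irq_num < 8u && sm_id < 4u);
    return (irq_num & 4u) | ((irq_num + sm_id) & 3u);
}
```

So with relative addressing, state machines 0 to 3 executing the same IRQ instruction with index 0 would each use a different flag (0, 1, 2 and 3), while an index of 4 to 7 stays within the upper group of flags.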
If Wait is set, Delay cycles do not begin until after the wait period elapses.
irq <irq_num>(_rel)
irq set <irq_num>(_rel)
irq nowait <irq_num>(_rel)
irq wait <irq_num>(_rel)
irq clear <irq_num>(_rel)
where:
Operation: Write immediate value Data to Destination.
This can be used to assert control signals such as a clock or chip select, or to initialise loop counters. As Data is 5 bits in size, scratch registers can be SET to values from 0-31, which is sufficient for a 32-iteration loop.
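The 32-iteration figure follows from how a decrementing JMP counts. Assuming the JMP condition X-- behaves as described in the RP2040 datasheet (the jump is taken if X is non-zero before the decrement, and X is decremented either way), a loop closed by jmp x-- runs its body n + 1 times after set x, n. The C model below is a hypothetical sketch of that counting, not PIO code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of:
 *         set x, n
 *   loop: <body>
 *         jmp x-- loop
 * assuming JMP X-- tests X before decrementing and always decrements. */
static unsigned loop_iterations(unsigned n)
{
    assert(n <= 31u);                 /* SET data is 5 bits */
    uint32_t x = n;
    unsigned iterations = 0;
    for (;;) {
        iterations++;                 /* one pass through the body */
        uint32_t before = x;
        x--;                          /* X-- always decrements */
        if (before == 0u)
            break;                    /* condition false: fall through */
    }
    return iterations;
}
```

With the largest SET value, 31, the body runs 32 times.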
The mapping of SET and OUT onto pins is configured independently. They may be mapped to distinct locations, for example if one pin is to be used as a clock signal, and another for data. They may also be overlapping ranges of pins: a UART transmitter might use SET to assert start and stop bits, and OUT instructions to shift out FIFO data to the same pins.
set <destination>, <value>
where:
Side-set is a feature that allows state machines to change the level or direction of up to 5 pins, concurrently with the main execution of the instruction. One example where this is desirable is a fast SPI interface: here a clock transition (toggling 1->0 or 0->1) must be simultaneous with a data transition, where a new data bit is shifted from the OSR to a GPIO. In this case an OUT with a side-set would achieve both of these at once.
Side-set makes the timing of the interface more precise, reduces the overall program size (as a separate SET instruction is not needed to toggle the clock pin), and also increases the maximum frequency the SPI can run at.
Side-set also makes GPIO mapping much more flexible, as its mapping is independent from SET. For example, SDA and SCL can be mapped to any two arbitrary pins. Normally, SCL toggles to synchronise data transfer, and SDA contains the data bits being shifted out. However, some I2C sequences, such as the Start and Stop line conditions, need a fixed pattern to be driven on SDA as well as SCL. The mapping I2C uses to achieve this is:
This lets the state machine serve the two use cases of data on SDA and clock on SCL, or fixed transitions on both SDA and SCL, while still allowing SDA and SCL to be mapped to any two GPIOs of choice.
The side-set data is encoded in the Delay/side-set field of each instruction. Any instruction can be combined with side-set, including instructions which write to the pins, such as OUT PINS or SET PINS. Side-set's pin mapping is independent from OUT and SET mappings, though it may overlap. If side-set and an OUT or SET write to the same pin simultaneously, the side-set data is used.
If an instruction stalls, the side-set still takes effect immediately. https://datasheets.raspberrypi.com/rp2040/rp2040-datasheet.pdf, page 351—spi_tx_fast
The spi_tx_fast example shows two benefits of this: data and clock transitions can be more precisely co-aligned, and programs can be made faster overall, with an output of one bit per two system clock cycles in this case. Programs can also be made smaller.
There are four things to configure when using side-set:
In the above example, we have only one side-set data bit, and every instruction performs a side-set, so no enable bit is required. SIDESET_COUNT would be 1, SIDE_EN would be false. SIDE_PINDIR would also be false, as we want to drive the clock high and low, not high- and low-impedance. SIDESET_BASE would select the GPIO the clock is driven from.
PIO programs often have an “outer loop”: they perform the same sequence of steps, repetitively, as they transfer a stream of data between the FIFOs and the outside world. The square wave program from the introduction (https://github.com/raspberrypi/pico-examples/blob/master/pio/squarewave/squarewave.pio) is a minimal example of this.
The main body of the program drives a pin high, and then low, producing one period of a square wave. The entire program then loops, driving a periodic output. The jump itself takes one cycle, as does each set instruction, so to keep the high and low periods of the same duration, the set pins, 1 has a single delay cycle added, which makes the state machine idle for one cycle before executing the set pins, 0 instruction. In total, each loop takes four cycles. There are two frustrations here:
As the Program Counter (PC) naturally wraps to 0 when incremented past 31, we could solve the second of these by filling the entire instruction memory with a repeating pattern of set pins, 1 and set pins, 0, but this is wasteful. State machines have a hardware feature, configured via their EXECCTRL control register, which solves this common case. https://github.com/raspberrypi/pico-examples/blob/master/pio/squarewave/squarewave_wrap.pio
After executing an instruction from the program memory, state machines use the following logic to update PC:
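The rule can be sketched as a pure function. This assumes the precedence given in the RP2040 datasheet (a taken JMP first, then the wrap check against EXECCTRL_WRAP_TOP, then a plain increment that rolls over from 31 to 0) and ignores stalls, during which PC does not advance; the function name and parameters are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the PC update after an instruction fetched from
   instruction memory (stalls not modelled). */
static unsigned next_pc(unsigned pc, bool jmp_taken, unsigned jmp_target,
                        unsigned wrap_top, unsigned wrap_bottom)
{
    assert(pc < 32u && jmp_target < 32u && wrap_top < 32u && wrap_bottom < 32u);
    if (jmp_taken)
        return jmp_target;      /* a taken JMP has priority */
    if (pc == wrap_top)
        return wrap_bottom;     /* wrap without spending an instruction */
    return (pc + 1u) % 32u;     /* natural increment, 31 rolls over to 0 */
}
```

Assuming squarewave_wrap places the wrap target at address 1 and the wrap at address 2, execution returns from address 2 to address 1 without a JMP instruction, which is what frees the cycle the jump used to cost.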
The .wrap_target and .wrap assembly directives are essentially labels. They export constants which can be written to the WRAP_BOTTOM and WRAP_TOP control fields, respectively: https://github.com/raspberrypi/pico-examples/blob/master/pio/squarewave/generated/squarewave_wrap.pio.h
The squarewave_wrap example has delay cycles inserted, so that it behaves identically to the original squarewave program. Thanks to program wrapping, these can be removed, so that the output toggles twice as fast, while maintaining an even balance of high and low periods: https://github.com/raspberrypi/pico-examples/blob/master/pio/squarewave/squarewave_fast.pio
By default, each state machine possesses a 4-entry FIFO in each direction: one for data transfer from system to state machine (TX), the other for the reverse direction (RX). However, many applications do not require bidirectional data transfer between the system and an individual state machine, but may benefit from deeper FIFOs: in particular, high-bandwidth interfaces such as DPI. For these cases, SHIFTCTRL_FJOIN can merge the two 4-entry FIFOs into a single 8-entry FIFO.
Another example is a UART: because the TX/CTS and RX/RTS parts of a UART are asynchronous, they are implemented on two separate state machines. It would be wasteful to leave half of each state machine's FIFO resources idle. The ability to join the two halves into just a TX FIFO for the TX/CTS state machine, or just an RX FIFO in the case of the RX/RTS state machine, allows full utilisation. A UART equipped with an 8-deep FIFO can be left alone for twice as long between interrupts as one with only a 4-deep FIFO.
The area and power footprint of this whole FIFO arrangement is nearly identical to a single 8-deep FIFO, but this design covers many more use cases.
When one FIFO is increased in size (from 4 to 8), the other FIFO on that state machine is reduced to zero. For example, if joining to TX, the RX FIFO is unavailable, and any PUSH instruction will stall. The RX FIFO will appear both RXFULL and RXEMPTY in the FSTAT register. The converse is true if joining to RX: the TX FIFO is unavailable, and the TXFULL and TXEMPTY bits for this state machine will both be set in FSTAT.
8 FIFO entries is sufficient for 1 word per clock through the RP2040 system DMA, provided the DMA is not slowed by contention with other masters.
Changing FJOIN discards any data present in the state machine's FIFOs. If this data is irreplaceable, it must be drained beforehand.
With each OUT instruction, the OSR gradually empties, as data is shifted out. Once empty, it must be refilled: for example, a PULL transfers one word of data from the TX FIFO to the OSR. Similarly, the ISR must be emptied once full. One approach to this is a loop which performs a PULL after an appropriate amount of data has been shifted: https://datasheets.raspberrypi.com/rp2040/rp2040-datasheet.pdf, page 354—manual_pull
This program shifts out 4 bits from each FIFO word, with an accompanying bit clock, at a constant rate of 1 bit per 4 cycles. When the TX FIFO is empty, it stalls with the clock high (noting that side-set still takes place on cycles where the instruction stalls).
This program has some limitations:
This is a common type of problem for PIO, so each state machine has some extra hardware to handle it. State machines keep track of the total shift count OUT of the OSR and IN to the ISR, and trigger certain actions once these counters reach a programmable threshold.
This is shorter and simpler than the original, and can run twice as fast, if the delay cycles are removed, since the hardware refills the OSR “for free”. Note that the program does not determine the total number of bits to be shifted before the next pull; the hardware automatically pulls once the programmable threshold, SHIFTCTRL_PULL_THRESH, is reached, so the same program could also shift out e.g. 16 or 32 bits from each FIFO word.
Finally, note that the above program is not exactly the same as the original, since it stalls with the clock output low, rather than high. We can change the location of the stall, using the PULL IFEMPTY instruction, which uses the same configurable threshold as autopull: https://datasheets.raspberrypi.com/rp2040/rp2040-datasheet.pdf, page 355—somewhat_manual_pull
Below is a complete example (PIO program, plus a C program to load and run it) which illustrates autopull and autopush both enabled on the same state machine. It programs state machine 0 to loop back data from the TX FIFO to the RX FIFO, with a throughput of one word per two clocks. It also demonstrates how the state machine will stall if it tries to OUT when both the OSR and TX FIFO are empty. https://datasheets.raspberrypi.com/rp2040/rp2040-datasheet.pdf, page 356—auto_push_pull
To trigger automatic push or pull at the correct time, the state machine tracks the total shift count of the ISR and OSR, using a pair of saturating 6 bit counters.
On any OUT or IN instruction, the state machine compares the shift counters to the values of SHIFTCTRL_PULL_THRESH and SHIFTCTRL_PUSH_THRESH to decide whether action is required. Autopull and autopush are individually enabled by the SHIFTCTRL_AUTOPULL and SHIFTCTRL_AUTOPUSH fields.
Pseudocode for an ‘IN’ with autopush enabled can be found at: https://datasheets.raspberrypi.com/rp2040/rp2040-datasheet.pdf, page 357
Note that the hardware performs the above steps in a single machine clock cycle (unless there is a stall). Threshold is configurable from 1 to 32.
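The referenced pseudocode can be restated as a C model. This sketch assumes a left-shifting ISR and models a stall as the IN not completing (nothing is committed and the caller retries on a later cycle); the RX FIFO is a plain array with no consumer, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RX_DEPTH 4u

/* Hypothetical state machine model for IN with autopush. */
typedef struct {
    uint32_t isr;
    unsigned isr_count;     /* input shift count, saturating at 32 */
    unsigned push_thresh;   /* SHIFTCTRL_PUSH_THRESH, 1..32 */
    uint32_t rx[RX_DEPTH];  /* RX FIFO contents (no consumer modelled) */
    unsigned rx_level;
} autopush_sm;

/* One IN <src>, <bit_count> with autopush, ISR shifting left. Returns false
   on a stall (push due but RX FIFO full); no state changes in that case. */
static bool in_autopush(autopush_sm *s, uint32_t src, unsigned bit_count)
{
    assert(bit_count >= 1u && bit_count <= 32u);
    assert(s->push_thresh >= 1u && s->push_thresh <= 32u);

    /* isr = shift_in(isr, input()) */
    uint32_t data = bit_count == 32u ? src : src & ((1u << bit_count) - 1u);
    uint32_t isr = bit_count == 32u ? data : (s->isr << bit_count) | data;
    /* isr count = saturate(isr count + in count) */
    unsigned count = s->isr_count + bit_count;
    if (count > 32u)
        count = 32u;

    if (count >= s->push_thresh) {
        if (s->rx_level == RX_DEPTH)
            return false;              /* stall */
        s->rx[s->rx_level++] = isr;    /* push(isr) */
        s->isr = 0u;                   /* ISR cleared to all-zeroes */
        s->isr_count = 0u;             /* input shift count cleared */
        return true;
    }
    s->isr = isr;
    s->isr_count = count;
    return true;
}
```

With a threshold of 8, four 2-bit INs of 0b10, 0b11, 0b00 and 0b01 push the single word 0xB1 on the fourth IN and leave the ISR and its count at zero.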
On non-‘OUT’ cycles, the hardware performs the equivalent of the following referenced pseudocode: https://datasheets.raspberrypi.com/rp2040/rp2040-datasheet.pdf, page 358
An autopull can therefore occur at any point between two ‘OUT’s, depending on when the data arrives in the FIFO. On ‘OUT’ cycles, the sequence is a little different: https://datasheets.raspberrypi.com/rp2040/rp2040-datasheet.pdf, page 358
The hardware is capable of refilling the OSR simultaneously with shifting out the last of the shift data, as these two operations can proceed in parallel. However, it cannot fill an empty OSR and ‘OUT’ it on the same cycle, due to the long logic path this would create.
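The OUT-cycle and non-OUT-cycle behaviour described above can be sketched as below. It assumes a left-shifting OSR, treats the OSR as empty when its shift count has reached the threshold, and leaves out the MOV and PULL cases that reset the shift count; the names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TX_DEPTH 4u

/* Hypothetical state machine model for OUT with autopull. */
typedef struct {
    uint32_t osr;
    unsigned osr_count;    /* bits shifted out since the last refill */
    unsigned pull_thresh;  /* SHIFTCTRL_PULL_THRESH, 1..32 */
    uint32_t tx[TX_DEPTH]; /* TX FIFO as a ring buffer */
    unsigned tx_head, tx_level;
} autopull_sm;

/* System side: write a word to the TX FIFO; false if it is full. */
static bool tx_put(autopull_sm *s, uint32_t word)
{
    if (s->tx_level == TX_DEPTH)
        return false;
    s->tx[(s->tx_head + s->tx_level) % TX_DEPTH] = word;
    s->tx_level++;
    return true;
}

/* Refill the OSR if the threshold is reached and the FIFO has data. */
static void try_autopull(autopull_sm *s)
{
    if (s->osr_count >= s->pull_thresh && s->tx_level > 0u) {
        s->osr = s->tx[s->tx_head];
        s->tx_head = (s->tx_head + 1u) % TX_DEPTH;
        s->tx_level--;
        s->osr_count = 0u;
    }
}

/* Any cycle that is not an OUT: a refill can happen in the background. */
static void non_out_cycle(autopull_sm *s)
{
    try_autopull(s);
}

/* One OUT <dest>, <bit_count> with autopull, OSR shifting left. Returns
   false on a stall; *value is written only when the OUT completes. */
static bool out_autopull(autopull_sm *s, unsigned bit_count, uint32_t *value)
{
    assert(bit_count >= 1u && bit_count <= 32u);
    assert(s->pull_thresh >= 1u && s->pull_thresh <= 32u);
    if (s->osr_count >= s->pull_thresh) {
        try_autopull(s);   /* an empty OSR may be refilled this cycle, */
        return false;      /* but cannot also be shifted out: stall */
    }
    *value = bit_count == 32u ? s->osr : s->osr >> (32u - bit_count);
    s->osr = bit_count == 32u ? 0u : s->osr << bit_count;
    s->osr_count += bit_count;
    if (s->osr_count > 32u)
        s->osr_count = 32u;
    try_autopull(s);       /* refill in parallel with the final shift */
    return true;
}
```

In this model, with a 32-bit threshold and 8-bit OUTs, the first OUT after data lands in an empty OSR stalls while the refill happens, whereas the fourth OUT of a word both shifts out the last byte and reloads the OSR on the same cycle. A non-OUT cycle can also perform the refill, so that the next OUT proceeds without stalling.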
The refill is somewhat asynchronous to your program, but an ‘OUT’ behaves as a data fence, and the state machine will never ‘OUT’ data which you didn't write into the FIFO.
Note that a ‘MOV’ from the OSR is undefined whilst autopull is enabled; you will read either any residual data that has not been shifted out, or a fresh word from the FIFO, depending on a race against system DMA. Likewise, a ‘MOV’ to the OSR may overwrite data which has just been autopulled. However, data which you ‘MOV’ into the OSR will never be overwritten, since ‘MOV’ updates the shift counter.
If you do need to read the OSR contents, you should perform an explicit ‘PULL’ of some kind. The nondeterminism described above is the cost of the hardware managing pulls automatically. When autopull is enabled, the behaviour of ‘PULL’ is altered: it becomes a no-op if the OSR is full. This is to avoid a race condition against the system DMA. It behaves as a fence: either an autopull has already taken place, in which case the ‘PULL’ has no effect, or the program will stall on the ‘PULL’ until data becomes available in the FIFO.
‘PUSH’ does not need a similar behaviour, because autopush does not have the same nondeterminism.
PIO runs off the system clock, but this is simply too fast for many interfaces, and the number of Delay cycles which can be inserted is limited. Some devices, such as UART, require the signalling rate to be precisely controlled and varied, and ideally multiple state machines can be varied independently while running identical programs. Each state machine is equipped with a clock divider, for this purpose. It would be possible to share clock dividers between state machines.
Rather than slowing the system clock itself, the clock divider redefines how many system clock periods are considered to be “one cycle”, for execution purposes. It does this by generating a clock enable signal, which can pause and resume execution on a per-system-clock-cycle basis. The clock divider generates clock enable pulses at regular intervals, so that the state machine runs at some steady pace, potentially much slower than the system clock.
Implementing the clock dividers in this way allows interfacing between the state machines and the system to be simpler, lower-latency, and with a smaller footprint. The state machine is completely idle on cycles where clock enable is low, though the system can still access the state machine's FIFOs and change its configuration.
The clock dividers are 16-bit integer, 8-bit fractional, with first-order delta-sigma for the fractional divider. The clock divisor can vary between 1 and 65536, in increments of 1/256.
If the clock divisor is set to 1, the state machine runs on every cycle, i.e. full speed:
In general, an integer clock divisor of n will cause the state machine to run 1 cycle in every n, giving an effective clock speed of fsys/n.
Fractional division will maintain a steady state division rate of n+f/256, where n and f are the integer and fractional fields of this state machine's CLKDIV register. It does this by selectively extending some division periods from n cycles to n+1.
For small n, the jitter introduced by a fractional divider may be unacceptable. However, for larger values, this effect is much less apparent.
For fast asynchronous serial, it is recommended to use even divisions or multiples of 1 Mbaud where possible, rather than the traditional multiples of 300, to avoid unnecessary jitter.
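A sketch of the fractional divider follows, assuming a simple first-order accumulator that stretches a period by one system clock whenever the accumulated fraction reaches 256. The exact placement of the long periods in hardware may differ, but the long-run average is n + f/256 as stated above. The clkdiv type, clkdiv_for and next_period are hypothetical helpers, and the 65536 end of the divisor range is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16.8 divider state. */
typedef struct {
    uint32_t int_div;  /* integer part n, 1..65535 in this model */
    uint32_t frac;     /* fractional part f, in 1/256 units, 0..255 */
    uint32_t acc;      /* accumulated fractional error */
} clkdiv;

/* Divisor for f_sys / f_target, fraction truncated to 1/256. */
static clkdiv clkdiv_for(uint32_t f_sys, uint32_t f_target)
{
    assert(f_target > 0u && f_target <= f_sys);
    clkdiv d = { f_sys / f_target,
                 (uint32_t)(((uint64_t)(f_sys % f_target) * 256u) / f_target),
                 0u };
    return d;
}

/* Length, in system clocks, of the next state machine cycle. */
static uint32_t next_period(clkdiv *d)
{
    assert(d->int_div >= 1u && d->int_div <= 65535u && d->frac < 256u);
    d->acc += d->frac;
    if (d->acc >= 256u) {          /* carry: stretch this period to n + 1 */
        d->acc -= 256u;
        return d->int_div + 1u;
    }
    return d->int_div;
}
```

For example, a UART program using 8 execution cycles per bit at 1 Mbaud with a 125 MHz system clock needs a divisor of 125/8 = 15.625, i.e. n = 15 and f = 160. Each period is then 15 or 16 system clocks, and every group of 8 periods adds up to exactly 125 clocks, so each bit lasts 1 µs on average. The 120 MHz, 10 MIPS case from the WS2812 example gives n = 12 and f = 0, with no jitter at all.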
Internally, PIO has a 32-bit register for the output levels of each GPIO it can drive, and another register for the output enables (Hi/Lo-Z). On every system clock cycle, each state machine can write to some or all of the GPIOs in each of these registers.
The write data and write masks for the output level and output enable registers come from the following sources:
Each OUT/SET/side-set operation writes to a contiguous range of pins, but each of these ranges is independently sized and positioned in the 32-bit GPIO space. This is sufficiently flexible for many applications. For example, if one state machine is implementing some interface such as an SPI on a group of pins, another state machine can run the same program, mapped to a different group of pins, and provide a second SPI interface.
On any given clock cycle, the state machine may perform an OUT or a SET, and may simultaneously perform a side-set. The pin mapping logic generates a 32-bit write mask and write data bus for the output level and output enable registers, based on this request, and the pin mapping configuration.
If a side-set overlaps with an OUT/SET performed by that state machine on the same cycle, the side-set takes precedence in the overlapping region.
Each state machine may assert an OUT/SET and a side-set through its pin mapping hardware on each cycle. This generates 32 bits of write data and write mask for the GPIO output level and output enable registers, from each state machine.
For each GPIO, PIO collates the writes from all four state machines, and applies the write from the highest-numbered state machine. This occurs separately for output levels and output enables (directions): it is possible for a state machine to change both the level and direction of the same pin on the same cycle (e.g. via simultaneous SET and side-set), or for one state machine to change a GPIO's direction while another changes that GPIO's level. If no state machine asserts a write to a GPIO's level or direction, the value does not change.
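The combination rules can be sketched for one of the two registers (levels or enables; the same function would simply be applied to each). The sm_write type and function names are hypothetical, and the model only captures the precedence described above: side-set over OUT/SET within a state machine, then the highest-numbered state machine across the PIO, with unwritten GPIOs keeping their value.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SM 4

/* Hypothetical per-state-machine request for one 32-bit GPIO register. */
typedef struct {
    uint32_t main_mask, main_data;   /* from OUT or SET */
    uint32_t side_mask, side_data;   /* from side-set */
} sm_write;

/* Within one state machine, side-set wins where it overlaps OUT/SET. */
static void sm_merge(const sm_write *w, uint32_t *mask, uint32_t *data)
{
    *mask = w->main_mask | w->side_mask;
    *data = (w->main_data & w->main_mask & ~w->side_mask)
          | (w->side_data & w->side_mask);
}

/* Across state machines, later (higher-numbered) writers overwrite earlier
   ones per GPIO; GPIOs with no write keep their current value. */
static uint32_t gpio_update(uint32_t current, const sm_write req[NUM_SM])
{
    uint32_t result = current;
    for (int sm = 0; sm < NUM_SM; sm++) {
        uint32_t mask, data;
        sm_merge(&req[sm], &mask, &data);
        result = (result & ~mask) | (data & mask);
    }
    return result;
}
```

For example, if state machine 0 drives GPIOs 0 and 1 high with OUT while its side-set drives GPIO 1 low, GPIO 1 ends up low; if state machine 2 also drives GPIO 0 low on the same cycle, its write wins for GPIO 0.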
The data observed by IN instructions is mapped such that the LSB is the GPIO selected by PINCTRL_IN_BASE, and successively more-significant bits come from successively higher-numbered GPIOs, wrapping after 31. In other words, the IN bus is a right-rotate of the GPIO input values, by PINCTRL_IN_BASE. If fewer than 32 GPIOs are present, the PIO input is padded with zeroes up to 32 bits. Some instructions, such as WAIT GPIO, use an absolute GPIO number, rather than an index into the IN data bus. In this case, the right-rotate is not applied.
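A sketch of the IN mapping as a 32-bit right-rotate, with hypothetical function names and the 32 GPIO input values packed into one word:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: the IN bus is the GPIO inputs right-rotated by
   PINCTRL_IN_BASE, so bit 0 is the GPIO at in_base, wrapping after 31. */
static uint32_t in_bus(uint32_t gpio_values, unsigned in_base)
{
    assert(in_base < 32u);
    if (in_base == 0u)
        return gpio_values;               /* avoid an undefined shift by 32 */
    return (gpio_values >> in_base) | (gpio_values << (32u - in_base));
}

/* IN PINS, bit_count takes the least significant bit_count bits of the bus. */
static uint32_t in_pins(uint32_t gpio_values, unsigned in_base, unsigned bit_count)
{
    assert(bit_count >= 1u && bit_count <= 32u);
    uint32_t bus = in_bus(gpio_values, in_base);
    return bit_count == 32u ? bus : bus & ((1u << bit_count) - 1u);
}
```

With PINCTRL_IN_BASE = 5, in_pins(gpio, 5, 3) returns GPIOs 5, 6 and 7 in bits 0, 1 and 2, matching the IN 3, PINS example given earlier.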
To protect PIO from metastabilities, each GPIO input is equipped with a standard 2-flipflop synchroniser. This adds two cycles of latency to input sampling, but the benefit is that state machines can perform an IN PINS at any point, and will see only a clean high or low level, not some intermediate value that could disturb the state machine circuitry.
This is absolutely necessary for asynchronous interfaces such as UART RX. It is possible to bypass these synchronisers, on a per-GPIO basis. This reduces input latency, but it is then up to the user to guarantee that the state machine does not sample its inputs at inappropriate times. Generally this is only possible for synchronous interfaces such as SPI. Synchronisers are bypassed by setting the corresponding bit in INPUT_SYNC_BYPASS.
Sampling a metastable input can lead to unpredictable state machine behaviour. This should be avoided.
Besides the instruction memory, state machines can execute instructions from 3 other sources:
Here we load an example program into the state machine, which does two things:
The C program sets the state machine running, at which point it enters the hang loop. While the state machine is still running, the C program forces in a jmp instruction, which causes the state machine to break out of the loop.
When an instruction is written to the INSTR register, the state machine immediately decodes and executes that instruction, rather than the instruction it would have fetched from the PIO's instruction memory. The program counter does not advance, so on the next cycle (assuming the instruction forced into the INSTR interface did not stall) the state machine continues to execute its current program from the point where it left off, unless the written instruction itself manipulated PC.
Delay cycles are ignored on instructions written to the INSTR register, and execute immediately, ignoring the state machine clock divider. This interface is provided for performing initial setup and effecting control flow changes, so it executes instructions in a timely manner, no matter how the state machine is configured.
Instructions written to the INSTR register are permitted to stall, in which case the state machine will latch this instruction internally until it completes. This is signified by the EXECCTRL_EXEC_STALLED flag. This can be cleared by restarting the state machine, or writing a NOP to INSTR.
In the second phase of the example state machine program, the OUT EXEC instruction is used. The OUT itself occupies one execution cycle, and the instruction which the OUT executes is on the next execution cycle. Note that one of the instructions we execute is also an OUT—the state machine is only capable of executing one OUT instruction on any given cycle.
OUT EXEC works by writing the OUT shift data to an internal instruction latch. On the next cycle, the state machine remembers it must execute from this latch rather than the instruction memory, and also knows to not advance PC on this second cycle.
This program will print “12345678” when run.
If an instruction written to INSTR stalls, it is stored in the same instruction latch used by OUT EXEC and MOV EXEC, and will overwrite an in-progress instruction there.
If EXEC instructions are used, instructions written to INSTR must not stall.
Described below are a few examples of programs that can be executed by the PIO in order to demonstrate its applicability to a wide variety of applications.
SPI is a common serial interface with a twisty history. The following referenced program implements full-duplex (i.e. transferring data in both directions simultaneously) SPI, with a CPHA parameter of 0. https://github.com/raspberrypi/pico-examples/blob/master/pio/spi/spi.pio, lines 14-32
This code uses autopush and autopull to continuously stream data from the FIFOs. The entire program runs once for every bit that is transferred, and then loops. The state machine tracks how many bits have been shifted in/out, and automatically pushes/pops the FIFOs at the correct point. A similar program handles the CPHA=1 case: https://github.com/raspberrypi/pico-examples/blob/master/pio/spi/spi.pio, lines 34-42
A C helper function configures the state machine, connects the GPIOs, and sets the state machine running. Note that the SPI frame size—that is, the number of bits transferred for each FIFO record—can be programmed to any value from 1 to 32, without modifying the program. Once configured, the state machine is set running. https://github.com/raspberrypi/pico-examples/blob/master/pio/spi/spi.pio, lines 46-71
The state machine will now immediately begin to shift out any data appearing in the TX FIFO, and push received data into the RX FIFO. https://github.com/raspberrypi/pico-examples/blob/master/pio/spi/pio_spi.c, lines 18-34
Putting this all together, this complete C program will loop back some data through a PIO SPI at 1 MHz, with all four CPOL/CPHA combinations: https://github.com/raspberrypi/pico-examples/blob/master/pio/spi/spi_loopback.c
WS2812 LEDs are driven by a proprietary pulse-width serial format, with a wide positive pulse representing a “1” bit, and narrow positive pulse a “0”. Each LED has a serial input and a serial output; LEDs are connected in a chain, with each serial input connected to the previous LED's serial output.
The LEDs consume 24 bits of pixel data, then pass any additional input data on to their output. In this way a single serial burst can individually program the colour of each LED in a chain. A long negative pulse latches the pixel data into the LEDs. https://github.com/raspberrypi/pico-examples/blob/master/pio/ws2812/ws2812.pio, lines 1-27
This program shifts bits from the OSR into X, and produces a wide or narrow pulse on side-set pin 0, based on the value of each data bit. Autopull must be configured, with a threshold of 24. Software can then write 24-bit pixel values into the FIFO, and these will be serialised to a chain of WS2812 LEDs. https://github.com/raspberrypi/pico-examples/blob/master/pio/ws2812/ws2812.pio
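The FIFO word layout can be sketched in C. This assumes the state machine is configured to shift the OSR left (most significant bit first) with the autopull threshold of 24 mentioned above, so a 24-bit pixel must be placed in bits 31:8 of the word written to the FIFO. It also assumes the green-red-blue byte order commonly used by WS2812 parts. pixel_grb, fifo_word and wire_bit are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack 8-bit channels as 24-bit GRB (assumed order). */
static uint32_t pixel_grb(uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t)g << 16) | ((uint32_t)r << 8) | (uint32_t)b;
}

/* Word for the TX FIFO, assuming left shift and a threshold of 24:
   the pixel must occupy bits 31:8 so its MSB is shifted out first. */
static uint32_t fifo_word(uint32_t grb)
{
    assert(grb <= 0xFFFFFFu);
    return grb << 8;
}

/* Bit i (0 = first on the wire) that each OUT X, 1 would move into X. */
static unsigned wire_bit(uint32_t word, unsigned i)
{
    assert(i < 24u);
    return (word >> (31u - i)) & 1u;
}
```

Under these assumptions, a pixel with green 0xFF, red 0x00 and blue 0x01 becomes the FIFO word 0xFF000100: the eight green bits go out first, then red, and the last bit sent is the LSB of blue.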
A C program configures the state machine to execute this program correctly, and sends some test patterns to a string of 150 LEDs. This program transmits on GPIO 0, but any pin can be selected, by changing the constant PIN_TX.
The state machine's clock divider is configured to slow execution to around 10 MIPS. If system clock speed is 120 MHz, this is a clock divisor of 12.
Note it is possible to make this program as short as 3 instructions, at the cost of making transmission time dependent on data content:
Although not designed for computation, PIO is quite likely Turing-complete, and it is conjectured that it could run DOOM, given a sufficiently high clock speed. https://github.com/raspberrypi/pico-examples/tree/master/pio/addition/addition.pio, lines 1-26
A full 32-bit addition takes only around one minute at 125 MHz. The program pops two numbers from the TX FIFO and pushes their sum to the RX FIFO, which is perfect for use either with the system DMA, or directly by the processor: https://github.com/raspberrypi/pico-examples/tree/master/pio/addition/addition.c
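The program's approach can be modelled in C. Assuming it relies on the two's complement identity x + y = ~(~x - y), as the following sketch does, only decrement operations are needed, which suits the JMP X-- and JMP Y-- conditions. The loop runs once per unit of the second operand; at roughly two instructions per iteration, a worst-case operand near 2^32 costs around 2^33 cycles, or about 69 seconds at 125 MHz, consistent with the one-minute figure above. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of a decrement-only adder, assuming the identity
 * a + b == ~(~a - b) in 32-bit two's complement:
 *   x = ~a; y = b; while (y--) x--; result = ~x;
 * Runs in O(b) steps, which is why a full-range addition is slow. */
static uint32_t pio_style_add(uint32_t a, uint32_t b)
{
    uint32_t x = ~a;      /* like MOV X, ~OSR */
    uint32_t y = b;       /* like MOV Y, OSR */
    while (y-- != 0u)     /* like JMP Y--: continue while Y was non-zero */
        x--;              /* like JMP X--, used only for its decrement */
    return ~x;            /* like MOV ISR, ~X */
}
```

Because the arithmetic is modulo 2^32, pio_style_add(0xFFFFFFFF, 2) wraps round to 1, just as an ordinary 32-bit unsigned addition would.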
The clocks block provides independent clocks to on-chip and external components. It takes inputs from a variety of clock sources allowing the user to trade off performance against cost, board area and power consumption. From these sources it uses multiple clock generators to provide the required clocks. This architecture allows the user flexibility to start and stop clocks independently and to vary some clock frequencies whilst maintaining others at their optimum frequencies.
For very low cost or low power applications where precise timing is not required, the chip can be run from the internal Ring Oscillator (ROSC). Alternatively the user can provide external clocks or construct simple relaxation oscillators using the GPIOs, the XIN input and appropriate external passive components. Where timing is more critical, the Crystal Oscillator (XOSC) can provide an accurate reference to the 2 on-chip PLLs to provide fast clocking at precise frequencies.
The clock generators select from the clock sources and optionally divide the selected clock before outputting through enable logic which provides automatic clock disabling in SLEEP mode.
An on-chip frequency counter facilitates debugging of the clock setup and also allows measurement of the frequencies of external clocks. The on-chip resus component restarts the system clock from a known good clock if it is accidentally stopped. This allows the software debugger to access registers and debug the problem.
The chip has an ultra-low power mode called DORMANT in which all on-chip clock sources are stopped to save power. External sources are not stopped and can be used to provide a clock to the on-chip RTC which can provide an alarm to wake the chip from DORMANT mode. Alternatively the GPIO interrupts can be configured to wake the chip from DORMANT mode in response to an external event.
Up to 4 generated clocks can be output to GPIOs at up to 50 MHz. This allows the user to supply clocks to external devices, thus reducing component counts in power, space and cost sensitive applications.
The RP2040 can be run from a variety of clock sources shown in
The on-chip Ring Oscillator 231 requires no external components. It runs automatically from power-up and is used to clock the processors during the initial boot stages. The startup frequency is typically 6 MHz but varies with PVT (Process, Voltage and Temperature). The frequency is likely to be in the range 4-8 MHz and is guaranteed to be in the range 1-12 MHz.
For low-cost applications where frequency accuracy is unimportant, the chip can continue to run from the ROSC. If greater performance is required, the frequency can be increased in fine steps, by programming the registers in the Ring Oscillator, to a frequency well beyond the capability of the chip's components. The frequency will vary with PVT, so the user must take care to avoid exceeding the maximum frequencies described in the clock generators section. This variation can be mitigated in various ways if the user wants to continue running from the ROSC at a frequency close to the maximum. Alternatively, the user can use an external clock or the XOSC to provide a stable reference clock and use the PLLs to generate the higher frequencies. However, this will require external components, will cost board area and will increase power consumption.
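One simple way to respect the maximum frequencies when running from the ROSC is to size the clock divider for the worst case rather than the nominal frequency. In the Python sketch below, the spread figure and the 133 MHz limit are hypothetical values chosen for illustration. The real limits are those given in the clock generators section. For comparison, the typical startup frequency of 6 MHz against the guaranteed ceiling of 12 MHz corresponds to a spread of 1.0 in this model.

```python
import math


def safe_divisor(f_nominal_hz: float, pvt_spread: float, f_max_hz: float) -> int:
    """Smallest integer divisor keeping the worst-case clock at or below f_max_hz.

    pvt_spread is an assumed fractional upward deviation; for example, 1.0
    allows the oscillator to run at up to twice its nominal frequency.
    """
    if f_nominal_hz <= 0 or f_max_hz <= 0 or pvt_spread < 0:
        raise ValueError("frequencies must be positive and spread non-negative")
    worst_case_hz = f_nominal_hz * (1.0 + pvt_spread)
    return max(1, math.ceil(worst_case_hz / f_max_hz))


# Example with hypothetical figures: a ROSC tuned to 200 MHz nominal, a +50%
# worst-case spread and a 133 MHz limit on the clock it drives.
div = safe_divisor(200e6, 0.5, 133e6)
print(div, 200e6 * 1.5 / div)
```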
If an external clock or the XOSC is used then the ROSC can be stopped to save power. However, the reference clock generator and the system clock generator must be switched to an alternate source before doing so.
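The ordering constraint can be expressed as a small state model. The Python sketch below uses the names `clk_ref` and `clk_sys` for the reference and system clock generators and a hypothetical `ClockTree` class. It is not a driver. It only checks that a source is never stopped while a generator still selects it.

```python
class ClockTree:
    """Toy model of clock sources and the generators that select between them.

    Source and generator names are illustrative; only the rule that a running
    generator must not lose its selected source is modelled.
    """

    def __init__(self) -> None:
        # After boot, both generators are assumed to run from the ROSC.
        self.running = {"rosc": True, "xosc": False}
        self.selected = {"clk_ref": "rosc", "clk_sys": "rosc"}

    def start(self, source: str) -> None:
        self.running[source] = True

    def switch(self, generator: str, source: str) -> None:
        if not self.running[source]:
            raise RuntimeError(f"{source} must be running before {generator} selects it")
        self.selected[generator] = source

    def stop(self, source: str) -> None:
        users = [g for g, s in self.selected.items() if s == source]
        if users:
            raise RuntimeError(f"cannot stop {source}: still selected by {', '.join(users)}")
        self.running[source] = False


# Safe order: start the XOSC, move both generators across, then stop the ROSC.
tree = ClockTree()
tree.start("xosc")  # on hardware, wait for the XOSC to report stable first
tree.switch("clk_ref", "xosc")
tree.switch("clk_sys", "xosc")
tree.stop("rosc")
print(tree.running)
```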
The ROSC is not affected by SLEEP mode. If required the frequency can be reduced before entering SLEEP mode to save power. On entering DORMANT mode the ROSC is automatically stopped and is restarted in the same configuration when exiting DORMANT mode. If the ROSC is driving clocks at close to their maximum frequencies then it is recommended to drop the frequency before entering SLEEP or DORMANT mode to allow for frequency variation due to changes in environmental conditions during SLEEP or DORMANT mode.
If the user wants to use the ROSC clock externally, it can be output to a GPIO pin using one of the clk_gpclk0-3 generators.
The Crystal Oscillator (XOSC), shown in
The XOSC is disabled on boot, as RP2040 boots using the Ring Oscillator (ROSC). To start the XOSC, the programmer must set the enable bit and then poll the status register until it indicates that the XOSC output is stable.
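The start-up sequence (set the enable bit, then poll the status register) is sketched below in Python against a simulated oscillator. `SimulatedXosc`, its methods and the poll limit are hypothetical. On hardware, the two calls would be a register write and a register read. A bounded poll is a reasonable design choice, since a missing or faulty crystal would otherwise hang the start-up code.

```python
class SimulatedXosc:
    """Hypothetical stand-in for the XOSC enable and status registers.

    The oscillator reports stable only after a fixed number of status reads,
    standing in for the real crystal start-up time.
    """

    def __init__(self, startup_reads: int) -> None:
        self.enabled = False
        self._reads_until_stable = startup_reads

    def set_enable(self) -> None:
        self.enabled = True

    def read_stable(self) -> bool:
        if not self.enabled:
            return False
        if self._reads_until_stable > 0:
            self._reads_until_stable -= 1
            return False
        return True


def start_xosc(xosc: SimulatedXosc, max_reads: int = 10_000) -> int:
    """Enable the XOSC and poll until stable; return how many reads it took."""
    xosc.set_enable()
    for reads in range(1, max_reads + 1):
        if xosc.read_stable():
            return reads
    raise TimeoutError("XOSC did not report a stable output")


print(start_xosc(SimulatedXosc(startup_reads=3)))
```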
The XOSC supports dormant mode, in which it stops oscillating until woken by an asynchronous interrupt. The interrupt can come either from the RTC, clocked by an external clock, or from a GPIO pin going high or low. To put the XOSC into dormant mode, a specific code has to be written to the dormant register, so that it is unlikely to be done by mistake.
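Requiring a full 32-bit code, rather than a single enable bit, means that a uniformly random write matches with probability only 2^-32. The Python model below illustrates this. The value `0x636F6D61` (ASCII "coma") is the dormant code we understand RP2040 to use, but it should be treated as an assumption and checked against the datasheet. The `DormantRegister` class is hypothetical, and its treatment of all other values as no-ops is an assumption of the sketch.

```python
DORMANT_CODE = 0x636F6D61  # ASCII "coma"; assumed RP2040 value, check the datasheet


class DormantRegister:
    """Model of a dormant register that only acts on one specific 32-bit code.

    Treating every other value as a no-op is an assumption of this sketch.
    """

    def __init__(self) -> None:
        self.oscillating = True

    def write(self, value: int) -> None:
        if value == DORMANT_CODE:
            self.oscillating = False

    def wake_event(self) -> None:
        """Asynchronous wake-up, e.g. an RTC alarm or a GPIO edge."""
        self.oscillating = True


reg = DormantRegister()
reg.write(0xFFFFFFFF)    # a stray write does not stop the oscillator
reg.write(DORMANT_CODE)  # only the exact code does
reg.wake_event()         # an interrupt restarts it
print(reg.oscillating, DORMANT_CODE.to_bytes(4, "big"))
```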
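Peripheral registers may be accessed in one of four ways, selected by address decode: normal read/write access, atomic XOR on write, atomic bitmask set on write, and atomic bitmask clear on write.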
This allows individual fields of a control register to be modified without performing a read-modify-write sequence in software: instead the changes are posted to the peripheral, and performed in-situ. Without this capability, it is difficult to safely access IO registers when an interrupt service routine is concurrent with code running in the foreground, or when the two processors are running code in parallel. Note that this is more flexible than byte or halfword writes, as any combination of fields can be updated in one operation.
Each register block is allocated 4 kB of address space, with the four atomic access aliases occupying a total of 16 kB. Most peripherals on RP2040 provide this functionality natively, and atomic writes have the same timing as normal read/write access. Some peripherals (I2C, UART, SPI and SSI) instead have this functionality added using a bus interposer, which translates upstream atomic writes into downstream read-modify-write sequences, at the boundary of the peripheral. This extends the access time by two system clock cycles.
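The four access methods can be modelled as four views of the same register, distinguished by address bits above the 4 kB register block. The Python sketch below assumes the RP2040 layout of normal, XOR, set and clear aliases at offsets `0x0000`, `0x1000`, `0x2000` and `0x3000`. The `RegisterBlock` class and base address are hypothetical. The two-cycle penalty for peripherals behind the bus interposer is taken from the paragraph above.

```python
MASK32 = 0xFFFF_FFFF
# Alias offsets within a 16 kB window: assumed RP2040 layout, one 4 kB alias each.
NORMAL, XOR, SET, CLEAR = 0x0000, 0x1000, 0x2000, 0x3000


def apply_write(current: int, alias: int, data: int) -> int:
    """New register value after a write of `data` through the given alias."""
    if alias == NORMAL:
        new = data
    elif alias == XOR:
        new = current ^ data
    elif alias == SET:
        new = current | data
    elif alias == CLEAR:
        new = current & ~data
    else:
        raise ValueError(f"unknown alias offset {alias:#x}")
    return new & MASK32


class RegisterBlock:
    """Hypothetical peripheral register block decoded from a base address.

    native=False models a peripheral behind a bus interposer, where each atomic
    write becomes a downstream read-modify-write costing two extra system clocks.
    """

    def __init__(self, base: int, native: bool = True) -> None:
        self.base = base
        self.native = native
        self.regs: dict[int, int] = {}

    def write(self, addr: int, data: int) -> int:
        """Perform a write; return the extra system clock cycles it costs."""
        offset = addr - self.base
        if not 0 <= offset < 0x4000:
            raise ValueError("address outside this block's 16 kB window")
        alias, reg = offset & 0x3000, offset & 0x0FFF
        self.regs[reg] = apply_write(self.regs.get(reg, 0), alias, data)
        return 2 if alias != NORMAL and not self.native else 0


BASE = 0x4000_0000  # hypothetical base address
ctrl = RegisterBlock(BASE)
ctrl.write(BASE + 0x04, 0b1010)          # normal write
ctrl.write(BASE + SET + 0x04, 0b0101)    # set bits 0 and 2 in one operation
ctrl.write(BASE + CLEAR + 0x04, 0b0011)  # clear bits 0 and 1
print(bin(ctrl.regs[0x04]))
```

Because each alias write carries only the bits to change, an interrupt handler and foreground code can each update their own fields of the same register without either one overwriting the other's changes.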
Logic modules and components of the present invention can be incorporated in a variety of other devices, such as IO modules, interfaces, single board computers, micro-controller devices, etc. and are particularly useful in portable devices such as smart phones due to their low power consumption. Logic modules of the invention can be embodied in separate integrated circuits or incorporated in other devices, such as System on Chip devices. The PIO and state machines require little silicon real estate, comparable to a conventional single-purpose interface, and so can be included in a die with other modules.
An integrated circuit according to an embodiment of the invention can be mounted on a daughter board that is connected to a motherboard or main board of a computer via a USB hub.
The methods of the present invention may be performed by computer systems comprising one or more computers. A computer used to implement the invention may comprise one or more processors, including general purpose CPUs, graphics processing units (GPUs), tensor processing units (TPUs) or other specialised processors. A computer used to implement the invention may be physical or virtual. A computer used to implement the invention may be a server, a client or a workstation. Multiple computers used to implement the invention may be distributed and interconnected via a network such as a local area network (LAN) or wide area network (WAN). Individual steps of the method may be carried out by a computer system but not necessarily the same computer system. Results of a method of the invention may be displayed to a user or stored in any suitable storage medium. The present invention may be embodied in a non-transitory computer-readable storage medium that stores instructions to carry out a method of the invention. Any suitable programming language may be used to implement the invention. The present invention may be embodied in a computer system comprising one or more processors and memory or storage storing instructions to carry out a method of the invention.
Having described the invention it will be appreciated that variations may be made on the above described embodiments, which are not intended to be limiting. The invention is defined in the appended claims and their equivalents.
Number | Date | Country | Kind |
---|---|---|---|
2100601.0 | Jan 2021 | GB | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/GB2021/053246 | 12/10/2021 | WO |