Typically on routers and other networking devices the memory (DRAM) is logically divided into two parts. The first part is the Main Processor Memory, referred to in the following as PMEM, which is the part of the memory that is accessed only by the CPU. The second part is Shared I/O Memory which is accessed by both the CPU and I/O Devices.
The PMEM is used by program TEXT, DATA, BSS, HEAP, and STACK; the routing tables, route caches, configurations, etc., also reside in PMEM. Since PMEM is accessed ONLY by the CPU, it is mapped with the CACHE attribute. This means that when a memory location in this space is accessed, the corresponding line is brought into the cache, so subsequent accesses to that location are served from the cache. The cache is a small, fast memory located closer to the CPU than main memory, and accesses to it are an order of magnitude faster than accesses to main memory. The PMEM is cached with the write-back attribute. This means that whenever the CPU modifies a location in this memory, the change is made only in the cache and not in main memory; main memory is updated only when that cache line needs to be replaced. The main point to note here is that cacheable memory accesses are much faster than non-cacheable memory accesses.
I/O Memory, referred to in the following as IOMEM, is the part of the memory that is shared by the CPU and other I/O devices, such as the Network Controllers. It is used by the I/O devices to store incoming packets and by the CPU to place outgoing packets. Packet data buffers, particle data buffers, and device control structures (descriptors, address tables, etc.) also reside in IOMEM. Packets received on the network interfaces are placed in IOMEM by the I/O devices to be picked up and processed by the CPU. After the processing is done, the CPU places the packets in IOMEM to be picked up and transmitted by the I/O devices.
Since IOMEM is shared by multiple memory masters, it is mapped with the NON-CACHEABLE attribute. This means that any access to this part of the memory goes all the way to memory and nothing is brought into the cache. This is done to preserve memory coherency. If IOMEM were made cacheable with the write-back attribute and brought into the cache by the CPU, updates made by the CPU would remain in the cache, would not be made to memory, and thus would not be seen by the I/O devices.
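The coherency problem can be illustrated with a toy model in C. This is a sketch under stated assumptions, not the platform's actual cache: a single line of an assumed 32 bytes sits in front of a small RAM array, the CPU stores through it with write-back semantics, and a simulated I/O device reads the RAM directly. All struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy model (illustrative only): one write-back cache line in front of RAM.
   Real caches are set-associative and managed by hardware. */
#define RAM_SIZE 256
#define LINE_SIZE 32   /* assumed line size */

struct toy_system {
    uint8_t ram[RAM_SIZE];    /* main memory: what an I/O device (DMA) sees */
    uint8_t line[LINE_SIZE];  /* the single cache line */
    size_t  line_base;        /* RAM address of the cached line */
    bool    valid;            /* line holds data */
    bool    dirty;            /* line modified but not yet written back */
};

static size_t line_of(size_t addr) { return addr & ~(size_t)(LINE_SIZE - 1); }

/* Eviction: a dirty line is written back to RAM, then dropped. */
static void evict(struct toy_system *s) {
    if (s->valid && s->dirty)
        memcpy(&s->ram[s->line_base], s->line, LINE_SIZE);
    s->valid = false;
    s->dirty = false;
}

/* Bring the line containing addr into the cache, evicting any other line. */
static void fill(struct toy_system *s, size_t addr) {
    if (s->valid && s->line_base == line_of(addr))
        return;
    evict(s);
    s->line_base = line_of(addr);
    memcpy(s->line, &s->ram[s->line_base], LINE_SIZE);
    s->valid = true;
}

/* CPU store through a write-back cached mapping: only the cache changes. */
static void cpu_store_wb(struct toy_system *s, size_t addr, uint8_t v) {
    fill(s, addr);
    s->line[addr - s->line_base] = v;
    s->dirty = true;
}

/* An I/O device reads RAM directly and never looks in the CPU cache. */
static uint8_t device_load(const struct toy_system *s, size_t addr) {
    return s->ram[addr];
}
```

After `cpu_store_wb(&s, 10, 0xAB)` on a zeroed system, `device_load(&s, 10)` still returns the old value 0 until `evict()` writes the dirty line back. That is the stale view an I/O device would get if IOMEM were cached write-back.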
As part of packet processing the CPU will perform multiple accesses to the packet data which lies in IOMEM. These accesses are to: 1) decode the encapsulation; 2) read the Layer 3 addresses; and 3) do other things needed to route the packet.
Since these CPU accesses to IOMEM are all Non-Cached accesses, they are a major performance bottleneck because, as described above, Non-Cached accesses are an order of magnitude slower than Cached accesses.
In one embodiment of the invention, a scheme is developed in which CPU accesses to the packet data are cached while memory coherency is preserved, giving a performance boost in packet processing.
In another embodiment of the invention, the shared I/O memory is mapped into two address spaces: the first is cached with the write-through attribute set, and the second is uncached.
In another embodiment of the invention, the second (uncached) address space has virtual addresses equal to the physical addresses and is used by the I/O devices to access packets in the shared I/O memory.
In another embodiment of the invention, the CPU uses the first address space to access and store packets during fast switching. Setting the write-through attribute ensures coherency between the I/O memory and the cache.
In another embodiment of the invention, the CPU invalidates the cache lines corresponding to the addresses where an accessed packet is stored, to preserve coherency.
Other features and advantages will be apparent in view of the following detailed description and appended drawings.
The invention will now be described with reference to various embodiments implemented in a routing platform. In the following, the term routing platform is used broadly to include any component, such as a router, bridge, switch, layer 2 or layer 3 switch, gateway, etc., that is utilized to implement connectivity within a network or between networks. In the following, embodiments will be described, by way of example, not limitation, that operate on routing platforms designed and manufactured by the assignee of the present patent application. However, it is understood by persons of skill in the art that the invention has broad utility in any routing platform.
An embodiment of the invention will now be described that provides for cached IOMEM accesses only by the CPU, non-cached IOMEM accesses by I/O devices, and that prevents loss of cache coherency with the IOMEM. The embodiment is implemented by software utilizing standard microprocessor hardware.
Before describing the first embodiment, the environment in which the embodiment operates will be briefly described. Generally, a routing platform includes a chassis, which contains basic components such as a power supply, fans, slots, ports, and modules that slide into the slots. The modules inserted into the slots are line cards, which are the actual printed circuit boards that handle packet ingress and egress. Line cards provide one or more interfaces over which traffic flows. Thus, depending on the number of slots and interfaces, a router can be configured to work with a variety of networking protocols.
The basic function of a routing platform is routing, or forwarding, which consists of transferring a packet from an input interface to an output interface. The routing, or forwarding, function comprises two interrelated processes that move information in the network: making a routing decision (routing) and moving packets to the next-hop destination (switching). Many routing platforms perform both routing and switching, and there are several types of each.
The DRAM 14 includes the working storage 20 utilized by the CPU (PMEM) and the shared storage 22 dedicated to handling the router's packet buffers (IOMEM). The packet buffers, particles, descriptor rings, etc., reside in IOMEM.
In this embodiment a data cache 24 is located on the CPU and is used to cache data during accesses to the addresses from an address space having the cache attribute set.
The mapping of IOMEM will now be described with reference to the flowchart of
As depicted in
In one embodiment, the following mappings are utilized for the first and second address spaces: IOMEM_CACHED virtual address = (Physical Addr + 0x40000000); and IOMEM_UNCACHED virtual address = (Physical Addr).
Thus, in this embodiment, addresses in the first and second address spaces differ only in the value of a single high-order address bit. Addresses in the first address space require translation and are mapped with the CACHE attribute. Because data accesses by the CPU through this space are cached, subsequent accesses to the same data are served from the cache without incurring main-memory latency, which removes a significant performance penalty and increases packet processing speed.
The I/O devices access memory using physical addresses; hence, the IOMEM_UNCACHED virtual addresses are made equal to the physical addresses.
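Under this mapping, converting between the two spaces is a single bit operation. The following C helpers are a sketch that assumes 32-bit addresses, an IOMEM_CACHED alias formed by setting bit 0x40000000, and physical IOMEM addresses that have that bit clear. The function names are hypothetical, not part of any real OS API.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed alias bit: IOMEM_CACHED = physical address with this bit set. */
#define IOMEM_CACHED_BIT 0x40000000u

/* IOMEM_UNCACHED virtual address == physical address (identity mapping). */
static inline uint32_t iomem_phys_to_uncached(uint32_t phys) {
    return phys;
}

/* Physical -> IOMEM_CACHED: set the alias bit (assumed clear in IOMEM). */
static inline uint32_t iomem_phys_to_cached(uint32_t phys) {
    assert((phys & IOMEM_CACHED_BIT) == 0);
    return phys | IOMEM_CACHED_BIT;
}

/* IOMEM_CACHED -> IOMEM_UNCACHED: clear the alias bit. */
static inline uint32_t iomem_cached_to_uncached(uint32_t cached) {
    assert((cached & IOMEM_CACHED_BIT) != 0);
    return cached & ~IOMEM_CACHED_BIT;
}

/* IOMEM_UNCACHED -> IOMEM_CACHED: set the alias bit again. */
static inline uint32_t iomem_uncached_to_cached(uint32_t uncached) {
    assert((uncached & IOMEM_CACHED_BIT) == 0);
    return uncached | IOMEM_CACHED_BIT;
}
```

Because both aliases name the same DRAM, the conversion never touches memory; only the CPU's choice of alias decides whether an access goes through the cache.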
In this embodiment, when the operating system (OS) allocates buffers in IOMEM, the IOMEM_CACHED addresses are returned.
The accessing of IOMEM by the CPU will now be described with reference to the flowchart of
The common code used by the operating system and executed by the CPU (process switching code, fast switching code, CEF switching code, etc.) always accesses the particles and buffers using addresses from the IOMEM_CACHED address space. When the CPU executes fast switching code, for example, it uses the IOMEM_CACHED addresses so that the packet data is cached for fast access.
The above IOMEM_CACHED scheme is used only for data buffers. All the other data structures shared by the processor and the I/O devices that reside in the IOMEM space, e.g., descriptors and address filter tables, are accessed by the processor using the IOMEM_UNCACHED space. When these non-buffer data structures are allocated at initialization, the IOS® operating system returns a cached address from the IOMEM_CACHED space; the device driver then converts this address to an IOMEM_UNCACHED address and stores it in the driver's local structure. From that point on, all accesses to the memory storing these non-buffer data structures are made through the uncached address, except the call to free that memory. Before the call to free the memory is made, the address is converted back to an IOMEM_CACHED address, because all mallocs and frees use the IOMEM_CACHED address space.
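The allocation rule for non-buffer structures can be sketched as follows. `iomem_malloc` and `iomem_free` are hypothetical stand-ins for the OS IOMEM allocator (they are not IOS® APIs), modeled here as a bump allocator that returns IOMEM_CACHED addresses. Addresses are plain 32-bit integers, and the cached alias is assumed to be the physical address with bit 0x40000000 set.

```c
#include <assert.h>
#include <stdint.h>

#define IOMEM_CACHED_BIT 0x40000000u   /* assumed alias bit */

/* Hypothetical toy allocator standing in for the OS IOMEM allocator. */
static uint32_t next_phys = 0x00100000u;  /* assumed start of IOMEM */
static uint32_t last_freed;               /* records the address given to free */

/* Like the OS allocator described in the text, returns IOMEM_CACHED addresses. */
static uint32_t iomem_malloc(uint32_t size) {
    uint32_t cached = next_phys | IOMEM_CACHED_BIT;
    next_phys += (size + 31u) & ~31u;     /* keep blocks 32-byte aligned */
    return cached;
}

/* Frees accept IOMEM_CACHED addresses only. */
static void iomem_free(uint32_t cached) {
    assert(cached & IOMEM_CACHED_BIT);
    last_freed = cached;
}

/* Driver-local state: the descriptor ring is remembered by its uncached address. */
struct drv_ctx {
    uint32_t ring_uncached;
};

static void drv_init(struct drv_ctx *d, uint32_t ring_bytes) {
    uint32_t cached = iomem_malloc(ring_bytes);
    d->ring_uncached = cached & ~IOMEM_CACHED_BIT;  /* later accesses bypass the cache */
}

static void drv_shutdown(struct drv_ctx *d) {
    iomem_free(d->ring_uncached | IOMEM_CACHED_BIT); /* convert back before freeing */
    d->ring_uncached = 0;
}
```

Keeping the uncached form in the driver's own structure means the conversion happens only twice in the structure's lifetime, at allocation and at free.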
When the CPU has a packet for transmission, the buffer addresses in the packet are IOMEM_CACHED addresses. Since the I/O devices use IOMEM_UNCACHED addresses, a conversion must be done before the packets are given to the I/O devices for transmission. In the currently described embodiment, with the address mapping described above, this conversion requires only flipping a single address bit. The same conversion from IOMEM_CACHED to IOMEM_UNCACHED is required when the CPU gives empty buffers to the I/O devices for receiving packets.
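Handing a buffer to an I/O device, for either transmission or reception, then reduces to clearing the alias bit before the address is written into the device's descriptor. The descriptor layout, the `DESC_OWNED_BY_DEVICE` flag, and the 0x40000000 alias bit are hypothetical; a real driver follows its controller's descriptor format and typically needs a CPU-specific write barrier before passing ownership, which this sketch omits. The descriptor is written through a `volatile` pointer because descriptors are reached through the uncached space.

```c
#include <stdint.h>

#define IOMEM_CACHED_BIT 0x40000000u   /* assumed alias bit */

/* Hypothetical DMA descriptor; real network controllers define their own. */
struct dma_desc {
    uint32_t buf_addr;   /* address the I/O device will DMA to or from */
    uint16_t length;
    uint16_t flags;
};

#define DESC_OWNED_BY_DEVICE 0x8000u   /* hypothetical ownership flag */

/* Post a buffer (full for transmit, empty for receive) to the device.
   The CPU holds an IOMEM_CACHED address; the device needs the
   IOMEM_UNCACHED (= physical) one, so the alias bit is cleared. */
static void post_buffer(volatile struct dma_desc *d, uint32_t cached_buf, uint16_t len) {
    d->buf_addr = cached_buf & ~IOMEM_CACHED_BIT;
    d->length = len;
    d->flags = DESC_OWNED_BY_DEVICE;   /* written last: hands the buffer to the device */
}
```

For example, a buffer the OS returned at the assumed cached address 0x40200040 is posted to the device as 0x00200040.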
During the reception of a packet, the I/O device that receives the packet may be allocated a buffer at an IOMEM location that may have been cached previously by the CPU. If the CPU were to read these IOMEM locations it would access stale data from the cache and not the new data stored by the I/O Device. Accordingly, when the CPU needs to pick up the received packet data, it invalidates all the cache lines corresponding to the memory locations of the received packet. Once the invalidation is done, the access to the packet data goes to the memory and fetches the new data stored by the I/O Device. The technique for invalidating the cache lines is platform dependent and utilizes an instruction specific to the CPU being utilized.
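The receive-side invalidation can be sketched as a loop over every cache line that overlaps the received packet. The 32-byte line size and the per-line hook type `dcache_inval_line_fn` are assumptions standing in for the CPU-specific cache instruction the text mentions; `record_inval` is a test stub, not a real platform call.

```c
#include <stddef.h>
#include <stdint.h>

#define DCACHE_LINE 32u   /* assumed data-cache line size; platform specific */

/* Hypothetical platform hook: invalidate the one data-cache line at line_addr. */
typedef void (*dcache_inval_line_fn)(uintptr_t line_addr);

/* Invalidate every line overlapping [addr, addr + len) so the next CPU read
   through the cached alias fetches the device's data from memory.
   Returns the number of lines invalidated. */
static size_t invalidate_rx_range(uintptr_t addr, size_t len, dcache_inval_line_fn inval) {
    if (len == 0)
        return 0;
    uintptr_t start = addr & ~(uintptr_t)(DCACHE_LINE - 1);
    uintptr_t end = (addr + len + DCACHE_LINE - 1) & ~(uintptr_t)(DCACHE_LINE - 1);
    size_t n = 0;
    for (uintptr_t a = start; a < end; a += DCACHE_LINE) {
        inval(a);
        n++;
    }
    return n;
}

/* Test stub: counts calls and remembers the first line address it saw. */
static uintptr_t first_inval;
static size_t inval_calls;
static void record_inval(uintptr_t line_addr) {
    if (inval_calls++ == 0)
        first_inval = line_addr;
}
```

A 100-byte packet at 0x40200010 spans the lines starting at 0x40200000, 0x40200020, 0x40200040, and 0x40200060, so four lines are invalidated. Because IOMEM_CACHED is write-through, these lines are never dirty, so invalidating a line that only partly overlaps the packet cannot discard unwritten CPU stores; with a write-back mapping the same step would be unsafe.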
There is no need for cache invalidations or cache flushes during transmission because IOMEM_CACHED is mapped with the write-through attribute, so as soon as the packet data is modified in the cache, the modification is also written to memory.
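The transmit and receive cases can be contrasted with the same kind of toy model, now with the line behaving write-through. This is an illustrative sketch, not the platform's cache: the sizes, the no-allocate-on-write-miss policy, and all names are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define RAM_SIZE 256
#define LINE_SIZE 32   /* assumed line size */

/* Toy single-line cache modelling the IOMEM_CACHED alias mapped write-through. */
struct wt_system {
    uint8_t ram[RAM_SIZE];    /* main memory, shared with the I/O device */
    uint8_t line[LINE_SIZE];
    size_t  line_base;
    bool    valid;            /* write-through lines are never dirty */
};

static void wt_fill(struct wt_system *s, size_t addr) {
    size_t base = addr & ~(size_t)(LINE_SIZE - 1);
    if (s->valid && s->line_base == base)
        return;
    s->line_base = base;      /* old line is clean, so it is simply replaced */
    memcpy(s->line, &s->ram[base], LINE_SIZE);
    s->valid = true;
}

static uint8_t cpu_load(struct wt_system *s, size_t addr) {
    wt_fill(s, addr);
    return s->line[addr - s->line_base];
}

/* Write-through store: update the line if it is cached, and always update RAM. */
static void cpu_store_wt(struct wt_system *s, size_t addr, uint8_t v) {
    if (s->valid && s->line_base == (addr & ~(size_t)(LINE_SIZE - 1)))
        s->line[addr - s->line_base] = v;
    s->ram[addr] = v;
}

static void cpu_invalidate(struct wt_system *s) { s->valid = false; }

/* The I/O device reads and writes RAM only. */
static uint8_t device_load(const struct wt_system *s, size_t addr) { return s->ram[addr]; }
static void device_store(struct wt_system *s, size_t addr, uint8_t v) { s->ram[addr] = v; }
```

A write-through store reaches RAM immediately, so the device sees it without a flush (the transmit case). A device write, however, is not reflected in a line the CPU already holds, so the CPU keeps reading the stale byte until the line is invalidated (the receive case).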
The invention may be implemented as program code, stored on a computer readable medium, that is executed by a digital computer. The computer readable medium may include, among other things, magnetic media, optical media, and so on.
The invention has now been described with reference to specific embodiments. Alternatives and substitutions will now be apparent to persons of ordinary skill in the art. For example, the particular memory mapping described is not critical to the invention. Accordingly, it is not intended to limit the invention except as provided by the appended claims.