The accompanying drawings are included to provide a further understanding of the invention, and are incorporated in and constitute a part of this specification. The drawings illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention.
Reference will now be made in detail to the present preferred embodiments of the invention, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers are used in the drawings and the description to refer to the same or like parts.
The main idea of the present invention is an innovative multiple-queue design that reduces the memory requirement without degrading performance.
According to the PCA protocol, each access category (AC) contends for a TXOP following the CSMA/CA rules defined by the WiMedia UWB MAC specification. These rules have one important property: once an AC gains the TXOP, it can use the communication channel exclusively and is allowed to transmit frames continuously until the end of the TXOP. This property is the key point of the present invention.
Initially, each of the pre-load queues 210-213 is assigned to a corresponding access category (AC0-AC3) for storing UWB PCA frames to be transmitted. When no access category gains TXOP, each access category is only allowed to store frames in its corresponding pre-load queue. AC0 is only allowed to store frames in the pre-load queue 210. AC1 is only allowed to store frames in the pre-load queue 211, and so on. When one of the access categories gains the TXOP, the common area queue 220 is assigned to that access category for storing UWB PCA frames to be transmitted. For example, if AC1 gains TXOP, the common area queue 220 is assigned to be used by AC1. Then AC1 has the access right to the common area queue 220 until another AC at the same device gains TXOP. When AC1 loses TXOP, AC1 loses the access right to the common area queue 220. In response, AC1 discards the frames stored in the common area queue 220 and then releases the common area queue 220. The discarded frames can be moved again from host memory later.
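The assignment rules above can be expressed as a minimal sketch. All names here (`PcaQueueManager`, `gain_txop`, and so on) are illustrative assumptions of this sketch, not identifiers from the specification.

```python
class PcaQueueManager:
    """Four per-AC pre-load queues plus one shared common area queue."""

    def __init__(self, num_acs=4):
        self.preload = {ac: [] for ac in range(num_acs)}  # one queue per AC
        self.common = []          # single shared common area queue
        self.txop_holder = None   # AC currently holding the TXOP, if any

    def store_frame(self, ac, frame):
        # Without the TXOP an AC may only use its own pre-load queue;
        # the TXOP holder may also store frames in the common area queue.
        if ac == self.txop_holder:
            self.common.append(frame)
        else:
            self.preload[ac].append(frame)

    def gain_txop(self, ac):
        self.txop_holder = ac     # common area queue is now assigned to `ac`

    def lose_txop(self):
        # On losing the TXOP, the holder discards its common-area frames
        # and releases the queue; the host can move them again later.
        discarded = self.common
        self.common = []
        self.txop_holder = None
        return discarded
```

For example, a frame stored by AC1 goes to its pre-load queue until AC1 gains the TXOP, after which further frames go to the common area queue and are discarded when the TXOP is lost.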
By this approach, the total size of the pre-load queues and the common area queue can be smaller than the total size of traditional TX queues, with no performance impact on the queues of the present invention. For example, if the size of each pre-load queue 210-213 is 1 KB and the size of the common area queue 220 is 7 KB, the total memory size is 11 KB, which is far smaller than the 32 KB size of the conventional memory architecture 100 in
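The size arithmetic of this example can be checked with a short sketch; the function name is illustrative, and the per-queue sizes are those given in the text.

```python
def shared_queue_memory_kb(num_acs, preload_kb, common_kb):
    # Each access category gets its own pre-load queue;
    # a single common area queue is shared among all of them.
    return num_acs * preload_kb + common_kb

total = shared_queue_memory_kb(4, 1, 7)
print(total)  # 11 KB, versus 32 KB for the conventional architecture
```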
At moment T1, frame A0 of AC0 is ready for transmission at the host memory allocated for AC0 and begins to be moved by DMA into the AC0 pre-load queue. In this scenario, each pre-load queue has a size of 1 KB. Therefore the AC0 pre-load queue can only store the first 1 KB segment A0,1 of frame A0.
At moment T2, frame segment A0,1 in the AC0 pre-load queue has reached the predetermined size threshold (0.5 KB) for backoff, and AC0 starts its backoff state machine in order to gain the TXOP. The predetermined size threshold has to be smaller than the individual size of the pre-load queues and is adjustable according to specific requirements of an application. Such an early backoff ensures that an access category can gain the TXOP and transmit its frames sooner. Consequently, throughput improves and the pre-load queues can be made smaller to reduce total cost.
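The early-backoff trigger can be sketched as follows, using the 1 KB queue size and 0.5 KB threshold from the example; the constants and names are illustrative, not mandated by the specification.

```python
PRELOAD_QUEUE_KB = 1.0        # size of each pre-load queue in this example
BACKOFF_THRESHOLD_KB = 0.5    # must be smaller than the pre-load queue size
assert BACKOFF_THRESHOLD_KB < PRELOAD_QUEUE_KB

def should_start_backoff(buffered_kb):
    # Start the backoff state machine as soon as the buffered portion of
    # the frame reaches the threshold, before the pre-load queue fills.
    return buffered_kb >= BACKOFF_THRESHOLD_KB
```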
At moment T3, frame C0 of AC2 is ready for transmission at the host memory of AC2 and the host DMA module begins moving frame C0 into the AC2 pre-load queue. At moment T4, frame C0 reaches the predetermined size threshold of 0.5 KB and AC2 starts its backoff state machine for gaining the TXOP. At moment T5, frame D0 of AC3 is ready for transmission at the host memory of AC3 and the host DMA module begins moving frame D0 into the AC3 pre-load queue. At moment T6, frame D0 reaches the predetermined size threshold and AC3 also starts its backoff state machine. Now there are three access categories (AC0, AC2 and AC3) competing for the TXOP.
At moment T7, AC3 gains the TXOP, gains access to the communication channel, and therefore is allowed to store frames in the common area queue. Access categories AC0 and AC2 suspend their backoff state machines. The host DMA module begins moving frame segment D1,2 into the common area queue. Frames D0 and D1 begin their sequential transmission on air. Note that, before transmission, frame D1 is split into two segments D1,1 and D1,2. D1,1 is stored in the AC3 pre-load queue and D1,2 is stored in the common area queue. Both the AC3 pre-load queue and the common area queue are available to AC3 for storing frames as long as AC3 is still holding the TXOP.
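The splitting of frame D1 into segments D1,1 and D1,2 across the two queues can be sketched as a simple size computation; the function name is an assumption of this illustration.

```python
def split_frame(frame_kb, preload_free_kb):
    """Return (KB stored in the pre-load queue, KB spilled into the
    common area queue) for a frame of `frame_kb` kilobytes."""
    in_preload = min(frame_kb, preload_free_kb)  # fill the pre-load queue first
    return in_preload, frame_kb - in_preload     # remainder goes to common area

# A 1.5 KB frame with 1.0 KB free in the pre-load queue is split 1.0 / 0.5:
print(split_frame(1.5, 1.0))
```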
At moment T8, AC3 loses the TXOP, stops frame transmission, releases the communication channel, and releases the common area queue. Access categories AC0 and AC2 resume their backoff state machines. At moment T9, AC0 gains the TXOP and access to the communication channel. Frame A0 begins its transmission on air.
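The suspend-and-resume behavior of the competing backoff state machines can be sketched as a minimal countdown; the class and its methods are illustrative, not part of the specification.

```python
class BackoffStateMachine:
    def __init__(self, slots):
        self.slots = slots        # remaining backoff slots before TXOP attempt
        self.suspended = False

    def suspend(self):
        # Freeze the countdown while another AC holds the TXOP.
        self.suspended = True

    def resume(self):
        self.suspended = False

    def tick(self):
        # Count down one slot per idle medium slot; return True at zero.
        if not self.suspended and self.slots > 0:
            self.slots -= 1
        return self.slots == 0
```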
The pre-load queues and the common area queue in this embodiment reside in the MAC sublayer, and the frames are transmitted to the physical layer. When applicable, the memory architecture and the memory management method of the present invention can be applied to other layers and/or other architectures as well.
The memory architecture in this embodiment works because actual buffering is needed only when an access category gains the TXOP; therefore a single common area queue is sufficient. The pre-load queues are mandatory because frames must be readily available in the corresponding pre-load queue when an access category gains the TXOP. Otherwise the frames would have to be moved from host memory to the pre-load queue first, and the resulting idle period in the communication channel would degrade performance. Even worse, the idle channel may be occupied by an access category of another device, in which case a collision may occur and will not be detected until timeout.
In the scope of the present invention, the sizes of the pre-load queues and the common area queue can be adjusted according to different bus architectures between the host memory and the queues. The trade-off between cost and performance is another reason for queue size adjustment: larger queues deliver better performance but require higher cost.
The queue sizes can also be determined according to the throughputs of the producer (for example, the host device) and the consumer (for example, the physical layer) of the UWB PCA frames. Larger queues are required for sufficient buffering if the producer is slower; if the producer is fast enough, smaller queues suffice.
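One way to express this sizing consideration is to require the queue to absorb the rate deficit between consumer and producer over one TXOP. The formula, constants, and names below are assumptions made for illustration only, not part of the specification.

```python
def min_queue_kb(consumer_mbps, producer_mbps, txop_ms):
    # If the producer is at least as fast as the consumer, no extra
    # buffering is needed beyond the pre-load queues.
    deficit_mbps = max(consumer_mbps - producer_mbps, 0.0)
    megabits = deficit_mbps * txop_ms / 1000.0  # Mbit accumulated per TXOP
    return megabits * 125.0                     # 1 Mbit = 125 kB (decimal)
```

For instance, a 480 Mbps consumer fed by a 400 Mbps producer accumulates an 80 Mbps deficit that the queue must absorb for the duration of the TXOP; with a fast enough producer the required size drops to zero.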
In summary, the present invention achieves smaller total queue size and lower hardware implementation cost by sharing a single common area queue among all PCA access categories. Furthermore, the present invention provides pre-load queues for initial buffering when an access category gains the TXOP to prevent idle periods in the communication channel. As a result, the present invention introduces no performance impact compared to the conventional memory architecture.
It will be apparent to those skilled in the art that various modifications and variations can be made to the structure of the present invention without departing from the scope or spirit of the invention. In view of the foregoing, it is intended that the present invention cover modifications and variations of this invention provided they fall within the scope of the following claims and their equivalents.