The disclosed subject matter relates to data processing. More particularly, this disclosure relates to a novel and improved pointer computation method and system for a scalable, programmable circular buffer.
Increasingly, electronic equipment and supporting software applications involve signal processing. Home theater, computer graphics, medical imaging and telecommunications all rely on signal-processing technology. Signal processing requires fast math in complex but repetitive algorithms. Many applications require computation in real time: the signal is a continuous function of time that must be sampled and converted to digital form for numerical processing. The processor must therefore execute algorithms that perform discrete computations on the samples as they arrive. The architecture of a digital signal processor (DSP) is optimized to handle such algorithms. The characteristics of a good signal processing engine typically include fast, flexible arithmetic computation units, unconstrained data flow to and from the computation units, extended precision and dynamic range in the computation units, dual address generators, efficient program sequencing, and ease of programming.
One promising application of DSP technology includes communications systems such as a code division multiple access (CDMA) system that supports voice and data communication between users over a satellite or terrestrial link. The use of CDMA techniques in a multiple access communication system is disclosed in U.S. Pat. No. 4,901,307, entitled “SPREAD SPECTRUM MULTIPLE ACCESS COMMUNICATION SYSTEM USING SATELLITE OR TERRESTRIAL REPEATERS,” and U.S. Pat. No. 5,103,459, entitled “SYSTEM AND METHOD FOR GENERATING WAVEFORMS IN A CDMA CELLULAR TELEPHONE SYSTEM,” both assigned to the assignee of the claimed subject matter.
A CDMA system is typically designed to conform to one or more telecommunications standards and, increasingly, to streaming video standards. One such first generation standard is the “TIA/EIA/IS-95 Terminal-Base Station Compatibility Standard for Dual-Mode Wideband Spread Spectrum Cellular System,” hereinafter referred to as the IS-95 standard. The IS-95 CDMA systems are able to transmit voice data and packet data. A newer generation standard that can more efficiently transmit packet data is offered by a consortium named the “3rd Generation Partnership Project” (3GPP) and embodied in a set of documents including Document Nos. 3G TS 25.211, 3G TS 25.212, 3G TS 25.213, and 3G TS 25.214, which are readily available to the public. The 3GPP standard is hereinafter referred to as the W-CDMA standard. There are also video compression standards, such as MPEG-1, MPEG-2, MPEG-4, H.263, and WMV (Windows Media Video), as well as many others that such wireless handsets will increasingly employ.
Buffers are widely used in many applications. A common type is a circular buffer that wraps around on itself, so that the lowest numbered entry is conceptually or logically located adjacent to its highest numbered entry, although physically the two are separated by the buffer length or range. The circular buffer provides direct access to the buffer, so as to allow a calling program to construct output data in place, or parse input data in place, without the extra step of copying data to or from the calling program. In order to facilitate this direct access, the circular buffer makes sure that all references to buffer locations, for either output or input, are to a single contiguous block of memory. This spares the calling program from having to deal with split buffer spaces when the cycling of data reaches the end location of the circular buffer. As a result, the calling program may use a wide variety of available applications without needing to be aware that those applications are operating directly in a circular buffer.
One type of circular buffer requires the buffer to be both power-of-2 aligned and of a length that is a power of 2. In such a circular buffer, the pointer calculation simply involves a masking step. While this provides a simple calculation, the requirement that the buffer length be a power of 2 makes such a circular buffer unusable for certain algorithms or implementations.
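For illustration, the masking step for such a buffer can be sketched as follows. This is a minimal sketch, assuming a power-of-2 length and a base aligned to that length; the function and parameter names are illustrative and not drawn from the disclosure.

```c
/* Minimal sketch of the power-of-2 circular buffer described above.
 * Assumes the buffer length is a power of 2 and the base address is
 * aligned to that length, so the wrap-around reduces to a single
 * bitwise mask. All names are illustrative, not from the disclosure. */
#include <stdint.h>

static inline uint32_t advance_pow2(uint32_t base, uint32_t pointer,
                                    uint32_t length, int32_t stride)
{
    uint32_t mask = length - 1u;                      /* e.g. length 32 -> mask 0x1F */
    uint32_t offset = (pointer + (uint32_t)stride) & mask;
    return (base & ~mask) | offset;                   /* keep the aligned base bits  */
}
```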
In the use of a circular buffer, the buffer is defined by a starting location and an ending location separated by the buffer length. For many applications, it would be desirable for the starting location and ending location to be determinable or programmable. With a programmable starting location and ending location for the circular buffer, a wider variety of algorithms and processes could use the circular buffer. Moreover, as the different algorithms and processes change, the circular buffer's operation could also change so as to provide increased operational efficiency and utility.
In addressing a particular location in the circular buffer, the pointer that addresses a buffer location moves either up or down to reach that location. This process, unfortunately, is less than fully efficient. Oftentimes, the process is cumbersome in that it requires three addition/subtraction operations. A first operation is required to generate a new buffer pointer by adding a stride to the current buffer pointer. A second operation is required to determine whether the new pointer has overflowed or underflowed the buffer address range. Then, a third operation is required to adjust the new pointer in case of an overflow or an underflow. These three operations require either three separate adders in a perfectly pipelined operation or, alternatively, require the circular addressing to become a non-pipelineable multi-cycle operation. If it were possible to reduce the number of these operations, then significant DSP improvements could result, either from the area and/or power savings of fewer adders or from improved performance, since these operations occur numerous times during DSP and other applications.
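As a sketch of the conventional sequence just described, the routine below spells out the three operations. The names, and the assumptions that the stride magnitude never exceeds the buffer length and that the buffer sits away from the ends of the address space, are made only for illustration.

```c
/* Sketch of the conventional three-operation circular pointer update
 * described above. Parameter names are assumed for illustration; the
 * stride magnitude is presumed not to exceed the buffer length, and
 * the buffer is presumed not to sit at the edges of the address space. */
#include <stdint.h>

static uint32_t advance_three_ops(uint32_t pointer, int32_t stride,
                                  uint32_t start, uint32_t length)
{
    uint32_t end  = start + length;
    uint32_t next = pointer + (uint32_t)stride;  /* operation 1: apply the stride     */
    if (next >= end)                             /* operation 2: detect overflow ...  */
        next -= length;                          /* operation 3: wrap back into range */
    else if (next < start)                       /* ... or detect underflow           */
        next += length;
    return next;
}
```

Each bounds check is itself an addition or subtraction in hardware, which is the source of the three-adder cost noted above.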
A need exists, therefore, for a pointer computation method useable in a class of scalable and programmable circular buffers, which class of circular buffers supports a programmable buffer length.
Furthermore, a need exists for a pointer computation method for a class of scalable and programmable circular buffers that requires as few additions as possible to detect the wrap around conditions, and that permits adjustment of the pointer value in the event that the temporary pointer exceeds the circular buffer boundary.
Techniques for making and using a pointer computation method and system for a scalable, programmable circular buffer are disclosed. These techniques improve both the operation of a digital signal processor and the efficient use of digital signal processor instructions for processing increasingly robust software applications for personal computers, personal digital assistants, wireless handsets, and similar electronic devices, as well as increasing the associated digital signal processor speed and service quality.
According to one aspect of the disclosed subject matter, there is provided a method and a system for determining a circular buffer pointer location. A pointer location within a circular buffer is determined by establishing a length of the circular buffer, a start address that is aligned to a power of 2, and an end address located distant from the start address by the length and less than a power of 2 greater than the length. The method and system determine a current pointer location for an address within the circular buffer, a stride value of bits between the start address and the end address, and a new pointer location within the circular buffer that is shifted from the current pointer location by the number of bits of the stride value. An adjusted pointer location is kept within the circular buffer by an arithmetic operation on the new pointer location with the length. In the event of a positive stride, the adjusted pointer location is determined by, in the event that the new pointer location is less than the end address, setting the adjusted pointer location to be the new pointer location. Alternatively, in the event that the new pointer location is greater than the end address, the adjusted pointer location is determined by subtracting the length from the new pointer location. In the event of a negative stride, the adjusted pointer location is determined by, in the event that the new pointer location is greater than said start address, setting the adjusted pointer location to be the new pointer location. Alternatively, in the event that the new pointer location is less than said start address, the adjusted pointer location is determined by adding the length to the new pointer location.
These and other aspects of the disclosed subject matter, as well as additional novel features, will be apparent from the description provided herein. The intent of this summary is not to be a comprehensive description of the claimed subject matter, but rather to provide a short overview of some of the subject matter's functionality. Other systems, methods, features and advantages here provided will become apparent to one with skill in the art upon examination of the following FIGURES and detailed description. It is intended that all such additional systems, methods, features and advantages that are included within this description be within the scope of the accompanying claims.
The features, nature, and advantages of the disclosed subject matter will become more apparent from the detailed description set forth below when taken in conjunction with the drawings in which like reference characters identify correspondingly throughout and wherein:
The disclosed subject matter for a novel and improved pointer computation method and system for a scalable, programmable circular buffer in a multithreaded digital signal processor has application in a wide variety of digital signal processing tasks involving multi-threaded processing. One such application appears in telecommunications and, in particular, in wireless handsets that employ one or more digital signal processing circuits.
At a receiver unit 22, the transmitted signal is received by an antenna 24 and provided to a receiver (RCVR) 26. Within receiver 26, the received signal is amplified, filtered, down converted, demodulated, and digitized to generate inphase (I) and quadrature (Q) samples. The samples are then decoded and processed by a receive (RX) data processor 28 to recover the transmitted data. The decoding and processing at receiver unit 22 are performed in a manner complementary to the coding and processing performed at transmitter unit 12. The recovered data is then provided to a data sink 30.
The signal processing described above supports transmissions of voice, video, packet data, messaging, and other types of communication in one direction. A bi-directional communications system supports two-way data transmission. However, the signal processing for the other direction is not shown in
IQ 44 in IU 42 keeps a sliding buffer of the instruction stream. Each of the six threads T0:T5 that DSP 40 supports has a separate eight-entry IQ 44, where each entry may store one VLIW packet or up to four individual instructions. Decode and issue circuitry 48 logic is shared by all threads for decoding and issuing a VLIW packet or up to two superscalar instructions at a time, as well as for generating control buses and operands for each pipeline SLOT0:SLOT3. In addition, decode and issue circuitry 48 performs slot assignment and dependency checking between the two oldest valid instructions in the IQ 44 entry for instruction issue using, for example, superscalar issuing techniques. PLC 50 logic is shared by all threads for resolving exceptions and detecting pipeline stall conditions such as thread enable/disable and replay conditions, as well as for maintaining program flow.
In operation, the general register file (GRF) 52 and control register file (CRF) 54 of the selected thread are read, and the read data is sent to the execution data paths for SLOT0:SLOT3. SLOT0:SLOT3, in this example, provide for the packet grouping combination employed in the present embodiment. Output from SLOT0:SLOT3 returns the results from the operations of DSP 40.
The present embodiment, therefore, may employ a hybrid of a heterogeneous element processor (HEP) system using a single microprocessor with up to six threads, T0:T5. Processor pipeline 46 has six pipeline stages, matching the minimum number of processor cycles necessary to fetch a data item from IU 42. DSP 40 concurrently executes instructions of different threads T0:T5 within a processor pipeline 46. That is, DSP 40 provides six independent program counters, an internal tagging mechanism to distinguish instructions of threads T0:T5 within processor pipeline 46, and a mechanism that triggers a thread switch. Thread-switch overhead varies from zero to only a few cycles.
Turning to
SLOT0 and SLOT1 pipelines are in DU 68, SLOT2 is in MU 66, and SLOT3 is in SU 64. CU 62 provides source operands and control buses to pipelines SLOT0:SLOT3 and handles GRF 52 and CRF 54 file updates. GRF 52 holds thirty-two 32-bit registers which can be accessed as single registers or as aligned 64-bit pairs. Micro-architecture 60 features a hybrid execution model that mixes the advantages of superscalar and VLIW execution. Superscalar issue has the advantage that no software information is needed to find independent instructions. A decode stage, DE, performs the initial decode of instructions so as to prepare such instructions for execution and further processing in DSP 40. A register file pipeline stage, RF, provides for register file updating. Two execution pipeline stages, EX1 and EX2, support instruction execution, while a third execution pipeline stage, EX3, provides both instruction execution and register file update. During the execution (EX1, EX2, and EX3) and writeback (WB) pipeline stages, IU 42 builds the next IQ 44 entry to be executed. Finally, the writeback pipeline stage, WB, performs the register update. The staggered write to the register file is possible due to the IMT micro-architecture and reduces the number of write ports per thread. Because the pipelines have six stages, CU 62 may issue up to six different threads.
To further explain the operation of DU 68, wherein the claimed subject matter may operate, reference is now made to the basic functions performed therein according to the several partitions of the following description. In particular, DU 68 executes load-type, store-type, and 32-bit ALU 84 instructions. The major features of DU 68 include fully pipelined operation in all of the DSP 40 pipeline stages, DE, RF, EX1, EX2, EX3, and WB, using the two parallel pipelines of SLOT0 and SLOT1. DU 68 may accept either VLIW or superscalar dual instruction issue. Preferably, SLOT0 executes uncacheable or cacheable load or store instructions, 32-bit ALU 84 instructions, and DCU 86 instructions. SLOT1 executes uncacheable or cacheable load instructions and 32-bit ALU 84 instructions.
DU 68 receives up to two decoded instructions per cycle from CU 62 in the DE pipeline stage, including immediate operands. In the RF pipeline stage, DU 68 receives general purpose register (GPR) and/or control register (CR) source operands from the appropriate thread-specific registers. The GPR operand is received from the GPR register file in CU 62. In the EX1 pipeline stage, DU 68 generates the effective address (EA) of a load or store memory instruction. The EA is presented to MMU 87, which performs the virtual to physical address translation and page level permissions checking and provides page level attributes. For accesses to cacheable locations, DU 68 looks up the data cache tag in the EX2 pipeline stage with the physical address. If the access hits, DU 68 performs the data array access in the EX3 pipeline stage.
For cacheable loads, the data read out of the cache is aligned by the appropriate access size, zero/sign extended as specified, and driven to CU 62 in the WB pipeline stage to be written into the instruction-specified GPR. For cacheable stores, the data to be stored is read out of the thread-specific register in CU 62 in the EX1 pipeline stage and written into the data cache array on a hit in the EX2 pipeline stage. For both loads and stores, auto-incremented addresses are generated in the EX1 and EX2 pipeline stages and driven to CU 62 in the EX3 pipeline stage to be written into the instruction-specified GPR.
DU 68 also executes cache instructions for managing DCU 86. These instructions allow specific cache lines to be locked and unlocked, invalidated, and allocated to a GPR-specified cache line. There is also an instruction to globally invalidate the cache. These instructions are pipelined similarly to the load and store instructions. For loads and stores to cacheable locations that miss the data cache, and for uncacheable accesses, DU 68 presents requests to BIU 70. Uncacheable loads present a read request. Store hits, store misses, and uncacheable stores present a write request. DU 68 tracks outstanding read and line fill requests to BIU 70. DU 68 provides non-blocking inter-thread operation, i.e., it allows accesses by other threads while one or more threads are blocked pending completion of outstanding load requests.
AGU 80, to which the present disclosure pertains, provides two identical instances of the AGU 80 data path, one for SLOT0 and one for SLOT1. Note, however, that the disclosed subject matter may operate, and actually does exist and operate, in other blocks of DU 68, such as ALU 84. For illustrative purposes in understanding the function and structure of the disclosed subject matter, attention is directed, however, to AGU 80, which generates both the effective address (EA) and the auto-incremented address (AIA) for each slot according to the exemplary teachings herein provided.
LCU 82 enables load and store instruction executions, which may include cache hits, cache misses, and uncacheable loads, as well as store instructions. In the present embodiment, the load pipeline is identical for SLOT0 and SLOT1. Store execution via LCU 82 provides a store instruction pipeline for write-through cache hit instructions, write-back cache hit instructions, cache miss instructions, and uncacheable write instructions. Store instructions only execute on SLOT0 in the present embodiment. On a write-through store, a write request is presented to BIU 70, regardless of the hit condition. On a write-back store, a write request is presented to BIU 70 if there is a miss, and not if there is a hit. On a write-back store hit, the cache line state is updated. A store miss presents a write request to BIU 70 and does not allocate a line in the cache.
ALU 84 includes ALU0 85 and ALU1 89, one for each slot. ALU 84 contains the data path to perform arithmetic/transfer/compare (ATC) operations within DU 68. These may include 32-bit add, subtract, negate, compare, register transfer, and MUX register instructions. In addition, ALU 84 also completes the circular addressing for the AIA computation.
Referring to
With these definitions, reference is now made to
Stride input 116 goes to MUX 128 and inverter 130, which provides an inverted input to MUX 128. Stride direction input 118 also goes to MUX 128, M-bit adder 126, MUX 132 and inverter 134. AND gate 122 derives a pointer offset as the bitwise AND of current pointer input 112 and the base mask from base mask generator 114. AND gate 120 derives a pointer base 136 from the logical AND of current pointer 112 and the offset mask from inverter 124, which offset mask is the inverted output from base mask generator 114.
M-bit adder 126 generates a summand 138 for M-bit adder 140. The summand derives from the summation of a pointer offset from AND gate 122, multiplexed output from MUX 128, and stride direction 118 input. M-bit adder 140 derives a summation 142 from summand 138, multiplexed output from MUX 132, and inverter 134. Summation 142 equals summand 138 plus/minus the circular buffer length 144. Circular buffer length 144 derives from MUX 132 in response to inputs from inverter 146 and length input 148. Summation 142, summand 138, and the most significant bit M 183 from M-bit adder 140 feed to MUX 150 to yield the new pointer offset 152. Finally, OR gate 154 performs a logical OR operation using the multiplexed output from MUX 150 and pointer base 136 to yield the desired new pointer 156.
Clear advantages of the disclosed process over known methods include the requirement of only two additions, i.e., the operations of M-bit adders 126 and 140. Also, the disclosed process and system permit varying N and M to derive a family of circular buffers. As such, the disclosed embodiment provides for design optimizations across power, speed, and area design considerations. Furthermore, the present process and system support a signed offset and programmable circular buffer lengths. Still another advantage of the present embodiment includes requiring only generic M-bit adders with no required intermediate bit carry terms. In addition, the disclosed embodiment may use the same data path for both positive and negative strides.
To illustrate the beneficial effects of the present method, the following examples are provided. Let N equal 5 and L equal 30 (i.e., B011110), where M equals N+1=6. The current pointer, P, the stride, S, and the sign of the stride, D, are the variables in the following examples. The results of the disclosed process examples provide the various new pointer locations within circular buffer 100.
In the first example, let P=62 (B111110), S=1 (B000001), and D=Positive (B0), which is an overflow case. In such case, the mask from base mask generator 114 is 011111, the pointer offset from AND gate 122 is 011110, and the pointer base 136 from AND gate 120 is 100000. Summand 138 from M-bit adder 126 is 011110+000001=011111. Summation 142 becomes 011111+100001+000001=000001. The new pointer offset is determined based on Bit6 being 0 for summation 142. This results in the selection of summation 142, which is 000001, as the new pointer offset. The new pointer then becomes 000001+100000=100001.
In a second example, let P=62 (B111110), S=1 (B000001), and D=Negative (B1). In such case, the mask from base mask generator 114 is 011111, the pointer offset from AND gate 122 is 011110, and the pointer base 136 from AND gate 120 is 100000. Summand 138 from M-bit adder 126 is 011110+111110+000001=011101. Summation 142 becomes 011101+011110=111011. The new pointer offset is determined based on Bit6 being 1 for summation 142. This results in the selection of summand 138, which is 011101, as the new pointer offset. The new pointer then becomes 011101+100000=111101.
In a third example, let P=33 (B100001), S=1 (B000001), and D=Positive (B0). In such case, the mask from base mask generator 114 is 011111, the pointer offset from AND gate 122 is 000001, and the pointer base 136 from AND gate 120 is 100000. Summand 138 from M-bit adder 126 is 000001+000001=000010. Summation 142 becomes 000010+100001+000001=100100. The new pointer offset is determined based on Bit6 being 1 for summation 142. This results in the selection of summand 138, which is 000010, as the new pointer offset. The new pointer then becomes 000010+100000=100010.
In a fourth example, let P=33 (B100001), S=1 (B000001), and D=Negative (B1), which is an underflow case. In such case, the mask from base mask generator 114 is 011111, the pointer offset from AND gate 122 is 000001, and the pointer base 136 from AND gate 120 is 100000. Summand 138 from M-bit adder 126 is 000001+111110+000001=000000. Summation 142 becomes 000000+011110=011110. The new pointer offset is determined based on Bit6 being 0 for summation 142. This results in the selection of summation 142, which is 011110, as the new pointer offset. The new pointer then becomes 011110+100000=111110.
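The data path and the four examples above can be checked with a short simulation. The following is a bit-level sketch rather than the hardware itself: the signal names mirror the narrative (offset, base, summand, summation) rather than the reference numerals, and the wrap behavior at the exact buffer boundaries simply follows the worked examples.

```c
/* Bit-accurate sketch of the two-adder data path described above,
 * parameterized by N (offset width) and M = N + 1. Names are
 * illustrative; boundary behavior follows the four worked examples. */
#include <stdint.h>
#include <stdio.h>

static uint32_t new_pointer(uint32_t P, uint32_t S, int D,
                            uint32_t L, unsigned N)
{
    unsigned M         = N + 1u;
    uint32_t mmask     = (1u << M) - 1u;            /* M-bit field                  */
    uint32_t base_mask = (1u << N) - 1u;            /* mask from the mask generator */
    uint32_t offset    = P & base_mask;             /* pointer offset (AND gate)    */
    uint32_t base      = P & ~base_mask & mmask;    /* pointer base (AND gate)      */

    /* First M-bit adder: offset plus/minus stride via two's complement. */
    uint32_t summand = (offset + (D ? (~S & mmask) : S) + (uint32_t)D) & mmask;

    /* Second M-bit adder: summand minus/plus length, again via two's complement. */
    uint32_t summation = (summand + (D ? L : (~L & mmask)) + (D ? 0u : 1u)) & mmask;

    /* MUX select: the MSB of the summation chooses summand or summation. */
    uint32_t new_offset = ((summation >> (M - 1u)) & 1u) ? summand : summation;

    return base | new_offset;                       /* final OR with the pointer base */
}

int main(void)
{
    /* The four worked examples: N = 5, L = 30, base aligned to 32. */
    printf("%u\n", (unsigned)new_pointer(62, 1, 0, 30, 5)); /* expect 33 (overflow)  */
    printf("%u\n", (unsigned)new_pointer(62, 1, 1, 30, 5)); /* expect 61             */
    printf("%u\n", (unsigned)new_pointer(33, 1, 0, 30, 5)); /* expect 34             */
    printf("%u\n", (unsigned)new_pointer(33, 1, 1, 30, 5)); /* expect 62 (underflow) */
    return 0;
}
```

Running the program prints 33, 61, 34, and 62, matching the four new pointer values derived above.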
The disclosed subject matter, therefore, provides a pointer computation method and system for a scalable, programmable circular buffer 100 wherein the starting location of circular buffer 100 aligns to a power of two corresponding to the size of circular buffer 100. A separate register contains the length of circular buffer 100. By aligning the base of circular buffer 100, the disclosed subject matter requires only a subtraction operation to achieve a pointer location. With such a process, only two additions, using two M-bit adders as herein described, are needed. The present approach permits varying N and M to derive an optimal family of circular buffers 100 across a number of different power, speed, and area metrics. The present method and system support a signed offset and programmable lengths. In addition, the disclosed subject matter requires only generic M-bit adders with no intermediate bit carry terms, while using the same data path for both positive and negative strides.
The present method and system operate with a starting location, S, which is aligned to a power of two corresponding to a memory size that can contain a buffer length, L. The buffer length, L, may or may not need to be stored as state in DU 68. The process takes a number of bits, B, which is the power of two greater than L. A pointer, R, is taken which falls between the base and base+L. The process then uses a computer instruction and modifies the original pointer, R, by either adding or subtracting a constant value to derive a modified pointer, R′. Then, the starting location, S, is adjusted by setting the B least significant bits (LSBs) to zero. The process then determines the ending location, E, by taking the logical OR of S and L. If the modified pointer, R′, is derived by adding a constant, the process includes subtracting the ending location, E, from the modified pointer, R′, to derive the new offset location, O. If the offset location, O, is positive, then the final result is derived by taking the logical OR of the determined starting location, S, and the derived offset location, O. If the modified pointer, R′, is derived by subtracting a constant, then the process includes subtracting the modified pointer, R′, from the ending location, E, to derive the new offset location, O. If the bit corresponding to the value B+1 of the modified pointer, R′, is not equal to the bit corresponding to the value B+1 of the original pointer, R, then the final result is the logical OR of the new starting location, S, and the new offset, O, for establishing the new pointer location, R′. Otherwise, the new offset, O, determines the modified pointer location, R′.
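A loose sketch of this procedure, using the same symbol names (S, L, B, R, E), appears below. It assumes the buffer base is aligned to 2^B with L less than 2^B, the stride magnitude smaller than L, and a nonzero base; the wrap conditions at the exact boundaries follow the worked examples above, so this is an illustration rather than a bit-exact model of the hardware.

```c
/* Loose sketch of the procedure described above, using the same
 * symbol names (S, L, B, R, E). Assumes the base is aligned to 2^B,
 * L < 2^B, the stride magnitude is smaller than L, the pointer R lies
 * within the buffer, and the buffer does not start at address 0.
 * Boundary handling follows the worked examples (a pointer past E
 * wraps down; a pointer at or below S wraps up). */
#include <stdint.h>

static uint32_t circular_update(uint32_t R, int32_t stride,
                                uint32_t L, unsigned B)
{
    uint32_t low_mask = (1u << B) - 1u;
    uint32_t S  = R & ~low_mask;          /* start: clear the B least significant bits */
    uint32_t E  = S | L;                  /* end: the OR works because S is aligned    */
    uint32_t Rp = R + (uint32_t)stride;   /* modified pointer R'                       */

    if (stride >= 0)
        return (Rp > E) ? (Rp - L) : Rp;  /* overflow: pull the pointer back by L      */
    else
        return (Rp <= S) ? (Rp + L) : Rp; /* underflow: push the pointer forward by L  */
}
```

Because S can be recovered from R by masking and E by a logical OR, neither boundary address needs to travel with the pointer; only L and the alignment parameter B are needed, which is where the adder savings arise.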
Variations of the disclosed subject matter may include encoding the end address, E, directly instead of encoding the length, L. This may allow for a circular buffer of arbitrary size, while reducing the size and complexity of the circular buffer calculation.
For illustrating yet another application of the present teachings,
Thus, referring to
The embodiment of
Note that the non-circular auto-incremented address computation is completed in AGU 80, whereas the circular auto-incremented address computation also requires ALU 84, in the illustrated example. Because a load or store instruction cannot both pre-increment to generate an EA and post-increment to generate the AIA, the same adder can be shared for both the EA and the AIA.
In circular addressing mode, address generation process 160 maintains circular buffer 100 with accesses separated by a stride, which may be either positive or negative. The current value of the pointer is added to the stride. If the result either overflows or underflows the address range of circular buffer 100, the buffer length is subtracted or added (respectively) to have the pointer point back to a location within circular buffer 100.
In DSP 40, the start address of circular buffer 100 aligns to the smallest power of 2 greater than the length of the buffer. If the stride, which is the immediate offset, is positive, then the addition can result in two possibilities. Either the sum is within the circular buffer length, in which case it is the final AIA value, or it is bigger than the buffer length, in which case the buffer length needs to be subtracted. If the stride is negative, then the addition can again result in two outcomes.
If the sum is greater than the start address, then it is the final AIA value. If the sum is less than the start address, the buffer length needs to be added. The data path here takes advantage of the fact that the start address is aligned to 2^(K+2) and that the length is required to be less than 2^(K+2), where K is an instruction-specified immediate value. The Rx[31:(K+2)] value is masked to zero prior to the addition. A reverse mask preserves the prefix bits [31:(K+2)] for later use. The buffer overflow is determined, when the stride (immediate offset) is positive, by adding the masked Rx to the stride in the AGU 80 adder and subtracting the length from the sum in the ALU 84 adder. If the result is positive, then the AIA [(K+2)−1:0] comes from the ALU 84 adder; otherwise the result comes from the AGU 80 adder. The AIA [31:(K+2)] equals Rx[31:(K+2)].
The buffer underflow is determined, when the stride is negative, by adding the masked Rx to the stride in the AGU adder. If this sum is positive, then the AIA [(K+2)−1:0] comes from the AGU 80 adder. If the sum is negative, then the length is added to the sum in the ALU 84 adder, and the AIA [(K+2)−1:0] comes from the ALU 84 adder. Again, the AIA [31:(K+2)] equals Rx[31:(K+2)].
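A compact sketch of this masked data path, split into the AGU and ALU additions, is shown below. The register name rx, the parameter K, and the treatment of the exact boundary (result equal to zero) are assumptions made for illustration; as stated above, the start address is taken to be aligned to 2^(K+2) and the length to be less than 2^(K+2).

```c
/* Sketch of the masked (POR) auto-increment address computation
 * described above. K is the instruction-specified immediate; the
 * start address is assumed aligned to 2^(K+2) and the length is
 * assumed to be less than 2^(K+2). Names are illustrative only. */
#include <stdint.h>

static uint32_t auto_increment_por(uint32_t rx, int32_t stride,
                                   uint32_t length, unsigned k)
{
    uint32_t low_mask = (1u << (k + 2u)) - 1u;
    uint32_t prefix   = rx & ~low_mask;              /* Rx[31:(K+2)] is preserved   */
    uint32_t masked   = rx & low_mask;               /* Rx[31:(K+2)] masked to zero */

    int32_t agu_sum = (int32_t)masked + stride;      /* "AGU" adder                 */
    int32_t low;

    if (stride >= 0) {
        int32_t alu_sum = agu_sum - (int32_t)length; /* "ALU" adder                 */
        low = (alu_sum >= 0) ? alu_sum : agu_sum;    /* positive: overflow wrapped  */
    } else {
        low = (agu_sum >= 0) ? agu_sum               /* still in range              */
                             : agu_sum + (int32_t)length; /* underflow wrapped      */
    }
    return prefix | ((uint32_t)low & low_mask);      /* AIA[31:(K+2)] = Rx prefix   */
}
```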
Note that whether the length is added or subtracted in the ALU 84 adder is determined by the sign of the offset. An issue with the POR option is that it adds an AND gate, which performs the mask on the Rx input of the adder, in the critical path. An alternative implementation is as follows.
In this case, Rx is added to the stride. The sum of the AGU 80 adder (which is non-critical for the AIA) is masked, so that only sum [(K+2)−1:0] is presented as one input to the ALU 84 adder, while the length or its two's complement is presented as the other input. If the stride is positive, then the length is subtracted from the masked sum in the ALU adder. If the result is positive, an overflow has occurred and the AIA [(K+2)−1:0] comes from the ALU adder; otherwise no overflow has occurred and the AIA [(K+2)−1:0] comes from the AGU 80 adder. The AIA [31:(K+2)] always equals Rx[31:(K+2)].
If the stride is negative, the AGU adder sum [31:(K+2)] is compared with Rx[31:(K+2)]. If these prefix bits stayed the same, no underflow occurred. In this case, the AIA[(K+2)−1:0] comes from the AGU 80 adder. If the prefix bits differ, then there was an underflow. In this case, the length is added to the masked sum in the ALU 84 adder, and the AIA[(K+2)−1:0] comes from the ALU 84 adder. Again, the AIA [31:(K+2)] always equals Rx[31:(K+2)]. With this approach, the masking AND is eliminated from the critical path. However, a 28-bit comparator is added.
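A sketch of this alternative data path appears below. As with the previous sketch, the names and the sign convention at the exact boundary are illustrative assumptions; it also assumes that, for positive strides, the low-order sum does not wrap past 2^(K+2), so the single masked subtraction shown suffices.

```c
/* Sketch of the alternative data path described above, in which the
 * AND mask is moved off the critical path: the full Rx is added to
 * the stride first, and only the AGU sum is masked before the ALU
 * adder. K, length, and the register names are assumptions. */
#include <stdint.h>

static uint32_t auto_increment_alt(uint32_t rx, int32_t stride,
                                   uint32_t length, unsigned k)
{
    uint32_t low_mask = (1u << (k + 2u)) - 1u;
    uint32_t prefix   = rx & ~low_mask;                   /* Rx[31:(K+2)]             */

    uint32_t agu_sum  = rx + (uint32_t)stride;            /* "AGU" adder, unmasked    */
    uint32_t masked   = agu_sum & low_mask;               /* mask applied after adding */
    uint32_t low;

    if (stride >= 0) {
        int32_t alu_sum = (int32_t)masked - (int32_t)length;  /* "ALU" adder           */
        low = (alu_sum >= 0) ? (uint32_t)alu_sum              /* overflow: wrapped     */
                             : masked;                        /* no overflow           */
    } else {
        if ((agu_sum & ~low_mask) == prefix)              /* prefix comparison         */
            low = masked;                                 /* no underflow              */
        else
            low = (masked + length) & low_mask;           /* underflow: add the length */
    }
    return prefix | low;                                  /* AIA[31:(K+2)] = Rx prefix */
}
```

The prefix comparison in the negative-stride branch corresponds to the added comparator mentioned above, while the final OR with the preserved prefix keeps the AIA within the aligned region.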
The processing features and functions described herein can be implemented in various manners. For example, not only may DSP 40 perform the above-described operations, but also the present embodiments may be implemented in an application specific integrated circuit (ASIC), a microcontroller, a microprocessor, or other electronic circuits designed to perform the functions described herein. The foregoing description of the preferred embodiments, therefore, is provided to enable any person skilled in the art to make or use the claimed subject matter. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without the use of the inventive faculty. Thus, the claimed subject matter is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.