The technical field of this invention is linear feedback shift registers used to generate code sequences.
Generating parallel blocks of scramble code sequences using minimal storage and hardware allows general purpose solutions. The new generator has the same storage requirements for all codes of all standards, and that storage is of order O(N). This is not the case with current methods, whose storage requirements are of order O(N^3). The solution allows software based scramble code generators to be used as programmable solutions for multiple standards; current generators are designed for fixed purposes such as CDMA2000 or the 3GPP uplink or downlink. Also, in the case of masked generators such as the CDMA2000 long code, a mask is needed that collapses the state down to a single bit for each clock, which requires extra hardware. This method converts the mask to a constant known offset, allowing the standard new solution to be used for parallel output generation.
There is a direct link between a Fibonacci based generator and a Galois generator. This link maps the states between the two machines. The Galois state cannot be used directly for parallel bit generation, but the mapping allows this. The Galois machine can be advanced in logarithmic time to a required point. Mapping from the code to a table allows an arbitrary mask to be converted to a constant coefficient. A reverse mapping has also been found, together with an algorithm for its calculation. This allows the initial state of a Galois based generator to be found from the Fibonacci state.
This method does not use a mechanical hardware matrix generator; it uses Galois field multipliers and adders, which are very compact in hardware. It uses minimal storage, and one implementation can serve many standards at no extra cost.
This method is general purpose and allows easy software implementation on any processor. Previous methods are bit based and do not allow efficient implementation on the parallel data paths that processors use. They are also specialized to particular standards due to the high cost of the matrices, which are sparse but random.
These and other aspects of this invention are illustrated in the drawings, in which:
Digital signal processor system 100 includes a number of cache memories.
Level two unified cache 130 is further coupled to higher level memory systems. Digital signal processor system 100 may be a part of a multiprocessor system. The other processors of the multiprocessor system are coupled to level two unified cache 130 via a transfer request bus 141 and a data transfer bus 143. A direct memory access unit 150 provides the connection of digital signal processor system 100 to external memory 161 and external peripherals 169.
Central processing unit 1 has a 32-bit, byte addressable address space. Internal memory on the same integrated circuit is preferably organized in a data space including level one data cache 123 and a program space including level one instruction cache 121. When off-chip memory is used, preferably these two spaces are unified into a single memory space via the external memory interface (EMIF) 4.
Level one data cache 123 may be internally accessed by central processing unit 1 via two internal ports 3a and 3b. Each internal port 3a and 3b preferably has 32 bits of data and a 32-bit byte address reach. Level one instruction cache 121 may be internally accessed by central processing unit 1 via a single port 2a. Port 2a of level one instruction cache 121 preferably has an instruction-fetch width of 256 bits and a 30-bit word (four bytes) address, equivalent to a 32-bit byte address.
Central processing unit 1 includes program fetch unit 10, instruction dispatch unit 11, instruction decode unit 12 and two data paths 20 and 30. First data path 20 includes four functional units designated L1 unit 22, S1 unit 23, M1 unit 24 and D1 unit 25 and 16 32-bit A registers forming register file 21. Second data path 30 likewise includes four functional units designated L2 unit 32, S2 unit 33, M2 unit 34 and D2 unit 35 and 16 32-bit B registers forming register file 31. The functional units of each data path access the corresponding register file for their operands. There are two cross paths 27 and 37 permitting access to one register in the opposite register file each pipeline stage. Central processing unit 1 includes control registers 13, control logic 14, test logic 15, emulation logic 16 and interrupt logic 17.
Program fetch unit 10, instruction dispatch unit 11 and instruction decode unit 12 recall instructions from level one instruction cache 121 and deliver up to eight 32-bit instructions to the functional units every instruction cycle.
Processing occurs in each of the two data paths 20 and 30. As previously described above each data path has four corresponding functional units (L, S, M and D) and a corresponding register file containing 16 32-bit registers. Each functional unit is controlled by a 32-bit instruction. The data paths are further described below. A control register file 13 provides the means to configure and control various processor operations.
The fetch phases of the fetch group 310 are: Program address generate phase 311 (PG); Program address send phase 312 (PS); Program access ready wait stage 313 (PW); and Program fetch packet receive stage 314 (PR). Digital signal processor core 110 uses a fetch packet (FP) of eight instructions. All eight of the instructions proceed through fetch group 310 together. During PG phase 311, the program address is generated in program fetch unit 10. During PS phase 312, this program address is sent to memory. During PW phase 313, the memory read occurs. Finally during PR phase 314, the fetch packet is received at CPU 1.
The decode phases of decode group 320 are: Instruction dispatch (DP) 321; and Instruction decode (DC) 322. During the DP phase 321, the fetch packets are split into execute packets. Execute packets consist of one or more instructions which are coded to execute in parallel. Also during DP phase 321, the instructions in an execute packet are assigned to the appropriate functional units. During DC phase 322, the source registers, destination registers and associated paths are decoded for the execution of the instructions in the respective functional units.
The execute phases of the execute group 330 are: Execute 1 (E1) 331; Execute 2 (E2) 332; Execute 3 (E3) 333; Execute 4 (E4) 334; and Execute 5 (E5) 335. Different types of instructions require different numbers of these phases to complete. These phases of the pipeline play an important role in understanding the device state at CPU cycle boundaries.
During E1 phase 331, the conditions for the instructions are evaluated and operands are read for all instruction types. For load and store instructions, address generation is performed and address modifications are written to a register file. For branch instructions, the branch fetch packet in PG phase 311 is affected. For all single-cycle instructions, the results are written to a register file. All single-cycle instructions complete during the E1 phase 331.
During the E2 phase 332, for load instructions, the address is sent to memory. For store instructions, the address and data are sent to memory. Single-cycle instructions that saturate results set the SAT bit in the control status register (CSR) if saturation occurs. For single cycle 16 by 16 multiply instructions, the results are written to a register file. For M unit non-multiply instructions, the results are written to a register file. All ordinary multiply unit instructions complete during E2 phase 332.
During E3 phase 333, data memory accesses are performed. Any multiply instruction that saturates results sets the SAT bit in the control status register (CSR) if saturation occurs. Store instructions complete during the E3 phase 333.
During E4 phase 334, for load instructions, data is brought to the CPU boundary. For multiply extensions instructions, the results are written to a register file. Multiply extension instructions complete during the E4 phase 334.
During E5 phase 335, load instructions write data into a register. Load instructions complete during the E5 phase 335.
Note that “z” in the z bit column refers to the zero/not zero comparison selection noted above and “x” is a don't care state. This coding can only specify a subset of the 32 registers in each register file as predicate registers. This selection was made to preserve bits in the instruction coding.
The dst field (bits 23 to 27) specifies one of the 32 registers in the corresponding register file as the destination of the instruction results.
The src2 field (bits 18 to 22) specifies one of the 32 registers in the corresponding register file as the second source operand.
The src1/cst field (bits 13 to 17) has several meanings depending on the instruction opcode field (bits 3 to 12). The first meaning specifies one of the 32 registers of the corresponding register file as the first operand. The second meaning is a 5-bit immediate constant. Depending on the instruction type, this is treated as an unsigned integer and zero extended to 32 bits or is treated as a signed integer and sign extended to 32 bits. Lastly, this field can specify one of the 32 registers in the opposite register file if the instruction invokes one of the register file cross paths 27 or 37.
The opcode field (bits 3 to 12) specifies the type of instruction and designates appropriate instruction options. A detailed explanation of this field is beyond the scope of this invention except for the instruction options detailed below.
The s bit (bit 1) designates the data path 20 or 30. If s=0, then data path 20 is selected. This limits the functional unit to L1 unit 22, S1 unit 23, M1 unit 24 and D1 unit 25 and the corresponding register file A 21. Similarly, s=1 selects data path 30, limiting the functional unit to L2 unit 32, S2 unit 33, M2 unit 34 and D2 unit 35 and the corresponding register file B 31.
The p bit (bit 0) marks the execute packets. The p-bit determines whether the instruction executes in parallel with the following instruction. The p-bits are scanned from lower to higher address. If p=1 for the current instruction, then the next instruction executes in parallel with the current instruction. If p=0 for the current instruction, then the next instruction executes in the cycle after the current instruction. All instructions executing in parallel constitute an execute packet. An execute packet can contain up to eight instructions. Each instruction in an execute packet must use a different functional unit.
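The p-bit scanning rule above can be illustrated with a short sketch. The following C fragment is illustrative only and assumes the simple case where an execute packet does not cross a fetch packet boundary; the function name and the use of printf as a stand-in for dispatch are not part of the instruction set description.

#include <stdint.h>
#include <stdio.h>

/* Illustrative only: split a fetch packet of eight 32-bit instructions into
 * execute packets by scanning the p-bits (bit 0) from lower to higher
 * address. Assumes no execute packet crosses the fetch packet boundary.   */
static void split_execute_packets(const uint32_t fetch_packet[8])
{
    int start = 0;
    for (int i = 0; i < 8; i++) {
        /* p = 1: instruction i+1 executes in parallel with instruction i. */
        if ((fetch_packet[i] & 1u) == 0) {      /* p = 0 ends the packet   */
            printf("execute packet: instructions %d to %d\n", start, i);
            start = i + 1;
        }
    }
}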
In third generation wireless systems such as 3GPP, CDMA2000 and IS2000, the symbol data generated by channel coding is further spread using orthogonal spreading codes such as Walsh codes. These spread the signal up to the transmission rate by a factor of between 4 and 512. After spreading, some form of scrambling takes place. This has no effect on the bandwidth but minimizes the peak to average ratio, so that power amplifiers in the system operate closer to their efficiency point. This scramble code is assumed to be very long and cannot be stored in a memory. Thus it has to be generated as needed. The starting phase of this sequence is also variable, so generating the sequence at the right phase requires some way of mapping the phase to the actual initial seed of the generator.
Generators are based on the linear feedback shift register (LFSR). Several techniques are used to generate the state of the machine for a particular phase. It is often required to generate several bits of the sequence at once as uplink receiver structures typically use massively parallel architectures. This invention allows both general purpose generation for multiple standards and codes and parallel generation of the bits combined with low data storage.
The suggested architecture provides these common benefits for less total area than a conventional design. This invention permits: parallel generation of bits; multi-standard support for small incremental cost; support of long code generation using arbitrary masks; lower memory cost than previous methods; lower gate count due to using efficient Galois field multiplier rather than power matrix method; and lower gate count due to encoding of the seed.
The Fibonacci form LFSR includes plural 1-bit state registers 501 to 509. Each operational cycle the state of state register SM passes to the next state register SM−1. The output of the LFSR generator is taken from the state register S0 501. The feed forward is performed by selecting taps according to a polynomial generator. The tap weights g1 512 to gN−1 519 determine the feed forward. Each tap weight can be 0 or 1. If 0, then there is no feed forward at that tap. If 1, then the value of the state register is part of the feed forward. Note that there is no tap weight g0. In effect g0 is always 1. Otherwise the LFSR would be at least one bit shorter. The set of exclusive OR gates 521 to 529 combines all the taps and supplies the input to state register SN−1 509.
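The operation just described can be summarized in a short sketch. The following C fragment is a minimal illustration of one clock of a Fibonacci LFSR with N of at most 32, with S0 held in bit position 0 and the tap weights g1 to gN−1 packed into a word; the function and variable names are assumptions made for the example only.

#include <stdint.h>

/* One clock of an N-bit Fibonacci LFSR (illustrative sketch, N <= 32).
 * state: S0 in bit 0 .. S(N-1) in bit N-1; the output is taken from S0.
 * taps:  tap weights g1..g(N-1) in bits 1..N-1; g0 = 1 is implicit.       */
static unsigned fibonacci_clock(uint32_t *state, uint32_t taps, int N)
{
    uint32_t s = *state;
    unsigned out = s & 1u;                  /* output taken from S0         */
    uint32_t fb = s & (taps | 1u);          /* select S0 and the tap bits   */
    fb ^= fb >> 16; fb ^= fb >> 8;          /* fold down to the parity of   */
    fb ^= fb >> 4;  fb ^= fb >> 2;          /* the selected bits            */
    fb ^= fb >> 1;
    s >>= 1;                                /* each S(M) moves to S(M-1)    */
    s |= (fb & 1u) << (N - 1);              /* XOR of taps enters S(N-1)    */
    *state = s;
    return out;
}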
The conventions adopted for the initial state S and the polynomial generator T are as follows: S is the state vector of the pseudo-noise (PN) generator, and T is the tap weight vector of the pseudo-noise generator.
The LFSR sequences of most practical importance are called maximal length. A maximal length sequence is one generated by an LFSR whose N state shift register has a periodicity of 2^N − 1. An LFSR that generates a maximal length sequence will be termed a maximal length LFSR. Maximal length LFSR sequences are also called m-sequences. For a given N, an LFSR with a proper choice of the taps Ti will result in a maximal length sequence.
The tap vector T represents the feed forward path connections. Using the conventions above described, the pseudo-noise generator operation can be modeled in a state transition matrix formulation as follows:
where: M(1) is the N by N binary-valued matrix representing the shift register. The matrix operations are in GF(2). The transition state matrix M(1) is built as follows:
By substitution equation (1) becomes:
Equations (1) and (3) give the state vector at cycle 1 given the initial state at cycle 0 and the state transition matrix. In general:
Equation (4) shows a way to generate any state and thus N consecutive chips of the sequence at offset k given the initial state. This is a very direct, mechanical method to generate the state of the Fibonacci generator for any arbitrary time offset. For a particular set of feed forward taps chosen the matrix M is very sparse as are its powers. Thus a dedicated circuit for a particular polynomial optimizes to a compact solution. The output is taken from state bit 0 as in equation (5).
A Fibonacci machine is used because it generates bits in parallel and a sequence of bits can be taken from the state. Thus the state contains the future N bits from the last N stages.
However, when different polynomials are required, for example to support different standards, the arbitrary matrices need to be programmable. In the case of one of the 3GPP uplink polynomials this would require 25 by 25 by 25 bits, or about 489 32-bit words. This is an expensive solution because each bit would need a storage register. The Fibonacci generator cannot be advanced by an arbitrary amount without using the power matrices, and this is an expensive solution when needed for general purpose use in multiple standards.
This prior art allows multiple bits to be generated at once. According to this prior art it is necessary to map the code to its Fibonacci form to generate the future bits. The disadvantage of the power matrix form is that the memory required to store the feed forward matrices for a general purpose polynomial is very large, on the order of N^3 bits. This invention has the advantage that for arbitrary polynomials the memory cost is only 2N bits.
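For comparison, the prior-art power matrix advance can be sketched as follows. This is a hedged illustration, not the dedicated hardware described above: it represents each GF(2) matrix as N column bitmasks and advances the state by square-and-multiply on the transition matrix, which makes apparent why storing precomputed matrix powers for a programmable polynomial costs on the order of N^3 bits. All names and the N of at most 32 limit are assumptions of the example.

#include <stdint.h>

/* Prior-art style advance S(k) = M(k)·S(0) over GF(2) (illustrative).
 * Matrices are stored as N column bitmasks; N <= 32 assumed for brevity.  */
typedef struct { uint32_t col[32]; int n; } gf2_matrix;

static uint32_t gf2_mat_vec(const gf2_matrix *m, uint32_t v)
{
    uint32_t r = 0;
    for (int j = 0; j < m->n; j++)
        if ((v >> j) & 1u)
            r ^= m->col[j];                 /* XOR the selected columns     */
    return r;
}

static gf2_matrix gf2_mat_mat(const gf2_matrix *a, const gf2_matrix *b)
{
    gf2_matrix r = { {0}, a->n };
    for (int j = 0; j < b->n; j++)
        r.col[j] = gf2_mat_vec(a, b->col[j]);
    return r;
}

/* Advance by k clocks using square-and-multiply on M(1).  A hardware
 * implementation would instead store the matrix powers, which is where
 * the order N^3 bit cost arises for a programmable polynomial.            */
static uint32_t advance_state(gf2_matrix m, uint32_t s0, uint32_t k)
{
    while (k) {
        if (k & 1u)
            s0 = gf2_mat_vec(&m, s0);
        m = gf2_mat_mat(&m, &m);            /* M <- M^2                     */
        k >>= 1;
    }
    return s0;
}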
A system based on Galois field arithmetic would be much more efficient in terms of storage for the general case and be accessible to arithmetic speedup methods. In particular, Galois field arithmetic is more easily accomplished than the matrix operations described above in the Fibonacci form. This invention uses a mapping that allows a Galois generator to be used in the form of a multiplicative field in GF(2^N). The bits in the element of this field are mapped to the equivalent Fibonacci machine state at that time. This invention is: programmable with little storage for different standards; requires the same or less hardware in the special case; and lends itself to efficient software implementation.
Prior art teachings enable mapping the Galois form of
The data flows through each structure in opposite directions. The output of the Galois machine is the input to state bit 0 and the output of state bit n−1 via the feed back. The output of the Fibonacci machine is the output of state bit 0.
The construction of the Fibonacci machine illustrated in
The output bit equation corresponding to equation (5) is:
The sequence of states that the Galois machine goes through after each clock is equivalent to a finite multiplicative field generated for GF(2^N) over the polynomial g. The elements of g are the tap connections of the LFSR. Multiplication by α in GF(2^N) can be seen to be equivalent to the state machine being clocked, as the generator polynomial is added to the shifted state if the bit about to be shifted out is 1. By a process of induction any number of clocks i can be translated into a multiplication by α^i, where α is the primitive element of the field.
N is the number of bits in the state and the generator polynomial is the feedback tap polynomial. Galois field multiplication is a polynomial product of 2 numbers in the field followed by a polynomial division by the generator polynomial, with the remainder being the result. In a manner analogous to multiplying powers of the state transition matrix to get arbitrary offsets from zero time, multiplying by powers of α will also yield the desired state.
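The Galois field multiplication just described (a carry-less polynomial product followed by reduction modulo the generator polynomial) can be sketched in C as below. This is a generic software illustration under the assumption that N is at most 32 and that the low N coefficients of the generator are packed into a word; the function names are not taken from any standard.

#include <stdint.h>

/* Multiply a and b in GF(2^N) over generator g (illustrative sketch).
 * g holds the low N coefficients of the generator; the x^N term is implicit.
 * For the 3GPP downlink code X^18+X^10+X^7+X^5+1:
 *   N = 18, g = (1u<<10)|(1u<<7)|(1u<<5)|1u.                               */
uint32_t gf_mul(uint32_t a, uint32_t b, uint32_t g, int N)
{
    uint32_t mask = (N == 32) ? 0xFFFFFFFFu : ((1u << N) - 1u);
    uint32_t r = 0;
    a &= mask;
    for (int i = 0; i < N; i++) {
        if (b & 1u)
            r ^= a;                         /* carry-less partial product   */
        b >>= 1;
        uint32_t carry = a >> (N - 1);      /* coefficient of x^(N-1)       */
        a = (a << 1) & mask;
        if (carry)
            a ^= g;                         /* reduce modulo the generator  */
    }
    return r;
}

/* Advancing by k clocks is then a multiplication by alpha^k, computed by
 * square-and-multiply with alpha = x = 0x2 (valid for a primitive g).      */
uint32_t gf_pow_alpha(uint32_t k, uint32_t g, int N)
{
    uint32_t result = 1u, base = 2u;
    while (k) {
        if (k & 1u) result = gf_mul(result, base, g, N);
        base = gf_mul(base, base, g, N);
        k >>= 1;
    }
    return result;
}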
The output bits of the Fibonacci and the Galois structures are identical assuming the correct initial state. The Galois sequence is equivalent to the Fibonacci sequence though there is a time offset between them and the order of the bits in the shift register state is also different. The two state sequences generated by the two structures can be mapped to each other for the desired result. Thus for an arbitrary set of bits of a state in G all of the same bits in F can be reproduced using a fixed linear mapping function. The previously unknown time offset between Galois and Fibonacci machines is also shown as:
SF=f(SG) (8)
This mapping is reversible and linear, allowing the Galois state to be computed from the Fibonacci state.
The following is an example employing one of the scramble codes used in the downlink in the 3GPP specification. This has generator polynomial G = X^18 + X^10 + X^7 + X^5 + 1. This is an 18 bit shift register. Table 1 shows the Fibonacci sequence for the first 32 values.
Table 2 shows the equivalent sequence of Galois numbers.
Tables 1 and 2 show that the output values are the same. The right most or most significant bit of these tables is the same. Tables 1 and 2 also show that the first 8 bits of the Fibonacci sequence state and the last 8 bits of the Galois sequence state are identical, though time reversed. The advantage of the Fibonacci generator is that all of the future bits from the current time up to the size of the shift register can be extracted from the current state. Inspection of the Fibonacci sequence shows that any state i contains the future N−1 output bits, giving immediate access to the required N bits for a parallel update. This allows a single update of the Fibonacci generator to produce up to 18 bits per output with no modification in this example.
For a maximal length LFSR both the Fibonacci and Galois forms of the generator generate the same output sequence, but shifted in time. This can be shown as follows. Let cF(k) be the sequence of bits generated by the Fibonacci form and cG(k) be the sequence generated by the Galois form. If we take the initial state for the Fibonacci form generator SF(0) to be [1,0,0 . . . 0]T, then from
cF(k) = [1,0,0 . . . 0] M(k) [1,0,0 . . . 0]^T (9)
Similarly, for SG(0) to be [1,0,0 . . . 0]^T, then:
cG(k) = [0,0,0 . . . 0,1] M(k) [1,0,0 . . . 0]^T
Noting that [0,0,0 . . . 0,1]=[1,0,0 . . . 0] M(1)
cG(k) = [1,0,0 . . . 0] M(k+1) [1,0,0 . . . 0]^T
These expressions together imply that cG(k) = cF(k+1). Thus, when SG(0) = [1,0,0 . . . 0]^T = SF(0), the Galois generator generates the same sequence as the Fibonacci generator, except for a shift of one bit.
This can be generalized to arbitrary initial states. Suppose the Fibonacci and Galois form generators start with respective arbitrary initial states SF(0) and SG(0). Since for a maximal length LFSR every non-zero state is reachable from every other state in fewer than 2^N − 1 steps, there exist α and β such that SF(0) = M(α)[1,0,0 . . . 0]^T and SG(0) = M(β)[1,0,0 . . . 0]^T. Therefore:
The Fibonacci and Galois form generators produce identical bitstreams, except for a constant shift of (k+α−β−1) modulo (2^N − 1).
The Fibonacci generator however must be built in the power matrix form to be able to generate sequences at arbitrary phases and update them by arbitrary amounts. In contrast, the Galois state contains some of the Fibonacci state bits but then is broken up into seemingly random groups of bits that are delayed versions of the required sequences.
In the Galois sequence the feedback taps form discontinuities in the sequences. The number of these discontinuities is equal to the number of tap weights. However, the cost of a general purpose Fibonacci machine is high. If the Galois state can be mapped in a relatively simple way to give exactly the same state output as the Fibonacci sequence, then a general purpose parallel output scramble code generator can be built with low storage costs.
A general purpose Fibonacci generator requires over 4000 bytes of storage ((32*32*32)/8). In contrast, a Galois programmed machine would in theory require only 12 bytes of storage (32*3/8), assuming the largest polynomial is of degree 32. For multiple standard support this invention is most attractive. For single standard support this invention is still competitive, especially on a software platform. The main reason for the large reduction in storage is the regular, repetitive structure of the mappings.
The states of the 2 machines are represented in this application with the most significant bit or output bit on the right hand side. The time offsets of the sequences were measured by exhaustive search. In the Fibonacci sequence all of the time shifts are simple, incrementing by 1 each time.
Generating these time shifts would seem like a complex problem. This application defines a model of the way the sequences for each state bit are related to each other, referred to here as the recursive definition of the code. Each bit along the state is an earlier time sequence of the previous one. This is equivalent to saying the next sequence is α^−1 multiplied by the current state. This is true until a feedback tap is encountered; then the most significant bit, the earliest bit of the generator used as the feedback bit, is added to the current state bit. This new state bit is then multiplied by α^−1 as before until it meets the next feedback bit, and so on.
For a Fibonacci machine the following is true:
The feed forward equation is the same as the polynomial definition of the code.
For a Galois machine there is the following relation between states. The feedback path is directly connected to the first bits of the sequence so the least significant bit of the state becomes the most significant bit of the state on each clock pulse. If a feedback tap occurs, the next state is the sum of the current state and the original output bit delayed by 1.
s(i+1) = (s(i) + g(i))·α^−1 (14)
s(0) = s(N−1)·α^−1 (15)
Consider the example of the downlink code G = X^18 + X^10 + X^7 + X^5 + 1. According to these rules the sequence of ratios between each state bit and state bit zero (the output of the last state bit) is:
S17=(((α^−5+α^0)·α^−13+α^0)·α^−3+α^0)·α^−8, (=α^0)
S16=(((α^−5+α^0)·α^−12+α^0)·α^−3+α^0)·α^−7, (=α^1)
S15=(((α^−5+α^0)·α^−11+α^0)·α^−3+α^0)·α^−6, (=α^2)
S14=(((α^−5+α^0)·α^−10+α^0)·α^−3+α^0)·α^−5, (=α^3)
S13=(((α^−5+α^0)·α^−9+α^0)·α^−3+α^0)·α^−4, (=α^4)
S12=(((α^−5+α^0)·α^−8+α^0)·α^−3+α^0)·α^−3, (=α^5)
S11=(((α^−5+α^0)·α^−7+α^0)·α^−3+α^0)·α^−2, (=α^6)
S10=(((α^−5+α^0)·α^−6+α^0)·α^−3+α^0)·α^−1, (=α^7)
S9=((α^−5+α^0)·α^−5+α^0)·α^−3,
S8=((α^−5+α^0)·α^−4+α^0)·α^−2,
S7=((α^−5+α^0)·α^−3+α^0)·α^−1,
S6=(α^−5+α^0)·α^−2,
S5=(α^−5+α^0)·α^−1,
S4=α^−5,
S3=α^−4,
S2=α^−3,
S1=α^−2,
S0=α^−1
This shows the state is not a continuous sequence of values; there are discontinuities, due to bits 0 to 9 not following the sequence of the other entries. The sequence needs to be in the form of the Fibonacci machine. By inspection it can be shown that:
(((α^−5+α^0)·α^−6+α^0)·α^−3+α^0)·α^−1 = α^7 (16)
Which is equivalent to:
α^−11 + α^−6 + α^−4 + α^−1 = α^7 (17)
Multiplying both sides by α^11 returns the generator polynomial where f(α)=0:
α^18 + α^10 + α^7 + α^5 + α^0 = 0 (18)
Thus many of the required bits are in the form needed, but some have to be reconstructed. The missing parts are all available or can be generated. This process can be performed by hand for any code, but it is laborious. The process is detailed below. The equations are expanded out to form normal polynomial equations. It is now clear which missing values are needed to reconstruct the equivalent Fibonacci state. By inspection it is seen that if state 0 is α^−1 then state 17 must be α^0, as they are directly connected, and so on until the first feedback tap is encountered. Thus the missing values needed to make the state the same as the Fibonacci state are all available, but reversed from least significant bit to most significant bit. Thus there is a mapping from the Galois state to the Fibonacci state. There may still be an arbitrary time offset between the two state spaces.
α^−18+α^−13+α^−11+α^−8, (=α^0)
α^−17+α^−12+α^−10+α^−7, (=α^1)
α^−16+α^−11+α^−9+α^−6, (=α^2)
α^−15+α^−10+α^−8+α^−5, (=α^3)
α^−14+α^−9+α^−7+α^−4, (=α^4)
α^−13+α^−8+α^−6+α^−3, (=α^5)
α^−12+α^−7+α^−5+α^−2, (=α^6)
α^−11+α^−6+α^−4+α^−1, (=α^7)
α^−10+α^−5+α^−3 (+α^0), (=α^8)
α^−9+α^−4+α^−2 (+α^1), (=α^9)
α^−8+α^−3+α^−1 (+α^2), (=α^10)
α^−7+α^−2 (+α^0+α^3), (=α^11)
α^−6+α^−1 (+α^1+α^4), (=α^12)
α^−5 (+α^0+α^2+α^5), (=α^13)
α^−4 (+α^1+α^3+α^6), (=α^14)
α^−3 (+α^2+α^4+α^7), (=α^15)
α^−2 (+α^3+α^5+α^8), (=α^16)
α^−1 (+α^4+α^6+α^9), (=α^17)
The above sequence is combined with itself in the above manner to generate a contiguous sequence. The missing part of the sequence is shown in parentheses. Contiguity is defined by the next bit being a delayed version of the previous bit. Each element in the state is a delayed version of the previous state the same as a Fibonacci sequence. Feed forward matrix F shown below is used to convert the Galois field state into the Fibonacci sequence. This matrix has a special structure. Every column is an up shifted copy of the previous column. This matrix F is added to the identity to form the full feed forward matrix. Only N bits are required to define the functionality of the converter. This allows the structure of the feed forward unit to be highly regular and optimized. It only requires N control inputs.
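Because F added to the identity is fully described by which of its diagonals are set, applying the converter reduces to XORing the Galois state with shifted copies of itself. The following C sketch assumes N is at most 32, that the diagonals are described by a bitmask, and that a right shift matches the bit ordering used in equations (30) and (31) later in this description; it is an illustration, not the hardware feed forward unit itself.

#include <stdint.h>

/* Apply the feed forward conversion described by its non-zero diagonals
 * (illustrative sketch).  diag_mask bit d set means the d-th diagonal above
 * the leading diagonal of F is 1; the leading diagonal (identity term) is
 * always applied.  Shift direction assumes the convention of eq. (30).    */
uint32_t feed_forward(uint32_t galois_state, uint32_t diag_mask, int N)
{
    uint32_t s = galois_state;              /* identity term                */
    for (int d = 1; d < N; d++)
        if ((diag_mask >> d) & 1u)
            s ^= galois_state >> d;         /* one XOR-shift per diagonal   */
    return s;
}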
The recursive definition above defines the time relationship between each bit in the Galois state. Upon calculation of the coefficient between state bit i and state bit 0, the antilog in the polynomial field of interest yields the actual time offset. For example, for bit 9:
α^−10 + α^−5 + α^−3 = α^79607 (19)
From equation (19), we have:
offset(9) = antilog(α^−10 + α^−5 + α^−3) = 79607 (20)
The recursive definition for each code can be used to form a lookup table that allows the mask value used, for example in the CDMA2000 long code, to be converted to a multiplicative coefficient so that the Galois machine can still be used for parallel sequence extraction. The required extra bits added are shown below. Each vertical column i−1 is a shifted version of column i:
In the case of the existing Galois state, for any particular code the table of coefficients which advance the state to the required point can be generated. The vector of inputs α^−1, α^−2, . . . , α^−N+1, α^−N is multiplied by the tap weight vector matrix TWM. The TWM has the structure shown below. The feedback taps are promoted from GF(2) to GF(2^N):
For the polynomial
where: X = α and g(α) = 0, α being a primitive element of the field. This means in the example shown that α^−18 + α^−13 + α^−11 + α^−8 = α^0, since α^18 + α^10 + α^7 + α^5 + α^0 = 0. In other words M(N−1) = α^0 = 1.
SG0 = FF^−1·SF0 (22)
The initial Galois state is determined from the desired initial Fibonacci state using inverse mapping.
SGi = G^i·SG0 = α^i·SG0 (23)
This method then arbitrarily advances from time zero to time i to form the Galois state at time i. Note the equivalence:
SFi = FF·SGi (24)
Equation (24) forward maps from the Galois state to the Fibonacci state and gives access to the parallel output bits.
The following algorithm is a closed form deterministic method of calculating the feed-forward matrix. The polynomial matrix is defined as:
The 0th column of the matrix G is the Generator polynomial g multiplied by the up shift matrix R where:
The matrix G can also be considered as the Galois generator matrix down shifted by 1 and the 0th element set to 1.
It is known that any bit sequence from the Fibonacci sequence is equal to the equivalent Galois sequence with an arbitrary shift. The contiguous sequence of outputs for a particular state can also be produced using the following equivalence. For a particular state α^k there is an output bit Xk. The output bit X(k+1) is generated from the same bit position of state α^(k+1). This is repeated for up to N bits in the state. To merge the output bits the matrix R is upshifted. The total combination of bits from state k can then be expressed as the matrix F, where G is the Galois field generator matrix:
The matrix F stores the N outputs from the most significant bit of the Galois state. The Galois field multiplier advances the state by a block of time samples. The resulting matrix always has full rank N. All elements on the diagonal are 1 and the matrix is always upper triangular and persymmetric. Therefore this matrix cannot have zero determinant and is thus always invertible. The rows are the last row of every power of G from 0 to N−1.
This provides a mapping from a particular g-space state to the equivalent f-space state. To produce the g-space state for a particular f-space state requires the inverse of F. It is straightforward to calculate the inverse in an iterative manner by a technique called inverse iteration. For many polynomials the feed forward matrix FF that converts a Fibonacci state to a Galois state is its own inverse. This is true in the case of the 3GPP downlink scramble code generators and one downlink generator.
The matrix F has a known formula, but none is known for its inverse. The inverse iteration selects the initial estimate of F^−1 to be F. The product of these two matrices is added to the identity matrix to form an error matrix E. This is added in GF(2) to the estimate of F^−1. This is repeated up to N times and always generates the inverse. The algorithm always yields the inverse because the feed forward matrix and its inverse are persymmetric and triangular, so that only one bit needs to be changed to correct the whole diagonal at each stage. Once the bit has been set to the correct value all subsequent iterations are orthogonal. This algorithm is summarized as follows:
F^−1(0) = F; i = 0
E = F^−1(i)·F + I
if |E| = 0: stop
F^−1(i+1) = F^−1(i) + E; i = 1, . . . , N−1
Often the iteration meets the stop condition after fewer than N−1 iterations. In the case of the 42-bit IS95 long code only 4 iterations are needed. The 18-bit 3GPP downlink Y scramble code needs only 1 iteration. The matrices are generally sparse, so only a few non-zero diagonals need to be corrected.
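A software sketch of this construction and of the inverse iteration is given below, under the assumptions that N is at most 32, that GF(2) matrices are stored as row bitmasks, and that the stop test |E| = 0 is taken to mean the error matrix is entirely zero. The type and function names are illustrative; convergence within N passes relies on the structural properties of F claimed above.

#include <stdint.h>
#include <string.h>

#define NMAX 32

/* GF(2) matrix stored as row bitmasks: row[i] bit j is element (i, j).    */
typedef struct { uint32_t row[NMAX]; int n; } m2;

static m2 m2_mul(const m2 *a, const m2 *b)
{
    m2 r; r.n = a->n; memset(r.row, 0, sizeof r.row);
    for (int i = 0; i < a->n; i++)
        for (int j = 0; j < a->n; j++)
            if ((a->row[i] >> j) & 1u)
                r.row[i] ^= b->row[j];      /* GF(2) row combination        */
    return r;
}

/* Feed forward matrix F: row k is the last row of G^k, k = 0..N-1, where
 * G is the Galois generator matrix for the chosen polynomial.             */
static m2 build_F(const m2 *G)
{
    m2 F, P; F.n = P.n = G->n;
    memset(F.row, 0, sizeof F.row);
    for (int i = 0; i < G->n; i++) P.row[i] = 1u << i;   /* P = G^0 = I    */
    for (int k = 0; k < G->n; k++) {
        F.row[k] = P.row[G->n - 1];         /* last row of G^k              */
        P = m2_mul(&P, G);
    }
    return F;
}

/* Inverse iteration as described: start from F, correct with E = Finv*F + I
 * until E vanishes (at most N passes).                                     */
static m2 invert_F(const m2 *F)
{
    m2 Finv = *F;
    for (int it = 0; it < F->n; it++) {
        m2 E = m2_mul(&Finv, F);
        uint32_t nz = 0;
        for (int i = 0; i < F->n; i++) {
            E.row[i] ^= 1u << i;            /* add the identity (mod 2)     */
            nz |= E.row[i];
        }
        if (!nz) break;                     /* error matrix is zero: stop   */
        for (int i = 0; i < F->n; i++)
            Finv.row[i] ^= E.row[i];        /* Finv <- Finv + E             */
    }
    return Finv;
}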
A mapping is always needed from the Fibonacci form to the Galois form any time the initial state is arbitrary, so it is necessary to know the inverse. Thus to preserve the initial state it must first be mapped from Fibonacci to Galois so that it can then be transformed back to Fibonacci.
It is now possible to show the actual time difference between the 2 sequences. The initial state for the Galois machine is chosen as a known value SG0 and the output is taken from bit N−1. The feed-forward matrix F is then applied to this state to produce the equivalent Fibonacci field element. The output from bit 0 is used as the sequence and will be identical to the Fibonacci sequence. A unique feed-forward F can be calculated for a specific polynomial. This gives the required initial starting state of the Fibonacci generator to make the 2 sequences the same.
SF0 = F·SG0 (28)
By having the same state SF0 in both machines the output sequences will be offset in phase. We are in a position to calculate this phase for a particular initial Fibonacci state.
SG0 = α^g, SF0 = F·α^g
SF0 = α^f, so α^f = F·α^g
f = antilog_α(F·α^g)
offset = antilog_α(F·α^g) − g
Thus the difference between the two sequences is fixed but depends on the actual initial state. If we choose g=0, then we can simplify this equation. With g=0, SF0 is 1 in the most significant bit and zero elsewhere. Multiplying this by F will produce the same value. As predicted in the proof, offset = 0.
For an arbitrary state, the offset is an unpredictable combination of elements in the feed forward matrix. The following is an example implementing a generator using these methods.
The following example is a design of an equivalent Galois generator structure for the 3GPP uplink scramble code. This is a combination of 2 Fibonacci generators X and Y.
Generator polynomial X is g(X) = X^25 + X^3 + 1, provided by summer 1013. Generator polynomial Y is f(X) = X^25 + X^3 + X^2 + X + 1, provided by summer 1023. The initial state of X is a 24 bit number with the twenty-fifth state bit set to 1; thus the initial state is n0, n1, n2, . . . , n22, n23, '1'. The initial state of Y is all ones. The two generators are then summed together modulo 2 in summer 1015. This forms the in-phase part of the sequence. The quadrature part is the same sequence delayed by 16,777,232 chips. The delay is formed in summers 1017 and 1027. Summer 1025 modulo 2 sums these delayed signals for the quadrature output.
The generator matrix for the Fibonacci form is:
Code X has feed forward taps at α^4, α^7 and α^18. These are added together to form the delay. The Galois generator is shown below.
The next step is to calculate the feed forward matrix to convert G to F using the algorithm disclosed above. The feed forward matrix is a self inverse for both codes. Applying the algorithm gives the following feed forward matrix:
This matrix has the same effect as adding (modulo 2) the state to shifted versions of itself. Each diagonal in the feed forward (or feedback) matrix is equivalent to a shift by the number of rows from the leading diagonal. So the identity is a shift of 0; in equation (30) the diagonal is 22 rows from the leading diagonal, so it is a right shift of 22. In this case:
StateX = StateX ^ (StateX >> 22) (30)
In the case of generator X the initial state is defined as n23 . . . n0, the 24 bit binary representation of the scrambling sequence number n with n0 being the least significant bit. The X sequence depends on the chosen scrambling sequence number n and is denoted xn. The initial conditions are: xn(24)=n0, xn(23)=n1, . . . , xn(2)=n22, xn(1)=n23, xn(0)=1. This initial Fibonacci state must first be bit reversed because the state in the Fibonacci generator is defined to be a time reversed version of the Galois generator. This state is: bitreverse(0x1000000 ^ n). This value then goes through the transformation in equation (30). In the Galois field over the polynomial g(X) = X^25 + X^3 + 1, the feed forward path α^4 + α^7 + α^18 is equal to the Galois field element 0x0040090. When the logarithm of this value is taken to the base alpha, the index is 16777232 as defined in the 3GPP standard. The delay coefficient will then form the initial value of the delayed version of the code. Generator X is thus completely defined using Galois field arithmetic.
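The seed construction for generator X can be captured in a few lines of C. This sketch assumes the 25-bit conventions just described (bit reverse of 0x1000000 ^ n followed by the single XOR-shift of equation (30)); the function names are illustrative, and the quadrature branch comment uses the gf_mul sketch given earlier.

#include <stdint.h>

/* Reverse the low 25 bits (bit 0 <-> bit 24, and so on).                  */
static uint32_t bitrev25(uint32_t v)
{
    uint32_t r = 0;
    for (int i = 0; i < 25; i++)
        r |= ((v >> i) & 1u) << (24 - i);
    return r;
}

/* Initial Galois state for uplink code X, scrambling sequence number n,
 * following the description above (illustrative sketch).                  */
static uint32_t x_initial_galois_state(uint32_t n)
{
    uint32_t s = bitrev25(0x1000000u ^ (n & 0xFFFFFFu)); /* Fibonacci seed  */
    s ^= s >> 22;                            /* feed forward, equation (30) */
    return s & 0x1FFFFFFu;
}

/* The delayed (quadrature) branch starts from the same state multiplied in
 * GF(2^25) by the delay coefficient 0x0040090, for example
 *   gf_mul(x_initial_galois_state(n), 0x0040090, (1u<<3) | 1u, 25)
 * using the gf_mul sketch given earlier.                                   */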
The generator matrix for the Fibonacci form is:
Code Y has feed forward taps at α^4, α^6 and α^17. These are added together to form the delay element. The Galois generator is:
The process then calculates the feed forward matrix to convert from Galois form to Fibonacci form using the algorithm described above. Applying the algorithm produces the following feed forward matrix:
This matrix has the same effect as adding (modulo 2) the state to shifted versions of itself as shown in equation (31):
StateY = StateY ^ (StateY >> 22) ^ (StateY >> 23) ^ (StateY >> 24) (31)
In the case of generator Y the initial Fibonacci state of 0x1FFFFFF is bit reversed and passed through the feed forward matrix FF to form the constant initial value 0x1FFFFF8 in the Galois field over the polynomial f(X) = X^25 + X^3 + X^2 + X + 1. The delayed sequence given by α^4 + α^6 + α^17 is equal to the Galois field element 0x020050. When the logarithm of this value is taken to the base alpha, the index is also 16777232 as defined in the 3GPP standard. This completely defines generator Y using Galois field arithmetic.
This method is summarized as follows. The Galois definition of scramble codes X is:
s0 = bitreverse(0x1000000 ^ n)
XAi = α^i ⊗ X0 :: X0 = s0 ⊕ (s0 >> 22) :: f(x) = X^25 + X^3 + 1
F(XAi) = XAi ⊕ (XAi >> 22)
XBi = α^i ⊗ X0 :: X0 = (s0 ⊕ (s0 >> 22)) ⊗ 0x040090
F(XBi) = XBi ⊕ (XBi >> 22)
The 2 sequences XA and XB are given the initial state. In the case of the delayed sequence the initial state is multiplied by the delay coefficient 0x040090. This is possible by the associative property of the multiplication. In these equations the symbol ⊕ means element wise modulo 2 addition in GF(2^N), and the symbol ⊗ means multiplication in the field GF(2^N). The Galois definition of scramble code Y is:
s0 = bitreverse(0x1FFFFFF)
YAi = α^i ⊗ Y0 :: Y0 = s0 ⊕ (s0 >> 22) ⊕ (s0 >> 23) ⊕ (s0 >> 24)
f(X) = X^25 + X^3 + X^2 + X + 1
F(YAi) = YAi ⊕ (YAi >> 22) ⊕ (YAi >> 23) ⊕ (YAi >> 24)
YBi = α^i ⊗ Y0 ⊗ D0 :: D0 = 0x020050
F(YBi) = YBi ⊕ (YBi >> 22) ⊕ (YBi >> 23) ⊕ (YBi >> 24)
The nomenclature F(j) denotes the Fibonacci mapping using the feed forward matrix for each code. The scramble code state can be advanced by an arbitrary amount. Each output i contains 25 consecutive symbols:
YAi = α^(k·i) ⊗ Y0 (32)
The distance k between each block of symbols can be chosen by multiplying the state by αk at each time instant i. It may be useful if the hardware is available to advance the sequence by a convenient amount such as 8, 16 or 24 symbols allowing more flexibility.
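Putting the pieces together, a block generation loop might look like the following C sketch, which advances the Galois state by k chips per iteration with one field multiplication and maps each state through the feed forward to read k output chips. It reuses the gf_mul and feed_forward sketches given earlier; the assumption that the k freshest output chips sit in the low bits of the mapped state depends on the bit ordering convention, and all names are illustrative.

#include <stdint.h>

uint32_t gf_mul(uint32_t a, uint32_t b, uint32_t g, int N);   /* earlier sketch */
uint32_t feed_forward(uint32_t s, uint32_t diag_mask, int N); /* earlier sketch */

/* Generate n_iter blocks of k chips each (illustrative sketch).
 * alpha_k is the precomputed coefficient alpha^k; diag_mask describes the
 * feed forward matrix for the chosen code; k <= 8 assumed here.            */
static void generate_block(uint32_t *galois_state, uint32_t alpha_k,
                           uint32_t g, int N, uint32_t diag_mask,
                           int k, uint8_t *out, int n_iter)
{
    uint32_t s = *galois_state;
    for (int i = 0; i < n_iter; i++) {
        uint32_t fib = feed_forward(s, diag_mask, N);
        /* Assumption: the k consecutive output chips of this block sit in
         * the low k bits of the mapped Fibonacci state.                    */
        out[i] = (uint8_t)(fib & ((1u << k) - 1u));
        s = gf_mul(s, alpha_k, g, N);        /* advance by k chips           */
    }
    *galois_state = s;
}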
Consider the example of the 3GPP downlink code G = X^18 + X^10 + X^7 + X^5 + 1 to illustrate the different components of the Galois state. The following is called the recursive definition set; it is generated using the recursive definition method and the mapping described above.
Tap17=α^0=0x00001
Tap16=α^1=0x00002
Tap15=α^2=0x00004
Tap14=α^3=0x00008
Tap13=α^4=0x00010
Tap12=α^5=0x00020
Tap11=α^6=0x00040
Tap10=α^7=0x00080
Tap9=α^−10+α^−5+α^−3=0x00101
Tap8=α^−9+α^−4+α^−2=0x00202
Tap7=α^−8+α^−3+α^−1=0x00404
Tap6=α^−7+α^−2=0x00809
Tap5=α^−6+α^−1=0x01012
Tap4=α^−5=0x02025
Tap3=α^−4=0x0404a
Tap2=α^−3=0x08094
Tap1=α^−2=0x10128
Tap0=α^−1=0x20250
If mask bits 2 and 4 were set, the starting seed would be 0x08094 ^ 0x02025 = 0x0a0b1. This would be post multiplied by a Galois field element to correct the sequence for the current mask. This keeps the same hardware structure. The feed forward matrix is placed at the end of this chain. The initial seed is this element multiplied by the current initial seed.
In the case of the 42-bit IS95 long code the table would contain 42 entries. For a software implementation these can be rapidly summed (modulo 2) together. For a general speedup these can be encoded into groups of bits. In pairs of bits there would be 4 possible combinations of the 2 entries, making the table 4*42 entries; in groups of 3, 8*42 entries. A direct tradeoff of speed and memory can be made.
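A software form of this mask conversion is sketched below: the seed is the modulo-2 sum of the table entries selected by the mask, exactly as in the 0x08094 ^ 0x02025 example above, with the grouped-lookup speedup noted in the trailing comment. The table contents and names are assumed to be supplied per code; this is an illustration, not the CDMA2000 specification itself.

#include <stdint.h>

/* Convert a long code mask to an equivalent starting seed (illustrative).
 * tap_table[] is the per-code recursive definition table (for example the
 * 18 entries listed above, or 42 entries for the IS95 long code).          */
static uint32_t mask_to_seed(const uint32_t *tap_table, uint64_t mask, int nbits)
{
    uint32_t seed = 0;
    for (int i = 0; i < nbits; i++)
        if ((mask >> i) & 1u)
            seed ^= tap_table[i];           /* modulo-2 sum of selected taps */
    return seed;
}

/* Speed/memory trade-off: pre-combine entries in groups of M mask bits so
 * each M-bit field needs a single lookup in a table of (1 << M) entries per
 * group, as discussed above.                                                */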
In a similar fashion to the mask generation section, an arbitrary time offset is often applied to the state to get the needed path delay and system time. Once this time is known, the code can be generated at the arbitrary instant in block format. C0 is the offset coefficient.
C0 = α^T
Assuming the time offset T is an N bit number, the time can be split into a product series as follows:
This results in a requirement to perform N−1 products. For the case of code X in the 3GPP uplink standard, the power series is shown in Table 3.
This table can predict a state 33 million chips ahead. This is equivalent to a path delay of about 8.74 seconds, which is over a million miles. A path delay of 100 km is as much as is likely to be required, and 2048 chips at 3.84 MHz more than covers this. So 11 bits is enough.
The Toff value bits are scanned and the appropriate number of multiplications is performed. This may be as many as 25. This is a time consuming process and would increase the hardware needed to compute the coefficient. Therefore one solution is to group the bits into symbols and pre-compute products of elements in the table based on the active bits in each symbol. This is shown in equation (35):
Each block of M bits selects an entry in a table of size 2^M. Entries in this table are scaled by the power of α corresponding to the block position.
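One possible software realization of this grouping is sketched below, under the assumptions that the symbol width M is 4, that the per-position tables hold precomputed coefficients alpha^(v·2^(M·j)) for symbol value v at position j, and that they were built once per polynomial with the gf_pow_alpha sketch given earlier; the names and table layout are illustrative.

#include <stdint.h>

uint32_t gf_mul(uint32_t a, uint32_t b, uint32_t g, int N);   /* earlier sketch */

#define M     4                              /* symbol width (assumption)    */
#define NSYM  8                              /* symbols in a 32-bit offset   */

/* Offset coefficient C0 = alpha^T by grouped table lookup (illustrative).
 * table[j][v] is assumed to hold alpha^(v * 2^(M*j)).                       */
static uint32_t offset_coefficient(uint32_t T,
                                   const uint32_t table[NSYM][1 << M],
                                   uint32_t g, int N)
{
    uint32_t c = 1u;                         /* alpha^0                      */
    for (int j = 0; j < NSYM; j++) {
        uint32_t sym = (T >> (M * j)) & ((1u << M) - 1u);
        if (sym)
            c = gf_mul(c, table[j][sym], g, N);  /* one multiply per symbol  */
    }
    return c;
}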
The Texas Instruments TMS320C6400 class digital signal processor core such as described above in conjunction with FIGS. 1 to 4 has an extended Galois field multiplier instruction GMPY4.
A·B = A0·B + A1·B·α^8 + A2·B·α^8·α^8 + A3·B·α^8·α^8·α^8 (36)
Where A0, A1, A2 and A3 are the first, second, third and fourth bytes respectively in a 32-bit data word A. This method can be used to generate the arbitrary time offset calculations and the calculation of the initial state values for the 4 generators. A full 32 by 32 multiplication can be built using the illustrated GMPY4 instruction with similar production of partial products and addition.
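The decomposition of equation (36) can be mirrored in portable C as shown below. This sketch stands in for the hardware instruction: each byte of A multiplies B in the field and is shifted up by the appropriate power of alpha^8, with the gf_mul and gf_pow_alpha sketches given earlier doing the work that GMPY4 and the partial-product additions would perform on the processor; all names are illustrative.

#include <stdint.h>

uint32_t gf_mul(uint32_t a, uint32_t b, uint32_t g, int N);       /* earlier sketch */
uint32_t gf_pow_alpha(uint32_t k, uint32_t g, int N);             /* earlier sketch */

/* 32-bit GF(2^N) product built from byte-wise partial products per
 * equation (36): A·B = A0·B + A1·B·alpha^8 + A2·B·alpha^16 + A3·B·alpha^24.
 * Illustrative software stand-in for the hardware approach.                */
static uint32_t gf_mul_bytewise(uint32_t A, uint32_t B, uint32_t g, int N)
{
    uint32_t alpha8 = gf_pow_alpha(8, g, N); /* x^8 reduced modulo g         */
    uint32_t shift  = 1u;                    /* alpha^(8*i), starting at i=0 */
    uint32_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t Ai = (A >> (8 * i)) & 0xFFu;
        uint32_t partial = gf_mul(Ai, B, g, N);
        r ^= gf_mul(partial, shift, g, N);   /* A_i · B · alpha^(8*i)        */
        shift = gf_mul(shift, alpha8, g, N); /* next power of alpha^8        */
    }
    return r;
}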
The scramble code generator is split into 2 parts. The first part generates the required states for a particular time offset. This can be performed every frame or even every call and then tracked, so the processing overhead can be highly amortized. The second part generates blocks of the scramble code in as efficient a manner as possible. The appendices list the C code that generates the 4 required initial states for scramble code generation. This function takes 129 cycles to execute and only has to be called once per frame. Once the initial conditions have been generated the codes can be generated in bulk. The most convenient block of generation is 8 bits of each polynomial per iteration. The four 8-bit quantities are combined to form eight 2-bit scramble codes. The equations Clong,1,n(i) = XA(i) + YA(i) and Clong,2,n(i) = XB(i) + YB(i) form the in-phase and quadrature sequences. These are then further combined using the following equation:
Clong,n(i) = clong,1,n(i)·(1 + j·(−1)^i·clong,2,n(2⌊i/2⌋)) (37)
In the following sequence, 8 bits of code 1 are represented as C1 and 8 bits of code 2 are represented as C2. The interleaved in-phase and quadrature bits are generated using the following instructions:
IQ = _pack2(C1, C1 ^ ((C2 & 0xAA) | ((C2 & 0xAA) >> 1)));
IQ = _shfl(IQ);
IQ = _bitr(IQ);
I and Q are interleaved to make a 16 bit value. The best achievable performance is 2.66 codes per cycle. This is a computational load of 1.44 MHz per channel. Without this technique and the instruction level support of the TMS320C6400 digital signal processor, the cost of this operation could be ten times greater, putting a significant limitation on a software implementation.
Fibonacci generators cannot be advanced arbitrarily. However, the Galois form has an equivalent arithmetic representation which is more accessible to software implementation and general purpose hardware. The techniques shown link the two constructs together so that the best properties of both can be fully exploited.
The desirable properties of the Fibonacci generator have been made accessible using the feed forward matrix mapping. Once an initial seed is generated it can be advanced by any arbitrary time interval by multiplication by powers of α. This can be further decomposed into a combination of powers of 2. The machine is then advanced by k bits per cycle by a repeated multiplication by αk. The F matrix returns the required Fibonacci state containing the future n bits.
Because there is a fixed mapping for each code, less storage is required than with the equivalent matrix technique. Hardware cost is approximately equal in either case for a particular code. However, this technique allows the same hardware to be used for any number of codes. The storage requirements for the general case are of order N, whereas the power method is of order N^3. Galois field multipliers are very small and compact, of the order of N^2·log N gates.
The following is an example of the generator equations and matrices when using this invention as an IS95 long code generator. The Galois generator equation is:
G(X) = X^42 + X^35 + X^33 + X^31 + X^27 + X^26 + X^25 + X^22 + X^21 + X^19 + X^18 + X^17 + X^16 + X^10 + X^7 + X^6 + X^5 + X^3 + X^2 + X^1 + 1 (38)
Feed forward matrix to restore the above to the Fibonacci format is:
The transformed inverse feed forward matrix to convert from Fibonacci form to Galois form is:
Elements selected by the mask for the equivalent initial state offset are:
The tap weight vector for multiplication by powers of α−1 is:
The 2 generators are then summed together modulo 2 in summer 1215. This forms the in-phase part of the sequence. The quadrature part is the same sequence delayed via summers 1217 and 1227. Summer 1225 modulo 2 sums these delayed signals for the quadrature output. The generator equation G(X) = X^18 + X^10 + X^7 + X^5 + 1 is provided by summer 1213. The transposed conversion matrices for the code generator with generator equation G(X) = X^18 + X^10 + X^7 + X^5 + 1 are:
The delay coefficient is 0x0000FF60. The transposed conversion self-inverse matrix for f(x) = X^18 + X^7 + 1, downlink code 2, produced via summer 1223 is:
The delay coefficient is 0x0008050 for an arbitrary delay of α^n, n = 0 to 2^18 − 2.
The LFSR structures described above are known as Gold codes. A fundamental property of these sequences is that any linear combination of time delayed versions of them merely generates another time delayed sequence. This is an observation of the property in Galois field arithmetic: it is equivalent to the fact that a sum of numbers in GF(2^N) is also a number in GF(2^N). Because the elements of the multiplicative group also form an additive group, a sequence can be shifted in time by multiplying it by a power of α. This multiplication can also be achieved by adding the correct value as shown in equation (39):
The additive group is not as well behaved as the multiplicative group, but it can be used to apply very large shift amounts to sequences with just additive combinations of some very small time shifts. This principle is harnessed in converting the Galois sequence to the Fibonacci one. The Galois state sequence in Table 2 is a set of parallel Gold sequences just like the Fibonacci sequence. Each column in the Galois table is a time shifted version of the first.
This application claims priority under 35 U.S.C. 119(e)(1) to U.S. Provisional Application No. 60/746,673 filed May 8, 2006.