This application is a 371 of International application PCT/EP2019/079861, filed 31 Oct. 2019, the entire contents of which are hereby fully incorporated herein by reference for all purposes. PCT/EP2019/079861 claims the priority benefit of European patent application EP 18000850.0, filed 31 Oct. 2018, the entire contents of which are hereby fully incorporated herein by reference for all purposes.
The project leading to this application has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No 716562).
The present invention relates to metastability-containing circuits for sorting an arbitrary number of inputs.
Metastability is a fundamental obstacle when crossing clock domains, potentially resulting in soft errors with critical consequences. As it has been shown that metastability cannot be avoided deterministically, synchronizers are employed to reduce the error probability to tolerable levels. This approach trades precious time for reliability: the more time is allocated for metastability resolution, the smaller the probability of metastability-induced faults.
Recently, a different approach has been proposed, coined metastability-containing (MC) circuits (S. Friedrichs, M. Függer and C. Lenzen, “Metastability-Containing Circuits,” in IEEE Transactions on Computers, vol. 67, no. 8, pp. 1167-1183, 1 Aug. 2018). It accepts a limited amount of metastability in the input to a digital circuit and ensures limited metastability of its output, so that the result is still useful. In particular, metastability can be contained when sorting inputs arising from time-to-digital converters, i.e., measurement values can be sorted correctly without first resolving metastability using synchronizers.
Sorting Networks: Sorting networks sort n inputs from a totally ordered universe by feeding them into n parallel wires that are connected by 2-sort elements, i.e., subcircuits sorting two inputs; these can act in parallel whenever they do not depend on each other's output. A correct sorting network sorts all possible inputs, i.e., the wires are labeled 1 to n such that the ith wire outputs the ith element of the sorted list of inputs. The size of a sorting network is its number of 2-sort elements and its depth is the maximum number of 2-sort elements an input may pass through until reaching the output.
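To make these definitions concrete, the following Python sketch (an illustration only, not one of the circuits described here) represents a sorting network as layers of 2-sort elements and computes its size and depth. The network `NETWORK_4` is a standard textbook 4-input network chosen for illustration, and `two_sort` is a hypothetical hook where a 2-sort(B) implementation could be plugged in. Correctness is checked with the well-known 0-1 principle.

```python
from itertools import product

# Illustrative 4-input sorting network (size 5, depth 3) given as layers of
# comparators (i, j), i < j; comparators in one layer touch disjoint wires.
NETWORK_4 = [[(0, 1), (2, 3)], [(0, 2), (1, 3)], [(1, 2)]]


def sort2(a, b):
    """A 2-sort element for totally ordered values: returns (min, max)."""
    return (a, b) if a <= b else (b, a)


def apply_network(network, values, two_sort=sort2):
    """Feed values into parallel wires and apply every 2-sort element in turn."""
    wires = list(values)
    for layer in network:
        for i, j in layer:
            wires[i], wires[j] = two_sort(wires[i], wires[j])
    return wires


def size_and_depth(network, n):
    """Size: number of 2-sort elements. Depth: longest chain of them on any wire path."""
    depth = [0] * n
    size = 0
    for layer in network:
        for i, j in layer:
            depth[i] = depth[j] = max(depth[i], depth[j]) + 1
            size += 1
    return size, max(depth)


def sorts_all_inputs(network, n):
    """0-1 principle: a comparator network sorts every input iff it sorts all 0/1 inputs."""
    return all(apply_network(network, bits) == sorted(bits)
               for bits in product((0, 1), repeat=n))


print(apply_network(NETWORK_4, [3, 1, 4, 1]))   # [1, 1, 3, 4]
print(size_and_depth(NETWORK_4, 4))             # (5, 3)
print(sorts_all_inputs(NETWORK_4, 4))           # True
```

Depth is tracked per wire rather than taken as the number of layers, matching the definition above: the maximum number of 2-sort elements an input may pass through.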
Parallel Prefix Computation: Ladner and Fischer (R. E. Ladner, M. J. Fischer, “Parallel prefix computation”, JACM, vol. 27, no. 4, pp. 831-838, 1980) studied the parallel application of an associative operator to all prefixes of an input string of length l (over an arbitrary alphabet). They give parallel prefix computation (PPC) circuits of depth O(log l) and size O(l) (given a constant-size circuit implementing the operator). A number of additional constructions have been developed for adders, and special cases of the construction by Ladner and Fischer were discovered (in all likelihood) independently, cf. [24]. However, no other construction simultaneously achieves asymptotically optimal depth and size.
It is an object of the invention to provide smaller, faster and less error-prone circuits for sorting possibly metastable inputs.
This object is achieved by a circuit according to independent claim 1. Advantageous embodiments are defined in the dependent claims.
According to an aspect of the invention, CMOS implementations of basic gates realize Kleene logic. The task of comparing inputs can be decomposed into performing a four-valued comparison on each prefix pair of two input strings, followed by inferring the corresponding output bits. Plugging the resulting 2-sort(B) circuits for B-bit inputs into a sorting network for n values readily yields an MC sorting circuit for n valid strings.
The above reduces the task of MC sorting to a parallel prefix computation (PPC) problem, for which circuits that are simultaneously (asymptotically) optimal in depth and size are known due to a celebrated result by Ladner and Fischer (Richard E Ladner and Michael J Fischer. Parallel Prefix Computation. JACM, 27(4):831-838, 1980). According to an aspect of the invention, the inventive circuits can be derived using their framework, which allows for a trade-off between depth and size of the 2-sort circuit. Most prominently, optimizing for depth reduces the depth of the circuit to the optimal ⌈log B⌉, at the expense of increasing the size by a factor of up to 2. However, relying on the construction from Ladner et al. as-is results in a very large fan-out. In a further aspect, the invention proposes reducing fan-out to any number f≥3 without affecting depth, increasing the size by a factor of only 1+O(1/f) (plus at most 3B/2 buffers). In particular, our results imply that the depth of an MC sorting circuit can match the delay of a non-containing circuit, while maintaining constant fan-out and a constant-factor size overhead.
Post-layout area and delay of the designed circuits compare favorably with a baseline provided by a straightforward non-containing implementation.
We set $[N] := \{0,\dots,N-1\}$ for $N\in\mathbb{N}$ and $[i,j] := \{i, i+1, \dots, j\}$ for $i,j\in\mathbb{N}$, $i\le j$. We denote $\mathbb{B} := \{0,1\}$ and $\mathbb{B}_M := \{0,1,M\}$. For a $B$-bit string $g\in\mathbb{B}_M^B$ and $i\in[1,B]$, denote by $g_i$ its $i$-th bit, i.e., $g = g_1g_2\dots g_B$. We use the shorthand $g_{i,j} := g_i\dots g_j$, where $i,j\in[1,B]$ and $i\le j$. Let $\mathrm{par}(g)$ denote the parity of $g\in\mathbb{B}^B$, i.e., $\mathrm{par}(g) = \sum_{i=1}^{B} g_i \bmod 2$. For a function $f$ and a set $A$, we abbreviate $f(A) := \{f(y)\mid y\in A\}$.
A standard binary representation of inputs is unsuitable: the encoding may amplify the uncertainty of the input values arbitrarily. For example, representing a value that is only known to be 11 or 12, encoded in binary as 1011 and 1100, respectively, would result in the bit string 1MMM, i.e., a string that is metastable in every position in which the two encodings differ. However, 1MMM may represent any number in the interval from 8 to 15, amplifying the initial uncertainty of the value lying in the interval from 11 to 12. Gray code is an encoding that does not lose precision for consecutive values.
A $B$-bit binary reflected Gray code, $\mathrm{rg}_B\colon[N]\to\mathbb{B}^B$, is defined recursively. For simplicity (and without loss of generality) we set $N := 2^B$. A 1-bit code is given by $\mathrm{rg}_1(0) = 0$ and $\mathrm{rg}_1(1) = 1$. For $B>1$, we start with the first bit fixed to 0 and count with $\mathrm{rg}_{B-1}(\cdot)$ (for the first $2^{B-1}$ codewords), then toggle the first bit to 1, and finally “count down” with $\mathrm{rg}_{B-1}(\cdot)$ while keeping the first bit fixed again, cf. Table 1.
As each $B$-bit string is a codeword, the code is a bijection and the encoding function also defines the decoding function. Denote by $\langle\cdot\rangle\colon\mathbb{B}^B\to[N]$ the decoding function of a Gray code string, i.e., $\langle\mathrm{rg}_B(x)\rangle = x$ for $x\in[N]$.
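A minimal Python sketch of the recursive definition and of the decoding function ⟨·⟩, assuming bit strings are written with $g_1$ leftmost. The function names `rg` and `decode` are illustrative only; the loop at the end prints the 4-bit code.

```python
def rg(B, x):
    """B-bit binary reflected Gray code rg_B(x), x in [2**B], first bit g_1 leftmost."""
    if B == 1:
        return str(x)                          # rg_1(0) = "0", rg_1(1) = "1"
    half = 2 ** (B - 1)
    if x < half:
        return "0" + rg(B - 1, x)              # first bit 0: count up with rg_{B-1}
    return "1" + rg(B - 1, 2 ** B - 1 - x)     # first bit 1: count down with rg_{B-1}


def decode(g):
    """Decoding function <g>: the inverse of rg_B, following the same recursion."""
    if len(g) == 1:
        return int(g)
    rest = decode(g[1:])
    return rest if g[0] == "0" else 2 ** len(g) - 1 - rest


for x in range(16):
    print(x, rg(4, x), decode(rg(4, x)))
```

With these helpers, the maximum of two stable Gray code strings is `rg(B, max(decode(g), decode(h)))`; for instance, `rg(4, max(decode("0011"), decode("0100")))` returns `"0100"`.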
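For two binary reflected Gray code strings $g,h\in\mathbb{B}^B$, we define their maximum and minimum as
$\max^{\mathrm{rg}}\{g,h\} := \mathrm{rg}_B(\max\{\langle g\rangle,\langle h\rangle\})$ and $\min^{\mathrm{rg}}\{g,h\} := \mathrm{rg}_B(\min\{\langle g\rangle,\langle h\rangle\})$.
For example:
$\max^{\mathrm{rg}}\{0011,0100\} = \max^{\mathrm{rg}}\{\mathrm{rg}_4(2),\mathrm{rg}_4(7)\} = 0100$,
$\min^{\mathrm{rg}}\{0111,0101\} = \min^{\mathrm{rg}}\{\mathrm{rg}_4(5),\mathrm{rg}_4(6)\} = 0111$.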
Inputs to the sorting circuit may have some metastable bits, which means that the respective signals behave out-of-spec from the perspective of Boolean logic. However, they are valid strings in the sense of the invention. Valid strings have at most one metastable bit. If this bit resolves to either 0 or 1, the resulting string encodes either x or x+1 for some x, cf. Table 2.
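More formally, for $B\in\mathbb{N}$ and $N = 2^B$, the set of valid strings of length $B$ is defined as
$S^B_{\mathrm{rg}} := \mathrm{rg}_B([N]) \cup \bigcup_{x\in[N-1]} \{\mathrm{rg}_B(x) * \mathrm{rg}_B(x+1)\}$.
The operator $*$ is called the superposition and is defined, for $x,y\in\mathbb{B}_M^B$ and all $i\in[1,B]$, as
$(x*y)_i := x_i$ if $x_i = y_i$, and $(x*y)_i := M$ otherwise.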
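The superposition can be sketched in a few lines of Python, writing metastable bits as the character `M`. The sketch also reproduces the example given earlier: superposing the standard binary encodings of 11 and 12 yields `1MMM`, whereas superposing their Gray codes leaves a single metastable bit. The closed form `x ^ (x >> 1)` used in `gray` is the well-known equivalent of the recursive binary reflected Gray code; all function names are illustrative.

```python
def superpose(*strings):
    """Superposition x * y * ...: bits on which all strings agree are kept,
    every other position becomes the metastable value 'M'."""
    return "".join(col[0] if len(set(col)) == 1 else "M" for col in zip(*strings))


def binary(B, x):
    return format(x, f"0{B}b")


def gray(B, x):
    # Closed form of the B-bit binary reflected Gray code: x XOR (x >> 1).
    return format(x ^ (x >> 1), f"0{B}b")


print(superpose(binary(4, 11), binary(4, 12)))  # 1MMM: could be anything in [8, 15]
print(superpose(gray(4, 11), gray(4, 12)))      # 1M10: only 11 or 12
```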
The specification of $\max^{\mathrm{rg}}$ and $\min^{\mathrm{rg}}$ may be extended to valid strings by taking all possible resolutions of metastable bits into account. To this end, the metastable closure (Stephan Friedrichs, Matthias Függer, and Christoph Lenzen. Metastability-Containing Circuits. Transactions on Computers, 67, 2018) is used. The metastable closure of an operator on binary inputs extends it to inputs that may contain metastable bits by considering all possible stable resolutions of the inputs, applying the operator, and taking the superposition of the results, i.e., $f_M(x) := *\,f(\mathrm{res}(x))$, where $\mathrm{res}(x)$ denotes the set of stable strings obtained by resolving each metastable bit of $x$ to 0 or 1.
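The metastable closure can be sketched generically in Python: enumerate the stable resolutions of every argument, apply the Boolean function, and superpose the results. This is an illustrative model only; `resolutions`, `superpose`, and `closure` are hypothetical names. Applied to single-bit AND and OR, it reproduces the familiar Kleene behavior (for example, 0 AND M = 0, but 1 AND M = M).

```python
from itertools import product


def resolutions(x):
    """res(x): every stable string obtained by resolving each 'M' in x to 0 or 1."""
    return ["".join(p) for p in product(*("01" if c == "M" else c for c in x))]


def superpose(strings):
    strings = list(strings)
    return "".join(col[0] if len(set(col)) == 1 else "M" for col in zip(*strings))


def closure(f):
    """Metastable closure f_M: apply f to all stable resolutions, superpose the results."""
    def f_M(*args):
        return superpose(f(*stable) for stable in product(*map(resolutions, args)))
    return f_M


AND = closure(lambda a, b: str(int(a) & int(b)))
OR = closure(lambda a, b: str(int(a) | int(b)))
print(AND("0", "M"), AND("1", "M"), OR("1", "M"), OR("0", "M"))  # 0 M 1 M
```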
The closure is the best one can achieve w.r.t. containing metastability with clocked logic using standard registers, i.e., when $f_M(x)_i = M$, no such implementation can guarantee that the $i$th output stabilizes in a timely fashion.
To construct a circuit computing the maximum and minimum of two valid strings, and thus to build sorting networks for valid strings, one first needs to settle what it means to ask for the maximum or minimum of valid strings. To this end, suppose a valid string is $\mathrm{rg}_B(x)*\mathrm{rg}_B(x+1)$ for some $x\in[N-1]$, i.e., the string contains a metastable bit that makes it uncertain whether the represented value is $x$ or $x+1$. If one waits for metastability to resolve, the string will stabilize to either $\mathrm{rg}_B(x)$ or $\mathrm{rg}_B(x+1)$. Accordingly, it makes sense to consider $\mathrm{rg}_B(x)*\mathrm{rg}_B(x+1)$ to lie “in between” $\mathrm{rg}_B(x)$ and $\mathrm{rg}_B(x+1)$, resulting in the following total order on valid strings (cf. Table 2):
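Definition (<). A total order $<$ is defined on valid strings as follows. For $g,h\in\mathbb{B}^B$, $g<h \Leftrightarrow \langle g\rangle < \langle h\rangle$. For each $x\in[N-1]$, we define $\mathrm{rg}_B(x) < \mathrm{rg}_B(x)*\mathrm{rg}_B(x+1) < \mathrm{rg}_B(x+1)$. We extend the resulting relation on $S^B_{\mathrm{rg}}\times S^B_{\mathrm{rg}}$ to a total order by taking the transitive closure. Note that this also defines $\le$, via $g\le h \Leftrightarrow (g = h \vee g < h)$.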
We intend to sort with respect to this order. It turns out that implementing a 2-sort circuit w.r.t. this order amounts to implementing the metastable closure of $\max^{\mathrm{rg}}$ and $\min^{\mathrm{rg}}$. In other words, $\max^{\mathrm{rg}}_M$ and $\min^{\mathrm{rg}}_M$ are the max and min operators w.r.t. the total order on valid strings shown in Table 2. For example:
$\max^{\mathrm{rg}}_M\{1001,1000\} = \mathrm{rg}_4(15) = 1000$,
$\max^{\mathrm{rg}}_M\{0M10,0010\} = \mathrm{rg}_4(3)*\mathrm{rg}_4(4) = 0M10$, and
$\max^{\mathrm{rg}}_M\{0M10,0110\} = \mathrm{rg}_4(4) = 0110$.
Hence, our task is to implement $\max^{\mathrm{rg}}_M$ and $\min^{\mathrm{rg}}_M$.
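The statement that the closures of max^rg and min^rg coincide with max and min w.r.t. the total order can be checked exhaustively for small word widths. The Python sketch below does so for B = 4, an assumption made for illustration: it lists the valid strings in increasing order, rg(0) < rg(0)*rg(1) < rg(1) < ..., and compares the closure-based operators with the order-based ones on all pairs. Gray decoding uses the standard prefix-XOR rule; all helper names are hypothetical.

```python
from itertools import product

B, N = 4, 16                                    # illustrative word width


def gray(x):
    return format(x ^ (x >> 1), f"0{B}b")       # rg_B(x)


def decode(g):
    x = acc = 0
    for c in g:                                 # binary bit i = XOR of g_1..g_i
        acc ^= int(c)
        x = 2 * x + acc
    return x


def res(x):
    return ["".join(p) for p in product(*("01" if c == "M" else c for c in x))]


def superpose(strings):
    return "".join(col[0] if len(set(col)) == 1 else "M" for col in zip(*strings))


def max_M(g, h):   # metastable closure of max^rg
    return superpose([gray(max(decode(a), decode(b))) for a in res(g) for b in res(h)])


def min_M(g, h):   # metastable closure of min^rg
    return superpose([gray(min(decode(a), decode(b))) for a in res(g) for b in res(h)])


# Valid strings in increasing order: rg(0) < rg(0)*rg(1) < rg(1) < ... < rg(N-1).
VALID = []
for x in range(N):
    VALID.append(gray(x))
    if x + 1 < N:
        VALID.append(superpose([gray(x), gray(x + 1)]))
RANK = {s: i for i, s in enumerate(VALID)}

print(max_M("1001", "1000"), max_M("0M10", "0010"), max_M("0M10", "0110"))  # 1000 0M10 0110
```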
Definition (2-sort(B)). For $B\in\mathbb{N}$, a 2-sort(B) circuit is specified as follows.
Input: $g,h\in S^B_{\mathrm{rg}}$
Output: $g',h'\in S^B_{\mathrm{rg}}$
Functionality: $g' = \max^{\mathrm{rg}}_M\{g,h\}$, $h' = \min^{\mathrm{rg}}_M\{g,h\}$.
The invention seeks to use standard components and combinational logic only. In particular, the behavior of basic gates on metastable inputs may be specified via the metastable closure of their behavior on binary inputs, cf. Table 3.
It can be shown that the basic CMOS gates shown in
More particularly,
Because the parity keeps track of whether the remaining bits are compared w.r.t. the standard or “reflected” order, the state machine performs the comparison correctly w.r.t. the meaning of the states indicated in
For all i∈[1,B], we have that max
In order to extend this approach to potentially metastable inputs, all involved operators are replaced by their metastable closure: for $i\in[1,B]$, (i) compute $s_M^{(i)}$, (ii) determine $\max^{\mathrm{rg}}_M\{g,h\}_i$ and $\min^{\mathrm{rg}}_M\{g,h\}_i$ according to Table 4.
The reader may raise the question why we compute $s_M^{(i)}$ for all $i\in[0,B-1]$ instead of computing only $s_M^{(B)}$ with a simple tree of $\diamond_M$ elements, which would yield a smaller circuit. Since $s_M^{(B)}$ is the result of the comparison of the entire strings, it could be used to compute all outputs, i.e., we could compute the output by $\mathrm{out}_M(s_M^{(B)}, g_ih_i)$ instead of $\mathrm{out}_M(s_M^{(i-1)}, g_ih_i)$.
While it is not obvious that this approach yields correct outputs, it may be formally proven that: (P1) $\diamond_M$ is associative; (P2) repeated application of $\diamond_M$ computes $s_M^{(i)}$; (P3) applying $\mathrm{out}_M$ to $s_M^{(i-1)}$ and $g_ih_i$ yields $\max^{\mathrm{rg}}_M\{g,h\}_i\,\min^{\mathrm{rg}}_M\{g,h\}_i$ for all valid strings. This yields the desired correctness. Regarding the first point, we note that the statement that $\diamond_M$ is associative does not depend on $B$. In other words, it can be verified by checking for all possible $x,y,z\in\mathbb{B}_M^2$ whether $(x\diamond_M y)\diamond_M z = x\diamond_M(y\diamond_M z)$.
While it is tractable to verify all $9^3 = 3^6 = 729$ cases manually (exploiting various symmetries and other properties of the operator), doing so is tedious and prone to errors. Instead, a short computer program was used to verify that both evaluation orders result in the same outcome, proving the desired associativity of the operator.
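A check along these lines can be sketched in Python. Since the tables defining ⋄ are not reproduced in this text, the stable operator below is reconstructed from the parity-based comparison semantics described above and should be read as an assumption: state 00 means the prefixes compared so far are equal with even parity, 11 means equal with odd parity, 01 means g < h has been decided, and 10 means g > h has been decided. Under this encoding, 00 acts as the identity, 11 negates the next input pair, and 01 and 10 absorb all further input. The names `diamond` and `diamond_M` are hypothetical.

```python
from itertools import product


def diamond(s, b):
    """Stable s ⋄ b; s is the comparison state, b = g_i h_i the next input bits.

    Assumed encoding: "00" equal so far with even parity, "11" equal so far with
    odd parity, "01" decided g < h, "10" decided g > h.
    """
    if s in ("01", "10"):
        return s                                           # decided states absorb
    if s == "00":
        return b                                           # standard order
    return "".join("1" if c == "0" else "0" for c in b)    # s == "11": reflected order


def res(x):
    return ["".join(p) for p in product(*("01" if c == "M" else c for c in x))]


def superpose(strings):
    return "".join(col[0] if len(set(col)) == 1 else "M" for col in zip(*strings))


def diamond_M(x, y):
    """Metastable closure of ⋄ on B_M^2 x B_M^2."""
    return superpose([diamond(a, b) for a in res(x) for b in res(y)])


SYMBOLS = ["".join(p) for p in product("01M", repeat=2)]   # the 9 elements of B_M^2
TRIPLES = list(product(SYMBOLS, repeat=3))                 # 9**3 = 3**6 = 729 cases
counterexamples = [(x, y, z) for x, y, z in TRIPLES
                   if diamond_M(diamond_M(x, y), z) != diamond_M(x, diamond_M(y, z))]
print(len(TRIPLES), len(counterexamples))
```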
For the convenience of the reader, Table 6 (
It may be observed that for $g\in S^B_{\mathrm{rg}}$, if there is an index $1\le m<B$ such that $g_m = M$, then $g_{m+1,B} = 10^{B-m-1}$.
The reasoning is based on distinguishing two main cases: one is that $s_M^{(i)}$ contains at most one metastable bit, the other that $s_M^{(i')} = MM$. Each of these cases can be proven by technical statements.
It may further be observed that if $|\mathrm{res}(s_M^{(i)})|\le 2$ for any $i\in[B+1]$, then $\mathrm{res}(s_M^{(i)}) = \mathop{\diamond}_{j=1}^{i}\mathrm{res}(g_jh_j)$.
The operator $\mathrm{out}\colon\mathbb{B}^2\times\mathbb{B}^2\to\mathbb{B}^2$ is the operator given in Table 4.
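Putting the pieces together, the following Python sketch models a 2-sort(B) computation at the level of the closure operators: it folds ⋄_M over the input pairs g_i h_i to obtain s_M^(i) and applies out_M to s_M^(i-1) and g_i h_i, as in (P2) and (P3). The sequential fold merely stands in for a PPC circuit. Since Tables 4 and 5 are not reproduced here, the stable ⋄ and out are reconstructed from the comparison semantics and are an assumption: 00 and 11 mean "equal so far" with even and odd parity, 01 means g < h, and 10 means g > h; out returns the max bit followed by the min bit. The sketch then compares the result with the closure of max^rg and min^rg on all pairs of valid 4-bit strings. All names are hypothetical.

```python
from itertools import product

B, N = 4, 16                                    # illustrative word width


def gray(x):
    return format(x ^ (x >> 1), f"0{B}b")       # rg_B(x)


def decode(g):
    x = acc = 0
    for c in g:                                 # binary bit i = XOR of g_1..g_i
        acc ^= int(c)
        x = 2 * x + acc
    return x


def res(x):
    return ["".join(p) for p in product(*("01" if c == "M" else c for c in x))]


def superpose(strings):
    return "".join(col[0] if len(set(col)) == 1 else "M" for col in zip(*strings))


def closure(f):
    """Metastable closure of a two-argument function on stable strings."""
    return lambda x, y: superpose([f(a, c) for a in res(x) for c in res(y)])


def diamond(s, b):
    # Assumed encoding: 00/11 = equal so far (even/odd parity), 01 = g < h, 10 = g > h.
    if s in ("01", "10"):
        return s
    return b if s == "00" else "".join("1" if c == "0" else "0" for c in b)


def out(s, b):
    """(max bit, min bit) at position i from state s^(i-1) and b = g_i h_i."""
    g, h = b
    if s == "00":
        return max(g, h) + min(g, h)            # standard order
    if s == "11":
        return min(g, h) + max(g, h)            # reflected order
    return h + g if s == "01" else g + h        # decided: max is h resp. g


diamond_M, out_M = closure(diamond), closure(out)


def two_sort(g, h):
    s, hi, lo = "00", "", ""                    # s_M^(0) = 00: nothing compared yet
    for b in map("".join, zip(g, h)):
        o = out_M(s, b)                         # uses s_M^(i-1)
        hi, lo = hi + o[0], lo + o[1]
        s = diamond_M(s, b)                     # s_M^(i) = s_M^(i-1) ⋄_M g_i h_i
    return hi, lo


max_M = closure(lambda a, c: gray(max(decode(a), decode(c))))   # reference
min_M = closure(lambda a, c: gray(min(decode(a), decode(c))))

VALID = [gray(x) for x in range(N)] + [superpose([gray(x), gray(x + 1)]) for x in range(N - 1)]
mismatches = [(g, h) for g in VALID for h in VALID if two_sort(g, h) != (max_M(g, h), min_M(g, h))]
print(len(VALID), len(mismatches))
```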
In order to derive a small circuit from the above, we make use of the PPC framework by Ladner and Fischer. They described a generic method that is applicable to any finite state machine translating a sequence of $B$ input symbols to $B$ output symbols, to obtain circuits of size $O(B)$ and depth $O(\log B)$. They reduce the problem to a parallel prefix computation (PPC) task by observing that each input symbol defines a restricted transition function, whose compositions evaluated on the starting state yield the state of the machine after the corresponding number of steps. This matches our needs, as we need to determine $s_M^{(i)}$ for each $i\in[B]$. However, their generic construction involves large constants. Fortunately, we have established that $\diamond_M\colon\mathbb{B}_M^2\times\mathbb{B}_M^2\to\mathbb{B}_M^2$ is an associative operator, permitting us to directly apply the circuit templates for associative operators they provide for computing $s_M^{(i)} = \mathop{\diamond_M}_{j=1}^{i} g_jh_j$ for all $i\in[B]$. Accordingly, only these templates are discussed.
We revisit the part of the framework relevant to our construction, also providing a minor improvement on their results in the process. To this end, we first formally specify the PPC task for the special case of associative operators.
Definition 5.1 (PPC⊕(B)). For associative $\oplus\colon D\times D\to D$ and $B\in\mathbb{N}$, a PPC⊕(B) circuit is specified as follows.
Input: $d\in D^B$,
Output: $\pi\in D^B$,
Functionality: $\pi_i = \bigoplus_{j=1}^{i} d_j$ for all $i\in[1,B]$.
In our case, $\oplus = \diamond_M$ and $D = \mathbb{B}_M^2$. The method by Ladner et al. provides a family of recursive constructions of PPC⊕ circuits, which are obtained by combining two different recursive patterns.
More particularly, suppose that C and P are circuits implementing ⊕ and PPC⊕(⌈B/2⌉) for some B∈ℕ, respectively. Then applying the recursive pattern given at the left of
The second recursive pattern, shown in
Definition. T0 is a single leaf. T1 consists of the (right) root and two attached leaves. For b≥2, Tb can be constructed from Tb−1 and Tb−2 by taking a (right) root r, attaching the root of Tb−1 as its right child, a new left node l as the left child of r, and then attaching the root of Tb−2 as (only) child of l.
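The recursive definition of $T_b$ can be sketched directly in Python. The sketch builds the tree and checks that $T_b$ has height $b$ and $F_{b+2}$ leaves (the latter is stated later in this description). The `Node` class and the function names are hypothetical; children are listed from left to right.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                                   # "right", "left" or "leaf"
    children: list = field(default_factory=list)


def T(b):
    """Build T_b following the recursive definition."""
    if b == 0:
        return Node("leaf")
    if b == 1:
        return Node("right", [Node("leaf"), Node("leaf")])
    left = Node("left", [T(b - 2)])             # new left node l with T_{b-2} as only child
    return Node("right", [left, T(b - 1)])      # root r: left child l, right child T_{b-1}


def count(node, kind):
    return (node.kind == kind) + sum(count(c, kind) for c in node.children)


def height(node):
    return 1 + max(map(height, node.children)) if node.children else 0


def fib(i):
    a, b = 1, 1                                 # F_1 = F_2 = 1
    for _ in range(i - 1):
        a, b = b, a + b
    return a


for b in range(8):
    t = T(b)
    print(b, count(t, "leaf"), fib(b + 2), height(t))
```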
The recursive construction is now defined as follows. A right node applies the pattern given in
In the following, denote by PPC(C,$T_b$) the circuit that results from applying the recursive construction described above to the base circuit C implementing ⊕. Moreover, we refer to the $i$th input and output of the subcircuit corresponding to node $\nu\in T_b$ as $d_i^{\nu}$ and $\pi_i^{\nu}$, respectively.
It may be shown that if C implements ⊕, then PPC(C,$T_b$) is a PPC⊕($2^b$) circuit and PPC(C,$T_b$) has depth $b\cdot d(C)$.
It remains to bound the size of the circuit. Denote by $F_i$, $i\in\mathbb{N}$, the $i$th Fibonacci number, i.e., $F_1 = F_2 = 1$ and $F_{i+1} = F_i + F_{i-1}$ for all $2\le i\in\mathbb{N}$. Then it may be shown that PPC(C,$T_b$) has size $(2^{b+2} - F_{b+5} + 1)|C|$.
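The depth and size statements can be checked numerically with a behavioral model. Because the figures showing the two recursive patterns are not reproduced here, the Python sketch below is a reconstruction and rests on an assumption: a right node splits its inputs into halves, solves both recursively, and combines the last output of the left half with every output of the right half (the pattern whose fan-out is discussed below); a left node combines adjacent pairs, recurses on the pairs through its child, and fills in each remaining position with one more gate. Every gate models one copy of C with d(C) = 1, and string concatenation serves as a convenient non-commutative associative operator. Class and function names are hypothetical.

```python
def fib(i):
    a, b = 1, 1                                   # F_1 = F_2 = 1
    for _ in range(i - 1):
        a, b = b, a + b
    return a


class PPCBuilder:
    """Simulates PPC(C, T_b); wires are (value, depth) pairs, d(C) = 1."""

    def __init__(self, op):
        self.op, self.gates = op, 0

    def _gate(self, x, y):                        # one copy of the base circuit C
        self.gates += 1
        return (self.op(x[0], y[0]), max(x[1], y[1]) + 1)

    def right(self, xs, b):
        """T_b on 2**b wires; for b >= 1 the root uses the split-in-halves pattern."""
        if b == 0:
            return list(xs)                       # leaf: a single input is its own prefix
        half = len(xs) // 2
        left_out = list(xs[:half]) if b == 1 else self.left(xs[:half], b - 2)
        right_out = self.right(xs[half:], b - 1)
        carry = left_out[-1]                      # drives len(right_out) gates: high fan-out
        return left_out + [self._gate(carry, r) for r in right_out]

    def left(self, xs, c):
        """Left node over T_c: combine pairs, recurse, then fill in even positions."""
        pairs = [self._gate(xs[2 * k], xs[2 * k + 1]) for k in range(len(xs) // 2)]
        inner = self.right(pairs, c)              # inner[k] = prefix ending at xs[2k+1]
        out = [xs[0]]
        for k, p in enumerate(inner):
            if k > 0:
                out.append(self._gate(inner[k - 1], xs[2 * k]))
            out.append(p)
        return out


def simulate(b):
    builder = PPCBuilder(lambda u, v: u + v)      # string concatenation: associative
    letters = [chr(ord("a") + i % 26) for i in range(2 ** b)]
    outs = builder.right([(ch, 0) for ch in letters], b)
    assert [v for v, _ in outs] == ["".join(letters[: i + 1]) for i in range(2 ** b)]
    return builder.gates, max(d for _, d in outs)


for b in range(8):
    print(b, simulate(b), 2 ** (b + 2) - fib(b + 5) + 1)
```

Under this reconstruction the gate count satisfies S(b) = S(b-1) + S(b-2) + 2^b - 1, whose solution is exactly the closed form stated above.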
Asymptotically, the subtractive term $F_{b+5}$ is negligible, as $F_{b+5}\in\left(1/\sqrt{5}+o(1)\right)\left((1+\sqrt{5})/2\right)^{b+5}\subseteq O(1.62^b)$; however, unless $B$ is large, the difference is substantial. We also get a simple upper bound for arbitrary values of $B$. To this end, we “split” in the recursion such that the left branch is “complete,” while applying the same splitting strategy on the right. This is where our construction differs from and improves on the method of Ladner et al., who perform a balanced split and obtain an upper bound of $4B$ on the circuit size.
It follows that, for $B\in\mathbb{N}$ and a circuit C implementing ⊕, setting $b := \lceil\log B\rceil$, there exists a PPC⊕(B) circuit of depth $\lceil\log B\rceil\,d(C)$ and size smaller than $(5B - 2^b - F_{b+3})|C| \le (4B - F_{b+3})|C|$.
We remark that one can give more precise bounds by making case distinctions regarding the right recursion, which for the sake of brevity we omit here. Instead, we computed the exact numbers for B≤70.
The construction derived from iterative application of the above results can be combined with PPC(C,$T_b$), achieving the following trade-off; note that if $B = 2^b$ for $b\in\mathbb{N}$, then $F_{\lceil\log B\rceil-k+3}$ can be replaced by $F_{b-k+5}$.
Suppose C implements ⊕. For all $k\in[\lceil\log B\rceil+1]$ and $b\in\mathbb{N}$, there is a PPC⊕(B) circuit of depth $(\lceil\log B\rceil+k)\,d(C)$ and size at most
The optimal depth construction incurs an excessively large fan-out of Θ(B), as the last output of left recursive calls needs to drive all the copies of C that combine it with each of the corresponding right call's outputs. This entails that, despite its lower depth, it will not result in circuits of smaller physical delay than simply recursively applying the construction from
We now modify the recursive construction to ensure a constant fan-out, at the expense of a limited increase in size of the circuit. The result is the first construction that has size O(B), optimal depth, and constant fan-out.
In the following, we denote by f≥3 the maximum fanout we are trying to achieve, where we assume that gates or memory cells providing the input to the circuit do not need to drive any other components. For simplicity, we consider C to be a single gate.
We proceed in two steps. First, we insert 2B buffers into the circuit, ensuring that the fan-out is bounded by 2 everywhere except at the gate providing the last output of each subcircuit corresponding to a right node. In the second step, we will resolve this by duplicating such gates sufficiently often, recursively propagating the changes down the tree. Neither of these changes will affect the output of the circuit or its depth, so the main challenges are to show our claim on the fan-out and bounding the size of the final circuit.
Step 1: Almost Bounding Fan-Out by 2
Before proceeding to the construction in detail, we need some structural insight on the circuit.
For node ν∈Tb, define its range Rυ and left-count αυ recursively as follows.
Suppose the subcircuit of PPC(C,Tb) represented by node ν∈Tb in depth d∈[b+1] has range Rυ=[i, i+j].
This leads to an alternative representation of the circuit PPC(C,Tb), see
From this representation, we will derive that the following modifications of PPC(C,Tb) result in a PPC⊕(2b) circuit PPC(C,Tb)′, for which a fan-out larger than 2 exclusively occurs on the last outputs of subcircuits corresponding to nodes of Tb.
With the exception of gates providing the last output of subcircuits corresponding to nodes of Tb (600, 602 in
It remains to count the inserted buffers. The following helper statement will be useful for this, but also later on.
Denote by $L_b\subseteq T_b$ the set of leaves of $T_b$. Then $|L_b| = F_{b+2}$ and Σν∈L
Next, consider the recurrence given by L′0=1, L′1=2, and L′b=L′b−1+2L′b−2 for b≥2; the factor of 2 assigns twice the weight to the subtree rooted at the child of the root's left child, thereby ensuring that each leaf is accounted for with weight 2α
Denote by $s$ the size of a buffer. Then
$|\mathrm{PPC}(C,T_b)'| = |\mathrm{PPC}(C,T_b)| + (2^b + 2^{b-1} - F_{b+3})\,s$.
Step 2: Bounding Fan-Out by f
In the second step, we need to resolve the issue of high fan-out of the last output of each recursively used subcircuit in PPC(C,$T_b$)′. Our approach is straightforward. Starting at the root of $T_b$ and progressing downwards, we label each node υ with a value α(υ) that specifies a sufficient number of additional copies of the last output of the subcircuit represented by υ to avoid fan-out larger than f. At right nodes, this is achieved by duplicating the gate computing this output sufficiently often, 600, 602 in
We start by defining α(υ) and then argue how to use these values for modifying the circuit to obtain our fan-out f circuit. Afterwards, we will analyze the increase in size of the circuit compared to PPC(C,Tb)′.
Definition (α(υ)). Fix b∈N0. For ν∈Tb in depth d∈[b+1], define
Suppose that for each leaf $\nu\in T_b$, there are $\lfloor a(\nu)\rfloor$ additional copies of the root of the aggregation tree, and for each right node $\nu\in T_b$, we add $\lfloor a(\nu)\rfloor$ gates that compute (copies of) the last output of their corresponding subcircuit of PPC(C,$T_b$)′. Then we can wire the circuit such that all gates that are not in aggregation trees have fan-out at most f, and each output of the circuit is driven by a gate or buffer driving only this output.
It remains to modify the aggregation trees so that sufficiently many copies of the roots' output values are available.
Consider an aggregation tree corresponding to leaf ν∈Tb and fix f≥3. We can modify it such that the fan-out of all its non-root nodes becomes at most f, there are └a(ν)┘ additional gates computing the same output as the root, and at most (fa(ν))/(f−2)+(2α
Finally, we need to count the total number of gates we add when implementing these modifications to the circuit.
For $f\ge 3$, define $\mathrm{PPC}^{(f)}(C,T_b)$ by modifying PPC(C,$T_b$)′ according to Lemmas 5.15 and 5.16. Then, with $\lambda_1 := (1+\sqrt{5})/4$, $|\mathrm{PPC}^{(f)}(C,T_b)|$ is bounded by
We summarize our findings in the following:
Suppose that C implements ⊕, buffers have size $s$ and depth at most $d(C)$, and set $\lambda_1 := (1+\sqrt{5})/4$. Then for all $k\in[b+1]$, $b\in\mathbb{N}_0$ and $f\ge 3$, there is a PPC⊕($2^b$) circuit of fan-out $f$, depth $(b+k)\,d(C)$, and size at most
Due to space constraints, we refrain from analyzing the size of the construction for values of B that are not powers of 2. However, in
First, we need to specify implementations of the subcircuits computing $\diamond_M$ and $\mathrm{out}_M$.
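From Tables 5a and 5b, the following Boolean formulas can be derived:
$(s\diamond b)_1 = s_1\overline{s_2} + s_1\overline{b_1} + \overline{s_2}\,b_1$
$(s\diamond b)_2 = \overline{s_1}\,s_2 + s_2\overline{b_2} + \overline{s_1}\,b_2$
$\mathrm{out}(s,b)_1 = \overline{s_2}\,b_1 + \overline{s_1}\,b_2 + b_1b_2$
$\mathrm{out}(s,b)_2 = s_1b_2 + s_2b_1 + b_1b_2$.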
In general, realizing a Boolean formula $f$ by replacing negation, multiplication, and addition by inverters, AND, and OR gates, respectively, does not result in a circuit implementing $f_M$.¹ However, we can easily verify that the above formulas are disjunctions of all prime implicants of their respective functions. As one can manually verify, these formulas evaluate to the truth tables given in Tables 6 and 7.
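$(s\diamond b)_1 = s_1(\overline{s_2} + \overline{b_1}) + \overline{s_2}\,b_1$
$(s\diamond b)_2 = s_2(\overline{s_1} + \overline{b_2}) + \overline{s_1}\,b_2$
$\mathrm{out}(s,b)_1 = b_1(b_2 + \overline{s_2}) + b_2\overline{s_1}$
$\mathrm{out}(s,b)_2 = b_2(b_1 + s_1) + b_1s_2$.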
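The claim that sum-of-all-prime-implicants formulas, evaluated gate by gate in Kleene logic, reproduce the metastable closure can be checked exhaustively over all 81 ternary inputs. In the Python sketch below, the formulas for (s⋄b)_1, (s⋄b)_2, and out(s,b)_1 are reconstructed from the state semantics and are an assumption: 00 and 11 mean "equal so far" with even and odd parity, 01 means g < h, and 10 means g > h. This encoding agrees with the formula for out(s,b)_2 stated above. The last line illustrates the general caveat: dropping one prime implicant breaks containment. All names are hypothetical.

```python
from itertools import product

RANK = {"0": 0, "M": 1, "1": 2}                  # Kleene logic: 0 < M < 1


def AND(*xs):
    return min(xs, key=RANK.get)


def OR(*xs):
    return max(xs, key=RANK.get)


def NOT(x):
    return {"0": "1", "M": "M", "1": "0"}[x]


# Sums of all prime implicants (reconstructed), evaluated gate by gate in Kleene logic.
def dia1(s1, s2, b1, b2):
    return OR(AND(s1, NOT(s2)), AND(s1, NOT(b1)), AND(NOT(s2), b1))


def dia2(s1, s2, b1, b2):
    return OR(AND(NOT(s1), s2), AND(s2, NOT(b2)), AND(NOT(s1), b2))


def out1(s1, s2, b1, b2):
    return OR(AND(NOT(s2), b1), AND(NOT(s1), b2), AND(b1, b2))


def out2(s1, s2, b1, b2):
    return OR(AND(s1, b2), AND(s2, b1), AND(b1, b2))


def diamond(s, b):   # assumed encoding: 00/11 equal (even/odd parity), 01 g<h, 10 g>h
    if s in ("01", "10"):
        return s
    return b if s == "00" else "".join("1" if c == "0" else "0" for c in b)


def out(s, b):       # (max bit, min bit) for the same state encoding
    g, h = b
    if s == "00":
        return max(g, h) + min(g, h)
    if s == "11":
        return min(g, h) + max(g, h)
    return h + g if s == "01" else g + h


def closure_bit(f, k):
    """Bit k of the metastable closure of f, as a function of (s1, s2, b1, b2)."""
    def f_M(*v):
        vals = {f(r[0] + r[1], r[2] + r[3])[k]
                for r in product(*("01" if c == "M" else c for c in v))}
        return vals.pop() if len(vals) == 1 else "M"
    return f_M


CASES = [(dia1, diamond, 0), (dia2, diamond, 1), (out1, out, 0), (out2, out, 1)]
INPUTS = list(product("01M", repeat=4))          # all 3**4 = 81 ternary inputs
all_match = all(formula(*v) == closure_bit(f, k)(*v)
                for formula, f, k in CASES for v in INPUTS)

# Omitting the prime implicant s1·¬s2 from (s⋄b)_1 loses containment at s = 10, b1 = M:
partial = OR(AND("1", NOT("M")), AND(NOT("0"), "M"))
print(all_match, partial, closure_bit(diamond, 0)("1", "0", "M", "0"))
```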
1 For instance, (s⋄b)1=s1b1+
We see that, in fact, a single circuit with suitably wired (and possibly negated) inputs can implement all four operations. As for sel1=
$\mathrm{XMUX}(sel_1, sel_2, x, y) := y(x + sel_2) + x\,sel_1$.
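To illustrate how one XMUX cell with suitably wired and negated inputs can cover all four operations, the Python sketch below evaluates XMUX in Kleene logic and compares it with the metastable closure on all 81 ternary inputs. The particular wiring shown is one possible choice that matches the factored formulas and is hypothetical, as is the reconstructed state encoding of ⋄ and out (00 and 11 equal so far with even and odd parity, 01 for g < h, 10 for g > h). The final check confirms that with sel1 = NOT sel2, XMUX behaves as a plain multiplexer on stable inputs.

```python
from itertools import product

RANK = {"0": 0, "M": 1, "1": 2}                  # Kleene logic: 0 < M < 1


def AND(a, b):
    return min(a, b, key=RANK.get)


def OR(a, b):
    return max(a, b, key=RANK.get)


def NOT(a):
    return {"0": "1", "M": "M", "1": "0"}[a]


def XMUX(sel1, sel2, x, y):
    """XMUX(sel1, sel2, x, y) := y(x + sel2) + x sel1, built from AND/OR gates."""
    return OR(AND(y, OR(x, sel2)), AND(x, sel1))


# One possible (hypothetical) wiring of the four operations onto XMUX instances.
WIRED = {
    ("diamond", 0): lambda s1, s2, b1, b2: XMUX(b1, NOT(b1), NOT(s2), s1),
    ("diamond", 1): lambda s1, s2, b1, b2: XMUX(b2, NOT(b2), NOT(s1), s2),
    ("out", 0): lambda s1, s2, b1, b2: XMUX(NOT(s1), NOT(s2), b2, b1),
    ("out", 1): lambda s1, s2, b1, b2: XMUX(s2, s1, b1, b2),
}


def diamond(s, b):   # assumed encoding: 00/11 equal (even/odd parity), 01 g<h, 10 g>h
    if s in ("01", "10"):
        return s
    return b if s == "00" else "".join("1" if c == "0" else "0" for c in b)


def out(s, b):       # (max bit, min bit) for the same state encoding
    g, h = b
    if s == "00":
        return max(g, h) + min(g, h)
    if s == "11":
        return min(g, h) + max(g, h)
    return h + g if s == "01" else g + h


STABLE = {"diamond": diamond, "out": out}


def closure_bit(f, k, v):
    """Bit k of the metastable closure of f at ternary input v = (s1, s2, b1, b2)."""
    vals = {f(r[0] + r[1], r[2] + r[3])[k]
            for r in product(*("01" if c == "M" else c for c in v))}
    return vals.pop() if len(vals) == 1 else "M"


all_match = all(circuit(*v) == closure_bit(STABLE[name], k, v)
                for (name, k), circuit in WIRED.items()
                for v in product("01M", repeat=4))
# With sel1 = NOT(sel2), XMUX acts as a multiplexer on stable inputs: sel2 = 0 -> x, 1 -> y.
is_mux = all(XMUX(NOT(sel), sel, x, y) == (y if sel == "1" else x)
             for sel, x, y in product("01", repeat=3))
print(all_match, is_mux)
```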
Table 8 (
We note that this circuit is not a particularly efficient XMUX implementation; a transistor-level implementation would be much smaller. However, our goal here is to verify correctness and give some initial indication of the size of the resulting circuits—a fully optimized ASIC circuit is beyond the scope of this article. The size of the implementation may be slightly reduced by moving negations. Due to space limitations, we refrain from detailing this modification here, but note that
We now have all the pieces in place to assemble a containing 2-sort(B) circuit. As stated above, ⋄M is associative. Thus, from a given implementation of ⋄M (e.g., two copies of the circuit from
The correctness of this construction follows from the above explanations, where we can plug in any PPC⋄
More particularly, we implemented the design given in
For the implementation of PPC⋄
After behavioral simulation we continue with a comparison of our design and a standard sorting approach Bin-comp(B). As mentioned earlier, the 2-sort(B) implementation given in
Since metastability-containing circuits may include additional gates that are not required in traditional Boolean logic, Boolean optimization may compromise metastability containing properties. Accordingly, we were forced to disable optimization during synthesis of the circuits.
As a binary benchmark Bin-comp was used: In short, Bin-comp consists of a simple VHDL statement comparing two binary encoded inputs and outputting the maximum and the minimum, accordingly. It follows the same design process as 2-sort, but then undergoes optimization using a more powerful set of basic gates. For example, the standard cell library provides prebuilt multiplexers. These multiplexers are used by Bin-comp, but not by 2-sort. We stress that these more powerful gates provide optimized implementations of multiple Boolean functions, yet each of them is still counted as a single gate. Thus, comparing our design to the binary design in terms of gate count, area, and delay disfavors our solution. Moreover, we noticed that the optimization routine switches to employing more powerful gates when going from B=8 to B=16 (cf.
Nonetheless, our design performs comparably to the non-containing binary design in terms of delay, cf.
We emphasize that we refrained from optimizing the design by making use of all available gates or devising transistor-level implementations, since such an approach is tied to the utilized library or requires design of standard cells.
In conclusion, we demonstrated that efficient metastability-containing sorting circuits are possible. Our results indicate that optimized implementations can achieve the same delay as non-containing solutions, without a dramatic increase in circuit size. This is of high interest to an intended application motivating us to design MC sorting circuits: fault-tolerant high-frequency clock synchronization. Sorting is a key step in envisioned implementations of the Lynch-Welch algorithm with improved precision of synchronization. The efficient MC sorting networks presented in this article make it possible to eliminate synchronizer delay completely. This enables an increase in the rate at which clock corrections are applied, significantly reducing the negative impact of phase drift of local clock sources on the precision of the algorithm.
More generally speaking, MC circuits like those presented here are of interest in mixed signal control loops whose performance depends on very short response times. When analog control is not desirable, traditional solutions incur synchronizer delay before being able to react to any input change. Using MC logic saves the time for synchronization, while metastability of the output corresponds to the initial uncertainty of the measurement; thus, the same quality of the computational result can be achieved in shorter time. Note that our circuits are purely combinational, so they can be used in both clocked and asynchronous control logic.
Examples of such control loops are clock synchronization circuits, but MC has also been shown to be useful for adaptive voltage control and for fast routing with an acceptably low probability of data corruption.
Number | Date | Country | Kind |
---|---|---|---|
18000850 | Oct 2018 | EP | regional |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2019/079861 | 10/31/2019 | WO |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2020/089408 | 5/7/2020 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
5781465 | Lutz | Jul 1998 | A |
9831888 | Waltari | Nov 2017 | B1 |
Entry |
---|
WIPO, International Search Report dated Jan. 27, 2020 in International Appln. No. PCT/EP2019/079861, (3 p.). |
WIPO, Written Opinion dated Dec. 19, 2019 in International Appln. No. PCT/EP2019/079861, (6 p.). |
WIPO, International Preliminary Report dated Dec. 8, 2020 in International Appln. No. PCT/EP2019/079861, (7 p.). |
Bund et al., “Optimal Metastability-Containing Sorting Networks,” arxiv.org, Jan. 22, 2018, pp. 1-6. |
Friedrichs et al., “Metastability-Containing Circuits,” arxiv.org, Jun. 21, 2016, pp. 1-29. |
Lenzen et al., “Efficient Metastability-Containing Gray Code 2-Sort,” 2016 22nd IEEE International Symposium on Asynchronous Circuits and Systems, May 8, 2016, pp. 49-56. |
David Harris, “A Taxonomy of Parallel Prefix Networks,” Conference Record of the 37th Asilomar Conference on Signals, vol. 2, Nov. 9, 2003, pp. 2213-2217. |
Number | Date | Country | |
---|---|---|---|
20210349687 A1 | Nov 2021 | US |