High-speed wireline interfaces can be configured with a decision feedback equalizer (DFE) to improve the bit error rate (BER). However, a DFE can be challenging to implement at high data rates due to the necessity of making and propagating a decision every unit interval (UI).
In the drawings, like numerals may describe the same or similar components or features in different views. Like numerals having different letter suffixes may represent different instances of similar components. Some embodiments are illustrated by way of example, and not limitation, in the figures of the accompanying drawings in which:
The following detailed description refers to the accompanying drawings. The same reference numbers may be used in different drawings to identify the same or similar elements. In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular structures, architectures, interfaces, techniques, etc., to provide a thorough understanding of the various aspects of various embodiments. However, it will be apparent to those skilled in the art having the benefit of the present disclosure that the various aspects of the various embodiments may be practiced in other examples that depart from these specific details. In certain instances, descriptions of well-known devices, circuits, and methods are omitted so as not to obscure the description of the various embodiments with unnecessary detail.
The following description and the drawings sufficiently illustrate specific embodiments to enable those skilled in the art to practice them. Other embodiments may incorporate structural, logical, electrical, process, and other changes. Portions and features of some embodiments may be included in or substituted for those of other embodiments. Embodiments outlined in the claims encompass all available equivalents of those claims.
As used herein, the term “chip” (or die) refers to a piece of a material, such as a semiconductor material, that includes a circuit, such as an integrated circuit or a part of an integrated circuit. The term “memory IP” indicates memory intellectual property. The terms “memory IP,” “memory device,” “memory chip,” and “memory” are interchangeable.
The term “a processor” configured to carry out specific operations includes both a single processor configured to carry out all of the operations (e.g., operations or methods disclosed herein) as well as multiple processors individually configured to carry out some or all of the operations (which may overlap) such that the combination of processors carry out all of the operations.
Ultra-high-speed wireline links such as Peripheral Component Interconnect Express (PCIe) Generation 7 (Gen7) at 128 Gb/s (PAM4 at 64 GBaud) use ADC-based analog front ends followed by extensive digital equalization (e.g., as illustrated in
Incoming received signals 102 are processed in the analog domain and the digital domain to generate digital output symbols 112.
The FFE 108 can be configured with more than 20 taps. However, an FFE produces undesirable undershoots in the pulse response when used to cancel the first post-cursor. Linear equalizers, like the FFE 108, can amplify noise and cross-talk as well.
A 1-tap DFE can provide significant improvement in the bit error rate (BER), as can be seen in the bathtub curves of
In reference to
As seen in
The disclosed techniques can be used to implement a full 1-tap DFE for ultra-high-speed wireline interfaces that reduces the latency by greater than ten times (e.g., latency of less than 1 ns) over the current state of the art.
An example of a prior solution is illustrated in
In DFE 300, the received symbols are processed in groups of 64 with a 64 UI clock (1 GHz for 128 Gbps). The fundamental timing constraint would be to compute and propagate 64 decisions in 64 UI. The 1-tap DFE for each symbol requires a multiply-add and a decision slicer operation to be completed within 1 UI (~15.6 ps), which is not practical. Prior solutions have, therefore, resorted to breaking the feedback for every 16th symbol and replacing it with a “feedforward” decision based on the partially equalized symbol from the previous UI (e.g., using input symbol 302). This reduces the timing constraint to compute and propagate 16 decisions in 64 UI. However, even this is challenging to complete within 64 UI (e.g., 1 ns), and pipelining across several cycles is required to close timing. This results in a significant overhead in terms of latency.
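A minimal software sketch may clarify the behavior of this prior partial-feedback scheme. The PAM4 levels, the coefficient scaling, and the simplification of slicing the raw sample at the feedback break are illustrative assumptions, not the prior art's exact fixed-point datapath:

```python
# Sketch of the prior partial-DFE scheme: feedback is broken every Kth
# symbol, where a "feedforward" decision is taken without the benefit of
# decision feedback (simplified here to slicing the unequalized sample).
LEVELS = (-3.0, -1.0, 1.0, 3.0)  # illustrative PAM4 decision levels

def _slice(y):
    """Decision slicer: compare against the 3 PAM4 reference levels."""
    return LEVELS[sum(y >= t for t in (-2.0, 0.0, 2.0))]

def dfe_partial(samples, h1, break_every=16):
    out, d_prev = [], 0.0
    for k, x in enumerate(samples):
        if k % break_every == 0 and k > 0:
            # Feedback break: no ISI cancellation for this symbol, so an
            # error is more likely here and can propagate to later symbols.
            d_prev = _slice(x)
        else:
            d_prev = _slice(x - h1 * d_prev)
        out.append(d_prev)
    return out
```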
The following disadvantages are associated with the existing DFE solution illustrated in
(a) High Datapath Latency: Pipelining results in high latency in the critical datapath (5 ns for DFE), which is unacceptable in memory-access applications.
(b) Lower Performance due to partial DFE: As described above, every 16th symbol uses the decision on the partially equalized symbol from the previous UI without the benefit of decision feedback equalization. This increases the probability of error in the recovered data, and the larger the DFE coefficient, the higher the likelihood of an error. Once an error is generated, this can also propagate to other symbols, which will degrade the overall BER. Therefore, this implementation puts a practical limit on how much first post-cursor inter-symbol interference (ISI) can be corrected using the digital DFE.
(c) High Flop Count: Pipelining also necessitates additional flop stages to carry forward the unequalized digital samples, as well as the post-DFE decisions, until the calculations for all 16 UIs have finished. For a 64 UI datapath, the previous solution will require an additional 1584 flops over the proposed approach.
(d) Degraded CDR Performance: The high latency of the previous approach also precludes using the post-DFE data in the proportional path of a second-order CDR loop. In order to meet the loop latency requirements, a parallel FFE needs to be implemented, and this will add more area and power as well.
The disclosed DFE configurations (e.g., DFEs described in connection with
This invention provides a complete 1-tap DFE solution with much lower latency (reduced from 5 ns to ~0.5 ns) and fewer physical flops. As a result, the RX data path latency can be shrunk from 7 ns to 3 ns, enabling faster memory accesses via CXL and inter-socket data accesses via UXI. It also enables achieving higher CDR bandwidth (10 MHz), which is required for PCIe, by keeping the CDR proportional path latency low. It significantly cuts down the number of flops (by 1584) over the prior art and even more by potentially eliminating the need for a separate parallel digital equalizer for the CDR.
In some aspects, DFEs 404 can be used to reduce the computation time for a digital DFE by parallelizing the computations with loop unrolling and multiple symbols lookahead processing. Within a given block of symbols, the computation time can be reduced from O(N) to O(log2N). For the decision feedback path, this can be reduced from O(N) to just 1 in the most optimal case. For example, DFEs 404 complete within 0.5 ns, which allows integration of error slicer and phase detector logic within the same cycle.
The disclosed techniques include an approach for performing a multiple-symbol lookahead for a digital DFE for ultra-high-speed wireline links such as PCIe Gen7 at 128 Gb/s (64 GS/s). This is essential for the practical realization of a complete digital DFE at these speeds.
In some aspects, a DFE uses the decision on the previously received symbol to remove the ISI on the current symbol introduced by the previous symbol.
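As a reference for the discussion that follows, the serial 1-tap DFE recursion can be sketched as below. The PAM4 levels and slicer thresholds are illustrative assumptions; the hardware operates on fixed-point ADC codes:

```python
# Serial 1-tap DFE reference model: the decision on symbol k-1 removes the
# first post-cursor ISI from symbol k before slicing.
LEVELS = (-3.0, -1.0, 1.0, 3.0)  # illustrative PAM4 decision levels

def _slice(y):
    """Decision slicer: compare against the 3 PAM4 reference levels."""
    return LEVELS[sum(y >= t for t in (-2.0, 0.0, 2.0))]

def dfe_serial(samples, h1, d_prev=0.0):
    """y[k] = x[k] - h1*d[k-1]. The feedback dependency forces a
    sequential multiply-add and slice per symbol, i.e., O(n) delay."""
    out = []
    for x in samples:
        d_prev = _slice(x - h1 * d_prev)
        out.append(d_prev)
    return out
```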
The inputs are a sampled digital representation of the incoming signal generated using an ADC. Multiple symbols are accumulated and then sent to the digital signal processing block to recover the initially transmitted symbol value. The total delay to perform DFE for a set of n symbols would be:
A slicer (e.g., comparator 512A) receives a 7-bit value corresponding to the input voltage and compares it to 3 pre-programmed voltage values. The delay Tslice is the logic delay to compare against the 3 reference voltage levels and generate the output.
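A sketch of such a slicer, assuming signed 7-bit codes and illustrative reference values (the specific codes below are hypothetical, not from the specification):

```python
# Hypothetical 7-bit slicer: three parallel comparisons against programmed
# reference codes select one of the four PAM4 decisions.
REFS = (-40, 0, 40)        # assumed signed 7-bit reference codes
LEVELS = (-3, -1, 1, 3)    # PAM4 decisions

def slicer(code):
    """Return the PAM4 decision for a signed 7-bit ADC sample."""
    idx = sum(code >= r for r in REFS)  # the 3 comparator outputs
    return LEVELS[idx]
```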
The fundamental constraint for timing closure is as follows:
In some aspects, loop unrolling can be used to reduce the delay through the DFE, where the decisions for all possible values of the previous symbol are calculated in parallel, and the decisions are then propagated through all the n symbols.
The processing line for input symbol 612 includes adders 614, comparators 616, and multiplexer 618. Adders 614 are used to add the input symbol with possible previous output symbols to obtain possible outputs corresponding to the input symbol 612. Multiplexer 618 selects one of the possible outputs as digital output symbol 620 based on a selection signal (e.g., a previously determined output symbol such as digital output symbol 610).
The processing line for input symbol 712 includes adders 714, comparators 716, and multiplexer 718. Adders 714 are used to add the input symbol with possible previous output symbols to obtain possible outputs corresponding to the input symbol 712. Multiplexer 718 selects one of the possible outputs as digital output symbol 720 based on a selection signal (e.g., a previously determined output symbol such as digital output symbol 710).
In the examples of
The delay is still of the order of O(n), but the daisy-chained delays are multiplexer (MUX) delays, which are smaller than adder, slicer, and multiplier delays. Assuming Tmux to be 3 gate delays and n=64, the loop-unrolling approach can be viable up to a symbol rate of about 30 GS/s.
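The loop-unrolled datapath can be modeled as follows. This is a functional sketch, not RTL; the extra 0.0 candidate is an assumption that models the reset state before the first decision:

```python
# Loop-unrolled 1-tap DFE: the adder/slicer bank evaluates one candidate
# decision per possible previous symbol, in parallel and outside the
# feedback loop. Only a 4:1 mux select remains daisy-chained.
LEVELS = (-3.0, -1.0, 1.0, 3.0)  # illustrative PAM4 decision levels

def _slice(y):
    """Decision slicer: compare against the 3 PAM4 reference levels."""
    return LEVELS[sum(y >= t for t in (-2.0, 0.0, 2.0))]

def dfe_unrolled(samples, h1, d_prev=0.0):
    out = []
    for x in samples:
        # Parallel adder + slicer bank (precomputed in hardware).
        cand = {d: _slice(x - h1 * d) for d in LEVELS + (0.0,)}
        # The feedback path collapses to a mux select on d_prev.
        d_prev = cand[d_prev]
        out.append(d_prev)
    return out
```

The output matches the serial recursion; only the critical path changes, from a multiply-add-slice per symbol to a MUX delay per symbol.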
The disclosed techniques herein below include a method to reduce the delay further using a symbol-lookahead approach that can reduce the total MUX delays to O(log2n). First, a one-symbol lookahead is described below. A tree-based approach is then presented, which allows a lookahead across multiple symbols and enables the reduction of the total delay from O(n) to O(log2n).
The processing line for input symbol 812 includes adders 814, comparators 816, and multiplexers 818, 820, and 822. Adders 814 are used to add the input symbol with possible previous output symbols to obtain possible outputs corresponding to the input symbol 812. Multiplexers 818-822 are used for selecting one of the possible outputs as digital output symbol 824 based on selection signals from the processing line for input symbol 802 as well as the previously determined output symbol such as D_prev.
The processing line for input symbol 912 includes adders 914, comparators 916, and multiplexers 918 and 920. Adders 914 are used to add the input symbol with possible previous output symbols to obtain possible outputs corresponding to the input symbol 912. Comparators 916 are used to compare the input signal received from the adders with a known voltage signal to generate a digital output symbol. Multiplexers 918-920 are used for selecting one of the possible outputs as digital output symbol 922 based on selection signals from the processing line for input symbol 902 as well as the previously determined output symbol such as D_prev.
The loop unrolled values from the first symbol are used to reduce the possible values for the second symbol to values that depend only on the symbol coming into this set of 2 symbols. The total DFE forward and feedback path computation delays are now reduced to the following:
This principle can then be extended even further across multiple symbols, all the way up to N input symbols. In some aspects, the multiple-symbol lookahead can be implemented in a tree-style structure. The symbol lookahead for groups of 2 symbols is generated first, followed by groups of 4 symbols, and so on until all N symbols have been covered. For ease of understanding, this is represented in terms of a dot diagram, which is explained in connection with
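A functional sketch of the tree-based lookahead follows. This is a hypothetical model: each symbol's candidate decisions are held as a lookup table over the incoming decision, and the tables compose associatively, which is what permits merging groups pairwise in log2(n) levels:

```python
# Tree-style multi-symbol lookahead. Leaf tables d[p] give each symbol's
# decision assuming incoming decision p; merging two groups routes the
# first group's exit decision into the second group's tables.
LEVELS = (-3.0, -1.0, 1.0, 3.0)  # illustrative PAM4 decision levels

def _slice(y):
    """Decision slicer: compare against the 3 PAM4 reference levels."""
    return LEVELS[sum(y >= t for t in (-2.0, 0.0, 2.0))]

def _combine(a, b):
    """Merge adjacent groups: resolve group b against group a's exit."""
    exit_a = a[-1]
    return a + [{p: t[exit_a[p]] for p in LEVELS} for t in b]

def dfe_lookahead(samples, h1, d_prev):
    # Leaf level: per-symbol tables from parallel adder/slicer banks
    # (no feedback dependency at this level).
    groups = [[{p: _slice(x - h1 * p) for p in LEVELS}] for x in samples]
    # Tree levels: pairwise merges, so n symbols need log2(n) mux levels.
    while len(groups) > 1:
        groups = [_combine(groups[i], groups[i + 1]) if i + 1 < len(groups)
                  else groups[i] for i in range(0, len(groups), 2)]
    # Final select: resolve every table with the block's incoming decision.
    return [t[d_prev] for t in groups[0]]
```

In hardware, each merge level corresponds to one rank of 4×4 MUXs, and the final select is the only operation left on the feedback path.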
As illustrated in
As illustrated in
The processing line for input symbol 1302 includes adders 1304 and comparators 1306. Adders 1304 are used to add the input symbol with possible previous output symbols to obtain possible outputs corresponding to the input symbol 1302. Comparators 1306 compare the output symbol from adders 1304 with a voltage reference signal 1308 to generate corresponding digital output symbols 1310, 1312, 1314, and 1316, which represent all possible output symbols associated with input symbol 1302.
Multiplexers 1404, 1406, 1408, and 1410 use corresponding selection signals 1412, 1414, 1416, and 1418 to select digital output symbols 1420, 1422, 1424, and 1426 as multiplexer outputs.
Selection signals for the 4×4 MUXs 1518-1540 can be output symbols from loop unrolling circuits or other 4×4 MUXs (e.g., as illustrated in
The DFE configurations of
In some aspects, for a grouping of m symbols of lookahead, the number of 4×4 MUX levels in the forward path would be log2(m).
The constraints for the total delay through the forward and feedback logic paths of the DFE will be as follows:
When n=m, the total MUX delays will be log2(n)+1, and the delay along the feedback path will be 1, i.e., a constant independent of n.
Plugging in the initial logic depth estimates, it is seen that with the appropriate choice of the number of symbols to look ahead (m), a complete 1-tap DFE can be implemented for 64 GS/s or even higher symbol rates. In some aspects, the forward evaluation path can be pipelined to arbitrary depth if needed. However, for PCIe Gen7 at 64 GS/s, no pipelining is required with a 64 UI logic clock, as the total logic depth estimate with n=64 and m=8 is about 50, corresponding to a total gate delay estimate of 0.5 ns, which is less than the clock period of 1 ns. Likewise, the feedback path delay estimate for a logic depth estimate of 24 is 240 ps, which is less than the clock period.
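The arithmetic can be checked with the depth estimates above, assuming roughly 10 ps per gate level (the per-gate delay is an assumption consistent with the stated totals, not a measured value):

```python
# Back-of-envelope timing budget for n=64, m=8 at 64 GS/s.
GATE_PS = 10                           # assumed delay per gate level
clock_period_ps = 64 * (1e12 / 64e9)   # 64 UI logic clock: 64 x 15.625 ps
forward_ps = 50 * GATE_PS              # forward-path logic depth ~50
feedback_ps = 24 * GATE_PS             # feedback-path logic depth ~24

# Both paths must close within the 1 ns clock period.
assert forward_ps < clock_period_ps and feedback_ps < clock_period_ps
```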
Table 1 below shows a comparison of the lookahead implementation with the prior approach. The DFE by itself is about 40% larger. However, when the additional cost of adding an FFE for the CDR path is accounted for, the area for the entire DSP (FFE+DFE+PD) is comparable. Without including the additional FFE for the CDR, the area impact for the entire DSP (FFE+DFE) is only ~4%, since the DFE is only about 1/10th the size of the FFE.
In this regard, the disclosed DFE techniques can be used to reduce the total evaluation time for a digital DFE from O(n) to O(log2(n)) by using a decision lookahead tree. This will cut the evaluation delay for a digital DFE by >5-10× over the prior art with comparable area while using fewer logic flops, and will enable ultra-high-speed wireline links to employ a complete digital DFE solution with the associated performance improvements.
At operation 1802, at least a first input symbol and a second input symbol are received (e.g., as received by loop unrolling circuits 1102 and 1104).
At operation 1804, a first plurality of modified symbols is generated based on the first input symbol and a plurality of symbol versions associated with a previously generated output symbol (e.g., as generated by the adders and comparators of the loop unrolling circuit 1104).
At operation 1806, a first plurality of output symbols is generated based on the first plurality of modified symbols (e.g., the output symbols from loop unrolling circuit 1104).
At operation 1808, the first plurality of output symbols are multiplexed (e.g., by 4×4 MUX 1110) to generate the first multiplexed output symbols using a first selection signal. The first selection signal is based on one or more of a second plurality of output symbols corresponding to a second input symbol (e.g., the selection signal for 4×4 MUX 1110 is based on the output symbols generated by loop unrolling circuit 1102).
At operation 1810, one of the first multiplexed output symbols is output (e.g., by MUX 1120) using a second selection signal. The second selection signal is based on the previously generated output symbol (e.g., output symbol Din, as illustrated in
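The sequence of operations 1802-1810 can be illustrated for a pair of symbols. This is a simplified functional sketch in which the 4×4 MUX banks are reduced to table lookups, and the PAM4 levels are illustrative assumptions:

```python
# Functional walk-through of operations 1802-1810 for two input symbols.
LEVELS = (-3.0, -1.0, 1.0, 3.0)  # illustrative PAM4 decision levels

def _slice(y):
    """Decision slicer: compare against the 3 PAM4 reference levels."""
    return LEVELS[sum(y >= t for t in (-2.0, 0.0, 2.0))]

def process_pair(x1, x2, h1, d_prev):
    # 1802: receive first and second input symbols (x1, x2).
    # 1804/1806: adders + comparators produce each symbol's candidate
    # outputs for every version of the previous symbol.
    outs1 = {p: _slice(x1 - h1 * p) for p in LEVELS}
    outs2 = {p: _slice(x2 - h1 * p) for p in LEVELS}
    # 1808: multiplex the second symbol's outputs with selection signals
    # taken from the first symbol's outputs (one mux lane per candidate).
    muxed2 = {p: outs2[outs1[p]] for p in LEVELS}
    # 1810: the previously generated output symbol makes the final select.
    return outs1[d_prev], muxed2[d_prev]
```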
Machine (e.g., computer system) 1900 may include a hardware processor 1902 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), a hardware processor core, or any combination thereof), a main memory 1904, and a static memory 1906, some or all of which may communicate with each other via an interlink (e.g., bus) 1908. In some aspects, the main memory 1904, the static memory 1906, or any other type of memory (including cache memory) used by machine 1900 can be configured based on the disclosed techniques or can implement the disclosed memory devices.
Specific examples of main memory 1904 include Random Access Memory (RAM) and semiconductor memory devices, which may include, in some embodiments, storage locations in semiconductors such as registers. Specific examples of static memory 1906 include non-volatile memory, such as semiconductor memory devices (e.g., Electrically Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM)) and flash memory devices; magnetic disks, such as internal hard disks and removable disks; magneto-optical disks; RAM; and CD-ROM and DVD-ROM disks.
Machine 1900 may further include a display device 1910, an input device 1912 (e.g., a keyboard), and a user interface (UI) navigation device 1914 (e.g., a mouse). In an example, the display device 1910, the input device 1912, and the UI navigation device 1914 may be a touch screen display. The machine 1900 may additionally include a storage device (e.g., drive unit or another mass storage device) 1916, a signal generation device 1918 (e.g., a speaker), a network interface device 1920, and one or more sensors 1921, such as a global positioning system (GPS) sensor, compass, accelerometer, or other sensors. The machine 1900 may include an output controller 1928, such as a serial (e.g., universal serial bus (USB), parallel, or other wired or wireless (e.g., infrared (IR), near field communication (NFC), etc.) connection to communicate or control one or more peripheral devices (e.g., a printer, card reader, etc.). In some embodiments, the hardware processor 1902 and/or instructions 1924 may comprise processing circuitry and/or transceiver circuitry.
The storage device 1916 may include a machine-readable medium 1922 on which one or more sets of data structures or instructions 1924 (e.g., software) embodying or utilized by any one or more of the techniques or functions described herein can be stored. Instructions 1924 may also reside, completely or at least partially, within the main memory 1904, within static memory 1906, or within the hardware processor 1902 during execution thereof by the machine 1900. In an example, one or any combination of the hardware processor 1902, the main memory 1904, the static memory 1906, or the storage device 1916 may constitute machine-readable media.
Specific examples of machine-readable media may include non-volatile memory, such as semiconductor memory devices (e.g., EPROM or EEPROM) and flash memory devices; magnetic disks, such as internal hard disks and removable disks; magneto-optical disks; RAM; and CD-ROM and DVD-ROM disks.
While the machine-readable medium 1922 is illustrated as a single medium, the term “machine-readable medium” may include a single medium or multiple media (e.g., a centralized or distributed database and/or associated caches and servers) configured to store instructions 1924.
An apparatus of the machine 1900 may be one or more of a hardware processor 1902 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), a hardware processor core, or any combination thereof), a main memory 1904 and a static memory 1906, one or more sensors 1921, a network interface device 1920, one or more antennas 1960, a display device 1910, an input device 1912, a UI navigation device 1914, a storage device 1916, instructions 1924, a signal generation device 1918, and an output controller 1928. The apparatus may be configured to perform one or more of the methods and/or operations disclosed herein. The apparatus may be intended as a component of machine 1900 to perform one or more of the methods and/or operations disclosed herein and/or to perform a portion of one or more of the methods and/or operations disclosed herein. In some embodiments, the apparatus may include a pin or other means to receive power. In some embodiments, the apparatus may include power conditioning hardware.
The term “machine-readable medium” may include any medium that is capable of storing, encoding, or carrying instructions for execution by machine 1900 and that causes machine 1900 to perform any one or more of the techniques of the present disclosure or that is capable of storing, encoding, or carrying data structures used by or associated with such instructions. Non-limiting machine-readable medium examples may include solid-state memories and optical and magnetic media. Specific examples of machine-readable media may include non-volatile memory, such as semiconductor memory devices (e.g., Electrically Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM)) and flash memory devices; magnetic disks, such as internal hard disks and removable disks; magneto-optical disks; Random Access Memory (RAM); and CD-ROM and DVD-ROM disks. In some examples, machine-readable media may include non-transitory machine-readable media. In some examples, machine-readable media may include machine-readable media that is not a transitory propagating signal.
The instructions 1924 may further be transmitted or received over a communications network 1926 using a transmission medium via the network interface device 1920 utilizing any one of several transfer protocols (e.g., frame relay, internet protocol (IP), transmission control protocol (TCP), user datagram protocol (UDP), hypertext transfer protocol (HTTP), etc.). Example communication networks may include a local area network (LAN), a wide area network (WAN), a packet data network (e.g., the Internet), mobile telephone networks (e.g., cellular networks), Plain Old Telephone (POTS) networks, and wireless data networks (e.g., Institute of Electrical and Electronics Engineers (IEEE) 802.11 family of standards known as Wi-Fi®, IEEE 802.16 family of standards known as WiMax®), IEEE 802.15.4 family of standards, a Long Term Evolution (LTE) family of standards, a Universal Mobile Telecommunications System (UMTS) family of standards, peer-to-peer (P2P) networks, among others.
In an example, the network interface device 1920 may include one or more physical jacks (e.g., Ethernet, coaxial, or phone jacks) or one or more antennas to connect to the communications network 1926. In an example, the network interface device 1920 may include one or more antennas 1960 to wirelessly communicate using at least one single-input multiple-output (SIMO), multiple-input multiple-output (MIMO), or multiple-input single-output (MISO) techniques. In some examples, the network interface device 1920 may wirelessly communicate using multiple-user MIMO techniques. The term “transmission medium” shall be taken to include any intangible medium that can store, encode, or carry instructions for execution by the machine 1900 and includes digital or analog communications signals or other intangible media to facilitate communication of such software.
Examples, as described herein, may include, or may operate on, logic or several components, modules, or mechanisms. Modules are tangible entities (e.g., hardware) capable of performing specified operations and may be configured or arranged in a particular manner. In an example, circuits may be arranged (e.g., internally or with respect to external entities such as other circuits) in a specified manner as a module. In an example, the whole or part of one or more computer systems (e.g., a standalone, client, or server computer system) or one or more hardware processors may be configured by firmware or software (e.g., instructions, an application portion, or an application) as a module that operates to perform specified operations. In an example, the software may reside on a machine-readable medium. In an example, the software, when executed by the underlying hardware of the module, causes the hardware to perform the specified operations.
Accordingly, the term “module” is understood to encompass a tangible entity, be that an entity that is physically constructed, specifically configured (e.g., hardwired), or temporarily (e.g., transitorily) configured (e.g., programmed) to operate in a specified manner or to perform part, all, or any operation described herein. Considering examples in which modules are temporarily configured, each of the modules need not be instantiated at any one moment in time. For example, where the modules comprise a general-purpose hardware processor configured using the software, the general-purpose hardware processor may be configured as respective different modules at separate times. The software may accordingly configure a hardware processor, for example, to constitute a particular module at one instance of time and to constitute a different module at a different instance of time.
Some embodiments may be implemented fully or partially in software and/or firmware. This software and/or firmware may take the form of instructions contained in or on a non-transitory computer-readable storage medium. Those instructions may then be read and executed by one or more processors to enable the performance of the operations described herein. The instructions may be in any suitable form, such as but not limited to source code, compiled code, interpreted code, executable code, static code, dynamic code, and the like. Such a computer-readable medium may include any tangible non-transitory medium for storing information in a form readable by one or more computers, such as but not limited to read-only memory (ROM), random access memory (RAM), magnetic disk storage media, optical storage media, flash memory, etc.
The above-detailed description includes references to the accompanying drawings, which form a part of the detailed description. The drawings show, by way of illustration, specific embodiments that may be practiced. These embodiments are also referred to herein as “examples.” Such examples may include elements in addition to those shown or described. However, also contemplated are examples that include the elements shown or described. Moreover, also contemplated are examples using any combination or permutation of those elements shown or described (or one or more aspects thereof), either with respect to a particular example (or one or more aspects thereof) or with respect to other examples (or one or more aspects thereof) shown or described herein.
Publications, patents, and patent documents referred to in this document are incorporated by reference herein in their entirety, as though individually incorporated by reference. In the event of inconsistent usage between this document and those documents so incorporated by reference, the usage in the incorporated reference(s) is supplementary to that of this document; for irreconcilable inconsistencies, the usage in this document controls.
In this document, the terms “a” or “an” are used, as is common in patent documents, to include one or more than one, independent of any other instances or usages of “at least one” or “one or more.” In this document, the term “or” is used to refer to a nonexclusive or, such that “A or B” includes “A but not B,” “B but not A,” and “A and B,” unless otherwise indicated. In the appended claims, the terms “including” and “in which” are used as the plain-English equivalents of the respective terms “comprising” and “wherein.” Also, in the following claims, the terms “including” and “comprising” are open-ended, that is, a system, device, article, or process that includes elements in addition to those listed after such a term in a claim is still deemed to fall within the scope of that claim. Moreover, in the following claims, the terms “first,” “second,” and “third,” etc., are used merely as labels and are not intended to suggest a numerical order for their objects.
The embodiments as described above may be implemented in various hardware configurations that may include a processor for executing instructions that perform the techniques described. Such instructions may be contained in a machine-readable medium such as a suitable storage medium or a memory or other processor-executable medium.
The embodiments as described herein may be implemented in several environments, such as part of a system on chip, a set of intercommunicating functional blocks, or similar, although the scope of the disclosure is not limited in this respect.
Described implementations of the subject matter can include one or more features, alone or in combination, as illustrated below by way of examples.
Example 1 is an apparatus comprising: a first plurality of adders comprising a corresponding plurality of input terminals receiving a first input symbol; a first plurality of comparators, an input terminal for each comparator of the first plurality of comparators coupled to a corresponding output terminal of an adder of the first plurality of adders; a second plurality of adders comprising a corresponding plurality of input terminals receiving a second input symbol; a second plurality of comparators, an input terminal for each comparator of the second plurality of comparators coupled to a corresponding output terminal of an adder of the second plurality of adders; and a first plurality of multiplexers, each multiplexer of the first plurality of multiplexers coupled to output terminals of the second plurality of comparators, and a selection terminal for each multiplexer of the first plurality of multiplexers is coupled to at least one output terminal of a comparator of the first plurality of comparators.
In Example 2, the subject matter of Example 1 includes a third plurality of adders comprising a corresponding plurality of input terminals receiving a third input symbol.
In Example 3, the subject matter of Example 2 includes a third plurality of comparators, wherein an input terminal for each comparator of the third plurality of comparators is coupled to a corresponding output terminal of an adder of the third plurality of adders.
In Example 4, the subject matter of Example 3 includes a second plurality of multiplexers, each multiplexer of the second plurality of multiplexers coupled to output terminals of the third plurality of comparators, and a selection terminal for each multiplexer of the second plurality of multiplexers is coupled to at least one output terminal of a multiplexer of the first plurality of multiplexers.
In Example 5, the subject matter of Example 4 includes a fourth plurality of adders comprising a corresponding plurality of input terminals receiving a fourth input symbol.
In Example 6, the subject matter of Example 5 includes a fourth plurality of comparators, wherein an input terminal for each comparator of the fourth plurality of comparators is coupled to a corresponding output terminal of an adder of the fourth plurality of adders.
In Example 7, the subject matter of Example 6 includes a third plurality of multiplexers, each multiplexer of the third plurality of multiplexers coupled to output terminals of the fourth plurality of comparators, and a selection terminal for each multiplexer of the third plurality of multiplexers is coupled to at least one output terminal of a comparator of the third plurality of comparators.
In Example 8, the subject matter of Example 7 includes a fourth plurality of multiplexers, each multiplexer of the fourth plurality of multiplexers coupled to output terminals of the third plurality of multiplexers, and a selection terminal for each multiplexer of the fourth plurality of multiplexers is coupled to at least one output terminal of a multiplexer of the second plurality of multiplexers.
In Example 9, the subject matter of Example 8 includes a first single output multiplexer, the first single output multiplexer comprising four input terminals coupled to corresponding output terminals of the first plurality of comparators.
In Example 10, the subject matter of Example 9 includes a second single output multiplexer, the second single output multiplexer comprising four input terminals coupled to corresponding output terminals of the first plurality of multiplexers.
In Example 11, the subject matter of Example 10 includes a third single output multiplexer, the third single output multiplexer comprising four input terminals coupled to corresponding output terminals of the second plurality of multiplexers.
In Example 12, the subject matter of Example 11 includes a fourth single output multiplexer, the fourth single output multiplexer comprising four input terminals coupled to corresponding output terminals of the fourth plurality of multiplexers.
In Example 13, the subject matter of Examples 1-12 includes one or more interconnects coupled to the first plurality of adders, the first plurality of comparators, the second plurality of adders, the second plurality of comparators, and the first plurality of multiplexers.
In Example 14, the subject matter of Examples 1-13 includes subject matter where the apparatus comprises a processor, and wherein the processor includes one or more of the first plurality of adders, the first plurality of comparators, the second plurality of adders, the second plurality of comparators, and the first plurality of multiplexers.
In Example 15, the subject matter of Examples 12-14 includes one or more interconnects coupling two or more of the first plurality of adders, the first plurality of comparators, the second plurality of adders, the second plurality of comparators, the third plurality of adders, the third plurality of comparators, the fourth plurality of adders, the fourth plurality of comparators, the first plurality of multiplexers, the second plurality of multiplexers, the third plurality of multiplexers, and the fourth plurality of multiplexers.
Example 16 is a digital feedback equalizer comprising a plurality of loop unrolling circuits, each loop unrolling circuit of the plurality of loop unrolling circuits configured to receive an input symbol; generate a plurality of modified symbols based on the input symbol and a plurality of symbol versions associated with a previously generated output symbol; and generate a plurality of output symbols based on the plurality of modified symbols; and a first plurality of multiplexers coupled to a first loop unrolling circuit of the plurality of loop unrolling circuits, the first plurality of multiplexers configured to output one or more of the plurality of output symbols generated by the first loop unrolling circuit based on a selection signal comprising one or more of the plurality of output symbols generated by a second loop unrolling circuit of the plurality of loop unrolling circuits.
In Example 17, the subject matter of Example 16 includes a first single output multiplexer coupled to the first loop unrolling circuit, the first single output multiplexer to output one of the plurality of output symbols generated by the first loop unrolling circuit based on a selection signal comprising the previously generated output symbol.
In Example 18, the subject matter of Example 17 includes a second plurality of multiplexers coupled to a second loop unrolling circuit of the plurality of loop unrolling circuits, the second plurality of multiplexers configured to output one or more of the plurality of output symbols generated by the second loop unrolling circuit based on a selection signal comprising one or more of the plurality of output symbols generated by the first loop unrolling circuit and output by the first plurality of multiplexers.
In Example 19, the subject matter of Example 18 includes a second single output multiplexer coupled to the second plurality of multiplexers, the second single output multiplexer to output one of the plurality of output symbols generated by the second loop unrolling circuit based on the selection signal comprising the previously generated output symbol.
In Example 20, the subject matter of Example 19 includes a third plurality of multiplexers coupled to a third loop unrolling circuit of the plurality of loop unrolling circuits, the third plurality of multiplexers configured to output one or more of the plurality of output symbols generated by the third loop unrolling circuit based on a selection signal comprising one or more of the plurality of output symbols generated by the second loop unrolling circuit.
In Example 21, the subject matter of Example 20 includes a fourth plurality of multiplexers coupled to the third plurality of multiplexers, the fourth plurality of multiplexers configured to output one or more of the plurality of output symbols generated by the third loop unrolling circuit based on a selection signal comprising one or more of the plurality of output symbols generated by the second loop unrolling circuit and output by the second plurality of multiplexers.
In Example 22, the subject matter of Example 21 includes a third single output multiplexer coupled to the fourth plurality of multiplexers, the third single output multiplexer to output one of the plurality of output symbols generated by the third loop unrolling circuit and output by the fourth plurality of multiplexers based on the selection signal comprising the previously generated output symbol.
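The loop unrolling described in Examples 16-22 can be illustrated with a short behavioral sketch. This is a hypothetical one-tap, PAM-4 model: the level set `LEVELS`, the tap weight `h1`, and the `slicer` helper are illustrative assumptions, not taken from the disclosure.

```python
# Hypothetical one-tap loop-unrolled DFE decision for a PAM-4 symbol.
# LEVELS, h1, and slicer are illustrative assumptions.

LEVELS = (-3, -1, 1, 3)   # assumed PAM-4 symbol amplitudes
h1 = 0.25                 # assumed first post-cursor tap weight

def slicer(x):
    """Comparator bank: map an equalized sample to the nearest level."""
    return min(LEVELS, key=lambda lvl: abs(x - lvl))

def unrolled_decision(sample, prev_symbol):
    # "Adders": form one modified symbol per possible previous symbol.
    modified = [sample - h1 * lvl for lvl in LEVELS]
    # "Comparators": slice every speculative candidate in parallel.
    candidates = [slicer(m) for m in modified]
    # "Multiplexer": the previous decision selects the committed output.
    return candidates[LEVELS.index(prev_symbol)]
```

Because all four candidates exist before the previous decision arrives, the critical feedback path collapses to a 4:1 multiplexer select, which is the motivation for loop unrolling at high data rates.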
Example 23 is a method comprising receiving at least a first input symbol and a second input symbol; generating a first plurality of modified symbols based on the first input symbol and a plurality of symbol versions associated with a previously generated output symbol; generating a first plurality of output symbols based on the first plurality of modified symbols; multiplexing the first plurality of output symbols to generate first multiplexed output symbols using a first selection signal, the first selection signal based on one or more of a second plurality of output symbols corresponding to the second input symbol; and outputting one of the first multiplexed output symbols using a second selection signal, the second selection signal based on the previously generated output symbol.
In Example 24, the subject matter of Example 23 includes generating a second plurality of modified symbols based on the second input symbol and the plurality of symbol versions associated with the previously generated output symbol; generating a second plurality of output symbols based on the second plurality of modified symbols; and outputting one of the second plurality of output symbols based on the previously generated output symbol.
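The two-symbol flow of Examples 23-24 can be sketched as a chained resolution, where each symbol's speculative candidates are multiplexed by the decision committed for the preceding symbol. As in the earlier sketch, the PAM-4 level set and tap weight here are illustrative assumptions rather than values from the disclosure.

```python
# Hypothetical sketch of the Example 23-24 method: speculative candidates
# for each input symbol, resolved in order by the prior committed decision.

LEVELS = (-3, -1, 1, 3)   # assumed PAM-4 symbol amplitudes
h1 = 0.25                 # assumed first post-cursor tap weight

def nearest_level(x):
    """Comparator bank: nearest PAM-4 level to an equalized sample."""
    return min(LEVELS, key=lambda lvl: abs(x - lvl))

def resolve_symbols(samples, prev_symbol):
    """Return committed decisions for a run of received samples."""
    decisions = []
    for s in samples:
        # Modified symbols (adders) sliced to output symbols (comparators).
        candidates = [nearest_level(s - h1 * lvl) for lvl in LEVELS]
        # Multiplex: the previously generated output symbol is the select.
        prev_symbol = candidates[LEVELS.index(prev_symbol)]
        decisions.append(prev_symbol)
    return decisions
```

Each iteration only waits on the previous committed decision to steer its multiplexer, mirroring how the first plurality of output symbols is resolved before selecting among the second.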
Example 25 is at least one machine-readable medium including instructions that, when executed by processing circuitry, cause the processing circuitry to perform operations to implement any of Examples 1-24.
Example 26 is an apparatus comprising means to implement any of Examples 1-24.
Example 27 is a system to implement any of Examples 1-24.
Example 28 is a method to implement any of Examples 1-24.
The above description is intended to be illustrative and not restrictive. For example, the above-described examples (or one or more aspects thereof) may be used in combination with others. Other embodiments may be used, such as by one of ordinary skill in the art upon reviewing the above description. The Abstract is provided to allow the reader to quickly ascertain the nature of the technical disclosure. It is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. Also, in the above Detailed Description, various features may be grouped together to streamline the disclosure. However, the claims may not set forth every feature disclosed herein, as embodiments may feature a subset of said features. Further, embodiments may include fewer features than those disclosed in a particular example. Thus, the following claims are hereby incorporated into the Detailed Description, with each claim standing on its own as a separate embodiment. The scope of the embodiments disclosed herein is to be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.