The present invention relates to enabling robust and online fault tolerance in optical interconnection networks. In particular, the present invention relates to methods and apparatus for lane fault tolerance in parallel surface-normal multi-wavelength optical interconnect networks.
Fault tolerance is an important consideration in any communication system, and particularly in optical networks where transmission errors can result in significant data loss. As such, numerous approaches have been proposed to improve fault tolerance in optical networks. One approach to improving fault tolerance is through the use of redundant lanes, carried by redundant wavelengths. Redundancy provides a mechanism for maintaining system functionality in the event of failures or errors. This can be achieved through the use of standby lanes, which are activated in the event of failure on the primary lane, or through the use of parallel lanes that provide alternative pathways for data transmission.
The use of built-in lane redundancy improves fault tolerance in parallel multi-wavelength surface-normal optical links. The redundant lanes are coupled into a single fiber core through an integrated and compact surface-normal multiplexer. This technique reduces the cost, area, and power penalties accrued for the redundant lanes.
Systems utilize a multitude of redundant wavelengths, corresponding to redundant optical elements and lanes, as a failover mechanism. Detection and isolation logic pinpoints the faulty lane, and an online failover approach ensures the faulty lane's data is routed to a redundant lane. Features include out-of-band and sideband communication of the fault information.
Apparatus for detecting and repairing faults in an optical communication system has spaced apart nodes connected by an optical fiber. Each node has an optical engine comprising multiple optical transmitters and multiple optical receivers including a redundant transmitter and a redundant receiver, as well as a wavelength multiplexer/demultiplexer and link control circuitry. The optical transmitters emit at differing wavelengths and are coupled into the optical fiber through the wavelength multiplexer/demultiplexer, and additional wavelengths are demultiplexed from the optical fiber to the optical receivers through the wavelength multiplexer/demultiplexer. A lane is a transmitter at an optical engine at a first node, the optical fiber, and a receiver at an optical engine at a second node.
The link control circuitry is configured to detect faulty lanes in real time while the apparatus is communicating. Link control circuitry at the first node and link control circuitry at the second node communicate with each other to identify the faulty lane, send data to the redundant lane, deskew the redundant lane data, and turn off the faulty lane. Multiple lanes may be designated as redundant lanes to replace multiple faulty lanes.
The optical multiplexer/demultiplexers can be thin-film filter zig-zag multiplexer/demultiplexers with some filter bands reserved for redundant wavelengths. In some embodiments, the optical engine has two or more links, and each link has multiple optical transmitters and multiple optical receivers including a redundant transmitter and a redundant receiver, and link control circuitry. For example, an embodiment may include 32 links.
The optical transmitters and optical receivers may be surface normal to the optical engine. The optical transmitters and optical receivers might be directly integrated on a silicon logic layer of an optical engine comprising physical and data link layers. In some embodiments, the optical transmitters are vertical-cavity surface-emitting lasers with cavities tuned for wavelengths partitioned across a wavelength band, some of those wavelengths being redundant, and the receivers are broadband photodetectors responsive across the wavelength band.
A method of detecting and repairing faults in an optical communication system having nodes spaced apart from each other and connected via an optical fiber includes: providing at each node an optical engine comprising multiple primary optical transmitters and multiple primary optical receivers, one redundant optical transmitter and one redundant optical receiver, link control circuitry, and a wavelength multiplexer/demultiplexer; providing multiple primary lanes between the nodes, wherein a lane is defined as a primary transmitter at an optical engine at a far-side node, the optical fiber, and a primary receiver at an optical engine at a near-side node; communicating between the nodes via the primary lanes; transmitting from optical transmitters at differing wavelengths; coupling the differing wavelength transmissions into the optical fiber via the wavelength multiplexer/demultiplexer; demultiplexing the differing wavelength transmissions from the optical fiber to the optical receivers via the wavelength multiplexer/demultiplexer; and monitoring communication and detecting faulty primary lanes while communicating.
Once a faulty lane is detected, it is identified at the near-side receiver of the faulty primary lane. Next, a failover event identifying the faulty primary lane is communicated from the near side of the faulty primary lane to the far side of the faulty primary lane. A redundant lane is created using a redundant transmitter adjacent to the primary transmitter of the faulty lane and a redundant receiver adjacent to the primary receiver of the faulty lane. While communication continues, including on the faulty lane, the redundant lane is trained.
Data from the faulty lane is used to deskew the redundant lane. For example, the data sent on the redundant lane can mirror the data sent on the faulty lane. Once the redundant lane is deskewed, the faulty lane can be taken offline.
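The mirroring-based deskew described above can be illustrated with a minimal sketch. It assumes the receiver can capture a short window of symbols from both the faulty lane and the redundant lane while they carry the same data, and that skew is a whole number of symbols within a known bound. The function name `estimate_skew`, its parameters, and the brute-force matching search are illustrative assumptions and are not taken from the disclosure.

```python
def estimate_skew(reference, candidate, max_skew):
    """Estimate the symbol offset of `candidate` relative to `reference`.

    Hypothetical helper: `reference` is a window captured on the faulty lane,
    `candidate` is a window captured on the redundant lane while it mirrors the
    same data. Returns the shift s in [-max_skew, max_skew] for which
    candidate[i + s] agrees with reference[i] at the most positions.
    """
    best_shift, best_matches = 0, -1
    for shift in range(-max_skew, max_skew + 1):
        matches = 0
        for i, symbol in enumerate(reference):
            j = i + shift
            # Only count positions that fall inside the captured window.
            if 0 <= j < len(candidate) and candidate[j] == symbol:
                matches += 1
        if matches > best_matches:
            best_shift, best_matches = shift, matches
    return best_shift


# Example: the redundant lane delivers the same symbols three positions late.
faulty_lane_window = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]
redundant_lane_window = [0, 0, 0] + faulty_lane_window
skew = estimate_skew(faulty_lane_window, redundant_lane_window, max_skew=5)
```

Because the faulty lane may still deliver occasional errors, the sketch picks the shift with the most agreeing positions rather than requiring an exact match.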
Detecting a faulty lane involves evaluating communication errors, for example by examining retry logs. Alternatively, error counts, an eye scan, or an analog-to-digital converter (ADC) histogram may be used.
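As one possible illustration of detection based on error counts, the following sketch flags a lane when its errors summed over a sliding window of recent reporting intervals exceed a threshold. The class name `LaneErrorMonitor`, the window length, and the threshold values are assumptions chosen for illustration; the disclosure does not specify them.

```python
from collections import deque


class LaneErrorMonitor:
    """Hypothetical per-lane error monitor using a sliding window of counts."""

    def __init__(self, num_lanes, window=4, threshold=10):
        self.threshold = threshold
        # One bounded history per lane; old intervals fall out automatically.
        self.history = [deque(maxlen=window) for _ in range(num_lanes)]

    def record(self, lane, errors):
        """Store the error count observed on `lane` in the latest interval."""
        self.history[lane].append(errors)

    def faulty_lanes(self):
        """Return lanes whose windowed error total exceeds the threshold."""
        return [
            lane
            for lane, counts in enumerate(self.history)
            if sum(counts) > self.threshold
        ]


# Example: lane 2 accumulates errors faster than the other lanes.
monitor = LaneErrorMonitor(num_lanes=4, window=3, threshold=5)
for _ in range(3):
    monitor.record(0, 1)
monitor.record(2, 3)
monitor.record(2, 3)
flagged = monitor.faulty_lanes()
```

A windowed total, rather than a lifetime total, lets a lane that produced a brief burst of errors return to good standing once the burst ages out.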
The failover event communicating step can be performed using an idle redundant transmitter as a sideband channel or using an out-of-band fabric manager.
Table 1 lists elements of the present invention and their corresponding reference numbers.
Following lane training, in step 512 the selected far-side redundant transmitter 157 switches to mirroring the faulty lane data, enabling the receiver to deskew the redundant lane. Finally, the faulty lane is turned off in step 514 and the link resumes normal operation 502.
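The overall ordering of the online failover can be sketched as a simple bookkeeping model. This is an illustrative assumption only: the class `Link`, its attribute names, and the event labels are hypothetical, and the model tracks only which lanes carry data, not the optical or electrical behavior.

```python
class Link:
    """Hypothetical model of which lanes carry data during an online failover."""

    def __init__(self, primary_lanes, redundant_lane):
        self.carrying_data = set(primary_lanes)
        self.redundant_lane = redundant_lane  # idle spare, or None if used
        self.events = []

    def failover(self, faulty_lane):
        """Move traffic from `faulty_lane` to the spare without interruption."""
        if faulty_lane not in self.carrying_data:
            raise ValueError("lane is not carrying data")
        if self.redundant_lane is None:
            raise RuntimeError("no redundant lane available")
        spare = self.redundant_lane
        self.events.append(("identify", faulty_lane))
        self.events.append(("notify_far_side", faulty_lane))
        # The faulty lane keeps carrying data while the spare is trained.
        self.events.append(("train", spare))
        # The spare mirrors the faulty lane's data so the receiver can deskew.
        self.carrying_data.add(spare)
        self.events.append(("mirror_and_deskew", spare))
        # Only after deskew is the faulty lane turned off.
        self.carrying_data.discard(faulty_lane)
        self.events.append(("turn_off", faulty_lane))
        self.redundant_lane = None
        return spare


# Example: four primary lanes (0-3) and one redundant lane (4); lane 2 fails.
link = Link(primary_lanes=[0, 1, 2, 3], redundant_lane=4)
replacement = link.failover(2)
```

In this sketch the faulty lane is removed only after the spare has been added, so the number of data-carrying lanes never drops below the number of primary lanes during the transition.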
This application relates to U.S. patent application Ser. No. ______, titled “Redundant Transmission and Receive Elements for High-Bandwidth Communication” by inventors Ryan Boesch, J. Israel Ramirez, and Keith Behrman, and filed concurrently herewith, which application is hereby incorporated herein by reference. This application claims the benefit of U.S. Patent Application 63/326,193, filed 31 Mar. 2022 and incorporates it herein by reference.
Number | Date | Country
---|---|---
63326193 | Mar 2022 | US