Pluggable optical modules, and particularly the hardware therein, have become increasingly complex in recent years. In addition, pluggable optical modules (or host systems to which the modules are connected) installed at customer sites or in the field may fail for various reasons. Thus, a method for reconstructing the exact failure is needed in order to repair and/or improve the pluggable modules. However, due to the increasing complexity, reconstruction can be challenging. In particular, reconstruction may be difficult due to differences in module configuration and setup, module environment variables and failure conditions. Further, reconstruction may not be feasible because of the financial impact on the customer and the anticipated service interruption.
Disclosed herein are methods and apparatuses for providing on-board failure logging (OBFL) capability to pluggable optical modules. By providing OBFL capability, engineers may be able to more easily reconstruct the exact failure occurring on either the pluggable optical modules or the host systems. Specifically, in order to assist in troubleshooting failures, environment variables of the optical modules, failure events, optical module statuses, etc. may be captured and stored in memory of the optical module such that the root cause of the failure can be determined at a later point in time.
A method of capturing a failure log for an optical module for use with a host system may be provided according to one implementation of the invention. The optical module may include an optical monitoring circuit and a buffer memory. The method may include: measuring environment variables of the optical module using the optical monitoring circuit; capturing an optical module control and status signal; and storing the measured environment variables and the optical module control and status signal in the buffer memory.
Optionally, the method may include receiving time of day information. The measured environment variables and the optical module control and status signal may then be associated with the time of day information.
In some implementations, the time of day information may be received from the host system.
In other implementations, the time of day information may be extracted from network data.
Alternatively or additionally, the method may include receiving time of day information, measuring environment variables of the optical module using the optical monitoring circuit, capturing an optical module control and status signal and storing the measured environment variables and the optical module control and status signal in the buffer memory every predetermined time period. For example, the predetermined time period may be 1 second.
In one implementation, the method may include storing the measured environment variables and the optical module control and status signal in the buffer memory using a round-robin algorithm.
In addition, the environment variables may include at least one of temperature, supply voltage, transmission optical power and reception optical power of the optical module.
An optical module for use with a host system may be provided according to another implementation of the invention. The optical module may include: an optical monitoring circuit configured to measure environment variables of the optical module; a buffer memory; and a computing device configured to receive the measured environment variables, capture an optical module control and status signal and store the measured environment variables and the optical module control and status signal in the buffer memory.
Optionally, the computing device may be configured to receive time of day information. The measured environment variables and the optical module control and status signal may then be associated with the time of day information.
In some implementations, the time of day information may be received from the host system.
In other implementations, the time of day information may be extracted from network data.
Alternatively or additionally, the computing device may be configured to receive time of day information, receive the measured environment variables of the optical module, capture an optical module control and status signal and store the measured environment variables and the optical module control and status signal in the buffer memory every predetermined time period.
The computing device may also be configured to store the measured environment variables and the optical module control and status signal in the buffer memory using a round-robin algorithm.
In addition, the buffer memory may be EEPROM. Alternatively or additionally, the buffer memory may have a capacity of 25.2 k bytes.
Further, the environment variables of the optical module may include at least one of temperature, supply voltage, transmission optical power and reception optical power of the optical module.
A non-transient computer-readable storage medium may be provided according to yet another implementation of the invention. The storage medium may have computer-executable instructions stored thereon that cause a computing device of an optical module for use with a host system to: receive measured environment variables of the optical module from an optical monitoring circuit; capture an optical module control and status signal; and store the measured environment variables and the optical module control and status signal in a buffer memory.
In some implementations, the computer-executable instructions may cause the computing device to receive time of day information. The measured environment variables and the optical module control and status signal may then be associated with the time of day information.
Alternatively or additionally, the computer-executable instructions may cause the computing device to receive time of day information, receive measured environment variables of the optical module from the optical monitoring circuit, capture an optical module control and status signal and store the measured environment variables and the optical module control and status signal in the buffer memory every predetermined time period.
Other systems, methods, features and/or advantages will be or may become apparent to one with skill in the art upon examination of the following drawings and detailed description. It is intended that all such additional systems, methods, features and/or advantages be included within this description and be protected by the accompanying claims.
The components in the drawings are not necessarily to scale relative to each other. Like reference numerals designate corresponding parts throughout the several views.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art. Methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present disclosure. While implementations will be described for providing OBFL capability to pluggable optical modules, it will become evident to those skilled in the art that the implementations are not limited thereto, but are applicable for providing OBFL capability to other devices.
The pluggable module 104 may also include a computing device 110. The computing device 110 may preferably include a processing unit and memory (e.g., volatile and/or non-volatile memory). According to existing multisource agreements, pluggable modules should at least include a non-volatile EEPROM for storing information regarding the optical module. Thus, if the pluggable module 104 does not include the computing device 110, the pluggable module 104 should at least include memory for storing information regarding the pluggable module. This is discussed below with regard to
In some implementations, the computing device 110 may be a microcontroller, i.e., an IC chip having a processing unit, system memory and programmable input/output interfaces. The system memory of the microcontroller may optionally be NVRAM. Additionally, the microcontroller may be programmed to control operations of the pluggable module 104.
The pluggable module 104 shown in
One skilled in the art would understand that the pluggable modules 104 shown in
In order to provide OBFL capability, the existing resources of the pluggable module 204 can be leveraged to store failure-related data to an internal memory (e.g., a non-volatile buffer memory) for retrieval at a later time to reconstruct the exact failure. Thus, the existing memory of the computing device 210 may optionally be extended to retain the failure-related data. In addition, the pluggable module hardware (e.g., the optical monitoring circuit) may be utilized to measure environment variables. In one implementation, the optical monitoring circuit 212 may measure module temperature, supply voltage, transmission optical power and reception optical power. A failure log module 220 executed by the computing device 210 may configure the measured environment variables for storage in the memory of the computing device 210. The failure log module 220 may be implemented using module firmware. However, one of ordinary skill in the art would understand that the failure log module 220 may also be implemented using hardware, firmware or software, or any combination thereof. In addition to the environment variables, the failure log module 220 may be configured to capture and store the pluggable module control and status signal. The pluggable module control and status signal may contain information regarding module events such as system reset events, system alarm events and/or time of day (TOD) information. Each instance of stored data should preferably be associated with the TOD information (i.e., a time stamp) so that the sequence of failure events can be determined. However, the time stamp may optionally be stored only at the beginning of the round-robin in order to reduce the size of the memory required for storing the failure-related data. In some implementations, 7 bytes may be required to store the captured pluggable module environment variables and the control and status signal, for example.
In particular, 1 byte may be required for each of the module temperature and supply voltage, 2 bytes may be required for each of the transmission and reception optical powers and 1 byte may be required for the control and status signal. Additionally, 5 bytes may be required for the time stamp included with each instance of failure-related data. For example, the 5 byte time stamp may be calculated from a 12-digit decimal number formed by concatenating the 2-digit year, 2-digit month, 2-digit day, 2-digit hour (24-hour clock), 2-digit minutes and 2-digit seconds. Thus, the worst case is decimal "991231235959," which equals hex 0xE6_C9_FC_57_77 and therefore fits in 5 bytes.
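The byte budget described above can be illustrated with a minimal sketch. The field widths and the 12-digit decimal time stamp come from the description; the big-endian byte order, the field order within the record, the use of raw unsigned codes for each measurement, and the function names are assumptions made only for illustration.

```python
import struct
from datetime import datetime

# Assumed layout: temperature (1 byte), supply voltage (1 byte),
# TX optical power (2 bytes), RX optical power (2 bytes),
# control and status signal (1 byte). ">" means big-endian, no padding.
RECORD_FORMAT = ">BBHHB"


def pack_record(temp: int, vcc: int, tx: int, rx: int, status: int) -> bytes:
    """Pack one failure-related record of raw codes (hypothetical helper)."""
    return struct.pack(RECORD_FORMAT, temp, vcc, tx, rx, status)


def encode_timestamp(t: datetime) -> bytes:
    """Store YYMMDDhhmmss as one 12-digit decimal number in 5 bytes."""
    value = int(t.strftime("%y%m%d%H%M%S"))
    return value.to_bytes(5, "big")  # byte order is an assumption


def decode_timestamp(raw: bytes) -> tuple:
    """Return (yy, mm, dd, hh, mi, ss) recovered from a 5-byte time stamp."""
    digits = f"{int.from_bytes(raw, 'big'):012d}"
    return tuple(int(digits[i:i + 2]) for i in range(0, 12, 2))


worst_case = encode_timestamp(datetime(2099, 12, 31, 23, 59, 59))
print(worst_case.hex())                 # e6c9fc5777
print(struct.calcsize(RECORD_FORMAT))   # 7
```

The worst-case value reproduces the hex figure given above, which confirms that 5 bytes are sufficient for any valid time stamp in this format.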
The size of the memory required to store the failure-related data may be determined based on the capture frequency and the desired sample period. In one implementation, in order to provide sufficient information to reconstruct the exact failure, the data may be captured once every 1 second and the desired sample period may be 1 hour. Accordingly, the depth of the array required for storing the failure-related data is 3.6 k (i.e., 3,600 samples per hour). However, the capture frequency and the desired sample period are not limited to the above example and may be set to any value by the host. The size of the required memory increases as the capture frequency and/or the desired sample period increases. For example, when the failure log module 220 is configured to capture and store the pluggable module environment variables and control and status signal every 1 second and the desired sample period is 1 hour, the memory must be capable of storing 3,600 samples. Thus, the maximum size of the memory required may be (7 byte failure-related data+5 byte time stamp)*3,600=43,200 bytes (i.e., 43.2 k bytes). In addition, in order to minimize the size of the memory, the failure log module 220 may be configured to perform a round-robin storage algorithm. In other words, the memory becomes a circular buffer and when the buffer is full, a subsequent write is performed by overwriting the oldest data. Optionally, in order to further reduce the size of the memory, the 5 byte time stamp may only be stored at the beginning of the round-robin in other implementations. Because the capture frequency is known, the time associated with each instance of failure-related data can be derived based on the 5 byte time stamp stored at the beginning of the round-robin and the capture frequency. In this case, the minimum size of the memory required may be (7 byte failure-related data)*3,600+5 byte time stamp=25,205 bytes (i.e., approximately 25.2 k bytes).
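The sizing arithmetic and the round-robin storage described above may be sketched as follows. This is an illustrative model rather than module firmware: the class and function names are hypothetical, and the sketch assumes that a running count of writes is retained alongside the single time stamp so that sample times can still be derived after the buffer wraps, a detail the description does not specify.

```python
from datetime import datetime, timedelta

RECORD_SIZE = 7     # bytes per failure-related record (from the description)
TIMESTAMP_SIZE = 5  # bytes per time stamp (from the description)


def required_bytes(samples: int, stamp_every_record: bool) -> int:
    """Memory needed for `samples` records under the two layouts described."""
    if stamp_every_record:
        return (RECORD_SIZE + TIMESTAMP_SIZE) * samples
    return RECORD_SIZE * samples + TIMESTAMP_SIZE


class RoundRobinLog:
    """Fixed-depth circular log with one time stamp for the whole buffer."""

    def __init__(self, depth: int, start: datetime, interval_s: int = 1):
        self.depth = depth
        self.start = start            # time stamp stored once, at the beginning
        self.interval_s = interval_s  # known capture interval in seconds
        self.slots = [None] * depth
        self.written = 0              # assumed: total writes are tracked

    def write(self, record: bytes) -> None:
        if len(record) != RECORD_SIZE:
            raise ValueError("record must be exactly 7 bytes")
        # When the buffer is full, the index wraps and the oldest slot is reused.
        self.slots[self.written % self.depth] = record
        self.written += 1

    def dump(self) -> list:
        """Return (derived time, record) pairs, oldest first."""
        first = max(0, self.written - self.depth)
        return [
            (self.start + timedelta(seconds=n * self.interval_s),
             self.slots[n % self.depth])
            for n in range(first, self.written)
        ]


print(required_bytes(3600, stamp_every_record=True))   # 43200
print(required_bytes(3600, stamp_every_record=False))  # 25205
```

Storing the time stamp once trades roughly 18 k bytes of memory for the need to know the capture interval and the write count when the log is read back.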
As discussed above, the pluggable module 304 shown in
After obtaining the TOD information, the pluggable module environment variables and the control and status signal may be captured at 412, and then the module information may be stored in memory at 414. As discussed above, once it is determined at 418 that the host system should start capturing, the pluggable module environment variables and the control and status signal may be continuously captured and stored using a round-robin storage algorithm until it is determined at 416 that the host system should stop capturing. The process for providing OBFL capability ends at 420 when the logged data is dumped from the host system.
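The capture flow described above may be summarized in a short sketch. The step numbers in the comments are taken from the description; the function name, the callables passed in, and the use of a bounded `deque` as the round-robin store are assumptions made for illustration only.

```python
import time
from collections import deque


def capture_session(get_tod, capture, should_stop,
                    depth=3600, period_s=1.0, sleep=time.sleep):
    """Run one logging session and return (start TOD, logged records).

    get_tod, capture and should_stop are hypothetical callables standing in
    for the TOD source, the monitoring hardware and the host's stop request.
    """
    start = get_tod()              # obtain TOD information before capturing
    log = deque(maxlen=depth)      # a full deque silently drops its oldest item
    while not should_stop():       # stop determination (416)
        log.append(capture())      # capture (412) and store in memory (414)
        sleep(period_s)            # wait for the predetermined time period
    return start, list(log)        # logged data is dumped (420)
```

In a test harness, `sleep` can be replaced with a no-op and `should_stop` with a function that returns True after a fixed number of iterations, which allows the overwrite behavior to be checked without waiting in real time.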
It should be understood that the various techniques described herein may be implemented in connection with hardware, firmware or software or, where appropriate, with a combination thereof. Thus, the methods and apparatuses of the presently disclosed subject matter, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium wherein, when the program code is loaded into and executed by a machine, such as a computing device, the machine becomes an apparatus for practicing the presently disclosed subject matter. In the case of program code execution on programmable computers, the computing device generally includes a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. One or more programs may implement or utilize the processes described in connection with the presently disclosed subject matter, e.g., through the use of an application programming interface (API), reusable controls, or the like. Such programs may be implemented in a high level procedural or object-oriented programming language to communicate with a computer system. However, the program(s) can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language and it may be combined with hardware implementations.
By providing OBFL capability to pluggable optical modules, engineers may be able to more efficiently troubleshoot failures because the module environment variables, failure events and module statuses may be captured and stored by the pluggable module. In addition to facilitating reconstruction of the exact failure, providing OBFL capability to pluggable modules may reduce the cost of troubleshooting to the customer and minimize service interruptions with only an incremental cost increase in producing the pluggable module.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.