The present invention relates to the processing of state information, such as interrupts, in a hierarchical network of nodes having a tree configuration.
Modern computer systems often comprise many components interacting with one another in a highly complex fashion. For example, a server installation may include multiple processors, configured either within their own individual (uniprocessor) machines, or combined into one or more multiprocessor machines. These systems operate in conjunction with associated memory and disk drives for storage, video terminals and keyboards for input/output, plus interface facilities for data communications over one or more networks. The skilled person will appreciate that many additional components may also be present.
The ongoing maintenance of such complex systems can be an extremely demanding task. Typically various hardware and software components need to be upgraded and/or replaced, and general system administration tasks must also be performed, for example to accommodate new uses or users of the system. There is also a need to be able to detect and diagnose faulty behaviour, which may arise from either software or hardware problems.
One known mechanism for simplifying the system management burden is to provide a single point of control from which the majority of control tasks can be performed. This is usually provided with a video monitor and/or printer, to which diagnostic and other information can be directed, and also a keyboard or other input device to allow the operator to enter desired commands into the system.
It will be appreciated that such a centralised approach generally provides a simpler management task than a situation where the operator has to individually interact with all the different processors or machines in the installation. In particular, the operator typically only needs to monitor diagnostic information at one output in order to confirm whether or not the overall system is operating properly, rather than having to individually check the status of each particular component.
However, although having a single control terminal makes it easier from the perspective of a system manager, the same is not necessarily true from the perspective of a system designer. Thus the diagnostic or error information must be passed from the location where it is generated, presumably close to the source of the error, out to the single service terminal.
One known mechanism for collating diagnostic and other related system information is through the use of a service bus. This bus is terminated at one end by a service processor, which can be used to perform control and maintenance tasks for the installation. Downstream of the service processor, the service bus connects to all the different parts of the installation from which diagnostics and other information have to be collected.
(As a rough analogy, one can consider the service processor as the brain, and the service bus as the nervous system permeating out to all parts of the body to monitor and report back on local conditions. However, the analogy should not be pushed too far, since the service bus is limited in functionality to diagnostic purposes; it does not form part of the mainstream processing apparatus of the installation).
In designing the architecture of the service bus, there are various trade-offs that have to be made. Some of these are standard with communications devices, such as the (normally conflicting) requirements for speed, simplicity, scalability, high bandwidth or information capacity, and cheapness. However, there is also a specialised design consideration for the service bus, in that it is particularly likely to be utilised when there is some malfunction in the system. Accordingly, it is important for the service bus to be as reliable and robust as possible, which in turn suggests a generally low-level implementation.
One particular problem is that a single fault in a complex system will frequently lead to a sort of avalanche effect, with multiple errors being experienced throughout the system. There is a danger that in trying to report these errors, the service bus may be swamped or overloaded, hindering rapid and effective diagnosis of the fault.
In accordance with one embodiment of the invention, there is provided a method of processing interrupt state information in a hierarchical network of nodes having a tree configuration, comprising a root node at the top of the hierarchy, one or more intermediate nodes, and a plurality of leaf nodes at the bottom of the hierarchy. Each leaf node is linked to the root node by zero, one or more intermediate nodes. Intrinsic information is maintained at each leaf node about one or more interrupt states, and extrinsic information is maintained at each intermediate node. This extrinsic information is derived from the interrupt states of those leaf nodes below the intermediate node in the hierarchy. The method navigates from the root node to a first leaf node having at least one set interrupt state, and masks out the set interrupt state at the first leaf node. The extrinsic information in any intermediate nodes above the first leaf node in the hierarchy is then updated in accordance with the fact that the set interrupt state at the first leaf node is now masked out. This process is repeated for all other leaf nodes in the network having a set interrupt state.
A node typically represents a computer system or a component (such as a processor) within a computer system. The network can span one or more computer systems, with the nodes linked together by any suitable data communications links. Note that neither the nodes nor the communications links have to be homogeneous throughout the network. The method is also applicable to other forms of network in which interrupt information is transferred from one node to another.
Leaf nodes at the bottom of the tree store intrinsic information; in other words, as far as the network is concerned, intrinsic information is generated internally within the node where it is stored (although its ultimate origin may be outside the leaf node per se). This is to be contrasted with extrinsic information stored at intermediate nodes, which is dependent on the interrupt state of leaf nodes below the intermediate node in the hierarchy, rather than any internal state of the intermediate node itself.
In one embodiment, any change in the interrupt state of a node in the network is automatically propagated to those nodes above it in the hierarchy, which then update their extrinsic information in accordance with the changed interrupt state of the node. Thus if the interrupt state of a node changes, it spontaneously or autonomously sends notification of this to the node above it in the network, or sets a state on some line that can be detected by the other node.
In general, it is the responsibility of the root node to process the interrupt state information from all the leaf nodes. Since there are many leaf nodes for a single root node, it is important for the root node to be able to do this without being bombarded by excessive amounts of interrupt state data being sent back up the network. In one embodiment, this is assisted by two levels of consolidation. Firstly, within a leaf node itself, there can be multiple information items, each of which is set according to whether or not a corresponding interrupt is present, and each of which may be individually masked out. A leaf node is regarded as having a particular output state if at least one of these information items is set without being masked out. The extrinsic information maintained at those intermediate nodes above the leaf node in the hierarchy is then determined accordingly. Secondly, within an intermediate node, the extrinsic information represents a consolidated version of the individual interrupt states of all leaf nodes and any intermediate nodes below it in the hierarchy. This consolidated version is then regarded as representing the particular output state of the intermediate node, for passing up the tree. As a result of the above scheme, there is only a single overall (consolidated) interrupt status associated with any given node, whether a leaf node or an intermediate node, thereby providing a manageable information flow to the root node.
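The two levels of consolidation can be illustrated with a minimal sketch. This is not the claimed implementation: the `Node` class, its method names and the `sent` counter are hypothetical, introduced only to show that a node notifies its parent solely when its single consolidated output state changes.

```python
class Node:
    """Hypothetical node holding intrinsic bits and per-child extrinsic state."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.status = 0        # intrinsic information items, one bit each
        self.mask = 0          # a set mask bit hides the matching status bit
        self.child_state = {}  # extrinsic: last output state reported by each child
        self.reported = False  # consolidated state last passed up the tree
        self.sent = 0          # number of notifications sent upward (for the demo)

    def output(self):
        """Single consolidated state: any unmasked item or any set child."""
        intrinsic = (self.status & ~self.mask) != 0
        return intrinsic or any(self.child_state.values())

    def _refresh(self):
        """Notify the parent only if the consolidated state has changed."""
        new_state = self.output()
        if new_state != self.reported:
            self.reported = new_state
            if self.parent is not None:
                self.sent += 1
                self.parent.on_child_change(self.name, new_state)

    def on_child_change(self, child_name, state):
        self.child_state[child_name] = state
        self._refresh()

    def set_interrupt(self, bit):
        self.status |= 1 << bit
        self._refresh()

    def mask_out(self, bit):
        self.mask |= 1 << bit
        self._refresh()


root = Node("root")
router = Node("router", parent=root)
leaf1 = Node("leaf1", parent=router)
leaf2 = Node("leaf2", parent=router)

leaf1.set_interrupt(0)  # propagates leaf1 -> router -> root
leaf1.set_interrupt(3)  # leaf1 already has the output state: nothing is sent
leaf2.set_interrupt(1)  # reaches the router, which is already set: root hears nothing
messages_to_root = router.sent
```

In this sketch three interrupts at two leaf nodes result in only one notification reaching the root node, which is the intended reduction in upward traffic.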
A limitation of the above approach is that once an intermediate node is set to the consolidated output state, it is, in effect, saturated. In other words, it can no longer respond if another leaf node below it is set to the particular output state, since there will not be any change in the consolidated status for the intermediate node. However, the method described above allows the network to re-sensitise itself. In one embodiment, this is done by repeatedly descending through those intermediate nodes whose extrinsic information indicates that a leaf node below it has the particular output state, and masking out the set interrupt states at the relevant leaf nodes (typically on an item by item basis). This has the effect of removing the particular output state of this leaf node from the consolidated version seen by intermediate nodes above the leaf node in the hierarchy, which in turn allows output state information from other leaf nodes to propagate up this route.
Thus the particular output state of each leaf node can be examined one at a time, and any interrupt states contained within that leaf node masked out. This then provides a systematic and controlled approach for the root node to investigate interrupt status at the various leaf nodes.
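The descent and masking procedure can be sketched as follows. This is an illustrative sketch only: the `Node` dataclass and the functions `find_active` and `sweep` are hypothetical names, and the consolidated state is recomputed on demand here for brevity, whereas in the network described above it is propagated upward by the nodes themselves.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical node: status/mask bits plus any children below it."""
    name: str
    status: int = 0  # intrinsic interrupt bits
    mask: int = 0    # a set mask bit hides the matching status bit
    children: List["Node"] = field(default_factory=list)

    def output(self) -> bool:
        """Consolidated state: any unmasked bit here or in any node below."""
        if self.status & ~self.mask:
            return True
        return any(child.output() for child in self.children)


def find_active(node: Node) -> Optional[Node]:
    """Descend only through nodes whose consolidated state is set."""
    if node.status & ~node.mask:
        return node
    for child in node.children:
        if child.output():
            return find_active(child)
    return None


def sweep(root: Node) -> List[Tuple[str, int]]:
    """Visit one node at a time, masking out its set interrupt states."""
    seen = []
    while (node := find_active(root)) is not None:
        pending = node.status & ~node.mask
        seen.append((node.name, pending))
        node.mask |= pending  # mask only; substantive resetting is deferred
    return seen


leaf_a = Node("A", status=0b0101)
leaf_b = Node("B")
leaf_c = Node("C", status=0b0010)
root = Node("root", children=[
    Node("r1", children=[leaf_a, leaf_b]),
    Node("r2", children=[leaf_c]),
])
seen = sweep(root)
```

Once leaf A has been masked, the consolidated state of the branch containing it clears, so the next descent from the root reaches leaf C; this is the re-sensitising effect described above.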
Note that the interrupt states of the leaf node are simply masked out to allow the network to be quickly re-sensitised. Any more substantive processing and resetting of the interrupt states of a leaf node is likely to be more time-consuming, and so is deferred until later. Although the network can no longer detect the state of the masked information items, this is acceptable because in many circumstances the event of most interest is when an information item first indicates the presence of a particular interrupt state. Subsequent transitions in this interrupt state are then of lesser interest until the root node or some other control system is properly able to reset the item (or more accurately, the underlying component or device with which the interrupt is associated). At this point, the mask for the interrupt can be cleared, so that the network is once again sensitised to this information item.
In one embodiment, each information item comprises a binary variable representing the presence or absence of an interrupt. A status register is used for storing the information items as individual bits, and a masking register is used for storing a plurality of mask bits. Each mask bit corresponds to an information item in the status register, so that an information item can be masked out by setting the corresponding mask bit. (Of course, the mask can be configured as having negative or positive polarity).
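As a minimal sketch of the register arrangement, assuming positive mask polarity (a set mask bit hides the corresponding status bit) and an arbitrary register width, the output state of a node can be derived as shown below. The class and method names are hypothetical.

```python
class InterruptRegisters:
    """Hypothetical status and masking registers, one bit per information item."""

    def __init__(self, width=8):
        self.width = width               # assumed register width
        self.all_bits = (1 << width) - 1
        self.status = 0                  # status register
        self.mask = 0                    # masking register

    def raise_interrupt(self, bit):
        self.status |= 1 << bit

    def clear_interrupt(self, bit):
        self.status &= ~(1 << bit) & self.all_bits

    def mask_out(self, bit):
        self.mask |= 1 << bit

    def unmask(self, bit):
        self.mask &= ~(1 << bit) & self.all_bits

    def unmasked(self):
        """Status bits that are set and not masked out."""
        return self.status & ~self.mask & self.all_bits

    def output_state(self):
        """True if at least one item is set without being masked out."""
        return self.unmasked() != 0


regs = InterruptRegisters()
regs.raise_interrupt(2)
regs.raise_interrupt(5)
regs.mask_out(2)
visible = regs.unmasked()  # only bit 5 remains visible
```

Masking leaves the status register untouched, so the underlying interrupt information is still available for later, more substantive processing.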
In one embodiment, at least one intermediate node in the network also maintains intrinsic information comprising one or more information items. Each of these items can be set according to whether or not a corresponding interrupt is present, and each item can be individually masked out. This intrinsic information can be processed in substantially the same manner as the intrinsic information in leaf nodes. (Note that the consolidated interrupt status of such an intermediate node is set to indicate the presence of an interrupt if any information item therein is set to indicate the presence of an interrupt, or if any leaf node below it in the hierarchy has an interrupt present).
In another embodiment of the invention, there is provided a method of processing interrupt state information in a leaf node in a hierarchical network of nodes. The network has a tree configuration comprising a root node at the top of the hierarchy, one or more intermediate nodes, and a plurality of leaf nodes at the bottom of the hierarchy. Each leaf node is linked to the root node by zero, one or more intermediate nodes. The method involves maintaining one or more information items at the leaf node, each of which may be set according to whether or not a corresponding interrupt is present. Each information item may also be individually masked out. The leaf node is regarded as having a particular output state if at least one of the information items is set to indicate the presence of an interrupt without being masked out. It is assumed that initially the leaf node does not have the particular output state, but subsequently at least one information item is set to indicate that an interrupt is present. This first change in interrupt state of the leaf node is propagated to the intermediate node above it in the hierarchy. Responsive to a command received over the network, the relevant set interrupt state is then masked out, and consequently a second change in the particular output state of the leaf node is now propagated to the intermediate node above it in the hierarchy.
In another embodiment, there is provided a method of processing interrupt state information in an intermediate node in a hierarchical network of nodes. The network has a tree configuration comprising a root node at the top of the hierarchy, one or more intermediate nodes, and a plurality of leaf nodes at the bottom of the hierarchy. Each leaf node is linked to the root node by zero, one or more intermediate nodes. The method involves maintaining at the intermediate node an extrinsic information item representing a consolidated version of whether an interrupt state is present in any leaf node or intermediate node below the intermediate node in the hierarchy, and one or more intrinsic information items, each of which may be set to indicate the presence of a corresponding interrupt state, and each of which may be individually masked out. The intermediate node is set to have an overall interrupt state if at least one of the intrinsic or extrinsic information items indicates the presence of an interrupt state without being masked out. The intermediate node is responsive to a command from higher in the network to mask out any intrinsic information item that is set to indicate the presence of an interrupt state, with any change in the overall interrupt state of the intermediate node then being propagated up the network hierarchy.
In accordance with another embodiment of the invention, there is provided apparatus forming a hierarchical network of nodes having a tree configuration, comprising a root node at the top of the hierarchy, one or more intermediate nodes, and a plurality of leaf nodes at the bottom of the hierarchy. Each leaf node is linked to the root node by zero, one or more intermediate nodes. Each leaf node includes memory for maintaining intrinsic information about whether one or more interrupt states in the leaf node are set, a mask corresponding to each interrupt state, for causing the state to be disregarded if the mask is set, and a communications link to an intermediate node. The leaf node is responsive to a change in one or more interrupt states to notify the intermediate node accordingly over the communications link. Each intermediate node includes memory for maintaining extrinsic information about leaf nodes below it in the hierarchy having at least one set interrupt state. The apparatus further includes logic for processing each leaf node in turn having at least one set interrupt state to mask out the set interrupt state.
In accordance with another embodiment of the invention, there is provided apparatus for use as a leaf node in a hierarchical network of nodes. The network has a tree configuration comprising a root node at the top of the hierarchy, one or more intermediate nodes, and a plurality of leaf nodes at the bottom of the hierarchy. Each leaf node is linked to the root node by zero, one or more intermediate nodes. The apparatus comprises memory for maintaining one or more information items at the leaf node, each of which is set according to whether or not a corresponding interrupt is present, and each of which may be individually masked out responsive to a command received over the network. The leaf node is regarded as having a particular output state if at least one of the information items is set without being masked out. Initially it is assumed that the leaf node does not have the particular output state. The apparatus further comprises logic for setting at least one information item to indicate that a corresponding interrupt is present, and a communications link for connection to an intermediate node immediately above the leaf node in the hierarchy, to allow a change in the output state of the leaf node to be automatically propagated over the link to the intermediate node.
In accordance with another embodiment, there is provided apparatus for use as an intermediate node in a hierarchical network of nodes. The network has a tree configuration comprising a root node at the top of the hierarchy, one or more intermediate nodes, and a plurality of leaf nodes at the bottom of the hierarchy. Each leaf node is linked to the root node by zero, one or more intermediate nodes. The apparatus includes a memory for storing an extrinsic information item representing a consolidated version of whether an interrupt is present in any leaf node or intermediate node in the hierarchy below the intermediate node, and for storing one or more intrinsic information items, each of which may be set to indicate the presence of a corresponding interrupt, and each of which may be individually masked out. The apparatus further includes logic for setting the intermediate node to have an overall interrupt state, if any of the intrinsic or extrinsic information items in the intermediate node indicates the presence of an interrupt without having been masked out. In addition, the logic is responsive to a predetermined command from higher in the network to mask out any intrinsic information items that indicate the presence of an interrupt. The apparatus also includes a communications link for propagating any change in the overall interrupt state of the intermediate node automatically up the network hierarchy.
In accordance with another embodiment of the invention, there is provided a computer program product comprising machine readable program instructions. When loaded into one or more devices these can be executed by the device(s) to implement the methods described above. Note that the program instructions are typically supplied as a software product for download over a physical wired or wireless network, such as the Internet, or on a physical storage medium such as DVD or CD-ROM. In either case, the software can then be loaded into machine memory for execution by an appropriate processor (or processors), or by some other semiconductor device, and may also be stored on a local non-volatile storage, such as a hard disk drive. The program instructions may also represent microcode or firmware, potentially supplied preloaded into a machine, for example by storage in a ROM, or burnt into a programmable logic array (PLA).
It will be appreciated that the embodiments based on apparatus and computer program products can generally utilise the same particular features as described above in relation to the method embodiments.
Various embodiments of the invention will now be described in detail by way of example only with reference to the following drawings in which like reference numerals pertain to like elements and in which:
The service bus 200 of
Note that a node may comprise a wide variety of possible structures from one or more whole machines, down to an individual component or a device within such a machine, such as an application specific integrated circuit (ASIC). There may be many different types of node linked to the service bus 205. The only requirement for a node is that it must be capable of communicating with other nodes over the service bus 205.
For simplicity, the tree architecture in
It will be appreciated that within the above constraints a great variety of tree configurations are possible. For example, in some trees the leaf chips may have a constant depth, in terms of the number of levels within the hierarchy. In contrast, the tree of
It will also be appreciated that the single path in
A computing installation incorporating a service bus is illustrated in
Computer system 100 also incorporates a service bus, headed by service processors 50A and 50B. Each of these can be implemented by a workstation or similar, including associated memory 54, disk storage 52 (for non-volatile recording of diagnostic information), and I/O unit 56. In the embodiment of
The topology of the service bus in
The leaf chips 100 and router chips are typically formed as application specific integrated circuits (ASICs), with the leaf chips being linked to or incorporated in the device that they are monitoring. As will be described in more detail below, a given chip may function as both a router chip and a leaf chip. For example, router chip 60F and leaf chip 140B might be combined into a single chip. Note also that although not shown in
In the particular embodiment illustrated in
In one particular embodiment, the service processor 201 is connected to the topmost router chip 202A (see
A packet sent over service bus 205 generally contains certain standard information, such as an address to allow packets from the service processor to be directed to the desired router chip or leaf node. The skilled person will be aware of a variety of suitable addressing schemes. The service processor is also responsible for selecting a particular route that a packet will take to a given target node, if the service bus topology provides multiple such routes. (Note that Response packets in general simply travel along the reverse path of the initial Send packet). In addition, a packet typically also includes a synchronisation code, to allow the start of the packet to be determined, and error detection/correction facilities (e.g. parity, CRC, etc.); again, these are well within the competence of the skilled person. Note that if an error is detected (but cannot be corrected), then the detecting node may request a retransmission of the corrupted packet, or else the received packet may simply be discarded and treated as lost. This will generally then trigger one or more time-outs, as discussed in more detail below.
The architecture of the service bus can be regarded as SP-centric, in that it is intended to provide a route for diagnostic information to accumulate at the service processor. However, one difficulty with this approach is that as communications move up the hierarchy, there is an increasing risk of congestion. This problem is most acute for the portion of the service bus between router chip 202A and service processor 201 (see
The standard mechanism for reporting a system problem over the service bus 205 is to raise an interrupt. However, the inter-relationships between various components in a typical system installation may cause propagation of an error across the system. As a result, one fault will frequently produce not just a single interrupt, but rather a whole chain of interrupts, as the original error leads to consequential errors occurring elsewhere in the system. For example, if a storage facility for some reason develops a fault and cannot retrieve some data, then this error condition may be propagated to all processes and/or devices that are currently trying to access the now unavailable data.
Indeed, it is possible for a single fault at one location to cause a thousand or more interrupt signals to be generated from various other locations in a complex installation. In the service bus architecture of
Leaf chip 203 includes two flip-flops shown as I0 301 and I2 302. The output of these two flip-flops is connected to a comparator 305. Router chip 202 includes a further flip-flop, I1 303. The state of flip-flop I0 is determined by some interrupt parameter. In other words, I0 is set directly in accordance with whether or not a particular interrupt is raised. The task of I1 is then to try to mirror the state of I0. Thus I1 contains the state that router chip 202 believes currently exists in flip-flop I0 in leaf chip 203. Lastly, flip-flop I2 302 serves to mirror the state of I1, so that the state of I2 represents what the leaf chip 203 believes is the current state of flip-flop I1 in router chip 202.
It is assumed that initially all three flip-flops, I0, I1, and I2, are set to 0, thereby indicating that no interrupts are present (the system could of course also be implemented with reverse polarity, i.e., with 0 indicating the presence of an interrupt). Note that this is a stable configuration, in that I1 is correctly mirroring I0, and I2 is correctly mirroring I1. We now assume that an interrupt signal is received at flip-flop I0, in other words some hardware component within leaf chip 203 raises an interrupt signal which sets the state of flip-flop I0 so that it is now equal to 1. At this point we therefore have the configuration (1, 0, 0) in I0, I1, and I2 respectively.
Once I0 has been set to indicate the presence of an interrupt, the comparator 305 now detects that there is a discrepancy between the state of I0 and I2, since the latter remains at its initial setting of 0. The leaf chip 203 responds to the detection of this disparity by sending an interrupt packet on the service bus 205 to router chip 202. This transmission is autonomous, in the sense that the bus architecture permits such interrupt packets to be initiated by a leaf node (or router chip) as opposed to just the service processor.
When router chip 202 receives the interrupt packet from leaf chip 203, it has to update the status of flip-flop I1. Accordingly, the value of I1 is changed from 0 to 1, so that we now have the state of (1, 1, 0) for I0, I1, and I2 respectively. Having updated the value of I1, the router chip 202 now sends a return packet to the leaf chip 203 confirming that the status of I1 has indeed been updated. The leaf chip 203 responds to this return packet by updating the value of the flip-flop I2 from 0 to 1. This means that all three of the flip-flops are now set to the value 1. Consequently, the comparator 305 will now detect that I0 and I2 are again in step with one another, having matching values. It will be appreciated that at this point the system is once more in a stable configuration, in that I1 correctly reflects the value of I0, and I2 correctly reflects the value of I1.
In one particular embodiment, the interrupt packet sent from leaf chip 203 to router chip 202 contains four fields. The first field is a header, containing address information, etc., and the second field is a command identifier, which in this case identifies the packet as an interrupt packet. The third field contains the actual updated interrupt status from I0, while the fourth field provides a parity or CRC checksum. The acknowledgement to such an interrupt packet then has exactly the same structure, with the interrupt status now being set to the value stored at I1.
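The four-field layout can be illustrated with a short sketch. The field widths, the one-byte address header, the command value `CMD_INTERRUPT` and the choice of CRC-32 are all assumptions made for illustration; the text above specifies only the order and purpose of the fields.

```python
import struct
import zlib

CMD_INTERRUPT = 0x01             # hypothetical command identifier

_BODY = struct.Struct(">BBB")    # header (address), command, interrupt status
_CHECK = struct.Struct(">I")     # CRC-32 over the three preceding bytes


def build_packet(address, command, status):
    """Pack header, command and status, then append the checksum."""
    body = _BODY.pack(address, command, status)
    return body + _CHECK.pack(zlib.crc32(body))


def parse_packet(packet):
    """Return (address, command, status), or raise ValueError if corrupt."""
    if len(packet) != _BODY.size + _CHECK.size:
        raise ValueError("wrong packet length")
    body, check = packet[:_BODY.size], packet[_BODY.size:]
    if _CHECK.unpack(check)[0] != zlib.crc32(body):
        raise ValueError("checksum mismatch")
    return _BODY.unpack(body)


# Leaf reports I0 = 1; the router acknowledges with the value now held in I1.
interrupt_packet = build_packet(address=0x2A, command=CMD_INTERRUPT, status=1)
address, command, i0 = parse_packet(interrupt_packet)
i1 = i0
ack_packet = build_packet(address, command, i1)
```

Because the acknowledgement has the same structure as the interrupt packet, a single builder and parser serve both directions in this sketch.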
In order to regulate the above operations, a time-out mechanism is provided in leaf chip 203. This provides a timer T1 304A, which is set whenever an interrupt packet is sent from leaf chip 203 to router chip 202. A typical value for this initial setting of timer T1 might be say 1 millisecond, although this will of course vary according to the particular hardware involved. The timer then counts down until confirmation arrives back from the router chip 202 that it received the interrupt packet and updated its value of the flip-flop I1 accordingly. If however the confirmation packet is not received before the expiry of the time-out period, then leaf chip 203 resends the interrupt packet (and also resets the timer). This process is continued until router chip 202 does successfully acknowledge receipt of the interrupt packet (there may be a maximum number of retries, after which some error status is flagged).
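The resend behaviour governed by timer T1 can be sketched as below. This is a simplified model under stated assumptions: the hypothetical callable `transmit` sends one interrupt packet and returns the acknowledgement, or `None` if timer T1 expires first, and the retry limit of three is an arbitrary illustrative value.

```python
def send_with_retry(transmit, max_retries=3):
    """Send an interrupt packet, resending each time timer T1 expires.

    Returns (attempts, acknowledgement). Raises TimeoutError once the
    initial send and max_retries resends have all gone unacknowledged,
    standing in for the error status mentioned in the text.
    """
    for attempt in range(1, max_retries + 2):
        ack = transmit()          # timer T1 is (re)set on every send
        if ack is not None:
            return attempt, ack
    raise TimeoutError("no acknowledgement from router chip")


class LossyLink:
    """Hypothetical link that loses the first `drops` packets."""

    def __init__(self, drops):
        self.drops = drops

    def transmit(self):
        if self.drops > 0:
            self.drops -= 1
            return None           # T1 expired without an acknowledgement
        return "ack"


attempts, ack = send_with_retry(LossyLink(drops=2).transmit)
```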
It will be appreciated that removal or resetting of the interrupt occurs in substantially the same fashion as the initial setting of the interrupt. Thus the reset is triggered by flip-flop I0 being returned to 0, thereby indicating that the associated interrupt has been cleared. The comparator 305 now detects that there is a discrepancy between I0 and I2, since the latter is still set to a value of 1. This reflects the fact that from the perspective of the router chip 202, flip-flop I0 is supposedly still set to indicate the presence of an interrupt. As before, this discrepancy results in the transmission of an interrupt signal (packet) from the leaf chip 203 to the router chip 202 over service bus 205, indicating the new status of flip-flop I0. On receipt of this message, the router chip updates the value of flip-flop I1 so that it now matches I0. At this point, there is a status of (0, 0, 1) for I0, I1, and I2 respectively.
The router chip 202 now sends a message back to the leaf chip 203 confirming that it has updated its value of I1. (Note that the leaf chip 203 uses the same time-out mechanism while waiting for this confirmation as when initially setting the interrupt). Once the confirmation has been received, this results in the leaf chip updating the value of I2 so that this too is set back to 0. At this point the system has now returned to its initial (stable) state where all the flip-flops (I0, I1, and I2) are set to 0.
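The sequence of states for both setting and clearing an interrupt can be traced with a small sketch. The function `report` is a hypothetical name, and the model ignores packet loss and timing; it records only the (I0, I1, I2) tuples described above.

```python
def report(new_i0, state):
    """Trace (I0, I1, I2) as a change in I0 is reported and acknowledged."""
    i0, i1, i2 = state
    i0 = new_i0
    trace = [(i0, i1, i2)]         # interrupt source changes I0
    if i0 != i2:                   # comparator 305 detects the discrepancy
        i1 = i0                    # router chip updates I1 on the interrupt packet
        trace.append((i0, i1, i2))
        i2 = i1                    # leaf chip updates I2 on the acknowledgement
        trace.append((i0, i1, i2))
    return trace


set_trace = report(1, (0, 0, 0))
clear_trace = report(0, set_trace[-1])
```

Each trace begins with a disparity between I0 and I2 and ends in a stable configuration in which all three flip-flops agree.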
The interrupt reporting scheme just described can also be exploited for certain other diagnostic purposes. One reason that this is useful is that interrupt packets are allowed to do certain things that are not otherwise permitted on the service bus (such as originate at a child node). In addition, re-use of interrupt packets for other purposes can help to generally minimise overall traffic on the service bus.
In one embodiment these additional diagnostic capabilities are achieved by use of a second timer T2 304B within the leaf chip 203. This second timer represents a heartbeat timer, in that it is used to regularly generate an interrupt packet from leaf node 203 to router chip 202, in order to reassure router chip 202 that leaf chip 203 and connection 205 are both properly operational, even if there is no actual change in interrupt status at leaf node 203. Thus if the router chip 202 does not hear from leaf node 203 for a prolonged period, this may be either because the leaf chip 203 is working completely correctly, and so not raising any interrupts, or alternatively it may be because there is some malfunction in the leaf chip 203 and/or the serial bus connection 205 that is preventing any interrupt from being reported. By using the timer T2 to send the interrupt signal as a form of heartbeat, the router node can distinguish between these two situations.
Timer T2 is set to a considerably longer time-out period than timer T1, for example 20 milliseconds (although again this will vary according to the particular system). If an interrupt packet is generated due to a change in interrupt status at leaf chip 203, as described above, within the time-out period of T2, then timer T2 is reset. This is because the interrupt packet sent from leaf chip 203 to router chip 202 obviates the need for a heartbeat signal, since it already indicates that the leaf chip and its connection to the router chip are still alive. (Note that dependent on the particular implementation, T2 may be reset either when the interrupt packet is sent from leaf chip 203, or when the acknowledgement is received back from router chip 202).
However, if timer T2 counts down without such an interrupt packet being sent (or acknowledgement received), then the expiry of T2 generates an interrupt packet itself for sending from leaf chip 203 to router chip 202. Of course, the interrupt status at leaf chip 203 has not actually changed, but the transmission of the interrupt packet on expiry of T2 serves two purposes. Firstly, it acts as a heartbeat to router chip 202, indicating the continued operation of leaf chip 203 and connection 205. Secondly, it helps to maintain proper synchronisation between I0, I1, and I2, in case one of them is incorrectly altered at some stage, without this change otherwise being detected.
In order to make use of the heartbeat signal from leaf chip 203, a timer T3 304C is added to the router chip 202. This timer is reset each time an interrupt packet (and potentially any other form of packet) from the leaf chip 203 is received at the router chip 202. The time-out period of this timer is somewhat longer than the heartbeat time-out period set for T2 at leaf chip 203, for example, thirty milliseconds or more. Provided that another interrupt packet is received within this period, timer T3 on the router chip 202 is reset, and will not reach zero.
However, if no further interrupt packets are received from leaf chip 203, then this timer will count down to zero (i.e. it will time-out). In this case the router chip knows that there is some problem with the connection 205 and/or with the leaf chip 203 itself. This is because when everything is properly operational, it is known that leaf chip 203 will generate at least one interrupt packet within the heartbeat period, as specified by T2. In contrast, the expiry of T3 indicates that no interrupt packet has been received from leaf chip 203 within a period significantly longer than the heartbeat interval (assuming of course that T3 is properly set in relation to T2). At this point, the router chip 202 can perform the appropriate action(s) to handle the situation. This may include setting an interrupt status within itself, which in turn will lead to the situation being reported back to the service processor 201 (as described below).
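The interplay between the heartbeat timer T2 and the watchdog timer T3 might be sketched as follows. This is only an illustrative discrete-time simulation: the class names, the 1 ms tick, and the `link_fails_at_ms` parameter are assumptions introduced for the example, and only the 20 ms and 30 ms periods echo the example values given above.

```python
T2_HEARTBEAT_MS = 20   # example heartbeat period for leaf timer T2
T3_WATCHDOG_MS = 30    # example watchdog period for router timer T3


class Leaf:
    """Hypothetical leaf chip that sends a packet whenever T2 expires."""

    def __init__(self):
        self.t2 = T2_HEARTBEAT_MS

    def tick(self):
        """Advance 1 ms; return True if a heartbeat packet is sent."""
        self.t2 -= 1
        if self.t2 == 0:
            self.t2 = T2_HEARTBEAT_MS   # restart the heartbeat period
            return True
        return False


class Router:
    """Hypothetical router chip whose T3 is reset by every received packet."""

    def __init__(self):
        self.t3 = T3_WATCHDOG_MS
        self.link_fault = False

    def tick(self, packet_received):
        """Advance 1 ms; flag a fault if T3 reaches zero."""
        if packet_received:
            self.t3 = T3_WATCHDOG_MS
        else:
            self.t3 -= 1
            if self.t3 == 0:
                self.link_fault = True   # e.g. set a local interrupt status


def simulate(total_ms, link_fails_at_ms=None):
    """Return the time (ms) at which the router flags a fault, or None."""
    leaf, router = Leaf(), Router()
    for now in range(1, total_ms + 1):
        sent = leaf.tick()
        link_ok = link_fails_at_ms is None or now < link_fails_at_ms
        router.tick(sent and link_ok)
        if router.link_fault:
            return now
    return None
```

With a healthy link, `simulate(200)` never flags a fault because a heartbeat arrives every 20 ms, well inside the 30 ms watchdog period. If the link is assumed to fail at 50 ms, the last packet to get through is the one sent at 40 ms, so the fault is flagged 30 ms later.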
As well as providing a heartbeat signal, the interrupt packets can also be used for testing signal integrity over connection 205. This can be done by reducing the setting of timer T2 from its normal or default value to a much shorter one, say 20 microseconds (note that if the reset of T2 is triggered by the transmission of an interrupt packet from leaf chip 203, rather than by the receipt of the following acknowledgement, the setting of T2 for this mode of testing should allow time for this acknowledgement to be received). This then leads to a rapid exchange of interrupt packets and acknowledgements over 205, at a rate increased by a factor of about 1000 compared to the normal heartbeat rate. This represents a useful testing exercise, in that if connection 205 is able to adequately handle transmissions at this very high rate, then it should not have difficulty with the much lower rate of normal interrupt reporting and heartbeat signals. Note that such testing and the setting of timer T2 are performed under the general control of the service processor 201.
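The factor of about 1000 follows directly from the two example timer settings. The short calculation below is only a worked check; the variable names are illustrative, and the packet rates assume one interrupt packet per expiry of T2 with no other traffic.

```python
NORMAL_T2_US = 20_000   # 20 milliseconds, the example heartbeat period
TEST_T2_US = 20         # 20 microseconds, the example test-mode period

# Ratio of the two periods, i.e. the increase in packet rate.
rate_increase = NORMAL_T2_US // TEST_T2_US

# Packets per second if one packet is sent per T2 expiry.
normal_packets_per_second = 1_000_000 // NORMAL_T2_US
test_packets_per_second = 1_000_000 // TEST_T2_US
```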
In the embodiment shown in
Each incoming link is terminated by a control block, namely control block 410 in respect of link 205b and control block 420 in respect of link 205a. The control blocks perform various processing associated with the transmission of packets over the service bus 205, for example adding packet headers to data transmissions, checking for errors on the link, and so on. Many of these operations are not directly relevant to an understanding of the present invention and so will not be described further, but it will be appreciated that they are routine for the person skilled in the art. Note that control blocks 410 and 420 each contain a timer, denoted 411 and 421 respectively. These correspond to timer T3 304C in
Associated with each control block 410, 420 is a respective flip-flop, or more accurately respective registers 415, 425, each comprising a set of four flip-flops. These registers correspond to the flip-flop I1 shown in
As previously described in relation to
Once router chip 202 has received interrupt status information from nodes below it in the hierarchy, it must of course also be able to pass this information up the hierarchy, so that it can make its way to the service processor 201. In order to avoid congestion near the service processor, an important part of the operation of the router node 202 is to consolidate the interrupt information that it receives from its child nodes. Accordingly, the interrupt values stored in registers 415 and 425 (plus any other equivalent units if router node 202 has more than two child nodes) are fed into OR gate 440, and the result is then passed for storage into register 445. Register 445 again comprises four flip-flops, one for each of the different interrupt levels, and the consolidation of the interrupt information is performed independently for each of the four interrupt levels.
Consequently, register 445 presents a consolidated status for each interrupt level indicating whether any of the child nodes of router chip 202 currently has an interrupt set. Indeed, as will later become apparent, register 445 in fact represents the consolidated interrupt status for all descendant nodes of router chip 202 (i.e. not just its immediate child nodes, but their child nodes as well, and so on down to the bottom of the service bus hierarchy).
It is also possible for router node 202 to generate its own local interrupts. These may arise from local processing conditions, reflecting operation of the router node itself (which may have independent functionality or purpose over and above its role in the service bus hierarchy). Alternatively (or additionally), the router node may also generate a local interrupt because of network conditions, for example if a heartbeat signal such as discussed above fails to indicate a live connection to a child node.
The locally generated interrupts of the router chip 202, if any, are produced by local interrupt unit 405, which will be described in more detail below, and are stored in the block of flip-flops 408. Again it is assumed that there are four independent levels of interrupt, and accordingly register 408 comprises four individual flip-flops.
An overall interrupt status for router node 202 can now be derived based on (a) the consolidated interrupt status for all of its child (descendant) nodes, as stored in register 445; and (b) its own locally generated interrupt status, as stored in register 408. In particular, these are combined via OR gate 450 and the result is stored in register 455. As before, the four interrupt levels are handled independently, so that OR gate 450 in fact represents four individual OR gates operating in parallel, one for each interrupt level.
The results of this OR operation are stored in register 455, and correspond in effect to the value of I0 for router node 202, as described in relation to
Router chip 202 further includes a register 456 comprising four flip-flops, which are used in effect to store the value of I2 (see
Router chip 202 therefore acts both as a parent node to receive interrupt status from lower nodes, and also as a child node in order to report this status further up the service bus hierarchy. Note that the interrupt status that is reported over link 205C represents the combination of both the locally generated interrupts from router chip 202 (if any), plus the interrupts received from its descendant nodes (if any).
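The consolidation performed inside the router chip can be modelled in software as a pair of bitwise OR operations. The sketch below is an assumption-laden illustration, not the hardware itself: each register is represented as a four-bit integer with one bit per interrupt level, and the function name `consolidate` is hypothetical.

```python
NUM_LEVELS = 4
LEVEL_MASK = (1 << NUM_LEVELS) - 1   # 0b1111, one bit per interrupt level


def consolidate(child_registers, local_register=0):
    """Model OR gates 440 and 450 for four-bit register values.

    child_registers: values held in registers 415, 425, ... (one per child).
    local_register:  value held in register 408 (local interrupts).
    Returns (reg445, reg455) as four-bit integers.
    """
    reg445 = 0
    for value in child_registers:
        reg445 |= value & LEVEL_MASK          # OR gate 440, per level
    reg455 = (reg445 | local_register) & LEVEL_MASK   # OR gate 450, per level
    return reg445, reg455
```

Because the OR is applied bit by bit, each of the four interrupt levels is consolidated independently, which mirrors the four parallel OR gates described above.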
After the interrupt packet triggered by a positive signal from comparator 460 is transmitted upstream, a response packet should be received in due course over link 205C. This will contain an updated value of I1 (see
The control unit 430 also includes timers T1 431 and T2 432, whose function has already been largely described in relation to
The skilled person will be aware that there are many possible variations on the implementation of
It is also possible to implement timers T1 and T2 by a single timer for the standard mode of operation. This single timer then has two settings: a first, which is relatively short, is used to drive packet retransmission in the absence of an acknowledgement, and the second, relatively long, is used to drive a heartbeat signal. One mechanism for controlling the timer is then based on outgoing and incoming transmissions, whereby sending an interrupt packet (re)sets timer 431 to its relatively short value, while receiving an acknowledgement packet (re)sets the timer 431 to its relatively long value. Alternatively, the timer may be controlled by a comparison of the values of I0 and I2, in that if these are (or are changed to be) the same, then the longer time-out value is used, while if these are (or are changed to be) different, then the shorter time-out value is used.
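A single timer with two settings could be sketched as below. The class and method names are hypothetical, the 1 ms tick is a simplification, and the short time-out value is an assumed placeholder since no figure for it is given here; only the 20 ms heartbeat value echoes the earlier example.

```python
SHORT_ACK_MS = 1         # assumed retransmission time-out (placeholder value)
LONG_HEARTBEAT_MS = 20   # example heartbeat time-out


class DualModeTimer:
    """One timer that serves as both T1 (short) and T2 (long)."""

    def __init__(self):
        self.remaining = LONG_HEARTBEAT_MS

    def on_packet_sent(self):
        """Sending an interrupt packet (re)sets the short value."""
        self.remaining = SHORT_ACK_MS

    def on_ack_received(self):
        """Receiving an acknowledgement (re)sets the long value."""
        self.remaining = LONG_HEARTBEAT_MS

    def reload_from_status(self, i0, i2):
        """Alternative control: pick the period by comparing I0 and I2."""
        self.remaining = LONG_HEARTBEAT_MS if i0 == i2 else SHORT_ACK_MS

    def tick(self):
        """Advance 1 ms; return True once the timer has expired."""
        self.remaining -= 1
        return self.remaining <= 0
```

Both control mechanisms described above are shown: the first pair of methods reacts to outgoing and incoming transmissions, while `reload_from_status` selects the period from the comparison of I0 and I2.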
A further possibility is that node 202 does not have any locally generated interrupts, so that block 405 and register 408 are effectively missing. Conversely, if node 202 is a leaf chip node, then there will be no incoming interrupt status to forward up the service bus hierarchy, hence there will be no interrupts received at gate 440, which can therefore be omitted. In either of these two cases it will be appreciated that gate 450 also becomes redundant and the interrupt status, whether locally generated or from a child node, can be passed directly onto register 455.
It will also be recognised that while registers 445 and 408 have been included in
The processing of
The method now proceeds to step 915 where a comparison is made as to whether or not I0 and I2 are the same. If I0 has not been updated (i.e., step 910 has been bypassed because of a negative outcome to step 905), then I0 and I2 will still be the same, and so processing will return back up to step 905 via step 955, which detects whether or not the timer, as set to the heartbeat value, has expired. This represents in effect a wait loop that lasts until a change to interrupt status does indeed occur, or until the system times out.
In either eventuality, processing then proceeds to send an interrupt packet from the child node to the parent node (step 920). As previously described, the interrupt packet contains the current interrupt status. Note that if step 920 has been reached via a positive outcome from step 955 (expiry of the heartbeat timer), then this interrupt status should simply repeat information that has previously been transmitted. On the other hand, if step 920 has been reached via a negative outcome from step 915 (detection of a difference between I0 and I2), then the interrupt status has been newly updated, and this update has not previously been notified to the parent node.
Following transmission of the interrupt packet at step 920, the timer is set (step 925), to its acknowledgement value. A check is now made to see whether or not this time-out period has expired (step 930). If it has indeed expired, then it is assumed that the packet has not been successfully received by the parent node and accordingly the method loops back up to step 920, which results in the retransmission of the interrupt packet. On the other hand, if the time-out period is still in progress, then the method proceeds to step 935 where a determination is made as to whether or not a confirmation packet has been received. If not, the method returns back up to step 930. This loop represents the system in effect waiting either for the acknowledgement time-out to expire, or for the confirmation packet to be received from the parent node.
Note that if a confirmation packet is received, but is incorrect because some error is detected but cannot be corrected by the ECC, then the system treats such a confirmation packet as not having been received. In this case therefore, the interrupt packet is resent when the time-out expires at step 930. Another possible error situation arises if the returned value of I1 does not match I0, but the received packet is otherwise OK (the ECC is correct). This is initially handled as a correctly received packet, but will subsequently be detected when the method reaches step 915 (as described below).
Assuming that the confirmation packet is indeed correctly received before the expiry of the acknowledgement time-out, then step 935 will have a positive outcome, and the method proceeds to update the value of I2 appropriately (step 940). This updated value should agree with the value of I0 as updated at step 910, and so these two should now match one another again. The method can now loop back to the beginning, via step 950, which resets the timer to its heartbeat value, and so re-enters the loop of steps 955, 905 and 915. A stable configuration, analogous to the start position (albeit with an updated interrupt status) has therefore been restored again.
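The child-side loop described above (steps 905 to 955) might be modelled in software roughly as follows. This is a hedged sketch rather than the actual implementation: the class `ChildNode`, the `send` callback, the tick-based timing, and the default tick counts are all assumptions. A confirmation packet that fails its ECC check is represented simply by passing `None`.

```python
class ChildNode:
    """Hypothetical model of the child-side loop (steps 905 to 955)."""

    def __init__(self, send, heartbeat_ticks=20, ack_ticks=2):
        self.hw_status = 0      # current interrupt lines (assumed 4-bit value)
        self.i0 = 0             # status this node wants to report
        self.i2 = 0             # status last confirmed by the parent
        self.send = send        # callable that transmits an interrupt packet
        self.heartbeat_ticks = heartbeat_ticks
        self.ack_ticks = ack_ticks
        self.timer = heartbeat_ticks
        self.awaiting_ack = False

    def _transmit(self):
        self.send(self.i0)               # step 920: send current status
        self.timer = self.ack_ticks      # step 925: acknowledgement time-out
        self.awaiting_ack = True

    def tick(self, confirmation=None):
        """Advance one time unit.

        confirmation is the I1 value carried by a valid confirmation
        packet, or None if none arrived (or the packet failed its ECC).
        """
        if self.awaiting_ack:
            if confirmation is not None:             # step 935 positive
                self.i2 = confirmation               # step 940
                self.timer = self.heartbeat_ticks    # step 950
                self.awaiting_ack = False
                return
            self.timer -= 1
            if self.timer <= 0:                      # step 930: expired
                self._transmit()                     # retransmit the packet
            return
        if self.hw_status != self.i0:                # step 905
            self.i0 = self.hw_status                 # step 910
        self.timer -= 1
        if self.i0 != self.i2 or self.timer <= 0:    # steps 915 and 955
            self._transmit()
```

Note that if the returned value does not match I0, the mismatch is picked up on the next idle pass by the same comparison of I0 and I2, which triggers a fresh transmission.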
One potential complication is that, as previously mentioned, a given node may have two or more parent nodes, in order to provide redundancy in routing back to the service processor. Assuming that the service processor has knowledge of the current status of each node (whether or not it is functional), it may direct a child node to report all interrupts to a particular parent node if another parent is not functional at present. Alternatively, the child node may direct an interrupt packet first to one parent, and then only to another parent if it does not receive a confirmation back from the first parent in good time. Yet another possibility is for the child node to simply report any interrupt to both (all) of its parents at substantially the same time. This does mean that a single interrupt may be reported back twice to the service processor, but due to the consolidation of interrupt signals at higher levels of the service bus architecture, any resultant increase in overall network traffic is unlikely to be significant. (Note that such duplicated interrupt reporting does not cause confusion at the service processor, since the original source of each interrupt still has to be determined, as described below in relation to
It should also be noted that there is only a single interrupt status (per level), even though there may be multiple interrupt sources (from local and/or from child nodes). For example, in
The method then proceeds to step 855 where a timer is set. The purpose of this timer, as previously described, is to monitor network conditions to verify that the link to the child node is still operational. Thus a test is made at step 860 to see whether or not the time-out period of the timer has expired. If so, then it is assumed that the child node and/or its connection to the parent node has ceased proper functioning, and the parent node generates an error status (typically in the form of a locally generated interrupt) at step 865. This then allows the defect to be reported up the service bus to the service processor.
If at step 860 the time-out period has not yet expired, then a negative outcome results, and the method proceeds to step 870. Here, a test is made to see whether or not an interrupt packet has been received from the child node. If no such packet has been received then the method returns back again to step 860. Thus at this point the system is effectively in a loop, waiting either for an interrupt packet to be received, or for the time-out period to expire.
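The parent-side wait loop of steps 855 to 870 could be sketched as below. The class name `ParentPort`, the tick-based timing, and the default of 30 ticks are assumptions for illustration; the handling of a received packet (storing the status as I1 and returning it as the confirmation value) is likewise an assumption based on the response packet carrying an updated value of I1.

```python
class ParentPort:
    """Hypothetical model of one parent-side link (steps 855 to 870)."""

    def __init__(self, watchdog_ticks=30):
        self.watchdog_ticks = watchdog_ticks
        self.timer = watchdog_ticks      # step 855: set the timer
        self.i1 = 0                      # status last received from the child
        self.link_error = False

    def tick(self, packet=None):
        """Advance one time unit.

        packet is the interrupt status carried by a received interrupt
        packet, or None if nothing arrived. Returns the confirmation
        value to send back, or None.
        """
        if packet is not None:               # step 870 positive
            self.i1 = packet                 # record the child's status
            self.timer = self.watchdog_ticks # restart the timer (step 855)
            return self.i1                   # value for the confirmation
        self.timer -= 1
        if self.timer <= 0:                  # step 860 positive
            self.link_error = True           # step 865: raise error status
        return None
```

In this model the error status is simply a flag; in the arrangement described above it would typically take the form of a locally generated interrupt that is then reported up the service bus.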
(Note that while the processing of steps 860 and 870 is shown as a loop, where one test follows another in circular fashion, the underlying implementation may be somewhat different, as for example is the case in the embodiment of
Assuming that at some stage an interrupt packet is indeed received (as sent by the child node at step 920 of
As previously discussed, the precise contents of the interrupt packet sent at step 920 in
In one embodiment, for a system that supports four interrupt levels, the interrupt packet simply includes a four-bit interrupt status. In other words, each interrupt packet contains a four-bit value representing the current (new) settings for the four different interrupt levels, thereby allowing multiple interrupt levels to be updated simultaneously. However, other approaches could be used. For example, an interrupt packet could specify which particular interrupt level(s) is (are) to be changed. A relatively straightforward scheme would be to update only a single interrupt level per packet, since as previously discussed it is already known that there is only one such interrupt packet per level (until all the interrupts for that level are cleared).
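A four-bit interrupt status of this kind can be packed and unpacked with simple bit operations. The sketch below is illustrative only: the function names are hypothetical, and the choice of bit 0 for interrupt level 0 is an assumption, since the bit ordering is not specified here.

```python
NUM_LEVELS = 4


def encode_status(levels):
    """Pack a collection of active interrupt levels (0..3) into four bits."""
    value = 0
    for level in levels:
        if not 0 <= level < NUM_LEVELS:
            raise ValueError(f"interrupt level out of range: {level}")
        value |= 1 << level          # assumed ordering: bit n <-> level n
    return value


def decode_status(value):
    """Unpack a four-bit status into the set of active interrupt levels."""
    return {level for level in range(NUM_LEVELS) if value & (1 << level)}
```

Because the whole status travels in one value, a single packet can set some levels and clear others at the same time.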
Note that the processing of
It will be appreciated that the interrupt scheme of
Unit 405 includes four main components: an interrupt status register (ISR) 601; a mask pattern register (MPR) 602; a set of AND gates 603; and an OR gate 604. The interrupt status register 601 comprises multiple bits, denoted as a, b, c, d and e. It will be appreciated that the five bits in ISR 601 in
Each bit in the ISR 601 is used to store the status of a corresponding interrupt signal from some device or component (not shown). Thus when a given device or component raises an interrupt, then this causes an appropriate bit of interrupt status register 601 to be set. Likewise, when the interrupt is cleared, then this causes the corresponding bit in ISR 601 to be cleared (reset). Thus the interrupt status register 601 directly tracks the current interrupt signals from corresponding devices and components as perceived at the hardware level.
The mask pattern register 602 also comprises multiple bits, denoted again as a, b, c, d, and e. Note that there is one bit in the MPR for each bit in the interrupt status register 601. Thus each bit in the ISR 601 is associated with a corresponding bit in the MPR 602 to form an ISR/MPR bit pair (601a and 602a; 601b and 602b; and so on).
An output is taken from each bit in the ISR 601 and from each bit in the MPR 602, and corresponding bits from an ISR/MPR bit pair are passed to an associated AND gate. (As shown in
Thus for each pair of corresponding bits in the ISR 601 and MPR 602 there is a separate AND gate 603. For example, ISR bit 601a and MPR bit 602a are both connected as inputs to AND gate 603a; ISR bit 601b and MPR bit 602b are connected as the two inputs to AND gate 603b; and so on for the remaining bits in the ISR and MPR registers. Note that the values of the bits within the MPR can also be read (and set) by control logic within a node (not shown in
The set of AND gates 603 are connected at their outputs to a single OR gate 604. The output of this OR gate is in turn connected to flip-flop 408 (see
The result of the configuration of
(It will be appreciated that the mask could of course be implemented using reverse polarity, in which case it would perhaps better be regarded as an interrupt enable register. In such an implementation, a zero would be provided from register 602 to disable or mask an interrupt, and a one to enable or propagate an interrupt. Note that with this arrangement, the inverters between the AND gates 603 and the register 602 would be removed).
The OR gate 604 provides a single output signal that represents a consolidated status of all the interrupt signals that have not been masked out. In other words, the output from OR gate 604 indicates an interrupt whenever at least one ISR bit is set without its corresponding MPR bit being set. Conversely, OR gate 604 will indicate the absence of an interrupt if all the interrupts set in ISR 601 (if any) are masked out by MPR 602 (i.e., the corresponding bits in MPR 602 are set).
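The combined effect of the AND gates 603 and the OR gate 604 can be expressed as a single bitwise expression. The sketch below assumes the registers are modelled as integers and that a set MPR bit masks the corresponding ISR bit, as stated above; the function name and the `width` parameter are hypothetical.

```python
def unmasked_interrupt(isr, mpr, width=5):
    """Return True if any ISR bit is set while its MPR bit is clear.

    Each AND gate 603 computes (ISR bit AND NOT MPR bit); OR gate 604
    then reports whether any of those gate outputs is high.
    """
    all_bits = (1 << width) - 1
    gate_outputs = isr & ~mpr & all_bits   # one bit per AND gate 603
    return gate_outputs != 0               # OR gate 604
```

For instance, with ISR bits a and c set, masking only bit c still leaves an interrupt showing, whereas masking both a and c yields no interrupt at the output.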
One motivation for the configuration of
The reason for this is to minimise congestion at the top of the service bus hierarchy. Thus even though multiple nodes below router chip 202a may be raising interrupt signals, these are consolidated into just a single signal for passing on to service processor 201. In this way, the message volume over the service bus 205 is greatly reduced at the top of the hierarchy to try to avoid congestion.
However it will be appreciated that the decrease in traffic on the service bus is at the expense of an effective loss of information, namely the details of the origin of any given interrupt. Therefore, in one embodiment of the invention a particular procedure is adopted to allow the service processor 201 to overcome this loss of information, so that it can properly manage interrupts sent from all the various components of the computer installation.
One factor underlying this procedure is that once an interrupt has been raised by a particular device or component, then this device or component will frequently generate multiple successive interrupt signals. However, these subsequent interrupts are usually of far less interest than the initial interrupt signal. The reason for this is that the initial interrupt signal indicates the presence of some error or malfunction, and it is found that such errors then often continue (in other words further interrupt signals are received) until the underlying cause of the error can be rectified.
Thus in one embodiment of the present invention, the procedure depicted in
Having started at the service processor, it is assumed that there are no locally generated interrupts at step 710 so we progress to step 720, where a test is made to see if there are any interrupts that are being received from a child node. Referring back again to
Having descended to the next level down in the service bus hierarchy, the method loops back up to step 710. Here a test is again performed to see if there are any locally generated interrupts. Let us assume for the purposes of illustration that the only node that is actually locally generating an interrupt signal at present is leaf chip 203B. Accordingly, test 710 will again prove negative. Therefore, we will then loop around the same processing as before, descending one level for each iteration through router chips 202B, 202E, and 202F, until we finally reach leaf chip 203B.
At this point the test of step 710 will now give a positive outcome, so that processing proceeds to step 715. This causes the control logic of the node to update the MPR 602 to mask out a locally generated interrupt signal. More particularly, it is assumed that just a single interrupt signal is masked out at step 715 (i.e., just one bit in the MPR 602 is set). Accordingly, after this has been performed, processing loops back to step 710 to see if there are still any locally generated interrupts. If this is the case, then these further interrupts will be masked out by updating the mask register one bit at a time at step 715. This loop will continue until all the locally generated interrupts at the node are masked out.
Note that the decision of which particular bit in the MPR to alter can be made in various ways. For example, the leftmost bit for which an interrupt is set could be masked out first (i.e. bit a, then bit b, then bit c, and so on, as depicted in
It will be appreciated that at the same time as the control logic of the node updates the MPR in step 715, it typically reads the ISR status. It can then report the particular interrupt that is being cleared up to the service processor, and/or perform any other appropriate action based on this information. Note that such reporting should not now overload the service bus 205 because it is comparatively controlled. In other words, the service processor should receive an orderly succession of interrupt signal reports, as each interrupt signal is processed in turn at the various nodes.
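The loop of steps 710 and 715 at a single node, masking one bit per pass and reporting each interrupt as it is masked, might look as follows. This is a sketch under stated assumptions: the leftmost-first policy is just one of the possible orderings, bit a is taken to be the most significant bit, and the function name is hypothetical.

```python
def mask_local_interrupts(isr, mpr, width=5):
    """Mask pending ISR bits one at a time, leftmost first.

    Returns (new_mpr, reported), where reported lists the bit labels
    ('a' for the leftmost bit, 'b' for the next, ...) in masking order.
    """
    all_bits = (1 << width) - 1
    reported = []
    while True:
        pending = isr & ~mpr & all_bits          # step 710: any unmasked bit?
        if pending == 0:
            return mpr, reported
        top = pending.bit_length() - 1           # leftmost pending bit
        mpr |= 1 << top                          # step 715: mask just this bit
        position_from_left = width - 1 - top
        reported.append(chr(ord("a") + position_from_left))
```

The `reported` list stands in for the orderly succession of reports that would be sent to the service processor, one per masked interrupt.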
It will also be noted that at this point the interrupts themselves have not been cleared, rather they have just been masked out. This is because, as mentioned earlier, there may well be a recurrence of the same error very quickly (due to the same underlying malfunction), resulting in the interrupt signal being set once again. Consequently, clearing of the interrupt signal itself in ISR 601 is deferred until suitable remedial or diagnostic action has been taken (not shown in
This strategy therefore prevents flooding the service processor with repeated instances of the same interrupt signal (derived from the same ongoing problem), since these are of relatively little use to the service processor for diagnostic purposes, but at the same time allows the system to be re-sensitised to other interrupts from that node. Note that when the interrupt signal is eventually cleared, then the corresponding MPR bit is likewise cleared or reset back to zero (not shown in
Once all the locally generated interrupts have been masked out, so that the test of step 710 is negative, we proceed to step 720, where it is again determined whether any interrupt signals are present from a child node. Since we are currently at leaf chip 203B, which does not have any child nodes, this test is now negative, and the method proceeds to step 730. Here it is tested to see whether or not we are at the service processor itself. If so, then there are no currently pending interrupts in the system that have not yet been masked out, and so processing can effectively be terminated at step 750. (It will be appreciated that at this point the service processor can then determine the best way to handle those interrupts that are currently masked out).
However, assuming at present that we are still at leaf chip 203B, then step 730 results in a negative outcome, leading to step 735. This directs us to the parent node of our current location, i.e., in this particular case back up to router chip 202F. (Note that if a child node can have multiple parents, then at step 735 any parent can be selected, although returning to the parent through which the previous descent was made at step 725 can be regarded as providing the most systematic approach).
We then return to step 710, where it will be again determined that there are no locally generated interrupts at router chip 202F, so we now proceed to step 720. At this point, the outcome of step 720 for router chip node 202F is negative, unlike the previous positive response for this node. This is because the interrupt(s) at leaf chip 203B has now been masked out, and this is reflected in the updated contents of flip-flop 445 for the router chip (see
(It will be appreciated that if leaf chip 203C also has a pending interrupt, then router chip 202F would maintain its interrupt status even after the interrupt(s) from leaf chip 203B had been cleared. In this case, when the test of step 720 was performed for router chip 202F, then it would again be positive, and this would lead via step 725 to leaf chip 203C, to clear the interrupts stored there).
Assuming now that there are no longer any child nodes of router node 202F with pending interrupts, then step 720 will have a negative outcome. Consequently, the method will loop through step 730, again taking the negative outcome because this is not the service processor. At step 735 processing will then proceed to parent router chip node 202E.
Provided that there are no further interrupts present in the service bus, the same loop of steps 710, 720, 730 and 735 will be followed twice more, as we ascend through router chip 202B and router chip 202A, before eventually reaching service processor 201. At this point, step 730 results in a positive outcome, leading to an exit from the method at step 750, as previously described.
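The descent and ascent through the hierarchy (steps 710 to 750) can be illustrated with a small software model. This is a sketch only: the `Node` class, its integer ISR and MPR fields, the leftmost-first masking order, and the choice of the first signalling child at step 725 are all assumptions made for the example.

```python
class Node:
    """Hypothetical node with an ISR, an MPR, and optional child nodes."""

    def __init__(self, name, children=(), isr=0):
        self.name = name
        self.children = list(children)
        self.parent = None
        self.isr = isr
        self.mpr = 0
        for child in self.children:
            child.parent = self

    def local_interrupt(self):
        """True if any ISR bit is set without its MPR bit being set."""
        return (self.isr & ~self.mpr) != 0

    def status(self):
        """Consolidated status: local interrupts OR those of any descendant."""
        return self.local_interrupt() or any(c.status() for c in self.children)


def locate_interrupts(root):
    """Walk the tree from root; return (node name, bit) pairs in masking order."""
    found = []
    node = root
    while True:
        if node.local_interrupt():                        # step 710
            pending = node.isr & ~node.mpr
            bit = pending.bit_length() - 1
            node.mpr |= 1 << bit                          # step 715: mask one bit
            found.append((node.name, bit))
            continue
        signalling = [c for c in node.children if c.status()]   # step 720
        if signalling:
            node = signalling[0]                          # step 725: descend
            continue
        if node is root:                                  # step 730
            return found                                  # step 750: done
        node = node.parent                                # step 735: ascend
```

Building the chain 201, 202A, 202B, 202E, 202F with leaf chips 203B and 203C beneath 202F, and setting one ISR bit at 203B, the walk descends to 203B, masks that bit, and then climbs back to the root, at which point no node reports an unmasked interrupt.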
Thus the procedure described by the flowchart of
In one embodiment, the processing of
Note that although
It will also be appreciated that the processing of
Similarly the processing of
FIGS. 8a, 8b, 8c, 8d, and 8e illustrate various stages of the application of the method of
Thus looking at
If we now apply the processing of
According to step 725, we then descend the leftmost branch from node A to node B, loop back again to step 710, and follow the processing through once more to descend to node C at step 725. This time when we arrive back at step 710, there is a locally generated interrupt at node C, so we follow the positive branch to update the MPR at step 715. Processing then remains at node C until the MPR is updated sufficiently to remove or mask out all locally generated interrupts. This takes us to the position shown in
At this point there are no longer any locally generated interrupts at node C, so step 710 produces a negative result, as does step 720, because node C has no child nodes. We therefore go to step 730, which also produces a negative outcome, causing us to ascend the hierarchy to node B at step 735. Returning to step 710, which is again negative because node B has no locally generated interrupts, there is however an interrupt still from a child node, namely node D. Accordingly, step 720 produces a positive result, leading us to step 725, where we descend to node D.
We then loop up again to step 710, and since this node does contain a locally generated interrupt, we go to step 715 where the MPR for node D is updated. These two steps are then repeated if necessary until the locally generated interrupts at node D have been completely masked, taking us to the position illustrated in
After the local interrupts have been masked from node D, the next visit to step 710 results in a negative outcome, as does the test of step 720, since node D is a leaf node with no child nodes. This takes us through to step 730, and from there to step 735, where we ascend up to node B. Since node B now has no interrupts, then steps 710 and 720 will both test negative, as will the test at step 730, leaving us to again ascend the network, this time to node A.
Since node A does not have any locally generated interrupts but only an interrupt from a child node (node F), we proceed through steps 710 and 720 to step 725, where we descend to the leftmost child node from which an interrupt signal is being received. This now corresponds to node F, which is the only node currently passing an interrupt signal up to node A.
Returning to step 710, this finds that node F is indeed generating its own local interrupt(s), which is (are) masked at step 715, resulting in the situation shown in
The method now returns back up to step 710, which produces a positive outcome due to the locally generated interrupt at node G. This is then addressed by updating the masking pattern register at step 715 as many times as necessary. Once the locally generated interrupt at node G has been removed, this then clears the child node interrupt status at node F and also at node A (and the service processor). Consequently, the method of
Note that the above embodiments have been described primarily as a combination of computer hardware and software. For example, certain operations are directly implemented in hardware, such as the determination by comparator 305 at the first node (see
Note also that the approach described herein is not necessarily restricted just to computers and computing, but can apply to any situation in which status information needs to be conveyed from one location to another (for example controlling a telecommunications or other form of network, remote security monitoring of various sites, and so on).
In conclusion, a variety of particular embodiments have been described in detail herein, but it will be appreciated that this is by way of exemplification only. The skilled person will be aware of many further potential modifications and adaptations using the teachings set forth herein that fall within the scope of the claimed invention and its equivalents.
Number | Date | Country | |
---|---|---|---|
20040030819 A1 | Feb 2004 | US |