System and Method to Support Use of Bus Spare Wires in Connection Modules

Information

  • Patent Application
  • Publication Number
    20080082878
  • Date Filed
    September 13, 2006
  • Date Published
    April 03, 2008
Abstract
In a computer system with multiple chips connected via a connection module with high speed elastic interface buses, support for bus repair is enhanced by use of a spare net. Support is provided to ensure that the spare net can be tested in the same way that every normal bus net is tested, in all supported environments. This ensures that the system controller can determine which connections are bad and how to apply the controls to repair them, for all tests and in the field for the customer.
Description

BRIEF DESCRIPTION OF THE DRAWINGS

The subject matter which is regarded as the invention is particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The foregoing and other objects, features, and advantages of the invention are apparent from the following detailed description taken in conjunction with the accompanying drawings in which:



FIG. 1 illustrates an example of the chips that may be mounted on the connection module.



FIG. 2 illustrates an example of an Elastic Interface between two chips with the inclusion of a spare net for bus repair.



FIG. 3 illustrates the design for Elastic Interface Repair and structures required to test the spare net.



FIG. 4 illustrates an example of use of the Electronic Fuse technology to support Elastic Interface Repair.





The detailed description explains the preferred embodiments of the invention, together with advantages and features, by way of example with reference to the drawings.


DETAILED DESCRIPTION OF THE INVENTION

In some current IBM servers, the chips that make up the central electronics complex, which may include the system controller or L2 cache memory, microprocessors, and memory interface controllers, are mounted on a connection module. There are numerous high frequency, high bandwidth buses that communicate via IBM elastic interface buses. These buses may include spare bus bits to allow a given elastic interface to operate even when there is a fault in the connection module. FIG. 1 shows just such an example of the connection module used in the IBM System z9. The module, 100, contains a single clock controller chip, 110, a single L2 cache controller chip, 120, two memory controller chips, 130 and 131, four L2 cache data chips, 140-143, and eight microprocessor chips, 150-157, each of which contains 2 microprocessor cores. This module is referred to as the central electronics complex or CEC.


In FIG. 2 we see a single such elastic interface (EI) bus from the driving chip X, 200, to the receiving chip Y, 210. The EI logic on the driving chip, 220, drives the elastic interface bus, 250, and the EI spare net, 260, to the receiving EI logic on the receiving chip, 230. The data bus may include checking that is used to detect which bit on an interface bus is bad. This may include such checking as ECC codes that are able to detect exactly which data bit is incorrect.
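As an illustration of checking that can identify exactly which bit is bad, the following is a minimal sketch using a Hamming(7,4) code. The description does not say which code the elastic interface uses, so the choice of code, the function names `hamming74_encode` and `bad_bit_position`, and the 7-bit width are all assumptions made for this example only.

```python
from typing import List


def hamming74_encode(data: List[int]) -> List[int]:
    """Encode 4 data bits into a 7-bit Hamming codeword (positions 1..7).

    Illustrative assumption: parity bits sit at positions 1, 2 and 4,
    data bits at positions 3, 5, 6 and 7.
    """
    code = [0] * 8  # index 0 is unused so indices equal bit positions
    code[3], code[5], code[6], code[7] = data
    for p in (1, 2, 4):
        parity = 0
        for pos in range(1, 8):
            # Parity bit p covers every other position whose index has bit p set.
            if pos != p and (pos & p):
                parity ^= code[pos]
        code[p] = parity
    return code[1:]


def bad_bit_position(received: List[int]) -> int:
    """Return the 1-based position of a single flipped bit, or 0 if none.

    The syndrome is the XOR of the positions of all bits that are set.
    """
    syndrome = 0
    for pos, bit in enumerate(received, start=1):
        if bit:
            syndrome ^= pos
    return syndrome


if __name__ == "__main__":
    word = hamming74_encode([1, 0, 1, 1])
    damaged = list(word)
    damaged[4] ^= 1  # flip the bit at position 5
    print("clean syndrome:", bad_bit_position(word))
    print("damaged syndrome:", bad_bit_position(damaged))
```

A non-zero syndrome names the failing bit position, which is the kind of information a sparing decision needs.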


FIG. 3 shows the details of how the EI spare net is used. For a given bus that is n+1 bits in length, bit 0 to bit n, there will be an EI bus of size n+2 including the spare bit. By way of example, consider a set of three bits of the bus sourced on the driving side, 300: bus bits x−1, x, and x+1, labeled 320, 321, and 322. Each of the 2:1 muxes, 330, 331, or 332, selects between two adjacent bits of the bus, so that a given bus bit can be placed on one of two possible bit positions on the EI bus, 340, 341, or 342. On the receiving side, 310, each of these EI bus bits, 340, 341, or 342, feeds two of the receive side muxes 350, 351, and 352. These muxes in turn produce the receive side bus bits x−1, x, and x+1, 360, 361, and 362. Let us look at just bit x of the bus in the two cases of a non-spared bus and a spared bus. When the bus has not been spared, bit x feeds two different muxes, 331 and 332. The controls on these muxes are set such that drive side bit x, 321, propagates out of mux 331 to EI bus bit x, 341. On the receive side this EI bus bit x, 341, is connected to two different muxes, 350 and 351. Again the mux controls are set so that bit x propagates out of mux 351 to become receive side bus bit x, 361. In the case of a bad net on the EI bus prior to bit x of the EI bus, 341, a sparing action takes place and the controls for the EI muxes are altered. In this case drive side bit x is muxed by mux 332 onto EI bus bit x+1, 342. On the receive side, EI bus bit x+1, 342, enters the receive mux 351. The controls are then set to propagate EI bus bit x+1, 342, out of mux 351 to again become receive side bus bit x, 361.
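The mux behavior described above can be modeled in a few lines. This is a minimal behavioral sketch, not the hardware design: the function names `drive` and `receive`, the `bad_net` parameter, and the convention that every bus bit at or above the bad net position shifts up by one EI net are assumptions made for illustration.

```python
from typing import List, Optional


def drive(bits: List[int], bad_net: Optional[int]) -> List[int]:
    """Driver side 2:1 muxes: place n+1 bus bits on n+2 EI nets.

    bad_net is the index of the faulty EI net, or None when unspared.
    The last net is the spare; it always carries bus bit n.
    """
    width = len(bits)
    nets = []
    for j in range(width + 1):
        if j == width:
            nets.append(bits[-1])      # spare net is always fed by bit n
        elif bad_net is not None and j > bad_net:
            nets.append(bits[j - 1])   # shifted: net j carries bus bit j-1
        else:
            nets.append(bits[j])       # normal: net j carries bus bit j
    return nets


def receive(nets: List[int], bad_net: Optional[int]) -> List[int]:
    """Receive side 2:1 muxes: rebuild the n+1 bus bits from n+2 EI nets."""
    width = len(nets) - 1
    return [
        nets[i + 1] if (bad_net is not None and i >= bad_net) else nets[i]
        for i in range(width)
    ]


if __name__ == "__main__":
    data = [1, 0, 1, 1]
    nets = drive(data, bad_net=1)
    nets[1] ^= 1  # model the fault on EI net 1 by corrupting its value
    print(receive(nets, bad_net=1) == data)
```

With the same `bad_net` value applied on both sides, the value on the faulty net is never selected by any receive mux, so the received bus matches the driven bus.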


Now that we have seen how any generic bit x behaves with and without the sparing action, let us again look at FIG. 3 to see what occurs with the last two bits of the EI bus. Driver side bus bits n−1 and n, 323 and 324, feed muxes 333 and 334. These bits then normally propagate out to EI bus bits n−1 and n, 343 and 344. These EI bus bits then feed muxes on the receive side, 353 and 354, and propagate to the receive side bus bits n−1 and n, 363 and 364. When the bus is not being spared, the EI bus spare net has no functional role. In order to always test it, we chose to eliminate the mux, which would have been 335, between driver side bus bit n, 324, and a fixed value of ground. Instead, we chose to drive the EI spare bit, 345, with driver bus bit n, 324, whether or not the spare bit is used. Then on the receive side we built a checking circuit, 355, to verify that when the spare bus bit, 345, is not used functionally its value matches that of EI bus bit n, 344. A mismatch is reported as an error checker signal, 365. The checking logic is disabled when the sparing action occurs on the EI bus and the EI spare bit, 345, is used functionally. So in the case where the spare bit is not used, driving side bus bits n−1 and n, 323 and 324, propagate through muxes 333 and 334 to EI bus bits n−1 and n, 343 and 344. These bits in turn feed the muxes 353 and 354 on the receive side to generate receive bus bits n−1 and n, 363 and 364. Driver side bit n, 324, is also driven on the EI spare bit, 345. On the receive side EI bus bit n, 344, and EI spare bit, 345, are compared in compare logic, 355, to generate an error signal, 365. When the bus has been spared, driver side bus bit n−1, 323, propagates through mux 334 to EI bus bit n, 344, and then to mux 353 to generate receive side bit n−1, 363. Driver side bit n, 324, is driven on the EI spare net, 345, to receive side mux 354 to provide receive side bus bit n, 364. The checking logic in 355 is disabled, as the EI spare bit, 345, and EI bus bit n, 344, no longer carry the same value.
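The treatment of the last two nets and the compare logic can be sketched as follows. This is a simplified behavioral model under stated assumptions: the names `drive_tail` and `spare_net_error` and the boolean `spared` control are hypothetical, and the real checker is hardware rather than software.

```python
from typing import Tuple


def drive_tail(bit_n_minus_1: int, bit_n: int, spared: bool) -> Tuple[int, int]:
    """Return (EI bus bit n, EI spare bit) as driven by the sending chip.

    The spare net is always driven with bus bit n. EI bus bit n carries
    bus bit n normally, or bus bit n-1 once the bus has been spared.
    """
    ei_bit_n = bit_n_minus_1 if spared else bit_n
    ei_spare = bit_n
    return ei_bit_n, ei_spare


def spare_net_error(ei_bit_n: int, ei_spare: int, spared: bool) -> bool:
    """Model of the receive side compare logic.

    Flags an error when the unused spare net disagrees with EI bus bit n.
    The check is disabled once the spare net is in functional use.
    """
    if spared:
        return False
    return ei_bit_n != ei_spare


if __name__ == "__main__":
    ei_n, ei_spare = drive_tail(bit_n_minus_1=0, bit_n=1, spared=False)
    stuck_spare = 0  # model a spare net that is stuck at 0
    print("healthy spare flagged:", spare_net_error(ei_n, ei_spare, spared=False))
    print("faulty spare flagged:", spare_net_error(ei_n, stuck_spare, spared=False))
```

Because the spare net carries live data whenever it is unused, a fault on it shows up as a mismatch during ordinary operation instead of staying hidden until the spare is needed.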


Performing this checking while the bus is not spared allows the spare net to be tested in all of the functional and environmental tests in which normal bus nets are tested. The spare net can therefore be put to use at any step in manufacturing test, or in the customer's office, with the knowledge that it has met all of the required testing like every other functional bus bit.


FIG. 4 shows an example of the chip on the connection module that holds the information on which nets should be spared for this module. Chip X, 400, which we have chosen as one of the chips that has a single instance on the connection module, contains the e-fuse array, 410. This bank of e-fuses consumes a non-trivial amount of area but does not add any significant number of chip I/O. The values the e-fuses provide are read via a scan method by the service element at system power-on. For each desired EI sparing action the valid bit, 420, is set. In addition an EI bus number, 430, is specified. This is used by the service element to know which pair of EI driver and receiver logic, 220 and 230, needs to have the sparing controls set. The repair bit position, 440, indicates which bit needs to be spared and is encoded in such a way as to match how the mux controls for all the muxes on the driver and receiver sides must be set to get the correct sparing action. This makes it easy for the service element to apply the repair to both sides of the interface. These e-fuses, 410, are programmed on the module after the chips have been mounted to it, based on the original manufacturing data about which nets were bad in the connection module.
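How a service element might interpret such e-fuse entries can be sketched as follows. The field widths (`BUS_BITS`, `POS_BITS`), the packing order, and the names `RepairEntry`, `encode_entry`, `decode_entry`, `mux_selects`, and `plan_repairs` are all hypothetical; the description states only that each entry holds a valid bit, an EI bus number, and a repair bit position.

```python
from dataclasses import dataclass
from typing import Dict, List

BUS_BITS = 6  # assumed width of the EI bus number field
POS_BITS = 7  # assumed width of the repair bit position field


@dataclass(frozen=True)
class RepairEntry:
    valid: bool
    bus_number: int
    bit_position: int


def encode_entry(entry: RepairEntry) -> int:
    """Pack an entry as [valid | bus_number | bit_position] (assumed layout)."""
    return (
        (int(entry.valid) << (BUS_BITS + POS_BITS))
        | (entry.bus_number << POS_BITS)
        | entry.bit_position
    )


def decode_entry(raw: int) -> RepairEntry:
    """Unpack a scanned e-fuse value into its three fields."""
    valid = bool((raw >> (BUS_BITS + POS_BITS)) & 1)
    bus_number = (raw >> POS_BITS) & ((1 << BUS_BITS) - 1)
    bit_position = raw & ((1 << POS_BITS) - 1)
    return RepairEntry(valid, bus_number, bit_position)


def mux_selects(bit_position: int, width: int) -> List[int]:
    """One select per bus bit: 1 means use the shifted (next higher) EI net."""
    return [1 if i >= bit_position else 0 for i in range(width)]


def plan_repairs(raw_entries: List[int], width: int) -> Dict[int, List[int]]:
    """Map each valid entry's bus number to the selects for both sides."""
    plan = {}
    for raw in raw_entries:
        entry = decode_entry(raw)
        if entry.valid:
            plan[entry.bus_number] = mux_selects(entry.bit_position, width)
    return plan


if __name__ == "__main__":
    scanned = [encode_entry(RepairEntry(True, 5, 3)), 0]
    print(plan_repairs(scanned, width=8))
```

In this sketch the same select vector is applied to the driver and the receiver of the named bus, which mirrors the stated goal of an encoding that is easy to apply to both sides of the interface.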


The capabilities of the present invention can be implemented in software, firmware, hardware or some combination thereof.


The diagrams depicted herein are just examples. There may be many variations to these diagrams or the steps (or operations) described therein without departing from the spirit of the invention. For instance, the steps may be performed in a differing order, or steps may be added, deleted or modified. All of these variations within the scope of the claims are considered a part of the claimed invention.


While the preferred embodiment of the invention has been described, it will be understood that those skilled in the art, both now and in the future, may make various improvements and enhancements which fall within the scope of the claims which follow. These claims should be construed to maintain the proper protection for the invention first described.

Claims
  • 1. A computer system, comprising a computer system having a plurality of processing chips, a connection module for said chips having multi-bit buses between the chips, and connection module spare nets forming a spare network of a plurality of nets able to repair one or more of said multi-bit buses, said spare nets being fully testable.
  • 2. The computer system according to claim 1 wherein said connection module has a plurality of the chips of the system mounted on it and multi-bit buses that communicate between the chips on the connection module.
  • 3. The computer system according to claim 2 wherein said spare network provides a spare bus network of circuit nets built into the connection module that are coupled for use to repair a net in the bus that was bad due to a manufacturing defect.
  • 4. The computer system according to claim 3 wherein said spare nets are fully testable when a known connection is made to the spare network to ensure a predictable value is driven on the spare network even when not used.
  • 5. The computer system according to claim 4 wherein said connection module includes checking logic to ensure that the spare network is operating as desired through all test parameters that the module is tested at during a manufacture test.
  • 6. The computer system according to claim 5 wherein more than one chip is coupled to and uses said connection module with multi-bit buses between the chips to be able to test and repair the bus, and all of said chips have fully testable spare networks.
  • 7. A computer system according to claim 1 including a plurality of chips connected via a connection module with high speed elastic interface buses, wherein support for bus repair by said chips is enhanced by use of a spare network formed on said chips.
  • 8. The computer system according to claim 7 wherein the spare network can be tested in the same way that every normal bus net can be tested at all supported environments.
  • 9. The computer system according to claim 8 wherein the chips are coupled to a system controller to ensure said system controller can find out what if any connections in the chips coupled to the system controller are bad and how to apply the controls to repair them for all tests and in the field for a customer.
  • 10. The computer system according to claim 1 wherein the buses between the chips are high speed elastic interfaces with elastic interface repair logic.
  • 11. The computer system according to claim 2 including error detection hardware logic on the bus that can detect a bad bit on the interface, detect which specific bit said bad bit was, and report said bad bit to a service module for the system which can then repair the bus to correct the defect by setting correct control values in elastic interface repair logic.
  • 12. The computer system according to claim 11 wherein the correction of a defect occurs after the final manufacture repair settings have been made.
  • 13. A computer system made of more than one chip, comprising a plurality of chips coupled to a connection module with multi-bit buses between the chips, said connection module having built into the connection module spare nets enabling repair of a multi-bit bus, and wherein known manufacture faults to be repaired can be programmed on a chip in the connection module after the mounting of the chips to the module with electronic fuses.
  • 14. The computer system according to claim 13 wherein said computer system is comprised of more than a single chip; and wherein a connection module has the chips of the system mounted on it with multi-bit buses that communicate between the chips on the connection module; and a spare bus net built into the connection module that is used to repair a net in the bus that was bad due to a manufacturing defect.
  • 15. The computer system according to claim 14 wherein a chip on the module has electronic fuses that can be programmed after the chip has been mounted on the connection module to store the information for one or more known manufacture connection faults.
  • 15. The computer system according to claim 14 wherein the buses between the chips are high speed elastic interfaces with elastic interface repair logic.
  • 16. The computer system according to claim 15 wherein information stored in said electronic fuses contains a designated bus number used to locate elastic interface controls for both a driving and a receiving side, and a repair bit position, which are encoded the same way as elastic interface repair logic controls are defined.
  • 17. The computer system according to claim 16 wherein said elastic interface is between two chips that are on a connection module, and wherein said spare net is included for bus repair.
  • 18. The computer system according to claim 17 wherein the spare net is testable.
  • 19. A method of testing a computer system, comprising: after manufacture testing via spare network provided on chips of a connection module for multi-bus chips circuit nets built into the connection module that are coupled for use to repair a net in the bus that was bad due to a manufacturing defect.
  • 20. The method according to claim 19 wherein said method is performed as a service of repairing a computer system, comprising testing after manufacture circuit nets on chips having a spare network provided on chips of a connection module for multi-bus chips circuit nets built into the connection module that are coupled for use to repair a net in the bus that was bad due to a manufacturing defect.