Service Alarm Correlation

Abstract
A system and method for correlating alarms from a plurality of network elements (NEs) are provided to unambiguously associate separate alarms with one another. A fault identifier (FID) is generated by a serving NE that has discovered a faulty hardware or software unit. The serving NE signals its lost or degraded service to a client NE in a traffic message and appends the generated FID to the traffic message. The client NE extracts the FID from the traffic message and appends it to a service alarm, which the client NE sends to a network management system. The serving NE also generates an alarm message, provides it with the same FID, and sends it to the network management system. The service alarm and the alarm message received by the network management system thus contain the same FID, which the management system uses to correlate the two alarms with one another.
Description
TECHNICAL FIELD OF THE INVENTION

The present invention relates to a method, system and network elements for processing alarm information within a telecommunication network managed from a network management system.


DESCRIPTION OF RELATED ART

The objective for an operator supervising a telecommunication network via a network management system is to quickly restore degraded or lost services by locating and correcting the faulty unit causing the degradation or loss. When alerted to a degraded or lost service, the operator therefore needs to correlate the lost service with the responsible faulty unit.


Network Elements (NEs) in a telecommunication network have different tasks which together aim to connect two or more user equipments (UEs). The NEs may depend on each other in such a way that if one NE fails, another NE will consequently fail to provide its services (a client-server relationship between NEs). An NE comprises hardware (HW) units and software. The software is stored in a memory and runs under the control of a processor and an operating system. The HW units may further include specific HW units providing the functionality supported by the NE. Within an NE a number of functions execute. These functions may act as serving functions to client functions in other NEs, and if a function in the serving NE is faulty, the client NE will have its service degraded or lost as a consequence. For example, a faulty board in a radio base station (RBS) may show up in a client Radio Network Controller (RNC) as a message “cell disabled”, indicating that the cell served by the RBS containing the faulty board is out of order.


For commercial reasons, operators tend to mix NEs from different vendors in their telecommunication networks. To limit implementation dependencies, the information shared between NEs is restricted. In a radio network, the information shared to set up radio functions is typically standardised, but information about HW equipment is not.


The disadvantage of not being able to inform a client NE about the faulty HW in the serving NE is that each NE will send an alarm to the network management system, but the alarms are not correlated, i.e. they have no unique association with one another. Upon reception in the network management system the alarms are time stamped and stored, for example in a database. When the alarms are displayed in an alarm list, the two alarms will be separated by other alarms received from the same or other NEs during the same time period. The operator then has the difficult task of concluding that a service alarm from the client NE is the consequence of faulty HW in the serving NE. The time and competence needed to locate the fault and thereby restore service increase.


SUMMARY OF THE INVENTION

One object of the invention is to provide a solution to the problem of correlating alarms triggered by network elements that depend on one another, such that the alarms are unambiguously associated with one another. This is achieved with the system according to claim 1, the network elements in accordance with claims 5, 6 and the method in accordance with claim 7.


The invention is based on a fault identifier (FID) mechanism, which provides a unique association between the lost service and the responsible faulty unit.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 is a block diagram of a managed network for a telecommunication system, and



FIG. 2 is a schematic view of a list of collected alarms in the network management system.





DETAILED DESCRIPTION OF EMBODIMENTS


FIG. 1 shows a network management system (NMS) 1, a client network element (NE) 2 and a serving NE 3. In the serving NE there is a faulty hardware (HW) or software unit 4 and a fault identifier (FID) generator 5. In the client NE there is a FID extractor 6 and faulty service detection means 7. Between the serving NE and the NMS, as well as between the client NE and the NMS, there are management interfaces 8 and 9 respectively. Between the serving and client NEs there is a traffic interface 10.


The serving NE has the task of setting up and maintaining a certain set of services, which the client NE controls. In a cellular radio network the serving NE is typically a radio base station (RBS) supplying, for example, user equipments (UEs) within range of a radio cell with user data such as speech or images.


The client NE in such a cellular radio network is typically a radio network controller (RNC), having the task of controlling one or several connected RBSes acting as serving NEs. The RNC controls the RBSes to set up and maintain the services requested by UEs, for example speech connections, and ensures that UEs can roam between cells served by several RBSes.


An NE emits an alarm when a faulty hardware unit is detected and makes the alarm available over the management interface. The alarm generating mechanism in the NE includes the following information in the alarm:


Who: The name of the device or NE experiencing the fault


What: The condition of the fault, i.e. the symptom of the fault


When: The time the problem was detected
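
The three alarm fields listed above can be pictured as a simple record. The following is a minimal sketch only; the class name `Alarm`, the field names and the sample values are assumptions made for illustration and do not appear in the described system.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Alarm:
    """Hypothetical alarm record carrying the Who / What / When fields."""
    who: str        # name of the device or NE experiencing the fault
    what: str       # condition (symptom) of the fault
    when: datetime  # time the problem was detected


# Illustrative values only.
alarm = Alarm(
    who="RBS1",
    what="board failure",
    when=datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc),
)
```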


In addition to the NEs above, a telecommunications network has one or more NMSes, which, among other things, are used to supervise the NEs. For supervision, the NMS has a mechanism to receive and store alarms, for example in a database 11, and to present the alarms to an operator at an operator console 12. The NMS communicates with each NE via the respective management interface, over which management messages are exchanged. The traffic interface between the client NE and the serving NE provides traffic messages for negotiating the services requested by the client NE. Specifically, in a radio network the interface provides traffic messages to add and delete cells served by the RBS to support a connection between the UE and the RNC. The information exposed over the traffic interface is, however, limited, because operators tend to mix NEs from different vendors in their telecommunication networks. Standardization bodies have agreed upon general and implementation-independent service primitives in the traffic interface in order to limit dependencies between vendors' implementations. For example, it is not possible to send information on the identity of a failing HW or software unit over the traffic interface; the only information exposed is that the serving NE has a failure and cannot deliver the service requested by the client NE. The disadvantage of not being able to inform a client NE about the faulty HW in the serving NE is that both NEs will send alarms to the network management system, but the alarms are not correlated, i.e. they have no unique association with one another. The operator then has the difficult task of concluding that a service alarm from the client NE is the consequence of faulty HW in the serving NE. The time and competence needed to locate the fault and thereby restore service increase.


An overview of the alarm processing method pursuant to the present invention is provided in FIG. 1. When the serving NE detects the faulty hardware unit it sends an alarm, referred to as a hardware alarm 13, to the NMS. The device (not shown) in the serving NE that discovered the faulty hardware unit retrieves a unique fault identifier (FID) from the FID generator 5. The unique FID is generated by combining the NE's network-unique name with an integer. The integer is derived from a 19-bit variable and is incremented by 1 for every new fault. An example of a FID is RBS1262143, where RBS1 is the network-unique name of the network element and 262143 is the decimal value of the 19-bit variable.
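
The FID construction described above (network-unique NE name combined with a stepped 19-bit integer) can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the FID generator 5: the class name `FidGenerator`, the `start` parameter and the wrap-around to zero when the 19-bit range is exhausted are all assumptions, since the description does not say what happens on overflow.

```python
class FidGenerator:
    """Hypothetical FID generator: NE name followed by a 19-bit counter value."""

    COUNTER_BITS = 19

    def __init__(self, ne_name: str, start: int = 0) -> None:
        self.ne_name = ne_name
        # Keep the counter inside the 19-bit range 0 .. 2**19 - 1.
        self._counter = start % (1 << self.COUNTER_BITS)

    def next_fid(self) -> str:
        # Name and decimal counter value are concatenated, as in "RBS1262143".
        fid = f"{self.ne_name}{self._counter}"
        # Step by 1 for every new fault; wrapping to 0 is an assumption.
        self._counter = (self._counter + 1) % (1 << self.COUNTER_BITS)
        return fid


generator = FidGenerator("RBS1", start=262143)
first_fid = generator.next_fid()   # the FID from the example in the text
second_fid = generator.next_fid()  # the FID for the next detected fault
```

Starting the counter at 262143 reproduces the example FID from the text; the following fault receives the next integer.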


The FID is appended to the hardware alarm and forwarded to the NMS over the management interface 8. The hardware alarm is stored in the database 11 and is presented to the operator in an alarm list 16, described below in connection with FIG. 2. Upon detection of the faulty HW or software unit the serving NE also sends a traffic message 15 to the client NE and appends the same FID to it. The traffic message informs the client NE that the requested service is no longer available. The client NE receives the traffic message with the appended FID and in response generates an alarm indicative of the lost service, referred to as a service alarm 14. In particular, the client NE extracts the FID from the traffic message and appends it to the service alarm 14. The service alarm with the appended FID is forwarded to the NMS over the management interface 9. Upon reception, the NMS stores the service alarm with the appended FID and presents it in the alarm list.
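
The message flow described above can be sketched end to end. All names below (`Alarm`, `TrafficMessage`, `Nms`, `serving_ne_on_fault`, `client_ne_on_traffic_message`, the NE name `RNC1` and the condition strings) are hypothetical and chosen for illustration; the management and traffic interfaces are reduced to plain function calls.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Alarm:
    source: str                # NE reporting the alarm
    condition: str             # symptom of the fault
    fid: Optional[str] = None  # fault identifier, if one was appended


@dataclass
class TrafficMessage:
    service: str
    available: bool
    fid: Optional[str] = None  # FID appended by the serving NE


class Nms:
    """Stand-in for the NMS: stores received alarms in arrival order."""

    def __init__(self) -> None:
        self.alarm_list: List[Alarm] = []

    def receive(self, alarm: Alarm) -> None:
        self.alarm_list.append(alarm)


def serving_ne_on_fault(ne_name: str, fid: str, nms: Nms) -> TrafficMessage:
    # Hardware alarm with the FID goes to the NMS (management interface).
    nms.receive(Alarm(source=ne_name, condition="hardware fault", fid=fid))
    # The same FID is appended to the traffic message for the client NE.
    return TrafficMessage(service="cell", available=False, fid=fid)


def client_ne_on_traffic_message(ne_name: str, message: TrafficMessage, nms: Nms) -> None:
    if not message.available:
        # Extract the FID and append it to the service alarm.
        nms.receive(Alarm(source=ne_name, condition="cell disabled", fid=message.fid))


nms = Nms()
message = serving_ne_on_fault("RBS1", "RBS1262143", nms)
client_ne_on_traffic_message("RNC1", message, nms)
```

After both calls the NMS holds one hardware alarm and one service alarm, and the two carry the same FID.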



FIG. 2 shows an alarm list 16 with a number of alarms received from many different serving and client NEs in the cellular radio system. One alarm is listed on each row, and the list can be scrolled. The list is built from the hardware and service alarms stored in the database. Each alarm contains the identity of the NE experiencing the fault, such as an RNC or RBS, and the reported faulty unit or service. The detection time may also be part of each alarm. The alarms are listed in the chronological order in which they were generated and time-stamped in the NEs. As shown, many other alarms have been generated in the same or other NEs supervised by the same NMS during the time span between the reception of the hardware alarm 13 and the reception of the service alarm 14. As shown, the hardware and service alarms are nevertheless easily and unambiguously associated with each other by their FIDs 17, which both emanate from one and the same detected failure.
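
How an NMS might pick out related rows from such an alarm list can be sketched by grouping on the FID. The description only states that the FID is used to associate the alarms; the grouping function `correlate_by_fid`, the row type `AlarmRow` and all sample rows below are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional


class AlarmRow(NamedTuple):
    source: str         # NE experiencing the fault
    condition: str      # reported faulty unit or service
    fid: Optional[str]  # fault identifier, None if the alarm carries no FID


def correlate_by_fid(rows: List[AlarmRow]) -> Dict[str, List[AlarmRow]]:
    """Return, per FID, the alarms sharing it (only FIDs seen more than once)."""
    groups: Dict[str, List[AlarmRow]] = defaultdict(list)
    for row in rows:
        if row.fid is not None:
            groups[row.fid].append(row)
    return {fid: members for fid, members in groups.items() if len(members) > 1}


# Hypothetical alarm list: the two related alarms are separated by others.
alarm_list = [
    AlarmRow("RBS1", "hardware fault", "RBS1262143"),
    AlarmRow("RBS7", "hardware fault", "RBS712"),
    AlarmRow("RNC2", "link degraded", None),
    AlarmRow("RNC1", "cell disabled", "RBS1262143"),
]
correlated = correlate_by_fid(alarm_list)
```

Here the hardware alarm from RBS1 and the service alarm from RNC1 end up in the same group even though unrelated alarms arrived between them.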


While the system and method shown and described are the preferred embodiments, it is apparent that the FID can be generated in other ways than by combining an NE's network-unique name with an integer. A unique FID may be obtained by assigning a different number series to each individual NE. The FID may then be generated within each NE as a randomly selected number within its assigned number series.
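
The number-series alternative can be sketched as below. The class name `SeriesFidGenerator`, the concrete ranges and the bookkeeping of already used numbers are assumptions; the description only says that the FID is a randomly selected number within the series assigned to the NE.

```python
import random
from typing import Optional, Set


class SeriesFidGenerator:
    """Hypothetical generator drawing random FIDs from an NE's own number series."""

    def __init__(self, series: range, rng: Optional[random.Random] = None) -> None:
        self._series = series
        self._rng = rng or random.Random()
        # Remembering used numbers avoids repeats within one NE (an assumption).
        self._used: Set[int] = set()

    def next_fid(self) -> int:
        if len(self._used) >= len(self._series):
            raise RuntimeError("number series exhausted")
        while True:
            candidate = self._rng.choice(self._series)
            if candidate not in self._used:
                self._used.add(candidate)
                return candidate


# Disjoint series per NE keep FIDs from different NEs apart.
rbs1_generator = SeriesFidGenerator(range(1000, 2000))
rbs2_generator = SeriesFidGenerator(range(2000, 3000))
fid_a = rbs1_generator.next_fid()
fid_b = rbs2_generator.next_fid()
```

Because the series do not overlap, a FID drawn in one NE cannot collide with a FID drawn in another.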

Claims
  • 1. A system for correlating alarms from a plurality of network elements NEs in a telecommunications network, said system comprising: NEs which depend on each other in such way that if one NE fails then another NE will fail to provide its services as consequence, said failing NE referred to as serving NE and said another NE referred to as client NE, serving NEs being adapted to signal traffic messages to the client NE, but being unable to provide information on a faulty hardware or software unit, and a network management system supervising the NEs and adapted to receive and store alarms characterized in that a serving NE is provided with: means for generating a fault identifier (FID) related to the faulty hardware or software unit, means for forwarding an alarm message to the network management system and including therein the FID, means for providing the traffic message with the same FID, and in that the client NE is provided with means for extracting the FID and append the extracted FID to a service alarm message expressing the service fault and with means for forwarding the service alarm message to the network management system.
  • 2. A system in accordance with claim 1, characterized in that the FID generator is adapted to generate a randomly selected number for each fault.
  • 3. A system in accordance with claim 1, characterized in that the FID generator is adapted to extract the name of the NE that the generator resides in, and combine it with an integer, which is stepped for every new detected fault.
  • 4. A system in accordance with claim 1, characterized in that the information model exposed in the interface between the serving NE and the client NE comprises information to set up and maintain traffic connections between the client and serving NEs, but not information to express faulty hardware units.
  • 5. A system in accordance with claim 1, characterized in that the system is a radio network, the serving node is a radio base station, and the client node is a radio network controller.
  • 6. A serving NE comprising: program software for a service and for operating the service, hardware (HW), an interface towards a client NE said interface having service primitives for signalling availability of requested services but having no service primitives for signalling of a faulty hardware or software unit, an alarm interface towards a network management system, fault detection means for detecting a faulty hardware or software unit in the serving NE, means for generating an alarm message in response to detection of a faulty hardware or software unit by the fault detection means, the alarm message being forwarded to the network management system interface, characterized by a device for generating unique fault indicators (FIDs), a device for appending a generated unique FID to the alarm message, and for appending the same unique FID to a traffic message sent to client NE over said interface towards the client NE.
  • 7. A client NE comprising: program software for a service and for operating the service, hardware (HW), an interface towards a serving NE, said interface having service primitives for signalling availability of requested services but having no service primitives for signalling of faulty hardware or software unit in the serving NE, an alarm interface towards a network management system, fault detection means for extracting service primitives received from the serving NE, means for generating a service alarm in response the fault detection means extracting service primitives indicative of the inability of the serving NE to provide the requested service, the service alarm being forwarded to the network management system interfaces,
  • 8. A method for correlating alarms from a plurality of network elements (NEs) in a telecommunications network, wherein NEs which depend on each other in such way that if one NE fails then another NE will fail to provide its services as consequence, the failing NE referred to as serving NE and the another NE referred to as client NE, said method comprising: the serving NE discovers a faulty hardware or software unit and in response thereto forwards an alarm message indicative of the faulty unit to a network management system, the serving NE forwards to the client NE a traffic message indicative of the inability of the serving NE to provide the requested service, but is unable to forward information on the faulty hardware, client NE receives said traffic message and forwards in response thereto a service alarm indicative of the lost service to the network management system, and the network management system stores the service and hardware alarms and presents them to an operator, characterized by the serving NE upon detection of a faulty hardware or software unit generates fault identity (FID) and associates it with the detected faulty unit, the serving NE appending the FID to a traffic message which it transmits to the client NE and to an alarm message which it transmits to the management system, the client NE appending the FID to service alarm, and the network management system upon reception of the service alarm and the alarm message associates the two alarms to one another using said FID.
  • 9. A method in accordance with claim 8, characterized in that the fault identifier is a randomly selected number unique for the NE in which the fault is detected.
  • 10. A method in accordance with claim 8, characterized in that the fault identifier is a combination of the name of the NE in which the generator resides and an integer, which is stepped for every new detected fault.
PCT Information
Filing Document Filing Date Country Kind 371c Date
PCT/SE04/01769 11/29/2004 WO 00 5/1/2008
Provisional Applications (2)
Number Date Country
60395892 Jul 2002 US
60440276 Jan 2003 US