Claims
- 1. A method for providing automated diagnostic services for a cluster computer system comprising a plurality of nodes, each of the plurality of nodes providing an application to a plurality of clients, the method comprising the steps of:
receiving a current value of a network parameter related to cluster middleware associated with the cluster computer system; analyzing the current value of the network parameter relative to a predetermined reference value for the network parameter; and providing information based on the analysis of the current value relative to the predetermined reference value.
- 2. The method of claim 1, wherein the network parameter relates to a network heartbeat interval for a node in the cluster computer system and the predetermined reference value is an optimal network heartbeat interval for the node based on the current heartbeat link for the node.
- 3. The method of claim 2, wherein the step of analyzing the current value of the network heartbeat interval relative to the optimal network heartbeat interval comprises determining whether the difference between the current value and the optimal network heartbeat interval is within a predetermined variance.
- 4. The method of claim 3, wherein the step of providing information based on the analysis of the current value relative to the optimal network heartbeat interval comprises providing a warning of a potential failover recovery problem if the difference between the current value and the optimal network heartbeat interval is not within the predetermined variance.
- 5. The method of claim 3, further comprising the step of determining whether an alternative heartbeat link for the node is available if the difference between the current value and the optimal network heartbeat interval is not within the predetermined variance.
- 6. The method of claim 3, further comprising the step of repeating the above steps for another node in the cluster computer system if the difference between the current value and the optimal network heartbeat interval is within the predetermined variance.
- 7. The method of claim 5, further comprising the step of providing a warning of a potential failover recovery problem if an alternative heartbeat link for the node is not available.
- 8. The method of claim 5, further comprising the step of, if an alternative heartbeat link for the node is available, determining the optimal network heartbeat interval for the node based on the alternative heartbeat link for the node and analyzing the current value of the network heartbeat interval relative to the optimal network heartbeat interval associated with the alternative heartbeat link for the node.
- 9. The method of claim 1, wherein the network parameter relates to a node timeout value for a node in the cluster computer system and the predetermined reference value comprises a predefined threshold range for the node timeout value.
- 10. The method of claim 9, wherein the predefined threshold range for the node timeout value is based on a function of a network heartbeat interval for the node.
- 11. The method of claim 10, wherein the step of analyzing the current value of the node timeout value relative to the predefined threshold range comprises determining whether the current value of the node timeout value is within a predetermined variance.
- 12. The method of claim 11, wherein the step of providing information based on the analysis of the current value relative to the predefined threshold range for the node timeout value comprises providing a warning that the node timeout value is not within the predefined threshold range.
- 13. The method of claim 12, wherein the step of providing information based on the analysis of the current value relative to the predefined threshold range for the node timeout value further comprises generating an instruction configured to set the node timeout value within the predefined threshold range.
- 14. The method of claim 12, wherein the predetermined reference value further comprises a predefined recommended range and wherein the step of providing information based on the analysis of the current value relative to the predefined threshold range and the predefined recommended range further comprises, if the current value of the node timeout value is greater than the upper bound of the predefined threshold range, providing a warning that the node timeout value is too high and generating an instruction configured to set the node timeout value of the node to the upper bound of the predefined threshold range.
- 15. The method of claim 12, wherein the predetermined reference value further comprises a predefined recommended range and wherein the step of providing information based on the analysis of the current value relative to the predefined threshold range and the predefined recommended range further comprises, if the current value of the node timeout value is greater than the upper bound of the predefined recommended range, determining whether an empirical condition associated with the cluster computer system exists that suggests the current value of the node timeout value should be greater than the upper bound of the predefined recommended range.
- 16. The method of claim 15, further comprising the step of, if an empirical condition does not exist, providing a warning that the node timeout value is too high and generating an instruction configured to set the node timeout value of the node to the upper bound of the predefined threshold range.
- 17. The method of claim 12, wherein the predetermined reference value further comprises a predefined recommended range and wherein the step of providing information based on the analysis of the current value relative to the predefined threshold range and the predefined recommended range further comprises, if the current value of the node timeout value is less than the lower bound of the predefined recommended range, determining whether an empirical condition associated with the cluster computer system exists that suggests the current value of the node timeout value should be less than the lower bound of the predefined recommended range.
- 18. The method of claim 17, further comprising the step of, if an empirical condition does not exist, providing a warning that the node timeout value is too low and generating an instruction configured to set the node timeout value of the node to the lower bound of the predefined recommended range.
- 19. The method of claim 12, wherein the predetermined reference value further comprises a predefined recommended range and wherein the step of providing information based on the analysis of the current value relative to the predefined threshold range and the predefined recommended range further comprises, if the current value of the node timeout value is less than the lower bound of the predefined threshold range, providing a warning that the node timeout value is too low and generating an instruction configured to set the node timeout value of the node to the lower bound of the predefined threshold range.
- 20. The method of claim 1, wherein the network parameter relates to an autostart timeout interval for a node in the cluster computer system and the predetermined reference value comprises a predefined range for the autostart timeout interval.
- 21. The method of claim 20, wherein the step of analyzing the current value of the autostart timeout interval relative to the predefined range comprises determining whether the current value of the autostart timeout interval is within the predefined range.
- 22. The method of claim 21, wherein the step of providing information based on the analysis of the current value relative to the predefined range for the autostart timeout interval comprises, if the current value of the autostart timeout interval is above the upper bound of the predefined range, providing an instruction configured to decrease the autostart timeout interval of the node.
- 23. The method of claim 21, wherein the step of providing information based on the analysis of the current value relative to the predefined range for the autostart timeout interval comprises, if the current value of the autostart timeout interval is below the lower bound of the predefined range, providing an instruction configured to increase the autostart timeout interval of the node.
- 24. The method of claim 21, wherein the step of determining whether the current value of the autostart timeout interval is within the predefined range is performed after determining that a cluster unification process has been initiated during reboot of the node.
- 25. The method of claim 1, wherein the network parameter relates to a network polling interval for a node in the cluster computer system and the predetermined reference value comprises a predefined range for the network polling interval.
- 26. The method of claim 25, wherein the step of analyzing the current value of the network polling interval relative to the predefined range comprises determining whether the current value of the network polling interval is within the predefined range.
- 27. The method of claim 26, wherein the step of providing information based on the analysis of the current value relative to the predefined range for the network polling interval comprises, if the current value of the network polling interval is above the upper bound of the predefined range, providing an instruction configured to decrease the network polling interval of the node.
- 28. The method of claim 26, wherein the step of providing information based on the analysis of the current value relative to the predefined range for the network polling interval comprises, if the current value of the network polling interval is below the lower bound of the predefined range, providing an instruction configured to increase the network polling interval of the node.
- 29. The method of claim 26, wherein the step of determining whether the current value of the network polling interval is within the predefined range is performed after determining that the network polling has been set.
- 30. A system for providing automated diagnostic services for a cluster computer system comprising a plurality of nodes, each of the plurality of nodes providing a mission-critical application to a plurality of clients, the system comprising:
a first portion of logic configured to receive a current value of a network parameter related to cluster middleware associated with the cluster computer system; a second portion of logic configured to analyze the current value of the network parameter relative to a predetermined reference value for the network parameter; and a third portion of logic configured to provide information based on the analysis of the current value relative to the predetermined reference value.
- 31. The system of claim 30, further comprising a computer configured to store and implement the first, second, and third portions of logic.
- 32. The system of claim 30, wherein the first, second, and third portions of logic are embodied in an operating system associated with the computer.
- 33. The system of claim 30, wherein the first, second, and third portions of logic are embodied in cluster middleware associated with the computer.
- 34. The system of claim 30, further comprising a network interface card configured to communicate with a cluster interface.
- 35. The system of claim 34, further comprising one or more clients in communication with the computer via the cluster interface.
- 36. The system of claim 30, further comprising a network interface configured to communicate with the cluster computer system via a communications network and wherein the current value of the network parameter is received via a communications network and the information based on the analysis is provided to the cluster computer system via the communications network.
- 37. The system of claim 30, wherein the network parameter relates to a network heartbeat interval for a node in the cluster computer system and the predetermined reference value is an optimal network heartbeat interval for the node based on the current heartbeat link for the node.
- 38. The system of claim 37, wherein the second portion of logic is further configured to determine whether the difference between the current value and the optimal network heartbeat interval is within a predetermined variance.
- 39. The system of claim 38, wherein the third portion of logic is further configured to provide a warning of a potential failover recovery problem if the difference between the current value and the optimal network heartbeat interval is not within the predetermined variance.
- 40. The system of claim 38, further comprising a fourth portion of logic configured to determine whether an alternative heartbeat link for the node is available if the difference between the current value and the optimal network heartbeat interval is not within the predetermined variance.
- 41. The system of claim 38, further comprising a fourth portion of logic configured to repeat the first, second, and third portions of logic for another node in the cluster computer system if the difference between the current value and the optimal network heartbeat interval is within the predetermined variance.
- 42. The system of claim 40, further comprising a fifth portion of logic configured to provide a warning of a potential failover recovery problem if an alternative heartbeat link for the node is not available.
- 43. The system of claim 40, further comprising a fifth portion of logic configured to determine, if an alternative heartbeat link for the node is available, the optimal network heartbeat interval for the node based on the alternative heartbeat link for the node and analyze the current value of the network heartbeat interval relative to the optimal network heartbeat interval associated with the alternative heartbeat link for the node.
- 44. The system of claim 30, wherein the network parameter relates to a node timeout value for a node in the cluster computer system and the predetermined reference value comprises a predefined threshold range for the node timeout value.
- 45. The system of claim 44, wherein the predefined threshold range for the node timeout value is based on a function of a network heartbeat interval for the node.
- 46. The system of claim 45, wherein the third portion of logic is further configured to determine whether the current value of the node timeout value is within a predetermined variance.
- 47. The system of claim 46, wherein the third portion of logic is further configured to provide a warning that the node timeout value is not within the predefined threshold range.
- 48. The system of claim 47, wherein the third portion of logic is further configured to generate an instruction configured to set the node timeout value within the predefined threshold range.
- 49. The system of claim 47, wherein the predetermined reference value further comprises a predefined recommended range and wherein the third portion of logic is further configured to, if the current value of the node timeout value is greater than the upper bound of the predefined threshold range, provide a warning that the node timeout value is too high and generate an instruction configured to set the node timeout value of the node to the upper bound of the predefined threshold range.
- 50. The system of claim 47, wherein the predetermined reference value further comprises a predefined recommended range and wherein the third portion of logic is further configured to, if the current value of the node timeout value is greater than the upper bound of the predefined recommended range, determine whether an empirical condition associated with the cluster computer system exists that suggests the current value of the node timeout value should be greater than the upper bound of the predefined recommended range.
- 51. The system of claim 50, further comprising a fourth portion of logic configured to, if an empirical condition does not exist, provide a warning that the node timeout value is too high and generate an instruction configured to set the node timeout value of the node to the upper bound of the predefined threshold range.
- 52. The system of claim 47, wherein the predetermined reference value further comprises a predefined recommended range and wherein the third portion of logic is further configured to, if the current value of the node timeout value is less than the lower bound of the predefined recommended range, determine whether an empirical condition associated with the cluster computer system exists that suggests the current value of the node timeout value should be less than the lower bound of the predefined recommended range.
- 53. The system of claim 52, wherein the third portion of logic is further configured to, if an empirical condition does not exist, provide a warning that the node timeout value is too low and generate an instruction configured to set the node timeout value of the node to the lower bound of the predefined recommended range.
- 54. The system of claim 47, wherein the predetermined reference value further comprises a predefined recommended range and wherein the third portion of logic is further configured to, if the current value of the node timeout value is less than the lower bound of the predefined threshold range, provide a warning that the node timeout value is too low and generate an instruction configured to set the node timeout value of the node to the lower bound of the predefined threshold range.
- 55. The system of claim 30, wherein the network parameter relates to an autostart timeout interval for a node in the cluster computer system and the predetermined reference value comprises a predefined range for the autostart timeout interval.
- 56. The system of claim 55, wherein the second portion of logic is further configured to determine whether the current value of the autostart timeout interval is within the predefined range.
- 57. The system of claim 56, wherein the third portion of logic is further configured to, if the current value of the autostart timeout interval is above the upper bound of the predefined range, provide an instruction configured to decrease the autostart timeout interval of the node.
- 58. The system of claim 56, wherein the third portion of logic is further configured to, if the current value of the autostart timeout interval is below the lower bound of the predefined range, provide an instruction configured to increase the autostart timeout interval of the node.
- 59. The system of claim 56, wherein the second portion of logic is further configured to determine whether the current value of the autostart timeout interval is within the predefined range after determining that a cluster unification process has been initiated during reboot of the node.
- 60. The system of claim 30, wherein the network parameter relates to a network polling interval for a node in the cluster computer system and the predetermined reference value comprises a predefined range for the network polling interval.
- 61. The system of claim 60, wherein the second portion of logic is further configured to determine whether the current value of the network polling interval is within the predefined range.
- 62. The system of claim 61, wherein the third portion of logic is further configured to, if the current value of the network polling interval is above the upper bound of the predefined range, provide an instruction configured to decrease the network polling interval of the node.
- 63. The system of claim 61, wherein the third portion of logic is further configured to, if the current value of the network polling interval is below the lower bound of the predefined range, provide an instruction configured to increase the network polling interval of the node.
- 64. The system of claim 61, wherein the second portion of logic is further configured to determine whether the current value of the network polling interval is within the predefined range after determining that the network polling has been set.
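By way of illustration only, the heartbeat interval analysis of claims 2-4 and the node timeout analysis of claims 9-19 might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the names `Range`, `Finding`, `check_heartbeat_interval`, and `check_node_timeout` are hypothetical, the ranges are supplied by the caller (the claims do not specify how the threshold range is derived from the heartbeat interval), the recommended range is assumed to lie inside the threshold range, and the too-low condition of claim 19 is read as a value below the lower bound of the threshold range.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Range:
    """Closed interval of acceptable values (hypothetical helper type)."""
    lower: float
    upper: float


@dataclass(frozen=True)
class Finding:
    """Result of one check (hypothetical helper type)."""
    warning: Optional[str] = None   # None means no problem was found
    set_to: Optional[float] = None  # value a generated instruction would apply


def check_heartbeat_interval(current: float, optimal: float,
                             variance: float) -> Finding:
    """Claims 2-4: warn when the current interval differs from the optimal
    interval for the current heartbeat link by more than the variance."""
    if abs(current - optimal) <= variance:
        return Finding()
    return Finding(warning="potential failover recovery problem")


def check_node_timeout(current: float, threshold: Range, recommended: Range,
                       empirical_condition: bool = False) -> Finding:
    """Claims 9-19: compare the node timeout value against a threshold range
    and a recommended range assumed to be nested inside it."""
    # Outside the threshold range: always warn and clamp to the nearest bound.
    if current > threshold.upper:
        return Finding("node timeout value is too high", threshold.upper)
    if current < threshold.lower:
        return Finding("node timeout value is too low", threshold.lower)
    # Inside the threshold range but outside the recommended range: warn only
    # when no empirical condition justifies the unusual value.
    if current > recommended.upper and not empirical_condition:
        # Claim 16 names the upper bound of the threshold range as the target;
        # the sketch follows that wording.
        return Finding("node timeout value is too high", threshold.upper)
    if current < recommended.lower and not empirical_condition:
        return Finding("node timeout value is too low", recommended.lower)
    return Finding()
```

A caller would obtain the current values from the cluster middleware, invoke one check per node, and act on any `Finding` whose `warning` is set. The `empirical_condition` flag stands in for whatever site-specific evidence suggests that a value outside the recommended range is appropriate.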
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application is a continuation-in-part of copending U.S. utility application entitled, “Systems and Methods for Providing an Automated Diagnostic Audit for Cluster Computer Systems,” having Ser. No. 09/840,784, and filed Apr. 23, 2001, which is hereby incorporated in its entirety by reference. This application is also related to copending, and concurrently-filed, U.S. utility application entitled “Systems and Methods for Providing Automated Diagnostic Services for a Cluster Computer System,” (Atty. Docket No. 050820-1650; HP Docket No. 10013525-1) having Ser. No. ______, and filed Oct. 26, 2001, which is hereby incorporated by reference in its entirety.
Continuation in Parts (1)
| | Number | Date | Country |
| --- | --- | --- | --- |
| Parent | 09840784 | Apr 2001 | US |
| Child | 10008855 | Oct 2001 | US |