Information technology (IT) infrastructures of organizations have grown in complexity over the last few decades. A typical IT infrastructure such as a data center of an enterprise may include multiple information technology resources such as servers, computer systems, switches, storage devices, storage area networks, computer applications, etc.
For a better understanding of the solution, examples will now be described, with reference to the accompanying drawings, in which:
An information technology infrastructure of an enterprise may comprise anywhere from a few to thousands of information technology resources. It may be desirable to have a solution that monitors these resources, collects performance metrics related thereto, and provides feedback related to their status and functioning.
The term “information technology (IT) infrastructure” may be defined as a combined set of hardware, software, networks, facilities, etc., used to develop, test, deliver, monitor, control, or support IT services. Also, as used herein, the term “resource” may refer to a software and/or a hardware component that may be accessible locally or over a network. Some non-limiting examples of a resource may include a server, a router, and a disk drive.
Collecting metrics from resources and then transferring them to a metrics server for further processing is a continuous process. Metrics are usually collected at short intervals, e.g., once every minute, although the interval may be longer or shorter depending on the requirements of an application. Typically, for each metric, complete information such as a metrics identifier, a timestamp of the metric, and a value of the metric is sent. Several metrics may be collected from each resource at periodic time intervals. In a scenario where there may be a large number of resources (e.g., in a hybrid cloud or a datacenter), each generating data for several metrics that is transferred at regular intervals, the process may not only generate a large amount of network traffic, but also consume vast computing power at both the source and the receiver.
To address this issue, the present disclosure describes various examples for collecting performance metrics from a device. In an example, performance metrics that are to be collected from a first device may be selected. The performance metrics may be indexed by assigning an index entry to the respective performance metrics on the first device. In an example, sequential indexing may be used: a fixed sequence of the performance metrics may be maintained on the first device, and the sequence number of each entry in this fixed sequence may be treated as the index for that entry. The fixed sequence of the performance metrics along with the index entry assigned to the respective performance metrics may be shared with a second device. A first performance data of the respective performance metrics on the first device may be determined. The first performance data of the respective performance metrics may be shared with the second device. In an example, the sharing may comprise sending, to the second device, the index entry and the first performance data of the respective performance metrics in an order corresponding to the fixed sequence of the performance metrics on the first device.
The present disclosure provides an effective method of collecting performance metrics by reducing the data transfer frequency and the amount of data transferred.
In an example, first device 102 and second device 104 may each be a computing device which may include any type of computing system that is capable of executing machine-readable instructions. Examples of the computing device may include, without limitation, a server, a desktop computer, a notebook computer, a tablet computer, a thin client, a mobile device, and a personal digital assistant (PDA).
In an example, first device 102 may be a network device. Examples of the network device may include, without limitation, a network router, a virtual router, a network switch, and a virtual switch.
In an example, first device 102 may be a storage device. Examples of the storage device may include, without limitation, a non-transitory machine-readable storage medium that may store, for example, machine-executable instructions, data files, and metadata related to a data file. Some non-limiting examples of a non-transitory machine-readable storage medium may include a hard disk, a storage disc (for example, a CD-ROM, a DVD, etc.), a disk array, a storage tape, a solid state drive, a Serial Advanced Technology Attachment (SATA) disk drive, a Fibre Channel (FC) disk drive, a Serial Attached SCSI (SAS) disk drive, a magnetic tape drive, and the like. The storage device may be direct-attached storage, i.e., storage that is directly attached to a node. In an example, the storage device may be an external storage (for example, a storage array) that may communicate with a node via a communication interface.
In an example, first device 102 and second device 104 may be communicatively coupled, for example, via a computer network. The computer network may be a wireless or wired network. The computer network may include, for example, a Local Area Network (LAN), a Wireless Local Area Network (WLAN), a Metropolitan Area Network (MAN), a Storage Area Network (SAN), a Campus Area Network (CAN), or the like. Further, the computer network may be a public network (for example, the Internet) or a private network (for example, an intranet). In an example, first device 102 and second device 104 may be directly attached, for example, through a cable.
In an example, first device 102 may include a selection engine 120, an index engine 122, a maintenance engine 124, a sequence engine 126, a determination engine 128, and a transfer engine 130.
In an example, first device 102 may be implemented by at least one computing device and may include at least engines 120, 122, 124, 126, 128, and 130 which may be any combination of hardware and programming to implement the functionalities of the engines described herein. In examples described herein, such combinations of hardware and programming may be implemented in a number of different ways. For example, the programming for the engines may be processor executable instructions stored on at least one non-transitory machine-readable storage medium and the hardware for the engines may include at least one processing resource to execute those instructions. In some examples, the hardware may also include other electronic circuitry to at least partially implement at least one engine of first device 102. In some examples, the at least one machine-readable storage medium may store instructions that, when executed by the at least one processing resource, at least partially implement some or all engines of first device 102. In such examples, first device 102 may include the at least one machine-readable storage medium storing the instructions and the at least one processing resource to execute the instructions.
In an example, selection engine 120 may be used to select performance metrics that are to be collected from first device 102. Selection engine 120 may provide a user interface, for example, a Graphical User Interface (GUI) or a Command Line Interface (CLI), for a user to select performance metrics that are to be collected from first device 102. In an example, second device 104, which may be communicatively coupled to first device 102 either through a direct connection or a computer network, may be used to select performance metrics for collection from first device 102. Second device 104 may use selection engine 120 on first device 102 to make the selection. Some non-limiting examples of the performance metrics that may be selected include CPU utilization, memory utilization, storage utilization, and network utilization on first device 102.
Once the performance metrics are selected, index engine 122 may be used to index the performance metrics by assigning an index entry, which may be a sequence number, to the respective performance metrics on first device 102. For example, if performance metrics such as CPU utilization, memory utilization, storage utilization, and network utilization are selected on first device 102, then their identities may be kept in a sequence so that each performance metric is assigned a unique sequential index value. For example, index entries M1, M2, M3, and M4 may be used to represent the identities of performance metrics CPU utilization, memory utilization, storage utilization, and network utilization, respectively. This is illustrated in
Maintenance engine 124 may organize and maintain the identities of the performance metrics in a fixed sequence on first device 102. For example, if index entries M1, M2, M3, and M4 are used to represent identities of performance metrics CPU utilization, memory utilization, storage utilization, and network utilization respectively, then maintenance engine 124 may maintain their identities on first device 102 in a certain sequence, for example, M1, M2, M3, and M4. This is illustrated in
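The indexing performed by index engine 122 and the fixed sequence kept by maintenance engine 124 may be sketched, for example, as follows. This is a minimal Python illustration under assumptions: the function name `build_index` is hypothetical, and the order in which the metrics are selected is assumed to become the fixed sequence.

```python
from typing import Dict, List, Tuple


def build_index(selected_metrics: List[str]) -> Tuple[Tuple[str, ...], Dict[str, str]]:
    """Assign index entries M1, M2, ... in selection order and fix that order.

    Returns the fixed sequence of index entries and a map from each index
    entry to the identity of the performance metric it stands for.
    """
    if len(set(selected_metrics)) != len(selected_metrics):
        raise ValueError("each performance metric may be selected only once")
    # The 1-based position in the selection becomes the sequence number.
    index_to_metric = {f"M{pos}": name for pos, name in enumerate(selected_metrics, start=1)}
    # Dicts preserve insertion order (Python 3.7+), so this tuple is the fixed sequence.
    fixed_sequence = tuple(index_to_metric)
    return fixed_sequence, index_to_metric


sequence, index_map = build_index(
    ["CPU utilization", "memory utilization", "storage utilization", "network utilization"]
)
print(sequence)         # ('M1', 'M2', 'M3', 'M4')
print(index_map["M3"])  # storage utilization
```

Returning an immutable tuple is one way to keep the sequence fixed once it has been established.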
Sequence engine 126 may share, with second device 104, the fixed sequence of the selected performance metrics along with the index entry assigned to the respective performance metrics on first device 102. In this manner, the sequence of performance metrics and their respective index entries would be in sync on both first device 102 and second device 104. The identities of the selected performance metrics would be maintained in the same sequence on both first device 102 and second device 104.
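One possible way for sequence engine 126 to share the fixed sequence, and for second device 104 to mirror it, is sketched below. The JSON message layout and the names `encode_sequence_message` and `SequenceMirror` are illustrative assumptions; the disclosure does not specify a message format or transport.

```python
import json
from typing import Dict, List, Sequence


def encode_sequence_message(fixed_sequence: Sequence[str], index_to_metric: Dict[str, str]) -> str:
    """First device: serialize the fixed sequence with its index entries.

    The JSON layout is an illustrative assumption, not a defined wire format.
    """
    return json.dumps({
        "type": "sequence",
        # A JSON array (not an object) so the order is explicit in the message.
        "entries": [[idx, index_to_metric[idx]] for idx in fixed_sequence],
    })


class SequenceMirror:
    """Second device: keeps the same fixed sequence as the first device."""

    def __init__(self) -> None:
        self.fixed_sequence: List[str] = []
        self.index_to_metric: Dict[str, str] = {}

    def on_sequence_message(self, raw: str) -> None:
        msg = json.loads(raw)
        if msg.get("type") != "sequence":
            raise ValueError("not a sequence message")
        self.fixed_sequence = [idx for idx, _ in msg["entries"]]
        self.index_to_metric = {idx: name for idx, name in msg["entries"]}


seq = ["M1", "M2", "M3", "M4"]
names = {"M1": "CPU utilization", "M2": "memory utilization",
         "M3": "storage utilization", "M4": "network utilization"}
mirror = SequenceMirror()
mirror.on_sequence_message(encode_sequence_message(seq, names))
print(mirror.fixed_sequence == seq)  # True
```

Encoding the entries as an array rather than a JSON object avoids relying on member order, which JSON objects do not guarantee.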
Determination engine 128 may determine a first performance data of the selected performance metrics on first device 102. For example, if the performance metrics include CPU utilization, memory utilization, storage utilization, and network utilization, then the first performance data may include their respective performance metric values, such as 85%, 50%, 24%, and 66%, respectively. The performance metric values may be any numerical values and need not be percentages.
Transfer engine 130 may share the first performance data of the selected performance metrics with second device 104. In an example, the sharing may comprise sending, to second device 104, the index entry and the first performance data of the respective performance metrics in an order corresponding to the fixed sequence of the performance metrics on first device 102. For example, if index entries M1, M2, M3, and M4 are used to represent the identities of performance metrics CPU utilization, memory utilization, storage utilization, and network utilization, respectively, and their respective performance metric values are 85%, 50%, 24%, and 66%, then transfer engine 130 may send the data to second device 104, for example, in the following sequence: [M1, 85%], [M2, 50%], [M3, 24%], and [M4, 66%]. In another example, the first performance data may be sent in the following sequence: [M1, M2, M3, M4], [85%, 50%, 24%, 66%]. Notably, the identities of the performance metrics (for example, the index entries) and their respective performance data are shared with second device 104 according to the fixed sequence defined on first device 102. This is illustrated in
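The two transfer layouts described above may be sketched as follows, assuming the performance data is held in a dictionary keyed by index entry. The function names `encode_pairs` and `encode_split` are hypothetical; in both layouts the fixed sequence, not the dictionary's own order, decides the order of the output.

```python
from typing import Dict, List, Sequence, Tuple


def encode_pairs(fixed_sequence: Sequence[str], data: Dict[str, float]) -> List[Tuple[str, float]]:
    """Layout 1: [M1, 85], [M2, 50], ... in fixed-sequence order."""
    return [(idx, data[idx]) for idx in fixed_sequence]


def encode_split(fixed_sequence: Sequence[str],
                 data: Dict[str, float]) -> Tuple[List[str], List[float]]:
    """Layout 2: [M1, M2, ...] followed by [85, 50, ...]; positions line up."""
    return list(fixed_sequence), [data[idx] for idx in fixed_sequence]


fixed_sequence = ("M1", "M2", "M3", "M4")
# Values in percent; the dict's insertion order is deliberately scrambled.
first_data = {"M3": 24.0, "M1": 85.0, "M4": 66.0, "M2": 50.0}
print(encode_pairs(fixed_sequence, first_data))
# [('M1', 85.0), ('M2', 50.0), ('M3', 24.0), ('M4', 66.0)]
print(encode_split(fixed_sequence, first_data))
# (['M1', 'M2', 'M3', 'M4'], [85.0, 50.0, 24.0, 66.0])
```

Because second device 104 holds the same fixed sequence, it can pair the values of the second layout with their index entries purely by position.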
In an example, determination engine 128 may determine a second performance data of the selected performance metrics. For example, if performance metrics include CPU utilization, memory utilization, storage utilization, and network utilization, then the second performance data may include their respective performance metric values such as 82%, 85%, 34%, and 87% respectively. In an example, determination engine 128 may determine an nth performance data of the selected performance metrics.
Determination engine 128 may further determine whether the second performance data for any of the performance metrics meets a criterion defined on first device 102. In an example, a criterion(s) may be defined for a performance metric on first device 102. In another example, a criterion(s) on first device 102 may be defined through second device 104. The criterion may include a rule(s) for a performance metric which, when met, may trigger an action on first device 102. For example, a criterion may be defined for CPU utilization that comprises a rule that, if CPU utilization exceeds a certain pre-defined value, for example, 75%, an action may be initiated on first device 102. Likewise, a criterion(s) may be defined for other performance metrics on first device 102. In an example, first device 102 may receive, from second device 104, a request for determining at least one of the first performance data and the second performance data related to the selected performance metrics. In response to the request, first device 102 may determine at least one of the first performance data and the second performance data. In an example, determination engine 128 may determine whether an nth performance data for any of the performance metrics meets a criterion defined on first device 102.
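For illustration only, a criterion may be modeled as a predicate on a metric's newly determined value. The sketch below assumes that representation; `Criterion` and `metrics_meeting_criteria` are hypothetical names, and the 75% CPU threshold is the example rule given above.

```python
from typing import Callable, Dict, List, Sequence

# Assumption: a criterion is a predicate over a metric's newly determined value.
Criterion = Callable[[float], bool]


def metrics_meeting_criteria(fixed_sequence: Sequence[str],
                             data: Dict[str, float],
                             criteria: Dict[str, Criterion]) -> List[str]:
    """Index entries whose new value meets the criterion defined for them,
    in fixed-sequence order. Metrics without a criterion never qualify."""
    return [idx for idx in fixed_sequence
            if idx in criteria and criteria[idx](data[idx])]


fixed_sequence = ("M1", "M2", "M3", "M4")
second_data = {"M1": 82.0, "M2": 85.0, "M3": 34.0, "M4": 87.0}
criteria = {"M1": lambda v: v > 75.0}  # CPU utilization above 75%
print(metrics_meeting_criteria(fixed_sequence, second_data, criteria))  # ['M1']
```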
In response to a determination that the second performance data (or the nth performance data) for any of the selected performance metrics meets the criterion defined therefor on first device 102, transfer engine 130 may share the second performance data of the selected performance metrics (or the nth performance data) with second device 104. In an example, the sharing may comprise sending, to second device 104, the index entry and the second performance data of the respective performance metrics in an order corresponding to the fixed sequence of the performance metrics on first device 102.
In another example, the sharing may comprise sending, to second device 104, the index entry and the second performance data of only the impacted performance metrics in an order corresponding to the fixed sequence of the performance metrics on first device 102. In other words, the second performance data of only those performance metrics that meet the criterion defined therefor on first device 102 may be shared with second device 104. In a further example, index engine 122 may be used to represent the second performance data of the selected performance metrics as a binary sequence of Boolean values. In the binary sequence, a true Boolean value may represent the second performance data of a performance metric that meets the criterion defined on first device 102, and a false Boolean value may represent the second performance data of a performance metric that does not meet the criterion defined on first device 102. Maintenance engine 124 may organize the second performance data (or the nth performance data) of the performance metrics that meet the criterion defined on first device 102 in an order corresponding to the fixed sequence of the performance metrics on first device 102.
Transfer engine 130 may share the second performance data of the selected performance metrics as the binary sequence of Boolean values with second device 104. Transfer engine 130 may share, with second device 104, the second performance data of the performance metrics that meet the criterion defined on first device 102 in the order corresponding to the fixed sequence of the performance metrics on first device 102. An example is illustrated in
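The binary sequence of Boolean values and the reduced payload of impacted metrics might be built on first device 102 as sketched below. The thresholds for M2 and M4 are invented for illustration, `encode_changed` and `to_bitstring` are hypothetical helpers, and criteria are assumed to be predicates on the new value; an actual implementation might pack the flags into bits rather than a string.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Criterion = Callable[[float], bool]  # assumption: a threshold-style predicate


def encode_changed(fixed_sequence: Sequence[str],
                   data: Dict[str, float],
                   criteria: Dict[str, Criterion]) -> Tuple[List[bool], List[float]]:
    """One Boolean per metric (True = criterion met), plus the values of only
    the impacted metrics, both in fixed-sequence order."""
    flags = [bool(idx in criteria and criteria[idx](data[idx])) for idx in fixed_sequence]
    values = [data[idx] for idx, hit in zip(fixed_sequence, flags) if hit]
    return flags, values


def to_bitstring(flags: Sequence[bool]) -> str:
    """Compact textual form of the binary sequence, e.g. '1101'."""
    return "".join("1" if f else "0" for f in flags)


fixed_sequence = ("M1", "M2", "M3", "M4")
second_data = {"M1": 82.0, "M2": 85.0, "M3": 34.0, "M4": 87.0}
criteria = {"M1": lambda v: v > 75.0, "M2": lambda v: v > 80.0, "M4": lambda v: v > 80.0}
flags, values = encode_changed(fixed_sequence, second_data, criteria)
print(to_bitstring(flags), values)  # 1101 [82.0, 85.0, 87.0]
```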
In response, second device 104 may use the binary sequence of Boolean values and the second performance data of the performance metrics that meet the criterion defined on first device 102, to identify changed performance metrics on first device 102. Second device 104 may use the second performance data of the performance metrics that meet the criterion defined on the first device 102 as current performance data of the changed performance metrics on second device 104.
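On second device 104, the Boolean sequence may be walked in fixed-sequence order to match each received value to its metric. The sketch below assumes that the flags and the values of the impacted metrics arrive as two lists in fixed-sequence order; `apply_changes` is a hypothetical name.

```python
from typing import Dict, List, Sequence, Tuple


def apply_changes(fixed_sequence: Sequence[str],
                  current: Dict[str, float],
                  flags: Sequence[bool],
                  values: Sequence[float]) -> Tuple[Dict[str, float], List[str]]:
    """Use the Boolean sequence to match each received value to its metric and
    return the updated view together with the changed index entries."""
    if len(flags) != len(fixed_sequence):
        raise ValueError("Boolean sequence does not match the fixed sequence")
    if sum(flags) != len(values):
        raise ValueError("number of values does not match the number of True flags")
    updated = dict(current)  # unflagged metrics keep their previous value
    changed: List[str] = []
    remaining = iter(values)
    for idx, hit in zip(fixed_sequence, flags):
        if hit:
            updated[idx] = next(remaining)
            changed.append(idx)
    return updated, changed


fixed_sequence = ("M1", "M2", "M3", "M4")
first_data = {"M1": 85.0, "M2": 50.0, "M3": 24.0, "M4": 66.0}
updated, changed = apply_changes(fixed_sequence, first_data,
                                 [True, True, False, True], [82.0, 85.0, 87.0])
print(changed)  # ['M1', 'M2', 'M4']
print(updated)  # {'M1': 82.0, 'M2': 85.0, 'M3': 24.0, 'M4': 87.0}
```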
In an example, the second performance data of the selected performance metrics may be used by second device 104 for monitoring the performance metrics on first device 102. In another example, the second performance data of the selected performance metrics may be used for generating alerts by second device 104.
In response to a determination that the second performance data for any of the selected performance metrics does not meet the criterion defined on first device 102, the second performance data may not be shared with second device 104. If none of the selected performance metrics meets the criterion defined therefor, it may indicate that none of the selected performance metrics has changed to the extent that the rule defined in its respective criterion has been met. In such a case, in an example, first device 102 may send a keepalive message(s) to second device 104 simply to indicate that a particular component related to a performance metric is still functional. In response to the keepalive message, second device 104 may use the first performance data of the selected performance metrics as the current performance data of the respective performance metrics. This is illustrated in
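The keepalive fallback may be illustrated as follows. This is a minimal sketch assuming dictionary-shaped messages; `build_message`, `MonitorView`, and the message `type` values are hypothetical.

```python
from typing import Any, Dict, Sequence


def build_message(flags: Sequence[bool], values: Sequence[float]) -> Dict[str, Any]:
    """First device: fall back to a keepalive when no metric met its criterion."""
    if not any(flags):
        return {"type": "keepalive"}
    return {"type": "update", "flags": list(flags), "values": list(values)}


class MonitorView:
    """Second device: current view of the first device's performance data."""

    def __init__(self, fixed_sequence: Sequence[str], first_data: Dict[str, float]) -> None:
        self.fixed_sequence = list(fixed_sequence)
        self.current = dict(first_data)  # starts out as the first performance data
        self.keepalives = 0              # number of keepalives received so far

    def handle(self, msg: Dict[str, Any]) -> Dict[str, float]:
        if msg["type"] == "keepalive":
            # The component is still functional and nothing changed enough to be
            # sent, so the previously received data remains the current data.
            self.keepalives += 1
            return self.current
        remaining = iter(msg["values"])
        for idx, hit in zip(self.fixed_sequence, msg["flags"]):
            if hit:
                self.current[idx] = next(remaining)
        return self.current


fixed_sequence = ("M1", "M2", "M3", "M4")
first_data = {"M1": 85.0, "M2": 50.0, "M3": 24.0, "M4": 66.0}
view = MonitorView(fixed_sequence, first_data)
msg = build_message([False, False, False, False], [])
print(msg)                             # {'type': 'keepalive'}
print(view.handle(msg) == first_data)  # True
```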
Some performance metrics, for example, CPU utilization and memory utilization, may change more frequently than others. Other performance metrics, for example, the total disk capacity of a host or the total number of CPUs in a host, may not change often. In an example, based on a pre-defined frequency-based criterion, determination engine 128 may distinguish between these two categories of metrics. Those performance metrics that meet the pre-defined frequency-based criterion (e.g., changing more than a pre-defined number of times in a pre-defined time period) may be shared with second device 104 in the manner described earlier. On the other hand, those performance metrics that do not meet the pre-defined frequency-based criterion (e.g., that do not change more than a pre-defined number of times in a pre-defined time period) may not be shared with second device 104.
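A frequency-based criterion of the kind described above could, for example, count value changes within a sliding time window. The sketch below makes that assumption; `ChangeFrequencyTracker`, `max_changes`, and `window_s` are hypothetical names, and the sample values are invented.

```python
from collections import deque
from typing import Deque, Dict


class ChangeFrequencyTracker:
    """A metric qualifies for sharing if its value changed more than
    `max_changes` times within the last `window_s` seconds."""

    def __init__(self, max_changes: int, window_s: float) -> None:
        self.max_changes = max_changes
        self.window_s = window_s
        self._last_value: Dict[str, float] = {}
        self._change_times: Dict[str, Deque[float]] = {}

    def observe(self, metric: str, value: float, now: float) -> None:
        """Record a sample; a change is counted when the value differs from the previous one."""
        times = self._change_times.setdefault(metric, deque())
        if metric in self._last_value and self._last_value[metric] != value:
            times.append(now)
        self._last_value[metric] = value
        while times and now - times[0] > self.window_s:  # forget changes outside the window
            times.popleft()

    def is_frequent(self, metric: str, now: float) -> bool:
        recent = [t for t in self._change_times.get(metric, ()) if now - t <= self.window_s]
        return len(recent) > self.max_changes


tracker = ChangeFrequencyTracker(max_changes=2, window_s=60.0)
for t, cpu, disk in [(0, 85.0, 500.0), (10, 82.0, 500.0), (20, 90.0, 500.0), (30, 70.0, 500.0)]:
    tracker.observe("CPU utilization", cpu, t)
    tracker.observe("total disk capacity", disk, t)
print(tracker.is_frequent("CPU utilization", 30))      # True  (3 changes in 60 s)
print(tracker.is_frequent("total disk capacity", 30))  # False (no changes)
```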
In an example, instead of exact values of the second performance data (or the nth performance data) of the performance metrics, approximate values may be shared with second device 104. The performance data may be transferred only when it crosses from one range to another. To this end, the whole range of possible values may first be divided into multiple ranges. For example, for a CPU utilization metric, the possible values from 0% to 100% may be divided into ranges of 5% each. In another scenario, the upper limit of the possible metric values may not be known, but the extent of the changes may be known; for example, a metric “Data Transferred per Second” may fluctuate by a couple of KB/s. For such metrics, a band of, say, 10 KB/s (for example, 80 KB/s to 90 KB/s) may be considered as one range. For example, if the current value is 86 KB/s and the value during the next one-minute interval is 88 KB/s, then both values may be considered to be in the same range. However, if the new value is 67 KB/s or 92 KB/s, then it falls in another range. Thus, if the metric value remains in the same range in a subsequent interval, the value may not be shared with second device 104.
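The range-based filtering may be sketched with fixed-width ranges whose boundaries are assumed to be multiples of the range width, which is consistent with the 86, 88, 67, and 92 KB/s example above. `RangeFilter` and `range_number` are hypothetical names, and reporting the lower bound of a range is only one possible choice of approximate value.

```python
import math
from typing import Dict


def range_number(value: float, width: float) -> int:
    """Fixed-width ranges: [0, w) -> 0, [w, 2w) -> 1, and so on."""
    if width <= 0:
        raise ValueError("range width must be positive")
    return math.floor(value / width)


class RangeFilter:
    """Share a metric only when its value moves into a different range."""

    def __init__(self, widths: Dict[str, float]) -> None:
        self.widths = widths                   # range width per metric
        self._last_range: Dict[str, int] = {}

    def should_share(self, metric: str, value: float) -> bool:
        r = range_number(value, self.widths[metric])
        if self._last_range.get(metric) == r:
            return False                       # still in the same range
        self._last_range[metric] = r
        return True

    def approximate(self, metric: str, value: float) -> float:
        """Lower bound of the value's range, usable instead of the exact value."""
        width = self.widths[metric]
        return range_number(value, width) * width


range_filter = RangeFilter({"CPU utilization": 5.0, "Data Transferred per Second": 10.0})
print(range_filter.should_share("Data Transferred per Second", 86.0))  # True  (first sample)
print(range_filter.should_share("Data Transferred per Second", 88.0))  # False (same 80-90 range)
print(range_filter.should_share("Data Transferred per Second", 92.0))  # True  (moved to 90-100)
print(range_filter.approximate("CPU utilization", 82.0))               # 80.0
```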
In an example, index engine 122 may assign a common timestamp to the first performance data of the respective performance metrics on first device 102. Transfer engine 130 may share the common timestamp along with the first performance data with second device 104. This may save network bandwidth during the transfer of the first performance data from first device 102 to second device 104, since a single common timestamp is shared instead of individual timestamps for the first performance data of the respective performance metrics. Likewise, index engine 122 may assign a common timestamp to the second or nth performance data of the respective performance metrics on first device 102, and transfer engine 130 may share the common timestamp along with that performance data with second device 104.
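The saving from a common timestamp may be checked with a rough comparison of serialized message sizes, as sketched below. The JSON layouts, the function names, and the example timestamp are assumptions for illustration only.

```python
import json
from typing import Dict, Sequence


def with_common_timestamp(fixed_sequence: Sequence[str], data: Dict[str, float], ts: float) -> str:
    """One timestamp for the whole batch of performance data."""
    return json.dumps({"ts": ts, "data": [[idx, data[idx]] for idx in fixed_sequence]})


def with_per_metric_timestamps(fixed_sequence: Sequence[str], data: Dict[str, float], ts: float) -> str:
    """Baseline for comparison: every entry carries its own timestamp."""
    return json.dumps([{"id": idx, "ts": ts, "value": data[idx]} for idx in fixed_sequence])


fixed_sequence = ("M1", "M2", "M3", "M4")
first_data = {"M1": 85.0, "M2": 50.0, "M3": 24.0, "M4": 66.0}
ts = 1554000000.0  # arbitrary example time in epoch seconds
common = with_common_timestamp(fixed_sequence, first_data, ts)
per_metric = with_per_metric_timestamps(fixed_sequence, first_data, ts)
print(len(common) < len(per_metric))  # True
```

The exact saving depends on the encoding used; the comparison only shows that one timestamp per batch takes less space than one timestamp per metric.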
In an example, device 700 may represent any type of computing device capable of executing machine-readable instructions. Examples of the computing device may include, without limitation, a server, a desktop computer, a notebook computer, a tablet computer, a thin client, a mobile device, a personal digital assistant (PDA), and the like.
In an example, device 700 may be a network device. Examples of the network device may include, without limitation, a network router, a virtual router, a network switch, and a virtual switch.
In an example, device 700 may be a storage device. Examples of the storage device may include, without limitation, a non-transitory machine-readable storage medium that may store, for example, machine-executable instructions, data files, and metadata related to a data file. Some non-limiting examples of a non-transitory machine-readable storage medium may include a hard disk, a storage disc (for example, a CD-ROM, a DVD, etc.), a disk array, a storage tape, a solid state drive, a Serial Advanced Technology Attachment (SATA) disk drive, a Fibre Channel (FC) disk drive, a Serial Attached SCSI (SAS) disk drive, a magnetic tape drive, and the like. The storage device may be direct-attached storage, i.e., storage that is directly attached to a node. In an example, the storage device may be an external storage (for example, a storage array) that may communicate with a node via a communication interface.
In an example, device 700 may be communicatively coupled to a second device (e.g., second device 104 of
In an example, device 700 may include a selection engine 720, an index engine 722, a maintenance engine 724, a sequence engine 726, a determination engine 728, and a transfer engine 730. In an example, selection engine 720, index engine 722, maintenance engine 724, sequence engine 726, determination engine 728, and transfer engine 730 may perform functionalities as described in respect of selection engine 120, index engine 122, maintenance engine 124, sequence engine 126, determination engine 128, and transfer engine 130 of
In an example, selection engine 720 may select performance metrics for collection from device 700. Index engine 722 may index the performance metrics by assigning an index entry to respective performance metrics on device 700. Maintenance engine 724 may maintain a fixed sequence of the performance metrics on device 700. Sequence engine 726 may share the fixed sequence of the performance metrics along with the index entry assigned to the respective performance metrics with a second device. Determination engine 728 may determine a first performance data of the respective performance metrics on device 700. Transfer engine 730 may share the first performance data of the respective performance metrics with the second device. In an example, the sharing may comprise sending, to the second device, the index entry and the first performance data of the respective performance metrics in an order corresponding to the fixed sequence of the performance metrics on device 700.
For the purpose of simplicity of explanation, the example method of
It should be noted that the above-described examples of the present solution are for the purpose of illustration. Although the solution has been described in conjunction with a specific example thereof, numerous modifications may be possible without materially departing from the teachings of the subject matter described herein. Other substitutions, modifications, and changes may be made without departing from the spirit of the present solution. All of the features disclosed in this specification (including any accompanying claims, abstract, and drawings), and/or all of the parts of any method or process so disclosed, may be combined in any combination, except combinations where at least some of such features and/or parts are mutually exclusive.
Number | Date | Country
---|---|---
20200319989 A1 | Oct 2020 | US