A computer system may include applications that are released and able to run on various combinations of database systems, operating systems, virtualization layers and cloud services, such as Infrastructure-as-a-Service (“IaaS”). Various infrastructure components of the computer system may be instrumented and monitored to help keep business processes up and running. While a snapshot of current monitoring data may provide a relatively good impression of current system behavior, monitoring data history for a relatively long period of time may better help determine how the behavior of the computer system changes over time. For example, a monitoring data history of more than one year may be maintained, which might add up to several hundred gigabytes (“GB”) of raw data for various elements of the computer system. Keeping such a substantial amount of data, however, may be expensive and increase the Total Cost of Ownership (“TCO”) of the computer system.
The following description is provided to enable any person in the art to make and use the described embodiments and sets forth the best mode contemplated for carrying out some embodiments. Various modifications, however, will remain readily apparent to those in the art.
In some cases, a computer system may include applications that are released and able to run on various combinations of database systems, operating systems, virtualization layers and cloud services, such as IaaS. By way of example, only
Various infrastructure components of the system 100 may be instrumented and monitored to help keep business processes up and running. While a snapshot may provide a relatively good impression of current system 100 behavior, a monitoring platform 150 may receive monitoring data and store information into a storage unit 160 as monitoring data history 170 for a relatively long period of time to better determine how the behavior of the computer system 100 changes over time. For example, a monitoring data history 170 of more than one year may be maintained, which might add up to several hundred GB of raw data for various elements of the computer system 100. Keeping such a substantial amount of data, however, may be expensive and increase the TCO of the computer system 100.
One approach to reducing the amount of stored monitoring data history 170 is to aggregate the information. For example, after one minute the raw data that was originally collected every 10 seconds may be aggregated on a minute basis.
To partly compensate for this accuracy loss, maximum and minimum values associated with an aggregated time period may also be maintained (as illustrated by the “Max” and “Min” columns in the table 200 of
To avoid such problems,
At S410, monitoring data for a “computer system” may be received, the monitoring data including at least one d digit operating performance parameter of the computer system. As used herein, the phrase “computer system” may refer to a system that includes, for example, a database system, an operating system, a virtualization layer, a cloud service, an infrastructure as a service platform, a real-time analytics, interactive data exploration and application platform, a real time data acquisition platform, a transactional, analytical, online application, a customer mobile application, a business object suite, and/or a business objects data service.
At S420, a rounding engine may access the monitoring data and transform the monitoring data into rounded monitoring data such that the d digit operating performance parameter is rounded to preserve only the m most significant digits, m being less than d. Consider, for example, a 6 digit operating performance parameter of “123456” that is to be rounded to preserve only the 3 most significant digits. In this case, the rounding engine would transform “123456” into “123000.” Now consider, for example, a 6 digit operating performance parameter of “123456” that is to be rounded to preserve the 4 most significant digits. In this case, the rounding engine would transform “123456” into “123500” (the fourth digit is rounded up because the discarded digits, “56,” are more than half of the rounding unit). According to some embodiments, a particular digit may be rounded to the nearest integer. In other approaches, a digit might always be rounded down (or up). Note that rounded monitoring data may be created for each item of monitoring data that is received (that is, aggregated or averaged values may be avoided).
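The rounding step at S420 can be sketched as follows. This is a minimal illustration only, assuming non-negative integer parameters; the function name `round_significant` and the `mode` argument are hypothetical and do not come from the described embodiments.

```python
def round_significant(value: int, m: int, mode: str = "nearest") -> int:
    """Keep only the m most significant digits of a non-negative integer.

    mode may be "nearest", "down", or "up" (names chosen for this sketch).
    """
    if value < 0 or m < 1:
        raise ValueError("expects a non-negative value and m >= 1")
    d = len(str(value))          # number of digits in the parameter
    if m >= d:
        return value             # nothing to round away
    unit = 10 ** (d - m)         # place value of the last preserved digit
    if mode == "nearest":
        return ((value + unit // 2) // unit) * unit
    if mode == "down":
        return (value // unit) * unit
    if mode == "up":
        return -(-value // unit) * unit
    raise ValueError(f"unknown mode: {mode}")


print(round_significant(123456, 3))          # 123000
print(round_significant(123456, 4))          # 123500
print(round_significant(123456, 4, "down"))  # 123400
```

Integer arithmetic is used here instead of floating-point rounding so that the preserved digits are exact for large counters.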
At S430, the rounded monitoring data may then be stored into a history storage unit. The history storage unit may, for example, store the rounded monitoring data into a rounded monitoring data history. The history storage unit may comprise, for example, columnar data storage in an in-memory database. The rounded monitoring data history in the history storage unit may then later be retrieved and used to determine, for example, a standard aggregation, a sum, an exception aggregation, a maximum value, and/or a minimum value. Note that separate rounded monitoring data history may be maintained for multiple computer systems (and the information about each computer system may later be combined and/or analyzed as appropriate).
The computer system 500 may include one or more data sources, such as a query-responsive data source or a source that is or becomes known, including but not limited to a Structured-Query Language (“SQL”) relational database management system. The data source may comprise a relational database, a multi-dimensional database, an eXtensible Markup Language (“XML”) document, or any other data storage system storing structured and/or unstructured data. The data of the data source may be distributed among several relational databases, dimensional databases, and/or other data sources. Embodiments are not limited to any number or types of data sources. For example, the data source may comprise one or more OLAP databases, spreadsheets, text documents, presentations, etc.
In some embodiments, a data source may be implemented in Random Access Memory (e.g., cache memory for storing recently-used data) and one or more fixed disks (e.g., persistent memory for storing their respective portions of the full database). Alternatively, the data source may implement an “in-memory” database, in which volatile (e.g., non-disk-based) memory (e.g., Random Access Memory) is used both for cache memory and for storing its entire respective portion of the full database. In some embodiments, the data of the data source may comprise one or more of conventional tabular data, row-based data, column-based data, and object-based data. The data source may also or alternatively support multi-tenancy by providing multiple logical database systems which are programmatically isolated from one another. Moreover, the data of the data source may be indexed and/or selectively replicated in an index to allow fast searching and retrieval thereof.
A rounding engine 580 in the monitoring platform 550 may receive monitoring data, round the monitoring data to preserve a pre-determined number of most significant digits, and store the rounded information into a columnar database storage unit 560 as rounded monitoring data history 570. The rounded monitoring data history 570 may represent a relatively long period of time and may facilitate a determination about how the computer system 500 behavior changes over time. Note that such an approach may avoid aggregation and utilize efficient compression capabilities of columnar data storage in an in-memory database. For example, a columnar database may have relatively good compression ratio when the table columns contain many duplicates. Unfortunately, accurate raw data generally does not lead to many duplicates. In contrast, rounded raw data does generally include many duplicates depending on the number of digits that are rounded. If only the first digit of a fixed-length number (counting from left) is maintained, there will only be 10 different values per column. If the second digit of the fixed-length number is also maintained, there will be a maximum of 100 different values per column, etc.
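The effect of rounding on the number of distinct column values can be illustrated with a short sketch. It assumes synthetic, uniformly random 8-digit values and rounding to the nearest unit; the helper names `round_significant` and `distinct_count` are hypothetical, and the sketch counts distinct values only rather than modeling any particular database's compression.

```python
import random


def round_significant(value: int, m: int) -> int:
    """Round a non-negative integer to its m most significant digits (nearest)."""
    digits = len(str(value))
    if m >= digits:
        return value
    unit = 10 ** (digits - m)
    return ((value + unit // 2) // unit) * unit


def distinct_count(values, m):
    """Number of distinct values left in a column after rounding to m digits."""
    return len({round_significant(v, m) for v in values})


# Synthetic fixed-length (8 digit) samples; real monitoring data will differ.
rng = random.Random(0)
raw = [rng.randrange(10_000_000, 100_000_000) for _ in range(10_000)]

for m in (8, 2, 1):
    print(f"{m} digits kept: {distinct_count(raw, m)} distinct values")
```

With fewer distinct values per column, a dictionary-style columnar encoding has far fewer entries to store, which is the property the paragraph above relies on.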
Assuming a normal distribution of rounding errors, a deviation of the rounded operating performance parameters as compared to the operating performance parameters prior to rounding is given by the following equation:
where σ is the deviation, X is the operating performance parameter, Y is the rounded operating performance parameter, and N is the number of times monitoring data was received. Note that the formula may comprise a calculation of a standard deviation of a set of normally distributed data records. Xi represents the original data records (8 digits, without any rounding), and Yi represents the rounded data records. The term (Xi-Yi) represents the difference between the original and rounded data records, and it describes how much the rounded data record deviates from the original data record. Calculating the standard deviation in this way may provide an estimate of how much deviation can be expected between the original and rounded data records after aggregation (sum, average) for a data set with N data records. When the size of collected data is a problem, the data may be aggregated (for instance, by summing or averaging data records that are collected each minute into an aggregated data record for each hour) or the precision of data records may be reduced (by rounding or filtering data). The formula illustrates that the deviation between aggregated rounded data and aggregated original data may be negligible when the number of data records is high enough. Thus, losing data precision may provide a substantially better alternative as compared to aggregating monitoring data upfront.
On an aggregated level, the deviation of rounded data compared to generated original data may be negligible, even for relatively low accuracy levels. As a result, aggregations may continue to work with rounded data with almost no difference as compared to original, un-rounded data. Moreover, the deviation of single raw data records as compared to records rounded at digit 2 (that is, preserving the two most significant digits) may be acceptable for the purpose of root cause analysis of a computer system, because the maximum deviation may be substantially 5%. Because the rounded data is kept at the original sampling rate, any smoothing effect may be avoided and analysis may still calculate standard aggregations, like sums, and exception aggregations, like maximum and minimum values, in all directions.
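The two claims above (a per-record deviation bounded near 5% at two digits, and a much smaller deviation after aggregation) can be checked empirically with a small sketch. It assumes synthetic, uniformly random 8-digit values and rounding to the nearest unit; the names `round_significant`, `max_record_dev`, and `sum_dev` are hypothetical, and the sketch does not reproduce the deviation equation itself.

```python
import random


def round_significant(value: int, m: int) -> int:
    """Round a non-negative integer to its m most significant digits (nearest)."""
    digits = len(str(value))
    if m >= digits:
        return value
    unit = 10 ** (digits - m)
    return ((value + unit // 2) // unit) * unit


# Synthetic 8 digit samples standing in for original monitoring data.
rng = random.Random(42)
raw = [rng.randrange(10_000_000, 100_000_000) for _ in range(10_000)]
rounded = [round_significant(x, 2) for x in raw]

# Largest relative deviation of any single record from its original value.
max_record_dev = max(abs(x - y) / x for x, y in zip(raw, rounded))

# Relative deviation of the aggregated (summed) rounded data.
sum_dev = abs(sum(raw) - sum(rounded)) / sum(raw)

print(f"max single-record deviation: {max_record_dev:.4%}")
print(f"deviation of the sum:        {sum_dev:.6%}")
```

Because rounding errors to the nearest unit fall on both sides of the original values, they largely cancel in a sum, while each individual record stays within half a unit of its second digit.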
Note that embodiments of a monitoring platform having a rounding engine may be implemented in any of a number of different ways. For example,
The apparatus 700 includes a processor 710 operatively coupled to a communication device 720, a data storage device 730, one or more input devices 740, one or more output devices 750, and a memory 760. The communication device 720 may facilitate communication with external devices, such as a reporting client, a data storage device, or elements of a computer system being monitored. The input device(s) 740 may comprise, for example, a keyboard, a keypad, a mouse or other pointing device, a microphone, a knob or a switch, an Infra-Red (“IR”) port, a docking station, and/or a touch screen. The input device(s) 740 may be used, for example, to enter information into apparatus 700 such as rounding information, report generation requests, etc. The output device(s) 750 may comprise, for example, a display (e.g., a display screen), a speaker, and/or a printer to output monitoring data history reports.
The data storage device 730 may comprise any appropriate persistent storage device, including combinations of magnetic storage devices (e.g., magnetic tape and hard disk drives), flash memory, optical storage devices, Read Only Memory (“ROM”) devices, etc., while the memory 760 may comprise Random Access Memory (“RAM”).
A rounding engine 732 may comprise program code executed by processor 710 to cause apparatus 700 to perform any one or more of the processes described herein. Embodiments are not limited to execution of these processes by a single apparatus. The monitoring data history 734 and/or rounded monitoring data history 736 may be stored, for example, in a columnar database. The data storage device 730 may also store data and other program code for providing additional functionality and/or which are necessary for operation of apparatus 700, such as device drivers, operating system files, etc.
To facilitate different number ranges and scales, some embodiments described herein may use a fixed precision for all values, counting from left, of the two most significant digits. That is, a value of “1,124,345” will be rounded to “1,100,000” and a value of “193” will be rounded to “190.”
Note that the table 900 may show the size of the table in a database depending on the number of relevant digits that are not rounded, starting with the maximum of 8 digits (not rounded at all) and going down to 1 digit (7 digits rounded). In the case of 8 digits (not rounded at all), the table within the database has a size of 17,179 KB. In the case of 2 digits (6 digits are rounded), the same table has a size of 1,133 KB. That is, comparing 8 digits (not rounded at all) to 2 digits (6 digits are rounded) shows a compression factor of 17,179 KB divided by 1,133 KB, or approximately 15. In the table, the “Digits” column represents the number of unrounded digits. The original, raw, un-rounded data has 8 digits. The “Size” column in the table 900 shows the remaining table size depending on the number of rounded digits.
In one approach to life cycle management for historical data, the original sample data is stored in a rounded format from the beginning. That is, the transformation is performed as each operating performance parameter is received. In another approach, the original sample data may be preliminarily stored as measured with the highest accuracy (that is, un-rounded). An asynchronous job may then periodically round data for which the highest accuracy is no longer needed. For example,
When the event occurs at S1020, each of the stored d digit operating performance parameters is rounded to the m most significant digits at S1030. The batch of rounded operating performance values may then be added to a rounded monitoring data history at S1040 (and the original un-rounded values may be deleted). The method 1000 may then continue collecting un-rounded monitoring data at S1010. In this way, the rounding transformation may be performed asynchronously (or synchronously) for a plurality of received operating performance parameters upon an occurrence of an event.
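The batch flow of method 1000 might be sketched as shown below. This is an illustration under stated assumptions rather than the described implementation: the class `RoundingBuffer`, its method names, and the use of in-memory lists in place of a history storage unit are all hypothetical, and the triggering event is modeled as an explicit method call.

```python
from dataclasses import dataclass, field
from typing import List


def round_significant(value: int, m: int) -> int:
    """Round a non-negative integer to its m most significant digits (nearest)."""
    digits = len(str(value))
    if m >= digits:
        return value
    unit = 10 ** (digits - m)
    return ((value + unit // 2) // unit) * unit


@dataclass
class RoundingBuffer:
    """Holds un-rounded samples until an event triggers batch rounding."""
    m: int                                             # digits to preserve
    pending: List[int] = field(default_factory=list)   # un-rounded data (S1010)
    history: List[int] = field(default_factory=list)   # rounded history (S1040)

    def collect(self, value: int) -> None:
        """S1010: store a newly received parameter at full accuracy."""
        self.pending.append(value)

    def on_event(self) -> int:
        """S1020 to S1040: round the batch, append it, drop the originals."""
        batch = [round_significant(v, self.m) for v in self.pending]
        self.history.extend(batch)
        self.pending.clear()
        return len(batch)


buf = RoundingBuffer(m=2)
for sample in (1124345, 193, 87654321):
    buf.collect(sample)
buf.on_event()
print(buf.history)  # [1100000, 190, 88000000]
```

In practice the event could be a timer, a storage threshold, or an age limit on the un-rounded data; the sketch leaves that choice to the caller.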
The foregoing diagrams represent logical architectures for describing processes according to some embodiments, and actual implementations may include more or different components arranged in other manners. Other topologies may be used in conjunction with other embodiments. Moreover, each system described herein may be implemented by any number of devices in communication via any number of other public and/or private networks. Two or more of such computing devices may be located remote from one another and may communicate with one another via any known manner of network(s) and/or a dedicated connection. Each device may comprise any number of hardware and/or software elements suitable to provide the functions described herein as well as any other functions. For example, any computing device used in an implementation of system 500 may include a processor to execute program code such that the computing device operates as described herein.
All systems and processes discussed herein may be embodied in program code stored on one or more non-transitory computer-readable media. Such media may include, for example, a floppy disk, a CD-ROM, a DVD-ROM, a Flash drive, magnetic tape, and solid state Random Access Memory (RAM) or Read Only Memory (ROM) storage units. Embodiments are therefore not limited to any specific combination of hardware and software.
Embodiments described herein are solely for the purpose of illustration. Those in the art will recognize other embodiments may be practiced with modifications and alterations to that described above.