The present application relates to archiving real-time data on system activities.
Enterprise systems are complex and keep evolving. It is difficult, if not impossible, to keep track of security vulnerabilities in such systems, and many unknown zero-day vulnerabilities exist today. A promising solution is to monitor the machines inside the enterprise system, notify system administrators whenever abnormal behavior is detected, and provide support for diagnosing that behavior. The monitoring data is a real-time data stream that describes the system activities of all monitored machines. To provide access to both real-time and historical data and to support subsequent queries and analysis, we propose a Data Stream Management System (DSMS) that archives the monitoring data of system activities.
Conventional systems focus only on supporting continuous queries over continuous streams and traditional stored data sets, by computing physical query plans flexible enough to support optimizations and fine-grained scheduling decisions. Because the bottleneck in archiving system activities is the sheer volume of data, and because queries rarely span multiple days, this work investigates how to leverage the characteristics of system activities to improve data archiving. No existing work has studied improving data archiving from this perspective.
The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
In one aspect, a data stream system includes one or more monitored machines generating a real-time data stream that describes the system activities of the monitored machines; a data stream management module receiving the real-time data stream; and a data stream archiving module coupled to the data stream management module, the data stream archiving module including a data stream receiver and a data stream inserter.
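The following Python sketch illustrates one way the data stream receiver and the data stream inserter could be decoupled so that receiving events never blocks on database writes. It assumes a thread-safe in-memory queue between the two components; the class names `DataStreamReceiver` and `DataStreamInserter`, the `_STOP` sentinel, and the `store` callback are hypothetical and are not taken from the specification.

```python
import queue
import threading
from typing import Callable

# Hypothetical sentinel telling the inserter thread to stop draining.
_STOP = object()


class DataStreamReceiver:
    """Hypothetical receiver: accepts events and hands them to a queue."""

    def __init__(self, out_queue: "queue.Queue[object]") -> None:
        self._out = out_queue

    def receive(self, event: dict) -> None:
        # A real deployment would read from the network; here the caller
        # pushes events directly.
        self._out.put(event)

    def close(self) -> None:
        self._out.put(_STOP)


class DataStreamInserter(threading.Thread):
    """Hypothetical inserter: drains the queue and stores each event."""

    def __init__(self, in_queue: "queue.Queue[object]",
                 store: Callable[[dict], None]) -> None:
        super().__init__(daemon=True)
        self._in = in_queue
        self._store = store

    def run(self) -> None:
        while True:
            item = self._in.get()
            if item is _STOP:
                break
            self._store(item)


# Usage: a list stands in for the archival database.
stored = []
q: "queue.Queue[object]" = queue.Queue()
receiver = DataStreamReceiver(q)
inserter = DataStreamInserter(q, stored.append)
inserter.start()
for i in range(3):
    receiver.receive({"machine": "host-a", "seq": i})
receiver.close()
inserter.join()
print(len(stored))  # 3
```

A single consumer thread preserves arrival order; the later techniques (partitioning, deduplication, buffering) would sit behind the `store` callback.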
In another aspect, the system activities are partitioned by machine and by day, and this partitioning is leveraged to physically partition the database 103. Next, leveraging the characteristics of the system activities, the system maintains in memory a partial state of the system objects that participate in the system activities and uses it to perform data deduplication, greatly reducing the number of times the server accesses the database for this purpose. Additionally, since only a small fraction of the system activities require updates to stored data, the server can maintain an in-memory buffer that holds incoming data and perform batch insertion, eliminating the need to parse an insertion SQL statement for each record and improving I/O performance. The same buffer also eliminates the need to update data in the database when the data to be updated is still in the buffer and has not yet been flushed to the database. Finally, a separate, low-execution-frequency thread is used to insert historical data.
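A minimal sketch of the first two techniques, assuming one physical table per (machine, day) partition and a per-partition in-memory cache of system objects. The `Event` fields, the `events_<machine>_<YYYYMMDD>` naming scheme, and the `ObjectDeduplicator` class are illustrative assumptions; the `db_inserts` counter stands in for actual database writes.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict


@dataclass(frozen=True)
class Event:
    machine: str   # monitored host that produced the event
    ts: float      # POSIX timestamp in seconds
    subject: str   # e.g. a process identifier
    obj: str       # e.g. a file path or network endpoint


def partition_name(machine: str, ts: float) -> str:
    """Hypothetical naming scheme: one physical table per machine per day."""
    day = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y%m%d")
    return f"events_{machine}_{day}"


class ObjectDeduplicator:
    """Hypothetical partial state of system objects, kept per partition.

    Only the first sighting of an object in a partition triggers a database
    insert; later sightings reuse the cached ID without a database access.
    """

    def __init__(self) -> None:
        self._ids: Dict[str, Dict[str, int]] = {}
        self._next_id = 1
        self.db_inserts = 0  # simulated database writes

    def object_id(self, partition: str, obj: str) -> int:
        cache = self._ids.setdefault(partition, {})
        if obj not in cache:
            cache[obj] = self._next_id
            self._next_id += 1
            self.db_inserts += 1  # stand-in for an INSERT into an object table
        return cache[obj]

    def evict(self, partition: str) -> None:
        # Dropping a closed day's state keeps memory bounded: the state is
        # partial rather than a full copy of every object ever seen.
        self._ids.pop(partition, None)


# Usage: the repeated file access is deduplicated in memory.
dedup = ObjectDeduplicator()
events = [
    Event("hostA", 1_700_000_000, "pid:42", "/etc/passwd"),
    Event("hostA", 1_700_000_010, "pid:42", "/etc/passwd"),
    Event("hostA", 1_700_000_020, "pid:43", "/tmp/x"),
]
for e in events:
    dedup.object_id(partition_name(e.machine, e.ts), e.obj)
print(dedup.db_inserts)  # 2
```

Scoping the cache to a day partition relies on the observation that queries rarely span multiple days, so state for older partitions can be discarded.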
Advantages of the system may include one or more of the following. The system is specialized to optimize data archiving by exploiting the characteristics of system activities. The solution is the first of its kind to make data archiving store less duplicated data and become more scalable with low overhead.
Referring now to the drawings in which like numerals represent the same or similar elements and initially to
First, the system activities are partitioned by machine and by day, and this partitioning is leveraged to physically partition the database 103. Second, leveraging the characteristics of the system activities, the system maintains in memory a partial state of the system objects that participate in the system activities and uses it to perform data deduplication, greatly reducing the number of times the server accesses the database for this purpose. Third, since only a small fraction of the system activities require updates to stored data, the server can maintain an in-memory buffer that holds incoming data and perform batch insertion, eliminating the need to parse an insertion SQL statement for each record and improving I/O performance. The same buffer also eliminates the need to update data in the database when the data to be updated is still in the buffer and has not yet been flushed to the database. Because the data is partitioned and inserted in batches, parallel insertion using multiple threads is feasible and could further improve insertion performance. Finally, a separate, low-execution-frequency thread is used to insert historical data.
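The buffering step can be sketched with Python's standard `sqlite3` module standing in for the archival database (an assumption; the specification does not name a database engine). The hypothetical `BufferedInserter` class holds rows in memory, writes each batch with a single `executemany` call, and applies an update in memory when the target row has not been flushed yet. The two-column `(id, status)` schema is illustrative only.

```python
import sqlite3
from typing import Dict, Tuple


class BufferedInserter:
    """Hypothetical batch inserter for one (machine, day) partition table."""

    def __init__(self, conn: sqlite3.Connection, table: str,
                 batch_size: int = 1000) -> None:
        self.conn = conn
        self.table = table  # comes from the internal partition scheme
        self.batch_size = batch_size
        self._buffer: Dict[int, Tuple[int, str]] = {}  # id -> (id, status)
        self.db_updates = 0  # SQL UPDATEs that actually reached the database

    def insert(self, event_id: int, status: str) -> None:
        self._buffer[event_id] = (event_id, status)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def update(self, event_id: int, status: str) -> None:
        if event_id in self._buffer:
            # Row not flushed yet: patch it in memory, no SQL UPDATE needed.
            self._buffer[event_id] = (event_id, status)
        else:
            self.conn.execute(
                f"UPDATE {self.table} SET status = ? WHERE id = ?",
                (status, event_id))
            self.db_updates += 1

    def flush(self) -> None:
        if not self._buffer:
            return
        # One parameterized statement executed for the whole batch.
        self.conn.executemany(
            f"INSERT INTO {self.table} (id, status) VALUES (?, ?)",
            list(self._buffer.values()))
        self.conn.commit()
        self._buffer.clear()


# Usage with an in-memory database and a batch size of 3.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events_hostA_20231114 "
             "(id INTEGER PRIMARY KEY, status TEXT)")
ins = BufferedInserter(conn, "events_hostA_20231114", batch_size=3)
ins.insert(1, "open")
ins.insert(2, "open")
ins.update(1, "closed")  # still buffered: handled in memory
ins.insert(3, "open")    # buffer reaches 3 rows and is flushed
ins.update(2, "closed")  # already flushed: issues a real SQL UPDATE
ins.flush()
rows = conn.execute(
    "SELECT id, status FROM events_hostA_20231114 ORDER BY id").fetchall()
print(rows)            # [(1, 'closed'), (2, 'closed'), (3, 'open')]
print(ins.db_updates)  # 1
```

Because each partition could own its own buffer, separate partitions might be flushed by separate threads, each holding its own connection; this is one possible way to realize the parallel insertion described above. The table name is interpolated into SQL only because it is generated internally, never taken from external input.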
A first storage device 122 and a second storage device 124 are operatively coupled to system bus 102 by the I/O adapter 120. The storage devices 122 and 124 can be any of a disk storage device (e.g., a magnetic or optical disk storage device), a solid state magnetic device, and so forth. The storage devices 122 and 124 can be the same type of storage device or different types of storage devices.
A speaker 132 is operatively coupled to system bus 102 by the sound adapter 130. A transceiver 142 is operatively coupled to system bus 102 by network adapter 140. A display device 162 is operatively coupled to system bus 102 by display adapter 160.
A first user input device 152, a second user input device 154, and a third user input device 156 are operatively coupled to system bus 102 by user interface adapter 150. The user input devices 152, 154, and 156 can be any of a keyboard, a mouse, a keypad, an image capture device, a motion sensing device, a microphone, a device incorporating the functionality of at least two of the preceding devices, and so forth. Of course, other types of input devices can also be used, while maintaining the spirit of the present principles. The user input devices 152, 154, and 156 can be the same type of user input device or different types of user input devices. The user input devices 152, 154, and 156 are used to input and output information to and from system 100.
Of course, the processing system 100 may also include other elements (not shown), as readily contemplated by one of skill in the art, as well as omit certain elements. For example, various other input devices and/or output devices can be included in processing system 100, depending upon the particular implementation of the same, as readily understood by one of ordinary skill in the art. For example, various types of wireless and/or wired input and/or output devices can be used. Moreover, additional processors, controllers, memories, and so forth, in various configurations can also be utilized as readily appreciated by one of ordinary skill in the art. These and other variations of the processing system 100 are readily contemplated by one of ordinary skill in the art given the teachings of the present principles provided herein.
Referring now to
In one embodiment, components 204, 206, 208, and 210 may include any components now known or known in the future for performing operations in physical (or virtual) systems (e.g., file access, Internet access, spawning new processes to handle data, etc.), and data collected from (or received from) the various components, for example as time series event data including file events and network events, may be employed as input to the archival engine/controller 212 according to the present principles. The archival engine/controller 212 may be directly connected to the physical system or may be employed to remotely monitor components of the system according to various embodiments of the present principles.
While the machine-readable storage medium is shown in an exemplary embodiment to be a single medium, the term “machine-readable storage medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. The term “machine-readable storage medium” shall also be taken to include any medium that is capable of storing or encoding a set of instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies of the present invention. The term “machine-readable storage medium” shall accordingly be taken to include, but not be limited to, solid-state memories, and optical and magnetic media.
It is to be understood that the above description is intended to be illustrative, and not restrictive. Many other embodiments will be apparent to those of skill in the art upon reading and understanding the above description. Although the present invention has been described with reference to specific exemplary embodiments, it will be recognized that the invention is not limited to the embodiments described, but can be practiced with modification and alteration within the spirit and scope of the appended claims. Accordingly, the specification and drawings are to be regarded in an illustrative sense rather than a restrictive sense. The scope of the invention should, therefore, be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.
This application claims priority to Provisional Application 62/137,414 filed Mar. 24, 2015, the content of which is incorporated by reference.
Number | Date | Country
---|---|---
62137414 | Mar 2015 | US