The present invention relates to a method for determining cached data for a cloud storage architecture and a cloud storage system using the method. More particularly, the present invention relates to a method for determining data in the cache memory of the cloud storage architecture and a cloud storage system using the method.
A cloud service system usually tries to provide its services to clients as quickly as possible in response to their requests. When the number of clients is small, this goal is easily achieved. However, if the number of clients is significant, limitations of the hardware architecture of the cloud service system and of network traffic require a reasonable allowance for response time. On the other hand, if the cloud service competes commercially with other cloud services, then no matter what the constraints are, the cloud service system should, with its limited resources, skillfully respond to its clients' requests in the shortest time. This is a common issue that many developers of cloud systems face, and a suitable solution is very much welcome.
In a conventional working environment, please refer to
As described above, it is obvious that determining the proper data to store in the cache 5 is important and can improve the performance of the cloud service, since hot data (accessed more often) can be accessed quickly for most requests while cold data (accessed less often) are provided at a tolerably slower speed. On average, the response time for all requests from the client computers 1 then falls within an acceptable range. Currently, there are many conventional algorithms to determine the data to be cached (stored in the cache 5), for example, Least Recently Used (LRU), Most Recently Used (MRU), Pseudo-LRU (PLRU), Segmented LRU (SLRU), 2-way set associative, Least-Frequently Used (LFU), Low Inter-reference Recent Set (LIRS), etc. These algorithms operate on the recency and frequency characteristics of the data being analyzed. Their results have nothing to do with other data (they are not data-associated). Some prior arts, such as Patent CN101777081A and DOI:10.1109/SKG.2005.136, disclose another type of cache algorithm, categorized as “data-associated” algorithms. They take the original cached data (the results of conventional cache algorithms) as target data to obtain “data-associated” data to be cached. This means the new cached data are associated with the original cached data to a certain degree (the new cached data have a higher chance of appearing along with the original cached data). The algorithms above are all found to be effective for some patterns of workloads. However, since they all count data that appear within a relative time segment, rather than an absolute time segment, the data chosen to be cached in a first time segment, e.g., a first 8-hour period, may not necessarily be accessed in a second time segment, e.g., a second 8-hour period after the first. This is easy to understand, since almost all data accesses are strongly time-related or frequency-related: for example, booting between 8:55 AM and 9:05 AM every morning, a meeting held at 2:00 PM every Wednesday, payroll billing once every two weeks, inventory conducted on the last day of every month, etc. Therefore, the time stamp itself is an important and independent factor to consider for cached data. However, there is no suitable solution yet.
This paragraph extracts and compiles some features of the present invention; other features will be disclosed in the follow-up paragraphs. It is intended to cover various modifications and similar arrangements included within the spirit and scope of the appended claims.
The goal of the present invention is to provide a method for determining data in the cache memory of a cloud storage system and a cloud storage system using the method. The method takes time-associated data accessed during a period of time in the past to analyze which data should be cached. The method includes the steps of: A. recording transactions from the cache memory of a cloud storage system during a period of time in the past, wherein each transaction comprises a time of recording, or a time of recording and the cached data accessed during the period of time in the past; B. assigning a specific time in the future; C. calculating a time-associated confidence for every cached data from the transactions based on a reference time; D. ranking the time-associated confidences; and E. providing the cached data with higher time-associated confidence in the cache memory, and removing the cached data with lower time-associated confidence from the cache memory when the cache memory is full before the specific time in the future. Step E may be replaced by step E′: providing the cached data with higher time-associated confidence and data calculated from at least one other cache algorithm in the cache memory to fill the cache memory before the specific time in the future, wherein there is a fixed ratio between the cached data with higher time-associated confidence and the data calculated from the other cache algorithm.
According to the present invention, the fixed ratio may be calculated based on the number of the data or space occupied by the data. The specific time is a specific minute in an hour, a specific hour in a day, a specific day in a week, a specific day in a month, a specific day in a season, a specific day in a year, a specific week in a month, a specific week in a season, a specific week in a year, or a specific month in a year. The transactions may be recorded regularly with a time span between two consecutively recorded transactions. The reference time may be within specific minutes in an hour, within specific hours in a day, or within specific days in a year.
The time-associated confidence is calculated and obtained by the steps of: C1. calculating a first number, which is the number of times the reference time appeared in the period of time in the past; C2. calculating a second number, which is the number of occurrences of the reference time in which a target cached data was accessed; and C3. dividing the second number by the first number.
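As a concrete illustration only (not part of the claimed invention), steps C1 to C3 can be sketched in Python as follows; the names `transactions`, `data_id`, and `in_reference_time` are hypothetical, with `transactions` assumed to be a list of (time of recording, set of accessed data) pairs and `in_reference_time` a predicate that tests whether a record falls within the reference time:

```python
def time_associated_confidence(transactions, data_id, in_reference_time):
    """Steps C1-C3: the fraction of reference-time records in which
    the target cached data (data_id) was accessed."""
    # C1: how many times the reference time appeared in the recorded period
    first = sum(1 for when, _ in transactions if in_reference_time(when))
    # C2: how many of those reference-time records include the target data
    second = sum(1 for when, accessed in transactions
                 if in_reference_time(when) and data_id in accessed)
    # C3: divide the second number by the first number
    return second / first if first else 0.0
```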
Preferably, the cache algorithm is Least Recently Used (LRU) algorithm, Most Recently Used (MRU) algorithm, Pseudo-LRU (PLRU) algorithm, Random Replacement (RR) algorithm, Segmented LRU (SLRU) algorithm, 2-way set associative algorithm, Least-Frequently Used (LFU) algorithm, Low Inter-reference Recent Set (LIRS) algorithm, Adaptive Replacement Cache (ARC) algorithm, Clock with Adaptive Replacement (CAR) algorithm, Multi Queue (MQ) algorithm, or data-associated algorithm with target data coming from the result of step D. The data may be in a form of object, block, or file.
The present invention also discloses a cloud storage system. The cloud storage system includes: a host, for processing data access; a cache memory, connected to the host, for temporarily storing cached data for fast access; a transaction recorder, configured to or installed in the cache memory and connected to the host, for recording transactions from the cache memory during a period of time in the past, wherein each transaction comprises a time of recording, or a time of recording and the cached data accessed during the period of time in the past, receiving a specific time in the future from the host, calculating a time-associated confidence for every cached data from the transactions based on a reference time, ranking the time-associated confidences, and providing the cached data with higher time-associated confidence in the cache memory, and removing the cached data with lower time-associated confidence from the cache memory when the cache memory is full before the specific time in the future; and a number of auxiliary memories, connected to the host, for distributedly storing data for access.
The cloud storage system may also include: a host, for processing data access; a cache memory, connected to the host, for temporarily storing cached data for fast access; a transaction recorder, configured to or installed in the cache memory and connected to the host, for recording transactions from the cache memory during a period of time in the past, wherein each transaction comprises a time of recording, or a time of recording and the cached data accessed during the period of time in the past, receiving a specific time in the future from the host, calculating a time-associated confidence for every cached data from the transactions based on a reference time, ranking the time-associated confidences, and providing the cached data with higher time-associated confidence and data calculated from at least one other cache algorithm in the cache memory to fill the cache memory before the specific time in the future, wherein there is a fixed ratio between the cached data with higher time-associated confidence and the data calculated from the other cache algorithm; and a number of auxiliary memories, connected to the host, for distributedly storing data for access. The fixed ratio may be calculated based on the number of the data or the space occupied by the data.
According to the present invention, the specific time in the future may be a specific minute in an hour, a specific hour in a day, a specific day in a week, a specific day in a month, a specific day in a season, a specific day in a year, a specific week in a month, a specific week in a season, a specific week in a year, or a specific month in a year. The transactions may be recorded regularly with a time span between two consecutively recorded transactions. The reference time may be within specific minutes in an hour, within specific hours in a day, or within specific days in a year.
The time-associated confidence is calculated and obtained by the steps of: C1. calculating a first number, which is the number of times the reference time appeared in the period of time in the past; C2. calculating a second number, which is the number of occurrences of the reference time in which a target cached data was accessed; and C3. dividing the second number by the first number.
Preferably, the cache algorithm may be LRU algorithm, MRU algorithm, PLRU algorithm, RR algorithm, SLRU algorithm, 2-way set associative algorithm, LFU algorithm, LIRS algorithm, ARC algorithm, CAR algorithm, MQ algorithm, or data-associated algorithm with target data generated from the transaction recorder. The data may be in a form of object, block, or file.
The cached data are time-related. Thus, when the next related time comes, these data are most likely to be accessed. Before the related time, these data can be stored in the cache memory to improve the performance of the cloud storage system. This is something conventional cache algorithms can hardly achieve.
The present invention will now be described more specifically with reference to the following embodiments.
An ideal architecture to implement the present invention is shown in
The main function of the host 101 is to process data access for the requests from the client devices. In fact, the host 101 may be a controller in the server 100. In other embodiments, if a CPU (Central Processing Unit) of the server 100 has the same function as the controller mentioned above, the host 101 can refer to the CPU or even the server 100 itself. The host 101 is defined not by its form but by its function. In addition, the host 101 may have further functions, e.g., fetching hot data into the cache memory 102 for caching. These are not within the scope of the present invention.
The cache memory 102 is connected to the host 101. It can temporarily store cached data for fast access. In practice, the cache memory 102 can be any hardware providing high-speed data access. For example, the cache memory 102 may be an SRAM. The cache memory 102 may be an independent module in a large cloud storage system; some architectures may embed it into the host 101 (CPU). As with caches in other cloud storage systems, there may be a predefined caching algorithm that determines which data should be cached in the cache memory 102. The present invention provides a mechanism that works in parallel with the existing caching algorithm for a specific purpose or timing. In fact, it can also dominate the caching mechanism and replace the cached data determined by the original caching algorithm.
The transaction recorder 103 is a key part of the cloud storage system 10. In this embodiment, it is a hardware module configured to the cache memory 102. In other embodiments, the transaction recorder 103 may be software installed in a controller of the cache memory 102 or in the host 101. In the present embodiment, the transaction recorder 103 is connected to the host 101. It has several functions that are the features of the present invention: recording transactions from the cache memory 102 during a period of time in the past, wherein each transaction includes a time of recording, or a time of recording and the cached data accessed during the period of time in the past; receiving a specific time in the future from the host 101; calculating a time-associated confidence for every cached data from the transactions based on a reference time; ranking the time-associated confidences; and providing the cached data with higher time-associated confidence in the cache memory 102 and removing the cached data with lower time-associated confidence from the cache memory 102 when the cache memory 102 is full before the specific time in the future (or providing the cached data with higher time-associated confidence and data calculated from another cache algorithm in the cache memory 102 to fill the cache memory 102 before the specific time in the future). These functions will be described later together with a method provided by the present invention. It should be emphasized that the term “time-associated confidence” used in the present invention is similar to the definition of the confidence value in association rule mining. The time-associated confidence extends that confidence value by taking a specific time or time segment as the target, giving the probability that one or more data were accessed at that time in the historical records.
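For illustration only, a minimal Python sketch of the kind of record the transaction recorder 103 could keep is given below; the names `Transaction` and `TransactionRecorder` are hypothetical and not part of the disclosure:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Transaction:
    """One record: a time of recording, plus the cached data
    accessed since the previous record (possibly none)."""
    recorded_at: datetime
    accessed: frozenset  # identifiers of cached data accessed in this span

class TransactionRecorder:
    """Appends one Transaction per fixed time span, e.g. every 10 minutes."""
    def __init__(self):
        self.log = []

    def record(self, now: datetime, accessed_ids=()):
        self.log.append(Transaction(now, frozenset(accessed_ids)))
```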
The auxiliary memories 104 are also connected to the host 101. They can distributedly store data for access according to the demands of clients. Unlike the cache memory 102, the auxiliary memories 104 have slower I/O speed, so any data therein are accessed more slowly in response to access requests. Frequently accessed data in the auxiliary memories 104 will be duplicated and stored in the cache memory 102 for caching. In practice, an auxiliary memory 104 may be an SSD, an HDD, a writable DVD, or even magnetic tape. The arrangement of the auxiliary memories 104 depends on the purpose of the cloud storage system 10 or the workloads running on it. In this example, there are 3 auxiliary memories 104. In fact, in a cloud storage system the number of auxiliary memories may be hundreds to thousands, or even more.
Before further description, some definitions used in the present invention are explained here. Please refer to
It should be noticed that, in practice, the number of transactions is large and may be thousands or more, for example, with a time span of ten minutes and records kept over 3 months; 24 transactions are used only as an example for illustration. The more transactions the transaction recorder 103 has, the more precisely the demand for data at a specific time in the future can be predicted. Of course, not all data cached in the cache memory 102 may be accessed during a period of time. As shown in
Before the method to determine data in the cache memory 102 is disclosed with the cloud storage system 10, look at the cached data first. Although there are 18 cached data, the number of cached data may be larger than 18, depending on the capacity of the cache memory 102. The 18 cached data are those currently available at 07:50:05 according to the method of the present invention and/or other caching algorithms used by the cloud storage system 10. Since the transaction recorder 103 may add new data to the cache memory 102 from one of the auxiliary memories 104 if those data are accessed very often, the cached data under analysis may change as well. There might be other data cached before 03:50:05 but removed because they were not requested or not “expected to be accessed”.
From
The main goal of the present invention is to predict requests for data at a specific time in the future according to historical information, and to provide the corresponding data in the cache memory 102 before that specific time comes. A method to determine data in the cache memory 102 of the cloud storage system 10 has several processes. Please refer to
The third step is to calculate a time-associated confidence for every cached data from the transactions based on a reference time (S03). Here, the reference time refers to the time “within specific minutes in an hour” (H00, each 20 minutes in the first hour of a day). In other examples, the reference time may be “within specific hours in a day” or “within specific days in a year”, depending on the number of records and the time span. In a particular example, the reference time can be “within all sub-time units of a main time unit”, for example, within the 24 hours of a day. The time-associated confidence is calculated and obtained by the steps of: A. calculating a first number, which is the number of times the reference time appeared in the period of time in the past; B. calculating a second number, which is the number of occurrences of the reference time in which a target cached data was accessed; and C. dividing the second number by the first number. In this example, the calculated time-associated confidences for all data are tabularized in
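As a purely illustrative numeric example (the timestamps, data identifiers, and the `in_H00` predicate below are hypothetical, not taken from the embodiment), the confidence of a data item for a reference time covering the first hour of a day could be computed as follows:

```python
from datetime import datetime

def in_H00(when: datetime) -> bool:
    # Reference time "within specific minutes in an hour": here, any record
    # falling within hour 00 of a day. Purely illustrative.
    return when.hour == 0

# Hypothetical transaction log: (time of recording, data accessed in that span).
log = [
    (datetime(2024, 1, 1, 0, 20), {"D1", "D3"}),
    (datetime(2024, 1, 1, 0, 40), {"D3"}),
    (datetime(2024, 1, 2, 0, 20), {"D1", "D2"}),
    (datetime(2024, 1, 2, 8, 0), {"D2"}),  # outside the reference time
]

first = sum(1 for when, _ in log if in_H00(when))                     # step A -> 3
second = sum(1 for when, acc in log if in_H00(when) and "D1" in acc)  # step B -> 2
print(second / first)  # step C: confidence of D1 for H00 -> 0.666...
```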
Next, rank the time-associated confidences (S04). The results of the examples are also shown in
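A minimal sketch of steps S04 and S05, assuming the confidences are held in a hypothetical dictionary and the cache capacity is counted in number of data items:

```python
def refresh_cache(confidences, capacity):
    """S04: rank by time-associated confidence (highest first).
    S05: keep the top entries; the rest are removed when the
    cache memory is full before the specific time in the future."""
    ranked = sorted(confidences, key=confidences.get, reverse=True)
    return ranked[:capacity], ranked[capacity:]

keep, evict = refresh_cache({"D1": 0.9, "D2": 0.3, "D3": 0.7, "D4": 0.1}, 2)
print(keep)   # ['D1', 'D3'] are provided in the cache memory
print(evict)  # ['D2', 'D4'] are removed when the cache is full
```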
In another embodiment, the last step (S05) can be different, meaning the transaction recorder 103 has a function different from the one in the previous embodiment. The changed step is providing the cached data with higher time-associated confidence and data calculated from at least one other cache algorithm in the cache memory 102 to fill the cache memory 102 before the specific time in the future. There is a fixed ratio between the cached data with higher time-associated confidence and the data calculated from the other cache algorithm. The fixed ratio is calculated based on the number of the data or the space occupied by the data. Come back to
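As a sketch of this variant, assuming a capacity counted in number of data items and a hypothetical 2:1 ratio between time-associated data and data from another cache algorithm (e.g., LRU); the function name and inputs are illustrative only:

```python
def fill_cache(capacity, ranked_time_assoc, other_algo_data, ratio=(2, 1)):
    """Step E' variant of S05: fill the cache with time-associated data and
    data from another cache algorithm at a fixed ratio (counted by items)."""
    ta_slots = capacity * ratio[0] // sum(ratio)  # slots for time-associated data
    chosen = ranked_time_assoc[:ta_slots]         # highest confidences first
    for d in other_algo_data:                     # e.g., the output of LRU
        if len(chosen) >= capacity:
            break
        if d not in chosen:
            chosen.append(d)
    return chosen

# Capacity 6 and ratio 2:1 -> 4 time-associated items plus 2 from the other algorithm.
print(fill_cache(6, ["D1", "D5", "D2", "D7", "D9"], ["D3", "D5", "D8"]))
# -> ['D1', 'D5', 'D2', 'D7', 'D3', 'D8']
```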
While the invention has been described in terms of what are presently considered to be the most practical and preferred embodiments, it is to be understood that the invention need not be limited to the disclosed embodiments. On the contrary, it is intended to cover various modifications and similar arrangements included within the spirit and scope of the appended claims, which are to be accorded the broadest interpretation so as to encompass all such modifications and similar structures.