The present invention relates to data warehouse systems, and more particularly, to an improved system and method for allocating and managing storage resources to optimize system performance.
A database is a collection of stored data that is logically related and that is accessible by one or more users or applications. A popular type of database is the relational database management system (RDBMS), which includes relational tables, also referred to as relations, made up of rows and columns (also referred to as tuples and attributes). Each row represents an occurrence of an entity defined by a table, with an entity being a person, place, thing, or other object about which the table contains information.
One of the goals of a database management system is to optimize the performance of queries for access and manipulation of data stored in the database. Given a target environment, an optimal query plan is selected, with the optimal query plan being the one with the lowest cost, e.g., response time, as determined by an optimizer. The response time is the amount of time it takes to complete the execution of a query on a given system.
In computer systems featuring a plurality of storage devices, storage space of hard disk drives (HDDs), solid state storage devices (SSDs), or other storage media can be managed to ensure that the most frequently accessed data stays on the faster storage devices. One such system, Teradata Virtual Storage (TVS) by Teradata Corporation, attempts to achieve optimal system performance by placing the most frequently accessed (hottest) data in the locations with the lowest average response time (fastest).
In a Teradata Database system, the Teradata Virtual Storage subsystem allocates storage for Teradata Access Module Processors (AMPs), described below, and identifies these units of storage, known as cylinders, by an identifier known as a cylinder identifier or cylinder ID. One component of this cylinder ID is an index field which, along with other fields in the cylinder ID, is used to logically identify the piece of storage. TVS may move the contents of the cylinder from one physical location to another, but will still refer to these contents by the same cylinder ID. The cylinder ID to physical location relationship is stored in a map on the disk where the cylinder currently resides. Algorithms in TVS attempt to keep the maximum index value as small as possible to limit the amount of memory required for lookup. Indexes can be reused when a cylinder is freed. The cylinder ID created from the re-used index may point to a cylinder on a different disk from the original cylinder.
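The allocation and reuse scheme described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the class and field names are hypothetical, and the actual TVS data structures are internal to Teradata and not reproduced here.

```python
# Hypothetical sketch of cylinder-ID index allocation with index reuse.
# Reusing freed indexes keeps the maximum index -- and thus the memory
# required for lookup -- as small as possible.

class CylinderAllocator:
    def __init__(self):
        self.free_indexes = []   # indexes returned when cylinders are freed
        self.next_index = 0      # next never-used index
        self.location = {}       # cylinder index -> (device, offset)

    def allocate(self, device, offset):
        # Prefer a recycled index so the maximum index stays small.
        if self.free_indexes:
            index = self.free_indexes.pop()
        else:
            index = self.next_index
            self.next_index += 1
        self.location[index] = (device, offset)
        return index

    def free(self, index):
        del self.location[index]
        self.free_indexes.append(index)

    def migrate(self, index, device, offset):
        # Contents may move to another physical location, but the index
        # (and hence the cylinder ID) does not change -- only the map entry.
        self.location[index] = (device, offset)
```

Note that a freed index may later be reassigned to a cylinder on a different disk, which is why a disk that is missing at startup can hide in-use indexes from the allocator.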
This process works as long as all disks which might hold cylinders for an AMP are accessible during startup, so that TVS can determine all of the cylinder ID indexes that are in use for the AMP. If in-use indexes are not accounted for because a disk is missing, they may be reused, and corruption will occur when the failed disk is returned to service because there will be duplicate cylinder IDs. Because of this, and because other components of the Teradata Database system in addition to TVS have trouble starting up with cylinders missing, AMPs that might be affected by a missing disk are kept offline in a state known as “FATAL**” to prevent any allocations that might result in a duplicate cylinder ID index.
There is a desire to be able to start up the database system with some storage components missing. A method to prevent the assignment of duplicate cylinder ID indexes during startup is therefore desired.
Simple solutions such as segregating cylinder IDs by device do not work because TVS can migrate cylinders between devices and the cylinder ID must not change when the cylinder is moved. Keeping track of the highest index allocated and using an increment above that when a disk is down has the problem of escalating values if there are multiple restarts while the device is missing: each restart looks at the highest index assigned and increments it to be safe, and the incremented value becomes the new high value for the next restart. Maintaining a mirror copy of each disk's map on an alternate disk is another solution that may be desirable for other reasons, but it uses a large amount of disk space to hold the duplicate maps and also requires writes to both disks when the map is updated. The invention described below solves the reuse and escalation problems discussed above without a mirrored map or significant memory or disk space overhead.
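The escalation problem with the naive high-water-mark approach can be demonstrated with a short sketch. The increment size here is illustrative; the passage above does not specify a particular margin policy.

```python
# Sketch of the escalation problem: each restart while a disk is missing
# bumps the high-water mark, and the bumped value becomes the baseline
# for the next restart. INCREMENT is a hypothetical safety margin.

INCREMENT = 1000

def restart_high_water(previous_high, device_missing):
    """Return the new high-water index after one restart."""
    if device_missing:
        # Indexes on the missing disk are invisible, so skip ahead to be safe.
        return previous_high + INCREMENT
    return previous_high

high = 5000
for _ in range(3):  # three restarts while the same disk remains down
    high = restart_high_water(high, device_missing=True)

# The margin compounds: 5000 -> 6000 -> 7000 -> 8000, even though no new
# cylinders were allocated between restarts.
print(high)
```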
Teradata Virtual Storage, also referred to as Virtual Storage System (VSS), is described in U.S. Pat. Nos. 7,562,195 and 8,375,193, which are incorporated by reference herein.
Aspects of the present disclosure are best understood from the following detailed description when read with the accompanying figures, in which:
It is to be understood that the following disclosure provides many different embodiments or examples for implementing different features of various embodiments. Specific examples of components and arrangements are described below to simplify the present disclosure. These are, of course, merely examples and are not intended to be limiting.
Contemporary database system tools enable system resources, such as storage disks, to be profiled, allowing users to assign “temperatures” to specific storage device zones, thereby providing the opportunity to improve performance by placing special, frequently accessed data on faster portions of devices. Modern database systems may feature groups of individual disk drives or other storage media, and thus a zone of contiguous storage space, such as tracks, sectors, blocks, or extents, of a particular device cannot be assumed to be, for example, the fastest based on the zone's ID. As referred to herein, a data storage device zone comprises a contiguous physical section of storage space of a data storage medium. A device profiler allows each zone to be tested individually and a numerical value representing a profiled access speed to be assigned to the tested zone. Further, users may assign a relative temperature to an allocation request and provide guidelines on how devices should be split up among performance zones. As referred to herein, a data “temperature” comprises a data access frequency quantifier that specifies the frequency of access of particular data. For example, data that is frequently accessed may be qualified with a temperature of “HOT” while data that is infrequently accessed may be assigned a temperature of “COOL.”
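The matching of allocation temperatures to profiled zones can be sketched as follows. All zone identifiers, speed values, and field names below are illustrative assumptions, not a Teradata API.

```python
# Hypothetical sketch: place HOT allocations on the fastest free profiled
# zone and COOL allocations on the slowest. "speed" is the numerical value
# a device profiler assigned to each tested zone.

zones = [
    {"zone_id": "ssd0/z0", "speed": 95, "free": True},
    {"zone_id": "hdd0/z0", "speed": 40, "free": True},   # outer HDD tracks
    {"zone_id": "hdd0/z9", "speed": 15, "free": True},   # inner HDD tracks
]

def pick_zone(temperature):
    """Satisfy an allocation request according to its relative temperature."""
    free = [z for z in zones if z["free"]]
    if temperature == "HOT":
        zone = max(free, key=lambda z: z["speed"])
    else:
        zone = min(free, key=lambda z: z["speed"])
    zone["free"] = False
    return zone["zone_id"]
```

Note that the zone's profiled speed, not its position or ID, drives the placement decision, consistent with the point above that speed cannot be assumed from the zone's ID.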
As shown, the database system 100 includes one or more processing nodes 1051 . . . Y that manage the storage and retrieval of data in data storage facilities 1101 . . . Y. Each of the processing nodes may host one or more physical or virtual processing modules, such as one or more access module processors (AMPs). Each of the processing nodes 1051 . . . Y manages a portion of a database that is stored in a corresponding one of the data storage facilities 1101 . . . Y. Each of the data-storage facilities 1101 . . . Y includes one or more disk drives or other storage media.
The system stores data in one or more tables in the data-storage facilities 1101 . . . Y. The rows 1151 . . . Y of the tables are stored across multiple data storage facilities 1101 . . . Y to ensure that the system workload is distributed evenly across the processing nodes 1051 . . . Y. A parsing engine 120 organizes the storage of data and the distribution of table rows 1151 . . . Y among the processing nodes 1051 . . . Y and accesses the processing nodes 1051 . . . Y via an interconnect 130. The parsing engine 120 also coordinates the retrieval of data from the data storage facilities 1101 . . . Y in response to queries received from a user, such as one at a client computer system 135 connected to the database system 100 through a network 125 connection. The parsing engine 120, on receiving an incoming database query, applies an optimizer 122 component to the query to assess the best plan for execution of the query. Selecting the optimal query execution plan includes, among other things, identifying which of the processing nodes 1051 . . . Y are involved in executing the query and which database tables are involved in the query, as well as choosing which data-manipulation techniques will serve best in satisfying the conditions of the query. To this end, the parser and/or optimizer may access a data dictionary 124 that specifies the organization, contents, and conventions of one or more databases. For example, the data dictionary 124 may specify the names and descriptions of various tables maintained by the MPP system 150 as well as fields of each database. Further, the data dictionary 124 may specify the type, length, and/or other various characteristics of the stored tables. The database system typically receives queries in a standard format, such as the Structured Query Language (SQL) put forth by the American National Standards Institute (ANSI).
Each of the processing modules 2051 . . . N manages a portion of a database that is stored in a corresponding one of the data storage facilities 1101a . . . 1N. Each of the data storage facilities 1101a . . . 1N includes one or more disk drives or other storage media. The DBS may include multiple nodes 1052 . . . Y in addition to the illustrated node 1051 connected by way of the interconnect 130.
The system stores data in one or more tables in the data-storage facilities 1101a . . . 1N. The rows 1151a . . . 1N of the tables are stored across multiple data storage facilities 1101a . . . 1N to ensure that the system workload is distributed evenly across the processing modules 2051 . . . N. A parsing engine 221 organizes the storage of data and the distribution of table rows 1151a . . . 1N among the processing modules 2051 . . . N. The parsing engine 221 also coordinates the retrieval of data from the data-storage facilities 1101a . . . 1N in response to queries received from a user at a client computer system 1351 . . . N. The DBS 100 usually receives queries and commands to build tables in a standard format, such as SQL.
In one implementation, the rows 1151a . . . 1N are distributed across the data storage facilities 1101a . . . 1N by the parsing engine 221 in accordance with their primary index. The primary index defines the columns of the rows that are used for calculating a hash value. The function that produces the hash value from the values in the columns specified by the primary index is called the hash function. Some portion, possibly the entirety, of the hash value is designated a “hash bucket.” The hash buckets are assigned to data-storage facilities 1101a . . . 1N and associated processing modules 2051 . . . N by a hash bucket map. The characteristics of the columns chosen for the primary index determine how evenly the rows are distributed.
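The primary-index routing described above can be illustrated with a short sketch. The actual Teradata hash function is proprietary; `hashlib.sha256` stands in for it here, and the bucket count and round-robin bucket map are illustrative assumptions.

```python
# Illustrative sketch of row distribution by primary index:
# hash value -> hash bucket -> (via the hash bucket map) processing module.

import hashlib

NUM_BUCKETS = 1 << 16   # hypothetical number of hash buckets
NUM_AMPS = 4            # hypothetical number of processing modules (AMPs)

# Hash bucket map: assigns each bucket to an AMP (round-robin here).
bucket_map = {b: b % NUM_AMPS for b in range(NUM_BUCKETS)}

def amp_for_row(primary_index_values):
    """Return the AMP responsible for a row, given its primary-index values."""
    data = "|".join(str(v) for v in primary_index_values).encode()
    hash_value = int.from_bytes(hashlib.sha256(data).digest()[:4], "big")
    # Some portion of the hash value is designated the "hash bucket".
    bucket = hash_value % NUM_BUCKETS
    return bucket_map[bucket]
```

Because the mapping is deterministic, all rows sharing the same primary-index values land on the same AMP, and the evenness of the distribution depends on how well the chosen columns spread across hash values.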
In one example system, a parsing engine, such as the parsing engine 120, is made up of three components: a session control 300, a parser 305, and a dispatcher 310 as shown in
In accordance with an embodiment, the system 100 may feature a virtual storage system (TVS) 230, hosted by one or more of the nodes, such as node 1051 depicted in
The Teradata Virtual Storage system replaces traditional fixed assignment disk storage with a virtual connection of storage to the data warehouse's AMPs, as illustrated in
As stated earlier, starting the database system with some storage components missing can result in reuse of cylinder ID indexes used by TVS and consequent corruption. A method to start up the database system when storage components are missing is described below.
To facilitate the explanation of the system and invention, the following assumptions are presented:
The invention consists of a data structure, shown in
Minimally, the structure will contain:
Initially, all fields of the structure are zero and all devices are present, as shown in
The TVS system includes allocators for handling cylinder allocation. During startup, each allocator will determine the highest cylinder index in use by any of the AMPs that are assigned to it, as illustrated in
The structure is kept in persistent node-local storage, such as a memory mapped file, by each allocator. It is reinitialized to zero only when the system is initialized, e.g., a sysinit.
Referring again to
After the missing devices have been brought back into service and are no longer missing, the safe index is set to zero, as shown in step 1608, as there is no longer a chance of reusing a missing index.
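The startup behavior described in the preceding paragraphs can be sketched as follows. This is a hedged illustration only: the structure's fields and the persistence format are inferred from the description above (the figures listing the structure's contents are not reproduced here), and all names are hypothetical. It shows the key property that distinguishes this approach from the high-water-mark scheme criticized earlier: repeated restarts while the same device is missing do not escalate the safe index.

```python
# Hypothetical sketch of an allocator's persisted "safe index" logic.
# The structure is kept in persistent node-local storage (a plain JSON
# file stands in for a memory-mapped file here).

import json
import os

class SafeIndexStore:
    def __init__(self, path):
        self.path = path
        self.safe_index = 0          # zero at sysinit / first use
        if os.path.exists(path):
            with open(path) as f:
                self.safe_index = json.load(f)["safe_index"]

    def save(self):
        with open(self.path, "w") as f:
            json.dump({"safe_index": self.safe_index}, f)

    def startup(self, visible_max_index, devices_missing):
        """Run at allocator startup; returns the first index safe to assign."""
        if devices_missing:
            # A missing disk may hide in-use indexes, so never allocate at
            # or below the highest index ever observed. Taking the max (not
            # an increment) means repeated restarts do not escalate.
            self.safe_index = max(self.safe_index, visible_max_index + 1)
        else:
            # All devices present: every in-use index is visible, so there
            # is no longer a chance of reusing a hidden index.
            self.safe_index = 0
        self.save()
        return max(self.safe_index, visible_max_index + 1)
```

Under these assumptions, a second restart with the same disk still missing returns the same safe starting index as the first, avoiding both reuse and escalation without a mirrored map.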
As stated earlier,
Some key features and benefits of the methodology described above and illustrated in the figures include:
The foregoing description of the invention has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed.
Additional alternatives, modifications, and variations will be apparent to those skilled in the art in light of the above teaching. Accordingly, this invention is intended to embrace all alternatives, modifications, equivalents, and variations that fall within the spirit and broad scope of the attached claims.
This application claims priority under 35 U.S.C. §119(e) to the following co-pending and commonly-assigned patent application, which is incorporated herein by reference: Provisional Patent Application Ser. No. 62/098,049, entitled “METHOD AND SYSTEM FOR PREVENTING REUSE OF CYLINDER ID INDEXES IN A COMPUTER SYSTEM WITH MISSING STORAGE DRIVES,” filed on Dec. 30, 2014, by Gary Boggs.
Number | Name | Date | Kind |
---|---|---|---|
20110060887 | Thatcher | Mar 2011 | A1 |
20150363447 | Dickie | Dec 2015 | A1 |
20160196168 | Koizumi | Jul 2016 | A1 |
Number | Date | Country
---|---|---
20160188248 A1 | Jun 2016 | US
Number | Date | Country
---|---|---
62098049 | Dec 2014 | US