Some embodiments of the invention relate to computing systems, and in particular, to space management for objects stored in the computing system.
Modern database systems have been developed to handle many different data types, including non-traditional data types such as images, text, audio, and video data. Such non-traditional data types are often stored as “large objects” (LOBs) in the database system. LOBs may be of any size, but are often much larger than traditional data types. For example, LOBs in some database systems may span anywhere from 1 Kbyte to many Gbytes in size.
Because of their size, LOBs often cannot be efficiently handled with the same techniques used to handle traditional data types. The size of LOBs could result in space management difficulties within the database system. Given this size issue with LOBs, the specific techniques used to handle storage and disk management tasks for LOBs could have a very significant impact upon the performance of the database system, e.g., with respect to system I/O and space utilization.
One possible approach for performing space management for large objects is to divide the available storage space into equal sized pages. The size of the page would be configured at LOB creation time. Every I/O operation would be bounded by this size limit. The problem with this approach is that LOBs may be associated with objects having very different sizes, and therefore a single value for the page size may not be suitable for all object sizes.
For example, consider if the LOBs are stored with a relatively large page size. The advantage of the larger page size is that large LOBs may see an improvement in I/O performance, since each I/O operation transfers more of the LOB. However, there are also significant disadvantages, since the large page size could cause a significant waste of storage space for smaller LOBs.
Consider if the LOBs are stored with a relatively small page size. The advantage of the smaller page size is that less storage space will be wasted, since smaller LOBs will better fit the page size. However, this approach will more likely result in larger LOBs being split apart to fit into multiple separate pages. This could cause fragmentation and a decrease in I/O performance.
Another possible approach is to allow users to manually alter the page size. However, this may present a manageability problem, since this approach requires fairly sophisticated and well-trained users who must be aware of, and able to adequately adjust, additional storage parameters.
Based on the foregoing, it is clearly desirable to provide a method and mechanism to more efficiently manage storage for large objects.
Embodiments of the invention relate to methods, systems, and computer program products for implementing space management for large objects stored in the computing system. According to some embodiments, storage of large objects is managed by dynamically creating contiguous chunks of storage space of varying lengths. The length of each chunk may vary depending upon the size of the object being stored, fragmentation of the storage space, available free space, and/or the expected length of the object.
Further details of aspects, objects, and advantages of the invention are described below in the detailed description, drawings, and claims. Both the foregoing general description and the following detailed description are exemplary and explanatory, and are not intended to be limiting as to the scope of the invention.
The accompanying drawings are included to provide a further understanding of the invention and, together with the Detailed Description, serve to explain the principles of the invention. The same or similar elements between figures may be referenced using the same reference numbers.
Embodiments of the invention relate to methods, systems, and computer program products for implementing space management for large objects stored in the computing system. According to some embodiments, storage of large objects is managed by dynamically creating contiguous chunks of storage space of varying lengths. The length of each chunk may vary depending upon the size of the object being stored, fragmentation of the storage space, available free space, and/or the expected length of the object.
Within the database 106, objects may be stored on a storage device using one or more storage segments 110. Throughout this document, the invention may be described using disk storage and disk drives as an illustrative, but non-limiting, example of a storage device. Each storage segment is associated with a large number of chunks 116 and 118 to store the objects, e.g., LOB data. Each chunk is a contiguous portion of the storage system. A first structure 112 is used to track available free space within the segment 110. The first structure 112 can also be termed a “committed free space” (CFS) structure to the extent it represents free chunks that are guaranteed to correspond to already-committed transactions, and hence are available to be used by and allocated to other transactions. A second structure 114 is used to track space within the segment 110 that is not guaranteed to be associated with committed transactions, and hence is unavailable to be allocated to the extent it is already being used by a live, un-committed transaction. The second structure 114 can also be termed an “un-committed free space” (UFS) structure.
One key aspect of this embodiment is that the chunks 116 and 118 within a segment 110 may correspond to different sizes. In this way, objects can be stored within contiguous chunks in the segment that match as much as possible the exact size of the object being stored. This approach serves to significantly reduce fragmentation in the storage system. This also addresses the “one size fits all” problem of prior approaches that attempt to store all LOBs using the same fixed-sized pages. Any number of different and suitable sizes may be used to allocate the chunks 116 and 118, spanning from very small chunks for smaller objects to much larger chunks for very large objects.
Another aspect of this embodiment is that the CFS and UFS structures are located within or associated with the same segment. This serves to increase the speed and efficiency of storage management and access since the system only needs to look at structures within a single segment to manage object storage. This is in contrast to alternative approaches that may require the storage system to look within multiple locations to perform these tasks, which could significantly decrease the speed and efficiency of the storage system.
In the illustrative approach of
The series of hash buckets continues as appropriate for groups of chunks of increasing size until there are sufficient hash buckets to track all chunks in the segment. In the present example, the last hash bucket 206 is used to track chunks in the segment ranging from 1 Mbyte to 64 Mbytes in size. This means that the largest chunk size allowed in this example system is 64 Mbytes. A linked list 212 is maintained for hash bucket 206 to track the individual chunks in the segment within that size range. It is noted that the specific ranges disclosed in
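The bucket-selection step can be sketched as below. The first and last ranges echo the 2 Kbyte-32 Kbyte and 1 Mbyte-64 Mbyte examples in the text; the middle range is purely an assumed placeholder:

```python
KB, MB = 1024, 1024 * 1024

# Hypothetical bucket boundaries; each bucket covers [low, high).
BUCKET_RANGES = [
    (2 * KB, 32 * KB),       # small chunks (per the example's first bucket)
    (32 * KB, 1 * MB),       # assumed middle range
    (1 * MB, 64 * MB + 1),   # last bucket; 64 Mbytes is the cap
]

def bucket_for(size: int) -> int:
    """Return the index of the CFS hash bucket covering this chunk size."""
    for i, (low, high) in enumerate(BUCKET_RANGES):
        if low <= size < high:
            return i
    raise ValueError("chunk size outside supported range")
```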
For example, if a chunk is associated with a transaction that has not yet committed, then it will be tracked with the UFS structure. In some embodiments, it may be possible that a transaction has already committed, but due to different system delays that information is not yet known by the storage system. In this case, the chunk may actually be available to be re-allocated, but the storage system may not yet know the exact state of a particular chunk with regard to whether its associated transaction has or has not committed and therefore it is still listed with the UFS structure. A clean-up process may be employed to shift management of chunks for committed transactions from the UFS structure to the appropriate CFS structures.
It is noted that in one embodiment, the UFS structure does not place the chunks into different groups based upon chunk size. Instead, chunks in the UFS structure are placed near other chunks that are associated with the same transaction. This approach is designed to optimize the clean-up process when a transaction commits, since all of the chunks associated with the same transaction will likely be re-allocated to the CFS structures at or shortly after the commit.
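One hypothetical way to realize this transaction-grouped UFS layout is to key entries by transaction id, so a commit can hand all of a transaction's chunks to the CFS structures in one pass (the patent text only states that chunks from the same transaction are placed near one another; this keyed map is an assumption):

```python
from collections import defaultdict

# UFS entries grouped by owning transaction (chunk offsets, for brevity).
ufs_by_txn = defaultdict(list)

def track_uncommitted(txn_id: int, chunk_offset: int) -> None:
    """Record a chunk as belonging to a live, un-committed transaction."""
    ufs_by_txn[txn_id].append(chunk_offset)

def on_commit(txn_id: int) -> list:
    """Release every chunk of the committed transaction for CFS re-listing."""
    return ufs_by_txn.pop(txn_id, [])
```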
Chunks 440 and 442 are not immediately available to be allocated, e.g., because they are associated with uncommitted transactions, with chunk 440 being 32 Kbytes in size and chunk 442 being 32 Mbytes in size. These chunks are tracked with UFS structure 490. In particular, UFS structure 490 is associated with a linked list having a structure 436 that corresponds to and points to chunk 440. UFS structure 490 also includes a structure 438 that corresponds to and points to chunk 442. Since these chunks 440 and 442 correspond to the UFS structure 490, these chunks will not automatically be allocated when there is a need for additional storage.
Chunks 444, 446, and 448 are available to be allocated to new transactions, e.g., because they are associated with already-committed transactions, with chunk 444 being 12 Kbytes in size, chunk 446 being 24 Kbytes in size, and chunk 448 being 12 Kbytes in size. These chunks are tracked with CFS structures 402-406. In particular, CFS hash bucket 402 is employed to track available chunks that range from 2 Kbytes to 32 Kbytes-1 in size. CFS hash bucket 402 is associated with a linked list having a structure 430 that corresponds to and points to chunk 444. This CFS hash bucket 402 also includes a structure 432 that corresponds to and points to chunk 446 and a structure 434 that corresponds to and points to chunk 448. Since chunks 444, 446, and 448 are tracked with the CFS structures, these chunks will be automatically available to be allocated upon the need for additional storage.
Referring to
This process is illustrated in
In some embodiments, the space allocation check for an exact size fit for the LOB is performed only within a single metadata listing within the linked list of the CFS structure. In an alternate embodiment, some or all of the linked list can be traversed to find a chunk having an exact size match for the LOB.
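A sketch of this exact-fit check, covering both embodiments (the single-listing check and the full traversal); the dict entry format is a hypothetical stand-in for the linked-list metadata:

```python
def find_exact_fit(bucket: list, size: int, scan_all: bool = False):
    """Look for a free chunk exactly matching `size`.

    With scan_all=False, only the first metadata listing is checked,
    mirroring the single-listing embodiment; with scan_all=True, the
    whole list is traversed (the alternate embodiment)."""
    candidates = bucket if scan_all else bucket[:1]
    for entry in candidates:
        if entry["size"] == size:
            return entry
    return None
```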
Returning back to
This process is illustrated in
As a result, the space management system will determine whether any of the free chunks that are available can be split to create a new 8 Kbyte chunk. In the present example, structure 833 is identified which corresponds to a free chunk 864 that is 12 Kbytes in size, which is large enough to split to create a new 8 Kbyte portion. The chunk 864 will be split as shown in
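The splitting step can be sketched as follows, using the 12 Kbyte chunk split to satisfy an 8 Kbyte request from the example above (the helper and its return format are hypothetical):

```python
def split_chunk(offset: int, size: int, needed: int):
    """Split a free chunk into an allocated piece and a remainder.

    Returns ((offset, needed), remainder), where remainder is
    (remainder_offset, remainder_size), or None if the chunk was
    consumed exactly."""
    if needed > size:
        raise ValueError("chunk too small to satisfy request")
    allocated = (offset, needed)
    leftover = (offset + needed, size - needed) if size > needed else None
    return allocated, leftover
```

The leftover piece would then be re-listed in the CFS bucket matching its new, smaller size.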
Returning back to
This process is illustrated based upon
As a result, the space management system will determine whether the segment can be expanded to add a new 28 Kbyte chunk to satisfy the request 873. In the present example, the segment is expanded to include a new free chunk 839 as shown in
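Segment expansion can be sketched as below, using the 28 Kbyte request from the example; the maximum-size bound and the return convention are assumptions for illustration:

```python
def expand_segment(segment_end: int, needed: int, max_size: int):
    """Grow the segment by appending a new chunk of `needed` bytes.

    Returns (new_chunk_offset, new_segment_end); raises if the segment
    would exceed its assumed maximum size."""
    if segment_end + needed > max_size:
        raise MemoryError("segment cannot grow further")
    return segment_end, segment_end + needed
```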
Returning back to
At 620, a determination is made regarding whether there are any free chunks of the appropriate size tracked by the UFS structure that are actually available to be used for space allocation. If so, then at 622 the identified chunk is re-designated as being available for allocation. The LOB can then be stored into the re-designated chunk (624).
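The determination at 620-624 can be sketched as a scan of the UFS listing that consults the transaction layer for actual commit state. The entry format and the `is_committed` callback are hypothetical:

```python
def reclaim_from_ufs(ufs: list, size: int, is_committed):
    """Scan the UFS listing for a chunk whose transaction has in fact
    already committed; if one of the right size is found, re-designate
    it as available by removing it from the UFS and returning it.

    `is_committed` is an assumed callback that asks the transaction
    layer whether a given transaction id has committed."""
    for entry in ufs:
        if entry["size"] == size and is_committed(entry["txn"]):
            ufs.remove(entry)
            return entry
    return None
```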
This process is illustrated in
Referring back to
At 626, the space management system identifies two or more free chunks that can be combined to provide enough space to store the LOB. In an embodiment, this action is preferably accomplished by identifying multiple contiguous free chunks. If multiple contiguous free chunks cannot be found, then multiple non-contiguous free chunks may be employed. At 628, the identified free chunks are allocated and used to store the LOB.
This process is illustrated in
As a result, the space management system will determine whether there are multiple free chunks that can be combined together to store the 60 Kbyte LOB. In the present example, the free chunk 850 of size 48 Kbytes and the free chunk 864 of size 12 Kbytes can be combined together to form a space of the appropriate 60 Kbyte size. Therefore, these two free chunks are allocated and used to satisfy the request 875 to store the LOB.
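The combination step can be sketched as below, reproducing the 48 Kbyte + 12 Kbyte = 60 Kbyte example. This greedy largest-first selection is a simplification: the embodiment prefers contiguous chunks first, a constraint omitted here for brevity:

```python
def combine_chunks(free_chunks: list, needed: int):
    """Pick free chunks until `needed` bytes are covered.

    Chunks are (offset, size) pairs; returns the chosen chunks, or None
    if the free space is insufficient. Largest-first greedy selection is
    an assumed simplification of the embodiment's contiguous-first rule."""
    chosen, total = [], 0
    for offset, size in sorted(free_chunks, key=lambda c: -c[1]):
        chosen.append((offset, size))
        total += size
        if total >= needed:
            return chosen
    return None
```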
At this point, the chunks are still being tracked by the UFS structure even though they are associated with a committed transaction, and therefore can actually be considered free chunks available to be re-allocated. However, a scheduled clean-up process may not yet have taken action to re-designate the chunk(s) to the CFS structures.
During this time period, it is possible that there is a need to immediately re-allocate the chunk(s) to satisfy a space allocation request (708). If so, then the chunk(s) are identified from the UFS structure (716) and used to allocate space for the new space allocation request (718).
Otherwise, the system will wait for the scheduled clean-up process to address the de-allocated chunk(s) (710). Clean up activities will occur as scheduled (712) and the chunks will be associated with the appropriate CFS structures (714).
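The decision between the two paths at 708-714 can be sketched as a small helper; the names and the size-based urgency check are hypothetical:

```python
def settle_chunk(chunk: dict, urgent_request_size):
    """Decide the fate of a chunk whose transaction has committed but
    which is still listed in the UFS.

    An urgent allocation request may claim it immediately (716-718);
    otherwise it waits for the scheduled clean-up to move it to the
    CFS buckets (710-714)."""
    if urgent_request_size is not None and chunk["size"] >= urgent_request_size:
        return "allocated_immediately"
    return "await_cleanup"
```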
System Architecture Overview
According to one embodiment of the invention, computer system 2300 performs specific operations by processor 2307 executing one or more sequences of one or more instructions contained in system memory 2308. Such instructions may be read into system memory 2308 from another computer readable/usable medium, such as static storage device 2309 or disk drive 2310. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement the invention. Thus, embodiments of the invention are not limited to any specific combination of hardware circuitry and/or software. In one embodiment, the term “logic” shall mean any combination of software or hardware that is used to implement all or part of the invention.
The term “computer readable medium” or “computer usable medium” as used herein refers to any medium that participates in providing instructions to processor 2307 for execution. Such a medium may take many forms, including but not limited to, non-volatile media and volatile media. Non-volatile media include, for example, optical or magnetic disks, such as disk drive 2310. Volatile media include dynamic memory, such as system memory 2308.
Common forms of computer readable media include, for example, floppy disk, flexible disk, hard disk, magnetic tape, any other magnetic medium, CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, RAM, PROM, EPROM, FLASH-EPROM, any other memory chip or cartridge, or any other medium from which a computer can read.
In an embodiment of the invention, execution of the sequences of instructions to practice the invention is performed by a single computer system 2300. According to other embodiments of the invention, two or more computer systems 2300 coupled by communication link 2315 (e.g., LAN, PSTN, or wireless network) may perform the sequence of instructions required to practice the invention in coordination with one another.
Computer system 2300 may transmit and receive messages, data, and instructions, including program code (i.e., application code), through communication link 2315 and communication interface 2314. Received program code may be executed by processor 2307 as it is received, and/or stored in disk drive 2310 or other non-volatile storage for later execution. In an embodiment, the computer system 2300 operates in conjunction with a data storage system 2331, e.g., a data storage system 2331 that contains a database 2332 that is accessible by the computer system 2300. The computer system 2300 communicates with the data storage system 2331 through a data interface 2333. The data interface 2333, which is coupled to the bus 2306, transmits and receives electrical, electromagnetic or optical signals that include data streams representing various types of signal information, e.g., instructions, messages and data.
Number | Date | Country | |
---|---|---|---|
20090037499 A1 | Feb 2009 | US |