Large objects (LOBs) are typically used to store large amounts of data. For example, LOBs can be images, audio, or other multimedia. LOBs can come in a variety of forms, such as a binary large object (BLOB), a character large object (CLOB), and a national character large object (NCLOB). Databases can store LOBs on-disk or in-memory. However, given their sheer size, LOBs can create a large footprint on the database. Additionally, LOBs can require additional processing power from databases. Consequently, databases may be unable to completely and efficiently process data comprising LOBs and/or fully meet the expectations of users.
The accompanying drawings are incorporated herein and form a part of the specification.
In the drawings, like reference numbers generally indicate identical or similar elements. Additionally, generally, the left-most digit(s) of a reference number identifies the drawing in which the reference number first appears.
Provided herein are system, apparatus, device, method and/or computer program product embodiments, and/or combinations and sub-combinations thereof, for providing large objects (LOBs) on demand.
In some embodiments, a database may include a manager, an in-memory store, and an on-disk store. Accordingly, to store LOBs in the database in a fashion that provides on-demand access to them, the manager can create an index vector and a dictionary for the on-disk store. In creating the index vector, the manager can generate identifiers corresponding to the LOBs. The manager can do so using an n-bit compression scheme. In doing so, each identifier in the index vector can have a consecutive bit representation.
Further, in creating the dictionary, the manager can include the index vector's identifiers and LOB data corresponding to the identifier's LOB. Depending on the size of the LOB, the LOB data can be the LOB itself or the LOB location in the on-disk store. In some embodiments, where the LOBs are small (e.g., less than 1000 bytes), the dictionary can include the LOB and, optionally, compress the LOB. And, where the LOBs are medium or large (e.g., 1000 bytes or more), the manager can store the LOB in a dedicated location in the on-disk store. For example, where the LOBs are medium (e.g., from 1000 bytes to 4000 bytes), the manager can store the LOBs in a container of the on-disk store and reference a location in the on-disk store. And, where the LOBs are large (e.g., more than 4000 bytes), the manager can store the LOBs in dedicated files of the on-disk store.
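By way of a non-limiting illustration, the size-based placement described above can be sketched as follows. The thresholds (1000 and 4000 bytes) come from the example ranges in the preceding paragraph; the function and constant names are hypothetical, not from the source.

```python
SMALL_MAX = 1000    # below this: LOB stored inline in the dictionary
MEDIUM_MAX = 4000   # up to this: LOB stored in a shared on-disk container

def classify_lob(data: bytes) -> str:
    """Return the storage class for a LOB based on its size in bytes."""
    if len(data) < SMALL_MAX:
        return "small"    # stored (optionally compressed) in the dictionary
    if len(data) <= MEDIUM_MAX:
        return "medium"   # stored in a container; dictionary keeps a reference
    return "large"        # written to a dedicated file in the on-disk store
```

Under this scheme, the dictionary entry for a small LOB holds the bytes themselves, while entries for medium and large LOBs hold only a location.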
Along these lines, in some embodiments, in creating the index vector and/or dictionary, the manager can use one or more pages of the on-disk store. Accordingly, in some embodiments, the manager can store the index vector on consecutive pages of the on-disk store. Likewise, the manager can also store the dictionary on successive pages of the on-disk store. When storing the dictionary on one or more pages of the on-disk store, the manager can create a data structure comprising a page number and a page offset of the appropriate LOB data.
Accordingly, when processing user requests with respect to LOBs stored in the database, the manager can first determine the appropriate value identifier from the index vector stored in the on-disk store. Thereafter, using the identifier, the manager can identify corresponding LOB data. In some embodiments, where the dictionary is paged, the manager can use the dictionary's data structure to determine the page number and offset of the corresponding LOB data.
As stated above, depending on the size of the LOB, the LOB data can be the LOB itself or the LOB location in the on-disk store (e.g., a container or a dedicated file). If the LOB data is the LOB itself, the manager can load the LOB from the on-disk store into the in-memory store. However, if the LOB data is a location of the on-disk store, the manager can retrieve the LOB from the dedicated location and load the LOB into the in-memory store. The manager can then perform one or more actions related to the LOB in the in-memory store.
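The read path described in the two preceding paragraphs can be sketched, in a non-limiting fashion, as follows. The structures and names here (the list, the tuples, `fetch_lob`) are purely illustrative stand-ins for the index vector, dictionary, and in-memory store.

```python
index_vector = [0, 1, 1, 2]                 # value identifier per row position
dictionary = {
    0: ("inline", b"tiny lob"),             # small LOB: the LOB itself
    1: ("container", ("page-7", 128)),      # medium LOB: location in a container
    2: ("file", "/disk/lobs/2.bin"),        # large LOB: dedicated file location
}
in_memory_store = {}

def fetch_lob(row: int, read_from_disk) -> bytes:
    """Resolve a row position to LOB bytes and load them into memory."""
    value_id = index_vector[row]            # step 1: index vector lookup
    kind, payload = dictionary[value_id]    # step 2: dictionary lookup
    if kind == "inline":
        lob = payload                       # dictionary holds the LOB itself
    else:
        lob = read_from_disk(kind, payload) # retrieve from container or file
    in_memory_store[value_id] = lob         # step 3: load into in-memory store
    return lob
```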
By operating in such a fashion, the footprint of the database's in-memory store and/or on-disk store is significantly reduced. As a result, the database is able to process user requests relating to LOBs more efficiently. Likewise, the computing device or system supporting the database is not required to have expensive hardware to process user requests relating to LOBs efficiently. In turn, LOBs are provided on demand, irrespective of the underlying computing device or system's hardware.
Database 102 can include manager 104, in-memory store 106, and/or on-disk store 108. Manager 104 can include request handler 112, builder 114, compression operator 116, page generator 118, iterator 120, and/or load operator 122. As will be discussed in more detail below, request handler 112 can receive user requests from user devices 104A and 104B.
Builder 114 can create index vector 124 and/or dictionary 126 for on-disk store 108. To create index vector 124, builder 114 can analyze data entries in one or more columns of a columnar database. Builder 114 can then identify one or more columns of the columnar database, where the column(s) include LOBs. Builder 114 can then create index vector 124 based on the column(s) having LOBs. Accordingly, index vector 124 can also be provided in a columnar fashion. Along these lines, in some embodiments, index vector 124 can include multiple columns in a sequential manner such that the first column is followed by the second column followed by the third column and so on. By organizing data in such a fashion, index vector 124's 10th entry for a first column can correspond to the 10th entry for another column. For example, one column can correspond to written identifiers, and another column can correspond to the corresponding LOB.
After identifying columns from a columnar database for creating index vector 124, builder 114 can provide unique identifiers for each LOB. In some embodiments, compression operator 116 can perform an n-bit compression scheme on the written identifiers to create the unique identifiers. Compression operator 116's n-bit compression scheme can be independent of a bit architecture (e.g., 32-bit or 64-bit architecture) of a computing device that processes database 102. For example, compression operator 116's n-bit compression scheme can determine the total number of bits required to represent each row of a column. Compression operator 116's n-bit compression scheme can then provide consecutive bit representations to each row of the column.
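As a non-limiting illustration, an n-bit compression scheme of the kind described above might determine the fewest bits able to represent every identifier in a column and then pack the identifiers back-to-back as consecutive bit representations. The function names below are hypothetical.

```python
import math

def bits_needed(num_distinct: int) -> int:
    """Fewest bits able to represent identifiers 0..num_distinct-1."""
    if num_distinct <= 1:
        return 1
    return max(1, math.ceil(math.log2(num_distinct)))

def pack_ids(ids, width):
    """Pack each identifier into `width` bits, concatenated consecutively."""
    packed = 0
    for i in ids:
        packed = (packed << width) | i
    return packed

def unpack_ids(packed, width, count):
    """Recover `count` identifiers from the packed bit representation."""
    mask = (1 << width) - 1
    out = [(packed >> (width * i)) & mask for i in range(count)]
    return list(reversed(out))
```

Note that the bit width depends only on the number of distinct values in the column, not on the 32-bit or 64-bit architecture of the host, consistent with the architecture-independence described above.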
Referring to
As stated above, builder 114 can also create dictionary 126 for on-disk store 108. In doing so, builder 114 can include index vector 124's identifiers and LOB data corresponding to the identifier's LOB. As illustrated in
Accordingly, large-sized LOBs can be saved as individual files 134 (of
LOB container 500 can include data structure 508 that provides the location of the LOBs in the chain of pages. For each medium-sized LOB 502, data structure 508 can include LOB identifier 512A-C and location 514A-C of the LOB in the chain of pages. LOB identifiers 512A-C can be provided sequentially based on the location of LOBs in the chain of pages. For example, as shown, three consecutive LOBs can have identifiers “3456,” “3457,” and “3458,” respectively. Further, in some embodiments, LOB 502A-C's location 514A-C can include a page number of a sequence of pages and an offset on the particular page. The offset 514A-C can indicate the number of bytes from the beginning of the particular page at which the LOB resides. Accordingly, offset 514A-C can include an indication of the start of the LOB.
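The write side of such a container can be sketched, purely illustratively, as follows: medium-sized LOBs are appended to a chain of pages while a directory like data structure 508 records sequential identifiers and (page number, offset) locations. The page size, names, and the simplification of starting a new page rather than splitting a LOB are all assumptions of this sketch.

```python
PAGE_SIZE = 4096      # assumed page size, for illustration only

pages = [bytearray()] # chain of pages in the container
directory = {}        # LOB identifier -> (page number, offset), cf. structure 508
next_id = 3456        # identifiers assigned sequentially, as in the example above

def append_lob(data: bytes) -> int:
    """Append a medium-sized LOB to the page chain and record its location."""
    global next_id
    if len(pages[-1]) + len(data) > PAGE_SIZE:  # start a new page if it won't fit
        pages.append(bytearray())
    page_no, offset = len(pages) - 1, len(pages[-1])
    pages[-1].extend(data)
    directory[next_id] = (page_no, offset)
    next_id += 1
    return next_id - 1
```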
Referring to
Accordingly, depending on the size of the LOB data and pages 130A and 130B, LOB data can be smaller or larger than pages 130A and 130B. Page generator 118 can then divide LOB data into segments such that each segment is provided onto consecutive pages 130A and 130B and is no greater in size than its respective page 130A and 130B. Thus, in some embodiments, a particular page 130A can include different LOB data in its entirety or a portion thereof. By operating in such a fashion, page generator 118 can utilize pages 130A and 130B to their full capacity.
Along these lines, when storing LOB data on one or more pages 130A and 130B, page generator 118 can create a data structure providing the location of the LOB data on one or more pages. The data structure can be similar to container 500's data structure 508. Dictionary 126's data structure can include index vector 124's identifiers and the LOB location data on one or more pages 130A and 130B. The location can include a page number and/or an offset. The offset can indicate the number of bytes from the beginning of the particular page at which the LOB resides.
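The page-filling segmentation described above can be sketched, in a non-limiting fashion, as follows: each LOB's data is split into segments that exactly fill consecutive pages, and a data structure records the (page number, offset) at which each LOB begins. The page size and names are illustrative assumptions.

```python
def segment_onto_pages(blobs, page_size=4096):
    """Split byte strings across consecutive fixed-size pages so every page
    is used to full capacity; returns (pages, locations), where locations[i]
    is the (page number, offset) at which blob i begins."""
    pages, locations = [bytearray()], []
    for blob in blobs:
        if len(pages[-1]) == page_size:   # current page full: open a new one
            pages.append(bytearray())
        locations.append((len(pages) - 1, len(pages[-1])))
        while blob:
            room = page_size - len(pages[-1])
            if room == 0:                 # page filled mid-blob: continue on next
                pages.append(bytearray())
                room = page_size
            pages[-1].extend(blob[:room])
            blob = blob[room:]
    return pages, locations
```

Note how a 5000-byte LOB fills one 4096-byte page entirely and spills onto the next, while the following LOB starts immediately after it, so no page capacity is wasted.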
After creating index vector 124 and/or dictionary 126, as stated previously, manager 104's request handler 112 can receive user requests with respect to LOBs from user devices 104A and 104B. In some embodiments, user requests can include a specific row position, or a specific range of row positions, in index vector 124. Request handler 112 can then determine a specific row position, or a specific range of row positions, associated with the user request.
Iterator 120 can thereafter traverse through index vector 124 to identify an identifier 202A-F (of
After determining identifiers from index vector 124, iterator 120 can use the identifiers to identify corresponding LOB data from dictionary 126. As described above, in some embodiments, the corresponding LOB data can be on one or more pages 130A and 130B. Accordingly, iterator 120 can acquire a location of the LOB data on one or more pages 130A and 130B from a data structure that maps the identifiers to the location of the LOB data on the pages 130A and 130B. Iterator 120 can then identify the LOB data with the location provided by the data structure. In some embodiments, compression operator 116 can decompress the LOB data.
As described above, in some embodiments, depending on a LOB's size, the LOB data can be the LOB itself or the LOB location outside of the dictionary 126. For example, in some embodiments, small-sized LOBs can be stored in dictionary 126. Accordingly, load operator 122 can retrieve the LOB data from dictionary 126 and place the LOB in in-memory store 106. However, in some embodiments, for medium- and large-sized LOBs, the LOB data can correspond to a location outside of dictionary 126. And, for medium-sized and large-sized LOBs, the location can be on-disk store 108's container 132 and individual files 134, respectively.
Along these lines, as described above, in some embodiments, medium-sized LOBs can be stored on one or more pages of container 132. And, in some embodiments, container 132 can include a data structure including LOB identifiers and corresponding page numbers and offsets for the LOBs. Thus, for medium-sized LOBs, the LOB data can be the LOB identifier or the page and offset in container 132.
Accordingly, for medium-sized LOBs, load operator 122 can retrieve the LOB from container 132. For example, where the LOB data is a LOB identifier, iterator 120 can determine the page number and offset from container 132's data structure and then parse through container 132's one or more pages to identify the LOB data. Similarly, where the LOB data is a page number and offset for container 132's one or more pages, iterator 120 can parse through container 132's one or more pages to identify the LOB data. Load operator 122 can then load the LOB's data from this location in container 132 into in-memory store 106.
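The container read path described above can be sketched, again in a non-limiting fashion, as follows. Storing an explicit length alongside the page number and offset is an assumption of this sketch (the text mentions only page number and offset); the names are hypothetical.

```python
def read_lob(pages, directory, lob_id):
    """Collect a medium-sized LOB from a container's chain of pages,
    starting at the recorded (page number, offset) and continuing across
    page boundaries until `length` bytes have been read."""
    page_no, offset, length = directory[lob_id]
    out = bytearray()
    while length > 0:
        chunk = pages[page_no][offset:offset + length]  # read what this page holds
        out.extend(chunk)
        length -= len(chunk)
        page_no, offset = page_no + 1, 0                # continue on the next page
    return bytes(out)
```

A LOB whose bytes span a page boundary is thus reassembled transparently before being loaded into the in-memory store.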
As stated previously, large-sized LOBs can be stored in on-disk store 108's individual files 134. Accordingly, for large-sized LOBs, the LOB data can be a physical location of individual files 134 in on-disk store 108. Thus, load operator 122 can load the LOB's data from this location into in-memory store 106.
After loading the LOB data from on-disk store 108 into in-memory store 106, request handler 112 can perform user requests with respect to LOBs. For example, LOBs can be provided to user devices 104A and 104B to be viewed. LOBs can also be modified during merge, as well as any other type of action as would be understood by a person of ordinary skill.
Referring to
In 602, database 102 creates index vectors 124 and 200 to include one or more identifiers 202A-F corresponding to LOBs. To do so, database 102 can analyze data entries in one or more columns of a columnar database. Database 102 can then identify one or more columns of the columnar database, where the column(s) include LOBs. Database 102 can then create index vector 124 based on the column(s) having LOBs.
After identifying columns from a columnar database for creating index vector 124, database 102 can provide identifiers 202A-F for each LOB. In some embodiments, database 102 can perform an n-bit compression scheme on the written identifiers to create the identifiers 202A-F.
In 604, database 102 creates dictionary 126 and 300 to include the identifiers 302A-F and corresponding LOB data. The dictionary 300's identifiers 302A-F can be provided in the same order as the index vector 200's identifier 202A-F (of
In some embodiments, database 102 can create at least one of index vector 124 and 200 and dictionary 126 and 300 on one or more pages 128A, 128B, 130A, and 130B of on-disk store 108.
In 606, database 102 can provide access to LOBs stored in dictionary 126, container 132, and/or files 134. For example, users via user devices 104A and 104B can request operations on LOBs. Database 102 can determine value identifiers 202A-F corresponding to LOBs from index vector 124. Using the value identifiers 202A-F, database 102 can determine the corresponding LOB data from dictionary 126. If the LOBs are small, the LOB data can be the LOB itself. If the LOBs are medium or large, the LOB data can correspond to a location in container 132 or files 134.
Various embodiments may be implemented, for example, using one or more well-known computer systems, such as computer system 700 shown in
Computer system 700 may include one or more processors (also called central processing units, or CPUs), such as a processor 704. Processor 704 may be connected to a communication infrastructure or bus 706.
Computer system 700 may also include user input/output device(s) 703, such as monitors, keyboards, pointing devices, etc., which may communicate with communication infrastructure 706 through user input/output interface(s) 702.
One or more of processors 704 may be a graphics processing unit (GPU). In an embodiment, a GPU may be a processor that is a specialized electronic circuit designed to process mathematically intensive applications. The GPU may have a parallel structure that is efficient for parallel processing of large blocks of data, such as mathematically intensive data common to computer graphics applications, images, videos, etc.
Computer system 700 may also include a main or primary memory 708, such as random access memory (RAM). Main memory 708 may include one or more levels of cache. Main memory 708 may have stored therein control logic (i.e., computer software) and/or data.
Computer system 700 may also include one or more secondary storage devices or memory 710. Secondary memory 710 may include, for example, a hard disk drive 712 and/or a removable storage device or drive 714. Removable storage drive 714 may be a floppy disk drive, a magnetic tape drive, a compact disk drive, an optical storage device, tape backup device, and/or any other storage device/drive.
Removable storage drive 714 may interact with a removable storage unit 718. Removable storage unit 718 may include a computer usable or readable storage device having stored thereon computer software (control logic) and/or data. Removable storage unit 718 may be a floppy disk, magnetic tape, compact disk, DVD, optical storage disk, and/or any other computer data storage device. Removable storage drive 714 may read from and/or write to removable storage unit 718.
Secondary memory 710 may include other means, devices, components, instrumentalities or other approaches for allowing computer programs and/or other instructions and/or data to be accessed by computer system 700. Such means, devices, components, instrumentalities or other approaches may include, for example, a removable storage unit 722 and an interface 720. Examples of the removable storage unit 722 and the interface 720 may include a program cartridge and cartridge interface (such as that found in video game devices), a removable memory chip (such as an EPROM or PROM) and associated socket, a memory stick and USB port, a memory card and associated memory card slot, and/or any other removable storage unit and associated interface.
Computer system 700 may further include a communication or network interface 724. Communication interface 724 may enable computer system 700 to communicate and interact with any combination of external devices, external networks, external entities, etc. (individually and collectively referenced by reference number 728). For example, communication interface 724 may allow computer system 700 to communicate with external or remote devices 728 over communications path 726, which may be wired and/or wireless (or a combination thereof), and which may include any combination of LANs, WANs, the Internet, etc. Control logic and/or data may be transmitted to and from computer system 700 via communication path 726.
Computer system 700 may also be any of a personal digital assistant (PDA), desktop workstation, laptop or notebook computer, netbook, tablet, smart phone, smart watch or other wearable, appliance, part of the Internet-of-Things, and/or embedded system, to name a few non-limiting examples, or any combination thereof.
Computer system 700 may be a client or server, accessing or hosting any applications and/or data through any delivery paradigm, including but not limited to remote or distributed cloud computing solutions; local or on-premises software (“on-premise” cloud-based solutions); “as a service” models (e.g., content as a service (CaaS), digital content as a service (DCaaS), software as a service (SaaS), managed software as a service (MSaaS), platform as a service (PaaS), desktop as a service (DaaS), framework as a service (FaaS), backend as a service (BaaS), mobile backend as a service (MBaaS), infrastructure as a service (IaaS), etc.); and/or a hybrid model including any combination of the foregoing examples or other services or delivery paradigms.
Any applicable data structures, file formats, and schemas in computer system 700 may be derived from standards including but not limited to JavaScript Object Notation (JSON), Extensible Markup Language (XML), Yet Another Markup Language (YAML), Extensible Hypertext Markup Language (XHTML), Wireless Markup Language (WML), MessagePack, XML User Interface Language (XUL), or any other functionally similar representations alone or in combination. Alternatively, proprietary data structures, formats or schemas may be used, either exclusively or in combination with known or open standards.
In some embodiments, a tangible, non-transitory apparatus or article of manufacture comprising a tangible, non-transitory computer useable or readable medium having control logic (software) stored thereon may also be referred to herein as a computer program product or program storage device. This includes, but is not limited to, computer system 700, main memory 708, secondary memory 710, and removable storage units 718 and 722, as well as tangible articles of manufacture embodying any combination of the foregoing. Such control logic, when executed by one or more data processing devices (such as computer system 700), may cause such data processing devices to operate as described herein.
Based on the teachings contained in this disclosure, it will be apparent to persons skilled in the relevant art(s) how to make and use embodiments of this disclosure using data processing devices, computer systems and/or computer architectures other than that shown in
It is to be appreciated that the Detailed Description section, and not any other section, is intended to be used to interpret the claims. Other sections can set forth one or more but not all exemplary embodiments as contemplated by the inventor(s), and thus, are not intended to limit this disclosure or the appended claims in any way.
While this disclosure describes exemplary embodiments for exemplary fields and applications, it should be understood that the disclosure is not limited thereto. Other embodiments and modifications thereto are possible, and are within the scope and spirit of this disclosure. For example, and without limiting the generality of this paragraph, embodiments are not limited to the software, hardware, firmware, and/or entities illustrated in the figures and/or described herein. Further, embodiments (whether or not explicitly described herein) have significant utility to fields and applications beyond the examples described herein.
Embodiments have been described herein with the aid of functional building blocks illustrating the implementation of specified functions and relationships thereof. The boundaries of these functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternate boundaries can be defined as long as the specified functions and relationships (or equivalents thereof) are appropriately performed. Also, alternative embodiments can perform functional blocks, steps, operations, methods, etc. using orderings different than those described herein.
References herein to “one embodiment,” “an embodiment,” “an example embodiment,” or similar phrases, indicate that the embodiment described can include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it would be within the knowledge of persons skilled in the relevant art(s) to incorporate such feature, structure, or characteristic into other embodiments whether or not explicitly mentioned or described herein. Additionally, some embodiments can be described using the expressions “coupled” and “connected” along with their derivatives. These terms are not necessarily intended as synonyms for each other. For example, some embodiments can be described using the terms “connected” and/or “coupled” to indicate that two or more elements are in direct physical or electrical contact with each other. The term “coupled,” however, can also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other.
The breadth and scope of this disclosure should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.
This application claims priority to provisional patent application No. 62/858,693, titled “Native Store Extension for Combined Workload Databases” to Sherkat et al., filed Jun. 7, 2019, which is herein incorporated by reference in its entirety.
Number | Date | Country
--- | --- | ---
62858693 | Jun 2019 | US