The present disclosure relates to databases and, more particularly, to document migration from one type of database to another type of database.
Document databases are popular with developers because such databases make it easy to retrieve and store hierarchically-organized data corresponding to app-tier language objects. For example, documents, such as JSON documents, are a developer-friendly data access format that provides a flexible schema. The JSON document model avoids the need for decomposition or reconstitution, along with the complexities that accompany them. A disadvantage of document databases is that they have significant limitations as a data storage format, especially as the complexity of an application that accesses a document database increases.
A new feature in database technology, referred to as “duality views”, allows data to be stored in relational tables and accessed as JSON documents. In this way, the new feature combines the benefits of relational and document worlds while avoiding their respective pitfalls. This new feature provides flexibility and simplicity for database developers. For this new feature to be successful, there should be an easy mechanism for users to migrate their data and applications from document databases to relational databases.
However, there are several challenges in supporting such a mechanism. For example, duality views are declarative. In other words, users must define their relational schema and declare tables and duality views over this schema. Not only is such defining difficult in general, but developers using document databases may also be unfamiliar with relational concepts and declarative programming. It is not trivial for such developers to figure out the optimal schema and view definitions for their respective data. Additionally, it is difficult for such developers to seamlessly migrate their respective applications and data to relational databases.
Current approaches to migrating data from a document database to a relational database involve several manual steps that are non-trivial, time-consuming, and oftentimes frustrating.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to avoid unnecessarily obscuring the present invention.
A system and method for automatically migrating documents from a document database to a relational database are provided. In one technique, a three-stage process is implemented to ensure a successful migration: (1) validating that document collections can be converted to duality views, (2) normalizing document collections to relational tables, and (3) generating scripts that, when executed, create tables, indexes, and/or duality views.
Embodiments improve computer-related technology involving migrating documents from one type of database to another type. Embodiments are automatic, fast, and hide much of the complexity of relational databases from developers. Once a document is migrated to a relational database, users gain benefits of relational databases, such as normalization, ACID properties, and security features. While embodiments automate the most difficult and time-consuming steps in the migration process from document databases to relational databases with duality views, additional embodiments allow users to make changes to an automatically-generated schema or entity-relationship model before the relational tables and duality views are created. In summary, embodiments significantly reduce the barrier for application developers to migrate documents from document databases to relational databases.
DDB system 110 exports (or sends) one or more document collections to ADM system 120. An example of DDB system 110 is MongoDB. A document collection may comprise one or more sets of documents. Each set of documents may be exported in a single transmission over a single network connection. If there are multiple sets of documents, then each set of documents may be exported in different transmissions over a single network connection or over different network connections. Thus, DDB system 110 may be communicatively coupled to ADM system 120 over one or more computer networks, such as a local area network (LAN), a wide area network (WAN), or the Internet. Similarly, RDB system 130 may be communicatively coupled to ADM system 120 over one or more similar computer networks.
A document collection may include documents of the same format or documents of different formats. (A document format refers to the number of elements in a document, how the elements are organized in the document, and/or the data types of those elements.) For example, one document collection may include one or more documents that include employee data records that are formatted in a particular manner and another document collection may include one or more documents that include department data records that are formatted in a different manner. As another example, a single document collection may include a document comprising employee data records and a document comprising department data records.
ADM system 120 receives a document collection (from DDB system 110) and performs operations with respect to documents in the document collection. Those operations correspond to the operations performed by components of ADM system 120, such as validation, normalization, and schema generation. Generally, ADM system 120 validates (or determines) whether a document collection can be converted to a relational format and a duality view, normalizes the document collection to relational tables (including deduplication of data), and generates database language (e.g., SQL) scripts to create tables, indexes, and/or duality views. Thus, ADM system 120 performs the most difficult tasks of the document migration process. The primary task of a user of ADM system 120 is to specify a document collection location, such as providing, to ADM system 120, input that specifies a storage location (physical or logical) where a document collection (or a subset thereof) is stored. In response, ADM system 120 either performs the migration or generates and presents, to the user, a script for generating one or more database objects that store relational data originating from the document collection.
Validator 122 performs schema validation on each document collection. A schema is the organization or structure for data in a database. A schema indicates (1) relationships between objects and between objects and fields, (2) object names and field names, and (3) field types. For example, an object has a name and one or more fields, each of which has a name and a field type. An object may be a parent object, a child object, or both. A parent object is an object with one or more child objects. A child object is an object with a parent object. An object may be both a parent object (of one object) and a child object (of another object). A parent object may be a root object, which means the parent object is not a child object of any other object. A child object may be a leaf object, meaning that the child object is not a parent object of any other object. Examples of types of fields (or field types) include number, date, and string.
Validator 122 determines whether the schema of a document collection is consistent. In other words, validator 122 determines whether there is a global schema that all documents in a collection satisfy. For example, a document collection that will pass the validation phase should not contain both employee documents and department documents.
Given a collection of documents, the following conditions should be satisfied: (1) a sufficient number of documents in the collection satisfy a global schema (there is at least one low entropy field, as described herein); and (2) there is sufficient overlap between the fields of the documents in the collection. If at least one of these conditions fails, then a global schema for documents in the collection does not exist and validation fails. For example, a first employee document includes a field named “designation” while a second employee document does not include the field “designation.” Continuing with this example, the second employee document includes a department name field while the first employee document does not include the field “department name.” However, both employee documents include an employee identifier, a name, and a department identifier. Thus, despite the differing optional fields, there exists a global schema that all documents in the collection satisfy and there is sufficient overlap between the fields of the documents in the collection.
The following is pseudocode for generating a global schema for a document collection:
Examples of paths in an employee document include “/”, “/empId”, and “/dept/deptId”. Thus, a path may include a root designation (e.g., “/”) and a series of one or more field names separated by a “/”. Examples of field types include Number, Date, String, and Object, which is a type of field that includes one or more other fields.
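The path and type bookkeeping described above can be sketched as follows. This is a minimal illustration, not the actual pseudocode of the embodiment; the function names and the reduced type set (Number, String, Object) are assumptions:

```python
from collections import defaultdict


def collect_paths(doc, prefix="", stats=None):
    """Record, for every path in a document, the observed field types."""
    if stats is None:
        stats = defaultdict(lambda: defaultdict(int))
    for name, value in doc.items():
        path = f"{prefix}/{name}"
        if isinstance(value, dict):
            # an Object field contains one or more other fields
            stats[path]["Object"] += 1
            collect_paths(value, path, stats)
        elif isinstance(value, (int, float)):
            stats[path]["Number"] += 1
        else:
            stats[path]["String"] += 1
    return stats


def global_schema(docs):
    """Merge per-document paths into one path -> {type: count} map."""
    stats = defaultdict(lambda: defaultdict(int))
    for doc in docs:
        collect_paths(doc, "", stats)
    return stats
```

The resulting per-path type counts are what the later entropy analysis consumes: a path whose counts are spread across several types, or that appears in few documents, is a high-entropy candidate.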
Global schema 400 includes a root node 410 representing an employee document, four child nodes 410-440 (three of which are child attribute nodes that correspond to non-object fields of an employee document, and one of which is a child object node that corresponds to an object field of an employee document), and three grandchild nodes 442-446, or child nodes of the child object node 440.
In the above example pseudocode, fields that have high entropy are removed. High entropy fields cannot be normalized to relational columns. A field has high entropy if the field appears in a small percentage of documents (e.g., less than 10%) or if the field has been associated with multiple types, each of which occurs with greater than a certain frequency. For example, if a field name is associated with a string type in 80% of all instances of that field name in a document collection and is associated with a number type in the other 20% of all those instances, then the field name is removed from consideration and, therefore, will not be a field or column in any resulting relational schema.
There are at least two options to process fields with high entropy: (a) report documents containing such fields to a user or administrator (and, optionally, validate the remaining documents); or (b) place all such fields in a flex JSON column (via flex field support). The following pseudocode describes a variation of option (a):
Thus, if a document from the documents collection (that was used to generate the global schema) does not match the global schema (“global_schema”) that validator 122 constructed, then the reject count (which is initialized to zero) increments by one. If the reject count exceeds a rejection limit, then validation fails and the normalization and schema generation steps are skipped. A document does not match a global schema if the document contains at least one field that is not found in the global schema. For example, if the field designation was removed due to high entropy (e.g., the field was associated with a different data type over 5% of the time) and a document contains the field designation, then the reject count increments by one.
In the example pseudocode above where validation may fail, any documents that fail are stored in an error log. In this option, a user or system administrator may view the error log and determine whether to modify the global schema or the document in order for the document to match.
In option (b), the validation step does not need to be executed. Instead, ingestion of high entropy fields is supported using flex field support, where a ‘flex’ JSON column is added at each level that contains such high entropy fields, and these fields are automatically embedded inside the ‘flex’ JSON column. The following is an example create table statement where a “Flex” JSON column is added to support and store high entropy fields:
The following is an example create duality view statement where a “Flex” JSON column is added to support and store high entropy fields:
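As a rough sketch of option (b)'s ingestion path, fields not present in the global schema can be diverted into a catch-all JSON value before a row is written. The function name and the "flex" column name here are assumptions:

```python
import json


def split_flex(doc, schema_fields):
    """Split a document into schema-backed column values and a 'flex' JSON
    value holding any high-entropy fields absent from the global schema."""
    row = {k: v for k, v in doc.items() if k in schema_fields}
    extras = {k: v for k, v in doc.items() if k not in schema_fields}
    row["flex"] = json.dumps(extras)  # stored in the added JSON column
    return row
```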
After a document collection is validated, normalizer 124 normalizes the data in the document collection in order to prepare the data for relational consumption, or storage. Normalization generally involves de-duplication of data across fields and identifying non-overlapping entities and the relationships between them. The output of this process is a set of normalized entities, the relationships between them, and a mapping of document fields to relational fields.
More specifically, normalization involves using the structure of the input document collections as a guide to identify entities to be normalized. Each node in the global schema that has non-null children becomes a candidate for normalization. All such nodes are added to a normalization set. Then, for each node in this normalization set, the primary key (PK) for the node is identified. If no PK can be found, then an auto-generated identity PK column may be added to the node. Then, for each pair of nodes in the normalization set, it is determined whether the nodes represent the same entity. If so, then the pair of nodes are combined into a single super-node.
Based on these steps, normalization comprises six phases: building a schema tree for each document collection, identifying candidate nodes for normalization (where the hierarchy information is used to identify which fields should be grouped together in a single entity), identifying the PK for each candidate entity or node, normalizing the candidate entities, identifying foreign key (FK) references between entities, and constructing an entity-relationship (ER) model and mapping between document fields and entity attributes. These six phases are described in more detail herein.
Employee schema tree 500 comprises a root node 502, three child attribute nodes 510-530 (each of which corresponds to a field of an employee object), and a child object node 540 (which corresponds to a department object), which comprises three child attribute nodes 542-546 (each of which corresponds to a field of the department object). Child object node 540 is a singleton descendant, meaning that if a department object is a child of an employee node, then there is only one department object.
Department schema tree 550 comprises a root node 552, four child attribute nodes 560-575 (each of which corresponds to a field of a department object), and a child object node 580 (which corresponds to one or more employee objects), which child object node 580 comprises two child attribute nodes 582-584 (each of which corresponds to a field of the employee object). Child object node 580 is an array descendant, meaning that if an employee object is a child of a department node, then there may be multiple employee objects.
Given schema trees 500 and 550, candidate nodes for normalization are identified. A candidate node is any parent node, i.e., a node that is associated with one or more non-null child nodes. Therefore, in schema trees 500 and 550, each of nodes 502, 540, 552 and 580 is a candidate node because each of those nodes is a parent node.
In some scenarios, a document or entity might contain unnested attributes from another document or entity. For example, an employee entity or document contains a department identifier, a department name, and a department budget, all at the same level as an employee identifier and an employee name.
To handle this unnested attribute scenario, a functional dependency tree is generated (up to a depth of 2) within a sub-object where the root is a primary key (PK). A functional dependency between two attributes (or fields) means that given the value of one attribute, it is possible to accurately determine the value of the other attribute. Based on this automatic analysis given the example above, a functional dependency tree (FDT) may be generated where the root of the FDT is employee identifier (i.e., the PK), department identifier is a child attribute of the root, and department name and budget are child attributes of department identifier.
Next, each sub-tree (at depth 1) can be considered to be an unnested entity if and only if (1) the difference in cardinality (or number of unique values) between the root and the sub-tree is greater than a certain threshold (e.g., at least 2×) and (2) the sub-tree has at least one field at depth 2 (relative to the root in the FDT). For example, if there are at least two employees in each department (as indicated in the employee documents), then (1) is satisfied. In other words, there will probably be many more unique employee identifiers than unique department identifiers. On the other hand, if there is closer to a one-to-one relationship between employee identifier and department identifier, then it does not make sense to create another table to store the department data, which would be duplicative of the department data in an employee document/record.
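The two conditions above can be sketched as a simple cardinality check. The 2× default and the parameter names are illustrative assumptions:

```python
def is_unnested_entity(rows, root_pk, sub_pk, depth2_fields, min_ratio=2.0):
    """Treat the sub-tree rooted at sub_pk as a separate (unnested) entity only
    if (1) the FDT root has at least min_ratio times as many unique values as
    the sub-tree root and (2) the sub-tree has at least one field at depth 2."""
    root_card = len({r[root_pk] for r in rows})
    sub_card = len({r[sub_pk] for r in rows})
    return root_card >= min_ratio * sub_card and len(depth2_fields) >= 1
```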
In some scenarios, a document or entity might contain nested attributes in the same entity. For example, an employee entity or document contains a name entity that is itself an object that contains a first name field and a last name field. In such scenarios, it would be beneficial to treat a nested entity as a set of nested attributes of the parent entity instead of creating a separate relational table for the nested entity. Such treatment would involve removing the nested entity and including the attributes of that nested entity as direct child entities of the parent entity. Such “unnesting” is performed if (a) the nested entity does not have a primary key (PK) or (b) the cardinality difference between the parent and nested entity is less than a particular threshold (e.g., 2).
In phase 3, a primary key is identified for each candidate node. In order to do this, for each candidate node, all ordered pairs of attributes of the candidate node are considered and it is determined whether a functional dependency exists between each pair. A functional dependency exists between attribute a1 and attribute a2 if the “strength” of the dependency between the two attributes is greater than a particular threshold α. For example, strength(a1, a2)>α, where strength( ) is a function that measures dependency and α is the particular threshold, such as 0.99. An example of the strength function is the following:
strength(a1, a2) = (# of unique values of a1) / (# of unique (a1, a2) value pairs)
This functional dependency check does not work if an input collection uses composite keys. To handle composite keys, groups of two (or more) attributes may be considered as candidate PKs if no single attribute is a PK.
A PK attribute is the attribute that has functional dependencies to all other attributes of a candidate node. If no such attribute exists, then the candidate node does not have a PK and cannot be normalized to a separate entity. In this case, this candidate node may be considered a nested attribute set of the parent entity. On the other hand, if multiple attributes satisfy the PK definition, then the following rules may be used to break ties:
If multiple attributes are PK candidates, then one is selected as the PK for the entity in question and the remaining may be identified as unique keys (UKs).
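The strength-based PK search can be sketched as follows. This sketch omits the tie-breaking rules and the composite-key fallback; the function names are assumptions:

```python
def strength(rows, a1, a2):
    """# of unique a1 values divided by # of unique (a1, a2) value pairs."""
    return len({r[a1] for r in rows}) / len({(r[a1], r[a2]) for r in rows})


def pk_candidates(rows, attrs, alpha=0.99):
    """Attributes having a functional dependency to every other attribute."""
    return [a1 for a1 in attrs
            if all(strength(rows, a1, a2) > alpha for a2 in attrs if a2 != a1)]
```

With α near 1, an attribute qualifies only if each of its values determines a single value of every other attribute, which matches the definition of a PK attribute above.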
Given node 502 in
Based on Table A, attributes “empId” and “name” have functional dependencies to all other attributes of node 502 and both attributes satisfy the conditions to be a PK. A tie between “empId” and “name” is broken using rule 3 (prefer attributes with lower variance of the respective values), resulting in selecting “empId” as the PK.
Using this process of PK identification, in light of
Once PKs are identified for candidate nodes, the candidate entities are normalized. Two nodes, N1 and N2, may be normalized into a single node (or entity) if there is a one-to-one relationship between at least one pair of attributes of N1 and N2. Generally, a join of N1 and N2 is performed based on an equality on their PKs (or UKs). If a one-to-one correspondence between the attributes in the join result is determined, then N1 and N2 may be grouped into a single relation.
For example, the following is one way in which node 502 (referenced as node1 in Table C) and node 580 (referenced as node4 in Table C) may be normalized into a single entity. First, a join of node1 and node4 is performed based on an equality on their PKs, i.e., node1.empId=node4.empId.
The number of rows in the join result should be at least α times the number of data points in the larger of the two nodes, where α may be a high value, such as 0.99. Also, the size of the join result should be a significant percentage (e.g., >98%) of the size of the data contained in the smaller of the two nodes. The size of the data contained in a node is the number of data points in the node. For example, for the employee node, the size would be the number of employee objects. This latter condition eliminates matches that result from extremely small, coincidental overlaps in the domains of the subtrees' parent attributes.
After the join result is generated, a matrix (attrMatrix) is constructed that has one cell per pair of attributes from the two nodes. Such constructing is performed by iterating over every row (R) in the join result and incrementing attrMatrix[a1][a2] by 1 if R[a1]=R[a2]. This matrix is presented as Table D.
After the attrMatrix is constructed, a mapping between node1 and node4 is determined, such that attrMatrix[a1][a2]/(number of rows in the join result) > α, where a1 is an attribute in node1, a2 is an attribute in node4, and α is a threshold (e.g., 0.99), and attrMatrix[a1][a2] ≥ attrMatrix[a1][a2′] for all other attributes a2′ in node4. This latter condition makes sure that the a2 that was identified has a higher value than other attributes in the same node.
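The matrix construction and mapping selection can be sketched as follows. This is a simplified illustration in which the join result is assumed to be precomputed as pairs of rows, one from each node:

```python
from collections import defaultdict


def attr_matrix(join_rows, attrs1, attrs2):
    """attrMatrix[a1][a2] counts join rows where the two attribute values agree."""
    m = defaultdict(lambda: defaultdict(int))
    for r1, r2 in join_rows:
        for a1 in attrs1:
            for a2 in attrs2:
                if r1[a1] == r2[a2]:
                    m[a1][a2] += 1
    return m


def one_to_one_mapping(m, attrs1, attrs2, n_rows, alpha=0.99):
    """Map a1 -> a2 when the agreement rate clears alpha and a2 is a1's best match."""
    mapping = {}
    for a1 in attrs1:
        best = max(attrs2, key=lambda a2: m[a1][a2])
        if m[a1][best] > alpha * n_rows:
            mapping[a1] = best
    return mapping
```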
Based on attrMatrix, a one-to-one mapping between node1 and node4 may be constructed where node1.empId maps to node4.empId and node1.name maps to node4.empname, but node1.designation does not map to any attribute in node4. This mapping is shown in
Based on the mapping indicated in
Similarly, node2 and node3 (corresponding to nodes 540 and 552) may be normalized into a single entity, as indicated in entity 800 depicted in
After two or more entities are normalized, relationships between the normalized entities are considered in the (JSON) schemas. For one-to-many relationships, the PK of the parent entity is added as a FK in the child entity. For one-to-one relationships, the PK of the child entity is added as a FK in the parent entity. For many-to-many relationships, a mapping table may be created with two columns: the PK of the parent entity is added as one FK and the PK of the child entity is added as another FK. The combination of the two FKs may be marked as the PK of the mapping table.
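These placement rules can be sketched as a small dispatch. The relationship labels and return shapes are assumptions for illustration:

```python
def fk_placement(relationship, parent_pk, child_pk):
    """Return where foreign keys go for a given relationship type."""
    if relationship == "one-to-many":
        return {"child_fk": parent_pk}       # parent's PK becomes a FK in the child
    if relationship == "one-to-one":
        return {"parent_fk": child_pk}       # child's PK becomes a FK in the parent
    if relationship == "many-to-many":
        # mapping table whose two FK columns jointly form its PK
        return {"mapping_table_fks": [parent_pk, child_pk]}
    raise ValueError(f"unknown relationship: {relationship}")
```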
In the example normalized entities of
In phase 6 of the normalization process, normalizer 124 constructs an entity-relationship (ER) model and mapping between document fields and entity attributes. Based on the JSON schema for each collection and the normalized entities, normalizer 124 builds ER model 1000, as depicted in
Schema generator 126 generates, based on validated schemas and normalized entities, a recommendation of a table creation statement and/or a duality view creation statement. In a related embodiment, a user (e.g., a database administrator) overrides one or more recommendations. Alternatively, no user intervention is required.
Either way, schema generator 126 automatically generates one or more tables and/or duality views, for example, by executing a create table statement and/or a create duality view statement. After these relational database objects are generated, a database application seamlessly migrates to a relational database system (e.g., RDB system 130).
In an embodiment, users of DDB system 110 are able to run their existing applications, with little or no change, on top of these automatically-generated duality views.
The following is pseudocode for generating a create table statement in Structured Query Language (SQL), which statement is a type of data definition language (DDL) statement. Embodiments are not limited to SQL and may be applicable to any database query language.
A schema_node is an object that contains data about a node in one of the schema trees that was generated. A schema_node has a type, such as a root node, a parent node, a child node, or a field. If the schema_node is a root node, then the schema_node indicates a table name and key data, such as which field is a primary key or foreign key. If the schema_node is a node with fields, then the schema_node indicates the names of the fields.
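A simplified sketch of such a generator follows. The schema_node shape (a dictionary with table, pk, fields, and optional fks entries), the type names, and the emitted layout are assumptions, not the actual pseudocode of the embodiment:

```python
def create_table_ddl(schema_node):
    """Emit a CREATE TABLE statement from a root schema_node of the form
    {"table": ..., "pk": ..., "fields": [(name, sql_type), ...],
     "fks": {fk_col: "ref_table(ref_col)", ...}}."""
    cols = [f"{name} {sql_type}" for name, sql_type in schema_node["fields"]]
    cols.append(f"PRIMARY KEY ({schema_node['pk']})")
    for fk, ref in schema_node.get("fks", {}).items():
        cols.append(f"FOREIGN KEY ({fk}) REFERENCES {ref}")
    body = ",\n  ".join(cols)
    return f"CREATE TABLE {schema_node['table']} (\n  {body}\n)"
```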
Given the normalized employee entity described herein as input, the above pseudocode would generate the following DDL statement:
Additionally, the same pseudocode, given the normalized department entity described herein as input, would generate the following DDL statement:
The following is pseudocode for generating a create index statement in SQL in order to generate an index on foreign keys:
Given the normalized employee entity described herein as input, the above pseudocode would generate the following DDL statement (since the normalized employee entity has a foreign key, but not the normalized department entity):
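The FK-index generator can be sketched in the same style. The naming convention for the index is an assumption:

```python
def create_fk_index_ddl(table, fk_columns):
    """One CREATE INDEX statement per foreign-key column of a table."""
    return [f"CREATE INDEX {table}_{col}_idx ON {table} ({col})"
            for col in fk_columns]
```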
The following is pseudocode for generating a create duality view statement in SQL:
Duality views allow documents to be stored in a relational format, but accessed by applications in their native format (e.g., JSON). In this way, applications that accessed the documents in their native format may continue doing so even though the documents are actually stored in a relational format. With duality views, the documents that the applications present to a user will look the same as the documents that were sent to ADM system 120 from DDB system 110. Given the normalized employee entity described herein as input, the above pseudocode would generate the following DDL statement:
The data on the left side of the colon is a field name in the original document while the data on the right side of the colon is a corresponding field name in a relational table, such as a column name or a table name.
Additionally, the same pseudocode, given the normalized department entity described herein as input, would generate the following DDL statement:
Documents from DDB system 110 may be imported into RDB system 130 in one or more ways. For example, one or more document (e.g., JSON) collections are imported into a set of tables with a single (e.g., JSON) column using an export API (e.g., “mongoexport”) that specifies a collection and, optionally, a set of documents, and an import API (e.g., “mongoimport”) that specifies a document database, user credentials, and the collection/documents to import.
Once the document collections are stored in a single JSON column of a table in RDB system 130, a set of functions is executed in order to determine a relational schema for the input collections. The set of functions may be part of a PL/SQL package. Two or more functions may be used to validate, normalize, and generate table schema. Alternatively, a single function may be used to perform each of these main operations. An advantage of having two or more functions is that a user/administrator would be allowed to view an ER model that is output from the validate and normalize function(s) and modify the ER model. For example, the user may add or remove fields, add or remove FKs, add or modify relationships, etc. The user may then specify the modified ER model as input to another function (e.g., a generate schema function), which will automatically generate and execute DDL statements that are based on the modified ER model.
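The multi-function workflow described above can be sketched as follows. The stage functions are passed in as parameters because their real counterparts are the package's own validate, normalize, and generate-schema functions; the names here are illustrative, not the actual PL/SQL package API:

```python
def migrate(docs, validate, normalize, generate_schema, user_edit=None):
    """Validate -> normalize -> (optional user edit of the ER model) -> generate DDL."""
    if not validate(docs):
        raise ValueError("collection failed schema validation")
    er_model = normalize(docs)
    if user_edit is not None:
        # the user may add/remove fields and FKs or modify relationships here
        er_model = user_edit(er_model)
    return generate_schema(er_model)
```

Exposing the intermediate ER model, as this split does, is what allows a user to adjust the model before any DDL statements are generated and executed.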
After the schema is generated, the document data from the single (e.g., JSON) column is moved to one or more duality views. For example, employee and department documents are moved from a single JSON column to multiple duality views. This may be performed using an “insert as select” statement for each document collection.
Example usage of an application interface is the following code:
In a related embodiment, a graphical user interface (GUI) presents a button or other selectable graphical element that, when selected, performs the operations described herein. In this way, a user is not required to specify any code. Instead, the user may only be required to (1) specify a location where one or more document collections are stored and a target database in which to store the one or more document collections and (2) select a graphical button.
According to one embodiment, the techniques described herein are implemented by one or more special-purpose computing devices. The special-purpose computing devices may be hard-wired to perform the techniques, or may include digital electronic devices such as one or more application-specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs) that are persistently programmed to perform the techniques, or may include one or more general purpose hardware processors programmed to perform the techniques pursuant to program instructions in firmware, memory, other storage, or a combination. Such special-purpose computing devices may also combine custom hard-wired logic, ASICs, or FPGAs with custom programming to accomplish the techniques. The special-purpose computing devices may be desktop computer systems, portable computer systems, handheld devices, networking devices or any other device that incorporates hard-wired and/or program logic to implement the techniques.
For example,
Computer system 1100 also includes a main memory 1106, such as a random access memory (RAM) or other dynamic storage device, coupled to bus 1102 for storing information and instructions to be executed by processor 1104. Main memory 1106 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 1104. Such instructions, when stored in non-transitory storage media accessible to processor 1104, render computer system 1100 into a special-purpose machine that is customized to perform the operations specified in the instructions.
Computer system 1100 further includes a read only memory (ROM) 1108 or other static storage device coupled to bus 1102 for storing static information and instructions for processor 1104. A storage device 1110, such as a magnetic disk, optical disk, or solid-state drive is provided and coupled to bus 1102 for storing information and instructions.
Computer system 1100 may be coupled via bus 1102 to a display 1112, such as a cathode ray tube (CRT), for displaying information to a computer user. An input device 1114, including alphanumeric and other keys, is coupled to bus 1102 for communicating information and command selections to processor 1104. Another type of user input device is cursor control 1116, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 1104 and for controlling cursor movement on display 1112. This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane.
Computer system 1100 may implement the techniques described herein using customized hard-wired logic, one or more ASICs or FPGAs, firmware and/or program logic which in combination with the computer system causes or programs computer system 1100 to be a special-purpose machine. According to one embodiment, the techniques herein are performed by computer system 1100 in response to processor 1104 executing one or more sequences of one or more instructions contained in main memory 1106. Such instructions may be read into main memory 1106 from another storage medium, such as storage device 1110. Execution of the sequences of instructions contained in main memory 1106 causes processor 1104 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions.
The term “storage media” as used herein refers to any non-transitory media that store data and/or instructions that cause a machine to operate in a specific fashion. Such storage media may comprise non-volatile media and/or volatile media. Non-volatile media includes, for example, optical disks, magnetic disks, or solid-state drives, such as storage device 1110. Volatile media includes dynamic memory, such as main memory 1106. Common forms of storage media include, for example, a floppy disk, a flexible disk, a hard disk, a solid-state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, or any other memory chip or cartridge.
Storage media is distinct from but may be used in conjunction with transmission media. Transmission media participates in transferring information between storage media. For example, transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 1102. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.
Various forms of media may be involved in carrying one or more sequences of one or more instructions to processor 1104 for execution. For example, the instructions may initially be carried on a magnetic disk or solid-state drive of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to computer system 1100 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal. An infra-red detector can receive the data carried in the infra-red signal and appropriate circuitry can place the data on bus 1102. Bus 1102 carries the data to main memory 1106, from which processor 1104 retrieves and executes the instructions. The instructions received by main memory 1106 may optionally be stored on storage device 1110 either before or after execution by processor 1104.
Computer system 1100 also includes a communication interface 1118 coupled to bus 1102. Communication interface 1118 provides a two-way data communication coupling to a network link 1120 that is connected to a local network 1122. For example, communication interface 1118 may be an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, communication interface 1118 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links may also be implemented. In any such implementation, communication interface 1118 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
Network link 1120 typically provides data communication through one or more networks to other data devices. For example, network link 1120 may provide a connection through local network 1122 to a host computer 1124 or to data equipment operated by an Internet Service Provider (ISP) 1126. ISP 1126 in turn provides data communication services through the world wide packet data communication network now commonly referred to as the “Internet” 1128. Local network 1122 and Internet 1128 both use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks and the signals on network link 1120 and through communication interface 1118, which carry the digital data to and from computer system 1100, are example forms of transmission media.
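The two-way exchange of digital data streams described above can be illustrated in miniature. The following Python sketch (purely illustrative; none of these names appear in the disclosure) uses a local socket pair to stand in for the two ends of a network link such as network link 1120:

```python
import socket

# Create a connected pair of sockets, standing in for the two
# endpoints of a data communication link.
left, right = socket.socketpair()

# One end sends a digital data stream...
left.sendall(b"requested application code")

# ...and the other end receives it, as a communication
# interface such as communication interface 1118 would.
received = right.recv(1024)

left.close()
right.close()

print(received.decode())
```

A real deployment would of course use network sockets bound to addresses on local network 1122 or the Internet 1128; the socket pair merely keeps the example self-contained.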
Computer system 1100 can send messages and receive data, including program code, through the network(s), network link 1120 and communication interface 1118. In the Internet example, a server 1130 might transmit a requested code for an application program through Internet 1128, ISP 1126, local network 1122 and communication interface 1118.
The received code may be executed by processor 1104 as it is received, and/or stored in storage device 1110, or other non-volatile storage for later execution.
Software system 1200 is provided for directing the operation of computer system 1100. Software system 1200, which may be stored in system memory (RAM) 1106 and on fixed storage (e.g., hard disk or flash memory) 1110, includes a kernel or operating system (OS) 1210.
The OS 1210 manages low-level aspects of computer operation, including managing execution of processes, memory allocation, file input and output (I/O), and device I/O. One or more application programs, represented as 1202A, 1202B, 1202C . . . 1202N, may be “loaded” (e.g., transferred from fixed storage 1110 into memory 1106) for execution by the system 1200. The applications or other software intended for use on computer system 1100 may also be stored as a set of downloadable computer-executable instructions, for example, for downloading and installation from an Internet location (e.g., a Web server, an app store, or other online service).
Software system 1200 includes a graphical user interface (GUI) 1215, for receiving user commands and data in a graphical (e.g., “point-and-click” or “touch gesture”) fashion. These inputs, in turn, may be acted upon by the system 1200 in accordance with instructions from operating system 1210 and/or application(s) 1202. The GUI 1215 also serves to display the results of operation from the OS 1210 and application(s) 1202, whereupon the user may supply additional inputs or terminate the session (e.g., log off).
OS 1210 can execute directly on the bare hardware 1220 (e.g., processor(s) 1104) of computer system 1100. Alternatively, a hypervisor or virtual machine monitor (VMM) 1230 may be interposed between the bare hardware 1220 and the OS 1210. In this configuration, VMM 1230 acts as a software “cushion” or virtualization layer between the OS 1210 and the bare hardware 1220 of computer system 1100.
VMM 1230 instantiates and runs one or more virtual machine instances (“guest machines”). Each guest machine comprises a “guest” operating system, such as OS 1210, and one or more applications, such as application(s) 1202, designed to execute on the guest operating system. The VMM 1230 presents the guest operating systems with a virtual operating platform and manages the execution of the guest operating systems.
In some instances, the VMM 1230 may allow a guest operating system to run as if it is running on the bare hardware 1220 of computer system 1100 directly. In these instances, the same version of the guest operating system configured to execute on the bare hardware 1220 directly may also execute on VMM 1230 without modification or reconfiguration. In other words, VMM 1230 may provide full hardware and CPU virtualization to a guest operating system in some instances.
In other instances, a guest operating system may be specially designed or configured to execute on VMM 1230 for efficiency. In these instances, the guest operating system is “aware” that it executes on a virtual machine monitor. In other words, VMM 1230 may provide para-virtualization to a guest operating system in some instances.
A computer system process comprises an allotment of hardware processor time, and an allotment of memory (physical and/or virtual), the allotment of memory being for storing instructions executed by the hardware processor, for storing data generated by the hardware processor executing the instructions, and/or for storing the hardware processor state (e.g. content of registers) between allotments of the hardware processor time when the computer system process is not running. Computer system processes run under the control of an operating system, and may run under the control of other programs being executed on the computer system.
The above-described basic computer hardware and software is presented for purposes of illustrating the basic underlying computer components that may be employed for implementing the example embodiment(s). The example embodiment(s), however, are not necessarily limited to any particular computing environment or computing device configuration. Instead, the example embodiment(s) may be implemented in any type of system architecture or processing environment that one skilled in the art, in light of this disclosure, would understand as capable of supporting the features and functions of the example embodiment(s) presented herein.
The term “cloud computing” is generally used herein to describe a computing model which enables on-demand access to a shared pool of computing resources, such as computer networks, servers, software applications, and services, and which allows for rapid provisioning and release of resources with minimal management effort or service provider interaction.
A cloud computing environment (sometimes referred to as a cloud environment, or a cloud) can be implemented in a variety of different ways to best suit different requirements. For example, in a public cloud environment, the underlying computing infrastructure is owned by an organization that makes its cloud services available to other organizations or to the general public. In contrast, a private cloud environment is generally intended solely for use by, or within, a single organization. A community cloud is intended to be shared by several organizations within a community; while a hybrid cloud comprises two or more types of cloud (e.g., private, community, or public) that are bound together by data and application portability.
Generally, a cloud computing model enables some of those responsibilities which previously may have been provided by an organization's own information technology department, to instead be delivered as service layers within a cloud environment, for use by consumers (either within or external to the organization, according to the cloud's public/private nature). Depending on the particular implementation, the precise definition of components or features provided by or within each cloud service layer can vary, but common examples include: Software as a Service (SaaS), in which consumers use software applications that are running upon a cloud infrastructure, while a SaaS provider manages or controls the underlying cloud infrastructure and applications. Platform as a Service (PaaS), in which consumers can use software programming languages and development tools supported by a PaaS provider to develop, deploy, and otherwise control their own applications, while the PaaS provider manages or controls other aspects of the cloud environment (i.e., everything below the run-time execution environment). Infrastructure as a Service (IaaS), in which consumers can deploy and run arbitrary software applications, and/or provision processing, storage, networks, and other fundamental computing resources, while an IaaS provider manages or controls the underlying physical cloud infrastructure (i.e., everything below the operating system layer). Database as a Service (DBaaS), in which consumers use a database server or Database Management System that is running upon a cloud infrastructure, while a DBaaS provider manages or controls the underlying cloud infrastructure, applications, and servers, including one or more database servers.
In the foregoing specification, embodiments of the invention have been described with reference to numerous specific details that may vary from implementation to implementation. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. The sole and exclusive indicator of the scope of the invention, and what is intended by the applicants to be the scope of the invention, is the literal and equivalent scope of the set of claims that issue from this application, in the specific form in which such claims issue, including any subsequent correction.
This application claims the benefit under 35 U.S.C. § 119(e) of provisional application 63/547,353, filed Nov. 4, 2023, by Ajit Mylavarapu et al., the entire contents of which are hereby incorporated by reference.