The present technique relates generally to importing data in accordance with a schedule.
This section is intended to introduce the reader to various aspects of art that may be related to various aspects of the present technique, which are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present technique. Accordingly, it should be understood that these statements are to be read in this light, and not as admissions of prior art.
Organizations, regardless of size, rely upon access to information technology (IT) and data and services for their continued operation and success. A respective organization's IT infrastructure may have associated hardware resources (e.g. computing devices, load balancers, firewalls, switches, etc.) and software resources (e.g. productivity software, database applications, custom applications, and so forth). Over time, more and more organizations have turned to cloud computing approaches to supplement or enhance their IT infrastructure solutions.
Cloud computing relates to the sharing of computing resources that are generally accessed via the Internet. In particular, a cloud computing infrastructure allows users, such as individuals and/or enterprises, to access a shared pool of computing resources, such as servers, storage devices, networks, applications, and/or other computing based services. By doing so, users are able to access computing resources on demand that are located at remote locations, which resources may be used to perform a variety of computing functions (e.g., storing and/or processing large quantities of computing data). For enterprise and other organization users, cloud computing provides flexibility in accessing cloud computing resources without accruing large up-front costs, such as purchasing expensive network equipment or investing large amounts of time in establishing a private network infrastructure. Instead, by utilizing cloud computing resources, users are able to redirect their resources to focus on their enterprise's core functions.
As part of using cloud computing resources, a client may use a cloud based resource as well as local or external systems. In such a situation, it may be desirable to bring data stored or generated locally or on an external system into a database maintained on the cloud computing resource. Such data importing may be part of a routine process, where the external or local data generation is part of an ongoing client process, or a one-time or limited-time event, such as part of a data integration or onboarding event. As may be appreciated, however, for large data sets or files (e.g., millions of records), such an operation may be time-consuming and resource intensive.
A summary of certain embodiments disclosed herein is set forth below. It should be understood that these aspects are presented merely to provide the reader with a brief summary of these certain embodiments and that these aspects are not intended to limit the scope of this discussion. Indeed, this technique may encompass a variety of aspects that may not be set forth below.
As discussed herein, the present approach addresses various issues related to importing locally or externally generated data to a cloud-based resource, such as may be available as part of a client instance of a cloud platform. Such an import process may occur in separate stages, with a first stage corresponding to importing client data from an external source to a cloud-based resource (e.g., to a staging (i.e., temporary) table within a client instance maintained on the cloud) and a second stage corresponding to transforming the data from the staging table to a target table utilized on the instance by the client. At the first stage, an import file (or other data set) external to the client instance is parsed, and the data is extracted and stored in the staging table on the client instance. In one example of an implementation, column headers in the data file to be imported (e.g., a csv or spreadsheet file) or table column names (e.g., in a JDBC import file) are used as the column names in the staging table. With respect to the transformation step in this example, these names are mapped to corresponding target table columns, such as based upon a user defined mapping scheme. That is, in this example, if there is a column in the external data file referenced as “Name” there will be a corresponding column referenced as “Name” in the staging table. Based on the transform mapping, this “Name” column in the staging table may be associated with a “User Name” column in the target table. In some circumstances, additional processing may be performed as part of the transformation step after basic mapping, or user-established rules may be invoked upon inserting or updating data in the target table. In conventional approaches, such an import process may be time and resource intensive.
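The column mapping described above can be illustrated with a minimal sketch. It assumes staging rows are represented as dictionaries keyed by the import file's column headers; the `TRANSFORM_MAP` contents, the `transform_row` helper, and the sample rows are hypothetical and are not taken from any particular platform.

```python
from typing import Dict, List

# Hypothetical transform map: staging column name -> target column name.
TRANSFORM_MAP: Dict[str, str] = {"Name": "User Name", "Email": "Email Address"}


def transform_row(staging_row: Dict[str, str], mapping: Dict[str, str]) -> Dict[str, str]:
    """Rename staging columns to target columns; unmapped columns are dropped."""
    return {mapping[col]: value for col, value in staging_row.items() if col in mapping}


# Staging rows keep the column headers found in the imported file.
staging_table: List[Dict[str, str]] = [
    {"Name": "Ada", "Email": "ada@example.com", "Notes": "unmapped"},
    {"Name": "Grace", "Email": "grace@example.com", "Notes": ""},
]

# Second stage: apply the mapping to every staging row.
target_rows = [transform_row(row, TRANSFORM_MAP) for row in staging_table]
```

In this sketch the “Name” column of the staging table ends up as “User Name” in each target row, while the unmapped “Notes” column is simply ignored; a real transformation step may apply further scripts or rules after this basic mapping.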
In accordance with the present approach, concurrent processing is employed to facilitate the import process. By way of example, in accordance with the present approach data to be imported is partitioned into multiple, separate import sets. The multiple import set mechanism allows import rows to be grouped within the staging table. Thus, while conventional approaches group all import rows into one import set, which is subsequently processed in a sequential or single-threaded fashion, the approach discussed herein groups the import rows into multiple import sets within the staging table, allowing the import set rows to be transformed concurrently using one thread per import set.
Various refinements of the features noted above may exist in relation to various aspects of the present technique. Further features may also be incorporated in these various aspects as well. These refinements and additional features may exist individually or in any combination. For instance, various features discussed below in relation to one or more of the illustrated embodiments may be incorporated into any of the above-described aspects of the present technique alone or in any combination. The brief summary presented above is intended only to familiarize the reader with certain aspects and contexts of embodiments of the present technique without limitation to the claimed subject matter.
Various aspects of this technique may be better understood upon reading the following detailed description and upon reference to the drawings in which:
One or more specific embodiments will be described below. In an effort to provide a concise description of these embodiments, not all features of an actual implementation are described in the specification. It should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and enterprise-related constraints, which may vary from one implementation to another. Moreover, it should be appreciated that such a development effort might be complex and time consuming, but would nevertheless be a routine undertaking of design, fabrication, and manufacture for those of ordinary skill having the benefit of this disclosure.
As used herein, the term “computing system” refers to an electronic computing device such as, but not limited to, a single computer, virtual machine, virtual container, host, server, laptop, and/or mobile device, or to a plurality of electronic computing devices working together to perform the function described as being performed on or by the computing system. As used herein, the term “medium” refers to one or more non-transitory, computer-readable physical media that together store the contents described as being stored thereon. Embodiments may include non-volatile secondary storage, read-only memory (ROM), and/or random-access memory (RAM). As used herein, the term “application” refers to one or more computing modules, programs, processes, workloads, threads and/or a set of computing instructions executed by a computing system. Example embodiments of an application include software modules, software objects, software instances and/or other types of executable code.
As discussed herein, the present approach facilitates concurrent processing of data to be imported, such as from an external source, to a table or database within a client instance made accessible in a cloud computing environment. In accordance with this approach, data to be imported undergoes a two-stage import process. In the first stage, client data from an external source is imported to a cloud-based resource (e.g., to a staging (i.e., temporary) table within a client instance maintained on the cloud). At this first stage, the data to be imported (e.g., an import file) is partitioned into multiple, separate import sets, which allows import rows to be grouped by import set within the staging table. In this manner, the import rows corresponding to different import sets within the staging table may be concurrently processed in the subsequent transformation step, such as by a different thread for each import set, to update the target table based upon client defined column mappings and/or other rules.
With the preceding in mind, the following figures relate to various types of generalized system architectures or configurations that may be employed to provide services to an organization (e.g., a client) using a cloud-based architecture, such as a multi-instance or multi-tenant framework, and on which the present approaches may be employed. Correspondingly, these system and platform examples may also relate to systems and platforms on which the techniques discussed herein may be implemented or otherwise utilized. Turning now to
For the illustrated embodiment,
In
To utilize computing resources within the platform 16, network operators may choose to configure the data centers 18 using a variety of computing infrastructures. In one embodiment, one or more of the data centers 18 are configured using a multi-tenant cloud architecture, such that one of the server instances 26 handles requests from and serves multiple customers. Data centers 18 with multi-tenant cloud architecture commingle and store data from multiple customers, where multiple customer instances are assigned to one of the virtual servers 26. In a multi-tenant cloud architecture, the particular virtual server 26 distinguishes between and segregates data and other information of the various customers. For example, a multi-tenant cloud architecture could assign a particular identifier for each customer in order to identify and segregate the data from each customer. Generally, implementing a multi-tenant cloud architecture may suffer from various drawbacks, such as a failure of a particular one of the server instances 26 causing outages for all customers allocated to the particular server instance.
In another embodiment, one or more of the data centers 18 are configured using a multi-instance cloud architecture to provide every customer its own unique customer instance or instances. For example, a multi-instance cloud architecture could provide each customer instance with its own dedicated application server and dedicated database server. In other examples, the multi-instance cloud architecture could deploy a single physical or virtual server 26 and/or other combinations of physical and/or virtual servers 26, such as one or more dedicated web servers, one or more dedicated application servers, and one or more database servers, for each customer instance. In a multi-instance cloud architecture, multiple customer instances could be installed on one or more respective hardware servers, where each customer instance is allocated certain portions of the physical server resources, such as computing memory, storage, and processing power. By doing so, each customer instance has its own unique software stack that provides the benefit of data isolation, relatively less downtime for customers to access the platform 16, and customer-driven upgrade schedules. An example of implementing a customer instance within a multi-instance cloud architecture will be discussed in more detail below with reference to
Although
As may be appreciated, the respective architectures and frameworks discussed with respect to
With this in mind, and by way of background, it may be appreciated that the present approach may be implemented using one or more processor-based systems such as shown in
With this in mind, an example computer system may include some or all of the computer components depicted in
The one or more processors 202 may include one or more microprocessors capable of performing instructions stored in the memory 206. Additionally or alternatively, the one or more processors 202 may include application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), and/or other devices designed to perform some or all of the functions discussed herein without calling instructions from the memory 206.
With respect to other components, the one or more busses 204 include suitable electrical channels to provide data and/or power between the various components of the computing system 200. The memory 206 may include any tangible, non-transitory, and computer-readable storage media. Although shown as a single block in
With the preceding in mind,
Turning now to
In this example, an initial stage may be, upon selection or identification of the import data 400, to import (step 404) the import data 400 (e.g., a flat file, spreadsheet, or database table) into a staging table 408 (e.g., a temporary table) located in the client instance 102. As may be appreciated, the import job associated with importing the import data 400 may be run on a scheduled or periodic basis or may be run on-demand, such as by being initiated manually by a user.
In this example, the staging table 408 stores parsed data based on column or field name as present in the original data file 400. This step effectively moves the import data 400 from the client local environment to the client instance 102 based on the cloud platform 16. As may be appreciated, in the depicted example, the import data 400 is imported as a single import set as a consequence of being generated via a single import operation or job. An example of such an import set stored in a staging table 408 is depicted in
Once in the client instance 102, the data in the staging table 408 may be processed to modify (step 412) a target table or tables 416. The modify step 412 may take the form of an update or insert operation or query used to update target table(s) 416 based on data found in the staging table 408. At this stage, one or more transform maps 420 may be applied to map the import data 400 as represented in the staging table 408 to the appropriate format and/or layout of the target table(s) 416, which may include relationships or multi-table layout and/or may be configured for use by applications running on the client instance 102 and used by the client. In practice, the transformation step involved in modifying the target table(s) 416 may be implemented as a single-threaded implementation of one or more transformation scripts and/or target table rules, and may therefore take substantial time (e.g., hours or days) to perform for large data sets.
With the preceding in mind, in accordance with aspects of the present approach, concurrent processing is employed to facilitate the data import process from a local file to a table maintained on a client instance. Turning to
With respect to the data partitioning stage, in accordance with this approach, the import data 400 (e.g., a data file or table) is partitioned into N import sets, where N corresponds to the degree of concurrency supported or desired. In practice, the number of import sets may be based on the number of available transform jobs that can be worked at a given time. In one implementation, there may be two transform jobs made available per node (e.g., application node) running in the customer instance 102, so that a single application node gives rise to two transform jobs (and therefore two import sets), two application nodes give rise to four transform jobs, and so forth. The number of available transform jobs, however, may vary from this ratio and/or may be customizable based on client need. Further, the relationship between nodes and transform jobs does not have to scale linearly, but may instead increase non-linearly to reflect that larger clients (as reflected by the number of nodes in the client instance 102) may have disproportionately greater data import needs.
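The partitioning stage can be sketched as follows. This is a minimal illustration that assumes the linear two-jobs-per-node ratio from the example above and assumes rows are assigned to import sets by row number modulo N; the names `import_set_count` and `partition_rows` are hypothetical.

```python
from typing import Dict, List, Sequence

JOBS_PER_NODE = 2  # ratio from the example in the text; may be customized


def import_set_count(active_nodes: int, jobs_per_node: int = JOBS_PER_NODE) -> int:
    """Linear scaling only; the relationship may also be non-linear."""
    if active_nodes < 1:
        raise ValueError("at least one active node is required")
    return active_nodes * jobs_per_node


def partition_rows(rows: Sequence[dict], n_sets: int) -> Dict[int, List[dict]]:
    """Assign each row to an import set by row number (assumed round-robin)."""
    import_sets: Dict[int, List[dict]] = {i: [] for i in range(n_sets)}
    for row_number, row in enumerate(rows):
        import_sets[row_number % n_sets].append(row)
    return import_sets


rows = [{"Name": f"user{i}"} for i in range(10)]
n = import_set_count(active_nodes=2)  # two nodes give rise to four import sets
import_sets = partition_rows(rows, n)
```

With two nodes, the ten sample rows are spread across four import sets, each of which can then be handed to its own transform job.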
As in the preceding example, an example of a staging table 408 is depicted in
The above described aspects are illustrated in an alternative manner in
After loading data into the multiple import sets, the import set identifiers may be inserted or added to a queue 436. A set of transform jobs 438 may poll this queue. When a job is found in the queue 436, it is pulled from the queue and processed, so if multiple (e.g., two, four, and so forth) jobs are found in the queue, all will be processed concurrently, as shown. The transform jobs 438 may each correspond to a respective transform trigger 440, with the number of triggers determined by the number of active nodes 442, as discussed in greater detail below.
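The queue-and-poll arrangement may be sketched with standard threading primitives. This is an illustrative sketch only: the import set identifiers, the in-memory `staging` dictionary, and the `transform_job` function are hypothetical, the "transformation" is reduced to counting rows, and the jobs exit once the queue is empty rather than polling indefinitely.

```python
import queue
import threading
from typing import Dict, List

# Hypothetical staging data: import set identifier -> rows in that set.
staging: Dict[str, List[dict]] = {
    "ISET001": [{"Name": "Ada"}, {"Name": "Grace"}],
    "ISET002": [{"Name": "Edsger"}],
}

work_queue = queue.Queue()          # holds import set identifiers
processed: Dict[str, int] = {}      # import set identifier -> rows handled
processed_lock = threading.Lock()   # guards the shared results dictionary


def transform_job() -> None:
    """Pull import set identifiers until the queue is empty, then exit."""
    while True:
        try:
            set_id = work_queue.get_nowait()
        except queue.Empty:
            return
        count = len(staging[set_id])  # stand-in for the real transformation
        with processed_lock:
            processed[set_id] = count
        work_queue.task_done()


for set_id in staging:
    work_queue.put(set_id)

# Two transform jobs, mirroring two triggers on a single node.
jobs = [threading.Thread(target=transform_job) for _ in range(2)]
for job in jobs:
    job.start()
for job in jobs:
    job.join()
```

Each identifier is pulled by exactly one job, so the two import sets here can be transformed at the same time by different threads.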
With this in mind, in one implementation, the number of import sets is determined based on the number of available system triggers 440 (i.e., sys_triggers) to concurrently execute the transform jobs 438 needed to transform the data stored in the staging table 408 to how the data will be stored in the target table(s) 416. In one implementation, by default two system triggers 440 are defined for each active node 442 (e.g., application node) so that in a one node instance, two runnable system triggers will be defined, in a four node instance, eight system triggers will be defined, and so forth.
As noted above, concurrent processing of import data may introduce various issues not raised by non-concurrent import processes. For example, potential duplication of records in the target table 416 may arise in exceptional circumstances. In particular, coalesce fields may be used to define the unique key columns. After importing a data set there should not be two records with the same coalesce field values. However, with concurrent import sets, if two threads check for the existence of a record simultaneously, both threads may end up adding the record, leading to the duplication of a data record.
With this in mind, in accordance with the present approach, such duplication of records in the target table 416 may be avoided by acquiring a row lock for each insertion of rows. The row lock may be based on a mutex object with coalesce values and target table names, where the mutex object, as used herein, is a synchronization object whose state indicates whether it is or is not currently tied to a thread (i.e., it can only be associated with one thread at a time). By employing mutex objects tied to the coalesce values and target table names, a process flow can be implemented in which insertion of duplicate records into the target table 416 is avoided in a multi-threaded process.
In other exceptional circumstances, it may be desirable to perform a data import process with concurrency, as discussed herein, but also to process one or more groups of the records in the order in which they appear in the original import data 400, i.e., the order or sequence of records within a subset of the records may have significance. In practice, it may not be feasible to guarantee the order of processing of all records in the import data 400 while providing for concurrent processing. However, the order of processing for a subset of the records of the import data 400 can be assured if they are present in the same import set, as discussed above.
By way of example, in one embodiment, hashing may be employed to provide such custom partitioning. In one such example, the customer provides a script (with the current record as an input parameter) to define grouping (i.e., rows belonging to same group should return same value). By default, the row number may be used as the hash function (i.e., no grouping) and customers can customize it to allow for grouping. An example of one such grouping hash function is as follows:
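A minimal sketch of such a grouping function, written in Python for illustration, is given here under stated assumptions: the grouping script is modeled as a callable that receives the current record, the group value is reduced to an import set index with a CRC32 checksum, and the `department` field, `by_department`, and `import_set_index` names are hypothetical.

```python
import zlib
from typing import Callable, Dict, List, Optional

GroupFn = Callable[[dict], str]


def import_set_index(row_number: int, record: dict, n_sets: int,
                     group_fn: Optional[GroupFn] = None) -> int:
    """Pick an import set; rows returning the same group value share a set."""
    if group_fn is None:
        # Default: the row number acts as the hash, i.e., no grouping.
        return row_number % n_sets
    # zlib.crc32 is deterministic across runs, unlike built-in hash() for str.
    return zlib.crc32(group_fn(record).encode("utf-8")) % n_sets


def by_department(record: dict) -> str:
    """Hypothetical customer-supplied grouping script."""
    return record["department"]


rows = [{"department": d, "seq": i}
        for i, d in enumerate(["HR", "IT", "HR", "Ops", "IT", "HR"])]

assignment: Dict[int, List[dict]] = {}
for row_number, rec in enumerate(rows):
    index = import_set_index(row_number, rec, 4, by_department)
    assignment.setdefault(index, []).append(rec)
```

All records of a given department land in the same import set and keep their original relative order within it, which is what allows ordered processing of that group while other groups are transformed concurrently.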
With the preceding discussion and examples in mind,
Turning to
A user-viewable list of scheduled import jobs is depicted in an example interface shown in
The configuration of an import job selected from the example interface of
In the example illustrated in
Turning to
Each listed import set (column 470) is unique and presumably corresponds to a separate staging table 408. Details may also be displayed related to the schedule and import job type (column 540) and status of the import job (column 542). In the event an import job is performed with concurrency, multiple separate and distinct import sets 470 are generated with respect to a single set of import data 450, though the separate import sets in combination or in the aggregate convey the data of the import data set. Thus, in combination, a set of import sets 470 may correspond to a single concurrent import set 550 that corresponds to the data associated with a set of import data, even though the import data is processed through separate import sets to achieve concurrent processing. Thus, and as shown in
With respect to the insertion of the import set data stored in the staging table 408 to the target table(s) 416, as noted above this may occur in response to one or more configurable system triggers.
Turning to
With the preceding in mind, the present approach facilitates concurrent processing of data to be imported, such as from an external source, to a table or database within a client instance made accessible in a cloud computing environment. As discussed herein, data to be imported is partitioned into multiple, separate import sets which populate a staging table. The data in the staging table is processed concurrently to populate a target table or tables. The processing of the staging table to populate the target table or tables may be in response to one or more system triggers.
The specific embodiments described above have been shown by way of example, and it should be understood that these embodiments may be susceptible to various modifications and alternative forms. It should be further understood that the claims are not intended to be limited to the particular forms disclosed, but rather to cover all modifications, equivalents, and alternatives falling within the spirit and scope of this disclosure.
The techniques presented and claimed herein are referenced and applied to material objects and concrete examples of a practical nature that demonstrably improve the present technical field and, as such, are not abstract, intangible or purely theoretical. Further, if any claims appended to the end of this specification contain one or more elements designated as “means for [perform]ing [a function] . . . ” or “step for [perform]ing [a function] . . . ”, it is intended that such elements are to be interpreted under 35 U.S.C. 112(f). However, for any claims containing elements designated in any other manner, it is intended that such elements are not to be interpreted under 35 U.S.C. 112(f).
Number | Date | Country | |
---|---|---|---|
20200159746 A1 | May 2020 | US |