This disclosure relates generally to data privacy and data security, and more particularly to generating code executable to implement a data replication policy, according to various embodiments.
Maintaining data availability and consistency across databases in cloud-based applications presents a significant technical problem. For example, it is important for users to be compliant with the General Data Protection Regulation (GDPR) for data that is replicated across the cloud to different datacenters. As a non-limiting example, consider a database located in the United States that stores users' personal information, such as name, age, gender, address, SSN, etc. If this data needs to be replicated from its source database in the U.S. to a target database outside of the U.S. (e.g., to a target database located in an EU member state), it may be necessary to transform some portion(s) of the data to satisfy security concerns and comply with GDPR (or other) requirements.
The amount of data that is generated, for example through the use of web services, is growing exponentially. Maintaining customers' data availability and consistency across databases in cloud-based applications presents a significant technical problem. For example, it is important for users to be compliant with GDPR or other data-privacy regulations for data that is replicated across datacenters, as noted above. To enable replication of data from a source database to a target database, a data replication policy may be applied. As used herein, the term “data replication policy” refers to a set of rules that govern the replication of data from a source data store (e.g., a database at a first datacenter) to a target data store (e.g., a second database at a second datacenter). Existing data replication systems present various technical shortcomings. For example, while such systems may enable data replication between databases using a data replication policy, such systems require the policy to be coded using hand-written rules (provided, for example, in the Human-Optimized Config. Object Notation (HOCON) format). Stated differently, these prior systems require data replication policies to be designed and implemented manually, which presents various technical problems. For example, coding such a data replication policy is a time-consuming and error-prone process, requiring the user to have a significant understanding of the database objects involved and the particular coding syntax (e.g., the HOCON format). Consider, as one non-limiting example, an instance in which the user (e.g., a database administrator) manually codes a policy at the row/column level for an organization (e.g., a tenant of a multi-tenant system) but the code contains a syntactical or semantic error. In such a situation, the data replication operation will be interrupted, costing significant time and effort in resolving the error. 
If the policy has any syntactic or semantic errors, the data will not be replicated accurately, potentially violating GDPR or other data-privacy mandates.
In various embodiments, the disclosed systems and methods address these and other technical problems by allowing a user to define a data replication policy based on the user's interaction with a policy designer graphical user interface (GUI). Based on the user's input, the disclosed systems and methods are operable to generate code that is executable to implement the data replication policy (e.g., by performing various transformations to selected data objects) without requiring the user to manually write the code. For example, in some embodiments, the disclosed techniques include a policy designer module that is operable to monitor and capture user-triggered actions, performed via a policy designer GUI, associated with various types of data objects (e.g., schemas, tables, columns, etc.) along with the transformations performed. The policy designer module may store information indicative of the user's input, such as information indicative of each user action, including, for a given data transformation, the data object involved and the order of the transformation in the sequence of the user's input. Based on this stored information, the policy designer module is operable to generate code (e.g., provided in HOCON format) executable to implement a data replication policy for use in a data replication operation. For example, in various embodiments, the policy designer module is operable to convert the information indicative of the user input into corresponding code statements (e.g., provided in any suitable format, such as HOCON). The policy designer module may generate one or more code statements for each of the transformations specified via the policy designer GUI such that the disclosed system generates code executable to implement a data replication policy specifying the transformations that the user wishes to make without requiring the user to go through the technically rigorous and error-prone process of manually coding the policy.
In various embodiments, the disclosed techniques improve data security and compliance with privacy regulations during the data-replication process and improve the data-replication process as a whole.
Referring now to
In
As used herein, the term “transformation” refers to a modification to one or more characteristics of a data object. One non-limiting example of a transformation is the modification of a name associated with a data object. For example, the user may select a column database object via the policy designer GUI 112 and specify a transformation that changes the name of the column database object when a transformed version of that database object is replicated to a target database. Another non-limiting example of a transformation is the modification of a value associated with a data object. For example, the user may select a table database object via the policy designer GUI 112 and specify a transformation that modifies one or more of the values stored in the table database object. For instance, consider an embodiment in which a table database object stores users' personal information, such as a Social Security number (SSN) and residence address. In such an embodiment, the user (e.g., the database administrator) may wish to mask the SSNs when the table database object is replicated to a target database (which may be located in another country, in some embodiments). For example, the user may wish to encrypt or hash the SSNs using SHA1, MD4, or any other suitable algorithm. In various embodiments, the user may provide input indicative of this desired transformation via the policy designer GUI 112 (e.g., using a series of drop-down menus or any other suitable input elements). In other embodiments, the user (e.g., the database administrator) may wish to not replicate the SSNs to the target database at all. In such an embodiment, the user may provide input indicating that the SSN portion (e.g., a column) of the table database object is not to be replicated as part of the data replication policy. Further, in some instances, the user may wish to modify the residence address information stored in the table database object.
For example, the user may provide input specifying a transformation that modifies a street level address of the users to simply denote the state or country of residence of the users. Example transformations that a user may specify via the policy designer GUI 112 are described in more detail below with reference to
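As one non-limiting sketch, the example transformations discussed above (renaming a column, masking a value, omitting a column, and generalizing an address) may be expressed as small functions operating on a row. The function names, the representation of a row as a dictionary, and the assumption that the address is stored as a nested dictionary with a "state" key are illustrative assumptions and are not taken from this disclosure. SHA1 is used only because the text names it.

```python
import hashlib


def rename_column(row, old, new):
    """Return a copy of the row with column `old` renamed to `new`."""
    out = dict(row)
    out[new] = out.pop(old)
    return out


def mask_with_sha1(row, column):
    """Return a copy of the row with the column value replaced by its SHA1 hex digest."""
    out = dict(row)
    out[column] = hashlib.sha1(str(out[column]).encode("utf-8")).hexdigest()
    return out


def drop_column(row, column):
    """Return a copy of the row without the given column (value is not replicated)."""
    return {key: value for key, value in row.items() if key != column}


def generalize_address(row, column, keep="state"):
    """Replace a structured address (assumed to be a dict) with one coarse field."""
    out = dict(row)
    out[column] = out[column][keep]
    return out


# Hypothetical source row used only for illustration.
source_row = {
    "name": "Ada",
    "ssn": "123-45-6789",
    "address": {"street": "1 Main St", "state": "CA", "country": "US"},
}
replicated = generalize_address(mask_with_sha1(source_row, "ssn"), "address")
```

Each function returns a new dictionary rather than modifying its input, so the source row is left unchanged while the transformed version is prepared for replication.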
Note that, in some embodiments, policy designer GUI 112 may be provided to the client device as a web application in which the server system 102 provides code and data usable to render the policy designer GUI 112 via a browser application executing on the client device 110. In other embodiments, policy designer GUI 112 may be provided as part of a software application executing on the client device 110. In such embodiments, server system 102 may provide the client device 110 with data indicative of one or more source databases (such as source database 122) to allow the user to define a data replication policy via the policy designer GUI 112.
In various embodiments, policy designer module 104 is operable to generate code 150 that is executable to implement the data replication policy specified by the user input via the policy designer GUI 112. For example, in some embodiments, the policy designer module 104 may store information indicative of the user input provided via the policy designer GUI 112. In various embodiments, policy designer module 104 may store metadata associated with the database objects on which one or more transformations are to be performed, details of the transformation to be performed, and an order in which the one or more transformations are to be performed. Stated differently, in various embodiments, policy designer module 104 stores information indicating the order and type of transformations applied for each of the database objects specified by the user via the policy designer GUI 112. Based on this information, the policy designer module 104 is operable to generate code 150 that is executable to implement the data replication policy specified by the user. For example, for each transformation specified by the user, the policy designer module 104 may generate one or more corresponding code statements that are executable to perform those transformations in the order specified by the user. In one non-limiting example, policy designer module 104 is operable to generate these code statements using a HOCON format. Once it has generated the code 150 executable to implement the data replication policy, policy designer module 104 may store this code in a data replication policy store 108. In various embodiments, data replication policy store 108 may be implemented using any of various suitable storage devices included in or accessible to (e.g., by direct connection or via one or more communication networks) server system 102. 
Note that, although shown as part of server system 102, data replication policy store 108 may be implemented using a storage device separate from and accessible to the server system 102 or the policy designer module 104.
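As one non-limiting sketch, the conversion of stored transformation information into ordered code statements may resemble the following. The `Transformation` record, the `generate_policy` function, and the key names emitted (such as "object-type" and "action") are hypothetical; this disclosure does not specify the layout of the generated HOCON, so the output below only illustrates the general idea of emitting one statement block per transformation in the user-specified order.

```python
from dataclasses import dataclass, field


@dataclass
class Transformation:
    order: int            # position in the user's sequence of actions
    object_type: str      # e.g., "column" or "table"
    object_name: str      # e.g., "users.ssn"
    action: str           # e.g., "rename" or "hash"
    params: dict = field(default_factory=dict)


def quote(value):
    """Render a Python value as a HOCON literal (booleans, numbers, quoted strings)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def generate_policy(name, transformations):
    """Emit a HOCON-style policy with one object per transformation, sorted by order."""
    lines = [f"{quote(name)} {{", "  transformations = ["]
    for t in sorted(transformations, key=lambda item: item.order):
        lines.append("    {")
        lines.append(f"      object-type = {quote(t.object_type)}")
        lines.append(f"      object-name = {quote(t.object_name)}")
        lines.append(f"      action = {quote(t.action)}")
        for key, value in sorted(t.params.items()):
            lines.append(f"      {quote(key)} = {quote(value)}")
        lines.append("    }")
    lines.append("  ]")
    lines.append("}")
    return "\n".join(lines)


policy_text = generate_policy(
    "eu_policy",
    [
        Transformation(2, "column", "users.ssn", "hash", {"algorithm": "SHA1"}),
        Transformation(1, "column", "users.name", "rename", {"to": "full_name"}),
    ],
)
```

Sorting by the stored order before emitting statements keeps the generated policy consistent with the sequence of user actions, regardless of the order in which the records are retrieved from storage.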
In
Note that, although policy designer module 104, data replication module 106, and data replication policy store 108 are shown on the same server system 102 in
Turning now to
In
Policy designer module 104 further includes action engine 204. In various embodiments, action engine 204 is operable to capture the user input (e.g., performed by the user via the policy designer GUI 112) indicative of the one or more transformations to be performed as part of a data replication policy and store that information (e.g., in the ledger 205). For example, in some embodiments, the user may specify a series of one or more transformations to be made to a first database object stored in the source database 122 as part of a data replication policy. In various embodiments, the action engine 204 is operable to store information indicative of the series of one or more transformations (such as the type of transformations being made, steps involved in the transformations, etc.) and the order in which the transformations are to be performed. Stated differently, in various embodiments, the action engine 204 may store in ledger 205 an ordered list of the transformations to be applied to a database object. Ledger 205 may be stored in a storage device included in (or accessible to) the server system 102.
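As one non-limiting sketch, a ledger that keeps an ordered list of transformations per database object may be modeled as follows. The `Ledger` class, its method names, and the fields of each entry are hypothetical and are provided only to illustrate recording user actions together with their order.

```python
from collections import defaultdict


class Ledger:
    """In-memory record of user actions, kept in the order they were performed."""

    def __init__(self):
        self._entries = defaultdict(list)  # object name -> ordered list of entries
        self._next_seq = 0                 # global sequence number across all objects

    def record(self, object_name, action, **details):
        """Append one captured action for a database object and return the entry."""
        entry = {"seq": self._next_seq, "action": action, "details": details}
        self._entries[object_name].append(entry)
        self._next_seq += 1
        return entry

    def transformations_for(self, object_name):
        """Return the ordered transformations recorded for one database object."""
        return list(self._entries.get(object_name, []))


ledger = Ledger()
ledger.record("users.ssn", "hash", algorithm="SHA1")
ledger.record("users.name", "rename", to="full_name")
ledger.record("users.ssn", "rename", to="ssn_hash")
```

Here the two actions recorded for "users.ssn" are returned in the order they were captured, while the global sequence number preserves the relative order of actions across different objects.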
In
Note, however, that this example function is provided merely as one non-limiting embodiment. In other embodiments, other suitable functions (or variations of the function provided above) may be used to generate the code 150 executable to implement the data replication policy.
In
Policy designer module 104 further includes policy publisher module 210, which, in various embodiments, is operable to publish the code 150 to a requesting system. For example, once the code 150 has been successfully validated by the policy validation module 208, the policy publisher module 210 is operable to read the code 150 from the data replication policy store 108 and send (that is, “publish”) the code 150 to a requesting system for use in a data replication operation, as described in more detail below with reference to
Referring now to
In the depicted embodiment, source system 120 includes relay module 308. In various embodiments, relay module 308 is operable to execute the code 150 to perform the transformations specified in the data replication policy (that is, relay module 308 is operable to execute the code 150 to implement the data replication policy) on the data from source database 122 that is to be replicated. For example, in various embodiments, once a data replication operation is initiated, relay module 308 may retrieve code 150 from the server system 102. In some embodiments, relay module 308 may send a request to the policy designer module 104 identifying a particular data replication policy and, in response, policy publisher module 210 may retrieve and send the code 150 that is executable to implement the particular data replication policy to the relay module 308. (Note, however, that in some embodiments, the code 150 may be stored on or otherwise available to the source system 120 prior to initiation of the data replication operation such that the code 150 does not need to be requested from the server system 102.) Once it has access to the code 150, relay module 308 may execute the code 150 on the specified database objects from source database 122 to generate the transformed versions of these database objects. Relay module 308 may transfer the transformed database objects from the source system 120 to the target system 302. In the depicted embodiment, relay module 308 sends the transformed data via a databus service (e.g., using the Salesforce™ platform) using the data replication module 106. As indicated in
In the depicted embodiment, target system 302 includes target database 304 (the database to which the data is replicated) and consumer module 306. In various embodiments, consumer module 306 is operable to receive the transformed version of the data from the relay module 308 and to store the data in the target database 304. Note that, in some embodiments, transformations may be performed on the data to be replicated by relay module 308, by consumer module 306, or by both. For example, in some embodiments, once the consumer module 306 receives the transformed version of the data from the relay module 308, consumer module 306 may also perform one or more transformations on the data before it is stored in the target database 304. For example, as noted above, source system 120 and target system 302 may be located in different countries and, as such, data stored at the respective systems may be subject to different data-privacy regulations. To ensure compliance with applicable data-privacy regulations, target system 302 may separately apply a data replication policy to the data it receives prior to storing it in target database 304, in at least some embodiments.
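As one non-limiting sketch, the division of work between a relay at the source and a consumer at the target may be modeled as two applications of an ordered list of transformation steps. The function names `apply_policy`, `relay`, and `consume`, and the use of an in-memory list as the target database, are illustrative assumptions. Transport between the two systems (e.g., a databus service) is omitted.

```python
def apply_policy(rows, steps):
    """Apply each step, in order, to a copy of every row and return the results."""
    transformed = []
    for row in rows:
        current = dict(row)
        for step in steps:
            current = step(current)
        transformed.append(current)
    return transformed


def relay(source_rows, source_policy):
    """Source side: transform the rows before they leave the source system."""
    return apply_policy(source_rows, source_policy)


def consume(received_rows, target_policy, target_db):
    """Target side: optionally transform again, then store in the target database."""
    target_db.extend(apply_policy(received_rows, target_policy))


# Hypothetical policies: the source omits the SSN, the target tags a region.
source_policy = [lambda row: {k: v for k, v in row.items() if k != "ssn"}]
target_policy = [lambda row: {**row, "region": "EU"}]

target_db = []
sent = relay([{"name": "Ada", "ssn": "123-45-6789"}], source_policy)
consume(sent, target_policy, target_db)
```

In this sketch the SSN never leaves the source side, and the target side applies its own policy before storage, mirroring the embodiments in which both modules perform transformations.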
In
Note that, although the policy designer module 104 is shown executing on a single server system 102 that is remote from source system 120 and target system 302, this embodiment is provided merely as one non-limiting example. In other embodiments, policy designer module 104 may be implemented as a microservice that may be executed by various computing systems within a public or private cloud, which may include the source system 120, the target system 302, or server system 102. For example, in some embodiments, each of source system 120, target system 302, and server system 102 may be implemented in one or more datacenter environments. For example, in some embodiments, each of server system 102, source system 120, and target system 302 may be implemented at separate datacenters. In other embodiments, however, one or more of server system 102, source system 120, and target system 302 may be implemented at the same datacenter.
Note that the transformations depicted in
Referring now to
At 602, in the illustrated embodiment, a computer system provides, to a client device, code that is usable to generate a GUI that enables a user of the client device to define a data replication policy, via the GUI, to be implemented during a data replication operation. For example, policy designer module 104 may generate code (e.g., specified using HTML, JavaScript, or any other suitable programming or markup language) that is usable by client device 110 (e.g., by a browser application executing on the client device 110) to provide the policy designer GUI 112 to a user of client device 110. As described above, in various embodiments, policy designer GUI 112 may be used to define a data replication policy based on the user's actions, rather than requiring the user to manually write code for the data replication policy.
At 604, in the illustrated embodiment, the computer system receives, from the client device, user input provided via the GUI that defines a first data replication policy. In various embodiments, the user input indicates a series of transformations to be performed, as part of the first data replication policy, on a first database object from a source database to replicate a transformed version of the first database object to a target database. For example, the user may provide input via the policy designer GUI 112 to define a data replication policy, including one or more transformations to be made to one or more database objects, to replicate data from the source database 122 to a target database 304. In various embodiments, receiving the user input includes detecting user actions performed via the GUI at the client device. Further note that, in various embodiments, the computer system stores information indicative of the user input, such as metadata associated with the first database object and an order in which the series of transformations are to be performed on the first database object.
At 606, in the illustrated embodiment, the computer system generates, based on the user input, code that is executable to implement the first data replication policy, where the code is usable to perform the series of transformations on the first database object to generate the transformed version of the first database object. For example, as described above with reference to
In some embodiments, method 600 further includes receiving, by the computer system, second user input associated with a second database object in the source database, where the second user input corresponds to a second series of transformations to be performed, as part of the first data replication policy, to replicate a transformed version of the second database object. In some such embodiments, generating the code 150 executable to implement the first data replication policy includes generating one or more code statements that are executable to perform the second series of transformations, on the second database object, in an order specified by the second user input. Note that, in some embodiments, the second database object may be of a different object-type than the first database object, and that the second series of transformations may be different from the series of transformations to be performed on the first database object.
Note that, in some embodiments, subsequent to generating the code executable to implement the first data replication policy, the computer system validates the code to identify any syntactical or semantical errors. In some such embodiments, in response to a determination that the code does not include any syntactical or semantical errors (or the code does not include significant errors such that the code is deemed suitable for use), the computer system sends the code to a source datacenter (e.g., source system 120) at which the source database (e.g., source database 122) is maintained. Further note that, in some embodiments, the source database is a multi-tenant database that is configured to store data for a plurality of tenants, where the data replication policy is usable to replicate database objects, from the source database, that are associated with a first one of the plurality of tenants.
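As one non-limiting sketch, validating a policy before it is sent to the source datacenter may resemble the following. For simplicity, the checks operate on a structured representation of the policy (a list of dictionaries) rather than on generated text, and the set of known actions, the field names, and the `send` callback are hypothetical. Missing fields stand in for syntactical errors, and unknown actions, unknown objects, and duplicate orders stand in for semantic errors.

```python
KNOWN_ACTIONS = {"rename", "hash", "drop"}   # hypothetical set of supported actions
REQUIRED_FIELDS = {"order", "object", "action"}


def validate_policy(steps, schema_columns):
    """Return a list of error messages; an empty list means the policy passed."""
    errors = []
    seen_orders = set()
    for index, step in enumerate(steps):
        missing = REQUIRED_FIELDS - step.keys()
        if missing:
            errors.append(f"step {index}: missing fields {sorted(missing)}")
            continue
        if step["action"] not in KNOWN_ACTIONS:
            errors.append(f"step {index}: unknown action {step['action']!r}")
        if step["object"] not in schema_columns:
            errors.append(f"step {index}: unknown object {step['object']!r}")
        if step["order"] in seen_orders:
            errors.append(f"step {index}: duplicate order {step['order']}")
        seen_orders.add(step["order"])
    return errors


def publish_if_valid(steps, schema_columns, send):
    """Call `send` with the policy only when validation reports no errors."""
    errors = validate_policy(steps, schema_columns)
    if not errors:
        send(steps)
    return errors
```

Returning the full list of errors, rather than stopping at the first one, allows all problems in a policy to be reported to the user at once.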
In some embodiments, method 600 further includes the computer system receiving the transformed version of the first database object (e.g., from the source system 120) and sending the transformed version of the first database object to a target datacenter at which the target database is maintained. As noted above, in some embodiments, the source datacenter (at which the source database is maintained) and the target datacenter (at which the target database is maintained) are located in different countries and, potentially, on different continents. In some embodiments, method 600 further includes the computer system receiving second user input provided via the GUI that defines a second data replication policy, where the second user input indicates a second series of transformations to be performed, on one or more database objects, prior to storage in the target database at the target datacenter. For example, as described above with reference to
Referring now to
At 702, in the illustrated embodiment, a server system generates code indicative of a GUI that is operable to: provide information corresponding to a plurality of database objects from a source database to a user, and receive transformation instructions from the user specifying one or more transformations to be performed on a given one of the plurality of database objects during a data replication operation. For example, in various embodiments, policy designer module 104 is operable to generate code for the policy designer GUI 112. At 704, in the illustrated embodiment, the server system sends the code indicative of the GUI to a client device, such as client device 110.
At 706, in the illustrated embodiment, the server system receives, from the client device, a set of user-specified transformation instructions provided via the GUI, where the set of user-specified transformation instructions indicate a first plurality of transformations to be performed on a first database object of the plurality of database objects. In some embodiments, the first plurality of transformations includes at least one of modifying a name associated with the first database object, and modifying a value associated with the first database object (as two non-limiting examples). In some embodiments, modifying the value associated with the first database object includes encrypting the value using a first encryption algorithm, such as the SHA1 algorithm.
At 708, in the illustrated embodiment, the server system generates, based on the set of user-specified instructions, code executable to implement a data replication policy, where the code includes code statements executable to perform the first plurality of transformations on the first database object. In some embodiments, method 700 further includes the server system sending the code executable to implement the data replication policy to a computer system at the first datacenter, receiving, from the computer system at the first datacenter, a transformed version of the first database object, and sending the transformed version of the first database object to a second computer system at a second datacenter for storage in a second database, as described in more detail above with reference to
Referring now to
Processor subsystem 820 may include one or more processors or processing units. In various embodiments of computer system 800, multiple instances of processor subsystem 820 may be coupled to interconnect 880. In various embodiments, processor subsystem 820 (or each processor unit within 820) may contain a cache or other form of on-board memory.
System memory 840 is usable to store program instructions executable by processor subsystem 820 to cause system 800 to perform various operations described herein. System memory 840 may be implemented using different physical, non-transitory memory media, such as hard disk storage, floppy disk storage, removable disk storage, flash memory, random access memory (RAM: SRAM, EDO RAM, SDRAM, DDR SDRAM, RAMBUS RAM, etc.), read only memory (PROM, EEPROM, etc.), and so on. Memory in computer system 800 is not limited to primary storage such as system memory 840. Rather, computer system 800 may also include other forms of storage such as cache memory in processor subsystem 820 and secondary storage on I/O devices 870 (e.g., a hard drive, storage array, etc.). In some embodiments, these other forms of storage may also store program instructions executable by processor subsystem 820.
I/O interfaces 860 may be any of various types of interfaces configured to couple to and communicate with other devices, according to various embodiments. In one embodiment, I/O interface 860 is a bridge chip (e.g., Southbridge) from a front-side bus to one or more back-side buses. I/O interfaces 860 may be coupled to one or more I/O devices 870 via one or more corresponding buses or other interfaces. Examples of I/O devices 870 include storage devices (hard drive, optical drive, removable flash drive, storage array, SAN, or their associated controller), network interface devices (e.g., to a local or wide-area network), or other devices (e.g., graphics, user interface devices, etc.). In one embodiment, I/O devices 870 include a network interface device (e.g., configured to communicate over WiFi, Bluetooth, Ethernet, etc.), and computer system 800 is coupled to a network via the network interface device.
Turning now to
In various embodiments, MTCS 900A may be configured to provide computing resources to users 910 associated with various tenants 912 of MTCS 900A. For example, MTCS 900A may host software applications (e.g., using application servers 902) and store data (e.g., using multi-tenant databases 906) for various tenants 912 such that users 910 associated with the tenants 912 may access the software applications and data via network 908. Network 908 may be any suitable network implemented as a LAN (local area network), WAN (wide area network), wireless network, point-to-point network, star network, token ring network, hub network, or any other appropriate configuration or combination thereof. In various embodiments, tenant data (e.g., stored in databases 906) may be maintained such that the data of one tenant (e.g., tenant 912A) is kept separate from the data of another tenant (e.g., tenant 912C) such that the separate tenants do not have access to the other's data, unless such data is expressly shared.
In some instances, various embodiments of the present disclosure may be implemented in the context of one or more multi-tenant computer systems, such as MTCS 900A of
Although the embodiments disclosed herein are susceptible to various modifications and alternative forms, specific embodiments are shown by way of example in the figures and are described herein in detail. It should be understood, however, that figures and detailed description thereto are not intended to limit the scope of the claims to the particular forms disclosed. Instead, this application is intended to cover all modifications, equivalents and alternatives falling within the spirit and scope of the disclosure of the present application as defined by the appended claims. The headings used herein are for organizational purposes only and are not meant to be used to limit the scope of the description.
This disclosure includes references to “one embodiment,” “a particular embodiment,” “some embodiments,” “various embodiments,” “an embodiment,” etc. The appearances of these or similar phrases do not necessarily refer to the same embodiment. Particular features, structures, or characteristics may be combined in any suitable manner consistent with this disclosure.
As used herein, the term “based on” is used to describe one or more factors that affect a determination. This term does not foreclose the possibility that additional factors may affect the determination. That is, a determination may be solely based on specified factors or based on the specified factors as well as other, unspecified factors. Consider the phrase “determine A based on B.” This phrase specifies that B is a factor that is used to determine A or that affects the determination of A. This phrase does not foreclose that the determination of A may also be based on some other factor, such as C. This phrase is also intended to cover an embodiment in which A is determined based solely on B. As used herein, the phrase “based on” is synonymous with the phrase “based at least in part on.”
As used herein, the phrase “in response to” describes one or more factors that trigger an effect. This phrase does not foreclose the possibility that additional factors may affect or otherwise trigger the effect. That is, an effect may be solely in response to those factors, or may be in response to the specified factors as well as other, unspecified factors. Consider the phrase “perform A in response to B.” This phrase specifies that B is a factor that triggers the performance of A. This phrase does not foreclose that performing A may also be in response to some other factor, such as C. This phrase is also intended to cover an embodiment in which A is performed solely in response to B.
As used herein, the terms “first,” “second,” etc. are used as labels for nouns that they precede, and do not imply any type of ordering (e.g., spatial, temporal, logical, etc.), unless stated otherwise. As used herein, the term “or” is used as an inclusive or and not as an exclusive or. For example, the phrase “at least one of x, y, or z” means any one of x, y, and z, as well as any combination thereof (e.g., x and y, but not z).
It is to be understood that the present disclosure is not limited to particular devices or methods, which may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting. As used herein, the singular forms “a,” “an,” and “the” include singular and plural referents unless the context clearly dictates otherwise. Furthermore, the word “may” is used throughout this application in a permissive sense (i.e., having the potential to, being able to), not in a mandatory sense (i.e., must). The term “include,” and derivations thereof, mean “including, but not limited to.” The term “coupled” means directly or indirectly connected.
Within this disclosure, different entities (which may variously be referred to as “units,” “circuits,” other components, etc.) may be described or claimed as “configured” to perform one or more tasks or operations. This formulation—[entity] configured to [perform one or more tasks]—is used herein to refer to structure (i.e., something physical, such as an electronic circuit). More specifically, this formulation is used to indicate that this structure is arranged to perform the one or more tasks during operation. A structure can be said to be “configured to” perform some task even if the structure is not currently being operated. A “memory device configured to store data” is intended to cover, for example, an integrated circuit that has circuitry that performs this function during operation, even if the integrated circuit in question is not currently being used (e.g., a power supply is not connected to it). Thus, an entity described or recited as “configured to” perform some task refers to something physical, such as a device, circuit, memory storing program instructions executable to implement the task, etc. This phrase is not used herein to refer to something intangible.
The term “configured to” is not intended to mean “configurable to.” An unprogrammed FPGA, for example, would not be considered to be “configured to” perform some specific function, although it may be “configurable to” perform that function after programming.
Reciting in the appended claims that a structure is “configured to” perform one or more tasks is expressly intended not to invoke 35 U.S.C. § 112(f) for that claim element. Should Applicant wish to invoke Section 112(f) during prosecution, it will recite claim elements using the “means for” [performing a function] construct.
In this disclosure, various “modules” operable to perform designated functions are shown in the figures and described in detail above (e.g., policy designer module 104, data replication module 106, etc.). As used herein, the term “module” refers to circuitry configured to perform specified operations or to physical, non-transitory computer-readable media that stores information (e.g., program instructions) that instructs other circuitry (e.g., a processor) to perform specified operations. Such circuitry may be implemented in multiple ways, including as a hardware circuit or as a memory having program instructions stored therein that are executable by one or more processors to perform the operations. The hardware circuit may include, for example, custom very-large-scale integration (VLSI) circuits or gate arrays, off-the-shelf semiconductors such as logic chips, transistors, or other discrete components. A module may also be implemented in programmable hardware devices such as field programmable gate arrays, programmable array logic, programmable logic devices, or the like. A module may also be any suitable form of non-transitory computer readable media storing program instructions executable to perform specified operations.
Although specific embodiments have been described above, these embodiments are not intended to limit the scope of the present disclosure, even where only a single embodiment is described with respect to a particular feature. Examples of features provided in the disclosure are intended to be illustrative rather than restrictive unless stated otherwise. The above description is intended to cover such alternatives, modifications, and equivalents as would be apparent to a person skilled in the art having the benefit of this disclosure.
The scope of the present disclosure includes any feature or combination of features disclosed herein (either explicitly or implicitly), or any generalization thereof, whether or not it mitigates any or all of the problems addressed herein. Accordingly, new claims may be formulated during prosecution of this application (or an application claiming priority thereto) to any such combination of features. In particular, with reference to the appended claims, features from dependent claims may be combined with those of the independent claims and features from respective independent claims may be combined in any appropriate manner and not merely in the specific combinations enumerated in the appended claims.
This application claims the benefit of U.S. Provisional Application No. 62/945,616, filed on Dec. 9, 2019, which is hereby incorporated by reference as if entirely set forth herein.
Number | Date | Country
---|---|---
62945616 | Dec 2019 | US