In today's information-driven economy, an enterprise may collect large amounts of data. The data is used to compute derivative information and generate semantic meaning from events and values. For example, a computation may calculate the mean profit for all companies using a financial services provider in the first quarter of 2020. As another example, the standard deviation in the minimum temperature of freezers in a chain of grocery stores may be computed. Additionally, machine learning may use vast data sets for training.
Many computations are repeated again and again, which is wasteful. Additionally, the input data may be “live,” meaning that the values within data sets may change over time. With changing values, the result of a computation on the same data set may change each time the computation is repeated. This creates an issue when stable values are needed to reproduce or verify certain results. Multiple copies of the input data that remain fixed in time may be made, but this is also wasteful, difficult to manage, and hard to verify. A challenge is to verify and reuse calculations in a scalable fashion without wasting resources.
In general, in one or more aspects, the disclosure relates to a method that implements verifiable cacheable calculations. A result is calculated. The result is hashed to generate a name of the result. The result is an input of a set of inputs from which the name is generated. Each input of the set of inputs identifies one of a data set, a query, and a function. The result is stored in a cache using the name generated from hashing the result. A request is received to access the result using the name. The result is retrieved from the cache using the name generated from hashing the result corresponding to the input. The result is presented in response to the request.
In general, in one or more aspects, the disclosure relates to a system that includes a server with one or more processors and one or more memories. An application, executing on the one or more processors of the server, is configured to implement verifiable cacheable calculations. A result is calculated by a result generator of the application. The result is hashed, by the result generator, to generate a name of the result. The result is an input of a set of inputs from which the name is generated. Each input of the set of inputs identifies one of a data set, a query, and a function. The result is stored in a cache of the server using the name generated from hashing the result. A request is received, by the application, to access the result using the name. The result is retrieved from the cache using the name generated from hashing the result corresponding to the input. The result is presented in response to the request.
In general, in one or more aspects, the disclosure relates to a method using verifiable cacheable calculations. A request is transmitted to access a result using a name. The result is calculated in response to a previous request. The result is hashed to generate the name. The result is an input of a set of inputs from which the name is generated. Each input of the set of inputs identifies one of a data set, a query, and a function. The result is stored in a cache using the name generated from hashing the result. In response to the request, the result is retrieved from the cache using the name generated from hashing the result. The result is received in response to the request.
Other aspects of the invention will be apparent from the following description and the appended claims.
Specific embodiments of the invention will now be described in detail with reference to the accompanying figures. Like elements in the various figures are denoted by like reference numerals for consistency.
In the following detailed description of embodiments of the invention, numerous specific details are set forth in order to provide a more thorough understanding of the invention. However, it will be apparent to one of ordinary skill in the art that the invention may be practiced without these specific details. In other instances, well-known features have not been described in detail to avoid unnecessarily complicating the description.
Throughout the application, ordinal numbers (e.g., first, second, third, etc.) may be used as an adjective for an element (i.e., any noun in the application). The use of ordinal numbers is not to imply or create any particular ordering of the elements nor to limit any element to being only a single element unless expressly disclosed, such as by the use of the terms “before”, “after”, “single”, and other such terminology. Rather, the use of ordinal numbers is to distinguish between the elements. By way of an example, a first element is distinct from a second element, and the first element may encompass more than one element and succeed (or precede) the second element in an ordering of elements.
In general, embodiments of the disclosure verify and reuse calculations in a scalable fashion without wasting resources. In the context of web services, a web application may use the same calculations to generate different web pages shown to different clients. Instead of redoing the calculations each time for each page for each client, embodiments of the invention perform the calculations a first time, generate a name from a hash of the calculation, and store the calculations in a cache using the name. For subsequent pages that use the same calculations, the embodiments may retrieve the calculations from the cache using the name without having to redo the calculations and reduce the computational resources needed to generate and transmit the page.
Additionally, for machine learning applications, the same initial calculations may be performed on the same training data as part of training a machine learning model. Embodiments of the disclosure may retrieve cached versions of the initial calculations instead of recalculating the initial calculations every time a machine learning model is trained and used, which reduces the computational resources needed to train and use machine learning models and applications.
To reduce the waste of repetitive calculations, an expression, e.g., “M(Q(db))”, can be computed once and the result (“R”), which is addressable by a unique name, can be cached for reuse. A data set is a set of data that may be stored in a database or saved in a cache and may be the result of a query or of a function.
“db” represents a data set within a database, which may include business accounting data for multiple companies with transactions over the lifespan of each company.
“Q(·)” represents a database query that, when applied to db, produces a query result (a data set), which, for example, may contain the profit of each company.
“M(·)” represents a function that returns the result “R” (a data set), which, for example, may be a calculation of a mean profit from the individual or periodic profits of a set of companies.
The database, db, is the input into query Q(·) (represented by the expression “Q(db)”), which is then input into the mean function, M(·) (represented by the expression “M(Q(db))”), giving the result, R, as shown in Equation 1 below.
R=M(Q(db)) Eq. 1
Embodiments of the disclosure assign a unique name to R and subsequent computations may refer to R using the name, rather than recomputing the value of R again. Furthermore, the name of the result R may represent the actual data set output by calculating M(Q(db)), which may include one or multiple values in any number of dimensions. If db, M, or Q change, the computed result R will also change and a different corresponding name will be assigned to the output. In Equations 2-4 below, the data sets db1 and db2 are different yielding results R1 and R2 that are also different and which would have different unique names.
db1≠db2 Eq. 2
M(Q(db1))→R1≢M(Q(db2))→R2 Eq. 3
R1≠R2 Eq. 4
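As a minimal sketch of Equations 1-4, the following Python example computes R for two different data sets and derives a name for each result. The helper `name_of`, the JSON-based encoding, the choice of SHA-256, and the sample records are assumptions made for illustration and are not prescribed by the disclosure.

```python
import hashlib
import json
from statistics import mean


def name_of(data) -> str:
    """Hypothetical helper: SHA-256 over a canonical JSON encoding of the data."""
    encoded = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def Q(db):
    """Stand-in for the query Q(.): extract the profit of each company."""
    return [row["profit"] for row in db]


def M(values):
    """Stand-in for the function M(.): mean profit."""
    return mean(values)


# Illustrative data sets; db1 and db2 differ in one value (Eq. 2).
db1 = [{"company": "a", "profit": 10.0}, {"company": "b", "profit": 30.0}]
db2 = [{"company": "a", "profit": 10.0}, {"company": "b", "profit": 50.0}]

r1 = M(Q(db1))  # Eq. 1 applied to db1
r2 = M(Q(db2))  # Eq. 1 applied to db2

# Different results receive different names (Eq. 3 and Eq. 4).
n1, n2 = name_of(r1), name_of(r2)
```

Recomputing `M(Q(db1))` and naming the result again would yield `n1` once more, which is what allows later computations to refer to the result by name.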
A name refers to either an immutable data set or an expression. Examples of immutable data sets include a data set in a database, the data set of a query result, and the data set of the function result. Examples of expressions include queries (e.g., the query represented by the expression “Q(db)”), functions, and sequences of queries and functions (e.g., the expression “M(Q(db))”).
In the case where a name refers to an immutable data set (e.g., the data set represented by the expression “db”), the data is fetched and may be returned. In the case where a name refers to an expression (e.g., “M(Q(db))”), if a result for the name does not exist, then the expression is evaluated and the result cached with its corresponding name.
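The two cases above may be sketched as follows. The dictionaries `cache` and `expressions`, the function `resolve`, and the placeholder names are hypothetical; an actual embodiment may organize name resolution differently.

```python
# name -> immutable data set (a tuple is used here so the data cannot change)
cache = {"name-of-db": (3, 5, 10)}

calls = []  # records how many times the expression is actually evaluated


def mean_of_db():
    """Stand-in for the expression M(Q(db))."""
    calls.append(1)
    data = resolve("name-of-db")
    return sum(data) / len(data)


# name -> expression to evaluate when no cached result exists
expressions = {"name-of-M(Q(db))": mean_of_db}


def resolve(name):
    """Return the data for a name, evaluating and caching an expression if needed."""
    if name in cache:
        # The name refers to a data set (or an already cached result): fetch it.
        return cache[name]
    if name in expressions:
        # The name refers to an expression with no cached result: evaluate once.
        result = expressions[name]()
        cache[name] = result
        return result
    raise KeyError(name)


first = resolve("name-of-M(Q(db))")   # evaluates the expression
second = resolve("name-of-M(Q(db))")  # served from the cache
```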
A name is a unique label for the data and the operations used to perform calculations. A hash may be used to derive a unique name in addition to other techniques. For example, additional techniques include a digital signature, an administratively assigned name, and so forth.
In one embodiment, a name of a data set is a hash computed from different inputs. For example, a 256-bit hash of an entire raw data set is an adequate name to uniquely distinguish the raw data set. For brevity in this disclosure, a 256-bit hash is represented by an abbreviation, e.g., a 256-bit hash value “56674b93766d3262080cf7fc62c7459987f43eb41640b0bf5ce14b0f93069aa1” may be represented by the abbreviation “5667 . . . 9aa1”. In one embodiment, the name may be appended to a domain name to generate a uniform resource identifier (URI) from which the data set associated with the name may be retrieved.
The hash function used to generate the hash value may be a cryptographic hash function implementing an algorithm to map data of arbitrary size (also known as a “message”) to a bit array of a fixed size (also known as a “hash value”, a “hash”, or a “message digest”). The hash function is a one-way function that is practically infeasible to invert, i.e., to generate the input from the output. Examples of hash functions that may be used include MD5 (message digest 5), SHA (secure hash algorithm), RIPEMD (RACE integrity primitives evaluation message digest), Whirlpool, BLAKE, etc.
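As an illustration of a 256-bit name, its abbreviation, and a URI built from the name, consider the sketch below. SHA-256 from the Python standard library is used as one possible hash function; the helper names and the domain `example.com` are assumptions.

```python
import hashlib


def raw_name(data: bytes) -> str:
    """Map data of arbitrary size to a fixed-size 256-bit digest (64 hex digits)."""
    return hashlib.sha256(data).hexdigest()


def abbreviate(name: str) -> str:
    """Abbreviate a name to its first and last four characters."""
    return f"{name[:4]} . . . {name[-4:]}"


def to_uri(name: str, domain: str = "example.com") -> str:
    """Append the name to a (hypothetical) domain name to form a URI."""
    return f"https://{domain}/{name}"


short_name = raw_name(b"small data set")
long_name = raw_name(b"x" * 1_000_000)  # a much larger message, same digest size
```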
The computation of the hash value for a name is represented by Equation 5 below, where “A, B, C, . . . ” denotes the inputs to the name derivation function “F” and N is the resulting name. As an example, the input “A” may identify the data set represented by “db”, the input “B” may identify the query “Q(·)”, the input “C” may identify the query result of “Q(db)”, etc. The inputs may also include metadata about the data sets, queries, and functions. For example, an input may include a timestamp that indicates the date and time a data set was generated.
F:A,B,C, . . . =N Eq. 5
In one embodiment, the name derivation function “F” generates a single 256-bit hash value by hashing all of the inputs. For example, with the inputs “db, Q(db)”, the name derivation function “F” may hash the data set of “db” concatenated with the query result of “Q(db)” to generate the name “42E . . . 45F”.
In another embodiment, the name derivation function “F” appends hash values for each of the inputs to generate the name. For example, with the inputs “db, Q(db)”, the name derivation function “F” appends the hash value for “db” (“1E0 . . . 8EC”) with the hash value for “Q(db)” (“55D . . . 17F”) with a separator (e.g., “/”) to generate the name “1E0 . . . 8EC/55D . . . 17F”.
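The two variants of the name derivation function “F” described above may be sketched as follows. The function names are hypothetical, SHA-256 is one possible choice of hash, and the length prefix used in the single-hash variant is an added assumption that keeps differently split inputs from producing the same name.

```python
import hashlib


def h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def name_single_hash(*inputs: bytes) -> str:
    """Variant 1: one hash value computed over all of the inputs."""
    digest = hashlib.sha256()
    for item in inputs:
        # Assumption: prefix each input with its length so that
        # (b"ab", b"c") and (b"a", b"bc") do not collide.
        digest.update(len(item).to_bytes(8, "big"))
        digest.update(item)
    return digest.hexdigest()


def name_joined_hashes(*inputs: bytes, separator: str = "/") -> str:
    """Variant 2: the hash of each input, joined with a separator."""
    return separator.join(h(item) for item in inputs)


db = b"raw bytes of db"              # illustrative stand-in for the data set
q_result = b"raw bytes of Q(db)"     # illustrative stand-in for the query result

single = name_single_hash(db, q_result)
joined = name_joined_hashes(db, q_result)
```

The joined form is longer but keeps the name of each input recoverable from the combined name.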
Expressions identify data sets (e.g., “db”) and sequences of calculations (e.g., “M(Q(db))”) to perform on data sets. As expressions are evaluated, partial and final results are generated that are cached for future use. For example, to evaluate the expression “M(Q(db))”:
the data set “db” is retrieved from network attached storage and stored to the cache with the unique name “1E0 . . . 8EC”;
the query “Q(db)” or “Q(‘1E0 . . . 8EC’)” is calculated to form a partial result stored in the cache with the unique name “55D . . . 17F”; and
the function “M(Q(db))” or “M(‘55D . . . 17F’)” is calculated to form a final result stored in the cache with the unique name “F26 . . . 3A1”.
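The three steps above can be sketched as follows, with each step reading its input from the cache by name and storing its output under a new name. The in-memory dictionary, the `store` helper, the JSON encoding, and the sample data are assumptions for illustration.

```python
import hashlib
import json

cache = {}  # unique name -> data set (including partial and final results)


def store(data) -> str:
    """Cache the data under a name derived from its content and return the name."""
    name = hashlib.sha256(json.dumps(data, sort_keys=True).encode("utf-8")).hexdigest()
    cache[name] = data
    return name


def load_db():
    """Stand-in for retrieving the data set from network attached storage."""
    return [{"company": "a", "profit": 4.0}, {"company": "b", "profit": 8.0}]


def Q(db_name: str):
    """Query applied to the cached data set referenced by name."""
    return [row["profit"] for row in cache[db_name]]


def M(q_name: str):
    """Mean applied to the cached partial result referenced by name."""
    values = cache[q_name]
    return sum(values) / len(values)


db_name = store(load_db())   # step 1: the data set "db"
q_name = store(Q(db_name))   # step 2: the partial result "Q(db)"
m_name = store(M(q_name))    # step 3: the final result "M(Q(db))"
```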
In one embodiment, data sets (including results) are named with a “raw name” computed from directly hashing the content of the data set without taking into account the expressions used to compute the data set. With a raw name, the evaluation of the two expressions x2 and
In one embodiment, chained names are used that provide for data provenance. Data provenance in a name shows the sequence of calculations. For example, for the expression M(Q(db)), the inputs to the name derivation function may include:
The output from the name derivation function from the above inputs may be the name
In one embodiment, signed chained names are used in which the names of the data sets, queries, functions, and results may be digitally signed. The digital signature may be performed with a cryptographic algorithm, examples of which include RSA (Rivest-Shamir-Adleman), DSA (digital signature algorithm), ECDSA (elliptic curve digital signature algorithm), etc. As an example, with the expression M(Q(db)), the data forming the query result Q(db) and the function result M(Q(db)) may each be signed with a private key. Signing the data allows subsequent proof that the signor of the data set generated the data set. The data set may be signed and then hashed to form the name or vice versa, i.e., the data is hashed, which is then signed. As an example, for the expression M(Q(db)), the inputs to the name derivation function may include:
The output from the name derivation function from the above inputs may be the name
In one embodiment, computable chained names are used in which partial or final results are not included in the name. Computable chained names enable computing the expected name of a final result by starting with the name of data set (e.g., “db”) and computing the desired name. For example, for the expression M(Q(db)), the expressions Q(db) and M(Q(db)) may not be included in the set of inputs and are not used by the name derivation function. The inputs may include:
The output from the name derivation function from the above inputs may be the name
In one embodiment, collapsing subexpressions are used to generate a name. With collapsing subexpressions, only certain data sets are cached. For example, for the expression M(Q(db)), the inputs to the name derivation function may include:
The output from the name derivation function from the above inputs may be the name
The subexpressions are evaluated but the partial results are not saved. When evaluating M(Q(db)), a name of db and of M(Q(·)) are computed. The name of the final result includes the full name of db and the hash of the M(Q(·)) function. This generalizes to any collection of operations where the hash of Mk( . . . (M2(M1(Q(·)))) . . . ) is computed only once for all data sets. Formally, let M={M1, . . . , Mk} be a collection of functions with a function-operator precedence on M. For every valid permutation σ=ƒ1(ƒ2( . . . (·) . . . )), ƒi∈M of operations in the ordering, compute the hash of σ once for all queries Q(db).
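A minimal sketch of collapsing subexpressions follows. It assumes that each operation can be identified by a text descriptor (the strings `Q_TEXT` and `M_TEXT` are hypothetical), that SHA-256 is the hash, and that memoization with `functools.lru_cache` stands in for computing the hash of the composed operations only once.

```python
import hashlib
from functools import lru_cache


def h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Hypothetical descriptors that identify each operation.
Q_TEXT = "SELECT profit FROM accounts"
M_TEXT = "mean"


@lru_cache(maxsize=None)
def pipeline_hash(*operation_texts: str) -> str:
    """Hash of the composed operations, e.g. M(Q(.)), computed once and reused."""
    return h("\x00".join(operation_texts).encode("utf-8"))


def collapsed_name(db_bytes: bytes, *operation_texts: str) -> str:
    """Full name of the data set followed by the hash of the composed operations."""
    return h(db_bytes) + "/" + pipeline_hash(*operation_texts)


name1 = collapsed_name(b"contents of db1", Q_TEXT, M_TEXT)
name2 = collapsed_name(b"contents of db2", Q_TEXT, M_TEXT)
```

Both names share the same second component, and no name is generated for the intermediate result Q(db).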
In one embodiment, graphs may be used to generate names. Let G=(V, E, L) be a rooted directed acyclic graph where V, E, L are the sets of vertices, edges, and vertex labels respectively. For every v∈V:L(v) is the computed name of the operations leading to v. A directed edge (a, b) is added whenever the child node b is a result of an operation applied to its parent a.
Formally, the label of the root r of G is the name of the data set (db), i.e., the hash of the raw data set: L(r)=hash(db).
For every new query Q on db, a new vertex v is added to V with a corresponding edge (r, v) and label L(v)=L(r)+hash(Q).
For every new operation M on Q(db), a new vertex u is introduced with edge (v, u) and label L(u)=L(v)+hash(M).
For any chained operations Mk( . . . (M2(M1(Q(db)))) . . . ), there is a directed path from the root of G to a node vk of the form P=r, u, v1, v2, . . . , vk where:
If a new operation is performed, its corresponding node and label are added to G. If an existing chain of operations already exists, it suffices to traverse the graph and return the label of the final node in the path of computations.
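The graph-based labeling may be sketched as follows, with the label of each child formed from the label of its parent and the hash of the operation on the connecting edge. The class `NameGraph`, the use of text descriptors for operations, and the interpretation of “+” as joining with a “/” separator are assumptions for illustration.

```python
import hashlib


def h(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class NameGraph:
    """Rooted graph whose vertex labels are names derived from paths."""

    def __init__(self, db_text: str):
        self.root = h(db_text)   # L(r) = hash(db)
        self.children = {}       # (parent label, operation hash) -> child label

    def apply(self, parent_label: str, operation: str) -> str:
        """Return the label of the child reached by an operation, adding it if new."""
        key = (parent_label, h(operation))
        if key not in self.children:
            # New operation: add the vertex with label L(child) = L(parent) + hash(op).
            self.children[key] = parent_label + "/" + key[1]
        return self.children[key]

    def label_for(self, *operations: str) -> str:
        """Traverse (and extend as needed) the path for a chain of operations."""
        label = self.root
        for operation in operations:
            label = self.apply(label, operation)
        return label


graph = NameGraph("raw data set")
label_m = graph.label_for("Q", "M")        # adds two vertices
label_again = graph.label_for("Q", "M")    # traverses existing vertices only
```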
Turning to
The server application (123) is a set of programs executing on the server (121) to interact with the client (111) and the repository (191). The server application (123) processes the name (117) from the request (115) with the name processor (125) and generates names used by the system with the name generator (135).
The name processor (125) is a set of programs of the server application (123) that processes the names from requests, including the name (117) from the request (115). The name processor (125) includes the result locator (127) and the result generator (129).
The result locator (127) is a program of the name processor (125) that locates data in the cache (171). The result locator (127) may receive the name (117) as an input and output a data set corresponding to the name from the cache or a code indicating that the cache does not include a data set that corresponds to the name (117). In one embodiment, the result locator (127) uses a mapping between the names (146) and the cached data sets (173) to determine if the cache (171) includes a data set corresponding to the name (117) and responsive to the request (115). In one embodiment, the result locator (127) may determine if the cache (171) includes a corresponding cached data set for each data set, query, or function that corresponds to the name (117). For example, the name (117) may identify the expression “M(Q(db))” and the result locator (127) may determine which, if any, of the cached data sets (173) correspond to the data set for the expression “db”, the data set for the partial result for the expression “Q(db)”, and the data set for the final result for the expression “M(Q(db))”.
The result generator (129) is a program of the name processor (125) that generates results by evaluating expressions from names, such as the name (117). For example, if the result locator (127) indicates that the cache (171) does not include any of the data sets, partial results, or final results for the name of the expression “M(Q(db))”, the result generator (129) may retrieve data sets, generate partial and final results, and store data sets (including results) to the cache (171). For the expression “db”, the result generator (129) may retrieve the data set (198) from the repository (191) and store the data set (198) as the cached data set (176) in the cache (171). For the expression “Q(db)”, the result generator (129) may apply the query (139) (corresponding to the expression “Q(·)”) to the data set (198) (or equivalently to the cached data set (176)) to generate a query result, and save the query result (a partial result) to the cache (171) as the cached data set (175). For the expression “M(Q(db))”, the result generator (129) may apply the function (143) (corresponding to the expression “M(·)”) to the cached data set (175) (the partial result generated from the expression “Q(db)”) to generate a result, and store the result to the cache (171) as the cached data set (174).
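The interplay between a result locator and a result generator may be sketched as follows. The sketch locates the most derived data set already in the cache and generates only the missing data sets. The expression strings used as names, the `MISS` sentinel, and the sample data are assumptions and do not reflect the actual names (146).

```python
MISS = object()  # code indicating that the cache has no data set for a name
cache = {}
calls = []       # records which steps were actually executed


def locate(name):
    """Result locator: return the cached data set for a name, or MISS."""
    return cache.get(name, MISS)


def load_db(_):
    calls.append("db")
    return [2.0, 4.0, -6.0]


def query(db):
    calls.append("Q")
    return [v for v in db if v > 0]


def mean(values):
    calls.append("M")
    return sum(values) / len(values)


# Ordered chain: each step consumes the data set produced by the previous one.
chain = [("db", load_db), ("Q(db)", query), ("M(Q(db))", mean)]


def evaluate(chain):
    """Result generator: reuse the deepest cached data set and compute the rest."""
    start, value = 0, None
    for i in range(len(chain) - 1, -1, -1):
        found = locate(chain[i][0])
        if found is not MISS:
            start, value = i + 1, found
            break
    for name, step in chain[start:]:
        value = step(value)
        cache[name] = value
    return value


first = evaluate(chain)    # nothing cached: all three steps run
del cache["M(Q(db))"]      # simulate eviction of the final result only
second = evaluate(chain)   # only M runs, using the cached partial result
```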
Continuing with
The first input “db” (which may correspond to the data set (198)) has the name “5d5 . . . 68a”, is version 8, and was last updated on March 28. The name may be used to locate “db” in the cache (171) and may be a hash value generated by applying a hash function to the data from the data set (198).
The second input “Q” (which may correspond to the query (139)) has the name “9f8 . . . 3d5” and is version 1. The name may be used to identify the query (139) and may be a hash value generated by applying a hash function to the query (139) (e.g., to the query string conforming to a query language, e.g., structured query language (SQL)).
The third input “Q(db)” indicates that the previous two inputs are to be combined to generate the result. For example, the result generator may generate a result for “Q(db)” that is saved to the cache (171) as the cached data set (175). The third input does not include a name. A name, generated by the name generator (135) from the inputs (132), may be included in the response (181) with the result (188) so that future requests may utilize the name generated with the name generator (135).
The queries (138), including the query (139), are data query language requests for information retrieval with database and information systems. Different query languages may be used in the queries (138) by the system (100) to access the data sets (197) of the repository (191). As an example, the query (139) may identify the data set (198), which may be a filtered version of a larger data set.
The functions (142), including the function (143), are programs that process data accessible to the system (100), e.g., the data sets (197). Each function may perform one or multiple operations on a data set. For example, one function may calculate the mean of a data set and another function may calculate the squares of the values within the data set. As another example, a function may perform an algorithm of a machine learning model, e.g., a neural network model. The function may perform the forward pass or backward pass of the neural network model on a data set from the repository (191) to generate a result that may be stored back to the repository (191) and to the cache (171).
Still referring to
The names (146) may be mapped to memory addresses of the cache (171) that correspond to the cached data sets (173). As an example, the name (117) may be mapped to the cached data set (174), which may be generated by the result generator (129) to form the result (188).
The cache (171) stores the cached data sets (173). In one embodiment, the cache (171) is a data store of the server (121) implemented with non-persistent storage, e.g., random access memory (RAM).
The cached data sets (173), including the cached data set (174), are cached versions (i.e., copies) of data sets and results retrieved and generated by the server (121). Each of the cached data sets may be identified by at least one of the names (146). As an example, the cached data set (174) may be identified by the name (117).
The graph (151) is a data structure maintained by the server (121). The graph (151) may track the cached data sets (173) in the cache (171) and enumerate the data sets, queries, functions, and sequences thereof that may be processed by the system (100). The graph (151) may be traversed by the name generator (135) to construct the names (146) from paths formed by the nodes (153) and the edges (157) (see the path (160) of
The nodes (153), including the nodes (154), (155), and (156), represent data sets used by the system. A node may identify data sets in the repository (191) and in the cache (171). For example, the node (156) may represent the data set (198) (in the repository (191)) corresponding to the expression “db” and to the cached data set (176) (in the cache (171)). The node (155) may represent the cached data set (175), which may correspond to the result obtained from evaluating the expression “Q(db)”. The node (154) may represent the cached data set (174), which may correspond to the result obtained from evaluating the expression “M(Q(db))”.
The edges (157), including the edges (158) and (159), connect the nodes (153) of the graph (151). The edges (157) identify the queries or functions used to generate a child node from a parent node. For example, the edge (159) may identify the query (139) as the query used to generate the child node (155) (representing “Q(db)”) from the parent node (156) (representing “db”). The edge (158) may identify the function (143) as the function used to generate the child node (154) (representing “M(Q(db))”) from the parent node (155) (representing “Q(db)”).
The labels (161), including the label (162), may correspond to the names generated from the paths of the graph (151) that identify the nodes of the graph (151). For example, the label (162) may identify the path from the node (156) to the node (154) through the node (155) using the edges (159) and (158). In one embodiment, the label (162) may correspond to the node (154) and the name (117).
The request (115) is a message, generated by the client (111), requesting data and is serviced by the server (121). The request includes the name (117) that identifies the result (188) (in the response (181)). The request (115) may be generated by the client application (112) in response to user interaction with the client (111).
The response (181) is a message, generated by the server (121), responsive to the request (115) and received by the client (111). The response includes the result (188) that corresponds to the data set identified in the name (117).
The client application (112), operating on the client (111), interacts with the server (121) to request and present information of the system (100) stored in the repository (191) and in the cache (171). In one embodiment, the client application (112) may be a web browser that accesses the applications running on the server (121) using web pages hosted by the server (121). In one embodiment, the client application (112) may be a web service that communicates with the applications running on the server (121) using representational state transfer application programming interfaces (RESTful APIs). Although
The client application (112) may include multiple interfaces (e.g., graphical user interfaces) for interacting with the system (100). A user may operate the client application (112) to perform tasks that retrieve and write information from and to the data sets (197) in the repository (191).
The server (121) uses the server application (123), the cache (171), the queries (138), the functions (142), the names (146), and the graph (151) to process requests from clients (including the request (115)) and generate responses (including the response (181)). The server (121) executes the server application (123) that communicates with the client (111) and the repository (191). The server (121) receives and responds to requests from the client (111) and stores data to the repository (191) in response to interactions with the client (111). Each of the programs running on the server (121) may execute inside one or more containers hosted by the server (121). The server (121) may be one of multiple servers hosted by a cloud environment to service requests from multiple clients, including the client (111). The server (121) may be one of a set of virtual machines hosted by a cloud services provider to deploy the server application (123). The server (121) may be embodied as a computing system as described in
The client (111) may be a device (or process executing on a device) that interacts with the server (121) by sending and receiving messages, including the request (115) and the response (181). The messages may be sent and received as part of a representational state transfer application programming interface (RESTful API) using hypertext transfer protocol (HTTP) messages that include text formatted in accordance with JavaScript object notation. Other protocols and standards for communication and data serialization may be used, including remote procedure calls (RPC), protocol buffers (Protobuf), etc. The client (111) may be embodied as a computing system as described in
The repository (191) is a computing system that may include multiple computing devices in accordance with the computing system (700) and the nodes (722) and (724) described below in
Turning to
Each of the nodes (156), (155), and (154) includes an identifier, represents an expression, and may be referenced by a name and a label. The names may be generated from hashing the underlying data set represented by the node. The labels may be generated from the path identified between a start node and an end node.
The node (156) is identified by the identifier k0, represents the expression “db”, and is referenced by the name “59A . . . 3ED”, which also serves as the label for the node (156). The node (155) is identified by the identifier k1, represents the expression “Q(db)”, is referenced by the name “55D . . . 17F”, and is referenced by the label “59A . . . 3ED/C58 . . . 53B/55D . . . 17F” (having a corresponding sequence of expressions of “db/Q/Q(db)”). The node (154) is identified by the identifier k2, represents the expression “M(Q(db))”, is referenced by the name “F26 . . . 3A1”, and is referenced by the label “59A . . . 3ED/C58 . . . 53B/55D . . . 17F/0BB . . . EC2/F26 . . . 3A1” (having a corresponding sequence of expressions of “db/Q/Q(db)/M/M(Q(db))”).
Each of the edges (159) and (158) includes an identifier, represents an expression (e.g., a query or a function), and is referenced by a name. The edge (159) is identified by the identifier e1, represents the expression “Q”, and is referenced by the name “C58 . . . 53B” corresponding to the query (139). The edge (158) is identified by the identifier e2, represents the expression “M”, and is referenced by the name “0BB . . . EC2” corresponding to the function (143).
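As an illustration of how a label may be assembled from the path between a start node and an end node, the sketch below interleaves the names of the nodes with the names of the edges that connect them. The `Node` class and the short placeholder names (e.g., “n-db”) are hypothetical and stand in for the hash values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    identifier: str
    expression: str
    name: str                          # name of the data set the node represents
    parent: Optional["Node"] = None
    edge_name: Optional[str] = None    # name of the query or function on the incoming edge


def label(node: Optional[Node]) -> str:
    """Walk from the node back to the start node and join the names along the path."""
    parts = []
    while node is not None:
        parts.append(node.name)
        if node.edge_name is not None:
            parts.append(node.edge_name)
        node = node.parent
    return "/".join(reversed(parts))


k0 = Node("k0", "db", "n-db")
k1 = Node("k1", "Q(db)", "n-Qdb", parent=k0, edge_name="n-Q")
k2 = Node("k2", "M(Q(db))", "n-MQdb", parent=k1, edge_name="n-M")

label_k2 = label(k2)  # follows the sequence db/Q/Q(db)/M/M(Q(db))
```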
Continuing with
At Step 202, a result is calculated. In one embodiment, an initial request to access the result is received before calculating the result. The result may be calculated by evaluating an expression corresponding to an initial name (which may resolve to an expression instead of to a data set), which is included in the initial request. For example, the expression “M(Q(db))” may be identified in the initial name in the initial request. After generating the result in response to the initial request, the result is presented and may include the name identifying the data set (as opposed to the initial name identifying the expression) for future reference to the result.
In one embodiment, a request may specify the set of inputs that include a data set, a query, and a function. For example, the request may correspond to the string of expressions “db/Q(db)/M(Q(db))”. To evaluate the expression, the data set (e.g., “db”), identified by a first input of the set of inputs is located in a repository. The query (e.g., “Q”), identified by a second input of the set of inputs, is applied to the data set (“db”) to generate the query result (“Q(db)”), which is also referred to as a partial result. The function (e.g., “M”), identified by a third input of the set of inputs, is applied to the query result (“Q(db)”, i.e., the partial result) to generate the final result (“M(Q(db))”).
At Step 204, the result is hashed to generate a name of the result. For example, the result of evaluating the expression “M(Q(db))” may be hashed to generate the name.
In one embodiment, the result may be an input of a set of inputs from which the name is generated. For example, the set of inputs may correspond to the string of expressions “db/Q(db)/M(Q(db))”, which includes the expression “M(Q(db))”, which evaluates to the result that was calculated.
In one embodiment, each input of the set of inputs may identify one of a data set, a query, and a function. A data set is a set of data. Data from a data set of a repository, query results, partial results, and final results may each be a “data set” that is specified by an input of the set of inputs. The set of inputs may include one or more inputs. Each of the inputs may be a data set, a query, or a function. Any number or combination of data sets, queries, and functions may be included in a set of inputs. For example, one set of inputs may include a data set; another set of inputs may include a data set and a function; another set of inputs may include multiple data sets, multiple queries, and multiple functions.
In one embodiment, the set of inputs is hashed to generate the name from the set of inputs. For example, the request may correspond to the string of expressions “db/Q(db)/M(Q(db))”. Instead of hashing each result individually, the data sets and results may be appended and then hashed. In one embodiment, each result may be individually hashed, the individual hashes concatenated, and the concatenated hashes may be hashed again to generate a single hash value.
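The variant in which individual hashes are concatenated and hashed again may be sketched as follows. The function name, the use of SHA-256, and the sample byte strings are assumptions for illustration.

```python
import hashlib


def h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def name_from_hashed_hashes(*inputs: bytes) -> str:
    """Hash each input, concatenate the hex digests, and hash the concatenation."""
    individual = [h(item) for item in inputs]
    return h("".join(individual).encode("ascii"))


# Illustrative stand-ins for the data of db, Q(db), and M(Q(db)).
inputs = (b"data of db", b"data of Q(db)", b"data of M(Q(db))")
name = name_from_hashed_hashes(*inputs)
```

One consequence of this form is that the individual hash values can be kept and reused, so a large data set need not be rehashed when it appears in another set of inputs.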
In one embodiment, multiple hashes may be appended together to generate the name. A first input from the set of inputs is hashed to generate a first hash value. A second input from the set of inputs is hashed to generate a second hash value. The first hash value is joined with the second hash value to generate the name. For example, for the request corresponding to the string of expressions “db/Q(db)”, the first input “db” is hashed to generate the hash value “59A . . . 3ED”. The second input “Q(db)” is hashed to generate the hash value “55D . . . 17F”. The two hash values are joined with the separator “/” to create the name “59A . . . 3ED/55D . . . 17F”.
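A minimal sketch of joining individual hash values with the separator “/” is given below. SHA-256 and the helper names (`hash_value`, `joined_name`, `split_name`) are assumptions for illustration. Hexadecimal digests never contain the separator character, so the name can be split back into its components.

```python
import hashlib

SEPARATOR = "/"


def hash_value(serialized_input: bytes) -> str:
    """Hash one serialized input to an upper-case hexadecimal string."""
    return hashlib.sha256(serialized_input).hexdigest().upper()


def joined_name(serialized_inputs):
    """Join the hash value of each input with the separator."""
    return SEPARATOR.join(hash_value(item) for item in serialized_inputs)


def split_name(name: str):
    """Recover the individual hash values from a joined name."""
    return name.split(SEPARATOR)


# Hypothetical serialized forms of "db" and "Q(db)".
name = joined_name([b"db-bytes", b"Q(db)-bytes"])
parts = split_name(name)
```

Keeping the component hashes visible in the name allows a prefix of the name (for example, the hash of “db” alone) to be looked up separately.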
In one embodiment, the result may be digitally signed to generate the name. For example, the result corresponding to the expression “M(Q(db))” may be signed with the private key of a user. Subsequent users may then verify the corresponding data set using the public key of the user. In one embodiment, the hash of the result is signed to reduce the amount of processing resources used to sign the result.
In one embodiment, a name may be generated that does not include hashes of data sets (e.g., does not include hashes of partial results, final results, the initial data set, etc.). In one embodiment, a subset of the set of inputs may be hashed without hashing a result included in the set of inputs to generate a computable name of the function corresponding to an input of the subset of the set of inputs.
As an example, the name from a request corresponding to the string of expressions “db/Q(db)/M(Q(db))” may be processed so that a hash of the query (“Q”) and a hash of the function (“M”) are included in the name (referred to as a computable name) without including hashes of the data sets (i.e., “db”, “Q(db)”, or “M(Q(db))”). The hashes of the query and the function may be appended with a separator to generate the computable name “C58 . . . 53B/0BB . . . EC2” (corresponding to the chained expression “Q/M”). This name may then be used to generate a result, for example by applying the function (“M”) using the computable name to generate a result.
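The computable name described above may be sketched as follows, under the assumption that the code of the query and of the function is available as text and is hashed with SHA-256. The source strings, the registry, and the helper names are hypothetical; `eval` is applied only to the two fixed literals defined in the sketch.

```python
import hashlib


def code_hash(source: str) -> str:
    """Hash the code text of a query or function."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest().upper()


# Hypothetical code text for the query "Q" and the function "M".
Q_SOURCE = "lambda rows: [r for r in rows if r > 0]"
M_SOURCE = "lambda rows: sum(rows) / len(rows)"

# Registry from code hash to callable, built only from the fixed literals above.
REGISTRY = {code_hash(src): eval(src) for src in (Q_SOURCE, M_SOURCE)}


def computable_name(*sources: str) -> str:
    """Join code hashes with "/"; no data set hash is included."""
    return "/".join(code_hash(source) for source in sources)


def compute(name: str, data_set):
    """Apply each function identified in the computable name, in order."""
    value = data_set
    for part in name.split("/"):
        value = REGISTRY[part](value)
    return value


name = computable_name(Q_SOURCE, M_SOURCE)   # corresponds to "Q/M"
result = compute(name, [10.0, 30.0, -4.0])
```

Since the computable name contains no data set hashes, the same name may be applied to different data sets to generate different results.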
In one embodiment, some of the data sets (including results and partial results) are included in the name. A set of inputs, including a first input corresponding to either the result or a partial result, is hashed to generate the name without converting the other of the result or the partial result. For example, for a request specifying the string of expressions “db/Q(db)/M(Q(db))”, the resulting name may include hash values for the string of expressions “Q/M/M(Q(db))” (i.e., “C58 . . . 53B/0BB . . . EC2/F26 . . . 3A1”), which includes the hash value for the result “M(Q(db))” but does not include the hash value for the partial result “Q(db)” or for the initial data set “db”.
At Step 206, the result is stored in a cache using the name generated from hashing the result. The result may also be stored to the repository.
At Step 208, a request is received to access the result using the name. In one embodiment, the name may be generated by the client. In one embodiment, the name may have been included in a previous response to a previous request.
At Step 210, the result is retrieved from the cache using the name generated from hashing the result corresponding to the input. In one embodiment, a mapping from the name to the memory address in the cache for the cached data corresponding to the result is used to locate the result in the cache.
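Steps 206 through 210 may be sketched with an in-memory mapping from names to cached results, as shown below. The class name `ResultCache`, the use of a Python dictionary in place of a memory-address mapping, SHA-256, and the re-hash performed on retrieval are assumptions of the sketch and are not required by the disclosure.

```python
import hashlib
import json


class ResultCache:
    """Hypothetical cache that stores results under content-derived names."""

    def __init__(self):
        self._entries = {}  # mapping from name to cached result

    @staticmethod
    def name_for(result) -> str:
        data = json.dumps(result, sort_keys=True).encode("utf-8")
        return hashlib.sha256(data).hexdigest()

    def store(self, result) -> str:
        """Step 206: store the result using the name from hashing it."""
        name = self.name_for(result)
        self._entries[name] = result
        return name

    def retrieve(self, name: str):
        """Step 210: locate the result by name and re-check its hash."""
        result = self._entries[name]  # raises KeyError if not cached
        if self.name_for(result) != name:
            raise ValueError("cached result does not match its name")
        return result


cache = ResultCache()
name = cache.store({"expression": "M(Q(db))", "value": 20.0})
result = cache.retrieve(name)
```

Re-hashing on retrieval is one way a requester could verify that the cached data still corresponds to the name that was requested.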
At Step 212, the result is presented in response to the request. For example, the result may be presented by transmitting the result from the server to the client and then by the client displaying the result to a user.
In one embodiment, graphs may be used to generate the names. A graph is traversed to identify a path corresponding to the set of inputs. Each node of the graph may correspond to one of a data set, a partial result, and a result. The graph is directed and acyclic and includes an edge that identifies a function (e.g., “M”) applied to a first node, of the graph, to generate the result (e.g., “M(Q(db))”) corresponding to a second node of the graph. The name of the result is then generated using the path. For example, the hash values (names) for each node (and edge) may be appended with separator characters to form the final name, also referred to as a label for the second node.
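A minimal sketch of generating a label from a path through a directed acyclic graph follows. The adjacency representation, the depth-first traversal, hashing the expression text of each node and edge (rather than the underlying data or code), and truncating each hash to eight characters for readability are all assumptions made for illustration.

```python
import hashlib


def short_hash(text: str) -> str:
    """Assumed stand-in for a node or edge hash, truncated for readability."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:8].upper()


# Each node maps to a list of (edge label, child node) pairs.
GRAPH = {
    "db": [("Q", "Q(db)")],
    "Q(db)": [("M", "M(Q(db))")],
    "M(Q(db))": [],
}


def find_path(graph, start, target):
    """Depth-first search returning alternating nodes and edges, or None."""
    if start == target:
        return [start]
    for edge, child in graph.get(start, []):
        rest = find_path(graph, child, target)
        if rest is not None:
            return [start, edge] + rest
    return None


def label_for(path):
    """Append the hash of each node and edge with separator characters."""
    return "/".join(short_hash(element) for element in path)


path = find_path(GRAPH, "db", "M(Q(db))")
label = label_for(path)
```

Because the graph is acyclic, the recursive traversal in this sketch terminates without tracking visited nodes.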
In one embodiment, graphs may include different paths to the same result. A graph is traversed to identify a first path. The first path corresponds to the set of inputs and to the result. The first path is different from a second path that corresponds to the result and does not correspond to the set of inputs. The result in the cache may be accessed using the first path or the second path.
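Access to one cached result through two different paths may be sketched with two mappings, one from paths to names and one from names to results. The path strings, the helper names, and the use of SHA-256 are hypothetical and serve only to illustrate the idea.

```python
import hashlib
import json

results_by_name = {}  # content-derived name -> cached result
name_by_path = {}     # path string -> content-derived name


def content_name(result) -> str:
    data = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def record(path: str, result) -> str:
    """Cache the result once and remember which path led to it."""
    name = content_name(result)
    results_by_name[name] = result
    name_by_path[path] = name
    return name


def lookup(path: str):
    """Access the cached result using any path recorded for it."""
    return results_by_name[name_by_path[path]]


# Two hypothetical paths that evaluate to the same result.
first_name = record("db/Q/M", 20.0)
second_name = record("db/Q2/M2", 20.0)
```

Both paths resolve to the same name, so the result is stored only once while remaining reachable from either path.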
In one embodiment, nodes may be added to graphs in response to requests for data sets that have not been cached. A node corresponding to one of a result and a partial result may be added to a graph after calculating the result or partial result and storing the result or partial result in the cache. The new node is either a child of a previous calculation or a new query/function on the data. Multiple nodes may also be added. For example, if M(Q(db)) is new (i.e., neither M(·) nor Q(·) has been used before), then two new nodes may be added to the graph, one for the partial result Q(db) and one for the result M(Q(db)).
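Adding nodes on a cache miss may be sketched as follows. For readability, the sketch keys the cache and the graph by expression strings rather than by hash values; that choice, the helper name `ensure_result`, and the example query and function are assumptions.

```python
def ensure_result(graph, cache, data_set_name, steps):
    """Apply (label, function) steps in order, adding nodes for cache misses.

    Returns the final value and the list of nodes added by this call.
    """
    node = data_set_name
    value = cache[node]
    added = []
    for label, function in steps:
        child = f"{label}({node})"
        if child not in cache:
            cache[child] = function(value)                   # calculate and cache
            graph.setdefault(node, []).append((label, child))  # new edge
            graph.setdefault(child, [])                      # new node
            added.append(child)
        value = cache[child]
        node = child
    return value, added


def positive(rows):
    return [r for r in rows if r > 0]


def average(rows):
    return sum(rows) / len(rows)


cache = {"db": [10.0, 30.0, -4.0]}
graph = {"db": []}
steps = [("Q", positive), ("M", average)]

first_value, first_added = ensure_result(graph, cache, "db", steps)
second_value, second_added = ensure_result(graph, cache, "db", steps)
```

On the first request two nodes are added, one for Q(db) and one for M(Q(db)); the second request is served from the cache and adds no nodes.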
Turning to
The expressions (301), (302), (303), (304), and (305) are referenced with the names (311), (312), (313), (314), and (315), respectively. The names (311), (313), and (315) are the hash values generated from the data sets generated from evaluating the expressions (301), (303), and (305). The names (312) and (314) are hash values generated from hashing the code for the functions (302) and (304).
Turning to
Turning to
Turning to
Turning to
Turning to
The expressions (516) and (517) may effectively be equivalent expressions that generate the same result, leading to the hash values (535) and (537) being the same. Because the two expressions generate the same result, the accuracy of the first result may be double-checked against the second result.
Turning to
Turning to
The data set (“db”) has not changed and is retrieved from the cache (622).
Since the model (“MLM”), and the functions that make up the model, have changed, the output from the model is updated and then stored to the cache (622) using the name corresponding to the expression string “db/MLM/MLM(db)”. Additionally, the result (“MLM(db)”) is signed so that other users may verify the results. After the model is trained, the server application (620) sends the page (630) to the client device indicating that the training is complete. Using the cached data sets and calculations reduces the training time for the machine learning model.
Turning to
The server application (670) retrieves the data set (“db”) and applies the query (“Q”) to the data set to generate a partial result (“Q(db)”) that is stored in the cache (672). The server application (670) applies the function (“AP”) to the partial result to generate the final result (“AP(Q(db))”). After generating the final result, the server application (670) sends the page (680) to the client device of the user, which indicates that the loan corresponding to the loan application is approved. Using the cached data sets and calculations reduces the time to generate and transmit web pages, including the page (680).
Embodiments of the invention may be implemented on a computing system. Any combination of a mobile, a desktop, a server, a router, a switch, an embedded device, or other types of hardware may be used. For example, as shown in
The computer processor(s) (702) may be an integrated circuit for processing instructions. For example, the computer processor(s) (702) may be one or more cores or micro-cores of a processor. The computing system (700) may also include one or more input device(s) (710), such as a touchscreen, a keyboard, a mouse, a microphone, a touchpad, an electronic pen, or any other type of input device.
The communication interface (712) may include an integrated circuit for connecting the computing system (700) to a network (not shown) (e.g., a local area network (LAN), a wide area network (WAN) such as the Internet, a mobile network, or any other type of network) and/or to another device, such as another computing device.
Further, the computing system (700) may include one or more output device(s) (708), such as a screen (e.g., a liquid crystal display (LCD), a plasma display, a touchscreen, a cathode ray tube (CRT) monitor, a projector, or other display device), a printer, an external storage, or any other output device. One or more of the output device(s) (708) may be the same as or different from the input device(s) (710). The input and output device(s) ((710) and (708)) may be locally or remotely connected to the computer processor(s) (702), non-persistent storage (704), and persistent storage (706). Many different types of computing systems exist, and the aforementioned input and output device(s) ((710) and (708)) may take other forms.
Software instructions in the form of computer readable program code to perform embodiments of the invention may be stored, in whole or in part, temporarily or permanently, on a non-transitory computer readable medium such as a CD, a DVD, a storage device, a diskette, a tape, flash memory, physical memory, or any other computer readable storage medium. Specifically, the software instructions may correspond to computer readable program code that, when executed by a processor(s), is configured to perform one or more embodiments of the invention.
The computing system (700) in
Although not shown in
The nodes (e.g., node X (722), node Y (724)) in the network (720) may be configured to provide services for a client device (726). For example, the nodes may be part of a cloud computing system. The nodes may include functionality to receive requests from the client device (726) and transmit responses to the client device (726). The client device (726) may be a computing system, such as the computing system (700) shown in
The computing system (700) or group of computing systems described in
Based on the client-server networking model, sockets may serve as interfaces or communication channel end-points enabling bidirectional data transfer between processes on the same device. Foremost, following the client-server networking model, a server process (e.g., a process that provides data) may create a first socket object. Next, the server process binds the first socket object, thereby associating the first socket object with a unique name and/or address. After creating and binding the first socket object, the server process then waits and listens for incoming connection requests from one or more client processes (e.g., processes that seek data). At this point, when a client process wishes to obtain data from a server process, the client process starts by creating a second socket object. The client process then proceeds to generate a connection request that includes at least the second socket object and the unique name and/or address associated with the first socket object. The client process then transmits the connection request to the server process. Depending on availability, the server process may accept the connection request, establishing a communication channel with the client process, or the server process, busy in handling other operations, may queue the connection request in a buffer until the server process is ready. An established connection informs the client process that communications may commence. In response, the client process may generate a data request specifying the data that the client process wishes to obtain. The data request is subsequently transmitted to the server process. Upon receiving the data request, the server process analyzes the request and gathers the requested data. Finally, the server process then generates a reply including at least the requested data and transmits the reply to the client process. The data may be transferred, more commonly, as datagrams or a stream of characters (e.g., bytes).
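The socket exchange described above may be sketched with the Python standard library as follows. Running the server in a thread of the same program, the loopback address, the five-second timeout, and the contents of the data store are assumptions made so that the sketch is self-contained.

```python
import socket
import threading


def serve_once(server_socket, data_store):
    """Accept one connection, read the data request, and send the reply."""
    connection, _ = server_socket.accept()
    with connection:
        key = connection.recv(1024).decode("utf-8")
        reply = data_store.get(key, "")
        connection.sendall(reply.encode("utf-8"))


def request_data(address, key):
    """Client side: create a socket, connect, send the request, read the reply."""
    with socket.create_connection(address, timeout=5.0) as client_socket:
        client_socket.sendall(key.encode("utf-8"))
        return client_socket.recv(1024).decode("utf-8")


def demo():
    data_store = {"M(Q(db))": "20.0"}  # hypothetical data held by the server
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server_socket:
        server_socket.settimeout(5.0)
        server_socket.bind(("127.0.0.1", 0))  # port 0: the OS picks a free port
        server_socket.listen()
        worker = threading.Thread(
            target=serve_once, args=(server_socket, data_store)
        )
        worker.start()
        reply = request_data(server_socket.getsockname(), "M(Q(db))")
        worker.join()
    return reply
```

Calling `demo()` performs one bind, listen, accept, request, and reply cycle; a connection request that arrives before the server calls `accept` is queued by the operating system.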
Shared memory refers to the allocation of virtual memory space in order to substantiate a mechanism for which data may be communicated and/or accessed by multiple processes. In implementing shared memory, an initializing process first creates a shareable segment in persistent or non-persistent storage. Post creation, the initializing process then mounts the shareable segment, subsequently mapping the shareable segment into the address space associated with the initializing process. Following the mounting, the initializing process proceeds to identify and grant access permission to one or more authorized processes that may also write and read data to and from the shareable segment. Changes made to the data in the shareable segment by one process may immediately affect other processes, which are also linked to the shareable segment. Further, when one of the authorized processes accesses the shareable segment, the shareable segment maps to the address space of that authorized process. Often, only one authorized process may mount the shareable segment, other than the initializing process, at any given time.
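A shareable segment may be sketched with the `multiprocessing.shared_memory` module of the Python standard library (Python 3.8 or later). For brevity, the sketch attaches to the segment a second time within the same process instead of from a separate authorized process; that simplification and the segment size are assumptions.

```python
from multiprocessing import shared_memory


def demo():
    # The initializing process creates the shareable segment.
    segment = shared_memory.SharedMemory(create=True, size=16)
    try:
        segment.buf[:5] = b"hello"
        # A second attachment by name stands in for an authorized process.
        attached = shared_memory.SharedMemory(name=segment.name)
        try:
            seen = bytes(attached.buf[:5])   # read through the second mapping
            attached.buf[0:1] = b"J"         # write through the second mapping
            after = bytes(segment.buf[:5])   # change is visible to the creator
        finally:
            attached.close()
    finally:
        segment.close()
        segment.unlink()  # release the segment once it is no longer needed
    return seen, after
```

The write made through the second attachment is immediately visible through the first, which mirrors the statement above that changes by one process may immediately affect other processes linked to the segment.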
Other techniques may be used to share data, such as the various data described in the present application, between processes without departing from the scope of the invention. The processes may be part of the same or different application and may execute on the same or different computing system.
Rather than or in addition to sharing data between processes, the computing system performing one or more embodiments of the invention may include functionality to receive data from a user. For example, in one or more embodiments, a user may submit data via a graphical user interface (GUI) on the user device. Data may be submitted via the graphical user interface by a user selecting one or more graphical user interface widgets or inserting text and other data into graphical user interface widgets using a touchpad, a keyboard, a mouse, or any other input device. In response to selecting a particular item, information regarding the particular item may be obtained from persistent or non-persistent storage by the computer processor. Upon selection of the item by the user, the contents of the obtained data regarding the particular item may be displayed on the user device in response to the user's selection.
By way of another example, a request to obtain data regarding the particular item may be sent to a server operatively connected to the user device through a network. For example, the user may select a uniform resource locator (URL) link within a web client of the user device, thereby initiating a Hypertext Transfer Protocol (HTTP) or other protocol request being sent to the network host associated with the URL. In response to the request, the server may extract the data regarding the particular selected item and send the data to the device that initiated the request. Once the user device has received the data regarding the particular item, the contents of the received data regarding the particular item may be displayed on the user device in response to the user's selection. Further to the above example, the data received from the server after selecting the URL link may provide a web page in Hyper Text Markup Language (HTML) that may be rendered by the web client and displayed on the user device.
Once data is obtained, such as by using techniques described above or from storage, the computing system, in performing one or more embodiments of the invention, may extract one or more data items from the obtained data. For example, the extraction may be performed as follows by the computing system (700) in
Next, extraction criteria are used to extract one or more data items from the token stream or structure, where the extraction criteria are processed according to the organizing pattern to extract one or more tokens (or nodes from a layered structure). For position-based data, the token(s) at the position(s) identified by the extraction criteria are extracted. For attribute/value-based data, the token(s) and/or node(s) associated with the attribute(s) satisfying the extraction criteria are extracted. For hierarchical/layered data, the token(s) associated with the node(s) matching the extraction criteria are extracted. The extraction criteria may be as simple as an identifier string or may be a query presented to a structured data repository (where the data repository may be organized according to a database schema or data format, such as XML).
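The three kinds of extraction criteria described above may be sketched as follows. The helper names and the sample data are hypothetical; the hierarchical case uses the `xml.etree.ElementTree` module of the Python standard library with a simple path expression.

```python
import xml.etree.ElementTree as ET


def extract_by_position(raw, delimiter, positions):
    """Position-based: tokenize, then take the tokens at the given positions."""
    tokens = raw.split(delimiter)
    return [tokens[position] for position in positions]


def extract_by_attribute(records, attribute, predicate):
    """Attribute/value-based: keep records whose attribute satisfies the criteria."""
    return [
        record
        for record in records
        if attribute in record and predicate(record[attribute])
    ]


def extract_hierarchical(xml_text, path):
    """Hierarchical: return the text of the nodes matching the path."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.findall(path)]


fields = extract_by_position("store-7,freezer-2,-18.5", ",", [0, 2])
cold = extract_by_attribute(
    [{"id": 1, "temp": -18.5}, {"id": 2, "temp": -4.0}],
    "temp",
    lambda value: value < -10,
)
temps = extract_hierarchical(
    "<stores><store><temp>-18</temp></store><store><temp>-20</temp></store></stores>",
    "./store/temp",
)
```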
The extracted data may be used for further processing by the computing system. For example, the computing system (700) of
The computing system (700) in
The user, or software application, may submit a statement or query into the DBMS. Then the DBMS interprets the statement. The statement may be a select statement to request information, update statement, create statement, delete statement, etc. Moreover, the statement may include parameters that specify data, or data container (database, table, record, column, view, etc.), identifier(s), conditions (comparison operators), functions (e.g., join, full join, count, average, etc.), sort (e.g., ascending, descending), or others. The DBMS may execute the statement. For example, the DBMS may access a memory buffer, or reference or index a file for read, write, deletion, or any combination thereof, for responding to the statement. The DBMS may load the data from persistent or non-persistent storage and perform computations to respond to the query. The DBMS may return the result(s) to the user or software application.
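Submitting create, insert, and select statements to a DBMS may be sketched with the `sqlite3` module of the Python standard library. The table name, column names, and sample rows are hypothetical, and an in-memory database is assumed so that the sketch is self-contained.

```python
import sqlite3


def average_profit(rows, quarter):
    """Load rows into an in-memory table and return the mean profit for a quarter."""
    connection = sqlite3.connect(":memory:")
    try:
        # Create statement: define the data container.
        connection.execute(
            "CREATE TABLE profits (company TEXT, quarter TEXT, profit REAL)"
        )
        # Insert statements with parameters that specify the data.
        connection.executemany("INSERT INTO profits VALUES (?, ?, ?)", rows)
        # Select statement with a condition and an aggregate function.
        cursor = connection.execute(
            "SELECT AVG(profit) FROM profits WHERE quarter = ?", (quarter,)
        )
        return cursor.fetchone()[0]
    finally:
        connection.close()


rows = [("A", "2020Q1", 10.0), ("B", "2020Q1", 30.0), ("C", "2020Q2", 5.0)]
mean_q1 = average_profit(rows, "2020Q1")
```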
The computing system (700) of
For example, a GUI may first obtain a notification from a software application requesting that a particular data object be presented within the GUI. Next, the GUI may determine a data object type associated with the particular data object, e.g., by obtaining data from a data attribute within the data object that identifies the data object type. Then, the GUI may determine any rules designated for displaying that data object type, e.g., rules specified by a software framework for a data object class or according to any local parameters defined by the GUI for presenting that data object type. Finally, the GUI may obtain data values from the particular data object and render a visual representation of the data values within a display device according to the designated rules for that data object type.
Data may also be presented through various audio methods. In particular, data may be rendered into an audio format and presented as sound through one or more speakers operably connected to a computing device.
Data may also be presented to a user through haptic methods. For example, haptic methods may include vibrations or other physical signals generated by the computing system. For example, data may be presented to a user using a vibration generated by a handheld computer device with a predefined duration and intensity of the vibration to communicate the data.
The above description of functions presents only a few examples of functions performed by the computing system (700) of
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this disclosure, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as disclosed herein. Accordingly, the scope of the invention should be limited only by the attached claims.