Data processing is a fundamental part of computer programming. One can choose from amongst a variety of programming languages with which to author programs. The selected language for a particular application may depend on the application context, a developer's preference, or a company policy, among other factors. Regardless of the selected language, a developer will ultimately have to deal with data, namely querying and updating data.
A technology called language-integrated queries (LINQ) was developed to facilitate data interaction from within programming languages. LINQ provides a convenient and declarative shorthand query syntax to enable specification of queries within a programming language (e.g., C#®, Visual Basic® . . . ). More specifically, query operators are provided that map to lower-level language constructs or primitives such as methods and lambda expressions. Query operators are provided for various families of operations (e.g., filtering, projection, joining, grouping, ordering . . . ), and can include but are not limited to “where” and “select” operators that map to methods that implement the operators that these names represent. By way of example, a user can specify a query expression in a form such as “from n in numbers where n<10 select n,” wherein “numbers” is a data source and the query returns integers from the data source that are less than ten. Further, query operators can be combined in various ways to generate queries of arbitrary complexity.
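By way of illustration and not limitation, the mapping from query operators to methods and lambda expressions can be sketched as follows. Python is used only for illustration here; the `Query` class and its `where` and `select` methods are hypothetical stand-ins and are not the actual LINQ operator implementations.

```python
from typing import Any, Callable, Iterable, Iterator


class Query:
    """Hypothetical wrapper that exposes query operators as chainable methods."""

    def __init__(self, source: Iterable[Any]) -> None:
        self._source = source

    def where(self, predicate: Callable[[Any], bool]) -> "Query":
        # Filtering operator: keep only the items that satisfy the predicate.
        return Query(item for item in self._source if predicate(item))

    def select(self, projection: Callable[[Any], Any]) -> "Query":
        # Projection operator: transform each item.
        return Query(projection(item) for item in self._source)

    def __iter__(self) -> Iterator[Any]:
        return iter(self._source)


numbers = [3, 12, 7, 25, 9]

# Method-call form of "from n in numbers where n<10 select n".
result = list(Query(numbers).where(lambda n: n < 10).select(lambda n: n))
print(result)  # [3, 7, 9]
```

Because each operator returns another `Query`, operators can be chained in any order, which is one way queries of arbitrary complexity could be composed.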
There can be a client-server relationship to query processing where the client generates the query and the server executes the query. Moreover, differences can exist between execution environments of clients and servers, often referred to as an impedance mismatch. This impedance mismatch is bridged by transforming a client representation of a query directly to a target-server comprehensible form. For example, a query expression integrated within a general-purpose programming language (e.g., C#®, Visual Basic®, Java . . . ) can be translated to a domain-specific programming language such as T-SQL (i.e., Transact-Structured Query Language) to enable execution with respect to a relational database system. This can be accomplished utilizing intimate knowledge of a query source and an execution target to map between the source and the target.
The following presents a simplified summary in order to provide a basic understanding of some aspects of the disclosed subject matter. This summary is not an extensive overview. It is not intended to identify key/critical elements or to delineate the scope of the claimed subject matter. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is presented later.
Briefly described, the subject disclosure generally pertains to a multidimensional data-centric service protocol. An intermediate representation of a query expression can be generated that is independent of query-expression generation and execution environments. In other words, the intermediate representation is generated without domain-specific knowledge. The intermediate representation can subsequently be provided to a query execution service, which can transform the intermediate representation to a locally executable representation. Subsequently, the query expression can be executed and results returned. Accordingly, the intermediate representation provides a uniform vehicle for exchange of query expressions across a plurality of different execution environments.
Furthermore, a number of features can be employed with respect to the intermediate representation. For example, at least a portion of the intermediate representation can be discarded as a function of a particular execution context (e.g., dynamically typed). In addition, client context information can be transmitted in conjunction with the intermediate representation to enable decisions regarding query execution to be made based thereon. Further yet, various compression techniques can be utilized to reduce the overall size of the query expression and/or representation thereof prior to transmission.
To the accomplishment of the foregoing and related ends, certain illustrative aspects of the claimed subject matter are described herein in connection with the following description and the annexed drawings. These aspects are indicative of various ways in which the subject matter may be practiced, all of which are intended to be within the scope of the claimed subject matter. Other advantages and novel features may become apparent from the following detailed description when considered in conjunction with the drawings.
Details below are generally directed toward a multidimensional data-centric service protocol. Various services can be available for processing requests for data. For example, a number of servers can be accessible to network connected clients to execute query expressions, or more simply stated, queries. Rather than translating a query expression directly from a source format to a target format, an intermediate representation of a query expression can be generated for use with respect to a plurality of execution environments. Subsequently, the intermediate representation can be transmitted to a query execution environment, which can transform the intermediate representation into a locally executable form. In this manner, intricate details regarding an execution environment need not be known, which can allow a query to be potentially executed in any execution context. In addition, the intermediate representation can insulate a query expression generator from changes with respect to a query executor (e.g., data source schema changes), among other things.
Furthermore, numerous factors can also shape the created and transmitted intermediate representation. For example, where portions of the intermediate representation are not supported by an execution environment, the portions can be removed prior to transmission. Additionally, client context information can be added to the intermediate representation to enable an execution environment to employ such data in various manners. Further yet, at least portions of the query expression can be compressed to facilitate transmission. In other words, the protocol can be multidimensional.
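As a minimal sketch of two of these dimensions, the following attaches client context information to an intermediate representation and compresses the result prior to transmission. The envelope layout, the field names `context` and `query`, and the choice of JSON with zlib are assumptions made for illustration only; the disclosure does not prescribe a particular serialization or compression technique.

```python
import json
import zlib
from typing import Any, Dict


def pack(representation: Dict[str, Any], client_context: Dict[str, Any]) -> bytes:
    """Wrap the representation with client context, serialize, and compress."""
    envelope = {"context": client_context, "query": representation}
    raw = json.dumps(envelope, separators=(",", ":"), sort_keys=True).encode("utf-8")
    return zlib.compress(raw, 9)  # 9 = strongest compression level


def unpack(payload: bytes) -> Dict[str, Any]:
    """Reverse of pack: decompress and deserialize the envelope."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))


# Hypothetical representation and client context.
payload = pack({"kind": "source", "name": "numbers"}, {"user": "alice"})
envelope = unpack(payload)
print(envelope["context"]["user"], envelope["query"]["name"])
```

The compression shown here is lossless; removal of unsupported portions, by contrast, is a separate and lossy step.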
Various aspects of the subject disclosure are now described in more detail with reference to the annexed drawings, wherein like numerals refer to like or corresponding elements throughout. It should be understood, however, that the drawings and detailed description relating thereto are not intended to limit the claimed subject matter to the particular form disclosed. Rather, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the claimed subject matter.
Referring initially to
The query generation component 110 produces a local representation of a query expression (e.g., a combination of one or more values and/or operators). For example and although not limited thereto, a query expression can correspond to a language-integrated query (LINQ or LINQ query) that is specified with respect to a combination of query operators, and the generated local representation can be an expression tree. Furthermore, the query expression can optionally be segmented into two or more query expressions to enable distributed query execution. For clarity and simplicity, however, this description focuses on a single query expression, which can be one of a number of sub-query expressions designated for distributed execution.
The representation generation component 120 receives, retrieves, or otherwise obtains or acquires a query expression, which specifies a query with respect to one or more data sources, and produces an intermediate representation of the query expression that is query-expression generation and execution environment independent (e.g., without domain-specific knowledge). Nevertheless, the intermediate representation captures the semantics (e.g., meaning) of the query expression implied by an ordering of one or more query operators (e.g., via type information, method calls . . . ). For example, if the client representation of a query expression is an expression tree, the representation generation component 120 can iterate through nodes of the tree and generate equivalent code that is not tied to a particular execution context (e.g., hardware or software). In one particular instance, type information can be generated at different levels of granularity since information can be determined or inferred and reconstructed. In other words, the intermediate representation is a domain-independent vehicle of knowledge exchange between the client query-generation environment and the server query-execution environment. Furthermore, the representation generation component 120 can include metadata in the intermediate representation such as client context information as described later herein.
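By way of example and not limitation, iterating through the nodes of an expression tree to produce an environment-independent representation can be sketched as follows. The node classes, the dictionary layout, and the `include_types` flag (standing in for type information at different levels of granularity) are hypothetical and chosen only for illustration.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Union


@dataclass
class Source:
    name: str


@dataclass
class Parameter:
    name: str


@dataclass
class Constant:
    value: Any


@dataclass
class Binary:
    op: str
    left: "Node"
    right: "Node"


@dataclass
class Call:
    operator: str   # query operator name, e.g. "where" or "select"
    source: "Node"  # the sequence the operator is applied to
    parameter: str  # name bound by the lambda
    body: "Node"    # lambda body


Node = Union[Source, Parameter, Constant, Binary, Call]


def to_intermediate(node: Node, include_types: bool = True) -> Dict[str, Any]:
    """Recursively convert a client-side tree into plain, serializable data."""
    if isinstance(node, Source):
        return {"kind": "source", "name": node.name}
    if isinstance(node, Parameter):
        return {"kind": "parameter", "name": node.name}
    if isinstance(node, Constant):
        out = {"kind": "constant", "value": node.value}
        if include_types:
            out["type"] = type(node.value).__name__
        return out
    if isinstance(node, Binary):
        return {"kind": "binary", "op": node.op,
                "left": to_intermediate(node.left, include_types),
                "right": to_intermediate(node.right, include_types)}
    if isinstance(node, Call):
        return {"kind": "call", "operator": node.operator,
                "parameter": node.parameter,
                "source": to_intermediate(node.source, include_types),
                "body": to_intermediate(node.body, include_types)}
    raise TypeError(f"unsupported node: {node!r}")


# Tree for "from n in numbers where n<10 select n".
tree = Call("select",
            Call("where", Source("numbers"), "n",
                 Binary("<", Parameter("n"), Constant(10))),
            "n", Parameter("n"))

ir = to_intermediate(tree)
wire = json.dumps(ir, sort_keys=True)  # one possible serialized form
print(wire)
```

The nesting of the `call` entries preserves the ordering of the query operators, so the semantics of the expression survive without any reference to client-side classes.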
The communication component 130 provides a means for facilitating communication of the intermediate representation to one or more query execution components 140. As will be described later herein, the communication component 130 can enable negotiation of a particular protocol between the client query-generation environment and server query-execution environment with respect to the intermediate representation.
The query execution components 140 can include execution contexts (e.g., supported hardware/software) that differ from the execution context in which the query expression was constructed. For example, the query expression can be constructed with a first programming language while a query execution component 140 supports a second programming language. Further, execution context can vary amongst the query execution components 140 as well. Nevertheless, each query execution component 140 can transform the intermediate representation of a query expression into a representation executable within its particular context.
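As a sketch of how two differing execution contexts might each transform the same intermediate representation, the following evaluates a dictionary-based representation directly in memory and also renders it as SQL-like text. The dictionary layout, the single assumed column named `value`, and both functions are hypothetical. The SQL rendering is deliberately simplified: it handles only a projection applied over filters.

```python
import operator
from typing import Any, Dict, List

# Hypothetical representation of "from n in numbers where n<10 select n".
ir = {
    "kind": "call", "operator": "select", "parameter": "n",
    "body": {"kind": "parameter", "name": "n"},
    "source": {
        "kind": "call", "operator": "where", "parameter": "n",
        "body": {"kind": "binary", "op": "<",
                 "left": {"kind": "parameter", "name": "n"},
                 "right": {"kind": "constant", "value": 10}},
        "source": {"kind": "source", "name": "numbers"},
    },
}

OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def eval_scalar(node: Dict[str, Any], env: Dict[str, Any]) -> Any:
    """Evaluate a lambda body against the current parameter binding."""
    if node["kind"] == "constant":
        return node["value"]
    if node["kind"] == "parameter":
        return env[node["name"]]
    if node["kind"] == "binary":
        return OPS[node["op"]](eval_scalar(node["left"], env),
                               eval_scalar(node["right"], env))
    raise ValueError(f"unsupported node kind: {node['kind']}")


def run_in_memory(node: Dict[str, Any], sources: Dict[str, List[Any]]) -> List[Any]:
    """Execution context 1: interpret the representation over local lists."""
    if node["kind"] == "source":
        return list(sources[node["name"]])
    rows = run_in_memory(node["source"], sources)
    name = node["parameter"]
    if node["operator"] == "where":
        return [r for r in rows if eval_scalar(node["body"], {name: r})]
    if node["operator"] == "select":
        return [eval_scalar(node["body"], {name: r}) for r in rows]
    raise ValueError(f"unsupported operator: {node['operator']}")


def sql_scalar(node: Dict[str, Any]) -> str:
    if node["kind"] == "constant":
        return repr(node["value"])
    if node["kind"] == "parameter":
        return "value"  # assumption: the table has one column named "value"
    return f"{sql_scalar(node['left'])} {node['op']} {sql_scalar(node['right'])}"


def to_sql(node: Dict[str, Any]) -> str:
    """Execution context 2: render the representation as SQL-like text."""
    projection, predicates = "value", []
    while node["kind"] == "call":
        if node["operator"] == "where":
            predicates.append(sql_scalar(node["body"]))
        elif node["operator"] == "select":
            projection = sql_scalar(node["body"])
        else:
            raise ValueError(f"unsupported operator: {node['operator']}")
        node = node["source"]
    sql = f"SELECT {projection} FROM {node['name']}"
    if predicates:
        sql += " WHERE " + " AND ".join(reversed(predicates))
    return sql


print(run_in_memory(ir, {"numbers": [3, 12, 7, 25, 9]}))  # [3, 7, 9]
print(to_sql(ir))  # SELECT value FROM numbers WHERE value < 10
```

Neither function requires knowledge of how or where the representation was generated, which is the property the intermediate representation is intended to provide.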
Employment of an intermediate representation is beneficial in that it provides a uniform interface for data acquisition. In other words, a single intermediate representation can be produced rather than numerous representations targeting particular query execution contexts. Along the same lines, a query generator need not have knowledge of the intricacies of particular query contexts to interact with the contexts, and certain query expressions can be rejected during transformation to the intermediate representation. As well, query expression generation is insulated from changes with respect to a query execution component 140 (e.g., context, schema, version . . . ). Still further yet, the intermediate representation can facilitate distributed as well as parallel processing since the representation can be common for multiple query execution components. By way of example, a query execution component 140 can provide the intermediate representation to yet another query execution component 140 for execution of at least a portion of the query expression represented thereby.
A filter component 214 can also reside on the client side and include functionality to remove portions of an intermediate representation of a query expression. For example, the filter component 214 can initiate communication with the server 220 and request information regarding supported scope of a query expression including functionality, capabilities, or the like. Based at least in part on this information the filter component 214 is configured to remove portions of the intermediate representation prior to transmission. Since the intermediate representation is designed for use by multiple query executors of various sophistication and capabilities, some information such as data types might be useful in one context but be unused in another context. Accordingly, the filter component 214 can reduce the amount of data transmitted as a function of a particular execution context. In other words, the filter component 214 can perform a type of lossy compression with respect to the intermediate representation as a function of execution context. Further, it is to be appreciated that the server 220 may distribute query execution work to other servers. In this case, upon inquiry from the filter component 214, the server 220 can respond with information that captures the maximum quantity of data needed by it or other servers it intends to employ to ensure requisite information is available. Of course, additional communication can be initiated to obtain information that was discarded prior to transmission.
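A minimal sketch of such filtering, assuming a dictionary-based intermediate representation, is shown below. The optional field names `type` and `debug_location`, and the idea that a server reports the set of optional fields it needs, are assumptions introduced for illustration rather than details of the disclosed protocol.

```python
from typing import Any, FrozenSet, Iterable

# Hypothetical fields that an executor may or may not make use of.
OPTIONAL_FIELDS = frozenset({"type", "debug_location"})


def filter_representation(node: Any, wanted: FrozenSet[str]) -> Any:
    """Drop optional fields the execution context did not ask for."""
    if isinstance(node, dict):
        return {key: filter_representation(value, wanted)
                for key, value in node.items()
                if key not in OPTIONAL_FIELDS or key in wanted}
    if isinstance(node, list):
        return [filter_representation(item, wanted) for item in node]
    return node


def combined_needs(per_server: Iterable[Iterable[str]]) -> FrozenSet[str]:
    """Union of what a server and any servers it delegates to require."""
    needs = set()
    for fields in per_server:
        needs |= set(fields)
    return frozenset(needs)


ir = {
    "kind": "binary", "op": "<", "type": "bool",
    "left": {"kind": "parameter", "name": "n", "type": "int"},
    "right": {"kind": "constant", "value": 10, "type": "int",
              "debug_location": "line 4"},
}

# A dynamically typed server that needs no optional fields.
slim = filter_representation(ir, combined_needs([[]]))

# A server that delegates part of the work to a server needing type data.
typed = filter_representation(ir, combined_needs([[], ["type"]]))

print(slim)
print(typed)
```

Taking the union in `combined_needs` reflects the case in which the server reports the maximum quantity of data needed by itself or by other servers it intends to employ.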
In accordance with one embodiment, the intermediate representation of a query expression and/or its serialized form can include information about the client 210 wherein the client can refer to a particular computer and/or user of the computer (e.g., identity, login information . . . ). The access component 224 can acquire this information from the intermediate representation and utilize the information to control access to query expression execution functionality. In some sense, the server 220 is providing a service or more particularly data-centric services, such as a query execution service. Access to the service can be controlled for safety, security, and/or monetization reasons, among other things. For example, if an individual requests query execution and does not have a subscription to the service, the access component 224 can prevent the server 220 from executing the query and/or returning results. Similarly, the access component 224 can keep track of the number of queries executed by a client 210 for analysis and/or billing reasons where subscriptions are offered with fees tied to a number of queries (e.g., per week, per month . . . ).
The aforementioned systems, architectures, environments, and the like have been described with respect to interaction between several components. It should be appreciated that such systems and components can include those components or sub-components specified therein, some of the specified components or sub-components, and/or additional components. Sub-components could also be implemented as components communicatively coupled to other components rather than included within parent components. Further yet, one or more components and/or sub-components may be combined into a single component to provide aggregate functionality. Communication between systems, components and/or sub-components can be accomplished in accordance with a push and/or pull model. The components may also interact with one or more other components not specifically described herein for the sake of brevity, but known by those of skill in the art.
Furthermore, various portions of the disclosed systems above and methods below can include or consist of artificial intelligence, machine learning, or knowledge or rule-based components, sub-components, processes, means, methodologies, or mechanisms (e.g., support vector machines, neural networks, expert systems, Bayesian belief networks, fuzzy logic, data fusion engines, classifiers . . . ). Such components, inter alia, can automate certain mechanisms or processes performed thereby to make portions of the systems and methods more adaptive as well as efficient and intelligent. By way of example and not limitation, the communication component 130 can utilize such mechanisms to determine or infer an optimal communication protocol as a function of historical and/or contextual information, for instance.
In view of the exemplary systems described supra, methodologies that may be implemented in accordance with the disclosed subject matter will be better appreciated with reference to the flow charts of
Referring to
At 740, a determination is made concerning whether to execute the query expression. Such a determination can be made as a function of safety and/or security concerns as well as subscription information, among other things. For example, if client information indicates that the request is coming from a known security risk or a maximum number of queries have already been processed, a decision can be made not to execute the query. If, however, the client information indicates that the request arises from a user with a valid subscription, the decision can be to execute the query. Still further yet, the determination at 740 can correspond more generally to a filter such that parts of the query expression are allowed to execute while others are not. In one instance, a negotiation can occur where a client agrees to obey server communicated restrictions, and thus the entire query expression is likely to be allowable. Alternatively, an agreement can be made where the server accepts arbitrary queries (or a subset thereof) but can come to the conclusion during processing that a condition exists that prevents execution of the query in its entirety.
If, at 740, a decision is made not to execute the query (“NO”) (or portion thereof), a notification of this fact can be generated at 750 and potentially provided to a requesting party. Further, although not illustrated, results from other parts of a query that were allowed to execute can be returned. Subsequently, the method 700 can terminate. If, however, at 740, the decision is to allow execution (“YES”) then the method 700 continues at numeral 760 where query execution is at least initiated. Continuing at reference numeral 770, usage information such as the fact that a query was executed can be recorded along with information regarding client context, for example for later analysis or use in determining subscription compliance based on a set number of queries (e.g., 100 queries per month). Next, at 780, return of one or more results of query execution can be at least initiated.
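The acts at 740 through 780 can be sketched as follows. The `AccessPolicy` class, its fields, and the shape of the returned dictionaries are hypothetical; the sketch covers only the all-or-nothing decision and omits partial execution of a query expression.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Set, Tuple


@dataclass
class AccessPolicy:
    """Hypothetical bookkeeping for subscriptions, risks, and usage counts."""
    subscribers: Dict[str, int]  # client id -> allowed queries per period
    blocked: Set[str] = field(default_factory=set)
    usage: Dict[str, int] = field(default_factory=dict)

    def may_execute(self, client_id: str) -> Tuple[bool, str]:
        if client_id in self.blocked:
            return False, "known security risk"
        quota = self.subscribers.get(client_id)
        if quota is None:
            return False, "no valid subscription"
        if self.usage.get(client_id, 0) >= quota:
            return False, "maximum number of queries reached"
        return True, ""

    def record(self, client_id: str) -> None:
        self.usage[client_id] = self.usage.get(client_id, 0) + 1


def handle_request(policy: AccessPolicy, client_id: str,
                   run_query: Callable[[], Any]) -> Dict[str, Any]:
    allowed, reason = policy.may_execute(client_id)         # 740: decide
    if not allowed:
        return {"status": "rejected", "reason": reason}     # 750: notify
    results = run_query()                                   # 760: execute
    policy.record(client_id)                                # 770: record usage
    return {"status": "ok", "results": results}             # 780: return results


policy = AccessPolicy(subscribers={"alice": 2}, blocked={"mallory"})
for _ in range(3):
    print(handle_request(policy, "alice", lambda: [3, 7, 9]))
print(handle_request(policy, "mallory", lambda: [3, 7, 9]))
```

With a quota of two, the third request from the same client is rejected, which illustrates subscription compliance based on a set number of queries.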
As used herein, the terms “component,” “system,” and “engine” as well as forms thereof are intended to refer to a computer-related entity, either hardware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, a processor, an object, an instance, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computer and the computer can be a component. One or more components may reside within a process and/or thread of execution and a component may be localized on one computer and/or distributed between two or more computers.
The word “exemplary” or various forms thereof are used herein to mean serving as an example, instance, or illustration. Any aspect or design described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects or designs. Furthermore, examples are provided solely for purposes of clarity and understanding and are not meant to limit or restrict the claimed subject matter or relevant portions of this disclosure in any manner. It is to be appreciated that a myriad of additional or alternate examples of varying scope could have been presented, but have been omitted for purposes of brevity.
As used herein, the term “inference” or “infer” refers generally to the process of reasoning about or inferring states of the system, environment, and/or user from a set of observations as captured via events and/or data. Inference can be employed to identify a specific context or action, or can generate a probability distribution over states, for example. The inference can be probabilistic—that is, the computation of a probability distribution over states of interest based on a consideration of data and events. Inference can also refer to techniques employed for composing higher-level events from a set of events and/or data. Such inference results in the construction of new events or actions from a set of observed events and/or stored event data, whether or not the events are correlated in close temporal proximity, and whether the events and data come from one or several event and data sources. Various classification schemes and/or systems (e.g., support vector machines, neural networks, expert systems, Bayesian belief networks, fuzzy logic, data fusion engines . . . ) can be employed in connection with performing automatic and/or inferred action in connection with the claimed subject matter.
Furthermore, to the extent that the terms “includes,” “contains,” “has,” “having” or variations in form thereof are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term “comprising” as “comprising” is interpreted when employed as a transitional word in a claim.
In order to provide a context for the claimed subject matter,
While the above disclosed system and methods can be described in the general context of computer-executable instructions of a program that runs on one or more computers, those skilled in the art will recognize that aspects can also be implemented in combination with other program modules or the like. Generally, program modules include routines, programs, components, data structures, among other things that perform particular tasks and/or implement particular abstract data types. Moreover, those skilled in the art will appreciate that the above systems and methods can be practiced with various computer system configurations, including single-processor, multi-processor or multi-core processor computer systems, mini-computing devices, mainframe computers, as well as personal computers, hand-held computing devices (e.g., personal digital assistant (PDA), phone, watch . . . ), microprocessor-based or programmable consumer or industrial electronics, and the like. Aspects can also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. However, some, if not all aspects of the claimed subject matter can be practiced on stand-alone computers. In a distributed computing environment, program modules may be located in one or both of local and remote memory storage devices.
With reference to
The processor(s) 820 can be implemented with a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any processor, controller, microcontroller, or state machine. The processor(s) 820 may also be implemented as a combination of computing devices, for example a combination of a DSP and a microprocessor, a plurality of microprocessors, multi-core processors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
The computer 810 can include or otherwise interact with a variety of computer-readable media to facilitate control of the computer 810 to implement one or more aspects of the claimed subject matter. The computer-readable media can be any available media that can be accessed by the computer 810 and includes volatile and nonvolatile media, and removable and non-removable media. By way of example, and not limitation, computer-readable media may comprise computer storage media and communication media.
Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media includes, but is not limited to memory devices (e.g., random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM) . . . ), magnetic storage devices (e.g., hard disk, floppy disk, cassettes, tape . . . ), optical disks (e.g., compact disk (CD), digital versatile disk (DVD) . . . ), and solid state devices (e.g., solid state drive (SSD), flash memory drive (e.g., card, stick, key drive . . . ) . . . ), or any other medium which can be used to store the desired information and which can be accessed by the computer 810.
Communication media typically embodies computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer-readable media.
Memory 830 and mass storage 850 are examples of computer-readable storage media. Depending on the exact configuration and type of computing device, memory 830 may be volatile (e.g., RAM), non-volatile (e.g., ROM, flash memory . . . ) or some combination of the two. By way of example, the basic input/output system (BIOS), including basic routines to transfer information between elements within the computer 810, such as during start-up, can be stored in nonvolatile memory, while volatile memory can act as external cache memory to facilitate processing by the processor(s) 820, among other things.
Mass storage 850 includes removable/non-removable, volatile/non-volatile computer storage media for storage of large amounts of data relative to the memory 830. For example, mass storage 850 includes, but is not limited to, one or more devices such as a magnetic or optical disk drive, floppy disk drive, flash memory, solid-state drive, or memory stick.
Memory 830 and mass storage 850 can include, or have stored therein, operating system 860, one or more applications 862, one or more program modules 864, and data 866. The operating system 860 acts to control and allocate resources of the computer 810. Applications 862 include one or both of system and application software and can exploit management of resources by the operating system 860 through program modules 864 and data 866 stored in memory 830 and/or mass storage 850 to perform one or more actions. Accordingly, applications 862 can turn a general-purpose computer 810 into a specialized machine in accordance with the logic provided thereby.
All or portions of the claimed subject matter can be implemented using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof to control a computer to realize the disclosed functionality. By way of example and not limitation, the data acquisition system 100, or portions thereof, can be, or form part, of an application 862, and include one or more modules 864 and data 866 stored in memory and/or mass storage 850 whose functionality can be realized when executed by one or more processor(s) 820.
In accordance with one particular embodiment, the processor(s) 820 can correspond to a system on a chip (SOC) or like architecture including, or in other words integrating, both hardware and software on a single integrated circuit substrate. Here, the processor(s) 820 can include one or more processors as well as memory at least similar to processor(s) 820 and memory 830, among other things. Conventional processors include a minimal amount of hardware and software and rely extensively on external hardware and software. By contrast, an SOC implementation of a processor is more powerful, as it embeds hardware and software therein that enable particular functionality with minimal or no reliance on external hardware and software. For example, the data acquisition system 100 and/or associated functionality can be embedded within hardware in an SOC architecture.
The computer 810 also includes one or more interface components 870 that are communicatively coupled to the system bus 840 and facilitate interaction with the computer 810. By way of example, the interface component 870 can be a port (e.g., serial, parallel, PCMCIA, USB, FireWire . . . ) or an interface card (e.g., sound, video . . . ) or the like. In one example implementation, the interface component 870 can be embodied as a user input/output interface to enable a user to enter commands and information into the computer 810 through one or more input devices (e.g., pointing device such as a mouse, trackball, stylus, touch pad, keyboard, microphone, joystick, game pad, satellite dish, scanner, camera, other computer . . . ). In another example implementation, the interface component 870 can be embodied as an output peripheral interface to supply output to displays (e.g., CRT, LCD, plasma . . . ), speakers, printers, and/or other computers, among other things. Still further yet, the interface component 870 can be embodied as a network interface to enable communication with other computing devices (not shown), such as over a wired or wireless communications link.
What has been described above includes examples of aspects of the claimed subject matter. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the claimed subject matter, but one of ordinary skill in the art may recognize that many further combinations and permutations of the disclosed subject matter are possible. Accordingly, the disclosed subject matter is intended to embrace all such alterations, modifications, and variations that fall within the spirit and scope of the appended claims.