The advent of a global communications network such as the Internet has facilitated the exchange of enormous amounts of information. Additionally, the costs associated with storing and maintaining such information have declined, resulting in massive data storage structures. Hence, substantial amounts of data can be stored as a data warehouse, which is a database that typically represents the business history of an organization. For example, such stored data is employed for analysis in support of business decisions at many levels, from strategic planning to performance evaluation of a discrete organizational unit. Such analysis can further involve taking the data stored in a relational database and processing it to make it a more effective tool for query and analysis.
Accordingly, data has become an important asset in almost every application, whether it is a Line-of-Business (LOB) application utilized for browsing products and generating orders, or a Personal Information Management (PIM) application used for scheduling a meeting between people. Applications perform both data access/manipulation and data management operations on the application data. Typical application operations query a collection of data, fetch the result set, execute some application logic that changes the state of the data, and finally, persist the data to the storage medium.
Traditionally, client/server applications relegated the query and persistence actions to database management systems (DBMS), deployed in the data tier. Data-centric logic, if any, is coded as stored procedures in the database system. The database system operated on data in terms of tables and rows, and the application, in the application tier, operated on the data in terms of programming language objects (e.g., Classes and Structs). The mismatch in data manipulation services (and mechanisms) in the application and the data tiers was tolerable in client/server systems. However, with the advent of web technology (and Service Oriented Architectures) and with wider acceptance of application servers, applications are becoming multi-tier, and more importantly, data is now present in every tier.
In such tiered application architectures, data is manipulated in multiple tiers. In addition, with hardware advances in addressability and large memories, more data is becoming memory resident. Applications are also dealing with different types of data such as objects, files, and XML (eXtensible Markup Language) data, for example.
In such hardware and software environments, the need for rich data access and manipulation services that are well integrated with the programming environments is increasing. One conventional implementation introduced to address the problems above is a data platform. The data platform provides a collection of services (mechanisms) for applications to access, manipulate, and manage data that is well integrated with the application programming environment. In general, such conventional architectures fail to adequately supply: complex object modeling, rich relationships, separation of logical and physical data abstractions, querying over rich data model concepts, active notifications, better integration with middle-tier infrastructure, and the like. Moreover, in these environments errors can build up context (e.g., nest) and become difficult to trace and unwrap to locate the error source.
The following presents a simplified summary in order to provide a basic understanding of some aspects described herein. This summary is not an extensive overview of the claimed subject matter. It is intended to neither identify key or critical elements of the claimed subject matter nor delineate the scope thereof. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is presented later.
The subject innovation provides systems and methods that present error messages in the context of entities to applications that issue rich queries, via an error propagation component. Such an error propagation component can preserve the original context for errors and operate across abstraction boundaries, to map from the entity model to the relational model. Accordingly, errors (e.g., concurrency errors from modifying the underlying database, and errors associated with operations that manipulate data of the underlying database) can be built up along the way (e.g., context/nesting), wherein troubleshooting processes can drill back down to the original cause and unwrap the context in reverse order; hence, the error messages are presented in the context of entities and not the underlying store.
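By way of illustration and not limitation, the following Python sketch conveys the general notion of wrapping an error with entity-level context at each abstraction boundary and unwrapping the chain in reverse order to reach the original cause; the class and function names (ContextualError, unwrap) and the message texts are hypothetical and are not part of the described platform.

```python
class ContextualError(Exception):
    """An error wrapped with the context of the layer that observed it."""
    def __init__(self, context, cause):
        super().__init__(f"{context}: {cause}")
        self.context = context
        self.__cause__ = cause  # preserve the original (lower-level) error

def unwrap(error):
    """Walk the chain of wrapped errors in reverse order of nesting."""
    chain = []
    while error is not None:
        chain.append(error)
        error = error.__cause__
    return chain  # the last element is the original, store-level cause

# Hypothetical usage: a store-level error is progressively wrapped with
# entity-level context as it crosses abstraction boundaries.
try:
    try:
        try:
            raise RuntimeError("duplicate primary key in table 'Categories'")
        except RuntimeError as store_err:
            raise ContextualError("mapping layer: extent 'Categories'", store_err)
    except ContextualError as mapped_err:
        raise ContextualError("entity layer: Category insert operation", mapped_err)
except ContextualError as top_err:
    for level in unwrap(top_err):
        print(level)
```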
In a related aspect, the error propagation component can further include a tracking component (that establishes a trail for the data to readily facilitate identifying where such data originated) and a reconstruction component (that can further reconstruct the context), wherein optimizations can be employed to minimize the information required to flow and to employ memory resources efficiently. Context information can attach to predetermined values (e.g., entity values mapped to every table) as opposed to every individual value, wherein the context information is subject to propagation behaviors, such as modification, merging, splitting, and elimination. The context carriers can also be chosen based on the mapping specification.
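By way of illustration and not limitation, the following Python sketch models a context carrier attached to an entity value rather than to each individual column value, together with the propagation behaviors named above; the ContextCarrier class and the origin strings are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ContextCarrier:
    """Hypothetical carrier of origin information attached to an entity value."""
    origins: set = field(default_factory=set)

    def modify(self, origin):
        # Record an additional origin as the value moves through operators.
        self.origins.add(origin)
        return self

    def merge(self, other):
        # For example, two rows joined into one result share their origins.
        return ContextCarrier(self.origins | other.origins)

    def split(self):
        # For example, one entity projected into two store rows.
        return ContextCarrier(set(self.origins)), ContextCarrier(set(self.origins))

    def eliminate(self):
        # Drop context once it can no longer contribute to error reporting.
        self.origins.clear()

# Usage sketch: one carrier per entity value mapped to a table.
order = ContextCarrier({"Orders!42"})
detail = ContextCarrier({"OrderDetails!42/7"})
print(order.merge(detail).origins)  # context follows the joined row
```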
In a related methodology, initially an application defines an operation (e.g., associated with queries) in terms of entity concepts. For example, the operation can be in the form of inserts, deletes, or updates; or a query that can then be represented by an abstract class in the form of a canonical representation, which has metadata tied thereto. In addition, such metadata can contain information about where data has originated, to designate a return address and identify which pieces of data travel together. Subsequently, as operators interact with the data, the return address can be interpreted at each stage, and upon occurrence of an error, the respective return addresses can be unraveled. Next, the data that contributed to the operation that failed can be identified by walking through the graph of return addresses.
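By way of illustration and not limitation, the following Python sketch shows a canonical tree whose nodes carry a return address, and how the graph of return addresses beneath a failed operator can be walked to identify the contributing data; the Node class and the address strings are hypothetical.

```python
class Node:
    """Hypothetical node of a canonical query tree; the return address records
    where the data contributing to this node originated."""
    def __init__(self, op, return_address, children=()):
        self.op = op
        self.return_address = return_address
        self.children = list(children)

def contributing_addresses(node):
    """Walk the graph of return addresses beneath a failed operator."""
    addresses = [node.return_address]
    for child in node.children:
        addresses.extend(contributing_addresses(child))
    return addresses

# Usage sketch: an insert over a join fails; the return addresses identify
# which pieces of data traveled into the failed operation.
products = Node("scan", "Products extent")
categories = Node("scan", "Categories extent")
failed_insert = Node("insert", "Product entity",
                     [Node("join", "Products x Categories", [products, categories])])
print(contributing_addresses(failed_insert))
```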
To the accomplishment of the foregoing and related ends, certain illustrative aspects of the claimed subject matter are described herein in connection with the following description and the annexed drawings. These aspects are indicative of various ways in which the subject matter may be practiced, all of which are intended to be within the scope of the claimed subject matter. Other advantages and novel features may become apparent from the following detailed description when considered in conjunction with the drawings.
The various aspects of the subject innovation are now described with reference to the annexed drawings, wherein like numerals refer to like or corresponding elements throughout. It should be understood, however, that the drawings and detailed description relating thereto are not intended to limit the claimed subject matter to the particular form disclosed. Rather, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the claimed subject matter.
The data storage system 135 can be a complex model based at least upon a database structure, wherein an item, a sub-item, a property, and a relationship are defined to allow representation of information within a data storage system as instances of complex types. For example, the data storage system 135 can employ a set of basic building blocks for creating and managing rich, persisted objects and links between objects. An item can be defined as the smallest unit of consistency within the data storage system 135, which can be independently secured, serialized, synchronized, copied, backed up/restored, and the like. Such an item can include an instance of a type, wherein all items in the data storage system 135 can be stored in a single global extent of items. The data storage system 135 can be based upon at least one item and/or a container structure. Moreover, the data storage system 135 can be a storage platform exposing rich metadata that is buried in files as items. The data storage system 135 can include a database, to support the above discussed functionality, wherein any suitable characteristics and/or attributes can be implemented. Furthermore, the data storage system 135 can employ a container hierarchical structure, wherein a container is an item that can contain at least one other item. The containment concept is implemented via a container ID property inside the associated class. A store can also be a container such that the store is a physical organizational and manageability unit. In addition, the store represents a root container for a tree of containers within the hierarchical structure. As such, queries defined by applications in terms of entity concepts can readily be employed in conjunction with relational data stores. Similarly, results obtained from executing the query can be converted back to a form understandable by the application. Accordingly, the form in which queries are written is abstracted, wherein data can be modeled in the same manner as employed in the associated applications 101, 103, 105 (1 to N, where N is an integer), so that queries need not be written in terms of how the data is stored in the database, but rather in terms of the abstraction. The error propagation component 110 can preserve the original context for errors and operate across abstraction boundaries, to map from the entity model to the relational model. Accordingly, errors (e.g., concurrency errors from modifying the underlying database) can be built up along the way (e.g., context/nesting), wherein troubleshooting processes can drill back down to the original cause and unwrap the context in reverse order; hence, the error messages are presented in the context of entity concepts 102 and not the underlying data storage system 135.
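By way of illustration and not limitation, the following Python sketch models items and containers in which containment is expressed through a container ID property on the item class, with the store acting as the root container; the class names and identifiers are hypothetical.

```python
class Item:
    """Hypothetical item: the smallest unit of consistency, optionally contained."""
    def __init__(self, item_id, container_id=None):
        self.item_id = item_id
        self.container_id = container_id  # containment via a container ID property

class Container(Item):
    """A container is itself an item that can contain at least one other item."""
    pass

# Usage sketch: the store is the root container for a tree of containers.
store = Container("store-root")
folder = Container("folder-1", container_id=store.item_id)
document = Item("doc-17", container_id=folder.item_id)
print(document.container_id)  # "folder-1"
```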
According to one particular aspect, the CDP 202 provides data services that are common across the application frameworks and end-user applications associated therewith. The CDP 202 further includes an API 208 that facilitates interfacing with the applications and application frameworks 204, and a runtime component 210, for example. The API 208 provides the programming interface for applications using CDP in the form of public classes, interfaces, and static helper functions. The CDP runtime component 210 is a layer that implements the various features exposed in the public API layer 208. It implements the common data model by providing object-relational mapping and query mapping, enforcing data model constraints, and the like. More specifically, the CDP runtime 210 can include: the common data model component implementation; a query processor component; a sessions and transactions component; an object cache, which can include a session cache and an explicit cache; a services component that includes change tracking, conflict detection; a cursors and rules component; a business logic hosting component; and a persistence and query engine, which provides the core persistence and query services. Internal to persistence and query services are the object-relational mappings, including query/update mappings.
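By way of illustration and not limitation, the following Python sketch conveys the layering in which a thin public API delegates to a runtime that performs the mapping and query work; the class names CdpApi and CdpRuntime are hypothetical stand-ins for the API 208 and the runtime component 210.

```python
class CdpRuntime:
    """Hypothetical runtime layer: would perform object-relational mapping,
    constraint enforcement, caching, and query processing."""
    def execute(self, canonical_query):
        return f"rows for: {canonical_query}"

class CdpApi:
    """Hypothetical public API layer: a thin facade over the runtime."""
    def __init__(self, runtime):
        self._runtime = runtime

    def query(self, entity_query):
        return self._runtime.execute(entity_query)

# Usage sketch: applications program against the API; the runtime does the work.
print(CdpApi(CdpRuntime()).query("SELECT VALUE p FROM Products AS p"))
```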
The store management layer 207 provides support for core data management capabilities (e.g., scalability, capacity, availability and security), wherein the CDP 202 supports a rich data model, mapping, querying, and data access mechanisms for the application frameworks 204. The CDP mechanisms are extensible so that multiple application frameworks 204 can be built on the data platform. The application frameworks 204 are additional models and mechanisms specific to application domains (e.g., end-user applications and LOB applications). Such a layered architectural approach supplies several advantages, e.g., allowing each layer to innovate and deploy independently and rapidly.
As such, queries defined by applications in terms of entity concepts can readily be employed in conjunction with relational data stores. Similarly, errors encountered from executing the query can be converted back to a form understandable by the application. Accordingly, the form in which queries are written can be abstracted, wherein data can be modeled in the same manner as employed in the associated applications (e.g., queries need not be written in terms of how the data is stored in the database, but can be supplied in an abstract form).
Initially, and at 410, a query is defined in terms of entity concepts. Such entity concepts can implement structure/object-oriented concepts such as inheritance, nesting, and the like. For example, the query can be parsed to facilitate creation of nodes for a tree structure, which functions as a canonical tree representation of the query. As such, a plurality of nodes can be obtained that form the canonical representation, which represents a structured form of the query. Moreover, the nodes can represent various relational and Entity constructs and operations, such as expressions. Next, and at 420, the generated canonical command representation can be translated into the query language and native dialect of the store provider, as sketched below. At 430, errors are encountered, which can include concurrency errors from modifying the underlying database, for example. Such errors can be built up along the way (e.g., context/nesting), wherein troubleshooting processes can drill back down to the original cause and unwrap the context in reverse order; hence, the error messages are presented in the context of entities at 440 (e.g., as opposed to displaying such errors in terms of the underlying store).
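By way of illustration and not limitation, the following Python sketch translates a toy canonical representation into a store-specific query dialect, corresponding to the step at 420; the tuple-based node encoding and the emitted SQL dialect are hypothetical.

```python
# Hypothetical canonical nodes encoded as (operator, argument, child) tuples.
CANONICAL = ("filter", "UnitPrice > 10", ("scan", "Products", None))

def to_store_sql(node):
    """Translate a canonical node into a (hypothetical) native store dialect."""
    op, arg, child = node
    if op == "scan":
        return f"SELECT * FROM [{arg}]"
    if op == "filter":
        return f"SELECT * FROM ({to_store_sql(child)}) AS t WHERE {arg}"
    raise ValueError(f"unsupported canonical operator: {op}")

print(to_store_sql(CANONICAL))
```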
As illustrated in
For example, user Alice attempts to add a new product category to the Northwind ObjectContext using a CategoryID colliding with an existing primary key value in the store, as indicated below:
The exception supplied from the store can include:
Such a store exception is typically not meaningful to user Alice, namely because the exception mentions store constructs (tables and primary keys) rather than entity constructs (extents and entity keys). Accordingly, the appropriate context cannot be easily inferred from the store exception alone. The SaveChanges method is an aggregate operator, so the store exception is ambiguous about the specific change causing the violation. In such a case, the subject innovation maintains sufficient information in the update pipeline to allow context wrapping of the store exception. Moreover, in this example, the update pipeline can track the cache entry or entries mapped to each store command.
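By way of illustration and not limitation, the following Python sketch shows an update pipeline that remembers which cache entries produced each store command, so that a store exception can be re-raised in terms of entities; the UpdateException class, the execute_store_command placeholder, and the message texts are hypothetical and do not reproduce the actual exceptions of any particular product.

```python
class UpdateException(Exception):
    """Hypothetical entity-level exception wrapping a store error."""
    def __init__(self, entries, cause):
        super().__init__(f"An error occurred while updating entries {entries}")
        self.entries = entries
        self.__cause__ = cause  # original store exception preserved for drill-down

def execute_store_command(command):
    # Placeholder for the actual store call; always fails in this sketch.
    raise RuntimeError("Violation of PRIMARY KEY constraint on table 'Categories'")

def save_changes(commands):
    """commands: a list of (store_command, cache_entries) pairs (hypothetical)."""
    for command, entries in commands:
        try:
            execute_store_command(command)
        except Exception as store_error:
            # Re-raise in terms of the cache entries tracked for this command.
            raise UpdateException(entries, store_error)

try:
    save_changes([("INSERT INTO Categories ...", ["Category(CategoryID=1)"])])
except UpdateException as ex:
    print(ex)            # entity-level message
    print(ex.__cause__)  # original store-level exception
```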
The constraint exception allows user Alice to resolve the collision without any knowledge of the underlying data store or of the mapping from the value layer to the store. In a related example, user Bob's Northwind entity data model includes the notion of a Product entity type and a derived DiscontinuedProduct type. User Bob defines an entity set "Products" of type Product. The mapping specification to the Northwind database fails to describe behavior for the DiscontinuedProduct type in the Products extent. It can be a requirement of mappings that the contents of extents are fully mapped, so this is a violation. The resulting exception indicates that there is an incomplete mapping specification, namely:
The AI component 830 can employ any of a variety of suitable AI-based schemes as described supra in connection with facilitating various aspects of the herein described innovation. For example, a process for learning explicitly or implicitly how tracing of data to its origin should be performed can be facilitated via an automatic classification system and process. Classification can employ a probabilistic and/or statistical-based analysis (e.g., factoring into the analysis utilities and costs) to prognose or infer an action that a user desires to be automatically performed. For example, a support vector machine (SVM) classifier can be employed. Other classification approaches that can be employed include Bayesian networks, decision trees, and probabilistic classification models providing different patterns of independence. Classification as used herein is also inclusive of statistical regression that is utilized to develop models of priority.
As will be readily appreciated from the subject specification, the subject innovation can employ classifiers that are explicitly trained (e.g., via generic training data) as well as implicitly trained (e.g., via observing user behavior, receiving extrinsic information) so that the classifier is used to automatically determine, according to predetermined criteria, which answer to return to a question. For example, with respect to SVMs, which are well understood, SVMs are configured via a learning or training phase within a classifier constructor and feature selection module. A classifier is a function that maps an input attribute vector, x=(x1, x2, x3, x4, . . ., xn), to a confidence that the input belongs to a class, that is, f(x)=confidence(class).
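By way of illustration and not limitation, the following Python sketch trains an SVM on made-up feature vectors and reports a signed margin that can serve as the confidence f(x); scikit-learn is assumed to be available, and the data is illustrative only.

```python
from sklearn.svm import SVC

# Made-up two-dimensional feature vectors x = (x1, x2) and class labels.
X = [[0.0, 0.1], [0.2, 0.0], [0.9, 1.0], [1.0, 0.8]]
y = [0, 0, 1, 1]

clf = SVC(kernel="linear").fit(X, y)

# The signed distance from the separating hyperplane acts as a confidence score.
sample = [[0.85, 0.9]]
print(clf.predict(sample), clf.decision_function(sample))
```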
The word “exemplary” is used herein to mean serving as an example, instance or illustration. Any aspect or design described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects or designs. Similarly, examples are provided herein solely for purposes of clarity and understanding and are not meant to limit the subject innovation or portion thereof in any manner. It is to be appreciated that a myriad of additional or alternate examples could have been presented, but have been omitted for purposes of brevity.
Furthermore, all or portions of the subject innovation can be implemented as a system, method, apparatus, or article of manufacture using standard programming and/or engineering techniques to produce software, firmware, hardware or any combination thereof to control a computer to implement the disclosed innovation. For example, computer readable media can include but are not limited to magnetic storage devices (e.g., hard disk, floppy disk, magnetic strips . . . ), optical disks (e.g., compact disk (CD), digital versatile disk (DVD) . . . ), smart cards, and flash memory devices (e.g., card, stick, key drive . . . ). Additionally, it should be appreciated that a carrier wave can be employed to carry computer-readable electronic data such as those used in transmitting and receiving electronic mail or in accessing a network such as the Internet or a local area network (LAN). Of course, those skilled in the art will recognize that many modifications may be made to this configuration without departing from the scope or spirit of the claimed subject matter.
In order to provide a context for the various aspects of the disclosed subject matter,
With reference to
The system bus 918 can be any of several types of bus structure(s) including the memory bus or memory controller, a peripheral bus or external bus, and/or a local bus using any variety of available bus architectures including, but not limited to, 8-bit bus, Industrial Standard Architecture (ISA), Micro-Channel Architecture (MSA), Extended ISA (EISA), Intelligent Drive Electronics (IDE), VESA Local Bus (VLB), Peripheral Component Interconnect (PCI), Universal Serial Bus (USB), Advanced Graphics Port (AGP), Personal Computer Memory Card International Association bus (PCMCIA), and Small Computer Systems Interface (SCSI).
The system memory 916 includes volatile memory 920 and nonvolatile memory 922. The basic input/output system (BIOS), containing the basic routines to transfer information between elements within the computer 912, such as during start-up, is stored in nonvolatile memory 922. By way of illustration, and not limitation, nonvolatile memory 922 can include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable ROM (EEPROM), or flash memory. Volatile memory 920 includes random access memory (RAM), which acts as external cache memory. By way of illustration and not limitation, RAM is available in many forms such as synchronous RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and direct Rambus RAM (DRRAM).
Computer 912 also includes removable/non-removable, volatile/non-volatile computer storage media.
It is to be appreciated that
A user enters commands or information into the computer 912 through input device(s) 936. Input devices 936 include, but are not limited to, a pointing device such as a mouse, trackball, stylus, touch pad, keyboard, microphone, joystick, game pad, satellite dish, scanner, TV tuner card, digital camera, digital video camera, web camera, and the like. These and other input devices connect to the processing unit 914 through the system bus 918 via interface port(s) 938. Interface port(s) 938 include, for example, a serial port, a parallel port, a game port, and a universal serial bus (USB). Output device(s) 940 use some of the same type of ports as input device(s) 936. Thus, for example, a USB port may be used to provide input to computer 912, and to output information from computer 912 to an output device 940. Output adapter 942 is provided to illustrate that there are some output devices 940 like monitors, speakers, and printers, among other output devices 940 that require special adapters. The output adapters 942 include, by way of illustration and not limitation, video and sound cards that provide a means of connection between the output device 940 and the system bus 918. It should be noted that other devices and/or systems of devices provide both input and output capabilities such as remote computer(s) 944.
Computer 912 can operate in a networked environment using logical connections to one or more remote computers, such as remote computer(s) 944. The remote computer(s) 944 can be a personal computer, a server, a router, a network PC, a workstation, a microprocessor based appliance, a peer device or other common network node and the like, and typically includes many or all of the elements described relative to computer 912. For purposes of brevity, only a memory storage device 946 is illustrated with remote computer(s) 944. Remote computer(s) 944 is logically connected to computer 912 through a network interface 948 and then physically connected via communication connection 950. Network interface 948 encompasses communication networks such as local-area networks (LAN) and wide-area networks (WAN). LAN technologies include Fiber Distributed Data Interface (FDDI), Copper Distributed Data Interface (CDDI), Ethernet/IEEE 802.3, Token Ring/IEEE 802.5 and the like. WAN technologies include, but are not limited to, point-to-point links, circuit switching networks like Integrated Services Digital Networks (ISDN) and variations thereon, packet switching networks, and Digital Subscriber Lines (DSL).
Communication connection(s) 950 refers to the hardware/software employed to connect the network interface 948 to the bus 918. While communication connection 950 is shown for illustrative clarity inside computer 912, it can also be external to computer 912. The hardware/software necessary for connection to the network interface 948 includes, for exemplary purposes only, internal and external technologies such as, modems including regular telephone grade modems, cable modems and DSL modems, ISDN adapters, and Ethernet cards.
What has been described above includes various exemplary aspects. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing these aspects, but one of ordinary skill in the art may recognize that many further combinations and permutations are possible. Accordingly, the aspects described herein are intended to embrace all such alterations, modifications and variations that fall within the spirit and scope of the appended claims.
Furthermore, to the extent that the term “includes” is used in either the detailed description or the claims, such term is intended to be inclusive in a manner similar to the term “comprising” as “comprising” is interpreted when employed as a transitional word in a claim.