Data persistence is a key requirement in any application, whether it is a consumer application or a line-of-business (LOB) application. For example, shell and media applications save documents, music, and photos; e-mail applications save message and calendar objects; and business application suites save customer and order objects. Almost all of these applications define an object model for the data and write their own persistence mechanisms.
A standard mechanism for describing, querying, and manipulating data is a relational database management system (RDBMS) based on SQL (Structured Query Language). The SQL data model is the language used to declaratively describe the structure of data in the form of tables, constraints, and so forth. However, data-intensive applications such as LOB applications find that SQL falls short of their needs in certain respects. First, the structure of their data is more complex than can be described with SQL. Second, they build their applications using object-oriented languages that can represent richer data structures than SQL can.
Developers of these applications address these shortcomings by describing their data with an object-oriented design implemented in a programming language such as C#. They then transfer the SQL data to and from objects, either manually or with some form of object-relational technology. Unfortunately, not every object-oriented design can be easily mapped to a given SQL implementation or, in some cases, to any SQL implementation, which leaves developers with a lot of manual programming work to deal with the differences.
Another problem is that the capabilities developers have come to know and appreciate from SQL are not available to them when their data is in the form of objects. For example, a query must be expressed in terms of the underlying database, not in terms of the objects that developers use for other tasks.
A solution is to provide a richer data model that is supported by a framework and by the database server or a supporting runtime. To the developer, it looks simply like a database with richer capabilities for describing and manipulating data. A common, simple, but rich data model would enable a common programming model for these applications and would allow application platforms to innovate on a common data access foundation. Consequently, there is an unmet need for a rich data model that provides the capability of a common programming model for multiple disparate applications.
The following presents a simplified summary in order to provide a basic understanding of some aspects of the disclosed innovation. This summary is not an extensive overview, and it is not intended to identify key/critical elements or to delineate the scope thereof. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is presented later.
The disclosed innovation is a rich data model called the common data model (CDM). It is supported by a platform that implements it called the common data platform (CDP). The CDM is the data model common to multiple application-specific data models. For example, it can support both PIM (personal information manager) end-user application data and line-of-business (LOB) data. Similarly, an application with its own data model, such as the Microsoft Windows™ SDM (system definition model), can specify its model on top of the CDM. The CDM enables improved interoperability between applications.
There are a significant number of data modeling and persistence concepts commonly used in applications that can be factored out into the common data model, thereby providing a rich persistence framework that can be leveraged by large numbers of applications. The CDM capabilities include subsuming relational concepts, defining a rich object abstraction for data, modeling rich semantics (e.g., relationships), minimizing mismatch between an application and the CDM, aligning with the CLR (Common Language Runtime) type system, supporting behaviors to enable development of mid-tier and client applications, and providing logical concepts. Modeling concepts capture the semantics independent of the data stores.
One example of where the CDM improves on SQL is in defining relationships. In SQL, a relationship between a Customer and an Order cannot be explicitly expressed. It is possible to express a foreign key constraint from which the relationship may be inferred, but a foreign key is just one of many ways to implement relationships. In the CDM, a relationship can be expressed explicitly and has attributes in the same way as a table definition has attributes. Relationships are a first class citizen. A second example is that the CDM has a type system for objects, which enables it to integrate more naturally with the CLR.
In another aspect thereof, an alternative implementation of the CDM is provided wherein relationships are defined at a top level using <Association> or <Composition> elements. Accordingly, there is no need to define a property on the source (or parent) in order to define a Ref association (or composition).
To the accomplishment of the foregoing and related ends, certain illustrative aspects of the disclosed innovation are described herein in connection with the following description and the annexed drawings. These aspects are indicative, however, of but a few of the various ways in which the principles disclosed herein can be employed, and the disclosed innovation is intended to include all such aspects and their equivalents. Other advantages and novel features will become apparent from the following detailed description when considered in conjunction with the drawings.
The innovation is now described with reference to the drawings, wherein like reference numerals are used to refer to like elements throughout. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding thereof. It may be evident, however, that the innovation can be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to facilitate a description thereof.
As used in this application, the terms “component” and “system” are intended to refer to a computer-related entity, either hardware, a combination of hardware and software, software, or software in execution. For example, a component can be, but is not limited to being, a process running on a processor, a processor, a hard disk drive, multiple storage drives (of optical and/or magnetic storage medium), an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a server and the server can be a component. One or more components can reside within a process and/or thread of execution, and a component can be localized on one computer and/or distributed between two or more computers.
While certain ways of displaying information to users are shown and described with respect to certain figures as screenshots, those skilled in the relevant art will recognize that various other alternatives can be employed. The terms “screen,” “web page,” and “page” are generally used interchangeably herein. The pages or screens are stored and/or transmitted as display descriptions, as graphical user interfaces, or by other methods of depicting information on a screen (whether personal computer, PDA, mobile telephone, or other suitable device, for example) where the layout and information or content to be displayed on the page is stored in memory, database, or another storage facility.
Referring initially to the drawings,
The CDM 100 is the data model common to multiple application-specific data models. For example, it can support both PIM (Personal Information Manager) end-user application data and Line-of-Business (LOB) data. Similarly, an SDM-type (Windows™ System Definition Model) application can specify its model on top of the CDM 100. The CDM 100 enables improved interoperability between applications.
There are a significant number of data modeling and persistence concepts that can be factored out into the CDM, thereby establishing a common vocabulary for describing data and enabling a common set of services that can benefit all applications, such as an object-relational persistence framework. An indirect goal of the CDM 100 is to free applications from defining their own persistence infrastructure and also to enable higher levels of application interoperability across different data stores. Other goals include subsuming relational concepts, defining a rich object abstraction for data, modeling rich semantics (e.g., relationships), minimizing mismatch between an application and the CDM, aligning with the CLR (Common Language Runtime) type system, supporting behaviors to enable development of mid-tier and client applications, and providing logical concepts. Modeling concepts capture the semantics independent of the data stores.
The CDM of the subject innovation provides at least the following novel aspects.
Following is a textual description of the CDM type system over which any algebra will operate. Indentation indicates the indented type is a kind of the “outdented” type; for example, an Array type is an Inline type.
At 200, a schema is provided that defines a namespace for scoping schema definitions. At 202, entity types are defined for grouping properties and methods. At 204, a table set entity is defined whose properties are tables. At 206, semantic connections are expressed between entities using relationships (e.g., associations, compositions, . . . ).
Entities model real-world objects. An entity is a data object that is uniquely identifiable in the CDM by its identity (key). An entity is the smallest unit that can be shared (referenced) by its identity. An entity has structure (e.g., properties) and behavior (e.g., methods). Some examples of different types of entities are Order, Customer, Contact, Document, etc. Entities are similar to typed rows in SQL99 or objects in ODBMSs. Entities are defined as instances of entity types. Below, for example purposes only, is syntax for an entity definition:
Every entity has a unique identity that is made up of entity key values. This identity is the basis for forming a reference to the entity. An entity key is a set of one or more properties of the entity. Every non-abstract entity type definition must specify the key properties or inherit the key specification from a base entity type. The values of the key properties may be user-defined or system generated.
An entity's identifier is formed from the entity's key plus the identifier of the entity's containing, or parent, entity. A parent entity is the entity containing the table where the (child) entity is stored. The key of an entity is only required to be unique within its table; another table can contain an entity with the same entity key value. Thus an entity identifier is made unique by combining its key values with its parent's identifier. In some instances, a key can be unique store-wide, such as through the use of a globally unique identifier (GUID). The CDM only requires that the identifier be unique within an element (e.g., an EntitySet).
An identity completely identifies an entity and can be dereferenced to return the entity instance. A reference uses the identity: given an entity, its reference value can be obtained. Two entities are the same if and only if their identities are the same. The syntax of a reference type in the CDM is “Ref(<entity_type>)”, and properties can be of a ref type; such a property is called a ref property. Ref values can be persisted; they are durable references to entities. Copying ref values does not copy the entities to which they refer. Ref values could also provide navigation within a query language, for example, through Ref and Deref operators.
References enable the sharing of entities. For example, an order entity can have a Customer ref property. All orders for the same customer will have the same value for the customer ref property. The structure of a ref is implementation defined. Refs and keys can be exposed as types in an API. A ref implementation includes identifier information for the entity it references, including key values and possibly the table where the entity is found. An implementation could store a ref as individual key values (enabling efficient joins) or as a single opaque value. Functions could expose the structure of a ref to get key values or the table containing the entity.
The CDM consists of two core concepts: entity and relationship. An entity is a set of closely related data with a single identity. A relationship is a mechanism that relates two or more entities.
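The identifier rule can be made concrete with a minimal Python sketch. It assumes an identifier is modeled as the chain of (table name, key) steps from the root down; recording the table name is an assumption, and `EntityId`, `ROOT`, and `child_id` are hypothetical names, not part of the CDM.

```python
from typing import NamedTuple, Tuple


class EntityId(NamedTuple):
    """Hypothetical identifier: the (table name, key) steps from the root down."""
    path: Tuple[Tuple[str, object], ...]


ROOT = EntityId(())


def child_id(parent: EntityId, table: str, key: object) -> EntityId:
    # An entity's identifier is its parent's identifier plus its own key
    # (this sketch also records the table name, which is an assumption).
    return EntityId(parent.path + ((table, key),))


retail = child_id(ROOT, "Companies", "Retail")
engineering = child_id(ROOT, "Companies", "Engineering")

# The same key value (1) appears in two different Orders tables.
order_a = child_id(retail, "Orders", 1)
order_b = child_id(engineering, "Orders", 1)

print(order_a == order_b)  # False: same key, different parents
```

Because the key only has to be unique within its own table, the parent's identifier is what keeps the two orders distinct.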
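The following minimal Python sketch illustrates ref properties, sharing, and dereferencing. It assumes a ref stores a table path plus key values (one of the options mentioned above), and it uses a toy in-memory `store`; `Ref`, `ref_of`, and `deref` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Ref:
    """Hypothetical durable reference: a table path plus key values, not the entity."""
    table: str
    key: Tuple[object, ...]


@dataclass
class Customer:
    customer_id: int
    name: str


@dataclass
class Order:
    order_id: int
    customer: Ref  # a ref property, i.e. Ref(Customer)


# A toy store mapping table paths to {key values: entity}.
store: Dict[str, Dict[Tuple[object, ...], object]] = {
    "/SalesData/Customers": {(7,): Customer(7, "Contoso")},
}


def ref_of(table: str, entity: Customer) -> Ref:
    # Given an entity, its reference value is obtained from its key.
    return Ref(table, (entity.customer_id,))


def deref(ref: Ref) -> object:
    # Dereferencing a ref returns the entity instance it identifies.
    return store[ref.table][ref.key]


cust = store["/SalesData/Customers"][(7,)]
r = ref_of("/SalesData/Customers", cust)
o1 = Order(1, r)
o2 = Order(2, r)  # copying the ref value does not copy the customer

print(o1.customer == o2.customer)  # True: both orders share the same customer
print(deref(o2.customer) is cust)  # True
```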
An Association 302 is the most general form of relationship 300 between two or more entities. The entities, called ends, are related to one another via an explicit source-target relationship (like a foreign key/primary key relationship) or via a query. Each of the ends in the relationship remains independent of the other ends. It is possible to cause one end to be deleted when the other end is deleted, or to prevent one end from being deleted as long as the other end exists.
A Composition 304 relates a parent entity to a child entity (or entities) in such a way that the child entity is conceptually an integral part of the parent entity. The child entity lives in exactly one parent and therefore must always be deleted when the parent entity is deleted. Further, its identity need only be unique among the other child entities in that composition.
An Association Entity 306 is defined where two or more entities, the ends, are linked together by relationships on a separate entity, the association entity, which itself may have properties. Each of the ends remains conceptually independent of the others.
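The delete semantics of associations and compositions can be sketched in a few lines of Python. This is a minimal illustration, assuming an in-memory object graph: `Node`, `Association`, and the `on_delete` values "none", "cascade", and "restrict" are hypothetical, and the sketch does no transactional rollback if a restriction is hit partway through a cascade.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical in-memory entity; `children` models a composition."""
    name: str
    children: List["Node"] = field(default_factory=list)
    deleted: bool = False


@dataclass
class Association:
    """Hypothetical peer-to-peer link from `source` to `target`."""
    source: Node
    target: Node
    on_delete: str = "none"  # effect on `source` when `target` is deleted


def delete(node: Node, associations: List[Association]) -> None:
    # Restrict: the target cannot be deleted while a live source still refers to it.
    for a in associations:
        if a.target is node and a.on_delete == "restrict" and not a.source.deleted:
            raise ValueError(f"{node.name} is still referenced by {a.source.name}")
    node.deleted = True
    # Composition: children are integral parts, so they always go with the parent.
    for child in node.children:
        if not child.deleted:
            delete(child, associations)
    # Cascade: deleting the target end also deletes the source end.
    for a in associations:
        if a.target is node and a.on_delete == "cascade" and not a.source.deleted:
            delete(a.source, associations)


customer = Node("Customer")
line = Node("OrderLine")
order = Node("Order", [line])
links = [Association(order, customer, on_delete="restrict")]

try:
    delete(customer, links)
except ValueError as err:
    print(err)  # Customer is still referenced by Order

delete(order, links)     # also deletes the OrderLine (composition)
delete(customer, links)  # now allowed: no live order refers to it
print(line.deleted, customer.deleted)  # True True
```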
Entity members have member types and/or take typed parameters. The following kinds of types, Inline Types 412 and Table Types 414, are available when describing the members of an entity. An Inline Type is a type whose data is stored inline on an entity. Like entity types, inline types are composed of members. Unlike entity types, inline types have no identity beyond that imposed by the entity within which they reside. Inline types can be declared directly and encompass several other kinds of types in the data model. Inline Types include the following.
A Simple Inline Type 416 is an inline type that has no internal structure that is visible in the common data model. CLR value types are simple types in the common data model. An Enumeration Type 418 is a set of named values. Enumeration types 418 are simple types that can be extended independently and simultaneously by multiple developers without fear of conflict. An Entity Reference Type 420 is a durable reference to a single entity, possibly including a reference to the table in which that entity resides. Entity references are used in conjunction with associations to relate two entities. A Table Reference Type 422 is a durable reference to a table. An Array Type 424 is an ordered collection of instances of an inline type other than array.
A Table Type 414 is an unordered collection of instances of a specified entity type. Tables are used in conjunction with compositions to relate two entities. All of the types listed above are contained types; that is, values of these types can be contained by an entity.
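The kinds of types above form a small hierarchy, which the following minimal Python sketch models with classes. The class names mirror the prose but are hypothetical, treating Enumeration Type as a kind of Simple Type follows the statement that enumeration types are simple types, and `is_contained` is an illustrative helper rather than a CDM API.

```python
from dataclasses import dataclass


class CdmType:
    """Root of a hypothetical Python model of the CDM kinds of types."""


class InlineType(CdmType):
    """Data stored inline on an entity, with no identity of its own."""


@dataclass(frozen=True)
class SimpleType(InlineType):
    name: str  # e.g. a CLR value type such as "System.Int32"


@dataclass(frozen=True)
class EnumerationType(SimpleType):
    values: tuple = ()  # the set of named values


@dataclass(frozen=True)
class EntityType(CdmType):
    name: str


@dataclass(frozen=True)
class TableType(CdmType):
    element: EntityType  # unordered collection of entities of this type


@dataclass(frozen=True)
class EntityRefType(InlineType):
    target: EntityType  # durable reference to a single entity


@dataclass(frozen=True)
class TableRefType(InlineType):
    target: TableType  # durable reference to a table


@dataclass(frozen=True)
class ArrayType(InlineType):
    element: InlineType

    def __post_init__(self) -> None:
        # An array is an ordered collection of a non-array inline type.
        if not isinstance(self.element, InlineType) or isinstance(self.element, ArrayType):
            raise TypeError("array elements must be a non-array inline type")


def is_contained(t: CdmType) -> bool:
    # Inline and table types can be contained by an entity; entity types cannot.
    return isinstance(t, (InlineType, TableType))


customer = EntityType("Customer")
phones = ArrayType(SimpleType("System.String"))
print(is_contained(phones), is_contained(TableType(customer)), is_contained(customer))
# True True False
```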
An entity's type describes the members of the entity. Entity types can be derived from a base entity type, in which case the derived entity type contains all the members of the base type along with the members described for the derived type. Entity types can be extended independently and simultaneously by multiple developers without fear of conflict. Such entity extension types do not depend on type inheritance. Entity and entity extension types are not contained types.
A table set is an instance of an entity type that has table-valued properties. Declaring a table set creates a single named instance of the type and thus each of the tables it contains. A table set creates the location for storing data similar to the way that creating a database creates a location for storing data.
Referring now to
Associations can be implemented in a number of different ways. One approach is to use a reference (which is like a pointer). In the previous example of
Another approach is to use a conditional association, which is a relationship described in terms of properties: two properties, or a set of properties, related to each other in some way. A common value relationship is one in which two entities are related if they have the same value for the related properties. An example is a document entity that has an author name property and a contact entity that has a contact name property. A relationship can be set up between the author name property of the document entity and the contact name property of the contact entity. If those property values are the same, then there is a relationship between those two entities. This can be generalized to an arbitrary expression that says “if this expression is true, then these entities are related”. For example, if a first entity has a first rectangle property and a second entity has a second rectangle property, a relationship can be defined such that the first and second entities are related if the second entity's rectangle completely encloses the first entity's rectangle.
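Both forms of conditional association amount to a predicate over pairs of entities, as the minimal Python sketch below illustrates. The helper `related` and the predicates are hypothetical. The sketch assumes screen-style rectangles (top no greater than bottom) and treats touching edges as enclosed.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple


def related(lefts: Sequence, rights: Sequence,
            condition: Callable[[Any, Any], bool]) -> List[Tuple]:
    # A conditional association: a pair is related exactly when the condition holds.
    return [(left, right) for left in lefts for right in rights if condition(left, right)]


@dataclass
class Document:
    title: str
    author_name: str


@dataclass
class Contact:
    name: str


def same_author(doc: Document, contact: Contact) -> bool:
    # Common value relationship: equal property values mean "related".
    return doc.author_name == contact.name


@dataclass(frozen=True)
class Rect:
    left: float
    top: float
    right: float
    bottom: float

    def encloses(self, other: "Rect") -> bool:
        return (self.left <= other.left and self.top <= other.top
                and self.right >= other.right and self.bottom >= other.bottom)


@dataclass
class Shape:
    name: str
    bounds: Rect


def enclosed_by(first: Shape, second: Shape) -> bool:
    # Arbitrary expression: related if the second rectangle encloses the first.
    return second.bounds.encloses(first.bounds)


docs = [Document("Spec", "Ann"), Document("Memo", "Bob")]
contacts = [Contact("Ann"), Contact("Cy")]
print([(d.title, c.name) for d, c in related(docs, contacts, same_author)])
# [('Spec', 'Ann')]

inner = Shape("inner", Rect(1, 1, 2, 2))
outer = Shape("outer", Rect(0, 0, 5, 5))
print(enclosed_by(inner, outer), enclosed_by(outer, inner))  # True False
```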
A third approach is an association entity, where an entity is used to make the connection. This can be used to have properties on an association.
When an entity (e.g., SalesData) is defined, it can be related to many different entities. For example, a company has a set of orders and a set of customers inside it, which illustrates two different compositions. An Order has order lines inside it, so that a tree starts to take shape. At the top of the tree is a special entity called the table set entity. A table set is a single instance of some entity type. (This construct is analogous to the SQL notion of a database.) A table set provides a means of data declaration. In the figure, it is depicted in white (rather than as a shaded box), since it is different from the other entities. The LOB example shows a SalesData table set (or “database”) with one or more companies, addresses, customers and orders. A Customer entity can be part of only a single company at a time, for example, either the company representing the residential business or the company representing the commercial business. To illustrate the concept of entity relationships in the LOB model of
An aspect of composition in the CDM is that if a company sets up orders, that set of orders is disjoint from another company's set of orders. For example, the orders from retail never intersect or overlap with the orders from engineering. Composition means that these sets are separate, do not overlap, and are not shared. As indicated supra, associations are represented by the lines with arrows. Thus, an order can refer to a customer, and another order (not shown) can be generated that refers to the same customer. Here, Order need not refer only to its immediate peer (Customer), but can refer directly to the contained Address entity. More than one order can refer to an Address, which is labeled the BillingAddress. Conceptually, the order will be billed to the person at that address. However, each OrderLine of the order can be shipped to a separate shipping address referenced from that line. Each of those addresses comes from the same customer that is referenced by the order that is the parent of the order line. Accordingly, references extend over to the addresses that belong to the customer.
Another aspect of the CDM allows the introduction of constraints. Consider the three association lines: from Order to Customer, from Order to Address, and from OrderLine to Address. Nothing requires that the customer addresses referenced by the order or its lines belong to the order's customer; the order or its lines could refer to an address of another customer. There are, however, constraints in the CDM that can be applied to ensure that the address is that of the appropriate customer. This is referred to as the “scoping” of associations. With respect to the association between the Order entity and the Customer entity, a constraint is that the Order may only refer to a Customer in the same Company. That is, they have a “common ancestor.”
With respect to the association between the Order entity and the Address entity, a constraint is that the Order can only refer to an Address in the aggregate table /SalesData/Companies/Customers/Addresses. Aggregate tables are addressed in constraints and queries by a property path. Here, the SalesData table set type has a table property 800 named “Companies”. Each Customer entity has its own table of Address entities. The aggregate of those tables (effectively their union) is addressed by the table name /SalesData/Companies/Customers/Addresses at the Address entity.
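Scoping constraints can be checked by comparing identifiers, as the minimal Python sketch below illustrates. It assumes identifiers are chains of (table property, key) steps below the SalesData table set, so dropping the keys yields the aggregate table path. The rule that the address must belong to the order's own customer is an assumed, stricter reading of “appropriate customer”, and all function names are hypothetical.

```python
from typing import Tuple

# Hypothetical identifier: ((table property, key), ...) below the SalesData table set.
Id = Tuple[Tuple[str, object], ...]


def parent(ident: Id) -> Id:
    return ident[:-1]


def table_path(ident: Id) -> str:
    # Dropping the key values yields the aggregate table the entity belongs to.
    return "/SalesData/" + "/".join(table for table, _ in ident)


def same_company(order: Id, customer: Id) -> bool:
    # Scoping: an Order may refer only to a Customer with the same parent Company.
    return parent(order) == parent(customer)


def valid_address(order_customer: Id, address: Id) -> bool:
    # The address must come from the aggregate Addresses table and, more strictly,
    # belong to the customer that the order itself references.
    return (table_path(address) == "/SalesData/Companies/Customers/Addresses"
            and parent(address) == order_customer)


retail: Id = (("Companies", "Retail"),)
customer = retail + (("Customers", 7),)
order = retail + (("Orders", 1),)
home = customer + (("Addresses", 1),)
elsewhere = (("Companies", "Engineering"), ("Customers", 9), ("Addresses", 1))

print(same_company(order, customer))       # True
print(valid_address(customer, home))       # True
print(valid_address(customer, elsewhere))  # False
```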
In
As indicated supra, another aspect provided by the CDM is aggregation of tables. For example, the table of order lines for one order is disjoint from the table of order lines for another order. However, it may be of interest to know how many of a certain item were ordered, irrespective of order. Thus, there is a need to look through all order lines, across all orders, for the number of widgets ordered. Aggregate tables are the means to do this: they make it possible to look at all order lines across many different orders, or across other entity types.
Following is a more detailed description of the CDM of the disclosed innovation. In the CDM, keys are defined on the type; in contrast, in SQL99 keys are defined on tables rather than on the row type definition. While decoupling key definition from the entity type definition may seem flexible (or extensible), it actually limits the reusability and portability of types. If keys are defined on tables, then type-specific behavior may not work across different tables. That is, there is no guarantee that some business logic (say, creating customers and orders and relating them) written on entity types will work across different stores of those entity types, which dilutes the reusability of types. In SQL99, this is not an issue, as it does not specify how types are mapped to client/mid-tier application programming environments. The lack of identity on entity types in SQL99 forces one to map typed tables, rather than entity types, to programming language objects (classes). In addition, associating identity with tables does not accommodate transient entities. In the CDM, identity is associated with the entity type in order to support reusable types and tier-agnostic type behaviors.
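An aggregate table behaves like the union of many disjoint child tables. The minimal Python sketch below illustrates this with in-memory objects; `all_order_lines` is a hypothetical stand-in for addressing the aggregate OrderLines table by its property path.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class OrderLine:
    item: str
    quantity: int


@dataclass
class Order:
    order_id: int
    # Composition: each order owns its own, disjoint table of lines.
    lines: List[OrderLine] = field(default_factory=list)


def all_order_lines(orders: List[Order]) -> Iterator[OrderLine]:
    # The aggregate table: effectively the union of every order's OrderLines table.
    for order in orders:
        yield from order.lines


orders = [
    Order(1, [OrderLine("widget", 3), OrderLine("gadget", 1)]),
    Order(2, [OrderLine("widget", 5)]),
]

widgets = sum(line.quantity for line in all_order_lines(orders) if line.item == "widget")
print(widgets)  # 8
```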
Entity Persistence. Entities can be created by invoking a constructor (new) method of the entity type; entities are persisted by adding them to a table. In the CDM, an entity table is a typed collection of entities. The entity tables are similar to SQL99 typed tables, but they are logical. That is, an entity table may be mapped to one or more physical SQL tables. The table of an entity type is referred to as an entity table of the type.
The lifetime of an entity is dependent on the lifetime of the table of which it is a member. When a table is deleted, the entities in it are also deleted. Entities can also be explicitly deleted.
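The entity lifecycle described above can be sketched as follows. This minimal Python example assumes a logical entity table is modeled as a dictionary keyed by the type-defined key; `EntityTable`, `add`, `delete`, and `drop` are hypothetical names. Because the key lives on the type, a transient entity has an identity before it is stored.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Customer:
    customer_id: int  # key property, declared on the entity type
    name: str

    @property
    def key(self) -> int:
        return self.customer_id


class EntityTable:
    """Hypothetical logical entity table; keys are unique only within one table."""

    def __init__(self) -> None:
        self._rows: Dict[int, Customer] = {}

    def add(self, entity: Customer) -> None:
        # Persisting an entity means adding it to a table.
        if entity.key in self._rows:
            raise ValueError(f"duplicate key {entity.key} in this table")
        self._rows[entity.key] = entity

    def delete(self, key: int) -> None:
        del self._rows[key]  # explicit deletion of a single entity

    def drop(self) -> None:
        self._rows.clear()  # deleting the table deletes every entity in it

    def __len__(self) -> int:
        return len(self._rows)


c = Customer(7, "Contoso")  # transient: created by the constructor, not yet stored
print(c.key)                # 7: the identity exists before the entity is persisted

residential, commercial = EntityTable(), EntityTable()
residential.add(c)
commercial.add(Customer(7, "Fabrikam"))  # the same key is fine in another table
print(len(residential), len(commercial))  # 1 1
```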
Entity tables are defined by defining a composition or specifying a property in an entity type. Logically an instance of the table is created when an instance of the entity type is created and deleted when the entity instance is deleted (however, the physical SQL table is generally created when the schema that defines the type is installed and exists until the schema is uninstalled).
Such properties define a composition relationship between parent and child entity types (Order and OrderLine, respectively, in this case).
Any number of tables can be created to store instances of a given entity type. Each table is completely independent (keys are unique only in the scope of a single table, etc.). There is no global “extent” of all instances of a given type that can be queried.
An entity type can be constrained to a single table by including a Table attribute on the <EntityType> element. This is useful when the entity will contain behaviors that depend on the existence of tables of other entity types. For example, the Order type is likely to depend on the existence of a Customer table (and vice versa), and this is reflected by the inclusion of the Table attributes in the example below:
Placing the table constraint on the entity precludes having more than one table of that type. A less restrictive approach is to place a constraint on an association, as described in the Association section in the chapter on Relationships.
Table Sets. A table set type is a restricted form of an entity type. Table set types may have only reference-valued and table-valued properties, computed properties, and/or methods. For example:
A table set is an instance of a table set type. Each table set instance has a name that is unique within a given store. Table set instances can be declared in a schema or created dynamically using operations provided by a store. An example table set instance declaration from a schema is shown below:
The table set name along with a table property name can be used in the FROM clause of a query. For example:
A table set type may declare a default table set instance. For example:
It is also possible to aggregate previously defined table sets into a new table set. This is useful when combining data from two separate applications into a single application. Note that the SalesData entity type in the example above is abstract. This is because non-abstract entity types must specify key properties, and properties of simple types are not allowed in an entity type used for a table set.
Entity Types vs. Inline Types. An inline type is a non-entity type. Inline types are similar to structure types; they are just values. They do not have any identity; each inline instance is distinct, even if two instances have identical values. In the CDM, inline types can only be used as the type of entity properties. Inline type values are stored inline with the entity they are part of. Since an inline type instance does not have its own identity, it is not referenceable by itself; it can be reached only through the entity property holding it. Below is an example inline type definition:
Though both entity types and inline types are strongly typed and have similar structure, they have distinct persistence, sharing, and operational semantics. Inline type instances are not persisted by themselves; they are structurally part of an entity type. Inline type instances cannot be shared; each instance usage is exclusive. Inline type instances are not targets of most operations, such as copy, move, delete, backup, and restore.
Due to the above semantic differences, it is important to provide distinct inline and entity type concepts so that applications can program appropriately. In SQL99, the inline and entity type concepts are not modeled explicitly. In SQL99, there are just “user defined types”. If a type is used as the type of a column, it behaves like an inline type; if it is used to define a table, it acts as an entity type (row type). Since keys are defined on tables, only rows of typed tables have identity. Since types do not have keys, in SQL, when reasoning about type instances, one has to talk in terms of type instances with keys and instances without keys.
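The difference in sharing and persistence semantics can be illustrated with a minimal Python sketch. It assumes an entity copies any inline value it is given, so no two entities share one, and it uses `dataclasses.asdict` as a stand-in for persisting an entity record. `Address`, `Contact`, `manager_id`, and `to_record` are hypothetical names.

```python
from dataclasses import asdict, dataclass, replace
from typing import Optional


@dataclass
class Address:
    """Inline type: a value with no key, stored as part of its entity."""
    street: str
    city: str


@dataclass
class Contact:
    """Entity type: contact_id is the key, so a Contact can be referenced."""
    contact_id: int
    home: Address                     # inline property
    manager_id: Optional[int] = None  # reference to another Contact, kept as its key

    def __post_init__(self) -> None:
        # Inline instances are never shared: each entity keeps its own copy.
        self.home = replace(self.home)


def to_record(contact: Contact) -> dict:
    # The inline Address is persisted inside the entity's record,
    # while the reference persists only the other entity's key.
    return asdict(contact)


addr = Address("1 Main St", "Redmond")
boss = Contact(1, addr)
emp = Contact(2, addr, manager_id=boss.contact_id)
emp.home.city = "Seattle"

print(boss.home.city)  # Redmond: the inline value was not shared
print(to_record(emp))
# {'contact_id': 2, 'home': {'street': '1 Main St', 'city': 'Seattle'}, 'manager_id': 1}
```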
In the CDM, entity types and inline types are explicitly modeled as separate concepts with separate syntactic specification.
Data Concurrency. Data concurrency can be managed either optimistically or pessimistically. In either case the unit of concurrency management is the entity, either for conflict detection in optimistic concurrency or locking in pessimistic concurrency. The data model allows that different conflict detection schemes may be employed and considers it a policy decision of the data model implementation and the developer, if the data model implementation gives them that flexibility (e.g., the implementation could ignore conflicts if none of the properties being updated had changed).
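As an illustration of the entity as the unit of optimistic conflict detection, here is a minimal Python sketch based on a per-entity version counter. The version-counter scheme is one assumed conflict-detection policy, not a CDM requirement. It rejects any stale write, whereas an implementation could, as noted above, ignore conflicts when the updated properties had not changed. `OptimisticStore` and `ConflictError` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


class ConflictError(Exception):
    """Raised when an entity changed after it was read."""


@dataclass
class StoredEntity:
    values: Dict[str, object]
    version: int = 0


class OptimisticStore:
    """Hypothetical store in which the whole entity is the unit of conflict detection."""

    def __init__(self) -> None:
        self._data: Dict[int, StoredEntity] = {}

    def insert(self, key: int, values: Dict[str, object]) -> None:
        self._data[key] = StoredEntity(dict(values))

    def read(self, key: int) -> Tuple[Dict[str, object], int]:
        entity = self._data[key]
        return dict(entity.values), entity.version  # snapshot plus its version

    def update(self, key: int, changes: Dict[str, object], read_version: int) -> None:
        entity = self._data[key]
        if entity.version != read_version:
            # Another writer updated this entity since the caller read it.
            raise ConflictError(f"entity {key} is at v{entity.version}, not v{read_version}")
        entity.values.update(changes)
        entity.version += 1


store = OptimisticStore()
store.insert(1, {"name": "Contoso", "city": "Redmond"})
_, alice_version = store.read(1)
_, bob_version = store.read(1)

store.update(1, {"city": "Seattle"}, alice_version)  # succeeds; version becomes 1
try:
    store.update(1, {"name": "Fabrikam"}, bob_version)  # stale: read at version 0
except ConflictError as err:
    print("conflict:", err)  # conflict: entity 1 is at v1, not v0
```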
In pessimistic concurrency, locks are taken on a complete entity excluding its nested tables. Thus if an entity is read with a lock, the read will fail if another user has the entity locked. However, if only a child entity is locked the read of the parent will succeed. The data model allows that different locking schemes may be employed and considers it a policy decision of the data model implementation and the developer, if the data model implementation gives them that flexibility.
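To illustrate that a lock covers a complete entity but not its nested tables, the following minimal Python sketch keeps one lock per entity identifier. It assumes identifiers are (table, key) chains, so a child entity in a nested table has its own identifier and therefore its own lock. `LockManager` is a hypothetical, single-threaded lock table, not a real locking API.

```python
from typing import Dict, Tuple

# Hypothetical entity identifier: ((table, key), ...) from the root down.
Id = Tuple[Tuple[str, object], ...]


class LockManager:
    """Hypothetical pessimistic lock table: one lock per entity, none per nested table."""

    def __init__(self) -> None:
        self._owners: Dict[Id, str] = {}

    def acquire(self, entity: Id, user: str) -> bool:
        # Locking an entity covers its own properties only; its child entities in
        # nested tables have separate identifiers and therefore separate locks.
        owner = self._owners.get(entity)
        if owner is not None and owner != user:
            return False
        self._owners[entity] = user
        return True

    def release(self, entity: Id, user: str) -> None:
        if self._owners.get(entity) == user:
            del self._owners[entity]


order: Id = (("Orders", 1),)
line: Id = order + (("Lines", 1),)

locks = LockManager()
print(locks.acquire(line, "bob"))     # True: bob locks a child OrderLine
print(locks.acquire(order, "alice"))  # True: the child lock does not block the parent
print(locks.acquire(order, "bob"))    # False: alice holds the Order's lock
```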
Relationships. A Relationship relates two or more entities. For example, a Contact authors a Document, or an Order contains OrderLines. A relationship can be an Association or Composition. Associations describe “peer-to-peer” relationships between entities, while composition describes a parent/child relationship between two entities.
Relationships themselves are not stored as instances of types in the store, but are embodied by data on the related entities. One particular use of associations specifies an entity type, the association entity, as the entity that embodies the relationship and, optionally, may store additional data as a part of the relationship. Each association has a name; the name signifies the semantic relationship between the entities. For example, DocAuthor is the name of a relationship between a document and a contact; the DocAuthor association relates the contact as the author of the document; similarly, OrderCustomer is an association which associates a customer with an order; given an order, the association can be navigated to determine its customer.
Note that the concepts of association and composition are consistent with UML association and composition concepts. The entity-relationship diagramming and UML terminology (e.g., Role, multiplicity, . . . ) are preserved as much as possible.
The concept of relationships is not explicitly supported in the relational model. Foreign and primary keys and referential integrity provide the tools to implement relationships in a limited way. SQL99 has added object-relational extensions like Ref and Table types to support single and multi entity valued properties but relationships are not modeled formally.
The CDM combines the SQL99 ref and table properties and the UML association and composition concepts. This approach brings rich relationships and navigation to SQL and queryability to applications modeled using UML.
Associations. Associations represent peer-to-peer relationships between entities. They can be based on the use of reference typed properties or non-reference typed properties. They can also involve an association entity that plays a particular role. Each is described in turn.
Associations Using Reference Typed Properties. Consider the following example schema.
The association has two ends, each of which represents a related entity. Each end provides the following information:
The <Reference> element indicates that this is a reference-based association. This element specifies the following information: FromRole is the role that contains the reference property that implements the association; ToRole is the role that is the target of the reference; and Property is the name of the reference property.
In the OrderCustomer association, the CustRef property on the OrderRole end is related to the identifier of the CustomerRole entity; the CustRef property acts like a foreign key.
Associations Using Non-Reference Typed Properties. The OrderCustomer association above relates the order and customer entity types on the customer's identity. In general, it is possible to relate two entity types on any properties of the ends. For example, consider the following DocumentAuthor association, where the Document.Author property is related to Contact.Name. Since Contact.Name is not unique, this association may return multiple contacts for a document.
One difference between this example and the Customer/Order example is that, instead of a <Reference> element, a Boolean expression is provided inside a <Condition> element. Since the <Condition> can contain an expression of arbitrary complexity, this is a very flexible form of association. In effect, it makes a join easy to reuse across applications by making it a first-class part of the query and programming models.
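To make the foreign-key analogy concrete, here is a minimal Python sketch of a reference-style association. It is written in Python rather than CDM schema syntax. All names are hypothetical: `Ref`, `store`, `customer_of`, `orders_of`, and the table name "Customers". The reference is modeled as a (table, key) pair held on the order, and it can be followed in either direction.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Ref:
    """Hypothetical reference value: target table name plus key."""
    table: str
    key: int


@dataclass
class Customer:
    cid: int
    name: str


@dataclass
class Order:
    oid: int
    cust_ref: Optional[Ref]  # plays the role of the CustRef property


# Hypothetical store: table name -> {key -> entity}
store: Dict[str, Dict[int, Customer]] = {
    "Customers": {1: Customer(1, "Contoso"), 2: Customer(2, "Fabrikam")},
}
orders: List[Order] = [
    Order(10, Ref("Customers", 1)),
    Order(11, Ref("Customers", 2)),
    Order(12, Ref("Customers", 1)),
]


def customer_of(order: Order) -> Optional[Customer]:
    """OrderRole -> CustomerRole: follow the reference, like a foreign key."""
    if order.cust_ref is None:
        return None
    return store[order.cust_ref.table].get(order.cust_ref.key)


def orders_of(customer: Customer) -> List[Order]:
    """CustomerRole -> OrderRole: orders whose reference targets this customer."""
    target = Ref("Customers", customer.cid)
    return [o for o in orders if o.cust_ref == target]
```

Only the order stores anything about the relationship. The customer-to-orders direction is derived by scanning for matching references, much as a relational join would be.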
In cases where only a simple equivalence between two properties is needed, a simplified syntax is supported. In such cases, a <CommonValue> element can be used instead of a <Condition> element. The example below has the same meaning as the previous example (with the exception of the OnUpdate behavior described below):
Because the properties are explicitly listed and always have the same value, one additional feature is available: the OnUpdate attribute. The possible values of this attribute are Cascade, SetNull, and Restrict. In this example, a value of “Cascade” indicates that if the property on the ContactRole end is updated, the new value is propagated to the property on the other end. If OnUpdate=“Restrict”, the property cannot be changed while an entity is associated with the other end. If OnUpdate=“SetNull”, updating the property on this end sets the property on the other end to null.
Association Entities. It is common to associate properties with a relationship. For example, the employment relationship between an organization and a person typically carries properties like EmploymentPeriod. The property could be made part of the Organization or the Person type, but it does not mean anything without the relationship. For example, an EmploymentPeriod property on the Organization is meaningless unless the employed person is also present; similarly, the property is not meaningful on the Person entity.
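The three OnUpdate behaviors can be illustrated with a small Python model of a common-value association between Contact.Name and Document.Author. This is a sketch under stated assumptions, not CDM syntax. The helper names `authors_of`, `update_contact_name`, and `RestrictViolation` are hypothetical. The model also shows that navigating from a document can yield several contacts, because names are not unique.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Contact:
    name: str


@dataclass
class Document:
    title: str
    author: Optional[str]  # holds the common value shared with Contact.name


class RestrictViolation(Exception):
    """Raised when OnUpdate="Restrict" blocks a change (hypothetical)."""


def authors_of(doc: Document, contacts: List[Contact]) -> List[Contact]:
    # Contact.name is not unique, so navigation may yield several contacts.
    return [c for c in contacts if c.name == doc.author]


def update_contact_name(contact: Contact, new_name: str,
                        documents: List[Document], on_update: str) -> None:
    """Change Contact.name and apply the association's OnUpdate rule
    to the documents on the other end."""
    related = [d for d in documents if d.author == contact.name]
    if on_update == "Restrict":
        if related:
            raise RestrictViolation(
                f"{len(related)} document(s) still refer to {contact.name!r}")
    elif on_update == "Cascade":
        for d in related:
            d.author = new_name  # propagate the new value
    elif on_update == "SetNull":
        for d in related:
            d.author = None      # clear the value on the other end
    else:
        raise ValueError(f"unknown OnUpdate value: {on_update!r}")
    contact.name = new_name
```

Under Restrict, the check happens before any mutation. A rejected update therefore leaves both ends unchanged.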
In the CDM, only entities (entity type instances) are persisted. Only entities (actually tables of entities) are queryable. Associations and compositions are stored as metadata. Therefore properties on associations must be stored in an entity. Such an entity is an association entity. This plays the same role as the association class in UML. The association entity type is like a middle (link or join) table in relational systems, with references (foreign keys) to the entities it is linking. The key of the association entity generally includes the references to the related entities. For example, consider the following:
PSLink is an entity type that links products and suppliers. PSLink relates the Product and Supplier types by specifying two reference properties, ProductRef and SupplierRef, to the Product and Supplier types respectively. In addition to relating products and suppliers, it has properties, Price and Quantity, that are meaningful to the relationship.
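As a rough analogy to the middle-table role of PSLink, the following Python sketch keys each link on the pair of references and carries Price and Quantity. It then derives the Product-to-Supplier navigation through the link. It models the idea only. The helpers `link`, `suppliers_of`, and `products_of`, and all sample data, are hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Product:
    pid: int
    name: str


@dataclass(frozen=True)
class Supplier:
    sid: int
    name: str


@dataclass
class PSLink:
    product_ref: int   # ProductRef
    supplier_ref: int  # SupplierRef
    price: Decimal
    quantity: int


products = {1: Product(1, "Bolt"), 2: Product(2, "Nut")}
suppliers = {7: Supplier(7, "Acme"), 8: Supplier(8, "Globex")}

# Like a middle (join) table keyed on the pair of references.
links: Dict[Tuple[int, int], PSLink] = {}


def link(pid: int, sid: int, price: str, quantity: int) -> None:
    key = (pid, sid)
    if key in links:
        raise KeyError(f"duplicate PSLink key {key}")
    links[key] = PSLink(pid, sid, Decimal(price), quantity)


def suppliers_of(pid: int) -> List[Tuple[Supplier, Decimal, int]]:
    """Product -> Supplier through PSLink, with the link's own properties."""
    return [(suppliers[lk.supplier_ref], lk.price, lk.quantity)
            for lk in links.values() if lk.product_ref == pid]


def products_of(sid: int) -> List[Tuple[Product, Decimal, int]]:
    """Supplier -> Product through PSLink."""
    return [(products[lk.product_ref], lk.price, lk.quantity)
            for lk in links.values() if lk.supplier_ref == sid]


link(1, 7, "0.10", 500)
link(1, 8, "0.12", 200)
link(2, 7, "0.05", 1000)
```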
The possible associations between Product, Supplier and PSLink are between: PSLink and Product, PSLink and Supplier, and Product and Supplier. These associations could be explicitly defined as follows:
In the above example, the fact that PSLink is an association entity is not explicit and therefore could not be utilized in the definitions of the associations. The schema designer must tediously define all the necessary association definitions. However, this can be avoided by introducing the notion of an explicit association entity as part of an association definition. The above example is rewritten specifying PSLink as an association entity below:
Note the following aspects about this example:
Association entities can also be used in associations using non-reference typed properties. See the complete descriptions of associations later in this document for details.
Scoping Associations. Scoping an association specifies that the entities at both ends of the association are children of the same entity instance. An association is scoped by placing in a Scope attribute the name of the entity type that contains both of the ends. The Table attribute on the ends must then start with that type. Consider the following example, in which a Car has an Engine, Wheels and a DriveTrain.
The Engine and Wheels are connected by the DriveTrain.
The above sample shows that the DriveTrain connects an Engine and Wheels from the same Car. It is not legal for the Engine from one Car to be attached to the Wheels from another Car. Any Table attributes on the <End> elements must begin with the scoping entity. An <End> element can indicate that scoping does not apply to it by adding the Scoped=“false” attribute.
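The scoping rule amounts to a validation check: every scoped end must be a child of the same scoping instance. The Python sketch below models this. Each part records the identity of its containing Car, and `scoped=False` mimics Scoped=“false” on an <End>. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Part:
    kind: str
    car_id: int  # identity of the containing Car instance (the scoping entity)
    part_id: int


@dataclass(frozen=True)
class End:
    part: Part
    scoped: bool = True  # False mimics Scoped="false" on an <End>


class ScopeViolation(Exception):
    pass


def check_scope(scope_car_id: int, ends: Sequence[End]) -> None:
    """Raise if any scoped end is not a child of the scoping Car instance."""
    for end in ends:
        if end.scoped and end.part.car_id != scope_car_id:
            raise ScopeViolation(
                f"{end.part.kind} {end.part.part_id} belongs to car "
                f"{end.part.car_id}, not car {scope_car_id}")
```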
Composition. Composition is a modeling concept which defines a compositional relationship between two entities. Consider the Order example again; the Lines property and the OrderLines relationship define a composition between Order and Line entity types. There is a structural relationship between Order and Line types; lines are part of (or composed in) an order. A line entity belongs to a single order entity; the line entity is exclusively part of an order; an order and its lines form an operational unit.
In contrast, in the above examples Order and Customer are independent entities. The Orders and Customers tables are independent of each other. The OrderCustomer association relates orders and customers. Unless OnDelete=Cascade is specified on the association, the lifetimes of orders and customers are independent of each other. OnDelete=Cascade adds a behavioral constraint that requires the orders associated with a customer to be deleted when the customer is deleted. However, there is no structural relationship between the order and customer entity types. For example, fetching a customer entity does not access any orders, and vice versa. In addition, Order may participate in another, similar association with Supplier; say there is a SupplierRef property on Order and an OrderSupplier association. This association could also specify OnDelete=“Cascade”, meaning that an Order's lifetime is controlled either by the customer who placed the order or by the supplier who supplied it. However, neither Order and Customer nor Order and Supplier constitute operational units for operations like copy, move, backup, and restore.
While it is true that composition is a further specialization of the association concept, it is a fundamental concept, especially from structural and operational perspectives. The relationship between Order and Lines is very different from that between Order and Customer. Therefore, in the CDM composition is a distinct, top-level concept.
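The following Python sketch shows how OnDelete=“Cascade” acts as a behavioral rule attached to each association rather than a structural containment. An order can be removed through either the customer association or the supplier association. Without Cascade, it is left alone. The `ASSOCIATIONS` metadata table and `on_end_deleted` are hypothetical, and dangling-reference handling is deliberately not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Order:
    oid: int
    cust_ref: Optional[int]      # used by OrderCustomer
    supplier_ref: Optional[int]  # used by the OrderSupplier variant


# Hypothetical metadata: the Order property each association uses and its OnDelete value.
ASSOCIATIONS = {
    "OrderCustomer": {"property": "cust_ref", "on_delete": "Cascade"},
    "OrderSupplier": {"property": "supplier_ref", "on_delete": "Cascade"},
}

orders: Dict[int, Order] = {
    1: Order(1, cust_ref=100, supplier_ref=200),
    2: Order(2, cust_ref=101, supplier_ref=200),
    3: Order(3, cust_ref=101, supplier_ref=201),
}


def on_end_deleted(association: str, key: int) -> List[int]:
    """Called after the customer or supplier with `key` is deleted.
    With OnDelete="Cascade" the associated orders are deleted too and their
    ids returned; otherwise orders are untouched (independent lifetimes)."""
    meta = ASSOCIATIONS[association]
    affected = [o.oid for o in orders.values()
                if getattr(o, meta["property"]) == key]
    if meta["on_delete"] != "Cascade":
        return []
    for oid in affected:
        del orders[oid]
    return affected
```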
Composition and Association Entities. One particular usage of association entities composes the association entity itself into one of the associated entities. Consider the example below:
Here we have an entity type, Item, which contains a nested table of the Link entity type. The Link type itself is an association entity in that it sits between two related entities (two items in this case) and contains relationship-specific properties. The fact that a composition is used to relate Items and Links does not change the fact that the true relationship being modeled is between two Items.
Navigation. One of the benefits of modeling relationships explicitly is the ability to navigate from one entity to the related entities using the relationship definition. Such navigation can be supported via queries against the CDM or by using API classes generated from the entity type and relationship definitions. In the OrderCustomer example, given an order entity, one can navigate to its customer using the CustRef property and the association definition. The association definition can also be used to navigate from a customer to its orders: the association metadata can be used to generate a join-based query that traverses from customer to orders. Similarly, the DocumentAuthor association can be used to generate navigation from document to contact and vice versa.
In the CDM, it is possible to pre-define association-based navigation properties using the relationship definitions. An example of such a navigation property is shown below:
The NavigationProperty element in the Customer entity specifies the navigation path from a customer (or customers) to the orders associated with the customer(s). This property could be represented in the query and programming models as a virtual (non-materialized) and queryable read-only collection of references to Order entities.
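A virtual, read-only navigation property can be approximated in Python with a `@property` that evaluates the join described by the association metadata each time it is accessed. Nothing is materialized on the customer. The `ORDERS` tuple and the `ORDER_CUSTOMER` metadata dictionary are hypothetical stand-ins for the Orders table and the association definition.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Order:
    oid: int
    cust_ref: int  # reference property used by OrderCustomer


# Hypothetical stand-ins for the Orders table and the association metadata.
ORDERS = (Order(1, 100), Order(2, 101), Order(3, 100))
ORDER_CUSTOMER = {"from_role": "Order", "to_role": "Customer", "property": "cust_ref"}


@dataclass(frozen=True)
class Customer:
    cid: int

    @property
    def orders(self) -> Tuple[Order, ...]:
        """Navigation property: computed on access from the association
        metadata (a join on cust_ref), never stored, and read-only."""
        prop = ORDER_CUSTOMER["property"]
        return tuple(o for o in ORDERS if getattr(o, prop) == self.cid)
```

Returning a tuple and omitting a setter keeps the collection read-only, matching the text's description of the property as queryable but not updatable.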
Following is a description of the CDM schema language. All entity, relationship, and table set definitions occur in the context of a schema. The schema defines a namespace that scopes the names of the things defined within the schema. The <Schema> element is the root element of a schema document. The <Schema> element may have the following attributes:
For example:
A schema may reference types defined in a different schema using fully qualified type names (namespace-name.type-name). For example:
A schema may include a <Using> element to bring the type names defined in an external schema into scope. The <Using> element may have the following attributes:
For example:
Naming Rules. All type, relationship, and table set names must conform to the CLR rules for type names. The names should also conform to the .NET Framework type naming guidelines. Type and relationship names should be unique; that is, a type and a relationship cannot have the same fully qualified name, and no two types or two relationships can have the same fully qualified name. All table set names should be unique (no two table sets can have the same fully qualified name), but a table set may have the same name as a type or relationship.
Simple Types. A simple type represents a single value with no internal structure visible in the data model. CLR value types are used as simple types in the CDM. A number of value types defined in the CLR's System, System.Storage, and System.Data.SqlTypes namespaces are natively supported for use in the data model. These types are:
Any value type that satisfies a set of requirements should also be usable as a simple type in the data model. The conditions such a type needs to satisfy will either allow it to be stored and used directly in a query (such as a UDT) or provide the metadata necessary to define storage and query mappings via CLR attributes.
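The uniqueness rules amount to two separate name spaces. Types and relationships share one, and table sets have their own. A minimal Python checker over fully qualified names might look like this. The function name and the sample names are hypothetical, and the CLR identifier rules are not checked here.

```python
from typing import Dict, Iterable, List, Set


def check_names(types: Iterable[str], relationships: Iterable[str],
                table_sets: Iterable[str]) -> List[str]:
    """Return uniqueness violations for fully qualified names."""
    errors: List[str] = []
    # Types and relationships share one name space.
    seen: Dict[str, str] = {}
    for kind, names in (("type", types), ("relationship", relationships)):
        for name in names:
            if name in seen:
                errors.append(f"{kind} {name!r} collides with {seen[name]} {name!r}")
            else:
                seen[name] = kind
    # Table sets have their own name space, so they may reuse a type name.
    table_seen: Set[str] = set()
    for name in table_sets:
        if name in table_seen:
            errors.append(f"table set {name!r} defined twice")
        table_seen.add(name)
    return errors
```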
Simple Type Constraints. It is possible to constrain values of a simple type using one of the constraint elements defined below. These elements can be nested inside of various elements that refer to simple types (e.g., a <Property> element in an <EntityType> or <InlineType> element).
Length. A <Length> constraint element can be applied to System.String, System.Storage.ByCollection, and System.Data.SqlTypes.String types to constrain the length of the value. This element may contain the following attributes:
To be compatible with a constraint specified in a base type, the value specified for Minimum must be equal to or greater than the previous value, and the value specified for Maximum must be equal to or less than the previous value.
Decimal. The <Decimal> constraint element can be applied to System.Decimal and System.SqlDecimal types to constrain the precision and scale of acceptable values. This element may contain Precision and Scale attributes.
Default. The <Default> constraint element can be applied to any simple type to specify the default value to be used for a property. This element may have the following attributes: Value—Required. The default value for the property. The value of this attribute must be convertible to a value of the proper type. Note that the <Default> element does not actually specify a constraint; any value is compatible with a value specified in a base type.
Check. The <Check> constraint element can contain a Boolean query expression. For the property value to be valid, this expression must evaluate to true. The expression must not have any side effects. When a check constraint is specified in both the base and derived types, both constraints are validated.
Enumeration Types. An enumeration type defines a set of names that represent unique values. Neither the type of the underlying values nor the stored value itself is visible in the CDM. When custom storage mappings are used, the underlying type is defined by this mapping. For prescriptive storage, the underlying type would be selected automatically or could be provided via a storage-specific hint. An enumeration type is declared using an <EnumerationType> element. This element may have the following attributes:
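The compatibility rule says a derived-type constraint may only narrow the base range. The same rule recurs for <Occurs> on arrays and tables. Below is a minimal Python sketch of the check. It assumes that an attribute left unspecified on the derived constraint is inherited from the base and so never conflicts. That assumption is not stated in the text. `Bounds` and `is_compatible` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Bounds:
    """Minimum/Maximum attributes of a <Length> or <Occurs> constraint.
    None means the attribute is not specified."""
    minimum: Optional[int] = None
    maximum: Optional[int] = None


def is_compatible(base: Bounds, derived: Bounds) -> bool:
    """A derived-type constraint may only narrow the base constraint."""
    if (derived.minimum is not None and base.minimum is not None
            and derived.minimum < base.minimum):
        return False
    if (derived.maximum is not None and base.maximum is not None
            and derived.maximum > base.maximum):
        return False
    return True
```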
The <EnumerationType> element may contain zero or more <EnumerationMember> elements. These elements may have the following attributes:
An example enumeration is defined below:
Note that “bit flag” style enumerations cannot be described using an enumeration type. It is necessary to instead use an array of an enumeration type.
Extensible Enumerations. When an <EnumerationType> element specifies Extensible=“true” it is possible to extend the enumeration with additional values. A property of the enumeration type can contain any of the values specified in the <EnumerationType> or in any extension of that type.
The values defined in each extension are distinct from the values defined in the base enumeration type and all other extensions. This allows an enumeration type to be extended independently by multiple developers without the possibility of conflicts.
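One way to see why independent extensions cannot conflict is to qualify every value by the type that declares it. The Python sketch below does this. Two vendors can each add a member called "Teal" and still produce distinct values. The qualification scheme, the class names, and the "VendorA"/"VendorB" extension names are assumptions for illustration. The text does not say how values are represented.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class EnumValue:
    declared_in: str  # enumeration or extension type that declares the member
    member: str


class EnumerationType:
    def __init__(self, name: str, members: Iterable[str], extensible: bool = False):
        self.name = name
        self.extensible = extensible
        self.values: Set[EnumValue] = {EnumValue(name, m) for m in members}

    def add_extension(self, extension_name: str, members: Iterable[str]) -> None:
        if not self.extensible:
            raise TypeError(f'{self.name} does not specify Extensible="true"')
        # Values are qualified by the declaring type, so extensions that
        # reuse a member name still contribute distinct values.
        self.values |= {EnumValue(extension_name, m) for m in members}

    def accepts(self, value: EnumValue) -> bool:
        """A property of this type may hold base or extension values."""
        return value in self.values
```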
An enumeration extension type is defined using an <EnumerationExtensionType> element. This element may have the following attributes:
The <EnumerationExtensionType> element may contain zero or more <EnumerationMember> elements. These elements may have the following attributes:
An example enumeration is defined below:
Array Types. Instances of array types can store multiple instances of a specified simple, inline, enumeration, entity reference, or table reference type (arrays of arrays are not allowed). These instances are the elements of the array. The order of the elements is preserved and can be explicitly maintained by an application. Applications can insert elements into the array and delete elements from the array. Array types are specified using the syntax:
Array Type Constraints. It is possible to constrain values of an array type using one of the constraint elements defined below. These elements can be nested inside of various elements that refer to array types (e.g., a <Property> element in an <EntityType> or <InlineType> element).
ElementConstraint. The <ElementConstraint> constraint element can be used to place constraints on the elements in the array. Any constraint element that is valid for the element type can be specified inside the <ElementConstraint> element.
Occurs. The <Occurs> constraint element can be used to constrain the number of elements in the array. This element may have the following attributes:
To be compatible with a constraint specified in a base type, the value specified for Minimum must be equal to or greater than the previous value, and the value specified for Maximum must be equal to or less than the previous value.
Unique. The <Unique> constraint element can be used to specify a property or properties of the element type that must contain a unique value in the array. This element may have the following attributes: Properties—Required. A comma separated list of element property names.
Check. The <Check> constraint element contains a Boolean query expression. For the property value to be valid, this expression must evaluate to true. The expression must not have any side effects. When a check constraint is specified in both the base and derived types, both constraints are validated. Note that a check constraint on an array property applies to the property as a whole. For example, it can be used to check that the sum of an element property over all elements is less than some limit. Alternatively, a check constraint placed inside an <ElementConstraint> element applies to each element individually.
Table Types. Instances of table types can store an unordered collection of instances of a specified entity type. The specified entity type, or a type in its base type hierarchy, must specify key properties. Note that this does not preclude the entity type from being an abstract type. The key properties of the entity type stored in a nested table must be unique in any table instance. The table can hold any entity of the specified type or of a type derived from that type. Applications can insert entities into the table and remove entities from the table. Table types are specified using the syntax:
Entity types (and only entity types) can define properties of table types. Such properties represent nested tables. Nested tables define a storage location that is dependent on an instance of a containing entity. The entities stored in a nested table are considered to be part of the cohesive unit of data defined by the containing entity (e.g., they are deleted when the containing entity is deleted). However, there is no consistency guarantee concerning changes to the container and contained entities, except as explicitly managed by an application using transactions. It is an error to define recursive tables. That is, an entity may not define a table of its type, its super-types or its sub-types, nor can an entity declare a table of some other entity type having a table of its type. Table typed properties define composition relationships between two entities (the parent entity with the property and the child entities contained in the table).
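The difference between a property-level <Check> and one inside <ElementConstraint> can be shown with a small Python validator. It also applies <Occurs> and <Unique>. This is a sketch only. Predicates stand in for the Boolean query expressions, and `Line` and `validate_array` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass(frozen=True)
class Line:
    sku: str
    amount: int


def validate_array(elements: Sequence[Line],
                   element_check: Callable[[Line], bool],
                   array_check: Callable[[Sequence[Line]], bool],
                   unique: Sequence[str] = (),
                   min_occurs: int = 0,
                   max_occurs: Optional[int] = None) -> List[str]:
    """Return constraint violations for one array-typed property value."""
    errors: List[str] = []
    # <Occurs>: bounds on the number of elements.
    if len(elements) < min_occurs or (max_occurs is not None and len(elements) > max_occurs):
        errors.append("Occurs violated")
    seen = set()
    for i, e in enumerate(elements):
        # <Check> inside <ElementConstraint>: evaluated per element.
        if not element_check(e):
            errors.append(f"element {i}: element check failed")
        # <Unique Properties="...">: values must be unique within the array.
        if unique:
            key = tuple(getattr(e, p) for p in unique)
            if key in seen:
                errors.append(f"element {i}: duplicate {key}")
            seen.add(key)
    # <Check> on the property itself: evaluated once over the whole array.
    if not array_check(elements):
        errors.append("array check failed")
    return errors
```

For example, `lambda es: sum(e.amount for e in es) < 100` expresses the sum-below-a-limit check from the text. `lambda e: e.amount > 0` would be an element-level check.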
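The recursive-table rule can be checked mechanically from the type hierarchy and the nested-table declarations. The Python sketch below follows nested tables transitively. It reports any owner that can reach a table of its own type, a super-type, or a sub-type. Applying the rule transitively is an interpretation of the "some other entity type having a table of its type" clause. All function names are hypothetical.

```python
from typing import Dict, List, Optional, Set


def ancestors(t: str, base_of: Dict[str, Optional[str]]) -> Set[str]:
    """All base types of t (base_of maps a type to its base, or None)."""
    out: Set[str] = set()
    while base_of.get(t) is not None:
        t = base_of[t]
        out.add(t)
    return out


def in_same_hierarchy(a: str, b: str, base_of: Dict[str, Optional[str]]) -> bool:
    """True if a and b are the same type or one derives from the other."""
    return a == b or a in ancestors(b, base_of) or b in ancestors(a, base_of)


def recursive_table_errors(tables: Dict[str, List[str]],
                           base_of: Dict[str, Optional[str]]) -> List[str]:
    """`tables` maps each entity type to the entity types of its nested tables."""
    errors: List[str] = []
    for owner, direct in tables.items():
        stack, visited = list(direct), set()
        while stack:
            t = stack.pop()
            if t in visited:
                continue
            visited.add(t)
            if in_same_hierarchy(owner, t, base_of):
                errors.append(f"{owner} contains (directly or indirectly) a table of {t}")
                break
            stack.extend(tables.get(t, []))
    return errors
```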
Entity Reference Types. Instances of reference types store a reference to an entity of a specified type. The reference encapsulates a reference to the table that contains the entity and the entity's key property values. A reference can be resolved to the entity that is the target of the reference. Reference types are specified using the syntax:
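Because a reference encapsulates only the containing table and the target's key values, resolving it is a lookup. The lookup may find nothing if the target has since been removed. The Python sketch below models this with a dictionary of tables. The `EntityRef` and `resolve` names are hypothetical. So is the path-style table identifier used for a nested table in the usage example.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple

Key = Tuple[Any, ...]


@dataclass(frozen=True)
class EntityRef:
    table: str  # identifies the table holding the target (top-level or nested)
    key: Key    # the target entity's key property values


def resolve(ref: EntityRef, tables: Dict[str, Dict[Key, dict]]) -> Optional[dict]:
    """Return the referenced entity, or None if it is no longer present."""
    return tables.get(ref.table, {}).get(ref.key)
```

For example, with `tables = {"Customers": {(1,): {"Name": "Contoso"}}}`, the reference `EntityRef("Customers", (1,))` resolves to the Contoso entity. A reference into a nested table might use a hypothetical identifier such as "Orders/10/Lines".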
Table Reference Types. Instances of table reference types store a reference to a table. The target table could be a “top level” table in a table set or a nested table. A reference can be resolved to the table that is the target of the reference. Table reference types are specified using the syntax:
Properties. Properties are used in entity and inline types to allocate storage. A property is defined using a <Property> element. In addition to the common member elements defined above, this element may have the following attributes:
Some example property definitions are shown below:
Property Constraints. The values that can be stored in a property can be constrained using a constraint element inside the <Property> element. The set of allowed constraint elements depends on the property's type and is defined as each type is discussed. In addition, a property can be further constrained in a derived type by placing the constraint elements inside a <PropertyConstraint> element. This element may have the following attributes:
When using a <PropertyConstraint> element, it is necessary that the specified constraints be compatible with any constraints defined on the base type. The description of each constraint element includes a definition of what compatibility entails. Some simple examples are shown below:
Constraining Target Types. When deriving a type from a type that has a reference property, it is possible to further constrain the type of entity that can be specified. This is done using the element inside of a <PropertyConstraint> element and specifying a value for the Type attribute as described herein. For example:
Computed Properties. Computed properties are used in entity and inline types to represent a computed, rather than stored, value. The algorithm used to compute the property's value is not considered part of the data model. A property that returns a query can be described as a part of the data model. A computed property is declared using a <ComputedProperty> element. In addition to the common member attributes defined above, this element may have the following attributes:
The example below declares a computed property named “X” with type “Int32”:
Computed Property Constraints. The values that a computed property can return can be constrained using a constraint element inside the <ComputedProperty> element. The set of allowed constraint elements depends on the property's type and is defined as each type is discussed. In one implementation, <PropertyConstraint> elements also work with computed properties.
Methods. Methods are used on inline and entity types to represent an operation that can be executed. The algorithm used to implement the method is not considered part of the data model. In one implementation, a method can be described that takes a query as an input and returns a query as an output as a part of the data model. Associations essentially define such methods. A method is declared using a <Method> element. This element may have the following attributes:
A <ReturnTypeConstraints> element may be nested inside the <Method> element. Constraint elements that apply to the method's return type can be nested inside the <ReturnTypeConstraints> element. A <Method> element may have zero or more nested <Parameter> elements to specify the parameters accepted by the method. The <Parameter> element may have the following attributes:
Constraint elements that apply to the parameter's type can be nested inside the <Parameter> element. The example below declares a computed property named “X” with type “Int32” and a method named “Y” with a parameter named “a” of type “Int32” and which returns a value of type “Int32.”
Inline Types. Instances of inline types can only be persisted as a component of one and only one entity instance. Instances of inline types have no explicit identity (the identity is implicit in how it is located inside of its containing entity). The instances of inline types are considered to be part of the cohesive unit of data defined by the containing entity (e.g. they are deleted when the containing entity is deleted). There is also a guarantee that consistency across all the inline data that is part of an entity will be maintained without any explicit application action. Note that this does not preclude features like WinFS change units and synchronization from providing the schema designer finer grained control over consistency.
Schema designers can define new inline types with a structure consisting of a set of properties. Such an inline type may be derived from a single base inline type. Such types are defined using an <InlineType> element inside of a <Schema> element (nested inline types are not allowed). The <InlineType> element may have the following attributes:
Some example Inline type definitions are:
Inline Type Constraints. Instances of an inline type can be constrained using the following constraint elements. These elements can be nested inside of various elements that refer to inline types (e.g., a <Property> element in an <EntityType> or <InlineType> element).
Check. The <Check> constraint element can contain a Boolean query expression. For the property value to be valid, this expression must evaluate to true. The expression should not have any side effects. When a check constraint is specified in both the base and derived types, both constraints are validated.
Entity Types. An entity type defines a cohesive unit of data. Instances of entity types can be persisted in one and only one table. Instances of entity types have an explicit identity. It is possible to store a durable reference to an entity type instance using a Reference type.
Schema designers can define new entity types with a structure consisting of a set of properties. An entity type may be derived from a single base entity type. Such types are declared using an <EntityType> element inside of a <Schema> element (nested entity types are not allowed). The <EntityType> element may have the following attributes:
Some example entity type definitions are:
Entity Type Constraints. Instances of an entity type can be constrained using the following constraint elements. These elements can be nested inside of various elements that refer to entity types (e.g., a <Property> element in an <EntityType> element).
Check. The <Check> constraint element can contain a Boolean query expression. For the property value to be valid, this expression must evaluate to true. The expression should not have any side effects. When a check constraint is specified in both the base and derived types, both constraints are validated.
Entity Extension Types. To address WinFS and MBF scenarios, it must be possible to extend a given set of types without actually modifying the schemas where those types were defined. While this can be accomplished by adopting one of a number of patterns (such as providing a base class with a table of an Extension entity type), it is desirable to make such a concept a first class part of the data model. This allows special API patterns to be generated for extensions and is simpler for type designers to understand and use.
Entity types that specify the Extensible=“true” attribute can be extended. An entity extension is declared using an <EntityExtensionType> element inside of a <Schema> element. The <EntityExtensionType> element may have the following attributes:
It may not be possible to derive one extension type from another extension type. The <EntityExtensionType> element may contain any element that can be put inside of an <EntityType> element. Typically this includes <Property> elements. It is not necessary for the property names to be unique across all extensions or even across properties of the extended type. If InstanceManagement=“Implicit”, all of the properties defined in an extension must either be nullable, specify a default value, or have an array, collection, or table type with a minimum occurrence constraint of zero. Note that a singleton inline typed property cannot specify a default value and so must be nullable. This allows queries to be executed against the extension as if it were always present.
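The InstanceManagement=“Implicit” rule can be read as a per-property validation. Each extension property must be nullable, carry a default, or be an array, collection, or table whose minimum occurrence is zero. An inline-typed property may not use a default. The Python sketch below encodes that reading. It assumes a minimum occurrence of zero when no <Occurs> constraint is given, and it uses None to mean "no default". Both are assumptions, and the names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, List, Optional

COLLECTION_KINDS = {"array", "collection", "table"}


@dataclass(frozen=True)
class PropertyDef:
    name: str
    kind: str                      # "simple", "inline", "array", "collection" or "table"
    nullable: bool = False
    default: Optional[Any] = None  # None means "no default specified"
    min_occurs: int = 0            # assumed 0 when no <Occurs> is given


def implicit_extension_errors(props: List[PropertyDef]) -> List[str]:
    """Check an extension's properties against the Implicit instance rule."""
    errors: List[str] = []
    for p in props:
        if p.kind == "inline" and p.default is not None:
            errors.append(f"{p.name}: inline property cannot specify a default")
            continue
        ok = (p.nullable
              or p.default is not None
              or (p.kind in COLLECTION_KINDS and p.min_occurs == 0))
        if not ok:
            errors.append(f"{p.name}: must be nullable, have a default, "
                          f"or be a collection with minimum occurrence 0")
    return errors
```

When every property passes, an absent extension can be read as if all its properties held null, their defaults, or empty collections. This is what lets queries treat the extension as always present.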
If InstanceManagement=“Explicit”, applications explicitly add and remove an extension to/from an entity instance using the operations defined in the query language for the CDM. A means to test for the presence of an extension is also provided. Entity extension instances are considered to be part of the cohesive unit of data defined by the containing entity (e.g., they are deleted when the containing entity is deleted). However, there is no consistency guarantee concerning changes to the entity and the extension, except as explicitly managed by an application using transactions.
All EntityExtensions have a default property named “Entity” of type Ref(T), where T is the value of ExtendsType, which points to the entity instance with which the extension is associated. An example entity type and extension are shown below.
Association. An association allows two or more entities, the end entities, to be related to one another. Each of the ends remains conceptually independent of the others. Associations are represented using an <Association> element. This element may have the following attributes:
An <Association> element has two or more nested <End> elements, each describing one of the ends of the association. Note that more than two <End> elements can be specified only if an <AssociationEntity> element is also specified. The <End> element may have the following attributes:
Association Styles. The style of the association indicates how the ends are connected. Every association is one of four styles, indicated by nesting one of the following elements under the <Association> element.
A <Reference> element describes a reference style association. The element may have the following attributes:
An example of a reference association “AtoB” relating entities A and B is shown below.
A <CommonValue> element describes a common value style association. The element may have the following attributes:
For an association entity one of Property1 and Property2 specifies the association entity role and the other specifies an end role. The specified end may contain an OnUpdate and/or OnDelete attribute. An example of a common value association “AtoB” relating entities A and B is shown below.
A <Condition> element describes an association based on an arbitrary join between the two ends. The element must contain an expression that evaluates to true for related entities. The role names specified in the <End> and <Using> elements can be used in this expression. If the <Association> element contains a <Condition> element, it may also contain zero or more <Using> elements. These elements describe additional entities that are used to define the relationship between the ends. The <Using> elements may have the following attributes:
Using the <End>, <Using> and <Condition> elements, a full query is constructed. This query has the form:
The following example shows two entities A and B related using an expression including a third entity C.
The <AssociationEntity> element indicates that the ends of the association are connected by an association entity. The <AssociationEntity> element may have the following attributes:
An <AssociationEntity> has one nested <End> element. An association entity defines an association between this <End> element and each of the <End> elements nested under <Association>. The style of these associations is described with a <Reference>, <Condition> or <CommonValue> element. One of the associations can be described instead with the <Composition> element:
The Role on each <End> of the <Association> must be referenced by one of the two roles on one of these style elements; the other role references the <End> of the <AssociationEntity>.
A <Composition> element describes the composition of the association entity into one of the end roles. It must be nested inside of <AssociationEntity>. The <Composition> element has the following attributes:
A <Condition> element must have the following attribute when it is nested under <AssociationEntity>:
The following example shows an entity association “AtoB” relating A and B using the association entity C, where C is composed in A and has a reference to B.
Composition. Every nested table defines a composition between entities of two types: the entity type that contains the table typed property and the entity type contained by the table. The composition is described using a <Composition> element. This element may have the following attributes:
The <Composition> element contains a <ParentEnd> element. This element may have the following attributes:
The <Composition> element must also contain a <ChildEnd> element. This element may have the following attributes:
The schema below defines two independent nested tables that can contain instances of A: B.ATable and C.ATable. There is also a nested table that can contain instances of B, D.BTable, and a nested table that can contain instances of C, D.CTable. Instances of D are contained in a top level table named DTable.
Nested Table Type Constraints. It is possible to constrain values of a table type using one of the constraint elements defined below. These elements can be nested inside of various elements that refer to table types (e.g., a <Property> element in an <EntityType> or <InlineType> element).
EntityConstraint. The <EntityConstraint> constraint element can be used to place constraints on the entities in the table. Any constraint element that is valid for an entity type can be specified inside the <EntityConstraint> element.
Occurs. The <Occurs> constraint element can be used to constrain the number of entities in the table. This element may have the following attributes:
To be compatible with a constraint specified in a base type, the value specified for Minimum must be equal to or greater than the previous value, and the value specified for Maximum must be equal to or less than the previous value.
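The compatibility rule for <Occurs> can be expressed as a small predicate. The following Python sketch is illustrative only: the Occurs class and is_compatible function are hypothetical names, and treating a missing Maximum as unbounded is an assumption not stated in the schema description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Occurs:
    """Hypothetical in-memory form of an <Occurs> constraint.

    maximum=None is assumed to mean "no upper bound".
    """
    minimum: int = 0
    maximum: Optional[int] = None


def is_compatible(base: Occurs, derived: Occurs) -> bool:
    """Return True if the derived constraint only narrows the base range."""
    # Minimum must be equal to or greater than the base Minimum.
    if derived.minimum < base.minimum:
        return False
    # An unbounded base Maximum accepts any derived Maximum.
    if base.maximum is None:
        return True
    # Dropping a bounded base Maximum would widen the range.
    if derived.maximum is None:
        return False
    # Maximum must be equal to or less than the base Maximum.
    return derived.maximum <= base.maximum


# Narrowing 1..10 to 2..5 is compatible; widening to 0..5 is not.
print(is_compatible(Occurs(1, 10), Occurs(2, 5)))  # True
print(is_compatible(Occurs(1, 10), Occurs(0, 5)))  # False
```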
Unique. The <Unique> constraint element can be used to specify a property or properties of the entity type that must contain a unique value in the table. This element may have the following attributes: Properties—Required. A comma separated list of entity property names.
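To illustrate what the Properties attribute means for validation, the Python sketch below treats the listed properties as a composite key and reports duplicates. The function name unique_violations and the dictionary-based row representation are assumptions for illustration, not part of the CDM.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def unique_violations(rows: List[Row], properties: str) -> List[Tuple[Any, ...]]:
    """Return duplicate key tuples for a <Unique Properties="..."> constraint.

    `properties` is the comma separated list from the Properties attribute;
    the combination of those property values must be unique in the table.
    """
    names = [name.strip() for name in properties.split(",")]
    seen = set()
    duplicates = []
    for row in rows:
        key = tuple(row[name] for name in names)
        if key in seen:
            duplicates.append(key)
        else:
            seen.add(key)
    return duplicates


contacts = [
    {"First": "Ann", "Last": "Lee"},
    {"First": "Ann", "Last": "Kim"},
    {"First": "Ann", "Last": "Lee"},
]
print(unique_violations(contacts, "First, Last"))  # [('Ann', 'Lee')]
```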
Check. The <Check> constraint element can contain a Boolean query expression. For a table to be valid, this expression must evaluate to true. The expression must not have any side effects. When a check constraint is specified in both the base and derived types, both constraints are validated. Note that a check constraint on a table applies to the property as a whole. For example, it could be used to check that the sum of the values of a particular property across the table is less than some limit. Alternatively, a check constraint placed inside an <EntityConstraint> element would apply to each value individually.
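The difference between a table-level check and a check nested in <EntityConstraint> is easiest to see side by side. This Python sketch models check expressions as plain predicates (an assumption; the CDM uses Boolean query expressions), with hypothetical names validate_table, table_checks, and entity_checks.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]
TableCheck = Callable[[List[Row]], bool]   # <Check> directly on the table
EntityCheck = Callable[[Row], bool]        # <Check> inside <EntityConstraint>


def validate_table(rows: List[Row],
                   table_checks: List[TableCheck],
                   entity_checks: List[EntityCheck]) -> bool:
    """Return True only if every table-level and per-entity check holds.

    Checks from a base type and a derived type are both placed in the
    lists, mirroring the rule that both are validated.
    """
    # Table-level checks see the whole collection at once.
    if not all(check(rows) for check in table_checks):
        return False
    # Entity-level checks are applied to each row individually.
    return all(check(row) for check in entity_checks for row in rows)


lines = [{"Quantity": 3, "Price": 10.0}, {"Quantity": 1, "Price": 25.0}]
total_under_100 = lambda rows: sum(r["Quantity"] * r["Price"] for r in rows) < 100
positive_quantity = lambda row: row["Quantity"] > 0
print(validate_table(lines, [total_under_100], [positive_quantity]))  # True
```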
Navigation Properties. A navigation property may optionally be placed on the entity specified by either end of a relationship. This property provides a means of navigating from one end of the relationship to the other. Navigation properties are represented using a <NavigationProperty> element within the <Entity> definition. This element may have the following attributes:
Type Aliases. A type alias gives a unique name to a simple, collection, array, table, or reference type and a set of constraints. A type alias is defined using a <TypeAlias> element that allows the following attributes:
The <TypeAlias> element can contain any constraint element allowed for the aliased type. The defined alias name can be used anywhere the name of the aliased type can be used. Some example aliases are shown below:
Table Sets and Table Set Types. A table set type is a restricted form of an entity type. A table set type is defined using a <TableSetType> element. This element may contain the following attributes:
<TableSetType> elements may contain <Property> elements that specify reference and table types. Reference properties may only specify table set types. <TableSetType> elements may also contain <ComputedProperty> and <Method> elements.
Table Set Instances. A table set is an instance of a table set type. Table sets form the “top level” of the data model. All storage is allocated directly or indirectly by creating a table set. A table set is described using a <TableSet> element. This element may have the following attributes:
Aggregating Table Sets. The <TableSet> element contains an <AggregatedTableSet> element for each reference typed property in the specified entity type. This allows previously defined table sets to be aggregated into a new table set. This is useful when combining data from two separate applications into a single application. The <AggregatedTableSet> element may contain the following attributes:
The example below illustrates this:
Query Language. A query language for the CDM can be specified. The query language can be based on SQL targeting CDM concepts like entities and relationships.
Using Compositions in Queries. The following patterns work by determining the table set that an input table is based on. A composition can make the following functions available for use in query:
Using Associations in Queries. These patterns can work by determining the table set that an input table is based on. An association makes the following functions available for use in query:
Interfaces. Interfaces provide CLR-like interfaces for entity and inline types. They can be used to solve most of the same problems that interfaces solve in object type systems. An interface is declared using an <Interface> element. The <Interface> element can have the following attributes:
An <EntityType> or <InlineType> element can use an ImplementedInterfaces attribute to specify a comma separated list of interfaces that it implements. Each of the properties defined in a listed interface must be defined in the type. As with an interface in C#, a property defined in an interface is implicitly implemented in the type by simply declaring a property with the same name and type. The constraints on the property can be narrowed when it is declared. A property can also be explicitly implemented. This is accomplished by including the interface type name in the property name (just as in C#). This allows different constraints to be applied to properties that have the same signature but are inherited from different interfaces. For example:
Following is a complete schema example. The example below defines Customer, Order, OrderLine, Product, and Supplier entity types along with associations describing how these entities are related.
SQL99 and the CDM. SQL99 defines several object extensions to the core relational data model (e.g., SQL92). Some key aspects of SQL99 are: user-defined types, which include both distinct types and structured types; methods (behaviors); typed tables; and refs.
SQL99 defines part or all of a complete type system self-contained within the SQL data model. While objects in programming languages can be mapped to SQL objects, it is not a goal of SQL99 to define a tight binding with a programming language (e.g., Java, C#). For example, methods in SQL99 are defined in SQL procedural language rather than in a standard programming language. A goal of the CDM is to specify a tight alignment with both SQL and CLR.
User-defined Types. The CDM's types, both simple and complex, map almost one-to-one to user-defined types in SQL99. Simple types and simple type aliases map to SQL99 scalar types and distinct types; complex types map to SQL99 structured types. A main difference between a SQL structured type and a complex type is the CDM's distinction between inline types and entity types. In the CDM, the notion of identity/key is defined at type definition time. In SQL99, identity is defined when the type is used to define a typed table. Therefore, SQL99 has no need to distinguish between types with and without identity, which allows a type to be reused both for table definitions (referenceable objects) and for column definitions (non-referenceable, inlined objects). Such reuse works only for storage, since the identity can be defined at table definition time. However, the same type cannot be mapped to both an inlined class and a referenceable class. Since a goal of the CDM is to provide both an application object framework and a persistence framework, the distinction between inline types and entity types is important.
Methods/Behaviors. In the CDM, behaviors are defined using the CLR framework. SQL99 defines its own method/behavior framework. While it is close to most modern OO languages, it is still different and does not provide a good programming environment for applications. Typically, applications (or application servers) bridge the gap between the programming environment and the database environment.
Typed Tables vs. Entity Tables. Entity tables in the CDM are similar to SQL99 typed tables. However, an entity table (extent) is logical: it is a logical collection of objects, and there is no physical storage associated with extents. Typed tables are SQL tables with all the storage attributes allowed on tables. Extents can be mapped to one or more physical tables.
Refs. Refs in the CDM and in SQL99 are very similar. In the CDM, a ref is scoped by specifying an extent for the ref; in SQL99, a ref is scoped by specifying a typed table as the target for the ref. In both cases, a ref is resolved to an object.
The CDM. At the center of the runtime 1506 of data platform 1500 is a CDM 1510. The intent of the CDM 1510 is to factor out the modeling concepts common across multiple application domains, from applications working mainly with user data (PIM, documents, etc.) to LOB and enterprise data. In addition to providing rich object and relationship abstraction, the CDM 1510 provides support for structured, unstructured, and semi-structured data.
Row/entity data. The CDM 1510 supports a rich Entity-Relationship model to capture the structure and the behavior of structured data (e.g., business data). The CDM 1510 is a superset of the core relational model, with extensions for rich object abstraction and relationship modeling (e.g., an Author relationship between Documents and Contacts; a Lines relationship between Purchase Orders and Order Lines).
File data. The CDM 1510 supports the “file stream” data type to store and manipulate unstructured (file) data. The file stream data type can store the data as a file and supports file access APIs. The file stream data type is natively supported in SQL Server, mapped to an NTFS file stream, and supports all the file handle/stream based operations. In addition to modeling the unstructured content as a file stream in the CDM 1510, using the entity types, useful content can be promoted as structured properties. Database-based file storage systems define the notion of a file backed item, which is an entity that models the structured properties along with the file stream of unstructured content. The file backed items provide for rich querying along with stream based operations on the associated file stream.
XML data. XML documents can be modeled in two primary ways in the CDM 1510: (1) store the document as an XML data type; or (2) map the XML document to one or more entities (e.g., similar to data contracts). The CDM 1510 supports the XML data type as supported in SQL Server. The XML data type can be the type of any entity property, and it allows untyped or typed XML documents to be stored. Strong typing is provided by associating one or more XML schemas with the XML document properties.
Programming language integration, including Query, in the API 1502. The data platform 1500 feature components of sessions and transactions 1512, query 1514, persistence 1516, cursors 1515, services 1520, object cache 1522 and business logic hosting 1524 are encapsulated in several “runtime” classes available in the data platform API 1502.
The persistence entity 1516 includes a persistence engine which provides declarative mapping definitions that describe exactly how objects are assembled out of the component pieces that come from the relational stores. The engine includes a query generation component (not shown) that takes an expression defined by the query processor, in terms of an object query expression, and then combines it with the declarative mapping. This turns into equivalent query expressions that access the underlying tables in the database. An update generation component (not shown) looks at change tracking services, and with the help of mapping metadata, describes how to translate those changes in the world of objects to changes in the world of tables.
The persistence engine can include object-relational mappings. In other words, the modeling, access, and query abstractions provided by the data platform 1500 are object based, while the primary storage technology utilized by the data platform 1500 is relational. The persistence engine utilizes object-relational mappings (also referred to as “O-R mappings”), wherein the persistence engine can map the language classes to the underlying tabular representation.
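The role of the declarative mapping in query and update generation can be sketched in a few lines. The Python below is a deliberately simplified, hypothetical model: EntityMapping, MAPPINGS, generate_query, and generate_update are illustrative names, each entity maps to exactly one table, and only equality filters are supported, whereas the described persistence engine handles richer object query expressions and mappings.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class EntityMapping:
    """Hypothetical declarative mapping of one entity type to one table."""
    table: str
    key: str                  # entity property that holds the identity
    columns: Dict[str, str]   # entity property name -> column name


MAPPINGS = {
    "Customer": EntityMapping(
        table="dbo.Customers",
        key="Id",
        columns={"Id": "cust_id", "Name": "cust_name", "City": "addr_city"},
    ),
}


def generate_query(entity: str, filters: Dict[str, Any]) -> Tuple[str, List[Any]]:
    """Combine an object-level equality filter with the mapping to produce SQL."""
    m = MAPPINGS[entity]
    projection = ", ".join(f"{col} AS {prop}" for prop, col in m.columns.items())
    sql = f"SELECT {projection} FROM {m.table}"
    params: List[Any] = []
    if filters:
        clauses = []
        for prop, value in filters.items():
            clauses.append(f"{m.columns[prop]} = ?")
            params.append(value)
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


def generate_update(entity: str, key_value: Any,
                    changes: Dict[str, Any]) -> Tuple[str, List[Any]]:
    """Translate tracked property changes on one object into a table UPDATE."""
    m = MAPPINGS[entity]
    assignments = ", ".join(f"{m.columns[prop]} = ?" for prop in changes)
    sql = f"UPDATE {m.table} SET {assignments} WHERE {m.columns[m.key]} = ?"
    return sql, list(changes.values()) + [key_value]


print(generate_query("Customer", {"City": "Seattle"}))
```

Keeping the mapping as data, separate from the translation functions, is what lets the same object query be retargeted when the underlying tables change.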
Querying/searching File and XML data. The CDM 402 stores the unstructured and semi-structured data using the file stream and XML data types, respectively. The CQL is capable of querying these data types. For file content promoted to structured entities (e.g., WinFS file backed items), CQL's relational operators can query these entities. The unstructured data stored as file stream can be queried using full-text search. The XML content can be queried using XPath or XQuery.
Object-Relational mappings. Since the data platform 1500 provides an object-based abstraction on top of relational (tabular) storage, it provides an O-R mapping component. The data platform 1500 supports both prescriptive mappings and non-prescriptive mappings (in which the type designer has some flexibility in specifying mappings). Note that a database-based file storage system implementation today uses prescriptive mappings, while more general O-R persistence frameworks need non-prescriptive mappings.
Caching. The data platform runtime 1506 maintains a cache 1522 of query results (e.g., cursors, discussed in detail infra) and uncommitted updates. This is called the session cache because it is tied to the sessions and transactions 1512: it comes into existence when a session is created and goes away when the session is terminated. The data platform 1500 also provides an explicit cache, which enables the application to work in a disconnected mode, and provides various consistency guarantees for data in the explicit cache. The cache performs identity management by correlating the on-disk identity of data with the in-memory objects.
The data platform 1500 can also expose another kind of cache, called the explicit cache. The explicit cache provides a cache of data from one or more queries. Once data is materialized into the explicit cache, the following data consistency guarantees can be provided: 1) read-only, not authoritative; 2) write-through, authoritative; and 3) automatic refresh via exogenous notifications. The programming and query model against the explicit cache can be substantially similar to that over store data.
Query Processor. Database access is through the query processor. The query processor allows multiple frontends, so that queries can be expressed in multiple query languages and then mapped to an internal canonical format. This is done in terms of the domain model and objects of the application it is working on. The queries are then passed to the processor, which is a pipeline, and converted into backend-specific queries.
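The identity management performed by the cache can be illustrated with an identity map. This Python sketch is a simplification under stated assumptions: SessionCache and its load callback are hypothetical, there is no commit, refresh, or eviction logic, and the real session cache also holds query results.

```python
from typing import Any, Callable, Dict, Hashable, Set

Row = Dict[str, Any]


class SessionCache:
    """Hypothetical identity map: at most one in-memory object per stored identity."""

    def __init__(self, load: Callable[[Hashable], Row]):
        self._load = load                      # reads one stored row by key
        self._objects: Dict[Hashable, Row] = {}
        self._dirty: Set[Hashable] = set()     # keys with uncommitted updates

    def get(self, key: Hashable) -> Row:
        # Correlate the on-disk identity with a single in-memory object.
        if key not in self._objects:
            self._objects[key] = dict(self._load(key))
        return self._objects[key]

    def update(self, key: Hashable, **changes: Any) -> None:
        # Changes stay in the cache (uncommitted) rather than going to the store.
        self.get(key).update(changes)
        self._dirty.add(key)

    def pending(self) -> Dict[Hashable, Row]:
        return {key: self._objects[key] for key in self._dirty}


store = {1: {"Name": "Ann"}}
cache = SessionCache(lambda key: store[key])
first, second = cache.get(1), cache.get(1)
print(first is second)  # True: both lookups return the same object
```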
Cursors. The data platform 1500 can provide both forward-only and scrollable cursors. Cursors support notifications, multi-level grouping with expand/collapse state, and dynamic sorting and filtering. The cursors 1515 are mechanisms that allow the set of data entities returned from CQL to be processed one at a time. An application can create a cursor over the result set by simply copying the entire result set into memory and overlaying a scrolling pattern on top of this in-memory structure. But the ubiquity of this requirement, and the complexity that is sometimes involved in implementing a cursor (especially when updates, paging, etc. are taken into account), means that any data platform should provide a cursoring model. In addition to the basic functionality of browsing and scrolling, data platform cursors can provide the following features: 1) exogenous notifications and maintenance; 2) multi-level grouping with expand/collapse state; and 3) dynamic sorting and filtering (e.g., “post-processing”). It is to be appreciated and understood that cursors are not a different mechanism for specifying a result set; result sets are specified by queries, and cursors operate over these queries.
Business logic host 1524. The data platform 1500 provides a runtime environment to host data-centric logic on types/instances and on operations. Such data-centric business logic is distinct from application/business process logic, which can be hosted in the application server. Objects are not just rows in a database. When objects are materialized in memory, they are actual objects with behaviors that the application can invoke. There are extension points in the system, mainly events and callbacks, that all operate to extend the data platform 1500 at runtime. These objects are not just objects, but CLR objects, .NET objects, etc. The data platform 1500 provides the capability to intercept property or method calls on those objects. Applications can customize the behavior of these objects.
The data platform 1500 provides several mechanisms for authoring business logic. These mechanisms can be divided into the following five categories: constraints, event handlers, static/instance methods, bindable behaviors, and static service methods, each of which is discussed in more detail below. Constraints, enforced by the constraints/security entity 1526, can be declarative and procedural. These constraints can be executed on the store, in close proximity to the data. Thus, the constraints 1526 are considered to be within the trust boundary. Moreover, constraints can be authored by the type designer.
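The naive approach mentioned above, copying the result set into memory and scrolling over it, can be sketched as follows, together with dynamic sorting and filtering as post-processing. This Python is illustrative only: ScrollableCursor and its methods are hypothetical, and it omits notifications, grouping, paging, and updates, which are exactly the parts that make a platform-provided cursor worthwhile.

```python
from typing import Any, Callable, Iterable, List, Optional


class ScrollableCursor:
    """Naive scrollable cursor: the whole result set is copied into memory."""

    def __init__(self, results: Iterable[Any]):
        self._base: List[Any] = list(results)   # full in-memory copy
        self._view: List[Any] = list(self._base)
        self._pos = -1                           # positioned before the first row

    def fetch_next(self) -> Optional[Any]:
        """Forward-only access: return the next row, or None at the end."""
        if self._pos + 1 >= len(self._view):
            return None
        self._pos += 1
        return self._view[self._pos]

    def fetch_absolute(self, index: int) -> Any:
        """Scrollable access: jump to a zero-based position."""
        if not 0 <= index < len(self._view):
            raise IndexError(index)
        self._pos = index
        return self._view[index]

    def post_process(self,
                     key: Optional[Callable[[Any], Any]] = None,
                     predicate: Optional[Callable[[Any], bool]] = None,
                     reverse: bool = False) -> None:
        """Dynamic sorting and filtering over the same results; resets the position."""
        view = [row for row in self._base if predicate is None or predicate(row)]
        if key is not None:
            view.sort(key=key, reverse=reverse)
        self._view = view
        self._pos = -1


cursor = ScrollableCursor([5, 1, 4])
cursor.post_process(key=lambda row: row)
print(cursor.fetch_next())  # 1
```

Note that post-processing never re-runs the query; it only re-shapes the rows the query already produced, consistent with cursors operating over queries rather than specifying result sets.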
The business logic hosting 1524 can employ an event handler. The data platform API 1502 raises several events on data change operations. Business logic authors can hook into these events via handler code. For example, consider an order management application. When a new order comes in, the application needs to ensure that the value of the order is less than the credit limit authorized for the customer. This logic can be part of event handler code which is run before the order is inserted into the store.
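The credit-limit example maps naturally onto a before-insert hook. The Python sketch below is hypothetical: OrderStore, on_before_insert, and check_credit are illustrative names rather than the data platform API 1502, and vetoing an insert by raising an exception is an assumed convention.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Order:
    customer_id: str
    amount: float


class CreditLimitExceeded(Exception):
    pass


class OrderStore:
    """Hypothetical store that raises a before-insert event."""

    def __init__(self, credit_limits: Dict[str, float]):
        self.credit_limits = credit_limits
        self.rows: List[Order] = []
        self._before_insert: List[Callable[["OrderStore", Order], None]] = []

    def on_before_insert(self, handler: Callable[["OrderStore", Order], None]) -> None:
        self._before_insert.append(handler)

    def insert(self, order: Order) -> None:
        # Every handler runs before the row is stored; a handler vetoes by raising.
        for handler in self._before_insert:
            handler(self, order)
        self.rows.append(order)


def check_credit(store: OrderStore, order: Order) -> None:
    """Business logic: the order value must be less than the customer's credit limit."""
    limit = store.credit_limits[order.customer_id]
    if order.amount >= limit:
        raise CreditLimitExceeded(f"{order.amount} >= limit {limit}")


store = OrderStore(credit_limits={"c1": 100.0})
store.on_before_insert(check_credit)
store.insert(Order("c1", 40.0))       # accepted
try:
    store.insert(Order("c1", 150.0))  # rejected by the handler
except CreditLimitExceeded as err:
    print("rejected:", err)
```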
Services. The data platform 1500 provides a core set of services that are available to all data platform clients. These services include rules, change tracking, conflict detection, eventing, and notifications. Eventing allows framework-level services and applications to extend the data platform runtime 1506 with additional behaviors, and is also used for data binding at the user interface.
Constraints. The data platform 1500 provides a constraints/security component 1526 that allows the type designer to author constraints declaratively. These constraints are executed in the store. Typically, the scope of data platform constraints encompasses notions such as length, precision, scale, default, check, and so on. These constraints are enforced by the data platform constraint engine 1526 at runtime.
Security. The data platform 1500 provides a role based security model—the user's credentials determine her “role” (such as administrator, power user, approver, etc.). Each role is assigned a set of access permissions. The data platform security engine 1526 enforces these security policies. In addition, the data platform 1500 provides a security model for controlling access to entities in the data platform 1500. The security model can support authentication of an operating system user, authorization level of entities (e.g., with separate permissions for read and update), etc.
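A role-based check with separate read and update permissions per entity can be sketched as below. This is an assumption-based Python illustration: the Permission flags, the permissions assigned to each role, and is_allowed are hypothetical, and authentication of the operating system user is outside the sketch.

```python
from enum import Flag, auto
from typing import Dict, Iterable


class Permission(Flag):
    NONE = 0
    READ = auto()
    UPDATE = auto()


# Hypothetical per-entity access list: role -> permissions on that entity.
EntityAcl = Dict[str, Permission]


def is_allowed(user_roles: Iterable[str], acl: EntityAcl, needed: Permission) -> bool:
    """Union the permissions of all of the user's roles, then test the needed one."""
    granted = Permission.NONE
    for role in user_roles:
        granted |= acl.get(role, Permission.NONE)
    return needed in granted


order_acl: EntityAcl = {
    "approver": Permission.READ | Permission.UPDATE,
    "power user": Permission.READ,
}
print(is_allowed(["power user"], order_acl, Permission.UPDATE))  # False
```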
Note that the constraints/security component 1526 is illustrated separate from the data platform runtime component 1506, since it can operate as a separate entity therefrom. Alternatively, and perhaps more efficiently, the constraints/security component 1526 is combined with the store component 1508, which can be the database system.
In the following
Comments and additional constraints:
Declaring an entity type defines an EntityCollectionType and ReferenceType implicitly. Declaring a complex type defines an InlineCollection implicitly. In practice these types could be implemented with generics, such as Ref<T>, EntityCollection<T>, and InlineCollection<T>.
Property.Kind makes the association constraints easier to read: the Property metaclass has a type, and the property it represents also has a type. Kind avoids any confusion between the two.
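A rough illustration of how the implicit types could look as generics follows. This Python sketch uses typing.Generic in place of CLR generics; the member names (key, add, resolve, values) are hypothetical and chosen only to show that an EntityCollection<T> holds entities by identity, a Ref<T> identifies one of them, and an InlineCollection<T> simply holds values.

```python
from dataclasses import dataclass, field
from typing import Dict, Generic, Hashable, List, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Ref(Generic[T]):
    """A value that identifies an entity of type T (holds its key, not the entity)."""
    key: Hashable


@dataclass
class EntityCollection(Generic[T]):
    """Entities of type T stored by identity, so they can be referenced."""
    items: Dict[Hashable, T] = field(default_factory=dict)

    def add(self, key: Hashable, entity: T) -> Ref[T]:
        self.items[key] = entity
        return Ref(key)

    def resolve(self, ref: Ref[T]) -> T:
        return self.items[ref.key]


@dataclass
class InlineCollection(Generic[T]):
    """Inline values of type T; they have no identity of their own."""
    values: List[T] = field(default_factory=list)


customers: EntityCollection[dict] = EntityCollection()
ann = customers.add(1, {"Name": "Ann"})
print(customers.resolve(ann)["Name"])  # Ann
```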
AssociationEnd.Property is a property on the entity type on the other end of the association. It indicates a property referencing the entity on this end of the association. More formally on the association the following holds:
Alternative Implementation and Syntax
Following is an alternative implementation of a data model for object-relational data. The description of this alternative implementation uses a schema definition language, which is one of many possible forms of expression for this alternative implementation. Note also that there are many similarities to the previous implementation described supra, yet noticeable differences in naming and syntax, for example.
In this alternative description, relationships can be defined at the top level using <Association> or <Composition> elements. There is no need to define a property on the source (or parent) in order to define a Ref association (or composition). The following operational behaviors are introduced on relationships: OnCopy, OnSerialize, OnSecure, and OnLock. EntitySets are defined only within EntityContainerType, not within EntityType. NavigationProperty is no longer used; RelationshipProperty is used instead, with slightly different semantics.
Explicit Ref typed properties are unscoped references to an entity, and do not create a relationship. RelationshipProperty is implicitly of type Ref (or Collection (Ref)). Relationship scope is specified within <Scope> clause of entity container type.
Relationships have been made more abstract. In the previous implementation, relationships were implicitly defined by Ref or collection valued properties within an entity - which presupposes a way of implementing relationships. Here, relationships are now defined at the top level. This removes relationship metadata from the type declaration, thus enabling type re-use.
Layering. Reference associations, conditional associations, and association entities are all defined within the <Association> element. This is a reflection of the fact that the meta-model has the core notion of an association between two entities; whether it is based on a reference, a condition, or related through an association entity is specified by various tags and sub-elements within <Association>. The specialized concepts (reference, condition, association entity) are layered on top of the core concept (association).
Separation of core modeling and persistence. The notion of an entity collection was used for two purposes in the previous implementation. Firstly, as a way to name and refer to persisted groups of entities and secondly, as a way to define composition. Thus, one had to deal with persistence just to model compositions. Entity collections are now a pure grouping/scoping notion; they are not allowed as fields within an entity type. A composition is defined at the top level by specifying abstract metadata about the participating entities. By separating persistence from compositions, the model is more flexible and orthogonal.
Simplification of Collections. Since an entity collection is now only allowed within an EntityContainerType (known formerly as EntityDatabaseType), only inline collections can be properties of an entity or complex type. This has resulted in a simplification of collection semantics. There is no longer a need to distinguish between entity collections and inline collections. There are just “collections” and they are used to collect inline types.
Ability to specify operational semantics. This implementation allows relationships (associations or compositions) to be specified, and allows operational behavior to be specified orthogonally on any relationship, regardless of its type. Accordingly, it is now possible to specify OnCopy, OnSerialize, OnSecure, and OnLock behaviors on any relationship. Again, this provides more flexibility and captures a larger set of scenarios.
Arbitrary grouping of entities. The notion of a WinFS “item” is really an operational grouping of several entities (e.g., the Item entity, the ItemFragment entity, and the contained entities). Composition as defined previously was overly restrictive for modeling this notion. The ability to specify operational semantics, together with the separation of entity collections from relationship definitions, allows operational behaviors to be picked and chosen and applied to an arbitrary grouping of entities.
Renaming. Some names have been changed to better reflect their semantics. For example: (a) EntityCollection has been changed to EntitySet, since this is now an unordered grouping of entities; it is also consistent with the terminology used in classic ER modeling; (b) EntityDatabaseType has been changed to EntityContainerType to eliminate the term “database”, since it has many persistence/operational/administrative connotations; (c) InlineCollection has been changed to Collection because this is the only kind of collections allowed in this implementation model.
EntityDatabase is no longer employed, but is now an instance of EntityDatabaseType. <AssociationEntityType> is now defined within the <Association> element.
Thus, a data model describes the shape and semantics (constraints, behaviors) of, and relationships among, the various pieces of data that an application is interested in.
Both Customer and Address are similar in the sense that they both have internal structure (composed of multiple fields). But semantically and operationally, a Customer is different from an Address. Customer acts as the unit for query, data change operations, transactions, persistence, and sharing. An Address on the other hand always lives within a Customer and cannot be referred to or otherwise acted upon independently. In the CDM, such top level data units are called entities. All other data is considered to be inline to entities.
Looking now at Order, business rules require that every order have a corresponding customer. This is modeled by a relationship between the Order entity and the Customer entity. There are different kinds of relationships supported by the CDM. The one between an Order and a Customer is called an association. Associations are typically used to model peer-to-peer relationships among entities. Each order is composed of several order lines (if five books are ordered from a bookseller, then the information about each book is an order line). This is modeled as another kind of relationship: a composition. Each OrderLine within the composition is an entity.
The shape of data is described by defining types. Conceptually, a type is a name given to a set of values. For example, if something is of type int, a range of values is specified that is precisely the range of integers. A precise definition of the notion of type requires an excursion into category theory, which is outside the scope of this document. Thus, the concept of a type is posited to be a formally primitive concept in this document. A type system is a mechanism for defining types and associating them with language constructs.
Central to the CDM is its type system. Data defined using this type system is considered to be strongly typed, similar to CLR. In other words, there is an expectation that the implementation provides for strict enforcement of type rules with no exceptions; that all types are known at compile time; and that any type conversions are predictable in their effect.
At the highest level, the CDM type system defines two kinds of types: entity types and inline types. Entity types have a unique identity and form the operational unit of consistency. Intuitively, entities are used to model the “top level” concepts within a data model, for example, Customers, Orders, Suppliers, etc. Inline types are contained types. They are contained within an entity type. They live and die with, are transacted with, and are referenced only within the context of a containing entity. Very roughly, an entity type is similar to a reference type in CLR while an inline type is similar to a value type. This is a limited analogy and applies only in the sense that inline types, like CLR value types, do not have an independent existence, while entity types, like CLR reference types, can exist independently and can be referenced independently.
There are several kinds of inline types: scalar types, complex types, the XML type, FILESTREAM type, and collections of these types. Some of these types are described in more detail infra.
A Schema Definition Language (SDL) is employed as syntax to describe the types. SDL is analogous to the subset of C# for defining classes, or the Data Definition Language (DDL) subset of SQL. SDL is expressed in XML (but it is not XSD based). Throughout this description SDL fragments will be used to illustrate the concept being described.
Scalar types are, in a sense, the simplest types available in the type system. They have no internal structure visible to the system. A scalar type represents a single value. CDM defines three kinds of scalar types: Simple Types, Enumeration Types, and Ref types.
Simple Types are built-in, primitive types provided by the system. CLR value types are used as simple types in the CDM, such as System.String, System.Boolean, System.Byte, System.Int16, etc. The CDM also supports a number of types defined in the System.Storage and System.Data.SqlTypes namespaces (System.Storage.ByteCollection, System.Data.SqlTypes.SqlSingle . . . ).
An Enumeration type defines a set of symbolic names—such as (red, blue, green) or (spring, summer, autumn, winter). An instance of an enumeration type has a value which ranges over this set.
An instance of a Ref type is a value that identifies an entity. A description of Ref types is provided infra.
Simple types can be used to build more complex types. A complex type can have a name or can remain anonymous.
A named complex type has a name and a set of members. A member can be as simple as a name and a scalar type. Each member is a scalar type (simple type in this case). Data members such as these are referred to as properties.
Two other kinds of members can exist in complex types: methods and computed properties, which will be described infra. A complex type can also contain other complex types.
An anonymous complex type, as the name implies, is structurally similar to a complex type, but cannot be used for declaring properties based on it. Anonymous complex types are returned as query results (such as in queries that project a subset of fields from entities). They exist in the CDM to ensure query closure. SDL does not have syntax for declaring these types.
Note one distinction between named and anonymous complex types: two named complex types with the exact same structure but different names are considered to be distinct types. But two anonymous complex types with the exact same structure are considered to be the same type.
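The rule amounts to nominal equality for named types and structural equality for anonymous ones. In this Python sketch, ComplexType and same_type are hypothetical, and treating member order as significant for anonymous types is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Members = Tuple[Tuple[str, str], ...]  # (member name, member type) in declaration order


@dataclass(frozen=True)
class ComplexType:
    members: Members
    name: Optional[str] = None  # None marks an anonymous complex type


def same_type(a: ComplexType, b: ComplexType) -> bool:
    # Named types are compared by name (nominal typing).
    if a.name is not None or b.name is not None:
        return a.name == b.name
    # Anonymous types are compared by structure (structural typing).
    return a.members == b.members


shape: Members = (("Street", "String"), ("City", "String"))
print(same_type(ComplexType(shape, "Address"), ComplexType(shape, "Location")))  # False
print(same_type(ComplexType(shape), ComplexType(shape)))                         # True
```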
An entity type is structurally similar to a complex type in that it is a type built out of scalar types and other complex types. However, semantically and operationally it is quite different - an entity is uniquely referenceable via an identity, the unit of consistency for various operations, and so on. Entity Types are defined in a fashion similar to Complex Types.
Collections store zero or more instances of inline types. In the following example, consider the Person type: this type stores contact information about a human being. Typically, the name, home address, work address, home phone, work phone, mobile phone, and email ID (to take the most common subset) are stored. Notice that in this list there are two addresses and three phone numbers. These can be modeled as collections using CDM, as shown in the example below:
Line 4 defines a collection of Address elements for the Addresses field; line 5 defines a collection of String elements for the PhoneNumbers field. Thus, a Person object can have two collections, Addresses and PhoneNumbers. In this example, the Addresses collection contains two instances of the Address type, while the PhoneNumbers collection contains three instances of the String type. In the CDM, collections can be defined for simple and complex types.
Oftentimes it is desirable for a collection to have the semantics of an array. An array is a collection where the elements are ordered and indexed. Note that an array is ordered, not necessarily sorted. For example, [1,4,5,3] is an ordered collection, but unsorted. It is ordered because one can ask for the second element and be guaranteed (absent explicit changes) to get back ‘4’.
The Array attribute of the <Collection> sub-element on a collection-typed <Property>, when set to true, makes the collection an array. Array is an optional attribute whose value defaults to “false”.
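The hypothetical `CdmCollection` class below makes the difference between ordered and sorted concrete. For illustration only, it assumes that a non-array collection compares like a multiset (order ignored, duplicates counted) and does not support positional access. An array, by contrast, compares element by element and can be indexed.

```python
from collections import Counter
from typing import Any, Iterable


class CdmCollection:
    """Hypothetical model of a collection-typed property; array=False mirrors the default."""

    def __init__(self, items: Iterable[Any], array: bool = False) -> None:
        self._items = list(items)
        self.array = array

    def __len__(self) -> int:
        return len(self._items)

    def __getitem__(self, index: int) -> Any:
        if not self.array:
            raise TypeError("elements of a non-array collection are not indexed")
        return self._items[index]

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, CdmCollection) or self.array != other.array:
            return NotImplemented
        if self.array:
            return self._items == other._items                # position matters
        return Counter(self._items) == Counter(other._items)  # order ignored


scores = CdmCollection([1, 4, 5, 3], array=True)  # ordered, yet unsorted
bag = CdmCollection([1, 4, 5, 3])
```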
Typically, the data model used by an application has many different types. Several of these types model totally separate concepts, e.g., Customer and Order; such types do not share any members. Other types model concepts that are different but still share some similarity. For example, consider the Customer and Employee types. Even though they model different concepts, there is an underlying thread of commonality between them: both have properties for address, phone number, and name, which is the information needed to get in touch with somebody. In other words, it is contact information.
Even though the above example shows only inheritance of properties, derived types also inherit methods and computed properties from their base type. CDM follows the CLR semantics for inheriting methods (behavior). The (partial) SDL specification for the above inheritance hierarchy is shown in the example below:
As illustrated before, one purpose of inheritance is to share a generic concept among multiple types. Another purpose of inheritance is extensibility. Even after an ISV implements and deploys an inheritance hierarchy, customers can extend the types to suit their purposes via inheritance. A key to making this extensibility work is the notion of value substitutability (also known as polymorphism): every instance of a derived type is also an instance of a base type. Thus, if Manager is derived from Employee, then every instance of Manager is also an instance of an Employee. So when querying for all employees (base type), all managers (derived type) are returned as well.
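Below is a Python sketch of the contact hierarchy and of value substitutability. The Customer, Employee, and Manager names follow the text. The base class name `Contact`, the fields, and the `query_of_type` helper are hypothetical. Python's `isinstance` already treats a derived-class instance as an instance of its base class, which is the substitutability rule described above.

```python
from dataclasses import dataclass
from typing import List, Type, TypeVar

T = TypeVar("T")


@dataclass
class Contact:
    """Base type capturing the shared contact information."""
    name: str
    address: str
    phone: str


@dataclass
class Customer(Contact):
    company_name: str = ""


@dataclass
class Employee(Contact):
    employee_id: int = 0


@dataclass
class Manager(Employee):
    direct_reports: int = 0


def query_of_type(items: List[object], entity_type: Type[T]) -> List[T]:
    # Value substitutability: derived-type instances satisfy a base-type query.
    return [item for item in items if isinstance(item, entity_type)]


store = [
    Customer("Contoso", "1 Main St", "555-0100", company_name="Contoso Ltd"),
    Employee("Ann", "2 Oak St", "555-0101", employee_id=1),
    Manager("Bob", "3 Elm St", "555-0102", employee_id=2, direct_reports=5),
]
```

Querying for `Employee` returns Ann and Bob, because every Manager is also an Employee.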
Simple Type Constraints. The set of permissible values for a simple type can be restricted in a variety of ways, such as by restricting the length (string and binary) or the precision and scale (decimal). The default constraint allows the specification of a default value. The check constraint specifies a Boolean expression over the value of the property; if this expression evaluates to false, the property value is invalid.
Collection Type Constraints. The following constraints are supported on collections. An ElementOccurs constraint specifies the minimum and maximum cardinality of the collection. A Unique constraint can be used to ensure uniqueness of collection element values. An ElementConstraint can be used to specify constraints on the type underlying the collection (e.g., when defining a collection of type Decimal, the <ElementConstraint> can be used to specify the precision and scale).
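Below is a minimal sketch of how the simple type constraints might be evaluated. `SimpleTypeConstraint` is a hypothetical Python class, not a CDP API. For precision and scale it assumes the usual SQL decimal(p, s) convention: at most s digits after the point and p - s before it.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Any, Callable, List, Optional


@dataclass
class SimpleTypeConstraint:
    max_length: Optional[int] = None                # string and binary
    precision: Optional[int] = None                 # decimal: total digits
    scale: int = 0                                  # decimal: digits after the point
    default: Any = None                             # default constraint
    check: Optional[Callable[[Any], bool]] = None   # check constraint

    def effective_value(self, value: Any = None) -> Any:
        # The default constraint supplies a value when none is given.
        return self.default if value is None else value

    def errors(self, value: Any = None) -> List[str]:
        value = self.effective_value(value)
        if value is None:
            return []  # nullability is a separate constraint
        problems = []
        if self.max_length is not None and len(value) > self.max_length:
            problems.append(f"length {len(value)} exceeds {self.max_length}")
        if isinstance(value, Decimal) and self.precision is not None:
            _, digits, exponent = value.as_tuple()
            fraction_digits = max(0, -exponent)
            integer_digits = max(0, len(digits) + exponent) if any(digits) else 0
            if fraction_digits > self.scale or integer_digits > self.precision - self.scale:
                problems.append(f"{value} does not fit decimal({self.precision},{self.scale})")
        if self.check is not None and not self.check(value):
            problems.append(f"check constraint failed for {value!r}")
        return problems


price = SimpleTypeConstraint(precision=5, scale=2, check=lambda v: v >= 0)
currency = SimpleTypeConstraint(max_length=3, default="USD")
```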
A Check constraint specifies a Boolean query expression. This expression should evaluate to true in order for the collection to be considered valid. Note that this constraint applies to the collection property as a whole, not to an individual element within the collection.
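The collection constraints can be sketched the same way. `CollectionConstraint` is hypothetical. As a simplification, ElementConstraint is represented here as a per-element predicate rather than a full type constraint. Note that `check` receives the whole collection, matching the rule that a Check constraint applies to the collection as a whole.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class CollectionConstraint:
    min_occurs: int = 0                                     # ElementOccurs minimum
    max_occurs: Optional[int] = None                        # ElementOccurs maximum; None = unbounded
    unique: bool = False                                    # Unique
    element_check: Optional[Callable[[Any], bool]] = None   # stands in for ElementConstraint
    check: Optional[Callable[[List[Any]], bool]] = None     # Check on the whole collection

    def errors(self, items: List[Any]) -> List[str]:
        problems = []
        n = len(items)
        if n < self.min_occurs or (self.max_occurs is not None and n > self.max_occurs):
            problems.append(f"ElementOccurs: {n} elements not in [{self.min_occurs}, {self.max_occurs}]")
        if self.unique and len(set(items)) != n:
            problems.append("Unique: duplicate elements")
        if self.element_check is not None:
            problems += [f"ElementConstraint: {x!r}" for x in items if not self.element_check(x)]
        if self.check is not None and not self.check(items):
            problems.append("Check: collection as a whole is invalid")
        return problems


# Hypothetical rule set: 1 to 3 distinct, all-digit numbers, at least one in area code 206.
phones = CollectionConstraint(
    min_occurs=1, max_occurs=3, unique=True,
    element_check=lambda s: s.isdigit(),
    check=lambda xs: any(x.startswith("206") for x in xs),
)
```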
It is recommended that type designers always define unique constraints on Collections, especially if the collection is expected to contain a large number of elements. At run time, when one element of a collection is updated, the uniqueness constraint enables the implementation to target that single element for update instead of rewriting the entire collection. This reduces update time and network traffic.
Derived Type Constraints. It is possible to constrain base type properties within a derived type by using derived type constraints. Consider a frequent flier program offered by various airlines. United Airlines, for example, has a program called Mileage Plus. This program has several membership levels: 1K member, Premier Executive member, Premier member, and “just” member. To be a Premier member, you have to fly at least 25,000 miles or 30 paid segments; for Premier Executive and 1K, these numbers are 50,000/60 and 100,000/100, respectively. Each level can be modeled as a type derived from a common member type, with a derived type constraint that narrows the miles and segments properties it inherits. Note that when defining derived type constraints, the constraints should be compatible with the base type constraints. For example, if an ElementOccurs constraint on a collection field in the base type specifies a minimum of one and a maximum of fifty, then the derived type constraint should specify a minimum of at least one and a maximum of at most fifty.
Nullability Constraint. Properties of both simple and complex types can specify a nullability constraint. When the property is nullable, an instance can store the NULL value. This is specified by a Nullable attribute on the <Property> element.
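Below is a sketch of the compatibility rule and of the Mileage Plus thresholds. `Occurs`, `LEVELS`, and `membership_level` are hypothetical names. The thresholds are the ones given in the text. Compatibility is read here as "the derived range lies within the base range, bounds included".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Occurs:
    """An ElementOccurs-style range; maximum=None means unbounded."""
    minimum: int
    maximum: Optional[int] = None

    def is_compatible_with(self, base: "Occurs") -> bool:
        # A derived constraint may only narrow the base range, never widen it.
        if self.minimum < base.minimum:
            return False
        if base.maximum is None:
            return True
        return self.maximum is not None and self.maximum <= base.maximum


# Thresholds from the text: (level, minimum miles, minimum paid segments).
LEVELS = [
    ("1K", 100_000, 100),
    ("Premier Executive", 50_000, 60),
    ("Premier", 25_000, 30),
]


def membership_level(miles: int, segments: int) -> str:
    # Return the most specific level whose derived-type threshold is met.
    for level, min_miles, min_segments in LEVELS:
        if miles >= min_miles or segments >= min_segments:
            return level
    return "Member"
```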
Note that collection properties should not be set to Nullable and that Nullable is allowed even on complex typed properties. The default value of the Nullable attribute is “true”.
Thus far, the core modeling capabilities provided by the CDM type system of this alternative implementation have been described: a rich set of built-in types, and the ability to define complex types, create enumerations, and express constraints, inheritance, and value substitutability (polymorphism).
Once these types are designed and deployed in the system, the application begins to create and manipulate instances of them. Various operations such as copy, move, query, delete, and backup/restore (for persisted data) are performed on type instances. The operational semantics of the CDM are described below.
Types and Instances. A type defines the structure of data; actual data is stored in type instances. In object-oriented languages, for example, types and instances are called classes and objects.
Entity and Inline Types. Looking once again at the data used by the hypothetical LOB application, on the one hand, it has types such as Customers and Orders, and on the other hand, it also has types such as Address and AddressLine. Each of these types has a structure; that is, each can be decomposed into multiple fields. However, there are two key differences between the Customer type and the Address type. First, instances of the Customer type have an independent existence, while instances of the Address type only exist within (for example) a Customer type instance.
In this sense, a Customer is a top level type; instances of this type exist independently, are referred to independently (“show me the Customer whose CompanyName is . . . ”), and are transacted independently. The Address on the other hand is a contained type—its instances exist only within the context of a top level type instance, e.g., “the Address of a Customer” or “the shipping Address of an Order”, and so on.
The Customer type has an identity while the Address type does not. Every instance of Customer can be uniquely identified and a reference to it can be taken. The Address type cannot be separately identified outside of a type such as Customer. This fundamental distinction is formalized in the CDM by the notion of Entity Types and Inline Types. In the LOB example above, a Customer is an Entity type while an address is an Inline type.
Characteristics of Entities. Even though an entity type is structurally similar to a complex type, it has several characteristics which distinguish it from inline types. These characteristics endow an entity instance with specific operational semantics. An Entity is an instance of an entity type. An entity is the smallest unit of CDM data that can exist independently. This implies the following characteristics:
Identity: every entity has an identity. This identity is guaranteed to refer to the entity, and only that entity, during its lifetime. The existence of identities implies that references can be taken to entities.
The entity identity is the reason why an entity is a “top level” data unit—every entity is referenceable and hence can be used in query, update, and other operations.
Unit of Consistency: queries and updates are guaranteed consistent at the level of entities. An entity is the target of most CDP operations—such as copy, move, delete, backup, restore, etc. These operations do not target inline types.
Unit of Transactions: transactions occur at the level of an entity. Stated another way, entities can be separately transacted, but inline types are always transacted within the context of the containing entity.
Unit for Relationships: relationships are specified at the level of entities, not inline types. Entities can relate to each other via compositions and reference associations; an inline type instance cannot be the source or target of a relationship.
Unit of Persistence: an entity is the unit of persistence. Inline types do not persist themselves, but persist only as parts of an entity. Essentially, inline types act by value—hence, they have to be stored as part of an entity.
Note however, that the CDM itself neither implies nor requires that entities be persisted. The CDM remains consistent regardless of whether the entities are in memory or are persisted.
Unit of Sharing: an entity is the unit of sharing within the data model. For example, the LOB application has the notion of a Product. A Supplier supplies one or more products; an Order has one or more products. In order to prevent redundant storage of product information, a Product entity can be defined and references to instances of this entity can be inserted within Order and Supplier entities. If on the other hand, a Product is an inline type, then both Order and Supplier have to contain Product instances, leading to redundant storage and the insert, delete, and update anomalies that result from it.
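The difference between sharing by reference (entities) and embedding by value (inline types) can be sketched in Python. `Ref`, `deref`, and the in-memory `products` table are hypothetical illustrations, not the CDP API. A Product is stored once and referenced from both Order and Supplier, so a single update is visible everywhere, while the Address is simply copied into the Order.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List

_next_id = itertools.count(1)


@dataclass(frozen=True)
class Address:
    """Inline type: no identity; two equal addresses are interchangeable."""
    street: str
    city: str


@dataclass
class Product:
    """Entity type: carries an identity that survives updates."""
    name: str
    price: float
    id: int = field(default_factory=lambda: next(_next_id))


@dataclass(frozen=True)
class Ref:
    """Reference to an entity by identity (cf. Ref(T))."""
    entity_id: int


@dataclass
class Order:
    ship_to: Address                                    # embedded by value
    products: List[Ref] = field(default_factory=list)   # shared by reference


@dataclass
class Supplier:
    name: str
    products: List[Ref] = field(default_factory=list)


products: Dict[int, Product] = {}  # each Product is stored exactly once


def deref(ref: Ref) -> Product:
    return products[ref.entity_id]


widget = Product("Widget", 9.99)
products[widget.id] = widget
order = Order(Address("1 Main St", "Seattle"), [Ref(widget.id)])
supplier = Supplier("Fabrikam", [Ref(widget.id)])
widget.price = 8.99  # one update, visible through every reference
```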
Enumeration types are declared using the <EnumerationType> element as shown in lines 1-5. Once declared, these types can be used within any Complex or Entity Type (see line 8).
XML and FILESTREAM types (not shown in this example) are built-in types, but not exactly scalar (that is, the data model can reason about the internals of these types to a limited extent).
Complex Types are declared using the <ComplexType> element, as shown in lines 7-13. The properties of a Complex type can be any inline type: scalar, collection, or complex type. Once declared, a Complex type can be used within any Complex or Entity type.
Collection types are declared using the Collection keyword, followed by the name of an inline type. A collection of AddressType is shown in line 22.
As described supra, a fundamental characteristic of an entity is the fact that it has a unique identity. The entity identity is guaranteed to survive entity updates and continue identifying the same entity throughout its lifetime. An entity also has a key attribute. This key is composed of one or more properties. Entity keys are required to be unique within an EntitySet. Entity keys can be used by implementations to generate the entity identity. Entity keys are specified using the “Key” attribute of an <EntityType> element. Any set of scalar (including Ref types) or complex typed properties can serve as the key to an entity type as long as at least one property in the key is non-nullable. Collection typed properties should not be part of a key.
Note that only the base type of an entity type hierarchy can specify the key. Derived types automatically inherit the key from the base type. This supports making value substitutability (polymorphism) work deterministically.
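The key rules can be collected into a small validator. `PropertyDef` and `EntityTypeDef` are hypothetical Python classes, not SDL. The validator checks the following: only the root of a hierarchy declares a key, derived types inherit it, collection properties are excluded, and at least one key property must be non-nullable.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PropertyDef:
    name: str
    nullable: bool = True
    is_collection: bool = False


@dataclass
class EntityTypeDef:
    name: str
    properties: List[PropertyDef]
    key: List[str] = field(default_factory=list)
    base: Optional["EntityTypeDef"] = None

    def effective_key(self) -> List[str]:
        # Derived types inherit the key declared on the root of the hierarchy.
        return self.base.effective_key() if self.base is not None else self.key

    def key_errors(self) -> List[str]:
        if self.base is not None:
            return ["only the base type may specify a key"] if self.key else []
        if not self.key:
            return ["the base entity type must specify a key"]
        props = {p.name: p for p in self.properties}
        errors = [f"unknown key property {k}" for k in self.key if k not in props]
        known = [props[k] for k in self.key if k in props]
        errors += [f"collection property {p.name} cannot be part of a key"
                   for p in known if p.is_collection]
        if known and all(p.nullable for p in known):
            errors.append("at least one key property must be non-nullable")
        return errors


customer = EntityTypeDef(
    "Customer",
    [PropertyDef("CustomerId", nullable=False), PropertyDef("Phones", is_collection=True)],
    key=["CustomerId"],
)
preferred = EntityTypeDef("PreferredCustomer", [PropertyDef("Discount")], base=customer)
```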
Entity References. Because entities have identities, it is possible to have entity references. In this implementation of the CDM, entity references are primarily used for relationship navigation. For example, the Customer entity is related to the Order entity via the Customer_Order relationship. Given an Order, it is possible to find the associated Customer using a special property on the Order entity (known as a relationship property). This property is of type Ref(Customer). The keyword Ref denotes that this is a reference to another entity type.
It is also possible to explicitly define a Ref typed property within an entity type or a complex type. This property serves as a “pointer” to the referenced entity. The existence of a Ref typed property does not define or imply a relationship between the containing and referenced entity types. Relationships should be defined explicitly. In the case of persisted entities, references are durable.
Inline Types. An entity either lives by itself at the top level, or is contained within an entity set. An entity cannot be directly embedded within another entity or a complex type. A complex type on the other hand, can be embedded in another complex type or an entity type. That is, both entities and complex types can have properties whose type is a complex type.
Scalar types and collections can also be embedded in other complex types or entity types. This is the reason why complex types, collections, and scalar types are collectively referred to as inline types.
Entity Sets and Entity Containers. Two scenarios help describe where entity instances live. First, entity instances are persisted in some data store; the CDM itself is carefully crafted to be store agnostic and can, in theory, be implemented over a variety of storage technologies. Second, entities also have a transient existence in memory; the CDM does not specify how they are materialized in memory and made available to the application. An application typically deals with multiple instances of entities of various types. These instances can either live by themselves at the top level, or be contained within entity sets. An entity set contains instances of entities of a given type (or a subtype thereof). An entity key is meaningful within an entity set, and each entity within an entity set should have a unique key.
An Entity Container is a group of one or more entity sets. An entity container is used for scoping association and composition relationships. The entity container is also the persistence and administrative unit over which backup, restore, users, roles, and other such semantics are defined.
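Below is an in-memory sketch of entity sets and an entity container. `EntitySet` and `EntityContainer` are hypothetical Python classes. The sketch enforces the two rules stated above: members must be of the set's entity type or a subtype, and keys must be unique within the set.

```python
from typing import Any, Callable, Dict, Hashable


class EntitySet:
    """Hypothetical entity set: instances of one entity type (or a subtype), unique by key."""

    def __init__(self, name: str, entity_type: type, key_of: Callable[[Any], Hashable]):
        self.name = name
        self.entity_type = entity_type
        self.key_of = key_of
        self._by_key: Dict[Hashable, Any] = {}

    def add(self, entity: Any) -> None:
        if not isinstance(entity, self.entity_type):
            raise ValueError(f"{type(entity).__name__} is not a {self.entity_type.__name__}")
        key = self.key_of(entity)
        if key in self._by_key:
            raise ValueError(f"duplicate key {key!r} in entity set {self.name}")
        self._by_key[key] = entity

    def __getitem__(self, key: Hashable) -> Any:
        return self._by_key[key]

    def __len__(self) -> int:
        return len(self._by_key)


class EntityContainer:
    """Groups entity sets; the unit over which backup, restore, roles, etc. would be defined."""

    def __init__(self, name: str, *entity_sets: EntitySet):
        self.name = name
        self.entity_sets = {s.name: s for s in entity_sets}


# Usage: a Customers set that accepts Customer and its subtype.
class Customer:
    def __init__(self, customer_id: int):
        self.customer_id = customer_id


class PreferredCustomer(Customer):
    pass


customers = EntitySet("Customers", Customer, key_of=lambda c: c.customer_id)
customers.add(Customer(1))
customers.add(PreferredCustomer(2))
lob_data = EntityContainer("LOBData", customers)
```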
A schema is an organizational structure for holding types, relationships, and other names. It provides a namespace within which the usual name uniqueness rules apply. A schema is defined using the <Schema> element, which is the top-level element of an SDL file. All entities, relationships, complex types, etc., are defined within a schema. Schemas defined elsewhere can be imported with the <Using> element, and types defined therein may be referenced using their fully qualified name. It is also possible to use types in another schema directly, without a <Using> element, by using the fully qualified syntax Namespace.Type-name; <Using> is a syntactic shortcut to reduce typing. A type defined in any deployed schema can be used by providing the fully qualified name of that type.
The CDM specification is silent about what should happen once the types are written in SDL. For types written in SDL, the CDP, which implements the CDM, provides tools that take an SDL description and deploy these types on the store. Depending on the nature of the store, this deployment may involve creation of store level abstractions. For example, if the store is an RDBMS, then the deployment would create databases and tables corresponding to the schemata and types; it might also create SQL code for constraint and data model enforcement. Alternatively, the CDP can be used with an existing database by mapping the database's tables to CDM types.
In addition to creating store level items, CDP tools will also generate CLR classes corresponding to the types and relationships. The implementation may use the Namespace attribute of a <Schema> element to create CLR namespaces for scoping these classes. The CDP also supports mapping existing object models to CDM types. Note that CDM types do not require a store; in-memory deployments are possible.
Any meaningful set of real-world data always has relationships among its constituent parts. For example: a customer places one or more orders, an order contains order details, a product is available from one or more suppliers, and so on. In this sentence, the nouns (customer, orders, order details, product, suppliers) denote data, and the verb phrases (places, contains, is available from) denote the relationships among them. Clearly, the concept of relationships is integral to any data modeling endeavor. The relational model does not explicitly support relationships; primary keys, foreign keys, and referential integrity provide tools to implement some of the constraints implied by relationships. A key value of the CDM is its support of relationships as a first class notion within the data model itself. This allows for simpler and richer modeling capabilities. Relationship support also extends to CDM queries, which allow relationships to be referenced explicitly in a query and provide the ability to navigate based on relationships.
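Name resolution across schemas can be sketched as follows. `SchemaRegistry` is hypothetical, and its resolution order is an assumption rather than anything the text specifies: fully qualified names are accepted anywhere; otherwise the current schema is searched first, then the namespaces named in <Using>, and an ambiguous match is treated as an error.

```python
from typing import Dict, Iterable, Set


class SchemaRegistry:
    """Hypothetical registry of deployed schemas: namespace -> type names."""

    def __init__(self) -> None:
        self._types: Dict[str, Set[str]] = {}

    def deploy(self, namespace: str, *type_names: str) -> None:
        self._types.setdefault(namespace, set()).update(type_names)

    def resolve(self, name: str, current: str, usings: Iterable[str] = ()) -> str:
        if "." in name:
            # Fully qualified Namespace.TypeName works from anywhere, no <Using> needed.
            namespace, _, simple = name.rpartition(".")
            if simple in self._types.get(namespace, set()):
                return name
            raise LookupError(f"unknown type {name}")
        if name in self._types.get(current, set()):
            return f"{current}.{name}"
        # <Using> is only a shortcut: search the imported namespaces.
        matches = [ns for ns in usings if name in self._types.get(ns, set())]
        if len(matches) == 1:
            return f"{matches[0]}.{name}"
        raise LookupError(f"{name} is {'ambiguous' if matches else 'undefined'} in {current}")


registry = SchemaRegistry()
registry.deploy("Contoso.Common", "Address")
registry.deploy("Contoso.LOB", "Customer", "Order")
```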
Modeling semantics can require related entities to behave as a unit for various operations. For example, consider the Contains relationship between Order and OrderLine in
Association is one of the most common and useful relationships in data modeling. A classic example of an association is the relationship between the Customer and Order entities. Typically, this relationship includes multiplicity, where each Order is associated with exactly one Customer and every Customer has zero or more Orders. The relationship can also include operational behavior: it is useful to operate on related entities as if they were a unit. For example, when a Customer with outstanding Orders is deleted, copied, or serialized, it may be desirable for the corresponding Orders to also get deleted, copied, or serialized. The <Association> element is used to model this relationship.
The “Type” attribute on <End> defines the entity type participating in the relationship.
The “Multiplicity” attribute defines the cardinality of the end. In this example, it is specified that there is exactly one Customer (value is “1” on line 15) for zero or more orders (value is “*” on line 14). Other values for this attribute are:
“0..1”: zero or one
“1”: exactly one
“*”: zero or more
“1..*”: one or more
“n”: exactly n
“n..m”: between n and m, inclusive, where n is less than or equal to m.
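Below is a minimal parser for these multiplicity values. It assumes the ranges are written with a `..` separator (for example `0..1` and `1..*`). `parse_multiplicity` and `allows` are hypothetical helpers, not part of SDL or the CDP.

```python
import re
from typing import Optional, Tuple


def parse_multiplicity(text: str) -> Tuple[int, Optional[int]]:
    """Return (lower, upper); upper is None when the end is unbounded."""
    text = text.replace(" ", "")
    if text == "*":
        return 0, None                   # zero or more
    match = re.fullmatch(r"(\d+)(?:\.\.(\d+|\*))?", text)
    if match is None:
        raise ValueError(f"invalid multiplicity {text!r}")
    lower = int(match.group(1))
    upper = match.group(2)
    if upper is None:
        return lower, lower              # "1" or "n": exactly n
    if upper == "*":
        return lower, None               # "1..*": lower or more
    if lower > int(upper):
        raise ValueError(f"lower bound exceeds upper bound in {text!r}")
    return lower, int(upper)             # "0..1" or "n..m"


def allows(multiplicity: str, count: int) -> bool:
    # True if `count` related entities satisfy the multiplicity.
    lower, upper = parse_multiplicity(multiplicity)
    return count >= lower and (upper is None or count <= upper)
```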
The <OnDelete> element on line 16 specifies the operational behavior for delete operations. It has one attribute, “Action”, which can have one of three values: Cascade, which deletes all Orders belonging to the Customer (this is what is shown in the example above); Restrict, which prevents deletion of a Customer when outstanding Orders exist; and RemoveAssociation, which removes the relationship between the Orders and the Customer. Note that, to keep the semantics consistent, the <OnDelete> element is allowed on only one of the ends of a relationship.
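The three delete actions can be sketched over plain Python collections. The `OnDelete` enum and `delete_customer` are hypothetical names. This model does not check the resulting state against the Customer end's multiplicity, so RemoveAssociation simply leaves the affected Orders without a Customer.

```python
from enum import Enum
from typing import Dict, Optional, Set


class OnDelete(Enum):
    CASCADE = "Cascade"
    RESTRICT = "Restrict"
    REMOVE_ASSOCIATION = "RemoveAssociation"


def delete_customer(customers: Set[str], orders: Dict[str, Optional[str]],
                    customer: str, action: OnDelete) -> None:
    """orders maps an order id to its customer (None once the association is removed)."""
    related = [order for order, owner in orders.items() if owner == customer]
    if action is OnDelete.RESTRICT and related:
        raise ValueError(f"{customer} still has outstanding orders: {related}")
    for order in related:
        if action is OnDelete.CASCADE:
            del orders[order]        # delete the Orders along with the Customer
        elif action is OnDelete.REMOVE_ASSOCIATION:
            orders[order] = None     # keep the Orders, drop the link
    customers.discard(customer)
```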
The CDM allows the specification of behavior on other operations: copy, serialize, secure, and lock. They all follow the same pattern as illustrated for the delete behavior—by using a sub-element of the form <OnOperation Action=“. . . ”>. The specification of operational behavior can be done on both associations and compositions.
Associations model peer-to-peer relationships. Compositions model strict containment (parent-child) relationships. A classic example of a composition is the relationship between an Order and the Line items corresponding to it. The order-line is completely controlled by its parent order. Semantically, it has no meaningful existence outside of the containing order. Deleting an order forces the deletion of all the order-lines. Thus Order and Line entities are referred to as being related via composition (that is, an Order is composed of one or more Line entities).
A composition relationship has the following three characteristics:
Uniqueness: The children of a given parent should have a unique key among themselves. For example, a Line belonging to an Order should have a unique key to distinguish it from all other Lines for the same Order. A common way of implementing this would be to derive the child's identity by combining its key and the parent's identity.
Cascade Delete: Deleting the parent deletes the child.
Multiplicity: The multiplicity of the parent is one. This expresses the semantic that a Line cannot exist without an Order.
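The three composition characteristics can be sketched in Python by storing Lines inside their Order. `Order.add_line` and `line_identity` are hypothetical. The child identity is derived from the parent's identity plus the child key, as the Uniqueness item suggests. Cascade delete and the parent multiplicity of one follow from the fact that a Line is only reachable through its Order.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Line:
    line_no: int        # child key, unique only within its parent Order
    product: str


@dataclass
class Order:
    order_id: int       # stands in for the parent's identity
    lines: Dict[int, Line] = field(default_factory=dict)

    def add_line(self, line_no: int, product: str) -> Line:
        # Uniqueness: children of one parent must have distinct keys.
        if line_no in self.lines:
            raise ValueError(f"order {self.order_id} already has line {line_no}")
        line = Line(line_no, product)
        self.lines[line_no] = line
        return line

    def line_identity(self, line_no: int) -> Tuple[int, int]:
        # Child identity derived from the parent's identity plus the child's key.
        return (self.order_id, self.lines[line_no].line_no)


orders: Dict[int, Order] = {1: Order(1), 2: Order(2)}
orders[1].add_line(1, "Widget")
orders[2].add_line(1, "Gadget")  # same child key, different parent: allowed
# Cascade delete: removing the Order removes the Lines stored inside it.
del orders[1]
```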
The <Composition> element is used to model a composition.
Note that there is no need to specify “Multiplicity” for the parent, because it is always one for a composition. In fact, it is an error to specify a “Multiplicity” other than one for the parent of a composition. Multiplicity may be specified for the child end, and it can take any of the values listed above.
Operational behaviors can be specified on compositions for the copy, serialize, secure, and lock operations. Note that there is no need to specify <OnDelete>, because a composition always has cascade delete semantics. It is an error to specify <OnDelete Action=“x”> where x is not “Cascade”, and it is an error to specify <OnDelete> on the child end.
Often it is useful to operate on related entities as if they were a unit. For example, when copying an Order entity, it is usually desirable to have the corresponding Line entities copied as well. The CDM allows the specification of these behaviors for various operations: delete, copy, serialize, secure, and lock. The general pattern used is to define a sub-element of the form <OnOperation Action=“value”/>. The Operation can be one of Delete, Copy, Serialize, Secure, or Lock. Depending on the Operation, the value of “Action” can be Cascade, Restrict, RemoveAssociation, etc.
The following points should be kept in mind when specifying operational semantics. All <OnOperation . . . /> elements should be specified on only one of the relationship ends; for example, it is illegal to specify <OnDelete> on Customer and <OnCopy> on Order. The conceptual model is that one end of the relationship is in a controlling role, and operational semantics cascade down from that entity to all the other ends. There is no need to specify <OnDelete . . . /> for a composition: by definition, deleting the parent of a composition cascades the operation down to the child. All <OnOperation . . . /> elements of a composition should be specified on the parent. Finally, the end that specifies <OnDelete> or <OnSecure> with Action=“Cascade” should have a multiplicity of “1”; otherwise, the semantics become inconsistent.
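These rules lend themselves to a static check. Below is a hypothetical Python validator over a simplified relationship description. `End`, `Relationship`, and `operational_errors` are illustrative names, not SDL elements. Operations are given as a mapping such as `{"Delete": "Cascade"}`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class End:
    role: str
    multiplicity: str
    operations: Dict[str, str] = field(default_factory=dict)  # e.g. {"Delete": "Cascade"}


@dataclass
class Relationship:
    name: str
    kind: str                     # "Association" or "Composition"
    ends: List[End]
    parent: Optional[str] = None  # role of the parent end, for compositions


def operational_errors(rel: Relationship) -> List[str]:
    errors = []
    controlling = [end for end in rel.ends if end.operations]
    if len(controlling) > 1:
        errors.append("all <OnOperation> elements must be on a single end")
    for end in controlling:
        if rel.kind == "Composition":
            if end.role != rel.parent:
                errors.append(f"{end.role}: operations belong on the composition parent")
            if end.operations.get("Delete", "Cascade") != "Cascade":
                errors.append(f"{end.role}: a composition only allows OnDelete Cascade")
        for op in ("Delete", "Secure"):
            if end.operations.get(op) == "Cascade" and end.multiplicity != "1":
                errors.append(f"{end.role}: cascading {op} requires multiplicity 1")
    return errors


customer_order = Relationship("Customer_Order", "Association",
                              [End("Customer", "1", {"Delete": "Cascade"}), End("Order", "*")])
```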
Relationships in the CDM are defined outside of the types that participate in them. The types themselves need not have a special property to establish a relationship. For example, the Order_Customer relationship can be specified using the top level <Association> element; the Customer and Order types do not have a property or other identifying marks within them to indicate that they participate in a relationship with each other.
Various mechanisms can be imagined for navigating from one end of the relationship to the other in a query or API. For instance, the CDP API has a static class for each relationship; this class has methods which can be used for navigation. To take the example of Order_Customer, the class would be named Order_Customer and the methods would be GetOrdersGivenCustomer( . . . ) and so on.
However, the most intuitive way would be to actually have a property on either end of the relationship. For example, if Order had a property called Customer, then navigation would be as simple as accessing a property. To enable this simple pattern for relationship navigation, the CDM has the notion of relationship properties. They can be defined on an entity for the express purpose of navigating to a related entity. Relationship properties are defined using the <RelationshipProperty> element. The <RelationshipProperty> element has three attributes: “Name” defines the name of the property. This name can be used in queries or the API to navigate to the other end of the relationship. “Relationship” specifies the name of an association or a composition. “End” identifies the navigation target. Relationship properties can be defined for both association and composition relationships, and can be defined at either end of the relationship.
Note that unlike “regular” properties, relationship properties do not have a “Type” attribute. This is because they are implicitly typed to an Entity Ref (or a collection of Entity Refs). This entity reference is used to navigate to the target entity.
A relationship property is either a single Entity Ref or a collection of Entity Refs, depending on the multiplicity of the end that it is targeting. The following table shows possible values for “Multiplicity” and the types of the corresponding relationship properties:
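The typing rule can be approximated in Python. This sketch assumes that an end whose upper bound is one (“0..1” or “1”) yields a single Ref and that any other end yields a collection of Refs. `relationship_property_type` is a hypothetical helper, and the `..` range notation is assumed.

```python
def relationship_property_type(target_type: str, multiplicity: str) -> str:
    """Type of a relationship property that navigates to an end of `target_type`."""
    # The upper bound is the part after "..", or the whole value for "1", "*", "n".
    upper = multiplicity.replace(" ", "").split("..")[-1]
    single = upper != "*" and int(upper) <= 1
    return f"Ref({target_type})" if single else f"Collection(Ref({target_type}))"
```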
As described supra, an entity can either exist “by itself” at the top level, or reside within an entity set (which is itself within an entity container). Consider the situation within the context of the two ends of a relationship. In the case of “free floating” entities both ends of the relationship exist somewhere within the sea of entities. Consider when entities are grouped into multiple entity sets. For example, consider the following schema fragment:
The entity container (LOBData) has three entity sets: Orders, GoodCustomers, and BadCustomers. All Customer entities in GoodCustomers have admirable credit and hence are deemed worthy of placing orders. The BadCustomers have defaulted on their payments. The obvious business rule here is that only GoodCustomers may place Orders; in other words, the scope of the Customer_Order relationship should be restricted such that only entities in GoodCustomers can have Orders. Furthermore, all Orders must reside in the Orders entity set. This is done by using the <Scope> element inside an <EntityContainerType>, as shown below:
The <Scope> (line 5) element has one attribute—“Relationship”—which identifies the association or composition whose ends are being scoped. The <End> sub-elements, one for each end of the relationship, are used to specify the entity set that this end is scoped to. It is seen on line 6 that the Customer end is scoped to the GoodCustomers entity set; and on line 7, the Order end is scoped to the Orders entity set.
If a deployed schema has an entity container, then each end of each relationship is scoped to one and only one entity set within the entity container. If an explicit <Scope> clause is absent for a given association or composition, then the implementation scopes the ends to default entity sets. The manner in which implicit scoping is done is implementation defined.
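Below is a sketch of enforcing a <Scope> declaration for Customer_Order, using the GoodCustomers/BadCustomers/Orders example. Relationship instances are modeled as `{role: entity key}` dictionaries. `scope_violations` and the sample keys are hypothetical.

```python
from typing import Dict, List, Set


def scope_violations(links: List[Dict[str, str]],
                     scope: Dict[str, str],
                     entity_sets: Dict[str, Set[str]]) -> List[str]:
    """
    links:       relationship instances as {role: entity key}
    scope:       role -> entity set the end is scoped to (what <Scope>/<End> declare)
    entity_sets: entity set name -> keys of the entities it contains
    """
    problems = []
    for link in links:
        for role, key in link.items():
            target = scope[role]
            if key not in entity_sets[target]:
                problems.append(f"{role} {key!r} is outside entity set {target}")
    return problems


lob_data = {
    "GoodCustomers": {"alice"},
    "BadCustomers": {"mallory"},
    "Orders": {"o1", "o2"},
}
customer_order_scope = {"Customer": "GoodCustomers", "Order": "Orders"}
```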
The CDM allows specification of stricter scoping rules for compositions. Consider the composition between Order and Line. It does not make sense for a Line to exist without a corresponding Order. Thus, the entity set which contains Line entities should enforce the business rule that every Line should be the child of one, and only one, Order. This is done using the <Requires> element.
The following scoping requirements are satisfied by the above example: The Line end of Shipment_Line should be scoped to ShipmentLines entity set (line 16). Every Line in ShipmentLines must be an end of the Shipment_Line composition. This is done in line 7 using the <Requires> element. This element has one attribute, “Relationship”, which identifies the composition. The Line end of Order_Line should be scoped to OrderLines entity set (line 12). Every Line in OrderLines should be an end of the Order_Line composition (line 4). If the <Requires> element is absent in an entity set, then it is not constrained to have entities belonging to any given composition. This can be an important requirement in many modeling scenarios. Note that the <Requires> element can be used to specify only one composition name.
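The <Requires> rule amounts to "every member of the entity set is the child of exactly one parent in the named composition". Below is a hypothetical check in Python, with the composition given as (parent, child) key pairs.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def requires_violations(members: Set[str], composition: Iterable[Tuple[str, str]]) -> List[str]:
    """
    members:     keys of the entities in an entity set carrying <Requires Relationship="...">
    composition: (parent, child) pairs of the required composition
    Returns the members that are not the child of exactly one parent.
    """
    parents_per_child = Counter(child for _, child in composition)
    return sorted(m for m in members if parents_per_child[m] != 1)


order_lines = {"L1", "L2", "L3"}
order_line = [("O1", "L1"), ("O1", "L2")]
```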
The description of the CDM in this alternative implementation has so far been agnostic about where exactly the entities reside: they can reside in memory or be persisted durably in a store, and the core CDM neither implies, requires, nor cares which. The following describes persisted entities and how the CDM supports the notion of persistence.
The entity container is the top level organizational unit for entities and is the level at which persistence semantics are defined. An entity is the smallest unit of data that can be persisted. Typically, persisted entities have operational semantics associated with them—such as copy, move, backup, restore, serialize, etc. The entity container is the unit of persistence on which these semantics can be defined. It is also the organizational unit on which administrative notions such as backup, restore, users, roles, etc., are defined. In this sense, an entity container is similar to a SQL database.
The entity container has one or more entity sets within which all the entities reside. Thus, defining an instance of the entity container jump-starts the storage allocation process. Typically, the type designer defines an entity container type. The CDM itself does not define how an entity container instance is obtained from the type, or how this instance is mapped to the underlying storage structures (such as EntityContainer→Database, Entity set→table(s), etc.). The CDP defines a set of prescriptive mappings from an entity container (and the types contained therein) to SQL storage. The CDP also allows non-prescriptive (user-specifiable) mappings.
In general, applications are free to organize entities in any way they choose—the CDM does not dictate any specific organizational structure. When talking about persisted entities however, the semantics of entities, entity sets, and the Entity Container imply a certain structural organization. When entities are persisted: every entity should live within an entity set; every entity set should live within an Entity Container; and the Entity Container should live at the top level.
Not all application developers define types; most of them use types which have been defined and deployed by other ISVs. Provided herein is a mechanism for the user of a type to be able to extend the pre-defined type in specific ways without harming the fidelity of the original type. Entity Extensions and Extensible Enumerations are two facilities provided in the CDM to allow independent extensibility of types.
CDM solves this problem elegantly by the notion of an entity extension. Entity extensions are referenceable, structured types; they can contain any member that an entity can contain—properties, methods, and computed properties. An extension is defined using the <EntityExtensionType> element. Extension types do not enjoy inheritance. <EntityExtensionType> can contain any element that can be present in an <EntityType>. It is not necessary for the property names to be unique across all extensions or even across properties of the type being extended.
Lifetime Management. An extension is tied closely to an entity because each extension type specifies the entity type that it extends. All entity extensions have a default property named “Entity” of type Ref(T), where T is the value of ExtendsType, which points to the entity instance of which this is an extension. This is shown schematically in
An extension has no independent existence outside of the entity that it is extending. CDM requires that lifetime management of extensions be part of the implementation—that is, the implementation should delete all extensions when the corresponding entity is deleted. In this sense, extensions behave like they are compositions of the corresponding entity. Even though entity extension instances are considered to be part of the cohesive unit of data defined by the containing entity, there is no consistency guarantee concerning changes to the entity and the extension, except as explicitly managed via an application using transactions.
Instance Management. If InstanceManagement=“Implicit”, all of the properties defined in an extension should either be nullable, specify a default value, or have a Collection type with a minimum occurrence constraint of zero. Note that a singleton inline typed property cannot specify a default value and so must be nullable. This allows queries to be executed against the extension as if it were always present.
If InstanceManagement=“Explicit”, applications should explicitly add and remove an extension to/from an entity instance using the operations defined in the query language for the common data model. A means to test for the presence of an extension is also provided.
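Below is a sketch of extension lifetime and the two InstanceManagement modes. `ExtensionStore` is a hypothetical Python class, not the CDP API. Each extension carries an "Entity" back-reference, and deleting an entity deletes its extension. Under Implicit management, reads behave as if an extension with default values were always present; under Explicit management, the application adds and removes extensions itself.

```python
from typing import Any, Dict, Optional


class ExtensionStore:
    """Hypothetical store for one <EntityExtensionType> over one entity type."""

    def __init__(self, defaults: Dict[str, Any], implicit: bool) -> None:
        self.defaults = defaults    # Implicit: every property has a default or None
        self.implicit = implicit    # InstanceManagement="Implicit" vs "Explicit"
        self.entities: Dict[int, Dict[str, Any]] = {}
        self._extensions: Dict[int, Dict[str, Any]] = {}

    def add(self, entity_id: int, **values: Any) -> None:
        if entity_id not in self.entities:
            raise KeyError(f"no entity {entity_id} to extend")
        # Every extension carries an "Entity" back-reference to what it extends.
        self._extensions[entity_id] = {**self.defaults, **values, "Entity": entity_id}

    def remove(self, entity_id: int) -> None:
        self._extensions.pop(entity_id, None)

    def has_extension(self, entity_id: int) -> bool:
        return entity_id in self._extensions

    def read(self, entity_id: int) -> Optional[Dict[str, Any]]:
        if entity_id in self._extensions:
            return self._extensions[entity_id]
        if self.implicit and entity_id in self.entities:
            # Queries see an extension with default values even if none was stored.
            return {**self.defaults, "Entity": entity_id}
        return None

    def delete_entity(self, entity_id: int) -> None:
        # Lifetime management: an extension never outlives its entity.
        del self.entities[entity_id]
        self.remove(entity_id)


crm = ExtensionStore({"Rating": None}, implicit=False)
crm.entities[1] = {"Name": "Contoso"}
```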
Preventing Extensibility. Type designers may want to prevent the creation of extensions for a given type. This is done with the “Extensible” attribute on the <EntityType> or <AssociationEntityType> elements. When it is set to “true” (the default value), <EntityExtensionType>s can be defined for this entity. Setting it to “false” prevents extensions to this type.
Extensible Enumerations and Enumeration Aliases. As vendors or their customers evolve a set of deployed types, it sometimes becomes necessary to add names to an existing enumeration. For example, consider an enumeration called ToyotaCarModels, which probably had the following names in 1995: [Corolla, Camry, MR2, Celica, Avalon]. However, in 2005, it needs to have these additional names: Prius, Echo, Matrix, and Solara.
Enumerations can be extended using the <EnumerationExtensionType> element. This is shown in the example below:
In line 7, the enumeration type ‘C’ is extended to add the names ‘Q’, ‘S’, and ‘T’. In line 12, a separate extension is made to ‘C’ by adding the names ‘T’ and ‘U’. Since enumeration members are always qualified by the name of the underlying base or enumeration extension type, duplicate names across extensions are permitted (D.T and E.T in this example).
It is often useful to have an alias for an existing enumeration member name. A classic example of this occurs in a Seasons enumeration. In the United States, the seasons are [spring, summer, fall, winter]. In the UK or India, “fall” is called “autumn”. It is then desirable to be able to define both names, but have them conceptually map to the same enumeration member. The “AliasesMember” attribute of the <EnumerationMember> element defines an alias for an existing name. In this example, C.R, D.T, and E.U are all aliases for C.Q and hence do not represent unique names.
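As a rough illustration of qualified enumeration names and aliases, the sketch below keeps a map from each qualified name to the member value it denotes. The Enumeration class, its extend method, and the UKSeasons name are hypothetical; the point is only that qualification makes duplicate short names harmless and that an alias adds a name without adding a value.

```python
class Enumeration:
    """Toy model of an extensible enumeration with aliases (hypothetical API)."""

    def __init__(self, name, members):
        self.name = name
        self.members = {}  # qualified name -> canonical qualified name
        self.extend(name, members)

    def extend(self, ext_name, members=(), aliases=None):
        # New members are qualified by the declaring (base or extension) type's
        # name, so the same short name may appear in different extensions.
        for m in members:
            self._declare(f"{ext_name}.{m}", f"{ext_name}.{m}")
        # An alias is a new name that maps onto an existing member's value.
        for alias, target in (aliases or {}).items():
            self._declare(f"{ext_name}.{alias}", self.members[target])

    def _declare(self, qualified, canonical):
        if qualified in self.members:
            raise ValueError(f"duplicate member {qualified}")
        self.members[qualified] = canonical

    def canonical(self, qualified):
        return self.members[qualified]

    def distinct_values(self):
        return set(self.members.values())


seasons = Enumeration("Seasons", ["spring", "summer", "fall", "winter"])
seasons.extend("UKSeasons", aliases={"autumn": "Seasons.fall"})
print(seasons.canonical("UKSeasons.autumn"))  # Seasons.fall
```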
CDM Type System
The following table presents the CDM type system in the form of an abstract syntax, exposing the system's recursive structure: the infinite set of legal sentences implied by the finite grammar. The type system is first presented in toto, then again with expository text and small examples.
Base Types
Structural types. The first block of definitions is a recursively closed sub-grammar comprising the CDM structural types. Sentences in this sub-grammar may refer only to (1) other terms in the same sub-grammar, (2) nominal types, by name only, and (3) terms in the last block, namely the base types. The base types include only various name spaces and straightforward types to be developed elsewhere. With one exception (collection types in entity containers), structural types may not be directly instantiated. Rather, they build up entity types, which are a kind of nominal type and are the primary way to instantiate data.
A simpleType is either a scalar, an enum, a cname, an sname, or a reference to an ename. A scalar is a base type with no significant internal structure, like a machine-primitive Integer, a SQL Decimal, or a .NET String or DateTime. A very important kind of scalar type is the GUID. This type has the property that every instance is universally unique across space and time. Its primary use is as the type of system-generated or surrogate keys. An enum is a straightforward enumerated type as seen in C#, Visual Basic, or other .NET programming languages. A cname is a name of a complexType, one of the nominal types defined below. An sname is a typeSynonym, semantically the same as writing the definition of the named type in-line. Note that snames may not refer to nominal types: they are mere synonyms for structural types. Finally, an ename is a name of an entityType, the primary kind of nominal type. enames are enclosed in a literal Ref < > notation to emphasize that only entity types have first-class references. References to entities may model foreign keys in relational databases. CDM formalizes this usage through type-based Relationships.
There are two aspects to equality in general: first, whether two type expressions denote the same type; second, whether two instances of a single, given type are equal. The terminology “equality over types” is employed for the former and “equality over values” or “equality over instances” for the latter.
Equality over simple types is straightforward: two simple types are either manifestly the same or not (enums must have the same symbolic tags in the same order). Equality over values of simple types is defined, in principle, by bitwise comparison. In practice, some of the .NET types will have more involved equality operations. Suffice it to say that every simple type must furnish an equality operator taking two instances of the type and returning a Boolean.
A tupleType is one or more pairs of the form structuralType aname, each followed by an optional hash mark, all enclosed in ‘curly braces.’ A tuple type is like the payload of a C# class definition consisting of members or attributes, each of which has a name and one of the structuralTypes, recursively. Names of attributes are of type aname, that is, drawn from a domain of symbolic names distinct from those of ename, cname, etc. Any reference to an attribute by name must be unambiguous.
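Equality over tuple types is defined as follows: two tuple types are equal if and only if their attributes have the same names and types and appear in the same order. So, for example, two tuple types that declare the same attributes in different orders are different types.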
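A sketch of equality over types for the simple and tuple types discussed so far, in Python. The Scalar, Enum, Attr, and TupleType classes are a hypothetical encoding of the grammar, not part of CDM. Hash marks are ignored when comparing tuple types; that is an assumption, since the definition mentions only names, types, and order.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Scalar:
    name: str  # e.g. "Int", "String", "Guid"


@dataclass(frozen=True)
class Enum:
    tags: Tuple[str, ...]  # symbolic tags in declaration order


@dataclass(frozen=True)
class Attr:
    name: str
    typ: object           # any structural type
    hashed: bool = False  # the optional hash mark


@dataclass(frozen=True)
class TupleType:
    attrs: Tuple[Attr, ...]


def types_equal(a, b) -> bool:
    """Equality over types, recursing through tuple types."""
    if type(a) is not type(b):
        return False
    if isinstance(a, Scalar):
        return a.name == b.name
    if isinstance(a, Enum):
        return a.tags == b.tags  # same tags in the same order
    # Tuple types: same attribute names and types, in the same order.
    return len(a.attrs) == len(b.attrs) and all(
        x.name == y.name and types_equal(x.typ, y.typ)
        for x, y in zip(a.attrs, b.attrs)
    )
```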
Hash marks denote attributes that make up key specifications, meaning that, under certain circumstances, the values of hash-marked attributes must, collectively, be unique within collection-type sets or within entitySets. If more than one attribute in a type has hash marks, then the key is a compound key, consisting of all the marked attributes, concatenated, in the order in which they were defined in the text. Compound keys may span multiple tuple types, so, for instance, in
That is a toy tuple type for a contact card with three attributes named personName, socialSecurityNumber, and address. These attributes have the following three types, respectively: a free-form string for a person's name; a nested tuple comprising the three parts of a social-security number, marked with a hash-mark so as to act as a unique key in an entity set or set collection; and a nested tuple containing a street, city, state, and a two-part zip code, again represented as a more deeply nested tuple.
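The toy contact card can be encoded to show how hash marks yield keys. In this sketch a tuple type is a list of (name, type, hashed) triples; the sub-attribute names of the social-security number and zip code (area, group, serial, zip5, zip4) are invented for illustration, as are the helper functions. The sketch collects top-level hash-marked attributes in definition order and checks that no two instances in a collection share a key.

```python
ssn_type = [("area", "String", False), ("group", "String", False),
            ("serial", "String", False)]
zip_type = [("zip5", "String", False), ("zip4", "String", False)]
address_type = [("street", "String", False), ("city", "String", False),
                ("state", "String", False), ("zip", zip_type, False)]
toy_contact_card = [
    ("personName", "String", False),
    ("socialSecurityNumber", ssn_type, True),  # hash-marked: acts as the key
    ("address", address_type, False),
]


def key_attributes(tuple_type):
    """Hash-marked attribute names in definition order (compound if more than one)."""
    return [name for name, _, hashed in tuple_type if hashed]


def freeze(value):
    # Nested tuple values (dicts here) become hashable, order-independent tuples.
    if isinstance(value, dict):
        return tuple(sorted((k, freeze(v)) for k, v in value.items()))
    return value


def key_of(tuple_type, value):
    return tuple(freeze(value[name]) for name in key_attributes(tuple_type))


def check_unique_keys(tuple_type, values):
    """Raise if two instances in one set collection or entity set share a key."""
    seen = set()
    for v in values:
        k = key_of(tuple_type, v)
        if k in seen:
            raise ValueError(f"duplicate key {k}")
        seen.add(k)
```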
A collectionType is either a List, Set, or Multiset, or one of several other kinds, each parameterized by another structural type. Most of the time, collection types are instantiated indirectly as components of entities. However, they can be directly instantiated inside entity containers, as documented below.
Lists. A list is an ordered collection of instances of the structuralType. “Ordered” means there is a mapping from the positive integers 1, 2, . . . to the elements of the collection. “Ordered” should not be confused with “sorted:” the list [1, 3, 2] is ordered but not sorted; [1, 2, 3] is both ordered and sorted. The special case of a pair is commonplace. A pair is just a list of length two.
Equality over list types is recursive over the structural type of the members of the list. Two instances of a list type are equal if (1) they have the same number of members; (2) the member values are equal, recursively; and (3) the members appear in the same order. Example: List<Int> is the type of a list of integers.
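Equality over list values, as just defined, is a short recursive function. This is a minimal sketch; member_equal is a hypothetical parameter that supplies the member type's own equality, so nested lists compare recursively.

```python
def list_equal(a, b, member_equal=lambda x, y: x == y):
    """Equality over values of a list type.

    (1) same number of members, (2) pairwise-equal member values,
    (3) members in the same order.
    """
    return len(a) == len(b) and all(member_equal(x, y) for x, y in zip(a, b))
```

Passing list_equal itself as member_equal compares a list of lists level by level.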
Sets. A set is an unordered collection of instances of the structuralType wherein duplicates are not permitted. Set<Int> would be the type of a set of integers. Sets support operations for inserting members, testing membership, and computing unions, intersections, and so on.
Equality over set types is recursive over the structural type of the members of the set. Two instances of a set type are equal if they contain the same number of members and the members' values are equal, recursively, independently of order.
Counting upward from one is the mathematical convention; programming languages tend to count upward from zero. Even mathematics is undecided on whether to count from one or from zero, however: something as basic as the set of natural numbers, written ℕ, is sometimes defined as {0, 1, 2, . . .} and sometimes as {1, 2, 3, . . .}.
Multisets. A multiset is an unordered collection of instances of the structuralType, where duplicates are permitted. For instance,
Equality over multiset types is recursive over the structural type of the members of the multiset. Two instances of a multiset type are equal if (1) they contain the same number of members; and (2) the members' values are equal, recursively, independently of order, but respecting the counts of duplicates. So, [1, 1, 2, 3] and [2, 1, 3, 1] are equal as multisets, but [1, 1, 2, 3] and [1, 2, 2, 3] are not. Mathematically, the most straightforward representation of a multiset is a set of pairs of multiplicity counts and values. If multisets are so represented, then their equality is automatic, relying only on set equality and pair equality.
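Set and multiset equality can be sketched the same way. The multiset version uses the representation just mentioned, a set of (count, value) pairs built with Python's collections.Counter, so that multiset equality reduces to set equality. The sketch assumes members are hashable Python values and that set arguments contain no duplicates.

```python
from collections import Counter


def set_equal(a, b):
    # Same number of members and the same values, independently of order.
    return len(a) == len(b) and set(a) == set(b)


def as_multiplicity_pairs(xs):
    # The "set of (multiplicity count, value) pairs" representation of a multiset.
    return frozenset((n, v) for v, n in Counter(xs).items())


def multiset_equal(a, b):
    # With the pair representation, equality relies only on set and pair equality.
    return as_multiplicity_pairs(a) == as_multiplicity_pairs(b)
```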
Nominal Types. Nominal types have the distinguishing feature that they must be uniquely named within their domains. Two of the nominal types participate in subtyping relations: complexType and entityType. Subtyping for structural types is theoretically much more complicated, so it is not allowed in CDM.
Entity sets are nominal both because their names are part of entity references, of type Ref<e>, and because they are one way to root instance trees. In that role, they must have names so that applications may retrieve instance data. Entity collections must also be named because they also root instance trees. Type synonyms are nominal types because their essence is naming. Finally, relations are nominal so that Association Entities may share them and refer to them by name.
Equality over nominal types is, in some sense, easier than equality over structural types. In its most draconian form, it is an error for two nominal types to have the same name and any difference, even a syntactic one, in their definitions.
Equality over values of nominal types also can take advantage of naming. In particular, two instances of an entity type are equal if and only if their references are equal. This is the justification for calling the reference to an entity its identity.
Those are the six different kinds of declarations of nominal types. A type synonym introduces a particular sname into the domain of snames. When that sname is used elsewhere, the structuralType to which it refers is substituted for the sname. For example,
A complexType introduces a cname that (1) optionally extends another complex type; where (2) every instance obeys an optional constraint, expressed in a first-order predicate calculus; and (3) has the additional structure denoted by the structuralType, meaning that the given structural type is added to the structural types of the ancestors, and collisions of attributes are not permitted.
CDM permits single inheritance, meaning that a complex type may extend at most one other complex type, but recursion is permitted, so that a direct ancestor may, in turn, extend at most one other complex type.
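The following sketch, assuming a hypothetical ComplexType class, shows single inheritance with linear accumulation of structure, rejection of colliding attribute names, and run-time constraint checking. Whether an ancestor's constraint also binds instances of a derived type is an assumption made here, not something the text states.

```python
class ComplexType:
    """Hypothetical model of a CDM complex type with single inheritance."""

    def __init__(self, name, attrs, base=None, constraint=None):
        self.name = name
        self.base = base              # at most one direct ancestor
        self.own_attrs = list(attrs)  # (attribute name, type name) pairs
        self.constraint = constraint  # optional predicate over an instance dict
        inherited = {n for n, _ in base.all_attrs()} if base else set()
        clash = inherited & {n for n, _ in self.own_attrs}
        if clash:
            raise TypeError(f"attribute collision in {name}: {sorted(clash)}")

    def all_attrs(self):
        # Ancestors' structure first, then this type's additional structure.
        return (self.base.all_attrs() if self.base else []) + self.own_attrs

    def check(self, instance):
        # Run-time checking; ancestors' constraints are assumed to apply as well.
        t = self
        while t is not None:
            if t.constraint is not None and not t.constraint(instance):
                raise ValueError(f"constraint of {t.name} violated")
            t = t.base
```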
Complex types are the richest kind of structured CDM inline type. Inline types have the distinguishing feature that, while it is possible to refer to them by name in other type definitions, it is not possible to instantiate them directly nor to take a reference to instances that may occur in entities. It is also the case that embedded key specifications, in the form of hash-marked attributes in tuple types, are ignored in the context of complex types.
The distinction between complex types and type synonyms is that the former may enjoy inheritance and suffer constraints. As an aside, note that some constraints can be checked statically, at compile time, through theorem-proving and model-checking technologies. Other constraints must be checked at run time, in which case a compiler must emit constraint-checking code. Example:
An entityType introduces an ename that (1) optionally extends another entity type; where (2) every instance obeys an optional constraint; that (3) optionally refers to a relationship by its rname; and (4) has the additional structure denoted by the structuralType, meaning that the given structural type is concatenated linearly to the structural types of the ancestor entity types, and collisions of attributes are not permitted.
Key specifications (hash marks) are also concatenated, meaning that the key specification of an entity type with ancestors is the concatenation of the key specifications of the included tuple types, in linear order.
An entity type must have a key specification. This implies that at least one tuple type in its inheritance chain must have at least one hash mark. This restriction is enforced semantically so as not to over-complicate the syntax.
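Key concatenation along an entity inheritance chain can be sketched similarly. EntityType is a hypothetical class; attributes are (name, type, hashed) triples, and the sketch rejects bases that are not entity types, colliding attribute names, and types without any key.

```python
class EntityType:
    """Hypothetical model of a CDM entity type with a concatenated key."""

    def __init__(self, name, attrs, base=None):
        # Entity types and complex types inhabit separate subtyping universes.
        if base is not None and not isinstance(base, EntityType):
            raise TypeError("entity types may only extend entity types")
        self.name, self.base, self.own_attrs = name, base, list(attrs)
        names = [a[0] for a in self.all_attrs()]
        if len(names) != len(set(names)):
            raise TypeError(f"attribute collision in {name}")
        if not self.key():
            raise TypeError(f"{name} has no key specification")

    def all_attrs(self):
        return (self.base.all_attrs() if self.base else []) + self.own_attrs

    def key(self):
        # Key specifications are concatenated in linear, ancestor-first order.
        return [n for n, _, hashed in self.all_attrs() if hashed]
```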
The optional rname accounts for CDM Association Entities, which are binary relations with extra attributes. Find more details in the section on relationships below.
CDM permits single inheritance, meaning that an entity type may extend at most one other entity type, recursively. Entity types may not extend complex types, and complex types may not extend entity types. In other words, complex types and entity types inhabit separate subtyping universes.
Entity types are the richest kind of structured CDM types that participate in entitySets, where the hash-marked attributes in tuples constitute primary keys in the ordinary sense of relational databases. Additionally, entity instances have identities, which are their key values concatenated to the identities of their entity sets. These identities constitute the values of the explicit Ref<ename> terms that appear in block one of the grammar.
Entity types are the primary kind of CDM type that can be directly instantiated, always in the context of an entity set, which is CDM's model for relations or tables, again in the ordinary sense of relational databases. Example:
The hash-mark on that attribute implies that social-security number will be the primary key in entity sets of employeeEntity instances. If the user preferred employee number over social-security number for keys, it would be necessary to define new tuple types with employee number hash-marked and social-security number not hash-marked, or to use the hash-mark-manipulation operations below to create new types from old.
An entitySet is a named collection of entities of type ename. Duplicate entity instances are not permitted in entity sets and the entity types must have key specifications, that is, one or more attributes tagged with hash marks. The name esname of the entity set is introduced into the distinct domain of esnames.
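A minimal entity set sketch follows, assuming a hypothetical EntitySet class. It rejects duplicate keys and returns, as each entity's identity, a reference built from the entity set's name followed by the key values, mirroring the description of identities above; encoding a Ref<ename> as a Python tuple is an assumption.

```python
class EntitySet:
    """Toy entity set: a named collection of entities with unique keys."""

    def __init__(self, esname, key_attrs):
        if not key_attrs:
            raise TypeError("entity types in an entity set need a key specification")
        self.esname = esname
        self.key_attrs = tuple(key_attrs)  # hash-marked attributes, in order
        self.rows = {}

    def insert(self, entity):
        key = tuple(entity[a] for a in self.key_attrs)
        if key in self.rows:
            raise ValueError(f"duplicate key {key} in {self.esname}")
        self.rows[key] = entity
        # Identity (a reference): the entity set's name followed by the key values.
        return (self.esname,) + key

    def deref(self, ref):
        esname, *key = ref
        if esname != self.esname:
            raise LookupError(f"reference does not point into {self.esname}")
        return self.rows[tuple(key)]
```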
An entityContainer is a named collection of one or more terms, each of which is either an entity set denoted by its esname or an anonymous collection type.
Relationships. CDM requires the following canonical forms of relationships: (1) Association, (2) Conditional Association, (3) Association Entity, and (4) Composition. These are all variations of the same lower-level notion of a mathematical relation, and we formalize them as such.
A mathematical relation is a subset of a Cartesian product of sets such that all members of the subset satisfy some constraint. Consider a list of one or more sets $(A_1, A_2, \ldots, A_n)$, also written $A_i$, $i = 1, \ldots, n$. The Cartesian product of these sets, written $C = A_1 \times A_2 \times \cdots \times A_n$, is the set of ordered n-tuples
$$C = \{(\alpha_1, \alpha_2, \ldots, \alpha_n) \mid \alpha_1 \in A_1, \alpha_2 \in A_2, \ldots, \alpha_n \in A_n\}$$
The Cartesian product is the largest relation that can be formed from a list of sets because the constraint does not filter out any members.
To form a relation with fewer members than the full Cartesian product, the constraint must filter some out. For example, consider the set of employee numbers, 1 through 100, written 1 . . . 100, and the set of department numbers, 1 through 10. One way of relating employees to departments would be as follows, writing R for the name of the relation:
$$R = \{(e, d) \mid e \in 1 \ldots 100,\ d \in 1 \ldots 10,\ (e-1) \operatorname{div} 10 = d-1\}$$
where ‘div’ is number-theoretic integer division. This relates employees 1 through 10 to department 1, employees 11 through 20 to department 2, and so on. The predicate is the conjunction of the three parts $e \in 1 \ldots 100$, $d \in 1 \ldots 10$, and $(e-1) \operatorname{div} 10 = d-1$.
This example happens, implicitly, to specify a many-to-one relationship between employees and departments. It is standard shorthand to use the name of the relation as an infix ‘operator’:
$$(e, d) \in R \iff e \mathrel{R} d$$
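The employee/department relation can be computed directly as a filtered Cartesian product. The names works_in and related are hypothetical; Python's // operator is floor division, which coincides with ‘div’ for the non-negative operands used here.

```python
from itertools import product

employees = range(1, 101)   # employee numbers 1..100
departments = range(1, 11)  # department numbers 1..10

# Cartesian product: every (e, d) pair, i.e. the largest possible relation.
cartesian = set(product(employees, departments))

# The relation keeps only the pairs satisfying (e - 1) div 10 = d - 1.
works_in = {(e, d) for (e, d) in cartesian if (e - 1) // 10 == d - 1}


def related(e, d):
    """Infix-style membership test: True when e is related to d."""
    return (e, d) in works_in
```

Each employee appears in exactly one pair, which is what makes the relation many-to-one.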
First-order predicate calculus suffices to express all constraints of practical interest, including uniqueness, multiplicity, reference, and conditional membership. CDM gives names to certain patterns of constraints that are important or very common in practice. For instance, relational database systems often implement one-to-many relationships by foreign keys. A CDM referential constraint is a statement that the value of a foreign key must appear as a primary key in the target entity set. The value of that foreign key is an instance of type Ref<ename>. A pattern in predicate calculus can be abstracted to stand for the class of referential constraints.
Informally, a constraint is a composition of individual terms through logical connectives like ‘and,’ ‘or,’ and ‘not.’ Each term is an expression of Boolean type, which may refer to attributes of entities by ename and aname; to constants; to comparison operators like ≦; to variables introduced by quantifiers like ∀ and ∃; and to some special, side-effecting operational constraints, for which the following syntax is provided:
Each of the nine forms of operational constraint refers to one of the two entity types in a binary relation by its ename. Cascade means that the operation Delete, Copy, or Serialize, should propagate to instances of the other entity type in the relation. Restrict means that the operation should only take place if instances of the other entity type independently perform a compatible operation. Remove means that the relation itself should be removed when the operation completes.
Another special form of constraint is the multiplicity constraint:
CDM uses the term conditional constraint for constraints that do not fall into any of the categories above. It is still being debated whether a class of compositional constraints should be identified.
An entitySet is, logically speaking, a unary relation, where the Cartesian product is trivial, consisting of just one set. A binaryRelation is a binary relation, where the Cartesian product names two sets. Ternary, quaternary, and higher-arity relations can also be defined, with correspondingly larger Cartesian products. However, binary relations satisfy virtually all practical scenarios, and any relation of larger arity can be built up straightforwardly as a composition of binary relations.
A binaryRelation introduces an rname into the domain of rnames and defines a binary relation on pairs of entities or other relations that satisfy the mandatory constraint. In the base case, where the pairs are entities and not other relations, the pairs are members of the Cartesian product of the universal sets of the entity types. The recursive case, where one or both of the related kinds are other relations, supports compositional building of relations of larger arity.
Association. A CDM Association is a binary relation, as defined above.
Conditional Association. All binary relations are conditional, meaning that the constraint is mandatory. Hence, CDM Conditional Associations are binary relations, as defined above.
Association Entity. An entity type may optionally refer to a binary relation by rname. Such an entity accounts for CDM Association Entities.
Composition. A binary relation that satisfies the following two criteria is a CDM Composition: there is a multiplicity constraint in which at least one of the two entity types has a multiplicity of one; and there is an operational constraint on that entity type of the form On Delete ename Cascade.
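The composition criteria and the effect of an On Delete Cascade constraint can be sketched as below. BinaryRelation and delete are hypothetical names; only the Cascade action is modeled (Restrict and Remove are not), relation instances are stored as key pairs and are always removed along with a deleted entity, and propagation stops at this one relation rather than following further relations.

```python
from dataclasses import dataclass, field


@dataclass
class BinaryRelation:
    rname: str
    left: str           # ename of one related entity type
    right: str          # ename of the other
    multiplicity: dict  # ename -> "1", "0..1", "*", ...
    on_delete: dict = field(default_factory=dict)  # ename -> "Cascade"
    pairs: set = field(default_factory=set)        # (left key, right key)

    def is_composition(self) -> bool:
        # Some side has multiplicity one and "On Delete <ename> Cascade".
        return any(
            self.multiplicity.get(e) == "1" and self.on_delete.get(e) == "Cascade"
            for e in (self.left, self.right)
        )


def delete(rel, store, ename, key):
    """Delete store[ename][key]; cascade to the other side if declared."""
    side = 0 if ename == rel.left else 1
    other = rel.right if side == 0 else rel.left
    del store[ename][key]
    affected = {p for p in rel.pairs if p[side] == key}
    rel.pairs -= affected
    if rel.on_delete.get(ename) == "Cascade":
        for p in affected:
            store[other].pop(p[1 - side], None)
```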
Operations at the Type Level
Writing betterContactCard drop is the same as writing toyContactCard, since betterContactCard was built up from toyContactCard by the declaration ComplexType betterContactCard toyContactCard.
(tupleType|sname)++(tupleType|sname)→tupleType
defines the binary operation ++, which concatenates two tuple types—or synonyms that name them—to make another. Writing
(tupleType|sname)−−aname→tupleType
removes the named attribute from the tuple type. This is a nop if the input tuple type does not have the named attribute. Writing
(tupleType|sname)StripHashes→tupleType
takes all the hash-marks off all attributes in its argument's tuple type. Writing
(tupleType|sname)aname AddHash→tupleType
results in a tuple type with the named attribute hash-marked. If that attribute were already hash-marked, it remains so.
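The four type-level operations can be sketched by encoding tuple types as lists of (name, type, hashed) triples. The function names concat, drop, strip_hashes, and add_hash are illustrative, and rejecting a concatenation that would duplicate an attribute name is an assumption based on the rule that attribute references must be unambiguous. The usage lines re-key an employee tuple type from social-security number to employee number, as described for employeeEntity above.

```python
def concat(t1, t2):
    """t1 ++ t2: concatenate two tuple types."""
    names = {a[0] for a in t1}
    if any(a[0] in names for a in t2):
        raise TypeError("concatenation would make an attribute name ambiguous")
    return t1 + t2


def drop(t, aname):
    """t -- aname: remove the named attribute; a no-op if it is absent."""
    return [a for a in t if a[0] != aname]


def strip_hashes(t):
    """Take every hash mark off every attribute."""
    return [(n, ty, False) for n, ty, _ in t]


def add_hash(t, aname):
    """Hash-mark the named attribute; an already-marked attribute stays marked."""
    return [(n, ty, h or n == aname) for n, ty, h in t]


employee = [("ssn", "String", True), ("employeeNumber", "Int", False),
            ("name", "String", False)]
rekeyed = add_hash(strip_hashes(employee), "employeeNumber")
```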
Referring now to
Generally, program modules include routines, programs, components, data structures, etc., that perform particular tasks or implement particular abstract data types. Moreover, those skilled in the art will appreciate that the inventive methods can be practiced with other computer system configurations, including single-processor or multiprocessor computer systems, minicomputers, mainframe computers, as well as personal computers, hand-held computing devices, microprocessor-based or programmable consumer electronics, and the like, each of which can be operatively coupled to one or more associated devices.
The illustrated aspects of the invention may also be practiced in distributed computing environments where certain tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules can be located in both local and remote memory storage devices.
A computer typically includes a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by the computer and includes both volatile and non-volatile media, removable and non-removable media. By way of example, and not limitation, computer-readable media can comprise computer storage media and communication media. Computer storage media includes both volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital video disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer.
With reference again to
The system bus 3208 can be any of several types of bus structure that may further interconnect to a memory bus (with or without a memory controller), a peripheral bus, and a local bus using any of a variety of commercially available bus architectures. The system memory 3206 includes read-only memory (ROM) 3210 and random access memory (RAM) 3212. A basic input/output system (BIOS) is stored in a non-volatile memory 3210 such as ROM, EPROM, EEPROM, which BIOS contains the basic routines that help to transfer information between elements within the computer 3202, such as during start-up. The RAM 3212 can also include a high-speed RAM such as static RAM for caching data.
The computer 3202 further includes an internal hard disk drive (HDD) 3214 (e.g., EIDE, SATA), which may also be configured for external use in a suitable chassis (not shown), a magnetic floppy disk drive (FDD) 3216 (e.g., to read from or write to a removable diskette 3218), and an optical disk drive 3220 (e.g., to read a CD-ROM disk 3222 or to read from or write to other high-capacity optical media such as a DVD). The hard disk drive 3214, magnetic disk drive 3216 and optical disk drive 3220 can be connected to the system bus 3208 by a hard disk drive interface 3224, a magnetic disk drive interface 3226 and an optical drive interface 3228, respectively. The interface 3224 for external drive implementations includes at least one or both of Universal Serial Bus (USB) and IEEE 1394 interface technologies. Other external drive connection technologies are within contemplation of the subject invention.
The drives and their associated computer-readable media provide nonvolatile storage of data, data structures, computer-executable instructions, and so forth. For the computer 3202, the drives and media accommodate the storage of any data in a suitable digital format. Although the description of computer-readable media above refers to a HDD, a removable magnetic diskette, and a removable optical media such as a CD or DVD, it should be appreciated by those skilled in the art that other types of media which are readable by a computer, such as zip drives, magnetic cassettes, flash memory cards, cartridges, and the like, may also be used in the exemplary operating environment, and further, that any such media may contain computer-executable instructions for performing the methods of the invention.
A number of program modules can be stored in the drives and RAM 3212, including an operating system 3230, one or more application programs 3232, other program modules 3234 and program data 3236. All or portions of the operating system, applications, modules, and/or data can also be cached in the RAM 3212. It is appreciated that the invention can be implemented with various commercially available operating systems or combinations of operating systems.
A user can enter commands and information into the computer 3202 through one or more wired/wireless input devices, e.g., a keyboard 3238 and a pointing device, such as a mouse 3240. Other input devices (not shown) may include a microphone, an IR remote control, a joystick, a game pad, a stylus pen, touch screen, or the like. These and other input devices are often connected to the processing unit 3204 through an input device interface 3242 that is coupled to the system bus 3208, but can be connected by other interfaces, such as a parallel port, an IEEE 1394 serial port, a game port, a USB port, an IR interface, etc.
A monitor 3244 or other type of display device is also connected to the system bus 3208 via an interface, such as a video adapter 3246. In addition to the monitor 3244, a computer typically includes other peripheral output devices (not shown), such as speakers, printers, etc.
The computer 3202 may operate in a networked environment using logical connections via wired and/or wireless communications to one or more remote computers, such as a remote computer(s) 3248. The remote computer(s) 3248 can be a workstation, a server computer, a router, a personal computer, portable computer, microprocessor-based entertainment appliance, a peer device or other common network node, and typically includes many or all of the elements described relative to the computer 3202, although, for purposes of brevity, only a memory/storage device 3250 is illustrated. The logical connections depicted include wired/wireless connectivity to a local area network (LAN) 3252 and/or larger networks, e.g., a wide area network (WAN) 3254. Such LAN and WAN networking environments are commonplace in offices and companies, and facilitate enterprise-wide computer networks, such as intranets, all of which may connect to a global communications network, e.g., the Internet.
When used in a LAN networking environment, the computer 3202 is connected to the local network 3252 through a wired and/or wireless communication network interface or adapter 3256. The adaptor 3256 may facilitate wired or wireless communication to the LAN 3252, which may also include a wireless access point disposed thereon for communicating with the wireless adaptor 3256.
When used in a WAN networking environment, the computer 3202 can include a modem 3258, or is connected to a communications server on the WAN 3254, or has other means for establishing communications over the WAN 3254, such as by way of the Internet. The modem 3258, which can be internal or external and a wired or wireless device, is connected to the system bus 3208 via the serial port interface 3242. In a networked environment, program modules depicted relative to the computer 3202, or portions thereof, can be stored in the remote memory/storage device 3250. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers can be used.
The computer 3202 is operable to communicate with any wireless devices or entities operatively disposed in wireless communication, e.g., a printer, scanner, desktop and/or portable computer, portable data assistant, communications satellite, any piece of equipment or location associated with a wirelessly detectable tag (e.g., a kiosk, news stand, restroom), and telephone. This includes at least Wi-Fi and Bluetooth™ wireless technologies. Thus, the communication can be a predefined structure as with a conventional network or simply an ad hoc communication between at least two devices.
Wi-Fi, or Wireless Fidelity, allows connection to the Internet from a couch at home, a bed in a hotel room, or a conference room at work, without wires. Wi-Fi is a wireless technology similar to that used in a cell phone that enables such devices, e.g., computers, to send and receive data indoors and out; anywhere within the range of a base station. Wi-Fi networks use radio technologies called IEEE 802.11 (a, b, g, etc.) to provide secure, reliable, fast wireless connectivity. A Wi-Fi network can be used to connect computers to each other, to the Internet, and to wired networks (which use IEEE 802.3 or Ethernet). Wi-Fi networks operate in the unlicensed 2.4 and 5 GHz radio bands, at an 11 Mbps (802.11b) or 54 Mbps (802.11a) data rate, for example, or with products that contain both bands (dual band), so the networks can provide real-world performance similar to the basic 10 BaseT wired Ethernet networks used in many offices.
Referring now to
The system 3300 also includes one or more server(s) 3304. The server(s) 3304 can also be hardware and/or software (e.g., threads, processes, computing devices). The servers 3304 can house threads to perform transformations by employing the invention, for example. One possible communication between a client 3302 and a server 3304 can be in the form of a data packet adapted to be transmitted between two or more computer processes. The data packet may include a cookie and/or associated contextual information, for example. The system 3300 includes a communication framework 3306 (e.g., a global communication network such as the Internet) that can be employed to facilitate communications between the client(s) 3302 and the server(s) 3304.
Communications can be facilitated via a wired (including optical fiber) and/or wireless technology. The client(s) 3302 are operatively connected to one or more client data store(s) 3308 that can be employed to store information local to the client(s) 3302 (e.g., cookie(s) and/or associated contextual information). Similarly, the server(s) 3304 are operatively connected to one or more server data store(s) 3310 that can be employed to store information local to the servers 3304.
What has been described above includes examples of the disclosed innovation. It is, of course, not possible to describe every conceivable combination of components and/or methodologies, but one of ordinary skill in the art may recognize that many further combinations and permutations are possible. Accordingly, the innovation is intended to embrace all such alterations, modifications and variations that fall within the spirit and scope of the appended claims. Furthermore, to the extent that the term “includes” is used in either the detailed description or the claims, such term is intended to be inclusive in a manner similar to the term “comprising” as “comprising” is interpreted when employed as a transitional word in a claim.
This application claims the benefit of U.S. Provisional Patent application Ser. No. 60/657,295 entitled “DATA MODEL FOR OBJECT-RELATIONAL DATA” and filed Feb. 28, 2005, and is related to pending U.S. patent application Ser. No. 11/171,905 entitled “PLATFORM FOR DATA SERVICES ACROSS DISPARATE APPLICATION FRAMEWORKS” filed on Jun. 30, 2005. The entireties of the above-noted applications are incorporated by reference herein.
Number | Date | Country
---|---|---
60657295 | Feb 2005 | US