SYSTEM AND METHOD FOR MODELLING DATA

BACKGROUND

Healthcare data may be stored in electronic format and on diverse storage systems.

OVERVIEW

In one aspect, some implementations provide a computer-implemented method for providing user access to data entities stored at a multitude of data storage systems, the method including: receiving a definition of a data entity model that includes component elements; receiving a specification relating at least one of the component elements of the defined data entity model to data representing at least one data entity stored at a plurality of data storage systems; translating, based on the received specification, the received definition of the data entity model into a database schema linking the data entity model to the data stored at the plurality of data storage systems; receiving a request to access a portion of the data stored at one of the plurality of data storage systems; and configuring, in response to receiving the request and according to the translated schema, the request into a data access operation specific to the data storage system.

Implementations may include one or more of the following features. In one configuration, the method may further include causing the data access operation specific to the data storage system to be executed on the data storage system. Receiving the request to access a portion of the data may include receiving a data access request to access a portion of the at least one data entity, and the data access request may be specified in a uniform format not specific to the data storage system. The method may further include receiving results from executing the data access operation on the data storage system.

In another configuration, receiving the definition of a data entity model may include receiving a validity definition for at least one component element described in the data entity model. The method may further include validating the portion of the at least one data entity to be accessed according to the validity definition.

In yet another configuration, translating the received definition of the data entity model comprises translating the received definition of the data entity model into a database schema for at least one data storage system that stores data entries by rows and at least one other data storage system that stores data entries by columns. The at least one data storage system that stores data entries by rows may include a Structured Query Language (SQL) database management system. The at least one data storage system that stores data entries by columns may include a NoSQL database management system.

In still another configuration, the method may further include receiving the definition of the data entity model and the specification in an extensible data format. The method may further include receiving the definition of the data entity model and the specification in an Extensible Mark-up Language (XML) format. The method may additionally include receiving the definition of the data entity model and the specification in a JavaScript Object Notation (JSON) format. The method may additionally include transforming the definition of the data entity model and the specification into at least one parsable object. The method may further include parsing the at least one parsable object to generate the database schema for each of the plurality of data storage systems.

In yet still another configuration, the method may include receiving results from executing the data access operation on the data storage system and receiving instructions specifying a display layout associated with the data entity model. In response thereto, the method may include causing at least one component element of the data entity model to be displayed according to the specified display layout. The method may further include: configuring an interface to a reporting tool; transmitting, via the interface, results of executing the data access operation to the reporting tool; and causing the at least one component element to be displayed in accordance with the specified display layout.

In another aspect, some implementations may include a server capable of interfacing to a multitude of data storage systems, the server including at least one processor configured to perform the operations of receiving a definition of a data entity model that includes component elements; receiving a specification relating at least one of the component elements of the defined data entity model to data representing at least one data entity stored at a plurality of data storage systems; translating, based on the received specification, the received definition of the data entity model into a database schema linking the data entity model to the data stored at the plurality of data storage systems; receiving a request to access a portion of the data stored at one of the plurality of data storage systems; and configuring, in response to receiving the request and according to the translated schema, the request into a data access operation specific to the data storage system.

Implementations may include one or more of the following features. In one configuration, the operations may further include causing the data access operation specific to the data storage system to be executed on the data storage system. Receiving the request to access a portion of the data may include receiving a data access request to access a portion of the at least one data entity, and the data access request may be specified in a uniform format not specific to the data storage system. The operations may further include receiving results from executing the data access operation on the data storage system.

In another configuration, receiving the definition of a data entity model may include receiving a validity definition for at least one component element described in the data entity model. The operations may further include validating the portion of the at least one data entity to be accessed according to the validity definition.

In still another configuration, the operations may further include receiving the definition of the data entity model and the specification in an extensible data format. The operations may further include receiving the definition of the data entity model and the specification in an Extensible Mark-up Language (XML) format. The operations may additionally include receiving the definition of the data entity model and the specification in a JavaScript Object Notation (JSON) format. The operations may additionally include transforming the definition of the data entity model and the specification into at least one parsable object. The operations may further include parsing the at least one parsable object to generate the database schema for each of the plurality of data storage systems.

In yet still another configuration, the operations may include receiving results from executing the data access operation on the data storage system and receiving instructions specifying a display layout associated with the data entity model. In response thereto, the operations may include causing at least one component element of the data entity model to be displayed according to the specified display layout. The operations may further include: configuring an interface to a reporting tool; transmitting, via the interface, results of executing the data access operation to the reporting tool; and causing the at least one component element to be displayed in accordance with the specified display layout.

Implementations of the above techniques include a method, computer program product and a system. The computer program product is suitably embodied in a non-transitory machine-readable medium and includes instructions executable by one or more processors. The instructions are configured to cause the one or more processors to perform the above described actions.

The system includes one or more processors and instructions embedded in a non-transitory machine-readable medium that are executable by the one or more processors. The instructions, when executed, are configured to cause the one or more processors to perform the above described actions. The default position is not to use any external databases, but the system could be configured to perform a database check if needed.

The details of one or more aspects of the subject matter described in this specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages of the subject matter will become apparent from the description, the drawings, and the claims.

DESCRIPTION OF DRAWINGS

FIG. 1 is a diagram showing a web server coupled to a multitude of data storage systems according to some implementations.

FIG. 2A is a diagram showing an example work flow of handling data access requests according to some implementations.

FIG. 2B is a screen shot showing an example user interface for a system administrator to configure data models according to some implementations.

FIG. 3 illustrates an example data storage configuration according to some implementations.

FIG. 4 is a diagram showing a flow chart of processing data access requests according to some implementations.

FIG. 5A is a diagram showing a flow chart of configuring the display output according to some implementations.

FIG. 5B is a diagram showing a flow chart of generating the display results according to some implementations.

DETAILED DESCRIPTION

This disclosure generally describes an interface between software designers and a multitude of heterogeneous data storage systems with diverse retrieval technologies. Implementations disclosed herein incorporate a dynamic data model and storage configuration framework through which a system administrator can specify, on a server, data entities to be modeled, the associated properties of each data entity, relationships between these data entities, as well as display layouts and designs. The framework allows application software developers to write application code to access data stored on a multitude of underlying storage technologies using a generalized definition language. Using this framework, the application software programmer may not need to know the specifics of the underlying storage technologies. Based on the implementations disclosed herein, application program code can interact seamlessly with a multitude of underlying data storage technologies because application software programmers can concentrate on a data model of the application rather than details of any underlying storage or retrieval technologies. In addition, implementations can incorporate adapters to accommodate new or yet to be developed storage technologies. Thus, implementations disclosed herein may be extensible.

Healthcare databases may serve a central role in healthcare, including patient care, administration, research and education. Healthcare data may generally refer to the healthcare records of a patient generated by a multitude of institutions, including, for example, primary care physicians, hospitals, clinics, pharmacies, etc. These institutions may span the country, or even the world. Such healthcare data is becoming increasingly electronic. As an illustration, a picture archiving and communication system (PACS) may manage imaging data of a patient obtained at a radiology or cardiology department of a hospital. Generally, diagnostic test results, medication history, surgery history, or even family history and genetic predisposition can all become part of the healthcare data of a patient. Such healthcare data in electronic form may provide patients and treating physicians with easy access to relevant healthcare data. As another illustration, insurance transactions may be conducted electronically. In fact, to provide faster service and easier access, more and more insurance companies are processing insurance claims and disbursing funds electronically.

To provide quality of service, healthcare databases are expected to function across the boundaries of different institutions and technologies. These electronic databases may be hosted by different institutions anywhere in the country or even across nations. These electronic databases may utilize diverse data storage systems. Examples of data storage technologies that are used in healthcare databases may include, but are not limited to, SQL (structured query language) data servers, non-SQL data servers, network attached storage (NAS) servers, direct attached storage (DAS) devices, storage area network (SAN) storage, etc. Example data storage technologies may also include cloud storage through which electronic healthcare data may be stored in virtualized pools of storage managed by third parties.

For context, healthcare data in electronic form is growing larger in size, facilitated by hardware improvements in the speed and capacity of mass storage devices, as well as similar advances in processor and networking speed. Healthcare databases may be partitioned in large tables across a cluster of separate database servers with diverse storage technologies. For example, network-attached storage (NAS) and storage area networks (SANs) coupled with fast local area networks and Fibre Channel technology enable still larger, more loosely coupled configurations of databases and distributed computing power. Example implementations of distributed database storage may include the X/Open XA standard and Oracle RAC, both of which employ high-speed network connections between data storage servers. In the context of describing data storage technologies, server, system, and device may be used interchangeably.

FIG. 1 is a diagram showing a server 102 coupled to a multitude of data storage systems 108 to 112 according to some implementations. Server 102 may be any computer or computing apparatus consistent with the description herein. Server 102 may be configured to handle data access requests 104 and 106. Data access requests 104 and 106 may be submitted by patients, primary care physicians, treating physicians, physical therapists, or insurance providers. Data access requests 104 and 106 may attempt to access data stored across a multitude of storage technologies, as implemented by data storage systems 108 to 112.

One solution to interfacing with the multitude of storage technologies as outlined above would be writing large blocks of conditional logic into database application programs running on server 102. The control blocks may allow the application programs to interact with each of the multitude of underlying data storage technologies in a manner consistent with the particular underlying data storage technology. This approach represents a “hardwiring” approach to dealing with interfaces for each particular underlying data storage technology. Generally, explicit knowledge of data structures specific to each underlying data storage technology may be required. Such knowledge requirement may not be waived even in implementations of isolated interfaces for a particular underlying data storage technology. Hence, to implement this solution, data structures to describe objects and object relations specific to an underlying data storage technology are hardwired into the database application program running on server 102. As a result, such solutions may necessitate complicated specifications of data structures, leading to a fragile implementation with increased occurrences of bugs and reduced suitability for maintenance.

Object relational modeling (ORM) techniques may convert data between otherwise incompatible type systems in object-oriented programming languages. The conversion creates, in effect, a “virtual object database” that can be used from within the programming language. For context, data management tasks in object-oriented (OO) programming are typically implemented by manipulating objects that are almost always non-scalar values. As an illustration, an address book of patients with entries of residential address, home phone numbers, name of primary care physician, and insurance policy information may be implemented as one object. The implemented object includes attributes/fields to hold each data entry item. The implemented object may also include various methods to access the data entry items, for example, to retrieve a phone number or to update an insurance policy number. However, many popular database products such as structured query language (SQL) database management systems (DBMS) can only store and manipulate scalar values such as integers and strings organized within tables. ORM techniques attempt to convert the object values into groups of simpler values for storage in the database (and convert them back upon retrieval).
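
As a minimal sketch of the conversion that ORM techniques perform, the following Python example flattens a patient address-book object into scalar column values for a relational table and rebuilds the object upon retrieval. The class name, field names, table name, and the in-memory SQLite database are assumptions chosen for illustration; they are not taken from the implementations described herein.

```python
import sqlite3
from dataclasses import dataclass, astuple


@dataclass
class PatientContact:
    # Hypothetical object holding one address-book entry.
    name: str
    home_phone: str
    primary_care_physician: str
    insurance_policy: str


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patient_contact ("
    "name TEXT, home_phone TEXT, primary_care_physician TEXT, insurance_policy TEXT)"
)


def save(contact):
    # Object -> scalar values: each attribute becomes one column value.
    conn.execute("INSERT INTO patient_contact VALUES (?, ?, ?, ?)", astuple(contact))


def load(name):
    # Scalar values -> object: the row is converted back upon retrieval.
    row = conn.execute(
        "SELECT name, home_phone, primary_care_physician, insurance_policy "
        "FROM patient_contact WHERE name = ?",
        (name,),
    ).fetchone()
    return PatientContact(*row)


original = PatientContact("Jane Doe", "555-0100", "Dr. Smith", "POL-123")
save(original)
restored = load("Jane Doe")
```

In this sketch the mapping between attributes and columns is written by hand; an ORM package would typically derive it from a mapping definition.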

Commercial and open source packages using the ORM techniques are available. For example, packages such as Hibernate and NHibernate can perform object-relational mapping. However, these existing packages only address conversion at the layer between a software application program and a target DBMS. It may be difficult to address this problem at a higher level, i.e., the application level. Moreover, existing software packages such as Hibernate and NHibernate are incapable of interfacing with newer NoSQL technologies.

For context, a NoSQL database provides a mechanism for storage and retrieval of data that uses looser consistency models than traditional relational databases. Motivations for the NoSQL approach include simplicity of design, horizontal scaling and finer control over availability. NoSQL databases often include key-value stores optimized for simple retrieval and appending operations so that significant performance benefits may be obtained in terms of latency and throughput. NoSQL databases find significant and growing industry use in big data and real-time web applications. In contrast to traditional transactional database systems such as SQL DBMS, a NoSQL database cannot necessarily give full atomicity, consistency, isolation, and durability (ACID) guarantees. Instead, NoSQL database implementations may provide only eventual consistency or may limit transactions to single data items. Eventual consistency means that, given a sufficiently long period of time over which no additional changes are generated, all updates can be expected to propagate eventually through the system. Although most NoSQL system implementations can handle transactions only over single documents, some NoSQL systems such as eXtreme Scale, FoundationDB, OrientDB, WakandaDB and djondb may execute transactions over multiple documents, similar to what RDBMS systems support over multiple rows. A NoSQL database typically scales horizontally, meaning that the addition of more low-cost commodity systems can boost aggregate computing performance, including the ability to respond quickly to data queries. As a result, a NoSQL database may often manage large amounts of data when performance and real-time responsiveness are more important than consistency (as in indexing a large number of documents, serving pages on high-traffic web sites, and delivering streaming media).
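
The notion of eventual consistency described above can be illustrated with a toy model. The following Python sketch is an assumption-laden simplification (the class names, the single pending queue, and the explicit `propagate` step are all hypothetical and do not describe any particular NoSQL product): a write is acknowledged after reaching one replica, a read from another replica may be stale, and once pending updates are drained with no further writes, all replicas agree.

```python
from collections import deque


class Replica:
    def __init__(self):
        self.data = {}


class EventuallyConsistentStore:
    """Hypothetical key-value store with asynchronous replication."""

    def __init__(self, replica_count):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = deque()  # updates not yet applied to every replica

    def put(self, key, value):
        # The write is acknowledged after reaching only the first replica.
        self.replicas[0].data[key] = value
        self.pending.append((key, value))

    def get(self, key, replica_index):
        # A read from a lagging replica may miss recent writes.
        return self.replicas[replica_index].data.get(key)

    def propagate(self):
        # With no further writes, draining the queue makes all replicas agree.
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas[1:]:
                replica.data[key] = value


store = EventuallyConsistentStore(replica_count=3)
store.put("patient:42:phone", "555-0100")
stale_read = store.get("patient:42:phone", replica_index=2)
store.propagate()
fresh_read = store.get("patient:42:phone", replica_index=2)
```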

In providing data storage solutions, some companies and researchers may debate limitations of relational database management systems. For example, some vendors, such as GigaSpaces, advocate a space based architecture, different from the mainstream relational database model, to achieve the highest performance and scalability. Others, such as Base One, believe in the mainstream relational database technology to further enhance scalability. In addition, NoSQL architectures such as Google's BigTable, have been demonstrated to further enhance scalability for specialized applications. As a further development, Google's massively distributed Spanner technology, positioned as a successor to BigTable, supports general-purpose database transactions and provides a more conventional SQL-based query language. Hence, myriad data storage technologies are available from various vendors with different, sometimes, polarizing implementation philosophies.

In managing large and heterogeneous data sets, the software industry is striving hard to address the problem of interfacing to myriad data storage solutions, including, for example, SQL storage systems and NoSQL storage systems. That is, it is difficult to provide a unified interface at the application layer for software designers to access data stored on diverse underlying data storage systems.

Although supporting multiple storage technologies at the application layer may be difficult, such a uniform interface enables an application software programmer to develop code efficiently.

First, an application program code may modify or update data models used in the particular application program in a manner unaffected by the peculiarities of the underlying storage systems. Instead, the underlying storage system may act in accordance with the modified data model. Hence, the application code may be maintainable and flexible with regard to the underlying data storage implementations.

Second, application software programmers may no longer have to write specific application code for each particular data storage system that hosts data referenced by an application program. In other words, software programmers can be relieved of the burden of detailed knowledge of how the underlying storage technologies are implemented. Hence, application software programmers can focus on writing maintainable code, thereby reducing development time to implement and modify software programs for Big Data applications.
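
The benefit of a uniform interface can be sketched as follows. In this hypothetical Python example, an application expresses one request in a storage-neutral form, and a translator selected by storage type configures that request into an operation specific to the target store. The request layout, the function names, and the shape of the key-value lookup are assumptions made for illustration, and entity and field names are assumed to come from a trusted data model rather than from end-user input.

```python
def to_sql(request):
    # Row-oriented SQL store: build a parameterized SELECT statement.
    columns = ", ".join(request["fields"])
    conditions = " AND ".join(f"{name} = ?" for name in request["where"])
    statement = f"SELECT {columns} FROM {request['entity']} WHERE {conditions}"
    return statement, tuple(request["where"].values())


def to_key_value(request):
    # Key-value store: build a row-key lookup that lists the wanted columns.
    row_key = ":".join(str(value) for value in request["where"].values())
    return {"table": request["entity"], "row_key": row_key, "columns": request["fields"]}


# Hypothetical registry of translators, one per kind of data storage system.
TRANSLATORS = {"sql": to_sql, "keyvalue": to_key_value}


def configure(request, store_kind):
    # The application code never branches on the storage technology itself.
    return TRANSLATORS[store_kind](request)


request = {
    "entity": "patient",
    "fields": ["name", "home_phone"],
    "where": {"patient_id": 42},
}
sql_operation = configure(request, "sql")
kv_operation = configure(request, "keyvalue")
```

Supporting a further storage technology in this sketch would amount to registering one more translator, leaving the application request unchanged.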

FIG. 2A is a diagram showing an example architecture of a dynamic data model and storage configuration according to some implementations. Configuration Data 1 generally refers to data intended for configuring entities and their relations in an object relational model. In some implementations, a generalized block of Configuration Data 1 may represent, among other things, Entity Structure 2, Entity Relationships 3, Entity Display Guides 4, and Entity Behavior 5. In some implementations, extensible mark-up language (XML) may be chosen to represent Configuration Data 1. In other implementations, other extensible data formats such as JavaScript Object Notation (JSON) may be used for the representation.

Entity Structure 2 may generally refer to configuration information that specifies the layout of entities in an object relational model, including the constituent properties and the applicable property data types for each property. The constituent properties and the corresponding applicable property data types may be generally referred to as component elements.

The following excerpt shows an example RelaxNG Compact (RNC) schema definition of Entity Structure 2.

entitymodel =
  element entitymodel {
    entitymodelcontent
  }

entitymodelcontent =
  attribute id { xsd:NMTOKEN },
  attribute table { text }?,
  attribute scope { "public" | "private" },
  ( (field | fieldref)* & indexes? & relations? )

field =
  element field {
    attribute id { xsd:NMTOKEN },
    attribute name { text }?,
    attribute type { "string" | "double" | "int" | "uint" | "bool" |
                     "datetime" | "date" | "currency" | "enum" | "long" |
                     "ulong" | "entityName" },
    attribute column { text }?,
    (attribute primarykey { "true" | "false" } |
     attribute not-null { "true" | "false" })?,
    # defaults to "sequence"
    attribute keygen { "sequence" | "assigned" | "native" }?,
    attribute length { xsd:unsignedInt }?,
    attribute db-type { text }?,
    attribute default { text }?,
    attribute indexed { "true" | "false" }?,
    attribute values { text }?,
    element description { text }?
  }

fieldref =
  element fieldref {
    attribute id { xsd:NMTOKEN }
  }

# relations are used as optional overrides for the entityjoins defined below
# when needed in very explicit per-account override situations
relations =
  element relations {
    anyrelation*
  }

indexes =
  element indexes {
    index*
  }

index =
  element index {
    attribute id { xsd:NMTOKEN },
    attribute fields { text },
    attribute constraint { "none" | "unique" }?
  }

anyrelation =
  element rel:* {
    attribute * { text }*,
    anyrelation*
  }
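
To make the schema above more concrete, the following Python sketch parses a small instance document written against it and extracts the component elements. The entity, field ids, and column names in the instance are hypothetical, and the fallback from a missing column attribute to the field id is an assumption of this sketch rather than a rule stated by the schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document written against the schema above.
ENTITY_XML = """
<entitymodel id="patient" table="patients" scope="private">
  <field id="patientId" type="long" column="patient_id" primarykey="true"/>
  <field id="name" type="string" column="full_name" length="120"/>
  <field id="dateOfBirth" type="date" column="dob" indexed="true"/>
</entitymodel>
"""


def read_fields(xml_text):
    root = ET.fromstring(xml_text)
    fields = []
    for element in root.findall("field"):
        fields.append({
            "id": element.get("id"),
            "type": element.get("type"),
            # Assumption: fall back to the field id when no column is given.
            "column": element.get("column", element.get("id")),
            "primarykey": element.get("primarykey") == "true",
        })
    return root.get("id"), fields


entity_id, fields = read_fields(ENTITY_XML)
```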

Entity Relationships 3 may refer to configuration information that specifies how entities should be related to each other in terms of keys and relationship cardinality (for example, 1:1, 1:m, m:m). For context, the cardinality of one data table with respect to another data table corresponds to the relationship between the data tables, i.e., it explains how each table links to another. In the relational database model, tables can be related as any of: many-to-many, one-to-many, or one-to-one. This mapping is said to be the cardinality of a given table in relation to another. For example, consider a database designed to keep track of hospital records. Such a database could have many tables, such as a Doctor table full of doctor information, a Patient table with patient information, and a Department table with an entry for each department of the hospital. In this hypothetical model, a many-to-many relationship exists between the records in the Doctor table and the records in the Patient table (a doctor has many patients, and a patient could have several doctors); a one-to-many relationship exists between the Department table and the Doctor table (each doctor works for one department, but one department could have many doctors). A one-to-one relationship may exist if, for example, the Doctor table is split in two to keep doctors' personal information apart from their administrative information.

In data modeling, collections of data elements are grouped into data tables. The data tables contain groups of data field names (also known as database attributes). Data tables are linked by key fields. A primary key uniquely identifies each record in a table. In this hypothetical model, a primary key for the Doctor table may be the attribute DoctorID. A table can also have a foreign key, which indicates that the field is linked to the primary key of another table. A complex data model can involve hundreds of related tables, which may be stored across various institutions and different storage systems.
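
The hospital example can be sketched in Python with an in-memory SQLite database. The one-to-many relationship is expressed by a foreign key on the Doctor table, and the many-to-many relationship by a junction table. The junction table name DoctorPatient and the sample rows are assumptions introduced for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Department (DepartmentID INTEGER PRIMARY KEY, Name TEXT);
-- One-to-many: each doctor row carries a foreign key to one department.
CREATE TABLE Doctor (
    DoctorID INTEGER PRIMARY KEY,
    Name TEXT,
    DepartmentID INTEGER REFERENCES Department(DepartmentID)
);
CREATE TABLE Patient (PatientID INTEGER PRIMARY KEY, Name TEXT);
-- Many-to-many: a junction table pairs doctors with patients.
CREATE TABLE DoctorPatient (
    DoctorID INTEGER REFERENCES Doctor(DoctorID),
    PatientID INTEGER REFERENCES Patient(PatientID),
    PRIMARY KEY (DoctorID, PatientID)
);
INSERT INTO Department VALUES (1, 'Cardiology');
INSERT INTO Doctor VALUES (10, 'Dr. Adams', 1), (11, 'Dr. Baker', 1);
INSERT INTO Patient VALUES (100, 'Pat One'), (101, 'Pat Two');
INSERT INTO DoctorPatient VALUES (10, 100), (10, 101), (11, 100);
""")

# One department, many doctors.
doctors_in_cardiology = [
    row[0]
    for row in conn.execute(
        "SELECT Name FROM Doctor WHERE DepartmentID = 1 ORDER BY DoctorID"
    )
]

# One patient, several doctors (and each doctor may have several patients).
doctor_count_for_patient_100 = conn.execute(
    "SELECT COUNT(*) FROM DoctorPatient WHERE PatientID = 100"
).fetchone()[0]
```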

The following excerpt shows an example RelaxNG Compact (RNC) schema definition of Entity Relationships 3.

entityjoins =
  element entityjoins {
    attribute id { xsd:NMTOKEN },
    join*,
    joincontext*,
    joindefault?
  }

join =
  element join {
    attribute to { text },
    (attribute via { text } |
     (attribute property { text }?,
      attribute alias { text }?))?,
    joincondition*
  }

joincondition =
  (element eq {
    attribute property { text },
    attribute value { text },
    attribute type { text },
    empty
  })*

joincontext =
  element context {
    attribute name { text },
    join*
  }

joindefault =
  element default {
    attribute via { text }
  }
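
As an illustration of how such join definitions might be consumed, the following Python sketch reads a small instance document into join descriptors. The instance values are hypothetical. The schema does not spell out the semantics of the `via` and `property` attributes, so the reading used here (that `via` names an intermediate entity and `property` names the field holding the key of the target entity) is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document written against the entityjoins schema.
JOINS_XML = """
<entityjoins id="doctor">
  <join to="department" property="departmentId"/>
  <join to="patient" via="doctorpatient"/>
</entityjoins>
"""


def read_joins(xml_text):
    root = ET.fromstring(xml_text)
    joins = []
    for element in root.findall("join"):
        joins.append({
            "from": root.get("id"),
            "to": element.get("to"),
            # Assumed reading: "via" names an intermediate entity, while
            # "property" names the field that holds the key of the target.
            "via": element.get("via"),
            "property": element.get("property"),
        })
    return joins


joins = read_joins(JOINS_XML)
```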

Entity Display Guides 4 may refer to configuration information that specifies how entities should be displayed in a user interface for human visualization and inspection. Entity Display Guides 4 may be represented as user interface (UI) elements. Numerous variations of representations exist. The UI representations may provide directives to controls but may not provide actual hyper-text mark-up language (HTML) rendering code. An example definition of Entity Display Guides 4 is the following:

dataform =
  element dataform {
    (attribute id { xsd:NMTOKEN }
     & attribute editable { "true" | "false" }
     & dataform.attlist),
    dataformitem*,
    actions?
  }

dataformitem =
  ## input fields
  element field {
    attribute id { xsd:NMTOKEN },
    attribute type { "text" | "textarea" | "checkbox" | "date" |
                     "dropdown" | "password" }?,
    attribute required { "true" | "false" }?,
    attribute readonly { "true" | "false" }?,
    attribute helptext { text }?,
    attribute style { text }?,
    ## type=checkbox
    attribute iconify { "true" | "false" }?,
    attribute truestring { text }?,
    attribute falsestring { text }?,
    attribute trueiconurl { text }?,
    attribute falseiconurl { text }?,
    ## type=text|textarea|password
    attribute maxlength { xsd:unsignedInt }?,
    attribute columns { xsd:unsignedInt }?,
    ## type=textarea
    attribute rows { xsd:unsignedInt }?,
    ## type=dropdown
    attribute constrain { "true" | "false" }?,
    attribute optionsnamedprojection { text }?,
    attribute optionsstringid { text }?,
    staticdatafieldoptions?,
    element validators {
      element validator {
        attribute id { xsd:NMTOKEN }+
      }
    }?
  } |
  ## non-input (display) fields
  element empty { empty } |
  element separator { empty } |
  element search {
    attribute id { xsd:NMTOKEN },
    attribute label { text },
    attribute lazy { "true" | "false" }?,
    attribute helptext { text }?
  }
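
The distinction between directives and rendering code can be sketched as follows. This Python example reads a hypothetical dataform instance and produces a list of control directives, leaving the actual HTML rendering to whatever UI layer consumes them. The instance values, the directive dictionary layout, and the default control type of "text" are assumptions of this sketch; the contents of `dataform.attlist` are not given above and are therefore omitted.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document written against the dataform schema.
FORM_XML = """
<dataform id="patientForm" editable="true">
  <field id="name" type="text" required="true" maxlength="120"/>
  <separator/>
  <field id="active" type="checkbox" truestring="Yes" falsestring="No"/>
</dataform>
"""


def read_directives(xml_text):
    root = ET.fromstring(xml_text)
    directives = []
    for element in root:
        if element.tag == "field":
            directives.append({
                # Assumption: an omitted type is treated as a text input.
                "control": element.get("type", "text"),
                "field": element.get("id"),
                "required": element.get("required") == "true",
            })
        else:
            # Non-input items such as <separator/> or <empty/>.
            directives.append({"control": element.tag})
    return directives


directives = read_directives(FORM_XML)
```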

Additional configuration data element types may be implemented in accordance with the coding examples above. These additional element types may include DataForm, which models the entity in a columnar form (e.g., 2-4 columns); Search, which models the entity as the entity appears in a search interface; List, which models the entity as the entity appears in a listing format; DashBoard, which models the entity in a reporting view with calculated metrics; and PublishedView, which models the entity as it appears to external reporting/visualization tools (e.g., Tableau, MicroStrategy, etc.).

Entity Behavior 5 may refer to configuration information that specifies how a given entity should behave when records of the entity type are created, updated, validated, or removed, as well as what subsequent actions should be taken should data in the entity change. For example, Entity Behavior 5 may include validation rules (e.g., currency fields should include a ‘$’ character in America). The following coding excerpts include RelaxNG Compact (RNC) definitions of normalization rules as an example implementation of Entity Behavior 5.

normalizationrules =
  element normalizationrules {
    attribute id { text },
    attribute enabled { "true" | "false" }?,
    (staticnormalizerule | regexnormalizerule |
     element uppercase { empty } |
     element titlecase { empty } |
     element lowercase { empty })*
  }*

mailingaddress =
  element mailingaddress {
    attribute id { text },
    attribute enabled { "true" | "false" }?,
    element address { text }?,
    element secondary { text }?,
    element city { text }?,
    element state { text }?,
    element zip { text }?
  }

staticnormalizerule =
  element static {
    attribute match { text },
    attribute replace { text },
    attribute ignorecase { "true" | "false" }?,
    empty
  }

regexnormalizerule =
  element regex {
    attribute pattern { text },
    attribute options { text }?,
    attribute replace { text },
    empty
  }
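
One way such normalization rules might be applied is sketched below in Python. The rule set is a hypothetical instance of the schema above. Applying the rules in document order, treating a missing `enabled` attribute as enabled, and ignoring the `options` attribute of the regex rule are all assumptions of this sketch.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical instance document written against the normalizationrules schema.
RULES_XML = r"""
<normalizationrules id="contractName" enabled="true">
  <static match="Incorporated" replace="Inc" ignorecase="true"/>
  <regex pattern="\s+" replace=" "/>
  <uppercase/>
</normalizationrules>
"""


def normalize(value, xml_text):
    root = ET.fromstring(xml_text)
    if root.get("enabled", "true") != "true":
        return value
    for rule in root:  # assumption: rules apply in document order
        if rule.tag == "static":
            flags = re.IGNORECASE if rule.get("ignorecase") == "true" else 0
            replacement = rule.get("replace")
            # re.escape keeps the match literal; the callable keeps the
            # replacement literal as well.
            value = re.sub(re.escape(rule.get("match")),
                           lambda _match: replacement, value, flags=flags)
        elif rule.tag == "regex":
            value = re.sub(rule.get("pattern"), rule.get("replace"), value)
        elif rule.tag == "uppercase":
            value = value.upper()
        elif rule.tag == "lowercase":
            value = value.lower()
        elif rule.tag == "titlecase":
            value = value.title()
    return value


result = normalize("Acme   incorporated", RULES_XML)
```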

In yet another illustration, the following excerpt of code shows an example Configuration Data 1 for a “Contract” entity, defined according to the configuration schema outlined above. The “Contract” entity additionally includes a normalization section to, for example, exclude redundant entries.

Referring to FIG. 2B, a screen shot of an example graphic user interface (GUI) for a system administrator to configure data models according to some implementations is shown. The GUI shows a list of entities being modeled on the left panel. The list of data entities includes the contract entity model described above. The GUI allows a system administrator to make changes to a data entity. The system administrator may choose a data entity on the left panel and then modify, in the right panel, any field associated with the chosen data entity. When the system administrator hits a “save” button (not shown) on any of the field changes or hits the “add new field” button, the server described herein automatically communicates with the underlying storage technologies and updates the physical models of the storage devices wherever necessary. In a similar fashion, the system administrator may update, through the example user interface, other aspects of the object relational modeling, including, for example, feature configuration, authentication, and account information.

Returning to FIG. 2A, Configuration Controller (CC) 6 reads in Configuration Data 1 from stored configuration data sets. In some implementations, the stored data sets could be represented using XML. In other implementations, the stored data sets could be represented with JSON, or any other form of extensible data format. Configuration Controller 6 takes the raw configuration representations and transforms them into objects that can be parsed by other software programs up-stream in the application layer.

Data Model Controller (DMC) 7 is configured to receive Configuration Data 1 read through Configuration Controller 6 and then translate Configuration Data 1 into specific schema definitions for the relevant plugged-in data stores. DMC 7 manages applying the data entity models, making changes to the data entity models, and validating data entity models for consistency with the target storage and retrieval technology. In other words, DMC 7, along with Configuration Controller (CC) 6, may convert a particular data entity model into data structure management statements specific to the target data storage systems.

As illustrated by FIG. 2A, example target data storage systems may include SQL database 8, Caching Storage 9, Column Store 10 for NoSQL database, and Column Store 11 for NoSQL database. DMC 7 can be adapted for any storage system that supports schema definition and methods for altering schema. While an exhaustive listing of all storage systems that can be adapted for the DMC 7 may be impractical for the disclosure herein, an example implementation of a DMC 7 is described in the following coding excerpt:

// Contract that each storage-specific adapter implements for DMC 7.
interface IDataStoreAdapter {
    // Schema management
    void CreateEntityType(string entityTypeId, Field[] fields);
    void RemoveEntityType(string entityTypeId);
    void ModifyEntityType(string entityTypeId, Field[] fields);

    // Schema interrogation
    Field[] GetEntityFields(string entityTypeId);
    Relation[] GetEntityRelations(string entityTypeId);
    IEntityBehavior[] GetEntityBehaviors(string entityTypeId);

    // Record storage and retrieval
    void StoreRecord(string entityTypeId, Field[] fields);
    Field[] LoadRecord(string entityTypeId);
    List<Field[]> QueryRecords(string xmlQuery);
}

In the example coding excerpt above, the Field and Relation objects have not been spelled out in detail in order to save space within the disclosure. However, a person skilled in the art would readily recognize that the Field and Relation objects are included to associate field names, data types, and join paths as described in the coding examples above.

To adapt data objects to an underlying storage technology, a data store adapter may be implemented at DMC 7. In some implementations, a particular data store adapter generates the relevant syntax for the underlying storage technology in order to perform the operations of creating, modifying, and interrogating as mandated by the example IDataStoreAdapter interface. For example, DMC 7 may call adapters through the IDataStoreAdapter interface, depending on the state of the input configuration data. DMC 7 may also handle the difference calculations for the various underlying data storage systems by interrogating the IDataStoreAdapter interface to determine whether or not the data in the storage system matches the specified configuration. Where a mismatch is identified, DMC 7 may then make calls to the Create, Remove, or Modify methods to apply any required changes to the data in the target storage system.
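The difference calculation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the in-memory adapter, the snake_case method names standing in for the IDataStoreAdapter methods, and the known_entity_ids argument (the interface as listed has no method for enumerating entity types) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for the Field object: a name and a data type."""
    name: str
    data_type: str


class InMemoryAdapter:
    """Hypothetical adapter that keeps schema state in a dict."""

    def __init__(self):
        self.schemas = {}

    def get_entity_fields(self, entity_type_id):
        # Returns None when the entity type does not exist in the store.
        return self.schemas.get(entity_type_id)

    def create_entity_type(self, entity_type_id, fields):
        self.schemas[entity_type_id] = list(fields)

    def modify_entity_type(self, entity_type_id, fields):
        self.schemas[entity_type_id] = list(fields)

    def remove_entity_type(self, entity_type_id):
        self.schemas.pop(entity_type_id, None)


def sync_schema(adapter, configured, known_entity_ids):
    """Bring the store in line with the configuration; return actions taken."""
    actions = []
    for entity_id, fields in configured.items():
        current = adapter.get_entity_fields(entity_id)
        if current is None:
            adapter.create_entity_type(entity_id, fields)
            actions.append(("create", entity_id))
        elif set(current) != set(fields):
            adapter.modify_entity_type(entity_id, fields)
            actions.append(("modify", entity_id))
    # Entity types present in the store but absent from the configuration.
    for entity_id in known_entity_ids:
        if entity_id not in configured:
            adapter.remove_entity_type(entity_id)
            actions.append(("remove", entity_id))
    return actions


adapter = InMemoryAdapter()
adapter.create_entity_type("Old", [Field("Id", "integer")])
adapter.create_entity_type("Example", [Field("Id", "integer")])
config = {
    "Example": [Field("Id", "integer"), Field("Field0", "string")],
    "New": [Field("Id", "integer")],
}
actions = sync_schema(adapter, config, known_entity_ids=["Old", "Example"])
```

Running the same synchronization a second time against an unchanged configuration yields no actions, which is the property that makes it safe to re-apply configuration on every start-up.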

To highlight the implementation of an adapter, consider an example involving a traditional relational database and a modern NoSQL database such that an adapter for MySQL and an adapter for MongoDB may be implemented. As illustrated by FIG. 3, both the MySQLDataStoreAdapter 304 and the MongoDBDataStoreAdapter 306 are derived from the IDataStoreAdapter 302.

The following configuration definition may be used for a data entity suitable for a data entry in a MySQL database or a MongoDB database:

In one illustration, a MySQLDataStoreAdapter deriving from IDataStoreAdapter might, upon having its CreateEntityType( ) method called, generate and execute the following MySQL code:

create table Example (
  Id integer unsigned primary key auto_increment,
  Field0 varchar(50),
  Field1 integer unsigned
);

However, the same method call to CreateEntityType( ) on a MongoDBDataStoreAdapter deriving from IDataStoreAdapter may generate and execute the following MongoDB statement:

db.createCollection("Example")

Because MongoDB may not strictly enforce record structure, only collection creation by name may be implemented by the MongoDBDataStoreAdapter. DMC 7 may, however, manage the fields of data entities when DMC 7 is instructed to store a record via StoreRecord( ). For example, calling StoreRecord(“Example”, “Id: 5, Field0: ‘test’, Field1: 42”) could generate the following SQL or MongoDB statements respectively, depending on which underlying data storage is invoked:

MySQL: insert into Example values (5, 'test', 42);

MongoDB: db.Example.insert({ Id: 5, Field0: 'test', Field1: 42 });

Hence, using the IDataStoreAdapter, a software programmer may use a uniform format as stipulated by the StoreRecord method to create a new record in a data table. As demonstrated, the uniform format is independent of the underlying data storage system.
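To make the uniform-format idea concrete, the following Python sketch shows two adapters that turn the same calls into store-specific statements. It is only an approximation under stated assumptions: the class names, the snake_case method names, the type mapping, and the representation of a record as a dictionary are hypothetical, and the statements are returned as text rather than executed. The generated insert statement names its columns explicitly and the MongoDB values are rendered with double quotes, so the output differs cosmetically from the statements listed above.

```python
import json
from abc import ABC, abstractmethod


class DataStoreAdapter(ABC):
    """Python stand-in for the IDataStoreAdapter interface (two methods only)."""

    @abstractmethod
    def create_entity_type(self, entity_type_id, fields):
        """Return the statement that creates storage for the entity type."""

    @abstractmethod
    def store_record(self, entity_type_id, record):
        """Return the statement that stores one record."""


class MySQLAdapter(DataStoreAdapter):
    # Assumed mapping from model-level types to MySQL column types.
    TYPES = {"string": "varchar(50)", "integer": "integer unsigned"}

    def create_entity_type(self, entity_type_id, fields):
        columns = ["Id integer unsigned primary key auto_increment"]
        columns += [f"{name} {self.TYPES[kind]}" for name, kind in fields]
        return f"create table {entity_type_id} ({', '.join(columns)});"

    def store_record(self, entity_type_id, record):
        names = ", ".join(record)
        values = ", ".join(self._literal(v) for v in record.values())
        return f"insert into {entity_type_id} ({names}) values ({values});"

    @staticmethod
    def _literal(value):
        # Illustration only: production code should bind parameters instead.
        if isinstance(value, str):
            return "'" + value.replace("'", "''") + "'"
        return str(value)


class MongoDBAdapter(DataStoreAdapter):
    def create_entity_type(self, entity_type_id, fields):
        # Fields are ignored: only the collection is created by name.
        return f"db.createCollection({json.dumps(entity_type_id)})"

    def store_record(self, entity_type_id, record):
        body = ", ".join(f"{k}: {json.dumps(v)}" for k, v in record.items())
        return f"db.{entity_type_id}.insert({{ {body} }});"


fields = [("Field0", "string"), ("Field1", "integer")]
record = {"Id": 5, "Field0": "test", "Field1": 42}
statements = {
    type(adapter).__name__: (
        adapter.create_entity_type("Example", fields),
        adapter.store_record("Example", record),
    )
    for adapter in (MySQLAdapter(), MongoDBAdapter())
}
```

The calling code at the bottom never branches on the storage technology; only the adapter instance changes, which is the property the uniform format is meant to provide.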

In another illustration, the QueryRecords( ) method of an IDataStoreAdapter may translate a DMC level query description into the query language of the specific underlying storage technology. DMC level queries may be constructed by using, for example, an extensible markup language (XML) syntax that lists a type of entity to fetch and a collection of fields to query against, independent of join conditions.

With regard to join operations, the “entityjoins” configuration sections may be programmed to compile the proper joins. For context, a join operation in database management merges data entities from two or more tables. Example joins may include: cross joins, which combine each row from the first table with each row of the second table by concatenating rows from both tables; inner joins, which combine rows from the first table with rows from the second table based on a join-predicate, for example, when fields from rows in both tables have common values; and outer joins, which do not require each record in the two joined tables to have a matching value. Other joins may be defined by, for example, the American National Standards Institute (ANSI) standards or a particular vendor. The “entityjoins” configuration may specify the join operations supported by a particular underlying data storage system. An application programmer, however, may not need to implement any join operations in the application program code being developed.
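The three join types described above can be illustrated with a short Python sketch that operates on in-memory rows. The table contents, the function names, and the choice of a left outer join as the outer-join variant are assumptions made for illustration only; an actual data storage system would evaluate joins internally.

```python
from itertools import product

# Illustrative rows; the second Example row has no matching JoinExample row.
example = [{"Id": 5, "Field0": "test"}, {"Id": 6, "Field0": "other"}]
join_example = [{"Id": 16, "ExampleId": 5, "JoinField": "Hello"}]


def cross_join(left, right):
    """Pair every left row with every right row."""
    return list(product(left, right))


def inner_join(left, right, predicate):
    """Keep only the pairs that satisfy the join predicate."""
    return [(l, r) for l, r in product(left, right) if predicate(l, r)]


def left_outer_join(left, right, predicate):
    """Like inner_join, but unmatched left rows are kept, paired with None."""
    rows = []
    for l in left:
        matches = [r for r in right if predicate(l, r)]
        rows.extend((l, r) for r in matches)
        if not matches:
            rows.append((l, None))
    return rows


def on_example_id(l, r):
    """Join predicate: the rows share a common value."""
    return l["Id"] == r["ExampleId"]


crossed = cross_join(example, join_example)
inner = inner_join(example, join_example, on_example_id)
outer = left_outer_join(example, join_example, on_example_id)
```

Here the cross join yields one pair per combination of rows, the inner join keeps only the pair whose identifiers match, and the outer join additionally retains the unmatched row paired with an empty placeholder.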

Assuming a second entity of “JoinExample” referencing the above example “Example” in the following manner:

The QueryRecords( ) method of IDataStoreAdapter may be called using this query at the DMC level:

In response, the following statements may be generated by the respective IDataStoreAdapter objects for MySQL and MongoDB:

MySQL: select * from Example inner join JoinExample on Example.Id = JoinExample.ExampleId where JoinField = "Hello"

MongoDB: db.Example.find({ JoinField: "Hello" })

In MongoDB, full object maps would be stored based on the entity model and entity join definitions. The entity join definitions may be similar to the join operation described above for the find statement. The full object map may be stored by saving the entire collection of objects in a join statement in addition to the collections of individual objects. Thus, if these two individual entities are defined and the join statement is also defined, the MongoDB code to be generated when inserting a JoinExample record may be the following:

db.Example_JoinExample.insert([
  { type: "Example", Id: 5, Field0: 'test', Field1: 42 },
  { type: "JoinExample", Id: 16, ExampleId: 5, JoinField: "Hello" }
]);

Thus, according to some implementations described herein, adapters may be configured to perform different operations on data entities depending on the storage technology. The operations on data entities can be on the data storage side or on the data retrieval side. However, an application software programmer does not need to consider the distinctions among the various underlying data storage technologies. Instead, the software programmer may, in one unified high-level language, invoke DMC 7 to modify the entity schemas and to store, load, and query records. Thus, the application software programmer can take advantage of the properties of different storage technologies (speed, transactions, consistency, etc.) to solve a particular problem in a given context, without having to rewrite entire up-stream code to convert data entities to fit any particular underlying data storage technology. Up-stream code may refer to higher level application program code that calls on the code being developed.

If the application programmer had to write code for converting data entities to fit an underlying data storage technology, the application programmer would need to be familiar with the low-level details of that technology. Such familiarity generally means additional human time to learn and master the peculiarities of each underlying data storage technology. Code dependency on the underlying data storage system also means that, if the underlying data storage technology evolves, for example, as a result of Moore's law, then the conversion code needs to be rewritten by the application programmer, thus compounding the undesirable complexity of drafting the conversion code.

In contrast, the adapter architecture as disclosed herein can lead to significant savings in management, development, and maintenance cost for “Big Data” systems implementers. The savings may be more pronounced as system implementers start to integrate new “best in class” data storage technologies as such technologies become available for specific applications.

FIG. 4 is a diagram showing a flow chart of processing data access requests according to some implementations. First, a definition of a data entity model is received at, for example, a server (402). The server may be any computer or computing apparatus consistent with the descriptions herein. The data entity model may define a template for a data entity, as illustrated by Entity Structure 2 of Configuration Data 1 in FIG. 2A. The data entity model may include component elements. Component elements may include, for example, a defined field in a database table. Next, a specification relating at least one component element of the data entity model to data representing at least one data entity stored at a multitude of data storage systems may be received (404). An example specification may include Entity Relations 3 of Configuration Data 1 in FIG. 2A. In some implementations, Configuration Controller 6 may be configured to receive the definition and the specification, as illustrated in FIG. 2A. In other implementations, definition of a data entity model and the specification relating the data entity model to one or more data entities stored at a multitude of data storage systems may be configured by a systems administrator through a graphic user interface (GUI), for example, at a web portal, as illustrated in FIG. 2B.

Thereafter, the server may translate the definition of the data entity model into a database schema linking the data entity model to the at least one data entity (406). In some implementations, a Data Model Controller (DMC) 7 may be configured to perform the translation, as illustrated by FIG. 2A. In one example, as illustrated by FIG. 3, IDataStoreAdapter 302 may be configured to invoke derivative interfaces such as the MySQLDataStoreAdapter 304 and the MongoDBDataStoreAdapter 306 to perform the translation that links an input data entity model to the MySQL data storage and the MongoDB database storage, respectively.

Once the server has been configured to perform the translation to various underlying data storage technologies, the server may then receive a request to access a portion of the data stored at one of the multitude of underlying data storage systems (408). In response, the server may configure the data access request into data access operations specific to the underlying data storage system according to the database schema (410). In some implementations, such requests to access data may be issued by application programs written by application software programmers, as illustrated by FIG. 1. When crafting the software code for the application programs, the application programmer may program the data access request in a unified format without regard to the specifics of the underlying data storage system. As illustrated by the example of FIG. 3, the application programmer may use IDataStoreAdapter 302 as an adapter interface. The IDataStoreAdapter 302, depending on the storage address of the requested data, may invoke either MySQLDataStoreAdapter 304 or MongoDBDataStoreAdapter 306, as described above. Thus, the application programmer may not need to be familiar with the underlying data storage system, and the application code may need little maintenance even as the underlying data storage system is being upgraded.

Subsequently, the server may cause the data access operation to be executed on the specific data storage system according to the database schema (412). As illustrated by FIG. 1, the server may route the data access operation to the target data storage systems 108-112.
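The routing of a uniform request to a store-specific adapter, as outlined in steps 408 through 412, might look roughly like the following Python sketch. Everything here is hypothetical scaffolding for illustration: the class names, the registry keyed by entity type, the “AuditLog” entity, and the adapter that merely records the operation instead of translating and executing it.

```python
class RecordingAdapter:
    """Hypothetical adapter that records the operations routed to it."""

    def __init__(self, store_name):
        self.store_name = store_name
        self.executed = []

    def query_records(self, query):
        # A real adapter would translate the query for its store and run it.
        operation = f"[{self.store_name}] {query}"
        self.executed.append(operation)
        return operation


class Server:
    """Routes uniform requests to whichever adapter holds the entity type."""

    def __init__(self):
        self.routes = {}

    def register(self, entity_type_id, adapter):
        # Outcome of steps 402-406: the entity model is linked to a store.
        self.routes[entity_type_id] = adapter

    def handle_request(self, entity_type_id, query):
        # Step 408: a request arrives in a store-independent format.
        adapter = self.routes[entity_type_id]
        # Steps 410-412: the adapter configures and executes the operation.
        return adapter.query_records(query)


mysql = RecordingAdapter("mysql")
mongo = RecordingAdapter("mongodb")
server = Server()
server.register("Example", mysql)
server.register("AuditLog", mongo)
first = server.handle_request("Example", "Field1 = 42")
second = server.handle_request("AuditLog", "Field1 = 42")
```

The same request text reaches a different store depending only on the entity type, so the application code that issues the request stays unchanged if an entity is later moved to another storage system.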

Results of executing the data access operation may be displayed to the user who submitted the data access request from the application program. In some implementations, as illustrated in FIG. 5A, the server may receive results from executing the data access operation, for example, a database query command, on the target data storage system (502). The server may also receive instructions specifying a display layout associated with the data entity model (504). Example instructions specifying display layout may include Entity Display Guides 4 of Configuration Data 1 in FIG. 2A. Thereafter, the server may display results of executing the data access operation on the target data storage system according to the specified display layout (506).

In other implementations, as illustrated in FIG. 5B, the display results may be presented to the user through an external, or third-party, reporting tool. Example external, or third-party, reporting tools may include Tableau or MicroStrategy tools. After receiving results from executing the data access operation on the data storage system (502) and instructions specifying a display layout associated with the data entity model (504), the server may export the results from executing the data access operation to the external reporting tool (506). The server may then cause the results of executing the data access operation to be displayed at the reporting tool in accordance with the specified display layout (508).

Implementations of the subject matter and the functional operations described in this specification can be implemented in digital electronic circuitry, in tangibly-implemented computer software or firmware, in computer hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Implementations of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non-transitory program carrier for execution by, or to control the operation of, data processing apparatus. The computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.

The term “data processing apparatus” refers to data processing hardware and encompasses all kinds of apparatus, devices, and machines for processing data, including, by way of example, a programmable processor, a computer, or multiple processors or computers. The apparatus can also be or further include special purpose logic circuitry, e.g., a central processing unit (CPU), an FPGA (field programmable gate array), or an ASIC (application-specific integrated circuit). In some implementations, the data processing apparatus and/or special purpose logic circuitry may be hardware-based and/or software-based. The apparatus can optionally include code that creates an execution environment for computer programs, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them. The present disclosure contemplates the use of data processing apparatuses with or without conventional operating systems, for example Linux, UNIX, Windows, Mac OS, Android, iOS or any other suitable conventional operating system.

A computer program, which may also be referred to or described as a program, software, a software application, a module, a software module, a script, or code, can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program may, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data, e.g., one or more scripts stored in a markup language document, in a single file dedicated to the program in question, or in multiple coordinated files, e.g., files that store one or more modules, sub-programs, or portions of code. A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network. While portions of the programs illustrated in the various figures are shown as individual modules that implement the various features and functionality through various objects, methods, or other processes, the programs may instead include a number of sub-modules, third party services, components, libraries, and such, as appropriate. Conversely, the features and functionality of various components can be combined into single components as appropriate.

The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., a central processing unit (CPU), an FPGA (field programmable gate array), or an ASIC (application-specific integrated circuit).

Computers suitable for the execution of a computer program can, by way of example, be based on general or special purpose microprocessors or both, or any other kind of central processing unit. Generally, a central processing unit will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic disks, magneto-optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device, e.g., a universal serial bus (USB) flash drive, to name just a few.

Computer-readable media (transitory or non-transitory, as appropriate) suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The memory may store various objects or data, including caches, classes, frameworks, applications, backup data, jobs, web pages, web page templates, database tables, repositories storing business and/or dynamic information, and any other appropriate information including any parameters, variables, algorithms, instructions, rules, constraints, or references thereto. Additionally, the memory may include any other appropriate data, such as logs, policies, security or access data, reporting files, as well as others. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.

To provide for interaction with a user, implementations of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube), LCD (liquid crystal display), or plasma monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input. In addition, a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to a web browser on a user's client device in response to requests received from the web browser.

The term “graphical user interface,” or GUI, may be used in the singular or the plural to describe one or more graphical user interfaces and each of the displays of a particular graphical user interface. Therefore, a GUI may represent any graphical user interface, including but not limited to, a web browser, a touch screen, or a command line interface (CLI) that processes information and efficiently presents the information results to the user. In general, a GUI may include a plurality of user interface (UI) elements, some or all associated with a web browser, such as interactive fields, pull-down lists, and buttons operable by the business suite user. These and other UI elements may be related to or represent the functions of the web browser.

Implementations of the subject matter described in this specification can be implemented in a computing system that includes a back-end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front-end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (LAN), a wide area network (WAN), e.g., the Internet, and a wireless local area network (WLAN).

The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any invention or on the scope of what may be claimed, but rather as descriptions of features that may be specific to particular implementations of particular inventions. Certain features that are described in this specification in the context of separate implementations can also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation can also be implemented in multiple implementations separately or in any suitable sub-combination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a sub-combination or variation of a sub-combination.

Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be helpful. Moreover, the separation of various system modules and components in the implementations described above should not be understood as requiring such separation in all implementations, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.

Particular implementations of the subject matter have been described. Other implementations, alterations, and permutations of the described implementations are within the scope of the following claims as will be apparent to those skilled in the art. For example, the actions recited in the claims can be performed in a different order and still achieve desirable results.

Accordingly, the above description of example implementations does not define or constrain this disclosure. Other changes, substitutions, and alterations are also possible without departing from the spirit and scope of this disclosure.
