The present disclosure relates generally to database systems and more particularly to systems and methods for categorizing data in multi-tenant database systems (“MTS”).
As the Internet has grown, many different systems and techniques for organizing the explosion of information have been developed. One of the techniques is data categorization, wherein the data categories are typically conceived, maintained, and updated by the information provider who stores or hosts the information, or data, in databases. Data categorization can enable more intuitive and efficient interfaces for searching for and maintaining information as well as facilitating data analysis. At a basic level, data is categorized in a single dimension, e.g., widget company Acme may want to sort its database of customer-reported product issues by widget model. It may be more desirable, however, to provide a feature for categorizing data along multiple dimensions, e.g., widget company Acme may need to be able to analyze all customer-reported product issues along four dimensions: widget model, widget version, distribution channel, and manufacturing location. Such a feature can enable targeted analysis of any category of data, where a category can be comprised of any permutation, whether narrow or broad, of available categorization dimensions. U.S. Pat. No. 7,130,879 discloses such multi-dimensional categorization.
Like many such database features, implementation within the environment of a MTS presents novel challenges. For example, a MTS, such as the salesforce.com service, may utilize a multi-tenant architecture wherein unrelated organizations (i.e., tenants) can share database resources in a single logical database. The database entities, or tables, themselves are typically shared between tenants—each entity in the data model typically contains an organization_id column or similar column that identifies the data items associated with each tenant. All queries and data manipulation are performed in the context of a tenant filter on the organization_id column or similar column to ensure proper security and the appearance of virtual private databases. Since entities are shared, however, the provision of features like multi-dimensional categorization presents nontrivial issues. Each tenant of the MTS may have its own desired scheme of data categorization, and such categorization schemes are preferably highly customizable to meet the particular needs of each tenant.
Accordingly, it is desirable to provide systems and methods that provide for the creation, use, and maintenance of data categories that can be highly customized on a per-tenant basis in a MTS environment.
The present disclosure provides novel systems and methods for providing multi-dimensional categorization within a multi-tenant database system (“MTS”). Data items in entities stored in a MTS may be categorized along one or more category dimensions. A search query may include one or more selected categories in one or more category dimensions. Categorization methodologies include multi-selection, multi-position, and combinations thereof. Users of the MTS may also be categorized along one or more category dimensions. A filter may present a subset of data items relevant to a user in accordance with their categorization.
Some embodiments comprise retrieving one or more categories from one or more category dimensions and transmitting information identifying the one or more categories. The category dimensions are stored in the multi-tenant database system. The category dimensions that are retrieved are those which are accessible by a specified tenant.
Some embodiments comprise receiving an identification of a first category in a first category dimension, retrieving one or more data items that are categorized along the first category dimension, and transmitting information identifying the one or more data items. The one or more data items are retrieved from one or more database entities stored in a multi-tenant database system. The category dimensions are also stored in the multi-tenant database system. The category dimensions that are retrieved are those which are accessible by a specified tenant.
Some embodiments comprise a computer-readable medium encoded with instructions for performing the above-described operations and variations thereof.
Some embodiments comprise retrieving one or more categories from one or more category dimensions stored in the multi-tenant database system, transmitting information to display the one or more categories, receiving a selection of a first category in a first category dimension, receiving a selection of a second category in a second category dimension, returning one or more data items associated with at least one of the first category and the second category, wherein the one or more data items are retrieved from one or more database entities stored in the multi-tenant database system, and transmitting information identifying the one or more data items.
Reference to the remaining portions of the specification, including the drawings and claims, will realize other features and advantages of implementations. Further features and advantages of implementations, as well as the structure and operation of various embodiments, are described in detail below with respect to the accompanying drawings. In the drawings, like reference numbers indicate identical or functionally similar elements.
a) and (b) are an illustration of three dimensions as used in a multi-position categorization process.
Network 14 can be a local area network (“LAN”), wide area network (“WAN”), wireless network, point-to-point network, star network, token ring network, hub network, or other configuration. As the most common type of network in current use is a Transfer Control Protocol and Internet Protocol (“TCP/IP”) network such as the global internetwork of networks often referred to as the “Internet” with a capital “I,” that will be used in many of the examples herein, but it should be understood that the networks that the system might use are not so limited, although TCP/IP is the currently preferred protocol.
User systems 12 might communicate with MTS 16 using TCP/IP and, at a higher network level, use other common Internet protocols to communicate, such as Hypertext Transfer Protocol (“HTTP”), file transfer protocol (“FTP”), Andrew File System (“AFS”), wireless application protocol (“WAP”), etc. As an example, where HTTP is used, user system 12 might include a HTTP client commonly referred to as a “browser” for sending and receiving HTTP messages from a HTTP server at MTS 16. Such a HTTP server might be implemented as the sole network interface between MTS 16 and network 14, but other techniques might be used as well or instead. In some embodiments, the interface between MTS 16 and network 14 includes load sharing functionality, such as round-robin HTTP request distributors to balance loads and distribute incoming HTTP requests evenly over a plurality of servers. Preferably, each of the plurality of servers has access to the MTS's data, at least as for the users that are accessing that server.
In aspects, the system shown in
One arrangement for elements of MTS 16 is shown in
Some elements in the system shown in
According to one embodiment, each user system 12 and all of its components are operator configurable using applications, such as a browser, including program code run using a central processing unit such as an Intel Pentium® processor or the like. Similarly, MTS 16 (and additional instances of MTS's, where more than one is present) and all of their components might be operator configurable using application(s) including program code run using a central processing unit such as an Intel Pentium® processor or the like, or multiple processor units. Program code for operating and configuring MTS 16 to intercommunicate and to process web pages and other data and media content as described herein is preferably downloaded and stored on a hard disk, but the entire program code, or portions thereof, may also be stored in any other volatile or non-volatile memory medium or device as is well known, such as a ROM or RAM, or provided on any media capable of storing program code, such as a compact disk (“CD”) medium, digital versatile disk (“DVD”) medium, a floppy disk, and the like. Additionally, the entire program code, or portions thereof, may be transmitted and downloaded from a software source, e.g., over the Internet, or from another server, as is well known, or transmitted over any other conventional network connection as is well known (e.g., extranet, VPN, LAN, etc.) using any communication medium and protocols (e.g., TCP/IP, HTTP, HTTPS, WAP, Ethernet, etc.) as are well known. It will also be appreciated that program code for implementing aspects of the system can be implemented in any programming language that can be executed on a server or server system such as, for example, in C, C++, HTML, Java, JavaScript, WML, any other scripting language, such as VBScript and many other programming languages as are well known.
It should also be understood that each user system 12 may include differing elements, For example, one user system 12 might include a user's personal workstation running Microsoft's Internet Explorer® browser while connected to MTS 16 by VPN, another user system 12 might include a thin-client netbook (e.g., Asus Eee PC®) running the Opera© browser while connected to MTS 16 through an extranet, and another user system 12 might include a PDA running a WAP-enabled browser while connected to MTS 16 over third-party cellular networks.
According to one embodiment, each MTS 16 is configured to provide web pages, forms, data and media content to user systems 12 to support the access by user systems 12 as tenants of MTS 16. As such, MTS 16 provides security mechanisms to keep each tenant's data separate unless the data is shared. If more than one MTS 16 is used, they may be located in close proximity to one another (e.g., in a server farm located in a single building or campus), or they may be distributed at locations remote from one another (e.g., one or more servers located in city A and one or more servers located in city B). As used herein, each MTS 16 could include one or more logically and/or physically connected servers distributed locally or across one or more geographic locations. Additionally, the term “server” is meant to include a computer system, including processing hardware and process space(s), and an associated storage system and database application (e.g., relational database management system (“RDBMS”)), as is well known in the art. It should also be understood that “server system” and “server” are often used interchangeably herein. Similarly, the databases described herein can be implemented as single databases, a distributed database, a collection of distributed databases, a database with redundant online or offline backups or other redundancies, etc., and might include a distributed database or storage network and associated processing intelligence.
It should also be understood that each application server 100 may be communicably coupled to database systems, e.g., system database 106 and tenant database(s) 108, via a different network connection. For example, one application server 1001 might be coupled via the Internet 14, another application server 100N-1 might be coupled via a direct network link, and another application server 100N might be coupled by yet a different network connection. TCP/IP is the currently preferred protocol for communicating between application servers 100 and the database system, however, it will be apparent to one skilled in the art that other transport protocols may be used to optimize the system depending on the network interconnect used.
In aspects, each application server 100 is configured to handle requests for any user/organization. Because it is desirable to be able to add and remove application servers from the server pool at any time for any reason, there is preferably no server affinity for a user and/or organization to a specific application server 100. In one embodiment, therefore, an interface system (not shown) implementing a load-balancing function (e.g., an F5 Big-IP load balancer) is communicably coupled between the application servers 100 and the user systems 30 to distribute requests to the application servers 100. In one aspect, the load balancer uses a least connections algorithm to route user requests to the application servers 100. Other examples of load-balancing algorithms, such as round robin and observed response time, also can be used. For example, in certain aspects, three consecutive requests from the same user could hit three different servers, and three requests from different users could hit the same server. In this manner, MTS 16 is multi-tenant, wherein MTS 16 handles storage of different objects and data across disparate users and organizations.
As an example of storage, one tenant might be a company that employs a sales force where each user (e.g., a salesperson) uses MTS 16 to manage their sales process. Thus, a user might maintain contact data, leads data, customer follow-up data, performance data, goals and progress data, etc., all applicable to that user's personal sales process (e.g., in tenant database 108). In one MTS arrangement, since all of this data and the applications to access, view, modify, report, transmit, calculate, etc., can be maintained and accessed by a user system having nothing more than network access, the user can manage his or her sales efforts and cycles from any of many different user systems. For example, if a salesperson is visiting a customer and the customer has Internet access in their lobby, the salesperson can obtain critical updates as to that customer while waiting for the customer to arrive in the lobby.
While each user's sales data might be separate from other users' sales data regardless of the employers of each user, some data might be organization-wide data shared or accessible by a plurality of users or all of the sales force for a given organization that is a tenant. Thus, there might be some data structures managed by MTS 16 that are allocated at the tenant level while other data structures might be managed at the user level. Because an MTS might support multiple tenants including possible competitors, the MTS, in one implementation, has security protocols that keep data, applications, and application use separate. Also, because many tenants will opt for access to an MTS rather than maintain their own system, redundancy, up-time and backup are more critical functions and need to be implemented in the MTS.
In addition to user-specific data and tenant-specific data, MTS 16 might also maintain system-level data usable by multiple tenants or other data. Such system-level data might include industry reports, news, postings, and the like that are sharable among tenants.
In certain aspects, user systems 30 communicate with application servers 100 to request and update system-level and tenant-level data from MTS 16; this may require one or more queries to system database 106 and/or tenant database 108. MTS 16 (e.g., an application server 100 in MTS 16) automatically generates one or more SQL statements (a SQL query) designed to access the desired information.
Each database can generally be viewed as a collection of objects, such as a set of logical tables, containing data fitted into predefined categories. A “table,” one representation of a data object, is used herein to simplify the conceptual description of objects and custom objects in the present disclosure. It should be understood that “table” and “object” and “entity” may be used interchangeably herein. Each table generally contains one or more data categories logically arranged as columns or fields in a viewable schema. Each row or record of a table contains an instance of data for each category defined by the fields. For example, a CRM database may include a table that describes a customer with fields for basic contact information such as name, address, phone number, fax number, etc. Another table might describe a purchase order, including fields for information such as customer, product, sale price, date, etc. In some multi-tenant database systems, standard entity tables might be provided. For CRM database applications, such standard entities might include tables for Account, Contact, Lead and Opportunity data, each containing pre-defined fields.
The organization ID column 201 is provided to distinguish among organizations using the MTS. As shown, N different organizations have data stored in main table 200. In an exemplary embodiment, the organization IDs in column 201 are defined as Char(15), but may be defined as other data types. In one embodiment, the first 3 characters of the organization ID is set to a predefined prefix, such as “00d”, although another subset of characters in the organization ID may be used to hold such a prefix if desired.
In the particular example of
U.S. patent application Ser. No. 10/817,161 filed Apr. 2, 2004, entitled “CUSTOM ENTITIES AND FIELDS IN A MULTI-TENANT DATABASE SYSTEM,” the entire disclosure of which is incorporated by reference for all purposes, discloses additional features and aspects of entities and fields in a multi-tenant database environment.
According to one embodiment, a categorization methodology provides for multi-dimensional categorization.
Category dimensions advantageously allow for more specific searches on objects or entities by limiting searches to data items associated with categories of interest. In one embodiment, a search may include data items categorized under certain categories; such search results may be produced by using the categories as a filter or as part of the search criteria. In one embodiment, data items are filtered for different users, using user category dimensions, e.g., articles categorized for administrative users may not be shown to non-administrative users.
According to one embodiment, a categorization methodology provides tenant-specific, customizable category dimensions; standard category dimensions may also be used across multiple tenants. A tenant organization using an MTS can create and use their own category dimensions, each containing a number of categories, to categorize their data, e.g., data items in one or more of the various shared entities or objects stored in a MTS. Tenants can also categorize a single data item in multiple ways, along different category dimensions. The categories in a custom category dimension may be named, populated, and maintained by a user of a tenant organization (e.g., a user with administrator-level access) to reflect how they want to categorize their data items within a particular entity.
U.S. Pat. No. 7,130,879, filed May 22, 2000, entitled “SYSTEM FOR PUBLISHING, ORGANIZING, ACCESSING AND DISTRIBUTING INFORMATION IN A COMPUTER NETWORK,” the entire disclosure of which is incorporated by reference for all purposes, discloses additional features and aspects of categorization and category dimensions.
First, administrators define a plurality of category dimensions in the system. In an exemplary embodiment, DataCategoryGroup and DataCategory entities are created to support multi-dimensional categorization. There may be one or more DataCategoryGroup entities for each tenant. A category dimension is represented as a DataCategoryGroup entity, which has either a single tree structure where each node is a category instance or a flat list of category instances. A category dimension may have different configuration settings, e.g., organization_id, name, description, creation_date, last_modified_date, flag_flat_category. The category instances that make up the hierarchy (or flat list) for a category dimension are represented as new DataCategory entities, which are child entities of a DataCategoryGroup entity. A category may have different configuration settings, e.g., parent_id, num_child_nodes, name, description, creation_date, last_modified_date. When a category is deleted, all its child categories are also deleted. Some embodiments may include standard category dimensions, such as Geography, Industry, Product Type, Service Type, etc. An example standard Geography category dimension may represent continents or sales regions covering multiple countries at the top level, each of which include subcategories broken down by country, state, province, county, city, etc. An example standard Industry category dimension may represent a hierarchical set of product categories (e.g., top level categories may include Goods and Services; subcategories of the Services category may include Advertising, Financial, Entertainment, Health Care, Hospitality, Information Technology, Legal, Publishing, Transportation, etc.).
In one embodiment, a first profile administration permission flag (e.g., “ViewDataCategories”) can be defined to enable administrators to view data categories and their underlying tree structure in the setup. A second profile administration permission flag (e.g., “ManageDataCategories”) can be defined to allow administrators to manage the data category groups and their underlying tree structure in the setup.
Administrators then populate the category dimensions, or category groups, with categories, e.g., in a hierarchical fashion. In one embodiment, category dimensions and the categories within them can be localized (e.g., modified to conform to local language and conventions).
Administrators may associate a given entity with a subset of these category dimensions, e.g., the ones that are relevant for the given entity. For example, a category dimension “Distribution Channels” may be relevant where the entity relates to tangible products, but it may not be relevant where the entity relates to online services. Such an association may be stored as one of the configuration settings for the category dimension, or it may be stored in a dedicated entity.
During the creation of a data item for the given entity, the creator can set the relevant categories in each category dimension associated with the given entity for his data item, e.g., using a picklist/drop-down menu of available categories.
When an end user enters a search query for data items in the given entity, the end user can narrow the search by providing filter criteria in the form of category selections. Data items matching the given criteria will be retrieved, according to the appropriate methodology (e.g., multi-selection, multi-position).
A user creates a new offer, called “Christmas gold offer” for the platinum and gold clients having a Nokia phone. As seen in
In the call center, an agent receives a call from a gold client who has a Nokia phone and who wants to change his contract. The agent accesses the call center's data in the MTS and uses the classification to retrieve available offers for gold customers on Nokia phones. Among the offers, the one called “Christmas gold offer” will be retrieved.
According to one embodiment, an administrator or other user can enable an entity for categorization (e.g., by adding an attribute or checkbox on a custom entity), whereupon a relationship can be defined between an entity and a category dimension. In some embodiments, an association entity represents the selection of a category instance for an entity-dimension relationship (e.g., CustomObjectCategorySelection). In some embodiments, the relationship is defined by creating a foreign key (“FK”) field in the entity itself, wherein the FK is associated with a category dimension. Additional fields may also be added to the entity to select configuration settings. In one embodiment, a configuration setting restricts categorization to a single category selection or allows selection of multiple categories. In one embodiment, a configuration setting enforces a requirement that data items in the entity be categorized. In one embodiment, a configuration setting restricts category selections to only leaf nodes of a hierarchical category dimension. An entity may be categorized on multiple category dimensions by creating an entity-dimension relationship for each of them. For an instance of the categorizable entity, the values that are selected for an entity-dimension relationship can be deemed to be the categorization for that specific category dimension.
An entity may be categorized in different category dimensions, e.g., an article may be categorized in the Manufacturers category dimension and in the Regions category dimension. In one embodiment, an entity may also be categorized under multiple categories within each category dimension, e.g., both Nokia and Sony in Manufacturers 520. In one embodiment, an entity may be categorized under multiple categories that are at different levels of a hierarchical category dimension, e.g., Germany and Paris in Regions 500. When multiple categories of multiple category dimensions are selected, there are at least two different methods of applying and interpreting the categorizations: multi-selection and multi-position.
a) and 7(b) illustrate multi-position categorization. In a multi-position context, categorization selections are stored as coordinates of category dimensions. In an exemplary embodiment, each set of coordinates includes a single selected category for each category dimension (e.g., in
Successive categorizations are added to the overall set of multi-selections. In the example, when the data item has already been first categorized under the combination of “Nokia” and “Paris,” and then categorized under the combination of “Sony” and “Stockholm,” it would be redundant to categorize the data item under the combination of “Sony” and “Paris,” or under the combination of “Nokia” and “Stockholm.” In some embodiments, a user can de-select specific categorizations to refine the multi-selection.
In some embodiments, when a user selects a category at an intermediary node (neither leaf node nor root node) in a hierarchical category dimension, the user can explicitly select a subset of nodes that are related to the intermediary node (e.g., all child nodes, or all nodes at level N and above, or all nodes at level N and below, where N is an arbitrary level of the hierarchy selected by the user).
In some embodiments, a user can select multiple levels of a hierarchy of categorized data items at once. In the example illustrated by Regions 500 from
A category-based filter can be used to restrict display of data items of the entity that are displayed to a more selective group (e.g., sidebar filter). When filtering on a category group that is hierarchical, it is possible to specify the following options for the filter:
In one embodiment, if more than one category group filter is defined for the entity, the filters can be combined; in this situation, only those entity objects that satisfy all filters will be displayed. The filter for a category group can also have multiple category selections. In multi-selection categorization mode, the filter would be satisfied for entity objects where the field matches at least one of the specified categories in the filter.
In one embodiment, the data model includes the following 3 tables:
This table stores the category selections across the defined category groups.
For each category in the category group, this table stores a row for each of its parent and child categories.
By way of example, an article (i.e., data item) has been categorized along the two category dimensions in
In one embodiment, if one of two selected categories were a highly selective filter, the query would start with its category dimension and then go through a nested loop to filter along the other category dimension.
It is useful to capture the right statistics at the right level of granularity, and to maintain the right model, so intelligent decisions to optimize the query can be made without creating an unmanageable quantity of data to store. Accordingly, in one embodiment, certain limits may be put in place:
In order to make intelligent decisions on how to construct the most efficient queries for filtering data, it may be useful to make available (e.g., at run-time) certain statistics about the data tables. Since these tables are being constantly updated, the statistics are ideally re-computed on a regular basis. Useful statistics may include:
1. The number of records for an entity that have been categorized at a given category in a category group.
2. The number of records for an entity that have been categorized at or above a given category in a hierarchical category group.
3. The number of records for an entity that have been categorized at or below a given category in a hierarchical category group.
Such statistics can be used to generate an efficient query at run-time.
In one embodiment, a user can categorize a particular data item or set of data items along a particular category dimension with a blank or “not set” value (i.e., “un-categorize” the data item(s)). This means that no particular category within the category dimension has been selected for the data item(s). As applied to hierarchical category dimensions, this concept should not be confused with categorizing the data item(s) to the broadest category in the category dimension (e.g., the “All Regions” category of the Regions 500 dimension in
In one embodiment, a user (e.g., an administrator) can update a category dimension. A category's name can be updated; in addition, entirely new categories can be inserted into a category dimension. A user can also change the parent field of a data category, thereby moving the data category and all of its child nodes to another location in the hierarchy.
In one embodiment, a user (e.g., an administrator) can delete a category dimension. If there are entities that have been associated with the category dimension, then all such associations are also deleted. If there are data items that are categorized along a category dimension that is to be deleted, then any such categorizations are removed.
In one embodiment, a user (e.g., an administrator) can delete a particular category in a category dimension. If there are data items that are categorized at the particular category that is to be deleted, then any such categorizations can be automatically handled in a few different ways: the categorizations can be removed by categorizing those data items to “not set;” the data items can be re-categorized at the parent of the particular category that is to be deleted; or the data items can be re-categorized en masse at a category of the administrator's choice. If the category dimension is hierarchical, and if the particular category to be deleted has child node(s), then any removal operations are run on the child nodes as well as the particular category to be deleted.
In addition, when the hierarchical structure of a category group is modified, it may necessitate changes in the data that is categorized in this category group. In one example, a data item in an Offer entity is categorized at the Paris category in the Geography category group. If the Paris category is deleted from the Geography category group, then one must delete all categorizations that the Offer entity had at the Paris category, which could be hundreds of thousands to millions of records that need to be modified. Such changes in the structure of a category group must therefore be handled strategically to avoid affecting other system transactions.
Once the user makes the structural changes, the metadata of the category group is modified to reflect those changes, but it may take some time for the change to reach all the underlying data items that must be re-categorized in the new category group structure. These data-level changes can be queued for batch processing, where each batch process is given a unique set of records to work on to allow for parallelization of the work. This strategy of batch processing allows for more efficient usage of system resources and avoids the problems that a synchronous, serial process could cause. In one embodiment, while these data-level changes are being made asynchronously (“in the background”), the user may be blocked from making further changes to the category group structure. Once the asynchronous work has completed, the user is once again able to change the category group structure.
User category dimensions provide the ability to share data items and enforce data security according to different user profiles. A user of the MTS can be associated with a user category. User categories advantageously allow for filtering and targeted searches. In one embodiment, an administrator may categorize users according to their roles and locations; such users will be allowed to see data items that have been categorized with matching user categories. In one embodiment, an administrator may categorize users and data items with appropriate user categories in order to ensure that only users with appropriate permissions or levels of authority are able to view the data items categorized with the user categories. In one embodiment, when a user is added to or removed from a particular user category dimension, the visibility of associated data items changes automatically, with respect to that user.
While the invention has been described by way of example and in terms of the specific embodiments, it is to be understood that the invention is not limited to the disclosed embodiments. To the contrary, it is intended to cover various modifications and similar arrangements as would be apparent to those skilled in the art. For example, in one embodiment, an intermediate server or other computer provides one or more interfaces (e.g., an Application Programming Interface (“API”), a web service, an HTTP-based interface, or other conventional protocol for transmitting instructions) to a MTS in order to enable a user to perform one or more of the operations described herein. Therefore, the scope of the appended claims should be accorded the broadest interpretation so as to encompass all such modifications and similar arrangements.
This application claims the benefit of the filing date of U.S. Provisional Application No. 61/256,858, filed on Oct. 30, 2009, the disclosure of which is incorporated by reference in its entirety.
Number | Date | Country | |
---|---|---|---|
61256858 | Oct 2009 | US |