The present technology pertains to an event log of a content management system, and more specifically pertains to the creation and querying of the event log.
Recently, users have begun storing and managing all their personal digital information by using a content management system. Such services allow users to upload and store their personal digital information on server computers accessible on the Internet or other networks from various client devices. In some instances, the service may synchronize information between client devices and service server computers to facilitate user access to information locally at the client devices. One well-known content management system is the DROPBOX content management system provided by Dropbox, Inc. of San Francisco, Calif.
As users store more information in the content management system, finding this information can become a challenge. Fortunately, computers are powerful tools for searching for relevant information among a vast amount of digital information.
Users of the content management system can modify their personal digital data in a number of ways. In some instances, users can edit, create, rename, or delete their personal digital data stored in the content management system. In other instances, users can comment, view, and share their personal digital data stored in the content management system. In still other instances, users can collaborate with other users using notes programs, such as the PAPER collaborative notes provided by Dropbox, Inc. These modifications are all time-related events (i.e., event streams) that take place on the content management system.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
Additional features and advantages of the disclosure will be set forth in the description which follows, and in part will be obvious from the description, or can be learned by practice of the herein disclosed principles. The features and advantages of the disclosure can be realized and obtained by means of the instruments and combinations particularly pointed out in the appended claims. These and other features of the disclosure will become more fully apparent from the following description and appended claims, or can be learned by the practice of the principles set forth herein.
Disclosed are systems, methods, and non-transitory computer readable mediums of querying an event log of a user. The systems, methods, and non-transitory computer readable mediums can include receiving a search request pertaining to at least a first namespace of a plurality of namespaces and determining a first index server storing a first portion of the event log associated with the first namespace, the first index server being one of a plurality of index servers, each of the plurality of index servers storing a portion of the event log pertaining to at least one namespace of the plurality of namespaces. The systems, methods, and non-transitory computer readable mediums can further include searching the first portion of the event log stored at the first index server and determining a payload based on the search results of the first portion of the event log. Finally, the systems, methods, and non-transitory computer readable mediums can include applying one or more attribute filters to the payload and sending the filtered payload.
The systems, methods, and non-transitory computer readable mediums can also include that the search request includes a user identifier and a time period. The systems, methods, and non-transitory computer readable mediums can also include the search request also pertaining to a shared namespace accessible by a user account.
The systems, methods, and non-transitory computer readable mediums can also include that the search server determines that the shared namespace is stored by a second index server of the plurality of index servers, the second index server storing a second portion of the event log associated with the shared namespace and searching the second portion of the event log stored at the second index server concurrently with the searching of the first portion of the event log at the first index server.
The systems, methods, and non-transitory computer readable mediums can also include that the search server determines that the shared namespace is stored by the first index server, the first index server storing a second portion of the event log associated with the shared namespace and searching the second portion of the event log stored at the first index server concurrently with the searching of the first portion of the event log at the first index server.
The systems, methods, and non-transitory computer readable mediums can also include that the event log includes a namespace index and a user identifier index, that the searching of the namespace index and the user identifier index is performed in parallel, and that the results of the namespace index and the user identifier index searches are intersected.
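By way of illustration only, the parallel search and intersection of the two indexes might be sketched as follows. The in-memory dictionaries, the identifiers in them, and the function names are hypothetical stand-ins for the namespace index and the user identifier index; a thread pool is merely one way to run the two lookups concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory indexes: each maps a key to a set of event identifiers.
NAMESPACE_INDEX = {"ns-1": {"e1", "e2", "e3"}, "ns-2": {"e4"}}
USER_INDEX = {"user-a": {"e2", "e3", "e4"}, "user-b": {"e1"}}


def lookup(index, key):
    """Return the event identifiers stored under key (empty set if absent)."""
    return set(index.get(key, ()))


def search_events(namespace_id, user_id):
    """Search both indexes in parallel, then intersect the two result sets."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        ns_future = pool.submit(lookup, NAMESPACE_INDEX, namespace_id)
        user_future = pool.submit(lookup, USER_INDEX, user_id)
        return ns_future.result() & user_future.result()
```

In this sketch, only events that appear in both the namespace results and the user identifier results survive the intersection.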
The systems, methods, and non-transitory computer readable mediums can also include determining that the payload is based on intersecting a main index with the search results from the portions of the event log of the plurality of index servers. The systems, methods, and non-transitory computer readable mediums can also include that the attribute filters include at least one of an action type, a path, or an event identifier.
The systems, methods, and non-transitory computer readable mediums can also include concurrently searching a live cache of events based on the search request, wherein the live cache of events stores events that are more recent than the events stored in the plurality of the index servers, combining the search results from the live cache and the filtered payload, wherein the combining is based on chronological order, and sending the combined results.
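As a minimal sketch of the filtering and combining steps, assuming each event is a dictionary with a hypothetical "ts" timestamp field and that both result lists are already sorted by ascending timestamp, the attribute filters and the chronological combination could look like this. The field names and sample events are illustrative assumptions, and the live cache hits are assumed to already satisfy the search request.

```python
import heapq


def apply_attribute_filters(events, **filters):
    """Keep only events whose attributes equal every supplied filter value."""
    return [e for e in events if all(e.get(k) == v for k, v in filters.items())]


def combine_chronologically(filtered_payload, live_cache_hits):
    """Merge two lists that are each already sorted by ascending timestamp."""
    return list(heapq.merge(filtered_payload, live_cache_hits, key=lambda e: e["ts"]))


# Hypothetical results from the index servers and from the live cache.
indexed = [
    {"ts": 100, "event_id": "e1", "action": "edit"},
    {"ts": 200, "event_id": "e2", "action": "view"},
    {"ts": 300, "event_id": "e3", "action": "edit"},
]
live = [{"ts": 350, "event_id": "e4", "action": "edit"}]

payload = apply_attribute_filters(indexed, action="edit")
combined = combine_chronologically(payload, live)
```

Because the live cache holds events more recent than those in the index servers, its hits typically land at the end of the combined list, although the merge does not depend on that.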
The above-recited and other advantages and features of the disclosure will become apparent by reference to specific embodiments thereof which are illustrated in the appended drawings. Understanding that these drawings depict only example embodiments of the disclosure and are not therefore to be considered to be limiting of its scope, the principles herein are described and explained with additional specificity and detail through the use of the accompanying drawings in which:
Various embodiments of the disclosure are discussed in detail below. While specific implementations are discussed, it should be understood that this is done for illustration purposes only. A person skilled in the relevant art will recognize that other components and configurations may be used without parting from the spirit and scope of the disclosure.
It will be appreciated that for simplicity and clarity of illustration, where appropriate, reference numerals have been repeated among the different figures to indicate corresponding or analogous elements. In addition, numerous specific details are set forth in order to provide a thorough understanding of the embodiments described herein. However, it will be understood by those of ordinary skill in the art that the embodiments described herein can be practiced without these specific details. In other instances, methods, procedures and components have not been described in detail so as not to obscure the related relevant feature being described. The drawings are not necessarily to scale and the proportions of certain parts may be exaggerated to better illustrate details and features. The description is not to be considered as limiting the scope of the embodiments described herein.
The disclosed technology addresses the need in the art for querying an event log of content items stored at a content management system. A user can perform actions on accessible content items (e.g., in authorized namespaces) stored at the content management system. The actions can be stored in an event log of the content management system to create a history of user actions. The ability to query an event log gives a user access to historical details of actions performed over a period of time.
With respect to implementing various embodiments of the disclosed technology, an example system configuration 100 is shown in
In system 100, a user can interact with content management system 106 (e.g., an online synchronized content management system) through client devices 102₁, 102₂, . . . , 102ₙ (collectively “102”) connected to network 104 by direct and/or indirect communication. Content management system 106 can support connections from a variety of different client devices, such as: desktop computers; mobile computers; mobile communications devices, e.g. mobile phones, smart phones, tablets; smart televisions; set-top boxes; and/or any other network enabled computing devices. Client devices 102 can be of varying type, capabilities, operating systems, etc. Furthermore, content management system 106 can concurrently accept connections from and interact with multiple client devices 102.
A user can interact with content management system 106 via a client-side application installed on client device 102i. In some embodiments, the client-side application can include a content management system specific component. For example, the component can be a stand-alone application, one or more application plug-ins, and/or a browser extension. However, the user can also interact with content management system 106 via a third-party application, such as a web browser, that resides on client device 102i and is configured to communicate with content management system 106. In either case, the client-side application can present a user interface (UI) for the user to interact with content management system 106. For example, the user can interact with the content management system 106 via a client-side application integrated with the file system or via a webpage displayed using a web browser application.
Content management system 106 can enable a user to store content, as well as perform a variety of content management tasks, such as retrieve, modify, browse, and/or share the content. Furthermore, content management system 106 can enable a user to access the content from multiple client devices 102. For example, client device 102i can upload content to content management system 106 via network 104. Later, the same client device 102i or some other client device 102j can retrieve the content from content management system 106.
To facilitate the various content management services, a user can create an account with content management system 106. User account database 150 can maintain the account information. User account database 150 can store profile information for registered users. In some cases, the only personal information in the user profile can be a username and/or email address. However, content management system 106 can also be configured to accept additional user information such as birthday, address, billing information, etc. Any user information or account information would be stored and used according to an industry accepted privacy policy.
User account database 150 can include account management information, such as account type (e.g. free or paid), usage information, (e.g. file edit history), maximum storage space authorized, storage space used, content storage locations, security settings, personal configuration settings, content sharing data, etc. Account management module 124 can be configured to update and/or obtain user account details in user account database 150. The account management module 124 can be configured to interact with any number of other modules in content management system 106.
An account can be used to store content, such as digital data, documents, text files, audio files, video files, etc., from one or more client devices 102 authorized on the account. The content can also include collections for grouping content items together with different behaviors, such as folders, playlists, albums, etc. For example, an account can include a public folder that is accessible to any user. The public folder can be assigned a web-accessible address. A link to the web-accessible address can be used to access the contents of the public folder. In another example, an account can include: a photos collection that is intended for photos and that provides specific attributes and actions tailored for photos; an audio collection that provides the ability to play back audio files and perform other audio related actions; or other special purpose collection. An account can also include shared collections or group collections that are linked with and available to multiple user accounts. The permissions for multiple users may be different for a shared collection.
The content can be stored in content storage 160. Content storage 160 can be a storage device, multiple storage devices, or a server. Alternatively, content storage 160 can be a cloud storage provider or network storage accessible via one or more communications networks. Content management system 106 can hide the complexity and details from client devices 102 so that client devices 102 do not need to know exactly where or how the content items are being stored by content management system 106. In some embodiments, content management system 106 can store the content items in the same collection hierarchy as they appear on client device 102i. However, content management system 106 can store the content items in its own order, arrangement, or hierarchy. Content management system 106 can store the content items in a network accessible storage (NAS) device, in a redundant array of independent disks (RAID), etc. Content storage 160 can store content items using one or more partition types, such as FAT, FAT32, NTFS, EXT2, EXT3, EXT4, HFS/HFS+, BTRFS, and so forth.
Content storage 160 can also store metadata describing content items, content item types, and the relationship of content items to various accounts, collections, or groups. The metadata for a content item can be stored as part of the content item or can be stored separately. In one variation, each content item stored in content storage 160 can be assigned a system-wide unique identifier.
Content storage 160 can decrease the amount of storage space required by identifying duplicate content items or duplicate segments of content items. Instead of storing multiple copies, content storage 160 can store a single copy and then use a pointer or other mechanism to link the duplicates to the single copy. Similarly, content storage 160 can store content items more efficiently, as well as provide the ability to undo operations, by using a content item version control that tracks changes to content items, different versions of content items (including diverging version trees), and a change history. The change history can include a set of changes that, when applied to the original content item version, produce the changed content item version.
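For illustration, one possible way to identify duplicates and link them to a single stored copy is to key the stored bytes by a content hash. The use of SHA-256 and the class and method names below are assumptions made for this sketch; the description above does not prescribe a particular duplicate-detection mechanism.

```python
import hashlib


class DedupStore:
    """Toy content store that keeps one copy of identical content."""

    def __init__(self):
        self._blobs = {}     # digest -> bytes, stored only once
        self._pointers = {}  # content item identifier -> digest

    def put(self, item_id, data):
        """Store data under item_id, reusing an existing copy when one exists."""
        digest = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(digest, data)  # no second copy for duplicates
        self._pointers[item_id] = digest      # pointer links the item to the copy
        return digest

    def get(self, item_id):
        """Follow the pointer for item_id and return the stored bytes."""
        return self._blobs[self._pointers[item_id]]

    def stored_copies(self):
        """Number of distinct copies physically held."""
        return len(self._blobs)
```

Here two content items with identical bytes share one stored copy, while each item keeps its own pointer.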
Content management system 106 can be configured to support automatic synchronization of content from one or more client devices 102. The synchronization can be platform agnostic. That is, the content can be synchronized across multiple client devices 102 of varying type, capabilities, operating systems, etc. For example, client device 102i can include client software, which synchronizes, via a synchronization module 132 at content management system 106, content in client device 102i's file system with the content in an associated user account. In some cases, the client software can synchronize any changes to content in a designated collection and its sub-collections, such as new, deleted, modified, copied, or moved content items or collections. The client software can be a separate software application, can integrate with an existing content management application in the operating system, or some combination thereof. In one example of client software that integrates with an existing content management application, a user can manipulate content items directly in a local collection, while a background process monitors the local collection for changes and synchronizes those changes to content management system 106. Conversely, the background process can identify content that has been updated at content management system 106 and synchronize those changes to the local collection. The client software can provide notifications of synchronization operations, and can provide indications of content statuses directly within the content management application. Sometimes client device 102i may not have a network connection available. In this scenario, the client software can monitor the linked collection for content item changes and queue those changes for later synchronization to content management system 106 when a network connection is available. Similarly, a user can manually start, stop, pause, or resume synchronization with content management system 106.
A user can view or manipulate content via a web interface generated and served by user interface module 122. For example, the user can navigate in a web browser to a web address provided by content management system 106. Changes or updates to content in the content storage 160 made through the web interface, such as uploading a new version of a content item, can be propagated back to other client devices 102 associated with the user's account. For example, multiple client devices 102, each with their own client software, can be associated with a single account and content items in the account can be synchronized between each of the multiple client devices 102.
Content management system 106 can include a communications interface 120 for interfacing with various client devices 102, and can interact with other content and/or service providers 108₁, 108₂, . . . , 108ₙ (collectively “108”) via an Application Program Interface (API). Certain software applications can access content storage 160 via an API on behalf of a user. For example, a software package, such as an application running on a smartphone or tablet computing device, can programmatically make calls directly to content management system 106, when a user provides credentials, to read, write, create, delete, share, or otherwise manipulate content. Similarly, the API can allow users to access all or part of content storage 160 through a web site.
Content management system 106 can also include authenticator module 126, which can verify user credentials, security tokens, API calls, specific client devices, and so forth, to ensure only authorized clients and users can access content items. Further, content management system 106 can include analytics module 134 that can track and report on aggregate file operations, user actions, network usage, total storage space used, as well as other technology, usage, or business metrics. A privacy and/or security policy can prevent unauthorized access to user data stored with content management system 106.
Content management system 106 can include sharing module 130 for managing and sharing content publicly or privately. Sharing content publicly can include making the content item accessible from any computing device in network communication with content management system 106. Sharing content privately can include linking a content item in content storage 160 with two or more user accounts so that each user account has access to the content item. The sharing can be performed in a platform agnostic manner. That is, the content can be shared across multiple client devices 102 of varying type, capabilities, operating systems, etc. The content can also be shared across varying types of user accounts.
In some embodiments, content management system 106 can be configured to maintain a content directory identifying the location of each content item in content storage 160. The content directory can include a unique content entry for each content item stored in the content storage.
A content entry can include a content path that can be used to identify the location of the content item in a content management system. For example, the content path can include the name of the content item and a folder hierarchy associated with the content item. For example, the content path can include a folder or path of folders in which the content item is placed as well as the name of the content item. Content management system 106 can use the content path to present the content items in the appropriate folder hierarchy.
A content entry can also include a content pointer that identifies the location of the content item in content storage 160. For example, the content pointer can include the exact storage address of the content item in memory. In some embodiments, the content pointer can point to multiple locations, each of which contains a portion of the content item.
In addition to a content path and content pointer, a content entry can also include a user account identifier that identifies the user account that has access to the content item. In some embodiments, multiple user account identifiers can be associated with a single content entry indicating that the content item has shared access by the multiple user accounts.
To share a content item privately, sharing module 130 can be configured to add a user account identifier to the content entry associated with the content item, thus granting the added user account access to the content item. Sharing module 130 can also be configured to remove user account identifiers from a content entry to restrict a user account's access to the content item.
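As a non-limiting sketch, a content entry and the private sharing operations described above could be modeled as follows. The class name, field names, and example values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ContentEntry:
    """Illustrative content entry: a path, a pointer, and authorized accounts."""
    content_path: str      # e.g., folder hierarchy plus the content item name
    content_pointer: str   # location of the content item in content storage
    account_ids: set = field(default_factory=set)


def share_privately(entry, account_id):
    """Grant access by adding the user account identifier to the entry."""
    entry.account_ids.add(account_id)


def revoke_access(entry, account_id):
    """Restrict access by removing the user account identifier."""
    entry.account_ids.discard(account_id)


def can_access(entry, account_id):
    """An account has access only if its identifier is on the entry."""
    return account_id in entry.account_ids
```

In this sketch, access control reduces to set membership on the content entry, so granting and revoking access never requires moving the content item itself.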
To share content publicly, sharing module 130 can be configured to generate a custom network address, such as a uniform resource locator (URL), which allows any web browser to access the content in content management system 106 without any authentication. To accomplish this, sharing module 130 can be configured to include content identification data in the generated URL, which can later be used to properly identify and return the requested content item. For example, sharing module 130 can be configured to include the user account identifier and the content path in the generated URL. Upon selection of the URL, the content identification data included in the URL can be transmitted to content management system 106 which can use the received content identification data to identify the appropriate content entry and return the content item associated with the content entry.
In addition to generating the URL, sharing module 130 can also be configured to record that a URL to the content item has been created. In some embodiments, the content entry associated with a content item can include a URL flag indicating whether a URL to the content item has been created. For example, the URL flag can be a Boolean value initially set to 0 or false to indicate that a URL to the content item has not been created. Sharing module 130 can be configured to change the value of the flag to 1 or true after generating a URL to the content item.
In some embodiments, sharing module 130 can also be configured to deactivate a generated URL. For example, each content entry can also include a URL active flag indicating whether the content should be returned in response to a request from the generated URL. For example, sharing module 130 can be configured to return a content item requested by a generated link if the URL active flag is set to 1 or true. Thus, access to a content item for which a URL has been generated can be easily restricted by changing the value of the URL active flag. This allows a user to restrict access to the shared content item without having to move the content item or delete the generated URL. Likewise, sharing module 130 can reactivate the URL by again changing the value of the URL active flag to 1 or true. A user can thus easily restore access to the content item without the need to generate a new URL.
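The URL flag and URL active flag described above might be sketched as follows. The base address, query parameter names, and function names are hypothetical, and the sketch assumes that generating a URL also sets the URL active flag to true, which the description does not state explicitly.

```python
from dataclasses import dataclass
from urllib.parse import parse_qs, urlencode, urlparse

BASE_URL = "https://cms.example.com/s"  # hypothetical address, for illustration


@dataclass
class SharedEntry:
    account_id: str
    content_path: str
    url_created: bool = False  # URL flag: has a URL ever been generated?
    url_active: bool = False   # URL active flag: should requests be served?


def generate_url(entry):
    """Embed the content identification data in the URL and set both flags."""
    entry.url_created = True
    entry.url_active = True  # assumption: a new URL starts out active
    query = urlencode({"account": entry.account_id, "path": entry.content_path})
    return BASE_URL + "?" + query


def resolve(url, entries):
    """Return the entry identified by the URL, or None when it is inactive."""
    params = parse_qs(urlparse(url).query)
    entry = entries.get((params["account"][0], params["path"][0]))
    if entry is None or not entry.url_active:
        return None
    return entry
```

Deactivating and reactivating the link only toggles `url_active`; the generated URL itself stays the same.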
In some embodiments content management system 106 includes event log 170 that is configured to record events taking place with content items in content management system 106. A non-exclusive list of events can include add, delete, edit, view, share, comment, etc. Event log 170 can include data to identify when an event occurred, a content item identifier, a unique event identifier, event type, a user that performed the event, and the events, among other attributes. In some embodiments event log 170 can be queried by analytics module 134 to aggregate content item operations or user actions, to determine technology, usage, or business metrics, to aid search results when searching for a content item, and to identify potentially unintentional actions performed by user device 102i.
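By way of example only, an event record carrying the attributes listed above, together with a simple query over recorded events, could be sketched as follows. The class names, field names, and the use of a random UUID as the unique event identifier are assumptions made for illustration.

```python
import time
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    """Illustrative event record with the attributes named in the description."""
    event_type: str        # e.g., "add", "delete", "edit", "view", "share"
    content_item_id: str
    user_id: str
    namespace_id: str
    event_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    timestamp: float = field(default_factory=time.time)


class EventLog:
    """Toy append-only event log with attribute and time-range queries."""

    def __init__(self):
        self._events = []

    def record(self, event):
        self._events.append(event)
        return event.event_id

    def query(self, user_id=None, event_type=None, start=None, end=None):
        """Return matching events in chronological order."""
        def matches(e):
            return ((user_id is None or e.user_id == user_id)
                    and (event_type is None or e.event_type == event_type)
                    and (start is None or e.timestamp >= start)
                    and (end is None or e.timestamp <= end))
        return sorted((e for e in self._events if matches(e)),
                      key=lambda e: e.timestamp)
```

A query such as `log.query(user_id="user-a", start=t0, end=t1)` would then return that user's events within the given time period.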
While content management system 106 is presented with specific components, it should be understood by one skilled in the art that the architectural configuration of system 106 is simply one possible configuration and that other configurations with more or fewer components are possible.
The event creation devices can create an event regarding an action on a content item by a user. For example, User A can add a new document to content management system content items 202. Thus, an event (i.e., add) was performed on a content item (i.e., document) by a user (i.e., User A). Content management system content items 202 can be stored at a service such as the DROPBOX content management system. In some embodiments, content management system content items 202 can be stored at any such service for storing cloud-based content items. Events are generated based on additions or modifications to content management system content items 202. Examples include adding, creating, editing, deleting, moving, or renaming a content item. When a user modifies a content item (e.g., file, directory, etc.), an event identifier (e.g., unique identifier) and event type are created and associated with the user (e.g., user identifier and namespace identifier). After an event is created, the event identifier and all information associated with the event can be sent to routing servers 208.
Front-end server 204 can be an application server hosting content management system content items, such as the DROPBOX content management service. In other embodiments, front-end server 204 can be any server configured for storing content items. The source data for creating events on an application server are tasks performed on content items stored at the application server. Thus, events are generated based on interactions with the content items. Examples include viewing, sharing, or commenting on content items stored at the application server. When a user interacts with a content item (e.g., file, directory, etc.), an event identifier and event type are created and associated with the user (e.g., user identifier and namespace identifier). After an event is created, the event identifier and all information associated with the event can be sent to routing servers 208.
Application program interface (API) 206 can be a set of routines, protocols and tools for interacting with content management system content items, such as PAPER collaborative notes. The source data for creating events through an API are interactions with a content management server by the API. Thus, events are generated based on the interactions with the content management service through the API. Examples include posting, sharing, commenting, or editing with other users in a collaborative notes environment. When a user interacts through the API, an event identifier and event type are created and associated with the user (e.g., user identifier and namespace identifier). After an event is created, the event identifier and all information associated with the event can be sent to routing servers 208.
Routing servers 208 can include processor 210 configured to receive event data from content management system content items 202, front-end server 204, and API 206. Upon receiving event data, processor 210 can store the event data in live cache 211 while an appropriate index server is determined. Processor 210 can determine an appropriate index server (e.g., 216A, 216B, 216C, etc.) to which the event data is to be written using several factors (e.g., load balancing, capacity, throughput, latency, failover, and/or redundancy). Processor 210 can also determine an appropriate index server based on a namespace of the user. For example, a user can have access to two types of namespaces, a root namespace and a shared namespace. A root namespace, on a content management service, is assigned to the user on creation of an account with the content management service. The user has read/write access to the root namespace. A shared namespace, on a content management service, can be assigned to (or accessed by) multiple users at various times throughout the life of the shared namespace. Processor 210 can determine an appropriate index server based on the namespace in which the event takes place. For example, when an event takes place on a content item stored in the root namespace (or shared namespace), processor 210 can determine an index server where the root namespace events have been stored previously and select that same index server for storage of the current event data.
In some embodiments of the invention, when processor 210 determines an index server, the resulting mapping of the index server and namespace identifier is stored in mapping index 214 for future query requests. Deterministic mapping function 212 can receive identifying information of the index server where the event data was stored (e.g., a hostname or a network address) and the namespace identifier. The deterministic mapping function 212 can apply a hash function (e.g., one-way hash function, a simple hash function, a consistent hash function, etc.) to the namespace identifier and store the hashed value and the identifying information in mapping index 214. In other embodiments, deterministic mapping function 212 may include a hash mechanism and a modulo mechanism (shown in
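The write path described above can be sketched, under stated assumptions, as a routing component that caches each incoming event and keeps events of the same namespace on the same index server. The class and attribute names are hypothetical, the "least loaded" choice stands in for the load balancing factor only, and eviction from the live cache is not modeled.

```python
class RoutingServer:
    """Toy router: caches events, then sends each namespace to one server."""

    def __init__(self, index_server_names):
        # Hypothetical index servers, each modeled as a list of stored events.
        self.index_servers = {name: [] for name in index_server_names}
        self.live_cache = []            # recent events awaiting or after routing
        self.namespace_to_server = {}   # remembered namespace -> server mapping

    def choose_server(self, namespace_id):
        """Reuse the server that already holds this namespace, else pick one."""
        if namespace_id in self.namespace_to_server:
            return self.namespace_to_server[namespace_id]
        # Assumed policy for a new namespace: the server holding fewest events.
        server = min(self.index_servers, key=lambda s: len(self.index_servers[s]))
        self.namespace_to_server[namespace_id] = server
        return server

    def ingest(self, event):
        """Store the event in the live cache, then write it to its server."""
        self.live_cache.append(event)
        server = self.choose_server(event["namespace_id"])
        self.index_servers[server].append(event)
        return server
```

Once a namespace has been routed, later events in that namespace follow the remembered mapping rather than the load-based choice.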
Although, in
Index servers 216 can include a plurality of servers distributed in a horizontal fashion to provide load balancing, failover, or redundancy for sharded event indexes. In this case, each of the multiple index servers may store a replica or a copy of the sharded event indexes. The sharded event indexes can be stored on one or more partitions 218A, 218B, 218C of index servers 216 (as shown in
Processor 310 is distributed over two levels of servers: (1) one or more servers 316A, 316B, 316C (collectively referred to as “index servers 316”) responsible for storing sharded event indexes 320n and processing queries 330 against sharded event indexes 320n (e.g., main index 322, namespace index 324, user identifier index 326, and attributes index 328) and (2) one or more servers 308 (collectively referred to as “routing servers 308”) responsible for routing queries 330 from front-end servers 304 to the appropriate index servers 316 based on namespace identifiers associated with the queries 330 and combining answers 334 returned from index servers 316 into answers 332 that are then returned to the front-end servers 304 and ultimately to the client devices.
Each index (e.g., 322, 324, 326, and 328) of sharded event index 320n may be stored at a corresponding index server (e.g., 316A, 316B, 316C . . . 316N). Each index (e.g., 322, 324, 326, and 328) at an index server (e.g., 316B) may index events in one or more namespaces assigned to the index server (e.g., 316B).
In operation, front-end server 304 receives a search query 330 from a client device (e.g., 102n) and returns a personalized answer 332 thereto back to the client device. Answer 332 may be personalized in the sense that the events identified in answer 334 as relevant to query 330 may be restricted to only events that belong to a namespace that the user is authorized to access. As such, query 330 may be received at front-end server 304 in the context of an authenticated session established for an authenticated user (e.g., by username/password pair, FOB or mobile phone, biometric measurement, etc.). For example, the authenticated user may be a user of the client device that sent query 330 to front-end server 304.
Serving system 300 is capable of restricting answer 332 of query 330 to identifying only events indexed in event index 320n that satisfy query 330 and that belong to a namespace that the authenticated user is authorized to access (e.g., root namespace, shared namespace, etc.). Serving system 300 is able to perform this restricting even though event index 320n may index events that satisfy query 330 but that belong to a namespace that the authenticated user is not authorized to access.
The network request including query 330 from the user's client device may also specify identifier(s) of namespace(s) that the user wishes to search. In this case, an intersection of the set of identifier(s) of namespace(s) that the user wishes to search and the set of identifier(s) of authorized namespace(s) the user is permitted to access may be computed to determine identifier(s) of authorized namespace(s) to search. This intersection may be performed by front-end server 304.
If the network request including query 330 does not specify any requested namespaces to search, then a default set of identifier(s) of authorized namespace(s) to search may be selected. The default set can identify a) all namespaces the user is permitted to access (e.g., all namespaces associated with the authenticated user's account), or b) a subset thereof.
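The selection of authorized namespaces to search can be sketched as a set intersection with a default. This is a minimal illustration only; the function name `namespaces_to_search` and its parameters are hypothetical and do not appear in the description, and the default shown is option a) above (all authorized namespaces).

```python
from typing import Iterable, Optional, Set


def namespaces_to_search(
    authorized: Set[str],
    requested: Optional[Iterable[str]] = None,
) -> Set[str]:
    """Return the authorized namespace identifiers that a query may search."""
    if requested is None:
        # No namespaces requested: default to every authorized namespace.
        # (An embodiment could instead default to a subset.)
        return set(authorized)
    # Requested namespaces are limited to those the user may access.
    return authorized & set(requested)


authorized = {"abcd", "defg", "hijk"}
searchable = namespaces_to_search(authorized, ["defg", "zzzz"])
```

Here the unauthorized identifier "zzzz" is dropped, so only "defg" would be forwarded to the routing server.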
After the identifier(s) of authorized namespace(s) to search have been determined, a network request including query 330 and the identifier(s) of authorized namespace(s) to search may be sent from front-end server 304 to routing server 308 for further processing of query 330 by processor 310. In some embodiments, the network request (or query 330) can also include a user identifier (e.g., associated with the user's account), a type of event (e.g., add, edit, modify, delete, comment, view, share, etc.), a time range, one or more attributes (e.g., action type, path, document identifier, etc.), and/or a maximum number of results.
In response to receiving the network request from front-end server 304 including query 330 and the identifier(s) of the authorized namespace(s) to search, processor 310 at routing server 308 determines one or more index servers 316 to which to route query 330. This determination may be made based on results of routing server 308 applying deterministic mapping function 312 to each of the identifier(s) of the authorized namespace(s) to search. The deterministic mapping function 312 and mapping index 314, given an identifier of a namespace, may be used by routing server (e.g., 308) to determine an index server (e.g., 316B) that stores a sharded event index (e.g., 320n) that indexes documents in the given namespace.
According to some embodiments of the invention, deterministic mapping function 312 applies a one-way hash function, a simple hash function, a consistent hash function, or the like to a namespace identifier to search in order to determine a sharded event index (e.g., 320n) to which the namespace is assigned. To make this determination, processor 310 at routing server 308 may have access to mapping index 314. Together, deterministic mapping function 312 and mapping index 314 provide a way for processor 310 at routing server 308 to determine a hostname or a network address of an index server (e.g., 316B) at which a sharded event index (e.g., 320n) containing indexes for a given namespace is stored.
In some embodiments of the invention, deterministic mapping function 312 may include a hash mechanism and a modulo mechanism. The hash mechanism may accept a namespace identifier as input (e.g., character string data representing the namespace identifier) and may produce a hash value hv as output. For example, the hash mechanism may comprise the MD4, MD5, SHA-1, or SHA2 message-digest algorithm which, when applied to a namespace identifier provided as input, produces a hash value (e.g., a 32-bit hash value) as output. The modulo mechanism may compute the remainder r of division of the hash value hv by a modulus k, thereby mapping the input namespace identifier to one of k values in the range of 0 to k−1. The value of the modulus k may be selected based on a variety of different factors including, for example, the number of actual, expected, or desired index servers 316, the number of actual, expected, or desired namespaces indexed by event indexes 320n, and/or the number of actual, expected, or desired namespace groups. In one exemplary embodiment, the value k is a power of 2 and equals at least 1024.
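The hash mechanism and modulo mechanism can be sketched as follows. This is an illustration under stated assumptions rather than the claimed implementation: SHA-1 is picked from the algorithms listed above, the digest is truncated to its first four bytes to obtain a 32-bit hash value hv, and the name `map_namespace` is hypothetical.

```python
import hashlib

K = 1024  # modulus k; a power of 2 that equals at least 1024


def map_namespace(namespace_id: str, k: int = K) -> int:
    """Map a namespace identifier to a value r in the range 0 to k-1."""
    digest = hashlib.sha1(namespace_id.encode("utf-8")).digest()
    # Assumption: keep the first 4 bytes of the digest as the 32-bit value hv.
    hv = int.from_bytes(digest[:4], "big")
    # Modulo mechanism: r is the remainder of hv divided by k.
    return hv % k


r = map_namespace("abcd")
```

Because the function is deterministic, every routing server that applies it to the same namespace identifier obtains the same value r.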
In some embodiments of the invention, mapping index 314 includes an entry for each of index servers 316. Each such entry is keyed by one or more non-overlapping sub-ranges in the range 0 to k−1. For example, a first entry E1 in mapping index 314 may have a key including the values K1 and K2 defining a first range of consecutive values in the range 0 to k−1 and a second entry E2 in mapping index 314 may have a key including the values K3 and K4 defining a second range of consecutive values in the range 0 to k−1 where the first range K1 to K2 does not overlap the second range K3 to K4.
When processor 310 at routing server 308 applies deterministic mapping function 312 to a given namespace identifier, a value r in the range 0 to k−1 may be produced. Processor 310 at routing server 308 may then consult mapping index 314 with the value r to identify the entry for which r is within the range of the entry key. A hostname or network address of this entry may identify an index server (e.g., 316) at which a sharded event index (e.g., 320n) that indexes events belonging to the given namespace is stored.
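Consulting a mapping index with the value r can be sketched as a lookup over sorted, non-overlapping ranges. The table layout, the hostnames, and the name `lookup_index_server` are hypothetical; an index server keyed by several sub-ranges would simply appear in several rows of this sketch.

```python
from bisect import bisect_right
from typing import List, Tuple

# Each row: (range start, range end inclusive, hostname), sorted by start.
MappingIndex = List[Tuple[int, int, str]]

mapping_index: MappingIndex = [
    (0, 341, "index-a.example"),
    (342, 682, "index-b.example"),
    (683, 1023, "index-c.example"),
]


def lookup_index_server(index: MappingIndex, r: int) -> str:
    """Return the hostname of the entry whose key range contains r."""
    starts = [start for start, _, _ in index]
    # Find the last row whose range start is less than or equal to r.
    pos = bisect_right(starts, r) - 1
    if pos < 0:
        raise KeyError(r)
    start, end, host = index[pos]
    if not start <= r <= end:
        raise KeyError(r)
    return host
```

A binary search is used here only because the ranges are sorted and do not overlap; a linear scan over the entries would give the same answer.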
In some embodiments of the invention, the namespaces assigned to a sharded event index (e.g., 320n) are grouped into a smaller number of namespace groups of the sharded event index so as to reduce the number of index files stored at the index server (e.g., 316) at which the sharded event index is stored. In other words, within an index shard (e.g., 320n), the namespaces assigned to the sharded event index may be partitioned into namespace groups (e.g., partitions 318). Each such namespace group may comprise multiple namespaces.
Although, in
In some instances, an index server (e.g., 316B) actually includes a plurality of servers distributed in a horizontal fashion to provide load balancing, failover, or redundancy for sharded event index 320n. In this case, each of the multiple index servers may store a replica or a copy of sharded event index 320n.
In some instances, index server 316 includes multiple servers in which each of the multiple servers stores a portion of sharded event index 320n. In this case, there may be multiple levels of routing servers. A first routing level is exemplified by routing server 308 that routes query 330 received from front-end server 304 to one or more index servers 316. A second level of routing servers may exist to further route queries within index server 316 to one or more of the multiple servers of the index server. In this case, the second level routing servers may also have a deterministic mapping function and mapping like deterministic mapping function 312 and mapping index 314 for further routing the queries based on identifiers of namespaces.
In the example illustrated in of
When routing query 330 to index server 316B, routing server 308 may send a network request to the index server including query 330. In addition, the network request may comprise identifier(s) of authorized namespace(s) to search assigned to sharded event index 320n stored at that index server. In addition, each such authorized namespace identifier may be associated in the network request with an identifier of the namespace group to which the namespace belongs.
In some embodiments, sharded event indexes 320n index events in four different indexes: main index 322, namespace index 324, user identifier index 326, and attribute index 328. These indexes can be key/value pair indexes (e.g., LevelDB, etc.). The indexes can store events in reverse time order (i.e., newest events at the top of the index). Main index 322 of sharded event index 320n of index server 316B includes all events (and associated data of the events) of the namespaces that index server 316B serves. Namespace index 324 and user identifier index 326 include subsets of events (and subsets of associated data of the events) stored in main index 322 based on namespace identifiers and user identifiers, respectively. When routing server 308 determines (based on mapping function 312) that index server 316B includes events for a namespace of query 330, a partial query 330B is routed to index server 316B. When index server 316B receives partial query 330B, processor 310 can determine a partition 318 where events of the namespace to be queried are located. Namespace index 324 and user identifier index 326 can be traversed in parallel (i.e., keys of the indexes are traversed to determine matches with the namespace identifier or user identifier). Namespace index 324 is traversed with a namespace identifier provided by query 330B and user identifier index 326 is traversed with a user identifier provided by query 330B. In doing so, processor 310 at index server 316B may restrict the events that can possibly be identified in answer 332B to only events that belong to an authorized namespace or user identifier to be searched. In some embodiments, the indexes are traversed for only a specific time period (e.g., last 24 hours, etc.). The results from the query on namespace index 324 can be intersected with the results from the query on user identifier index 326 by event identifier (i.e., to remove duplicates).
The intersected results from the traversal of namespace index 324 and user identifier index 326 include all events that intersect with an authorized namespace (i.e., root namespace or shared namespace) and user identifier (i.e., associated with user account). However, the intersected results include only a subset of the associated data. In order to complete the query, the intersected results can be combined with main index 322. Main index 322, as previously described, includes all events and all associated data (e.g., namespace identifier, user identifier, event type, time stamp, etc.). The combination of main index 322 and the intersected results from the query of namespace index 324 and user identifier index 326 creates a payload (i.e., a query of the events associated with the user identifier and namespace, along with all associated data).
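The traversal, intersection, and payload steps can be sketched with in-memory dictionaries standing in for the key/value indexes. All data, field names, and the function `build_payload` are hypothetical; a real index would be a persistent key/value store, and the two lookups could run in parallel.

```python
from typing import Dict, List, Set

# Main index: event identifier -> all associated data.
main_index: Dict[str, dict] = {
    "e1": {"namespace": "abcd", "user": "User A", "type": "edit", "ts": 30},
    "e2": {"namespace": "abcd", "user": "User B", "type": "view", "ts": 20},
    "e3": {"namespace": "defg", "user": "User A", "type": "delete", "ts": 10},
}
# Secondary indexes: key -> event identifiers only (a subset of the data).
namespace_index: Dict[str, Set[str]] = {"abcd": {"e1", "e2"}, "defg": {"e3"}}
user_index: Dict[str, Set[str]] = {"User A": {"e1", "e3"}, "User B": {"e2"}}


def build_payload(namespace_id: str, user_id: str) -> List[dict]:
    """Intersect both secondary indexes, then join against the main index."""
    by_namespace = namespace_index.get(namespace_id, set())
    by_user = user_index.get(user_id, set())
    # Intersect the two traversal results by event identifier.
    event_ids = by_namespace & by_user
    # Combine with the main index to attach all associated data.
    events = [{"event_id": eid, **main_index[eid]} for eid in event_ids]
    # Newest events first, mirroring the reverse time order of the indexes.
    return sorted(events, key=lambda e: e["ts"], reverse=True)


payload = build_payload("abcd", "User A")
```

With this sample data, only event "e1" belongs to both namespace "abcd" and "User A", so the payload holds that single event with its full record.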
In some instances, for example when one or more attributes are included with query 330, the payload can be filtered with attribute index 328. The filtering can be performed before or after the payload is determined. In some embodiments, the attributes can be used to filter the payload. For example, an attribute of “path” can be provided with the query. In response to the attribute “path,” all results where the “path” provided in the query does not equal the “path” in the payload are filtered out of the payload.
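Filtering a payload by attributes after it has been determined can be sketched as an equality check on each requested attribute. The function name `filter_payload`, the field names, and the sample paths are hypothetical.

```python
from typing import Dict, List


def filter_payload(payload: List[dict], attributes: Dict[str, str]) -> List[dict]:
    """Keep only events whose data equals every attribute given in the query."""
    return [
        event
        for event in payload
        if all(event.get(name) == value for name, value in attributes.items())
    ]


sample_payload = [
    {"event_id": "e1", "path": "/docs/a.txt", "type": "edit"},
    {"event_id": "e2", "path": "/docs/b.txt", "type": "edit"},
]
filtered = filter_payload(sample_payload, {"path": "/docs/a.txt"})
```

An empty attribute dictionary leaves the payload unchanged, which matches a query that carries no attributes.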
In response to a completed query, index server 316B can send answer 334B to routing server 308 which may identify one or more events in one or more of the authorized namespaces that satisfy the query 330. In response to receiving answer 332B (and any other partial answers from other index servers 316) routing server 308 can either send answer 332 to front-end server 304 to return to the user or can combine the received answer 332B with live cache 311 of events. For example, routing servers 308 can include live cache 311 of events (i.e., before they are written to sharded event indexes 320n of index servers 316). Live cache 311 includes the most recent events. In some instances, live cache 311 can be queried in parallel to the indexes 320n. Answer 332 (i.e., the combination of the results from the query of the live cache and answer 332B) can be sent to front-end server 304.
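Combining partial answers with live cache results can be sketched as a merge. The description does not say how overlapping events are handled or how the combined answer is ordered, so the de-duplication by event identifier, the newest-first ordering, and the name `combine_answers` are all assumptions of this sketch.

```python
from typing import Iterable, List, Optional


def combine_answers(
    partial_answers: Iterable[List[dict]],
    live_cache_hits: List[dict],
    max_results: Optional[int] = None,
) -> List[dict]:
    """Merge index server answers with live cache hits into one answer."""
    merged = {}
    # Live cache entries are visited first so they win on duplicate ids.
    for events in [live_cache_hits, *partial_answers]:
        for event in events:
            merged.setdefault(event["event_id"], event)
    # Assumption: present newest events first, as the indexes do.
    ordered = sorted(merged.values(), key=lambda e: e["ts"], reverse=True)
    return ordered if max_results is None else ordered[:max_results]


answer = combine_answers(
    [[{"event_id": "e1", "ts": 10}], [{"event_id": "e2", "ts": 20}]],
    [{"event_id": "e3", "ts": 30}, {"event_id": "e2", "ts": 20}],
)
```

The optional `max_results` argument corresponds to the maximum number of results that a query may carry.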
As an example, assume the network request including query 330 sent from the front-end server 304 to the routing server 308 specifies that two authorized namespaces are to be searched with corresponding namespace identifiers “abcd” and “defg” and one user is to be searched with corresponding user identifier “User A”. Further assume that according to deterministic mapping function 312 and mapping index 314, authorized namespace “abcd” belongs to namespace group “1234” and is assigned to sharded event index 320n of index server 316A and authorized namespace “defg” belongs to namespace group “5678” and is assigned to sharded event index 320n of index server 316B. In this case, the network request from routing server 308 to the index server 316A may specify that namespace “abcd” in namespace group “1234” is to be searched and the network request from routing server 308 to index server 316B may specify that namespace “defg” in namespace group “5678” is to be searched. Index server 316A may use the namespace group identifier “1234” in the network request sent to index server 316A to traverse namespace index 324 and return results matching identifier “1234” along with associated event identifiers. Similarly, user identifier index 326 can be traversed (in parallel) with user identifier “User A” to return results matching the user identifier and associated event identifiers. Similarly, the index server 316B may use the namespace group identifier “5678” and user identifier “User A” in the network request sent to index server 316B to search the corresponding indexes. The traversal results of namespace index 324 and user identifier index 326 can be intersected by the associated event identifiers (i.e., to remove duplicates). The intersected results can then be combined with main index 322 to create a payload. The payload includes all data associated with the event (e.g., event type, time stamp, user identifier, namespace identifier, etc.).
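The fan-out in this example can be sketched as grouping the authorized namespaces by index server. The `assignments` table below hard-codes the outcome that the example attributes to deterministic mapping function 312 and mapping index 314; the table, the request layout, and the name `plan_requests` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# namespace identifier -> (namespace group identifier, index server)
assignments: Dict[str, Tuple[str, str]] = {
    "abcd": ("1234", "316A"),
    "defg": ("5678", "316B"),
}


def plan_requests(namespaces: List[str], user_id: str) -> Dict[str, dict]:
    """Build one request per index server listing its namespaces and groups."""
    per_server = defaultdict(list)
    for namespace_id in namespaces:
        group_id, server = assignments[namespace_id]
        per_server[server].append({"namespace": namespace_id, "group": group_id})
    return {
        server: {"user_id": user_id, "namespaces": targets}
        for server, targets in per_server.items()
    }


requests = plan_requests(["abcd", "defg"], "User A")
```

Each index server receives only the namespaces assigned to it, together with the namespace group of each and the user identifier to search.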
As illustrated in
A content ID in column 434 can identify each content item in the table. For example, the delete event in the first row (timestamp ‘t1’) is associated with a content item given content identifier “1” in column 434. Content identifiers (column 434) can be any assigned value or a hash of the content item name or portion of the content item contents. In some embodiments the content item can be identified by the content item name in the path stored in column 438, and an explicit content identifier such as illustrated in column 434 may not be needed.
An event, such as the delete event in the first row of the table, is also associated with a timestamp (column 432). In some embodiments, the timestamp can be the time that the event was committed to content management system 106. In some embodiments the timestamp can be a time that the event actually occurred (events can occur on client device 102 and be synced and committed to content management system 106 using synchronization module 132 at a later time).
An event, such as the delete event in the first row of the table, is also associated with a namespace (column 436). A namespace can be considered to be analogous to a root level of a file system directory, except that content management system 106 manages many namespaces. As such, each namespace is an abstraction for the root directory of a more traditional file system directory tree. Each user has private access to a root namespace. In addition, every shared collection is a namespace that can be mounted within one or many root namespaces. With this abstraction, every content item on content management system 106 can be uniquely identified by two values: a namespace (column 436) and a relative path (column 438). The namespaces shown in column 436 can be root namespaces or shared collection namespaces. The paths shown in column 438 reflect a path under either a root namespace or shared collection namespace. The path can identify subdirectories and end in a file name.
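One row of such a table can be sketched as a small record type in which a content item is keyed by its namespace and relative path. The class name `EventRow`, its field names, and the sample values are hypothetical and are not taken from the figure.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class EventRow:
    timestamp: str   # column 432
    content_id: str  # column 434 (may be omitted in some embodiments)
    namespace: str   # column 436: root or shared collection namespace
    path: str        # column 438: path relative to the namespace
    event_type: str  # e.g., "delete"

    def content_key(self) -> Tuple[str, str]:
        """Namespace plus relative path uniquely identifies a content item."""
        return (self.namespace, self.path)


row_a = EventRow("t1", "1", "ns-root", "/docs/report.txt", "delete")
row_b = EventRow("t2", "2", "ns-shared", "/docs/report.txt", "edit")
```

The two sample rows share a relative path but sit in different namespaces, so their keys differ and they refer to different content items.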
The table illustrated in
The method shown in
Each sequence shown in
In the example, when the search request is received at front-end server 304, the search request can be sent to one of a plurality of routing servers 308. In other examples, the front-end server 304 and routing servers 308 are the same. When the search request is received, method 500 can proceed to sequence 510.
At sequence 510, one or more index servers storing event logs can be determined. In some examples, processor 310 can invoke mapping function 312 to search mapping index 314 for matches with the namespace identifiers (e.g., identified in sequence 505) as illustrated in
At sequence 515, the search request (e.g., query 330) can be sent to the identified index servers and event logs (e.g., sharded event index 320n) stored at the identified index servers can be searched. The event logs can include main index 322, namespace index 324, user identifier index 326, and attribute index 328. As illustrated in
At sequence 520, a payload can be determined from the results of the event log search. In some instances, a payload can be determined by intersecting the search results of the event log with main index 322. For example, the event identifiers from the search results of the event log can be used to locate the corresponding event identifiers in main index 322. As previously discussed, main index 322 includes event identifiers and all corresponding data (e.g., event type, namespace, user identifier, timestamp, attributes, etc.). When a payload has been determined, method 500 can proceed to sequence 525.
At sequence 525, one or more attribute filters can be applied to the payload. In some examples, one or more attributes can be included in the search request (e.g., query 330). The one or more attributes can be used to filter out non-matches with the payload. For example, if an attribute filter of “path” is applied to the payload, the payload will only include event identifiers (and the associated data) that match the “path” attribute (i.e., removing all other event identifiers from the payload). In other examples, one or more event identifiers can be determined from traversing attribute index 328. The results from the traversal can then be intersected with the payload (or the results from sequence 515). When the payload has been filtered, method 500 can proceed to sequence 530.
At sequence 530, the filtered payload can be sent from index servers 316 to routing servers 308. Routing servers 308 can in turn send the filtered payload to front-end server 304, which can provide the filtered payload to the user. In some examples and as shown in
During event queries, as shown in
To enable user interaction with the computing device 700, an input device 745 can represent any number of input mechanisms, such as a microphone for speech, a touch-sensitive screen for gesture or graphical input, a keyboard, a mouse, motion input, and so forth. An output device 735 can also be one or more of a number of output mechanisms known to those of skill in the art. In some instances, multimodal systems can enable a user to provide multiple types of input to communicate with the computing device 700. The communications interface 740 can generally govern and manage the user input and system output. There is no restriction on operating on any particular hardware arrangement and therefore the basic features here may easily be substituted for improved hardware or firmware arrangements as they are developed.
Storage device 730 is a non-volatile memory and can be a hard disk or other types of computer readable media which can store data that are accessible by a computer, such as magnetic cassettes, flash memory cards, solid state memory devices, digital versatile disks, cartridges, random access memories (RAMs) 725, read only memory (ROM) 720, and hybrids thereof.
The storage device 730 can include software modules 732, 734, 736 for controlling the processor 710. Other hardware or software modules are contemplated. The storage device 730 can be connected to the system bus 705. In one aspect, a hardware module that performs a particular function can include the software component stored in a computer-readable medium in connection with the necessary hardware components, such as the processor 710, bus 705, display 735, and so forth, to carry out the function.
Chipset 760 can also interface with one or more communication interfaces 790 that can have different physical interfaces. Such communication interfaces can include interfaces for wired and wireless local area networks, for broadband wireless networks, as well as personal area networks. Some applications of the methods for generating, displaying, and using the GUI disclosed herein can include receiving ordered datasets over the physical interface or be generated by the machine itself by processor 755 analyzing data stored in storage 770 or 775. Further, the machine can receive inputs from a user via user interface components 785 and execute appropriate functions, such as browsing functions by interpreting these inputs using processor 755.
It can be appreciated that exemplary systems 700 and 750 can have more than one processor 710 or be part of a group or cluster of computing devices networked together to provide greater processing capability.
For clarity of explanation, in some instances the present technology may be presented as including individual functional blocks including functional blocks comprising devices, device components, steps or routines in a method embodied in software, or combinations of hardware and software.
Any of the steps, operations, functions, or processes described herein may be performed or implemented by a combination of hardware and software modules, alone or in combination with other devices. In an embodiment, a software module can be software that resides in memory of a client device and/or one or more servers of a content management system and performs one or more functions when a processor executes the software associated with the module. The memory can be a non-transitory computer-readable medium.
In some embodiments the computer-readable storage devices, mediums, and memories can include a cable or wireless signal containing a bit stream and the like. However, when mentioned, non-transitory computer-readable storage media expressly exclude media such as energy, carrier signals, electromagnetic waves, and signals per se.
Methods according to the above-described examples can be implemented using computer-executable instructions that are stored or otherwise available from computer readable media. Such instructions can comprise, for example, instructions and data which cause or otherwise configure a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. Portions of computer resources used can be accessible over a network. The computer executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, firmware, or source code. Examples of computer-readable media that may be used to store instructions, information used, and/or information created during methods according to described examples include magnetic or optical disks, flash memory, USB devices provided with non-volatile memory, networked storage devices, and so on.
Devices implementing methods according to these disclosures can comprise hardware, firmware and/or software, and can take any of a variety of form factors. Typical examples of such form factors include laptops, smart phones, small form factor personal computers, personal digital assistants, and so on. Functionality described herein also can be embodied in peripherals or add-in cards. Such functionality can also be implemented on a circuit board among different chips or different processes executing in a single device, by way of further example.
The instructions, media for conveying such instructions, computing resources for executing them, and other structures for supporting such computing resources are means for providing the functions described in these disclosures.
Although a variety of examples and other information was used to explain aspects within the scope of the appended claims, no limitation of the claims should be implied based on particular features or arrangements in such examples, as one of ordinary skill would be able to use these examples to derive a wide variety of implementations. Further and although some subject matter may have been described in language specific to examples of structural features and/or method steps, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to these described features or acts. For example, such functionality can be distributed differently or performed in components other than those identified herein. Rather, the described features and steps are disclosed as examples of components of systems and methods within the scope of the appended claims.