Management interface for a system that provides automated, real-time, continuous data protection

Abstract
A data management system that protects data into a continuous object store includes a management interface having a time control. The time control allows an administrator to specify a “past” time, such as a single point or range. When the time control is set to a single point, a hierarchical display of data appears on a display exactly as the data existed in the system at that moment in the past. The time control enables the management interface to operate within a history mode in which the display provides a visual representation of a “virtual” point in time in the past during which the data management system has been operative to provide the data protection service.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS

This application also is related to commonly-owned applications:


U.S. Pat. No. 7,096,392, issued Aug. 22, 2006 and titled “METHOD AND SYSTEM FOR AUTOMATED, NO DOWNTIME, REAL-TIME, CONTINUOUS DATA PROTECTION.”


Ser. No. 11/123,994, filed May 5, 2005, and titled “SYSTEM FOR MOVING REAL-TIME DATA EVENTS ACROSS A PLURALITY OF DEVICES IN A NETWORK FOR SIMULTANEOUS DATA PROTECTION, REPLICATION, AND ACCESS SERVICES.”


BACKGROUND OF THE INVENTION

1. Technical Field


The present invention relates generally to enterprise data protection.


2. Background of the Related Art


A critical information technology (IT) problem is how to cost-effectively deliver network-wide data protection and rapid data recovery. In 2002, for example, companies spent an estimated $50 B worldwide managing data backup/restore and an estimated $30 B in system downtime costs. The “code red” virus alone cost an estimated $2.8 B in downtime, data loss, and recovery. The reason for these staggering costs is simple: traditional schedule-based tape and in-storage data protection and recovery approaches can no longer keep pace with rapid data growth, geographically distributed operations, and the real-time requirements of 24×7×365 enterprise data centers.


Although many enterprises have embarked on availability and recovery improvement programs, many of these programs have been focused on the redundancy of the infrastructure, not on the data itself. Yet, without data availability, applications cannot be available.


Today's legacy data protection and recovery solutions are highly fragmented across a wide variety of applications, systems, and storage models. The overhead and data management maze that existing approaches bring to the network, storage, tape, and application infrastructure has caused increasing expenditures with little tangible returns for the enterprise. Worse, manual recovery techniques compound the problem with the same issues that cause downtime in the first place—human errors and process issues constitute 80% of unplanned downtime.


As a result, businesses are enduring high costs, high risk, and a constant drag on productivity. A recent survey by Aberdeen highlights IT managers' top data storage problems: managing backup and restore (78%), deploying disaster recovery (80%), and delivering required service levels (60%).


One recently-introduced technique for addressing the complex problem of providing heterogeneous, enterprise-wide data management is illustrated in FIG. 1. FIG. 1 illustrates a representative enterprise 100 in which a data management system (DMS) is implemented to provide enterprise data protection. A commercial version of this architecture is available from Asempra Technologies, Inc., of Sunnyvale, Calif. In this illustrative example, an enterprise 100 comprises a primary data tier 102 and a secondary data tier 104 distributed over IP-based wide area networks 106 and 108. Wide area network 106 interconnects two primary data centers 110 and 112, and wide area network 108 interconnects a regional or satellite office 114 to the rest of the enterprise. The primary data tier 102 comprises application servers 116 running various applications such as databases, email servers, file servers, and the like, together with associated primary storage 118 (e.g., direct attached storage (DAS), network attached storage (NAS), storage area network (SAN)). The secondary data tier 104 typically comprises one or more data management server nodes, and secondary storage 120, which may be DAS, NAS, or SAN. The secondary storage may be serial ATA storage interconnected through SCSI, Fibre Channel (FC), iSCSI, or the like. The data management server nodes create a logical layer that offers object virtualization and protected data storage. The secondary data tier is interconnected to the primary data tier, preferably through one or more host drivers to provide real-time data services. Data management policies 126 are implemented across the secondary storage in a well-known manner. A similar architecture is provided in data center 112. In this example, the regional office 114 does not have its own secondary storage, but relies instead on the facilities in the primary data centers.


As described in co-pending application Ser. No. 10/841,398, the DMS system associates a “host driver” 128 with one or more of the application(s) running in the application servers 116 to transparently and efficiently capture the real-time, continuous history of all (or substantially all) transactions and changes to data associated with such application(s) across the enterprise network. This facilitates real-time, so-called “application aware” protection, with substantially no data loss, to provide continuous data protection and other data services including, without limitation, data distribution, data replication, data copy, data access, and the like. In operation, a given host driver 128 intercepts data events between an application and its primary data storage, and it may also receive data and application events directly from the application and database. The host driver 128 may be embedded in the host application server 116 where the application resides; alternatively, the host driver is embedded in the network on the application data path. By intercepting data through the application, fine grain (but opaque) data is captured to facilitate the data service(s). To this end, and as also illustrated in FIG. 1, each of the primary data centers includes a set of one or more data management servers 130a-n that cooperate with the host drivers 128 to facilitate the data services. The DMS servers provide a distributed object storage that can be built above raw storage devices, a traditional file system, a special purpose file system, a clustered file system, a database, or the like. In this illustrative example, the data center 110 supports a first core region 130, and the data center 112 supports a second core region 132.


As described in co-pending application Ser. No. 11/123,994, each DMS node executes an object runtime environment. This object runtime environment includes an object manager that manages the lifecycle of all the DMS objects during runtime. The object manager creates DMS objects, and the object manager saves them in the shared storage. The objects continually undergo modification as the system protects data in the enterprise's primary storage. In an illustrative embodiment, the system automatically creates a trail of objects called versions; typically, the versions do not actually exist on primary storage, outside of the data management system. The DMS manages the creation, storage, display, recovery to primary storage, deletion (automatic via policy, or manual) and the like, of these versions. The host drivers protect data into the continuous object data store. Using this architecture, data in primary storage can be recovered to any point-in-time.


The present invention is a management interface for use in an enterprise data management system such as described above.


BRIEF SUMMARY OF THE INVENTION

A data management system that protects data into a continuous object store includes a management interface having a time control. The time control is a mechanism, such as a linear timeline, a radial time dial, a calendar, or a search specification dialog, or a combination thereof, that allows an administrator to specify a “past” time, such as a single point or range. When the time control is set to a single point, a hierarchical display of data appears on a display exactly as the data existed in the system at that moment in the past. Preferably, the visualization includes both the structure of the hierarchy (e.g., if the protected data source is a file system, the identity of the directories and their files; if the protected data source is a relational database, the identity of the databases and their binary and log files), and also the contents of the data objects themselves (i.e., what was in the files and databases). The timeline also includes a zoom function to enable the user to view and set the time at a coarse granularity (e.g., a given day) or to view and set the time at a finer granularity (e.g., seconds). A search specification dialog allows the user to specify a time range as well as a point in time. This time range is then used as a display filter, so that only files meeting specified criteria are included in the display set. As an example, a user may search for “all files which had a size >a given value at some point in December 2004 and were deleted in January 2005.”
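By way of illustration only, the use of a time range as a display filter can be sketched as follows. This is a minimal sketch, not the system's implementation: it assumes each protected file carries a time-ordered list of versions and an optional deletion time, and the names `Version`, `FileHistory`, `had_size_over`, and `search` are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Version:
    created: datetime  # when this version was captured
    size: int          # size of this version in bytes


@dataclass
class FileHistory:
    path: str
    versions: List[Version] = field(default_factory=list)  # assumed sorted by time
    deleted: Optional[datetime] = None                      # None if never deleted


def had_size_over(f, min_size, start, end):
    """True if some version larger than min_size was current at any moment in [start, end)."""
    # A version stays current until the next version appears or the file is deleted.
    ends = [v.created for v in f.versions[1:]]
    ends.append(f.deleted if f.deleted is not None else datetime.max)
    return any(
        v.size > min_size and v.created < end and live_until > start
        for v, live_until in zip(f.versions, ends)
    )


def search(files, min_size, size_range, deleted_range):
    """Return paths of files that exceeded min_size within size_range
    and were deleted within deleted_range (both half-open ranges)."""
    return [
        f.path
        for f in files
        if had_size_over(f, min_size, *size_range)
        and f.deleted is not None
        and deleted_range[0] <= f.deleted < deleted_range[1]
    ]
```

With `size_range` set to December 2004 and `deleted_range` set to January 2005, `search` expresses the example query above; the returned paths form the display set.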


The time control enables the management interface to operate within a history mode in which the display provides a visual representation of a “virtual” point in time in the past during which the data management system has been operative to provide the data protection service. In addition, the management interface can be toggled to operate in a real-time mode, which provides an active view of the most current protected data as it changes in real-time, typically driven by changes to primary storage. This real-time mode provides the user with the ability to view changes that occur to a set of data currently visible on the display screen. As an example, if the interface is displaying the contents of directory D1, and a file F1 in the directory is created on primary storage, then file F1 automatically appears in the display in the appropriate position in the data hierarchy.


The interface also allows an administrator to specify and manage policy including, without limitation, how long data is retained in the management system. A policy engine enables the user to assert “temporal-based” policy over data objects. As an example, an administrator may define a policy rule such as “retain all versions of all Excel files in the New York office for one month, then retain monthly snapshots of such files for the next eleven months, then purge all older versions.” Preferably, a given policy is asserted by one or more policy attributes, and attributes are grouped and managed according to one or more policy profiles. The administrator may assert policy by associating policy profiles with data objects at any level in the hierarchy.
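By way of illustration only, the example retention rule can be sketched as a function that decides which versions survive. This is a minimal sketch under stated assumptions: "one month" is approximated as 30 days, the full retention horizon as 365 days, and a "monthly snapshot" is taken to be the latest version in each calendar month. The function name `select_retained` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta


def select_retained(version_times, now, full_days=30, total_days=365):
    """Return the version timestamps to keep, in ascending order.

    Assumptions (not from the specification): a month is 30 days, and the
    monthly snapshot is the latest version within each calendar month.
    """
    keep_all = []   # versions young enough to be kept unconditionally
    monthly = {}    # (year, month) -> latest version time in that month
    for t in sorted(version_times):
        age = now - t
        if age <= timedelta(days=full_days):
            keep_all.append(t)
        elif age <= timedelta(days=total_days):
            monthly[(t.year, t.month)] = t  # later times overwrite earlier ones
        # anything older than total_days is purged (not added anywhere)
    return sorted(list(monthly.values()) + keep_all)
```

Versions absent from the returned list are the ones the policy would purge.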


The foregoing has outlined some of the more pertinent features of the invention. These features should be construed to be merely illustrative. Many other beneficial results can be attained by applying the disclosed invention in a different manner or by modifying the invention as will be described.





BRIEF DESCRIPTION OF THE DRAWINGS

For a more complete understanding of the present invention and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:



FIG. 1 is an illustrative enterprise network in which the present invention may be deployed;



FIG. 2 is an illustration of a set of data management system nodes that comprise a continuous object data store;



FIG. 3 is a representative DMS network having a management gateway according to one embodiment of the present invention;



FIG. 4 is a block diagram of a management console for use in the present invention;



FIG. 5 is a block diagram of various software modules that may be used to retrieve information about the data objects from DMS and export such information to user interface viewers in an illustrated embodiment;



FIG. 6 is a GUI button bar that includes a set of controls for the management interface;



FIG. 7A illustrates a representative display screen layout for the management interface;



FIG. 7B illustrates a representative display screen layout for the interface after a user has selected to view one or more versions of a particular data object;



FIG. 8 illustrates an additional control panel for use in policy management;



FIG. 9 illustrates the management interface when the user selects a history display mode;



FIG. 10 illustrates a time control in the form of a timeline that is part of the management interface;



FIG. 11 illustrates an operation of a beginning time button control;



FIG. 12 illustrates an operation of a now button control;



FIG. 13 illustrates a day timeline view;



FIG. 14 illustrates an hour timeline view;



FIG. 15 illustrates a minute timeline view;



FIG. 16 illustrates a second timeline view;



FIG. 17 illustrates several examples of how policy profiles are managed;



FIG. 18 illustrates how retention policy may be enforced;



FIG. 19 illustrates a specific retention policy example; and



FIG. 20 illustrates how the enterprise primary storage and DMS can be modeled as a pair of logical and physical system models according to the present invention to facilitate policy management.





DETAILED DESCRIPTION OF AN ILLUSTRATIVE EMBODIMENT

As illustrated in FIG. 1, the present invention is now described in the context of a data management system (DMS) that is implemented as a network (a wide area network “cloud”) of peer-to-peer DMS service nodes. The invention is not limited to use with such a system, however.


By way of brief background, FIG. 2 illustrates a hierarchical structure of a data management system 200 in which the invention may be implemented. As illustrated, the data management system 200 comprises one or more regions 202a-n, with each region 202 comprising one or more clusters 204a-n. A given cluster 204 includes one or more nodes 206a-n and a shared storage 208 shared by the nodes 206 within the cluster 204. A given node 206 is a data management server as described above with respect to FIG. 1. Within a DMS cluster 204, preferably all the nodes 206 perform parallel access to the data in the shared storage 208. Preferably, the nodes 206 are hot swappable to enable new nodes to be added and existing nodes to be removed without causing cluster downtime. A cluster is a tightly-coupled, share everything grouping of nodes. At a higher level, the DMS is a loosely-coupled share nothing grouping of DMS clusters. Preferably, all DMS clusters have shared knowledge of the entire network, and all clusters preferably share partial or summary information about the data that they possess. Network connections (e.g., sessions) to one DMS node in a DMS cluster may be re-directed to another DMS node in another cluster when data is not present in the first DMS cluster but may be present in the second DMS cluster. Also, new DMS clusters may be added to the DMS cloud without interfering with the operation of the existing DMS clusters. When a DMS cluster fails, its data may be accessed in another cluster transparently, and its data service responsibility may be passed on to another DMS cluster.


With reference to FIG. 3, the DMS cloud 300 typically comprises one or more DMS regions, with each region comprising one or more DMS “clusters.” In the illustrative embodiment of FIG. 3, typically there are two different types of DMS regions, in this example an “edge” region 306 and a “core” region 308. This nomenclature is not to be taken as limiting, of course. As illustrated in FIG. 1, an edge region 306 typically is a smaller office or data center where the amount of data hosted is limited and/or where a single node DMS cluster is sufficient to provide necessary data services. Typically, core regions 308 are medium or large size data centers where one or more multi-node clusters are required or desired to provide the necessary data services. The DMS preferably also includes one or more management gateways 310 for controlling the system. As seen in FIG. 3, conceptually the DMS can be visualized as a set of data sources 312. A data source is a representation of a related group of fine grain data. For example, a data source may be a directory of files and subdirectories, or it may be a database, or a combination of both. A data source 312 inside a DMS cluster captures a range of history and continuous changes of, for example, an external data source in a host server. A data source may reside in one cluster, and it may replicate to other clusters or regions based on subscription rules. If a data source exists in the storage of a DMS cluster, preferably it can be accessed through any one of the DMS nodes in that cluster. If a data source does not exist in a DMS cluster, then the requesting session may be redirected to another DMS cluster that has the data; alternatively, the current DMS cluster may perform an on-demand replication to bring in the data.
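By way of illustration only, the session redirection behavior can be sketched as a lookup over the summary information that clusters share. This is a minimal sketch: the `holdings` mapping (cluster name to the set of data source names it stores) and the function `resolve_cluster` are hypothetical, and the on-demand replication alternative is not modeled.

```python
def resolve_cluster(requested, source, holdings):
    """Return the name of the cluster that should serve `source`.

    holdings: dict mapping cluster name -> set of data source names it stores
    (a stand-in for the shared summary information; an assumption).
    """
    # Serve locally when the requested cluster already has the data source.
    if source in holdings.get(requested, set()):
        return requested
    # Otherwise redirect to another cluster that has it (first by name here).
    for name in sorted(holdings):
        if source in holdings[name]:
            return name
    return None  # no cluster holds the data source
```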


As described in co-pending application Ser. No. 11/123,994, which is incorporated herein by reference, the DMS nodes create distributed object storage to provide real-time data management services. The distributed object store can be built above raw storage devices, a traditional file system, a special purpose file system, a clustered file system, a database, and so on. Preferably, DMS builds the distributed object store over a special purpose file system for storage and access efficiency. Each DMS node executes an object runtime environment. This object runtime environment includes an object manager that manages the lifecycle of all the DMS objects during runtime. The object manager creates DMS objects, which are sometimes referred to as active objects, and the object manager saves them in the shared storage. The objects continually undergo modification as the system protects data in the enterprise's primary storage. In an illustrative embodiment, the system automatically creates a trail of objects called versions (typically, the versions do not actually exist on primary storage, outside of the data management system). The DMS manages the creation, storage, display, recovery to primary storage, deletion (automatic via policy or manual) and the like, of these versions.


According to the present invention, as illustrated in FIG. 3, the DMS includes one or more management gateways to enable enterprise administrators (or others) to manage system administration and operation, preferably of the entire DMS network (including, for example, multiple regions, clusters, nodes and storage devices) and its protected data. A management gateway is a data management application platform that provides to a user, through a viewer, a graphical user interface (GUI) for displaying a real-time object catalog for user management of the DMS and to facilitate data recovery. As will be described, the GUI includes a time control, such as a timeline, for navigating data over a range of time. The GUI presents a consistent state of the data as it was at the time the administrator selects on the timeline—both the structure of the data hierarchy (names, existence of objects, and container relationships), and also the data itself (contents of versions). Thus, the GUI presents the data hierarchy as it was at the selected point in time on the time control.
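By way of illustration only, presenting a consistent state of the data at a selected time can be sketched as a replay of a change trail up to that time. This is a minimal sketch, assuming the trail is available as `(timestamp, path, content)` tuples in which a content of `None` marks a deletion; this event format and the function `state_at` are hypothetical and do not describe how the DMS object store is actually organized.

```python
from datetime import datetime


def state_at(events, selected_time):
    """Rebuild {path: content} as it existed at selected_time.

    events: iterable of (timestamp, path, content); content None means deleted.
    Both structure (which paths exist) and contents are recovered.
    """
    state = {}
    for when, path, content in sorted(events, key=lambda e: e[0]):
        if when > selected_time:
            break  # ignore everything after the selected point in time
        if content is None:
            state.pop(path, None)   # object no longer exists in the hierarchy
        else:
            state[path] = content   # object created or a new version written
    return state
```

Moving the time control simply changes `selected_time`; the hierarchy and the contents are recomputed together, so they remain mutually consistent.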



FIG. 4 illustrates components of a representative management gateway in an illustrative embodiment of the present invention. A management gateway is a data processing system 400 having one or more processors 402, suitable memory 404 and storage devices 406, input/output devices 408, an operating system 410, and one or more applications 412. One output device is a display that supports a windows-based graphical user interface (GUI). The data processing system includes hardware and software components to facilitate connectivity of the machine to the public Internet, a private network, or any other communications network. In a representative embodiment, the data processing system is a Pentium-based personal computer executing a suitable operating system such as Linux or Windows XP. Of course, any convenient processor and operating system platforms may also be used.


The management gateway can be a standalone device, or it can operate as a server to which one or more client machines are connected. FIG. 5 illustrates one embodiment wherein the management gateway operates as a server to which one or more client machines can connect to view the data. In FIG. 5, DMS 500 exports the data to the server platform 502, which supports a Web server 503 (e.g., Microsoft IIS, Apache, or the like), and a gateway service 504. The gateway service 504 includes an XML web service component 506, a DMS transport protocol (XDMP) XML API module 508, and an XDMP SDK API module 510. The XDMP components interface with the DMS. In this embodiment, end user client machines (e.g., commodity PCs having Web browsers) connect to the server via HTTP or SOAP. The client side comprises an application core module 512, the UI components 514, and a data load component module 516.


With the above as background, the following section describes an illustrated graphical user interface (GUI) for use in the data management system. As will be seen, the GUI comprises various screen layouts, buttons, wizards, and other graphic display elements that enable an administrator to navigate through time in a unique manner as will now be described and illustrated.


Button Bar


Controls and information preferably are always visible in a Button Bar at the top of the display window, as illustrated in FIG. 6. The list that follows describes the controls on the Button Bar, from left to right.

    • Home Resets the UI to the default state: Realtime mode, Repositories selected, all Regions collapsed.
    • Pulldown Menu A context-sensitive menu, with the basic commands that apply to the objects currently being displayed, such as Create, Delete, Move, etc.
    • Alerts Displays an Alerts screen.
    • Realtime/History Mode Toggles the Management Console between realtime mode, in which the UI tracks the current state of the data in the DMS, and history mode, in which the UI presents the data as of a particular time (down to the second) in the past.
    • Current Time Shows the current DMS time. The background is green in Realtime Mode, orange in History mode.
    • Task Buttons The Protect, Replicate, Recover and Switchover buttons bring up wizards to perform their respective actions.


Screen Layout


By default the UI preferably comprises a Left Pane containing trees of selectable objects, and a Center Pane listing of the contents of the selected object, as illustrated in FIG. 7A.


Left Pane


The Left Pane preferably displays browse-able trees of selectable objects. The tabs at the bottom preferably allow the user to switch between three views:

    • DMS. Data objects in the DMS: Data Cloud, Regions, Repositories, Data sources.
    • Policy. Policy Profile documents, which are associated with data objects.
    • Network. Physical components of the DMS: Regions, Clusters, Nodes.


The DMS view preferably displays two trees:

    • Repositories. The logical hierarchy of data in the DMS: Data sources are organized into (possibly nested) Repositories, which reside in Regions.
    • Servers. Shows all the Data sources under the Servers their data originates from. Preferably, all DMS-enabled Servers are shown, grouped by Region. Servers in each Region for which data protection has not yet been enabled are shown under Unprotected Servers.


Center Pane


The Center Pane preferably displays information for the object selected in the Left Pane. For data objects (i.e. when the DMS Tab is current) preferably there are several views:

    • Protected Data. Data objects, including, without limitation, repositories, data sources, directories, files, databases, database objects, Exchange storage groups, Exchange databases, user mailboxes, messages, user calendars and the like.
    • Replicas. Present when the selected object has one or more replicas.
    • Audit Log. Shows events related to the object selected in the Left Pane.
    • Graphs & Reports.


The information viewed in the Center Pane is controlled by the View Menu and the Column Menu.


Right Pane


The Right Pane displays information pertaining to the object selected in the Left or Center panes. The information preferably is presented as two property sheets, the Info Sheet and Policy Sheet. Display of the Right Pane is controlled by the arrow at the right of the center pane. Initially, preferably the Right Pane is not displayed, and the arrow points to the left. Clicking it displays the Right Pane, as illustrated in FIG. 8.

    • When the Right Pane is displayed, the arrow in the Center Pane points to the right, and clicking it preferably hides the Right Pane.


Navigation Through Time


The two principal mechanisms for navigating the DMS history are 1) drilling down into object versions, and 2) going into History Mode and explicitly changing the current DMS time. Where the data source being protected is a file system, the “versions” are file versions, and a particular file version is created when a file is modified and closed. Where the data source is a database, a particular “version” is created whenever the database is checkpointed, quiesced or shut down, as the case may be.


Data Object Versions


Whenever data objects (such as files or databases) are displayed in the Center Pane, preferably there is a Versions column with the number of versions for the object, up to the current DMS time (in real-time mode) or the selected time (in history mode) for each data object. By clicking that number, the user can drill down into a listing of all the versions. This is illustrated in FIG. 7B. Preferably, DMS automatically creates versions as the data object changes. By visually scanning the list of versions, going back from the present, this portion of the GUI facilitates simple time-based navigation. A more powerful time navigation mechanism is provided by a History mode, which is now described.


Users with appropriate permissions may view the contents of any version, e.g., by issuing an Open command for that version, or by means of a menu or accelerator such as double-clicking on the version. The management interface then invokes a viewing application capable of displaying the data appropriately, and preferably displays the read-only data in a separate window, which may be tiled or overlapping in relation to the Left, Right and Center panes. Thus, the console can show the number of versions at any point in history, and the user can drill down to see the version list at any point in history and then return to a previous level.
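By way of illustration only, the value shown in the Versions column can be sketched as a count of versions created at or before the time in effect (the current DMS time in real-time mode, or the selected time in history mode). This is a minimal sketch, assuming the version timestamps of an object are held in an ascending list; the function names are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime


def version_count(version_times, as_of):
    """Number of versions created at or before `as_of`.

    version_times must be sorted in ascending order.
    """
    return bisect_right(version_times, as_of)


def version_list(version_times, as_of):
    """The versions visible when drilling down at time `as_of`."""
    return version_times[:version_count(version_times, as_of)]
```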


History Mode & Timeline


Clicking History toggles the system from Realtime mode to History mode, as illustrated in FIG. 9.

    • While in History mode, the user can navigate through the entire time span covered by the DMS by means of 1) the Timeline or 2) a Calendar popup, which is accessed by clicking anywhere in the Time display above the Center Pane.


Date/Time Links


The UI displays the timestamps associated with various data objects—e.g. the time a file was last modified, or the time an event occurred. An event can be a consistent checkpoint (e.g., file close, a database checkpoint or quiesce, or the like), a software upgrade, a virus detector alert, a business-associated event, or the like. Whenever such a timestamp is onscreen, the user can right-click to pop up a menu and select Go To this date & time to enter History mode and navigate to that time.


Timeline and Calendar


This section further describes the Timeline and Calendar for navigating through time in History mode.


Timeline


The Timeline preferably appears at the bottom of the window in History mode, as illustrated in FIG. 10. This is not a limitation, however.


The Timeline is used to control the current system time—i.e., the moment in time which is taken as the lens through which to view the data in the DMS. The current system time is shown by a current system time indicator (CSTI)—preferably a vertical red bar. In some views, the current unit box in the timeline is also highlighted, as shown above.


Timeline Components


The Timeline preferably contains the following controls and display areas, from left to right:

    • Current Time Box. Displays the current system time, including month, day, year, hour, minute, second, and AM/PM.
    • Beginning Time Button. Button in the form of a vertical bar that, when clicked, scrolls to show the earliest protection date at far left, with the CSTI at the left edge of the timeline, as illustrated in FIG. 11.
    • Scroll Back Far Button. Button in the form of a double-left-arrow that, when clicked, scrolls one full “timeline full” backward in time—i.e. the contents of the Timeline animate quickly and smoothly such that the time that was displayed on the far left of the timeline moves all the way to the far right.
    • Scroll Back Single Unit Button. Button in the form of a single-left-arrow that, when clicked, scrolls a single unit backward in time.
    • Timeline. Bar in the center that shows a number of units in the current zoom level. The bar length is adjusted as needed so the entire Timeline fits the current window width.
    • Current System Time Indicator. A vertical red bar within the Timeline showing the current system time. Clicking elsewhere in the Timeline causes the CSTI to jump to the new location on mouse up. The CSTI can also be dragged to a new location. Dragging off either edge causes the Timeline contents to auto-scroll in the appropriate direction.
    • Scroll Forward Single Unit Button. Button in the form of a single-right-arrow that, when clicked, scrolls a single unit forward in time.
    • Scroll Forward Far Button. Button in the form of a double-right-arrow that, when clicked, scrolls one full “timeline full” forward in time—i.e. the contents of the Timeline animate quickly and smoothly such that the time that was displayed on the far right of the timeline moves all the way to the far left.
    • Now Button. Button in the form of a vertical bar that, when clicked, scrolls to show the current time at far right, with the CSTI at the right edge of the timeline, as illustrated in FIG. 12.
    • Zoom Level Box. Allows the user to select one of four zoom levels: Second, Minute, Hour, Day.


Timeline Operation
    • The Timeline slides smoothly on and off the bottom edge of the window when the user toggles between History and Realtime modes.
    • Scrolling via any of the forward/back buttons, and changing the zoom level, preferably has no effect on the current DMS time.
    • Preferably, all of the arrow buttons have auto-repeat behavior. That is, the first unit of animated scrolling occurs immediately on mouse-down, then, after a short pause, the scrolling continues smoothly, ending on mouse-up. So clicking scrolls one unit, pressing and holding scrolls continuously.
    • Double-clicking anywhere within the Timeline proper zooms in one level; Shift-double-click zooms out one level.
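By way of illustration only, the separation between what the Timeline shows and the current system time can be sketched as a small state model in which scrolling and zooming move only the visible window. This is a minimal sketch: the class `Timeline`, its fields, and the default of ten visible units are hypothetical, and animation and auto-repeat are not modeled.

```python
from datetime import datetime, timedelta

ZOOM_LEVELS = ["Second", "Minute", "Hour", "Day"]  # finest to coarsest
UNIT = {
    "Second": timedelta(seconds=1),
    "Minute": timedelta(minutes=1),
    "Hour": timedelta(hours=1),
    "Day": timedelta(days=1),
}


class Timeline:
    def __init__(self, current_time, window_start, units_visible=10, zoom="Day"):
        self.current_time = current_time    # position of the CSTI
        self.window_start = window_start    # time shown at the far left
        self.units_visible = units_visible  # assumed number of units on screen
        self.zoom = zoom

    def scroll(self, units):
        """Scroll by single units (negative = back); the CSTI time is untouched."""
        self.window_start += units * UNIT[self.zoom]

    def scroll_far(self, direction):
        """Scroll one full timeline width; direction is -1 (back) or +1 (forward)."""
        self.scroll(direction * self.units_visible)

    def zoom_in(self):
        """Double-click: one level finer, stopping at Second."""
        i = ZOOM_LEVELS.index(self.zoom)
        self.zoom = ZOOM_LEVELS[max(i - 1, 0)]

    def zoom_out(self):
        """Shift-double-click: one level coarser, stopping at Day."""
        i = ZOOM_LEVELS.index(self.zoom)
        self.zoom = ZOOM_LEVELS[min(i + 1, len(ZOOM_LEVELS) - 1)]

    def click(self, t):
        """Clicking or dragging in the Timeline is what moves the CSTI."""
        self.current_time = t
```

Only `click` changes `current_time`, which mirrors the statement that scrolling and zooming have no effect on the current DMS time.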


Timeline Views


By default, preferably the Timeline is in Day View, as illustrated in FIG. 13.

    • The CSTI preferably is in the middle of the current day, which is highlighted.


An Hour view is illustrated in FIG. 14.

    • The CSTI preferably is in the middle of the current hour, which is highlighted.


A Minute view is illustrated in FIG. 15.

    • The CSTI preferably is located on a vertical line or tickmark.


A Seconds view is illustrated in FIG. 16.

    • The CSTI preferably is located on one of the bright green second indicators.


      Operations in History Mode
    • In one embodiment, clicking the Protect, Replicate, or Switchover buttons in History mode brings up a dialog allowing the user to either switch to Realtime mode and continue with the Wizard, or cancel.


Thus, according to a feature of the present invention, the DMS management interface provides a “time control” that allows the user to specify a time (either single point or range) in the past. When the time control is set to a single point, then a familiar hierarchical display of data appears exactly as it was in reality at that moment in the past. Preferably, this display includes both the structure of the hierarchy (e.g., in a file system data source, which directories and files existed; in a database data source, the identity of the databases and their associated binary and log files), as well as the contents of the data objects themselves (i.e., what was in the files and databases). Although the embodiment has been described and illustrated using a linear timeline as the time control, this is not a limitation of the present invention. In the alternative, the time control may take other forms, such as the popup calendar described above, or a radial time dial, a calendar, or a search specification dialog. Regardless of the physical format, the timeline preferably includes the described zoom feature for “zooming out” to view and set the time at a coarser granularity (e.g. day) and “zooming in” to view/set at a finer granularity (e.g. seconds).


Another form of time control is a search specification dialog. According to the invention, a search specification dialog allows the user to specify a time range as well as a point in time. This time range is then used as a display filter, so that only data objects meeting specified criteria are included in the display set. The display set may be presented as a flat list, or in the form of a filtered view of the data hierarchy (i.e. the volume/directory/file trees). The criteria can include, but are not limited to, creation date, modification date, deletion date, size, presence of a specified string within the data object, existence of the data object, and the like. The following are examples of how a user may navigate temporally by using the search specification dialog:

    • All files which had a size >1 MB at some point in December 2004 and were deleted sometime in January 2005
    • All files which contained the string “Valerie Flame” and were deleted during September 2003
    • All files existing in the directory user1:C:\foo\bar at any point between 10/15/05 and 10/31/05
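The use of a time range as a display filter may be illustrated by the following minimal Python sketch. The record types `Version` and `DataObject`, the helper functions, and the half-open range convention are all assumptions made for this example; the sketch does not represent the actual search implementation of the DMS.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, List, Optional, Tuple


@dataclass
class Version:
    written_at: datetime
    size: int            # bytes
    content: str = ""


@dataclass
class DataObject:
    path: str
    versions: List[Version] = field(default_factory=list)  # sorted by written_at
    deleted_at: Optional[datetime] = None


Range = Tuple[datetime, datetime]  # assumed half-open: [start, end)


def versions_live_in(obj: DataObject, rng: Range) -> List[Version]:
    """Versions that were the current version at some instant within rng."""
    start, end = rng
    live = []
    for i, v in enumerate(obj.versions):
        # A version stays current until the next write or until deletion.
        if i + 1 < len(obj.versions):
            until = obj.versions[i + 1].written_at
        else:
            until = obj.deleted_at or datetime.max
        if v.written_at < end and until > start:
            live.append(v)
    return live


def deleted_in(obj: DataObject, rng: Range) -> bool:
    return obj.deleted_at is not None and rng[0] <= obj.deleted_at < rng[1]


def search(objects: List[DataObject], live_range: Range,
           predicate: Callable[[Version], bool],
           deleted_range: Optional[Range] = None) -> List[str]:
    """Return paths of objects that satisfy the criteria (the display set)."""
    hits = []
    for obj in objects:
        if deleted_range is not None and not deleted_in(obj, deleted_range):
            continue
        if any(predicate(v) for v in versions_live_in(obj, live_range)):
            hits.append(obj.path)
    return hits
```

Under these assumptions, the first example above corresponds to calling `search` with a December 2004 range, a January 2005 deletion range, and the predicate `lambda v: v.size > 1024 * 1024`. The string criterion of the second example would use a predicate such as `lambda v: "Valerie Flame" in v.content`.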


      As described and illustrated above, the interface also displays timestamps associated with various data objects—e.g., the time a file was last modified, or the time an event occurred (an event may be a data consistency event, a software upgrade, virus detector alert, or the like). Whenever such a timestamp is onscreen, the user can right-click to pop up a menu and select Go To this date and time to enter History mode and navigate to that time.


As has been described, the time navigation capabilities described above comprise a “history mode” in which the “virtual time” is different from the actual real-time. The management interface also provides an active view of the DMS data as it changes in real-time, typically driven by changes to primary storage. This is the Realtime mode. In this mode, the management interface becomes aware of relevant changes to the DMS at periodic intervals. As used herein, preferably “relevant” means changes to the DMS that are in the current display set, the set of data currently visible on the screen. To give a concrete example, if the interface is displaying the contents of directory D1 and file D1/F1 is created on primary storage, then F1 will automatically appear in the display. The management interface may become aware of changes by polling the DMS and asking for data that has changed since a last update, or by having the DMS notify the interface of changes since a last notification. Regardless of which method is used, polling or notification, the set of changes must then be compared with the current display set to determine if any of the changes are within the display set. Whenever changes to the display set are detected, the display is updated automatically, and the current time indicator is updated to reflect the time of last updating.
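The polling variant of the Realtime mode may be sketched as follows, again for illustration only. The class `RealtimeView`, the callable `fetch_changes` (standing in for a query to the DMS), and the representation of a change as a (timestamp, path) pair are hypothetical. The sketch handles only object creation within a single displayed directory.

```python
from typing import Callable, Iterable, List, Optional, Set, Tuple

Change = Tuple[float, str]  # (timestamp, path of the changed object); assumed shape


class RealtimeView:
    def __init__(self, directory: str,
                 fetch_changes: Callable[[float], Iterable[Change]]):
        self.directory = directory.rstrip("/")
        self.fetch_changes = fetch_changes   # hypothetical DMS query: changes after a time
        self.entries: Set[str] = set()       # the current display set
        self.cursor = 0.0                    # timestamp of the last change seen
        self.indicator_time: Optional[float] = None  # time of last display update

    def in_display_set(self, path: str) -> bool:
        """True if path is a direct child of the displayed directory."""
        parent, _, name = path.rpartition("/")
        return parent == self.directory and name != ""

    def poll(self, now: float) -> List[str]:
        """Fetch changes since the last poll and apply the relevant ones."""
        changes = list(self.fetch_changes(self.cursor))
        relevant = [path for _, path in changes if self.in_display_set(path)]
        if changes:
            self.cursor = max(ts for ts, _ in changes)
        if relevant:
            # Only changes within the display set update the display
            # and the current time indicator.
            self.entries.update(relevant)
            self.indicator_time = now
        return relevant
```

A notification-based variant would differ only in how the list of changes arrives; the comparison against the display set would be the same.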


Policy Management


The management interface allows the administrator to specify and manage policy including, without limitation, how long data is retained in the system, preferably by means of a policy engine that is sensitive to “time-based” or “temporal” constraints. The policy engine enables the administrators to define temporal-based policies such as the following:

    • Retain all versions of all files/emails containing the word “Flame” forever
    • Purge all versions of all files/emails containing the word “Flame” from both the DMS and primary storage after 1 week unless also stamped w/keyword “Keep”, in which case retain forever in the DMS
    • Retain all versions of all Excel files in the New York office for 1 month, then monthly snapshots for the next 11 months, then purge all older versions


The above examples are merely representative, and other types of policies may be implemented, such as a policy that enables DMS to cause primary storage to move, copy or delete files, e.g., to migrate aging data to cheaper storage, or to delete it from primary storage altogether. More complex policy rules may be defined whereby one or more conditions trigger changes in the values for another set of attributes (e.g., for all documents containing the string “Flame,” set the attribute “Confidentiality” to “High”). As can be seen, preferably a given policy is asserted by means of policy attributes. Attributes are grouped and managed by means of Policy Profiles, which can be thought of as documents containing groups of attributes that may be applied to certain classes of objects. The administrator asserts policy by associating Policy Profiles with data objects at any level in the hierarchy.


A model for evaluating policy attributes is summarized as follows and illustrated in FIG. 17.

    • Current Profile. Profiles can be assigned to DMS data objects at the level of data sources and above—i.e. data sources, repositories, regions and the root of the Repositories tree. If the object does not have a profile directly assigned to it, the profile assigned to the closest parent up the Repositories tree is taken as the current profile.
    • Per-attribute Override. Attributes in the current profile can be overridden by setting the attribute's value on the object itself.
    • Per-attribute Lock. Overriding by profile specification or attribute setting farther down the tree may be defeated by means of a per-profile, per-attribute Lock. When an attribute in a profile is locked, its value is enforced on all objects within the scope of the container with the profile; that is, its value may not be overridden either by a profile assignment or by a per-attribute override setting further down the tree.
    • Per-profile Block Inheritance. Inheritance from parents further up the tree may be defeated by setting the per-profile attribute Blocks inherited locks to True.
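One possible reading of the evaluation model summarized above is sketched below in Python. The names `Profile`, `Node` and `effective` are hypothetical. The sketch further assumes that a lock also applies to the container that carries the locking profile, that the highest unblocked lock takes precedence, and that a per-attribute override applies only to the object on which it is set; none of these details is stated definitively in the description.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Set


@dataclass
class Profile:
    attrs: Dict[str, Any]
    locked: Set[str] = field(default_factory=set)   # per-attribute locks
    blocks_inherited_locks: bool = False             # per-profile block


@dataclass
class Node:
    name: str
    parent: Optional["Node"] = None
    profile: Optional[Profile] = None
    overrides: Dict[str, Any] = field(default_factory=dict)


def effective(node: Node, attr: str) -> Any:
    """Return the effective value of attr for node."""
    # 1. Locks: walk up the tree; the highest lock that is not blocked wins.
    locked_value, found = None, False
    n: Optional[Node] = node
    while n is not None:
        p = n.profile
        if p is not None:
            if attr in p.locked:
                locked_value, found = p.attrs.get(attr), True
            if p.blocks_inherited_locks:
                break  # locks from parents further up are defeated
        n = n.parent
    if found:
        return locked_value
    # 2. Per-attribute override set on the object itself.
    if attr in node.overrides:
        return node.overrides[attr]
    # 3. Current profile: the closest profile assigned up the tree.
    n = node
    while n is not None:
        if n.profile is not None:
            return n.profile.attrs.get(attr)
        n = n.parent
    return None
```

For example, if the root profile locks Long Term History, an override of that attribute on a data source further down the tree is ignored unless an intermediate profile sets Blocks inherited locks to True.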


      Retention Policy


The management interface enables the administrator to control how long data is retained in the DMS, preferably based on one of three policy attributes:

    • Continuous History. How long continuous changes are retained.
    • Long Term History. How long consolidated versions are retained.
    • Long Term Interval. Frequency of consolidated versions.


The relationship between these attributes is shown in the diagram of FIG. 18. To interpret the diagram, visualize that the versions flow steadily from right to left as time goes by. As versions flow from a first continuous period into a second long term period, they are consolidated at the specified intervals; as the consolidated versions flow out of the long term period, preferably they are purged from the DMS. Note that if the most-recent version of the data set flows out of the long term period, preferably the entire data set is purged.
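The interaction of the three retention attributes may be illustrated by the following Python sketch. The function `retained` and its parameters are hypothetical. The sketch assumes that the long term period begins where the continuous period ends, and that consolidation keeps the latest version within each Long Term Interval; the description does not specify which version within an interval survives, so that choice is an assumption.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def retained(versions: List[datetime], now: datetime,
             continuous_history: timedelta,
             long_term_history: timedelta,
             long_term_interval: timedelta) -> List[datetime]:
    """Return the version timestamps still retained at time `now`."""
    versions = sorted(versions)
    if not versions:
        return []
    continuous_start = now - continuous_history
    long_term_start = continuous_start - long_term_history  # assumed layout
    # If even the most recent version has left the long term period,
    # the entire data set is purged.
    if versions[-1] < long_term_start:
        return []
    continuous: List[datetime] = []
    consolidated: Dict[int, datetime] = {}
    for v in versions:
        if v >= continuous_start:
            continuous.append(v)  # continuous period: every version is kept
        elif v >= long_term_start:
            # Long term period: one version per interval. Versions are visited
            # in ascending order, so the latest in each interval is kept.
            bucket = (v - long_term_start) // long_term_interval
            consolidated[bucket] = v
        # Versions older than long_term_start are purged.
    return sorted(consolidated.values()) + continuous
```

With a Continuous History of 7 days, a Long Term History of 28 days and a Long Term Interval of 7 days, every version from the last week is kept, at most one version per week is kept for the four weeks before that, and anything older is dropped.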



FIG. 19 illustrates a more specific example of an illustrative retention/pruning model for a given set of versions (e.g., V1-V15) over a given set of times T0-T6. The number of versions and times are merely illustrative:

    • T0. Version 1 (V1) is created when the data source is initially protected. A Continuous/Longterm Boundary (separating the two color segments in the diagram) is determined based on the value of Continuous History.
    • T1. V2-V6 have been generated.
    • T2. Time (V1 timestamp+Continuous History). V7-V10 have been generated. V1 becomes the first consolidated version.
    • T3. V11-V13 have been generated.
    • T4. Time (V1 timestamp+2*Continuous History). V14 has been generated. V6 becomes the second consolidated version.
    • T5. V15 has been generated.
    • T6. Time (V1 timestamp+3*Continuous History). V10 becomes the third consolidated version.



FIG. 20 illustrates how the enterprise primary storage and DMS can be modeled (by the management interface display) as a pair of logical and physical system models to facilitate policy management. These models are displayable on the GUI. The system can be viewed from two perspectives: a logical level of data and policy, and a physical level of compute nodes and storage. The upper portion of FIG. 20 illustrates the logical model, whereas the physical model forms the bottom portion. In particular, the logical user model allows the administrator to manage data and policy. To this end, a primary container object is the Repository, which contains data objects called Data Sets. As has been described above, Policy can be asserted at the level of Universe, Region, Repository or Data Set. This model is presented in the management interface by selecting a Data Tab. The physical user model allows the administrator to manage the physical components that run the DMS software. A primary container object is the Cluster, which contains two types of objects: computational units called Nodes, and storage units, including logical Volumes and Volume Groups, as well as the physical Disk Arrays themselves. This model is presented in the management interface by selecting a Network Tab.


In an illustrated embodiment, the management interface console is implemented as a gateway, a standalone machine, or some combination thereof. Generalizing, any of the described functions are implemented by a processor and associated program code. An apparatus may be specially constructed for the required purposes, or it may comprise a general purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including optical disks, CD-ROMs, and magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), magnetic or optical cards, or any type of media suitable for storing electronic instructions, and each coupled to a computer system bus.


While the above written description also describes a particular order of operations performed by certain embodiments of the invention, it should be understood that such order is exemplary, as alternative embodiments may perform the operations in a different order, combine certain operations, overlap certain operations, or the like. References in the specification to a given embodiment indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic.


Having described our invention, what we now claim is as follows.

Claims
  • 1. A data management system comprising: a distributed data storage having a data manager executing thereon; an application-aware real-time event data stream received by the data manager; wherein the received application-aware real-time event data stream comprises event-identifying data, metadata, and data changes; wherein the data manager preserves a data history of the data source in a distributed data object store on the distributed data storage; a management gateway comprising a processor; code executed on the processor to generate a graphical user interface having a display element that enables specification of a past time, wherein the graphical user interface includes a display mode object having first and second positions; code executed on the processor and responsive to the specification of the past time to generate a display of a representation of the distributed data object store, or a given portion thereof, as it existed at the past time; code executed on the processor and responsive to the specification of the past time when the display mode object is in the first position to generate a display of a representation of the distributed data object store, or a given portion thereof, as it existed at the past time; and code executed on the processor when the display mode object is in the second position to generate a display of the representation of the distributed data object store, or a given portion thereof, at a current point-in-time, wherein, when the display mode object is in the second position, the representation of the distributed data object store is updated in real-time as given data is received in the distributed data object store.
  • 2. The data management system as described in claim 1 wherein the display element is one of: a linear timeline, a radial time dial, a calendar, a version link, and a search specification dialog.
  • 3. The data management system as described in claim 1 wherein the display of the representation of the distributed data object store comprises a structure of a distributed data object store hierarchy and contents of one or more data objects at given locations in the distributed data object store hierarchy.
  • 4. The data management system as described in claim 1 further including code executed on the processor to enable the past time to be specified at a first, coarse granularity or at a second, fine granularity.
  • 5. The data management system as described in claim 1 further including code executed on the processor for displaying links to a set of one or more versions that existed at the past time and, responsive to selection of a link, for taking a further action with respect to a selected version.
  • 6. The data management system as described in claim 1 wherein the representation of the distributed data object store also includes contents of data objects in the distributed data object store.
  • 7. The data management system as described in claim 1 further including code executed on the processor to enable specification of a retention policy.
Parent Case Info

This application is based on and claims priority to provisional application Ser. No. 60/624,358, filed Nov. 2, 2004.

Related Publications (1)
Number Date Country
20060101384 A1 May 2006 US
Provisional Applications (1)
Number Date Country
60624358 Nov 2004 US