The present invention is illustrated by way of example in the accompanying drawings. The drawings should be understood as illustrative rather than limiting.
a is a graphical representation of the memory task interface as a floating overlay in an embodiment.
b is a graphical representation of the memory task interface as a composited web page toolbar element in an embodiment.
c is a graphical representation of the fact collection task basic overlay interface in an embodiment.
d is a graphical representation of the fact collection task advanced overlay interface in an embodiment.
a is a graphical representation of the memory task interface as a floating overlay on a representative web page in an embodiment.
b is a graphical representation of the fact collection task overlay on a representative web page in an embodiment.
A system, method and apparatus are provided for a digital life server. This may allow for long-term management and preservation of valuable digital information by individuals and small groups. Various embodiments generally relate to secure long-term storage, navigation, and processing of digital information in consumer devices and networks. More particularly, some embodiments relate to systems and techniques for storage, archival preservation, and historical navigation of digital information that is aggregated, created, organized, used, and distributed by individuals over very long periods of time, typically over a lifetime.
Additionally, some embodiments further relate to systems and methods of semantic processing and annotation of transactional information flows initiated by an individual between a web browser or other application and arbitrary information services such as those commonly found on the web. Also, some embodiments relate to systems and methods for automated organization of personally valuable digital information according to temporal, topical, or other contextual relationships using metadata either specified or synthetically derived using analytic or inference techniques. Moreover, various embodiments relate to systems and methods for privacy, trust management, and protection of an individual's or small group's accumulated data, and mechanisms for the controlled sharing of information created and/or accumulated by them in conjunction with distributed storage services and applications.
The specific embodiments described in this document represent exemplary instances of the present invention, and are illustrative in nature rather than restrictive. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the invention. It will be apparent, however, to one skilled in the art that the invention can be practiced without these specific details. In other instances, structures and devices are shown in block diagram form in order to avoid obscuring the invention.
Reference in the specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Features and aspects of various embodiments may be integrated into other embodiments, and embodiments illustrated in this document may be implemented without all of the features or aspects illustrated or described.
In one embodiment, a software-based distributed system for secure preservation and organization of digital information is presented. Such information may be created or collected by individuals or small groups over long periods of time, both through the use of conventional personal computer systems, applications, and devices, and through online web browsing activities. Services provided for direct use by the individual or small group may be configured in a set of server-based software components, and are referred to in this embodiment as the digital life server (DLS). DLS functionality in this embodiment includes:
In an embodiment, the DLS provides information processing and long-term storage services configured in the form of a network-attached server appliance for deployment in an IP network. The DLS network-attached server appliance may be realized as a separate physical device including a processor, dedicated disk storage, memory, network connections, and potentially including other features. Similarly, the DLS may be implemented as part of a system or device, rather than a separate device.
Alternatively, in other embodiments, the DLS network-attached server appliance may be realized in a purely software-based implementation. Thus, the DLS may be implemented as a virtualized server, or “soft appliance,” using hypervisor technologies, such as VMWare™ or XEN™ on a shared computer. Whether the DLS server appliance is embodied in the form of a dedicated physical device or as a virtual server using shared physical computing resources, it may provide the same functionality as a network-attached server.
Whether implemented as hardware, software, or some combination of the two, multiple individuals sharing an IP network may share a single instance of a DLS server appliance. In such cases, each individual may be a unique security principal and their view of the system is through their personal account. Certain DLS services can also be accessed from locations external to the home network using standard internet protocols from a remotely connected web browser application running on any type of device.
The system, in some embodiments, additionally provides two sets of distributed services called operational support services (OSS), and online preservation services (OPS), which may be operated remotely from the DLS. Distributed OSS systems in such embodiments provide functionality including:
There can be multiple OSS service instances and they can be operated by a variety of different commercial operators/providers.
The OPS in such embodiments provides the distributed services interface to online mass storage for preservation of DLS users' data sets. In such embodiments, the DLS is typically operated with a configured OPS service. Distributed OPS systems provide functionality including:
In more detail, communications between the personal computer user's web browser and the DLS are identified as part of a secure communications channel. In the case of this illustration, the secure channel is provided for communication between the user's web browser and a web application running on the DLS. Techniques for securing this channel may utilize standard transport security protocols for communication over IP networks. In an embodiment, the channel is secured using the IETF TLS transport layer security protocol. More specifically, the IETF TLS protocol provides for a mutual authentication option that allows the communication endpoints using TLS to engage in a set of transactions using identity certificates as proofs of their authenticity. Secure communications between the user's web browser and the DLS may employ the mutual authentication option, and utilize DLS-generated identity certificates for the trust proof. The related certificates are created by the DLS' Trust Manager for the user to install in their browser using its normal mechanisms. Certificates are provided for each web browser/device combination that the user chooses to configure. The certificates are requested by the user and provided to them using a web-based administrative interface provided by the DLS in conjunction with its support for user account administration.
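By way of illustration only, the mutual authentication option described above might be configured on the server side as in the following minimal Python sketch. The function name and its file-path parameters are hypothetical and do not appear in this specification. The sketch also assumes X.509 certificates, because that is what the Python standard library `ssl` module supports, whereas some embodiments use SPKI certificates.

```python
import ssl


def make_mutual_tls_server_context(cert_file=None, key_file=None, client_ca_file=None):
    """Build a server-side TLS context that requires a client certificate.

    All three parameters are hypothetical file paths used for illustration.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Mutual authentication: the handshake fails unless the browser
    # presents a certificate that the server can verify.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file and key_file:
        # The server's own identity certificate and private key.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if client_ca_file:
        # Certificates against which client certificates are verified;
        # in the described embodiment these would be DLS-generated.
        ctx.load_verify_locations(cafile=client_ca_file)
    return ctx


# No files are loaded here, so this only demonstrates the policy setting.
ctx = make_mutual_tls_server_context()
```

A context built this way would then be used to wrap the listening socket of the web application; that step is omitted because it depends on the deployment.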
The personal computer user's web browser is also used for communication with a wide variety of web sites over the internet. Browsing activities with third-party web sites are conducted in the normal manner utilizing the protocols and possible transport layer security mechanisms selected by the third party web site.
With further reference to
Typical for the secure communications channels of the embodiments described in
In more detail, in these embodiments, each DLS is responsible for generating the certificates required for communications with it. This means that certificates required for mutual authentication with one DLS will only work with the specific DLS, and authorization required for communication with another DLS must be explicitly granted in the form of another certificate for that particular DLS. Consequently, this approach establishes a web-of-trust topology designed on the principle that each DLS only trusts itself, and must therefore authorize each party that desires to speak to it explicitly. This style of trust management topology is consistent with the expected use of the DLS as a system for individuals or small groups, and comparatively small numbers of parties who may be authorized for shared access to a particular DLS using the TSS subsystem (which itself is only configured for operation between DLS server appliances).
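The principle that each DLS trusts only itself can be sketched as follows. This is a simplified illustration under stated assumptions, not the described implementation: the class `DlsIssuer` and its methods are hypothetical, and a keyed hash (HMAC) stands in for the public-key signatures that a real identity certificate would carry.

```python
import hashlib
import hmac
import secrets


class DlsIssuer:
    """Hypothetical stand-in for a DLS that issues and checks its own credentials."""

    def __init__(self, name):
        self.name = name
        self._secret = secrets.token_bytes(32)  # known only to this DLS

    def _tag(self, party):
        message = f"{self.name}|{party}".encode()
        return hmac.new(self._secret, message, hashlib.sha256).hexdigest()

    def issue(self, party):
        """Return a credential authorizing `party` to talk to this DLS only."""
        return {"issuer": self.name, "party": party, "tag": self._tag(party)}

    def accepts(self, credential):
        """Accept only credentials that this same DLS issued."""
        if credential["issuer"] != self.name:
            return False
        return hmac.compare_digest(self._tag(credential["party"]), credential["tag"])


home = DlsIssuer("dls-home")
office = DlsIssuer("dls-office")
cred = home.issue("laptop-browser")
```

In this sketch `home.accepts(cred)` succeeds while `office.accepts(cred)` fails, so access to the second DLS would require a separate credential issued by that DLS.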
In some embodiments, DLS-generated identity certificates are based on IETF specification RFC 2693, the Simple Public Key Infrastructure (SPKI) standard. It is possible, using delegation as specified in the IETF SPKI standard, to construct trust chains that can effectively model hierarchical trust topologies, as well as web-of-trust approaches. It is therefore also possible to configure DLS systems in a manner that allows for hierarchical trust management, thus allowing for alternative hierarchical trust management designs that could employ one identity certificate for mutual authentication with multiple DLS systems. This is a feature of the SPKI standard that could be configured for the DLS system. Regardless, in such embodiments, trust management for establishment of secure communications channels with the DLS utilizes identity certificates without delegation in order to directly model the relationship between the DLS and each authorized partner device.
In all cases, the DLS is connected to the internet through use of a separate router/gateway system. More specifically, the DLS may require functionality typically provided by a router/gateway system or device for network configuration information including its IP address assignment and configuration of DNS address entries, typically using the IETF DHCP protocol. It is equally acceptable to incorporate the router/gateway function(s) in a server appliance with the DLS, although in such a case, the router and broadband gateway function(s) still remain functionally distinct.
Overview of the DLS Architecture
DLS 400 includes a web interaction framework 410, context manager 415, semantic processing framework 430, history subsystem 435, format conversion framework 440, web applications framework 445, interoperability services and proxies framework 420, collections subsystem 450, identity and security subsystem 455, object storage subsystem 465, preservation subsystem 470, trust management subsystem 460 and an operating system 480.
DLS 500 includes a variety of supporting systems and subsystems, and represents one embodiment of a DLS such as DLS 400. DLS 500 includes web presentation/interaction framework 502 and context manager 504. Further included are databases 506, facts presentation framework 508, query/reasoning framework 510, collection/annotation framework 514, policy/preferences framework 512 and history engine 516. Also included are content extraction/filter framework 518, object structure analyzer 520 and format conversion 522. Additionally, proxy framework and cache storage manager 524 and protocol class policies 540 are included. Moreover, IAS service agents 526, such as HTTP agent 528, SOAP/WS* agent 530 and RSS/ATOM agent 532 are included along with TSS services agents 534 such as NFS4 agent 536 and CAS services agents 542 such as CIFS/SMB agent 544, WebDAV agent 546, CalDAV agent 548 and POP/SMTP agent 550.
Further included are web applications framework 538, collections manager 552, identity and authorization manager 558, security policy system 560, versioning and integrity services 562, and object storage subsystem 564. Additionally, trust manager 556, private storage manager 568 and logical storage volume partition management 572 are included. Moreover, LDAP service 554 and preservation engine and policies 566 are included. Also, virtual machine operating system 570, base operating system 574 and boot loader 576 are included.
Further embodiments and features are described and illustrated in
DLS 818 includes personal semantic interface 821, web access and overlay interface 824, file server interface 827, mail/calendar interface 830 and RSS/Atom interface 833. User device 803 interacts with DLS 818 through user client 806 (interacting with interfaces 821 and 824), through file system 809 (interacting with interface 827), through mail/calendar client 812 (interacting with interface 830) and through RSS/Atom feeds client 815 (interacting through interface 833).
Collections manager 857 interacts with context manager 842, HTTP/SOAP/IAS service proxy 845, CIFS/WebDAV/CAS service proxy 848, POP/SMTP proxy 851 and with RSS/Atom proxy 854. Web interface 839, HTTP/SOAP proxy 845, CIFS/WebDAV proxy 848, POP/SMTP proxy 851 and RSS/Atom proxy 854 also interact with the interfaces (821, 824, 827, 830 and 833) and thus with user device 803. Collections manager 857 also interacts with HTTP/SOAP proxy 866, RSS/Atom proxy 869, POP/SMTP proxy 872 and NFS/TSS proxy 875 to interact with the internet, for example. Moreover, collections manager 857 interacts with history engine 878, trust manager 881, identity, versioning and integrity services 884 and object storage 887 to interact with a preservation policy engine 890. Preservation policy engine 890 interacts with an outside data source (e.g. the internet). Furthermore, engine 890 and collections manager 857 both interact with local cache storage 893. Collections manager 857 also interacts with semantic processing framework 863 and thus with semantic processing databases 860. Also, web interface 839 may interact with layout and styles database 836.
DLS 920 includes personal semantic interface 925, web access and overlay interface 930 and file server interface 935. User device 905 interacts with DLS 920 through user client 910 (interacting with interfaces 925 and 930) for web access to data provided through the trusted sharing configuration, and through file system 915 (interacting with interface 935) for file-based access to data provided through the trusted sharing configuration. Collections manager 960 interacts with context manager 945 to configure applications and presentation attributes for web access to the trusted sharing data, and with HTTP/SOAP/IAS service proxy 950 and CIFS/WebDAV/CAS service proxy 955. Web interface 940 and HTTP/SOAP/IAS service proxy 950 provide the processing path for web-based data transfers, applications, and transactions, and the CIFS/WebDAV/CAS service proxy 955 provides access to file-based data through interaction with the interfaces (925, 930 and 935) and thus with user device 905. Collections manager 960 also interacts with NFS/TSS service/proxy 965 for access to data provided through the trusted sharing configuration between DLS 920 and DLS 995 or generally between two or more separate DLS instances. The NFS/TSS service/proxy interacts with internet 990 and similar peer services provided by DLS 995 for access to data provided through the trusted sharing configuration. Moreover, collections manager 960 interacts with history engine 970 to resolve and/or update references to data involved in the trusted sharing configuration, trust manager 975 to retrieve and/or verify authorization credentials presented, identity, versioning and integrity services 980 and object storage 985 to access or store various data exchanged through the trusted sharing configuration.
This section provides an overview of each of the functional areas identified in the embodiment of
Operating System Runtime and Low-Level Storage Overview
Operating system runtime and low level storage provides functionality typical of most modern operating systems, including process scheduling, multi-threading, driver-based abstraction of hardware resources, uniform namespaces, discretionary access controls, and so on. The DLS architecture imposes additional functional requirements in some embodiments, as follows:
In addition to the above requirements, the DLS architecture, in some embodiments, specifies that the operating system and runtime be provided with a secure boot loader. The secure boot loader function must minimally ensure that: the code for the bootloader itself, and all subsequently loaded modules of the operating system and runtime up to the point that it has successfully completed loading can be verified 1) for integrity, and 2) for consistency with a specified configuration of the system.
The requirements of the DLS operating system runtime and low-level storage functional area in these embodiments can be satisfied with a variety of contemporary technologies, including for example recent versions of the Linux operating system such as SE Linux or user mode Linux (UML), kernel technologies such as the LVM3 storage management library, or TrustedBSD. Secure boot functionality may be provided by different combinations of firmware and hardware, and may be satisfied using technology specified by standards setting bodies such as the Trusted Computing Group (TCG). The secure boot requirement need not be included as an integral part of the DLS implementation if the DLS is realized as a virtualized server, or “soft appliance,” using hypervisor technology on a hardware and operating system host platform with equivalent functionality, or in other embodiments where it is not deemed necessary.
DLS requirements for strong isolation of processing and storage; secure boot authentication of the code included in the operating system and runtime; and labeled security with MAC enforcement, stem from the need to provide strong security for parties who rely on the DLS for long-term management of their data. Privacy sensitive functions performed by DLS such as creation and management of secret cryptographic keys used in identity and authorization routines, and symmetric keys used to protect the individual's long term data, must have high-assurance guarantees against compromise. Similarly, DLS support for Trusted Sharing Services requires that exposure of any shared data objects be strictly isolated to the authorized storage areas and authorized security principals.
Object Storage Subsystem Overview
The object storage subsystem, as shown in the embodiment of
Functional APIs exported by the object storage subsystem are used by the preservation subsystem, history subsystem, and trust management subsystem.
Trust Management Subsystem Overview
Referring to
The trust manager effectively encapsulates implementation of all cryptographic processing, and centralizes all certificate and credential operations in such embodiments. The benefits of this approach are several-fold:
As illustrated in
Identity and Security Subsystem
Referring again to
As illustrated in more detail in
Preservation Subsystem Overview
The preservation subsystem is illustrated in the embodiment of
The preservation subsystem utilizes services provided by the history subsystem to manage the archive status of all data storage objects in the DLS, and to periodically update the remote archives for each security principal's account on the designated OPS. The preservation and history subsystems in combination allow the DLS to be treated effectively as a large virtual object cache—thus allowing users of the DLS to effectively treat it as a network attached storage disk of unlimited capacity. The preservation subsystem further ensures that all volatile per-security principal data and account state is preserved along with data storage objects and content, in order to minimize data loss in the event of a catastrophic failure of the DLS.
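The treatment of the DLS as a large virtual object cache can be illustrated with the following minimal sketch. It rests on assumptions that are not in this specification: the class name `VirtualObjectCache` is hypothetical, a plain dictionary stands in for the remote OPS archive, eviction is least-recently-used, and every write is archived immediately rather than periodically.

```python
from collections import OrderedDict


class VirtualObjectCache:
    """Bounded local store backed by an archive that holds every object."""

    def __init__(self, capacity, archive):
        self.capacity = capacity      # number of objects kept locally (>= 1)
        self.archive = archive        # dict standing in for the remote OPS archive
        self.local = OrderedDict()    # local objects, least recently used first

    def put(self, key, data):
        self.archive[key] = data      # preserve before the object can be evicted
        self.local[key] = data
        self.local.move_to_end(key)
        self._evict()

    def get(self, key):
        if key not in self.local:
            # Cache miss: restore the object from the archive.
            self.local[key] = self.archive[key]
        self.local.move_to_end(key)
        self._evict()
        return self.local[key]

    def _evict(self):
        # Drop least recently used objects; they remain in the archive.
        while len(self.local) > self.capacity:
            self.local.popitem(last=False)


archive = {}
cache = VirtualObjectCache(capacity=2, archive=archive)
for name in ("a", "b", "c"):
    cache.put(name, name.upper().encode())
restored = cache.get("a")  # "a" was evicted locally and is fetched back
```

From the caller's point of view every object remains retrievable even though only two are held locally at any time, which is the sense in which the local store appears to have unlimited capacity.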
Collections Subsystem Overview
The collections subsystem is central to the embodiments of the DLS architecture of
The collections subsystem effectively integrates all functions for creation, annotation, references and referential integrity, manipulation, and management of all data storage objects in the DLS system. The collections object exports APIs for use by the interoperability services and proxies framework; the history subsystem; the preservation subsystem; the semantic processing framework; and through its API, can be invoked through the web applications framework.
PSID 1002 is a persistent system identifier, such as a key for a data entry. Descriptive label 1004 provides a label, and may include a label substructure 1032 with a human readable name 1034 and a description 1036, for example. Owner 1006 provides an indication of a user associated with the data structure 1000, and may include a credential 1038 (e.g. a digital certificate). Authorizations list 1008 provides an indication of what users have various access levels for structure 1000 and may include a list of credentials 1040, for example.
Creation timestamp 1010 provides a creation record of time and date, while modified timestamp 1012 provides a time and date of last modification. Access field 1014 provides an indication of when the structure 1000 was last accessed, and may include access record(s) 1042 for further information about a last access or chain of accesses. Privacy label 1016 provides a privacy substructure 1064, including a privacy classification 1066 and declassification policy 1068, for example. Version field 1018 provides revision status data for structure 1000 and may include change records 1044 for audit purposes, for example. Preservation label 1020 indicates how the data of structure 1000 should be maintained and may include retention policy 1046.
Context metadata 1022 provides context attributes 1048 as needed. Services index 1024 provides file systems data 1050, which may include substructure 1070, with a file system data entry 1072 and a file system index entry 1074. Services index 1024 may also provide mail folder 1052, calendar folder 1054 and feeds folder 1056. Mail folder 1052 may provide mail substructure 1076 which may include mail folder type data 1078 and mail folder index 1080. Similarly, calendar folder 1054 may include calendar structure 1082 which may further include calendar folder data type 1084 and calendar folder index 1086. Likewise, feeds folder 1056 may include feeds data structure 1088, which may include feeds folder type data 1090 and feeds folder index 1092.
MetaQuery index 1026 provides access to metaquery object 1058. Categories index 1028 may provide access to category object 1060. Similarly, data object index 1030 may provide access to zero or more DLS Data Storage Objects (DSO), for example. Object 1062 may incorporate by reference or by value data from file system structure 1070, mail folder structure 1076, calendar structure 1082, feeds structure 1088, metaquery object 1058, and category object 1060.
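For illustration, a subset of the fields of data structure 1000 might be represented as in the following Python sketch. The class names, field names, and default values are hypothetical. The services index, metaquery index, categories index, access records, and version records are omitted for brevity.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


def _now():
    return datetime.now(timezone.utc)


@dataclass
class DescriptiveLabel:
    name: str                 # human readable name
    description: str = ""


@dataclass
class PrivacyLabel:
    classification: str = "private"          # assumed default
    declassification_policy: str = "never"   # assumed default


@dataclass
class Collection:
    psid: str                                 # persistent system identifier
    label: DescriptiveLabel
    owner: str                                # stands in for a credential
    authorizations: List[str] = field(default_factory=list)
    created: datetime = field(default_factory=_now)
    modified: datetime = field(default_factory=_now)
    privacy: PrivacyLabel = field(default_factory=PrivacyLabel)
    retention_policy: str = "keep-forever"    # assumed default
    context_attributes: Dict[str, str] = field(default_factory=dict)
    data_objects: List[str] = field(default_factory=list)  # zero or more DSO ids

    def add_object(self, dso_id):
        """Reference a data storage object and update the modified timestamp."""
        self.data_objects.append(dso_id)
        self.modified = _now()


photos = Collection(
    psid="psid-0001",
    label=DescriptiveLabel("Family photos", "Scanned prints"),
    owner="credential:alice",
)
photos.add_object("dso-42")
```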
Interoperability Services and Proxies Framework Overview
The interoperability services and proxies framework provides essential services for all network communications between the DLS and external systems in the embodiment of
A detailed description of policies and services provided by the interoperability services and proxies framework is provided in a subsequent section of this specification.
Referring again to
Web Applications Framework
The web applications framework, as illustrated in the embodiment of
Format Conversion Framework
The format conversion framework, as illustrated in the embodiment of
As illustrated in
Additionally, policies can be registered with the framework similar to conversion “plug-ins.” Policies are used by the framework to control availability of certain conversion options and/or to provide convenient aliases for certain preferred conversion settings. For example, a policy could be registered to alias a certain conversion target datatype as “default,” or “preferred” as a way of directing calling applications to select a certain format from among possibly many options. As in the case of most DLS features and policies, the operational support services (OSS) provides the policies and conversion plug-ins to the format conversion framework as part of its update and maintenance services, thus assuring that the conversions are validated and known to be trusted for correct behavior. The format conversion framework is used by the collections manager, the semantic processing framework, and is available to the web applications framework.
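The registration of conversion plug-ins and alias policies described above can be sketched as follows. The class `FormatConversionFramework`, its method names, and the sample converter are hypothetical and serve only to show how an alias such as “preferred” might resolve to a concrete target datatype.

```python
import html


class FormatConversionFramework:
    """Hypothetical registry of conversion plug-ins and alias policies."""

    def __init__(self):
        self._converters = {}  # (source_type, target_type) -> callable
        self._aliases = {}     # (source_type, alias) -> target_type

    def register_converter(self, source, target, func):
        self._converters[(source, target)] = func

    def register_policy_alias(self, source, alias, target):
        self._aliases[(source, alias)] = target

    def targets_for(self, source):
        """List the concrete target types available for a source type."""
        return sorted(t for (s, t) in self._converters if s == source)

    def convert(self, source, target, data):
        # Resolve an alias such as "preferred" to a concrete target type.
        target = self._aliases.get((source, target), target)
        return self._converters[(source, target)](data)


framework = FormatConversionFramework()
framework.register_converter(
    "text/plain", "text/html", lambda s: "<pre>" + html.escape(s) + "</pre>"
)
framework.register_policy_alias("text/plain", "preferred", "text/html")
converted = framework.convert("text/plain", "preferred", "a < b")
```

A calling application in this sketch names only the alias, so the policy can later be pointed at a different target format without changing the caller.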
Reference specifically to
Descriptive label 1106 provides a label, and may include a label substructure 1138 with a human readable name 1140 and a description 1142, for example. Creation timestamp 1108 provides a creation record of time and date, while modified timestamp 1110 provides a time and date of last modification. Access field 1112 provides an indication of when the structure 1100 was last accessed, and may include access record(s) 1130 for further information about a last access or chain of accesses. Privacy label 1114 provides a privacy substructure 1144, including a privacy classification 1146 and declassification policy 1148, for example. Version field 1116 provides revision status data for structure 1100 and may include change records 1130 for audit purposes, for example. Governance label 1118 may be included, and may also include governance substructure 1170, including authority 1172, policy 1174 and expiration timestamp 1176. Preservation label 1120 indicates how the data of structure 1100 should be maintained and may include retention policy 1132.
Also included may be authority metadata 1122 which may include Dublin Core 1134 (for example). Additionally, user metadata 1124 may be included and may include markup tags 1136. Datastream index 1126 may point to datastream 1150 (and additional datastreams). Datastream 1150 may include an identifier 1152, name 1154, version 1156, configuration label 1158, MIME type 1160, creation timestamp 1162, modification timestamp 1164, integrity MAC 1166 and content 1168. Content 1168 may include URI 1180 and content stream 1182, for example, as part of a content substructure 1178.
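As one possible illustration, the integrity MAC of a datastream might be computed and checked as in the following sketch. The specification does not name a MAC algorithm or state which fields the MAC covers, so the use of HMAC-SHA256 over the identifier, version, MIME type and content is an assumption, as are the function names `seal` and `verify`.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass
class Datastream:
    identifier: str
    name: str
    version: int
    mime_type: str
    content: bytes
    integrity_mac: str = ""


def compute_mac(key, stream):
    """Assumed construction: HMAC-SHA256 over selected header fields and content."""
    header = f"{stream.identifier}|{stream.version}|{stream.mime_type}|".encode()
    return hmac.new(key, header + stream.content, hashlib.sha256).hexdigest()


def seal(key, stream):
    """Record the MAC on the datastream."""
    stream.integrity_mac = compute_mac(key, stream)
    return stream


def verify(key, stream):
    """Return True only if the stored MAC matches the current fields."""
    return hmac.compare_digest(stream.integrity_mac, compute_mac(key, stream))


key = b"example-key-for-illustration-only"
ds = seal(key, Datastream("ds-1", "letter", 1, "text/plain", b"Dear Ann"))
intact = verify(key, ds)
ds.content = b"Dear Bob"   # simulate tampering or corruption
tampered_ok = verify(key, ds)
```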
Semantic Processing Framework Overview
In some embodiments, the semantic processing framework of the embodiments of
The semantic processing framework utilizes the W3C suite of RDF standards for representation and processing of semantic metadata; ontology data utilizes the W3C suite of OWL standards. Databases supporting RDF, OWL ontology data, and taxonomy data are logically encapsulated by the semantic processing framework.
Context Manager
Referring to
In more detail, the context manager API provides functions for creating and manipulating context attributes, and for creating named contexts including a selected set of attributes. The resulting contexts can then be enumerated, or selected and set using the API. Context attributes allow the web application framework components that dynamically create the views in each pane to select the matching collections and data storage objects, set application default parameters, and configure presentation characteristics such as graphical representations, selective presentation of certain data fields, fonts, and/or color settings using CSS templates identified by the context attributes.
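The API functions described above (creating attributes, creating named contexts, enumerating contexts, and selecting one) can be sketched as follows. All class, method, attribute, and context names are hypothetical, and attributes are reduced to simple name and value pairs for brevity.

```python
class ContextManager:
    """Hypothetical sketch of the context manager API surface."""

    def __init__(self):
        self._attributes = {}   # attribute name -> value
        self._contexts = {}     # context name -> list of attribute names
        self._current = None

    def create_attribute(self, name, value):
        self._attributes[name] = value

    def create_context(self, context_name, attribute_names):
        missing = [a for a in attribute_names if a not in self._attributes]
        if missing:
            raise KeyError(f"unknown attributes: {missing}")
        self._contexts[context_name] = list(attribute_names)

    def enumerate_contexts(self):
        return sorted(self._contexts)

    def set_current(self, context_name):
        if context_name not in self._contexts:
            raise KeyError(context_name)
        self._current = context_name

    def current_settings(self):
        """Resolve the selected context to its attribute values."""
        return {n: self._attributes[n] for n in self._contexts[self._current]}


manager = ContextManager()
manager.create_attribute("stylesheet", "compact.css")
manager.create_attribute("default-view", "timeline")
manager.create_attribute("font", "serif")
manager.create_context("example-context", ["stylesheet", "default-view"])
manager.create_context("reading-context", ["stylesheet", "font"])
manager.set_current("example-context")
settings = manager.current_settings()
```

A component that builds a view would read `settings` to pick its stylesheet and default parameters. Note that the "stylesheet" attribute is shared by both contexts.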
The set of contexts supported by the DLS is configurable through an administrative interface. In an embodiment, the DLS includes five pre-defined context “classes,” provided as defaults for individuals to create and organize DLS collections and data, facts, and history related activities and interests. The pre-configured contexts are named:
The names of the default contexts are designed to elicit an intuitive response from the individual when they first encounter the system. More technically, the pre-configured contexts also incorporate default attribute and configuration settings. The individual is able to reconfigure the names or default settings for any of the pre-defined contexts, and can create additional contexts.
Unlike common techniques such as application-specific configuration files or name/value pair attribute database registries, context attributes are modeled as named W3C RDF statements and resources. RDF statements are based on a subject, predicate, and object triple structure defined by the RDF standard. Modeling context attributes as RDF statements allows contexts to express directed graph relationships based on the predicates specified in the attributes, or nodes.
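The subject, predicate, and object structure can be illustrated without an RDF library by modeling each statement as a plain tuple, as in the following sketch. The prefixed names (`ctx:`, `ex:`, `col:`, `css:`, `attr:`) are invented for the example and are not defined by this specification or by the RDF standard.

```python
from collections import namedtuple

Statement = namedtuple("Statement", ["subject", "predicate", "object"])

# Hypothetical context attributes expressed as triples.
statements = [
    Statement("ctx:example", "ex:usesStylesheet", "css:compact"),
    Statement("ctx:example", "ex:showsCollection", "col:photos"),
    Statement("col:photos", "ex:sortedBy", "attr:creationTime"),
]


def objects_of(subject, predicate, graph):
    """Return the objects of all statements matching subject and predicate."""
    return [s.object for s in graph if s.subject == subject and s.predicate == predicate]


def reachable(start, graph):
    """Follow statements outward from `start` and return every node reached."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for s in graph:
            if s.subject == node and s.object not in seen:
                seen.add(s.object)
                stack.append(s.object)
    return seen


stylesheets = objects_of("ctx:example", "ex:usesStylesheet", statements)
related = reachable("ctx:example", statements)
```

Because the object of one statement can be the subject of another, the traversal reaches `attr:creationTime` from the context even though no single statement links them, which is the directed graph behavior that a flat name/value registry does not provide.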
Context attributes are able to model concepts, such as application semantics involving dynamic behaviors based on changing time or role-based relationships. This potentially has particular importance in the DLS since time-based and role-based relationships may play a critical role in so many aspects of subsystem and related presentation behaviors in the DLS. Context attributes can be used to model concepts, such as relationships based on time, thereby supporting adaptive presentation of the underlying data types when their temporal relationships change using time-based navigation controls in this application. Presentation behaviors based on changes to conceptual relationships can also affect presentation settings across multiple components simultaneously, as for example in the case of the personal semantic workspace, thus again illustrating the system-wide benefits of the context manager in configuring and coordinating behaviors throughout the DLS system.
The context manager API additionally provides a function to “forget” a named context. The “forget” function does not immediately delete the context and its attributes, but instead marks them as available for possible deletion at a future point in time. This is important, since attributes may be reused in multiple contexts, and as long as they are referenced by any context they cannot be deleted. The context manager implements a mix of reference counting and a periodic sweep of the attributes to identify unreferenced attributes that can be garbage collected.
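A minimal sketch of the deferred deletion behavior follows. The class `ContextStore` and its methods are hypothetical, and the sketch recomputes reference counts during the sweep rather than maintaining them incrementally, which is a simplification of the mixed approach described above.

```python
class ContextStore:
    """Hypothetical store showing mark-on-forget and a later sweep."""

    def __init__(self):
        self.attributes = {}     # attribute name -> value
        self.contexts = {}       # context name -> set of attribute names
        self.forgotten = set()   # contexts marked for possible deletion

    def add_context(self, name, attrs):
        self.attributes.update(attrs)
        self.contexts[name] = set(attrs)

    def forget(self, name):
        # Mark only; nothing is deleted until sweep() runs.
        self.forgotten.add(name)

    def ref_counts(self):
        """Count references to each attribute from contexts not marked forgotten."""
        counts = {a: 0 for a in self.attributes}
        for name, attrs in self.contexts.items():
            if name in self.forgotten:
                continue
            for a in attrs:
                counts[a] += 1
        return counts

    def sweep(self):
        """Delete forgotten contexts, then any attribute nothing references."""
        for name in self.forgotten:
            self.contexts.pop(name, None)
        self.forgotten.clear()
        dead = [a for a, n in self.ref_counts().items() if n == 0]
        for a in dead:
            del self.attributes[a]
        return sorted(dead)


store = ContextStore()
store.add_context("first", {"font": "serif", "color": "blue"})
store.add_context("second", {"font": "serif"})
store.forget("first")
still_present = "color" in store.attributes   # forget() alone deletes nothing
collected = store.sweep()
```

After the sweep, the "color" attribute is collected because only the forgotten context referenced it, while "font" survives because the second context still uses it.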
Web Interaction Framework
The web interaction framework, as illustrated in the embodiment of
The web interaction framework APIs allow callers, such as web application framework programs, to set and configure different presentation styles and features, and to specify delivery of certain browser logic such as embedded script code (e.g. Javascript) or active content (e.g. Java applets, or Microsoft ActiveX™ controls) depending on the characteristics of the browser. By separating the specification of the script code or active content logic required by the DLS application or subsystem from the decision about which implementation to inject in the web page stream for the particular client browser, the web interaction framework allows the DLS to evolve support for a wider variety of different client browsers without having to couple the update and maintenance cycle to parts of the application that are unaffected by presentation.
Similar to techniques used in the context manager, the web interaction framework uses RDF and RDFS to model its configuration data, thus allowing specification of semantic relationships between parts of the configuration. This for example allows configurations to express relationships affecting selection of certain script code libraries by the web interaction framework based on relationships such as whether a certain script library should be included based on requirements of another library or the characteristics of the client browser. This functionality allows the web interaction framework to provide late-binding and adaptive results which are not as easily achieved using conventional techniques based on configuration files, name/value attribute registries, or programming language-specific techniques that merge application and presentation logic in a single structure, such as Java Server Pages. These benefits are particularly important to design of the DLS in support of enabling its operation with the broadest possible variety of current and future browsers and web-enabled devices, while minimizing effects of this support to parts of the system uninvolved in interfacing directly with presentation and interaction concerns arising from those various devices and their capabilities.
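The selection of script libraries based on inter-library requirements and client browser characteristics can be sketched as follows. For brevity the configuration is held in a plain dictionary rather than in RDF and RDFS as described above, and the library names and the feature name are hypothetical.

```python
# Hypothetical configuration: each library lists the libraries it requires and,
# optionally, a browser characteristic that must be present for it to be sent.
LIBRARIES = {
    "overlay-ui": {"requires": ["dom-utils"], "needs_feature": None},
    "dom-utils": {"requires": ["legacy-shim"], "needs_feature": None},
    "legacy-shim": {"requires": [], "needs_feature": "lacks_modern_dom"},
}


def select_libraries(requested, browser_features, libraries=LIBRARIES):
    """Return the libraries to deliver, dependencies first, for one browser."""
    ordered, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        lib = libraries[name]
        feature = lib["needs_feature"]
        if feature is not None and feature not in browser_features:
            return  # this browser does not need the library
        for dep in lib["requires"]:
            visit(dep)
        ordered.append(name)

    for name in requested:
        visit(name)
    return ordered


for_old_browser = select_libraries(["overlay-ui"], {"lacks_modern_dom"})
for_new_browser = select_libraries(["overlay-ui"], set())
```

The application requests only "overlay-ui" in both cases. The decision about whether the shim is delivered is made at request time from the configuration and the browser characteristics, which illustrates the late-binding behavior described above.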
Digital Life Server (DLS) Appliance Runtime and Security Architecture
The DLS isolates the collective set of data associated with each individual's or small group's account in its own separate logical volume, and executes all account-specific processing in its own separate virtual machine instance in some embodiments. The logical volume structure establishes the root of the file system and the associated namespace uniquely with the account. Security policies enforced by the base runtime system ensure that users cannot navigate or manipulate the disk file system or structures outside of their volume namespace unless they can present the required cryptographic authorization credentials. The virtual machine architecture effectively ensures that all process execution on behalf of each account occurs in an isolated process space within the DLS appliance.
Identity certificates for each account/virtual machine instance provide the basis for authenticating it as a unique security principal, including the base runtime instance of the DLS itself. Authorization credentials created for each security principal function effectively as capabilities, and are used to grant/obtain access to various processing and resources throughout the system. Each principal may have potentially many authorization credentials depending on the access they require to various services and resources.
Both hierarchical and web-of-trust (non-hierarchically rooted) trust chains can be constructed using the certificates and credentials mechanism; a hierarchical trust chain is a trust chain with a single root. In an embodiment, identity certificates and authorization credentials are constructed and processed according to IETF RFC 2693, the Simple Public Key Infrastructure (SPKI). Alternative approaches are possible and likely, particularly on certain interoperability boundaries of the system where, for example, it may be necessary to also support X.509v3 certificates as required by existing or legacy third-party services. In the interest of increased protection from traceability and inadvertent exposure of private data, the DLS also supports secret key certificates on system boundaries where interoperation with other supporting services can be arranged. Due to the possible and likely need to support multiple representations, each set of trust chains is managed as a separate class of trust domain.
In addition to accounts for each individual, the system may be configured to support role-based accounts in support of shared access to certain authorized resources within a single DLS, or between multiple distributed DLS systems using Trusted Sharing Services. For example, different groups each with their own DLS instance may desire to establish shared access to photos and video content, academic materials, diaries and/or blogs, and so on. In such cases, a role-based account created or assigned as part of the basic DLS system for the purposes of sharing group-authorized resources executes and is responsible for managing the associated resources. Role-based accounts effectively function like per-individual accounts and are primarily distinguished by their associated certificates and authorization credentials.
In an embodiment, role-based accounts are configured with contexts to facilitate logical mappings between authorizations and information organized within a given context by the individual. The effect of this configuration technique provides a direct means for the individual to comprehend how associating a given collection with a given context may affect access to information in the collection. Continuing with the previous example of group-authorized sharing using role-based authorization, an embodiment provides five pre-configured default contexts in conjunction with the context manager, one of which is the public context. The public-authorized sharing role is configured as a public authorization on the public context. Consequently, when the individual creates a collection in the public context, the collection is automatically configured with the authorizations required for parties in the public-authorized sharing role.
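The mapping from contexts to authorizations can be illustrated with a short Python sketch. The class names, the role identifier, and the second context shown here are assumptions made for illustration; the specification names only the public context among the five defaults. The point of the sketch is that a collection takes its authorizations from the context in which it is created.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Context:
    """A context and the sharing roles authorized on it (hypothetical model)."""
    name: str
    authorized_roles: frozenset


@dataclass
class Collection:
    """A collection configured from the context in which it is created."""
    name: str
    context: Context
    authorized_roles: frozenset = field(init=False)

    def __post_init__(self):
        # Authorizations are derived from the context, not set by hand.
        self.authorized_roles = self.context.authorized_roles


# "public" is named in the text; "private" is an assumed example context.
PUBLIC = Context("public", frozenset({"public-authorized-sharing"}))
PRIVATE = Context("private", frozenset())

trip_photos = Collection("trip photos", PUBLIC)
tax_records = Collection("tax records", PRIVATE)
print(sorted(trip_photos.authorized_roles))
print(sorted(tax_records.authorized_roles))
```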
Processing within each per-individual account virtual machine instance utilizes a mix of discretionary access control (DAC) and mandatory access control (MAC) policies for process-local operations. MAC policies in the virtual machine are configured as part of the distributed DLS policy provided by the OSS platform configuration policy service and are primarily used to enforce the principle of least privilege for loadable third-party modules such as content filters, format converters, and other loadable framework modules. The base runtime system spawns the per-account virtual machine instances and provides shared services such as access to storage resources, shared cryptographic routines or hardware, and user authentication to the DLS system itself.
The base runtime primarily relies on MAC policies and enforcement. Security labels maintained on resources in the base runtime system in conjunction with MAC enforcement help to isolate sensitive administrative applications and services in the base runtime from manipulation that could subvert correct operation of the DLS appliance either inadvertently or through malicious intent. Authorizations required for normal operation of the virtual machine instances and their access to storage, authentication, and communication services in the base runtime are configured as part of the standard policies in an embodiment. Per-account virtual machines are spawned upon successful authentication by the base runtime of an individual for whom an account exists on the system. Communication between the base runtime and the spawned virtual machine typically utilizes inter-process communication techniques (e.g. native RPC, RMI, CORBA, or SOAP) thereafter until the virtual machine is terminated.
Trust management services typically run as a separate process in each distinct account process space, including the base runtime and each per-account virtual machine for account-specific key generation, key management, signing, certificate management, credential generation, and associated prover/verifier functions. Trust management services additionally implement and enforce equivalence class mappings between trust domains, if such mappings are required for cross-domain authorization, as might occur when combined access is required to services that rely on different identity certificate representations and trust roots. Execution of trust management services as a separate local process in each virtual machine instance and in the base runtime, as opposed to a system-wide shared process, helps to enforce strong isolation between different accounts and their respective privacy requirements. This is particularly important for ensuring protection of cryptographic materials used in both public key and secret key certificates, and zero-knowledge cryptographic proofs.
Keys are generated and managed by an instance of the trust manager running in each account, and manipulated strictly in that particular account's process address space and associated Private Storage area, thus significantly reducing the potential for inadvertent exposure of secret keys and improving the basis for utilizing strong key separation for different tasks. Credentials are generated and processed by the Trust Management services in each part of the system (account virtual machines or the base runtime) in conjunction with requests for service or access to resources owned by those respective parts of the system. The resulting functionality ensures that processing within an instance of the DLS occurs with the same principled privacy and isolation as if each individual's account were executing on its own dedicated, secure processor.
DLS Device Initialization and Trust Establishment
In many embodiments, certificates for each DLS account/virtual machine instance provide the basis for authenticating the account as a legitimate security principal, including the base runtime instance of the DLS itself. The ability for these security principals to mutually prove and verify trust in each other utilizes a bi-directional set of trust chains that effectively allow the base runtime instance to verify its trust in the account/virtual machine instances for which it generates certificates, and conversely, for the account/virtual machine instances to verify their trust in the base runtime instance, each using their separate and respective instances of the trust manager as previously described. This functionality is potentially of particular importance in support of the ability to move or regenerate DLS account/virtual machine instances on a different DLS, such as when a device needs to be replaced, or if the account virtual machine is moved to or from a virtualized server, or “soft appliance” implementation as previously described.
The runtime system must additionally be able to prove its trust in the DLS device itself. An important consideration in establishing this relationship is that it must be robust in the event of DLS device replacement scenarios. For example, using services of the preservation engine as subsequently described in this specification, it should be possible to retire the original DLS device where a set of accounts were established, install a new DLS device, and restore all of the data from the individual's or small group's OPS without a requirement for participation by a third party, and without any potential for key or identity compromise due to key escrow exposure.
In an embodiment, the DLS device identity is provided by use of a removable secure chip card consistent in design and functionality with the standard SIM Card commonly used in GSM and 3GPP mobile applications. The DLS device provides support for two cards for purposes of redundancy, which are configured effectively as duplicates and integrated using standard connectors on the device main circuit board.
Initialization of the original DLS device and the base runtime utilizes the certificates and identifiers provided in the SIM Card to create the trust relationship between the device and the base runtime. No data is written to the SIM Card, as its purpose is solely for verification of the trust relationship between the running DLS software and the device in which it is installed. Thereafter, all other trust relationships between the base runtime and subsequent creation of security principals occur as described in the previous paragraphs. Once the initialization is complete, the owner should remove one of the SIM Cards and retain it in a physically secure manner. Completely removing both SIM Cards renders the device effectively unusable.
Future replacement of the DLS device hardware simply requires installation of at least one of the original SIM Cards in the new device, after which recovery utilities can be used to connect to the owner's selected OPS for restoration of their DLS account data using functionality of the preservation engine as described later in this specification.
Advanced Trust and Account Management Semantics
Secure operation of the DLS system in most embodiments is designed to ensure strong privacy for every individual and their interests, with the ability to encode sufficient policy representations for dealing with normal desires and events encountered over the course of a lifetime. Security thus must be able to cope with replication, delegation of authorizations, and separation of certain portions of data sets according to events such as when a person achieves legal adult status, an individual marries and joins or shares certain portions of their data set with their spouse, if the individual and their spouse divorce and some data assets need to be divided or replicated between them, when an individual joins or later separates from a group or business relationship, and disposition of the collected data assets when the individual dies.
Similarly, sharing of some portions of an individual's data set with their relationships must also be accommodated with predictable and natural semantics corresponding to those relationships. The trust management credentials, object storage, virtual machine process isolation mechanisms, and Trusted Sharing Services of the DLS system are designed in their collective operation to provide technically-enforced distinctions for individuals and small groups between what they perceive and can trust as private, versus what is trusted and shared, versus the public internet. As such, semantics related to privacy and trust must be as close to intuitive as possible based on flexible technically-specified policies that reflect commonsense reasoning, accompanied by strong cryptographic protections and enforcement. As previously described, the DLS per-individual and role-based account mechanisms and trust management functions provide the basis for this functionality.
DLS Network Connection and Services Interfaces
In many embodiments, services provided by the DLS are deployed in the form of a server appliance for use in an IP protocol-based network. Since the IP protocol can be effectively deployed in a standard manner over a wide variety of underlying datalink and media access protocol disciplines, there is effectively no constraint on how the DLS is connected to the network, including various wired or wireless technologies such as the IEEE 802.11x protocol suite, ultra-wideband (UWB), and so on.
The DLS, in such embodiments, is configured as a set of proxies between traffic internal to the network and outbound network connections to the external internet, typically through an existing broadband router or gateway device. Basic proxy configuration of the DLS and the router/gateway utilizes techniques commonly understood by practitioners skilled in the art, and may include automated configuration using services defined by the UPnP™ protocol suite, and/or manual configuration using a web-based administrative interface. In the case of manual configuration, the default administrative interface is provided on a default IP address configured on the DLS for access from a locally-connected computer. Once connected to the network, the DLS utilizes DHCP services typically provided by the gateway to configure standard IP addresses and network services such as DNS, and is able to access other common IP services such as Dynamic DNS, NTP time services, and so on.
Individuals interact with DLS-provided services through three classes of protocol interfaces:
Organization of service agents into protocol classes allows them to be managed both in terms of particular DLS security guidelines for a particular set of service agents, and for policy-based configuration management by the supporting Operational Support Services (OSS). The CAS, TSS, and IAS protocol classes logically organize sets of functionality that are integrated within the DLS architecture for distinct purposes, including:
Protocol class policies are defined and distributed by the OSS and may be periodically configured and updated through interaction with the DLS' associated OSS provider. Service agents present their associated protocol class policies, and possibly additional service agent-specific policies to the proxy framework.
The common application services protocols class, or CAS, supports DLS access from applications on personal computers or devices primarily from within the home network. Functionality supported by these protocols enables access to DLS services typically in a client-server mode using widely deployed standard application protocols. DLS services supported by the CAS class of protocol service agents include:
The purpose of the CAS protocols class is to assemble the set of application interoperability interfaces required for connection and data transfer with the DLS. The set of supported CAS protocols is exemplary and non-limiting with respect to the possible supported interoperability protocol suites, since selection is a matter of commercial relevance and may be adapted over time according to market conditions. In particular, additional protocol service agents required for interoperation with DLS-provided file services; electronic mail and messaging services; calendar services; and/or naming, discovery, and directory services can be defined and implemented consistent with the service agent architecture, and managed using protocol class policies. Protocol class policies are used to define configuration settings and restrictions such as authorizations for administrative configuration and access, protocol-specific parameter settings, parameters for secure channel configuration, and so on.
The trusted sharing services protocols class, or TSS, supports inter-DLS data object sharing between authorized security principals. Trusted sharing services allow authorized DLS security principals to export access from one or more logical storage collections to a set of authorized security principals associated with a different DLS. As an example, a group may choose to publish a collection of digital photos and notes from a trip or event to other related groups who also have DLS systems. The TSS service agent(s) in each of the DLS systems implement the protocol operations required for authenticating and connecting the authorized set of storage collections, and also manage any associated protocol-specific state associated with the resulting communication session(s). TSS service agents allow each of the shared storage collections to appear effectively local on the distributed set of connected and authorized DLS systems.
Services provided by the DLS proxy framework are used by the TSS service agent to request caching services according to the TSS agent's protocol class policy, thus allowing the agent to adjust quality of service for improved liveness and response for access to the exported collection storage and data objects. Security authorizations on the collections and their data objects are interpreted and enforced by other DLS subsystems such as the trust manager. More specifically, the TSS service agent is responsible for protocol security associated with authenticating, connecting, and maintaining the communications session(s) between the authorized DLS systems; all other authorization and access decisions on the shared collection storage and data objects are enforced in a completely uniform and consistent manner by the responsible DLS trust and security subsystems.
The internet application services protocol class, or IAS, supports web-based service access with the DLS. Protocols supported by IAS service agents are utilized by the DLS for a variety of functions, including:
IAS service agents are fully consistent with the DLS service agent architecture and protocol class policy mechanisms. Protocols supported by IAS service agents include:
The purpose of the IAS protocols class is to assemble the set of interfaces required for web-based interaction with the DLS. In particular, the set of supported IAS protocols for Web Services interoperation based on W3C SOAP and WSDL is exemplary and non-limiting with respect to the possible supported Web Services application protocol suites, since selection is a matter of commercial relevance and may be adapted according to market conditions. Services provided by the DLS proxy framework are used by the IAS service agent(s) to request caching services according to the IAS agent's protocol class policy, thus allowing the agent to adjust quality of service for improved liveness and response for access to various data objects.
DLS Collections
The DLS, in some embodiments, enables users to create, store, and organize information from their existing personal computers, devices, and familiar productivity and multimedia applications using the common application services (CAS) service agents. The DLS additionally operates as a transparent network proxy using the internet application service (IAS) protocol agents. IAS protocols and proxy functions allow the DLS' services to be invoked as part of the normal web browsing experience through any modern browser, inline with any web page, without additional software. Services invoked as part of the browsing experience make it possible for users to reference, save, annotate, link, and aggregate information encountered as part of their browsing experience according to their own self-defined organization. Regardless of whether the resulting organization is created through the CAS or IAS service agents, the DLS internally organizes and stores the resulting data as objects and references in organizations called collections objects.
Collections can be navigated topically or historically, expanded or annotated with additional information from potentially multiple applications, and selectively shared according to defined trust relationships with other DLS security principals (either individuals or role-based accounts, both within the same home network or in a different location).
Collections are created and managed by the DLS collections manager. Collections logically resemble the familiar concept of file system directories, but offer significant additional innovations beyond these previous structures, as follows:
Collections are the native structure for organizing all data objects managed and processed by the DLS in some embodiments, and must therefore behave polymorphically in the presence of different access methods and applications. While a variety of technologies such as network operating systems and file servers have previously developed techniques for mapping different types of file services on a common native file store (e.g. the ability to support NFS and CIFS file systems and semantics over a common storage model with fidelity for naming and native ACLs), the challenges addressed by the collection manager are broader.
Since the DLS is designed to function as a system for managing all data objects for an individual or small group over long periods of time, the collections manager must deal with file system semantics, but also semantics of other data objects including mail and messaging applications, syndication feeds, calendar and event data, and various application data. As identified in item one of the above list, the collections object supports mappings from multiple services and applications into its uniform object-based data model. As identified in item two, these mappings provide referential integrity between the different service and application views of the data, or semantics, and the internal representations of the data objects as managed by the collection.
In more detail, the collection manager provides an interface to CAS and IAS service agents that allows collection objects to be accessed using semantics and datatypes that are native to the specific type of service agent. The interface allows service agents to create and maintain a consistent view of the data they create and manage, including their security settings and metadata. The collections manager uses the collections object services index field and its array of data and index structure objects to record and manipulate this information. The collections manager provides an API that allows service agents to create a data and index object for their specific agent type; one instance of the data and index object is created for each CAS or IAS agent type that uses the collection. The data section of the object is used to record information about the types of data structures that the service agent requires for its operation, and the index section records the service agent-specific per-object data for each data object (e.g. “file”) that the agent creates or manipulates in the collection. The data and index fields are polymorphic data types that service agent specializes to map the specific semantics and data that it manipulates. The collections object can also provide additional functionality for native DLS applications to create and manage per-application views on a collection in a manner similar to support provided for per-service type views and semantics provided to CAS and IAS service agents.
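A minimal Python sketch of the per-agent data and index arrangement described above follows. The class names, field names, and agent type strings are hypothetical; the sketch only illustrates that one data and index object exists per agent type, and that each agent records its own per-object data without seeing the data of other agents.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class DataAndIndex:
    """One data and index object per service agent type (hypothetical model)."""
    agent_type: str
    # Data section: the kinds of structures the agent needs for its operation.
    data: Dict[str, Any] = field(default_factory=dict)
    # Index section: agent-specific data for each object the agent manipulates.
    index: Dict[str, Dict[str, Any]] = field(default_factory=dict)


@dataclass
class CollectionObject:
    name: str
    services_index: Dict[str, DataAndIndex] = field(default_factory=dict)

    def view_for(self, agent_type: str) -> DataAndIndex:
        """Return the agent's data and index object, creating it on first use."""
        if agent_type not in self.services_index:
            self.services_index[agent_type] = DataAndIndex(agent_type)
        return self.services_index[agent_type]


recipes = CollectionObject("recipes")

# A file-service agent records file-style semantics for an object.
files = recipes.view_for("cas:file")
files.data["structures"] = ["path", "mode"]
files.index["menu.doc"] = {"path": "/recipes/menu.doc", "mode": "rw"}

# A mail agent records mail-style semantics for a different object.
mail = recipes.view_for("cas:mail")
mail.data["structures"] = ["folder", "flags"]
mail.index["msg-0001"] = {"folder": "Inbox", "flags": ["seen"]}

print(sorted(recipes.services_index))
```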
While each service agent only sees its view of the data it has stored in the collection, different views on the collection object provided by DLS native semantic processing applications can access and dynamically organize the data in more flexible ways. As identified in item three in the above list, the collections object supports contextual tagging. Contextual tagging allows an individual or other DLS automated semantic processing applications to associate terms and predicates with the collection that can enhance processing of its data. For example, an individual who is a chef might create a recipes collection to manage all their mail with various friends or groups on topics related to food, recipe documents, web clippings, references to culinary web sites, and so on.
The collections manager is capable of uniformly representing all of these different data types as part of the recipes collection, and with contextual tagging, the individual can additionally associate terms and/or predicates that allow DLS applications to perform semantic processing and customized presentation of the related data. Continuing with the example, the individual might create predicate tags associating the term “healthy” with preferred types of food groups that appeal to them. Later, the DLS contextual search application can use the “recipes collection” contextual predicate tags to optimize its results so that a search on the phrase “healthy recipes” returns results prioritized to the individual's preferred food group associations with the term “healthy,” as opposed to an unprioritized list of results simply matching the basic search terms. Unlike search techniques based on lexical analysis, the DLS contextual search integrates predicate tags provided by the individual that capture personal preferences, interpretations, and knowledge as part of the search process.
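The prioritization in the "healthy recipes" example can be sketched in Python as follows. This is an illustrative simplification under stated assumptions: documents are reduced to keyword sets, predicate tags are reduced to a mapping from a term to the individual's associated terms, and the document names and food groups are invented for the example. It is not presented as the DLS contextual search algorithm.

```python
def contextual_search(documents, query_terms, predicate_tags):
    """Return names of documents matching all query terms, ordered by how many
    of the individual's associated terms each document also contains."""
    preferred = set()
    for term in query_terms:
        preferred |= predicate_tags.get(term, set())
    matches = [name for name, words in documents.items()
               if set(query_terms) <= words]
    # Higher overlap with preferred terms first; name breaks ties predictably.
    return sorted(matches, key=lambda n: (-len(documents[n] & preferred), n))


# Hypothetical collection contents and predicate tags.
documents = {
    "kale salad": {"healthy", "recipes", "leafy-greens"},
    "low-fat brownies": {"healthy", "recipes", "dessert"},
    "lentil soup": {"healthy", "recipes", "legumes", "leafy-greens"},
    "garden notes": {"leafy-greens"},
}
predicate_tags = {"healthy": {"leafy-greens", "legumes"}}

print(contextual_search(documents, ["healthy", "recipes"], predicate_tags))
```

Without the predicate tags, all three matching documents would rank equally; with them, the documents that reflect the individual's own meaning of "healthy" are listed first.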
MetaQuery support is a related feature to contextual tagging that allows the collection object to index and retain pre-configured queries on various local and distributed data sources. For example, semantic processing features of the DLS can be configured to support an optional inference engine and knowledge bases. MetaQuery support allows the collection object to maintain a set of logically related topical queries with the collection data for the purpose of synthetically generating data results in the collection using services from the classifier/inference framework. MetaQuery objects are self-typed objects managed by the Collection Object and referenced through its MetaQuery Index field. The W3C SPARQL language is one example of a MetaQuery object type. Contextual tags may be referenced in MetaQuery objects, and thus returning to our example, the individual might add a MetaQuery object that uses the “healthy” context tag that looks for results satisfying a query for all of the foods that the user has associated with the predicate “healthy,” and which are referenced in recipes published in the last month by a list of their preferred web syndication feeds. The results of the MetaQuery are dynamically generated and may be viewed when the collection object is accessed through the DLS' personal semantic workspace.
As indicated in item 4, the collections object natively supports versioning, thus allowing for changes to the collection to be tracked and, if desired, reverted to a previous version. The collections manager uses services of the DLS' versioning and integrity services to snapshot and maintain versioning information.
Item five in the above list relates to collections object support for security functions provided by the DLS. Privacy labels on each collections object allow the individual to set controls on the collection that restrict its visibility strictly to security principals holding the correct credentials. Returning to our previous example, the individual may set a privacy label indicating that only security principals holding a valid credential for the privacy label “Friends Read Only” granted by the local DLS' trust manager may access their “recipes collection.” The individual may then share their collection using the Trusted Sharing Services, and when access is attempted by another party, that person will only be able to view the “recipes collection” if they have a valid credential with the correct “Friends Read Only” privacy label. The collection object privacy label additionally supports specification of a “Declassification Policy.” The declassification policy allows the individual to indicate the conditions under which the privacy label should become nonrestrictive. For example, the individual may indicate that the label expires at a given time in the future.
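The interaction between a privacy label and a time-based declassification policy can be sketched in Python as shown below. The class and function names are hypothetical, and credentials are reduced to a set of label names for brevity; in the system described, credentials would be cryptographically verified by the trust manager rather than compared as strings.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class PrivacyLabel:
    name: str
    # Declassification policy (assumed form): the label stops restricting
    # access at this time. None means the label never expires.
    expires_at: Optional[datetime] = None

    def is_restrictive(self, now: datetime) -> bool:
        return self.expires_at is None or now < self.expires_at


def may_access(label: PrivacyLabel, credential_labels: set, now: datetime) -> bool:
    """Allow access if the label has been declassified, or if the requesting
    principal holds a credential for the label."""
    if not label.is_restrictive(now):
        return True
    return label.name in credential_labels


label = PrivacyLabel("Friends Read Only", expires_at=datetime(2030, 1, 1))
before = datetime(2029, 6, 1)
after = datetime(2030, 6, 1)

print(may_access(label, {"Friends Read Only"}, before))
print(may_access(label, set(), before))
print(may_access(label, set(), after))
```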
Item six in the above list relates to preservation services provided by the DLS and collection object support for retention policies. The retention policy allows the individual to stipulate the frequency at which they want the collection to be written to the configured preservation system, how many versions should be retained in the system at any time, and the duration of the history that the system should preserve. Returning again to the example, the individual may find it acceptable to retain only the current version of any of the data in the collection for a period of two years, and to record it to the OPS no more frequently than once per month. This may be adequate if the data in the collection is relatively stable and the individual has no interest in navigating back over their accumulated history in the recipes collection for more than two previous years. Alternatively, the individual may frequently update their collection and have a particular interest in wanting to be able to navigate back through their history for as long as they've been accumulating it. In this second example, the retention policy could be set to maintain two versions of all updates on data in the collection, to record the collection to the OPS no less than once per week, and to maintain the history indefinitely.
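The two retention policies in the example can be written down as data, as in the following Python sketch. The field names are hypothetical, a month is approximated as 30 days and two years as 730 days, and the write frequency is simplified to a single interval; these are assumptions of the sketch rather than details of the specification.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class RetentionPolicy:
    write_interval: timedelta              # how often to record to the OPS
    versions_retained: int                 # versions kept at any time
    history_duration: Optional[timedelta]  # None means keep indefinitely

    def is_due(self, since_last_write: timedelta) -> bool:
        """True when enough time has passed to record the collection again."""
        return since_last_write >= self.write_interval

    def keeps(self, age: timedelta) -> bool:
        """True when data of the given age is still inside the history window."""
        return self.history_duration is None or age <= self.history_duration


# First example: current version only, monthly writes, two years of history.
stable = RetentionPolicy(timedelta(days=30), 1, timedelta(days=730))
# Second example: two versions, weekly writes, history kept indefinitely.
active = RetentionPolicy(timedelta(days=7), 2, None)

ten_days = timedelta(days=10)
old = timedelta(days=1000)
print(stable.is_due(ten_days), active.is_due(ten_days))
print(stable.keeps(old), active.keeps(old))
```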
Unlike conventional file systems or databases, the DLS collections object design provides unique, integrated features for treating data created or acquired from both current personal computer applications and devices as well as through online services and normal web browsing, uniformly, across long periods of time, with consistent security semantics.
The DLS collections object design point further expects that even if the original services and applications that were used to create various objects in the collection cease to exist at some point in the future, the individual will still desire to retain access to the data and, more subtly, any knowledge they've developed as a result of linking, annotating, aggregating, and cross-referencing the various data they've acquired. The collections object and DLS storage object structures are capable of directly capturing, representing, and preserving this type of knowledge.
Data managed by collection object structures is organized in the form of DLS storage objects (DSO). A DSO shares many of the same metadata, privacy, and preservation semantics as the Collection Object, and may inherit data for the same shared fields. For example, DSOs will commonly inherit settings for their retention policy from their associated collection object.
In addition to semantics shared with the collections object, the DSO supports a variety of additional semantics particular to its per-data object relationships, as follows:
DSO support for multiple datastream objects allows services provided by the DLS to create and manage multiple variant renditions of the same data under one set of identifier, name, and metadata attributes. For example, it may be critical to retain an original and unaltered version of a document that was generated in a particular word processor format that has fallen out of widespread commercial support because it was cryptographically signed and has commercial or legal value. Yet, at the same time it may also be desirable to generate an easily processed and viewable rendition of the same document using services provided by the DLS format conversion framework for convenient viewing and reference in the future. DSO support for multiple datastreams and rich provenance metadata supports the ability to maintain both the original and the converted datastreams, and sufficient metadata to distinguish and trace the heritage of both versions.
The DSO datastream object structure additionally supports a configuration label attribute. The configuration label allows the collection manager to tag the DSO datastream structure with an operational support services (OSS)-provided configuration label for the version of software running on the DLS at the time of creation. As presented later during discussion of the OSS, the OSS creates a label for each software configuration it provides to DLS systems. This allows subsystems in the DLS that may need to take particular care for tracking actions associated with a particular version of software components to associate a checkpoint label with the sensitive data. The label may be used at a later point in time with cooperation of the OSS' DLS software configuration service to resolve which version of an application was used, and may be particularly helpful for specifying a specific source type to the format conversion framework if a DSO datastream must be converted for rendering in the future.
Additionally, DSO datastreams may be managed either as URI references (i.e. “by-reference” data), or actual data copies (i.e. “by-value” data). This feature of multiple datastreams support allows DSOs to support web “clippings” features of the memory task semantic application, thus allowing the created DSO to optionally retain only a reference to the original source, or a copy and a reference to the original source.
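As an illustration only, the relationship between a DSO and its datastreams might be modeled as in the following minimal Python sketch. The class and field names (Datastream, StorageObject, derived_from, config_label, and so on) are assumptions introduced for this example and are not defined by the specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Datastream:
    """One rendition of a DSO's data (all field names are hypothetical)."""
    stream_id: str
    media_type: str
    source_uri: Optional[str] = None    # present for "by-reference" data
    content: Optional[bytes] = None     # present for "by-value" data
    derived_from: Optional[str] = None  # provenance: stream_id of the source rendition
    config_label: Optional[str] = None  # OSS-provided software configuration label

    @property
    def by_value(self) -> bool:
        return self.content is not None


@dataclass
class StorageObject:
    """A DSO: one identifier, name, and metadata set shared by many datastreams."""
    identifier: str
    name: str
    datastreams: List[Datastream] = field(default_factory=list)

    def lineage(self, stream_id: str) -> List[str]:
        """Trace a rendition back to its original through derived_from links."""
        by_id = {s.stream_id: s for s in self.datastreams}
        chain: List[str] = []
        current = by_id.get(stream_id)
        # Stop at the original, at a missing link, or if a cycle is detected.
        while current is not None and current.stream_id not in chain:
            chain.append(current.stream_id)
            current = by_id.get(current.derived_from) if current.derived_from else None
        return chain


dso = StorageObject("dso-1", "signed-contract")
dso.datastreams.append(Datastream("orig", "application/x-legacy-doc", content=b"signed bytes"))
dso.datastreams.append(Datastream("pdf", "application/pdf", content=b"%PDF", derived_from="orig"))
dso.datastreams.append(Datastream("clip", "text/html", source_uri="https://example.com/page"))
```

In this sketch the original signed document and its converted rendition coexist under one DSO, and a web clipping is held by reference only.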
DSO support for governance labels allows each object to retain any specified conditions or restrictions associated with the original data reference, along with information about the authority and the expiration date of the label. The policy element of a governance label is an object that encapsulates a reference to data typically specified by a third party. As an example, Creative Commons Licenses are one class of governance labels currently in widespread use on the internet. Other examples of governance labels may come into use over time based on standards from groups such as ISO MPEG-21. Governance labels are an informative part of the DSO record and, if processable, are enforced by applications outside of the DLS.
Preservation Functions
Support for long-term data preservation builds on various embodiments of the DLS' storage design which effectively treats the local disk storage system as an “object cache.” Integrated metadata, versioning, and data security features supported by the collections object and DLS storage object structures, as previously described, enable secure third-party online storage and redundancy (virtualization) for remote copies of the individual's aggregate set of collections. If multiple individuals share a single DLS as in the case of a family or small group, each individual's collections are individually managed.
The DLS preservation engine and policies subsystem is responsible for managing data preservation functions. The preservation engine runs as a local process in the respective base runtime or per-account virtual machine, and implements per-account processing based on retention policies or historical navigation over collections and storage objects in the account's associated storage volume. Support for preservation functions is provided in conjunction with an associated online preservation service (OPS). The OPS is responsible for account management and backend policy management of mass storage systems for high-availability and reliability of all preserved data.
In the normal case of various embodiments, the preservation engine is invoked periodically according to the current policy settings in order to checkpoint and record per-account collections, account information, and system data. Policies may be global (system-wide) or local (DLS-specific) in nature. Global policies are periodically supplied to the preservation engine by the OPS as a function of its administrative and maintenance services. OPS-provided global policies provide data for the frequency, versioning, and retention policy for all basic system and account data in the DLS. Local policies are derived from the per-collection retention policies. Local per-collection retention policies override the global default values supplied by the OPS, and may indicate more or less aggressive preservation strategies depending on the settings selected by the individual.
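The override relationship between global and local policies can be sketched as follows. This is a minimal Python illustration under stated assumptions: the policy fields (checkpoint_interval_hours, versions_kept, retention_days) are hypothetical stand-ins for the frequency, versioning, and retention settings mentioned above.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class RetentionPolicy:
    """Fully resolved policy (field names are hypothetical)."""
    checkpoint_interval_hours: int
    versions_kept: int
    retention_days: int


@dataclass(frozen=True)
class CollectionOverrides:
    """Per-collection settings; None means 'use the global default'."""
    checkpoint_interval_hours: Optional[int] = None
    versions_kept: Optional[int] = None
    retention_days: Optional[int] = None


def effective_policy(global_policy: RetentionPolicy,
                     overrides: CollectionOverrides) -> RetentionPolicy:
    """Local per-collection values take precedence over OPS-supplied defaults."""
    values = {}
    for f in fields(RetentionPolicy):
        local = getattr(overrides, f.name)
        values[f.name] = getattr(global_policy, f.name) if local is None else local
    return RetentionPolicy(**values)


ops_defaults = RetentionPolicy(checkpoint_interval_hours=24, versions_kept=10, retention_days=3650)
photos = effective_policy(ops_defaults, CollectionOverrides(checkpoint_interval_hours=1))
```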
The structure of the data transacted by the preservation manager during interaction with the OPS is organized as a set of “blocks” or stream components. The data structures are referred to as an “epoch Archive Data Record Structure,” or arcdata. The arcdata structure is designed for real-time processing both during reading and writing operations, and is effectively processed in “streaming” mode. A specific instance of an arcdata structure covering preservation of data objects over a specific time period is referred to as an epoch.
Reference specifically to
The arcdata block 1250, in turn, includes an administrative redundancy block 1255, an arcdata block sub-index 1260, a privacy section 1265, a canonical storage object section 1270 and a bulk data section 1285. Canonical storage object section 1270 may store a set of DSOs (e.g. DSO[1] 1275 and DSO[n] 1280). DSOs may then point to data stream objects such as object 1290 of bulk data section 1285. Block sub-index 1260 may point to a chain of DSOs or provide a set of pointers to a set of DSOs, for example.
Storage subsystems 1333 confirm 1365 that the writes were executed. Responsive to this confirmation 1365, the DLS confirms the write 1370, and completes the write request 1375. The write reservation is then released 1380, allowing for other access.
More specifically, when the preservation manager is writing data from the DLS to the OPS service, it creates an authenticated connection with the OPS service indicating the epoch that it wants to write. If the authentication materials are approved by the OPS, the OPS allocates a reservation with the storage system for the requested transfer and returns an authorization, or “ticket,” and an opaque referral “handle” to the DLS' preservation manager. The preservation manager uses the ticket and referral handle to identify the authorized reservation when it's ready to start writing data to the storage system. The preservation engine creates the arcdata record for transfer dynamically and sends the blocks incrementally as it works its way through the data selected for the archive set according to the current policies. The arcdata is cryptographically protected for confidentiality and integrity as it is transferred, using keying materials generated by the local process' trust manager. Cryptographic processing is applied at the granularity of stream blocks (except for the administrative block, which is only processed for integrity).
When the preservation manager is reading data to the DLS from the OPS service, it creates an authenticated connection with the OPS service identifying the epoch it wants to retrieve (possibly at the sub-epoch block level), and then reads the data in streaming mode from the remote storage system and processes it immediately to restore the collections and objects in the record. Similar to the writing process, decryption and integrity verification are performed dynamically as the data is received.
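By way of illustration, block-granularity integrity protection in streaming mode might look like the following Python sketch. It covers integrity only, using an HMAC from the standard library; confidentiality would additionally require an encryption step that is omitted here. The function names, the block tuple layout, and the choice to bind the epoch identifier and sequence number into each tag are assumptions of this example, not details taken from the specification.

```python
import hashlib
import hmac
from typing import Iterable, Iterator, Tuple

Block = Tuple[int, bytes, bytes]  # (sequence number, payload, integrity tag)


def _tag(key: bytes, epoch_id: str, seq: int, payload: bytes) -> bytes:
    # Binding the epoch and sequence number into the tag is an assumption of
    # this sketch; it lets the reader detect reordered or transplanted blocks.
    header = f"{epoch_id}:{seq}:".encode("utf-8")
    return hmac.new(key, header + payload, hashlib.sha256).digest()


def write_blocks(key: bytes, epoch_id: str, payloads: Iterable[bytes]) -> Iterator[Block]:
    """Protect and emit blocks one at a time as the archive set is walked."""
    for seq, payload in enumerate(payloads):
        yield seq, payload, _tag(key, epoch_id, seq, payload)


def read_blocks(key: bytes, epoch_id: str, blocks: Iterable[Block]) -> Iterator[bytes]:
    """Verify each block as it arrives, before handing it on for restoration."""
    for expected_seq, (seq, payload, tag) in enumerate(blocks):
        ok = seq == expected_seq and hmac.compare_digest(tag, _tag(key, epoch_id, seq, payload))
        if not ok:
            raise ValueError(f"integrity check failed at block {expected_seq}")
        yield payload
```

Because both functions are generators, neither side needs the whole arcdata record in memory, which mirrors the streaming behavior described above. The reader in this sketch expects a full epoch starting at block zero; a sub-epoch read would need a starting offset.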
In more detail, when the preservation engine commences a writing sequence, it requests the DLS' history manager to determine the starting date of the epoch it should create. The starting date of the epoch is not simply the date following the last recorded checkpoint, but may instead include a sparse matrix of data from an earlier time period that has already been recorded if the data from the earlier period was modified, for example as determined from the collection object or DSO versioning metadata. The preservation engine uses the information from the history manager to process the set of collections and objects for the archive set and creates an index for the epoch that identifies all of the objects contained in it. The epoch index is then retained for the arcdata administrative block and a copy is provided to the history engine. The history engine merges the epoch index with its local master index of every collection and data object that has existed in the system. The history engine's master index is periodically recorded to the OPS as well, according to the OPS-specified retention policy.
During history navigation, such as when the individual is using the DLS' semantic history navigator, the individual may scroll to a historical point for which there is no data in the local DLS object storage for processing. The history engine services navigation requests and can determine using its master index the epoch in which a certain object exists and its dependencies (in case these might span multiple epochs). Failure to locate the requested object in local storage causes the history manager to raise a notification to the preservation engine with the epoch data it needs to retrieve. The preservation engine invokes the read process with the OPS and retrieves the associated arcdata blocks as previously described.
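The interplay between the epoch index, the master index, and a navigation request can be sketched as follows. This minimal Python example assumes string identifiers for objects and epochs and an in-memory dictionary for the master index; the class and function names are hypothetical.

```python
from typing import Dict, Iterable, List, Set


class HistoryIndex:
    """Master index mapping each object to the epochs that contain it."""

    def __init__(self) -> None:
        self._epochs_by_object: Dict[str, List[str]] = {}

    def merge_epoch(self, epoch_id: str, object_ids: Iterable[str]) -> None:
        """Merge one epoch index, as produced by a preservation write."""
        for object_id in object_ids:
            epochs = self._epochs_by_object.setdefault(object_id, [])
            if epoch_id not in epochs:
                epochs.append(epoch_id)

    def epochs_for(self, object_id: str) -> List[str]:
        return list(self._epochs_by_object.get(object_id, []))


def epochs_to_fetch(index: HistoryIndex, object_id: str, local_epochs: Set[str]) -> List[str]:
    """Epochs the preservation engine would need to read back from the OPS."""
    return [e for e in index.epochs_for(object_id) if e not in local_epochs]


index = HistoryIndex()
index.merge_epoch("2008-Q1", ["dso-1", "dso-2"])
index.merge_epoch("2008-Q2", ["dso-2", "dso-3"])  # dso-2 was modified and recorded again
missing = epochs_to_fetch(index, "dso-2", local_epochs={"2008-Q2"})
```

An empty result means the request can be served from local storage; a non-empty result corresponds to the notification raised to the preservation engine.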
OPS-provided global policies for the preservation manager include information about cache management strategies, including conditions that might exist in the DLS when it is optimal for the preservation engine and history manager to perform anticipated reads if the user is operating on data that is close to an epoch for which data is no longer available on the local DLS. OPS-provided global policies also provide direction to the preservation manager and history engine for when it may be optimal to purge certain epoch data. In both cases, the OPS only provides policy data and is not involved in execution or enforcement of the policies by the DLS.
The advantage of having the OPS provide the cache management policies for the DLS preservation manager is that it is able to monitor a wide variety of access behaviors and performance metrics across aggregate workloads and generations of DLS systems as well as its own quality of service (QOS) performance. The aggregate monitoring data allows the OPS to model quality attributes systematically across its overall operations, allowing it to adjust policy for improvements to overall availability, transfer speed, liveness, effects of different block size policies on overall performance, default outstanding block transfer window settings, and possibly other conditions. The data available to the OPS for performing this monitoring is strictly aggregate and neither relies on nor contains any DLS-specific or sensitive information.
Special Issues for Preservation of Cryptographic Materials and Account Information
The DLS, in some embodiments, requires that an individual's or small group's account must be able to survive and evolve according to consistent privacy and authorization semantics over very long periods of time, yet also implement best practices for refreshing and renewing all cryptographic materials underlying the representation, evaluation, and enforcement of those semantics. It is predictable that the set of supported key strengths and cryptographic algorithms will change, perhaps very significantly, over time, and yet over time different sets or configurations of cryptographic infrastructure will have been employed to process data or establish authorizations and trust relationships during any particular period. In support of these evolving requirements, it is therefore critical to establish a set of storage and processing mechanisms that take a virtualized approach to creating and managing all resources and data in the user's environment, such that aspects of the required infrastructure can be restored and executed when or if required, and that the history of any associated security semantics is explicit and inspectable.
The DLS is designed to automatically and securely preserve credentials, certificates and associated resources that have durable value in conjunction with the evaluation or verification of specific data objects as part of the preservation engine function, thus allowing the individual to navigate to a point in their historical timeline, and access and inspect durable parts of their record. This functionality specifically does not apply to protocols or functional aspects of communications supported by the system which must correctly enforce techniques such as perfect forward secrecy, and it is explicitly not a form of key escrow. Rather, the history engine, collections manager, and preservation engine work together to ensure that necessary resources that must exist in order to cryptographically process or verify a given item, typically a DSO datastream, are retained and can be restored when required. Techniques for ensuring protection of cryptographically-sensitive keying materials include cryptographic wrapping and binding of the materials with the associated DLS account in such a manner as to ensure that they cannot be easily copied, reused, and/or subverted for malicious purposes if inadvertently exposed. Such wrapping and binding functionality may be accomplished in a variety of ways using a reliable and sufficiently strong key or token that is uniquely associated with the DLS account. In general, preservation of cryptographic materials is managed like other collections and DSO objects using the protected arcdata streaming mechanisms as previously described. The primary difference is that preservation of sensitive cryptographic materials is transacted with the trust manager, and the trust manager is responsible for any processing that must be applied to protect the materials prior to making them available for preservation.
The DLS' preservation engine, history engine, and arcdata processing functions are able to represent the necessary information and support the ability to restore or configure processing in the virtual machine in a manner that allows associated data from a referenced epoch to be processed according to the mechanisms and policies of the system and the data as recorded. DLS processing of historical data and authorizations, for example involving digitally signed and hashed data, should be able to arrange availability of the necessary cryptographic materials from the relevant epoch in order to verify the signature and report on the integrity of the data as captured. This must be done in conjunction with the current processing configuration and policies, and it may raise an exception if certain policies have changed or expired due to the passage of time. For example, the certificate for the required signature verification key may indicate that it is no longer legally valid. Irrespective of a policy exception arising from this type of time-based condition failure, the DLS processing is able to answer questions about validity of the data within the epoch that it was originally recorded, in which case the exception can be evaluated relative to its implications as a condition arising from the different timeframe and context of interpretation.
Notice that historical processing addresses a different set of issues than processing to refresh or update a digital signature on a given DSO Datastream. In the case of refreshing the signature on an historical object, the object is retrieved if required using the standard functions of the preservation engine, and is then made available to an application either hosted by the DLS or a different system, as required. The refreshed object can then be stored as a new DSO datastream with the original DSO, or handled as a new DSO and associated datastream. The choice of the correct approach is specific to the semantics of the application or authoritative legal jurisdiction. Regardless, the DLS provides for both cases, and the resulting effects can be correctly preserved and navigated historically based on the metadata associated with the objects.
Semantic Processing Framework
Some embodiments of the semantic processing framework (SPF) provide functionality for consistent application of a set of data processing techniques for acquisition and organization of facts, queries, and reasoning over content acquired dynamically through web protocol transactions and from DLS data storage object (DSO) datastreams. Semantic processing in the DLS system provides functionality including:
Functionality supported by the framework enables creation of DLS applications that can assist users in identifying and stating knowledge about the documents, media, events, topical information, and references they value, as well as explicit and/or inferred relationships based on temporal, topical, task-based, or other predicate relationships described through standard W3C RDF statements managed by the individual's RDF fact store.
The personal semantic workspace application, as illustrated in
Functionality provided by the SPF includes:
SPF subsystems and their use in various DLS application scenarios are illustrated in
Referring first to
It is additionally important to understand how SPF subsystems handle references and identifiers. As previously introduced, the SPF supports processing on any supported datatype, where the set of possible supported datatypes is extensible and can evolve over time. It is therefore desirable for SPF processing to utilize a self-identifying type of object reference, and in the case of the DLS this datatype is referred to as the conformable object reference (COR).
The COR is an extensible object structure for passing different types of references as self-identifying datatypes in a uniform manner. The COR additionally provides a means for specifying certain policy options, and if required, attachment of authorization credentials, to a specific reference. COR policy settings enable the application that creates the COR to specify requirements for SPF subsystems, such as whether it is permissible for an SPF subsystem to autonomously invoke processing by the Format Conversion Framework in order to request translation of a source datastream datatype. As another example, policy settings can be used to convey the depth and scope of traversal that the SPF subsystem should pursue in dereferencing the associated reference, for example to ensure no more than a single depth traversal on a remote URL reference, or multi-level traversal but not beyond the specific target host. As another example, policy can be specified in the COR to restrict or prohibit processing of script code or active content associated with the object reference by the SPF. COR support for attachment of authorization credentials allows the application creating the COR to effectively delegate authorization to the SPF subsystem in the event that the SPF subsystem requires specific authorizations to access the referenced datastream.
The COR structure supports a variety of different reference/identifier syntaxes including standard IETF URI schemes such as a URL; a DLS local identifier based for example on a DSO Datastream Identifier (see
The COR is created by the application program that invokes SPF processing and is ultimately released or destroyed by the application when the requested SPF processing is completed. References carried by the COR which may need to be persisted by SPF subsystems during processing, for example as the subject or object of an RDF statement, or fact, are copied as the native reference or restated as a fresh URI, or possibly even cast as a distinct fact by the SPF subsystem as part of its internal operation. Credentials attached to the COR are never persisted and, regardless of this fact, should as a matter of practice be issued with a limited validity period consistent with the amount of time required to complete the operation.
The primary advantage of the COR is that it provides a programming language-neutral, polymorphic approach to dealing with references throughout the SPF and its subsystems. Additionally, unlike common approaches such as simple self-typed opaque identifier references, the COR allows DLS applications to express conformable behaviors to SPF subsystems using explicit policy and trust semantics for delegation of authority (thus the moniker “Conformable Object Reference”). Finally, the particular utility of this strategy in conjunction with SPF subsystems is to allow loose coupling while achieving expressive trust and policy semantics at per-reference granularity for how DLS applications request processing and how SPF subsystems fulfill those requests.
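For illustration, a COR carrying a reference, per-reference policy, and a short-lived delegated credential might be modeled as in the Python sketch below. All names here (RefKind, CorPolicy, Credential, persistable, and the individual policy fields) are assumptions made for this example, and only two reference kinds are shown.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class RefKind(Enum):
    """Self-identifying reference type (only two kinds sketched here)."""
    URI = "uri"
    DLS_LOCAL = "dls-local"


@dataclass(frozen=True)
class CorPolicy:
    """Per-reference requirements for SPF subsystems (hypothetical fields)."""
    allow_format_conversion: bool = False
    max_traversal_depth: int = 1
    stay_on_target_host: bool = True
    allow_active_content: bool = False


@dataclass(frozen=True)
class Credential:
    """Delegated authorization with a limited validity period."""
    token: str
    expires_at: datetime

    def is_valid(self, now: Optional[datetime] = None) -> bool:
        return (now or datetime.now(timezone.utc)) < self.expires_at


@dataclass
class ConformableObjectReference:
    kind: RefKind
    reference: str
    policy: CorPolicy = field(default_factory=CorPolicy)
    credential: Optional[Credential] = None

    def persistable(self) -> dict:
        """What a subsystem may persist: the reference, never the credential."""
        return {"kind": self.kind.value, "reference": self.reference}


cor = ConformableObjectReference(
    kind=RefKind.URI,
    reference="https://example.com/article",
    policy=CorPolicy(max_traversal_depth=1, allow_active_content=False),
    credential=Credential("opaque-token", datetime.now(timezone.utc) + timedelta(minutes=5)),
)
```

Keeping the credential out of the persistable form reflects the rule that credentials attached to a COR are never persisted.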
Referring again to
The OSA is invoked with a COR object and DLS Context reference by the requesting DLS application. The DLS application receives an object reference from the OSA in response, and the OSA continues its work asynchronously. The object reference returned by the OSA allows the DLS application to:
The OSA effectively encapsulates all processing required to internalize the structure, metadata, and content nodes provided by the datastream in the XML DOM-based document/content tree. Practitioners skilled in the art will recognize that a variety of technologies are available for XML DOM processing consistent with the W3C specification, and which can be used to implement generic DOM processing within the broader set of OSA functionality as described.
The content extraction and filter framework, as illustrated in
Filter patterns are self-identifying objects that are typically written using either the W3C Extensible Stylesheet Language Transformation (XSLT) XML language, or possibly using JavaScript/ECMAScript, depending on how they can be composed and where they can be applied by the framework. Filter patterns are created to detect and match the most narrowly defined datatype, and are composed using processing defined by the content extraction and filter framework to operate on larger structures such as a complete OSA document/content tree. Composed sets of filters can be named and reused. In the interests of maximum reusability and composability, filter patterns should be designed to operate on discrete datatypes or structure patterns as in the case of a particular microformat such as FOAF, as opposed to complete documents or web pages, and thus should be as stable as the datatypes they are capable of processing. Other techniques for content extraction may also be appropriate in various embodiments.
Unlike single-pass page-level or document scraping techniques that are structure-specific and selected using URL pattern matching, the SPF content extraction and filter framework utilizes dynamic datatype and node-type matching techniques. The discrete filters are composed using framework processing techniques for traversing the document/content tree that include support for backtracking or multi-pass analysis, thus allowing the framework to adapt the application of filters based on what is learned from matches and/or failures during processing. Whereas single-pass page or document level scrapers tend to be very sensitive to changes in the content or matching URL structure, which is a particular problem in processing highly irregular or frequently changing web-based content, the SPF approach provides a more adaptable technique for best-effort detection and extraction.
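The difference from a single-pass scraper can be illustrated with a small Python sketch. It is not an XSLT-based implementation; it uses plain functions as stand-ins for filter patterns and the standard library's ElementTree as a stand-in for the OSA document/content tree. The two filters, and the rule that the author filter applies only after a title has been learned, are contrived assumptions chosen to show why a second pass can succeed where the first did not.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Optional, Tuple

Fact = Tuple[str, str]
Filter = Callable[[ET.Element, Dict[str, str]], Optional[Fact]]


def title_filter(node: ET.Element, learned: Dict[str, str]) -> Optional[Fact]:
    """Matches a narrowly defined node type, independent of page layout."""
    if node.tag == "title" and node.text:
        return ("title", node.text.strip())
    return None


def author_filter(node: ET.Element, learned: Dict[str, str]) -> Optional[Fact]:
    """Applies only once a title is known, so it may succeed on a later pass."""
    if "title" in learned and node.tag == "span" and node.get("class") == "author" and node.text:
        return ("author", node.text.strip())
    return None


def extract(root: ET.Element, filters: List[Filter], max_passes: int = 3) -> Dict[str, str]:
    """Re-traverse the tree until a pass learns nothing new (multi-pass analysis)."""
    learned: Dict[str, str] = {}
    for _ in range(max_passes):
        before = len(learned)
        for node in root.iter():
            for apply_filter in filters:
                fact = apply_filter(node, learned)
                if fact is not None and fact[0] not in learned:
                    learned[fact[0]] = fact[1]
        if len(learned) == before:
            break
    return learned


doc = ET.fromstring('<page><span class="author">A. Writer</span><title>Notes</title></page>')
facts = extract(doc, [author_filter, title_filter])
```

In this document the author node precedes the title, so the author filter fails on the first pass and matches on the second, once the title fact is available.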
The collection and annotation framework provides functionality for identifying and extracting facts and content from web dataflow in conjunction with proxy-based browsing activity. The collection and annotation framework uses services provided by the object structure analyzer, content extraction and filter framework, SPF database services, and the SPF policy and preferences framework. The collection and annotation framework can be driven both by APIs provided to the web applications framework and through event-driven automation by the Proxy Framework during normal web browsing. General operation of the collection and annotation framework in some embodiments works as follows:
Important benefits to observe about the design of the collection and annotation framework that distinguish it from other similar systems in some embodiments, are as follows:
Separating heavy-weight and content-sensitive fact collection processing functions under the collection and annotation framework from browser-hosted UI elements allows the SPF processing framework to adaptively improve processing features through ongoing updates to filters and policy without requiring updates to client code. This further allows feedback from users of the system to direct improvements to policy and filter components in the SPF, in particular affecting the content extraction and filter framework, providing a relatively transparent experience that can be incrementally improved through updates to the DLS with improved or new filters and policies from the operational support services (OSS) provider.
Returning to
As previously mentioned, the semantic processing framework supports multiple databases, the most fundamental of which is the individual's RDF fact store. The fact store consists of the RDF statements collected both using automated functions of the object structure analyzer and content extraction and filter framework, as well as through user-directed processing using the memory/fact collection task application in conjunction with the SPF collection and annotation framework as previously described. Additional third-party databases can also exist for formal representations of taxonomies, ontology data using W3C OWL language descriptions, and concept databases based on W3C RDF or possibly other formats.
Functionality provided by the QRF supports queries and reasoning operations over the individual's RDF fact store, and potentially other compatible knowledge databases configured with the SPF, using the W3C standard SPARQL query language. Practitioners skilled in the art will recognize that there are multiple available SPARQL database and library technologies, and any of these are potentially useful for implementation of the QRF. The QRF API allows the framework to augment queries from DLS applications using context and policy settings from the SPF policies and preferences framework and DLS context manager. Specifically, the QRF API allows calling DLS applications to indicate to the QRF whether it should augment submitted queries with attributes from the current context. This functionality allows calling DLS applications to let the QRF incorporate facts from the current context that may affect results of the query, such as the historical time frame currently established by updates from the semantic history navigator to the context manager. The QRF may additionally use policy settings from the SPF policy and preferences framework to configure or limit security-sensitive queries in conjunction with the SPARQL library.
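As a rough illustration of context augmentation, the Python sketch below composes a SPARQL query string and, when the caller opts in, restricts results to a historical time frame. It assumes, purely for the example, that facts carry a Dublin Core dcterms:created timestamp; the function name and the ?item variable are likewise hypothetical, and no SPARQL library is invoked. Inputs are interpolated directly, so a real implementation would need to validate them.

```python
from typing import Optional, Tuple

PREFIXES = (
    "PREFIX dcterms: <http://purl.org/dc/terms/>\n"
    "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>\n"
)


def build_query(pattern: str, time_frame: Optional[Tuple[str, str]] = None) -> str:
    """Compose a SELECT query; add a time-frame filter when context is supplied.

    pattern    -- a triple pattern over ?item, ending with " ."
    time_frame -- (start, end) as xsd:dateTime strings, or None to skip augmentation
    """
    lines = [f"  {pattern}"]
    if time_frame is not None:
        start, end = time_frame
        lines.append("  ?item dcterms:created ?created .")
        lines.append(
            f'  FILTER (?created >= "{start}"^^xsd:dateTime'
            f' && ?created < "{end}"^^xsd:dateTime)'
        )
    return PREFIXES + "SELECT ?item WHERE {\n" + "\n".join(lines) + "\n}"


plain = build_query("?item dcterms:subject ?topic .")
scoped = build_query(
    "?item dcterms:subject ?topic .",
    time_frame=("2008-01-01T00:00:00Z", "2009-01-01T00:00:00Z"),
)
```

The calling application controls whether augmentation happens by passing or withholding the time frame, which corresponds to the opt-in indication described above.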
Building on Trusted Sharing Services functionality provided by the DLS (described earlier), it is additionally possible, if configured and authorized by a set of individuals using appropriate trust manager provided credentials, for QRF queries to access RDF stores across different accounts and DLS systems. Support for such a configuration requires the DLS application to construct the references to the shared RDF stores, and may require additional credentials for access to results that reference DLS collections, data storage objects (DSO), or DSO datastreams if they are not available within the Trusted Sharing Service shared storage volume.
The SPF facts presentation framework (FPF), as illustrated in
Similar to the architecture of the QRF framework, the FPF effectively hosts access to a standards-conformant W3C Fresnel library implementation through a higher-level FPF API. Practitioners skilled in the art will recognize that there are multiple library technologies for the W3C suite of Fresnel standards, and any of these are potentially useful for implementation of the FPF. In more detail, the FPF provides functional integration of the W3C Fresnel standard concepts of lens, format, and selector, as follows:
DLS Semantic Applications
Digital life server (DLS) supports a user's long-term information needs through a variety of services and applications which may be implemented collectively or separately in various embodiments. For example, configured on a network with compatible CAS service agents, the DLS can interoperate with existing personal computer systems using standard file service protocols in the form of a commodity network-attached storage device. However, even in this relatively simple configuration, the DLS functions as a storage device with high availability and effectively unlimited capacity, with added ability to securely navigate file versions and history in a dynamic manner over long periods of time. Similarly, the DLS can be configured as a proxy server for electronic mail (POP/SMTP/IMAP) or syndicated feeds (e.g. RSS, IETF ATOM), allowing it to effectively aggregate and provide a secure single point of management for all user identities and accounts in conjunction with existing personal computer desktop and device application configurations. In all cases, Preservation functions of the DLS ensure efficient long-term navigation and recovery of data across all of these applications and data.
The DLS further incorporates support for a flexible set of fact acquisition and reasoning functions as provided by the semantic processing framework (SPF), thus supporting creation of applications capable of representing and manipulating both explicit and inferred relationships between data regardless of its origin, either from the web, or by means of objects managed by the DLS using contexts, collections, data storage objects (DSO), and DSO datastreams. Web applications provided with the DLS that utilize the collective functionality of the SPF and other DLS subsystems for rich personal information services are referred to as the DLS semantic applications.
OSS and OPS Services
Operational Support Services
The DLS system should be capable of significant technical evolution over long periods of time in some embodiments. Economical construction and operation of DLS appliances is expected to utilize low-cost commodity microprocessor, networking, power, and disk components in some embodiments. Depending on environmental conditions, such systems may have a replacement lifecycle of five to seven years, and therefore hardware itself can be expected to fail or require replacement several times during an individual's lifetime. Additionally, improvements in networking technology, physical disk capacity, hardware security, or processor capabilities naturally lead to demand for generational upgrade of systems over time. Durability of the individual's data and continuity of their experience in the presence of these replacement lifecycle conditions therefore requires robust design of the DLS software, its upgrade, maintenance, and configuration management mechanisms.
The operational support services (OSS) are designed to meet the long term robustness, continuity, privacy, and lifecycle maintenance requirements for adoption and use of DLS systems by individuals in large scale deployments.
Referring to
System 1500 of
Communication between the DLS and an OSS site is managed by the OSS operational services access manager. The operational services access manager verifies the secure transport session mutual authentication and then connects the DLS system to the requested OSS service.
Services provided by the OSS include:
The DLS verification service works in conjunction with the OSS operational services access manager to develop and maintain reputation statistics for known DLS devices in the supported population. The DLS verification service requires no information about accounts, identities, and/or any associated cryptographic credentials or keys for any given DLS system, and thus is designed to provide strong privacy assurance for users of the system.
The DLS verification process operates by building a reputation for each known DLS system based on its access patterns with the verification service's operational services access manager. DLS systems access their configured OSS periodically as they transact for updates, policies, and new configurations, and importantly, they do this every time they are restarted. Over time, it should strongly be the case that DLS systems exhibit uniform access patterns due to the relatively fixed nature of how they are deployed, thus making it possible to statistically detect anomalies in behavior that could provide early indication of a possible problem, including:
The verification service utilizes the collected statistical information to maintain a record, or reputation, of known stable and well-behaving DLS systems. The reputation must be maintained by the verification service as a structure that is highly efficient both to store and to evaluate. In an embodiment, the reputation is a vector of hashes computed over data easily obtained from the DLS IP transport stream connection as reported by the OSS operational services access manager. This is referred to as the “basis data.” Reputation vectors of basis data hashes may themselves then be hashed, to compress known good vector sequences for given time periods, thus providing the means for allowing historical good behavior to be checkpointed efficiently in a compact structure supporting efficient trend analysis.
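By way of illustration only, the hashing of basis data and the compression of a reputation vector into a checkpoint might be sketched as follows. The choice of SHA-256, the particular basis data fields (MAC address, source IP address, hour of access), and the function names are assumptions made for this sketch and are not specified by the embodiment.

```python
import hashlib
from typing import List, Tuple


def basis_hash(device_mac: str, source_ip: str, hour_of_day: int) -> str:
    """Hash one observation of basis data (the fields are hypothetical)."""
    record = f"{device_mac}|{source_ip}|{hour_of_day}".encode("utf-8")
    return hashlib.sha256(record).hexdigest()


def checkpoint(reputation_vector: List[str]) -> str:
    """Compress an ordered vector of basis-data hashes into a single hash."""
    digest = hashlib.sha256()
    for entry in reputation_vector:
        digest.update(bytes.fromhex(entry))
    return digest.hexdigest()


# Hypothetical observations for one device over one time period.
observations: List[Tuple[str, str, int]] = [
    ("00:11:22:33:44:55", "203.0.113.7", 2),
    ("00:11:22:33:44:55", "203.0.113.7", 2),
    ("00:11:22:33:44:55", "203.0.113.7", 3),
]
vector = [basis_hash(*obs) for obs in observations]
period_checkpoint = checkpoint(vector)
```

Because the checkpoint depends on the order and content of the vector, two periods with identical access behavior produce identical checkpoints, which is one simple way a compact structure could support comparison across periods.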
The reputation for each DLS device is correlated with it nominally based on the device's MAC address as openly communicated and trivially observed in common IP traffic. Trusted boot functions of the DLS device make it particularly difficult for the MAC to be altered without causing the device to fail, thus providing confidence in this most basic information as an always in the clear identifier for each unique device. This confidence is further reinforced using mutual authentication of the protected transport session as a means to reduce potential attacks on the communications channel. It is explicitly not necessary for the verification service to obtain account or personally identifying information in order for the system to work.
As a separate business service, the OSS may offer a risk prevention or anti-theft service to DLS owners, offering them the opportunity to register for notification if anomalies are detected on their device by the OSS verification service. If an owner decides to participate in the service, they opt in by associating their DLS with their contact information using the MAC address of the device. The optional opt-in business service allows the owner to be contacted in the event that abnormal behavior is detected from the registered device.
Regardless of whether owners opt in for an optional verification and reporting business service, reputation statistics for anonymous and unregistered systems still provide important telemetry for threat and vulnerability monitoring in support of the OSS security policies and emergency response services.
The DLS security policies service is supported by threat and vulnerability monitoring business activities conducted by the operator of the OSS. Consistent with DLS privacy guarantees, threat and vulnerability monitoring operates by using a combination of anonymous data from the DLS verification service, environmental monitoring for detection of efforts to attack or disrupt operation of the DLS population at large, and vulnerability analysis based tracking of implementation or logic defects in the DLS software base. Some of these functions are provided by business resources of the OSS, whereas others, such as statistical trend analysis, are automated. Collectively, the threat monitoring and closely related vulnerabilities analysis can lead to software configuration updates. However, some threats can be countered without resorting to deployment of new software and can be addressed by updating configurable DLS policies, for example by forcing a change in the duration of credentials, configuration of cryptographic routines, or other locally-enforced DLS operating system and runtime policies. In such cases, policy updates can be pushed to DLS systems using the DLS security and policies service.
The DLS software configuration service supports automated distribution of software updates and configuration labels. The service pushes notifications of available configuration updates to the OSS' DLS population and supports retrieval/distribution using proven techniques understood by practitioners skilled in the art. The service additionally supports requests for labeled configurations from verified DLS systems with good reputations. Requests by a DLS device for components from an historical, labeled configuration may, for example, occur in the event that an older version of a component is required in order to process data from an epoch that has a dependency on an earlier version of a DLS application. Verification of the requesting device's reputation is an automated risk management behavior of the system designed to minimize arbitrary probing of historical system software for reverse engineering efforts by rogue or malicious parties.
The DLS optional components service is similar to the DLS software configuration service in that it provides a means for delivering authorized software to verified DLS devices. The optional components service is distinguished by the fact that its offerings are not included as mandatory components in labeled system configurations managed by the software configuration service. The OSS may offer access to the optional components services as a separate business feature.
Online Preservation Service
In some embodiments, the online preservation service (OPS) provides the distributed services interface to online mass storage for preservation of DLS users' data sets. In an embodiment, the DLS is operated with a configured OPS service. Distributed OPS systems provide functionality including:
There can be multiple OPS service instances and they can be operated by a variety of different commercial operators/providers.
Functionality of the OPS services as presented to the DLS are discussed in detail in the earlier portion of this specification that describes DLS preservation functions.
In addition to the services of the OPS as previously described, it is desirable to allow parties who choose at some point to withdraw from the DLS, OSS, OPS environment to extract their data assets from the system in a usable form without any ongoing reliance on the system infrastructure. Business policies for withdrawing from the system are established by OSS and OPS entities. Nominally, a request is made to the OSS service in order to provision the application tools for the user to automate history navigation over the period recorded in their OPS account, and to export the data in a set of well-defined structures. The automated process uses functionality of the preservation engine, history manager, trust manager, and OPS services as previously described in conjunction with the preservation service read flow sequence; see also
By way of further explanation,
DLS 1410 then reads 1455 from storage subsystem 1425 and receives a read response 1460. This response 1460 is relayed as a client response 1465, and a further read 1470 may occur. A corresponding response 1475 is received and relayed as a client response 1480. Read complete 1485 is signaled to storage subsystem 1425 and preservation engine 1420, and a reservation release 1490 is transmitted to storage subsystem 1425. A final client response 1495 is also transmitted to client 1405 to indicate the read process is complete.
The actual OPS may be further understood with reference to
The following provides a specific set of applications and related software which may be used with various DLS implementations and embodiments. This description is intended to be illustrative, providing an example of how the system may be implemented with a software and user interface. Alternative implementations or embodiments may be used to provide similar functionality or different functionality presented to a user which takes advantage of the capabilities and features of a DLS.
Semantic History Navigator Application
In an embodiment, the Semantic History Navigator (SHN) is implemented using the Web Applications Framework, the Dynamic Web Interaction Framework, and other DLS subsystems as a client-server web application.
The SHN client-server application provides an interactive interface for quickly visualizing and navigating the organization and history of an individual's information assets as managed by the DLS. In more detail, the SHN application provides a browser-based web interface for interaction with remote DLS services in the form of a distributed client-server application using standard web (e.g. HTTP, SOAP) protocols. In such an embodiment, client side application functionality and interactivity are provided in part through script code (e.g. JavaScript, ECMAScript) uploaded to the browser using services of the Dynamic Web Interaction Framework (as previously explained); functionality is also provided by standard W3C CSS stylesheets, and may use other resources including GIF and JPEG image files. The browser client script code is hereafter referred to as the “SHN Client.” In the case of this embodiment, server side functionality is provided by the Web Application Framework in the form of a standard Java JSR 154 Servlet. The DLS Web Application Framework servlet for the SHN application is hereafter referred to as the “SHN Servlet.” Communication between the SHN Client and SHN Servlet is conducted using a set of application-specific XML messages over the standard XMLHttpRequest protocol request/callback pattern.
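As a purely illustrative sketch, an application-specific XML message of the kind exchanged between the SHN Client and the SHN Servlet might be built and parsed as shown below. The element name `shn-request`, the `type` attribute, and the `day` element are hypothetical; the actual message schema is not defined in this description, and Python is used here only to show the shape of such a message.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def build_set_current_day(day_iso: str) -> bytes:
    """Serialize a hypothetical 'set current day' request as XML."""
    msg = ET.Element("shn-request", {"type": "set-current-day"})
    ET.SubElement(msg, "day").text = day_iso
    return ET.tostring(msg, encoding="utf-8")


def parse_request(payload: bytes) -> Tuple[Optional[str], Optional[str]]:
    """Extract the request type and day, as a receiving servlet might."""
    root = ET.fromstring(payload)
    return root.get("type"), root.findtext("day")


payload = build_set_current_day("2007-03-15")
request_type, day = parse_request(payload)
```

In a browser, the payload would be sent as the body of an XMLHttpRequest and the reply handled in the request's callback.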
Referring to
As illustrated in
Continuing in more detail with
Visualization of activities and interests relative to the current day and their correlated representation in the Day Context Pane and Current Context Pane is illustrated in
In more detail, the Current Context Pane provides a single day view and is always centered to display elements from the current Content that intersect with the current day, which in the case of this example consists of three activities or interests (recall from preceding discussion that in this context, activities have a fixed start and completion date/time, whereas interests are ongoing and have no beginning or ending date/time). The current day is set by the movement in the Timeline and Event Scroll Region component/pane, and so moving either forward or backward in time using the scroll region changes the current day and updates the component/panes accordingly. Timeline scroll region movements also update the SHN Servlet through XMLHttpRequest/callback protocol messages invoked by the SHN client, thus causing the SHN Servlet to set the corresponding attribute for the “current day” on the DLS side of the application as well. Time navigation also causes the SHN Client to request updates from the SHN Servlet for the minimal and necessary set of information required to update and maintain correlation between visualizations in the components/panes, thus providing the information to populate events in the Day Context Pane, and the Current Context Pane, as illustrated in
As previously mentioned, in some embodiments, changes to the selected Context result in correlated updates to data in other components/panes.
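The selection of elements that intersect the current day can be sketched as follows, assuming (as stated above) that activities carry fixed start and completion dates while interests carry neither. The `ContextItem` class and the `items_for_day` function are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ContextItem:
    title: str
    start: Optional[date] = None  # None for interests, which are ongoing
    end: Optional[date] = None    # None for interests, which are ongoing

    @property
    def is_interest(self) -> bool:
        return self.start is None and self.end is None


def items_for_day(items: List[ContextItem], current_day: date) -> List[ContextItem]:
    """Return interests plus any activities whose date range covers the day."""
    selected = []
    for item in items:
        if item.is_interest:
            selected.append(item)
        elif item.start is not None and item.end is not None:
            if item.start <= current_day <= item.end:
                selected.append(item)
    return selected


items = [
    ContextItem("Trip", date(2007, 3, 1), date(2007, 3, 5)),
    ContextItem("Gardening"),
    ContextItem("Conference", date(2007, 4, 1), date(2007, 4, 2)),
]
visible = items_for_day(items, date(2007, 3, 3))
```

Moving the timeline to a different day simply re-runs the selection with a new `current_day`, which mirrors how the panes are described as updating together.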
Continuing with
Past timeline pane 2320 and future timeline pane 2310 provide information about past and future events. Present context pane 2330 provides information about a current day, including activities and interests as scheduled. Context information for such activities and interests is provided in activities and interests context pane 2340. Timeline/scroll region 2360 provides a scroll-bar like timeline correlated to the data of panes 2330 and 2340. Markers within each of the panes of interface 2300 are also correlated, such as event markers 2370 and related markers 2380 in panes 2330 and 2340.
Finally, throughout all the described SHN Client and SHN Servlet interactions, it is important to restate that all event, activity, interest, and Context data is derived by the SHN Servlet through use of programming interfaces provided by the Collections Manager, the History Manager, and the Context Manager using functionality as previously described.
Personal Semantic Workspace Application
In an embodiment, the Personal Semantic Workspace (PSW) is implemented using the Web Applications Framework, the Dynamic Web Interaction Framework, and other DLS subsystems as a client-server web application.
The PSW client-server application provides an interactive interface for using the Semantic History Navigator (SHN) in conjunction with a set of content-specific “panes” or contextual information “facets” for visualizing, creating, editing, storing, and generally manipulating information assets as managed by the DLS. In more detail, the PSW application provides a browser-based web interface for interaction with remote DLS services in the form of a distributed client-server application using standard web (e.g. HTTP, SOAP) protocols. Client side application functionality and interactivity are provided in part through script code (e.g. JavaScript, ECMAScript) uploaded to the browser using services of the Dynamic Web Interaction Framework (as previously explained); functionality is also provided by standard W3C CSS stylesheets, and possibly other resources including GIF and JPEG image files. The browser client script code is hereafter referred to as the “PSW Client.” In the case of this embodiment, server side functionality is provided by the Web Application Framework in the form of a standard Java JSR 154 Servlet. The DLS Web Application Framework servlet for the PSW application is hereafter referred to as the “PSW Servlet.” Communication between the PSW Client and PSW Servlet is conducted using a set of application-specific XML messages over the standard XMLHttpRequest protocol request/callback pattern.
Referring to
An anchor pane 2515 is provided with a status indicator 2505 and an identity indicator 2510, along with additional status/session indicators 2520 and a preferences link 2525. Timeline panes include past (2530), present (2528) and future (2536). Activities and interests context panes 2533 and 2550 are also provided, along with a context recall pane 2555. Scrollbar 2545 is provided as part of timeline 2542. Individual content panes 2560, 2570, 2575, 2580, 2585 and 2590 provide content navigation for activities and interests, and each may be provided with a preference control 2566 and a pane title 2563.
As illustrated in
In more detail, the SHN application is incorporated in whole by the PSW application and functions as previously described. Significant efficiencies accrue from this technique, in particular because the same behaviors that result in selection of Contexts, temporal navigation, and data correlation apply and operate consistently throughout the rest of the PSW components/panes as previously described for the SHN components/panes. Referring to
As previously introduced in the description of the SHN, Context selections are first processed locally by the SHN Client, and in the case of the PSW application, the PSW Client. If the updates can be handled from locally cached data, the updates occur completely within the local browser environment; otherwise, the PSW Client uses the XMLHttpRequest/callback message protocol sequence to update the PSW Servlet and retrieve the data needed for the required updates. As further illustrated in
In somewhat more detail,
An anchor pane 2810 is provided with status and identity information. Timeline panes include past (2870) and future (2860), along with day context pane 2820. Activities and interests context panes 2840 and 2895 are also provided, along with a context recall pane 2850. An activities and interests context navigator 2830 is also included, as is a timeline and event region scrollbar 2880. Contextual application framework pane 2890 provides application support related to current activities and interests. In
Referring again to
Contextual Pane components receive and process Context and temporal settings just like all other SHN and PSW application components/panes. Contextual Pane components may incorporate additional client browser and Web Application Framework functionality using either XMLHttpRequest/callback protocol message patterns, or SOAP-based processing, depending on the sophistication and nature of their processing needs.
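The cache-first handling of Context selections described for the PSW Client, in which updates are served from locally cached data when possible and otherwise fetched from the servlet, might be sketched as follows. The `ContextCache` class and the `fetch_from_servlet` callable are hypothetical stand-ins for the client logic and the XMLHttpRequest/callback exchange; Python is used only for illustration.

```python
from typing import Callable, Dict


class ContextCache:
    """Serve Context data locally when cached, else ask the servlet."""

    def __init__(self, fetch_from_servlet: Callable[[str], dict]) -> None:
        self._fetch = fetch_from_servlet
        self._cache: Dict[str, dict] = {}
        self.remote_calls = 0  # counts round trips, for illustration only

    def select_context(self, context_id: str) -> dict:
        if context_id in self._cache:
            # Update handled entirely within the local environment.
            return self._cache[context_id]
        # Not cached: one request/callback round trip to the servlet.
        self.remote_calls += 1
        data = self._fetch(context_id)
        self._cache[context_id] = data
        return data


def fake_servlet(context_id: str) -> dict:
    """Stand-in for the servlet side; returns hypothetical Context data."""
    return {"context": context_id, "events": []}


client = ContextCache(fake_servlet)
first = client.select_context("vacation-planning")
second = client.select_context("vacation-planning")
```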
Finally,
Memory/Fact Collection Task Overlay Application
The Memory Task overlay application utilizes client-server style processing over standard W3C HTTP and related protocols between an individual's browser and the DLS to annotate and remember information valuable to the individual as part of their web browsing experience.
A detailed functional description of the Memory/Fact Collection Task overlay application is provided in the preceding section on the SPF Collection and Annotation Framework subsystem.
a illustrates a memory reminder dialog box 3110.
Process 3500 initiates with receipt of a user logon at module 3510. At module 3520, a webpage request is received. At module 3530, a determination is made as to whether the webpage contents are cached in a local cache (such as in a DLS, for example). If so, then the webpage is retrieved from the cache at module 3540. If not, then the webpage is retrieved from the web via the internet at module 3550. The retrieved webpage is provided to a user (such as through a client) at module 3560, and the process may then repeat in whole or in part.
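A minimal sketch of the cache determination and retrieval of modules 3530 through 3560 follows. The dictionary-based cache and the `fetch_from_web` callable are assumptions made for this example; the logon of module 3510 is omitted.

```python
from typing import Callable, Dict, Tuple


def retrieve_webpage(
    url: str,
    cache: Dict[str, str],
    fetch_from_web: Callable[[str], str],
) -> Tuple[str, str]:
    """Return the page and where it came from ('cache' or 'web')."""
    if url in cache:                 # module 3530: is the page cached?
        return cache[url], "cache"   # module 3540: retrieve from cache
    page = fetch_from_web(url)       # module 3550: retrieve from the web
    return page, "web"               # module 3560: provide to the user


# Hypothetical cache contents and web fetcher.
local_cache = {"http://example.com/a": "<html>cached A</html>"}


def stub_fetch(url: str) -> str:
    return f"<html>fetched {url}</html>"


page_a, source_a = retrieve_webpage("http://example.com/a", local_cache, stub_fetch)
page_b, source_b = retrieve_webpage("http://example.com/b", local_cache, stub_fetch)
```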
While simply retrieving a webpage may be appropriate in some situations, information may be overlaid on other webpages.
Process 3600 initiates with receipt of a webpage which is reviewed at module 3610. At module 3620, the webpage contents are checked for a match with a database. If a match is found, overlay information for the webpage is retrieved at module 3630 (and may be added to the webpage). Regardless of whether a match is found, the webpage is presented at module 3640. However, if overlay information has been added at module 3630, this may be part of what is presented, and may be indistinguishable from the rest of the webpage in some embodiments.
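The match check and overlay step of modules 3620 through 3640 might look like the following sketch. Keying the overlay database by URL and inserting the overlay before the closing `body` tag are assumptions made for illustration; an embodiment may match on page contents and composite the overlay differently.

```python
from typing import Dict


def present_webpage(url: str, html: str, overlay_db: Dict[str, str]) -> str:
    """Return the page to present, with overlay markup added on a match."""
    overlay = overlay_db.get(url)          # module 3620: check for a match
    if overlay is not None:                # module 3630: add the overlay
        if "</body>" in html:
            html = html.replace("</body>", overlay + "</body>", 1)
        else:
            html = html + overlay
    return html                            # module 3640: present the page


# Hypothetical overlay database with one annotated page.
overlays = {"http://example.com/a": "<div class='note'>Remember this</div>"}
with_overlay = present_webpage(
    "http://example.com/a", "<html><body>A</body></html>", overlays
)
without_overlay = present_webpage(
    "http://example.com/b", "<html><body>B</body></html>", overlays
)
```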
Overlay information for webpages, and other information may also be stored in a DLS.
Process 3700 initiates with receipt of a request to store information at module 3710. The information may be an overlay for a webpage, for example. At module 3720, attributes of the information to be stored are requested, such as through a user client. At module 3730, a title for the information to be stored is received. Similarly, at module 3740, a type of information to be stored is received. Also, at module 3750, a category for such information is received. The attributes of modules 3730, 3740 and 3750, along with the information itself are stored at module 3760. Note that other attributes may be requested and supplied in other embodiments, and the attributes of such information may take on various different forms, for example.
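The attribute collection and storage of modules 3720 through 3760 can be sketched as below. The `StoredItem` record, the `ask` callable standing in for the user client, and the list used as a store are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StoredItem:
    title: str
    kind: str
    category: str
    content: str


def store_information(
    content: str,
    ask: Callable[[str], str],
    store: List[StoredItem],
) -> StoredItem:
    """Request each attribute from the user, then store it with the content."""
    title = ask("title")         # module 3730: receive a title
    kind = ask("type")           # module 3740: receive a type
    category = ask("category")   # module 3750: receive a category
    item = StoredItem(title, kind, category, content)
    store.append(item)           # module 3760: store attributes and content
    return item


# Hypothetical answers a user might supply through the client.
answers = {"title": "Flight times", "type": "fact", "category": "travel"}
store: List[StoredItem] = []
saved = store_information("Departs 9:05", answers.__getitem__, store)
```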
Storing a document may involve a different process.
Thus, a document is received at module 3810. At module 3820, attributes of the document are extracted, such as from metadata or a scan of data of the document. At module 3830, a determination is made as to whether attributes needed for storage are present. If not, attributes are requested at module 3840, such as through a user client. Such attributes are then received at module 3850. Whether the attributes need to be requested or not, the document and associated attributes are stored at module 3860.
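The extract-then-request flow of modules 3820 through 3860 might be sketched as follows. The set of required attributes, the dictionary layout of the document, and the `ask` callable are assumptions made for this example; the description does not state which attributes are needed for storage.

```python
from typing import Callable, Dict, List

# Hypothetical set of attributes needed before a document can be stored.
REQUIRED_ATTRIBUTES = ("title", "type", "category")


def store_document(
    document: Dict,
    ask: Callable[[str], str],
    archive: List[Dict],
) -> Dict:
    """Store a document, asking the user only for attributes not extracted."""
    metadata = document.get("metadata", {})
    # Module 3820: extract attributes from the document's metadata.
    attrs = {name: metadata.get(name) for name in REQUIRED_ATTRIBUTES}
    # Module 3830: determine which needed attributes are missing.
    missing = [name for name, value in attrs.items() if not value]
    # Modules 3840 and 3850: request and receive the missing attributes.
    for name in missing:
        attrs[name] = ask(name)
    # Module 3860: store the document with its attributes.
    record = {"attributes": attrs, "body": document["body"]}
    archive.append(record)
    return record


asked: List[str] = []


def stub_ask(name: str) -> str:
    asked.append(name)
    return f"user-{name}"


archive: List[Dict] = []
doc = {"metadata": {"title": "Lease", "type": "pdf"}, "body": "..."}
record = store_document(doc, stub_ask, archive)
```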
While documents or basic information may be stored routinely, event information may also be stored with a DLS.
Event information is received at module 3910. At module 3920, attributes of the event are extracted from the information if possible. For example, a calendar entry may include information about who attended an event or what the topic was, along with time and date. At module 3930, a determination is made if attributes needed for storage are present. If not, attributes are requested at module 3940, such as through a user browser or client. The attributes are then received at module 3950. Whether the attributes need to be requested or not, event information and associated attributes are stored at module 3960.
With information stored, retrieving that information becomes important.
If a document is specified, this is received as a request at module 4010. If other parameters (e.g. title or date) are specified, a context request is received at module 4020. At module 4030, the context request is used to search the archive for a matching document. Any identified documents (regardless of type of request) are found at module 4040. At module 4050, the found document(s) are retrieved, and at module 4060, the retrieved document(s) are presented to a user, such as through a user client.
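The two request paths of modules 4010 through 4050 can be illustrated with the following sketch. The list-of-dictionaries archive, the `id` key, and the exact-match comparison of context parameters are assumptions introduced for this example.

```python
from typing import Dict, List, Optional


def find_documents(
    archive: List[Dict],
    document_id: Optional[str] = None,
    **context: str,
) -> List[Dict]:
    """Find documents by explicit identifier or by context parameters."""
    if document_id is not None:
        # Module 4010: a specific document was requested.
        return [d for d in archive if d.get("id") == document_id]
    # Modules 4020 and 4030: search the archive using the context request.
    return [
        d for d in archive
        if all(d.get(key) == value for key, value in context.items())
    ]


# Hypothetical archive contents.
archive = [
    {"id": "d1", "title": "Lease", "date": "2007-03-01"},
    {"id": "d2", "title": "Itinerary", "date": "2007-03-01"},
    {"id": "d3", "title": "Lease", "date": "2008-03-01"},
]
by_id = find_documents(archive, document_id="d2")
by_context = find_documents(archive, title="Lease", date="2007-03-01")
```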
Access to the internet 4105 is typically provided by internet service providers (ISP), such as the ISPs 4110 and 4115. Users on client systems, such as client computer systems 4130, 4150, and 4160 obtain access to the internet through the internet service providers, such as ISPs 4110 and 4115. Access to the internet allows users of the client computer systems to exchange information, receive and send e-mails, and view documents, such as documents which have been prepared in the HTML format. These documents are often provided by web servers, such as web server 4120 which is considered to be “on” the internet. Often these web servers are provided by the ISPs, such as ISP 4110, although a computer system can be set up and connected to the internet without that system also being an ISP.
The web server 4120 is typically at least one computer system which operates as a server computer system and is configured to operate with the protocols of the world wide web and is coupled to the internet. Optionally, the web server 4120 can be part of an ISP which provides access to the internet for client systems. The web server 4120 is shown coupled to the server computer system 4125 which itself is coupled to web content 4195, which can be considered a form of a media database. While two computer systems 4120 and 4125 are shown in
Cellular network interface 4143 provides an interface between a cellular network and corresponding cellular devices 4144, 4146 and 4142 on one side, and network 4105 on the other side. Thus cellular devices 4144, 4146 and 4142, which may be personal devices including cellular telephones, two-way pagers, personal digital assistants or other similar devices, may connect with network 4105 and exchange information such as email, content, or HTTP-formatted data, for example. Cellular network interface 4143 is coupled to computer 4140, which communicates with network 4105 through modem interface 4145. Computer 4140 may be a personal computer, server computer or the like, and serves as a gateway. Thus, computer 4140 may be similar to client computers 4150 and 4160 or to gateway computer 4175, for example. Software or content may then be uploaded or downloaded through the connection provided by interface 4143, computer 4140 and modem 4145.
Client computer systems 4130, 4150, and 4160 can each, with the appropriate web browsing software, view HTML pages provided by the web server 4120. The ISP 4110 provides internet connectivity to the client computer system 4130 through the modem interface 4135 which can be considered part of the client computer system 4130. The client computer system can be a personal computer system, a network computer, a web tv system, or other such computer system.
Similarly, the ISP 4115 provides internet connectivity for client systems 4150 and 4160, although as shown in
Client computer systems 4150 and 4160 are coupled to a LAN 4170 through network interfaces 4155 and 4165, which can be ethernet network or other network interfaces. The LAN 4170 is also coupled to a gateway computer system 4175 which can provide firewall and other internet related services for the local area network. This gateway computer system 4175 is coupled to the ISP 4115 to provide internet connectivity to the client computer systems 4150 and 4160. The gateway computer system 4175 can be a conventional server computer system. Also, the web server system 4120 can be a conventional server computer system.
Alternatively, a server computer system 4180 can be directly coupled to the LAN 4170 through a network interface 4185 to provide files 4190 and other services to the clients 4150, 4160, without the need to connect to the internet through the gateway system 4175.
The computer system 4200 includes a processor 4210, which can be a conventional microprocessor such as an Intel Pentium microprocessor or Motorola PowerPC microprocessor, a Texas Instruments digital signal processor, or some combination of the two types of processors. Memory 4240 is coupled to the processor 4210 by a bus 4270. Memory 4240 can be dynamic random access memory (DRAM) and can also include static RAM (SRAM), or may include FLASH EEPROM, too. The bus 4270 couples the processor 4210 to the memory 4240, also to non-volatile storage 4250, to display controller 4230, and to the input/output (I/O) controller 4260. Note that the display controller 4230 and I/O controller 4260 may be integrated together, and the display may also provide input.
The display controller 4230 controls in the conventional manner a display on a display device 4235 which typically is a liquid crystal display (LCD) or similar flat-panel, small form factor display. The input/output devices 4255 can include a keyboard, or stylus and touch-screen, and may sometimes be extended to include disk drives, printers, a scanner, and other input and output devices, including a mouse or other pointing device. The display controller 4230 and the I/O controller 4260 can be implemented with conventional well known technology. A digital image input device 4265 can be a digital camera which is coupled to an I/O controller 4260 in order to allow images from the digital camera to be input into the device 4200.
The non-volatile storage 4250 is often a FLASH memory or read-only memory, or some combination of the two. A magnetic hard disk, an optical disk, or another form of storage for large amounts of data may also be used in some embodiments, though the form factors for such devices typically preclude installation as a permanent component of the device 4200. Rather, a mass storage device on another computer is typically used in conjunction with the more limited storage of the device 4200. Some of this data is often written, by a direct memory access process, into memory 4240 during execution of software in the device 4200. One of skill in the art will immediately recognize that the terms “machine-readable medium” and “computer-readable medium” include any type of storage device that is accessible by the processor 4210 and also encompass a carrier wave that encodes a data signal.
The device 4200 is one example of many possible devices which have different architectures. For example, devices based on an Intel microprocessor often have multiple buses, one of which can be an input/output (I/O) bus for the peripherals and one that directly connects the processor 4210 and the memory 4240 (often referred to as a memory bus). The buses are connected together through bridge components that perform any necessary translation due to differing bus protocols.
In addition, the device 4200 is controlled by operating system software which includes a file management system, such as a disk operating system, which is part of the operating system software. One example of an operating system software with its associated file management system software is the family of operating systems known as Windows CE® and Windows® from Microsoft Corporation of Redmond, Wash., and their associated file management systems. Another example of an operating system software with its associated file management system software is the Palm® operating system and its associated file management system. The file management system is typically stored in the non-volatile storage 4250 and causes the processor 4210 to execute the various acts required by the operating system to input and output data and to store data in memory, including storing files on the non-volatile storage 4250. Other operating systems may be provided by makers of devices, and those operating systems typically will have device-specific features which are not part of similar operating systems on similar devices. Similarly, WinCE® or Palm® operating systems may be adapted to specific devices for specific device capabilities.
Device 4200 may be integrated onto a single chip or set of chips in some embodiments, and typically is fitted into a small form factor for use as a personal device. Thus, it is not uncommon for a processor, bus, onboard memory, and display/I-O controllers to all be integrated onto a single chip. Alternatively, functions may be split into several chips with point-to-point interconnection, causing the bus to be logically apparent but not physically obvious from inspection of either the actual device or related schematics.
Some portions of the detailed description are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of operations leading to a desired result. The operations are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussion, it is appreciated that throughout the description, discussions utilizing terms such as “processing” or “computing” or “calculating” or “determining” or “displaying” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
The present invention, in some embodiments, also relates to apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.
The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the required method steps. The required structure for a variety of these systems will appear from the description below. In addition, the present invention is not described with reference to any particular programming language, and various embodiments may thus be implemented using a variety of programming languages.
One skilled in the art will appreciate that although specific examples and embodiments of the system and methods have been described for purposes of illustration, various modifications can be made without deviating from the present invention. For example, embodiments of the present invention may be applied to many different types of databases, systems and application programs. Moreover, features of one embodiment may be incorporated into other embodiments, even where those features are not described together in a single embodiment within the present document.