In document and data management tools, such as Microsoft SharePoint Online™ (hereinafter referred to as SPO), it is possible to develop documents, for example, for projects, with the contributions of multiple users. In such arrangements, the users often have different levels of authorization for management of the development of the documents. In such document and data management tools, the most crucial and simplest customer requirement is that any version of a document that is visible in the program should be retrievable in search results. In other words, for many search purposes, such as conducting legal discovery or performing audits, it is often necessary to be able to search not only the current version of a document under development, but also the historical versions that were used to develop the current version of the document. However, in current document and data management programs, such as SPO, the search ingestion pipeline keeps only the latest primary (e.g., major) version of a document in the search index that is available for search functionality.
For example, if a document has multiple versions (e.g., 0.1, 0.2, 1.0, 2.0, 2.1, 2.2, 3.0 and 4.0) in the application program over the course of developing the document, the search ingestion pipeline will process each version of the document some time after the change happens, but will keep only the last primary version in a record index (for example, a search index in cloud storage), overwriting the previous primary version. In other words, in the above example, where there have been three previous primary versions 1.0, 2.0 and 3.0, previous search systems make only the current primary version 4.0 available for search via the search index, notwithstanding that several other versions, including the previous primary versions, are actually stored in cloud storage as historical versions of the document. Although this arrangement is efficient in terms of reducing the time required for conducting a search of the current primary version of the document, it does not provide an optimum experience for many customers, and does not meet other requirements, such as compliance requirements for a discovery search for legal proceedings. In such cases, it is necessary to search not only the current primary version of the document, but also the historical versions of the document that were saved while the document or project was being developed, including previous primary versions that have been overwritten and replaced by new primary versions. Regarding this, it is noted that the term “primary version” is used herein to represent a current version of the document being developed which has been selected to be the version that is available in a search index, for example, in a cloud storage search index arrangement, for viewing and searching by searchers having only a low level of search access authorization (e.g., read-only viewing).
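The difference between the overwrite behavior described above and an index that retains historical versions can be illustrated with a short Python sketch. It assumes, purely for illustration, that a label whose minor component is 0 (e.g., 3.0) is a major, primary-eligible version, and it uses hypothetical function names and index keys that do not come from any real ingestion pipeline.

```python
# Version history from the example above, in creation order.
VERSIONS = ["0.1", "0.2", "1.0", "2.0", "2.1", "2.2", "3.0", "4.0"]


def is_major(label: str) -> bool:
    """Assumption for this sketch: 'N.0' labels are major (primary) versions."""
    _, minor = label.split(".")
    return minor == "0"


def legacy_index(versions: list[str]) -> dict[str, str]:
    """Simulate a pipeline that keeps one index slot per document."""
    index: dict[str, str] = {}
    for label in versions:
        if is_major(label):
            index["doc"] = label  # the previous primary version is overwritten
    return index


def historical_index(versions: list[str]) -> dict[str, str]:
    """Keep every pushed version as its own entry with a version-specific key."""
    return {f"doc@{label}": label for label in versions}


print(legacy_index(VERSIONS))           # {'doc': '4.0'}
print(len(historical_index(VERSIONS)))  # 8
```

Only version 4.0 survives in the overwrite model, whereas the version-keyed index retains all eight versions as separately searchable entries.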
In an implementation, a system is provided including one or more processors and one or more machine-readable media storing instructions which, when executed by the one or more processors, cause the one or more processors to grant a first search request from a first searcher having a predetermined access authorization, to search, via a search index in a cloud storage, stored selected versions of a document, in which the stored selected versions have been developed in an application based on contributions of a plurality of users to produce versions of the document in the application, wherein the stored selected versions of the document are historical versions of a current primary version of the document which has been selected from among the stored selected versions to be accessible, via the search index, to other searchers having a lower access authorization than the predetermined access authorization, the other searchers do not have access to the stored selected versions, and the first searcher is provided with a capability to toggle between searching only the primary version of the document and searching the stored selected versions of the document.
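The access rule and the toggle described in this implementation can be sketched as follows. This is a minimal illustration, assuming two hypothetical access levels (`READ_ONLY` and `DISCOVERY`, the latter standing in for the predetermined access authorization) and an in-memory list in place of the cloud search index; none of the names correspond to a real SharePoint API.

```python
from dataclasses import dataclass
from enum import IntEnum


class Access(IntEnum):
    """Hypothetical access levels; only their relative order matters."""
    READ_ONLY = 1
    DISCOVERY = 2  # stands in for the "predetermined access authorization"


@dataclass(frozen=True)
class IndexItem:
    doc_id: str
    version: str
    is_primary: bool


def search(items, doc_id, access, include_history, required=Access.DISCOVERY):
    """Return the versions of one document that a searcher is allowed to see.

    Searchers below `required` always receive only the primary version.
    Searchers at or above it can toggle `include_history` to also receive
    the stored historical versions.
    """
    hits = [item for item in items if item.doc_id == doc_id]
    if access >= required and include_history:
        return hits
    return [item for item in hits if item.is_primary]


# Example index contents: two historical versions and the current primary.
ITEMS = [
    IndexItem("doc", "1.0", False),
    IndexItem("doc", "2.1", False),
    IndexItem("doc", "3.0", True),
]
print([i.version for i in search(ITEMS, "doc", Access.READ_ONLY, True)])   # ['3.0']
print([i.version for i in search(ITEMS, "doc", Access.DISCOVERY, True)])   # ['1.0', '2.1', '3.0']
print([i.version for i in search(ITEMS, "doc", Access.DISCOVERY, False)])  # ['3.0']
```

The toggle is effective only for a searcher holding the required authorization; for every other searcher, the request silently degrades to a primary-only search.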
In another implementation, a method is provided including developing a document in an application based on contributions of a plurality of users to produce versions of the document in the application, pushing one or more selected ones of the versions from the application to cloud storage as stored versions in the cloud storage, and selecting a latest one of the stored versions of the document to be a primary version of the document which is accessible to searchers with a limited access authorization via a search index, wherein a determination as to when to push the selected one or more of the versions from the application to the cloud storage is based on at least one of: a ranking of the users making changes to the selected one or more of the stored versions in the application in relation to a level of authorization for making changes to the document; a number of times changes have been made to the selected one or more of the versions in the application by the users since the primary version has been selected; the number of users that are using or modifying the selected one or more of the versions in the application; and how often the selected one or more of the versions is being used in the application.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure.
The drawing figures depict one or more implementations in accord with the present teachings, by way of example only, not by way of limitation. In the figures, like reference numerals refer to the same or similar elements. Furthermore, it should be understood that the drawings are not necessarily to scale.
In the following detailed description, numerous specific details are set forth by way of examples in order to provide a thorough understanding of the relevant teachings. However, it should be apparent that the present teachings may be practiced without such details. In other instances, well known methods, procedures, components, and/or circuitry have been described at a relatively high-level, without detail, in order to avoid unnecessarily obscuring aspects of the present teachings.
This description is directed to allowing an authorized searcher to conduct a search of a current primary version of a document being developed in an application as well as of historical versions of the document which were used in the development of the current primary version of the document.
As discussed above with regard to
It is noted that the document storage application SharePoint Online™ supports the concept of so-called “minor versions,” which are, in effect, versions of a document under development. These minor versions are used to control who has access to “in progress” work. For example, the members of a group involved in developing a document in the application program might be able to see a minor version (e.g., Version 2.3). But when a visiting searcher who is not a member of the development group, and who has only read permission, attempts to perform a search, that searcher will only see the last “major version” (e.g., Version 2.0, which is a primary version that one or more of the group members, or an Administrator, determined to be available for read-only viewing by searchers outside of the group). In other words, the push service used to push versions of the document from the application being used to develop the document (e.g., SharePoint Online™) to cloud storage (e.g., a cloud-based search index) will submit only the latest major version (e.g., Version 2.0 in this example) to the search index of the cloud storage. As such, although the application developing the document supports the concept of “minor” versions, equivalent to versions in the present disclosure, the search index of the cloud storage used for such applications in previous systems does not allow access to such minor versions. In conjunction with this, the Access Control Lists (ACLs) and security trimming for the push service assume that all pushed documents are major versions.
It is noted that the present disclosure is particularly useful for conducting legal discovery for legal proceedings, or similar searches, such as audits. In conjunction with such legal discovery, it is noted that the term “discovery compliance” means that each party (in this case, prosecution and defense) must comply with court orders regarding sharing any information relevant to the case in their possession. Electronic discovery, as used herein, refers to any process in which electronic data is sought, located, secured, and searched with the intent of using it as evidence in a case. In the process of electronic discovery, data of all types can serve as evidence.
It is also noted that, in the following description, the term “legal hold” (or “litigation hold”) refers to the process an organization employs when litigation is plausibly anticipated. The legal hold mandates preservation of documentation pertaining to a grievance or an audit. All processes leading to disposal of data are suspended to ensure data availability for the discovery process prior to actual litigation.
Once the content 320 is in the index 310, this content is eligible to be displayed as a result of relevant queries 330 from eligible searchers (who may or may not also be users from the group involved in developing the document in the document storage application). A query 330 can be a text query that a user or other searcher enters in a search box to send to the index 310. In response to the query 330, the index 310 can fetch the requested information from the cloud storage and provide it to the requestor for display.
As shown in
More specifically, a content push service 450 pushes selected ones of the versions 425 of the document being developed by the group of users in the application 440 to the indexes 310a and 310b through a content router 460. The content push service 450 and the content router 460 are standard elements used with data management applications for pushing versions that have been developed in the application 440 to a search index in cloud storage. In accordance with aspects of the present disclosure, the decision as to which ones of the versions 425 of the document being developed in the application 440 to push to the cloud storage 410 (and, in particular, to the portion 310a of the search index) to be stored as historical versions 425 is made based on a variety of possible criteria.
For example, in accordance with aspects of the present disclosure, the determination as to when to push the selected one or more of the versions from the application 440 to the index portion 310a of cloud storage 410, using the push service 450, is based on at least one of the following determining factors: (1) a ranking of the users making changes to the selected one or more of the versions in the application 440 regarding a level of authorization for making changes to the document; (2) a number of times changes have been made to the selected one or more of the versions in the application 440 by the users since the primary version has been selected; (3) the number of users that are using or modifying the selected one or more of the versions in the application; and (4) how often the selected one or more of the versions is being used in the application 440.
The decision to select a particular version being worked on by one or more users in the application 440, or by an administrator of the application 440, to be a version to be pushed to the index portion 310a, can be made based on just one of the above-noted factors or a combination of two or more of them, including all of them, if desired. Further, weighting could be assigned such that some of the noted selection factors have more weight than others in the decision to push the version to the index portion 310a as a historical version. For example, factor (1) noted above could have the highest weighting value, factor (2) the second highest weighting value, factor (3) the third highest, and factor (4) the lowest weighting value, assuming that all four factors are given weight in making the decision as to whether a particular version being developed in the application 440 is going to be pushed as a historical version to the index portion 310a. This weighting example is, of course, just one example, and other combinations of the decision factors are envisioned as well.
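One way to realize the weighted combination of factors (1) through (4) is a normalized weighted score compared against a threshold. The following Python sketch is an assumption for illustration only: the weights, the normalization caps, the 1 to 5 rank scale, and the threshold are all hypothetical values chosen to respect the ordering (1) > (2) > (3) > (4) described above, and they are not specified by the disclosure.

```python
from dataclasses import dataclass


@dataclass
class VersionActivity:
    """Hypothetical activity signals for one in-progress version."""
    editor_rank: int            # factor (1): editor's authorization rank, 1 (lowest) to 5 (highest)
    changes_since_primary: int  # factor (2): edits since the primary version was selected
    active_users: int           # factor (3): users using or modifying this version
    uses_per_day: float         # factor (4): how often the version is used


# Illustrative weights ordered (1) > (2) > (3) > (4); a zero weight drops a factor.
WEIGHTS = (0.4, 0.3, 0.2, 0.1)
# Illustrative caps used to normalize each raw signal into [0, 1].
CAPS = (5, 20, 10, 50.0)
PUSH_THRESHOLD = 0.5  # hypothetical cut-off


def push_score(a: VersionActivity) -> float:
    raw = (a.editor_rank, a.changes_since_primary, a.active_users, a.uses_per_day)
    return sum(w * min(r / cap, 1.0) for w, r, cap in zip(WEIGHTS, raw, CAPS))


def should_push(a: VersionActivity, threshold: float = PUSH_THRESHOLD) -> bool:
    """Decide whether to push this version to the historical index portion."""
    return push_score(a) >= threshold


busy = VersionActivity(editor_rank=5, changes_since_primary=10, active_users=2, uses_per_day=5.0)
quiet = VersionActivity(editor_rank=1, changes_since_primary=2, active_users=1, uses_per_day=1.0)
print(round(push_score(busy), 3), should_push(busy))    # 0.6 True
print(round(push_score(quiet), 3), should_push(quiet))  # 0.132 False
```

To base the decision on a single factor, set every other weight to zero. The same scoring function could also be reused, with different weights, for selecting which historical version becomes the current primary version.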
The decision as to which historical version to select as a primary version to be pushed to the index portions 310a and 310b to be the current primary version 420 can be made by one or more of the users and/or an application administrator. This decision can be made using one or more of the above-noted decision factors for determining which versions of the document being developed in the application will be pushed as historical versions, or using other factors, if desired. Similarly, weighting of the decision factors can be used, if desired, for determining which historical version 425 will be selected to be the current primary version 420, in the same manner as the weighting described above for deciding which versions to push.
Still referring to
In addition, as shown in
Still referring to
Still referring to
In accordance with aspects of the present disclosure, versions 425 that are pushed by the push service 450 to the cloud storage 410 can be made immutable, i.e., only add and delete operations are supported for them in the push service 450 and the index 310. In conjunction with this, the push service 450 will treat each version 425 that becomes a historical version in the index portion 310a as a child link of the corresponding primary version (e.g., the historical version 0.1 will be a child link of the primary version 1.0 for which it becomes a historical version). To this end, each historical version will be a separate item in the search index 412 with a unique identifier that includes a version ID, so that each historical version is distinct and separate from the current primary version. Also, to achieve this immutability of the historical versions 425 stored in the index portion 310a, the access control list (ACL), which provides rules for granting or denying access to the stored versions of the document, must be set to null. This is necessary in data management applications because ACLs are generally not versioned in such applications, and such ACLs do not support historical versions that are not immutable.
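A historical-version record with the properties just described (a version-specific identifier, a child link to its primary version, a null ACL, and no in-place modification) can be sketched with a frozen Python dataclass. The field names and the `doc#version` identifier format are assumptions made for this illustration, and the primary item is assumed to be keyed by the bare document ID.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Optional


@dataclass(frozen=True)
class HistoricalVersionItem:
    """Hypothetical search-index record for one immutable historical version."""
    doc_id: str
    version_id: str
    primary_version_id: str      # the primary version this item is a child link of
    content: str
    acl: Optional[tuple] = None  # kept null because ACLs are not versioned

    def __post_init__(self):
        if self.acl is not None:
            raise ValueError("historical versions must have a null ACL")

    @property
    def item_id(self) -> str:
        # Including the version ID keeps each historical item distinct from the primary item.
        return f"{self.doc_id}#{self.version_id}"

    @property
    def parent_id(self) -> str:
        # Child link: in this sketch the primary item is keyed by the bare document ID.
        return self.doc_id


h = HistoricalVersionItem("doc-42", "0.1", "1.0", "draft text")
print(h.item_id, h.parent_id, h.primary_version_id)  # doc-42#0.1 doc-42 1.0
try:
    h.content = "edited"  # a frozen dataclass rejects in-place updates
except FrozenInstanceError:
    print("immutable")
```

A frozen dataclass enforces immutability at the object level only. A production index would need the equivalent guarantee on the storage side, which the next sketch addresses.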
Because the historical versions 425 are treated as child links of the current primary version, any deep update in the site (rename, move, recrawl) would otherwise cause all historical versions to be updated as well. This is potentially a large amount of extra load for very little value. To mitigate such an excess load, the push service 450 treats the historical versions 425 as immutable. This means that the push service 450 supports only add and delete operations for the historical versions 425 and does not update the historical versions 425 when the current primary version 420 is updated. Consequently, only crawl properties that are immutable in the cloud storage 410 can be included in the historical versions 425.
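The add-and-delete-only contract can be illustrated with a toy in-memory store. This is a sketch under stated assumptions: `HistoricalIndex` and `rename_primary` are hypothetical names, and a plain dictionary stands in for the primary entries of the search index. The store deliberately has no update method, so a deep update such as a rename reaches only the primary item.

```python
from types import MappingProxyType


class HistoricalIndex:
    """Toy in-memory store that allows only add and delete for historical items."""

    def __init__(self):
        self._items = {}

    def add(self, item_id, properties):
        if item_id in self._items:
            raise ValueError(f"{item_id} exists; historical items are never overwritten")
        self._items[item_id] = dict(properties)  # copy so callers cannot mutate it later

    def delete(self, item_id):
        self._items.pop(item_id, None)

    def get(self, item_id):
        return MappingProxyType(self._items[item_id])  # read-only view

    # Deliberately no update() method.


def rename_primary(primary_store, doc_id, new_title):
    """A deep update such as a rename touches only the primary item."""
    primary_store[doc_id]["title"] = new_title


primary = {"doc-42": {"title": "Plan"}}
history = HistoricalIndex()
history.add("doc-42#0.1", {"content": "draft text"})
rename_primary(primary, "doc-42", "Plan (final)")
print(primary["doc-42"]["title"], dict(history.get("doc-42#0.1")))
```

Because renames never cascade into the historical store, the cost of a deep update stays proportional to the number of primary items rather than to the total number of stored versions.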
The path of the historical versions 425 in the application 440 is not immutable, and it is always the same as the path of the current primary version. Since the path is not immutable in the application 440, the push service 450 will not include the path as a crawl property in the historical versions 425. This means that path-based electronic discovery queries for historical versions 425 will not be supported.
In the implementation using immutable historical versions 425, the version URLs for historical versions in the application 440 are based on the path of the current primary version 420. Since the version URL is not immutable, the push service 450 will not include the version URLs as a crawl property. This means that a search client will need to query for the current primary version 420 in order to get the version URL. The proposed solution is to expose a new document storage API that takes the new IDs and returns the correct version URL. This also abstracts away the complexity of building the version URL, since different file types (files, items, pages) have different version URL formats.
As part of the immutable historical version design, the push service 450 will not propagate list moves to historical versions. This means that historical versions will not have certain crawl properties, because these crawl properties are not immutable within a site. These crawl properties can instead be retrieved from the current primary version by looking up the current primary version in search index 412.
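The exclusion of mutable crawl properties at push time, together with the fallback to the current primary version at query time, can be sketched as follows. All property names are hypothetical. `path` and `version_url` correspond to the path and version URL discussed above, and `list_id` stands in for the list-dependent properties affected by list moves. The sketch does not model the version URL lookup, which, as described above, would go through the proposed document storage API rather than being copied from the primary version.

```python
# Hypothetical property names for illustration only.
MUTABLE_PROPERTIES = frozenset({"path", "version_url", "list_id"})


def historical_crawl_properties(properties):
    """Keep only the crawl properties that cannot change after the push."""
    return {k: v for k, v in properties.items() if k not in MUTABLE_PROPERTIES}


def resolve_property(name, historical_item, primary_item):
    """Read a property from the historical item, else from the current primary."""
    if name in historical_item:
        return historical_item[name]
    return primary_item.get(name)


snapshot = {"author": "user-a", "word_count": 1200, "path": "/team/plan.docx",
            "version_url": "url-for-0.1", "list_id": "list-7"}
hist = historical_crawl_properties(snapshot)
primary_props = {"author": "user-b", "path": "/archive/plan.docx", "list_id": "list-9"}
print(sorted(hist))                                     # ['author', 'word_count']
print(resolve_property("path", hist, primary_props))    # /archive/plan.docx
print(resolve_property("author", hist, primary_props))  # user-a
```

In this sketch, a path-based lookup for a historical item returns the primary version's current path. This matches the statement above that the path is always the same as that of the current primary version, and it also shows why path-based discovery queries against the historical snapshot itself are not supported.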
The computer system 800 may further include a read only memory (ROM) 808 or other static storage device coupled to the bus 802 for storing static information and instructions for the processor 804. A storage device 810, such as a flash or other non-volatile memory may be coupled to the bus 802 for storing information and instructions.
The computer system 800 may be coupled via the bus 802 to a display 812, such as a liquid crystal display (LCD), for displaying information. One or more user input devices, such as the example user input device 814, may be coupled to the bus 802, and may be configured for receiving various user inputs, such as user command selections, and communicating these to the processor 804 or to the main memory 806. The user input device 814 may include physical structure, or virtual implementation, or both, providing user input modes or options for controlling, for example, a cursor visible to a user through display 812 or through other techniques, and such modes or operations may include, for example, a virtual mouse, a trackball, or cursor direction keys.
The computer system 800 may include respective resources of the processor 804 executing, in an overlapping or interleaved manner, respective program instructions. Instructions may be read into the main memory 806 from another machine-readable medium, such as the storage device 810. In some examples, hard-wired circuitry may be used in place of or in combination with software instructions. The term “machine-readable medium” as used herein refers to any medium that participates in providing data that causes a machine to operate in a specific fashion. Such a medium may take many forms, including but not limited to, non-volatile media, volatile media, and transmission media. Non-volatile media may include, for example, optical or magnetic disks, such as storage device 810. Transmission media may include optical paths, or electrical or acoustic signal propagation paths, and may include acoustic or light waves, such as those generated during radio-wave and infra-red data communications, that are capable of carrying instructions detectable by a physical mechanism for input to a machine.
The computer system 800 may also include a communication interface 818 coupled to the bus 802, for two-way data communication coupling to a network link 820 connected to a local network 822. The network link 820 may provide data communication through one or more networks to other data devices. For example, the network link 820 may provide a connection through the local network 822 to a host computer 824 or to data equipment operated by an Internet Service Provider (ISP) 826 to access through the Internet 828 a server 830, for example, to obtain code for an application program.
In the following, further features, characteristics and advantages of the invention will be described by means of items:
While various embodiments have been described, the description is intended to be exemplary, rather than limiting, and it is understood that many more embodiments and implementations are possible that are within the scope of the embodiments. Although many possible combinations of features are shown in the accompanying figures and discussed in this detailed description, many other combinations of the disclosed features are possible. Any feature of any embodiment may be used in combination with or substituted for any other feature or element in any other embodiment unless specifically restricted. Therefore, it will be understood that any of the features shown and/or discussed in the present disclosure may be implemented together in any suitable combination. Accordingly, the embodiments are not to be restricted except in light of the attached claims and their equivalents. Also, various modifications and changes may be made within the scope of the attached claims.
While the foregoing has described what are considered to be the best mode and/or other examples, it is understood that various modifications may be made therein and that the subject matter disclosed herein may be implemented in various forms and examples, and that the teachings may be applied in numerous applications, only some of which have been described herein. It is intended by the following claims to claim any and all applications, modifications and variations that fall within the true scope of the present teachings.
Unless otherwise stated, all measurements, values, ratings, positions, magnitudes, sizes, and other specifications that are set forth in this specification, including in the claims that follow, are approximate, not exact. They are intended to have a reasonable range that is consistent with the functions to which they relate and with what is customary in the art to which they pertain.
The scope of protection is limited solely by the claims that now follow. That scope is intended and should be interpreted to be as broad as is consistent with the ordinary meaning of the language that is used in the claims when interpreted in light of this specification and the prosecution history that follows and to encompass all structural and functional equivalents. Notwithstanding, none of the claims are intended to embrace subject matter that fails to satisfy the requirement of Sections 101, 102, or 103 of the Patent Act, nor should they be interpreted in such a way. Any unintended embracement of such subject matter is hereby disclaimed.
Except as stated immediately above, nothing that has been stated or illustrated is intended or should be interpreted to cause a dedication of any component, step, feature, object, benefit, advantage, or equivalent to the public, regardless of whether it is or is not recited in the claims.
It will be understood that the terms and expressions used herein have the ordinary meaning as is accorded to such terms and expressions with respect to their corresponding respective areas of inquiry and study except where specific meanings have otherwise been set forth herein. Relational terms such as first and second and the like may be used solely to distinguish one entity or action from another without necessarily requiring or implying any actual such relationship or order between such entities or actions. The terms “comprises,” “comprising,” or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. An element preceded by “a” or “an” does not, without further constraints, preclude the existence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The Abstract of the Disclosure is provided to allow the reader to quickly ascertain the nature of the technical disclosure. It is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. In addition, in the foregoing Detailed Description, it may be seen that various features are grouped together in various examples for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claims require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive subject matter lies in less than all features of a single disclosed example. Thus, the following claims are hereby incorporated into the Detailed Description, with each claim standing on its own as a separately claimed subject matter.