Electronic documents can contain content such as text, spreadsheets, slides, diagrams, charts, and images. Electronic documents can be the subject of a variety of activities, performed by a variety of people. These activities can include, for example, authoring a document; modifying, revising, or editing a document; etc.
Some conventional search engines allow users to input a search query made up of one or more words, and use a document search index to return a list of documents in a group of documents (a “corpus”) that are relevant to the search query, such as documents in which all of the words in the search query occur; in which all of the words in the search query occur in close proximity to one another; in which all of the words in the search query occur in the same order as in the search query; etc.
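The three conventional matching criteria listed above (all words present, words in close proximity, words in query order) can be illustrated with a small positional matcher. This is a minimal sketch of generic keyword matching, not of any particular search engine; the function names, the `mode` values, and the default `window` of 5 tokens are illustrative assumptions.

```python
import re


def tokenize(text):
    """Lower-cased word tokens, in document order."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches(query, text, mode="all", window=5):
    """Test one document against a conventional matching criterion.

    mode="all":     every query word occurs somewhere in the document
    mode="near":    every query word occurs within some run of `window` tokens
    mode="ordered": the query words occur in the same order as in the query
    """
    if mode not in ("all", "near", "ordered"):
        raise ValueError(f"unknown mode: {mode}")
    q = tokenize(query)
    tokens = tokenize(text)
    # Positional index: term -> ascending list of token positions.
    index = {}
    for pos, term in enumerate(tokens):
        index.setdefault(term, []).append(pos)
    if not q or any(term not in index for term in q):
        return False
    if mode == "all":
        return True
    if mode == "near":
        needed = set(q)
        return any(needed <= set(tokens[start:start + window])
                   for start in range(len(tokens)))
    # mode == "ordered": greedily take the earliest usable position of each word.
    last = -1
    for term in q:
        later = [p for p in index[term] if p > last]
        if not later:
            return False
        last = later[0]
    return True
```

For example, "cathodes of manganese" matches the query "manganese cathodes" under `mode="all"` but not under `mode="ordered"`.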
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This summary is not intended to identify key factors or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
A facility for reporting on a corpus of documents is described. The facility receives a user-specified search query. In response to the receiving, among documents in the corpus, the facility identifies a proper subset that have each (1) been modified in a manner relevant to the search query (2) at a recent time. For each of at least a portion of the identified documents, the facility causes to be presented information describing the document.
The inventors have recognized several disadvantages endemic to conventional approaches to document search. First, in many cases no relevant documents can be found for a search query, because the only documents relevant to the search query were authored too recently to have been included in the document search index. Also, the documents that are found are often (1) only tangentially related to the search query, and/or (2) stale, in the sense of having been written significantly earlier and not containing information of current value.
In response to this recognition, the inventors have conceived and reduced to practice a software and/or hardware facility (“the facility”) for identifying among recent revisions to documents those that are relevant to a search query.
The facility maintains, across a corpus of documents, a revision index on the terms involved in revisions to the documents, which also reflects the times at which those revisions are made. In some embodiments, the facility maintains the revision index with low latency, such as taking only 0.01 second, 0.1 second, one second, 10 seconds, a minute, five minutes, etc. to add a revision to the revision index after the revision is performed. In some embodiments, the facility updates the revision index synchronously with each revision, such that the revision is not shown as completed to the person making it until the revision index has been updated to reflect it. This approach to maintaining the revision index is sometimes described as “transactional.”
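A minimal sketch of such a revision index, assuming each revision can be reduced to the text it inserted or deleted plus an author and a timestamp. The `Revision` record, the `RevisionIndex` class, its method names, and the regex tokenizer are all hypothetical. The lock-protected `commit_revision` models the synchronous ("transactional") variant: the caller learns that a revision is complete only after the index reflects it.

```python
import re
import threading
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Revision:
    """One revision to one document (hypothetical record layout)."""
    revision_id: int
    doc_id: str
    author: str
    timestamp: float   # seconds since the epoch
    changed_text: str  # text inserted or deleted by the revision


def terms_of(text):
    """Lower-cased word tokens; a crude stand-in for a real tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class RevisionIndex:
    """Inverted index from each term to the revisions that involve it."""

    def __init__(self):
        self._postings = defaultdict(list)  # term -> [Revision, ...] in commit order
        self._lock = threading.Lock()
        self._next_id = 1

    def commit_revision(self, doc_id, author, timestamp, changed_text):
        """Record a revision and index it before returning (transactional update)."""
        with self._lock:
            rev = Revision(self._next_id, doc_id, author, timestamp, changed_text)
            self._next_id += 1
            for term in terms_of(changed_text):
                self._postings[term].append(rev)
            return rev

    def revisions_for(self, term):
        """All indexed revisions involving `term`, each carrying its timestamp."""
        with self._lock:
            return list(self._postings.get(term.lower(), []))
```

Because each posting carries the revision's timestamp, the same structure can answer both "which revisions involve this term" and "how recently".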
The facility receives from a user a search query made up of one or more terms. The facility uses the revision index to identify recent revisions to documents that are relevant to the search query, and displays information about these recent relevant revisions. In various embodiments, the facility uses various weightings of relevance versus recency in selecting revisions to include in the revision search result and in ordering or otherwise ranking the selected revisions. As one example, when the facility receives the search query “manganese cathodes” from a user, it may identify revisions made to two documents in the preceding half-hour as relevant to this search query, and display information about them, such as the name of each document, a link to the document, its author, the time of the revision, a video replay of the revision, a live view of ongoing revisions to the document, etc.
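One way to weigh relevance against recency, offered only as an illustrative sketch: score each candidate revision as a weighted blend of a relevance term (here, the fraction of query terms that its changed text involves) and a recency term that halves with every `half_life_s` seconds of age. The blending weight `alpha`, the 30-minute half-life, the result `limit`, and all names are assumptions rather than values from the description.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Revision:
    doc_id: str
    author: str
    timestamp: float   # seconds since the epoch
    changed_text: str  # text inserted or deleted by the revision


def terms_of(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevance(query_terms, rev):
    """Fraction of the query's terms that the revision's changed text involves."""
    if not query_terms:
        return 0.0
    return len(query_terms & terms_of(rev.changed_text)) / len(query_terms)


def recency(rev, now, half_life_s):
    """1.0 for a revision made just now, halving every `half_life_s` seconds."""
    age = max(0.0, now - rev.timestamp)
    return 0.5 ** (age / half_life_s)


def rank_revisions(query, revisions, now, alpha=0.5, half_life_s=1800.0, limit=10):
    """Select relevant revisions and rank them by a relevance/recency blend."""
    q = terms_of(query)
    scored = []
    for rev in revisions:
        rel = relevance(q, rev)
        if rel == 0.0:
            continue  # shares no terms with the query
        score = alpha * rel + (1 - alpha) * recency(rev, now, half_life_s)
        scored.append((score, rev))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [rev for _, rev in scored[:limit]]
```

Setting `alpha=1.0` ignores recency entirely, which corresponds to embodiments that identify relevant revisions without regard for when they were made; lowering `alpha` lets a fresh, partially relevant revision outrank an older, fully relevant one.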
In some embodiments, the facility performs this revision search in parallel with a conventional document search, and presents the results of each in a distinct section of a user interface display. To extend the example of the “manganese cathodes” search query, the facility may also perform a conventional document search using the search query, which identifies two other documents that have contained content relevant to the search query for longer periods of time. In this case, the facility displays information about these two other documents at the same time as the two recent relevant revisions.
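Running the two searches concurrently might look like the following sketch, which uses the thread pool from Python's standard `concurrent.futures` module. The `revision_search` and `document_search` callables are hypothetical stand-ins for the revision search and a conventional document search; keeping their results under separate keys lets a user interface render them in distinct sections.

```python
from concurrent.futures import ThreadPoolExecutor


def combined_search(query, revision_search, document_search):
    """Run both searches at once; keep their results in separate sections."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        recent = pool.submit(revision_search, query)
        established = pool.submit(document_search, query)
        # .result() blocks until each search finishes (and re-raises its errors).
        return {
            "recent_relevant_revisions": list(recent.result()),
            "relevant_documents": list(established.result()),
        }
```

The overall latency is then roughly that of the slower of the two searches rather than their sum.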
In some embodiments, after presenting a revision search result for a search query, the facility monitors for new document revisions that are relevant to the search query. In some embodiments, the facility immediately adds these to the displayed revision search result as they occur. In some embodiments, the facility displays a visual indication that new document revisions relevant to the search query have been performed; the user can interact with this visual indication, such as by selecting it, in order to display the new relevant document revisions. To extend the example of the “manganese cathodes” search query, 90 seconds after the facility displays information about the two recent relevant revisions, it may determine that a new document revision has been performed that is relevant to the search query; in response, it displays information about this new document revision together with information about the original two recent relevant document revisions.
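The two monitoring behaviors described above (inserting new relevant revisions immediately, or holding them behind a selectable indicator) can be sketched as a standing query that is checked against each newly committed revision. The class, the `on_new_revision` hook, and the "any shared term" relevance test are hypothetical simplifications.

```python
import re
from dataclasses import dataclass, field


def terms_of(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class LiveRevisionResult:
    """A displayed revision search result that keeps watching for new matches.

    auto_insert=True shows new relevant revisions immediately; otherwise they
    are held back behind an "N new" indicator until the user selects it.
    """
    query: str
    shown: list = field(default_factory=list)
    auto_insert: bool = True
    pending: list = field(default_factory=list)

    def on_new_revision(self, doc_id, changed_text):
        # Assumed to be called for every revision as it is committed.
        if not terms_of(self.query) & terms_of(changed_text):
            return  # not relevant to this query
        target = self.shown if self.auto_insert else self.pending
        target.append(doc_id)

    def indicator(self):
        """Text of the visual indication; empty when nothing is pending."""
        return f"{len(self.pending)} new" if self.pending else ""

    def select_indicator(self):
        # The user selected the indicator: reveal the held-back revisions.
        self.shown.extend(self.pending)
        self.pending.clear()
```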
In some embodiments, the facility includes, with information about each recent or new relevant document revision, one or more controls for interacting with the author responsible for that revision. These can include controls for sending the author an asynchronous message (such as an email message, a text message, an instant message, a voice message, a meeting scheduling request, etc.) or for interacting with the author in real time (such as in a voice call, a video call, a text chat session, a collaborative editing session focused on the document, etc.).
In some embodiments, the facility uses a time-segmented revision index, in which the revision index is divided into segments, each representing the revisions that occurred in a distinct period of time. For example, a first index segment may represent all of the revisions that occurred between 09:05:11 and 09:05:12, a second index segment may represent all of the revisions that occurred between 09:05:12 and 09:05:13, etc. In some such embodiments, the facility traverses the index segments from the latest toward the earliest, and terminates the traversal once an adequate number of recent relevant revisions have been identified.
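A minimal sketch of the newest-first traversal with early termination, assuming one-second segments as in the example above. Segments are numbered by `floor(timestamp / segment_seconds)` and kept in a dictionary; the class, method names, and the requirement that a revision involve every query term are assumptions. A production index would more likely keep segments in an append-only, time-ordered structure instead of sorting segment numbers on each query.

```python
import math
import re
from collections import defaultdict


def terms_of(text):
    """Lower-cased word tokens; a crude stand-in for a real tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class SegmentedRevisionIndex:
    """Revision index divided into fixed-width time segments (1 second by default)."""

    def __init__(self, segment_seconds=1.0):
        self.segment_seconds = segment_seconds
        # segment number -> term -> list of (timestamp, doc_id) postings
        self.segments = defaultdict(lambda: defaultdict(list))

    def add(self, timestamp, doc_id, changed_text):
        seg = math.floor(timestamp / self.segment_seconds)
        for term in terms_of(changed_text):
            self.segments[seg][term].append((timestamp, doc_id))

    def recent_matches(self, query, wanted):
        """Visit segments newest-first; stop as soon as `wanted` matches are found."""
        query_terms = terms_of(query)
        found = []
        for seg in sorted(self.segments, reverse=True):
            postings = self.segments[seg]
            # Keep revisions in this segment that involve every query term
            # (AND semantics, an assumption made for this sketch).
            hits = None
            for term in query_terms:
                entries = set(postings.get(term, ()))
                hits = entries if hits is None else hits & entries
            # Newest revisions within the segment first.
            for entry in sorted(hits or (), reverse=True):
                found.append(entry)
                if len(found) >= wanted:
                    return found  # early termination: older segments are never read
        return found
```

Because the traversal stops once enough matches are found, a query for a handful of recent revisions typically touches only the newest few segments.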
In some embodiments, the facility identifies document revisions relevant to each query without regard for when the revision was made.
In some embodiments, in identifying document revisions relevant to a query, the facility considers revisions of particular relevance to the user issuing the query, such as by identifying revisions based on how close the user who performed each revision is to the querying user in a social graph or an organizational graph.
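One possible way to account for closeness in a social or organizational graph is to compute hop distances from the querying user with a breadth-first search and scale each revision's relevance score by a proximity weight. This is a sketch under stated assumptions: the graph is an undirected adjacency mapping, the weight `1 / (1 + distance)` and the zero weight for unreachable authors are hypothetical choices, and revisions are simplified to `(doc_id, author, relevance_score)` tuples.

```python
from collections import deque


def graph_distances(graph, source):
    """Breadth-first hop counts from `source`; `graph` maps user -> set of neighbors."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        user = queue.popleft()
        for neighbor in graph.get(user, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[user] + 1
                queue.append(neighbor)
    return dist


def proximity_weight(distance):
    """1.0 for the querying user, 0.5 one hop away, ...; 0.0 if unreachable."""
    return 0.0 if distance is None else 1.0 / (1 + distance)


def order_by_proximity(revisions, graph, querying_user):
    """Rank (doc_id, author, relevance_score) tuples by relevance times proximity."""
    dist = graph_distances(graph, querying_user)

    def score(rev):
        _, author, relevance = rev
        return relevance * proximity_weight(dist.get(author))

    return sorted(revisions, key=score, reverse=True)
```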
By performing in some or all of these ways, the facility makes it easy for a user to learn about and engage with current work that is relevant to the user's search query, and therefore the user's present interests or needs.
Also, by performing in some or all of the ways described above and storing, organizing, and accessing information relating to document revisions in an efficient way, the facility meaningfully reduces the hardware resources needed to store and exploit this information, including, for example: reducing the amount of storage space needed to store the information relating to document revisions; and reducing the number of processing cycles needed to store, retrieve, or process the information relating to document revisions. This allows programs making use of the facility to execute on computer systems that have less storage and processing capacity, occupy less physical space, consume less energy, produce less heat, and are less expensive to acquire and operate. Also, such a computer system can respond to user requests pertaining to information relating to document revisions with less latency, producing a better user experience and allowing users to do a particular amount of work in less time.
In some embodiments, the facility provides a method in a computing system for reporting on a corpus of documents, comprising: receiving a user-specified search query; in response to the receiving, (a) among documents in the corpus, identifying a proper subset that have each (1) been modified in a manner relevant to the search query (2) at a recent time; and (b) for each of at least a portion of the identified documents, causing to be presented information describing the document.
In some embodiments, the facility provides one or more instances of computer-readable media collectively having contents configured to cause a computing system to perform a method for reporting on a corpus of documents, the method comprising: receiving a user-specified search query; in response to the receiving, (a) among documents in the corpus, identifying a proper subset that have each (1) been edited in a manner relevant to the search query (2) at a recent time; and (b) for each of at least a portion of the identified documents, causing to be presented information describing the document.
In some embodiments, the facility provides a computing system, comprising: a processor; and a memory, the memory having contents that, when executed by the processor, cause the computing system to perform a method for reporting on a corpus of documents, the method comprising: receiving a user-specified search query; in response to the receiving, (a) among documents in the corpus, identifying a proper subset that have each (1) been edited in a manner relevant to the search query (2) at a recent time; and (b) for each of at least a portion of the identified documents, causing to be presented information describing the document.
In some embodiments, the facility provides one or more instances of computer-readable media collectively storing an index data structure reflecting revisions each to a document among a corpus of documents, comprising: information that, for each of a plurality of terms, identifies edits each to a document among the corpus that involve the term, such that, for a search query comprising one or more terms, the information is usable to identify edits to documents among the corpus that are relevant to the search query.
It will be appreciated by those skilled in the art that the above-described facility may be straightforwardly adapted or extended in various ways. While the foregoing description makes reference to particular embodiments, the scope of the invention is defined solely by the claims that follow and the elements recited therein.
Number | Date | Country |
---|---|---|
20180189277 A1 | Jul 2018 | US |