Document viewing and editing applications (hereafter collectively referred to as document viewers or content processing applications) provide users with the ability to read, edit, and specify a variety of annotations for documents, images, and other digital content. Examples of such applications include iBooks® and iBooks Author®, both developed and licensed by Apple, Inc. These applications allow users to make a variety of annotations, including highlights of text, notes corresponding to particular highlights, bookmarks, and other annotations.
Over time, a user may create numerous annotations for one particular version of a document, including numerous highlights of text throughout the document, various notes associated with the highlights, and various bookmarks on different pages of the document. The user may subsequently obtain a newer version of the document on their device. However, the newer version of the document will not contain any of the user's previously specified annotations. If the user wishes to carry over their annotations from the first version of the document, the user will have to manually examine each annotation they made in the previous version of the document and determine where to create the same annotation (e.g., highlight) in the new version of the document. The user will also have to re-specify each bookmark and note for each annotation in the new version of the document. This will likely be a time-consuming and onerous task for the user, especially in situations where the user has a significant number of annotations. Furthermore, the task becomes even more difficult when the text within the newer version of the document has been rearranged to different locations within the document, which requires the user to search throughout the new version of the document to find the corresponding location for an annotation.
Some embodiments provide a content processing application with a novel annotation migration operation that allows the application to automatically migrate annotations for a first version of a content to a second version of the content. Each version of the content includes a number of content segments. The first version also includes at least one particular annotation that is specified for at least a first set of content segments in the first version.
The content processing application examines different sets of content segments in the second version to identify in an automated manner a particular set of content segments that matches the first set of content segments. Upon identifying a matching particular set of content segments, the content processing application associates the particular annotation with the particular set of content segments in the second version. The content processing application can then provide a presentation of the second version with the particular annotation for the matching particular set of content segments. In some embodiments, a user specifies the particular annotation for the first set of content segments in the first version. Examples of such annotation include user-specified notes, user-specified highlights, user-specified bookmarks and/or other user-specified annotations. In some embodiments, the content processing application automatically creates certain annotations on behalf of the user, such as implicit bookmarks that identify the last reading position of the user within a document.
In some embodiments, the first set of content segments includes a second content segment set that is annotated and a third content segment set that includes one or more content segments that are selected near the second content segment set in order to define a context around the second content segment set. When examining different sets of content segments in the second version, the content processing application in some embodiments analyzes content segment sets within a particular section of the second version that corresponds to a section in the first version. Alternatively, or conjunctively, when examining different sets of content segments in the second version, the content processing application in some embodiments (1) uses one or more of the content segments in the first set of content segments to derive a search string, and (2) applies the search string to a search index to identify a portion of the second version that contains the different content segment sets.
In some embodiments, the content is a document and the content processing application is a document viewer that presents the document. The content segments in the document in some embodiments include words, images, and/or other content segments (such as audio or video segments) that can be placed in the document viewer. In these embodiments, the annotations are specified for a first set of content segments (e.g., a first set of words, or a first set of words and images) in a first version of a document. The document viewer examines different sets of content segments in the second version to identify a particular content segment set that matches the first content segment set which has an associated particular annotation. Upon identifying a matching particular content segment set, the document viewer associates the particular annotation with the particular content segment set in the second version. The document viewer displays the second version with the particular annotation associated with the matching particular content segment set.
The preceding Summary is intended to serve as a brief introduction to some embodiments of the invention. It is not meant to be an introduction or overview of all inventive subject matter disclosed in this document. The Detailed Description that follows and the Drawings that are referred to in the Detailed Description will further describe the embodiments described in the Summary as well as other embodiments. Accordingly, to understand all the embodiments described by this document, a full review of the Summary, Detailed Description and the Drawings is needed. Moreover, the claimed subject matters are not to be limited by the illustrative details in the Summary, Detailed Description and the Drawings, but rather are to be defined by the appended claims, because the claimed subject matters can be embodied in other specific forms without departing from the spirit of the subject matters.
The novel features of the invention are set forth in the appended claims. However, for purposes of explanation, several embodiments of the invention are set forth in the following figures.
In the following detailed description of the invention, numerous details, examples, and embodiments of the invention are set forth and described. However, it will be clear and apparent to one skilled in the art that the invention is not limited to the embodiments set forth and that the invention may be practiced without some of the specific details and examples discussed.
Some embodiments of the invention provide a document viewer with a novel annotation migration tool that allows the application to automatically migrate annotations for a first version of a document to a second version of the document. Examples of such a document viewer include a document reader (e.g., an electronic book reader), a document editor (e.g., a word processing application that allows the viewing and editing of a document), a web browser, or any other application through which a document can be viewed. Examples of such annotations include user-specified notes, user-specified highlights, user-specified bookmarks, and/or other user-specified annotations. In some embodiments, the content processing application automatically creates certain annotations on behalf of the user, such as implicit bookmarks that identify the last reading position of the user within a document.
Each version of the document includes a number of content segments. The document viewer examines different content segment sets in the second version to identify a particular content segment set that matches a first content segment set in the first version for which a particular annotation has been specified. Upon identifying a matching particular content segment set, the document viewer associates the particular annotation with the particular content segment set in the second version. The document viewer displays the second version with the particular annotation associated with the matching particular content segment set.
A user may have specified this highlighting, as in some embodiments a user may highlight different portions (e.g., character, text, word, image, and/or other audio, image, or video content segments) of the document. Such highlights get stored as annotations within the document in some embodiments. The document viewer further provides the user with the ability to perform certain other functions for each highlight, including adding notes for the highlight, searching the document or the web for other locations that contain the highlighted text, and various other functions. The document viewer will display the same portion as highlighted anytime the portion is subsequently displayed to the user on the user's device. The document viewer also allows the user to view any notes previously specified for any highlighted portion of the document. In some embodiments, the document viewer allows a user to specify note annotations without highlighting any portion of the document.
The second, third and fourth views 110-120 in
The second view 110 illustrates a basic example in which the text in the second version that corresponds to the annotated text in the first version is at the same relative position in the second version as the annotated text is in the first version. As illustrated in the second view 110, the word string “the 8th largest economy in the world” 160 appears on page 10 of Chapter 2 in the second version, which is the exact same page and chapter on which it appeared in the first version. Given that the annotated text appears in the exact same location in the first and second versions, the migration tool highlights the word string “the 8th largest economy in the world” 160 in the second version to match the specified highlight 150 in the first version. In some embodiments, the migration tool performs this annotation migration when the document viewer opens the second version for the first time. As further described below, the migration tool in some embodiments might perform this migration at different times or in different ways, such as upon downloading of the second version, or in a background mode while a user is viewing the second version of the document, or at some other time and/or in some other manner.
The first example that is illustrated by the second view 110 may occur when the author or publisher of the book only added new chapters to the end of the document and thus left the initial chapters that were part of the first version unchanged. In this situation, the document viewer migrates each annotation to the same corresponding content segment set (e.g., word string) within the second version, which appears at the same relative location in the second version that the originally annotated content segment set appears in the first version.
The second example illustrated in the third view 115 presents a more complicated situation in which the second version of the document provides some additional text that was not included in the first version of the document. As such, the text in Chapter 2, version 1 and Chapter 2, version 2 is not identical. In particular, the second version has added the additional text string “As of the year 2012,” 170 to the sentence that precedes the sentence that contains the annotated word string in the first version. Furthermore, the particular page of Chapter 2 now appears on page 13 in version 2, rather than on page 10 as in version 1.
Despite these changes, the document viewer has still been able to successfully migrate the annotation 150 into the second version of the document, as illustrated by the highlighted “the 8th largest economy in the world” 180. As such, the document viewer has successfully identified the appropriate word string and corresponding location within the second version in order to migrate and incorporate the particular annotation. This situation is common when a second version of a document provides additional paragraphs or sections within a new version of a particular chapter that contains annotations. Thus, the document viewer recognizes that the particular word string for which it has to incorporate an annotation may not be at the exact location within the chapter as the original annotation, but will likely be in a relatively close, or nearby location. The document viewer need only search within the nearby vicinity, or sections, of the original location to identify the word strings and the correct location for which it has to incorporate the particular annotation for the content segment.
The third example illustrated in the fourth view 120 presents an example of the document viewer successfully migrating an annotation to a completely different chapter of the document. In certain situations, the author or publisher of the document may move various content segments, including text and paragraphs, from one particular location within the document to a completely different location within a subsequent version of the document. As illustrated by the fourth view, the text string “the 8th largest economy in the world” 190 now appears within a completely different chapter in the second version. In particular, this text string now appears within Chapter 5 of the document, and not Chapter 2. However, the document viewer has successfully recognized the word string “the 8th largest economy in the world” 190 in Chapter 5 and has successfully migrated the annotated highlights for this word string from the first version to the presentation of this word string in Chapter 5 of the second version.
To successfully identify the appropriate locations of matching content segment sets (e.g., word strings) in different versions of a document, the document viewer performs different content-segment matching processes in different embodiments. For instance, in some embodiments, the content-segment matching process of the document viewer initially examines one or more content segment sets (e.g., word strings) at or near a location within a section of the second version that corresponds to a location in a section in the first version that contains the annotated content segment set, in order to find the matching particular content segment set. When it finds a matching content segment set at or near the initially searched section of the second version, the content-segment matching process associates the annotation with the matching content segment set. When the process finds multiple matching content-segment sets at or near the initial search location, the process in some embodiments selects the set that is closest to the relative position of the annotated content-segment set in the first version, or selects the closest set in a particular direction (e.g., the closest to the right of the original location).
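The nearby-location search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the matching process of any particular embodiment: content segments are assumed to be plain lowercase words, and the function name `find_nearest_match`, the `window` parameter, and the sample text are hypothetical. When two matches are equally close, the sketch prefers the one to the right of the original location, which is one of the tie-breaking options mentioned above.

```python
from typing import List, Optional


def find_nearest_match(
    segments: List[str],
    target: List[str],
    original_offset: int,
    window: int = 50,
) -> Optional[int]:
    """Return the start offset of the match of `target` nearest `original_offset`.

    Only start offsets within `window` segments of the original offset are
    examined. Returns None when no match is found in that range.
    """
    n = len(target)
    if n == 0:
        return None
    lo = max(0, original_offset - window)
    hi = min(len(segments) - n, original_offset + window)
    candidates = [i for i in range(lo, hi + 1) if segments[i:i + n] == target]
    if not candidates:
        return None
    # Smallest distance wins; on a tie, prefer the match to the right.
    return min(candidates, key=lambda i: (abs(i - original_offset), -i))


# Hypothetical second-version text: five words were inserted before the highlight.
version2 = ("as of the year 2012 california is the most populous state "
            "california has the 8th largest economy in the world").split()
highlight = "the 8th largest economy in the world".split()

# In the first version the highlight started at offset 8.
print(find_nearest_match(version2, highlight, original_offset=8))
```

Here the highlight has shifted from offset 8 to offset 13, and the windowed search still finds it without scanning the whole document.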
When the process does not find the matching content-segment set at or near the initially searched location, the process in some embodiments (1) uses one or more of the content segments in the first set of content segments to derive a search string, and (2) applies the search string to a search index to identify another section of the second version to search in order to find the matching particular content segment set. Alternatively, the content-segment matching process in some embodiments uses a search index to identify another chapter that may contain the matching content-segment set, and only does this when it makes a determination that the second version does not contain the section with the annotated content-segment set from the first version. In some such embodiments, the content-segment matching process places the “orphaned” annotation at a particular default location (e.g., the end) of the second-version chapter that corresponds to the first-version chapter with the annotated content-segment set. This placement informs the user that the section containing the annotated text in the first version cannot be found in the second version.
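The search-index fallback and the default placement of an orphaned annotation can be illustrated with a small sketch. Everything here is an assumption made for illustration: the index is a simple word-to-chapter inverted index, the search string is just the set of words in the highlight, and the names `build_index`, `candidate_chapters`, and `locate_or_orphan` are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def tokenize(text: str) -> List[str]:
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(chapters: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each word to the set of chapter IDs that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for chapter_id, text in chapters.items():
        for word in tokenize(text):
            index[word].add(chapter_id)
    return index


def candidate_chapters(index: Dict[str, Set[int]], highlighted: str) -> List[int]:
    """Return chapters that contain every word of the derived search string."""
    words = tokenize(highlighted)
    if not words:
        return []
    return sorted(set.intersection(*(index.get(w, set()) for w in words)))


def locate_or_orphan(
    chapters: Dict[int, str],
    index: Dict[str, Set[int]],
    highlighted: str,
    original_chapter: int,
) -> Tuple[int, int, bool]:
    """Return (chapter_id, word_offset, orphaned).

    If no chapter contains the highlighted words in sequence, the annotation
    is placed at the end of the chapter corresponding to the original one.
    """
    target = tokenize(highlighted)
    for chapter_id in candidate_chapters(index, highlighted):
        words = tokenize(chapters[chapter_id])
        for i in range(len(words) - len(target) + 1):
            if words[i:i + len(target)] == target:
                return chapter_id, i, False
    end_of_chapter = len(tokenize(chapters.get(original_chapter, "")))
    return original_chapter, end_of_chapter, True


# Hypothetical second version in which the sentence moved from Chapter 2 to Chapter 5.
chapters_v2 = {
    2: "California is the most populous state.",
    5: "California has the 8th largest economy in the world.",
}
index_v2 = build_index(chapters_v2)
print(locate_or_orphan(chapters_v2, index_v2, "the 8th largest economy in the world", 2))
```

The index narrows the search to chapters that contain all of the highlighted words before any sequence comparison is attempted, so only a few chapters need to be scanned in full.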
In order to successfully migrate the annotations between different versions of a document, the document viewer creates and stores a variety of data for each particular annotation, which it may later use to perform its content-segment matching process. The data includes the location of the annotation within the document (e.g., chapter, section, offset), the content of the annotation (e.g., the highlighted content segments, the surrounding text of the highlight), and certain document-specific information including the particular version of the document in which the annotation was created.
The document viewer in some embodiments uses a hierarchical data structure for efficiently storing and accessing document data.
The hierarchical data structure illustrated in
In some embodiments, each chapter, section and body layer node has an associated identifier (ID) value (not shown) that uniquely identifies the node. As further described below, each particular content segment in the body layer can be uniquely specified in terms of the chapter ID, the section ID, the body ID, and an offset value in the body layer. The chapter, section and body IDs can be used to identify the body layer in which the particular content segment resides, while the offset value can be used to identify the location of the particular content segment within the body layer. In some embodiments, the offset value is a number that specifies the number of content segments that precede the particular content segment in the body layer.
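As a rough illustration of this addressing scheme, the sketch below models the hierarchy as nested dictionaries keyed by chapter ID, section ID, and body ID, with each body layer held as a list of content segments. The nested-dictionary layout and the names `SegmentAddress` and `resolve` are assumptions made for the example; the text above does not prescribe a concrete representation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SegmentAddress:
    chapter_id: int
    section_id: int
    body_id: int
    offset: int  # number of content segments that precede the target segment


# Hypothetical layout: chapter ID -> section ID -> body ID -> content segments.
Document = Dict[int, Dict[int, Dict[int, List[str]]]]


def resolve(document: Document, address: SegmentAddress) -> str:
    """Return the content segment that the address points to."""
    # The three IDs select the body layer; the offset selects the segment in it.
    body_layer = document[address.chapter_id][address.section_id][address.body_id]
    return body_layer[address.offset]


document: Document = {
    2: {1: {1: "California has the 8th largest economy in the world".split()}},
}
address = SegmentAddress(chapter_id=2, section_id=1, body_id=1, offset=3)
print(resolve(document, address))
```

Because the offset counts preceding segments, an offset of 3 selects the fourth segment in the body layer.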
As illustrated in
For each annotation, the user may specify a note to associate with the annotation. In some embodiments, the user-specified notes for an annotation are stored in that annotation's data structure or in a data structure associated with (e.g., linked to) the annotation data structure. Several examples of annotation and note data structures are provided below.
By storing the various annotation and document information in the tree structure 200, the document viewer can quickly migrate annotations between different versions of a document. The viewer simply steps through the different annotation data structures and tries to identify content segment sets in the new version of the document that match the annotated content segment sets that are identified in the data structure of the document's previous version. For instance, in some embodiments, the viewer tries to identify the matching word string in a later version for each annotated word string in the earlier version by initially examining the body layer of the section in the later version that corresponds to the section in the earlier version with the annotated word string. If it determines that the corresponding section has been deleted in the later version, then it uses the words in the word string to derive a search string in some embodiments, and then uses a search index to identify other chapters that have other sections with other body layers that might contain the matching search string.
As mentioned above, the document viewer stores the context for each annotation, and uses this context to identify potential matching content segment sets (e.g., matching word strings) in subsequent versions of the document. In other words, by using information regarding the context of a particular annotation, the document viewer can provide a greater level of accuracy for migrating annotations. One example of such context includes the surrounding text adjacent to a particular highlighted word string.
The second and third views 310 and 315 in
The third view 315 illustrates a portion of Chapter 10 of the second version, which also contains the word string “the 8th largest economy in the world” on page 99 of the document. In this situation, both the annotation text string and the context text string match at this particular location. Specifically, page 99 includes the pre-text string “most populous state. California has” and post-text string “The capital of California is” before and after the word string “the 8th largest economy in the world.” As such, the document viewer migrates the annotation to this particular word string at this particular location within the second version of the document.
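The role of the context strings can be sketched as follows. This is a simplified illustration that assumes exact word-for-word comparison of the pre-text and post-text; the function name `match_with_context` and both sample passages are hypothetical, and an actual embodiment might tolerate partial context matches.

```python
from typing import List, Optional


def match_with_context(
    words: List[str],
    pre: List[str],
    target: List[str],
    post: List[str],
) -> Optional[int]:
    """Return the offset of the first occurrence of `target` whose adjacent
    words equal `pre` before it and `post` after it, or None if there is none."""
    n = len(target)
    for i in range(len(words) - n + 1):
        if words[i:i + n] != target:
            continue
        before = words[max(0, i - len(pre)):i]
        after = words[i + n:i + n + len(post)]
        if before == pre and after == post:
            return i
    return None


highlight = "the 8th largest economy in the world".split()
pre_text = "most populous state california has".split()
post_text = "the capital of california is".split()

# Hypothetical passage where the highlighted words appear with different context.
other_passage = "some say texas has the 8th largest economy in the world today".split()
# Hypothetical passage where both the highlighted words and the context match.
matching_passage = ("the most populous state california has the 8th largest "
                    "economy in the world the capital of california is").split()

print(match_with_context(other_passage, pre_text, highlight, post_text))
print(match_with_context(matching_passage, pre_text, highlight, post_text))
```

The first passage contains the highlighted word string but is rejected because its surrounding words differ, which is the distinction the context fields are meant to capture.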
In the example illustrated in
The context of an annotation is particularly useful in situations where the user has highlighted a relatively short phrase. For example, when a user highlights only a single word within the document, the context of the word becomes essential since the word is more likely to appear in numerous locations within the document than a longer phrase containing the word. In some embodiments, the document viewer analyzes more context words when a user highlights a relatively short phrase. In other situations, the document viewer analyzes fewer context words when analyzing a longer word string. Several more detailed embodiments are described below. Section I describes the annotation creation process and data structure for a particular document. Section II describes the annotation migration process for migrating annotations from a first version of a document to a second version of the document. Section III describes the software architecture of a content processor that uses an annotation migration tool in some embodiments. Section IV describes an electronic system that implements some embodiments of the invention.
Different types of annotations (e.g., highlights, notes, bookmarks) can be created for a particular document through several mechanisms. For instance, a user can create a variety of highlight annotations and notes throughout different portions of a document through various user input.
Stage 410 illustrates the document viewer displaying the portion of text, “the 8th largest economy in the world” 150 as highlighted within the document. Furthermore, the document viewer is displaying a toolbar that contains several icons 425-435 of additional tools that the user may access. The style icon 425 allows the user to change the color and style that is used to display the highlighted text. The remove highlight icon 430 allows the user to remove a highlight that was previously made for a portion of text. The notes icon 435 allows the user to add notes to the corresponding highlight. In some embodiments, the toolbar is displayed based on a gesture input from the user. In this example, the user is tapping the touchscreen to display the toolbar. In other embodiments, the toolbar is displayed by other mechanisms (e.g., menu selection).
Stage 415 illustrates the user selecting the notes icon 435 from the toolbar. Stage 420 illustrates the device displaying a notes user interface in which the user has input a note, “This is on the test”, to be associated with the selected text. In some embodiments, the note is stored as a note annotation associated with the highlight annotation. The document viewer also stores a timestamp for each highlight and note annotation. Stage 420 illustrates the user selecting the “Done” icon 440 to indicate that the user has finished adding the note for the particular highlight. After the user selects the “Done” icon 440, the document viewer returns to displaying the same portion of text that was displayed (i.e., as shown at stage 415) prior to the user entering the notes UI screen. The user may then proceed to create other highlights in other locations of the document. All of these highlights and notes get stored as annotations within the document. Furthermore, for each annotation, the document viewer stores a variety of other information, including the location of the annotation within the document, the time the annotation was created and/or edited, the text surrounding the annotation, and various other data.
The process next identifies and stores (at 510) the location data for the particular annotation. The process stores this information in an annotation data structure. The location data identifies the precise location of the word string within the document. This location can be specified using the organizational structure of the document. For instance, in some embodiments, the process stores the chapter, section, and a word index or offset of the location of the text string within the document. In some embodiments, the process stores the offset of the first word within the document and an offset for the last word within the annotation. In some embodiments, the process stores the page number of the page that contains the word string within the document. As illustrated in stage 405 of
After the process identifies and stores the location data, the process identifies and stores (at 515) text data for the particular annotation. The text data includes the particular highlighted word string indicated by the user. The text data also includes the surrounding context word strings adjacent to the highlighted text. As illustrated in stage 405 of
The process next identifies and stores (at 520) certain book-specific information for the particular annotation. The book information includes the Book ID number, similar to a book's ISBN, as well as the book's version number. Storing a version number for each annotation is important in situations where a user downloads a different version of a book and thus the document viewer needs to migrate annotations between the different versions of the same book.
The process then incorporates (at 525) this annotation data into the set of annotation data for the particular document and version stored on the user's device. The process then ends.
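The operations at 510 through 525 can be summarized in a short sketch. It is only an illustration under assumptions: the document is modeled as a flat list of words for a single section, the store is a dictionary keyed by book ID and version, and the names `AnnotationRecord`, `create_annotation`, and `context_words` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class AnnotationRecord:
    chapter: int
    section: int
    start_offset: int
    end_offset: int
    text: str
    pre_text: str
    post_text: str
    book_id: str
    book_version: str


Store = Dict[Tuple[str, str], List[AnnotationRecord]]


def create_annotation(
    words: List[str],
    chapter: int,
    section: int,
    start: int,
    end: int,  # inclusive offset of the last highlighted word
    book_id: str,
    book_version: str,
    store: Store,
    context_words: int = 5,
) -> AnnotationRecord:
    # 510: location data (chapter, section, first and last word offsets).
    # 515: text data (highlighted string plus surrounding context).
    text = " ".join(words[start:end + 1])
    pre_text = " ".join(words[max(0, start - context_words):start])
    post_text = " ".join(words[end + 1:end + 1 + context_words])
    # 520: book-specific information (book ID and version).
    record = AnnotationRecord(chapter, section, start, end, text,
                              pre_text, post_text, book_id, book_version)
    # 525: incorporate into the annotation set for this book and version.
    store.setdefault((book_id, book_version), []).append(record)
    return record


store: Store = {}
section_words = "California has the 8th largest economy in the world".split()
record = create_annotation(section_words, chapter=2, section=1, start=2, end=8,
                           book_id="A4124", book_version="1.0", store=store,
                           context_words=2)
print(record.text, "|", record.pre_text)
```

Keying the store by book ID and version keeps the annotations of each version separate, which is what later allows a migration step to tell which version an annotation came from.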
Each annotation data structure corresponds to a particular word string at a particular location within the structured electronic document. A brief overview of the relationship between the annotation data structure and the hierarchical tree structure of the electronic document is provided by reference to
The hierarchical data structure illustrated in
As illustrated, each chapter node contains one or more section child nodes, which provide the next level of nodes within the tree structure. Each section node includes a body child node and one or more floating child nodes, which provide another level of nodes within the tree structure. Lastly, each body node includes an inline child node.
Each of the body nodes, floating nodes, and inline nodes may be used as a storage node to store content segments within the electronic book 605. In some embodiments, each storage has an associated identifier, or unique Storage ID, that uniquely identifies the storage. In some embodiments, this Storage ID may be a Globally Unique Identifier, or GUID, within the document. The GUID is a unique identifier that is used to identify a particular storage within the document. In addition, each storage node may be identified within the hierarchical tree structure 600 using location information. In particular, each storage node can be uniquely specified in terms of the chapter ID, section ID, and either a body ID, a floating ID, or an inline ID.
In some embodiments, content is defined within both the body layer and the floating layer. Content in the body layer is placed “in line” (i.e., two pieces of content cannot overlap in the body layer) in some embodiments. In contrast, content within the floating layer can overlap with other content within the floating layer. In other words, content in the floating layer may occlude other content in this layer. Consequently, in these embodiments, adding new content or dragging existing content within the floating layer may result in overlapped content.
Content in the floating layer is not affected by content in the body layer of the document. Content in either the floating or body layer can be replaced with new content without affecting content in the other layer. Thus, the floating object nodes exist within a section of the document independent of the body object nodes. In particular, the body object nodes typically have a relationship to other body object nodes, such as a sequential or in-line relationship, in some embodiments.
When a user highlights a particular word string within the electronic document, as described in
Each of the annotation data structures 615 and 610 is associated with a particular node of the tree structure 600 that contains the word string corresponding to the annotation. The annotation data structures each include the following fields: an Annotation ID, a Storage ID, a Book ID and Version number, a Location ID, a Body Index, a String Text, a String Pre-Text, a String Post-Text, and an Annotation Note.
The Storage ID identifies a particular storage (e.g., body node, floating node, or inline node) within the electronic document structure that contains the content segments, or word string, associated with the particular annotation. As illustrated, annotation data structure 615 with Annotation ID “5” contains within its Storage ID the number “20”. This Storage ID corresponds to body object node 620 in the hierarchical tree structure 600, as illustrated by the arrow from the annotation data structure to this node. Likewise, annotation data structure 610 with Annotation ID “10” contains within its Storage ID the number “30”. This Storage ID corresponds to floating object node 625 in the hierarchical tree structure 600, as illustrated by the arrow from this annotation data structure to this node.
The Book ID is the unique identification number of the book, similar to the book's ISBN. Each annotation is stored specifically for a particular book or document, as identified by its Book ID number. Annotation data structures 615 and 610 both contain the Book ID number A4124, because both annotation data structures 615 and 610 relate to the same book 640.
The Book Version number identifies the particular version of the document in which the annotation was created. As illustrated, annotation data structures 615 and 610 both indicate that they correspond to book version 1.0. This version number is important to the annotation migration process since this process is executed when a device receives a different version of a document from that already stored on the device. The document viewer uses this version number when determining whether to migrate annotations from a particular version of a book to a newly received version of the same book. In some embodiments, the document viewer will only migrate annotations to a subsequent version of a document. Thus, if a user's device currently contains version 3 of a document and subsequently downloads an older version 2 of the document, the document viewer will not migrate the annotations from the version 3 document to the version 2 document in some embodiments. Furthermore, in some embodiments, the cloud storage that automatically backs up data on a user's device will also not accept annotations from an earlier version of a document once a user has obtained a newer version of the document on any of their devices that is synced with the cloud storage. This is described in more detail below with reference to
Furthermore, each annotation data structure includes a Location ID that is used to locate a content segment associated with the annotation within a particular storage. The Location ID identifies the location of the content segment within a particular storage in the hierarchical tree structure 600. The Location ID can be used as an alternative, or supplement, to the Storage ID in certain situations to locate a particular storage. In particular, this Location ID is specified using the particular chapter ID, section ID, body ID or floating ID, and an offset value within the storage. The chapter, section and either the body or floating IDs can be used to identify the storage node that contains the particular content segment, while the offset value can be used to identify the location of the particular content segment within the storage node. In some embodiments, the offset value specifies the number of content segments that precede the particular content segment in the storage.
Annotation data structure 615 contains within its Location ID four values: Chapter 2, Section 1, Body 1, and Offset 10. As such, this annotation corresponds to a word string within an in-line body portion of Chapter 2, Section 1 of the document. The particular word string is at an offset of 10 within this particular section. Likewise, annotation data structure 610 contains within its Location ID four values: Chapter 10, Section 1, Floating 1, and Offset 10. As such, this annotation corresponds to a word string within a floating portion of Chapter 10, Section 1.
The String Text field stores the word string of the highlighted content segment specified by the user for the particular annotation. Annotation data structure 615 contains the highlighted text string “the 8th largest economy in the world.” Likewise, annotation data structure 610 contains the highlighted text string “Texas . . . ”.
The context includes the surrounding text that is adjacent to the highlighted word string. The String Pre-Text and String Post-Text fields store contextual text for each annotation. The document viewer uses this context when identifying potential matching word strings for the particular annotation. Annotation data structure 615 contains within the String Pre-Text field the word string “populous state, California has” and within the String Post-Text field, the word string “The capital of California”.
The Annotation Note field stores user-entered notes associated with an annotation. The Annotation Note field of annotation data structure 615 provides a separate note data structure 630 that stores certain information for the particular note. As illustrated, note data structure 630 includes a Note ID that identifies the particular note, an Associated Annotation ID that identifies the annotation associated with the note, a String Note field that contains the word strings input by the user, and a Book Version number that indicates which version of a book the particular note was created for. As shown in this example, the note 630 specifies values for String Note “This is on the test!” and Book Version “1.0”.
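The fields described above can be pictured as a small record type. The following is only an illustrative sketch in Python, not the data layout of any embodiment: the class and attribute names are hypothetical, and the annotation ID, note ID, and Storage ID values are placeholders because the description above does not state them for annotation data structure 615.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LocationID:
    chapter: int
    section: int
    container_kind: str  # "body" or "floating"
    container: int       # the Body ID or Floating ID
    offset: int          # content segments preceding the annotated segment


@dataclass
class Note:
    note_id: int
    associated_annotation_id: int
    string_note: str
    book_version: str


@dataclass
class Annotation:
    annotation_id: int
    storage_id: Optional[int]  # may be missing or stale in a later version
    book_version: str
    location: LocationID
    string_text: str
    string_pre_text: str
    string_post_text: str
    note: Optional[Note] = None


# Values taken from the description of annotation data structure 615 and
# note data structure 630. The IDs are placeholders (assumptions).
annotation_615 = Annotation(
    annotation_id=1,   # placeholder, not given in the text
    storage_id=None,   # not given in the text
    book_version="1.0",
    location=LocationID(chapter=2, section=1, container_kind="body",
                        container=1, offset=10),
    string_text="the 8th largest economy in the world.",
    string_pre_text="populous state, California has",
    string_post_text="The capital of California",
    note=Note(note_id=1,  # placeholder, not given in the text
              associated_annotation_id=1,
              string_note="This is on the test!",
              book_version="1.0"),
)
```

Keeping the Location ID as its own immutable record makes it easy to compare an annotation's original location against candidate locations in a newer version.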
Using the information from one or more fields of an annotation data structure, the document viewer can locate a word string corresponding to a particular annotation within a document using several mechanisms. In some embodiments, the document viewer may initially use the Storage ID within the annotation data structure to locate a particular word string within the structured electronic document. As each body node, inline node and floating node contains a unique Storage ID, the document viewer can directly access these particular storage nodes using the Storage ID number of that node.
Likewise, during the annotation migration process, in order to identify the expected location of a word string within a second version of a document, the document viewer will first examine the annotation data structure for a particular annotation of a first version of a document to identify the particular storage ID of the annotation. Once the document viewer knows the storage ID value, it can directly access the same storage ID within the second version of the document to examine whether it contains a matching word string.
The Storage ID is particularly useful in situations where an author of a first version of a document reorganizes a second version of the document such that a particular section of the document is now placed in a different location within the second version of the document. For example, if the author of a first version of a document takes the first section of the first chapter and places this in the last section of the last chapter within the second version of the document, the document viewer can quickly identify the correct section to migrate any annotations (e.g., from the first section of the first chapter to the last section of the last chapter) as long as the storage ID values are the same between the first version and the second version for that particular storage node.
In certain situations, a document may not use the same storage ID for corresponding storage nodes in different versions of the document. As such, in some embodiments, the storage ID alone of a particular node in a first version of a document may be insufficient to locate the node within a second version of the document.
In particular, in situations where the document viewer lacks confidence that it has the correct storage ID within a particular version of a document, or where the storage ID does not exist in the second version of the document, the document viewer may rely on other information within the Location ID to locate a particular word string within the document. For example, had annotation data structure 615 not had a value within the Storage ID that correctly identified the body object node 620, the document viewer could use the Location ID information to locate the particular body node 620.
As illustrated, Annotation data structure 615 contains the Location ID value of Chapter 2, Section 1, Body 1, Offset 10. The document viewer uses the Location ID information in order to traverse the tree from the root 640 to the correct storage node. In particular, the document viewer begins at the root node 640 and compares Chapter 2 to each child node of the root node. When the document viewer identifies the correct child node 650 corresponding to Chapter 2, the document viewer proceeds to examine the section level nodes for this chapter node 650. The process next locates the Section 1 node 660. After identifying the correct section node, the process identifies the body object node 620 that contains the particular word string associated with the particular annotation data structure 615. As such, in situations where an annotation data structure does not contain a Storage ID, or contains an inaccurate Storage ID, or a Storage ID that no longer exists, the document viewer may use the Location ID to traverse the hierarchical tree structure to locate a particular storage that contains a particular word string. By storing several types of location information, including the Storage ID and the Location ID, the document viewer can use each particular type of location information when other location information is not available or as a supplement to verify the accuracy of the storage node (body, floating, inline node) that has been identified.
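The two lookup mechanisms described above (direct access by Storage ID, with a fallback traversal by Location ID) can be sketched as follows. This is a minimal Python illustration under stated assumptions: the `Node` class, the helper names, and the Storage ID value 20 are hypothetical and are not taken from the hierarchical tree structure 600 itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    kind: str                         # "root", "chapter", "section", "body", "floating"
    number: int
    storage_id: Optional[int] = None  # only storage nodes carry one here
    children: List["Node"] = field(default_factory=list)


def index_by_storage_id(root: Node) -> Dict[int, Node]:
    """Map every Storage ID in the tree to its node."""
    index: Dict[int, Node] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        if node.storage_id is not None:
            index[node.storage_id] = node
        stack.extend(node.children)
    return index


def find_child(parent: Node, kind: str, number: int) -> Optional[Node]:
    for child in parent.children:
        if child.kind == kind and child.number == number:
            return child
    return None


def locate_storage(root: Node, storage_id: Optional[int],
                   path: List[Tuple[str, int]]) -> Optional[Node]:
    """Try the Storage ID first; fall back to walking the Location ID path."""
    if storage_id is not None:
        node = index_by_storage_id(root).get(storage_id)
        if node is not None:
            return node
    node: Optional[Node] = root
    for kind, number in path:
        node = find_child(node, kind, number)
        if node is None:
            return None
    return node


# Hypothetical tree: root -> Chapter 2 -> Section 1 -> Body 1.
body = Node("body", 1, storage_id=20)  # the Storage ID value is an assumption
section = Node("section", 1, children=[body])
chapter = Node("chapter", 2, children=[section])
root = Node("root", 0, children=[chapter])
location_path = [("chapter", 2), ("section", 1), ("body", 1)]
```

In this sketch, a missing or unknown Storage ID simply falls through to the path walk, mirroring the situations in which the Location ID is used as the alternative.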
Furthermore, by storing the various annotation and document information in this particular organizational structure, the document viewer can quickly migrate annotations between different versions of a document in an accurate and efficient manner. Likewise, by storing the different pieces of information, the document viewer can successfully migrate annotations in a variety of different scenarios.
II. Annotation Migration
Some embodiments of the document viewer provide a novel annotation migration operation that allows the application to automatically migrate annotations for a first version of a document to a second version of the document. Each version of the document includes a number of content segments. The first version also includes at least one particular annotation that is specified for at least a first set of content segments in the first version.
As described above, the content segments in the document in some embodiments include words, images, and/or other content segments (such as audio or video segments) that can be placed in the document viewer. In these embodiments, the annotations are specified for a first set of content segments (e.g., a first set of words, or a first set of words and images) in a first version of a document. The document viewer examines different sets of content segments in the second version to identify a particular content segment set that matches the first content segment set that has an associated particular annotation. Upon identifying a matching particular content segment set, the document viewer associates the particular annotation with the particular content segment set in the second version. The document viewer displays the second version with the particular annotation associated with the matching particular content segment set.
The process 700 begins by extracting (at 710) a particular annotation from a first version of a document, such as a book. In some embodiments, the process incrementally extracts only those annotations of a particular chapter of the book that is currently being displayed on the user's device. In some embodiments, the process extracts all of the annotations when the document viewer opens the second version for the first time. As further described below, the migration tool in some embodiments might perform this process at different times or in different ways, such as upon downloading of the second version, or in a background mode while a user is viewing the second version of the document, or at some other time and/or in some other manner.
The process next determines (at 715) whether a unique matching word string exists at the exact expected location within the second version of the document to the annotated text. For explanation purposes with respect to
When the process 700 determines there is an exact match, the process proceeds to 720, which is described below. When there is not an exact match, the process transitions to 725 to determine if there are multiple matches.
The second, third and fourth views 810-820 in
The second example illustrated in the third view 815 illustrates the example in which some words that were included within the annotation in the first version of the document are deleted from the text in the second version at the exact expected location. As illustrated in the third view 815, the word string 870 that appears on page 10 of Chapter 2 in the second version, which is the same exact page and chapter as in the first version, now states “California has an economy.” As certain words are deleted, in particular “the 8th largest economy” in the second version, the annotation migration tool does not consider this word string 870 to be an exact match at the exact expected location for the particular annotation. Thus the tool does not highlight this word string in the second version of the document.
The third example illustrated in the fourth view 820 illustrates the example in which all of the words that were included within the annotation in the first version of the document are deleted from the text in the second version at the exact expected location. As illustrated in the fourth view 820, the word string 880 that appears on page 10 of Chapter 2 in the second version, which is the same exact page and chapter as in the first version, now is devoid of any text regarding the California economy. By deleting all of the words within the annotation, the annotation migration tool does not detect an exact match at the exact expected location for the particular annotation. Thus the tool does not highlight any word strings in the second version of the document.
Returning to the process of
In some embodiments, the process 700 does not require that an “exact match” (at 715) be made within the exact expected location, but rather that the match meet certain criteria in order to migrate a particular annotation from a first version of a document to a second version of the document. In these embodiments, when the process determines that none of the criteria are satisfied, the process transitions to 725, described below. When sufficient criteria are met, however, the process makes a “fuzzy” match.
View 905 is similar to view 805, view 910 is similar to view 810, and view 915 is similar to view 815 of
After failing to find an exact or “fuzzy” match, the process 700 next determines (at 725) whether there are multiple matches within the expected location (e.g., section) of the second version. As described above, in some embodiments, the expected location is the same section in the second version as the section that contains the annotated text in the first version. When there are not multiple matches at the expected location, the process transitions to 735, which is described below. When there are multiple matches at the expected location, the process transitions to 730 and incorporates the closest matching word string to the original location.
The second, third and fourth views 1010-1020 show the same text as the text of the annotation, but on different pages of the document in a different version from those of the first view. In this example, even though the portions of the document being displayed are on different pages within the same particular chapter, they are within the same section level storage node as related to the hierarchical tree structure illustrated in
In particular, the second view 1010 illustrates that the word string “the 8th largest economy in the world” 1060 appears on page 9 of Chapter 2. The third view 1015 illustrates that the word string “the 8th largest economy in the world” 1070 once again appears on page 15 of Chapter 2 and the fourth view 1020 illustrates that this word string 1080 appears again on page 16 of Chapter 2.
Returning to process 700 of
Referring back to
Although the examples above involve differences in page numbers, in some embodiments the process 700 does not analyze page number differences, but rather the differences in offset between different potential matching locations within a section and the original annotation location within the same section. In some embodiments, when two potential matching locations have an equal difference in offset, the annotation migration process selects the matching location to the right of the original annotation location. In other embodiments, when two potential matching locations have an equal difference in offset, the annotation migration process selects the matching location to the left of the original annotation location.
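The offset-based selection and its tie-breaking rule can be illustrated with a short Python sketch. The function name and the `prefer_right` flag are hypothetical, and the sketch assumes that "to the right" means a larger offset than the original annotation location.

```python
from typing import Iterable, Optional


def closest_match(original_offset: int,
                  candidate_offsets: Iterable[int],
                  prefer_right: bool = True) -> Optional[int]:
    """Pick the candidate offset nearest to the original annotation offset.

    When two candidates are equally distant, prefer_right selects the one
    with the larger offset; otherwise the one with the smaller offset wins.
    """
    def rank(offset: int):
        distance = abs(offset - original_offset)
        is_right = offset > original_offset
        tie_break = 0 if is_right == prefer_right else 1
        return (distance, tie_break)

    return min(candidate_offsets, key=rank, default=None)
```

For an annotation originally at offset 10 with candidates at offsets 4, 16, and 30, the candidates at 4 and 16 are equally distant, so the result depends only on the tie-breaking preference.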
Returning to process 700 of
The second view 1110 illustrates the second version of the document at the same expected location of the annotation 1150. As illustrated in the second view 1110, the word string 1160 that appears on page 10 of Chapter 2 in the second version, which is the same exact page and chapter as in the first version, now is devoid of any text regarding the California economy. Furthermore, the annotation migration tool has not detected any matching word strings within the section for the particular annotation.
The third view 1115 of
Referring back to
If the process 700 determines (at 745) that the expected section has not been deleted, the process incorporates (at 750) the annotations within a chapter-specific “Old Notes” section in the second version of the document. In this particular situation, the process has determined that no matching word string exists within the particular chapter of the second version of the document for the particular annotation, yet the second version still has the corresponding section of the document that was present in the first version of the document. Thus the process retains these annotations for the user within the same particular chapter of the document.
Referring back to
In some embodiments, if the process 700 determines (at 745) that the corresponding section has been deleted in the later version, it then uses (at 755) the words in the word string to derive a search string. In some embodiments, the process applies this search string to a search index in order to identify other chapters that have sections that might contain the search string.
Referring back to
View 1315 of
Rather than searching a document in a linear fashion (e.g., from the beginning to end), the annotation migration process in some embodiments utilizes a specialized search index to locate potential candidate word strings in various locations of the document. In some embodiments, the search index is a pre-compiled summary of the words that appear within the document along with an index of the corresponding location of the words within the document. In some embodiments, the search index is generated at the time that a particular version of a document is created. The search index may be later used by the document viewer to search for words and text throughout the document.
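A search index of the kind described above is commonly realized as an inverted index that maps each word to the locations where it appears. The Python sketch below is one possible illustration, not the index format of any embodiment: the tokenizer, the function names, and the use of (chapter number, word offset) pairs as locations are all assumptions.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

Location = Tuple[int, int]  # (chapter number, word offset within the chapter)


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_search_index(chapters: Dict[int, str]) -> Dict[str, List[Location]]:
    """Map every word to the list of locations at which it occurs."""
    index: Dict[str, List[Location]] = defaultdict(list)
    for chapter_number, text in chapters.items():
        for offset, word in enumerate(tokenize(text)):
            index[word].append((chapter_number, offset))
    return dict(index)


# Sample text adapted from the examples in this description.
chapters = {
    1: "Texas still has a larger area than California",
    2: "California has the 8th largest economy in the world",
}
search_index = build_search_index(chapters)
```

Because the index is built once per document version, a later lookup of a word such as "economy" returns its locations without scanning the document from beginning to end.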
The process 1400 in some embodiments is performed by the annotation migration tool of the document viewer. The process 1400 initially receives (at 1405) an annotated word string to use as a search string to identify chapters in a different version that may contain a matching word. As mentioned above, the document viewer quickly identifies such chapters that contain some or all of the words in the search string. In other embodiments, the document viewer uses other schemes to specify when it should examine other sections and/or chapters for matching content segment sets. For example, in some embodiments, the document viewer not only uses the content segments (e.g., words) in the annotated content segment set (e.g., in the annotated word string) to identify a search string that is applied to a search index in order to identify the appropriate chapter or section for examination, but also uses the content segments (e.g., words) in the context to identify the search string. Also, in some embodiments, the document viewer examines other chapters or sections even when the section that contained the annotated content segment set is not deleted in the newer version of the document.
Referring back to
When the process 1400 identifies (at 1415) the location of a word within the document, the process next compares (at 1420) the surrounding candidate text of the word to the annotated text to determine whether they match. If the process 1400 determines (at 1425) that the annotated text does not match the surrounding candidate text, the process transitions to 1435. If the annotated text matches the surrounding candidate text, the process returns (at 1430) the location information of the identified word and then transitions to 1435.
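The comparison of candidate text to annotated text can be sketched as follows. This Python example is illustrative only: it assumes that documents are already split into lowercase words per chapter, that the locations of one anchor word have been obtained from a search index, and that a match means the surrounding words are identical to the annotated words. The function and parameter names are hypothetical.

```python
from typing import Dict, List, Tuple


def find_annotated_text(chapter_words: Dict[int, List[str]],
                        anchor_word: str,
                        anchor_locations: List[Tuple[int, int]],
                        annotated_words: List[str]) -> List[Tuple[int, int]]:
    """Return (chapter, start offset) for each anchor location whose
    surrounding words equal the annotated words."""
    # Position of the anchor word inside the annotated string
    # (first occurrence, which is an assumption of this sketch).
    anchor_position = annotated_words.index(anchor_word)
    matches = []
    for chapter, offset in anchor_locations:
        start = offset - anchor_position
        if start < 0:
            continue  # the annotated string cannot fit before this location
        window = chapter_words[chapter][start:start + len(annotated_words)]
        if window == annotated_words:
            matches.append((chapter, start))
    return matches


chapter_words = {
    2: "california has the 8th largest economy in the world".split(),
    5: "texas has a large economy too".split(),
}
annotated = "the 8th largest economy in the world".split()
# Locations of the anchor word "economy", as a search index might report them.
economy_locations = [(2, 5), (5, 4)]
found = find_annotated_text(chapter_words, "economy", economy_locations, annotated)
```

Only the Chapter 2 location survives the comparison, since the text around "economy" in Chapter 5 differs from the annotated word string.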
In
In some embodiments, the process detects and examines (at 1410) multiple “uncommon” words in the annotation word string 1520 (including the pre-text and post text). For example, as illustrated in
In some embodiments, the process detects the locations of every “uncommon” word within the annotated word string 1520 using the search index and only examines the locations that contain all of these words. For example, as illustrated in
Returning to
Referring back to
As described above, in some embodiments the annotation migration tool automatically migrates annotations for a first version of a document to a second version of the document. In some embodiments, the process migrates all of the annotations when the document viewer opens the second version for the first time. As further described below, the migration tool in some embodiments performs this process in a background mode while a user is viewing a particular chapter within the second version of the document.
The second stage 1710 illustrates that the document viewer now displays the notes user interface. The user interface includes the list of chapters within the document and the corresponding annotations for each chapter. The user interface currently indicates that there is one annotation 1730 for Chapter 1. The annotation for this chapter contains the highlighted word string “Texas still has a larger area than California” and the corresponding note “This is important.” Furthermore, an ellipsis (i.e., “ . . . ”) 1735 is shown for each of the remaining list of chapters. In some embodiments, an ellipsis 1735 is shown in lieu of a number because the number of annotations is not currently known. In this case, since the document viewer has only migrated the annotations from the first chapter of the document, the other chapters' annotations have not yet been migrated by the document viewer and, thus, the document viewer is not aware of the number of annotations within these chapters. In some embodiments, the document viewer migrates these annotations on an incremental (e.g., chapter by chapter, section by section, page by page) basis in order to optimize the performance of the device. This is particularly important on devices with limited resources (e.g., processing power, memory, battery life). In these embodiments, the document viewer only migrates those annotations within a particular portion of the document (e.g., portions that the user is viewing, or about to view on their device).
The third stage 1715 illustrates the document viewer displaying page 10 of Chapter 2 of the document. In this portion, the text “the 8th largest economy in the world” has been highlighted as an annotation 1770 within this particular chapter of the document. The user is once again selecting the “Notes” icon 1760 in order to view the annotations (highlights and notes) within the document.
The fourth stage 1720 illustrates the document viewer displaying the notes user interface. The user interface now indicates two annotations 1780 for Chapter 2, in addition to three annotations 1730 for Chapter 1. The annotation for Chapter 2 contains the highlighted word string “the 8th largest economy in the world” and the corresponding note word string “This is on the test!” Furthermore, the remaining chapters (Chapters 3-6) still display an ellipsis (i.e., “ . . . ”) 1735 in place of the number of annotations. At this point, the document viewer has only migrated the annotations from the first and second chapters of the document.
In some embodiments, the annotation migration process will only migrate the annotations from the chapter that the user is currently viewing. In other embodiments, the annotation migration process uses a priority queue and migrates first the annotations from the currently viewed chapter, and subsequently migrates the annotations from the other chapters in the background. In some embodiments, if the user skips several chapters to view a new different chapter, the annotation migration process skips those chapters as well and migrates the annotations from the new chapter.
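The priority-queue variant described above can be sketched with a binary heap. In this Python illustration, the function name is hypothetical, and processing the remaining chapters in document order is an assumption; the description only states that the currently viewed chapter comes first and the others follow in the background.

```python
import heapq
from typing import List


def migration_order(chapters: List[int], current_chapter: int) -> List[int]:
    """Order chapters for migration: the viewed chapter first, then the rest."""
    # Priority 0 for the chapter being viewed, 1 for all others; the list
    # position breaks ties so the others keep their document order.
    heap = [(0 if chapter == current_chapter else 1, position, chapter)
            for position, chapter in enumerate(chapters)]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, chapter = heapq.heappop(heap)
        order.append(chapter)
    return order
```

For a six-chapter document in which the user jumps to Chapter 6, this ordering migrates Chapter 6 first and then Chapters 1 through 5.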
The second stage 1810 illustrates that the document viewer now displays the notes user interface. The notes user interface includes the list of chapters within the document and the corresponding annotations for each chapter. The user interface indicates that there are three annotations 1820 for Chapter 1 and two annotations 1830 for Chapter 2. The user interface also displays an ellipsis 1850 for Chapters 3-5. Furthermore, the user is selecting Chapter 6 to view the annotations. Since the user had not previously viewed Chapter 6, the document viewer has not migrated these annotations into the document. As such the document viewer displays an “Updating Notes” 1840 message to notify the user that the document viewer is currently synchronizing the annotations for this particular chapter within the document.
The third stage 1815 now illustrates the document viewer displaying the annotations for Chapter 6 of the document. This chapter currently contains two annotations 1890. Each annotation includes the highlighted word string within the document and the corresponding note. The document viewer has detected a matching word string within the document for each of these annotations, and thus has not placed them within the old notes section of the chapter. Furthermore, as the user skipped Chapters 3-5 and proceeded directly to Chapter 6 from Chapter 2, the annotations for Chapters 3-5 have not yet been incorporated into the document. As illustrated, Chapters 3-5 currently display ellipsis 1880 rather than a number to notify the user that these annotations have not yet been migrated.
The notes user interface of the document viewer also provides the user with a variety of tools and features. The tools include the ability to search for a particular annotation within the entire document, the Internet, or Wikipedia.
The first stage 1905 illustrates a document viewer executing on a device. The document viewer is currently displaying the notes view of the application. The notes view provides a list of the chapters within the document as well as the annotations that have been made within each particular chapter. The user is currently viewing the “Old Notes” section that contains annotations that have been migrated from a previous version of the document, but that were not matched to any particular word string within the current version of the document. The Old Notes include two highlighted word strings with two corresponding notes. The first highlight contains the words “Texas is the second most extensive state in the United States.” The note corresponding to this highlight states “This is amazing.” Stage 1905 also illustrates the user selecting this particular highlighted annotation through a tapping gesture on the particular highlight. In some embodiments, after a user taps the highlight for a particular amount of time, the document viewer selects the particular annotation, as illustrated by the highlighting of the text 1930.
The second stage 1910 illustrates the document viewer now displaying a toolbar overlaid on the highlighted text string. The user is also selecting a “Search” icon 1935 that will cause the document viewer to search the document for all locations that contain the particular highlighted word string. Stage 1915 illustrates that the user interface now displays a popover toolbar 1940 overlaid on the notes view of the user interface. Within this popover toolbar 1940, a list of the locations that contain this particular word string is listed. The first location is on page 5 of the document and the second location is on page 3 of the document. The popover also gives the user the option to search the web for the particular word string or to search Wikipedia. Furthermore, a user may modify the particular word string by typing within the search field of the popover user interface. As illustrated, the user is selecting the word string located on page 5 of the document. Stage 1920 illustrates that the document viewer now displays page 5 of the document that contains the corresponding matching word string. Furthermore, the user has selected the word string to identify it as a highlighted annotation within the document. The user is about to select the “Notes” icon 1945 of the toolbar in order to add a note for the particular highlight.
In
The notes view of the document viewer also provides the user with the ability to copy a particular annotation (either a highlighted portion of text or a corresponding note) and paste the annotation at various different locations.
The first stage 2005 illustrates the user tapping a particular annotation within their “Old Notes” which causes that particular annotation to be highlighted. The second stage 2010 illustrates the document viewer, in response to the user tapping on their “Old Notes”, displaying a toolbar overlaid on the highlighted text string. In this stage, the user is selecting a “Copy” icon (which is different from the “Search” icon that was selected in stage 1910 of
Stage 2015 illustrates that the user has pasted the annotation into the popover toolbar overlaid over the notes view of the user interface. Furthermore, this popover toolbar has listed various locations within the document that contain this particular highlight (e.g., word string). The first location is on page 5 of the document and the second location is on page 3 of the document. As illustrated, the user is selecting the word string located on page 5 of the document.
Stage 2020 illustrates that the document viewer now displays page 5 of the document that contains the corresponding matching word string. Furthermore, the user has selected the word string to identify it as a highlighted annotation within the document. The user is about to select the “Notes” icon in order to add a note for the particular highlight. This figure illustrates an alternative mechanism by which the user can search for a particular annotation within the document using the “Copy” icon.
The notes view of the document viewer also permits a user to make a variety of edits to their particular annotations. These edits may include revising the notes associated with a particular highlight and searching for either the notes or the highlight in a variety of locations (e.g., within the document, the Web, Wikipedia, etc.). Furthermore, a user may easily remove notes and/or annotations from their document.
As described above, the annotations within a document may also include, in addition to various highlights of text and notes, a user's set of bookmarks within a document. These bookmarks may include a set of user-specified bookmarks explicitly designated by the user, or certain implicit bookmarks that have been created by the document viewer on behalf of the user based on the user's last reading position within the document. During the annotation migration process, the document viewer migrates these annotations using the same process and annotation migration algorithm that has been described for migrating the annotations regarding a user's highlights and notes.
Stage 2210 illustrates that the bookmark toolbar 2235 is now displayed overlaid on the document. The user is also selecting the “Add Bookmark” icon in order to add a bookmark 2240 at the particular location of the document. The bookmark toolbar 2235 also provides the user with the ability to view certain recently viewed portions of the document.
Stage 2215 illustrates the user being notified that a new version of the particular document that the user is currently viewing has become available. In particular, the user is being notified that a new version of the book “50 States” is now available. The user is selecting to download this new version of the document. As illustrated, in some embodiments, the document viewer automatically notifies the user regarding updated versions of a particular document and allows the user to download the updates. The user may also access a content distribution system (e.g., iTunes®) to search for and obtain a particular version of a document. In some of these cases, the user's device automatically obtains a subsequent version of a document by accessing the content distribution system (e.g., iTunes®) without the user's input.
Stage 2220 illustrates that the user has now downloaded the new version of the document on the device. Furthermore, the user has selected the bookmark icon and is viewing a list of bookmarks for the particular document. The bookmark toolbar contains one bookmark 2245 that identifies a location in Chapter 2 on page 11 of the document. As such, the document viewer has migrated the user's bookmark 2240 from the first version of the document into the second version of the document. Furthermore, the document viewer has successfully identified that the corresponding location within the second version of the document is on page 11 of the document, which displays the beginning of Chapter 2. Even though the bookmark 2240 within the first version of the document was placed on page 10 of the document, the document viewer has successfully identified the proper location of the bookmark 2245 within the second version of the document, which is on page 11 of the document. The document viewer has identified the correct location to insert the particular bookmark using the same process and analysis described above in relation to the migration of a user's highlight and note annotations. In particular, the document viewer stores a variety of information for each particular bookmark that allows the document viewer to properly migrate these annotations between different versions of a document.
A user may explicitly specify certain bookmarks or the document viewer may specify certain implicit bookmarks on behalf of the user.
View 2310 illustrates a user creating an explicit bookmark within a document. In this view 2310, the document viewer is displaying Chapter 2, page 10 of the document. The user is also selecting the bookmark icon 2305 in order to create an explicit bookmark at this particular location of the document. The document viewer may also create certain implicit bookmarks on behalf of the user. View 2320 illustrates the document viewer automatically creating an implicit bookmark for the user upon the user closing out of the document viewer. As illustrated, the user is selecting button 2325 on the device, which closes out the document viewer. Upon closing the document, the document viewer automatically stores various information regarding the state and location of the user's particular reading position within the document at the time they closed out of the document.
The explicit user-specified bookmark and the implicit bookmark each store information that the document viewer uses to identify the correct location within the document for the particular bookmark. This information is stored within a bookmark data structure for each bookmark.
For view 2310, the bookmark data structure 2330 contains the Bookmark ID “5”, the Storage ID “20”, the Book ID “A4124” with Version “1.0”. The Type is “Explicit Bookmark” since the user had explicitly inserted a bookmark in view 2310. The Location ID is Chapter 2, Section 1, Body 1, Offset 0, which corresponds to the particular portion of the document that is displayed in view 2310. The String Text field contains “Earthquakes are a common occurrence in California.” This word string is contained within the portion of the document displayed in view 2310. The String Pre-Text field contains the word string “California is known for several things, including earthquakes.” The document viewer has extracted certain text that is not displayed within view 2310, but that precedes the current portion of text being displayed. The document viewer uses both the word strings from the portions of the document that are currently displayed and word strings from the preceding text in order to correctly identify the exact position of the user's particular bookmark within the document. The Absolute Page Number is ten, which indicates this is the tenth page in the entire document and the Relative Page number is one, which indicates this is the first page of the particular chapter.
Bookmark data structure 2340 contains information corresponding to the bookmark created in view 2320. In particular, the Location ID contains Chapter 10, Section 1, Body 1, Offset 0, as the user was last viewing this particular portion, or chapter, of the document prior to closing out of the document. Furthermore, the Type field contains “Implicit Bookmark” to indicate that this bookmark was automatically generated by the document viewer on behalf of the user to store the last reading position of the user prior to the user closing out of the document. Furthermore, this bookmark data structure 2340 contains word strings from portions of text within the current page of the document, portions of text from the preceding page of the document, the absolute page number of the portion of text within the document, and the relative page number of the portion of text within the particular chapter of the document.
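The fields of a bookmark data structure can be pictured as a simple record. The following is a minimal sketch only, populated with the values given above for bookmark data structure 2330. The class and field names (`Bookmark`, `LocationID`, `BookmarkType`) are hypothetical names chosen for illustration and are not part of any described embodiment.

```python
from dataclasses import dataclass
from enum import Enum


class BookmarkType(Enum):
    EXPLICIT = "Explicit Bookmark"   # created by the user
    IMPLICIT = "Implicit Bookmark"   # created by the viewer, e.g. on close


@dataclass(frozen=True)
class LocationID:
    chapter: int
    section: int
    body: int
    offset: int


@dataclass
class Bookmark:
    bookmark_id: str
    storage_id: str
    book_id: str
    version: str
    type: BookmarkType
    location: LocationID
    string_text: str       # word string from the displayed portion
    string_pre_text: str   # word string that precedes the displayed portion
    absolute_page: int     # page number within the entire document
    relative_page: int     # page number within the chapter


# Values taken from the description of bookmark data structure 2330.
bookmark_2330 = Bookmark(
    bookmark_id="5",
    storage_id="20",
    book_id="A4124",
    version="1.0",
    type=BookmarkType.EXPLICIT,
    location=LocationID(chapter=2, section=1, body=1, offset=0),
    string_text="Earthquakes are a common occurrence in California.",
    string_pre_text="California is known for several things, including earthquakes.",
    absolute_page=10,
    relative_page=1,
)
```

Keeping both the structural location and the surrounding word strings in one record is what lets a viewer fall back on text matching when the structural location no longer lines up in a later version.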
These bookmark data structures contain various information that is used by the document viewer to locate the exact location of the bookmark within the document. Furthermore, this information is essential during the annotation migration process and helps locate the correct locations to incorporate the bookmarks within a subsequent version of the document. By storing the various annotation and document information in the tree structure illustrated in
As described above, each particular location within the tree structure can be uniquely specified in terms of the Location ID (Chapter ID, Section ID, Body ID, and an Offset value) or through a Storage ID value, or using both the Location ID and Storage ID. The process identifies the particular storage through various mechanisms described above in detail in
As illustrated in
By storing the various information in each bookmark annotation data structure, the document viewer can quickly migrate these annotations between different versions of a document. The document viewer simply steps through the different annotation data structures and tries to identify locations in the new version of the document that match the location information identified in the annotation data structure of the document's previous version. The document viewer applies essentially the same process to the bookmark annotations that it uses for migrating other annotations (e.g., highlights and notes) described in detail above. For instance, in some embodiments, the viewer tries to identify the matching word string in a later version for each word string in the bookmark data structure for the earlier version by initially examining the body layer of the section in the later version that corresponds to the section in the earlier version with the particular word string.
In some embodiments, the document viewer disallows a user from migrating annotations to an earlier version of a book. For example, if a user currently has version 1 of a book on their device, and subsequently downloads version 2, all of the user's annotations will be migrated to version 2. However, if the user once again downloads version 1 of the document onto their device, the annotations that have been made within version 2 of the document will not be migrated back into version 1 of the document. This is disallowed primarily to avoid confusion regarding which set of annotations corresponds to which version of a document.
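The forward-only rule can be expressed as a small guard. This is an illustrative sketch that assumes versions are dotted numeric strings such as “1.0”; the function names `parse_version` and `migration_allowed` are hypothetical.

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted version such as "1.0" into a comparable tuple (1, 0)."""
    return tuple(int(part) for part in version.split("."))


def migration_allowed(source_version: str, target_version: str) -> bool:
    """Allow migration only from an earlier version to a later one."""
    return parse_version(source_version) < parse_version(target_version)
```

Comparing parsed tuples rather than raw strings avoids ordering “1.10” before “1.9”.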
Furthermore, for a user that is using a cloud service (e.g., iCloud®) to back up data from their device, once a user is viewing a particular version of a document on a device, only those annotations from the latest version of the document will be backed up to the user's cloud storage.
III. Content Processor Modules
In some embodiments, the processes described above are implemented as software running on a particular machine, such as a computer or handheld device, or stored in a machine readable medium.
The content processor 2600 includes a user interface 2615, an import module 2620, a content processing module 2630, an annotation matcher 2635, an annotation migration module 2640, a content segment matcher 2645, a search index storage 2650, a content storage 2625 and an annotation data storage 2655. Also shown in
In some embodiments, the user interface 2615 interacts with the interface module 2605 to receive input regarding various annotations that are to be created and incorporated into a particular version of a document. In some embodiments, the input is user input that is received through a touch sensitive screen of the display of the device, or another input device (e.g., a cursor controller, such as a mouse, a touchpad, a trackpad, or a keyboard, etc.). In some embodiments, the user interface 2615 passes the user input received from the interface module 2605 to the content processing module 2630.
The import module 2620 imports content (e.g., documents, electronic books, etc.) from a content distribution system 2610 (e.g., iTunes®) and stores the content in the content storage 2625. An example of a content distribution system in some embodiments is a third party content provider that receives content requests from the import module 2620 and provides the content to the import module 2620. In some embodiments, the import module 2620 receives automatic notifications from the content distribution system 2610 of newly available content. The import module 2620 of some such embodiments automatically downloads newly available content and stores the content in the content storage 2625. In other embodiments, the import module 2620 downloads newly available content in response to a user input that the user interface 2615 receives from the interface module 2605 and passes to the import module 2620. In some embodiments, the import module 2620 communicates with the user interface 2615 to automatically notify a user regarding newly available content (e.g., an updated version for a particular document). In these embodiments, the import module 2620 downloads the newly available content only in response to the user's input to do so, and then stores the content in the content storage 2625.
The content processing module 2630 receives requests from the user interface 2615 to display a particular document. The content processing module 2630 determines whether the document that is to be displayed has any previous versions within the content storage 2625. The content processing module 2630 displays the document to the user through the user interface 2615 when there are no previous versions. However, when there are previous versions associated with the document, the content processing module 2630 communicates with the annotation matcher 2635 in order to migrate the annotations from the previous version into the current version.
The annotation matcher 2635 migrates all of the annotations from the first version of the document into the second version of the document. In some embodiments, the annotation matcher 2635 migrates all of the annotations into a document upon detecting that the import module 2620 has downloaded a new version of the document from the content distribution system 2610. In other embodiments, the annotation matcher 2635 incorporates the annotations on an incremental basis based on the particular portion of the second version that the user is viewing on their device. In order to migrate the annotations, the annotation matcher 2635 communicates with the content segment matcher 2645 and the annotation migration module 2640.
The content segment matcher 2645 identifies locations in the second version of the document at which to incorporate the annotations of the first version. In order to correctly identify the locations within the second version of the document, the content segment matcher 2645 analyzes each annotation stored in the annotation data storage 2655 for the first version of the document and identifies the corresponding location of the annotation within the second version of the document. After identifying a particular location within the second version of the document, the content segment matcher 2645 forwards this location information to the annotation matcher 2635 in order to create the annotation at the correct location within the second version of the document. In some embodiments, the content segment matcher 2645 uses a search index storage 2650 to identify the corresponding location within the second version of the document for a particular annotation. In some embodiments, the content segment matcher 2645 only uses the search index storage 2650 in situations where the content segment matcher detects that a deleted section of the second version corresponds to a section that contains a particular annotation in the first version of the document. In other embodiments, the content segment matcher uses the search index when the content segment matcher is searching within the second version of the document. For example, the content segment matcher may use the search index 2650 to search other sections within a particular chapter in a second version of the document that corresponds to a chapter that contains the annotation in the first version of the document.
The search index storage 2650 stores a compiled word index of all of the words within the document and a corresponding location index of the location(s) of the word within the document. Certain words are excluded from the word index, including “common words” such as “the”, “a”, “where”, “there”, “he”, “she”, “it”, “and”, “they”, “who”, etc. The search index storage 2650 in some embodiments is compiled at the time the document is received by the import module 2620. In other embodiments, the search index storage 2650 is compiled as individual words are searched within the document (e.g., on the fly).
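A word index of this kind can be sketched as a mapping from each indexed word to the locations at which it occurs. The sketch below is illustrative only: it assumes the document is supplied as a mapping from a storage identifier to section text, it measures location as a word offset within the section, and the names `COMMON_WORDS`, `tokenize`, and `build_search_index` are hypothetical.

```python
import re
from collections import defaultdict

# Common words excluded from the index (list taken from the examples in the text).
COMMON_WORDS = {"the", "a", "an", "where", "there", "he", "she",
                "it", "and", "they", "who"}


def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_search_index(sections):
    """Map each non-common word to a list of (storage_id, word_offset) pairs."""
    index = defaultdict(list)
    for storage_id, text in sections.items():
        for offset, word in enumerate(tokenize(text)):
            if word not in COMMON_WORDS:
                index[word].append((storage_id, offset))
    return index


sections = {
    "20": "Earthquakes are a common occurrence in California.",
    "21": "California is known for the earthquakes.",
}
index = build_search_index(sections)
```

Here `index["california"]` lists one location in each section, while “the” and “a” never enter the index. Building the whole index at import time corresponds to the first option described above; the on-the-fly option would populate entries only as words are looked up.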
The annotation migration module 2640 initializes the annotation data structure for each annotation that is incorporated into the second version of the document. In some embodiments, the annotation migration module 2640 creates a new annotation data structure for each annotation in the second version of the document. In other embodiments, the annotation migration module 2640 modifies the annotation data within the annotation data structure of the first version of the document to correlate with the second version of the document. The annotation migration module stores the annotations in the annotation data storage 2655.
The annotation data storage 2655 stores the annotation data structure for each annotation in different versions of different documents. Each annotation data structure contains various information regarding the annotation, including the location of the annotation within the particular version of the document (e.g., Storage ID, Chapter ID, Section ID, Offset), the word strings within the document that correspond to the annotation (e.g., highlighted text), and the document information associated with the annotation (e.g., book ID number, version number).
The content storage 2625 stores various content (e.g., documents) received from the import module 2620. In some embodiments, the content storage 2625 stores different versions of a single document. In other embodiments, the content storage deletes a first version of the document when it receives a second version of the document from the import module 2620.
The operation of the content processor 2600 will now be described for the case in which the content processing module 2630 is opening a new version of a document for which it has stored an older version with annotations. The content processing module initially receives from the user interface 2615 a request to display a particular document. The content processing module retrieves the requested document from the content storage 2625. If the content processing module also detects that the content storage contains a previous version of the document, the content processing module 2630 next determines whether the annotation data from the previous version of the document has been incorporated into the new version of the document. For explanation purposes, the previous version is referred to as a “first version” and the current version of the document is referred to as a “second version” of the document. When the content processing module determines that the annotation data from the first version has not been incorporated into the second version, the content processing module notifies the annotation matcher 2635 to begin migrating the annotations from the first version into the second version.
The annotation matcher 2635 retrieves all of the annotation data from the annotation data storage 2655 for the first version of the document. For each annotation in the annotation data, the annotation matcher 2635 extracts the annotation data structure for the annotation. The annotation matcher 2635 forwards the annotation data structure to the content segment matcher 2645. As described above, the annotation data structure includes the location of the annotation within the first version of the document (e.g., Storage ID, Chapter ID, Section ID, Offset), the content of the annotation (e.g., the highlighted content segments, the surrounding text of the highlight), and certain document-specific information including the particular version of the document in which the annotation was created.
The content segment matcher 2645 identifies and analyzes, using the location information within the annotation data structure, the particular section in the second version that corresponds to the section that contains the annotation in the first version.
When the content segment matcher 2645 detects that the section has been deleted in the second version of the document, the content segment matcher 2645 uses the search index storage 2650 to identify the location of other matching word strings in the entire document. The content segment matcher 2645 extracts a search string corresponding to the word string within the annotation data structure and applies each word in the search string to the word index within the search index storage 2650. The content segment matcher 2645 identifies the first word in the search string that is not a “common word” (e.g., “the”, “a”, “an”, etc.). The content segment matcher identifies, using the word index and corresponding location within the search index storage 2650, each location of the word within the second version of the document. For each identified location, the content segment matcher determines whether the entire search string matches the text within the particular location. Furthermore, the content segment matcher determines whether there is a unique match within the second version of the document. If the content segment matcher 2645 detects a unique match at a particular location, the content segment matcher 2645 forwards this location information to the annotation matcher 2635. If the content segment matcher 2645 does not detect a unique match in the entire document, the content segment matcher 2645 notifies the annotation matcher 2635 that no matching word strings exist within the entire document.
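The deleted-section search can be sketched as follows. This is a simplified illustration under stated assumptions: the second version is given as a mapping from storage identifier to section text, matching is done on lower-cased words, and the names `find_unique_match`, `build_index`, `tokenize`, and `COMMON_WORDS` are hypothetical.

```python
import re
from collections import defaultdict

COMMON_WORDS = {"the", "a", "an", "and", "it"}


def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(tokens_by_section):
    """Map each non-common word to its (storage_id, word_offset) locations."""
    index = defaultdict(list)
    for storage_id, words in tokens_by_section.items():
        for offset, word in enumerate(words):
            if word not in COMMON_WORDS:
                index[word].append((storage_id, offset))
    return index


def find_unique_match(search_string, sections):
    """Return (storage_id, word_offset) of the only full match, else None."""
    tokens_by_section = {sid: tokenize(t) for sid, t in sections.items()}
    index = build_index(tokens_by_section)
    query = tokenize(search_string)
    # Position of the first word in the search string that is not a common word.
    anchor_pos = next(
        (i for i, w in enumerate(query) if w not in COMMON_WORDS), None)
    if anchor_pos is None:
        return None
    matches = []
    for storage_id, offset in index.get(query[anchor_pos], []):
        start = offset - anchor_pos
        words = tokens_by_section[storage_id]
        # Check that the entire search string matches at this location.
        if start >= 0 and words[start:start + len(query)] == query:
            matches.append((storage_id, start))
    # Only a unique match is reported.
    return matches[0] if len(matches) == 1 else None


sections_v2 = {
    "30": "Scientists say the earthquakes are a common occurrence in California today.",
    "31": "Earthquakes are rare elsewhere.",
}
```

With these sample sections, the string “the earthquakes are a common occurrence” resolves to a single location in section “30”, whereas “earthquakes are” occurs in both sections and therefore yields no match.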
When the content segment matcher 2645 detects that the particular section has not been deleted in the second version of the document, the content segment matcher 2645 analyzes the word strings at the exact location (e.g., same offset within the Section ID or Storage ID) of the second version that corresponds to the annotation's location (e.g., offset within the Section ID or Storage ID) in the first version. When the content segment matcher 2645 identifies a matching word string, the content segment matcher forwards the location information (e.g., Storage ID, Chapter ID, Section ID, and Offset) to the annotation matcher 2635. When the content segment matcher 2645 does not identify a matching word string at that location, the content segment matcher searches within the same section (e.g., Storage ID or Section ID) to identify a matching word string. When the content segment matcher 2645 identifies one or more matching word strings within the same section, the content segment matcher 2645 forwards the location information (e.g., Storage ID, Chapter ID, Section ID, and Offset) of the match that is closest to the annotation's location in the first version to the annotation matcher 2635.
When the content segment matcher 2645 does not identify a matching word string in the same section of the chapter, the content segment matcher examines the other sections within the same chapter (e.g., sections with the same Chapter ID) for a matching word string. If the content segment matcher 2645 detects a unique matching word string in a different section of the same chapter, the content segment matcher forwards the location information to the annotation matcher 2635. If the content segment matcher 2645 does not detect a unique matching word string in a different section of the same chapter, the content segment matcher 2645 informs the annotation matcher 2635 that no matching word string exists within the chapter.
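The tiered search for a section that still exists (exact offset first, then the closest match in the same section, then a unique match elsewhere in the same chapter) can be sketched as follows. The sketch assumes the second version is a nested mapping of chapter identifier to section identifier to text, and that offsets are character offsets; the names `locate_in_new_version` and `find_all` are hypothetical.

```python
def find_all(text, needle):
    """Return every character offset at which needle occurs in text."""
    positions, start = [], text.find(needle)
    while start != -1:
        positions.append(start)
        start = text.find(needle, start + 1)
    return positions


def locate_in_new_version(chapters, chapter_id, section_id, offset, word_string):
    """Return (chapter_id, section_id, offset) for the annotation, or None."""
    sections = chapters.get(chapter_id, {})
    text = sections.get(section_id)
    if text is None:
        # Section deleted: the document-wide index search applies instead.
        return None
    # Tier 1: same offset within the same section.
    if text.startswith(word_string, offset):
        return (chapter_id, section_id, offset)
    # Tier 2: closest match within the same section.
    hits = find_all(text, word_string)
    if hits:
        closest = min(hits, key=lambda pos: abs(pos - offset))
        return (chapter_id, section_id, closest)
    # Tier 3: a unique match in another section of the same chapter.
    matches = [
        (chapter_id, other_id, pos)
        for other_id, other_text in sections.items()
        if other_id != section_id
        for pos in find_all(other_text, word_string)
    ]
    return matches[0] if len(matches) == 1 else None


version_2 = {
    2: {
        1: "New intro. Quakes happen. Quakes happen.",
        2: "Faults move slowly.",
    }
}
```

Searching outward from the original location keeps the common case cheap, since most annotations in a lightly revised document are found in the first or second tier.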
As described above, the annotation matcher 2635 receives from the content segment matcher 2645 the location information (e.g., Storage ID, Chapter ID, Section ID, Offset) in the second version at which to incorporate an annotation of the first version of the document. In some embodiments, the annotation matcher 2635 uses the annotation migration module 2640 to migrate the annotation into the second version of the document. The annotation migration module 2640 receives the location information and creates an annotation data structure that includes the particular location information, the corresponding matching word string, and the document information. The annotation migration module stores this annotation data structure within the annotation data storage 2655. The annotation migration module also links the annotation data structure to the particular version of the document stored in the content storage 2625.
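The step in which a new annotation data structure is created for the second version can be sketched as copying the original record with updated location and version fields. This is an illustrative sketch only; the `Annotation` record, the `migrate_annotation` function, and the dictionary used as a stand-in for the annotation data storage are all hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Annotation:
    book_id: str
    version: str
    storage_id: str
    chapter_id: int
    section_id: int
    offset: int
    word_string: str


def migrate_annotation(old, new_version, storage_id, chapter_id,
                       section_id, offset, store):
    """Create a record for the new version and file it under that version."""
    migrated = replace(
        old,
        version=new_version,
        storage_id=storage_id,
        chapter_id=chapter_id,
        section_id=section_id,
        offset=offset,
    )
    # Key the store by (book, version) to link the record to that version.
    store.setdefault((old.book_id, new_version), []).append(migrated)
    return migrated


store = {}
old = Annotation("A4124", "1.0", "20", 2, 1, 0,
                 "Earthquakes are a common occurrence in California.")
new = migrate_annotation(old, "2.0", storage_id="22", chapter_id=2,
                         section_id=1, offset=14, store=store)
```

Because the record is copied rather than edited in place, the first version's annotation is left untouched, which corresponds to the embodiments that create a new annotation data structure rather than modifying the existing one.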
In some embodiments, the content segment matcher 2645 uses the search index storage 2650 to search the document even when the particular section has not been deleted in the second version. In particular, the content segment matcher examines a particular section in the second version of the document that corresponds to the section that contains the annotation in the first version. If the section has been deleted or does not contain a matching word string, the content segment matcher 2645 uses the search index 2650, described above, to immediately identify other sections (e.g., Storage IDs) within the document that contain a word string that matches the annotation.
After each annotation has been incorporated into the second version of the document, the annotation matcher 2635 informs the content processing module 2630 that all of the annotations from the first version of the document have been incorporated into the second version of the document. The content processing module 2630 then displays to the user, through the user interface 2615, the second version of the document showing the incorporated annotation data.
IV. Electronic Systems
Many of the above-described features and applications are implemented as software processes that are specified as a set of instructions recorded on a computer readable storage medium (also referred to as computer readable medium). When these instructions are executed by one or more computational or processing unit(s) (e.g., one or more processors, cores of processors, or other processing units), they cause the processing unit(s) to perform the actions indicated in the instructions. Examples of computer readable media include, but are not limited to, CD-ROMs, flash drives, random access memory (RAM) chips, hard drives, erasable programmable read-only memories (EPROMs), electrically erasable programmable read-only memories (EEPROMs), etc. The computer readable media do not include carrier waves and electronic signals passing wirelessly or over wired connections.
In this specification, the term “software” is meant to include firmware residing in read-only memory or applications stored in magnetic storage which can be read into memory for processing by a processor. Also, in some embodiments, multiple software inventions can be implemented as sub-parts of a larger program while remaining distinct software inventions. In some embodiments, multiple software inventions can also be implemented as separate programs. Finally, any combination of separate programs that together implement a software invention described here is within the scope of the invention. In some embodiments, the software programs, when installed to operate on one or more electronic systems, define one or more specific machine implementations that execute and perform the operations of the software programs.
A. Mobile Device
The content processing applications of some embodiments operate on mobile devices.
The peripherals interface 2715 is coupled to various sensors and subsystems, including a camera subsystem 2720, a wireless communication subsystem(s) 2725, an audio subsystem 2730, an I/O subsystem 2735, etc. The peripherals interface 2715 enables communication between the processing units 2705 and various peripherals. For example, an orientation sensor 2745 (e.g., a gyroscope) and an acceleration sensor 2750 (e.g., an accelerometer) are coupled to the peripherals interface 2715 to facilitate orientation and acceleration functions.
The camera subsystem 2720 is coupled to one or more optical sensors 2740 (e.g., a charge-coupled device (CCD) optical sensor, a complementary metal-oxide-semiconductor (CMOS) optical sensor, etc.). The camera subsystem 2720 coupled with the optical sensors 2740 facilitates camera functions, such as image and/or video data capturing. The wireless communication subsystem 2725 serves to facilitate communication functions. In some embodiments, the wireless communication subsystem 2725 includes radio frequency receivers and transmitters, and optical receivers and transmitters (not shown in
The I/O subsystem 2735 involves the transfer between input/output peripheral devices, such as a display, a touch screen, etc., and the data bus of the processing units 2705 through the peripherals interface 2715. The I/O subsystem 2735 includes a touch-screen controller 2755 and other input controllers 2760 to facilitate the transfer between input/output peripheral devices and the data bus of the processing units 2705. As shown, the touch-screen controller 2755 is coupled to a touch screen 2765. The touch-screen controller 2755 detects contact and movement on the touch screen 2765 using any of multiple touch sensitivity technologies. The other input controllers 2760 are coupled to other input/control devices, such as one or more buttons. Some embodiments include a near-touch sensitive screen and a corresponding controller that can detect near-touch interactions instead of or in addition to touch interactions.
The memory interface 2710 is coupled to memory 2770. In some embodiments, the memory 2770 includes volatile memory (e.g., high-speed random access memory), non-volatile memory (e.g., flash memory), a combination of volatile and non-volatile memory, and/or any other type of memory. As illustrated in
The memory 2770 also includes communication instructions 2774 to facilitate communicating with one or more additional devices; graphical user interface instructions 2776 to facilitate graphic user interface processing; image processing instructions 2778 to facilitate image-related processing and functions; input processing instructions 2780 to facilitate input-related (e.g., touch input) processes and functions; audio processing instructions 2782 to facilitate audio-related processes and functions; and camera instructions 2784 to facilitate camera-related processes and functions. The instructions described above are merely exemplary and the memory 2770 includes additional and/or other instructions in some embodiments. For instance, the memory for a smartphone may include phone instructions to facilitate phone-related processes and functions. The above-identified instructions need not be implemented as separate software programs or modules. Various functions of the mobile computing device can be implemented in hardware and/or in software, including in one or more signal processing and/or application specific integrated circuits.
While the components illustrated in
B. Computer System
The bus 2805 collectively represents all system, peripheral, and chipset buses that communicatively connect the numerous internal devices of the electronic system 2800. For instance, the bus 2805 communicatively connects the processing unit(s) 2810 with the read-only memory 2830, the GPU 2815, the system memory 2820, and the permanent storage device 2835.
From these various memory units, the processing unit(s) 2810 retrieves instructions to execute and data to process in order to execute the processes of the invention. The processing unit(s) may be a single processor or a multi-core processor in different embodiments. Some instructions are passed to and executed by the GPU 2815. The GPU 2815 can offload various computations or complement the image processing provided by the processing unit(s) 2810. In some embodiments, such functionality can be provided using CoreImage's kernel shading language.
The read-only-memory (ROM) 2830 stores static data and instructions that are needed by the processing unit(s) 2810 and other modules of the electronic system. The permanent storage device 2835, on the other hand, is a read-and-write memory device. This device is a non-volatile memory unit that stores instructions and data even when the electronic system 2800 is off. Some embodiments of the invention use a mass-storage device (such as a magnetic or optical disk and its corresponding disk drive) as the permanent storage device 2835.
Other embodiments use a removable storage device (such as a floppy disk, flash memory device, etc., and its corresponding drive) as the permanent storage device. Like the permanent storage device 2835, the system memory 2820 is a read-and-write memory device. However, unlike storage device 2835, the system memory 2820 is a volatile read-and-write memory, such as a random access memory. The system memory 2820 stores some of the instructions and data that the processor needs at runtime. In some embodiments, the invention's processes are stored in the system memory 2820, the permanent storage device 2835, and/or the read-only memory 2830. For example, the various memory units include instructions for processing multimedia clips in accordance with some embodiments. From these various memory units, the processing unit(s) 2810 retrieves instructions to execute and data to process in order to execute the processes of some embodiments.
The bus 2805 also connects to the input and output devices 2840 and 2845. The input devices 2840 enable the user to communicate information and select commands to the electronic system. The input devices 2840 include alphanumeric keyboards and pointing devices (also called “cursor control devices”), cameras (e.g., webcams), microphones or similar devices for receiving voice commands, etc. The output devices 2845 display images generated by the electronic system or otherwise output data. The output devices 2845 include printers and display devices, such as cathode ray tubes (CRT) or liquid crystal displays (LCD), as well as speakers or similar audio output devices. Some embodiments include devices such as a touchscreen that function as both input and output devices.
Finally, as shown in
Some embodiments include electronic components, such as microprocessors, storage and memory that store computer program instructions in a machine-readable or computer-readable medium (alternatively referred to as computer-readable storage media, machine-readable media, or machine-readable storage media). Some examples of such computer-readable media include RAM, ROM, read-only compact discs (CD-ROM), recordable compact discs (CD-R), rewritable compact discs (CD-RW), read-only digital versatile discs (e.g., DVD-ROM, dual-layer DVD-ROM), a variety of recordable/rewritable DVDs (e.g., DVD-RAM, DVD-RW, DVD+RW, etc.), flash memory (e.g., SD cards, mini-SD cards, micro-SD cards, etc.), magnetic and/or solid state hard drives, read-only and recordable Blu-Ray® discs, ultra density optical discs, any other optical or magnetic media, and floppy disks. The computer-readable media may store a computer program that is executable by at least one processing unit and includes sets of instructions for performing various operations. Examples of computer programs or computer code include machine code, such as is produced by a compiler, and files including higher-level code that are executed by a computer, an electronic component, or a microprocessor using an interpreter.
While the above discussion primarily refers to microprocessors or multi-core processors that execute software, some embodiments are performed by one or more integrated circuits, such as application specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs). In some embodiments, such integrated circuits execute instructions that are stored on the circuit itself. In addition, some embodiments execute software stored in programmable logic devices (PLDs), ROM, or RAM devices.
As used in this specification and any claims of this application, the terms “computer”, “server”, “processor”, and “memory” all refer to electronic or other technological devices. These terms exclude people or groups of people. For the purposes of the specification, the terms “display” or “displaying” mean displaying on an electronic device. As used in this specification and any claims of this application, the terms “computer readable medium,” “computer readable media,” and “machine readable medium” are entirely restricted to tangible, physical objects that store information in a form that is readable by a computer. These terms exclude any wireless signals, wired download signals, and any other ephemeral signals.
While the invention has been described with reference to numerous specific details, one of ordinary skill in the art will recognize that the invention can be embodied in other specific forms without departing from the spirit of the invention. For instance, many of the figures illustrate various touch gestures (e.g., taps, double taps, swipe gestures, press and hold gestures, etc.). However, many of the illustrated operations could be performed via different touch gestures (e.g., a swipe instead of a tap, etc.) or by non-touch input (e.g., using a cursor controller, a keyboard, a touchpad/trackpad, a near-touch sensitive screen, etc.). In addition, a number of the figures (including