Any and all applications for which a foreign or domestic priority claim is identified in the Application Data Sheet as filed with the present application, are hereby incorporated by reference in their entirety under 37 CFR 1.57.
Field of the Invention
The present invention is related to detecting and identifying the alteration of digital content.
Description of the Related Art
Often, a party involved in litigation or subject to government regulation is required to disclose information, such as electronically stored information (ESI), including, for example, emails, electronic word processing documents, video files, and audio files, to the other party involved in the litigation or to a government agency. At certain points in time, such information may be placed under a legal hold. When a legal hold is in place, the party holding the information may be prohibited from modifying, deleting or destroying the information. Conventional eDiscovery systems aid in determining when static electronic information has been altered, such as documents that are effectively static and that only change when the document is intentionally modified. However, conventional eDiscovery systems have not adequately addressed the challenges posed by dynamic digital documents that include links to other digital content, wherein if the linked-to content changes, the dynamic digital document changes.
The following presents a simplified summary of one or more aspects in order to provide a basic understanding of such aspects. This summary is not an extensive overview of all contemplated aspects, and is intended to neither identify key or critical elements of all aspects nor delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more aspects in a simplified form as a prelude to the more detailed description that is presented later.
Example embodiments of methods and systems for uniquely identifying digital content, and for detecting changes in digital content, are described. The techniques disclosed herein may be used for a wide array of applications, such as electronic discovery (sometimes referred to as eDiscovery), content monitoring, quality assurance and verification, by way of example.
In an example embodiment, files composing a document at different time periods may be accessed and sets of hash values corresponding to files composing the document at the different time periods may be calculated. A determination may be made as to whether a file in the identified files at the different time periods is an HTML file, and if so an additional hash value corresponding to the HTML file is calculated. Aggregated hash values may be calculated based on hash values in the sets of hash values. A report may be generated reporting hash values for the document as it exists at the different time periods, including the hash values for the files composing the document, the additional hash values for respective HTML files, and the aggregated hash values. Changes in hash values may be indicated to indicate a change in the document and/or in a file composing the document.
An example embodiment provides a method of enabling changes in website content to be detected, the method comprising: receiving an address; accessing at a first time period, by a computerized system comprising at least one computing device, a web page corresponding to the address; identifying, by the system, HTML web page text of the web page accessed at the first time period; identifying, by the system, content linked to by the web page accessed at the first time period; storing the identified HTML web page text accessed at the first time period; accessing and storing the content linked to by the web page accessed at the first time period; calculating a first hash value for a first set of binary data of the web page accessed at the first time period by the system, the first set of binary data corresponding to the HTML web page text and the content linked to by the web page accessed at the first time period; calculating, by the system, a second hash value corresponding to the identified HTML web page text accessed at the first time period, wherein the second hash value is not calculated using the content linked to by the web page accessed at the first time period; storing the first hash value and the second hash value in association with a date and time corresponding to the first time period and in association with a first identifier; accessing, at a second time period, the web page corresponding to the address; identifying, by the system, HTML web page text of the web page accessed at the second time period; identifying, by the system, content linked to by the web page accessed at the second time period; storing the identified HTML web page text accessed at the second time period; accessing and storing the content linked to by the web page accessed at the second time period; calculating, by the system, a third hash value for a second set of binary data of the web page accessed at the second time period, the second set of binary data corresponding to the HTML web 
page text and the content linked to by the web page accessed at the second time period; calculating, by the system, a fourth hash value corresponding to the identified HTML web page text accessed at the second time period; storing the third hash value and the fourth hash value in association with a date and time corresponding to the second time period; using the first, second, third, and fourth hash values, generating, by the system, an indication as to whether the web page, including the content linked to by the web page, accessed at the second time period, has changed relative to the web page, including the content linked to by the web page, accessed at the first time period.
An example embodiment provides a method comprising: receiving an address for a document; identifying, by a computer system comprising at least one computing device, files composing the document at a first time period; calculating, by the computer system, a first set of hash values including respective hash values corresponding to the respective accessed files composing the document at the first time period; determining, by the computer system, if a file in the identified files composing the document at the first time period is an HTML file; at least partly in response to determining that a file in the identified files composing the document at the first time period is an HTML file, calculating, by the computer system, a first additional hash value corresponding to the HTML file; calculating, by the computer system, a first aggregated hash value based on hash values in the first set of hash values; identifying, by the computer system, files composing the document at a second time period; calculating, by the computer system, a second set of hash values including respective hash values corresponding to the respective accessed files composing the document at the second time period; determining, by the computer system, if a file in the identified files composing the document at the second time period is an HTML file; at least partly in response to determining that a file in the identified files composing the document at the second time period is an HTML file, calculating, by the computer system, a second additional hash value corresponding to the HTML file; calculating, by the computer system, a second aggregated hash value based on hash values in the second set of hash values; reporting: the first set of hash values, the first additional hash value, the first aggregate hash value, and the second set of hash values, the second additional hash value, and the second aggregate hash value.
An example embodiment provides a system comprising: a computing system comprising at least one computing device; a non-transitory computer storage medium having stored thereon executable instructions that direct the computing system to perform operations comprising: receiving an address for a document; identifying files composing the document at a first time period; calculating, by the computer system, a first set of hash values including respective hash values corresponding to the respective accessed files composing the document at the first time period; determining, by the computer system, if a file in the identified files composing the document at the first time period is an HTML file; at least partly in response to determining that a file in the identified files composing the document at the first time period is an HTML file, calculating, by the computer system, a first additional hash value corresponding to the HTML file; calculating, by the computer system, a first aggregated hash value based on hash values in the first set of hash values; identifying, by the computer system, files composing the document at a second time period; calculating, by the computer system, a second set of hash values including respective hash values corresponding to the respective accessed files composing the document at the second time period; determining, by the computer system, if a file in the identified files composing the document at the second time period is an HTML file; at least partly in response to determining that a file in the identified files composing the document at the second time period is an HTML file, calculating, by the computer system, a second additional hash value corresponding to the HTML file; calculating, by the computer system, a second aggregated hash value based on hash values in the second set of hash values; reporting: the first set of hash values, the first additional hash value, the first aggregate hash value, and the second set of hash values, the 
second additional hash value, and the second aggregate hash value.
Throughout the drawings, reference numbers may be re-used to indicate correspondence between referenced elements. The drawings are provided to illustrate example embodiments described herein and are not intended to limit the scope of the disclosure.
Example embodiments of methods and systems for uniquely identifying digital content, and for detecting changes in digital content, are described. Examples of digital content include, but are not limited to, digital content that includes links to other content. The techniques disclosed herein may be used for a wide array of applications, such as electronic discovery (sometimes referred to as eDiscovery), content monitoring, quality assurance and verification, by way of example. It is understood that while certain examples may be discussed with respect to eDiscovery as applied to web pages, embodiments disclosed herein may be used for other applications and other types of content, such as email, phone applications, etc.
By way of example, certain embodiments uniquely identify web pages that have been collected for the process of electronic discovery, compliance actions, forensic analysis, quality assurance, content monitoring or any other purpose, including those related to determining if web pages have been changed or altered. A given unique identifier for a web page may be generated, where the unique identifier may correspond to some or all of the web page content, optionally including web page metadata and documents linked to by the web page (e.g., using hyperlinks, such as a hyperlink that points to a whole document or to a specific element within a document), such as other web pages, image files, word processing documents, XML data feeds, etc. Unique identifiers may be generated for a given web page at different points in time. A change in the web page content will result in a different identifier being generated for the web page. Conversely, if there is no change in the content of a web page, then the web page identifiers, generated for different snapshots of the web page, at different points in time, will have the same value.
Comparing an identifier generated for a given web page at one point in time (e.g., at the beginning of a legal hold period or other monitoring period) with identifiers generated for the web page at other points in time enables the determination as to whether the content of the web page (optionally including linked-to content and metadata) has changed over the intervening time, and optionally the degree of change.
By way of example, a web page identifier may be generated by a content authentication system (sometimes referred to herein as a content analysis system) optionally using a hash function, a mathematical algorithm that maps data of variable lengths to a fixed length string of characters. The web page identifier may be significantly smaller in size than the web page. For example, the web page identifier may be a 32 digit long hexadecimal number, a 64 digit long hexadecimal number, or other size. Example hash functions include MD5, SHA-1, SHA-2, and SHA-3. By way of illustration, a file can be considered to be a very long binary number that can be fed into a hash algorithm to return an effectively unique hash. If the content of the file is changed, the binary number that it represents will necessarily be changed and therefore a new hash will be computed by the hash algorithm.
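By way of non-limiting illustration, the mapping of variable-length data to a fixed-length identifier may be sketched as follows using the Python standard library. The function name hex_digest and the sample byte strings are hypothetical and are used for illustration only.

```python
import hashlib


def hex_digest(data: bytes, algorithm: str = "md5") -> str:
    """Return the hexadecimal digest of `data` under the named hash algorithm."""
    return hashlib.new(algorithm, data).hexdigest()


# Two inputs that differ by a single character.
original = b"<html><body>Hello</body></html>"
modified = b"<html><body>Hello!</body></html>"

md5_original = hex_digest(original)               # 32 hexadecimal digits
md5_modified = hex_digest(modified)               # differs from md5_original
sha256_original = hex_digest(original, "sha256")  # 64 hexadecimal digits
```

In this sketch, the MD5 digest is 32 hexadecimal digits long and the SHA-256 digest (a member of the SHA-2 family) is 64 hexadecimal digits long, regardless of the length of the input.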
As noted above, conventional eDiscovery systems aid in determining when static electronic information has been altered, such as documents that are effectively static and that only change when the document is intentionally modified. For example, conventional eDiscovery systems may use the MD5 hash algorithm for a static word processing document or email communication. By way of further example, conventional techniques define a set number of fields needed to uniquely identify an email, and then concatenate those fields and the body of the email into one continuous stream of data and then pass that data to the MD5 hash algorithm.
For both files and emails, the data to be analyzed is effectively static and only changes when the item is intentionally modified. This is generally not true for web pages. The standard web page is built up with Hyper Text Markup Language (HTML). This HTML code is rendered by the user's web browser. The format of the HTML used to construct the web page has sections that, once rendered, will be visible to the user, and sections, such as internal hidden data, that will never be displayed to the user during normal viewing of the web page. An example of a non-displayed section is the metadata tags defined in the head section of the HTML code for a web page. While changes in the visible sections may be visible to a user, changes in the non-displayed (e.g., non-viewable) data may not be. Nonetheless, the detection of changes to both the visible and non-visible content may be needed for certain applications.
As noted above, non-displayed data can be changed in the HTML code for the web page in a manner that results in no visible change apparent to the user. In addition, HTML is designed to support embedded data and linked data. Such embedded data and linked data may be retrieved from the same web server serving the rest of the web page content, or the embedded data and linked data may be loaded from a server on the other side of the world. For example, in HTML a hyperlink may be identified with an anchor, using a tag that starts with the text “<a” and includes a reference of the form “href="URL">”. By way of further example, XML may use an XLINK as a hyperlink.
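By way of non-limiting illustration, the conventional email technique described above may be sketched as follows. The particular field names in EMAIL_FIELDS and the sample message are assumptions made for illustration; the set of fields actually used by any given conventional system is not specified herein.

```python
import hashlib

# Hypothetical set of identifying fields (an assumption for illustration).
EMAIL_FIELDS = ("sender", "recipient", "subject", "sent")


def email_md5(email: dict) -> str:
    """Concatenate the identifying fields and the body, then hash with MD5."""
    stream = "".join(email[field] for field in EMAIL_FIELDS) + email["body"]
    return hashlib.md5(stream.encode("utf-8")).hexdigest()


message = {
    "sender": "a@example.com",
    "recipient": "b@example.com",
    "subject": "Status",
    "sent": "2012-06-01T09:30:00Z",
    "body": "The report is attached.",
}
# The same message with an altered body yields a different identifier.
edited = dict(message, body="The report is NOT attached.")
```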
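By way of non-limiting illustration, the identification of linked and embedded content in HTML text may be sketched as follows using the standard Python html.parser module. The class name LinkCollector, the function name collect_links, and the choice to treat only img and script tags as embedded content are assumptions made for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect hyperlink targets (<a href>) and embedded sources (<img>/<script> src)."""

    def __init__(self):
        super().__init__()
        self.hyperlinks = []
        self.embedded = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.hyperlinks.append(attributes["href"])
        elif tag in ("img", "script") and attributes.get("src"):
            self.embedded.append(attributes["src"])


def collect_links(html_text: str):
    """Return (hyperlinks, embedded) found in the given HTML text."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.hyperlinks, collector.embedded


sample = '<html><body><a href="page2.html">Next</a><img src="campfire.jpg"></body></html>'
hyperlinks, embedded = collect_links(sample)
```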
A simple and inaccurate technique to identify web pages is to just calculate an MD5 hash using the text that represents the web page URL, such as by way of example “http://www.x1.com/products/x1_social_discovery/case_law_2012.HTML”. While this hash identifies the page being captured, it does not capture any of the content of the web page. It is very possible and perhaps likely that the above URL would not change and therefore the hash of the URL at different points in time would provide the same MD5 hash value, but the content the user sees when navigating to the above URL could change dramatically over the course of a given time period.
Conventional eDiscovery systems treat web pages as static content, and simply use the text content of the page that is generated by the user's web browser as the data used to calculate the MD5 hash. A user can see this same text by opening a web page and then instructing the web browser to display the HTML source code (e.g., by activating a “View Source” control). This approach takes into consideration the main HTML code's content, but will only adequately work for the simplest of web pages that solely consist of static text and do not have any linked or embedded content, and so do not use external data.
In reality, the majority of all web pages contain linked or embedded content. The above described method would be unable to detect any changes to any of this external data. For example, consider a web page with static text and a link to an image named “campfire.jpg”. The first time the web page is collected, conventional technologies would use the HTML text to create an MD5 hash. In this example, at some later point, the linked image “campfire.jpg” is replaced with a completely different image with the same name, “campfire.jpg”. The next time the web page is collected, an MD5 hash calculated solely on the text content of the page would be identical to the first captured value, even though the page is now visibly different from the first time of collection. Thus, disadvantageously, even though the web page displayed to the user would change to include the new image, an operator viewing the hash for the now-changed web page would believe that the web page has not changed.
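By way of non-limiting illustration, the “campfire.jpg” scenario described above may be sketched as follows. The byte strings standing in for the HTML text and the two images are hypothetical placeholders, and the simple concatenation used for the combined hash is an assumption made only to show the contrast.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of `data` as a hexadecimal string."""
    return hashlib.md5(data).hexdigest()


# The HTML text is identical at both collection times.
html_text = b'<html><body><p>Camping trip</p><img src="campfire.jpg"></body></html>'

# Placeholder bytes for the image served under the same name at each time.
first_image = b"original campfire image bytes"
second_image = b"replacement image bytes"

# A hash over the HTML text alone cannot see the image replacement.
text_only_first = md5_hex(html_text)
text_only_second = md5_hex(html_text)

# A hash that also covers the linked-to content does change.
combined_first = md5_hex(html_text + first_image)
combined_second = md5_hex(html_text + second_image)
```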
To illustrate another disadvantage of conventional technologies, referring to the previous example, if instead of changing the linked image, the web page author had changed the metadata of the HTML, the newly calculated MD5 hash would be different from the originally calculated MD5 hash value, even though nothing readily visible to the end user had changed. Thus, conventional technologies will often provide misleading results when determining whether or not a web page has changed.
To account for these issues, proper collection of a web page should not only uniquely identify the web page, it should include all linked or embedded content (or a desired specified subset thereof), and optionally it should also give an indication, where possible, of whether the changes made were to the visible portion of the web page or to hidden portions of the web page. It may also be desirable in certain instances to indicate whether the changes are made to the web page text, or to the linked-to content/embedded content (or to both). It may also be desirable in certain instances to indicate which linked-to or embedded content changed. It may also be desirable in certain instances to be compatible with current industry standards by providing a single MD5 hash value that can be used as the overall identifier of the data being collected. Certain embodiments may include one or more of the foregoing features.
An example embodiment includes a Content Authentication (CA) system that operates on the user's computer (e.g., a desktop, laptop, tablet, smart phone, or other computer) connected to a local and/or wide area network (e.g., the Internet) and uses a network adapter to collect each portion of web pages specified by an operator (e.g., by providing the URLs for the web pages, or for the website). The CA system, comprising a computing device, may calculate a hash (e.g., an MD5 hash) for each web page item (e.g., the HTML text, the linked-to content, the embedded content, the URLs, etc.) and, for HTML files, an additional, different content hash (which is a hash of the normally viewable content), and may combine each hash (or hashes for specified content types) into a single hash (e.g., a single MD5 hash) to be used as the overall web page identifier. The various hashes and optionally the data used to generate the hashes (e.g., the web page items) are stored locally on the user's computer (e.g., in non-volatile memory) for future use and review. Optionally, in addition or instead, the CA system may be hosted on a remote server (e.g., a cloud based system) and the hashes and other web page data may be stored locally or on the remote server or other storage device. Thus, the collection, computation, and storage of information may be performed in whole or in part by a local system or by a remote system.
As noted above, the collection of web pages and the calculations of identifiers may be performed as part of a legal or regulatory hold, or to be alerted when a change in content in a specified document (e.g., an online document, such as a web page) has been detected. By way of illustration, if a person has made an insurance claim for an accident, alleging that the accident prevented the person from working, an insurance company may wish to monitor the person's social networking or blog web page to detect if the person has posted information which would indicate that the person's allegation is false, and that the person is physically capable of working. For example, the person may be posting information and/or images (e.g., still or video images) regarding the person's participation in a sport or other physically strenuous activity. The insurance company can specify to the CA system that the CA system should periodically (e.g., once a day, once a week, or other specified period or specific days) inspect the web page (e.g., by specifying the web page or domain to the CA system), determine if a change in the content has occurred, and generate an alert to an operator so that the operator can visually inspect the content to determine whether the change in content is relevant to the determination that the person is being truthful or untruthful. The CA system may then collect presentations of the specified web page at the specified timing, generate corresponding hashes, compare a given hash to one or more previously generated hashes (e.g., the last generated hash), determine if the hash value has changed, and if the hash value has changed, provide a change notification to a specified operator or system. The insurance company may specify that an alert should be generated when any change is detected, or only when a change is detected with respect to viewable data.
For example, the insurance company may not care if a URL to an image has changed if the new URL is still pointing to the same image (even if the image is now being accessed from a different source).
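By way of non-limiting illustration, the choice between alerting on any change and alerting only on changes to viewable data may be sketched as follows. The Snapshot structure, its field names, and the should_alert function are hypothetical; the sketch assumes that a hash over the full collected data and a separate hash over the viewable content have already been calculated for each collection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hashes recorded for one collection of a monitored web page (hypothetical)."""
    acquisition_hash: str  # hash over the full collected data
    content_hash: str      # hash over the normally viewable content only


def should_alert(previous: Snapshot, current: Snapshot, viewable_only: bool = False) -> bool:
    """Decide whether a change notification is warranted under the chosen policy."""
    if viewable_only:
        return previous.content_hash != current.content_hash
    return previous.acquisition_hash != current.acquisition_hash


baseline = Snapshot(acquisition_hash="aaa", content_hash="111")
hidden_change = Snapshot(acquisition_hash="bbb", content_hash="111")
visible_change = Snapshot(acquisition_hash="ccc", content_hash="222")
```

Under the viewable-only policy in this sketch, hidden_change would not trigger a notification while visible_change would.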
In certain cases, a given webpage may not have a beginning tag or an end-of-file tag. Instead, the webpage may be “endless”. To deal with this and other cases, certain embodiments enable a user to indicate that the CA system is to look for posts or other changes after a user-specified date or after the last hash generation date, and run a hash on content after that posting date or last hash generation date. This enables the CA system to generate an artificial “end-of-file” to make a portion of the webpage into a “file.” Optionally, certain embodiments enable the user to specify that comments from non-owners of a webpage (e.g., such as friends of a person posting comments on a social networking webpage of the person) are to be excluded from the hash generated for the webpage.
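By way of non-limiting illustration, hashing only the portion of an “endless” webpage posted after a cutoff date, while excluding comments from non-owners, may be sketched as follows. The representation of a webpage as a list of post records with author, posted, and text fields is an assumption made for illustration, as is the newline-delimited serialization fed to the hash.

```python
import hashlib
from datetime import datetime


def hash_recent_posts(posts, owner: str, cutoff: datetime) -> str:
    """Hash the owner's posts made after `cutoff`, in chronological order."""
    selected = [p for p in posts if p["author"] == owner and p["posted"] > cutoff]
    selected.sort(key=lambda p: p["posted"])
    digest = hashlib.md5()
    for post in selected:
        digest.update(post["posted"].isoformat().encode("utf-8"))
        digest.update(b"\n")
        digest.update(post["text"].encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


cutoff = datetime(2012, 6, 1)
posts = [
    {"author": "alice", "posted": datetime(2012, 5, 1), "text": "Before the cutoff"},
    {"author": "alice", "posted": datetime(2012, 6, 2), "text": "Went hiking"},
]
baseline = hash_recent_posts(posts, "alice", cutoff)

# A friend's comment is excluded, so the hash is unchanged.
with_friend = posts + [{"author": "bob", "posted": datetime(2012, 6, 3), "text": "Nice!"}]

# A new post by the page owner changes the hash.
with_new_post = posts + [{"author": "alice", "posted": datetime(2012, 6, 4), "text": "Ran a race"}]
```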
By way of further example, if a legal or regulatory hold has been placed on a set of documents, in response (e.g., as soon as possible after the legal hold has been initiated or in response to a regulatory trigger) copies of the documents may be collected, and unique document identifiers (reflective of the document content, including embedded data and/or linked-to content) may be generated using embodiments discussed herein for the documents as baseline identifiers. A user interface may be provided for presentation on a user computer via which an operator or other user can specify the location(s) (e.g., web page URL, a domain name, a file path, etc.) of the documents and when identifiers for the documents should be recalculated (e.g., periodically and/or substantially immediately in response to an operator instruction). The CA system may then generate an alert or provide other indication when a change in content has been detected.
Further, certain embodiments are configured to capture websites or files which are linked to remote systems with an HTML overlay. The CA system may generate an alert or provide other indications when a change in content has been detected in such websites or files linked to remote systems.
As noted above, a user interface is provided via which a user can specify a URL of a web page to be collected and indexed. Optionally, the user interface includes a control via which the user can indicate that the URL is of a starting web page, and that the CA system is to crawl the website from the starting web page to a specified depth or number of pages. For example, a web page may include a link to another web page, which may in turn include a link to yet another web page, and so on. There may be circumstances where a user only needs to detect if any changes have been made to only a first number of web pages in the linked chain of web pages.
It may or may not be desirable to capture and include in the file hash the original URL of a file when identifying the path 114 to use in this process. Counterintuitively, it may be better in certain circumstances to exclude the URL as part of the unique identifier of the web page. It is not uncommon for different URLs to point at the same content, so by excluding the URL from the web page hash, the user can identify web pages associated with seemingly different URLs as being identical. For example, URLs starting with http://x1.com, http://www.x1.com, and http://www.x1discovery.com might all point to the same final document/web page.
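By way of non-limiting illustration, crawling from a starting web page to a specified depth or number of pages may be sketched as a breadth-first traversal. The crawl function, its parameters, and the in-memory link table standing in for a website are hypothetical; in practice, get_links would fetch and parse each page. The sketch assumes max_pages is at least one.

```python
from collections import deque


def crawl(start_url, get_links, max_depth: int, max_pages: int):
    """Return pages reachable from start_url within max_depth link hops, up to max_pages."""
    visited = [start_url]
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not follow links beyond the permitted depth
        for link in get_links(url):
            if link in seen:
                continue
            if len(visited) >= max_pages:
                return visited  # page limit reached
            seen.add(link)
            visited.append(link)
            queue.append((link, depth + 1))
    return visited


# Hypothetical link structure: a -> b, c; b -> d; d -> e.
site = {"a": ["b", "c"], "b": ["d"], "c": [], "d": ["e"]}


def get_links(url):
    return site.get(url, [])
```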
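By way of non-limiting illustration, the effect of including or excluding the URL from the hash input may be sketched as follows. The page_hash function and its include_url parameter are hypothetical, and the sample content is a placeholder.

```python
import hashlib


def page_hash(url: str, content: bytes, include_url: bool) -> str:
    """Hash page content, optionally mixing the URL into the hash input."""
    digest = hashlib.md5()
    if include_url:
        digest.update(url.encode("utf-8"))
    digest.update(content)
    return digest.hexdigest()


# The same content served under two different URLs.
content = b"<html><body>Same page</body></html>"
url_a = "http://x1.com"
url_b = "http://www.x1.com"
```

With the URL excluded, the two pages in this sketch receive the same identifier; with the URL included, they do not.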
Once states 106 through 114 have been completed for each identified file that composes the desired web page, the CA system may generate a data structure, such as a table, including the acquisition MD5 hash 106, the content MD5 hash 110 or 112, and the path 114. Other data, such as the date the web page was collected, may also be included. The system may optionally sort the table alphabetically 116 using the path 114 as the sort order. This may ensure that the table data is consistently formatted and can therefore be compared against a previous list of the files (or the selected file types) that compose the web page. For example, if one or more linked or embedded files are added to or deleted from the web page, the table 116 for the changed web page would be different than the originally created table 116 for the original web page.
In this example, the CA system uses the data from table 116 as the input for calculating the overall hash 118 (sometimes referred to herein as an aggregated hash) for the collected web page.
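By way of non-limiting illustration, building the sorted table and calculating an overall hash from it may be sketched as follows. The tab-delimited, one-row-per-line serialization of the table is an assumption made for illustration, as are the function names and the placeholder hash strings in the sample rows.

```python
import hashlib


def sorted_table(rows):
    """Sort (acquisition_hash, content_hash, path) rows alphabetically by path."""
    return sorted(rows, key=lambda row: row[2])


def overall_hash(rows) -> str:
    """Calculate a single MD5 over the sorted table (assumed serialization)."""
    serialized = "".join(
        f"{acquisition}\t{content}\t{path}\n"
        for acquisition, content, path in sorted_table(rows)
    )
    return hashlib.md5(serialized.encode("utf-8")).hexdigest()


# Placeholder hash strings; real rows would hold 32-digit MD5 values.
rows = [
    ("acq-index", "content-index", "index.html"),
    ("acq-image", "acq-image", "campfire.jpg"),
]
with_added_file = rows + [("acq-style", "acq-style", "style.css")]
```

Because the rows are sorted before serialization, the overall hash in this sketch does not depend on the order in which the files were collected.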
When the CA system combines each individual hash 206, 208, 210 into an overall hash 204 for the web page, the user can be assured that the web page has not changed if the system is still calculating the same overall hash 204 on subsequent collections. The system may provide a corresponding indication when there is, or when there is not, a match. For example, the system may emphasize hashes that do not match via color coding, an icon, or otherwise. In addition or instead, a corresponding notification may be transmitted (e.g., as an email, SMS message, or otherwise) to an operator or another system. Because the system may calculate the acquisition hash 206 separately for each item of the web page (or each item type specified by the user), the user can identify changes where only one of the linked items has changed. If the web page is identical except for a single image file, for example “campfire.jpg”, the user can use the table of values 206, 208, and 210 and/or other notifications provided by the CA system to identify if that individual file has changed and be assured that that is the only change to the collected web page.
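By way of non-limiting illustration, pinpointing which individual item changed between two collections may be sketched as a comparison of per-item hashes keyed by path. The diff_tables function and the placeholder hash strings are hypothetical.

```python
def diff_tables(old: dict, new: dict):
    """Compare path -> acquisition hash mappings; return (changed, added, removed) paths."""
    changed = sorted(path for path in old.keys() & new.keys() if old[path] != new[path])
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    return changed, added, removed


first_collection = {"index.html": "h1", "campfire.jpg": "h2", "style.css": "h3"}
# Only the image has been replaced in the second collection.
second_collection = {"index.html": "h1", "campfire.jpg": "h9", "style.css": "h3"}

changed, added, removed = diff_tables(first_collection, second_collection)
```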
In embodiments where the CA system calculates the content hash 208 separately for HTML items, the user can identify when the acquisition hash 206 has changed but the content hash is the same, allowing the user to more quickly pinpoint the location of the change. It is not uncommon for the <metadata> tags in a web page to be changed frequently and therefore with the approach of only one MD5 hash for the full web page, this MD5 hash would be constantly changing, without giving the user an indication of which file or portion of the file was changing.
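By way of non-limiting illustration, calculating a content hash separately from the acquisition hash for an HTML item may be sketched as follows. Treating the text outside of head, script, and style elements as the normally viewable content is a simplifying assumption; a rendering engine would be needed to determine visibility precisely. The class and function names are hypothetical.

```python
import hashlib
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text outside <head>, <script>, and <style> (approximate visibility)."""

    HIDDEN = {"head", "script", "style"}

    def __init__(self):
        super().__init__()
        self._hidden_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HIDDEN:
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in self.HIDDEN and self._hidden_depth > 0:
            self._hidden_depth -= 1

    def handle_data(self, data):
        if self._hidden_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def acquisition_hash(html_text: str) -> str:
    """Hash the full HTML text as collected."""
    return hashlib.md5(html_text.encode("utf-8")).hexdigest()


def content_hash(html_text: str) -> str:
    """Hash only the approximately viewable text of the HTML."""
    parser = VisibleText()
    parser.feed(html_text)
    parser.close()
    return hashlib.md5("\n".join(parser.chunks).encode("utf-8")).hexdigest()


page_v1 = '<html><head><meta name="keywords" content="camping"></head><body><p>Camping trip</p></body></html>'
page_v2 = '<html><head><meta name="keywords" content="hiking"></head><body><p>Camping trip</p></body></html>'
page_v3 = '<html><head><meta name="keywords" content="camping"></head><body><p>Hiking trip</p></body></html>'
```

In this sketch, page_v1 and page_v2 differ only in non-displayed data, so their acquisition hashes differ while their content hashes match.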
Optionally, the reporting user interface includes links to the corresponding document or document item. For example, if the user selects the acquisition hash or the URL, the CA system may cause the web page corresponding to the acquisition hash to be displayed and optionally the reference, earlier version of the web page (e.g., the initial version collected). Optionally, the CA system may modify the presentation of the accessed web page to emphasize the visible portions of the web page that have been changed relative to the reference web page (or vice versa). For example, if both versions of the web page include a link to an image, but the CA system has determined that the image has changed in the latter version of the reference web page, the CA system may visually and/or textually indicate that the image has changed (e.g., by drawing a red border around the image or otherwise).
Optionally, the CA system may be configured to detect different types of image changes. For example, optionally the system may be configured to detect whether the entire image has changed, whether the resolution/pixel count has changed, whether the image color has changed, whether the image has been cropped, or otherwise, using an image analysis module.
Certain example user interfaces will now be described with reference to the figures. Certain of the figures may be used to specify and initiate various types of content collection techniques, such as those using a single page web capture, a web crawler to capture multiple layers of a website, or the bulk import of multiple network resources (e.g., content associated with specified URLs).
If the user activates a “next” control the example user interface illustrated in
The collection captured by the CA system will appear in the left-hand Navigation Pane of the user interface illustrated in
Referring to
A layer user interface may be provided via which the user can specify the number of layers the CA system is to crawl. A layer may include all of the links directly related to that page. For example, the user may set a limit as to how many layers down, from the top-level domain (or optionally from a specified sub-domain level), the CA system will crawl and index content. Optionally, there may be a maximum number of layers the system will permit the user to specify, and the CA system may provide an error message to the user if the CA system detects that the user has specified an amount greater than the maximum number of layers.
A user interface may be provided via which the user can specify the maximum number of pages to crawl and index (e.g., to prevent information overload).
A URL filter user interface may be provided via which the user can instruct the CA system to collect only URLs that start with or contain the user specified text entered into the URL filter field. The filter enables the CA system to filter for a particular directory or word that is contained in the URL. An “include subdomains” control is optionally provided via which the user can indicate whether subdomains are to be included in the crawl. A subdomain is a domain that is part of a larger domain and has a different start to the URL address. By way of illustrative example, a subdomain for a “largerdomain.com” may be “subdomain.largerdomain.com.” Thus, in response to a default setting or in response to a user specification, only content that is within a specified top level domain will be indexed, and, via the “include subdomains” control, the user can indicate that content within a specified top level domain is to be indexed and that links should be followed and pages on subdomains are to be indexed.
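By way of non-limiting illustration, the URL filter and the “include subdomains” setting may be sketched as a single scope test. The in_scope function and its parameters are hypothetical, and the filter is implemented here as a simple substring test on the full URL.

```python
from urllib.parse import urlparse


def in_scope(url: str, domain: str, include_subdomains: bool = False, url_filter: str = "") -> bool:
    """Return True if the URL is on the domain (or a subdomain, if allowed) and passes the filter."""
    host = (urlparse(url).hostname or "").lower()
    domain = domain.lower()
    if host == domain:
        host_ok = True
    else:
        # A subdomain host ends with "." followed by the larger domain.
        host_ok = include_subdomains and host.endswith("." + domain)
    return host_ok and url_filter in url
```

For example, under these assumptions, in_scope("http://subdomain.largerdomain.com/page", "largerdomain.com") is false unless include_subdomains is set.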
An optional user interface is provided via which the user can specify one or more of the following example options. A Page Download Timeout field is provided via which the user can change the download timeout time (where the CA system will stop trying to collect a page if there is no response by the time the download timeout occurs). Timing out a page capture is helpful when a crawl has pages which are taking an excessive time to load and are failing as a result. A “Generate .PNG image for web pages” control is provided via which the user can instruct the CA system to create a PNG (or other visual image file) of the page to capture the appearance of the page as it would appear when viewed directly on the respective website. The use of a page image capture addresses the problem posed by certain dynamic and scripted pages that do not capture properly when viewing the HTML.
A “Download Videos” control is optionally provided via which the user can instruct the CA system to download videos when capturing pages including embedded or linked-to video content. A “Download File URLs” control is optionally provided via which the user can instruct the CA system to download files (e.g., PDFs) when capturing URLs that reference a binary, non-HTML file type, such as a PDF, DOC, PPT, or XLS. By way of example, a URL such as:
“http://www.x1.com/download/X1_Social_Discovery_Product_Brief.pdf.”
Optionally, a user interface is provided via which the user can set a maximum file size which the CA system is to capture. Optionally, a “Use White List” control is provided via which the user can instruct the CA system to collect only the types of files specified by the user (e.g., via a white list of file types). Optionally, a scheduling interface is provided via which the user can specify how often a website is to be crawled (e.g., once a day, every third day, once an hour, once every five hours, once a week, every thirty minutes, etc.).
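By way of non-limiting illustration, the maximum file size setting and the “Use White List” control described above may be sketched as a check performed before a file is downloaded. The function name `should_download` and its parameters are hypothetical, and the sketch assumes that the file type is inferred from the extension in the URL path, which is only one of several ways a file type could be determined.

```python
import posixpath
from urllib.parse import urlsplit


def should_download(url, size_bytes, max_size_bytes=None, white_list=None):
    """Apply an optional size limit and an optional white list of file types."""
    if max_size_bytes is not None and size_bytes > max_size_bytes:
        return False
    if white_list is None:
        return True  # white list not in use: any file type is accepted

    # Take the extension from the URL path only, ignoring any query string.
    extension = posixpath.splitext(urlsplit(url).path)[1].lower().lstrip(".")
    allowed = {file_type.lower().lstrip(".") for file_type in white_list}
    return extension in allowed


brief = "http://www.x1.com/download/X1_Social_Discovery_Product_Brief.pdf"
print(should_download(brief, size_bytes=250_000, white_list=["pdf", "doc"]))
print(should_download(brief, size_bytes=250_000, max_size_bytes=100_000))
```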
Optionally, the CA system provides a substantially real time web crawl progress indication for display. For example, the CA system may determine and display a list of found pages while HTML files are queued for download. As each queued page is downloaded, the download status will be displayed (e.g., an indication as to whether or not the download was successful).
Optionally, a user interface is provided via which the user can specify that browser cookie sessions are (or are not) to be used when crawling a website. The use of session cookies enables credentialed sites to be crawled and captured.
Crawl logs may be automatically generated by the CA system and may be provided for display to the user (e.g., in response to the user selecting an open log control on the web crawl's configuration user interface).
Referring to
The user interface of
Optionally, the CA system may capture and display the web page HTML, image, source code and/or the MD5 hash value calculated for the web page. A control, such as that illustrated in
Optionally, controls are provided via which a user can add to an existing web capture collection or delete an existing web capture collection. For example, the user may select a given collection, provide a URL to which the CA system browser is to navigate, activate the snapshot control, which will cause the CA system to capture the page and add it to the selected collection.
A hash generation module 608 may generate hash values for the content, which may include one or more webpages or other documents. For example, the hash generation module 608 may generate: hash values for files composing a document; dedicated hash values corresponding to HTML files used to compose the document; and/or aggregated hash values, as similarly discussed elsewhere herein. The hash values may optionally be stored in content data store 614 in association with the content.
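By way of non-limiting illustration, the three kinds of hash values described above may be sketched as follows. MD5 is used because an MD5 hash value for a web page is mentioned elsewhere herein. The aggregation rule shown (hashing the per-file hash values in storage path order) is an assumption made for this sketch only, as are the names `file_hash` and `document_hashes`; it is not presented as the method used by the hash generation module 608.

```python
import hashlib


def file_hash(data):
    """Return the MD5 hash of a file's bytes as a hexadecimal string."""
    return hashlib.md5(data).hexdigest()


def document_hashes(files):
    """Compute per-file, HTML-only, and aggregated hash values.

    files maps a storage path to the bytes of that file.
    """
    per_file = {path: file_hash(data) for path, data in files.items()}
    html_only = {
        path: digest
        for path, digest in per_file.items()
        if path.lower().endswith((".html", ".htm"))
    }
    # Assumed aggregation rule: hash the per-file hashes in path order so the
    # result does not depend on the order in which files were captured.
    aggregate = hashlib.md5()
    for path in sorted(per_file):
        aggregate.update(per_file[path].encode("ascii"))
    return {"files": per_file, "html": html_only, "aggregate": aggregate.hexdigest()}


# Hypothetical captured document consisting of one HTML file and one image.
capture = {"index.html": b"<html>hello</html>", "img/logo.png": b"\x89PNG..."}
print(document_hashes(capture)["aggregate"])
```

Under this assumed rule, a change to any file composing the document changes both that file's hash value and the aggregated hash value.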
An optional change detection module 610 detects changes in hash values for a given document calculated for versions of the document accessed at different times, where a change in a hash value may indicate corresponding changes in the associated file(s). For example, changes may be detected in the files composing a document, the dedicated hash values corresponding to HTML files used to compose the document, and/or the aggregated hash values. A report generation module 612 may be used to generate a report of the hash values for a given document, associated files, and associated HTML files, as well as report associated aggregated hash values. Changes in hash values may be indicated via text, highlighting, icons, sorting, and/or otherwise. The report may include a table, as similarly discussed elsewhere herein. The report generation module 612 may optionally sort the table alphabetically using a storage path as the sort order. The report may be provided to the user terminals 602, optionally via a webpage displayed by the user terminal browsers 603. The various systems and modules illustrated in
While the foregoing example references web pages, the processes and systems described herein can be applied to other documents including links to other content or having embedded content, such as an XML feed. By way of illustration, a dynamic document may be in the form of an email that includes links to images. By way of yet further illustration, a dynamic document may be in the form of a word processing document including a table that is dynamically populated using a data feed from a remote resource.
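By way of non-limiting illustration, the comparison of hash values performed by the change detection module 610, and the report table sorted by storage path produced by the report generation module 612, may be sketched as follows. The function name `detect_changes`, the status labels, and the sample hash strings are hypothetical; each input maps a storage path to the hash value calculated for one version of the document, whether the document is a web page, an email, or another dynamic document.

```python
def detect_changes(old_hashes, new_hashes):
    """Compare two captures and return report rows sorted by storage path.

    Each argument maps a storage path to the hash value of that file.
    Each returned row is (path, old_hash, new_hash, status).
    """
    rows = []
    for path in sorted(set(old_hashes) | set(new_hashes)):
        before = old_hashes.get(path)
        after = new_hashes.get(path)
        if before == after:
            status = "unchanged"
        elif before is None:
            status = "added"
        elif after is None:
            status = "removed"
        else:
            status = "changed"
        rows.append((path, before, after, status))
    return rows


# Hypothetical hash values for two captures of the same document.
first_capture = {"index.html": "aaa1", "img/logo.png": "bbb2", "style.css": "ccc3"}
second_capture = {"index.html": "aaa1", "img/logo.png": "ddd4", "feed.xml": "eee5"}

for row in detect_changes(first_capture, second_capture):
    print(row)
```

A status other than "unchanged" corresponds to a row that a report could indicate via text, highlighting, or icons.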
Thus, methods and systems are described for accurately identifying dynamic content, such as web pages, other documents including links to other content or having embedded content, and the like.
The methods and processes described herein may have fewer or additional steps or states and the steps or states may be performed in a different order. Not all steps or states need to be reached. The methods and processes described herein may be embodied in, and fully or partially automated via, software code modules executed by one or more general purpose computers. The code modules may be stored in any type of computer-readable medium or other computer storage device. Some or all of the methods may alternatively be embodied in whole or in part in specialized computer hardware. The systems described herein may optionally include displays, user input devices (e.g., touchscreen, keyboard, mouse, voice recognition, etc.), network interfaces, etc. While reference may be made to displaying or storing data in a row or column, other display formats and organizations or data storage structures may be used.
The results of the disclosed methods may be stored in any type of computer data repository, such as relational databases and flat file systems that use volatile and/or non-volatile memory (e.g., magnetic disk storage, optical storage, EEPROM and/or solid state RAM).
While the phrase “click” may be used with respect to a user selecting a control, menu selection, or the like, other user inputs may be used, such as voice commands, text entry, gestures, etc. User inputs may, by way of example, be provided via an interface, such as via text fields, wherein a user enters text, and/or via a menu selection (e.g., a drop down menu, a list or other arrangement via which the user can check via a check box or otherwise make a selection or selections, a group of individually selectable icons, etc.). When the user provides an input or activates a control, a corresponding computing system may perform the corresponding operation. Some or all of the data, inputs and instructions provided by a user may optionally be stored in a system data store (e.g., a database), from which the system may access and retrieve such data, inputs, and instructions. The notifications and user interfaces described herein may be provided via a Web page, a dedicated or non-dedicated phone application, computer application, a short messaging service message (e.g., SMS, MMS, etc.), instant messaging, email, push notification, audibly, and/or otherwise.
The user terminals described herein may be in the form of a mobile communication device (e.g., a cell phone), laptop, tablet computer, interactive television, game console, media streaming device, head-wearable display, networked watch, etc. They may optionally include displays, user input devices (e.g., touchscreen, keyboard, mouse, voice recognition, etc.), network interfaces, etc.
Many variations and modifications may be made to the above-described embodiments, the elements of which are to be understood as being among other acceptable examples. All such modifications and variations are intended to be included herein within the scope of this disclosure. The foregoing description details certain embodiments. It will be appreciated, however, that no matter how detailed the foregoing appears in text, the invention can be practiced in many ways. As is also stated above, the use of particular terminology when describing certain features or aspects of certain embodiments should not be taken to imply that the terminology is being re-defined herein to be restricted to including any specific characteristics of the features or aspects of the invention with which that terminology is associated.
Number | Date | Country | |
---|---|---|---|
20140359411 A1 | Dec 2014 | US |
Number | Date | Country | |
---|---|---|---|
61831090 | Jun 2013 | US |