Many techniques are available to users today to find information on the world wide web (“web”). For example, a user may access a document by clicking on a link that includes a uniform resource identifier (URI) associated with the document. Many collections of URIs may exist on the Internet. One example of a URI collection is a collection of bookmarks. If a user finds a document of interest, the user may save the document as a bookmark. The bookmark may store the URI associated with the document and the user may access the document at a later time by selecting the bookmark. However, a URI associated with a document may change. For example, the document may be moved to a different domain. Thus, a user may not be able to access the document via the bookmark if the URI associated with the document has changed. Outdated URI collections may negatively impact the user's browsing experience.
According to one aspect, a method, performed by one or more computer devices, may include obtaining, by at least one of the one or more computer devices, a stored uniform resource identifier (URI) associated with a particular resource and associated with a URI collection; accessing, by at least one of the one or more computer devices, a document index that stores information about canonical URIs, where the information relates a particular canonical URI to one or more other URIs; determining, by at least one of the one or more computer devices, whether the particular canonical URI, stored in the document index and associated with the particular resource, differs from the stored URI; and replacing, by at least one of the one or more computer devices, the stored URI with the canonical URI, when the canonical URI differs from the stored URI.
According to another aspect, a method, performed by one or more computer devices, may include obtaining, by at least one of the one or more server devices, one or more canonical uniform resource identifiers (URIs) from a document index, where the one or more canonical URIs have changed since a particular time period; obtaining, by at least one of the one or more server devices, one or more outdated URIs associated with particular ones of the one or more canonical URIs from the document index; generating, by at least one of the one or more server devices, a URI update that includes the one or more canonical URIs and the associated one or more outdated URIs; and providing, by at least one of the one or more server devices, the generated URI update to one or more subscribers to replace the one or more outdated URIs with the one or more canonical URIs.
According to yet another aspect, a method, performed by one or more computer devices, may include subscribing, by at least one of the one or more computer devices, to a uniform resource identifier (URI) updates service; receiving, by at least one of the one or more computer devices, a URI update from the URI updates service, where the URI update includes an old URI and a new URI associated with the old URI; determining, by the at least one of the one or more computer devices, whether the old URI is stored in a URI collection associated with the one or more computer devices; and updating, by the at least one of the one or more computer devices, the old URI to the new URI, when the old URI is stored in the URI collection.
According to yet another aspect, a system may include one or more server devices to obtain a stored resource identifier associated with a resource identifier collection; access a document index that stores information about canonical resource identifiers, where the information relates a particular canonical resource identifier to one or more other resource identifiers; determine whether the canonical resource identifier differs from the stored resource identifier; and replace the stored resource identifier with the canonical resource identifier, when the canonical resource identifier differs from the stored resource identifier.
According to yet another aspect, a system may include one or more server devices to obtain one or more canonical uniform resource identifiers (URIs) from a document index, where the one or more canonical URIs have changed since a particular time period; obtain one or more outdated URIs associated with particular ones of the one or more canonical URIs from the document index; generate a URI update that includes the one or more canonical URIs and the associated one or more outdated URIs; and provide the generated URI update to one or more subscribers to replace the one or more outdated URIs with the one or more canonical URIs.
According to yet another aspect, a non-transitory computer-readable medium, storing instructions executable by one or more processors, may include one or more instructions to subscribe to a uniform resource identifier (URI) updates service; one or more instructions to receive a URI update from the URI updates service, where the URI update includes an old URI and a new URI associated with the old URI; one or more instructions to determine whether the old URI is stored in a URI collection associated with the one or more computer devices; and one or more instructions to update the old URI to the new URI, when the old URI is stored in the URI collection.
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate one or more embodiments described herein and, together with the description, explain these embodiments. In the drawings:
The following detailed description of the invention refers to the accompanying drawings. The same reference numbers may be used in different drawings to identify the same or similar elements. Also, the following detailed description does not limit the invention.
A URI may include a string of characters that identifies a resource on a network, such as the Internet. A resource may include any entity with an identity which may be accessed or retrieved over a network connection, such as a document, an image, an audio file, a video file, a data feed, and/or any other type of resource. A common example of a URI may be a uniform resource locator (URL). A URL may correspond to a URI that, in addition to identifying a resource, specifies how to access, or act upon, the resource. For example, a URL of http://www.webpage.com may specify a document that may be accessed at a device with a network address of www.webpage.com using the Hypertext Transfer Protocol (HTTP).
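As a brief illustration of how a URL carries both an identifier and an access method, the following sketch splits the example URL into its parts using Python's standard urllib.parse module. The variable names are chosen for illustration only.

```python
from urllib.parse import urlparse

# The example URL from the description above.
parts = urlparse("http://www.webpage.com")

scheme = parts.scheme    # how to access the resource (the protocol)
host = parts.hostname    # network address of the device that serves it
path = parts.path        # empty here, because the URL names no path
```

Here the scheme component corresponds to the access protocol (HTTP) and the host component corresponds to the network address.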
Many collections of URIs exist on the Internet. Examples of URI collections may include URI click data collected by a search engine, a bookmark collection, a search history, a browser history, a collection of data feed subscriptions, a collection of podcast subscriptions, external links included in messages of a discussion group or a message board, links included in email or text messages sent or received by users, a collection of URIs included in a particular document (e.g., a document associated with a “links” title), and/or any other collection of one or more URIs.
A URI may become outdated, meaning that the resource associated with the URI can no longer be accessed via the URI. For example, a URI may become outdated when the resource is moved to a different location, a web site associated with the resource changes domain names or extensions, and/or when the resource is renamed. Large collections of URIs may include many URIs that are no longer valid. For example, a user may store a bookmark collection on a bookmark server and the bookmark server may store bookmark collections for many users. Thus, over time, the bookmark server may end up including many outdated URIs.
An implementation described herein may relate to canonicalization of URIs in a collection of URIs. Canonicalization of a URI may correspond to updating the URI to a canonical URI. A canonical URI may correspond to the most up-to-date version of the URI available in a reference collection of URIs, such as a document index. Furthermore, multiple URIs may identify the same resource, and one of the multiple URIs may be chosen as a canonical URI. For example, two URIs may identify the same resource, yet one URI may include characters that could be removed from the URI while still leaving the URI as a functioning URI. Examples of characters that could be removed include characters associated with session identifiers or other types of characters not necessary for identifying the resource.
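One part of canonicalization, the removal of characters that are not necessary for identifying the resource, might be sketched as follows. This is a minimal sketch under stated assumptions: the parameter names in SESSION_PARAMS and the function name strip_session_params are hypothetical, and a real system may rely on the document index rather than on a fixed list.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical query parameter names treated as session identifiers.
SESSION_PARAMS = {"sessionid", "sid", "phpsessid"}

def strip_session_params(uri: str) -> str:
    """Return the URI with session-identifier query parameters removed."""
    parts = urlsplit(uri)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in SESSION_PARAMS
    ]
    # Rebuild the URI with only the parameters that identify the resource.
    return urlunsplit(parts._replace(query=urlencode(kept)))

cleaned = strip_session_params("http://www.webpage.com/doc?id=7&sid=abc123")
```

In this example, both the original URI and the cleaned URI would identify the same resource, and the shorter form could be chosen as the canonical URI.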
In one implementation described herein, a computer device associated with a URI collection may scan a stored URI in the URI collection and contact a document index (or another reference collection of URIs) to determine a canonical URI for the stored URI. If the canonical URI differs from the stored URI, the stored URI may be replaced with the canonical URI. In one example, the computer device may include a server device that manages a particular URI collection. In another example, the computer device may include a client device that stores URIs.
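The scan-and-replace behavior described above might look like the following sketch. The dictionary DOCUMENT_INDEX is a hypothetical in-memory stand-in for a document index, and the function names are illustrative assumptions rather than part of any described interface.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for a document index: maps a known URI to its
# canonical URI. A real index would be queried over a network.
DOCUMENT_INDEX: Dict[str, str] = {
    "http://old.example.com/page": "http://new.example.com/page",
    "http://new.example.com/page": "http://new.example.com/page",
}

def lookup_canonical(uri: str) -> Optional[str]:
    """Return the canonical URI for a stored URI, or None if unknown."""
    return DOCUMENT_INDEX.get(uri)

def canonicalize_collection(collection: List[str]) -> int:
    """Replace each stored URI whose canonical URI differs; return the count."""
    replaced = 0
    for position, stored in enumerate(collection):
        canonical = lookup_canonical(stored)
        if canonical is not None and canonical != stored:
            collection[position] = canonical
            replaced += 1
    return replaced

bookmarks = ["http://old.example.com/page", "http://unknown.example.org/"]
changed = canonicalize_collection(bookmarks)
```

A URI that is not found in the index is left unchanged in this sketch, since no canonical URI is available to replace it.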
A URI collection may include multiple instances of a same URI. For example, many users may store the same bookmark in their bookmark folder on a bookmark server. Thus, in another implementation described herein, the computer device may generate a unique list of URIs associated with the URI collection and may determine canonical URIs using the unique list of URIs. Once a canonical URI is determined for a particular URI in the unique list of URIs, the canonical URI may be propagated to other instances of the particular URI in the URI collection.
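The unique-list approach might be sketched as follows, assuming a hypothetical bookmark store that maps each user to a list of URIs and a lookup callable that stands in for the document index. The point of the sketch is that each distinct URI is looked up once, and the result is then propagated to every instance.

```python
from typing import Callable, Dict, List, Optional

def canonicalize_with_unique_list(
    user_bookmarks: Dict[str, List[str]],
    lookup: Callable[[str], Optional[str]],
) -> int:
    """Canonicalize every instance of each URI; return the number of lookups."""
    # Build the unique list of URIs across all users.
    unique = {uri for uris in user_bookmarks.values() for uri in uris}

    # Determine a canonical URI once per unique URI.
    replacements: Dict[str, str] = {}
    lookups = 0
    for uri in unique:
        lookups += 1
        canonical = lookup(uri)
        if canonical and canonical != uri:
            replacements[uri] = canonical

    # Propagate each canonical URI to all instances in the collection.
    for uris in user_bookmarks.values():
        uris[:] = [replacements.get(uri, uri) for uri in uris]
    return lookups

index = {"http://a.example/old": "http://a.example/new"}
folders = {
    "alice": ["http://a.example/old", "http://b.example/"],
    "bob": ["http://a.example/old"],
}
lookups = canonicalize_with_unique_list(folders, index.get)
```

In this example, three stored bookmarks require only two lookups, because two users store the same URI.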
In yet another implementation described herein, a URI updates publisher device may obtain a list of URIs that have recently changed from the document index and provide URI updates at particular intervals to subscribers. A computer device, such as a bookmark server, may subscribe to the URI updates publisher device and may receive URI updates at particular intervals. The URI updates may include a list of outdated URIs together with corresponding canonical URIs.
Another implementation described herein may involve obtaining a canonical URI in response to a user selecting an outdated URI. For example, if a user clicks on a URI that is outdated, while using a browser application, the browser application, or an add-on application (e.g., a toolbar) associated with the browser application, may contact a document index (or a URI updates publisher device) to determine a canonical URI. The browser application may receive the canonical URI and may access the resource associated with the outdated URI without having to display an error message to the user. Additionally, the add-on application may report the outdated URI to a device that manages URI updates, such as a URI updates publisher device.
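The click-time behavior described above might be sketched as follows. All names are hypothetical: fetch stands in for the browser's request, lookup_canonical for a query to the document index, and report_outdated for the report sent to a device that manages URI updates.

```python
from typing import Callable, Optional

class ResourceUnavailable(Exception):
    """Raised by the hypothetical fetch callable when a URI cannot be accessed."""

def open_with_fallback(
    uri: str,
    fetch: Callable[[str], str],
    lookup_canonical: Callable[[str], Optional[str]],
    report_outdated: Callable[[str, str], None],
) -> str:
    """Fetch a resource, retrying with the canonical URI if the first try fails."""
    try:
        return fetch(uri)
    except ResourceUnavailable:
        canonical = lookup_canonical(uri)
        if canonical is None or canonical == uri:
            raise  # no better URI is known, so the error is surfaced
        report_outdated(uri, canonical)
        return fetch(canonical)

# Simulated environment for illustration only.
live = {"http://new.example.com/doc": "document body"}
canonical_for = {"http://old.example.com/doc": "http://new.example.com/doc"}
reported = []

def fake_fetch(uri: str) -> str:
    if uri not in live:
        raise ResourceUnavailable(uri)
    return live[uri]

body = open_with_fallback(
    "http://old.example.com/doc",
    fake_fetch,
    canonical_for.get,
    lambda old, new: reported.append((old, new)),
)
```

In this sketch, the user receives the resource without seeing an error message, and the outdated URI is reported as a side effect.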
Another implementation described herein may include identifying a document that includes an outdated URI and sending a notification about the outdated URI to an owner or manager associated with the document. The notification may include a canonical URI obtained from the document index.
A “document,” as the term is used herein, is to be broadly interpreted to include any machine-readable and machine-storable work product. A document may include, for example, an e-mail, a web page or a web site, a file, a combination of files, one or more files with embedded links to other files, a news group posting, a news article, a blog, a business listing, an electronic version of printed text, a web advertisement, etc. In the context of the web (i.e., the Internet), a common document is a web page. Documents often include textual information and may include embedded information (such as meta information, images, hyperlinks, etc.) and/or embedded instructions (such as Javascript, etc.). A “link,” as the term is used herein, is to be broadly interpreted to include any reference to/from a document from/to another document or another part of the same document.
Client device 110 may include a communication or computation device, such as a personal computer, a wireless telephone, a personal digital assistant (PDA), a laptop, or another type of computation or communication device. In one implementation, client device 110 may include an application that enables documents to be accessed. Client device 110 may also include software, such as a plug-in, an applet, a dynamic link library (DLL), or another executable object or process, that may operate in conjunction with (or be integrated into) the application to implement canonicalization of URIs. Client device 110 may obtain the software from a particular software providing server device (not shown in
In one example, the application may include a web browser running Hypertext Transfer Protocol (HTTP) and/or another protocol to access a document based on a URI, such as, for example, SPDY (a Transmission Control Protocol (TCP)-based application level protocol for transporting web content), File Transfer Protocol (FTP), BitTorrent protocol, and/or any other file transfer protocol. In yet another example, client device 110 may correspond to a mobile device and the application may include a program that uses a transfer protocol associated with an operating system running on the mobile device (e.g., Android or iOS).
Network 120 may include any type of network, such as a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), a wireless network, such as a general packet radio service (GPRS) network, an ad hoc network, a telephone network (e.g., the Public Switched Telephone Network (PSTN) or a cellular network), an intranet, the Internet, or a combination of networks. Client device 110, document index server 130, content server 140, URI collection server 150, and/or URI updates publisher server 160 may connect to network 120 via wired and/or wireless connections.
Document index server 130 may include one or more devices (e.g., server devices) that manage a document index. A document index may associate query terms to documents. Document index server 130 may be associated with a search engine that matches query terms to documents. Furthermore, document index server 130 may include a crawler that browses documents on the Internet and determines up-to-date URIs associated with resources. The document index may associate a canonical URI with a resource and may also associate one or more current and/or outdated URIs with the canonical URI.
Content server 140 may include one or more devices (e.g., server devices) that may store one or more resources and/or that may provide content to client device 110. For example, a browser, at client device 110, may request a document associated with a particular URI, and a Domain Name Server (DNS) (not shown in
URI collection server 150 may include one or more devices (e.g., server devices) that are associated with a URI collection. For example, URI collection server 150 may include a bookmark server device that stores bookmarks associated with particular users, a bookmark server device that enables users to share and annotate bookmarks, a mail server device that stores messages sent or received by particular users, a short message service (SMS) server that stores text messages sent or received by particular users, a search history server device that stores search histories associated with particular users, a server device that stores data feed subscriptions for particular users, a server device that stores podcast subscriptions for particular users, a server device that stores messages posted in connection with a discussion group or message board, a server device that stores documents that include URIs, and/or any other computer device associated with a collection of URIs.
URI updates publisher server 160 may include one or more devices (e.g., server devices) that provide URI updates to subscribers. For example, URI updates publisher server 160 may contact document index server 130 to obtain a list of URIs that have been updated since a particular time, such as since a previous time when URI updates publisher server 160 has obtained a list of URIs from document index server 130. URI updates publisher server 160 may receive subscriptions from devices associated with a URI collection, such as URI collection server 150 and/or client device 110. URI updates publisher server 160 may generate a URI update based on the list of URIs obtained from document index server 130 and may send the update to the subscribers. The URI update may relate canonical URIs to outdated URIs.
Computing device 200 may correspond to client device 110, document index server 130, content server 140, URI collection server 150, and/or URI updates publisher server 160. For example, each of client device 110, document index server 130, content server 140, URI collection server 150, and/or URI updates publisher server 160 may include one or more computing devices 200. Mobile computing device 250 may correspond to client device 110 and/or to content server 140. For example, each of client device 110 and/or content server 140 may include one or more mobile computing devices 250.
Computing device 200 may include a processor 202, memory 204, a storage device 206, a high-speed interface 208 connecting to memory 204 and high-speed expansion ports 210, and a low speed interface 212 connecting to low speed bus 214 and storage device 206. Each of the components 202, 204, 206, 208, 210, and 212, may be interconnected using various busses, and may be mounted on a common motherboard or in other manners as appropriate. Processor 202 may process instructions for execution within computing device 200, including instructions stored in the memory 204 or on storage device 206 to display graphical information for a graphical user interface (GUI) on an external input/output device, such as display 216 coupled to high speed interface 208. In another implementation, multiple processors and/or multiple buses may be used, as appropriate, along with multiple memories and types of memory. Also, multiple computing devices 200 may be connected, with each device providing portions of the necessary operations (e.g., as a server bank, a group of blade servers, or a multi-processor system, etc.).
Memory 204 may store information within computing device 200. In one implementation, memory 204 may include a volatile memory unit or units. In another implementation, memory 204 may include a non-volatile memory unit or units. Memory 204 may also be another form of computer-readable medium, such as a magnetic or optical disk. A computer-readable medium may be defined as a non-transitory memory device. A memory device may include memory space within a single physical memory device or spread across multiple physical memory devices.
Storage device 206 may provide mass storage for computing device 200. In one implementation, storage device 206 may include a computer-readable medium, such as a floppy disk device, a hard disk device, an optical disk device, or a tape device, a flash memory or other similar solid state memory device, or an array of devices, including devices in a storage area network or other configurations. A computer program product may be tangibly embodied in an information carrier. The computer program product may also contain instructions that, when executed, perform one or more methods, such as those described below. The information carrier may include a computer- or machine-readable medium, such as memory 204, storage device 206, or memory included within processor 202.
High speed controller 208 may manage bandwidth-intensive operations for computing device 200, while low speed controller 212 may manage lower bandwidth-intensive operations. Such allocation of functions is an example only. In one implementation, high-speed controller 208 may be coupled to memory 204, display 216 (e.g., through a graphics processor or accelerator), and to high-speed expansion ports 210, which may accept various expansion cards (not shown). In the implementation, low-speed controller 212 may be coupled to storage device 206 and to low-speed expansion port 214. Low-speed expansion port 214, which may include various communication ports (e.g., USB, Bluetooth, Ethernet, wireless Ethernet) may be coupled to one or more input/output devices, such as a keyboard, a pointing device, a scanner, or a networking device, such as a switch or router, e.g., through a network adapter.
Computing device 200 may be implemented in a number of different forms, as shown in
Mobile computing device 250 may include a processor 252, a memory 264, an input/output (I/O) device such as a display 254, a communication interface 266, and a transceiver 268, among other components. Mobile computing device 250 may also be provided with a storage device, such as a micro-drive or other device (not shown), to provide additional storage. Each of components 252, 264, 254, 266, and 268 may be interconnected using various buses, and several of the components may be mounted on a common motherboard or in other manners as appropriate.
Processor 252 may execute instructions within mobile computing device 250, including instructions stored in memory 264. Processor 252 may be implemented as a set of chips that may include separate and multiple analog and/or digital processors. Processor 252 may provide, for example, for coordination of the other components of mobile computing device 250, such as, for example, control of user interfaces, applications run by mobile computing device 250, and/or wireless communication by mobile computing device 250.
Processor 252 may communicate with a user through control interface 258 and a display interface 256 coupled to a display 254. Display 254 may include, for example, a TFT LCD (Thin-Film-Transistor Liquid Crystal Display), an OLED (Organic Light Emitting Diode) display, and/or other appropriate display technology. Display interface 256 may comprise appropriate circuitry for driving display 254 to present graphical and other information to a user. Control interface 258 may receive commands from a user and convert them for submission to processor 252. In addition, an external interface 262 may be provided in communication with processor 252, so as to enable near area communication of mobile computing device 250 with other devices. External interface 262 may provide, for example, for wired communication in some implementations, or for wireless communication in other implementations, and multiple interfaces may also be used.
Memory 264 may store information within mobile computing device 250. Memory 264 can be implemented as one or more of a computer-readable medium or media, a volatile memory unit or units, or a non-volatile memory unit or units. Expansion memory 274 may also be provided and connected to mobile computing device 250 through expansion interface 272, which may include, for example, a SIMM (Single In Line Memory Module) card interface. Such expansion memory 274 may provide extra storage space for mobile computing device 250, or may also store applications or other information for mobile computing device 250. Specifically, expansion memory 274 may include instructions to carry out or supplement the processes described above, and may also include secure information. Thus, for example, expansion memory 274 may be provided as a security module for mobile computing device 250, and may be programmed with instructions that permit secure use of mobile computing device 250. In addition, secure applications may be provided via SIMM cards, along with additional information, such as placing identifying information on a SIMM card in a non-hackable manner.
Memory 264 and/or expansion memory 274 may include, for example, flash memory and/or NVRAM memory, as discussed below. In one implementation, a computer program product may be tangibly embodied in an information carrier. The computer program product may store instructions that, when executed, perform one or more methods, such as those described above. The information carrier may correspond to a computer- or machine-readable medium, such as the memory 264, expansion memory 274, or memory included within processor 252, that may be received, for example, over transceiver 268 or over external interface 262.
Mobile computing device 250 may communicate wirelessly through a communication interface 266, which may include digital signal processing circuitry where necessary. Communication interface 266 may provide for communications under various modes or protocols, such as GSM voice calls, SMS, EMS, or MMS messaging, CDMA, TDMA, PDC, WCDMA, CDMA2000, or GPRS, among others. Such communication may occur, for example, through radio-frequency transceiver 268. In addition, short-range communication may occur, such as using a Bluetooth, WiFi, or other such transceiver (not shown). In addition, a Global Positioning System (GPS) receiver module 270 may provide additional navigation- and location-related wireless data to mobile computing device 250, which may be used as appropriate by applications running on mobile computing device 250.
Mobile computing device 250 may also communicate audibly using an audio codec 260, which may receive spoken information from a user and convert it to usable digital information. Audio codec 260 may likewise generate audible sound for a user, such as through a speaker, e.g., in a handset of mobile computing device 250. Such sound may include sound from voice telephone calls, may include recorded sound (e.g., voice messages, music files, etc.) and may also include sound generated by applications operating on mobile computing device 250.
Mobile computing device 250 may be implemented in a number of different forms, as shown in
Various implementations of the systems and techniques described herein may be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software, software applications or code) may include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms “machine-readable medium” and “computer-readable medium” may refer to any computer program product, apparatus and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term “machine-readable signal” may refer to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described herein may be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer. Other kinds of devices may be used to provide for interaction with a user as well; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described herein may be implemented in a computing system that includes a back end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front end component (e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back end, middleware, or front end components. The components of the system may be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a LAN, a WAN, and the Internet.
Add-on application 310 may be associated with a browser application and/or another application that accesses resources using URIs. In one example, add-on application 310 may be incorporated into a browser application (e.g., Google Chrome, Microsoft Internet Explorer, Apple Safari, Mozilla Firefox, etc.). In another example, a user of client device 110 may be offered an option to install add-on application 310 by itself or as part of another application (e.g., a toolbar for a browser application). In one example, add-on application 310 may include one or more selectable visual elements, such as an option to activate or de-activate add-on application 310. In another example, add-on application 310, after obtaining the user's permission to activate, may not be associated with any selectable visual object and may function without interaction with the user.
Add-on application 310 may be associated with a URI collection 312 and may include a URI update manager 314 and a URI monitor 316.
URI collection 312 may store one or more URIs. In one example, URI collection 312 may include a bookmark collection associated with a browser application. In another example, URI collection 312 may include a browsing history associated with the browser application. In yet another example, URI collection 312 may include URIs included in messages sent and/or received by the user of client device 110 in connection with a particular application, such as, for example, an email application, a text messaging application, and/or an instant messaging application.
URI update manager 314 may update URIs stored in URI collection 312 to canonical URIs based on information received from another device, such as document index server 130, URI collection server 150, and/or URI updates publisher server 160.
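Applying a received URI update to a local collection might be sketched as follows. The sketch assumes, hypothetically, that a URI update arrives as a mapping from each old URI to its new URI, and that the collection is a simple list of URI strings.

```python
from typing import Dict, List

def apply_uri_update(collection: List[str], update: Dict[str, str]) -> List[str]:
    """Replace stored URIs named in the update; return the old URIs replaced."""
    applied: List[str] = []
    for position, stored in enumerate(collection):
        new_uri = update.get(stored)
        # Only URIs that are actually stored in the collection are updated.
        if new_uri is not None and new_uri != stored:
            collection[position] = new_uri
            applied.append(stored)
    return applied

uri_collection = ["http://old.example.com/a", "http://keep.example.com/"]
uri_update = {
    "http://old.example.com/a": "http://new.example.com/a",
    "http://old.example.com/z": "http://new.example.com/z",
}
applied = apply_uri_update(uri_collection, uri_update)
```

Entries in the update that do not match any stored URI are ignored in this sketch.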
URI monitor 316 may monitor documents being accessed by client device 110 for outdated URIs. For example, if the user of client device 110 is browsing a document that includes links (e.g., URIs of other documents or other types of resources), URI monitor 316 may check whether the URIs included in the document are functioning. In one example, URI monitor 316 may attempt to access a resource associated with a URI included in the document, without providing the resource to an output device associated with client device 110, to determine whether the resource can be accessed. In another example, URI monitor 316 may contact document index server 130 to determine whether URIs included in the document are associated with canonical URIs that are different. URI monitor 316 may report any determined outdated URIs to a particular device, such as URI updates publisher server 160.
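The link-checking behavior described for the URI monitor might be sketched as follows for an HTML document. The class and function names are hypothetical, and is_reachable is an assumed callable that stands in for either a silent access attempt or a query to the document index.

```python
from html.parser import HTMLParser
from typing import Callable, List

class LinkExtractor(HTMLParser):
    """Collects the href value of every anchor tag in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def find_outdated_links(html: str, is_reachable: Callable[[str], bool]) -> List[str]:
    """Return the URIs in the document that the checker reports as not functioning."""
    parser = LinkExtractor()
    parser.feed(html)
    return [uri for uri in parser.links if not is_reachable(uri)]

page = (
    '<p><a href="http://a.example/ok">ok</a> '
    '<a href="http://a.example/gone">gone</a></p>'
)
outdated = find_outdated_links(page, lambda uri: uri.endswith("/ok"))
```

The resulting list could then be reported to a device that manages URI updates.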
Document index 332 may associate URIs with resources and may associate a canonical URI with one or more other URIs. Example fields that may be stored in document index 332 are described below with reference to
User memory 352 may store information associated with user accounts. Example fields that may be stored in user memory 352 are described below with reference to
Subscriber memory 362 may store information about subscribers that subscribe to a URI update service with URI updates publisher server 160. For example, subscriber memory 362 may store information (e.g., a network address and/or port number) associated with particular URI collection servers 150 (e.g., a bookmark server, a search history server, a mail server, etc.). In another example, if content server 140 includes documents which include many URIs, content server 140 may also subscribe to URI updates publisher server 160. For example, content server 140 may store news articles that include links to other news articles. News article documents may be associated with URIs that change often. Therefore, if content server 140 subscribes to URI updates publisher server 160, content server 140 may benefit by keeping URIs, included in documents hosted by content server 140, current. URI updates publisher server 160 may charge a subscription fee for the URI updates subscription service. In yet another example, client device 110 may subscribe to URI updates publisher server 160.
URI update manager 364 may contact document index server 130 to obtain canonical URIs that have recently changed (e.g., since the last time URI update manager 364 contacted document index server 130) via index interface 368 and may store the obtained URIs in URI update memory 366. URI update manager 364 may generate a URI update that includes information about URIs that have recently changed and may forward the generated URI update to subscribers via subscriber interface 370.
Index interface 368 may convert a request from URI update manager 364 into a particular format associated with document index server 130 and may convert messages received from document index server 130 into a particular format associated with URI update manager 364. Subscriber interface 370 may convert a URI update message into a particular format associated with a particular subscriber and may convert messages received from a particular subscriber into a particular format associated with URI update manager 364. URI update memory 366 may store information about URIs received from document index server 130.
Link monitor 372 may identify a document that includes a broken link, based on an indication of an outdated URI stored in URI update memory 366, and may send a notification to an owner or manager associated with the document. The notification may include a canonical URI that may be used to replace the outdated URI. Content manager interface 374 may convert a message from link monitor 372 into a particular format associated with an owner or manager of a document that includes a broken link.
Resource ID field 410 may store information identifying a particular resource. For example, resource ID field 410 may store a string that uniquely identifies the resource. Canonical URI field 420 may store a canonical URI associated with the resource. Other URIs field 430 may store one or more other URIs associated with the resource, such as an outdated URI. Backlinks field 440 may store information about documents that include a URI stored in canonical URI field 420 and/or other URIs field 430. In other words, backlinks field 440 may store backlinks associated with the resource.
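As a minimal sketch of how such a record might be represented, the following Python example models the four fields and a lookup that maps any known URI to its canonical URI. The names DocumentRecord, DocumentIndex, and canonical_for, as well as the sample values, are assumptions made for illustration and do not appear in the implementations described herein.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DocumentRecord:
    """Hypothetical in-memory form of one document index record."""
    resource_id: str                                      # resource ID field 410
    canonical_uri: str                                    # canonical URI field 420
    other_uris: List[str] = field(default_factory=list)   # other URIs field 430
    backlinks: List[str] = field(default_factory=list)    # backlinks field 440


class DocumentIndex:
    """Toy index that resolves any known URI to its record (an assumption)."""

    def __init__(self) -> None:
        self._by_uri: Dict[str, DocumentRecord] = {}

    def add(self, record: DocumentRecord) -> None:
        # Index the record under its canonical URI and every other known URI.
        for uri in [record.canonical_uri, *record.other_uris]:
            self._by_uri[uri] = record

    def canonical_for(self, uri: str) -> Optional[str]:
        # Return the canonical URI for a known URI, or None if it is unknown.
        record = self._by_uri.get(uri)
        return record.canonical_uri if record else None


index = DocumentIndex()
index.add(DocumentRecord(
    resource_id="video-123",
    canonical_uri="www.newURL.com",
    other_uris=["www.oldURL.com"],
    backlinks=["www.example-domain.com/link.html"],
))
```

In this sketch an outdated URI and the canonical URI resolve to the same record, which is one possible way to relate a canonical URI to one or more other URIs.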
User ID field 460 may store information identifying a particular user. For example, user ID field 460 may store a string that uniquely identifies the particular user. URI field 470 may store URIs associated with the particular user. For example, URI field 470 may store URIs associated with the particular user's bookmarks, URIs associated with the particular user's search history, URIs associated with messages sent or received by the particular user, etc.
The process of
A determination may be made whether the canonical URI differs from the retrieved URI (block 530). For example, URI update manager 354 (or URI update manager 314) may compare the received canonical URI to the retrieved URI. The retrieved URI may be updated to the canonical URI if the canonical URI differs from the retrieved URI (block 540). For example, URI update manager 354 may update the retrieved URI to the canonical URI in URI field 470 of user record 451 associated with the retrieved URI. As another example, URI update manager 314 may update the retrieved URI to the canonical URI in URI collection 312.
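The comparison and replacement of blocks 530 and 540 may be sketched as follows. This is an illustrative example only: the function update_stored_uri and the dictionary standing in for the document index are assumptions, not part of the described implementations.

```python
from typing import Dict, List, Optional


def update_stored_uri(collection: List[str], position: int,
                      canonical_lookup: Dict[str, str]) -> bool:
    """Replace collection[position] with its canonical URI if they differ.

    canonical_lookup is a hypothetical stand-in for a document index query.
    Returns True when a replacement was made.
    """
    retrieved = collection[position]
    canonical: Optional[str] = canonical_lookup.get(retrieved)
    # Block 530: determine whether the canonical URI differs.
    if canonical is None or canonical == retrieved:
        return False
    # Block 540: update the retrieved URI to the canonical URI.
    collection[position] = canonical
    return True


bookmarks = ["www.bookmark.com", "www.unchanged.com"]
lookup = {"www.bookmark.com": "www.newbookmark.com",
          "www.unchanged.com": "www.unchanged.com"}
changed = [update_stored_uri(bookmarks, i, lookup) for i in range(len(bookmarks))]
```

A URI that has no entry in the lookup is left untouched in this sketch, which is a design choice rather than a requirement.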
The process of
A document index may be checked to identify URIs that have changed (block 620). For example, URI update manager 354 (or URI update manager 314) may compare a particular URI from the list of unique URIs with the canonical URI associated with the particular URI to determine whether the canonical URI differs from the particular URI. If the canonical URI differs from the particular URI, the particular URI may be identified as a URI that has changed.
URIs in the list of unique URIs may be canonicalized using changed URIs from the document index (block 630). For example, URI update manager 354 may change a particular URI, which has been identified as a URI that has changed, to a canonical URI associated with the particular URI. As another example, URI update manager 314 may change a particular URI stored on client device 110, which has been identified as a URI that has changed, to a canonical URI associated with the particular URI.
The canonicalized URIs may be propagated to other instances in the generated list of URIs (block 640). For example, URI update manager 354 may propagate the canonicalized URI to other instances in the collection of URIs. In particular, URI update manager 354 may locate, in user memory 352, instances of a particular URI that is stored in URI list 356 and that has changed to a canonical URI, and may change all instances of the particular URI in user memory 352 to the canonical URI. Thus, as an example, if 100 users have saved a URI “www.bookmark.com” as a bookmark, and the URI “www.bookmark.com” has been canonicalized to the URI “www.newbookmark.com,” URI update manager 354 may change the bookmark in the 100 user accounts that include the bookmark. As another example, URI update manager 314 may change all instances of a URI that has been canonicalized on client device 110. For example, assume client device 110 includes the URI “www.myhomepage.com” in a bookmark folder of a browser application, in an email message sent to a contact of the user of client device 110, and in a document composed by a word processing program. Further assume that the URI “www.myhomepage.com” has been canonicalized to “www.mynewhomepage.com.” URI update manager 314 may change all three instances of the URI “www.myhomepage.com” to “www.mynewhomepage.com.”
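The batch flow of blocks 620 through 640 may be sketched as follows, using the 100-user bookmark example above. The function canonicalize_collection, the dictionary layout of user memory, and the dictionary standing in for the document index are assumptions made for illustration.

```python
from typing import Dict, List, Set


def canonicalize_collection(user_memory: Dict[str, List[str]],
                            canonical_lookup: Dict[str, str]) -> Dict[str, str]:
    """Canonicalize changed URIs across all user records.

    user_memory maps a user ID to that user's stored URIs (hypothetical layout).
    Returns a mapping of each changed URI to its canonical URI.
    """
    # Generate the list of unique URIs (the step that precedes block 620).
    unique_uris: Set[str] = {uri for uris in user_memory.values() for uri in uris}

    # Block 620: identify URIs whose canonical URI differs from the stored URI.
    changed = {uri: canonical_lookup[uri]
               for uri in unique_uris
               if uri in canonical_lookup and canonical_lookup[uri] != uri}

    # Blocks 630 and 640: canonicalize and propagate to every stored instance.
    for user_id, uris in user_memory.items():
        user_memory[user_id] = [changed.get(uri, uri) for uri in uris]
    return changed


user_memory = {f"user-{i}": ["www.bookmark.com", "www.other.com"]
               for i in range(100)}
changed = canonicalize_collection(
    user_memory, {"www.bookmark.com": "www.newbookmark.com"})
```

Deduplicating first means the document index is consulted once per distinct URI rather than once per stored instance, which is the apparent motivation for building a list of unique URIs.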
The process of
The process of
URI updates may be generated (block 820). For example, URI update manager 364 may generate a URI update that includes a list of URIs that have changed since a previous URI update, along with the corresponding canonical URIs. For example, an entry included in the URI update may include “www.oldURL.com has changed to www.newURL.com”. The generated URI updates may be provided to subscribers (block 830). For example, URI update manager 364 may retrieve a list of subscribers from subscriber memory 362 and may send the generated URI update to devices identified in the retrieved list of subscribers.
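Blocks 820 and 830 may be sketched as follows. The names UriChange, generate_uri_update, and publish are hypothetical, and subscribers are modeled as plain callables rather than as network addresses stored in subscriber memory 362.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class UriChange:
    """One entry of a URI update (hypothetical structure)."""
    outdated_uri: str
    canonical_uri: str

    def describe(self) -> str:
        return f"{self.outdated_uri} has changed to {self.canonical_uri}"


def generate_uri_update(changes: Dict[str, str]) -> List[UriChange]:
    # Block 820: pair each changed URI with its canonical URI,
    # skipping entries where nothing actually changed.
    return [UriChange(old, new)
            for old, new in sorted(changes.items()) if old != new]


def publish(update: List[UriChange],
            subscribers: List[Callable[[List[UriChange]], None]]) -> int:
    # Block 830: provide the generated update to every subscriber.
    for deliver in subscribers:
        deliver(update)
    return len(subscribers)


received: List[List[UriChange]] = []
update = generate_uri_update({"www.oldURL.com": "www.newURL.com"})
delivered = publish(update, [received.append])
```

In a deployed system each callable would presumably wrap a subscriber interface that converts the update into the subscriber's format; that conversion is omitted here.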
The process of
The resource may be accessed using the obtained canonical URI (block 930). For example, add-on application 310 may instruct the browser application to access the resource using the canonical URI. Additionally, if the outdated URI is stored by client device 110, add-on application 310 may replace the stored outdated URI with the canonical URI.
The outdated URI may be reported (block 940). For example, add-on application 310 may report the outdated URI to URI updates publisher server 160. Furthermore, in some situations, document index server 130 may not include a canonical URI. For example, a URI associated with a resource may have changed and crawler 334 may not have determined a new URI for the resource yet. In such situations, the browser application may generate an error message and add-on application 310 may report the outdated URI to document index server 130.
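The client-side handling of blocks 930 and 940 may be sketched as follows. The function access_with_fallback and its parameters are assumptions: fetch stands in for the browser application, canonical_lookup for a query to document index server 130, and the two report callables for messages to URI updates publisher server 160 and document index server 130.

```python
from typing import Callable, Dict, List, Optional

Fetch = Callable[[str], Optional[str]]   # returns None when access fails
Report = Callable[[str], None]


def access_with_fallback(uri: str, fetch: Fetch,
                         canonical_lookup: Dict[str, str],
                         stored_uris: List[str],
                         report_to_publisher: Report,
                         report_to_index: Report) -> Optional[str]:
    content = fetch(uri)
    if content is not None:
        return content                   # the link works; nothing to update

    canonical = canonical_lookup.get(uri)
    if canonical is None or canonical == uri:
        # No canonical URI is known yet: report to the document index.
        report_to_index(uri)
        return None

    # Replace any stored copy of the outdated URI with the canonical URI.
    for i, stored in enumerate(stored_uris):
        if stored == uri:
            stored_uris[i] = canonical
    # Block 940: report the outdated URI to the updates publisher.
    report_to_publisher(uri)
    # Block 930: access the resource using the canonical URI.
    return fetch(canonical)


resources = {"www.mynewhomepage.com": "<html>home</html>"}
stored = ["www.myhomepage.com"]
publisher_reports: List[str] = []
index_reports: List[str] = []
page = access_with_fallback(
    "www.myhomepage.com", resources.get,
    {"www.myhomepage.com": "www.mynewhomepage.com"},
    stored, publisher_reports.append, index_reports.append)
```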
The process of
A document may be identified that includes the outdated URI (block 925). For example, link monitor 372 may access backlinks field 440 of document record 401 associated with the outdated URI to determine documents that include the outdated URI. A content manager associated with the identified document may be identified (block 935). For example, link monitor 372 may identify a manager or owner associated with the document that includes the outdated URI. In one example, contact information associated with the manager or owner associated with the document may be stored in backlinks field 440 or may be stored in another memory of documents. In another example, link monitor 372 may obtain contact information associated with the manager or owner by searching a domain associated with the document. Link monitor 372 may search the domain for terms indicative of contact information. For example, assume an outdated URI “www.outdatedURI.com” is included in a document identified by the URI “www.example-domain.com/link.html.” Link monitor 372 may search www.example-domain.com for a URI that includes the term “contact” and may search a document associated with the URI for an email address.
A notification may be sent to the identified content manager about the outdated URI (block 945). For example, link monitor 372 may send a notification, via content manager interface 374, to an address associated with the determined content manager. The notification may include information identifying the outdated URI and may include a new canonical URI associated with the outdated URI.
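The notification flow of blocks 925 through 945 may be sketched as follows. The names Notification and build_notifications, the contacts mapping, and the sample email address are hypothetical; how contact information is actually obtained is outside the scope of this sketch.

```python
from typing import Dict, List, NamedTuple


class Notification(NamedTuple):
    """Hypothetical message sent to a content manager."""
    recipient: str
    document_uri: str
    outdated_uri: str
    canonical_uri: str


def build_notifications(outdated_uri: str, canonical_uri: str,
                        backlinks: List[str],
                        contacts: Dict[str, str]) -> List[Notification]:
    notifications = []
    # Block 925: each backlink is a document that includes the outdated URI.
    for document_uri in backlinks:
        # Block 935: identify the content manager for the document.
        recipient = contacts.get(document_uri)
        if recipient is None:
            continue  # no contact information available; skip in this sketch
        # Block 945: the notification names the outdated URI and its replacement.
        notifications.append(
            Notification(recipient, document_uri, outdated_uri, canonical_uri))
    return notifications


notes = build_notifications(
    outdated_uri="www.outdatedURI.com",
    canonical_uri="www.newURL.com",
    backlinks=["www.example-domain.com/link.html", "www.no-contact.com/page"],
    contacts={"www.example-domain.com/link.html": "owner@example-domain.com"},
)
```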
Document index server 130 may crawl content server 140 (signal 1020) and may determine that a new URI is associated with the video file (signal 1030). For example, content server 140 may have changed domain names or may have moved the video file to a different location. Document index server 130 may store the new URI as the canonicalized URI in connection with the video file.
URI collection server 150 may periodically check with document index server 130 for a list of URIs that have been updated (signal 1040). Document index server 130 may provide URI updates to URI collection server 150 (signal 1050). URI collection server 150 may update the URI associated with the video file as stored in the message in the user's “sent emails” folder (signal 1060). At a later time, the user may access the sent email and may select the updated URI included in the email, which may now correspond to the correct URI associated with the video file (signal 1070). Thus, the user may be able to access the video file from the sent email message, even though the URI associated with the video file has changed.
The user may choose to store the search results in the user's search history stored by URI collection server 150 (signal 1120). Crawler 334, associated with document index server 130, may crawl content server 140 (signal 1130). Crawler 334 may obtain a new URI associated with a URI stored in the user's search history (signal 1140). URI updates publisher server 160 may periodically check for URI updates by accessing document index server 130 (signal 1150). URI updates publisher server 160 may obtain a list of URIs that have been updated (signal 1160). The obtained list may include the URI associated with the document stored by content server 140.
URI updates publisher server 160 may publish a URI update, which may include a list of URIs that have recently changed. URI collection server 150 (which in this example includes a server that stores search histories) may subscribe to URI updates publisher server 160. Since URI collection server 150 is a subscriber of URI updates publisher server 160, URI collection server 150 may receive the URI update from URI updates publisher server 160 (signal 1170).
URI collection server 150 may update user search histories which include URIs that have changed, as indicated in the received URI update (signal 1180). Thus, URI collection server 150 may update the search history associated with the user of client device 110. When the user of client device 110 accesses the stored search history to retrieve the document stored in content server 140, the search history may store the correct URI for the document (signal 1190). Thus, the user may be able to access the document from the user's search history, even though the URI associated with the document has changed.
Example 1200 may include a browser application on client device 110-A accessing a document using an old URI (signal 1210). The old URI may correspond to a broken link and client device 110-A may fail to retrieve the document (item 1215). In response, add-on application 310, associated with the browser application, may check for updates associated with the old URI by accessing document index server 130 (signal 1220) and may retrieve a new URI associated with the document (signal 1230). The browser application may access the document stored at content server 140-A using the new URI (signal 1240).
Furthermore, add-on application 310 may report the broken link to URI updates publisher server 160 and may provide the new URI, received from document index server 130, to URI updates publisher server 160 (signal 1250). In another example, when add-on application 310 detects a broken link, add-on application 310 may report the broken link directly to URI updates publisher server 160, URI updates publisher server 160 may determine a new URI by contacting document index server 130, and URI updates publisher server 160 may provide the new URI to add-on application 310.
URI updates publisher server 160 may publish a URI update and may include the new URI in the published URI update (signal 1260). Client device 110-B may be a subscriber of URI updates publisher server 160 and may receive the URI update. In response, add-on application 310 running on client device 110-B may update the old URI, stored in a bookmark folder, to the new URI (item 1265).
URI updates publisher server 160 may check for documents that include the old URI by contacting document index server 130 (signal 1270). Document index server 130 may provide backlink information for the document associated with the old URI, which may include information about documents that include the old URI (signal 1280). URI updates publisher server 160 may identify an owner of the document that includes the old URI, which in this case may be content server 140-B. Content server 140-B may store a document that includes the old URI (item 1205). URI updates publisher server 160 may determine contact information for content server 140-B and may send a message to content server 140-B, informing content server 140-B about the broken link and providing the new URI (signal 1290). Content server 140-B may update the document by replacing the old URI in the document with the new URI (item 1295).
The foregoing description provides illustration and description, but is not intended to be exhaustive or to limit the implementations to the precise form disclosed. Modifications and variations are possible in light of the above teachings or may be acquired from practice of these implementations.
For example, while series of blocks or signals have been described with regard to
Also, certain portions of the implementations may have been described as a “component,” “manager,” “monitor,” “crawler,” or “interface” that performs one or more functions. The terms “component,” “manager,” “monitor,” “crawler,” and “interface” may include hardware, such as a processor, an ASIC, or an FPGA, or a combination of hardware and software (e.g., software running on a processor).
Furthermore, while implementations described herein have been described with respect to URIs, other types of resource identifiers may be used. A resource identifier may include any string of characters (e.g., name, network address, identifier, etc.) that uniquely identifies a resource.
It will be apparent that aspects described herein may be implemented in many different forms of software, firmware, and hardware in the implementations illustrated in the figures. The actual software code or specialized control hardware used to implement aspects does not limit the embodiments. Thus, the operation and behavior of the aspects were described without reference to the specific software code—it being understood that software and control hardware can be designed to implement the aspects based on the description herein.
It should be emphasized that the term “comprises/comprising,” when used in this specification, is taken to specify the presence of stated features, integers, steps, or components, but does not preclude the presence or addition of one or more other features, integers, steps, components, or groups thereof.
Even though particular combinations of features are recited in the claims and/or disclosed in the specification, these combinations are not intended to limit the disclosure of the invention. In fact, many of these features may be combined in ways not specifically recited in the claims and/or disclosed in the specification. Although each dependent claim listed below may directly depend on only one other claim, the disclosure of the invention includes each dependent claim in combination with every other claim in the claim set.
No element, act, or instruction used in the present application should be construed as critical or essential to the invention unless explicitly described as such. Also, as used herein, the article “a” is intended to include one or more items. Where only one item is intended, the term “one” or similar language is used. Further, the phrase “based on” is intended to mean “based, at least in part, on” unless explicitly stated otherwise.