The present invention generally relates to website communication and management, and, more specifically, to systems and methods for efficiently and effectively retaining placement of a website in Internet search results when the website is transferred between website hosting providers.
The Internet comprises a vast number of computers and computer networks that are interconnected through communication links. The interconnected computers exchange information using various services. In particular, a server computer system, referred to herein as a web server, may connect through the Internet to a remote client computer system and may send, to the remote client computer system upon request, one or more websites containing one or more graphical and textual web pages of information. The information on web pages is in the form of programmed source code that the browser interprets to determine what to display on the requesting device. The source code may include document formats, objects, parameters, positioning instructions, and other code that is defined in one or more web programming or markup languages. One web programming language is HyperText Markup Language (HTML), and all web pages use it to some extent. HTML uses text indicators called tags to provide interpretation instructions to the browser. The tags specify the composition of design elements such as text, images, shapes, hyperlinks to other web pages, programming objects such as JAVA applets, form fields, tables, and other elements. The web page can be formatted for proper display on computer systems with widely varying display parameters, due to differences in screen size, resolution, processing power, and maximum download speeds.
Websites, unless they are extremely large and complex or have unusual traffic demands, typically reside on a single server and are prepared and maintained by a single individual or entity. Some Internet users, typically those that are larger and more sophisticated, may provide their own hardware, software, and connections to the Internet. But many Internet users either do not have the resources available or do not want to create and maintain the infrastructure necessary to host their own websites. To assist such individuals (or entities), hosting companies exist that offer website hosting services. These hosting service providers typically provide the hardware, software, and electronic communication means necessary to connect multiple websites to the Internet. A single hosting service provider may literally host thousands of websites on one or more hosting web servers.
To view a website, a request is made to the web server by visiting the website's address. Upon receipt, the requesting device can display the web pages. The request and display of the websites are typically conducted using a browser. A browser is a special-purpose application program that effects the requesting of web pages and the displaying of web pages. Browsers are able to locate specific websites because each website, resource, and computer on the Internet has a unique Internet Protocol (IP) address. Presently, there are two standards for IP addresses. The older IP address standard, often called IP Version 4 (IPv4), is a 32-bit binary number, which is typically shown in dotted decimal notation, where four 8-bit bytes are separated by a dot from each other (e.g., 64.202.167.32). The notation is used to improve human readability. The newer IP address standard, often called IP Version 6 (IPv6) or Next Generation Internet Protocol (IPng), is a 128-bit binary number. The standard human readable notation for IPv6 addresses presents the address as eight 16-bit hexadecimal words, each separated by a colon (e.g., 2EDC:BA98:0332:0000:CF8A:000C:2154:7313).
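The two address notations described above can be inspected with Python's standard `ipaddress` module. The following is only an illustrative sketch using the two example addresses quoted in the text; the variable names are chosen here for illustration.

```python
import ipaddress

# The two example addresses quoted in the text above.
v4 = ipaddress.ip_address("64.202.167.32")
v6 = ipaddress.ip_address("2EDC:BA98:0332:0000:CF8A:000C:2154:7313")

# Each address is a single binary number: 32 bits for IPv4, 128 bits for IPv6.
v4_bits = v4.max_prefixlen
v6_bits = v6.max_prefixlen

# Dotted decimal notation simply lists the four 8-bit bytes of the IPv4 number.
v4_bytes = list(v4.packed)

# The exploded form restores all eight 16-bit hexadecimal words (in lowercase).
v6_words = v6.exploded.split(":")

print(v4_bits, v4_bytes)
print(v6_bits, v6_words)
```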
IP addresses, however, even in human readable notation, are difficult for people to remember and use. A uniform resource locator (URL) is much easier to remember and may be used to point to any computer, directory, or file on the Internet. A browser is able to access a website on the Internet through the use of a URL. The URL may include a Hypertext Transfer Protocol (HTTP) request combined with the website's Internet address, also known as the website's domain name. An example of a URL with an HTTP request and domain name is: http://www.companyname.com. In this example, the “http” identifies the URL as an HTTP request and the “companyname.com” is the domain name.
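As an illustration of the URL anatomy described above, the example URL can be split into its parts with Python's standard `urllib.parse` module. Taking the last two host labels as the domain name is a simplifying assumption made for this sketch; it holds for names under .com but not for multi-label suffixes such as .co.uk.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://www.companyname.com")

scheme = parts.scheme    # the protocol portion of the URL
host = parts.hostname    # the full host name, including "www"

# Assumption for illustration only: the domain name is the last two labels.
domain = ".".join(host.split(".")[-2:])

print(scheme, host, domain)
```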
Domain names are much easier to remember and use than their corresponding IP addresses. The Internet Corporation for Assigned Names and Numbers (ICANN) approves some Generic Top-Level Domains (gTLD) and delegates the responsibility to a particular organization (a “registry”) for maintaining an authoritative source for the registered domain names within a TLD and their corresponding IP addresses. The process for registering a domain name with .com, .net, .org, and some other TLDs allows an Internet user to use an ICANN-accredited registrar to register their domain name. Domain names are typically registered for a period of one to ten years with first rights to continually re-register the domain name.
The process of translating user-friendly domain names to IP Addresses is called Name Resolution. The domain name system (DNS) is the world's largest distributed computing system that enables access to any resource in the Internet by performing name resolution. A DNS name resolution is the first step in the majority of Internet transactions. The DNS is a client-server system that provides this name resolution service through a family of servers. In order for the domain name to resolve to the IP Address where the web server makes the website available, the web server may need to maintain several types of DNS server records, including the Address (A) record, Name Server (NS) record, and Mail Exchange (MX) record, among others. The DNS records contain information about the website location and resolution instructions to be interpreted by the DNS server. When a website is transferred between locations, such as if the web server is physically or electronically relocated or the hosting provider for the website is changed, these DNS records must be updated to resolve the domain name to the new location.
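The effect of a hosting transfer on name resolution can be pictured with a toy, in-memory record table. This is a sketch only: the `zone` layout, the function names, the `.example` name servers, and the new address 192.0.2.10 (from a range reserved for documentation) are all hypothetical, and real DNS records carry additional fields such as time-to-live values.

```python
# Hypothetical, simplified zone data keyed by domain name and record type.
zone = {
    "companyname.com": {
        "A": ["64.202.167.32"],
        "NS": ["ns1.oldhost.example"],
        "MX": ["mail.companyname.com"],
    }
}


def resolve(zone, domain):
    """Return the first A-record address for a domain, or None if absent."""
    addresses = zone.get(domain, {}).get("A", [])
    return addresses[0] if addresses else None


def transfer_hosting(zone, domain, new_ip, new_name_servers):
    """Point the domain at a new web server by rewriting its A and NS records."""
    records = zone.setdefault(domain, {})
    records["A"] = [new_ip]
    records["NS"] = list(new_name_servers)


before = resolve(zone, "companyname.com")
transfer_hosting(zone, "companyname.com", "192.0.2.10", ["ns1.newhost.example"])
after = resolve(zone, "companyname.com")
print(before, "->", after)
```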
For Internet users and businesses alike, the Internet continues to be increasingly valuable. More people use the Web for everyday tasks, from social networking, shopping, banking, and paying bills to consuming media and entertainment. E-commerce is growing, with businesses delivering more services and content across the Internet, communicating and collaborating online, and inventing new ways to connect with each other. Competition between businesses has increased, as more businesses can access the same customers electronically. That is, a local business does not only compete with its “brick-and-mortar” physical neighbors, but also with businesses in distant locations and businesses that interact with customers purely online.
Customers frequently use Internet search engines, such as GOOGLE, BING, YAHOO, or BAIDU, to find businesses that provide the goods or services sought. Internet search engines create indexes of websites based on the contents of the websites. A searching customer enters keywords relevant to the goods or services into the search engine and receives search engine results pages (SERPs) displaying websites or web pages from the index in order of relevance to the entered keywords. In order to attract customers online, a business benefits from its website placing highly on SERPs for keywords that are relevant to its business. To improve its placement, a business may engage in search engine optimization (SEO) of its website. SEO may include modifying the code of web pages in the business's website to include strategically selected keywords in particular parts of the web pages. The optimized web pages must be exposed to the search engine's indexing activities for the SEO to be effective. If a web page is properly indexed, its prominence (i.e., its placement within the SERPs) can continually improve through scoring metrics, such as GOOGLE Page Rank, performed by the search engine.
Unfortunately, many changes to a website's structure can inhibit the search engines' indexing activities. For example, changing a URL for a web page or moving the website in a way that requires DNS record changes will separate the pages of the website from the accrued scoring information for one or more of the web pages. As a result, website owners are hesitant to make major changes to their website, such as transferring their website to a new hosting provider, because they fear they will lose earned prominence of their web pages.
The present invention overcomes the aforementioned drawbacks by providing a system and method for implementing changes to a website without losing the indexing status and accumulated SEO metrics of each of the web pages. The web server tasked with serving the website to requesting devices, which is also known as a hosting provider and may be the new web server in a hosting-transfer situation as described below, may perform one or more algorithms for the website changes. Alternatively, the web server may assign the changes to a related computer system, such as another web server, collection of web or other servers, a dedicated data processing computer, or another computer capable of performing the creation algorithms. Alternatively, a standalone program may be delivered to and installed on a personal computing device, such as the user's desktop computer or mobile device, and the standalone program may be configured to cause the personal computing device to perform the algorithms. For clarity of explanation, and not to limit the implementation of the present methods, the methods are described below as being performed by a web server that serves the web page to requesting devices.
In one implementation, a method in accordance with the present disclosure includes: receiving, on a server computer and from a requestor in communication with the server computer over a computer network, a request for a first web page hosted at a source URL; determining, by the server computer, a destination URL from one or more of the source URL and the first web page; and redirecting, by the server computer, the requestor to the destination URL. In another implementation, a method in accordance with the present disclosure includes: obtaining, by a server computer, one or more source URLs each corresponding to one of a plurality of first web pages of a first website; storing, by the server computer, one or more of the source URLs as source paths in a page mapping table that associates each of the source paths with a destination path; for each source path, determining if one of a plurality of second web pages should be associated with the source path and, if one of the second web pages should be associated with the source path, storing the URL of the second web page as the destination path associated with the source path; receiving, on the server computer and from a requestor, a request for one of the first web pages, the request comprising the source URL corresponding to the requested first web page; determining, by the server computer, a destination URL by identifying the source path in which the source URL of the request is stored, and retrieving, as the destination URL, the URL stored in the destination path associated with the identified source path; and redirecting, by the server computer, the requestor to the destination URL. 
In yet another implementation, a system in accordance with the present invention includes a processor configured to: obtain a source URL for a first web page of a first website; store the source URL as a source path in a page mapping table that associates each of a plurality of source paths with a destination path; match the first web page to a second web page of a second website; and store, in the destination path associated with the source path that contains the source URL, the URL of the second web page.
Referring to
The website data store 120, and other data stores described below, may be any repository of information that is or can be made freely or securely accessible by the web server 100. Suitable data stores include, without limitation: databases or database systems, which may be a local database, online database, desktop database, server-side database, relational database, hierarchical database, network database, object database, object-relational database, associative database, concept-oriented database, entity-attribute-value database, multi-dimensional database, semi-structured database, star schema database, XML database, file, collection of files, spreadsheet, or other means of data storage located on a computer, client, server, or any other storage device known in the art or developed in the future; file systems; and other electronic files.
The requesting device 110 may request website content when a user enters a URL for the website in the requesting device's 110 browser. The browser then uses the requesting device's 110 communication protocols to access a DNS server 105. The DNS server 105 stores DNS records for the website in a name resolution database 115. The DNS server 105 uses the DNS records to resolve the URL to an IP address for the web server 100 and directs the browser of the requesting device 110 to that IP address. Similarly, as is known in the art, a search engine 130 can access the DNS server 105 to obtain the resolution of the website's domain name to the IP address for the web server 100, and can then index the website in order to include the website in the search engine's 130 SERPs. Indexing the website can include storing information about the website in an index data store 125. The stored information can include website content that the search engine interprets, in light of information stored for other indexed websites, to determine a suitable ordering of search results in the SERPs. The content in the index data store 125 therefore may be a primary factor in determining the website's prominence on SERPs for keywords that are relevant to the website. The indexed content typically includes the URLs for some or all of the web pages in the website. As stored, the URL can be a complete URL (e.g. “http://www.website.com/home/example_page.html” or the resolved equivalent “http://123.45.67.8/home/example_page.html”) or a truncated URL with one or more parent directories implied (e.g. “home/example_page.html” or “example_page.html”) as is known in the art.
An interface module 135 may be configured to electronically access the web server 100 in order to modify the website or to perform page remapping as described below. The interface module 135 may be a web page, web, mobile, or other Internet application, application programming interface (“API”), or a standalone terminal or other computing device. A website owner or his authorized agent (hereinafter “owner”) can use any suitable secured or unsecured means to activate the interface module 135 and access and modify his website or one or more of its configuration files.
Without performing the method of
To prevent the loss of indexing status, at step 200 a page mapping table is generated by the web server 100 or by the interface module 135 itself, and may be stored in the website data store 120 to be accessed by the web server 100 when serving the website. One embodiment of the page mapping table, illustrated as TABLE 1 below, includes columns for the source path and destination path for each web page in the table. The page mapping table may further include a column for indicating the HTTP status code that is generated when a requesting device 110 or search engine 130 requests the source path URL. The page mapping table may further include columns for conveying indexing status and one or more SEO metrics. For example, columns may be included to indicate whether one or more particular search engines 130 have indexed the web page. A column may be provided to convey the GOOGLE Page Rank or another indicator of SERP prominence. Each row of the page mapping table corresponds to a web page of the website. The table may include all of the web pages in the website, or a subset thereof. In one embodiment, the table may include only the web pages that have changed (i.e., have been modified, deleted, or added) between the old and new versions of the website. The source path may be the full or truncated URL of the web page in the old version of the website. The source path may be entered manually by the owner or another entity, or the source paths for the web pages may be automatically retrieved by the web server 100 and pre-populated within the table.
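One possible in-memory shape for such a page mapping table is sketched below. The class names, field names, and methods are hypothetical and are not taken from the disclosure; they simply mirror the columns described above (source path, destination path, HTTP status code, indexing status, and an SEO metric). How a blank destination path is interpreted is left to the caller.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class PageMapping:
    """One row of the table, corresponding to one web page of the website."""
    source_path: str                                   # full or truncated old URL
    destination_path: Optional[str] = None             # None represents a blank entry
    http_code: Optional[int] = None                    # status returned for source_path
    indexed_by: Set[str] = field(default_factory=set)  # search engines that indexed it
    page_rank: Optional[int] = None                    # or another prominence indicator


class PageMappingTable:
    """Rows keyed by source path (hypothetical helper for illustration)."""

    def __init__(self) -> None:
        self.rows: Dict[str, PageMapping] = {}

    def add_source(self, source_path: str, **details) -> PageMapping:
        """Create the row if needed, then fill in any supplied columns."""
        row = self.rows.setdefault(source_path, PageMapping(source_path))
        for name, value in details.items():
            setattr(row, name, value)
        return row

    def set_destination(self, source_path: str, destination_path: str) -> None:
        self.rows[source_path].destination_path = destination_path

    def destination_for(self, source_path: str) -> Optional[str]:
        row = self.rows.get(source_path)
        return row.destination_path if row else None

    def blank_destinations(self) -> List[str]:
        """Source paths whose destination path has not been filled in."""
        return [r.source_path for r in self.rows.values()
                if r.destination_path is None]


table = PageMappingTable()
table.add_source("/about_us.html", page_rank=4, indexed_by={"GOOGLE"})
table.add_source("/old_news.html")
table.set_destination("/about_us.html", "/about-us/")
```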
In one embodiment, at step 205 the web server 100 may “crawl” the old version of the website using any suitable methodology to determine the source paths of the web pages. Additionally or alternatively, the web server 100 may access the index of one or more search engines 130 to identify the web pages of the website that have been indexed by that search engine 130. For example, a “site:mydomain.com” search may be performed on GOOGLE to obtain one or more SERPs that list all of the web pages on mydomain.com that GOOGLE has indexed. The web server 100 may add the source paths of all or a subset of the web pages that have been indexed to the page mapping table. In one embodiment of such a subset, the web server 100 may determine from the set of indexed web pages which source paths would generate a 404 error if requested from the new website, and may add those web pages to the subset. The web server 100 may also, at step 210, analyze the identified web pages by retrieving data for other columns in the page mapping table. The data retrieved at step 210 may further include data that is not displayed in the table but may be used to organize the table for display in the interface module 135. The retrieved data may include SEO metrics such as GOOGLE Page Rank or SEOMOZ Page Authority, information from web page meta tags, web page titles, and other web page data that may facilitate page mapping.
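The subset selection described above, in which only indexed pages that would produce a 404 error on the new website are added to the table, might be sketched as follows. The sketch assumes the set of paths served by the new website is already known as a list, rather than issuing live requests, and the function names are hypothetical.

```python
from urllib.parse import urlsplit


def to_path(url):
    """Reduce a full or truncated URL to a site-relative path starting with '/'."""
    path = urlsplit(url).path if "://" in url else url
    return "/" + path.lstrip("/")


def paths_needing_mapping(indexed_urls, new_site_urls):
    """Return indexed old paths that are absent from the new site, in input order."""
    new_paths = {to_path(u) for u in new_site_urls}
    seen = set()
    missing = []
    for url in indexed_urls:
        path = to_path(url)
        if path not in new_paths and path not in seen:
            seen.add(path)
            missing.append(path)
    return missing


indexed = [
    "http://mydomain.com/index.html",
    "http://mydomain.com/about_us.html",
    "products/widgets.html",
]
new_site = ["http://mydomain.com/index.html", "/about-us/"]
needs_mapping = paths_needing_mapping(indexed, new_site)
print(needs_mapping)
```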
At step 215, the web server 100 may organize the table for presentation to the owner. Organizing the table may include sorting the rows of web pages to improve the presentation of the table to the owner. In one embodiment, the table may be sorted in descending order of the GOOGLE Page Rank obtained at step 210. This allows the owner to attend to the page mapping of the most important web pages first. Relatedly, such ordering typically places high-frequency web pages (i.e., web pages that are often included in websites), such as “home,” “about,” and “contact” pages, at the top of the table, facilitating the automated destination path acquisition described below.
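A minimal sketch of the descending sort described above follows. The row layout and the choice to place pages without a known rank last are assumptions made for illustration.

```python
def organize(rows):
    """Sort rows by page rank, highest first; rows with no rank go last."""
    return sorted(
        rows,
        key=lambda row: (row["page_rank"] is None, -(row["page_rank"] or 0)),
    )


rows = [
    {"source_path": "/old_news.html", "page_rank": 1},
    {"source_path": "/contact.html", "page_rank": None},
    {"source_path": "/index.html", "page_rank": 5},
    {"source_path": "/about_us.html", "page_rank": 4},
]
ordered = [row["source_path"] for row in organize(rows)]
print(ordered)
```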
At step 220, destination paths may be entered for the web pages in the page mapping table. A destination path is the full or truncated URL of the web page in the new version of the website that corresponds to the web page at the source path listed on a line of the table. A blank entry for the destination path may indicate that the path for that web page has not changed in the new version of the website. In some embodiments, the owner may enter the desired destination paths manually via the interface module 135, and the web server 100 receives the destination paths and stores them in the page mapping table.
Failing a direct match, heuristic comparisons may identify common patterns in the source path and one or more new web page URLs. Heuristic searches may employ any suitable statistical probability model, such as Bayesian probability, for matching web pages, and may employ a confidence level as a threshold for determining whether a match is certainly found, is certainly not found, or should be confirmed by the owner or another user. Some non-limiting examples of heuristic matches include:
Automated acquisition may further include, at step 235, performing content comparisons instead of or in addition to direct or heuristic file name comparisons. For example, where a heuristic comparison has identified more than one possible match of new web pages to an old web page, the content of the new web pages may be compared to the content of the old web page to a desired depth. “Depth” herein refers to the complexity of the content comparison. A comparison with low depth may involve comparing the text within the title tags of each web page and determining a percent match. In contrast, a comparison with high depth may involve determining whether any image files are present within both the old and a new web page, or comparing paragraph text within the bodies of the web pages to determine common word density or identically reused phrases. Statistical probability models and confidence levels can be used as above to determine whether a match is found. In another example, the content comparison of step 235 may be performed on directly-matched old and new web pages (i.e., an exact match to an old web page file name is present in the new website, per step 230) to determine whether content that is relevant to the SEO metrics of the old web page is present on the new web page of the same name. If the relevant content is no longer present, the web server 100 may determine whether the content was moved to a new page using the heuristic comparisons of step 230 and/or the content comparisons of step 235; the web server 100 may enter the URL of any matching new web page as the destination path and request confirmation of the destination path from the owner. In yet another example, the content comparison of step 235 may be performed for any old web page that could not be matched using file name matching.
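A low-depth comparison of the kind described above, in which title text is compared and the score is classified against confidence thresholds, might be sketched as follows. The similarity measure (`difflib.SequenceMatcher` from the Python standard library), the regular-expression title extraction, and the threshold values 0.9 and 0.4 are assumptions chosen for illustration, not values taken from the disclosure.

```python
import re
from difflib import SequenceMatcher

TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def title_of(html):
    """Extract the text inside the first title tag, or '' if there is none."""
    found = TITLE_RE.search(html)
    return found.group(1).strip() if found else ""


def title_match(old_html, new_html):
    """Return a 0.0 to 1.0 similarity score between the two page titles."""
    old_title = title_of(old_html).lower()
    new_title = title_of(new_html).lower()
    if not old_title or not new_title:
        return 0.0  # a missing title is treated as no evidence of a match
    return SequenceMatcher(None, old_title, new_title).ratio()


def classify(score, accept=0.9, reject=0.4):
    """Map a score to one of the three outcomes, using assumed thresholds."""
    if score >= accept:
        return "match"
    if score < reject:
        return "no match"
    return "confirm with owner"


old_page = "<html><head><title>About Us</title></head><body>...</body></html>"
same_title = "<html><head><TITLE> about us </TITLE></head><body>...</body></html>"
untitled = "<html><body>No title here</body></html>"
print(classify(title_match(old_page, same_title)))
print(classify(title_match(old_page, untitled)))
```

A higher-depth comparison would extend `title_match` with further signals, such as shared image file names or reused body phrases, before classification.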
At step 240, the web server 100 may present the page mapping table to the owner via the interface module 135. The page mapping table may be complete upon presentation, provided the web server 100 was able to automatically match each old web page to a new web page with a suitable level of confidence. Source or destination paths that do not meet the confidence level may be indicated to the owner for confirmation. Some or all of the data in the table may be editable by the owner. Additional indicators may direct the owner to enter destination paths for source paths that could not be matched.
In other embodiments of completing the page mapping table, the steps as described in
At step 315, source paths may be entered for the old web pages that correspond to the new web pages. Entering the source paths may include, at step 320, identifying the old web pages by their URLs. The web server 100 may crawl the old version of the website using any suitable methodology to determine the source paths of the web pages. Additionally or alternatively, the web server 100 may access the index of one or more search engines 130 to identify the web pages of the old website that have been indexed by that search engine 130. Additionally or alternatively, such as if the old website is no longer online, the web server 100 may crawl an archived version of the website that may be available at archive.org (the Internet Wayback Machine), in GOOGLE Cache, or at another internet resource. The web server 100 may store the complete set of results (i.e., the URLs of all old web pages identified) for the subsequent matching steps 325, 330 and for further uses, or the web server 100 may perform the matching steps 325, 330 without storing all of the URLs. At step 325, the web server 100 may perform name matching between the URLs of the identified old web pages and the destination paths, as in step 230 above, and may store suitable matches as the source paths in the table. At step 330, the web server 100 may perform content comparisons as in step 235 above, and may store further matches as source paths in the table.
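The name matching of step 325 might be sketched as below. The normalization rule (last path segment, extension and punctuation removed, lowercased), the use of `difflib.get_close_matches`, and the 0.8 cutoff are all assumptions made for this illustration; the disclosure does not prescribe a particular heuristic.

```python
import re
from difflib import get_close_matches
from posixpath import basename, splitext


def name_key(path):
    """Normalize a path to a comparable name, e.g. '/about_us.html' -> 'aboutus'."""
    stem = splitext(basename(path.rstrip("/")))[0]
    return re.sub(r"[^a-z0-9]", "", stem.lower())


def match_source_paths(old_paths, destination_paths, cutoff=0.8):
    """Map each destination path to the old path whose name matches it best."""
    by_key = {name_key(p): p for p in old_paths}
    matches = {}
    for dest in destination_paths:
        key = name_key(dest)
        if key in by_key:
            # Direct match after normalization.
            matches[dest] = by_key[key]
            continue
        # Heuristic match: closest normalized name above the cutoff, if any.
        close = get_close_matches(key, list(by_key), n=1, cutoff=cutoff)
        if close:
            matches[dest] = by_key[close[0]]
    return matches


old_paths = ["/about_us.html", "/contact.html", "/products.html"]
destinations = ["/about-us/", "/contacts/", "/blog/"]
matched = match_source_paths(old_paths, destinations)
print(matched)
```

Destination paths left out of the result, such as "/blog/" here, are candidates for the content comparison of step 330 or for manual entry by the owner.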
At step 335, the web server 100 may analyze the identified old web pages as in step 210 above in order to obtain the indexing status and/or SEO metrics for the old web pages. All of the identified old web pages may be analyzed, or only the old web pages that are entered into the page mapping table as source paths may be analyzed. At step 340, the completed page mapping table may be organized as in step 215 above. At step 345, the page mapping table may be presented to the owner via the interface module 135. In addition to the matched source and destination path entries, the page mapping table may be presented with the option to display old web pages that were not mapped to any new web pages. In particular, the page mapping table can include unmapped old web pages that have relatively valuable SEO metrics, such as a high GOOGLE Page Rank, so that the owner can retain a page mapping for those web pages. The unmapped old web pages may be displayed as source paths, with an indicator to the owner that a destination path should be entered for each unmapped web page.
While the owner can manipulate the page mapping table as needed, the web server 100 may use the completed or partially completed page mapping table to handle requests for the web pages at the source paths. In some embodiments, the web server 100 may handle such requests using a redirector page for each row of the page mapping table. A “redirector page” is a web page that has the source path as its URL and contains source code that either automatically forwards the visitor/requestor to the destination path, or contains an instruction to the visitor/requestor that the web page previously located at the source path has moved to the destination path. For example, a redirector page that automatically forwards the visitor may contain a meta refresh tag that redirects the visitor to the destination path after a predetermined time. When the web server 100 publishes the new website, it may concurrently publish redirector pages for each of the source paths in the page mapping table. The web server 100 may propagate changes to the page mapping table by publishing new or revised redirector pages when the changes are made.
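A redirector page of the kind described above could be generated as in the following sketch, which combines a meta refresh tag with a visible notice. The function names and the exact markup are hypothetical.

```python
from html import escape


def redirector_page(destination_path, delay_seconds=0):
    """Build the HTML of a redirector page that forwards to destination_path."""
    dest = escape(destination_path, quote=True)
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        f'<meta http-equiv="refresh" content="{delay_seconds}; url={dest}">\n'
        "<title>Page moved</title>\n"
        "</head>\n<body>\n"
        f'<p>This page has moved to <a href="{dest}">{dest}</a>.</p>\n'
        "</body>\n</html>\n"
    )


def build_redirectors(page_map):
    """Return {source_path: html} for every row that has a destination path."""
    return {src: redirector_page(dst) for src, dst in page_map.items() if dst}


pages = build_redirectors({"/about_us.html": "/about-us/", "/old_news.html": None})
print(pages["/about_us.html"])
```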
In other embodiments, the web server 100 may handle source-path requests using HTTP status codes. Referring to
If the request is a legitimate request for the old web page that resided at the source path, at step 415 the web server 100 may search the page mapping table for a destination path that corresponds to the source path. If a corresponding destination path is found, at step 420 the web server may send an HTTP status code 301 “Moved Permanently” to the requestor. Commonly known as the “301 redirect,” this status code can be interpreted by browsers and other user agents so that the user is automatically forwarded to the new URL provided with the status code, which may be the appropriate destination path from the page mapping table. GOOGLE and other search engines have indicated that the 301 redirect will retain most of the accumulated SERP prominence of the original (i.e., old) web page. At step 425, the web server 100 may update the “HTTP code” column for the source path to “301” if needed.
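Steps 415 through 425 might be sketched as a small WSGI application, shown below. The `page_map` layout and function name are hypothetical, and answering an unmatched path with a plain 404 is assumed here as the simplest fallback. The new URL is conveyed in the standard `Location` response header.

```python
def make_redirect_app(page_map):
    """Build a WSGI app that answers source-path requests from page_map."""

    def app(environ, start_response):
        source_path = environ.get("PATH_INFO", "/")
        row = page_map.get(source_path)                    # step 415: table lookup
        if row and row.get("destination_path"):
            row["http_code"] = 301                         # step 425: record the code
            start_response(                                # step 420: 301 redirect
                "301 Moved Permanently",
                [("Location", row["destination_path"])],
            )
            return [b""]
        # Assumed fallback when no destination path is found.
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not Found"]

    return app


page_map = {
    "/about_us.html": {"destination_path": "/about-us/", "http_code": None},
    "/old_news.html": {"destination_path": None, "http_code": None},
}
app = make_redirect_app(page_map)
```

For local experimentation, such an application can be served with `wsgiref.simple_server.make_server` from the Python standard library.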
The web server 100 may fail to identify a destination path from the table, such as when the source path is not in the table or a destination path has not been associated with it. In some embodiments, if the web server 100 does not find a corresponding destination path at step 415, the web server 100 may return a standard code 404 error to the requestor. In other embodiments, at step 430 the web server 100 may perform one or more of the file name matching (step 230) and content comparisons (step 235) of the method of
Referring to
The proxy server 140 handles incoming URL requests as in
Referring to
Once the page mapping table is generated, the web server 100 may serve the website and handle source-path requests using the methods described above in order to protect the indexing status of the old web pages.
Similarly to the embodiment of
The web design server 160 may be configured to import web pages from the old website and present them to the owner during the web design process. The web design server 160 may itself crawl the old website to obtain the old web page data, or the web design server 160 may request the web server 100 or another server computer to crawl the old website. The web design server 160 may then present each of the old web pages to the owner. The owner may choose to keep or discard the old web page, and may edit the old web page and save the web page for use in the new website. The web design server 160 may be further configured to assist the owner in creating and saving completely new web pages. The web design process results in a new website that may contain all old web pages, all new web pages, or a mixture of old and new web pages.
The web server 100 may compile the page mapping table during the web design process or after it is complete. In an embodiment of the latter, the web server 100 may populate the source path and destination path columns of the page mapping table using any of the methods described above. In other embodiments, the web server 100 may populate the source path column of the page mapping table by crawling the old website as described above, and may transmit the incomplete table to the web design server 160. As each new web page is created, the web design server 160 may prompt the owner to associate the new web page with an old web page from the page mapping table. If the new web page is an imported old web page, the web design server 160 may prompt the owner to confirm that the old and new web pages are the same (and thus, SEO data should pass through from the old web page to the new web page). The web design server 160 may obtain the URLs of the associated new web pages and store them as destination paths in the table.
The schematic flow chart diagrams included are generally set forth as logical flow-chart diagrams. As such, the depicted order and labeled steps are indicative of one embodiment of the presented method. Other steps and methods may be conceived that are equivalent in function, logic, or effect to one or more steps, or portions thereof, of the illustrated method. Additionally, the format and symbols employed are provided to explain the logical steps of the method and are understood not to limit the scope of the method. Although various arrow types and line types may be employed in the flow-chart diagrams, they are understood not to limit the scope of the corresponding method. Indeed, some arrows or other connectors may be used to indicate only the logical flow of the method. For instance, an arrow may indicate a waiting or monitoring period of unspecified duration between enumerated steps of the depicted method. Additionally, the order in which a particular method occurs may or may not strictly adhere to the order of the corresponding steps shown.
The present invention has been described in terms of one or more preferred embodiments, and it should be appreciated that many equivalents, alternatives, variations, and modifications, aside from those expressly stated, are possible and within the scope of the invention.