This invention relates generally to uniform resource locators (URLs), and more particularly to correcting misspelled or otherwise unresolvable URLs.
The term “uniform resource locator” (URL) refers to an addressing technique used to identify resources on the Internet or on a private intranet. To access information, e.g. web content, stored on a computer connected to the Internet, a user may type a URL into a text entry block provided by an Internet browser. The browser generally submits the URL to a domain name server, which translates the URL into an Internet protocol (IP) address. The IP address identifies the particular computer that holds the desired information.
A common problem associated with manually typing URLs into a browser is that the user may enter an incorrect URL. The user may, for example, make a typing error or guess incorrectly at the spelling of a URL.
Most currently available web browsers provide only minimal assistance in correcting a mis-entered URL. Generally, the browser's assistance is limited to autocompletion of partial words. More robust error correction and spellchecking methods are used by some Internet search engines. Google, for example, uses the frequency with which users enter a particular term as one measure of attempting to correct the spelling of a URL. Some browsers provide comparison of a URL entered into the browser with URLs that have previously successfully resolved.
Other browsers provide the URL to a server, which checks directory and file names present on the server against corresponding components of the entered URL, and returns a list of possible correct spellings to the requestor based on available files. Still other browsers generate a list of candidate URLs using a fuzzy URL detection scheme.
A more complete understanding of the present embodiments and advantages thereof may be acquired by referring to the following description taken in conjunction with the accompanying drawings, in which like reference numbers indicate like features, and wherein:
Preferred embodiments and their advantages are best understood by reference to
If a user enters a URL that does not resolve, e.g. there is no computer with an IP address corresponding to the entered URL, software, e.g. a browser, may include functionality that allows the browser to correct the URL. The entire URL may be corrected, or only a portion thereof. For example, the prefix of the URL, e.g. “www.”, the body of the URL, e.g. “USPTO”, and/or the domain extension of the URL, e.g. “.gov”, may be corrected. Correcting the URL may involve comparing the entered URL to a homophone/homonym list, a list of previously resolved URLs, a list of commonly misspelled words, or other techniques as described below.
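The three portions named above can be pictured with a minimal sketch. It assumes a simple host name of the form prefix.body.extension; the function name `split_host` is hypothetical and is not part of the described embodiments, and real host names may carry more labels than this handles.

```python
def split_host(host: str) -> tuple[str, str, str]:
    """Split a host name into (prefix, body, extension).

    Assumes a simple form such as "www.USPTO.gov". Anything between the
    first and last label is treated as the body.
    """
    host = host.strip()
    labels = host.split(".")
    if len(labels) < 2:
        # No dot at all: treat the whole string as the body.
        return "", host, ""
    extension = "." + labels[-1]
    if len(labels) >= 3:
        prefix = labels[0] + "."
        body = ".".join(labels[1:-1])
    else:
        prefix = ""
        body = labels[0]
    return prefix, body, extension


print(split_host("www.USPTO.gov"))
```

Splitting first lets each portion be checked and corrected on its own, which matches the idea that the entire URL or only a portion of it may be corrected.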
Referring first to
RAM 114 may include synchronous dynamic RAM (SDRAM), double data rate RAM (DDR RAM), static RAM (SRAM) or other suitable types of RAM. Generally, RAM 114 holds programs to be executed, and data to be used, by CPU 112. ROM 116 may include electrically erasable programmable read-only memories (EEPROM) or other types of non-volatile memories. ROM 116 is generally used to hold basic input/output system (BIOS) instructions used by CPU 112 during power up, or other types of information that may be required to be available to CPU 112 on a non-transitory basis.
In the illustrated embodiment, I/O adapter 118 is shown connected to disk drive 120 and tape drive 122. Disk drive 120 may be, in one embodiment, an electro-magnetic storage medium, such as a hard disk drive, or a collection of disk drives, e.g. a redundant array of independent disks (RAID). Tape drive 122 may be a magnetic storage tape, such as those used for back up and archival purposes, or some other suitable type of analog or digital tape drive useful for storing information that may be used by CPU 112. Although not illustrated, other types of drives and/or storage devices may be connected to I/O adapter 118. For example, various optical drives, compact disk (CD) drives, digital video disk (DVD) drives, and the like may also be connected to information handling system 110 through I/O adapter 118 or through a separate I/O adapter configured to control particular storage devices.
User interface 126 may be used to provide connection for various devices, such as mouse 128 and keyboard 130, that allow information handling system 110 to receive input from, and provide output to, a user. Display adapter 132 is also used in the illustrated embodiment to provide video signals to display 134. Communications adapter 124 may be an Ethernet adapter, a token ring adapter, a satellite interface, a digital subscriber line (DSL) adapter, or any of various other subsystems adapted to communicate via a network or otherwise.
In the illustrated embodiment, information handling system 110 may connect to server 152 or 154 through Internet 150. Browser software stored in RAM 114 is executed by CPU 112 to display a browser on display 134. A user may enter a URL into the browser displayed on display 134 using keyboard 130. Assume, for example, that the user desires to download a web page from server 154. The user may use keyboard 130 to type in the URL corresponding to the address of server 154. If, however, the user mistypes or otherwise incorrectly enters the desired URL, then rather than connecting to server 154, information handling system 110 may connect to server 152 at an incorrect URL address. Alternatively, if no server or other machine connected to Internet 150 corresponds to the URL entered by the user, the URL will not resolve.
Examples of some types of errors that may result in a user entering a URL incorrectly include typing errors, guessing at spelling, miscommunication of the URL to the user, and domain-name extension errors. A typing error may occur, for example, where a user intends to type in “www.USPTO.gov”, but instead types in “www.USPRO.gov”. The mistyped URL may link to a web page that is completely unrelated to the original, desired web page. In some instances, businesses desiring to profit from such mistyping errors will establish websites that display competitors' websites to users who mistype a URL, and in some cases, mistyped URLs will result in accessing websites that display adult content, which can prove offensive to some and may harm a business's reputation.
Errors in entering a URL may also occur if a user guesses at the spelling of an unfamiliar word. For example, a user may believe that “cingular” is spelled “singular”. Such misspelled URLs are subject to the same problems as mistyped URLs, but have the additional disadvantage that the user may give up trying to enter the correct URL, since in the user's mind, the URL has already been entered correctly. Another common source of incorrect URLs occurs when a user hears the name of a URL, but misinterprets the name. For example, a user may type in “www.house4sale.com” instead of “www.houseforsale.com”. As another example, the user may hear “houseforsale.com” rather than “www.housesforsale.com”. Finally, domain extension errors can occur if a user mistakenly assumes that, for example, the URL should end in “.com” rather than “.gov”, “.net”, “.org” or the like.
At least one embodiment of the present disclosure accounts for entry errors across the entire URL, including the prefix, the main body and the extension. Additionally, multiple types of errors, including typing errors, guessing errors, miscommunication errors and domain extension selection errors, are addressed by various embodiments. Such embodiments provide improved functionality over solutions which may only perform substitutions if the prefix or extension is missing completely, solutions that rely on external servers, solutions that perform only basic spell checking, and over solutions that employ simple look-ahead completion techniques based on entries previously typed into the browser.
Referring next to
The method then proceeds to 230, where the extension is corrected as needed. The number of domain extensions is limited, and spelling errors in these extensions may be detected and corrected using, for example, pattern matching rules. The rule set for determining a probable correct domain extension includes, in at least one embodiment, a highest pattern match score or a template of the most common typing mistakes. In at least one embodiment, method 200 may first use the domain extension that shares the greatest number of matching letters with the entered extension. For example, since “.xom” has the highest pattern match to “.com”, “.com” would be substituted. A second rule that may be employed would be to use empirical information, such as the proximity of certain keys that makes certain mistakes more likely than others. So, for example, since the C and the X keys are proximate to each other, “.com” is probably frequently mistyped as “.xom”. Thus, “.com” would be substituted for “.xom”. In an alternate embodiment, if a URL appears to be otherwise correct, other domain extensions may be tried in order of frequency of use until a valid URL is achieved. For example, if a URL ending in “.com” does not resolve, then “.com” may be changed to “.org”, and so on through the remaining extensions. Thus, a user who typed in “www.uspto.com” as the desired URL could have the system correct the URL to “www.uspto.gov”.
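The two extension rules described above, a pattern match score and key proximity, can be sketched as follows. This is only an illustration under stated assumptions: the extension list and its frequency order, the partial key-adjacency map, the half-point weight for a neighbouring key, and the acceptance threshold are all hypothetical choices and do not come from the described embodiments.

```python
from typing import Iterator, Optional

# Hypothetical extension list, ordered by an assumed frequency of use.
KNOWN_EXTENSIONS = [".com", ".org", ".net", ".gov", ".edu"]

# Partial, illustrative map of neighbouring keys on a QWERTY keyboard.
ADJACENT_KEYS = {
    "c": "xvdf", "o": "ipkl", "m": "nkj", "n": "bmhj", "e": "wrsd",
    "t": "rygf", "g": "fhtyvb", "r": "etdf", "v": "cbfg", "d": "sfexc",
    "u": "yihj",
}


def match_score(typed: str, candidate: str) -> float:
    """Score position-by-position: 1 for a match, 0.5 for a neighbouring key."""
    if len(typed) != len(candidate):
        return 0.0  # this sketch only compares equal-length extensions
    score = 0.0
    for t, c in zip(typed.lower(), candidate):
        if t == c:
            score += 1.0
        elif t in ADJACENT_KEYS.get(c, ""):
            score += 0.5
    return score


def correct_extension(typed: str) -> Optional[str]:
    """Return the best-matching known extension, or None if nothing is close."""
    typed = typed.lower()
    if typed in KNOWN_EXTENSIONS:
        return typed
    best = max(KNOWN_EXTENSIONS, key=lambda ext: match_score(typed, ext))
    # Assumed threshold: same length and at most one position fully wrong.
    if len(best) == len(typed) and match_score(typed, best) >= len(typed) - 1:
        return best
    return None


def alternate_extensions(current: str) -> Iterator[str]:
    """Yield the other known extensions in assumed order of frequency."""
    for ext in KNOWN_EXTENSIONS:
        if ext != current.lower():
            yield ext


print(correct_extension(".xom"))
print(list(alternate_extensions(".com")))
```

Because `max` keeps the first of several equal scores, ties fall to the extension assumed to be most frequent. The `alternate_extensions` generator corresponds to the alternate embodiment in which an otherwise correct URL is retried with other extensions.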
The method proceeds to 240, where the body of the domain name is evaluated to determine if it is apparently correctable. To determine if the domain name is apparently correctable, at least one embodiment of the present disclosure looks to the browser history list, which, in one embodiment, includes a listing of URLs typed into the browser. Each of the URLs typed into the browser is examined to see if a URL similar to the entered URL successfully resolved. If a similar URL has been successfully resolved, the previously successfully resolved URL will be used in place of the current URL, which did not resolve. For example, if a user mistyped “www.cmm.com” and an examination of the history showed that “www.cnn.com” had been visited before, then a substitution could be made.
In at least one embodiment, an entered URL is determined to be similar if the entered URL differs from a successfully resolved URL by fewer than a predetermined number of characters, or if the entered URL differs from a previously resolved URL by less than a certain percentage of characters. For example, two URLs may be similar if they differ by fewer than two characters. Alternatively, the two URLs may be considered similar if fewer than two out of every five characters are different. In yet other embodiments, the number of characters in each URL may also be taken into account.
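One way to picture the similarity test is the sketch below. It assumes that "differs by" is measured as Levenshtein edit distance, which the description does not specify, and it uses the example thresholds given above (fewer than two characters, or fewer than two out of every five). The function names and the history list are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def is_similar(entered: str, known: str,
               max_chars: int = 2, max_ratio: float = 0.4) -> bool:
    """Similar if under the character limit OR under the ratio limit."""
    d = edit_distance(entered.lower(), known.lower())
    longest = max(len(entered), len(known), 1)
    return d < max_chars or d / longest < max_ratio


def suggest_from_history(entered: str, resolved_history: list[str]) -> list[str]:
    """Return previously resolved URLs similar to the entry, closest first."""
    scored = sorted((edit_distance(entered.lower(), url.lower()), url)
                    for url in resolved_history)
    return [url for d, url in scored
            if d > 0 and is_similar(entered, url)]


history = ["www.cnn.com", "www.uspto.gov"]  # hypothetical history list
print(suggest_from_history("www.cmm.com", history))
```

In this sketch “www.cmm.com” and “www.cnn.com” are two substitutions apart, so they fail the fewer-than-two-characters test but pass the two-in-five test, which is why both criteria are combined with "or" here.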
The system may also scan a list of common misspellings, using a spell detection/correction scheme that has been adapted to accept unparsed text. So, for example, “houses4sale” could be recognized as “houses for sale”. Many “spam” websites take advantage of such misspellings, and similar lists could be generated and utilized by a browser to avoid accessing undesired sites through misspelling.
If a typed URL still does not resolve, the system may look at a homophone and/or a homonym list to determine if a homonym or homophone may be substituted for the incorrect URL. So, for example, if the user had mistyped “www.homes4sale” and this URL did not resolve, a substitution can be made using “www.homesforsale.com”.
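The homophone substitution described above can be sketched with a small lookup table. The table contents and the name `homophone_candidates` are hypothetical; an actual homophone/homonym list would be far larger, and the sketch simply substitutes sound-alike tokens wherever they occur in the unparsed body text.

```python
# Hypothetical, deliberately tiny sound-alike table.
SOUND_ALIKES = {
    "4": ["for", "four"],
    "2": ["to", "two"],
    "right": ["write"],
    "write": ["right"],
}


def homophone_candidates(body: str) -> list[str]:
    """Return bodies formed by replacing one sound-alike token at a time."""
    body = body.lower()
    candidates: list[str] = []
    for token, alternatives in SOUND_ALIKES.items():
        start = body.find(token)
        while start != -1:
            for alt in alternatives:
                candidate = body[:start] + alt + body[start + len(token):]
                if candidate not in candidates:
                    candidates.append(candidate)
            # Look for a further occurrence of the same token.
            start = body.find(token, start + 1)
    return candidates


print(homophone_candidates("homes4sale"))
```

Each candidate would still need to be tested to see whether it resolves before it is offered to the user or substituted.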
Assuming that a substitution can be made based on the browser history list, a list of common misspellings, or a homonym/homophone list, then the user may be presented with a listing of possible websites at 260. If no substitution is apparent at 240, then an error message can be returned at 250 to notify the user that the entered URL is unresolvable. If the user approves the correction at 260, for example by selecting one of the presented alternatives, the method proceeds to 270, where the domain name is corrected according to the selection. After the domain name is corrected, the method proceeds to the website specified by the URL at 290. If the user does not approve any of the corrections presented at 260, the method proceeds to 280, and attempts to resolve the URL without any changes to the main body of the domain name.
Although method 200 has been described as having elements performed in a particular order, other embodiments of the present invention may perform the same actions in a different order, perform different actions in place of one or more of the illustrated and discussed actions, or have additional or fewer actions than those illustrated. For example, at least one embodiment of the present disclosure may automatically correct the body of the domain name at 240 without requesting user approval at 260. Still other embodiments determine whether a domain name is apparently correctable and request user approval for the correction prior to correcting the prefix or extension. Yet further embodiments may determine if the domain name is correctable, determine if the prefix is correctable and determine if the extension is correctable and provide suggestions to correct one or more of these portions of the URL to a user for his approval prior to performing any corrective measures or substitutions. In at least one embodiment, some or all possible corrections may be evaluated prior to determining if the entered URL resolves. In one such embodiment, a URL that appears to have been mis-entered, e.g. the URL is similar but not identical to a previously resolved URL, will cause a pop-up list of suggested URLs to be displayed.
Referring next to
If a successfully resolved URL similar to the entered URL is identified at 306, the similar URL is substituted at 340. Method 300 directs the user to the website specified by the substitute URL at 342. If, however, the substitute URL does not resolve, the method proceeds to 308, where method 300 determines whether the URL entered by the user is included in a list of commonly misspelled URLs. The list of commonly misspelled URLs may be obtained, for example, from a commercially available dictionary of misspelled words. Alternatively, user surveys, tests, or data obtained through other empirical methods may be used to construct a list of commonly misspelled URLs. Regardless of the source of the list, if the URL entered by the user is included in the list of commonly misspelled URLs, then the method proceeds to 310, where the corresponding correctly spelled URL from the list is substituted for the misspelled URL. Method 300 then proceeds to 312, where it determines whether the substituted URL resolves. If the substituted URL does resolve, the method accesses the website specified by the substituted URL at 342.
If, at 308, the entered URL is not included in the list of commonly misspelled URLs, or if a substituted URL does not resolve at 312, the method proceeds to 314, where method 300 checks for a misspelled or missing prefix. If the prefix is misspelled or missing it is corrected at 316, and the URL with the corrected prefix is tested at 318 to determine if it will resolve. If the URL with the corrected prefix does resolve, the user is directed to the website specified by that URL. If, however, the URL does not resolve, the method proceeds to 320. Likewise, if at 314 it is determined that the prefix is correctly spelled, the method also proceeds to 320.
At 320, method 300 determines whether the domain is a homophone or homonym. So, for example, if the user has entered “right” when the proper URL should have been “write”, method 300 will recognize that “right” is a homophone/homonym of “write” and make the appropriate substitution at 322. The URL including the substitution is tested at 324 to determine if it resolves. If the corrected URL does resolve, the user is directed to the website specified by the URL at 342. If, however, the URL does not resolve, or if the URL entered by the user does not include a domain name in the homonym list, then the method proceeds to 326.
At 326, method 300 determines whether the domain extension is correct. If the domain extension is correct, the user is directed to the website specified by the URL. If the domain extension is incorrect, the method proceeds to 334 where it corrects the domain extension. Once the domain extension has been corrected, method 300 proceeds to 336, and attempts to resolve the URL. If the URL resolves, the user is directed to the website specified by the corrected URL at 342. If the URL does not resolve, an error message is returned at 338.
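The overall shape of method 300, a series of correction attempts each followed by a resolution test, can be sketched as a simple cascade. The sketch makes several assumptions that are not in the description: every corrector works on the URL as originally entered rather than on the output of an earlier step, the resolver is passed in as a function, and the misspelling list, prefix rules and extension list are hypothetical.

```python
from typing import Callable, Iterable, Optional

Resolver = Callable[[str], bool]
Corrector = Callable[[str], Iterable[str]]

MISSPELLED = {"www.gogle.com": "www.google.com"}  # hypothetical list


def from_misspelling_list(url: str) -> list[str]:
    hit = MISSPELLED.get(url.lower())
    return [hit] if hit else []


def fix_prefix(url: str) -> list[str]:
    host = url.lower()
    for bad in ("ww.", "wwww.", "wws."):  # assumed common prefix slips
        if host.startswith(bad):
            return ["www." + host[len(bad):]]
    if not host.startswith("www."):
        return ["www." + host]  # missing prefix
    return []


def swap_extension(url: str) -> list[str]:
    host, dot, _ = url.lower().rpartition(".")
    if not dot:
        return []
    return [host + dot + ext for ext in ("com", "org", "net", "gov")]


def correct_url(entered: str, resolves: Resolver,
                correctors: list[Corrector]) -> Optional[str]:
    """Try each corrector in order; return the first candidate that resolves."""
    if resolves(entered):
        return entered
    for corrector in correctors:
        for candidate in corrector(entered):
            if candidate != entered and resolves(candidate):
                return candidate
    return None  # caller would show an error message here


# Stand-in resolver backed by a fixed set, so the sketch needs no network.
live_hosts = {"www.uspto.gov", "www.google.com"}
steps = [from_misspelling_list, fix_prefix, swap_extension]
print(correct_url("www.uspto.com", live_hosts.__contains__, steps))
```

Passing the resolver in as a function keeps the cascade testable without network access; in practice it could wrap a DNS lookup. Reordering the `steps` list corresponds to embodiments that perform the same actions in a different order.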
It will be appreciated that various alterations to the specific steps discussed with reference to
Although the present invention has been described in detail, it should be understood that various changes, substitutions, and alterations can be made hereto without departing from the spirit and scope of the invention as defined by the appended claims.
Number | Date | Country | |
---|---|---|---|
20060112094 A1 | May 2006 | US |