Claims
- 1. A Web search engine, comprising:
a Web page database; a crawler to fetch pages from the Web and store the pages in the Web page database; a link extractor to extract link information from the pages; a URL management system to assign an identification number to the URL of each page, and store the identification number and URL pairs in the Web page database and send new URLs to the crawler to be retrieved from the Web; an anchor text and link database; an anchor text and link extractor to extract the anchor text and the link information from the pages and store them in the anchor text and link database; an indexed database; an indexer to parse keywords from the pages and store the keyword and URL identification pairs in the indexed database; and a ranker to rank a page based on intrinsic rank and extrinsic rank of the page.
- 2. The Web search engine of claim 1, wherein the ranker determines the intrinsic rank from content information in the indexed database and the page weight computed from the link information in anchor text and link database, and the extrinsic rank from the anchor text information in the anchor text and link database and the computed page weight.
- 3. The Web search engine of claim 1, wherein the ranker determines the intrinsic rank of the page based on the content score and the page weight.
- 4. The Web search engine of claim 1, wherein the ranker determines the extrinsic rank of the page based on the anchor weight of each inbound link and the page weight of the originating page.
- 5. The Web search engine of claim 1, wherein the ranker determines the anchor weight based on the link weight and the keyword being present in the anchor text or related text.
- 6. The Web search engine of claim 1, wherein the ranker calculates the intrinsic rank and extrinsic rank of a page for a multi-keyword query, wherein the intrinsic rank is a function of content score and the page weight, and the extrinsic rank of the page is a function of the partial extrinsic ranks and proximity values.
- 7. The Web search engine of claim 1, further comprising a page weight generator and a page weight database, computing page weights by initializing a page weight vector to a constant, constructing a connectivity graph representing the link structure of the fetched pages, computing an output page weight vector from the input page weight vector and the connectivity graph, and comparing the output page weight vector with the input page weight vector and if convergence is reached, writing the output page weight vector in a page weight database, and if not, mixing the input and output page weight vectors to generate a new input page weight vector and repeating until convergence is reached.
- 8. A computer system for ranking search results from a query on a collection of hypertext pages, comprising:
a crawler to fetch pages from the collection of hypertext pages; a link extractor to extract page locator information from the fetched pages; a page locator management system for storing and retrieving the page locator information; a page database to store the pages; an indexer to parse keywords from the pages and store the keyword page locator pairs in the indexed database; an anchor text and link extractor to extract the anchor text and link structures from the pages; an anchor text and link database, wherein the anchor text and link extractor writes the anchor text and link structures into the anchor text and link database; and a ranker to assign a rank value to a page based on intrinsic and extrinsic rank.
- 9. The system of claim 8, wherein the ranker assigns an intrinsic rank to the page based on a combination of content score and page weight.
- 10. The system of claim 8, wherein the ranker assigns the content score to the page for a keyword based on a combination of location, frequency, and/or font size of the keyword in the page.
- 11. The system of claim 8, wherein the ranker assigns a page weight to the page as the probability of a searcher visiting the page when traveling in the collection of hypertext pages in a random fashion.
- 12. The system of claim 8, wherein the ranker assigns a uniform value corresponding to the reciprocal of the total number of links outbound from an originating page to link weight.
- 13. The system of claim 8, wherein the ranker assigns link weight based on location of the link.
- 14. The system of claim 8, wherein the ranker assigns an extrinsic rank to the page for a given keyword as a combination of anchor weight of the links from other pages and the page weight of referring pages.
- 15. The system of claim 8, wherein the ranker assigns a rank value to a page for a multi-keyword query as a combination of intrinsic rank and extrinsic rank for the multi-keyword.
- 16. The system of claim 8, wherein the ranker assigns an intrinsic rank to a page for a multi-keyword query as a combination of content score and page weight.
- 17. The system of claim 8, wherein the ranker assigns a content score to a page for a multi-keyword query as a combination of content score based on intersection of the given keywords and proximity value.
- 18. The system of claim 8, wherein the ranker assigns a partial extrinsic rank for each variation of identical anchor text.
- 19. The system of claim 8, wherein the ranker assigns an extrinsic rank to a page for a multi-keyword query as a combination of partial extrinsic rank of identical anchor text and proximity values in each anchor text.
- 20. The system of claim 8, wherein the ranker obtains a link connectivity graph of the pages.
- 21. The system of claim 8, wherein the ranker obtains the rank values from the link connectivity graph.
- 22. The system of claim 8, wherein the ranker calculates the page weight by iterative numerical procedure.
- 23. The system of claim 8, wherein the ranker accelerates the convergence of the iterative numerical procedure in obtaining connectivity rank scores.
- 24. The system of claim 8, wherein the ranker calculates rank values by dividing the pages into distinct number of groups.
- 25. The system of claim 8, further comprising a rate controller to control the rate of request for page retrieval.
- 26. The system of claim 8, wherein the Web page database stores the pages in a fixed record large enough to contain a predetermined percentage of all of the pages, wherein if the page is smaller, the fixed record has some empty space, and if the page is larger, the Web page database stores as much of the page as possible in the fixed record and the rest in a record file.
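The iterative page weight computation recited in claim 7, together with the uniform link weight of claim 12, can be illustrated with a minimal Python sketch. It is not the claimed implementation. The function name `compute_page_weights`, the parameters `mix`, `tol`, and `max_iter`, the linear blend used for the mixing step, the L1 convergence test, and the treatment of pages with no outbound links are all assumptions made for illustration; the claims do not specify them.

```python
def compute_page_weights(links, mix=0.5, tol=1e-10, max_iter=1000):
    """Iteratively compute page weights from a connectivity graph.

    links: dict mapping a page id to the list of page ids it links to
           (a hypothetical representation of the connectivity graph).
    mix:   fraction of the input vector kept when mixing (assumption).
    """
    # Collect every page that appears as a source or a target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)

    # Initialize the page weight vector to a constant.
    weights = {p: 1.0 / n for p in pages}

    for _ in range(max_iter):
        # Compute the output vector from the input vector and the graph.
        out = {p: 0.0 for p in pages}
        for src in pages:
            targets = links.get(src, [])
            if targets:
                # Uniform link weight: reciprocal of the number of outbound links.
                share = weights[src] / len(targets)
                for t in targets:
                    out[t] += share
            else:
                # Assumption: a page without outbound links spreads its
                # weight evenly over all pages so that no weight is lost.
                share = weights[src] / n
                for p in pages:
                    out[p] += share

        # Compare the output vector with the input vector (L1 distance, an assumption).
        delta = sum(abs(out[p] - weights[p]) for p in pages)
        if delta < tol:
            return out  # converged: this is what would be written to the page weight database

        # Not converged: mix input and output vectors to form the new input.
        weights = {p: mix * weights[p] + (1.0 - mix) * out[p] for p in pages}

    return weights  # best estimate if max_iter is reached without convergence


page_weights = compute_page_weights({"a": ["b"], "b": ["a"], "c": ["a"]})
```

In this small example, pages `a` and `b` link to each other and page `c` links only to `a`. Feeding the output vector straight back in as the next input would make the weights of `a` and `b` swap on every pass and never settle; with the mixing step the vector converges, with the weights of `a` and `b` each approaching 0.5 and the weight of `c` approaching 0.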
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application is a divisional application of and claims the benefit of priority from U.S. application Ser. No. 09/757,435 to Brian Kim et al., filed Jan. 10, 2001 and entitled “Systems and Methods of Retrieving Relevant Information”, which is fully incorporated herein by reference for all purposes.
Divisions (1)
| | Number | Date | Country |
|---|---|---|---|
| Parent | 09757435 | Jan 2001 | US |
| Child | 10454452 | Jun 2003 | US |