Claims
- 1. A method of optimizing the selection of one or more databases from a plurality of databases during query searching, comprising the steps of:
interrogating the various databases with test queries made up of keywords from keyword sets for each of the databases; using Euclidean analysis to determine if keywords from the keyword sets fall within clusters; and determining predominant databases for interrogating in a particular cluster when a search is to be performed and search terms of the query fall within that cluster.
- 2. The method of claim 1, wherein determining the predominant databases for a cluster involves selecting the databases providing the most search queries falling within that cluster.
- 3. The method of claim 1, wherein determining the predominant databases includes analyzing the search results of the test queries to determine which databases contributed the most significant references.
- 4. The method of claim 1, wherein the determination of the most significant references is based on ranking of the documents by their raw ranking data.
- 5. The method of claim 4, wherein base and meta learners are used in determining the ranking.
- 6. The method of claim 1, wherein the sets of training queries are typical queries to each of the databases.
- 7. The method of claim 1, wherein the sets of queries are generated by an analysis of the documents in each of the databases.
- 8. A computer program product on a computer usable medium for optimizing the selection of databases to be interrogated during query searching comprising:
software for interrogating the various databases with test queries made up of keywords from keyword sets for each of the databases; software for using Euclidean analysis to determine if keywords from the keyword sets fall within clusters; and software for determining dominant databases for interrogating in a particular cluster when a search is to be performed and the search terms of the query fall within that cluster.
- 9. The computer program product of claim 8, wherein the software for determining the dominant databases for a cluster involves software for selecting the databases providing the most search queries falling within that cluster.
- 10. The computer program product of claim 8, wherein the software for determining the dominant databases includes analyzing the search results of the test queries to determine which databases contributed the most significant references.
- 11. The computer program product of claim 8, wherein the software for determination of the most significant references is based on ranking of the documents by their raw ranking data.
- 12. The computer program product of claim 11, wherein software for base and meta learners is used in determining the ranking.
- 13. The computer program product of claim 8, wherein the sets of training queries are typical queries to each of the databases.
- 14. The computer program product of claim 8, wherein the sets of queries are generated by an analysis of the documents in each of the databases.
- 15. A computer program product on a computer usable medium for optimizing the selection of databases to be interrogated during query searching comprising:
software for determining if search terms T of a search query Ss fall within a cluster predetermined by Euclidean analysis using sets of test queries St; and software for limiting access by the search query Ss to that subset of the databases predetermined to be dominant for that cluster by using the sets of test queries St.
- 16. The computer program product of claim 15 wherein the predetermining of the dominant databases for a cluster is based on those databases providing the most test queries falling within that cluster.
- 17. The computer program product of claim 15 wherein the predetermining of the dominant databases for a cluster is based on databases providing test queries which contributed the most significant references.
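For illustration only, the routing recited in claims 1, 2, 15 and 16 can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: it assumes each test query has already been reduced to a numeric feature vector, that cluster centroids are already known, and that "dominant" means the databases contributing the most test queries to a cluster. The names `euclidean`, `nearest_cluster`, `dominant_databases`, `route_query`, the database labels and the sample vectors are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_cluster(vector, centroids):
    """Index of the centroid closest to the vector (assumed cluster membership rule)."""
    return min(range(len(centroids)), key=lambda i: euclidean(vector, centroids[i]))


def dominant_databases(test_queries, centroids, top_n=1):
    """Map each cluster to the top_n databases whose test queries fall in it most often.

    test_queries is a list of (database_name, feature_vector) pairs.
    """
    counts = defaultdict(Counter)
    for db_name, vector in test_queries:
        counts[nearest_cluster(vector, centroids)][db_name] += 1
    return {
        cluster: [db for db, _ in counter.most_common(top_n)]
        for cluster, counter in counts.items()
    }


def route_query(query_vector, centroids, dominant):
    """Return the databases to interrogate for a query falling in a known cluster."""
    return dominant.get(nearest_cluster(query_vector, centroids), [])


# Hypothetical data: two clusters and three databases.
centroids = [(0.0, 0.0), (10.0, 10.0)]
test_queries = [
    ("db_a", (0.5, 0.2)),
    ("db_a", (1.0, 0.0)),
    ("db_b", (0.1, 0.9)),
    ("db_b", (9.5, 10.2)),
    ("db_b", (10.1, 9.0)),
    ("db_c", (9.0, 9.9)),
]

dominant = dominant_databases(test_queries, centroids)
near_origin = route_query((0.3, 0.3), centroids, dominant)
near_far_corner = route_query((9.0, 9.0), centroids, dominant)
```

With this sample data, a query vector near the first centroid is limited to `db_a` and one near the second centroid to `db_b`. The variant of claims 3 and 17 would replace the simple count with a measure of which databases contributed the most significant references.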
RELATED APPLICATIONS
[0001] U.S. patent application Ser. No. xx/xxx,xxx (YOR920020107US1) filed on even date herewith and entitled “Query Routing Based on Feature Learning of Data Sources.”