Claims
- 1. A method of providing to a user sentences from a sentence database in response to a query, the method comprising:
receiving the query;
defining indexing units based upon the query, the indexing units including both lemma from the query and extended indexing units associated with the query; and
retrieving a plurality of sentences from the sentence database using the defined indexing units as search parameters;
determining a similarity between each of the plurality of retrieved sentences and the query, wherein each similarity is determined as a function of a linguistic weight of a term in the query; and
ranking the plurality of retrieved sentences based upon the determined similarities.
- 2. The method of claim 1, wherein the linguistic weight of the term in the query is a weight assigned to the term in the query as a function of its part of speech.
- 3. The method of claim 2, wherein determining the similarity between each of the plurality of retrieved sentences and the query further comprises determining each similarity as a function of linguistic weights of a plurality of terms in the query.
- 4. The method of claim 3, wherein determining the similarity between each of the plurality of retrieved sentences and the query further comprises determining each similarity as a function of vector weights of each of the plurality of terms in the query and linguistic weights of each of the plurality of terms in the query.
- 5. The method of claim 4, wherein the vector weights of each of the plurality of terms in the query are determined as a function of an occurrence frequency of the respective term in the query.
- 6. The method of claim 5, wherein the vector weights of each of the plurality of terms in the query are determined as a function of occurrence frequencies of the respective terms in the sentence database.
- 7. The method of claim 6, wherein determining the similarity between each of the plurality of retrieved sentences and the query further comprises determining the similarity for a particular retrieved sentence as a function of vector weights of each of a plurality of terms in the particular sentence, the vector weights of each of the plurality of terms in the query and the linguistic weights of each of the plurality of terms in the query.
- 8. The method of claim 7, wherein the vector weights of each of the plurality of terms in the particular retrieved sentence are determined as a function of occurrence frequencies of the respective terms in the particular retrieved sentence.
- 9. The method of claim 8, wherein the vector weights of each of the plurality of terms in the particular retrieved sentence are determined as a function of occurrence frequencies of the respective terms in the sentence database.
- 10. The method of claim 9, wherein determining the similarity between each of the plurality of retrieved sentences and the query further comprises determining the similarity for the particular retrieved sentence as a function of inner products of the vector weights of the plurality of terms in the query, the vector weights of the plurality of terms in the particular retrieved sentence, and the linguistic weights of each of the plurality of terms in the query.
- 11. The method of claim 1, wherein each similarity is further determined as a function of a sentence length factor corresponding to a length of a corresponding one of the plurality of retrieved sentences.
- 12. The method of claim 11, wherein the sentence length factor is a function of the length of the corresponding one of the plurality of retrieved sentences.
- 13. The method of claim 12, wherein the sentence length factor is an exponential function of the length of the corresponding one of the plurality of retrieved sentences.
- 14. The method of claim 1, wherein defining the indexing units based upon the query further comprises:
defining the indexing units to include both the lemma from the query and lemma from the query with their corresponding parts of speech.
- 15. The method of claim 1, wherein defining the indexing units based upon the query further comprises:
defining the indexing units to include both the lemma from the query and phrasal verbs from the query.
- 16. The method of claim 1, wherein defining the indexing units based upon the query further comprises:
defining the indexing units to include both the lemma from the query and dependency triples corresponding to the query.
- 17. The method of claim 1, wherein defining the indexing units based upon the query further comprises:
defining the indexing units to include the lemma from the query, lemma from the query with their corresponding parts of speech, phrasal verbs from the query, and dependency triples corresponding to the query.
- 18. A method of providing to a user confirming sentences from a sentence database in response to a query, the method comprising:
retrieving a plurality of confirming sentences from the sentence database in response to the query;
determining a similarity between each of the plurality of retrieved confirming sentences and the query, wherein each similarity is determined as a function of a linguistic weight of a term in the query; and
ranking the plurality of retrieved confirming sentences based upon the determined similarities.
- 19. The method of claim 18, wherein the linguistic weight of the term in the query is a weight assigned to the term in the query as a function of its part of speech.
- 20. The method of claim 19, wherein determining the similarity between each of the plurality of retrieved confirming sentences and the query further comprises determining each similarity as a function of linguistic weights of a plurality of terms in the query.
- 21. The method of claim 20, wherein determining the similarity between each of the plurality of retrieved confirming sentences and the query further comprises determining each similarity as a function of vector weights of each of the plurality of terms in the query and linguistic weights of each of the plurality of terms in the query.
- 22. The method of claim 21, wherein the vector weights of each of the plurality of terms in the query are determined as a function of an occurrence frequency of the respective term in the query.
- 23. The method of claim 22, wherein the vector weights of each of the plurality of terms in the query are determined as a function of occurrence frequencies of the respective terms in the sentence database.
- 24. The method of claim 23, wherein determining the similarity between each of the plurality of retrieved confirming sentences and the query further comprises determining the similarity for a particular confirming sentence as a function of vector weights of each of a plurality of terms in the particular confirming sentence, the vector weights of each of the plurality of terms in the query and the linguistic weights of each of the plurality of terms in the query.
- 25. The method of claim 24, wherein the vector weights of each of the plurality of terms in the particular confirming sentence are determined as a function of occurrence frequencies of the respective terms in the particular confirming sentence.
- 26. The method of claim 25, wherein the vector weights of each of the plurality of terms in the particular confirming sentence are determined as a function of occurrence frequencies of the respective terms in the sentence database.
- 27. The method of claim 26, wherein determining the similarity between each of the plurality of retrieved confirming sentences and the query further comprises determining the similarity for the particular confirming sentence as a function of inner products of the vector weights of the plurality of terms in the query, the vector weights of the plurality of terms in the particular confirming sentence, and the linguistic weights of each of the plurality of terms in the query.
- 28. The method of claim 18, wherein each similarity is further determined as a function of a sentence length factor corresponding to a length of a corresponding one of the plurality of confirming sentences.
- 29. The method of claim 28, wherein the sentence length factor is a function of the length of the corresponding one of the plurality of confirming sentences.
- 30. The method of claim 29, wherein the sentence length factor is an exponential function of the length of the corresponding one of the plurality of confirming sentences.
- 31. The method of claim 18, wherein retrieving the plurality of confirming sentences further includes determining extended indexing units from the query, and searching the sentence database using the extended indexing units as search terms.
- 32. A computer-readable medium having computer-executable instructions for performing steps comprising:
retrieving a plurality of confirming sentences from a sentence database in response to a query;
determining a similarity between each of the plurality of retrieved confirming sentences and the query, wherein each similarity is determined as a function of a linguistic weight of a term in the query; and
ranking the plurality of retrieved confirming sentences based upon the determined similarities.
- 33. The computer-readable medium of claim 32, wherein the linguistic weight of the term in the query is a weight assigned to the term in the query as a function of its part of speech.
- 34. The computer-readable medium of claim 33, wherein determining the similarity between each of the plurality of retrieved confirming sentences and the query further comprises determining each similarity as a function of linguistic weights of a plurality of terms in the query.
- 35. The computer-readable medium of claim 34, wherein determining the similarity between each of the plurality of retrieved confirming sentences and the query further comprises determining each similarity as a function of vector weights of each of the plurality of terms in the query and linguistic weights of each of the plurality of terms in the query.
- 36. The computer-readable medium of claim 35, wherein the vector weights of each of the plurality of terms in the query are determined as a function of an occurrence frequency of the respective term in the query.
- 37. The computer-readable medium of claim 36, wherein the vector weights of each of the plurality of terms in the query are determined as a function of occurrence frequencies of the respective terms in the sentence database.
- 38. The computer-readable medium of claim 37, wherein determining the similarity between each of the plurality of retrieved confirming sentences and the query further comprises determining the similarity for a particular confirming sentence as a function of vector weights of each of a plurality of terms in the particular confirming sentence, the vector weights of each of the plurality of terms in the query and the linguistic weights of each of the plurality of terms in the query.
- 39. The computer-readable medium of claim 38, wherein the vector weights of each of the plurality of terms in the particular confirming sentence are determined as a function of occurrence frequencies of the respective terms in the particular confirming sentence.
- 40. The computer-readable medium of claim 39, wherein the vector weights of each of the plurality of terms in the particular confirming sentence are determined as a function of occurrence frequencies of the respective terms in the sentence database.
- 41. The computer-readable medium of claim 40, wherein determining the similarity between each of the plurality of retrieved confirming sentences and the query further comprises determining the similarity for the particular confirming sentence as a function of inner products of the vector weights of the plurality of terms in the query, the vector weights of the plurality of terms in the particular confirming sentence, and the linguistic weights of each of the plurality of terms in the query.
- 42. The computer-readable medium of claim 32, wherein each similarity is further determined as a function of a sentence length factor corresponding to a length of a corresponding one of the plurality of confirming sentences.
- 43. The computer-readable medium of claim 42, wherein the sentence length factor is a function of the length of the corresponding one of the plurality of confirming sentences.
- 44. The computer-readable medium of claim 43, wherein the sentence length factor is an exponential function of the length of the corresponding one of the plurality of confirming sentences.
- 45. The computer-readable medium of claim 32, wherein retrieving the plurality of confirming sentences further includes determining extended indexing units from the query, and searching the sentence database using the extended indexing units as search terms.
- 46. A system for retrieving confirming sentences from a sentence database in response to a query, the system comprising:
an input component which receives the query as an input; and
a search engine coupled to the input component, the search engine comprising:
a retrieval component configured to retrieve a plurality of confirming sentences from the sentence database in response to the query; and
a ranking component configured to determine a similarity between each of the plurality of retrieved confirming sentences and the query, wherein each similarity is determined as a function of a linguistic weight of a term in the query, the ranking component further configured to rank the plurality of retrieved confirming sentences based upon the determined similarities.
- 47. The system of claim 46, wherein the linguistic weight of the term in the query is a weight assigned to the term in the query as a function of its part of speech.
- 48. The system of claim 47, wherein the ranking component is configured to determine the similarity between each of the plurality of retrieved confirming sentences and the query by determining each similarity as a function of linguistic weights of a plurality of terms in the query.
- 49. The system of claim 48, wherein the ranking component is configured to determine the similarity between a particular retrieved confirming sentence and the query as a function of vector weights of each of a plurality of terms in the particular confirming sentence, vector weights of each of the plurality of terms in the query and the linguistic weights of each of the plurality of terms in the query.
- 50. The system of claim 49, wherein the vector weights of each of the plurality of terms in the particular confirming sentence or of the plurality of terms of the query are functions of occurrence frequencies of the respective terms in the particular confirming sentence or in the query.
- 51. The system of claim 50, wherein the vector weights of each of the plurality of terms in the particular confirming sentence or of the plurality of terms of the query are functions of occurrence frequencies of the respective terms in the sentence database.
- 52. The system of claim 46, wherein the ranking component is further configured to determine each similarity as a function of a sentence length factor corresponding to a length of a corresponding one of the plurality of confirming sentences.
- 53. The system of claim 52, wherein the ranking component is further configured to determine each similarity as a function of an exponential function of the length of the corresponding one of the plurality of confirming sentences.
- 54. The system of claim 53, wherein the retrieval component is configured to retrieve the plurality of confirming sentences by determining extended indexing units from the query, and searching the sentence database using the extended indexing units as search terms.
- 55. A method of providing to a user sentences from a sentence database in response to a query, the method comprising:
receiving the query;
defining indexing units based upon the query, the indexing units including both lemma from the query and extended indexing units associated with the query; and
retrieving at least one sentence from the sentence database using the defined indexing units as search parameters.
- 56. The method of claim 55, wherein defining the indexing units based upon the query further comprises:
defining the indexing units to include both the lemma from the query and lemma from the query with their corresponding parts of speech.
- 57. The method of claim 55, wherein defining the indexing units based upon the query further comprises:
defining the indexing units to include both the lemma from the query and phrasal verbs from the query.
- 58. The method of claim 55, wherein defining the indexing units based upon the query further comprises:
defining the indexing units to include both the lemma from the query and dependency triples corresponding to the query.
- 59. The method of claim 55, wherein defining the indexing units based upon the query further comprises:
defining the indexing units to include the lemma from the query, lemma from the query with their corresponding parts of speech, phrasal verbs from the query, and dependency triples corresponding to the query.
- 60. The method of claim 55, wherein retrieving the at least one sentence from the sentence database using the defined indexing units as search parameters further comprises:
retrieving a plurality of confirming sentences from the sentence database using the defined indexing units as search parameters.
- 61. The method of claim 60, and further comprising:
determining a similarity between each of the plurality of retrieved confirming sentences and the query, wherein each similarity is determined as a function of a linguistic weight of a term in the query; and
ranking the plurality of retrieved confirming sentences based upon the determined similarities.
- 62. A computer-readable medium having computer-executable instructions for performing steps comprising:
receiving a query;
defining indexing units based upon the query, the indexing units including both lemma from the query and extended indexing units associated with the query; and
retrieving at least one sentence from a sentence database using the defined indexing units as search parameters.
- 63. The computer-readable medium of claim 62, wherein the step of defining the indexing units based upon the query further comprises:
defining the indexing units to include both the lemma from the query and lemma from the query with their corresponding parts of speech.
- 64. The computer-readable medium of claim 62, wherein the step of defining the indexing units based upon the query further comprises:
defining the indexing units to include both the lemma from the query and phrasal verbs from the query.
- 65. The computer-readable medium of claim 62, wherein the step of defining the indexing units based upon the query further comprises:
defining the indexing units to include both the lemma from the query and dependency triples corresponding to the query.
- 66. The computer-readable medium of claim 62, wherein the step of defining the indexing units based upon the query further comprises:
defining the indexing units to include the lemma from the query, lemma from the query with their corresponding parts of speech, phrasal verbs from the query, and dependency triples corresponding to the query.
- 67. The computer-readable medium of claim 62, wherein the step of retrieving the at least one sentence from the sentence database using the defined indexing units as search parameters further comprises:
retrieving a plurality of confirming sentences from the sentence database using the defined indexing units as search parameters.
- 68. A system for retrieving confirming sentences from a sentence database in response to a query, the system comprising:
an input component which receives the query as an input; and
a search engine coupled to the input component, the search engine configured to define indexing units based upon the query, the indexing units including both lemma from the query and extended indexing units associated with the query, the search engine retrieving at least one confirming sentence from the sentence database using the defined indexing units as search parameters.
- 69. The system of claim 68, wherein the search engine is configured to define the indexing units to include both the lemma from the query and lemma from the query with their corresponding parts of speech.
- 70. The system of claim 68, wherein the search engine is configured to define the indexing units to include both the lemma from the query and phrasal verbs from the query.
- 71. The system of claim 68, wherein the search engine is configured to define the indexing units to include both the lemma from the query and dependency triples corresponding to the query.
- 72. The system of claim 68, wherein the search engine is configured to define the indexing units to include the lemma from the query, lemma from the query with their corresponding parts of speech, phrasal verbs from the query, and dependency triples corresponding to the query.
- 73. The system of claim 72, wherein the search engine retrieves a plurality of confirming sentences from the sentence database using the defined indexing units as search parameters, the search engine further configured to determine a similarity between each of the plurality of retrieved confirming sentences and the query, wherein each similarity is determined as a function of a linguistic weight of a term in the query, the search engine ranking the plurality of retrieved confirming sentences based upon the determined similarities.
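The retrieval and ranking steps recited in the claims above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function names, the part-of-speech weight values, the smoothed inverse-frequency formula, and the `length_base ** length` form of the exponential sentence length factor are all assumptions chosen for illustration. The query is assumed to arrive already lemmatized and tagged by some upstream parser, and each database sentence is assumed to be a list of lemmas.

```python
import math
from collections import Counter

# Assumed part-of-speech weights; the claims give no numeric values.
POS_WEIGHTS = {"VERB": 1.0, "NOUN": 0.8, "ADJ": 0.6, "ADV": 0.6}
DEFAULT_POS_WEIGHT = 0.3


def indexing_units(lemmas_with_pos, phrasal_verbs=(), dependency_triples=()):
    """Combine lemmas with the extended indexing units named in the claims."""
    units = set()
    for lemma, pos in lemmas_with_pos:
        units.add(("lemma", lemma))
        units.add(("lemma_pos", lemma, pos))
    units.update(("phrasal_verb", pv) for pv in phrasal_verbs)
    units.update(("triple", *triple) for triple in dependency_triples)
    return units


def idf(term, sentences):
    """Smoothed inverse frequency of a term across the sentence database."""
    containing = sum(1 for s in sentences if term in s)
    return math.log((1 + len(sentences)) / (1 + containing)) + 1.0


def vector_weights(terms, sentences):
    """Weight per term: its frequency in `terms` times its database idf."""
    counts = Counter(terms)
    return {t: counts[t] * idf(t, sentences) for t in counts}


def similarity(query, sentence, sentences, length_base=0.95):
    """query: list of (lemma, pos) pairs; sentence: list of lemmas."""
    pos_of = dict(query)
    q_w = vector_weights([lemma for lemma, _ in query], sentences)
    s_w = vector_weights(sentence, sentences)
    # Inner product over shared terms, each scaled by the linguistic
    # (part-of-speech) weight of the query term.
    score = sum(
        q_w[t] * s_w[t] * POS_WEIGHTS.get(pos_of[t], DEFAULT_POS_WEIGHT)
        for t in q_w
        if t in s_w
    )
    # Assumed exponential length factor: longer sentences are damped.
    return score * (length_base ** len(sentence))


def rank(query, sentences):
    """Order candidate sentences by descending similarity to the query."""
    return sorted(
        sentences, key=lambda s: similarity(query, s, sentences), reverse=True
    )
```

For example, with a three-sentence database `[["the", "cat", "eat", "fish"], ["dog", "run", "fast"], ["cat", "sleep"]]` and the query `[("cat", "NOUN"), ("eat", "VERB")]`, `rank` places the sentence sharing both query terms first and the sentence sharing none last. In a fuller system, the output of `indexing_units` would select the candidate sentences before ranking; here every database sentence is scored for simplicity.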
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] Reference is hereby made to the following co-pending and commonly assigned patent applications, filed on even date herewith: U.S. application Ser. No. ______, entitled “METHOD AND SYSTEM FOR DETECTING USER INTENTIONS IN RETRIEVAL OF HINT SENTENCES,” and U.S. application Ser. No. ______, entitled “METHOD AND SYSTEM FOR RETRIEVING HINT SENTENCES USING EXPANDED QUERIES,” both for inventor Ming Zhou.