Claims
- 1. A method of forming an information closure of a plurality of information units, dividable into a first unit and a plurality of remaining units, each unit having at least one of a plurality of fields, for accessing semistructured information, comprising the steps:
computing a cross product of the fields in a first unit; adding said cross product to a list of accepted units; for each remaining unit r in said plurality of remaining units, computing a selective cross product from said remaining unit r and said list of accepted units; removing from the list of accepted units at least one of a plurality of units having identical fields.
- 2. The method of claim 1 further including repeating the computing a selective cross product step above for all remaining units.
- 3. The method of claim 1 wherein said computing a selective cross product step further comprises the steps:
initializing a result to empty; for each accepted unit r″ in said list of accepted units, determining a unit r′ containing accepted unit r″ and at least one of a plurality of non-empty fields in remaining unit r; determining a new unit n′ containing remaining unit r and at least one of a plurality of non-empty fields in accepted unit r″; adding r′ and n′ to result; repeating the two determining steps and adding step above for all accepted units r″ in said list of accepted units.
- 4. The method of claim 1 wherein said units comprise rows in a linkage stack.
- 5. The method of claim 1 wherein said information units contain information about shopping items.
- 6. The method of claim 1 wherein said information units contain information about job listings.
- 7. The method of claim 1 wherein said information units contain information about real estate listings.
- 8. The method of claim 1 wherein said information units are derived from a source of semistructured information.
- 9. The method of claim 8 wherein said source of semistructured information comprises a plurality of web pages.
- 10. A method of forming an information closure of a plurality of rows, having at least one of a plurality of fields, in a linkage stack of a wrapper program for accessing semistructured information, comprising the steps:
removing a first row from said linkage stack, leaving a plurality of remaining rows in said linkage stack; computing a cross product of the fields in the first row from said linkage stack; adding said cross product of said fields in said first row to a list of accepted rows; for each remaining row r in said plurality of remaining rows in said linkage stack, computing a selective cross product from said remaining row r and said list of accepted rows, comprising the steps:
initializing a result to empty; for each accepted row r″ in said list of accepted rows, determining a row r′ containing accepted row r″ and at least one of a plurality of non-empty fields in remaining row r; determining a new row n′ containing remaining row r and at least one of a plurality of non-empty fields in accepted row r″; adding rows r′ and n′ to result; repeating the two determining steps and adding step above for all accepted rows r″ in said list of accepted rows; repeating the computing a selective cross product step above for all remaining rows r in said linkage stack; removing from the list of accepted rows at least one of a plurality of rows having identical fields; providing the result as the information closure.
- 11. The method of claim 10 wherein said rows contain information about shopping items.
- 12. The method of claim 10 wherein said rows contain information about job listings.
- 13. The method of claim 10 wherein said rows contain information about real estate listings.
- 14. The method of claim 10 wherein said rows are derived from a source of semistructured information.
- 15. The method of claim 14 wherein said source of semistructured information comprises a plurality of web pages.
- 16. The method of claim 10 wherein a default field value is copied into a row only if there is no other row having a value for the field, given the current state of the row.
- 17. A system for computing an information closure of a plurality of information units, dividable into a first unit and a plurality of remaining units, each unit having at least one of a plurality of fields, for accessing semistructured information, comprising:
a computer readable medium for containing said plurality of information; and a processor means operatively disposed to:
compute a cross product of the fields in a first unit; add said cross product to a list of accepted units; for each remaining unit r in said plurality of remaining units, compute a selective cross product from said remaining unit r and said list of accepted units; remove from the list of accepted units at least one of a plurality of units having identical fields.
- 18. The system of claim 17 wherein said processor means is further operatively disposed to repeat the computing of the selective cross product for all remaining units.
- 19. The system of claim 17 wherein said processor means is further operatively disposed to perform the computing of the selective cross product, which further comprises:
initializing a result to empty; for each accepted unit r″ in said list of accepted units, determining a unit r′ containing accepted unit r″ and at least one of a plurality of non-empty fields in remaining unit r; determining a new unit n′ containing remaining unit r and at least one of a plurality of non-empty fields in accepted unit r″; adding r′ and n′ to result; repeating the two determining operations and adding operation for all accepted units r″ in said list of accepted units.
- 20. The system of claim 17 wherein said units comprise rows in a linkage stack.
- 21. The system of claim 17 wherein said information units contain information about shopping items.
- 22. The system of claim 17 wherein said information units contain information about job listings.
- 23. The system of claim 17 wherein said information units contain information about real estate listings.
- 24. The system of claim 17 wherein said information units are derived from a source of semistructured information.
- 25. The system of claim 24 wherein said source of semistructured information comprises a plurality of web pages.
- 26. A computer program product for computing an information closure of a plurality of information units, dividable into a first unit and a plurality of remaining units, each unit having at least one of a plurality of fields, for accessing semistructured information, comprising:
code for computing a cross product of the fields in a first unit; code for adding said cross product to a list of accepted units; code for computing a selective cross product from said remaining unit r and said list of accepted units for each remaining unit r in said plurality of remaining units; code for removing from the list of accepted units at least one of a plurality of units having identical fields; and a computer readable medium for containing said codes.
- 27. The computer program product of claim 26 further comprising code for repeating the computing a selective cross product step above for all remaining units.
- 28. The computer program product of claim 26 wherein said code for computing a selective cross product further comprises:
code for initializing a result to empty; code for determining a unit r′ containing accepted unit r″ and at least one of a plurality of non-empty fields in remaining unit r, for each accepted unit r″ in said list of accepted units; code for determining a new unit n′ containing remaining unit r and at least one of a plurality of non-empty fields in accepted unit r″; code for adding r′ and n′ to result; code for repeatedly invoking the two codes for determining and code for adding above for all accepted units r″ in said list of accepted units.
- 29. The computer program product of claim 26 wherein said units comprise rows in a linkage stack.
- 30. The computer program product of claim 26 wherein said information units contain information about shopping items.
- 31. The computer program product of claim 26 wherein said information units contain information about job listings.
- 32. The computer program product of claim 26 wherein said information units contain information about real estate listings.
- 33. The computer program product of claim 26 wherein said information units are derived from a source of semistructured information.
- 34. The computer program product of claim 33 wherein said source of semistructured information comprises a plurality of web pages.
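The steps recited in claims 1, 3 and 10 can be illustrated with a short Python sketch. This is one possible reading of the claim language, not the patented implementation. The sketch assumes that a unit is a dictionary mapping each field name to a list of candidate values (an empty list meaning an empty field), that an accepted unit keeps its own value when both units fill the same field, and that the result of each selective cross product replaces the list of accepted units. All function names are hypothetical.

```python
from itertools import product


def cross_product(unit):
    """Expand a unit whose fields hold lists of values into single-valued rows."""
    names = [name for name, values in unit.items() if values]  # skip empty fields
    return [dict(zip(names, combo))
            for combo in product(*(unit[name] for name in names))]


def selective_cross_product(remaining, accepted):
    """Combine one remaining unit r with every accepted row (assumed reading of claim 3)."""
    result = []  # initialize the result to empty
    for r in cross_product(remaining):
        for acc in accepted:
            r_prime = {**r, **acc}  # accepted row, plus r's fields that acc lacks
            n_prime = {**acc, **r}  # remaining row, plus acc's fields that r lacks
            result.extend([r_prime, n_prime])
    return result


def remove_duplicates(rows):
    """Drop rows with identical fields; field values are assumed hashable."""
    seen, unique = set(), []
    for row in rows:
        key = frozenset(row.items())
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def information_closure(units):
    """Form the closure of a non-empty list of units."""
    first, *rest = units
    accepted = cross_product(first)
    for unit in rest:
        # Assumption: the selective cross product becomes the new accepted list.
        accepted = remove_duplicates(selective_cross_product(unit, accepted))
    return remove_duplicates(accepted)


# Hypothetical job-listing rows: the first has two cities, the second adds a salary.
listing_rows = [
    {"title": ["Engineer"], "city": ["Austin", "Dallas"]},
    {"salary": ["90k"]},
]
closure = information_closure(listing_rows)
```

In this example the first row expands into one row per city, and the salary from the second row is merged into each of them, so `closure` holds two complete rows. Where two rows disagree on a field, the sketch keeps both variants (r′ and n′) rather than choosing between them.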
CROSS-REFERENCES TO RELATED APPLICATIONS
[0001] This application claims priority from the following U.S. Provisional Application, the disclosure of which, including all appendices and all attached documents, is incorporated by reference in its entirety for all purposes:
[0002] U.S. Provisional Patent Application Ser. No. 60/066,125, Peter Norvig, et al., entitled “Method for Creating an Information Closure Model,” filed Nov. 21, 1997.
[0003] The following commonly-owned copending application is being filed concurrently and is hereby incorporated by reference in its entirety for all purposes:
[0004] U.S. Patent Application Ser. No. ______, Ashish Gupta, et al., entitled “Method and Apparatus for Creating Extractors, Field Information Objects and Inheritance Hierarchies in a Framework for Retrieving Semistructured Information” (attorney docket number 17907-000220US).
[0005] This application makes reference to the following commonly owned U.S. Patent which is incorporated herein in its entirety for all purposes:
[0006] U.S. Pat. No. 5,826,258, in the name of Ashish Gupta, et al., entitled “Method and Apparatus for Structuring the Querying and Interpretation of Semistructured Information,” relates to information retrieval and interpretation from disparate semistructured information resources.
Provisional Applications (1)
| Number | Date | Country |
| --- | --- | --- |
| 60066125 | Nov 1997 | US |
Continuations (1)
| | Number | Date | Country |
| --- | --- | --- | --- |
| Parent | 09196026 | Nov 1998 | US |
| Child | 10000235 | Nov 2001 | US |