Claims
- 1. A method for extracting and structuring items of data from content available via the Internet, the method comprising:
receiving input of a user specifying at least one source of content available via the Internet, types of data to be extracted from said at least one source, and fields for structuring extracted items of data;
retrieving content from said at least one source;
parsing the retrieved content to extract items of data of the types specified by the user; and
mapping the extracted items of data to the fields specified by the user so as to transform the extracted items of data into a structured format.
- 2. The method of claim 1, wherein said at least one source of content includes a Web site.
- 3. The method of claim 1, wherein said at least one source of content includes an HTML page.
- 4. The method of claim 1, wherein said receiving step includes receiving a URL specifying a source of content available via the Internet.
- 5. The method of claim 1, wherein said receiving step includes receiving user input specifying attributes of data to be extracted from said at least one source.
- 6. The method of claim 1, wherein said retrieving step includes retrieving a Web page.
- 7. The method of claim 6, wherein said Web page comprises a selected one of an HTML page, a cHTML page, and an XHTML page.
- 8. The method of claim 6, wherein said parsing step includes parsing container objects of the Web page.
- 9. The method of claim 8, wherein said parsing step includes creating a new object for a particular container object of the Web page, the new object containing information for the particular container object.
- 10. The method of claim 8, wherein said parsing step includes creating feature tags for elements of the container objects.
- 11. The method of claim 10, wherein said creating feature tags step includes creating information for a selected one of a headline, a graphic object, a button, and a run of text.
- 12. The method of claim 10, wherein said creating feature tags step includes creating feature tags based on attributes of each element.
- 13. The method of claim 10, further comprising:
saving at least some of the feature tags; and subsequently, using a saved feature tag to retrieve an element from the Web page.
- 14. The method of claim 1, wherein said mapping step includes receiving user input for selecting a particular field for placing a given item of data.
- 15. The method of claim 14, wherein said mapping step further comprises automatically mapping other items of data similar to the given item of data to the particular field.
- 16. The method of claim 1, wherein said mapping step includes generating an XML document including the extracted items of data.
- 17. The method of claim 1, wherein said mapping step includes saving the extracted items of data to a database table.
- 18. A computer-readable medium having processor-executable instructions for performing the method of claim 1.
- 19. A downloadable set of processor-executable instructions for performing the method of claim 1.
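The method of claim 1, together with the XML output of claim 16, can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions, not the claimed implementation: the hypothetical `ExtractionSpec` stands in for the user's input, each item is assumed to be delimited by a single HTML tag (here `li`), and each user-specified field is assumed to be identified by a CSS class. Retrieval uses `urllib.request.urlopen`; the parsing and mapping steps accept any HTML string.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from html.parser import HTMLParser
from urllib.request import urlopen


@dataclass
class ExtractionSpec:
    """Hypothetical stand-in for the user input of claim 1."""
    source_url: str   # source of content, given as a URL (claim 4)
    item_tag: str     # assumption: each item is delimited by one tag, e.g. "li"
    field_classes: dict[str, str] = field(default_factory=dict)  # field -> CSS class


def retrieve(spec: ExtractionSpec) -> str:
    """Retrieving step: download the page as text (requires network access)."""
    with urlopen(spec.source_url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class ItemParser(HTMLParser):
    """Parsing step: collect the text of elements whose class names a requested field."""

    def __init__(self, spec: ExtractionSpec):
        super().__init__()
        self.spec = spec
        self.class_to_field = {cls: name for name, cls in spec.field_classes.items()}
        self.items: list[dict[str, str]] = []
        self.current_field: str | None = None

    def handle_starttag(self, tag, attrs):
        if tag == self.spec.item_tag:
            self.items.append({})  # a new item begins
            return
        for cls in (dict(attrs).get("class") or "").split():
            if cls in self.class_to_field and self.items:
                self.current_field = self.class_to_field[cls]

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current field, so field
        # elements are assumed to contain plain text only.
        self.current_field = None

    def handle_data(self, data):
        if self.current_field and data.strip():
            item = self.items[-1]
            item[self.current_field] = item.get(self.current_field, "") + data.strip()


def to_xml(items: list[dict[str, str]], spec: ExtractionSpec) -> str:
    """Mapping step: place each item's values under the user's fields as XML (claim 16)."""
    root = ET.Element("items")
    for item in items:
        node = ET.SubElement(root, "item")
        for name in spec.field_classes:
            ET.SubElement(node, name).text = item.get(name, "")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    spec = ExtractionSpec("https://example.com/catalog", "li",
                          {"name": "title", "price": "cost"})
    html = ('<ul><li><span class="title">Widget</span> <span class="cost">$5</span></li>'
            '<li><span class="title">Gadget</span> <span class="cost">$7</span></li></ul>')
    parser = ItemParser(spec)
    parser.feed(html)  # in practice: parser.feed(retrieve(spec))
    parser.close()
    print(to_xml(parser.items, spec))
```

Run on the sample markup, the sketch yields one `<item>` per `li`, each with `name` and `price` children drawn from the `title` and `cost` spans. Keying fields on CSS classes is only one way to let a user say "which data goes where"; the claims do not prescribe it.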
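Claims 8 through 13 describe parsing a Web page's container objects, creating a new object per container, creating feature tags for elements from their attributes, and later using a saved tag to retrieve an element again. The Python sketch below is one possible reading under these assumptions: container objects are `div`, `table`, `ul`, `ol` and `form` elements; an element's kind (headline, graphic, button, run of text) is inferred from its tag name and `type` attribute; and a saved tag locates an element by container index and position. `FeatureTag`, `ContainerInfo`, `FeatureTagger`, `save_tags` and `retrieve_with_tag` are hypothetical names.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from html.parser import HTMLParser

# Assumptions for this sketch: which tags count as container objects or headlines.
CONTAINER_TAGS = {"div", "table", "ul", "ol", "form"}
HEADLINE_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


@dataclass
class FeatureTag:
    """Hypothetical feature tag describing one element of a container."""
    kind: str              # "headline", "graphic", "button" or "text" (claim 11)
    tag: str               # element name, or "#text" for a run of text
    attrs: dict[str, str]  # element attributes the tag is based on (claim 12)
    container_index: int   # which container object holds the element
    position: int          # ordinal of the element within that container
    text: str = ""         # visible text, if any


@dataclass
class ContainerInfo:
    """New object created for each container object of the page (claim 9)."""
    tag: str
    attrs: dict[str, str]
    features: list[FeatureTag] = field(default_factory=list)


def classify(tag: str, attrs: dict[str, str]) -> str | None:
    """Infer an element's kind from its name and attributes; None means not tagged."""
    if tag in HEADLINE_TAGS:
        return "headline"
    if tag == "img":
        return "graphic"
    if tag == "button" or (tag == "input" and attrs.get("type") in {"button", "submit"}):
        return "button"
    return None


class FeatureTagger(HTMLParser):
    """Parse container objects and create feature tags for their elements."""

    def __init__(self):
        super().__init__()
        self.containers: list[ContainerInfo] = []
        self.stack: list[int] = []                   # indices of open containers
        self.open_feature: FeatureTag | None = None  # headline/button collecting text

    def _add(self, kind, tag, attrs):
        if not self.stack:
            return None  # elements outside every container are ignored here
        idx = self.stack[-1]
        box = self.containers[idx]
        feature = FeatureTag(kind, tag, attrs, idx, len(box.features))
        box.features.append(feature)
        return feature

    def handle_starttag(self, tag, attrs):
        attr_map = {k: v or "" for k, v in attrs}
        if tag in CONTAINER_TAGS:
            self.containers.append(ContainerInfo(tag, attr_map))
            self.stack.append(len(self.containers) - 1)
            return
        kind = classify(tag, attr_map)
        if kind:
            feature = self._add(kind, tag, attr_map)
            if kind == "headline" or tag == "button":  # elements with a closing tag
                self.open_feature = feature

    def handle_endtag(self, tag):
        if tag in CONTAINER_TAGS and self.stack:
            self.stack.pop()
        elif self.open_feature and tag == self.open_feature.tag:
            self.open_feature = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self.open_feature:
            self.open_feature.text += text
        else:
            feature = self._add("text", "#text", {})
            if feature:
                feature.text = text


def save_tags(features: list[FeatureTag]) -> str:
    """Persist feature tags as JSON (saving step of claim 13)."""
    return json.dumps([asdict(f) for f in features])


def retrieve_with_tag(html: str, saved: dict) -> FeatureTag | None:
    """Re-parse the page and return the element a saved tag points to (claim 13)."""
    tagger = FeatureTagger()
    tagger.feed(html)
    tagger.close()
    if saved["container_index"] >= len(tagger.containers):
        return None
    features = tagger.containers[saved["container_index"]].features
    pos = saved["position"]
    if pos < len(features) and (features[pos].kind, features[pos].tag) == (saved["kind"], saved["tag"]):
        return features[pos]
    return None
```

Because the saved tag records structure rather than wording, it still finds a headline after the page text changes: if `<h2>Old headline</h2>` later reads `<h2>New headline</h2>`, `retrieve_with_tag` returns the feature whose `text` is `"New headline"`. Purely positional matching fails when the layout itself changes; comparing stored attributes as well would likely be more robust.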
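Claims 14, 15 and 17 refine the mapping step: the user places one example item in a field, similar items follow automatically, and the result may be saved to a database table. A minimal Python sketch using the built-in `sqlite3` module follows. It assumes that two items are "similar" when they share an HTML tag name and `class` attribute, since the claims do not define similarity; `Extracted`, `auto_map` and `save_to_table` are hypothetical names.

```python
from __future__ import annotations

import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Extracted:
    """Hypothetical extracted item of data, with the attributes used to judge similarity."""
    record: int      # which record (e.g. table row or list item) it came from
    tag: str
    css_class: str
    text: str

    @property
    def signature(self) -> tuple[str, str]:
        # Assumption: items are "similar" when they share tag name and class.
        return (self.tag, self.css_class)


def auto_map(items: list[Extracted], choices: dict[int, str]) -> list[dict[str, str]]:
    """choices maps the index of an example item to the field the user picked
    for it (claim 14); every item with the same signature gets that field too
    (claim 15). Items with no matching choice are left unmapped."""
    sig_to_field = {items[i].signature: name for i, name in choices.items()}
    rows: dict[int, dict[str, str]] = {}
    for item in items:
        name = sig_to_field.get(item.signature)
        if name is not None:
            rows.setdefault(item.record, {})[name] = item.text
    return [rows[r] for r in sorted(rows)]


def save_to_table(conn: sqlite3.Connection, table: str, fields: list[str],
                  rows: list[dict[str, str]]) -> None:
    """Save mapped rows to a database table (claim 17). Table and field names
    are assumed to be trusted identifiers; values use bound parameters."""
    cols = ", ".join(f'"{f}"' for f in fields)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
    marks = ", ".join("?" for _ in fields)
    conn.executemany(f'INSERT INTO "{table}" ({cols}) VALUES ({marks})',
                     [tuple(row.get(f) for f in fields) for row in rows])
    conn.commit()
```

For example, if the user assigns the first `td.name` cell to a `product` field and the first `td.price` cell to `cost`, every other `td.name` and `td.price` cell is mapped the same way and each record becomes one table row; cells with other classes stay unmapped.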
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] The present application is related to and claims the benefit of priority of the following commonly-owned, presently-pending nonprovisional application: application Ser. No. 09/780,993 (Docket No. SYB/0090.03), filed Feb. 8, 2001, entitled “System and Method for Dynamic Content Retrieval”, of which the present application is a continuation-in-part. The disclosure of the foregoing application is hereby incorporated by reference in its entirety, including any appendices or attachments thereof, for all purposes.
Provisional Applications (3)

| Number | Date | Country |
|---|---|---|
| 60180994 | Feb 2000 | US |
| 60219156 | Jul 2000 | US |
| 60246674 | Nov 2000 | US |

Continuation in Parts (1)

| Relationship | Number | Date | Country |
|---|---|---|---|
| Parent | 09780993 | Feb 2001 | US |
| Child | 10709475 | May 2004 | US |