Claims
- 1. A data-gathering and reporting system for collecting data from a wide area network (WAN) comprising:
a database stored in a data repository; a first server having access to the database and organizing data-gathering work assignments from data in the database; a hierarchical network of distributor servers having a highest level connected to the first server and expanding to a lowest level, with distributor servers at different levels connected by data links and distributing work assignments to lower levels on demand from the distributor servers at lower levels; a plurality of gatherer servers connected by data links to the lowest level of the hierarchy of distributor servers and to the WAN, the lowest level of distributor servers distributing work assignments to the gatherer servers on demand from the gatherer servers, the gatherer servers accomplishing the work assignments distributed by the distributor servers and queueing data collected from the WAN as a result of the work assignments; a hierarchical network of collector servers having a lowest level connected to the gatherer servers and contracting to a highest level, the gatherer servers communicating data collected to the lowest level of collector servers, with collector servers at different levels connected by data links and delivering collected data to higher levels by push; and one or more filing servers connected to the highest level of collector servers, the filing servers communicating with the database in the data repository, the collector servers delivering collected data to the one or more filing servers, and the filing servers writing the collected data to the database.
- 2. The system of claim 1 wherein the WAN is the Internet, and data is collected from WEB servers on the Internet.
- 3. The system of claim 1 wherein gating of work assignments and data between one server and another in the distributor network is by the one server having a queue with an adjustable threshold, and demanding data or work assignments from the other server as a result of the queue level falling to the threshold.
- 4. The system of claim 3 wherein latency and database writing efficiency are adjusted by adjusting queue thresholds among servers.
- 5. The system of claim 1 wherein server power and capacity required in a system are adjusted by scaling the number of servers and number of hierarchical levels of servers.
- 6. The system of claim 1 wherein priority is assigned to work assignments, and work assignments and collected data are gated from server to server according to assigned priority as well as by need.
- 7. The system of claim 1 wherein work assignments are expressed in a markup language, allowing all information required to fill an assignment to be encapsulated such that only the one or more filing servers need be connected to the database.
- 8. The system of claim 2 wherein the system is associated with an Internet subscription server, and the work assignments are for collecting data from WEB pages associated with individual subscribers.
- 9. The system of claim 8 wherein some work assignments are automatically scheduled for individual subscribers and some assignments are on demand from individual subscribers.
- 10. A data-gathering and reporting system for collecting WEB summaries from the Internet for individual subscribers to a Portal subscription system, comprising:
a plurality of gatherer servers each connected to the Internet, to an ascending hierarchy of work request distribution servers, and to an ascending hierarchy of collector servers; a work request generator at the top of the hierarchy of distribution servers, generating work requests for collecting WEB summaries; and a filing server at the top of the hierarchy of collector servers, the filing server connected to and writing data to a database; wherein flow is by work requests from the work request generator down the hierarchy of distributor servers to the gatherer servers where work requests are accomplished by gathering WEB summaries from Internet servers according to the work requests, and by data collected from the gatherer servers up the hierarchy of collector servers to the filing server.
- 11. The system of claim 10 wherein gating of work assignments and data between one server and another in the hierarchy of distributor servers is by the one server having a queue with an adjustable threshold, and demanding data or work assignments from the other server as a result of the queue level falling to the threshold.
- 12. The system of claim 11 wherein latency and database writing efficiency are adjusted by adjusting queue thresholds among servers.
- 13. The system of claim 10 wherein server power and capacity required in a system are adjusted by scaling the number of servers and number of hierarchical levels of servers.
- 14. The system of claim 10 wherein priority is assigned to work assignments, and work assignments and collected data are gated from server to server according to assigned priority as well as by need.
- 15. The system of claim 10 wherein work assignments are expressed in a markup language, allowing all information required to fill an assignment to be encapsulated such that only the one or more filing servers need be connected to the database.
- 16. The system of claim 10 wherein some work assignments are automatically scheduled for individual subscribers and some assignments are on demand from individual subscribers.
- 17. A method for gathering data from the Internet, comprising:
(a) generating data collection requests by a request generator; (b) passing the requests down a descending hierarchy of distributor servers on demand from servers at lower levels; (c) accomplishing the data gathering requests by a level of gatherer servers connected to the Internet and the lowest level of distributor servers, the gatherer servers pulling requests from the distributor servers; (d) passing collected data in discrete packets associated with the requests up an ascending hierarchy of collector servers to a filing server at the top of the hierarchy; and (e) writing the collected data to a database by the filing server.
- 18. The method of claim 17 wherein gating of work assignments and data between one server and another in the distributor server hierarchy is by the one server having a queue with an adjustable threshold, and demanding data or work assignments from the other server as a result of the queue level falling to the threshold.
- 19. The method of claim 18 wherein latency and database writing efficiency are adjusted by adjusting queue thresholds among servers.
- 20. The method of claim 17 wherein server power and capacity required in a system are adjusted by scaling the number of servers and number of hierarchical levels of servers.
- 21. The method of claim 17 wherein priority is assigned to work requests, and work requests and collected data are gated from server to server according to assigned priority as well as by need.
- 22. The method of claim 17 wherein work requests are expressed in a markup language, allowing all information required to fill a request to be encapsulated such that only the filing server need be connected to the database.
- 23. The method of claim 17 wherein some work requests are automatically scheduled for individual subscribers and some work requests are on demand from individual subscribers.
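
The threshold-gated, on-demand distribution recited in claims 3, 6, 11, 14, 18 and 21 can be illustrated with a minimal sketch. It is not the claimed implementation: the class name `PullQueue`, the `batch` parameter, the use of integer priorities (lower number served first), and the use of in-process method calls in place of data links between servers are all assumptions made only for illustration.

```python
import heapq
from itertools import count


class PullQueue:
    """Hypothetical server queue that demands work from the server above it
    when its own queue level falls to an adjustable threshold."""

    def __init__(self, name, upstream=None, threshold=2, batch=4):
        self.name = name
        self.upstream = upstream    # next server up the distributor hierarchy
        self.threshold = threshold  # adjustable refill level
        self.batch = batch          # items requested per demand (assumption)
        self._heap = []
        self._seq = count()         # tie-breaker: FIFO order within a priority

    def put(self, priority, item):
        """Queue an item; a lower priority number is served first."""
        heapq.heappush(self._heap, (priority, next(self._seq), item))

    def __len__(self):
        return len(self._heap)

    def _refill(self):
        """Demand a batch from upstream once the level is at the threshold."""
        if self.upstream is not None and len(self._heap) <= self.threshold:
            for priority, item in self.upstream.give(self.batch):
                self.put(priority, item)

    def give(self, n):
        """Hand up to n items to a lower-level server that demanded them."""
        out = []
        while len(out) < n:
            self._refill()
            if not self._heap:
                break  # nothing left here or anywhere upstream
            priority, _, item = heapq.heappop(self._heap)
            out.append((priority, item))
        return out

    def take(self):
        """Take one item for local processing, or None if no work remains."""
        got = self.give(1)
        return got[0] if got else None


if __name__ == "__main__":
    # Three levels: first server -> distributor -> gatherer.
    first = PullQueue("first")
    for priority, url in [(2, "b"), (1, "a"), (3, "c")]:
        first.put(priority, url)
    distributor = PullQueue("distributor", upstream=first, threshold=1, batch=2)
    gatherer = PullQueue("gatherer", upstream=distributor, threshold=0, batch=1)
    while (work := gatherer.take()) is not None:
        print(work)
```

In this sketch, raising a server's `threshold` or `batch` keeps more work staged close to the gatherers, while lowering them leaves work higher in the hierarchy until it is demanded. This is one way to picture the threshold adjustment described in claims 4, 12 and 19.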
CROSS-REFERENCE TO RELATED DOCUMENTS
[0001] The present invention is related as a continuation in part (CIP) to a patent application entitled “Method and Apparatus for Obtaining and Presenting WEB Summaries to Users” filed on Jun. 1, 1999, for which Ser. No. 09/323,598 is assigned, and which is incorporated herein by reference, which is a CIP of application Ser. No. 09/208,740, also incorporated herein by reference.
Divisions (1)
| | Number | Date | Country |
|---|---|---|---|
| Parent | 09362914 | Jul 1999 | US |
| Child | 10360337 | Feb 2003 | US |
Continuation in Parts (2)
| | Number | Date | Country |
|---|---|---|---|
| Parent | 09323598 | Jun 1999 | US |
| Child | 10360337 | Feb 2003 | US |
| Parent | 09208740 | Dec 1998 | US |
| Child | 09323598 | Jun 1999 | US |