System and Method for Automated Discovery, Binding, and Integration of Non-Registered Geospatial Web Services

Description

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a web based portal system for automatically discovering and binding web mapping services that adhere to a standard interface specification, and integrating the services into a portal interface, according to an embodiment of the invention.

FIG. 2 illustrates more details of the system of FIG. 1.

FIG. 3 is an example of a Web Mapping Services XML Schema against which documents are validated in a method according to an embodiment of the invention.

FIG. 4 is an example of an XML document for validating against the XML Schema in accordance with an embodiment of the invention.

DETAILED DESCRIPTION OF THE INVENTION

FIG. 1 illustrates a web based portal system for automatically discovering and binding web mapping services that adhere to a standard interface specification, and integrating the services into a portal interface. The system takes advantage of the standard interface specification by programming one WMS client to drive WMS requests from any published WMS server configured into the portal. The system can broker access to hundreds of thousands of maps from over a thousand various internet map servers such as NASA Blue Marble server, ArcIMS, Satellite Imagery servers, or WMS servers.

The web based portal system provides the user access to all of the map servers through a single mapping application. Because the map servers can have many different implementations and access formats, drivers are programmed to convert map requests for a particular map server to the GIDB Portal Interface API. The major advantages of using the GIDB Portal to access maps online are the copious number of maps the Portal brokers and the homogeneous access to a variety of heterogeneous map servers via the GIDB Portal Interface APIs and the GIDB thick/thin mapping application clients. In contrast, the various map servers typically include client applications specifically designed to work with only the particular type of map server.

FIG. 1 illustrates the system architecture of a web mapping portal system 100 with automated capabilities to discover, bind, and integrate non-registered geospatial web services according to an embodiment of the invention.

Users 160, 170, and 180 can connect to the GIDB Portal interface 140 via a thick client application 165 resident on their computer, a web-based thin client application 175 via their computer or personal digital assistant, or through another client application 185.

The Portal WMS Driver component 110 converts GIDB Portal map requests to WMS map requests, which are in turn relayed to various WMS servers brokered by the GIDB Portal. For each map server brokered by the GIDB Portal, a driver 110 is used to convert a GIDB Portal map request to the particular map server request. The GIDB WMS Driver 110 converts GIDB Portal map requests to WMS Server Map requests by constructing the appropriate WMS URLs.

The GIDB WMS driver 110 essentially acts as a WMS client that reads the “Capabilities” document of WMS servers 120, 130 that have been configured into the GIDB Portal, and based on the contents of the Capabilities document advertises the particular WMS Server's map content to the GIDB Portal user. As the user 160 selects to view a particular map layer, a GIDB Portal fetch map request is issued and processed into a WMS fetch map request to the selected WMS Server. The only piece of information that is needed for the WMS driver 110 to configure a WMS Server into the GIDB Portal is the WMS Server URL as described above. Communication from the WMS Driver to the WMS Server is performed by configuring HTTP query key value pairs to the corresponding WMS Server URL and fetching the HTTP response.
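The conversion of a generic map request into a WMS request URL can be sketched as follows. This is a minimal illustration, not the GIDB driver itself: the function name `build_getmap_url` and its arguments are hypothetical, and the key-value pairs are the GetMap parameters defined by WMS Version 1.1.1.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit, parse_qsl

def build_getmap_url(server_url, layers, bbox, width, height,
                     srs="EPSG:4326", image_format="image/png"):
    """Build a WMS 1.1.1 GetMap URL from a generic, portal-style map request.

    The argument names are illustrative assumptions; only the query keys
    come from the WMS 1.1.1 specification.
    """
    parts = urlsplit(server_url)
    # Keep any query parameters already present on the configured server URL.
    params = parse_qsl(parts.query, keep_blank_values=True)
    params += [
        ("SERVICE", "WMS"),
        ("VERSION", "1.1.1"),
        ("REQUEST", "GetMap"),
        ("LAYERS", ",".join(layers)),
        ("STYLES", ""),  # empty value requests the server's default styles
        ("SRS", srs),
        ("BBOX", ",".join(str(v) for v in bbox)),  # minx,miny,maxx,maxy
        ("WIDTH", str(width)),
        ("HEIGHT", str(height)),
        ("FORMAT", image_format),
    ]
    return urlunsplit(parts._replace(query=urlencode(params)))
```

A call such as `build_getmap_url("http://example.org/wms", ["roads"], (-90, 30, -80, 40), 512, 256)` yields a URL whose HTTP response would be fetched as the map image; the host name here is a placeholder.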

As depicted in FIG. 1, the GIDB Portal system provides the user 160, 170, or 180 with access to map content from various map servers from a single client application. With any WMS compliant client application, the user may also access GIDB Portal content through the GIDB WMS Interface that adheres to the latest accepted version of the WMS specifications. With all of the benefits that web services technology brings, such as interoperability, platform independence, and a standard API, access to any of the original map sources configured into the GIDB Portal is brokered via WMS in a fully automated manner. In addition, a WMS server that is configured into the GIDB Portal as input may adhere to WMS specifications Version 1.0.0, but will be provided access via WMS Version 1.1.1 from the GIDB WMS Interface as output. This system design provides programmatic access that is highly interoperable and platform independent to a considerable amount of web map content that may come from platform dependent internet map sources that have little interoperability with other applications.

The integration of the GIDB Portal Interface 140, the GIDB WMS Driver 110, the WMS Web Crawler 190 and the GIDB WMS Interface components 150 of the system described above represents a portal that brokers homogeneous access to a variety of WMS Servers. By configuring as many WMS Servers as available on the Web into the WMS Driver component, the system saves the user the work of searching for WMS Servers manually and provides access to a vast array of WMS content from one comprehensive mapping application.

In an embodiment of the invention illustrated in FIG. 2, an automated method and system automatically searches for XML documents published on the web that validate to a common Open Geospatial Consortium Web Mapping Services XML Schema. More generally, the method and system can search for XML documents that validate to a particular XML schema, and is suitable for many different Web Services related applications.

WMS servers each have a published XML document that validates to a common XML schema as published by the Open Geospatial Consortium. The current WMS specification is available at J. Beaujardiere, Web Map Service Implementation Specifications, Version 1.1.1, Jan. 2; <http://portal.opengeospatial.org/files/index.php?artifact_id=1080&version=1&format=pdf>. An example of a WMS server published XML document that validates to the OGC WMS specification is: <http://digitalearth.gov/wmt/xml/capabilities_1_1_1.dtd>

A WMS Capabilities request can be issued to any HTTP URL by appending “REQUEST=GetCapabilities&SERVICE=WMS” to the URL. If the URL belongs to a WMS server, the HTTP response body contains an OGC WMS Capabilities Document that validates against the schema.
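Appending the query string is slightly more involved than string concatenation, because a discovered URL may already carry a query of its own. The following is a minimal sketch under that assumption; the function name `capabilities_url` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def capabilities_url(url):
    """Return url with REQUEST=GetCapabilities&SERVICE=WMS appended."""
    parts = urlsplit(url)
    # Drop existing REQUEST/SERVICE keys (in any letter case) so that the
    # appended pairs are the only ones the server sees.
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.upper() not in ("REQUEST", "SERVICE")]
    kept += [("REQUEST", "GetCapabilities"), ("SERVICE", "WMS")]
    # The fragment is never sent to the server, so it is removed.
    return urlunsplit(parts._replace(query=urlencode(kept), fragment=""))
```

Other query parameters are preserved because some map servers use them to select a particular map configuration.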

Initially, the automated system searches 210 the internet using an internet search engine for HTTP URLs 211 that potentially include WMS Servers or links to WMS Servers. Many WMS servers have a common implementation, and may have common URL string patterns in the corresponding WMS server URLs. The search terms can be keywords such as “map”, “WMS Servers”, “WMS maps”, “Map server”, “REQUEST=GetCapabilities”, “OpenGIS Web Services”, and other terms and URL string patterns common to WMS servers.

In an exemplary embodiment of the invention, a search engine application program interface (API) is included in the system's computer program, so internet searches can be executed with little user input other than identifying the search terms for the search string. The APIs can be those provided by Google, Inc., Yahoo, Inc., or other search engines. The GOOGLE™ API specifications are described at <http://www.google.com/apis/reference.htms>. Thus, the search engine APIs can be used to dramatically narrow the set of all HTTP URLs down to a smaller set of retrieved URLs that serve as input seeds to the WMS crawler.

While APIs can be used to reduce the time and manpower needed to conduct an internet search and can store the results in a readily usable format, it is also possible to type queries into the internet search engines directly.

Once the initial web search has identified HTTP URLs, the system uses a web crawler 220 to search the URLs returned by the internet search for additional HTTP URLs. The web crawler can find HTTP URLs that are listed on the web pages identified by the original search that may not have been indexed by the search engine. By utilizing web crawling, the system can crawl scores of URLs to find WMS Capabilities Documents, and improve upon the original search. In one embodiment, the system employs the scalable and highly configurable Heritrix Open Source Web Crawler (http://archive.crawler.org), although other web crawlers are also suitable. The crawler is fed seed URLs from GOOGLE™ API queries with keywords similar to what a human user would input to a search engine to find WMS map servers: keywords like “WMS Maps”, “OpenGIS Web Services”, etc., as discussed above.
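The core crawling step, extracting further HTTP URLs from a downloaded page, can be illustrated with a short sketch. It uses Python's standard HTML parser rather than Heritrix, and the names `LinkExtractor` and `extract_links` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag

class LinkExtractor(HTMLParser):
    """Collect absolute HTTP(S) URLs from the anchor tags of one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL, drop fragments.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                is_http = absolute.startswith(("http://", "https://"))
                if is_http and absolute not in self.links:
                    self.links.append(absolute)

def extract_links(base_url, html_text):
    """Return the unique HTTP(S) links found in html_text, in page order."""
    parser = LinkExtractor(base_url)
    parser.feed(html_text)
    return parser.links
```

Each extracted URL becomes both a candidate for the Capabilities test and a possible page to crawl further.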

WMS Servers are often advertised on websites within well known “GIS web domains” that are of common interest within the geospatial information systems (GIS) community. The web crawler 220 can also be programmed to crawl these websites in addition to crawling the set of HTTP URLs 211 returned by the initial web search.

The system can also configure the web crawler to disregard any URL whose MIME header type (image, video stream, music, executable program, etc.) indicates that the response is unlikely to contain another URL. These URLs will not be downloaded or processed.
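A MIME type filter of this kind can be sketched as a small predicate on the Content-Type response header. The function name `may_contain_links` and the exact list of accepted types are assumptions made for illustration.

```python
def may_contain_links(content_type):
    """Return True if a response with this Content-Type is worth downloading."""
    if not content_type:
        # Assumption: a response without a declared type is skipped.
        return False
    # Strip parameters such as "; charset=utf-8" and normalize letter case.
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime.startswith("text/"):
        return True  # text/html, text/xml, text/plain, ...
    # XML-based types, including application/vnd.ogc.wms_xml, the MIME type
    # that WMS 1.1.1 defines for Capabilities documents.
    xml_types = ("application/xml", "application/vnd.ogc.wms_xml")
    return mime in xml_types or mime.endswith("+xml")
```

Images, audio, video, and executables all fall through to `False`, so the crawler never downloads their bodies.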

The system includes a HTTP URL processor 230 that constructs a HTTP URL by appending a query designed to return only documents having a valid XML Schema. In the particular embodiment concerned with OGC WMS documents, the processor 230 constructs HTTP URL queries to determine whether the discovered HTTP URLs return WMS Capabilities XML Documents that validate to the OGC WMS Specification XML Schema. The system tests whether any discovered HTTP URL response returns a valid WMS “Capabilities” document. If a valid WMS “Capabilities” document is returned, then that HTTP URL is identified as a useful WMS document, and the system will be able to issue WMS map requests to the discovered URL based on the document's contents, and configure the new WMS Server URL into the GIDB Portal System WMS Driver automatically.

As an example, the processor 230 could in principle append the key-value pairs “REQUEST=GetCapabilities&SERVICE=WMS” to every HTTP URL within the set of all accessible HTTP URLs on the web, which would find all of the published WMS servers on the web. However, exhaustively crawling the web (without the aid of a search engine) for WMS Server URLs is not feasible. Accordingly, the processor 230 appends the query string to only the HTTP URLs discovered by the web search and the web crawler search.

To target Internet areas likely to contain useful information, crawl jobs can also be configured to crawl particular GIS related web domains. This is an automated approach similar to a human user browsing www.esri.com or www.opengis.org to find WMS Services. To save on download time, only URLs with text-based MIME types such as XML and HTML are downloaded. The downloaded HTTP response body is processed to extract more HTTP URLs to validate against the WMS Capabilities schema. If a HTTP URL is of a type that cannot contain other URLs, such as video or music, then the crawler is configured to disregard the download.

The processor 230 uses an XML parser to validate each of the returned documents as conforming to the predetermined XML schema. As an example, FIG. 3 is a WMS XML Schema available at http://digitalearth.gov/wmt/xml/capabilities_1_1_1.dtd. An example of a returned document to be validated is shown in FIG. 4, an XML document available at the internet site http://columbo.nrlssc.navy.mil/ogcwms/servlet/WMSServlet/Alexander_County_NC_Maps.wms?SERVICE=WMS&REQUEST=GetCapabilities.
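As a rough illustration of the test applied to each response, the sketch below performs a lightweight structural check rather than full validation against the schema, which would require a validating XML parser. The function name `looks_like_wms_capabilities` is hypothetical; the root element names are those used by WMS 1.1.1 (`WMT_MS_Capabilities`) and WMS 1.3.0 (`WMS_Capabilities`).

```python
import xml.etree.ElementTree as ET

def looks_like_wms_capabilities(body):
    """Cheap pre-check: well-formed XML with a WMS Capabilities root element.

    This is NOT schema validation; it only rejects responses that cannot
    possibly be Capabilities documents (HTML pages, error reports, etc.).
    """
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return False  # not well-formed XML
    # Strip a namespace prefix such as "{http://www.opengis.net/wms}".
    root_name = root.tag.rsplit("}", 1)[-1]
    if root_name not in ("WMT_MS_Capabilities", "WMS_Capabilities"):
        return False
    # Require at least one Layer element, since a server without layers
    # offers no map content to configure into a portal.
    return any(el.tag.rsplit("}", 1)[-1] == "Layer" for el in root.iter())
```

Documents that pass this check would then be handed to a validating parser for the full conformance test.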

If a discovered URL validates, it is stored in a database 240 that the WMS Driver component checks periodically for new findings. Because different WMS URLs can point to a WMS Server offering the same map content, the newly discovered WMS Server's Capabilities XML document tree is tested for equivalence against previously discovered WMS Servers to avoid duplicates within the Portal. As the WMS Driver 250 makes the new WMS Server known to the GIDB Portal, the portal's size grows by the amount of map content available in the server. By feeding the web crawler URL seeds from GOOGLE™ APIs, queries of the form “url:REQUEST=GetCapabilities” should retrieve at least as many discovered WMS Servers as a Refractions Research OGC Web Services Survey.
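One way to test two Capabilities documents for tree equivalence is to reduce each to a normalized signature and compare the signatures. The sketch below is an assumption about how such a comparison might work, not the patented procedure; `tree_signature` and `same_capabilities` are hypothetical names.

```python
import xml.etree.ElementTree as ET

def tree_signature(element):
    """Reduce an element to a nested tuple that ignores attribute order
    and leading/trailing whitespace in text content."""
    text = (element.text or "").strip()
    attributes = tuple(sorted(element.attrib.items()))
    children = tuple(tree_signature(child) for child in element)
    return (element.tag, attributes, text, children)

def same_capabilities(xml_a, xml_b):
    """Return True if the two XML documents have equivalent trees."""
    return tree_signature(ET.fromstring(xml_a)) == tree_signature(ET.fromstring(xml_b))
```

Because Capabilities documents embed the server's own URLs, a practical comparison might exclude those elements before computing the signature; that refinement is omitted here.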

WMS Server Database 240 maintains a set of HTTP URLs pointing to the unique WMS server capabilities documents on the web for future use by the WMS driver 250. The system periodically re-validates the HTTP URLs stored in the WMS Server Database, and periodically conducts new searches for new WMS Server HTTP URLs. The search would typically be carried out once a week, although it can be done more or less frequently.
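The periodic re-validation pass can be sketched with an in-memory SQLite table standing in for the WMS Server Database. The table layout, the function names, and the choice to delete URLs that fail re-validation are all assumptions made for illustration; `is_valid` represents whatever Capabilities test the system applies.

```python
import sqlite3

def open_store():
    """Create an in-memory stand-in for the WMS Server Database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE wms_servers (url TEXT PRIMARY KEY)")
    return db

def add_server(db, url):
    # INSERT OR IGNORE keeps the table free of repeated URLs.
    db.execute("INSERT OR IGNORE INTO wms_servers (url) VALUES (?)", (url,))

def revalidate(db, is_valid):
    """Remove stored URLs for which is_valid(url) is false.

    Returns the list of removed URLs so they can be logged or retried.
    """
    urls = [row[0] for row in db.execute("SELECT url FROM wms_servers ORDER BY url")]
    dead = [url for url in urls if not is_valid(url)]
    db.executemany("DELETE FROM wms_servers WHERE url = ?", [(u,) for u in dead])
    db.commit()
    return dead
```

A scheduler would call `revalidate` at the chosen interval, for example once a week, alongside a fresh search for new server URLs.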

The GIDB portal system described herein configures the new WMS Server URL into the GIDB Portal System WMS Driver automatically. The described portal system provides potential users a single access point to all of the services discovered by the search algorithm (identifying a first set of HTTP URLs, web crawling the first set of URLs to identify other URLs, and querying the URLs to find which URLs contain files that comply with the WMS XML schema). This allows users to search, connect and retrieve data from the services all from one point, with all the WMS files being accessible to the user through the GIDB portal system and appearing to come from the single source of the GIDB portal. This is a significant savings of time and energy.

The web service interface to this portal also makes available a highly interoperable and platform independent programmatic access to sources that may have little platform independence and are not compatible with other GIS applications. The larger the number of sources integrated into a web portal, the greater its value is as a service.

Previously, searches and discovery of new sources of mapping information have been done manually by using search engines and catalogs. The system described herein provides a scalable automated solution. By utilizing a topic driven web crawler configured to search for structured XML documents that validate to a public schema, the system of integrated components presents a fully automated means for search, discovery, binding, and integration of Geospatial Web Services, thereby reducing the cost and manual labor needed to search for and configure new Geospatial Web Services into the portal.

The system presented is implemented in the platform-independent Java programming language and designed with a modular and scalable approach. While the system process of discovering new services is tailored to Geospatial Web Services, the same approach can be applied to other Web Services that are advertised by an XML document that validates to a common XML Schema.

In addition to being stored for future retrieval, the list of valid URLs and the query and responses from the URLs can also be printed, transferred to a remote computer, and displayed to a local or remote user.

Embodiments of the invention are also directed to computer based systems, methods, and computer readable media for controlling the computer components and accomplishing the methods described herein.

Users at remote sites have computers or PDAs for selecting WMS data sources from the GIDB portal system, and using the thick client or thin client GIDB software, can assemble maps with different overlaid layers of data, and can store, print, display, modify, and transfer the resulting maps, layers, and map data to other users. The software can also be integrated into other local or remote computer systems for automatic retrieval of map data and integration of the resulting maps or map data into computer databases or systems.

The invention has been described with reference to certain preferred embodiments. It will be understood, however, that the invention is not limited to the preferred embodiments discussed above, and that modifications and variations are possible within the scope of the appended claims.

Claims

1. A computer-based method for identifying internet web pages containing documents that comply with a predetermined XML schema, comprising: searching the internet with a search engine for web pages using initial search terms and identifying a first set of HTTP URLs; web crawling at least the first set of HTTP URLs to identify additional HTTP URLs; appending a query to the identified URLs; and evaluating the responses to the query to determine which responses comply with the predetermined XML schema; and storing the HTTP URLs responses that comply with the predetermined XML schema in a database.
2. The method according to claim 1, further comprising: the web crawler selecting URLs of the first set of HTTP URLs that are a document type that is unlikely to contain more HTTP URLs, and not web crawling the selected URLs.
3. The method according to claim 3, wherein the selected URLs are at least one of music files, image files, and executable files.
4. The method according to claim 1, further comprising: comparing a newly discovered document against previously known WMS server documents to avoid duplication.
5. The method according to claim 1, further comprising: adding the HTTP URLs responses that comply with the predetermined XML schema to a database of available servers.
6. The method according to claim 4, further comprising web crawling the HTTP URLS in the database periodically to discover non-operational links.
7. The method according to claim 1, wherein the web crawling includes web crawling a predetermined set of additional web sites.
8. The method according to claim 1, wherein predetermined set of additional web sites include GIS web sites.
9. The method according to claim 1, wherein the predetermined XML schema is a schema published by the Open Geospatial Consortium.
10. The method according to claim 1, wherein the appended query requests a Web Mapping Services Capabilities document.
11. The method according to claim 1, wherein the appended query includes ?REQUEST=GetCapabilities.
12. The method according to claim 1, wherein the appended query includes ?REQUEST=GetCapabilities&SERVICE=WMS.
13. The method according to claim 1, wherein said searching includes searching using an API.
14. The method according to claim 1, further comprising: repeating said searching, said web crawling, said appending and said evaluating at periodic intervals.
15. The method according to claim 5, further comprising: converting map requests for at least one particular map server in the database to a GIDB Portal Interface API.
16. A computer-based method for providing a single point of access to geospatial information system web services, comprising: periodically searching the internet with an internet search algorithm; and the algorithm determining whether a web service is valid by engaging the web services in a query-response interchange.
17. The method of claim 16, further comprising: storing the valid web service in a list of available web services.
18. The method of claim 17, further comprising: a geospatial portal receiving a map request from a remote computer; the geospatial portal transferring documents to the remote computer from the stored list of valid web services.
19. The method according to claim 18, wherein the documents appear to the remote computer user to come from a single service.
20. The method according to claim 17, wherein sources of information or web services which return a valid response are themselves considered valid and added to the list of sources available to users.
21. A computer system including a server connected to the internet and having software for identifying internet web pages containing documents that comply with a predetermined XML schema, the software configured to: search the internet with a search engine for web pages using initial search terms and identify a first set of HTTP URLs; web crawl at least the first set of HTTP URLs to identify additional HTTP URLs; append a query to the identified URLs; evaluate the responses to the query to determine which responses comply with the predetermined XML schema; and store a list of valued URLs having responses with the predetermined XML schema.
22. The system according to claim 21, further comprising: the web crawler selecting URLs of the first set of HTTP URLs that are a document type that is unlikely to contain more HTTP URLs, and not web crawling the selected URLs.
23. The system according to claim 22, wherein the selected URLs are at least one of music files, image files, and executable files.
24. The system according to claim 21, further comprising: comparing a newly discovered document against previously known WMS server documents to avoid duplication.
25. The system according to claim 21, further comprising: adding the HTTP URLs responses that comply with the predetermined XML schema to a database of available servers.
26. The system according to claim 24, further comprising web crawling the HTTP URLS in the database periodically to discover non-operational links.
27. The system according to claim 21, wherein the web crawling includes web crawling a predetermined set of additional web sites.
28. The system according to claim 22, wherein predetermined set of additional web sites include GIS web sites.
29. The system according to claim 22, wherein the predetermined XML schema is a schema published by the Open Geospatial Consortium.
30. The system according to claim 22, wherein the appended query requests a Web Mapping Services Capabilities document.
31. The system according to claim 21, wherein the appended query includes ?REQUEST=GetCapabilities.
32. The system according to claim 21, wherein the appended query includes ?REQUEST=GetCapabilities&SERVICE=WMS.
33. The system according to claim 21, wherein said searching includes searching using an API.
34. The system according to claim 21, further comprising: repeating said searching, said web crawling, said appending and said evaluating at periodic intervals.
35. The system according to claim 21, further comprising: converting map requests for at least one particular map server in the database to a GIDB Portal Interface API.
36. A computer-based system including a server connected to the internet and having software for providing a single point of access to geospatial information system web services, the software configured to: periodically searching the internet with an internet search algorithm; and the algorithm determining whether a web service is valid by engaging the web services in a query-response interchange.
37. The system of claim 36, further comprising: storing the valid web service in a list of available web services.
38. The system of claim 37, further comprising: a geospatial portal receiving a map request from a remote computer; the geospatial portal transferring documents to the remote computer from the stored list of valid web services.
39. The system according to claim 38, wherein the documents appear to the remote computer user to come from a single service.
40. The system according to claim 37, wherein sources of information or web services which return a valid response are themselves considered valid and added to the list of sources available to users.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a nonprovisional application of provisional application 60/809,991 filed on May 24, 2006 under 35 USC 119(e), the entire disclosure of which is incorporated herein by reference.

Provisional Applications (1)

	Number	Date	Country
	60809991	May 2006	US
