Semantic technologies, which essentially use metadata to describe meanings of data, content files, and/or application code, are evolving and being adopted for mainstream uses. In semantic web technology, vocabularies define the concepts and relationships used to describe and represent an area of concern or interest. As one example, with a web page, metadata in a vocabulary may be included in markup or the like that describes something about the content that is on the page, which helps search engines better understand the web page and thus provide better search results.
In general, vocabularies are used to classify the terms that can be used in a particular application, characterize possible relationships between terms, and define possible constraints on using those terms. Vocabularies help data integration when ambiguities may exist on the terms used in the different data sets, or when extra knowledge may lead to the discovery of new relationships. Vocabularies can be very complex (on the order of several thousands of terms) or very simple (describing one or two concepts only).
However, no vocabulary is comprehensive. As a result, users are limited to using known vocabularies, or have to build their own. Thus, implicitly or explicitly, most web sites use a vocabulary that either complies with a standard or is custom developed. This causes a lot of fragmentation in the web.
Additionally, there is a complex set of semantic schemas and technologies that can be used to define the vocabularies and share data. For example, in the technical publication area there is a large number of vocabularies, including schema.org (associated with microdata), DITA (Darwin Information Typing Architecture), and custom ones such as TechNet.
As a result of the various vocabularies/schemas, search engines and other middleware may interpret data differently. For example, consider a user trying to collect “How-To” guidance on a specific topic from different websites. In one site the content type may be called “How-To” while in another site the content type may be called “Technical Article” or “KB” (knowledge base). Further, interpreting these sites' content/data becomes complex, e.g., one site may refer to the article's writer using the term “author”, whereas another may use the term “creator.” Results are delivered with different levels of accuracy depending on the query and the internal algorithms used, and such results are not always predictable.
This Summary is provided to introduce a selection of representative concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used in any way that would limit the scope of the claimed subject matter.
Briefly, various aspects of the subject matter described herein are directed towards a technology by which a request for semantic-related metadata in a second vocabulary is received, in which the request includes semantic-related metadata in a first vocabulary. The semantic-related metadata in the first vocabulary is translated to semantic-related metadata in the second vocabulary, which is returned in response to the request. In one implementation, the request may be received and processed at an intermediary web service.
In one aspect, a semantic intermediary may be configured to receive a request for data associated with one vocabulary, and to access mapping rules and a vocabulary collection to convert at least some of the data in the one vocabulary to data in another vocabulary. Data in the other vocabulary is returned in response to the request.
In one aspect, translation of web content-related metadata in one format to another format is requested. Upon receiving web content-related metadata in the other format in response to the request, the web content-related metadata in the other format is used to produce web-content related output. The web content-related metadata in the other format may be used to dynamically modify a web page for output.
Other advantages may become apparent from the following detailed description when taken in conjunction with the drawings.
The present invention is illustrated by way of example and not limited in the accompanying figures in which like reference numerals indicate similar elements and in which:
Various aspects of the technology described herein are generally directed towards an intermediary, such as one implemented as a web service, that manages vocabulary mapping and presents data in a format known to an end user. In one implementation, the intermediary understands a well-known set of vocabularies, applies a mapping (e.g., using a model) from one vocabulary to another, collects data from different sources, interprets the data based on vocabularies, acquires related data, converts data following the vocabulary relationships, and sends data back to users according to the vocabulary each user understands.
In one implementation, the intermediary stores or accesses known vocabularies, and uses an intermediary model to map from one vocabulary to another, with the knowledge to interpret the relationship between the terms used. The intermediary may retrieve data from multiple data providers and convert those data to new formats based on implied knowledge in the vocabulary mapping. The intermediary also may retrieve data from different data sources to fulfill the data relationship established in the vocabulary mapping. A technology mapping layer may be used to handle the set of semantic technologies and interpret the syntax and semantics to understand the data being exposed.
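The term-level mapping described above can be illustrated with a minimal sketch. It is not the described implementation: the `VocabularyMapper` class, the vocabulary names `vocab_a` and `vocab_b`, and the sample terms are all hypothetical, and the rules are assumed to be simple one-to-one term renames.

```python
from dataclasses import dataclass, field


@dataclass
class VocabularyMapper:
    """Hypothetical term-to-term mapper between named vocabularies."""

    # rules[(source_vocab, target_vocab)] -> {source_term: target_term}
    rules: dict = field(default_factory=dict)

    def add_rule(self, source_vocab, target_vocab, source_term, target_term):
        """Register one mapping rule from a source term to a target term."""
        self.rules.setdefault((source_vocab, target_vocab), {})[source_term] = target_term

    def translate(self, metadata, source_vocab, target_vocab):
        """Return (translated, unmapped); terms without a rule are reported, not guessed."""
        table = self.rules.get((source_vocab, target_vocab), {})
        translated, unmapped = {}, {}
        for term, value in metadata.items():
            if term in table:
                translated[table[term]] = value
            else:
                unmapped[term] = value
        return translated, unmapped


# Assumed sample rules: "creator" and "author" name the same concept.
mapper = VocabularyMapper()
mapper.add_rule("vocab_a", "vocab_b", "creator", "author")
mapper.add_rule("vocab_a", "vocab_b", "title", "name")

translated, unmapped = mapper.translate(
    {"creator": "A. Writer", "title": "Example How-To", "rating": 4},
    "vocab_a",
    "vocab_b",
)
print(translated)  # {'author': 'A. Writer', 'name': 'Example How-To'}
print(unmapped)    # {'rating': 4}
```

Returning the unmapped terms separately is one possible design choice; it lets a caller decide whether to drop them, pass them through, or look for related data elsewhere.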
In
The intermediary 102 interprets the vocabulary needed for the client 110 and the vocabulary used by the data providers 114-116. To this end, the intermediary 102 may run a set of models that interpret the element-by-element mapping, e.g., based on constraints applied in the mapping rules 106.
The intermediary applies data conversion as applicable, and fills in data from the other data providers 114-116 based upon the mapping rules 106. The intermediary sends the data back to the client 110 in a format consumable by the client 110.
The returned data is interpreted based on the vocabulary used by each site, and the data is converted to the client's known vocabulary. This process is based upon data transformation, technology interpretation and vocabulary mapping.
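As a rough sketch of this collect-and-convert step, the following assumes two hypothetical providers, `site_a` and `site_b`, whose records differ in both field names and content-type labels. The rule tables, field names, and sample records are invented for illustration only.

```python
# Hypothetical per-provider rules: provider field name -> client field name.
FIELD_RULES = {
    "site_a": {"author": "author", "type": "contentType", "title": "title"},
    "site_b": {"creator": "author", "kind": "contentType", "headline": "title"},
}

# Hypothetical per-provider rules: provider content type -> client content type.
TYPE_RULES = {
    "site_a": {"How-To": "How-To"},
    "site_b": {"Technical Article": "How-To", "KB": "How-To"},
}


def to_client_vocabulary(provider, record):
    """Convert one provider record into the client's known vocabulary."""
    fields = FIELD_RULES[provider]
    converted = {fields[key]: value for key, value in record.items() if key in fields}
    if "contentType" in converted:
        types = TYPE_RULES[provider]
        # Unknown content types are passed through unchanged rather than guessed.
        converted["contentType"] = types.get(converted["contentType"], converted["contentType"])
    converted["source"] = provider
    return converted


def collect(responses):
    """Interpret each provider's records by that provider's vocabulary and merge them."""
    return [
        to_client_vocabulary(provider, record)
        for provider, records in responses.items()
        for record in records
    ]


results = collect({
    "site_a": [{"author": "A. Writer", "type": "How-To", "title": "Install the tool"}],
    "site_b": [{"creator": "B. Editor", "kind": "KB", "headline": "Tool setup"}],
})
for row in results:
    print(row)
```

After conversion, both records expose `author`, `contentType`, and `title`, so a client that only understands one vocabulary can treat them uniformly.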
By way of another example represented in
In the example of
The service 222 translates between schemas, and returns a response 226 including metadata in microdata format to the website. The mapping rules 106 (
As a further part of the translation, the service 222 may translate using different ontologies for different domains. For example, the term “magazine” in a commercial domain may be translated to “journal” in an academic domain, and vice-versa. The request may specify domain information, and/or the service may recognize the domain information from the metadata or the website that sends the request or the targeted user or application.
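A minimal sketch of domain-dependent translation follows. The ontology table, the `SITE_DOMAINS` lookup, the request keys `domain` and `site`, and the default domain are all assumptions made for illustration; the text does not specify how domain information is encoded or recognized.

```python
# Hypothetical ontologies keyed by (source_domain, target_domain).
DOMAIN_ONTOLOGIES = {
    ("commercial", "academic"): {"magazine": "journal"},
    ("academic", "commercial"): {"journal": "magazine"},
}

# Hypothetical lookup used when a request does not state its domain.
SITE_DOMAINS = {"university.example": "academic", "shop.example": "commercial"}


def resolve_domain(request, default="commercial"):
    """Prefer an explicit domain in the request; otherwise infer it from the requesting site."""
    if "domain" in request:
        return request["domain"]
    return SITE_DOMAINS.get(request.get("site"), default)


def translate_term(term, source_domain, target_domain):
    """Translate a term across domains; terms with no rule are returned unchanged."""
    if source_domain == target_domain:
        return term
    table = DOMAIN_ONTOLOGIES.get((source_domain, target_domain), {})
    return table.get(term, term)


request = {"site": "university.example", "term": "magazine", "source_domain": "commercial"}
target = resolve_domain(request)
print(target)                                                             # academic
print(translate_term(request["term"], request["source_domain"], target))  # journal
```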
In one alternate scenario generally exemplified in
The client 330 takes the response from the intermediary 332 and dynamically incorporates at least some of the metadata in format/vocabulary “Y” into a modified web page 336. As can be readily appreciated, this allows websites to continue to use one metadata language (e.g., RDFa), while being able to expose that metadata in an alternative language such as schema.org, or potentially expose the metadata in both languages. This transformation may happen during page construction in a Web server or during page rendering in a browser or application. During page construction, the transformation to the appropriate vocabulary metadata may be controlled by the Web content owner. During page rendering, the transformation intermediary service understands the end user capabilities and requirements and can decide on the vocabulary transformation.
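The page-modification step might look like the sketch below, which assumes the translated metadata arrives as a flat dictionary and is exposed as HTML microdata (`itemscope`, `itemtype`, and `itemprop` attributes). The function names and the choice to insert the fragment before the closing `</body>` tag are illustrative assumptions, not the described implementation.

```python
from html import escape


def render_microdata(item_type, properties):
    """Render translated metadata as an HTML microdata fragment."""
    lines = [f'<div itemscope itemtype="{escape(item_type, quote=True)}">']
    for name, value in properties.items():
        lines.append(
            f'  <span itemprop="{escape(name, quote=True)}">{escape(str(value))}</span>'
        )
    lines.append("</div>")
    return "\n".join(lines)


def add_metadata_to_page(page_html, fragment, marker="</body>"):
    """Insert the fragment before the first marker, or append it if the marker is absent."""
    if marker not in page_html:
        return page_html + "\n" + fragment
    return page_html.replace(marker, fragment + "\n" + marker, 1)


page = "<html><body><h1>Guide</h1></body></html>"
fragment = render_microdata(
    "https://schema.org/Article",
    {"author": "A. Writer & Co.", "name": "Example How-To"},
)
modified = add_metadata_to_page(page, fragment)
print(modified)
```

Escaping both attribute values and text content keeps the page well-formed when metadata values contain characters such as `&` or `<`.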
To this end, as represented in
As can be seen, there is provided a common layer that understands different vocabularies in the Web and maps the data into an appropriate vocabulary through semantic interpretation of the similarity of the vocabularies and/or interpretation of the relationships between terms. Instead of interpreting the vocabulary at each of possibly many end user applications, which would mean that the many end users/applications need to understand the known vocabularies, associated mappings, and technology choices so as to interpret the data, the intermediary executes the semantic mapping based on known vocabularies.
In this way, any website that is already committed to a standard vocabulary in a specific schema (e.g., rich snippet or RDFa) may pass this metadata to the intermediary with a request for converting data to a specific output vocabulary. The intermediary converts the metadata to the appropriate format and sends an appropriate response back. The website may include this converted information in a Web page or Web services, including possibly dynamically modifying a web page to include the metadata in the other format.
The invention is operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations that may be suitable for use with the invention include, but are not limited to: personal computers, server computers, hand-held or laptop devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, and so forth, which perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in local and/or remote computer storage media including memory storage devices.
With reference to
The computer 510 typically includes a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by the computer 510 and includes both volatile and nonvolatile media, and removable and non-removable media. By way of example, and not limitation, computer-readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer 510. Communication media typically embodies computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above may also be included within the scope of computer-readable media.
The system memory 530 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 531 and random access memory (RAM) 532. A basic input/output system 533 (BIOS), containing the basic routines that help to transfer information between elements within computer 510, such as during start-up, is typically stored in ROM 531. RAM 532 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 520. By way of example, and not limitation,
The computer 510 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only,
The drives and their associated computer storage media, described above and illustrated in
The computer 510 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 580. The remote computer 580 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 510, although only a memory storage device 581 has been illustrated in
When used in a LAN networking environment, the computer 510 is connected to the LAN 571 through a network interface or adapter 570. When used in a WAN networking environment, the computer 510 typically includes a modem 572 or other means for establishing communications over the WAN 573, such as the Internet. The modem 572, which may be internal or external, may be connected to the system bus 521 via the user input interface 560 or other appropriate mechanism. A wireless networking component 574 such as comprising an interface and antenna may be coupled through a suitable device such as an access point or peer computer to a WAN or LAN. In a networked environment, program modules depicted relative to the computer 510, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation,
An auxiliary subsystem 599 (e.g., for auxiliary display of content) may be connected via the user input interface 560 to allow data such as program content, system status and event notifications to be provided to the user, even if the main portions of the computer system are in a low power state. The auxiliary subsystem 599 may be connected to the modem 572 and/or network interface 570 to allow communication between these systems while the main processing unit 520 is in a low power state.
While the invention is susceptible to various modifications and alternative constructions, certain illustrated embodiments thereof are shown in the drawings and have been described above in detail. It should be understood, however, that there is no intention to limit the invention to the specific forms disclosed, but on the contrary, the intention is to cover all modifications, alternative constructions, and equivalents falling within the spirit and scope of the invention.