A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights.
The present application relates generally to telecommunications and more particularly to a system and method for generating an aggregate website search database using smart indexes for searching.
Web sites host and provide information using web pages that are communicated electronically via a telecommunications network. Accessing this information by some client computing devices can be challenging. Computing devices are becoming smaller and increasingly utilize wireless connectivity. Examples of such computing devices include portable computing devices that include wireless network browsing capability as well as telephony and personal information management capabilities.
The smaller size of such client devices necessarily limits their display capabilities. Furthermore, the wireless connections to such devices typically have less bandwidth, or more expensive bandwidth, than corresponding wired connections. The Wireless Application Protocol (“WAP”) was designed to address such issues, but WAP can still provide a very unsatisfactory or even completely ineffective experience, particularly where the small client device needs to effect a connection with web sites that host web pages that are directed to traditional full desktop browsers. In addition, accessing data from multiple websites concurrently and extracting relevant data in meaningful ways can be difficult and time consuming.
Signature schema documents may be pre-defined using a query language to provide instructions for application by an engine to extract data from web pages of respective web sites for storage to an aggregate database. For a particular web page, signature schema instructions identify a web page family for the web page and extract desired data from the web page in accordance with its web page family. The instructions use signatures previously identified within web pages of the same family to distinguish the web page family (e.g. in accordance with a shared template for each family) from others of the web site and to distinguish the desired data from other data for the web page family. A gateway server may receive data from a web site and apply signature schema instructions maintained in a repository coupled to the engine. Extracted data can be cached to a database coupled to the engine to facilitate querying of the data to enable aggregate results to be presented to a client machine (e.g. a wireless communication device). Smart indexes can be generated based upon data stored in the aggregate database to facilitate knowledge based queries as opposed to keyword search queries.
In accordance with the present disclosure there is provided a method for generating an aggregate website search database index, the method comprising: sending a page request to a web site selected from one or more web sites; receiving the requested web page from the selected web site; retrieving signature schema associated with the selected web site; applying signature schema to the requested web page to extract data identified by the signature schema; creating an index using the signature schema defined for the extracted data for the one or more web sites; and storing extracted data and indexes to an aggregate database comprising data extracted from the one or more web sites wherein the indexes define relationships between data extracted from the requested web page identified by the signature schema.
In accordance with the present disclosure there is provided a system for generating an aggregate website search database using smart indexes for searching, the system comprising: at least one computing device comprising a processor and a memory coupled thereto, said memory storing instructions and data for configuring the processor to provide an engine to: send a page request to a web site selected from one or more web sites; receive the requested web page from the selected web site; retrieve signature schema associated with the selected web site; apply signature schema to the requested web page to extract data identified by the signature schema; create an index using the signature schema defined for the extracted data for the one or more web sites; and store extracted data and indexes to an aggregate database comprising data extracted from the one or more web sites wherein the indexes define relationships between data extracted from the requested web page identified by the signature schema.
In accordance with the present disclosure there is provided a computer program product storing computer readable instructions which when executed by a computer processor configure the processor for: sending a page request to a web site selected from one or more web sites; receiving the requested web page from the selected web site; retrieving signature schema associated with the selected web site; applying signature schema to the requested web page to extract data identified by the signature schema; creating an index using the signature schema defined for the extracted data for the one or more web sites; and storing extracted data and indexes to an aggregate database comprising data extracted from the one or more web sites wherein the indexes define relationships between data extracted from the requested web page identified by the signature schema.
In accordance with the present disclosure there is provided a method for generating an aggregate website search database index, the method comprising: sending a page request to a web site selected from one or more web sites; receiving the requested web page from the selected web site; retrieving signature schema associated with the selected web site; applying signature schema to the requested web page to extract data identified by the signature schema; creating an index using the signature schema defined for the extracted data for the one or more web sites; storing extracted data and indexes to an aggregate database comprising data extracted from the one or more web sites wherein the indexes define relationships between data extracted from the requested web page identified by the signature schema and wherein the signature schema comprise eXtensible Markup Language (XML) documents comprising query language for extracting data from the selected web page; receiving a search query from a client machine for data stored in the aggregate database; determining a database index associated with one or more parameters of the search query; generating a database query based upon the received search query using the determined index; retrieving data from the aggregate database based on the database query; and providing the retrieved data to the client machine. The client machine may be a wireless device with the aggregate database and indexes resident on the device.
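The method steps above (request, extract, index, store, then query) can be illustrated with a minimal sketch. It is not the disclosed implementation: the function names `build_aggregate_db` and `search`, the table layout, and the use of an ordinary SQLite index as a stand-in for the schema-derived indexes are all assumptions made for illustration. Page retrieval is passed in as a callable so the sketch runs without a network.

```python
import sqlite3

def build_aggregate_db(sites, fetch, schemas):
    """Fetch each site, apply its extractor, and store the results.

    sites:   list of site identifiers (hypothetical)
    fetch:   callable mapping a site identifier to page text
    schemas: dict mapping a site identifier to a callable that
             returns {field: value} for a page (stands in for a
             signature schema)
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE items (site TEXT, field TEXT, value TEXT)")
    # Assumption: a plain SQL index stands in for the "smart index".
    db.execute("CREATE INDEX idx_field_value ON items (field, value)")
    for site in sites:
        page = fetch(site)                 # send request, receive page
        extract = schemas[site]            # retrieve the site's schema
        for field, value in extract(page).items():
            db.execute("INSERT INTO items VALUES (?, ?, ?)",
                       (site, field, value))
    db.commit()
    return db

def search(db, field, value):
    """Return the sites whose extracted data has field == value."""
    rows = db.execute(
        "SELECT site FROM items WHERE field = ? AND value = ? ORDER BY site",
        (field, value))
    return [row[0] for row in rows]
```

A query against the aggregate database then spans every site that was processed, rather than one site at a time.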
Referring now to
In the present embodiment, web sites 103, 104 and 105 host web pages which contain data that is to be aggregated into database 126. For example, web site 104 comprises a web server 106 serving web pages (e.g. 110) defined from a plurality of web page family templates 108A-108D (collectively 108) and web page content (described further herein below) from data store 112. In the present embodiment of system 100, gateway and schema server 120 is coupled to a schema repository 124 from which to obtain a signature schema 122 for a particular web site. Signature schema documents (e.g. 122) provide instructions and data with which an engine 140 of server 120 can extract data from web pages (e.g. 110) and transcode same to a target format to provide transcoded web page data (e.g. 130 and 132) to the respective requesting client machines 102A and 102B as described more fully below. Gateway and schema server 120 may also be coupled to a database 126 for retrieving/storing data extracted from web sites in accordance with its operations. The database 126 may be a relational database for storing extracted data objects and elements, and their relationships, from web sites in relation to the defined signature schema. The stored data can be accessed using Structured Query Language (SQL) to retrieve desired data from database 126. Signature schemas for respective web sites may be defined (e.g. coded) using a computing device 128 as described herein below. A web server 125 is coupled to the aggregate web site database 126 to enable access to the aggregated web site database 126 data by a web site 150. The web server 125 can also provide a data collection engine 152, or web crawler, for sending requests to web sites 103, 104 and 105 for desired pages and providing content to schema engine 140 for processing.
Representative client machines 102 include any type of computing or electronic device that can be used to communicate and interact with content available via web sites. Each of the client machines 102 may be operated by a respective user U (not shown). Interaction with a particular user includes presenting information on a client machine (e.g. by rendering on a display screen) as well as receiving input at a client machine (e.g. such as via a keyboard for transmitting to a web site). In the present embodiment, client machine 102A comprises a mobile electronic device with the combined functionality of a personal digital assistant, cell phone, email paging device, and a web-browser. Such a mobile electronic device may comprise a keyboard (or other input device(s)), a display screen, a speaker (and other output device(s), e.g. LEDs) and a chassis for housing such components. The chassis may further house one or more central processing units, volatile memory (e.g. random access memory), persistent memory (e.g. Flash read only memory) and network interfaces to allow client machine 102A to communicate over the telecommunication network.
Referring now to
Programming instructions that implement the functional teachings of client machine 102A as described herein are typically maintained, persistently, in non-volatile storage unit 212 and used by processor 208 which makes appropriate utilization of volatile storage 216 during the execution of such programming instructions. Of particular note is that non-volatile storage unit 212 persistently maintains a web browser application 86 and, in the present embodiment, a native menu application 82, each of which can be executed on processor 208 making use of volatile storage 216 as appropriate. An operating system and various other applications (not shown) are maintained in non-volatile storage unit 212 according to the desired configuration and functioning of client machine 102A, one specific non-limiting example of which is a contact manager application (also known as an address book, not shown) which stores a list of contacts, addresses and phone numbers of interest to user U and allows user U to view, update, and delete those contacts, as well as providing user U an option to initiate telecommunications (e.g. telephone, email, instant message (IM), short message service (SMS)) directly from that contact manager application.
Native menu application 82 may be configured to provide menu choices to user U according to the particular application (or other context) that is being accessed. By way of example, while user U is activating the contact manager application, user U can activate menu application 82 to access one or more menu choices available that are respective to contact manager application 90. For example, menu choices may include options to invoke other applications (e.g. a mapping application to map a contact's address) or communication functions (e.g. call, SMS, IM, email, etc.) on the client machine 102A for a particular contact. Menu application 82 may be associated to a particular input button (e.g. one of buttons 200) and invoked to provide a contextual menu comprised of one or more menu choices that are reflective of the context in which the button 200 was selected. Note that the options in a contextual menu are stored within non-volatile storage 212 as being specifically associated with a respective application. Menu application 82 may therefore be configured to generate one or more different contextual menus that are reflective of the particular context in which the menu application 82 is invoked. For example, in an email application where an email is being composed, invoking menu application 82 would generate a contextual menu that included the options of sending the email, cancelling the email, adding addresses to the email, adding attachments, and the like. The contents for such a contextual menu would also be maintained in non-volatile storage 212. Other examples of contextual menus will occur to those of ordinary skill in the art.
Returning now to
Gateway and schema server 120 hosts software applications comprising instructions and data for proxying requests and responses between the client machines 102 and web sites 103, 104 and 105. In addition to software for maintaining HTTP communications, performing requests, maintaining sessions, handling cookies, etc., engine 140 may be implemented in software to apply the signature schemas to web pages from web sites. There may be provided an interpreter that interprets the signature schema document and applies the actions against the web page code (as an ASCII (plain text) document) to extract desired data to produce a result set. A renderer may be provided to express the desired data result set (i.e. transcode to a target format such as cHTML (Compact HTML) for a mobile device browser) for transmitting to the client machines also in accordance with the signature schema.
The web server 125 provides web pages to the requesting client machine through a browser or application on the client for rendering. The web data may be directly pushed to client machines 102A by e-mail or by other push based applications, or the data may be accessed by queries to web site 150 directly. The web site 150 may also extract content from the aggregate database 126 and apply a signature schema 122 to the extracted database data, which schema may be configured to transcode the data in accordance with the target client machine 102A to tailor the output result.
Machines 102, schema server 120 and web sites 103, 104, 105 and 125 are coupled via a telecommunication network (not shown) typically comprising one or more interconnected networks that may include wired and (at least for machine 102A) wireless networks. It should now be understood that the nature of the network is not particularly limited and is, in general, based on any combination of architectures that will support interactions between client machines 102 and servers 106 and 120. In a present embodiment the network includes the Internet as well as appropriate gateways and backhauls.
More specifically, in the present embodiment, a wireless network for client machine 102A may be based on core mobile network infrastructure (e.g. Global System for Mobile communications (“GSM”), Code Division Multiple Access (“CDMA”), Enhanced Data rates for GSM Evolution (“EDGE”), Evolution Data-Optimized (“EV-DO”), High Speed Downlink Packet Access (“HSDPA”), Universal Mobile Telecommunications System (“UMTS”), etc.) or on wireless local area network (“WLAN”) infrastructures such as the Institute of Electrical and Electronics Engineers (“IEEE”) 802.11 Standard (and its variants) or Bluetooth or the like or hybrids thereof. In the present embodiment of system 100 it is contemplated that client machine 102B may be another type of client machine such as a PC (desktop or laptop) configured to include a full desktop computer or as a “thin-client”. Typically such have larger display monitors/screens than portable machines like 102A. A wired network for system 100 and machine 102B can be based on a T1, T3 or any other suitable wired connection.
As previously stated in relation to
Client machine 102A can then make a request 312 to web site 150 on server 125 for a query to the database 126 regarding desired web sites having a specific domain (URL). Web site 150 requests 314 relevant data from the database 126. The results are extracted 316, processed by the signature schema engine 140 and transcoded in accordance with the schema 122, and then sent 318 by web server 125 to the requesting client machine 102A, either as aggregate results or by proxy, as if the query had been made directly to the source web site. Alternatively, the data may be pushed to a push based application on a client machine. As noted above, retrieved data 130 may also comprise transcoded navigational data for menu application 82 and informational content data (e.g. a list of products and related information from a web page) for displaying by browser application 86. The process can then be repeated for each identified website such as web sites 103 and 105.
Signature schemas are pre-defined documents, and may be eXtensible Markup Language (XML) documents utilizing an SQL-like query language, to incorporate instructions and data with which to intelligently extract the data from web pages (which web pages are typically coded in HTML, DHTML, XHTML, XML, RSS, JavaScript, etc.). This extracted data may be transcoded and provided to client machines 102, used to dynamically generate a relational database (e.g. 126) or both. Each signature schema incorporates an understanding of a particular web site's data including relationships among the various data (e.g. among its primary informational content found in the body of its web pages as well as among such content and associated navigational data (e.g. web page links) that govern the data in the page). As described further herein below, prior knowledge of the web page code including specific identifiers, tags and text (i.e. strings) used within the code (sometimes referred to as “signatures” herein), may be used to define instructions to identify portions of the code of interest and to extract specific desired data.
A signature schema document may be defined for all the pages of a particular web site. Large data-driven web sites (e.g. 104) don't maintain thousands of individual web pages per se. The sites adopt a few page family templates 108 and dynamically populate these with pertinent content from database 112 comprising information (e.g. weather, stock data, news, shopping/product data, patent data, trade-mark data etc.) as applicable when a client requests a particular page. Each template represents a family of pages having objects and attributes. Below are representative example page family templates and their objects and attributes for a web site offering news and an e-commerce web site offering products for sale electronically:
Each family of pages (the family template) can be identified by a “signature” or unique set of one or more features that automatically identifies a given page on a web site as part of the family and differentiates that family from another family of pages. Similarly each object and attribute field of interest can be identified with its respective unique signature within a family of pages. A signature schema document typically comprises numerous pieces of information (commands), for example, information that instructs the engine 140 for:
A signature schema document may also be configured to enable special functionality for the target web site including searching, logging in a user, purchasing items, etc.
In accordance with a present embodiment, the structure and syntax of a representative signature schema document for a representative e-commerce site eshop.ca is shown and described. Engine 140 may be configured to receive web page code comprising text data and search through the text in accordance with the schema document instructions that provide SQL-query like language instructions. Engine 140 maintains a pointer within the text as it moves through the web page code performing various actions, as described below, in accordance with the schema instructions. Table 1 illustrates a snippet of a representative signature schema:
In the XML code snippet of Table 1, instructions at line 4 are for verifying that the web page under consideration and the signature schema relate to the same web site/domain (eshop.ca). Instructions at lines 9-15 are for determining the particular page family to which the web page under consideration belongs. A respective signature that defines the particular page family has been previously identified for use to distinguish the page. The engine 140 processes the <page type> tag by registering the identification strings for each page family. When a web page is obtained by the engine as input, the engine may be able to identify the page family by its unique string (the “ref” value), and the command provides the related tag within the signature schema document where further instructions for the particular web pages are found:
For example, at line 10, the instructions identify a web page using the alternative signatures “Compare products” or “Sort Products”. Web pages with these strings are of the same family type. The instructions at line 10 provide a reference tag to further instructions for this family, providing a link to instructions for the list_elements page family with an ID of mylist_1 (see lines 16-17). Similarly the other lookup instructions provide references to the specific instructions within the signature schema document for handling a web page of each web page family. Representative instructions for some of the web page families are provided in Table 1, for example, at lines 16-17 and 18-29 with others omitted for brevity.
With reference to the extraction instructions for one of the web page families (i.e. item_elements id=“myitem_1”) at lines 18-29, the instruction at line 20 advances the scan pointer within the text file of the web page code to a beginning limit of a region of interest indicated by a signature reference. This establishes an upper limit for review within the text file. Though not shown in this table, an end limit may be defined as well (see Table 4). Further such instructions at lines 22-28 may comprise commands to locate desired data using “signatures” such as string identifiers that uniquely identify the data within the region of interest. In the present example the instructions locate and extract one or more elements, namely, product image, title, price, sale price and description for a product of the item web page family. For example, instructions at line 23 extract the string in between the first “<img src="” and “"” that appears after the next appearance of “largeimageref”. The string returned is the path (relative URL at web site eshop.ca) to the product image. By advancing a search scan pointer within the web code to a desired location, references before that location can be skipped when searching. Any prior instances of a signature string such as “largeimageref” may be ignored. In this way, otherwise ambiguous signature references can be avoided.
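The page family lookup described above can be sketched as a table of alternative signature strings, each group pointing at the ID of the instructions to apply. This is only an illustration: the `PAGE_FAMILIES` table and `identify_family` function are hypothetical names, the IDs are written with underscores, and the “Add to cart” signature for the item family is an invented placeholder rather than a value from Table 1.

```python
# Each entry: (alternative signature strings, ID of the instruction section).
PAGE_FAMILIES = [
    (("Compare products", "Sort Products"), "mylist_1"),
    (("Add to cart",), "myitem_1"),  # hypothetical signature, for illustration
]

def identify_family(page_text, families=PAGE_FAMILIES):
    """Return the section ID of the first family whose signature appears."""
    for signatures, section_id in families:
        # Alternative signatures are OR-ed: any one of them identifies the family.
        if any(sig in page_text for sig in signatures):
            return section_id
    return None  # no registered family matched
```

The returned ID plays the role of the reference tag: it tells the engine which block of extraction instructions to run next.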
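The effect of advancing the scan pointer before extracting can be shown with a small sketch. The `Scanner` class and its method names are assumptions made for illustration, and the sample markup (including the `<div id="main">` marker) is invented; only the “largeimageref” signature and the `<img src="` start marker come from the description above.

```python
class Scanner:
    """Holds web page text and a scan pointer into it."""

    def __init__(self, text):
        self.text = text
        self.pos = 0  # scan pointer; searches start from here

    def advance_to(self, ref):
        """Move the pointer to the next occurrence of ref, if any."""
        i = self.text.find(ref, self.pos)
        if i >= 0:
            self.pos = i
        return i >= 0

    def extract_after(self, ref, start, end):
        """Return the text between start and end that follows ref."""
        r = self.text.find(ref, self.pos)
        if r < 0:
            return None
        s = self.text.find(start, r + len(ref))
        if s < 0:
            return None
        s += len(start)
        e = self.text.find(end, s)
        return self.text[s:e] if e >= 0 else None


page = ('<span>largeimageref</span><img src="/img/old.jpg">'
        '<div id="main">largeimageref <img src="/img/tv.jpg" alt="">')
scanner = Scanner(page)
scanner.advance_to('<div id="main">')   # skip the earlier, ambiguous match
image_path = scanner.extract_after("largeimageref", '<img src="', '"')
```

Without the call to `advance_to`, the first “largeimageref” would match and the wrong image path would be returned.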
The example in Table 1 shows at least some of the instructions (e.g. lines 23-27) including one or more directional references relative to the signatures to locate and extract the desired data. For example, directional references such as “before” or “after” command the engine to extract desired data that is in a relative position in the web page before or after the signature string (i.e. ref=). Moreover, such instructions may further include at least one of a start reference or an end reference further pinpointing the location of the desired data in accordance with that direction. Additional directional reference information is discussed herein with reference to code snippets in other Tables and the discussion of an embodiment of signature transcoding engine syntax presented below.
The example within Table 1 demonstrates the extraction of data and the establishment of relationships between objects and elements within a same page of a web site. However, signature schema documents may further capture relevant attributes of an object across pages. For example, a user of client machine 102A may click through a number of web pages in eshop.ca to get to a specific product page (e.g. Department→Product Category→Product Sub-Category→Specific Product, such as TV & Video>19″-21″ TVs>LCD TVs>BrandX Product). The navigational hierarchy representing a categorization may be captured and associated to the extracted objects and their elements.
For brevity, certain instructions were omitted from Table 1. Tables 2-4 provide representative instructions for further web page families for eshop.ca that may be read with Table 1. Table 2 below provides representative instructions, e.g. for lines 16 and 17 of Table 1, including instructions for a web page family related to a list of items/products for sale. Whereas instructions at lines 22-28 provided product data extraction instructions for a web page family showing a single item (i.e. product), the instructions of Table 2 provide additional instructions that repeat product data extractions for each product in the list.
If the engine 140 identifies that the page is of the “mylist_1” family, the engine determines the location in the signature schema document that contains the signature for the objects and elements of that family and applies the instructions therefor. A product list at eshop.ca may span multiple web pages. Instructions at lines 2-6 of Table 2 find the number of pages and generate the links for each of the pages. Instructions at lines 7-9 (action tag) advance the search scan pointer to the region of web page code that may be of interest (i.e. in this case, the start of the list). In this way, a local signature reference can be used and any earlier ambiguous references skipped. Skipping to the local region of interest may also make the specification of the signature reference less complicated.
Taking advantage of inherent repeated patterns in the web page code, instructions at lines 10-16 (elements tag) of Table 2 provide product data extraction instructions that may be repeated for each product in the list. The engine 140 may be provided with commands to scan for each data element of interest using a signature reference (e.g. a “ref” value), an action, one or more positional instruction(s) to further identify the data within the text of the web page code, and any additional text data manipulation instructions to extract the desired data (e.g. to remove HTML formatting characters or to add characters). The instruction at line 15 moves the scan pointer to the end of the object (in this example a product in a list of products) to ready the instructions for application against the next object (product) in the list.
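The repeated extraction over a product list can be sketched as a loop that applies the same per-element markers to each object and then moves the scan pointer past the end of that object. The function name `extract_list`, the marker format, and the sample markup are assumptions for illustration and do not reproduce Table 2.

```python
def extract_list(text, item_end, fields):
    """Extract the same fields from every object in a repeated list.

    item_end: marker that closes each object (e.g. a closing tag)
    fields:   dict mapping a field name to its (start, end) markers
    """
    items = []
    pos = 0  # scan pointer
    while True:
        close = text.find(item_end, pos)
        if close < 0:
            break  # no further objects
        item = {}
        for name, (start, end) in fields.items():
            # Search only within the current object: [pos, close).
            s = text.find(start, pos, close)
            if s < 0:
                continue
            s += len(start)
            e = text.find(end, s, close)
            if e >= 0:
                item[name] = text[s:e]
        if item:
            items.append(item)
        # Move the pointer to the end of this object, ready for the next.
        pos = close + len(item_end)
    return items


html = "<li><b>TV A</b><i>$100</i></li><li><b>TV B</b><i>$200</i></li>"
products = extract_list(html, "</li>",
                        {"title": ("<b>", "</b>"), "price": ("<i>", "</i>")})
```

Bounding each search by the end of the current object keeps a missing field in one product from picking up the value of the next.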
More particularly:
If the engine 140 has identified that the page is of the “mysearch_1” family, the engine applies the portion of the signature schema document that contains the signature for the objects and elements of that family, shown above in Table 3.
If the engine 140 has identified that it is looking for a menu on a page that contains the menu style of the “mymenu_1” family, the engine applies the portion of the signature schema document that contains the signature for the objects and elements of that family, shown above in Table 4.
Though the example described relates to extracting informational content for an e-commerce oriented site, no limitation should be applied. Similar instructions may be defined for other types of sites, for pages which permit a user to input information and for navigational data extraction.
Signature schema document 122 may further comprise transcoding instructions (not shown) for use by engine 140 to express the extracted desired data (which may be retrieved from database 126) in a target format (e.g. a format of HTML, XML, script etc.) for use by the requesting client machine 102. For example, the transcoding instructions may define a web page for displaying the extracted data in browser application 86 that is suitable for display on the client machine 102. The formatting rules can be system and/or user defined and can include one or more parameters such as but not limited to: object positioning, object colour, object size, object shape, object font/image characteristics, background style, and navigational item display (e.g. in a menu as described above) or for display with the content in the generated page on the client screen. Browser application 86 (e.g. of machine 102A) may be configured for using a markup language (e.g. cHTML) or other code format that is not identical to the code provided by web page 110. Alternatively, transcoding instructions may be defined to express the extracted desired data in XML or another code format such as for use by a different client application or plug-in to a client application such as menu application 82 or another application (not shown) on client machine 102.
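As a rough illustration of transcoding, the sketch below renders one extracted object as a very small HTML page. The function name `transcode_item` and the choice of markup are assumptions; the actual transcoding instructions are not shown in this disclosure, and a real target format such as cHTML would impose its own constraints.

```python
from html import escape

def transcode_item(item):
    """Render extracted {field: value} data as a minimal HTML page."""
    # Escape values so extracted text cannot break the generated markup.
    rows = "".join(f"<p>{escape(name)}: {escape(value)}</p>"
                   for name, value in item.items())
    return f"<html><body>{rows}</body></html>"
```

For example, `transcode_item({"title": "A & B"})` yields a page whose single paragraph reads `title: A &amp; B`.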
Signature schema documents may be prepared (i.e. coded) using a computing device such as computing device 128. Computing device 128 may be any suitable desktop or laptop device capable of coding documents (which may be but need not be XML-type documents) and may be configured to automate or semi-automate coding of such documents.
Computing device 128 may be coupled to web site 104 to retrieve web pages from the site for reviewing to prepare the custom signature schema document for the site. Computing device 128 may be configured to automatically review the web page code and apply heuristics or other techniques (e.g. spatial analysis) to determine probable content of interest (i.e. desired data) and generate code to extract the desired data. For example, primary content of interest tends to be located toward the centre of the web page. In another embodiment, the computing device may facilitate a user coding signature schema to manually assist with the analysis of the web page and identification of desired data and the generation of the instructions. Computing device 128 may be further coupled to repository 124 to provide (e.g. up-load or publish) coded signature schema documents for use by server 120.
It will be apparent to a person of ordinary skill in the art that as a web site may be re-designed or otherwise changed such that the code of one or more web page families may be changed or a family added, an existing signature schema may require re-coding to account for the change/addition, as applicable.
In accordance with a present embodiment, further details concerning the syntax of schema instructions are described.
The lookup tag instructs the engine 140 to insert into, delete from, or query the document contents.
Type: Defines the data type of the lookup. Type may be “pex” for a string expression. Type may also support more advanced options such as regular expressions, API calls, and SQL queries.
Action:
Action=“locate_string”: Look for a string (the “ref” identifier) within the data. Return true if and only if the string exists in the data (i.e. the “ref” identifier index>=0).
Action=“replace_string”: Replace a string within the data with the “ref” identifier.
Action=“move_ptr”: Remove all characters in the data that exist before the location of the “ref” identifier.
Action=“end_ptr”: Remove all characters in the data that exist after the location of the “ref” identifier.
Action=“get_string”: Extract a string based on the location of the “ref”, “start”, and “end” identifiers.
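Four of the actions listed above can be sketched as plain string functions. Several details are assumptions where the text is silent: that “move_ptr” keeps the “ref” string itself, that “end_ptr” drops it, that an unmatched “ref” leaves the data unchanged, and that “replace_string” takes a separate `target` argument naming the text to be replaced.

```python
def locate_string(data, ref):
    """True if and only if ref occurs in data (index >= 0)."""
    return data.find(ref) >= 0

def replace_string(data, target, ref):
    """Replace occurrences of target with ref (target is a hypothetical parameter)."""
    return data.replace(target, ref)

def move_ptr(data, ref):
    """Drop everything before ref; assumption: ref itself is kept."""
    i = data.find(ref)
    return data[i:] if i >= 0 else data

def end_ptr(data, ref):
    """Drop everything from ref onward; assumption: ref itself is dropped."""
    i = data.find(ref)
    return data[:i] if i >= 0 else data
```

Applying `move_ptr` and then `end_ptr` narrows the data to a region of interest before any “get_string” lookups run.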
ID: ID is an identifier of another section within the signature. It allows the result of a query to trigger another set of actions within the signature. This is primarily used when identifying page types. Once a match has been made, specific instructions are executed that are marked with this ID. Recursive data structures (e.g. lists within lists) may also be supported.
Ref: Ref defines the initial identifier that the lookup searches for. If an AND case is required, multiple ref identifiers can be used (i.e. ref=“string1” ref1=“string2”). If an OR case is required, ref_[ref identifier]_alt_1 can be used (i.e. ref=“string1” ref_alt_1=“string2”). To demonstrate, (X=“1”||Y=“2”) && (A=“8”||B=“9”) would translate to ref=“1” ref_alt_1=“2” ref1=“8” ref1_alt_1=“9”.
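As a hedged sketch of how the AND/OR combination of ref identifiers might be evaluated, the following Python groups attributes by their numbered base name (OR within a group, AND across groups). The function name refs_match, the dictionary representation of the tag attributes, and the spelling of the alternate suffix with underscores are assumptions made for illustration only.

```python
import re


def refs_match(data: str, attrs: dict, base: str = "ref") -> bool:
    """Evaluate ref / refN (AND) and *_alt_N (OR) identifiers against data."""
    pattern = re.compile(rf"^({re.escape(base)}\d*)(?:_alt_\d+)?$")
    groups = {}
    for name, value in attrs.items():
        match = pattern.match(name)
        if match:
            # ref and ref_alt_1 share the group "ref"; ref1 starts a new group.
            groups.setdefault(match.group(1), []).append(value)
    # Every group must match (AND); any alternative in a group suffices (OR).
    return bool(groups) and all(
        any(value in data for value in alternatives)
        for alternatives in groups.values()
    )
```

The same grouping could be reused for the “start”, “end” and “max_index” identifiers by passing a different base name.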
Repeat_[identifier]: Repeat executes the identifier query additional times. For example, if ref=“hello”, then to set the identifier index at the second occurrence of “hello” the following tag would be added: repeat_ref=“1”.
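A minimal sketch of the repeat behaviour, assuming that each repetition resumes the search one character past the previous match; the function name find_with_repeat is hypothetical.

```python
def find_with_repeat(data: str, ref: str, repeat: int = 0) -> int:
    """Return the index of ref after re-running the query `repeat` extra times."""
    index = data.find(ref)
    for _ in range(repeat):
        if index < 0:
            break  # no further occurrence exists
        # Assumption: resume just past the start of the previous match.
        index = data.find(ref, index + 1)
    return index
```

With repeat set to 1 the index lands on the second occurrence, consistent with the repeat_ref=“1” example above.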
Location:
Location=“before”: Search the data in a reverse direction, starting from the “ref” identifier. This implies that both the “start” and “end” identifier indexes must be less than the “ref” index.
Location=“middle”: Search the data in two directions, starting from the “ref” identifier. This implies that the “ref” identifier index is greater than the “start” identifier index and less than the “end” identifier index.
Location=“after”: Search the data in a forward direction, starting from the “ref” identifier. This implies that both the “start” and “end” identifier indexes must be greater than the “ref” index.
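The three location values may be illustrated with the following Python sketch of a “get_string” extraction. It is one possible reading of the description rather than the engine's actual implementation: the function signature is hypothetical, the “start” and “end” values are excluded from the result (matching the default of the include options), and None is returned when any identifier cannot be located.

```python
from typing import Optional


def get_string(data: str, ref: str, start: str, end: str,
               location: str = "after") -> Optional[str]:
    """Extract the text between start and end, positioned relative to ref."""
    ref_i = data.find(ref)
    if ref_i < 0:
        return None
    if location == "after":
        # Both start and end lie after ref.
        start_i = data.find(start, ref_i + len(ref))
        end_i = data.find(end, start_i + len(start)) if start_i >= 0 else -1
    elif location == "before":
        # Both start and end lie before ref; search in reverse.
        end_i = data.rfind(end, 0, ref_i)
        start_i = data.rfind(start, 0, end_i) if end_i >= 0 else -1
    elif location == "middle":
        # start lies before ref and end lies after it.
        start_i = data.rfind(start, 0, ref_i)
        end_i = data.find(end, ref_i + len(ref))
    else:
        raise ValueError(f"unknown location: {location}")
    if start_i < 0 or end_i < 0:
        return None
    return data[start_i + len(start):end_i]
```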
Start: Start is primarily used when action=“get_string” and may also be used for replace/remove instructions. The start identifier index will be the start index of the string to extract. If an AND case is required, multiple “start” identifiers can be used (i.e. start=“string1” start1=“string2”). If an OR case is required, start_[start identifier]_alt_1 can be used (i.e. start=“string1” start_alt_1=“string2”). To demonstrate, (X=“1”||Y=“2”) && (A=“8”||B=“9”) would translate to start=“1” start_alt_1=“2” start1=“8” start1_alt_1=“9”. To find the nth match see the repeat syntax.
End: End is primarily used when action=“get_string” and may also be used for replace/remove instructions. If an AND case is required, multiple “end” identifiers can be used (i.e. end=“string1” end1=“string2”). If an OR case is required, end_[end identifier]_alt_1 can be used (i.e. end=“string1” end_alt_1=“string2”). To demonstrate, (X=“1”||Y=“2”) && (A=“8”||B=“9”) would translate to end=“1” end_alt_1=“2” end1=“8” end1_alt_1=“9”. To find the nth match see the repeat syntax.
Max_index: Max_index is used to limit the scope of a query by ensuring that no other identifier index is greater than the “max_index”. If an AND case is required, multiple “max_index” identifiers can be used (i.e. max_index=“string1” max_index1=“string2”). If an OR case is required, max_index_[max_index identifier]_alt_1 can be used (i.e. max_index=“string1” max_index_alt_1=“string2”). To demonstrate, (X=“1”||Y=“2”) && (A=“8”||B=“9”) would translate to max_index=“1” max_index_alt_1=“2” max_index1=“8” max_index1_alt_1=“9”. To find the nth match see the repeat syntax.
Max_Index_Use_Ref: Max_Index_Use_Ref is a Boolean value set to 0 or 1. It is used with Max_index. When set to 0, the “max_index” will begin querying at the beginning of the data. When set to 1, the “max_index” will begin querying from the “ref” identifier index.
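The interaction between Max_index and Max_Index_Use_Ref may be sketched as follows. The helper names scope_limit and in_scope are hypothetical, and treating a missing “max_index” as “no limit” is an assumption not stated in the description.

```python
def scope_limit(data: str, ref: str, max_index: str,
                use_ref: bool = False) -> int:
    """Locate the max_index identifier, optionally starting from ref."""
    begin = 0
    if use_ref:
        begin = data.find(ref)
        if begin < 0:
            return -1
    return data.find(max_index, begin)


def in_scope(index: int, limit: int) -> bool:
    """True when an identifier index does not exceed the max_index limit."""
    # Assumption: a negative limit means no max_index was found, so no bound.
    return index >= 0 and (limit < 0 or index <= limit)
```

Starting the search from the “ref” identifier is useful when the “max_index” string also appears earlier in the page, as in the test data for this sketch.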
Gbl_append_[identifier]: Gbl_append appends a string passed via the URL to the identifier's query value.
Gbl_Repeat_[identifier]: Gbl_Repeat executes the identifier query additional times. For example, if ref=“hello”, then to set the identifier index at the second occurrence of “hello” the following tag would be added: gbl_repeat_ref=“var”, where var would be passed in the URL, e.g. http://www.eshop.ca/mobile/fatfree.asp?site= . . . &url= . . . &var=1.
Tolerance: Tolerance is a Boolean value set to 0 or 1. It is used to return an empty string. By default tolerance is set to 0, which enforces that a property be found on a page; otherwise the page will be marked as “invalid” and an appropriate error message returned. When set to 1, an empty value is returned for properties that cannot be located.
Include_sz: Include_sz is a Boolean value set to 0 or 1 and used with get_string. It is by default set to 0. When set to 1 it includes the “start” value and the “end” value as part of the result.
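The tolerance behaviour may be sketched as follows, under the assumption that a failed extraction is represented by None and that an “invalid” page is signalled by raising an exception; both the exception class and the function name are hypothetical.

```python
class InvalidPageError(Exception):
    """Raised when a required property cannot be located on a page."""


def resolve_property(name: str, value, tolerance: int = 0) -> str:
    """Apply the tolerance rule to the result of an extraction."""
    if value is not None:
        return value
    if tolerance == 1:
        return ""  # tolerated: return an empty value
    # tolerance == 0: the property is mandatory, so the page is invalid
    raise InvalidPageError(f"property '{name}' not found; page marked invalid")
```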
Include_start: Include_start is a Boolean value set to 0 or 1 and used with get_string. It is by default set to 0. When set to 1 it includes the “start” value as part of the result.
Include_end: Include_end is a Boolean value set to 0 or 1 and used with get_string. It is by default set to 0. When set to 1 it includes the “end” value as part of the result.
Closetag: Closetag is a Boolean value set to 0 or 1 and used when action=“get_string”. It appends /> to the extracted value.
Strip_Tags: Strip_Tags removes HTML tags from the value and is used when action=“get_string”.
Strip_tags=“1”: remove all tags.
Strip_tags=“2”: remove all br and script tags.
Strip_tags=“3”: remove all tags except replace </p> </li> with <br>.
Strip_tags=“4”: remove all tags except replace </div> <br> with <br>.
Strip_tags=“tag1,tag2 . . . tagN”: remove all tag1, tag2, . . . tagN leaving any tag not listed.
Notrim: Notrim is a Boolean value set to 0 or 1 and used when action=“get_string”. By default all values have white space trimmed. When this property is set to 1, white space is not trimmed.
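The Strip_tags modes above may be approximated with regular expressions, as in the following Python sketch. It is a simplification offered for illustration only: it assumes that tags are removed while their enclosed text is kept, and a regular expression is not a full HTML parser. The helper names are hypothetical.

```python
import re

ANY_TAG = re.compile(r"<[^>]+>")


def _tag_pattern(names):
    """Build a pattern matching opening and closing forms of the named tags."""
    joined = "|".join(re.escape(name) for name in names)
    return re.compile(rf"</?(?:{joined})\b[^>]*>", re.IGNORECASE)


def strip_tags(value: str, mode: str) -> str:
    if mode == "1":
        return ANY_TAG.sub("", value)                      # remove all tags
    if mode == "2":
        return _tag_pattern(["br", "script"]).sub("", value)
    if mode in ("3", "4"):
        breaks = ["</p>", "</li>"] if mode == "3" else ["</div>", "<br>"]
        marker = "\x00"  # placeholder that survives the tag removal
        for tag in breaks:
            value = re.sub(re.escape(tag), marker, value, flags=re.IGNORECASE)
        return ANY_TAG.sub("", value).replace(marker, "<br>")
    # Otherwise mode is a comma-separated list of tag names to remove.
    names = [name.strip() for name in mode.split(",") if name.strip()]
    return _tag_pattern(names).sub("", value)
```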
Append: Append is a string value and used when action=“get_string”. It appends a string to the extracted value.
Prepend: Prepend is a string value and used when action=“get_string”. It prepends a string to the extracted value.
Upper: Upper is a Boolean value set to 0 or 1 and used when action=“get_string”. It converts all characters to upper case.
Lower: Lower is a Boolean value set to 0 or 1 and used when action=“get_string”. It converts all characters to lower case.
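The post-processing options for an extracted value (Notrim, Prepend, Append, Upper and Lower) may be combined as in the following sketch. The order of operations (trim, then case conversion, then prepend/append) is an assumption, as is giving Upper precedence if both case options are set.

```python
def post_process(value: str, notrim: bool = False, prepend: str = "",
                 append: str = "", upper: bool = False,
                 lower: bool = False) -> str:
    """Apply the get_string post-processing options to an extracted value."""
    if not notrim:
        value = value.strip()      # white space is trimmed by default
    if upper:
        value = value.upper()
    elif lower:
        value = value.lower()
    return f"{prepend}{value}{append}"
```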
Page Syntax
The page syntax extracts the paging information from the data. This gives the end user the ability to change pages just as on a desktop browser.
Page_variable: Defines unique key that defines a family's paging feature.
Page_start: Defines value of first page in a family's paging feature.
Page_post: Path where paging variable(s) must be transmitted to.
Page_increment: Defines value that paging increases by for each page in a family's paging feature.
Page_block: Defines unique key that defines a family's paging block feature.
Page_block_size: Defines the size of the family's page block (e.g. 10 items per page).
Url_append: Append the unique key that defines a family's paging feature and the page number.
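As a hedged illustration of how Page_variable, Page_start and Page_increment might combine to address a given page, the following Python builds a paged URL with the standard urllib.parse module. The function name, the zero-based page number, and the example path list.asp are assumptions introduced for this sketch.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_url(base_url: str, page_variable: str, page_start: int,
             page_increment: int, page_number: int) -> str:
    """Return base_url with the paging variable set for page_number (0-based)."""
    value = page_start + page_number * page_increment
    parts = urlsplit(base_url)
    # Drop any existing paging variable, then append the new value.
    query = [(key, val)
             for key, val in parse_qsl(parts.query, keep_blank_values=True)
             if key != page_variable]
    query.append((page_variable, str(value)))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For a site that pages by item offset, page_start would be 0 and page_increment would equal the number of items per page.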
Search Syntax
The search syntax makes a web site family's search feature functional by specifying details such as which variable to post.
Search_path: Search path to which the search variable must be transmitted.
Search_variable: Name of the search variable which a web site's search feature is looking to read, request, post, etc.
Url_replace: Remove a portion of the url that is specific to posting search parameters.
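A minimal sketch of how Search_path and Search_variable might be used to form a search request, assuming the search term is sent as a query-string parameter; the function name and the example path search.asp are hypothetical.

```python
from urllib.parse import urlencode, urljoin


def search_url(location: str, search_path: str,
               search_variable: str, term: str) -> str:
    """Build the URL that submits term to the site's search feature."""
    return urljoin(location, search_path) + "?" + urlencode({search_variable: term})
```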
URL Syntax
The url tag defines global properties for a site, including the url, and name: <url location=“http://www.eshop.ca” key=“eshop.ca” name=“E-Shop”/>
Name: Name is the name to display when browsing using the gateway 120.
Location: Location defines the fully qualified address of the site.
Key: Key is the key that identifies the site (e.g. “eshop.ca”).
Advanced Syntax
The advanced tag defines global properties for the site. This at a minimum includes the path to the initial page of the site.
Index_link: Index_link specifies the path to the initial page of the site. This is usually the same page as the location property from the URL syntax. This field is always required.
Append_link: Appends a string value to every URL requested for this site.
No_purchase: No_purchase is a Boolean value 0 or 1. The default value is 0 which implies that an item should contain a purchase link. When true, the purchase link is removed.
No_item: No_item is a Boolean value 0 or 1. The default value is 0 which implies that Item pages should show up in the breadcrumb. When true, the item is not added to the breadcrumb.
Check_out: Check_out is a Boolean value 0 or 1. The default value is 0 which implies that Item purchase link sends the request and control away from the gateway server 120. When true, then a checkout process has been created for use with gateway server 120.
Product_img_width: Product_img_width defines the width of all item images.
Use_cookies: Use_cookies is a Boolean value 0 or 1. By default it is set to 0, and cookies are not passed to the site. When true, gateway 120 passes all cookies from client machine 102 to the site 104, and from the site 104 to the client machine.
Page Type Syntax
The page type is a collection of lookup queries that have an id associated with them.
Lookup queries may be processed in a top down fashion. The first successful lookup will trigger another section in the signature schema document. For example, if the following evaluates to true:
Then the tag element <list_elements id=“mylist_1”> would be executed next.
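The top-down evaluation of page type lookups may be sketched as follows. Each lookup is represented here as a dictionary with a “ref” and an “id”, which is an illustrative simplification of the lookup tag; the function name and the sample ref strings and ids used in testing are hypothetical.

```python
from typing import Optional


def identify_page_type(data: str, lookups: list) -> Optional[str]:
    """Return the id of the first lookup whose ref is found in data."""
    for lookup in lookups:          # processed in top-down order
        if lookup["ref"] in data:   # equivalent to a locate_string action
            return lookup["id"]     # id of the element section to run next
    return None
```

The returned id would then select the matching element section (e.g. a list_elements tag with the same id) for execution.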
General Element Syntax
Elements include list_elements, menu_elements, item_elements, search_elements, form_elements. Each element has an ID. For example a menu element:
The element may contain the following sub containers (settings, actions, elements, paging), whose scope resides only within the element. Each element is associated with a specific rendering function.
Settings Syntax
Settings syntax varies based on the type of element it resides in. Settings allow customizations that only apply to a specific page family.
Black_list—menu_elements: Black_list removes menu items with names that reside in the black list. Each entry is delimited (e.g. using two pound characters).
Pass_image—list_elements, search_elements: Pass_image adds the image path to the url when requesting an item. The image added to the url will be used as the item image.
Price[n]—item_elements: Price[n] where n is an integer renames the rendered item with name price[n].
Action—form_elements: Overrides the action of a form displayed to the end user.
Handle—form_elements
Handle=“display”—display the form to the end user.
Handle=“post”—post the form.
Handle=“get”—get the form.
Cookie—form_elements: Send additional cookies when posting this form.
Input_[identifier]—form_elements: Input tag adds/modifies a form value with name [identifier] setting its value.
Rename_[identifier]—form_elements: Rename tag renames a form value with name [identifier].
Actions Syntax
The primary function of the actions tag is data manipulation. It contains lookup queries that modify data with actions of “move_ptr” or “end_ptr”.
Persons of ordinary skill in the art will appreciate that alternative embodiments are contemplated. System 100 may be implemented so that one or more web sites 103, 104 and 105 are coupled to the telecommunication network (either by a single server 106 alone or by one or more web servers like web-server 106), and so that a corresponding plurality of schemas for each of those web sites (or each of the web pages therein, or both) can be maintained by gateway and schema server 120 and repository 124. Client machines 102 can be configured for proxied connections through different servers 120 and for accessing aggregated web site data from database 126. Those skilled in the art will now further recognize that servers 120 and web server 125 can be hosted by a variety of different parties, including, for example but without limitation: a manufacturer of client machine 102; a service provider that provides access to the telecommunication network on behalf of user U of a client machine 102; the entity that hosts web-site 104; or a third party intermediary. In the web site host example, it may even be desirable to combine the web server 106 and schema server engine 120 on a single server, thereby obviating the need for separate servers. Alternatively, the functionality of server 120 and web server 125 may be locally resident on the client machine.
This application claims the benefit of the prior filing of U.S. Provisional Patent Application Ser. No. 60/924503 filed May 17, 2007, the disclosure of which is incorporated herein by reference.
Number | Date | Country
---|---|---
60924503 | May 2007 | US