The present specification relates generally to telecommunication and more specifically relates to a system and method for content navigation.
Computing devices are becoming smaller and increasingly utilize wireless connectivity. Examples of such computing devices include portable computing devices that include wireless network browsing capability as well as telephony and personal information management capabilities. The smaller size of such client devices necessarily limits their display capabilities. Furthermore, the wireless connections to such devices typically have less bandwidth than corresponding wired connections. The Wireless Application Protocol (“WAP”) was designed to address such issues, but WAP can still provide a very unsatisfactory or even completely ineffective experience, particularly where the small client device needs to effect a connection with web-sites that host web-pages that are optimized for full traditional desktop browsers.
The present specification provides, amongst other things, a method and system for navigating content. In an embodiment a portable electronic device is provided having a browser application and a native menu application. The embodiment also includes a network that interconnects a web-server and said portable electronic device. The web-server hosts web pages that include menus and content. The portable electronic device is configured to obtain a schema respective to the web-pages whereby the web-page menus can be generated on the portable electronic device using the native menu application rather than the browser application, thereby permitting navigation of content on the portable electronic device via the native menu application.
Referring now to
Each client machine 54 is typically any type of computing or electronic device that can be used to interact with content available on network 66. Each client machine 54 is operated by a user U. Interaction includes displaying information on client machine 54 as well as receiving input at client machine 54 that is in turn sent back over network 66. In a present embodiment, client machine 54 is a mobile electronic device with the combined functionality of a personal digital assistant, cell phone, email paging device, and a web-browser. Such a mobile electronic device thus includes a keyboard (or other input device(s)), a display, a speaker (or other output device(s)) and a chassis within which the keyboard, display and speaker are housed. The chassis also houses one or more central processing units, volatile memory (e.g. random access memory), persistent memory (e.g. Flash read only memory) and network interfaces to allow machine 54 to communicate over network 66.
Referring now to
Programming instructions that implement the functional teachings of client machine 54 as described herein are typically maintained, persistently, in non-volatile storage unit 212 and used by processor 208 which makes appropriate utilization of volatile storage 216 during the execution of such programming instructions. Of particular note is that non-volatile storage unit 212 persistently maintains a native menu application 82 and a web-browser application 86, each of which can be executed on processor 208 making use of volatile storage 216 as appropriate. Various other applications (not shown) are maintained in non-volatile storage unit 212 according to the desired configuration and functioning of client machine 54, one specific non-limiting example of which is a contact manager application 90 which stores a list of contacts, addresses and phone numbers of interest to user U and allows user U to view, update and delete those contacts, as well as providing user U an option to initiate telecommunications (e.g. telephone, email, instant message, short message service) directly from that contact manager application.
Native menu application 82 is configured to provide menu choices to user U according to the particular application (or other context) that is being accessed. By way of example, while user U is activating contact manager application 90, user U can activate menu application 82 to access a plurality of menu choices available that are respective to contact manager application 90. This example is shown in greater detail in
While accessing contact manager application 90 as shown in
Note that the options in contextual menu M-90 are stored within non-volatile storage 212 as being specifically associated with contact application 90. Menu application 82 is therefore configured to generate a plurality of different contextual menus M that are reflective of the particular context in which the menu application 82 is invoked. For example, in an email application where an email is being composed, invoking menu application 82 would generate a contextual menu M that included the options of sending the email, cancelling the email, adding addresses to the email, adding attachments, and the like. The contents for such a contextual menu M would also be maintained in non-volatile storage 212. Other examples of contextual menus M will now occur to those of skill in the art. Menu application 82 and contextual menus M will be discussed in greater detail below.
Returning now to
It should now be understood that the nature of network 66 and the links 70, 74 and 78 associated therewith is not particularly limited; they are, in general, based on any combination of architectures that will support interactions between client machine 54 and servers 58 and 62. In a present embodiment network 66 itself includes the Internet as well as appropriate gateways and backhauls to links 70, 74 and 78. Accordingly, the links 70, 74 and 78 between network 66 and the interconnected components are complementary to functional requirements of those components.
More specifically, system 50 includes link 70 between client machine 54 and network 66, link 70 being based in a present embodiment on core mobile network infrastructure (e.g. Global System for Mobile communications (“GSM”), Code Division Multiple Access (“CDMA”), Enhanced Data rates for GSM Evolution (“EDGE”), Evolution Data-Optimized (“EV-DO”), High Speed Downlink Packet Access (“HSDPA”)) or on wireless local area network (“WLAN”) infrastructures such as the Institute of Electrical and Electronics Engineers (“IEEE”) 802.11 Standard (and its variants) or Bluetooth or the like or hybrids thereof. Note that in an exemplary variation of system 50 it is contemplated that client machine 54 could be other types of client machines, including a full desktop computer or a “thin-client”.
System 50 also includes link 74 which can be based on a T1, T3, O3 or any other suitable wired or wireless connection between server 58 and network 66. System 50 also includes link 78 which can be based on a T1, T3, O3 or any other suitable wired or wireless connection between server 62 and network 66.
As previously stated in relation to
Those skilled in the art will now recognize that menu-panes 104-1, 104-2, 104-3 and 104-4 represent at least one set of hyper-text markup language (“HTML”) programming instructions possibly incorporating a scripting language such as JavaScript. Likewise those skilled in the art will now recognize that content-panes 108-1, 108-2, 108-3 and 108-4 represent at least one other set of HTML programming instructions possibly incorporating a scripting language such as JavaScript. The programming instructions for menu-panes 104-1, 104-2, 104-3 and 104-4 are discrete from the programming instructions for content-panes 108-1, 108-2, 108-3 and 108-4. It will also now be apparent that web-server 58 is configured to provide each web-page 100-1, 100-2, 100-3 and 100-4 in its entirety in response to a request from a web-browser, so that it is not generally possible to view, for example, web-page 100-4 directly from web-page 100-1 or web-page 100-2.
Referring again to
Explaining Table I in greater detail, the first four columns of Table I (“Root Menu Item”; “Level 1 Menu Item”; “Level 2 Menu Item”; “Level 3 Menu Item”) correspond to the menu structure found in menu-panes 104-1, 104-2, 104-3, 104-4. The last column of Table I (“Web-page Link within web-site 100”) corresponds to the specific address associated with a particular web-page within website 100, including web-pages 100-1, 100-2, 100-3, 100-4 and other web-pages that are not actually shown in the Figures and points to the respective content (including 108-1, 108-2, 108-3, 108-4 and other content not actually shown in the Figures) that is associated with the menu-panes reflected in the associated first four columns. Thus the first four columns can be used by native menu application 82 to create a plurality of contextual menus M that have substantially the same content as menu-panes 104-1, 104-2, 104-3 and 104-4. Likewise, the last column of Table I can be used to extract web-content corresponding to the web-site address indicated in the relevant entry of that last column, as found within web-site 100, including web-content 108-1, 108-2, 108-3, 108-4 and other web-content from other web-pages in web-site 100 that are not actually shown in the Figures. Web-browser application 86 and native menu application 82 are therefore configured to co-operate using schema 102 in order to present web-content within the web-browser application 86, while using native menu application 82 to permit user U to navigate through web-site 100.
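By way of non-limiting illustration, the following sketch shows one way rows of a schema laid out like Table I could be folded into a nested menu structure of the kind native menu application 82 might present. The row values, the example.com links and the names `SCHEMA_ROWS`, `build_menu_tree` and `menu_options` are assumptions made for illustration only and are not taken from Table I.

```python
from typing import Dict, List, Optional, Tuple

# One schema row: (root, level 1, level 2, level 3, page link).
# Unused menu levels are None. All values here are hypothetical.
Row = Tuple[str, Optional[str], Optional[str], Optional[str], str]

SCHEMA_ROWS: List[Row] = [
    ("Home", None, None, None, "http://example.com/"),
    ("Home", "Computers", None, None, "http://example.com/computers"),
    ("Home", "Computers", "Laptops", None, "http://example.com/computers/laptops"),
    ("Home", "Software", None, None, "http://example.com/software"),
]


def build_menu_tree(rows: List[Row]) -> Dict:
    """Fold flat schema rows into a nested {label: {"link", "children"}} tree."""
    tree: Dict = {}
    for *levels, link in rows:
        children = tree
        node = None
        for label in levels:
            if label is None:
                break  # remaining menu levels are unused for this row
            node = children.setdefault(label, {"link": None, "children": {}})
            children = node["children"]
        if node is not None:
            node["link"] = link  # the last column addresses the deepest item
    return tree


def menu_options(tree: Dict, path: List[str]) -> List[str]:
    """Return the labels a contextual menu would list beneath the given path."""
    children = tree
    for label in path:
        children = children[label]["children"]
    return list(children)
```

In this sketch the first four columns drive what the menu lists, while the last column is only consulted when an item is chosen, which mirrors the division of labour between the menu application and the browser application described above.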
Referring now to
At block 910 a schema is requested. Block 910 is performed by web-browser application 86 (or a separate plug-in or other application configured to execute in conjunction with web-browser, such as a transcoding engine, not shown) which establishes a connection with schema server 62 in order to retrieve schema 102. At block 915 the schema is validated and returned. The validation of block 915 (which, it will be appreciated, like certain other aspects of method 900, will be understood to be optional) can be effected by server 62 which can perform a validation operation to confirm that schema 102 matches web-site 100 and is otherwise up-to-date. If validation is not achieved then an exception (e.g. an error) can be generated. Assuming validation is achieved, then schema 102 is returned to web-browser application 86 where it is loaded into web-browser application 86. Blocks 910 through 915 are represented in
Also note that the means by which web-browser application 86 requests schema 102 is not particularly limited. In one particular embodiment, however, it is contemplated that web-browser application 86 will be configured to automatically make network requests over network 66 to request a schema that corresponds to website 100. For example, schema server 62 can have a predefined network address on network 66 that is preprogrammed into client machine 54. The type of network address is not particularly limited, and can be, for example, any type of network identifier such as an Internet Protocol (“IP”) address or a Uniform Resource Locator (“URL”). Any other suitable type of network address is contemplated. Client machine 54 can therefore be programmed to send a request to the address for schema server 62 and request that schema server 62 provide, if available, a schema (e.g. schema 102) that corresponds to web-site 100. (Note of course that in other embodiments, a separate schema can be provided for each web-page within web-site 100). The request at block 910 provided by client machine 54 can be formed with any unique identifier for each web-page, but in the context of the Internet the request would most typically be, or derived from, the URL associated with each web-page. In turn, that unique identifier can be used to index schema 102 on schema server 62.
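As a minimal sketch of the indexing described above, and assuming purely for illustration that a site-level schema is keyed by the host name derived from a web-page URL, the lookup on a schema server could resemble the following. The `SCHEMA_INDEX` dictionary, its contents and the function names are hypothetical stand-ins for whatever storage the schema server actually uses.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit

# Hypothetical in-memory stand-in for the index kept on a schema server.
SCHEMA_INDEX: Dict[str, dict] = {
    "shop.example.com": {"site": "shop.example.com", "rows": []},
}


def schema_key(page_url: str) -> str:
    """Derive a site-level identifier (the lower-cased host name) from a page URL."""
    host = urlsplit(page_url).hostname
    if host is None:
        raise ValueError(f"no host in URL: {page_url!r}")
    return host


def lookup_schema(page_url: str,
                  index: Dict[str, dict] = SCHEMA_INDEX) -> Optional[dict]:
    """Return the schema for the site hosting page_url, or None if unavailable."""
    return index.get(schema_key(page_url))
```

An embodiment that keeps a separate schema for each web-page could instead key the index on the full URL rather than on the host name.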
As well, authentication can be made through connection 216 to validate the origin of schema 102. For example, private and public key based authentication can verify that schema 102 originates from a trusted source.
Those skilled in the art will now recognize that system 50 can be implemented so that a plurality of web-sites (like web-site 100) are hosted over network 66 (either alone by server 58 or by a plurality of web-servers like web-server 58), and that a corresponding plurality of schemas for each of those web-sites (or each of the web-pages therein, or both) can be maintained on schema server 62. Those skilled in the art will now recognize that there can in fact be a plurality of schema servers (like schema server 62) and that client machine 54 can be configured to search for corresponding schema files on one or more of those schema servers. Those skilled in the art will now further recognize that schema servers can be hosted by a variety of different parties, including, for example: a) a manufacturer of client machine 54; b) a service provider that provides access to network 66 via link 70 on behalf of user U of client machine 54; or c) the entity that hosts web-site 100. In the latter example it can even be desired to simply host schema 102 directly on web-server 58 and thereby obviate the need for schema server 62.
Referring again to
Referring again to
At block 940, a determination is made as to whether native menu application 82 has been selected for activation. In a present embodiment, and referring again to
Within block 935, user U can perform the usual functions of web browsing, including scrolling through the page, and selecting any individual links which may be active within content 108-1. Thus, user U could browse and otherwise interact with content 108-1 as if user U was operating a traditional desktop browser. It will now be understood that such interaction could lead to a selection of a different web-page which would otherwise interrupt performance of method 900. Such interaction is not contemplated by method 900 expressly for convenience and simplicity, but that is not to say that such interaction is excluded.
Assuming, however, that a “yes” determination is made at block 940 and method 900 advances to block 945, then at block 945 a contextual menu would be generated.
At block 950, a determination is made as to whether a web-page has been selected. User U can thus scroll through the various options presented on contextual menu M-104-1 in much the same manner that user U could scroll through the options presented on contextual menu M-90 as discussed above. Thus, at block 950, a determination would be made as to whether user U interacting with contextual menu M-104-1 using menu application 82 made a selection corresponding to one of “Computers”; “Computer Add-ons”; “Software”; “Photo-finishing”; “TV & Video”; or “Audio”.
If the determination at block 950 is “no” then at block 955 a determination is made as to whether a selection was made to close the menu application. Continuing with the present example, it would be determined whether user U interacting with contextual menu M-104-1 using menu application 82 made a selection corresponding to “Close Menu”. If the determination at block 955 is “yes”, then method 900 returns to block 935 and contextual menu M-104-1 would close and display 224 would return to the appearance as shown in
If the determination at block 955 is “no” then method 900 advances to block 960 where a determination is made as to whether a control item was selected. Continuing with the present example, in
If the determination at block 960 is “no”, (i.e., user U interacting with contextual menu M-104-1 using menu application 82 made a selection corresponding to “Home”), then method 900 cycles back to block 950.
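The determinations at blocks 950, 955 and 960 can be read as a simple dispatch on the option chosen from the contextual menu. The following sketch is one possible reading only: the `Outcome` names, the `dispatch` function and the idea of passing the page links and control items in as arguments are assumptions for illustration, and what method 900 does after each determination is left to the caller.

```python
from enum import Enum
from typing import Dict, Optional, Set, Tuple


class Outcome(Enum):
    PAGE_SELECTED = "page selected"  # "yes" at block 950
    CLOSE_MENU = "close menu"        # "yes" at block 955: return to block 935
    CONTROL_ITEM = "control item"    # "yes" at block 960
    KEEP_WAITING = "keep waiting"    # "no" at block 960: cycle back to block 950


def dispatch(selection: Optional[str],
             page_links: Dict[str, str],
             control_items: Set[str]) -> Tuple[Outcome, Optional[str]]:
    """Classify one menu selection in the order of blocks 950, 955 and 960."""
    if selection in page_links:                       # block 950
        return Outcome.PAGE_SELECTED, page_links[selection]
    if selection == "Close Menu":                     # block 955
        return Outcome.CLOSE_MENU, None
    if selection in control_items:                    # block 960
        return Outcome.CONTROL_ITEM, selection
    return Outcome.KEEP_WAITING, None
```

A caller would typically invoke `dispatch` in a loop, redisplaying the contextual menu for as long as `Outcome.KEEP_WAITING` is returned.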
Referring again to block 950 of
Referring now to
Referring now to
Method 900b addresses one problem of browsing between web-pages on mobile electronic devices, whereby browsing through multiple pages can be time consuming, resource (e.g. bandwidth, processor, memory) intensive and, depending on the rate plan available to user U, financially expensive. Method 900b can allow users to navigate through multiple levels of web page menus. Turning now to
The determination of which portions of menu panes M-104-1, M-104-2 or M-104-3 are to be combined is not particularly limited. For example, a record can be kept of the most popular selections by all users of web site 100, and direct links to those selections can be included. Alternatively, specific promotions can be chosen to be combined into the modified menu pane M-104 (e.g. where the operator of server 58 wishes to promote the sale of 17″ laptops in
The foregoing presents certain exemplary embodiments, but variations or combinations or subsets thereof are contemplated. For example, other functions can be added to each contextual menu M as those menus are presented within browser application 86, such as the common “back” or “forward” commands as found in traditional desk top browsers. Also, the types of web-sites 100 are not intended to be limited to e-commerce web-sites.
Another embodiment provides a communications environment 10D. Referring to
Access to the Web sites over the network 11D can be done directly, in terms of desktop devices 26D, and through a proxy gateway 22DD, further described below. Accordingly, one or more mobile devices 24D (e.g. PDAs, mobile phones, etc.) and one or more desktops 26D can use the gateway to access the pages (both content 50D and navigational 54D aspects). The gateway can be used to format or otherwise monitor the interaction of the user of the devices 24D, 26D with the content 50D and navigational 54D aspects of the Web pages.
Overview of the Environment 10D
Specifically, the environment 10D can take an unstructured webpage (e.g. HTML) and convert it into a structured database, for example. It is not about simplifying HTML for any page; it is about understanding the data in a page and the relationships (between data content, and between data content and navigational items tied to that page content) that govern the data in the page. Accordingly, knowledge of the data contained in the page content (e.g. the data type, namely navigation versus published content, as well as which of the published content is related to each other and which of the navigation data is related to each other and to which published content on the page) can be used (for example via a signature file) to extract data from the web site (for example on a page by page basis, or by another defined collection of information such as file by file) for consumption by the mobile/desktop device 101D. Therefore, it is the gateway that acts as the proxy for the desktop/mobile, accommodating requests for web site data from the mobile/desktop and the corresponding web site data sent from the web site in response to the request. It is recognized that the data (e.g. web page 60D) obtained by the gateway from the web site could be any structured file or document (e.g. HTML, XML, etc.), optionally in the form of a web page, for which the signature file has predefined knowledge about the contents of the document (e.g. meaning of data contained within tags/delimiters as well as the interrelationships between the data in the document). One example of this is a web page described in HTML, which can be referred to as unstructured content.
It is recognized that the extraction process of the gateway for extracting data from the web page of the web site can be used to obtain only that data (e.g. published content and/or navigational data) that is pertinent to a simplified display on the screen of the user device 101D. The reason for generation of the simplified display of the data obtained from the original web site content (e.g. a web page) can be such as but not limited to: limited display space for the generated simplified data display on the user device 101D (e.g. physical space restrictions such as for a mobile screen, or user/system defined space restrictions such as for only a portion of the theoretically available desktop screen space); and user preference pertaining to continuity of browsing/transactional/session experience. An example of user preference is where the user starts the interaction with the web site and resultant displayed data (published content and navigational data) on the mobile (i.e. mobile formatted data display) and then wishes to retain the formatting of the mobile when continuing to view on the desktop screen. For example, the user on the desktop can continue to browse the published content and navigational data of the web site as previously experienced on the mobile, using only a portion of the desktop screen (for example) for data display.
The remaining description will refer to the document obtained from the web site as a web page, for exemplary purposes only. Large data-driven sites don't maintain thousands of pages. They have a few page templates and populate them from a database of information, news, shopping etc. Each template represents a family of pages. And a family of pages has objects and attributes.
Family: List Page
Objects: lists a selection of news stories
Attributes: Title, abstract and date
Family: Detail page
Objects: lists a single news story (and maybe other related stories)
Attributes: Journalist, City, Date, Title, Full Story, Image
Family: List Page
Objects: lists a selection of products
Attributes: Image, Item Name, Price, Sale Price
Family: Search Page (a specific kind of list page)
Objects: same as list page ± a few
Attributes: same as list page ± a few
There are a few families of pages that can be managed to get an entire website accessible via a signature file, further described below:
List Pages—browse by category, by search, featured products
Detail pages: A specific object details with other information on a page
Search: to enter search information
Input: To do things like enter billing information (these are typically individual pages)
Signature Files
We identify the signature for each family of pages (the family template) that 1) automatically can identify a given page on a website as part of the family and 2) differentiates that family from another family of pages. Similarly each object and attribute field can have a unique signature within a family of pages that we need to identify once for the family.
A signature file can contain numerous pieces of information, for example:
1) identifying the page family
2) identifying the objects and attributes in the page
3) Specifying the relationship between the objects and attributes.
In the case of a document received as a file, the signature file can contain knowledge about the type of file, the objects/attributes of the file, and the relationships between the objects and attributes in the file. A further example of the web site data can be such as but not limited to news articles and RSS feeds or other information feeds (stock tickers, etc.).
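To make the three pieces of information listed above more concrete, the following sketch models a signature as a small record holding a family marker, an object pattern and per-attribute patterns, and applies it to a page. It is a simplified illustration under stated assumptions: the `Signature` class, the regular expressions and the sample markup are hypothetical, and regular expressions are used here only as a compact stand-in for whatever tag or delimiter analysis a real signature file would encode.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Signature:
    family: str                 # 1) name of the page family
    family_marker: str          # regex that identifies a page as part of the family
    object_pattern: str         # 2) regex capturing one object (e.g. one story)
    attributes: Dict[str, str]  # 3) attribute name -> regex applied within an object


# Hypothetical signature for a news "List Page" family.
LIST_PAGE = Signature(
    family="List Page",
    family_marker=r'<ul class="stories">',
    object_pattern=r'<li class="story">(.*?)</li>',
    attributes={
        "title": r"<h2>(.*?)</h2>",
        "date": r'<span class="date">(.*?)</span>',
    },
)


def identify_family(html: str, signatures: List[Signature]) -> Optional[Signature]:
    """Return the first signature whose family marker appears in the page."""
    for sig in signatures:
        if re.search(sig.family_marker, html):
            return sig
    return None


def extract_objects(html: str, sig: Signature) -> List[Dict[str, str]]:
    """Extract each object with the attributes found inside it."""
    objects = []
    for match in re.finditer(sig.object_pattern, html, re.DOTALL):
        body = match.group(1)
        record = {}
        for name, pattern in sig.attributes.items():
            found = re.search(pattern, body, re.DOTALL)
            if found:
                record[name] = found.group(1).strip()
        objects.append(record)
    return objects
```

Because attributes are searched for only inside the text captured for each object, the relationship between an object and its attributes is preserved in the extracted records.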
Schema Engine
This component uses the signature file for a website to create content data in response to the web page request, from the mobile/desktop, efficiently on the fly and send the data to the client. The data can include web page content data and navigational data obtained from the web page as requested. Alternatively the information can be stored to start building a database of the site, optionally. The construction of this database can be saved locally to the gateway, otherwise cached to the local storage of the user device, and/or cached/stored at the web site or third party (e.g. a search engine service used for comparison of data from different web sites).
Separation of Navigation & Content
Navigation items are on the same page as content, but it may not make sense (in situations with limited screen real estate available) to display the page in the original web page format as obtained from the website by the gateway. Schema extracts the navigational items separately to create a navigational portion of the web page. The environment can do interesting things with the separated navigational items, such as feeding them to an application in the background to help improve the browsing experience, or otherwise reformatting the presentation of the navigational items on the display of the mobile/desktop, in order to help with navigation and with maintaining navigation context in situations with limited display space available for presentation of the web page.
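A minimal sketch of such a separation is given below. It assumes, purely for illustration, that the navigational items of a page are the links enclosed in an HTML `nav` element; in the environment described here that knowledge would instead come from the signature file. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class NavContentSplitter(HTMLParser):
    """Collect links found inside <nav> separately from all other visible text."""

    def __init__(self) -> None:
        super().__init__()
        self.nav_depth = 0
        self.current_href: Optional[str] = None
        self.nav_items: List[Tuple[str, str]] = []  # (label, link)
        self.content: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "nav":
            self.nav_depth += 1
        elif tag == "a" and self.nav_depth:
            self.current_href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "nav" and self.nav_depth:
            self.nav_depth -= 1
        elif tag == "a":
            self.current_href = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self.nav_depth:
            # Inside navigation only link labels are kept; other text is dropped.
            if self.current_href:
                self.nav_items.append((text, self.current_href))
        else:
            self.content.append(text)


def split_page(html: str) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Return (navigational items, content text fragments) for a page."""
    parser = NavContentSplitter()
    parser.feed(html)
    parser.close()
    return parser.nav_items, parser.content
```

The navigational items returned by `split_page` could then be handed to a separate menu presentation while the content fragments are laid out on their own.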
Continuance of Sessions
In the environment 10D, the user can start browsing from the PC or mobile device 101D and complete a purchase on either (or otherwise continue the sessions). We can continue the session to realize benefits, such as revenue share, that could be lost if continuance of sessions was not enabled. Continuance of sessions can also give users seamless flexibility to use their PC and mobile to buy/browse things from websites and to replicate the buy/browse information.
The continuance of sessions can be facilitated by the use of rich bookmarking that is generated from the desktop tagging tool discussed below, such that a rich bookmark is created that has bookmark (e.g. a displayable link) components such as but not limited to: a URL (e.g. network address of the web site data); and identified portions of the web site data located with respect to that URL (e.g. item image, item title, description of item, text body related to item such as an article, etc.). The portions of web site data associated with the URL (e.g. page/file name) can be considered key or otherwise memorable data preferred by the user with respect to item(s) on the URL (for example product name/price/image).
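As one hedged illustration of such a rich bookmark, the sketch below pairs a URL with a dictionary of identified portions and serializes the pair so that it could be carried from one device to another. The `RichBookmark` class, its field names and the choice of JSON as the interchange format are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict


@dataclass
class RichBookmark:
    url: str                                                # network address of the web site data
    portions: Dict[str, str] = field(default_factory=dict)  # e.g. title, price, image link

    def to_json(self) -> str:
        """Serialize the bookmark, e.g. for transfer between mobile and desktop."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "RichBookmark":
        """Rebuild a bookmark from its serialized form."""
        data = json.loads(text)
        return cls(url=data["url"], portions=data["portions"])
```

A list of such bookmarks could likewise be serialized as a JSON array when one user shares bookmarks with another.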
Desktop Tagging Tool and Automatically Creating Signature Files
This uses artificial intelligence to analyze any page in one or more ways, such as but not limited to:
1) delimiter (e.g. HTML tag) structure and properties; and
2) Spatial analysis of objects located on a rendered page.
Generally main content is closer to the centre of the page, is bigger and is meant to stand out more to the user. Properties in the HTML mark up can be used to accomplish this and we have AI that can identify these properties. One embodiment is where we use the rendered page, in combination with tag analysis. One benefit is that this feature could be used to generate the signature files automatically by guessing, thereby at least significantly speeding up the creation of signature files, if not completely automating it. Another use of the desktop tagging tool is to create a list of rich bookmarks for later use by the user and/or for publishing or otherwise sharing with other users. One example of this would be a list of rich bookmarks provided by one user to another user, such that the list of rich bookmarks contains URLs and associated data from one or more web sites.
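The spatial observation above (main content tends to be larger and nearer the centre of the page) can be illustrated with a simple hand-written heuristic. The sketch below is not the analysis performed by the tagging tool; the `Box` type, the scoring formula and its equal weighting of size and centrality are assumptions chosen only to show the idea.

```python
from dataclasses import dataclass
from math import hypot
from typing import List


@dataclass
class Box:
    """Bounding box of one rendered page element, in pixels."""
    label: str
    x: float
    y: float
    width: float
    height: float


def main_content_score(box: Box, page_width: float, page_height: float) -> float:
    """Score is higher for boxes that are larger and closer to the page centre."""
    centre_x = box.x + box.width / 2
    centre_y = box.y + box.height / 2
    # Distance from the page centre, normalised so that a page corner gives 1.0.
    max_dist = hypot(page_width / 2, page_height / 2)
    dist = hypot(centre_x - page_width / 2, centre_y - page_height / 2) / max_dist
    # Fraction of the page area covered by the box.
    area = (box.width * box.height) / (page_width * page_height)
    return area * (1.0 - dist)


def guess_main_content(boxes: List[Box], page_width: float, page_height: float) -> Box:
    """Return the box most likely to hold the main content under this heuristic."""
    return max(boxes, key=lambda b: main_content_score(b, page_width, page_height))
```

In practice such a score would be only one input, combined with tag structure and properties as described above.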
Conducting a Transaction
With regard to billing and completing a transaction, a user goes through a number of pages to navigate to an item they want to buy and then must continue browsing through a number more pages to complete a checkout or transaction. The provided description of the environment 10D includes detailed explanations of the analysis and output of a requested web page. The same process can be extended to all web pages browsed from start to end to complete a transaction. It is recognized that the transaction can be such as but not limited to: browsing for and subsequent purchase of item(s); and/or browsing and subsequent saving of published content (e.g. news article), as desired.
For example, actions one through ten in
Web Sites 20D
Referring again to
For example, a Web service definition encompasses many different systems, but can refer to clients and servers that communicate XML messages that follow the SOAP-standard, via a description of the operations supported by the server e.g. in WSDL.
The composition of the Web pages can include displayed content and navigation features.
Web pages typically have both of these features on each page and will display content in the main content areas and have navigation options through menus, as shown by example in
Content 50D
The content can include computer files, image media, audio files and electronic documents, which are either located on/in the Web page or are otherwise accessible through navigation/requests from a particular Web page and/or Web service. For example, Web content can be referred to as textual, visual or aural content that is encountered as part of the user experience through interaction with Web sites/services. Web content may include, among other things: text; images; sounds; videos; animations; and feeds (video, audio, and/or textual). For example, the pages can present content as predominantly composed of HTML, or some variation, as well as data, applications, e-services, images (graphics), audio and video files, personal Web pages, archived e-mail messages, and many more forms of file and data systems that can belong to Web sites and Web pages.
Examples of content can be as follows:
1) Tables for presenting information displayed in a grid, such as a calendar, or in a spreadsheet, such as financial data. Tables can be used to have greater control over page layout. For example, a table can help ensure that text and graphics are displayed in their correct location. A table can also encompass an entire page, with nested tables (including content and/or navigation features) within the main table for even more layout control;
2) video and audio files;
3) text, e.g. articles, which for most web pages is one of the most important features. Text can be used to present ideas, instructions, and/or educational/recreational content; and
4) Images (e.g. GIF, JPEG) can be used in web pages to support the theme of the web page and to provide a visual impression. Images can be separate image files and may not reside in the HTML document itself, but can be stored in the same location as the web page. Images can be scanned photographs or pictures, may be created in a draw program, or may be downloaded from another web site.
Navigation 54D
The mobile 24D and desktop 26D devices coordinate user events 109D of the respective users, through operation of the browsers (or other applications) 207D in interaction with the supporting navigation features of the Web pages/sites. The navigation features can include visual based controls, text controls, and/or a combination thereof. For example, there can be three basic types of navigation: Hierarchical, which applies to Web sites that are information-rich and are organized as a large tree, much like a library; Global, which applies to Web sites where the user can logically jump among all points (e.g. content and/or other navigation controls); and Local, which applies when the user wants to access a depth of information/content within broader areas/content of the Web site.
Examples of navigation 54D mechanisms with respect to the content of a Web site can include such as but not limited to: embedded links (e.g. anyplace where one links content within the body of the page); and navigation buttons, graphic and text-based. As well, text entry fields can be used to navigationally access content and other navigation features of Web sites.
Further examples of navigation 54D mechanisms can be such as but not limited to:
1) Buttons can be images with text on them that provide a means to navigate from one location to another. Buttons may be created in a draw program or downloaded from other web sites.
2) Menu bars can be features on a web page that provide links to other pages for easy navigation between the pages or other Web sites. Menu bars may contain buttons (e.g. text/images), they may be created as a table, or they may be text-based with divider lines; and
3) Links provide “branching capabilities”, that is, the ability to go to another site/page. Links are “jump starts” to other web pages/sites: a link may take the user to another page or to another site.
Devices 101D
Referring to
Referring again to
Referring again to
Further, it is recognized that the computing device 101D can include the executable applications 207D comprising code or machine readable instructions for implementing predetermined functions/operations including those of an operating system and the host system 14D modules, for example. The executable instructions 207D can be an application hosted on the user mobile/desktop for interacting with the gateway, the engine and other related components for when acting as a data proxy between the mobile/desktop and the web site, or a web service (e.g. search engine crawling tool) for use by the web site, as configured by the respective device 101D when operating within the environment 10D. The processor 208D as used herein is a configured device and/or set of machine-readable instructions for performing operations as described by example above. As used herein, the processor 208D may comprise any one or combination of, hardware, firmware, and/or software. The processor 208D acts upon information by manipulating, analyzing, modifying, converting or transmitting information for use by an executable procedure or an information device, and/or by routing the information with respect to an output device. The processor 208D may use or comprise the capabilities of a controller or microprocessor, for example. Accordingly, any of the functionality of the executable instructions 207D (e.g. through modules associated with selected tasks) may be implemented in hardware, software or a combination of both. Accordingly, the use of a processor 208D as a device and/or as a set of machine-readable instructions is hereafter referred to generically as a processor/module for sake of simplicity.
The memory 22D is used to store data locally as well as to facilitate access to remote data stored on other devices 101D connected to the network 11D. This data can be related to data/user events of the mobile/desktop, data used by the gateway in obtaining and satisfying requests for web pages and associated content/navigation features, and/or actual Web site data, as appropriate for the use of the device 101D in the environment 10D.
The data can be stored in a table, which can be generically referred to as a physical/logical representation of a data structure for providing a specialized format for organizing and storing the data. General data structure types can include types such as but not limited to an array, a file, a record, a table, a tree, and so on. In general, any data structure is designed to organize data to suit a specific purpose so that the data can be accessed and worked with in appropriate ways. In the context of the present network environment 10D, the data structure may be selected or otherwise designed to store data for the purpose of working on the data with various algorithms executed by components of the executable instructions, depending upon the application thereof for the respective device 101D. It is recognized that the terminology of a table is interchangeable with that of a data structure with reference to the components of the network environment 10D.
Example Operation of Response 60D to Web Page Request by Gateway 22DD
Large data-driven sites 20D may not create and maintain thousands of pages. Instead, they use multiple page templates 62D and populate the templates from the database of content information. Examples would be online stores, news sites, sports information and weather. The association of the data in the database with the templates 62D is used to construct the web page(s) sent to the gateway 22DD.
See
1. A client makes a request to the ABC ComTech Corp.ca web server (2)
2. The web server calls the respective page family (3) depending on the page requested
3. The page family (3) retrieves data (all navigation and content) from the database (1) to populate data fields of the page 60D
4. The web server (2) transmits a completed webpage 60D to the gateway 22DD at (4).
It is recognized that a page family serves a specific function for use by the gateway 22DD via the signature file 64D, see below. For example, ABC ComTech Corp.ca has the following families of pages:
Accordingly, the environment 10D can take advantage of the fact that each web page 50D of the website 20D can follow a recognizable pattern of content data/location and navigational items related to the content data and content location. For example, in the ABC ComTech example (templates 62D), the text in red on the web page, located above a description of a computer product, is always the price of the computer product.
It is recognized that the web page could also be referred to as a document (e.g. file) that is analysed by the engine through use of the signature file in order to extract (or to insert in the case of passing information from the mobile/desktop back to the web site) information subset(s) of the document.
Signature File 64D
A signature file can be created once for a website and can then be used to efficiently analyze and extract data from pages 60D of the website. One advantage is that, as the signature file can be implemented as an application by the gateway 22DD, the gateway 22DD may not have to store any data from the website, and can instead fetch the data in real time upon request, as the webpage 60D matching the content/navigation request of the mobile 24D and/or desktop 26D. Another advantage of signature files is that they can be non-intrusive to the existing website infrastructure and may not require that a vendor or merchant make any changes to their website configuration/infrastructure. One preferable characteristic of the websites is the use of page families for representing the website data from their databases 22D, as further described below.
Using signature files, rich mobile applications can be created from large websites. Each page is “optimized” on the fly by extracting the data from the page and sending the data to a mobile device, through use of the signature file, thereby helping to significantly speed up loading time and save bandwidth. It is recognized that the Schema Engine is also able to format the content and navigational items obtained from the web pages 60D for efficient display on the display of the mobile/desktop devices 101D. Turning unstructured page content into relational data can also be significant, and can help to enable rich features for users such as custom alerts and price comparison. A user could save an item while browsing and the Schema Engine could automatically go back every day (or other selected time period) and check to see if the item was on sale or in stock. A user could also ask to find similar items at other stores, via appropriate URL requests to the gateway, which could be a supportable feature if product information was stored, for example.
With regard to search engines, today there is no way to automatically index the web and understand the specific details of unstructured information (e.g. embedded in a page format) that is resident on the web pages as embedded content and navigational items. For example, a search engine index of the product page of camera A would contain all the keywords for that page and the prices on it (e.g. $200, $100, $50), but the relationship of the prices with respect to the camera is unknown to the search engine. A product page can contain information about a specific product including its name, description and price, but may also have other products being recommended to the user on the same page with their own prices and names. A search engine may know of all the names and prices in the page, but not know which sets of information belong to which product specifically; the index would not know which one of those prices is the actual price of camera A. Applying signature files 64D (e.g. having knowledge of which of the content is related to other portions of the content, as well as navigational aspects of the content) to search indexes can allow search engines to unlock the value of precise content that exists in their indexes and can significantly improve search results. It also enables new kinds of searches such as “Find the lowest price for this product” or “Show me all articles actually written by this author”, which would not show web pages that simply had the author's name in them. Accordingly, it is understood that the signature files 64D can be used with web pages that are unstructured (e.g. little to no use of meta data for defining the content and navigational items resident in the document (e.g. page)), such as unstructured HTML.
With regard to price comparison or product recommendation sites, price comparison sites depend on vendor submission for their offering and it can be very painful for vendors to prepare these data feeds. Crawling the sites like a search engine does not work for the reasons stated above, unless a signature file is applied. Using the signature file, accurate product information could be ascertained by applying signature files to a crawled cache or index, or by collecting price and product information as it flows through the Schema Engine. In this case, a template of the cached/indexed information would be used to create the respective signature therefor. The information could be much richer and cover a significantly larger portion of the web more reliably and easily. A price comparison engine could automatically crawl using signature files to build a complete database of an ecommerce site, using search criteria facilitated through the signature file to implement complex searches of the web site content on a page per page basis (e.g. find all cameras with prices, done through the use of the signature file for respective web sites, and then apply filters to the extracted data, e.g. identify those cameras with a price under $200).
With respect to construction of an appropriate signature file, some terminology is explained using
Page family: Item Page
Object: Product (A camera)
Object Elements: Picture (1), Title (2), Price (3), Description (not shown)
A complete signature file 64D for a website 20D can contain such as but not limited to:
The following provides an overview of constructing the various components of the signature file 64D, for use in interpreting web pages 60D obtained from the web site and for reformatting the content and navigational items of the requested web pages for use by the mobile 24D and/or desktop 26D as reformatted pages 66D. The recognition of various elements of the web pages for use in defining the signature file 64D can be obtained through manual/automated/semi-automated analysis of the web pages (content and navigation), as desired.
Identifying a Page Family
An identifier for a page family can meet 2 criteria:
1) It is present in all pages belonging to the family
2) It is NOT present in pages belonging to any other family
In one embodiment, a string identifier is used that meets the above criteria as shown in the example below.
Code Snippet from Webpage Shown in
Code Snippet from Webpage Shown in
The pages in
Setting a Limit
A unique reference can be defined by setting a limit on a portion of a webpage.
Code Snippet A from Webpage Shown in
Code Snippet B from Webpage Shown in
The string “largeImageRef” is the string identifier used to identify and extract the product image for the page shown in
Extracting Objects and Elements in a Page Family
The example page in
As an example let us try to construct an instruction to identify and extract the title from any item page such as the pages shown in
The following instructions will result in the output of the title:
1. Locate the string “product-details-prd-title”
2. Extract the value after the string in (1) and in between the strings “<span” and “</span>”
3. Strip all mark up tags—“class=“tx-heading3-dgrey”>”
4. The resulting string is the product's title
The code snippet for the page shown in
The command representing instructions 1-4 above is shown below in a query language (e.g. the individual file entries) used in signature files developed for the purpose of data extraction, with some relevant parameters highlighted:
<lookup type=“pex” action=“get_string” name=“title” ref=“product-details-prd-title” location=“after” start=“<span” end=“</span>” include_sz=“1” strip_tags=“1”/>
The signature file and processing of signature files by the Schema Engine are discussed in more detail later.
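As a minimal sketch of how instructions 1-4 might be carried out, the following function applies a single get_string lookup to page source. The function name getString, its parameter names, and the sample page are hypothetical, and the exact semantics of the lookup attributes are assumed rather than taken from the Schema Engine: the sketch assumes the extracted span includes the start and end delimiters so that tag stripping removes them.

```javascript
// Apply one "get_string" lookup to page source. The option names mirror the
// attributes of the <lookup> entry; their exact semantics are an assumption.
function getString(source, { ref, start, end, stripTags = false }) {
  // Step 1: locate the reference string.
  const refIndex = source.indexOf(ref);
  if (refIndex === -1) return null;
  // Step 2: location="after", so search for the start string after the reference.
  const startIndex = source.indexOf(start, refIndex + ref.length);
  if (startIndex === -1) return null;
  const endIndex = source.indexOf(end, startIndex + start.length);
  if (endIndex === -1) return null;
  // Keep the delimiters so that the tag-stripping step can remove them whole.
  let value = source.slice(startIndex, endIndex + end.length);
  // Step 3: strip all markup tags.
  if (stripTags) value = value.replace(/<[^>]*>/g, "");
  // Step 4: the remaining string is the extracted value.
  return value.trim();
}

// Hypothetical item page fragment.
const samplePage =
  '<div id="product-details-prd-title">' +
  '<span class="tx-heading3-dgrey">Example Camera</span></div>';

console.log(
  getString(samplePage, {
    ref: "product-details-prd-title",
    start: "<span",
    end: "</span>",
    stripTags: true,
  })
); // prints: Example Camera
```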
Identifying Object and Element Relationships
The object and element relationships can be implicitly or explicitly specified. For example in the ABC ComTech Corp. list page shown in
Other Aspects
The example and information above demonstrate how to capture data and relationships of objects and elements within a page of a web site 20D. The platform can also capture relevant attributes of an object across pages. For example, a user of the mobile 24D may click through a number of pages in the following categories in ABC ComTech Corp. to get to a specific TV-SONY456: e.g. TV & Video >19″-21″ TVs >LCD TVs >SONY456.
Another aspect is the ability to capture the information across the navigation of pages about the product. In doing that, one can capture the categorization of the TV “TV & Video >19″-21″ TVs >LCD TVs >” and add that as another attribute of the object. This example shows how capturing of navigation metadata or information across pages can be a source of valuable information.
Although this example covers only displaying content, the same concepts apply for a page that requires input. The key input fields and values (e.g. the ability to enter search strings) can be identified in the same way and presented to the user of the mobile 24D, and the values captured and sent back to the website 20D via the gateway 22DD. The signature file 64D can be written in an XML-based query language syntax (or other structured definition language and/or script language, for example) to specify the above identifiers and actions such as traversing backwards, forwards and extracting values. The language can be a SQL type query language and can be built on top of regular expressions.
Automatic Generation of Signature Files 64D
Described is a method of creating signature files that identify and extract specific contents from a webpage. It is recognized that, in view of
The contents may be navigational items, lists, specific items from a list, and other content, for example. The reason that this is useful is that signature files can be manually created, which can be time consuming, and subject to human error. Therefore by automating this process, the turn around time for interpreting a website as a database through the gateway 22DD can be substantially faster and more accurate.
The automated generation method is to break down the html document (or other format of the web pages) into a hierarchy of tags (delimiters pertaining to a schema of the definition of the pages). The resulting structure can be a tree, which defines the parent, siblings and children of each object. The process (described in the following section) can identify the key objects that contain the data required for the signature file. Once an object is identified as being a required field within the database, the object would then identify its uniqueness by examining its properties (for example class, style, id). If the object is a text node of the tree (or other hierarchical structure), the object will use the properties of its parent. If the properties of the object are not unique, then the object would expand its uniqueness to its parent, siblings and children. The process would expand in all directions uniformly (i.e. examine parent, then previous sibling, then next sibling, then first child). The properties of each of these items would also merge with the required object. This process would then be repeated on the parent, then the previous sibling, etc., until a unique identifier was found. Once a unique identifier was found, an expression would be created for the signature. Note that at least two pages of the same family can be used to create the expression.
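The outward expansion described above can be sketched, under stated assumptions, as a breadth-first search over a tag tree. The plain-object tree, the helper names (node, allNodes, properties, neighbours, findUniqueAnchor) and the sample page are hypothetical. As a simplification, the sketch stops at the first neighbouring object carrying a property that occurs exactly once in the tree, rather than merging the properties of every visited object.

```javascript
// Build a tree node; parent links are filled in so the search can walk upward.
function node(tag, attrs = {}, children = []) {
  const n = { tag, attrs, children, parent: null };
  children.forEach((child) => { child.parent = n; });
  return n;
}

// Collect every node of the tree in document order.
function allNodes(root) {
  return [root, ...root.children.flatMap(allNodes)];
}

// Properties examined for uniqueness (class, style, id) as "name=value" strings.
function properties(n) {
  return ["class", "style", "id"]
    .filter((name) => name in n.attrs)
    .map((name) => `${name}=${n.attrs[name]}`);
}

// Neighbours in the order given in the text: parent, previous sibling,
// next sibling, first child. Missing neighbours are dropped.
function neighbours(n) {
  const siblings = n.parent ? n.parent.children : [];
  const i = siblings.indexOf(n);
  return [n.parent, siblings[i - 1], siblings[i + 1], n.children[0]].filter(Boolean);
}

// Search outward from the target for an object with a property that occurs
// exactly once in the whole tree. Returns the property and the number of
// expansion steps, or null when nothing unique is reachable.
function findUniqueAnchor(root, target) {
  const counts = new Map();
  for (const n of allNodes(root)) {
    for (const p of properties(n)) counts.set(p, (counts.get(p) || 0) + 1);
  }
  const queue = [{ n: target, distance: 0 }];
  const seen = new Set([target]);
  while (queue.length > 0) {
    const { n, distance } = queue.shift();
    const unique = properties(n).find((p) => counts.get(p) === 1);
    if (unique) return { property: unique, distance };
    for (const next of neighbours(n)) {
      if (!seen.has(next)) {
        seen.add(next);
        queue.push({ n: next, distance: distance + 1 });
      }
    }
  }
  return null;
}

// Hypothetical page: the description <p> has no properties of its own, so
// the search expands to its parent <div class="product">.
const description = node("p");
const page = node("body", {}, [
  node("div", { class: "ad" }, [node("p")]),
  node("div", { class: "product" }, [description]),
  node("div", { class: "ad" }, [node("p")]),
]);
console.log(findUniqueAnchor(page, description));
```

Because an object without properties (such as a text node) contributes nothing on its own, the first expansion step naturally falls on its parent, in keeping with the text above.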
The user will enter the required fields to be extracted from the page. These fields can be specified by a user using a corresponding graphical user interface of the device 101D to select fields. Alternatively, a tool similar to the Desktop tool (see below) could be used to automatically guess at the fields on a page. Automatic generation of the signature file assumes that one knows where the key information resides on the page (i.e. location within the document), e.g. price, image, description, etc. For example, knowledge of where the key information is located in the web page (e.g. the image is located between these tags) can be obtained using a number of methods, such as but not limited to: looking at the code of the page by hand to identify the tags used to indicate content type (e.g. navigation, navigation of which content, title, price, image, item description, etc.); semi-automated use of a graphical tool to highlight portions on the page and therefore visually select which content data corresponds to what meaning and other content data; and/or user assisted identification with confirmation/correction by the user (further described below with the use of assisted generation that is applicable also to generation of rich bookmarks).
This is a Description for Product Title Made by Product Manufacturer
This is a Description for Sample Title Made by Sample Manufacturer
Assumptions: The required fields are identified prior to this process either by the user or using an automated tool (such as the schema desktop tool). They can be as follows:
It is recognized that different modules of the automated generation process can implement the following steps (embodied as executable instructions 207D—see
Step 1—Identify the Image
From the Item1 the object <img src=“sample_image.gif”/> is selected. It identifies src as an attribute and scans the source of item1 for src=“sample_image.gif”. It does not find a match, so it then scans item2. If a match is found, and the matching object contained the image identified for item2, the attribute would be used to create a signature file image property. However, the item is not found in Item2, so no match has been made. Next the element looks at “<img” within list 1. It determines that it is the second match. When looking at Item2, the second image also provides the object that contains the image. Now that we have the matching object, we apply a similar heuristic to locate the result from within the object. If the object is a text node, the process is complete. Otherwise, the start and end of the object need to be located. Using pattern recognition techniques, we find that ‘src=&quot;’ starts the string and that ‘&quot;’ ends the string. Therefore the following entry would be added to the signature file
<lookup type=“pex” action=“get_string” name=“image” ref=“<img” repeat_ref=“1” start=“src=&quot;” end=“&quot;”/>
Step 2—Identify the Title
From the Item1 the object <h1>Product title</h1> is selected. It identifies that it is a text node, and uses its parent to identify uniqueness. There are no attributes for the parent <h1>. Next the element looks at “<h1” within list 1. It determines that it is the only match. When looking at Item2, there is only one match, and the matching element contains the title. Now that we have the matching object, we apply a similar heuristic to locate the result from within the object. Since the object is a text node, the process is complete. Therefore the following entry would be added to the signature file
<lookup type=“pex” action=“get_string” name=“title” ref=“<h1” start=“>” end=“<”/>
Step 3—Identify the Price
From the item1 the object <strong> $79.99</strong> is selected. There are no attributes to be checked for this element. Next the element looks at “<strong” within list 1. It determines that it is the second match. When looking at Item2, the second strong tag also provides the object that contains the price. Since the object is a text node, the process is complete. Therefore the following entry would be added to the signature file
<lookup type=“pex” action=“get_string” name=“price” ref=“<strong” repeat_ref=“1” start=“>” end=“<”/>
Step 4—Identify the List Price
From the Item1 the object <strong> $99.99</strong> is selected. There are no attributes to be checked for this element. Next the element looks at “<strong” within list 1. It determines that it is the first match. When looking at Item2, the first strong tag also provides the object that contains the price. Since the object is a text node, the process is complete. Therefore the following entry would be added to the signature file
<lookup type=“pex” action=“get_string” name=“price” ref=“<strong” start=“>” end=“<”/>
Step 5—Identify the Description
From the Item1 the object <p>This is a description for Sample title made by Sample Manufacturer</p> is selected. There are no attributes to be checked for this element. Next the element looks at “<p” within list 1. It determines that it is the first match. When looking at Item2, the first p tag does not provide the object that contains the description. The parent object <div class=“product”> is selected next. It identifies the attribute class=“product”, scans item1, and determines that it is the only match. The <p tag is processed again, limiting its search to the parent. The <p tag is identified as the first instance within the parent. Next the same process is performed on item2. First the attribute class=“product” is located. The first <p tag that is a child of the object containing class=“product” is found. The <p object also contains the description. Since the object is a text node, the process is complete. Therefore the following entry would be added to the signature file
<lookup type=“pex” action=“get_string” name=“description” ref=“class="product"” start=“<p>” end=“<”/>
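The comparison of two pages of the same family in Steps 1-5 can be illustrated with the following sketch, which searches for the occurrence index of a reference string that yields the known field value on every sample page. The function names (occurrences, extractAt, findRepeatRef) and the two sample pages are hypothetical, and the zero-based index is only assumed to correspond to the repeat_ref attribute used in the entries above.

```javascript
// Return the start offsets of every occurrence of `ref` in `source`.
function occurrences(source, ref) {
  const found = [];
  for (let i = source.indexOf(ref); i !== -1; i = source.indexOf(ref, i + 1)) {
    found.push(i);
  }
  return found;
}

// Extract the text between `start` and `end` that follows the n-th
// (zero-based) occurrence of `ref`.
function extractAt(source, ref, n, start, end) {
  const at = occurrences(source, ref)[n];
  if (at === undefined) return null;
  const s = source.indexOf(start, at);
  if (s === -1) return null;
  const e = source.indexOf(end, s + start.length);
  return e === -1 ? null : source.slice(s + start.length, e).trim();
}

// Find the occurrence index of `ref` that yields the expected value on every
// sample page. Returns -1 when no single index works for all pages, in which
// case the search would have to expand (e.g. to the parent object).
function findRepeatRef(samples, ref, start, end) {
  const limit = Math.min(...samples.map((s) => occurrences(s.source, ref).length));
  for (let n = 0; n < limit; n++) {
    if (samples.every((s) => extractAt(s.source, ref, n, start, end) === s.expected)) {
      return n;
    }
  }
  return -1;
}

// Hypothetical item pages of the same family.
const item1 =
  '<div class="product"><h1>Product title</h1>' +
  "<strong>$99.99</strong><strong>$79.99</strong></div>";
const item2 =
  '<div class="product"><h1>Sample title</h1>' +
  "<strong>$59.99</strong><strong>$49.99</strong></div>";

// The sale price is the second <strong on both pages, so the index is 1.
console.log(
  findRepeatRef(
    [
      { source: item1, expected: "$79.99" },
      { source: item2, expected: "$49.99" },
    ],
    "<strong",
    ">",
    "<"
  )
); // prints: 1
```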
Referring to
It is recognized that the hierarchy 74D can link entities 76D either directly or indirectly, and either vertically or horizontally. The only direct links in a hierarchy, insofar as they are hierarchical, can be to the entities' immediate superior or to the entities' subordinates, although a system that is largely hierarchical can also incorporate other organizational patterns. Indirect hierarchical links can extend “vertically” upwards or downwards via multiple links in the same direction. Traveling up the hierarchy to find a common direct or indirect superior, and then down again can nevertheless “horizontally” link all parts of the hierarchy, which are not vertically linked to one another. Further, the structure 74D can also be a list implemented using arrays or linked/indexed lists of some sort. The structure 74D can have certain properties associated with arrays and linked lists. A sequence can be another name for the structure 74D, emphasizing ordering of the entities 76D.
Further, it is recognized that the structure 74D would be represented in the signature file as the entries as noted above. It is recognized that a user of the device 101D could manually 78D amend or otherwise review the automatically generated signature file 64D, as desired.
User Assisted Generation of Signature Files 64D (Desktop tagging)
Described is a method of assisted recognition of web page contents that identifies and extracts specific contents from a web page, which could be applied in creating signature files. It is recognized that, in view of
The web page contents may be navigational items, lists, specific items from a list, and other content, for example. This is useful because signature files can be manually created, which can be time consuming and subject to human error. Therefore, by helping to automate the recognition of web page contents, the turn around time for interpreting a website as a database through the gateway 22DD can be substantially faster and more accurate.
The following is an embodiment of the process of assisted capturing of web page contents, such as but not limited to the image, title, description, and price of a product page as shown in
The javascript can have no other knowledge of a web-page, other than confidence intervals to determine the specific fields (image, title, description, and price) of a product, for example. The confidence intervals, further explained below, contain the location on the page (width and height) of each field, and other properties (stated below) that are used to guess a field (i.e. what is the significance/meaning of the field with respect to the content/navigation items contained on the web page). Therefore, confidence levels can be set on a per site basis, but the process used to derive the fields can be the same for every site. This can be done because most ecommerce web sites display products in a similar fashion (e.g. the title is bold; the image is near the middle and large; the description has the most text and is black; the price is highlighted and, when rendered, is within close proximity to the image). Any differences between web sites can be accommodated based on the assisted (e.g. user) nature of the capturing of web page contents. For example, after the initial guess by the javascript, incorrect matches can be altered by the user clicking on the field that was matched incorrectly, then locating the correct match on the page and clicking on that. Once the item is submitted, the confidence intervals are updated based on the fields submitted.
Accordingly, referring to
It is recognized that the assisted recognition of web page contents could also be used to locate any navigational items that are related to the web page content (e.g. a buy button located adjacent to a product, a bid now button located next to an auction item, etc.).
Further, this method of web page recognition can be tuned to capture the key information on a webpage for different genres of sites, for example e-commerce websites, news sites, sports sites, etc. The method can capture the product image, title, price and description from a page and then post the information with the URL of the webpage to a server to store the information for the user for later retrieval and use, e.g. as a rich bookmark. This allows the user to store rich bookmarks that contain more than just the URL of the website. An example of a list of rich bookmarks 99D is shown in
Example Operation
Field Attributes
Image:
Title:
Description:
Price:
Example
Site: http://www.bestbuy.com
Link: http://www.bestbuy.com/site/olspage.jsp?skuld=7731564&type=product&productCategoryId=pcmcat95100050005&id=140392418573
Source: web page of
Referring to
1) User navigates to item page
2) User clicks FatFreeMobile (activation of desire to connect to gateway 22DD)—Save
3) A request is made to fatfreemobile.com (i.e. the gateway 22DD) for the product javascript 95D
4) The FatFree server receives the request
a) The server checks to see if the user is already logged in; if the user is not logged in, the server checks for cookies with the user credentials
b) The server extracts the requesting site from the referrer section of the http request
c) The server attempts to look up the confidence intervals for the site (based on predefined identification criteria 96D).
d) The server dynamically creates the javascript based on the information from steps (a) and (c).
e) The server returns the javascript to the client
5) The client receives the javascript, which initiates variables required to start the engine, and then launches the engine. Code snippet: watPM.watStart(window);
6) The function watPM.watStart(window) performs the following tasks (e.g. based on the identification criteria 96D)
a) Initializes the object's variables
b) Locates the largest rendered frame
c) From the largest frame, all <head> and <body> tags are extracted. Code snippet: getElementsByTagName(‘body’);
d) The remaining tags, e.g. <a> <td>, are extracted. Code snippet: getElementsByTagName(‘body’);
e) A style sheet from FatFreeMobile is then injected into the head of the document
f) Special characters such as &quot; are replaced with their respective rendered characters, i.e. &quot; is replaced with "
g) The gui for FatFreeMobile is injected into the body, as the first element
i. API call document.element.insertBefore(new_element);
h) Step 0 is then called setTimeout(“top.watPM.watStage(0)”, 20);
7) The function setTimeout(“top.watPM.watStage(0)”, 20); performs the following tasks by calling watScriptX( )
a) All script tags that are embedded within the page are removed
i. API call document.removeElement(element);
b) Step 1 is then called setTimeout(“top.watPM.watStage(1)”, 10);
8) The function setTimeout(“top.watPM.watStage(1)”, 10); performs the following tasks by calling watParselt(0). This function looks at all of the tags. However, it only processes 1000 at a time, for example, to help avoid the browser warning message “The javascript is not responding”. For each tag the function performs the following (e.g. based on the identification criteria 96D)
a) Extract the tag name (i.e. <A> <BR> <TABLE>)
b) Ensure the current tag is visible. If the tag is not visible (either of the following styles implies hidden: visibility=hidden, display=none) the tag is ignored.
c) The position of the tag (absolute, relative, etc.) is extracted from its style property
d) If the tag is one of the following it is ignored (‘LINK’, ‘STYLE’, ‘HEAD’, ‘TITLE’)
i. For example <title>Hewlett-Packard—42″ Plasma HDTV—PL4260N</title> is ignored
e) If the position (c) is absolute, and the x coordinate <0 and/or the y coordinate is <0 the element is ignored
i. For example <div id=“kioskMessage” style=“display:none;”> and all of its children are ignored
f) All javascript actions from the given object are cleared (i.e. object.onclick will be set to return false);
i. For example <script language=“JavaScript”>if(isKiosk){var kioskwarning=document.getElementById(“kioskMessage”);kioskwarning.style.display=“block”;strAdHeight2=kioskwarning.offsetHeight;}</script> is removed
g) If the object's tag=IMG or (tag=INPUT and type=image) the object is saved as a candidate for the product's image.
i. For example <img src=“http://images.bestbuy.com:80/BestBuy_US/images/products/7731/7731564_rc.jpg” alt=“ ” border=“0” align=“top”> the product image
ii. For example <img src=“http://images.bestbuy.com:80/BestBuy_US/images/products/7426/7426458_s.gif” alt=“7426458 Front Thumbnail” border=“0” height=“45.0” width=“54.0” align=“center”> not the correct product image, but still an image.
h) If the object's tag is in the following (‘TD’, ‘UL’, ‘P’, ‘DIV’, ‘SPAN’, ‘B’, ‘H1’, ‘H2’, ‘H3’, ‘H4’, ‘H5’, ‘H6’, ‘STRONG’, ‘FONT’, ‘BIG’) and the length of the object's innerHTML code is <1024 (for example) the object is stored as a possible candidate for the product's title, price, and description.
i. For example <td class=“Body-Headline” colspan=2>Hewlett-Packard42″ Plasma HDTV<br></td> the correct title
ii. <b> More Options </b> an incorrect title
iii. <td class=“Body”> Watch all of your favorite high-definition quality broadcasts on this 42″ plasma TV that features SRS . . . </td> the correct description
iv. <td class=“Body” valign=“top”> 16:9 widescreen aspect ratio delivers a cinema-style entertainment experience; 3-2 pulldown for accurate reproduction of film-based sources </td> an incorrect description
v. <div class=“priceblock”>Our Price: $1,199.99<br></div> the correct price
vi. <div class=“priceblock”>Our Price: $99.99<br></div> an incorrect price
i) Step 2 is then called setTimeout(“top.watPM.watStage(2)”, 10);
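The per-tag checks of step 8 can be sketched as follows, operating on plain objects that stand in for rendered elements. The object shape (tagName, style, x, y, type, innerHTML) and the function names classifyTag and collectCandidates are hypothetical. The sketch omits processing the tags in batches of 1000, the clearing of javascript actions, and the propagation of an ignored state to child elements.

```javascript
const IGNORED_TAGS = new Set(["LINK", "STYLE", "HEAD", "TITLE"]);
const TEXT_TAGS = new Set([
  "TD", "UL", "P", "DIV", "SPAN", "B", "H1", "H2", "H3",
  "H4", "H5", "H6", "STRONG", "FONT", "BIG",
]);
const MAX_INNER_HTML = 1024; // example limit from sub-step (h)

// Classify one element as "ignored", "image", "text" or "other".
function classifyTag(el) {
  const tag = el.tagName.toUpperCase();
  const style = el.style || {};
  // (b) hidden elements are ignored
  if (style.visibility === "hidden" || style.display === "none") return "ignored";
  // (d) tags that never hold product fields are ignored
  if (IGNORED_TAGS.has(tag)) return "ignored";
  // (e) absolutely positioned elements placed off-screen are ignored
  if (style.position === "absolute" && (el.x < 0 || el.y < 0)) return "ignored";
  // (g) images and image inputs are candidates for the product image
  if (tag === "IMG" || (tag === "INPUT" && el.type === "image")) return "image";
  // (h) short text containers are candidates for title, price and description
  if (TEXT_TAGS.has(tag) && (el.innerHTML || "").length < MAX_INNER_HTML) return "text";
  return "other";
}

// Split a flat list of elements into image and text candidates.
function collectCandidates(elements) {
  const candidates = { images: [], texts: [] };
  for (const el of elements) {
    const kind = classifyTag(el);
    if (kind === "image") candidates.images.push(el);
    if (kind === "text") candidates.texts.push(el);
  }
  return candidates;
}
```

In a browser, the same checks would be run against the elements returned by getElementsByTagName, with the style values read from each element's rendered style.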
9) The function setTimeout(“top.watPM.watStage(2)”, 10); performs the following tasks by calling watSetTitles( ), which calls watAttrib(hcc,lcc,tcc), (e.g. based on the identification criteria 96D);
i. var hcc=[2,1]; //initial requirements
ii. var tcc=[2]; //post location requirements
iii. var lcc=this.ltitle;
a) All candidates for titles from step 8 are compared with each other. The top 5 (for example) are selected from the following:
i. First, each object is assigned a numeric weight value based on its rendered font weight, and the objects' weights are compared.
1. not defined, normal, and 400=400
2. bold, bolder and >400=700
3. <400=300
ii. Any ties are broken by the objects rendered size. The size is assigned a numeric value based on its rendered size.
1. x pixels=x
2. x pt=4/3*x
3. HN=
a. Tag=H1=2
b. Tag=H2=3/2
c. Tag=H3=9/8
d. Tag=H4=1
e. Tag=H5=13/16
f. Tag=H6=5/8
g. Tag=ELSE=1
4. x %=x*(16/100)*HN
5. x em=x*16*HN
6. xx-small=10
7. x-small=12
8. small=16
9. medium=18
10. large=24
11. x-large=32
12. xx-large=48
13. 1 or −2=10
14. 2 or −1=13
15. 3=16
16. 4 or +1=19
17. 5 or +2=24
18. 6=32
19. 7=48
20. ELSE=12
b) The candidates are then arranged in order based on their distance from the center of the page. The closest to the center would be the first choice, and so on. The center of the page is defined by the confidence intervals
c) Finally the winning candidate is selected by comparing the confidence interval of the most common winner, the confidence interval of the location, and the weight of each object.
d) For example, comparing the correct title and the incorrect title above: both would evaluate to a weight=700. The size of the correct item is larger, so it would be ranked ahead. Next the locality of each object would be compared. Since the correct title is closer to the center it would remain ranked higher. The items would then be re-ranked based on their weight. Since their weights are equal the winner is the correct title.
e) Step 3 is then called setTimeout(“top.watPM.watStage(3)”, 10);
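The weight and size tables of step 9 can be expressed, as a minimal sketch, by the following functions. The names normalizeWeight, normalizeSize and rankTitleCandidates and the candidate object shape (weight, size, tagName, x, y) are hypothetical. Treating unlisted weight keywords as normal is an assumption, and the final selection against the confidence intervals in sub-step (c) is not modelled.

```javascript
// Map a font-weight value to the three levels listed in sub-step (a)(i).
function normalizeWeight(weight) {
  if (weight === undefined || weight === "normal") return 400;
  if (weight === "bold" || weight === "bolder") return 700;
  const n = Number(weight);
  if (Number.isNaN(n)) return 400; // assumption: unlisted keywords count as normal
  if (n > 400) return 700;
  if (n < 400) return 300;
  return 400;
}

// Heading factor HN, keyword sizes and font size attribute values from (a)(ii).
const HEADING_FACTOR = new Map([
  ["H1", 2], ["H2", 3 / 2], ["H3", 9 / 8], ["H4", 1], ["H5", 13 / 16], ["H6", 5 / 8],
]);
const NAMED_SIZE = new Map([
  ["xx-small", 10], ["x-small", 12], ["small", 16], ["medium", 18],
  ["large", 24], ["x-large", 32], ["xx-large", 48],
  ["1", 10], ["-2", 10], ["2", 13], ["-1", 13], ["3", 16],
  ["4", 19], ["+1", 19], ["5", 24], ["+2", 24], ["6", 32], ["7", 48],
]);

// Convert a rendered size to a comparable numeric value.
function normalizeSize(size, tagName) {
  const hn = HEADING_FACTOR.get(tagName) || 1; // Tag=ELSE gives 1
  const value = String(size).trim();
  const m = value.match(/^(\d+(?:\.\d+)?)(px|pt|%|em)$/);
  if (m) {
    const x = Number(m[1]);
    if (m[2] === "px") return x;
    if (m[2] === "pt") return (4 * x) / 3;
    if (m[2] === "%") return x * (16 / 100) * hn;
    return x * 16 * hn; // em
  }
  return NAMED_SIZE.has(value) ? NAMED_SIZE.get(value) : 12; // ELSE=12
}

// Sub-steps (a) and (b): keep the top candidates by weight, with ties broken
// by size, then order them by distance from the page center.
function rankTitleCandidates(candidates, center, top = 5) {
  const scored = candidates.map((c) => ({
    ...c,
    weightValue: normalizeWeight(c.weight),
    sizeValue: normalizeSize(c.size, c.tagName),
  }));
  scored.sort((a, b) => b.weightValue - a.weightValue || b.sizeValue - a.sizeValue);
  const distance = (c) => Math.hypot(c.x - center.x, c.y - center.y);
  return scored.slice(0, top).sort((a, b) => distance(a) - distance(b));
}
```

For instance, two bold candidates both evaluate to a weight of 700, so the candidate with the larger normalized size is ranked ahead before the distance from the center is considered.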
10) The function setTimeout(“top.watPM.watStage(3)”, 10); performs the following tasks by calling watSetDescription( ), which calls watAttrib(hcc,lcc,tcc), (e.g. based on the identification criteria 96D);
i. var hcc=[5,-1]; //initial requirements
ii. var tcc=[]; //post location requirements
iii. var lcc=this.ldesc;
a) all candidates for titles from step 8 are compared with each other. The top 5 (for example) are selected from the following:
i. First, the length of each object's innerHTML (the length of the source HTML code the object contains) is compared. The longer the length, the more likely the object is a description.
ii. Second, the weight of each object is compared; a detailed explanation was provided in step (9). The -1 signifies that a candidate's weight counts as a negative attribute. Therefore, text that is not bold, italic, etc. is more likely to be a description.
b) The candidates are then arranged in order based on their distance from the center of the page. The closest to the center is the first choice, and so on. The center of the page is defined by the confidence intervals.
c) Finally the winning candidate is selected by comparing the confidence interval of the most common winner and the confidence interval of the location.
d) For example, comparing the correct description and the incorrect description above: the length of the correct item is larger, so it would be ranked ahead. Next the locality of each object would be compared. Since the correct description is closer to the center, it would remain ranked higher. The items would then be re-ranked based on their weight, where a stronger weight counts against the item. Since their weights are equal, the winner is the correct description.
e) Step 4 is then called setTimeout(“top.watPM.watStage(4)”, 10);
11) The function setTimeout(“top.watPM.watStage(4)”, 10); performs the following tasks by calling watSetPrice ( ), which calls watAttrib(hcc,lcc,tcc), (e.g. based on the identification criteria 96D);
i. var hcc=[6,9,8,2,1]; //initial requirements
ii. var tcc=[6,9]; //post location requirements
iii. var lcc=this.ldesc;
f) all candidates for titles from step 8 are compared with each other. The top 5 (could change later) are selected from the following:
iii. First, the object's text is searched for a dollar sign ($). Objects that have a dollar sign are ranked higher.
iv. Second, the object's text is cast to a decimal. If the cast is successful, i.e. the text is a number, the element is ranked higher.
v. Third, the object's text is scanned to determine if any numbers exist. If a number is found, the object is ranked higher.
vi. Fourth, the objects' weights are compared. Objects that are bold/italic rank higher.
vii. Fifth, the objects' sizes are compared. The larger the font of the price, the more likely it is the product's price.
g) The candidates are then arranged in order based on their distance from the center of the page. The closest to the center is the first choice, and so on. The center of the page is defined by the confidence intervals.
h) Finally the winning candidate is selected by comparing the confidence interval of the most common winner, the confidence interval of the location, whether or not a $ sign exists, and whether the text is a numeric.
i) For example, comparing the correct price and the incorrect price above: both would evaluate to true when searching for a dollar sign. Neither item is a decimal, as they both contain text. Both would evaluate to true when searched for numbers. Both weights would evaluate to 700. Finally, the sizes of both items are equal. So the items are essentially tied, and since HTML is a top-down language the first item is ranked higher, which in this case is the incorrect item. Next the locality of each object would be compared. Since the correct price is closer to the center, it would now be ranked higher. The items would then be re-ranked based on the dollar sign and decimal tests. Since both items evaluate to be equal, the winner is the correct price.
j) Step 5 is then called setTimeout(“top.watPM.watStage(5)”, 10);
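As an illustration of the price tests in step 11, the following sketch scores candidates on the dollar-sign, decimal-cast and contains-a-number tests and then orders them, falling back to weight and size. The names priceSignals and rankPriceCandidates are hypothetical, and each candidate is assumed to already carry a numeric weight and size. Because the sort is stable, tied candidates keep their document order, mirroring the top-down tie-break described in the example.

```javascript
// Illustrative sketch only: names and candidate shape are assumptions.

// Evaluate the three text-based price tests on a candidate's text.
function priceSignals(text) {
  const t = String(text).trim();
  return {
    hasDollar: t.includes("$"),
    isDecimal: t !== "" && Number.isFinite(Number(t)), // whole text casts to a number
    hasDigit: /\d/.test(t),
  };
}

// Order candidates by the tests in the sequence listed, then weight, then size.
function rankPriceCandidates(candidates, limit = 5) {
  const scored = candidates.map((c) => ({ ...c, ...priceSignals(c.text) }));
  scored.sort((a, b) =>
    (b.hasDollar - a.hasDollar) ||
    (b.isDecimal - a.isDecimal) ||
    (b.hasDigit - a.hasDigit) ||
    (b.weight - a.weight) ||
    (b.size - a.size));
  return scored.slice(0, limit);
}
```

The locality step g) and the final confidence-interval comparison h) are outside the scope of this sketch.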
12) The function setTimeout(“top.watPM.watStage(5)”, 10); performs the following tasks by calling watSetGraphics ( ), which calls watAttrib(hcc,lcc,tcc), (e.g. based on the identification criteria 96D);
a) all candidates for titles from step 8 are compared with each other. The top 5 (could change later) are selected from the following:
i. First find the rendered width and height of the image.
ii. Determine the distance from the center of the page
iii. Compare each object by taking its area minus its distance to the center. The object that yields the larger number is more likely to be the image.
iv. For example, comparing the correct image and the incorrect image above: the area of the correct image is visibly larger than that of the incorrect image, and the correct image is also visibly closer to the center. Denoting the correct image CA and the incorrect image IA, this gives: area of CA − distance to center of CA > area of IA − distance to center of IA. Hence the correct image is chosen.
b) Step 6 is then called setTimeout(“top.watPM.watStage(6)”, 10);
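The image comparison of step 12 can be sketched as follows. The names imageScore and pickImage are hypothetical. The sketch assumes that the distance is measured in pixels from the center of the image to the center of the page, which the description above does not specify.

```javascript
// Illustrative sketch only: names and the distance convention are assumptions.

// Score = rendered area minus distance from the image center to the page center.
function imageScore(img, pageCenter) {
  const area = img.width * img.height;
  const cx = img.x + img.width / 2;
  const cy = img.y + img.height / 2;
  const distance = Math.hypot(cx - pageCenter.x, cy - pageCenter.y);
  return area - distance;
}

// Return the image with the highest score, or null when there are none.
function pickImage(images, pageCenter) {
  let best = null;
  let bestScore = -Infinity;
  for (const img of images) {
    const score = imageScore(img, pageCenter);
    if (score > bestScore) {
      best = img;
      bestScore = score;
    }
  }
  return best;
}
```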
13) The function wataddItem takes the guess for image, title, description, and price and displays them to the user, shown in
14) The user clicks Save
15) A form is posted to FatFreeMobile with the product's image, price, title, and description. In addition, for each field, the x,y location of the field and the guess number are sent to FatFreeMobile.
16) The server receives the request and updates the database accordingly. The server also downloads the selected image, to help avoid hot linking when displaying products.
It is recognized that the above assisted capture method can be used as a method to have one or more distributed users help or otherwise be employed to create portions of a signature file for a web site. For example, a number of users could be assigned different pages from a web site in order to assemble a corresponding signature file for the complete web site, as desired.
Schema Communication Flow
The following description provides an example operation of the interaction between the gateway 22DD, the mobile 24D and desktops 26D, and the web pages 60D obtained from the website 20D, based on the requests for content/navigation from the mobile 24D and desktops 26D (see
Referring to the above
1. A client makes a request to the Schema Engine 23D, acting as a proxy 20D, for a specific webpage 60D from a specific domain (e.g. web site 20D);
2. The engine receives the request and makes a request to the web site 20D for the specified page and retrieves the web page code into memory. This may not include objects on the page such as pictures that are inserted at the time of rendering;
3. The engine in parallel makes a request to the signature repository to acquire the signature file 64D for the domain using the domain in the URL as the key to retrieve the signature file, for example;
4. The engine does not render the page but instead uses the code in the signature file as instructions to extract the desired data from the web page, such that the desired data is defined for a particular request type received by the gateway from the mobile 24D/desktop 26D and/or for a predefined mobile 24/desktop 26 platform (e.g. having knowledge of device display capabilities such as screen size, resolution, and other parameters useful in determining the way in which the data is capable of being displayed on the device 101D);
5. The data can optionally be stored in a local data repository;
6. The engine transmits the data to the client that requested the page; and
7. The client could be a browser application that displays the data or could be an application that renders the data (e.g. see navigational menu 300D example described below).
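The seven steps above can be sketched as a single request handler. This is a minimal illustration under stated assumptions: handleRequest and its injected dependencies (fetchPage, signatures, extract, store) are hypothetical names, and the page fetch and signature lookup run sequentially here for simplicity, whereas step 3 describes them as running in parallel.

```javascript
// Illustrative sketch only: all names are hypothetical.
// deps.fetchPage(url)       -> page source as a string (step 2)
// deps.signatures           -> Map keyed by domain (step 3)
// deps.extract(html, sig)   -> extracted data (step 4)
// deps.store                -> optional Map used as a local repository (step 5)
function handleRequest(url, deps) {
  const { fetchPage, signatures, extract, store } = deps;

  // Step 2: retrieve the page code into memory without rendering it.
  const html = fetchPage(url);

  // Step 3: the domain in the URL is the key for the signature file.
  const domain = new URL(url).hostname;
  const signature = signatures.get(domain);
  if (!signature) throw new Error(`no signature file for ${domain}`);

  // Step 4: use the signature file as extraction instructions.
  const data = extract(html, signature);

  // Step 5: optionally store the data.
  if (store) store.set(url, data);

  // Step 6: return the data to the requesting client.
  return data;
}
```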
Further below is described a section on a detailed explanation on how the Schema Engine understands the signature file syntax and processes a webpage, as proxied between the mobile/desktop and the web site.
It is noted that an example embodiment of the engine 23D and the signature file 64D used to interpret the web page 60D and subsequently send revised/reformatted web page content/navigation data to the screen with limited real estate requirements (e.g. mobile) is provided in Appendix A.
Referring to
It is recognized that the above described steps 206-210 can be for the extraction of web page content/navigation items that are obtained by the engine from the web page 60, using the signature file as a guide for the extraction. It is recognized that the engine can also have a series of formatting rules, not shown, for use with the extracted data in generating a page with the extracted data that is suitable for display on the target device 101D (e.g. desktop, mobile). It is recognized that the formatting rules can be system and/or user defined and can include parameters such as but not limited to: object positioning, object colour, object size, object shape, object font/image characteristics, background style, and navigational item display (e.g. in menu 300D or embedded along with the content in the generated page for display on the target device).
User Interface Optimization by Separating Web Page Content and Navigation
Schema Solution—Gateway 22DD
Although a Schema Engine 23D of the gateway can automatically determine whether to send back menus or content for a given web page, a Schema client (e.g. mobile 24D and/or desktop 26D) has the ability to explicitly request either the navigation menu or the content for a page, and the Schema Engine provides each output accordingly. On each screen of the user device 101D (e.g. mobile or desktop) the user sees either the navigation menus or the page content for the respective page. This method is accomplished using the Schema Engine and signature file, further described below.
This demonstrates how the navigation and content from web pages can effectively be separated and transmitted by the Schema Engine to the schema client.
Packaging Page Content and Navigation Menu Data into a Mobile Application
Page Content
From the above example it is apparent that the Schema Engine is able to output navigation and content data independently as the result of a given web page input. Packaging content into a mobile application (e.g. application hosted by the mobile device or desktop device) entails rendering the data output of the Schema Engine in a client mobile application instead of the web browser. In the current embodiment, as an example, a web browser makes a request for a page and receives the content data as the response that it renders. The mobile application can similarly make a request for the page, as the browser could, and render the data received from the Schema Engine. See
To fully package a website into an application, the navigation functions of the website (browsing) and special features (buy item, check availability, and other buttons/links) can be inserted into a menu 300D of the application, see
The pages marked “2” and “4” in
Navigation Menus 300D
A menu item can be statically created at compile time and its function is known at compile time for web pages. The Schema Engine can dynamically create menu items at run time. Assume that a navigational item is meant to be processed on the client that accomplishes inserting menu items dynamically. If the mobile application was passed the extracted navigational information from the engine, the application would insert the items into the application via MenuItem(name, URL), for example. In this case, it is the engine that would pass the data (name, URL) that indicates the data as a potential menu name and corresponding URL as parameters. The application would insert the data as a menu item into the application menu 300D, such that these parameters would then be linked to the corresponding respective menu item selection. The method described above dynamically inserts navigational items of the web page into the application menu of the application used to interact with the web site contents. The implementation of this method can differ depending on the application and platform of the device 101D. It should be recognized that the menu items are related to navigation of the content in the web pages rather than only between the web pages themselves.
The following steps outline an example process for dynamically inserting menu items into a mobile application menu:
1. The client application makes a request for the navigation items of a web page
2. The Schema Engine receives the web page (marked “1” in
3. The Schema Engine extracts the navigational items and sends the data set (menu name, URL) to the client. Table 1 shows the output for the first 5 navigational items from the web page of
4. The client receives the navigational items and calls a createMenuItem( ) method with each [menu name, URL] set received, thus displaying the navigational items as menu items. It is recognized that the menu items can be displayed overtop of the content displayed on the screen of the device 101D, where the navigational items are no longer displayed adjacent to the content (as formatted in the original web page) and rather assembled/combined and displayed in a separate navigational menu for navigating the content of the web site.
At this point, the navigation items for the page are loaded into the application menu 300D. A user can click a menu item, which will result in the application invoking the URL associated with the menu name and thus facilitating the display of the web site content associated with the menu item (representing the original navigational item). For example, the menu item can be used to invoke one of the navigational items of the web site (e.g. “buy item”), rather than just navigate between pages.
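A minimal sketch of the dynamic menu insertion described above is given below. The class name ApplicationMenu, the load and select methods, and the navigate callback are hypothetical; only the createMenuItem call with a (menu name, URL) pair is taken from the description, and its exact signature is assumed.

```javascript
// Illustrative sketch only: a platform-neutral stand-in for an application menu.
class ApplicationMenu {
  // navigate is a callback that requests the given URL (assumed for illustration).
  constructor(navigate) {
    this.navigate = navigate;
    this.items = [];
  }

  // Insert one navigational item as a menu item.
  createMenuItem(name, url) {
    this.items.push({ name, url });
  }

  // Replace the menu contents with the [menu name, URL] sets received for a page.
  load(navItems) {
    this.items = [];
    for (const [name, url] of navItems) this.createMenuItem(name, url);
  }

  // Invoke the URL linked to the selected menu item.
  select(name) {
    const item = this.items.find((i) => i.name === name);
    if (!item) return false;
    this.navigate(item.url);
    return true;
  }
}
```

Calling load again with the navigational items of the next page rebuilds the menu, which corresponds to each menu being created from the items of the page currently displayed.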
Using this method both content and navigation features can be simultaneously retrieved for a given page. For example in the diagram if the user selects the navigational name “computers” the URL page request will be sent to the Schema Engine that will respond to the client with the content for that page as well as the navigation items (in menu format for example) for that page.
The content is rendered in the application as previously described and the web page navigational items are inserted into the application menu as described above. Accordingly, the contents of the navigational menu 300D for any particular web page is dependent upon the navigational items that are contained or are otherwise associated with that web page as configured by the web site. In this case, both traditional content 50D and the navigational features 54D can be treated as components for each of the web pages. Hence, the web pages of the web site (through use of the signature file described below) can be represented as having web page contents that includes both the content 50D and the navigational items 54D. In this sense, each menu 300D for each page is dynamically created based on the navigational items resident/associated with that page (and page content 50D).
Further, it is recognized that some navigational items 54D can remain on the web page as displayed (e.g. embedded with the displayed content 50D), can be represented as separate menu 300D items, or a combination thereof.
Maintaining a Transactional Session Across Devices 101D
Referring to
It is noted that a cookie can be referred to as a small text file of information 80D that certain Web sites can attach to a user's hard drive (of their device 101D) while the user is browsing the Web site. The cookie can contain information such as user ID, user preferences, archive shopping cart information, etc. Since web sites can be inherently stateless, these cookies or other session history equivalents can also be a good way to create and maintain state from a website's perspective, as implemented by the environment 10D as further described below. Further, a bookmark can be referred to as a process of saving a URL (e.g. network 11D address) in the web browser/application 207D. The bookmark 82D allows the user to return to a particular web site or web page by making a record of the corresponding network address. A bookmark, however, may not capture the state (data entered/requested in the process of transaction completion) of a user's browsing session; rather, the bookmark serves as a reference point for the location of the web page/web site last visited by the user. One can appreciate that a bookmark may capture only a fragment of a user's browsing session, for example only the address of the current page that the user was on.
Accordingly, saving and restoring a user's session can have one or more different components, such as but not limited to: saving and restoring the current page and navigation history 82D; and/or saving and restoring the specific website's transactional state 80D pertaining to the user (e.g. using the respective cookie for the transaction).
Saving a User's Browsing Session
Saving navigation history can be accomplished by saving the current page (saving the URL such as a bookmark would do) and optionally gathering the browser's navigation history. For example on a mobile client, all pages that a user requests can be saved on the client or on a remote server.
For example, when a client browser or application (mobile or desktop) makes an HTTP request, a response comes back in two parts, an HTTP header and the HTTP content. One of the instructions in the HTTP header is a “set cookie” command. A browser or client application uses that command to create and maintain the cookie on the client. When the browser or client application makes a web page request, it can pass all the cookies back to the website to maintain state. Because cookie information can be in plain text in a header, it can readily be extracted by a mobile client application. One way for the browser/application 207D to collect cookies on the desktop/mobile is to use a browser plug-in or state application 88D to retrieve cookies from the “temporary internet folder” of the device 101D, where cookies are typically stored, and transmit them to the remote server or database. Saving cookies is a way to save the user's state from the website perspective.
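Because the cookie information is plain text in the header, its extraction can be sketched in a few lines. The function names parseSetCookie and collectCookies are hypothetical, and the sketch assumes the client has access to the raw response text with the header section separated from the content by a blank line.

```javascript
// Illustrative sketch only: names are hypothetical.

// Split one Set-Cookie value into its name, value and attributes.
function parseSetCookie(value) {
  const [pair, ...attrs] = value.split(";").map((p) => p.trim());
  const eq = pair.indexOf("=");
  if (eq < 0) return null; // not a name=value pair
  const attributes = {};
  for (const a of attrs) {
    if (!a) continue;
    const i = a.indexOf("=");
    const key = (i < 0 ? a : a.slice(0, i)).toLowerCase();
    attributes[key] = i < 0 ? true : a.slice(i + 1);
  }
  return { name: pair.slice(0, eq), value: pair.slice(eq + 1), attributes };
}

// Collect every cookie set in the header section of a raw HTTP response.
function collectCookies(rawResponse) {
  const cookies = [];
  for (const line of rawResponse.split(/\r?\n/)) {
    if (line === "") break; // a blank line ends the header section
    const m = /^set-cookie:\s*(.*)$/i.exec(line);
    if (m) {
      const cookie = parseSetCookie(m[1]);
      if (cookie) cookies.push(cookie);
    }
  }
  return cookies;
}
```

The resulting list could then be transmitted to the remote server or database as the saved information 80D.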
Accordingly, the user's transaction can be saved through use of the history 82D and/or information 80D. For example, if the user wishes to save a particular transaction-in-progress, the user can notify the gateway 22DD of the intention and the gateway can save the history 82D and information 80D in the memory 92D, for later use in reactivating the particular transaction-in-progress. It is recognized that the data captured as a rich bookmark could also be used in the data 80D, 82D as desired.
Restoring a User's Browsing Session
Restoring the current page can be accomplished by making a request by the user for the current page on the client application/browser (mobile or desktop). The gateway is then responsible for sending to the current device 101D (either the same or different device by which the transaction was last done with) a transaction continuance package 84D that is related to the saved particular transaction-in-progress from the memory 92D, which would contain data such as but not limited to: the saved navigation history for use in populating the navigation history of the user device; and/or all saved cookies for use in restoring website state information by placing the cookies into the appropriate location that the browser or client application uses to create and manage cookies, e.g. the “temporary internet folder”.
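The save and restore operations can be sketched together as a small in-memory store standing in for the memory 92D. The class name SessionStore, its methods, and the field names of the package are hypothetical; a real gateway would persist the data and authenticate the user, which this sketch does not attempt.

```javascript
// Illustrative sketch only: an in-memory stand-in for the gateway memory.
class SessionStore {
  constructor() {
    this.sessions = new Map();
  }

  // Save the current page, navigation history and cookies for a user.
  save(userId, { currentUrl, history = [], cookies = [] }) {
    this.sessions.set(userId, {
      currentUrl,
      history: [...history],
      cookies: cookies.map((c) => ({ ...c })),
    });
  }

  // Build a continuance package for whichever device the user resumes on.
  restore(userId) {
    const s = this.sessions.get(userId);
    if (!s) return null;
    return {
      currentUrl: s.currentUrl,
      history: [...s.history],
      cookies: s.cookies.map((c) => ({ ...c })),
    };
  }
}
```

A session saved from the desktop can then be restored on the mobile device by requesting the package for the same user, after which the client places the cookies where its browser or application manages them.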
One aspect is that the application 88D could synchronize all cookies from the desktop to the mobile device or vice versa. This way, user preferences for all web sites (including remembered login ids, for example) could be always synchronized between a mobile device and the desktop. Further, it is recognized that the memory 92D could be used to remember the device on which the transaction-in-progress was last implemented and to therefore try to maintain the formatting of web pages 86D as displayed previously for the user activity with respect to the transaction-in-progress. One example of this is to keep the simplified formatting of the web pages done for the mobile display the same for display of similar pages on the desktop, even though sufficient desktop screen space is available to display the original content and format of the web pages. Similarly, for transactions started on the desktop, the continuance of the web pages on the mobile, with respect to desktop formatted webpages, could be retained (e.g. through re-organization of the pages and wrapping of content around the screen, or use of the WAP standard to spatially divide a page (usually vertically) into a number of pages and allow the user to navigate between each page section to view a page). The maintaining of the look and feel of the particular web page content could be useful in keeping the user from becoming confused between format changes of the web pages. Further, for example, the user could select a certain web page format for display through the gateway (e.g. original or otherwise simplified format), in the event that the user anticipates changing devices (e.g. desktop to mobile) to continue and complete the transaction, as desired.
Another extension of the concept of saving the transaction-in-progress is that variables such as an affiliate revenue sharing code can be included in the URL. That way, the user can start browsing from a PC or mobile device and save their session based on the code. When they restore the session on another PC or mobile device, the revenue share would be received by the appropriate entity based on the code usage.
Caching
There can be cache points on the engine as well as the client. The cache can consist of the actual webpage or the data output of the webpage. Caches can be built upon request, or output can be pre-cached to optimize the user experience. A combination of the above on different kinds of pages can be used to develop caching schemes for usage. Another aspect of the engine is that it can be used to crawl an entire website with the corresponding signature file and build a complete database of product information from the website automatically.
Pre-Caching (Offline Synchronization) of Website Content to a Mobile Client
Another aspect is the ability for a user to load website content (pre-caching) from the Schema Engine to the client in larger segments instead of page by page. This could either be done through an application on an internet enabled PC when the mobile device is connected to the PC or directly from the mobile device when the user has a wireless data connection available. Once the desired content is on the mobile device, the user could browse the content without a wireless connection.
In the current examples, when a user selects a menu item from a menu page, for example “computers” in
Referring to
1. A client (1) makes a request to the Schema Engine(4), acting as a proxy, for a specific webpage (2) from a specific domain
2. The engine (4) receives the web page code (2) into memory. This typically does not include objects on the page such as pictures that are inserted at the time of rendering.
3. The engine (4) in parallel makes a request to the signature repository (3) to acquire the signature file for the domain using the domain in the URL as the key to retrieve the signature file
4. The engine does not render the page but instead uses the code in the signature file as instructions to extract the desired data from the web page (6).
5. The data can optionally be stored in a data repository (5)
6. The engine (4) transmits the data to the client (1) that requested the page
7. The client (1) could be a browser application that displays the data or could be an application that renders the data
Schema Engine Detailed Walk Through
Assume that a client makes a request for the ABC ComTech Corp. page shown in
Code Snippet of ABC ComTech Corp. Page Shown in
Code Snippet of ABC ComTech Corp. Signature File Retrieved by Schema Engine
Step 1: Schema Engine Confirms that Input HTML is from ABC ComTech Corp.ca and this Signature File is that of ABC ComTech Corp.ca
Step 2: Schema Engine Sets a Required Global Variable to Append “&test%5Fcookie1” to all requests and sets main index to http://www.ABC ComTech Corp. .ca/home.asp?newlang=EN&logon=&langid=EN
Step 3: Schema Engine then Tries to Determine the Page Type by Checking Existence of String Identifiers for Each Page Family
Schema Engine does not find “Sort or compare products” or “Sort products” in the web page so this page is not from the List family. The engine continues to check the next string.
Schema Engine finds “"product-details-prd-title"” and identifies the page as part of the Item family (item_elements).
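The page family check of Step 3 can be sketched as an ordered scan over registered identifier strings. The function name identifyPageFamily and the shape of the families list are hypothetical; the identifier strings and the family order are those of the walk-through above.

```javascript
// Illustrative sketch only: names and data shape are assumptions.

// Return the id of the first family with an identifier string present in the page.
function identifyPageFamily(html, families) {
  for (const family of families) {
    if (family.identifiers.some((s) => html.includes(s))) return family.id;
  }
  return null; // no family matched
}

// Families are checked in order: List first, then Item.
const families = [
  { id: "list_elements", identifiers: ["Sort or compare products", "Sort products"] },
  { id: "item_elements", identifiers: ['"product-details-prd-title"'] },
];
```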
Step 4: Schema Engine then jumps to the “item_elements” section of the signature file that contains instructions for extracting the object elements for the page
Step 5: Schema Engine Trims HTML Scope
The engine discards all code before “</head>” setting the upper limit.
Step 6: Schema Engine Extract Image
The Schema Engine returns the string in between the first “<img src="” and “"” that appears after the next appearance of “largeimageref”. The string returned is the path to the product image.
Step 7: Schema Engine Extracts Title
Then the Schema Engine returns the string in between the first “<span” and “</span>”, including the first and last element, that appears after the next appearance of “product-details-prd-title”, excluding any markup language. The string returned is the title.
Step 8: Schema Engine Extracts Price
Then Schema Engine returns the string in between first “<td” and “</td>” including first and last element that appears after next appearance of “our price:”, excluding any mark up language. The string returned is the price.
Step 9: Schema Engine Extracts Sale Price
ABC ComTech Corp. Web Page Code Snippet:
Then the Schema Engine returns the string in between the first “<td” and “</td>”, including the first and last element, that appears after the next appearance of “sale price:”, excluding any markup language. The string returned is the sale price.
Step 10: Schema Engine Extracts Description
Decked out with an impressive 17″ Acer CrystalBrite widescreen display the Aspire 9300 enhances multitasking productivity and gaming pleasure. <a HREF=“#MoreInfo”>More Info</a>
</p>
Then Schema Engine returns the string in the middle of “<p” and “</p>” including first and last element on the occurrence of “detailbox-text”, excluding any mark up language. The string returned is the description.
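The extraction pattern shared by Steps 6 through 10 (find a reference string, then take the text between a start and an end string that follow it, optionally stripping markup) can be sketched as below. The function name extractAfter and its option names are hypothetical, and the tag stripping uses a simple regular expression that is adequate for illustration but not for arbitrary HTML.

```javascript
// Illustrative sketch only: names are hypothetical.
// Returns the text between `start` and `end` that follows the first `ref`,
// or null if any of the three strings is missing.
function extractAfter(html, ref, start, end, opts = {}) {
  const refAt = html.indexOf(ref);
  if (refAt < 0) return null;
  const s = html.indexOf(start, refAt + ref.length);
  if (s < 0) return null;
  const e = html.indexOf(end, s + start.length);
  if (e < 0) return null;

  // Optionally keep the start and end strings themselves in the result.
  let out = opts.includeDelimiters
    ? html.slice(s, e + end.length)
    : html.slice(s + start.length, e);

  // Optionally remove any markup from the extracted value.
  if (opts.stripTags) out = out.replace(/<[^>]*>/g, "");
  return out.trim();
}
```

For the title, price and description, the sketch would be called with both options enabled, since the start string is only the opening of a tag and the markup is to be excluded from the returned value.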
Step 11: Schema Engine Assembles and Returns all Extracted Data
Signature File
Example Source (*.ffs)
ABC ComTech Corp. Page Family Signature Explanation
The Schema Engine processes the <page type> tag by registering the identification strings for each page family. By doing that, when a webpage is sent to the engine as input, the engine is able to identify the page family by its unique string.
action=“locate_string”
command to check for the existence of a string
name=”
identifies the type of page family for each identified family
id=”
assigns an id to the page family that is used across the signature file
When the Schema Engine is passed a web page and the signature file, the first step is to identify the page type which then instructs the engine to the corresponding list_elements tag for the page family.
ABC ComTech Corp. List Family Signature Explanation
Once the engine has identified that the page is of the “mylist_1” family, the engine finds the spots in the signature file that contain the signature for the objects and elements of the family.
<paging> . . . </paging>
Contains paging attributes of the mylist_1 family. The tags contain instructions to find the number of pages on the list page and to generate the links for each of the pages.
<actions> . . . </actions>
The action tag instructs the engine to move the scan pointer to the section on the page right before the main list content of the page. This allows the engine to only scan the relevant area, discarding all the code preceding it. This can be important because it can eliminate ambiguity and repetition by instructing the engine on precisely which parts of the page to scan
<element> . . . </element>
Explanation of the Lookup Command:
Lookup type=“pex”: string lookup
Action=“get_string”: this action type returns a value that is the desired element of the object.
Name=“link”: the object element, in this case the link to the product page
ref=“thumbnail”: the reference string that identifies where to find the value of the link
location=“before”: The value of the link is before the ref string
start=“<a href="”: look for the ref string after this value
end=“">”: look for the ref string before this value
Line 11 for example instructs the engine to look for a reference of the string “thumbnail”, then locate the value between the start and end strings specified to the left of reference point. The element, which is the link to the product page in this case, is before the reference string and its value is to be extracted and returned.
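The directional behaviour of the lookup command can be sketched as follows. The function name lookup is hypothetical; the property names ref, location, start and end follow the attributes explained above. For location “before” the sketch searches leftwards from the reference string for the nearest start string, which is one plausible reading of the description rather than a confirmed detail of the engine.

```javascript
// Illustrative sketch only: one plausible reading of the lookup attributes.
// `from` is the current scan pointer position.
function lookup(html, spec, from = 0) {
  const refAt = html.indexOf(spec.ref, from);
  if (refAt < 0) return null;

  let s;
  if (spec.location === "before") {
    // Nearest start string that ends at or before the reference point.
    s = html.lastIndexOf(spec.start, refAt - spec.start.length);
    if (s < from) return null;
  } else {
    // First start string after the reference string.
    s = html.indexOf(spec.start, refAt + spec.ref.length);
    if (s < 0) return null;
  }

  const e = html.indexOf(spec.end, s + spec.start.length);
  if (e < 0) return null;
  if (spec.location === "before" && e > refAt) return null; // value must precede ref
  return html.slice(s + spec.start.length, e);
}
```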
The last lookup with action=“move_ptr” in the element tag instructs the engine to move the pointer past the first object to get ready to repeat the instructions to scan in the element of the second object on the list page.
Note: If you attach “advance_ptr” to a lookup, this will also advance the pointer (this can be used if ordering in list page exists)
ABC ComTech Corp. Search Family Signature Explanation
Once the engine has identified that the page is of the “mysearch_1” family, the engine finds the spots in the signature file that contain the signature for the objects and elements of the family, shown above.
<settings> . . . </settings>
Contains any page specific manual overrides, such as excluding certain menu items or any customization or modification of a menu that may need to be done. In this example, the value of the form variable “keyword” will be posted to “http://www.ABC ComTech Corp..ca/search/searchresult.asp?logon=&langid=EN&search=KWS”
<paging> . . . </paging>
Manages Paging for the Search Pages
<actions> . . . </actions>
Instruct the engine to move the scan pointer to the string “bg-compare-hero” and start looking for elements from there
<element> . . . </element>
Contains lookup instructions for each object element as previously described.
ABC ComTech Corp. Menu Family Signature Explanation
Once the engine has identified that it is looking for a menu on a page that contains the menu style of the “mymenu_1” family, the engine finds the spots in the signature file that contain the signature for the objects and elements of the family, shown above.
<settings> . . . </settings>
Contains any page specific manual overrides such as exclude list, customization, modification, personalization, etc. In this example, any result that matches “Site Index”, “ReClaim & Insurance Replacement” are excluded but partial matches are also possible by using wild card strings.
<action> . . . </action>
Lines 6 and 7 set the start limit and end limit to instruct the engine on where to look for menu items.
<element> . . . </element>
Contains lookup instructions for each object element as previously described. In this example, an element in ‘mymenu_1’ (each individual menu entry of the webpage) contains link and title as its properties. Line 12 instructs the engine to move the pointer to “</li>” to get ready to loop through and extract the next menu item with the same elements.
ABC ComTech Corp. Content/Item Family Signature Explanation
Once the engine has identified that the page is of the “myitem_1” family, the engine finds the spots in the signature file that contain the signature for the objects and elements of the family, shown above.
<action> . . . </action>
Instructs the engine to move the scan pointer to the appropriate spot to get ready to scan and output the product elements.
<element> . . . </element>
Contains lookup instructions for each of the defined fields in a product. In this example, an object in ‘myitem—1’ (an item) contains the elements image, title, price, sale price and description. Note that the pointer does not need to be moved after scanning in the elements since there are no more objects on the product detail page as there are on the list page.
A detailed walk-through of this family is provided above.
Appendix: Signature Engine Syntax
Lookup Syntax
A lookup is the query that the signature engine runs against the website to obtain one or more result sets.
Type
Defines the data type of a reference. One lookup can contain multiple references, each with its own type; the type of the nth reference is represented by ‘Type_n’.
Action
Name
An element of an object, for example price if the object is a product.
Id
Id is a named relation for an identified family of web pages. It is the SQL equivalent of a table name.
Ref
Ref is the string reference being matched to identify a page, object or element. It is equivalent to a WHERE condition in SQL. There can be multiple Ref values, where the default Ref is represented by ‘Ref’ and subsequent Refs are represented by ‘Ref_n’. This is equivalent to having multiple conditions in an SQL WHERE clause.
Alt
Alt is an extension of the Ref object, the SQL equivalent of an ‘OR’ clause, represented by ‘Ref_Alt_n’.
Location
Extends the SQL equivalent of the WHERE clause to give directional containment of the data set.
Start
Start is used to specify the beginning of a given data element.
End
End is used to specify the end of a given data element.
Include_sz
Boolean to define whether the ‘Start’ and ‘End’ values are included in the extracted value.
Tolerance
Boolean to define whether failure of a given lookup excludes the entire query for an object.
Strip_tags
Boolean to define whether to strip all HTML tags out of the target value being extracted.
Notrim
Values will not be trimmed (leading and trailing spaces will not be removed).
Upper
Value is converted to all upper case.
Lower
Value is converted to all lower case.
Uppercase_word
First character of each word in a value is converted to upper case.
Uppercase_first
First character of a value is converted to upper case.
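By way of non-limiting illustration, the Start, End and Include_sz keys and the value-formatting flags listed above can be sketched in Python. The function names `extract` and `post_process`, the order in which the flags are applied, the regular expression used to strip tags, and the sample markup are all assumptions made for illustration; Tolerance, Ref, Alt and Location are not modeled.

```python
import re

def extract(text, start, end, include_sz=False):
    """Return the text between start and end, or None if not found.
    With include_sz, the start and end strings are kept in the result."""
    i = text.find(start)
    if i == -1:
        return None
    j = text.find(end, i + len(start))
    if j == -1:
        return None
    if include_sz:
        return text[i:j + len(end)]
    return text[i + len(start):j]

def post_process(value, strip_tags=False, notrim=False, upper=False,
                 lower=False, uppercase_word=False, uppercase_first=False):
    """Apply the formatting flags to an extracted value (assumed order)."""
    if strip_tags:
        value = re.sub(r"<[^>]+>", "", value)   # drop anything tag-like
    if not notrim:
        value = value.strip()                   # trimming is the default
    if upper:
        value = value.upper()
    if lower:
        value = value.lower()
    if uppercase_word:
        # Capitalize the first character of each whitespace-separated word.
        value = re.sub(r"(^|\s)(\S)",
                       lambda m: m.group(1) + m.group(2).upper(), value)
    if uppercase_first:
        value = value[:1].upper() + value[1:]
    return value

# Hypothetical product-page fragment.
html = '<td class="price"> <b>sale price: $19.99</b> </td>'
raw = extract(html, '<td class="price">', '</td>')
print(post_process(raw, strip_tags=True, uppercase_word=True))
```

With these inputs, the printed value is “Sale Price: $19.99”; passing `notrim=True` instead would preserve the leading and trailing spaces left after the tags are stripped.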
Page Syntax
Makes a page family's paging feature functional.
Page_variable
Defines the unique key for a family's paging feature.
Page_start
Defines value of first page in a family's paging feature.
Page_post
Path to which the paging variable(s) must be transmitted.
Search Syntax
Makes a website family's search feature functional.
Search_path
Path to which the search variable must be transmitted.
Search_variable
Name of the search variable that a website's search feature expects to read, request, post, etc.
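By way of non-limiting illustration, the paging and search keys above can be combined into request addresses as sketched below in Python. The dictionary `FAMILY`, its values, the host name, and the functions `page_url` and `search_url` are hypothetical; the sketch also assumes that the variables are transmitted as query-string parameters, whereas a given website may instead require them to be posted.

```python
from urllib.parse import urlencode, urljoin

# Hypothetical settings for one family; the key names mirror the syntax above.
FAMILY = {
    "page_variable": "page",        # Page_variable
    "page_start": 1,                # Page_start
    "page_post": "/products/list",  # Page_post
    "search_path": "/search",       # Search_path
    "search_variable": "q",         # Search_variable
}

def page_url(base, family, offset=0):
    """Address of the page that is `offset` pages after the first page."""
    query = urlencode({family["page_variable"]: family["page_start"] + offset})
    return urljoin(base, family["page_post"]) + "?" + query

def search_url(base, family, term):
    """Address that submits `term` to the family's search feature."""
    query = urlencode({family["search_variable"]: term})
    return urljoin(base, family["search_path"]) + "?" + query

base = "http://www.example.com"
print(page_url(base, FAMILY))               # first page
print(page_url(base, FAMILY, offset=2))     # third page
print(search_url(base, FAMILY, "digital camera"))
```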
General Syntax
Any variable name and value can be defined.
The present specification claims the benefit of priority from U.S. Provisional Patent application 60/924,503 filed May 17, 2007, the contents of which are incorporated herein by reference.
Number | Date | Country | |
---|---|---|---|
60924503 | May 2007 | US |