Web pages provide an inexpensive and convenient way to make information available to other individuals including, for example, consumers of products, students, and media enthusiasts. However, as the inclusion of multimedia content, embedded advertising, and online services becomes increasingly prevalent in modern web pages, the web pages themselves have become substantially more complex. For example, in addition to their main content, many web pages display auxiliary content such as background imagery, advertisements, navigation menus, and links to additional content, among others.
It is often the case that web page owners, web page developers, or individuals that visit web pages wish to utilize only a portion of the information presented in a web page. Automatic selection of desired content in web pages can eliminate extraneous or undesired content and significantly streamline a number of workflows. For instance, a user may desire to print a physical copy of an article located at an online news website without reproducing any of the other content on the web page containing the article, such as advertising, links to other content, etc. Similarly, an owner of a web page may wish to adapt a web page into another document, such as a marketing brochure, without including content in the web page that is superfluous to the new document. Additionally, a user may wish to display only the most relevant web content on a computing device that has a limited screen size such as a mobile smart phone. Other applications that may benefit from automatic selection of desired content in web pages include, for example, search, information retrieval, information management, archiving, and other applications.
The accompanying drawings illustrate various embodiments of the principles described herein and are a part of the specification. The illustrated embodiments are merely examples and do not limit the scope of the claims.
Throughout the drawings, identical reference numbers designate similar, but not necessarily identical, elements.
The present specification discloses systems and methods of creating applications for popular web page content. As discussed above, there are many applications where automatically selecting one or more portions of a web page can be advantageous. For purposes of explanation, the specification uses the illustrative example of selecting popular portions of a web page to create applications for popular web page content. Currently, when a web page is printed or displayed, it includes a variety of contents. For example, in addition to the main content, many web pages display content such as background imagery, advertisements, navigation menus, headers/footers, and links to additional content, among others. Some of the content of a web page may be print worthy, but the user may not want to print some or all of the auxiliary content. Ideally, the present system and method may access web page data associated with a web page, the web page data comprising a popular selection of content on the web page, create an application for the popular selection of the content on the web page, and present the popular selection of content of the web page to a user for printing, viewing, archiving, or any other useful purpose via the application.
As used in the present specification and in the appended claims, the term “web page” is meant to be understood broadly as any document that can be retrieved from a server over a network connection and viewed in a web browser application. For example, a web page may be a document accessed by a Uniform Resource Locator (URL) on the World Wide Web over a network such as the Internet. Further, as used in the present specification and in the appended claims, the term “web page data” is meant to be understood broadly as any data relating to a web page. For example, web page data may include the web page's Uniform Resource Locator (URL); the web page's Document Object Model (DOM); information relating to the structure and layout of a Document Object Model (DOM) tree of the web page; the layout and structure of any nodes within the Document Object Model (DOM) tree; content of a web page or nodes previously or currently selected by a viewer within a Document Object Model (DOM) tree; content of a web page or nodes not previously or currently selected by a viewer within a Document Object Model (DOM) tree; any data relating to the amount or characteristics of any type of content of the web page selected or not selected by an individual or entity; or combinations of these. Web page data may additionally include any metadata associated with or describing any of the above mentioned types of data. Still further, web page data may also include any data or metadata relating not only to the content of a web page an individual has selected from any one web page in the past, but may also include information relating to when and how often the viewer had previously viewed, utilized, or adapted a web page or content on a web page.
Still further, as used in the present specification and in the appended claims, the term “similar web page” or similar language is meant to be understood broadly as any web page having similar characteristics as compared to another web page. For example, a similar web page may be similar in the type of template used to arrange the text, images, or other content displayed on the web page. A similar web page may also be similar because, although the web page address or Uniform Resource Locator (URL) is not entirely identical, the domain name within the Uniform Resource Locator (URL) is the same. Additionally, a similar web page may be similar in the content displayed on the web page.
Additionally, as used in the present specification and in the appended claims, the term “user” is meant to be understood broadly as any person viewing or otherwise utilizing a web page. Therefore, an owner or administrator of a web page, a user of a computing system having accessed a web page, or any other person may be a viewer or user. Still further, as used in the present specification and in the appended claims, the term “user desirable content” is meant to be understood broadly as that content on a web page that a user or viewer wishes to view, utilize, or adapt for any purpose. Indeed, the present specification may refer to “desirable” content within a web page that is meant to be understood as those sections of text, images, or any other content on a web page that the user may generally wish to view, utilize, or adapt.
Still further, as used in the present specification and in the appended claims, the term “other users” or “crowd” is meant to be understood broadly as any number of people, including one person, other than the user as described above. Further, as used in the present specification and in the appended claims, the terms “crowd consensus” or “popular selection” are meant to be understood broadly as any method and associated algorithms that aggregate the statistical distribution of what parts of a web page have been selected previously, and determine what portions of the web page are considered to be most popular or are part of a consensus of one or more people. For example, the crowd consensus or popular selection may be determined by a frequency count, a voting scheme, a weighted counting scheme, a ranking of a type of selection, or combinations thereof, among others. In one example, a crowd consensus or popular selection may be made by any number of persons including, for example, a user, other users, or combinations of these. Also, a crowd consensus or popular selection may be based on, for example, how often a portion of a web page was selected, what portion or portions of a web page were selected, how consistently a particular portion of a web page was selected, various types of statistical correlations between how related portions of a web page were selected, the weight of the portions of the web pages that were selected, a rank of a type of selection made within the web page, or combinations thereof, among others.
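By way of illustration only, a frequency count combined with a weighted counting scheme might be sketched as follows. The function name `popular_selection`, the `min_share` threshold, the per-user weights, and the node labels are all hypothetical and are not taken from the specification; other consensus schemes named above would be equally valid.

```python
from collections import defaultdict


def popular_selection(selections, weights=None, min_share=0.5):
    """Return the nodes whose weighted selection share reaches min_share.

    selections: one collection of node labels per user (hypothetical labels).
    weights:    optional per-user weights; defaults to one vote per user.
    """
    if not selections:
        return set()
    if weights is None:
        weights = [1.0] * len(selections)
    totals = defaultdict(float)
    for nodes, weight in zip(selections, weights):
        for node in set(nodes):  # count each node at most once per user
            totals[node] += weight
    threshold = min_share * sum(weights)
    return {node for node, total in totals.items() if total >= threshold}


# Hypothetical selections made by three users on the same web page.
crowd = [
    {"article_text", "main_image"},
    {"article_text", "advertisement"},
    {"article_text", "main_image", "image_subtitle"},
]
print(sorted(popular_selection(crowd)))
```

With equal weights, only the portions selected by at least half of the users are kept; raising `min_share` or supplying unequal weights changes which portions count as popular.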
Further, as used in the present specification and in the appended claims, the term “app” or “application” is meant to be understood broadly as any computer program or programs, or any component or components of machine readable instructions (including software), that, when executed by a processor, provide functionality in direct support of a specific process or processes. In one example, an app or application may be a lightweight application, that is, a smaller application comprising fewer machine readable instructions (such as software components) or using less memory for storage in a data storage device.
Even still further, as used in the present specification and in the appended claims, the term “sub-node” is meant to be understood broadly as any node within a Document Object Model (DOM) tree that has at least one node located on a higher level in the hierarchal order of the Document Object Model (DOM) tree. Therefore, a sub-node may be a sub-node of a node that is itself a sub-node. Additionally, a sub-node may also comprise a number of sub-nodes itself.
In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present systems and methods. It will be apparent, however, to one skilled in the art that the present apparatus, systems and methods may be practiced without these specific details. Reference in the specification to “an example” or similar language means that a particular feature, structure, or characteristic described in connection with the example is included in at least that one example, but not necessarily in other examples. The various instances of the phrase “in one example” or similar phrases in various places in the specification are not necessarily all referring to the same example.
Referring now to
The client device (105) of the present example is a computing device that retrieves web page data associated with the web page (110) hosted by the web page server (115). The client device further creates an application for the popular selection of the content on the web page, and presents the popular selection of content of the web page to a user for printing, viewing, archiving, or any other useful purpose via the application. In one example, the client device (105) is a printer with the capability of creating such an application, and printing a physical copy of the popular selection of content of the web page. In still another example, the client device (105) may be a desktop computer with the capability of creating such an application, and displaying the popular selection of content of the web page on an output device of the desktop computer.
In another example, the client device (105) is a mobile computing device such as a mobile phone, personal digital assistant (PDA), or a laptop computer with the capability of creating such an application, and displaying the popular selection of content of the web page on a display device of the mobile computing device. In this example, the display device of the mobile computing device may be a smaller display device than that of, for example, a desktop computer. Thus, having an application that runs on the mobile computing device that displays the popular selection of content of the web page (110) provides for better use of the limited space provided by the display device of the mobile computing device.
The client device may collect and save web page data associated with the selection of portions of web pages, and determine the most user desirable content of the web page (110) based, at least partially, on a popular selection by other users' or a “crowd's” previous selections of text, images, and other content on the web page, web pages that are similar to the web page, or other web pages. In the present example, this is accomplished by the client device (105) requesting the web page (110) from the web page server (115) over the network (120) using the appropriate network protocol (e.g., Internet Protocol (“IP”)), and requesting web page data from a popular selection data storage device (117). Illustrative processes for identifying the most user desirable content of the web page (110) are set forth in more detail below.
To achieve its desired functionality, the client device (105) includes various hardware components. Among these hardware components may be at least one processor (125), at least one data storage device (130), peripheral device adapters (135), and a network adapter (140). These hardware components may be interconnected through the use of one or more busses and/or network connections. In one example, the processor (125), data storage device (130), peripheral device adapters (135), and a network adapter (140) may be communicatively coupled via bus (107).
The processor (125) may include the hardware architecture that retrieves executable code from the data storage device (130) and executes the executable code. The executable code may, when executed by the processor (125), cause the processor (125) to implement at least the functionality of retrieving the web page (110), collecting and saving web page data associated with the selection of portions of web pages, determining the most user desirable or popular content of the web page (110), and creating an application that provides the most user desirable or popular content of the web page (110) upon execution of the application according to the methods of the present specification described below. In the course of executing code, the processor (125) may receive input from and provide output to one or more of the remaining hardware units.
The data storage device (130) may store data such as web page data that is processed and produced by the processor (125) or other processing device. As will be discussed, the data storage device (130) may specifically save web page data including, for example, a web page's Uniform Resource Locator (URL), Document Object Model (DOM) tree, popular selections of content in a web page, and sections of content in a web page a user has selected. All of this data may further be stored in the form of a database for easy retrieval when the same or a similar web page is once again accessed by a user.
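As a non-limiting sketch of how such web page data might be kept in a database for later retrieval, the following uses an SQLite table. The table name `selections`, its columns, and the helper names `open_store`, `save_selection`, and `selection_counts` are hypothetical choices made for illustration; the specification does not prescribe a schema or a database engine.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a store of per-user selections; the schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS selections ("
        " url TEXT NOT NULL,"
        " node_path TEXT NOT NULL,"
        " user_id TEXT NOT NULL)"
    )
    return conn


def save_selection(conn, url, node_path, user_id):
    """Record that user_id selected the DOM node at node_path on the page at url."""
    conn.execute(
        "INSERT INTO selections (url, node_path, user_id) VALUES (?, ?, ?)",
        (url, node_path, user_id),
    )
    conn.commit()


def selection_counts(conn, url):
    """Return {node_path: number of distinct users who selected it} for one page."""
    rows = conn.execute(
        "SELECT node_path, COUNT(DISTINCT user_id) FROM selections"
        " WHERE url = ? GROUP BY node_path",
        (url,),
    )
    return dict(rows.fetchall())
```

Keying each row by URL allows the saved selections to be looked up again when the same web page is next accessed.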
The data storage device (130) may include various types of memory modules, including volatile and nonvolatile memory. For example, the data storage device (130) of the present example includes Random Access Memory (RAM), Read Only Memory (ROM), and Hard Disk Drive (HDD) memory. Many other types of memory are available in the art, and the present specification contemplates the use of many varying types of memory in the data storage device (130) as may suit a particular application of the principles described herein. In certain examples, different types of memory in the data storage device (130) may be used for different data storage needs. For example, in certain examples the processor (125) may boot from Read Only Memory (ROM), maintain nonvolatile storage in the Hard Disk Drive (HDD) memory, and execute program code stored in Random Access Memory (RAM).
Generally, the data storage device (130) may comprise a computer readable storage medium. For example, the data storage device (130) may be, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of the computer readable storage medium may include, for example, the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The hardware adapters (135, 140) in the client device (105) enable the processor (125) to interface with various other hardware elements, external and internal to the client device (105). For example, peripheral device adapters (135) may provide an interface to input/output devices, such as, for example, output device (150), to create a user interface and/or access external sources of memory storage, such as, for example, popular selection data storage device (117). As will be discussed below, an output device (150) may be provided to allow a user to interact with and adjust the amount and type of content selected within a web page (110).
Peripheral device adapters (135) may also create an interface between the processor (125) and a printer, display device, or other media output device. For example, in an example where the client device (105) is a printer, the printer may create one or more physical copies of the popular selection of web page content. Further, in an example where the client device (105) is a mobile computing device, the mobile computing device may display the popular selection of web page content. Still further, in an example where the client device (105) is a desktop computer, the desktop computer may select user desirable content of the web page (110) and instruct a communicatively coupled printer to create one or more physical copies of the popular selection of web page content. A network adapter (140) may additionally provide an interface to the network (120), thereby enabling the transmission of data to and receipt of data from other devices on the network (120), including the web page server (115) and popular selection data storage device (117).
The popular selection data storage device (117) may be any data storage device that stores web page data associated with popular selections of web page content of a number of web pages. Generally, the popular selection data storage device (117) may comprise a computer readable storage medium. For example, the popular selection data storage device (117) may be, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of the computer readable storage medium may include, for example, the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The popular selection data storage device (117) may, in place of or in conjunction with the client device (105), collect and save web page data associated with the selection of portions of web pages, and determine the most user desirable content of the web page (110) based, at least partially, on a popular selection by other users' or a “crowd's” previous selections of text, images, and other content on the web page, web pages that are similar to the web page, or other web pages.
The network (120) may comprise two or more computing devices communicatively coupled. For example, the network (120) may include a local area network (LAN), a wide area network (WAN), a virtual private network (VPN), and the Internet, among others.
In the example shown in
The Main Column (215) sub-node also includes two sub-nodes itself, Left Column (235) sub-node and Right Column (255) sub-node, at the next hierarchal level. Left Column (235) sub-node has three sub-nodes at the lowest hierarchal level: Main Image (240) sub-node, Image Subtitle (245) sub-node, and Article Synopsis (250) sub-node. The Right Column (255) sub-node has one sub-node at the lowest hierarchal level: Article Text (260) sub-node.
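Purely for illustration, the portion of the tree rooted at the Main Column sub-node might be modeled as follows. The `Node` class and its `sub_nodes` and `leaves` methods are hypothetical helpers rather than part of any DOM API; only the node names are taken from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a DOM node: a name plus child nodes."""
    name: str
    children: list = field(default_factory=list)

    def sub_nodes(self):
        """Yield every node below this one, depth first."""
        for child in self.children:
            yield child
            yield from child.sub_nodes()

    def leaves(self):
        """Yield the sub-nodes that have no sub-nodes of their own."""
        return (node for node in self.sub_nodes() if not node.children)


main_column = Node("Main Column", [
    Node("Left Column", [
        Node("Main Image"),
        Node("Image Subtitle"),
        Node("Article Synopsis"),
    ]),
    Node("Right Column", [Node("Article Text")]),
])

print([node.name for node in main_column.sub_nodes()])
print([node.name for node in main_column.leaves()])
```

The first list contains all six sub-nodes of the Main Column sub-node; the second contains only the four at the lowest hierarchal level.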
The Main Column (315) sub-node contains at least some of the user desirable content that a user would want to view, utilize, or adapt. The Main Column (315) sub-node contains a Left Column (335) and a Right Column (355). In the Left Column (335), an image is shown in the Main Image (340) section; in this illustrative example the image (
Turning to
In one example, throughout the process of collecting web page data associated with the selection of content of a web page (Block 505), the client device (
After data relating to the selected portions of web pages has been saved to the popular selection data storage device (117) (Block 505), the client device (
In one example, the popular selection data storage device (117) may save a Document Object Model (DOM) representation (
In one example, a counter may be added to each DOM element (
In one example, the client device (
In another example, the selection of the user desirable content of the web page (
Further, in yet another example, the selection of the user desirable content of the web page (
After the most user desirable content of the web page is determined (Block 510), the client device (
After creation of the application (Block 515), the created application may be available to users for use via, for example, a network, or computer program product. In one example, a user may download the application created. In another example, the created application may be available to users as a computer program product. Upon running the application, the application may then provide the user with the most user desirable or popular content for printing, viewing on an output device, archiving, or any other useful purpose. Computer program code for carrying out operations of, for example, the method of
Once a user has obtained and executed the application on the client device (
The application may provide the most user desirable or popular portions of the web page to the user via an output device (
In another example, the client device (105) may be a mobile phone such as a smart phone (105). The application may be downloaded to, or otherwise provided on the mobile phone (105). The application may provide the mobile phone (105) with data relating to just the most user desirable or popular portions of the web page. The mobile phone (105) may then present the most user desirable or popular portions of the web page on a display device (150) of the mobile phone (105).
Turning now to
The popular selection data storage device (117) or other computing device within the system (
In one example, to find sets of similar web pages, a template matching algorithm run by, for example, the client device (105), may be used. The template matching algorithm may determine, among web pages for which web page data has been saved, which web pages were generated by or created using the same template. Each web page may be compared with any web page available on the World Wide Web or other documents accessed via the Internet or other network.
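The specification does not detail the template matching algorithm. As one hedged possibility, two pages might be compared by the Jaccard similarity of the tag paths found in their markup, on the assumption that pages built from the same template share most of their structure. The names `tag_paths` and `template_similarity`, and the use of Jaccard similarity itself, are assumptions introduced here for illustration.

```python
from html.parser import HTMLParser

# HTML elements that never have a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class _PathCollector(HTMLParser):
    """Collect the set of root-to-element tag paths, ignoring text content."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        self.paths.add("/".join(self.stack + [tag]))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the matching open tag.
            while self.stack.pop() != tag:
                pass


def tag_paths(html):
    collector = _PathCollector()
    collector.feed(html)
    collector.close()
    return collector.paths


def template_similarity(html_a, html_b):
    """Jaccard similarity of two pages' tag-path sets, from 0.0 to 1.0."""
    a, b = tag_paths(html_a), tag_paths(html_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

Pages whose similarity exceeds some chosen cut-off could then be treated as having been generated from the same template; the cut-off would need to be tuned on real pages.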
In another example, the template matching algorithm run by, for example, the client device (105) may determine, among web pages from the same site, which web pages were generated by or created using the same template. In this example, the template matching algorithm may determine which web pages were generated by or created using the same template among web sites with the same domain name within their respective Uniform Resource Locators (URLs).
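As a simple first pass before any template comparison, web pages might be bucketed by the host name within their URLs. The sketch below treats the full host name as the domain name, which is a simplifying assumption; the function name `group_by_domain` and the example URLs are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_by_domain(urls):
    """Group URLs by host name (used here as a stand-in for the domain name)."""
    groups = defaultdict(list)
    for url in urls:
        host = urlparse(url).hostname or ""
        groups[host].append(url)
    return dict(groups)


pages = [
    "http://news.example.com/world/story-1",
    "http://news.example.com/sports/story-2",
    "http://shop.example.org/item/42",
]
print(group_by_domain(pages))
```

Only pages that fall in the same bucket would then need to be compared against one another for a shared template.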
After similar web pages have been grouped together (Block 610), the system (
In one example, the popular selection data storage device (117) may save a Document Object Model (DOM) representation (
In one example, the client device (
In another example, the selection of the user desirable content of the web page (
Further, in yet another example, the selection of the user desirable content of the web page (
After the most user desirable content of the group of web pages is determined (Block 615), the client device (
After creation of the application (Block 620), the created application may be available to users for use via, for example, a network, or computer program product. In one example, a user may download the application created. In another example, the created application may be available to users as a computer program product. Upon running the application, the application may then provide the user with the most user desirable or popular content for printing, viewing on an output device, archiving, or any other useful purpose. Computer program code for carrying out operations of, for example, the method of
Once a user has obtained and executed the application on the client device (
The example method of
The application may provide the most user desirable or popular portions of the group of web pages to the user via an output device (
In another example, the client device (105) may be a mobile phone such as a smart phone (105). The application may be downloaded to, or otherwise provided on the mobile phone (105). The application may provide the mobile phone (105) with data relating to just the most user desirable or popular portions of the group of web pages. The mobile phone (105) may then present the most user desirable or popular portions of the web pages on a display device (150) of the mobile phone (105).
The characteristics or demographics gleaned from the user may include any information particular to the user including, for example, the user's age, gender, race, nationality, creed, place of residence, place of birth, past domiciles, occupation, interests, associations, accolades, languages spoken, places visited, marital status, family status, sexual orientation, political affiliation, highest education level achieved, and combinations of these, among others. In another example, actions taken by the user in connection with the selection of portions of the web page may also be gleaned from the user. These actions may include, for example, whether the user tends to make relatively smaller selections or relatively larger selections within the web page, or whether the user tends to include images as well as text when selecting portions of a web page.
In one example, the client device (
Once this demographic information has been received, the method may continue by collecting and saving web page data associated with the selection of content of a plurality of web pages (Block 710) made by the user and other users. Next, the system (100) may group similar web pages together (
Using the demographics gleaned from the user, the client device (105) or other computing device within the system (100) may then create an application based on the user desirable or popular selections of portions of web pages and the demographics of the user as compared to other users who have made selections of the web pages and those other users' demographics (Block 725). In one example, the demographics of other users may be matched to some degree with the user's demographics. Once a match has been determined, an application may be presented to the user that has been created for other users whose demographics match those of the user. For example, if it has been determined via the gleaned demographics that the user is a female accountant between the ages of 25 and 35, then the system may provide the user with an application that has been created by other users who are also female accountants between the ages of 25 and 35. This example may prevent overloading the user with irrelevant applications and may prevent the need to create an application for each individual user.
In another example, the demographics gleaned from the user may be used by the client device (105) or other computing device within the system (100) to create an application based on the user desirable or popular selections of portions of web pages and the demographics of the user as compared to other users who have made selections of the web pages and those other users' demographics (Block 725). In this example, the user desirable or popular selections of other users with similar demographics as compared to the user's demographics may be used to create the application (Block 725). For example, if it has been determined via the gleaned demographics that the user is a female accountant between the ages of 25 and 35, then the system (100) may match the user's demographics with the demographics of other users. Then the other users' popular selections may be used in creating the application for the user.
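One hedged way to match demographics "to some degree" is to score the fraction of shared attributes on which two profiles agree and keep the other users who reach a chosen score. The attribute keys, the `min_overlap` parameter, and the function names below are hypothetical and serve only to illustrate the matching step.

```python
def demographic_overlap(profile_a, profile_b):
    """Fraction of attributes present in both profiles that have equal values."""
    shared = set(profile_a) & set(profile_b)
    if not shared:
        return 0.0
    return sum(profile_a[key] == profile_b[key] for key in shared) / len(shared)


def matching_users(user, others, min_overlap=1.0):
    """Return ids of other users whose profiles agree with the user's closely enough."""
    return [
        user_id
        for user_id, profile in others.items()
        if demographic_overlap(user, profile) >= min_overlap
    ]


# Hypothetical profiles, mirroring the accountant example in the text.
user = {"gender": "female", "occupation": "accountant", "age_band": "25-35"}
others = {
    "u1": {"gender": "female", "occupation": "accountant", "age_band": "25-35"},
    "u2": {"gender": "male", "occupation": "accountant", "age_band": "25-35"},
    "u3": {"gender": "female", "occupation": "teacher", "age_band": "45-55"},
}
print(matching_users(user, others))
print(matching_users(user, others, min_overlap=0.6))
```

The selections made by the matched users could then be aggregated to determine the popular content used in creating the application for the user.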
In the above examples, the collection of web page data (
Further, in one example, generated applications may be periodically tested to ensure that the applications still produce valid results. In some instances, originating web pages or groups of web pages may be removed from the World Wide Web or otherwise made not available for access. In other instances, originating web pages or groups of web pages may have been altered as to their layout, structure, or template so as to no longer provide valid results. Therefore, if upon periodic testing of these web pages and groups of web pages, the web pages fail, then the application may be temporarily removed from availability to users. For example, the applications may be removed temporarily if they fail to produce valid results over a period of a week. In one example, if these applications fail over a long enough period, then they may be removed completely. In one example, the period for permanent removal of the application may be, for example, a month.
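The removal policy described above might be sketched as follows, assuming that the date on which an application began failing continuously is tracked and that a month is approximated as 30 days. The status labels and the function name `availability` are hypothetical.

```python
from datetime import date, timedelta

TEMPORARY_AFTER = timedelta(days=7)    # "a week", per the example above
PERMANENT_AFTER = timedelta(days=30)   # "a month", approximated as 30 days


def availability(first_failure, today):
    """Classify an application given when it started failing its periodic tests.

    first_failure: date the current run of failures began, or None if passing.
    """
    if first_failure is None:
        return "available"
    failing_for = today - first_failure
    if failing_for >= PERMANENT_AFTER:
        return "removed"
    if failing_for >= TEMPORARY_AFTER:
        return "suspended"
    return "available"


print(availability(date(2013, 1, 1), date(2013, 1, 8)))
```

Resetting `first_failure` to `None` once a test passes again would restore a suspended application to availability.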
The specification describes, and the figures illustrate, a method and system of creating an application for the popular selection of content on a web page. The method may comprise collecting web page data associated with a web page, the web page data comprising a selection of content on the web page, determining, among the selection of content of the web page, which content is popular, and creating an application based on the popular selection of content of the web page. This creation of applications for popular web page content may have a number of advantages, including ease of presenting selected portions of a web page to a user that reflect what most users want to select while reducing or eliminating the need for manual selection by the user. These advantages would assist a user in printing or archiving only desired portions of a web page, and viewing these desirable portions on computing devices with smaller screens such as a mobile phone. All of these advantages are possible without extra programming or configuration needed to add new web sites or identify new web sites. Further, no cooperation is needed from the web site publisher, web page server administrator, or other party.
The preceding description has been presented only to illustrate and describe embodiments and examples of the principles described. This description is not intended to be exhaustive or to limit these principles to any precise form disclosed. Many modifications and variations are possible in light of the above teaching.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/US10/60304 | 12/14/2010 | WO | 00 | 2/19/2013 |