Although targeted advertising is common on the World Wide Web, such advertising may have little lasting impact on the web user given that the advertising is often quickly replaced with other web content as the user surfs from electronic document to electronic document. Of potentially greater value would be commercial content that is of a more permanent nature than electronic documents, and therefore more likely to be noticed and acted upon by a user.
The disclosed systems and methods can be better understood with reference to the following drawings. The components in the drawings are not necessarily to scale.
As described above, existing online targeted advertising may have little lasting impact on the typical web user. Moreover, many electronically exchanged documents, including both text and image documents, contain little or no commercial content. Therefore, it can be appreciated that it would be desirable to have a system or method for providing relevant commercial content, not originally associated with an electronic document, for users. Disclosed herein are systems and methods that achieve that goal by adding commercial content to electronic document printouts. This can include adding commercial content to documents that result when an electronic document, accessible from a client computer by a network link, is printed by a user. This can also include adding commercial content to electronic word processing documents, PDFs, image files and the like, when the same are printed by a user.
In some examples, the electronic document content that the user has accessed and presumably may choose to preserve by printing, e.g., PDF, word processing document, image file, etc., is identified and analyzed to determine its underlying subject matter and/or subjected to a taxonomic analysis to determine taxonomic information. Next, commercial content, such as advertisements and/or coupons, pertinent to the underlying subject matter is identified. Meta-data associated with the various commercial content in a commercial content database, including location, demographic, revenue, and the like meta-data, is used to select relevant commercial content to add to the new, printable electronic document. Once the commercial content has been identified, a new, printable document is created and formatted for printing that comprises both the electronic document content and the commercial content, which may be formatted for unobtrusive placement on the printed page. In some examples, the new, printable document for printing may exclude content that the user does not wish to preserve in a printout, e.g., footers, headers, source formatting, comments and/or annotations, citations, web site navigation features, hyperlinks to other web pages, online advertisements, and the like. By filtering such content, a printout having improved formatting and less clutter results, even though new, additional commercial content has been added.
Referring now in more detail to the drawings, in which like numerals indicate corresponding parts throughout the several views,
As described in greater detail below, the server computer 104 is, in some examples, configured to identify relevant electronic document content that is to be printed and further to identify commercial content that is to be added to the relevant electronic document content in the printout. In some examples, the server computer 104 can be configured to create and format a new, printable document that can be used to generate a printout. In some examples, the server computer 104 is further configured to filter out at least some of the electronic document content, e.g., footers, headers, source formatting, comments and/or annotations, citations, image or photo background, web site navigation features, hyperlinks to other web pages, online advertisements, and the like, to improve printout format and reduce printout clutter.
The processing device 200 can include one or more processors associated with the computer 102, e.g., a semiconductor based microprocessor (in the form of a microchip), and/or can include hardware processing resources in the form of an application specific integrated circuit (ASIC). The memory 202 includes any one of or a combination of volatile memory elements (e.g., RAM) and nonvolatile memory elements (e.g., hard disk, flash memory, ROM, tape, etc.).
The user interface 204 comprises the components with which a user interacts with the computer 102. The user interface 204 may comprise, for example, a keyboard, mouse, touchscreen, and a display, such as a cathode ray tube (CRT) or liquid crystal display (LCD) monitor. The one or more I/O devices 206 are adapted to facilitate communications with other devices and may include one or more communication components such as a modulator/demodulator (e.g., modem), wireless (e.g., radio frequency (RF)) transceiver, network card, etc.
The memory 202 comprises various programs including an operating system 210, a browser printing component 212, and a network link 214. The operating system 210 controls the execution of other programs and provides scheduling, input-output control, file and data management, memory management, and communication control and related services. The browser printing component 212 is configured to translate content from user applications, such as word processing applications, file sharing applications, a network browser, and the like accessible over a network link 214, into print content that can be transmitted to an appropriate printing device for the generation of a hard copy printout. The network link 214 is a program that is configured to access and display network content. The network link 214 is used to access, display, and edit electronic documents (image or text content), browse the World Wide Web (“the web”) over the Internet, etc.
In the example of
As indicated in
In some examples, the print manager 312 is configured to control printing of electronic document content. Such control includes control over the format of the electronic document content as well as control over what commercial content is to be added to a printout of the electronic document content. In the illustrated example, the print manager 312 comprises various modules, including a content extractor 316 that extracts relevant electronic document content from the electronic document content, a content analyzer 318 that determines the underlying subject matter or taxonomic information of the electronic document content and identifies relevant commercial content, and a document generator 320 that creates and formats a new, printable document for printing that comprises both the relevant electronic document content and the relevant commercial content. In some examples, the electronic document content extraction inherently removes non-relevant content, e.g., footers, headers, source formatting, comments and/or annotations, citations, web site navigation features, hyperlinks to other web pages, online advertisements, and the like, from the electronic document. The commercial content added to the document can be obtained from the commercial content database 314, which stores and categorizes various commercial content (e.g., advertisements and/or coupons) available for addition to documents to be printed. As explained in more detail below, the content analyzer 318 executes instructions to use meta-data associated with the various commercial content in the commercial content database, including location, demographic, and revenue meta-data, to select relevant commercial content to add to the new, printable electronic document.
Example systems having been described above, operation of the systems is now discussed. In the discussions that follow, flow diagrams are provided. Process steps or blocks in the flow diagrams may represent modules, segments, or portions of code that include one or more executable instructions for implementing specific logical functions or steps in the process. Although particular example process steps are described, alternative implementations are feasible. Moreover, steps may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved.
In block 420, the method includes analyzing the electronic document content to determine underlying subject matter associated with the electronic document. As described above, the electronic document may include relevant electronic document content that the user wishes to preserve in the hard copy printout (e.g., certain underlying subject matter and/or theme) as well as other non-relevant electronic document content that forms part of the electronic document but that the user does not wish to preserve, e.g., footers, headers, source formatting, comments and/or annotations, citations, image or photo background, web site navigation features, hyperlinks to other web pages, and online advertisements, and the like. The relevant electronic document content may comprise, for example, one or more of a written article, a graphic, or an image that is the central subject or focus of the electronic document. The undesired content may comprise one or more extraneous features of the electronic document, such as mentioned above.
Such analysis can be performed by using the commercial content plug-in, content extractor, or a combination of both, to execute instructions to determine underlying subject matter associated with the electronic document. By way of example, if the desired content comprises a written article, the analysis can comprise analysis of the words, phrases, or sentences used in the article to determine one or more themes of the article. Additionally, if the desired content is a graphic or image, analysis can comprise analysis of tags associated with the graphic or image that describe it, or direct analysis of the image data (e.g., pixels) of the graphic or image to determine the subject of the graphic or image.
In at least one example, the plug-in, content extractor, or a combination of both first executes instructions to create a document object model (DOM) data structure for content analysis and extraction. Using the DOM, for example, contiguous paragraphs can be clustered together, and the cluster with the largest number of paragraphs, in terms of character count, can be chosen as the text block of the electronic document. Within this text block, the plug-in, content extractor, or a combination of both, can then execute additional instructions to further prune out non-relevant content, e.g., icons and link-lists, and to discriminate between ad and article images. In one example text electronic document, the outcome of the electronic document content analysis consists of the following components: the article text body, title, associated relevant images and captions, etc.
In block 430, the method includes identifying commercial content relevant to the underlying subject matter. In one or more examples, such analysis can be performed by using the commercial content plug-in, content analyzer, or a combination of both, to execute instructions to perform a taxonomic analysis on the underlying subject matter and/or theme associated with the electronic document.
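The paragraph-clustering step described above can be illustrated with a minimal sketch. It assumes the electronic document is HTML, that paragraphs are marked with `<p>` tags, and that any other element appearing between paragraphs ends a cluster. The names `ParagraphClusterer` and `main_text_block` are hypothetical, and Python's standard `html.parser` stands in for a full DOM implementation.

```python
from html.parser import HTMLParser


class ParagraphClusterer(HTMLParser):
    """Groups contiguous <p> elements into clusters (illustrative only).

    Assumption: a start tag seen outside a paragraph breaks contiguity;
    inline tags inside a paragraph (links, emphasis) do not.
    """

    def __init__(self):
        super().__init__()
        self.clusters = []    # finished clusters, each a list of paragraph strings
        self._current = []    # cluster being built
        self._in_p = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._buf = []
        elif not self._in_p:
            self._close_cluster()

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            text = " ".join("".join(self._buf).split())  # normalize whitespace
            if text:
                self._current.append(text)
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._buf.append(data)

    def _close_cluster(self):
        if self._current:
            self.clusters.append(self._current)
            self._current = []

    def close(self):
        super().close()
        self._close_cluster()


def main_text_block(html):
    """Return the paragraph cluster with the largest total character count."""
    parser = ParagraphClusterer()
    parser.feed(html)
    parser.close()
    return max(parser.clusters, key=lambda c: sum(len(p) for p in c), default=[])
```

Pruning of icons and link-lists and the separation of ad images from article images are not shown; they would operate on the cluster this function returns.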
In at least one example, the content analyzer associated with a server computer, e.g., an ad server, executes instructions to use meta-data associated with the various commercial content in the commercial content database of the server computer, including location, demographic, revenue, and the like meta-data, to select relevant commercial content to add to the new, printable electronic document.
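One way such meta-data-based selection could work is sketched below. The record layout, the field names, and the rule of ranking eligible items by revenue are assumptions made for illustration; the description does not specify how location, demographic, and revenue meta-data are combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommercialItem:
    """Hypothetical record in the commercial content database."""
    item_id: str
    category: str              # taxonomy category the item belongs to
    regions: frozenset         # empty set means "any location"
    demographics: frozenset    # empty set means "any demographic"
    revenue_per_print: float   # assumed revenue meta-data


def select_commercial_content(items, category, region, demographic, limit=2):
    """Filter by category, location, and demographic; rank by revenue."""
    eligible = [
        it for it in items
        if it.category == category
        and (not it.regions or region in it.regions)
        and (not it.demographics or demographic in it.demographics)
    ]
    # Highest revenue first; item_id breaks ties so the order is deterministic.
    eligible.sort(key=lambda it: (-it.revenue_per_print, it.item_id))
    return eligible[:limit]
```

The `limit` parameter reflects the goal of unobtrusive placement: only a small number of items is returned for any one printout.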
By way of example and not by way of limitation, a data set of advertisements and coupons along with the necessary meta-data or features for contextual matching can be preprocessed by tokenization, stop word removal, and word stemming. Each document is then represented as a token vector, where each element is the TF-IDF (term frequency-inverse document frequency) of the token. Those token vectors can be further processed with a feature selection algorithm to reduce the dimension. A support vector machine (SVM) can be used as the classification method. The SVM is a classifier for binary classification tasks, but it can be extended to address the multi-class classification tasks by combining the results of multiple binary classifiers.
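The pipeline above can be sketched in a few dozen lines of standard-library Python. Several simplifications are assumed: the stop-word list and suffix-stripping stemmer are crude stand-ins for real ones, the feature selection step is omitted, and the binary SVM is a linear model without a bias term trained by plain subgradient descent on the hinge loss rather than by a production solver. All function names and the training texts are hypothetical.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"a", "an", "and", "for", "of", "on", "the", "with", "your"}


def stem(token):
    """Crude suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, remove stop words, and stem."""
    return [stem(t) for t in re.findall(r"[a-z]+", text.lower())
            if t not in STOP_WORDS]


def fit_tfidf(docs):
    """Learn the vocabulary and IDF weights from the training documents."""
    tokenized = [preprocess(d) for d in docs]
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    vocab = sorted(doc_freq)
    idf = {t: math.log(len(docs) / doc_freq[t]) + 1.0 for t in vocab}
    return vocab, idf


def tfidf_vector(text, vocab, idf):
    """Represent a text as a TF-IDF vector; unknown tokens are ignored."""
    tokens = preprocess(text)
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[t] / total * idf[t] for t in vocab]


def train_binary_svm(X, y, epochs=50, lam=0.01, lr=0.1):
    """Linear SVM (no bias): subgradient descent on L2-regularized hinge loss.

    Labels in y must be +1 or -1.
    """
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            active = margin < 1  # hinge term contributes only inside the margin
            w = [wj - lr * (lam * wj - (yi * xj if active else 0.0))
                 for wj, xj in zip(w, xi)]
    return w


def train_one_vs_rest(X, labels):
    """Extend the binary SVM to multiple classes: one classifier per class."""
    return {c: train_binary_svm(X, [1 if l == c else -1 for l in labels])
            for c in sorted(set(labels))}


def classify(text, vocab, idf, models):
    """Pick the class whose binary classifier gives the highest score."""
    x = tfidf_vector(text, vocab, idf)
    return max(models, key=lambda c: sum(wj * xj for wj, xj in zip(models[c], x)))


# Toy training set (invented for illustration).
TRAINING = [
    ("Cheap flights and hotel deals", "travel"),
    ("Hotel booking discount for flights", "travel"),
    ("Pizza delivery coupon", "food"),
    ("Restaurant dinner coupon with pizza", "food"),
    ("Garden tools sale", "garden"),
    ("Lawn mower sale for your garden", "garden"),
]
TEXTS = [text for text, _ in TRAINING]
LABELS = [label for _, label in TRAINING]
VOCAB, IDF = fit_tfidf(TEXTS)
MODELS = train_one_vs_rest([tfidf_vector(t, VOCAB, IDF) for t in TEXTS], LABELS)
```

Calling `classify(article_text, VOCAB, IDF, MODELS)` then yields a category label that can be used to look up matching advertisements or coupons. The one-vs-rest arrangement is one common way of combining binary classifiers; the description leaves the combination method open.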
In block 440, the method includes creating and formatting a printable document that includes the electronic document content and the identified commercial content. Irrespective of the manner of analysis that is performed, commercial content is then identified that is relevant to the determined underlying subject matter based on a taxonomic analysis and using meta-data associated with the various commercial content in the commercial content database including location, demographic, revenue meta-data and the like, to select relevant commercial content to add to the new, printable electronic document.
Beginning with block 500 of
Once the relevant electronic document content is identified 505, the commercial content browser plug-in analyzes that content to determine its underlying subject matter by executing instructions to perform a taxonomic analysis on the information, as indicated in block 504, to extract an article 505.
At this point, the commercial content plug-in executes instructions to query a database of commercial content 506, e.g., based on a further taxonomic analysis 507, to identify commercial content, for example advertisements and/or coupons, that is pertinent to the determined underlying subject matter. In some examples, such searching comprises the commercial content plug-in sending a search query to the server computer (e.g., ad server computer 104 of
As shown in block 510, the commercial content plug-in can receive commercial content to be printed along with the electronic document content. The commercial content plug-in can then create and format a document comprising both the electronic document content and the received commercial content 510. Then, with reference to block 512, the commercial content plug-in provides the new, printable electronic document 513 to the browser printing component for translation and transmission to the printing device 514 that generates the hard copy printout.
In some examples, the new, printable electronic document includes only or nearly only the electronic document content and the received commercial content, and therefore excludes much or all of the irrelevant electronic document content. With the exclusion or filtering of that extraneous electronic document content, a cleaner, better formatted printout results.
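A minimal sketch of the document-creation step follows, assuming the printable document is emitted as HTML and that the extracted title, paragraphs, and commercial content arrive as plain strings. The function name `build_printable_document` and the `commercial` class name are hypothetical. Because only the extracted content and the selected commercial content are passed in, navigation features, online advertisements, and other extraneous material never reach the output.

```python
from html import escape


def build_printable_document(title, paragraphs, commercial_items):
    """Assemble a new printable HTML document from extracted content only."""
    parts = [f"<h1>{escape(title)}</h1>"]
    parts += [f"<p>{escape(p)}</p>" for p in paragraphs]
    if commercial_items:
        # Commercial content goes in a separate block after the article so
        # that it can be styled for unobtrusive placement on the page.
        entries = "".join(f"<li>{escape(c)}</li>" for c in commercial_items)
        parts.append(f'<aside class="commercial"><ul>{entries}</ul></aside>')
    return (
        "<!DOCTYPE html><html><head><meta charset=\"utf-8\">"
        f"<title>{escape(title)}</title></head><body>"
        + "".join(parts)
        + "</body></html>"
    )
```

The resulting string could then be handed to the browser printing component for translation and transmission to the printing device.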
At 604 the server computer 606 receives the electronic document 605 and executes instructions 611 to analyze the electronic document content to determine underlying subject matter associated with the electronic document, as the same has been described above. That is, instructions can be executed to perform a taxonomic analysis 607 on the received electronic document. In at least one example, the server computer 606 additionally executes instructions to use meta-data 609 associated with the various commercial content in a commercial content database, including location, demographic, and revenue meta-data, to select relevant commercial content 608 to add to the new, printable electronic document 612. In one or more examples, the server computer can identify the electronic document content that is relevant, e.g., that the user wishes to preserve, and generate the same as a hard copy printout, as indicated at 613.
As before, such relevant electronic document content identification 604 comprises identifying the main content of the electronic document. Once the electronic document content is identified, the server computer analyzes it, e.g., using taxonomic analysis and meta-data 609 associated with the various commercial content in a commercial content database, including location, demographic, and revenue meta-data, to select relevant commercial content 608 to add to the new, printable electronic document 612. The database of commercial content can contain, for example, advertisements and/or coupons that are relevant to the determined underlying subject matter, to result in a new printable document 612.
At 613, the server computer and/or the client computer can send the new, printable document 612 to a printer 614. That is, in this example, the client computer can field the new, printable document 612 and send it to a printer 614 for printing, or the server computer can send the new, printable document 612 to a printer 614 for printing.
In the methods described above, revenue can be generated by the placement of the commercial content on the electronic document printouts. In some examples, the central server computer or other device that controls access to the commercial content database can track which pieces of commercial content are used and how often, and can therefore determine what to charge the advertiser in a per-print scenario.
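The per-print tracking described above can be sketched as a simple ledger. The class name `PrintLedger`, the use of a flat rate per item, and the choice of integer cents as the unit are all assumptions for illustration; the description does not specify a pricing model.

```python
from collections import Counter


class PrintLedger:
    """Counts how often each piece of commercial content is printed."""

    def __init__(self, rate_cents_per_print):
        # Hypothetical pricing table: item id -> charge per print, in cents.
        self.rates = dict(rate_cents_per_print)
        self.prints = Counter()

    def record_print(self, item_ids):
        """Record one printout that carried the given commercial items."""
        for item_id in item_ids:
            if item_id not in self.rates:
                raise KeyError(f"no rate configured for {item_id!r}")
        self.prints.update(item_ids)

    def charges(self):
        """Return the amount owed per item, in cents."""
        return {item_id: count * self.rates[item_id]
                for item_id, count in self.prints.items()}
```

Integer cents are used so that repeated additions do not accumulate floating-point rounding error.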
It is noted that, in some examples, the user can opt-in or opt-out with respect to commercial content being added to his or her electronic document printouts. Incentives may be provided, however, to encourage opting in. For example, in a pay-for-printing scenario, printing fees may be discounted or waived in cases in which the user agrees to the inclusion of commercial content on his or her electronic document printouts.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/CN2010/001453 | 9/21/2010 | WO | 00 | 3/7/2013 |