Currently, web-based search services allow users to search for web page content based on key words received from the user. Typically, these search services receive a string of text and provide a list of search results. Each entry in the search results usually includes a short description and a URL link to a web site. A user may select one of the URL links to load and view a web page comprising one of the results of the search.
Some search services allow a user to search for images available over the web. These services receive a text string and search for images having meta-data that matches the received string. The results of a typical image search display rows of images with a short description and URL link for each result. A user may go to the URL associated with each result by selecting the image or the URL below it.
Some web sites provide a first set of content, such as a list of article titles, and provide additional content when a cursor is placed over the first content. For example, a sports-related web site might list several article headings which, when selected, redirect a user to another web page with the complete article. When a user positions a cursor over the heading for an article, the beginning of the article may be displayed in a text box near the heading. Similarly, a web site that offers a movie rental service may provide a first set of content that lists several movie names and a second set of content comprising a movie image and description when a user positions a cursor over a movie name.
Typically, these informative text boxes and images provided by positioning a cursor over a heading, text or other first set of content are programmed into the web page that provides the first set of content. The web page provides static content about the article or movie that is provided upon detecting the position of the cursor.
Search services that provide text and a URL as search results do not provide a user with much information to determine what the site actually contains. Most websites that appear to “preview” content from another page, such as a web page that previews a web page with a text article, hardcode the preview text into the web page. As a result, if the article changes, the “preview” content in the referencing web page must be changed as well. This requires programming time and resources.
The present technology, roughly described, provides a preview of related visual content using an image collection. When a first content page (the context page) is provided to a user per a user request, an image collection is implicitly generated and presented to the user in conjunction with the context page. The image collection is comprised of images that are related to the content of that context page and comprised of selected images from a set of related content pages. The set of related pages may include, for example, one or more of the content pages directly linked from the context page, pages linked from those directly linked pages, and content pages identified to be related via a search algorithm or search engine. The image collection is prepared implicitly when a content page is loaded and does not require any software in the current content page to be changed as the related content pages change.
The image collection, or image cloud, is comprised of images contained in content pages, such as web sites, that are linked to a content page that a user is viewing. The images may be positioned in rows, columns, or some other manner within the collection and are embedded with a URL link corresponding to the page the image is originally from. When a user selects an image within the image collection, the content page in which the image is originally from is then retrieved using the embedded URL and provided to the user.
Images may be processed, by image analysis or some other type of processing, before they are included in the image collection. In some embodiments, the image size, aspect ratio, contrast and other features are compared to thresholds to determine whether the image should be included in the image collection. Images that meet these requirements may still be manipulated before they are placed in an image collection.
An embodiment receives content page data for a first content page requested by a user. The content page data is parsed for identification information for one or more related content pages. The related content pages found by the parsing are retrieved and images in the retrieved pages are detected. One or more of the detected images are selected to include in an image collection. The image collection is then generated with the selected images and provided to a user.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
A preview of related visual content is provided using an image collection. When a first content page (the context page) is provided to a user in response to a user request, an image collection is implicitly generated and presented to the user in conjunction with the context page. The image collection is comprised of images that are related to the content of the context page and comprised of selected images from a set of related content pages. The set of related content pages may include, for example, one or more content pages directly linked from the context page, pages linked from directly linked pages, and content pages identified to be related to the content page via a search algorithm, search engine, or in some other manner. The image collection is prepared implicitly when a content page is loaded. The current content page does not require any software changes or updates as the related content pages change.
The image collection, or image cloud, is comprised of images contained in content pages, such as web sites, that are related to a content page that a user is viewing. For example, if the first content page being viewed by the user has web links to four other content pages, the image collection may be constructed using images in the four other content pages. The images may be positioned in rows, columns, or some other arrangement within the collection. Additionally, each image in the image collection may be embedded with a URL link corresponding to the page the image is originally from. When a user selects an image within the image collection, the content page from which the image originates is then retrieved using the embedded URL and provided to the user. In some embodiments, the URL link feature may have different behavior in thumbnail/big image scenarios.
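As a rough illustration of the arrangement described above, the following sketch lays out images in rows and wraps each image in a link to the page it came from. The function name `render_image_collection`, the `columns` parameter, and the `row` class name are hypothetical and chosen only for this example; the actual layout may take any other form.

```python
from html import escape


def render_image_collection(entries, columns=3):
    """Render (image_url, source_page_url) pairs as rows of linked images.

    Each image is wrapped in an anchor whose href is the URL of the
    content page the image was taken from (hypothetical markup).
    """
    rows = []
    for start in range(0, len(entries), columns):
        cells = []
        for image_url, page_url in entries[start:start + columns]:
            cells.append(
                '<a href="{}"><img src="{}" alt=""></a>'.format(
                    escape(page_url, quote=True),
                    escape(image_url, quote=True),
                )
            )
        rows.append('<div class="row">' + "".join(cells) + "</div>")
    return "\n".join(rows)
```

Selecting an image in the rendered markup follows the embedded link, which corresponds to retrieving the originating content page.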
Images may be processed and/or analyzed before they are included in the image collection. In some embodiments, the image size, aspect ratio, contrast and other features are compared to thresholds to determine whether the image should be included in the image collection. In some embodiments, the content page in which the image is presented may also be analyzed to determine whether images in that page are related to the context page. Images that meet these requirements may still be manipulated before they are placed in an image collection. For example, images may be cropped to remove content determined to be less interesting or to focus on content considered to be more interesting. HTML parsing can indicate the importance of an image both context-wise, such as how relevant the page is to the user, and interest-wise, such as whether the image is a title image or a content image. For example, images may be divided into groups of content images and structure images. Content images are related to an article. Structure images include titles, menus, and so on. The present technology may filter out structure images and further filter and/or otherwise process only the content images that are relevant and that a user may be interested in. Any of several image processing algorithms may be used to determine different features of an image and what portions should remain or be removed.
The image collection is prepared by an image collection engine. The engine may be implemented as a server-side engine, client application, browser application plug-in, or as some other software, hardware or combination of these. In some embodiments, the information can be cached and does not need to be processed for every request. The server might have a preprocessed list of relevant pages and images for a list of URLs or any information extracted from the page. It can also have preprocessed information about the page that is viewed. The image collection engine may parse a first context page to identify URLs listed in the code of the page, retrieve the content pages for each listed URL, identify the images in the retrieved content pages, process the images and construct the image collection from the retrieved images. In some embodiments, parsing may differ depending on the type of content; for example, Flash sites might be parsed with a Flash reader, and images in Adobe Acrobat documents might be extracted from the document itself.
The image collection is a tool for providing visual preview information for one or more content pages related to a current context page. The visual nature of the collection allows users to quickly determine the nature of the content pages related to the currently viewed page. Additionally, the image collection is generated at the time the current page is viewed, so the image data provided in the collection is up to date and requires no programming resources to track changes made to the related content pages. Rather, changes made to any related content page image are automatically processed to be included or reflected in a context page at the time the current context page is provided.
In some embodiments, some computationally intensive processing can be done in advance. In general, there are three types of information that can be handled by offline computation. First, for processes that are slow because of a slow network, keeping the data at a local server (via a proxy, local storage, or local network storage, such as a search engine's cached pages) will speed up fetching. Second, the results of processes that require intensive processing can be cached (for example, pictures of rendered internet pages, as the rendering process is expensive on resources). Third, information gathered from specific service providers can be stored (such as a database of pages linking to a specific URL, or a list of URLs that are relevant to a specific keyword). This information can be gathered through open resources or via internal resources (processing a web graph that is an internal resource, or crawling the web).
Client device 110 includes browser application 112 and communicates with network server 130 over network 120. Browser application 112 may retrieve content from network server 130 over network 120 and provide the content to a user through an interface. In some embodiments, browser application 112 may be implemented using “Internet Explorer” provided by Microsoft Corporation of Redmond, Wash. Image table 118 includes sets of information for one or more images. Each set of image information is associated with an image contained in a content page, such as a web page, which is related to a current content page, or web page, being viewed by a user through browser application 112.
Network 120 may be implemented as the Internet or other WAN, a LAN, intranet, extranet, private network or other network or networks.
Network server 130 is in communication with client device 110 and application server 140. Network server 130 may receive requests from client device 110 over network 120, generate a response and send the response to the client device. Generating a response may include sending a request to application server 140. In some embodiments where network 120 is the Internet, network server 130 may be implemented as a web server and provide web pages to browser application 112 on client device 110. Additionally, in some embodiments, network server 130 may provide a content page with embedded image collection information in the page. The embedded image information may be provided as a hint, toolbar element, other visual form or other information indicating the existence of an image collection associated with the current content page.
Application server 140 may communicate with network server 130 and backend server 150 and contain one or more applications (not illustrated). In some embodiments, application server 140 may respond to requests from network server 130 (web server) to process requests for content pages from client device 110. While processing requests from network server 130, application server 140 may send queries or other requests to backend server 150.
Backend server 150 may be implemented as a database, data store, application server, or any other machine which may be queried or receive requests from an application server, including application server 140.
Application servers 160, 162, 164 and 166 may each contain one or more applications that perform operations and provide a result in response to a received request. Each of application servers 160-166 may be implemented as one or more servers and may communicate with one or more network servers, backend servers, or other machines.
Application server 160 includes image collection engine 165 and may communicate with client device 110 and application servers 162-166. In some embodiments,
Image collection engine 175 may construct and provide an image collection to client device 110. In particular, image collection engine 175 may receive or retrieve a requested content page (context page), parse HTML code of the current content page to identify links to other web pages, retrieve the linked and other related web pages and process the pages. Processing the related content pages is performed to identify one or more images in each content page. The images in the related content pages are then processed to determine if they should be included in the image collection. Images that should be included in the image collection are then used to construct the image collection page by image collection engine 175. In some embodiments, the functionality and features illustrated on different application servers and image collection servers may in reality be running on the same physical machine, such as application server 160 which contains the image collection engine 165.
In some embodiments, browser application 112 may make one or more requests from one or more remote servers to provide an image collection to a user. For example, browser application 112 may request a content page from network server 130 and request an image collection from application server 160 which may provide a related image service. In this embodiment, the related image service may be implemented by image collection engine 165 and retrieve images contained in a first set of one or more pages directly linked to the requested content page, a second set of content pages linked to a content page in the first set of content pages, or other content pages related to the requested content page (the context page), such as related content pages that are found via a search engine or other search service based on a key word search generated in response to detecting key words in the first or second set of content pages. Image collection engine 165 may identify the related content pages, retrieve image data from application servers 162-166 which provide the related content pages and the image data, and provide an image collection to browser application 112 in response to the request. This is discussed in more detail below.
In some embodiments, as discussed above, image data may be retrieved and placed in an image table. In some embodiments, image data is not stored in a table locally. Rather, a remote service processes related content page data, identifies images to include in an image collection, and provides the image collection to the client requesting the original content page (context page). Use of an image table 118 is therefore optional.
The web page content associated with the user input received at step 510 is retrieved at step 520. To retrieve the content, browser application 112 may send a request to network server 130. Network server 130 receives the request, processes the request and optionally invokes application server 140. Network server 130 then generates a response and provides the response to browser application 112 on client device 110. In some embodiments, the web page content is retrieved from a cached page or through offline methods.
Steps 522-545 of the method in
The retrieved content page is analyzed and supplemental content pages are retrieved at step 522. The retrieved content page is analyzed for content which may be used to identify content pages having images that may be of interest to a user. In some embodiments, the retrieved content page is analyzed to determine content such as one or more key words. The key words are provided to a search engine which generates a list of content page links for content pages that match the key words. The most relevant of the list of content pages, for example the top five listed content pages, may be retrieved as supplemental content pages.
The retrieved web page and the supplemental content pages are analyzed for links to additional content at step 525. The web pages may be analyzed by an image collection engine implemented on an application server, as a client application, a browser plug-in, or in some other implementation. The image collection engine may analyze the web page by parsing the web page to determine if the web page includes any URLs, semantic information, such as keywords, title, and metadata, or other data.
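The parsing described above might be sketched as follows, using only the Python standard library. The class name `PageScanner` and the helper `scan_page` are hypothetical, and the sketch assumes that keywords are carried in a `<meta name="keywords">` element; a real engine may draw semantic information from other sources.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageScanner(HTMLParser):
    """Collect links, image URLs, the title, and meta keywords from a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.images = []
        self.keywords = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            # Resolve relative links against the URL of the page itself.
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.images.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            self.keywords.extend(k.strip() for k in content.split(",") if k.strip())
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def scan_page(html_text, base_url):
    """Parse html_text and return the populated scanner."""
    scanner = PageScanner(base_url)
    scanner.feed(html_text)
    scanner.close()
    return scanner
```

The `links` list would feed the retrieval of related content pages, while `title` and `keywords` are examples of the semantic information that may be used later when judging relevance.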
We assess the relevance of each image to the context page by comparing keywords, title, description, and so on. In addition, the domain and path of the image and of the page hosting the image are another indication of relevance. For example, an image hosted on a different domain (which is not an advertising service) is more likely to contain content than graphics elements. An image coming from pages that are deeper in the site hierarchy is likely to be associated with drill down information. An image located in a separate branch of the site is less likely to be related and therefore will have a lower score and higher chance of being filtered out.
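One way to express the domain and path comparison is to classify where a candidate URL sits relative to the context page. The sketch below is illustrative only: the function name `path_relation` and the category labels are hypothetical, and how each category translates into a score is left open.

```python
import posixpath
from urllib.parse import urlparse


def _directory(path):
    """Return the directory portion of a URL path, ending with '/'."""
    if not path or path.endswith("/"):
        return path or "/"
    parent = posixpath.dirname(path)
    return parent if parent.endswith("/") else parent + "/"


def path_relation(context_url, candidate_url):
    """Classify the candidate's location relative to the context page."""
    ctx, cand = urlparse(context_url), urlparse(candidate_url)
    if ctx.netloc.lower() != cand.netloc.lower():
        return "other-domain"
    ctx_dir, cand_dir = _directory(ctx.path), _directory(cand.path)
    if cand_dir == ctx_dir:
        return "same-level"
    if cand_dir.startswith(ctx_dir):
        return "drill-down"       # deeper in the site hierarchy
    if ctx_dir.startswith(cand_dir):
        return "step-up"          # closer to the site root
    return "separate-branch"      # a different branch of the same site
```

Under the reasoning above, a result of `separate-branch` would lower the score of an image, while `drill-down` would suggest the image accompanies more detailed information about the context page.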
After analyzing the retrieved web pages, additional related content pages associated with each URL link in the retrieved page are retrieved at step 530. Thus, if the image collection engine identified any URLs in the requested content page and supplemental content page, a request is made to those URLs to retrieve the related content page associated with each URL. This step may be repeated for at least another iteration for the retrieved supplemental content pages and pages linked to the first retrieved content page (the context page).
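The repeated retrieval can be viewed as a breadth-first walk over links with a depth limit. In the following sketch, `get_links` is a hypothetical callable supplied by the caller that fetches a page and returns the URLs found in it, and `max_depth=2` is an assumed default reflecting one additional iteration beyond the directly linked pages.

```python
from collections import deque


def collect_related_pages(context_url, get_links, max_depth=2):
    """Return URLs reachable from context_url within max_depth link hops.

    get_links(url) is assumed to return an iterable of URLs found in
    the page at url. Each URL is visited at most once.
    """
    seen = {context_url}
    related = []
    queue = deque([(context_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                related.append(link)
                queue.append((link, depth + 1))
    return related
```

Tracking visited URLs keeps pages that link back to the context page, or to each other, from being retrieved more than once.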
Images in each related content page are selected to include in an image collection at step 535. Images may be selected based on analyzing, modifying, and/or other processing of the images. Selecting related content page images to include in an image collection is discussed in more detail below with respect to
An image collection page is then constructed at step 540. The image collection page is a collage of images originally contained in content pages which are related to the currently provided page and generated by an image collection engine or plug-in. The images in the image collection page may be sorted by content page, relevance, and/or other factors. In some embodiments, a semantic image can be created from scratch describing the content. An example of an image collection page is provided in
An image may be filtered out from the collection if it is too similar to the context page. In some embodiments, the present system may also look for results that are similar to each other. This can be done via clustering algorithms or in some other manner. In one embodiment, when two or more results are similar, the present system only selects a single representative (the one that is more relevant to the context).
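A minimal sketch of selecting one representative from each group of similar results follows. It substitutes a simple greedy pass for a full clustering algorithm, and it assumes each result carries a relevance value and a set of keywords; the Jaccard measure and the `threshold` value of 0.6 are assumptions made for the example, not part of the described system.

```python
def jaccard(a, b):
    """Jaccard similarity of two keyword collections (0.0 when both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def pick_representatives(candidates, threshold=0.6):
    """Keep the most relevant result from each group of similar results.

    candidates is a list of (name, relevance, keywords) tuples. Results
    are visited from most to least relevant; a result is kept only if it
    is not too similar to any result that has already been kept.
    """
    kept = []
    for cand in sorted(candidates, key=lambda c: c[1], reverse=True):
        if all(jaccard(cand[2], k[2]) < threshold for k in kept):
            kept.append(cand)
    return kept
```

Because results are visited in order of decreasing relevance, the result that survives from any group of similar results is the one most relevant to the context.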
In some embodiments, image collection page information is embedded in the requested web page at step 545. The user requested web page received at step 510 is modified to include image collection page information. The embedded information may be a hint, a button in an existing or new toolbar, or some other visual or audio indicator. In some embodiments, a hint may include an icon, a bold, underline or other modified text, highlighted portion of the page or some other hint mechanism. In some embodiments, the embedded information is added to a frame or other portion of a web page. The embedded information may indicate the image collection exists, provide a portion of the page, or include other information.
In some embodiments, the image collection page is not embedded within the content page but provided to browser application 112 as separate data in response to a separate request, for example by application server 160. In this embodiment, browser application 112 may receive the separate response containing the image collection data and store the response data until providing it through an interface at step 555. Because the image collection can be provided separately and based on a second request made by client device 110, step 545 is optional as indicated by the dashed lines comprising the step.
The requested web page is provided with the embedded image page information at step 550. This is the page retrieved at step 520 above and modified to include embedded image collection information. In some embodiments, providing the requested content page, the context page, and the image collection data includes providing the requested content page with modifications indicating that an image collection page or data is available. The modification may be a highlight of a button, link, or some other modification to the user requested content page.
After providing the user requested web page, the image collection page may be provided to the user at step 555. This is the image collection page constructed at step 540, and it may be provided to the user in response to user input or some other event. Providing the image collection page may include modifying or filtering the page, based on user input or other data. Providing an image collection is discussed in more detail below with respect to
In some embodiments, the image collection data is provided in response to a second request transmitted by browser application 112 to image collection engine 165 residing on application server 160. Thus, the image collection engine 165 may provide the image collection data in response to the second request while acting as a “related images” service on application server 160. The response to the second request may include data comprising an image collection. The image collection provided by browser application 112 is generated by the “related images” service in response to a request initiated by user input received through browser application 112, and is provided to a user in response to input received through the content page once that page has been provided.
The selected image is analyzed to determine whether to include the image in the image collection at step 630. Analyzing an image may include determining if the image meets size requirements, color requirements, and other requirements for images to include in the image collection. Analyzing a selected image is discussed in more detail below with respect to
If an image is selected to be included in an image collection at step 630, the image is modified based on content within the image, if needed, at step 635. Modifying the selected image may include cropping the image to remove non-interesting portions or to emphasize more interesting portions of the image. Modifying a selected image is discussed in more detail below with respect to
Image information is then gathered for the selected image at step 640. The gathered image information may include image tags, context tags, properties, and other information. This is discussed in more detail below with respect to
When there are no more images in a selected content page to analyze, a determination is made as to whether there are more related content pages to analyze at step 655. If there are more related content pages, the next related content page is selected at step 660 and the method of
Similarly, for compressed images, the present technology can optionally analyze the compression ratio and only include those images that provide enough detail.
If the image meets the selected image size requirements, a determination is made as to whether the selected image meets aspect ratio requirements at step 720. In some embodiments, the aspect ratio of an image should be one of several standard aspect ratios, such as 4:3, 3:2, 16:9, 1.85:1, 2.39:1, 4:1, 1:4 and other ratios. If the selected image does not meet the aspect ratio requirement, the image is not included in the image collection at step 750. If the image does meet the aspect ratio requirement, then a determination is made as to whether the selected image meets color content requirements at step 730. The color content requirements may require that the entire image not be one color, that the image not have a contrast which includes one or more horizontal lines, or that some other color content requirement be met. If the image does not meet the color content requirements, the image is not included in the image collection at step 750. If the image does meet the color content requirements, the image is included in the image collection at step 740.
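The sequence of checks can be sketched as a chain of predicates. In this illustration an image is represented as a list of rows of RGB tuples; the minimum dimensions, the 5% tolerance around each standard ratio, and all function names are assumptions, and only the single-color test is shown for the color content requirement.

```python
# Standard ratios named in the description, expressed as width / height.
STANDARD_RATIOS = (4 / 3, 3 / 2, 16 / 9, 1.85, 2.39, 4.0, 0.25)


def meets_size(width, height, min_width=64, min_height=64):
    """Size check; the minimum dimensions are assumed values."""
    return width >= min_width and height >= min_height


def meets_aspect_ratio(width, height, tolerance=0.05):
    """True if width/height is within the tolerance of a standard ratio."""
    ratio = width / height
    return any(abs(ratio - r) <= tolerance * r for r in STANDARD_RATIOS)


def meets_color_content(pixels):
    """True unless every pixel has the same color."""
    first = pixels[0][0]
    return any(p != first for row in pixels for p in row)


def include_in_collection(pixels):
    """Apply the size, aspect ratio, and color content checks in order."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    if height == 0 or width == 0:
        return False
    return (
        meets_size(width, height)
        and meets_aspect_ratio(width, height)
        and meets_color_content(pixels)
    )
```

Because the checks are joined with `and`, an image that fails an earlier check is rejected without the later checks being evaluated, mirroring the order of the determinations above.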
In some embodiments, other analysis techniques and filters may be performed on an image to determine if it may be suitable to include in an image collection. For example, an image may be analyzed to identify the image compression ratio. In some embodiments, a less compressed image may have better quality and therefore be preferred for including in an image collection. Additionally, candidate images may be scored according to some criteria, such as by relevance or some other criteria, and the top scored candidates (for example, the top 10) may be included in the image collection.
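The compression check and the top-scored selection might be combined as in the sketch below. It assumes the compression ratio is estimated as uncompressed size divided by stored size, and that each candidate is a dictionary with `url`, `width`, `height`, `file_bytes`, and `score` fields; these field names and the `max_ratio` value are hypothetical.

```python
import heapq


def compression_ratio(width, height, file_bytes, bytes_per_pixel=3):
    """Uncompressed size divided by stored size; larger means more compressed."""
    return (width * height * bytes_per_pixel) / file_bytes


def top_candidates(candidates, limit=10, max_ratio=50.0):
    """Drop heavily compressed images, then return URLs of the top scorers."""
    detailed = [
        c for c in candidates
        if compression_ratio(c["width"], c["height"], c["file_bytes"]) <= max_ratio
    ]
    best = heapq.nlargest(limit, detailed, key=lambda c: c["score"])
    return [c["url"] for c in best]
```

`heapq.nlargest` returns the selected candidates in descending score order, so the resulting list can be used directly as a relevance-sorted collection.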
Properties of the page where the image is embedded are also used for scoring and/or filtering. For example, the domain (is it the same as the context page), and the local path (is it a drill down from the context page, stepping up from the context page, or stepping aside from the context page) can both be considered. An example of scoring a page is to extract the keywords of the page using any known algorithm. The present technology may extract the keywords from both the page given in the input URL and the candidate page to be scored. The more keywords the two pages have in common, the higher the score that candidate page will have.
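As one possible instance of keyword-based scoring, the sketch below extracts the most frequent words from each page and counts the keywords the two pages share. The frequency-based extraction, the short stop-word list, and the function names are assumptions made for the example; any known keyword extraction algorithm could be substituted.

```python
import re
from collections import Counter

# A deliberately small stop-word list, assumed for illustration only.
STOP_WORDS = frozenset(
    {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "with"}
)


def extract_keywords(text, limit=10):
    """Return up to `limit` of the most frequent non-stop words in text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return {word for word, _ in counts.most_common(limit)}


def keyword_score(context_text, candidate_text, limit=10):
    """Score a candidate page by the number of keywords shared with the context."""
    shared = extract_keywords(context_text, limit) & extract_keywords(candidate_text, limit)
    return len(shared)
```

A candidate page that shares more keywords with the page at the input URL receives a higher score, and a candidate with no shared keywords scores zero.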
First, a determination is made as to whether the selected image has an “active area” within the image at step 810. An active area may be determined as a portion of an image that differs from its surrounding pixels or majority of the remainder of the image, such as a person standing in front of a wall. If the selected image has an active area, then the method of
A determination is made as to whether the selected image has an area of high interest as determined by other means at step 830. These also can affect the filtering of the image, but are computationally intensive for the basic inspection. Examples of other means of determining areas of high interest include facial recognition, edge detection, filtering, threshold processing, and other methods. If the selected image has an area of high interest at step 830, then the image is cropped to include the detected high interest area at step 840. If the selected image does not have an area of high interest, a determination is made as to whether the selected image has a textual portion of graphics at step 850. The selected image may have graphics that form text, which is generally of low interest. If the selected image does have a graphic, textual portion, then the image is cropped to remove the detected low interest graphic text portion at step 870. If the selected image does not have a textual graphic portion, a determination is made as to whether the selected image has some other area of low interest at step 860. Areas of low interest may be determined using methods known in the field of image processing, similar to those listed above to determine high interest areas. If the image does not have any other area of low interest, the method of
The method of
Next, a determination is made as to whether navigation and/or zoom input is received for the image collection page at step 1020. Navigational input may include an input to move or scroll the page within the image to the left, right, up or down. Zoom input may include input to either zoom in to view a closer up view of the image or to zoom out to view more of the image at less detail. If input is not received at step 1020, the method of
A determination is made as to whether user input to filter the image collection images has been received at step 1030. A user may provide input to adjust the image collection after the image collection is initially displayed for the user. User input to filter the collection images may include input to filter the images by the image date, tags associated with the image, image description information, or other information. If no user input is received to filter the image collection, the method of
A determination is made as to whether a mouse over input is received for an image within the image collection at step 1045. A mouse over input may include positioning a cursor over an image and keeping the cursor in that position for two or more seconds. If a mouse over input is not received, the method of
A determination is made as to whether a mouse select input is received for an image within the image collection at step 1055. The mouse select input may include a left or right mouse click while the cursor is placed over an image within the image collection. If no mouse select input is received at step 1055, the method of
Computing environment 1200 of
The technology described herein is operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable for use with the technology herein include, but are not limited to, personal computers, server computers, hand-held or laptop devices, mobile phones or devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
The technology herein may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, and so forth that perform particular tasks or implement particular abstract data types. The technology herein may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
With reference to
Computer 1210 typically includes a variety of computer readable media. Computer readable media can be any available media that can be accessed by computer 1210 and includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computer 1210. Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer readable media.
The system memory 1230 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 1231 and random access memory (RAM) 1232. A basic input/output system 1233 (BIOS), containing the basic routines that help to transfer information between elements within computer 1210, such as during start-up, is typically stored in ROM 1231. RAM 1232 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 1220. By way of example, and not limitation,
The computer 1210 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only,
The drives and their associated computer storage media discussed above and illustrated in
The computer 1210 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 1280. The remote computer 1280 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 1210, although only a memory storage device 1281 has been illustrated in
When used in a LAN networking environment, the computer 1210 is connected to the LAN 1271 through a network interface or adapter 1270. When used in a WAN networking environment, the computer 1210 typically includes a modem 1272 or other means for establishing communications over the WAN 1273, such as the Internet. The modem 1272, which may be internal or external, may be connected to the system bus 1221 via the user input interface 1260, or other appropriate mechanism. In a networked environment, program modules depicted relative to the computer 1210, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation,
The foregoing detailed description of the technology herein has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the technology to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. The described embodiments were chosen in order to best explain the principles of the technology and its practical application to thereby enable others skilled in the art to best utilize the technology in various embodiments and with various modifications as are suited to the particular use contemplated. It is intended that the scope of the technology be defined by the claims appended hereto.