This invention relates generally to a user interface and a method of interfacing with a client computer system over a network such as the internet, and more specifically to such an interface and method for conducting local searches and obtaining geographically relevant information.
The internet is often used to obtain information regarding businesses, events, movies, etc. in a specific geographic area. A user interface is typically stored on a server computer system and transmitted over the internet to a client computer system. The user interface typically has a search box for entering text. A user can then select a search button to transmit a search request from the client computer system to the server computer system. The server computer system then compares the text with data in a database or data source and extracts information based on the text from the database or data source. The information is then transmitted from the server computer system to the client computer system for display at the client computer system.
The invention provides a user interface including a first view transmitted from a server computer system to a client computer system, the first view including a search identifier, interaction with the search identifier causing transmission of a search request from the client computer system to the server computer system, the search request being utilized at the server computer system to extract at least an initial search result and a plurality of related search suggestions from a search data source, the initial search result including information relating to a geographic location, and a second view, at least part of which may be transmitted from the server computer system to the client computer system for display at the client computer system in response to the user interacting with the search identifier, the second view including the initial search result and the plurality of related search suggestions, selection of a respective related search suggestion causing transmission of a related search request from the client computer system to the server computer system.
The related search request may be utilized at the server computer system to extract at least one subsequent search result, and at least part of a third view may be transmitted from the server computer system to the client computer system, the third view including the subsequent search result.
A plurality of initial search results may be extracted and included in the second view and a plurality of subsequent search results may be extracted and included in the third view.
The second view may include a plurality of context identifiers, a plurality of the related search suggestions being associated with each context identifier.
The first view may include a plurality of vertical search determinators, wherein the search result depends on a respective one of the vertical search determinators.
A second context identifier may be displayed in the second view if the search result depends on the second one of the vertical search determinators and the second context identifier may not appear in the second view if the search result depends on the first one of the vertical search determinators.
The context identifiers may include at least one of category, neighborhood, genre, and venue.
The context identifiers may include at least two of category, neighborhood, genre, and venue.
The related search suggestions may be neighborhoods other than a neighborhood of the initial search result but in the same city as a city of the initial search result.
The initial search result may have a plurality of categories and the related search suggestions are the categories.
The categories may be all one of restaurant type and movie genre.
The information relating to the geographic location may include an address.
The second view may include a map and the information relating to the geographic location may be used to indicate the geographic location on the map.
The invention also provides a method of interfacing with a client computer system, including transmitting a first view from a server computer system to the client computer system, the first view including a search identifier, in response to a user interacting with the search identifier, receiving an initial search request from a client computer system at the server computer system, utilizing the initial search request at the server computer system to extract at least one initial search result and a plurality of related search suggestions from at least one search data source, the initial search result including information relating to a geographic location, and transmitting at least part of a second view from the server computer system to the client computer system for display at the client computer system, wherein the second view may include the initial search result and the plurality of related search suggestions, selection of a respective related search suggestion causing transmission of a related search request from the client computer system to the server computer system.
The method may further include utilizing the related search request at the server computer system to extract at least one subsequent search result, and transmitting at least part of a third view from the server computer system to the client computer system, the third view including the subsequent search result.
A plurality of initial search results may be extracted and included in the second view and a plurality of subsequent search results may be extracted and included in the third view.
The second view may include a plurality of context identifiers, a plurality of the related search suggestions being associated with each context identifier.
The first view may include a plurality of vertical search determinators, wherein the search result depends on a respective one of the vertical search determinators.
A second context identifier may be displayed in the second view if the search result depends on the second one of the vertical search determinators and the second context identifier does not appear in the second view if the search result depends on the first one of the vertical search determinators.
The context identifiers may include at least one of category, neighborhood, genre, and venue.
The context identifiers may include at least two of category, neighborhood, genre, and venue.
The related search suggestions may be neighborhoods other than a neighborhood of the initial search result but in the same city as a city of the initial search result.
The initial search result may have a plurality of categories and the related search suggestions are the categories.
The categories may be all one of restaurant type and movie genre.
The information relating to the geographic location may include an address.
The second view may include a map and the information relating to the geographic location may be used to indicate the geographic location on the map.
A plurality of search results may be transmitted, each including information relating to a different geographic location, wherein information relating to a respective one of the search results may be displayed on the map upon selection of at least one component of the respective search result.
The first view may include a map and the second view may include at least a first static location marker at a first fixed location on the map of the second view due to selection of a location marker at the fixed location on the map of the first view.
The method may further include storing a profile page, wherein the first view may include a plurality of verticals, selection of a respective vertical causing the display of a respective search identifier associated with the respective vertical, the search request received from a client computer system at the server computer system being in response to the user interacting with one of the search identifiers, and display of the profile page independent of the search identifier that the user interacts with.
A plurality of search results may be extracted and included in the second view, the method further including receiving a driving direction request relating to a selected one of the search results from the client computer system at the server computer system, in response to the driving direction request, calculating driving directions to the selected search result, and transmitting at least part of a third view from the server computer system to the client computer system, the third view including the driving directions to the selected search result and at least one of the search results other than the selected search result.
The second view may include at least one component that is in substantially the same location as in the first view.
The method may further include transmitting a third view from the server computer system to the client computer system, the third view including a reproduction selector, and in response to a reproduction command transmitted from the client computer system to the server computer system upon selection of the reproduction selector, transmitting a fourth view from the server computer system to the client computer system, the fourth view including the search result included in the second view.
The first view may include a location identifier, a selected location being transmitted from the client computer system to the server computer system due to interaction of the user with the location identifier, causing at least one of the search results to be based on the selected location.
A plurality of search results may be extracted, the method further including determining a number of the search results that have geographic locations within a selected area, wherein the search results that are included in the second view include search results with geographic locations outside the selected area if the number of the search results that have geographic locations within the selected area is less than a predetermined threshold value.
The search result may be extracted due to a comparison between the search request and a first field of the search result and the search result may be extracted due to a comparison between the search request and a second field of the search result.
The invention also provides a computer-readable medium having stored thereon a set of instructions which, when executed by at least one processor of at least one computer, executes a method including transmitting a first view from a server computer system to the client computer system, the first view including a search identifier, in response to a user interacting with the search identifier, receiving an initial search request from a client computer system at the server computer system, utilizing the initial search request at the server computer system to extract at least one initial search result and a plurality of related search suggestions from at least one search data source, the initial search result including information relating to a geographic location, and transmitting at least part of a second view from the server computer system to the client computer system for display at the client computer system, wherein the second view may include the initial search result and the plurality of related search suggestions, selection of a respective related search suggestion causing transmission of a related search request from the client computer system to the server computer system.
The invention is further described by way of example with reference to the accompanying drawings wherein:
Network and Computer Overview
The server computer system 16 has stored thereon a crawler 19, a collected data store 21, an indexer 22, a plurality of search databases 24, a plurality of structured databases and data sources 26, a search engine 28, and the user interface 12. The novelty of the present invention revolves around the user interface 12, the search engine 28 and one or more of the structured databases and data sources 26.
The crawler 19 is connected over the internet 14A to the remote sites 20. The collected data store 21 is connected to the crawler 19, and the indexer 22 is connected to the collected data store 21. The search databases 24 are connected to the indexer 22. The search engine 28 is connected to the search databases 24 and the structured databases and data sources 26. The client computer systems 18 are located at respective client sites and are connected over the internet 14B and the user interface 12 to the search engine 28.
Reference is now made to
A user at one of the client computer systems 18 accesses the user interface 12 over the internet 14B (step 36). The user can enter a search query in a search box in the user interface 12, and either hit “Enter” on a keyboard or select a “Search” button or a “Go” button of the user interface 12 (step 38). The search engine 28 then uses the “Search” query to parse the search databases 24 or the structured databases or data sources 26. In the example where a “Web” search is conducted, the search engine 28 parses the search database 24 having general Internet Web data (step 40). Various technologies exist for comparing or using a search query to extract data from databases, as will be understood by a person skilled in the art.
The search engine 28 then transmits the extracted data over the internet 14B to the client computer system 18 (step 42). The extracted data typically includes uniform resource locator (URL) links to one or more of the remote sites 20. The user at the client computer system 18 can select one of the links to one of the remote sites 20 and access the respective remote site 20 over the internet 14C (step 44). The server computer system 16 has thus assisted the user at the respective client computer system 18 to find or select one of the remote sites 20 that have data pertaining to the query entered by the user.
The exemplary client computer system 18 includes a processor 130 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), or both), a main memory 132 (e.g., read-only memory (ROM), flash memory, dynamic random access memory (DRAM) such as synchronous DRAM (SDRAM) or Rambus DRAM (RDRAM), etc.), and a static memory 134 (e.g., flash memory, static random access memory (SRAM), etc.), which communicate with each other via a bus 136.
The client computer system 18 may further include a video display 138 (e.g., a liquid crystal display (LCD) or a cathode ray tube (CRT)). The client computer system 18 also includes an alpha-numeric input device 140 (e.g., a keyboard), a cursor control device 142 (e.g., a mouse), a disk drive unit 144, a signal generation device 146 (e.g., a speaker), and a network interface device 148.
The disk drive unit 144 includes a machine-readable medium 150 on which is stored one or more sets of instructions 152 (e.g., software) embodying any one or more of the methodologies or functions described herein. The software may also reside, completely or at least partially, within the main memory 132 and/or within the processor 130 during execution thereof by the client computer system 18, the memory 132 and the processor 130 also constituting machine readable media. The software may further be transmitted or received over a network 154 via the network interface device 148.
While the instructions 152 are shown in an exemplary embodiment to be on a single medium, the term “machine-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database or data source and/or associated caches and servers) that store the one or more sets of instructions. The term “machine-readable medium” shall also be taken to include any medium that is capable of storing, encoding, or carrying a set of instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies of the present invention. The term “machine-readable medium” shall accordingly be taken to include, but not be limited to, solid-state memories and optical and magnetic media.
Local Searching and Interface
The user enters an address (in the present example, the internet address http://city.ask.com/city/) in the address box 164. A mouse (i.e., the cursor control device 142 of
The view 190A includes a search area 192, a map area 194, a map editing area 196, and a data saving and recollecting area 198. The view 190A of user interface 12 does not, at this stage, include a results area, a details area, or a driving directions area. It should be understood that all components located on the search area 192, the map area 194, the map editing area 196, the data saving and recollecting area 198, a results area, a details area, and a driving directions area form part of the user interface 12 in
The search area 192 includes vertical search determinators 200, 202, and 204 for “Businesses,” “Events,” and “Movies” respectively. An area below the vertical search determinator 200 is open and search identifiers in the form of a search box 206 and a search button 208 together with a location identifier 210 are included in the area below the vertical search determinator 200. Maximizer selectors 212 are located next to the vertical search determinators 202 and 204.
The map area 194 includes a map 214, a scale 216, and a default location marker 218. The map 214 covers the entire surface of the map area 194. The scale 216 is located on a left portion of the map 214. A default location, in the present example an intersection of Mission Street and Jessie Street in San Francisco, Calif., 94103, is automatically entered into the location identifier 210, and the default location marker 218 is positioned on the map 214 at a location corresponding to the default location in the location identifier 210. Different default locations may be associated with respective ones of the client computer systems 18 in
Included on the map editing area 196 are a map manipulation selector 220, seven map addition selectors 222, a clear selector 224, and an undo selector 226. The map addition selectors 222 include map addition selectors 222 for text, location markers, painting of free-form lines, drawing of straight lines, drawing of a polygon, drawing of a rectangle, and drawing of a circle.
The data saving and recollecting area 198 includes a plurality of save selectors 228. The save selectors 228 are located in a row from left to right within the data saving and recollecting area 198.
The search box 206 serves as a field for entering text. The user moves the cursor 172 into the search box 206 and then depresses the left button on the mouse to allow for entering of the text in the search box 206. In the present example, the user enters search criteria “Movies” in the search box 206. The user decides not to change the contents within the location identifier 210. The user then moves the cursor over the search button 208 and completes selection of the search button 208 by depressing the left button on the mouse.
Referring again to
In the present example, the data source entry 232 is extracted if any one of the fields 234, 236, 238, or 240 is for a movie. In addition, the data source entry 232 is extracted only if the coordinates of latitude and longitude 244 are within a predetermined radius, for example within one mile, from coordinates of latitude and longitude of the intersection of Mission Street and Jessie Street. Should an insufficient number, for example, fewer than ten, data source entries such as the data source entry 232 for movies have coordinates of latitude and longitude 244 within a one-mile radius from the coordinates of latitude and longitude of Mission Street and Jessie Street, the threshold radius will be increased to, for example, two miles. All data source entries or movies having coordinates of latitude and longitude 244 within a two-mile radius of coordinates of latitude and longitude of Mission Street and Jessie Street are extracted for transmission to the client computer system 18.
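The radius expansion described above can be illustrated with a minimal sketch. It is not the claimed implementation: the `Entry` class, the function names, and the use of the haversine formula for distance are assumptions made for illustration, while the one-mile and two-mile radii and the threshold of ten follow the example in the text.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an approximation


@dataclass(frozen=True)
class Entry:
    """Hypothetical stand-in for a data source entry with coordinates."""
    name: str
    lat: float
    lon: float


def distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def search_with_expansion(entries, center, radii=(1.0, 2.0), min_results=10):
    """Return the entries within the smallest radius that yields enough results.

    If no radius yields `min_results` entries, the entries within the
    largest radius are returned.
    """
    lat0, lon0 = center
    hits = []
    for radius in radii:
        hits = [e for e in entries
                if distance_miles(lat0, lon0, e.lat, e.lon) <= radius]
        if len(hits) >= min_results:
            break
    return hits
```

Passing the radii as a sequence keeps the sketch open to further expansion steps, although the text describes only a single increase from one mile to two miles.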
A plurality of location markers 252 are displayed on the map 214. The location markers 252 have the same numbering as the search results in the results area 248. The coordinates of latitude and longitude 244 of each data source entry 232 in
Also included in the search area 192 in the view 190B are a context identifier 256 and a plurality of related search suggestions 258. The context identifier 256 is for “neighborhood” and is thus similar to “neighborhood” of the context 240 in
When a search is conducted, one or more coordinates are extracted for a location of the search. In the present example, the coordinates of latitude and longitude of the intersection of Mission Street and Jessie Street in San Francisco are extracted. The coordinates are then compared with the areas in the table of
The related search suggestions 258 are thus the result of an initial search for movies near Mission Street and Jessie Street in San Francisco, Calif. When the user selects one of the related search suggestions 258 in the view 190B, a subsequent search will be carried out at the server computer system 16 according to the method of
A comparison between
As mentioned, the user can select or modify various ones of the components within the search area 192 in the view 190B of
Selection of the name of the sixth search result causes transmission of a results selection request, also serving the purpose of a profile page request, from the client computer system 18 in
A window 268 is also inserted on the map 214 and a pointer points from the window 268 to the location marker 252 numbered “6.” The exact same information at the sixth search result in the results area 248 in the view 190B of
Persistence is provided from one view to the next. The search area 192, the map area 194, the map editing area 196, and the data saving and recollecting area 198 are in the exact same locations when comparing the view 190B of
The movies portions of the movies and show times 266 are selectable. In the present example, the user selects the movie “The Good Shepherd” to cause transmission of a profile page request from the client computer system 18 in
A search box 274, a location identifier 276, a date identifier 278, and a search button 280 are inserted in an area below the vertical search determinator 204 for “Movies.”
In the present example, the user enters “AMC 1000 Van Ness” in the search box 274. The user elects to keep the default intersection of Mission Street and Jessie Street, San Francisco, Calif., 94103 in the location identifier 276, and elects to keep the date in the date identifier 278 at today, Monday, Feb. 5, 2007. The user then selects the search button 280. Upon selection of the search button, the details area 260 in the view 190 of
The view 190E of
Ten search results are included within the results area 248 and six of the search results are shown at a time by sliding the vertical scroll bar 250 up or down. All ten search results are shown on the map 214. Only four of the results are within a circle 275 having a smaller radius, for example a radius of two miles, from an intersection of Mission Street and Jessie Street, San Francisco, Calif., 94103. Should there be ten search results within the circle 275, only the ten search results within the circle 275 would be included on the map 214 and within the results area 248. The server computer system 16 recognizes that the total number of search results within the circle 275 is fewer than ten and automatically extracts and transmits additional search results within a larger circle 277 having a larger radius of, for example, four miles from an intersection of Mission Street and Jessie Street, San Francisco, Calif., 94103. All ten search results are shown within the larger circle 277. The circles 275 and 277 are not actually displayed on the map 214 and are merely included on the map 214 for purposes of this description.
In
In
Data stored in one of the structured databases or data sources 26 in
When a neighborhood or a ZIP code is selected in the location identifier 276, a search is first conducted within a first rectangle that approximates an area of the neighborhood or ZIP code. If insufficient search results are obtained, the search is automatically expanded to a second rectangle that is larger than the first rectangle and includes the area of the first rectangle. The second rectangle may, for example, have a surface area that is between 50% and 100% larger than the first rectangle.
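A rough sketch of the rectangle expansion follows. The `Rect` class and the function names are hypothetical, the rectangle is grown about its center (the text only requires that the second rectangle include the first), and area is measured in square degrees as a simplification rather than as true surface area.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in degrees (hypothetical representation)."""
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat, lon):
        return self.south <= lat <= self.north and self.west <= lon <= self.east

    def area(self):
        # Square degrees; a simplification of true surface area.
        return (self.north - self.south) * (self.east - self.west)


def expand_rect(rect, extra_area=0.75):
    """Grow `rect` about its center so that its area increases by `extra_area`.

    extra_area=0.75 means 75% larger, within the 50% to 100% range in the text.
    """
    scale = math.sqrt(1.0 + extra_area)  # each side scales by the square root
    half_h = (rect.north - rect.south) * scale / 2
    half_w = (rect.east - rect.west) * scale / 2
    mid_lat = (rect.north + rect.south) / 2
    mid_lon = (rect.east + rect.west) / 2
    return Rect(mid_lat - half_h, mid_lon - half_w,
                mid_lat + half_h, mid_lon + half_w)


def search_rect(points, rect, min_results=10, extra_area=0.75):
    """Search the first rectangle; use the expanded one if results are insufficient."""
    hits = [p for p in points if rect.contains(p[0], p[1])]
    if len(hits) < min_results:
        bigger = expand_rect(rect, extra_area)
        hits = [p for p in points if bigger.contains(p[0], p[1])]
    return hits
```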
The window 268 in the view 190K of
The server computer system then calculates driving directions. The driving directions are then transmitted from the server computer system 16 to the client computer system 18 and are shown in the driving directions area 300 of the view 190R in
The server computer system also calculates a path 310 from the start location to the end location and displays the path 310 on the map 214.
Further details of how driving directions and a path on a map are calculated are described in U.S. patent application Ser. No. 11/677,847, which is incorporated herein by reference.
The user can, at any time, select a results maximizer 312, for example in the view 190S of
The user then selects a location for making the addition on the map 214. Various types of additions can be made to the map depending on the addition selector 222 that is selected. Upon indicating where the additions should be made on the map 214, a command is transmitted to the processor 130 in
The user can at any time remove all the additions to the map 214 by selecting the clear selector 224. The user can also remove the last addition made to the map by selecting the undo selector 226. An undo or clear command is transmitted to the processor 130 (step 328). The processor 130 receives the undo or clear command and responds to the undo or clear command by removing the addition or additions from the map 214 (step 330).
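The undo and clear behavior can be pictured as operations on an ordered list of additions. The class below is an illustrative sketch only; `MapAdditions` and its method names are assumptions and do not appear in the description.

```python
class MapAdditions:
    """Keeps user-made map additions in the order in which they were made."""

    def __init__(self):
        self._items = []

    def add(self, kind, geometry):
        # `kind` might be "text", "marker", "line", "polygon",
        # "rectangle" or "circle"; the labels are hypothetical.
        self._items.append((kind, geometry))

    def undo(self):
        """Remove only the most recent addition, if there is one."""
        if self._items:
            self._items.pop()

    def clear(self):
        """Remove every addition."""
        self._items.clear()

    def items(self):
        """Return a copy of the current additions, oldest first."""
        return list(self._items)
```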
Upon selection of the clear selector 224, the undo selector 226, or the map manipulation selector 220, the cursor 172 reverts to an open hand and can be used to drag and drop the map 214.
The user may, at any time, decide to save the contents of a view, and in doing so will select one of the save selectors 228. A save command is transmitted from the client computer system 18 to the server computer system 16 (step 340 in
The user may now optionally close the browser 160. When the browser 160 is again opened, the user can conduct another search, for example a search for a restaurant near Union Street, San Francisco, Calif. The search results in the results area 248 will only include results for the search conducted by the user and the locations of the search results will be displayed on the map 214 without the static location markers or additions shown in the view 190V of
Any further views of the user interface 12 include the reproduction selector 356 and any further reproduction selectors (not shown) that have been created by the user at different times and have not been deleted. The user can select the reproduction selector 356 in order to retrieve the information in the view 190V of
It should be evident to one skilled in the art that the sequence that has been described with reference to the foregoing drawings may be modified. Frequent use is made in the description and the claims of a “first” view and a “second” view. It should be understood that the first and second views may be constructed from the exact same software code and may therefore be the exact same view at first and second moments in time. “Transmission” of a view should not be limited to transmission of all the features of a view. In some examples, an entire view may be transmitted and be replaced. In other examples, Asynchronous JavaScript™ and XML (AJAX™) may be used to update a view without any client-server interaction, or may be used to only partially update a view with client-server interaction.
Similarly,
Similarly,
In a different embodiment, the figure entities drawn on the map, the polygon 506, for example, may be used by the server computer system to define latitude and longitude coordinates using only the outline of the figure entity, without the enclosed area. In this embodiment, the figure entities such as the polygon 506 may be treated as a series of line segments. In the same manner as in
Search System
Query Processing System
The query processing system (QPS) 650 performs three main functions: a) parsing/disambiguation; b) categorization; and c) transformation.
Categorization
The where-component is sent to a second classification component 706, which comprises an ambiguity resolution component 708 and a selection component 710. The ambiguity resolution component 708 determines whether the where-component contains a geographical location. The selection component 710 receives a where-component containing a geographical location from the ambiguity resolution component 708 and determines the resulting location. A view 712 for changing the result location is provided so that the user can select a more appropriate location for the user query if it differs from the location selected by the selection component 710. The second classification component 706 then sends the location to the transmission component 714. The transmission component 714 sends the processed user query, the classification, and the location to the backend search engine.
The QPS 650 processes every query both on the reply page (e.g., one of the search databases 24 in
“What” Component:
The query processing system can parse user queries, identify their “what” component, and classify them in different buckets: business names, business chain names, business categories, event names, and event categories.
If no transformation operation can be performed, the QPS 650 sends the original user query and its classification to the backend local search engine. The backend local search engine will make use of the classification provided by the QPS 650 so as to change the ranking method for the search results. Different query classes determined by the QPS 650 correspond to different ranking options on the backend side. For example, the QPS 650 may classify “starbucks” as a business name, while it may categorize “coffee shops” as a business category.
The ability to change ranking method depending on the classification information provided by the QPS 650 has a crucial importance in providing local search results that match as closely as possible the intent of the user, in both dimensions: name and category.
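As a hedged illustration of classification-dependent ranking, the sketch below uses two tiny hand-written lexicons and two ranking rules. The lexicons, the record fields (`name`, `category`, `distance_miles`), and the rules themselves are assumptions chosen for illustration; the description does not specify how the backend ranks results for each class.

```python
# Hypothetical lexicons; a real system would derive these from data.
BUSINESS_NAMES = {"starbucks", "gap", "best buy", "apple"}
BUSINESS_CATEGORIES = {"coffee shops", "restaurant", "clothing"}


def classify_what(query):
    """Assign the "what" component of a query to a bucket."""
    q = query.strip().lower()
    if q in BUSINESS_NAMES:
        return "business_name"
    if q in BUSINESS_CATEGORIES:
        return "business_category"
    return "unclassified"


def rank(query, records):
    """Order records according to the query class.

    Each record is a dict with 'name', 'category' and 'distance_miles'.
    """
    q = query.strip().lower()
    label = classify_what(query)
    if label == "business_name":
        # Name matches come first however far away they are;
        # distance only orders records within each group.
        return sorted(records,
                      key=lambda r: (q not in r["name"].lower(),
                                     r["distance_miles"]))
    if label == "business_category":
        # Every record in the category is considered; whether the query
        # word also appears in the name gives no advantage.
        matches = [r for r in records if r["category"].lower() == q]
        return sorted(matches, key=lambda r: r["distance_miles"])
    return sorted(records, key=lambda r: r["distance_miles"])
```

With these rules, a query for "starbucks" places a distant Starbucks ahead of a nearer independent coffee shop, whereas a query for "coffee shops" orders both purely by distance.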
Business Name Examples
In a particular geographic location there might not be “starbucks” coffee shops nearby. However, if the user explicitly specifies a request for “starbucks” in that location, the system will be able to provide results for “starbucks” even if they are far away and there are other coffee shops that are not “starbucks” closer to the user-specified location.
There might be database records for which common words that are also business names have been indexed, such as “gap,” “best buy,” “apple.” The QPS 650 recognizes that these are proper and very popular business names, thus making sure that the local backend search engine gives priority to the appropriate search results (instead of returning, for example, grocery stores that sell “apples”).
Category Name Examples
There might exist businesses whose full name (or parts thereof) in the database contains very common words that most typically correspond to a category of businesses. For example, in a particular geographic location there might be several restaurants that contain the word “restaurant” in the name, even if they are not necessarily the best restaurants that should be returned as results for a search in that location. The QPS 650 will recognize the term “restaurant” as a category search, and this classification will instruct the local backend search engine to consider all restaurants without giving undue relevance to those that just happen to contain the word “restaurant” in their name.
“Where” Component:
The QPS 650 can parse user queries and identify their “where” component. The QPS 650 performs two main subfunctions in analyzing user queries for reference to geographic locations: ambiguity resolution and selection.
Ambiguity Resolution:
For every user query the QPS 650 determines whether it does indeed contain a geographic location, as opposed to some other entity that may have the same name as a geographic location. For example, the query “san francisco clothing” is most likely a query about clothing stores in the city of San Francisco, whereas “hollister clothing” is most likely a query about the clothing retailer “Hollister Co.” rather than a query about clothing stores in the city of Hollister, Calif. So only the first query should be recognized as a local business search query and sent to the backend local search engine.
The QPS 650 recognizes the parts of user queries that are candidates to be names of geographic locations, and determines whether they are actually intended to be geographic names in each particular query. This determination is based on data that is pre-computed offline.
The algorithm for geographic name interpretation takes as input the set of all possible ways to refer to an object in a geographic context. This set is pre-computed offline through a recursive generation procedure that relies on seed lists of alternative ways to refer to the same object in a geographic context (for example, different ways to refer to the same U.S. state).
For each geographic location expression in the abovementioned set, the QPS 650 determines its degree of ambiguity with respect to any other cultural or natural artifact on the basis of a variety of criteria: use of that name in user query logs, overall relevance of the geographic location the name denotes, number of web results returned for that name, formal properties of the name itself, and others. Based on this information and the specific linguistic context of the query in which a candidate geographic expression is identified, the QPS 650 decides whether that candidate should be indeed categorized as a geographic location.
Selection:
In case there are multiple locations with the same name, the QPS 650 determines which location would be appropriate for most users. Out of all the possible locations with the same name, only the one that is selected by the QPS 650 is sent to the backend local search engine, and results are displayed only for that location. However, a drop-down menu on the reply page gives the user the possibility to choose a different location if they intended to get results for a place different from the one chosen by the QPS 650.
For example, if the user asks for businesses in “Oakland,” the QPS 650 selects the city of Oakland, Calif. out of the dozens of cities in the U.S. that have the same name.
The determination of which city to display results for out of the set of cities with the same name is based on data pre-computed offline. This selection algorithm takes as input the set of all possible ways to refer to an object in a geographic context (this is the same set as the one generated by the recursive generation procedure described hereinbefore). For example, the city of San Francisco can be referred to as “sf,” “san francisco, ca,” “sanfran,” etc. For all cases in which the same linguistic expression may be used to refer to more than one geographic location, the selection algorithm chooses the most relevant on the basis of a variety of criteria: population, number of web results for each geographic location with the same name and statistical functions of such number, and others.
Transformation
The QPS 650 processes every query both on the reply page and in the AskCity local channel and possibly maps the original user query (source query) to a new query (target query) that is very likely to provide better search results than the original query. While every query is processed, only those that are understood with high confidence are mapped to a different target query. Either the original user query or the rewritten target query is sent to the backend local search engine.
The target queries correspond more precisely to database record names or high quality index terms for database records. For example, a user may enter the source query “social security office.” The QPS 650 understands the query with high confidence and maps it to the target query “US social security adm” (this is the official name of the social security office in the database). This significantly improves the accuracy of the search results.
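The confidence-gated mapping from source query to target query can be illustrated with a minimal sketch. The table name REWRITES, the function rewrite_query, the confidence values, and the “cheap eats” entry are hypothetical and serve only to illustrate the behavior; the source does not specify how confidence is represented.

```python
# Hypothetical rewrite table: source query -> (target query, confidence).
# The confidence numbers are illustrative assumptions, not values from the source.
REWRITES = {
    "social security office": ("US social security adm", 0.95),
    "italian food": ("italian restaurants", 0.90),
    "cheap eats": ("restaurants", 0.40),  # hypothetical low-confidence pair
}


def rewrite_query(source: str, min_confidence: float = 0.8) -> str:
    """Return the target query if a mapping exists with high confidence,
    otherwise return the original source query unchanged."""
    key = source.strip().lower()
    target, confidence = REWRITES.get(key, (source, 0.0))
    return target if confidence >= min_confidence else source
```

In this sketch, a query with no mapping or with a mapping below the threshold is passed through unchanged, which mirrors the statement that either the original query or the rewritten target query is sent to the backend local search engine.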
The QPS 650 can perform different types of mappings that improve search accuracy in different ways and target different parts of a user query. The QPS 650 first analyzes the user query into a “what” component and a “where” component. The “what” component may correspond to a business or event (name or category), and the “where” component may correspond to a geographic location (city, neighborhood, ZIP code, etc.). For each component and subtypes thereof, different types of mapping operations may take place.
For example, for business search there are four sub-cases:
Business names: “acura car dealerships”=>“acura”;
Business categories: “italian food”=>“italian restaurants”;
Business name misspellings: “strabucks”=>“starbucks”;
Business category misspellings: “resturant”=>“restaurant.”
Similar sub-cases apply to event search. For locations, there are two sub-cases:
City names: “sf”=>“San Francisco”;
Neighborhood names: “the mission”=>“mission district.”
For each class of sub-cases, a different algorithm is used offline to generate the mapping pairs:
Names and categories (both business and events): mapping pairs are generated on the basis of session data from user query logs. The basic algorithm consists in considering queries or portions thereof that were entered by users in the same browsing session at a short time distance, and appropriately filtering out unlikely candidates using a set of heuristics.
Misspellings (both business and events): mapping pairs are generated on the basis of session data from user query logs. The basic algorithm consists in considering queries or portions thereof that i) were entered by users in the same browsing session at a short time distance; ii) are very similar. Similarity is computed in terms of editing operations, where an editing operation is a character insertion, deletion, or substitution.
Geographic locations (cities and neighborhoods): mapping pairs are generated as a part of the recursive generation procedure mentioned hereinbefore.
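The misspelling case can be sketched as follows, assuming the editing-operation similarity is the standard Levenshtein distance. The function names, the session format (a list of timestamp and query pairs), and the thresholds max_seconds and max_edits are assumptions made for illustration; the heuristics actually used are not given in the source.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of character insertions,
    deletions, and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def misspelling_pairs(session, max_seconds=60, max_edits=2):
    """session: list of (timestamp_seconds, query) in chronological order.
    Returns (earlier, later) pairs of consecutive queries that are close
    in time and within max_edits editing operations of each other."""
    pairs = []
    for (t1, q1), (t2, q2) in zip(session, session[1:]):
        close_in_time = t2 - t1 <= max_seconds
        if q1 != q2 and close_in_time and edit_distance(q1, q2) <= max_edits:
            pairs.append((q1, q2))
    return pairs
```

Treating the later query as the correction of the earlier one is itself an assumption; a production pipeline would presumably also check the candidate target against known names or categories.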
Correlation of Data
The correlated data set 808 already has a reference set of entries. The correlator 806 compares the feed data set 804 with the correlated data set 808 for purposes of linking entries of the feed data set 804 with existing entries in the correlated data set 808. Specifically, the geographical locations of latitude and longitude (see reference numeral 244 in
The duplication detector 810 may be the same duplication detector as the duplication detector 802, but configured slightly differently. The duplication detector 810 detects duplicates in the correlated data set 808. Should one entry have a duplicate, the duplicate is removed, and all entries except the removed duplicate are stored in the search data set 812. The duplication detectors 802 and 810 detect duplicates according to a one-to-many relationship.
The duplication detectors 802 and 810 and the correlator 806 restrict comparisons geographically. For example, entries in San Francisco, Calif. are only compared with entries in San Francisco, Calif., and not also in, for example, Seattle, Wash. Speed can be substantially increased by restricting comparisons to a geographically defined grid.
Soft-term frequency/fuzzy matching is used to correlate web-crawled data and integrate/aggregate feed data, as well as to identify duplicates within data sets. For businesses, match probabilities are calculated independently across multiple vectors (names and addresses) and then the scores are summarized/normalized to yield an aggregate match score. By preprocessing the entities through a geocoding engine and limiting candidate sets to ones that are geographically close, the process is significantly optimized in terms of execution performance (while still using a macro-set for dictionary training).
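The combination of geographic restriction and an aggregate match score can be sketched as below. This is a simplified illustration: difflib.SequenceMatcher from the Python standard library is swapped in for the soft-term frequency matching named above, and the cell size, weights, threshold, and entry fields (name, address, lat, lon) are all assumptions.

```python
import math
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def grid_cell(lat, lon, cell_deg=0.01):
    """Map a coordinate to a grid cell; only entries in the same cell are compared."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def similarity(a, b):
    """Stand-in fuzzy string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(e1, e2, w_name=0.6, w_addr=0.4):
    """Aggregate score from independent name and address similarities."""
    return (w_name * similarity(e1["name"], e2["name"])
            + w_addr * similarity(e1["address"], e2["address"]))


def candidate_duplicates(entries, threshold=0.8):
    """Return name pairs whose aggregate score meets the threshold,
    comparing only entries that fall in the same grid cell."""
    cells = defaultdict(list)
    for entry in entries:
        cells[grid_cell(entry["lat"], entry["lon"])].append(entry)
    found = []
    for bucket in cells.values():
        for e1, e2 in combinations(bucket, 2):
            if match_score(e1, e2) >= threshold:
                found.append((e1["name"], e2["name"]))
    return found
```

Bucketing by cell replaces an all-pairs comparison with comparisons inside each cell only. A fuller version would also compare against neighboring cells so that entries on either side of a cell boundary are not missed.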
Selection of Reliable Key Words from Unreliable Sources
The entropy of a word over a reliable data type (such as a subcategory) is used to filter reliable key words from unreliable sources. For example, there is a set of restaurants with a “cuisine” attribute accompanied by unreliable information from reviews. Each review corresponds to a particular restaurant that has a particular cuisine. If a word has high entropy in its distribution over cuisines, then the word is not valid as a key word. Words with low entropy are more reliable. For example, the word “fajitas” has low entropy because it appears mostly in reviews of Mexican restaurants, and the word “table” has high entropy because it is spread randomly over all restaurants.
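The entropy criterion can be made concrete with a short sketch. It assumes whitespace tokenization, counts each word at most once per review, and measures Shannon entropy in bits; the function name and the input format are hypothetical.

```python
import math
from collections import Counter, defaultdict


def word_entropy_by_category(reviews):
    """reviews: iterable of (category, review_text) pairs.
    Returns {word: Shannon entropy in bits of the word's distribution
    over categories}. Low entropy means the word is concentrated in
    few categories and is a better key word candidate."""
    counts = defaultdict(Counter)
    for category, text in reviews:
        for word in set(text.lower().split()):
            counts[word][category] += 1
    entropies = {}
    for word, by_category in counts.items():
        total = sum(by_category.values())
        entropies[word] = -sum(
            (n / total) * math.log2(n / total) for n in by_category.values()
        )
    return entropies
```

A word seen only in reviews of one cuisine has entropy zero, while a word spread evenly over k cuisines has entropy log2(k). In practice a minimum occurrence count would likely be needed as well, since a word seen only once trivially has zero entropy.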
Multiple Language Models Method for Information Retrieval
Suppose there is a database where objects may have type/category attributes and text attributes. For example, in the “Locations” database, the locations may have:
In some cases a significant part of database objects (>80%) does not have text information at all, so it is impossible to use standard text information retrieval methods to find objects relevant to the user query.
The main idea of the proposed information retrieval method is to build a Language Model for each “type attribute” and then merge them with a Language Model of the object. (A Language Model is usually an N-gram model with N=1, 2 or 3.)
For example, locations may include:
Language Models may include:
Then a final Language Model for Location “S” is built: Ls=Merge (L1,L2,L3). The Merge function may be a linear combination of language models or a more complex function.
Then Ls is used to estimate the probability that query q belongs to Language model Ls. This probability is the information retrieval score of the location s.
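A minimal sketch of the merge and scoring steps follows, assuming unigram models (N=1), a linear-combination Merge function, and a small probability floor for unseen words. The function names, the weights, and the floor value are assumptions, not details given in the source.

```python
import math
from collections import Counter


def unigram_model(texts):
    """Build a unigram language model {word: probability} from texts."""
    counts = Counter(w for t in texts for w in t.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def merge(models, weights):
    """Linear combination of language models; weights should sum to 1."""
    merged = Counter()
    for model, weight in zip(models, weights):
        for word, p in model.items():
            merged[word] += weight * p
    return dict(merged)


def query_log_prob(model, query, floor=1e-6):
    """Log-probability of the query under the model; unseen words get
    a small floor probability instead of zero."""
    return sum(math.log(model.get(w, floor)) for w in query.lower().split())
```

With this arrangement, a location with no text of its own still receives a non-trivial model from its type attributes, so a query can be scored against it.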
Ranking of Objects Using Semantic and Nonsemantic Features
In a ranking algorithm for locations, many things need to be taken into account: semantic similarity between the query and the keywords/texts associated with a location, distance from the location to a particular point, customers' rating of the location, and the number of customer reviews.
A straightforward mix of this information may cause unpredictable results. A typical problem arises when a location that is only partially relevant to the query is at the top of the list because it is very popular or it is near the searching address.
To solve this problem, a vector score calculation method is used. “Vector score” means that the score applies to two or more attributes. For example, a vector score that contains two values is considered: a qualitative semantic similarity score, and a general quantitative score. The qualitative semantic similarity score shows the qualitative relevancy of the particular location to the query:
QualitativeSemanticSimilarityScore=QualitativeSemanticSimilarityScoreFunction (Location, Query).
QualitativeSemanticSimilarityScore has discrete values: relevant to the query, less relevant to the query, . . . , irrelevant to the query.
A general quantitative score may include different components that have different natures:
GeneralQuantitativeScore=a1*SemanticSimilarity (Location, Query)+a2*DistanceScore(Location)+a3*RatingScore(Location).
So the final score includes two attributes S=(QualitativeSemanticSimilarityScore, GeneralQuantitativeScore).
Suppose there are two locations with scores S1=(X1,Y1) and S2=(X2,Y2). To compare the scores the following algorithm may be used:
This method of score calculation prevents penetration of irrelevant objects to the top of the list.
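One comparison that is consistent with the stated goal is a lexicographic ordering: the qualitative score is compared first, and the general quantitative score is used only to break ties. The sketch below is an assumption about how such a comparison could be written, not a reproduction of the algorithm of this description; the numeric encoding of the discrete relevance levels and all names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical numeric encoding of the discrete qualitative levels.
IRRELEVANT, LESS_RELEVANT, RELEVANT = 0, 1, 2


@dataclass(frozen=True)
class VectorScore:
    qualitative: int     # discrete relevance level (X)
    quantitative: float  # weighted mix of similarity, distance, rating (Y)


def rank(locations):
    """locations: list of (name, VectorScore). Sort best first, comparing
    the qualitative level first and the quantitative score second."""
    return sorted(
        locations,
        key=lambda item: (item[1].qualitative, item[1].quantitative),
        reverse=True,
    )
```

Under this ordering a very popular or very close location with a low qualitative level can never outrank a location with a higher qualitative level, whatever its quantitative score.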
Table 1 shows a less-preferred ranking of locations where distance scores and semantic scores have equal weight. According to the ranking method in Table 1, the second location on the distance score has the highest total score, followed by the eighth location on the distance score. The semantic score thus overrules the distance score for at least the second location on the distance score and the eighth location on the distance score.
Table 2 shows a preferred ranking method, wherein the distance scores are never overruled by the semantic scores. The distance scores are in multiples of 0.10. The semantic scores are in multiples of 0.01, and range from 0.01 to 0.09. The largest semantic score of 0.09 is thus never as large as the smallest distance score of 0.10. The total score is thus weighted in favor of the distance scores.
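The arithmetic behind this weighting can be checked with a short sketch. Scores are held as integer hundredths to avoid floating-point rounding; the function name and the step-based parameters are assumptions made for illustration.

```python
def total_score_hundredths(distance_steps: int, semantic_steps: int) -> int:
    """Total score in hundredths, where the distance score is
    distance_steps * 0.10 and the semantic score is semantic_steps * 0.01."""
    if distance_steps < 1:
        raise ValueError("distance score must be at least 0.10")
    if not 1 <= semantic_steps <= 9:
        raise ValueError("semantic score must be between 0.01 and 0.09")
    return distance_steps * 10 + semantic_steps
```

Because the semantic part is at most 9 hundredths and one distance step is 10 hundredths, a location with a higher distance score always has a higher total, and the semantic score only orders locations that share the same distance score.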
While certain exemplary embodiments have been described and shown in the accompanying drawings, it is to be understood that such embodiments are merely illustrative and not restrictive of the current invention, and that this invention is not restricted to the specific constructions and arrangements shown and described since modifications may occur to those ordinarily skilled in the art.
Number | Date | Country |
---|---|---|
20090132644 A1 | May 2009 | US |