The field of the invention is searching technologies.
Searchers are becoming more sophisticated in their use of search engines and other informational tools on the Web. It is true that “everyone googles”, but it is also true that no one types his itinerary into Google's™ search box and expects to get back a list of flights and prices; he knows that Expedia does that kind of work and Google does not. On the other hand, if the searcher knows the name of a company and wants to find its web site, as in searching for “American Airlines” and expecting to get its web address (which happens to be www.AA.com), Google, along with other general web search engines, serves this particular search need well.
The distinctions between the use of Google and that of Expedia teach the following essential characteristics of an online information tool: (1) each has a different database. With a general web search engine, the database is web pages from the entire Web, and for Expedia, which we call an intermediary engine, the database is a product catalogue focused on flights, hotels and car rentals; (2) each takes in different kinds of user input. For a general web search engine, it is a free-text query, typically of a few words; and for intermediary engines, it is a form of multiple fields, each of which is to be filled in by the searcher; (3) each has its own matching mechanism. For a general web search engine, it is essentially exact matching between query words and words in web pages, with the preferred embodiment of proximity search. For intermediary engines, it is exact matching between values of fields in user input and those of fields in the database of a product catalogue.
Each tool serves different search needs of a searcher. When a searcher can formulate his search need in a few words, and wants to find web pages that contain exactly these words, general web search engines serve well. When a searcher can formulate his search need in a few attribute-value pairs, and an intermediary engine contains catalogues of exactly the kind of products the searcher is looking for, then the engine will serve well.
All other information tools can be modeled with the three characteristics mentioned above. We enumerate them below. (1) Home values, such as Zillow.com. A typical input is an address, or a street; expected results are home values; the engine's database is a catalogue of home values at different addresses; (2) Bulletin boards, such as eBay.com. A typical input is keywords; expected results are items for sale; and the engine's database is forms filled out by sellers; (3) Business directories, such as Business.com. A typical input is keywords; expected results are company information; the engine's database is forms filled out by companies; (4) B2B search engines, such as Alibaba.com and GlobalSpec.com. A typical input is either keywords or filled-out forms; expected results are product specifications and their manufacturers; the engine's database is product catalogues of certain classes of products; (5) Comparison shopping sites, such as NexTag.com, which are similar to Expedia in terms of input, results and database; (6) Local search engines, such as CitySearch.com and Google's local.google.com, which are yet another variation of intermediary engines. A typical input consists of two fields, one for the name of a business, or the category of a business, as in “Chinese restaurants”, and the other for a location, as in a city or a zip code; the expected results are a list of businesses, their contact information, and sometimes a short description of their services; and the engine's database is essentially yellow pages information.
The currently available online information tools, while each serves well for the purpose it is built, leave a large white space of un-served search needs. Consider, for example, the situation of a searcher in the area of real estate. She is hunting for an apartment or a house, for a temporary relocation of 12 months. If she wants to use a corporate housing company, then querying “Oakwood corporate housing” or such on Google might well satisfy her search need. If, however, she wants to rent from other parties, and knows the city well enough, searching through Apartments.com's catalogue might suffice. However, if she poses her search need as a natural language query, such as “family of two kids, one dog, looking for an apartment or a house, close to West Los Angeles, with good schools, one year lease”, then no available online tools can return helpful results to her.
For a searcher who is interested in finding information in an area, such as an industry or a specific sector of an industry, a general web search engine is wanting. Among other things, the search engine would typically search against the set of all the web pages that it can crawl from the entire Internet, and these pages number close to 10 billion as of this writing. That is an enormous number given that there are probably fewer than 10 million relevant pages. This disparity in turn leads to the observed situation where returned results for a given query include records that are entirely or largely irrelevant to the area of the searcher's interest.
One way of improving the situation is for the web search engine to partition its database into hundreds or even thousands of areas. The searcher is asked to pick one or more areas when conducting a search, and the engine searches for results only from the area or areas picked by the searcher.
This application pulls together several different concepts, each of which is but a part of the inventive subject matter. Among other things, that subject matter provides systems and methods in which an online information tool has one or more of the following characteristics: (1) taking in user queries that are free text, just like queries to web search engines, but segmenting a query into one or more pieces of information, not unlike a filled-out form used by intermediary engines; and (2) applying knowledge from the given area to each of the above.
The system thus enables the searcher to submit queries that are substantially similar to those posed to an expert in the area, and to get back results that are helpful in their decision making.
We employ the following methods in automatically creating a parameterized database from records such as web pages, with a focus on a given area, such as an industry or a sector:
We employ the following methods to match the best advertisers to a searcher's search need:
We employ the following methods in maximally taking advantage of human factors at user interface:
We provide convenience tools that are an integrated part of search results, with the following methods:
We allow users to place banner ads, company logos, or other images in order to facilitate the searcher's navigation to his frequented online destinations, as well as placing buttons to reach favorite tools, via the following methods:
We provide a language- and region-specific informational experience to a user, via the following methods:
Various aspects of the inventive subject matter can also be perceived as objects and advantages, each of which can be implemented independently of the others, and each of which should be viewed as desirable but not essential.
Viewed from yet another angle, one set of inventions addresses industry knowledge.
Another set of inventions improves searching functionality:
Another set of inventions increases the value of click-throughs to advertisers:
Our inventions can turn a web search engine into a “specialized search engine in multiple areas” by way of partitioning its dataset. Such partitions can advantageously be along industry lines, or even along the lines of sectors within industries.
In a preferred set of embodiments, a “B2B search engine in multiple industries” allows a user to choose an industry from a list of industries, and submit a query. The engine returns results about the chosen industry.
In order to have this search capability, the web search engine could map all the available web page URLs to SIC or other industry codes. That mapping might be stored in a “name-code-url” or “name-code-domain” table. Once the mapping has exhausted all web pages in the dataset, which at the state of the art of 2006, is about 4-8 billion pages, such tables would most likely have only millions as opposed to billions of entries.
The reduced dataset could then be partitioned into multiple industries. At this step, a many-to-one “code-to-industry” table could be created, possibly manually. That table might have only hundreds or thousands of industries. The partitioning software program would then iterate through the “name-code-url” table, and perform the following: (a) for each entry, look up its code in the “code-to-industry” table; and (b) copy the web page of the “url” to a hard-disk location where all web pages belonging to the “industry” are stored.
Once this program has exhausted the “name-code-url” table, there would be multiple datasets corresponding to the different industries. In this way the initial dataset has been partitioned according to industry.
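The partitioning loop described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the actual program: the table rows, the industry codes, the industry names and the function name `partition_by_industry` are all hypothetical, and in-memory lists of URLs stand in for the per-industry hard-disk locations.

```python
from collections import defaultdict

# Hypothetical rows of the "name-code-url" table: (company name, industry code, URL).
NAME_CODE_URL = [
    ("Acme Freight", "4731", "http://acme-freight.example"),
    ("Harbor Warehousing", "4225", "http://harbor-wh.example"),
    ("Blue Sky Air", "4512", "http://bluesky.example"),
]

# Hypothetical many-to-one "code-to-industry" table (illustrative codes only).
CODE_TO_INDUSTRY = {
    "4731": "logistics",
    "4225": "logistics",
    "4512": "airlines",
}


def partition_by_industry(name_code_url, code_to_industry):
    """Group URLs by industry; each list stands in for one on-disk dataset."""
    partitions = defaultdict(list)
    for _name, code, url in name_code_url:
        # Step (a): look up the entry's code in the "code-to-industry" table.
        industry = code_to_industry.get(code)
        if industry is None:
            continue  # assumption: entries with an unmapped code are skipped
        # Step (b): "copy" the page into the dataset for that industry.
        partitions[industry].append(url)
    return dict(partitions)


partitions = partition_by_industry(NAME_CODE_URL, CODE_TO_INDUSTRY)
```

A real system would write the page contents to storage rather than keep URLs in memory; the dictionary is used here only to keep the sketch self-contained.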
Serving user queries.
In one embodiment, the user is asked to provide both (a) a user query, just as he would to a current web search engine; and (b) a selection of one or more industries from a displayed list of industries. The engine will apply the current search technology only to the (partitioned) datasets for the selected industries, and return ranked web pages.
In another embodiment, the user is asked to provide only a user query, just as he would to a current web search engine. The engine returns results; the user can then select one or more industries from a list of industries, and the engine re-processes the results so that only those results from the industry or industries selected by the user are kept and displayed to the user.
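The two embodiments can be contrasted in a short sketch. Everything here is an assumption made for illustration: the page records, the field names (`url`, `industry`, `text`), and the function names are hypothetical, and a plain substring test stands in for "the current search technology". Ranking is omitted.

```python
def keyword_search(query, pages):
    """Stand-in for an existing search technology: keep pages whose text
    contains every query word (case-insensitive substring test)."""
    words = query.lower().split()
    return [p for p in pages if all(w in p["text"].lower() for w in words)]


def search_selected_industries(query, selected, partitions):
    """First embodiment: search only the partitioned datasets the user selected."""
    results = []
    for industry in selected:
        results.extend(keyword_search(query, partitions.get(industry, [])))
    return results


def filter_results_by_industry(results, selected):
    """Second embodiment: re-process results from an unrestricted search,
    keeping only those from the selected industries."""
    return [r for r in results if r["industry"] in selected]


# Hypothetical data set and its partitioning by industry.
PAGES = [
    {"url": "a.example", "industry": "logistics", "text": "Ocean freight forwarding from Oakland"},
    {"url": "b.example", "industry": "airlines", "text": "Air freight and passenger flights"},
    {"url": "c.example", "industry": "logistics", "text": "Public warehousing in Long Beach"},
]
PARTITIONS = {}
for page in PAGES:
    PARTITIONS.setdefault(page["industry"], []).append(page)

first = search_selected_industries("freight", ["logistics"], PARTITIONS)
second = filter_results_by_industry(keyword_search("freight", PAGES), ["logistics"])
```

In this sketch both paths return the same pages; the difference is that the first never touches the datasets of industries the user did not select, while the second searches everything and filters afterwards.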
1. Embedding Browsing of an Information Space in Multiple-Cycled Search
Our engine in general assumes that a search is multi-cycled, namely that more than one query is submitted by the searcher in order to satisfy one search need. A multi-cycled search “session” goes as follows: a searcher starts with a need to satisfy, formulates a query, submits it to our engine, and gets returned results. If the searcher's need is satisfied, he is done. However, it is very likely that he needs to submit one or more additional queries. Many times he modifies a previous query to obtain a new one.
During such a multi-cycled search session, our engine provides suggestions to the searcher on how to formulate a new query. The suggestions take the form of clickable links.
The dataset being searched typically has multiple identifiable portions.
The suggestions can either help the user to explore deeper into a same portion of the dataset, by adding more restrictions to his previous search, or help him to explore more broadly, namely reaching different portions of the dataset, by starting a query that is completely different in wording but related to a previous query.
When the user follows some of these suggestions, his entire search session exhibits a “browsing” nature. (“Browsing” is a familiar behavior with a directory of information. A user clicks on links and in the process goes up and down a hierarchy of information items such as web pages.) The searcher in our case gets a chance to be led to different parts of the information space that is defined by our dataset.
One added benefit is that more parts of our information space are exposed than otherwise. A searcher is unlikely to come up on his own with all possible queries that will retrieve all the information he is looking for, due to issues such as mismatching vocabulary (e.g. the searcher has “UCLA” in the query while the search engine's data contains “University of California at Los Angeles”).
This benefit of exposing more of the dataset is not available with the current web search engines, which typically do not have suggestions that aid the searcher, and even when there are suggestions (in the case of Ask.com, as described in Possible Prior Art), the exposure is not necessarily effective in our opinion.
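The two kinds of suggestions discussed in this section, those that explore deeper by adding a restriction and those that explore more broadly by rewording, might be generated along the following lines. This is only a sketch under assumptions: the tables `REFINEMENTS` and `RELATED`, their contents, and the function `suggest` are hypothetical and do not describe how the engine actually derives its suggestions.

```python
# Hypothetical table: extra restricting terms known for a portion of the dataset.
REFINEMENTS = {"warehousing": ["public", "private", "refrigerated"]}

# Hypothetical table: alternative wordings that reach other portions of the dataset.
RELATED = {"ucla": ["university of california at los angeles"]}


def suggest(query):
    """Return candidate follow-up queries, each of which could be shown as a clickable link."""
    q = query.lower()
    words = q.split()
    deeper, broader = [], []
    for w in words:
        # Deeper: add one more restriction to the previous query.
        for extra in REFINEMENTS.get(w, []):
            if extra not in words:
                deeper.append(f"{q} {extra}")
        # Broader: replace a word with a related wording.
        for alt in RELATED.get(w, []):
            broader.append(" ".join(alt if x == w else x for x in words))
    return {"deeper": deeper, "broader": broader}
```

For instance, `suggest("UCLA housing")` proposes the reworded query "university of california at los angeles housing", which addresses the vocabulary mismatch mentioned above.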
2. User Interface that Facilitates Display of Voluminous Contents
Since a company typically has a lot of information that might be of interest when a searcher is making a business decision, it is important to provide an intuitive, simple interface so that the presentation of the information is not cluttered.
(1) Toggle between Chinese and English text
When a company has both Chinese and English text, on the user interface there is a button for toggling between Chinese and English.
(2) “Shrink-and-Restore” an Area on a Web Page
We have developed the “shrink-and-restore” feature on the user interface, so that with a click on a button, a portion of a web page shrinks into a sliver of a bar, and with a click on that bar, the portion is restored to its original size.
It gives the viewer control over how much real estate on the web page she wants to allocate to things she wants to study.
3. Automated Calculation of Summary
One class of embodiments in our system creates a summary for each company. When a user clicks on a search result, which could be focused on a company, the search result leads to the Summary page rather than directly to the web site of the company as web search engines currently do.
On the page displaying a company's summary, the left column displays the summary information our system has created by synthesizing different sources; and the right column is the web site of the company.
With the Summary, the user can get a quick overview of the company, and decide whether the company is a fitting provider. If the user still cannot make the decision, he can explore the company's web site.
To automatically create a company's summary, the engine picks a number of pieces of information that have been obtained in the step of creating a parameterized database from web records.
These pieces of information, drawn from the parameterized record for an entity (in this case, a company), are selected so that the summary is expected to best facilitate the searcher's decision on whether to use the company's services. They include:
1) An introductory text about the company. Currently our engine simply takes the Meta Description of the company's web site. This could be replaced by a more sophisticated way of extracting a company's motto, tag line, mission statement, overview, or the like.
2) The main services provided by the company
3) Major service regions
4) Contact information, contact persons
5) Any of the information above could be translated into one or more languages
Further, additional pieces of information might be included in the summary dependent on the user query.
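Assembling a summary from a parameterized record might look like the sketch below. The record layout, the field names, and the rule for query-dependent additions (include an optional field when its name appears in the query) are assumptions made for illustration; the text above does not specify them. Translation into other languages is left out.

```python
def build_summary(record, query=""):
    """Pick summary fields from a parameterized company record (a plain dict here)."""
    summary = {
        # 1) Introductory text: the Meta Description of the company's web site.
        "intro": record.get("meta_description", ""),
        # 2) Main services, 3) major service regions, 4) contact information.
        "services": list(record.get("services", [])),
        "regions": list(record.get("regions", [])),
        "contacts": list(record.get("contacts", [])),
    }
    # Query-dependent additions (assumed rule): include an optional field
    # when its name occurs as a word in the user query.
    query_words = set(query.lower().split())
    extras = {k: v for k, v in record.get("optional", {}).items() if k in query_words}
    if extras:
        summary["extras"] = extras
    return summary


# Hypothetical parameterized record.
RECORD = {
    "meta_description": "Full-service freight forwarder since 1985.",
    "services": ["ocean freight", "customs brokerage"],
    "regions": ["US West Coast", "East Asia"],
    "contacts": ["sales@acme-freight.example"],
    "optional": {"awards": ["Forwarder of the Year"], "fleet": ["12 trucks"]},
}
summary = build_summary(RECORD, "freight forwarder awards")
```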
Automated Calculation of the Most Useful Pages of a Company
We calculate the ‘most useful’ pages of a company, and include a small number (currently two) of them in Company Summary of the company. Typically such a page contains much service description, or contains useful contact information, or otherwise information “useful” to a searcher in making his/her decision.
During the Extraction step, our engine assigns a usefulness score to each page based on the recognized services, contact information, or other types of information that are deemed ‘useful’.
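One simple way to realize such a score is a weighted count of the recognized items on each page, followed by picking the top pages. The weights, the field names and the function names below are hypothetical; the text does not state how the score is actually computed.

```python
# Hypothetical weights per type of recognized information.
WEIGHTS = {"services": 2.0, "contacts": 3.0, "other": 1.0}


def usefulness(page):
    """Weighted count of the items recognized on a page during extraction."""
    return sum(weight * len(page.get(kind, [])) for kind, weight in WEIGHTS.items())


def most_useful_pages(pages, n=2):
    """Return the n highest-scoring pages (two, matching the current setting)."""
    return sorted(pages, key=usefulness, reverse=True)[:n]


# Hypothetical extraction output for three pages of one company.
PAGES = [
    {"url": "/about", "services": [], "contacts": [], "other": ["history"]},
    {"url": "/services", "services": ["ocean freight", "air freight", "customs"], "contacts": []},
    {"url": "/contact", "services": [], "contacts": ["phone", "email"], "other": []},
]
top = most_useful_pages(PAGES)
```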
4. Automated Calculation of Reputation
Our system assigns a “reputation” measure to each company. The factors include:
1) Size of the company (employee number, revenue, offices around the world, etc.)
2) Industry awards
3) Listing in important directories (e.g., the directory maintained by the Port of Oakland, or directories maintained by magazines)
4) Mentioned by magazine articles
5) Having a relationship with other reputable companies (e.g., agents for airlines, etc.)
6) Having reputable companies as clients
7) Being an advertiser with print or online media
8) Feedback from our search engine users
9) Our in-house experts' opinions
10) Others
The Company Reputation helps in ranking otherwise equally fitting providers.
A company could be a big player in nationwide services, but not have service in a particular city. Similarly, a company could be a big player in a particular city or state but not provide any service beyond its service regions.
Distributed Reputation calculates a company's reputation by criteria such as a given region (country, state, city), or a route, or a particular segment of the industry (public warehousing, private warehousing), or any other factors that are meaningful within the industry context.
Distributed Reputation is then applied in ranking results.
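As a sketch of how Distributed Reputation could enter ranking, the example below keeps a reputation score per region for each company and uses it to order providers that fit the query equally well. The data layout, the score values and the name `rank_providers` are assumptions; the same idea would apply to routes or industry segments instead of regions.

```python
def rank_providers(providers, region):
    """Order providers by query fit first, then by reputation in the given region.

    A company with no reputation entry for the region is treated as 0.0
    (an assumption of this sketch).
    """
    def key(p):
        return (-p["fit"], -p["reputation"].get(region, 0.0))
    return sorted(providers, key=key)


# Hypothetical providers: "fit" is how well the company matches the query.
PROVIDERS = [
    {"name": "NationalCo", "fit": 0.9, "reputation": {"US": 0.9}},
    {"name": "BayLocal", "fit": 0.9, "reputation": {"US": 0.2, "Oakland": 0.8}},
    {"name": "OtherCo", "fit": 0.5, "reputation": {"Oakland": 1.0}},
]
oakland_order = [p["name"] for p in rank_providers(PROVIDERS, "Oakland")]
us_order = [p["name"] for p in rank_providers(PROVIDERS, "US")]
```

With this data the nationwide player ranks first for a nationwide query but falls behind the local player for a query about Oakland, while the less fitting provider stays last in both cases.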
5. Combining Static and Dynamic Snippets
Snippets give a “window” through which to see some of a company's services and other information related to a query. They are one of the first things a searcher sees about a company (the other, with our engine, is the Company Summary), and the quality of snippets has a large impact on whether a searcher is helped or hindered in making the right pick.
Since Google, web search engines have used “dynamic” snippets that are calculated on-the-fly for a given query. Before Google, some engines used “static” snippets, which are independent of the query.
Our snippets calculation combines static and dynamic snippets. Some of a company's salient information is independent of query, and could help in searcher's picking the company; and other information of the said company is dependent on query, and shall be calculated on-the-fly for a given query.
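A combined snippet could be produced roughly as follows: a precomputed, query-independent part is joined with a window of page text cut around the first query word found. The window width, the separator, the field names and the function names are hypothetical choices for this sketch.

```python
def dynamic_snippet(text, query, width=60):
    """Cut a window of `width` characters around the first query word found in text."""
    lower = text.lower()
    for word in query.lower().split():
        pos = lower.find(word)
        if pos != -1:
            start = max(0, pos - width // 2)
            return text[start:start + width].strip()
    return ""  # no query word occurs in the text


def combined_snippet(company, query):
    """Join the static (query-independent) part with the dynamic (query-dependent) part."""
    parts = [company["static_snippet"]]
    dynamic = dynamic_snippet(company["page_text"], query)
    if dynamic:
        parts.append(dynamic)
    return " ... ".join(parts)


# Hypothetical company data.
COMPANY = {
    "static_snippet": "Harbor Warehousing: public warehousing since 1990.",
    "page_text": "We ship worldwide. Refrigerated warehousing is available in Oakland and Long Beach.",
}
snippet = combined_snippet(COMPANY, "refrigerated")
```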
6. Search with Multiple Languages
Over the last decades, English has emerged as the language of commerce, and Chinese has established itself as the other language to be reckoned with in commerce. However, there has not been a search engine that is devoted to serving this international market. Namely, Google™/Yahoo™/MSN™ do English search, and Baidu™ does Chinese search only. All engines do exact matching. The current situation is that a user who searches on Google with a Chinese query might get back pages that are in mixed Chinese and English, and the advertisements are sometimes not in Chinese, which reduces the usefulness of the search results, as well as the effectiveness of the ads. Baidu does the same thing in a mirror image.
Our system performs search with awareness of language and region. It does at least the following:
1) filtering ads based on user query's language, (considering a company that has multiple ads)
2) serving web page contents based on user query's language
3) Normalize into meta information
4) When a searcher is led by our system to a destination web site, pass the language ID, the region, and other similar information to the web site
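Items 1) and 4) of the list above can be illustrated with a small sketch. The language test used here (any character in the CJK Unified Ideographs range U+4E00 to U+9FFF marks the query as Chinese, otherwise English) is a deliberately crude assumption, and the ad records, the parameter names `lang` and `region`, and the function names are hypothetical.

```python
from urllib.parse import urlencode


def query_language(query):
    """Return 'zh' if the query contains a CJK Unified Ideograph, else 'en'."""
    for ch in query:
        if "\u4e00" <= ch <= "\u9fff":
            return "zh"
    return "en"


def filter_ads(ads, query):
    """Keep only the ads written in the language of the user query."""
    lang = query_language(query)
    return [ad for ad in ads if ad["lang"] == lang]


def destination_url(base_url, query, region):
    """Append the language ID and region to the destination web site's URL."""
    separator = "&" if "?" in base_url else "?"
    params = urlencode({"lang": query_language(query), "region": region})
    return base_url + separator + params


# Hypothetical company with one ad per language.
ADS = [
    {"company": "Acme Freight", "lang": "en", "text": "Ship with Acme"},
    {"company": "Acme Freight", "lang": "zh", "text": "选择 Acme 货运"},
]
chinese_ads = filter_ads(ADS, "上海 仓库")
english_url = destination_url("http://acme-freight.example/", "warehouse", "US")
```

A production system would need a more robust language identifier and would have to agree with the destination site on how the language and region are passed.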
Other aspects of the inventive subject matter that are not being prosecuted at the outset include the following:
Thus, specific embodiments and applications of searching and billing improvements have been disclosed. It should be apparent, however, to those skilled in the art that many more modifications besides those already described are possible without departing from the inventive concepts herein. The inventive subject matter, therefore, is not to be restricted except in the spirit of the appended claims. Moreover, in interpreting both the specification and the claims, all terms should be interpreted in the broadest possible manner consistent with the context. In particular, the terms “comprises” and “comprising” should be interpreted as referring to elements, components, or steps in a non-exclusive manner, indicating that the referenced elements, components, or steps may be present, or utilized, or combined with other elements, components, or steps that are not expressly referenced. Where the specification or claims refer to at least one of something selected from the group consisting of A, B, C . . . and N, the text should be interpreted as requiring only one element from the group, not A plus N, or B plus N, etc.
This application claims priority to U.S. provisional application Ser. No. 60/800131 filed May 11, 2006 and 60/811989 filed Jun. 7, 2006 both of which are incorporated herein by reference in their entirety.
Number | Date | Country
---|---|---
60800131 | May 2006 | US
60811989 | Jun 2006 | US

Relation | Number | Date | Country
---|---|---|---
Parent | 11694930 | Mar 2007 | US
Child | 11754081 | May 2007 | US