The present disclosure relates generally to search engines for providing one or more search results respective of a query received from a user and, more specifically, to systems and methods for detecting user intent respective of a query and providing applications respective of user intent.
Search engines are used for searching for information over the World Wide Web. A web search query refers to a query that a user enters into a web search engine in order to receive search results.
A query received from a user device may be explicit or implicit to varying degrees. An implicit query makes it difficult to provide appropriate search results to the user because the user intent is unclear. As an example, if the user's query is “Madonna clips,” it is unclear whether the user is interested in listening to music clips by the entertainer Madonna, viewing Madonna's video clips, or downloading Madonna's clips to the user's device.
In general, web search engines generate large databases and indexes of websites and webpages accessible on the WWW, in a process known as web crawling. Such databases and indexes are updated frequently as websites and webpages are added, deleted, and changed very frequently on the WWW. The databases of a web search engine may include information regarding each webpage in the databases, such as the actual words on the webpage, and the index usually includes information relating to how a webpage should be classified and indexed in the databases. The indexing of webpages is based on the contents of a webpage, metadata and tags defined by the web-page designers.
When a user submits a search query to a web search engine, the web search engine uses its indexing system to determine which webpages in its databases match the search query with which it was provided. The web search engine may be able to rank the webpages in its databases which most closely match the search query with which it was provided. The webpages which most closely match the search query are returned to the user and usually presented in the form of a list, also known as search results, a search results list, or even an “answer” to a user's search query.
In conventional search engines, such as Google® and Bing, an input query is checked only against the indexes and databases maintained by the search engine. That is, a search query input to Google's search engine will be fully served by Google's databases and indexes and will not be relayed to other engines (e.g., to retrieve the result).
The indexing of web contents is limited in many aspects. For example, the indexing directly relates to the contents of the webpages; as such, webpages are not indexed to serve specific interests of users seeking information. In addition, search engines are limited to searching only their own index databases; thus, search results across different resources cannot be retrieved.
With the widespread use of smartphones these days, users search for mobile applications (also referred to as ‘apps’) and contents provided through such apps. The conventional indexing solutions are not usually designed to index mobile applications or, more specifically, contents that can be retrieved through such applications.
It would therefore be advantageous to provide a solution that would overcome the deficiencies of the conventional indexing solutions.
Certain exemplary embodiments disclosed herein include a method for indexing mobile applications. The method comprises: crawling through a plurality of data sources to detect applications accessible through a user device; for each detected application, generating metadata characterizing the application; analyzing the generated metadata to classify each detected application to at least one category; and updating an application index to include at least the classified applications and the respective classified categories.
Certain embodiments disclosed herein also include a non-transitory computer readable medium having stored thereon instructions for causing a processor to execute a process, the process comprising: crawling through a plurality of data sources to detect applications accessible through a user device; for each detected application, generating metadata characterizing the application; analyzing the generated metadata to classify each detected application to at least one category; and updating an application index to include at least the classified applications and the respective classified categories.
Certain embodiments disclosed herein also include a system for indexing mobile applications. The system comprises: an interface to a network for accessing a plurality of data sources over the network; a processor; and a memory coupled to the processor, wherein the memory contains instructions that, when executed by the processor, configure the system to: crawl through the plurality of data sources to detect applications accessible through a user device; for each detected application, generate metadata characterizing the application; analyze the generated metadata to classify each detected application to at least one category; and update an application index to include at least the classified applications and the respective classified categories.
The subject matter disclosed herein is particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The foregoing and other objects, features, and advantages of the disclosure will be apparent from the following detailed description taken in conjunction with the accompanying drawings.
It is important to note that the embodiments disclosed herein are only examples of the many advantageous uses of the innovative teachings herein. In general, statements made in the specification of the present application do not necessarily limit any of the various claimed embodiments. Moreover, some statements may apply to some inventive features but not to others. In general, unless otherwise indicated, singular elements may be in plural and vice versa with no loss of generality. In the drawings, like numerals refer to like parts through several views.
The various disclosed embodiments include a system and methods for indexing applications to queries. The system is configured to receive a query from a user device. A query may be, but is not necessarily limited to, a set of typed words, a voice command, and the like. The received query may be implicit or explicit. The system is then configured to generate metadata respective of the received query and determine the user intent. Respective of the user intent, the system is then configured to classify the at least one query to one or more categories, wherein each category serves a different topic of user intents. The system is then configured to provide one or more appropriate applications respective of the at least one query.
The IDU 140 is configured to determine the user's intent respective of a query or part of a query received from the user through the user device 110. Determination of user intent is further described in co-pending U.S. patent application Ser. No. 14/103,536 filed on Dec. 11, 2013, titled “A System and Methods for Detecting User Intent”, assigned to the common assignee, which is hereby incorporated by reference for all that it contains. The user's intent represents the type of content, the content, and/or actions that may be of interest to the user for a current time period. The IDU 140 may be further configured to send the determined user's intent to the server 130.
A user's intent may be determined based on, e.g., a query entered by a user into an engine. User intents may range from general intents (e.g., “games”) to more narrow intents (e.g., “Angry Birds®,” “tactical games,” “games involving animals”). Queries may further include one or more tokenized portions, wherein each tokenized portion represents a meaningful entity. Entities are physical or conceptual items bearing known types and attributes such as, but not limited to, products, people, locations, groups, theories, facts, virtual spaces, and so on. Types may describe an entity and can be used in identifying user intent. As a non-limiting example, the entity “Madonna” may bear types including, but not limited to, singer, director, actor, and celebrity.
Tokenized portions may be compared and contrasted to determine a user's intent. Specifically, matching types among multiple tokenized portions may indicate that the user's intent is related to such types. As a non-limiting example, a user may enter the query “madonna warren beatty”. The query may be broken down into tokenized portions “madonna” and “warren beatty.” Both Madonna and Warren Beatty are associated with the type “actor.” Thus, the user's intent may be determined to be related to actors. Specifically, the user's intent may be determined to be “movies featuring the actors Madonna and Warren Beatty.” The movie “Dick Tracy®” is a movie featuring both of these actors. As a result, a Wikipedia® article or a YouTube® video of the “Dick Tracy®” movie may be provided to a user based on the user's intent, depending on the category of the query. Categorization of queries is described further herein below with respect to
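The type-matching idea described above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the `ENTITY_TYPES` table, the greedy longest-match tokenizer, and the function names are all assumptions made for illustration, and the type sets merely mirror the examples given in the text.

```python
from typing import Dict, List, Set

# Hypothetical entity-type table; entries mirror the examples in the text only.
ENTITY_TYPES: Dict[str, Set[str]] = {
    "madonna": {"singer", "director", "actor", "celebrity"},
    "warren beatty": {"actor"},
}


def tokenize(query: str, known_entities: Dict[str, Set[str]]) -> List[str]:
    """Split a query into tokenized portions using greedy longest-match
    against the known entity names (an assumed strategy)."""
    words = query.lower().split()
    tokens: List[str] = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then shrink it.
        for j in range(len(words), i, -1):
            candidate = " ".join(words[i:j])
            if candidate in known_entities:
                tokens.append(candidate)
                i = j
                break
        else:
            i += 1  # skip a word that matches no known entity
    return tokens


def shared_types(tokens: List[str], known_entities: Dict[str, Set[str]]) -> Set[str]:
    """Return the types common to every tokenized portion."""
    if not tokens:
        return set()
    return set.intersection(*(known_entities[t] for t in tokens))


tokens = tokenize("madonna warren beatty", ENTITY_TYPES)
print(tokens, shared_types(tokens, ENTITY_TYPES))
```

In this sketch, the only type shared by both tokenized portions is "actor", which is the signal the text uses to infer an intent related to actors.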
The system 100 may further include a database 150 for storing information such as prior user intents, prior queries received from a user, data for enhancing the search experience, applications' classification, etc. A plurality of web sources 160-1 through 160-m (collectively referred to hereinafter as web sources 160 or individually as a web source 160, merely for simplicity purposes) are further connected to the network 120. The web sources 160 may include “cloud-based” applications; that is, applications executed by servers in a cloud-computing infrastructure such as, but not limited to, a private-cloud, a public-cloud, or any combination thereof. The cloud-computing infrastructure is typically realized through a data center.
Applications are typically installed on the user devices 110 or suggested to be installed by the server 130. The server 130 is configured to crawl through the applications existing in the web sources 160 as well as through the applications installed on the user devices 110 and the suggested applications. According to certain embodiments, the server 130 is configured to generate metadata respective of the applications. Such metadata may be, for example, the name of the application, the application bundle name, the application description, the application score, content of the application, a portion thereof, a combination thereof, and so on. In an embodiment, the metadata is then analyzed by the server 130 and the applications are determined to be either appropriate or inappropriate to serve one or more categories of queries, wherein each category serves a different topic of user intents. According to another embodiment, one or more additional categories may be generated dynamically respective of user intents as further described herein below with respect to
In an embodiment, an appropriate application may be provided to the user. In a further embodiment, such applications are provided to the user as virtual applications. Virtual applications are applications which run within a browser embedded in another program, thereby permitting users to utilize virtual versions of applications without downloading such applications directly.
The determination of which one or more applications of the plurality of applications are appropriate to serve one or more categories of queries is stored in the database 150 for further use. According to one embodiment, upon receiving a query from a user through a user device 110, the IDU 140 is configured to determine the user intent. The system 100 then classifies the query into one or more categories respective of the user's intent and provides the appropriate applications to the user device 110 respective of the query. Techniques for providing appropriate applications are described in more detail herein below with respect to
The system 100 may further include an agent installed locally on the user devices 110 that enables local crawling and searching through the content of the user devices 110. The various elements of the system 100 are further described in co-pending U.S. patent application Ser. No. 13/156,999 titled “SYSTEM AND METHODS THEREOF FOR ENHANCING A USER's SEARCH EXPERIENCE”, assigned to the common assignee, which is hereby incorporated by reference for all that it contains.
The transactional category 220 typically includes one or more queries that require additional actions following the execution of a corresponding one or more applications in order to be appropriately served, for example, playing a video within a video streaming website, purchasing tickets through ticket purchasing applications, and the like. Examples of such applications include the YouTube® application, the Ticketmaster® website, and so on. According to one embodiment, applications that are determined as appropriate to serve queries that are classified to the transactional category 220 may be provided with one or more search results respective of the query. For example, if the query received is “watch Madonna's new video clip,” a stream of Madonna's new video clip through the YouTube® application may be provided to the user device 110 rather than the YouTube® main web page.
The navigational category 230 generally includes one or more queries that specifically mention the name and/or the designated functionality of the application. The one or more queries classified to the navigational category 230 explicitly indicate the user intent. Examples of such queries may be “PDF reader,” “scanner,” and so on. Applications determined as appropriate to serve queries classified to the navigational category 230 may be, for example, photo galleries, alarm clock applications, etc.
According to another embodiment, an experience category 240 may also be determined based on the user intent. The experience category may include, for example, queries such as “games for five minutes.” The user intent based on such a query is determined as quick games and, therefore, a server (e.g., the server 130) may provide such quick games to the user device 110. A person of ordinary skill in the art would readily appreciate that the queries described in
In S330, the metadata is analyzed to classify the application to one or more categories. The analysis of the metadata includes, for example, textual analysis of the application's description, the application bundle name, and/or name. The analysis may include querying external databases to determine the category to which the application is classified. In one embodiment, the crawling process further crawls through deep-URLs listed in the metadata. The contents that can be retrieved through such URLs can be indexed and analyzed.
In S340, based on the analysis of the metadata, the identified applications are determined as appropriate to serve one or more categories of queries. Determination of appropriateness is discussed further herein below with respect to
The application index may be saved in the database. Alternatively, the index may be locally saved in the device. The index may be updated based on usage of the applications and/or queries submitted by users.
In S350, it is checked whether additional applications should be indexed and if so, execution continues with S310; otherwise, execution terminates.
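The indexing flow above (crawl, generate metadata, classify, update the index) might be sketched as follows. This is a simplified illustration under stated assumptions: the data sources are in-memory lists of dictionaries rather than real web sources, and the keyword table `CATEGORY_KEYWORDS`, the `AppMetadata` fields, and all function names are hypothetical stand-ins for the textual analysis described in S330.

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass
class AppMetadata:
    name: str
    bundle_name: str
    description: str
    score: float = 0.0


# Hypothetical keyword lists per category, for illustration only.
CATEGORY_KEYWORDS: Dict[str, Set[str]] = {
    "informational": {"encyclopedia", "news", "articles"},
    "transactional": {"video", "stream", "tickets", "purchase"},
    "navigational": {"reader", "scanner", "gallery", "clock"},
    "experience": {"game", "games"},
}


def generate_metadata(raw: dict) -> AppMetadata:
    """Build a metadata record from a crawled application entry."""
    return AppMetadata(
        name=raw["name"],
        bundle_name=raw.get("bundle", ""),
        description=raw.get("description", ""),
        score=float(raw.get("score", 0.0)),
    )


def classify(meta: AppMetadata) -> Set[str]:
    """S330 (sketch): textual analysis of name, bundle name, and description."""
    text = f"{meta.name} {meta.bundle_name} {meta.description}".lower()
    words = set(re.findall(r"[a-z0-9]+", text))
    return {cat for cat, kws in CATEGORY_KEYWORDS.items() if words & kws}


def build_index(data_sources: Iterable[Iterable[dict]]) -> Dict[str, List[AppMetadata]]:
    """Crawl every source and index each application under its categories."""
    index: Dict[str, List[AppMetadata]] = {}
    for source in data_sources:              # crawl each data source
        for raw in source:                   # S350 (sketch): loop until no apps remain
            meta = generate_metadata(raw)
            for category in classify(meta):  # S340 (sketch): categories the app may serve
                index.setdefault(category, []).append(meta)
    return index


sources = [
    [{"name": "PDF Reader", "bundle": "com.example.pdfreader",
      "description": "Open and read PDF documents"}],
    [{"name": "ClipStream", "description": "Stream video clips and purchase tickets"}],
]
index = build_index(sources)
print(sorted(index))
```

A keyword table is used here only because it keeps the example self-contained; the text also allows querying external databases to determine a category.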
In S440, the application index generated as discussed above is searched to detect one or more applications that can appropriately serve the categories determined for the query. In an embodiment, the search returns only applications indexed with an appropriateness score above a predefined threshold. In another embodiment, applications indexed to the same categories as the input query are returned to the user.
In S450, the matching applications are provided to the user device 110. The matching applications, i.e., the search results, may be displayed in the form of icons representing the matching applications, rendered and displayed on the user device. A matching application may be a “native application” and/or a “virtual application” in the browser of the native application. A native application (or app) is installed and executed on the user device. A virtual application (app) is executed on a server and only relevant content is rendered and sent to the user device. In an exemplary embodiment, content is relevant if it relates to the user's current activity. For example, if a virtual version of an app that displays content from a video streaming website is executed while a user is engaged in or attempting to view a particular video, only content that is relevant to that video would be displayed on the user device. In an embodiment, the virtual app results include contents addressed by indexed deep URLs. For example, for the query “sushi and seaweed”, a sushi recipe offered by the recipe application mentioned above will be returned to the user. In an embodiment, different icons can represent different types of applications. It should be noted that the search results, which may include both virtual and native apps, address the user's intent.
In S460, it is checked whether additional queries have been received and, if so, execution continues with S420; otherwise, execution terminates.
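The index search of S440 can be illustrated with a short Python sketch, assuming the query has already been classified into categories. The `IndexEntry` structure, the 0-10 score scale, the default threshold, and the application names are assumptions for illustration, not details taken from the disclosure.

```python
from typing import Iterable, List, NamedTuple, Set


class IndexEntry(NamedTuple):
    app_name: str
    category: str
    appropriateness: float  # assumed 0-10 scale


def search_index(
    index: Iterable[IndexEntry],
    query_categories: Set[str],
    threshold: float = 5.0,
) -> List[str]:
    """Return application names indexed to any of the query's categories
    with an appropriateness score above the threshold, best first."""
    matches = [
        entry
        for entry in index
        if entry.category in query_categories and entry.appropriateness > threshold
    ]
    matches.sort(key=lambda entry: entry.appropriateness, reverse=True)

    results: List[str] = []
    for entry in matches:
        if entry.app_name not in results:  # an app may be indexed to several categories
            results.append(entry.app_name)
    return results


# Hypothetical index contents.
index = [
    IndexEntry("VideoApp", "transactional", 8.5),
    IndexEntry("TicketApp", "transactional", 4.0),
    IndexEntry("ReaderApp", "navigational", 9.0),
]
print(search_index(index, {"transactional"}))
```

Lowering the threshold approximates the alternative embodiment in which every application indexed to the same categories as the query is returned.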
It should be appreciated that the operation of the method for indexing applications as described in
In S520, the application is analyzed. In embodiments where metadata is generated, analysis of the application may include analysis of metadata. Analysis of the application may be utilized to determine, e.g., what types of entities are included in the application, whether the application returns multimedia content (e.g., videos, music, images, etc.), whether the application is interactive (e.g., a game), statistics or parameters related to content featured in the application, and whether content included in the application is suitable for a given age group.
In S530, the results of the analysis are compared to the at least one provided category. In an embodiment, categories may be associated with certain analysis results. For example, an informational category (e.g., informational category 210) may be associated with applications that return text-based information and, in particular, information that is relevant to a particular query. Similarly, a transactional category (e.g., transactional category 220) may be associated with applications that require additional actions following application execution to be appropriately served such as, e.g., video streaming applications, social media applications, and shopping applications.
Navigational categories (e.g., navigational category 230) may be associated with applications whose description metadata matches the query (e.g., “PDF reader,” “scanner,” and so on). An experience category (e.g., experience category 240) may be associated with applications whose content bears statistics or parameters that correspond to a requirement buried in a query (e.g., the YouTube® application may be associated with the category of the query “short clips,” as YouTube® videos are generally shorter in length than, for example, Netflix® streaming content).
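One way to picture the comparison in S530 is as a set of per-category rules applied to the analysis results of S520. The sketch below is only an assumed rule set: the `AppAnalysis` fields, the substring match for navigational queries, and the `max_minutes` parameter for experience queries are hypothetical simplifications of the associations described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AppAnalysis:
    returns_text_information: bool
    requires_follow_up_action: bool
    description: str
    typical_content_minutes: Optional[float] = None


def is_appropriate(
    analysis: AppAnalysis,
    category: str,
    query: str = "",
    max_minutes: Optional[float] = None,
) -> bool:
    """Compare analysis results against one category (assumed rules)."""
    if category == "informational":
        return analysis.returns_text_information
    if category == "transactional":
        return analysis.requires_follow_up_action
    if category == "navigational":
        # Description metadata must match the query text.
        return bool(query) and query.lower() in analysis.description.lower()
    if category == "experience":
        # Content statistics must satisfy the requirement implied by the query.
        return (
            analysis.typical_content_minutes is not None
            and max_minutes is not None
            and analysis.typical_content_minutes <= max_minutes
        )
    raise ValueError(f"unknown category: {category}")


reader = AppAnalysis(False, False, "A fast PDF reader and scanner")
clips = AppAnalysis(False, True, "Watch short clips", typical_content_minutes=4.0)
print(is_appropriate(reader, "navigational", query="PDF reader"))
print(is_appropriate(clips, "experience", max_minutes=5.0))
```

Boolean rules keep the example small; a graded result would fit the embodiments in which appropriateness is expressed as a score.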
In a non-limiting embodiment, applications may be determined as more or less appropriate for a given query's category based on relevance to the query. As a non-limiting example, a user may provide the query “Who won the 2003 NBA championship?” The category of this query is determined to be informational. Applications such as Wikipedia® and ESPN® may be determined to be appropriate for this category based on the presence of articles related to NBA news included in each. Based on statistics demonstrating relative content of each application, however, ESPN® may be determined to be more appropriate for the given query. Thus, in an embodiment, the ESPN® application may be returned to the user rather than the Wikipedia® application.
In an exemplary embodiment, the appropriateness results may be quantized to provide an appropriateness score. The score may be represented using a numerical value, e.g., 0-10, a percentage, and the like. In S540, the appropriateness results determined in S530 are returned.
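As a brief illustration of quantization, the following Python sketch maps a raw appropriateness value to a 0-10 score or to a percentage. The assumption that the raw value lies in the range 0 to 1, as well as the function names, are hypothetical; the disclosure does not specify how the raw result is computed.

```python
def quantize_score(raw: float, levels: int = 10) -> int:
    """Map a raw appropriateness value (assumed to lie in [0, 1])
    to an integer score between 0 and `levels`."""
    clamped = min(max(raw, 0.0), 1.0)
    return round(clamped * levels)


def as_percentage(raw: float) -> str:
    """Express the same raw value as a whole-number percentage."""
    clamped = min(max(raw, 0.0), 1.0)
    return f"{clamped * 100:.0f}%"


print(quantize_score(0.73), as_percentage(0.73))
```

Values outside the assumed range are clamped rather than rejected, which is a design choice of this sketch only.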
The various embodiments disclosed herein can be implemented as hardware, firmware, software, or any combination thereof. Moreover, the software is preferably implemented as an application program tangibly embodied on a program storage unit or computer readable medium consisting of parts, or of certain devices and/or a combination of devices. The application program may be uploaded to, and executed by, a machine comprising any suitable architecture. Preferably, the machine is implemented on a computer platform having hardware such as one or more central processing units (“CPUs”), a memory, and input/output interfaces. The computer platform may also include an operating system and microinstruction code. The various processes and functions described herein may be either part of the microinstruction code or part of the application program, or any combination thereof, which may be executed by a CPU, whether or not such a computer or processor is explicitly shown. In addition, various other peripheral units may be connected to the computer platform such as an additional data storage unit and a printing unit. Furthermore, a non-transitory computer readable medium is any computer readable medium except for a transitory propagating signal.
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the principles of the disclosure and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions. Moreover, all statements herein reciting principles, aspects, and embodiments of the disclosure, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure.
This application is a continuation of U.S. patent application Ser. No. 14/278,223 filed on May 15, 2014, now allowed, which claims the benefit of U.S. Provisional Application No. 61/826,047 filed on May 22, 2013. The Ser. No. 14/278,223 application is also a continuation-in-part (CIP) of: (a) U.S. patent application Ser. No. 13/712,563 filed on Dec. 12, 2012, now U.S. Pat. No. 9,141,702, which claims the benefit of U.S. Provisional Application No. 61/653,562 filed on May 31, 2012. The Ser. No. 13/712,563 Application is also a CIP of U.S. patent application Ser. No. 13/156,999 filed on Jun. 9, 2011, now U.S. Pat. No. 9,323,844, and of U.S. patent application Ser. No. 13/296,619 filed on Nov. 15, 2011, now pending. The Ser. No. 13/156,999 application claims the benefit of U.S. Provisional Application No. 61/468,095 filed on Mar. 28, 2011, and of U.S. Provisional Application No. 61/354,022 filed on Jun. 11, 2010; (b) U.S. patent application Ser. No. 13/156,999 filed on Jun. 9, 2011, now U.S. Pat. No. 9,323,844, which claims the benefit of U.S. Provisional Application No. 61/468,095 filed on Mar. 28, 2011, and of U.S. Provisional Application No. 61/354,022 filed on Jun. 11, 2010; (c) U.S. patent application Ser. No. 13/296,619 filed on Nov. 15, 2011, now pending; and (d) U.S. patent application Ser. No. 14/103,536 filed on Dec. 11, 2013, now U.S. Pat. No. 9,552,422, which claims the benefit of U.S. Provisional Application No. 61/822,376 filed on May 12, 2013. The Ser. No. 14/103,536 Application is also a CIP of the above-noted U.S. patent application Ser. No. 13/712,563. All of the applications referenced above are herein incorporated by reference.
Number | Date | Country
---|---|---
61826047 | May 2013 | US
61653562 | May 2012 | US
61468095 | Mar 2011 | US
61354022 | Jun 2010 | US
61468095 | Mar 2011 | US
61354022 | Jun 2010 | US
61822376 | May 2013 | US

Relation | Number | Date | Country
---|---|---|---
Parent | 14278223 | May 2014 | US
Child | 15596484 | | US

Relation | Number | Date | Country
---|---|---|---
Parent | 13712563 | Dec 2012 | US
Child | 14278223 | | US
Parent | 13156999 | Jun 2011 | US
Child | 13712563 | | US
Parent | 13296619 | Nov 2011 | US
Child | 13156999 | | US
Parent | 13156999 | Jun 2011 | US
Child | 14278223 | | US
Parent | 13296619 | Nov 2011 | US
Child | 14278223 | | US
Parent | 14103536 | Dec 2013 | US
Child | 13296619 | | US
Parent | 13712563 | Dec 2012 | US
Child | 14103536 | | US