Users typically input one or more search terms as a query within a field of a search engine in order to receive information particular to the query. For example, after launching a web browser, a user can input search engine terms corresponding to a particular resource or topic (e.g., documents, links, web pages, item listings, etc.), and one or more servers hosting the search engine logic can obtain data from various remote data sources and cause a web page to display various ranked results associated with the particular resource or topic. The user may then select one or more of the various ranked result identifiers.
Search engine software typically matches terms in the query to terms found within result candidate data sets and ranks the results for display based on the matching. For example, some technical solutions employ term frequency-inverse document frequency (TF-IDF) algorithms. TF-IDF algorithms compute numerical statistics that infer how important a query word or term is to a data set. “Term frequency” measures how frequently a term of a query occurs within a data set (e.g., a digital document, a blog post, a database, etc.), divided by the data set length (i.e., the total quantity of terms in the data set). “Inverse document frequency” infers how important a term is by reducing the weights of frequently used or generic terms, such as “the” and “of,” which may have a high count in a data set but little importance for the relevancy of a query. Accordingly, for a query that includes the terms “The different models of product X,” these technologies may rank a data set the highest because it includes the words “product X” with the highest frequency compared to other data sets.
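To make the computation concrete, the following is a minimal sketch of TF-IDF scoring as described above; the corpus, the query, and the particular smoothing used in the inverse document frequency term are illustrative assumptions rather than part of the original disclosure.

```python
import math

def tf_idf(term, doc, corpus):
    """Score how important `term` is to `doc`, relative to `corpus`."""
    tf = doc.count(term) / len(doc)  # term frequency, divided by data set length
    n_containing = sum(1 for d in corpus if term in d)
    # Smoothed inverse document frequency: generic terms that appear in
    # every data set are driven toward zero weight.
    idf = math.log(len(corpus) / (1 + n_containing))
    return tf * idf

corpus = [
    ["the", "different", "models", "of", "product", "x"],
    ["product", "x", "product", "x", "review"],
    ["the", "weather", "today"],
    ["necklaces", "under", "25", "dollars"],
]
query = ["product", "x"]

# The data set where the query terms occur most frequently scores highest.
scores = [sum(tf_idf(t, doc, corpus) for t in query) for doc in corpus]
print(scores.index(max(scores)))  # 1: the second data set repeats "product x"
```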
The existing search engine software technologies are static and/or costly in terms of CPU, memory, and/or throughput. For example, TF-IDF-based technologies and other technologies, such as “Best Matching (BM) 25” search engines, statically analyze the terms of the query itself against several data sets without any learning techniques to help return more relevant query results. While other search engine software technologies, such as existing search engines that use Gradient Boosted Decision Trees (GBDT), employ machine learning techniques, these technologies are costly. For example, particular decision tree structures, such as GBDTs, include a forest of decision trees, where each internal node of a tree holds a conditional test that evaluates to a Boolean value (e.g., TRUE, FALSE). Typically, when a query is issued and received, various factors are obtained from the query and document. These factors and values are located in the root and branch nodes of the decision trees. Each relevant decision tree is traversed based on whether a factor value meets a condition (e.g., price <$25) in a node, as indicated by the corresponding Boolean value (e.g., TRUE). Each of these decision trees is traversed starting at the root node, then through the branch nodes, ultimately arriving at one of several leaf nodes, which holds the score for use in sorting search results. The cost problem is that there are typically several trees that are often very large, which means that these structures take up a large amount of memory storage space and take a large quantity of time to process in terms of CPU and network latency, because each node of the tree has to be traversed. Further, CPU execution time is often slow because of branch mispredictions. For typical CPU operations, when the next line of code to be executed depends on the result of a condition (e.g., an IF/ELSE clause), a typical optimization the CPU performs is to guess which line of code will be executed next. If the CPU guesses right, execution of the query continues without penalty. However, if the guess is wrong, the speculated code must be discarded and the correct portion of code loaded into the CPU pipeline, which causes a CPU penalty in terms of cycles lost. Because GBDTs include so many trees, and very large trees (thereby increasing the quantity of conditional code execution guesses), the likelihood of GBDT branch misprediction is much greater.
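For illustration, the following is a sketch of the conventional top-down GBDT traversal that the paragraph above describes; the node layout, factor names, and dictionary encoding are hypothetical. Every internal node visit executes a data-dependent conditional, which is exactly the pattern that defeats the CPU's branch predictor.

```python
def traverse(node, factors):
    """Conventional top-down GBDT traversal: one data-dependent
    branch per internal node, repeated for every tree in the forest."""
    while "leaf_value" not in node:
        # e.g., condition = ("price", 25) encodes the test "price < 25"
        factor, threshold = node["condition"]
        if factors[factor] < threshold:  # each test is a potential misprediction
            node = node["true_child"]
        else:
            node = node["false_child"]
    return node["leaf_value"]

tree = {
    "condition": ("price", 25),
    "true_child": {"leaf_value": 0.9},
    "false_child": {
        "condition": ("price", 50),
        "true_child": {"leaf_value": 0.4},
        "false_child": {"leaf_value": 0.1},
    },
}
print(traverse(tree, {"price": 30}))  # 0.4

def score(forest, factors):
    # The final candidate score sums one leaf value per tree.
    return sum(traverse(t, factors) for t in forest)
```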
Other existing search engine technologies utilize techniques such as IF-THEN-ELSE implementations. However, these technologies require modifying/updating the source code of the search engine every time a new model is added. These technologies are also costly in terms of CPU branch mispredictions, as described above. Other existing search engine technologies, such as QUICKSCORER, are limited to trees with at most 64 leaves. This technology stores, for each node of each tree, a 64-bit-length bit sequence, which uses a machine word. Accordingly, if there are n leaves, O(n²) bits will be needed, thereby decreasing the chances that this will fit in cache or causing it to take up large quantities of memory. Further, this technology stores a score or value associated with a factor in each cell of a data structure. This is redundant and accordingly adds more overhead to the data structure.
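The per-node bit-sequence scheme attributed to QUICKSCORER above can be sketched as follows; the masks and leaf counts are illustrative, with a Python integer standing in for the 64-bit machine word. Because each internal node stores a full mask over all n leaves, a tree with O(n) nodes costs O(n²) bits.

```python
# Simplified sketch of the per-node bitvector scheme described above:
# every internal node stores a mask with 0s at the leaves it invalidates
# when its conditional test evaluates to FALSE.
LEAVES = 8  # at most 64 in the original scheme

node_masks = {
    # node id -> mask applied when that node's condition is FALSE
    0: 0b11111000,  # node 0 FALSE invalidates leaves 0..2
    1: 0b11100111,  # node 1 FALSE invalidates leaves 3..4
}

def reachable_leaves(false_nodes):
    bitmap = (1 << LEAVES) - 1      # start with every leaf reachable
    for n in false_nodes:
        bitmap &= node_masks[n]     # AND in each false node's mask
    return bitmap

bitmap = reachable_leaves([0])
first_reachable = (bitmap & -bitmap).bit_length() - 1  # lowest set bit scores the tree
print(first_reachable)  # 3
```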
Embodiments of the present disclosure improve the existing search engine software technologies and computing devices by implementing new functions or functionalities that are less costly in terms of CPU, memory, network latency, and/or throughput. For instance, some embodiments improve existing technologies that utilize typical GBDTs because the system does not have to traverse every node of every tree, thereby decreasing CPU execution time. For the same reason, branch mispredictions are reduced or eliminated, since conditional code execution guesses do not occur when each node is not traversed. Some embodiments of the present disclosure improve the existing technologies by including a data structure and/or bitmap that associates each tree with leaf invalidation pairs for factor-value pairs, which is described in more detail below. Accordingly, instead of traversing each node of each tree, the system can look up leaf invalidation pairs for each decision tree in a particular data structure and/or bitmap, which reduces I/O, latency, branch mispredictions, etc. Moreover, some embodiments improve existing technologies because there is no need to modify/update the source code of the search engine every time a new model is added, as IF-THEN-ELSE technologies do. Further, some embodiments improve existing technologies, such as QUICKSCORER, because the system can handle any quantity of leaves in a tree (as opposed to only 64). Some embodiments improve these technologies by including a data structure (or set of data structures) that reduces the total space taken up in memory, as these particular embodiments may need only O(n log₂ n) bits (e.g., as opposed to the O(n²) bits needed to represent invalidated nodes in QUICKSCORER). Accordingly, there is a higher probability that this novel data structure will fit in cache, which improves throughput and CPU execution time. Further, there is reduced redundancy compared to existing technologies, adding less overhead to data structures and taking up less memory. For example, some embodiments generate and use a more compact/efficient approach to store which tree leaves are invalidated by associating each tree with leaf invalidation pairs, using fewer bits (e.g., some or each component of a leaf invalidation pair can be represented in log₂ n bits, rather than a full n-bit machine word per node).
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
The present technology is described in detail below with reference to the attached drawing figures.
The subject matter of the present invention is described with specificity herein to meet statutory requirements. However, the description itself is not intended to limit the scope of this patent. Rather, the inventors have contemplated that the claimed subject matter might also be embodied in other ways, to include different steps or combinations of steps similar to the ones described in this document, in conjunction with other present or future technologies. Moreover, although the terms “step” and/or “block” may be used herein to connote different components of methods employed, the terms should not be interpreted as implying any particular order among or between various steps herein disclosed unless and except when the order of individual steps is explicitly described.
At 102, the system 100 receives a query and/or document (e.g., a search result candidate) input. For example, in some embodiments, the query/document input 102 is received “offline” or outside of a run-time situation. Accordingly, the query may be issued outside of a user session by training modules or administrative users. Alternatively or additionally, in some embodiments, the query/document input 102 is data that is received at run-time or as part of a user session. Accordingly, a user may open a portal and input a query string within a search engine field. A “portal” as described herein in some embodiments includes a feature to prompt authentication and/or authorization information (e.g., a username and/or passphrase) such that only particular users (e.g., a corporate group entity) are allowed access to information. A portal can also include user member settings and/or permissions and interactive functionality with other user members of the portal, such as instant chat. In some embodiments, a portal is not necessary to receive the query; rather, a query can be received via a public search engine (e.g., GOOGLE by ALPHABET Inc. of Mountain View, Calif.) or website such that no login (e.g., authentication and/or authorization information) is required and anyone can view the information. In response to a user inputting a query, such as a string of characters (e.g., “cheap jewelry”), the query is transmitted to the system, such as to the one or more factor extraction modules 104.
The one or more factor extraction modules 104 identify and/or extract one or more factors and/or factor values from the query of the query input 102 and/or documents. A “factor” as described herein is one or more attribute values of one or more terms of the query and/or search result candidate data sets. For example, factors can be or include: the listing price of an item for sale (e.g., the average price that users have selected for a query), time (e.g., when a query was issued), the subject category of a query (e.g., sunglasses, sports, news, video, etc.), brand, color, etc. In some embodiments, factors are identified by running one or more query/search result candidate terms through a learning model, such as a word embedding vector model (e.g., WORD2VEC). For example, the term “red sunglasses” can be run through a word embedding vector model category matrix, and because the term “red” may be closest to the category “color” in vector space, “color” may be determined to be the factor and “red” the value of the “color” factor.
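As an illustration of the word-embedding approach, the following is a minimal sketch, assuming pretrained vectors are already available as an in-memory dictionary; the vectors and category names are made up for the example.

```python
import numpy as np

# Hypothetical pretrained word embeddings (a real system would load,
# e.g., WORD2VEC vectors); the values here are made up for illustration.
embeddings = {
    "red":   np.array([0.9, 0.1, 0.0]),
    "color": np.array([0.8, 0.2, 0.1]),
    "price": np.array([0.0, 0.9, 0.3]),
    "brand": np.array([0.1, 0.3, 0.9]),
}
factor_categories = ["color", "price", "brand"]

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def nearest_factor(term):
    """Pick the factor category closest to the query term in vector space."""
    return max(factor_categories, key=lambda c: cosine(embeddings[term], embeddings[c]))

print(nearest_factor("red"))  # "color" -> factor "color" with value "red"
```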
In some embodiments, alternatively or additionally, techniques such as natural language processing (NLP) are used to identify factors. NLP is a technique configured to analyze the semantic and syntactic content of unstructured or semi-structured data. In certain embodiments, the natural language processing technique may be a software tool, widget, or other program configured to determine the meaning behind the unstructured data. More particularly, the natural language processing technique can be configured to parse semantic and syntactic features of the unstructured data. The natural language processing technique can be configured to recognize keywords, contextual information, and metadata tags associated with one or more portions of the set of data. In certain embodiments, the natural language processing technique can be configured to analyze summary information, keywords, figure captions, or text descriptions included in the set of data, and use the syntactic and semantic elements present in this information to identify factors. The syntactic and semantic elements can include information such as word frequency, word meanings, text font, italics, hyperlinks, proper names, noun phrases, parts of speech, and the context of surrounding words. Other syntactic and semantic elements are also possible. In an illustrative example, for the phrase “big apple” in a query, “city” may be chosen as a factor (with “New York City” as a value), as opposed to “fruit,” because of the semantic meaning of “big apple.”
In some embodiments, alternatively or additionally, one or more terms of the query/search result candidate data sets are tagged (e.g., with metadata) and matched against a set of data for factor determination. For example, a user may issue a first query. When the system 100 receives the first query, it may tag and store the first query with timestamp metadata indicating when the first query was received, such that the timestamp value can be associated with a “time” factor. In another example, the system may associate the query “necklaces 25 dollars or less” with at least a factor of “price,” because a set of rules may indicate that if the terms “dollars” or “less” are within a query, then the factor “price” is included and the integer value within the query corresponds to the factor value (e.g., 25).
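A minimal sketch of such a rule-based tagging approach follows; the exact rule set and function names are hypothetical.

```python
import re
import time

def extract_factors(query):
    """Rule-based factor extraction mirroring the examples above."""
    factors = {"time": time.time()}  # tag the query with a timestamp factor
    # If price-related terms appear, the integer in the query is the
    # price value, e.g., "necklaces 25 dollars or less" -> {"price": 25}.
    if re.search(r"\b(dollars?|less|cheap)\b", query, re.IGNORECASE):
        match = re.search(r"\d+", query)
        if match:
            factors["price"] = int(match.group())
    return factors

print(extract_factors("necklaces 25 dollars or less"))
```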
The one or more bitmap modules 106 set a set of bits in a bitmap 116 (e.g., a bit array) to indicate whether one or more leaves of the one or more decision trees 110 (e.g., GBDT trees) are reachable (e.g., set to TRUE/1) and/or not reachable (e.g., set to FALSE/0). The setting of the bits can be based on obtaining data from the data structure 108. Bitmaps are used herein to identify the first K position bit, which indicates the first reachable leaf node, that is, the leaf node that is used for scoring search result candidates. The one or more bitmaps are described in more detail below. The one or more bitmap modules 106 may alternatively or additionally associate each decision tree with leaf invalidation pairs for factor-value pairs via the data structure 108, which is described in more detail below.
The one or more data structures 108 associate each decision tree with leaf invalidation pairs for factor-value pairs. A “leaf invalidation pair” as described herein includes the index of the first leaf that a particular internal node (e.g., a branch node) of a decision tree invalidates when the internal node evaluates to a “FALSE” Boolean signal. An “invalidation” typically means that a leaf node is not traversed or used to obtain a score for scoring one or more search result candidates. The leaf invalidation pair also includes the index of the last leaf the internal node invalidates when it evaluates to FALSE. In particular embodiments, the leaf invalidation pair includes only these first and last leaves and no other leaves. In some embodiments, however, the leaf invalidation pair describes a range of leaves that are invalidated when the associated node evaluates to FALSE. For example, the first and last leaf described above may define the beginning and end points of a range of leaves that are invalidated. Alternatively or additionally, each value in the range of leaves may be identified, rather than just the beginning and end values. The one or more data structures 108 can accommodate any number of leaves and use a compact approach to store which decision tree leaves are invalidated, which reduces the size of the structure while improving memory access and CPU time. The one or more data structures 108 are described in more detail below.
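The leaf invalidation pair described above can be represented compactly as a pair of leaf indices that implicitly covers the whole invalidated range; the following sketch is illustrative rather than the disclosed implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LeafInvalidationPair:
    """Indices of the first and last leaf a node invalidates when its
    conditional test evaluates to FALSE; the pair implicitly covers
    the contiguous range of leaves in between."""
    first: int
    last: int

    def leaves(self):
        return range(self.first, self.last + 1)

# Hypothetical association for one tree: node id -> pair applied on FALSE.
tree_pairs = {0: LeafInvalidationPair(0, 2), 1: LeafInvalidationPair(3, 4)}
print(list(tree_pairs[0].leaves()))  # [0, 1, 2]
```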
The one or more decision trees 110 in various embodiments are a set of graphs that use a branching method to illustrate every possible outcome of a decision. Decision trees can be linearized into decision rules, where the outcome is the contents of the leaf node, and the conditions along the path form a conjunction in an “if” clause, for example. In some embodiments, decision trees use learning or predictive modeling (e.g., data mining, machine learning, etc.) to go from observations about an item (e.g., represented in the branches) to conclusions (e.g., score relevance) about the item's target value or score, represented in the leaves. As described herein, the one or more decision trees 110 are or include one or more decision tree data structures that are utilized to score one or more features of a query. The one or more decision trees can be or include: boosted trees (e.g., GBDTs), classification trees, regression trees, bootstrap aggregated trees, rotation forests, and/or any other suitable type of decision tree.
The one or more scoring modules 112 score each search result candidate for the query/document input 102. A “search result candidate” includes one or more identifiers that are candidates for being provided as a query result set and that describe or are associated with one or more resources, such as products for sale, documents, web pages, links, etc. For example, a search result candidate can correspond to a product title of a product for sale (e.g., “$20-green toy car”), a document (e.g., a particular PDF document), a web page, a link (e.g., a URL link to one or more images), and/or any other identifier corresponding to content that can be returned in a result set of a query.
The one or more search result output modules 114 rank or sort search results based on the scoring performed by the one or more scoring modules 112. For example, in response to the scoring of one or more search result candidates, the search result output module(s) 114 may rank a data set based on the scoring and cause a user device to display search results based on the sorting or ranking, such as displaying first, on a top portion of a web page, the document that is scored the highest.
When a bit is flipped to 0, in some embodiments this indicates that a particular leaf is not reachable. For example, at a second subsequent time (as represented by the bitmap 200), the system (e.g., the bitmap module 106) identifies which leaves to invalidate or set to 0. In order to do this, the tree nodes are evaluated to determine whether the conditions associated with the factors/values are TRUE or FALSE, which is described in more detail below.
A ranking model, which may include a decision tree, is typically defined as a function that maps a pair (e.g., document, query) to a floating-point value (e.g., the 0.3 score within the node 221). Each of the leaf nodes 215, 217, 219, 221, 227, 229, 231, and 233 holds a floating-point value, which means that each of these nodes is a candidate for passing its value for scoring a search result candidate. In various embodiments, only one floating-point value, corresponding to one leaf node, is used as a score to score search result candidates. This single score is then added to the total document/search result candidate score. Each individual decision tree score may then be added or integrated with all of the other decision tree scores to arrive at a final search result candidate score.
Embodiments of the present disclosure utilize the leaf invalidation pairs, one or more bitmaps, and one or more novel data structures to score search result candidates, as opposed to starting from the root node 201 and traversing through each of the branches (e.g., going left to node 203 or right to node 205) based on whether the conditional test is passed or failed, as identified by the Boolean value (TRUE or FALSE). The order in which internal decision tree nodes are evaluated is typically not important, because scoring depends only on which leaves are reachable and not reachable, as opposed to knowing which branch nodes are affected during the invalidation process. Accordingly, a top-down traversal of the decision tree 200-1 is not necessary, and an implementer may decide which order is best. This has the immediate benefit of avoiding branch mispredictions, as described above.
When the conditional test of a root or branch node evaluates to FALSE, as indicated by the associated Boolean value, the node's left subtree (its left child nodes) automatically becomes unreachable, including all branch nodes and corresponding leaf nodes. The reverse occurs when a node evaluates to TRUE (i.e., the right subtree is invalidated). This means that none of the leaf nodes descending from the invalidated subtree will be used as the final leaf node for scoring. For example, because the conditional test of node 201 is evaluated to be FALSE (“F”), every node in its left subtree is unreachable (nodes 203 and 209), and the associated leaf nodes (207, 215, and 217) are invalidated, or not used to score search result candidates. This is reflected in the leaf invalidation pair for node 0, (0, 2), which indicates that each of the leaf nodes 0, 1, and 2 (i.e., leaf nodes 207, 215, and 217) is invalidated. Accordingly, in some embodiments the system (e.g., the search engine system 100) sets the corresponding bits in the bitmap to 0 based on the leaf invalidation pair, rather than traversing the invalidated subtree itself.
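For example, the invalidation step described above can be sketched as a range update on the bitmap, using the leaf invalidation pair (0, 2) for node 201; the list-of-bits representation is an illustrative simplification.

```python
def invalidate(bitmap, pair):
    """Clear the bits for every leaf in the invalidation pair's range,
    instead of walking the subtree under the FALSE node."""
    first, last = pair
    for leaf in range(first, last + 1):
        bitmap[leaf] = 0
    return bitmap

# Tree 200-1: all 8 leaves start reachable; node 201 evaluates to FALSE,
# and its leaf invalidation pair is (0, 2).
bitmap = [1] * 8
invalidate(bitmap, (0, 2))
print(bitmap)  # [0, 0, 0, 1, 1, 1, 1, 1] -> leaf 3 is the first reachable leaf
```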
The List of Values (LV) 406 includes all values (e.g., 0.3) associated with all factors (e.g., f1) of all decision trees in a forest. In particular embodiments, these values are sorted first by factor and then by the corresponding value in descending order. The LV list 406 also includes pointers to the LL list 404, each pointing to the same factor and value within the LL list 404, plus a value of 1. These pointers also point to a “VOID” cell when the smallest value of the last factor is reached (e.g., 0.3 of f2 points to the “VOID” box). The “VOID” cell is a signal indicating that all factors for a given query/document have been associated and any other factors/values are not associated with the query/document.
The List of Factors (LF) 402 includes each factor of a decision tree forest, sorted by factor. Each factor includes two pointers. The first pointer 402-1 points to the largest value of factor f1 in the LV list 406, which is shown by the “0” cell in the LF list 402 pointing to the “0.3” value in the LV list 406. The second pointer 402-2 points to the first cell associated with factor f0 in the LL list 404, which corresponds to leaf invalidation pair (0,0) and tree ID 0.
The Tree Leaves (TL) list 408 includes pointers that map each particular decision tree (TL0 and TL1) to a list of individual tree leaf values (e.g., 0.2) indexed by leaf index, such that at each position j, a decimal value is associated with the jth leaf of the ith tree. The decimal values (e.g., 0.2, 0.9, 0.3, 0.6, and 0.4) each represent corresponding leaf values for a given decision tree, each of which is a candidate leaf for scoring. When one of the values is selected for scoring search results, the decision tree identifier (e.g., TL0) points to the value of the last reachable leaf.
In some embodiments, the data structure 400 includes values based on the decision tree forest 301.
For each factor (f0, f1, and f2) illustrated in the decision tree forest 301, the system associates the factor with a corresponding set of values, as described above.
In some embodiments, in response to associating each factor with a set of values, every element in the LL list 404 is processed. Accordingly, for every factor-value pair, the system can associate the pair with a tree ID, which is illustrated as being either 0 (i.e., t0) or 1 (i.e., t1) within the LL list 404. The TL list 408 is accessed to set/flip bits in a bitmap by mapping (e.g., via a pointer) each decision tree identified in the LL list 404 to a corresponding tree ID in the TL list. The decimal values (e.g., 0.2, 0.9, 0.3, 0.6, and 0.4) each represent corresponding leaf values for a given decision tree, each of which is a candidate leaf for scoring. These values are reflected in the TL list 408.
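A simplified sketch of how the LL, LV, and TL lists might fit together follows, with Python dictionaries and lists standing in for the pointer cells; the specific factor names, values, and the split of leaf values across TL0 and TL1 are illustrative assumptions.

```python
# Hypothetical, simplified rendering of the linked structure described above.
LL = {
    # (factor, value) -> [(tree ID, leaf invalidation pair), ...]
    ("f0", 0.5): [(0, (0, 0))],
    ("f1", 0.3): [(0, (1, 2)), (1, (0, 1))],
}
LV = {
    # factor -> that factor's values, sorted in descending order
    "f0": [0.5],
    "f1": [0.3],
}
TL = [
    # tree ID -> candidate leaf values, indexed by leaf index
    [0.2, 0.9, 0.3],  # TL0 (illustrative split of the leaf values)
    [0.6, 0.4],       # TL1
]

def invalidation_entries(factor, value):
    """Look up which trees, and which leaf ranges, a factor-value pair invalidates."""
    return LL.get((factor, value), [])

for tree_id, (first, last) in invalidation_entries("f1", 0.3):
    print(f"tree {tree_id}: invalidate leaves {first}..{last}")
```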
Each value for a particular factor is also mapped or associated with a factor-value pair within another embedded list in some embodiments. For example, the pointer 503-1 maps the value v0 of factor f0 to a matched value and factor within the LL list 505. As illustrated in the LL list 505, each factor-value pair corresponds to a key (e.g., f0, v0) within the LL list 505 and is associated with values that include the tree ID where the particular factor is located and each leaf invalidation pair for that factor. For instance, the LL list 505 illustrates that factor f0, associated with v0, is located in tree IDs X, Y, and Z, which respectively include leaf invalidation pairs 1, 2, and 3. In this way, each value of each factor is associated with the various decision trees that include such factors and conditional tests, and with the associated leaf invalidation pairs for the value. For instance, referring back to the decision tree 200-1, the factor-value pair tested at node 201 is associated with tree ID 0 and the leaf invalidation pair (0, 2).
Per block 603, a first quantity of bitmaps is generated (e.g., by the bitmap module(s) 106 of the system 100). In some embodiments, each bitmap corresponds to a decision tree in a forest, with each bit initially set to 1 to indicate that every leaf of the tree is reachable.
Per block 605, each of the identified/extracted factor(s) and value(s) is associated (e.g., by the bitmap module(s) 106) with each decision tree in a forest that holds those factor(s)/value(s) and with a set of leaf invalidation pairs. In some embodiments, data structures such as those illustrated above are used to make this association.
Per block 607, a set of bits of the one or more bitmaps is modified to indicate which leaves of each tree are not reachable, based on the leaf invalidation pairs. For example, referring back to the decision tree 200-1, the bits corresponding to leaf nodes 0, 1, and 2 (i.e., leaf nodes 207, 215, and 217) are set to 0 based on the leaf invalidation pair (0, 2) associated with node 201.
Per block 609, a score for a document/query is initialized to zero to begin the scoring process, where the score increases based on the first K position. Per block 611, the first K position bit is identified (e.g., by the scoring module(s) 112) within the modified bitmap(s). The first K position bit indicates that a leaf is reachable (e.g., the bit is set to 1). For example, referring back to the example above, after leaf nodes 0 through 2 are invalidated, the first bit that remains set to 1 corresponds to the first reachable leaf, which holds the floating point value of 0.3.
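A minimal sketch of identifying the first K position bit follows; the bitmap contents continue the node 201 example above.

```python
def first_k_position(bitmap):
    """Return the index of the first bit set to 1, i.e., the first
    reachable leaf, whose value scores this tree."""
    for k, bit in enumerate(bitmap):
        if bit:
            return k
    raise ValueError("every leaf was invalidated")

# Continuing the node 201 example: leaves 0..2 were invalidated.
print(first_k_position([0, 0, 0, 1, 1, 1, 1, 1]))  # 3
```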
Per block 613, one or more search result candidates are scored based at least in part on the first K position bit. For example, referring back to the example above, the floating point value of 0.3 held by the first reachable leaf is added to the search result candidate's score for that tree.
In various embodiments, the process 600 is associated with a ranker function that maps each (document, query) pair to a score. Accordingly, each document can be evaluated separately. Therefore, for each (document, query) pair, the bitmaps described above are created (e.g., block 603) in particular embodiments, as one document at a time is evaluated. Once the scores for all documents in a data set are calculated, the documents can be ranked or sorted. For example, if there is a database of documents D=[d0, d1, d2, d3, d4], when the system receives the query Q (e.g., a runtime user request), then in order to sort all the documents in D according to the query Q, the algorithm according to the process 600 proceeds as follows in certain embodiments: for each document d in D: BEGIN OF FOR LOOP of blocks 601, 603, 605, 607, 609, and 611. After block 611, add all the scores associated with the leaves detected in block 611 to the variable initialized in block 609. This is the score for document d in D. END OF FOR LOOP. Once the scores for each of the documents d are obtained, they are sorted and returned to the user.
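Putting the blocks together, the following is an end-to-end sketch of the per-document loop described above; the forest contents, factor names, and condition encoding are hypothetical.

```python
def score_document(doc_factors, forest):
    """One (document, query) evaluation following blocks 601-613: build a
    fresh bitmap per tree, clear invalidated leaf ranges via leaf
    invalidation pairs, then add the first reachable leaf's value."""
    score = 0.0  # block 609
    for tree in forest:
        bitmap = [1] * tree["num_leaves"]  # block 603
        for node in tree["nodes"]:  # blocks 605 and 607
            factor, threshold = node["condition"]  # e.g., price < 25
            if not (doc_factors[factor] < threshold):  # condition is FALSE
                first, last = node["pair"]
                for leaf in range(first, last + 1):
                    bitmap[leaf] = 0
        k = bitmap.index(1)  # block 611: first K position bit
        score += tree["leaf_values"][k]  # block 613 accumulation
    return score

# Hypothetical one-tree forest and two documents.
forest = [{
    "num_leaves": 4,
    "nodes": [{"condition": ("price", 25), "pair": (0, 1)}],
    "leaf_values": [0.9, 0.7, 0.4, 0.1],
}]
docs = {"d0": {"price": 30}, "d1": {"price": 10}}
ranked = sorted(docs, key=lambda d: score_document(docs[d], forest), reverse=True)
print(ranked)  # ['d1', 'd0']: the cheaper listing scores higher here
```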
Per block 704, one or more factors and/or values of the query and/or search result candidates are identified or extracted (e.g., by the factor extraction module 104). For example, a user may issue the query “cheap toy cars.” Accordingly, factors and/or values associated with price and toy cars are identified. Per block 706, a first position bit, within a modified set of bitmaps, is identified, which indicates that a particular leaf of one or more decision trees is reachable. Referring to the example above, the factors and/or values associated with price and toy cars may be matched against decision trees and/or bitmaps that include the same factors and/or values. Referring back to 2A, the first position bit indicating that a leaf for a tree is reachable is identified, which corresponds to the third leaf, with a floating point value of 0.3. In some embodiments, one or more data structures are analyzed (e.g., via the bitmap module(s) 106), which associate one or more decision trees with one or more leaf invalidation pairs for one or more factor-value pairs. For example, after one or more factors are extracted (e.g., via the factor extraction module(s) 104), those factors are located in a data structure and are mapped to each decision tree ID that the factors belong to and to the leaf invalidation pairs for those factors, as illustrated in the data structures described above.
Per block 708, one or more search results are provided based at least on the identifying of the first position bit (and/or any of the steps indicated in the process 600). In these embodiments, an output is provided (e.g., via the search result output module(s) 114) that causes search results to be sorted on a user device. For example, the search engine system 100 can transmit sorted results to a user device, which causes the user device to display ranked results corresponding to the scores. In one example, the results are ranked by relevancy from the top down, where a first search result at the very top of a page is the highest scoring candidate and a second search result on the last page is the lowest scoring candidate. Accordingly, the first search result is more relevant than the second search result. In some embodiments, search result candidates are scored immediately before the providing at block 708 (e.g., as opposed to being scored at block 613 of the process 600).
These components can communicate with each other via the network(s) 816, which can be or include any suitable network such as a Personal Area Network (PAN) (e.g., a Bluetooth® (by BLUETOOTH SIG) network), a local area network (LAN), a general wide area network (WAN), and/or a public network (e.g., the internet).
In some computing environments, more or fewer components may be present than those illustrated herein.
In some embodiments, the computing environment 800 is the environment in which the processes 600, 700, and/or any other action described herein can be implemented. The user device(s) 802 include any device associated with a user, such as a mobile phone, desktop computer, sensor device, etc. In some instances, these devices include a user interface and/or query interface. Users can also transmit requests from the one or more user devices 802, such as the query input 102 described above.
The one or more control servers 805 in embodiments represent the system that acts as an intermediary or coordinator for executing the one or more queries from the one or more user devices 802. For example, in some embodiments the one or more control servers 805 include some or each of the components of the search engine system 100 described above.
The invention may be described in the general context of computer code or machine-useable instructions, including computer-executable instructions such as program modules, being executed by a computer or other machine, such as a personal data assistant or other handheld device. Generally, program modules, including routines, programs, objects, components, data structures, etc., refer to code that performs particular tasks or implements particular abstract data types. The invention may be practiced in a variety of system configurations, including hand-held devices, consumer electronics, general-purpose computers, more specialized computing devices, etc. The invention may also be practiced in distributed computing environments where tasks are performed by remote-processing devices that are linked through a communications network.
With reference to the figures described herein, an exemplary operating environment is described below in connection with computing device 008.
In some embodiments, the computing device 008 represents the physical embodiment of one or more systems and/or components described above. For example, the computing device 008 can be the one or more user devices 802 and/or control server(s) 805 of the computing environment 800.
Computing device 008 typically includes a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by computing device 008 and includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer-readable media may comprise computer storage media and communication media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computing device 008. Computer storage media does not comprise signals per se. Communication media typically embodies computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer-readable media.
Memory 12 includes computer storage media in the form of volatile and/or nonvolatile memory. The memory may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid-state memory, hard drives, optical-disc drives, etc. Computing device 008 includes one or more processors 14 that read data from various entities such as memory 12 or components 20. Presentation component(s) 16 present data indications to a user or other device. Exemplary presentation components include a display device, speaker, printing component, vibrating component, etc.
I/O ports 18 allow computing device 008 to be logically coupled to other devices, including I/O components 20, some of which may be built in. Illustrative components include a microphone, joystick, game pad, satellite dish, scanner, printer, wireless device, etc. The I/O components 20 may provide a natural user interface (NUI) that processes air gestures, voice, or other physiological inputs generated by a user. In some instances, inputs may be transmitted to an appropriate network element for further processing. A NUI may implement any combination of speech recognition, touch and stylus recognition, facial recognition, biometric recognition, gesture recognition both on screen and adjacent to the screen, air gestures, head and eye tracking, and touch recognition associated with displays on the computing device 008. The computing device 008 may be equipped with depth cameras, such as stereoscopic camera systems, infrared camera systems, RGB camera systems, and combinations of these, for gesture detection and recognition. Additionally, the computing device 008 may be equipped with accelerometers or gyroscopes that enable detection of motion.
As described above, implementations of the present disclosure relate to scoring and providing search results using leaf invalidation pairs, bitmaps, and associated data structures. The present invention has been described in relation to particular embodiments, which are intended in all respects to be illustrative rather than restrictive. Alternative embodiments will become apparent to those of ordinary skill in the art to which the present invention pertains without departing from its scope.
From the foregoing, it will be seen that this invention is one well adapted to attain all the ends and objects set forth above, together with other advantages which are obvious and inherent to the system and method. It will be understood that certain features and subcombinations are of utility and may be employed without reference to other features and subcombinations. This is contemplated by and is within the scope of the claims.
Embodiments of the present disclosure generally include a computer-implemented method, a non-transitory computer storage medium, and a system. In one aspect, the computer-implemented method can include the following operations. One or more factors of a query and one or more search result candidates are identified. A data structure associates a plurality of decision trees with one or more leaf invalidation pairs for at least a first value of the one or more factors. The one or more search result candidates are scored based at least in part on the associating of the plurality of decision trees with one or more leaf invalidation pairs for at least the first value of the one or more factors within the data structure.
In another aspect, the non-transitory computer storage medium can store computer-useable instructions that, when used by one or more computing devices, cause the one or more computing devices to perform the following operations. A query request is received for one or more resources. One or more factors of the query are identified. Within a modified set of bitmaps, a first position bit that indicates that one or more leaves are reachable of one or more decision trees are identified. The modified set of bitmaps are generated based at least on a data structure that associates one or more values within one or more decision trees with a set of leaf invalidation pairs. One or more search results are provided based at least on the identifying of the first position bit.
In yet another aspect, the system can include at least one computing device having at least one processor. The system can further include at least one computer readable storage medium having program instructions embodied therewith. The program instructions can be readable/executable by the at least one processor to cause the system to perform the following operations. One or more factors of one or more search result candidates are identified. A data structure is generated that associates a plurality of decision trees with a range of leaves that have been invalidated for one or more nodes of the plurality of decision trees for at least a first value of the one or more factors. One or more search result candidates are scored based at least in part on analyzing the data structure.
“And/or” is the inclusive disjunction, also known as the logical disjunction and commonly known as the “inclusive or.” For example, the phrase “A, B, and/or C,” means that at least one of A or B or C is true; and “A, B, and/or C” is only false if each of A and B and C is false.
A “set of” items means there exists one or more items; there must exist at least one item, but there can also be two, three, or more items. A “subset of” items means there exists one or more items within a grouping of items that contain a common characteristic.
A “plurality of” items means there exists more than one item; there must exist at least two items, but there can also be three, four, or more items.
“Includes” and any variants (e.g., including, include, etc.) means, unless explicitly noted otherwise, “includes, but is not necessarily limited to.”
A “user” or a “subscriber” includes, but is not necessarily limited to: (i) a single individual human; (ii) an artificial intelligence entity with sufficient intelligence to act in the place of a single individual human or more than one human; (iii) a business entity for which actions are being taken by a single individual human or more than one human; and/or (iv) a combination of any one or more related “users” or “subscribers” acting as a single “user” or “subscriber.”
The terms “receive,” “provide,” “send,” “input,” “output,” and “report” should not be taken to indicate or imply, unless otherwise explicitly specified: (i) any particular degree of directness with respect to the relationship between an object and a subject; and/or (ii) a presence or absence of a set of intermediate components, intermediate actions, and/or things interposed between an object and a subject.
A “data store” as described herein is any type of repository for storing and/or managing data, whether the data is structured, unstructured, or semi-structured. For example, a data store can be or include one or more: databases, files (e.g., of unstructured data), corpuses, digital documents, etc.
A “module” is any set of hardware, firmware, and/or software that operatively works to do a function, without regard to whether the module is: (i) in a single local proximity; (ii) distributed over a wide area; (iii) in a single proximity within a larger piece of software code; (iv) located within a single piece of software code; (v) located in a single storage device, memory, or medium; (vi) mechanically connected; (vii) electrically connected; and/or (viii) connected in data communication. A “sub-module” is a “module” within a “module.”
The terms first (e.g., first cache), second (e.g., second cache), etc. are not to be construed as denoting or implying order or time sequences unless expressly indicated otherwise. Rather, they are to be construed as distinguishing two or more elements. In some embodiments, the two or more elements, although distinguishable, have the same makeup. For example, a first memory and a second memory may indeed be two separate memories but they both may be RAM devices that have the same storage capacity (e.g., 4 GB).
The term “causing” or “cause” means that one or more systems (e.g., computing devices) and/or components (e.g., processors) may, in isolation or in combination with other systems and/or components, bring about or help bring about a particular result or effect. For example, a server computing device may “cause” a message to be displayed to a user device (e.g., via transmitting a message to the user device) and/or the same user device may “cause” the same message to be displayed (e.g., via a processor that executes instructions and data in a display memory of the user device). Accordingly, one or both systems may in isolation or together “cause” the effect of displaying a message.