The disclosure of the present invention generally relates to computers and computer software, and in particular to methods, systems, and computer program products that handle search queries in a database system and perform cache update adaptation.
Recommendation of certain products, certain information, etc. is crucial in both academia and industry, and various techniques have been proposed, such as content-based collaborative filtering, matrix factorization, logistic regression, factorization machines, neural networks and multi-armed bandits. Common problems with these approaches are that (i) the recommendation is considered a static procedure and the dynamic, interactive nature between users and the recommender system is ignored; and (ii) the focus is put on the immediate feedback of recommended items while the long-term rewards are neglected. One general approach to shorten response times to queries is to pre-compute or pre-collect results to search queries and maintain them in a cache. Search queries are then actually not processed on the large volumes of original data stored in databases, but on the results as maintained in the cache.
Recommender systems with user interaction are described, for example, in Deep Reinforcement Learning based Recommendation with Explicit User-Item Interactions Modeling by Feng Liu et al., Deep Neural Networks for Choice Analysis: A Statistical Learning Theory Perspective by Shenhao Wang et al., Deep Choice Model Using Pointer Networks for Airline Itinerary Prediction by Alejandro Mottini and Rodrigo Acuna-Agost, and DRN: A Deep Reinforcement Learning Framework for News Recommendation by Guanjie Zheng et al.
What is needed, however, is a Reinforcement Learning (RL) algorithm for increasing the number of completed transactions via a website of an online travel agency (OTA), i.e., for increasing the rate of conversion of users who are merely browsing the website into actual customers.
In an embodiment, a computer-implemented system for dynamically building and adapting a search website hosted by a webserver is provided. The system includes a Reinforcement Learning module coupled to the webserver and employing a Reinforcement Learning model for controlling appearance and/or functionality of the search website by generating actions to be output to the webserver. The actions relate to controlling an order and/or rank of elements in an ordered list of travel recommendations obtained as a result from a search request to be displayed by the search website and/or arranging website controls on the search website. The Reinforcement Learning module is adapted to receive Reinforcement Learning rewards. The Reinforcement Learning rewards are generated by the search website based on user input on the search website or by a website user simulator in response to one or more of the actions generated by the Reinforcement Learning module based on state information provided by the user simulator. The rewards cause the Reinforcement Learning module to adapt the Reinforcement Learning model. The website user simulator is configured to simulate an input behavior of users of the search website and to feed the Reinforcement Learning module in order to train it.
In some embodiments, the search website may be a travel website for booking travel products and the actions may comprise sorting travel products to be displayed on the travel website in response to a user search request according to one or more characteristics of the travel products and/or controlling an appearance of the website controls to be shown on the travel website.
In some embodiments, the one or more characteristics of the travel product include a price, a duration of the travel product, a number of stops, a departure time, an arrival time, a type of travel provider, or a combination thereof.
In some embodiments, the website user simulator comprises a simulation model with at least one of the following parameters describing the user input behavior: a passenger segment, search behavior according to a passenger type, intention to book at a later point of time after a current search, intention to conduct another search after the current search.
In some embodiments, the passenger segment includes one or more of: business passenger, leisure passenger, senior passenger, passenger visiting friends and relatives.
Note, however, that although the segment of the user influences the searches performed on the website (or simulated by the user simulator) and the user behavior (booking, further search, leaving), and although the user is presented with a certain order or rank of elements in an ordered list of travel recommendations and/or with website controls provided by the Reinforcement Learning algorithm, the segment itself is not directly observed by the Reinforcement Learning module, which keeps the approach realistic.
In some embodiments, the passenger type is specified by one or more of: day of the week for searching, time of the day for searching, number of seats, number of days until departure, Saturday night stay, importance of travel product characteristics.
In some embodiments, the rewards relate to the user booking one of the travel products displayed on the travel website.
A system according to any of the above-mentioned embodiments is also provided, and further includes the webserver hosting the search website.
The above summary may present a simplified overview of some embodiments of the invention in order to provide a basic understanding of certain aspects of the invention discussed herein. The summary is not intended to provide an extensive overview of the invention, nor is it intended to identify any key or critical elements, or delineate the scope of the invention. The sole purpose of the summary is merely to present some concepts in a simplified form as an introduction to the detailed description presented below.
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate various embodiments of the invention and, together with the general description of the invention given above and the detailed description of the embodiments given below, serve to explain the embodiments of the invention.
A website user simulator simulates an interaction with the website, which displays search results, e.g., for a particular travel itinerary requested by a simulated user. Based on the simulated, i.e., expected, reaction of the user to the displayed search results, more precisely to the way the search results are displayed to him/her as well as to the graphical interfaces and the usage of functionalities on the website, a Reinforcement Learning model is adjusted to ergonomically enhance the user experience by displaying the search results and graphical interfaces according to the user's preferences.
In order to be able to adapt and change the display and representation of requested search results on a search website, e.g., of an online travel agency (OTA), with regard to the likes and dislikes of the user who requested the search results, the Reinforcement Learning model is used as a recommender system, meaning that the results best suited to the particular user are recommended to him/her. In order to learn the best-suited results and/or website functionalities and website controls for a particular user, the website user simulator is used to train the Reinforcement Learning model. The specification sets forth the general principles of the website presentation improvement by way of example using the travel sector, i.e., a user searching and/or booking travel products via the OTA website. However, the general principles are applicable to any search website which displays search results in response to search requests.
In order to address the above-mentioned difficulties, it is therefore generally proposed herein to utilize a Reinforcement Learning algorithm which dynamically optimizes the decision of the number of travel recommendations presented as search results and the order or rank of elements in an ordered list of travel recommendations as search result to be displayed to the user.
Employing a standard supervised learning algorithm, where the algorithm learns using labels on past data, poses difficulties since there is no knowledge of which display scheme is optimal for a given share of query results for a particular user. The required expert knowledge is generally not available since the data set from which the expert obtained his/her knowledge is usually too small for a reliable recommendation to be presented for the vast majority of users with varying preferences.
Another way to build this database would be to use a brute-force approach that permutes all possible orders or ranks of elements in an ordered list of travel recommendations to be displayed by the search website and/or all possible arrangements of website controls on the search website, and compares the arrangements with the number of booked travel products for every arrangement. This could yield a determination of which rank and/or order of elements in an ordered list of travel recommendations and which website control arrangement is the most appropriate to maximize the number of booked travel products. However, this approach has technical drawbacks; for example, it would take a lot of computation time and hardware resources to gather all these statistics, so that such an approach is nearly technically infeasible in practice.
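Purely by way of illustration, the following short Python sketch (assuming a result list of only 20 travel recommendations) shows how quickly the number of possible display orders grows, even before any website-control arrangements are taken into account:

```python
import math

# Back-of-the-envelope illustration of why the brute-force permutation approach
# is nearly infeasible: the number of possible display orders alone grows
# factorially with the length of the recommendation list.
n_recommendations = 20
n_orderings = math.factorial(n_recommendations)
print(n_orderings)  # 2432902008176640000, i.e. roughly 2.4e18 possible orders
```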
An example of a computer-implemented system to overcome these drawbacks is shown in
It is proposed herein to train a Reinforcement Learning algorithm 12 (see
Positive/negative booking results for travel products achieved using a certain order and/or rank of elements in an ordered list of travel recommendations and/or website controls arrangement on the search website are, for example, fed to the learning algorithm as a positive/negative reward 220. In the learning phase of the system, the Reinforcement Learning module 10 may report actions 130, such as a change in the order and/or rank of elements in an ordered list of travel recommendations and/or a change of website control arrangements, which were performed by the Reinforcement Learning module 10, to the website user simulator 20. In the actual production phase, with real users and searches on the search website 300, the webserver 200 hosting the search website 300 will send feedback via its website engine to help the website user simulator 20 to reproduce some user browsing actions.
In general, the system 100 is enhanced with a Reinforcement Learning module 10 employing a Reinforcement Learning model 11 to determine an optimal order and/or rank of elements in an ordered list of travel recommendations and/or website controls arrangement on the search website 300, which is hosted 250 on a webserver 200.
More specifically, the Reinforcement Learning module 10 receives the feed 210 from a website user simulator 20. The feed 210 is a set of inputs simulated by the website user simulator, for example, a simulated query for a certain leisure-related/business-related travel product on a simulated day of week/time of day and/or a simulated timespan before departure. The website user simulator 20 might present actions that stem from a simulation model 21, programmed to simulate a behavior of a particular type of user. The simulation model 21 might be developed based on input behavior of the search website 300 of e.g., millions of different users with a certain quality (age, purpose of trip) in common. The simulation model 21 may be a model based on a multilayer neural network or deep neural network.
The Reinforcement Learning module 10 forwards the simulated query to the search website 300 having certain website controls 60. The search website 300 yields a search result comprising an ordered list of travel recommendations. The user simulator 20 now simulates the navigation behavior of a user. The simulated user, like a real user, may perform several successive search requests on the website and will typically change some search parameters, such as origin and destination, the outbound and inbound dates, or other options. After each search request issued by the user, the Reinforcement Learning module may change the order and/or rank of elements in the ordered list of travel recommendations. The simulated user behavior results in a book or not-book decision, which is fed forward as a respective positive/negative decision to the Reinforcement Learning module 10. The simulated user, as well as a real user, may belong to a certain segment (e.g., business traveler, holiday traveler, etc.) and might be simulated to behave as such.
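Purely by way of illustration, the following Python sketch outlines this interaction loop. All class and method names are hypothetical placeholders and the booking and re-search probabilities are assumptions; the sketch merely shows how the simulated query (feed 210), the actions 110 and the reward 220 circulate between the user simulator, the Reinforcement Learning module and the website.

```python
import random

class SimulatedUser:
    """Stands in for the website user simulator 20 during one session."""
    def new_search(self):
        # Simulated query parameters, e.g., dates and options (feed 210).
        return {"days_to_departure": random.randint(1, 180),
                "day_of_week": random.randint(1, 7)}

    def react(self, ordered_results):
        # Placeholder reaction: small chance to book, otherwise maybe search again.
        books = random.random() < 0.05
        searches_again = (not books) and random.random() < 0.5
        return books, searches_again

class RandomRLModule:
    """Trivial stand-in for the Reinforcement Learning module 10."""
    ACTIONS = ["sort_by_price", "sort_by_duration", "sort_by_departure_time"]
    def choose_action(self, query):
        return random.choice(self.ACTIONS)          # action 110
    def learn(self, query, action, reward):
        pass                                        # a real module would update model 11

class StubWebsite:
    """Trivial stand-in for the search website 300."""
    def search(self, query, action):
        return [{"price": random.randint(50, 500)} for _ in range(10)]

def episode(user, rl_module, website):
    """One simulated session of successive searches ending in book or leave."""
    query = user.new_search()
    while True:
        action = rl_module.choose_action(query)
        results = website.search(query, action)     # ordered list of recommendations
        books, searches_again = user.react(results)
        rl_module.learn(query, action, 1.0 if books else 0.0)  # reward 220
        if not searches_again:
            return books
        query = user.new_search()                   # user changes some parameters

episode(SimulatedUser(), RandomRLModule(), StubWebsite())
```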
Key performance indicators (KPIs) may be used to rate a certain order of elements in the ordered list of travel recommendations yielded as a search result and a certain arrangement of the website controls on the search website. For example, a KPI may refer to a booking percentage. The more travel products are actually booked with a rated configuration in a certain time, the higher this KPI may be.
Expert knowledge may be used to determine e.g., which options of arranging the website controls and/or the order or rank of elements in the ordered list of travel recommendations will most probably not have an influence on the KPIs—this can be used to reduce the dimensionality of the learning space.
The values of the individual KPIs may be aggregated to an aggregated value of KPIs as explained in more detail below. The KPIs may be hierarchically defined, with more general KPIs being composed of a number of more specific KPIs. KPI aggregation is then done at each hierarchy level, wherein more specific KPIs are aggregated to form the more general KPI and the more general KPIs are aggregated to establish a common reward value for a certain action.
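By way of example only, such a two-level aggregation may be sketched as follows; the KPI names and weights are illustrative assumptions and do not reflect a specific implementation.

```python
# Illustrative two-level KPI aggregation; KPI names and weights are assumptions.

def aggregate(kpis, weights):
    """Weighted aggregation of KPI values at one hierarchy level."""
    return sum(weights[name] * value for name, value in kpis.items())

# Specific KPIs are aggregated into a more general KPI ...
specific_kpis = {"booking_rate": 0.04, "click_through_rate": 0.30}
conversion = aggregate(specific_kpis, {"booking_rate": 0.8, "click_through_rate": 0.2})

# ... and the general KPIs are aggregated into a common reward value for an action.
general_kpis = {"conversion": conversion, "engagement": 0.5}
reward = aggregate(general_kpis, {"conversion": 0.7, "engagement": 0.3})
```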
Before discussing the present Reinforcement Learning (RL) system in more detail, we first give an overview of some general underlying concepts of Reinforcement Learning. Reinforcement Learning mechanisms are also described, for example, in the textbook "Reinforcement Learning" by Richard S. Sutton and Andrew G. Barto, published by the MIT Press in 1998. RL mechanisms utilize terms of the art having an established meaning, and these terms are used herein in this established meaning to describe the algorithm for determining an optimal order and/or rank of elements in an ordered list of travel recommendations and/or website controls arrangement, including:
The goal of the agent is to maximize the rewards not immediately, but in the long run. Hence, a long-term reward is estimated.
A general feature of Reinforcement Learning is the trade-off between exploration and exploitation:
The agent continuously learns in exploration and in exploitation mode from its environment—in the example of
Further particularities of the Reinforcement Learning algorithm design to implement the Reinforcement Learning module 10 with the Reinforcement Learning model 11 are described next with reference to
More details of the RL mode determination are described next. As explained in the introduction of Reinforcement Learning above, a balanced tradeoff is sought between these two modes.
Two balancing methods may be applied during the learning phase, namely the Epsilon-Greedy strategy and the Softmax strategy. For example, the Epsilon-Greedy strategy may be used.
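Purely by way of illustration, both strategies may be sketched as follows, where Q is assumed to map candidate actions to their currently estimated long-term rewards; the epsilon and temperature values are assumptions.

```python
import math
import random

def epsilon_greedy(Q, epsilon=0.1):
    """With probability epsilon pick a random action (explore), else the best one (exploit)."""
    if random.random() < epsilon:
        return random.choice(list(Q))
    return max(Q, key=Q.get)

def softmax(Q, temperature=1.0):
    """Sample an action with probability proportional to exp(Q / temperature)."""
    actions = list(Q)
    weights = [math.exp(Q[a] / temperature) for a in actions]
    return random.choices(actions, weights=weights, k=1)[0]
```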
During a production phase (hence a phase with real (not simulated) users), either a full exploitation will be set or a small exploration with a low percentage (e.g., 5%) may be allowed.
Regarding the learning rates' development, a standard learning rate decay scheme may be used.
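As an illustrative sketch only, such a decay scheme may, for example, take the form of an inverse-time decay; the initial rate and decay constant below are assumptions.

```python
# Illustrative inverse-time decay; the initial rate and decay constant are assumptions.

def decayed_learning_rate(step, initial_rate=0.1, decay=1e-4):
    """Learning rate shrinks gradually as the number of training steps grows."""
    return initial_rate / (1.0 + decay * step)
```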
An exemplary visualization of information presented on a search website 300 is given by
As mentioned above, in some examples, the search website is a travel website for booking travel products, and the actions comprise sorting travel products to be displayed on the travel website in response to a user search request according to one or more characteristics of the travel products and/or controlling an appearance of the website controls etc. to be shown on the travel website.
In some examples, the one or more characteristics of the travel product include a price, a duration of the travel product, a number of stops, a departure time, an arrival time, a type of travel provider, or a combination thereof.
Travel products 52, such as combined flights and hotel bookings, are displayed on the search website 300. The displayed travel products might comprise attributes such as a price, a duration, a number of stops, a departure time, an arrival time or a type of travel provider. The user is capable of selecting, rearranging, booking, etc. the travel products 52 via website controls 60. The Reinforcement Learning module 10 changes the appearance of the search website 300, in particular with respect to the rank and/or order of elements in the ordered list of travel recommendations displayed and the arrangement of the website controls 60. These changes are effected via actions 110 performed by the Reinforcement Learning module 10. As mentioned above and shown in
Examples of input parameters affecting a simulation model of a website user simulator are visualized by
The website user simulator 20 comprises a simulation model 21. The simulation model 21 is a computational/mathematical model used to simulate the actions of particular users. The simulation model 21 is designed and continuously adapted based on input behavior of users who make use of the search website 300 in order to book a particular travel product 52 (
The simulation model 21 as well as the simulated actions output by the website user simulator 20 may reflect certain environment settings 23. These environment settings 23 comprise characteristics/preferences of the user such as a passenger segment, a search behavior/passenger type, an intention to book and an intention to make an additional search.
As such, in some examples, the website user simulator 20 comprises a simulation model 21 with at least one of the following parameters describing the user input behavior: a passenger segment, search behavior according to a passenger type, intention to book at a later point of time after a current search, intention to conduct another search after the current search.
In some examples, the passenger segment includes one or more of: business passenger, leisure passenger, senior passenger, passenger visiting friends and relatives.
These examples for passenger segments are now explained further: (i) business: a passenger who travels in the course of his or her business; (ii) leisure: a passenger who travels for vacation and wishes to book a hotel etc.; (iii) visiting friends and relatives: a passenger who travels to visit friends and relatives; (iv) senior: passengers who have retired. Different segments may want to book different flights, such as the fastest flight, the cheapest flight, the most comfortable flight or a combination thereof.
An example for a search behavior/passenger type is the passenger type who just searches to receive information about existing flight connections and does not have a real intention to buy/book. Further examples for behavior/passenger type are (i) the day of the week a search for a travel takes place, (ii) the time of the day the search takes place, (iii) the number of seats that are intended (single booking, family booking), (iv) the days till departure (some business passengers may tend to book shortly before a planned stay, while some of them may book half a year in advance; leisure passengers sometimes only book weeks in advance), (v) a Saturday night stay, or (vi) an importance of characteristics (a user's priorities when acquiring a travel product).
Search patterns may be estimated from past bookings received and/or stated preferences by a certain user segment. Those search patterns may not need to be fully accurate, but may be accurate enough to provide a basic pre-trained model.
An intention to book may indicate users who indeed intend to book a travel product 52 (see
An intention to make an additional search and/or an intention to leave may indicate that the user uses the website 300 to search for a particular travel product, but will, after not booking the currently searched product, make a further search later on the same website 300 or on a different website, for example, one belonging to a different travel provider.
An example for an interaction between the website user simulator 20 and the Reinforcement Learning algorithm is visualized by
The website user simulator 20, corresponding to the environment of the Reinforcement Learning module 10 (
The Reinforcement Learning algorithm 12 continuously performs actions 110 resulting in a change of the website 300 (
The website user simulator returns a reward 220 to the Reinforcement Learning algorithm 12, e.g., if the simulated user books a travel product 52 (
Hence, in some examples, the rewards relate to whether or not the user books one of the travel products displayed on the travel website.
The reward 220 may be based on recommendation features. The recommendation features are travel recommendation features and relate to the features of a travel product. If a booking decision is positive, a reward will be sent to the Reinforcement Learning algorithm 12. There is a 1:1 mapping between a positive booking decision and the reward.
Based on the reward 220 received, the Reinforcement Learning algorithm 12 performs a learning activity 17, which may lead to a modified website changing strategy using the rewards obtained (or the change in rewards obtained) as a result of previous website changing actions 110 and user input on the website.
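The specification does not prescribe a particular update rule for this learning activity 17; purely by way of illustration, a generic tabular Q-learning update could look as follows, with the learning rate alpha and the discount factor gamma being illustrative assumptions.

```python
# Generic tabular Q-learning update shown only for illustration; the source does
# not prescribe this particular rule. Q maps hashable (state, action) pairs to
# value estimates; alpha and gamma values are assumptions.

def q_update(Q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Move the estimate for (state, action) toward reward plus discounted best future value."""
    best_next = max(Q.get((next_state, a), 0.0) for a in actions)
    current = Q.get((state, action), 0.0)
    Q[(state, action)] = current + alpha * (reward + gamma * best_next - current)
```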
For example, the website user simulator 20 yields actions on the website that can be categorized as the behavior of leisure-segment users. The interactions of the simulated user with the website (but not the segment the user belongs to) might be forwarded to the Reinforcement Learning algorithm 12. The action "sort by price" might be performed on the search website 300 (see
An example for an interrelation between environmental parameters and a Reinforcement Learning model is depicted by
The system of the Reinforcement Learning module may comprise the following elements depicted in
The elements of the system of
Examples of the RL algorithm and the RL settings 13 are depicted in
The production system, i.e., a system actually implementing the method according to the first aspect, may be based on real users and searches, which are used to improve on the initial learning of the Reinforcement Learning algorithm 12. The RL model 11 of the production system may be pre-trained based on simulation as explained above, may be able to decide about changes implemented by an online travel agency, and may be further trained on real user data.
An example of a flight search environment in conjunction with a Reinforcement Learning model is visualized by
A flight search front-end 400, for example, is provided by the online travel agency (OTA). This flight search front-end 400 may be in communication with a flight search back-end 500, which, for example, performs the actual search and result computation, which may be based on a meta-search previously performed by the flight search front-end 400. The flight search back-end 500 may be in communication with the API gateway 600, which may take the search and the results yielded in response to the search as an input and may redirect these to the RL model 11. The RL model 11 receives this data and is pre-trained, for example, by means of the simulated user queries received from the user simulator (the pre-training, for example, takes place in a learning phase). Furthermore, the RL model 11 may be able to decide on a ranking of flight search results on an OTA website 300 or on an appearance of website controls 60. The RL model 11 can be further trained on real user data.
An example of a search tree, which considers whether or not search results suitable for the user are found, which search results are found, and whether or not there is an intent to book, is illustrated by
The intent to book specifies the propensity of the simulated user to actually book a searched product; it relates, e.g., to users who search the search website for information purposes but may decide to book the flight on a different website or a different device at a later point and thus have no intent to book.
The intent to book is, for example, used for KPI graphing purposes and is used only in the learning phase, which employs the simulator. Hence, the intent to book may not be used in the subsequent production phase.
A search activity 301 is performed by the user simulator 20 (
As mentioned above, the intent to book feature models users who prefer to book later on another device, even if they found what they wanted. The effect on the simulation is that it makes the simulation more realistic since in reality, many passengers just search for information purposes, without intending to really book a certain travel product 52 (
If the user has indeed an intent to book on the search website 300, the simulated search arrives at “book” 302. Otherwise, if the simulated user decides to book on a different platform or to book later, the simulated search ends at “leave” 303.
Another example of a search tree similar to the search tree of
Different to what is shown by
As also mentioned above, an intent to leave models whether or not a user makes another search after not booking on the current one. The effect on the simulation is also that the simulation becomes more realistic, since many users do more than one search, immediately or hours/days later. These additional searches may be recognized with cookies. This may also help the algorithm to narrow down segments: a string of searches will make detecting the segments (leisure, business, etc.) easier. This is the case because, based on many searches performed by the same user, it is easier to identify to which segment the user belongs.
If the simulated user indeed has an intent to leave, the user arrives at “leave” 303. If, however, the user has no intent to leave, the user again arrives at the activity “search” 301, since the user might try a different search for a particular travel product 52 (
An example for a relation between the amount of days before departure of a travel product and the number of requests for such a travel product is illustrated by
As explained above, the implementation of the website user simulator 20 and the Reinforcement Learning algorithm 12 may consider multiple passenger segments, each with a different behavior in the state space.
As also mentioned above, these different search patterns and behaviors may comprise, e.g., business passengers searching during working hours and leisure passengers having different days-to-departure (DTD) values. Furthermore, different passenger segments have an interest in different flight characteristics (cheapest, fastest, a combination, etc.).
The input for the search patterns of a certain user segment can be estimated from past bookings or stated preferences. As also mentioned above, the preferences do not have to be exactly accurate. It is sufficient to provide a pre-trained model with basic quality that can be refined and improved subsequently when being employed with the production system.
The example of
In some examples, the passenger type is specified by one or more of: day of the week for searching, time of the day for searching, number of seats, number of days until departure, Saturday night stay, importance of travel product characteristics.
For the passenger segment "business", a simulation variable "Day of week" may be set to a random number from 1 to 5. This means business searches usually relate to weekdays, and each weekday is equally probable. The probability for a Saturday night stay may be set to a 10% chance, and the number of seats needed for transport may be set to one. The Day to Departure may be determined through a geometric law such as the one depicted in
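By way of illustration only, such a parameterization of the business segment could be sampled as sketched below; the weekday range, the 10% Saturday-night-stay probability and the single seat follow the description above, while the parameter p of the geometric law is an assumption.

```python
import numpy as np

# Illustrative sampling of the "business" segment parameters described above.
def sample_business_search(rng=None):
    if rng is None:
        rng = np.random.default_rng()
    return {
        "day_of_week": int(rng.integers(1, 6)),            # weekdays 1..5, equally probable
        "saturday_night_stay": bool(rng.random() < 0.10),  # 10% chance
        "seats": 1,                                        # one seat needed
        "days_to_departure": int(rng.geometric(p=0.1)),    # geometric law, p assumed
    }
```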
All such criteria used in the website user simulator 20 may be varied during a simulation or may be predefined. The number of passenger segments and shares of the passenger segments may also be modified by parameter value adjustments, i.e., without changing the simulator itself.
The website searches performed are still random, but the probability laws may depend on the passenger segment's parameters. In terms of the Reinforcement Learning parameters (agent, environment, state, etc.), the passenger segment's parameters correspond to the state. The state may comprise any parameter defining the current environment. As such, a state may comprise user features and search features. The booking behavior by segment may be modelled by the intent to leave, the intent to book, and a choice rule, e.g., a deterministic cheapest rule or a Multinomial Logit (MNL) choice model.
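Purely by way of illustration, the two choice rules mentioned above may be sketched as follows; the utility weights of the MNL rule are assumptions for a price- and duration-sensitive segment.

```python
import math
import random

def cheapest_choice(recommendations):
    """Deterministic rule: always pick the cheapest recommendation."""
    return min(recommendations, key=lambda r: r["price"])

def mnl_choice(recommendations, beta_price=-0.01, beta_duration=-0.002):
    """MNL rule: pick a recommendation with probability proportional to exp(utility)."""
    utilities = [beta_price * r["price"] + beta_duration * r["duration"]
                 for r in recommendations]
    weights = [math.exp(u) for u in utilities]
    return random.choices(recommendations, weights=weights, k=1)[0]
```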
An example for a learning curve of the Reinforcement Learning model is illustrated by
The learning curve given by
A diagrammatic representation of an exemplary computer system 500 is shown in
The computer system 500 includes a processor 502, a main memory 504 and a network interface 508. The main memory 504 includes a user space, which is associated with user-run applications, and a kernel space, which is reserved for operating-system- and hardware-associated applications. The computer system 500 further includes a non-volatile or static memory 506, e.g., non-removable flash and/or solid-state drive and/or a removable Micro or Mini SD card, which permanently stores software enabling the computer system 500 to execute functions of the computer system 500. Furthermore, it may include a video display 510, a user interface control module 514 and/or an alpha-numeric and cursor input device 112. Optionally, additional I/O interfaces 516, such as card reader and USB interfaces may be present. The computer system components 502 to 509 are interconnected by a data bus 518.
In some examples the software programmed to carry out the method described herein is stored on the static memory 506; in other examples external databases are used.
An executable set of instructions (i.e., software) embodying any one, or all, of the methodologies described above, resides completely, or at least partially, permanently in the non-volatile memory 506. When being executed, process data resides in the main memory 504 and/or the processor 502. The executable set of instructions causes the processor to perform any one of the methods described above.
Although certain products and methods constructed in accordance with the teachings of the invention have been described herein, the scope of coverage of this invention is not limited thereto. On the contrary, this patent covers all embodiments of the teachings of the invention fairly falling within the scope either literally or under the doctrine of equivalents.
Number | Date | Country | Kind
--- | --- | --- | ---
2003313 | Apr 2020 | FR | national