One embodiment is directed generally to a computer system, and in particular to a computer system that generates an artificial intelligence-based room personalized demand model.
Increased competition in the hotel industry has caused hoteliers to look for more innovative revenue management policies, such as personalized pricing and recommendations. Over the past few years, hoteliers have come to understand that not all guests are equal and a traditional one-size-fits-all policy might prove to be ineffective. Therefore, a need exists for hotels to profile their guests and offer them the right product/service at the right price with the goal of maximizing their profit.
Embodiments model demand and pricing for hotel rooms. Embodiments receive historical data regarding a plurality of previous guests, the historical data including a plurality of attributes including guest attributes, travel attributes and external factors attributes. Embodiments generate a plurality of distinct clusters based on the plurality of attributes using machine learning soft clustering and segment each of the previous guests into one or more of the distinct clusters. Embodiments build a model for each of the distinct clusters, the model predicting a probability of a guest selecting a certain room category and including a plurality of variables corresponding to the attributes. Embodiments eliminate insignificant variables of the models and estimate model parameters of the models, the model parameters including coefficients corresponding to the variables. Embodiments determine optimal pricing of the hotel rooms using the model parameters and a personalized pricing algorithm.
Further embodiments, details, advantages, and modifications will become apparent from the following detailed description of the embodiments, which is to be taken in conjunction with the accompanying drawings.
Embodiments utilize artificial intelligence (“AI”) to predict demand for multiple hotel room categories based on the individual attributes of the hotel guests, their booking channels, and room category features, including the offered price. Embodiments further estimate the fraction of “no-purchase guests”, that is, guests who decide not to book the hotel rooms, which is an unobservable quantity. Embodiments output the probability that each individual guest books a room in a specific room category. Embodiments further estimate the relative monetary value of the room features for each cluster of hotel guests. Examples of room features include the bed type (e.g., king vs. queen), the view (e.g., ocean or garden), the room size, or the room type (e.g., suite vs. single room). To generate a personalized demand model based on guest characteristics as well as room features, embodiments use a combination of clustering and a mixture of multinomial choice models.
Traditional revenue management (“RM”) practices in the hotel industry use capacity control mechanisms, specifically controlling room availabilities for different categories of products, typically using length-of-stay controls. In general, the hotel industry does not use advanced demand models based on the individual attributes of the hotel guests, their booking channels and room category features. However, operating conditions have significantly changed for the hotel industry in recent years. Given the transparency of room prices via the Internet, corporate travel management companies, leisure travel agencies, and brand websites moved to a common distribution platform and started reaching into each other's customer bases. Search engines then drove this transparency even further, aggregating the online rates from all distribution channels into a single interface and showing price as one of the most prominent differentiators between hotel rooms.
In this competitive environment, traditional RM solutions, which operate under the assumption that the demand for a product does not depend on what other choices are available, are much less effective in segmenting guests with well-fenced restrictions. Therefore, there is a need for hotels to move towards price optimization solutions based on guests' willingness-to-pay and price elasticity.
Especially for online sales, personalized demand modeling and price optimization have seen relatively little use in the hotel industry, partially due to the difficulty of applying these methods directly to hotel booking. Most of the demand-forecasting tools currently used by the hotel industry aim to provide the overall number of bookings based on time series analysis, thus ignoring the price elasticity of demand and room category features. These demand modeling tools are often ineffective in the presence of heterogeneous guests with significantly different willingness-to-pay.
In contrast to known solutions, embodiments implement a personalized strategy by first dividing the guest base into distinct clusters, applying a machine learning-based soft clustering model to the guest, travel, and external attributes. Known solutions often performed this clustering based only on easily separable guest attributes, such as the trip purpose (e.g., leisure or business), under the assumption of homogeneous guests. This may be too restrictive in practice, since guests have their own characteristics that call for different choice models. Even for guests with similar attributes, choice probabilities may depend on external attributes such as local events, holidays, and the weather at the origin and the destination. Therefore, embodiments relax the strong assumption of homogeneity of guests in the choice modeling.
Embodiments include two preceding sequential steps: an arrival step and a booking decision step. A customer may or may not arrive at the hotel room booking system. If the customer arrives, the customer then decides whether to make a reservation at the hotel. Once a customer has arrived at the booking system and decided to reserve a room, the customer chooses a room type. However, in general, observable data is available only for customers who purchased a product, and if embodiments merely fitted the demand model to the observable data, the estimation could be biased and fail to capture price sensitivity appropriately. To avoid these possible biases, embodiments incorporate the no-purchase cases, in which customers either do not arrive at the booking system because they are not interested in the hotel, or arrive at the booking system but then leave without a purchase due to a high price or a lack of available rooms. Therefore, embodiments can account for the no-purchase cases and for competitors (or outside options), which may affect a customer's initial decision, unlike previous industry solutions, which do not consider those factors.
Embodiments cluster the guests into several groups, or clusters, where guests with similar attributes are assigned to the same cluster. Moreover, embodiments implement a soft clustering approach by allowing each guest to belong to multiple clusters with certain probabilities. Embodiments then build a multinomial choice model for each cluster, which predicts the probability that each particular guest selects a certain room category. Embodiments determine the optimal number of clusters using a data-driven cross-validation approach.
Since the number of attributes is generally very large, the data within each group may be sparse, leading to inaccurate predictions. To mitigate this, embodiments implement a “Lasso” regularization method that sets the coefficients of the least important model covariates to zero by maximizing the penalized likelihood function of the mixture multinomial choice model.
To estimate the parameters (i.e., the arrival rates, the probabilities of belonging to each group, and the parameters of each covariate), embodiments use the Expectation-Maximization (“EM”) algorithm after performing random forest-based soft clustering to find the initial clustering probabilities. Because there are two unobservable factors (i.e., the no-purchase process and the cluster process), embodiments account for those latent factors. Finally, the resulting parameters are plugged into the personalized pricing algorithm to determine the optimal price of each room type for each guest.
Reference will now be made in detail to the embodiments of the present disclosure, examples of which are illustrated in the accompanying drawings. In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure. However, it will be apparent to one of ordinary skill in the art that the present disclosure may be practiced without these specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the embodiments. Wherever possible, like reference numbers will be used for like elements.
System 10 includes a bus 12 or other communication mechanism for communicating information, and a processor 22 coupled to bus 12 for processing information. Processor 22 may be any type of general or specific purpose processor. System 10 further includes a memory 14 for storing information and instructions to be executed by processor 22. Memory 14 can comprise any combination of random access memory (“RAM”), read only memory (“ROM”), static storage such as a magnetic or optical disk, or any other type of computer readable media. System 10 further includes a communication device 20, such as a network interface card, to provide access to a network. Therefore, a user may interface with system 10 directly, remotely through a network, or by any other method.
Computer readable media may be any available media that can be accessed by processor 22 and includes both volatile and nonvolatile media, removable and non-removable media, and communication media. Communication media may include computer readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and includes any information delivery media.
Processor 22 is further coupled via bus 12 to a display 24, such as a Liquid Crystal Display (“LCD”). A keyboard 26 and a cursor control device 28, such as a computer mouse, are further coupled to bus 12 to enable a user to interface with system 10.
In one embodiment, memory 14 stores software modules that provide functionality when executed by processor 22. The modules include an operating system 15 that provides operating system functionality for system 10. The modules further include room demand model module 16 that generates a room demand model to maximize hotel room revenue, and all other functionality disclosed herein. System 10 can be part of a larger system. Therefore, system 10 can include one or more additional functional modules 18 to include the additional functionality, such as the functionality of a Property Management System (“PMS”) (e.g., the “Oracle Hospitality OPERA Property” or the “Oracle Hospitality OPERA Cloud Services”) or an enterprise resource planning (“ERP”) system. A database 17 is coupled to bus 12 to provide centralized storage for modules 16 and 18 and store guest data, hotel data, transactional data, etc. In one embodiment, database 17 is a relational database management system (“RDBMS”) that can use Structured Query Language (“SQL”) to manage the stored data. In one embodiment, a specialized point of sale (“POS”) terminal 99 generates transactional data and historical sales data (e.g., data concerning transactions of hotel guests/customers) used for performing the optimization. POS terminal 99 itself can include additional processing functionality to perform room assignment optimization in accordance with one embodiment and can operate as a specialized room assignment optimization system either by itself or in conjunction with other components of
In one embodiment, particularly when there are a large number of hotel locations, a large number of guests, and a large amount of historical data, database 17 is implemented as an in-memory database (“IMDB”). An IMDB is a database management system that primarily relies on main memory for computer data storage. It is contrasted with database management systems that employ a disk storage mechanism. Main memory databases are faster than disk-optimized databases because disk access is slower than memory access and because their internal optimization algorithms are simpler and execute fewer CPU instructions. Accessing data in memory eliminates seek time when querying the data, which provides faster and more predictable performance than disk.
In one embodiment, database 17, when implemented as an IMDB, is implemented based on a distributed data grid. A distributed data grid is a system in which a collection of computer servers work together in one or more clusters to manage information and related operations, such as computations, within a distributed or clustered environment. A distributed data grid can be used to manage application objects and data that are shared across the servers. A distributed data grid provides low response time, high throughput, predictable scalability, continuous availability, and information reliability. In particular examples, distributed data grids, such as, e.g., the “Oracle Coherence” data grid from Oracle Corp., store information in-memory to achieve higher performance, and employ redundancy in keeping copies of that information synchronized across multiple servers, thus ensuring resiliency of the system and continued availability of the data in the event of failure of a server.
In one embodiment, system 10 is a computing/data processing system including an application or collection of distributed applications for enterprise organizations, and may also implement logistics, manufacturing, and inventory management functionality. The applications and computing system 10 may be configured to operate with or be implemented as a cloud-based networking system, a software-as-a-service (“SaaS”) architecture, or other type of computing solution.
In general, the functionality of
Since the no-arrival and no-booking customers are not recorded in database 17, they are treated as latent, or unobserved, variables. As disclosed in more detail below, these latent variables are estimated using an Expectation-Maximization (“EM”) algorithm, which iteratively fits the demand model to find the most likely estimate of the rate of all customers, including the no-arrival and no-booking customers.
At 204, embodiments cluster guests using machine learning methods (i.e., soft clustering).
To implement a personalized strategy, embodiments first divide the guest base into distinct clusters by applying a machine learning-based soft clustering model based on the guest, travel, and external attributes. Known solutions typically accomplish this clustering based on only easily separable guest attributes, such as the trip purpose (e.g., leisure vs. business) given the assumption of homogeneous guests. This may be too restrictive to apply in practice since guests have their own characteristics which require different choice models. Even for some guests with similar attributes, their choice probabilities may depend on external attributes such as local events, holidays, and weather at origin and destination. Therefore, embodiments relax the strong assumption of homogeneity of guests in the choice modeling.
Further, embodiments add two preceding sequential steps: an arrival step and a booking decision step. Customers may or may not arrive at the booking system. Those who arrive then decide whether or not to make a reservation at the hotel. Once a customer has arrived at the booking system and decided to reserve a room, the customer chooses a room type.
However, data is available only for the customers who purchased a product, and if the demand model is fitted only to the observable data, the estimation may be biased and fail to incorporate price sensitivity appropriately. To avoid these possible biases, embodiments incorporate the no-purchase cases, in which customers either do not arrive at the booking system because they are not interested in the hotel, or arrive at the booking system but leave without a purchase due to a high price or a lack of available rooms. Therefore, embodiments can account for the no-purchase case and for competitors (or outside options), which may affect a customer's initial decision, as compared to known solutions that do not consider those factors.
At 206, embodiments perform choice modeling by developing a mixture multinomial logit (“MNL”) model to estimate the demand. A multinomial choice model is built for each cluster formed at 204, which predicts the probability that each particular guest selects a certain room category. Embodiments determine the optimal number of clusters using a data-driven cross-validation approach.
At 208, embodiments perform variable selection by eliminating insignificant variables using a Lasso regularization method. Since the number of attributes is usually fairly large, the data within each group may be sparse, leading to inaccurate predictions. To mitigate this, the Lasso regularization method sets the coefficients of the least important model covariates to zero by maximizing the penalized likelihood function of the mixture multinomial choice model.
At 210, embodiments estimate model parameters using the Expectation-Maximization (“EM”) algorithm. To estimate the parameters (i.e., the arrival rates, the probabilities of belonging to each cluster group, and the parameters of each covariate), embodiments use the EM algorithm after performing random forest-based soft clustering to find the initial clustering probabilities. Embodiments assume a parametric model to predict the demand. Generally speaking, a parametric model is a family of probability distributions indexed by a finite number of parameters that determine the characteristics of the distribution. The parameters of the model are estimated from the data by finding the parameter values that provide the minimal deviation from the observed data. In embodiments, the model has three sets of parameters. First, the probabilities of belonging to each cluster group are estimated by performing random forest-based soft clustering. Next, the arrival rates and booking choice parameters are estimated (i.e., the probability of arriving at the booking system and, for customers who arrived, the booking choice probability). Finally, the parameters for each attribute, such as guest attributes, travel attributes, and external factors, are estimated. Because embodiments include two unobservable factors (i.e., the no-purchase process and the cluster process), embodiments account for those latent factors.
At 212, embodiments generate a personalized pricing policy algorithm to maximize hotel revenue. The parameters obtained from the above functionality are plugged into a personalized pricing algorithm to determine the optimal price of each room type for each guest. Further, embodiments can use the model to predict the probability that a particular guest selects a certain room category.
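As a minimal sketch of the kind of optimization performed at 212, assuming a single-cluster MNL with a no-purchase option whose utility is fixed at zero, hypothetical guest-specific base utilities, a single price coefficient, and a small candidate price grid, the code below evaluates the expected revenue Σ_k p_k·P_k(p) and searches the grid exhaustively. It is an illustration, not the disclosed pricing algorithm; with a fitted mixture model, P_k would instead be the cluster-weighted mixture of MNL probabilities. The names `choice_probs`, `expected_revenue`, and `best_prices` are hypothetical.

```python
import itertools
import math


def choice_probs(prices, base_utils, price_coef):
    """MNL purchase probabilities with an outside (no-purchase) option of utility 0.

    base_utils[k] is a hypothetical guest-specific utility of room type k before price;
    price_coef is a (negative) price sensitivity shared by all room types.
    """
    expu = [math.exp(a + price_coef * p) for a, p in zip(base_utils, prices)]
    denom = 1.0 + sum(expu)  # 1.0 = exp(0) for the no-purchase option
    return [e / denom for e in expu]


def expected_revenue(prices, base_utils, price_coef):
    """Sum over room types of price times purchase probability."""
    probs = choice_probs(prices, base_utils, price_coef)
    return sum(p * q for p, q in zip(prices, probs))


def best_prices(base_utils, price_coef, grid):
    """Exhaustive search over grid^K; only practical for a few room types."""
    best = None
    for prices in itertools.product(grid, repeat=len(base_utils)):
        rev = expected_revenue(prices, base_utils, price_coef)
        if best is None or rev > best[1]:
            best = (prices, rev)
    return best


grid = [100, 150, 200, 250, 300]
prices, rev = best_prices(base_utils=[4.0, 3.0, 2.0], price_coef=-0.02, grid=grid)
print(prices, round(rev, 2))
```

The no-purchase option is what keeps the problem bounded: without it, raising every price would never lose a sale and the "optimal" price would be the top of the grid.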
In addition to the functionality of
Personalized Demand Model
Embodiments consider K types of hotel rooms with K different prices. The outcome variable y, the room type purchased, takes a value in {1, ..., K}. The demand for the hotel rooms can vary with the individual attributes of the hotel customers, their booking channels, and the room category features. Let x denote all of the features affecting the choice of a hotel room. The personalized demand model is then the conditional distribution of the outcome y given x.
One challenging issue is that data is available only for observed purchases of hotel rooms. If the no-purchase cases are ignored and the demand model is based only on the purchased cases, the resulting model is biased and underestimates price sensitivity, because some customers may decide not to purchase when the price exceeds their willingness to pay. To avoid such biases, embodiments model the customer arrival process by dividing a day into small discrete time slices, denoted by t = 1, ..., T, during each of which at most one customer may arrive. The arrival process at time t is modeled as a Bernoulli random variable with arrival probability λ. Given an arrival, a customer is assumed to decide between booking and not booking any hotel room based on the prices. A logistic regression model is used for the booking process given the room prices. For no-purchase (no-booking) time slices, proxy prices can be used, such as the average daily price of each room.
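The three-stage process above (Bernoulli arrival, logistic booking decision, room choice) can be simulated directly. This is a minimal sketch under hypothetical assumptions: fixed prices, the average price as the summary statistic, a logistic booking probability with made-up coefficients `beta0`, `beta1`, and a single-cluster MNL room choice. It also illustrates why the no-purchase process matters: with prices held fixed, the observed booking rate estimates only the product λ·B_t, not λ itself.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate_slots(T, lam, beta0, beta1, prices, room_utils, price_coef, rng):
    """Simulate T time slots of the arrival -> booking -> room-choice process.

    Returns (r_t, b_t, y_t) per slot: r_t is the arrival indicator (latent in real
    data), b_t the booking indicator, and y_t the chosen room type 1..K or None.
    """
    p_tilde = sum(prices) / len(prices)           # summary price statistic (average)
    book_prob = sigmoid(beta0 + beta1 * p_tilde)  # B_t = Pr(b_t = 1 | p~_t, r_t = 1)
    weights = [math.exp(a + price_coef * p) for a, p in zip(room_utils, prices)]
    rooms = range(1, len(prices) + 1)
    slots = []
    for _ in range(T):
        r = 1 if rng.random() < lam else 0         # Bernoulli(lambda) arrival
        b = 1 if r == 1 and rng.random() < book_prob else 0
        y = rng.choices(rooms, weights=weights)[0] if b else None  # MNL room choice
        slots.append((r, b, y))
    return slots


prices = [120.0, 180.0, 260.0]  # hypothetical room prices
rng = random.Random(0)
slots = simulate_slots(T=10000, lam=0.3, beta0=2.0, beta1=-0.02, prices=prices,
                       room_utils=[1.0, 1.5, 2.0], price_coef=-0.01, rng=rng)

# Only b_t (and y_t when b_t = 1) would be observed; with fixed prices the observed
# booking rate estimates lam * B_t, so lam and B_t cannot be separated from it alone.
booking_rate = sum(b for _, b, _ in slots) / len(slots)
expected_rate = 0.3 * sigmoid(2.0 - 0.02 * sum(prices) / len(prices))
print(round(booking_rate, 4), round(expected_rate, 4))
```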
Having arrived and decided to book, a guest chooses one of the K room types according to the guest's own preferences under the prevailing conditions. For example, demand depends on guest attributes such as loyalty status, profile preferences, and ancillary services, and on external attributes such as local events, holidays, and weather. To model such personalized demand, embodiments first segment the guests into G clusters (204 of
where $B_t = \Pr(b_t = 1 \mid \tilde{p}_t, r_t = 1)$ and $\tilde{p}_t$ denotes a summary statistic of the K room prices at time t, such as the average, minimum, or maximum; $p_{tk}$ is the price of room type k at time t; and $\pi_g(x_t, p_t) = \Pr(z_t = g \mid x_t, p_t)$ denotes the probability of belonging to cluster g given $x_t$ and $p_t = (p_{t1}, \ldots, p_{tK})'$, where $z_t$ is the cluster indicator for a customer who purchased at time t.
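Read together, these definitions describe a three-stage model. As a sketch only, assuming the booking probability is logistic in $\tilde{p}_t$ with coefficients $(\beta_0, \beta_1)$ and that cluster g's room choice is an MNL with coefficient vector $\delta_{kg}$ (including an intercept) and price coefficient $\gamma_{kg}$, one form consistent with the symbols above is the following; the exact functional form used in embodiments may differ.

```latex
\begin{aligned}
\Pr(r_t = 1) &= \lambda, \qquad
B_t = \Pr(b_t = 1 \mid \tilde{p}_t, r_t = 1)
    = \frac{1}{1 + \exp\{-(\beta_0 + \beta_1 \tilde{p}_t)\}},\\
\Pr(y_t = k,\ b_t = 1 \mid x_t, p_t) &= \lambda\, B_t \sum_{g=1}^{G} \pi_g(x_t, p_t)\,
  \frac{\exp(\delta_{kg}^{\top} x_t + \gamma_{kg}\, p_{tk})}
       {\sum_{l=1}^{K} \exp(\delta_{lg}^{\top} x_t + \gamma_{lg}\, p_{tl})},
  \qquad k = 1, \ldots, K,\\
\Pr(b_t = 0 \mid \tilde{p}_t) &= 1 - \lambda B_t .
\end{aligned}
```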
Clustering is the process of partitioning data into subgroups so that the data points in each group are more similar to each other, according to some distance measure, than to points in other groups. Random forest clustering uses an algorithm that generates a proximity matrix giving a rough estimate of the distance between samples. Alternative clustering methods can be used in other embodiments.
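Random forest proximity counts, for each pair of samples, the fraction of trees in which both samples land in the same terminal node; 1 − proximity then serves as a distance for a clustering routine. A minimal sketch, assuming the terminal-node index of every sample in every tree is already available (the `leaves` table below is hypothetical; scikit-learn's `RandomForestClassifier.apply` returns such indices as an (n_samples, n_trees) array, i.e., the transpose of the layout used here):

```python
def proximity_matrix(leaves):
    """leaves[t][i] is the terminal-node id of sample i in tree t.

    Returns an n x n matrix whose (i, j) entry is the fraction of trees
    in which samples i and j share a terminal node.
    """
    n_trees = len(leaves)
    n = len(leaves[0])
    prox = [[0.0] * n for _ in range(n)]
    for tree in leaves:
        for i in range(n):
            for j in range(n):
                if tree[i] == tree[j]:
                    prox[i][j] += 1.0
    return [[v / n_trees for v in row] for row in prox]


def distance_matrix(prox):
    """Convert proximities into dissimilarities usable by a clustering routine."""
    return [[1.0 - v for v in row] for row in prox]


# Hypothetical leaf assignments for 4 guests across 3 trees.
leaves = [
    [0, 0, 1, 1],
    [2, 2, 2, 3],
    [5, 4, 4, 4],
]
P = proximity_matrix(leaves)
D = distance_matrix(P)
print(P)
```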
When analyzing data, it is generally assumed that each observation comes from one specific distribution. In practice, however, assuming that every sample comes from the same distribution might be too restrictive, because data are often complicated; for example, the data might be skewed or multimodal. Therefore, in embodiments, mixture models are used to describe such complicated probabilistic behavior. A mixture model assumes that each observation is generated from one of G mixture components and that, within each component, the observation follows a specific distribution. In embodiments, the quantity of interest is the demand for different room types, which is a categorical variable and is modeled as a mixture of multinomial logistic (MNL) regression models.
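A minimal sketch of how a mixture of MNL models combines cluster membership probabilities π_g with per-cluster choice probabilities, assuming each cluster's utility for room type k is a linear index (intercept + δ_kg·x + γ_kg·p_k). All coefficients, features, and prices below are hypothetical.

```python
import math


def mnl_probs(utils):
    """Softmax over room-type utilities (shifted by the max for numerical stability)."""
    m = max(utils)
    exps = [math.exp(u - m) for u in utils]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_mnl_probs(x, prices, cluster_probs, delta, gamma):
    """Pr(y = k | x, p) = sum_g pi_g(x, p) * MNL_g(k | x, p).

    cluster_probs[g]: pi_g(x, p), assumed to sum to 1.
    delta[g][k]: [intercept, coef_1, ..., coef_d] for room type k in cluster g.
    gamma[g][k]: price coefficient for room type k in cluster g.
    """
    K = len(prices)
    mix = [0.0] * K
    for g, pi_g in enumerate(cluster_probs):
        utils = [
            delta[g][k][0]
            + sum(d * xi for d, xi in zip(delta[g][k][1:], x))
            + gamma[g][k] * prices[k]
            for k in range(K)
        ]
        for k, prob in enumerate(mnl_probs(utils)):
            mix[k] += pi_g * prob
    return mix


# Hypothetical example: G = 2 clusters, K = 3 room types, x = [loyalty flag, weekend flag].
prices = [120.0, 180.0, 260.0]
delta = [
    [[0.0, 0.0, 0.0], [0.5, 0.3, 0.0], [1.0, 0.2, 0.1]],  # a price-sensitive cluster
    [[0.0, 0.0, 0.0], [0.8, 0.5, 0.2], [2.0, 0.4, 0.3]],  # a less price-sensitive cluster
]
gamma = [[-0.02] * 3, [-0.005] * 3]
probs = mixture_mnl_probs([1.0, 0.0], prices, [0.6, 0.4], delta, gamma)
print([round(q, 3) for q in probs])
```

Passing a single cluster with probability 1.0 collapses the mixture to an ordinary MNL, which mirrors the G = 1 special case.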
Specifically, for each time slot t with no booking, denoted by the indicator variable $b_t = 0$, it is not known whether the arrival indicator variable $r_t$ is 1 or 0. Because $r_t$ is a latent variable for those time slots, embodiments use the EM algorithm to estimate the model parameters. The EM algorithm is an iterative method for finding maximum likelihood estimates of the parameters of statistical models with latent variables, that is, the parameter values under which the model most closely fits the observed variables.
Model Estimation of the Personalized Model
Embodiments perform model estimation of the personalized model (shown in
In connection with the EM algorithm, it is helpful to first consider the complete-data likelihood function when all the variables $\{y_t, b_t, z_t : t = 1, \ldots, T\}$ are observed, which is given by:
Then, consider the conditional expected log-likelihood function given the observed data $D = \{y_t, b_t : t = 1, \ldots, T,\ b_t = 1\}$, denoted by $Q(\theta)$.
The maximizer is found by implementing the EM algorithm as follows. At the t-th iteration (E-step), given the t-th updated parameters, embodiments compute:
where $\sum_{g=1}^{G} \tilde{\pi}_g(x_t, p_{tk}, y_t) = 1$ and $\Pr(r_t = 0 \mid D) = 1 - \alpha_t$.
(M-step) Obtain the $(t+1)$-th updated parameters as follows: compute
and update $(\beta_0^{(t+1)}, \beta_1^{(t+1)})$ by solving the following equation with respect to $(\beta_0, \beta_1)$.
To update $(\delta_{kg}^{(t+1)}, \gamma_{kg}^{(t+1)})$, solve the equation with respect to $(\delta_{kg}, \gamma_{kg})$.
Then, repeat the E-step and the M-step until a convergence criterion is met.
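A simplified sketch of the E-step/M-step loop, restricted to the arrival and booking parameters and assuming a constant arrival probability λ and B_t = logistic(β0 + β1·p̃_t); the cluster-membership and room-choice updates of the full M-step are omitted. For a slot with b_t = 0 the E-step computes α_t = Pr(r_t = 1 | b_t = 0) = λ(1 − B_t) / (λ(1 − B_t) + 1 − λ), while a slot with a booking has r_t = 1 for certain. The M-step sets λ to the mean expected arrival and refits (β0, β1) by Newton's method on a logistic likelihood weighted by the expected arrivals. The synthetic data, starting values, and function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def observed_loglik(b, p, lam, beta0, beta1):
    """Observed-data log-likelihood: Pr(b_t = 1) = lam * B_t, Pr(b_t = 0) = 1 - lam * B_t."""
    ll = 0.0
    for bt, pt in zip(b, p):
        q = lam * sigmoid(beta0 + beta1 * pt)
        ll += math.log(q) if bt else math.log(1.0 - q)
    return ll


def weighted_logistic(b, p, w, beta0, beta1, newton_steps=10):
    """Maximize sum_t w_t [b_t log B_t + (1 - b_t) log(1 - B_t)] by Newton's method."""
    for _ in range(newton_steps):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for bt, pt, wt in zip(b, p, w):
            s = sigmoid(beta0 + beta1 * pt)
            g0 += wt * (bt - s)
            g1 += wt * (bt - s) * pt
            c = wt * s * (1.0 - s)
            h00 += c
            h01 += c * pt
            h11 += c * pt * pt
        det = h00 * h11 - h01 * h01
        # Newton step: beta += H^{-1} g, with H the 2x2 weighted information matrix.
        beta0, beta1 = (beta0 + (h11 * g0 - h01 * g1) / det,
                        beta1 + (h00 * g1 - h01 * g0) / det)
    return beta0, beta1


def em_arrival_booking(b, p, lam=0.5, beta0=0.0, beta1=0.0, iters=30):
    history = []
    for _ in range(iters):
        # E-step: alpha_t = E(r_t | data); a booking implies an arrival.
        w = []
        for bt, pt in zip(b, p):
            if bt:
                w.append(1.0)
            else:
                miss = lam * (1.0 - sigmoid(beta0 + beta1 * pt))
                w.append(miss / (miss + 1.0 - lam))
        # M-step: closed-form lambda, then a weighted logistic fit for (beta0, beta1).
        lam = sum(w) / len(w)
        beta0, beta1 = weighted_logistic(b, p, w, beta0, beta1)
        history.append(observed_loglik(b, p, lam, beta0, beta1))
    return lam, beta0, beta1, history


# Hypothetical synthetic data: p holds the summary price (in hundreds) for each slot.
rng = random.Random(1)
p = [rng.uniform(0.8, 3.0) for _ in range(2000)]
b = [1 if rng.random() < 0.4 and rng.random() < sigmoid(3.0 - 2.0 * pt) else 0
     for pt in p]
lam_hat, b0_hat, b1_hat, history = em_arrival_booking(b, p)
print(round(lam_hat, 3), round(b0_hat, 3), round(b1_hat, 3))
```

A useful sanity check for any EM implementation is that the observed-data log-likelihood in `history` never decreases from one iteration to the next.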
This estimation method implicitly assumes that the number of clusters G is known. Since G is unknown in practice, the best G is chosen for the given data. In one embodiment, 10-fold cross-validation is used and G is chosen to minimize the misclassification rate. The Bayesian information criterion (“BIC”) can also be used. If G = 1 is selected, then the proposed personalized demand function based on the mixture MNL model reduces to the classical MNL model commonly used in practice. In other words, the classical MNL model is a special case of the above model.
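A minimal sketch of the BIC alternative for choosing G, assuming the maximized log-likelihood and the number of free parameters have already been obtained for each candidate G. BIC = −2 log L + k ln n, and the smallest value wins. The fitted values below are hypothetical placeholders, not results from the disclosure.

```python
import math


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: -2 log L + k log n (lower is better)."""
    return -2.0 * loglik + n_params * math.log(n_obs)


def select_num_clusters(fits, n_obs):
    """fits maps G -> (maximized log-likelihood, number of free parameters)."""
    scores = {G: bic(ll, k, n_obs) for G, (ll, k) in fits.items()}
    best_G = min(scores, key=scores.get)
    return best_G, scores


# Hypothetical fitted values for G = 1, 2, 3.
fits = {1: (-2450.0, 12), 2: (-2380.0, 27), 3: (-2372.0, 42)}
best_G, scores = select_num_clusters(fits, n_obs=1000)
print(best_G, {G: round(s, 1) for G, s in scores.items()})
```

Here G = 3 fits slightly better than G = 2 but pays a larger ln n penalty for its extra parameters, so BIC prefers G = 2.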
Variable Selection
Further in connection with 208 and the variable selection,
Embodiments specify κ, a lasso penalty tuning parameter that enables selection of the best model. Note that the E-step is the same as the E-step disclosed above, because the penalized log-likelihood function is the conditional expected log-likelihood function with a penalty term in $|\delta_{kjg}| + |\gamma_{kg}|$ added, and that term involves only parameters, not latent variables. The M-step for $(\delta_{kjg}, \gamma_{kg})$ needs to be modified due to the penalty function. After completing the E-step, a maximizer of the objective function in
The Newton algorithm to find the maximizer under the multinomial logistic regression can be tedious, because of the vector nature of the response observations. To avoid these numerical complexities, embodiments use the coordinate descent algorithm disclosed in Friedman, J. et al., “Regularization paths for generalized linear models via coordinate descent”, Journal of Statistical Software, 33(1), 1 (2010), herein incorporated by reference.
Embodiments perform partial Newton steps by forming a partial quadratic approximation to the log-likelihood function of $(\delta_{kjg}, \gamma_{kg})$ defined above, allowing only $(\delta_{kjg}, \gamma_{kg})$ to vary for a single class at a time, for each k and g. The partial quadratic approximation can be shown to be given by
where B is the number of the booking observations, C(·) is a constant function, and
In summary, embodiments compute the $(t+1)$-th updates of $(\delta_{kjg}, \gamma_{kg})$ for $k = 1, \ldots, K$ and $g = 1, \ldots, G$ in the M-step as follows: obtain the estimates of $(\delta_{kjg}, \gamma_{kg})$ by repeating nested loops; that is, for the m-th iteration and $g = 1, \ldots, G$, repeat the following iteration.
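where $\tilde{z}_{tkg}^{(m+1)} = \delta_{k0g}^{(m+1)} + \sum_{l<j} x_{tl}\,\delta_{klg}^{(m+1)} + \sum_{l>j} x_{tl}\,\delta_{klg}^{(m)}$ and $S(z, \gamma)$ is the soft-thresholding operator, with value $S(z, \gamma) = \operatorname{sign}(z)\,(|z| - \gamma)_{+}$.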
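As a minimal sketch of the coordinate-wise lasso update in the style of Friedman et al. (2010), the code below applies the soft-thresholding operator to a plain weighted least-squares problem with an unpenalized intercept, which is the kind of quadratic subproblem a partial quadratic approximation produces. In the M-step, z would play the role of the working response and w the weights from that approximation, with the update run for each class k and cluster g; that multinomial bookkeeping is omitted here, and the data and penalty values are hypothetical.

```python
def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def weighted_lasso_cd(X, z, w, kappa, sweeps=200):
    """Coordinate descent for
        0.5 * sum_t w_t * (z_t - b0 - sum_j X[t][j] * beta_j)**2 + kappa * sum_j |beta_j|,
    with an unpenalized intercept b0. Returns (b0, beta).
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(sweeps):
        # Intercept: weighted mean of the residual given the current beta.
        b0 = sum(w[t] * (z[t] - sum(X[t][j] * beta[j] for j in range(p)))
                 for t in range(n)) / sum(w)
        for j in range(p):
            num = den = 0.0
            for t in range(n):
                # Fit without feature j, using the newest values of the other coefficients.
                fit_wo_j = b0 + sum(X[t][l] * beta[l] for l in range(p) if l != j)
                num += w[t] * X[t][j] * (z[t] - fit_wo_j)
                den += w[t] * X[t][j] ** 2
            beta[j] = soft_threshold(num, kappa) / den
    return b0, beta


# Hypothetical data: z depends on the first feature only; the second is irrelevant.
X = [[i / 10, (i * 7) % 5 - 2] for i in range(20)]
z = [1.0 + 2.0 * row[0] for row in X]
w = [1.0] * len(X)
for kappa in (0.0, 1.0, 1e6):
    b0, beta = weighted_lasso_cd(X, z, w, kappa)
    print(kappa, round(b0, 4), [round(c, 4) for c in beta])
```

With no penalty the fit recovers the generating coefficients; as κ grows, coefficients shrink toward zero, and a large enough κ zeroes them all, leaving only the intercept.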
The following table describes each variable and parameter in the model:
In a regression structure as in the model described above in conjunction with
Moreover, a large number of variables makes the model complicated. The model in embodiments has 1 (arrival process) + 2 (booking process) + (G−1)*(K−1)*(p+2) parameters, where p is the number of explanatory variables excluding the price. If there are 4 different room types and 3 clusters, then the number of parameters to be estimated is 1+2+2*3*(p+2), which increases with p. As the number of parameters increases, the model complexity also increases, and the prediction accuracy of the more complex model can degrade. Therefore, embodiments choose a simpler model by removing insignificant variables, according to the parsimony principle.
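Substituting the example values into the parameter count stated above makes the growth in p explicit (the values of p below are arbitrary illustrations):

```latex
1 + 2 + (G-1)(K-1)(p+2)\,\Big|_{G=3,\;K=4} = 3 + 2 \cdot 3\,(p+2) = 6p + 15,
\qquad \text{e.g. } p = 10 \Rightarrow 75, \quad p = 20 \Rightarrow 135 .
```

Under this count, each additional explanatory variable adds (G−1)(K−1) = 6 parameters.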
In connection with 212, the following pricing policy algorithm can be used to determine personalized pricing:
The personalized demand model (e.g.,
As an example of using the generated model to predict the probability that a particular guest selects a certain room category, consider the following experimental dataset: (1) a downtown hotel in Sydney, Australia; (2) 2 years of booking data, from January 2012 to January 2014; (3) three room types, ordered by price as Suite > Deluxe > Superior; (4) two room features: City View and Water View; (5) total number of reservations: 2,503; (6) average booking lead time: 10.29 days; (7) average length of stay: 1.84 days.
Using the above dataset, the best model had G = 2 clusters, which yielded the lowest BIC. A single MNL model, which does not account for the no-purchase case or clustering, was used as a benchmark. 70% of the data was used for training and 30% for testing. The following performance measure was used:
The following is the preference order of the room types (ordered by price as Suite > Deluxe > Superior): (1) Deluxe with City View; (2) Deluxe with Water View; (3) Suite with City View; (4) Suite with Water View; (5) Superior with City View; (6) Superior with Water View.
As disclosed, embodiments provide personalized demand modeling for the hotel rooms based on the guest attributes. Embodiments use machine learning to cluster reservations based on guest attributes, travel attributes, and external factors prior to applying the demand choice-based model to estimate the price elasticity and willingness-to-pay of each guest cluster for different room features.
Embodiments assume that there are several clusters of guests and fit a multinomial choice model for each cluster. When the clustering mechanisms are unobservable, embodiments use a combination of soft clustering and the EM algorithm as the estimation method. Based on the clustered mixture-type choice model, embodiments define an expected revenue and solve the optimization problem to determine the optimal price, which maximizes the expected revenue, for each room type and each guest.
Several embodiments are specifically illustrated and/or described herein. However, it will be appreciated that modifications and variations of the disclosed embodiments are covered by the above teachings and within the purview of the appended claims without departing from the spirit and intended scope of the invention.
This application claims priority of U.S. Provisional Patent Application Ser. No. 62/923,779, filed on Oct. 21, 2019, the disclosure of which is hereby incorporated by reference.
Number | Date | Country
---|---|---
62923779 | Oct 2019 | US