The present disclosure relates to a prediction device and a prediction method that predict a flow of a shopper.
PTL 1 discloses a customer simulator system that calculates a probability of a customer staying at each of a plurality of shelves in a shop, based on a probability of a customer staying in the shop, a staying time of a customer in the shop, distances among the shelves in the shop, and other information. With this, it is possible to calculate a customer unit price after a layout of goods on the shelves is changed, and it is thus possible to predict the sales after the layout change.
PTL 1: Japanese Patent No. 5905124
The present disclosure provides a prediction device and a prediction method that predict a flow of a shopper after a change of goods layout.
A prediction device of the present disclosure is a prediction device that predicts a flow of a person after a layout change of goods in a region, and the prediction device includes: an obtaining unit that obtains traffic line information representing flows of a plurality of persons in the region, layout information representing layout positions of the goods, and change information representing a layout change of the goods; and a controller that generates an action model of a person in the region, by an inverse reinforcement learning method, based on the traffic line information and the layout information and that predicts a flow of a person after the layout change of the goods, based on the action model and the change information.
A prediction method of the present disclosure is a prediction method for predicting a flow of a person after a layout change of goods in a region, and the prediction method includes: a step of obtaining traffic line information representing flows of a plurality of persons in the region, layout information representing layout positions of the goods, and change information representing a layout change of the goods; a step of generating an action model of a person in the region by an inverse reinforcement learning method, based on the traffic line information and the layout information; and a step of predicting a flow of a person after the layout change of the goods, based on the action model and the change information.
The prediction device and the prediction method of the present disclosure enable prediction of a flow of a shopper after a change of goods layout with a high degree of accuracy.
Hereinafter, exemplary embodiments will be described in detail with appropriate reference to the drawings. However, an unnecessarily detailed description will not be given in some cases. For example, a detailed description of a well-known matter and a duplicated description of substantially the same configuration will be omitted in some cases. This is to prevent the following description from becoming unnecessarily redundant and thus to help those skilled in the art easily understand the description.
Note that the inventors provide the accompanying drawings and the following description to help those skilled in the art to sufficiently understand the present disclosure, but do not intend to use the drawings or the description to limit the subject matters of the claims.
The inventors considered that because a change of goods layout in a shop changes actions of shoppers, it is necessary to consider changes in the shoppers' actions associated with the layout change in order to optimize the layout of the goods with a high degree of accuracy. However, in PTL 1, the action of a shopper is simulated based on the condition that the probability of the shopper moving to a shelf of a plurality of shelves is higher when the moving distance to the shelf is shorter.
However, the shelf that a shopper visits depends on the shopper's purpose of purchase. Therefore, a shopper does not always follow the shortest movement path when shopping. Consequently, if the simulation is performed on the condition that, of a plurality of shelves, the shopper moves with a higher probability to the shelf reachable with a smaller moving distance, the flow of the shopper cannot be simulated with a high degree of accuracy.
In view of the above issue, the present disclosure provides a prediction device that enables accurate prediction of a flow of a shopper after a change of goods layout. Specifically, a prediction device of the present disclosure predicts the flow of a shopper after a change of goods layout, on the basis of an actual goods layout (shop layout) and actual traffic lines of shoppers by an inverse reinforcement learning method.
Hereinafter, a prediction device of the present disclosure will be described in detail.
Communication unit 10 includes an interface circuit used for communication with an external device based on a predetermined communication standard, for example, a local area network (LAN), WiFi, Bluetooth (registered trademark), and a universal serial bus (USB). Communication unit 10 obtains goods-layout information 21, traffic line information 22, and purchased goods information 23.
Goods-layout information 21 is information representing actual layout positions of goods. Goods-layout information 21 includes, for example, identification numbers (ID) of goods and identification numbers (ID) of shelves on which the goods are disposed.
Traffic line information 22 is information representing flows of shoppers in a shop. Traffic line information 22 is generated from a video of a camera installed in the shop or other information.
Traffic line information 22 represents flows of shoppers by, for example, the identification numbers s1 to s26 of the areas (aisles) that the shoppers have passed through.
Purchased goods information 23 is information representing the goods that a shopper purchased in the shop. Purchased goods information 23 is obtained from a point of sales (POS) terminal device or the like in the shop.
Storage 20 stores goods-layout information 21, traffic line information 22, and purchased goods information 23 obtained through communication unit 10 and action model information 24 generated by controller 40. Storage 20 is implemented by, for example, a hard disk drive (HDD), a solid state drive (SSD), a random access memory (RAM), a dynamic random access memory (DRAM), a ferroelectric memory, a flash memory, a magnetic disk, or a combination of these storage devices.
Operation unit 30 receives an input to prediction device 1 by a user. Operation unit 30 is configured with a keyboard, a mouse, a touch panel, and other devices. Operation unit 30 obtains goods-layout change information 25.
Goods-layout change information 25 represents goods whose positions or layout will be changed, and represents places of the goods after the layout change. Specifically, goods-layout change information 25 includes, for example, identification numbers (ID) of goods whose positions or layout will be changed and identification numbers (ID) of the shelves after the layout change.
Controller 40 includes: first characteristic vector generator 41 that generates from goods-layout information 21 a characteristic vector (area characteristic information) f(s) representing a characteristic of each of areas s1 to s26 in the shop; and model generator 42 that generates an action model of a shopper on the basis of traffic line information 22 and purchased goods information 23.
The characteristic vector f(s) includes at least information representing an item of purchasable goods in each of areas s1 to s26. Note that the characteristic vector f(s) may include, in addition to the information representing purchasable goods in the areas, information representing distances from the areas to goods shelves, an entrance and exit, or a cash desk and may include information representing planar dimensions of the areas and other information.
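As an illustrative sketch only (the category names, distances, and helper function are assumptions that do not appear in the present disclosure), the characteristic vector f(s) of an area could be assembled from goods-layout information 21 roughly as follows:

```python
# Minimal sketch (not from the disclosure): one way first characteristic vector
# generator 41 could build f(s) for each area from goods-layout information 21.
from typing import Dict, List

GOODS_CATEGORIES = ["vegetables", "meat", "dairy", "snacks", "drinks"]  # assumed example categories

def build_feature_vector(area_id: str,
                         goods_in_area: Dict[str, List[str]],
                         distance_to_exit: Dict[str, float]) -> List[float]:
    """Return f(s): a binary indicator per purchasable goods category in the
    area, optionally followed by the distance from the area to the exit."""
    categories_here = set(goods_in_area.get(area_id, []))
    features = [1.0 if c in categories_here else 0.0 for c in GOODS_CATEGORIES]
    features.append(distance_to_exit.get(area_id, 0.0))  # optional extra feature
    return features

# Toy usage: area s12 sells dairy and is 8 m from the exit.
goods_layout = {"s12": ["dairy"], "s01": ["vegetables", "drinks"]}
distances = {"s12": 8.0, "s01": 20.0}
print(build_feature_vector("s12", goods_layout, distances))  # [0.0, 0.0, 1.0, 0.0, 0.0, 8.0]
```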
Model generator 42 includes traffic line information divider 42a and reward function learning unit 42b. Traffic line information divider 42a divides traffic line information 22 on the basis of purchased goods information 23. Reward function learning unit 42b learns reward r(s) on the basis of the characteristic vector f(s) and divided traffic line information 22.
An “action model of a shopper” corresponds to a reward function expressed by following Equation (1).
r(s)=ϕ(f(s)) Equation (1)
In Equation (1), the reward r(s) is expressed as a mapping ϕ(f(s)) of the characteristic vector f(s). Reward function learning unit 42b obtains action model information 24 of a shopper by learning the reward r(s) from a plurality of series of data each representing a traffic line of a shopper, in other words, a series of area transitions. Action model information 24 is the function (mapping) ϕ in Equation (1).
Controller 40 further includes second characteristic vector generator 44 and traffic line prediction unit 45.
Controller 40 also includes goods-layout information corrector 43 that corrects goods-layout information 21 on the basis of goods-layout change information 25 input via operation unit 30. Second characteristic vector generator 44 generates, on the basis of corrected goods-layout information 21, a characteristic vector F(s) representing the characteristic of each area in the shop after the goods layout is changed. Traffic line prediction unit 45 predicts a traffic line (flow) of a shopper after the change of goods layout, on the basis of the characteristic vector F(s) and action model information 24. Note that, instead of correcting the actual goods-layout information 21 on the basis of goods-layout change information 25, goods-layout information corrector 43 may newly generate goods-layout information 21 reflecting the layout change.
Controller 40 can be implemented by a semiconductor device or other devices. Functions of controller 40 may be configured with only hardware or may be achieved by a combination of hardware and software. Controller 40 can be configured with, for example, a microcomputer, a central processing unit (CPU), a microprocessor unit (MPU), a digital signal processor (DSP), a field-programmable gate array (FPGA), or an application-specific integrated circuit (ASIC).
Display 50 displays, for example, the predicted traffic line or a result of an action. Display 50 is configured with a liquid crystal display, an organic electroluminescence (EL) display, or other devices.
Communication unit 10 and operation unit 30 correspond to an obtaining unit that obtains information from outside. Controller 40 corresponds to an obtaining unit that obtains information stored in storage 20. Further, communication unit 10 corresponds to an output unit that outputs a prediction result to outside. Controller 40 corresponds to an output unit that outputs a prediction result to storage 20. Display 50 corresponds to an output unit that outputs a prediction result on a screen.
First, a description will be given on how to generate the action model of a shopper. The action model of a shopper is generated by an inverse reinforcement learning method. The inverse reinforcement learning method is for estimating a “reward” from a “state” and an “action”.
In the present exemplary embodiment, the “state” indicates that a shopper is in a specific one of the areas obtained by discretely dividing the inside of the shop. Further, a shopper moves from one area to another (transitions between states) according to the “action”. The “reward” is an imaginary numerical quantity for describing a traffic line of a shopper, and a shopper is assumed to repeat the “action” that maximizes the total sum of “rewards”, each of which is obtained every time the shopper makes one state transition. In other words, an imaginary “reward” is assigned to each area, and the “rewards” are estimated by the inverse reinforcement learning method in such a manner that the series of “actions” (series of state transitions) in which the sum of the “rewards” is large coincides with the traffic lines through which shoppers frequently go. As a result, an area whose “reward” is high mostly coincides with an area where shoppers often stay or pass through.
With reference to
With reference to
Here, traffic line information 22 and purchased goods information 23 are associated with each other by, for example, the identification numbers G1 to Gm of the respective shoppers. For example, because the time when a shopper is at a cash desk coincides with the time when the input of the purchased goods is completed at the cash desk, controller 40 may associate traffic line information 22 with purchased goods information 23 on the basis of the date and time contained in traffic line information 22 and the date and time contained in purchased goods information 23. Further, controller 40 may obtain, via communication unit 10, traffic line information 22 and purchased goods information 23 that are already associated with each other by, for example, the identification numbers of shoppers, and controller 40 may store obtained traffic line information 22 and purchased goods information 23 into storage 20.
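For illustration, the association by date and time could be sketched as follows; the field names, tolerance, and data are hypothetical and are not taken from the present disclosure.

```python
# Minimal sketch (assumption): associate each traffic line with the POS receipt
# whose completion time is closest to the time the shopper reached the cash desk.
from datetime import datetime, timedelta

traffic_lines = [
    {"shopper_id": "G1", "areas": ["s1", "s6", "s9", "s26"],
     "cash_desk_time": datetime(2017, 1, 13, 10, 15)},
]
pos_receipts = [
    {"receipt_id": "R100", "items": ["X1", "X7"],
     "completed_at": datetime(2017, 1, 13, 10, 16)},
    {"receipt_id": "R101", "items": ["X3"],
     "completed_at": datetime(2017, 1, 13, 11, 2)},
]

def associate(traffic_lines, pos_receipts, tolerance=timedelta(minutes=2)):
    """Pair each traffic line with the receipt closest in time, within a tolerance."""
    pairs = []
    for line in traffic_lines:
        best = min(pos_receipts,
                   key=lambda r: abs(r["completed_at"] - line["cash_desk_time"]))
        if abs(best["completed_at"] - line["cash_desk_time"]) <= tolerance:
            pairs.append((line["shopper_id"], best["receipt_id"], best["items"]))
    return pairs

print(associate(traffic_lines, pos_receipts))  # [('G1', 'R100', ['X1', 'X7'])]
```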
With reference to
With reference to
Specifically, for example, as shown in
With reference to
Specifically, reward function learning unit 42b learns the reward function of each state s expressed by Equation (1), by using the characteristic vector f(s) generated in step S102 and by using, as learning data, a plurality of pieces of traffic line data corresponding to the purchasing stages m1 and m2. In this learning, the mapping ϕ is obtained in such a manner that the probability of passing through (or staying in) each area calculated from the reward r(s) estimated by the mapping ϕ most closely coincides with the probability of passing through (or staying in) each area obtained from the learning data.
As a method for obtaining such a mapping ϕ, it is possible to use a method in which updating is repeatedly performed by using a gradient method, or a method of learning with a neural network. Note that, as a method of obtaining the probability of passing through (or staying in) each area from the reward r(s), a method based on a reinforcement learning method can be used; a specific method will be described later in [2.3 Traffic line prediction after change of goods layout].
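As one illustrative sketch only (not taken from the present disclosure), the gradient-based update of the mapping ϕ can be pictured as follows. The sketch assumes a linear mapping r(s)=w·f(s) and, for brevity, approximates the visitation probability of each area with a simple softmax over the rewards instead of the full reinforcement-learning computation; the function and variable names are hypothetical.

```python
# Simplified sketch of reward function learning unit 42b (assumptions noted above):
# feature-matching gradient updates for a linear reward r(s) = w . f(s).
import numpy as np

def learn_reward_weights(feature_vectors, traffic_lines, lr=0.1, iters=200):
    """feature_vectors: dict area -> f(s) as np.array; traffic_lines: list of
    area sequences taken from traffic line information 22."""
    areas = sorted(feature_vectors)
    F = np.stack([feature_vectors[a] for a in areas])            # |S| x d
    # Empirical visitation frequency of each area in the learning data.
    counts = np.zeros(len(areas))
    for line in traffic_lines:
        for a in line:
            counts[areas.index(a)] += 1
    p_data = counts / counts.sum()

    w = np.zeros(F.shape[1])
    for _ in range(iters):
        r = F @ w                                                # r(s) = w . f(s)
        p_model = np.exp(r - r.max()); p_model /= p_model.sum()  # crude stand-in
        grad = F.T @ (p_data - p_model)                          # feature matching
        w += lr * grad
    return dict(zip(areas, F @ w)), w

# Toy usage with two areas and two-dimensional features.
fv = {"s1": np.array([1.0, 0.0]), "s2": np.array([0.0, 1.0])}
lines = [["s1", "s2", "s2"], ["s2", "s2"]]
rewards, w = learn_reward_weights(fv, lines)
print(rewards)   # s2 receives the higher reward because it is visited more often
```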
With reference to
2.3. Traffic Line Prediction after Change of Goods Layout
Next, a description will be given on prediction of a traffic line of a shopper in the case that a goods layout is changed. The traffic line of a shopper when a goods layout is changed is obtained by a reinforcement learning method. The reinforcement learning method estimates the “action” from the “state” and the “reward”.
Further, with reference to
R(s)=ϕ(F(s)) Equation (2)
The function (mapping) ϕ in Equation (2) is action model information 24 stored in storage 20 in step S108 in
In order to predict the traffic lines of a shopper with respect to the purchasing stage m1 shown in
With reference to
Uπ(si)=R(si)+γR(si+1)+γ²R(si+2)+ . . . +γⁿR(si+n) Equation (3)
Here, γ is a coefficient for temporally discounting a future reward.
Next, traffic line prediction unit 45 calculates, for each action a, an expectation ΣT(s, a, s′)Uπ(s′) of the total sum of the rewards expected to be obtained when possible actions in the state s are taken (step S303). Traffic line prediction unit 45 updates the strategy π(s) with the action a, with which one of expectations ΣT(s, a, s′)Uπ(s′) calculated for the respective possible actions a is the largest, as the new strategy π(s) for the state s, and traffic line prediction unit 45 updates the expected reward sum Uπ(s) (step S304).
Specifically, in steps S303 and S304, traffic line prediction unit 45 updates the optimum strategy π(s) and the expected reward sum Uπ(s) of each area by Equations (4) and (5) shown below on the basis of the reward R(s) of each area (state s).
T(s, a, s′) represents a probability that the state transitions to the state s′ when an action a is taken in the state s.
In the present exemplary embodiment, the state s represents the area, and the action a represents a traveling direction between areas. Therefore, when the state s (area) and the action a (traveling direction) are determined, the next state s′ (area) is automatically determined uniquely; therefore, T(s, a, s′) can be determined on the basis of the layout of the area in the shop.
Therefore, if the area adjacent to the area corresponding to the state s in the direction corresponding to an action a corresponds to the state s′, an equation T(s, a, s′)=1 may hold; and an equation T(s, a, s″)=0 may hold for the states s″ corresponding to the other areas.
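For illustration only, the deterministic transition probability T(s, a, s′) described above could be derived from an adjacency table of the areas as in the following sketch; the adjacency table and direction names are assumptions that merely mimic the neighborhood of the area s16 used in the example further below.

```python
# Sketch under an assumed grid-like layout: an area plus a traveling direction
# determines the next area uniquely, so T(s, a, s') is 1 for the adjacent area
# in that direction and 0 otherwise.
NEIGHBORS = {                       # area -> {action (direction): next area}
    "s16": {"up": "s13", "left": "s15", "right": "s17", "down": "s20"},
    "s15": {"right": "s16"},
}

def transition_prob(s, a, s_next):
    """T(s, a, s'): deterministic transition derived from the area layout."""
    return 1.0 if NEIGHBORS.get(s, {}).get(a) == s_next else 0.0

print(transition_prob("s16", "down", "s20"))  # 1.0
print(transition_prob("s16", "down", "s13"))  # 0.0
```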
Traffic line prediction unit 45 determines whether the strategy π(s) and the expected reward sum Uπ(s) have been determined for all of the states s (step S305). The determination here means that the strategy π(s) and the expected reward sum Uπ(s) have converged for all of the states s. Until the strategy π(s) and the expected reward sum Uπ(s) are determined for all of the states s, step S303 and step S304 are repeated. That is, in Equations (4) and (5), by updating π(s) with the action a, which maximizes the expectation ΣT(s, a, s′)Uπ(s′), as the new strategy and by simultaneously updating Uπ(s), the optimum strategy π(s) and the expected reward sum Uπ(s) can finally be obtained.
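Because Equations (4) and (5) themselves are not reproduced here, the following sketch assumes they are the standard updates implied by Equation (3): the strategy π(s) is updated with the action a maximizing ΣT(s, a, s′)Uπ(s′), and the expected reward sum Uπ(s) is refreshed with the reward R(s) plus the discounted expectation. All names and the toy layout below are illustrative assumptions.

```python
# Hedged sketch of steps S303-S305: iterate the strategy and expected reward sum
# for every area until both have converged.
def plan_traffic(areas, actions, T, R, gamma=0.9, tol=1e-6, max_iters=1000):
    """areas: list of states s; actions: dict s -> possible actions a;
    T(s, a, s2): transition probability; R: dict s -> reward R(s)."""
    U = {s: 0.0 for s in areas}            # expected reward sum U_pi(s)
    pi = {s: None for s in areas}          # strategy pi(s)
    for _ in range(max_iters):
        delta = 0.0
        for s in areas:
            # Expectation sum over s' of T(s, a, s') * U(s') for each possible action a.
            exp_by_action = {a: sum(T(s, a, s2) * U[s2] for s2 in areas)
                             for a in actions[s]}
            best_a = max(exp_by_action, key=exp_by_action.get)
            new_U = R[s] + gamma * exp_by_action[best_a]
            delta = max(delta, abs(new_U - U[s]))
            pi[s], U[s] = best_a, new_U
        if delta < tol:                    # strategy and reward sums have converged
            break
    return pi, U

# Toy usage: a three-area corridor s1 - s2 - s3 where s3 holds the target goods.
adj = {"s1": {"right": "s2"}, "s2": {"left": "s1", "right": "s3"}, "s3": {"left": "s2"}}
areas = list(adj)
actions = {s: list(dirs) for s, dirs in adj.items()}
T = lambda s, a, s2: 1.0 if adj[s].get(a) == s2 else 0.0
R = {"s1": 0.0, "s2": 0.1, "s3": 1.0}
pi, U = plan_traffic(areas, actions, T, R)
print(pi)   # the strategy points toward the high-reward area s3
```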
Further, with reference to
In the area s16, the actions a1, a2, a3, and a4 can be taken. In this case, the expectations ΣT(s16, a1, s′)Uπ(s′), ΣT(s16, a2, s′)Uπ(s′), ΣT(s16, a3, s′)Uπ(s′), and ΣT(s16, a4, s′)Uπ(s′) when the actions a1, a2, a3, and a4 are respectively taken are calculated. Note that the symbol Σ means the sum with respect to s′, in other words, with respect to s13, s15, s17, and s20.
Then, traffic line prediction unit 45 selects the action a corresponding to the largest value of the calculated expectations. For example, if ΣT(s16, a3, s′)Uπ(s′) is the largest, updating is performed as π(s16)=a3 and Uπ(s16)=ΣT(s16, a3, s′)Uπ(s′). By repeating the updating based on Equations (4) and (5) for each area as described above, the optimum strategy π(s) and the expected reward sum Uπ(s) for each area are finally determined.
In the above description, the strategy π(s) is obtained by a method in which only one action is deterministically selected, but the strategy π(s) can be stochastically obtained. Specifically, as the probability that an action a is to be taken in the state s, the strategy π(s) can be determined as Equation (6).
Here, the denominator of the right-hand side in Equation (6) is a normalization term that normalizes the total sum of P(a|s) over the actions a to 1.
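Equation (6) is likewise not reproduced above; the sketch below assumes a softmax-style definition in which P(a|s) is proportional to the exponential of the expectation ΣT(s, a, s′)Uπ(s′), with the denominator acting as the normalization term just described. The inverse temperature beta and the toy values are assumptions.

```python
# Hedged sketch of a stochastic strategy (assumed softmax form of Equation (6)).
import math

def stochastic_strategy(s, possible_actions, expectation, beta=1.0):
    """P(a|s) proportional to exp(beta * expectation(s, a)), normalized over a."""
    scores = {a: math.exp(beta * expectation(s, a)) for a in possible_actions}
    z = sum(scores.values())                      # normalization term
    return {a: v / z for a, v in scores.items()}

# Toy usage: two directions out of an area, one leading toward higher rewards.
exp_value = lambda s, a: {"right": 2.0, "left": 0.5}[a]
print(stochastic_strategy("s16", ["right", "left"], exp_value))
```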
With respect to
The probability T(si, a, si+1) is the probability that the state transitions to the state si+1 when an action a is taken in the state si, and the value of the probability T(si, a, si+1) is determined in advance as described above.
Note that, in the case that the above-described deterministic strategy π(s), in which only one action is selected, is used, P(si+1|si) can be obtained by setting the transition probability as follows: P(a|si)=1 when the action a is the action selected by the strategy, and P(a|si)=0 for any other action.
Traffic line prediction unit 45 calculates the transition probability P(sa→sb) of a predetermined path (area sa→sb) on the basis of the transition probability P(si+1|si) calculated in step S306 (step S307). Specifically, by calculating the product of the transition probabilities from the area sa to the area sb by using Equation (7), the transition probability P(sa→sb) of the path sa→sb is calculated. For example, traffic line prediction unit 45 calculates the transition probability P(s1→s12) of the traffic line from entering the shop to purchasing the item Xo by P(s1)×P(s6|s1)×P(s9|s6)×P(s12|s9). Note that the predetermined path (area sa→sb) for which the transition probability P(sa→sb) should be calculated may be specified via operation unit 30.
Alternatively, it is also possible to arrange the transition probabilities in a matrix and to obtain the transition probability P(sa→sb) by repeatedly multiplying the matrix by itself. The matrix of the transition probabilities is a matrix whose component (i, j) is P(sj|si), and the sum of the probabilities of leaving the area sa and arriving at the area sb through any path can be obtained by repeating the product of this matrix.
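The following sketch illustrates both ways of scoring a path: the product of one-step transition probabilities along one specific path (Equation (7) is assumed here to be this chain product), and the repeated matrix product that sums over all paths of up to a given number of steps. The transition probabilities used below are made-up values, not predictions of the device.

```python
# Sketch (assumed one-step probabilities P(s_{i+1} | s_i) from step S306).
import numpy as np

areas = ["s1", "s6", "s9", "s12"]
idx = {a: i for i, a in enumerate(areas)}
# One-step transition matrix: component (i, j) is P(s_j | s_i); s12 is absorbing.
P = np.array([[0.0, 0.8, 0.2, 0.0],
              [0.0, 0.0, 0.9, 0.1],
              [0.0, 0.0, 0.3, 0.7],
              [0.0, 0.0, 0.0, 1.0]])

# (a) Probability of one specific path as the product of one-step probabilities.
path = ["s1", "s6", "s9", "s12"]
p_path = np.prod([P[idx[a], idx[b]] for a, b in zip(path, path[1:])])
print(p_path)                              # 0.8 * 0.9 * 0.7 = 0.504

# (b) Probability of arriving at s12 from s1 within n steps through any path,
#     obtained by repeatedly multiplying the transition matrix by itself.
n = 3
print(np.linalg.matrix_power(P, n)[idx["s1"], idx["s12"]])
```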
When the transition probability P(sa→sb) is high, it means that many shoppers pass through the path (area sa→sb). On the other hand, when the transition probability P(sa→sb) is low, it means that almost no shopper passes through the path (area sa→sb). As an output of the prediction result (step S205 of
Note that the prediction result to be output in step S205 of
In
Prediction device 1 of the present disclosure is a prediction device that predicts a flow of a person after a layout change of goods in a shop (an example of a region), and the prediction device includes: communication unit 10 (an example of an obtaining unit) that obtains traffic line information 22 representing flows of a plurality of persons in the shop, and goods-layout information 21 representing layout positions of the goods, operation unit 30 (an example of an obtaining unit) that obtains goods-layout change information 25 representing a layout change of the goods; and controller 40 that generates an action model (action model information 24=ϕ) of a person in the shop on the basis of traffic line information 22 and goods-layout information 21 by an inverse reinforcement learning method and that predicts a flow of a person after the layout change of the goods, based on the action model and the goods-layout change information 25.
This arrangement makes it possible to accurately predict a flow of a person when a layout of goods is changed, without actually changing the goods layout. In addition, on the basis of the predicted flow of a person, it is possible to change the positions of the goods to such positions that improve the sales. Alternatively, when a bargain sale, an event, or the like is held in view of concurrent selling, prediction device 1 can be used to consider a layout change, for example, to determine where to hold the bargain sale and so on, so that the customer unit price will be increased by smoothing or disrupting the flow of people in the shop.
The action model is specifically generated as follows. A shop (an example of a region) contains a plurality of areas (an example of zones, and, for example, the areas s1 to s26 shown in
Before the action model is generated, communication unit 10 (an example of an obtaining unit) further obtains purchased goods information 23 representing one or more goods among the goods that a plurality of persons in the shop purchased. Then, controller 40 groups the plurality of persons on the basis of purchased goods information 23 and generates the action model on the basis of traffic line information 22 after the grouping.
This operation makes it possible, for example, to generate the action model of a group that purchased the same item of goods (that is, the action model about a group having the same purpose of purchase); therefore, it is possible to generate a more accurate action model.
Further, controller 40 divides each of the flows of the plurality of persons into a plurality of purchasing stages on the basis of traffic line information 22 and generates an action model for each of the plurality of purchasing stages. The magnitude of the reward changes depending on the purchasing stages. For example, it is considered that, even in the same area, the magnitude of the reward changes between before and after the purchase of a target item of goods. Therefore, by generating the action model for each purchasing stage, more accurate action models can be generated.
The prediction of the flow of a person, after a change of goods layout, on the basis of the action models is specifically performed as follows. With reference to
This arrangement makes it possible to show the flow of a person after the goods layout is changed. Therefore, on the basis of the predicted flow of a person, a proprietor of the shop can actually change the positions of the goods to such positions that improve the sales, for example.
A prediction method of the present disclosure is a prediction method in which a flow of a person after a layout change of goods in a shop (an example of a region) is predicted. Specifically, the prediction method includes: step S101 for obtaining goods-layout information 21 representing layout positions of goods shown in
This arrangement makes it possible to accurately predict a flow of a person when a layout of goods is changed, without actually changing the goods layout. In addition, on the basis of the predicted flow of a person, it is possible to change the positions of the goods to such positions that improve the sales.
The first exemplary embodiment has been described above as an illustrative example of the techniques disclosed in the present application. However, the techniques of the present disclosure can be applied not only to the above exemplary embodiment but also to exemplary embodiments in which modification, replacement, addition, or removal is appropriately made. Further, the components described in the above first exemplary embodiment can be combined to configure a new exemplary embodiment. Therefore, other exemplary embodiments will be illustrated below.
In step S105 of the above first exemplary embodiment, the shoppers having purchased a predetermined item of goods are put in the same group. However, the grouping does not have to be performed by the method in the above first exemplary embodiment. As long as traffic line information 22 and purchased goods information 23 are used for grouping, any method can be used for grouping.
For example, the multimodal LDA (Latent Dirichlet Allocation) may be used to group the shoppers having a similar motive for visiting the shop into the same group. With respect to
Further, as other grouping methods, traffic line information divider 42a may use, for example, a method called Non-negative Tensor Factorization, unsupervised learning using a neural network, or a clustering method (the K-means method or other methods).
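As an illustrative sketch of such an alternative grouping (an assumption, using the scikit-learn library, which is not mentioned in the present disclosure), shoppers could be clustered by the K-means method applied to bag-of-goods vectors built from purchased goods information 23:

```python
# Sketch (assumption): group shoppers by clustering their purchased goods.
import numpy as np
from sklearn.cluster import KMeans

goods_ids = ["X1", "X2", "X3", "X4"]
baskets = {                      # shopper -> purchased goods (made-up data)
    "G1": ["X1", "X2"],
    "G2": ["X1"],
    "G3": ["X3", "X4"],
    "G4": ["X4"],
}

# Bag-of-goods matrix: one row per shopper, one column per goods ID.
X = np.array([[1 if g in items else 0 for g in goods_ids]
              for items in baskets.values()])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
groups = dict(zip(baskets.keys(), labels))
print(groups)                    # e.g. {'G1': 0, 'G2': 0, 'G3': 1, 'G4': 1}
```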
In the above first exemplary embodiment, in step S106 of
In the case that a hidden Markov model (HMM) is used, Equation (8) shown below can express the probability P(s1, . . . , s26) at the time when a shopper's action is observed in, for example, the state transition series {s1, . . . , s26}.
In the equation, P (mi|mi−1) is the probability of transition from the purchasing stage mi−1 (for example, a stage of purchasing a target item of goods) to the purchasing stage mi (for example, a stage of payment).
P(sj|mi) is the probability of staying in or passing through the area sj in the purchasing stage mi (for example, the probability of staying in or passing through s26 in the stage of payment).
The transition probability P(mi|mi−1) and an output probability P(sj|mi) that maximize the value of Equation (8) will be obtained.
First, the Baum-Welch algorithm or the Viterbi algorithm is used to divide the state transition series according to the initial values of P(mi|mi−1) and P(sj|mi) and to recalculate P(mi|mi−1) and P(sj|mi) according to the division until convergence. By this calculation, the state transition series can be divided into each purchasing stage m.
Here, P(sj|mi) includes both the probability P(sj|mi−1mi) and the probability P(sj|sj−1), where the probability P(sj|mi−1mi) is the probability that the purchasing stage mi starts at the area sj (that is, the probability that the first area after the state transitions from the previous purchasing stage mi−1 to the next purchasing stage mi is the area sj), and the probability P(sj|sj−1) is the probability that the next area is the area sj when the state transitions within the same purchasing stage mi. P(sj|mi−1mi) is obtained by counting the occurrences of the area sj as the start area of the purchasing stage mi, on the basis of traffic line information 22 in the same group. P(sj|sj−1) can be obtained by the inverse reinforcement learning method from a partial series group corresponding to the purchasing stage mi (for example, s1, . . . , s12).
As described above, the transition probability P(mi|mi−1) of the purchasing stage can be estimated by the HMM. Further, the output probability P(sj|mi) in the area sj for each purchasing stage mi can be estimated by the inverse reinforcement learning method on the basis of the state transition series (traffic line) in the stage mi.
This can divide the state transition series represented by traffic line information 22, for each purchasing stage.
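For illustration only, the decoding part of this division can be sketched with a plain Viterbi procedure over the purchasing stages; the stage set, probability tables, and area series below are made-up examples, and in the scheme described above this decoding would alternate with re-estimation of P(mi|mi−1) and P(sj|mi) until convergence.

```python
# Hedged sketch: Viterbi decoding of the purchasing stage for each observed area.
import math

def viterbi(observations, stages, trans, emit, init):
    """Return the most likely purchasing stage for each observed area."""
    V = [{m: math.log(init[m]) + math.log(emit[m][observations[0]]) for m in stages}]
    back = []
    for obs in observations[1:]:
        scores, ptr = {}, {}
        for m in stages:
            prev = max(stages, key=lambda p: V[-1][p] + math.log(trans[p][m]))
            scores[m] = V[-1][prev] + math.log(trans[prev][m]) + math.log(emit[m][obs])
            ptr[m] = prev
        V.append(scores)
        back.append(ptr)
    last = max(stages, key=lambda m: V[-1][m])
    path = [last]
    for ptr in reversed(back):            # follow the back pointers
        path.append(ptr[path[-1]])
    return list(reversed(path))

stages = ["m1", "m2"]                     # e.g. buying the target goods, then paying
areas = ["s1", "s9", "s12", "s20", "s26"]
init = {"m1": 0.9, "m2": 0.1}
trans = {"m1": {"m1": 0.8, "m2": 0.2}, "m2": {"m1": 0.01, "m2": 0.99}}
emit = {"m1": {a: 0.3 if a in ("s1", "s9", "s12") else 0.05 for a in areas},
        "m2": {a: 0.425 if a in ("s20", "s26") else 0.05 for a in areas}}
print(viterbi(areas, stages, trans, emit, init))
# ['m1', 'm1', 'm1', 'm2', 'm2'] -> the series is divided at s20
```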
Controller 40 may propose a layout change in which another item of goods having a predetermined relation to a predetermined item of goods is placed on the traffic line toward the shop exit identified after the division into purchasing stages, and may display the proposed layout change on display 50, for example. The other item of goods in the predetermined relation is, for example, an item of goods that is often purchased together with the predetermined item of goods.
If a plurality of pieces of goods-layout change information 25 have been input to controller 40 via operation unit 30, controller 40 calculates the transition probability P(si+1|si) after the change of goods layout on the basis of each of the input pieces of goods-layout change information 25.
On the basis of the result, the transition probability P(sa→sb) of a predetermined path may be calculated. Then, the goods-layout change information 25 with which the transition probability P(sa→sb) of a predetermined path is high may be selected from a plurality of pieces of goods-layout change information 25, and the selected piece of goods-layout change information 25 may be output to display 50, for example.
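A minimal sketch of this selection step is given below; predict_path_probability is a hypothetical stand-in for the whole prediction chain described above (feature regeneration, reward calculation, strategy estimation, and the path probability of Equation (7)) and is not implemented here.

```python
# Sketch (assumption): rank candidate layout changes by the predicted P(sa -> sb).
def best_layout_change(candidates, predict_path_probability, path=("s1", "s12")):
    """Return the candidate layout change maximizing P(sa -> sb) for the given path."""
    return max(candidates, key=lambda c: predict_path_probability(c, path))

# Toy usage with a stand-in prediction function and made-up candidates.
candidates = [{"goods": "X7", "new_shelf": "T3"}, {"goods": "X7", "new_shelf": "T9"}]
fake_predict = lambda c, path: {"T3": 0.21, "T9": 0.34}[c["new_shelf"]]
print(best_layout_change(candidates, fake_predict))   # the change placing X7 on shelf T9
```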
The exemplary embodiments have been described to illustrate the techniques according to the present disclosure. For that purpose, the accompanying drawings and the detailed description have been provided. Therefore, in order to illustrate the above techniques, the components described in the accompanying drawings and the detailed description can include not only the components necessary to solve the problem but also components unnecessary to solve the problem. For this reason, it should not be immediately recognized that those unnecessary components are necessary just because those unnecessary components are described in the accompanying drawings and the detailed description.
In addition, because the above exemplary embodiments are for illustrating the techniques in the present disclosure, various modifications, replacements, additions, removals, or the like can be made without departing from the scope of the accompanying claims or the equivalent thereof.
Note that, the shop in the present exemplary embodiments may be a predetermined region. In that case, the plurality of areas in the shop are a plurality of zones in the predetermined region.
The prediction device of the present disclosure enables prediction of the traffic lines of shoppers after a layout change of goods; therefore, the prediction device is useful for various devices that provide users with information of such layout positions of goods that increases the sales.