This application is based upon and claims the benefit of priority from Japanese Patent Application No. 2021-075085, filed on Apr. 27, 2021; the entire contents of which are incorporated herein by reference.
Embodiments described herein relate generally to an information processing apparatus, an information processing method, and a computer program product.
There are proposed technologies for predicting future data by using a model estimated from data acquired in the past. For example, the number, density, and the like of people in a given location at a given point of time are predicted by using a model for predicting data representing a people flow (human flow) between locations.
An information processing apparatus according to one embodiment includes one or more hardware processors coupled to a memory. The hardware processors function as an acquisition unit, a model generation unit, and a determination unit. The acquisition unit serves to acquire one or more patterns from among multiple patterns each representing temporal variation of first data being data to be predicted. The patterns are determined for a first region designated out of regions serving as prediction targets of the first data. The model generation unit serves to generate a prediction model for predicting the temporal variation of the first data in the first region. The prediction model is generated on the basis of the acquired patterns. The determination unit serves to determine a parameter of the prediction model.
A description will be given below in detail of preferred embodiments of an information processing apparatus according to this invention with reference to the accompanying drawings.
An apparatus for predicting data by using a model is sometimes configured to predict data (for example, data indicating a human flow) for each of multiple sections (meshes) defined by dividing a given region. The sections include, for example, business districts and residential districts, which differ from one another in regional characteristics. There are also sections that share the same regional characteristics yet behave differently depending on the time slot (temporal characteristics). Therefore, when a modeling method applies the same model to all the sections, it is sometimes difficult to predict data accurately.
In the information processing apparatus according to each of the following embodiments, an appropriate model corresponding to the regional characteristics and the temporal characteristics is generated, and prediction for each of the regions (sections) is executed by using the generated model. Therefore, accuracy of the data prediction (for example, prediction of a human flow) using a model can be improved.
Note that, in each of the following embodiments, a description will be given of a case of predicting human flow data (an example of first data) indicating populations in the sections. The human flow data is, for example, populations (the number of people) or population densities. The data taken as a prediction target is not limited to the human flow data, and may be any other data such as, for example, power consumption or weather information.
The storage unit 120 stores a variety of data for use in a variety of processing by the information processing apparatus 100. For example, the storage unit 120 stores a human flow pattern database (DB) 130, past data 121, a parameter file 122, and a predicted value table 123. The human flow pattern DB 130 includes a management table 131, and a human flow table 132.
For example, the regional characteristics are “residential district”, “business district”, “commercial use district”, “industrial district”, and “green district”. Details of the regional characteristics will be described later. Note that these regional characteristics are merely examples, and for example, the regional characteristics may be determined in accordance with “use districts” prescribed by Article 8 of City Planning Act.
In the human flow table 132, the column name of the first column is the basic pattern ID, and the column names of the second and subsequent columns are the time slots. Each time slot is described by a starting time in a format of “hh:mm”. The ending time of each time slot is immediately before the starting time described in the next time slot. Moreover, when the human flow data differs depending on an attribute of the day, such as whether the day is a weekday or a holiday, the human flow data may be stored for each time slot of the weekday and each time slot of the holiday. In this case, attribute distinguishing information such as “weekday_hh:mm” may be added to the column name that indicates each time slot.
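The column naming convention above can be sketched as follows. This is only an illustration under stated assumptions: the function names `slot_for` and `column_name` and the example starting times are hypothetical and do not appear in the embodiment.

```python
from bisect import bisect_right

def slot_for(clock_time, slot_starts):
    """Return the starting time of the time slot that contains clock_time.

    slot_starts is a sorted list of zero-padded "hh:mm" strings (assumed layout).
    Each slot ends immediately before the next starting time, so the matching
    slot is the last one whose start is not later than clock_time.
    """
    index = bisect_right(slot_starts, clock_time) - 1
    if index < 0:
        raise ValueError("clock_time precedes the first time slot")
    return slot_starts[index]

def column_name(slot_start, day_attribute=None):
    """Build a column name, optionally prefixed with an attribute such as 'weekday'."""
    return f"{day_attribute}_{slot_start}" if day_attribute else slot_start

# Hypothetical six-hour time slots.
starts = ["00:00", "06:00", "12:00", "18:00"]
print(column_name(slot_for("07:30", starts), "weekday"))
```

Zero-padded “hh:mm” strings sort in chronological order, which is why a plain string comparison is sufficient in this sketch.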
Next, a description will be given of examples of the respective basic patterns with reference to
As will be described later, in the present embodiment, at least one basic pattern designated for each of the sections out of the basic patterns is subjected to synthesis, whereby a human flow model (an example of a prediction model) for predicting the human flow data is generated. Moreover, parameters of the human flow model are determined such that an error from measurement data (past data) being human flow data measured in the past becomes smaller.
Note that, in
The storage unit 120 can be composed of any commonly used storage medium such as a flash memory, a memory card, a random access memory (RAM), a hard disk drive (HDD), or an optical disc.
The storage unit 120 may be composed of multiple storage media physically different from one another. For example, the respective data (the management table 131, the human flow table 132, the past data 121, the parameter file 122, and the predicted value table 123) may be stored in storage media different from one another. Note that details of the parameter file 122 and the predicted value table 123 will be described later.
Returning to
The sections may be determined in any way; however, for example, sections to be determined by such methods as follows can be used.
A section for which a model is to be generated is designated by a user by using, for example, an input device (such as a keyboard, a mouse, and a touch panel; not illustrated). A method for designating the section may be any method; however, for example, a method of selecting a section from sections displayed on a display device, or the like can be applied.
The acquisition unit 101 acquires information (section ID and the like) indicating the section designated by the user as described above. Moreover, the acquisition unit 101 refers to a basic pattern setting file (details will be described later) in which the basic pattern for each of the section IDs is determined, and acquires information indicating the basic pattern determined for the designated section.
On the basis of the basic pattern acquired by the acquisition unit 101, the model generation unit 102 generates the human flow model (prediction model) for predicting a temporal variation of human flow data in the designated section. For example, the model generation unit 102 synthesizes the basic patterns, thereby generating the human flow model. Details of the model generation processing will be described later.
The determination unit 103 determines the parameters of the human flow model generated by the model generation unit 102. For example, the determination unit 103 determines the parameters of the human flow model such that an error between the past data 121 corresponding to the measured human flow data and the human flow data predicted by the human flow model becomes smaller.
By using the human flow model in which the parameters are determined by the determination unit 103, the prediction unit 104 executes prediction processing for predicting the human flow data of the designated section and time slot.
The output control unit 105 controls output processing for the variety of data by the information processing apparatus 100. A method for outputting the information by the output control unit 105 may be any method; however, there can be applied a method of displaying the information on the display device, a method of transmitting the information to an external device connected by a network, and the like.
The above-described respective units (the acquisition unit 101, the model generation unit 102, the determination unit 103, the prediction unit 104, and the output control unit 105) are achieved, for example, by one or more hardware processors. The above-described respective units may be achieved by causing a hardware processor, such as a central processing unit (CPU), to execute a computer program, that is, be achieved by software. The above-described respective units may be achieved by one or more hardware processors such as dedicated integrated circuits (ICs), that is, be achieved by hardware. The above-described respective units may be achieved by using software and hardware in combination. When multiple processors are used, each of the processors may achieve one of the respective units, or may achieve two or more of the respective units.
Note that, in
Next, a description will be given of the model generation processing by the information processing apparatus 100 according to the first embodiment.
Through the input device, for example, the user designates one or more sections in which the human flow model is to be generated. The acquisition unit 101 acquires section IDs of the designated sections (Step S101). Subsequently, the human flow models are constructed one by one for each of the designated section IDs.
The model generation unit 102 acquires unprocessed section IDs, and identifies, for each of the acquired section IDs, one or more basic patterns determined as synthesis targets (Step S102). At this time, the model generation unit 102 may identify all basic patterns as synthesis targets, or may identify basic patterns determined for each of the section IDs. In the latter case, the model generation unit 102 may refer to the basic pattern setting file as illustrated in
Returning to
An example of a method for synthesizing the human flow model will be described below. A structure of a human flow model to be synthesized for an i-th (i is an integer of 1 or more and N or less) section ID is defined as Gi(t). Human flow data at a t-th time slot (t is an integer of 1 or more and T or less) in the i-th section ID is defined as ŷ(i,t). Note that “ŷ” is the variable y with a circumflex “^” placed over it, and means a predicted value of y. The readout basic pattern is defined as Fj(t) (j is an integer of 1 or more and M or less). The parameters are defined as {α(i)0, α(i)1, . . . , α(i)M}. The model generation unit 102 synthesizes (generates) the human flow model by, for example, the following Expression (1).
$$\hat{y}^{(i,t)} = G_i(t) = \alpha^{(i)}_0 + \alpha^{(i)}_1 F_1(t) + \cdots + \alpha^{(i)}_M F_M(t) \tag{1}$$
In Expression (1), α(i)0 is a constant term. The function Fj(t) indicating the basic pattern in Expression (1) is, for example, a function that extracts a predicted value of human flow data in a given time slot t from the human flow table 132 by taking, as keys, the j-th basic pattern ID and the t-th time slot. The function Fj(t) is not limited to this example.
Moreover, in Expression (1), the function Gi(t) represents an expression for synthesizing the basic patterns by linear combination; however, it is not limited to this example. Any method using a function, a model, or the like that is capable of synthesizing one or more basic patterns may be used. For example, such a method may use a common statistical model or a machine learning model, such as a general linear model, a neural network model, or a deep learning model. In this case, each of the models is configured to, for example, receive one or more basic patterns and output a single human flow model.
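The linear combination of Expression (1) can be sketched as follows. This is a minimal illustration, assuming that each basic pattern is available as a function of the 1-based time slot index; the name `synthesize` and all numeric values are hypothetical.

```python
from typing import Callable, Sequence

# F_j(t): value of basic pattern j in time slot t (assumed representation).
BasicPattern = Callable[[int], float]

def synthesize(alphas: Sequence[float],
               patterns: Sequence[BasicPattern]) -> Callable[[int], float]:
    """Return G_i(t) = alpha_0 + sum_j alpha_j * F_j(t), as in Expression (1)."""
    if len(alphas) != len(patterns) + 1:
        raise ValueError("expected one constant term plus one weight per basic pattern")

    def model(t: int) -> float:
        return alphas[0] + sum(a * f(t) for a, f in zip(alphas[1:], patterns))

    return model

# Made-up basic patterns stored as per-slot lists (t is 1-based as in the text).
residential = [100.0, 60.0, 90.0]
business = [10.0, 80.0, 20.0]
patterns = [lambda t: residential[t - 1], lambda t: business[t - 1]]

g = synthesize([5.0, 0.5, 2.0], patterns)
print(g(2))  # 5 + 0.5 * 60 + 2 * 80
```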
After the structure of the human flow model is determined, the determination unit 103 determines the parameters {α(i)0, α(i)1, . . . , α(i)M} of the synthesized human flow model Gi(t) by using the human flow data stored in the past data 121 (Step S105).
The determination unit 103 can determine the parameters {α(i)0, α(i)1, . . . , α(i)M} of the human flow model of the i-th section ID by, for example, a least squares method like Expression (2). Here, y(i,τ) (τ is an integer of 1 or more and T or less) represents an actual value of the human flow data included in the past data 121.
Note that a method for obtaining the parameters is not limited to the least squares method. Another method for obtaining such parameters that further reduce the error with respect to the past data 121 may be applied. For example, a method of maximum likelihood estimation, a Bayesian estimation method, or a quasi-Newton method may be used for the determination unit 103 to determine the parameters.
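A least squares determination of the parameters can be sketched as follows. Expression (2) is not reproduced in this text, so the sketch assumes the ordinary objective of minimizing the sum over τ of (y(i,τ) − Gi(τ))², solved through the normal equations. The function names `solve` and `fit_least_squares` and the sample data are hypothetical.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[k]] for k, row in enumerate(matrix)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: basic patterns are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_least_squares(patterns, actual):
    """Return [alpha_0, ..., alpha_M] minimizing the squared error to `actual`.

    patterns: M lists of length T holding F_j(t); actual: list of length T.
    """
    rows = [[1.0] + [p[t] for p in patterns] for t in range(len(actual))]
    k = len(rows[0])
    gram = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    moment = [sum(r[a] * actual[t] for t, r in enumerate(rows)) for a in range(k)]
    return solve(gram, moment)

# Made-up data generated as 2 + 3 * F_1(t) - 1 * F_2(t).
f1 = [1.0, 2.0, 3.0, 4.0]
f2 = [1.0, 0.0, 1.0, 0.0]
past = [4.0, 8.0, 10.0, 14.0]
alphas = fit_least_squares([f1, f2], past)
```

For realistic numbers of sections and time slots, a numerical library would normally replace the hand-written solver; it is written out here only to keep the sketch self-contained.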
In the case of synthesizing the human flow model Gi(t) by a machine learning model such as the neural network model, the determination unit 103 trains the machine learning model so as to minimize a loss function by, for example, using the past data 121 as training data, thereby determining parameters of the model.
Expression (2) of an evaluation function for optimizing the parameters may be modified to use sparse modeling. Expression (3) is an example of such a modification.
Expression (3) corresponds to an expression in which a penalty term is incorporated as a second term of an argument of an argmin function. The penalty term in Expression (3) is a term according to LASSO. The penalty term may be another penalty term such as Elastic Net. Moreover, Expression (3) may be configured to incorporate multiple penalty terms. Use of such a modified expression makes it possible to identify parameters for which α(i)m=0. That is, a configuration can be adopted in which basic patterns whose parameters become 0 are not selected. In other words, a part of the basic patterns can be excluded from the human flow model. Note that a hyperparameter λ in Expression (3) may be set in advance, or may be determined by a method commonly used for identifying λ in sparse modeling.
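The selection effect of a LASSO penalty can be sketched as follows. Expression (3) is not reproduced in this text, so the sketch assumes the objective sum over τ of (y(i,τ) − Gi(τ))² plus λ times the sum of |α(i)m| for m ≥ 1, with the constant term left unpenalized. Cyclic coordinate descent is used here as one common solver and is not stated in the source; the names `soft_threshold` and `fit_lasso` and the sample data are hypothetical.

```python
def soft_threshold(value, limit):
    """Shrink value toward zero by limit; values within [-limit, limit] become 0."""
    if value > limit:
        return value - limit
    if value < -limit:
        return value + limit
    return 0.0

def fit_lasso(patterns, actual, lam, iterations=1000):
    """Return [alpha_0, ..., alpha_M] for the assumed LASSO objective.

    patterns: M lists of length T holding F_j(t); actual: list of length T.
    """
    slots, count = len(actual), len(patterns)
    alpha0, alphas = 0.0, [0.0] * count
    for _ in range(iterations):
        # The unpenalized constant term is the mean of the remaining residual.
        alpha0 = sum(
            actual[t] - sum(alphas[j] * patterns[j][t] for j in range(count))
            for t in range(slots)
        ) / slots
        for j in range(count):
            rho = 0.0
            for t in range(slots):
                others = alpha0 + sum(
                    alphas[k] * patterns[k][t] for k in range(count) if k != j
                )
                rho += patterns[j][t] * (actual[t] - others)
            scale = sum(v * v for v in patterns[j])
            alphas[j] = soft_threshold(rho, lam / 2.0) / scale if scale > 0 else 0.0
    return [alpha0] + alphas

# Made-up data: the actual values depend only on the first basic pattern.
f1 = [1.0, 2.0, 3.0, 4.0]
f2 = [1.0, 0.0, 1.0, 0.0]
past = [5.0, 8.0, 11.0, 14.0]
result = fit_lasso([f1, f2], past, lam=1.0)
unselected = [j + 1 for j, a in enumerate(result[1:]) if a == 0.0]
```

In this example the weight of the second basic pattern is driven exactly to 0, which corresponds to that pattern not being selected for the section.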
The determination unit 103 outputs the determined parameters and the determined structure of the human flow model to the parameter file 122 (Step S106).
As illustrated in
In the case of a section in which a human flow model composed of not all the basic patterns but a part of selected basic patterns is used, “NULL” is set as a value of each parameter corresponding to the unselected basic pattern.
The parameters {α(i)0, α(i)1, . . . , α(i)M} of the human flow model may have different values for each time slot t. In this case, for example, when the number of time slots is T, (M+1)×T pieces of the parameters are output to the parameter file 122.
Returning to
Next, the prediction processing by the information processing apparatus 100 according to the first embodiment will be described. The prediction processing is processing for obtaining predicted values of human flows of one or more sections by using the human flow model obtained by the model generation processing.
The user designates, through the input device, a time period and sections in which the human flows are to be predicted. The acquisition unit 101 acquires prediction conditions representing the designated period and sections (Step S201).
Returning to
As illustrated in
The time slots are expressed in a format of “YYYY/MM/DD hh:mm:ss”. Between a starting point and an ending point based on the starting time and the ending time indicated by the prediction conditions, the time slots are set at intervals according to the time resolution. Therefore, each of the time slots ranges from the time expressed by its column name to immediately before the time designated by the next column name. For example, when the column name is “2020/11/16 20:00:00” and the time resolution is 1 hour, the time slot designated by this column name is “2020/11/16 20:00:00 to 2020/11/16 20:59:59”.
In response to subsequent prediction operations, predicted values of human flows in sections indicated by the row names in the time slots indicated by the column names are inserted into the respective elements of the predicted value table 123. The predicted values are not yet calculated at the time of this step, and thus NULL values are inserted into all the elements.
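The initialization of the predicted value table described above can be sketched as follows. This is a minimal illustration, assuming that the ending time is exclusive and that NULL is represented by `None`; the function name `build_predicted_value_table` and the section IDs are hypothetical.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%Y/%m/%d %H:%M:%S"  # corresponds to "YYYY/MM/DD hh:mm:ss"

def build_predicted_value_table(section_ids, start, end, resolution):
    """Create a table with one row per section ID and one column per time slot.

    Column names are the starting times of the slots between start and end
    (end is treated as exclusive, which is an assumption of this sketch).
    Every element is None until the prediction fills it in.
    """
    begin = datetime.strptime(start, TIME_FORMAT)
    stop = datetime.strptime(end, TIME_FORMAT)
    columns = []
    current = begin
    while current < stop:
        columns.append(current.strftime(TIME_FORMAT))
        current += resolution
    return {section_id: {column: None for column in columns}
            for section_id in section_ids}

table = build_predicted_value_table(
    ["S001", "S002"],
    "2020/11/16 20:00:00",
    "2020/11/16 23:00:00",
    timedelta(hours=1),
)
```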
Returning to
The prediction unit 104 determines the sections (section IDs) to be processed (Step S203). The prediction unit 104 acquires, from the parameter file 122, the parameters corresponding to the determined section IDs, and constitutes the human flow model by using the acquired parameters and the basic pattern read out from the human flow pattern DB 130 (human flow table 132) (Step S204).
The prediction unit 104 predicts the human flow data in the respective time slots in the designated period by using the constituted human flow model (Step S205). A description will be given below of an example of the case of predicting the human flow data in a time slot t′ (t′ is an integer of 1 or more and T′ or less) for the i-th section.
First, the prediction unit 104 reads out the parameters {α(i)0, α(i)1, . . . , α(i)M} of the human flow model Gi(t) of the i-th section described in the parameter file 122. The prediction unit 104 reads out, from the human flow table 132, the basic patterns corresponding to the parameters whose values are not NULL among the parameters {α(i)0, α(i)1, . . . , α(i)M}, and constitutes the human flow model Gi(t). The prediction unit 104 calculates the predicted value of the human flow in the time slot t′ by the constituted human flow model Gi(t). The prediction unit 104 stores the calculated predicted value as the element in the row corresponding to the relevant section ID and in the column corresponding to the time slot t′ in the predicted value table 123.
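The handling of NULL parameters during prediction can be sketched as follows. This is an illustration under stated assumptions: NULL is represented by `None`, the human flow table is held as a nested dictionary, and the name `predict`, the pattern IDs, and the values are hypothetical.

```python
def predict(alpha0, weights, human_flow_table, slot):
    """Evaluate the human flow model for one time slot.

    weights maps a basic pattern ID to its parameter, or to None when the
    pattern was not selected for the section (NULL in the parameter file).
    Only the selected patterns are read from the human flow table.
    """
    total = alpha0
    for pattern_id, weight in weights.items():
        if weight is None:
            continue  # unselected basic pattern contributes nothing
        total += weight * human_flow_table[pattern_id][slot]
    return total

# Made-up table contents; "P2" is absent because it is never read.
flow_table = {"P1": {"20:00": 100.0}}
predicted = predict(10.0, {"P1": 0.5, "P2": None}, flow_table, "20:00")
print(predicted)
```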
The prediction unit 104 determines whether or not all the sections are processed (Step S206). When all the sections are not processed yet (Step S206: No), the processing returns to Step S203, and is repeated for the next section.
When all the sections are processed (Step S206: Yes), the prediction unit 104 outputs the predicted value table 123 (Step S207), and ends the prediction processing.
As described above, in the first embodiment, basic patterns determined for each of the sections out of multiple basic patterns are synthesized, whereby the prediction model (human flow model) is generated, and the prediction is executed by using the generated model. Thus, the accuracy of model-based prediction of the data (for example, the human flow) can be improved.
In a second embodiment, a description will be given of an example of using a basic pattern different from that of the first embodiment.
The second embodiment is different from the first embodiment in terms of a function of the model generation unit 102-2. The other configurations and functions are similar to those in
The model generation unit 102-2 generates a human flow model by synthesizing basic patterns different from those in the first embodiment.
Note that the entire flow of the model generation processing and the prediction processing in the second embodiment is similar to that in
A function Fj(t) representing each of the basic patterns can be represented by a mathematical model that has U pieces (U is an integer of 1 or more) of parameters {p1, p2, . . . , pU}. The mathematical model can take a variety of forms, such as the following. At least part of the M pieces of basic patterns may be represented by mathematical models different from one another.
For example, as shown in Expression (4), the j-th basic pattern can be expressed by a mathematical model of a sine function regarding the time slot t.
With the mathematical model of the sine function in Expression (4), the format of the function, that is, the model structure, can be stored as the basic pattern, and the values of the parameters p(i)1 can differ for each of the section IDs. Therefore, in a human flow pattern DB 130 of the present embodiment, the basic pattern IDs are managed by a management table 131, and objects that indicate the formats of the functions for each of the basic patterns are managed as the model structures by a human flow table 132.
The values of the parameters p(i)1, which indicate phases in Expression (4), may be determined simultaneously when the determination unit 103 determines the parameters {α(i)0, α(i)1, . . . , α(i)M} by using Expression (2). Values of the parameters p(i)1 may be set in advance.
The basic patterns may be represented by using actual values of past human flow data and human flow data in other sections. That is, the basic patterns may include patterns that change in response to human flow data measured in the past in at least one of the subject section and sections other than the subject section. For example, the following Expression (5) is an example of an expression that represents a basic pattern using a vector autoregression model. Expression (5) has a format that uses actual values of the past human flows in the subject section, going back as far as Q time slots before the time slot t.
$$F_j(t) = \sum_{q=1}^{Q} p^{(i)}_q \, y^{(i)}_{t-q} + p^{(i)}_0 \tag{5}$$
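The autoregressive basic pattern of Expression (5) can be sketched as follows. This is a minimal illustration, assuming that the past actual values are held in a list ordered from oldest to newest; the name `autoregressive_pattern` and the numbers are hypothetical.

```python
def autoregressive_pattern(history, coefficients, intercept):
    """Evaluate F_j(t) = sum_{q=1..Q} p_q * y_{t-q} + p_0 as in Expression (5).

    history: actual values of the subject section, most recent last, so that
             history[-q] is y_{t-q}.
    coefficients: [p_1, ..., p_Q]; intercept: p_0.
    """
    if len(history) < len(coefficients):
        raise ValueError("history must cover at least Q past time slots")
    lagged = sum(p * history[-q] for q, p in enumerate(coefficients, start=1))
    return intercept + lagged

# Made-up values with Q = 2: 2 + 0.5 * 120 + 0.3 * 110.
value = autoregressive_pattern([100.0, 110.0, 120.0], [0.5, 0.3], 2.0)
```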
Expression (6) is another example of an expression that represents a basic pattern using the vector autoregression model. Expression (6) is an example of an expression that uses, as a variable, an actual value of the past human flow in the h-th (h is an integer of 1 or more and N or less, satisfying i≠h) other section.
For predicting the human flow data, information such as weather information or mobility information, which has been stored in other devices (a database device, a storage apparatus, and the like), may be used. The weather information includes an air temperature, a humidity, a precipitation, a snowfall, a solar radiation intensity, a total amount of sunshine, and the like in each section. The mobility information includes a traffic volume, the number of routes, the number of stations, and the like in each section. A description will be given below of an example of using the weather information.
Moreover, the information processing apparatus 100-2b is connected to a storage apparatus 200-2b. The information processing apparatus 100-2b and the storage apparatus 200-2b may be connected to each other in any form; however, for example, can be connected to each other by a network such as the Internet. The network may be either a wired network or a wireless network, or may have a form in which both thereof are mixed with each other.
The storage apparatus 200-2b stores weather information 221. The weather information 221 includes, for example, an air temperature table that stores temperatures for each of the time slots and each of the sections, and a precipitation table that stores precipitations for each of the time slots and each of the sections. A description will be given below of an example of using the air temperature and the precipitation as the weather information.
Expression (7) is an example of an expression representing a basic pattern that uses an air temperature x(i)1 and a precipitation x(i)2.
$$F_j(t) = p^{(i)}_1 \, x^{(i)}_{1,t} + p^{(i)}_2 \, x^{(i)}_{2,t} + p^{(i)}_0 \tag{7}$$
Parameters {p(i)1, p(i)2, p(i)0} in Expression (7) are adjusted by, for example, the determination unit 103 in accordance with the actual values of the air temperatures and the precipitations in the past and with the predicted values thereof.
The weather information 221 may be referred to by the prediction unit 104 when predicting the human flow data. For example, the prediction unit 104 reads out the weather information corresponding to the time slot and the section, each being targets of predicting the human flow data, from the storage apparatus 200-2b, and uses the readout weather information for the prediction processing.
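The weather-based basic pattern of Expression (7) can be sketched as follows. This is an illustration only: the air temperature table and the precipitation table are assumed to be dictionaries keyed by section ID and time slot, and the name `weather_pattern`, the keys, and the values are hypothetical.

```python
def weather_pattern(section_id, slot, temperature, precipitation, p1, p2, p0):
    """Evaluate F_j(t) = p_1 * x_{1,t} + p_2 * x_{2,t} + p_0 as in Expression (7).

    temperature and precipitation map (section_id, slot) to the air temperature
    x_1 and the precipitation x_2 read from the weather information.
    """
    key = (section_id, slot)
    return p1 * temperature[key] + p2 * precipitation[key] + p0

# Made-up weather values for one section and one time slot.
temperature_table = {("S001", "2020/11/16 20:00:00"): 20.0}
precipitation_table = {("S001", "2020/11/16 20:00:00"): 5.0}
flow = weather_pattern("S001", "2020/11/16 20:00:00",
                       temperature_table, precipitation_table,
                       p1=2.0, p2=-3.0, p0=100.0)
```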
As described above, in the information processing apparatus according to the second embodiment, the model generation processing and the prediction processing can be executed by using the basic patterns represented in the variety of formats.
An information processing apparatus according to a third embodiment has a function to generate basic patterns.
The third embodiment is different from the first embodiment in that the pattern generation unit 106-3 is added. Other configurations and functions are similar to those in
The pattern generation unit 106-3 generates basic patterns for use in synthesizing a human flow model. For example, the pattern generation unit 106-3 generates one or more basic patterns by clustering actual values of human flow data measured in the past, and outputs the basic patterns to a management table 131 and a human flow table 132. More specifically, the pattern generation unit 106-3 classifies the actual values of the human flow data into multiple clusters on the basis of similarities between pieces of the human flow data, and generates multiple basic patterns, each corresponding to a different one of the multiple clusters, by using the data belonging to the corresponding cluster.
Note that the model generation unit 102 synthesizes the basic patterns generated as described above, thereby generating the human flow model. The entire flow of the model generation processing and the prediction processing in the third embodiment is similar to that in
Next, a description will be given of the pattern generation processing by the information processing apparatus 100-3 according to the third embodiment with reference to
The user designates learning conditions through the input device. The learning conditions include ranges of the periods and the sections of the past human flow data used for generating the basic patterns. Each period is represented by, for example, a starting time and an ending time. Each time is expressed in a format of “YYYY/MM/DD hh:mm:ss”. Moreover, the range of the sections is designated, for example, by one or more section IDs.
The acquisition unit 101 acquires the designated learning conditions (Step S301). The pattern generation unit 106-3 acquires, from the past data 121, the actual values of the past human flow data in the periods designated by the learning conditions and the sections identified by the section IDs (Step S302).
The pattern generation unit 106-3 executes the clustering that is based on the actual values (Step S303). First, the pattern generation unit 106-3 calculates similarities between the section IDs from the acquired actual values, and generates a similarity table.
A method for calculating the similarities may be any method of calculating a similarity between two pieces of series data. For example, the similarity may be a correlation coefficient or a cosine similarity. The similarity may also be the reciprocal of a distance, such as a dynamic time warping distance or a Euclidean distance.
Next, the pattern generation unit 106-3 executes the clustering on the basis of the calculated similarities. For example, the pattern generation unit 106-3 classifies, into one cluster, the section IDs whose similarities are equal to or greater than a preset threshold value.
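The similarity calculation and the threshold-based clustering can be sketched as follows. The source does not specify how pairwise similarities are turned into clusters, so this sketch assumes that sections linked by a similarity at or above the threshold, directly or through other sections, end up in the same cluster. The correlation coefficient is used as the similarity; the names `correlation` and `cluster_sections` and the data are hypothetical.

```python
from math import sqrt

def correlation(a, b):
    """Pearson correlation coefficient between two equally long series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant series is treated as uncorrelated
    return cov / sqrt(var_a * var_b)

def cluster_sections(series, threshold):
    """Group section IDs whose similarity is at or above threshold.

    series maps a section ID to its past actual values per time slot.
    Linked sections are merged with a small union-find structure.
    """
    ids = list(series)
    parent = {s: s for s in ids}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    for index, first in enumerate(ids):
        for second in ids[index + 1:]:
            if correlation(series[first], series[second]) >= threshold:
                parent[find(second)] = find(first)

    clusters = {}
    for s in ids:
        clusters.setdefault(find(s), []).append(s)
    return list(clusters.values())

# Made-up data: S1 and S2 peak in the same slot, S3 does not.
past_values = {
    "S1": [10.0, 50.0, 20.0],
    "S2": [12.0, 55.0, 22.0],
    "S3": [60.0, 10.0, 50.0],
}
clusters = cluster_sections(past_values, threshold=0.9)
```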
The pattern generation unit 106-3 generates one basic pattern for one cluster. That is, the pattern generation unit 106-3 generates basic patterns whose quantity is equivalent to the number of the clusters. Returning to
The pattern generation unit 106-3 determines a cluster to be processed from among one or more clusters obtained by the clustering (Step S304). For each of the time slots, the pattern generation unit 106-3 calculates the human flow data by an average value of the past actual values in the section IDs belonging to the determined cluster, thereby generating the basic pattern (Step S305).
The pattern generation unit 106-3 determines whether or not all the clusters are processed (Step S306). When all the clusters are not processed (Step S306: No), the processing returns to Step S304, and is repeated for the next cluster. When all the clusters are processed (Step S306: Yes), the pattern generation unit 106-3 outputs the generated basic pattern, for example, as the human flow table 132 (Step S307), and ends the pattern generation processing.
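The generation of one basic pattern per cluster by averaging can be sketched as follows. This is a minimal illustration, assuming that every section holds the same number of time slots; the name `basic_pattern_from_cluster` and the data are hypothetical.

```python
def basic_pattern_from_cluster(series, member_ids):
    """Average the past actual values of the cluster members for each time slot.

    series maps a section ID to its per-slot actual values; member_ids lists
    the section IDs that belong to one cluster.
    """
    if not member_ids:
        raise ValueError("a cluster must contain at least one section ID")
    per_slot = zip(*(series[section_id] for section_id in member_ids))
    return [sum(values) / len(member_ids) for values in per_slot]

# Made-up actual values and clusters.
past_values = {
    "S1": [10.0, 50.0, 20.0],
    "S2": [12.0, 54.0, 22.0],
    "S3": [60.0, 10.0, 50.0],
}
clusters = [["S1", "S2"], ["S3"]]
basic_patterns = [basic_pattern_from_cluster(past_values, c) for c in clusters]
```

The resulting list holds as many basic patterns as there are clusters, matching the loop over Steps S304 to S306.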
As described above, in the third embodiment, the basic pattern can be generated by using the actual values of the past human flow data, and can be used for the model generation processing and the prediction processing.
A description will be given below of an example of a display screen applicable to each of the above-described embodiments.
In
Note that pj,u in
As described above, in accordance with the first to third embodiments, it becomes possible to execute the prediction of the data, which uses the models, with higher accuracy.
Next, referring to
The information processing apparatus according to the first to third embodiments includes a control device such as a CPU 51, storage apparatuses such as a read only memory (ROM) 52 and a RAM 53, a communication I/F 54 for connecting to a network and performing communication, and a bus 61 for connecting the respective units to one another.
A computer program executed by the information processing apparatus according to the first to third embodiments is provided while being embedded in advance in the ROM 52 and the like.
The program executed by the information processing apparatus according to the first to third embodiments may be configured to be a file in an installable format or an executable format, and to be provided as a computer program product by being recorded on a computer-readable recording medium such as a compact disk read only memory (CD-ROM), a flexible disk (FD), a compact disk recordable (CD-R), or a digital versatile disk (DVD).
Moreover, the program executed by the information processing apparatus according to the first to third embodiments may be configured to be stored in a computer connected to a network such as the Internet, and to be provided by being downloaded over the network. Moreover, the program executed by the information processing apparatus according to the first to third embodiments may be configured to be provided or distributed over the network such as the Internet.
The program executed by the information processing apparatus according to the first to third embodiments can cause a computer to function as the above-mentioned respective units of the information processing apparatus. This computer can cause the CPU 51 to read out the program from the computer-readable storage medium onto the main storage apparatus, and to execute the same.
While given embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the inventions. Indeed, the novel embodiments described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the embodiments described herein may be made without departing from the spirit of the inventions. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the inventions.
Number | Date | Country | Kind |
---|---|---|---|
2021-075085 | Apr 2021 | JP | national |