This invention relates to agents. Businesses need to adapt to many competitive pressures. For example, financial markets are increasingly demanding that companies use capital more efficiently; other businesses are seeking global playing fields to maintain growth and diversify risk; customers are demanding service as markets of one and forcing companies toward mass customization; and innovation cycles are continually accelerating.
These pressures on businesses are driving changes that have enormous implications for supply networks. For some companies, shrinking capital availability is forcing them to streamline manufacturing and supply operations and build efficiencies, which are critical to the supply network. For other companies, information ubiquity is driving and facilitating globalization, which shrinks distances to markets and resources. Information ubiquity also requires levels of supply network visibility and collaboration that were not essential in traditional supply chains. Customers are armed with information about the real value of products, which is shrinking customer loyalty and requiring customer-service levels that are too expensive for companies unable to manage supply chain efficiencies. Finally, shrinkages in the time available to build and launch products are forcing companies to innovate at velocities far greater than before.
Ultimately, competitive pressures push profit margins lower. Product manufacturers must find ways to improve efficiency, thereby reducing costs, to survive in highly competitive markets. Supply chain efficiency plays a key role in improving margins and can be a determining factor in the success of manufacturers.
A supply chain is a network of facilities and distribution options that performs the functions of procuring materials, transforming the materials into semi-finished and finished products, and distributing the finished products to customers. Supply chain management (“SCM”) is a business policy that aims to improve all activities along the supply chain. SCM results in improved integration and visibility within individual companies, as well as flexibility across supply and demand environments. As a result, a company's competitive position is greatly enhanced by building supply networks that are more responsive than the current sequential supply chains.
SAP AG and SAP America, Inc. provide SCM solutions for product manufacturers to help them reach their goals. Some of the SCM solutions are based on the mySAP.com e-business platform. One of the building blocks of the e-business platform is the SAP R/3 component that provides enterprise resource planning functionality. The SAP R/3 product includes a Web Application Server (“Web AS”), an R/3 core, and various R/3 extensions. The SCM Extensions of R/3 provide various planning, coordination, execution, and optimization solutions that are associated with a supply chain.
In general, in one aspect, this invention provides methods and apparatus, including computer program products, implementing and using techniques for monitoring and predicting various data related to an inventory. A data processing apparatus interacts with one or more business computers to obtain one of replenishment and consumption activity data related to an inventory. A model for variations in the inventory is created. The model is used to determine estimates of variation for one of currently planned replenishment activities and currently planned consumption activities. The estimates of variation are used to predict future inventory levels.
Advantageous implementations can include one or more of the following features. Creating a model can include applying machine learning to create the model. Using the model for variations in inventory can include using an algorithm including a decision tree. Using the algorithm including a decision tree can include using a classification and regression tree. The classification and regression tree can be constructed using a growing stage and a pruning stage to provide one or more splitting points. The growing stage can include instructions for recursively splitting data using a criterion for maximizing the information resulting from each of the one or more splitting points.
The pruning stage can include instructions for removing splitting points. The criterion for determining splitting points can include applying analysis of variance techniques. The size of the decision tree can be selected based on a process including instructions for determining a size of the data; splitting data into multiple substantially equal-sized groups; constructing multiple test decision trees, where each test decision tree includes all of the groups except for one of the groups, such that test decision trees are created in which each group is excluded once; pruning each test decision tree at one or more degrees; testing each of the test decision trees by evaluating each test decision tree with the data excluded from creating that test decision tree; selecting a degree of pruning based on a maximization of the average performance for the excluded data; and pruning the decision tree to the selected degree of pruning.
The model can include one or more of data relating to: an update number specifying how many times an activity has been updated, a creation date specifying when an activity was first planned, a current date specifying the date when information in a record was extracted from the activity, an expected date specifying a date at which the activity is expected to be fulfilled, an expected quantity specifying an amount of material required by the activity, source of an order specifying an entity making the order, demand at a source in a temporal window prior to the expected date, actual delivered quantity, actual delivery date, time spread for use with activities that are completed in more than one piece, and activity cancelled information relating to cancellation of an activity.
A model for predicting variability of an actual delivery quantity can be created. The model can include a model for predicting variability of an actual delivery date. The model can include a model for predicting an order cancellation probability. It can be determined if a qualitative shift in behavior has occurred, and if a qualitative shift in behavior has occurred, data prior to the qualitative shift in behavior can be discarded and a decision tree can be created using data generated after the qualitative shift in behavior. The data can be processed based on the quantity of data. Processing the data can include sampling the data if the quantity of the data exceeds a threshold value. Processing the data can include using additional data if the quantity of the data is below a threshold value. Predicted future inventory levels can be displayed. The predicted future inventory levels can be displayed as a bicumulative display. The predicted future inventory levels can be displayed as a relative density display. An early warning agent can be included, the early warning agent having instructions to receive the predicted future inventory levels and use the predicted future inventory levels to determine whether to order additional stock for the inventory.
The data can be generated using a radio-frequency identification device. The data can be generated upon the sale of stock in an inventory. The data can be generated upon the transfer of stock in an inventory. An implementation can include one or more agents in a supply chain management system.
In general, in another aspect, this invention provides a system comprising one or more agent computers. The agent computers include instructions to interact with one or more business computers to obtain one of replenishment and consumption activity data related to an inventory; create a model for variations in the inventory; and use the model to determine estimates of variation for one of currently planned replenishment activities and currently planned consumption activities. The estimates of variation are used to predict future inventory levels.
The details of one or more implementations of the invention are set forth in the accompanying drawings and the description below. Other features and advantages of the invention will become apparent from the description, the drawings, and the claims.
Like reference symbols in the various drawings indicate like elements.
An agent is an active program that performs some high-level business functions, such as monitoring an inventory for a particular Stock Keeping Unit (SKU) for potential stock out situations and sending alerts when various thresholds are reached. For example, an agent can be programmed as an inventory early warning agent (“IEWA”) that monitors one or more inventories in a factory, warehouse, store, or shelf within a store. The IEWA is programmed to monitor the inventory based on real time data that is obtained, for example, from cash registers or radio-frequency identification devices that scan information relating to an item being purchased or transferred and use that information to update the IEWA as to the level of stock on a particular shelf.
As described in more detail below, the IEWA is a predictive and adaptive inventory management application that can be used to monitor and predict future inventory levels by modeling variability in both demand- and supply-related supply chain activities. The IEWA uses learning techniques that can estimate potential variation in inventory levels in the near future in order to identify potentially risky situations early enough to allow for corrective measures. To provide increased effectiveness, the IEWA is operated with minimal human intervention, parameter adjustment, and manual calibration. These characteristics are achieved with machine-learning techniques that recognize patterns of behavior from historical data around consumption and replenishment and around the performance of resources and supply chain partners. The IEWA can use two modes of operation for its predictive function: in a first mode, the detailed mode, the IEWA uses predictive models built from detailed activities; and in a second mode, the aggregate mode, the IEWA uses predictive models built from the aggregation of multiple activities.
In the detailed mode of operation, potential variation in inventory is estimated for each planned replenishment or consumption activity. For example, given historical performance data, the IEWA estimates whether a planned truckload of 12 ounce bottles of Bob's bottles will arrive any time within four hours prior to and six hours after the planned delivery time with a quantity that is between 95% and 100% of the requested quantity. A second stage of processing then combines the estimates of potential variation for individual activities into an estimate of the potential variation of the projected inventory levels.
In the aggregate mode of operation, potential variation of the projected inventory level is estimated directly from projected cumulative replenishment and consumption. This estimate is calculated as the aggregation of replenishment and consumption activities.
Both modes of operation provide advantages. For example, the detailed mode of operation is advantageous when low volumes of detailed data are available and when the degree of variation in activities is highly dependent on the particular resource or partner involved. The aggregate mode of operation is advantageous when it is necessary to deal with high volumes of data, or when it is difficult to track the outcome of any particular planned activity. The aggregate model is discussed below initially followed by a discussion of the detailed model. Nonetheless, the examples and descriptions of
As shown in
The IEWA 115 also can be used when the shelf is replenished. For example, when the shelf 105 is replenished, the shelf-monitoring program 120 sends a message to the execution module 140 along the message path 135, which sends a message to the distribution center 165 along the message path 170. These messages are used to update the distribution center of the amount of stock on the shelf 105. The distribution center can use the same IEWA or a separate IEWA to apply predictive models to the distribution center to estimate whether and when there will be an undesirable variation in inventory levels.
At another level, a store can use an IEWA to monitor the levels of an item on the shelf and in the inventory for one or more items to estimate potential undesirable variations in inventory levels. When items are sold, for example, by being scanned at a cash register, the IEWA takes that sales data and uses algorithms to determine whether there will be an undesirable variation in the inventory levels and when to send an order to replenish the shelf and/or order more of that item from a warehouse or distribution center.
At an even higher level, a warehouse or distribution center can use an IEWA to monitor the levels of an item within the warehouse, such as on shelves, on pallets, in quarantine, or at another location within the warehouse, to determine whether there will be an undesirable variation in the inventory levels and when to send an order to replenish. Customers of the warehouse, such as a retailer or a factory, order the item from the warehouse. For example, a consumer product goods (“CPG”) retailer can order a pallet load of an item, which the warehouse operator loads onto a delivery truck of either the warehouse, the retailer, or a third party logistics supplier. When the pallet is loaded on the truck, the warehouse operator can use a wireless communications device to notify the inventory management software that a pallet-load of a particular item has been transferred from the warehouse. Either the wireless communications device or the inventory management software can be programmed to notify the IEWA that a pallet-load of the particular item has been transferred from the warehouse. The IEWA takes that transfer data and analyzes it using algorithms to determine whether there is likely to be an undesirable variation in inventory levels and when to order additional stock of that item. One example of a framework of the IEWAs, an example of a messaging system used by the IEWAs, and the algorithms used by the IEWAs are described in more detail below.
As shown in
The MTS 220 is a service that is provided by a particular agent framework and allows agents to send messages, such as an AclMessage, to other agents. An AclMessage encapsulates a communication between two agents and has some characteristics in common with an email message, such as, for example, specifying a sender and recipient, having a subject, and providing message content.
The DF 225 is a service provided by a particular agent framework and provides the framework's agents with access to the central directory service. The directory service is a central service that provides identity (white page) and capability (yellow page) search facilities across an agent community. There is one directory service for one agent community. A directory service might be federated with the directory service for another agent community. An agent community is a collection of agent frameworks (and therefore agents) that collaborate to provide various business functions. A community can consist of several agent frameworks, and each framework can in turn contain several agents.
The AMS 230 is a service provided by a particular agent framework and provides agent lifecycle management facilities within the framework. The facilities allow remote management of a community of agent frameworks by essentially providing an external interface to the AMS in each agent framework. For example, the AMS allows administrators and control agents to manage the execution of other agents by stopping, starting and suspending agents. The agent framework architecture 200 also includes an administrative user interface (“AUI”) that allows a system administrator to manage an agent community. The AUI uses the directory service to identify currently running agent frameworks. It then uses each framework's AMS to manage the agents within that framework.
The post office 235 is a service provided by a particular agent framework. The post office 235 receives and maintains AclMessages addressed to the agents within that framework.
The directory facilitator 225 (“DF”) in each agent framework 217 updates the directory service 240, registering new agents as they are started up and deregistering agents as they are shut down. The directory service 240 then allows agents to search for other agents, both by identity (for example, using a service that is analogous to a white page service) and capability (for example, using a service that is analogous to a yellow page service). The DF 225 provides access to this functionality within each agent framework 217.
In the scenario of
As described briefly above, the agents discussed above can perform analysis of the inventory data that they receive to determine if there is likely to be a potential variation in inventory levels and, if so, when to request a replenishment of the inventory from a warehouse or the production of goods by a manufacturer to replenish the inventory in a warehouse or at a store. The analysis can be a simple process involving the comparison of inventory on the shelf or in a warehouse to a threshold value. If the inventory level is above the threshold value, the agent does not request replenishment, but if the inventory level is at or below the threshold value, the agent requests replenishment. Such an analysis does not take into account the likelihood that the stock on the shelf or in the warehouse will be sold quickly or slowly or that there can be some variability in the rate at which the stock is sold. Moreover, such an analysis does not take into account the likelihood that the warehouse or manufacturer will have sufficient stock on hand to supply the store, much less the likelihood that the warehouse or manufacturer can make a timely delivery. To ensure that the store, warehouse, or manufacturer has adequate quantities of stock to meet its needs while also minimizing excessive inventory levels, the agent 205 uses a conditional probabilistic predictive statistical analysis to predict an expected inventory level, an upside confidence bound, and a downside confidence bound. For many companies, an expected inventory level that is near or below zero is undesirable. The company can determine the minimum low level, compare the expected inventory level to the company's minimum low level, and order more stock if these expected levels are below the minimum low level.
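For illustration only, the difference between a simple threshold rule and the confidence-bound approach described above can be sketched as follows; the function and variable names are hypothetical, and the expected level and downside bound are assumed to be supplied by the statistical model described below.

```python
# Minimal sketch comparing a simple threshold rule with a confidence-bound
# rule. The names and the form of the prediction are illustrative assumptions,
# not the actual agent interface.

def threshold_rule(current_level, reorder_threshold):
    """Order only when the observed level falls to or below a fixed threshold."""
    return current_level <= reorder_threshold

def confidence_bound_rule(expected_level, downside_bound, minimum_low_level):
    """Order when the predicted expected level or its downside confidence bound
    falls below the company's minimum acceptable level."""
    return downside_bound < minimum_low_level or expected_level < minimum_low_level

# Example: the threshold rule sees 120 units on hand and does nothing, while
# the predictive rule reacts because the downside bound is already below the
# minimum low level of 50 units.
print(threshold_rule(current_level=120, reorder_threshold=100))            # False
print(confidence_bound_rule(expected_level=80, downside_bound=30,
                            minimum_low_level=50))                          # True
```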
As shown in
Any suitable statistical method can be used to determine the inventory level and confidence bounds. For example, the statistical analysis used by the agent can be implemented with a predictive statistical model that can predict the variability at any point along the planned inventory trajectory, can be started with very little historical inventory data, and can improve as the data is generated over time. The plot of planned and actual inventory trajectories illustrated in
One such statistical model that can be used is a probabilistic inference model based on a conditional Gaussian approximation. This model, its derivation, and its application to the inventory early warning agent are explained in more detail below.
To derive the model used for explanation purposes, there is an assumption of a known, planned forecast for inventory levels over time. The planned forecast is a piecewise constant function specified by its breakpoints. As illustrated in
To actually develop a prediction algorithm, assume that inventory level plans are generated at regular fixed intervals, $\Delta$ (for example, every day at midnight), and set the origin of time to be the time at which there is the earliest historical plan in the dataset. This origin of time can be recent and provide a very limited number of data points. Thus, plans are generated at times $n\Delta$ for $n\in\mathbb{Z}^{+}$, where $n$ is the index of the most recent planning period. Further, assume that all plans are generated out to a horizon of $\tau$ ($>\Delta$) and indicate the inventory levels of the plan generated at time $n\Delta$ by $I_{n}^{p}(n\Delta+\delta)$, where $0\le\delta\le\tau$ is a continuous quantity. Because plans are generated at midnight, $\delta=0$ in the future. This inventory plan is piecewise constant and specified by its breakpoints $(I_{n}^{p}(b),\delta_{n}^{p}(b))$ for $\delta_{n}^{p}(b)\ge 0$ and $b\in[1,\ldots,B_{n}^{p}]$, where $B_{n}^{p}$ is the number of breakpoints defining the $n$th plan. For conciseness, define $I_{n}^{p}(\cdot)$ and $\delta_{n}^{p}(\cdot)$ to be the $B_{n}^{p}$-vectors with components $[I_{n}^{p}(1)\ \ldots\ I_{n}^{p}(B_{n}^{p})]^{T}$ and $[\delta_{n}^{p}(1)\ \ldots\ \delta_{n}^{p}(B_{n}^{p})]^{T}$. Furthermore, let $I^{a}(t)$ be the actual inventory at time $t>0$. The inventory is not measured at all times, but can be interpolated from a piecewise constant function. The breakpoints defining the actual inventory are given by $(I^{a}(b),t^{a}(b))$ for $b\in[1,\ldots,B^{a}]$, where $B^{a}$ is the number of breakpoints defining the actual inventory and all $t^{a}(b)\ge 0$. $I^{a}(\cdot)$ and $t^{a}(\cdot)$ are the $B^{a}$-vectors $[I^{a}(1)\ \ldots\ I^{a}(B^{a})]^{T}$ and $[t^{a}(1)\ \ldots\ t^{a}(B^{a})]^{T}$, respectively.
Based on the definitions above, for any time t there are a number of plans that predict the inventory at that time. For example, the most recent plan was formed at
Based on the above notation, it is straightforward to build the model and specify the data from which to build it. Specifically, if $n$ is the index of the most recent planning period, the times at which the model must predict the inventory levels are given by $t_\beta=n\Delta+\delta_\beta$. By defining the $q$-vector $\delta\equiv[\delta_1,\ldots,\delta_q]^{T}$, the model can be used to determine $P(I\mid n,\delta,I_{1:N}^{p}(\cdot),\delta_{1:N}^{p}(\cdot),I^{a}(\cdot),t^{a}(\cdot))$, where $I_{1:N}^{p}(\cdot)=[I_{1}^{p}(\cdot)\ \ldots\ I_{N}^{p}(\cdot)]^{T}$ and $\delta_{1:N}^{p}(\cdot)=[\delta_{1}^{p}(\cdot)\ \ldots\ \delta_{N}^{p}(\cdot)]^{T}$. To understand how the algorithm works, it is helpful to understand how order streams are converted to the inventory levels that serve as the input to the aggregate predictor. Since the algorithm works entirely with inventory levels (whether planned or actual), consumption and replenishment order streams need to be converted into inventory levels. Conceptually, this is straightforward: replenishment orders increase the inventory while consumption orders decrease inventory. The only complication is that consumption and replenishment are only measured at certain times, such that there is no data describing what happened to the inventory between those times at which the inventory was measured. Additionally, the times at which the measurements are taken can be irregularly spaced. Both of these problems are easily solved with a model of inventory levels between measurements.
The model of inventory levels is based on scenarios to which the inventory is actually subjected. As such, there are a variety of occurrences that might happen to the inventory levels between measurements. The simplest case is that the inventory was unchanging between measurements. This is termed the piecewise constant case, since it is possible to interpolate inventory levels between measurements by using a piecewise constant function. Alternatively, if successive inventories are measured as $(I_1, t_1)$ and $(I_2, t_2)$, it is possible to linearly interpolate the inventory level at time $t$ (where $t_1\le t\le t_2$) as $I(t)=I_1+(I_2-I_1)(t-t_1)/(t_2-t_1)$. The model is based on the assumption of a piecewise constant rather than piecewise linear interpolation, although either choice or another interpolation mechanism is acceptable for data preparation since the algorithm is independent of the choice. All the algorithm requires is that the inventory levels be measured at equally spaced intervals, $\Delta$, and this can always be accomplished by interpolation using either method.
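This data-preparation step can be sketched as follows. The sketch resamples irregularly timed inventory measurements onto an equally spaced grid of width Δ using either piecewise-constant or linear interpolation; the function name and interface are illustrative assumptions.

```python
import numpy as np

def resample_inventory(times, levels, delta, method="constant"):
    """Interpolate irregularly spaced inventory measurements onto a grid of
    spacing `delta`. `method` is "constant" (piecewise constant) or "linear"."""
    times = np.asarray(times, dtype=float)
    levels = np.asarray(levels, dtype=float)
    grid = np.arange(times[0], times[-1] + delta, delta)
    if method == "linear":
        # I(t) = I1 + (I2 - I1) * (t - t1) / (t2 - t1) between measurements
        return grid, np.interp(grid, times, levels)
    # Piecewise constant: carry the most recent measurement forward.
    idx = np.searchsorted(times, grid, side="right") - 1
    return grid, levels[np.clip(idx, 0, len(levels) - 1)]

# Example usage with irregular measurement times.
t, I = resample_inventory([0.0, 1.5, 4.0], [100, 80, 120], delta=1.0)
print(t, I)
```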
The particular approach used with the IEWA is based on the essential idea of exploiting correlations in errors between planned inventories and the actual inventories themselves. These correlations are used to estimate a probability density for a number of errors and then condition this density on the available data at the time a prediction needs to be made to find the likely errors at the time of prediction. The most likely correction to the planned inventory can then be applied to estimate the most likely actual inventory. The full probability can be used to define confidence intervals around this most likely prediction. Examples of correlations in errors include seasonal errors in inventory, such as seasonal changes in inventory related to holidays or other seasonal events.
The estimation of the actual inventories uses an estimation of the systematic errors made in the planning process. Consequently, the model uses an error function defined so that $I^{a}(t)=I_{n}^{p}(t-n\Delta)+f_{n}(t-n\Delta)$, where $n\Delta$ is the time at which the plan was generated and which must satisfy $t-n\Delta\le\tau$. For prediction into the future the model needs only an estimate of $f_{n}(t)$, since $I(n\Delta+\delta_\beta)=I^{a}(n\Delta+\delta_\beta)=I_{m}^{p}((n-m)\Delta+\delta_\beta)+f_{m}((n-m)\Delta+\delta_\beta)$ for all $q$ choices of $\delta$. In the model, $f_n$ is defined as $f_n=[f_n(n\Delta+\delta_1)\ \ldots\ f_n(n\Delta+\delta_q)]^{T}$, the vector of errors made by the $n$th plan as it forecasts the future. Knowing the probability density for $f_n$ is all that is needed, since the density for $I^{a}$ simply shifts the mean.
The next step is to estimate $P(f_n)$ given the historical data by exploiting two types of correlations that likely exist in the data. First, the elements of $f_n$ are likely correlated because if inventory is high (or low) at time $n\Delta+\delta_1$ then it is also likely to be high (or low) at the later time $n\Delta+\delta_2$, so long as $\delta_2$ is not much greater than $\delta_1$. Moreover, it is reasonable to assume that the plans formed at subsequent planning periods will be correlated, since the more recent plan is probably an update of an older plan. To capture such relationships in the historical data, the algorithm uses a model of the joint density $P(f_n, f_{n-1}, \ldots, f_{n-p})$. A reasonable choice of $p$ is $p=\tau$, although a good choice of $p$ is that value determined by an autoregression algorithm. A suitable model for the joint density is that of a Gaussian joint density, as follows:
where $\tilde{f}$ is the vector of length $(p+1)q$ given by
As described in more detail below, this Gaussian model can be used for predicting inventory levels and calculating the mean and covariance.
If predicting forward from time $n\Delta$, $P(f_n)$ gives the distribution over the possibilities from which it is possible to infer expected values and confidence intervals. In general, this is obtained from $P(\tilde{f}_n)$ by marginalizing over the previous errors $f_{n-1}, \ldots, f_{n-p}$. However, the inventory situation is not this simple, because it is desirable to condition the estimate on previously observed errors and not simply use a static estimate. In addition, many of the elements of the previous errors $f_{n-i}$ are for times greater than $n\Delta$ and consequently are unknown for the purposes of prediction. The unknown elements from $f_{n-i}$ are those for which $(n-i)\Delta+\delta_\alpha>n\Delta$, or more simply those for which $\delta_\alpha>i\Delta$.
As a solution, define
where,
and
The density that then must be determined from $P(\tilde{f}_n)$ is
which is easily determined by integration and an application of Bayes rule.
The first step is integrating to determine
as illustrated below:
where
is given by Eq. (0.1). The integration results in another Gaussian distribution with mean $\mu_\le$ and covariance $\Sigma_\le$. If $N_>$ is a subset of $\{1, 2, \ldots, (p+1)q\}$ giving the indices in $\tilde{f}_n$ of
and $N_\le=\{1, 2, \ldots, (p+1)q\}\setminus N_>$ then
The notation $a(N_r)$ indicates the $|N_r|$-vector formed from $a$ by keeping the elements in the ordered index set $N_r$. The elements are ordered according to the order in $N$. Similarly, $A(N_r,N_c)$ is the $|N_r|\times|N_c|$ matrix formed from $A$ by keeping the rows in $N_r$ and the columns in $N_c$ in their respective orders. This description uses the common notation that $|S|$ gives the number of elements in the set or vector $S$.
Applying Bayes rule results in
A standard calculation yields this conditional probability as Gaussian having mean and covariance given by
The above formula assumes that $\Sigma_2$ is invertible. In many situations this can not be the case (for example, if the planned and actual activities are always in complete agreement). As such, Eq. (0.4) is modified to be:
where $I$ is the identity matrix having the same size as $\Sigma_2$ and $\lambda$ is a small positive quantity (for example, $10^{-5}$). In Eq. (0.5) the $\mu$ vectors are given by
$\mu_1=\mu_\le(N_1), \quad \mu_2=\mu_\le(N_2)$  (0.6)
and the $\Sigma$ matrices are defined by
$\Sigma_1=\Sigma_\le(N_1,N_1), \quad \Sigma_{1,2}=\Sigma_\le(N_1,N_2), \quad \Sigma_{2,1}=\Sigma_\le(N_2,N_1), \quad \text{and} \quad \Sigma_2=\Sigma_\le(N_2,N_2)$  (0.7)
where the index sets are $N_1=\{1, 2, \ldots, q\}$ and
Using the above results, the complete algorithm can be used to make the prediction as follows: (1) assuming a value of $p$ (or having the algorithm determine $p$), construct an estimate of
from the historical data; (2) construct the index set $N_\le$ and form $\mu_\le$ and $\Sigma_\le$ from
according to Eq. (0.3); (3) using historical error values, construct the vectors of past errors
(4) build
according to Eqs. (0.5), (0.6) and (0.7); and (5) return
as the predicted actual inventory levels and return
as the covariance on this estimate.
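As an illustration of the conditioning step at the heart of the prediction, including the λI regularization of Eq. (0.5), the following sketch assumes the joint mean and covariance over the unknown block N1 and the observed block N2 have already been estimated; the function name and data layout are assumptions.

```python
import numpy as np

def condition_gaussian(mu1, mu2, S1, S12, S21, S2, observed, lam=1e-5):
    """Return the mean and covariance of the unknown errors given the observed
    past errors, using the regularized inverse (Sigma2 + lambda*I)^-1."""
    reg = np.linalg.inv(S2 + lam * np.eye(S2.shape[0]))
    mean = mu1 + S12 @ reg @ (observed - mu2)
    cov = S1 - S12 @ reg @ S21
    return mean, cov

# Example with a 2-dimensional unknown block and a 2-dimensional observed block.
mu = np.array([0.0, 0.0, 1.0, -1.0])
Sigma = np.array([[2.0, 0.5, 0.3, 0.1],
                  [0.5, 1.5, 0.2, 0.4],
                  [0.3, 0.2, 1.0, 0.3],
                  [0.1, 0.4, 0.3, 1.2]])
N1, N2 = [0, 1], [2, 3]
mean, cov = condition_gaussian(mu[N1], mu[N2],
                               Sigma[np.ix_(N1, N1)], Sigma[np.ix_(N1, N2)],
                               Sigma[np.ix_(N2, N1)], Sigma[np.ix_(N2, N2)],
                               observed=np.array([1.4, -0.6]))
# `mean` shifts the planned inventory to give the predicted actual levels, and
# `cov` gives the covariance used for the confidence bounds.
print(mean, cov)
```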
The first two steps can be done offline in a discrete calculation since they have no dependence on the time for which the prediction is being made. To accomplish the first step and determine the parameters of the Gaussian, there are a number of approaches or models that can be used. A first model is based on frequency counts, while a second and a third method use an autoregressive model of the multivariate time series $\{f_i\}$. The second and third models differ in the underlying simplifications made, with the second model being more accurate but more complex.
If there is sufficient historical inventory data, the simplest possible way to determine the parameters of the Gaussian is through unbiased frequency count estimates. From the historical record it is relatively easy to build up the following estimates:
As additional inventory data is generated, these estimates are easily updated online. As such, the model is adaptive to and learns from changes in conditions.
Because subsequent plans are likely to be related to each other (for example, the latter plan is probably a modification of the former plan) it is a reasonable assumption to model the time series {fi} as an autoregressive process of order p:
where the errors, $\varepsilon_n$, are modeled as i.i.d. zero-mean Gaussian variables described by $N(0, C_p)$. There are many software packages that will efficiently estimate the parameters by maximizing the Gaussian likelihood (for example, least squares), so the model uses the assumption that $w_p$, $C_p$, and all $A_i^p$ are available. Some packages will attempt to determine the best value of $p$ using a Bayesian information criterion. However, a natural choice in the present setting is $p=\tau$. The proposed predictive algorithm makes use of a number of values of $p$.
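For illustration, the autoregressive parameters can be estimated by ordinary least squares, as suggested above. The following sketch fits a multivariate AR(p) model to a history of error vectors; the function name and layout are assumptions, and in practice a dedicated time-series package could be used instead.

```python
import numpy as np

def fit_var(errors, p):
    """Least-squares fit of f_n = w + A_1 f_{n-1} + ... + A_p f_{n-p} + eps.
    `errors` is a (T, q) array of historical error vectors."""
    T, q = errors.shape
    # Build the regression matrix: a constant plus the p previous error vectors.
    X = np.hstack([np.ones((T - p, 1))] +
                  [errors[p - i:T - i] for i in range(1, p + 1)])
    Y = errors[p:]
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    w = coef[0]                                   # intercept w_p
    A = [coef[1 + (i - 1) * q: 1 + i * q].T for i in range(1, p + 1)]  # A_i^p
    resid = Y - X @ coef
    C = np.cov(resid.T)                           # noise covariance estimate C_p
    return w, A, C

# Example: fit an AR(2) model to synthetic 3-dimensional error data.
rng = np.random.default_rng(0)
hist = rng.normal(size=(50, 3))
w, A, C = fit_var(hist, p=2)
print(w.shape, A[0].shape, C.shape)
```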
This probability can be determined as follows:
P(fn, fn−1, . . . , fn−p)=P(fn|fn−1, . . . , fn−p)P(fn−1, . . . , fn−p) where
The model uses a simplification in order to tractably model P(fn−1, . . . , fn−p), which can be written as
The remaining probability $P(f)$ is assumed to be Gaussian; therefore, $P(f)=\exp[-(f-\mu)^{T}\Sigma^{-1}(f-\mu)/2]/\sqrt{(2\pi)^{q}\det\Sigma}$, where $\mu$ and $\Sigma$ can be estimated by simple frequency counts from the historical data:
The joint probability is then
where $\tilde{f}=[f_n\ f_{n-1}\ \ldots\ f_{n-p}]^{T}$ and the parameters of the Gaussian are
$\Sigma^{-1}$ can be inverted using the identity that if
In the present case take $B=\Sigma^{-1}$,
for $i\in[2,\ldots,p]$ where
is the Cholesky decomposition of $C^{-1}$ to find
and, by multiplying, the mean is found to be:
As noted above, P(fn−1, . . . , fn−p) was simplified to the form
This can be improved by writing
As noted in the above equations, each conditional probability can be modeled as an autoregressive process of successively smaller orders. The parameters of each process can be determined from the historical times series. For the final unconditional probability the model uses the same Gaussian assumption as before. Each conditional probability for the AR(p−i) process is written as
and the unconditional probability for fn−p is the same as before having mean μ and covariance Σ. Combining all these results shows that the joint density P(fn, . . . , fn−p) is Gaussian with precision (inverse covariance) given by
(whose rows and columns have been ordered according to $\tilde{f}$), which can be formed as
where $C_0\equiv\Sigma$ and the $(i+1)q\times q$ matrices $V_i$ are given by
and $A^p$ is the $q\times pq$ matrix given by $A^p=[A_1^p\ A_2^p\ \ldots\ A_p^p]$. Given this particular form for the precision matrix, its Cholesky decomposition can immediately be written down as
In this form the inverse is easier to calculate. If an inverse of the form below is necessary:
then the Q matrices must satisfy:
where A0i=Q0i≡−I. The iterative solution satisfying the requirement is given by
Thus the covariance matrix,
is equal to
This matrix can be easily implemented in software as executable instructions to provide the covariance.
Having determined the covariance matrix, the mean is determined by completing the square:
with
as above. Again this matrix and its square are easily implemented in software as executable instructions.
Although complex on paper, the algorithm is computationally very efficient. During the training phase the covariance matrix and its mean can be estimated with a single pass through the data by recording frequency counts. If there are D data points then the complexity of accomplishing both tasks is of the order O(D(p+1)q((p+1)q+1)). Prediction is not much more complicated. The dominant factor is the computation of
which is of order $O(|N_2|^{3})$. Moreover, this matrix inverse can be stored and need not be recalculated with each prediction.
To be used as an online algorithm, the above algorithm must be modified so that it can operate in an online mode in which all parameters of the algorithm (that is, the mean and covariance) are continually updated as new data arrives so that it is adaptive. This is straightforward because updates to the mean and covariance are simple to implement in software. The algorithm also can be modified to discount data from the past by a factor 0≦λ≦1. This allows the algorithm to learn from recent events and adapt to changing conditions. Thus, if the data f1, . . . , fT−2, fT−1, fT is available at time T, the estimate of the mean is
and the estimate of the covariance at time T as
where the denominator is given by $D_T(\lambda)=1+\lambda+\cdots+\lambda^{T-1}=(1-\lambda^{T})/(1-\lambda)$. With these definitions the algorithm can be implemented in software to provide running estimates of the mean and covariance and update them as
The discount factor λ should be set to match the time scale for variation of the problem. If behavior is stationary then λ=1 otherwise λ should be set from the data. One quick and basic manner to accomplish this is to match some characteristic time of variation,
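A sketch of the discounted online update is shown below. It keeps running, exponentially discounted estimates of the mean and covariance of the error vectors and refreshes them as each new error vector arrives; the class name, the particular covariance form, and the example discount factor are illustrative assumptions.

```python
import numpy as np

class DiscountedEstimator:
    """Running estimates of mean and covariance with discount factor lam.
    lam = 1 corresponds to stationary behavior (no discounting)."""

    def __init__(self, dim, lam=0.98):
        self.lam = lam
        self.weight = 0.0                    # D_T(lam) = 1 + lam + ... + lam^(T-1)
        self.sum_f = np.zeros(dim)           # discounted sum of f
        self.sum_ff = np.zeros((dim, dim))   # discounted sum of f f^T

    def update(self, f):
        f = np.asarray(f, dtype=float)
        self.weight = 1.0 + self.lam * self.weight
        self.sum_f = f + self.lam * self.sum_f
        self.sum_ff = np.outer(f, f) + self.lam * self.sum_ff
        mean = self.sum_f / self.weight
        cov = self.sum_ff / self.weight - np.outer(mean, mean)
        return mean, cov

# Example: feed error vectors one at a time as new data arrives.
est = DiscountedEstimator(dim=3, lam=0.95)
for f in np.random.default_rng(1).normal(size=(10, 3)):
    mean, cov = est.update(f)
print(mean, cov)
```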
As shown in
As noted above, the IEWA can use two modes of operation for the predictive function: the aggregate mode and the detailed mode. The aggregate mode, described above, generally uses real time data and accumulated data. The detailed mode is based on a detailed model that uses detailed data that has been previously generated. Using the detailed model, the prediction task is split between two agents: (1) an Activity Risk Agent, and (2) an Early Warning Agent. The Activity Risk Agent learns about potential variation in activities based on historical performance, determines estimates of variation for currently planned activities, and provides those estimates to the Early Warning Agent. The Activity Risk Agent uses individual activity records for this learning about potential variation. The activity records contain both promised and actual delivery time and quantity. Using this data, the detailed model generates predictions for actual delivery dates and quantities for not-yet-completed activities.
The detailed model uses decision trees as the initial approach to machine learning. In one implementation, the detailed model uses Classification and Regression Trees (CART®) because these trees: (1) are easy to train and maintain, and (2) perform nearly as well as other machine learning techniques, such as neural networks, that are usually much harder to train, and from which knowledge cannot be extracted easily. Furthermore, classification and regression trees can be implemented and applied to data that consists of both continuous and discrete attributes. However, the use of other machine learning techniques is not precluded by this choice.
The general assumption upon which the detailed model is based is the observation that different suppliers or consumers of a particular material can have behavioral patterns that determine how much and when they deliver or consume materials with respect to what they promise in their order. For example, some suppliers can follow their delivery schedule very closely, some suppliers can be consistently late in delivering their orders, and some consumers can cancel orders frequently. These behavioral patterns can be learned if they are statistically significant. For example, some consumers, such as retailers, can regularly cancel orders after holidays because of prior overstocking of their inventories in anticipation of unrealized sales. Similarly, some suppliers can regularly deliver orders late around holidays because, for example, they regularly over-commit to their customers. Decision trees can be used to learn these patterns.
Generally, an algorithm used in the detailed model involves the building of a decision tree, determining splitting points on the decision tree, pruning the decision tree, extracting precise measurements of variability from the decision tree, and date pruning. A decision tree predicts an attribute by asking a succession of yes/no questions about known predictors. Each non-leaf node in the tree contains a question, and each leaf node contains a prediction. Such predictions can be, for example, single values plus standard deviations when the response is a continuous variable, or a class annotated with a probability when the response is a discrete variable (in which case the tree is known as a classification tree).
The questions in non-leaf nodes refer to values of predictor variables. When those variables are continuous, the questions are in the form “is x less than C?” where C is a constant value. When the variables are discrete, the questions are in the form “does x belong to one of the classes A, C or F?” where A, C and F are some of the classes to which the discrete variable can belong.
The task of extracting a prediction from a decision tree can be relatively simple, as it involves traversing the tree according to the questions the tree poses. Building the tree itself, however, can be quite difficult. For this purpose, data-mining techniques are used that infer a tree from detailed data where the response variable is known. Typically, these techniques seek to find splitting points in the predictors (that is, these techniques seek questions) such that the split is maximally informative, thus minimizing the number of subsequent questions needed to reach a prediction.
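For illustration, extracting a prediction by traversing such a tree can be sketched as follows; the node structure and field names are assumptions made for the example rather than the model's actual representation.

```python
# Minimal sketch of prediction by tree traversal. Each non-leaf node holds a
# question on one predictor; each leaf holds a prediction (here a mean value
# and a standard deviation for a continuous response).

class Node:
    def __init__(self, feature=None, threshold=None, classes=None,
                 left=None, right=None, prediction=None):
        self.feature = feature        # predictor the question refers to
        self.threshold = threshold    # used when the predictor is continuous
        self.classes = classes        # used when the predictor is discrete
        self.left, self.right = left, right
        self.prediction = prediction  # set only on leaf nodes

def predict(node, record):
    while node.prediction is None:
        value = record[node.feature]
        if node.threshold is not None:       # "is x less than C?"
            go_left = value < node.threshold
        else:                                 # "does x belong to classes A, C or F?"
            go_left = value in node.classes
        node = node.left if go_left else node.right
    return node.prediction

# Example: a two-question tree predicting delivery-date variation (in days).
tree = Node(feature="source", classes={"supplier_a"},
            left=Node(prediction=(0.5, 0.2)),
            right=Node(feature="expected_quantity", threshold=1000,
                       left=Node(prediction=(1.0, 0.5)),
                       right=Node(prediction=(2.5, 1.0))))
print(predict(tree, {"source": "supplier_b", "expected_quantity": 1500}))
```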
One particular type of decision tree is the CART®, noted above. This type of decision tree advantageously can be rapidly inferred from data that can contain both continuous and discrete predictors. Moreover, the response variable can be continuous or discrete itself.
Classification and regression trees are constructed in two stages: (1) a growing stage and (2) a pruning stage. In the first, growing stage, the data is recursively split using a criterion that makes the split maximally informative. In the second, pruning stage, the tree is pruned back to avoid over-fitting and to obtain an adequate level of generality.
The growing part of the algorithm works according to the following general procedure. Initially, for all predictors, select those predictors in which the data exhibits the possibility of an informative split. If, however, no predictors meet this criterion, exit. Otherwise, choose the split that is most informative, and repeat until no more splits are considered worthy.
The criterion used for seeking split points when the response variable is continuous is based on Analysis of Variance (ANOVA) techniques. Using ANOVA, if the predictor variable x itself is continuous, it is sorted in ascending order, and then a point xs is chosen such that the expression
is maximized, where y is the response variable. Once such a point, xs, is chosen, the above expression on that point serves as the score for the split on that variable (normalized with respect to the mean value of the response). By normalizing the variable, its worthiness can be evaluated globally.
The Analysis of Variance approach also can be used for discrete predictor variables. For discrete predictor variables, each of the classes that the predictor variable can belong to is ranked by the mean of their corresponding response values, and then a split is chosen using the same criterion as above by shuffling classes between two groups starting with those of smaller rank and moving them in a Graycode order.
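The split search for a continuous predictor can be sketched as shown below. Because the exact expression maximized is not reproduced above, the sketch uses a standard ANOVA-style between-group sum of squares, normalized by the response mean as described; it should be read as an illustrative stand-in rather than the precise criterion.

```python
import numpy as np

def best_anova_split(x, y):
    """Scan candidate split points x_s of a continuous predictor x and return
    the split and score that maximize a between-group sum of squares."""
    order = np.argsort(x)
    x, y = np.asarray(x, dtype=float)[order], np.asarray(y, dtype=float)[order]
    best_score, best_split = -np.inf, None
    for i in range(1, len(x)):
        if x[i] == x[i - 1]:
            continue                       # no valid split between equal values
        left, right = y[:i], y[i:]
        # Between-group sum of squares for the two candidate groups.
        score = (len(left) * (left.mean() - y.mean()) ** 2 +
                 len(right) * (right.mean() - y.mean()) ** 2)
        if score > best_score:
            best_score, best_split = score, (x[i - 1] + x[i]) / 2.0
    return best_split, best_score / abs(y.mean())   # normalize by response mean

# Example: the response jumps when the predictor exceeds roughly 5.
x = np.array([1, 2, 3, 4, 6, 7, 8, 9], dtype=float)
y = np.array([10, 11, 10, 12, 30, 31, 29, 32], dtype=float)
print(best_anova_split(x, y))
```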
When the response is discrete, the Gini rule is used, which requires the searching for a splitting point that isolates most frequently the largest class in the response. For this purpose, assume that the predictor is continuous, and that, as above, the splitting point is xs. This point partitions the response into two sets, PL and PR. For each response class j, define p(j|L) and p(j|R), the probabilities that a member of class j belongs to PL or PR, respectively, as
Then, the expression to minimize is the total impurity, given by
where I(p) is the impurity function, and can be defined either as 1−p2 or −p log(p).
As before, the procedure is similar when the predictor is discrete, in which case the split point is selected using Graycode-ranking.
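A corresponding sketch for a discrete response is shown below. It scans split points of a continuous predictor and selects the one minimizing the total impurity, using the impurity form 1−p² mentioned above; the size-weighting of the two partitions is an assumption made for the example.

```python
import numpy as np

def gini(labels):
    """Impurity of a set of class labels, using I(p) = 1 - p^2 summed over classes."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_gini_split(x, labels):
    """Choose the split point x_s of a continuous predictor that minimizes the
    size-weighted total impurity of the two resulting partitions P_L and P_R."""
    order = np.argsort(x)
    x, labels = np.asarray(x)[order], np.asarray(labels)[order]
    best_impurity, best_split = np.inf, None
    n = len(x)
    for i in range(1, n):
        if x[i] == x[i - 1]:
            continue
        total = (i / n) * gini(labels[:i]) + ((n - i) / n) * gini(labels[i:])
        if total < best_impurity:
            best_impurity, best_split = total, (x[i - 1] + x[i]) / 2.0
    return best_split, best_impurity

# Example: larger orders tend to be cancelled in this toy data.
quantity = np.array([10, 20, 30, 40, 60, 70, 80, 90], dtype=float)
cancelled = np.array(["no", "no", "no", "no", "yes", "yes", "no", "yes"])
print(best_gini_split(quantity, cancelled))
```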
The optimal size of a decision tree is inferred by cross-validation. The training data is split into multiple equal-sized buckets, and separate trees are grown for subsets of the data that exclude one of the buckets respectively. Each tree is then pruned at all possible degrees, and for each degree, the tree's performance is evaluated using the bucket that was excluded in its construction. The best degree of pruning is then chosen as the one that maximizes average performance across the data in all held-out-buckets. The original tree is then pruned to the chosen degree.
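As a concrete stand-in for this cross-validation procedure, the sketch below uses scikit-learn's cost-complexity pruning path together with cross-validation to pick the degree of pruning; the library and the ccp_alpha parameter are substitutions for illustration, not necessarily the pruning mechanism used by the agent.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                          # toy predictors
y = X[:, 0] * 2.0 + rng.normal(scale=0.5, size=200)    # toy continuous response

# Candidate degrees of pruning for a fully grown tree.
path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X, y)

best_alpha, best_score = None, -np.inf
for alpha in path.ccp_alphas:
    # Evaluate each pruning degree on held-out buckets (5-fold cross-validation).
    scores = cross_val_score(DecisionTreeRegressor(random_state=0, ccp_alpha=alpha),
                             X, y, cv=5)
    if scores.mean() > best_score:
        best_alpha, best_score = alpha, scores.mean()

# Prune the tree grown on all of the data to the chosen degree.
final_tree = DecisionTreeRegressor(random_state=0, ccp_alpha=best_alpha).fit(X, y)
print(best_alpha, final_tree.get_n_leaves())
```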
By default, predictions extracted from decision trees are parsimonious, typically consisting of a single-valued prediction and a confidence interval. However, more detailed information can be extracted by annotating each leaf of the tree with the values of the predictor variable of the training data that correspond to that leaf of the tree. The tree then can be seen as a partition of the set of training values for the predictor, and a prediction consists of an entire element of this partition. If binned, such element constitutes a histogram. By performing statistics on the histogram, a user can restore the original parsimonious predictions.
To predict order variability, the user obtains the following data or fields to create decision trees. From this data, three models are extracted: (1) a model to predict variability on actual delivery quantity; (2) a model to predict variability on actual delivery date; and (3) a model to predict order cancellation probability. The dates should be encoded into multiple variables as day of the week, day of the month and month of the year, in addition to a single numeric encoding that represents absolute time. The data or fields and an explanation of that data or field are as follows:
Each of the above fields can be represented as one or more numerical or categorical features for use in the machine-learning algorithm (for example, the expected delivery date can be expressed as day of week, day of year, month of year, etc.). A Data Collection Agent must construct these records by matching up planned activities with their outcomes.
The “source” for an activity is the partner or resource involved in that activity. For each of the four locations of activities used in the model, the source is as follows: (1) for replenishment of raw materials, the source is the supplier; (2) for consumption of raw materials, the source is a resource (for example, production line); (3) for replenishment of finished products, the source is a resource (for example, production line); and (4) for consumption of finished products, the source is a customer. If necessary, additional fields can be implemented for use in the special circumstance of predicting the arrival of last minute orders to be able to predict the variation resulting from orders that are not yet known but that might arrive.
The absolute-time encoding of dates does not play a significant role in prediction since all planned orders used for prediction will lie in the future with respect to historical orders used to build the tree. Instead, if the tree contains splits on absolute time, this will be used as an indication that a qualitative shift in behavior has occurred at those points in time. Training data that is older than the most recent of such points is discarded and a new tree is built using only the most recent data. However, if splits on absolute dates occur deep in the tree, then data should only be discarded if the data reaches those deep splits.
The detailed model described above allows the user to annotate planned activities with potential variation, which, for example, allows the user to annotate a sequence of planned activities with potential variation. As described below, using the potential variation information, a company in a supply chain can obtain the probability distributions of various inventory levels across time. In general, at an initial time an inventory contains a known number of items. In the subsequent time interval, a number of replenishment operations are scheduled, each planned for a certain time and quantity, but a stochastic variation is expected in both time and quantity for each operation. It is assumed that both time and quantity are made discrete so that the distributions are discrete and specifiable as a histogram. Furthermore, the variation in time and quantity is assumed to be independent so that they can be completely specified by two independent histograms over one variable only. To implement the model efficiently, one consideration to be analyzed is how to efficiently propagate the probability distribution between the end of time interval t and the end of time interval t+1. To analyze this efficiency consideration, there are a number of observations that are helpful.
First, at the end of interval t+1, the probability of the inventory having value Q is given by:
Second, if there are no outages, the transaction sizes are independent of the current inventory, and hence the probability of transition between two inventory levels is dependent only on the difference between the two inventory levels. That is
P(inventory transitions from $q$ to $Q$ in interval $t+1$) $= f_{t+1}(Q-q)$
Third, because of the first two observations, the time evolution operator for a probability distribution over quantity, $P_t$, in the interval $t+1$ is given by
$P_{t+1}=P_t * f_{t+1}$
where $*$ is the convolution operator and $f_{t+1}$ is the Time-Evolution Filter (TEF) for time $t+1$.
Fourth, there are two ways of evaluating the convolution operator, each of which has its own domain of efficiency. If $f_{t+1}$ is sparse (that is, $f_{t+1}(q)\ne 0$ for only a small number of $q$'s), it is more efficient to evaluate the sum directly, since the sum reduces to
which has only a small number of terms. If $P_t$ is also sparse, as it would be at the initial time when the inventory is known exactly, the convolution reduces further to
thereby simplifying the problem even more greatly. Fifth, if full advantage of sparsity is taken, this operation has computation order $O(lm)$, where $l$ and $m$ are the numbers of nonzero elements in $f_{t+1}$ and $P_t$, respectively.
Sixth, if both $f_{t+1}$ and $P_t$ are non-sparse, it is more efficient to use the Fourier transform formula for the convolution:
$P_t * f_{t+1}=\mathrm{ifft}[\mathrm{fft}[P_t]\times\mathrm{fft}[f_{t+1}]]$
where fft and ifft are the fast Fourier transform operator and its inverse. Seventh, the computational order of this operation is $O(n\log n)$, where $n$ is the greater of the lengths of $f_{t+1}$ and $P_t$.
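The time-evolution step P_{t+1} = P_t * f_{t+1} can be sketched directly with a discrete convolution. The sketch below shows both the direct evaluation and the FFT-based evaluation; the histogram-plus-offset representation and helper name are illustrative choices.

```python
import numpy as np
from scipy.signal import fftconvolve

def evolve(P, q_min, f, dq_min, use_fft=False):
    """One time-evolution step P_{t+1} = P_t * f_{t+1}.
    P is a probability histogram over quantities q_min, q_min+1, ...;
    f is the TEF histogram over quantity changes dq_min, dq_min+1, ...
    Returns the new histogram and its new lowest quantity."""
    new = fftconvolve(P, f) if use_fft else np.convolve(P, f)
    return new, q_min + dq_min

# Example: inventory is known to be exactly 100 units, and one replenishment of
# 45-55 units (uniform) plus one certain debit of 30 units occur this interval.
P, q_min = np.array([1.0]), 100
replenish = np.full(11, 1.0 / 11)          # TEF over dQ = +45 ... +55
P, q_min = evolve(P, q_min, replenish, dq_min=45)
debit = np.array([1.0])                    # TEF over dQ = -30 exactly
P, q_min = evolve(P, q_min, debit, dq_min=-30)
print(q_min, P)                            # distribution over 115 ... 125
```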
The detailed predictive model can be further developed using time evolution filters for various operations. For example, “ordinary” (that is non-stochastic) transactions can be put into the equations described above. First, if no transaction occurs in the interval t, the time evolution filter (TEF) is given by
For a replenishment of inventory of a definite size R that definitely occurs at time t, the TEF is:
and for a debit from inventory of a definite size D that definitely occurs at time t, the TEF is:
Note that these functions are independent of the form of the replenishment or debit from inventory. For example, two debits of size D/2 occurring in the same time interval have the same TEF as one of size D.
By introducing uncertainty into the quantity for the replenishment of inventory, for example a distribution P(R), the TEF becomes
$f_t(\Delta Q)=P(\Delta Q)$
A similar equation results for a debit from inventory.
If replenishment of inventory is certain (that is, the user knows that some quantity of goods will be delivered to the inventory even though there can be some uncertainty about how much will be delivered), the user expects that P(R) will have connected support that is centered on some expected quantity with a relatively small number of non-zero elements. This would satisfy the sparsity condition and allow the convolution to be computed efficiently and directly without resorting to the fft method. If two transactions having TEFs ƒ and g are expected within the same time period, the combined TEF is the convolution ƒ*g. If both ƒ and g have connected support, so will the combined TEF.
If there is some non-zero probability, q, that the transaction will not occur, but will have probability distribution over quantity P(R) if it does occur, the TEF is:
Typically, this TEF will not have connected support, but will be non-zero both at 0 and in a region around the expected transaction quantity. This will still allow for sparsity considerations, but more care must be taken to track the support of the TEF.
If two transactions occur in the same time interval, the TEF divides into four (possibly overlapping) regions: one region for when neither occurs, two regions for when one occurs and not the other, and one region for when both occur. The possibility of multiple transactions occurring in the same time interval complicates the spread of the probability distribution over quantity.
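A sketch of a TEF for a transaction that may be cancelled, and of combining two transactions in one interval, is shown below; dictionary-keyed histograms are used so that the disconnected support at ΔQ = 0 is explicit, and this representation is an illustrative choice.

```python
from collections import defaultdict

def tef_with_cancellation(quantity_dist, cancel_prob):
    """TEF for a transaction with cancellation probability `cancel_prob` and
    quantity distribution {dQ: probability} if it does occur."""
    tef = defaultdict(float)
    tef[0] += cancel_prob                       # the transaction does not occur
    for dq, p in quantity_dist.items():
        tef[dq] += (1.0 - cancel_prob) * p      # the transaction occurs
    return dict(tef)

def convolve_tefs(f, g):
    """Combined TEF for two transactions in the same interval: f * g."""
    out = defaultdict(float)
    for dq_f, p_f in f.items():
        for dq_g, p_g in g.items():
            out[dq_f + dq_g] += p_f * p_g
    return dict(out)

# One replenishment of 50 units that may be cancelled 20% of the time, and one
# certain debit of 30 units: the combined TEF has mass at -30 and at +20.
f = tef_with_cancellation({50: 1.0}, cancel_prob=0.2)
g = {-30: 1.0}
print(convolve_tefs(f, g))    # {-30: 0.2, 20: 0.8}
```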
The description above is directed to the situation in which there is no variation allowed in the time of the orders. In the situation in which there is variation in time, a further level of complexity is added. To accommodate the time variation in the probability distribution, generalize the TEF for a single transaction to vary in time as:
$f_t(\Delta Q)\to f(t;\Delta Q)$
Define this function for each transaction to be equal to the TEF for no transaction if t is before the first time at which the transaction can occur as follows:
If t is a time that is equal to or after the final time when the transaction could occur, the TEF is equal to the complete TEF for the transaction (as if there was no time variation) as follows:
where q is the probability that the transaction will not occur at all.
For a time t between the initial and final times of the transaction, the variable q is replaced by q(t), the probability that the transaction has not occurred yet by time t. This is derived from the cumulative distribution function for the transaction. In fact, with this definition, this case subsumes both of the others and provides the general equation:
With these definitions in place, the probability distribution at time t is written as follows:
where T is the set of all transactions and
is the convolution series operator.
If a complete time series is to be computed, a great deal of time can be saved by noting that if a transaction is finished (that is, it is at or past the latest time step at which it could occur) its final TEF can be factored out of all future time steps and need be convolved into the solution only once. Similarly, transactions that have not yet started need not be convolved at all since they are identities. These occurrences can be written as an equation as follows:
where T′ and T″ are the set of all transactions that have started and finished and the set of transactions that have started and have not finished before time step t respectively. The convolution of the first two factors can be computed progressively throughout the computation and only the third factor need be computed at every time step, which advantageously reduces computation efforts. As such, the algorithm can be written as:
The computational complexity of this algorithm depends on several factors, including: (1) n, the number of bins the probability histogram is to be evaluated at for each time; (2) T, the total time duration of the probability distributions for all transactions; (3) m, the number of transactions; and (4) <Q>, the average support length of the quantity distribution for the transactions. Since each transaction is active for a number of time steps equal to the duration of its time distribution, a total of T convolutions with Pt are completed for all transactions. Further, since each transaction is incorporated exactly once into P0, an additional m convolutions are performed.
Each convolution is completed between a vector of average length <Q> and a vector of length n (P0 or Pt) which takes on the order of <Q>n operations. This order of operations is based on assuming: (1) that the single element indicating probability of not happening is inconsequential and can be ignored; (2) that no advantage is taken of possible sparseness of P0 or Pt; and (3) there is no use of a fast Fourier transform technique. If the user assumes that the convolutions are the most computationally intensive operations, the result is a computational order of (T+m)<Q>n.
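The progressive computation described above, in which finished transactions are convolved into a base distribution once and only the active transactions are re-convolved at each time step, can be sketched as follows; the transaction representation and helper names are illustrative assumptions, and quantity-offset bookkeeping is omitted for brevity.

```python
import numpy as np

def propagate(P0, transactions, horizon):
    """Evolve a quantity distribution over `horizon` steps. Each transaction is
    a dict with 'start', 'end', and a function tef(t) returning its TEF at time
    t as a 1-D histogram over quantity change. Finished transactions are folded
    into the base distribution only once."""
    base = P0.copy()
    results = []
    for t in range(horizon):
        # Fold transactions that finished at this step into the base distribution.
        for tr in transactions:
            if tr["end"] == t:
                base = np.convolve(base, tr["tef"](t))
        # Re-convolve only the transactions that are active and not yet finished.
        P = base.copy()
        for tr in transactions:
            if tr["start"] <= t < tr["end"]:
                P = np.convolve(P, tr["tef"](t))
        results.append(P)
    return results

# Example: a single replenishment whose quantity spreads uniformly over 3 bins,
# certain to have arrived by step 2.
tr = {"start": 1, "end": 2, "tef": lambda t: np.full(3, 1.0 / 3)}
for step, P in enumerate(propagate(np.array([1.0]), [tr], horizon=4)):
    print(step, P)
```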
Recognizing that computation expends time and resources, there are a number of opportunities to speed up the computation of the convolutions. These are described below and can be used in implementations of the detailed model.
The first method of speeding up the computation of the convolutions takes advantage of the typical situation in which: (1) the initial state of the inventory is a definite quantity (P(Q)=δQ,Q
If the TEFs have connected support, the support of the distribution will continue to be connected for all time and this will be as efficient as possible. However, if the TEFs fail to have connected support the support of the distribution will become fragmented and there will still be zeros between the upper and lower bounds. An example of TEFs failing to have connected support includes the situation in which there is a non-zero probability of failure. A more thorough elucidation of the support of the distribution that expresses the support as a union of intervals would ameliorate this problem, but the computational effort of keeping track of a potentially exponentially increasing number of intervals and deciding when two intervals overlap to merge them would likely use as much computing resource as it saves.
A second method of speeding up the computation is to have a convolution that checks whether or not an element of the probability distribution is zero before factoring in the convolution. This type of check likely would save many operations in which numbers are multiplied by zero.
A third method of speeding up the computation takes advantage of the associative nature of the convolution operator. In the event that more than one transaction occurs in a time interval, the individual TEFs for each transaction can be applied individually. Because the support of the TEF for a single transaction has at most two components, a sparse representation can be easily used.
A fourth method of speeding up the computation makes a trade off between accuracy and execution time by selecting a number of steps N and a number of points M such that for every N steps, M points are sampled from the distribution, the distribution is replaced with those M points, and each point is assigned a probability 1/M. If N=1, this method subsumes the Monte Carlo method of estimating the distribution. Larger Ns represent a compromise between execution time and accuracy because the convolution becomes quicker due to smaller numbers of non-zero points in the quantity distribution.
A fifth method of speeding up the computation uses an alternative approximation method that eliminates small probabilities at each iteration, either by setting a threshold on the smallest probability to be considered or by only keeping the highest N elements for some N. Again, the increase in computation speed results from the faster convolution that is a consequence of having fewer non-zero elements in the quantity distribution.
Any of these five methods of speeding up the computation can be implemented to conserve computation resources. Moreover, one or more of these methods can be combined to further speed up the computation.
In addition to the computation of the probability distribution of inventory over time, it is also necessary to consider appropriate graphical methods to display these results to an end user. There are at least two methods of display: (1) bicumulative display and (2) relative density display.
The bicumulative display method takes the computed probability density function (pdf), sums it to form the cumulative distribution function, and applies to that cumulative distribution a transform based on the Heaviside function H. The result is then plotted as an image intensity, with 1 being the darkest and 0 being the lightest. This method has the advantage of highlighting the median value of the distribution, but it can disadvantageously hide any multi-modality in the distribution (which can result from orders having significant probabilities of cancellation) and give a false impression of a single-humped distribution.
The relative density display method uses the pdf only and normalizes it at each time step so that the inventory level with the largest probability of occurring is assigned a renormalized probability of 1. The function is then displayed as in the bicumulative display. The renormalization ensures that, at each time step, at least one inventory level is shown at the darkest color, so that the display does not “wash out” as the distribution spreads over time. The relative density display method is more intuitive in that a dark segment of the plot indicates an increased probability of being at that location, and multi-modality is shown as more than one dark region at a single time step. The normalization, however, makes it more difficult to judge differences in probability between time steps, while differences in probability across inventory levels within the same time step are easier to judge.
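The two displays can be sketched as follows. Because the exact Heaviside-based transform is only referenced above, the bicumulative sketch assumes the common choice 2·min(F, 1−F), which is 1 (darkest) at the median and 0 in the tails, consistent with the behavior described; the relative-density sketch follows the per-time-step normalization directly.

```python
import numpy as np

def bicumulative_image(pdf):
    """pdf: array of shape (time_steps, quantity_bins), rows summing to 1.
    Assumed transform: 2 * min(F, 1 - F) of the cumulative distribution,
    which highlights the median (value 1) and fades toward the tails."""
    cdf = np.cumsum(pdf, axis=1)
    return 2.0 * np.minimum(cdf, 1.0 - cdf)

def relative_density_image(pdf):
    """Normalize each time step so its most probable inventory level maps to 1,
    guaranteeing at least one darkest pixel per column."""
    peaks = pdf.max(axis=1, keepdims=True)
    return pdf / np.where(peaks > 0, peaks, 1.0)
```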
Another aspect of the detailed model that must be considered is the calibration of the confidence bounds that relate to inventory levels. Supply-chain control systems tend to be self-correcting. For example, if inventory levels are too low, new sources of supply will be sought, or some demand will be cancelled or postponed. In fact, the Inventory Early Warning Agent is intended to enhance this process of self-correction. For this reason, it is expected that the confidence bounds of a model that inherently assumes a non-correcting system will be too wide. As such, predicted confidence bounds are corrected through scaling to compensate for the self-correcting nature of the supply-chain control system. To make this correction, histograms of actual variation are compared with histograms of predicted variation, a scaling factor that maximizes their similarity is derived, and that scaling factor is used for future predictions.
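A sketch of this calibration is shown below; the single multiplicative scaling factor, the candidate grid, and the L1 histogram distance used as the similarity measure are all assumptions made for illustration.

```python
import numpy as np

def calibrate_scaling_factor(actual_variation, predicted_variation,
                             candidates=np.linspace(0.1, 1.5, 29)):
    """Choose the factor that makes the histogram of scaled predicted variation
    most similar to the histogram of actual variation."""
    edges = np.histogram_bin_edges(actual_variation, bins=20)
    h_actual, _ = np.histogram(actual_variation, bins=edges, density=True)
    best_scale, best_dist = 1.0, np.inf
    for scale in candidates:
        h_pred, _ = np.histogram(np.asarray(predicted_variation) * scale,
                                 bins=edges, density=True)
        dist = np.abs(h_actual - h_pred).sum()
        if dist < best_dist:
            best_scale, best_dist = scale, dist
    return best_scale
```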
There are particular factors that are related to building a model from detailed order data and that must be considered when building the detailed model. These factors include: (1) data sets of different sizes; (2) predictor form; (3) predictor dependence; (4) order splitting/combining; (5) modeling last-minute activities; (6) working with the models to predict variation in individual activities; and (7) incorporating real time demand. One objective in building the model and taking these factors into consideration is to be able to generate a fully automatic system in which different approaches or algorithms can be used and, in particular, can be used without human intervention.
With respect to handling data sets of different sizes, for any particular material, the number of records of replenishment or consumption activities can be extremely large (for example, tens of thousands) or extremely small (for example, two or three). Large quantities of data can require excessive processing time, and small quantities of data can provide insufficient records for learning. Both of these extremes, however, can be adequately handled by applying different techniques based on the size of the data set. For example, if the historical data set is too large, the model can use sampling from the data set. The unused part of the large data set can later be used for validation purposes. If the historical data set for a particular material is too small, it can be extended with records for other materials that involve the same partner or resource. The model can be programmed to, for example, have a threshold that determines whether the data set is too large or too small and automatically samples the data or extends the records, respectively.
The model must also be able to handle activities that have multiple updates. For example, one particular activity can be updated many times before being completed, as in the case of a completion time for providing a good. The completion time can be set initially and then changed multiple times based on changes in, for example, production schedules and transportation logistics. Each time an activity is updated, a new record for machine learning is generated. If records are considered as independent entries for the model-building algorithms, then activities that are updated multiple times will weigh more heavily than activities that are updated rarely and will cause inaccuracies in the model. To compensate for this over-weighting of certain activities, records are weighted by the fraction of time they persist before being updated or fulfilled. The fraction is computed relative to the total time elapsed between when the activity was first planned and when it was completed. Thus, all activities will have the same total weight, independent of the number of updates.
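As a sketch of this weighting, assuming each record carries numeric timestamps (for example, epoch seconds) for when it became effective, when it was superseded or fulfilled, and for the activity's first planned and completion times (all parameter names are hypothetical):

```python
def record_weight(first_planned, completed, effective_from, effective_until):
    """Weight a record by the fraction of the activity's planned-to-completion
    interval during which that version of the record persisted; the weights of
    all records of one activity then sum to 1."""
    total = completed - first_planned
    if total <= 0:
        return 0.0
    return (effective_until - effective_from) / total
```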
Predictor variables can be expressed as absolute deviations, percentages, or absolute values. For example, expected variation in quantity can be expressed in terms of units of the variable or in terms of percentage of the order. The learning algorithm of the detailed model can be implemented to construct models that use different forms of the predictor and automatically select the one that performs better at prediction.
Decision trees can only predict a single variable at a time. To predict inventory levels, the detailed model involves the prediction of two or three variables. Since the model will be used to predict actual delivered quantities, actual delivery dates, and completion time spread, one approach is to build separate models, one to predict each of these variables. This approach, however, carries the implicit assumption that these variables are independent. Nonetheless, it is one approach that can be implemented in the detailed model. A second approach is to develop a model for one of the variables to be predicted and then use its output as a feature in a model for one of the other variables. Again, this approach can be implemented in the detailed model. These two approaches thus represent two different ways of treating the dependence among the predicted variables.
Some of the activities that relate to inventory management can involve splitting or combining activities. For example, one item, represented as a first activity, can be incorporated into a second item, and the resulting combined item delivered to inventory. Such an activity involves combined activities. As an example of a splitting activity, a product line can make items that are delivered to, for example, the work-in-progress inventory of multiple factories operated by the company. Such an activity involves splitting activities.
If there are split activities, then there is a third predictor, time spread, to be considered when preparing a model of the activities. As an alternative approach in preparing a model, split activities can be treated as separate activities whose characteristics are prorated. Combined activities, on the other hand, can be treated either as single activities that are aggregated, or as independent activities that happened to be delivered at the same time. Each of these methods can be more appropriate in some circumstances than other methods. The Data Collection Agent, described above with respect to obtaining data for constructing the decision trees, will additionally have the task of extracting clean records from the stream of messages pertaining to the splitting and combining activities.
Another source of variability results from orders that are created at the last minute. The Activity Risk Agent must be able to model this type of situation to accurately represent typical supply chains. If there is a record of the four plan dates pertaining to activities (for example, creation date, current date, planned execution date, and actual execution date) then there is sufficient data to work out the probabilities of activities being created at the last minute. Using this information, the Activity Risk Agent can create new activities with the appropriate cancellation probability to represent last-minute orders (for example, the probability of the order being cancelled is equal to one minus the probability of it being created).
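As a rough sketch of how those probabilities could be estimated from the recorded plan dates, assuming numeric timestamps, an illustrative lead-time threshold for what counts as "last minute," and hypothetical field names:

```python
def last_minute_order_probability(creation_dates, planned_execution_dates,
                                  lead_threshold):
    """Estimate the probability that an activity is created within
    lead_threshold of its planned execution; a placeholder last-minute
    activity would then carry cancellation probability 1 - p_created."""
    if not creation_dates:
        return 0.0, 1.0
    late = sum(1 for created, planned in zip(creation_dates, planned_execution_dates)
               if planned - created <= lead_threshold)
    p_created = late / len(creation_dates)
    return p_created, 1.0 - p_created
```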
As described above, the Activity Risk Agent builds a decision tree that estimates the potential variation in outcome for a particular planned activity. The decision tree can be converted into a set of human-readable rules that can be interpreted and possibly changed by the user. Furthermore, the model-building algorithm used to create the Activity Risk Agent can be “seeded” with known rules about the variation of activities (for example, the user can modify the algorithm to include the condition that shipments of a particular material from a certain supplier arrive within 2 hours of the planned time). Allowing the user to inspect, change, and seed the learning algorithms in this way gives the user a greater understanding of the system and the ability to enter knowledge that might not otherwise be available.
The features and aspects of the Activity Risk Agent described above do not take into account that variations in downstream demand might have an impact on the timeliness and completeness of the delivery of orders. For example, if downstream demand is high, it is possible that certain orders will be expedited. A difficulty in determining downstream demand results in part from the often significant delay in modern supply chains between when information is generated and when it is actually used. For example, point-of-sales data can reach a decision maker with a lag of a few days and may have to be aggregated across different sources (stores) and/or times. As a result, the decisions made (for example, new replenishment/manufacturing orders) may not reflect the up-to-date point-of-sales data. This situation is illustrated in the accompanying charts.
The Activity Risk Agent optimally is based on knowledge of the relationship between the real-time end-product demand signal and the demand from immediate customers at the location. To obtain this knowledge, a Real-Time Demand Forecast Agent takes the point-of-sales data as soon as it is generated and evaluates how it will affect future orders. The Real-Time Demand Forecast Agent optionally can evaluate already planned orders to determine whether there are possible changes to those orders. Thus, the Activity Risk Agent uses data obtained from the Real-Time Demand Forecast Agent to forecast orders (or to evaluate the possibility of changes to planned/forecasted orders) at some future time t′, given the current point-of-sales data and other data, such as inventory levels, orders already in transit, etc.
There are a number of real-world supply chain scenarios in which the delay between the generation of point-of-sales data and the actual generation of orders can occur and in which the Activity Risk Agent and the Real-Time Demand Forecast Agent can be used. A first scenario is that of a retailer-manufacturer relationship in which orders are placed by a store chain corporate center to a manufacturer. A second scenario is that of a distribution center-manufacturer relationship in which orders are placed by a distribution center to a manufacturer as a result of aggregated/accumulated demand from individual stores. A third scenario is that of a store-distribution center relationship in which orders are placed by an individual store to a distribution center. The third scenario is only relevant if there are time delays in order placement or deliveries between stores and distribution centers.
The information available to the decision maker can differ among the scenarios described above. As such, different algorithms can be prepared that are applicable to all of the above scenarios and other algorithms can be prepared that are better suited to one or two of the scenarios. Examples of these algorithms are described below.
The algorithms each use some common terms: x_t denotes the point-of-sales (consumption) quantity at time t; i_t the inventory level at time t; s_t the quantity ordered at time t; o_t the quantity already on order at time t; a_t the quantity arriving (delivered) at time t; l the replenishment lead time; and m the model stock.
The algorithms formulated below are based on the assumption that future orders are driven by: (1) inventory; (2) sales at time t and at prior times; and (3) the orders already in the supply chain. The algorithms also assume that there is a built-in time delay in the existing supply chains, as illustrated above, so that the information set available to the decision maker at time t is expressed as follows:
I_t^0 = {{x_τ, τ ≤ t−Δt_x}, {i_τ, τ ≤ t−Δt_i}, {s_τ, τ ≤ t−Δt_s}, {o_τ, τ ≤ t−Δt_o}},
where Δt_y (assumed to be a positive integer) is the information delay for the variable y.
In contrast with the above assumption of a delay, if it is assumed that the Real-Time Demand Forecast Agent will receive this information with little or no delay at all, the information set for the Real-Time Demand Forecast Agent is expressed as follows:
I_t^r = {{x_τ, τ ≤ t}, {i_τ, τ ≤ t}, {s_τ, τ ≤ t}, {o_τ, τ ≤ t}} (0.15)
Therefore, the agent's goal is to “extract” the decision maker's ordering policies from the available data and then to “make” ordering decisions before they are actually made by the decision maker. This ordering policy “extraction” cannot always be optimal, as illustrated by the following scenario. Assume that the agent attempts at time t to “predict” what order will be placed by the decision maker at time t+Δt. Depending on the exact value of Δt, the information set available for the estimation of future orders, I_t^r, may not contain all of the information in the information set that the decision maker will have at time t+Δt. As a result, even if the ordering rule were extracted perfectly, the estimate can differ from the actual decision made.
Therefore, one method that can be used is to estimate a set of functions f_{t+Δt}(I_t^r), Δt ∈ {1, . . . , T}, each of which minimizes an appropriately selected distance function d between the approximation and the actual order: d(f_{t+Δt}(I_t^r) − o_{t+Δt}). The distance-minimization criterion has been chosen here for simplicity but can be enhanced as necessary.
In developing the functions and Real-Time Demand Forecast Agent to simulate and forecast demand, there is the assumption that the planned/forecasted orders already in the system can be updated (for example, quantity changed, orders cancelled, etc.) when new information becomes available. There also is the assumption that there is a causal relationship between the orders in the near-term future and current point-of-sales data.
In simulating and forecasting demand, it is assumed that there are two separate processes that can be delineated that govern the ordering policy: (1) ordering as a result of experienced consumer demand (and perhaps with the goal of keeping out-of-stocks low) and (2) ordering to stock-up in anticipation of future promotions. Therefore, an order placed at time t can be a composition of orders resulting from both of the above processes. In an actual supply chain, stores receive information about forthcoming promotions (or perhaps make promotional decisions) and can order ahead of time. However, forthcoming promotional information is unlikely to be provided by the agent that obtains the actual data. If this assumption is not correct and information about forthcoming promotions is received, then constructing the model or simulation becomes simpler.
In addition to the assumptions above, there is another assumption that a store's ordering decisions are aimed at keeping the inventory low while at the same time not running out of stock. As a result of the above assumptions and equations, the inventory equation can be described as follows:
i_{t+1} = i_t + a_{t+1} − x_t = i_t − x_t + s_{t+1−l} (0.16)
One particular method that stores use to decide the quantity to be ordered at the beginning of day t is to keep the expected inventory at the beginning of day t+l (after the shipment ordered l days earlier arrives) at a level equal to the model stock m:
s_t = m + f_{t:t+l−1} − o_t − i_t, (0.17)
where f_{t:t+l−1} is the expected total consumption in the periods from t to t+l−1.
The value for items on order at time t+1, o_{t+1}, can be expressed as follows:
o_{t+1} = o_t + s_t − a_{t+1} = o_t + s_t − s_{t+1−l} (0.18)
In this idealized example, the amount of goods ordered at a particular time t+1 is a linear combination of earlier orders and sales. Given that actual supply chains may not satisfy the stationarity conditions assumed above, the model instead uses the assumption that an order at time t+1 can be expressed as a linear combination of point-of-sales, order, and inventory levels from previous periods. This provides a basis for using the linear Real-Time-Demand prediction algorithm, described below.
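Before turning to that algorithm, a minimal simulation of the idealized dynamics in equations (0.16)-(0.18) is sketched below; the non-negativity guard on orders, the forecast callback, and all names are illustrative assumptions.

```python
import numpy as np

def simulate_order_up_to(sales, lead_time, model_stock, horizon_forecast):
    """Simulate equations (0.16)-(0.18): each day's order tries to restore the
    expected inventory, after the pipeline of lead_time (>= 1) days, to model_stock."""
    T = len(sales)
    i = np.zeros(T + 1)   # inventory, eq. (0.16)
    o = np.zeros(T + 1)   # items on order, eq. (0.18)
    s = np.zeros(T + 1)   # order placed each period, eq. (0.17)
    i[0] = model_stock
    for t in range(T):
        f = horizon_forecast(sales, t, lead_time)         # expected consumption over t..t+l-1
        s[t] = max(0.0, model_stock + f - o[t] - i[t])    # eq. (0.17), clipped at zero
        arrival = s[t + 1 - lead_time] if t + 1 - lead_time >= 0 else 0.0
        i[t + 1] = i[t] + arrival - sales[t]              # eq. (0.16)
        o[t + 1] = o[t] + s[t] - arrival                  # eq. (0.18)
    return i, o, s
```

A simple choice for horizon_forecast is lead_time multiplied by a trailing average of recent sales.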
One of the main objectives of supply chain management is to keep the inventory low while at the same time minimizing the probability of out-of-stocks. Because many store ordering policies are based on linear or near-linear algorithms, it is reasonable to try a linear regression algorithm for determining s_t, the quantity ordered, from the lagged point-of-sales, inventory, and order variables.
In attempting various linear regressions with zero means and including running averages, an R-squared value of approximately 0.69-0.74 was typically obtained. The linear regressions allow for a negative predicted order size, which is problematic because such an order is not possible. This problem can be alleviated by running a non-negative least squares regression, which was done using Matlab® functions. A non-negative linear regression has the additional benefit of providing easily interpretable coefficients for the resulting regression. Moreover, the obtained R-squared should be in a similar range to the values noted above.
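A sketch of such a non-negative regression, using SciPy's nnls routine in place of the Matlab® functions, is shown below; the particular lag structure of the features is an illustrative assumption.

```python
import numpy as np
from scipy.optimize import nnls

def fit_nonnegative_order_model(x, i, o, s, lags=(1, 2, 3)):
    """Regress the order size s_t on lagged point-of-sales x, inventory i, and
    on-order o, with all coefficients constrained to be non-negative."""
    start = max(lags)
    rows, targets = [], []
    for t in range(start, len(s)):
        rows.append([x[t - k] for k in lags] + [i[t - 1], o[t - 1]])
        targets.append(s[t])
    A, b = np.asarray(rows, dtype=float), np.asarray(targets, dtype=float)
    coefs, residual_norm = nnls(A, b)
    r_squared = 1.0 - residual_norm**2 / np.sum((b - b.mean())**2)
    return coefs, r_squared
```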
Another possible method to correlate the supply chain data described above is to assume that the time series of orders and of the other variables are auto-correlated. Such an auto-correlation is illustrated by the autocorrelation function of the orders computed for a simulated data set.
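A sample autocorrelation of the order series can be computed as in the following sketch (the maximum lag is an arbitrary choice):

```python
import numpy as np

def autocorrelation(series, max_lag=14):
    """Sample autocorrelation of an order time series for lags 1..max_lag."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)])
```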
Other algorithms can be used in the model instead, in response to difficulties with the algorithms described above. In particular, one difficulty in implementing the linear regression algorithms is the need to take into account seasonal effects in consumer demand. For example, a day-of-the-week effect is often pronounced in real-world supply chains. A classification and regression tree (CART®) algorithm can be used to take such seasonality into account. Otherwise, the set of predictor variables is similar to that described for the linear algorithm above.
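For example, using scikit-learn's regression tree as a stand-in for a CART® implementation, a day-of-week feature can be added alongside the same lagged predictors; the feature and parameter choices below are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_seasonal_order_tree(x, i, o, s, days_of_week, lags=(1, 2, 3)):
    """Fit a regression tree for s_t with day-of-week as an explicit feature,
    so weekly seasonality can be captured by splits on that feature."""
    start = max(lags)
    X = np.array([[days_of_week[t]] + [x[t - k] for k in lags] + [i[t - 1], o[t - 1]]
                  for t in range(start, len(s))], dtype=float)
    y = np.asarray(s[start:], dtype=float)
    tree = DecisionTreeRegressor(max_depth=5, min_samples_leaf=10)
    return tree.fit(X, y)
```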
Another algorithm that can be used is the cross-correlation method. For example, it is reasonable in formulating the algorithms above to expect that the user will receive updated and reliable point-of-sales data, but will nonetheless fail to receive inventory data that is as reliable. In certain scenarios, such as a corporate center-manufacturer interaction, inventory data may not be available on a timely basis. In such a situation, the algorithms that are used can be selected such that they do not take inventory into account at all. One algorithm that is appropriate in such a situation is the cross-correlation method of instrument function estimation.
The invention can be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or in combinations of them. The invention can be implemented as a computer program product, i.e., a computer program tangibly embodied in an information carrier, e.g., in a machine-readable storage device or in a propagated signal, for execution by, or to control the operation of, data processing apparatus, e.g., a programmable processor, a computer, or multiple computers. A computer program can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program can be deployed to be executed on one computer or on multiple computers at one site or distributed across multiple sites and interconnected by a communication network.
Method steps of the invention can be performed by one or more programmable processors executing a computer program to perform functions of the invention by operating on input data and generating output. Method steps can also be performed by, and apparatus of the invention can be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. Information carriers suitable for embodying computer program instructions and data include all forms of non-volatile memory, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in special purpose logic circuitry.
The invention can be implemented in a computing system that includes a back-end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front-end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the invention, or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (“LAN”) and a wide area network (“WAN”), e.g., the Internet.
The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
A number of implementations of the invention have been described. Nevertheless, it will be understood that various modifications can be made without departing from the spirit and scope of the invention. Accordingly, other implementations are within the scope of the following claims.
This application claims priority based on U.S. Provisional Application Ser. No. 60/336,227 filed Nov. 14, 2001, for SCM Supply Network Planning, and U.S. Provisional Application Ser. No. 60/384,638, filed May 31, 2002, for Inventory Early Warning Agent in a Supply Chain Management System, the disclosures of which applications are incorporated here by reference in their entirety.