Methods and systems to complete transaction date

Information

  • Patent Application
  • 20070043542
  • Publication Number
    20070043542
  • Date Filed
    August 22, 2005
  • Date Published
    February 22, 2007
Abstract
A method and system to receive transaction data; determine a gap in the transaction data; and use an algorithm to generate data to fill in the gap is described. The algorithm is selected from a group including a first algorithm and a second algorithm. The first algorithm is to determine a dominant pattern in the transaction data; identify a region within the dominant pattern that corresponds to the gap in the transaction data; and adopt data associated with the corresponding region into the gap to minimize impact on the dominant pattern. The second algorithm includes a Moore-Penrose pseudo-inverse algorithm to choose the transaction data to fill in the gap based on a set of substitute data from among a group of substitute data sets and adopts the set of substitute data into the gap.
Description
FIELD

The application relates generally to the field of transaction data, more specifically to methods and systems to complete transaction data, and to a machine-readable medium comprising instructions to perform such methods.


BACKGROUND

Automatic Call Distribution (ACD) centers often use forecasting models to forecast transactions (e.g., calls or other communication requests) during certain periods of time. The forecasting models may be useful in determining adequate and efficient staff scheduling, for instance. Parameters for a forecasting model are often updated with new data to improve forecasting accuracy. Often, such updating is tedious and time-consuming for an administrator of the forecasting model.


SUMMARY

According to an aspect of the invention there is provided a method and system to receive transaction data; determine a gap in the transaction data; and use an algorithm to generate data to fill in the gap. The algorithm is selected from a group including a first algorithm and a second algorithm. The first algorithm is to determine a dominant pattern in the transaction data; identify a region within the dominant pattern that corresponds to the gap in the transaction data; and adopt data associated with the corresponding region into the gap to minimize impact on the dominant pattern. The second algorithm includes a Moore-Penrose pseudo-inverse algorithm to choose the transaction data to fill in the gap based on a set of substitute data from among a group of substitute data sets and adopts the set of substitute data into the gap.




DESCRIPTION OF DRAWINGS

An example embodiment of the present invention is illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements and in which:



FIG. 1 illustrates a system, according to an example embodiment of the present invention.



FIG. 2 illustrates a method of choosing an algorithm to fill in a transaction data gap, according to an embodiment.



FIG. 3 illustrates a method of implementing an algorithm, according to an example embodiment of the present invention.



FIG. 4 illustrates a method of implementing another algorithm, according to an example embodiment of the present invention.



FIG. 5 shows a diagrammatic representation of a machine in the example form of a computer system within which a set of instructions, for causing the machine to perform any one or more of the methodologies discussed herein, may be executed.




DETAILED DESCRIPTION

According to an aspect of the invention there is provided a method and system to receive transaction data; determine a gap in the transaction data; and use an algorithm to generate data to fill in the gap. The algorithm is selected from a group including a first algorithm and a second algorithm. The first algorithm is to determine a dominant pattern in the transaction data; identify a region within the dominant pattern that corresponds to the gap in the transaction data; and adopt data associated with the corresponding region into the gap to minimize impact on the dominant pattern. The second algorithm includes a Moore-Penrose pseudo-inverse algorithm to choose the transaction data to fill in the gap based on a set of substitute data from among a group of substitute data sets and adopts the set of substitute data into the gap.


Architecture


FIG. 1 illustrates a system 100, according to an example embodiment of the present invention. The system 100 may be used in the context of Automatic Call Distribution (ACD) centers to forecast transactions (e.g., calls or other communication requests) during certain periods of time using forecast models.


The system 100 may include a transaction gap module 110, an external data source 120, a forecasting module 125, and a database 130. The transaction gap module 110 may include an interface 135 to receive transaction data from the database 130 regarding, for example, a particular forecast group and/or a particular period of time. The interface 135 may receive transaction data from the external data source 120 through a network 160, such as the Internet.


The database 130 includes data regarding frequency of transactions or calls during periods of time. The database 130 (and/or the external data source 120) may include invalid, missing or incomplete data 165.


The transaction gap module 110 determines if there is a gap (e.g., incomplete data 165) in the transaction data. The gap may be invalid data, such as a data error and/or missing/omitted data (null). The gap may be during a period of time, such as a day or a set of days in a monthly data set. A month (series of weeks) of (possibly incomplete) daily data and a list of dates of invalid data may be included in the transaction data. For each valid date in the month, the data may be a non-negative number.
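
The gap determination described above can be sketched as follows. This is a minimal illustration only, assuming the month of daily data arrives as a Python list in which a missing value is `None`; the function name `find_gaps` and the rule that negative counts are invalid are assumptions made for the sketch, not part of the described system.

```python
from typing import List, Optional

def find_gaps(daily_counts: List[Optional[float]]) -> List[int]:
    """Return the 0-based indices of days whose value is missing or invalid."""
    gaps = []
    for day, value in enumerate(daily_counts):
        # Assumption: a missing value (None) or a negative count marks a gap.
        # Zero is kept as valid here, since valid data is non-negative.
        if value is None or value < 0:
            gaps.append(day)
    return gaps

week = [120, 135, None, 140, -1, 128, 0]
print(find_gaps(week))  # [2, 4]
```

A real deployment would likely also accept the separate list of invalid dates that the transaction data may carry, and merge it with the indices found here.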


The transaction gap module 110 may also include a selection module 140 used in determining which algorithm, a first algorithm 145 and/or a second algorithm 150, to use to fill in the gap or gaps in the transaction data. An algorithm may replace the invalid, incomplete or missing data 165 in the forecast group with plausible and/or likely values to render a complete output. Several algorithm embodiments are described herein. For example, the first algorithm 145 may include a pattern recognition code 155. A month of daily data, where the data for each day in the month is a non-negative number, may be the output of the algorithm of the transaction gap module 110.


The transaction gap module 110 then sends the output, complete data 170 including the filled-in data, to the forecasting module 125 to forecast transactions.



FIG. 2 illustrates a method 200 of choosing an algorithm to fill in a transaction data gap, according to an embodiment.


At block 210, transaction data is received, as discussed herein.


At block 220, a gap in the transaction data is determined, as discussed herein.


At block 230, the algorithm used to fill in the gap is determined. The determined algorithm may depend on the size of the dataset. Additionally, and/or alternatively, the determined algorithm may depend on the desired accuracy of the filled-in data. Additionally, and/or alternatively, the determined algorithm may depend on the desired speed to fill in the missing or invalid data.


The algorithm described in FIG. 4 may render more accurate results as compared with the algorithm described in FIG. 3 when there is a large quantity of invalid data, e.g., greater than 50% of the days have missing or invalid data for the given month/forecast group.


However, the algorithm described in FIG. 4 may be computationally more expensive as compared with the algorithm described in FIG. 3. That is, more time and more processing capabilities of a system may be expended with the algorithm of FIG. 4, especially when the data sets are large. The first algorithm may be used when processing time for filling in the gap is to be minimized. The second algorithm may be used when accuracy for filling in the gap is to be maximized.
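
One way the selection at block 230 could be expressed is sketched below. The function name `choose_algorithm`, its parameters, and the `max_cells` cutoff are hypothetical and introduced only for illustration; the 50% invalid-data threshold and the comparison of the valid count against n+m are taken from the examples given in this description, not from a prescribed rule.

```python
def choose_algorithm(n_valid, n_invalid, n_rows, n_cols,
                     prefer_speed=False, max_cells=10_000):
    """Return "first" (dominant pattern) or "second" (pseudo-inverse)."""
    # The second algorithm costs more time and memory, so fall back to the
    # first one when speed matters or the data set is large (hypothetical cutoff).
    if prefer_speed or n_rows * n_cols > max_cells:
        return "first"
    total = n_valid + n_invalid
    # Sparse valid data: more than half the entries invalid, or fewer valid
    # entries than n + m. The second algorithm may be more accurate here.
    if total and (n_invalid / total > 0.5 or n_valid < n_rows + n_cols):
        return "second"
    return "first"

print(choose_algorithm(n_valid=15, n_invalid=20, n_rows=5, n_cols=7))  # second
```

In practice the cutoff would be a parameter set by the user or administrator, as the description suggests, rather than a constant in code.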



FIG. 3 illustrates a method 300 of implementing an algorithm, according to an example embodiment of the present invention.


At block 310, transaction data is received, as discussed herein.


At block 320, a gap in the transaction data is determined, as discussed herein.


At block 330, a dominant pattern in the transaction data is determined, using the algorithm, as discussed herein. The dominant pattern may be determined by the pattern recognition code 155.


At block 340, a region within the dominant pattern that corresponds to the gap in the transaction data may be identified, using the algorithm, as discussed herein.


At block 350, data associated with the corresponding region may be adopted into the gap to minimize impact on the dominant pattern, using the algorithm, as discussed herein.


Using the algorithm, invalid and/or missing data may be replaced with values that are consistent with the arrangement of the valid data. The algorithm and/or the transaction gap module 110 may also take into consideration any restrictions of the forecasting module 125. A forecasting module restriction may be that the number of calls during each week has the same pattern throughout the month, for example.


The algorithm of the embodiment of FIG. 3 may work best when the valid data is not too sparse in a given month. The valid data is not too sparse, for example, when the ratio of valid data to invalid data is greater than 1:1. The actual arrangement of days with invalid data and the degree of dominance of the pattern in the valid data may also impact the quality of the fill and/or a confidence in the fill.


Two examples of how the algorithm of FIG. 3 behaves for sparse valid data are described further below. Sparse valid data, as used here, may denote a qualitative and comparative state of a set of the data where there is less valid data than in some other comparable set of data.


In the examples below, for the first algorithm, in which a dominant pattern in the data may be determined and adopted to fill in the gap (e.g., null data sets), $(i,j)$ refers to the $j$th day of the $i$th week, for $n$ weeks with $m$ days in each week. $x_{ij}$ holds the valid numerical data for $(i,j)$; if the data is not valid on $(i,j)$, $x_{ij}=\text{null}$.


$v_{ij}=x_{ij}$, unless $x_{ij}=0$, in which case $v_{ij}=\text{null}$. $w_{ij}=\ln(v_{ij})$ whenever $v_{ij}$ is not null, and $w_{ij}=\text{null}$ whenever $v_{ij}=\text{null}$.


A matrix of column differences, $c_{ij}$, is given by $c_{ij}=w_{i,j+1}-w_{ij}$ whenever both $w_{i,j+1}$ and $w_{ij}$ are not null, and $c_{ij}=\text{null}$ otherwise.


A matrix of row differences, $r_{ij}$, is given by $r_{ij}=w_{i+1,j}-w_{ij}$ whenever both $w_{i+1,j}$ and $w_{ij}$ are not null, and $r_{ij}=\text{null}$ otherwise.


If the $j$th column of $c_{ij}$ includes at least one non-null entry, $c_{*j}$ is the average of the non-null entries in the $j$th column of $c_{ij}$; otherwise, $c_{*j}=0$.


If the $i$th row of $r_{ij}$ includes at least one non-null entry, $r_{i*}$ is the average of the non-null entries in the $i$th row of $r_{ij}$; otherwise, $r_{i*}=0$.


$C_{j+1}=C_j+c_{*j}$, where $C_1=0$; $R_{i+1}=R_i+r_{i*}$, where $R_1=0$; and $u_{ij}=R_i+C_j$.


$K$ is the average of $w_{ij}-u_{ij}$ over each $(i,j)$ entry where $w_{ij}$ is not null.


$y_{ij}=w_{ij}$ whenever $w_{ij}$ is not null and otherwise, $y_{ij}=K+u_{ij}$.


The output is $z_{ij}=\mathrm{Round}(\exp(y_{ij}))$, where each date and time period includes valid data. $z_{ij}$ is the matrix that is sent on to the forecasting model or module. $z_{ij}$ may be sent through a sequence of one or more modules to be analyzed. Results may then be sent to a module that updates parameters of the forecasting module.


Logarithms may be taken of particular values so that multiplicative effects between day-of-the-week and week-of-the-month may be conveniently expressed as additive effects. In some implementations, it may be more convenient for the algorithm to work with additive effects than directly with the multiplicative effects. For example, take a multiplicative effect, m_effect=affect1*affect2, and an additive effect, a_effect=affect3+affect4. Then log(m_effect)=log(affect1*affect2)=log(affect1)+log(affect2). By taking logs, a multiplicative effect can be treated as an additive effect where log(m_effect)=a_effect, log(affect1)=affect3, and log(affect2)=affect4.


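A first example of how the above-recited functions of the algorithm of FIG. 3 behave for sparse valid data is as follows, where
$$w_{ij}=\begin{pmatrix}
\text{null} & -2 & 1 & 3 & \text{null} & 0 & -3\\
7 & 3 & \text{null} & 8 & 10 & 5 & \text{null}\\
\text{null} & \text{null} & -1 & \text{null} & \text{null} & -2 & \text{null}\\
\text{null} & \text{null} & 4 & 6 & \text{null} & \text{null} & \text{null}\\
\text{null} & \text{null} & 3 & \text{null} & \text{null} & \text{null} & \text{null}
\end{pmatrix}$$
and thus where
$$y_{ij}=\begin{pmatrix}
2 & -2 & 1 & 3 & 5 & 0 & -3\\
7 & 3 & 6 & 8 & 10 & 5 & 2\\
0 & -4 & -1 & 1 & 3 & -2 & -5\\
5 & 1 & 4 & 6 & 8 & 3 & 0\\
4 & 0 & 3 & 5 & 7 & 2 & -1
\end{pmatrix}$$
Here is another example, where
$$w_{ij}=\begin{pmatrix}
\text{null} & \text{null} & \text{null} & \text{null} & \text{null} & \text{null} & \text{null}\\
\text{null} & \text{null} & \text{null} & \text{null} & \text{null} & \text{null} & \text{null}\\
\text{null} & \text{null} & 1 & \text{null} & \text{null} & \text{null} & \text{null}\\
\text{null} & \text{null} & \text{null} & \text{null} & \text{null} & \text{null} & \text{null}\\
\text{null} & \text{null} & \text{null} & \text{null} & \text{null} & \text{null} & \text{null}
\end{pmatrix}$$
and thus where
$$y_{ij}=\begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1\\
1 & 1 & 1 & 1 & 1 & 1 & 1\\
1 & 1 & 1 & 1 & 1 & 1 & 1\\
1 & 1 & 1 & 1 & 1 & 1 & 1\\
1 & 1 & 1 & 1 & 1 & 1 & 1
\end{pmatrix}$$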
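
The functions of the algorithm of FIG. 3 can be sketched in code as shown below, applied to the first example above. This is a minimal illustration, assuming null is represented by `None` and matrices by lists of rows; the names `fill_log_matrix`, `fill_counts` and `mean_or_zero` are hypothetical. Python's built-in `round` is used for Round, which is an assumption, since the description does not specify a rounding rule.

```python
import math

def mean_or_zero(values):
    """Average of a list, or 0.0 when the list is empty."""
    return sum(values) / len(values) if values else 0.0

def fill_log_matrix(w):
    """Fill the None entries of the n-by-m log matrix w; returns the matrix y."""
    n, m = len(w), len(w[0])
    # c_star[j]: mean of w[i][j+1] - w[i][j] over rows where both are present.
    c_star = [mean_or_zero([w[i][j + 1] - w[i][j] for i in range(n)
                            if w[i][j] is not None and w[i][j + 1] is not None])
              for j in range(m - 1)]
    # r_star[i]: mean of w[i+1][j] - w[i][j] over columns where both are present.
    r_star = [mean_or_zero([w[i + 1][j] - w[i][j] for j in range(m)
                            if w[i][j] is not None and w[i + 1][j] is not None])
              for i in range(n - 1)]
    # Cumulative column and row offsets, with C_1 = R_1 = 0.
    C, R = [0.0], [0.0]
    for c in c_star:
        C.append(C[-1] + c)
    for r in r_star:
        R.append(R[-1] + r)
    # K: mean of w - (R_i + C_j) over the valid entries.
    K = mean_or_zero([w[i][j] - (R[i] + C[j])
                      for i in range(n) for j in range(m) if w[i][j] is not None])
    return [[w[i][j] if w[i][j] is not None else K + R[i] + C[j]
             for j in range(m)] for i in range(n)]

def fill_counts(x):
    """Counts -> logs (zero and None become null) -> fill -> rounded counts."""
    w = [[math.log(v) if v is not None and v > 0 else None for v in row]
         for row in x]
    return [[round(math.exp(v)) for v in row] for row in fill_log_matrix(w)]

N = None
W = [[N, -2, 1, 3, N, 0, -3],
     [7, 3, N, 8, 10, 5, N],
     [N, N, -1, N, N, -2, N],
     [N, N, 4, 6, N, N, N],
     [N, N, 3, N, N, N, N]]
Y = fill_log_matrix(W)
print(Y[0])  # [2.0, -2, 1, 3, 5.0, 0, -3]
```

Here `fill_log_matrix` works directly on the logarithm matrix, as the examples do, while `fill_counts` wraps it with the logarithm and Round(exp(·)) steps for raw counts.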


In another embodiment, the method is similar to “Fill in Days” for monthly updates described above, however day-of-the-week is replaced by time-period and week-of-the-month is replaced by comparable date. In a particular embodiment, n becomes the number of comparable dates, m becomes the number of time-periods within a day, i becomes an index for comparable dates and j becomes an index for time-period of a day. The calculations are completed using the above described functions in the algorithm of FIG. 3.



FIG. 4 illustrates a method 400 of implementing another algorithm, according to an example embodiment of the present invention.


At block 410, transaction data may be received, as discussed herein.


At block 420, a gap in the transaction data may be determined, as discussed herein.


At block 430, a set of substitute data may be chosen from among a group of substitute data sets using a Moore-Penrose pseudo-inverse algorithm.


At block 440, the set of substitute data may be adopted into the determined gap.


In an embodiment, the Moore-Penrose pseudo-inverse algorithm may be more accurate as compared with the algorithm of FIG. 3 when the valid data is quite sparse (when the count of valid data is, for example, less than n+m) and the invalid data is plentiful. However, the Moore-Penrose pseudo-inverse algorithm may be associated with much more computation (in both space and time), and therefore may be less practical, especially when the data sets are large. For example, a set comprising several hundred comparable days where each day has one hundred periods may be considered large. A parameter may be set based on the data set size, for example, by the user or the administrator to determine which algorithm to use.


In an embodiment, the Moore-Penrose pseudo-inverse algorithm may fill in null or invalid data by producing an optimal “fill in”.


Let $w_{ij}$ be the same as defined above with regard to the algorithm of FIG. 3, and let $W$ denote the matrix of the $w_{ij}$.


For $p=1,2,\ldots,n+m$ and $q=1,2,\ldots,n+m$, let $f_{pq}$ denote the elements of an $(n+m)$ by $(n+m)$ matrix, $F$, called the “filler”. The filler is a symmetric matrix, defined in the following way:


For $p=1,2,\ldots,n$ and $q=1,2,\ldots,n$, let $f_{pp}$ be the number of non-null entries in the $p$th row of $W$ and let $f_{pq}=0$ when $p\neq q$. For $p=n+1,n+2,\ldots,n+m$ and $q=n+1,n+2,\ldots,n+m$, let $f_{pp}$ be the number of non-null entries in the $(p-n)$th column of $W$ and let $f_{pq}=0$ when $p\neq q$. For $p=1,2,\ldots,n$ and $q=n+1,n+2,\ldots,n+m$, let $f_{pq}=1$ when $w_{p,q-n}$ is not null and $f_{pq}=0$ when $w_{p,q-n}$ is null. For $p=n+1,n+2,\ldots,n+m$ and $q=1,2,\ldots,n$, let $f_{pq}=1$ when $w_{q,p-n}$ is not null and $f_{pq}=0$ when $w_{q,p-n}$ is null.
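
The construction of the filler can be sketched as below, assuming null is represented by `None` and $W$ by a list of rows. The function name `build_filler` is hypothetical, and indices are 0-based, so row $p$ of the text is row `p - 1` of the result.

```python
def build_filler(w):
    """Build the (n+m)-by-(n+m) filler matrix F from the n-by-m matrix w."""
    n, m = len(w), len(w[0])
    # present[i][j] is 1 where w has a value and 0 where it is null.
    present = [[0 if v is None else 1 for v in row] for row in w]
    F = [[0] * (n + m) for _ in range(n + m)]
    for i in range(n):
        F[i][i] = sum(present[i])            # non-null count of row i
        for j in range(m):
            F[i][n + j] = present[i][j]      # upper-right block
            F[n + j][i] = present[i][j]      # lower-left block (symmetric)
    for j in range(m):
        F[n + j][n + j] = sum(present[i][j] for i in range(n))  # column counts
    return F

N = None
W = [[N, -2, 1, 3, N, 0, -3],
     [7, 3, N, 8, 10, 5, N],
     [N, N, -1, N, N, -2, N],
     [N, N, 4, 6, N, N, N],
     [N, N, 3, N, N, N, N]]
F = build_filler(W)
print(F[0])  # [5, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1]
```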


If $A$ is some real matrix and $B$ is a real matrix such that $ABA=A$, $BAB=B$, $AB$ is symmetric, and $BA$ is symmetric, then $B$ is called a Moore-Penrose pseudoinverse of $A$. It is a theorem that every real matrix has a mathematically unique Moore-Penrose pseudoinverse. Let $F^+$ denote the pseudoinverse of $F$. Let $F^+$ be computed from $F$ using, say, Greville's Theorem.
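
As a small check of the defining conditions, the sketch below tests the four Penrose conditions ($ABA=A$, $BAB=B$, $AB$ symmetric, $BA$ symmetric) for a candidate pair of matrices. The helper names are hypothetical, and the 1-by-2 matrix is chosen only because its pseudoinverse is easy to verify by hand.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def close(A, B, tol=1e-9):
    """True when A and B have the same shape and nearly equal entries."""
    return (len(A) == len(B) and len(A[0]) == len(B[0]) and
            all(abs(a - b) <= tol for ra, rb in zip(A, B) for a, b in zip(ra, rb)))

def is_pseudoinverse(A, B):
    """Check the four Penrose conditions for the pair (A, B)."""
    AB, BA = matmul(A, B), matmul(B, A)
    return (close(matmul(AB, A), A) and close(matmul(BA, B), B) and
            close(AB, transpose(AB)) and close(BA, transpose(BA)))

A = [[1.0, 1.0]]                                # 1-by-2 matrix
print(is_pseudoinverse(A, [[0.5], [0.5]]))      # True
print(is_pseudoinverse(A, [[1.0], [0.0]]))      # False: BA is not symmetric
```

The second candidate satisfies $ABA=A$ and $BAB=B$ but fails a symmetry condition, which illustrates why all four conditions are needed for uniqueness.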


Let b denote the average of the non-null values of W.


For $i=1,2,\ldots,n$ and $j=1,2,\ldots,m$, define $\tilde{w}_{ij}$ by the rule $\tilde{w}_{ij}=w_{ij}-b$ when $w_{ij}$ is not null and $\tilde{w}_{ij}=\text{null}$ otherwise. Let $\tilde{W}$ denote the $n$ by $m$ matrix of the $\tilde{w}_{ij}$.


Define a real vector, $g$, with $n+m$ components $g_k$, for $k=1,2,\ldots,n+m$, by the following rules: For $k=1,2,\ldots,n$, let $g_k$ equal the sum of the non-null elements in the $k$th row of $\tilde{W}$ when at least one such element is not null and let $g_k=0$ when every element in the $k$th row of $\tilde{W}$ is null.


For $k=1+n,2+n,\ldots,m+n$, let $g_k$ equal the sum of the non-null elements in the $(k-n)$th column of $\tilde{W}$ when at least one such element is not null and let $g_k=0$ when every element in the $(k-n)$th column of $\tilde{W}$ is null.


Define a real vector, $h$, with $n+m$ components $h_k$, for $k=1,2,\ldots,n+m$, by the following rule: $h=F^+g$. The components of $h$ are used to determine values to replace the null data in $W$ as follows: For $i=1,2,\ldots,n$, let $R_i=h_i$. For $j=1,2,\ldots,m$, let $C_j=h_{j+n}$. Define $u_{ij}$ by the rule $u_{ij}=R_i+C_j$. Let $y_{ij}=w_{ij}$ whenever $w_{ij}$ is not null and otherwise, let $y_{ij}=u_{ij}+b$.


The real matrix of the $y_{ij}$, $Y$, can be thought of as the matrix, $W$, with the null values filled in with data that is considered “valid”. As described above, $W$ may be obtained by taking logarithms of the original data, $x_{ij}$. Now let $z_{ij}=x_{ij}$ wherever $x_{ij}$ has valid data and let $z_{ij}=\mathrm{Round}(\exp(y_{ij}))$ otherwise.


Output the zij.


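In an example embodiment, the algorithm of FIG. 4 may be executed as follows, using the same first matrix, $W$, used in the example of the algorithm of FIG. 3, where
$$W=\begin{pmatrix}
\text{null} & -2 & 1 & 3 & \text{null} & 0 & -3\\
7 & 3 & \text{null} & 8 & 10 & 5 & \text{null}\\
\text{null} & \text{null} & -1 & \text{null} & \text{null} & -2 & \text{null}\\
\text{null} & \text{null} & 4 & 6 & \text{null} & \text{null} & \text{null}\\
\text{null} & \text{null} & 3 & \text{null} & \text{null} & \text{null} & \text{null}
\end{pmatrix}$$
The corresponding filler, $F$, is
$$F=\left(\begin{array}{cccccccccccc}
5 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 0 & 1 & 1\\
0 & 5 & 0 & 0 & 0 & 1 & 1 & 0 & 1 & 1 & 1 & 0\\
0 & 0 & 2 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0\\
0 & 0 & 0 & 2 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0\\
0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0\\
1 & 1 & 0 & 0 & 0 & 0 & 2 & 0 & 0 & 0 & 0 & 0\\
1 & 0 & 1 & 1 & 1 & 0 & 0 & 4 & 0 & 0 & 0 & 0\\
1 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 3 & 0 & 0 & 0\\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0\\
1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 3 & 0\\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{array}\right)$$
Thus, $F^+$ is approximately: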

  • [[2.31082375478927E-0001, -5.34003831417627E-0002, -3.04118773946362E-0002, -3.04118773946361E-0002, -8.21360153256704E-0002, 1.36733716475096E-0001, -4.71743295019157E-0002, -1.19731800766277E-0003, -2.13122605363984E-0002, 1.36733716475096E-0001, -2.13122605363983E-0002, -1.47749042145594E-0001]
  • [-5.34003831417628E-0002, 2.82806513409962E-0001, -1.07998084291187E-0001, -1.07998084291187E-0001, -2.28687739463601E-0001, -1.99473180076628E-0001, -7.30363984674330E-0002, 1.45354406130268E-0001, -1.26915708812260E-0002, -1.99473180076629E-0001, -1.26915708812261E-0002, 1.36733716475096E-0001]
  • [-3.04118773946362E-0002, -1.07998084291187E-0001, 5.77059386973180E-0001, -2.29406130268199E-0002, 3.56800766283525E-0002, 1.91331417624521E-0001, 1.10871647509578E-0001, -1.19013409961685E-0001, 8.15613026819923E-0002, 1.91331417624521E-0001, -1.18438697318007E-0001, 1.13745210727969E-0001]
  • [-3.04118773946362E-0002, -1.07998084291187E-0001, -2.29406130268199E-0002, 5.77059386973180E-0001, 3.56800766283525E-0002, 1.91331417624521E-0001, 1.10871647509578E-0001, -1.19013409961685E-0001, -1.18438697318007E-0001, 1.91331417624521E-0001, 8.15613026819924E-0002, 1.13745210727969E-0001]
  • [-8.21360153256706E-0002, -2.28687739463602E-0001, 3.56800766283525E-0002, 3.56800766283527E-0002, 1.19085249042146E+0000, 3.12021072796935E-0001, 1.97078544061303E-0001, -2.74185823754789E-0001, 1.19492337164750E-0001, 3.12021072796935E-0001, 1.19492337164751E-0001, 1.65469348659004E-0001]
  • [1.36733716475096E-0001, -1.99473180076629E-0001, 1.91331417624521E-0001, 1.91331417624521E-0001, 3.12021072796935E-0001, 1.11613984674330E+0000, -1.02969348659003E-0002, -2.28687739463601E-0001, -7.06417624521073E-0002, 1.16139846743295E-0001, -7.06417624521072E-0002, -2.20067049808430E-0001]
  • [-4.71743295019156E-0002, -7.30363984674331E-0002, 1.10871647509578E-0001, 1.10871647509578E-0001, 1.97078544061303E-0001, -1.02969348659003E-0002, 5.18438697318008E-0001, -1.13745210727969E-0001, -2.46647509578544E-0002, -1.02969348659001E-0002, -2.46647509578544E-0002, -3.61590038314179E-0002]
  • [-1.19731800766279E-0003, 1.45354406130268E-0001, -1.19013409961685E-0001, -1.19013409961686E-0001, -2.74185823754789E-0001, -2.28687739463602E-0001, -1.13745210727969E-0001, 3.57519157088123E-0001, -3.61590038314176E-0002, -2.28687739463601E-0001, -3.61590038314176E-0002, -8.21360153256707E-0002]
  • [-2.13122605363983E-0002, -1.26915708812260E-0002, 8.15613026819924E-0002, -1.18438697318007E-0001, 1.19492337164750E-0001, -7.06417624521073E-0002, -2.46647509578544E-0002, -3.61590038314177E-0002, 3.56369731800766E-0001, -7.06417624521072E-0002, -4.36302681992338E-0002, -6.20210727969350E-0002]
  • [1.36733716475096E-0001, -1.99473180076629E-0001, 1.91331417624521E-0001, 1.91331417624521E-0001, 3.12021072796935E-0001, 1.16139846743295E-0001, -1.02969348659002E-0002, -2.28687739463602E-0001, -7.06417624521074E-0002, 1.11613984674330E+0000, -7.06417624521072E-0002, -2.20067049808430E-0001]
  • [-2.13122605363983E-0002, -1.26915708812263E-0002, -1.18438697318007E-0001, 8.15613026819924E-0002, 1.19492337164750E-0001, -7.06417624521073E-0002, -2.46647509578544E-0002, -3.61590038314175E-0002, -4.36302681992337E-0002, -7.06417624521071E-0002, 3.56369731800766E-0001, -6.20210727969353E-0002]
  • [-1.47749042145596E-0001, 1.36733716475097E-0001, 1.13745210727970E-0001, 1.13745210727969E-0001, 1.65469348659004E-0001, -2.20067049808429E-0001, -3.61590038314176E-0002, -8.21360153256708E-0002, -6.20210727969352E-0002, -2.20067049808430E-0001, -6.20210727969354E-0002, 1.06441570881226E+0000]]


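For the first matrix, $W$, $b=2.8$, and the elements of $\tilde{W}$ include:
$$\tilde{W}=\begin{pmatrix}
\text{null} & -4.8 & -1.8 & 0.2 & \text{null} & -2.8 & -5.8\\
4.2 & 0.2 & \text{null} & 5.2 & 7.2 & 2.2 & \text{null}\\
\text{null} & \text{null} & -3.8 & \text{null} & \text{null} & -4.8 & \text{null}\\
\text{null} & \text{null} & 1.2 & 3.2 & \text{null} & \text{null} & \text{null}\\
\text{null} & \text{null} & 0.2 & \text{null} & \text{null} & \text{null} & \text{null}
\end{pmatrix}$$


$g$ is given by
$$g=(-15,\ 19,\ -8.6,\ 4.4,\ 0.2,\ 4.2,\ -4.6,\ -4.2,\ 8.6,\ 7.2,\ -5.4,\ -5.8)^T$$


Finding $F^+$ by Greville's Theorem, computing $h=F^+g$, and solving for the $y_{ij}$ in terms of the components of $h$ recovers a matrix that is identical to the $y_{ij}$ matrix generated by the algorithm of FIG. 3. However, generating the $y_{ij}$ matrix by the algorithm of FIG. 4 may be computationally more expensive.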
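
The example above can be reproduced with the following sketch, which builds $F$ and $g$ from $W$, computes a pseudoinverse with Greville's column-by-column recursion, and fills the null entries. It is an illustration under stated assumptions rather than a definitive implementation: null is represented by `None`, the names `pinv_greville` and `fill_by_pseudoinverse` are hypothetical, and the tolerance `tol` used to decide whether a column is linearly dependent is an arbitrary choice suitable for small matrices.

```python
def pinv_greville(A, tol=1e-9):
    """Moore-Penrose pseudoinverse via Greville's column-by-column recursion."""
    rows, cols = len(A), len(A[0])
    P = []  # pseudoinverse of the first k columns: k rows of length `rows`
    for k in range(cols):
        a = [A[i][k] for i in range(rows)]
        d = [sum(P[r][i] * a[i] for i in range(rows)) for r in range(k)]
        # Residual of column k after projecting onto the earlier columns.
        c = [a[i] - sum(A[i][r] * d[r] for r in range(k)) for i in range(rows)]
        cc = sum(v * v for v in c)
        if cc > tol:    # column k is independent of the earlier columns
            b = [v / cc for v in c]
        else:           # column k is (numerically) dependent on them
            scale = 1.0 + sum(v * v for v in d)
            b = [sum(d[r] * P[r][i] for r in range(k)) / scale
                 for i in range(rows)]
        P = [[P[r][i] - d[r] * b[i] for i in range(rows)] for r in range(k)] + [b]
    return P

def fill_by_pseudoinverse(w):
    """Fill the None entries of w using h = F+ g; returns the matrix y."""
    n, m = len(w), len(w[0])
    S = [(i, j) for i in range(n) for j in range(m) if w[i][j] is not None]
    b = sum(w[i][j] for i, j in S) / len(S)    # mean of the non-null values
    size = n + m
    F = [[0.0] * size for _ in range(size)]
    g = [0.0] * size
    for i, j in S:
        F[i][i] += 1.0                  # non-null count of row i
        F[n + j][n + j] += 1.0          # non-null count of column j
        F[i][n + j] = F[n + j][i] = 1.0
        g[i] += w[i][j] - b             # row sums of the centered values
        g[n + j] += w[i][j] - b         # column sums of the centered values
    F_plus = pinv_greville(F)
    h = [sum(F_plus[p][q] * g[q] for q in range(size)) for p in range(size)]
    return [[w[i][j] if w[i][j] is not None else h[i] + h[n + j] + b
             for j in range(m)] for i in range(n)]

N = None
W = [[N, -2, 1, 3, N, 0, -3],
     [7, 3, N, 8, 10, 5, N],
     [N, N, -1, N, N, -2, N],
     [N, N, 4, 6, N, N, N],
     [N, N, 3, N, N, N, N]]
Y = fill_by_pseudoinverse(W)
print([round(v, 6) for v in Y[0]])
```

The sketch stops at the filled logarithm matrix; mapping back to counts would keep $x_{ij}$ where it is valid and apply Round(exp(·)) elsewhere, as described above. A production system would more likely call a tested linear-algebra library for the pseudoinverse.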


The component of the algorithm described here acts upon the logarithms of the raw data, in the instances where that raw data is not null and not zero. The logarithms may be placed in an $n$ by $m$ matrix, $W$, whose elements are either real numbers or null (so that $W$ is not, strictly, a real matrix), where at least one entry is not null.


In an embodiment of the algorithm of FIG. 3, let $w_{ij}$ denote the entries of the logarithm matrix, $W$. Each $w_{ij}$ is either a real number or null. For any set, $A$, let $o(A)$ denote the cardinality of $A$.


The set, $S$, may be defined by the rule $S=\{(i,j)\mid w_{ij}\neq\text{null}\}$.


$\mu$ may be defined by the rule
$$\mu=\frac{1}{o(S)}\sum_{(i,j)\in S}w_{ij}.$$


$y_{ij}$ may be defined by the rule
$$y_{ij}=\begin{cases}w_{ij}-\mu; & (i,j)\in S\\ \text{null}; & (i,j)\notin S.\end{cases}$$


$Y$ may be defined to be the matrix of $y_{ij}$.


$V$ may be defined to be a real-valued function of $n+m$ real variables so that $V=V(r_1,\ldots,r_n,c_1,\ldots,c_m)$ where
$$V(r_1,\ldots,r_n,c_1,\ldots,c_m)=\sum_{(i,j)\in S}(y_{ij}-r_i-c_j)^2.$$


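$V$ is a non-negative quadratic function, so $V$ may have a global minimum value, but there may be many values of $(r_1,\ldots,r_n,c_1,\ldots,c_m)$ that achieve this minimum value of $V$. To find a minimum of $V$, points where $V$ is stationary are sought. That is, where
$$\frac{\partial V}{\partial r_k}=0;\ k=1,\ldots,n\qquad\frac{\partial V}{\partial c_l}=0;\ l=1,\ldots,m,$$
but
$$\frac{\partial V}{\partial r_k}=-\sum_{(i,j)\in S}2(y_{ij}-r_i-c_j)\delta_{ik}=-\sum_{(k,j)\in S}2(y_{kj}-r_k-c_j),\ \text{for } k=1,\ldots,n$$
and
$$\frac{\partial V}{\partial c_l}=-\sum_{(i,j)\in S}2(y_{ij}-r_i-c_j)\delta_{jl}=-\sum_{(i,l)\in S}2(y_{il}-r_i-c_l),\ \text{for } l=1,\ldots,m.$$


Therefore a minimum satisfies:
$$\sum_{(k,j)\in S}2(y_{kj}-r_k-c_j)=0;\ k=1,\ldots,n\qquad\sum_{(i,l)\in S}2(y_{il}-r_i-c_l)=0;\ l=1,\ldots,m$$


The first $n$ sums may be over the “non-null” elements in the $k$th row of $Y$. The second $m$ sums may be over the “non-null” elements in the $l$th column of $Y$.


Let $P_k=\{j\mid(i,j)\in S\text{ and }i=k\}$ and let $Q_l=\{i\mid(i,j)\in S\text{ and }j=l\}$. The system of equations may be written as
$$\sum_{j\in P_k}(y_{kj}-r_k-c_j)=0;\ k=1,\ldots,n\qquad\sum_{i\in Q_l}(y_{il}-r_i-c_l)=0;\ l=1,\ldots,m$$
or
$$\sum_{j\in P_k}r_k+\sum_{j\in P_k}c_j=\sum_{j\in P_k}y_{kj};\ k=1,\ldots,n\qquad\sum_{i\in Q_l}r_i+\sum_{i\in Q_l}c_l=\sum_{i\in Q_l}y_{il};\ l=1,\ldots,m$$
or
$$o(P_k)\,r_k+\sum_{j\in P_k}c_j=\sum_{j\in P_k}y_{kj};\ k=1,\ldots,n\qquad\sum_{i\in Q_l}r_i+o(Q_l)\,c_l=\sum_{i\in Q_l}y_{il};\ l=1,\ldots,m.$$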


Note that $o(P_k)$ is the number of non-null elements in the $k$th row of $Y$ and that $o(Q_l)$ is the number of non-null elements in the $l$th column of $Y$. Also note that $\sum_{j\in P_k}y_{kj}$ is the sum of the non-null elements in the $k$th row of $Y$ and $\sum_{i\in Q_l}y_{il}$ is the sum of the non-null elements in the $l$th column of $Y$.


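The system of equations shown above comprises $n+m$ simultaneous linear equations in $n+m$ variables. As such, the system of equations may be expressed as a vector-matrix equation in $\mathbb{R}^{n+m}$ of the form $Fh=g$, where $F$ is an $(n+m)$ by $(n+m)$ real matrix and both $g$ and $h$ are vectors in $\mathbb{R}^{n+m}$. The vectors $h$ and $g$ are:
$$h=\begin{pmatrix}r_1\\ \vdots\\ r_n\\ c_1\\ \vdots\\ c_m\end{pmatrix},\qquad g=\begin{pmatrix}\sum_{j\in P_1}y_{1j}\\ \vdots\\ \sum_{j\in P_n}y_{nj}\\ \sum_{i\in Q_1}y_{i1}\\ \vdots\\ \sum_{i\in Q_m}y_{im}\end{pmatrix}.$$


In order to describe $F$, the symbol, $\varepsilon_{ij}$, may be used, where $\varepsilon_{ij}=1$ when $y_{ij}$ is not null and $\varepsilon_{ij}=0$ when $y_{ij}$ is null.
$$F=\begin{pmatrix}
o(P_1) & 0 & \cdots & 0 & \varepsilon_{11} & \varepsilon_{12} & \cdots & \varepsilon_{1m}\\
0 & o(P_2) & \cdots & 0 & \varepsilon_{21} & \varepsilon_{22} & \cdots & \varepsilon_{2m}\\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & & \vdots\\
0 & 0 & \cdots & o(P_n) & \varepsilon_{n1} & \varepsilon_{n2} & \cdots & \varepsilon_{nm}\\
\varepsilon_{11} & \varepsilon_{21} & \cdots & \varepsilon_{n1} & o(Q_1) & 0 & \cdots & 0\\
\varepsilon_{12} & \varepsilon_{22} & \cdots & \varepsilon_{n2} & 0 & o(Q_2) & \cdots & 0\\
\vdots & \vdots & & \vdots & \vdots & \vdots & \ddots & \vdots\\
\varepsilon_{1m} & \varepsilon_{2m} & \cdots & \varepsilon_{nm} & 0 & 0 & \cdots & o(Q_m)
\end{pmatrix}$$


The matrix $F$ is a symmetric matrix. The elements on the diagonal of the matrix $F$ may be expressed in terms of the $\varepsilon_{ij}$ term, as follows:
$$F=\begin{pmatrix}
\sum_j\varepsilon_{1j} & 0 & \cdots & 0 & \varepsilon_{11} & \varepsilon_{12} & \cdots & \varepsilon_{1m}\\
0 & \sum_j\varepsilon_{2j} & \cdots & 0 & \varepsilon_{21} & \varepsilon_{22} & \cdots & \varepsilon_{2m}\\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & & \vdots\\
0 & 0 & \cdots & \sum_j\varepsilon_{nj} & \varepsilon_{n1} & \varepsilon_{n2} & \cdots & \varepsilon_{nm}\\
\varepsilon_{11} & \varepsilon_{21} & \cdots & \varepsilon_{n1} & \sum_i\varepsilon_{i1} & 0 & \cdots & 0\\
\varepsilon_{12} & \varepsilon_{22} & \cdots & \varepsilon_{n2} & 0 & \sum_i\varepsilon_{i2} & \cdots & 0\\
\vdots & \vdots & & \vdots & \vdots & \vdots & \ddots & \vdots\\
\varepsilon_{1m} & \varepsilon_{2m} & \cdots & \varepsilon_{nm} & 0 & 0 & \cdots & \sum_i\varepsilon_{im}
\end{pmatrix}.$$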


The equation $Fh=g$ has at least one solution, and possibly an infinite number of solutions. An infinite number of values may minimize $V=V(r_1,\ldots,r_n,c_1,\ldots,c_m)$. The solution chosen for the fill in may be the solution that leads to a most conservative approximation of the $y_{ij}$ by the values of $r_i+c_j$. Such a solution, $h$, is one for which $\|h\|$ is minimum. In other words, find an $h$ such that $Fh=g$ and $\|h\|$ is minimum. Such an $h$ may be found by means of the pseudoinverse of $F$. The pseudoinverse of $F$ is a mathematically unique matrix, denoted $F^+$. The solution for $h$, such that $\|h\|$ is minimum, may be given by $h=F^+g$.
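
A small numeric illustration of the minimum-norm choice follows. The matrix used is the filler for a 1-by-1 matrix $W$ with a single non-null entry ($n=m=1$); its pseudoinverse is written out by hand because the matrix has rank one. The right-hand side $g=(2,2)$ is an assumption chosen only to make the example visible; it is not a value the algorithm would produce for that $W$.

```python
F = [[1.0, 1.0], [1.0, 1.0]]
g = [2.0, 2.0]
# Pseudoinverse of F, known in closed form for this rank-one matrix.
F_plus = [[0.25, 0.25], [0.25, 0.25]]

def apply(M, v):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(M[p][q] * v[q] for q in range(len(v))) for p in range(len(M))]

def norm_sq(v):
    return sum(x * x for x in v)

h_min = apply(F_plus, g)
print(h_min)  # [1.0, 1.0]

# Every h = (t, 2 - t) also solves F h = g, but none has a smaller norm.
for t in (-1.0, 0.0, 0.5, 2.0, 3.0):
    h = [t, 2.0 - t]
    assert apply(F, h) == g
    assert norm_sq(h) >= norm_sq(h_min)
```

The family $h=(t,2-t)$ shows the infinite set of solutions, and $h=F^+g=(1,1)$ is the one closest to the origin.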


This result follows from the definition of pseudoinverse, where: $FF^+F=F$, $F^+FF^+=F^+$, $FF^+=(FF^+)^T$, and $F^+F=(F^+F)^T$.


The above-recited relations imply that $(F^+F)(F^+F)=F^+F$ and $(FF^+)(FF^+)=FF^+$, so that, in virtue of their symmetries, $F^+F$ and $FF^+$ are both projections. For any $x$ in $\mathbb{R}^{n+m}$, either of these projections determines a decomposition of $x$ into orthogonal components:

$x=(I-F^+F)x+(F^+F)x$ or $x=(I-FF^+)x+(FF^+)x$,
so that $(x,x)=((I-F^+F)x,(I-F^+F)x)+((F^+F)x,(F^+F)x)$
or
$(x,x)=((I-FF^+)x,(I-FF^+)x)+((FF^+)x,(FF^+)x)$, respectively.


Consequently, $(F^+Fx,F^+Fx)\leq(x,x)$ and $(FF^+x,FF^+x)\leq(x,x)$ for any $x$ in $\mathbb{R}^{n+m}$. Also, if $(F^+Fx,F^+Fx)=(x,x)$ or $(FF^+x,FF^+x)=(x,x)$, respectively, then $((I-F^+F)x,(I-F^+F)x)=0$ or $((I-FF^+)x,(I-FF^+)x)=0$, respectively, so that $(I-F^+F)x=0$ or $(I-FF^+)x=0$, respectively. This forces $F^+Fx=x$ or $FF^+x=x$, respectively. Therefore, if $(F^+Fx,F^+Fx)=(x,x)$ then $F^+Fx=x$ and if $(FF^+x,FF^+x)=(x,x)$ then $FF^+x=x$.


$\tilde{h}$ may be defined by the rule $\tilde{h}=F^+g$. Then $F\tilde{h}=FF^+g$ and $F^+F\tilde{h}=F^+FF^+g=F^+g=\tilde{h}$, so that $F^+F\tilde{h}=\tilde{h}$.


Suppose there is an $h$ such that $Fh=g$. Then $F^+Fh=F^+g=\tilde{h}$, so that $(\tilde{h},\tilde{h})=(F^+Fh,F^+Fh)\leq(h,h)$ for any solution, $h$.


$F\tilde{h}=FF^+g=FF^+Fh=Fh=g$; therefore, $\tilde{h}$ is a solution to $Fh=g$ for which $\|h\|$ is minimum. Furthermore, suppose $(h,h)=(\tilde{h},\tilde{h})$; then $(h,h)=(F^+g,F^+g)=(F^+Fh,F^+Fh)$ and, because $F^+F$ is a projection, $F^+Fh=h$ by implication. Again, because $h$ is a solution, $F^+Fh=F^+g$, so $F^+g=h$; but $F^+g=\tilde{h}$, so $\tilde{h}=h$. Therefore, if $(h,h)=(\tilde{h},\tilde{h})$ then $\tilde{h}=h$. Therefore, $\tilde{h}=F^+g$ is a mathematically unique solution to $Fh=g$ for which $\|h\|$ is minimum.


The components of $\tilde{h}$ give the values of $r_i$ and $c_j$ used to fill in the null values of $W$ as follows: If $(i,j)\notin S$, then $w_{ij}=r_i+c_j+\mu$. Otherwise, the value of $w_{ij}$ remains unchanged.


The automated update algorithms described herein may make consistent judgments about enormous quantities of numerical data, and may reduce the risk that clerical errors associated with manual update activities may deform the forecast model. Automated introduction of the new data may avoid inappropriate changes in the day of week patterns that are extracted from the data, which may reduce deformation of the forecast model.


Computer Architecture


FIG. 5 shows a diagrammatic representation of a machine in the example form of a computer system 600 within which a set of instructions, for causing the machine to perform any one or more of the methodologies discussed herein, may be executed. In alternative embodiments, the machine operates as a standalone device or may be connected (e.g., networked) to other machines. In a networked deployment, the machine may operate in the capacity of a server or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. The machine may be a personal computer (PC), a tablet PC, a set-top box (STB), a Personal Digital Assistant (PDA), a cellular telephone, a web appliance, a network router, switch or bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while only a single machine is illustrated, the term “machine” shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.


The example computer system 600 includes a processor 602 (e.g., a central processing unit (CPU), a graphics processing unit (GPU) or both), a main memory 604 and a static memory 606, which communicate with each other via a bus 608. The computer system 600 may further include a video display unit 610 (e.g., a liquid crystal display (LCD) or a cathode ray tube (CRT)). The computer system 600 also includes an alphanumeric input device 612 (e.g., a keyboard), a user interface (UI) navigation device 614 (e.g., a mouse), a disk drive unit 616, a signal generation device 618 (e.g., a speaker) and a network interface device 620.


The disk drive unit 616 includes a machine-readable medium 622 on which is stored one or more sets of instructions and data structures (e.g., software 624) embodying or utilized by any one or more of the methodologies or functions described herein. The software 624 may also reside, completely or at least partially, within the main memory 604 and/or within the processor 602 during execution thereof by the computer system 600, the main memory 604 and the processor 602 also constituting machine-readable media.


The software 624 may further be transmitted or received over a network 626 via the network interface device 620 utilizing any one of a number of well-known transfer protocols (e.g., HTTP).


While the machine-readable medium 622 is shown in an example embodiment to be a single medium, the term “machine-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. The term “machine-readable medium” shall also be taken to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies of the present invention, or that is capable of storing, encoding or carrying data structures utilized by or associated with such a set of instructions. The term “machine-readable medium” shall accordingly be taken to include, but not be limited to, solid-state memories, optical and magnetic media, and carrier wave signals. Although an embodiment of the present invention has been described with reference to specific example embodiments, it will be evident that various modifications and changes may be made to these embodiments without departing from the broader spirit and scope of the invention. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.

Claims
  • 1. A computer-implemented method comprising: receiving incomplete transaction data; determining a gap in the incomplete transaction data; and using an algorithm to generate data to fill in the gap and to generate complete transaction data, wherein the algorithm is selected from a group including a first algorithm and a second algorithm, wherein the first algorithm is automatically to: determine a dominant pattern in the transaction data; identify a region within the dominant pattern that corresponds to the gap in the transaction data; and adopt data associated with the corresponding region into the gap to minimize impact on the dominant pattern; and wherein the second algorithm includes a Moore-Penrose pseudo-inverse algorithm to choose at least a portion of the transaction data to fill in the gap based on a set of substitute data from among a group of substitute data sets and to adopt the set of substitute data into the gap.
  • 2. The computer-implemented method of claim 1 wherein the first algorithm includes (i,j) referring to a jth day of an ith week, for n weeks with m days in each week, wherein x_{i,j} includes valid numerical data, and if the data is not valid on (i,j), x_{i,j}=null, wherein v_{i,j} includes v_{i,j}=x_{i,j}, unless x_{i,j}=0, in which case, v_{i,j}=null, wherein w_{i,j} includes w_{i,j}=ln(v_{i,j}) whenever v_{i,j} is not null, and w_{i,j}=null whenever v_{i,j}=null, wherein a matrix of column differences, c_{i,j}, includes c_{i,j}=w_{i,j+1}−w_{i,j} whenever both w_{i,j+1} and w_{i,j} are not null, and c_{i,j}=null, otherwise, wherein a matrix of row differences, r_{i,j}, includes r_{i,j}=w_{i+1,j}−w_{i,j} whenever both w_{i+1,j} and w_{i,j} are not null, and r_{i,j}=null, otherwise, wherein a jth column of c_{i,j} includes at least one non-null entry, and c_{*j} includes an average of each non-null entry in the jth column of c_{i,j}, otherwise, c_{*j}=0, wherein an ith row of r_{i,j} includes at least one non-null entry, and r_{i*} includes an average of each non-null entry in the ith row of r_{i,j}, otherwise, r_{i*}=0, wherein C_{j+1}=C_j+c_{*j}, where C_1=0, wherein R_{i+1}=R_i+r_{i*}, where R_1=0, wherein u_{i,j}=R_i+C_j, wherein K includes an average of w_{i,j}−u_{i,j} over each (i,j) entry where w_{i,j} is not null, wherein y_{i,j}=w_{i,j} whenever w_{i,j} is not null and otherwise, y_{i,j}=K+u_{i,j}, wherein output z_{i,j}=Round(exp(y_{i,j})), wherein the output z_{i,j} corresponds to filling in the gap.
  • 3. The computer-implemented method of claim 1 wherein the first algorithm is used when processing time for filling in the gap is to be minimized.
  • 4. The computer-implemented method of claim 1 wherein the second algorithm is used when accuracy for filling in the gap is to be maximized.
  • 5. The computer-implemented method of claim 1 wherein the second algorithm includes an equation Fh=g, wherein Fh=g includes a plurality of solutions, for h, wherein a solution from the plurality of solutions that is selected to fill in the gap is the solution for h, such that ∥h∥ is minimized solving for h=F^+ g, wherein a pseudoinverse of F includes F^+, wherein
  • 6. The computer-implemented method of claim 1, including forecasting future transaction activity utilizing the complete transaction data.
  • 7. A machine-readable medium storing a sequence of instructions that, when executed by a computer, cause the computer to perform the method of claim 1.
  • 8. A system comprising: an interface to receive transaction data; and a transaction gap module to: determine a gap in the transaction data; determine a dominant pattern in the transaction data; identify a region within the dominant pattern that corresponds to the gap in the transaction data; and adopt data associated with the corresponding region into the gap to minimize impact on the dominant pattern.
  • 9. The system of claim 8 wherein the transaction data module embodies an algorithm that includes a formula for output z_{i,j}=Round(exp(y_{i,j})), wherein the output z_{i,j} corresponds to filling in the gap, wherein (i,j) refers to a jth day of an ith week, for n weeks with m days in each week, wherein y_{i,j}=w_{i,j} whenever w_{i,j} is not null and otherwise, y_{i,j}=K+u_{i,j}, wherein K includes an average of w_{i,j}−u_{i,j} over each (i,j) entry where w_{i,j} is not null, wherein C_{j+1}=C_j+c_{*j}, where C_1=0, wherein R_{i+1}=R_i+r_{i*}, where R_1=0, wherein u_{i,j}=R_i+C_j, wherein a matrix of column differences, c_{i,j}, includes c_{i,j}=w_{i,j+1}−w_{i,j} whenever both w_{i,j+1} and w_{i,j} are not null, and c_{i,j}=null, otherwise, wherein a matrix of row differences, r_{i,j}, includes r_{i,j}=w_{i+1,j}−w_{i,j} whenever both w_{i+1,j} and w_{i,j} are not null, and r_{i,j}=null, otherwise, wherein a jth column of c_{i,j} includes at least one non-null entry, and c_{*j} includes an average of each non-null entry in the jth column of c_{i,j}, otherwise, c_{*j}=0, wherein an ith row of r_{i,j} includes at least one non-null entry, and r_{i*} includes an average of each non-null entry in the ith row of r_{i,j}, otherwise, r_{i*}=0, wherein x_{i,j} includes valid numerical data, and if the data is not valid on (i,j), x_{i,j}=null, wherein v_{i,j} includes v_{i,j}=x_{i,j}, unless x_{i,j}=0, in which case, v_{i,j}=null, wherein w_{i,j} includes w_{i,j}=ln(v_{i,j}) whenever v_{i,j} is not null, and w_{i,j}=null whenever v_{i,j}=null.
  • 10. A system comprising: an interface to receive transaction data; a transaction gap module to: determine a gap in the transaction data; use a Moore-Penrose pseudo-inverse algorithm to determine transaction data to fill in the gap based on a set of substitute data from among a group of substitute data sets; and adopt the set of substitute data into the gap.
  • 11. The system of claim 10 wherein the gap includes at least one of a data error and a data omission.
  • 12. The system of claim 10 wherein the transaction gap module includes an equation Fh=g, wherein Fh=g includes a plurality of solutions, for h, wherein a solution from the plurality of solutions that is selected to fill in the gap is the solution for h, such that ∥h∥ is minimized solving for h=F^+ g, wherein a pseudoinverse of F includes F^+, wherein vectors h and g include:
  • 13. A system comprising: means for receiving transaction data; means for determining a gap in the transaction data; means for determining a dominant pattern in the transaction data; means for identifying a region within the dominant pattern that corresponds to the gap in the transaction data; and means for adopting data associated with the corresponding region into the gap to minimize impact on the dominant pattern.
  • 14. A system comprising: means for receiving transaction data; means for determining a gap in the transaction data; means for determining transaction data to fill in the gap based on a set of substitute data from among a group of substitute data sets; and means for adopting the set of substitute data into the gap.
  • 15. The system of claim 14 wherein the means for determining transaction data includes using a Moore-Penrose pseudo-inverse algorithm.
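
The steps of the first algorithm recited in claims 2 and 9 can be followed in a short Python sketch. This is an illustration only, not the applicants' implementation. The function name fill_gaps, the helper names, the use of None for null, and the sample data are all assumptions made for this example, and Python's built-in round stands in for Round. The input is assumed to hold non-negative counts for n weeks (rows) and m days (columns).

```python
import math


def fill_gaps(x):
    """Fill None entries of x (n weeks x m days) using the steps of claim 2.

    Hypothetical helper for illustration; None plays the role of null.
    """
    n, m = len(x), len(x[0])

    # v treats zeros as null; w is the natural log of each valid entry.
    w = [[None if val is None or val == 0 else math.log(val) for val in row]
         for row in x]

    def diff(a, b):
        # Difference that stays null when either operand is null.
        return a - b if a is not None and b is not None else None

    def mean_or_zero(values):
        # Average of the non-null entries, or 0 when there are none.
        kept = [v for v in values if v is not None]
        return sum(kept) / len(kept) if kept else 0.0

    # c_star[j]: average of the column differences w[i][j+1] - w[i][j].
    c_star = [mean_or_zero(diff(w[i][j + 1], w[i][j]) for i in range(n))
              for j in range(m - 1)]
    # r_star[i]: average of the row differences w[i+1][j] - w[i][j].
    r_star = [mean_or_zero(diff(w[i + 1][j], w[i][j]) for j in range(m))
              for i in range(n - 1)]

    # Cumulative sums: C_1 = 0, C_{j+1} = C_j + c_star[j]; likewise for R.
    C = [0.0]
    for c in c_star:
        C.append(C[-1] + c)
    R = [0.0]
    for r in r_star:
        R.append(R[-1] + r)

    # u is the additive (log-scale) pattern; K aligns it with observed data.
    u = [[R[i] + C[j] for j in range(m)] for i in range(n)]
    K = mean_or_zero(diff(w[i][j], u[i][j])
                     for i in range(n) for j in range(m))

    # y keeps observed logs and uses K + u in the gaps; z = Round(exp(y)).
    return [[round(math.exp(w[i][j] if w[i][j] is not None else K + u[i][j]))
             for j in range(m)]
            for i in range(n)]


# Two weeks of three days; the last day of the second week is missing.
print(fill_gaps([[100, 200, 300], [200, 400, None]]))
# prints [[100, 200, 300], [200, 400, 600]]
```

In this sample the second week runs at twice the volume of the first and the days scale as 1 : 2 : 3, so the filled value of 600 continues both the week-to-week and the day-to-day pattern. Working in logarithms turns such multiplicative patterns into additive row and column offsets, which is why the differences are averaged on ln values.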
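
Claims 5 and 12 select, among the many solutions of Fh=g, the one with the smallest norm ∥h∥, given by h=F^+ g. The definitions of F, h, and g are not reproduced in the claim text above, so the sketch below uses a small hypothetical F and g purely to show the minimum-norm property. It also assumes that the rows of F are linearly independent, in which case the Moore-Penrose pseudo-inverse reduces to F^+ = Fᵀ(FFᵀ)⁻¹; the general case needs a full pseudo-inverse routine. The function names are assumptions for this example.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def min_norm_solution(F, g):
    """Return h = F^T (F F^T)^{-1} g, the minimum-norm solution of F h = g.

    Valid only when the rows of F are linearly independent (an assumption).
    """
    rows, cols = len(F), len(F[0])
    gram = [[sum(F[i][k] * F[j][k] for k in range(cols)) for j in range(rows)]
            for i in range(rows)]
    lam = solve(gram, g)  # lam = (F F^T)^{-1} g
    return [sum(F[i][k] * lam[i] for i in range(rows)) for k in range(cols)]


# Hypothetical underdetermined system: 2 equations, 3 unknowns.
F = [[1, 0, 1],
     [0, 1, 1]]
g = [2, 2]
h = min_norm_solution(F, g)  # approximately [2/3, 2/3, 4/3]
```

Here h=[2, 2, 0] also satisfies Fh=g, but its squared norm is 8, whereas the pseudo-inverse solution has squared norm 8/3.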