Enterprise software systems receive, generate and store data related to many aspects of a business enterprise. These systems may provide reporting, planning, and/or analysis of the data based on logical entities known as dimensions and measures. Dimensions represent sets of values (i.e., Dimension members) along which an analysis may be performed or a report may be generated (e.g., Country, Year, Product), and measures are indicators, most often numeric, whose values can be determined for a given combination of dimension members. For example, a value of the Sales measure may be determined for bicycles (i.e., a member of the Product dimension) in January (i.e., a member of the Month dimension).
According to one type of business planning, future business targets are defined in order to assist operational planning and to serve as a benchmark against which performance may be measured. Referential disaggregation is one technique used to define future business targets based on prior data.
Table 100 of
System 200 includes application server 210 to provide data of data source 220 to client system 230. For example, application server 210 may execute one of applications 215 to receive a request for analysis from analysis client 232 executed by client system 230, to query data source 220 for data required by the analysis, receive the data from data source 220, perform the analysis on the data, and return results of the analysis to client system 230.
Data source 220 may comprise any one or more systems to store business data. The data stored in data source 220 may be received from disparate hardware and software systems, some of which are not interoperational with one another. The systems may comprise a back-end data environment employed in a business or industrial context. The data may be pushed to data source 220 and/or provided in response to queries received therefrom.
The data may comprise a relational database, a multi-dimensional database, an eXtendable Markup Language (XML) document, and/or any other structured data storage system. The physical tables of data source 220 may be distributed among several relational databases, multi-dimensional databases, and/or other data sources. For example, data source 220 may comprise one or more OnLine Analytical Processing (OLAP) databases (i.e., cubes). Moreover, the data of data source 220 may be indexed and/or selectively replicated in an index.
Data source 220 may implement an “in-memory” database, in which volatile (e.g., non-disk-based) storage (e.g., Random Access Memory) is used both for cache memory and for storing data during operation, and persistent storage (e.g., one or more fixed disks) is used for offline persistency of data and for maintenance of database snapshots. Alternatively, volatile storage may be used as cache memory for storing recently-used database data, while persistent storage stores data. In some embodiments, the data comprises one or more of conventional tabular data, row-based data stored in row format, column-based data stored in columnar format, and object-based data.
To provide economies of scale, data source 220 may include logical databases of more than one customer, which are programmatically isolated from one another. In this scenario, application server 210 includes mechanisms to ensure that a client accesses only the data that the client is authorized to access.
Client system 230 may comprise one or more devices executing program code of a software application for presenting user interfaces to allow interaction with applications 215 of application server 210. Client system 230 may comprise a desktop computer, a laptop computer, a personal digital assistant, a tablet PC, and a smartphone, but is not limited thereto.
Analysis client 132 may comprise program code of a spreadsheet application, a spreadsheet application with a plug-in allowing communication (e.g. via Web Services) with application server 210, a rich client application (e.g., a Business Intelligence tool), an applet in a Web browser, or any other application to perform the processes attributed thereto herein.
Repository 240 stores metadata and data for use by application server 210. The metadata may specify a logical schema of data source 220 (i.e., dimensions and measures), which may be used by application server 210 to query data source 220. The metadata may also define users, workspaces, data source connections, and dimension member hierarchies.
Although system 200 has been described as a distributed system, system 200 may be implemented in some embodiments by a single computing device. For example, both client system 230 and application server 210 may be embodied by an application executed by a processor of a desktop computer, and data source 220 may be embodied by a fixed disk drive within the desktop computer.
Prior to process 300, a user may operate analysis client 232 to access a planning interface, such as a planning Web page. The user interacts with the interface to input a total target value associated with two or more (hereinafter, “N”) dimension members. For example, with reference to
The specified dimension members and total target value are determined at S310. For example, the specified dimension members and total target value may be received by an application 215 of application server 210 at S310. Next, at S320, a set of historical values is determined for each of the dimension members. According to some embodiments, application server 210 may query data source 220 for these values (e.g., using metadata of repository 140). Continuing the above example, application server 210 may query data source 220 for revenue values for each of products A, B and C for each of several years.
A prediction interval is determined for each of the N dimension members based on its set of historical values at S330. Linear regression with least squares fitting is a known operation to determine such a prediction interval. For example,
Table 500 of
At S340, an N-polytope in N-dimensional space is determined based on the N determined prediction intervals. A polytope is a geometric object with flat sides and any number of dimensions.
Next, at S350, an (N−1) polytope of the N-dimensional space is determined, in which the sum of the N values of each coordinate of the (N−1) polytope equals the total desired value. According to the present example, S350 includes determination of a plane in which each coordinate is a solution of A+B+C=110.
An intersection of the (N−1) polytope and the N-polytope is determined at S360.
Accordingly, a disaggregation of the total target value among the N dimension members is determined based on the coordinates of the intersection at S370. In some examples, an appropriate disaggregation is determined to be a midpoint of the intersection. The midpoint may be defined in various manners, including as the point where a min-max line intersects the intersection, the center of mass of the intersection, the centroid of the vertices of the intersection, and the point whose probability is maximized in the linear regression models.
The min (max) point is the point of the N-polytope defined by the prediction intervals which has the minimum (maximum) value in each axis, and the min-max line is a line between the min point and the max point.
Let Min=(p1, . . . , p3) and Max=(q1, . . . , q3) be the min point and the max point of the rectilinear box, respectively. All intervals (p1,q1), . . . , (pd,qd) are specified by Min and Max. Let T be the target value of interval disaggregation. In the example of
In order to determine the intersecting point at S370, the min-max line, which passes through Min and Max, is denoted by a vector X=Min+t(Max−Min) for a real number t, i.e., X=(p1+t(q1−p1), . . . , pd+t(qd−pd)).
t=T−(p1+ . . . +pd)/(q1+ . . . +qd)−(p1+ . . . +pd).
Substituting the value oft for X produces the intersecting point. According to the present example, X=(40+8t, 27+6t, 30+3t),
and the intersecting point is (46.1, 31.6, 32.3). The disaggregation is therefore A=46.1, B=31.6 and C=32.3.
Another way to view the above computation is as follows. Given a vector (a1, . . . , ad) in d dimensions, its Manhattan distance (also known as L1 distance) is a1+ . . . +ad, while its Euclidean distance is √{square root over (a12+ . . . +ad2)}. The line segment between Min and Max (i.e., vector Max−Min) has Manhattan distance (q1+ . . . +qd)−(p1+ . . . +pd), and the line segment between Min and the intersecting point has Manhattan distance T−(p1+ . . . +pd) because the Manhattan distance of the intersecting point is T and that of Min is p1+ . . . +pd. Hence, the intersecting point is the point in the line segment between Min and Max whose relative distance from Min is
i.e., it is Min+t(Max−Min).
Alternatively, to find the center of mass and the centroid at S370, the vertices of the intersection are determined. The vertices of the intersections are the intersections of the (N−1)-polytope of S350 and the edges of the N-polytype of S340. Initially, as shown in
There are four edges parallel to the A-axis in
In general, let (r1, r2, . . . , rd) be the max point of the N-polytype, when the min point is (0, . . . , 0), i.e., ri=qi−pi for 1≦i≦d. To find the intersections on the edges parallel to the A-axis, find all subsets of {r2, . . . , rd} whose sum is between T′−r1 and T′, where T′=T−(p1+−+pd). Similarly, the intersections on the edges parallel to the B-axis, etc. may be determined. The following two algorithms may be executed to determine the intersections for all axes at the same time.
The first algorithm computes all possible subsets of {r1, . . . , rd} and checks if each subset can produce intersections. Let Sum[0 . . . 2d−1] be an array such that Sum[i] is the sum of the subset represented by the binary notation of i, i.e., rk is in the subset if the k-th rightmost bit of i is 1. For example, if i=1102 in
Sum[0]=0
for(k=1 to d)
From each entry Sum[i], intersections are determined by the following Cases. Let maxr=1≦k≦dmax[(r)k].
Sum[i]=T′: the subset represented byi is an intersection.
T′−maxr<Sum[i]<T′: for each dimension 1≦k≦d, if the k-th rightmost bit of i is 0 and Sum[i]>T′−rk, output intersection (b1, . . . , bd), where bi=r1 if i≠k and the j-th rightmost bit of i is 1; bi=0 if j≠k and the j-th rightmost bit of i is 0; bj=T′−Sum[i] if j=k. (If Sum[i]=T′−rk; there is an intersection, but this intersection will be found in Case 1 of some other entry).
Sum[i]≦T′−maxr: output no intersections. (If Sum[i]=T′−maxr, again this intersection will be found in Case 1 of some other entry).
For the example in
Let S be the number of intersections, and P the number of entries such that T′−maxr<Sum[i]≦T′. Computing Sum takes O(2d) time. For each entry such that T′−maxr<Sum[i]≦T′, the three cases above take at least O(d) time, and if there are many intersections from the entry then Case 2 takes O(d) time for each intersection. Therefore, the time complexity of the first algorithm is O(2d+d·P+d·S).
The second algorithm uses dynamic programming Let A(i,w) be the maximum value ≦w that can be obtained with r1, . . . , r1. A dynamic programming recurrence for A(i,w) is:
All entries of table A(0 . . . d, 0 . . . T′) are computed by the recurrence above, and then all paths from the entries A(d,T′−maxr+1 . . . T′) to A(0,0) are found by backtracking. Each path corresponds to a subset of {r1, . . . rd} whose sum is >T′−maxr. Hence, the number of distinct paths is exactly P. For each path, Cases 1 and 2 of the first algorithm are performed, where Sum[i] is now the sum of the subset corresponding to the path. The time complexity of the second algorithm is O(d·T′+d·P+d·S), since computing table A takes O(d·T′) time and backtracking O(d·P).
For the example of
The first algorithm may be useful when N is small whereas the second may be useful when the target value T′ is moderate.
Once all vertices of the intersection are found by one of the two algorithms above, the centroid, which is the average of the vertices, can be easily computed. For instance, in the
To find the center of mass, the intersection is divided into simplexes (e.g., a simplex in two dimensions is a triangle, and a tetrahedron in three dimensions) by using Delaunay triangulation. The center of mass of a simplex is the centroid of its vertices, and the center of mass of the feasible polytope is the weighted sum of the centers of mass of the simplexes where the weights are the volumes of the simplexes. For the example of
In order to determine the mode at S370, the value of the response variable yo at a future value xo is determined for each of N dimension members. As described with respect to S330, a linear regression may be determined for each of N dimension members. Let y=α+βx be the linear equation of line 1110 obtained by linear regression on a sample of n points for a dimension member. See
where MSREs,
Let yi, 1≦1≦d, be the response variable (yo in the previous paragraph) in the i-th dimension at the future value xo. Let pdfi(yi) be the probability density function for yi. If we assume for simplicity that the d dimensions of interval disaggregation are independent, the mode is a point (a1, . . . , ad) such that pdfi(ai)× . . . ×pdfd(ad) is maximized.
A user may desire to adjust disaggregated values after interval disaggregation. During such adjusting, the user may fix some one or more of the values and want to see the remaining values determined by interval disaggregation. For the disaggregated values in table 500, the user may, for example, fix the value of product group A to 47 and request determination of the values of product groups B and C by interval disaggregation (Table 5).
Accordingly, the above-described three-dimensional interval disaggregation with total target value of 110 becomes a two-dimensional interval disaggregation in which Min=(27,30), Max=(33,33), and the total target value is 110−47=63. The two-dimensional interval disaggregation can be solved for each of the four above-described midpoints. For example, the intersecting point may be computed as follows: Max−Min=(6, 3), which has Manhattan distance 9. Since the Manhattan distance of the intersecting point is 63 and that of Min is 57, the intersecting point has relative distance 6/9 from Min, and it is
Apparatus 1200 includes processor 1210 operatively coupled to communication device 1220, data storage device 1230, one or more input devices 1240, one or more output devices 1250 and memory 1260. Communication device 1220 may facilitate communication with external devices, such as application server 110. Input device(s) 1240 may comprise, for example, a keyboard, a keypad, a mouse or other pointing device, a microphone, knob or a switch, an infra-red (IR) port, a docking station, and/or a touch screen. Input device(s) 1240 may be used, for example, to manipulate graphical user interfaces and to input information into apparatus 1200. Output device(s) 1250 may comprise, for example, a display (e.g., a display screen) a speaker, and/or a printer.
Data storage device 1230 may comprise any device, including combinations of magnetic storage devices (e.g., magnetic tape, hard disk drives and flash memory), optical storage devices, Read Only Memory (ROM) devices, etc., while memory 1260 may comprise Random Access Memory (RAM).
Planning application 1232 of data storage device 1230 may comprise program code executable by processor 1210 to provide any of the functions described herein, including but not limited to process 300. Multidimensional data 1234 may store values associated with dimension members as described herein, in any format that is or becomes known. Multidimensional data 1234 may also alternatively be stored in memory 1260. Data storage device 1230 may also store data and other program code for providing additional functionality and/or which are necessary for operation thereof, such as device drivers, operating system files, etc.
Other topologies may be used in conjunction with other embodiments. Moreover, each system described herein may be implemented by any number of computing devices in communication with one another via any number of other public and/or private networks. Two or more of such computing devices of may be located remote from one another and may communicate with one another via any known manner of network(s) and/or a dedicated connection. Each computing device may comprise any number of hardware and/or software elements suitable to provide the functions described herein as well as any other functions. For example, any computing device used in an implementation of system 200 may include a processor to execute program code such that the computing device operates as described herein.
All systems and processes discussed herein may be embodied in program code stored on one or more computer-readable non-transitory media. Such media non-transitory media may include, for example, a fixed disk, a floppy disk, a CD-ROM, a DVD-ROM, a Flash drive, magnetic tape, and solid state RAM or ROM storage units. Embodiments are therefore not limited to any specific combination of hardware and software.
The embodiments described herein are solely for the purpose of illustration. Those in the art will recognize other embodiments may be practiced with modifications and alterations limited only by the claims.