Today's organizations collect and store large volumes of data at an ever-increasing rate. Performing calculations upon or identifying patterns within this data can be time-consuming or even infeasible. Modern data analytics attempts to assist humans in efficiently understanding collected data. Such analytics may include application of machine learning techniques, specific data mining techniques, and purpose-designed mathematical functions.
Certain functions can be applied to an entire set of data or to subspaces of the data. For example, a function may be applied to data in order to determine a total profit for an organization. The same function could also be applied to only the subspace of the data which is associated with a particular region, in order to determine a profit for that region. Such a function is an example of a function which, when applied to an entire set of data, produces an output equal to the sum of the outputs produced when it is applied to non-overlapping subsets of the data which together comprise the entire set of data. In such a scenario, the contribution of each subset to the total profit is clear and intuitive.
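The additive case can be illustrated with a short sketch. The rows, region names, and the `total_profit` helper below are hypothetical and chosen only to show that, for a profit computed as a sum of revenue minus cost, the per-region outputs add up exactly to the overall output.

```python
# Hypothetical sales rows: (region, revenue, cost). Values are illustrative only.
rows = [
    ("North", 120.0, 80.0),
    ("North", 60.0, 45.0),
    ("South", 200.0, 150.0),
    ("West", 90.0, 40.0),
]


def total_profit(subset):
    """Profit = sum of (revenue - cost); additive over disjoint subsets."""
    return sum(revenue - cost for _, revenue, cost in subset)


overall = total_profit(rows)

# Partition the rows by region and apply the same function to each subset.
regions = sorted({region for region, _, _ in rows})
per_region = {r: total_profit([row for row in rows if row[0] == r]) for r in regions}

print(overall)                   # profit over all rows
print(sum(per_region.values()))  # sum of per-region profits (equal for an additive function)
```

Both printed values are 155.0, so each region's share of the total can be read off directly.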
However, in the case of certain complex functions (e.g., non-linear functions), the sum of the outputs determined for the respective subsets is not equal to the output determined for the entire set of data. It is therefore difficult to intuitively relate the subset-determined outputs, proportionally, to the output determined for the entire set of data. Determining the relationship of each subset to the output determined for the entire set of data therefore requires detailed knowledge of the applied function, and may in turn require derivation of additional functions to map the relationship between the function output when applied to subsets and the function output when applied to the entire set of data. Even if this approach were practical for a given function, it would be specific to that function and could not be generalized.
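A non-linear counterpart can be sketched with the same kind of illustrative rows. Here the hypothetical `margin_ratio` function computes a gross margin ratio, (revenue - cost) / revenue, which is not additive: applied to all rows it yields roughly 0.33, while the per-region ratios sum to roughly 1.11.

```python
# Hypothetical sales rows: (region, revenue, cost). Values are illustrative only.
rows = [
    ("North", 120.0, 80.0),
    ("North", 60.0, 45.0),
    ("South", 200.0, 150.0),
    ("West", 90.0, 40.0),
]


def margin_ratio(subset):
    """Gross margin ratio (revenue - cost) / revenue; not additive over subsets."""
    revenue = sum(r for _, r, _ in subset)
    cost = sum(c for _, _, c in subset)
    return (revenue - cost) / revenue


overall = margin_ratio(rows)
regions = sorted({region for region, _, _ in rows})
per_region = {r: margin_ratio([row for row in rows if row[0] == r]) for r in regions}

# The per-region outputs no longer sum to the overall output.
print(overall, sum(per_region.values()))
```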
Systems are desired to provide a generic solution capable of efficiently mapping the relationship between the application of a function to subsets associated with respective discrete values of data and the application of the function to the entire set of data, without requiring any specialized knowledge of the underlying function.
The following description is provided to enable any person skilled in the art to make and use the described embodiments and sets forth the best mode contemplated for carrying out some embodiments. Various modifications, however, will be readily apparent to those skilled in the art.
Some embodiments provide an interpretable mapping between the output of any calculation as applied to respective discrete values of a discrete feature within a set of data and the output of the calculation as applied to the entire set of data. Some embodiments may determine such a mapping for each of multiple discrete features in parallel, and may therefore be particularly suited for cloud-based implementations.
As used herein, a feature refers to an attribute of a set of data. In the case of tabular data, each column may be considered as representing a respective feature of the data, while each row is an instance of values of each feature of the data. A continuous feature is represented using numeric data having an infinite number of possible values within a selected range. A discrete feature is represented by data having a finite number of possible values, hereinafter referred to as discrete values. Temperature is an example of a continuous feature, while days of the week and gender are examples of discrete features.
Data 110 may comprise values of a database table. More specifically, data 110 may comprise rows of a database table, with each row including a value of each database column, or feature. Data 110 includes at least one discrete feature and one or more continuous features.
Feature selection component 120 identifies the continuous features which are utilized during evaluation of function ƒ and one or more discrete features for which discrete value-specific proportional contributions are to be determined. In the
Function application component 140 applies function ƒ to selected continuous features 135. In particular, function application component 140 evaluates function ƒ using all rows of selected continuous features 135 to generate overall output value Vall. Function application component 140 also evaluates function ƒ for each discrete value of discrete feature 130 using the rows of selected continuous features 135 which are associated with the discrete value. For example, function application component 140 generates output value VC1 based on rows of selected continuous features 135 which correspond to a first discrete value (i.e., discrete value C1) of discrete feature 130, generates output value VC2 based on rows of selected continuous features 135 which correspond to a second discrete value (i.e., discrete value C2) of discrete feature 130, and continues in this manner for each discrete value of discrete feature 130.
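The behavior of function application component 140 can be sketched as a grouping step followed by repeated evaluation of the same function. This is a minimal illustration, not the component itself: rows are assumed to be dictionaries keyed by column name, function ƒ is assumed to accept a list of rows, and the names `evaluate_by_discrete_value`, `margin`, and the `sector` column are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


def evaluate_by_discrete_value(
    rows: List[Row],
    discrete_feature: str,
    f: Callable[[List[Row]], float],
) -> Tuple[float, Dict[object, float]]:
    """Return V_all = f(all rows) and, per discrete value c, V_c = f(rows having value c)."""
    groups: Dict[object, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[discrete_feature]].append(row)
    v_all = f(rows)
    per_value = {value: f(group) for value, group in groups.items()}
    return v_all, per_value


def margin(group: List[Row]) -> float:
    """Illustrative non-linear function: gross margin ratio over a group of rows."""
    revenue = sum(r["revenue"] for r in group)
    cost = sum(r["cost"] for r in group)
    return (revenue - cost) / revenue


rows = [
    {"sector": "C1", "revenue": 100.0, "cost": 60.0},
    {"sector": "C1", "revenue": 50.0, "cost": 40.0},
    {"sector": "C2", "revenue": 80.0, "cost": 20.0},
    {"sector": "C3", "revenue": 70.0, "cost": 70.0},
]
v_all, v_per_value = evaluate_by_discrete_value(rows, "sector", margin)
```

Because the grouping is independent of ƒ, the same helper works for any function that can be evaluated on a subset of rows.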
Proportional contribution analysis component 150 determines the proportional contribution which each discrete value makes to the overall output value. Generally, and as will be described in more detail below, a single scaling value is determined which is applied to each discrete value-specific output value such that the sum of the thusly-scaled discrete value-specific output values equals the overall output value. Accordingly, system 100 may provide an interpretable explanation of how the output value determined for each discrete value contributes to the output value of the function when applied to the entire set of data.
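Stated as an equation, and assuming the discrete value-specific output values do not sum to zero, the requirement above determines the single scaling value uniquely. Here $n$ is the number of discrete values, $V_{C_i}$ is the output value for discrete value $C_i$, $s$ is the scaling value, and $P_{C_i}$ is a symbol introduced here for the resulting proportional contribution:

```latex
\sum_{i=1}^{n} s\,V_{C_i} = V_{\text{all}}
\quad\Longrightarrow\quad
s = \frac{V_{\text{all}}}{\sum_{i=1}^{n} V_{C_i}},
\qquad
P_{C_i} = s\,V_{C_i}, \quad i = 1, \dots, n
```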
Process 200 may be initiated by a request for proportional contributions of each of a plurality of discrete values to an output value of a function. Such a request may be received from an end-user via a data analytics application. In one non-exhaustive example, an end-user operates an inventory management application to request calculation of an output value of a function based on an input procurement table.
The input data is received at S210 in a structured form, such as a tabular format. The structured format facilitates definition of one or more continuous features and one or more discrete features within the data. At least one of the discrete features is not used to calculate the desired function.
A plurality of the continuous features and one of the discrete features are selected at S220. The selected continuous features are those features which are needed to evaluate the function which outputs the requested value. In other words, the selected continuous features represent the variables of the function.
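Since the selected continuous features are the variables of the function, one possible way to automate S220 is to read the variable names from the function's signature. The sketch below assumes, hypothetically, that the function's parameter names match column names in the input data; `select_features` and `gross_margin` are illustrative names, not part of the described system.

```python
import inspect
from typing import Callable, List, Sequence, Tuple


def select_features(
    columns: Sequence[str], f: Callable[..., float], discrete_feature: str
) -> Tuple[List[str], str]:
    """Select the continuous features named by f's parameters, plus one discrete feature."""
    variables = list(inspect.signature(f).parameters)
    missing = [name for name in variables if name not in columns]
    if missing:
        raise ValueError(f"function variables not found in data: {missing}")
    if discrete_feature not in columns:
        raise ValueError(f"unknown discrete feature: {discrete_feature}")
    return variables, discrete_feature


def gross_margin(revenue: float, cost: float) -> float:
    """Illustrative function whose variables are the 'revenue' and 'cost' columns."""
    return (revenue - cost) / revenue


columns = ["sector", "region", "revenue", "cost"]
continuous, discrete = select_features(columns, gross_margin, "sector")
```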
For clarity of the following explanation, it will be assumed that only one discrete feature is selected at S220. Embodiments are not limited to the selection of a single discrete feature. Accordingly, the following description will note variations to the described process which occur in the case of more than one selected discrete feature. In some embodiments, if no discrete features are specified by a user at S220, all discrete features of the data are assumed to be selected.
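When several discrete features are selected, the per-discrete-value evaluation for each feature is independent of the others, which is what allows the mappings to be determined in parallel. The sketch below assumes rows are dictionaries and ƒ accepts a list of rows; it defaults to all discrete features when none are specified. It uses a thread pool only to keep the example self-contained; for CPU-bound Python work, a process pool or separate cloud workers would be needed for real parallel speedup.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional, Sequence

Row = Dict[str, object]


def outputs_per_value(rows: List[Row], feature: str, f: Callable[[List[Row]], float]) -> Dict[object, float]:
    """Evaluate f on the rows associated with each discrete value of one feature."""
    groups: Dict[object, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[feature]].append(row)
    return {value: f(group) for value, group in groups.items()}


def outputs_for_features(
    rows: List[Row],
    f: Callable[[List[Row]], float],
    all_discrete_features: Sequence[str],
    selected: Optional[Sequence[str]] = None,
) -> Dict[str, Dict[object, float]]:
    """Run the per-value evaluation for each selected discrete feature concurrently."""
    # If no discrete feature is specified, treat all discrete features as selected.
    features = list(selected) if selected else list(all_discrete_features)
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda feat: outputs_per_value(rows, feat, f), features))
    return dict(zip(features, results))


rows = [
    {"sector": "Retail", "region": "North", "revenue": 100.0},
    {"sector": "Retail", "region": "South", "revenue": 50.0},
    {"sector": "Energy", "region": "North", "revenue": 30.0},
]
total_revenue = lambda group: sum(r["revenue"] for r in group)
out = outputs_for_features(rows, total_revenue, ["sector", "region"])
```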
Next, at S230, an overall output value of the function is determined using all the values associated with the selected continuous features. In some embodiments, the values of each continuous feature are aggregated to result in one value per continuous feature. The function is then applied to the set of aggregated values.
At S240, an output value of the function is determined for each discrete value of the selected discrete feature. Determination of the output value for a particular discrete value is based on the values of the selected continuous features which are associated with that discrete value. As described with respect to S230, the values of the selected continuous features which are associated with each discrete value may be initially identified and aggregated. In the illustrated example, each of the three discrete values (i.e., C1, C2, C3) of column 310 is associated with three respective rows of continuous feature columns 320.
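S230 and S240 can be sketched together as "aggregate, then apply". The sketch assumes summation as the aggregation (the description does not fix one) and uses a hypothetical gross margin function and nine illustrative rows, three per discrete value, mirroring the layout described for column 310 and columns 320.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

Row = Dict[str, object]


def aggregate(rows: List[Row], continuous_features: Sequence[str]) -> Dict[str, float]:
    """Aggregate each continuous feature to a single value (summation assumed)."""
    return {feat: sum(row[feat] for row in rows) for feat in continuous_features}


def gross_margin(revenue: float, cost: float) -> float:
    """Illustrative function of the aggregated continuous features."""
    return (revenue - cost) / revenue


continuous = ["revenue", "cost"]
rows = [
    {"sector": "C1", "revenue": 10.0, "cost": 5.0},
    {"sector": "C1", "revenue": 20.0, "cost": 10.0},
    {"sector": "C1", "revenue": 30.0, "cost": 15.0},
    {"sector": "C2", "revenue": 40.0, "cost": 30.0},
    {"sector": "C2", "revenue": 40.0, "cost": 30.0},
    {"sector": "C2", "revenue": 20.0, "cost": 15.0},
    {"sector": "C3", "revenue": 25.0, "cost": 10.0},
    {"sector": "C3", "revenue": 25.0, "cost": 10.0},
    {"sector": "C3", "revenue": 50.0, "cost": 20.0},
]

# S230: overall output value from all rows.
v_all = gross_margin(**aggregate(rows, continuous))

# S240: one output value per discrete value, from that value's rows only.
groups: Dict[object, List[Row]] = defaultdict(list)
for row in rows:
    groups[row["sector"]].append(row)
v_per_value = {c: gross_margin(**aggregate(g, continuous)) for c, g in groups.items()}
```

With these values the per-value outputs are 0.5, 0.25, and 0.6, which sum to 1.35, while the overall output is 115/260 (about 0.44), illustrating why a scaling step is needed.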
If more than one discrete feature is selected at S220, the process illustrated at
The proportional contribution of each discrete value to the overall output value is determined at S250 based on the output values determined at S230 and S240. As mentioned above, and according to some embodiments, each discrete value-specific output value is scaled such that the sum of the thusly-scaled discrete value-specific output values equals the overall output value determined at S230.
Process 600 of
At S610, a square symmetric matrix is generated based on the output values determined for each discrete value at S240.
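The exact layout of matrix 700 is not recoverable from this text beyond the statement that each row and each column includes all of the output values. One construction that satisfies both properties, assumed here purely for illustration, sets entry (i, j) to the output value at index (i + j) mod n: it is symmetric because i + j = j + i, and every row and column is a permutation of the output values. The function name is hypothetical.

```python
from typing import List, Sequence


def build_symmetric_matrix(outputs: Sequence[float]) -> List[List[float]]:
    """Hypothetical layout: M[i][j] = outputs[(i + j) % n].

    The result is symmetric, and each row and column contains every output value once.
    """
    n = len(outputs)
    return [[outputs[(i + j) % n] for j in range(n)] for i in range(n)]


outputs = [0.5, 0.25, 0.6]  # illustrative output values for C1, C2, C3
matrix = build_symmetric_matrix(outputs)
# matrix == [[0.5, 0.25, 0.6], [0.25, 0.6, 0.5], [0.6, 0.5, 0.25]]
```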
Each row and each column of matrix 700 includes all of the output values of
An overall output vector is generated at S620. The number of entries of the overall output vector is equal to the number of rows of the symmetric matrix (i.e., the number of discrete values of the associated discrete feature). Moreover, as shown in
At S630, and for each selected discrete feature, the associated symmetric square matrix and overall output vector are used to build a regression model to predict the overall output value of the function as applied to the entire input data. In particular, the rows of the symmetric square matrix are used as training set instances, with each column representing an independent input feature, and the overall output vector is used as the dependent feature, with each entry representing the target value to be predicted for a corresponding row of the symmetric square matrix. The linear regression algorithm may comprise a least squares algorithm or any other suitable regression algorithm from which weights can be extracted.
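S620 through S640 can be sketched with NumPy's least-squares solver. Two assumptions are made because the source text is truncated: the matrix uses the hypothetical (i + j) mod n layout, and every entry of the overall output vector equals the overall output value. No intercept term is fitted, so the prediction is purely a weighted sum of the output values. The name `learn_weights` is illustrative.

```python
import numpy as np


def learn_weights(outputs, v_all):
    """Least-squares fit of y ≈ X @ w (no intercept); returns one weight per discrete value."""
    v = np.asarray(outputs, dtype=float)
    n = v.size
    # Hypothetical symmetric layout: X[i, j] = v[(i + j) % n].
    idx = (np.arange(n)[:, None] + np.arange(n)[None, :]) % n
    X = v[idx]
    # Overall output vector: assumed to hold the overall output value in every entry.
    y = np.full(n, float(v_all))
    weights, _residuals, _rank, _singular_values = np.linalg.lstsq(X, y, rcond=None)
    return weights


outputs = [0.5, 0.25, 0.6]  # illustrative output values for C1, C2, C3
v_all = 115 / 260           # illustrative overall output value
w = learn_weights(outputs, v_all)
```

Under this assumed layout every row holds the same values in a different order, so the fit returns the same weight, the overall output value divided by the sum of the output values, for every discrete value. That is consistent with the single scaling value described for proportional contribution analysis component 150.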
The learned weights for each discrete value are extracted from the regression model at S640.
The output values determined at S240 are scaled at S650 based on the extracted weights. In the present example, the output value of each discrete value is multiplied by the weight associated with that discrete value.
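The scaling step itself is an element-wise product. In the sketch below the weights are filled in, for illustration only, with the common value that a least-squares fit would return under a permutation-based matrix layout (an assumption, since the matrix layout is not fully given here); the helper name `scale_outputs` is hypothetical.

```python
from typing import Dict


def scale_outputs(outputs: Dict[str, float], weights: Dict[str, float]) -> Dict[str, float]:
    """Multiply each discrete value's output value by that discrete value's weight."""
    return {value: weights[value] * output for value, output in outputs.items()}


outputs = {"C1": 0.5, "C2": 0.25, "C3": 0.6}  # illustrative output values from S240
v_all = 115 / 260                             # illustrative overall output value from S230

# Illustrative weights (as if extracted at S640): V_all / sum of the output values.
s = v_all / sum(outputs.values())
weights = {value: s for value in outputs}

contributions = scale_outputs(outputs, weights)
# The scaled values sum to the overall output value.
print(sum(contributions.values()), v_all)
```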
Returning to process 200, the proportional contributions are presented at S260.
User interface 1100 shows, for example, a gross margin value calculated based on data according to an associated function. As described above, the function takes into account several features of the data to generate the gross margin.
Panel 1110 is invoked to provide additional information regarding the calculated gross margin value. In particular, panel 1110 indicates, for each of three features of the data, a value which contributes most to the calculated value with respect to other values of the feature. With respect to the discrete feature Sector, panel 1110 presents the proportional contributions of each discrete value of the feature to the calculated overall gross margin value. The proportional contributions may have been determined as described herein. Advantageously, and regardless of the computational complexity of the function used to calculate the gross margin value, the sum of the presented proportional contributions equals the calculated overall gross margin value.
According to some embodiments, user 1220 may interact with application 1212 (e.g., via a Web browser executing a front-end UI application associated with application 1212) to request calculation of a value based on data of data 1216. Next, user 1220 may request analysis of the calculated value. To perform this analysis, application 1212 may access analytics platform 1230. Analytics platform 1230 may also be implemented by on-premise or cloud-based servers.
Analytics platform 1230 includes program code of proportional contribution analysis framework 1232, which may be executed to determine discrete value-specific proportional contributions to an overall value as described herein. These determined proportional contributions may be provided to application 1212 for presentation to user 1220. According to some embodiments, application 1212 is capable of determining discrete value-specific proportional contributions as described herein. Analytics platform 1230 may provide additional functionality to applications, such as but not limited to machine learning model training and inference.
Hardware system 1300 includes processing unit(s) 1310 operatively coupled to I/O device 1320, data storage device 1330, one or more input devices 1340, one or more output devices 1350, and memory 1360. I/O device 1320 may facilitate data exchange with external devices, such as an external network, the cloud, or a data storage device. Input device(s) 1340 may comprise, for example, a keyboard, a keypad, a mouse or other pointing device, a microphone, a knob or a switch, an infra-red (IR) port, a docking station, and/or a touch screen. Input device(s) 1340 may be used, for example, to enter information into hardware system 1300. Output device(s) 1350 may comprise, for example, a display (e.g., a display screen), a speaker, and/or a printer.
Data storage device 1330 may comprise any appropriate persistent storage device, including combinations of magnetic storage devices (e.g., magnetic tape and hard disk drives), flash memory, optical storage devices, Read Only Memory (ROM) devices, and RAM devices, while memory 1360 may comprise a RAM device.
Data storage device 1330 stores program code executed by processing unit(s) 1310 to cause hardware system 1300 to implement any of the components and execute any one or more of the processes described herein. Embodiments are not limited to execution of these processes by a single computing device. Data storage device 1330 may also store data and other program code which provide additional functionality and/or are necessary for operation of hardware system 1300, such as device drivers, operating system files, etc.
The foregoing diagrams represent logical architectures for describing processes according to some embodiments, and actual implementations may include more or different components arranged in other manners. Other topologies may be used in conjunction with other embodiments. Moreover, each component or device described herein may be implemented by any number of devices in communication via any number of other public and/or private networks. Two or more of such computing devices may be located remote from one another and may communicate with one another via any known manner of network(s) and/or a dedicated connection. Each component or device may comprise any number of hardware and/or software elements suitable to provide the functions described herein as well as any other functions. For example, any computing device used in an implementation of some embodiments may include a processing unit to execute program code such that the computing device operates as described herein.
Embodiments described herein are solely for the purpose of illustration. Those in the art will recognize other embodiments may be practiced with modifications and alterations to that described above.