Computer-implemented technologies can assist users in developing and employing computing applications that utilize machine learning. These machine learning computing applications are typically implemented with one or more machine learning models. A model-development system can be part of, or utilized by, a machine learning computing system, for instance, to create, train, configure, or otherwise develop a machine learning model.
Conventional model-development systems are prone to producing machine learning models that reinforce biases existing in the training data used to train or develop the model. For example, these deficient models produce biased scores that can be inaccurate and/or that can reinforce unfair discrimination that exists in our societies. Conventional model-development technologies lack functionality to address or counteract such biases that may be present in the data, and thus cannot prevent deficient models from being produced and deployed. Moreover, reducing these biases in a computationally efficient manner is a task that is difficult to implement in practice given the limitless number of unique datasets, ways that data can be partitioned into groups, and various ways that data may contain biases.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
The technologies described in this disclosure are directed toward computerized systems and methods for providing bias-reducing machine learning correction technology for machine learning systems. In particular, the described technologies involve a loss adjustment operation, or a mechanism for performing the operation, that can be performed during the development of a trained machine learning model. The loss adjustment operation can comprise an application of one or more loss adjustment weights to a loss function used for training the machine learning model. Embodiments of the present disclosure may include determining loss adjustment weights based on a count of a feature-label combination in the dataset. For instance, in one embodiment, an adjustment weight is based on a count of the number of instances in the dataset where the same feature-label combination is present. These features may comprise particular data features, such as sensitive features, which may be more likely to have bias. The loss adjustment weights may be computed based on a comparison of a predicted output and a ground truth or label. For example, the loss adjustment weights may be computed based on a suitable statistical analysis technique, such as a chi-squared test, Fisher's exact test, and the like. After the loss adjustment weight has been computed, a custom loss function that consumes the loss adjustment weight (referred to hereinafter as the adjusted loss function) is used for training the machine learning model. In some instances, the adjusted loss function is a loss function that corresponds to the machine learning model, but that is modified based on the loss adjustment weight.
During model training, the output of the adjusted loss function, which is the adjusted loss, may be evaluated against a loss threshold to determine whether the model is sufficiently trained. In some embodiments, a model may be considered sufficiently trained when the adjusted loss is below the loss threshold or otherwise satisfies the loss threshold, indicating an acceptable level of inaccuracy (for example, the adjusted loss is below a threshold for inaccuracy or within a permitted range of accuracy). Once the machine learning model is determined to be sufficiently trained, the model may be deployed or otherwise made available for use in a computing application or service. For example, the machine learning model may be deployed to an operating system layer of a client device or server device. On the other hand, in response to the adjusted loss not satisfying the loss threshold value, the adjusted loss may be used to update the model parameters and retrain or further train the machine learning model.
In this manner, present embodiments provide technology to improve machine learning systems by removing or reducing biases associated with some features by performing a loss adjustment operation that utilizes a modified or customized loss function during the training of the machine learning model. Additionally and advantageously, embodiments of these technologies can remove biases in machine learning applications without requiring computer code for addressing the biases to run on the computing system where the model is deployed and/or running, such as a computer program operating on a client computer. Further, some embodiments may be personalized or tailored for certain types of data, such as sensitive data. Whereas existing approaches fail to allow personalization and/or require computationally expensive manipulation of large data sets, the embodiments disclosed herein can remove bias in a computationally efficient manner. These embodiments also can provide for the selection of sensitive features and can perform less computationally complex calculations to determine a loss adjustment weight that can be utilized to determine an adjusted loss function. Accordingly, present embodiments are not only more accurate, but also are more easily scaled compared to existing computationally intensive approaches.
The technology described herein is described in detail below with reference to the attached drawing figures, wherein:
The subject matter of aspects of the present disclosure is described with specificity herein to meet statutory requirements. However, the description itself is not intended to limit the scope of this patent. Rather, the inventors have contemplated that the claimed subject matter might also be embodied in other ways, to include different steps or combinations of steps similar to the ones described in this document, in conjunction with other present or future technologies. Moreover, although the terms “step” and/or “block” may be used herein to connote different elements of methods employed, the terms should not be interpreted as implying any particular order among or between various steps herein disclosed unless and except when the order of individual steps is explicitly described. Each method described herein may comprise a computing process that may be performed using any combination of hardware, firmware, and/or software. For instance, various functions may be carried out by a processor executing instructions stored in memory. The methods may also be embodied as computer-usable instructions stored on computer storage media. The methods may be implemented within a standalone application, a service or hosted service (standalone or in combination with another hosted service), or a plug-in to another product, to name a few.
As access to complex computer technologies continues to increase, an increased number of users, such as developers, are looking toward machine learning technologies to improve predictive and classification functionality—all in an effort to improve operations and utilization of computer technologies. Computer technologies are challenged to adapt to the diverse needs and preferences of the increasing number of users. Conventionally, machine learning model-development systems are not configured with a computing infrastructure or logic to deliver unbiased predictions or machine learning outputs. In particular, conventional model-development systems may suffer from different data biases, which can lead to the machine learning models generated by these systems being biased and/or inaccurate. Consequently, these deficient machine learning models can have a disparate impact on users by affecting users differently based on user sensitivity to certain features. Additionally, bias can result in decreased accuracy of software applications employing the machine learning model, limiting their use to only certain types of data, among other problems.
One example of the type of bias introduced during model training is client-specific bias: a model produces inaccurate scores for a group of users that was either underrepresented in, or not present in, the data used to train and test the model. Conventional machine learning systems do not have a way to counteract such biases in their data and prevent them from resulting in client-specific biases in the produced models. Moreover, reducing client-specific biases in a computationally efficient manner is a task that is difficult to implement in practice given the limitless number of unique datasets, ways that data can be partitioned into groups, and ways that data may contain biases.
Existing approaches to the bias problems include employing certain bias-removing algorithms, such as disparate impact remover and equality of odds. But such approaches can (1) require extensive computations, making these existing approaches difficult to scale across any number of client devices operating in any number of operating environments, (2) require large storage space to store complex training data, (3) lack user-personalization or otherwise not permit those facilitating training of the model to identify or modify certain data, such as sensitive data, and/or (4) require extensive computational resources to be employed on the system where the model is used, such as the client-side, and in some instances, on the system where the model is developed, such as the server-side, as well. This can place a computational burden on the computing machine running the model, which often has limited computational resources, such as an internet of things (IoT) device or client device, and which may be looking to offload computations. Accordingly, there is a need to improve machine learning methodologies to be computationally efficient as well as scalable and generalizable across the different systems where the model is deployed and used.
With the foregoing in mind, embodiments of the present disclosure are directed to providing bias-reducing machine learning correction technology for model-development systems. In particular, a loss adjustment operation is performed during the development of a machine-learning model. Performing the loss adjustment operation can comprise applying one or more loss adjustment weights to a loss function used for training the machine-learning model. As used herein, “loss adjustment weight” comprises a value or degree to which an aspect of a loss function is modified to change a weight of loss (or error) attributed to a particular sensitive data feature (or groups of sensitive features). In some embodiments, a loss adjustment weight comprises a coefficient, scalar, multiplier, or another function applied to the training algorithm's default loss function. For instance, one example of a loss adjustment weight can include a value that is applied to the loss function (for example, multiplied, added, or used to re-calculate the output of the loss function) to modify the relative weight attributed to a group of the sensitive feature relative to other groups. As further described herein, a loss adjustment weight can be determined based on a count of a feature-label combination of a sensitive feature. As used herein, “a loss function,” which may also be referenced as “a cost function” or “an error function,” refers to a function that maps a value of one or more variables onto a real number indicative of some “loss” associated with the event or values. Example loss functions can include a computationally simple operation such as a subtraction, addition, absolute-value difference operation, or a more complex calculation. A loss function may be used to compute a difference between an estimated or predicted value and the true value, such that a difference of less magnitude is indicative of the estimated value being a more accurate representation of the true value. In this manner, the loss function can be used to assess the accuracy of an estimated value (for example, a prediction or classification output of a machine-learning model) relative to a true value. In this way, a loss function may be used during training of a machine learning model to determine whether the model has been sufficiently trained.
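By way of example and not limitation, the following Python sketch illustrates one way a loss adjustment weight could scale a default per-example loss. The squared-error loss, the function names, and the per-example weighting scheme are illustrative assumptions here, not a required implementation of the disclosed technology.

```python
import numpy as np

def default_loss(y_pred: np.ndarray, y_true: np.ndarray) -> np.ndarray:
    """Default per-example loss: squared error between prediction and label."""
    return (y_pred - y_true) ** 2

def adjusted_loss(y_pred: np.ndarray, y_true: np.ndarray,
                  loss_adjustment_weights: np.ndarray) -> float:
    """Adjusted loss: each example's default loss is scaled by the loss
    adjustment weight for its sensitive feature-label combination, then
    averaged over the training batch."""
    return float(np.mean(loss_adjustment_weights * default_loss(y_pred, y_true)))
```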
As used herein, “sensitive feature” may refer to an individual, measurable property or characteristic of a phenomenon (for example, a data feature) that may be subject to bias. Example sensitive features may include gender, age, race, native language, geographic location, type of client device, and the like. To facilitate discussion, sensitive features are discussed as having one or more “groups.” Using gender as an example, “gender” may refer to the “sensitive feature;” and “male,” “female,” and “non-binary” may refer to three example “groups” associated with the corresponding sensitive feature (i.e., gender).
Some embodiments of the present disclosure include determining one or more loss adjustment weights based on a count of a feature-label combination of sensitive features. As used herein, “a feature-label combination” refers to a combination of a feature group and a label value. As used herein, “label” is a representation of the “ground truth” and refers to known truth values, as opposed to mere estimates.
Additionally, the label may refer to training data whose identity or values are known. As discussed in more detail herein, the loss adjustment weights may be computed based on an appropriate statistical analysis methodology, such as a chi-squared test, Fisher's exact test, or any other suitable statistical analysis methodology. Alternatively or additionally, the loss adjustment weights may be determined based on a comparison of a model output (such as a prediction output by the model) and the label (such as the ground truth). To facilitate computations, in some embodiments, training data used to train the machine learning model may be converted into tabular format. After a loss adjustment weight has been computed, a loss function used for training the machine learning model can be modified (or replaced), based on the loss adjustment weight, to generate an adjusted loss function, as discussed herein. As used herein, the output of the adjusted loss function may be referred to as the “adjusted loss.” The adjusted loss may be indicative of an accuracy of the predicted output of the machine learning model relative to the label (for example, ground truth). During training of a machine learning model, the adjusted loss may be evaluated against a loss threshold to determine whether the model is sufficiently trained. In response to the adjusted loss satisfying a loss threshold value indicative of an acceptable level of inaccuracy (for example, the adjusted loss is below a threshold for inaccuracy or within a permitted range of accuracy), the machine learning model is determined to be sufficiently trained, such that the adjusted machine learning model and the corresponding model parameters used to train the adjusted machine learning model may be deployed or otherwise made available for use in a computing application or service. Alternatively, in response to the adjusted loss not satisfying the loss threshold value, the adjusted loss may be used to update the model parameters and retrain or further train the machine learning model. Thus, bias associated with a particular sensitive feature may be removed or reduced by, for example, determining a count of feature-label combination(s) of the sensitive feature and determining a loss adjustment weight (based on the count) used to adjust a loss function until the adjusted loss satisfies (for example, is below) the loss threshold value. In this manner, present embodiments provide a technology to improve machine learning systems by removing biases (for example, of sensitive features selected by a user) by modifying a loss function or providing a customized loss function, and may be personalized or customized to particular sensitive features. Whereas existing approaches may fail to allow user personalization and/or may require computationally expensive manipulation of large data sets that can pose a burden on server-side and client-side components, the present embodiments remove bias in a computationally efficient manner, as described herein.
Turning now to FIG. 1, a block diagram is provided showing example operating environment 100 in which some embodiments of the present disclosure may be employed.
Among other components not shown, example operating environment 100 includes a number of user devices, such as user devices 102a and 102b through 102n; a number of data sources, such as data sources 104a and 104b through 104n; server 106; displays 103a and 103b through 103n; and network 110. It should be understood that environment 100 shown in FIG. 1 is an example of one suitable operating environment.
It should be understood that any number of user devices, servers, and data sources may be employed within operating environment 100 within the scope of the present disclosure. Each may comprise a single device or multiple devices cooperating in a distributed environment. For instance, server 106 may be provided via multiple devices arranged in a distributed environment that collectively provide the functionality described herein. Additionally, other components not shown may also be included within the distributed environment.
User devices 102a and 102b through 102n can be client devices on the client-side of operating environment 100, while server 106 can be on the server-side of operating environment 100. In more detail, server 106 can comprise server-side software designed to work in conjunction with client-side software on user devices 102a and 102b through 102n to implement any combination of the features and functionalities discussed in the present disclosure.
User devices 102a and 102b through 102n may comprise any type of computing device capable of use by a user. For example, in one embodiment, user devices 102a through 102n may be the type of computing device 900 described in relation to FIG. 9 herein.
Data sources 104a and 104b through 104n may comprise data sources and/or data systems, which are configured to make data available to any of the various constituents of operating environment 100, or system 400 described in connection to FIG. 4.
Operating environment 100 can be utilized to implement one or more of the components of systems 200 and 400, described in connection with FIGS. 2 and 4, respectively.
Example system 200 includes a client-side 202 and a server-side 210. In certain embodiments, the server-side 210 includes a user interface (UI) 212, a data source 214 and a machine learning system 220. As discussed below, the model may be trained on the server-side 210. Embodiments of UI 212 may be configured to invoke or access aspects of the machine learning system by way of a set of REST APIs, a Python SDK, and the like. UI 212 can utilize a set of commands from users (for example, on the client-side 202) to integrate assigned services and computational resources. For example, the UI 212 may provide commands for deploying web-based applications, creating a Postgres structured query language (SQL) database, managing virtual machines, connecting software applications with user-specific storage devices, and so forth. In the context of machine learning, the UI 212 implements commands for scheduling jobs that train machine learning models, retrain machine learning models, and use the trained machine learning model for inference. As such, some embodiments of UI 212 use data transformation and training configurations received from users as inputs to the underlying actions that are executed. In some embodiments, UI 212 comprises a command line interface (CLI) or a graphical user interface (GUI).
The machine learning system 220 includes a data initializer module 222 configured to receive client-side data 230. In some embodiments, the data initializer module 222 pre-processes the client-side data to generate feature vectors. For example, a client-side device (such as user device 102 of FIG. 1) may provide client-side data 230 that the data initializer module 222 pre-processes into feature vectors.
The machine learning system 220 includes a model-development system 240 configured to train and format the machine learning model before deploying the trained machine learning model. The model-development system 240 may include a training data determiner 242, a model training system 250, and a model evaluation system 260. The training data determiner 242 may be configured to receive the training data from the training data determiner 224 and/or the initialized data from the data initializer module 222 to parsimoniously describe the data. The training data determiner 242 may define parameters for selecting metadata (for example, a description model) used to describe the data. In one embodiment, the training data determiner 242 describes the data based on a model that would result in the shortest permissible description of the data. In this manner, the computational resources utilized to store the data may be reduced.
In some embodiments, the training data determiner 242 is configured to process the feature vectors to determine suitable training data to be used by the model-development system 240. In some embodiments, the training data determiner 242 is configured to track and store any suitable data which may be used to train a machine learning model. For example, the training data determiner 242 may track user interactions with a software application, determine data (for example, custom data) received by a user, receive ground truths or labels to be used to evaluate (block 320 of FIG. 3) the machine learning model, and so forth.
In some embodiments, the training data determiner 242 is configured to query the data source 214 storing the label data, for example, from the client-side 202. It should be understood that the data retrieved by the training data determiner 242 is not limited to client-side data 230, but may be based on any suitable data (for example, from server-side 210 or client-side 202).
The model training system 250 is configured to train a machine learning model. As described herein, the model training system 250 may include the logic (such as the model training logic 452 of FIG. 4) used to train the machine learning model.
The model evaluation system 260 is configured to evaluate (for example, via the evaluation at block 320 of FIG. 3) the trained machine learning model.
Although the model-development system 240 is discussed as including specific components, it should be understood that the model-development system 240 may include any other or additional components. For example, the model-development system 240 may include a user interface, a data query module, a data preparation and transformation module, and a module to produce a file containing a serialized version of the machine learning model (such as an .onnx file), to name a few.
Thereafter, the trained machine learning model may be deployed to a prediction unit 270 (for example, user device 102). In some embodiments, the machine learning model may be integrated into the operating system of the user device 102. Alternatively or additionally, the machine learning model may be deployed via or to any suitable abstraction layer(s) such as the application layer, hardware layer, and so forth, of the prediction unit 270.
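By way of example and not limitation, one possible way to produce a serialized .onnx model file of the kind mentioned above is sketched below. The sketch assumes a PyTorch model and uses the standard torch.onnx.export call; the model, input shape, and file name are hypothetical placeholders rather than components of the disclosed system.

```python
import torch

# Hypothetical trained model; any torch.nn.Module would work in this sketch.
model = torch.nn.Linear(in_features=4, out_features=1)
model.eval()

# Serialize the trained model to an .onnx file suitable for deployment to a
# prediction unit (for example, an operating-system or application layer).
dummy_input = torch.randn(1, 4)  # example input matching the model's shape
torch.onnx.export(model, dummy_input, "trained_model.onnx")
```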
Turning to FIG. 3, an example process 300 for training and deploying a bias-reduced machine learning model is depicted.
With this in mind, the process 300 may include receiving a user input by way of a graphical user interface (GUI) 302 of the user device 102. The graphical user interface 302 may receive any suitable input, such as a request for preparing training data to train a model or a request indicative of machine learning parameters, such as the sensitive features, corresponding groups, an indication of a type of machine learning model to be used, training data, and the like. For example, the GUI 302 may include a JavaScript Object Notation (JSON) file configured to receive user inputs. The user inputs may include a selection of the sensitive features. Due to the language-independent structure of a JSON file, the JSON file may be employed by processing circuitry employing any suitable machine learning model using any suitable programming language. Nevertheless, it should be understood that the GUI 302 may include any suitable screen regions, selectable icons, toggles, and controls, for example, to select sensitive features and groups. One example of the GUI 302 may be found in FIG. 5.
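For illustration only, machine learning parameters might arrive from the GUI 302 as JSON content such as in the following sketch. Every field name and value below is a hypothetical example, since the disclosure does not prescribe a schema; Python's standard json module is used to emphasize the language-independent structure noted above.

```python
import json

# Hypothetical parameters as they might be supplied through the GUI 302:
# sensitive features with their groups, a model type, and a label column.
config = {
    "model_type": "logistic_regression",
    "sensitive_features": [
        {"name": "gender", "groups": ["female", "male", "non-binary"]},
        {"name": "age_bucket", "groups": ["18-34", "35-54", "55+"]},
    ],
    "label_column": "survey_completed",
}
print(json.dumps(config, indent=2))  # serialized, language-independent JSON
```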
The process 300 includes receiving training data, as discussed below with respect to the sensitive feature collector 412. In some embodiments, the training data may include or be divided into training data 306A used for model training and training validation data 306B used for model validation. The training data 306A may include labeled data for use in training the machine learning model 310 by the model builder 312. “Labeled data” may refer to data that has been collected and joined with corresponding labels. Thus, the model builder 312 may receive and use the labeled training data 306A for training purposes. In one embodiment, as a machine learning model 310 is being trained using the training data 306A, or once the machine learning model 310 has been trained using the training data 306A, the machine learning model 310 may be evaluated (block 320), as discussed with respect to the model evaluator 446 of FIG. 4.
In some embodiments, training data 306A used by the model builder 312 to train the machine learning model 310 may be converted into a vector or tabular format that may include or be associated with an indication of a label. The vector or tabular format may facilitate computations, such as determining (block 330) a count of feature/label combinations for sensitive features, as discussed below with respect to the count determining engine 414 of FIG. 4.
In addition, the process 300 includes determining (block 332) loss adjustment weights, as discussed below with respect to the loss weight calculator 422 of FIG. 4.
Evaluation (block 320) of the machine learning model 310 may be based on the output predicted (or otherwise determined) by the machine learning model 310, the label (for example, ground truth) corresponding to training data 306A, and the loss adjustment weights, as discussed below with respect to the model training logic 452 of FIG. 4.
In response to the adjusted loss not being below the loss threshold value, the machine learning model 310 is retrained, as discussed below with respect to the bias-reducing model generating engine 440 of FIG. 4.
In response to the adjusted loss satisfying the loss threshold (for example, where the adjusted loss is below the loss threshold where the threshold corresponds to a maximum tolerated inaccuracy), the machine learning model 310 is determined to be sufficiently trained, such that the machine learning model 310 and the corresponding model parameters used to train the machine learning model 310 may proceed to being validated (block 350). Validating (block 350) the machine learning model 310 may include receiving training validation data 306B. As discussed above, the training validation data 306B is separate from the training data 306A. For example, the training validation data 306B may be used for validation purposes instead of model training purposes. In some embodiments, the machine learning model 310 may be validated (block 350) using the adjusted loss function. For example, the adjusted loss function may be used as the score function used to validate the model. If the machine learning model does not pass validation, then the machine learning model may be further trained and revised. On the other hand, if the machine learning model passes validation, the machine learning model may be deployed (block 360), for example, to the user device 102.
Turning to FIG. 4, a block diagram is provided of an example system 400 suitable for implementing embodiments of the present disclosure.
Example system 400 includes network 110, which is described in connection to FIG. 1.
In one embodiment, the functions performed by components of system 400 are associated with one or more applications, services, or routines. In one embodiment, certain applications, services, or routines may operate on one or more user devices (such as user device 102a, for example, on the client-side 202 of FIG. 2), while others may operate on one or more servers (such as server 106).
Continuing with FIG. 4, example system 400 includes a bias-reducing loss function engine 410 (comprising a sensitive feature collector 412, a count determining engine 414, and a loss function weight engine 420), a bias-reducing model generating engine 440 (comprising a model initializer 442, a model trainer 444, a model evaluator 446, and a model deploying engine 448), storage 450, and a presentation component 460.
The count determining engine 414 is configured to determine features, such as the sensitive features described herein. In some embodiments, the features may be determined based on raw data. In some embodiments, the count determining engine 414 is configured to receive training data (for example, training data 306A of FIG. 3).
Example training data includes any labeled data or unlabeled data. For example, training data may include computing device information (such as charging data, date/time, or other information derived from a computing device), user-activity information (for example: app usage; online activity; searches; browsing certain types of webpages; listening to music; taking pictures; voice data such as automatic speech recognition; activity logs; communications data including calls, texts, instant messages, and emails; website posts; other user data associated with communication events; other user interactions with a user device, and so forth) including user activity that occurs over more than one user device, user history, session logs, application data, contacts data, calendar and schedule data, notification data, social network data, news (including popular or trending items on search engines or social networks), online gaming data, ecommerce activity (including data from online accounts such as Microsoft®, Amazon.com®, Google®, eBay®, PayPal®, video-streaming services, gaming services, or Xbox Live®), user-account(s) data (which may include data from user preferences or settings associated with a personalization-related (for example, “personal assistant” or “virtual assistant”) application or service), home-sensor data, appliance data, global positioning system (GPS) data, vehicle signal data, traffic data, weather data (including forecasts), wearable device data, other user device data (which may include device settings, profiles, network-related information (for example, network name or ID, domain information, workgroup information, other network connection data, Wi-Fi network data, or configuration data, data regarding the model number, firmware, or equipment, device pairings, such as where a user has a mobile phone paired with a Bluetooth headset, for example, or other network-related information)), gyroscope data, accelerometer data, payment or credit card usage data (which may include information from a user's PayPal account), purchase history data (such as information from a user's Xbox Live, Amazon.com or eBay account), other data that may be sensed or otherwise detected, data derived based on other data (for example, location data that can be derived from Wi-Fi, cellular network, or IP (internet protocol) address data), calendar items specified in user's electronic calendar, and nearly any other data that may be used to train a machine learning model, as described herein.
The count determining engine 414 is configured to determine (block 330 of FIG. 3) a count of feature-label combinations of the sensitive features.
The count determining engine 414 may convert raw data into any suitable format. By way of non-limiting example, the count determining engine 414 may convert raw data into a tabular format. Taking gender as an example of a binary outcome (although gender may not be binary, for purposes of simplifying this example, gender will be discussed as binary), the count determining engine 414 may receive training data (which may comprise raw data) indicating whether a prediction was accurate (indicated as “Yes” in Table 1 below) or inaccurate (indicated as “No” in Table 1). Example predictions include whether a camera accurately identified a person, whether a survey was completed as predicted, and any additional or alternative prediction. Taking completion of a survey as an example, a label of “Yes,” as shown in Table 1, indicates that the survey was completed by the corresponding gender, while a label of “No,” as shown in Table 1, indicates that the survey was not completed. Thus, a label of “No” indicates that the prediction failed to satisfy the ground truth.
The count determining engine 414 may convert the labeled raw data of Table 1 into the tabular format of Table 2. As depicted, taking gender as a sensitive feature, the gender is divided into two groups, namely, female and male, which are shown as rows in the table. For each group (for example, row), the label may be determined by adding up the times the ground truth was satisfied (for example, the “Yes” from the third column of Table 1) and adding up the times the ground truth was not satisfied (for example, the “No” from the third column of Table 1). As illustrated in Table 2, the labels may be divided into the times the ground truth was satisfied (for example, the “Label Yes” column of Table 2) and the times the ground truth was not satisfied (for example, the “Label No” column of Table 2). The rows and the columns may be added to calculate the totals corresponding to a respective column and/or row. In this example, the count for the feature-label combination of sensitive features, such as gender, may be determined and organized as shown in Table 2. In particular, the counts for the female-yes combination, the male-yes combination, the female-no combination, the male-no combination, and their corresponding summations, may be retrieved from Table 2 to calculate the loss adjustment weights. Table 1 and Table 2 may be stored in storage 450.
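As a non-limiting sketch of this conversion from the Table 1 layout to the Table 2 layout, the following Python snippet uses the pandas crosstab function; the data values shown are illustrative only and do not come from the disclosure.

```python
import pandas as pd

# Raw labeled data in the style of Table 1: one row per observation, with a
# sensitive-feature group and a "Yes"/"No" label.
table_1 = pd.DataFrame({
    "gender": ["Female", "Male", "Female", "Male", "Female", "Male"],
    "label":  ["Yes",    "No",   "Yes",    "Yes",  "No",     "No"],
})

# Cross-tabulate into the Table 2 layout: one row per group, one column per
# label value, plus marginal totals for every row and column.
table_2 = pd.crosstab(table_1["gender"], table_1["label"],
                      margins=True, margins_name="Total")
print(table_2)
```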
Although this example is discussed in the context of binary groups of a sensitive feature, it should be understood that these embodiments may be employed with non-binary groups of sensitive features, such as race, nationality, sexual orientation, age, other groupings, and so forth. Additionally, any number of sensitive features can be engineered by the count determining engine 414. In some embodiments, the count determining engine 414 may determine a count of feature-label combinations across more than one feature. For example, the count determining engine 414 may determine a count of feature-label combinations across gender, age, and ethnicity for any groups of each of these three features. Accordingly, embodiments of the present disclosure are not limited to determining a count of feature-label combinations across only one feature (for example, gender). In either case, in one embodiment, the cross-tabular table generated by the count determining engine 414 may be an N×N table in which the number of rows equals the number of columns, as in Table 2. Moreover, although this example is discussed in the context of a table, the embodiments discussed herein are not limited to data stored in tabular format. The computations described herein may be applied to any statistically independent set of variables, such as the feature and the label.
The model training logic 452 of the storage 450 may define a set of rules or conditions used to calculate or determine the loss adjustment weight, the loss function for a machine learning model, and the like. In some embodiments, the model training logic 452 may be used by the model training system 250 of FIG. 2.
Continuing with FIG. 4, the loss function weight engine 420 includes the loss weight calculator 422 and the adjusted loss function generator 424.
The loss weight calculator 422 may receive the counts for feature-label combinations of a sensitive feature from the count determining engine 414. The loss weight calculator 422 may calculate the loss adjustment weights based on the counts for feature-label combinations. The loss weight calculator 422 may calculate the loss adjustment weights based on any suitable statistical method, such as a chi-squared test, Fisher's exact test, a UniODA test, a Mann-Whitney U test, a Kruskal-Wallis test, and the like.
Continuing on the gender example above, the loss weight calculator 422 may determine the loss adjustment weights by performing a chi-squared test. Using Equation 1,

weight in each cell = (row marginal sum × column marginal sum) / (grand sum × cell value),   (Equation 1)

the loss weight calculator 422 may determine the weight in each cell of Table 3. A more detailed illustration of the calculations is provided in Table 3.
As depicted in Table 3 and using the data in Table 2, the weights in each of the cells of Table 3 may be calculated in accordance with Equation 1, whereby the sum of a row (of Table 2) is multiplied by the sum of the column (of Table 2), and then divided by the product of (1) the sum of all entries (for example, the bottom-right value in Table 2) and (2) the value (for example, from Table 2) of that corresponding cell. Stated differently, the weight for each cell equals the count that would be expected for that feature-label combination if the feature and the label were statistically independent, divided by the observed count, such that under-represented feature-label combinations receive proportionally larger weights.
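A minimal sketch of Equation 1 follows, assuming the observed counts are held in a pandas DataFrame shaped like Table 2 without its total row and column; the counts shown are invented for illustration and do not come from the disclosure.

```python
import pandas as pd

def loss_adjustment_weights(counts: pd.DataFrame) -> pd.DataFrame:
    """Compute the weight in each cell per Equation 1:
    (row marginal sum * column marginal sum) / (grand sum * cell value).
    `counts` holds observed feature-label counts (Table 2 without totals)."""
    row_sums = counts.sum(axis=1)
    col_sums = counts.sum(axis=0)
    grand_sum = counts.to_numpy().sum()
    # Outer product of the marginals gives row_sum * col_sum for every cell.
    numerator = pd.DataFrame(
        row_sums.to_numpy()[:, None] * col_sums.to_numpy()[None, :],
        index=counts.index, columns=counts.columns)
    return numerator / (grand_sum * counts)

# Illustrative counts only: 50 Female rows (30 Yes / 20 No), 150 Male rows.
counts = pd.DataFrame({"Label Yes": [30, 70], "Label No": [20, 80]},
                      index=["Female", "Male"])
print(loss_adjustment_weights(counts))
```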
The loss weight calculator 422 may store the calculated loss adjustment weights in storage 450. In some embodiments, the loss adjustment weights may be stored in a hash table to facilitate associating the loss adjustment weights to any data type. For example, the loss weight calculator 422 may assign the loss adjustment weights to certain training data, such that the hash table associates the loss adjustment weights to training data that is used to evaluate the machine learning model by the model evaluator 446.
Table 4 shows example computed loss adjustment weights. As depicted in Table 5, a loss adjustment weight may be assigned to a user device (for example, denoted in Table 5 as “DeviceID”). Although the illustrated Table 5 shows one weight assigned per user device, it should be understood that in certain embodiments, more than one weight may be assigned to a user device. For example, one weight may be assigned per feature for a plurality of features, or more than one weight may be assigned for one feature.
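For illustration, the hash-table association described above might look like the following sketch, in which a Python dict plays the role of the hash table; the device identifiers and weight values are hypothetical.

```python
# Hypothetical hash table associating loss adjustment weights with training
# data, keyed by device identifier as in the Table 5 example.
weights_by_device = {
    "device-001": 0.83,  # illustrative weights; see Equation 1 above
    "device-002": 1.25,
}

def weight_for(device_id: str, default: float = 1.0) -> float:
    """Look up a device's loss adjustment weight; unweighted rows get 1.0."""
    return weights_by_device.get(device_id, default)
```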
The adjusted loss function generator 424 may determine the adjusted loss. In some embodiments, the adjusted loss function generator 424 receives the loss adjustment weights, an output predicted by the machine learning model, and/or a label corresponding to training data. In response to receiving these inputs, the adjusted loss function generator 424 may determine the adjusted loss. The adjusted loss function generator 424 may adjust the loss function corresponding to the machine learning model based on the loss adjustment weights, such that the adjusted loss function is configured to remove bias associated with the sensitive feature determined by the sensitive feature collector 412. In some embodiments, the adjusted loss function generator 424 first adjusts the loss function based on the loss adjustment weights, and the adjusted loss function is then used to calculate the adjusted loss.
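The following non-limiting sketch illustrates how the adjusted loss function generator 424 might wire the Equation 1 weights to individual training examples and compute an adjusted loss. The choice of binary cross-entropy and all helper names are assumptions for illustration, not requirements of the disclosure.

```python
import numpy as np
import pandas as pd

def per_example_weights(groups: pd.Series, labels: pd.Series,
                        weight_table: pd.DataFrame) -> np.ndarray:
    """Look up, for each training example, the loss adjustment weight of its
    feature-label combination (weight_table as computed via Equation 1)."""
    return np.array([weight_table.loc[g, l] for g, l in zip(groups, labels)])

def adjusted_loss(y_pred: np.ndarray, y_true: np.ndarray,
                  weights: np.ndarray, eps: float = 1e-12) -> float:
    """Adjusted loss: per-example binary cross-entropy scaled by the loss
    adjustment weights, then averaged."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # guard against log(0)
    per_example = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return float(np.mean(weights * per_example))
```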
The bias-reducing model generating engine 440 may receive a machine learning model trained based on the bias-reducing loss function engine 410. The model initializer 442 may select and initialize a machine learning model. Example machine learning models include a neural network model, a logistic regression model, a support vector machine model, and the like. Initializing the machine learning model may also include causing the model initializer 442 to determine a loss function associated with the machine learning model. Initializing the machine learning model may include causing the model initializer 442 to determine model parameters and provide initial conditions for the model parameters. In one embodiment, the initial conditions for the model parameters may include a coefficient for the model parameter.
The model trainer 444 may train the machine learning model determined by the model initializer 442. As part of training the machine learning model, the model trainer 444 may receive outputs from the model initializer 442 to train the machine learning model. In some embodiments, the model trainer 444 may receive the type of machine learning model, the loss function associated with the machine learning model, the parameters used to train the machine learning model, and the initial conditions for the model parameters. The model trainer 444 may iteratively train the machine learning model by using the optimizer 340 (of FIG. 3).
The model evaluator 446 may evaluate the accuracy of the machine learning model trained by the model trainer 444. In some embodiments, the model evaluator 446 is configured to assess the accuracy of the model based on a loss (for example, error) determined based on the loss function. For example, the model evaluator 446 may receive an indication of the loss adjustment weights and determine an adjusted loss by applying the loss adjustment weights to the loss function corresponding to the machine learning model. The output of the adjusted loss function (i.e., the adjusted loss) may be compared to the loss threshold value(s) indicative of an acceptable level of inaccuracy or error. In response to the adjusted loss not being below the loss threshold value(s), the model trainer 444 may retrain the machine learning model. Alternatively, in response to the adjusted loss being below the loss threshold value(s), the machine learning model is determined to be sufficiently trained, such that the machine learning model and the corresponding model parameters used to train the machine learning model may be validated.
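A non-limiting sketch of this trainer/evaluator interplay follows. It assumes scikit-learn's SGDClassifier so that each round genuinely continues training from the current model parameters; the weighted squared-error stand-in for the adjusted loss and the threshold semantics are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

def adjusted_loss(y_pred, y_true, weights):
    """Weighted mean squared error standing in for the adjusted loss function."""
    return float(np.mean(weights * (y_pred - y_true) ** 2))

def train_until_threshold(X, y, weights, loss_threshold, max_rounds=100):
    """Train, evaluate the adjusted loss against the loss threshold, and keep
    training until the threshold is satisfied or a round limit is reached."""
    model = SGDClassifier(loss="log_loss")  # "log" in older scikit-learn
    loss = float("inf")
    for _ in range(max_rounds):
        # Each call continues training from the current model parameters.
        model.partial_fit(X, y, classes=np.array([0, 1]), sample_weight=weights)
        y_pred = model.predict_proba(X)[:, 1]
        loss = adjusted_loss(y_pred, y, weights)
        if loss < loss_threshold:  # acceptable level of inaccuracy reached
            break
    return model, loss
```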
The model evaluator 446 may validate the machine learning model. In some embodiments, the model evaluator 446 may receive training data (for example, the training validation data 306B of FIG. 3).
The model evaluator 446 may validate the machine learning model based on a score function. The score function may facilitate determining probabilistic scores for a classification machine learning model or estimated averages for regression problems, to name a couple of examples. It should be understood that the score function may include any suitable algorithm applied to training data (such as the training validation data 306B of FIG. 3).
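For illustration, and consistent with the discussion of FIG. 3 above, the adjusted loss function may itself serve as the score function over the held-out validation data, as in the sketch below; the helper mirrors the adjusted loss sketched earlier, and the pass/fail threshold semantics are an assumption.

```python
import numpy as np

def adjusted_loss(y_pred, y_true, weights):
    return float(np.mean(weights * (y_pred - y_true) ** 2))  # as sketched above

def validate(model, X_val, y_val, weights_val, loss_threshold):
    """Use the adjusted loss as the score function on the held-out training
    validation data (306B); returns True when the model passes validation."""
    y_pred = model.predict_proba(X_val)[:, 1]
    return adjusted_loss(y_pred, y_val, weights_val) < loss_threshold
```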
In some embodiments, the model deploying engine 448 may receive a machine learning model determined to be sufficiently trained. The model deploying engine 448 may deploy a trained machine learning model to any suitable abstraction layer. For example, the model deploying engine 448 may transmit the machine learning model to the operating system layer, application layer, hardware layer, and so forth, associated with a client device or client account. In the context of the model deploying engine 448 transmitting the machine learning model to the operating system layer (for example, of a client device), an end-to-end bias-reducing machine learning system (for example, the machine learning model trained by the model-development system 240 of FIG. 2) may be provided.
As shown, example system 400 includes a presentation component 460 that is generally responsible for presenting content and related information, such as the GUI of FIG. 5.
Turning to FIG. 5, an example of the GUI 302 for receiving user inputs, such as a selection of sensitive features and corresponding groups, is depicted.
Turning now to FIG. 6, a flow diagram is provided illustrating an example process 600 for determining loss adjustment weights.
Per block 610, particular embodiments include pre-processing a data set to generate data that can be used for machine learning purposes. In some embodiments, pre-processing the data set may include generating feature vectors, for example, based on client-side data (for example, client-side data 230 of FIG. 2).
Per block 620, particular embodiments include splitting the pre-processed data set into training data 306A and training validation data 306B of FIG. 3.
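A minimal sketch of the split at block 620 follows, assuming scikit-learn's train_test_split; the 80/20 proportion and the placeholder arrays are illustrative choices only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

features = np.random.rand(100, 4)           # placeholder pre-processed features
labels = np.random.randint(0, 2, size=100)  # placeholder binary labels

# Split into training data (306A) and training validation data (306B).
X_train, X_val, y_train, y_val = train_test_split(
    features, labels, test_size=0.2, random_state=42)
```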
Per block 630, particular embodiments include determining loss adjustment weights. The loss adjustment weights may be determined based on the training data. As discussed above, the loss function weight engine 420 of FIG. 4 may determine the loss adjustment weights based on counts of feature-label combinations of the sensitive features.
Moving to FIG. 7, a flow diagram is provided illustrating an example process 700 for training and deploying a machine learning model using an adjusted loss function.
Continuing with FIG. 7, per block 730, particular embodiments include applying the loss adjustment weights to a loss function associated with the machine learning model to generate an adjusted loss function, as discussed above with respect to the adjusted loss function generator 424 of FIG. 4.
Per block 740, particular embodiments include training the machine learning model using the adjusted loss function generated by block 730. As discussed above, the model trainer 444 is configured to train the machine learning model based on training data (for example, labeled and/or unlabeled training data). In some embodiments, the machine learning model may be iteratively trained. For example, model parameters may be iteratively updated to reduce an error or loss calculated by the loss function (for example, the adjusted loss function). The optimizer 340 of FIG. 3 may facilitate iteratively updating the model parameters.
Per block 750, particular embodiments include deploying the trained machine learning model. As discussed above, the model deploying engine 448 of FIG. 4 may deploy the trained machine learning model to any suitable abstraction layer.
An illustrative example embodiment of the present disclosure that has been reduced to practice is described herein. This example embodiment comprises a bias-reducing loss function engine 410 (of FIG. 4) employed in training a machine learning model that predicts survey response rates.
With reference to the five examples below, survey response rates were compared before and after employing the bias-reducing loss function engine 410.
As the first example, survey response rates appeared to differ for users operating older devices.
Whereas older devices had lower response rates before employing the embodiments disclosed herein, response rates were more similar across devices of different ages after loss adjustment weights were applied to the loss function used by the machine learning model. Thus, the bias-reducing loss function engine 410 was able to reduce biases associated with age of a device.
As a second example, survey response rates appeared to differ based on the age of the users.
Whereas older users had lower response rates before employing the embodiments disclosed herein, response rates were more similar across users of different ages after loss adjustment weights were applied to the loss function used by the machine learning model. Thus, the bias-reducing loss function engine 410 was able to reduce biases associated with the age of a user.
As a third example, survey response rates appeared to differ based on the gender of the users.
Whereas users identifying as female had lower response rates before employing the embodiments disclosed herein, response rates were more similar across users regardless of gender after loss adjustment weights were applied to the loss function used by the machine learning model. Thus, the bias-reducing loss function engine 410 was able to reduce biases associated with gender of a user.
As a fourth example, survey response rates appeared to differ based on the price of the user device.
Whereas users prompted to complete surveys on lower-quality devices had lower response rates before employing the embodiments disclosed herein, response rates were more similar across users regardless of the quality of their respective devices after loss adjustment weights were applied to the loss function used by the machine learning model. Thus, the bias-reducing loss function engine 410 was able to reduce biases associated with the quality of a user device.
As a fifth example, survey response rates appeared to differ based on the region from which the user device was identified.
As another illustration, whereas response rates differed across regions before employing the bias-reducing loss function engine 410, response rates were more similar across regions after loss adjustment weights were applied to the loss function used by the machine learning model. Thus, the bias-reducing loss function engine 410 was able to reduce biases associated with the region of a user device.
As the foregoing reduction to practice has illustrated, implementing loss adjustment weights determined in accordance with processes 600 and 700 of FIGS. 6 and 7 can remove or reduce biases associated with sensitive features in a computationally efficient manner.
In some embodiments, a computerized system, such as the system described in any of the embodiments above, comprises at least one computer processor and computer memory storing computer-useable instructions that, when executed by the at least one computer processor, cause the at least one computer processor to perform operations. The operations comprise determining, at a bias reducing machine learning engine and from training data, a count of a feature-label combination relating a sensitive feature to a label. The operations also comprise determining a loss adjustment weight based on the count of the feature-label combination, and applying the loss adjustment weight to a loss function to generate an adjusted loss function. The operations further comprise training a machine learning model using the adjusted loss function to generate an adjusted machine learning model. The operations further comprise causing deployment of the adjusted machine learning model for use in a computing application. Advantageously, these and other embodiments, as described herein, provide technology to improve machine learning systems by removing or reducing biases associated with some features by performing a loss adjustment operation that utilizes a modified or customized loss function during the training of the machine learning model. Further, these embodiments remove biases in machine learning applications without requiring computer code for addressing the biases to run on the computing system where the model is deployed and/or running, such as a computer program operating on a client computer, and thus address the bias problem in a computationally efficient manner. Further still, these embodiments can be personalized or tailored for certain types of data, such as sensitive data. Further still, these embodiments can provide for the selection of particular sensitive features. Accordingly, these embodiments are not only more accurate, but are more easily scaled compared to existing computationally intensive approaches.
In any combination of the above embodiments of the system, the operations may further comprise converting the training data into a table or a vector configured to associate a group of the sensitive feature to a corresponding label, and wherein the loss adjustment weight is determined based on a statistical analysis of the table or vector.
In any combination of the above embodiments of the system, the statistical analysis comprises performing a chi-squared test or Fisher's exact test on the converted training data.
In any combination of the above embodiments of the system, the count of the feature-label combination is determined based on a frequency of the sensitive feature relative to the label, and wherein training the machine learning model using the adjusted loss function reduces a bias attributed to the sensitive feature.
In any combination of the above embodiments of the system, the sensitive feature comprises a gender feature, a race feature, an age feature, a socioeconomic feature, a geographical location feature, or a health feature.
In any combination of the above embodiments of the system, the operations may further comprise determining the loss function based on the machine learning model, wherein the loss adjustment weight is determined based on the loss function and the count of the feature-label combination.
In any combination of the above embodiments of the system, the sensitive feature is specified in response to a user input to a JSON file of a user interface.
In any combination of the above embodiments of the system, the adjusted machine learning model is deployed to an abstraction layer of a client device or a server device, wherein the abstraction layer comprises at least one of an operating system layer, an application layer, or a hardware layer.
In any combination of the above embodiments of the system, the operations comprise causing presentation of a graphical user interface comprising (i) a first control configured to receive a first user input indicative of the sensitive feature and (ii) a second control configured to receive a second user input indicative of the label.
In any combination of the above embodiments of the system, the operations are performed without receipt of client-side code.
In some embodiments, one or more computer storage media are provided having computer-executable instructions embodied thereon that, when executed by a computing system having a processor and memory, cause operations to be performed. The operations comprise determining, at a bias reducing machine learning engine, a count of a feature-label combination relating a sensitive feature to a label. The operations also comprise determining a loss adjustment weight based on the count of the feature-label combination, and applying the loss adjustment weight to a loss function associated with a machine learning model to generate an adjusted loss function. The operations further comprise training the machine learning model using the adjusted loss function to generate an adjusted machine learning model. The operations further comprise deploying the adjusted machine learning model to an operating system layer of a client device or a server device and for use in a software application of the client device or of the server device. Advantageously, these and other embodiments, as described herein, provide technology to improve machine learning systems by removing or reducing biases associated with some features by performing a loss adjustment operation that utilizes a modified or customized loss function during the training of the machine learning model. Further, these embodiments remove biases in machine learning applications without requiring computer code for addressing the biases to run on the computing system where the model is deployed and/or running, such as a computer program operating on a client computer, and thus address the bias problem in a computationally efficient manner. Further still, these embodiments can be personalized or tailored for certain types of data, such as sensitive data. Further still, these embodiments can provide for the selection of particular sensitive features. Accordingly, these embodiments are not only more accurate, but are more easily scaled compared to existing computationally intensive approaches.
In any combination of the above embodiments, the instructions may further cause the processor to convert the training data into a table or vector configured to associate a group of the sensitive feature to a corresponding label, and wherein the loss adjustment weight is determined based on a statistical analysis of the table or vector, the statistical analysis comprising a chi-squared test or Fisher's exact test.
In any combination of the above embodiments, the count of the feature-label combination is determined based on a frequency of the sensitive feature relative to the label, and wherein training the machine learning model using the adjusted loss function reduces a bias attributed to the sensitive feature.
In any combination of the above embodiments, the count of the feature-label combination is determined based on a frequency of the sensitive feature relative to the label, wherein the sensitive feature is engineered based on a numerical transformation, a category encoder, a clustering technique, a group aggregation value, or principal component analysis.
In any combination of the above embodiments, the instructions may further cause the processor to determine the loss function based on the machine learning model, wherein the loss adjustment weight is determined based on the loss function and the count of the feature-label combination.
In some embodiments, a computer-implemented method is provided. The method comprises accessing training data and training a machine learning model based on the training data. The method further comprises evaluating the machine learning model. The method, including evaluating the machine learning model, also comprises determining a count of a feature-label combination relating a sensitive feature to a label. The method, including evaluating the machine learning model, also comprises determining a loss adjustment weight based on the count of the feature-label combination, and applying the loss adjustment weight to a loss function of the machine learning model to generate an adjusted loss function configured to reduce an error attributed to the sensitive feature. The method, including evaluating the machine learning model, further comprises re-training a machine learning model using the adjusted loss function to generate an adjusted machine learning model. The method further comprises deploying the adjusted machine learning model. Advantageously, these and other embodiments, as described herein, provide technology to improve machine learning systems by removing or reducing biases associated with some features by performing a loss adjustment operation that utilizes a modified or customized loss function during the training of the machine learning model. Further, these embodiments remove biases in machine learning applications without requiring computer code for addressing the biases to run on the computing system where the model is deployed and/or running, such as a computer program operating on a client computer, and thus address the bias problem in a computationally efficient manner. Further still, these embodiments can be personalized or tailored for certain types of data, such as sensitive data. Further still, these embodiments can provide for the selection of particular sensitive features. Accordingly, these embodiments are not only more accurate, but are more easily scaled compared to existing computationally intensive approaches.
In any combination of the above embodiments, the machine learning model may be re-trained until an adjusted loss output by the adjusted loss function satisfies a loss threshold.
In any combination of the above embodiments, the count of the feature-label combination may be determined based on a frequency of the sensitive feature relative to the label, and wherein training the machine learning model using the adjusted loss function reduces a bias attributed to the sensitive feature.
In any combination of the above embodiments, the count of the feature-label combination may be determined based on a frequency of the sensitive feature relative to the label, wherein the sensitive feature is engineered based on a numerical transformation, a category encoder, a clustering technique, a group aggregation value, or principal component analysis.
In any combination of the above embodiments, the adjusted machine learning model may be deployed to an abstraction layer of a client device or a server device, wherein the abstraction layer comprises at least one of an operating system layer, an application layer, or a hardware layer.
Having described various embodiments of the disclosure, an exemplary computing environment suitable for implementing embodiments of the disclosure is now described.
Embodiments of the disclosure may be described in the general context of computer code or machine-useable instructions, including computer-useable or computer-executable instructions, such as program modules, being executed by a computer or other machine, such as a personal data assistant, a smartphone, a tablet PC, or other handheld device. Generally, program modules, including routines, programs, objects, components, data structures, and the like, refer to code that performs particular tasks or implements particular abstract data types. Embodiments of the disclosure may be practiced in a variety of system configurations, including handheld devices, consumer electronics, general-purpose computers, more specialty computing devices, or similar computing or processing devices. Embodiments of the disclosure may also be practiced in distributed computing environments where tasks are performed by remote-processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
With reference to the figures, an exemplary computing device 900 for implementing embodiments of the disclosure is described, including, among other components, memory 912, one or more processors 914, one or more presentation components 916, input/output (I/O) ports 918, I/O components 920, and a radio 924.
Computing device 900 typically includes a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by computing device 900 and includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer-readable media may comprise computer storage media and communication media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVDs) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computing device 900. Computer storage media does not comprise signals per se. Communication media typically embodies computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media, such as a wired network or direct-wired connection, and wireless media, such as acoustic, RF, infrared, and other wireless media. Combinations of any of the above should also be included within the scope of computer-readable media.
Memory 912 includes computer storage media in the form of volatile and/or nonvolatile memory. The memory may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid-state memory, hard drives, optical-disc drives, and similar physical storage media. Computing device 900 includes one or more processors 914 that read data from various entities such as memory 912 or I/O components 920. Presentation component(s) 916 presents data indications to a user or other device. Exemplary presentation components include a display device, speaker, printing component, vibrating component, and the like.
The I/O ports 918 allow computing device 900 to be logically coupled to other devices, including I/O components 920, some of which may be built in. Illustrative components include, by way of example and not limitation, a microphone, joystick, game pad, satellite dish, scanner, printer, wireless device, and other I/O components. The I/O components 920 may provide a natural user interface (NUI) that processes air gestures, voice, or other physiological inputs generated by a user. In some instances, inputs may be transmitted to an appropriate network element for further processing. An NUI may implement any combination of speech recognition, touch and stylus recognition, facial recognition, biometric recognition, gesture recognition both on screen and adjacent to the screen, air gestures, head and eye tracking, and touch recognition associated with displays on the computing device 900. The computing device 900 may be equipped with depth cameras, such as stereoscopic camera systems, infrared camera systems, red-green-blue (RGB) camera systems, and combinations of these, for gesture detection and recognition. Additionally, the computing device 900 may be equipped with accelerometers or gyroscopes that enable detection of motion. The output of the accelerometers or gyroscopes may be provided to the display of the computing device 900 to render immersive augmented reality or virtual reality.
Some embodiments of computing device 900 may include one or more radio(s) 924 (or similar wireless communication components). The radio 924 transmits and receives radio or wireless communications. The computing device 900 may be a wireless terminal adapted to receive communications and media over various wireless networks. Computing device 900 may communicate via wireless protocols, such as code division multiple access (“CDMA”), global system for mobiles (“GSM”), or time division multiple access (“TDMA”), as well as others, to communicate with other devices. The radio communications may be a short-range connection, a long-range connection, or a combination of both a short-range and a long-range wireless telecommunications connection. When we refer to “short” and “long” types of connections, we do not mean to refer to the spatial relation between two devices. Instead, we are generally referring to short range and long range as different categories, or types, of connections (i.e., a primary connection and a secondary connection). A short-range connection may include, by way of example and not limitation, a Wi-Fi® connection to a device (for example, a mobile hotspot) that provides access to a wireless communications network, such as a wireless local-area network (WLAN) connection using the 802.11 protocol; further examples of a short-range connection include a Bluetooth connection to another computing device and a near-field communication connection. A long-range connection may include a connection using, by way of example and not limitation, one or more of CDMA, GPRS, GSM, TDMA, and 802.16 protocols.
Referring now to the figures, an exemplary distributed computing environment 1000, in which implementations of the present disclosure may be employed, is described.
Data centers can support distributed computing environment 1000 that includes cloud computing platform 1010, rack 1020, and node 1030 (for example, computing devices, processing units, or blades) in rack 1020. The technical solution environment can be implemented with cloud computing platform 1010 that runs cloud services across different data centers and geographic regions. Cloud computing platform 1010 can implement fabric controller 1040 component for provisioning and managing resource allocation, deployment, upgrade, and management of cloud services. Typically, cloud computing platform 1010 acts to store data or run service applications in a distributed manner. Cloud computing platform 1010 in a data center can be configured to host and support operation of endpoints of a particular service application. Cloud computing platform 1010 may be a public cloud, a private cloud, or a dedicated cloud.
Node 1030 can be provisioned with host 1050 (for example, operating system or runtime environment) running a defined software stack on node 1030. Node 1030 can also be configured to perform specialized functionality (for example, compute nodes or storage nodes) within cloud computing platform 1010. Node 1030 is allocated to run one or more portions of a service application of a tenant. A tenant can refer to a user, such as a customer, utilizing resources of cloud computing platform 1010. Service application components of cloud computing platform 1010 that support a particular tenant can be referred to as a multi-tenant infrastructure or tenancy. The terms service application, application, or service are used interchangeably herein and broadly refer to any software, or portions of software, that run on top of, or access storage and compute device locations within, a datacenter.
When more than one separate service application is being supported by nodes 1030, nodes 1030 may be partitioned into virtual machines (for example, virtual machine 1052 and virtual machine 1054). Physical machines can also concurrently run separate service applications. The virtual machines or physical machines can be configured as individualized computing environments that are supported by resources 1060 (for example, hardware resources and software resources) in cloud computing platform 1010. It is contemplated that resources can be configured for specific service applications. Further, each service application may be divided into functional portions such that each functional portion is able to run on a separate virtual machine. In cloud computing platform 1010, multiple servers may be used to run service applications and perform data storage operations in a cluster. In particular, the servers may perform data operations independently but are exposed as a single device, referred to as a cluster. Each server in the cluster can be implemented as a node.
Client device 1080 may be linked to a service application in cloud computing platform 1010. Client device 1080 may be any type of computing device, such as the user device 102a described herein.
Many different arrangements of the various components depicted, as well as components not shown, are possible without departing from the scope of the claims below. Embodiments of the present disclosure have been described with the intent to be illustrative rather than restrictive. Alternative embodiments will become apparent to readers of this disclosure after and because of reading it. Alternative means of implementing the aforementioned can be completed without departing from the scope of the claims below. Certain features and sub-combinations are of utility and may be employed without reference to other features and sub-combinations and are contemplated within the scope of the claims.
Having identified various components utilized herein, it should be understood that any number of components and arrangements may be employed to achieve the desired functionality within the scope of the present disclosure. For example, the components in the embodiments depicted in the figures are shown with lines for the sake of conceptual clarity. Other arrangements of these and other components may also be implemented. For example, although some components are depicted as single components, many of the elements described herein may be implemented as discrete or distributed components or in conjunction with other components, and in any suitable combination and location. Some elements may be omitted altogether. Moreover, various functions described herein as being performed by one or more entities may be carried out by hardware, firmware, and/or software, as described below. For instance, various functions may be carried out by a processor executing instructions stored in memory. As such, other arrangements and elements (for example, machines, interfaces, functions, orders, and groupings of functions) can be used in addition to or instead of those shown.
Embodiments described in the paragraphs below may be combined with one or more of the specifically described alternatives. In particular, an embodiment that is claimed may contain a reference, in the alternative, to more than one other embodiment. The embodiment that is claimed may specify a further limitation of the subject matter claimed.
The subject matter of embodiments of the invention is described with specificity herein to meet statutory requirements. However, the description itself is not intended to limit the scope of this patent. Rather, the inventors have contemplated that the claimed subject matter might also be embodied in other ways, to include different steps or combinations of steps similar to the ones described in this document, in conjunction with other present or future technologies. Moreover, although the terms “step” and/or “block” may be used herein to connote different elements of methods employed, the terms should not be interpreted as implying any particular order among or between various steps herein disclosed unless and except when the order of individual steps is explicitly described.
For purposes of this disclosure, the word “including” has the same broad meaning as the word “comprising,” and the word “accessing” comprises “receiving,” “referencing,” or “retrieving.” Further, the word “communicating” has the same broad meaning as the words “receiving” or “transmitting,” as facilitated by software- or hardware-based buses, receivers, or transmitters using communication media described herein. In addition, words such as “a” and “an,” unless otherwise indicated to the contrary, include the plural as well as the singular. Thus, for example, the constraint of “a feature” is satisfied where one or more features are present. Also, the term “or” includes the conjunctive, the disjunctive, and both (a or b thus includes either a or b, as well as a and b).
For purposes of a detailed discussion above, embodiments of the present invention are described with reference to a distributed computing environment; however, the distributed computing environment depicted herein is merely exemplary. Components can be configured for performing novel aspects of embodiments, where the term “configured for” can refer to “programmed to” perform particular tasks or implement particular abstract data types using code. Further, while embodiments of the present invention may generally refer to the technical solution environment and the schematics described herein, it is understood that the techniques described may be extended to other implementation contexts.
Embodiments of the present invention have been described in relation to particular embodiments which are intended in all respects to be illustrative rather than restrictive. Alternative embodiments will become apparent to those of ordinary skill in the art to which the present invention pertains without departing from its scope.
From the foregoing, it will be seen that this invention is one well adapted to attain all the ends and objects hereinabove set forth together with other advantages which are obvious and which are inherent to the structure.
It will be understood that certain features and sub-combinations are of utility and may be employed without reference to other features or sub-combinations. This is contemplated by and is within the scope of the claims.