Machine learning models are one form of automated cybersecurity systems used to protect computer hardware and software. However, cyber criminals develop techniques for circumventing existing cybersecurity systems, including machine learning models. Thus, improvements to securing automated cybersecurity systems are sought.
In general, in one aspect, one or more embodiments relate to a method that includes receiving an event, and altering, responsive to receiving the event, a threshold pseudo-randomly to generate an altered threshold value. The method further includes applying the altered threshold value to a threshold-dependent feature to generate an altered threshold-dependent feature value. The altered threshold-dependent feature value is determined at least in part from the event. The method further includes executing a machine learning model, on the event and the altered threshold-dependent feature value, to generate a predicted event type for the event.
In general, in one aspect, one or more embodiments relate to a method that includes obtaining test events, and, for each test event, individually creating at least one test case. Individually creating at least one test case includes altering a threshold pseudo-randomly to generate an altered threshold value, applying the altered threshold value to a threshold-dependent feature to generate an altered threshold-dependent feature value, the altered threshold-dependent feature value determined at least in part from the test events, and adding the threshold-dependent feature to a test case in the at least one test case. The method further includes iteratively adjusting at least one machine learning model while executing the at least one machine learning model on the at least one test case for the test events to generate at least one trained machine learning model.
In general, in one aspect, one or more embodiments relate to a system that includes a server including a processor, and a data repository in communication with the server. The data repository stores an event having an event type, and information regarding the event. The system further includes a machine learning model trained to classify the event. The machine learning model is configured to receive as input the event and altered threshold-dependent feature values. The system further includes a server application configured, when executed by the processor, to generate the altered threshold-dependent feature values by altering thresholds using the information regarding the event, to input the event and the altered threshold-dependent feature values to the machine learning model, and to generate, as output from the machine learning model, a predicted event type.
Other aspects of the one or more embodiments will be apparent from the following description and the appended claims.
Specific embodiments will now be described in detail with reference to the accompanying figures. Like elements in the various figures are denoted by like reference numerals for consistency.
In the following detailed description of embodiments, numerous specific details are set forth in order to provide a more thorough understanding of the one or more embodiments. However, it will be apparent to one of ordinary skill in the art that the one or more embodiments may be practiced without these specific details. In other instances, well-known features have not been described in detail to avoid unnecessarily complicating the description.
Throughout the application, ordinal numbers (e.g., first, second, third, etc.) may be used as an adjective for an element (i.e., any noun in the application). The use of ordinal numbers is not to imply or create any particular ordering of the elements nor to limit any element to being only a single element unless expressly disclosed, such as by the use of the terms “before”, “after”, “single”, and other such terminology. Rather, the use of ordinal numbers is to distinguish between the elements. By way of an example, a first element is distinct from a second element, and the first element may encompass more than one element and succeed (or precede) the second element in an ordering of elements.
In general, the one or more embodiments are directed to improving the cybersecurity of machine learning models. Machine learning models used for security purposes use, as input, various feature values of features and produce, as output, a security classification. Nefarious users wanting to circumvent the security provided by the machine learning models may attack the models indirectly through the inputs to the models. When a machine learning model uses, as input, feature values determined from constant thresholds, such nefarious users are able to learn the thresholds. In particular, one or more embodiments improve the cybersecurity of machine learning models by using a pseudo-random, repeatable technique to vary the thresholds that determine feature values of features used by the machine learning model. Further, because a machine learning model is dependent on the inputs to the machine learning model, one or more embodiments improve the machine learning model itself in order to handle the varying thresholds.
A summary of the one or more embodiments is provided by way of broad example, with the details of the one or more embodiments described with respect to the figures. Initially, an event is detected. An event is an electronic action that is being monitored. Examples of events include login attempts, requests to initiate an electronic transfer of information, attempts to access the use of software, attempts to manipulate data on secured data repositories, attempts to electronically transfer money or electronically pay for a product, etc.
When the event is detected, information about the event is provided as input to a machine learning model. The machine learning model is trained to generate a predicted event type of the event. Specifically, the machine learning model is trained to classify the event into one of two or more event types. The classification may include, for each event type, an individual probability that the event is of that event type. The event type with the greatest probability may be deemed the predicted event type for the event.
For example, the event types may include a fraudulent event type and an authentic (i.e., non-fraudulent) event type. In the example, if the probability of a fraudulent event is below a threshold, the event may be deemed authentic, and the user is allowed to proceed. If the probability of a fraudulent event is above the threshold, the event may be deemed possibly fraudulent and a security action is taken.
An issue encountered in cybersecurity is that a cybercriminal may monitor the behavior of an enterprise system that is protected by such machine learning models. If the behavior of the machine learning models becomes predictable, the criminal is able to circumvent the portion of the cybersecurity system protected by the machine learning models. Stated more simply, the cybercriminal figures out how to trick the machine learning models into predicting that a fraudulent event is authentic, or finds a way to cause the machine learning models to fail to make a prediction with respect to the fraudulent activity.
The one or more embodiments improve the machine learning models by making the output of the machine learning models much more difficult to predict through changing the inputs to the machine learning model. In particular, a pseudo-random technique is used to alter the thresholds that are used to formulate the input to the machine learning models. When many thresholds are altered at once, the cybercriminal will have difficulty predicting the behavior of the cybersecurity system. Additionally, the cybercriminal may be tricked into believing that he or she has discerned the system's behavior patterns, but in actuality a fraudulent event will be detected and thwarted.
By way of an example, consider the scenario in which the machine learning model detects whether a fraudulent login occurs based on the number of unsuccessful login events in a defined lookback period. If the threshold for the lookback period is statically defined as five days, then the cybercriminal may easily be able to detect the five-day threshold based on the response of the system. Using the information, the cybercriminal simply waits six days before trying again, thereby circumventing the machine learning model. However, if the threshold for the lookback period changes, then the cybercriminal may not be able to detect the threshold and thus cannot easily circumvent the security provided by the machine learning model. The benefit is compounded when thresholds other than just the lookback period are also dynamically modified.
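The decision rule in this example can be sketched as follows. This is only an illustration under stated assumptions: the function name `decide`, the action labels, and the 0.5 default are hypothetical, and the description does not say how a probability exactly equal to the threshold is handled (the sketch treats it as authentic).

```python
def decide(fraud_probability: float, decision_threshold: float = 0.5) -> str:
    """Return an action for an event given the model's fraud probability.

    The 0.5 default is a placeholder; no specific value is given in the text.
    """
    if fraud_probability > decision_threshold:
        # Above the threshold: deemed possibly fraudulent.
        return "security_action"
    # Otherwise: deemed authentic, and the user may proceed.
    return "allow"
```

Note that this decision threshold on the model output is distinct from the thresholds that define the model's input features.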
Attention is now turned to the figures.
The data repository (100) stores a variety of information useful to the one or more embodiments. For example, the data repository (100) stores an event (102). As mentioned above, the event (102) is an electronic action that is being monitored. Examples of events include login attempts, requests to initiate an electronic transfer of information, attempts to access the use of software, attempts to manipulate data on secured data repositories, etc.
The data repository (100) stores information (104) about the event (102). The information (104) relates to, or is in regard to, the event (102). The term “regards” means that the information (104) somehow describes some aspect related to the event (102). Thus, for example, the information (104) may be characterized as metadata in that the information (104) describes the event (102). However, the information (104) may also include data that constitutes the event (102) itself.
In a more specific example, the information (104) may include a timestamp (106) of when the event occurred, an identifier pertaining to a user or an account associated with the event, an internet protocol (IP) address from which the event (102) originated or at which the event (102) is processed, etc. The information (104) may take many other different forms. Thus, the timestamp (106) shown in
The event (102) is characterized by an event type (108), also stored in the data repository (100). The event type (108) is a category into which the event (102) has been placed. For example, the event (102) may be classified as fraudulent, authentic, suspicious, secure, insecure, etc. The event type (108) may take many other different forms or categories.
The data repository (100) also stores a predicted event type (110). The predicted event type (110) is a machine-learned prediction of which event type (108) applies to a given incoming event (102). The process of predicting the predicted event type (110) is described further with respect to
The data repository (100) also stores a past event (112), which, in most cases, is one of multiple past events (114). The past event (112) and the multiple past events (114) are events which have occurred in the past but are stored for informational purposes. Thus, for example, once an incoming event (102) is processed, the incoming event (102) is associated with an event type (108) and is added to the multiple past events (114).
The system shown in
The machine learning algorithm (120) is a computer program that, when executed, produces a prediction based on the input provided and the parameter (124) set for the machine learning algorithm (120). Examples of the machine learning algorithm (120) include supervised learning algorithms and unsupervised learning algorithms. A specific example of a supervised machine learning algorithm used in
The parameter (124) is a value which alters how the machine learning algorithm (120) manipulates the input data. An example of a parameter is a set of weights that is specified for a neural network. The set of weights allows the neural network to produce a related output. However, many different possible parameters exist. Some machine learning algorithms define many parameters, while others may define only one parameter.
Changing the parameter (124) changes the machine learning model (116), because the output of the machine learning model (116) changes when the parameter (124) changes. Thus, the process of training the machine learning model (116), described with respect to
The system shown in
The server (128) includes at least one processor (130). The processor (130) is computer hardware that is configured to execute software, such as a training application (132), a server application (134), and a fraud prevention application (136). An example of the processor (130) is described with respect to
The server (128) may also be characterized by the software executing on the server (128). Thus, as indicated above, the server (128) may include the training application (132), the server application (134), and the fraud prevention application (136). Each component is described in turn. Note that the information described with respect to each component may be stored in the data repository (100).
Attention is first turned to the training application (132). The training application (132) is one or more software programs that, when executed by the processor (130), operate to train the machine learning model (116) and/or the multiple machine learning models (118). Operation of the training application (132) is described with respect to
The training application (132) uses a test event (138) having one or more test cases. The test event (138) may be one of many different test events. The test event (138) is a past event that has a known event type (112). For example, the test event (138) may be a login attempt which is known to be classified as fraudulent. Another, different, test event (138) may be another login attempt which is known to be classified as authentic.
As described further with respect to
The fact that the actual event type is already known is used to train the model, as described with respect to
The comparison (140) is used to generate a loss function (142), as described with respect to
As described with respect to
Attention is now turned to the server application (134). The server application (134) is one or more software programs that, when executed by the processor (130), operate to perform the one or more embodiments described with respect to
The server application (134) uses certain kinds of data in the performance of the methods of
In general, a “feature” is an individual measurable property (e.g., characteristic) that is related to and directly or indirectly determinable at least in part from an event. For example, a feature may be a number of past events having a same attribute value as the event. A feature has a feature value. The feature value is the value of the feature for a particular event. For a threshold-dependent feature, the feature value is dictated by the threshold. For some features in at least one embodiment, the feature value is a numerical representation that quantifies the feature. For example, if the feature represents the number of past login attempts, then the number “5” would represent five past login attempts.
A feature value may be stored in a cell in a vector. The vector is a data structure that is used to provide input data to the machine learning model (116). Thus, it can be said that the machine learning model (116) executes on the input data, or that the machine learning model (116) executes on the vector. It can also be said that the vector is composed of features.
For example, a feature may be the current number of login attempts received. The feature may be the internet protocol address from which a given login attempt is received. A feature may be a lookback feature. A lookback feature is a feature whose value is dependent on a past period of time. Many features are possible. In a real setting, a vector may include hundreds, or even thousands, of features.
The following is an example list of threshold-dependent features. (1) Is the total amount of purchases in the previous time period greater than a first threshold, whereby the time period is defined by a second threshold? (2) Is the total number of logins to the account in the previous time period greater than a first threshold, whereby the time period is defined by a second threshold? (3) Is the total number of account creations from an IP address greater than a first threshold, whereby the time period is defined by a second threshold? (4) Is the total amount of credit already given to the user greater than a threshold? (5) Is the total amount of loans already returned by the user greater than a threshold? Other examples of threshold-dependent features are aggregation features. An aggregation feature returns a number, such as a count of a particular attribute value within a time period, whereby the time period is defined by a threshold. The above are only a few examples of threshold-dependent features; other threshold-dependent features may be used without departing from the scope of the claims.
Returning to the threshold-dependent feature (146), the threshold-dependent feature (146) is, again, defined as a feature which depends on a threshold value. A lookback feature is an example of the threshold-dependent feature (146), as the lookback feature defines a length of a lookback period (the amount of time over which a data set is to be analyzed). Another example of the threshold-dependent feature (146) is a defined number of login attempts that are allowed until a security action is taken. Another example of the threshold-dependent feature (146) is a rate at which login attempts are made over a defined time period. Many different forms of the threshold-dependent feature (146) are possible.
As indicated above, the threshold-dependent feature (146) is defined by a threshold (150) having a threshold value. Accordingly, multiple thresholds (152) may be defined for multiple threshold-dependent features (148), on a one-for-one basis. The threshold value of a threshold (150) is a numerical representation of a limit set for the threshold-dependent feature (146), as described above.
A feature index (154) is defined for the threshold-dependent feature (146). Likewise, multiple feature indices (156) are defined for the multiple threshold-dependent features (148). The feature index (154) is a unique identifier of a feature amongst the various features used by the machine learning model. In one or more embodiments, the feature index (154) may be a unique identifier of the location of the feature value of the feature in the vector, described above.
The one or more embodiments alter the threshold-dependent feature (146). In particular, the one or more embodiments provide for a method for establishing an altered threshold value (158) for the threshold-dependent feature (146). Likewise, it is possible for multiple altered threshold values (160) to be defined for multiple threshold-dependent features (148). The term “altered” means that the threshold (150) is a dynamic threshold that is changed pseudo-randomly. The method of altering is described with respect to
The altered threshold value (158) is applied to the threshold-dependent feature (146) to establish an altered threshold-dependent feature value (162). Likewise, multiple altered threshold values (160) are applied to the multiple threshold-dependent features (148) to establish multiple altered threshold-dependent feature values (164). The process of establishing the altered threshold-dependent feature value (162) is described with respect to
The server application (134) may use a hash value (168) to establish the altered threshold value (158). Similarly, the server application (134) may use multiple hash values (170) to establish multiple altered threshold values (160). A hash value is the numerical result of a hash function applied to an input. In one or more embodiments, the hash function is deterministic. Namely, the inputs to the hash function and the hash function itself are defined such that the same inputs produce the same hash value. For example, the hash function may use as input a timestamp (106) of the event and the feature index. An example hash function may be an exclusive or (XOR) based hash function. In one or more embodiments, the hash value (168) maps to an altered threshold value in a range of allowed threshold values. The range of allowed threshold values may be defined, for example, by a user. The process of establishing the hash value (168) is described with respect to
The use of the hash value (168) is an example of a method of performing a pseudo-random alteration of the threshold value (150). The term “pseudo-random” is defined as a method of generating the altered threshold value (158) such that the altered threshold value (158) is sufficiently unpredictable. The term “sufficiently unpredictable” means that an external observer will have difficulty identifying a pattern in the results produced by multiple applications of the method.
For example, without knowing the basis or bases for the hash algorithm, it is difficult to predict the altered threshold value (158) that is produced by the hash algorithm over multiple iterations. The difficulty increases exponentially when different bases for the hash algorithm are applied to different ones of multiple threshold-dependent features (148). For example, the multiple altered threshold-dependent feature values (164) could be based on hashes of the multiple feature indices (156) for each of the features. However, unless an external observer knows the order in which the features are arranged in the input vector, the results of applying the hash algorithm appear unpredictable.
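As a minimal sketch of a deterministic, hash-based alteration keyed on the event timestamp and the feature index, consider the following. The sketch substitutes SHA-256 from the Python standard library for the XOR-based hash mentioned above; the function names, the timestamp format, and the five-to-twenty-day range are assumptions made only for illustration.

```python
import hashlib
from typing import Sequence


def hash_value(timestamp: str, feature_index: int) -> int:
    """Deterministically hash an event timestamp together with a feature index."""
    data = f"{timestamp}|{feature_index}".encode("utf-8")
    digest = hashlib.sha256(data).digest()
    # Interpret the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big")


def altered_threshold(timestamp: str, feature_index: int,
                      allowed: Sequence[int]) -> int:
    """Map the hash value onto one of the allowed threshold values."""
    return allowed[hash_value(timestamp, feature_index) % len(allowed)]


# Hypothetical allowed lookback thresholds: 5 to 20 days inclusive.
allowed_lookback_days = list(range(5, 21))
ts = "2024-01-01T12:00:00Z"

# One altered threshold per feature index; the same inputs always give
# the same result, so the alteration is repeatable.
per_feature = [altered_threshold(ts, i, allowed_lookback_days) for i in range(3)]
```

Because the feature index is part of the hash input, each threshold-dependent feature follows its own sequence of altered thresholds even when the events share a timestamp.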
The unpredictability increases to the external user because what an external observer observes is not the altered threshold value (158) itself, but rather the behavior of the fraud prevention application (136). In turn, the fraud prevention application (136) is based on the output of the machine learning algorithm (120), which only uses the multiple altered threshold values (160) as input. Thus, to an external user, the fraud prevention application (136) behavior appears random, even though an administrator in control of the system of
The system shown in
While
Step 200 includes receiving an event. The event is received via a communication network, such as the Internet. In a specific example, a login event is received at the server from a web browser on a remote user device. In another example, a login event is received when a user attempts to transfer more than a certain amount of money from or to an account.
Step 202 includes altering, responsive to receiving the event, a threshold pseudo-randomly to generate an altered threshold value. Altering the threshold pseudo-randomly may be achieved by applying a hash function to a feature index with a timestamp. Altering the threshold pseudo-randomly may additionally or alternatively use other values as inputs to the hash function, such as an identifier of a user, an account number associated with the event, or combinations thereof. The result of the hash function is a hash value. In one or more embodiments, the hash value is used directly as the altered threshold value. In one or more embodiments, the hash value is used as input to a mapping function that maps the hash value to the altered threshold value. By way of an example, the mapping function may map different ranges of hash values to corresponding altered threshold values in a set of allowed altered threshold values. Thus, from the hash value, the range containing the hash value is determined, and the corresponding altered threshold value is determined. By way of an example, if the result of the hash function is 1.345 and the range from 1 to 2 maps to altered threshold value 5 while different ranges map to different corresponding threshold values, then the result of the mapping function for 1.345 is an altered threshold value of 5. The set of allowed altered threshold values may be defined on a discrete scale (e.g., any integer within a range) or on a continuous scale (e.g., any number within a range). Further, the set of allowed altered threshold values may be enumerated (e.g., “4, 5, and 8 are allowed values”) or implicitly defined (e.g., “integers between 1 and 8 are allowed values”).
Altering the threshold pseudo-randomly may also be achieved by altering the threshold according to an algorithm that takes, as input, a random number generated by a random process. The random number is stored so that the original threshold value can be reconstructed, when desired. Other techniques for altering the threshold pseudo-randomly exist.
Step 204 includes applying the altered threshold value to a threshold-dependent feature to generate an altered threshold-dependent feature value. The altered threshold-dependent feature value is determined at least in part from the event. One or more feature values are determined directly or indirectly from the event. Each feature specifies a collection of one or more attributes to combine into the feature value. For example, an attribute value may itself be the feature value. As another example, the feature value may be determined from a function performed on one or more attribute values. The attribute values may be attribute values in the event, in previous events, attributes of the target of the event, or other attributes. For example, the feature may be the number of log-in attempts by the same internet protocol (IP) address as the event. In such a scenario, the attribute value is the IP address of the event, and the function counts the log-in attempts from the IP address among previous attempts. For at least one threshold-dependent feature, the altered threshold value is used when applying the function. For example, for lookback features, the altered threshold value may be the number of events, the period of time of the lookback feature, or another value.
Feature values for the one or more features may be added to a vector. Steps 202 and 204 may be performed for multiple features to use multiple altered threshold values for the multiple features. The result of Step 204 across the features may be the vector that is used as input to the machine learning model.
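The range-based mapping function can be illustrated with a small lookup table. In the sketch below, only the entry mapping the range from 1 to 2 onto the altered threshold value 5 comes from the example above; the other ranges, the other threshold values, and the names `RANGE_STARTS`, `THRESHOLDS`, and `map_to_threshold` are hypothetical.

```python
from bisect import bisect_right

# Lower bound of each range of hash values (ranges are half-open: [start, next_start)).
# The last range is treated as open-ended in this sketch.
RANGE_STARTS = [0.0, 1.0, 2.0, 3.0]
# Altered threshold value assigned to each range; only [1, 2) -> 5 is from the text.
THRESHOLDS = [4, 5, 8, 6]


def map_to_threshold(hash_result: float) -> int:
    """Find the range containing the hash result and return its altered threshold."""
    if hash_result < RANGE_STARTS[0]:
        raise ValueError("hash result is below the first range")
    # bisect_right returns the number of range starts <= hash_result.
    idx = bisect_right(RANGE_STARTS, hash_result) - 1
    return THRESHOLDS[idx]


result = map_to_threshold(1.345)  # falls in [1, 2), which maps to 5
```

A sorted table with binary search keeps the lookup cheap even when the set of allowed altered threshold values is large.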
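One possible reading of the stored-random-number variant is sketched below, assuming the random number is kept per event so that the threshold applied to that event can be recovered later. The identifiers `ALLOWED`, `alter_with_random`, and `reconstruct`, and the use of an in-memory dictionary as the store, are assumptions for illustration only.

```python
import secrets

# Hypothetical set of allowed lookback thresholds, in days.
ALLOWED = [3, 5, 7, 10]


def alter_with_random(event_id: str, store: dict) -> int:
    """Draw a random number, store it under the event id, and derive a threshold."""
    r = secrets.randbelow(len(ALLOWED))  # uniform integer in [0, len(ALLOWED))
    store[event_id] = r                  # persisted so the choice is repeatable
    return ALLOWED[r]


def reconstruct(event_id: str, store: dict) -> int:
    """Recover the threshold used for an event from the stored random number."""
    return ALLOWED[store[event_id]]


store = {}
used = alter_with_random("event-1", store)
recovered = reconstruct("event-1", store)
```

Unlike the hash-based approach, this variant requires storage per event, because the random number cannot be recomputed from the event itself.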
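To make Step 204 concrete, the following sketch computes a lookback feature (log-in attempts from the same IP address as the event) for a given altered lookback threshold. The dictionary layout of events, the field names `ip` and `timestamp`, and the sample data are assumptions, not part of the described system.

```python
from datetime import datetime, timedelta


def count_logins_from_ip(event: dict, past_events: list, lookback_days: int) -> int:
    """Count past log-in attempts from the event's IP within the lookback period."""
    window_start = event["timestamp"] - timedelta(days=lookback_days)
    return sum(
        1
        for past in past_events
        if past["ip"] == event["ip"]
        and window_start <= past["timestamp"] < event["timestamp"]
    )


now = datetime(2024, 1, 10)
event = {"ip": "203.0.113.7", "timestamp": now}
past_events = [
    {"ip": "203.0.113.7", "timestamp": now - timedelta(days=1)},
    {"ip": "203.0.113.7", "timestamp": now - timedelta(days=6)},
    {"ip": "198.51.100.2", "timestamp": now - timedelta(days=2)},
]

# The same event yields different feature values under different altered thresholds.
value_5_days = count_logins_from_ip(event, past_events, 5)
value_7_days = count_logins_from_ip(event, past_events, 7)
```

With a five-day lookback only the attempt from one day ago is counted, while a seven-day lookback also captures the attempt from six days ago, so waiting out a fixed window no longer guarantees a clean feature value.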
Step 206 includes executing a machine learning model, on the event and the altered threshold-dependent feature value, to generate a predicted event type for the event. As described above with respect to
The result of executing the machine learning model is a number or a string of numbers that represent one or more predictions that the input matches one or more pre-defined event types. Thus, for example, an output of the machine learning model might be that the input has a 1% chance of matching a fraudulent event type 1, a 5% chance of matching a fraudulent event type 2, a 10% chance of matching a fraudulent event type 3, and an 84% chance of matching an authentic event type. The highest number is selected, by the server application, resulting in an overall prediction that the input data matches an authentic event type.
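The selection of the highest probability in this example can be sketched as follows. The dictionary keys and the function name `predicted_event_type` are illustrative assumptions; the probabilities are the ones given in the example above.

```python
# Model output from the example: one probability per pre-defined event type.
probabilities = {
    "fraudulent_type_1": 0.01,
    "fraudulent_type_2": 0.05,
    "fraudulent_type_3": 0.10,
    "authentic": 0.84,
}


def predicted_event_type(probs: dict) -> str:
    """Return the event type with the greatest probability."""
    return max(probs, key=probs.get)


prediction = predicted_event_type(probabilities)
```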
Optionally, the output of the machine learning model may be subject to further machine learning. For example, a confidence machine learning model, such as a logistic regression algorithm, could be used to measure a probability that the output of the prediction machine learning model is correct. The confidence could be used as part of the basis that the server application uses to determine which of a number of different event types should be applied to the current event input into the prediction machine learning model.
In any case, once the event type is predicted for the incoming event, the method of
The method of
The unpredictability of the model can be further increased by ensuring that the altered threshold values are all different, and are determined differently, for each threshold-dependent feature. Thus, in this case, at least two of the plurality of altered threshold values are different.
In another variation to the method of
Alternatively, the threshold-dependent features may be a number of occurrences of multiple past events satisfying a criterion for matching the event. For example, eight login attempts could be used as a reference for one set of inputs, but twelve login attempts could be used as a reference for another.
The altered threshold value at step 202 may also be limited to be within a range. For example, a lookback value of twelve years may be deemed excessive. Thus, the server application may limit the lookback period to be a time between five days and twenty days, in one example.
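One way to keep an altered threshold within such a range is to fold an arbitrary candidate value into the range with a modulo operation, as in the sketch below. This is an assumed technique chosen for illustration; simple clamping would also satisfy the limit, but would concentrate out-of-range candidates at the two endpoints. The function name and defaults are hypothetical, with the five-to-twenty-day bounds taken from the example above.

```python
def limit_to_range(candidate_days: int, low: int = 5, high: int = 20) -> int:
    """Fold an integer candidate into the inclusive range [low, high]."""
    span = high - low + 1          # number of allowed values (16 for 5..20)
    return low + candidate_days % span


# A candidate of twelve years (about 4380 days) is folded back into range.
limited = limit_to_range(4380)
```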
The unpredictability of the behavior of the security system can be further increased by using multiple different prediction machine learning models, which use altered threshold-dependent features. In this example, a random or a pseudo-random process may be used to determine which of multiple machine learning models is used to perform a prediction. As each machine learning model is different, either because the parameters are different or because the machine learning algorithms are different, or both, the output becomes less predictable. In the multiple machine learning model embodiment, altering the threshold value is performed by the pseudorandom selection of the machine learning model from a set of machine learning models.
Thus, the one or more embodiments also contemplate selecting the machine learning model from among multiple machine learning models. The multiple machine learning models each correspond to a distinct corresponding set of one or more altered threshold values. The distinct corresponding sets of one or more altered threshold values are each altered pseudo-randomly when training the respective machine learning models. Namely, in one or more embodiments, the one or more altered threshold values are statically defined for a particular machine learning model, but the selection of the machine learning model is pseudorandom, resulting in a pseudorandom alteration of the one or more thresholds.
Any given pseudo-random number may be generated from information regarding the event. Thus, for example, information regarding the event may be used to generate a pseudo-random number that is used to select the machine learning model applied to a given event. Concurrently, the pseudo-random number may be used to identify the distinct corresponding set of one or more altered threshold values matching the pseudo-random number.
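The pseudo-random model selection might be sketched as follows, assuming a small registry in which each model carries the set of thresholds it was trained with. The registry contents, the model names, and the use of SHA-256 over the timestamp and username are hypothetical choices for illustration.

```python
import hashlib

# Hypothetical registry: each entry stands for a trained model and the fixed
# set of altered threshold values that model was trained with.
MODELS = [
    {"name": "model_a", "thresholds": {"lookback_days": 5, "login_attempts": 8}},
    {"name": "model_b", "thresholds": {"lookback_days": 9, "login_attempts": 12}},
    {"name": "model_c", "thresholds": {"lookback_days": 14, "login_attempts": 10}},
]


def select_model(timestamp: str, username: str) -> dict:
    """Derive a pseudo-random number from event information and pick a model."""
    digest = hashlib.sha256(f"{timestamp}|{username}".encode("utf-8")).digest()
    number = int.from_bytes(digest[:8], "big")
    # The same number identifies both the model and its threshold set.
    return MODELS[number % len(MODELS)]


chosen = select_model("2024-01-01T12:00:00Z", "alice")
```

Because the selection depends only on information regarding the event, replaying the same event selects the same model and therefore the same thresholds.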
As indicated above, generating the pseudo-random number may include generating a hash of a timestamp of the event with an index of a selected feature of the threshold-dependent features. However, other methods may be used, such as to generate a hash of a username of an account associated with the event with an index of the selected feature.
The one or more embodiments also contemplate more efficient techniques for accessing differing values of the threshold-dependent features. When many different events are processed concurrently, the speed of accessing different values of different thresholds may be an issue.
Thus, the one or more embodiments also contemplate storing counts for the threshold-dependent feature in an array. Storing the counts includes storing different values for the threshold-dependent features in the array. For example, if the threshold-dependent feature is a lookback feature such as the number of login attempts per day over a five-day period, then the value for each day may be stored in the rows of the array.
Then, responsive to receiving the event and altering the threshold value, a subset of the plurality of counts is selected according to the altered threshold value to obtain a selected subset. For example, if the altered threshold value is moved from one day to five days, then the array is accessed at the five-day entry. However, if the altered threshold value is moved to two days, then the array is accessed at the two-day entry.
In an embodiment, the subset may be aggregated and used as the altered threshold-dependent feature value. Thus, the method may include aggregating, responsive to receiving the event and altering the threshold value, the selected subset to generate the altered threshold-dependent feature value. Still other variations are possible.
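The array-based access described above may be sketched as follows, under the assumption that the array holds one login-attempt count per day with the most recent day first, and that the aggregation is a sum. The sample counts and function names are hypothetical.

```python
from typing import List

# Assumed layout: index 0 is the most recent day, index 4 is five days back.
daily_counts: List[int] = [4, 1, 0, 7, 2]


def select_counts(counts: List[int], lookback_days: int) -> List[int]:
    """Select the subset of counts covered by the altered lookback threshold."""
    if not 1 <= lookback_days <= len(counts):
        raise ValueError("lookback_days must fall within the stored range")
    return counts[:lookback_days]


def aggregate_counts(counts: List[int], lookback_days: int) -> int:
    """Aggregate the selected subset into the altered feature value."""
    return sum(select_counts(counts, lookback_days))


print(aggregate_counts(daily_counts, 2))  # sums the two most recent days
print(aggregate_counts(daily_counts, 5))  # sums all five stored days
```

Keeping the per-day counts in memory means that a change in the altered threshold value changes only which slice is read, without recomputing the counts for each event.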
Attention is now turned to
As indicated above with respect to
Thus, step 300 includes obtaining test events. The test events are obtained from past events for which event types are known. For example, by analyzing patterns in the data, or from analyzing past security issues, it may be known that certain events correspond to fraudulent events and other events correspond to authentic events. In some cases, different kinds of fraudulent event types may be known.
Step 302 then includes individually creating at least one test case for each test event of the plurality of test events. The test cases may be created according to the method of
Step 304 includes adjusting at least one machine learning model while executing the at least one machine learning model on the at least one test case. Each test case may correspond to an individual vector that is used as input to the machine learning model. Thus, the machine learning model is executed as described above with reference to Step 206 of
The parameters are adjusted by way of a loss function. As described with respect to
Step 306 includes determining whether convergence has occurred. If convergence has occurred (a “yes” determination at step 306), then the method proceeds to step 308 where the at least one trained machine learning model is output. In most cases, multiple machine learning models are output, one for each test case. Otherwise (a “no” determination at step 306), the method returns to step 304 and iterates.
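The iteration of steps 304 through 308 may be illustrated with the following minimal sketch. A logistic regression trained by gradient descent stands in for the machine learning model purely for brevity; the test cases, the learning rate, and the convergence test (change in loss below a tolerance) are assumptions of the sketch and are not the method of the one or more embodiments.

```python
import math


def predict(weights, bias, features):
    """Return the predicted probability that a test case is fraudulent."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(weights, bias, test_cases):
    """Average cross-entropy loss over all test cases."""
    eps = 1e-12
    total = 0.0
    for features, label in test_cases:
        p = min(max(predict(weights, bias, features), eps), 1.0 - eps)
        total -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return total / len(test_cases)


def train(test_cases, learning_rate=0.1, tolerance=1e-6, max_iterations=10_000):
    n = len(test_cases[0][0])
    m = len(test_cases)
    weights, bias = [0.0] * n, 0.0
    previous = log_loss(weights, bias, test_cases)
    for _ in range(max_iterations):
        # Step 304: execute the model on each test case and adjust parameters.
        grad_w, grad_b = [0.0] * n, 0.0
        for features, label in test_cases:
            error = predict(weights, bias, features) - label
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        weights = [w - learning_rate * g / m for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / m
        # Step 306: assumed convergence test on the change in loss.
        current = log_loss(weights, bias, test_cases)
        if abs(previous - current) < tolerance:
            break
        previous = current
    # Step 308: output the trained parameters.
    return weights, bias


# Hypothetical test cases: ([scaled login count, scaled lookback], label),
# where label 1 marks a fraudulent event and 0 an authentic event.
TEST_CASES = [([0.1, 0.2], 0), ([0.2, 0.4], 0), ([0.9, 0.2], 1), ([0.8, 0.6], 1)]
weights, bias = train(TEST_CASES)
print(round(predict(weights, bias, [0.9, 0.2]), 3))
```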
If the machine learning model is a single machine learning model being trained, whereby the single machine learning model uses features having altered threshold values, then each test event may result in a single test case. By having a single test case per test event, the machine learning model is trained to handle the variability that exists when the machine learning model is used in production.
The method of
The method of
Thus, in the extended method, a new event is received. A selected one of a first trained machine learning model and a second trained machine learning model of the at least one trained machine learning model is pseudo-randomly determined. Then, the selected one of the first trained machine learning model and the second trained machine learning model predicts whether the new event matches a pre-determined event type.
Attention is now turned to
Step 302A includes altering a threshold value pseudo-randomly to generate an altered threshold value. The process of altering a value pseudo-randomly is described above with respect to
Step 302B includes applying the altered threshold value to a threshold-dependent feature to generate an altered threshold-dependent feature value. The altered threshold-dependent feature value is determined at least in part from the event. The process of applying the altered threshold value is described above with respect to
Step 302C includes adding the threshold-dependent feature to a test case in the at least one test case. The threshold-dependent feature is added to a test case by including the altered threshold-dependent feature in the input that is to be used for a selected machine learning model.
While the various steps in the flowcharts of
In the example of
In particular, reference is made to the threshold-dependent features (400). The threshold-dependent features (400) include two such features, feature A (402) and feature B (404). Feature A (402) is the number of login attempts for an account. Feature B (404) is the lookback period for the number of login attempts. In other words, feature B (404) represents how many days in the past to look at the number of login attempts. Feature A (402) represents the number of login attempts on any given day, or possibly the accumulated number of login attempts made over the lookback period specified by feature B (404).
Each of feature A (402) and feature B (404) has a definition of the feature and an index. Thus, feature A (402) has a definition of feature A (406) and an index of feature A (408). Likewise, the feature B (404) has a definition of feature B (410) and an index of feature B (412).
In the example of
In this example, the hash function (418) is performed on the timestamp (416) and the index of feature A (408) to generate a hash value A (420). Similarly, the hash function (418) is performed on the timestamp (416) and the index of feature B (412) to generate a hash value B (422).
The hash values, in turn, are used to alter the thresholds for feature A (402) and for feature B (404), respectively. For example, the hash value A (420) is mapped to an altered threshold value A (426). Further, hash value B (422) is mapped to the altered threshold value B (428). Because different indexes are used, different hash values result for different features of the same event. Note that both processes are examples of step 202 of
In the example of
The altered threshold-dependent features (424) are provided as input to an XGBoost (430) machine learning model. Other features may also be provided as input to the XGBoost (430) machine learning model. In addition, information regarding the new login attempt (414) is converted into additional features and provided as input to the XGBoost (430) machine learning model.
The XGBoost (430) machine learning model is then executed. The result of the execution is a prediction (432). The prediction (432) is one or more predictions of probabilities that the new login attempt (414) corresponds to either an authentic login attempt (event type A) or a fraudulent event type (event type B). For example, the prediction (432) may be only a single probability that the new login attempt (414) is fraudulent. However, the prediction (432) may take the form of two numbers, one representing a first probability that the new login attempt (414) is fraudulent, and the other representing a second probability that the new login attempt (414) is authentic. Note that the prediction (432) could include other predicted probabilities, such as probabilities that the new login attempt (414) represents one of different methods of fraudulently logging into the protected account.
The server application then performs a determination (434). In particular, the determination (434) is whether the new login attempt (414) is fraudulent. For example, if the prediction (432) indicates a probability above a threshold value of 80% that the new login attempt (414) is fraudulent, then a security action (438) is taken (a “yes” at determination (434)). However, if the prediction (432) is below the threshold value, then the allow login (436) action is performed, and the user is granted access to the account.
The security action (438) may take a variety of different forms. For example, the security action (438) may be to provide the user with an additional challenge to ensure that the new login attempt (414) is authentic. For example, the user may be presented with a two-factor authentication challenge to ensure it is the authorized user who initiated the new login attempt (414). If the two-factor authentication check passes, the allow login (436) action is taken. However, if the two-factor authentication check fails, then the login attempt is denied.
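The determination (434) and the two-factor authentication form of the security action (438) may be sketched as follows. The function name, the returned strings, and the callable used to represent the two-factor challenge are hypothetical, and the treatment of a probability exactly equal to the threshold (allowed here) is an assumption.

```python
from typing import Callable

FRAUD_THRESHOLD = 0.80  # the 80% threshold value from the example


def handle_login(fraud_probability: float, two_factor_check: Callable[[], bool]) -> str:
    """Return "allow" or "deny" for a login attempt.

    two_factor_check is a hypothetical callable that issues the challenge
    and returns True when the user passes it.
    """
    if fraud_probability > FRAUD_THRESHOLD:
        # "Yes" at the determination: take the security action.
        return "allow" if two_factor_check() else "deny"
    # "No" at the determination: allow the login without a challenge.
    return "allow"


print(handle_login(0.95, lambda: True))
print(handle_login(0.95, lambda: False))
print(handle_login(0.10, lambda: False))
```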
Still other examples of the security action (438) are possible. The new login attempt (414) may be denied outright. The new login attempt (414) may be reported to an authority. The new login attempt (414) may be sent for further analysis to determine an internet protocol (IP) address of the remote computer from which the new login attempt (414) was received. The IP address can then be tracked or blocked.
The computer processor(s) (502) may be an integrated circuit for processing instructions. For example, the computer processor(s) (502) may be one or more cores or micro-cores of a processor. The computing system (500) may also include one or more input device(s) (510), such as a touchscreen, a keyboard, a mouse, a microphone, a touchpad, an electronic pen, or any other type of input device.
The communication interface (508) may include an integrated circuit for connecting the computing system (500) to a network (not shown) (e.g., a local area network (LAN), a wide area network (WAN) such as the Internet, a mobile network, or any other type of network) and/or to another device, such as another computing device.
Further, the computing system (500) may include one or more output device(s) (512), such as a screen (e.g., a liquid crystal display (LCD), a plasma display, a touchscreen, a cathode ray tube (CRT) monitor, a projector, or other display device), a printer, an external storage, or any other output device. One or more of the output device(s) (512) may be the same as or different from the input device(s) (510). The input and output device(s) (510 and 512) may be locally or remotely connected to the computer processor(s) (502), the non-persistent storage device(s) (504), and the persistent storage device(s) (506). Many different types of computing systems exist, and the aforementioned input and output device(s) (510 and 512) may take other forms.
Software instructions in the form of computer readable program code to perform the one or more embodiments may be stored, in whole or in part, temporarily or permanently, on a non-transitory computer readable medium such as a CD, a DVD, a storage device, a diskette, a tape, flash memory, physical memory, or any other computer readable storage medium. Specifically, the software instructions may correspond to computer readable program code that, when executed by a processor(s), is configured to perform the one or more embodiments.
The computing system (500) in
Although not shown in
The nodes (e.g., node X (522), node Y (524)) in the network (520) may be configured to provide services for a client device (526). For example, the nodes may be part of a cloud computing system. The nodes may include functionality to receive requests from the client device (526) and transmit responses to the client device (526). The client device (526) may be a computing system, such as the computing system (500) shown in
The computing system (500) or group of computing systems described in
Based on the client-server networking model, sockets may serve as interfaces or communication channel end-points enabling bidirectional data transfer between processes on the same device. First, following the client-server networking model, a server process (e.g., a process that provides data) may create a first socket object. Next, the server process binds the first socket object, thereby associating the first socket object with a unique name and/or address. After creating and binding the first socket object, the server process then waits and listens for incoming connection requests from one or more client processes (e.g., processes that seek data). At this point, when a client process wishes to obtain data from a server process, the client process starts by creating a second socket object. The client process then proceeds to generate a connection request that includes at least the second socket object and the unique name and/or address associated with the first socket object. The client process then transmits the connection request to the server process. Depending on availability, the server process may accept the connection request, establishing a communication channel with the client process, or the server process, busy in handling other operations, may queue the connection request in a buffer until the server process is ready. An established connection informs the client process that communications may commence. In response, the client process may generate a data request specifying the data that the client process wishes to obtain. The data request is subsequently transmitted to the server process. Upon receiving the data request, the server process analyzes the request and gathers the requested data. Finally, the server process generates a reply including at least the requested data and transmits the reply to the client process. The data may be transferred, most commonly, as datagrams or a stream of characters (e.g., bytes).
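The socket exchange described above may be sketched with the Python standard library as follows. For brevity, the server side runs in a thread of the same process rather than in a separate process, and the loopback address, the one-request protocol, and the in-memory data store are assumptions of the sketch.

```python
import socket
import threading


def serve_once(server_socket: socket.socket, data_store: dict) -> None:
    """Accept one connection, read a data request, and send the reply."""
    connection, _ = server_socket.accept()
    with connection:
        key = connection.recv(1024).decode("utf-8")
        reply = data_store.get(key, "")
        connection.sendall(reply.encode("utf-8"))


def request_data(address, key: str) -> str:
    """Client side: create a socket, connect, send a request, read the reply."""
    with socket.create_connection(address) as client_socket:
        client_socket.sendall(key.encode("utf-8"))
        return client_socket.recv(1024).decode("utf-8")


# Server side: create the first socket, bind it, and listen for requests.
server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_socket.bind(("127.0.0.1", 0))  # port 0 lets the OS choose a free port
server_socket.listen()
address = server_socket.getsockname()

worker = threading.Thread(target=serve_once, args=(server_socket, {"status": "ok"}))
worker.start()

# Client side: request the value stored under the key "status".
reply = request_data(address, "status")
worker.join()
server_socket.close()
print(reply)
```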
Shared memory refers to the allocation of virtual memory space in order to substantiate a mechanism for which data may be communicated and/or accessed by multiple processes. In implementing shared memory, an initializing process first creates a shareable segment in persistent or non-persistent storage. Post creation, the initializing process then mounts the shareable segment, subsequently mapping the shareable segment into the address space associated with the initializing process. Following the mounting, the initializing process proceeds to identify and grant access permission to one or more authorized processes that may also write and read data to and from the shareable segment. Changes made to the data in the shareable segment by one process may immediately affect other processes, which are also linked to the shareable segment. Further, when one of the authorized processes accesses the shareable segment, the shareable segment maps to the address space of that authorized process. Often, only one authorized process may mount the shareable segment, other than the initializing process, at any given time.
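A minimal sketch of the shared memory mechanism, using the `multiprocessing.shared_memory` module of the Python standard library (Python 3.8 or later), is given below. To keep the sketch self-contained, the second attachment is made within the same process; in practice an authorized process would attach to the segment by name from its own address space. The segment size and contents are arbitrary.

```python
from multiprocessing import shared_memory

# Initializing process: create a shareable segment and write data into it.
segment = shared_memory.SharedMemory(create=True, size=16)
try:
    segment.buf[:5] = b"hello"

    # Authorized process (simulated here in the same process): attach by name.
    attached = shared_memory.SharedMemory(name=segment.name)
    try:
        received = bytes(attached.buf[:5])
        # A write through one mapping is visible through the other mapping.
        attached.buf[0:1] = b"j"
        changed = bytes(segment.buf[:5])
    finally:
        attached.close()
finally:
    # Release the mapping and remove the segment.
    segment.close()
    segment.unlink()

print(received, changed)
```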
Other techniques may be used to share data, such as the various data described in the present application, between processes without departing from the scope of the one or more embodiments. The processes may be part of the same or different application and may execute on the same or different computing system.
Rather than or in addition to sharing data between processes, the computing system performing the one or more embodiments may include functionality to receive data from a user. For example, in one or more embodiments, a user may submit data via a graphical user interface (GUI) on the user device. Data may be submitted via the graphical user interface by a user selecting one or more graphical user interface widgets or inserting text and other data into graphical user interface widgets using a touchpad, a keyboard, a mouse, or any other input device. In response to selecting a particular item, information regarding the particular item may be obtained from persistent or non-persistent storage by the computer processor. Upon selection of the item by the user, the contents of the obtained data regarding the particular item may be displayed on the user device in response to the user's selection.
By way of another example, a request to obtain data regarding the particular item may be sent to a server operatively connected to the user device through a network. For example, the user may select a uniform resource locator (URL) link within a web client of the user device, thereby initiating a Hypertext Transfer Protocol (HTTP) or other protocol request being sent to the network host associated with the URL. In response to the request, the server may extract the data regarding the particular selected item and send the data to the device that initiated the request. Once the user device has received the data regarding the particular item, the contents of the received data regarding the particular item may be displayed on the user device in response to the user's selection. Further to the above example, the data received from the server after selecting the URL link may provide a web page in Hyper Text Markup Language (HTML) that may be rendered by the web client and displayed on the user device.
Once data is obtained, such as by using techniques described above or from storage, the computing system, in performing the one or more embodiments, may extract one or more data items from the obtained data. For example, the extraction may be performed as follows by the computing system (500) in
Next, extraction criteria are used to extract one or more data items from the token stream or structure, where the extraction criteria are processed according to the organizing pattern to extract one or more tokens (or nodes from a layered structure). For position-based data, the token(s) at the position(s) identified by the extraction criteria are extracted. For attribute/value-based data, the token(s) and/or node(s) associated with the attribute(s) satisfying the extraction criteria are extracted. For hierarchical/layered data, the token(s) associated with the node(s) matching the extraction criteria are extracted. The extraction criteria may be as simple as an identifier string or may be a query presented to a structured data repository (where the data repository may be organized according to a database schema or data format, such as eXtensible Markup Language (XML)).
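The three kinds of extraction criteria described above may be illustrated as follows. The sample record, the attribute names, and the XML document are hypothetical and serve only to show position-based, attribute/value-based, and hierarchical extraction side by side.

```python
import xml.etree.ElementTree as ET


def extract_by_position(record: str, position: int, delimiter: str = ",") -> str:
    """Position-based: return the token at the given position."""
    return record.split(delimiter)[position]


def extract_by_attribute(attributes: dict, name: str) -> str:
    """Attribute/value-based: return the value associated with an attribute."""
    return attributes[name]


def extract_by_path(xml_text: str, path: str) -> str:
    """Hierarchical: return the text of the node matching the path."""
    node = ET.fromstring(xml_text).find(path)
    if node is None:
        raise KeyError(path)
    return node.text


record = "2024-01-01,alice,login,failed"
attributes = {"user": "alice", "action": "login", "result": "failed"}
document = "<event><account><name>alice</name></account><attempts>3</attempts></event>"

print(extract_by_position(record, 1))
print(extract_by_attribute(attributes, "result"))
print(extract_by_path(document, "account/name"))
```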
The extracted data may be used for further processing by the computing system. For example, the computing system (500) of
The computing system (500) in
The user, or software application, may submit a statement or query into the DBMS. Then the DBMS interprets the statement. The statement may be a select statement to request information, update statement, create statement, delete statement, etc. Moreover, the statement may include parameters that specify data, data containers (a database, a table, a record, a column, a view, etc.), identifiers, conditions (comparison operators), functions (e.g., join, full join, count, average, etc.), sorts (e.g., ascending, descending), or others. The DBMS may execute the statement. For example, the DBMS may access a memory buffer, a reference, or an index of a file for read, write, deletion, or any combination thereof, for responding to the statement. The DBMS may load the data from persistent or non-persistent storage and perform computations to respond to the query. The DBMS may return the result(s) to the user or software application.
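By way of example, the submission of create, insert, and select statements to a DBMS may be sketched with the `sqlite3` module of the Python standard library. The in-memory database, the table name `login_attempts`, its columns, and the sample rows are hypothetical.

```python
import sqlite3

connection = sqlite3.connect(":memory:")

# Create statement: define a data container (a table).
connection.execute(
    "CREATE TABLE login_attempts (username TEXT, day TEXT, succeeded INTEGER)"
)

# Insert statements with parameters that specify the data.
rows = [
    ("alice", "2024-01-01", 0),
    ("alice", "2024-01-02", 0),
    ("bob", "2024-01-01", 1),
    ("bob", "2024-01-02", 0),
]
connection.executemany("INSERT INTO login_attempts VALUES (?, ?, ?)", rows)

# Select statement with a condition, a function (count), and a sort.
cursor = connection.execute(
    "SELECT username, COUNT(*) FROM login_attempts "
    "WHERE succeeded = 0 GROUP BY username ORDER BY username ASC"
)
results = cursor.fetchall()
connection.close()
print(results)
```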
The computing system (500) of
For example, a GUI may first obtain a notification from a software application requesting that a particular data object be presented within the GUI. Next, the GUI may determine a data object type associated with the particular data object, e.g., by obtaining data from a data attribute within the data object that identifies the data object type. Then, the GUI may determine any rules designated for displaying that data object type, e.g., rules specified by a software framework for a data object class or according to any local parameters defined by the GUI for presenting that data object type. Finally, the GUI may obtain data values from the particular data object and render a visual representation of the data values within a display device according to the designated rules for that data object type.
Data may also be presented through various audio methods. In particular, data may be rendered into an audio format and presented as sound through one or more speakers operably connected to a computing device.
Data may also be presented to a user through haptic methods. For example, haptic methods may include vibrations or other physical signals generated by the computing system. For example, data may be presented to a user using a vibration generated by a handheld computer device with a predefined duration and intensity of the vibration to communicate the data.
The above description of functions presents only a few examples of functions performed by the computing system (500) of
While the one or more embodiments have been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this disclosure, will appreciate that other embodiments can be devised which do not depart from the scope of the one or more embodiments as disclosed herein. Accordingly, the scope of the one or more embodiments should be limited only by the attached claims.