The current application relates to detecting anomalous call behavior.
Scam telephone calls and robocalls are an increasing problem. There are solutions that can be used to block inbound calls from numbers known to be used by robocallers and/or for scam calls. While these solutions may be useful, they require an end user to install an application or use a device to provide the desired functionality.
It is difficult to adapt existing solutions from end-user devices to a telephone network level, as it may be unacceptable for the telephone network to block a number that was incorrectly identified as being associated with a robocall or scam call. Identifying telephone numbers associated with robocalls or scam calls based on data available to network operators can also be a difficult task given the volume of data that needs to be processed.
It would be desirable to have new, additional and/or improved tools for use by telephone network operators in identifying and blocking telephone numbers associated with making robocalls and/or scam calls.
Further features and advantages of the present disclosure will become apparent from the following detailed description taken in combination with the appended drawings, in which:
In accordance with the present disclosure there is provided a system for use in blocking phone numbers in a telephone network comprising: one or more processors for executing instructions; and at least one memory for storing instructions, which when executed by at least one of the one or more processors configure the system to perform a method comprising: receiving from a plurality of telephone network elements a plurality of raw call log records; periodically processing the received plurality of raw call log records comprising: formatting each of the raw call log records into a corresponding call record having a common format; and identifying raw call log records or call records associated with a same call; and aggregating raw call log records or call records associated with the same call together; periodically processing the call logs comprising: processing the call logs using a first trained model to identify phone numbers associated with anomalous call behaviour as anomalous phone numbers; and processing the call logs using a second trained model to identify phone numbers associated with a first undesirable type of call behaviour as first undesirable call type phone numbers; and blocking at least one phone number of the anomalous phone numbers and the first undesirable call type phone numbers from making calls over the telephone network.
In an embodiment of the system, the first undesirable call type is a Wangiri type scam call.
In an embodiment of the system, the at least one phone number that is blocked is further processed to ensure the number should be blocked prior to being blocked.
In an embodiment of the system, the method provided by executing the instructions further comprises: automatically calling at least one of the phone numbers of the anomalous phone numbers and the first undesirable call type phone numbers; and recording a portion of the calls made automatically.
In an embodiment of the system, the method provided by executing the instructions further comprises: generating a user interface including an indication of one or more of the anomalous phone numbers and the first undesirable call type phone numbers; providing the generated user interface to an investigator of the telephone network operator; and receiving from the user interface a selection including the at least one phone number for blocking.
In an embodiment of the system, the generated user interface further includes an indication of the recorded portion of the calls.
In an embodiment of the system, the method provided by executing the instructions further comprises: retrieving additional information from one or more sources on the anomalous phone numbers and the first undesirable call type phone numbers; and including the additional information in the generated user interface.
In an embodiment of the system, the method provided by executing the instructions further comprises: unblocking blocked phone numbers.
In an embodiment of the system, unblocking blocked phone numbers comprises: identifying blocked phone numbers; and for each blocked phone number, determining if there has been no call activity over the telephone network associated with the blocked phone number for a threshold period of days, and unblocking the blocked phone number when it is determined that there has been no call activity for the threshold period of days.
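By way of illustration only, the unblocking check described above might be sketched as follows. The function name, the 30-day threshold, the sample phone numbers, and the choice to treat a number with no recorded activity as inactive are all assumptions made for this sketch and are not taken from the disclosure.

```python
from datetime import datetime, timedelta


def numbers_to_unblock(blocked_numbers, last_activity, now, threshold_days=30):
    """Return blocked numbers with no call activity in the last threshold_days days.

    last_activity maps a phone number to the time of its most recent call
    activity on the network (hypothetical structure).
    """
    cutoff = now - timedelta(days=threshold_days)
    to_unblock = []
    for number in blocked_numbers:
        last_seen = last_activity.get(number)
        # Assumption: a number with no recorded activity counts as inactive.
        if last_seen is None or last_seen < cutoff:
            to_unblock.append(number)
    return to_unblock


now = datetime(2019, 1, 31)
blocked = ["15550001", "15550002", "15550003"]
last_activity = {
    "15550001": datetime(2018, 12, 1),   # quiet for about two months
    "15550002": datetime(2019, 1, 30),   # active yesterday
}
result = numbers_to_unblock(blocked, last_activity, now, threshold_days=30)
```

In this example the first and third numbers would be unblocked, while the second stays blocked because it showed activity within the threshold period.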
In accordance with the present disclosure there is further provided a method for use in detecting fraudulent phone numbers associated with undesirable behavior in a telephone network.
In accordance with the present disclosure there is further provided a system for detecting fraudulent phone numbers associated with undesirable behavior in a telephone network.
In accordance with the present disclosure there is further provided a method of processing call detail records (CDRs), comprising: receiving a plurality of CDRs, each of the CDRs comprising a calling party number, a callee number, a gap value, a gap type value, a start time, an end time, a ringing time, and a conversation time; determining a dialed callee number for each of the CDRs; identifying at least two CDRs associated with a call event based on one or more similarity thresholds being met between the at least two CDRs; determining a maximum conversation time among all CDRs associated with the call event; and generating a processed CDR for the call event comprising at least the calling party number, the dialed callee number, and the maximum conversation time.
In an embodiment of the method, determining the dialed callee number in each of the CDRs comprises, for each respective CDR: determining whether an additional number field in the respective CDR has a value and whether the value differs from the calling party number; and determining that the dialed callee number in the respective CDR is the value in the additional number field when it is determined that there is the value in the additional number field and that the value differs from the calling party number.
In an embodiment of the method, when it is determined that the additional number field is blank or that the additional number field has a value that is the same as the calling party number, the method further comprises: determining whether the gap type value denotes one of a destination number, a ported number, or a transfer number; and determining that the dialed callee number in the respective CDR is the gap value when the gap type value denotes one of a destination number, a ported number, or a transfer number.
In an embodiment of the method, when it is determined that the gap type value does not denote one of a destination number, a ported number, or a transfer number, determining that the dialed callee number is the callee number of the respective CDR.
In an embodiment of the method, identifying the at least two CDRs associated with the call event comprises: identifying two consecutive CDRs based on the calling party number, the dialed callee number, and the start time; and determining that the two consecutive CDRs are associated with the call event based on the one or more similarity thresholds being met between the two consecutive CDRs.
In an embodiment of the method, the method further comprises generating a sorted list of CDRs by sorting the plurality of CDRs based on the calling party number, the dialed callee number, and the start time, for use in identifying the two consecutive CDRs.
In an embodiment of the method, the similarity thresholds comprise one or more of: a difference of start time is less than w seconds; a difference of end time is less than x seconds; a difference of ringing time is less than y seconds; and a difference of conversation time is less than z seconds, wherein each of w, x, y, and z are predetermined threshold values.
In an embodiment of the method, the method further comprises determining a maximum ringing time among all CDRs associated with the call event; and generating the processed CDR for the call event further comprising the maximum ringing time.
In an embodiment of the method, the plurality of CDRs comprise CDRs generated from a multi-hop call event.
In an embodiment of the method, the plurality of CDRs are received from a plurality of network switches.
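As a non-limiting sketch, the determination of the dialed callee number described in the embodiments above could be expressed as the fallback chain below. The dictionary keys and the gap type labels ("destination", "ported", "transfer") are hypothetical; actual CDRs may encode these fields differently.

```python
# Hypothetical gap type labels; real CDRs may use numeric or other codes.
REDIRECT_GAP_TYPES = {"destination", "ported", "transfer"}


def dialed_callee_number(cdr):
    """Resolve the dialed callee number of a single CDR (a dict)."""
    additional = cdr.get("additional_number")
    # 1. Prefer the additional number field when it is set and is not
    #    simply a copy of the calling party number.
    if additional and additional != cdr["calling_number"]:
        return additional
    # 2. Otherwise use the gap value when the gap type denotes a
    #    destination, ported, or transfer number.
    if cdr.get("gap_type") in REDIRECT_GAP_TYPES:
        return cdr["gap_value"]
    # 3. Otherwise fall back to the callee number of the CDR.
    return cdr["callee_number"]


cdr_a = {"calling_number": "100", "callee_number": "200",
         "additional_number": "300", "gap_type": "ported", "gap_value": "400"}
cdr_b = {"calling_number": "100", "callee_number": "200",
         "additional_number": "100", "gap_type": "ported", "gap_value": "400"}
cdr_c = {"calling_number": "100", "callee_number": "200",
         "additional_number": "", "gap_type": "other", "gap_value": "400"}
resolved = [dialed_callee_number(c) for c in (cdr_a, cdr_b, cdr_c)]
```

The three sample records exercise each branch in turn: the additional number field, the gap value, and the callee number.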
In accordance with the present disclosure there is further provided a system for processing call detail records (CDRs), comprising: one or more processors for executing instructions; and at least one non-transitory computer-readable memory storing instructions which, when executed by at least one of the one or more processors, configure the system to perform a method comprising: receiving a plurality of CDRs, each of the CDRs comprising a calling party number, a callee number, a gap value, a gap type value, a start time, an end time, a ringing time, and a conversation time; determining a dialed callee number for each of the CDRs; identifying at least two CDRs associated with a call event based on one or more similarity thresholds being met between the at least two CDRs; determining a maximum conversation time among all CDRs associated with the call event; and generating a processed CDR for the call event comprising at least the calling party number, the dialed callee number, and the maximum conversation time.
In an embodiment of the system, determining the dialed callee number in each of the CDRs comprises, for each respective CDR: determining whether an additional number field in the respective CDR has a value and whether the value differs from the calling party number; and determining that the dialed callee number in the respective CDR is the value in the additional number field when it is determined that there is the value in the additional number field and that the value differs from the calling party number.
In an embodiment of the system, when it is determined that the additional number field is blank or that the additional number field has a value that is the same as the calling party number, the method performed by the system further comprises: determining whether the gap type value denotes one of a destination number, a ported number, or a transfer number; and determining that the dialed callee number in the respective CDR is the gap value when the gap type value denotes one of a destination number, a ported number, or a transfer number.
In an embodiment of the system, when it is determined that the gap type value does not denote one of a destination number, a ported number, or a transfer number, determining that the dialed callee number is the callee number of the respective CDR.
In an embodiment of the system, identifying the at least two CDRs associated with the call event comprises: identifying two consecutive CDRs based on the calling party number, the dialed callee number, and the start time; and determining that the two consecutive CDRs are associated with the call event based on the one or more similarity thresholds being met between the two consecutive CDRs.
In an embodiment of the system, the method performed by the system further comprises generating a sorted list of CDRs by sorting the plurality of CDRs based on the calling party number, the dialed callee number, and the start time, for use in identifying the two consecutive CDRs.
In an embodiment of the system, the similarity thresholds comprise one or more of: a difference of start time is less than w seconds; a difference of end time is less than x seconds; a difference of ringing time is less than y seconds; and a difference of conversation time is less than z seconds, wherein each of w, x, y, and z are predetermined threshold values.
In an embodiment of the system, the method performed by the system further comprises: determining a maximum ringing time among all CDRs associated with the call event; and generating the processed CDR for the call event further comprising the maximum ringing time.
In an embodiment of the system, the plurality of CDRs comprise CDRs generated from a multi-hop call event.
In an embodiment of the system, the plurality of CDRs are received from a plurality of network switches.
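The sorting, consecutive-record comparison, and aggregation described in the embodiments above might be sketched as follows. This is a minimal illustration under stated assumptions: start and end times are taken to be seconds since an epoch, all four similarity thresholds are required to be met (the embodiments allow one or more), each threshold defaults to a hypothetical 5 seconds, and a CDR with no matching neighbour is passed through as its own call event.

```python
def same_call_event(a, b, w=5, x=5, y=5, z=5):
    """True when two CDRs fall within all four similarity thresholds."""
    return (abs(a["start_time"] - b["start_time"]) < w
            and abs(a["end_time"] - b["end_time"]) < x
            and abs(a["ringing_time"] - b["ringing_time"]) < y
            and abs(a["conversation_time"] - b["conversation_time"]) < z)


def merge_cdrs(cdrs, **thresholds):
    """Group consecutive CDRs of one call event and emit one processed CDR each."""
    ordered = sorted(cdrs, key=lambda c: (c["calling_number"],
                                          c["dialed_callee_number"],
                                          c["start_time"]))
    groups = []
    for cdr in ordered:
        prev = groups[-1][-1] if groups else None
        if (prev is not None
                and prev["calling_number"] == cdr["calling_number"]
                and prev["dialed_callee_number"] == cdr["dialed_callee_number"]
                and same_call_event(prev, cdr, **thresholds)):
            groups[-1].append(cdr)   # another hop of the same call event
        else:
            groups.append([cdr])     # start of a new call event
    return [{"calling_number": g[0]["calling_number"],
             "dialed_callee_number": g[0]["dialed_callee_number"],
             "conversation_time": max(c["conversation_time"] for c in g),
             "ringing_time": max(c["ringing_time"] for c in g)}
            for g in groups]


raw = [
    {"calling_number": "A", "dialed_callee_number": "B", "start_time": 1000,
     "end_time": 1060, "ringing_time": 10, "conversation_time": 50},
    {"calling_number": "A", "dialed_callee_number": "B", "start_time": 1001,
     "end_time": 1061, "ringing_time": 9, "conversation_time": 51},
    {"calling_number": "A", "dialed_callee_number": "B", "start_time": 5000,
     "end_time": 5030, "ringing_time": 5, "conversation_time": 25},
]
processed = merge_cdrs(raw)
```

Here the first two records differ by about one second in every field and are merged into a single processed CDR, while the third record, which starts much later, is treated as a separate call event.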
Undesirable phone calls can be a problem for consumers. These calls may include various types of scams or other undesirable calls. For example, some calls may impersonate a revenue agency such as the Canada Revenue Agency (CRA) or the Internal Revenue Service (IRS) and have the victim transfer money or other payments to the perpetrator. Other types of scam calls may include Wangiri, or “one ring” calls, in which a scammer calls a target from a phone number and hangs up after one or two rings, or just long enough to register as a missed call. This process may be repeated from the same or a slightly different phone number. If the target calls back the phone number, for example out of curiosity, the return number may be a “pay to call” or premium rate number, causing the target to pay these charges. These types of scam calls may be made by robocalls, or may use robocalls to identify possible phone numbers that are active. As described further below, a telephone network operator may collect and process call data from their telephone network in order to identify phone numbers associated with the undesirable behaviours. Once such phone numbers are identified, they may be blocked from making and/or receiving calls on the telephone network operator's network.
The processing of the data collected from the various network elements 104 may be performed by one or more servers 110. The server(s) 110 comprises one or more processing units 112 for executing instructions and memory units 114 for storing instructions which when executed by the processing units 112 configure the server(s) 110 to provide functionality for identifying and blocking phone numbers associated with undesirable behaviour. The server(s) 110 may also include non-volatile (NV) storage 116 as well as one or more input/output (I/O) interfaces 118 for connecting internal and/or external components, devices and/or peripherals to the server(s) 110.
The functionality 120, which is provided by executing the instructions stored in the memory, includes data collection functionality 122 for processing the data collected by the network elements 104, detection functionality 124 for detecting, or rather identifying, phone numbers associated with undesirable behaviour, action functionality 126 for blocking and unblocking phone numbers, investigative interface functionality 128 for providing an interface to investigators of the telephone network operator, as well as additional investigative processing functionality 130.
Broadly, the data collected by the network elements 104 is pre-processed by the data collection functionality 122 and the pre-processed data is used by the detection functionality 124 to identify phone numbers associated with undesirable call behaviour. The identified phone numbers associated with undesirable call behaviour can be blocked/unblocked or other actions may be taken by the action functionality 126. The actions may be taken automatically, or may be taken based on additional user (e.g. network operator level) input. In a non-limiting example, the additional user input may be provided by an investigator using an interface provided by the investigative interface functionality 128. The investigative interface functionality 128 may also use or solicit additional information that may be useful to the investigator and provided by the investigative processing functionality 130.
As described above, the data collection functionality 122 pre-processes data collected by the network elements 104. The raw call log data may be stored or accessed in numerous different ways, which are depicted schematically as a database 132 in
The raw call data logs may be periodically processed in relatively short periods. For example, the raw call data logs may be processed every 5 minutes. Alternatively, this processing may be done in longer or shorter intervals, or possibly in real time. Regardless of the time intervals of processing the raw call data logs, once the records are processed by the log pre-processing functionality 134 the resulting call records 136 can be stored for subsequent processing by the detection functionality 124.
The detection functionality 124 may comprise various different functionality for processing the call records 136 to identify phone numbers associated with undesirable behaviour. As depicted in
Each of the detection functionalities 138, 142, 144 may label or otherwise provide some other indication of the phone numbers that were detected by the various functionalities as possibly being associated with undesirable call behaviour. That is, for example, the general anomaly detection functionality 138 may provide an indication of one or more phone numbers that were determined to be anomalous, the Wangiri detection functionality 142 may provide an indication of one or more phone numbers that were determined to be associated with Wangiri fraud calls, etc. Details of illustrative implementation of both the general anomaly detection functionality 138 and the Wangiri detection functionality 142 are described in further detail below.
Once one or more phone numbers have been identified by the detection functionality 124, one or more actions may be taken on the phone numbers by action functionality 126. The actions may be taken automatically, or may be taken after some form of user interaction, for example by an investigator of the network operator. For example, an anomalous phone number may not be blocked automatically, but may be marked for blocking after an investigation or further review by an investigator. As depicted, the action functionality 126 may include phone number blocking functionality 146 and phone number unblocking functionality 148.
Depending upon how phone numbers are marked for blocking as well as the level of acceptability of potentially blocking a valid phone number, the blocking functionality 146 may, in a non-limiting example, simply automatically block all provided or marked phone numbers. Alternatively, the blocking functionality may include one or more checks or business rules that are applied to the phone numbers marked for blocking and only those phone numbers passing all of the checks may be blocked.
The phone numbers identified by the detection functionality 124 may be automatically passed to the phone number blocking functionality 146, or they may first be passed to investigative interface functionality 128 for generating an interface for use by an investigator. The investigative interface functionality 128 may include a graphical user interface (GUI) generation functionality 150 that generates an investigative interface that may present the identified telephone numbers to an investigator, which may allow the investigator to determine whether or not the phone number(s) should be blocked. The GUI that is generated may include an indication, such as a button or other GUI element, that allows the investigator to select a phone number for subsequent blocking by the phone number blocking functionality 146. In addition to providing an indication of one or more of the phone numbers identified by the detection functionality 124, the GUI may further include additional information that may be helpful to an investigator in determining whether or not to block a phone number.
In order to provide the additional information, the investigative interface functionality 128 may include data collection functionality 152 for retrieving or accessing the additional information presented in the generated GUI. The data collection functionality 152 may retrieve information from various sources. For example, the data collection functionality may retrieve information from one or more subscriber data sources of the telephone network operator to retrieve information associated with phone numbers that are provided by the telephone network operator. Additionally, the data collection functionality 152 may retrieve information from other sources such as provided by the investigative processing functionality 130.
The investigative processing functionality 130 may include one or more different functionalities or elements for providing additional relevant information. For example, the investigative processing functionality may include honey pot number functionality 154 that provides a honey pot phone number that is not used for other purposes, and as such any numbers calling the honey pot phone number may be considered anomalous or as presenting undesirable behaviour. Additionally, the investigative processing functionality 130 may include automated call-back functionality 156 that can call back identified phone numbers, including for example suspicious numbers or those potentially associated with undesirable behaviours, and record the phone call. The automated call-back functionality 156 may simulate a call. Additionally, the investigative processing functionality 130 may include 3rd party data collection functionality 158 that can retrieve or access information from 3rd party sources such as yellow-page information or 3rd party sources collecting information about robocalls or possible fraudulent calls.
Returning to the general anomaly detection functionality 138 depicted in
The Isolation Forest model, tuned using features including those mentioned above, may assign an anomaly score to each anumber, that is, each originating phone number, which may also be known as the calling party or caller. Experiments have shown that the more anomalous the behavior of a particular anumber is, as defined by the features including those mentioned above, the more likely it is to be assigned a higher anomaly score by the Isolation Forest algorithm, as compared to anumbers that demonstrate “normal” behavior. It will be appreciated that in order to evaluate the detection performance of the Isolation Forest model, one or more sources of verified anomalous phone numbers may be used. For example, the sources used may be Yellow Pages and/or Nomorobo or other similar sources, which are relatively less biased sources of information due to their crowd-sourced nature.
During performance tuning using Yellow Pages sourced data, it was found that the naïve Isolation Forest model did not result in acceptable accuracy when evaluated by the Yellow Pages reported rate (as defined below). By performing experiments, however, it was found that the addition of a filtering step that eliminated all anumbers with outgoing calls less than a threshold improved the accuracy by approximately 30% when compared to the baseline. Note that the filtering step was not used when evaluating the model using Nomorobo data but still achieved acceptable accuracy.
As a result of the performance tuning experiments, it will be appreciated that the general anomaly detection functionality 138 may, in a non-limiting example, include two Isolation Forest models: (1) the naïve Isolation Forest model that detects anomalies that are likely to be also reported by Nomorobo, and (2) the filter controlled Isolation Forest model that detects anomalies that are likely to be also reported by Yellow Pages.
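As a rough sketch of the two-model arrangement described above, using scikit-learn's IsolationForest: the three feature columns, the synthetic data, the contamination value, and the minimum outgoing call threshold are all hypothetical and chosen only so that the example runs; they are not the features or values used in the experiments.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Hypothetical per-anumber features:
# [outgoing calls, mean call duration (s), distinct callees]
normal = np.column_stack([rng.poisson(5, 500),
                          rng.normal(120, 30, 500),
                          rng.poisson(4, 500)])
heavy = np.column_stack([rng.poisson(400, 10),
                         rng.normal(8, 2, 10),
                         rng.poisson(380, 10)])
features = np.vstack([normal, heavy]).astype(float)


def fit_and_flag(X, contamination=0.02, seed=0):
    """Fit an Isolation Forest and return (anomaly flags, anomaly scores)."""
    model = IsolationForest(n_estimators=100, contamination=contamination,
                            random_state=seed)
    labels = model.fit_predict(X)        # -1 marks an anomaly, 1 a normal point
    scores = -model.score_samples(X)     # negated so higher means more anomalous
    return labels == -1, scores


# (1) Naive model: every anumber is scored.
naive_flags, naive_scores = fit_and_flag(features)

# (2) Filter controlled model: anumbers with too few outgoing calls are
#     removed before fitting. The threshold value is an assumption.
MIN_OUTGOING_CALLS = 3
keep = features[:, 0] >= MIN_OUTGOING_CALLS
filtered_flags, filtered_scores = fit_and_flag(features[keep])
```

The only difference between the two models in this sketch is the filtering step applied before fitting, which mirrors the distinction drawn above.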
Varied measures may be used to evaluate the performance of the two Isolation Forest models. To evaluate the filter controlled model using Yellow Pages sourced data, one measure that may be used is the Yellow Pages reported rate Y, which is given by:
Y = [Σ YellowPagesreported(a)] / N
where the sum is taken over the anumbers a flagged by the model, YellowPagesreported(a) is 1 when a is also reported on Yellow Pages and 0 otherwise, and N is the total number of anomalies detected by the model.
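To evaluate the naïve model using Nomorobo sourced data, one measure that may be used is the Nomorobo reported rate Φ, which is given by:
Φ = [Σ Nomoroboreported(a)] / N
where the sum is taken over the anumbers a flagged by the model, Nomoroboreported(a) is 1 when a is reported on Nomorobo and 0 otherwise, and N is, as above, the total number of anomalies detected by the model.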
The Yellow Pages reported rate indicates how many anumbers out of the detected anomalies are reported as scammers or debt collectors on Yellow Pages, while the Nomorobo reported rate indicates how many anumbers out of the detected anomalies are reported as robocallers in Nomorobo.
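For illustration, either reported rate can be computed as the fraction of flagged anumbers that also appear in the corresponding source. The function name and the sample numbers below are hypothetical, and N is taken here to be the count of distinct flagged anumbers.

```python
def reported_rate(flagged_anumbers, reported_anumbers):
    """Fraction of flagged anumbers that also appear in a reporting source."""
    flagged = set(flagged_anumbers)
    if not flagged:
        return 0.0   # no anomalies detected, so the rate is defined as zero here
    return len(flagged & set(reported_anumbers)) / len(flagged)


flagged = ["111", "222", "333", "444"]          # anomalies from a model
nomorobo_reports = {"111", "222", "333", "999"}  # numbers listed by a source
rate = reported_rate(flagged, nomorobo_reports)
```

With three of the four flagged numbers present in the source, the rate in this example is 0.75.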
The execution time for performing a grid search in order to tune the parameters of the Isolation Forest models was found to be prohibitive. Therefore, random search was performed instead, using the popular Python library scikit-learn. For the model tuned using Nomorobo sourced data, with 3-fold cross-validation, the resulting accuracy in terms of Nomorobo reported rate was 48.4%.
For the filter controlled Isolation Forest model, an iterative grid search was performed. With 3-fold cross-validation, the resulting accuracy in terms of Yellow Pages reported rate was 60%.
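One possible way to organise a random search with 3-fold cross-validation, scored by a reported rate, is sketched below. The disclosure does not specify how cross-validation was applied to the unsupervised model, so the fold handling, the parameter ranges, the helper names, and the synthetic data are all assumptions of this sketch.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import KFold, ParameterSampler


def reported_rate(flags, reported):
    """Share of flagged samples that are also externally reported."""
    n = flags.sum()
    return float((flags & reported).sum() / n) if n else 0.0


def random_search(X, reported, param_distributions, n_iter=5, seed=0):
    """Sample parameter settings at random and keep the best mean fold score."""
    best_params, best_score = None, -1.0
    folds = KFold(n_splits=3, shuffle=True, random_state=seed)
    sampler = ParameterSampler(param_distributions, n_iter=n_iter,
                               random_state=seed)
    for params in sampler:
        fold_scores = []
        for train_idx, test_idx in folds.split(X):
            model = IsolationForest(random_state=seed, **params)
            model.fit(X[train_idx])
            flags = model.predict(X[test_idx]) == -1   # -1 marks an anomaly
            fold_scores.append(reported_rate(flags, reported[test_idx]))
        score = float(np.mean(fold_scores))
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


rng = np.random.default_rng(0)
# Hypothetical features: [outgoing calls, mean call duration (s)]
normal = np.column_stack([rng.poisson(5, 300), rng.normal(120, 30, 300)])
heavy = np.column_stack([rng.poisson(400, 15), rng.normal(8, 2, 15)])
X = np.vstack([normal, heavy]).astype(float)
reported = np.array([False] * 300 + [True] * 15)   # stand-in for source reports

grid = {"n_estimators": [50, 100, 200],
        "max_samples": [32, 64, 128],
        "contamination": [0.01, 0.02, 0.05]}
best_params, best_score = random_search(X, reported, grid)
```

Only five of the twenty-seven possible settings are evaluated here, which is the trade-off that makes random search cheaper than an exhaustive grid search.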
The Precision score was evaluated in a real run, and was calculated by dividing the total number of distinct anumbers that were reported in either Nomorobo or in Yellow Pages by the total number of all anumbers detected as anomalies. The best Precision score observed was 73.87%, on Jan. 3, 2019. During business days, the Precision score is usually observed to be around 60%, while on Sundays, it is usually observed to be less than 40%.
The above has described the anomaly detection as attempting to detect robocalls and/or debt collector/telemarketer calls. It will be appreciated that other anomalous behaviours may also be detected. For example, profile based, or caller behavior based, anomaly detection is possible. In profile based anomaly detection, for example, one might first establish a profile for each caller in the data. The profile may be established by looking at all available call history for each caller, or a subset of it. By analyzing these profiles, it is possible to find unusually deviant behavior, such as sudden spikes/drops in number of calls, a sudden increase in calls to a specific destination number, etc. This may help, for example, in detecting spoofed numbers. To build up each caller's profile, time series analysis can be used; more specifically, a moving average of each attribute may be computed, and matrix profiling may be used to discover motif patterns of spammers and hence detect abnormalities.
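A minimal sketch of the moving-average idea, applied to one caller's daily call counts, is given below. Only spikes are detected (drops could be handled symmetrically), and the window length, the multiplication factor, and the sample counts are hypothetical.

```python
def moving_average_spikes(daily_counts, window=7, factor=3.0):
    """Return indices of days whose call count far exceeds the trailing average."""
    spikes = []
    for i in range(window, len(daily_counts)):
        baseline = sum(daily_counts[i - window:i]) / window
        # Floor the baseline at 1 so very quiet callers are not flagged
        # for going from zero calls to a handful.
        if daily_counts[i] > factor * max(baseline, 1.0):
            spikes.append(i)
    return spikes


# One caller's daily outgoing call counts; day 8 shows a sudden burst.
counts = [4, 5, 3, 6, 4, 5, 4, 5, 60, 4]
spike_days = moving_average_spikes(counts)
```

In this example only day 8 is flagged, since its count of 60 is well above three times the trailing seven-day average.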
Returning to the Wangiri detection functionality 142 depicted in
A machine learning approach may be used to automatically “learn” the characteristics of a particular scam by using labelled examples of the scam. Such an approach can semi-automatically tune itself over time to account for changes in input data, representing, in this case, scammer behavior.
In a non-limiting example, in order to mathematically model the behaviour of Wangiri scammers, the following features may be used, which can be prepared or derived from the call logs.
Metrics #4 to #12 above are the predictors (aka features) in the Wangiri model, while the response is a class label that can take on one of two values—“Wangiri” or “Not Wangiri”. It will be appreciated that this is an example of a binary classification problem.
The approach used to solve this problem is to estimate one or more mathematical functions that describe the relationship(s) between the predictors and the response. The function(s) may typically be estimated from a set of manually labelled data that provides examples of each class. These functions, which constitute a model, may then be used to predict the class label (“Wangiri” or “Not Wangiri”) of future data. Labelled training data for the Wangiri class may be obtained using the investigator interface, which may initially present investigators with anomalous phone numbers to be investigated. The calls that the investigator considers to be Wangiri can be labelled and used for the training data. The non-Wangiri class training data may be obtained from random sampling of the call data, since the vast majority of call data passing over a telephone network will not be Wangiri calls.
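The assembly of the labelled training data described above might look like the following sketch. The function name, the use of call identifiers, and the sample sizes are hypothetical; excluding investigator-labelled calls from the negative sample is an added assumption.

```python
import random


def build_training_set(wangiri_call_ids, all_call_ids, n_negative, seed=0):
    """Combine investigator-labelled positives with randomly sampled negatives."""
    positives = set(wangiri_call_ids)
    # Assumption: calls already labelled as Wangiri are excluded before sampling.
    candidates = sorted(set(all_call_ids) - positives)
    rng = random.Random(seed)
    negatives = rng.sample(candidates, min(n_negative, len(candidates)))
    return ([(call_id, "Wangiri") for call_id in sorted(positives)]
            + [(call_id, "Not Wangiri") for call_id in negatives])


all_calls = [f"call-{i}" for i in range(100)]
labelled_wangiri = ["call-3", "call-42"]
training = build_training_set(labelled_wangiri, all_calls, n_negative=10)
```

The resulting list pairs each call identifier with its class label and can then be joined with the derived features before training.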
The Wangiri detection model may use a Random Forest classifier. This particular classifier was determined to be preferable after comparing the performance of several different classifiers on the labelled data. Model hyperparameters (number of estimators and maximum number of features) were chosen using a Grid Search using 10-fold cross-validation, with the objective of choosing the parameter combination that maximized the F1-Score. The rationale for choosing to optimize the F1-Score, rather than the Precision or Recall, is to provide a balance between false positives and false negatives for the initial model. Originally, the selection criteria solely consisted of maximizing the Precision, but a quick ad-hoc analysis showed that some models with slightly lower precisions (−2%) had significantly higher recalls (+20%). The slightly lower precision, which can result in legitimate numbers being incorrectly identified as Wangiri numbers, can be addressed by developing additional rules or filters to filter out the legitimate numbers from the Wangiri numbers. When a business logic layer is used to protect legitimate customers from accidentally being blocked, optimizing on the F1-Score provides significant recall while mitigating the consequences of a slightly lower precision.
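The hyperparameter selection described above can be sketched with scikit-learn's GridSearchCV. The synthetic data, the three feature columns, and the candidate parameter values are hypothetical and were chosen only to keep the example small; they are not the features or grid used for the actual model.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
n_neg, n_pos = 400, 40
# Hypothetical features:
# [mean ringing time (s), mean conversation time (s), calls per hour]
X_neg = np.column_stack([rng.normal(15, 4, n_neg),
                         rng.normal(90, 30, n_neg),
                         rng.poisson(2, n_neg)])
X_pos = np.column_stack([rng.normal(3, 1, n_pos),
                         rng.normal(1, 0.5, n_pos),
                         rng.poisson(60, n_pos)])
X = np.vstack([X_neg, X_pos]).astype(float)
y = np.array([0] * n_neg + [1] * n_pos)   # 1 = "Wangiri", 0 = "Not Wangiri"

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [10, 30],   # number of estimators
                "max_features": [1, 2]},    # maximum number of features
    scoring="f1",                           # select the combination maximizing F1
    cv=10,                                  # 10-fold cross-validation
)
search.fit(X, y)
best_params = search.best_params_
best_f1 = search.best_score_
```

Switching `scoring` to `"precision"` or `"recall"` would reproduce the alternative selection criteria discussed above.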
The best estimator chosen from the Grid Search has the following scores (over 10 folds):
The Precision-Recall curve is shown in
The labelled dataset used in this modelling process is fairly large and imbalanced (232,477 examples in total; the positive class makes up 1.73% of total). Due to this, training certain ML algorithms turned out to be infeasible due to very large runtimes. In particular, finding the best estimator using a grid search (or even a random search) for the Support Vector Machine (SVM) with a non-linear kernel and >5 fold cross-validation took unreasonably long. The Random Forest (RF) classifier was chosen mainly for its computational advantages (as well as good classification performance in general), such as the fact that it is inherently parallelizable. Further, RF is relatively less sensitive to the choice of initial values of hyperparameters.
Those skilled in the art will appreciate that it is particularly desirable to have an end-to-end automated system in place that detects and blocks Wangiri scammers, as well as other scams, with minimal human intervention. This blocking may be done automatically; however depending on the level of false positives that are acceptable to be blocked in error, additional logic may be used to further filter out possible legitimate phone numbers that were incorrectly identified as Wangiri numbers. As an example, this logic may, for each suspected Wangiri number, verify that:
It will be appreciated that the above logic may be weighted so that the importance of one test compared to another may be varied as desired. Further, additional or alternative logic may be used to ensure any incorrectly identified Wangiri numbers are not blocked.
A semi-automated approach may be used to block Wangiri phone numbers, or other scam numbers. In a non-limiting example, the semi-automated approach may automatically block verified Wangiri numbers; however, a human investigator may be used to verify that Wangiri numbers predicted by the detection model are in fact Wangiri numbers. For example, the predicted Wangiri numbers may be presented to an investigator, possibly along with additional useful information for verifying that the call is a Wangiri call, and the investigator may then either verify or refute the prediction. In addition, the verified/refuted predictions may also be used as training data to further train the prediction models.
The Wangiri detection model may divide model predictions for predicted Wangiri calls into two buckets—“Wangiri” and “Manual Review”—for display to human analysts. This division is based on a general rule that applies to most Wangiri scam calls, namely that the originating number typically originates from overseas. To quantify this, it is possible to compute:
The division into the two buckets is then performed by applying thresholds on the value of I. The “Wangiri” bucket includes numbers that are, with high confidence, Wangiri scammers. The “Manual Review” bucket includes numbers that, although identified by the ML model as Wangiri, are less certain, taking into account the value of I.
The items tagged for manual review are intended to be manually investigated and labelled by human analysts. With this process in place, it is possible to create an automatic feedback loop where:
Alternatively, it is possible to make I a feature for the model itself, rather than post-processing and using thresholds on it.
To avoid overfitting during the automatic training, it is possible to use normal business users' numbers and common users' numbers that have never been flagged (or numbers that are known to be good).
As described above, call detail records (CDRs) (i.e. raw call log data records described above) are pre-processed so that features can be derived from the processed CDRs (i.e. the processed call records described above) and input into a trained model to identify anomalous call behavior and/or specific types of undesirable call behavior. The CDRs are generated by network elements such as ISUP SS7 network switches as described above. However, these CDRs generally require pre-processing to be input into the models and for performing subsequent analysis. For example, as described with reference to
Specifically, for multi-hop calls, which are calls that involve multiple switches during connection, one CDR is generated per switch. Therefore, the multi-hop calls result in multiple CDRs that are associated with a single call event. Table I shows an example of a single multi-hop call event for which multiple CDRs were generated. The fields in these CDRs are described in Table II, which are defined by SS7 protocol/architecture.
As seen in Table I, multiple CDRs may be generated from a single multi-hop call event (one per switch). The multiple records generated for multi-hop call events give rise to the following issues:
To detect anomalous call behavior and to identify specific types of undesirable calls, it is important to identify and label all records associated with the same call event so that call events are analyzed and not just individual call records. For example, as described above, the general anomaly detection model may consider features of CDRs including the number of unique outgoing calls, the number of unique callees, how long a conversation lasts for a given call record, etc. Likewise, for Wangiri detection functionality, features considered may include the number of outgoing calls, the number of unique destination numbers called, an average call duration, the standard deviation of call duration, etc. Further, since Wangiri calls are often “one ring” calls, ringing time may be considered as well. It will thus be appreciated that inputting features derived from respective CDRs to the trained model(s), without consideration of whether there are multiple CDRs corresponding to the same call event, will negatively affect the accuracy of the model predictions and classification of caller behavior. For example, the data may suggest that there are multiple calls made to the same callee number when in fact there was only a single call event. Additionally, where two or more CDRs for the same call event differ (e.g. in conversation time and/or ringing time), it is important that the correct value is input to the trained model(s).
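The effect of redundant records on the features named above can be illustrated with a minimal sketch. It assumes call events are held as dictionaries with hypothetical keys `caller`, `callee` and `conversation_time` (in seconds); these names, and the use of the population standard deviation, are illustrative assumptions and not part of the described system.

```python
from collections import defaultdict
from statistics import mean, pstdev


def caller_features(call_events):
    """Compute per-caller features from a list of call events.

    Each event is a dict with the hypothetical keys 'caller', 'callee'
    and 'conversation_time' (seconds).
    """
    by_caller = defaultdict(list)
    for event in call_events:
        by_caller[event["caller"]].append(event)

    features = {}
    for caller, events in by_caller.items():
        durations = [e["conversation_time"] for e in events]
        features[caller] = {
            "outgoing_calls": len(events),
            "unique_callees": len({e["callee"] for e in events}),
            "mean_duration": mean(durations),
            # Population standard deviation; an illustrative choice.
            "std_duration": pstdev(durations),
        }
    return features
```

If the same function is applied to raw CDRs in which a single multi-hop call appears three times, `outgoing_calls` for that caller is three rather than one, which is the kind of distortion the pre-processing is intended to remove.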
The method 700 comprises receiving a plurality of CDRs (702). Each of the CDRs comprises a calling party number, a callee number, a gap value, a gap type value, a start time, an end time, a ringing time, and a conversation time. The CDRs may be generated from a plurality of network switches, in particular ISUP SS7 network switches. The plurality of CDRs comprises at least some CDRs that are generated from a multi-hop call event.
A dialed callee number is determined for each of the CDRs (704). As described above, the callee number that appears in the CDR may not be the number that is actually dialed by the caller. Instead, the callee number appearing in the CDR may be a number of an intermediate component, a local routing number, etc. Accordingly, to identify CDRs associated with a same call event, the dialed callee number is determined for each of the CDRs. A method of determining a dialed callee number in the CDRs is described in more detail with reference to
At least two CDRs associated with a call event are identified based on one or more similarity thresholds being met between the at least two CDRs (706). Having identified the dialed callee number, two or more CDRs generated for calls from the caller number to the dialed callee number can be identified as being associated with the same call event when one or more similarity thresholds are met between the two or more CDRs. Accordingly, all CDRs associated with a same call event are identified, and may be labelled or otherwise associated together. A method of identifying CDRs associated with a same call event is described in more detail with reference to
A maximum conversation time is determined among all CDRs associated with the call event (708). As described above, some values in the CDRs may not be the same across all CDRs for the same call event. To resolve ambiguities in conversation time values, the correct value of the conversation time is considered to be the maximum conversation time among all CDRs associated with the call event. Namely, for a call event with ID i and n redundant CDRs, the conversation duration D_i is considered as D_i = max(D_i1, D_i2, ..., D_in). It will also be appreciated that depending on the parameter in the CDR, different statistical methods (maximum, average, etc.) can be considered for providing the most appropriate value.
A processed CDR for the call event is generated comprising at least the calling party number, the dialed callee number, and the maximum conversation time (710). In some embodiments, a maximum ringing time among all CDRs associated with the call event may be determined, and the processed CDR may also comprise the maximum ringing time. Accordingly, features from the processed CDR, together with other processed CDRs, can be input to one or more trained models to identify anomalous call behavior and/or specific types of undesirable call behavior. Moreover, features from the processed CDRs can be analyzed for other purposes as well, such as customer churn prediction, call volume reporting, etc.
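Steps 708 and 710 can be sketched as follows. This is a minimal illustration only, assuming each CDR is a dictionary that already carries its `dialed_bnumber`; the keys `conversation_time` and `ringing_time` and the function name are hypothetical.

```python
def build_processed_cdr(cdrs):
    """Collapse CDRs already identified as one call event into one record.

    The maximum conversation time (and, optionally, ringing time) across
    the redundant CDRs is taken as the value for the call event.
    """
    if not cdrs:
        raise ValueError("at least one CDR is required")
    first = cdrs[0]
    return {
        "anumber": first["anumber"],
        "dialed_bnumber": first["dialed_bnumber"],
        "conversation_time": max(c["conversation_time"] for c in cdrs),
        "ringing_time": max(c["ringing_time"] for c in cdrs),
    }
```

A different statistic (for example an average) could be substituted for `max` on a per-field basis, as the description above notes.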
Raw CDRs are received (802). A determination is made as to whether the additional number field (i.e. the cnumber field) is not blank (i.e. has a value) and whether that value differs from the anumber value (i.e. the calling party number) (804). If there is a value in the additional number field and that value is not the same as the calling party number (YES at 804), the dialed callee number (i.e. dialed_bnumber) is set as the value in the additional number field (806).
If the additional number field (i.e. the cnumber field) is blank or null, or if the additional number field has a value corresponding to the calling party number (NO at 804), a determination is made as to whether a gaptype value is one of {1, 192, 253} (808). In the ISUP SS7 protocol, a gaptype value of 1 denotes destination number, a gaptype value of 192 denotes ported number, and a gaptype value of 253 denotes transfer number. Accordingly, the determination at 808 determines whether the gaptype value denotes one of a destination number, a ported number, or a transfer number.
When it is determined that the gaptype value denotes one of a destination number, a ported number, or a transfer number (YES at 808), the dialed callee number (i.e. dialed_bnumber) is set as the gap value (810).
If the gaptype value does not denote one of a destination number, a ported number, or a transfer number (NO at 808), the dialed callee number (i.e. dialed_bnumber) is set as the callee number (i.e. the bnumber) (812).
Note that while the method 800 is shown as evaluating the additional number field first at 804 and then the gaptype value at 808, it is also possible that these determinations may be performed in a different order.
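The decision sequence of the method 800 can be sketched as below. The field names `anumber`, `bnumber`, `cnumber` and `gaptype` follow the description; the key `gap` for the gap value, the dictionary representation, and the function name are assumptions made for illustration.

```python
# Gap type values that denote a destination, ported, or transfer number.
DIALED_GAPTYPES = {1, 192, 253}


def determine_dialed_bnumber(cdr):
    """Return the dialed callee number for a single raw CDR (a dict)."""
    cnumber = cdr.get("cnumber")
    # 804/806: a non-blank additional number that differs from the caller.
    if cnumber and cnumber != cdr["anumber"]:
        return cnumber
    # 808/810: the gap value holds the dialed number for these gap types.
    if cdr.get("gaptype") in DIALED_GAPTYPES:
        return cdr["gap"]
    # 812: otherwise fall back to the callee number in the CDR.
    return cdr["bnumber"]
```

For instance, a record whose `cnumber` is blank and whose `gaptype` is 192 yields its gap value as the dialed callee number.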
In the method 900, CDRs are received (902), which may correspond to the raw CDRs with the dialed callee number determined from method 800 added to or associated with each of the CDRs.
To identify CDRs as being associated with the same call event, it is possible that each CDR (with dialed callee number determined) can be evaluated against all other CDRs to identify CDRs with the same caller number, dialed callee number, and that match one or more similarity thresholds. To improve computational efficiency and reduce processing time, the method of identifying CDRs as being associated with the same call event may comprise identifying consecutive CDRs based on caller number, dialed callee number, and start time, and comparing those two consecutive CDRs against the one or more similarity thresholds. To still further improve computation efficiency, a sorted list of the CDRs may be generated for use in identifying the two consecutive CDRs.
Accordingly, the method 900 may comprise generating a sorted list of CDRs (904) by sorting the plurality of CDRs based on the calling party number, the dialed callee number, and the start time. The method 900 may comprise evaluating every two consecutive CDRs (906).
Two CDRs are evaluated against one or more similarity thresholds. A determination is made as to whether the one or more similarity thresholds are met (908). The similarity thresholds may comprise one or more of: a difference of start time is less than w seconds; a difference of end time is less than x seconds; a difference of ringing time is less than y seconds; and a difference of conversation time is less than z seconds, wherein each of w, x, y, and z is a predetermined threshold value. As an example, the threshold value w may be set as 5 seconds; the threshold value x may be set as 2 seconds; the threshold value y may be set as 1 second; and the threshold value z may be set as less than 1 second. It will be appreciated that different threshold values may be set without departing from the scope of this disclosure. In some embodiments, all of the similarity thresholds may be required to be met. In other embodiments, only some of the similarity thresholds may be evaluated or need to be met.
When two records are determined to satisfy one or more of the similarity thresholds (YES at 908), the records may be labelled or otherwise associated with one another to indicate that they belong to the same call event (910). The method 900 proceeds to evaluate the next two consecutive CDRs (912). If the one or more similarity thresholds are not met (NO at 908), the method likewise proceeds to evaluate the next two consecutive CDRs (912).
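The sorted, pairwise evaluation of the method 900 can be sketched as follows. This is one possible reading, assuming all four thresholds must be met, that times are expressed as numbers of seconds, and that z is 1 second; the dictionary keys other than `anumber` and `dialed_bnumber`, and both function names, are hypothetical.

```python
def same_call_event(a, b, w=5.0, x=2.0, y=1.0, z=1.0):
    """Return True if two CDRs meet all four similarity thresholds."""
    return (
        a["anumber"] == b["anumber"]
        and a["dialed_bnumber"] == b["dialed_bnumber"]
        and abs(a["start_time"] - b["start_time"]) < w
        and abs(a["end_time"] - b["end_time"]) < x
        and abs(a["ringing_time"] - b["ringing_time"]) < y
        and abs(a["conversation_time"] - b["conversation_time"]) < z
    )


def label_call_events(cdrs, **thresholds):
    """Sort CDRs (904) and label consecutive matches with one event id."""
    ordered = sorted(
        cdrs,
        key=lambda c: (c["anumber"], c["dialed_bnumber"], c["start_time"]),
    )
    labelled = []
    event_id = -1
    previous = None
    for cdr in ordered:
        # 906/908: compare each CDR only with its predecessor in the list.
        if previous is None or not same_call_event(previous, cdr, **thresholds):
            event_id += 1  # no match: this CDR starts a new call event
        labelled.append((event_id, cdr))  # 910: associate CDR with event
        previous = cdr
    return labelled
```

Because only neighbouring records in the sorted list are compared, the pass is linear after sorting, rather than comparing every CDR against every other CDR.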
Although certain components and steps have been described above, it is contemplated that individually described components, as well as steps, may be combined together into fewer components or steps or the steps may be performed sequentially, non-sequentially or concurrently. Further, although described above as occurring in a particular order, one of ordinary skill in the art having regard to the current teachings will appreciate that the particular order of certain steps relative to other steps may be changed. Similarly, individual components or steps may be provided by a plurality of components or steps. One of ordinary skill in the art having regard to the current teachings will appreciate that the components and processes described herein may be provided by various combinations of software, firmware and/or hardware, other than the specific implementations described herein as illustrative examples.
The techniques of various embodiments may be implemented using software, hardware and/or a combination of software and hardware. Various embodiments are directed to apparatus, e.g. a node which may be used in a communications system or data storage system. Various embodiments are also directed to non-transitory machine, e.g., computer, readable medium, e.g., ROM, RAM, CDs, hard discs, etc., which include machine readable instructions for controlling a machine, e.g., processor to implement one, more or all of the steps of the described method or methods.
Some embodiments are directed to a computer program product comprising a computer-readable medium comprising code for causing a computer, or multiple computers, to implement various functions, steps, acts and/or operations, e.g. one or more or all of the steps described above. Depending on the embodiment, the computer program product can, and sometimes does, include different code for each step to be performed. Thus, the computer program product may, and sometimes does, include code for each individual step of a method, e.g., a method of operating a communications device, e.g., a wireless terminal or node. The code may be in the form of machine, e.g., computer, executable instructions stored on a computer-readable medium such as a RAM (Random Access Memory), ROM (Read Only Memory) or other type of storage device. In addition to being directed to a computer program product, some embodiments are directed to a processor configured to implement one or more of the various functions, steps, acts and/or operations of one or more methods described above. Accordingly, some embodiments are directed to a processor, e.g., CPU, configured to implement some or all of the steps of the method(s) described herein. The processor may be for use in, e.g., a communications device or other device described in the present application.
Numerous additional variations on the methods and apparatus of the various embodiments described above will be apparent to those skilled in the art in view of the above description. Such variations are to be considered within the scope.
The present application is a continuation-in-part of U.S. patent application Ser. No. 17/560,555, filed on Dec. 23, 2021, which claims priority to U.S. Provisional Patent Application No. 63/132,605, filed on Dec. 31, 2020, the entire contents of each of which is incorporated herein by reference for all purposes.
Number | Date | Country
---|---|---
63132605 | Dec 2020 | US

Relation | Number | Date | Country
---|---|---|---
Parent | 17560555 | Dec 2021 | US
Child | 18216044 | | US