METHOD AND APPARATUS FOR DETECTING ANOMALIES IN COMMUNICATION DATA

Information

  • Patent Application
  • 20230409606
  • Publication Number
    20230409606
  • Date Filed
    June 17, 2022
  • Date Published
    December 21, 2023
  • Inventors
    • Neves; Pedro Filipe Caldeira
    • Cardoso; Nuno André de Matos Lopes
  • Original Assignees
  • CPC
    • G06F16/285
    • G06F16/2282
  • International Classifications
    • G06F16/28
    • G06F16/22
Abstract
A method and system for determining anomalies in call center communications. Data relating to communications is streamed and processed to obtain baseline probability distributions over various domains of communications. Streams related to subsequent calls are compared to the baselines to determine anomalies.
Description
BACKGROUND

Contact centers, also referred to as “call centers”, in which agents handle communications with customers based on agent skills and customer requirements, are well known. The term “customer”, as used herein, can be any entity or individual contacting the contact center for information. FIG. 1 is an example system architecture of a cloud-based contact center system 100. Customers 110 interact with a contact center 150 using, for example, voice, email, text, and web interfaces to communicate with the agents 120 through a network 130 and one or more of text, voice, or multimedia channels.


The agents 120 may be remote from the contact center 150 and handle communications (also referred to as “interactions” or “calls” herein) with customers 110 on behalf of an enterprise. The agents 120 may utilize devices, such as but not limited to, workstations, desktop computers, laptops, telephones, a mobile smartphone and/or a tablet. Similarly, customers 110 may communicate using a plurality of devices, including but not limited to, a telephone, a mobile smartphone, a tablet, a laptop, a desktop computer, or another device. For example, telephone communication may traverse networks such as a public switched telephone network (PSTN), Voice over Internet Protocol (VoIP) telephony (via the Internet), a Wide Area Network (WAN) or a Local Area Network (LAN). The network types are provided by way of example and are not intended to limit the types of networks used for communications.


The agents 120 may be assigned to one or more queues representing call categories and/or agent skill levels. The agents 120 assigned to a queue may handle communications that are placed in the queue by the contact routing system 153. For example, there may be queues associated with a language (e.g., English or Chinese), topic (e.g., technical support or billing), or a particular country of origin. When a communication is received, the communication may be placed in a relevant queue, and eventually routed to one of the agents 120 associated with the relevant queue to handle the communication.


The contact center industry has been dealing with ever more customer data on a daily basis, and what was once a blind interaction with the customer at the other end of the line is now a data-enriched experience that is very valuable to the call center and users of the call center (i.e., entities for which communications from customers are received). Moreover, within the past few years, there has been a trend to eliminate dedicated physical call centers in favor of virtual platforms in which call center services are provided to users in the form of Software as a Service (SaaS). In such platforms, agents can be employees or contractors and can be located centrally or in a distributed manner. For example, agents can work from their homes on flexible schedules. Such platforms reduce overhead for the user and provide scalable and convenient service.


Although this disaggregation has advantages, as it allows user companies to grow their business without the need to manage their call center or provide space for call center agents, it also poses some issues and challenges. Security issues are a primary concern. The distributed nature of the systems provides multiple attack points for hackers. Also, it is more difficult to ensure that agents adhere to proper security protocols. Service level is also a concern, as it is more difficult to train and supervise agents. It is known to increase service and security by detecting specific occurrences in call center communications. Known techniques apply filters that look for specific terms to trigger an action. For example, if a customer communication includes words such as “angry” or “dissatisfied”, or phrases such as “cancel order”, the communication can be escalated to a manager or other agent better equipped to deal with unhappy customers. While sometimes referred to as “anomaly detection”, such techniques detect undesired activity but not necessarily anomalies. True anomaly detection of interactions in a call center requires a determination in substantially real time in view of a myriad of variables such as the subject matter of the call, times of day and year, the agent(s), call center user characteristics and domains, and the like. Current call center detection techniques do not provide the required speed and flexibility.


SUMMARY OF THE INVENTION

The disclosed implementations analyze agents' normal behavior and verify whether there is any major change over time. This is often called anomaly detection and is very closely related to fraud detection. A first aspect of the invention is a method for creating a baseline database to be used to increase security in a call center implemented over a computing network by detecting anomalies in communication activities between call center agents and call center users, the method comprising: monitoring at least one stream of communication activity data indicating parameters of communication activities between call center agents and call center users; storing the communication activity data in a collected data database; aggregating the communication activity data into aggregated data; and creating, based on the aggregated data, at least one distribution of communication metrics over a period of time.


A second aspect of the invention is a method for increasing security in a call center implemented over a computing network by detecting anomalies in communication activities between call center agents and call center users, the method comprising: monitoring event parameters of communication activities between call center agents and users; querying a baseline distribution database to determine that an event parameter represents a communication anomaly when the event parameter indicates an event that corresponds to a probability that is lower than a predetermined threshold probability and a calculated confidence of the event is higher than a predetermined confidence threshold, wherein the baseline distribution database is created by monitoring at least one stream of communication activity data indicating parameters of communication activities between call center agents and call center users, storing the communication activity data in a collected data database and aggregating the communication activity data to create at least one distribution of communication metrics over a period of time; and storing a record of the communication anomaly in an anomaly database.


A third aspect of the invention is a system for creating a baseline database to be used to increase security in a call center implemented over a computing network by detecting anomalies in communication activities between call center agents and call center users, the system comprising: at least one memory storing computer executable instructions; and at least one processor which, when executing the instructions, accomplishes the method of: monitoring at least one stream of communication activity data indicating parameters of communication activities between call center agents and call center users; storing the communication activity data in a collected data database; aggregating the communication activity data into aggregated data; and creating, based on the aggregated data, at least one distribution of communication metrics over a period of time.


A fourth aspect of the invention is a system for increasing security in a call center implemented over a computing network by detecting anomalies in communication activities between call center agents and call center users, the system comprising: at least one memory storing computer executable instructions; and at least one processor which, when executing the instructions, accomplishes the method of: monitoring event parameters of communication activities between call center agents and users; querying a baseline distribution database to determine that an event parameter represents a communication anomaly when the event parameter indicates an event that corresponds to a probability that is lower than a predetermined threshold probability and a calculated confidence of the event is higher than a predetermined confidence threshold, wherein the baseline distribution database is created by monitoring at least one stream of communication activity data indicating parameters of communication activities between call center agents and call center users, storing the communication activity data in a collected data database and aggregating the communication activity data to create at least one distribution of communication metrics over a period of time; and storing a record of the communication anomaly in an anomaly database.





BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing summary, as well as the following detailed description of the invention, will be better understood when read in conjunction with the appended drawings. For the purpose of illustrating the invention, there are shown in the appended drawings various illustrative embodiments. It should be understood, however, that the invention is not limited to the precise arrangements and instrumentalities shown. In the drawings:



FIG. 1 is an architectural diagram of a conventional cloud-based contact center computing environment.



FIG. 2 is a block diagram of an architecture and data flow of a system for anomaly detection in a call center in accordance with disclosed implementations.



FIG. 3 illustrates a data model that can be used as the basis for anomaly detection in accordance with disclosed implementations.



FIG. 4 illustrates an example of a data aggregation ontology in accordance with disclosed implementations.



FIG. 5 illustrates an example of a data workflow in accordance with disclosed implementations.



FIG. 6 is a flow chart of a process for creating distributions for use in anomaly detection in accordance with disclosed implementations.



FIG. 7 is a flow chart of a process for creating distributions for use in anomaly detection in accordance with disclosed implementations.





DETAILED DESCRIPTION


FIG. 2 illustrates an architecture of call center anomaly detection system 200 in accordance with disclosed implementations. System 200 is a hybrid model that consumes data as streams from database 202, a Kafka database in this example, and processes that data in both streaming and batch modes. Apache Kafka™ is an event streaming database platform capable of handling a high volume of events. In this example, Databricks was used to provide flexibility in programming languages and connections to other systems. Databricks™ is a cloud-based data environment that is capable of processing and transforming large quantities of data through, for example, machine learning models.


Communications, between agents and customers for example, are monitored and data streams representing the communications are stored in database 202. This data is then processed by data module 204. Data module 204 can aggregate and segregate the data in various manners as described in more detail below. Baseline module 206 then applies distribution algorithms to produce one or more baseline probability distributions. A probability distribution is a known statistical function that describes all the possible values and likelihoods that a random variable can take within a given range. Plotting of a baseline value (or multiple values) on the probability distribution can be based on a number of factors. These factors include the distribution's mean, standard deviation, skewness, and kurtosis. Data module 204 can create various distributions as needed. For example, distributions can correspond to specific agents, call centers, types of communication, and the like, or any combination thereof.
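As an illustration of the four factors named above, the following minimal Python sketch computes the mean, standard deviation, skewness, and excess kurtosis of a sample. The function name `describe` and the sample values are hypothetical, and population (not sample) moments are an assumed choice; the source does not specify how these factors are calculated.

```python
import math

def describe(samples):
    """Return mean, standard deviation, skewness, and excess kurtosis.

    Uses population central moments; this is an assumed convention.
    """
    n = len(samples)
    mean = sum(samples) / n
    # Central moments of order 2, 3, and 4.
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m3 = sum((x - mean) ** 3 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    std = math.sqrt(m2)
    # Guard against a constant sample, where the ratios are undefined.
    skewness = m3 / std ** 3 if std > 0 else 0.0
    kurtosis = m4 / m2 ** 2 - 3.0 if m2 > 0 else 0.0
    return {"mean": mean, "std": std, "skewness": skewness, "kurtosis": kurtosis}

# Hypothetical daily outbound call counts for one agent.
daily_calls = [8, 10, 12, 10, 10]
stats = describe(daily_calls)
print(stats)
```

A symmetric sample such as the one above has zero skewness, so a markedly skewed baseline would suggest that a plain Gaussian approximation is a poor fit for that metric.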


The anomaly detection system of disclosed embodiments takes raw data relating to communications and provides valuable insights through fast and reliable anomaly detection. FIG. 3 illustrates data model 300 that can be used as the basis for anomaly detection. Within the Databricks environment, data can be consumed and stored into three different layers (referred to as “bronze”, “silver”, and “gold” layers herein). Bronze layer 302 holds the raw data, which may be used immediately or kept for later use in other use cases. Silver layer 304 holds processed and filtered data that enables the creation of the baseline distributions which will be used for anomaly detection. Gold layer 306 is focused on delivering valuable insights and consists of immutable or less mutable data while setting up support for advanced use cases related to anomaly detection. The data in each data layer is described in more detail below.


Data management layer 308 can include six modules that define a set of policies or a way to trace data back to its origin:

    • Data Catalogue Module—A catalog on what data is being processed from the database and the data being generated to be consumed by other modules;
    • Data Dictionary Module—A detailed dictionary on every table and field within silver and gold data layers;
    • Data Lineage Module—Lineage is used to trace back results to origin and know exactly which processes originated the results, where it has consumed data and what data;
    • Data Archiving Module—Archiving policies define when, where, what and why data should be archived;
    • Data Retention Module—Data retention policies define how much time data needs to be stored in each table and S3 bucket discussed below;
    • Data Deletion Module—Data deletion policies define when, what and why data needs to be deleted, and keep a record of what data was deleted, by whom and why.


The data management policies, catalog and dictionary can correspond to best practices and data engineering guidelines so that the data model can be scaled.


Any database management system can be used. However, in FIGS. 2 and 3, one Kafka database 202 is illustrated. Data can be streamed to database 202 from various data sources, including:

    • Audit logs—provides information relating to agents' interactions, such as login and logouts, password changes or resets, authentication methods updates, and contact reads;
    • Calls—gives insights into the type of calls agents are performing, like inbound and outbound calls, missed calls, call initiated or finished;
    • Agents—provides information on updates made to agents' accounts, such as when an agent account is created, deleted, activated or deactivated, as well as agent profile updates;
    • Accounts—shows when a client account was created, deleted, or updated;
    • Presence—indicates the agent's current status and when the status was altered (for example when the agent is online and becomes away, in a call, offline, or any of the other statuses that are available for the account the agent belongs to);
    • Call quality—gives an overview of the call quality of each agent;
    • Teams—incorporates updates, creation, and deletion of teams of agents;
    • Recordings—provides data on what call or screen recordings were accessed by whom;
    • Voice metrics—holds data concerning the quality of speech during calls, concerning the number of decibels and overall mood of the conversation.


The disclosed implementations for performing anomaly detection can be split into three main parts: (1) data ingestion into the data model; (2) creation of behavioral baselines; and (3) detecting anomalies on current data. Regardless of the original data source, data is ingested from database 202 (Kafka or another database) into bronze layer 302, which can be in the form of Amazon S3 buckets for example, for long-term storage. The data can be filtered and/or enriched for the events that need to be processed. This data can then be stored in delta tables within Databricks. All this can happen in streaming and data can be made available within the delta tables substantially in real-time, e.g., immediately after it is ingested by the Databricks processes.


Gold layer 306 is composed of processes that run in batches and fetch data from tables in silver layer 304 that pragmatically cannot be processed in streaming. One example of such data in silver layer 304 is data related to sessions, where the session start event is processed long before the session end event, and therefore the process cannot wait indefinitely. The baseline distributions can be created periodically (for example, once every day in batch at 00:05 UTC with 30 days of aggregated data from silver or gold tables or other existing baselines). Data is aggregated by both agent and account per peer per day, in a predefined time period (for example, starting 31 days before the current UTC time and finishing 1 day before the current UTC time).
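The example aggregation period described above can be sketched as follows. This is a minimal illustration only: the function name `baseline_window` and its parameters are hypothetical, and the run time shown simply mirrors the 00:05 UTC example.

```python
from datetime import datetime, timedelta, timezone

def baseline_window(now, lookback_days=31, lag_days=1):
    """Return (start, end) of the aggregation window relative to `now`.

    Defaults mirror the example in the text: the window starts 31 days
    before the current UTC time and finishes 1 day before it.
    """
    start = now - timedelta(days=lookback_days)
    end = now - timedelta(days=lag_days)
    return start, end

# Hypothetical batch run at 00:05 UTC.
now = datetime(2022, 6, 17, 0, 5, tzinfo=timezone.utc)
start, end = baseline_window(now)
print(start.isoformat(), end.isoformat(), (end - start).days)
```

With these defaults the window spans 30 days, matching the 30 days of aggregated data mentioned for the daily batch.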


The baseline distributions can be composed of four tables, for example:

    • Table 1—baseline per user per day;
    • Table 2—baseline per peers per day;
    • Table 3—baseline per details per user per day;
    • Table 4—baseline per details per peer per day.


The first table holds the aggregated metrics per agent, account, and day. The second stores the aggregated metrics per account and day. So, these two tables have the aggregated baseline metrics per day, at either the agent or account level. The third and fourth tables have detailed information for either the agent or the account for each use case. Both baselines provide a different level/aspect of understanding from the agent perspective and the account perspective that can be used for calculating anomalies.


The following is an example situation illustrating a possible anomaly. An account is based in the US, so the agents usually log in from the US. However, one agent (who may be a remote contractor) has moved to France and is now logging in from there. The baseline for the agent will be the number of different countries that the login was made from. In this case, the number of different countries is 2 (US and France) and will be stored in table 1 described above. In table 2, the baseline for the account, which is calculated considering all the agents in this account, will also store 2 as the number of countries from which the agents logged in (since all agents logged in from the US and there was 1 agent that logged in from the US and then from France). Table 3 will store 2 records: (1) a record reflecting that a particular agent logged in from the US; (2) a record reflecting that this same particular agent has logged in from France. In table 4, there will also be 2 records: (1) a record that reflects the number of agents logged in from the US, which will be the total number of agents for the account; and (2) a record reflecting the number of agents logged in from France (in this example, 1).
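The login-country example above can be worked through in a short Python sketch. The account name, agent names, event layout, and the function `build_baselines` are all hypothetical; the sketch only shows how the four tables relate to one another for a single day of login events, not how the disclosed system stores them.

```python
from collections import defaultdict

# Hypothetical login events for one day: (account, agent, country).
logins = [
    ("acme", "agent_1", "US"),
    ("acme", "agent_2", "US"),
    ("acme", "agent_3", "US"),
    ("acme", "agent_3", "France"),
]

def build_baselines(events):
    """Build simplified versions of tables 1 to 4 from login events."""
    countries_by_agent = defaultdict(set)
    countries_by_account = defaultdict(set)
    agents_by_country = defaultdict(set)
    for account, agent, country in events:
        countries_by_agent[(account, agent)].add(country)
        countries_by_account[account].add(country)
        agents_by_country[(account, country)].add(agent)

    # Table 1: number of distinct login countries per agent.
    table1 = {key: len(c) for key, c in countries_by_agent.items()}
    # Table 2: number of distinct login countries per account.
    table2 = {key: len(c) for key, c in countries_by_account.items()}
    # Table 3: one detail record per (account, agent, country).
    table3 = sorted(
        (account, agent, country)
        for (account, agent), countries in countries_by_agent.items()
        for country in countries
    )
    # Table 4: number of agents that logged in from each country, per account.
    table4 = {key: len(a) for key, a in agents_by_country.items()}
    return table1, table2, table3, table4

table1, table2, table3, table4 = build_baselines(logins)
print(table1[("acme", "agent_3")], table2["acme"], table4)
```

Here `agent_3` plays the role of the agent who moved to France: tables 1 and 2 both store 2 for that agent and for the account, table 3 holds two records for that agent, and table 4 holds one record per country.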


As another example, a statistical distribution of the number of calls received per hour from a set of regions around the world for a customer can be created and approximated as a Gaussian or quasi-Gaussian distribution, for instance. The probability of the number of calls in each hour for each region is then computed based on the corresponding distributions. A trigger can be actuated when the number of calls exceeds a threshold (computed as a function of the mean and first-order deviation from the distribution, for example). This technique can be used to identify unusual call volumes during hours where the expected numbers are within a range (as defined by the distribution).
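A minimal sketch of such a threshold, assuming a Gaussian approximation, is shown below. The multiplier `k=3.0`, the use of the population standard deviation as the deviation measure, the function names, and the call counts are all assumptions made for illustration.

```python
from statistics import NormalDist, mean, pstdev

def volume_threshold(history, k=3.0):
    """Threshold as mean plus k standard deviations (k is an assumed choice)."""
    return mean(history) + k * pstdev(history)

def tail_probability(count, history):
    """Probability of seeing at least `count` calls under a Gaussian fit."""
    dist = NormalDist(mean(history), pstdev(history))
    return 1.0 - dist.cdf(count)

def is_unusual(count, history, k=3.0):
    """Actuate the trigger when the count exceeds the threshold."""
    return count > volume_threshold(history, k)

# Hypothetical calls received in the same hour of the day from one region.
calls_per_hour = [40, 42, 38, 41, 39, 40]

print(volume_threshold(calls_per_hour))
print(is_unusual(60, calls_per_hour), is_unusual(43, calls_per_hour))
print(tail_probability(60, calls_per_hour))
```

In practice one such history would be kept per region and per hour, so that the same count can be ordinary for one region and unusual for another.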


Assume that the anomaly detection processes run every 10 minutes, aggregating data from that day and comparing it to the existing baselines for both users and peers. When a value is outside the norm or baseline distribution, an anomaly is detected and an anomaly message is triggered. For example, an anomaly message can be triggered specifying that the agent has made 20 outbound calls in a day when that agent usually makes only about 10 outbound calls in a day. However, the agent's peers normally make about 19 outbound calls a day, so a particular agent making 20 outbound calls will not be completely out of the norm. Therefore, although an anomaly is detected for that particular agent, it is not an anomaly with respect to agents overall since the peers usually make about 19 outbound calls. Rules can be applied to determine an anomaly message based on which type(s) of anomalies have been detected.
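The outbound-call example above can be expressed as a simple rule, sketched below. The relative tolerance of 50%, the function name `classify`, and the returned labels are hypothetical; the source states only that rules can be applied based on which types of anomalies are detected.

```python
def classify(value, user_baseline, peer_baseline, tolerance=0.5):
    """Compare a daily metric to the agent's own and the peers' baselines.

    `tolerance` is an assumed relative deviation beyond which a value is
    treated as out of the norm for that baseline.
    """
    def deviates(baseline):
        return abs(value - baseline) > tolerance * baseline

    user_anomaly = deviates(user_baseline)
    peer_anomaly = deviates(peer_baseline)
    if user_anomaly and peer_anomaly:
        return "anomalous for agent and peers"
    if user_anomaly:
        return "anomalous for agent only"
    if peer_anomaly:
        return "anomalous for peers only"
    return "normal"

# 20 outbound calls today; the agent usually makes about 10, peers about 19.
result = classify(20, user_baseline=10, peer_baseline=19)
print(result)
```

Keeping the two comparisons separate lets the message distinguish an agent who merely changed habits from one who also stands out from the rest of the account.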


Data in each layer can be processed and combined to create data streams for a subsequent layer in the workflow. The following table defines examples of the streams that can be used/generated in disclosed implementations:


First Level. Data comes from Kafka, is then passed and stored within bronze tables.

| ID | Origin data | Source system | Source layer | Destination data | Destination system | Destination layer |
|---|---|---|---|---|---|---|
| 1 | event-splitter.audit_logs | kafka | Data source | bronze_audit_logs | delta | bronze |
| 2 | event-splitter.calls | kafka | Data source | bronze_calls | delta | bronze |
| 3 | event-splitter.agents | kafka | Data source | bronze_agents | delta | bronze |
| 4 | event-splitter.teams | kafka | Data source | bronze_teams | delta | bronze |
| 5 | event-splitter.account | kafka | Data source | bronze_accounts | delta | bronze |
| 6 | [illegible]-calls | kafka | Data source | bronze_[illegible] | delta | bronze |
| 7 | broker.explore.agent-status-monthly-[illegible] | kafka | Data source | bronze_broker_agent_status | delta | bronze |

Second Level. Data is read from bronze tables (for example, bronze_audit_logs) and is filtered into silver delta tables.

| ID | Origin data | Source system | Source layer | Destination data | Destination system | Destination layer |
|---|---|---|---|---|---|---|
| 8 | bronze_audit_logs | delta | bronze | silver_audit_logs_contact_read | delta | silver |
| 9 | bronze_audit_logs | delta | bronze | silver_audit_logs_create_interaction_recording | delta | silver |
| 10 | bronze_audit_logs | delta | bronze | silver_audit_logs_delete_interaction_recording | delta | silver |
| 11 | bronze_audit_logs | delta | bronze | silver_audit_logs_list_interaction_recording | delta | silver |
| 12 | bronze_audit_logs | delta | bronze | silver_audit_logs_[illegible]recording_update_event | delta | silver |
| 13 | bronze_audit_logs | delta | bronze | silver_audit_logs_read_call_recordings | delta | silver |
| 14 | bronze_audit_logs | delta | bronze | silver_audit_logs_read_recording_media_file | delta | silver |
| 15 | bronze_audit_logs | delta | bronze | silver_audit_logs_update_interaction_recording | delta | silver |
| 16 | bronze_audit_logs | delta | bronze | silver_audit_logs_user_authentication_settings_updated | delta | silver |
| 17 | bronze_audit_logs | delta | bronze | silver_audit_logs_user_login_attempt | delta | silver |
| 18 | bronze_audit_logs | delta | bronze | silver_audit_logs_user_password_changed | delta | silver |
| 19 | bronze_audit_logs | delta | bronze | silver_audit_logs_user_password_reset | delta | silver |
| 20 | bronze_audit_logs | delta | bronze | silver_audit_logs_user_session_created | delta | silver |
| 21 | bronze_audit_logs | delta | bronze | silver_audit_logs_user_session_revoked | delta | silver |
| 22 | bronze_calls | delta | bronze | silver_calls_agent_call_answered[illegible] | delta | silver |
| 23 | bronze_calls | delta | bronze | silver_calls_agent_call_cancelled | delta | silver |
| 24 | bronze_calls | delta | bronze | silver_calls_agent_call_finished | delta | silver |
| 25 | bronze_calls | delta | bronze | silver_calls_agent_call_initiated | delta | silver |
| 26 | bronze_calls | delta | bronze | silver_calls_call_agents_batch_dialed | delta | silver |
| 27 | bronze_calls | delta | bronze | silver_calls_call_answered | delta | silver |
| 28 | bronze_calls | delta | bronze | silver_calls_call_billed | delta | silver |
| 29 | bronze_calls | delta | bronze | silver_calls_call_[illegible]_billed | delta | silver |
| 30 | bronze_calls | delta | bronze | silver_calls_call_external_answered | delta | silver |
| 31 | bronze_calls | delta | bronze | silver_calls_call_external_initiated | delta | silver |
| 32 | bronze_calls | delta | bronze | silver_calls_call_finished | delta | silver |
| 33 | bronze_calls | delta | bronze | silver_calls_call_initiated | delta | silver |
| 34 | bronze_calls | delta | bronze | silver_calls_call_missed | delta | silver |
| 35 | bronze_calls | delta | bronze | silver_calls_[illegible]_sent | delta | silver |
| 36 | bronze_calls | delta | bronze | silver_calls_outgoing_call_answered | delta | silver |
| 37 | bronze_calls | delta | bronze | silver_calls_outgoing_call_finished | delta | silver |
| 38 | bronze_calls | delta | bronze | silver_calls_outgoing_call_initiated | delta | silver |
| 39 | bronze_calls | delta | bronze | silver_calls_outgoing_call_missed | delta | silver |
| 40 | bronze_agents | delta | bronze | silver_agents_agent_activated | delta | silver |
| 41 | bronze_agents | delta | bronze | silver_agents_agent_created | delta | silver |
| 42 | bronze_agents | delta | bronze | silver_agents_agent_deactivated | delta | silver |
| 43 | bronze_agents | delta | bronze | silver_agents_agent_deleted | delta | silver |
| 44 | bronze_agents | delta | bronze | silver_agents_agent_status_changed | delta | silver |
| 45 | bronze_agents | delta | bronze | silver_agents_agent_updated | delta | silver |
| 46 | bronze_agents | delta | bronze | silver_agents_presence_updated | delta | silver |
| 47 | bronze_teams | delta | bronze | silver_teams_system_remove_members_from_team | delta | silver |
| 48 | bronze_teams | delta | bronze | silver_teams_user_add_member_to_team | delta | silver |
| 49 | bronze_teams | delta | bronze | silver_teams_user_create_team | delta | silver |
| 50 | bronze_teams | delta | bronze | silver_teams_user_delete_team | delta | silver |
| 51 | bronze_teams | delta | bronze | silver_teams_user_remove_members_from_team[illegible] | delta | silver |
| 52 | bronze_teams | delta | bronze | silver_teams_user_update_team | delta | silver |
| 53 | bronze_[illegible]_calls | delta | bronze | silver_[illegible]_calls_call_finished | delta | silver |

Third Level. Data from delta tables is combined into gold delta tables. Extensive ETL processing will be in place here.

| ID | Origin data | Source system | Source layer | Destination data | Destination system | Destination layer |
|---|---|---|---|---|---|---|
| 54 | user_session_created | delta | silver | gold_all_sessions | delta | gold |
| 55 | user_session_revoked | delta | silver | gold_all_sessions | delta | gold |
| 56 | user_session_created | delta | silver | gold_closed_sessions | delta | gold |
| 57 | user_session_revoked | delta | silver | gold_closed_sessions | delta | gold |
| 58 | gold_users | delta | silver | | delta | gold |

[illegible] indicates text missing or illegible when filed.

FIG. 4 illustrates a data aggregation ontology according to an example of disclosed implementations. As discussed above, data in accordance with the data model is organized in bronze layer 302, silver layer 304 and gold layer 306. The baselines in bronze layer 302 represent the basic aggregations that store the information processed from the data sources available, such as:

    • calls;
    • call quality;
    • logs;
    • agents;
    • accounts;
    • teams.


Silver layer 304 represents aggregations performed over bronze baselines, which means that the aggregations in silver layer 304 can be composed of:

    • calls—that combine calls and call quality;
    • agents—that combine logs and agents;
    • accounts—that combine accounts and teams.


The gold layer provides overall aggregations of the silver baselines.

Anomaly detection



FIG. 5 illustrates data workflow 500 in accordance with an example of disclosed implementations. Elements in FIG. 5 that are the same as, or similar to, those in FIG. 3 are labeled with like reference numerals. After being collected and stored in database 202, data is sourced from database 202 and streamed into bronze layer 302, which is an S3 database in this example. The same data stream is parsed and filtered to tables in silver layer 304, which is a Delta Lake in this example. Amazon Simple Storage Service (Amazon S3)™ is an object storage service. Delta Lake is an open source storage layer. Tables in silver layer 304 are processed to aggregate/transform data from silver layer 304 into tables of gold layer 306. Anomalies are calculated within the Databricks environment and then synched into AI Kafka. This creates a messaging queue that is used to transport the output of anomaly detection to the client applications for various purposes such as inference/detection.


In Apache Kafka, categories used to organize messages are called “topics”. Each topic should have a name that is unique across the entire Kafka cluster. Messages can be sent to, and read from, specified topics. Kafka topics can have zero or more “consumers” subscribing to that topic and the data written to it. Topics can be partitioned and replicated throughout the implementation. As an example, the disclosed implementations can process the following topics:


    • event-splitter.audit_logs—all data related to logs (session create, session revoke, etc.)
    • event-splitter.calls—all data related to calls (call started, call finished, call billed, etc.)
    • event-splitter.agents—all data related to agents (agent created, agent removed, etc.)
    • event-splitter.accounts—all data related to accounts (account created, account updated, etc.)
    • brokkr.explore.agent-status-monthly-v3—all data related to agent status updated (agent online, etc.)


The anomaly detection process can include two main steps. The first step is to create the baseline of normal behavior and the second is to compare the current behavior to the baselines and check for anomalies. The baselines can be divided into several types, such as “session baselines” and “call baselines”. The session baselines can include four different tables, all calculated within the same data pipeline:

    • silver_agents_sessions_baseline_user_details_day—contains the aggregation of each case per user
    • silver_agents_sessions_baseline_peer_details_day—contains the aggregation of each case per account
    • silver_agents_sessions_baseline_user_day—pivots the cases for columns and aggregates data so each row is a unique combination of the cases, account, user and respective role
    • silver_agents_sessions_baseline_peer_day—pivots the cases for columns and aggregates data so each row is a unique combination of the cases and account
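The pivot from detail rows to per-day rows described in the list above can be sketched as follows. The row layout, the case names, the summing of values, and the function `pivot` are assumptions for illustration; the actual table schemas are not given in the source.

```python
from collections import defaultdict

# Hypothetical detail rows: (account, user, day, case, value).
details = [
    ("acme", "agent_1", "2022-06-16", "sessions", 3),
    ("acme", "agent_1", "2022-06-16", "login_attempts", 4),
    ("acme", "agent_2", "2022-06-16", "sessions", 1),
    ("acme", "agent_2", "2022-06-16", "login_attempts", 1),
]

def pivot(rows, key_fields):
    """Pivot cases into columns, producing one wide row per key.

    Values for the same key and case are summed (an assumed aggregation).
    """
    wide = defaultdict(lambda: defaultdict(int))
    for account, user, day, case, value in rows:
        record = {"account": account, "user": user, "day": day}
        key = tuple(record[field] for field in key_fields)
        wide[key][case] += value
    return {key: dict(cases) for key, cases in wide.items()}

# One row per account, user, and day (user-level baseline).
user_day = pivot(details, ("account", "user", "day"))
# One row per account and day (peer-level baseline).
peer_day = pivot(details, ("account", "day"))
print(user_day)
print(peer_day)
```

The same detail rows feed both results, which mirrors the statement that the four tables are calculated within the same data pipeline.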


After creating these tables, the tables can be updated periodically, such as once per day, and used as the baselines of every account/user/use case for a predefined period of time, such as 30 days. The current day's data can then be run against, e.g., compared to, the baselines. A predetermined divergence from the baseline can be detected as an anomaly.


The calls baselines create the baselines for a user and peers and can include the following tables:

    • silver_agents_calls_baseline_user_details—This table can include the number of distinct countries per agent phone number or customer phone number per user in the last 30 days
    • silver_agents_calls_baseline_peer_details—This table can include the number of distinct countries per agent phone number or customer phone number per account in the last 30 days
    • silver_agents_calls_baseline_user_day—This table can include the call metrics per user and day
    • silver_agents_calls_baseline_peer_day—This table can include the call metrics per account and day


As noted above, the baselines process can run periodically to aggregate current data for the day and compare that data with the baselines. If the current data differs from the baselines in a predetermined manner, then an anomaly is detected and written to the nr_anomalies table and to a topic in AI Kafka, such as ai-guardian.nr_anomalies.


The disclosed implementations use baseline distributions as support for anomaly creation, which means that periodically a smaller baseline is created using the same query for the current day per agent, and the results are then compared to the matching baseline. The following categories can be used for aggregations of data and baselines:

    • Logs
      • Agent sessions—The number of sessions of this agent increased;
      • Agent IPs—The number of IP addresses used by this agent has grown;
      • Agent countries—Higher agent login related activity by country;
      • Agent browser—The number of browsers used by this agent has grown;
      • Agent operating systems—The number of operating systems used by this agent has grown;
    • Calls
      • Inbound calls—The user has a different number of inbound calls;
      • Outbound calls—The user has a different number of outbound calls;
      • Calls for the same number—The user has made a different number of calls to the same number. Because the number of calls performed or answered is often used as a metric to evaluate the agent, calling the same number many times is considered gaming the system, which the disclosed implementations are intended to detect.
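As one illustration of the last category, the current-day metric for calls to the same number could be derived as below. The `(user, number)` record shape, the sample phone numbers, and the `max_calls_to_same_number` helper are hypothetical; the resulting value would still need to be compared against the matching baseline.

```python
from collections import Counter

def max_calls_to_same_number(calls):
    """For each user, return the highest call count to any single number."""
    per_pair = Counter((user, number) for user, number in calls)
    worst = {}
    for (user, _number), count in per_pair.items():
        worst[user] = max(worst.get(user, 0), count)
    return worst

# Hypothetical current-day call records: (user, dialed_number).
todays_calls = [
    ("alice", "+351000000001"),
    ("alice", "+351000000001"),
    ("alice", "+351000000001"),
    ("alice", "+351000000001"),
    ("alice", "+351000000002"),
    ("bob", "+351000000003"),
]
repeat_metric = max_calls_to_same_number(todays_calls)
```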


The anomaly detection algorithm is based on calculating, from the baselines, the probability that a certain type of event will happen. For each use case, the probability distribution of the event occurring is calculated in the baselines for both the individual agent and the agent's peers. A probability model is a mathematical representation of a random phenomenon. It is defined by its sample space, events within the sample space, and probabilities associated with each event. The sample space S for a probability model is the set of all possible outcomes. Various probability models can be used to determine the probability distributions. For example, binomial distribution, Poisson distribution, normal distribution, and/or bivariate normal distribution probability models can be used.
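To make the probability calculation concrete, the sketch below uses the Poisson model, one of the models named above. It assumes that a daily count is Poisson-distributed with a rate equal to the mean of the baseline counts. That modeling choice, the sample counts, and the function names are assumptions made for illustration only.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def poisson_upper_tail(k, lam):
    """P(X >= k): probability of seeing at least k events in a day."""
    return 1.0 - sum(poisson_pmf(i, lam) for i in range(k))

# Hypothetical baseline: daily outbound call counts for one agent.
baseline_counts = [10, 12, 9, 11, 8]
rate = sum(baseline_counts) / len(baseline_counts)  # estimated daily rate

todays_count = 20
probability = poisson_upper_tail(todays_count, rate)
```

An upper-tail probability answers "how likely is a day at least this busy", which suits use cases where only increases matter. A use case that flags any difference from the baseline would need a two-sided measure instead.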


A low probability indicates that the event is unlikely to occur. Therefore, the lower the probability, the higher the anomaly is ranked. Together with the probability, it is helpful to also consider the confidence that the use case is actually an anomaly. Accordingly, the anomaly detection algorithm can use a combination of probability and confidence thresholds to analyze the anomaly. For example, anomalies can be persisted (detected) only when the probability is low (for example, less than 0.05) and the confidence is high (for example, above 0.85). The stream processing and use of multiple probabilistic baselines, as disclosed herein, allow the disclosed implementations to reliably detect call center anomalies in a meaningful manner in substantially real time.
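The thresholding and ranking described above can be sketched as follows, using the example thresholds of 0.05 and 0.85 from the text. How the confidence value is calculated is not specified here, so the sketch simply takes it as an input. The candidate records and the `select_anomalies` name are hypothetical.

```python
def select_anomalies(candidates, max_probability=0.05, min_confidence=0.85):
    """Keep low-probability, high-confidence candidates, rarest first."""
    kept = [
        c for c in candidates
        if c["probability"] < max_probability and c["confidence"] > min_confidence
    ]
    # Lower probability means a higher rank, so sort ascending.
    return sorted(kept, key=lambda c: c["probability"])

# Hypothetical candidates; the confidence values are assumed to be given.
candidates = [
    {"id": "a", "probability": 0.01, "confidence": 0.90},
    {"id": "b", "probability": 0.04, "confidence": 0.50},   # confidence too low
    {"id": "c", "probability": 0.20, "confidence": 0.95},   # probability too high
    {"id": "d", "probability": 0.001, "confidence": 0.99},
]
persisted = select_anomalies(candidates)
ranked_ids = [c["id"] for c in persisted]
```

Requiring both conditions filters out rare-looking events that rest on little evidence as well as well-supported events that are not actually rare.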



FIG. 6 is a flow chart of a process for creating distributions in accordance with disclosed implementations. Process 600 starts at 602 by monitoring at least one stream of communication activity data indicating parameters of communication activities between call center agents and call center users. At 604, the communication activity data is stored in a database. At 606, the data is aggregated, and at 608 distributions are created based on the aggregated data. The distributions can include an agent baseline distribution of communication metrics for each of the agents per day and a peer baseline distribution of communication metrics for at least one group of the agents per day.
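The distribution-creation step of process 600 could be sketched with an empirical distribution, that is, the relative frequency of each observed daily value. This is one possible reading rather than the disclosed method: the tuple layout, the pooling of all agents in an account as the peer group, and the function names are assumptions.

```python
from collections import Counter, defaultdict

def empirical_distribution(values):
    """Map each observed value to its relative frequency."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}

def build_baselines(daily_counts):
    """Build agent-level and peer-level distributions from aggregated data.

    `daily_counts` holds (account, agent, day, metric_value) tuples, a
    hypothetical shape for the output of the aggregation at 606.
    """
    by_agent = defaultdict(list)
    by_account = defaultdict(list)
    for account, agent, _day, value in daily_counts:
        by_agent[(account, agent)].append(value)
        by_account[account].append(value)  # peers: every agent in the account
    agent_baseline = {k: empirical_distribution(v) for k, v in by_agent.items()}
    peer_baseline = {k: empirical_distribution(v) for k, v in by_account.items()}
    return agent_baseline, peer_baseline

# Hypothetical aggregated daily call counts.
daily_counts = [
    ("acct1", "alice", "2022-06-01", 10),
    ("acct1", "alice", "2022-06-02", 10),
    ("acct1", "alice", "2022-06-03", 12),
    ("acct1", "bob", "2022-06-01", 8),
]
agent_baseline, peer_baseline = build_baselines(daily_counts)
```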



FIG. 7 is a flow chart of a process for detecting anomalies in accordance with disclosed implementations. Process 700 begins at 702 by monitoring event parameters of communication activities between call center agents and users. At 704, a baseline distribution database is queried to determine that an event parameter represents a communication anomaly when the event parameter indicates an event that corresponds to a probability that is lower than a predetermined threshold probability and a calculated confidence of the event is higher than a predetermined confidence threshold, wherein the baseline distribution database is created by monitoring at least one stream of communication activity data indicating parameters of communication activities between call center agents and call center users, storing the communication activity data in a collected data database and aggregating the communication activity data to create an agent baseline distribution of communication metrics for each of the agents per day and a peer baseline distribution of communication metrics for at least one group of the agents per day. At 706, a record of the communication anomaly is stored in an anomaly database.


The baselines can be refreshed on a schedule. Further personalized (e.g., customer-level or industry-level) baselines can be created to facilitate multi-level anomaly detection. For example, an observation could be an outlier at the customer level but not for the industry. In such a case, a trigger rule can be applied to detect (or not detect) an anomaly. Baselines can be multiple and dynamic, and continuously updated to accommodate holidays, supply chain disruptions, and the like. The disclosed implementations leverage distributional techniques to compute the probability that an observation is an outlier based on the computed baselines. Complex observations can be modeled using correlation-based techniques on high-dimensional data.
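A trigger rule for multi-level detection could be as simple as the sketch below, which combines per-level outlier flags into a single decision. The two rules shown ("all" and "any"), the level names, and the `should_trigger` helper are illustrative assumptions. The text does not specify which rule is applied.

```python
def should_trigger(outlier_by_level, rule="all"):
    """Combine per-level outlier flags into one decision.

    rule="all": flag only when every level sees an outlier.
    rule="any": flag when at least one level sees an outlier.
    """
    flags = list(outlier_by_level.values())
    if not flags:
        return False  # no baseline levels evaluated, nothing to flag
    if rule == "all":
        return all(flags)
    if rule == "any":
        return any(flags)
    raise ValueError(f"unknown rule: {rule!r}")

# The example from the text: an outlier for the customer, not for the industry.
observation = {"customer": True, "industry": False}
strict = should_trigger(observation, rule="all")
lenient = should_trigger(observation, rule="any")
```

Under the strict rule the example observation is not reported, while under the lenient rule it is, which corresponds to the "detect (or not detect)" choice described above.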


A given computing platform may include one or more processors configured to execute computer program modules. The computer program modules associated with the computing platform allow the computing platform to provide the functionality disclosed herein. Computing platforms may include electronic storage, one or more processors, and/or other components, such as communication lines or ports to enable the exchange of information with a network and/or other computing platforms. Electronic storage devices may comprise non-transitory storage media that electronically store information. Electronic storage may include one or more of optically readable storage media (e.g., optical disks, etc.), magnetically readable storage media (e.g., magnetic tape, magnetic hard drive, floppy drive, etc.), electrical charge-based storage media (e.g., EEPROM, RAM, etc.), solid-state storage media (e.g., flash drive, etc.), and/or other electronically readable storage media. Electronic storage may include one or more virtual storage resources (e.g., cloud storage, a virtual private network, and/or other virtual storage resources). Electronic storage may store software algorithms, information determined by processor(s) and/or other information that enables server(s) 202 to function as described herein.


Processor(s) may be configured to provide information processing capabilities and may include one or more of a digital processor, an analog processor, a digital circuit designed to process information, an analog circuit designed to process information, a state machine, and/or other mechanisms for electronically processing information. As used herein, the term “module” may refer to any component or set of components that perform the functionality attributed to the module. This may include one or more physical processors during execution of processor readable instructions, the processor readable instructions, circuitry, hardware, storage media, or any other components.


It will be appreciated by those skilled in the art that changes could be made to the embodiments described above without departing from the broad inventive concept thereof. It is understood, therefore, that this invention is not limited to the particular implementations disclosed, but it is intended to cover modifications within the spirit and scope of the present invention as defined by the appended claims.

Claims
  • 1. A method for creating a baseline database to be used to increase security in a call center implemented over a computing network by detecting anomalies in communication activities between call center agents and call center users, the method comprising: monitoring at least one stream of communication activity data indicating parameters of communication activities between call center agents and call center users; storing the communication activity data in a collected data database; aggregating the communication activity data into aggregated data; and creating, based on the aggregated data, at least one distribution of communication metrics over a period of time.
  • 2. The method of claim 1, wherein at least one stream of communication activity data includes at least one of event audit logs, communication events, account information, and agent status data.
  • 3. The method of claim 2, further comprising creating use case specific tables and analytics based on the communication activity data.
  • 4. The method of claim 1, wherein the at least one distribution includes an agent baseline distribution of communication metrics for each of the agents per day and a peer baseline distribution of communication metrics for at least one group of the agents per day.
  • 5. The method of claim 1, wherein the agent baseline distribution and the peer baseline distribution each include metrics of communications by relevant agents.
  • 6. The method of claim 1, wherein the at least one stream of communication data includes direct data about communications and derived data about communications.
  • 7. A method for increasing security in a call center implemented over a computing network by detecting anomalies in communication activities between call center agents and call center users, the method comprising: monitoring event parameters of communication activities between call center agents and users; querying a baseline distribution database to determine that an event parameter represents a communication anomaly when the event parameter indicates an event that corresponds to a probability that is lower than a predetermined threshold probability and a calculated confidence of the event is higher than a predetermined confidence threshold, wherein the baseline distribution database is created by monitoring at least one stream of communication activity data indicating parameters of communication activities between call center agents and call center users, storing the communication activity data in a collected data database and aggregating the communication activity data to create at least one distribution of communication metrics over a period of time; and storing a record of the communication anomaly in an anomaly database.
  • 8. The method of claim 7, wherein at least one stream of communication activity data includes at least one of event audit logs, communication events, account information, and agent status data.
  • 9. The method of claim 8, further comprising creating use case specific tables and analytics based on the communication activity data.
  • 10. The method of claim 7, wherein the at least one distribution includes an agent baseline distribution of communication metrics for each of the agents per day and a peer baseline distribution of communication metrics for at least one group of the agents per day.
  • 11. The method of claim 7, wherein the at least one distribution includes an agent baseline distribution of communication metrics for each of the agents per day and a peer baseline distribution of communication metrics for at least one group of the agents per day.
  • 12. The method of claim 7, wherein the at least one stream of communication data includes direct data about communications and derived data about communications.
  • 13. A system for creating a baseline database to be used to increase security in a call center implemented over a computing network by detecting anomalies in communication activities between call center agents and call center users, the system comprising: at least one memory storing computer executable instructions; and at least one processor which, when executing the instructions accomplishes the method of: monitoring at least one stream of communication activity data indicating parameters of communication activities between call center agents and call center users; storing the communication activity data in a collected data database; aggregating the communication activity data into aggregated data; and creating based on the aggregated data, at least one distribution of communication metrics over a period of time.
  • 14. The system of claim 13, wherein at least one stream of communication activity data includes at least one of event audit logs, communication events, account information, and agent status data.
  • 15. The system of claim 14, wherein the method further comprises creating use case specific tables and analytics based on the communication activity data.
  • 16. The system of claim 13 wherein the at least one distribution includes an agent baseline distribution of communication metrics for each of the agents per day and a peer baseline distribution of communication metrics for at least one group of the agents per day.
  • 17. The system of claim 13 wherein the agent baseline distribution and the peer baseline distribution each include metrics of communications by relevant agents.
  • 18. The system of claim 13 wherein the at least one stream of communication data includes direct data about communications and derived data about communications.
  • 19. A system for increasing security in a call center implemented over a computing network by detecting anomalies in communication activities between call center agents and call center users, the system comprising: at least one memory storing computer executable instructions; and at least one processor which, when executing the instructions accomplishes the method of: monitoring event parameters of communication activities between call center agents and users; querying a baseline distribution database to determine that an event parameter represents a communication anomaly when the event parameter indicates an event that corresponds to a probability that is lower than a predetermined threshold probability and a calculated confidence of the event is higher than a predetermined confidence threshold, wherein the baseline distribution database is created by monitoring at least one stream of communication activity data indicating parameters of communication activities between call center agents and call center users, storing the communication activity data in a collected data database and aggregating the communication activity data to create at least one distribution of communication metrics over a period of time; and storing a record of the communication anomaly in an anomaly database.
  • 20. The system of claim 19, wherein at least one stream of communication activity data includes at least one of event audit logs, communication events, account information, and agent status data.
  • 21. The system of claim 20, wherein the method further comprises creating use case specific tables and analytics based on the communication activity data.
  • 22. The system of claim 19, wherein the at least one distribution includes an agent baseline distribution of communication metrics for each of the agents per day and a peer baseline distribution of communication metrics for at least one group of the agents per day.
  • 23. The system of claim 19, wherein the at least one distribution includes an agent baseline distribution of communication metrics for each of the agents per day and a peer baseline distribution of communication metrics for at least one group of the agents per day.
  • 24. The system of claim 19, wherein the at least one stream of communication data includes direct data about communications and derived data about communications.