The disclosure relates to the field of inside sales engagement, and more particularly to the field of the use of analytics and learning systems to optimize sales engagement and productivity of out-bound communications originating from multimedia contact centers.
In the last forty years, “customer care” using remote call or contact centers (that is, remote from the perspective of the customer being cared for, as opposed to in-person customer care at, for example, a retail establishment) has become a major activity of large corporations. Various estimates indicate that somewhere between 2 and 5 million people in the United States alone currently work in call or contact centers (in the art, “call center” generally refers to a center that handles only phone calls, while “contact center” refers to a center that handles not only calls but also other customer communication channels, such as electronic mail (“email”), instant messaging (“IM”), short message service (“SMS”), chat, web sessions, and so forth; in this document, applicant will generally use the term “contact center”, which should be understood to mean either call centers or contact centers, as just defined).
Contact centers are home to some of the more complex business processes engaged in by enterprises, since the process is typically carried out not only by employees or agents of the enterprise “running” the contact center, but also by the customers of the enterprise. Since an enterprise's customers will generally have goals that are different from, and often competitive with, the goals of the enterprise, and since customer care personnel (contact center “agents”) will often also have their own goals or preferences that may not always match those of the enterprise, the fact is that contact center processes lie somewhere between collaborative processes and purely competitive processes (like a courtroom trial). The existence of multiple competing or at least non-aligned stakeholders jointly carrying out a process means that, even when great effort is expended to design an efficient process, what actually occurs is usually a dynamic, surprising, and intrinsically complex mix of good and bad sub-processes, many of which occur without the direction or even knowledge of an enterprise's customer care management team.
Despite the complexity of contact center operations, it is a matter of significant economic importance to try to improve both the productivity of contact centers (from the enterprise's perspective) and the quality of the experience of the customers they serve. Accordingly, a number of well-known routing approaches have been adopted in the art, with the goal of getting each interaction to a most appropriate resource (resource being an agent or other person, or automated system, suitable for fulfilling a customer's service needs). For example, queues are still used in many contact centers, with most queues being first-in-first-out (FIFO) queues. In some cases in the art, enhancements to queue-based routing include use of priority scores for interaction, with higher-priority interactions being pushed “up” in queues to get quicker service. Queue-based routing has the advantage of simplicity and low cost, and is generally still in widespread use in applications where interactions are generally commodity-like or very similar (and therefore where the choice of a particular agent for a particular customer may not be that helpful).
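The priority-enhanced FIFO queuing described above can be sketched minimally as follows (the class and field names are hypothetical illustrations, not taken from the disclosure):

```python
import heapq
import itertools

class InteractionQueue:
    """Priority queue of waiting interactions; equal priorities are served FIFO."""
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker preserves arrival order

    def enqueue(self, interaction_id, priority=0):
        # heapq is a min-heap, so negate priority: higher priority pops first
        heapq.heappush(self._heap, (-priority, next(self._seq), interaction_id))

    def dequeue(self):
        return heapq.heappop(self._heap)[2] if self._heap else None

q = InteractionQueue()
q.enqueue("call-1")
q.enqueue("call-2")
q.enqueue("call-3", priority=5)  # pushed "up" in the queue
assert q.dequeue() == "call-3"   # higher-priority interaction served first
assert q.dequeue() == "call-1"   # remaining interactions served FIFO
```

With priority 0 for every interaction this degenerates to the plain FIFO queue that remains common for commodity-like traffic.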
An extension of the basic queuing approach is skills-based routing, which was introduced in the mid-1990s. In skills-based routing, each “agent” or customer service representative is assigned certain interaction-handling skills, and calls are queued to groups of agents who have the requisite skills needed for the call. Skills-based routing introduced the idea that among a large population of agents, some would be much more appropriate to handle a particular customer's need than others, and further that by assigning skills to agents and expressing the skills needed to serve a particular customer need, overall customer satisfaction would improve even as productivity did in parallel. However, in the art most skills are assigned administratively (sometimes based on training completed, but often based on work assignment or workgroup policies), and do not reflect actual capabilities of agents. Moreover, it is common practice in the art to “move interactions” by reassigning skills. That is, when traffic of inbound interactions begins to pile up in one group or skill set of a contact center, staff will often reassign skills of members in other groups so that the overloaded group temporarily becomes larger (and thereby clears the backlog of queued interactions). This common practice in the art further erodes any connection between skills as assigned and actual capabilities of agents, and in general basic skills-based routing has been unable to handle the complex needs of larger contact centers.
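Skills-based routing as described can be sketched as a set-matching step followed by a tie-break (the agents, skills, and longest-idle rule are hypothetical illustrations):

```python
# Hypothetical sketch: route a call to the longest-idle agent holding every
# required skill. "idle_since" is the timestamp at which the agent became idle,
# so a smaller value means the agent has been idle longer.
agents = {
    "alice": {"skills": {"billing", "spanish"}, "idle_since": 100},
    "bob":   {"skills": {"billing"},            "idle_since": 50},
    "carol": {"skills": {"tech"},               "idle_since": 10},
}

def route(required_skills):
    # An agent is eligible only if it holds every requested skill
    eligible = [name for name, a in agents.items()
                if required_skills <= a["skills"]]
    # Among eligible agents, pick the one that has been idle the longest
    return min(eligible, key=lambda n: agents[n]["idle_since"], default=None)

assert route({"billing"}) == "bob"               # longest idle among billing agents
assert route({"billing", "spanish"}) == "alice"  # only alice has both skills
assert route({"french"}) is None                 # no eligible agent
```

The administrative skill-reassignment practice criticized above amounts to mutating the `skills` sets at runtime, which is exactly what severs the link between assigned skills and actual capability.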
In one approach known in the art, a “virtual waiting room” is provided, where customers looking to be served and agents available to serve customers can virtually congregate, and a matching of customers to available agents can be made, much as people would do on their own if they were in a waiting room together. This approach, while attractive on the surface, is very impractical. For example, when there is a surplus of customers awaiting service, the waiting room approach becomes nothing more than determining, one agent at a time, which customer (among those the agent is eligible to serve) has the greatest need for prompt service; similarly, in an agent surplus situation, each time a customer “arrives” in the waiting room, a best-fit agent can be selected. Because generally there will be either an agent or a customer surplus, in most cases this waiting room approach is really nothing more than skills-based routing with a better metaphor.
Finally, because none of the three approaches just described satisfactorily meets the needs of complex routing situations typical in large contact centers, another approach that has become common in the art is the generic routing scripting approach. In this approach, a routing strategy designer application is used to build complex routing strategies, and each time an interaction requiring services appears (either by arriving, in the case of inbound interactions, or being initiated, in the case of outbound interactions), an appropriate script is loaded into an execution environment and executed on behalf of that interaction. An advantage of this approach is its open-endedness, as users can construct complex routing strategies that embody complex business rules. But this approach suffers from the disadvantage that it is very complex, and requires a high degree of technical skill on the part of the routing strategy designer. This requirement for skilled designers also generally means that changes in routing strategies occur only rarely, generally as part of a major technology implementation project (thus agile adoption and adaptation of enhanced business rules is not really an option).
Another general issue with the state of the art in routing is that, in general, one routing engine is used to handle all the routing for a given agent population. In some very large enterprises, routing might be subdivided based on organizational or geographic boundaries, but in most cases a single routing engine makes all routing decisions for a single enterprise (or for several). This means that the routing engine has to be made very efficient so that it can handle the scale of computation needed for large complex routing problems, and it means that the routing engine may be a point of failure (although hot standby and other fault-tolerant techniques are commonly used in the art). Also, routing engines, automated call distributors (ACDs), and queuing and routing systems in general known in the art today generally limit themselves to considering “available” agents (for example, those who have manually or automatically been placed in a “READY” status). Because of this, routing systems in the art generally require a real-time knowledge of the state of each potential target (particularly agents). In large routing systems, having to maintain continuous real-time state information about a large number of agents, and having to process routing rules within a centralized routing engine, have tended to require very complex systems that are difficult to implement, configure, and maintain.
Cloud-based contact centers (CC) and cloud communications platforms (CP) have a common approach of providing pre-integrated provision and management of voice, messaging and video communication channels. In the case of cloud-based contact centers, applications are prebuilt for specific contact center use cases such as call routing, customer service desktop, outbound sales, workforce management, outbound dialing, etc. On the other hand, cloud communications platforms provide APIs for developers to build custom applications. Many contact centers include a platform with rich APIs that enable custom application development, so the distinction between cloud-based contact centers and communications platforms is not always strong. However, use of these contact centers and communication platforms requires human interaction and management of complex communication processes such as ‘process and state tracking’, ‘uncertainty’, ‘hidden states’, ‘actions and actors’, ‘determination of actions leading to optimal outcomes’, ‘rewards and costs’, and ‘constraint propagation’. Even when great effort is expended to design an efficient process, what actually occurs is usually a dynamic, surprising, and intrinsically complex mix of good and bad sub-processes, many of which occur without the direction or even knowledge of an enterprise's customer care management team. It would therefore be desirable for these aspects to be managed with as much automation as possible to improve operations and activity actions of contact centers and communication platforms.
In the case of cloud-based contact centers, the interaction handling process for ‘process and state tracking’ is defined within the logic of each cloud-based contact center application but the logic can typically be customized through the use of routing rules for each channel type and agent skills. The technical state of the interactions, agents and callers is spread across the applications and the individual media servers. In the case of communications platforms, software developers are able to embed voice, messaging and video interactions directly into software applications and these applications share the technical state together with the media servers. However, the custom process, and the states or stages in the process, need to be regularly defined and managed by the developer, which is a taxing and time-consuming process.
Real-world communications scenarios are complex and involve large degrees of ‘uncertainty’. For example, from the simple fact that there are humans sending and responding to communications, there is uncertainty about knowing when interactions (voice, message, video) will start or terminate and what particular communications choices will be made on which particular channels. The technical “state” of multiple “parties” in an ongoing interaction chain evolves non-deterministically. Parties may switch between channels due to random phenomena such as getting into or out of a car or meeting room, or not wanting to communicate on a certain channel in the presence of other people.
In addition to simple technical states that can be easily observed (e.g. whether someone is connected, speaking, silent, typing, dialing, etc.), there are other states that may be ‘hidden’ or unobservable to communication platforms and applications. A simple example of a hidden state is whether or not a person is “able to speak privately”, i.e. communicating in a private rather than a public setting. If a person is in a public setting, they may prefer to communicate by a text channel so they will not be overheard. This cannot be directly observed by the system (unless it is a video call or can be inferred from background voices). Also, as high-quality intelligent speaking assistants and text bots become more prevalent, it may become increasingly hard to know whether one party in a communication is a human or a machine, and thus the state of whether that party is a human or a machine is no longer easily observable.
There are many kinds of ‘actions’ that are taken by the ‘actors’ or communicating parties (e.g. to start or end a communication session, to speak or type certain content, to speak or write in a certain tone, or to send a certain image, emoji, gesture, etc.). But as well as human actions, there are also actions to be taken by the communication platform ‘actors’, including how to route an interaction, which person to contact, or on what channel to contact someone if they are not present. There are also platform infrastructure actions that may be required to, for example, ensure continuing good service under increasing load, such as automated scale-up and scale-down of computer infrastructure nodes.
A key challenge when faced with a large number of choices between possible actions is which specific actions should be taken under differing situations (and in what sequence) in order to achieve the best outcome over time. When considering tradeoffs between multiple possible actions, the concept of a ‘reward’ or benefit (or alternatively a penalty or ‘cost’) associated with an action and change of state and/or observation must be introduced.
In other approaches to optimization, such as mathematical programming or constraint propagation, there is a concept of a constraint. In the case of an integer program, it could be a requirement that some linear combination of decision variables be greater than (‘>’) or less than (‘<’) some certain amount. In the case of constraint propagation, quite complex constraints may be imposed on the allowed domains of integer decision variables. Slack variables can also be introduced to turn a “hard” inequality constraint into a “soft” constraint.
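The slack-variable construction can be illustrated with a small numeric sketch (the coefficients and penalty weight below are hypothetical, chosen only for illustration):

```python
# Sketch: a hard constraint 3*x1 + 2*x2 >= 5 either holds or it does not;
# a soft version introduces a slack variable s >= 0 and charges a penalty
# for using it:
#     3*x1 + 2*x2 + s >= 5,   objective cost += PENALTY * s
PENALTY = 10.0  # hypothetical penalty weight per unit of slack

def hard_feasible(x1, x2):
    return 3*x1 + 2*x2 >= 5

def soft_cost(x1, x2):
    # Minimum slack needed to satisfy the relaxed constraint
    slack = max(0.0, 5 - (3*x1 + 2*x2))
    return PENALTY * slack

assert hard_feasible(1, 1)        # 3 + 2 = 5 satisfies the hard constraint
assert not hard_feasible(1, 0)    # 3 < 5 violates it
assert soft_cost(1, 1) == 0.0     # no slack needed, no penalty
assert soft_cost(1, 0) == 20.0    # slack of 2 costs 10 * 2
```

The soft version never becomes infeasible; it simply makes constraint violation expensive, which is the same device the model described later uses when it represents disallowed states by large negative rewards.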
Management and control of cloud-based contact centers and communications platforms require significant effort, not only to assign tasks efficiently but also to evaluate current trends and performance against historical data to project a desired outcome. While a model may be created to serve as a basis for some or all system processes, the act of selecting the appropriate model for the given parameters, as well as conditioning the model, is quite complex. In-sample and out-of-sample techniques may be used by an enterprise's management team in an attempt to predict an efficient approach and process within the contact center systems. In-sample evaluation uses a small subset of known, historical training data to estimate parameters and create a model to predict and attempt to control a desired outcome. However, in-sample evaluation typically paints an overly optimistic picture of the model's forecasting ability, since commonly chosen algorithms are fitted to avoid large prediction errors on that data and are therefore susceptible to error when used in the long run. Out-of-sample analysis involves not only a set of historical data but also an iterative prediction series in which the results of the model are evaluated and used to readjust the model. The use of out-of-sample analysis is iterative and time consuming, and results must be evaluated and further applied to another model to be tested for the desired outcome, by which time the desired outcome may have changed based on the ever-changing conditions associated with contact centers, as explained above.
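The optimism of in-sample evaluation can be shown with a toy sketch (the data are synthetic and the "model" is deliberately naive; nothing here is taken from the disclosure):

```python
# Illustrative sketch: a model evaluated only in-sample looks better than it
# performs on data it has not seen, especially when conditions shift over time.
history = [10, 12, 11, 13, 12, 30, 28, 31, 29, 30]  # regime shifts midway

train, test = history[:5], history[5:]
prediction = sum(train) / len(train)  # trivial "model": predict the training mean

def mae(values, pred):
    # Mean absolute error of a constant prediction
    return sum(abs(v - pred) for v in values) / len(values)

in_sample_error  = mae(train, prediction)  # error on data the model was fit to
out_sample_error = mae(test,  prediction)  # error on unseen, later data

assert in_sample_error < out_sample_error  # in-sample evaluation is overly optimistic
```

The out-of-sample loop described above corresponds to repeatedly extending `train`, refitting `prediction`, and re-measuring on the next slice, which is exactly the iterative, time-consuming cycle the text criticizes.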
What is needed in the art is a way to automate actions and optimize states of communications and operations in a contact center. Further what is needed in the art is an automated system and process for choosing which specific actions should be taken under differing situations, in a dynamic environment, and in what order these actions should be applied in order to achieve the best outcome over time.
Accordingly, the inventor has conceived and reduced to practice, in a preferred embodiment of the invention a system for optimizing communication operations in a contact center, using a reinforcement learning module comprising a reinforcement learning server comprising at least a plurality of programming instructions stored in a memory and operating on a processor of a network-connected computing device and configured to observe and analyze historical and current data using a retrain and design server; develop a training set for use in a fully observable Markov chain model; assign desired rewards to specific states for use in a fully observable Markov decision process model; specify states, add time-labeled states, and create clusters within a set of hidden states added to the fully observable Markov decision process model; design and train the fully observable Markov decision process model using a retrain and design server to achieve a desired outcome; form the fully observable Markov decision process model by fitting the fully observable Markov chain model with a Baum-Welch algorithm to infer parameters based on observations; engage with an optimization server to apply and manage the fully observable Markov decision process model; record results of optimal actions carried out by the optimization server to a learning database; observe and analyze results of the optimal actions stored in the learning database; and repeat these steps iteratively; and an optimization server comprising at least a plurality of programming instructions stored in a memory and operating on a processor of a network-connected computing device and configured to apply optimal actions to states as assigned by the reinforcement learning server; manage and maintain a current revision of the fully observable Markov decision process model; assign an optimal action to each state to be executed by an action handler through interfaces with the contact center; initiate actions within the contact 
center through interfaces with an action handler; analyze events resulting from executing optimal actions within the contact center by way of interfaces with an event analyzer; record observations and actions resulting from execution of the optimal action; and send records of observations and actions resulting from execution of optimal actions to the reinforcement learning server.
According to a preferred embodiment of the invention, a method for optimizing states of communications and operations in a contact center, by using a reinforcement learning module, comprising the steps of: defining rewards to be used by the reinforcement learning module for achieving a desired outcome or goal; assigning the rewards to a set of possible states at a given point in time, “L”; assigning specific actions resulting from the set of possible states for the given point in time “L”; forming a fully observable Markov decision process model by adding rewards, actions and hidden states, the hidden states comprising at least a set of specified states, time-labeled states, or clustered segments, to a Markov process at a given point in time “L”; solving the fully observable Markov decision process model to determine an optimal policy for the given point in time “L”; applying the optimal policy to determine an optimal action; determining the optimal action for the given point in time “L”; executing the optimal action at a new point in time “Li”; recording and observing results of the optimal action at the new point in time, “Li”; computing the current state based on the results of the optimal action at time stamp “Li”; matching observations under actions to fit a new model, at time stamp “Li”; forming a new fully observable Markov decision process model by adding rewards, actions and hidden states, the hidden states comprising at least a set of specified states, time-labeled states, or clustered segments, to a Markov process, at time stamp “Li”; repeating a portion of steps with an incremental time step at “n=1”, yielding a recorded and observed result of the optimal action at the new point in time “t2”; and continuing a portion of these steps iteratively to determine a final optimal action for a given point in time, is disclosed.
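The iterative method above can be sketched, under simplifying assumptions, as the following self-contained control-flow skeleton (the stub functions and toy reward table are hypothetical placeholders for the model-specific solve, execute, and refit steps; only the loop structure mirrors the method):

```python
# Hypothetical, self-contained skeleton of the iterative optimize/execute/refit loop.
def solve_mdp(model):
    # Placeholder solver: the "policy" maps each state to its highest-reward action
    return {s: max(actions, key=lambda a: model[(s, a)]) for s in states}

def execute(action):
    # Stand-in for carrying the action out in the contact center
    return {"action": action, "outcome": "ok"}

states, actions = ["idle", "busy"], ["dial", "wait"]
model = {("idle", "dial"): 1.0, ("idle", "wait"): 0.0,
         ("busy", "dial"): -1.0, ("busy", "wait"): 0.5}  # toy reward table

state, log = "idle", []
for t in range(3):                      # repeat a portion of steps iteratively
    policy = solve_mdp(model)           # solve the model for the optimal policy
    action = policy[state]              # optimal action for the current state
    observation = execute(action)       # execute and record/observe results
    log.append(observation)
    state = "busy" if action == "dial" else "idle"  # compute the new current state
    model[(state, action)] += 0.0       # placeholder for refitting a new model

assert policy == {"idle": "dial", "busy": "wait"}
```

In the disclosed system, `solve_mdp`, `execute`, and the refit step correspond to the optimization server, the action handler, and the retrain and design server, respectively.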
The accompanying drawings illustrate several embodiments of the invention and, together with the description, serve to explain the principles of the invention according to the embodiments. It will be appreciated by one skilled in the art that the particular embodiments illustrated in the drawings are merely exemplary, and are not to be considered as limiting of the scope of the invention or the claims herein in any way.
The inventor has conceived, and reduced to practice, in a preferred embodiment of the invention, an automated reinforcement learning module which may be connected to a system of a contact center such that optimized states of communications and operations may be achieved without the need for live user management or control of components or systems within the contact center.
One or more different inventions may be described in the present application. Further, for one or more of the inventions described herein, numerous alternative embodiments may be described; it should be appreciated that these are presented for illustrative purposes only and are not limiting of the inventions contained herein or the claims presented herein in any way. One or more of the inventions may be widely applicable to numerous embodiments, as may be readily apparent from the disclosure. In general, embodiments are described in sufficient detail to enable those skilled in the art to practice one or more of the inventions, and it should be appreciated that other embodiments may be utilized and that structural, logical, software, electrical and other changes may be made without departing from the scope of the particular inventions. Accordingly, one skilled in the art will recognize that one or more of the inventions may be practiced with various modifications and alterations. Particular features of one or more of the inventions described herein may be described with reference to one or more particular embodiments or figures that form a part of the present disclosure, and in which are shown, by way of illustration, specific embodiments of one or more of the inventions. It should be appreciated, however, that such features are not limited to usage in the one or more particular embodiments or figures with reference to which they are described. The present disclosure is neither a literal description of all embodiments of one or more of the inventions nor a listing of features of one or more of the inventions that must be present in all embodiments.
Headings of sections provided in this patent application and the title of this patent application are for convenience only, and are not to be taken as limiting the disclosure in any way.
Devices that are in communication with each other need not be in continuous communication with each other, unless expressly specified otherwise. In addition, devices that are in communication with each other may communicate directly or indirectly through one or more communication means or intermediaries, logical or physical.
A description of an embodiment with several components in communication with each other does not imply that all such components are required. To the contrary, a variety of optional components may be described to illustrate a wide variety of possible embodiments of one or more of the inventions and in order to more fully illustrate one or more aspects of the inventions. Similarly, although process steps, method steps, algorithms or the like may be described in a sequential order, such processes, methods and algorithms may generally be configured to work in alternate orders, unless specifically stated to the contrary. In other words, any sequence or order of steps that may be described in this patent application does not, in and of itself, indicate a requirement that the steps be performed in that order. The steps of described processes may be performed in any order practical. Further, some steps may be performed simultaneously despite being described or implied as occurring non-simultaneously (e.g., because one step is described after the other step). Moreover, the illustration of a process by its depiction in a drawing does not imply that the illustrated process is exclusive of other variations and modifications thereto, does not imply that the illustrated process or any of its steps are necessary to one or more of the invention(s), and does not imply that the illustrated process is preferred. Also, steps are generally described once per embodiment, but this does not mean they must occur once, or that they may only occur once each time a process, method, or algorithm is carried out or executed. Some steps may be omitted in some embodiments or some occurrences, or some steps may be executed more than once in a given embodiment or occurrence.
When a single device or article is described herein, it will be readily apparent that more than one device or article may be used in place of a single device or article. Similarly, where more than one device or article is described herein, it will be readily apparent that a single device or article may be used in place of the more than one device or article.
The functionality or the features of a device may be alternatively embodied by one or more other devices that are not explicitly described as having such functionality or features. Thus, other embodiments of one or more of the inventions need not include the device itself.
Techniques and mechanisms described or referenced herein will sometimes be described in singular form for clarity. However, it should be appreciated that particular embodiments may include multiple iterations of a technique or multiple instantiations of a mechanism unless noted otherwise. Process descriptions or blocks in figures should be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps in the process. Alternate implementations are included within the scope of embodiments of the present invention in which, for example, functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those having ordinary skill in the art.
Reinforcement learning follows a productive process: training a model 370 and, when the model 370 is ready, running it through subsets of training sets 305 to simulate real-time events. States are learned by reviewing history from the history database 315. Some examples of states include dialing, ringing, on a call, standby, ready, on a break, etc. Once the model 370 has been tested, it is set into motion in live action, and it controls a routing and action server 320, which then works to record more history to store in the history database 315, creates training sets 305, and reapplies the model 370 based on more data, learning from more data. Once live, an optimization server 220 is engaged to control actions. Components of the reinforcement learning system 200 work in “black-box” scenarios, as stand-alone units that only interface with established components, with no realization that other components exist in the system. Within the optimization server 220, an action handler 350 may act as a pacing manager, in communication with the campaign database 171 via interfaces 340. The action handler 350 may also concern itself with dialing: giving orders to hardware to dial, receiving status reports, and translating dialing results, such as connection, transfer, hang-up, etc. The action handler 350 dictates actions to the reinforcement learning system 200. The model 370 comprises a set of algorithms, but the action handler 350 uses the model 370 to decide and determine optimal movements and actions, which are then put into action, and the optimization server 220 learns from actions taken in real-time and incorporates observations and results to determine further optimal actions.
The event analyzer 360 receives events, as states, from the state and statistics server 330, or the state and statistics server 154, or any of the other components 150; it interprets the events (states) in terms of the model 370, decides what optimal actions to take, and communicates with the action handler 350, which then decides how to implement a chosen action and sends it via interface 340 out to any of the server components 150, such as state and statistics server 154, routing server 151, outbound server 153, and so forth. An action is a directive to do something; actions are handled by the action handler 350. An event, or state, is a recording that something has been done. Actions lead to states, and states trigger actions.
The optimization server 220 carries out instructions from the model 370 by analyzing events with the event analyzer 360 and sending out optimal actions, based on those events, to be executed by the action handler 350. The reinforcement learning server 210, during runtime, may be receiving a plurality of events and action directives, interpreting them, and adjusting new actions as time advances. The model manager 380 receives increments from the model 370, and from the reinforcement learning server 210, and dynamically updates the model 370 that is being used. The model manager 380 maintains the current revision of the model 370, and has the option to change the model 370 each time an incremental dataset is received, which may even mean changing the model every few minutes or even seconds, or after a prescribed quantity of changes is received.
According to a preferred embodiment, decisions of optimal actions to be executed to yield a most desirable outcome, even a best outcome, of processes running within a contact center may be expressed through a partially observable Markov decision process (POMDP) 570. The POMDP 570 is defined by a tuple (S, O, A, P, R, Z, γ), where:
and a matrix P, or P_ss′^a, is a conditional probability of a transition from state s at time t to a state s′ at time t+1, given that the state was s at time t and under the effect of action a,

P_ss′^a = ℙ[S_t+1 = s′ | S_t = s, A_t = a]
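As an illustration, the transition probabilities for a single action can be held as a row-stochastic matrix (the states and numbers below are hypothetical, not from the disclosure):

```python
# Illustrative transition matrix for one action a over two states:
# row = current state s, column = next state s'; each row sums to 1.
states = ["ringing", "connected"]
P_a = [
    [0.3, 0.7],   # from "ringing": 30% stay ringing, 70% connect
    [0.1, 0.9],   # from "connected": 10% drop back, 90% stay connected
]

def transition_prob(s, s_next):
    # P[S_{t+1} = s_next | S_t = s, A_t = a]
    return P_a[states.index(s)][states.index(s_next)]

assert transition_prob("ringing", "connected") == 0.7
assert all(abs(sum(row) - 1.0) < 1e-9 for row in P_a)  # valid probability rows
```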
a reward function R, or R_s^a, is an expected (mean) value of the reward at time t+1 after starting in state s at time t and under the effect of action a,

R_s^a = 𝔼[R_t+1 | S_t = s, A_t = a]
an observation function Z, or Z_s′o^a, is a probability of observing observation o at time t+1 given that the system was in state s′ at time t+1 and had experienced action a,

Z_s′o^a = ℙ[O_t+1 = o | S_t+1 = s′, A_t = a]
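Because hidden states cannot be observed directly, a POMDP-based system typically maintains a belief distribution over states, updated from the transition probabilities P and observation probabilities Z: b′(s′) ∝ Z_s′o^a · Σ_s P_ss′^a · b(s). A minimal sketch (the hidden states and all probabilities are hypothetical illustrations):

```python
# Sketch of a POMDP belief update for the hidden "able to speak privately" state.
belief = {"private": 0.5, "public": 0.5}   # prior belief over hidden states

# P[s][s']: transition probabilities under some action a (illustrative numbers)
P = {"private": {"private": 0.9, "public": 0.1},
     "public":  {"private": 0.2, "public": 0.8}}
# Z[s']: probability of observing "replied by text" in each next state
Z = {"private": 0.2, "public": 0.9}

def belief_update(belief, P, Z):
    # b'(s') is proportional to Z(s') * sum_s P(s' | s) * b(s), then normalized
    unnorm = {s2: Z[s2] * sum(P[s][s2] * belief[s] for s in belief)
              for s2 in belief}
    total = sum(unnorm.values())
    return {s2: p / total for s2, p in unnorm.items()}

new_belief = belief_update(belief, P, Z)
# Observing a text reply shifts the belief toward the "public" hidden state
assert new_belief["public"] > new_belief["private"]
```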
Standard reinforcement learning (RL) algorithms follow three different approaches: value-based (estimate the optimal value function), policy-based (search for the optimal policy directly), and model-based (learn a model of the environment).
Value-based RL involves estimating the “value functions” of state-action pairs to estimate how good it is to perform a specific action in a given state based on accumulated future rewards. The value of a state s under a policy π is the expected return when starting in state s and following policy π thereafter.
The optimal policy π* is the one that maximizes v_π(s).
Deep reinforcement learning, however, uses deep neural networks to represent the value function, the policy, and the model, with the loss function optimized by stochastic gradient descent. This leads to value-based deep RL, policy-based deep RL, and model-based deep RL approaches for the solution of the POMDP.
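The value-based approach can be illustrated with value iteration on a toy, fully observable two-state MDP (all numbers are hypothetical): the Bellman optimality backup is applied until the value function converges, then the policy is read off greedily.

```python
# Minimal value iteration on a two-state MDP (illustrative numbers).
states, actions, gamma = ["s0", "s1"], ["a0", "a1"], 0.9
# P[(s, a)] = {s': prob}; R[(s, a)] = expected immediate reward
P = {("s0", "a0"): {"s0": 1.0}, ("s0", "a1"): {"s1": 1.0},
     ("s1", "a0"): {"s0": 1.0}, ("s1", "a1"): {"s1": 1.0}}
R = {("s0", "a0"): 0.0, ("s0", "a1"): 1.0,
     ("s1", "a0"): 0.0, ("s1", "a1"): 2.0}

V = {s: 0.0 for s in states}
for _ in range(200):  # iterate the Bellman optimality backup to convergence
    V = {s: max(R[(s, a)] + gamma * sum(p * V[s2] for s2, p in P[(s, a)].items())
                for a in actions) for s in states}

# Greedy policy with respect to the converged value function
policy = {s: max(actions, key=lambda a: R[(s, a)] +
                 gamma * sum(p * V[s2] for s2, p in P[(s, a)].items()))
          for s in states}

assert policy == {"s0": "a1", "s1": "a1"}  # always move toward / stay in s1
```

A deep value-based method replaces the tabular `V` with a neural network fitted by stochastic gradient descent, but the backup being approximated is the same.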
Reinforcement learning follows an iterative production process: a model 370 is trained and, when the model 370 is ready, it is run against subsets of training data 305 to simulate real-time events.
At each time step 660 the computational agent 610 implements a mapping 690 from states to probabilities of selecting each possible action 620. This mapping 690 is called the computational agent's policy 695, written πt, where πt(a|s) is the probability that the action 620 at time t, At=a if St=s. Reinforcement learning methods specify how the computational agent 610 changes its policy 695 as a result of its experience 665, which is the accumulated result of each completed iteration through each time step 660. The computational agent's goal is to maximize the total amount of reward it receives over the long run. The time steps 660 need not refer to fixed intervals of real time but may refer to arbitrary successive stages of decision making and acting. There are three signal types passing between the computational agent 610 and its environment 630: (i) choices made by the computational agent 610 (the actions 620); (ii) the basis on which those choices are made (the states 670); and (iii) the computational agent's 610 goal (the rewards 680). Note that states and actions may be low-level communication states or actions, but they may also be quite complex. The boundary between the computational agent 610 and its environment 630 represents the limit of the computational agent's 610 absolute control, not of its knowledge. Reward computation is external to the computational agent 610. In practice, multiple computational agents 610 may operate concurrently, each with a different boundary. They may be hierarchical, in that one computational agent may make high-level decisions that form part of the states faced by a second, lower-level computational agent that implements those higher-level decisions.
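The agent-environment loop just described, with its three signal types, may be sketched as follows. The sketch is illustrative: `env_step` and `policy` are hypothetical caller-supplied interfaces, and the policy returns a distribution over actions in the sense of πt(a|s):

```python
import random

def run_episode(env_step, policy, s0, horizon=100):
    """One pass of the agent-environment loop: at each time step the
    agent samples an action from its stochastic policy pi(a|s), the
    environment returns the next state and a reward, and the resulting
    experience tuples are accumulated.
    `policy(s) -> {action: probability}` and
    `env_step(s, a) -> (s_next, reward, done)` are hypothetical."""
    s, experience, total_reward = s0, [], 0.0
    for t in range(horizon):
        # sample an action from the policy's distribution over actions
        actions, probs = zip(*policy(s).items())
        a = random.choices(actions, weights=probs)[0]
        # the environment answers with the other two signal types:
        # the next state and the reward
        s_next, r, done = env_step(s, a)
        experience.append((s, a, r, s_next))
        total_reward += r
        s = s_next
        if done:
            break
    return experience, total_reward
```

The accumulated `experience` list corresponds to the experience 665 from which reinforcement learning methods adjust the policy 695.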
The reinforcement learning system 200 is designed to handle uncertainty at its core, in terms of transition probabilities between states and probabilistic observation functions, and may perform optimal decision making under uncertainty. The reinforcement learning system 200 makes it possible to statistically infer hidden states even though they are not directly observable, as well as to represent actions associated with the reinforcement learning system 200 and its communications platforms. In a preferred embodiment, the reinforcement learning system 200 finds an action policy that has a maximum expected (mean) value of net accumulated reward (total return) over a time horizon in the presence of uncertainty across different scenarios. Global constraints on actions are represented by an absence of impermissible actions in the formulation of the model 370, and constraints on entering disallowed or undesirable states are represented by large penalties, or negative action rewards, for actions that have a non-zero probability of transition to disallowed states. Use of the reinforcement learning system 200 thus enables optimal actions to be computed for any given state of the system 200 and for those actions to be executed.
Other applications are possible, such as handling a plurality of outbound interactions, outbound dialing and pacing, workforce planning, and resource allocation. Examples include: optimal interaction planning for outbound sales leads (when, how often, and by what channel an outbound lead should be contacted); optimal skills-based routing for inbound interactions (with certain parameters known, such as current system state, number of interactions in queue, and number of agents available, paired with more positive rewards for matching a skill request with an agent skill, finding the optimal routing action to an agent in each time step); optimal intraday staffing (where actions are which agents to schedule, at what time, and for how long, as well as servicing of interactions by a well-matched agent); learning optimal channels and times for communication to a customer device; simplification of state handling in developer applications by moving state processing and decisioning models to the cloud as data, not as code; and optimal cloud resource management, in which a cloud platform optimizes its response to API actions to maximize reward. In a general sense, an entire customer journey, and even an agent journey, could be modeled as a Markov decision process, subject to actions along the way.
Considering the paragraphs above, a system using a Markov decision process may be built and configured for a contact center to include simultaneous states of interactions and agents. A fully observable Markov decision process may be implemented by creating a Markov chain with actions and rewards, allowing for a system to operate from a hyper-policy that specifies general actions to take such that rewards are maximized over a specified time or horizon. Actions need not be limited to typical routing actions, such as, for example, communication interactions and agent selection, but may be generalized to include actions related to scale-up or scale-down of resources 120 or scaling of other resources, such as, for example, cloud computing resources. Time may be discretely introduced to a Markov chain by introducing time-labeled states, which may be used to model waiting or service times. Therefore, by modifying the exemplary method for creating a partially observable Markov decision process 500 for use by reinforcement learning module 300, as illustrated in
According to a preferred embodiment, decisions of optimal actions to be implemented and executed to yield a most desirable outcome, even a best outcome, of communication operations running within a contact center may be expressed through a fully observable Markov decision process (MDP) 1070. In a similar fashion to the derivation of POMDP 570, the MDP 1070 is defined by a tuple (S, A, P, R, γ), where:
An overall state of the reinforcement learning system 200 may be represented as S, and may be decomposed into a finite number NQ of possible states of interactions in a queue: Q0, Q1, . . . , Q[NQ−1]; and into a finite number NA of possible states of interactions being addressed by agent resources: A0, A1, . . . , A[NA−1], where a special state Q0 corresponds to an empty queue and a special state A0 corresponds to all agent resources idle. Transition probabilities may change over time due to any number of uncontrolled actions, such as a customer 110 disconnecting due to impatience, or delayed reporting of availability by an agent resource 120. The Markov decision process model 1070 may be created as a non-stationary policy, or hyper-policy, by expanding the state definition to include an explicit time stage label, t0, t1, . . . , tN, and considering the state subspace Q to be enlarged by including time units spent waiting in queue, and the state subspace A to include the number of time units spent being engaged or active. The finite number NQ of possible queue states may be determined by considering all possible interaction types (skill request expressions) and the number of interactions of each type waiting in each queue for a range of time units up to a maximum model queue time (horizon), such that the order of interactions in a queue is not important; only wait time counts are captured. Alternatively, queue states may be distinguished by order. All possible combinations of queue interaction and agent resource states may be specified in the overall state space of the reinforcement learning system 200, where S={Q0A0, Q1A0, Q1A1, Q2A0, Q2A1, Q2A2, . . . , QnAn}. Similarly, the Markov decision process model 1070 may be further extended to a partially observable model, for example, when relating a known state of a customer 110.
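The decomposition of the overall state space into combinations of queue states and agent-resource states may be sketched as follows. This is an illustrative enumeration only (the function name is hypothetical); it takes all Cartesian combinations, whereas an implementation might restrict the space to feasible pairs as suggested by the enumeration above:

```python
from itertools import product

def build_state_space(n_queue_states, n_agent_states):
    """Enumerate an overall state space S = {Q0A0, Q0A1, ..., QnAn} as
    combinations of queue states and agent-resource states, where Q0 is
    the empty queue and A0 means all agent resources are idle."""
    queue_states = [f"Q{i}" for i in range(n_queue_states)]
    agent_states = [f"A{j}" for j in range(n_agent_states)]
    # every (queue state, agent state) pair is one overall system state
    return [q + a for q, a in product(queue_states, agent_states)]
```

Time-stage labels t0, t1, . . . , tN could be added in the same way, as a third factor of the product, to obtain the non-stationary (hyper-policy) formulation.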
A non-stationary policy, otherwise termed herein a hyper-policy, specifically as referenced above, may be implemented to identify optimal actions to take at a state s, with a known number of ‘t’ stages within a specified horizon, H. This may be represented as π(s,t), where π:S×T->A, and T comprises a set of non-negative integers. A finite planning horizon, H, comprising a finite number of stages, ‘t’, may be established such that a finite count of actions may be determined. Actions may involve routing of interactions to agent resources or changing the quantity of available or potentially-engaged agent resources according to changing needs of the reinforcement learning system 200. Given a Markov decision process 1070 and a known horizon, H, for example, one day, an optimal finite-horizon policy may be computed using, for example, a backward induction algorithm that starts from the end of the known horizon, e.g. one day, and works backwards to find the optimal action to take at each stage or time point, t, thereby determining an optimal value function for the known horizon, H. Backward induction algorithms require some level of initial approximation in order to compute an optimized policy, and may follow: myopic policies, which optimize current cost but do not apply forecasts or representations of future decisions; look-ahead policies, which explicitly optimize over a future horizon with approximated future data and actions applied; policy function approximations, which directly return an action in a given state with no embedded optimization or forecast of future data applied; and value function approximations (greedy policies), which use an approximation of the value of being in a future state as a result of a decision currently made, with any impact of future actions captured solely in this value function.
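The finite-horizon backward induction step may be sketched as follows, assuming the MDP 1070 is given as a transition tensor and reward matrix in the notation defined earlier; the function name and array layout are illustrative assumptions:

```python
import numpy as np

def backward_induction(P, R, horizon):
    """Compute the optimal non-stationary policy pi(s, t) for a
    finite-horizon MDP, starting from the end of the horizon and
    working backwards.
    P[a, s, s'] = Pr[S_{t+1}=s' | S_t=s, A_t=a]
    R[s, a]     = E[R_{t+1}     | S_t=s, A_t=a]
    Returns V[t, s] (optimal value function per stage) and
    policy[t, s] (optimal action per stage and state)."""
    n_actions, n_states, _ = P.shape
    V = np.zeros((horizon + 1, n_states))   # terminal value V[H] = 0
    policy = np.zeros((horizon, n_states), dtype=int)
    for t in range(horizon - 1, -1, -1):    # work backwards from stage H-1
        # Q[s, a] = R[s, a] + sum_s' P[a, s, s'] * V[t+1, s']
        Q = R + np.einsum('aij,j->ia', P, V[t + 1])
        V[t] = Q.max(axis=1)                # best achievable value at stage t
        policy[t] = Q.argmax(axis=1)        # optimal action at stage t
    return V, policy
```

Note this sketch uses the exact value of the next-stage value function rather than the initial approximations (myopic, look-ahead, and so forth) discussed above, which become necessary when the state space is too large to enumerate.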
Generally, the techniques disclosed herein may be implemented on hardware or a combination of software and hardware. For example, they may be implemented in an operating system kernel, in a separate user process, in a library package bound into network applications, on a specially constructed machine, on an application-specific integrated circuit (ASIC), or on a network interface card.
Software/hardware hybrid implementations of at least some of the embodiments disclosed herein may be implemented on a programmable network-resident machine (which should be understood to include intermittently connected network-aware machines) selectively activated or reconfigured by a computer program stored in memory. Such network devices may have multiple network interfaces that may be configured or designed to utilize different types of network communication protocols. A general architecture for some of these machines may be described herein in order to illustrate one or more exemplary means by which a given unit of functionality may be implemented. According to specific embodiments, at least some of the features or functionalities of the various embodiments disclosed herein may be implemented on one or more general-purpose computers associated with one or more networks, such as for example an end-user computer system, a client computer, a network server or other server system, a mobile computing device (e.g., tablet computing device, mobile phone, smartphone, laptop, or other appropriate computing device), a consumer electronic device, a music player, or any other suitable electronic device, router, switch, or other suitable device, or any combination thereof. In at least some embodiments, at least some of the features or functionalities of the various embodiments disclosed herein may be implemented in one or more virtualized computing environments (e.g., network computing clouds, virtual machines hosted on one or more physical computing machines, or other appropriate virtual environments).
Referring now to
In one embodiment, computing device 10 includes one or more central processing units (CPU) 12, one or more interfaces 15, and one or more busses 14 (such as a peripheral component interconnect (PCI) bus). When acting under the control of appropriate software or firmware, CPU 12 may be responsible for implementing specific functions associated with the functions of a specifically configured computing device or machine. For example, in at least one embodiment, a computing device 10 may be configured or designed to function as a server system utilizing CPU 12, local memory 11 and/or remote memory 16, and interface(s) 15. In at least one embodiment, CPU 12 may be caused to perform one or more of the different types of functions and/or operations under the control of software modules or components, which for example, may include an operating system and any appropriate applications software, drivers, and the like.
CPU 12 may include one or more processors 13 such as, for example, a processor from one of the Intel, ARM, Qualcomm, and AMD families of microprocessors. In some embodiments, processors 13 may include specially designed hardware such as application-specific integrated circuits (ASICs), electrically erasable programmable read-only memories (EEPROMs), field-programmable gate arrays (FPGAs), and so forth, for controlling operations of computing device 10. In a specific embodiment, a local memory 11 (such as non-volatile random access memory (RAM) and/or read-only memory (ROM), including for example one or more levels of cached memory) may also form part of CPU 12. However, there are many different ways in which memory may be coupled to system 10. Memory 11 may be used for a variety of purposes such as, for example, caching and/or storing data, programming instructions, and the like. It should be further appreciated that CPU 12 may be one of a variety of system-on-a-chip (SOC) type hardware that may include additional hardware such as memory or graphics processing chips, such as a QUALCOMM SNAPDRAGON™ or SAMSUNG EXYNOS™ CPU as are becoming increasingly common in the art, such as for use in mobile devices or integrated devices.
As used herein, the term “processor” is not limited merely to those integrated circuits referred to in the art as a processor, a mobile processor, or a microprocessor, but broadly refers to a microcontroller, a microcomputer, a programmable logic controller, an application-specific integrated circuit, and any other programmable circuit.
In one embodiment, interfaces 15 are provided as network interface cards (NICs). Generally, NICs control the sending and receiving of data packets over a computer network; other types of interfaces 15 may for example support other peripherals used with computing device 10. Among the interfaces that may be provided are Ethernet interfaces, frame relay interfaces, cable interfaces, DSL interfaces, token ring interfaces, graphics interfaces, and the like. In addition, various types of interfaces may be provided such as, for example, universal serial bus (USB), Serial, Ethernet, FIREWIRE™, THUNDERBOLT™, PCI, parallel, radio frequency (RF), BLUETOOTH™, near-field communications (e.g., using near-field magnetics), 802.11 (WiFi), frame relay, TCP/IP, ISDN, fast Ethernet interfaces, Gigabit Ethernet interfaces, Serial ATA (SATA) or external SATA (ESATA) interfaces, high-definition multimedia interface (HDMI), digital visual interface (DVI), analog or digital audio interfaces, asynchronous transfer mode (ATM) interfaces, high-speed serial interface (HSSI) interfaces, Point of Sale (POS) interfaces, fiber data distributed interfaces (FDDIs), and the like. Generally, such interfaces 15 may include physical ports appropriate for communication with appropriate media. In some cases, they may also include an independent processor (such as a dedicated audio or video processor, as is common in the art for high-fidelity A/V hardware interfaces) and, in some instances, volatile and/or non-volatile memory (e.g., RAM).
Although the system shown in
Regardless of network device configuration, the system of the present invention may employ one or more memories or memory modules (such as, for example, remote memory block 16 and local memory 11) configured to store data, program instructions for the general-purpose network operations, or other information relating to the functionality of the embodiments described herein (or any combinations of the above). Program instructions may control execution of or comprise an operating system and/or one or more applications, for example. Memory 16 or memories 11, 16 may also be configured to store data structures, configuration data, encryption data, historical system operations information, or any other specific or generic non-program information described herein.
Because such information and program instructions may be employed to implement one or more systems or methods described herein, at least some network device embodiments may include nontransitory machine-readable storage media, which, for example, may be configured or designed to store program instructions, state information, and the like for performing various operations described herein. Examples of such nontransitory machine-readable storage media include, but are not limited to, magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM disks; magneto-optical media such as optical disks, and hardware devices that are specially configured to store and perform program instructions, such as read-only memory devices (ROM), flash memory (as is common in mobile devices and integrated systems), solid state drives (SSD) and “hybrid SSD” storage drives that may combine physical components of solid state and hard disk drives in a single hardware device (as are becoming increasingly common in the art with regard to personal computers), memristor memory, random access memory (RAM), and the like. It should be appreciated that such storage means may be integral and non-removable (such as RAM hardware modules that may be soldered onto a motherboard or otherwise integrated into an electronic device), or they may be removable such as swappable flash memory modules (such as “thumb drives” or other removable media designed for rapidly exchanging physical storage devices), “hot-swappable” hard disk drives or solid state drives, removable optical storage discs, or other such removable media, and that such integral and removable storage media may be utilized interchangeably. 
Examples of program instructions include object code, such as may be produced by a compiler; machine code, such as may be produced by an assembler or a linker; byte code, such as may be generated by, for example, a JAVA™ compiler and executed using a Java virtual machine or equivalent; and files containing higher-level code that may be executed by the computer using an interpreter (for example, scripts written in Python, Perl, Ruby, Groovy, or any other scripting language).
In some embodiments, systems according to the present invention may be implemented on a standalone computing system. Referring now to
In some embodiments, systems of the present invention may be implemented on a distributed computing network, such as one having any number of clients and/or servers. Referring now to
In addition, in some embodiments, servers 32 may call external services 37 when needed to obtain additional information, or to refer to additional data concerning a particular call. Communications with external services 37 may take place, for example, via one or more networks 31. In various embodiments, external services 37 may comprise web-enabled services or functionality related to or installed on the hardware device itself. For example, in an embodiment where client applications 24 are implemented on a smartphone or other electronic device, client applications 24 may obtain information stored in a server system 32 in the cloud or on an external service 37 deployed on one or more of a particular enterprise's or user's premises.
In some embodiments of the invention, clients 33 or servers 32 (or both) may make use of one or more specialized services or appliances that may be deployed locally or remotely across one or more networks 31. For example, one or more databases 34 may be used or referred to by one or more embodiments of the invention. It should be understood by one having ordinary skill in the art that databases 34 may be arranged in a wide variety of architectures and using a wide variety of data access and manipulation means. For example, in various embodiments one or more databases 34 may comprise a relational database system using a structured query language (SQL), while others may comprise an alternative data storage technology such as those referred to in the art as “NoSQL” (for example, HADOOP CASSANDRA™, GOOGLE BIGTABLE™, and so forth). In some embodiments, variant database architectures such as column-oriented databases, in-memory databases, clustered databases, distributed databases, or even flat file data repositories may be used according to the invention. It will be appreciated by one having ordinary skill in the art that any combination of known or future database technologies may be used as appropriate, unless a specific database technology or a specific arrangement of components is specified for a particular embodiment herein. Moreover, it should be appreciated that the term “database” as used herein may refer to a physical database machine, a cluster of machines acting as a single database system, or a logical database within an overall database management system. Unless a specific meaning is specified for a given use of the term “database”, it should be construed to mean any of these senses of the word, all of which are understood as a plain meaning of the term “database” by those having ordinary skill in the art.
Similarly, most embodiments of the invention may make use of one or more security systems 36 and configuration systems 35. Security and configuration management are common information technology (IT) and web functions, and some amount of each are generally associated with any IT or web systems. It should be understood by one having ordinary skill in the art that any configuration or security subsystems known in the art now or in the future may be used in conjunction with embodiments of the invention without limitation, unless a specific security 36 or configuration system 35 or approach is specifically required by the description of any specific embodiment.
In various embodiments, functionality for implementing systems or methods of the present invention may be distributed among any number of client and/or server components. For example, various software modules may be implemented for performing various functions in connection with the present invention, and such modules may be variously implemented to run on server and/or client components.
The skilled person will be aware of a range of possible modifications of the various embodiments described above. Accordingly, the present invention is defined by the claims and their equivalents.
This application is a continuation-in-part of U.S. patent application Ser. No. 15/268,611, titled “SYSTEM AND METHOD FOR OPTIMIZING COMMUNICATIONS USING REINFORCEMENT LEARNING”, filed on Sep. 18, 2016, the entire specification of which is incorporated herein by reference.
Number | Date | Country
--- | --- | ---
62441538 | Jan 2017 | US

 | Number | Date | Country
--- | --- | --- | ---
Parent | 15268611 | Sep 2016 | US
Child | 15442667 | | US