The present invention relates to a computer-implemented method of training artificial intelligence (AI) systems or rather agents (Maximum Entropy Regularised multi-goal Reinforcement Learning), in particular an AI system/agent for controlling a technical system.
AI systems like Neural Networks (NN) need to be trained in order to learn how to accomplish certain tasks like locomotion and robot manipulation (e.g. manipulation of a robot arm having several joints).
Reinforcement Learning (RL) combined with Deep Learning (DL) has led to great successes in various tasks, such as learning autonomously to accomplish different robotic tasks. One of the biggest challenges in RL is to make the agent learn sample-efficiently in applications with sparse rewards. Recent RL algorithms, such as Deep Deterministic Policy Gradient (DDPG), enable the agent to learn continuous control, such as manipulation and locomotion. Further, Universal Value Function Approximators (UVFAs) generalise not just over states but also over goals. Consequently, a UVFA method extends value functions (Q-functions) to multiple goals. Furthermore, to make the agent learn faster in sparse reward settings, Hindsight Experience Replay (HER) encourages the agent to learn from whatever goal states it has achieved. The combined use of DDPG and HER lets the agent learn to accomplish more complex robot manipulation tasks.
However, there is still a huge gap between the learning efficiency of humans and RL agents. In most cases, an RL agent needs millions of samples before it is able to solve the tasks, while humans only need a few samples. A concept of maximum entropy is used to encourage exploration during training. Soft Q-learning learns a deep energy-based policy with the maximum entropy of actions for each state and encourages the agent to learn all the policies that lead to the optimum. Furthermore, Soft Actor-Critic demonstrates a better performance while showing compositional ability and robustness of the maximum entropy policy in locomotion and robot manipulation tasks. The agent aims to maximise the expected reward while also maximising the entropy, i.e. to succeed at the task while acting as randomly as possible. Based on maximum entropy policies the agent is able to develop diverse skills by solely maximising an information theoretic objective without any reward function. For multi-goal and multi-task learning the diversity of training sets helps the agent to transfer skills to unseen goals and tasks. The variability of training samples mitigates over-fitting and helps the model to better generalise.
It is an objective of the present invention to solve or at least alleviate the above-mentioned problems. Therefore, a computer-implemented method of training artificial intelligence (AI) systems according to independent claim 1 as well as a corresponding computer program, computer-readable medium and data processing system according to the further independent claims are provided. Embodiments and refinements of the present invention are subject of the respective dependent claims.
According to a first aspect of the present invention a computer-implemented method of training artificial intelligence (AI) systems or rather agents (Maximum Entropy Regularised multi-goal Reinforcement Learning) comprises the iterative step of sampling a real goal ge and, for each episode of each epoch of the training, the iterative steps of sampling an action at, stepping an environment, updating a replay buffer, constructing a prioritised sampling distribution q(τg), sampling goal state trajectories τg and updating a single-goal conditioned behaviour policy θ, as well as, after the episodes of each epoch of the training, the step of updating a density model Φ. In the step of sampling the real goal ge, the real goal ge of a multitude of real goals Ge with a probability p(ge) and an initial state s0 with a probability p(s0) are sampled. In the step of sampling an action at, an action at is sampled from the single-goal conditioned behaviour policy θ that is represented by a Universal Value Function Approximator (UVFA). In the step of stepping the environment, the environment is stepped with the sampled action at for a new state st+1. In the step of updating the replay buffer, the replay buffer, which comprises a distribution p(τg) of goal state trajectories τg, is updated with the current state st and the current action at. The goal state trajectories τg contain pairs of states st from a multitude of states St and corresponding actions at from a multitude of actions At. In the step of constructing the prioritised sampling distribution q(τg), the prioritised sampling distribution q(τg) is constructed with a higher entropy Hq(Tg) than the distribution p(τg) of goal state trajectories τg in the replay buffer. In the step of sampling the goal state trajectories τg, the goal state trajectories τg are sampled with the prioritised sampling distribution q(τg) and a current density model Φ (q(τg|Φ)). In the step of updating the single-goal conditioned behaviour policy θ, the single-goal conditioned behaviour policy θ is updated towards a maximum of the expectation Eq of the reward r for the states St and real goals Ge (max Eq[r(St, Ge)]). After the previous steps have been iteratively executed for each episode of the current epoch, in the step of updating the density model Φ, the density model Φ is updated (once per epoch). All iterative steps are executed as long as the computer-implemented method has not converged.
According to a second aspect of the present invention a computer program comprises instructions which, when the program is executed by a computer, cause the computer to carry out the steps of the method according to the first aspect of the present invention.
According to a third aspect of the present invention a computer-readable medium has stored thereon the computer program according to the second aspect of the present invention.
According to a fourth aspect of the present invention a data processing system comprises means for carrying out the steps of the method according to the first aspect of the present invention.
In multi-goal RL, the agent learns to achieve multiple goals with a single-goal conditioned behaviour policy. Such a single-goal conditioned behaviour policy is represented with the UVFA. For off-policy approaches, the agent collects trajectories comprising states and corresponding actions into the replay buffer. During training, the trajectories are selected randomly from the replay buffer for replay. However, in common experience replay methods, the uniformly sampled trajectories are biased towards the behaviour policies with respect to the achieved goal states. In other words, in common experience replay methods the achieved goals in the replay buffer are often biased because of the behaviour policies. From a Bayesian perspective, when there is no prior knowledge of the target goal distribution, the agent should rather learn from the different achieved goals uniformly. Consider training a robot arm to reach a certain point in space. At the beginning, the agent samples trajectories with a random policy. The sampled trajectories are centred around the initial position of the robot arm. Therefore, the distribution of achieved goals, i.e. positions of the robot arm, is similar to a Gaussian distribution around the initial position, which is non-uniform. Sampling from such a distribution is biased towards the current policies. However, from the Bayesian point of view the agent should learn from these achieved goals uniformly when there is no prior knowledge of the target goal distribution. To correct this bias, the present invention provides a different objective, which combines the maximum entropy and the multi-goal RL objective. The multi-goal RL objective of the present invention uses entropy as a regulariser to encourage the agent to traverse diverse goal states. Furthermore, a safe lower bound for optimisation is provided.
The computer-implemented method implements a Maximum Entropy Regularised (MER) multi-goal Reinforcement Learning (RL) objective based on weighted entropy. This MER multi-goal RL objective encourages an agent to maximise the expected return as well as to achieve more diverse goals. The MER multi-goal RL objective is regularised via a Maximum Entropy based Prioritisation (MEP) framework. In other words, maximum entropy is combined with multi-goal RL to facilitate the agent to achieve unseen goals by learning from diverse achieved goals uniformly during training. The MEP framework may further be combined with Deep Deterministic Policy Gradient (DDPG) with or without Hindsight Experience Replay (HER).
The present invention regards multi-goal reinforcement learning tasks like robotic (simulation) scenarios (for example OpenAI Gym, which comprises six tasks including push, slide and pick & place with a robot arm as well as hand manipulation of a block, egg and pen). In the present invention the goals g may be the desired positions and orientations of an object in robotic (simulation) scenarios. Specifically, ge, where e stands for environment, denotes the real goal, which serves as the input from an environment. A state s comprises two sub-vectors, one achieved goal state sg (e.g. position and orientation of the object being manipulated) and one context state sc, i.e.
s=(sg∥sc)
where ∥ denotes concatenation. The context state sc contains the rest of the state information (e.g. linear and angular velocities of all robot joints and of the object). Achieved goals gs can be represented by states, leading to the concept of achieved goal states. In the present invention gs=sg is defined to represent an achieved goal as an achieved goal state gs, which has the same dimension as the real goal ge from the environment. The real (environmental) goals ge can be substituted with the achieved goal states gs to facilitate learning (i.e. goal relabeling in HER). A trajectory consisting solely of achieved goal states gs is represented as τg, i.e.
τg=(g0s, . . . ,gTs)
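The following purely illustrative Python sketch shows one possible way to split a state s=(sg∥sc) into its achieved goal state part and its context state part and to collect a goal state trajectory τg over one episode; the concrete state layout, dimensions and names used therein are assumptions made only for illustration and are not prescribed by the present invention:

import numpy as np

GOAL_DIM = 3  # assumed dimension of the achieved goal state, e.g. an x/y/z position

def split_state(s):
    """Split a state s = (s_g || s_c) into the achieved goal state s_g and the context state s_c."""
    return s[:GOAL_DIM], s[GOAL_DIM:]

def achieved_goal_trajectory(states):
    """Build the goal state trajectory tau_g = (g_0^s, ..., g_T^s) from the states of one episode,
    using the identification g^s = s_g."""
    return np.stack([split_state(s)[0] for s in states])

# Usage example with random placeholder states (state dimension 10, episode with 5 states):
episode_states = [np.random.randn(10) for _ in range(5)]
tau_g = achieved_goal_trajectory(episode_states)
print(tau_g.shape)  # (5, 3): one achieved goal state of dimension GOAL_DIM per visited state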
The present invention considers sparse rewards r. There is a tolerated range between the desired goal states and the achieved goal states. If the object is not in the tolerated range of the real goal, the agent receives a reward signal −1 for each transition; otherwise, the agent receives a reward signal 0. In multi-goal settings, the agent receives the real goal ge and the state input
s=(sg∥sc)
Thereby, a single-goal conditioned policy is trained that generalises well to different real goals ge.
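A minimal, purely illustrative Python sketch of such a sparse reward is given below; the Euclidean distance measure and the tolerance value of 0.05 are assumptions made only for illustration:

import numpy as np

def sparse_reward(achieved_goal, real_goal, tolerance=0.05):
    """Return 0 if the achieved goal state is within the tolerated range of the real goal g_e,
    otherwise return -1 (sparse reward signal for each transition)."""
    distance = np.linalg.norm(achieved_goal - real_goal)
    return 0.0 if distance <= tolerance else -1.0

# Usage example:
print(sparse_reward(np.array([0.0, 0.0, 0.0]), np.array([0.0, 0.0, 0.04])))  # 0.0 (in range)
print(sparse_reward(np.array([0.0, 0.0, 0.0]), np.array([0.0, 0.0, 0.50])))  # -1.0 (out of range)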
The agent interacts with the environment. The environment is fully observable, including a set S of states s, a set A of actions a, a distribution of initial states p(s0), transition probabilities p(st+1|st, at), a reward function r: S×A→ℝ and a discount factor γ∈[0,1].
UVFA essentially generalises the value functions (Q-functions) to multiple achieved goal states gs where Q-values depend not only on state-action pairs (st, at), but also on the achieved goal states gs.
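As a purely illustrative sketch, a UVFA-style value function may be realised by feeding the concatenation of state, action and goal into a single function approximator; the plain NumPy multilayer perceptron and its dimensions used below are assumptions made only for illustration:

import numpy as np

class GoalConditionedQ:
    """Minimal UVFA-style value function: Q(s, a, g) realised by one small multilayer
    perceptron over the concatenation (s || a || g)."""

    def __init__(self, state_dim, action_dim, goal_dim, hidden=64):
        rng = np.random.default_rng(0)
        in_dim = state_dim + action_dim + goal_dim
        self.w1 = 0.1 * rng.standard_normal((in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = 0.1 * rng.standard_normal((hidden, 1))
        self.b2 = np.zeros(1)

    def q_value(self, state, action, goal):
        x = np.concatenate([state, action, goal])  # the Q-value depends on state, action and goal
        h = np.tanh(x @ self.w1 + self.b1)         # hidden layer
        return (h @ self.w2 + self.b2).item()      # scalar Q-value

# Usage example: the same value function is queried for two different goals.
q = GoalConditionedQ(state_dim=10, action_dim=4, goal_dim=3)
s, a = np.zeros(10), np.zeros(4)
print(q.q_value(s, a, np.array([0.1, 0.2, 0.3])))
print(q.q_value(s, a, np.array([0.5, 0.5, 0.5])))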
Weighted entropy is an extension of Shannon entropy. The definition of weighted entropy is given by
Hw(p)=−Σk wk pk log pk
where wk is the weight of the elementary event k and pk is the probability of the elementary event k.
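A minimal, purely illustrative NumPy sketch of this definition:

import numpy as np

def weighted_entropy(p, w):
    """Weighted entropy H^w = -sum_k w_k * p_k * log(p_k) of a discrete distribution p
    with non-negative weights w; terms with p_k = 0 contribute zero."""
    p = np.asarray(p, dtype=float)
    w = np.asarray(w, dtype=float)
    mask = p > 0.0
    return float(-(w[mask] * p[mask] * np.log(p[mask])).sum())

# Usage example: with all weights equal to 1 the weighted entropy reduces to the Shannon entropy.
p = np.array([0.5, 0.25, 0.25])
print(weighted_entropy(p, np.ones_like(p)))  # approximately 1.04 (natural logarithm)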
In the following the MER multi-goal RL objective and the MEP framework of the present invention are formally described and mathematically derived.
The MER multi-goal RL is considered as goal-conditioned policy learning. Random variables are denoted with upper case letters and the values of random variables with corresponding lower case letters. Let Val(X) denote the set of valid values of a random variable X. p(x) is used to denote the probability function of the random variable X. The agent receives a goal ge∈Val(Ge) at the beginning of an episode. The agent interacts with the environment for T timesteps. At each timestep t, the agent observes a state st∈Val(St) and performs an action at∈Val(At). The agent also receives the reward r conditioned on the input real goal ge, i.e. r(st, ge)∈ℝ. A trajectory is denoted by
τ=(s1,a1,s2,a2, . . . ,sT−1,aT−1,sT)
where τ∈Val(T). The probability p(τ|ge, θ) of the trajectory τ, given the goal ge and a single-goal conditioned behaviour policy parameterised by θ∈Val(Θ), is given by
p(τ|ge,θ)=p(s1) Πt=1T−1 p(st+1|st,at) p(at|st,ge,θ)  (1)
The transition probability p(st+1|st, at) states that the probability of a state transition given an action at is independent of the real goal ge, which is denoted with St+1⊥Ge|St,At. For every τ, ge and θ, it is also assumed that p(τ|ge, θ) is non-zero. The expected return of a policy parameterised by θ is given by
η(θ)=E[Σt=1T r(St,Ge)|θ]  (2)
where E denotes the expectation of the return (accumulated rewards) under the single-goal conditioned behaviour policy θ.
Off-policy RL methods use experience replay to trade off bias against variance and potentially improve the sample-efficiency. In the off-policy case, the objective, equation (2), is given by
η(θ)=Eτg∼p(τg)[Σt=1T r(St,Ge)]  (3)
where p(τg) denotes the distribution of the goal state trajectories τg in the replay buffer. Commonly, the trajectories are randomly sampled from the replay buffer. As in the common case the trajectories in the replay buffer are often imbalanced with respect to the achieved goal states gs in the goal state trajectories τg, in MER multi-goal RL the multi-goal RL method is regularised by the MEP framework to improve performance.
In MER multi-goal RL the agent is encouraged to traverse diverse goal state trajectories τg and at the same time to maximise the expected return. A respective reward weighted entropy objective for the MER multi-goal RL is given by
ηH(θ)=Ep[log(1/p(τg)) Σt=1T r(St,Ge)]  (4)
For simplicity, p(τg) is used to represent p(τg,ge|θ), which is the occurrence probability of the goal state trajectory τg. The expectation operation is with respect to p(τg) as well, so the proposed objective is the weighted entropy of the goal state trajectory τg, which is denoted as Hpw(Tg), where the weight w is the accumulated reward Σt=1T r(St,Ge). The objective function, equation (4), has two interpretations. The first interpretation is to maximise the weighted expected return, where the rare goal state trajectories τg have larger weights log(1/p(τg)). Note that when all goal state trajectories occur uniformly, this weighting mechanism has no effect. The second interpretation is to maximise a reward weighted entropy, where the more rewarded goal state trajectories τg have higher weights w. This objective encourages the agent to learn how to achieve diverse goal states gs, as well as to maximise the expected return. In equation (4), the weight log(1/p(τg)) is unbounded, which makes the training of the universal function approximator unstable. Therefore, a safe surrogate objective ηL(θ) is provided, which is essentially a lower bound of the original reward weighted entropy objective ηH(θ).
To construct the safe surrogate objective ηL(θ), goal state trajectories τg from the replay buffer are sampled with a prioritised sampling distribution or rather proposal probability density function/distribution q(τg), while p(τg) represents the density function/distribution of the goal state trajectories in the replay buffer. The surrogate objective ηL(θ) is a lower bound of the original reward weighted entropy objective ηH(θ), i.e. ηL(θ)≤ηH(θ), where
ηL(θ)=Z Eτg∼q(τg)[Σt=1T r(St,Ge)]  (5)
and Z is the normalisation factor for q(τg).
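The following short derivation sketch indicates why such a surrogate can serve as a safe lower bound; it uses only the elementary inequality log x≤x−1 and assumes, for this sketch only, non-negative accumulated rewards (for the sparse −1/0 reward this corresponds to shifting the reward signal to 0/1):

Since log x≤x−1 for x>0, it holds that 1−p(τg)≤log(1/p(τg)) for p(τg)∈(0,1]. With the proposal distribution q(τg)=(1/Z) p(τg)(1−p(τg)) according to equation (8) below, it follows that
ηL(θ)=Z Eτg∼q[Σt=1T r(St,Ge)]=Στg p(τg)(1−p(τg)) Σt=1T r(St,Ge)≤Στg p(τg) log(1/p(τg)) Σt=1T r(St,Ge)=ηH(θ)
In other words, the unbounded weight log(1/p(τg)) of equation (4) is replaced by the bounded weight 1−p(τg), which stabilises the training of the universal function approximator.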
To optimise the surrogate objective, equation (5), the optimisation process is cast into the MEP framework or rather prioritised sampling framework. At each iteration, first the prioritised sampling distribution/proposal probability density function q(τg) is constructed, which has an equal or higher entropy than p(τg). This ensures that the agent learns from a more diverse goal state distribution. The entropy with respect to q(τg) is equal to or higher than the entropy with respect to p(τg):
Hq(Tg)≥Hp(Tg)  (6)
The probability density function of the achieved goal state trajectories in the replay buffer is p(τg), where
p(τgi)∈(0,1) and Σi=1N p(τgi)=1  (7)
The prioritised sampling distribution or rather proposal probability density function is defined as
q(τgi)=(1/Z) p(τgi)(1−p(τgi))  (8)
The proposal goal probability density function (distribution) q(τgi) has an equal or higher entropy than the probability density function p(τg) of the achieved goal state trajectories in the replay buffer:
Hq(Tg)−Hp(Tg)≥0 (9)
In order to optimise the surrogate objective, equation (5), with prioritised sampling, the probability distribution p(τg) of a goal state trajectory needs to be known. A Latent Variable Model (LVM) is used to model the underlying distribution p(τg), because LVMs are suitable for modelling complex distributions. Specifically, p(τg|zk) is used to denote the latent variable conditioned goal state trajectory probability density function (distribution), which is assumed to be Gaussian. zk is the k-th latent variable, where k∈{1, . . . , K} and K is the number of latent variables. The resulting model is a Mixture of Gaussians (MoG), mathematically:
p(τg|Φ)=(1/Z) Σk=1K ck N(τg|μk,Σk)  (10)
where each Gaussian N(τg|μk,Σk) has its own mean μk and covariance Σk, the ck are the mixing coefficients and Z is the partition function. The model parameter Φ includes all means μk, covariances Σk and mixing coefficients ck. In prioritised sampling, the complementary predictive density of a goal state trajectory τg is used as the priority, which is given by
p̄(τg|Φ)∝1−p(τg|Φ) (11)
The complementary predictive density p̄(τg|Φ) describes the likelihood that a goal state trajectory τg occurs rarely in the replay buffer. A high complementary predictive density corresponds to a rarely achieved goal state trajectory, which is prioritised for replay; the prioritised sampling distribution is therefore proportional to the complementary predictive density:
q(τg)∝p̄(τg|Φ)
With prioritised sampling, the agent learns to maximise the return of a more diverse goal state distribution. When the agent replays the samples, it first ranks all the goal state trajectories τg with respect to their proposal distribution q(τg), and then uses the ranking number directly as the probability for sampling. This means that rare achieved goal states gs have high ranking numbers and, equivalently, have higher priorities to be replayed. Here the ranking is used instead of the density directly. The reason is that the rank-based variant is more robust because it is neither affected by outliers nor by density magnitudes. Furthermore, its heavy-tail property also guarantees that the samples will be diverse. Mathematically, the probability of a trajectory to be replayed after the prioritisation is:
q(τgi)=rank(p̄(τgi|Φ)) / Σn=1N n
where N is the total number of goal state trajectories τg in the replay buffer and rank(⋅) is the ranking function, which assigns the ranking numbers 1 to N in ascending order of the complementary predictive density.
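The following purely illustrative Python sketch shows one possible realisation of the density model Φ and of the rank-based prioritisation; the use of scikit-learn's GaussianMixture as the Latent Variable Model, the flattening of each goal state trajectory into a fixed-length vector and the rescaling of the densities are assumptions made only for illustration and are not prescribed by the present invention:

import numpy as np
from sklearn.mixture import GaussianMixture

def fit_density_model(trajectories, n_components=3):
    """Fit the Mixture-of-Gaussians density model Phi on flattened goal state trajectories."""
    return GaussianMixture(n_components=n_components, covariance_type="full").fit(trajectories)

def replay_probabilities(trajectories, density_model):
    """Rank-based prioritisation: trajectories with a high complementary predictive density
    (i.e. rare trajectories) get high ranks and therefore higher replay probabilities."""
    # Predictive density p(tau_g | Phi) of each trajectory under the fitted model.
    density = np.exp(density_model.score_samples(trajectories))
    # Complementary predictive density, proportional to 1 - p(tau_g | Phi); the densities are
    # rescaled to [0, 1] so that the complement stays non-negative (only the ranking matters).
    complementary = 1.0 - density / density.max()
    # Ranking numbers 1 ... N in ascending order of the complementary predictive density,
    # i.e. the rarest trajectory receives the highest ranking number N.
    ranks = np.argsort(np.argsort(complementary)) + 1
    return ranks / ranks.sum()

# Usage example with N = 100 random placeholder trajectories flattened to length 15:
tau_g = np.random.randn(100, 15)
phi = fit_density_model(tau_g)
q = replay_probabilities(tau_g, phi)
replay_indices = np.random.choice(len(tau_g), size=16, p=q)  # prioritised minibatch of indices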
Thus, MER multi-goal RL is provided to enable RL agents to learn more efficiently in multi-goal tasks. Further, a goal entropy term is integrated into the expected return objective, resulting in the reward weighted entropy objective, equation (4). To maximise the reward weighted entropy objective, equation (4), a surrogate objective is derived, i.e. a lower bound of the original reward weighted entropy objective. Prioritised sampling based on a higher entropy proposal distribution is used in each iteration and off-policy RL methods are used to maximise the expected return. This framework is implemented as the MEP framework.
In the following an exemplary algorithm according to the present invention is given in pseudocode:
while not converged:
    Sample a real goal ge with probability p(ge) and an initial state s0 with probability p(s0)
    for each episode of the current epoch:
        Sample an action at from the single-goal conditioned behaviour policy θ (UVFA)
        Step the environment with the sampled action at for a new state st+1
        Update the replay buffer with the current state st and the current action at
        Construct the prioritised sampling distribution q(τg) with higher entropy Hq(Tg) than p(τg)
        Sample goal state trajectories τg with q(τg|Φ)
        Update the single-goal conditioned behaviour policy θ to maximise Eq[r(St, Ge)]
    Update the density model Φ
The iteration may continue until the method has converged to the optimal policy or until a predefined criterion is met (e.g. number of epochs).
The computer-implemented method according to the first aspect of the present invention (MER multi-goal RL method) improves the performance and sample-efficiency in training AI systems at a fair trade-off in computational time.
According to a refinement of the present invention the step of updating (7) the single-goal conditioned behaviour policy θ is based on a Deep Deterministic Policy Gradient (DDPG) method and/or on a Hindsight Experience Replay (HER) method.
DDPG, which is essentially an off-policy actor-critic method, shows promising performance for continuous control tasks. Thereby, the ideas underlying the success of Deep Q-Learning are adapted to the continuous action domain. The actor-critic, model-free method is based on the deterministic policy gradient and can operate over continuous action spaces. Using the same learning algorithm, network architecture and hyper-parameters allows for robustly solving tasks, including classic problems such as cartpole swing-up, dexterous manipulation, legged locomotion and car driving. The method is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain and its derivatives.
In particular for robotic tasks, if the goal is challenging and the reward is sparse, the agent could perform badly for a long time before learning anything. HER encourages the agent to learn from whatever goal states it has achieved. HER makes training possible in challenging robotic tasks via goal relabeling, i.e. randomly substituting real goals ge with achieved goals gs. Dealing with sparse rewards is one of the biggest challenges in Reinforcement Learning (RL). HER allows sample-efficient learning from rewards which are sparse and binary and therefore avoids the need for complicated reward engineering. It can be combined with an arbitrary off-policy RL algorithm and may be seen as a form of implicit curriculum.
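A minimal, purely illustrative Python sketch of HER-style goal relabeling is given below; the buffer layout, the 'future' relabeling strategy and the number k of relabeled goals per transition are assumptions made only for illustration:

import numpy as np

def her_relabel(episode, reward_fn, k=4):
    """For each transition, additionally store k copies in which the real goal g_e is replaced
    by an achieved goal state g_s reached later in the same episode ('future' strategy).
    `episode` is a list of dicts with the keys 'state', 'action', 'next_state',
    'achieved_goal' (the achieved goal state g_s of the transition) and 'real_goal'."""
    relabeled = []
    T = len(episode)
    for t, tr in enumerate(episode):
        # Original transition with the sparse reward for the real goal g_e.
        relabeled.append(dict(tr, reward=reward_fn(tr["achieved_goal"], tr["real_goal"])))
        # Relabeled transitions: substitute the real goal with later achieved goal states.
        for ft in np.random.randint(t, T, size=min(k, T - t)):
            new_goal = episode[ft]["achieved_goal"]
            relabeled.append(dict(tr, real_goal=new_goal,
                                  reward=reward_fn(tr["achieved_goal"], new_goal)))
    return relabeled

# Usage example with a dummy episode of three transitions and a sparse -1/0 reward:
reward_fn = lambda achieved, goal: 0.0 if np.linalg.norm(achieved - goal) <= 0.05 else -1.0
episode = [dict(state=np.zeros(10), action=np.zeros(4), next_state=np.zeros(10),
                achieved_goal=np.array([0.0, 0.0, 0.1 * t]), real_goal=np.array([0.0, 0.0, 0.5]))
           for t in range(3)]
print(len(her_relabel(episode, reward_fn)))  # original transitions plus relabeled copies

The relabeled transitions are stored in the replay buffer alongside the original ones, so that the off-policy RL algorithm can also learn from goals that were actually achieved.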
In the following an exemplary algorithm according to the refinement of the present invention is given in pseudo-code:
while not converged:
    Sample a real goal ge with probability p(ge) and an initial state s0 with probability p(s0)
    for each episode of the current epoch:
        Sample an action at from the single-goal conditioned behaviour policy θ (UVFA)
        Step the environment with the sampled action at for a new state st+1
        Update the replay buffer with the current state st and the current action at
        Construct the prioritised sampling distribution q(τg) with higher entropy Hq(Tg) than p(τg)
        Sample goal state trajectories τg with q(τg|Φ)
        Update the single-goal conditioned behaviour policy θ to maximise Eq[r(St, Ge)] via DDPG and HER
    Update the density model Φ
With DDPG and especially with DDPG and HER the performance in continuous control tasks (e.g. robotic (simulation) scenarios) can be improved.
The present invention and its technical field are subsequently explained in further detail by exemplary embodiments shown in the drawings. The exemplary embodiments merely serve a better understanding of the present invention and in no case are to be construed as limiting the scope of the present invention. Particularly, it is possible to extract aspects of the subject-matter described in the figures and to combine them with other components and findings of the present description or figures, if not explicitly described differently. Equal reference signs refer to the same objects, such that explanations from other figures may be supplementally used.
As schematically depicted, the computer-implemented method of training AI systems comprises the following steps.
In the step of sampling 1 the real goal ge, the real goal ge of a multitude of real goals Ge with a probability p(ge) and an initial state s0 with a probability p(s0) are sampled. The real goals ge of the multitude of real goals Ge are environmental goals like the desired position and orientation of an object which has to be manipulated by a robot arm. The initial state s0 comprises, for example, the initial position and orientation of the robot arm and all its joints.
In the step of sampling 2 an action at, an action at is sampled from the single-goal conditioned behaviour policy θ that is represented by a Universal Value Function Approximator (UVFA). The actions at lead from the current state st to the next state st+1. The states st comprise two sub-vectors, one achieved goal state sg (e.g. position and orientation of the object being manipulated) and one context state sc (s=(sg∥sc)). An achieved goal gs can be represented by a state and, thus, the achieved goal states can be written as gs=sg. UVFA essentially generalises the value functions (Q-functions) to multiple achieved goal states gs, where Q-values depend not only on state-action pairs (st, at), but also on the achieved goal states gs.
In the step of stepping 3 the environment, the environment is stepped with the sampled action at for a new state st+1.
In the step of updating 4 the replay buffer, the replay buffer that comprises a distribution p(τg) of goal state trajectories τg is updated with the current state st and the current action at. The goal state trajectories τg contain pairs of states st from a multitude of states St and corresponding actions at from a multitude of actions At.
In the step of constructing 5 the prioritised sampling distribution or rather proposal probability density function q(τg), the prioritised sampling distribution q(τg) is constructed with a higher entropy Hq(Tg) than the distribution p(τg) of goal state trajectories τg in the replay buffer. Goal state trajectories τg with a lower probability are more likely to be chosen due to the prioritised sampling distribution q(τg). This leads to a more uniform selection of goal state trajectories τg.
In the step of sampling 6 the goal state trajectories τg, the goal state trajectories τg are sampled with the prioritised sampling distribution q(τg) and a current density model Φ (q(τg|Φ)).
In the step of updating 7 the single-goal conditioned behaviour policy θ, the single-goal conditioned behaviour policy θ is updated towards a maximum of the expectation Eq of the reward r for the states St and real goals Ge (max Eq[r(St, Ge)]).
After the previous steps 2 to 7 have been iteratively executed for each episode of the current epoch fe2, in the step of updating 8 the density model Φ, the density model Φ is updated once per epoch fe1.
All iterative steps are executed as long as the computer-implemented method has not converged. The method may be regarded as converged when the optimal policy has been reached and/or when a predefined criterion is fulfilled (e.g. a number of epochs). This is checked (y: yes/n: no) 9 after each epoch or before a new epoch is started. For example, a criterion for convergence may be a simple upper limit C of epochs, for example C=200. The upper limit C is preferably between 50 and 200.
The interaction of the agent 10 (AI system) with the environment 20 during the training is schematically depicted and proceeds as follows.
In each episode of each epoch the agent 10 (AI system) samples an action at from the single-goal conditioned behaviour policy θ represented as UVFA (step 2).
Then the environment 20 is stepped with the sampled action at from the current state st (e.g. current position and orientation of the object being manipulated and of the robot arm used for manipulation) to the next state st+1 (step 3).
Based on the sampled action at and the state st the replay buffer is updated (step 4).
Afterwards the prioritised sampling distribution or rather proposal probability density function q(τg) with higher entropy Hq(Tg) than the distribution p(τg) of goal state trajectories τg in the replay buffer is constructed (step 5). With the constructed prioritised sampling distribution q(τg) the goal state trajectories τg in the replay buffer are sampled (step 6).
The sampled goal state trajectories τg, the new state st+1 (state for the next iteration/episode) and the corresponding action at are provided to the agent 10 for gaining "new experience". Further, the single-goal conditioned behaviour policy θ is updated to max Eq[r(St, Ge)] (step 7, not depicted in the figure).
After the episodes of the current epoch the density model Φ of the agent 10 is updated (step 8).
The steps 2 to 8 are iteratively repeated as described above for each epoch of the training. After the method has converged, no further epoch of the training is started (by sampling 1 a new real goal ge and a new initial state s0, step 1).
The performance of the method according to the present invention (MER multi-goal RL method) has been tested on a variety of simulated robotic tasks (i.e. OpenAI Gym: Push, Pick & Place, Slide, Egg, Block and Pen) and compared with state-of-the-art methods as baselines, including DDPG and HER. The most similar method to MER multi-goal RL seems to be Prioritised Experience Replay (PER) (combined with DDPG(+HER)). In the experiments, first the performance improvement of MER multi-goal RL has been compared to DDPG with/without HER and to PER (with DDPG with/without HER). Afterwards, the time-complexity of MER multi-goal RL has been compared to DDPG(+HER) and to PER(+DDPG(+HER)). As will be subsequently described in detail, MER multi-goal RL improves performance with much less computational time than DDPG(+HER) and PER.
A principal difference between MER multi-goal RL and PER is that PER uses TD-errors, while MER multi-goal RL is based on the entropy.
To test the performance difference among the methods DDPG, PER+DDPG and MER multi-goal RL (MERmgRL)+DDPG, the experiment has been run in the three robot arm environments of OpenAI Gym. DDPG has been used as the baseline because the robot arm environment is relatively simple. In the more challenging robot hand environments DDPG+HER has been used as the baseline and the performance among DDPG+HER, PER+DDPG+HER and MER multi-goal RL+DDPG+HER has been tested. To combine PER with HER, the TD-error of each transition has been calculated based on the randomly selected achieved goals. Then the transitions with higher TD-errors have been prioritised for replay. The mean success rates have been compared. Each experiment has been carried out with 5 random seeds, and the shaded areas in the corresponding plots indicate the variation across these random seeds.
To compare the sample-efficiency of the baseline and MER multi-goal RL, the number of training samples needed for a certain mean success rate has been compared.
To further understand why maximum entropy in goal space facilitates learning, the TD-errors during training have been examined, in particular the correlation between the complementary predictive density p̄(τg|Φ) and the TD-errors of the sampled trajectories.
An embodiment of the computer-readable medium according to the third aspect of the present invention is schematically depicted.
Here, exemplarily, a computer-readable storage disc 20 like a Compact Disc (CD), Digital Video Disc (DVD), High Definition DVD (HD DVD) or Blu-ray Disc (BD) has stored thereon the computer program according to the second aspect of the present invention.
An embodiment of the data processing system 30 according to the fourth aspect of the present invention is schematically depicted.
The data processing system 30 may be a personal computer (PC), a laptop, a tablet, a server, a distributed system (e.g. cloud system) and the like. The data processing system 30 comprises a central processing unit (CPU) 31, a memory having a random access memory (RAM) 32 and a non-volatile memory (MEM, e.g. hard disk) 33, a human interface device (HID, e.g. keyboard, mouse, touchscreen etc.) 34 and an output device (MON, e.g. monitor, printer, speaker, etc.) 35. The CPU 31, RAM 32, HID 34 and MON 35 are communicatively connected via a data bus. The RAM 32 and MEM 33 are communicatively connected via another data bus. The computer program according to the second aspect of the present invention may be loaded into the RAM 32 from the MEM 33 or from a computer-readable medium and executed by the CPU 31, whereby the data processing system 30 carries out the steps of the method according to the first aspect of the present invention.
In particular, the CPU 31 and RAM 32 for executing the computer program may comprise several CPUs 31 and several RAMs 32, for example in a computation cluster or a cloud system. The HID 34 and MON 35 for controlling execution of the computer program may be comprised by a different data processing system like a terminal communicatively connected to the data processing system 30 (e.g. cloud system).
Although specific embodiments have been illustrated and described herein, it will be appreciated by those of ordinary skill in the art that a variety of alternate and/or equivalent implementations exist. It should be appreciated that the exemplary embodiment or exemplary embodiments are only examples, and are not intended to limit the scope, applicability, or configuration in any way. Rather, the foregoing summary and detailed description will provide those skilled in the art with a convenient road map for implementing at least one exemplary embodiment, it being understood that various changes may be made in the function and arrangement of elements described in an exemplary embodiment without departing from the scope as set forth in the appended claims and their legal equivalents. Generally, this application is intended to cover any adaptations or variations of the specific embodiments discussed herein.
In the foregoing detailed description, various features are grouped together in one or more examples for the purpose of streamlining the disclosure. It is understood that the above description is intended to be illustrative, and not restrictive. It is intended to cover all alternatives, modifications and equivalents as may be included within the scope of the invention. Many other examples will be apparent to one skilled in the art upon reviewing the above specification.
Specific nomenclature used in the foregoing specification is used to provide a thorough understanding of the invention. However, it will be apparent to one skilled in the art in light of the specification provided herein that the specific details are not required in order to practice the invention. Thus, the foregoing descriptions of specific embodiments of the present invention are presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the invention to the precise forms disclosed; obviously many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated. Throughout the specification, the terms “including” and “in which” are used as the plain-English equivalents of the respective terms “comprising” and “wherein,” respectively. Moreover, the terms “first,” “second,” and “third,” etc., are used merely as labels, and are not intended to impose numerical requirements on or to establish a certain ranking of importance of their objects. In the context of the present description and claims the conjunction “or” is to be understood as including (“and/or”) and not exclusive (“either . . . or”).