Aspects of the present disclosure relate generally to artificial intelligence (AI), and more particularly, to training a Neural Network (NN) model for imitating a demonstrator's behavior.
Imitation learning (IL) has been used in many real-world applications, such as automatically playing computer games, automatically playing chess, intelligent self-driving assistance, intelligent robotic locomotion, and so on. However, it remains challenging for a learner or agent implemented by a neural network model to learn skills from long-horizon, unannotated demonstrations.
There are mainly two kinds of imitation learning methods: behavioral cloning (BC) and inverse reinforcement learning (IRL). Behavioral cloning, while appealingly simple, tends to succeed only with large amounts of data, due to compounding error. Inverse reinforcement learning learns a cost function that prioritizes entire trajectories over others, so compounding error, a problem for methods that fit single-time-step decisions, is not an issue. Accordingly, IRL has succeeded in a wide range of problems, but many IRL algorithms are extremely expensive to run on computing resources. As an implementation of IRL, generative adversarial imitation learning (GAIL) is an imitation learning method that directly learns a policy based on expert data without learning the reward function, thus greatly reducing the amount of computation.
Although GAIL exhibits decent performance, improvements in the structure and performance of imitation learning would still be desirable.
The disclosure proposes a novel and enhanced hierarchical imitation learning framework, Option-GAIL, which is efficient, robust and effective in training a neural network model to imitate a demonstrator's behavior in various practical applications such as self-driving assistance, robotic locomotion, AI computer games and so on. The neural network model being trained to imitate the demonstrator's behavior may be referred to as an agent, learner, imitator or the like.
According to an embodiment, there is provided a method for training a Neural Network (NN) model for imitating a demonstrator's behavior, comprising: obtaining demonstration data representing the demonstrator's behavior for performing a task, the demonstration data including state data, action data and option data, wherein the state data correspond to a condition for performing the task, the option data correspond to subtasks of the task, and the action data correspond to the demonstrator's actions performed for the task; sampling learner data representing the NN model's behavior for performing the task based on a current learned policy, the learner data including state data, action data and option data, wherein the state data correspond to a condition for performing the task, the option data correspond to subtasks of the task, and the action data correspond to the NN model's actions performed for the task, and wherein the policy consists of a high level policy part for determining a current option and a low level policy part for determining a current action; and updating the policy by using a generative adversarial imitation learning (GAIL) process based on the demonstration data and the learner data.
According to an embodiment, there is provided a method for training a Neural Network (NN) model for self-driving assistance, comprising: training the NN model for self-driving assistance using the method as mentioned above as well as the method according to aspects of the disclosure, wherein the demonstration data represents a driver's behavior for driving a vehicle.
According to an embodiment, there is provided a method for training a Neural Network (NN) model for controlling robot locomotion, comprising: training the NN model for controlling robot locomotion using the method as mentioned above as well as the method according to aspects of the disclosure, wherein the demonstration data represents a demonstrator's locomotion for performing a task.
According to an embodiment, there is provided a method for controlling a machine with a trained Neural Network (NN) model, comprising: collecting environment data related to performing a task by the machine; obtaining state data and option data for the current time instant based at least in part on the environment data; inferring action data for the current time instant based on the state data and the option data for the current time instant with the trained NN model; and controlling action of the machine based on the action data for the current time instant.
According to an embodiment, there is provided a vehicle capable of self-driving assistance, comprising: sensors configured for collecting at least a part of environment data related to performing self-driving assistance by the vehicle; one or more processors; and one or more storage devices storing computer-executable instructions that, when executed, cause the one or more processors to perform the operations of the method as mentioned above as well as the operations of the method according to aspects of the disclosure.
According to an embodiment, there is provided a robot capable of automatic locomotion, comprising: sensors configured for collecting at least a part of environment data related to performing automatic locomotion by the robot; one or more processors; and one or more storage devices storing computer-executable instructions that, when executed, cause the one or more processors to perform the operations of the method as mentioned above as well as the operations of the method according to aspects of the disclosure.
According to an embodiment, there is provided a computer system, which comprises one or more processors and one or more storage devices storing computer-executable instructions that, when executed, cause the one or more processors to perform the operations of the method as mentioned above as well as the operations of the method according to aspects of the disclosure.
According to an embodiment, there is provided one or more computer-readable storage media storing computer-executable instructions that, when executed, cause one or more processors to perform the operations of the method as mentioned above as well as the operations of the method according to aspects of the disclosure.
According to an embodiment, there is provided a computer program product comprising computer-executable instructions that, when executed, cause one or more processors to perform the operations of the method as mentioned above as well as the operations of the method according to aspects of the disclosure.
By using the hierarchical option-GAIL training method, the training efficiency, robustness and effectiveness as well as the inference accuracy of the trained NN model are improved. Other advantages and enhancements are explained in the description hereafter.
The disclosed aspects will hereinafter be described in connection with the appended drawings that are provided to illustrate and not to limit the disclosed aspects.
The present disclosure will now be discussed with reference to several example implementations. It is to be understood that these implementations are discussed only for enabling those skilled in the art to better understand and thus implement the embodiments of the present disclosure, rather than suggesting any limitations on the scope of the present disclosure.
Various embodiments will be described in detail with reference to the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts. References made to particular examples and embodiments are for illustrative purposes, and are not intended to limit the scope of the disclosure.
The apparatus 100 illustrated in the accompanying figure is an example of a vehicle capable of self-driving assistance.
The vehicle 100 can be equipped with various sensors 110 for sensing the condition in which the vehicle is running. The term condition may also be referred to as circumstance, state, context and so on. In the illustrated example, the sensors 110 provide sensor data to a processing system 120 of the vehicle.
The apparatus 100 may include a processing system 120. The processing system 120 may be implemented in various ways. For example, the processing system 120 may include one or more processors and/or controllers as well as one or more memories, and the processors and/or controllers may execute software to perform various operations or functions, such as operations or functions according to various aspects of the disclosure.
The processing system 120 may receive sensor data from the sensors 110, and perform various operations by analyzing the sensor data. In the illustrated example, the processing system 120 includes a condition detection module 1210 and an action determination module 1220.
The condition detection module 1210 may be configured to determine conditions relating to the operation of the car.
The condition relating to the operation of the car may include weather, absolute speed of the car, relative speed to a preceding car, distance from the preceding car, distances from nearby cars, azimuth angles relative to nearby cars, presence or absence of an obstacle, distance to an obstacle, and so on. It is appreciated that the condition may include other types of data, such as navigation data from a navigation system, and may not include all of the example data listed above. Some of the condition data may be directly obtained by the sensors 110 and provided to the processing system 120.
The action determination module 1220 determines the action to be performed by the car according to the condition data or state data from the condition detection module 1210. The action determination module 1220 may be implemented with a trained NN model, which can imitate a human driver's behavior for driving a car. For example, the action determination module 1220 can obtain the state data, such as the example condition data above, for the current time step and infer the action to be performed for the current time step based on the obtained state data.
At block 210, training data are obtained. The training data may also be referred to as demonstration data, expert data and so on, which represent the behavior of demonstrators or experts for performing a task such as driving a car. The demonstration data may be in the form of a trajectory, which includes a series of data instances for a series of time steps along the trajectory. For example, a trajectory τ = (s0:T, a0:T), where s0:T denotes (s0, . . . , sT) representing multiple state instances for T+1 time steps, and a0:T denotes (a0, . . . , aT) representing multiple action instances for T+1 time steps. The training data set may be denoted as Ddemo = {τE = (s0:T, a0:T)}, where Ddemo is the demonstration data set and τE is a trajectory representing the demonstration data of an expert or demonstrator.
The state st may be defined in multiple dimensions; for example, the dimensions may represent the above example types of condition data such as weather, speed, distance, navigation information and so on. The action at may be defined in multiple dimensions; for example, the dimensions may represent the action that would be taken by an expert driver, such as braking, steering, parking and so on. It is appreciated that trajectory data composed of states and actions are known in the art and the disclosure is not limited thereto. In order to obtain the demonstration data, human drivers may drive the car shown in the accompanying figure, so that their driving behavior can be recorded as demonstration trajectories.
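As an illustration only, a demonstration trajectory and data set could be represented as in the following sketch; the class name, field layout and dimensions are assumptions for illustration and do not come from the disclosure.

```python
# Hypothetical container for demonstration trajectories; field names and
# dimensions are illustrative assumptions.
from dataclasses import dataclass
import numpy as np

@dataclass
class Trajectory:
    states: np.ndarray   # shape (T + 1, state_dim), e.g. speed, distances, weather flags
    actions: np.ndarray  # shape (T + 1, action_dim), e.g. steering and braking commands

def make_demo_set(raw_logs):
    """Wrap raw (state, action) logs recorded from expert drivers into trajectories."""
    return [Trajectory(states=np.asarray(s), actions=np.asarray(a)) for s, a in raw_logs]

# Example with random placeholder data for a 100-step trajectory.
demo_set = make_demo_set([(np.random.randn(101, 8), np.random.randn(101, 2))])
```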
At step 220, an NN model may be trained with the demonstration data to imitate the behavior of the expert for performing a task, such as the example driving task. The NN model may be referred to as a learner, agent, imitator and so on. In an embodiment, a new Option-GAIL hierarchical framework may be used to train the agent model. The Option-GAIL hierarchical framework will be illustrated hereafter.
A Markov Decision Process (MDP) is a 6-element tuple (S, A, Ps,s′a, Rsa, μ0, γ), where (S, A) denote the state-action space; for example, the above example state data s and action data a of a trajectory belong to the state-action space. Ps,s′a = P(st+1 = s′ | st = s, at = a) is the transition probability of the next state s′ ∈ S given the current state s ∈ S and action a ∈ A, determined by the environment; Rsa = 𝔼[rt | st = s, at = a] returns the expected reward from the task when taking action a on state s; μ0(s) denotes the initial state distribution; and γ ∈ [0, 1) is a discounting factor. The effectiveness of a policy π(a|s) is evaluated by its expected infinite-horizon reward:
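A plausible form of this quantity, assuming the standard discounted-return convention for an MDP, is:

\[ \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, R_{s_t}^{a_t}\Big], \]

where the expectation is taken over trajectories generated by the initial state distribution μ0, the transition probability and the policy π.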
Options O = {1, . . . , K} may be introduced for modeling the policy-switching procedure on a long-term task, where K represents the number of options. In an example, the options may correspond to subtasks or scenarios of a task. For example, for the task of autonomous driving, different scenarios may include an express way, a city way with high, normal or low traffic, a mountain way, a rough road, parking, day driving, night driving, conditional weather such as rain, snow or fog, or some combination of the above scenarios. An option model is defined as a tuple ϑ = (S, A, O, {Io, πo, βo}o∈O, πϑ(o|s), Ps,s′a, Rsa, μ0, γ), where S, A, Ps,s′a, Rsa, μ0, γ are defined the same as in the MDP; Io ⊆ S denotes an initiation set of states from which an option o can be activated; βo(s) = P(b = 1 | s) is a termination function which decides whether the current option should be terminated or not on a given state s; πo(a|s) is the intra-option policy that determines an action on a given state within an option o; and a new option is activated in the call-and-return style by an inter-option policy πϑ(o|s) once the last (i.e., previous) option terminates.
Generative adversarial imitation learning (GAIL) (Ho, J. and Ermon, S. Generative adversarial imitation learning. In Proc. Advances in Neural Inf. Process. Syst., 2016.) is an imitation learning method that casts policy learning upon a Markov Decision Process (MDP) into an occupancy measurement matching problem. Given expert demonstrations Ddemo = {τE = (s0:T, a0:T)} on a specified task, such as driving a car, imitation learning aims at finding a policy π that can best reproduce the expert's behavior without access to the real reward. GAIL casts the original maximum-entropy inverse reinforcement learning problem into an occupancy measurement matching problem:
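A plausible form of this matching problem, assuming the formulation of the GAIL work cited above with a causal-entropy regularizer H(π) weighted by λ, is:

\[ \min_{\pi}\; D_f\big(\rho_{\pi}(s,a)\,\|\,\rho_{\pi_E}(s,a)\big) \;-\; \lambda H(\pi), \]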
where Df computes the f-divergence between ρπ(s, a) and ρπE(s, a), the occupancy measurements induced by the agent's policy π and the expert policy πE, respectively.
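When the f-divergence is specified as the Jensen–Shannon divergence, the matching problem is presumably solved as the following min-max game with a discriminator D(s, a), in line with the cited GAIL formulation:

\[ \min_{\pi}\,\max_{D}\;\; \mathbb{E}_{\pi}\big[\log D(s,a)\big] \;+\; \mathbb{E}_{\pi_E}\big[\log\big(1-D(s,a)\big)\big] \;-\; \lambda H(\pi), \]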
where 𝔼π denotes the expectation under the agent's learned policy π, and 𝔼πE denotes the expectation under the expert policy πE.
The above-introduced option model ϑ = (S, A, O, {Io, πo, βo}o∈O, πϑ(o|s), Ps,s′a, Rsa, μ0, γ) may be used for modeling the switching procedure on hierarchical subtasks. However, it is inconvenient to directly learn the policies πo and πϑ of this framework due to the difficulty of determining the initiation set Io and the termination condition βo.
In one embodiment, this option model may be converted to a one-step option model, which is defined as ϑone-step = (S, A, O+, πH, πL, Ps,s′a, Rsa, μ̃0, γ), where O+ = O ∪ {#} consists of all options plus a special initial option class # satisfying o−1 ≡ # and β#(s) ≡ 1. Besides, μ̃0(s, o) ≐ μ0(s)·1(o = #), where 1(x = y) is the indicator function, which is equal to 1 iff x = y and 0 otherwise. Among the above math symbols, "≡" stands for "identically equal to", "≐" stands for "be defined as", and "iff" stands for "if and only if". The high-level policy πH and the low-level policy πL are defined as:
It can be derived that the one-step option model is equivalent to the full option model, that is, ϑone-step is equivalent to ϑ, under practical assumptions: each state is switchable, Io = S, ∀o ∈ O, and each switching is effective, i.e., a terminated option is not deterministically re-activated, P(ot = ot−1 | βot−1(st) = 1) < 1.
This equivalence is beneficial, as the switching behavior can be characterized by only looking at the high-level policy πH and the low-level policy πL, without the need to justify the exact beginning/breaking condition of each option. An overall policy π̃ may be defined as π̃ ≐ (πH, πL), and Π̃ = {π̃} denotes the set of such policies.
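As a concrete illustration only, the hierarchical policy π̃ = (πH, πL) could be parameterized by two small neural networks, for example as sketched below. The network sizes, the one-hot option encoding and the Gaussian action head are assumptions made for illustration and are not mandated by the disclosure; a discrete-action task would instead use a categorical action head.

```python
# A minimal sketch (not the claimed implementation) of pi_H(o | s, o_prev)
# and pi_L(a | s, o) as torch modules.
import torch
import torch.nn as nn

class HighLevelPolicy(nn.Module):
    """pi_H(o | s, o_prev): selects the current option given state and previous option."""
    def __init__(self, state_dim, num_options, hidden=64):
        super().__init__()
        # o_prev is one-hot over the K options plus the special initial option '#'.
        self.net = nn.Sequential(
            nn.Linear(state_dim + num_options + 1, hidden), nn.Tanh(),
            nn.Linear(hidden, num_options))

    def forward(self, state, prev_option_onehot):
        logits = self.net(torch.cat([state, prev_option_onehot], dim=-1))
        return torch.distributions.Categorical(logits=logits)

class LowLevelPolicy(nn.Module):
    """pi_L(a | s, o): outputs an action distribution given state and current option."""
    def __init__(self, state_dim, num_options, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + num_options, hidden), nn.Tanh(),
            nn.Linear(hidden, action_dim))
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def forward(self, state, option_onehot):
        mean = self.net(torch.cat([state, option_onehot], dim=-1))
        return torch.distributions.Normal(mean, self.log_std.exp())
```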
In order to take advantage of the one-step option model ϑone-step and GAIL, an option-occupancy measurement may be defined as:
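By analogy with the standard occupancy measurement ρπ(s, a) of GAIL, a plausible form of this definition is the discounted visitation distribution of state-action-option tuples:

\[ \rho_{\tilde{\pi}}(s,a,o,o') \;\doteq\; \sum_{t=0}^{\infty} \gamma^{t}\, P\big(s_t=s,\, a_t=a,\, o_t=o,\, o_{t-1}=o' \,\big|\, \tilde{\pi},\, \tilde{\mu}_0,\, P^{a}_{s,s'}\big). \]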
The measurement ρπ̃(s, a, o, o′) can be explained as the distribution of the state-action-option tuples generated by the policy π̃, composed of the high-level policy part πH and the low-level policy part πL, on a given μ̃0 and Ps,s′a. According to the Bellman flow constraint, one can easily obtain that the option-occupancy measurement ρπ̃(s, a, o, o′) belongs to a feasible set D of affine constraints.
GAIL alone is not well suited to training an agent to imitate an expert's behavior for performing a task from long-term demonstrations, such as long-term trajectory data, since it is hard to capture the hierarchy of subtasks with an MDP. In an embodiment, a long-term task that can be divided into multiple subtasks may be modeled via the one-step option model upon GAIL, and the policy π̃ is learned by minimizing the discrepancy of the occupancy measurement between expert and agent.
Intuitively, for hierarchical subtasks, the action determined by the agent depends not only on the current state observed but also on the current option selected. By the definition of the one-step option model ϑone-step, the hierarchical policy π̃ is relevant to the information of the current state, the current action, the last-time option and the current option. In an embodiment, the option-occupancy measurement is utilized instead of the conventional occupancy measurement to depict the discrepancy between expert and agent. Actually, there is a one-to-one correspondence between the set of policies Π̃ and the feasible set D of the affine constraints.
For each ρ ∈ D, ρ is the option-occupancy measurement of the following policy:
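A natural reconstruction of this policy, obtained by conditioning ρ in the same way a single-level policy is recovered from ρπ(s, a) in GAIL, is:

\[ \pi_H(o \mid s, o') = \frac{\sum_{a}\rho(s,a,o,o')}{\sum_{a,\bar o}\rho(s,a,\bar o,o')}, \qquad \pi_L(a \mid s, o) = \frac{\sum_{o'}\rho(s,a,o,o')}{\sum_{a',o'}\rho(s,a',o,o')}, \]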
and π̃ = (πH, πL) is the only policy whose option-occupancy measurement is ρ.
With the above observation, optimizing the option policy is equivalent to optimizing its induced option-occupancy measurement, since ρπ̃(s, a, o, o′) = ρπ̃′(s, a, o, o′) holds if and only if π̃ = π̃′.
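On this basis, a plausible form of the matching objective referred to below as Equation (5) is the option-level analogue of the GAIL objective above:

\[ \min_{\tilde{\pi}}\; D_f\big(\rho_{\tilde{\pi}}(s,a,o,o')\,\|\,\rho_{\tilde{\pi}_E}(s,a,o,o')\big). \]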
Note that the optimization problem defined by Equation (5) implies the optimization problem defined by Equation (3), but not vice versa: since ρπ̃(s, a) = Σo,o′ ρπ̃(s, a, o, o′), matching the option-occupancy measurements, ρπ̃(s, a, o, o′) = ρπ̃E(s, a, o, o′), also matches the marginal occupancy measurements, ρπ̃(s, a) = ρπ̃E(s, a), whereas matching the marginals alone does not constrain how the behavior is organized into options.
In an embodiment, the expert options are observable and are given in the training data; therefore, the option-extended expert demonstrations, denoted as Ddemo = {τ̃E = (s0:T, a0:T, o−1:T)}, where τ̃E is a trajectory additionally containing option data, may be used to train the hierarchical policy π̃.
Rather than calculating the exact value of the option-occupancy measurement, the discrepancy may be estimated by virtue of adversarial learning. A parametric discriminator is defined as Dθ(s, a, o, o′): S × A × O × O+ → (0, 1). When the f-divergence is specified as the Jensen–Shannon divergence, Equation (5) can be converted to a min-max game:
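Assuming the same sign convention as the GAIL game above (so that log Dθ later serves as a cost for the agent), a plausible form of this min-max game is:

\[ \min_{\tilde{\pi}}\,\max_{\theta}\;\; \mathbb{E}_{\tilde{\pi}}\big[\log D_{\theta}(s,a,o,o')\big] \;+\; \mathbb{E}_{\tilde{\pi}_E}\big[\log\big(1-D_{\theta}(s,a,o,o')\big)\big] \;-\; \lambda \mathcal{H}(\tilde{\pi}). \]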
The inner loop of equation (6) is to train Dθ(s, a, o, o′) with the expert demonstrations Ddemo and the samples generated by self-exploration with the learned policy π̃. It is appreciated that θ denotes the parameters of the discriminator Dθ(s, a, o, o′), which is trained by optimizing θ; the expectation 𝔼π̃ in equation (6) may be estimated from the samples generated by the agent, and 𝔼π̃E from the expert demonstrations. The outer loop of equation (6) is to update the policy π̃ by minimizing the cost induced by the discriminator, which corresponds to a hierarchical reinforcement learning (HRL) problem, Equation (7):
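A plausible form of Equation (7), assuming the standard GAIL-style conversion of the discriminator output into a cost, is:

\[ \min_{\tilde{\pi}}\;\; \mathbb{E}_{\tilde{\pi}}\Big[\sum_{t=0}^{\infty}\gamma^{t}\, c(s_t, a_t, o_t, o_{t-1})\Big] \;-\; \lambda\,\mathcal{H}(\tilde{\pi}), \]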
where c(s, a, o, o′) = log Dθ(s, a, o, o′), and the causal entropy ℋ(π̃t) = 𝔼π̃[−log πH − log πL] is used as a policy regularizer with λ ∈ [0, 1]. The cost function is related to options, which is different from many HRL problems with option-agnostic cost/reward (Zhang, S. and Whiteson, S. DAC: The double actor-critic architecture for learning options. In Proc. Advances in Neural Inf. Process. Syst., 2019). In order to deal with the cost function related to options, equation (7) may be optimized using a similar idea to DAC.
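A minimal sketch of how such an option-aware discriminator and its inner-loop update could look is given below; the network architecture, the one-hot option encoding, the label convention (expert pushed toward 0, agent toward 1, matching the cost c = log Dθ) and the optimizer usage are illustrative assumptions rather than the claimed implementation.

```python
# Hypothetical option-aware discriminator D_theta(s, a, o, o') and one
# inner-loop update step.
import torch
import torch.nn as nn

class OptionDiscriminator(nn.Module):
    def __init__(self, state_dim, action_dim, num_options, hidden=64):
        super().__init__()
        # o is one-hot over K options; o_prev additionally includes '#'.
        in_dim = state_dim + action_dim + num_options + (num_options + 1)
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, s, a, o, o_prev):
        return torch.sigmoid(self.net(torch.cat([s, a, o, o_prev], dim=-1)))

def discriminator_step(disc, opt, expert_batch, agent_batch):
    """One inner-loop update: binary cross-entropy on expert vs. agent tuples."""
    d_expert = disc(*expert_batch)   # tuples (s, a, o_onehot, o_prev_onehot)
    d_agent = disc(*agent_batch)
    # Expert labeled 0 and agent labeled 1, so that c = log D_theta penalizes
    # agent-like tuples when the policy is later updated.
    loss = -(torch.log(1 - d_expert + 1e-8).mean() + torch.log(d_agent + 1e-8).mean())
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```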
Particularly, the option model may be characterized as two-level MDPs. For the high-level MDP, state, action and reward may be defined as
For the low-level MDP, state, action and reward may be defined as
with the posterior probability
Other terms, including the initial state distributions μ0H and μ0L and the state-transition dynamics PH and PL, may be defined similarly to DAC. Then, the HRL task of Equation 7 can be separated into two non-hierarchical ones with augmented MDPs: MH = (SH, AH, PH, RH, μ0H, γ) and ML = (SL, AL, PL, RL, μ0L, γ), whose action decisions depend on πH and πL, respectively. These two non-hierarchical problems can be solved alternately by utilizing typical reinforcement learning methods like PPO (Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.).
Referring back to equation (6), by alternating the inner loop and the outer loop, the policy π̃* that addresses Equation 5 can be derived. In an embodiment, with the option-extended expert trajectories Ddemo = {τ̃ = (s0:T, a0:T, o−1:T)} and an initial policy π̃0, the policy optimization such as that shown in equation (6) may be performed iteratively for a sufficient number of iterations so as to train the policy π̃. With the trained policy, the NN model is expected to be capable of reproducing or imitating the behavior of the expert or demonstrator for performing a task. The following pseudo-code shows an exemplary method for training the agent NN model using the demonstration data.
The initial policy π̃0 may be obtained in various ways. For example, it may be obtained by using randomly generated values, predefined values, or pretrained values. Aspects of the disclosure are not limited to any particular initial policy.
The sampling of the agent's trajectories may be performed in various ways. For example, the NN model with the current policy π̃n may be used to run the task, such as autonomous driving or robotic locomotion, in an emulator, during which the agent's sample trajectories Dsample = {τ̃ = (s0:T, a0:T, o−1:T)} may be collected.
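A high-level sketch of the alternating procedure described above (with option-extended demonstrations already given) follows; the helper callables for trajectory sampling, discriminator updating and PPO-style policy updating are placeholders standing in for the components discussed in this disclosure, and their names are not taken from the source.

```python
# Hypothetical training loop for the case where expert options are provided.
def train_option_gail(policy, discriminator, demo_trajectories, num_iterations,
                      sample_trajectories, update_discriminator, update_policy_ppo):
    for n in range(num_iterations):
        # Self-exploration: roll out the current hierarchical policy pi_tilde_n
        # to obtain sampled trajectories (s_0:T, a_0:T, o_-1:T).
        sampled = sample_trajectories(policy)
        # Inner loop: train D_theta to separate expert tuples from agent tuples.
        update_discriminator(discriminator, demo_trajectories, sampled)
        # Outer loop: update pi_H and pi_L against the cost c = log D_theta,
        # e.g. with PPO on the two augmented MDPs described above.
        update_policy_ppo(policy, discriminator, sampled)
    return policy
```

The design choice here is simply that one discriminator update and one policy update alternate within each iteration; in practice the number of gradient steps per iteration would be a tuning parameter.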
In the above-discussed embodiment, such as Method 1, the expert options are provided in the training data. However, in practice the expert options are usually not available in the training data or in the inference process of the trained agent. In order to address this potential issue, in an embodiment, the options are inferred from the observed data (states and actions).
Given a policy, the options are supposed to be the ones that maximize the likelihood of the observed state-action pairs, according to the principle of Maximum-Likelihood-Estimation (MLE). In the embodiment, the expert policy may be approximated with the policy π̃ currently learned by the agent NN model through the method described above. With states and actions observed, the option model degenerates to a Hidden Markov Model (HMM); therefore, a maximum forward message method (Viterbi, A. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE transactions on Information Theory, 13(2):260-269, 1967.) may be used, for example, for expert option inference.
The most probable values of o−1:T are generated given (πH, πL) and τE = (s0:T, a0:T) ∈ Ddemo. Specifically, the maximum forward message is recursively calculated by:
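A plausible form of this recursion, assuming the standard maximum forward message of the Viterbi algorithm applied to the one-step option model, is:

\[ \hat{a}_t(o_t) \;=\; \pi_L(a_t \mid s_t, o_t)\, \max_{o_{t-1}}\; \pi_H(o_t \mid s_t, o_{t-1})\, \hat{a}_{t-1}(o_{t-1}), \]

initialized with \( \hat{a}_{-1}(o_{-1}) = \mathbb{1}(o_{-1} = \#) \).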
It is shown below that deriving the maximum forward message of Equation 8 maximizes the probability of the whole trajectory:
By back-tracing the ot−1 that induces the maximization on ât(ot) at each time step of the T-step trajectory, the option-extended expert trajectories Ddemo = {τ̃ = (s0:T, a0:T, o−1:T)} can finally be obtained.
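For illustration only, the recursion and back-tracing could be implemented as in the following sketch, which assumes the learned policies are available as log-probability functions (for continuous actions, log_pi_L would evaluate the action's log-density); the function names and the tabular interface are assumptions, not interfaces from the source.

```python
# Hypothetical maximum-forward-message (Viterbi-style) option inference.
import numpy as np

def infer_options(log_pi_H, log_pi_L, states, actions, num_options):
    """log_pi_H(o, o_prev, s) and log_pi_L(a, s, o) return log-probabilities.
    Index num_options for o_prev stands for the special initial option '#'.
    Returns the most probable option sequence o_0:T."""
    T = len(states)
    # msg[t, o] = log of the maximum forward message a_hat_t(o);
    # back[t, o] = the o_{t-1} achieving that maximum.
    msg = np.full((T, num_options), -np.inf)
    back = np.zeros((T, num_options), dtype=int)
    for o in range(num_options):
        msg[0, o] = log_pi_H(o, num_options, states[0]) + log_pi_L(actions[0], states[0], o)
    for t in range(1, T):
        for o in range(num_options):
            cand = msg[t - 1] + np.array(
                [log_pi_H(o, op, states[t]) for op in range(num_options)])
            back[t, o] = int(np.argmax(cand))
            msg[t, o] = cand[back[t, o]] + log_pi_L(actions[t], states[t], o)
    # Back-trace the option sequence attaining the maximum at the final step.
    options = [int(np.argmax(msg[T - 1]))]
    for t in range(T - 1, 0, -1):
        options.append(back[t, options[-1]])
    return options[::-1]
```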
In an embodiment, with expert-provided demonstrations Ddemo = {τE = (s0:T, a0:T)} and an initial policy π̃0, the policy optimization such as that shown in equation (6) and the option inference such as that shown in equation (8) may be performed alternately for sufficient iterations so as to train the policy π̃. With the trained policy, the NN model is expected to be capable of reproducing or imitating the behavior of the expert or demonstrator for performing a task. The following pseudo-code 2 shows an example method for training the agent NN model using the demonstration data.
Method 2 may be referred to as an Expectation-Maximization (EM)-style process: an E-step that samples the options of the expert conditioned on the currently learned policy, and an M-step that updates the low- and high-level policies of the agent simultaneously to minimize the discrepancy of the option-occupancy measurement between expert and agent.
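A compact sketch of this EM-style procedure is given below; the helper callables (option inference as the E-step and one round of the Option-GAIL update of Method 1 as the M-step) are placeholders, and their names are not taken from the source.

```python
# Hypothetical EM-style wrapper: E-step labels demonstrations with options,
# M-step runs one adversarial policy/discriminator update on the labeled data.
def train_option_gail_em(policy, discriminator, raw_demos, num_iterations,
                         infer_expert_options, option_gail_update):
    for n in range(num_iterations):
        # E-step: infer the most probable options o_-1:T for each (s_0:T, a_0:T)
        # under the currently learned policy.
        labeled_demos = [infer_expert_options(policy, traj) for traj in raw_demos]
        # M-step: one round of Option-GAIL optimization with the labeled data.
        option_gail_update(policy, discriminator, labeled_demos)
    return policy
```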
At 510, demonstration data representing the demonstrator's behavior for performing a task are obtained. The demonstration data include state data, action data and option data. The state data correspond to a state for performing the task; the term state may also be referred to as condition, circumstance, context, status or the like. The option data correspond to subtasks of the task, and the subtasks may correspond to respective scenarios related to the task. The action data correspond to the demonstrator's actions performed for the task. In an embodiment, the option-extended expert trajectories Ddemo = {τ̃ = (s0:T, a0:T, o−1:T)} are an example of the demonstration data, where τ̃ represents a trajectory, s0:T represents the respective state instances along the trajectory, a0:T represents the respective action instances along the trajectory, o−1:T represents the respective option instances along the trajectory, and T is the index of the last time step along the trajectory.
At 520, learner data representing the NN model's behavior for performing the task are sampled based on a current learned policy. The learner data include state data, action data and option data. The state data correspond to a state for performing the task; the term state may also be referred to as condition, circumstance, context, status or the like. The option data correspond to subtasks of the task, and the subtasks may correspond to respective scenarios related to the task. The action data correspond to the learner's actions performed for the task. In an embodiment, the sampled learner trajectories Dsample = {τ̃ = (s0:T, a0:T, o−1:T)} are an example of the sampled learner data, where τ̃ represents a trajectory, s0:T represents the respective state instances along the trajectory, a0:T represents the respective action instances along the trajectory, o−1:T represents the respective option instances along the trajectory, and T is the index of the last time step along the trajectory. The policy consists of a high level policy part for determining a current option and a low level policy part for determining a current action. In an embodiment, the high level policy part is configured to determine the current option based on a current state and a previous option, and the low level policy part is configured to determine the current action based on the current state and the current option. In an embodiment, each of the high level policy part and the low level policy part is a function of a state, an action, an option and a previous option.
At 530, the policy of the NN model is updated by using a generative adversarial imitation learning (GAIL) process based on the demonstration data and the learner data.
At 5110, initial demonstration data including the state data and the action data, without the option data, are obtained. In an embodiment, the expert trajectories Ddemo = {τE = (s0:T, a0:T)} may be an example of the initial demonstration data.
At 5120, the option data is estimated or inferred by using the current learned policy based on the state data and the action data included in the initial demonstration data.
At 5130, the demonstration data are obtained by supplementing the estimated or inferred option data into the initial demonstration data.
In an embodiment, the inferring the option data at 5120 may comprise: generating the most probable values of the option data by using a Maximum-Likelihood-Estimation process based on the current learned policy as well as the state data and the action data included in the initial demonstration data. In an embodiment, equation (8) is an example of the Maximum-Likelihood-Estimation process for estimating the most probable values of the option data.
At 5310, discrepancy between the demonstrator's behavior and the NN model's behavior is estimated based on the demonstration data and the learner data by using a discriminator. In an embodiment, discrepancy of occupancy measurement between the demonstration data and the learner data is estimated by using the discriminator, wherein the occupancy measurement is a function of a state, an action, an option and a previous option. In an embodiment, each of the high level policy part and the low level policy part is a function of the occupancy measurement.
At 5320, parameters of the discriminator are updated with a target of maximizing the discrepancy in an inner loop.
At 5330, parameters of the current learned policy are updated with a target of minimizing the discrepancy in an outer loop. In an embodiment, the parameters of the current learned policy are updated by using a hierarchical reinforcement learning (HRL) process characterized as two-level MDPs. In an embodiment, a policy regularizer used in the HRL process is a function of the high level policy part and the low level policy part.
In an aspect of the disclosure, a method for training a Neural Network (NN) model for self-driving assistance is proposed. The NN model for self-driving assistance is trained using the method of any embodiment described herein, such as the embodiments described with reference to the accompanying figures, wherein the demonstration data represents a driver's behavior for driving a vehicle.
In an aspect of the disclosure, a method for training a Neural Network (NN) model for controlling robot locomotion is proposed. The NN model for controlling robot locomotion is trained using the method of any embodiment described herein, such as the embodiments described with reference to the accompanying figures, wherein the demonstration data represents a demonstrator's locomotion for performing a task.
At 810, environment data related to performing a task by the machine are collected. For example, the sensors 110 illustrated in the accompanying figure may be used to collect at least a part of the environment data.
At 820, state data and option data for the current time instant are obtained based at least in part on the environment data. In an embodiment, state data for the current time instant may be obtained from the environment data, and the option data may be inferred for the current time based on the state data. For example, the option data may be inferred for the current time based on the state data and the option data at the last time step.
At 830, action data for the current time instant is inferred based on the state data and the option data for the current time instant with the trained NN model; and
At 840, action of the machine is controlled based on the action data for the current time instant.
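A simplified sketch of one pass through this control loop is given below; sensor access, state construction and actuation are abstracted behind placeholder callables that are not defined in the source, and the most-likely selection of option and action is only one possible decoding strategy.

```python
# Hypothetical per-step control loop for a machine (e.g. a vehicle) using the
# trained hierarchical model. All objects passed in are placeholders.
def control_step(pi_H, pi_L, sensors, actuators, prev_option, build_state):
    env_data = sensors.read()                      # collect environment data (810)
    state = build_state(env_data)                  # state data for the current instant (820)
    option = pi_H.most_likely(state, prev_option)  # current option from state and last option (820)
    action = pi_L.most_likely(state, option)       # infer action data with the trained model (830)
    actuators.apply(action)                        # control the action of the machine (840)
    return option                                  # carried over as the previous option next step
```

At the first time instant, prev_option would be initialized to the special initial option '#', mirroring the one-step option model described above.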
In an aspect of the disclosure, a vehicle capable of self-driving assistance is provided. For example, as illustrated in the accompanying figure, the vehicle 100 may comprise the sensors 110 configured for collecting at least a part of environment data related to performing self-driving assistance by the vehicle, and the processing system 120 including one or more processors and one or more storage devices storing computer-executable instructions for performing the operations of the methods according to aspects of the disclosure.
In an aspect of the disclosure, a robot capable of automatic locomotion is provided. For example, as illustrated in the accompanying figure, the robot may comprise sensors configured for collecting at least a part of environment data related to performing automatic locomotion by the robot, and one or more processors and one or more storage devices storing computer-executable instructions for performing the operations of the methods according to aspects of the disclosure.
The embodiments of the present disclosure may be embodied in a computer-readable medium such as a non-transitory computer-readable medium. The non-transitory computer-readable medium may comprise instructions that, when executed, cause one or more processors to perform any operations according to the embodiments of the present disclosure as described in connection with the accompanying figures.
The embodiments of the present disclosure may be embodied in a computer program product comprising computer-executable instructions that, when executed, cause one or more processors to perform any operations according to the embodiments of the present disclosure as described in connection with the accompanying figures.
It should be appreciated that all the operations in the methods described above are merely exemplary, and the present disclosure is not limited to any operations in the methods or sequence orders of these operations, and should cover all other equivalents under the same or similar concepts.
It should also be appreciated that all the modules in the apparatuses described above may be implemented in various approaches. These modules may be implemented as hardware, software, or a combination thereof. Moreover, any of these modules may be further functionally divided into sub-modules or combined together.
The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects. Thus, the claims are not intended to be limited to the aspects shown herein. All structural and functional equivalents to the elements of the various aspects described throughout the present disclosure that are known or later come to be known to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed by the claims.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/CN2021/097252 | 5/31/2021 | WO |