The present disclosure pertains to the field of artificial intelligence, and in particular to systems and methods for deep neural network (DNN) inference.
Many artificial intelligence (AI) applications rely on deep neural network (DNN) models for classification. For AI inference, a pre-trained DNN model processes an input data sample, such as raw sensing data, and generates a classification result as output. For an AI classification task, usually one DNN inference is performed based on a single data sample. However, the confidence level requirement of the AI task may not be satisfied by a single DNN inference result, due to limited information provided by a single data sample and randomness in the DNN inference result.
For one AI task, there can be multiple available data samples; and for each data sample, there can be multiple different DNN inference results if the data sample is processed by multiple different DNN models. Different data samples usually capture different spatial and temporal features of the same object or event under detection. Different DNN models provide different inference results with randomness for the same data sample. Thus, the DNN inference results corresponding to different data samples and different DNN models provide different confidence levels. To improve the confidence level for the AI task, a straightforward approach is to select the DNN inference result with the maximum confidence level and ignore other DNN inference results with lower confidence levels. If the confidence level requirement is not satisfied, more data samples may be requested and used to obtain more DNN inference results. However, this approach may lead to high latency if the required confidence level is high, and this may violate delay requirements. Additionally, it can be inefficient to completely ignore DNN inference results with lower confidence levels.
Moreover, existing DNN models involve trade-offs between confidence level and computing demand. Typically, a big DNN model can generate DNN inference results with higher confidence levels on average at the cost of more computing demand. Thus, these models are usually deployed at powerful edge or cloud servers in the network. A small DNN model may provide lower confidence level but with more computing efficiency (or lower computing cost), and may therefore be deployed at the network edge, closer to data sources for the AI task. These trade-offs may be especially felt or needed when multiple AI tasks share resources such as transmission and computing resources in a network. Additionally, some elements on the network may be energy-limited such as Internet-of-things (IoT) devices and are not suitable for performing computation-intense tasks.
Therefore, it may be desired to improve the confidence level and delay performance of AI inference with resource and energy efficiency.
This background information is provided to reveal information believed by the applicant to be of possible relevance to the present invention. No admission is necessarily intended, nor should be construed, that any of the preceding information constitutes prior art against the present invention.
An object of embodiments of the present invention is to provide systems and methods for artificial intelligence inference, for example artificial intelligence inference using both a fast DNN model and a full DNN model.
In accordance with an embodiment of the present disclosure, there is provided a method for cumulative deep neural network (DNN) inference. The method includes receiving, by a Type-D network element, fast DNN inference results for a first artificial intelligence (AI) task and receiving, by the Type-D network element, full DNN inference results for the first AI task. The method further includes obtaining, by the Type-D network element, a cumulative DNN inference result based on the fast DNN inference results and the full DNN inference results and obtaining, by the Type-D network element, a cumulative confidence level based on the fast DNN inference results and the full DNN inference results.
In some embodiments, receiving the full DNN inference results is responsive to an enhanced inference request. In some embodiments, the enhanced inference request is at least in part based on one or more of: dynamics of the cumulative confidence level, a caching status and a remaining time to a deadline associated with the first AI task. In some embodiments, the full DNN inference results are based on intermediate data, the intermediate data indicative of partial determination of the fast DNN inference results.
In accordance with an embodiment of the present disclosure, there is provided an apparatus for cumulative deep neural network (DNN) inference. The apparatus includes a processor, a network interface and a memory having stored thereon machine executable instructions. The instructions when executed by the processor configure the apparatus to receive fast DNN inference results for a first artificial intelligence (AI) task and receive full DNN inference results for the first AI task. The instructions when executed by the processor further configure the apparatus to obtain a cumulative DNN inference result based on the fast DNN inference results and the full DNN inference results and obtain a cumulative confidence level based on the fast DNN inference results and the full DNN inference results.
In accordance with an embodiment of the present disclosure, there is provided a method for cumulative deep neural network (DNN) inference. The method includes transmitting, by a controller, one or more of a data request and an enhanced inference request, wherein the data request is for a first artificial intelligence (AI) task and wherein the enhanced inference request is for a full DNN inference result for the first AI task. The method further includes receiving, by the controller, a cumulative confidence level for a current DNN inference result and receiving, by the controller, task requirements for the first AI task.
In some embodiments, the method further includes determining, by the controller, acceptability of the cumulative DNN inference for the first AI task based at least in part on the task requirements and the cumulative confidence level. In some embodiments, the data request includes a request for one or more new data samples from a data source. In some embodiments, the data request includes a request of one or more new fast DNN inference results. In some embodiments, the enhanced inference request includes a request for one or more samples of intermediate data for determination of the full DNN inference result for the first AI task.
In some embodiments, the method further includes receiving, by the controller, the full DNN inference result and, upon determination that the full DNN inference result is sufficient, transmitting, by the controller, a notification to both a Type-B network element and a Type-D network element. This notification may be a sufficiency notification. In some embodiments, upon receipt of the notification, the Type-B network element will not perform or will cease performing a fast DNN inference (i.e. determining a fast DNN inference result). In some embodiments, upon receipt of the notification, the Type-D network element will not perform or will cease performing a cumulative DNN inference (i.e. determining a cumulative DNN inference result). In some embodiments, a notification indicating or instructing to cease the new fast DNN inference or the cumulative inference is respectively sent to the Type-B network element and the Type-D network element, and the Type-B network element and the Type-D network element, according to the notification, will respectively not perform or cease performing the fast DNN inference or the cumulative inference. In some embodiments, the task requirements include information indicative of one or more of a deadline and a confidence level. In some embodiments, the deadline includes a delay threshold and the confidence level includes a confidence level threshold. In some embodiments, the first AI task is completed upon the cumulative confidence level reaching the confidence level threshold. In some embodiments, the first AI task is completed with a satisfactory quality of service (QoS) upon the first AI task being completed at or before the delay threshold. In some embodiments, a delay violation occurs when the first AI task is completed after the delay threshold.
In accordance with an embodiment of the present disclosure, there is provided an apparatus for cumulative deep neural network (DNN) inference. The apparatus includes a processor, a network interface and a memory having stored thereon machine executable instructions. The instructions when executed by the processor configure the apparatus to transmit a data request for a first artificial intelligence (AI) task and transmit an enhanced inference request for a full DNN inference result for the first AI task. The instructions when executed by the processor further configure the apparatus to receive a cumulative confidence level for a current DNN inference result, receive task requirements for the first AI task and determine acceptability of the cumulative DNN inference for the first AI task based at least in part on the task requirements and the cumulative confidence level.
In accordance with an embodiment of the present disclosure, there is provided a method for cumulative deep neural network (DNN) inference. The method includes receiving, by a Type-B network element, a data sample for a first artificial intelligence (AI) task and upon determination of a fast DNN inference based on the new data sample, transmitting, by the Type-B network element to a Type-D network element, the fast DNN inference result for the first AI task. The method further includes receiving, by the Type-B network element, an enhanced inference request; caching, by the Type-B network element, one or more samples of intermediate data based on the enhanced inference request, the intermediate data indicative of partial determination of the fast DNN inference results; and transmitting, by the Type-B network element, the one or more samples of intermediate data.
In some embodiments, the Type-B network element transmits the one or more samples of intermediate data to a Type-C network element. In some embodiments, the method further includes receiving, by the Type-B network element, a data request, the data request indicative of one or more of: a request for one or more samples of intermediate data and a request for a new data sample. In some embodiments, the method further includes determining, by the Type-B network element, a new fast DNN inference at least in part based on the new data sample and transmitting, by the Type-B network element to the Type-D network element, the new fast DNN inference result for the first AI task. In some embodiments, upon receipt of the one or more samples of intermediate data, the Type-C network element is configured to generate a full DNN inference. In some embodiments, the Type-C network element is configured to transmit the full DNN inference to a Type-D network element, the Type-D network element configured to generate a cumulative DNN inference result at least in part based on the full DNN inference.
In accordance with an embodiment of the present disclosure, there is provided an apparatus for cumulative deep neural network (DNN) inference. The apparatus includes a processor, a network interface and a memory having stored thereon machine executable instructions. The instructions when executed by the processor configure the apparatus to receive a data sample for a first artificial intelligence (AI) task and upon determination of a fast DNN inference based on the new data sample, transmit the fast DNN inference result for the first AI task. The instructions when executed by the processor further configure the apparatus to receive an enhanced inference request, cache one or more samples of intermediate data based on the enhanced inference request, and transmit the one or more samples of intermediate data.
In accordance with an embodiment of the present disclosure, there is provided a system for cumulative deep neural network (DNN) inference. The system includes a controller, a Type-B network element and a Type-D network element, each of the controller, the Type-B network element and the Type-D network element having one or more associated processors and one or more associated memories storing machine readable instructions. Upon execution of the machine readable instructions by at least one of the one or more associated processors, the Type-B network element is configured to receive a new data sample for a first artificial intelligence (AI) task and upon determination of a fast DNN inference based on the new data sample, transmit to the Type-D network element, the fast DNN inference result for the first AI task. Upon execution of the machine readable instructions by at least one of the one or more associated processors, the Type-D network element is configured to receive the fast DNN inference results for a first artificial intelligence (AI) task, obtain a cumulative DNN inference result based on the fast DNN inference results and obtain a cumulative confidence level based on the fast DNN inference results. Upon execution of the machine readable instructions by at least one of the one or more associated processors, the controller is configured to transmit one or more of a data request and an enhanced inference request, wherein the data request is for a first artificial intelligence (AI) task and wherein the enhanced inference request is for a full DNN inference result for the first AI task, and to receive the cumulative confidence level for a current DNN inference result.
Upon execution of the machine readable instructions by at least one of the one or more associated processors, the controller is further configured to receive task requirements for the first AI task and determine acceptability of the current cumulative DNN inference for the first AI task based at least in part on the task requirements and the cumulative confidence level.
In some embodiments, upon execution of the machine readable instructions by at least one of the one or more associated processors, the Type-B network element is further configured to receive the enhanced inference request, cache one or more samples of intermediate data based on the enhanced inference request, the intermediate data indicative of partial determination of the fast DNN inference results, and transmit the one or more samples of intermediate data.
In some embodiments, the system further includes a Type-C network element having one or more associated processors and one or more associated memories storing machine readable instructions. Upon execution of the machine readable instructions by at least one of the one or more associated processors, the Type-C network element is configured to receive the one or more samples of intermediate data and, based on the one or more samples of intermediate data, generate a full DNN inference and transmit the full DNN inference to the Type-D network element.
In some embodiments, upon execution of the machine readable instructions by at least one of the one or more associated processors, the Type-D network element is further configured to receive the full DNN inference results for the first AI task, obtain a cumulative DNN inference result based on the fast DNN inference results and the full DNN inference results and obtain a cumulative confidence level based on the fast DNN inference results and the full DNN inference results.
According to embodiments, there is provided a cumulative DNN inference scheme, which cumulatively combines multiple DNN inference results from different DNN models and generates a cumulative DNN inference result with improved confidence level. This can be provided by exploiting the information diversity of different DNN inference results, based on a non-parametric joint probability density function profiling of DNN inference results of different DNN models with a labelled training dataset.
According to embodiments, there is provided an adaptive control scheme for a cumulative DNN inference framework where a computation-efficient AI model deployment strategy with layer sharing between fast and full DNN models is employed for multiple AI tasks. With the adaptive selection between fast and full DNN inference for each AI task by a reinforcement learning (RL) agent with the consideration of dynamics in cumulative confidence level, caching status, and remaining time to a deadline associated with different AI tasks, the resource and energy efficiency may be maximized, and the total delay violation penalty may be minimized for the satisfaction of confidence level requirements of all AI tasks.
According to embodiments, there is provided an extra experience replay memory and a corresponding enabling mechanism in a deep Q-learning algorithm. The extra experience replay memory can store transitions in zero-penalty episodes and can improve the convergence for an RL problem with a special episode-level penalty which depends on all actions in the whole episode.
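The idea of keeping transitions from zero-penalty episodes in a separate memory can be illustrated with the following minimal sketch. It is not the disclosed enabling mechanism, whose details are not given here. The class name `DualReplayMemory`, the `extra_fraction` mixing parameter, and the rule of copying a whole episode only when its episode-level penalty equals zero are assumptions made for illustration.

```python
import random
from collections import deque


class DualReplayMemory:
    """Hypothetical replay memory with an extra store for zero-penalty episodes."""

    def __init__(self, capacity, extra_capacity, seed=None):
        self.main = deque(maxlen=capacity)         # all transitions
        self.extra = deque(maxlen=extra_capacity)  # zero-penalty episodes only
        self._episode = []                         # transitions of the current episode
        self._rng = random.Random(seed)

    def push(self, transition):
        """Store one transition, e.g. a (state, action, reward, next_state) tuple."""
        self.main.append(transition)
        self._episode.append(transition)

    def end_episode(self, episode_penalty):
        """Copy the finished episode to the extra memory if it incurred no penalty."""
        if episode_penalty == 0:
            self.extra.extend(self._episode)
        self._episode = []

    def sample(self, batch_size, extra_fraction=0.5):
        """Draw a batch mixing both memories (the mixing rule is an assumption)."""
        n_extra = min(int(batch_size * extra_fraction), len(self.extra))
        n_main = min(batch_size - n_extra, len(self.main))
        batch = self._rng.sample(list(self.extra), n_extra)
        batch += self._rng.sample(list(self.main), n_main)
        return batch
```

Because the episode-level penalty is only known once the episode ends, the sketch buffers the episode's transitions and decides at `end_episode` whether to copy them.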
Embodiments have been described above in conjunction with aspects of the present invention upon which they can be implemented. Those skilled in the art will appreciate that embodiments may be implemented in conjunction with the aspect with which they are described but may also be implemented with other embodiments of that aspect. When embodiments are mutually exclusive, or are otherwise incompatible with each other, it will be apparent to those skilled in the art. Some embodiments may be described in relation to one aspect, but may also be applicable to other aspects, as will be apparent to those of skill in the art.
Further features and advantages of the present invention will become apparent from the following detailed description, taken in combination with the appended drawings, in which:
It will be noted that throughout the appended drawings, like features are identified by like reference numerals.
A deep neural network (DNN) may be used to classify an object as one of K classes, with the class label y taking values from 1 to K. The DNN can estimate the conditional probability, based on a data sample x, that the object is of class y, or P(y|x). This DNN inference may result in a predicted class probability vector $\hat{z} = \{\hat{z}_k,\ k = 1, \ldots, K\}$ with $\hat{z}_k = P(y = k \mid x)$, with a confidence level of $\eta = 1 + \frac{1}{\log K} \sum_{k=1}^{K} \hat{z}_k \log \hat{z}_k$,
or 1 minus normalized entropy. This approach may be used to perform one DNN inference based on a single data sample. However, a given task may have a confidence level requirement, which may not be satisfied by a single DNN inference result, due to an accuracy limit of DNN models and/or incomplete information provided by a single data sample. Moreover, it can also be difficult to balance accuracy against computing overhead, e.g. meeting higher accuracy requirements while limiting computing demand/overhead.
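The confidence level described above, 1 minus the normalized entropy of the predicted class probability vector, can be computed as in the following minimal sketch. The function name `confidence_level` and the input validation are illustrative assumptions. Zero-probability classes are treated as contributing nothing to the entropy.

```python
import math


def confidence_level(probs):
    """Return 1 minus the normalized entropy of a class probability vector."""
    k = len(probs)
    if k < 2:
        raise ValueError("at least two classes are required")
    if any(p < 0 for p in probs) or not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probs must be non-negative and sum to 1")
    # A class with probability 0 contributes 0 (the limit of p * log p as p -> 0).
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    # Dividing by log(K) normalizes the entropy to the range [0, 1].
    return 1.0 - entropy / math.log(k)
```

A uniform vector gives a confidence level of 0, and a one-hot vector gives a confidence level of 1.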
An adaptive and cumulative DNN inference scheme can be used to generate more accurate classifications, including aggregating multiple DNN inference results to form a combined DNN inference with a high (e.g., improved) confidence level. The scheme can place fast DNN functionality at network elements or network entities (“Type B network elements” or “Type B network entities”) which are at or closer to data sources (“Type A network elements” or “Type A network entities”), while maintaining more sophisticated enhanced DNN inference functionality at network elements or network entities (“Type C network elements” or “Type C network entities”) which may be further from the data sources.
It will be understood that Type-A network element(s), Type-B network element(s), Type-C network element(s) and Type-D network element(s) as described in more detail elsewhere herein, for example as illustrated in
A network controller may run the scheme, sending data requests and enhanced inference requests given both network-level information (such as network resource availability) and application-level information (such as current cumulative confidence level, task confidence level requirement, and task completion time requirement). The network controller may use a reinforcement learning (RL) agent for decision making. A data request may be used to request one or more new data samples from one or more data sources and execute fast DNN inference at one or more Type B network elements which are associated with the requested data sample(s) to obtain one or more fast inference results. An enhanced inference request may trigger the execution of an enhanced DNN inference (or a full DNN inference) at a Type C network element, and the enhanced DNN inference may be executed based on cached intermediate data offloaded from a Type B network element, to obtain a new full inference result. A stochastic cumulative DNN inference scheme, e.g. running at the application layer, may provide the cumulative confidence level based on all fast and full inference results corresponding to the same AI task. According to embodiments, a full DNN inference can be considered to involve both local computing to generate intermediate data and edge computing for enhanced DNN inference based on this intermediate data.
Thus, one aspect of this disclosure describes a data-driven stochastic cumulative DNN inference scheme which statistically aggregates multiple DNN inference results to obtain a cumulative DNN inference result and provides an improved cumulative confidence level. Such a system may also include a control scheme for cumulative DNN inference, which can provide adaptive selection between a fast DNN inference, with low computing demand but low confidence level, and a full DNN inference, with high computing demand but high confidence level. This selection may be made to satisfy the confidence level requirements of multiple AI tasks, and the selection may seek to maximize energy and resource efficiency and minimize delay violation.
The fast DNN model 120 may be deployed at multiple network entities in a network, such as at network entities which are positioned close to data sources that generate input data samples 102. The execution of the fast DNN model 120 may be referred to as a fast DNN inference, which generates a fast inference result 122 for each input data sample 102. For each execution of the fast DNN model 120, the output at the cut layer 104 can be referred to as intermediate data 106.
The full DNN model 110 may be partitioned into two parts by the cut layer 104. The layer(s) 108 before and at the cut layer 104 may be shared between the full DNN model 110 and the fast DNN model 120, while the layers 114 after the cut layer 104 may be used only by the full DNN model 110. As such, for the full DNN model 110 the layers 114 thereof after the cut layer 104 may be deployed at a network entity which is relatively further from data sources, such as at an access point (AP). For clarity on deployment in association with the full DNN model, the full DNN model includes two parts, namely the part before the cut layer and the part after the cut layer. Generally, either part can be deployed far from the data sources. However, the part after the cut layer, namely layers 114, can be further from the data sources when compared to the part, namely layer(s) 108 before the cut layer. The full DNN model 110 may be configured to receive data from multiple data sources. The intermediate data 106 at the cut layer 104 can be further processed by the layers 114 after the cut layer 104 at the AP to generate a full inference result 112, which can be referred to as enhanced DNN inference. It will be readily understood that the intermediate data 106 can be a combination of one or more pieces or samples of intermediate data, and as such, the further processing at layers 114 can be performed on one or more samples of the intermediate data 106.
As illustrated, some of the computation is shared between the full DNN model 110 and the fast DNN model 120. Specifically, both models 110, 120 share the layer(s) 108 before the cut layer 104 and both further include the cut layer 104. By sharing some computation between the fast DNN model 120 and the full DNN model 110, the computing demand for generating one fast DNN inference result and one full DNN inference result may be reduced when compared with an AI model deployment strategy without layer sharing between the fast and full DNN models 110, 120.
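The layer sharing described above can be sketched with toy stand-ins for the model parts. All functions below (`shared_layers`, `fast_head`, `full_tail`) and their arithmetic are hypothetical placeholders, not the disclosed models. In particular, the lightweight output stage of the fast model after the cut layer is an assumption. The point of the sketch is only that the cut-layer output is computed once and reused by the enhanced inference.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def shared_layers(sample):
    """Toy stand-in for the layers up to and including the cut layer."""
    return [max(0.0, 0.5 * v) for v in sample]


def fast_head(intermediate):
    """Toy stand-in for a lightweight output stage of the fast model (assumed)."""
    s = sum(intermediate)
    return softmax([s, -s])


def full_tail(intermediate):
    """Toy stand-in for the layers after the cut layer (full model only)."""
    s = 2.0 * sum(v * v for v in intermediate)
    return softmax([s, -s])


def fast_inference(sample):
    """Run the fast model; also return the cut-layer output for caching."""
    intermediate = shared_layers(sample)
    return fast_head(intermediate), intermediate


def enhanced_inference(intermediate):
    """Finish the full model from cached intermediate data (no recomputation)."""
    return full_tail(intermediate)
```

For example, `fast_result, cached = fast_inference([2.0, -1.0, 4.0])` yields the fast result together with the intermediate data, and `enhanced_inference(cached)` later produces a full result without re-running the shared layers.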
Generally, the full DNN model 110, in particular layers 114, may be implemented on a network entity in a network, such as an AP, which can be referred to as a Type C network element or Type C network entity. The fast DNN model 120 may be implemented on another network entity in the network, such as an Internet of Things (IoT) device like a smart camera, which can be referred to as a Type B network element or Type B network entity. Each of these network entities can be configured to run an AI task, such as a DNN-based classification task for AI inference, with multiple data samples generated by one or more data sources such as a data source within an IoT device. For example, a smart camera is an IoT device which may generate consecutive video frames and these video frames may be used for the classification of a moving object.
The IoT device or other network entity, which can be defined as a Type B network element, may support some local processing, sufficient to run the fast DNN model 120, but the operation thereof may be limited by one or more of computing resources and energy. Meanwhile, a network entity with a higher computing capability, e.g., an AP, can be defined as a Type C network element; the higher computing capability may be due, e.g., to an edge server or cloud server integrated within or co-located with the network entity. The network entity may serve some network entities (e.g. user devices) which initiate AI tasks, and the computing resources of the network entity may be shared by the multiple network entities. Each of these devices may be allocated a virtual CPU at the network entity for AI processing. In some embodiments, the network entity and the other devices being served by the network entity, for example Type B network elements, may be connected to one another via a wireless network, such as an OFDMA network.
The framework includes a network entity, which in this example has been illustrated as an AP 202, which has a controller 204 (or network controller) and a module for enhanced DNN inferences, such as enhanced DNN inferences for IoT device i 224. The AP 202 may be serving, and connected to, one or more IoT devices, including IoT device i 210. It is to be readily understood that in this figure the AP is being used as an example and should not be considered to be limiting. The network entity, which in this example has been illustrated as an AP, is configured to perform the particular actions discussed elsewhere herein in association with this example. Moreover, a network entity can be an AP, a UE, a base station, an IoT device or other suitable network entity as would be readily understood.
The controller 204 at the AP 202 may be configured to make adaptive offloading decisions among multiple devices across consecutive time slots based on both network-level information (such as network resource availability including the transmission resource availability and the computing resource availability at the AP 202) and application-level information, until the confidence level requirements for the AI tasks of all the devices are satisfied. For example, each AI task may have certain requirements for completion time and for confidence level needed, and the controller 204 may be configured to choose whether to use enhanced DNN inference for device i 224 or a fast DNN inference 214, based on these trade-offs between timeliness of completion, confidence level, and computing resource availability.
For example, let ai(k) denote a nonnegative integer offloading decision for network entity i 210 at time slot k, which represents the number of pieces or portions of the intermediate data 226 to offload from network entity i 210 to the AP 202 during time slot k. It will be readily understood that the intermediate data 226 can be a combination of one or more pieces or samples of intermediate data, and as such, the offloading of the intermediate data can be envisioned as offloading one or more of the pieces or samples of the intermediate data.
If the offloading decision for network entity i 210 is not to offload at time slot k, i.e., ai(k)=0, no offloading takes place at network entity i 210, but a data request 206 for the network entity i 210 is initiated by the controller 204 at time slot k. The controller 204 notifies both the data source 212 and the fast DNN inference 214 module for network entity i 210 of the data request 206. Then, the data source 212 of network entity i 210 can provide a new data sample to the fast DNN inference 214 module for the network entity.
With a new data sample for network entity i 210 at time slot k, fast DNN inference 214 is executed by running the fast DNN model at network entity i 210 to obtain a new fast inference result 216 during time slot k. The new fast inference result 216 is then passed to an application-layer cumulative DNN inference module 220 for network entity i 210, which runs a stochastic DNN inference scheme.
A cache 218 can be placed at each network entity, including network entity i 210. For each execution of fast DNN inference 214 with a new data sample at network entity i 210, intermediate data 226, i.e., the layer output at the shared cut layer between the fast and full DNN models, is temporarily stored in the cache 218 of network entity i 210, and the caching state (i.e., cached intermediate data 226) at network entity i 210 is increased by one. Let qi(k) denote the caching state of network entity i 210 at the beginning of time slot k, which is initialized as qi(1)=0 at the beginning of the first time slot for the AI task of network entity i 210. Then, if ai(k)=0, we have qi(k+1)=qi(k)+1.
If the offloading decision for network entity i 210 is to offload at time slot k, i.e., ai(k)>0, an enhanced inference request 222 for the network entity is initiated by the controller 204 at time slot k. The controller 204 notifies both the enhanced DNN inference module 224 for network entity i 210 at the AP 202 and the cache 218 module at network entity i 210 of the enhanced inference request 222. Then, ai(k) intermediate data 226 is offloaded from the cache 218 of network entity i 210 to the AP 202, and processed with enhanced DNN inference 224 at the AP 202. Accordingly, the caching state at network entity i 210 is decreased by ai(k) at the beginning of time slot k+1, i.e., qi(k+1)=qi(k)−ai(k).
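The two caching-state updates described above, qi(k+1)=qi(k)+1 when ai(k)=0 and qi(k+1)=qi(k)−ai(k) when ai(k)>0, can be traced with the following minimal sketch. The function names are illustrative, and the check that no more than the cached amount is offloaded is an assumption that is implied but not stated explicitly in the text.

```python
def next_cache_state(q, a):
    """Return q_i(k+1) given the caching state q_i(k) and offloading decision a_i(k)."""
    if a < 0:
        raise ValueError("offloading decision must be a nonnegative integer")
    if a == 0:
        # Data request: one fast inference runs and its cut-layer output is cached.
        return q + 1
    if a > q:
        # Assumption: only intermediate data already in the cache can be offloaded.
        raise ValueError("cannot offload more intermediate data than is cached")
    # Enhanced inference request: a samples leave the cache for the AP.
    return q - a


def trace_cache(decisions, q_initial=0):
    """Return the caching states q_i(1), q_i(2), ... for a sequence of decisions."""
    states = [q_initial]
    for a in decisions:
        states.append(next_cache_state(states[-1], a))
    return states
```

For instance, the decisions 0, 0, 0, 2, 0, 1 over six time slots give the caching states 0, 1, 2, 3, 1, 2, 1.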
If intermediate data 226 is offloaded to the AP 202 from network entity i 210 during time slot k, i.e., ai(k)>0, the AP 202 executes enhanced DNN inference 224 for each portion of the offloaded intermediate data 226 and generates a corresponding number of full inference results 228 during time slot k. The new full inference results 228 can then be passed to the application-layer cumulative DNN inference module 220 for the network entity i 210.
For network entity i 210, the application-layer cumulative DNN inference module 220 receives one new fast inference result 216 during time slot k if ai(k)=0, or ai(k) full inference result(s) 228 during time slot k if ai(k)>0. The application-layer cumulative DNN inference module 220 aggregates the new inference results with the old ones received in previous time slots and updates a cumulative DNN inference result 230 for the AI task of network entity i 210, based on a proposed stochastic cumulative DNN inference scheme. The confidence level of the cumulative DNN inference result is referred to as the cumulative confidence level. Let ηi(k) denote the cumulative confidence level for the AI task of network entity i 210 at the beginning of time slot k, which is initialized as ηi(1)=0 at the beginning of the first time slot for the AI task. Based on the updated cumulative DNN inference result for the AI task of network entity i 210, a corresponding updated cumulative confidence level 232 can be calculated. At the end of time slot k, the controller 204 is informed of the updated cumulative confidence levels 232 for the AI tasks of all network entities, e.g., ηi(k+1) for the AI task of network entity i 210. The application layer 220 of the AI task of network entity i 210 also provides the task requirements 234, including the confidence level requirement and the delay requirement, to the controller 204 for initialization before the execution of AI tasks.
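To give a feel for how aggregating several inference results can raise the cumulative confidence level, the sketch below uses a simple stand-in rule, a normalized product of the class probability vectors, which assumes the results are conditionally independent. This is not the stochastic cumulative DNN inference scheme of this disclosure, which relies on a data-driven joint probability density function profiling. The names `fuse` and `confidence_level` and the probability floor of 1e-12 are illustrative assumptions.

```python
import math


def confidence_level(probs):
    """1 minus normalized entropy of a class probability vector."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - entropy / math.log(len(probs))


def fuse(results):
    """Stand-in aggregation: normalized product of class probability vectors."""
    k = len(results[0])
    log_scores = [0.0] * k
    for probs in results:
        if len(probs) != k:
            raise ValueError("all results must cover the same classes")
        for j, p in enumerate(probs):
            # Floor at a tiny value so that log(0) is never evaluated.
            log_scores[j] += math.log(max(p, 1e-12))
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_scores)
    weights = [math.exp(s - m) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]
```

With two agreeing results of [0.6, 0.4], the fused vector is approximately [0.692, 0.308], and its confidence level is higher than that of either result alone.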
According to embodiments, the confidence level of a DNN inference result (predicted class probability vector), which is further defined elsewhere herein, can have a value between 0 and 1. A confidence level requirement or confidence threshold, ηT, can be a value between 0 and 1, which defines a threshold for the confidence level of the result. This confidence level can be associated with either a DNN inference result based on a single data sample or a cumulative DNN inference result based on multiple data samples for a classification task. It will be understood that these data samples can be considered to be samples of the intermediate data. A larger confidence threshold value corresponds to a more stringent confidence level requirement.
According to embodiments, with the cumulative DNN inference scheme, the confidence level of the cumulative DNN inference result, which may also be referred to as a cumulative confidence level, for the classification task gradually increases, with fluctuations, by combining more DNN inference results over time. The cumulative confidence level can continue to increase until it reaches the confidence level threshold, namely a confidence level requirement, at which point the classification task is completed.
According to embodiments, the delay requirement is a value which defines a delay threshold for the classification task. If the confidence level threshold is satisfied before or at the delay threshold, the classification task is considered to be successful with a satisfactory quality of service (QoS). Otherwise, there is a delay violation penalty applied to the corresponding network entity which initiated the classification task. An example of a delay requirement can be 100 ms, or another time period which may be determined based on the application layer's requirement.
Aspects of this disclosure can improve the confidence level and delay performance of AI inference with energy and resource efficiency, using the following design elements. Each design element is described in more detail below.
Stochastic cumulative DNN inference scheme: For the AI task of each network entity, multiple fast and full inference results based on different data samples can be used by the application layer. A stochastic cumulative DNN inference scheme can aggregate multiple DNN inference results, calculate a cumulative DNN inference result, and update a cumulative confidence level. By aggregating more DNN inference results, the cumulative confidence level may be improved.
An adaptive control scheme for cumulative DNN inference will be further described. The controller 204 can adaptively decide when to request new data samples for fast DNN inference 214 at the network entity, and how to offload the intermediate data 226 from caches at the network entity to the AP 202 for enhanced DNN inference. These decisions may consider dynamics in current cumulative confidence level, caching status, and remaining time to deadline for the AI task of each network entity over time. Specifically, the controller 204 can periodically make offloading decisions for the AI tasks of multiple network entities, which can be interpreted as either data requests or enhanced inference requests, depending on the value of the offloading decisions.
If the offloading decision for an AI task at a time slot is not to offload, a data request can be sent from the network controller to data sources (Type-A network elements or network entities) of the AI task. The network controller can also notify a Type-B network element of the data request. Then, the data sources send a new data sample to the fast DNN inference module at the notified Type-B network element.
If the offloading decision for an AI task at a time slot is to offload, an enhanced inference request may be sent from the network controller to both a Type-B network element and a Type-C network element for the AI task, to request one or more pieces or samples of the intermediate data stored in the cache of the Type-B network element to be offloaded to the Type-C network element.
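The mapping from an offloading decision to the two kinds of request can be sketched as follows. This is a minimal Python illustration, not a definitive implementation: the `Message` record, the recipient labels, and the function name `messages_for_decision` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    kind: str         # "data_request" or "enhanced_inference_request"
    recipient: str    # hypothetical label for the receiving element type
    task_id: int
    num_samples: int  # samples requested (data) or offloaded (enhanced)


def messages_for_decision(task_id: int, a: int) -> List[Message]:
    """Translate one offloading decision a = a_i(k) into control messages."""
    if a < 0:
        raise ValueError("offloading decision must be non-negative")
    if a == 0:
        # Not offloading: request one new data sample from the data sources
        # (Type-A) and notify the Type-B element that runs fast inference.
        return [
            Message("data_request", "type_a", task_id, 1),
            Message("data_request", "type_b", task_id, 1),
        ]
    # Offloading: the Type-B element sends `a` cached samples of intermediate
    # data, and the Type-C element runs enhanced inference on them.
    return [
        Message("enhanced_inference_request", "type_b", task_id, a),
        Message("enhanced_inference_request", "type_c", task_id, a),
    ]
```

For example, `messages_for_decision(3, 0)` yields two data requests, while `messages_for_decision(3, 2)` yields enhanced inference requests addressed to the Type-B and Type-C elements.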
A deep Q learning algorithm with extra experience replay will be further described. A modified deep Q learning algorithm with extra experience replay may be used to determine when to adaptively offload AI tasks. Besides an ordinary experience replay which stores each transition over time, the extra experience replay may be configured to store transitions in episodes with no delay violation penalty for all AI tasks of different devices, which can help the learning agent to learn more from these good and rare transitions and converge to a desired solution with minimum delay violation penalty.
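The two replay memories may be illustrated with a short Python sketch. The class name `DualReplay`, the buffer capacities, and the `extra_fraction` mixing ratio used when sampling are assumptions made for illustration; the description above does not specify how samples from the two memories are combined.

```python
import random
from collections import deque


class DualReplay:
    """Ordinary replay plus an extra replay for penalty-free episodes."""

    def __init__(self, capacity=10000, extra_capacity=2000,
                 extra_fraction=0.25, seed=0):
        self.ordinary = deque(maxlen=capacity)
        self.extra = deque(maxlen=extra_capacity)
        self.extra_fraction = extra_fraction  # assumed mixing ratio
        self._episode = []
        self._episode_penalty = 0.0
        self._rng = random.Random(seed)

    def add(self, transition, penalty, done):
        # Every transition goes into the ordinary replay.
        self.ordinary.append(transition)
        self._episode.append(transition)
        self._episode_penalty += penalty
        if done:
            # Keep the whole episode in the extra replay only if no delay
            # violation penalty was incurred by any task in the episode.
            if self._episode_penalty == 0:
                self.extra.extend(self._episode)
            self._episode = []
            self._episode_penalty = 0.0

    def sample(self, batch_size):
        n_extra = min(int(batch_size * self.extra_fraction), len(self.extra))
        n_ordinary = min(batch_size - n_extra, len(self.ordinary))
        batch = self._rng.sample(list(self.extra), n_extra)
        batch += self._rng.sample(list(self.ordinary), n_ordinary)
        return batch
```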
In some embodiments, the controller 204 is configured to determine whether a cumulative DNN inference is to be performed. For example, if the controller 204 determines that a full DNN inference is sufficient (i.e. meets the confidence level requirement of the AI task), for example based on task requirement and/or confidence level requirement, the controller 204 may inform, by for example transmitting a notification, the network element i 210 and the application-layer cumulative DNN inference module 220 of the sufficiency of the full DNN inference. For example, this notification may in some instances be a sufficiency notification. Upon receipt of the notification, the network element i 210 will not perform or will cease performing a fast DNN inference (i.e. determine a fast DNN inference result). In addition, upon receipt of the notification, the application-layer cumulative DNN inference module 220 will not perform or will cease performing a cumulative inference. It will be readily understood that the notification sent to the network element i 210 may be the same as or similar to, or may be different from the notification sent to the application-layer cumulative DNN inference module 220 in configuration and/or information therein, and that each of these notifications will have information or instructions which is suitable for and understandable by the respective network element to which it is transmitted and received thereby. In some embodiments, a notification indicating or instructing to cease the fast DNN inference or the cumulative inference is respectively sent to the network element i 210 and the application-layer cumulative DNN inference module 220, and the network element i 210 and the application-layer cumulative DNN inference module 220 according to the notification will respectively not perform or cease performing the fast DNN inference or the cumulative inference. 
A general scenario can include one network controller 302 and multiple AI tasks 304. For each AI task 304, there may be one or more data sources which provide data samples for AI inference, such as the Type A network elements for AI task i 306. Each AI task 304 may also include one or more Type B network elements 308 close to data sources but with limited computing resources. The Type B network elements 308 may provide fast DNN inference functionality and caching functionality, as described above.
The network can also include a Type-C network element 310 farther from the data sources but with abundant computing resources, which can be shared among multiple AI tasks, such as AI tasks 304, and can provide enhanced DNN inference functionality. Moreover, the application layer 312 for an AI task can be placed at another network element other than the Type-B 308 or Type-C 310 network elements, which is referred to as Type-D network element 314. The Type-D network element 314 includes a cumulative DNN inference module 316, which is the same as the cumulative DNN inference module 220 and supports cumulative DNN inference for an AI task, and the fast inference results 318 (same as the fast inference results 216) or full inference results 320 (same as the full inference results 228) for the AI task should be transmitted to the corresponding Type-D network element 314. In some embodiments, the Type-D network element 314 is split into two network entities, e.g. a control plane entity and a data plane entity. The data plane entity includes the cumulative DNN inference module 316 and receives the fast inference results 318 and the full inference results 320; the control plane entity provides the task requirements 326 and the cumulative confidence level 328 to the network controller 302. This may be considered to be similar to the task requirements 234 and the cumulative confidence level update 232 as defined in
As discussed in further detail elsewhere herein, according to embodiments, with the cumulative DNN inference scheme, the confidence level of the cumulative DNN inference result, which may also be referred to as a cumulative confidence level, for the classification task gradually increases, with fluctuations, by combining more DNN inference results over time. The cumulative confidence level can continue to increase until it reaches a threshold, namely a confidence level requirement, at which point the classification task is completed, namely the task requirements are satisfied.
The illustrated adaptive control framework 300 may be used for cumulative DNN inference of multiple AI tasks in general application scenarios. The framework 300 includes interactions among the network controller 302 and different types of network elements 306, 308, 310, 314 for AI task i, and simplifies interactions for other AI tasks. For each AI task, there can be multiple Type-A network elements 306 and Type-B network elements 308 and there can be one Type-D network element 314, which may be different from the corresponding network elements for other AI tasks. The framework 300 can also include a Type-C network element 310, which can be shared by multiple AI tasks 304, where the computing resources for enhanced DNN inference are shared among multiple AI tasks. Several potential specific scenarios are described herein, which are simplified example scenarios for illustrative purposes under the general framework 300 described here.
According to embodiments, the network controller 302 can transmit a data request 340 to a Type B network element 308 and to a Type A network element 306. This data request 340 may be considered to be the same or similar to the data request 206 as illustrated in
According to embodiments, the network controller 302 can transmit an enhanced inference request 344 to a Type B network element 308 and to a Type C network element 310. This enhanced inference request 344 may be considered to be the same or similar to the enhanced inference request 222 as illustrated in
According to embodiments, the Type B network element can transmit one or more samples of intermediate data 342 to a Type C network element 310, wherein these one or more samples of intermediate data are provided in order for the Type C network element to determine an enhanced DNN inference. The one or more samples of intermediate data 342 may be considered to be the same or similar to the one or more samples of intermediate data 226 as illustrated in
In some embodiments, the controller is configured to determine whether a cumulative DNN inference is to be performed. For example, if the network controller 302 determines that a full DNN inference is sufficient (i.e. meets the confidence level requirement of the AI task), for example based on task requirement and/or confidence level requirement, the network controller 302 may inform, by for example transmitting a notification, the Type-B network element 308 and the Type-D network element 314 of the sufficiency of the full DNN inference. For example, in some instances this notification may be considered as a sufficiency notification. Upon receipt of the notification, the Type B network element 308 will not perform or will cease performing a fast DNN inference (i.e. determining a fast DNN inference result). In addition, upon receipt of the notification, the Type D network element 314 will not perform or will cease performing a cumulative inference (i.e. determining a cumulative inference result). In some embodiments, a notification indicating or instructing to cease the fast DNN inference or the cumulative inference is respectively sent to the Type-B network element 308 and the Type-D network element 314, and the Type-B network element 308 and the Type-D network element 314 according to the notification will respectively not perform or cease performing the fast DNN inference or the cumulative inference. It will be readily understood that the notification sent to the Type-B network element 308 may be the same as or similar to, or different from, the notification sent to the Type-D network element 314 in configuration and/or information therein, and that each of these notifications will have information or instructions which is suitable for and understandable by the respective network element to which it is transmitted and received thereby.
For example, in order to provide a level of continuity between
For further continuity between
According to embodiments, having further regard to
According to embodiments and having regard to
As described, a data-driven stochastic cumulative DNN inference scheme may be used to aggregate the contributions of multiple DNN inference results based on different data samples and different DNN models. The scheme may form a cumulative DNN inference result with potentially improved confidence level and this result can be updated with more aggregated DNN inference results, as those results become available.
The cumulative DNN inference scheme can combine data from multiple DNN inference results. For example, consider J DNN inference results based on either fast or full DNN inference for an M-class classification task. The true class label for the classification task may be unknown. Let zj={zj,m, 1≤m≤M} denote the j-th (1≤j≤J) DNN inference result, which is an M-dimension predicted class probability vector. Let binary parameter χj indicate whether zj is generated by fast or full DNN inference, with χj=1 indicating full DNN inference, and χj=0 otherwise.
Each of the DNN inference results can be assumed to be conditionally independent given the same unknown true class label. For example, with the same unknown true class label, one DNN model may generate conditionally independent DNN inference results for different data samples, and different DNN models may generate conditionally independent DNN inference results for the same data sample.
Let Zj={z1, . . . , zj} denote the set of DNN inference results up to the j-th DNN inference result. The cumulative DNN inference result, given DNN inference result set Zj, may be defined as an M-dimension predicted class probability vector, denoted by oj={oj,m, 1≤m≤M}, with oj,m=Pr(Y=m|Zj) representing the predicted conditional probability of class m given DNN inference result set Zj. Based on Bayes' theorem and the conditional independence assumption, oj,m is written as:
where Pr(Y=m) represents the prior class distribution, and Pr(zj′|Y=m) represents the conditional joint probability density of the j′-th DNN inference result (i.e., predicted class probability vector zj′) given true class label Y=m. This formula contains:
where fmA(zj′) denotes the conditional joint probability density of zj′ given true class label Y=m if zj′ is a fast DNN inference result, and fmU(zj′) denotes the conditional joint probability density of zj′ given true class label Y=m if zj′ is a full DNN inference result.
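The two expressions referred to above can be reconstructed from the stated definitions. The following is a sketch that follows from Bayes' theorem, the conditional independence assumption, and the per-class update described at blocks 706 to 710; the class index m′ in the normalizing sum is notation introduced here:

```latex
o_{j,m} = \Pr(Y=m \mid Z_j)
        = \frac{\Pr(Y=m)\,\prod_{j'=1}^{j}\Pr(z_{j'} \mid Y=m)}
               {\sum_{m'=1}^{M}\Pr(Y=m')\,\prod_{j'=1}^{j}\Pr(z_{j'} \mid Y=m')},
\qquad
\Pr(z_{j'} \mid Y=m) = (1-\chi_{j'})\,f_m^{A}(z_{j'}) + \chi_{j'}\,f_m^{U}(z_{j'}).
```

The denominator does not depend on m, so it acts only as a normalizing constant that makes the M entries of oj sum to one.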
For the cumulative DNN inference result, oj={oj,m, 1≤m≤M}, a cumulative confidence level may be defined as one minus normalized entropy, as given by:
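A form consistent with this description is shown next. It assumes that the entropy is normalized by its maximum value for an M-class distribution, log M, so that the result lies between 0 and 1, and it uses the symbol ηj for the cumulative confidence level after j results by analogy with ηi(k):

```latex
\eta_j = 1 - \frac{H(o_j)}{\log M}
       = 1 + \frac{1}{\log M}\sum_{m=1}^{M} o_{j,m}\,\log o_{j,m},
\qquad 0\log 0 := 0 .
```

Under this form, a uniform prediction (oj,m=1/M for all m) gives ηj=0, and a prediction that places all probability on one class gives ηj=1.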
Prior to executing a stochastic DNN inference scheme for multiple fast and full DNN inference results, the following initialization steps may be performed.
First, for a training dataset with known class labels Y, the prior class distribution Pr(Y=m) may be estimated for any class m(1≤m≤M).
Next, the training data set may be split into M class-specific training data subsets according to known class labels Y. With each class-specific training data subset, a subset of fast DNN inference results may be collected along with a subset of full DNN inference results. These may be collected by running the fast and full DNN models, respectively, on each training data sample.
Finally, the conditional joint probability density functions (PDFs) of fast and full DNN inference results may be profiled for each class m with the corresponding subset of DNN inference results, i.e., fmA(z) and fmU(z) for class m, using non-parametric probability density estimation methods such as kernel density estimation.
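The three initialization steps can be sketched in Python as follows. This is an illustrative simplification under stated assumptions: classes are indexed from 0 rather than 1, the kernel is an isotropic Gaussian with a fixed, hand-picked `bandwidth`, and all function names are hypothetical. The same profiling would be run once on fast inference results and once on full inference results.

```python
import math
from collections import Counter, defaultdict


def estimate_prior(labels, num_classes):
    """Step 1: empirical prior Pr(Y=m) from known training labels."""
    counts = Counter(labels)
    return [counts.get(m, 0) / len(labels) for m in range(num_classes)]


def split_by_class(results, labels):
    """Step 2: group inference results by the known class label."""
    subsets = defaultdict(list)
    for z, y in zip(results, labels):
        subsets[y].append(z)
    return subsets


def gaussian_kde(samples, bandwidth):
    """Step 3: density as an average of Gaussian kernels on the samples."""
    dim = len(samples[0])
    norm = (2.0 * math.pi * bandwidth ** 2) ** (dim / 2.0) * len(samples)

    def pdf(z):
        total = 0.0
        for s in samples:
            sq_dist = sum((a - b) ** 2 for a, b in zip(z, s))
            total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        return total / norm

    return pdf


def profile_densities(results, labels, bandwidth=0.1):
    """Return {class m: estimated density f_m} for one DNN model."""
    subsets = split_by_class(results, labels)
    return {m: gaussian_kde(zs, bandwidth) for m, zs in subsets.items()}
```

In practice the bandwidth would be tuned to the data, for example by cross-validation; the fixed value here only keeps the sketch short.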
These initialization steps may be used prior to a cumulative DNN inference scheme which gradually aggregates J fast or full inference results and updates both a cumulative DNN inference result and a cumulative confidence level at each step j as further discussed in the following steps.
This scheme 700 may be used after the initialization steps described above. At block 702, the cumulative DNN inference scheme 700 includes inputting prior class distribution and the profiled PDF functions of any class for both the fast and the full DNN models.
At block 704, the cumulative DNN inference scheme 700 includes initializing scalar sm=Pr(Y=m) for each class m and initializing j=1.
At block 706, the cumulative DNN inference scheme 700 includes calculating conditional joint probability density Pr(zj|Y=m), the conditional joint probability density of the j-th DNN inference result zj, for each class m. Depending on whether zj is a fast DNN inference result or a full DNN inference result, this may use either fmA(z) or fmU(z) as the PDF function for class m. Specifically, this may calculate Pr(zj|Y=m)=(1−χj)fmA(zj)+χjfmU(zj) for class m, where binary parameter χj indicates whether the result is fast or full, as described above.
At block 708, the cumulative DNN inference scheme 700 includes updating scalar sm=smPr(zj|Y=m) for each class m.
At block 710, the cumulative DNN inference scheme 700 includes obtaining cumulative DNN inference result given Zj, i.e., oj={oj,m, ∀m} where
At block 712, the cumulative DNN inference scheme 700 includes obtaining cumulative confidence level given Zj as
At block 714, the cumulative DNN inference scheme 700 includes checking whether j<J. If so, at block 716, the cumulative DNN inference scheme 700 includes increasing j by 1, and repeating blocks 706, 708, 710, 712, and 714. Otherwise, if j=J, the cumulative DNN inference scheme 700 ends at block 718.
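Blocks 704 to 716 can be sketched as a short Python loop. The sketch assumes 0-indexed classes, takes the profiled densities as plain callables, normalizes the scalars at block 710 by their sum, and computes the confidence level as one minus entropy normalized by log M. Rescaling the scalars after each step is an addition for numerical stability and does not change the normalized result. The function name and argument names are hypothetical.

```python
import math


def cumulative_inference(prior, pdf_fast, pdf_full, results, is_full):
    """Return [(o_j, eta_j)] after aggregating each inference result z_j."""
    num_classes = len(prior)
    s = list(prior)                        # block 704: s_m = Pr(Y=m)
    history = []
    for z, chi in zip(results, is_full):   # blocks 714/716: loop over j
        for m in range(num_classes):
            # Block 706: pick the fast or full density for class m.
            likelihood = pdf_full[m](z) if chi else pdf_fast[m](z)
            # Block 708: s_m <- s_m * Pr(z_j | Y=m).
            s[m] *= likelihood
        total = sum(s)
        if total <= 0.0:
            raise ValueError("all class likelihoods are zero")
        # Block 710: normalize to get the cumulative inference result.
        o = [v / total for v in s]
        s = list(o)  # rescale to avoid underflow; ratios are unchanged
        # Block 712: confidence = 1 - normalized entropy.
        entropy = -sum(p * math.log(p) for p in o if p > 0.0)
        eta = 1.0 - entropy / math.log(num_classes)
        history.append((o, eta))
    return history
```

With toy likelihood functions that favor the class whose predicted probability is higher, repeated results pointing to the same class raise the cumulative confidence level step by step, and a full inference result moves it more than a fast one when its density is more sharply peaked.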
According to some embodiments, the cumulative DNN inference scheme can improve the confidence level for AI classification tasks by aggregating multiple inference results, and it can be robust to infrequent false inferences, especially when the number of aggregated inference results is larger. The confidence level metric can evaluate the uncertainty or information entropy in a DNN inference result. A larger confidence level can be considered to indicate a lower uncertainty (less information entropy) in the predicted class probability vector. As such, the accuracy of AI classification, which evaluates the average percentage of correct classification, can be improved by the cumulative DNN inference scheme, as the uncertainty in the prediction for the true class can be reduced by improving the confidence level of the cumulative inference result.
In the considered device-edge co-inference framework with cumulative DNN inference for multiple network entities or network elements, each initiating an AI task, the update of cumulative confidence levels during time slot k depends on the offloading decisions during the time slot. Specifically, the cumulative confidence level of network entity i at the beginning of time slot k+1, denoted as ηi(k+1), is updated based on the proposed cumulative DNN inference scheme by aggregating either one new fast inference result for ai(k)=0 or a number of ai(k) new full inference results for ai(k)>0 with all the past inference results at device i from the start of the AI task.
An adaptive control scheme may be used with cumulative DNN inference of multiple AI tasks. The adaptive control scheme may seek to improve confidence levels and reduce delays for AI tasks, while improving both energy and network resource efficiency.
For example, consider that each network entity in a network initiates an AI classification task at the beginning of time slot k=1, with delay requirement Ki in number of time slots for network entity i. If the confidence level requirement, ηT, is satisfied at or before time slot Ki, the task of network entity i is successfully finished and the quality-of-service (QoS) requirement is satisfied. Otherwise, the cumulative DNN inference continues for network entity i until the confidence level requirement is satisfied, in which case a delay violation penalty may be applied to the network entity, as defined as follows.
As discussed in further detail elsewhere herein, according to embodiments, with the cumulative DNN inference scheme, the confidence level of the cumulative DNN inference result, which may also be referred to as a cumulative confidence level, for the classification task gradually increases, with fluctuations, by combining more DNN inference results over time. The cumulative confidence level can continue to increase until it reaches a threshold, namely a confidence level requirement, at which point the classification task is completed. Having regard to
As discussed in further detail elsewhere herein, according to embodiments, the delay requirement is a value which defines a delay threshold for the classification task. If the confidence level requirement is satisfied before or at the delay threshold, the classification task is considered to be successful with a satisfactory QoS. Otherwise, there is a delay violation penalty applied to the corresponding network entity which initiated the classification task. An example of a delay requirement can be 100 ms, or another time period which may be determined based on the application layer's requirement. Having regard to
Let Pi(k) denote the delay violation penalty of network entity i at the end of time slot k. The penalty Pi(k) is zero for 1≤k<Ki, as the deadline for network element i has not been reached. For k≥Ki, if the current cumulative confidence level does not reach the required confidence level threshold ηT, such that ηi(k)<ηT for network element i, the penalty Pi(k) may increase linearly with the number of time slots behind deadline. For example, P may be a constant denoting the unit penalty for each time slot with delay violation. Thus, the delay violation penalty may be calculated as:
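One possible form of this penalty is sketched in Python below. It is an illustrative assumption rather than the exact expression: it counts the number of time slots behind the deadline as k−Ki+1, so that the penalty equals the unit penalty P when the deadline slot itself ends without the threshold being reached. The function and argument names are hypothetical.

```python
def delay_violation_penalty(k, deadline, confidence, threshold, unit_penalty):
    """Penalty P_i(k) for one network element at the end of time slot k.

    k, deadline  : current slot index and delay requirement K_i (in slots)
    confidence   : current cumulative confidence level of the task
    threshold    : required confidence level (eta_T)
    unit_penalty : constant P, the penalty per slot of delay violation
    """
    if k < deadline or confidence >= threshold:
        # Deadline not yet reached, or the task already meets its requirement.
        return 0.0
    # Assumed linear growth with the number of slots behind the deadline.
    return unit_penalty * (k - deadline + 1)
```

With a deadline of 5 slots and a unit penalty of 2, a task that is still below the threshold incurs no penalty at slot 3, a penalty of 2 at slot 5, and a penalty of 6 at slot 7.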
The delay violation penalty Pi(k) of network element i for k≥Ki depends on all the offloading decisions from time slot 1 to time slot k, as the sequence of offloading decisions determines the total number of fast and full DNN inference results obtained for network entity i until time slot k. To improve the confidence level performance within a given delay requirement and reduce the delay violation penalty, it may be preferable to execute full DNN inference rather than fast DNN inference, i.e., offloading is preferred over local computing for QoS improvement, as full DNN inference provides higher confidence level gain on average. However, as an example, offloading may lead to more network resource consumption in terms of transmission and edge computing. Moreover, also as an example, local energy consumption should also be considered, as some IoT devices may be battery powered and thus energy limited. Also, there are potential trade-offs between local energy consumption and network resource consumption. As the intermediate data size is usually small, the local transmission energy for offloading one intermediate data sample to obtain one full inference result is usually smaller than the local computing energy for fast DNN inference. The network resource consumption cost and energy consumption cost are formally defined as follows.
The adaptive control scheme may seek to measure and limit network resource consumption cost. Let βi(k) denote the fraction of uplink transmission resource usage for offloading ai(k) intermediate data samples from network element i to a Type C network element. A corresponding fraction of edge computing resource usage is defined at the Type C network element for enhanced DNN inference of the ai(k) offloaded intermediate data samples from network element i. Let ρ(k) denote the network resource consumption cost during slot k, which is the maximum of the total fraction of uplink transmission resource usage, Σi βi(k), and the total fraction of edge computing resource usage, each summed over all devices in the set during time slot k.
The adaptive control scheme may seek to measure and limit energy resource consumption cost. Let ei(k) denote the energy consumption at network element i during time slot k, which is either the transmission energy for offloading ai(k) intermediate data samples from network element i to the Type C network element, or the computing energy for one fast DNN inference at network element i. The total energy consumption cost at all network elements in the set during time slot k is e(k)=Σi ei(k).
The adaptive control scheme may seek to characterize the trade-off between local energy consumption and network resource consumption. For example, this cost may be denoted by c(k) and defined as a linearly weighted summation of the total local energy consumption cost and the network resource consumption cost during slot k, with weighting factor ω1∈(0,1).
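The cost terms defined above can be sketched in Python as follows. The sketch assumes that the weighted summation takes the form of a convex combination, w1·e(k)+(1−w1)·ρ(k), and that the energy values have already been scaled to be comparable with the resource fractions; neither assumption is confirmed by the description, and the function names are hypothetical.

```python
def network_resource_cost(uplink_fractions, computing_fractions):
    """rho(k): the larger of total uplink and total edge computing usage."""
    return max(sum(uplink_fractions), sum(computing_fractions))


def total_energy(energies):
    """e(k): sum of per-element energy consumption e_i(k)."""
    return sum(energies)


def slot_cost(energies, uplink_fractions, computing_fractions, w1):
    """c(k): assumed convex combination of energy and network resource cost."""
    if not 0.0 < w1 < 1.0:
        raise ValueError("weighting factor must lie in (0, 1)")
    rho = network_resource_cost(uplink_fractions, computing_fractions)
    return w1 * total_energy(energies) + (1.0 - w1) * rho
```

Taking the maximum of the two resource totals means the cost is driven by whichever resource is the bottleneck in the slot.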
According to embodiments, the adaptive control scheme can be executed by the controller 204 in
The adaptive control scheme may look to trade off between using less local energy but more network resources to offload an intermediate data sample and obtain a full inference result with higher confidence level, or using more local energy but no network resources to process a new data sample and obtain a fast inference result with lower confidence level.
Therefore, the adaptive control scheme may be configured to adaptively make offloading decisions for devices with efficient resource allocation among devices. The scheme may seek to minimize the long-run total cost in terms of network resource and local energy consumption and the total delay violation penalty until all the tasks are finished with confidence level satisfaction.
To support the offloading decisions for the devices during time slot k, i.e., ak={ai(k), ∀i}, the uplink transmission resources between the devices and the AP and the edge computing resources at the AP may be allocated among the network elements in the set, to ensure that the ai(k) intermediate data samples can be transmitted from network element i to the Type C network element and finish the enhanced DNN inference at the Type-C network element within time slot duration r under the resource capacity constraints if ai(k)>0, with the minimum cost in terms of energy consumption at the devices and network resource consumption. An optimal resource allocation can be obtained using traditional optimization techniques. The details of the resource allocation optimization problem are omitted. Let c*(k) denote the minimal cost with optimal resource allocation given offloading decision vector ak={ai(k), ∀i} in time slot k. The sequence of offloading decisions over consecutive time slots can be made using a Markov decision process for adaptive offloading decision.
To minimize the total cost and total delay violation penalty in the long run, the adaptive control scheme may adaptively determine the offloading decisions during the cumulative DNN inference for the AI tasks of multiple network elements. These adaptive offloading decisions may be formulated as a Markov decision process. The state sk, action ak, and reward rk in the Markov decision process are formally defined as follows.
For time slot k, the adaptive control scheme may be configured to consider the current caching state at each device, q(k)={qi(k), ∀i}, as the number of samples of intermediate data offloaded from a network element should not exceed the number of samples of intermediate data currently stored in the local cache. The adaptive control scheme may also consider the current cumulative confidence level at each network element, η(k)={ηi(k), ∀i}, and the current time slot index, k. Given the delay requirement Ki for device i, the remaining number of time slots before deadline is known at time slot k. It may be more beneficial to offload more intermediate data from a network element whose current cumulative confidence level is low and remaining time to deadline is short, to reduce the potential delay violation penalty. Hence, the state for time slot k, denoted by sk, can be composed of three parts: caching state q(k), current cumulative confidence levels η(k), and current time slot index k, represented as sk=[q(k), η(k), k]. At the beginning of an episode, the state can be initialized as s1=[q(1), η(1), 1]=[0, 0, 1]. At the end of time slot k, the state can then be updated as sk+1=[q(k+1), η(k+1), k+1]. Both the caching state and the time slot index can be updated inside the network controller, while the cumulative confidence levels can be updated from the application-layer cumulative DNN inference modules for each network element.
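The state and its update can be sketched in Python as follows. The offloading case uses the stated update qi(k+1)=qi(k)−ai(k). The non-offloading case assumes that one fast DNN inference adds one sample of intermediate data to the local cache, which is an assumption made for this illustration. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class State:
    cache: Tuple[int, ...]         # q(k): cached samples per network element
    confidence: Tuple[float, ...]  # eta(k): cumulative confidence levels
    slot: int                      # k: current time slot index


def initial_state(num_elements: int) -> State:
    """s_1 = [0, 0, 1]: empty caches, zero confidence, first slot."""
    return State((0,) * num_elements, (0.0,) * num_elements, 1)


def next_state(state: State, action: Sequence[int],
               new_confidence: Sequence[float]) -> State:
    """Build s_{k+1} from s_k, the offloading vector a_k, and eta(k+1)."""
    new_cache = []
    for q, a in zip(state.cache, action):
        if a > q:
            raise ValueError("cannot offload more samples than are cached")
        # Offload: q - a.  No offload: assumed +1 from one fast inference.
        new_cache.append(q - a if a > 0 else q + 1)
    return State(tuple(new_cache), tuple(new_confidence), state.slot + 1)
```

The confidence levels are passed in rather than computed because, as described above, they are reported to the controller by the application-layer cumulative DNN inference modules.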
The action at time slot k is the offloading decision vector ak={ai(k), ∀i}. The action space corresponds to the set of feasible offloading decisions under network resource availability. The adaptive control scheme may predetermine the action space by checking the feasibility of a resource allocation optimization problem given each candidate offloading action.
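Predetermining the action space can be sketched in Python by enumerating candidate offloading vectors and keeping the feasible ones. The feasibility test `within_capacity` below is a deliberately simple stand-in for the resource allocation optimization problem: it assumes a fixed fraction of uplink and edge computing resource per offloaded sample. The per-element cap `max_offload` and all names are hypothetical.

```python
from itertools import product


def build_action_space(num_tasks, max_offload, is_feasible):
    """Enumerate offloading vectors (a_1, ..., a_N) and keep feasible ones."""
    candidates = product(range(max_offload + 1), repeat=num_tasks)
    return [a for a in candidates if is_feasible(a)]


def within_capacity(uplink_per_sample, compute_per_sample):
    """Return a toy feasibility check under unit resource capacities."""
    def check(action):
        total_samples = sum(action)
        return (total_samples * uplink_per_sample <= 1.0
                and total_samples * compute_per_sample <= 1.0)
    return check
```

For two tasks, at most two samples offloaded per task, and per-sample usage of 0.4 (uplink) and 0.5 (computing), the enumeration keeps the six vectors whose entries sum to at most two. The number of candidates grows exponentially with the number of tasks, which is one reason to compute the space once in advance.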
For adaptive offloading in cumulative DNN inference, the adaptive control scheme may be configured to jointly consider the cost and QoS performance. Let rk denote the reward during slot k, which incorporates both minimal cost c*(k) with optimal resource allocation and delay violation penalty Pi(k), given by
where ω2 is a positive weighting factor. In the expression of rk, the adaptive control scheme uses an exponential function to increase the cost gaps among different offloading decisions and make reward rk more sensitive to offloading decision.
According to embodiments, the adaptive control framework for cumulative DNN inference of multiple AI tasks can substantially maximize the energy and resource efficiency with a substantially minimum delay violation penalty for the cumulative confidence level satisfaction of all AI tasks. As the network resources can be shared among multiple AI tasks, the selection between fast and full DNN inference and the number of samples of intermediate data offloaded can be adaptively determined for each AI task, while including the consideration of dynamics in the current cumulative confidence levels, the caching state, and the remaining time to the deadline for different AI tasks. The AI model deployment with layer sharing between the fast and full DNN models can enable the reuse of intermediate data of the fast DNN inference for generating a new full inference result. This may improve the computation efficiency for obtaining full inference results. Hence, the resource efficiency of the cumulative DNN inference for AI tasks can be further enhanced by using the computation-efficient AI model deployment strategy.
According to embodiments, there is provided a deep Q learning algorithm with extra experience replay. The Markov decision process for adaptive offloading decision can be solved using a reinforcement learning (RL) approach. The goal is to find a policy, π(s), mapping a state to an action, to maximize the expected cumulative discounted reward $\mathbb{E}\left[\sum_{k=1}^{K}\gamma^{k} r_k\right]$, where $\mathbb{E}[\cdot]$ denotes expectation, K is the maximum number of time slots in an episode, and γ∈(0,1) is the discount factor. As the offloading actions are discrete, a modified deep Q learning algorithm based on the basic deep Q learning algorithm can be used to solve the Markov decision process. In deep Q learning, a state-action value function (i.e., Q function) can be defined as:
Having regard to
Having regard to episodes, consider that an RL agent 800 interacts with the intelligent IoT environment 802 with the device-edge co-inference framework for cumulative DNN inference of multiple devices in a sequence of episodes. Each episode contains a finite and variable number of learning steps, wherein there can be one learning step for one time slot. An episode starts when the devices initiate a new group of AI tasks whose confidence levels are initialized as 0, and ends when the last device finishes its task with confidence level satisfaction. At the beginning of a new episode, the time slot index k is initialized to 1.
Having regard to the interaction between the RL agent 800 and the intelligent IoT environment 802, within an episode, the RL agent observes state sk 804 and takes action ak 806 at the beginning of each time slot k. The deep Q learning uses an ε-greedy policy 808 for action selection to balance exploration and exploitation, with ε representing the exploration probability. With probability 1-ε, the action with the maximum Q value at state sk is selected, i.e., $a_k = \arg\max_{a} Q(s_k, a; \theta)$; with probability ε, a random action is selected. At the end of time slot k, the RL agent receives reward rk from the intelligent IoT environment 802, and transitions to new state sk+1.
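The ε-greedy selection described above can be illustrated with a minimal sketch. The function name `epsilon_greedy` and the representation of the Q values as a plain list indexed by action are assumptions made for illustration only and are not part of the described embodiments.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index from the Q values of the current state.

    q_values: list of Q(s_k, a) for every candidate action a (assumed layout).
    epsilon:  exploration probability in [0, 1].
    """
    if rng.random() < epsilon:
        # Explore: choose any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: choose the action with the maximum Q value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


# With epsilon = 0 the greedy action (index 1 here) is always returned.
greedy_action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0)
```

In practice ε is often decreased over the course of training so that the agent explores early and exploits later; whether and how ε is scheduled here is not stated in this passage.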
Having regard to the done signal, for example uk can be defined as a binary flag indicating if time slot k is the last time slot in the corresponding episode. If uk=1, the episode terminates at time slot k, and a done signal (uk (done) 810) is generated by the intelligent IoT environment 802. As previously discussed, an episode terminates if all the tasks of different devices are finished with confidence level satisfaction. The number of time slots (K) in an episode can be smaller than
if all tasks are finished before the required deadlines, in which case there is no delay violation penalty in the episode. It can also be larger than
when there is delay violation penalty. Hence, K is a variable which may take different values in different episodes.
Having regard to the evaluation and target deep Q networks (DQNs), the deep Q-learning can adopt two DQNs with the same neural network structure as Q function approximators, i.e., an evaluation DQN with weights θ 812 and a target DQN with slowly updated weights θ̂ 814. Every Kθ learning steps, θ̂ is replaced by θ. The Q functions approximated by the evaluation and target DQNs are represented as Q(sk, ak; θ) and Q̂(sk, ak; θ̂), respectively.
Having regard to learning from transitions in experience replay, at the end of time slot k, a new transition (sk, ak, rk, sk+1, uk) is added to a replay memory in the deep Q learning algorithm. Here, we refer to such a replay memory being updated per learning step as the ordinary experience replay 816. Traditionally, at each learning step k, an evaluation DQN with weights θ is trained with a mini batch of N transitions (also referred to as experiences) sampled from the ordinary replay memory. The n-th sampled experience is (sn, an, rn, sn+1, un). The evaluation DQN is trained by minimizing a loss function, defined as follows:
for all the sampled N transitions through gradient descent on θ, where yn is a target value estimated by target DQN, which can be defined as follows:
A gradient descent on θ can be performed as follows:
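A minimal sketch of the target value and loss computation is given here, assuming the standard deep Q learning form: a target yn = rn + γ(1 − un) max over a of Q̂(sn+1, a; θ̂), and a mean squared error loss over the mini-batch. This standard form is an assumption and may differ in detail from the expressions of the embodiments. The callables `target_q` and `eval_q`, which return a list of Q values for a state, are hypothetical stand-ins for the target and evaluation DQNs.

```python
def dqn_targets(batch, target_q, gamma):
    """Compute y_n for each transition (s, a, r, s_next, u) in the batch.

    target_q(s) is assumed to return the list of target-DQN Q values at s.
    When the done flag u equals 1 there is no next state to bootstrap from.
    """
    targets = []
    for _s, _a, r, s_next, u in batch:
        bootstrap = 0.0 if u == 1 else gamma * max(target_q(s_next))
        targets.append(r + bootstrap)
    return targets


def mse_loss(batch, targets, eval_q):
    """Mean squared error between targets and evaluation-DQN Q values."""
    errors = [
        (y - eval_q(s)[a]) ** 2
        for (s, a, _r, _s_next, _u), y in zip(batch, targets)
    ]
    return sum(errors) / len(errors)


# Two toy transitions; the second one ends the episode (u = 1).
batch = [(0, 1, 1.0, 1, 0), (1, 0, 2.0, 2, 1)]
targets = dqn_targets(batch, target_q=lambda s: [0.5, 2.0], gamma=0.9)
loss = mse_loss(batch, targets, eval_q=lambda s: [2.0, 2.8])
```

A gradient descent step would then move θ against the gradient of this loss, typically through an automatic differentiation library; that step is omitted here to keep the sketch free of external dependencies.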
Having regard to the episode-level penalty and episodic total penalty flag, the delay violation penalty Pi(k) for device i always equals zero before deadline Ki. Only for k≥Ki close to the end of an episode, Pi(k) may have positive values. However, Pi(k) for k≥Ki depends on all transitions from the beginning of the current episode (i.e., time slot 1) to time slot k. A penalty with such a property can be defined as an episode-level penalty. To indicate whether the QoS requirements of the AI tasks of all devices are satisfied or not, there is defined an episodic total penalty flag 818, which is set to 0 if all the transitions in an episode have no delay violation penalty and set to 1 otherwise. The episodic total penalty flag is set by the environment at the end of an episode.
Having regard to the extra experience replay, an episode with QoS satisfaction for all devices (i.e., with zero episodic total penalty flag) can be rare, especially at the early learning stage. Consequently, the sampling frequency for transitions in such zero-penalty episodes from the ordinary experience replay can be low, especially if the replay memory capacity is large. However, these rare transitions can be good transitions which can help the RL agent to learn how to satisfy the confidence level requirements without a delay violation penalty. To increase the sampling frequency for such good transitions and deal with the episode-level penalty, there is provided an extra replay memory 820. Specifically, all the transitions in a whole episode can be stored in an extra replay memory if the episodic total penalty flag is zero.
Having regard to the temporary memory 822, the storage mechanism for the extra replay memory 820 can be enabled by a temporary memory 822 which stores the transitions at each learning step and empties out before each new episode. If the episodic total penalty flag is zero at the end of an episode, all the transitions in the temporary memory 822 are popped out and stored in the extra replay memory 820. Otherwise, all the transitions in the temporary memory 822 are discarded.
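The storage mechanism described above can be sketched as follows. The class name `ReplayMemories`, the use of bounded deques, and the shared `capacity` parameter are illustrative assumptions; the description does not specify how the memories are sized or implemented.

```python
from collections import deque


class ReplayMemories:
    """Ordinary replay, extra replay, and per-episode temporary memory."""

    def __init__(self, capacity):
        # Bounded memories: the oldest transitions are dropped when full.
        self.ordinary = deque(maxlen=capacity)
        self.extra = deque(maxlen=capacity)
        self.temporary = []

    def store(self, transition):
        """Store one transition (s, a, r, s_next, u) at a learning step."""
        self.ordinary.append(transition)
        self.temporary.append(transition)

    def end_episode(self, penalty_flag):
        """Promote the episode to the extra memory only if the flag is zero."""
        if penalty_flag == 0:
            self.extra.extend(self.temporary)
        # In both cases the temporary memory is empty for the next episode.
        self.temporary.clear()


memories = ReplayMemories(capacity=1000)
memories.store(("s1", 0, -1.0, "s2", 0))
memories.store(("s2", 3, -5.0, "s3", 1))
memories.end_episode(penalty_flag=1)   # penalized episode: discarded
memories.store(("s1", 2, -1.0, "s2", 1))
memories.end_episode(penalty_flag=0)   # zero-penalty episode: kept
```

Emptying the temporary memory at the end of an episode, as done here, is equivalent to emptying it before the next episode begins.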
Having regard to learning from transitions in both ordinary and extra experience replays, with the extra experience replay memory 820, a mini-batch of N experiences 824 is sampled from the ordinary replay memory 816 and another mini-batch of N experiences 826 is sampled from the extra experience replay memory 820 at each learning step. The evaluation DQN is trained twice at each learning step: first with the N sampled experiences 824 from the ordinary replay memory 816, and then with the N sampled experiences 826 from the extra experience replay memory 820.
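The two training passes per learning step can be sketched as below. The helper `train_step` and the callable `train_fn`, which stands in for one gradient descent step on θ, are hypothetical. Skipping a memory that holds fewer than N transitions is an assumption added here, since the extra memory is empty until a zero-penalty episode has occurred; the description does not state how that case is handled.

```python
import random


def train_step(ordinary, extra, n, train_fn, rng=random):
    """Train on one mini-batch per memory and return the number of updates.

    ordinary, extra: sequences of stored transitions.
    n:               mini-batch size N.
    train_fn:        callable applied to each sampled mini-batch.
    """
    updates = 0
    # Ordinary replay first, then extra replay, as in the described order.
    for memory in (ordinary, extra):
        if len(memory) >= n:  # assumed guard for a still-sparse memory
            train_fn(rng.sample(list(memory), n))
            updates += 1
    return updates


seen_batches = []
# Early in training: the extra memory is still empty, so one update occurs.
early_updates = train_step(list(range(10)), [], 4, seen_batches.append)
# Later: both memories hold enough transitions, so two updates occur.
later_updates = train_step(list(range(10)), list(range(5)), 4, seen_batches.append)
```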
At block 902 initialization occurs, wherein θ̂ and θ are initialized for the target DQN and the evaluation DQN respectively. At block 904 a new episode begins: the state is initialized as s1, the done signal is set to zero, and k is set to 1. At block 906, for learning step k, sk is observed and action ak is selected according to an ε-greedy policy. At block 908 action ak is executed and a reward rk is collected. The transition to the next state sk+1 occurs together with the determination of the uk (done) signal. At block 910 transition (sk, ak, rk, sk+1, uk) is stored in the ordinary experience replay memory and the temporary memory.
At block 912 a random mini-batch of N transitions (sn, an, rn, sn+1, un) is sampled from the ordinary experience replay memory and at block 914 a gradient descent step on θ is performed. At block 916 a random mini-batch of N transitions (sn, an, rn, sn+1, un) is sampled from the extra experience replay memory and at block 918 a gradient descent step on θ is performed. At block 920, for every Kθ steps, θ̂ is set equal to θ. At decision 922, if the uk (done) signal is equal to 1, decision 924 determines whether the current episode is the last episode. If so, the process ends. If not, decision 926 checks whether the episodic total penalty is zero; if it is, the process moves to block 928, where all transitions in the temporary memory are popped out to the extra experience replay memory, and then to block 930, where the temporary memory is emptied. The process then moves to block 904. However, if the uk (done) signal at decision 922 is not equal to 1, k is set to k+1 and the process moves to block 906.
According to embodiments, an episode with QoS satisfaction for all devices (i.e., with all transitions in the episode having no penalty) can be rare, especially at the early learning stage. Because the extra experience replay stores the transitions of episodes with no penalty, the sampling frequency for such good transitions can be improved and the RL agent has more opportunities to learn from them. This may help the RL agent to converge towards a good solution with negligible delay violation penalty.
According to embodiments, a simulation of embodiments of the instant disclosure is performed. The simulation setup considers an edge-assisted intelligent IoT scenario with three intelligent IoT devices under the coverage of one AP. The AP is co-located with an edge server. The system parameters are given in TABLE 1. It is assumed that the devices have identical noise power, transmit power, uplink channel gain, computing capability, and energy efficiency.
For this simulation, a video classification application scenario is considered, where an AI classification task is to classify a moving object under the surveillance of a smart camera. UCF101, a typical video dataset which has been integrated in TensorFlow, is considered. The video dataset contains videos capturing moving objects belonging to 101 different classes. Five classes of video data are selected among the 101 classes, and the resulting small 5-class video dataset is denoted as UCF5. For each video in the UCF5 dataset, multiple consecutive frames are extracted with a frame sampling rate equal to 5 frames per second (fps). Hence, corresponding to the UCF5 video dataset, there is obtained a 5-class image dataset including all the extracted video frames. Then, for each AI classification task, there are multiple available data samples, which correspond to all the extracted frames of a randomly selected video belonging to an unknown class in the UCF5 video dataset.
The fast DNN model 1004 and the full DNN model 1002, which share the first few layers, are illustrated in
Given the DNN layer parameters, both the communication and computing resource demands for DNN inference can be determined. With the simulation parameters as defined in TABLE 1, the time slot length is set equal to the local computing delay for one fast DNN inference, i.e., τ=0.288 s. At most two intermediate data samples can be offloaded to the edge server, with the enhanced DNN inference finished, during one time slot. The action space for the deep Q learning algorithm includes 10 discrete offloading actions, i.e., (0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 1, 0), (0, 1, 1), (0, 2, 0), (1, 0, 0), (1, 0, 1), (1, 1, 0), (2, 0, 0). With the small action space, the minimal cost for each candidate offloading action can be pre-calculated by solving a resource allocation optimization problem. Then, the minimal costs can be used in the reward calculation at each learning step in the deep Q-learning algorithm for adaptive offloading decision. The evaluation and target deep Q networks both have three hidden layers with (128, 64, 32) neurons between the input and output layers. The activation function for each hidden layer is ReLU. Other learning parameters are summarized in TABLE 2. In the reward function, the weighting factor is set to ω2=30 and the unit penalty is set to P=400.
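The 10 offloading actions listed above can be reproduced by enumeration. The sketch below assumes that each tuple entry is the number of intermediate data samples offloaded by one of the three devices in a time slot and that the entries sum to at most two; this reading is inferred from the listed actions. The function name `offloading_actions` is hypothetical.

```python
from itertools import product


def offloading_actions(num_devices=3, max_offloaded=2):
    """Enumerate per-device offloading counts whose total is at most the cap."""
    per_device_counts = range(max_offloaded + 1)
    return [
        action
        for action in product(per_device_counts, repeat=num_devices)
        if sum(action) <= max_offloaded
    ]


actions = offloading_actions()
# len(actions) == 10, from (0, 0, 0) up to (2, 0, 0), in the listed order.
```

Because the action space is this small, a lookup table of pre-calculated minimal costs keyed by action is feasible, which matches the pre-calculation described above.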
With the trained fast and full DNN models, two sets of DNN inference results are obtained with the 5-class image dataset extracted from the UCF5 video dataset, which include fast and full inference results, respectively. With known class labels for each image in the training dataset, the joint probability density functions (pdfs) of fast and full DNN inference results are profiled given each true class label m, i.e., fmA(z) and fmU(z). The kernel density estimation method in MATLAB can be used to profile the pdfs. Subsequently, the cumulative DNN inference scheme according to embodiments can be performed.
For this example, there are approximately 600 videos in the UCF5 video dataset. For each video, J=50 video frames are randomly selected as available data samples for the cumulative DNN inference. As different data samples with the same true class label generate conditionally independent DNN inference results, the J data samples are reordered 100 times for each video, to create 100 different sequences of data samples based on each video. A sequence of data samples can be referred to as a data trace. Each data trace corresponds to an AI classification task. As such, 60000 AI classification tasks with different data traces for cumulative DNN inference can be simulated. It is noted that the video frames are not shuffled for cumulative DNN inference in a real intelligent IoT scenario. In this example, the video frames are shuffled in order to simulate more data traces.
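The construction of data traces by reordering can be sketched as follows. The function name `make_traces` and the fixed random seed are illustrative assumptions, and integer frame identifiers stand in for the actual video frames. The sketch does not check that the shuffled orders are distinct, which is overwhelmingly likely for J=50 frames.

```python
import random


def make_traces(frames, num_traces=100, seed=0):
    """Create data traces by shuffling copies of one video's selected frames."""
    rng = random.Random(seed)  # seeded for reproducibility (assumption)
    traces = []
    for _ in range(num_traces):
        trace = list(frames)   # copy so the original order is untouched
        rng.shuffle(trace)
        traces.append(trace)
    return traces


frames = list(range(50))            # J = 50 frame identifiers of one video
traces = make_traces(frames)        # 100 data traces for this video
total_tasks = 600 * len(traces)     # about 600 videos, hence 60000 tasks
```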
Cumulative confidence level: For this example, the cumulative confidence level can be determined and the relationship between the cumulative confidence level and the number of data samples is evaluated. The experiments for full and fast DNN inference are performed separately. For example, in the experiments with full DNN inference, all the J data samples in each data trace are processed by the full DNN model, and the corresponding J full inference results are aggregated based on the cumulative DNN inference scheme.
In addition to the confidence level performance metric, an accuracy performance metric is determined for the AI classification tasks with the cumulative DNN inference scheme. During the AI inference stage, the true class labels are unknown, and the AI classification application relies on the DNN inference results, which can be false. As previously noted, the cumulative confidence level gradually increases with possible fluctuations as the number of data samples increases. Because the confidence level represents uncertainty in a DNN inference result rather than the accuracy thereof, a single DNN inference result with a high confidence level can still be false if the predicted probability for a wrong class is high. However, if the cumulative confidence level, which aggregates the contributions of multiple data samples, is high, it is highly likely that the cumulative DNN inference result is accurate. The accuracy is estimated as the average ratio of correct inferences among all AI classification tasks with different data traces.
According to embodiments, the performance of the adaptive control scheme, and in particular of the deep Q learning algorithm for adaptive offloading decision, is further discussed. For time slot k in an episode, the current cumulative confidence levels, represented as η(k), are part of state sk. To determine the state transitions in terms of the cumulative confidence levels, we use the cumulative confidence level traces obtained from the cumulative DNN inference scheme. For simplicity, the average cumulative confidence level traces are used for both fast and full DNN inference. Consider differentiated task completion time requirements for the three devices, which are set as [9, 11, 13] in number of time slots. Assuming the devices have the same confidence level requirement, ηT, for their AI classification tasks, the performance of the deep Q learning algorithm can be evaluated for three different values of ηT among {0.93, 0.95, 0.97}, where ω1=0.90 by default.
Due to the priority on minimizing the network resource consumption at ω1=0.90, it can be seen in
According to embodiments, in order to evaluate the benefit of an extra experience replay which stores the transitions in episodes with no penalty, both the episodic (smoothed) total reward and the episodic total penalty during the training process are compared with and without the extra experience replay, with results shown in
As shown, the electronic device 2000 may include a processor 2010, such as a central processing unit (CPU) or specialized processors such as a graphics processing unit (GPU) or other such processor unit, memory 2020, non-transitory mass storage 2030, input-output interface 2040, network interface 2050, and a transceiver 2060, all of which are communicatively coupled via bi-directional bus 2070. According to certain embodiments, any or all the depicted elements may be utilized, or only a subset of the elements. Further, electronic device 2000 may contain multiple instances of certain elements, such as multiple processors, memories, or transceivers. Also, elements of the hardware device may be directly coupled to other elements without the bi-directional bus. Additionally, or alternatively to a processor and memory, other electronics, such as integrated circuits, may be employed for performing the required logical operations.
The memory 2020 may include any type of non-transitory memory such as static random-access memory (SRAM), dynamic random-access memory (DRAM), synchronous DRAM (SDRAM), read-only memory (ROM), any combination of such, or the like. The mass storage element 2030 may include any type of non-transitory storage device, such as a solid-state drive, hard disk drive, a magnetic disk drive, an optical disk drive, USB drive, or any computer program product configured to store data and machine executable program code. According to certain embodiments, the memory 2020 or mass storage 2030 may have recorded thereon statements and instructions executable by the processor 2010 for performing any of the method operations described above.
Embodiments of the present disclosure can be implemented using electronics hardware, software, or a combination thereof. In some embodiments, the disclosure is implemented by one or multiple computer processors executing program instructions stored in memory. In some embodiments, the disclosure is implemented partially or fully in hardware, for example using one or more field programmable gate arrays (FPGAs) or application specific integrated circuits (ASICs) to rapidly perform processing operations.
It will be appreciated that, although specific embodiments of the technology have been described herein for purposes of illustration, various modifications may be made without departing from the scope of the technology. In particular, it is within the scope of the technology to provide a computer program product or program element, or a program storage or memory device such as a magnetic or optical wire, tape or disc, or the like, for storing signals readable by a machine, for controlling the operation of a computer according to the method of the technology and/or to structure some or all of its components in accordance with the system of the technology.
Acts associated with the method described herein can be implemented as coded instructions in a computer program product. In other words, the computer program product is a computer-readable medium upon which software code is recorded to execute the method when the computer program product is loaded into memory and executed on the microprocessor of the wireless communication device.
Further, each operation of the method may be executed on any computing device, such as a personal computer, server, personal digital assistant (PDA), or the like and pursuant to one or more, or a part of one or more, program elements, modules or objects generated from any programming language, such as C++, Java, or the like. In addition, each operation, or a file or object or the like implementing each said operation, may be executed by special purpose hardware or a circuit module designed for that purpose.
Through the descriptions of the preceding embodiments, the present disclosure may be implemented by using hardware only or by using software and a necessary universal hardware platform. Based on such understandings, the technical solution of the present disclosure may be embodied in the form of a software product. The software product may be stored in a non-volatile or non-transitory storage medium, which can be a compact disc read-only memory (CD-ROM), USB flash disk, or a removable hard disk. The software product includes instructions that enable a computer device (personal computer, server, or network device) to execute the methods provided in the embodiments of the present disclosure. For example, such an execution may correspond to a simulation of the logical operations as described herein. The software product may additionally or alternatively include instructions that enable a computer device to execute operations for configuring or programming a digital logic apparatus in accordance with embodiments of the present disclosure.
Although the present invention has been described with reference to specific features and embodiments thereof, it is evident that various modifications and combinations can be made thereto without departing from the invention. The specification and drawings are, accordingly, to be regarded simply as an illustration of the invention as defined by the appended claims, and are contemplated to cover any modifications, variations, combinations, or equivalents that fall within the scope of the present invention.
This application is a continuation of International Application No. PCT/CA2022/051493, filed on Oct. 11, 2022, the disclosure of which is hereby incorporated by reference in its entirety.
| | Number | Date | Country |
|---|---|---|---|
| Parent | PCT/CA2022/051493 | Oct 2022 | WO |
| Child | 19077680 | | US |