This application claims priority to foreign European patent application No. EP 20306518.0, filed on Dec. 9, 2020, the disclosure of which is incorporated by reference in its entirety.
The invention relates to energy management of energy constrained electronic systems, for example Internet of Things (IoT) nodes, possibly depending on harvested energy for their power supply.
An IoT sensor node typically has sensor(s), processing unit(s), and a radio transmitter. It has a power supply, possibly recharged with an energy-harvester. The node's surroundings, such as the service provider, the weather and objects around the node, are uncertain, and they impact the sensor node's behaviour. These uncertainties are sometimes called “disturbances”. In practice, no prior information about these uncertainties is available, and when these uncertainties are estimated, the estimate is usually poor, i.e., far from actual events as they transpire. Thus, in this domain, several technical problems must be faced. They relate, among other factors, to data and energy uncertainty, workload variations, wireless link quality variations, and the unpredictability of harvested and consumed energy. For the convenience of the reader, a table of common abbreviations as used throughout the following description is provided at the end of the detailed description.
It will be appreciated that similar uncertainties may occur in the management of other energy consuming operations of electronic systems.
A number of mechanisms have been proposed for managing these factors.
Reinforcement Learning (RL) is known as one of the most effective methods to deal with the uncertainties with no a priori information. It also possesses adaptability to the environmental changes by constant online exploration and learning. Meanwhile, for control purposes, it does not require an analytic model of the system to be controlled, as do classical control techniques such as Proportional-Integral-Derivative (PID)-based, Model Predictive Control (MPC)-based, etc. It may be noted that in Reinforcement Learning a set of variables that reflects the environment is referred to as the state, and each user decides which variables to use for the state representation. A number of implementations based on such technologies are known.
A known reinforcement learning mechanism is the Actor-Critic model.
As shown in
The article by Masadeh, Z. Wang, and A. E. Kamal, entitled “An actor-critic reinforcement learning approach for energy harvesting communications systems,” published in 2019 28th International Conference on Computer Communication and Networks (ICCCN), pp. 1-6, July 2019, presents an actor-critic Reinforcement Learning method for transmission (TX) output power control in energy-harvesting point-to-point communication systems. The actor learns the parameters for the mean and standard deviation of a normal distribution, while the critic is constructed by a two-layer neural network, which is costly for the resource-constrained devices. The control interval is 1 second, and an infinite data buffer is assumed.
A feed-forward mechanism is proposed in the article by C. Qiu, Y. Hu, Y. Chen, and B. Zeng, entitled “Deep Deterministic Policy Gradient (DDPG)-based energy harvesting wireless communications,” IEEE Internet of Things Journal, vol. 6, pp. 8577-8588, October 2019. This method is based on an actor-critic method where a policy gradient and the concept of Deep Q-Network are combined.
Another feed-forward mechanism is proposed in the article by N. Zhao, Y. Liang, D. Niyato, Y. Pei, and Y. Jiang, entitled “Deep reinforcement learning for user association and resource allocation in heterogeneous networks,” in 2018 IEEE Global Communications Conference (GLOBE-COM), pp. 1-6, December 2018. They make use of Double Deep Q-Network, which is costly for resource-constrained devices.
Storing learned parameters and carrying out computations, e.g., multiply-and-accumulate (MAC) operations, come at a cost. Table 1 below shows the required number of memory spaces and computations for the feed-forward operation of certain feed-forward mechanisms. The parameters are, in general, updated by a back-propagation algorithm.
In the article by S. Sawaguchi et al., entitled “Multi-agent actor-critic method for joint duty-cycle and transmission power control,” in Design, Automation Test in Europe Conference (DATE) 2020, March 2020, a multi-agent actor-critic algorithm for joint TX duty-cycle and output power optimization is proposed. The observation of the State-of-Buffer (SoB) and the State-of-Charge (SoC) for data and energy management is described. Such an observation reduces the input cost and increases the scalability of the output.
The control update is applied every 30 minutes.
A photovoltaic cell is used for the energy-harvesting. Real-life solar irradiance data are provided by Oak Ridge National Laboratory https://midcdmz.nrel.gov/apps/sitehome.pl?site=ORNL.
The self-discharge of a supercapacitor (20% per day) is considered.
The wireless link quality is under the influence of path-loss and shadowing.
The workload follows a Poisson distribution. The average rate doubles after the first 6 months, putting the algorithm to the test regarding fast adaptability/reactivity. More precisely, the system receives an average of 1.0 pkts/min for the first 6 months, and the rate abruptly doubles to 2.0 pkts/min afterwards.
Results as shown in
The applied hyper-parameters of the Actor-Critic are listed in Table 2 below. They correspond to the learning rates for the Actor βx and for the Critic αx, the forgetting factor γx for the past reward, the recency weight λx in the Temporal Difference algorithm and the standard deviation σx for the policy based on the Gaussian distribution, defining the exploration space. The subscript x corresponds respectively to the output power (op) and to the duty cycle (dc).
Meanwhile, a number of patent publications exist in this domain.
WO2020/004972 proposes an artificial intelligence based automatic control. This application presents the limitation of a PID algorithm with a target value (i.e., a set point) and the necessity of a Reinforcement Learning based approach. Environmental change detection is carried out based on a predetermined value, requiring some expert (a priori) knowledge about the control, which can be costly.
CN102958109 describes a self-adaptive energy management mechanism in wireless sensor networks, in particular mentioning Markov Decision Process (MDP) as a solution.
CN109217306 proposes a deep reinforcement learning approach with self-optimising ability for regional power generation control. The application scenario is specific, and the use of a neural network requires extensive computational and memory resources.
These prior art approaches have been found not to be entirely satisfactory. They tend to exhibit the poor adaptability of reinforcement learning and slow online adaptation. Moreover, when neural nets are used, they are resource-hungry in terms of computational workload and memory footprint, and require mitigation of sparse gradients at the expense of convergence/reactivity speed. It is an objective of the present invention to provide improvements in at least some of these regards.
In accordance with the present invention in a first aspect there is provided a controller of electrical energy consuming operations in an electrical energy constrained electronic system in a closed loop mode. The controller is adapted to apply a linear function approximation based Actor-Critic Reinforcement Learning algorithm to a set of state parameters comprising an electrical energy resource state of said electronic system and one or more performance parameters of said electrical energy consuming operation, wherein a trade off between said electrical energy resource state on one hand and said performance parameters on the other is inherent to the operation of said electronic system. The Reinforcement Learning algorithm incorporates an adaptive learning rate algorithm serving to mitigate fluctuations in the gradient of said state parameters, and the controller is further adapted to define an output parameter specifying a system operation concordant with said performance requirement subject to said mitigation of fluctuations, wherein there exists a predictable monotonic relationship between said system operation and said electrical energy resource state reflected in said linear function approximation.
In accordance with the present invention in a second aspect there is provided an electronic device comprising an electrical energy resource and an output transducer, and a controller according to the first aspect.
In accordance with the present invention in a third aspect there is provided a method of controlling electrical energy consuming operations in an electrical energy constrained electronic system in a closed loop mode. The method comprises the steps of: applying a linear function approximation based Actor-Critic Reinforcement Learning algorithm to a set of state parameters comprising an electrical energy resource state of said electronic system and one or more performance parameters of said electrical energy consuming operation, wherein a trade off between said electrical energy resource state on one hand and said performance parameters on the other is inherent to the operation of said electronic system. The Reinforcement Learning algorithm incorporates an adaptive learning rate algorithm serving to mitigate fluctuations in the gradient of said state parameters, and the method comprises the further step of defining an output parameter specifying a system operation concordant with optimizing said state parameters subject to said mitigation of fluctuations, wherein there exists a monotonic relationship between said system operation and said electrical energy resource state reflected in said linear function approximation.
In a development of the third aspect, the adaptive learning rate is implemented using the Adam algorithm.
In a development of the third aspect, the adaptive learning rate is implemented using the rmsprop algorithm.
In a development of the third aspect, the adaptive learning rate is implemented using the Adadelta algorithm.
In a development of the third aspect, the first order decay coefficient (β1) of the rmsprop algorithm or Adam algorithm is less than 0.9 and the second order decay coefficient (β2) is less than 0.999.
In a development of the third aspect, the electrical energy consuming operations comprise the transmission of data, said one or more performance parameters include a data buffer level, and said system operation is the transmission of a specified part of the content of the data buffer to which said data buffer level relates.
In a development of the third aspect, the electrical energy consuming operations comprise the wireless transmission of data.
In a development of the third aspect, the performance parameters further include a transmission channel quality indicator.
In a development of the third aspect, the electrical energy consuming operations comprise actuator operations of a mechanical system calculated to cause said mechanical system to maintain or assume a particular orientation, configuration, or attitude in a physical frame of reference.
In a development of the third aspect, the electrical energy resource state reflects the charge level of a battery or super capacitor.
In a development of the third aspect, the charge level of a battery or super capacitor is dependent on electrical energy gleaned from a variable source.
In a development of the third aspect, said variable source is solar electrical energy.
In accordance with the present invention in a fourth aspect, there is provided a program comprising instructions which, when the program is executed by a compute element, cause the compute element to carry out the method of the third aspect.
The invention will be better understood, and its various features and advantages will emerge, from the following description of a number of exemplary embodiments, provided for illustration purposes only, and from the appended figures, in which:
In view of the foregoing discussion, it is proposed to combine a fast online adaptation technique with lightweight reinforcement learning, without recourse to neural nets, as discussed in further detail below.
The present disclosure relates generally to controlling energy consuming operations in an energy constrained electronic system. In particular, electronic systems in the context of the present invention may be considered as constituting energy constrained electronic systems insofar as their power requirements are constrained by the capacity of a power supply supporting those power requirements, that is to say, energy constraints are imposed by the capacity of one or more electrical power supplies providing electrical energy for the system. For example, an electronic power supply will typically be constrained in terms of the maximum instantaneous current that can be provided, as well as the maximum average current that can be provided over a more or less extended period. These limitations may be expressed in terms of maximum current, duty cycle, operation period, and other terms as will be familiar to the skilled person. An objective of certain embodiments is to control energy consuming operations so as to remain within the limits defined by the capacity of an electrical power supply in this sense. On this basis, it will be understood that references to energy in the present application concern electrical energy, as provided by the power supply and transformed by the operations of the electronic system.
As shown in
Typically the optimal or preferred value of the energy resource state will correspond to maximum availability of energy, e.g. a full charge, zero voltage drop, etc., and the optimal parameter of the energy consuming operation will be a completion of all pending operations. In certain embodiments, additional parameters with associated optimal or preferred values may be defined, and taken into account together with the energy resource state of said electronic system and a performance parameter of said energy consuming operation as discussed below.
The energy resource state may represent the charge level of a battery, super capacitor or other energy storage device. The charge level of a battery or super capacitor may be dependent on energy gleaned from a variable source, such as solar energy, wind energy, environmental temperature variations, user motion, and the like.
A performance parameter may be affected by operations of an output transducer of the system, such as a transmitter, in which case the performance parameter may describe the state of an output data buffer or the like.
Where the energy consuming operations comprise the transmission of data, the performance parameter may be a data buffer level, and the system operation may comprise setting output power or the transmission duty cycle for transmission or otherwise cause the transmission of a specified part of the content of the data buffer to which the data buffer level relates.
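By way of illustration only, the following minimal Python sketch shows one possible mapping of a continuous control output onto concrete transmission settings of the kind just described; the function names, value ranges and the notion of a maximum burst size are illustrative assumptions rather than features prescribed by the present disclosure.

def to_duty_cycle(action, dc_min=0.01, dc_max=1.0):
    """Clamp a continuous action value into a valid transmission duty cycle."""
    return max(dc_min, min(dc_max, action))

def packets_to_send(duty_cycle, buffer_level, max_burst=16):
    """Derive how many buffered packets to transmit in the current control period."""
    return min(buffer_level, int(round(duty_cycle * max_burst)))

For example, packets_to_send(to_duty_cycle(0.37), 10) evaluates to 6 with these illustrative settings, i.e., 6 of the 10 buffered packets would be scheduled for transmission in that period.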
Other possible performance parameters may include one or more transmission channel quality indicators. For example, in a system using protocols such as LORA, Wifi, Zigbee and the like, transmission channel quality indicators may include an indication of a confirmation that transmitted data has been received (Acknowledgement signal) and/or a Received Signal Strength Indicator (RSSI) value. The skilled person will recognise that similar or comparable indicators exist in other telecommunication systems.
Monitoring a minimal set of performance parameters reduces the overall complexity of the system, and reinforces the applicability of simple linear or monotonic models.
The energy consuming operations may comprise actuator operations of a mechanical system calculated to cause the mechanical system to maintain or assume a particular orientation, configuration, or attitude in a physical frame of reference.
In accordance with the embodiment, the Reinforcement Learning algorithm incorporates an adaptive learning rate algorithm serving to mitigate fluctuations in the gradient of the state parameters.
Certain adaptive learning rate algorithms that may be adapted to the purposes of the present invention are known in the prior art.
For example, the “Adam” algorithm is a known adaptive learning rate mechanism described in the article by D. Kingma and J. Ba, entitled “Adam: A method for stochastic optimization”, 3rd International Conference for Learning Representations, San Diego, 2015. As described, this method is used to ensure the stable convergence of all parameters of the network, i.e., to avoid sparse gradient issues. After the gradient has been computed in the Actor, the Adam algorithm is applied for faster adaptability, and the learning rate is adjusted. However, as discussed below, suitable operating parameters may be selected in view of the objectives of the present invention to make this a suitable algorithm for incorporation in embodiments of the present invention.
Similarly the adaptive learning rate may be implemented using the rmsprop algorithm as described in the article by S. Ruder entitled “An overview of gradient descent optimization algorithms,” Computing Research Repository, vol. abs/1609.04747, 2016.
Similarly the adaptive learning rate may be implemented using the Adadelta algorithm as described in the article by S. Ruder entitled “An overview of gradient descent optimization algorithms,” Computing Research Repository, vol. abs/1609.04747, 2016.
As described in the respective articles, at least in those incorporating an Exponentially Weighted Moving Average (EWMA) approach such as Adam or rmsprop, the first and second moment smoothing factors are set at values of 0.9 and 0.999, in view of the underlying aim of ensuring stable convergence. As described below, in accordance with embodiments of the present invention these may advantageously be adapted to provide more rapid convergence. Generally, sparse gradients in Neural Net systems are resolved by taking into account information from the long past (as non-zero gradients are rare and precious, contributing much to the parameter update). Adopting a Reinforcement Learning approach based on a linear function approximation avoids the concerns associated with neural nets, and accordingly removes the need to consider sparse gradients.
In particular, the first order decay coefficient (β1) of the rmsprop algorithm or Adam algorithm may be less than 0.9 and the second order decay coefficient (β2) may be less than 0.999. More preferably, the first order decay coefficient (β1) of the rmsprop algorithm or Adam algorithm may be between 0.7 and 0.1 and the second order decay coefficient (β2) may be between 0.7 and 0.1. More preferably still, the first order decay coefficient (β1) of the rmsprop algorithm or Adam algorithm may be below 0.5, in which case the initialization bias correction terms can be ignored to further alleviate the computation costs. This is possible and meaningful in the context of the present invention because a neural network is not used. In conventional implementations based on a Neural Network as described with reference to the prior art, such values would not be suitable.
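A minimal Python sketch of such a moment-based update is given below. It assumes a scalar or NumPy-array parameter and is intended only to illustrate the reduced decay coefficients and the optional omission of the bias-correction terms; it is not a verbatim reproduction of the claimed method.

import numpy as np

def adam_step(theta, grad, m, v, t, lr,
              beta1=0.5, beta2=0.5, eps=1e-8, bias_correction=False):
    """One Adam-style step with reduced decay coefficients (illustrative values)."""
    m = beta1 * m + (1.0 - beta1) * grad        # EWMA of the gradient (first moment)
    v = beta2 * v + (1.0 - beta2) * grad ** 2   # EWMA of the squared gradient (second moment)
    if bias_correction:
        # With beta1, beta2 around 0.5 the initialization bias decays after a
        # few steps, so this branch may be skipped to save computation.
        m_hat = m / (1.0 - beta1 ** t)
        v_hat = v / (1.0 - beta2 ** t)
    else:
        m_hat, v_hat = m, v
    # Gradient ascent on the actor objective; eps plays the role of the
    # regularisation value preventing division by a vanishingly small v_hat.
    theta = theta + lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v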
As shown in
Where the operation is the transmission of data, this may occur over a wireless channel, through the modification of one or more operating parameters having an influence on the performance parameter(s), such as the duty cycle, the transmission output power level (for wireless links), the spreading factor (in LORA systems or the like), or any combination of these. Any other output transducer, such as a motor, haptic or audio transducer, light source or laser, and so on, may be covered. These may form part of an active input transducer, such as a sonar, lidar or radar device or the like.
A system suitable for linear function approximation based approaches may generally be expected to have fewer observed state parameters. For example, in the embodiments described in detail below, the management of an IoT sensor is reduced to managing the Transmission Buffer state and the Energy resource Charge state, which also tends to decrease the sensing cost.
In accordance with certain embodiments, a lightweight fast online adaptation method is used. For instance, exponentially weighted moving average (EWMA) may be used in workload change detection, as it only incurs two multiplications and one addition. This approach is incorporated in the Adam algorithm as described below but other Reinforcement Learning based algorithms may also be adapted to incorporate this approach in accordance with certain embodiments.
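The following short Python sketch illustrates the EWMA update and one hypothetical way of using it for workload change detection; the detection rule and threshold are illustrative assumptions, not requirements of the described embodiments.

def ewma(prev, sample, alpha=0.1):
    # Two multiplications and one addition per update, as noted above.
    return alpha * sample + (1.0 - alpha) * prev

def workload_changed(short_term_avg, long_term_avg, ratio=1.5):
    # Hypothetical rule: flag a change when the short-term average packet
    # arrival rate departs markedly from the long-term average.
    return short_term_avg > ratio * long_term_avg or short_term_avg < long_term_avg / ratio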
Since the neural nets are removed, as is any direct measurement of external uncertainties for the Reinforcement Learning (RL) inputs, the solution will be much lower-cost in terms of computation and memory footprint. Fast and stable online adaptability/reactivity will also be attained thanks to a fast online adaptation technique.
Accordingly, a lightweight Reinforcement Learning (RL) mechanism is proposed that addresses the fast adaptation to any new environmental situation. A linear function approximation based RL is lightweight in terms of computation and memory footprint and advantageously combined with an adaptive learning rate method. As a consequence, compared to the existing solutions, this new approach enables faster adaptability (i.e. both fine-tuning and reactivity) with less computation and memory footprint.
The described approaches thus offer improved adaptability compared with reinforcement learning using a fixed learning rate, faster online adaptation, and avoid recourse to neural nets as generally required in prior art approaches, which in general demand more computation, a larger memory footprint and mitigation of sparse gradients at the expense of convergence/reactivity speed.
As such, a large variance of gradients due to the architecture and application scenario (e.g., the control interval), or to some degree of environmental change, may be handled, and quick adaptation at run-time to a changing environment may be achieved with low-cost Reinforcement Learning and fast online adaptability.
In particular, as shown, the method comprises steps 300 and 320 substantially as described above, and a step 410, corresponding substantially to step 310 of
The method then reverts to step 320 as described above.
It will be appreciated that certain of these steps may be performed in alternative sequences without changing the underlying effect. For example, the sequence of steps 414 and 415 may be exchanged.
The Actor-Critic Reinforcement Learning algorithm corresponding to step 310 of
The skilled person will appreciate that specific functions may be envisaged to implement the steps of
In particular, as shown, the method comprises steps 300 and 320 substantially as described above, and a step 510, corresponding substantially to step 410 of
where the domain regularisation value ε is fixed to avoid computational issues (division by zero or by a very small number), i.e., to prevent a gradient or update value explosion in the case where v̂t becomes infinitesimally small. ε is chosen by the user (as a tuning parameter) and also depends on the arithmetic used. The obtained values can then be used to update the respective actor parameter or parameters at step 515d,
The method then reverts to step 320 as described above.
It may be noted that although the operations include divisions and square-root determinations, the number of operations is small, and only 9 parameters (7 parameters for the Actor and Critic, and two for the EWMA in Adam) need to be stored for each agent (i.e., each action). Compared to prior art approaches, the computation and memory cost of this implementation are likely to be smaller: while the proposed Actor uses a parameterized mean and standard deviation, more complex mechanisms are generally proposed in the prior art for the Critic, such as the parameterized mean and 3-layer neural nets of Masadeh, Z. Wang, and A. E. Kamal, as compared to the TD(λ) algorithm consisting of only multiplications and additions thanks to the linear function approximation of a value function presented in the implementation of
By way of example, there will now be presented a detailed algorithm illustrating an implementation of the methods of
The state is composed of the State-of-Buffer (SoB) and the State-of-Charge (SoC), which are presumed to reflect both incoming and outgoing data and energy.
The following algorithm refers only to the State of Buffer and State of Charge for the state representation, thereby reducing the number of observations required for a complete understanding of system state, in contrast to approaches known in the prior art, for example as known from Masadeh, Wang, and Kamal, “An actor-critic reinforcement learning approach for energy harvesting communications systems,” in 2019 28th International Conference on Computer Communication and Networks (ICCCN), pp. 1-6, July 2019, or A. Murad, F. A. Kraemer, K. Bach, and G. Taylor, “Autonomous management of energy-harvesting IoT nodes using deep reinforcement learning,” in 2019 IEEE 13th International Conference on Self-Adaptive and Self-Organizing Systems (SASO), pp. 43-51, June 2019.
Table 4 provides the list of parameters that are manipulated by Algorithm 1. The notes column additionally incorporates references in parentheses associating respective operations with the corresponding steps in the method of
It will be appreciated that the above algorithm incorporates numerous advantageous implementation details. For example, the value function is assumed to be linearly proportional to the product of 1−ϕSoB and ϕSoC (line 6), which indicates that the value of the state is higher when a lower buffer level (SoB) and a higher charge (SoC) are confirmed. Similarly, a linear relationship is also assumed between the mean action value and the product of ϕSoB and ϕSoC (line 15), which means that lower action values (i.e., less performance) are enough when the SoB level is lower, and higher values can be provided when the SoC level is higher. Furthermore, the final action is generated based on the Gaussian distribution (line 16) to guarantee exploration and to find an optimal action.
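By way of a non-limiting illustration, the following Python sketch captures the structure just described: a value function linear in (1−ϕSoB)·ϕSoC, a policy mean linear in ϕSoB·ϕSoC, a Gaussian exploration policy, and a TD(λ) critic consisting only of multiplications and additions. The hyper-parameter values are arbitrary placeholders, and the full method would additionally pass the actor gradient through the adaptive learning rate update sketched earlier.

import numpy as np

def features(sob, soc):
    """Normalized State-of-Buffer / State-of-Charge features in [0, 1]."""
    return min(max(sob, 0.0), 1.0), min(max(soc, 0.0), 1.0)

def value(w, phi_sob, phi_soc):
    # Value assumed linearly proportional to (1 - phi_SoB) * phi_SoC:
    # an emptier buffer and a fuller charge make the state more valuable.
    return w * (1.0 - phi_sob) * phi_soc

def sample_action(theta, phi_sob, phi_soc, sigma, rng):
    # Mean action assumed linearly proportional to phi_SoB * phi_SoC;
    # Gaussian exploration keeps searching for a better action.
    mu = theta * phi_sob * phi_soc
    return mu, rng.normal(mu, sigma)

def actor_critic_step(w, theta, trace, transition,
                      alpha=0.05, beta=0.01, gamma=0.9, lam=0.8, sigma=0.1):
    """One TD(lambda) critic update and one policy-gradient actor step."""
    sob, soc, action, mu, reward, sob_next, soc_next = transition
    phi_b, phi_c = features(sob, soc)
    phi_b2, phi_c2 = features(sob_next, soc_next)
    td_error = reward + gamma * value(w, phi_b2, phi_c2) - value(w, phi_b, phi_c)
    trace = gamma * lam * trace + (1.0 - phi_b) * phi_c     # eligibility trace
    w = w + alpha * td_error * trace                        # critic update
    grad_log_pi = (action - mu) / (sigma ** 2) * (phi_b * phi_c)
    theta = theta + beta * td_error * grad_log_pi           # plain actor step
    return w, theta, trace

# Example decision at control time (arbitrary placeholder values):
rng = np.random.default_rng(0)
mu, action = sample_action(theta=1.0, phi_sob=0.4, phi_soc=0.8, sigma=0.1, rng=rng)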
As shown in
Alternative embodiments may incorporate some or all of these features in any combination.
In the preceding pseudo code, each state parameter is associated with a respective agent. The algorithm operates to converge on a joint optimisation of these parameters. Any number of parameters may be used. In the following example, the case of joint optimization of transmission duty-cycle and output power in an energy-harvesting IoT sensor end-node communicating with a sink node is considered; however, the skilled person will appreciate that the same approach may be applied in other contexts of controlling energy consuming operations in an electronic system in a closed loop mode. The skilled person will appreciate that this constitutes the application of a new algorithm to a particular technical purpose of managing the balance between energy resources and system performance.
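The following Python sketch, offered under the same illustrative assumptions as the previous one, indicates how two such agents (one for the TX duty cycle, one for the TX output power) might share the SoB/SoC observation while each holds its own small parameter set; the Agent container and its field names are hypothetical.

class Agent:
    def __init__(self, sigma):
        self.w = 0.0        # critic weight
        self.theta = 0.0    # actor weight (mean of the Gaussian policy)
        self.trace = 0.0    # TD(lambda) eligibility trace
        self.m = 0.0        # first moment of the adaptive learning rate update
        self.v = 0.0        # second moment of the adaptive learning rate update
        self.sigma = sigma  # exploration standard deviation

agents = {"duty_cycle": Agent(sigma=0.1), "tx_power": Agent(sigma=0.1)}

def control_update(agents, sob, soc, rng):
    """Each agent proposes its own action from the shared SoB/SoC state."""
    actions = {}
    for name, agent in agents.items():
        mu = agent.theta * sob * soc        # linear mean, as in the sketch above
        actions[name] = rng.normal(mu, agent.sigma)
    return actions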
By way of illustration, a specific scenario is now presented as a basis for simulations based on the exemplary algorithm set out in table 4.
In this exemplary application scenario, the following conditions are defined:
The control update is conducted every 30 minutes.
A photovoltaic cell is used for the energy-harvesting. Real-life solar irradiance data are provided by Oak Ridge National Laboratory https://midcdmz.nrel.gov/apps/sitehome.pl?site=ORNL.
The self-discharge of a supercapacitor (20% per day) is considered.
The wireless link quality is under the influence of path-loss and shadowing.
The workload follows a Poisson distribution. The average rate doubles after the first 6 months, putting the algorithm to the test regarding fast adaptability/reactivity. More precisely, the system receives an average of 1.0 pkts/min for the first 6 months, and the rate abruptly doubles to 2.0 pkts/min afterwards.
The hyper-parameters for the Actor-Critic algorithm are shown in Table 5. In Table 5, x=op relates to the TX output power while x=dc relates to the TX duty cycle. αx and βx correspond to the Critic and Actor learning rates, respectively. γx is the discount factor for the future reward. λx is the recency weight in the TD algorithm. σx is the exploration space (standard deviation for the policy based on the Gaussian distribution).
To evaluate the convergence/reactivity speed, a comparison is made between conventional decay coefficient settings as defined in the prior art (β1=0.9 and β2=0.999) and adaptation-aware settings (0.5 for both).
Values are averaged over 92 cases. The conventional smoothing values as used in
Functioning with the fine-tuned parameters is visible in area 701. The reactivity to changes in workload is visible in area 702, corresponding to the period immediately after the change in workload defined by line 740.
Values are averaged over 98 cases. The approach as shown in
Functioning with the fine-tuned parameters is visible in area 804. The reactivity to changes in workload is visible in area 805, corresponding to the period immediately after the change in workload defined by line 840. As can be seen in comparison to
While the example of
The skilled person will appreciate that while
While no system failure was observed with the settings of
As shown in
In accordance with the embodiment, the controller 901 is configured to apply a linear function approximation based Actor-Critic Reinforcement Learning algorithm for example as discussed above to a set of state parameters. One of these state parameters is an energy resource state of the electronic system, which may comprise the charge level of the energy reserve 905 in line with the preceding discussion. Another of these is a performance parameter of said energy consuming operation, which may comprise the instantaneous fill level of the buffer 903 in line with the preceding discussion.
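An illustrative closed-loop control cycle for this arrangement is sketched below in Python, building on the multi-agent sketch given earlier; the device interface (read_soc, read_sob, apply_settings) is a hypothetical placeholder for the energy reserve 905, the buffer 903 and the output transducer, and does not correspond to any specific product API.

def control_cycle(device, agents, rng):
    """One closed-loop control step, repeated at each control interval."""
    soc = device.read_soc()    # charge level of the energy reserve
    sob = device.read_sob()    # instantaneous fill level of the data buffer
    actions = control_update(agents, sob, soc, rng)   # multi-agent sketch above
    device.apply_settings(duty_cycle=actions["duty_cycle"],
                          tx_power=actions["tx_power"])
    # After the control interval, the new SoB and SoC are observed, a reward
    # is formed, and the actor-critic and adaptive learning rate updates
    # sketched earlier are applied before the next cycle.
    return actions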
The Reinforcement Learning algorithm incorporates an adaptive learning rate algorithm serving to mitigate fluctuations in the gradient of said state parameters, for example as discussed above. The controller is further adapted to define an output parameter specifying a system operation concordant with said performance requirement subject to said mitigation of fluctuations. In the context of the embodiment of
The skilled person will appreciate that while the arrangement of
The skilled person will appreciate that the various concepts described with respect to
Software embodiments include but are not limited to application, firmware, resident software, microcode, etc. The invention can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or an instruction execution system. Software embodiments include software adapted to implement the steps discussed above with reference to
In some embodiments, the methods and processes described herein may be implemented in whole or part by a user device. These methods and processes may be implemented by computer-application programs or services, an application-programming interface (API), a library, and/or other computer-program product, or any combination of such entities.
For example, the controller may comprise one or more physical logic devices configured to execute instructions that are part of one or more applications, services, programs, routines, libraries, objects, components, data structures, or other logical constructs. Such instructions may be implemented to perform a task, implement a data type, transform the state of one or more components, achieve a technical effect, or otherwise arrive at a desired result, in particular to implement certain of the operations described above, for example with reference to
Such logic devices may include one or more processors configured to execute software instructions. Additionally or alternatively, the logic device may include one or more hardware or firmware logic devices configured to execute hardware or firmware instructions. Processors of the logic device may be single-core or multi-core, and the instructions executed thereon may be configured for sequential, parallel, and/or distributed processing. Individual components of the logic device optionally may be distributed among two or more separate devices, which may be remotely located and/or configured for coordinated processing. Aspects of the logic device may be virtualized and executed by remotely accessible, networked computing devices configured in a cloud-computing configuration.
The controller may additionally comprise or have access to one or more storage devices, which may include one or more physical devices configured to hold instructions executable by the logic device to implement the methods and processes described herein. When such methods and processes are implemented, the state of a storage device may be transformed—e.g., to hold different data.
A storage device may include removable and/or built-in devices. The storage may be local or remote (in a cloud, for instance). Storage device 903 may comprise one or more types of storage device including semiconductor memory (e.g., FLASH, RAM, EPROM, EEPROM, etc.), and/or magnetic memory (e.g., MRAM, etc.), among others. A storage device may include volatile, non-volatile, dynamic, static, read/write, read-only, random-access, sequential-access, location-addressable, file-addressable, and/or content-addressable devices.
In certain arrangements, the system may comprise an interface adapted to support communications between the logic device and further system components.
It will be appreciated that storage device includes one or more physical devices, and excludes propagating signals per se. However, aspects of the instructions described herein alternatively may be propagated by a communication medium (e.g., an electromagnetic signal, an optical signal, etc.), as opposed to being stored on a storage device.
Aspects of logic device 901 and storage device 903 may be integrated together into one or more hardware-logic components. Such hardware-logic components may include field-programmable gate arrays (FPGAs), program- and application-specific integrated circuits (PASIC/ASICs), program- and application-specific standard products (PSSP/ASSPs), system-on-a-chip (SOC), and complex programmable logic devices (CPLDs), for example.
The term “program” may be used to describe an aspect of computing system implemented to perform a particular function. In some cases, a program may be instantiated via logic device executing machine-readable instructions held by a specific storage device. It will be understood that different modules may be instantiated from the same application, service, code block, object, library, routine, API, function, etc. Likewise, the same program may be instantiated by different applications, services, code blocks, objects, routines, APIs, functions, etc. The term “program” may encompass individual or groups of executable files, data files, libraries, drivers, scripts, database records, etc.
In particular, the system of
A controller such as that of
As shown, there may be provided an electronic device comprising an energy resource and an output transducer, and a controller as described with reference to
Accordingly, in certain embodiments a lightweight learning mechanism combining a linear function approximation based Reinforcement Learning and an adaptive learning rate method is provided for the energy management of Internet of Things (IoT) nodes and other energy constrained electronic systems, especially for nodes with harvested energy and wireless transmitters. The adaptive learning rate method may be based on an exponentially weighted moving average (EWMA), or on Adam, which incorporates an EWMA. Optimal decay coefficient ranges lying outside the range usual in Neural Network contexts have been found to be effective in implementations based on this linear function approach.
It will be understood that the configurations and/or approaches described herein are exemplary in nature, and that these specific embodiments or examples are not to be considered in a limiting sense, because numerous variations are possible. The specific routines or methods described herein may represent one or more of any number of processing strategies. As such, various acts illustrated and/or described may be performed in the sequence illustrated and/or described, in other sequences, in parallel, or omitted. Likewise, the order of the above-described processes may be changed.
The subject matter of the present disclosure includes all novel and non-obvious combinations and sub-combinations of the various processes, systems and configurations, and other features, functions, acts, and/or properties disclosed herein, as well as any and all equivalents thereof.