The present disclosure relates generally to electric power systems, and more particularly to optimal joint bidding and pricing for a load serving entity.
Power system restructuring has been underway for many years, with the aim of improving power system planning and operation activities. On the transmission level, under a market-based regime for planning short-term and long-term electricity production and consumption activities, different parties from the supply side to the demand side participate in a wholesale electricity market (WEM), which is operated by an independent system operator (ISO), to offer or bid for electricity/energy.
Traditionally, sellers (or power producers) submit sealed offers and buyers (or power consumers) submit sealed bids to the ISO, which then clears the market and determines the cleared energy prices and cleared energy quantities for all participants. With the advancement of Smart Grid technologies, however, real-time bidding and selling for 5- to 15-minute time intervals has become possible in the WEM operated by the ISO. The traditional role of a load serving entity (LSE), as a buyer in the WEM, is to pass the cleared energy prices and cleared energy quantities, plus the added cost of a government tariff, on to end use customers (EUCs). Now, with real-time bidding and selling, the LSE can use a flexible price signal through a variety of demand response programs in the retail electricity market (REM) to interact with the EUCs, such that their behaviors are changed in a way that benefits the LSE. Consequently, under such an environment, a profit-seeking LSE faces two problems: the bidding problem, which determines the optimal electricity bid it submits in the WEM, and the pricing problem, which determines the optimal electricity price it charges the EUCs.
In application US 2005/0004858 A1, methods are disclosed for assisting a large industrial or business consumer of energy to become a self-serving retail electricity provider in a deregulated energy market. Performed by an energy advisory and transaction management service provider, one method registers the large business energy consumer with the state public utility commission, assists the business to qualify as a scheduling entity with an independent service operator, and establishes the business as a bilateral trading partner of wholesale energy merchants.
US 2014/0316973 A1 discloses an approach for facilitating the generation of energy-related revenue for an energy customer of an electricity supplier. The approach is used to generate operating schedules for a controller of the energy assets. When implemented, the generated operating schedules facilitate derivation of the energy-related revenue, over a time period T, associated with operation of the energy assets according to the generated operating schedules. The energy-related revenue available to the energy customer over the time period T is based at least in part on a wholesale electricity market.
However, all of these conventional approaches address the bidding problem and the pricing problem separately. Unfortunately, they ignore the strong coupling between the two problems: the energy purchased in the WEM and the energy sold in the REM must balance, otherwise the LSE will incur economic losses or even reliability issues. Such reliability issues arise when the LSE has no energy to sell to its EUCs because it cannot submit bids to the ISO that are competitive against those of other LSEs. Therefore, there is a need for new approaches that jointly determine the energy bid to be submitted in the WEM and the energy price to be charged in the REM by an LSE, at least for the purpose of maximizing the LSE's total profit, among other reasons.
The present disclosure relates to electric power systems, and more particularly to optimal joint bidding and pricing for a load serving entity.
In a restructured conventional electric power industry, a load serving entity (LSE) needs to submit bids for electricity/energy in a wholesale electricity market (WEM), which is operated by an independent system operator (ISO), so as to meet the demand from its end use customers (EUCs). The LSE then charges the EUCs for electricity/energy at a conventionally fixed tariff regulated by the government. Therefore, the conventional decision-making process of the LSE involves only the bidding problem, i.e., the determination of the energy bids, which relies on a forecast of EUC demand and is relatively inflexible.
However, due to the rapid development of smart grid technologies, demand-side management becomes feasible through demand response programs such as real-time pricing. An LSE may determine a real-time energy price in the retail electricity market (REM) that it operates, to incentivize the EUCs to change their energy consumption behaviors in a way that benefits the LSE. In this context, in addition to the bidding problem, the LSE also faces the pricing problem, i.e., the determination of the energy price that is charged to the EUCs.
Conventional decision-making processes for LSEs involve only the bidding problem, i.e., the determination of the energy bids, which relies on a forecast of EUC demand and is relatively inflexible, as noted above. Thus, these conventional approaches are concerned with only one problem. Yet, the two problems are inherently coupled, since the energy purchased in the WEM and that sold in the REM must balance, and the profit earned by the LSE depends on the results in both markets. Therefore, the embodiments of the present disclosure solve the bidding problem and the pricing problem jointly.
During experimentation, one approach modeled the joint bidding and pricing problem as a bi-level programming problem, which was solved using mixed integer linear programming techniques. However, these experimented approaches assume that all market participants are myopic, i.e., nearsighted, and that the parameters of all the experimented models, including all market participants in the WEM and all EUCs in the REM, are completely known to the LSE. More importantly, all the models are linear. These assumptions, however, are very constraining and impractical, in view of the aspects of the present disclosure.
Some embodiments of the present disclosure formulate the joint bidding and pricing problem as a Markov decision process (MDP), in which the energy bid and the energy price are two actions that share a common objective. In order to solve this MDP without needing to know the WEM and EUC models, a deep deterministic policy gradient based reinforcement learning algorithm can be devised to learn the bidding and pricing policies. The proposed reinforcement learning algorithm takes advantage of the structure of the decision-making process by determining the second stage action, e.g., the retail energy price, using information revealed after the first stage action, e.g., the energy bid, is taken, which improves the overall profit earned by the load serving entity, among other benefits.
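By way of non-limiting example, the core idea of learning a deterministic policy by gradient ascent can be sketched on a toy problem. The snippet below is not the disclosed deep deterministic policy gradient algorithm, which employs actor and critic neural networks; it only illustrates the underlying policy-gradient principle on a one-dimensional problem with a known differentiable reward, where the optimal linear policy a = theta * s has theta = 2. All names and numerical values are illustrative assumptions.

```python
import random

def train_deterministic_policy(episodes=2000, lr=0.05, seed=0):
    """Toy deterministic policy-gradient loop (greatly simplified sketch).

    Reward: r(s, a) = -(a - 2s)^2, so the optimal linear policy a = theta*s
    has theta = 2. We ascend the analytic gradient dr/da * da/dtheta here,
    instead of learning a critic network as DDPG does.
    """
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(episodes):
        s = rng.uniform(-1.0, 1.0)       # sampled state
        a = theta * s                    # deterministic action
        grad_a = -2.0 * (a - 2.0 * s)    # dr/da at the taken action
        theta += lr * grad_a * s         # chain rule: da/dtheta = s
    return theta
```

On this toy problem the learned parameter converges to the optimum theta = 2, illustrating how a deterministic policy can be improved directly from gradient information rather than from an explicit market model.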
Assume the models of the other market participants in the WEM and of all EUCs in the REM are not known in advance. To this end, neural networks can be applied to learn a bid response function and a price response function from historical data to model the WEM and the collective behavior of the EUCs, respectively, from the perspective of the LSE. These response functions can explicitly capture the inter-temporal correlations of the WEM clearing results and the EUC demand, according to embodiments of the present disclosure.
Overall, aspects of the present disclosure provide a new model-free and flexible solution to the joint bidding and pricing problem. Some novelties of the present disclosure, among many, include: a formulation of the joint bidding and pricing problem as an MDP, which allows the consideration of the accumulative long-term profit of the LSE; the development of a reinforcement learning algorithm that solves the MDP while taking into account its structural characteristics; and the application of multi-layer feedforward neural networks (FNNs), recurrent neural networks (RNNs), or long short-term memory (LSTM) networks to model the WEM and the REM using historical data, which captures the inter-temporal correlations, among other novelties.
According to an embodiment of the present disclosure, a system is provided to control operation of an electrical device in a market-based resource allocation system. The system has a processor configured, in as close to real-time as possible, to receive, via a transceiver, a user selected desired operating level of an electrical device by a user for an upcoming time interval, the processor connected to a memory having executable programs and stored data. The system includes using the processor to compute an offer amount representative of a value at which electricity is available to be supplied to operate the electrical device for the upcoming time interval at the user selected desired operating level. Wherein computing the offer amount is based on multiple factors for the upcoming time interval, including the user selected desired operating level, current environmental data, and stored historical energy futures market data used to determine inter-temporal correlation behaviors of past offer amounts to a local resource allocation market (LRAM) and past clearing pricing for electricity by the LRAM, to obtain the offer amount. Transmit, via the transceiver, the offer amount to the LRAM. Receive, via the transceiver, a cleared price for electricity from the LRAM from which the electrical device receives electricity. Compute a retail price of electricity for operating the electrical device based at least in part on the user selected desired operating level, the current environmental data, the cleared price for electricity from the LRAM, and the stored historical data from the energy futures market used to determine inter-temporal correlation behaviors of past user selected desired operating levels by the user, past pricing for electricity in a retail electricity market (REM), and the past clearing pricing for electricity by the LRAM, to obtain the retail price. Wherein the offer amount and the retail price are computed jointly.
Compare the submitted offer amount to the retail price. Activate or deactivate the electrical device based on the comparison.
According to another embodiment of the present disclosure, a system is provided to control operation of an electrical device in a market-based resource allocation system. The system has a processor configured, in as close to real-time as possible, to receive, via an input interface, a user selected desired operating level of an electrical device by a user for an upcoming time interval, the processor connected to a memory having executable programs and stored data. The system includes using the processor to compute an offer amount representative of a value at which electricity is available to be supplied to operate the electrical device for the upcoming time interval at the user selected desired operating level. Transmit, via an output interface, the offer amount to a local resource allocation market (LRAM). Receive, via the input interface, a cleared price for electricity from the LRAM from which the electrical device receives electricity. Compute a retail price of electricity for operating the electrical device based at least in part on the user selected desired operating level, current environmental data, the cleared price for electricity from the LRAM, and stored historical data from the energy futures market used to determine inter-temporal correlation behaviors of past user selected desired operating levels by the user, past pricing for electricity in a retail electricity market (REM), and past clearing pricing for electricity by the LRAM, to obtain the retail price. Wherein the offer amount and the retail price are computed jointly. Compare the submitted offer amount to the retail price. Activate or deactivate the electrical device based on the comparison.
According to another embodiment of the present disclosure, a method is provided to control operation of an electrical device in a market-based resource allocation system. The method uses a processor configured, in as close to real-time as possible, to receive, via an input interface, a user selected desired operating level of an electrical device by a user for an upcoming time interval, the processor connected to a memory having executable programs and stored data. The method includes using the processor for computing an offer amount representative of a value at which electricity is available to be supplied to operate the electrical device for the upcoming time interval at the user selected desired operating level. Transmitting, via an output interface, the offer amount to a local resource allocation market (LRAM). Receiving, via the input interface, a cleared price for electricity from the LRAM from which the electrical device receives electricity. Computing a retail price of electricity for operating the electrical device based at least in part on the user selected desired operating level, current environmental data, the cleared price for electricity from the LRAM, and stored historical data from the energy futures market used to determine inter-temporal correlation behaviors of past user selected desired operating levels by the user, past pricing for electricity in a retail electricity market (REM), and past clearing pricing for electricity by the LRAM, to obtain the retail price. Wherein the offer amount and the retail price are computed jointly. Comparing the submitted offer amount to the retail price. Activating or deactivating the electrical device based on the comparison.
The presently disclosed embodiments will be further explained with reference to the attached drawings. The drawings shown are not necessarily to scale, with emphasis instead generally being placed upon illustrating the principles of the presently disclosed embodiments.
While the above-identified drawings set forth presently disclosed embodiments, other embodiments are also contemplated, as noted in the discussion. This disclosure presents illustrative embodiments by way of representation and not limitation. Numerous other modifications and embodiments can be devised by those skilled in the art which fall within the scope and spirit of the principles of the presently disclosed embodiments.
The present disclosure relates to electric power systems, and more particularly to optimal joint bidding and pricing for a load serving entity.
According to an embodiment of the present disclosure, a system is provided to control operation of an electrical device in a market-based resource allocation system. The system has a processor configured, in as close to real-time as possible, to receive, via a transceiver, a user selected desired operating level of an electrical device by a user for an upcoming time interval, the processor connected to a memory having executable programs and stored data. The system includes computing, using the processor, an offer amount representative of a value at which electricity is available to be supplied to operate the electrical device for the upcoming time interval at the user selected desired operating level. Wherein computing the offer amount is based on multiple factors for the upcoming time interval, including: the user selected desired operating level, current environmental data, and stored historical energy futures market data used to determine inter-temporal correlation behaviors of past offer amounts to a local resource allocation market (LRAM) and past clearing pricing for electricity by the LRAM, to obtain the offer amount. Transmit, via the transceiver, the offer amount to the LRAM. Receive, via the transceiver, a cleared price for electricity from the LRAM from which the electrical device receives electricity. Compute, using the processor, a retail price of electricity for operating the electrical device based at least in part on: the user selected desired operating level; the current environmental data; the cleared price for electricity from the LRAM; and the stored historical data from the energy futures market used to determine inter-temporal correlation behaviors of past user selected desired operating levels by the user, past pricing for electricity in a retail electricity market (REM), and the past clearing pricing for electricity by the LRAM; to obtain the retail price. Wherein the offer amount and the retail price are computed jointly.
Compare the submitted offer amount to the retail price. Activate or deactivate the electrical device based on the comparison.
Steps 116, 126, 136, 146, and 156 of the figure are described below.
Embodiments of the present disclosure provide unique aspects. By non-limiting example, a specific learning period may include historical data going back in time, for example, one month, two months, or six months, to empirically determine a solution. However, the learning period is not limited to any stated period; some aspects in determining a specific learning period can include, by non-limiting example, a level of accuracy in a set of earlier time frames. Further, upon considering user inputs on desired operating levels, some methods of the present disclosure can determine a possible offer amount and a possible retail price, followed by a comparison of the offer amount and the retail price, and ultimately a decision to activate or deactivate the device. In at least one aspect, some of the methods and systems of the present disclosure may be applied to demand response aggregators.
Still referring to
The processor 155 then, in communication with the receiver 153, predicts the aggregate energy consumption of the EUCs for the upcoming time interval using dynamical demand response models with possible LSE retail energy prices (Step 126), and predicts the market clearing prices and quantities of the ISO for the upcoming time interval using dynamical bid response models with possible LSE wholesale bids (Step 136).
After the possible demand and bid responses are obtained, the processor 155 determines wholesale bidding prices and quantities, and retail energy prices, for the upcoming time interval using a deep reinforcement learning algorithm (Step 146).
Still referring to
Optionally, the control system of the operation of the EUC devices 100 can store the system energy and price data in a computer readable memory 144, wherein the computer readable memory is in communication with the processor 155 and the controller 157. Further, it is possible that an input interface 145 can be in communication with the memory 144, the processor 155, and the controller 157. For example, a user, via a user interface of the input interface 145, may input predicted predetermined conditions, for example, the aggregate energy consumption of the EUCs. It is contemplated that the receiver, processor, and controller could be a single computer system or multiple computer systems located at different locations, depending on the specific application(s).
The electric power system 115 can also include a set of power plants 120A and 120B that produce power for the system. Each power producer 120A and 120B may have multiple generation units, also called generators, 150. The EUCs 130A, 130B, 130C, and 130D consume the power provided by the power plants 120A and 120B through the network connected by transmission lines 160. An independent system operator (ISO) 140 is responsible for the coordination between producers and LSEs to maintain stable operation of the electric power system 115. A communication network may be used for exchanging information between the ISO 140 and the producers 120, or the LSEs 110, through communication links 170. The LSEs 110 buy the power from the ISO 140 and resell it to the EUCs 130. The EUCs 130 may connect with the LSE 110 through distribution lines 160 and also communicate with the LSE through communication links 170. In
It includes a preparatory offline step and a set of online steps iterating over time intervals. Before triggering real-time application, the bidding and pricing policy functions, and the dynamical demand and bid response functions, are trained offline using historical data in Step 210. After this step is completed, the trained policy and response functions are used to make real-time bidding and pricing decisions for the LSE for the upcoming time intervals, iteratively. Step 220 determines the wholesale bids and the retail prices using the trained bidding and pricing policy functions for an upcoming time interval. The determined wholesale bids are then sent to the WEM, and the retail energy prices are posted to the REM, for the upcoming time interval in Step 230. Step 240 estimates the aggregate energy consumption of the EUCs and the clearing results of the WEM based on the bids and prices using the dynamical demand and bid response functions, and the bidding and pricing policy functions are updated for performance improvement with the response results in Step 250, accordingly.
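By way of non-limiting example, the online iteration described above can be sketched as the following skeleton. The function names, signatures, and data shapes are illustrative assumptions only; the actual policies and response models are the trained neural networks described elsewhere in this disclosure.

```python
def online_loop(policy_price, policy_bid, demand_model, bid_model,
                update, init_state, n_intervals):
    """Illustrative skeleton of the online steps 220-250.

    Per interval: decide the retail price and wholesale bid from the
    trained policies (Step 220, Step 230 submits them), estimate the EUC
    and WEM responses with the learned response models (Step 240), and
    update the state/policies from the outcome (Step 250).
    """
    state = init_state
    log = []
    for _ in range(n_intervals):
        v = policy_price(state)                   # retail price decision
        bid = policy_bid(state, v)                # wholesale bid decision
        d_hat = demand_model(state, v)            # estimated EUC consumption
        lam_hat, q_hat = bid_model(state, bid)    # estimated clearing results
        state = update(state, v, bid, d_hat, lam_hat, q_hat)
        log.append((v, bid, d_hat, lam_hat, q_hat))
    return log
```

The `update` callback stands in for both the state transition and the policy improvement step; in the disclosed method that role is played by the reinforcement learning update.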
Some embodiments of the present disclosure include an MDP formulation that is developed for the joint bidding and pricing problem of the LSE, which is solved by an effective RL algorithm, the deep deterministic policy gradient (DDPG) algorithm. Dynamical bid response and price response functions represented by neural networks are learned from historical data to model the WEM and the EUCs, respectively. These response functions explicitly or implicitly capture the inter-temporal correlations of the WEM clearing results and the EUC responses, and are utilized to generate the state transition samples required by the DDPG algorithm at no additional cost.
Wholesale and Retail Energy Market Models
In an electric power industry, a load serving entity (LSE) needs to submit bids for electricity/energy in a wholesale electricity market (WEM), which is operated by an independent system operator (ISO), so as to meet the demand from its end use customers (EUCs). An LSE may determine a real-time energy price in the retail electricity market (REM) it operates to incentivize the EUCs to change their energy consumption behaviors in a way that benefits the LSE. In this context, in addition to the bidding problem, the LSE also faces the pricing problem, i.e., the determination of the energy price that is charged to the EUCs.
Some embodiments of the present disclosure formulate the joint bidding and pricing problem as a Markov decision process (MDP), in which the energy bid and the energy price are two actions that share a common objective. Doing so allows the consideration of the accumulative long-term profit of the LSE. To solve this MDP without needing to know the WEM and EUC models, the deep deterministic policy gradient (DDPG) algorithm, a policy-based reinforcement learning (RL) algorithm, is applied to learn the bidding and pricing policies, which determine the optimal action from the state. The models of the other market participants in the WEM and of all EUCs in the REM are not known in advance. To this end, neural networks are applied to learn a bid response function and a price response function from historical data to model the WEM and the collective behavior of the EUCs, respectively, from the perspective of the LSE. These response functions can explicitly capture the inter-temporal correlations of the WEM clearing results and the EUC responses, and can be utilized to generate state transition samples at no additional cost. More importantly, they also inform the choice of the states in the MDP formulation.
Assume one day is decomposed into T time intervals indexed by the elements in the set 𝒯 = {0, . . . , T−1}. Let t index the time intervals; then t mod T ∈ 𝒯, where mod denotes the modulo operation. Typically, the duration of one interval may be 5, 15, 30, or 60 minutes, depending on the specific market. This disclosure focuses only on the activities that take place in the real-time energy market.
Prior to time interval t, each market participant, including the sellers and buyers, needs to submit energy offers/bids 410 for time interval t. Then, the WEM 460 is cleared 420 to yield a wholesale energy price, as well as the energy sales and purchases that are successfully cleared for each seller and buyer, respectively. In the meantime, the LSE 450, which is a buyer in the WEM 460, also determines a retail energy price (simply referred to as the price) 425 for time interval t, at which it resells the energy to its customers, i.e., the EUCs 470, in the REM 450. During time interval t, the EUCs 470 respond to the price signal 430 by adjusting their energy consumption. The LSE 450 needs to make payments to the ISO 460 for the energy consumed by the EUCs 470; meanwhile, it also collects payments from the EUCs 470. The total profit resulting from energy trading in these two markets can be evaluated 440 after time interval t. This process is repeated for all time intervals.
Still referring to
The wholesale market consists of a set of sellers and a set of buyers. Let 𝒢 = {g1, . . . , gG} denote the set of sellers, and ℬ = {b1, . . . , bB} the set of buyers. Each seller g ∈ 𝒢 submits an offer (i.e., an inverse supply function), denoted by ftg(⋅), which specifies the minimum price at which it is willing to sell energy during time interval t. Specifically, ftg(qtg) is the minimum price at which seller g is willing to sell energy during time interval t with a quantity of qtg. Similarly, each buyer b ∈ ℬ submits a bid (i.e., an inverse demand function), denoted by ftb(⋅), which specifies the maximum price at which it is willing to buy energy during time interval t. Specifically, ftb(qtb) is the maximum price at which buyer b is willing to buy energy during time interval t with a quantity of qtb.
Still referring to
where (1b) is the power balance equation, λt is the dual variable associated with constraint (1b), and the feasible set of the decision variables may depend on the market clearing results in the previous time interval. Constraint (1c) can capture all physical constraints, such as capacity limits, energy limits, and ramp rate limits, as well as security constraints, such as reserve requirements and line flow limits. For convenience, denote the total cleared energy sales/purchases by qt, i.e., qt = Σb qtb = Σg qtg, where the sums run over all buyers and all sellers, respectively.
The solution to (1) gives the cleared energy sales and purchases, as well as the wholesale energy price for each market participant. In a uniform pricing market, all market participants receive a uniform price equal to λt. When the WEM is competitive, a single market participant typically does not have the capability to influence the clearing price, and the chances that it is the marginal unit are low. In such a setting, given λt, the cleared energy purchase for buyer b when it is non-marginal can be computed as follows:
Still referring to
It is noted that the disclosed methodology can be easily extended to handle cases when losses and transmission line congestion are considered. As an example, the simple yet representative case is presented here in the hope of providing more insight.
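For intuition, a minimal uniform-price clearing can be sketched as below, by non-limiting example. The sketch ignores the network and security constraints of (1c), losses, and congestion; it simply intersects the aggregated offer and bid curves, with the last cleared offer setting the uniform price. The data structures are illustrative assumptions, not the disclosed optimization (1).

```python
def clear_uniform_price(offers, bids):
    """Clear a single-interval uniform-price auction (simplified sketch).

    offers: list of (price, qty) pairs a seller will sell at
    bids:   list of (price, qty) pairs a buyer will pay up to
    Returns (clearing_price, cleared_qty); price is None if nothing clears.
    """
    offers = sorted(offers)              # cheapest supply first (merit order)
    bids = sorted(bids, reverse=True)    # highest willingness-to-pay first
    cleared, price = 0.0, None
    oi = bi = 0
    o_p, o_q = offers[0]
    b_p, b_q = bids[0]
    while True:
        if b_p < o_p:
            break                        # no more mutually profitable trades
        q = min(o_q, b_q)
        cleared += q
        price = o_p                      # last cleared offer sets the price
        o_q -= q
        b_q -= q
        if o_q == 0:
            oi += 1
            if oi == len(offers):
                break
            o_p, o_q = offers[oi]
        if b_q == 0:
            bi += 1
            if bi == len(bids):
                break
            b_p, b_q = bids[bi]
    return price, cleared
```

For example, with offers [(10, 5), (20, 5)] and bids [(30, 4), (15, 4)], 5 MWh clears at a price of 10 $/MWh, since the 20 $/MWh offer is more expensive than the remaining 15 $/MWh bid.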
In the WEM, the LSE participates as a buyer that purchases energy through bidding. Without loss of generality, assume the LSE under consideration is buyer b in the WEM. The LSE resells the purchased energy to a set of EUCs in the Retail Energy Market (REM) and charges them at a typically regulated price that it needs to determine. Let νt denote the price at time interval t, and qtb the energy purchased from the WEM.
Let 𝒞 = {c1, . . . , cC} denote the set of EUCs in the REM served by this LSE. Each EUC c ∈ 𝒞 responds to the price νt by adjusting its energy consumption, denoted by dtc. Denote the aggregate energy consumption of all EUCs measured at the substation during time interval t by dt, i.e., dt = Σc dtc, where the sum runs over all EUCs in 𝒞. Then, the objective of the LSE is to maximize its profit earned from time interval t onwards, subject to the energy balance constraint, which can be mathematically expressed as follows:
where 𝔼 denotes the expectation operator, γ ∈ [0, 1] is a discount factor that discounts future profits (i.e., the expectation of future profits in Eq. (3)), ϕr(⋅) is a non-negative scalar function that computes the cost incurred when the aggregate energy consumption deviates from the energy purchase, and the price νt and the bid are the decision variables of the LSE.
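By way of non-limiting example, the discounted profit objective of Eq. (3) can be evaluated numerically as sketched below. The absolute-value penalty and all coefficients are illustrative assumptions; the disclosure only requires ϕr(⋅) to be a non-negative scalar cost on the mismatch between consumption and purchase.

```python
def discounted_profit(retail_prices, demands, wholesale_prices, purchases,
                      gamma=0.95, penalty=lambda mismatch: 10.0 * abs(mismatch)):
    """Discounted LSE profit over a horizon (illustrative sketch of Eq. (3)).

    Per interval: revenue = retail price * aggregate consumption,
    cost = wholesale price * cleared purchase plus a mismatch penalty
    phi_r(d - q); profits are discounted by gamma per interval.
    """
    total = 0.0
    for k, (v, d, lam, q) in enumerate(zip(retail_prices, demands,
                                           wholesale_prices, purchases)):
        total += gamma ** k * (v * d - lam * q - penalty(d - q))
    return total
```

For instance, one interval with a retail price of 50 $/MWh, demand of 2 MWh, a wholesale price of 30 $/MWh, and a matching purchase of 2 MWh yields a profit of 40 $; an imbalanced second interval is penalized and discounted.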
The buyers in the REM are end use customers. At the beginning of each time interval t, EUC c, c ∈ 𝒞, receives a price νt from the LSE; it then optimizes its energy consumption so as to maximize its overall benefit. A generic EUC model is agnostic to the underlying components. Let etc denote the energy need of EUC c at time interval t. A myopic EUC finds its optimal action by solving the following utility maximization problem:
where βc(⋅) is the benefit function, which gives the benefit of the EUC at a certain energy need and energy consumption, ηtc ∈ [0, 1] is the backlog rate that represents the percentage of unmet energy need carried over to the next time interval, ξtc is a random variable that models the newly generated incremental energy need, and the energy consumption dtc is restricted to a feasible set.
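By way of non-limiting example, one myopic EUC decision with the backlog dynamics described above can be sketched as follows. The quadratic benefit function and all coefficients (a, b, d_max, eta, xi) are illustrative assumptions chosen for concreteness; the disclosed model only requires a generic benefit function, a backlog rate, and a feasible consumption set.

```python
def euc_step(energy_need, price, a=60.0, b=2.0, d_max=10.0,
             eta=0.8, xi=1.0):
    """One myopic EUC decision (illustrative instance of problem (4)).

    Consumption maximizes the net benefit a*d - b*d^2 - price*d over the
    feasible set [0, min(d_max, energy_need)]; the unmet need is carried
    over at backlog rate eta, plus a new incremental need xi.
    Returns (consumption, next_energy_need).
    """
    d_star = max(0.0, (a - price) / (2.0 * b))   # unconstrained optimum
    d = min(d_star, d_max, energy_need)          # project onto feasible set
    next_need = eta * (energy_need - d) + xi     # backlog dynamics
    return d, next_need
```

A higher price shrinks the unconstrained optimum (a - price)/(2b), so the EUC consumes less and carries more backlog forward, which is the demand response behavior the LSE's price signal exploits.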
Joint Bidding and Pricing Problem Formulation Under Synchronized Action Mechanism
Some embodiments of the present disclosure address the problem of jointly determining the energy bid that is submitted to the wholesale electricity market (WEM) and the energy price that is charged in the retail electricity market (REM) by a load serving entity (LSE), which seeks to maximize its total profit. The joint bidding and pricing problem is formulated as a Markov decision process (MDP) with continuous state and action spaces, in which the energy bid and the energy price are two actions that share a common objective, i.e., profit maximization.
First, dynamical bid and price response functions are introduced, followed by the bidding and pricing policies. Then, we formulate the joint bidding and pricing problem faced by the LSE as an MDP.
From the perspective of the LSE, it has to determine a bid ftb (the bidding problem), as well as a price νt (the pricing problem), for time interval t. Assume ftb is characterized by a parameter vector ωt. Let {λτ, qτ, qτb : τ = t−n1, . . . , t−1} denote the WEM clearing results over the previous n1 time intervals. An n1-order bid response function, denoted by ψ(⋅), can then be defined as:

(λt, qt, qtb) = ψ({λτ, qτ, qτb : τ = t−n1, . . . , t−1}, ωt, t mod T), (5)
where (t mod T) is included to model the time dependence. The cleared energy purchase can be computed using (2). For a perfectly competitive WEM, ωt has negligible impact on the clearing results, and (5) essentially models the dynamics of the clearing results. The core idea behind the bid response function is the following. Assume all market participants make decisions for time interval t based on the WEM clearing results for previous time intervals. From the perspective of the LSE, the WEM clearing results will evolve to (λt, qt, qtb) from the previous WEM clearing results, given its bid ωt. The impacts of other market participants' actions are implicitly included in this bid response function. Therefore, when n1 is large enough, the n1-order bid response function can capture the dynamics in the WEM well.
In the meantime, the LSE may only have information on the aggregate energy consumption dt in real-time, rather than complete parameters in (4).
Therefore, instead of adopting the complete EUC model in (4), we use an n2-order price response function, denoted by ϕ(⋅), to characterize the collective behavior of all EUCs defined through the set of problems in (4), as follows:

dt = ϕ({dτ, ντ : τ = t−n2, . . . , t−1}, νt, t mod T), (6)
In the special case where n2 = 0, the aggregate energy consumption of the EUCs depends only on the price at the current time interval. The core idea behind the price response function is similar to that of the bid response function. Compared to the complete WEM model and EUC models, the response functions are easier to learn from the data that are available to the LSE.
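By way of non-limiting example, an n2-order price response function in the form of (6) can be fitted from history as sketched below. A linear least-squares model is used here purely as a simple stand-in for the FNN/RNN/LSTM response functions described in this disclosure; the function names and synthetic data are illustrative assumptions.

```python
import numpy as np

def fit_price_response(d_hist, v_hist, n2=2):
    """Fit an n2-order linear price response function from history.

    Approximates d_t ~ phi(d_{t-n2..t-1}, v_{t-n2..t-1}, v_t) by least
    squares; a simple stand-in for the neural-network response function.
    """
    X, y = [], []
    for t in range(n2, len(d_hist)):
        feats = list(d_hist[t - n2:t]) + list(v_hist[t - n2:t]) + [v_hist[t]]
        X.append(feats + [1.0])          # append a bias term
        y.append(d_hist[t])
    w, *_ = np.linalg.lstsq(np.asarray(X), np.asarray(y), rcond=None)

    def phi(d_lags, v_lags, v_now):
        f = np.asarray(list(d_lags) + list(v_lags) + [v_now, 1.0])
        return float(f @ w)

    return phi
```

The returned `phi` plays the role of ϕ(⋅) in (6): given the lagged consumptions, lagged prices, and a candidate current price, it predicts the aggregate EUC consumption for the current interval.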
At least one objective of the joint bidding and pricing problem to be solved by the LSE is to determine the bid and the price based on available information. As discussed earlier, prior to time interval t, the information related to the WEM that is available to the LSE includes ωτ, λτ, qτ, qτb, ∀τ≤t−1. In the meantime, the information related to the REM that is available to the LSE includes ντ, dτ, ∀τ≤t−1. Let Jt−1={ωτ, λτ, qτ, qτb, ντ, dτ, ∀τ≤t−1} denote the set of information available to the LSE before the WEM for time interval t is cleared.
The bidding problem and the pricing problem are inherently coupled, and thus need to be considered jointly. In a uniform pricing market, the LSE's bid will get cleared as long as its bid price is no smaller than λt. Meanwhile, to minimize the cost incurred due to the mismatch of the energy purchase and aggregate energy consumption, it is desirable to bid for the amount of energy that equals the aggregate energy consumption. In fact, when λt is not affected by ωt, for any νt, the optimal bid ωt that maximizes the profit defined in (3) is the one that gives qtb=dt. Essentially, we only need to find the optimal price νt for the REM, and then construct the bid from νt.
Define a deterministic pricing policy, denoted by π(⋅), as the following function that maps Jt−1 to the price νt:
νt=π(Jt−1). (7)
Also, define a deterministic bidding policy, denoted by μ(⋅), as the following function that maps Jt−1 and νt to a bid ωt:
ωt=μ(Jt−1, νt). (8)
As an example, assume the bid ωt consists of two components, a bid price ωtp in $/MWh and a bid quantity ωtq in MWh. Then, the optimal bidding policy μ* is such that ωtp is set to νt and ωtq is set to the estimated aggregate energy consumption obtained using the price response function ϕ. Therefore, there is no additional parameter in μ that needs to be learned beyond those in ϕ.
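This optimal bidding policy reduces to a direct mapping from the retail price; a minimal sketch, with a hypothetical linear function standing in for the price response estimate, is:

```python
def optimal_bid(nu_t, estimate_demand):
    """Optimal bidding policy mu* described in the text: the bid price
    equals the retail price nu_t, and the bid quantity equals the
    aggregate consumption predicted by the price response function.
    estimate_demand is a stand-in for the learned function phi."""
    omega_p = nu_t                    # bid price in $/MWh
    omega_q = estimate_demand(nu_t)   # bid quantity in MWh
    return omega_p, omega_q

# hypothetical linear demand estimate used only for illustration
bid = optimal_bid(25.0, lambda nu: 60.0 - 1.2 * nu)
```

No parameter of this mapping is trained separately; all learning happens in the price response estimate it consumes.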
The joint bidding and pricing problem is formulated as a Markov decision process (MDP). An MDP consists of a state space, an action space, a reward function, and a transition probability function that satisfies the Markov property, i.e., given the current state and action, the next state is independent of all states and actions in the past. Specifically, in the joint bidding and pricing problem, define the state at time interval t to be st=({ωτ, λτ, qτ, qτb : t−n1≤τ≤t−1}, {ντ, dτ : t−n2≤τ≤t−1}, t mod T).
Then, the pricing policy can be equivalently written as
νt=π(st), (9)
and the bidding policy can be equivalently written as:
ωt=μ(st,νt). (10)
The objective of the joint bidding and pricing problem is to maximize the profit of the LSE; therefore, we define the reward for time interval t to be the profit earned by the LSE as follows:
rt=(νt−λt)dt−ϕt(dt−qtb), (11)
where ϕt(⋅) computes the cost incurred when the aggregate energy consumption deviates from the cleared energy purchase, as in (3). The cumulative discounted reward from time interval t and onwards, denoted by Rt and referred to as the return, is Rt=Σ_{τ=t}^{∞} γ^{τ−t} rτ, where γ∈[0, 1] is a discount factor. The action value function, also referred to as the Q function, under pricing policy π and bidding policy μ, at action a and state s, denoted by Qπ(s, a), is the expected return defined as
Qπ(st, at)=E[Rt | st, at]. (12)
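For a finite reward trajectory, the return can be evaluated as a truncation of this infinite sum; a short sketch:

```python
def discounted_return(rewards, gamma):
    """R_t = sum_{tau >= t} gamma^(tau - t) * r_tau, evaluated over a
    finite reward trajectory (a truncation of the infinite sum)."""
    R = 0.0
    for r in reversed(rewards):  # backward accumulation: R = r + gamma * R
        R = r + gamma * R
    return R

R1 = discounted_return([1.0, 1.0, 1.0], gamma=0.5)  # 1 + 0.5 + 0.25
```

The backward accumulation avoids computing powers of γ explicitly; with γ=0 only the first reward survives, which is the myopic case discussed later in this disclosure.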
The Q function under optimal pricing policy π* and optimal bidding policy μ*, denoted by Q*(⋅,⋅), satisfies the Bellman optimality equation:
Q*(st, at)=E[rt]+γ∫S p(st+1|st, at) Q*(st+1, at+1) dst+1, (13)
where p(st+1|st, at) is the probability that the state transitions into st+1 conditioned on st and at, and S is the state space.
Since μ* does not need to be learned once we have ϕ, the joint bidding and pricing problem essentially becomes finding π that maximizes the following performance function:
J(π)=E[R1; π, μ*], (14)
which gives the expected return under the given bidding and pricing policies. The MDP problem can be solved leveraging a Reinforcement Learning (RL) algorithm to be detailed later.
Learning Algorithm for Bid/Price Response
In RL algorithms, transitions (sτ, aτ, rτ, sτ+1) are critical for learning a good policy. Typically, a large number of transition samples are needed in order to learn a good policy. One approach to obtain the transitions is to sample from the actual environment online, i.e., to get samples from directly interacting with the ISO and the EUCs, till adequate samples are acquired. This approach, however, does not utilize the samples in an efficient manner. In addition, this may incur significant cost for the LSE during action exploration.
Alternatively, we can learn the bid response function ψ and the price response function ϕ from historical data and use the learned response functions as a substitute to the actual environment. The learned response functions can generalize the transition samples to new transitions, and if accurate enough, would allow the learning of good bidding and pricing policies without incurring any cost. The response function learning problems can be cast as supervised learning problems. The objective of the learning algorithm is to minimize the mean squared error between the predicted values and the actual values of the outputs.
To explicitly capture the temporal behavior of the WEMs and EUCs, the dynamical bid response and demand response models shown in (5) and (6), which have states that evolve over time, are used. The states in the dynamical model keep necessary information from previous time intervals, and allow more accurate prediction of the WEM and EUC response. These states can be explicitly chosen based on (5) and (6), in which case the model can be represented by a linear function or a multi-layer feedforward neural network (FNN), or implicitly chosen, in which case the model can be represented by a recurrent neural network (RNN) or a long short-term memory (LSTM) unit network.
The bid response model is used to express the relationship between the wholesale market clearing results and the LSE bid. It takes the market clearing results at the upcoming time interval as the output, and both the LSE bid for the upcoming interval and the wholesale market clearing results at time intervals prior to the upcoming time interval as inputs. The wholesale clearing results include the cleared prices and quantities of electricity/energy. Taking into account the impacts of previous wholesale market clearing results on the cleared results for the upcoming time interval, i.e., the inherent temporal correlation of wholesale market behaviors, reduces the mismatches between the actual bid response and the response computed using the bid response model.
Meanwhile, the price response model is used to express the relationship between the aggregate energy consumption of end use customers and the retail price. It takes the aggregate energy consumption as the output. Besides taking the electricity price as one of its inputs, this function also takes the aggregate energy consumption and price at previous time intervals as inputs to simulate the inherent temporal correlation of the EUC behaviors.
When using linear functions or multi-layer feedforward neural networks (FNNs), the inputs and outputs are chosen explicitly. When learning the bid response function, the inputs are ({λτ, qτ, qτb : t−n1≤τ≤t−1}, ωt, t mod T) and the outputs are (λt, qt, qtb); when learning the price response function, the inputs are ({dτ, ντ : t−n2≤τ≤t−1}, νt, t mod T) and the output is dt.
Still referring to the figure, when a linear function is used, the price response function takes the form
dt=W[stT, νt]+b, (15)
where W is a weight vector, b is a bias, and st=({dτ, ντ : t−n2≤τ≤t−1}, t mod T).
Still referring to the figure, when a multi-layer FNN is used, the output vector of the l-th hidden layer is
ht[l]=relu(W[l]xt[l]+b[l]), (16)
where relu(⋅) denotes a rectified linear unit function that is applied element-wise, W[l] is a weight matrix, and b[l] is a bias vector. Note that the output vector of one hidden layer is the input vector for the next hidden layer, i.e., xt[l+1]=ht[l], except the last hidden layer, the output of which is mapped to the output through a fully connected unit as follows:
yt=Wht[L]+b, (17)
where W is a weight matrix, and b is a bias vector. The multi-layer FNN can be trained using the back-propagation algorithm such that the mean squared error between the predicted output yt and the true value dt is minimized, i.e., by minimizing the following loss function ℒ:
ℒ=(1/mtr) Σ_{i=1}^{mtr} (yi−di)2, (18)
where mtr is the total number of samples for FNN training.
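A minimal forward pass and mean-squared-error loss matching (16)-(18) can be sketched as follows; the layer sizes, random weights, and toy data are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def fnn_forward(x, params):
    """Forward pass of the multi-layer FNN: each hidden layer applies
    relu(W x + b) as in (16); the last hidden vector is mapped to the
    output through a fully connected unit as in (17)."""
    h = x
    for W, b in params[:-1]:
        h = relu(W @ h + b)
    W_out, b_out = params[-1]
    return W_out @ h + b_out

def mse_loss(params, X, d):
    """Training loss as in (18): mean squared error over the samples."""
    preds = np.array([fnn_forward(x, params)[0] for x in X])
    return float(np.mean((preds - d) ** 2))

# two hidden layers of width 4, scalar output; random toy weights
sizes = [3, 4, 4, 1]
params = [(rng.standard_normal((m, n)), np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]
X = rng.standard_normal((5, 3))
d = rng.standard_normal(5)
loss = mse_loss(params, X, d)  # non-negative scalar
```

In practice the gradients of this loss with respect to `params` would be computed by back-propagation, as the text describes.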
Alternatively, the dynamic demand response can be implicitly modeled within the neural network, which leads to recurrent neural networks (RNNs). The right part of the figure shows this structure, in which the hidden state vector of the l-th layer is updated as
ht[l]=tanh(Wh[l]ht−1[l]+Wx[l]xt[l]+b[l]), (19)
where tanh(⋅) is applied element-wise, Wh and Wx are weight matrices, and b is a bias vector. Note that h−1[l] are initialized to zeros, xt[l]=ht[l−1] for l=2, . . . , L, and xt[1]=(st, νt). The hidden states in the RNN are dynamical since their values also depend on their previous values, while those in the FNN are static since their values purely depend on the inputs. The output of the last hidden state vector is mapped to the output through a fully connected unit as in the case of the multi-layer FNN. The RNN can be trained by minimizing the same loss function as in (18) using the backpropagation through time technique. The input vector only has to include the most recent information, i.e., when the RNN is used, n=1 and st=(νt−1, dt−1, t mod T).
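The recurrence in (19) can be sketched as a single-layer update unrolled over a few time steps; the dimensions and random weights are illustrative:

```python
import numpy as np

def rnn_step(h_prev, x_t, Wh, Wx, b):
    """One hidden-state update as in (19):
    h_t = tanh(Wh h_{t-1} + Wx x_t + b), applied element-wise."""
    return np.tanh(Wh @ h_prev + Wx @ x_t + b)

rng = np.random.default_rng(1)
H, D = 4, 2  # illustrative hidden and input dimensions
Wh = rng.standard_normal((H, H))
Wx = rng.standard_normal((H, D))
b = np.zeros(H)

h = np.zeros(H)  # h_{-1} initialized to zeros, as in the text
for x_t in rng.standard_normal((3, D)):  # unroll over 3 time steps
    h = rnn_step(h, x_t, Wh, Wx, b)
```

Because `h` feeds back into the next step, the hidden state accumulates information from all previous inputs, which is why only the most recent interval needs to appear in the input vector.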
Still referring to the figure, the basic RNN unit can be replaced by a long short-term memory (LSTM) unit, which maintains two hidden state vectors, ht and Ct, together with a forget gate vector ft, an input gate vector it, and an output gate vector ot, computed as follows:
ft=σ(Wfhht−1+Wfxxt+bf), (20)
it=σ(Wihht−1+Wixxt+bi), (21)
ot=σ(Wohht−1+Woxxt+bo), (22)
then, the two hidden state vectors are updated as follows:
C̃t=tanh(WChht−1+WCxxt+bC), (23)
Ct=ft∘Ct−1+it∘C̃t, (24)
ht=ot∘tanh(Ct), (25)
where ∘ represents element-wise multiplication. This structure has proven to be very effective in capturing long-term temporal dependencies, and therefore, is expected to outperform the basic RNN unit when representing the dynamical DR model.
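The updates in (20)-(25) can be sketched directly; the weight naming and dimensions below are assumptions made for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(h_prev, C_prev, x_t, W, b):
    """One LSTM update implementing (20)-(25). W holds the hidden ('h')
    and input ('x') weight matrices for the forget (f), input (i),
    output (o), and candidate-cell (C) transforms; b holds the biases."""
    f = sigmoid(W['fh'] @ h_prev + W['fx'] @ x_t + b['f'])        # (20)
    i = sigmoid(W['ih'] @ h_prev + W['ix'] @ x_t + b['i'])        # (21)
    o = sigmoid(W['oh'] @ h_prev + W['ox'] @ x_t + b['o'])        # (22)
    C_tilde = np.tanh(W['Ch'] @ h_prev + W['Cx'] @ x_t + b['C'])  # (23)
    C = f * C_prev + i * C_tilde   # (24), element-wise products
    h = o * np.tanh(C)             # (25)
    return h, C

rng = np.random.default_rng(2)
H, D = 3, 2  # illustrative hidden and input dimensions
W = {k + 'h': rng.standard_normal((H, H)) for k in 'fioC'}
W.update({k + 'x': rng.standard_normal((H, D)) for k in 'fioC'})
b = {k: np.zeros(H) for k in 'fioC'}

h, C = lstm_step(np.zeros(H), np.zeros(H), rng.standard_normal(D), W, b)
```

The additive cell update in (24) is what lets gradients flow across many time steps, which is the source of the long-term-dependency advantage the text notes.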
Similarly, the bid response function can also be modeled using direct or indirect approaches, as with the demand response function.
Still referring to
Learning Algorithm for Bidding/Pricing Policies Under Synchronized Action Mechanism
After obtaining the bid and price response functions, we can next discuss the learning algorithms for the pricing policy π. Since the optimal bidding policy can be directly derived from the bid response function, no additional parameter in the bidding policy μ needs to be learned beyond those in ϕ. Assume π is parameterized by a vector θπ. Then, finding the optimal pricing policy is essentially finding the optimal value for θπ. One type of RL algorithm that can find (sub-optimal) values for θπ is the policy gradient methods, which update the parameter vector in the direction that maximizes the performance function J(π). The gradient of J can be computed according to the Deterministic Policy Gradient Theorem. Specifically, the gradient of J with respect to θπ, referred to as the action gradient, is as follows:
∇θπJ=E[∇θππ(s) ∇aQπ(s, a)|a=π(s)]. (26)
Note that the gradient of the performance function J depends on the action value function Q, which is not known and needs to be estimated. The deep deterministic policy gradient (DDPG) based RL algorithm is used for solving the joint bidding and pricing optimization problem.
In addition to using the neural networks, there are two more important ideas in the DDPG algorithm. First, target networks, the parameters of which slowly track those of the actor network and the critic network, are used to stabilize the algorithm. The parameter vector of the target network for the critic 735 is denoted by θQ′, and that of the pricing network 725 is denoted by θπ′. Second, a replay buffer is used to store the transitions of the MDP and, at each time instance, a mini-batch of size m is sampled from the replay buffer and used to estimate the gradients. Note that in the training stage, response functions are used to substitute the WEM 715 and EUCs 716. The WEM 715 and EUCs 716 constitute the environment 710. The behaviors of the WEM and EUCs can be represented using actual data/measurements during real-time application, and can also be simulated using bid and price response functions for training or prediction purposes.
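A minimal replay buffer of the kind described can be sketched as follows; the capacity, batch size, and dummy transitions are illustrative:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of MDP transitions (s, a, r, s_next).
    Mini-batches are drawn uniformly at random to estimate gradients;
    once full, the oldest transitions are evicted automatically."""

    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)

    def add(self, transition):
        self.buf.append(transition)

    def sample(self, m):
        # uniform sampling without replacement from stored transitions
        return random.sample(list(self.buf), min(m, len(self.buf)))

rb = ReplayBuffer(capacity=1000)
for t in range(100):
    rb.add((t, 0.0, 1.0, t + 1))  # dummy (s, a, r, s_next) tuples
batch = rb.sample(64)
```

Sampling uniformly from old and new transitions breaks the temporal correlation between consecutive samples, which is why the replay buffer improves sample efficiency over purely online updates.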
Still referring to
The intuition behind this is to find a critic network that satisfies the Bellman optimality equation in (13). Note that the target networks are used to compute the action value as well as the next action, i.e., π′(si+1). Meanwhile, θπ is updated in the direction that maximizes the performance function J, specifically, the direction of the action gradient that is approximated using samples as follows:
∇θπJ ≈ (1/m) Σ_{i=1}^{m} ∇θππ(si) ∇aQ(si, a; θQ)|a=π(si).
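The slow target-network tracking and the Bellman targets used in the critic update can be sketched as follows; parameters are shown as flat lists of floats, and the tracking rate and toy networks are illustrative stand-ins:

```python
def soft_update(target, source, tau=0.001):
    """Target-network tracking used to stabilize DDPG:
    theta' <- tau * theta + (1 - tau) * theta', element-wise.
    Parameters are flat lists of floats for illustration."""
    return [tau * s + (1.0 - tau) * t for t, s in zip(target, source)]

def critic_targets(batch, q_target, pi_target, gamma=0.9):
    """Bellman targets y_i = r_i + gamma * Q'(s_next, pi'(s_next)) for a
    mini-batch of transitions (s, a, r, s_next); q_target and pi_target
    stand in for the critic and actor target networks."""
    return [r + gamma * q_target(s_next, pi_target(s_next))
            for (_s, _a, r, s_next) in batch]

theta = soft_update([0.0, 0.0], [1.0, 2.0], tau=0.1)
ys = critic_targets([((0.0,), 0.0, 1.0, (1.0,))],
                    q_target=lambda s, a: 2.0,
                    pi_target=lambda s: 0.5)  # y = 1 + 0.9 * 2 ≈ 2.8
```

The critic is then regressed toward these targets, while the actor parameters move along the sampled action gradient; the small τ keeps the targets nearly stationary between updates.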
Still referring to
The computation of the retail energy price is achieved in Step 8 as the output of the pricing policy function when the state collected from the environment is available and taken as the input. The pricing policy function is represented by the pricing policy network, wherein the pricing policy network is implemented as a neural network.
The computation of the wholesale bid is achieved in Step 9 as the output of the bidding policy function when the retail energy price and the state collected from the environment are taken as the inputs. The bidding policy function is represented by the bidding policy network, which is implemented as a neural network as well. The wholesale bid may include one or more pairs of bid prices and bid amounts, or a function to represent the relationship between bid price and bid amount. The wholesale bid price is also referred to as the "LSE offer amount" in this disclosure.
The configurations or parameters of the critic and actor networks, and the critic and actor target networks, are adaptively updated with the latest information in Steps 10-16.
It is noted that only Steps 8-16 are required for computing wholesale bids and retail energy prices for the upcoming time interval t when the algorithm is used for real-time application. Meanwhile, the wholesale cleared results and the EUCs' aggregate energy consumption used in Step 10 can be replaced with actual data if the actual measurements or information can be collected in a timely manner.
It is also worth mentioning that an FNN is used for constructing the required neural networks when a finite number of previous time intervals is used, and an RNN or an LSTM network is used when all available previous time intervals are used.
Although we only give details for DDPG algorithm based on the DDPG structure given in
Joint Bidding and Pricing for LSE Under Asynchronized Action Mechanism
The above algorithms are devised by assuming that the synchronized action mechanism is used when jointly determining WEM and REM actions. If an asynchronized action mechanism as shown in
For such settings, the retail market model for an LSE can be formulated to maximize its profit earned from time interval t and onwards, subject to the energy balance constraint, which can be mathematically expressed as follows:
The earned profit from resales can also be determined as (ντdτ−λτdτ) instead of (ντdτ−λτqτb), if payments to the ISO are computed based on the actual aggregate energy consumption dτ rather than qτb, for example, in a real-time market.
According to (29), the values of λ, qb, as well as dc, c∈C, at time interval τ will have an impact on their values in future time intervals. Meanwhile, for a myopic LSE—one that is concerned only with the profit for the current time interval—γ is set to 0, resulting in a static optimization that concerns only the time interval t. If the LSE is farsighted, then γ>0. However, while all future time intervals can be taken into consideration here, decisions that concern any time interval beyond t, such as νt+1, will not be realized immediately. Once new information is revealed, the decisions concerning future intervals can be further improved.
Still referring to
The objective of the joint bidding and pricing problem to be solved by the LSE is to determine the wholesale bid and the retail energy price based on all available information. Before time interval t, the information related to the WEM that is available to the LSE includes ωτ, λτ, qτ, qτb, ∀τ≤t−1. In the meantime, the information related to the REM that is available to the LSE includes ντ, dτ, ∀τ≤t−1. Let It−1={ωτ, λτ, qτ, qτb, ντ, dτ, ∀τ≤t−1} denote the set of information available to the LSE before the WEM for time interval t is cleared. It−1 is referred to as the prior-WEM-clearing information set. In general, the retail energy price is posted to the EUCs 425 after the clearing of the WEM 420. This gives the LSE more information in addition to It−1, specifically, ωt, λt, qt, qtb, when determining the retail energy price for time interval t. Define Jt−1=It−1∪{ωt, λt, qt, qtb}, which is referred to as the post-WEM-clearing information set.
The bidding policy is defined as a vector function that maps It−1 to ωt and denoted by π(⋅), as follows:
ωt=π(It−1), (30)
in addition, the pricing policy is defined as a scalar function that maps Jt−1 to νt and denoted by μ(⋅), as follows:
νt=μ(Jt−1), (31)
where μ returns values in a bounded interval of feasible retail prices.
The joint bidding and pricing problem is next formulated as a Markov decision process (MDP). The MDP consists of a state space, an action space, a reward function, and a transition probability function that satisfies the Markov property, i.e., given the current state and action, the next state is independent of all states and actions in the past.
Specifically, define the state at time interval t to be st=({ωτ, λτ, qτ, qτb : t−n1≤τ≤t−1}, {ντ, dτ : t−n2≤τ≤t−1}, t mod T).
The action of the LSE consists of two components, ωt and νt, where νt is determined after ωt. As discussed earlier, at the time of determining νt, new information on the state is available and can be used to make more informed decisions. Define an intermediate state for time interval t as {tilde over (s)}t=(ωt, λt, qt, qtb, {ωτ, λτ, qτ, qτb : t−n1≤τ≤t−1}, {ντ, dτ : t−n2≤τ≤t−1}, t mod T). Then, the bidding policy and the pricing policy can be equivalently written as:
ωt=π(st), (32)
νt=μ({tilde over (s)}t). (33)
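The construction of the intermediate state from the state, the bid, and the fresh clearing results can be sketched as follows; the state entries below are placeholders, not values from the disclosure:

```python
def intermediate_state(s_t, omega_t, lam_t, q_t, qb_t):
    """Form the intermediate state used by the pricing policy under the
    asynchronized mechanism: the bid omega_t and the fresh WEM clearing
    results (lam_t, q_t, qb_t) are prepended to the state s_t."""
    return (omega_t, lam_t, q_t, qb_t) + tuple(s_t)

s_t = (22.6, 41.0, 38.5, 12)  # placeholder state entries
s_tilde = intermediate_state(s_t,
                             omega_t=(25.0, 40.0),  # (bid price, bid quantity)
                             lam_t=23.1, q_t=1500.0, qb_t=40.0)
```

The pricing policy μ then acts on `s_tilde` rather than `s_t`, which is exactly the extra post-clearing information the asynchronized mechanism exploits.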
At least one objective of the joint bidding and pricing problem is to maximize the profit of the LSE; therefore, define the reward for time interval t to be
rt=νtdt−λtqtb−ϕt(dt−qtb), (34)
which is the profit earned by the LSE for the time interval t. The cumulative discounted reward from time interval t and onwards, denoted by Rt and referred to as the return, is Rt=Σ_{τ=t}^{∞} γ^{τ−t} rτ. The action value function under bidding policy π and pricing policy μ at action a and state s, denoted by Qπ,μ(s, a), or equivalently, Qπ,μ(s, ω, ν), is the expected return defined as follows:
Qπ,μ(st, at)=Qπ,μ(st, ωt, νt)=E[Rt | st, at]. (35)
Then, the joint bidding and pricing problem essentially becomes finding π and μ that maximize the following performance function:
J(π,μ)=[R1;π,μ]. (36)
The dynamical bid response and demand response models used for determining optimal joint bidding and pricing actions are represented explicitly by a linear function or a multi-layer feedforward neural network (FNN), or implicitly by a recurrent neural network (RNN) or a long short-term memory (LSTM) unit network. When learning the bid response function, the inputs are ({λτ, qτ, qτb : t−n1≤τ≤t−1}, ωt, t mod T) and the outputs are (λt, qt, qtb); when learning the price response function, the inputs are ({dτ, ντ : t−n2≤τ≤t−1}, νt, t mod T) and the output is dt.
Then, the deep deterministic policy gradient (DDPG) based RL algorithm is used to solve the joint bidding and pricing problem through learning the bidding policy π and pricing policy μ.
Assume π and μ are parameterized by vectors θπ and θμ. Then, finding the optimal bidding and pricing policies is essentially finding the optimal values for θπ and θμ. The policy gradient methods are used to find (sub-optimal) values for θπ and θμ, which update these parameters in the direction that maximizes J(π, μ). For deterministic policies, the gradient of J can be computed using the Deterministic Policy Gradient Theorem. Deterministic policies typically outperform stochastic ones in terms of sample efficiency, and are more desirable for control tasks. According to the Deterministic Policy Gradient Theorem, the gradients of J with respect to θπ and θμ, referred to as the action gradients, are determined as follows:
∇θπJ=E[∇θππ(s) ∇ωQπ,μ(s, ω, ν)|ω=π(s), ν=μ({tilde over (s)})], (37)
∇θμJ=E[∇θμμ({tilde over (s)}) ∇νQπ,μ(s, ω, ν)|ω=π(s), ν=μ({tilde over (s)})], (38)
where {tilde over (s)} is the intermediate state after s following the policy π. Note that the gradient of the performance function J depends on the action value function Q, which is not known and needs to be estimated. The joint bidding and pricing problem is solved using the DDPG algorithm with an actor-critic architecture.
The detailed DDPG based RL algorithm for solving the joint bidding and pricing problem under the asynchronized action mechanism is presented in Algorithm 2. At each step, θQ is updated in the direction that minimizes the following loss function ℒ:
ℒ=(1/m) Σ_{i=1}^{m} (yi−Q(si, ai; θQ))2, where yi=ri+γQ′(si+1, π′(si+1), μ′({tilde over (ŝ)}i+1); θQ′).
Note that the target networks are used to compute the action value for the next time step. Also, an estimated value of {tilde over (s)}i+1, denoted by {tilde over (ŝ)}i+1, is used since the true intermediate state after {tilde over (s)}i following the bidding policy π′ is not known. {tilde over (ŝ)}i+1 is estimated using the trained/fitted bid response function. Estimated values of the intermediate state are also used when evaluating the action gradients for the same reason.
Algorithm 2 is designed to determine the pricing policy and bidding policy using separate neural networks according to
The computation of the wholesale bid is achieved in Step 8 as the output of the bidding policy function when the state collected from the environment is taken as the input. The bidding policy function is represented by the bidding policy network, which is implemented as a neural network. The wholesale bid may include one or more pairs of bid prices and bid amounts, or a function to represent the relationship between bid price and bid amount. The wholesale bid price is also referred to as the "LSE offer amount" in this disclosure.
The computation of the retail energy price is achieved in Step 10 as the output of the pricing policy function when the state collected from the environment and the computed wholesale bid are available and taken as the inputs. The pricing policy function is represented by the pricing policy network, wherein the pricing policy network is implemented as a neural network as well.
The configurations or parameters of the critic and actor networks, and the critic and actor target networks, are adaptively updated with the latest information in Steps 11-18.
It is noted that only Steps 8-18 are required for computing wholesale bids and retail energy prices for the upcoming time interval t and preparing for the next time intervals, when the algorithm is used for real-time application. Meanwhile, the wholesale cleared results and the EUCs' aggregate energy consumption used in Step 9 and Step 11 can be replaced with actual data if the actual measurements or information can be collected in a timely manner.
In addition, the neural network is represented by using an FNN when a finite number of previous time intervals is used, or an RNN or LSTM network when all available previous time intervals are used.
Similarly, Algorithm 2 can be easily extended to any variation of the DDPG structure.
Exemplar Simulation
The application of the disclosed joint bidding and pricing algorithm can be illustrated through numerical simulations under synchronized WEM and REM action mechanism, in which the multi-layer feedforward neural networks are used to represent the bid and price response functions, and bidding and pricing policy networks.
The WEM model is constructed based on a 300-bus test system, which has 69 generators, each corresponding to one seller, and 195 loads, each corresponding to one buyer. For illustrative purposes, assume each offer/bid is a pair of offer/bid price (in $/MWh) and quantity (in MW). Then, the wholesale bid ωt is a two-dimensional vector that consists of a bid price and a bid quantity. The offer quantities of the sellers are taken from the generator capacities in the test system, and the offer prices are sampled uniformly from [10, 30] $/MWh. The bid quantities of the buyers are taken from historical loads of a practical system, with their peak values scaled to the nominal loads in the test case, and the bid prices are sampled uniformly from [20, 40] $/MWh. In addition, an inelastic load, the peak value of which equals 50% of the total generator capacity, is also added. System losses and line congestions are ignored in the WEM clearing problem, and only generation capacity limits are considered. Assume the LSE under study serves 100 EUCs. The backlog rate ηtc is sampled uniformly from [0, 0.5]. The newly generated incremental energy need ξtc is simulated using historical incremental loads from the practical system scaled by a value that is sampled uniformly from [0.1, 2] MW, and added with a zero-mean Gaussian noise that has a scaled standard deviation of 0.1. The benefit function takes the following quadratic form:
βc(etc,dtc)=κtc(etc−dtc)2+ζtcdtc,
where κtc (in $/MWh2) is sampled from a Gaussian distribution with a mean of 10 and a standard deviation of 1, and ζtc is sampled uniformly from [20, 30] $/MWh. The feasible set of the energy consumption is D={dtc≥0}.
Other parameters are set as follows: T=24, i.e., one day is decomposed into 24 segments, and ϕt(x)=5|x|, i.e., the LSE will lose $5 if the aggregate energy consumption in the REM deviates from the purchased energy quantity in the WEM by 1 MW. We create two scenarios, a winter scenario in which historical load data from the practical system during 3 months in winter are used, and a summer scenario in which historical load data from the practical system during 3 months in summer are used. In both scenarios, data from the first two months are used for training, while data for the last month are used for testing.
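With these settings, the per-interval reward in (11) can be evaluated directly; the prices and quantities below are illustrative numbers, not simulation results:

```python
def lse_reward(nu_t, lam_t, d_t, qb_t, penalty_rate=5.0):
    """Reward as in (11) with the simulation's deviation cost
    phi_t(x) = 5|x|: retail margin on the consumption served, minus the
    penalty for mismatch between consumption d_t and cleared purchase qb_t."""
    return (nu_t - lam_t) * d_t - penalty_rate * abs(d_t - qb_t)

# illustrative interval: retail 30 $/MWh, wholesale 23 $/MWh,
# 40 MW consumed against 42 MW purchased
r = lse_reward(nu_t=30.0, lam_t=23.0, d_t=40.0, qb_t=42.0)
# (30 - 23) * 40 - 5 * |40 - 42| = 280 - 10 = 270.0
```

The penalty term is what makes accurate price response prediction valuable: overbidding or underbidding by 2 MW here costs the LSE $10 of its $280 margin.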
The response functions are critical since they replace the actual environment during the learning process of the bidding and pricing policies, and are also used to determine the state in the MDP formulation. The response functions are represented using neural networks, the parameters of which are learned using the backpropagation algorithm. To illustrate the application of the response functions, we first generate a set of historical data of the WEM, i.e., {ωτ, λτ, qτ}, using the WEM model in (1), and a set of historical data of the REM, i.e., {ντ, dτ}, using the EUC model in (4). When generating the data of the WEM, the bid quantities from the LSE under study are sampled uniformly from [0, 80] MW, and the bid prices are sampled uniformly from [20, 40] $/MWh.
A neural network with 2 hidden layers, each consisting of 128 neurons, is used as the bid response function. An L2 regularizer with a scale of 0.01 is used. Rectified linear unit (ReLU) is used as the activation function for the two hidden layers and the output layer. The Adam optimizer with a learning rate of 0.001 is adopted to train the neural network for 10000 steps. The performance of the response functions is measured by the mean and standard deviation of the absolute error between the actual and predicted responses. Table I shows the mean and standard deviation of the absolute error in the wholesale energy price under different orders of the bid response function. The mean wholesale energy prices of the training data in the winter scenario and the summer scenario are 22.98 $/MWh and 23.72 $/MWh, respectively, and those of the testing data in the winter scenario and the summer scenario are 22.63 $/MWh and 23.40 $/MWh, respectively. Note that a zero-order bid response function takes only information on the time as well as the bid when predicting the WEM clearing results. Both the mean and standard deviation of the absolute error decrease as the order of the response function increases. Yet, the decrease is not significant when the order is greater than 1 in both scenarios. Therefore, an appropriate order of the bid response function for this case would be n1=1.
The neural network adopted for the price-based demand response function is similar to that for the bid response function except that the number of neurons in each hidden layer is 256 and the scale of the L2 regularizer is 0.001. The neural network is trained with a learning rate of 0.0002 for 20000 steps. Table II shows the mean and standard deviation of the absolute error in the aggregate energy consumption under different orders of the price response function. The mean aggregate energy consumptions in the training data in the winter and summer scenarios are 40.75 MW and 50.45 MW, respectively, and those of the testing data in the winter and summer scenarios are 38.16 MW and 47.08 MW, respectively. A zero-order price response function takes only information on the time and the price when predicting the aggregate energy consumption. Similar to the argument made for the bid response function, an appropriate order for the price response function would be n2=1.
We emphasize that the appropriate order of the response functions may vary from case to case, and needs to be determined from the historical data following the procedures presented here. Based on the learned response functions, the state is st=(λt−1, qt−1, dt−2, νt−2, dt−1, νt−1, t mod T).
The pricing policy network and the critic network each have 2 hidden layers, each with 128 neurons. ReLU is used as the activation function for all hidden layers. The output layer of the pricing policy network adopts the tanh function as the activation function, while that of the critic network does not use any activation function. An L2 regularizer with a scale of 0.01 is used for the critic network. The learning rates for the pricing policy network and the critic network are 0.0001 and 0.001, respectively. Note that the bidding policy network essentially sets the bid price to the retail energy price and the bid quantity to the estimated aggregate energy consumption obtained using the price response function. Therefore, no parameter for the bidding part needs to be trained. The minimum price is 20 $/MWh and the maximum price is 40 $/MWh. The update rate for the target networks is 0.001. The size of a mini-batch is chosen to be 64. The discount rate is 0.9. The policy is trained over 200 episodes.
The test results are given in
The wholesale and retail energy prices under RL policy during a typical day are shown in
Referring to
As discussed earlier, the consideration of the long-term behavior is beneficial, compared to the myopic decision making, in which no future rewards are taken into account. To illustrate this, we compare the cumulative rewards under the RL policy with γ=0.9 and those under a myopic policy, or equivalently, the RL policy with γ=0.
As shown in
The computing device 1000A can include a power source 1008, a processor 1009, a memory 1010, and a storage device 1011, all connected to a bus 1050. Further, a high-speed interface 1012, a low-speed interface 1013, high-speed expansion ports 1014, and low-speed connection ports 1015 can be connected to the bus 1050. Also, a low-speed expansion port 1016 is in connection with the bus 1050. Contemplated are various component configurations that may be mounted on a common motherboard, by non-limiting example, 1030, depending upon the specific application. Further still, an input interface 1017 can be connected via bus 1050 to an external receiver 1006 and an output interface 1018. A receiver 1019 can be connected to an external transmitter 1007 and a transmitter 1020 via the bus 1050. Also connected to the bus 1050 can be an external memory 1004, external sensors 1003, machine(s) 1002, and an environment 1001. Further, one or more external input/output devices 1005 can be connected to the bus 1050. A network interface controller (NIC) 1021 can be adapted to connect through the bus 1050 to a network 1022, wherein data, among other things, can be rendered on a third-party display device, third-party imaging device, and/or third-party printing device outside of the computing device 1000A.
Contemplated is that the memory 1010 can store instructions that are executable by the computer device 1000A, historical data, and any data that can be utilized by the methods and systems of the present disclosure. The memory 1010 can include random access memory (RAM), read only memory (ROM), flash memory, or any other suitable memory systems. The memory 1010 can be a volatile memory unit or units, and/or a non-volatile memory unit or units. The memory 1010 may also be another form of computer-readable medium, such as a magnetic or optical disk.
Still referring to
The system can be linked through the bus 1050 optionally to a display interface or user Interface (HMI) 1023 adapted to connect the system to a display device 1025 and keyboard 1024, wherein the display device 1025 can include a computer monitor, camera, television, projector, or mobile device, among others.
Still referring to
The high-speed interface 1012 manages bandwidth-intensive operations for the computing device 1000A, while the low-speed interface 1013 manages lower bandwidth-intensive operations. Such allocation of functions is an example only. In some implementations, the high-speed interface 1012 can be coupled to the memory 1010, a user interface (HMI) 1023, and to a keyboard 1024 and display 1025 (e.g., through a graphics processor or accelerator), and to the high-speed expansion ports 1014, which may accept various expansion cards (not shown) via bus 1050. In some implementations, the low-speed interface 1013 is coupled to the storage device 1011 and the low-speed expansion port 1015, via bus 1050. The low-speed expansion port 1015, which may include various communication ports (e.g., USB, Bluetooth, Ethernet, wireless Ethernet), may be coupled to one or more input/output devices 1005, and to other devices such as a keyboard 1024, a pointing device (not shown), a scanner (not shown), or a networking device such as a switch or router, e.g., through a network adapter.
Still referring to
For example, as noted above, the end-use consumers (EUC) can use computing devices 1046, 1048, to request electricity from their load serving entity (LSE) based on their current electricity needs, i.e. electricity to power electrical devices. The computing devices 1046, 1048, can be transactive controllers or active controllers, which can be used to request the electricity from the LSE, i.e. transactive controllers are capable of transmitting bids to the LSE, whereas the active controllers are used to control equipment not capable of computing and transmitting bids to the LSE, but that can benefit from adaptive control strategies. The EUC inputs their amount of electricity needed, for example, through a web site that transmits the EUC's requests over the Internet to the central computer 1042 used by the LSE to allocate the electricity. In such instances, the requests can be computed and transmitted by executing computer-executable instructions stored in non-transitory computer-readable media (e.g., memory or storage). The electricity requests include a quantity of the electricity needed and a requested price. It is possible that the central computer 1042 can receive electricity bids from those computing devices associated with EUCs 1046, 1048, and receive electricity offers from computing devices 1044, 1050 and 1052 associated with electricity/power producers.
Still referring to
It is contemplated the hardware processor 1054A can include two or more hardware processors depending upon the requirements of the specific application, wherein the processors can be either internal or external. Certainly, other components may be incorporated with system 1000B, including output interfaces and transceivers, among other devices.
It is possible the network 1049 can include, by non-limiting example, one or more local area networks (LANs) and/or wide area networks (WANs), wherein the networking environments can be similar to enterprise-wide computer networks, intranets, and the Internet. Contemplated for all the components mentioned is that there can be any number of client devices, storage components, and data sources employed within the system 1000B. Each may comprise a single device or multiple devices cooperating in a distributed environment. Further, system 1000B can include one or more data source(s) (not shown). The data source(s) can comprise data resources for training neural networks to express bid and price response functions. The data provided by the data source(s) may include historical wholesale bids and cleared prices and quantities, and historical retail energy prices and aggregate energy consumptions.
The present disclosure improves the existing technology and technological field, for example, the fields of electrical power grid management and electrical device control using the transactive controllers. For example, the computing hardware activates and deactivates the electrical device based on the comparison of the submitted offer amount to the retail price. Specifically, the components of the systems and methods of the present disclosure are meaningfully applied to improve the control of end-use electrical devices using the transactive computing devices associated with the electrical devices, which in turn improves the electrical power grid management. Further, the steps of the systems and methods of the present disclosure are performed by computing hardware associated with the electrical device.
Features
According to aspects of the present disclosure, the user selected desired operating level can be selected from a first user desired operating level and a second user desired operating level, wherein the second user desired operating level is representative of the user choosing to pay more to attain a desired operating level for the electrical device compared to the first user desired operating level.
Another aspect of the present disclosure can include that the LRAM is a wholesale electricity market (WEM) operated by an independent system operator (ISO), and that the user, of the user selected desired operating level, is an end use customer (EUC) consumer of the electricity in the REM, wherein the offer amount and the retail price are utilized by a load serving entity (LSE). Further, an aspect can be that the LRAM is a real-time electricity market or a day-ahead electricity market.
Another aspect of the present disclosure can include that the electrical device is one of an air-conditioning unit, heating unit, hot water heater, refrigerator, dish washer, washing machine, dryer, oven, microwave oven, pump, home lighting system, electric vehicle charger, or one or more commercial electrical systems or home electrical systems. Further, an aspect can include that the current environmental data includes environmental data for the user location, as well as forecasted environmental data for the user location for the upcoming time interval.
It is possible that an aspect can be that the stored historical energy futures market data includes past energy futures market information and past LRAM information, and wherein the computing of the offer amount is performed based at least in part on offer amount information from the past energy futures market information, and at least in part on offer amount information from the past LRAM information; wherein the offer amount information from the energy futures market information comprises offer information from a fixed window of time from a real-time electricity market, and wherein the offer amount information from the LRAM information comprises offer amount information for a rolling window of time.
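The fixed- and rolling-window selections described above can be sketched as simple slicing operations. This is a minimal illustration only; the list-based histories and the window parameters below are hypothetical stand-ins for the stored market data.

```python
# Hypothetical sketch: histories are flat lists of past offer amounts,
# oldest first; only the two windowing conventions are illustrated.

def fixed_window(history, start, length):
    # fixed window of time, e.g., from the real-time electricity market
    return history[start:start + length]

def rolling_window(history, length):
    # rolling window: always the most recent `length` entries
    return history[-length:]

futures_history = [31.0, 29.5, 30.2, 28.8, 27.9, 30.5]
lram_history = [25.0, 26.1, 24.8, 27.3, 26.6, 25.9]

futures_info = fixed_window(futures_history, 0, 4)  # first four entries
lram_info = rolling_window(lram_history, 3)         # latest three entries
```

The fixed window always reads the same slice of the record, while the rolling window shifts forward as new intervals are appended to the history.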
Another aspect can include that the offer amount and the retail price are utilized by a load serving entity (LSE), and computed jointly by maximizing the LSE's expectation of future profits starting from the upcoming time interval, subject to at least one energy balance constraint, with a future profit of the upcoming time interval determined based on a difference between the retail price and the cleared LRAM price, an amount of electricity consumed by the user for the upcoming time interval, and a cleared quantity of electricity corresponding with the cleared price for electricity from the LRAM. It is possible an aspect can include that the amount of electricity consumed by the user is a dynamical demand response function of retail prices and amounts of electricity consumed by the user at time intervals prior to the upcoming time interval, and of the computed retail price for the upcoming time interval. Further, an aspect can be that the dynamical demand response function is learned using a supervised learning approach by a multi-layer feedforward neural network, when a finite number of previous time intervals is used, or a recurrent neural network (RNN) or a long short-term memory (LSTM) unit network, when all available previous time intervals are used.
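The supervised learning of a dynamical demand response function can be sketched as follows. This is a minimal, dependency-free stand-in, not the disclosed method: a single linear model with one price lag and one consumption lag replaces the multi-layer feedforward network, and the coefficients, learning rate, and synthetic data generator are all hypothetical.

```python
import random

# Hypothetical linear demand response with one lag:
#   d_t ≈ w0*p_t + w1*p_{t-1} + w2*d_{t-1} + b
def predict(w, b, p_t, p_prev, d_prev):
    return w[0] * p_t + w[1] * p_prev + w[2] * d_prev + b

def train(samples, lr=0.01, epochs=500):
    # supervised learning by stochastic gradient descent on squared error
    w, b = [0.0, 0.0, 0.0], 0.0
    for _ in range(epochs):
        for p_t, p_prev, d_prev, d_t in samples:
            err = predict(w, b, p_t, p_prev, d_prev) - d_t
            w[0] -= lr * err * p_t
            w[1] -= lr * err * p_prev
            w[2] -= lr * err * d_prev
            b -= lr * err
    return w, b

# Synthetic history: demand falls with price and persists over time.
random.seed(0)
samples = []
p_prev, d_prev = 1.0, 5.0
for _ in range(200):
    p_t = random.uniform(0.5, 2.0)
    d_t = 6.0 - 2.0 * p_t + 0.3 * d_prev
    samples.append((p_t, p_prev, d_prev, d_t))
    p_prev, d_prev = p_t, d_t

w, b = train(samples)
# With the learned weights, predict demand at retail price 1.0, d_prev = 5.0
demo = predict(w, b, 1.0, 1.0, 5.0)
```

A feedforward network would replace `predict` with a stack of nonlinear layers, and an RNN or LSTM would replace the fixed lag with an internal state summarizing all previous intervals.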
It is possible an aspect can include that the cleared price and the cleared quantity for electricity from the LRAM are a dynamical bid response function of the cleared prices and quantities for electricity by the LRAM at time intervals prior to the upcoming time interval, and of the offer amount to the LRAM for the upcoming time interval. Wherein the dynamical bid response function is learned using a supervised learning approach by a multi-layer feedforward neural network, when a finite number of previous time intervals is used, or a recurrent neural network (RNN) or a long short-term memory (LSTM) unit network, when all available previous time intervals are used.
Further, another aspect can be that the offer amount and the retail price of a load serving entity (LSE) are determined jointly: the retail price is first computed using a pricing policy based on previous state information, and the offer amount is then computed using a bidding policy based on the previous state information and the computed retail price; wherein the previous state information includes LSE offer amounts, LRAM cleared prices and quantities, amounts of electricity consumed by the user, and retail prices for all time intervals prior to the upcoming time interval.
Another aspect can include that the offer amount and the retail price of a load serving entity (LSE) are determined jointly: the offer amount is computed using a bidding policy based on previous state information, and the retail price is then computed using a pricing policy based on the previous state information, the offer amount, and the cleared price and quantity; wherein the previous state information includes LSE offer amounts, LRAM cleared prices and quantities, amounts of electricity consumed by the user, and retail prices for all time intervals prior to the upcoming time interval.
Another aspect can include that the retail price of a load serving entity (LSE) is computed by a pricing policy based on current state information, where the state information includes past individualized user selected desired operating levels by the user at past corresponding time intervals to the upcoming time interval, and past individualized LSE retail pricing data for electricity in a retail electricity market (REM) at past corresponding time intervals to the upcoming time interval, and past cleared pricing data for electricity from the LRAM at past corresponding time intervals to the upcoming time interval.
It is possible an aspect can include that the offer amount and the retail price are computed jointly by formulating a Markov decision process, solved using a deep deterministic policy gradient approach with an actor-critic structure, wherein an actor is implemented by neural networks to determine a candidate offer amount and a candidate retail price, and a critic is implemented by neural networks to evaluate the performance of the candidate offer amount and the candidate retail price, in order to adjust the parameters of the neural networks for improving performance. Wherein the actor includes a pricing policy network, a bidding policy network, and a pricing policy target network, and the critic includes a critic network and a critic target network, wherein the pricing policy network is first used to compute a retail price, then the bidding policy network is used to compute an offer amount with the computed retail price, to improve the overall profit earned by the LSE.
Further still, wherein the actor includes a pricing policy network, a bidding policy network, a pricing policy target network, and a bidding policy target network, and the critic includes a critic network and a critic target network; wherein the bidding policy network is first used to compute an offer amount, then the pricing policy network is used to compute a retail price with the cleared prices and quantities from the LRAM corresponding to the computed offer amount, to improve the overall profit earned by the LSE.
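The two actor orderings above can be sketched structurally. This is a control-flow illustration only: the "networks" are stand-in linear callables with hypothetical coefficients, the market response is a stub, and no training, target networks, or critic updates are shown.

```python
def pricing_policy(state):
    # stand-in pricing network: retail price from previous state information
    return 0.5 + 0.1 * state["last_cleared_price"]

def bidding_policy_sync(state, retail_price):
    # stand-in bidding network conditioned on the already-computed retail price
    return 0.8 * retail_price

def bidding_policy_async(state):
    # stand-in bidding network from previous state information alone
    return 0.7 * state["last_cleared_price"]

def clear_market(offer):
    # stub for the LRAM's bid response to the submitted offer
    return {"price": 0.9 * offer, "quantity": 10.0}

def pricing_policy_async(state, cleared):
    # stand-in pricing network conditioned on the cleared price and quantity
    return 1.2 * cleared["price"]

state = {"last_cleared_price": 1.0}

# Ordering 1: pricing network first, then bidding network given the price.
retail_price_1 = pricing_policy(state)
offer_1 = bidding_policy_sync(state, retail_price_1)

# Ordering 2: bidding network first, then pricing network given the
# cleared prices and quantities corresponding to the offer.
offer_2 = bidding_policy_async(state)
retail_price_2 = pricing_policy_async(state, clear_market(offer_2))
```

In the full approach, each callable would be a trained neural network, and a critic network would score each (offer, price) candidate to drive the gradient updates.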
An aspect can include computing the offer amount is based on multiple factors for the upcoming time interval, including the user selected desired operating level, the current environmental data, and the stored historical energy futures market data used to determine inter-temporal correlation behaviors of past offer amounts to a local resource allocation market (LRAM) and the past clearing pricing for electricity by the LRAM, to obtain the offer amount.
Processor, by non-limiting example, as stated in claim 1, can be computer hardware, i.e., logic circuitry that responds to and processes the basic instructions that drive a computer, to implement the algorithm described in the present disclosure.
User selected desired operating level of an electrical device, includes, by non-limiting example, a user deciding upon an operating level, i.e., an amount of electricity, according to the user's sensed feeling, such as a cold feeling or a hot feeling. Wherein the user changes an operation of a device, such as a heating device or the like, a cooling device or the like, or both, to a user desired operating level according to the user's specific desire in accordance with temperature, humidity, and the like. As the retail energy price changes, the user may change the operating level accordingly to maximize the benefit. The operating level for the upcoming time interval can be determined using the price response function with the given retail energy price for the upcoming time interval.
Upcoming time interval is a time interval in the future as opposed to a current time interval, which is at the current moment in time, or to a past time interval which is a time before the current time interval.
Computing an offer amount refers to determining a value, i.e., a price, at which electricity is available to be supplied to operate an electrical device for an upcoming time interval at a user selected desired operating level, i.e., an amount of electricity. The offer amount is included in the wholesale bid as a bid price along with a bid quantity, i.e., the user selected desired operating level in the Specification. By non-limiting example, the computing an offer amount can be found in Section headings of the Specification labeled “computing wholesale bid”; however, some Sections may not be labeled. The computing an offer amount is jointly implemented with the computing a retail price within the deep reinforcement learning process as described in Algorithm 1, or in Algorithm 2, depending on which synchronization mechanism is used between the wholesale market actions and the retail market actions.
During real-time application, when the synchronized action mechanism is used between the wholesale market and the retail market, the computing of an offer amount and the computing of a retail price for the upcoming time interval are achieved through the following consecutive steps:
Similarly, when the asynchronized action mechanism is used between the wholesale market and the retail market, the computing of an offer amount and the computing of a retail price for the upcoming time interval are achieved using the following consecutive steps:
Computing a retail price refers to determining a retail price of electricity for operating the electrical device. By non-limiting example, the computing of a retail price can be found in Section headings of the Specification labeled “computing retail price”; however, some Sections may not be labeled. The computing a retail price is jointly implemented with the computing an offer amount within the deep reinforcement learning process described in Algorithm 1, or Algorithm 2, depending on which action synchronization mechanism is used between the wholesale market and the retail market. Detailed steps can be found in the above paragraphs on computing an offer amount.
Clearing pricing for electricity by a local resource allocation market (LRAM) is a price cleared by an operator, which is the price used to assist in computing the retail price. The clearing pricing for electricity is determined along with a cleared quantity for electricity that is purchased from the wholesale electricity market and then resold in the retail electricity market. For example, electricity (both power and energy) is a commodity capable of being bought, sold, and traded. An electricity market is a system enabling purchases, through bids to buy; sales, through offers to sell; and short-term trades, generally in the form of financial or obligation swaps. Bids and offers use supply and demand principles to set the price. Wholesale transactions (bids and offers) in electricity are typically cleared and settled by the market operator or a special-purpose independent entity charged exclusively with that function. Grid operators do not clear trades but often require knowledge of the trade in order to maintain generation and load balance. For example, market clearing can begin with the organizing of both buying and selling components, wherein buyers can be organized from highest price to lowest price, and sellers can be organized from lowest price to highest price. Then, one approach may be that buyer and seller curves are created by an accumulating sum of the quantities associated with these organized prices, wherein the curves can be implemented as computer-usable representations of the curves that are computed and stored once the necessary input data is received. The representations of the curves can include groups of values or other data elements or structures. The two sorted curves can then be overlaid or otherwise analyzed to determine an intersection between the curves. In general, the market clears at the intersection of the buying and selling curves of the market.
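The clearing procedure described above can be sketched as follows: buyers sorted from highest to lowest price, sellers from lowest to highest, and trades accumulated until the curves cross. This is one simple convention among several; in particular, the midpoint pricing rule used here is an assumption, not the disclosed method.

```python
def clear(bids, offers):
    """bids/offers: lists of (price, quantity). Returns (price, quantity)."""
    bids = sorted(bids, key=lambda x: -x[0])     # buyers: high to low
    offers = sorted(offers, key=lambda x: x[0])  # sellers: low to high
    i = j = 0
    bq = bids[0][1] if bids else 0.0
    sq = offers[0][1] if offers else 0.0
    qty, price = 0.0, None
    # trade while the best remaining bid still meets the best remaining offer
    while i < len(bids) and j < len(offers) and bids[i][0] >= offers[j][0]:
        traded = min(bq, sq)
        qty += traded
        # midpoint pricing convention (hypothetical); real markets apply
        # their own uniform clearing rules
        price = (bids[i][0] + offers[j][0]) / 2.0
        bq -= traded
        sq -= traded
        if bq == 0.0:
            i += 1
            bq = bids[i][1] if i < len(bids) else 0.0
        if sq == 0.0:
            j += 1
            sq = offers[j][1] if j < len(offers) else 0.0
    return price, qty

# Example: three buyers and three sellers, 5 units each.
cleared_price, cleared_qty = clear(
    bids=[(10.0, 5.0), (8.0, 5.0), (6.0, 5.0)],
    offers=[(4.0, 5.0), (7.0, 5.0), (9.0, 5.0)],
)
```

In this example the third buyer (bid 6) cannot meet the third seller (offer 9), so the market clears after two trades of 5 units each.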
The commodities within an electric market generally consist of two types: power and energy. Power is the metered net electrical transfer rate at any given moment and is measured in megawatts (MW). Energy is electricity that flows through a metered point for a given time interval and is measured in megawatt-hours (MWh). Markets for energy-related commodities trade net generation output for a number of intervals usually in increments of 5, 15 and 60 minutes.
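The power-to-energy relationship above is direct arithmetic: energy in MWh equals power in MW multiplied by the interval length in hours. A minimal sketch, using the interval increments mentioned above:

```python
def interval_energy_mwh(power_mw, interval_minutes):
    # Energy (MWh) transferred at a constant power (MW) over an interval.
    return power_mw * interval_minutes / 60.0

# 120 MW held over a 15-minute interval.
energy_15min = interval_energy_mwh(120.0, 15)
```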
Past clearing pricing for electricity by the LRAM are clearing pricing for electricity from past time intervals, i.e., historical clearing pricing for electricity.
Clearing pricing for electricity in the REM is the retail price cleared in the REM and the user is charged with this price for the actual energy consumption to operate the electrical device at desired operating levels.
Past clearing pricing for electricity in the REM are clearing pricing for electricity in the REM from past time intervals, i.e., historical clearing pricing for electricity in the REM.
Inter-temporal correlation behaviors of offer amounts to a local resource allocation market (LRAM) and cleared prices and quantities for electricity by the LRAM refers to a time-dependent relationship between offer amounts and cleared prices and quantities for electricity. For example, the cleared prices and quantities for electricity for a given time interval depend not only on the offer amount for the given time interval, but also on the offer amounts and the cleared prices and quantities for electricity at time intervals previous to the given time interval.
Inter-temporal correlation behaviors of user selected desired operating levels by the user, pricing for electricity in the REM, and the clearing pricing for electricity by the LRAM refers to a time-dependent relationship among user selected desired operating levels by the user, pricing for electricity in the REM, and the clearing pricing for electricity by the LRAM. For example, the pricing for electricity in the REM for a given time interval is related to the clearing pricing for electricity by the LRAM at the given time interval, and to the user selected desired operating levels, the clearing pricing for electricity by the LRAM, and the pricing for electricity in the REM at previous time intervals.
Energy balance constraint, by non-limiting example, refers to the amount of electricity/energy purchased from the wholesale electricity market must be equal to the amount of electricity/energy sold to the retail electricity market, and any mismatches must be reduced by adjusting the aggregate power consumption of EUCs, or charged with additional costs. The adjusting of aggregate power consumption can be achieved through adjusting the user operating levels of electrical devices, shifting the energy usage from one time interval to another time interval, adjusting charging and discharging statuses of storage owned by the LSE, or adjusting operating levels of distributed energy resources owned by the LSE.
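The energy-balance bookkeeping described above can be sketched as follows: any mismatch between the wholesale purchase and the retail sales is first reduced by adjusting aggregate EUC consumption up to some limit, and the remainder incurs an additional cost. The adjustment limit and penalty rate are hypothetical parameters for illustration.

```python
def settle(purchased_mwh, sold_mwh, max_adjust_mwh, penalty_per_mwh):
    """Return (consumption adjustment in MWh, penalty cost for the residual)."""
    mismatch = sold_mwh - purchased_mwh
    # reduce the mismatch by adjusting aggregate consumption, within limits
    adjustment = max(-max_adjust_mwh, min(max_adjust_mwh, mismatch))
    residual = mismatch - adjustment
    # any remaining imbalance is charged with additional costs
    return adjustment, abs(residual) * penalty_per_mwh

# Example: bought 100 MWh, sold 108 MWh, can shift at most 5 MWh of load,
# residual imbalance penalized at 2 (currency units) per MWh.
adjustment, penalty = settle(100.0, 108.0, 5.0, 2.0)
```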
Comparing the submitted offer amount to the retail price, by non-limiting example, can include determining if there is a difference between the submitted offer amount and the retail price. If so, an adjustment of the user operating level is required to maximize the user's benefit based on the retail price.
Activating or deactivating the electrical device can be carried out based on the results of comparing the submitted offer amount to the retail price, to adjust the actual user operating level. The activating and deactivating of the electrical device can include activating the electrical device by supplying electricity, deactivating the electrical device by not supplying electricity, or activating/deactivating some components of the electrical device. The user desired operating level can be determined by using the dynamical demand response function with the given retail price. For example, the user may have an electrical device consisting of multiple electric heaters. This user can adjust the user operating level by switching some heaters on or off to match the amount of electricity determined by the demand response function corresponding to the retail price.
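The heater example above can be sketched as follows: a demand response function maps the retail price to a target consumption, and heaters are switched on until the target is matched as closely as possible. The linear demand response coefficients and the per-heater rating are hypothetical values for illustration.

```python
HEATER_KW = 1.5  # hypothetical rating of one electric heater

def target_kw(retail_price):
    # hypothetical linear demand response: consumption falls as price rises
    return max(0.0, 9.0 - 4.0 * retail_price)

def heaters_on(retail_price, n_heaters):
    # switch on the number of heaters that best matches the target consumption
    want = target_kw(retail_price)
    return min(n_heaters, round(want / HEATER_KW))

# At a retail price of 1.0 the target is 5.0 kW, matched by 3 of 6 heaters.
heaters_running = heaters_on(1.0, 6)
```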
The following description provides exemplary embodiments only, and is not intended to limit the scope, applicability, or configuration of the disclosure. Rather, the following description of the exemplary embodiments will provide those skilled in the art with an enabling description for implementing one or more exemplary embodiments. Contemplated are various changes that may be made in the function and arrangement of elements without departing from the spirit and scope of the subject matter disclosed as set forth in the appended claims.
Specific details are given in the following description to provide a thorough understanding of the embodiments. However, it is understood by one of ordinary skill in the art that the embodiments may be practiced without these specific details. For example, systems, processes, and other elements in the subject matter disclosed may be shown as components in block diagram form in order not to obscure the embodiments in unnecessary detail. In other instances, well-known processes, structures, and techniques may be shown without unnecessary detail in order to avoid obscuring the embodiments. Further, like reference numbers and designations in the various drawings indicate like elements.
Also, individual embodiments may be described as a process which is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged. A process may be terminated when its operations are completed, but may have additional steps not discussed or included in a figure. Furthermore, not all operations in any particularly described process may occur in all embodiments. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, the function's termination can correspond to a return of the function to the calling function or the main function.
Furthermore, embodiments of the subject matter disclosed may be implemented, at least in part, either manually or automatically. Manual or automatic implementations may be executed, or at least assisted, through the use of machines, hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments to perform the necessary tasks may be stored in a machine readable medium. A processor(s) may perform the necessary tasks.
Further, embodiments of the present disclosure and the functional operations described in this specification can be implemented in digital electronic circuitry, in tangibly-embodied computer software or firmware, in computer hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Further, some embodiments of the present disclosure can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non-transitory program carrier for execution by, or to control the operation of, data processing apparatus. Further still, program instructions can be encoded on an artificially generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus. The computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
According to embodiments of the present disclosure the term “data processing apparatus” can encompass all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers.
A computer program (which may also be referred to or described as a program, software, a software application, a module, a software module, a script, or code) can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program may, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data, e.g., one or more scripts stored in a markup language document, in a single file dedicated to the program in question, or in multiple coordinated files, e.g., files that store one or more modules, sub programs, or portions of code. A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network. Computers suitable for the execution of a computer program can, by way of example, be based on general or special purpose microprocessors or both, or any other kind of central processing unit. Generally, a central processing unit will receive instructions and data from a read only memory or a random access memory or both. The essential elements of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto optical disks, or optical disks. However, a computer need not have such devices.
Moreover, a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device, e.g., a universal serial bus (USB) flash drive, to name just a few.
Although the present disclosure has been described with reference to certain preferred embodiments, it is to be understood that various other adaptations and modifications can be made within the spirit and scope of the present disclosure. Therefore, it is the intent of the appended claims to cover all such variations and modifications as come within the true spirit and scope of the present disclosure.