SMART CHARGE SCHEDULING FOR AN AGGREGATE OF ELECTRIC VEHICLES CONSIDERING GRID DEMAND

TECHNICAL FIELD

This disclosure relates to controlling charging of individual electric vehicles (EVs) based on charge scheduling considering vehicle settings in addition to aggregate electric demand and cost of electric grid power.

BACKGROUND

An ongoing shift toward electrification and the adoption of EVs has raised concern among electric utility companies with respect to the large loads of energy drawn from the grid for sustained periods of time to charge these vehicles. Charging demands from EVs (averaging around 16 kWh/EV/day) can strain the electric grid such that utility companies are implementing various demand management strategies. EV sales in the United States alone increased by 85% from 2020 to 2021 according to the US Department of Energy. This rapid growth poses challenges to both charging service providers (charging platforms) and grid operators, as a large-scale EV charging market leads to a highly random and significant load on the power grid. This problem can be mitigated by: 1) ramping up and down the energy output generated by dispatchable power plants (which generally use non-renewable sources of energy); or 2) scheduling the charging of EVs by a platform to coordinate with the grid operators and renewable energy providers. The former approach requires significant capital investment in building new infrastructure. On the other hand, the latter approach is more straightforward to implement due to the inherent flexibility of the EV charging process and widespread availability of smart chargers. For example, charging one EV may take up to two hours, but the customer may park the EV in a charging station overnight, allowing for slower, delayed, or preemptive charging. Furthermore, customers may opt for a flexible charging service that allows the charging provider to charge the EV between a minimum target state of charge (SoC) and a maximum target SoC.

Scheduling algorithms for EV charging have been developed under various assumptions and with specific goals. Typically, in a day-ahead market, the platform has complete information about the future demand, and thus, the charging process can be scheduled offline by a deterministic algorithm. For instance, various algorithms have been developed to solve valley-filling problems that shave the difference between the charging loads and grid capacity. Another set of algorithms optimizes the profits, costs, or social welfare of charging through a deterministic optimization. Other strategies employ game theoretic approaches.

However, current algorithms either do not exploit information that is available from past data or are too computationally complex to schedule a large number of EVs. For instance, online algorithms that schedule according to a departure deadline or laxity of charging, i.e., Earliest Deadline First (EDF) and Least Laxity First (LLF) algorithms, do not incorporate the information about future demand that can be inferred from past data, and thus are sub-optimal in many scenarios.
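As a point of reference for the deadline-based and laxity-based baselines mentioned above, the following minimal Python sketch orders plugged-in EVs by departure deadline (EDF) and by laxity (LLF). The dictionary fields, the constant charging rate, and the definition of laxity as time to departure minus remaining charging time are illustrative assumptions rather than details of any particular published algorithm.

```python
def laxity(now, departure, remaining_kwh, rate_kw):
    """Slack time (hours): time left until departure minus time still needed to charge."""
    return (departure - now) - remaining_kwh / rate_kw


def edf_order(evs):
    """Earliest Deadline First: serve the EV that departs soonest."""
    return [ev["id"] for ev in sorted(evs, key=lambda ev: ev["departure"])]


def llf_order(evs, now, rate_kw):
    """Least Laxity First: serve the EV with the least slack."""
    key = lambda ev: laxity(now, ev["departure"], ev["remaining_kwh"], rate_kw)
    return [ev["id"] for ev in sorted(evs, key=key)]


# Hypothetical EVs; times are in hours, energy in kWh.
evs = [
    {"id": "a", "departure": 8, "remaining_kwh": 14.0},
    {"id": "b", "departure": 6, "remaining_kwh": 7.0},
    {"id": "c", "departure": 7, "remaining_kwh": 35.0},
]
edf = edf_order(evs)
llf = llf_order(evs, now=0, rate_kw=7.0)
```

Neither ordering uses any forecast of future arrivals, which is the limitation noted in the preceding paragraph.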

If the distribution of the future demand is known, then the charging platform can apply Model Predictive Control (MPC), scenario-based algorithms, or other stochastic optimization methods to optimize the charging schedule. For instance, MPC has been used to maximize charging profits for each EV or to track a specified demand trajectory. Nonetheless, these algorithms schedule the charging processes either with integer programming or by using the dynamics of individual EVs. This leads to an increase in the computational complexity (potentially exponential) as the number of EVs in the market grows, with corresponding challenges to real-time implementation in a large-scale EV charging market. This intractability is also present in recently proposed data-driven, reinforcement-learning-based scheduling algorithms due to the very high number of past samples needed for learning an approximately optimal policy.

In addition to the limitation stated above, these algorithms consider only a single type of demand. Existing multi-type demand strategies mainly focus on the types or levels of charging rates among the EVs. For instance, Bayram et al. applied a multi-class queue network to model the charging services with different charging rates. The goal is to optimize the quality of the service and the charging cost by tuning the prices of charging that affect the demand rate. Kong et al. also used the queue network framework to allocate appropriate chargers to different types of EVs. Khalkhali et al. proposed a two-stage algorithm that schedules EV charging with slow/fast charging services to minimize the expected charging costs.

SUMMARY

The present inventors have recognized that controlling charging of EVs with a focus on multiple types of flexibility of charging demand benefits both the customers and the charging platform in that the customers can choose a lower charging price in exchange for platform flexibility of not charging to their specified target SoC. This provides the platform more flexibility to drop the aggregated charging demand during the peak-hours, which can reduce the charging costs to the platform. As such, in one or more embodiments according to the disclosure, control of charging individual EVs is performed based on preemptive scheduling of charging a large number of EVs with electric grid services using a stochastic dynamic program with a state-dependent action constraint.

In one or more embodiments, control of charging an aggregate of EVs is based on use of approximate dynamic programming (ADP) to compute a scheduling algorithm for the charging of the EVs that maximizes the profit of the EV charging platform. This algorithm facilitates a mix of inflexible and flexible charging demand types employing a multi-stage algorithm that efficiently solves the high dimensional scheduling problem, along with the complexity and optimality analysis. Each EV that arrives at a charging station is assigned a category depending on its arrival/departure time and initial/target state of charge (SoC). This categorization allows scheduling and control of the charging process for a large number (on the order of millions) of EVs because the computational complexity depends only on the number of categories, rather than the number of EVs. In addition, the system and method allow the customer to specify a flexible charging demand with a minimum target SoC and a maximum target SoC. While this additional flexibility adds an extra dimension in the state and action space that would otherwise lead to at least O(L²) time complexity, where L is the number of classes, various embodiments employ a multi-stage algorithm that sequentially solves the scheduling problem to reduce the time complexity to O(L). The sufficient condition for the multi-stage algorithm to be optimal is also described.

Embodiments may include a method for controlling charging of multiple electric vehicles (EVs) arriving at, and departing from, different charging stations at different times, comprising, by one or more processors: scheduling charging of each EV of the multiple EVs responsive to which one of a plurality of categories each EV is assigned, each EV assigned to one of the categories according to an arrival time at an associated one of the different charging stations, a departure time from the associated one of the different charging stations, an initial state of charge (SoC) of the EV, and a target SoC of the EV; and controlling charging of each EV responsive to the scheduling of charging for the assigned category of each EV. The method may further include assigning each EV to one of the categories according to one of a plurality of charging demand types designated by the EV. The plurality of charging demand types may include a reliable charging demand and a flexible charging demand. For EVs designating the flexible charging demand, the scheduling may be responsive to a specified minimum target SoC and a specified maximum target SoC of the EV.

In various embodiments, scheduling charging of each EV includes scheduling charging of EVs designating the reliable charging demand assuming the EVs designating the reliable charging demand will consume all available electricity during a specified time period based on an associated grid upper demand band for the specified period, and scheduling charging of EVs designating the flexible charging demand based on the specified minimum target SoC of the EVs designating the flexible charging demand. Scheduling charging of each EV may also include allocating available electricity from an electrical grid to each of the plurality of categories based on a number of EVs in each category arriving at the charging stations during a designated time period and a cost associated with allocated available electricity, the allocated available electricity limited by a minimum of remaining electricity available for each category and charging capacity of the multiple EVs. In various embodiments, the cost associated with the allocated available electricity corresponds to a net cost corresponding to cost of electricity supplied by an electric grid less a cost associated with a penalty corresponding to the allocated available electricity corresponding to EVs designating the reliable charging demand being insufficient to charge the EVs designating the reliable charging demand to associated target SoCs. The cost associated with the allocated available electricity may correspond to a net cost corresponding to cost of electricity supplied by an electric grid less a cost associated with a penalty corresponding to the allocated available electricity corresponding to EVs designating the flexible charging demand being insufficient to charge the EVs designating the flexible charging demand to associated minimum target SoCs.

In at least one embodiment, scheduling charging of each EV includes determining a number of EVs in each category designating a reliable charging demand and arriving at a specified time using a designated statistical distribution, allocating electricity from the grid available for charging to each category designating the reliable charging demand for a second specified time period, associating electricity cost of electricity allocated to each category designating a reliable charging demand, and allocating any electricity available after satisfying the reliable charging demand for the second specified time period to EVs designating the flexible charging demand. The scheduling may further include limiting allocation of electricity from the grid available for charging to each category designating the reliable charging demand to a minimum of remaining electricity available from the grid and charging capacity of the EVs designating the reliable charging demand.

Embodiments may also include a computer-implemented method for controlling charging of a large number of electric vehicles (EVs) arriving and departing different charging stations at different times, the method comprising, by one or more computers: assigning one of a plurality of categories to each EV that arrives at a charging station depending on arrival time to the charging station, designated departure time from the charging station, initial state of charge (SoC) of the EV, target SoC of the EV, and a charging demand type specified by the EV; scheduling charging of each EV according to which of the plurality of categories the EV has been assigned and the charging type specified by the EV; and controlling charging of each EV based on the scheduling. The method may include scheduling based on the demand type specified by the EV corresponding to a flexible demand type, wherein EVs specifying the flexible demand type specify a minimum target SoC and a maximum target SoC, and wherein controlling charging is based on the minimum target SoC and the maximum target SoC specified by the EV. The method may include, for categories associated with the flexible demand type, scheduling based on controlling the charging to satisfy the minimum target SoC for each EV specifying the flexible demand type, and continuing to charge to the maximum target SoC responsive to profit associated with charging to the maximum target SoC exceeding a threshold. In various embodiments, the demand type includes a reliable demand type, wherein scheduling charging comprises allocating electricity available from the grid to categories associated with the reliable demand type before allocating the electricity available from the grid to categories associated with the flexible demand type. The scheduling may include assigning a penalty cost for each EV specifying the flexible demand type that is not charged to at least the minimum target SoC prior to the departure time. 
In various embodiments, the scheduling is based on a statistical distribution representing EV arrival times to the charging stations and designated departure times from the charging stations.
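The ordering described above (reliable demand first, then the minimum flexible demand, then optional charging toward the maximum target SoC) can be illustrated for a single time slot with the following Python sketch. It is a simplified illustration under stated assumptions and is not the scheduling algorithm of the disclosure: demand is aggregated into three scalar quantities, and the function name `allocate_slot`, the scalar `unit_profit`, and the default `threshold` are hypothetical.

```python
def allocate_slot(available, reliable_need, flex_min_need, flex_extra,
                  unit_profit, threshold=0.0):
    """Split the electricity available in one time slot across demand types.

    available      -- electricity the grid can supply in this slot
    reliable_need  -- electricity still owed to reliable-demand EVs
    flex_min_need  -- electricity needed to reach the minimum target SoC
    flex_extra     -- additional electricity needed to reach the maximum target SoC
    unit_profit    -- assumed profit per unit of optional charging
    Returns (to_reliable, to_flex_min, to_flex_extra).
    """
    to_reliable = min(available, reliable_need)      # reliable demand is served first
    left = available - to_reliable
    to_flex_min = min(left, flex_min_need)           # then the minimum flexible demand
    left -= to_flex_min
    # Optional charging continues only when it is profitable enough.
    to_flex_extra = min(left, flex_extra) if unit_profit > threshold else 0.0
    return to_reliable, to_flex_min, to_flex_extra


profitable = allocate_slot(100.0, 60.0, 30.0, 25.0, unit_profit=0.05)
unprofitable = allocate_slot(100.0, 60.0, 30.0, 25.0, unit_profit=-0.01)
scarce = allocate_slot(50.0, 60.0, 30.0, 25.0, unit_profit=0.05)
```

In the scarce case, all of the available electricity goes to the reliable demand and the flexible demand receives nothing in that slot.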

Various embodiments include a system comprising a plurality of electric vehicle (EV) charging stations each configured to charge a plugged EV during a time period specified by at least one remotely-located processor, the processor configured to schedule charging of EVs for all of the charging stations by scheduling a plurality of charging categories, each EV assigned to one of the plurality of categories by the processor based on arrival time to a charging station, expected departure time from the charging station, state of charge (SoC) of the EV upon arrival, target SoC of the EV before the expected departure time, and a charging demand type specified by the EV. Each of the plurality of charging stations may include a processor configured to control charging of an associated plugged EV according to the charging schedule. The remotely-located processor may be configured to schedule charging of EVs based on a minimum and maximum available power provided from an associated electric grid. The charging demand type may include a flexible demand having an associated minimum target SoC and maximum target SoC specified by the EV, wherein the remotely-located processor is further configured to schedule charging of EVs to provide the minimum target SoC for all EVs specifying the flexible demand, and to continue charging the EVs specifying the flexible demand above the minimum target SoC only if an associated profit exceeds a threshold.

Embodiments of the disclosure may provide one or more associated advantages. For example, vehicle manufacturers may facilitate aggregate scheduling and control of EV charging by platforms in consideration of grid demand by providing vehicle customers the ability to designate flexible charging parameters via a vehicle human-machine interface (HMI), such as a touch-screen display or wired/wireless connected smart device. Similarly, customer preferences, such as inflexible charging or flexible charging and corresponding minimum/maximum target SoC, may be automatically communicated to the charging platform. The ability of the vehicle manufacturer to control an individual EV charging demand based on default or customer specified settings allows shifting of charging demand temporally and/or geographically to assist the utility companies' demand management strategy by coupling the capability to control charging of an EV with the readily available flexibility in charge scheduling while that EV is parked and plugged in. The scalable and tractable framework for coordinating the charging of a large number of EVs according to embodiments of the present disclosure creates a mutually beneficial system for customers, charging platforms, and the electric utilities. Control of aggregated charging demand by the vehicle manufacturer may provide the ability for OEMs to participate in the wholesale electricity market by bidding in capacity markets, demand response markets, and aggregator markets, for example.

As those of ordinary skill in the art will appreciate, the claimed subject matter enables exchange of vehicle data in a more efficient and secure manner, enhances the data validity check before using the data for vehicle and other operations, and protects data users (whether human or controllers) from being sniffed, spoofed, or hacked.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates scheduling and control of EV charging for a large number of EVs using categorization of EVs.

FIG. 2 illustrates relative performance in terms of profits and energy, respectively, of approximate dynamic programming (ADP), simple programming (SP), and first-come first-serve (FCFS) algorithms.

FIG. 3 illustrates cumulative profits and energy consumption for reliable demand compared to cumulative profits and energy consumption for flexible demand.

FIG. 4 illustrates profits and energy consumption associated with two different penalty amounts.

FIG. 5 illustrates profits and energy consumption for flexible demand relative to varying penalties.

FIG. 6 illustrates profits and energy consumption for different grid power bounds.

DETAILED DESCRIPTION

Embodiments of the present disclosure are described herein. It is to be understood, however, that the disclosed embodiments are merely examples and other embodiments can take various and alternative forms. The figures are not necessarily to scale; some features could be exaggerated or minimized to show details of particular components. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a representative basis for teaching one skilled in the art to variously employ the present invention. As those of ordinary skill in the art will understand, various features illustrated and described with reference to any one of the figures can be combined with features illustrated in one or more other figures to produce embodiments that are not explicitly illustrated or described. The combinations of features illustrated provide representative embodiments for typical applications. Various combinations and modifications of the features consistent with the teachings of this disclosure, however, could be desired for particular applications or implementations.

Control logic, functions, code, software, algorithms, strategy, etc. as described herein with reference to the figures are performed by one or more processors, controllers, computers, etc. executing instructions stored in one or more non-transitory computer readable media to control charging of individual EVs based on scheduling of the charging processes of a large number of EVs. Various steps or functions illustrated or described may be performed in the specified sequence, in parallel, or in some cases omitted. Although not always explicitly illustrated, one of ordinary skill in the art will recognize that one or more of the illustrated steps or functions may be repeatedly performed. Similarly, the specified order of processing is not necessarily required to achieve the features and advantages described herein, but is provided for ease of illustration and description.

Notations

Let $x = (x_{i})_{i \in ℐ}$ be a real vector with an index set $ℐ$. For two such vectors x and y, $\min\{x, y\} := (\min\{x_{i}, y_{i}\})_{i \in ℐ}$ is the element-wise minimum of x and y. We let $(x_{i})_{i \in ℐ}^{+} := \max\{(x_{i})_{i \in ℐ}, 0\}$ denote the element-wise positive part. Further, $𝕆^{d \times d'}$ is the d×d′ dimensional all-zero matrix, and $𝕀^{d \times d}$ is the d×d dimensional identity matrix. Let $𝒞_{b}(𝒳)$ denote the space of all continuous and bounded functions on a set $𝒳$, endowed with the supremum norm.

Other notations used in this disclosure are provided below:

- $x_{s}^{r}$ / $x_{s}^{f, l}$: state for reliable/flexible demands
- $y_{s}^{r}$ / $y_{s}^{f, l}$: vector of number of plugged-in EVs with reliable/flexible demands
- $w_{s}^{r}$ / $w_{s}^{f, l}$: vector of number of new arrivals with reliable/flexible demands
- $u_{s}^{r}$ / $u_{s}^{f, l}$: vector of the electricity allocation to each reliable/flexible category
- $d_{s}^{r}$ / $d_{s}^{f, l}$: vector of grid bounds for reliable/flexible demands
- $c_{s}^{r}$ / $c_{s}^{f, l}$: vector of charging cost for reliable/flexible demands
- $z_{s}^{r}$: vector of remaining electricity required for each reliable demand category
- $z_{s}^{f, l}$ / $\underline{z}_{s}^{f, l}$: vector of maximum/minimum remaining electricity required for each flexible demand category
- $p_{s}^{f, l}$: vector of penalty if the minimum request l is not met
- $ℬ^{r}$ / $ℬ^{f, l}$: menu for reliable/flexible demands
- r: charging rate

According to the present disclosure, the EV charge scheduling problem is formulated as a dynamic optimization problem with a finite time horizon $𝒯 = \{0, \dots, T\}$. Consider an operator that provides a menu-based charging service: a customer plugs in the EV at time t and selects the menu item (m, n), for example in a charging application on a connected smartphone or on the panel of the charger, before the charging process begins. The facility will supply m units of electricity to the EV from time t to time t+n−1.

The EV charging scheduling and corresponding control problem is illustrated graphically in FIG. 1. System 100 provides EV charging scheduling and control for a group of EVs 120 including a large number of individual EVs 110 based on aggregate demand of the group and preferences or settings of the individual EVs 110 to achieve a particular goal, such as maximizing profit of the charging platform and/or managing grid demand, for example. The group of EVs 120 are typically distributed across a geographic area and arrive at different types of charging facilities 130 at different times with different planned departure times. Charging facilities 130 may include home chargers 132 and commercial charging stations 134, for example. The large group of EVs 120 are not necessarily associated or affiliated with a common manufacturer, charging facility, fleet owner, vehicle type, etc. and may be knowingly or unknowingly affiliated with a particular platform 140 that provides remote control of EV charging based on scheduling performed by platform 140 as described herein. The large group of EVs 120 may include thousands or millions of EVs across a large geographic area.

Customer vehicle charging preferences may be set by default by a vehicle manufacturer and/or may be selected in an app associated with the vehicle and accessed via a mobile computing device, such as a smartphone or computer, for example. One or more menu items or preferences may be selected in advance or upon arrival and plug-in at a home charger 132 or charging station 134. Those of ordinary skill in the art will appreciate that the menu parameters m, n, etc. described herein are not necessarily selectable or otherwise displayed to the customer, but may be determined by a particular charging platform 140 and used in scheduling and controlling EV charging over a selected time period as generally represented at 150 and described in detail herein. In one embodiment, menu parameters m, n, etc. are computed based on the current/target state of charge (SoC) and arrival/parking time entered by the customers, or automatically communicated by the vehicle, charging app, etc. to the charging platform, or otherwise detected by the charging platform 140. In at least one embodiment, two types of charging demand may be provided by charging platform 140 and coordinated via customer preference selection: reliable (or inflexible) demand and flexible demand. For simplicity and ease of exposition, the following description assumes that all EVs charge with a constant charging rate given by r kW. Of course, those of ordinary skill in the art will recognize that the described simplified implementation may be extended to implementations with variable charging rates or multiple charging rates.
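One way the menu parameters might be derived from the SoC and parking information is sketched below in Python. This is a minimal sketch under stated assumptions: the function name `menu_from_soc`, the battery capacity argument, the size of one unit of electricity (`unit_kwh`), and the slot length (`slot_hours`) are hypothetical and are not specified in this disclosure.

```python
import math


def menu_from_soc(soc_now, soc_target, battery_kwh, parking_hours,
                  unit_kwh=1.0, slot_hours=1.0):
    """Return a hypothetical menu item (m, n).

    m -- units of electricity needed to move from soc_now to soc_target
    n -- number of whole time slots the EV stays plugged in
    SoC values are fractions in [0, 1].
    """
    if not 0.0 <= soc_now <= soc_target <= 1.0:
        raise ValueError("expected 0 <= soc_now <= soc_target <= 1")
    n = int(parking_hours // slot_hours)
    energy_kwh = (soc_target - soc_now) * battery_kwh
    # Small tolerance so floating-point noise does not add a spurious unit.
    m = math.ceil(energy_kwh / unit_kwh - 1e-9)
    return m, n


# Hypothetical EV: 60 kWh battery, 25% -> 75% SoC, parked for 8 hours.
m, n = menu_from_soc(0.25, 0.75, battery_kwh=60.0, parking_hours=8.0)
```

A platform would additionally need to confirm that m units can be delivered within n slots at the charging rate r, which this sketch does not check.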

For inflexible charging or reliable demand, the menu selections are denoted by $ℬ^{r} \subset \{1, \dots, M\} \times \{1, \dots, N\}$, where an item $(m, n) \in ℬ^{r}$ means that the charging facility or platform 140 will provide m units of electricity within the next n time slots. We further let $ℬ_{n}^{r} = \{m : (m, n) \in ℬ^{r}\}$. As illustrated in FIG. 1, the platform operator 140 assigns a category $(t, m, n) \in 𝒯 \times ℬ^{r}$ to every EV depending on the preferences input by the EV owner in a smartphone app or via another vehicle or charging facility HMI as represented at 142. Let $𝒥_{s}^{r} \subset 𝒯 \times ℬ^{r}$ denote the categories of EVs that are present at time s. Define $𝒥_{s, 1}^{r} = \{(t, m, n) : s = t + n - 1\}$ to be the categories of the EVs that are connected at time s but will depart at time (s+1), and $𝒥_{s, 2}^{r} = 𝒥_{s}^{r} \setminus 𝒥_{s, 1}^{r}$.
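The categorization described above can be sketched in Python as follows. The sketch counts EVs per category (t, m, n) and, for a given time s, separates the categories in their last slot (s = t + n − 1) from those that remain afterwards. The dictionary-based EV records and the function names are illustrative assumptions; presence at time s is taken to mean t ≤ s ≤ t + n − 1, consistent with the supply window from t to t+n−1.

```python
from collections import Counter


def categorize(evs):
    """Aggregate individual EVs into counts per category (t, m, n)."""
    return Counter((ev["t"], ev["m"], ev["n"]) for ev in evs)


def split_present(categories, s):
    """Return (last_slot, staying) categories among those present at time s."""
    present = sorted(c for c in categories if c[0] <= s <= c[0] + c[2] - 1)
    last_slot = [c for c in present if s == c[0] + c[2] - 1]  # depart at s + 1
    staying = [c for c in present if s < c[0] + c[2] - 1]
    return last_slot, staying


# Four hypothetical EVs collapse into three categories.
evs = [
    {"t": 0, "m": 10, "n": 2},
    {"t": 0, "m": 10, "n": 2},
    {"t": 1, "m": 6, "n": 3},
    {"t": 1, "m": 6, "n": 1},
]
counts = categorize(evs)
last_slot, staying = split_present(counts, s=1)
```

The scheduler then works with one entry per category rather than one entry per EV, so the size of the problem grows with the number of categories and not with the number of vehicles.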

Let $w^{t, m, n}$ be the number of EVs in category (m, n) that arrive at time t, which is a non-negative, bounded, integer-valued random variable with a known distribution. The selected known distribution may be supported by observational data collected by the charging platform or otherwise determined for a particular application or implementation. We let $w^{t, m, n} = 0$ whenever t<0. We further assume that the random variables in this sequence are mutually independent. Let $w_{t}^{r}$ be the random vector representing all new arrivals at time t:

$w_{t}^{r} = [(w^{t, m, 1})_{m \in ℬ_{1}^{r}}, \dots, (w^{t, m, N})_{m \in ℬ_{N}^{r}}]^{⊤} \in 𝒲_{t} := \prod_{(m, n) \in ℬ} \{0, \dots, \overline{w}^{t, m, n}\} \subset ℕ^{\dim (ℬ)} .$

We let $y_{s}^{r}$ denote the vector of the number of EVs at the charging station in each category in $𝒥_{s}^{r}$:

$y_{s}^{r} := (w^{t, m, n})_{(t, m, n) \in 𝒥_{s}} \in 𝒴_{s} := \prod_{(t, m, n) \in 𝒥_{s}} \{0, \dots, \overline{w}^{t, m, n}\},$

where $y_{s}^{r}$ is formed in the order of leaving time, i.e.

$y_{s}^{r} := [\underbrace{(w^{s - 1, m, 1})_{m \in ℬ_{1}}, \dots, (w^{s - N, m, N})_{m \in ℬ_{N}}}_{\text{leaving at } s}, \dots, \underbrace{0, \dots, (w^{s - 1, m, N})_{m \in ℬ_{N}}}_{\text{leaving at } s + N - 1}] .$

At each time s, the total electricity allocated to the EVs in the category $(t, m, n) \in 𝒥_{s}^{r}$ is denoted by $u_{s}^{t, m, n}$. We let $u_{s}^{r}$ be the vector of the electricity allocation to each category $(t, m, n) \in 𝒥_{s}^{r}$:

$u_{s}^{r} = {(u_{s}^{t, m, n})}_{(t, m, n) \in 𝒥_{s}^{r}} \in 𝒰_{s}^{r} := ℝ_{+}^{\dim (𝒥_{s}^{r})} .$

For t<0, we let $u_{s}^{t, m, n} = 0$. We also have the constraint that the total electricity allocated to all the categories having reliable demand lies in the interval $[\underline{d}_{s}^{r}, \overline{d}_{s}^{r}]$, that is, $\underline{d}_{s}^{r} \leq 𝟙^{⊤} u_{s}^{r} \leq \overline{d}_{s}^{r}$, where $𝟙$ is a column vector of all ones of appropriate dimension. Let $d_{s}^{r} = [\underline{d}_{s}^{r}, \overline{d}_{s}^{r}]^{⊤}$.

Suppose that allocating one unit of electricity to (t, m, n) at time s incurs a cost $c_{s}^{t, m, n}$. Then, the total cost to the operator at each time is ${c_{s}^{r}}^{⊤} u_{s}^{r}$, where

$c_{s}^{r} = (c_{s}^{t, m, n})_{(t, m, n) \in 𝒥_{s}^{r}} \in ℝ^{\dim (𝒥_{s}^{r})} .$

Here, $c_{s}^{r}$ can represent either the cost of electricity or the cost of electricity minus the revenue per kWh from the EV owner. Thus, the entries of $c_{s}^{r}$ can take positive or negative values.

Let $z_{s}^{t, m, n}$ be the remaining electricity required by the category $(t, m, n) \in 𝒥_{s}$, which is updated as

$\begin{matrix} z_{s + 1}^{t, m, n} = \left\{ \begin{matrix} m y^{t, m, n} & s = t - 1 \\ z_{s}^{t, m, n} - u_{s}^{t, m, n} & t \leq s \leq t + n - 2 \\ 0 & s \geq t + n - 1 \end{matrix} \right. & (1) \end{matrix}$

Then, let $z_{s}^{r}$ be $(z_{s}^{t, m, n})_{(t, m, n) \in 𝒥_{s}^{r}}$ and let $𝒵_{s} \subset ℝ_{+}^{\dim (𝒥_{s})}$ be the space of $z_{s}^{r}$.
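The update in equation (1) can be traced numerically with the short Python sketch below, which follows a single category through its charging window. The function name `next_remaining` and the example values (t = 2, m = 10, n = 3, four EVs) are illustrative assumptions.

```python
def next_remaining(z_s, u_s, s, t, m, n, y):
    """Remaining electricity at time s + 1 for category (t, m, n), per equation (1).

    z_s -- remaining electricity at time s
    u_s -- electricity allocated to the category at time s
    y   -- number of EVs in the category
    """
    if s == t - 1:
        return m * y              # arrival: each of the y EVs requests m units
    if t <= s <= t + n - 2:
        return z_s - u_s          # inside the window: subtract what was delivered
    return 0.0                    # window closed: nothing remains to be tracked


# Hypothetical category: arrives at t = 2, requests m = 10 units over n = 3 slots, y = 4 EVs.
t, m, n, y = 2, 10, 3, 4
z = next_remaining(0.0, 0.0, s=1, t=t, m=m, n=n, y=y)
trajectory = [z]
for s, u in [(2, 15.0), (3, 10.0), (4, 15.0)]:
    z = next_remaining(z, u, s, t, m, n, y)
    trajectory.append(z)
```

In this trace the allocation in the last slot (s = 4) equals the 15 units still outstanding, so the reliable demand is fully served before departure.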

We let $x_{s}^{r} = [y_{s}^{r}, z_{s}^{r}, d_{s}^{r}] \in 𝒳_{s}^{r}$ be the state of the reliable demand, where $𝒳_{s}^{r} := 𝒴_{s} \times 𝒵_{s} \times ℝ_{+}^{2}$ is the corresponding state space, and $d_{s}^{r} = [\underline{d}_{s}^{r}, \overline{d}_{s}^{r}]$ is the deterministic “actuation noise”. For simplicity, we assume that the noise has a Dirac mass at the point $d_{s}^{r}$ in a day-ahead market. This can be relaxed as described in greater detail below.

Let $u_{s}^{r}$ be the actions of the system. For each state $x_{s}^{r} \in 𝒳_{s}^{r}$, the feasible action $u_{s}^{r}$ should satisfy $u_{s}^{r} \in Γ^{r}(x_{s}^{r})$, where $Γ^{r} : 𝒳_{s}^{r} \to 𝒰_{s}^{r}$ is a correspondence given by

$\begin{matrix} Γ^{r}(x_{s}^{r}) := \{u_{s}^{r} \in 𝒰_{s}^{r} : 0 \leq u_{s}^{r} \leq g^{r}(x_{s}^{r}), \ \underline{d}_{s}^{r} \leq 𝟙^{⊤} u_{s}^{r} \leq \overline{d}_{s}^{r}, \ u_{s}^{t, m, n} = z_{s}^{t, m, n} \text{ for all } (t, m, n) \in 𝒥_{s, 1}^{r}\}, & (2) \end{matrix}$

where $g^{r}(x_{s}^{r}) := \min\{r y_{s}^{r}, z_{s}^{r}\}$. Here, $u_{s}^{r} \in Γ^{r}(x_{s}^{r})$ guarantees that, at each time s, the allocated electricity is upper bounded by the minimum of the remaining electricity $z_{s}^{r}$ and the charging capacity $r y_{s}^{r}$.
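A membership test for the feasible set in equation (2) can be sketched in Python as follows. Vectors are plain lists indexed by category, `last_slot` lists the positions of the categories in their last slot, and the tolerance `tol` is an implementation assumption. The sketch also assumes unit-length time slots so that r·y can be compared directly with the remaining electricity.

```python
def capacity_bound(y, z, r):
    """Element-wise g = min(r * y, z): charger capacity versus remaining need."""
    return [min(r * yi, zi) for yi, zi in zip(y, z)]


def is_feasible(u, y, z, r, d_lo, d_hi, last_slot, tol=1e-9):
    """Check the three conditions of equation (2) for an allocation u."""
    g = capacity_bound(y, z, r)
    within_box = all(-tol <= ui <= gi + tol for ui, gi in zip(u, g))
    within_grid = d_lo - tol <= sum(u) <= d_hi + tol
    # Categories about to depart must receive exactly what they still need.
    deadlines_met = all(abs(u[i] - z[i]) <= tol for i in last_slot)
    return within_box and within_grid and deadlines_met


# Two hypothetical categories; the first one is in its last slot.
y, z, r = [2, 3], [5.0, 20.0], 7.0
ok = is_feasible([5.0, 10.0], y, z, r, d_lo=0.0, d_hi=20.0, last_slot=[0])
misses_deadline = is_feasible([4.0, 10.0], y, z, r, d_lo=0.0, d_hi=20.0, last_slot=[0])
exceeds_grid = is_feasible([5.0, 16.0], y, z, r, d_lo=0.0, d_hi=20.0, last_slot=[0])
```

Because every condition is stated per category, the cost of the check depends on the number of categories rather than on the number of EVs.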

As previously described, in addition to the inflexible charging selection represented by the reliable demand described above, various embodiments according to the disclosure provide aggregate scheduling of EVs that select a flexible charging service so that the platform can charge these EVs to an SoC between a selected or specified minimum target SoC and maximum target SoC. In this scenario, the menu is denoted by $ℬ^{f} \subset \{1, \dots, M\} \times \{1, \dots, N\} \times \{1, \dots, L\}$, where an item (m, n, l) represents that the facility provides at least l and at most m units of electricity within n time slots. The notations used in the flexible demand setting are similar to the notations used previously for the reliable demand setting, but with the superscripts (t, m, n) and r replaced by (t, m, n, l) and (f, l), respectively, for the flexible demand with minimum demand l. For instance, we denote $w_{s}^{f, l}$, $y_{s}^{f, l}$, $z_{s}^{f, l}$, $c_{s}^{f, l}$, and $u_{s}^{f, l}$, respectively, as the vectors of new arrivals $w^{t, m, n, l}$, the number of EVs at the charging station $y^{t, m, n, l}$, the remaining units of electricity $z_{s}^{t, m, n, l}$, the cost of charging $c_{s}^{t, m, n, l}$, and the amount of charging $u_{s}^{t, m, n, l}$.

Note that the platform only needs to meet the minimum demand l. This leads to introducing a new variable $\underline{z}_{s}^{f, l} = (\underline{z}_{s}^{t, m, n, l})_{(t, m, n, l) \in 𝒥_{s}^{f, l}}$ to capture the remaining minimum demand, which is defined analogously to equation (1) by replacing m with l, i.e.

$\underline{z}_{s + 1}^{t, m, n, l} = \left\{ \begin{matrix} l y^{t, m, n, l} & s = t - 1 \\ (\underline{z}_{s}^{t, m, n, l} - u_{s}^{t, m, n, l})^{+} & t \leq s \leq t + n - 2 \\ 0 & s \geq t + n - 1 \end{matrix} \right. .$

We also denote $\underline{z}_{s}^{f, l}$ as the vector

$(\underline{z}_{s}^{t, m, n, l})_{(t, m, n, l) \in 𝒥_{s}^{f, l}} .$

The reliable demand setting uses an equality constraint in equation (2) to impose that the demand m is met within the charging window. Instead, under the flexible demand setting, the platform will compensate the customers whose minimum demand is not met. Let

$p_{s}^{f, l} = (p_{s}^{t, m, n, l})_{(t, m, n, l) \in 𝒥_{s, 1}^{f, l}}$

be the penalty vector, which is the monetary penalty per kWh that the platform pays to the customer if the minimum demand l is not met at the end of the charging window. The total penalty paid by the platform at each time s is given by

${p_{s}^{f, l}}^{⊤} [(\underline{z}_{s}^{f, l} - u_{s}^{f, l})_{(t, m, n, l) \in 𝒥_{s, 1}^{f, l}}^{+}] .$
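The penalty expression above can be evaluated with the following short Python sketch, in which each list holds one entry per departing flexible category. The function name `shortfall_penalty` and the example numbers are illustrative assumptions.

```python
def shortfall_penalty(p, z_min, u):
    """Total penalty: sum over departing categories of p * max(z_min - u, 0).

    p     -- penalty per unit of unmet minimum demand
    z_min -- remaining minimum demand at the last slot
    u     -- electricity allocated at the last slot
    """
    return sum(pi * max(zi - ui, 0.0) for pi, zi, ui in zip(p, z_min, u))


# The first category is 3 units short; the second is over-served and incurs no penalty.
penalty = shortfall_penalty(p=[2.0, 3.0], z_min=[4.0, 1.0], u=[1.0, 5.0])
```

Only the shortfall is penalized: charging beyond the minimum demand neither reduces the penalty of another category nor produces a negative penalty.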

In this case, we let $x_{s}^{f, l} = [y_{s}^{f, l}, z_{s}^{f, l}, \underline{z}_{s}^{f, l}, d_{s}^{f, l}] \in 𝒳_{s}^{f}$ be the state of flexible demand and $𝒳_{s}^{f} := 𝒴_{s} \times 𝒵_{s} \times 𝒵_{s} \times ℝ_{+}^{2}$. Further, the feasible action set for the flexible demand is the correspondence $Γ^{f, l} : 𝒳_{s}^{f} \to 𝒰$, which is

$\begin{matrix} Γ^{f, l} (x_{s}^{f, l}) := \{u_{s}^{f, l} \in 𝒰 : 0 \leq u_{s}^{f, l} \leq g (x_{s}^{f, l}), \ \underline{d}_{s}^{f, l} \leq 𝟙^{⊤} u_{s}^{f, l} \leq \overline{d}_{s}^{f, l}\}, & (3) \end{matrix}$

where $d_{s}^{f, l} = (\underline{d}_{s}^{f, l}, \overline{d}_{s}^{f, l})$ is defined similarly to $d_{s}^{r}$ above. In this case, we let $d_{s} = (\underline{d}_{s}, \overline{d}_{s})$ be the total energy bound for the platform (both reliable and flexible demand) at each time s; then $d_{s}^{r}$ and $d_{s}^{f, l}$ satisfy

$\begin{matrix} {\overline{d}}_{s} = {\overline{d}}_{s}^{r} + \sum_{l = 1}^{L} {\overline{d}}_{s}^{f, l}, {\underline{d}}_{s} = {\underline{d}}_{s}^{r} + \sum_{l = 1}^{L} {\underline{d}}_{s}^{f, l}, l = 1, \dots, L . & (4) \end{matrix}$

The feasible set of flexible demand represented by equation (3) removes the equality constraints in equation (2), which allows charging an EV in (t, m, n, l) from 0 to m units of electricity. Note that the penalty $p_{s}^{f, l}$ can be changed to ensure the minimum demand l is satisfied.
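Equation (4) only requires that the per-group bounds add up to the platform totals; it does not prescribe how the totals are divided. The Python sketch below shows one hypothetical proportional split together with a check of equation (4). The weights, function names, and tolerance are assumptions for illustration only.

```python
def split_bound(total, weights):
    """Split a total grid bound into per-group bounds that sum to `total`.

    weights[0] is a hypothetical weight for the reliable demand and
    weights[1:] are weights for the flexible classes l = 1, ..., L.
    """
    if not weights or any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    scale = total / sum(weights)
    shares = [w * scale for w in weights[:-1]]
    shares.append(total - sum(shares))  # last share absorbs rounding error
    return shares


def satisfies_eq4(d_total, d_reliable, d_flexible, tol=1e-9):
    """Check equation (4); each bound is a (lower, upper) pair."""
    lo = d_reliable[0] + sum(d[0] for d in d_flexible)
    hi = d_reliable[1] + sum(d[1] for d in d_flexible)
    return abs(d_total[0] - lo) <= tol and abs(d_total[1] - hi) <= tol


upper = split_bound(100.0, [2, 1, 1])   # reliable, flexible l = 1, flexible l = 2
lower = split_bound(10.0, [2, 1, 1])
consistent = satisfies_eq4(
    (10.0, 100.0),
    (lower[0], upper[0]),
    [(lower[1], upper[1]), (lower[2], upper[2])],
)
```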

We next determine the state of the EV charging system, the transition dynamics, and pose the stochastic dynamic program.

The system has linear dynamics for both reliable and flexible demand with minimum demand l, and the state transition functions ƒ^r, ƒ^f,1, . . . , ƒ^f,Lgiven by:

$\begin{matrix} x_{s + 1}^{r} = f^{r} (x_{s}^{r}, u_{s}^{r}, w_{s}^{r}, d_{s + 1}^{r}) := [\begin{matrix} A_{y}^{r} y_{s}^{r} + C_{y}^{r} w_{s}^{r} \\ A_{z}^{r} (z_{s}^{r} - u_{s}^{r}) + C_{z}^{r} w_{s}^{r} \\ d_{s + 1}^{r} \end{matrix}], & (5) \end{matrix}$

$\begin{matrix} x_{s + 1}^{f, l} = f^{f, l} (x_{s}^{f, l}, u_{s}^{f, l}, w_{s}^{f, l}, d_{s + 1}^{f, l}) \\ := [\begin{matrix} A_{y}^{f, l} y_{s}^{f, l} + C_{y}^{f, l} w_{s}^{f, l} \\ A_{z}^{f, l} (z_{s}^{f, l} - u_{s}^{f, l}) + C_{z}^{f, l} w_{s}^{f, l} \\ A_{z}^{f, l} ({\underline{z}}_{s}^{f, l} - u_{s}^{f, l}) + C_{\underline{z}}^{f, l} w_{s}^{f, l} \\ d_{s + 1}^{f, l} \end{matrix}], l = 1, \dots, L \end{matrix}$

where the time-invariant matrices $A_y^r$, $A_z^r$, $C_y^r$, $C_z^r$ are given as follows:

$A_{y}^{r} = A_{z}^{r} = [\begin{matrix} 𝕆^{❘ 𝒥_{s}^{2} ❘ \times ❘ 𝒥_{s}^{1} ❘} & 𝕀^{❘ 𝒥_{s}^{2} ❘ \times ❘ 𝒥_{s}^{2} ❘} \\ 𝕆^{❘ 𝒥_{s}^{1} ❘ \times ❘ 𝒥_{s}^{1} ❘} & 𝕆^{❘ 𝒥_{s}^{1} ❘ \times ❘ 𝒥_{s}^{2} ❘} \end{matrix}],$

$C_{y}^{r} = [\begin{matrix} C_{y}^{1} & 0 & \dots & 0 \\ 0 & C_{y}^{2} & \dots & 0 \\ ⋮ & ⋮ & ⋱ & ⋮ \\ 0 & 0 & \dots & C_{y}^{N} \end{matrix}], C_{z}^{r} = [\begin{matrix} C_{z}^{1} & 0 & \dots & 0 \\ 0 & C_{z}^{2} & \dots & 0 \\ ⋮ & ⋮ & ⋱ & ⋮ \\ 0 & 0 & \dots & C_{z}^{N} \end{matrix}],$

$C_{y}^{k} = [\begin{matrix} \begin{matrix} 0 \\ ⋮ \end{matrix} \\ 𝕀^{❘ ℬ_{k} ❘ \times ❘ ℬ_{k} ❘} \\ ⋮ \\ 0 \end{matrix}],$

$C_{z}^{k} = [\begin{matrix} \begin{matrix} 0 \\ ⋮ \end{matrix} \\ diag ({m}_{m \in ℬ_{k}}) \\ ⋮ \\ 0 \end{matrix}], ℬ_{k} = {(m, n) \in ℬ : n = k} .$

and $A_y^{f,l}$, $A_z^{f,l}$, $C_y^{f,l}$, $C_z^{f,l}$ are given by

$A_{y}^{f, l} = A_{z}^{f, l} = [\begin{matrix} 𝕆^{\dim (𝒥_{s, 2}^{f, l}) \times \dim (𝒥_{s, 1}^{f, l})} & 𝕀^{\dim (𝒥_{s, 2}^{f, l}) \times \dim (𝒥_{s, 2}^{f, l})} \\ 𝕆^{\dim (𝒥_{s, 1}^{f, l}) \times \dim (𝒥_{s, 1}^{f, l})} & 𝕆^{\dim (𝒥_{s, 1}^{f, l}) \times \dim (𝒥_{s, 2}^{f, l})} \end{matrix}],$

$C_{y}^{f, l} = [\begin{matrix} C^{1} & \dots & 0 \\ ⋮ & ⋱ & ⋮ \\ 0 & \dots & C^{N} \end{matrix}], C^{k} = [\begin{matrix} \begin{matrix} 0 \\ ⋮ \end{matrix} \\ 𝕀^{\dim (ℬ_{k}) \times \dim (ℬ_{k})} \\ ⋮ \\ 0 \end{matrix}],$

$C_{z}^{f, l} = [\begin{matrix} C_{z}^{1, l} & \dots & 0 \\ ⋮ & ⋱ & ⋮ \\ 0 & \dots & C_{z}^{N, l} \end{matrix}], C_{\underline{z}}^{f, l} = [\begin{matrix} C_{\underline{z}}^{1, l} & \dots & 0 \\ ⋮ & ⋱ & ⋮ \\ 0 & \dots & C_{\underline{z}}^{N, l} \end{matrix}],$

$C_{z}^{k, l} = [\begin{matrix} \begin{matrix} 0 \\ ⋮ \end{matrix} \\ diag ({m}_{m \in ℬ_{k}^{f, l}}) \\ ⋮ \\ 0 \end{matrix}], C_{\underline{z}}^{k, l} = [\begin{matrix} \begin{matrix} 0 \\ ⋮ \end{matrix} \\ diag ({l}_{l \in ℬ_{k}^{f, l}}) \\ ⋮ \\ 0 \end{matrix}], where$

$ℬ_{k}^{f, l} = {(m, n, l^{'}) \in ℬ^{f} : n = k, l^{'} = l} .$
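The block structure of the A matrices above acts as a shift: categories in the first index block (those leaving) are dropped, the remaining categories move up, and the vacated entries are filled by new arrivals through the C matrices. A minimal sketch of one remaining-demand update $z_{s+1} = A_z (z_s - u_s) + C_z w_s$ is given below. The helper names, the ordering of the blocks, and the pre-multiplied `arrivals` vector (standing in for $C_z w_s$) are illustrative assumptions.

```python
def shift_matrix(n_leave, n_stay):
    """Build A = [[0, I], [0, 0]]: the first n_stay rows copy the 'stay' block,
    and the last n_leave rows are zero (to be filled by arrivals)."""
    n = n_leave + n_stay
    A = [[0.0] * n for _ in range(n)]
    for i in range(n_stay):
        A[i][n_leave + i] = 1.0
    return A


def matvec(A, x):
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def step_remaining_demand(A, z, u, arrivals):
    """One step of z_{s+1} = A (z_s - u_s) + arrivals."""
    remaining = [zi - ui for zi, ui in zip(z, u)]
    shifted = matvec(A, remaining)
    return [s + a for s, a in zip(shifted, arrivals)]


# Hypothetical example: one leaving category followed by two staying categories.
A = shift_matrix(n_leave=1, n_stay=2)
z = [4.0, 6.0, 8.0]          # remaining demand per category
u = [4.0, 1.0, 2.0]          # energy delivered this step
arrivals = [0.0, 0.0, 10.0]  # demand of newly arrived EVs
print(step_remaining_demand(A, z, u, arrivals))
```

Here the leaving category is fully served and dropped, the two staying categories carry over their unserved demand, and the new arrivals enter in the last slot.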

At time s, a feasible policy for both the reliable and flexible settings is a measurable map $π_s: 𝒳_s^r \times 𝒳_s^f \to 𝒰_s^r \times 𝒰_s^f$, where $𝒳_s^f := \prod_{l=1}^{L} 𝒳_s^{f,l}$ and $𝒰_s^f := \prod_{l=1}^{L} 𝒰_s^{f,l}$. Note that, based on equation (4), the feasible policy $π_s$ satisfies

$\begin{matrix} π_{s} (x_{s}^{r}, x_{s}^{f}) \in Γ (x_{s}^{r}, x_{s}^{f}) := \{(u_{s}^{r}, u_{s}^{f}) \in 𝒰_{s}^{r} \times 𝒰_{s}^{f} : u_{s}^{r} \in Γ^{r} (x_{s}^{r}), u_{s}^{f, l} \in Γ^{f, l} (x_{s}^{f, l}) \text{ for all } l = 1, \dots, L, {\underline{d}}_{s} \leq 1^{T} u_{s}^{r} + \sum_{l = 1}^{L} 1^{T} u_{s}^{f, l} \leq {\overline{d}}_{s}\} . & (6) \end{matrix}$

Let $Π_s$ denote the set of all feasible policies at time s, $π = (π_0, \dots, π_T)$ denote a feasible strategy of the platform, and $Π := \prod_{s} Π_s$ denote the feasible strategy space.

We now introduce the finite time horizon stochastic dynamic program (DP). The expected total cost of the reliable and flexible demand based on the initial state x=[x^r, x^f] and using the strategy π is given by

$\begin{matrix} J (π; x) = 𝔼 [\sum_{s = 1}^{T} (\underbrace{c_{s}^{r^{T}} π_{s}^{r} (x_{s}^{r})}_{\text{reliable demand cost}} + \sum_{l = 1}^{L} (\underbrace{c_{s}^{f, l^{T}} π_{s}^{f, l} (x_{s}^{f, l})}_{\text{flexible demand cost}} + \underbrace{p_{s}^{f, l^{T}} {(l - \sum_{τ = t}^{s} π_{τ}^{f, l} (x_{τ}^{f, l}))}_{(t, m, n, l) \in 𝒥_{s, 1}^{f, l}}^{+}}_{\text{flexible demand penalty}})) ❘ [x_{0}^{r}, x_{0}^{f}] = x] . & (7) \end{matrix}$

That is, for the flexible demand in category (t, m, n, l), the platform will try to satisfy the minimum demand l, and will keep charging up to the battery capacity m whenever it is profitable. However, if $\overline{d}_s$ is small, then the platform pays a penalty to the EVs in category $𝒥_{s,1}^{f,l}$ whose minimum demand was not met and who need to leave at time s+1.

The goal is to minimize the expected total cost from s=1 to s=T given the initial state x, which can be solved using the usual dynamic programming method under fairly mild conditions. The optimal value functions $v_s^*$ can be obtained by applying the Bellman operator $H_{s+1}$ for each time s=T, . . . , 1, where the Bellman operator is defined as

$\begin{matrix} v_{s}^{*} (x_{s}) = H_{s + 1} (v_{s + 1}^{*}) (x_{s}) := \inf_{(u_{s}^{r}, u_{s}^{f}) \in Γ (x_{s}^{r}, x_{s}^{f})} c_{s}^{r^{T}} u_{s}^{r} + \sum_{l = 1}^{L} (c_{s}^{f, l^{T}} u_{s}^{f, l} + p_{s}^{f, l^{T}} [{({\underline{z}}_{s}^{f, l} - u_{s}^{f, l})}_{(t, m, n, l) \in 𝒥_{s, 1}^{f, l}}^{+}]) + 𝔼 [v_{s + 1}^{*} (f (x_{s}, u_{s}, W_{s}, d_{s + 1}))], & (8) \end{matrix}$

where $v_{T+1}^*(x_{T+1}) ≡ 0$, and, with a slight abuse of notation, $f(x_s, u_s, W_s, d_{s+1})$ represents the state transition functions (5) applied to $x_s = [x_s^r, x_s^f]$, $u_s = [u_s^r, u_s^f]$, $W_s = [W_s^r, W_s^f]$, and $d_{s+1}$.

In this case, we can obtain the optimal scheduling policies $π^* = (π_1^*, \dots, π_T^*)$ by applying the value iteration $v_s^* = H_{s+1}(v_{s+1}^*)$ in (8) recursively from time s=T to time s=1. Here, $π_s^*(x_s)$ is the minimizer in (8).
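The backward recursion can be sketched on a toy problem with a finite state space. The following Python example is only an illustration under strong simplifying assumptions that are not part of the disclosure: a single EV, integer remaining demand as the state, a hypothetical price table, a trivial noise distribution (no arrivals), and a reliable-demand rule forcing the remaining demand to be served in the last period.

```python
def backward_induction(T, states, actions, cost, step, noise):
    """Tabular value iteration v_s = H_{s+1}(v_{s+1}) for s = T, ..., 1 with v_{T+1} = 0.
    `noise` is a finite list of (w, probability) pairs."""
    v = {T + 1: {x: 0.0 for x in states}}
    pi = {}
    for s in range(T, 0, -1):
        v[s], pi[s] = {}, {}
        for x in states:
            best_u, best_q = None, float("inf")
            for u in actions(s, x):
                q = cost(s, u) + sum(p * v[s + 1][step(x, u, w)] for w, p in noise)
                if q < best_q:
                    best_u, best_q = u, q
            # An empty action set leaves the value at infinity (infeasible state).
            v[s][x], pi[s][x] = best_q, best_u
    return v, pi


# Hypothetical toy data: horizon 3, charging rate 2 units/step, demand up to 3 units.
T, rate = 3, 2
price = {1: 3.0, 2: 1.0, 3: 2.0}
states = range(4)


def actions(s, x):
    if s == T:  # reliable demand: everything left must be served now
        return [x] if x <= rate else []
    return range(min(rate, x) + 1)


v, pi = backward_induction(
    T, states, actions,
    cost=lambda s, u: price[s] * u,
    step=lambda x, u, w: x - u + w,
    noise=[(0, 1.0)],
)
print(v[1][3], pi[1][3], pi[2][3])
```

With these prices, the cheapest plan for 3 units of demand defers charging to the low-price period 2 and finishes in period 3.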

As those of ordinary skill in the art may appreciate, with sufficiently large menus $ℬ^r$ and $ℬ^f$, the dimensionality of the state and action spaces also becomes large. In this case, computing $v_s^*$ for each time s is challenging due to the curse of dimensionality. As such, embodiments according to the present disclosure leverage Approximate Dynamic Programming (ADP) to compute the approximately optimal value functions.

The two main challenges to overcome for obtaining the approximately optimal value functions are computing the expectation in the Bellman operator and storing the approximately optimal value functions. The first challenge is mitigated by using the empirical Bellman operator, which uses independent and identically distributed (i.i.d.) samples of the noise to approximate the computation of the expected future value. The second challenge is mitigated by using a projection operator, which takes the values computed by the empirical Bellman operator as inputs and returns a function in the chosen function approximating class as output.

We use the empirical Bellman operator $\hat{H}_{s+1}^{k}: 𝒞_b(𝒳_s) \to 𝒞_b(𝒳_s)$ to approximate the actual Bellman operator $H_{s+1}$. Let $\{W_{s,i}\}_{i=1}^{k}$ be a sequence of independent and identically distributed (i.i.d.) samples of $w_s$. Then the empirical Bellman operator $\hat{H}_{s+1}^{k}$ is given by

${\hat{v}}_{s}^{k} (x_{s}) = {\hat{H}}_{s + 1}^{k} ({\hat{v}}_{s + 1}^{k}) (x_{s}) := \inf_{u_{s} \in Γ (x_{s})} c_{s}^{T} u_{s} + \frac{1}{k} \sum_{i = 1}^{k} {\hat{v}}_{s + 1}^{k} (f (x_{s}, u_{s}, W_{s, i})) .$
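The empirical Bellman operator simply replaces the expectation by a sample average over k noise draws. A minimal sketch at a single state is shown below, assuming a finite candidate action list in place of the infimum over $Γ(x_s)$; the scalar state, the cost and transition lambdas, and the next-stage value function are hypothetical stand-ins.

```python
import random


def empirical_bellman(x, actions, cost, step, v_next, samples):
    """Sample-average Bellman backup at state x: the expectation over the
    noise is replaced by the mean over the given noise samples."""
    k = len(samples)
    return min(
        cost(x, u) + sum(v_next(step(x, u, w)) for w in samples) / k
        for u in actions(x)
    )


# Hypothetical scalar example: x is remaining demand, w is newly arriving demand.
actions = lambda x: [0.0, 1.0, 2.0]
cost = lambda x, u: 1.5 * u
step = lambda x, u, w: max(x - u, 0.0) + w
v_next = lambda x: 2.0 * x

rng = random.Random(0)  # fixed seed so the sketch is reproducible
samples = [rng.choice([0.0, 1.0]) for _ in range(1000)]
print(empirical_bellman(2.0, actions, cost, step, v_next, samples))
```

As k grows, the sample average approaches the true expectation, which is the content of Lemma 2.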

While applying the value iteration, it is necessary to store a function approximator of $\hat{v}_s^k$ in computer-readable memory or storage. The function approximator can be obtained by projecting the value function $\hat{v}_s^k$ onto a feasible function approximating class, such as neural networks or a reproducing kernel Hilbert space (RKHS), which is dense in $𝒞_b(𝒳_s)$. The projection minimizes an empirical loss over l sampled states $\{x_{s,j}\}_{j=1}^{l}$, for example the mean squared error

$Loss ({\hat{v}}_{s}^{k}, h ❘ {x_{s, j}}_{j = 1}^{l}) = \frac{1}{l} \sum_{j = 1}^{l} {({\hat{v}}_{s}^{k} (x_{s, j}) - h (x_{s, j}))}^{2} .$

We denote by $Π_s^{l,d}: 𝒞_b(𝒳_s) \to 𝒢_d(𝒳_s)$ the function approximating projection that maps the output of $\hat{H}_{s+1}^{k}(\hat{v}_{s+1}^{k})$ to a function in $𝒢_d$. This is defined as

$Π_{s}^{l, d} ({\hat{v}}_{s}^{k}) = \arg \inf_{h \in 𝒢_{d}} Loss ({\hat{v}}_{s}^{k}, h ❘ \{x_{s, j}\}_{j = 1}^{l}) .$

We now construct a composite operator that combines the empirical Bellman operator and the function approximating operator. We let

$Ψ_{s}^{k, l, d} = Π_{s}^{l, d} \circ {\hat{H}}_{s + 1}^{k} : 𝒢_{d} (𝒳_{s + 1}) \to 𝒢_{d} (𝒳_{s})$

be the random fitted empirical Bellman operator used in place of the actual Bellman operator $H_{s+1}$ to arrive at an approximate value function $\hat{v}_s$. Here, k is the number of noise samples generated, d is a parameter describing the size of the function approximating class, and l is the number of state samples used in computing the empirical loss function for the projection operation.

We define the fitted value iteration at each time s as

${\hat{v}}_{s}^{k, l, d} (x_{s}) = Ψ_{s}^{k, l, d} ({\hat{v}}_{s + 1}^{k, l, d}) (x_{s}) .$
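The composition of the empirical backup with a projection can be sketched end to end on a scalar toy problem. The example below is illustrative only: it assumes a function approximating class of affine functions fitted by ordinary least squares (in place of a neural network or RKHS), a finite candidate action list, and hypothetical cost and transition functions.

```python
def fit_line(xs, ys):
    """Projection step: least-squares fit of h(x) = a + b*x, i.e. the minimiser
    of the mean squared error over the sampled states."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x


def fitted_value_iteration(T, sample_states, actions, cost, step, noise_samples):
    """Apply (projection o empirical Bellman backup) for s = T, ..., 1."""
    v_hat = lambda x: 0.0  # terminal condition v_{T+1} = 0
    fitted = {}
    k = len(noise_samples)
    for s in range(T, 0, -1):
        # Empirical Bellman targets at the sampled states.
        targets = [
            min(cost(x, u) + sum(v_hat(step(x, u, w)) for w in noise_samples) / k
                for u in actions(x))
            for x in sample_states
        ]
        v_hat = fit_line(sample_states, targets)  # projection onto the affine class
        fitted[s] = v_hat
    return fitted


# Hypothetical toy problem: serve demand now at price 1 or carry it at cost 3.
fitted = fitted_value_iteration(
    T=2,
    sample_states=[0.0, 1.0, 2.0, 3.0],
    actions=lambda x: [0.0, x],
    cost=lambda x, u: 1.0 * u + 3.0 * (x - u),
    step=lambda x, u, w: x - u + w,
    noise_samples=[0.0, 1.0],
)
print(fitted[1](2.0))  # close to 2.5 for this toy problem
```

In this toy problem the true value functions happen to be affine, so the projection introduces no error; in general the projection error is controlled by Assumption 1.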

We now show that this fitted value iteration algorithm converges as k, l, d→∞. In what follows, we increase k, l, and d simultaneously. Let $j \in ℕ$ and k(j), l(j), d(j) be such that as j→∞, we have k(j), l(j), d(j)→∞. By a slight abuse of notation, we denote $\hat{H}_{s+1}^{j} := \hat{H}_{s+1}^{k(j)}$, $Π_s^{j} := Π_s^{l(j),d(j)}$, and the fitted value iteration algorithm by

${\hat{v}}_{s}^{j} (x_{s}) := Ψ_{s}^{j} ({\hat{v}}_{s + 1}^{j}) (x_{s}) := Ψ_{s}^{k (j), l (j), d (j)} ({\hat{v}}_{s + 1}^{k (j), l (j), d (j)}) (x_{s}) .$

To establish the convergence of the proposed algorithms, we also need the following reasonable assumptions on the projection operators.

Assumption 1. The projection operator $Π_s^{l,d}: 𝒞_b(𝒳_s) \to 𝒢_d(𝒳_s)$ satisfies the following two conditions:

- 1. $Π_s^{l,d}$ is approximately non-expansive, that is, for all $v_1, v_2 \in 𝒞_b(𝒳_s)$, we have $\|Π_s^{l,d}(v_1) - Π_s^{l,d}(v_2)\|_{\infty} \leq \|v_1 - v_2\|_{\infty} + \hat{ζ}_s^{l,d}$, where $\hat{ζ}_s^{l,d} \leq \hat{ζ}_s < \infty$ almost surely and $\hat{ζ}_s^{l,d} \to 0$ in probability as l, d→∞.
- 2. For any ϵ>0 and δ>0, there exist $M_l$, $M_d$ that may depend on $v_s^*$ such that $ℙ(\|Π_s^{l,d}(v_s^*) - v_s^*\|_{\infty} > ϵ) < δ$ for all $l \geq M_l$, $d \geq M_d$.

Under the assumptions listed above, we have the following theorem where the convergence of the fitted value iteration algorithm is established.

Theorem 1: If Assumption 1 holds, then $\hat{v}_s^{j}$ satisfies, for any κ>0,

$\underset{j \to \infty}{limsup} ℙ (\| {\hat{v}}_{s}^{j} - v_{s}^{*} \|_{\infty} > κ) = 0.$

The proof of the Theorem is established below. Thus, as we increase the number of samples for empirical Bellman operator, expand the function approximating class to include more parameters, and take more samples of the state to project the value function to the function approximating class, we are guaranteed to converge to the optimal value functions under the sup norm.

Proof. We first state two auxiliary results used to prove the theorem. The first establishes that the empirical Bellman operator is non-expansive. The second shows that the empirical Bellman operator $\hat{H}_{s+1}^{j}$, when applied to $v_{s+1}^*$, converges to $v_s^*$ in probability as j→∞.

Lemma 1. For any $v, v' \in 𝒞_b(𝒳_{s+1})$ and any realization of the random operator $\hat{H}_{s+1}^{j}$, we have $\|\hat{H}_{s+1}^{j}(v) - \hat{H}_{s+1}^{j}(v')\|_{\infty} \leq \|v - v'\|_{\infty}$ almost surely. The proof is straightforward and therefore omitted.

Lemma 2. For any ϵ>0, the following holds:

$\lim_{k \to \infty} ℙ (\| {\hat{H}}_{s + 1}^{k} (v_{s + 1}^{*}) - H_{s + 1} (v_{s + 1}^{*}) \|_{\infty} \geq ϵ) = 0.$

The proof may be found in the published literature.

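We now prove Theorem 1 by backward induction on s. Throughout, we write $ζ_s^{j} := \hat{ζ}_s^{l(j),d(j)}$. Since $\hat{v}_s^{j} = Ψ_s^{j}(\hat{v}_{s+1}^{j})$ and $v_s^* = H_{s+1}(v_{s+1}^*)$, the triangle inequality gives

$\| {\hat{v}}_{s}^{j} - v_{s}^{*} \|_{\infty} \leq \| Ψ_{s}^{j} ({\hat{v}}_{s + 1}^{j}) - Ψ_{s}^{j} (v_{s + 1}^{*}) \|_{\infty} + \| Ψ_{s}^{j} (v_{s + 1}^{*}) - H_{s + 1} (v_{s + 1}^{*}) \|_{\infty} .$

Consider the first summand on the right side of the inequality above. We have

$\| Ψ_{s}^{j} ({\hat{v}}_{s + 1}^{j}) - Ψ_{s}^{j} (v_{s + 1}^{*}) \|_{\infty} \leq \| {\hat{H}}_{s + 1}^{j} ({\hat{v}}_{s + 1}^{j}) - {\hat{H}}_{s + 1}^{j} (v_{s + 1}^{*}) \|_{\infty} + ζ_{s}^{j} \leq \| {\hat{v}}_{s + 1}^{j} - v_{s + 1}^{*} \|_{\infty} + ζ_{s}^{j},$

where the first inequality uses Assumption 1(1) and the second uses Lemma 1. Next, we consider the second summand:

$\| Ψ_{s}^{j} (v_{s + 1}^{*}) - H_{s + 1} (v_{s + 1}^{*}) \|_{\infty} = \| Π_{s}^{j} ({\hat{H}}_{s + 1}^{j} (v_{s + 1}^{*})) - v_{s}^{*} \|_{\infty} \leq \| Π_{s}^{j} ({\hat{H}}_{s + 1}^{j} (v_{s + 1}^{*})) - Π_{s}^{j} (v_{s}^{*}) \|_{\infty} + \| Π_{s}^{j} (v_{s}^{*}) - v_{s}^{*} \|_{\infty} \leq \| {\hat{H}}_{s + 1}^{j} (v_{s + 1}^{*}) - v_{s}^{*} \|_{\infty} + ζ_{s}^{j} + \| Π_{s}^{j} (v_{s}^{*}) - v_{s}^{*} \|_{\infty},$

where the first inequality is due to the triangle inequality and the second inequality is due to Assumption 1(1). Thus, we conclude that

$\| {\hat{v}}_{s}^{j} - v_{s}^{*} \|_{\infty} \leq \| {\hat{v}}_{s + 1}^{j} - v_{s + 1}^{*} \|_{\infty} + \| {\hat{H}}_{s + 1}^{j} (v_{s + 1}^{*}) - v_{s}^{*} \|_{\infty} + \| Π_{s}^{j} (v_{s}^{*}) - v_{s}^{*} \|_{\infty} + 2 ζ_{s}^{j} .$

For time s=T, we have $v_{T+1}^* = \hat{v}_{T+1}^{j} = 0$, so the first term on the right vanishes. As j→∞, the remaining three terms go to 0 in probability due to Lemma 2, Assumption 1(2), and Assumption 1(1), respectively. Thus, $\|\hat{v}_T^{j} - v_T^*\|_{\infty} \to 0$ in probability as j→∞, and the statement holds for time T. For any earlier time s, the first term goes to 0 in probability by the induction hypothesis, and the same argument shows that $\|\hat{v}_s^{j} - v_s^*\|_{\infty} \to 0$ in probability as j→∞. The proof of the theorem is complete.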

Next, we examine three crucial properties of the value function: monotonicity and Lipschitz continuity with respect to the state x_s, and continuity with respect to the system parameters.

Monotonicity of Value Functions

Note that any realization of the state $x_s$ is a non-negative vector in $ℝ_+^{|𝒥_s|} \times ℝ_+^{|𝒥_s|} \times ℝ_+^{2}$. Endow the state space $𝒳_s$ with the following partial order: let $x_s, x_s' \in 𝒳_s$. Then $x_s \leq x_s'$ if and only if $y_s \leq y_s'$, $z_s^{t,m,n} = z_s'^{t,m,n}$ for every $(t, m, n) \in 𝒥_s^{1}$, $z_s^{t,m,n} \leq z_s'^{t,m,n}$ for every $(t, m, n) \in 𝒥_s^{2}$, and $d_s \leq d_s'$. A function $v: 𝒳_s \to ℝ$ is said to be monotonically increasing if and only if for any $x, x' \in 𝒳_s$ such that $x \leq x'$, we have $v(x) \leq v(x')$. A function v is said to be monotonically decreasing if and only if −v is monotonically increasing. In this section, we show that the dynamic optimization problem formulated above yields monotonically decreasing value functions at all times.

Theorem 2: For each time s, the optimal value function $v_s^*$ is a monotonically decreasing function of $x_s$.

Proof. To show this, we first note that for any x≤x′, we have:

- 1. Γ(x)⊆Γ(x′).
- 2. $f(x, u, w, d_{s+1}) \leq f(x', u, w, d_{s+1})$ for all $u \in Γ(x)$ and every noise realization w.

We now prove the statement using induction. The terminal cost is 0, so it is trivially monotonically decreasing. Assume that $v_{s+1}^*$ is monotonically decreasing. We claim that $v_s^* = H_{s+1}(v_{s+1}^*)$ is also a monotonically decreasing function. Pick $x, x' \in 𝒳_s$ such that $x \leq x'$, $u \in Γ(x)$, and a noise realization w. Since $f(x, u, w, d_{s+1}) \leq f(x', u, w, d_{s+1})$ and $v_{s+1}^*$ is monotonically decreasing, we conclude that

$v_{s + 1}^{*} (f (x^{'}, u, w, d_{s + 1})) \leq v_{s + 1}^{*} (f (x, u, w, d_{s + 1})) .$

Consequently, $𝔼[v_{s+1}^*(f(\cdot, u, W, d_{s+1}))]$ is also a monotonically decreasing function. This yields

$\inf_{u \in Γ (x)} c_{s}^{T} u + 𝔼 [v_{s + 1}^{*} (f (x, u, W, d_{s + 1}))] \geq \inf_{u \in Γ (x')} c_{s}^{T} u + 𝔼 [v_{s + 1}^{*} (f (x, u, W, d_{s + 1}))] \geq \inf_{u \in Γ (x')} c_{s}^{T} u + 𝔼 [v_{s + 1}^{*} (f (x^{'}, u, W, d_{s + 1}))],$

where the first inequality is due to Γ(x)⊆Γ(x′), and the second inequality results from the conclusion above. In other words, v_s* is monotonically decreasing. An application of the principle of mathematical induction implies that v_s* is monotone decreasing for all s.

Lipschitz Continuity of Value Functions

We now endow the state and the action space with metrics and establish the Lipschitz continuity of the value functions. Let $𝒳 := 𝒳_0 = 𝒳_1 = \dots = 𝒳_T$, and apply the same convention to the action space $𝒰$. Define the metrics on $𝒳$ and $𝒰$ as

$ρ_{X} (x, x^{'}) = \| x - x^{'} \|_{\infty}, ρ_{U} (u, u^{'}) = \| u - u^{'} \|_{\infty},$

for any $x, x' \in 𝒳$ and $u, u' \in 𝒰$. Let $2^{𝒰}$ denote the set of all compact subsets of $𝒰$. We endow this space with the Hausdorff metric, given by

$ρ_{H} (U, U^{'}) = \max \{\sup_{u \in U} \inf_{u' \in U'} ρ_{U} (u, u^{'}), \sup_{u' \in U'} \inf_{u \in U} ρ_{U} (u, u^{'})\},$

for all compact $U, U' \subset 𝒰$.
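For finite sets of actions, which are a special case of compact sets, the Hausdorff metric under the sup norm can be computed directly from its definition. The short sketch below is illustrative; the function names and the two example sets are assumptions.

```python
def sup_dist(u, v):
    """Sup-norm distance between two action vectors."""
    return max(abs(a - b) for a, b in zip(u, v))


def hausdorff(U, V):
    """Hausdorff distance between two finite, non-empty sets of vectors."""
    d_uv = max(min(sup_dist(u, v) for v in V) for u in U)
    d_vu = max(min(sup_dist(u, v) for u in U) for v in V)
    return max(d_uv, d_vu)


U = [(0.0, 0.0), (1.0, 0.0)]
V = [(0.0, 0.0), (1.0, 2.0)]
print(hausdorff(U, V))  # 2.0: the point (1, 2) in V is 2 away from every point of U
```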

Theorem 3: The value function v_s* is a Lipschitz continuous function.

Proof. We first claim the following statements:

- 1. The correspondence $Γ: 𝒳_s \to 2^{𝒰}$ is Lipschitz continuous with coefficient $L_Γ = \max\{r, 1\}$: for any $x, x' \in 𝒳_s$, we have $ρ_H(Γ(x), Γ(x')) \leq L_Γ ρ_X(x, x')$.
- 2. For every noise realization w, the state transition function $f(\cdot, \cdot, w)$ is Lipschitz continuous in $(x, u) \in 𝒳_s \times 𝒰_s$ with Lipschitz coefficient $L_f(w) ≡ 1$ and $L_P := \int L_f(w) ℙ(dw) = 1 < \infty$.
- 3. The cost function $c_s: 𝒰_s \to ℝ$ is Lipschitz continuous with Lipschitz coefficient $L_{c_s} := \|c_s\|_1$.

We can write Γ(x) as $Γ(x) = \{u \in 𝒰_s : u \geq 0, Q_1 u \leq Q_2 x, Q_3 u = Q_4 x\}$ for appropriate matrices $Q_1, Q_2, Q_3, Q_4$ that have bounded entries. Thus, the constraint set is a polyhedral set, and we conclude that Γ is a Lipschitz continuous correspondence with Lipschitz coefficient $L_Γ$. The exact value of the Lipschitz coefficient is difficult to derive; more detailed discussions of upper bounds on $L_Γ$ are available in the published literature.

We now prove the second claim. Using triangle inequality, we have

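$\| f (x, u, w) - f (x^{'}, u^{'}, w) \|_{\infty} \leq \| A \|_{\infty} \| x - x^{'} \|_{\infty} + \| B \|_{\infty} \| u - u^{'} \|_{\infty} \leq \| x - x^{'} \|_{\infty} + \| u - u^{'} \|_{\infty},$

which shows that f is Lipschitz continuous over $𝒳_s \times 𝒰_s$ with Lipschitz coefficient 1. The Lipschitz coefficient of the cost function follows from Hölder's inequality, $|c_s^{T}(u - u')| \leq \|c_s\|_1 \|u - u'\|_{\infty}$.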

The Lipschitz continuity of the value function then follows by backward induction, outlined as follows. Suppose that $v_{s+1}^*$ is Lipschitz continuous with Lipschitz coefficient $L_{v_{s+1}^*}$. Then, it can be concluded that

$| v_{s}^{*} (x) - v_{s}^{*} (x^{'}) | \leq (L_{c_{s}} L_{Γ} + L_{v_{s + 1}^{*}} (1 + L_{Γ})) ρ_{X} (x, x^{'}),$

which implies $v_s^*$ is Lipschitz continuous with Lipschitz constant $L_{v_s^*} = L_{c_s} L_Γ + L_{v_{s+1}^*}(1 + L_Γ)$ (since $L_P = 1$). The induction step is complete.
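The recursion for the Lipschitz constants can be unrolled backward from the terminal condition, where the zero value function has Lipschitz constant 0. The sketch below evaluates $L_{v_s^*} = L_{c_s} L_Γ + L_{v_{s+1}^*}(1 + L_Γ)$ for hypothetical values of $L_{c_s}$ and $L_Γ$; the numbers are assumptions chosen only to show how the bound grows over the horizon.

```python
def lipschitz_bounds(L_c, L_gamma, T):
    """Backward recursion L_v[s] = L_c[s] * L_gamma + L_v[s+1] * (1 + L_gamma),
    starting from L_v[T+1] = 0 for the zero terminal value function."""
    L_v = {T + 1: 0.0}
    for s in range(T, 0, -1):
        L_v[s] = L_c[s] * L_gamma + L_v[s + 1] * (1.0 + L_gamma)
    return L_v


# Hypothetical constants: ||c_s||_1 = 2 at every step and L_Gamma = 1.
bounds = lipschitz_bounds(L_c={1: 2.0, 2: 2.0, 3: 2.0}, L_gamma=1.0, T=3)
print([bounds[s] for s in (3, 2, 1)])  # [2.0, 6.0, 14.0]
```

The bound grows geometrically in the number of remaining periods, so it is an upper bound on sensitivity rather than a tight estimate.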

Robustness of Value Functions with Respect to Parameters

The problem identified here has multiple parameters that can change over time. For instance, the cost of acquiring electricity in the wholesale markets or the distribution of the EV arrival process may change slightly over time. This can be studied under the umbrella of parameterized dynamic programs, where the parameters influence the cost/profit functions or the EV arrival process. We investigate in this section the continuity of the value function as a function of the parameters. We identify some sufficient conditions under which a slight change in the parameters would lead to a slight change in the value function. This allows us to conclude the robustness of the scheduling algorithm with respect to small parametric uncertainty.

Let $Θ \subset ℝ^{q}$ be the parameter space, which is assumed to be a compact subset of a Euclidean space. We consider a parameterized optimization problem, parameterized by θ∈Θ, in which:

- 1. $\tilde{c}_s(θ)$ is the negative profit function; and
- 2. The probability distribution of the EV arrival process $\tilde{w}_s$ is given by $v_s(\cdot, θ)$.

The parameterized dynamic program is then rewritten as:

${\tilde{v}}_{s}^{*} (x_{s}, θ) = \inf_{u_{s} \in Γ (x_{s})} {{\tilde{c}}_{s} (θ)}^{T} u_{s} + 𝔼_{v_{s} (θ)} [{\tilde{v}}_{s + 1}^{*} (f (x_{s}, u_{s}, {\tilde{w}}_{s}, d_{s + 1}), θ)] .$

Here, $\tilde{v}_s^*: 𝒳_s \times Θ \to ℝ$ is the optimal parameterized value function. We also let $\tilde{π}_s^*(x_s, θ)$ be the corresponding parameterized scheduling policy. We identify some sufficient conditions and establish the continuity of $\tilde{v}_s^*$ and the lower semicontinuity of $\tilde{π}_s^*$ below.

Assumption 2. The following holds:

- 1. $\tilde{c}_s$ is continuous on Θ; and
- 2. There exists a base probability measure $λ_s$ and a continuous and bounded function $β_s$ of (w, θ), taking values in [0, ∞), such that $v_s(dw, θ) = β_s(w, θ) λ_s(dw)$.

Theorem 4. Suppose that Assumption 2 holds. Then, $\tilde{v}_s^*$ is jointly continuous on $𝒳_s \times Θ$ and $\tilde{π}_s^*$ is lower semi-continuous on $𝒳_s \times Θ$.

Proof. Assumption 2(1) implies that the cost function $(u_s, θ) \mapsto \tilde{c}_s(θ)^{T} u_s$ is jointly continuous on $Θ \times 𝒰_s$. Since the state transition function f is a linear map, linearity of f and Assumption 2(2) imply that for any $h \in 𝒞_b(𝒳_{s+1})$ and any convergent sequence $\{(x_n, u_n, θ_n)\}_n \subset 𝒳_s \times 𝒰_s \times Θ$ satisfying $(x_n, u_n, θ_n) \to (x, u, θ)$, we have $h(f(x_n, u_n, w, d)) β_s(w, θ_n) \to h(f(x, u, w, d)) β_s(w, θ)$. Further, since h and $β_s$ are continuous and bounded functions, we conclude that

$\lim_{n \to \infty} \int h (f (x_{n}, u_{n}, w, d^{'})) v_{s} (dw, θ_{n}) = \lim_{n \to \infty} \int h (f (x_{n}, u_{n}, w, d^{'})) β_{s} (w, θ_{n}) λ_{s} (dw) \overset{(a)}{=} \int h (f (x, u, w, d^{'})) β_{s} (w, θ) λ_{s} (dw) = \int h (f (x, u, w, d^{'})) v_{s} (dw, θ),$

where the equality in (a) results from the dominated convergence theorem, as the state, action, and noise spaces and Θ are compact.

Note that we have also shown in the proof of Theorem 3 that $Γ(x_s)$ is a continuous and compact-valued correspondence. Thus, it follows that $\tilde{v}_s^*$ is continuous on $𝒳_s \times Θ$ and $\tilde{π}_s^*$ is lower semi-continuous on $𝒳_s \times Θ$, which completes the proof.

In contrast to the above, which is based on a single demand type (reliable demand), we now take into account the flexible demand that allows flexible charging between a minimum target SoC and a maximum target SoC within the available charging time. In this scenario, the dimensionality of the state space of the flexible demand, $𝒳_s^{f}$, increases linearly with L, which significantly increases the computational complexity. To alleviate the problem, we decouple the original problem into L+1 serial stages to reduce the dimensionality of the state space. The detailed algorithm and the corresponding optimality results are described below.

The scheduling problem (7) can be solved by the Empirical Fitted Value Iteration algorithm, which aims at efficiently solving an approximate dynamic program. As previously described with respect to a single demand type, consider the Bellman operators at s=1, . . . ,T,

$\begin{matrix} v_{s}^{*} (x_{s}) = H_{s + 1} (v_{s + 1}^{*}) (x_{s}) := \inf_{u_{s} \in Γ (x_{s})} c_{s} (x_{s}, u_{s}) + 𝔼 [v_{s + 1}^{*} (f (x_{s}, u_{s}, W_{s}, d_{s + 1}))], & (9) \end{matrix}$

where $v_{T+1}^*(x_{T+1}) ≡ 0$. Here, with a slight abuse of notation, we let $c_s$ be a general bounded Lipschitz continuous cost function at time s, and Γ be a Lipschitz continuous correspondence. We also remove the superscripts r, f here to indicate that this algorithm can be used for any stochastic dynamic program.

Let $𝒞_b(𝒳_s)$ denote the space of continuous and bounded functions over $𝒳_s$ endowed with the supremum norm. To solve (9), we again use the empirical Bellman operator $\hat{H}_{s+1}^{k}: 𝒞_b(𝒳_s) \to 𝒞_b(𝒳_s)$ to approximate the actual Bellman operator $H_{s+1}$. Let $\{W_{s,i}\}_{i=1}^{k}$ be a sequence of independent and identically distributed (i.i.d.) samples of $w_s$. Then the empirical Bellman operator $\hat{H}_{s+1}^{k}$ is given by

$\begin{matrix} {\hat{v}}_{s}^{k} (x_{s}) = {\hat{H}}_{s + 1}^{k} ({\hat{v}}_{s + 1}^{k}) (x_{s}) := \inf_{u_{s} \in Γ (x_{s})} c_{s} (x_{s}, u_{s}) + \frac{1}{k} \sum_{i = 1}^{k} {\hat{v}}_{s + 1}^{k} (f (x_{s}, u_{s}, W_{s, i})) . & (10) \end{matrix}$

As before, the value approximator $\hat{v}_s^{k}$ is stored by projecting it onto a feasible function approximating class, such as neural networks or a reproducing kernel Hilbert space (RKHS), which is dense in $𝒞_b(𝒳_s)$. Let $𝒢_d(𝒳_s) \subset 𝒞_b(𝒳_s)$ be the function approximating class parameterized by $d \in ℕ$. We then create a data set $\{x_{s,j}, \hat{v}_s^{k}(x_{s,j})\}_{j=1}^{k'}$, where $\{x_{s,j}\}_{j=1}^{k'}$ are uniformly sampled from the state space $𝒳_s$, and $\hat{v}_s^{k}(x_{s,j})$ is obtained according to (10). We let $Π_s^{k',d}: 𝒞_b(𝒳_s) \to 𝒢_d(𝒳_s)$ be the function approximating projection that maps the output of $\hat{H}_{s+1}^{k}(\hat{v}_{s+1}^{k})$ to a function in $𝒢_d$. This is defined as

$\begin{matrix} Π_{s}^{k', d} ({\hat{v}}_{s}^{k}) = \arg \inf_{h \in 𝒢_{d}} Loss ({\hat{v}}_{s}^{k}, h ❘ \{x_{s, j}\}_{j = 1}^{k'}), & (11) \end{matrix}$

where the loss function $Loss: 𝒞_b(𝒳_s) \times 𝒢_d(𝒳_s) \to ℝ_+$ can be picked as the mean squared error between two functions

$Loss ({\hat{v}}_{s}^{k}, h ❘ {x_{s, j}}_{j = 1}^{k'}) = \frac{1}{k^{'}} \sum_{j = 1}^{k'} {({\hat{v}}_{s}^{k} (x_{s, j}) - h (x_{s, j}))}^{2} .$

We again construct a composite operator that combines the empirical Bellman operator and the function approximating operator. We obtain an approximate value function $\hat{v}_s$ by replacing the actual Bellman operator $H_{s+1}$ with a random fitted empirical Bellman operator $Ψ_s^{k,k',d}$, which is given by

$Ψ_{s}^{k, k', d} = Π_{s}^{k', d} \circ {\hat{H}}_{s + 1}^{k} : 𝒢_{d} (𝒳_{s + 1}) \to 𝒢_{d} (𝒳_{s}) .$

Here, k is the number of noise samples generated, d is a parameter describing the size of the function approximating class, and k′ is the number of state samples used in computing the empirical loss function for the projection operation. To increase k, k′, d simultaneously, we let $j \in ℕ$ and k(j), k′(j), d(j) be such that as j→∞, we have k(j), k′(j), d(j)→∞. In this case, the fitted value iteration algorithm is given by

$\begin{matrix} {\hat{v}}_{s}^{j} (x_{s}) = Ψ_{s}^{j} ({\hat{v}}_{s + 1}^{j}) (x_{s}) := Ψ_{s}^{k (j), k' (j), d (j)} ({\hat{v}}_{s + 1}^{k (j), k' (j), d (j)}) (x_{s}) . & (12) \end{matrix}$

The convergence of $\hat{v}_s^{j}$ is proven in Theorem 1 as previously described, and is restated here for reference.

Lemma 3. If Assumption 1 holds, then $\hat{v}_s^{j}$ satisfies, for any κ>0,

$\underset{j \to \infty}{limsup} ℙ (\| {\hat{v}}_{s}^{j} - v_{s}^{*} \|_{\infty} > κ) = 0.$

Note that the convergence result is established under j→∞, which means that computing $\hat{v}_s^{j}$ with a small error κ requires a sufficiently large number of samples k(j). However, if we apply (12) to the problem (8), it becomes practically intractable with multiple types of demand, e.g., reliable (inflexible) and flexible demand. As previously described, the dimensionality of the action space is $\dim(𝒰_s) = \dim(𝒰_s^{r}) + \sum_{l=1}^{L} \dim(𝒰_s^{f,l})$, which increases linearly with L. It can be shown that this leads to a computational complexity of at least O(L²) for solving the optimization (8). Further, the state space dimensionality also increases linearly with L, i.e., $\dim(𝒳_s) = \dim(𝒳_s^{r}) + \sum_{l=1}^{L} \dim(𝒳_s^{f,l})$, which also leads to high sample and computational complexity. The quadratic scaling of computational time in the size of the state and action spaces is addressed by dividing the scheduling problem into two subproblems in which the state and action space of each subproblem is smaller. This reduces the computational time at the expense of a small loss in optimality. However, we also identify a sufficient condition under which there is no loss in optimality.

Reducing Complexity Through Two Stages of Scheduling

In this section, we describe a two-stage algorithm that sequentially solves (7) according to each type of demand. This can reduce the dimensionality of the state space $𝒳_s^{r} \times 𝒳_s^{f}$, which simplifies the empirical fitted value iteration.

The Bellman equation (8) is decoupled as follows: we separate the state space $𝒳_s$ into $𝒳_s^{r}$ and $𝒳_s^{f,1}, \dots, 𝒳_s^{f,L}$, and consider a sub-scheduling problem on each separated space. That is, we sequentially solve the original problem based on the types of demands, where each stage solves an empirical fitted value iteration on $𝒳_s^{r}, 𝒳_s^{f,1}, \dots, 𝒳_s^{f,L}$ in turn. Though this may seem like an intuitive decoupling, the key challenge comes from the control variables $u_s^{r}$ and $u_s^{f,1}, \dots, u_s^{f,L}$ sharing the same bounds $d_s$ in the last inequalities of (6). This motivates obtaining $d_s^{r}$ and $d_s^{f,1}, \dots, d_s^{f,L}$ sequentially so as to satisfy (4), as described below.

Reliable Demand

We first let $d_s^{r} = d_s$, which means that the reliable demand may consume all the electricity available to the platform. Then, the reliable demand is scheduled by solving the Bellman equation (9) for reliable demand only. Specifically, we let

$\begin{matrix} π_{s}^{r} (x_{s}^{r}) = \underset{u_{s}^{r} \in Γ^{r} (x_{s}^{r})}{arginf} c_{s}^{r^{T}} u_{s}^{r} + 𝔼 [v_{s + 1}^{r *} (f^{r} (x_{s}^{r}, u_{s}^{r}, W_{s}^{r}))], & (13) \end{matrix}$

where $v_s^{r*}: 𝒳_s^{r} \to ℝ$ is the optimal value function for the cost of reliable demand only. Here, we use the empirical fitted value iteration to solve (13) and obtain the approximated value function $\hat{v}_s^{r,j}$. The approximated optimal control action at each time s is $\hat{u}_s^{r*} = \hat{π}_s^{r,j}(x_s^{r})$, where $\hat{π}_s^{r,j}$ is the approximated scheduling policy corresponding to $\hat{v}_s^{r,j}$. Then, the expected total cost based on the given initial state $x^{r}$ is

$J^{r} ({\hat{π}}^{r, j}; x^{r}) = 𝔼 [\sum_{s = 1}^{T} c_{s}^{r^{T}} {\hat{u}}_{s}^{r *} ❘ x_{0}^{r} = x^{r}] .$

Flexible Demand

Similarly to the reliable demand, the flexible demand with minimum demand l is scheduled by the policy

$\begin{matrix} π_{s}^{f, l} (x_{s}^{f, l}) = \underset{u_{s}^{f, l} \in Γ^{f, l} (x_{s}^{f, l})}{arginf} c_{s}^{f, l^{T}} u_{s}^{f, l} + p_{s}^{f, l^{T}} {({\underline{z}}_{s}^{f, l} - u_{s}^{f, l})}_{(t, m, n, l) \in 𝒥_{s, 1}^{f, l}}^{+} + 𝔼 [v_{s + 1}^{f, l *} (f^{f, l} (x_{s}^{f, l}, u_{s}^{f, l}, W_{s}^{f, l}))], & (14) \end{matrix}$

where $v_s^{f,l*}: 𝒳_s^{f,l} \to ℝ$ is the optimal value function for flexible demand with minimum demand l only. In this case, we denote by $\hat{v}_s^{f,l,j}$ the approximator for $v_s^{f,l*}$ obtained using the empirical fitted value iteration (12). We then obtain $\hat{π}_s^{f,l,j}(x_s^{f,l})$ as the corresponding approximated optimal policy.

However, we need to use the optimal action û_s^r* to compute the remaining electricity that can be allocated to the flexible demand. That is, we compute the d_s^f,lby

$\begin{matrix} {\underline{d}}_{s}^{f, l} = {({\underline{d}}_{s} - 1^{T} {\hat{u}}_{s}^{r *} - \sum_{i = 1}^{L} 1^{T} {\hat{u}}_{s}^{f, i *})}^{+}, {\overline{d}}_{s}^{f, l} = {({\overline{d}}_{s} - 1^{T} {\hat{u}}_{s}^{r *} - \sum_{i = 1}^{L} 1^{T} {\hat{u}}_{s}^{f, i *})}^{+}, & (15) \end{matrix}$

where $\hat{u}_s^{f,l*} = \hat{π}_s^{f,l,j}(x_s^{f,l})$ is the approximated optimal action for the flexible demand with minimum demand l.
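One way to read the sequential use of (15) is sketched below: the reliable demand is scheduled first against the full bounds, and each flexible class is then scheduled against whatever remains after the classes scheduled before it. This ordering, the function names, and the greedy stand-in schedulers are assumptions made for illustration; the disclosure itself obtains each action from the fitted policies.

```python
def remaining_bounds(d_lo, d_hi, used):
    """Positive part of the aggregate bounds after subtracting energy already allocated."""
    return max(d_lo - used, 0.0), max(d_hi - used, 0.0)


def sequential_allocation(d_lo, d_hi, u_reliable, flexible_schedulers):
    """Schedule flexible classes one after another against the remaining bounds.
    Each scheduler is a callable (lo, hi) -> list of per-category energies."""
    used = sum(u_reliable)
    allocations = []
    for schedule in flexible_schedulers:
        lo, hi = remaining_bounds(d_lo, d_hi, used)
        u = schedule(lo, hi)
        allocations.append(u)
        used += sum(u)
    return allocations


# Hypothetical stand-ins for the fitted flexible policies: charge up to a
# desired amount, capped by the remaining upper bound.
want_3 = lambda lo, hi: [min(3.0, hi)]
want_2 = lambda lo, hi: [min(2.0, hi)]

print(sequential_allocation(0.0, 10.0, u_reliable=[4.0, 2.0],
                            flexible_schedulers=[want_3, want_2]))
# [[3.0], [1.0]]: after 6 units of reliable demand, 4 remain, then 1
```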

Here, it is straightforward to verify that computing $d_s^{f,l}$ with (15) for each l=1, . . . , L yields $d_s^{r}, d_s^{f,1}, \dots, d_s^{f,L}$ satisfying (4). In this case, the optimal expected total cost for flexible demand is given by

$J^{f, l} ({\hat{π}}^{f, l, j}; x^{f, l}) = 𝔼 [\sum_{s = 1}^{T} (c_{s}^{f, l^{T}} {\hat{u}}_{s}^{f, l *} + p_{s}^{f, l^{T}} {(l - \sum_{τ = t}^{s} {\hat{u}}_{τ}^{f, l *})}_{(t, m, n, l) \in 𝒥_{s, 1}^{f, l}}^{+}) ❘ x_{0}^{f, l} = x^{f, l}] .$

We establish the following sufficient condition to guarantee the optimality of the above decoupling procedure.

Lemma 4. Suppose $d_1, \dots, d_T$ satisfy, for all s=1, . . . , T,

$\begin{matrix} {\underline{d}}_{s} = 0, {\overline{d}}_{s} \geq r (1^{T} y_{s}^{r} + \sum_{l = 1}^{L} 1^{T} y_{s}^{f, l}), & (16) \end{matrix}$

then for any κ>0,

$\underset{j \to \infty}{limsup} ℙ (❘ J^{r} ({\hat{π}}^{r, j}; x^{r}) + \sum_{l = 1}^{L} J^{f, l} ({\hat{π}}^{f, l, j}; x^{f, l}) - J (π; x) ❘ \geq κ) = 0.$
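Condition (16) says that the upper energy bound is never binding: even if every connected EV charges at the full rate r, the aggregate stays within $\overline{d}_s$. A minimal check of the condition for one time step might look as follows; the function name and the example counts are hypothetical.

```python
def decoupling_condition_holds(d_lo, d_hi, r, y_reliable, y_flexible):
    """Check condition (16) at one time step: d_lo == 0 and
    d_hi >= r * (number of reliable EVs + number of flexible EVs over all classes)."""
    total_evs = sum(y_reliable) + sum(sum(y) for y in y_flexible)
    return d_lo == 0 and d_hi >= r * total_evs


# Hypothetical counts: 3 reliable EVs and two flexible classes with 1 and 3 EVs.
y_reliable = [2, 1]
y_flexible = [[1], [3]]
print(decoupling_condition_holds(0.0, 50.0, r=7.0,
                                 y_reliable=y_reliable, y_flexible=y_flexible))  # True
print(decoupling_condition_holds(0.0, 40.0, r=7.0,
                                 y_reliable=y_reliable, y_flexible=y_flexible))  # False
```

When the check fails, the decoupled schedule remains feasible by construction of (15) but may incur a loss in optimality relative to the joint problem.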

Proof. We first show that if (16) holds, then $u_s^{r*} = π_s^{r}(x_s^{r})$ and $u_s^{f,l*} = π_s^{f,l}(x_s^{f,l})$ for l=1, . . . , L minimize (8). We then apply the triangle inequality to prove the probability bounds. The feasible action set of (8) defined in (6) yields that for any $(u_s^{r}, u_s^{f}) \in Γ(x_s^{r}, x_s^{f})$, we have $u_s^{r} \in Γ^{r}(x_s^{r})$ and $u_s^{f,l} \in Γ^{f,l}(x_s^{f,l})$. Thus,

$0 \leq u_{s}^{r} \leq g (x_{s}^{r}) = \min \{r y_{s}^{r}, z_{s}^{r}\} \leq r y_{s}^{r},$

$0 \leq u_{s}^{f, l} \leq g (x_{s}^{f, l}) = \min \{r y_{s}^{f, l}, z_{s}^{f, l}\} \leq r y_{s}^{f, l},$

by (2) and (3). That is, if (16) holds, then

${\underline{d}}_{s} = 0 \leq 1^{T} u_{s}^{r} + \sum_{l = 1}^{L} 1^{T} u_{s}^{f, l} \leq r (1^{T} y_{s}^{r} + \sum_{l = 1}^{L} 1^{T} y_{s}^{f, l}) \leq {\overline{d}}_{s}, for all s = 1, \dots, T,$

which implies

$Γ^{r} (x_{s}^{r}) \subset {u_{s}^{r} \in 𝒰_{s}^{r} : {\underline{d}}_{s} \leq 1^{T} u_{s}^{r} + \sum_{l = 1}^{L} 1^{T} u_{s}^{f, l} \leq {\overline{d}}_{s}, u_{s}^{f, l} \in Γ^{f, l} (x_{s}^{f, l}), for all l = 1, \dots, L}, Γ^{f, l} (x_{s}^{f, l}) \subset {u_{s}^{f, l} \in 𝒰_{s}^{f, l} : {\underline{d}}_{s} \leq 1^{T} u_{s}^{r} + \sum_{l' = 1}^{L} 1^{T} u_{s}^{f, l'} \leq {\overline{d}}_{s}, u_{s}^{r} \in Γ^{r} (x_{s}^{r}), u_{s}^{f, l} \in Γ^{f, l} (x_{s}^{f, l}), for all l^{'} = 1, \dots, L, l^{'} \neq l} .$

Thus, we can conclude that (16) implies

$Γ (x_{s}^{r}, x_{s}^{f}) = \{(u_{s}^{r}, u_{s}^{f}) \in 𝒰_{s}^{r} \times 𝒰_{s}^{f} : u_{s}^{r} \in Γ^{r} (x_{s}^{r}), u_{s}^{f, l} \in Γ^{f, l} (x_{s}^{f, l}), for all l = 1, \dots, L\} .$

By applying the principle of mathematical induction from s=T to s=1 with v_T*, v_T^r*, v_T^f,l*≡0, we have (8) being equivalent to

$v_{s}^{*} (x_{s}) = \inf_{u_{s}^{r} \in Γ (x_{s}^{r}), u_{s}^{f, l} \in Γ (x_{s}^{f, l}), l = 1, \dots, L} c_{s}^{r^{T}} u_{s}^{r} + \sum_{l = 1}^{L} (c_{s}^{f, l^{T}} u_{s}^{f, l} + {p_{s}^{f, l^{T}} (z_{`s}^{f, l} - u_{s}^{f, l})}_{(t, m, n) \in {𝒥`}_{s, 1}^{f, l}}^{+}) + 𝔼 [v_{s + 1}^{*} (f (x_{s}, u_{s}, W_{s}, d_{s + 1}))] = v_{s}^{r *} (x_{s}^{r}) + \sum_{l = 1}^{L} v_{s}^{f, l *} (x_{s}^{f, l}), for all s = 1, \dots, T .$

This indicates that, given (16), if u_s^{r*}=π^r(x_s^r) and u_s^{f,l*}=π^{f,l}(x_s^{f,l}) for l=1, . . . , L, then (u_s^{r*}, u_s^{f,1*}, . . . , u_s^{f,L*}) minimizes (8).

We now prove the probability bounds. By Lemma 3, we have for any κ>0,

$\begin{matrix} 0 = \underset{j \to \infty}{limsup} ℙ (\| {\hat{v}}_{s}^{r, j} - v_{s}^{r *} \|_{\infty} > κ) \\ = \underset{j \to \infty}{limsup} ℙ (\| {\hat{v}}_{s}^{f, l, j} - v_{s}^{f, l *} \|_{\infty} > κ) . \end{matrix}$

That is, for any x=(x^r, x^{f,1}, . . . , x^{f,L})∈𝒳₀^r×𝒳₀^f and κ>0,

$\begin{matrix} \underset{j \to \infty}{limsup} ℙ (❘ J^{r} ({\hat{π}}^{r, j}; x^{r}) + \sum_{l = 1}^{L} J^{f, l} ({\hat{π}}^{f, l, j}; x^{f, l}) - J (π; x) ❘ \geq κ) \\ = \underset{j \to \infty}{limsup} ℙ (❘ J^{r} ({\hat{π}}^{r, j}; x^{r}) + \sum_{l = 1}^{L} J^{f, l} ({\hat{π}}^{f, l, j}; x^{f, l}) - v_{0}^{r *} (x^{r}) - \sum_{l = 1}^{L} v_{0}^{f, l *} (x^{f, l}) ❘ \geq κ) \\ \leq \underset{j \to \infty}{limsup} ℙ (❘ J^{r} ({\hat{π}}^{r, j}; x^{r}) - v_{0}^{r *} (x^{r}) ❘ \geq κ) + \sum_{l = 1}^{L} \underset{j \to \infty}{limsup} ℙ (❘ J^{f, l} ({\hat{π}}^{f, l, j}; x^{f, l}) - v_{0}^{f, l *} (x^{f, l}) ❘ \geq κ) \\ \leq \underset{j \to \infty}{limsup} ℙ (\| {\hat{v}}_{0}^{r, j} - v_{0}^{r *} \|_{\infty} > κ) + \sum_{l = 1}^{L} \underset{j \to \infty}{limsup} ℙ (\| {\hat{v}}_{0}^{f, l, j} - v_{0}^{f, l *} \|_{\infty} > κ) = 0, \end{matrix}$

which completes the proof.

Algorithm

Having computed the value functions using the fitted value iteration algorithm described above, we provide a detailed ADP algorithm to obtain u₀*, u₁*, . . . , u_T*. We compute the approximate optimal actions for reliable and flexible demand, û_s^{r*} and û_s^{f,1*}, . . . , û_s^{f,L*}, using a multistage rollout algorithm. Each EV in a category can then be charged according to any disaggregation algorithm, such as first-come, first-served (FCFS). The overall algorithm is described below.

A coarse approximation of the computational complexity of the algorithm may be provided as follows. Let U=max{dim(𝒰_s^r), dim(𝒰_s^{f,1}), . . . , dim(𝒰_s^{f,L})}. Then the time complexity of solving a linear optimization with constraints is at least O((L+1)²U^2.5). By using the two-stage algorithm according to the present disclosure, solving each stage has a time complexity of at least O(U^2.5), and the total time complexity is at least O((L+1)U^2.5).

The multi-stage EV charging scheduling and control algorithm according to embodiments of the disclosure may be summarized by the following pseudocode:

Part I: Multistage Fitted Value Iteration

Initialize v_{T+1}* ≡ 0.
FOR s = T, . . ., 1 DO
    Generate state and noise samples for reliable demand: {x_{s,j}^r}_{j=1}^{k′} and {W_{s,i}^r}_{i=1}^{k}.
    Create the data set {x_{s,j}^r, v̂_s^{r,j}(x_{s,j})}_{j=1}^{k′} using the fitted empirical Bellman operator, and obtain v̂_s^{r,j} with neural networks.
    Generate state and noise samples for flexible demand, {x_{s,j}^{f,l}}_{j=1}^{k′} and {W_{s,i}^{f,l}}_{i=1}^{k}, for each l = 1, . . ., L.
    Create the data set {x_{s,j}^{f,l}, v̂_s^{f,l,j}(x_{s,j})}_{j=1}^{k′} using the fitted empirical Bellman operator, and obtain v̂_s^{f,l,j} with neural networks for each l = 1, . . ., L.
END FOR

Part II: Multistage Rollout Algorithm

Initialize x₀ = 0.
FOR s = 1, . . ., T DO
    Update x_s using (5), and decouple x_s into x_s^r, x_s^{f,1}, . . ., x_s^{f,L}.
    Pick d_s^r = d_s and compute û_s^{r*} = π̂_s^r(x_s^r) by (13).
    FOR l = 1, . . ., L DO
        Update d_s^{f,l} by (15).
        Compute û_s^{f,l*} = π̂_s^{f,l}(x_s^{f,l}) with (14).
    END FOR
    FOR (t,m,n) ∈ T × B^r and (t,m,n,l) ∈ T × B^f DO
        Charge each EV η units of electricity in the interval from s to s + 1 based on the FCFS discipline, where:
        IF in category (t,m,n) THEN
            Reliable demand: η = û_s^{t,m,n*} / y_s^{t,m,n}
        ELSE
            Flexible demand: η = û_s^{t,m,n,l*} / y_s^{t,m,n,l}
        END IF
    END FOR
END FOR
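
The disaggregation step at the end of Part II divides each category's aggregate action by the number of EVs in that category. The following Python sketch illustrates that division only. The dictionary layout, the function name, and the handling of empty categories are assumptions made for illustration and are not specified in the disclosure.

```python
from typing import Dict, Hashable


def per_ev_energy(
    aggregate_energy: Dict[Hashable, float],
    ev_counts: Dict[Hashable, int],
) -> Dict[Hashable, float]:
    """Split each category's aggregate action u evenly over its y EVs (eta = u / y).

    Keys are category tuples such as (t, m, n) for reliable demand or
    (t, m, n, l) for flexible demand.
    """
    eta = {}
    for category, u in aggregate_energy.items():
        y = ev_counts.get(category, 0)
        # Assumption: a category with no EVs receives no energy.
        eta[category] = u / y if y > 0 else 0.0
    return eta
```

For example, an aggregate action of 30 kWh for a category holding four EVs gives each EV 7.5 kWh in that interval.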

Numerical Results
Simulation Setup

In this illustration of operation of the system or method for scheduling and controlling EV charging based on the scheduling of a large number of EVs, we consider the scheduling of EV charging for a T=24 hour period, that is, from 7 AM (day 1) to 7 AM (day 2). The electricity prices may vary according to a time-of-use schedule having two or more ranges or categories as well as the day of the week (such as weekdays/weekends) and a summer/winter season, for example. In this illustration, electricity prices vary according to peak/off-peak hours during the same season and days with the same rate (weekdays). We consider two types of customers, i.e. L=1, and the customers pay a constant price of 9.2¢/kWh for reliable demand and 7.36¢/kWh (20% discount) for flexible demand. The cost c_s is taken to be the difference between the electricity price and the revenue per kWh from the customers. The platform will compensate customers at p_s=2.5¢/kWh if their minimum flexible demand is not met. In what follows, we will interpret the respective costs associated with reliable demand and flexible demand, c_s^r and c_s^{f,l}, as negative profits instead. These parameters are shown in Table 1 below:

TABLE 1

Electricity Prices During Weekdays

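| Time (h) | 7-14 | 14-18 | 18-22 | 22-7 |
| --- | --- | --- | --- | --- |
| Peak hours | Mid-Peak | On-Peak | Mid-Peak | Off-Peak |
| c_s^r (¢/kWh) | 0 | 7.4 | 0 | −4.4 |
| c_s^{f,1} (¢/kWh) | 1.84 | 9.24 | 1.84 | −2.56 |
| p_s^{f,1} (¢/kWh) | 2.5 | 2.5 | 2.5 | 2.5 |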
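
The cost rows of Table 1 follow from the stated rule that c_s is the electricity price minus the per-kWh revenue. The Python sketch below reproduces them. Note that the retail electricity prices are not given in the text: the values used here are back-computed from the table as c_s^r + 9.2 and are an assumption, as are the function and variable names.

```python
# Assumed time-of-use prices in cents/kWh, back-computed from Table 1 (c_s^r + 9.2).
IMPLIED_PRICE = {"mid-peak": 9.2, "on-peak": 16.6, "off-peak": 4.8}
RELIABLE_REVENUE = 9.2   # cents/kWh paid by reliable-demand customers
FLEXIBLE_REVENUE = 7.36  # cents/kWh paid by flexible-demand customers (20% discount)


def period_of_hour(hour: int) -> str:
    """Map an hour of day (0-23) to the weekday period used in Table 1."""
    if 14 <= hour < 18:
        return "on-peak"
    if 7 <= hour < 14 or 18 <= hour < 22:
        return "mid-peak"
    return "off-peak"  # 22:00 to 07:00


def unit_costs(hour: int) -> tuple:
    """Return (c_s^r, c_s^{f,1}) in cents/kWh: price minus revenue per kWh."""
    price = IMPLIED_PRICE[period_of_hour(hour)]
    return (round(price - RELIABLE_REVENUE, 2), round(price - FLEXIBLE_REVENUE, 2))
```

In every period the flexible cost exceeds the reliable cost by 1.84¢/kWh, which is exactly the 20% discount on 9.2¢/kWh.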

For this example, we choose M=3 and N=6, and the charging rate is fixed at r=10 kW. The feasible menus B^r and B^{f,1} are given in Table 2 below. The dimensionality of the state and action spaces is then dim(𝒳_s^r)=182, dim(𝒳_s^{f,1})=272 and dim(𝒰_s^r)=dim(𝒰_s^{f,1})=90. The arrival processes {w_t^r} and {w_t^{f,1}} are sequences of random variables with Poisson distribution. The distribution of the arrival process is deduced from the ACN Dataset. We also pick d_s=[0 kWh, 10000 kWh] as the hourly grid bounds.

TABLE 2

Feasible Menu Given M = 3 And N = 6

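| B^r, B^{f,1} | n = 1 | n = 2 | n = 3 | n = 4 | n = 5 | n = 6 |
| --- | --- | --- | --- | --- | --- | --- |
| m = 10 kWh | (1, 1) | (1, 2) | (1, 3) | (1, 4) | (1, 5) | (1, 6) |
| m = 20 kWh | x | (2, 2) | (2, 3) | (2, 4) | (2, 5) | (2, 6) |
| m = 30 kWh | x | x | (3, 3) | (3, 4) | (3, 5) | (3, 6) |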
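
The pattern of feasible entries in Table 2 is consistent with the rule n ≥ m. One plausible reading, which is an assumption here, is that n counts the hours available before the deadline: at r=10 kW, delivering m units of 10 kWh takes at least m hours. A minimal Python sketch that enumerates the menu under this reading (the function name is hypothetical):

```python
from typing import List, Tuple


def feasible_menu(M: int = 3, N: int = 6) -> List[Tuple[int, int]]:
    """List the (m, n) pairs with n >= m, matching the non-'x' cells of Table 2.

    m indexes the energy level (m * 10 kWh) and n is assumed to be the number
    of hourly steps available for charging.
    """
    return [(m, n) for m in range(1, M + 1) for n in range(1, N + 1) if n >= m]
```

With M=3 and N=6 this yields 6 + 5 + 4 = 15 feasible menu items.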

For the projection operator, we choose the number of state samples and noise samples to be 64 (i.e. j=64). The function-approximating class ℱ_d is the set of neural networks with width dim(𝒳_s^r)×2=364 for reliable demand and dim(𝒳_s^{f,1})×2=544 for flexible demand, and depth 8. The learning rate is chosen as 0.005. The empirical fitted value iteration is employed to compute the value functions v̂_s^{r,j} and v̂_s^{f,1,j}.

Results of Reliable and Flexible Demand

We demonstrate the performance of our ADP algorithm, denoted as ADP, by comparing it with two other algorithms: SP (simple programming) and FCFS (first-come, first-served). The SP algorithm computes the optimal cost with knowledge of all future demand; in this case, the problem reduces to solving a linear program with constraints. It is formulated as a deterministic optimization problem since {w_t^r} and {w_t^{f,1}} are known. We denote the optimal actions of SP as {u_{SP,s}^{r*}} and {u_{SP,s}^{f,1*}}. The second algorithm, FCFS, follows the first-come, first-served discipline, which charges the EVs immediately when they arrive at the charging station. This is the most widely used scheduling algorithm across the world. Let the actions of FCFS be denoted by {u_{FCFS,s}^{r*}} and {u_{FCFS,s}^{f,1*}}. An overview of the information required by the three algorithms is given in Table 3 below.

TABLE 3

Application Scenarios of the Algorithms

Given Knowledge of the Future

| Algorithms | Future Demand | Demand Distribution | No Knowledge |
| --- | --- | --- | --- |
| SP | Yes | No | No |
| ADP | Yes | Yes | No |
| FCFS | Yes | Yes | Yes |
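
Because FCFS needs no knowledge of the future, its behavior for a single EV is simple to state: charge at the full rate from arrival until the requested energy is delivered. The Python sketch below illustrates this for one EV on an hourly grid. The function name, the hourly discretization, and the omission of grid bounds are simplifying assumptions and do not reproduce the simulation used for the figures.

```python
from typing import List


def fcfs_schedule(
    arrival_hour: int,
    energy_kwh: float,
    rate_kw: float = 10.0,
    horizon: int = 24,
) -> List[float]:
    """Return the energy (kWh) delivered in each hourly step under FCFS.

    Charging starts in the arrival step and continues at rate_kw until the
    requested energy is met or the horizon ends.
    """
    schedule = [0.0] * horizon
    remaining = energy_kwh
    step = arrival_hour
    while remaining > 0 and step < horizon:
        delivered = min(rate_kw, remaining)  # at most rate_kw kWh per hourly step
        schedule[step] = delivered
        remaining -= delivered
        step += 1
    return schedule
```

An EV arriving in step 2 and requesting 25 kWh is charged 10, 10, and 5 kWh in steps 2 to 4, regardless of the electricity price in those hours.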

We similarly denote the actions of our ADP algorithm as {u_{ADP,s}^{r*}} and {u_{ADP,s}^{f,1*}}. We define the cumulative cost for each sample path as

$\begin{matrix} J_{α, t}^{*} = \sum_{s = 1}^{T} (c_{s}^{r^{T}} u_{α, s}^{r *} + {p_{s}^{r^{T}} (m - \sum_{τ = t}^{s} u_{α, s}^{r *})}_{(t, m, n) \in 𝒥_{s, 1}^{r}}^{+} + c_{s}^{f, l^{T}} u_{α, s}^{f, l *} + {p_{s}^{f, l^{T}} (l - \sum_{τ = t}^{s} u_{α, s}^{f, l *})}_{(t, m, n, l) \in {𝒥`}_{s, 1}^{f, l}}^{+}) & (17) \end{matrix}$

where α∈{ADP,SP,FCFS}. Note that J_SP* is a lower bound on J_ADP* and J_FCFS*, since SP knows all the future demand. Ten sample paths are used to compare the performance of these algorithms, and the results are shown in FIG. 2. Since ADP and SP exploit knowledge of the future demand distribution or of the demand itself, their profits are much higher than those of the FCFS charging policy. We consider two types of FCFS for flexible demand: charging the EV up to the minimum target SoC or up to the maximum target SoC.

We can further observe from FIG. 2 that, except for FCFS with maximum demand, all of the algorithms serve a similar amount of demand by the end of the day. In fact, ADP serves more demand and achieves a profit close to that of SP. The optimality gap between SP and ADP in the upper plot of FIG. 2 is due to the approximation error of the value function and the uncertainty about the future in the ADP algorithm.

FIG. 3 depicts the error from the approximation in the flexible demand setting. This error results from two major factors in serving flexible demand: 1) the value function is more difficult to approximate due to the penalty term in the cost function; 2) the two-stage optimization makes the value function of flexible demand more sensitive to the results of the reliable demand optimization, and the function approximator requires more data samples to achieve comparable accuracy. Under the reliable demand setting, the profits of SP and ADP are similar, whereas, under the flexible setting, there is an optimality gap between SP and ADP. However, ADP performs better than FCFS in both settings.

We also observe that the penalty of violating the charging demand can significantly affect the profits in the flexible demand setting, which is shown in FIG. 4. A lower penalty leads to lower energy consumption but higher profits since it allows dropping flexible demand during the peak hours.

As the penalty p_s^{f,l} increases, the optimality gap between SP and ADP becomes larger, as demonstrated in FIG. 5. Here, the FCFS algorithm is not affected by the penalty changes. Note that at p_s^{f,1}=1.84 and p_s^{f,1}=9.24, the SP energy consumption increases due to the penalty being higher than the charging cost during the mid-peak and peak hours respectively, and the demand with high charging cost cannot be dropped. This leads to a trade-off between the total charge provided and the cumulative profits.
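
The thresholds at 1.84 and 9.24 can be read off Table 1 by comparing the penalty with the flexible charging cost in each period. The Python sketch below makes this comparison explicit. It is a deliberately myopic illustration: it ignores deferring charge to cheaper hours, the grid bounds, and the value function, so it is not the SP or ADP policy. The function name is hypothetical.

```python
# Flexible-demand unit costs c_s^{f,1} in cents/kWh, taken from Table 1.
COST_FLEX = {"mid-peak": 1.84, "on-peak": 9.24, "off-peak": -2.56}


def periods_worth_dropping(penalty: float) -> list:
    """Return the periods in which paying the penalty is cheaper than charging.

    A period is listed when its unit charging cost strictly exceeds the
    per-kWh penalty; ties are treated as 'serve the demand'.
    """
    return sorted(period for period, cost in COST_FLEX.items() if cost > penalty)
```

With the penalty of 2.5¢/kWh used in the simulation, only on-peak demand is cheaper to drop than to serve under this comparison; once the penalty reaches 9.24¢/kWh, no period qualifies.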

Constraints Relaxation for Reliable Demand

As long as the grid bounds d_s are sufficiently large, the reliable demand can always be fully satisfied. However, in various regions and/or during various times, the available grid power may be limited to less than the reliable demand. In this case, the limited grid bounds d_s lead to Γ^r(x_s^r)=∅, and thus there is no feasible solution for (13). To circumvent this, we also add a penalty term to the reliable demand setting. The resulting policy solves the following Bellman equation

$\begin{matrix} π_{s}^{r} (x_{s}^{r}) = \underset{u_{s}^{r} \in Γ^{r'} (x_{s}^{r})}{arginf} c_{s}^{r^{T}} u_{s}^{r} + {p_{s}^{r^{T}} (z_{s}^{r} - \underset{\underset{penalty for reliable demand}{︸}}{u_{s}^{r}})}_{(t, m, n) \in {𝒥`}_{s, 1}^{r}}^{+} + 𝔼 [v_{s + 1}^{r *} (f^{r} (x_{s}^{r}, u_{s}^{r}, W_{s}^{r}))], & (18) \end{matrix}$

where Γ^r′ is the relaxed feasible action set, i.e.

$Γ^{r'} (x_{s}^{r}) := \{u_{s}^{r} \in 𝒰_{s}^{r} : 0 \leq u_{s}^{r} \leq g (x_{s}^{r}), {\underline{d}}_{s}^{r} \leq 1^{T} u_{s}^{r} \leq {\overline{d}}_{s}^{r}\},$

and v_s^{r′*} is the corresponding value function. For this example, we choose d_s=[0, 6000 kWh] and d_s=[0, 8000 kWh] to demonstrate the performance of each algorithm in this scenario, which is shown in FIG. 5. In this case, we replace the heuristic baseline FCFS with EDF (earliest deadline first), since EDF requires smaller grid bounds to meet the charging demand. The results are depicted in FIG. 6. During the peak hours, SP and ADP consume 0 kWh of electricity since the demand is not served and the platform pays penalties to the customers; this leads to SP and ADP having a higher profit than EDF. In the context of FIG. 6, we can observe that the optimality gap between SP and ADP caused by the multi-stage algorithm shrinks if the grid bound is sufficiently large (d_s=10000 kWh). This was established in Lemma 4.
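
To illustrate why EDF copes better than FCFS with a tight grid bound, the Python sketch below allocates a limited per-step energy budget to EVs in order of deadline rather than arrival. The record layout, the function name, and the single-step view are assumptions made for illustration; the disclosure does not specify the EDF baseline at this level of detail.

```python
from typing import Dict, List


def edf_allocate(
    evs: List[dict],
    grid_cap_kwh: float,
    rate_kw: float = 10.0,
) -> Dict[str, float]:
    """Allocate one hourly step's grid budget by earliest deadline first.

    Each EV is a dict with keys "id", "remaining_kwh" and "deadline".
    EVs with the nearest deadline are served first, each receiving at most
    rate_kw kWh, until the budget grid_cap_kwh is exhausted.
    """
    allocation = {}
    budget = grid_cap_kwh
    for ev in sorted(evs, key=lambda e: e["deadline"]):
        energy = min(rate_kw, ev["remaining_kwh"], budget)
        allocation[ev["id"]] = energy
        budget -= energy
    return allocation
```

With a budget of 18 kWh and three EVs with deadlines 2, 3, and 5, the first two receive what they can absorb in the step and the EV with the latest deadline receives only the remainder.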

As described herein, scheduling of EV charging may be modeled with multiple types of reliability constraints as a stochastic dynamic program. Due to very high dimensional state and action spaces and a high number of constraints, the resulting problem cannot practically be solved using the usual dynamic programming algorithm. As such, various embodiments according to the disclosure use fitted value iteration to solve the problem, and apply a multi-stage algorithm to reduce the computational complexity of the solution approach. Simulations show that this algorithm yields profits close to the optimal profits obtained under full information about the future demand of the EVs, and that it outperforms heuristic algorithms such as FCFS and EDF. This disclosure also demonstrates robustness of the algorithm with relaxed constraints in the optimization. While the disclosed two-stage decoupling algorithm may provide acceptable results for various applications, it may be further improved to reduce the optimality gap.

While exemplary embodiments are described above, it is not intended that these embodiments describe all possible forms encompassed by the claims. The words used in the specification are words of description rather than limitation, and it is understood that various changes can be made without departing from the spirit and scope of the disclosure. As previously described, the features of various embodiments can be combined to form further embodiments of the invention that may not be explicitly described or illustrated. While various embodiments could have been described as providing advantages or being preferred over other embodiments or prior art implementations with respect to one or more desired characteristics, those of ordinary skill in the art recognize that one or more features or characteristics can be compromised to achieve desired overall system attributes, which depend on the specific application and implementation. These attributes may include, but are not limited to cost, strength, durability, marketability, appearance, packaging, size, serviceability, weight, manufacturability, ease of assembly, etc. As such, embodiments described as less desirable than other embodiments or prior art implementations with respect to one or more characteristics are not outside the scope of the disclosure and can be desirable for particular applications.
