SMART CHARGE SCHEDULING FOR AN AGGREGATE OF ELECTRIC VEHICLES CONSIDERING GRID DEMAND

Information

  • Patent Application
  • 20240034177
  • Publication Number
    20240034177
  • Date Filed
    July 20, 2023
  • Date Published
    February 01, 2024
  • CPC
    • B60L53/63
    • H02J7/0071
    • B60L53/62
    • H02J7/0048
    • B60L53/67
    • B60L53/64
  • International Classifications
    • B60L53/63
    • H02J7/00
    • B60L53/62
    • B60L53/67
    • B60L53/64
Abstract
A system and method for controlling charging of multiple electric vehicles (EVs) arriving at, and departing from, different charging stations at different times, includes scheduling charging of each EV of the multiple EVs responsive to which one of a plurality of categories each EV is assigned, each EV being assigned to one of the categories according to an arrival time at an associated one of the different charging stations, a departure time from the associated one of the different charging stations, an initial state of charge (SoC) of the EV, and a target SoC of the EV, and controlling charging of each EV responsive to the scheduling of charging for the assigned category of each EV. Charging demand may be selected by each EV as being reliable or flexible with flexible charging demand having a minimum target SoC and maximum target SoC.
Description
TECHNICAL FIELD

This disclosure relates to controlling charging of individual electric vehicles (EVs) based on charge scheduling considering vehicle settings in addition to aggregate electric demand and cost of electric grid power.


BACKGROUND

An ongoing shift toward electrification and the adoption of EVs has raised concern among electric utility companies with respect to the large loads of energy drawn from the grid for sustained periods of time to charge these vehicles. Charging demands from EVs (averaging around 16 kWh/EV/day) can strain the electric grid to the point that utility companies are implementing various demand management strategies. EV sales in the United States alone increased by 85% from 2020 to 2021 according to the US Department of Energy. This rapid growth poses challenges for both charging service providers (charging platforms) and grid operators, as a large-scale EV charging market leads to a highly random and significant load on the power grid. This problem can be mitigated by: 1) ramping up and down the energy output generated by dispatchable power plants (which generally use non-renewable sources of energy); or 2) scheduling the charging of EVs by a platform to coordinate with the grid operators and renewable energy providers. The former approach requires significant capital investment in building new infrastructure. On the other hand, the latter approach is more straightforward to implement due to the inherent flexibility of the EV charging process and widespread availability of smart chargers. For example, charging one EV may take up to two hours, but the customer may park the EV in a charging station overnight, allowing for slower, delayed, or preemptive charging. Furthermore, customers may opt for a flexible charging service that allows the charging provider to charge the EV between a minimum target SoC and a maximum target SoC.


Scheduling algorithms for EV charging have been developed under various assumptions and with specific goals. Typically, in a day-ahead market, the platform has complete information about the future demand, and thus, the charging process can be scheduled offline by a deterministic algorithm. For instance, various algorithms have been developed to solve valley-filling problems that shave the difference between the charging loads and grid capacity. Another set of algorithms optimize the profits/costs/social-welfare of charging through a deterministic optimization. Other strategies employ game theoretic approaches.


However, current algorithms either do not exploit information that is available from past data or are too computationally complex to schedule a large number of EVs. For instance, online algorithms that schedule according to a departure deadline or laxity of charging, i.e., Earliest Deadline First (EDF) and Least Laxity First (LLF) algorithms, do not incorporate the information of future demand that can be inferred from past data, and thus, are sub-optimal in many scenarios.


If the distribution of the future demand is known, then the charging platform can apply Model Predictive Control (MPC), scenario-based algorithms, or other stochastic optimization methods to optimize the charging schedule. For instance, MPC has been used to maximize charging profits for each EV or to track a specified demand trajectory. Nonetheless, these algorithms schedule the charging processes either with integer programming or using the dynamics of individual EVs. This leads to an increase in the computational complexity (potentially exponentially) as the number of EVs grows in the market and corresponding challenges to real-time implementation in a large-scale EV charging market. This intractability is also present in recently proposed data-driven reinforcement learning based scheduling algorithms due to the very high number of past samples needed for learning an approximately optimal policy.


In addition to the limitation stated above, these algorithms consider only a single type of demand. Existing multi-type demand strategies mainly focus on the types or levels of charging rates among the EVs. For instance, Bayram et al. applied a multi-class queue network to model the charging services with different charging rates. The goal is to optimize the quality of the service and the charging cost by tuning the prices of charging that affect the demand rate. Kong et al. also used the queue network framework to allocate appropriate chargers to different types of EVs. Khalkhali et al. proposed a two-stage algorithm that schedules EV charging with slow/fast charging services to minimize the expected charging costs.


SUMMARY

The present inventors have recognized that controlling charging of EVs with a focus on multiple types of flexibility of charging demand benefits both the customers and the charging platform in that the customers can choose a lower charging price in exchange for platform flexibility of not charging to their specified target SoC. This provides the platform more flexibility to drop the aggregated charging demand during the peak-hours, which can reduce the charging costs to the platform. As such, in one or more embodiments according to the disclosure, control of charging individual EVs is performed based on preemptive scheduling of charging a large number of EVs with electric grid services using a stochastic dynamic program with a state-dependent action constraint.


In one or more embodiments, control of charging an aggregate of EVs is based on use of approximate dynamic programming (ADP) to compute a scheduling algorithm for the charging of the EVs that maximizes the profit of the EV charging platform. This algorithm facilitates a mix of inflexible and flexible charging demand types employing a multi-stage algorithm that efficiently solves the high dimensional scheduling problem, along with the complexity and optimality analysis. Each EV that arrives at a charging station is assigned a category depending on its arrival/departure time and initial/target state of charge (SoC). This categorization allows scheduling and control of the charging process for a large number (on the order of millions) of EVs because the computational complexity depends only on the number of the categories, rather than the number of EVs. In addition, the system and method allow the customer to specify a flexible charging demand with a minimum target SoC and a maximum target SoC. While this additional flexibility adds an extra dimension in the state and action space that would otherwise lead to at least $O(L^2)$ time complexity, where $L$ is the number of classes, various embodiments employ a multi-stage algorithm that sequentially solves the scheduling problem to reduce complexity to $O(L)$ time complexity. The sufficient condition for the multi-stage algorithm to be optimal is also described.


Embodiments may include a method for controlling charging of multiple electric vehicles (EVs) arriving at, and departing from, different charging stations at different times, comprising, by one or more processors: scheduling charging of each EV of the multiple EVs responsive to which one of a plurality of categories each EV is assigned, each EV assigned to one of the categories according to an arrival time at an associated one of the different charging stations, a departure time from the associated one of the different charging stations, an initial state of charge (SoC) of the EV, and a target SoC of the EV; and controlling charging of each EV responsive to the scheduling of charging for the assigned category of each EV. The method may further include assigning each EV to one of the categories according to one of a plurality of charging demand types designated by the EV. The plurality of charging demand types may include a reliable charging demand and a flexible charging demand. For EVs designating the flexible charging demand, the scheduling may be responsive to a specified minimum target SoC and a specified maximum target SoC of the EV.


In various embodiments, scheduling charging of each EV includes scheduling charging of EVs designating the reliable charging demand assuming the EVs designating the reliable charging demand will consume all available electricity during a specified time period based on an associated grid upper demand band for the specified period, and scheduling charging of EVs designating the flexible charging demand based on the specified minimum target SoC of the EVs designating the flexible charging demand. Scheduling charging of each EV may also include allocating available electricity from an electrical grid to each of the plurality of categories based on a number of EVs in each category arriving at the charging stations during a designated time period and a cost associated with allocated available electricity, the allocated available electricity limited by a minimum of remaining electricity available for each category and charging capacity of the multiple EVs. In various embodiments, the cost associated with the allocated available electricity corresponds to a net cost corresponding to cost of electricity supplied by an electric grid less a cost associated with a penalty corresponding to the allocated available electricity corresponding to EVs designating the reliable charging demand being insufficient to charge the EVs designating the reliable charging demand to associated target SoCs. The cost associated with the allocated available electricity may correspond to a net cost corresponding to cost of electricity supplied by an electric grid less a cost associated with a penalty corresponding to the allocated available electricity corresponding to EVs designating the flexible charging demand being insufficient to charge the EVs designating the flexible charging demand to associated minimum target SoCs.


In at least one embodiment, scheduling charging of each EV includes determining a number of EVs in each category designating a reliable charging demand and arriving at a specified time using a designated statistical distribution, allocating electricity from the grid available for charging to each category designating the reliable charging demand for a second specified time period, associating electricity cost of electricity allocated to each category designating a reliable charging demand, and allocating any electricity available after satisfying the reliable charging demand for the second specified time period to EVs designating the flexible charging demand. The scheduling may further include limiting allocation of electricity from the grid available for charging to each category designating the reliable charging demand to a minimum of remaining electricity available from the grid and charging capacity of the EVs designating the reliable charging demand.


Embodiments may also include a computer-implemented method for controlling charging of a large number of electric vehicles (EVs) arriving and departing different charging stations at different times, the method comprising, by one or more computers: assigning one of a plurality of categories to each EV that arrives at a charging station depending on arrival time to the charging station, designated departure time from the charging station, initial state of charge (SoC) of the EV, target SoC of the EV, and a charging demand type specified by the EV; scheduling charging of each EV according to which of the plurality of categories the EV has been assigned and the charging type specified by the EV; and controlling charging of each EV based on the scheduling. The method may include scheduling based on the demand type specified by the EV corresponding to a flexible demand type, wherein EVs specifying the flexible demand type specify a minimum target SoC and a maximum target SoC, and wherein controlling charging is based on the minimum target SoC and the maximum target SoC specified by the EV. The method may include, for categories associated with the flexible demand type, scheduling based on controlling the charging to satisfy the minimum target SoC for each EV specifying the flexible demand type, and continuing to charge to the maximum target SoC responsive to profit associated with charging to the maximum target SoC exceeding a threshold. In various embodiments, the demand type includes a reliable demand type, wherein scheduling charging comprises allocating electricity available from the grid to categories associated with the reliable demand type before allocating the electricity available from the grid to categories associated with the flexible demand type. The scheduling may include assigning a penalty cost for each EV specifying the flexible demand type that is not charged to at least the minimum target SoC prior to the departure time. 
In various embodiments, the scheduling is based on a statistical distribution representing EV arrival times to the charging stations and designated departure times from the charging stations.


Various embodiments include a system comprising a plurality of electric vehicle (EV) charging stations each configured to charge a plugged EV during a time period specified by at least one remotely-located processor, the processor configured to schedule charging of EVs for all of the charging stations by scheduling a plurality of charging categories, each EV assigned to one of the plurality of categories by the processor based on arrival time to a charging station, expected departure time from the charging station, state of charge (SoC) of the EV upon arrival, target SoC of the EV before the expected departure time, and a charging demand type specified by the EV. Each of the plurality of charging stations may include a processor configured to control charging of an associated plugged EV according to the charging schedule. The remotely-located processor may be configured to schedule charging of EVs based on a minimum and maximum available power provided from an associated electric grid. The charging demand type may include a flexible demand having an associated minimum target SoC and maximum target SoC specified by the EV, wherein the remotely-located processor is further configured to schedule charging of EVs to provide the minimum target SoC for all EVs specifying the flexible demand, and to continue charging the EVs specifying the flexible demand above the minimum target SoC only if an associated profit exceeds a threshold.


Embodiments of the disclosure may provide one or more associated advantages. For example, vehicle manufacturers may facilitate aggregate scheduling and control of EV charging by platforms in consideration of grid demand by providing vehicle customers the ability to designate flexible charging parameters via a vehicle human-machine interface (HMI), such as a touch-screen display or wired/wireless connected smart device. Similarly, customer preferences, such as inflexible charging or flexible charging and corresponding minimum/maximum target SoC, may be automatically communicated to the charging platform. The ability of the vehicle manufacturer to control an individual EV charging demand based on default or customer-specified settings allows shifting of charging demand temporally and/or geographically to assist the utility companies' demand management strategy by coupling the capability to control charging of an EV with the readily available flexibility in charge scheduling while that EV is parked and plugged. The scalable and tractable framework for coordinating the charging of a large number of EVs according to embodiments of the present disclosure creates a mutually beneficial system for customers, charging platforms, and the electric utilities. Control of aggregated charging demand by the vehicle manufacturer may provide the ability for OEMs to participate in the wholesale electricity market by bidding in capacity markets, demand response markets, and aggregator's markets, for example.


As those of ordinary skill in the art will appreciate, the claimed subject matter enables exchange of vehicle data in a more efficient and secure manner, enhances the data validity check before using the data for vehicle and other operations, and protects data users (whether human or controllers) from being sniffed, spoofed, or hacked.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 illustrates scheduling and control of EV charging for a large number of EVs using categorization of EVs.



FIG. 2 illustrates relative performance in terms of profits and energy, respectively, of approximate dynamic programming (ADP), simple programming (SP), and first-come first-serve (FCFS) algorithms.



FIG. 3 illustrates cumulative profits and energy consumption for reliable demand compared to cumulative profits and energy consumption for flexible demand.



FIG. 4 illustrates profits and energy consumption associated with two different penalty amounts.



FIG. 5 illustrates profits and energy consumption for flexible demand relative to varying penalties.



FIG. 6 illustrates profits and energy consumption for different grid power bounds.





DETAILED DESCRIPTION

Embodiments of the present disclosure are described herein. It is to be understood, however, that the disclosed embodiments are merely examples and other embodiments can take various and alternative forms. The figures are not necessarily to scale; some features could be exaggerated or minimized to show details of particular components. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a representative basis for teaching one skilled in the art to variously employ the present invention. As those of ordinary skill in the art will understand, various features illustrated and described with reference to any one of the figures can be combined with features illustrated in one or more other figures to produce embodiments that are not explicitly illustrated or described. The combinations of features illustrated provide representative embodiments for typical applications. Various combinations and modifications of the features consistent with the teachings of this disclosure, however, could be desired for particular applications or implementations.


Control logic, functions, code, software, algorithms, strategy etc. as described herein with reference to the figures is performed by one or more processors, controllers, computers, etc. executing instructions stored in one or more non-transitory computer readable media to control charging of individual EVs based on scheduling of the charging processes of a large number of EVs. Various steps or functions illustrated or described may be performed in the specified sequence, in parallel, or in some cases omitted. Although not always explicitly illustrated, one of ordinary skill in the art will recognize that one or more of the illustrated steps or functions may be repeatedly performed. Similarly, the specified order of processing is not necessarily required to achieve the features and advantages described herein, but is provided for ease of illustration and description.


Notations

Let $x=(x_i)_{i\in\mathcal{I}}$ be a real multivariate with an index set $\mathcal{I}$. $\min\{x,y\}:=(\min\{x_i,y_i\})_{i\in\mathcal{I}}$ is the element-wise minimum of $x$ and $y$. We let $(x_i)^+_{i\in\mathcal{I}}:=\max\{(x_i)_{i\in\mathcal{I}},0\}$ denote the element-wise positive part. Further, $\mathbf{0}_{d\times d'}$ is a $d\times d'$ dimensional all-zero matrix, and $\mathbf{I}_{d\times d}$ is a $d\times d$ dimensional identity matrix. Let $\mathcal{C}_b(\mathcal{X})$ denote the space of all continuous and bounded functions endowed with the supremum norm on a set $\mathcal{X}$.


Other notations used in this disclosure are provided below:

    • $x_s^r$ / $x_s^{f,l}$: state for reliable/flexible demands
    • $y_s^r$ / $y_s^{f,l}$: vector of number of plugged-in EVs with reliable/flexible demands
    • $w_s^r$ / $w_s^{f,l}$: vector of number of new arrivals with reliable/flexible demands
    • $u_s^r$ / $u_s^{f,l}$: vector of the electricity allocation to each reliable/flexible category
    • $d_s^r$ / $d_s^{f,l}$: vector of grid bounds for reliable/flexible demands
    • $c_s^r$ / $c_s^{f,l}$: vector of charging cost for reliable/flexible demands
    • $z_s^r$: vector of remaining electricity required for each reliable demand category
    • $z_s^{f,l}$ / $\underline{z}_s^{f,l}$: vector of maximum/minimum remaining electricity required for each flexible demand category
    • $p_s^{f,l}$: vector of penalty if the minimum request $l$ is not met
    • $\mathcal{M}^r$ / $\mathcal{M}^{f,l}$: menu for reliable/flexible demands
    • $r$: charging rate


According to the present disclosure, the EV charge scheduling problem is formulated as a dynamic optimization problem with a finite time horizon $\mathcal{T}=\{0,\dots,T\}$. Consider an operator that provides a menu-based charging service: a customer plugs in the EV at time $t$ and selects the menu item $(m, n)$ in a charging application on a connected smartphone or on the panel of the charger, for example, before the charging process. The facility will supply $m$ units of electricity to the EV from time $t$ to time $t+n-1$.


The EV charging scheduling and corresponding control problem is illustrated graphically in FIG. 1. System 100 provides EV charging scheduling and control for a group of EVs 120 including a large number of individual EVs 110 based on aggregate demand of the group and preferences or settings of the individual EVs 110 to achieve a particular goal, such as maximizing profit of the charging platform and/or managing grid demand, for example. The group of EVs 120 are typically distributed across a geographic area and arrive at different types of charging facilities 130 at different times with different planned departure times. Charging facilities 130 may include home chargers 132 and commercial charging stations 134, for example. The large group of EVs 120 are not necessarily associated or affiliated with a common manufacturer, charging facility, fleet owner, vehicle type, etc. and may be knowingly or unknowingly affiliated with a particular platform 140 that provides remote control of EV charging based on scheduling performed by platform 140 as described herein. The large group of EVs 120 may include thousands or millions of EVs across a large geographic area.


Customer vehicle charging preferences may be set by default by a vehicle manufacturer and/or may be selected via an app associated with the vehicle and accessed via a mobile computing device, such as a smartphone or computer, for example. One or more menu items or preferences may be selected in advance or upon arrival and plug-in at a home charger 132 or charging station 134. Those of ordinary skill in the art will appreciate that the menu parameters m, n, etc. described herein are not necessarily selectable or otherwise displayed to the customer, but may be determined by a particular charging platform 140 and used in scheduling and controlling EV charging over a selected time period as generally represented at 150 and described in detail herein. In one embodiment, menu parameters m, n, etc. are computed based on the current/target state of charge (SoC) and arrival/parking time entered by the customers, or automatically communicated by the vehicle, charging app, etc. to the charging platform, or otherwise detected by the charging platform 140. In at least one embodiment, two types of charging demand may be provided by charging platform 140 and coordinated via customer preference selection: reliable (or inflexible) demand and flexible demand. For simplicity and ease of exposition, the following description assumes that all EVs charge with a constant charging rate given by r kW. Of course, those of ordinary skill in the art will recognize that the described simplified implementation may be extended to implementations with variable charging rates or multiple charging rates.
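As a rough illustration of how menu parameters might be derived from customer inputs, the following Python sketch converts an initial/target SoC and a parking duration into a menu item (m, n). The battery capacity, the size of one electricity unit (`unit_kwh`), and the slot length (`slot_hours`) are hypothetical inputs that this disclosure does not specify, and the function names are illustrative only.

```python
import math


def menu_item(initial_soc, target_soc, battery_kwh, parking_hours,
              unit_kwh=1.0, slot_hours=1.0):
    """Return (m, n): m units of electricity to deliver within n time slots.

    Assumptions (not from the disclosure): SoC values are fractions in
    [0, 1], one electricity unit equals unit_kwh, one slot lasts slot_hours.
    """
    if not 0.0 <= initial_soc <= target_soc <= 1.0:
        raise ValueError("need 0 <= initial_soc <= target_soc <= 1")
    energy_kwh = (target_soc - initial_soc) * battery_kwh
    # Round the demand up to whole units; the inner round() guards against
    # floating-point noise such as 36.00000000000001.
    m = math.ceil(round(energy_kwh / unit_kwh, 9))
    # Only whole slots inside the parking window are usable.
    n = math.floor(parking_hours / slot_hours)
    return m, n


def is_servable(m, n, rate_kw, unit_kwh=1.0, slot_hours=1.0):
    """True if m units can be delivered in n slots at a constant rate of r kW."""
    return m * unit_kwh <= rate_kw * slot_hours * n
```

For instance, under these assumptions a 60 kWh battery charged from 25% to 75% over an 8-hour stay maps to the menu item (30, 8), which a 7 kW charger could serve.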


For inflexible charging or reliable demand, the menu selections are denoted by $\mathcal{M}^r\subset\{1,\dots,M\}\times\{1,\dots,N\}$, where an item $(m,n)\in\mathcal{M}^r$ means that the charging facility or platform 140 will provide $m$ units of electricity within the next $n$ time slots. We further let $\mathcal{M}_n^r=\{m:(m,n)\in\mathcal{M}^r\}$. As illustrated in FIG. 1, the platform operator 140 assigns a category $(t,m,n)\in\mathcal{T}\times\mathcal{M}^r$ to every EV depending on the preferences input by the EV owner in a smartphone app or via another vehicle or charging facility HMI as represented at 142. Let $\mathcal{J}_s^r\subset\mathcal{T}\times\mathcal{M}^r$ denote the categories of EVs that are present at time $s$. Define $\mathcal{J}_{s,1}^r=\{(t,m,n):s=t+n-1\}$ to be the categories of the EVs that are connected at time $s$ but will depart at time $s+1$, and $\mathcal{J}_{s,2}^r=\mathcal{J}_s^r\setminus\mathcal{J}_{s,1}^r$.
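The categorization described above can be sketched in a few lines of Python. The sketch assumes that a category (t, m, n) is present at time s when t ≤ s ≤ t + n − 1, which follows the charging window stated earlier but is an interpretation rather than a definition given in this disclosure; the function names are hypothetical.

```python
from collections import Counter


def categorize(evs):
    """Count EVs per category; evs is an iterable of (t, m, n) tuples."""
    return Counter(evs)


def present_categories(counts, s):
    """Categories connected at time s (assumed: t <= s <= t + n - 1)."""
    return {c for c in counts if c[0] <= s <= c[0] + c[2] - 1}


def split_by_departure(categories, s):
    """Split into (last_slot, remaining).

    last_slot holds the categories with s == t + n - 1, i.e. connected at s
    and departing at s + 1; remaining holds every other present category.
    """
    last_slot = {c for c in categories if s == c[0] + c[2] - 1}
    return last_slot, categories - last_slot
```

Because the scheduler only sees the per-category counts, the size of the data it handles grows with the number of distinct categories rather than with the number of EVs.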


Let $w^{t,m,n}$ be the number of EVs in category $(m, n)$ that arrive at time $t$, which is a non-negative, bounded, integer-valued random variable with a known distribution. The selected known distribution may be supported by observational data collected by the charging platform or otherwise determined for a particular application or implementation. We let $w^{t,m,n}=0$ whenever $t<0$. We further assume that these random variables are mutually independent. Let $w_t^r$ be the random vector representing all new arrivals at time $t$:

$$w_t^r=\left[(w^{t,m,1})_{m\in\mathcal{M}_1^r},\ \dots,\ (w^{t,m,N})_{m\in\mathcal{M}_N^r}\right]\in\mathcal{W}_t:=\prod_{(m,n)\in\mathcal{M}^r}\{0,\dots,\bar{w}^{t,m,n}\}\subset\mathbb{Z}_+^{\dim(\mathcal{M}^r)},$$

where $\bar{w}^{t,m,n}$ denotes the upper bound on $w^{t,m,n}$.







We let $y_s^r$ denote the vector of the number of EVs at the charging station in each category in $\mathcal{J}_s^r$:

$$y_s^r:=(w^{t,m,n})_{(t,m,n)\in\mathcal{J}_s^r}\in\mathcal{Y}_s:=\prod_{(t,m,n)\in\mathcal{J}_s^r}\{0,\dots,\bar{w}^{t,m,n}\},$$




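where $y_s^r$ is formed in the order of leaving time, i.e.,

$$y_s^r:=\Big[\underbrace{(w^{s-1,m,1})_{m\in\mathcal{M}_1^r},\ \dots,\ (w^{s-N,m,N})_{m\in\mathcal{M}_N^r}}_{\text{leaving at } s},\ 0,\ \dots,\ \underbrace{(w^{s-1,m,N})_{m\in\mathcal{M}_N^r}}_{\text{leaving at } s+N-1}\Big].$$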





At each time $s$, the total electricity allocated to the EVs in the category $(t,m,n)\in\mathcal{J}_s^r$ is denoted by $u_s^{t,m,n}$. We let $u_s^r$ be the vector of the electricity allocation to each category $(t,m,n)\in\mathcal{J}_s^r$:

$$u_s^r=(u_s^{t,m,n})_{(t,m,n)\in\mathcal{J}_s^r}\in\mathcal{U}_s^r:=\mathbb{R}_+^{\dim(\mathcal{J}_s^r)}.$$






For $t<0$, we let $u_s^{t,m,n}=0$. We also have the constraint that the total electricity allocated to all the categories having reliable demand be in the interval $[\underline{d}_s^r,\bar{d}_s^r]$, that is, $\underline{d}_s^r\le\mathbf{1}^T u_s^r\le\bar{d}_s^r$, where $\mathbf{1}$ is a column vector of all ones of appropriate dimension. Let $d_s^r=[\underline{d}_s^r,\bar{d}_s^r]^T$.


Suppose that allocating one unit of electricity to $(t,m,n)$ at time $s$ incurs a cost $c_s^{t,m,n}$. Then, the total cost to the operator at each time is $(c_s^r)^T u_s^r$, where

$$c_s^r=(c_s^{t,m,n})_{(t,m,n)\in\mathcal{J}_s^r}\in\mathbb{R}^{\dim(\mathcal{J}_s^r)}.$$






Here, $c_s$ can represent either the cost of electricity or the cost of electricity minus the revenue per kWh from the EV owner. Thus, $c_s$ can take positive or negative values.
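The per-slot cost described above is an inner product of the cost vector and the allocation vector. A minimal Python sketch, assuming both vectors are stored as dictionaries keyed by category (a representation chosen here for illustration, with hypothetical function names), is:

```python
def net_unit_cost(grid_price, revenue_per_unit=0.0):
    """Cost per unit seen by the platform: grid price minus customer revenue."""
    return grid_price - revenue_per_unit


def total_cost(cost, alloc):
    """Inner product of per-unit cost and allocation over the same categories."""
    if cost.keys() != alloc.keys():
        raise ValueError("cost and allocation must cover the same categories")
    return sum(cost[k] * alloc[k] for k in cost)
```

When the revenue per unit exceeds the grid price, the net unit cost is negative, so allocating electricity to that category lowers the total cost, which matches the remark that the cost can take positive or negative values.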


Let $z_s^{t,m,n}$ be the remaining electricity required by the category $(t,m,n)\in\mathcal{J}_s^r$, which is updated as

$$z_{s+1}^{t,m,n}=\begin{cases} m\,y^{t,m,n} & s=t-1\\ z_s^{t,m,n}-u_s^{t,m,n} & t\le s\le t+n-2\\ 0 & s\ge t+n-1 \end{cases}\qquad(1)$$







Then, let $z_s^r$ be $(z_s^{t,m,n})_{(t,m,n)\in\mathcal{J}_s^r}$ and let $\mathcal{Z}_s\subset\mathbb{R}_+^{\dim(\mathcal{J}_s^r)}$ be the space of $z_s^r$.
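The update rule of equation (1) for a single category can be written as a short Python function. This is a sketch only: the function name and argument layout are illustrative, and the behavior for times earlier than t − 1, which equation (1) does not cover, is an assumption (an error is raised).

```python
def next_remaining(z_s, u_s, y, category, s):
    """Remaining electricity at time s + 1 for one category (t, m, n).

    z_s: remaining electricity at time s, u_s: allocation at time s,
    y: number of EVs in the category.
    """
    t, m, n = category
    if s == t - 1:
        return m * y            # each of the y arriving EVs requests m units
    if t <= s <= t + n - 2:
        return z_s - u_s        # subtract what was delivered in slot s
    if s >= t + n - 1:
        return 0                # charging window has closed
    raise ValueError("equation (1) is not defined for s < t - 1")
```

For a category (2, 4, 3) with 5 EVs, the remaining demand starts at 20 units at time 2 and decreases by each slot's allocation until the window closes after slot 4.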


We let $x_s^r=[y_s^r,z_s^r,d_s^r]\in\mathcal{X}_s^r$ be the state of the reliable demand, where $\mathcal{X}_s^r:=\mathcal{Y}_s\times\mathcal{Z}_s\times\mathbb{R}_+^2$ is the corresponding state space, and $d_s^r=[\underline{d}_s^r,\bar{d}_s^r]$ is the deterministic "actuation noise". For simplicity, we assume that the noise has a Dirac mass at the point $d_s^r$ in a day-ahead market. This can be relaxed as described in greater detail below.


Let $u_s^r$ be the actions of the system. For each state $x_s^r\in\mathcal{X}_s^r$, the feasible action $u_s^r$ should satisfy $u_s^r\in\Gamma^r(x_s^r)$, where $\Gamma^r:\mathcal{X}_s^r\rightrightarrows\mathcal{U}_s^r$ is a correspondence given by

$$\Gamma^r(x_s^r):=\left\{u_s^r\in\mathcal{U}_s^r:\ 0\le u_s^r\le g^r(x_s^r),\ \ \underline{d}_s^r\le\mathbf{1}^T u_s^r\le\bar{d}_s^r,\ \ u_s^{t,m,n}=z_s^{t,m,n}\ \text{for all } (t,m,n)\in\mathcal{J}_{s,1}^r\right\},\qquad(2)$$


where $g^r(x_s^r):=\min\{r\,y_s^r,z_s^r\}$. Here, $u_s^r\in\Gamma^r(x_s^r)$ guarantees that, at each time $s$, the allocated electricity is upper bounded by the minimum of the remaining electricity $z_s^r$ and the charging capacity $r\,y_s^r$.
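The constraints of equation (2) can be checked directly for a candidate allocation. The following Python sketch assumes the vectors are stored as dictionaries keyed by category (t, m, n) and uses a small numerical tolerance; both choices, and the function names, are illustrative rather than part of this disclosure.

```python
def upper_bound(rate, y, z):
    """Element-wise min of charging capacity r*y and remaining electricity z."""
    return {k: min(rate * y[k], z[k]) for k in y}


def is_feasible_reliable(u, y, z, rate, d_low, d_high, s, tol=1e-9):
    """Check the three conditions of equation (2) for allocation u at time s."""
    g = upper_bound(rate, y, z)
    for k, alloc in u.items():
        t, m, n = k
        # Box constraint: 0 <= u <= min(r*y, z).
        if alloc < -tol or alloc > g[k] + tol:
            return False
        # Categories in their last slot (s == t + n - 1) must be fully served.
        if s == t + n - 1 and abs(alloc - z[k]) > tol:
            return False
    # Aggregate allocation must lie within the grid bounds.
    return d_low - tol <= sum(u.values()) <= d_high + tol
```

The equality check on the last slot is what makes the demand "reliable": a category about to depart must receive exactly its remaining electricity.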


As previously described, in addition to the inflexible charging selection represented by the reliable demand described above, various embodiments according to the disclosure provide aggregate scheduling of EVs that select a flexible charging service so that the platform can charge these EVs to an SoC between a selected or specified minimum target SoC and maximum target SoC. In this scenario, the menu is denoted by $\mathcal{M}^f\subset\{1,\dots,M\}\times\{1,\dots,N\}\times\{1,\dots,L\}$, where an item $(m,n,l)$ represents that the facility provides at least $l$ and at most $m$ units of electricity within $n$ time slots. The notations used in the flexible demand setting are similar to the notations used previously for the reliable demand setting, but with the superscripts $(t,m,n)$ and $r$ replaced by $(t,m,n,l)$ and $(f,l)$, respectively, for the flexible demand with minimum demand $l$. For instance, we denote $w_s^{f,l}$, $y_s^{f,l}$, $z_s^{f,l}$, $c_s^{f,l}$ and $u_s^{f,l}$, respectively, as the vectors of new arrivals $w^{t,m,n,l}$, the number of EVs at the charging station $y^{t,m,n,l}$, the remaining units of electricity $z_s^{t,m,n,l}$, the cost of charging $c_s^{t,m,n,l}$, and the amount of charging $u_s^{t,m,n,l}$.


Note that the platform only needs to meet the minimum demand $l$. This leads to introducing a new variable $\underline{z}_s^{f,l}=(\underline{z}_s^{t,m,n,l})_{(t,m,n,l)\in\mathcal{J}_s^{f,l}}$ to capture the remaining minimum demand, which is defined analogously to equation (1) by replacing $m$ with $l$, i.e.,

$$\underline{z}_{s+1}^{t,m,n,l}=\begin{cases} l\,y^{t,m,n,l} & s=t-1\\ \left(\underline{z}_s^{t,m,n,l}-u_s^{t,m,n,l}\right)^+ & t\le s\le t+n-2\\ 0 & s\ge t+n-1. \end{cases}$$






We also denote $\underline{z}_s^{f,l}$ as the vector $(\underline{z}_s^{t,m,n,l})_{(t,m,n,l)\in\mathcal{J}_s^{f,l}}$.
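The remaining minimum demand follows the same recursion as equation (1), with l in place of m and a positive-part operation on the difference. The Python sketch below assumes that the superscript + denotes the usual positive part max{x, 0}, so that the remaining minimum demand never becomes negative once it has been exceeded; the function names are hypothetical.

```python
def positive_part(x):
    """Positive part of a scalar (assumed meaning of the superscript +)."""
    return max(x, 0.0)


def next_min_remaining(z_min_s, u_s, y, category, s):
    """Remaining minimum demand at time s + 1 for one category (t, m, n, l)."""
    t, m, n, l = category
    if s == t - 1:
        return float(l * y)     # each of the y arriving EVs needs at least l
    if t <= s <= t + n - 2:
        # Charging beyond the minimum leaves nothing outstanding, not a credit.
        return positive_part(z_min_s - u_s)
    if s >= t + n - 1:
        return 0.0              # charging window has closed
    raise ValueError("the recursion is not defined for s < t - 1")
```

For a category (0, 6, 3, 2) with 3 EVs, the minimum demand starts at 6 units; allocating 4 units leaves 2, and allocating 5 more leaves 0 rather than −3.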




The reliable demand setting uses an equality constraint in equation (2) to impose that the demand m is met within the charging window. Instead, under the flexible demand setting, the platform will compensate the customers whose minimum demand is not met. Let







$$p_s^{f,l}=(p_s^{t,m,n,l})_{(t,m,n,l)\in\mathcal{J}_{s,1}^{f,l}}$$








be the penalty vector, which is the monetary penalty per kWh that the platform pays to the customer, if the minimum demand l is not met at the end of the charging window. The total penalty paid by the platform at each time s is given by








p
s

f
,

l




[


(



z
_

s

f
,
l


-

u
s

f
,
l



)



(

t
,
m
,
n
,
l

)



𝒥

s
,
1


f
,
l



+

]

.
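As a rough illustration of the penalty term, the following Python sketch sums the per-category penalty over the categories that must depart. The function name `total_penalty`, the dictionary layout, and the numbers are assumptions made for this example only.

```python
def total_penalty(p, z_under, u, departing):
    """Penalty paid at one time slot.

    p, z_under, u : dicts keyed by category (t, m, n, l) holding the penalty
                    rate, the remaining minimum demand, and the charge amount
    departing     : categories whose charging window ends at this slot
    """
    # Only unmet minimum demand (positive part) is penalized.
    return sum(p[c] * max(z_under[c] - u[c], 0.0) for c in departing)


# Hypothetical data: two departing categories.
c1, c2 = (1, 4, 2, 2), (1, 6, 2, 1)
p = {c1: 0.3, c2: 0.5}
z_under = {c1: 3.0, c2: 1.0}
u = {c1: 1.0, c2: 2.0}
penalty = total_penalty(p, z_under, u, [c1, c2])
```

In this example only the first category has unmet minimum demand (2 units), so the second category contributes nothing.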




In this case, we let $x_s^{f,l} = [y_s^{f,l}, z_s^{f,l}, \underline{z}_s^{f,l}, d_s^{f,l}] \in \mathcal{X}_s^{f}$ be the state of flexible demand, where the state space $\mathcal{X}_s^{f}$ is the product of the spaces of $y_s^{f,l}$, $z_s^{f,l}$, and $\underline{z}_s^{f,l}$ with $\mathbb{R}_+^2$. Further, the feasible action set for the flexible demand is the correspondence $\Gamma^{f,l}: \mathcal{X}_s^{f} \rightrightarrows \mathcal{U}$, which is

$$\Gamma^{f,l}(x_s^{f,l}) := \left\{ u_s^{f,l} \in \mathcal{U} : 0 \le u_s^{f,l} \le g(x_s^{f,l}),\ \underline{d}_s^{f,l} \le \|u_s^{f,l}\| \le \overline{d}_s^{f,l} \right\}, \tag{3}$$







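where $d_s^{f,l} = (\underline{d}_s^{f,l}, \overline{d}_s^{f,l})$ is defined similarly to $d_s^r$ below. In this case, we let $d_s = (\underline{d}_s, \overline{d}_s)$ be the total energy bound for the platform (both reliable and flexible demand) at each time $s$; then $d_s^r$ and $d_s^{f,l}$ satisfy

$$\underline{d}_s = \underline{d}_s^{r} + \sum_{l=1}^{L} \underline{d}_s^{f,l}, \qquad \overline{d}_s = \overline{d}_s^{r} + \sum_{l=1}^{L} \overline{d}_s^{f,l}, \qquad l = 1, \dots, L. \tag{4}$$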







The feasible set of flexible demand represented by equation (3) removes the equality constraints in equation (2), which allows charging an EV in (t, m, n, l) from 0 to m units of electricity. Note that the penalty psf,l can be changed to ensure the minimum demand l is satisfied.
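A membership test for the flexible feasible set can be sketched as follows. This is a minimal sketch under assumptions: `u_max` stands in for the per-category upper bound g(x) whose exact form is given elsewhere in the disclosure, the total energy is taken to be the sum of the non-negative charge amounts, and the function name `is_feasible` is hypothetical.

```python
def is_feasible(u, u_max, d_lower, d_upper):
    """Check whether a charge vector u lies in the flexible feasible set.

    u        : list of charge amounts, one per category
    u_max    : list of per-category upper bounds (stand-in for g(x))
    d_lower  : lower bound on total energy drawn in this slot
    d_upper  : upper bound on total energy drawn in this slot
    """
    # Box constraint: 0 <= u <= g(x), category by category.
    within_box = all(0.0 <= ui <= gi for ui, gi in zip(u, u_max))
    # Aggregate constraint: total energy must stay within the energy bounds.
    total = sum(u)
    return within_box and d_lower <= total <= d_upper
```

Note that no equality constraint appears: a category may receive anything from 0 up to its bound, which is what distinguishes the flexible set from the reliable one.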


We next determine the state of the EV charging system, the transition dynamics, and pose the stochastic dynamic program.


The system has linear dynamics for both reliable demand and flexible demand with minimum demand $l$, and the state transition functions $f^r, f^{f,1}, \dots, f^{f,L}$ are given by:

$$x_{s+1}^{r} = f^{r}(x_s^{r}, u_s^{r}, w_s^{r}, d_{s+1}^{r}) := \begin{bmatrix} A_y^{r} y_s^{r} + C_y^{r} w_s^{r} \\ A_z^{r}(z_s^{r} - u_s^{r}) + C_z^{r} w_s^{r} \\ d_{s+1}^{r} \end{bmatrix}, \tag{5}$$

$$x_{s+1}^{f,l} = f^{f,l}(x_s^{f,l}, u_s^{f,l}, w_s^{f,l}, d_{s+1}^{f,l}) := \begin{bmatrix} A_y^{f,l} y_s^{f,l} + C_y^{f,l} w_s^{f,l} \\ A_z^{f,l}(z_s^{f,l} - u_s^{f,l}) + C_z^{f,l} w_s^{f,l} \\ A_z^{f,l}(\underline{z}_s^{f,l} - u_s^{f,l}) + C_{\underline{z}}^{f,l} w_s^{f,l} \\ d_{s+1}^{f,l} \end{bmatrix}, \qquad l = 1, \dots, L,$$







where the time-invariant matrices $A_y^r$, $A_z^r$, $C_y^r$, $C_z^r$ are given as follows:

$$A_y^{r} = A_z^{r} = \begin{bmatrix} \mathbb{O}_{|\mathcal{J}_s^2| \times |\mathcal{J}_s^1|} & \mathbb{I}_{|\mathcal{J}_s^2| \times |\mathcal{J}_s^2|} \\ \mathbb{O}_{|\mathcal{J}_s^1| \times |\mathcal{J}_s^1|} & \mathbb{O}_{|\mathcal{J}_s^1| \times |\mathcal{J}_s^2|} \end{bmatrix},$$

$$C_y^{r} = \begin{bmatrix} C_y^{1} & 0 & \cdots & 0 \\ 0 & C_y^{2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & C_y^{N} \end{bmatrix}, \qquad C_z^{r} = \begin{bmatrix} C_z^{1} & 0 & \cdots & 0 \\ 0 & C_z^{2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & C_z^{N} \end{bmatrix},$$

$$C_y^{k} = \begin{bmatrix} 0 \\ \vdots \\ \mathbb{I}_{|\mathcal{M}_k| \times |\mathcal{M}_k|} \\ \vdots \\ 0 \end{bmatrix}, \qquad C_z^{k} = \begin{bmatrix} 0 \\ \vdots \\ \operatorname{diag}\left(\{m\}_{m \in \mathcal{M}_k}\right) \\ \vdots \\ 0 \end{bmatrix}, \qquad \mathcal{M}_k = \{(m, n) \in \mathcal{M} : n = k\}.$$






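and $A_y^{f,l}$, $A_z^{f,l}$, $C_y^{f,l}$, $C_z^{f,l}$ are given by

$$A_y^{f,l} = A_z^{f,l} = \begin{bmatrix} \mathbb{O}_{\dim(\mathcal{J}_{s,2}^{f,l}) \times \dim(\mathcal{J}_{s,1}^{f,l})} & \mathbb{I}_{\dim(\mathcal{J}_{s,2}^{f,l}) \times \dim(\mathcal{J}_{s,2}^{f,l})} \\ \mathbb{O}_{\dim(\mathcal{J}_{s,1}^{f,l}) \times \dim(\mathcal{J}_{s,1}^{f,l})} & \mathbb{O}_{\dim(\mathcal{J}_{s,1}^{f,l}) \times \dim(\mathcal{J}_{s,2}^{f,l})} \end{bmatrix},$$

$$C_y^{f,l} = \begin{bmatrix} C_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & C_N \end{bmatrix}, \qquad C_k = \begin{bmatrix} 0 \\ \vdots \\ \mathbb{I}_{\dim(\mathcal{M}_k) \times \dim(\mathcal{M}_k)} \\ \vdots \\ 0 \end{bmatrix},$$

$$C_z^{f,l} = \begin{bmatrix} C_z^{1,l} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & C_z^{N,l} \end{bmatrix}, \qquad C_{\underline{z}}^{f,l} = \begin{bmatrix} C_{\underline{z}}^{1,l} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & C_{\underline{z}}^{N,l} \end{bmatrix},$$

$$C_z^{k,l} = \begin{bmatrix} 0 \\ \vdots \\ \operatorname{diag}\left(\{m\}_{m \in \mathcal{M}_k^{f,l}}\right) \\ \vdots \\ 0 \end{bmatrix}, \qquad C_{\underline{z}}^{k,l} = \begin{bmatrix} 0 \\ \vdots \\ \operatorname{diag}\left(\{l\}_{l \in \mathcal{M}_k^{f,l}}\right) \\ \vdots \\ 0 \end{bmatrix},$$

where

$$\mathcal{M}_k^{f,l} = \{(m, n, l') \in \mathcal{M}^f : n = k,\ l' = l\}.$$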





At time $s$, a feasible policy for both the reliable and flexible settings is a measurable map $\pi_s: \mathcal{X}_s^r \times \mathcal{X}_s^f \to \mathcal{U}_s^r \times \mathcal{U}_s^f$, where $\mathcal{X}_s^f := \prod_{l=1}^{L} \mathcal{X}_s^{f,l}$ and $\mathcal{U}_s^f := \prod_{l=1}^{L} \mathcal{U}_s^{f,l}$. Note that, based on equation (4), the feasible policy $\pi_s$ satisfies

$$\pi_s(x_s^r, x_s^f) \in \Gamma(x_s^r, x_s^f) := \Big\{ (u_s^r, u_s^f) \in \mathcal{U}_s^r \times \mathcal{U}_s^f : u_s^r \in \Gamma^r(x_s^r),\ u_s^{f,l} \in \Gamma^{f,l}(x_s^{f,l}) \text{ for all } l = 1, \dots, L,\ \underline{d}_s \le \|u_s^r\| + \sum_{l=1}^{L} \|u_s^{f,l}\| \le \overline{d}_s \Big\}. \tag{6}$$




Let Πs denote the set of all feasible policies, π=(π0, . . . , πT) denote a feasible strategy of the platform, and Π:=ΠsΠs denote the feasible strategy space.


We now introduce the finite time horizon stochastic dynamic program (DP). The expected total cost of the reliable and flexible demand based on the initial state $x = [x^r, x^f]$ and using the strategy $\pi$ is given by

$$J(\pi; x) = \mathbb{E}\left[ \sum_{s=1}^{T} \left( \underbrace{c_s^{r\top} \pi_s^{r}(x_s^{r})}_{\text{reliable demand cost}} + \sum_{l=1}^{L} \Big( \underbrace{c_s^{f,l\top} \pi_s^{f,l}(x_s^{f,l})}_{\text{flexible demand cost}} + \underbrace{p_s^{f,l\top} \Big( l - \sum_{\tau=t}^{s} \pi_s^{f,l}(x_s^{f,l}) \Big)^{+}_{(t,m,n,l) \in \mathcal{J}_{s,1}^{f,l}}}_{\text{flexible demand penalty}} \Big) \right) \,\middle|\, [x_0^{r}, x_0^{f,l}] = x \right]. \tag{7}$$







That is, for the flexible demand in category $(t, m, n, l)$, the platform tries to satisfy the minimum demand $l$ and keeps charging up to the battery capacity $m$ whenever it is profitable. However, if the energy bound $d_s$ is small, the platform pays a penalty to the EVs in category $\mathcal{J}_{s,1}^{f,l}$ whose minimum demand was not met and who need to leave at time $s+1$.


The goal is to minimize the expected total cost from $s=1$ to $s=T$ given the initial state $x$, which can be solved using the usual dynamic programming method under fairly mild conditions. The optimal value functions $v_s^*$ can be obtained by applying the Bellman operator $H_{s+1}$ for each time $s = T, \dots, 1$, where the Bellman operator is defined as

$$v_s^*(x_s) = H_{s+1}(v_{s+1}^*)(x_s) := \inf_{(u_s^r, u_s^f) \in \Gamma(x_s^r, x_s^f)} c_s^{r\top} u_s^{r} + \sum_{l=1}^{L} \left( c_s^{f,l\top} u_s^{f,l} + p_s^{f,l\top} \left[ \left( \underline{z}_s^{f,l} - u_s^{f,l} \right)^{+}_{(t,m,n) \in \mathcal{J}_{s,1}^{f,l}} \right] \right) + \mathbb{E}\left[ v_{s+1}^*\big( f(x_s, u_s, W_s, d_{s+1}) \big) \right], \tag{8}$$







where $v_{T+1}^*(x_{T+1}) \equiv 0$, and $f(x_s, u_s, W_s, d_{s+1})$ is, by abuse of notation, the collection of state transition functions (5) applied to $x_s = [x_s^r, x_s^f]$, $u_s = [u_s^r, u_s^f]$, $W_s = [W_s^r, W_s^f]$, and $d_{s+1}$.


In this case, we can obtain the optimal scheduling policies $\pi^* = (\pi_s^*)_{s=1,\dots,T}$ by applying the value iteration $v_s^* = H_{s+1}(v_{s+1}^*)$ in (8) recursively from time $s=T$ to time $s=1$. Here, $\pi_s^*(x_s)$ is the minimizer in (8).
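The backward recursion can be illustrated on a deliberately tiny problem. The sketch below is not the scheduling model of the disclosure: it assumes a single scalar state (aggregate remaining demand z, capped at Z), integer charge amounts, a per-unit penalty on demand left unserved in each slot, and a finite arrival distribution. All names (`backward_induction`, `price`, `penalty`, `arrivals`) are hypothetical.

```python
def backward_induction(T, Z, price, penalty, arrivals):
    """Finite-horizon value iteration from s = T down to s = 1.

    T        : number of time slots
    Z        : largest remaining demand tracked (states are 0..Z)
    price    : dict mapping slot s to the cost per unit charged
    penalty  : cost per unit of demand left unserved in a slot
    arrivals : list of (w, prob) pairs for new demand arriving each slot
    Returns (v_1, policy) where policy[s][z] is a minimizing charge amount.
    """
    v_next = {z: 0.0 for z in range(Z + 1)}  # terminal condition v_{T+1} = 0
    policy = {}
    for s in range(T, 0, -1):
        v_now, pi_s = {}, {}
        for z in range(Z + 1):
            best_u, best_val = None, float("inf")
            for u in range(z + 1):  # feasible actions: 0 <= u <= z
                stage = price[s] * u + penalty * (z - u)
                # Expectation over arrivals of the next-stage value.
                future = sum(prob * v_next[min(z - u + w, Z)]
                             for w, prob in arrivals)
                if stage + future < best_val:
                    best_u, best_val = u, stage + future
            v_now[z], pi_s[z] = best_val, best_u
        v_next, policy[s] = v_now, pi_s
    return v_next, policy


v1, pi = backward_induction(T=2, Z=2, price={1: 3.0, 2: 1.0},
                            penalty=2.0, arrivals=[(0, 1.0)])
```

With these toy numbers the cheap second slot is used to serve all remaining demand, which is the kind of load shifting the scheduling policy is meant to capture.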


As those of ordinary skill in the art may appreciate, with sufficiently large reliable and flexible menus, the dimensionality of the state and action spaces also becomes large. In this case, computing $v_s^*$ for each time $s$ is challenging due to the curse of dimensionality. As such, embodiments according to the present disclosure leverage Approximate Dynamic Programming (ADP) to compute approximately optimal value functions.


The two main challenges in obtaining the approximately optimal value functions are computing the expectation in the Bellman operator and storing the approximately optimal value functions. The first challenge is mitigated by using the empirical Bellman operator, which uses independent and identically distributed (i.i.d.) samples of the noise to approximate the expected future value. The second challenge is mitigated by using a projection operator, which takes the values computed by the empirical Bellman operator as inputs and returns a function in the chosen function approximating class as output.


We use the empirical Bellman operator $\hat{H}_{s+1}^{k}: \mathcal{C}_b(\mathcal{X}_s) \to \mathcal{C}_b(\mathcal{X}_s)$ to approximate the actual Bellman operator $H_{s+1}$. Let $\{W_{s,i}\}_{i=1}^{k}$ be a sequence of independent and identically distributed (i.i.d.) samples of $W_s$; then the empirical Bellman operator $\hat{H}_{s+1}^{k}$ is given by

$$\hat{v}_s^{k}(x_s) = \hat{H}_{s+1}^{k}(\hat{v}_{s+1}^{k})(x_s) := \inf_{u_s \in \Gamma(x_s)} c_s^{\top} u_s + \frac{1}{k} \sum_{i=1}^{k} \hat{v}_{s+1}^{k}\big( f(x_s, u_s, W_{s,i}) \big).$$
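The empirical Bellman operator replaces the expectation with a sample average. A minimal Python sketch of one evaluation at a single state follows; it assumes a finite action set so the infimum becomes a minimum, and the callables `actions`, `cost`, `transition`, and `v_next` are placeholders supplied by the caller rather than anything defined in the disclosure.

```python
def empirical_bellman(x, actions, cost, transition, v_next, samples):
    """Evaluate the empirical Bellman operator at state x.

    actions    : function x -> iterable of feasible actions (finite, assumed)
    cost       : function u -> stage cost of action u
    transition : function (x, u, w) -> next state
    v_next     : function next_state -> approximate next-stage value
    samples    : list of k i.i.d. noise samples W_{s,1}, ..., W_{s,k}
    """
    k = len(samples)
    best = float("inf")
    for u in actions(x):
        # Sample average stands in for the expected future value.
        avg_future = sum(v_next(transition(x, u, w)) for w in samples) / k
        best = min(best, cost(u) + avg_future)
    return best


# Toy usage with four fixed noise samples.
value = empirical_bellman(
    x=2,
    actions=lambda x: range(x + 1),
    cost=lambda u: 2.0 * u,
    transition=lambda x, u, w: x - u + w,
    v_next=lambda y: 3.0 * y,
    samples=[0, 1, 1, 0],
)
```

In practice the samples would be drawn from the arrival distribution, for example with Python's `random` module, and increasing k tightens the approximation of the expectation.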








While applying the value iteration, it is necessary to store a function approximator of $\hat{v}_s^{k}$ in computer-readable memory or storage. The function approximator can be obtained by projecting the value function $\hat{v}_s^{k}$ onto a feasible function approximating class, such as neural networks or a reproducing kernel Hilbert space (RKHS), which is dense in the space of continuous and bounded functions on the state space. Given $l$ sampled states $\{x_{s,j}\}_{j=1}^{l}$, the empirical loss between $\hat{v}_s^{k}$ and a candidate function $h$ is the mean squared error

$$\operatorname{Loss}\left( \hat{v}_s^{k}, h \mid \{x_{s,j}\}_{j=1}^{l} \right) = \frac{1}{l} \sum_{j=1}^{l} \left( \hat{v}_s^{k}(x_{s,j}) - h(x_{s,j}) \right)^2.$$







We denote by $\Pi_s^{l,d}: \mathcal{C}_b(\mathcal{X}_s) \to \mathcal{G}^d(\mathcal{X}_s)$ the function approximating projection that maps the output of $\hat{H}_{s+1}^{k}(\hat{v}_{s+1}^{k})$ to a function in $\mathcal{G}^d$. This is defined as

$$\Pi_s^{l,d}(\hat{v}_s^{k}) = \arg\inf_{h \in \mathcal{G}^d} \operatorname{Loss}\left( \hat{v}_s^{k}, h \mid \{x_{s,j}\}_{j=1}^{l} \right).$$
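To make the projection step concrete, the sketch below fits the simplest possible function class, affine functions h(x) = a + b x of a scalar state, by minimizing the mean squared error in closed form. This class is an assumption chosen for brevity; the disclosure names neural networks and RKHS as example classes. The names `fit_affine` and `mse` are hypothetical.

```python
def fit_affine(xs, vs):
    """Least-squares fit of h(x) = a + b*x to samples (x_j, v_j).

    xs : sampled scalar states (at least two distinct values)
    vs : value estimates at those states
    """
    n = len(xs)
    mean_x, mean_v = sum(xs) / n, sum(vs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxv = sum((x - mean_x) * (v - mean_v) for x, v in zip(xs, vs))
    b = sxv / sxx
    a = mean_v - b * mean_x
    return a, b


def mse(xs, vs, h):
    """Empirical loss: mean squared error between samples and h."""
    return sum((v - h(x)) ** 2 for x, v in zip(xs, vs)) / len(xs)


xs, vs = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]
a, b = fit_affine(xs, vs)
loss = mse(xs, vs, lambda x: a + b * x)
```

Only the two coefficients need to be stored between time steps, which is the point of the projection: the value function is kept as a small parameter vector rather than as a table over all states.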






We now construct a composite operator that combines the empirical Bellman operator and the function approximating operator. We let

$$\Psi_s^{k,l,d} := \Pi_s^{l,d} \circ \hat{H}_{s+1}^{k} : \mathcal{G}^d(\mathcal{X}_{s+1}) \to \mathcal{G}^d(\mathcal{X}_s)$$

be the random fitted empirical Bellman operator used in place of the actual Bellman operator $H_{s+1}$ to arrive at an approximate function $\hat{v}_s$. Here, $k$ is the number of noise samples generated, $d$ is a parameter describing the size of the function approximating class, and $l$ is the number of state samples used in computing the empirical loss function for the projection operation.


We define the fitted value iteration at each time $s$ as

$$\hat{v}_s^{k,l,d}(x_s) = \Psi_s^{k,l,d}(\hat{v}_{s+1}^{k,l,d})(x_s).$$

We now proceed to proving that this fitted value iteration algorithm converges as $k, l, d \to \infty$. In what follows, we increase $k$, $l$, and $d$ simultaneously. Let $j \in \mathbb{N}$ and let $k(j), l(j), d(j)$ be such that $k(j), l(j), d(j) \to \infty$ as $j \to \infty$. By a slight abuse of notation, we denote $\hat{H}_{s+1}^{j} := \hat{H}_{s+1}^{k(j)}$, $\Pi_s^{j} := \Pi_s^{l(j),d(j)}$, and the fitted value iteration algorithm by

$$\hat{v}_s^{j}(x_s) := \Psi_s^{j}(\hat{v}_{s+1}^{j})(x_s) := \Psi_s^{k(j),l(j),d(j)}(\hat{v}_{s+1}^{k(j),l(j),d(j)})(x_s).$$


To establish the convergence of the proposed algorithms, we also need the following reasonable assumptions on the projection operators.


Assumption 1. The projection operator $\Pi_s^{l,d}: \mathcal{C}_b(\mathcal{X}_s) \to \mathcal{G}^d(\mathcal{X}_s)$ satisfies the following two conditions:

    • 1. $\Pi_s^{l,d}$ is approximately non-expansive, that is, for all $v_1, v_2 \in \mathcal{C}_b(\mathcal{X}_s)$, we have $\|\Pi_s^{l,d}(v_1) - \Pi_s^{l,d}(v_2)\| \le \|v_1 - v_2\| + \hat{\zeta}_s^{l,d}$, where $\hat{\zeta}_s^{l,d} \le \hat{\zeta}_s < \infty$ almost surely and $\hat{\zeta}_s^{l,d} \to 0$ in probability as $l, d \to \infty$.
    • 2. For any $\epsilon > 0$ and $\delta > 0$, there exist $M_l$, $M_d$ that may depend on $v_s^*$ such that $\mathbb{P}\left(\|\Pi_s^{l,d}(v_s^*) - v_s^*\| > \epsilon\right) < \delta$ for all $l \ge M_l$, $d \ge M_d$.


Under the assumptions listed above, we have the following theorem where the convergence of the fitted value iteration algorithm is established.


Theorem 1: If Assumption 1 holds, then $\hat{v}_s^{j}$ satisfies, for any $\kappa > 0$,

$$\limsup_{j \to \infty} \mathbb{P}\left( \|\hat{v}_s^{j} - v_s^*\| > \kappa \right) = 0.$$




The proof of the Theorem is established below. Thus, as we increase the number of samples for empirical Bellman operator, expand the function approximating class to include more parameters, and take more samples of the state to project the value function to the function approximating class, we are guaranteed to converge to the optimal value functions under the sup norm.


Proof. We first establish two auxiliary results to establish the theorem. The first statement establishes that the empirical Bellman operator is non-expansive. The second statement shows that the empirical Bellman operator Ĥs+1j when applied on vs+1* converges to vs* in probability as j→∞.


Lemma 1. For any $v, v' \in \mathcal{C}_b(\mathcal{X}_{s+1})$ and any realization of the random operator $\hat{H}_{s+1}^{j}$, we have $\|\hat{H}_{s+1}^{j}(v) - \hat{H}_{s+1}^{j}(v')\| \le \|v - v'\|$ almost surely. The proof is straightforward and therefore omitted.


Lemma 2. For any $\epsilon > 0$, the following holds:

$$\lim_{k \to \infty} \mathbb{P}\left( \|\hat{H}_{s+1}^{k}(v_{s+1}^*) - H_{s+1}(v_{s+1}^*)\| > \epsilon \right) = 0.$$




The proof may be found in the published literature.


We now proceed to proving Theorem 1 using the principle of mathematical induction. We have

$$\|\hat{v}_s^{j} - v_s^*\| \le \|\Psi_s^{j}(\hat{v}_{s+1}^{j}) - \Psi_s^{j}(v_{s+1}^*)\| + \|\Psi_s^{j}(v_{s+1}^*) - H_{s+1}(v_{s+1}^*)\|.$$

Let us consider the first summand on the right side of the inequality above. We have

$$\|\Psi_s^{j}(\hat{v}_{s+1}^{j}) - \Psi_s^{j}(v_{s+1}^*)\| \le \|\hat{H}_{s+1}^{j}(\hat{v}_{s+1}^{j}) - \hat{H}_{s+1}^{j}(v_{s+1}^*)\| + \zeta_s^{j} \le \|\hat{v}_{s+1}^{j} - v_{s+1}^*\| + \zeta_s^{j},$$

where we used Assumption 1(1) and Lemma 1. Next, we consider the second summand on the right side of the inequality:

$$\|\Psi_s^{j}(v_{s+1}^*) - H_{s+1}(v_{s+1}^*)\| = \|\Pi_s^{j}(\hat{H}_{s+1}^{j}(v_{s+1}^*)) - v_s^*\| \le \|\Pi_s^{j}(\hat{H}_{s+1}^{j}(v_{s+1}^*)) - \Pi_s^{j}(v_s^*)\| + \|\Pi_s^{j}(v_s^*) - v_s^*\| \le \|\hat{H}_{s+1}^{j}(v_{s+1}^*) - v_s^*\| + \zeta_s^{j} + \|\Pi_s^{j}(v_s^*) - v_s^*\|,$$

where the first inequality is due to the triangle inequality and the second inequality is due to Assumption 1(1). Thus, we conclude that

$$\|\hat{v}_s^{j} - v_s^*\| \le \|\hat{v}_{s+1}^{j} - v_{s+1}^*\| + \|\hat{H}_{s+1}^{j}(v_{s+1}^*) - v_s^*\| + \|\Pi_s^{j}(v_s^*) - v_s^*\| + 2\zeta_s^{j}.$$

For time $s = T$, we have $v_{T+1}^* = \hat{v}_{T+1}^{j} = 0$, so the first term on the right vanishes. As $j \to \infty$, the remaining three terms on the right go to 0 in probability due to Lemma 2, Assumption 1(2), and Assumption 1(1), respectively. Thus, $\|\hat{v}_T^{j} - v_T^*\| \to 0$ in probability as $j \to \infty$, and the statement holds for time $T$. For any time $s$, we can use the same argument to conclude that $\|\hat{v}_s^{j} - v_s^*\| \to 0$ in probability as $j \to \infty$. The proof of the theorem is complete.
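The induction can also be read as a telescoping sum. As a sketch, introduce the shorthand (not used elsewhere in this description) $e_s$ for the approximation error at time $s$, $a_s$ for the empirical Bellman error, and $b_s$ for the projection error. Unrolling the one-step bound from $s$ to $T$ gives:

```latex
% Shorthand (assumed for this sketch only):
%   e_s = \|\hat{v}_s^{j} - v_s^*\|
%   a_s = \|\hat{H}_{s+1}^{j}(v_{s+1}^*) - v_s^*\|
%   b_s = \|\Pi_s^{j}(v_s^*) - v_s^*\|
\begin{aligned}
e_s &\le e_{s+1} + a_s + b_s + 2\zeta_s^{j}, \qquad e_{T+1} = 0, \\
\Longrightarrow\quad
e_s &\le \sum_{\tau = s}^{T} \left( a_\tau + b_\tau + 2\zeta_\tau^{j} \right).
\end{aligned}
```

Because the horizon $T$ is finite, the right side is a finite sum of terms that each converge to 0 in probability, so the sum does as well.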


Next, we examine three crucial properties of the value function: monotonicity and Lipschitz continuity with respect to the state xs, and continuity with respect to the system parameters.


Monotonicity of Value Functions

Note that any realization of the state $x_s$ is a non-negative vector whose components are the EV counts $y_s$ and the remaining demands $z_s$ (each indexed by $\mathcal{J}_s$), together with the two energy bounds $d_s$. Endow the state space $\mathcal{X}_s$ with the following partial order: let $x_s, x_s' \in \mathcal{X}_s$. Then $x_s \le x_s'$ if and only if $y_s \le y_s'$, $z_s^{t,m,n} = z_s'^{\,t,m,n}$ for every $(t, m, n) \in \mathcal{J}_s^1$, $z_s^{t,m,n} \le z_s'^{\,t,m,n}$ for every $(t, m, n) \in \mathcal{J}_s^2$, and $d_s \le d_s'$. A real-valued function $v$ on $\mathcal{X}_s$ is said to be monotonically increasing if and only if, for any $x, x' \in \mathcal{X}_s$ such that $x \le x'$, we have $v(x) \le v(x')$. A real-valued function $v$ on $\mathcal{X}_s$ is said to be monotonically decreasing if and only if $-v$ is monotonically increasing. In this section, we show that the dynamic optimization problem formulated above yields monotonically decreasing value functions at all times.


Theorem 2: For each time $s$, the optimal value function $v_s^*$ is a monotonically decreasing function of $x_s$.


Proof. To show this, we first note that for any $x \le x'$, we have:

    • 1. $\Gamma(x) \subseteq \Gamma(x')$.
    • 2. $f(x, u, w, d_{s+1}) \le f(x', u, w, d_{s+1})$ for all $u \in \Gamma(x)$ and every realization $w$ of the arrivals.


We now prove the statement using induction. The terminal cost is 0, so it is trivially monotonically decreasing. Assume that $v_{s+1}^*$ is monotonically decreasing. We claim that $v_s^* = H_{s+1}(v_{s+1}^*)$ is also a monotonically decreasing function. Pick $x, x' \in \mathcal{X}_s$ such that $x \le x'$, $u \in \Gamma(x)$, and a realization $w$ of the arrivals. Since $f(x, u, w, d_{s+1}) \le f(x', u, w, d_{s+1})$ and $v_{s+1}^*$ is monotonically decreasing, we conclude that

$$v_{s+1}^*\big(f(x', u, w, d_{s+1})\big) \le v_{s+1}^*\big(f(x, u, w, d_{s+1})\big).$$

Consequently, $\mathbb{E}\left[v_{s+1}^*\big(f(\cdot, u, W, d_{s+1})\big)\right]$ is also a monotonically decreasing function. This yields

$$\inf_{u \in \Gamma(x)} c_s^{\top} u + \mathbb{E}\left[v_{s+1}^*\big(f(x, u, W, d_{s+1})\big)\right] \ \ge\ \inf_{u \in \Gamma(x')} c_s^{\top} u + \mathbb{E}\left[v_{s+1}^*\big(f(x, u, W, d_{s+1})\big)\right] \ \ge\ \inf_{u \in \Gamma(x')} c_s^{\top} u + \mathbb{E}\left[v_{s+1}^*\big(f(x', u, W, d_{s+1})\big)\right],$$




where the first inequality is due to Γ(x)⊆Γ(x′), and the second inequality results from the conclusion above. In other words, vs* is monotonically decreasing. An application of the principle of mathematical induction implies that vs* is monotone decreasing for all s.


Lipschitz Continuity of Value Functions

We now endow the state and the action space with metrics and establish the Lipschitz continuity of the value functions. Let $\mathcal{X} := \mathcal{X}_0 = \mathcal{X}_1 = \dots = \mathcal{X}_T$, and apply the same convention to $\mathcal{U}$. Define the metrics on $\mathcal{X}$ and $\mathcal{U}$ as

$$\rho_X(x, x') = \|x - x'\|, \qquad \rho_U(u, u') = \|u - u'\|,$$

for any $x, x' \in \mathcal{X}$ and $u, u' \in \mathcal{U}$. Let $2^{\mathcal{U}}$ denote the set of all compact subsets of $\mathcal{U}$. We endow this space with the Hausdorff metric, given by

$$\rho_H(U, U') = \max\left\{ \sup_{u \in U} \inf_{u' \in U'} \rho_U(u, u'),\ \sup_{u' \in U'} \inf_{u \in U} \rho_U(u, u') \right\},$$

for all $U, U' \subset \mathcal{U}$.


Theorem 3: The value function vs* is a Lipschitz continuous function.


Proof. We first claim the following statements:

    • 1. The correspondence $\Gamma: \mathcal{X}_s \rightrightarrows \mathcal{U}_s$ is Lipschitz continuous with coefficient $L_\Gamma = \max\{r, 1\}$: for any $x, x' \in \mathcal{X}_s$, we have $\rho_H(\Gamma(x), \Gamma(x')) \le L_\Gamma\, \rho_X(x, x')$.
    • 2. For every realization $w$ of the arrivals, the state transition function $f(\cdot, \cdot, w)$ is Lipschitz continuous in $(x, u)$ over its domain with Lipschitz coefficient $L_f(w) \equiv 1$, and $L_P := \int L_f(w)\, \mathbb{P}(dw) = 1 < \infty$.
    • 3. The cost function $c_s: \mathcal{U}_s \to \mathbb{R}$ is Lipschitz continuous with Lipschitz coefficient $L_{c_s} := \|c_s\|_1$.


We can write $\Gamma(x)$ as $\Gamma(x) = \{u \in \mathcal{U}_s : u \ge 0,\ Q_1 u \le Q_2 x,\ Q_3 u = Q_4 x\}$ for appropriate matrices $Q_1, Q_2, Q_3, Q_4$ that have bounded entries. Thus, the constraint set is a polyhedral set, and we conclude that $\Gamma$ is a Lipschitz continuous correspondence with Lipschitz coefficient $L_\Gamma$. The exact value of the Lipschitz coefficient is difficult to derive; more detailed discussions of upper bounds on $L_\Gamma$ are available in the published literature.


We now prove the second claim. Using the triangle inequality, we have

$$\|f(x, u, w) - f(x', u', w)\| \le \|A\| \|x - x'\| + \|B\| \|u - u'\| \le \|x - x'\| + \|u - u'\|,$$

which shows that $f$ is Lipschitz continuous over its domain with Lipschitz coefficient 1. The Lipschitz coefficient of the cost function is derived from the Cauchy-Schwarz inequality.


The Lipschitz continuity of the value function then follows by induction, as outlined next. Suppose that $v_{s+1}^*$ is Lipschitz continuous with Lipschitz coefficient $L_{v_{s+1}^*}$. Then, it can be concluded that

$$|v_s^*(x_s) - v_s^*(x_s')| \le |c_s^{\top} u_s^* - c_s^{\top} u_s'^*| + \left| \mathbb{E}\left[v_{s+1}^*\big(f(x_s, u_s^*, w_s, d_{s+1})\big)\right] - \mathbb{E}\left[v_{s+1}^*\big(f(x_s', u_s'^*, w_s, d_{s+1})\big)\right] \right| \le L_{c_s} L_\Gamma \|x_s - x_s'\| + L_{v_{s+1}^*}(1 + L_\Gamma) \|x_s - x_s'\| \le \left( L_{c_s} L_\Gamma + L_{v_{s+1}^*} L_P (1 + L_\Gamma) \right) \|x_s - x_s'\|,$$

which implies that $v_s^*$ is Lipschitz continuous with Lipschitz constant $L_{v_s^*} = L_{c_s} L_\Gamma + L_{v_{s+1}^*}(1 + L_\Gamma)$ (since $L_P = 1$). The induction step is complete.
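The induction step yields a backward recursion for the Lipschitz constants, starting from a zero terminal value function. A short Python sketch of that recursion follows; the function name `lipschitz_constants` and the numeric inputs are illustrative assumptions.

```python
def lipschitz_constants(L_c, L_gamma):
    """Backward recursion L_v[s] = L_c[s]*L_gamma + L_v[s+1]*(1 + L_gamma).

    L_c     : list of cost Lipschitz coefficients for s = 1, ..., T
    L_gamma : Lipschitz coefficient of the feasible-set correspondence
    Returns the list of value-function Lipschitz constants for s = 1, ..., T.
    """
    T = len(L_c)
    L_v = [0.0] * (T + 2)  # index T+1 holds 0 because v_{T+1} is identically 0
    for s in range(T, 0, -1):
        L_v[s] = L_c[s - 1] * L_gamma + L_v[s + 1] * (1.0 + L_gamma)
    return L_v[1:T + 1]


constants = lipschitz_constants([1.0, 1.0], 1.0)
```

The constants grow geometrically in the number of remaining stages with ratio (1 + L_Γ), so the bound is finite for any finite horizon but can be loose for long horizons.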


Robustness of Value Functions with Respect to Parameters


The problem identified here has multiple parameters that can change over time. For instance, the cost of acquiring electricity in the wholesale markets or the distribution of the EV arrival process may change slightly over time. This can be studied under the umbrella of parameterized dynamic programs, where the parameters influence the cost/profit functions or the EV arrival process. We investigate in this section the continuity of the value function as a function of the parameters. We identify some sufficient conditions under which a slight change in the parameters would lead to a slight change in the value function. This allows us to conclude the robustness of the scheduling algorithm with respect to small parametric uncertainty.


Let $\Theta \subset \mathbb{R}^q$ be the parameter space, which is assumed to be a compact subset of a Euclidean space. We consider a parameterized optimization problem, parameterized by $\theta \in \Theta$, in which:

    • 1. $\tilde{c}_s(\theta)$ is the negative profit function; and
    • 2. The probability distribution of the EV arrival process $\tilde{w}_s$ is given by $\nu_s(\cdot, \theta)$.


The parameterized dynamic program is then rewritten as:

$$\tilde{v}_s^*(x_s, \theta) = \inf_{u_s \in \Gamma(x_s)} \tilde{c}_s(\theta)^{\top} u_s + \mathbb{E}_{\nu_s(\theta)}\left[ \tilde{v}_{s+1}^*\big( f(x_s, u_s, \tilde{w}_s, d_{s+1}), \theta \big) \right].$$






Here, $\tilde{v}_s^*: \mathcal{X}_s \times \Theta \to \mathbb{R}$ is the optimal parameterized value function. We also let $\tilde{\pi}_s^*(x_s, \theta)$ be the corresponding parameterized scheduling policy. We identify some sufficient conditions and establish the continuity of $\tilde{v}_s^*$ and the lower semicontinuity of $\tilde{\pi}_s^*$ below.


Assumption 2. The following holds:

    • 1. $\tilde{c}_s$ is continuous on $\Theta$; and
    • 2. There exist a base probability measure $\lambda_s$ and a continuous and bounded function $\beta_s$, defined on the product of the noise space and $\Theta$ and taking values in $[0, \infty)$, such that $\nu_s(dw, \theta) = \beta_s(w, \theta)\lambda_s(dw)$.


Theorem 4. Suppose that Assumption 2 holds. Then, $\tilde{v}_s^*$ is jointly continuous on $\mathcal{X}_s \times \Theta$ and $\tilde{\pi}_s^*$ is lower semicontinuous on $\mathcal{X}_s \times \Theta$.


Proof. Assumption 2(1) implies that the cost function $(u_s, \theta) \mapsto \tilde{c}_s(\theta)^{\top} u_s$ is jointly continuous on $\Theta \times \mathcal{U}_s$. Since the state transition function $f$ is a linear map, linearity of $f$ and Assumption 2(2) imply that for any $h \in \mathcal{C}_b(\mathcal{X}_{s+1})$ and any convergent sequence $\{(x_n, u_n, \theta_n)\}_n \subset \mathcal{X}_s \times \mathcal{U}_s \times \Theta$ satisfying $(x_n, u_n, \theta_n) \to (x, u, \theta)$, we have $h(f(x_n, u_n, w, d))\beta_s(w, \theta_n) \to h(f(x, u, w, d))\beta_s(w, \theta)$. Further, since $h$ and $\beta_s$ are continuous and bounded functions, we conclude that

$$\lim_{n \to \infty} \int h\big(f(x_n, u_n, w, d)\big)\, \nu_s(dw, \theta_n) = \lim_{n \to \infty} \int h\big(f(x_n, u_n, w, d)\big)\, \beta_s(w, \theta_n)\, \lambda_s(dw) \overset{(a)}{=} \int h\big(f(x, u, w, d)\big)\, \beta_s(w, \theta)\, \lambda_s(dw) = \int h\big(f(x, u, w, d)\big)\, \nu_s(dw, \theta),$$

where the equality in (a) results from the dominated convergence theorem, as the state, action, and noise spaces and $\Theta$ are compact.


Note that we have also shown in the proof of Theorem 3 that $\Gamma(x_s)$ is a continuous and compact-valued correspondence. Thus, it follows that $\tilde{v}_s^*$ is continuous on $\mathcal{X}_s \times \Theta$ and $\tilde{\pi}_s^*$ is lower semicontinuous on $\mathcal{X}_s \times \Theta$, which completes the proof.


In contrast to the above, which is based on a single demand type (reliable demand), we now take into account the flexible demand that allows flexible charging between a minimum target SoC and a maximum target SoC within the available charging time. In this scenario, the dimensionality of the state space of the flexible demand, $\mathcal{X}_s^f$, increases linearly with $L$, which significantly increases the computational complexity. To alleviate the problem, we decouple the original problem into $L+1$ serial stages to reduce the dimensionality of the state space. The detailed algorithm and the corresponding optimality results are described below.


The scheduling problem (7) can be solved by the Empirical Fitted Value Iteration algorithm, which aims at efficiently solving an approximate dynamic program. As previously described with respect to a single demand type, consider the Bellman operators at $s = 1, \dots, T$,

$$v_s^*(x_s) = H_{s+1}(v_{s+1}^*)(x_s) := \inf_{u_s \in \Gamma(x_s)} c_s(x_s, u_s) + \mathbb{E}\left[ v_{s+1}^*\big( f(x_s, u_s, W_s, d_{s+1}) \big) \right], \tag{9}$$







where vT+1* (xT+1)≡0. Here, with slight abuse of notation, we let cs be a general bounded Lipschitz continuous cost function at time s, and Γ be a Lipschitz continuous correspondence. We also remove the superscripts r,f here to indicate that this algorithm can be used for any stochastic dynamic program.


Let $\mathcal{C}_b(\mathcal{X}_s)$ denote the space of continuous and bounded functions over $\mathcal{X}_s$ endowed with the supremum norm. To solve (9), we again use the empirical Bellman operator $\hat{H}_{s+1}^{k}: \mathcal{C}_b(\mathcal{X}_s) \to \mathcal{C}_b(\mathcal{X}_s)$ to approximate the actual Bellman operator $H_{s+1}$. Let $\{W_{s,i}\}_{i=1}^{k}$ be a sequence of independent and identically distributed (i.i.d.) samples of $W_s$; then the empirical Bellman operator $\hat{H}_{s+1}^{k}$ is given by

$$\hat{v}_s^{k}(x_s) = \hat{H}_{s+1}^{k}(\hat{v}_{s+1}^{k})(x_s) := \inf_{u_s \in \Gamma(x_s)} c_s(x_s, u_s) + \frac{1}{k} \sum_{i=1}^{k} \hat{v}_{s+1}^{k}\big( f(x_s, u_s, W_{s,i}) \big). \tag{10}$$







As before, the value approximator $\hat{v}_s^{k}$ is stored by projecting it onto a feasible function approximating class, such as neural networks or a reproducing kernel Hilbert space (RKHS), which is dense in $\mathcal{C}_b(\mathcal{X}_s)$. Let $\mathcal{G}^d(\mathcal{X}_s) \subset \mathcal{C}_b(\mathcal{X}_s)$ be the function approximating class parameterized by $d \in \mathbb{N}$. We then create a data set $\{x_{s,j}, \hat{v}_s^{k}(x_{s,j})\}_{j=1}^{k'}$, where $\{x_{s,j}\}_{j=1}^{k'}$ are uniformly sampled from the state space $\mathcal{X}_s$, and $\hat{v}_s^{k}(x_{s,j})$ is obtained according to (10). We let $\Pi_s^{k',d}: \mathcal{C}_b(\mathcal{X}_s) \to \mathcal{G}^d(\mathcal{X}_s)$ be the function approximating projection that maps the output of $\hat{H}_{s+1}^{k}(\hat{v}_{s+1}^{k})$ to a function in $\mathcal{G}^d$. This is defined as

$$\Pi_s^{k',d}(\hat{v}_s^{k}) = \arg\inf_{h \in \mathcal{G}^d} \operatorname{Loss}\left( \hat{v}_s^{k}, h \mid \{x_{s,j}\}_{j=1}^{k'} \right), \tag{11}$$

where the loss function $\operatorname{Loss}: \mathcal{C}_b(\mathcal{X}_s) \times \mathcal{G}^d(\mathcal{X}_s) \to \mathbb{R}_+$ can be picked as the mean squared error between two functions

$$\operatorname{Loss}\left( \hat{v}_s^{k}, h \mid \{x_{s,j}\}_{j=1}^{k'} \right) = \frac{1}{k'} \sum_{j=1}^{k'} \left( \hat{v}_s^{k}(x_{s,j}) - h(x_{s,j}) \right)^2.$$

We again construct a composite operator that combines the empirical Bellman operator and the function approximating operator. We obtain an approximate function $\hat{v}_s$ by replacing the actual Bellman operator $H_{s+1}$ with a random fitted empirical Bellman operator $\Psi_s^{k,k',d}$, which is given by

$$\Psi_s^{k,k',d} := \Pi_s^{k',d} \circ \hat{H}_{s+1}^{k} : \mathcal{G}^d(\mathcal{X}_{s+1}) \to \mathcal{G}^d(\mathcal{X}_s).$$


Here, $k$ is the number of noise samples generated, $d$ is a parameter describing the size of the function approximating class, and $k'$ is the number of state samples used in computing the empirical loss function for the projection operation. To increase $k$, $k'$, and $d$ simultaneously, we let $j \in \mathbb{N}$ and let $k(j), k'(j), d(j)$ be such that $k(j), k'(j), d(j) \to \infty$ as $j \to \infty$. In this case, the fitted value iteration algorithm is given by

$$\hat{v}_s^{j}(x_s) = \Psi_s^{j}(\hat{v}_{s+1}^{j})(x_s) := \Psi_s^{k(j),k'(j),d(j)}\left( \hat{v}_{s+1}^{k(j),k'(j),d(j)} \right)(x_s). \tag{12}$$







The convergence of {circumflex over (v)}sj is proven in Theorem 1 as previously described.


Lemma 3. If Assumption 1 holds, then $\hat{v}_s^{j}$ satisfies, for any $\kappa > 0$,

$$\limsup_{j \to \infty} \mathbb{P}\left( \|\hat{v}_s^{j} - v_s^*\| > \kappa \right) = 0.$$




Note that the convergence result is established under $j \to \infty$, which means that computing $\hat{v}_s^{j}$ with small error $\kappa$ requires a sufficiently large number of samples $k(j)$. However, if we apply (12) to the problem (8), it becomes practically intractable with multiple types of demand, e.g., reliable (inflexible) and flexible demand. As previously described, the dimensionality of the action space is $\dim(\mathcal{U}_s) = \dim(\mathcal{U}_s^r) + \sum_{l=1}^{L} \dim(\mathcal{U}_s^{f,l})$, which increases linearly with $L$. This can be shown to lead to a computational complexity of at least $O(L^2)$ for solving the optimization (8). Further, the state space dimensionality also increases linearly with $L$, i.e., $\dim(\mathcal{X}_s) = \dim(\mathcal{X}_s^r) + \sum_{l=1}^{L} \dim(\mathcal{X}_s^{f,l})$, which also leads to high sample and computational complexity. The quadratic scaling of computational time in the size of the state and action spaces is addressed by dividing the scheduling problem into two subproblems in which the state and action space of each subproblem is smaller. This reduces the computational time at the expense of a small loss in optimality. However, we also identify a sufficient condition under which there is no loss in optimality.


Reducing Complexity Through Two Stages of Scheduling

In this section, we describe a two stage algorithm that sequentially solves (7) according to each type of demand. This can reduce the dimensionality of the state space custom-charactersr×custom-charactersf, which simplifies the empirical fitted value iteration.


The Bellman equation (8) is decoupled as follows: we separate the state space $\mathcal{X}_s$ into $\mathcal{X}_s^r$ and $\mathcal{X}_s^{f,1}, \dots, \mathcal{X}_s^{f,L}$, and consider a sub-scheduling problem on each separated space. That is, we sequentially solve the original problem based on the types of demand, where each stage solves an empirical fitted value iteration on $\mathcal{X}_s^r, \mathcal{X}_s^{f,1}, \dots, \mathcal{X}_s^{f,L}$ in turn. Though this may seem like an intuitive decoupling, the key challenge comes from the control variables $u_s^r$ and $u_s^{f,1}, \dots, u_s^{f,L}$ sharing the same bounds $d_s$ in the last inequalities of (6). This motivates obtaining $d_s^r$ and $d_s^{f,1}, \dots, d_s^{f,L}$ sequentially so as to satisfy (4), as described below.


Reliable Demand

We first let $d_s^{r}=d_s$, which implies that the reliable demand consumes all the electricity available to the platform. Then, the reliable demand is scheduled by solving the Bellman equation (9) for reliable demand only. For instance, we let

$$\pi_s^{r}(x_s^{r})=\operatorname*{arg\,inf}_{u_s^{r}\in\Gamma^{r}(x_s^{r})}\;(c_s^{r})^{\top}u_s^{r}+\mathbb{E}\left[v_{s+1}^{r*}\left(f^{r}(x_s^{r},u_s^{r},W_s^{r})\right)\right], \tag{13}$$

where $v_s^{r*}:\mathcal{X}_s^{r}\to\mathbb{R}$ is the optimal value function for the cost of reliable demand only. Here, we use the empirical fitted value iteration to solve (13) and obtain the approximated value function $\hat{v}_s^{r,j}$. The approximated optimal control action at each time s is $\hat{u}_s^{r*}=\hat{\pi}_s^{r,j}(x_s^{r})$, where $\hat{\pi}_s^{r,j}$ is the approximated scheduling policy corresponding to $\hat{v}_s^{r,j}$. Then, the expected total cost given the initial state $x^{r}$ is

$$J^{r}(\hat{\pi}^{r,j};x^{r})=\mathbb{E}\left[\sum_{s=1}^{T}(c_s^{r})^{\top}\hat{u}_s^{r*}\;\middle|\;x_0^{r}=x^{r}\right].$$
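The backup behind (13) can be illustrated with a minimal sketch, assuming a scalar state and a small grid of candidate actions. The names `empirical_backup`, `toy_transition` and the toy numbers are hypothetical and not part of the disclosure; the expectation over the noise is replaced by a sample mean, which is the idea behind an empirical Bellman operator.

```python
from typing import Callable, Sequence, Tuple


def empirical_backup(
    x: float,
    actions: Sequence[float],
    cost: float,
    transition: Callable[[float, float, float], float],
    v_next: Callable[[float], float],
    noise_samples: Sequence[float],
) -> Tuple[float, float]:
    """Return (best_value, best_action) for one sampled state x.

    The expectation over the noise W is approximated by the mean over
    noise_samples (hypothetical stand-in for the empirical operator).
    """
    best_value, best_action = float("inf"), actions[0]
    for u in actions:
        # Sample-average approximation of E[v_next(f(x, u, W))].
        expected_next = sum(
            v_next(transition(x, u, w)) for w in noise_samples
        ) / len(noise_samples)
        value = cost * u + expected_next
        if value < best_value:
            best_value, best_action = value, u
    return best_value, best_action


def toy_transition(x: float, u: float, w: float) -> float:
    """Toy dynamics: unmet demand carries over, new arrivals w are added."""
    return max(x - u, 0.0) + w


# Off-peak cost of -4.4 cents/kWh (a profit), terminal value function 0.
value, action = empirical_backup(
    x=3.0,
    actions=[0.0, 1.0, 2.0, 3.0],
    cost=-4.4,
    transition=toy_transition,
    v_next=lambda x_next: 0.0,
    noise_samples=[0.0, 1.0, 2.0],
)
```

In a fitted value iteration, the pairs (state sample, backed-up value) produced this way would form the regression data set for the function approximator at time s.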





Flexible Demand

Similarly to the reliable demand, the flexible demand with minimum demand l is scheduled by the policy

$$\pi_s^{f,l}(x_s^{f,l})=\operatorname*{arg\,inf}_{u_s^{f,l}\in\Gamma^{f,l}(x_s^{f,l})}\;(c_s^{f,l})^{\top}u_s^{f,l}+(p_s^{f,l})^{\top}\left(z_s^{f,l}-u_s^{f,l}\right)^{+}_{(t,m,n)\in\mathcal{J}_{s,1}^{f,l}}+\mathbb{E}\left[v_{s+1}^{f,l*}\left(f^{f,l}(x_s^{f,l},u_s^{f,l},W_s^{f,l})\right)\right], \tag{14}$$







where $v_s^{f,l*}:\mathcal{X}_s^{f,l}\to\mathbb{R}$ is the optimal value function for flexible demand only. In this case, we denote by $\hat{v}_s^{f,l}$ the approximator for $v_s^{f,l*}$ obtained with the empirical fitted value iteration (12). We then obtain $\hat{\pi}_s^{f,l,j}(x_s^{f,l})$ as the corresponding approximated optimal policy.
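As a rough illustration of the stage cost inside (14), the sketch below evaluates the energy cost plus the shortfall penalty for a handful of categories. The function name `flexible_stage_cost` and the boolean flag `due` are assumptions of this sketch: membership in the index set of penalized categories is simply passed in by the caller rather than derived from the state.

```python
from typing import Sequence


def flexible_stage_cost(
    c: Sequence[float],
    p: Sequence[float],
    z: Sequence[float],
    u: Sequence[float],
    due: Sequence[bool],
) -> float:
    """Compute c^T u + p^T (z - u)^+ over the penalized categories.

    c   : net cost per kWh for each category
    p   : penalty per kWh of unmet minimum demand
    z   : remaining minimum demand per category (kWh)
    u   : energy allocated in this period (kWh)
    due : True where the category is in the penalized index set
          (hypothetical flag standing in for that set)
    """
    energy_cost = sum(ci * ui for ci, ui in zip(c, u))
    penalty = sum(
        pi * max(zi - ui, 0.0)
        for pi, zi, ui, di in zip(p, z, u, due)
        if di
    )
    return energy_cost + penalty


# Two categories during on-peak hours (9.24 cents/kWh, 2.5 cents/kWh penalty).
cost_all_due = flexible_stage_cost(
    c=[9.24, 9.24], p=[2.5, 2.5], z=[10.0, 20.0], u=[10.0, 5.0],
    due=[True, True],
)
cost_none_due = flexible_stage_cost(
    c=[9.24, 9.24], p=[2.5, 2.5], z=[10.0, 20.0], u=[10.0, 5.0],
    due=[False, False],
)
```

The positive part makes the penalty one-sided: charging beyond the remaining minimum demand is never rewarded by the penalty term.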


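However, we need to use the optimal action $\hat{u}_s^{r*}$ to compute the remaining electricity that can be allocated to the flexible demand. That is, we compute the bounds $\underline{d}_s^{f,l}$ and $\bar{d}_s^{f,l}$ by

$$\underline{d}_s^{f,l}=\left(\underline{d}_s-\mathbf{1}^{\top}\hat{u}_s^{r*}-\sum_{i=1}^{L}\mathbf{1}^{\top}\hat{u}_s^{f,i*}\right)^{+},\qquad \bar{d}_s^{f,l}=\left(\bar{d}_s-\mathbf{1}^{\top}\hat{u}_s^{r*}-\sum_{i=1}^{L}\mathbf{1}^{\top}\hat{u}_s^{f,i*}\right)^{+}, \tag{15}$$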







where $\hat{u}_s^{f,l*}=\hat{\pi}_s^{f,l,j}(x_s^{f,l})$ is the approximated optimal action for the flexible demand with minimum demand l.
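The bound update in (15) amounts to subtracting the energy already committed and clipping at zero. A minimal sketch follows, assuming the caller passes only the action vectors that have already been scheduled when the bounds are computed; the name `residual_bounds` and the example numbers are hypothetical.

```python
from typing import Sequence, Tuple


def residual_bounds(
    d_lower: float,
    d_upper: float,
    reliable_energy: Sequence[float],
    flexible_energy: Sequence[Sequence[float]],
) -> Tuple[float, float]:
    """Return the (lower, upper) grid bounds left for a flexible type.

    reliable_energy : scheduled reliable energy per category (kWh)
    flexible_energy : one vector per flexible type already scheduled
                      (which types are included is the caller's choice,
                      an assumption of this sketch)
    """
    used = sum(reliable_energy) + sum(sum(u) for u in flexible_energy)
    # (.)^+ keeps the residual bounds non-negative.
    return max(d_lower - used, 0.0), max(d_upper - used, 0.0)


# 100 kWh cap, 50 kWh already given to reliable demand and 15 kWh to
# one flexible type: 35 kWh remain for the next flexible type.
lower, upper = residual_bounds(0.0, 100.0, [30.0, 20.0], [[10.0, 5.0]])
```

If the committed energy exceeds the cap, the upper bound collapses to zero and the next flexible type receives no electricity in that period.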


Here, it is straightforward to verify that computing $d_s^{f,l}$ with (15) for each l=1, . . . , L yields $d_s^{r},d_s^{f,1},\ldots,d_s^{f,L}$ satisfying (4). In this case, the optimal expected total cost for flexible demand is given by

$$J^{f,l}(\hat{\pi}^{f,l,j};x^{f,l})=\mathbb{E}\left[\sum_{s=1}^{T}\left((c_s^{f,l})^{\top}\hat{u}_s^{f,l*}+(p_s^{f,l})^{\top}\left(l-\sum_{\tau=t}^{s}\hat{u}_s^{f,l*}\right)^{+}_{(t,m,n,l)\in\mathcal{J}_{s,1}^{f,l}}\right)\;\middle|\;x_0^{f,l}=x^{f,l}\right].$$





We establish the following sufficient condition to guarantee the optimality of the above decoupling procedure.


Lemma 4. Suppose $d_1,\ldots,d_T$ satisfy, for all s=1, . . . , T,

$$\underline{d}_s=0,\qquad \bar{d}_s\ge r\left(\mathbf{1}^{\top}y_s^{r}+\sum_{l=1}^{L}\mathbf{1}^{\top}y_s^{f,l}\right), \tag{16}$$

then for any κ>0,

$$\limsup_{j\to\infty}\;\mathbb{P}\left(\left|J^{r}(\hat{\pi}^{r,j};x^{r})+\sum_{l=1}^{L}J^{f,l}(\hat{\pi}^{f,l,j};x^{f,l})-J(\pi;x)\right|>\kappa\right)=0.$$




Proof. We will first show that if (16) holds, then $u_s^{r*}=\pi_s^{r}(x_s^{r})$ and $u_s^{f,l*}=\pi_s^{f,l}(x_s^{f,l})$ for l=1, . . . , L minimize (8). Then we apply the triangle inequality to prove the probability bounds. By the feasible action set of (8) defined in (6), for any $(u_s^{r},u_s^{f})\in\Gamma(x_s^{r},x_s^{f})$ we have $u_s^{r}\in\Gamma^{r}(x_s^{r})$ and $u_s^{f,l}\in\Gamma^{f,l}(x_s^{f,l})$. Thus,

$$0\le u_s^{r}\le g(x_s^{r})=\min\{r\,y_s^{r},\,z_s^{r}\}\le r\,y_s^{r},$$

$$0\le u_s^{f,l}\le g(x_s^{f,l})=\min\{r\,y_s^{f,l},\,z_s^{f,l}\}\le r\,y_s^{f,l},$$


by (2) and (3). That is, if (16) holds, then









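$$\underline{d}_s=0\le\mathbf{1}^{\top}u_s^{r}+\sum_{l=1}^{L}\mathbf{1}^{\top}u_s^{f,l}\le r\left(\mathbf{1}^{\top}y_s^{r}+\sum_{l=1}^{L}\mathbf{1}^{\top}y_s^{f,l}\right)\le\bar{d}_s,\quad\text{for all } s=1,\ldots,T,$$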




which implies









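$$\begin{aligned}
\Gamma^{r}(x_s^{r})&\subseteq\Big\{u_s^{r}\in\mathcal{U}_s^{r}:\ \underline{d}_s\le\mathbf{1}^{\top}u_s^{r}+\sum_{l=1}^{L}\mathbf{1}^{\top}u_s^{f,l}\le\bar{d}_s,\ u_s^{f,l}\in\Gamma^{f,l}(x_s^{f,l}),\ \text{for all } l=1,\ldots,L\Big\},\\
\Gamma^{f,l}(x_s^{f,l})&\subseteq\Big\{u_s^{f,l}\in\mathcal{U}_s^{f,l}:\ \underline{d}_s\le\mathbf{1}^{\top}u_s^{r}+\sum_{l'=1}^{L}\mathbf{1}^{\top}u_s^{f,l'}\le\bar{d}_s,\ u_s^{r}\in\Gamma^{r}(x_s^{r}),\ u_s^{f,l'}\in\Gamma^{f,l'}(x_s^{f,l'}),\ \text{for all } l'=1,\ldots,L,\ l'\ne l\Big\}.
\end{aligned}$$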






Thus, we can conclude that (16) implies





$$\Gamma(x_s^{r},x_s^{f})=\left\{(u_s^{r},u_s^{f})\in\mathcal{U}_s^{r}\times\mathcal{U}_s^{f}:\ u_s^{r}\in\Gamma^{r}(x_s^{r}),\ u_s^{f,l}\in\Gamma^{f,l}(x_s^{f,l}),\ \text{for all } l=1,\ldots,L\right\}.$$


By applying the principle of mathematical induction from s=T to s=1 with $v_T^{*},v_T^{r*},v_T^{f,l*}\equiv 0$, we have (8) being equivalent to

$$\begin{aligned}
v_s^{*}(x_s)&=\inf_{\substack{u_s^{r}\in\Gamma^{r}(x_s^{r}),\\ u_s^{f,l}\in\Gamma^{f,l}(x_s^{f,l}),\ l=1,\ldots,L}}\;(c_s^{r})^{\top}u_s^{r}+\sum_{l=1}^{L}\left((c_s^{f,l})^{\top}u_s^{f,l}+(p_s^{f,l})^{\top}\left(z_s^{f,l}-u_s^{f,l}\right)^{+}_{(t,m,n)\in\mathcal{J}_{s,1}^{f,l}}\right)+\mathbb{E}\left[v_{s+1}^{*}\left(f(x_s,u_s,W_s,d_{s+1})\right)\right]\\
&=v_s^{r*}(x_s^{r})+\sum_{l=1}^{L}v_s^{f,l*}(x_s^{f,l}),\quad\text{for all } s=1,\ldots,T.
\end{aligned}$$





This indicates that, given (16), if $u_s^{r*}=\pi_s^{r}(x_s^{r})$ and $u_s^{f,l*}=\pi_s^{f,l}(x_s^{f,l})$ for l=1, . . . , L, then $(u_s^{r*},u_s^{f,1*},\ldots,u_s^{f,L*})$ minimizes (8).


We now prove the probability bounds. By Lemma 3, we have for any κ>0,









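$$0=\limsup_{j\to\infty}\;\mathbb{P}\left(\left\|\hat{v}_s^{r,j}-v_s^{r*}\right\|>\kappa\right)=\limsup_{j\to\infty}\;\mathbb{P}\left(\left\|\hat{v}_s^{f,l,j}-v_s^{f,l*}\right\|>\kappa\right).$$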









That is, for any $x=(x^{r},x^{f,1},\ldots,x^{f,L})\in\mathcal{X}_0^{r}\times\mathcal{X}_0^{f}$ and κ>0,

$$\begin{aligned}
&\limsup_{j\to\infty}\;\mathbb{P}\left(\left|J^{r}(\hat{\pi}^{r,j};x^{r})+\sum_{l=1}^{L}J^{f,l}(\hat{\pi}^{f,l,j};x^{f,l})-J(\pi;x)\right|>\kappa\right)\\
&\quad=\limsup_{j\to\infty}\;\mathbb{P}\left(\left|J^{r}(\hat{\pi}^{r,j};x^{r})+\sum_{l=1}^{L}J^{f,l}(\hat{\pi}^{f,l,j};x^{f,l})-v_0^{r*}(x^{r})-\sum_{l=1}^{L}v_0^{f,l*}(x^{f,l})\right|>\kappa\right)\\
&\quad\le\limsup_{j\to\infty}\;\mathbb{P}\left(\left|J^{r}(\hat{\pi}^{r,j};x^{r})-v_0^{r*}(x^{r})\right|>\kappa\right)+\sum_{l=1}^{L}\limsup_{j\to\infty}\;\mathbb{P}\left(\left|J^{f,l}(\hat{\pi}^{f,l,j};x^{f,l})-v_0^{f,l*}(x^{f,l})\right|>\kappa\right)\\
&\quad\le\limsup_{j\to\infty}\;\mathbb{P}\left(\left\|\hat{v}_0^{r,j}-v_0^{r*}\right\|>\kappa\right)+\sum_{l=1}^{L}\limsup_{j\to\infty}\;\mathbb{P}\left(\left\|\hat{v}_0^{f,l,j}-v_0^{f,l*}\right\|>\kappa\right)\\
&\quad=0,
\end{aligned}$$




which completes the proof.
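The sufficient condition (16) of Lemma 4 is easy to check numerically for a given period. The sketch below is illustrative only: the function name `satisfies_lemma4` and the EV counts are hypothetical, and the rate is expressed as energy per EV per scheduling period so that it can be compared directly with the grid bound.

```python
from typing import Sequence


def satisfies_lemma4(
    d_lower: float,
    d_upper: float,
    rate: float,
    y_reliable: Sequence[int],
    y_flexible: Sequence[Sequence[int]],
) -> bool:
    """Check condition (16) for one period.

    d_lower, d_upper : grid bounds for the period (kWh)
    rate             : energy one EV can draw in the period (kWh)
    y_reliable       : number of EVs per reliable category
    y_flexible       : per flexible type, number of EVs per category
    """
    total_evs = sum(y_reliable) + sum(sum(y) for y in y_flexible)
    # The cap must cover every plugged-in EV charging at full rate.
    return d_lower == 0 and d_upper >= rate * total_evs


# 450 EVs at 10 kWh per hour need at most 4500 kWh in one hour.
ok_large_cap = satisfies_lemma4(0.0, 10000.0, 10.0, [100, 200], [[150]])
ok_small_cap = satisfies_lemma4(0.0, 4000.0, 10.0, [100, 200], [[150]])
```

When the check passes in every period, the grid bound never binds, which is why the per-type subproblems decouple without loss of optimality.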


Algorithm

Having computed the value functions using the fitted value iteration algorithm described above, we provide a detailed ADP algorithm to obtain $u_0^{*},u_1^{*},\ldots,u_T^{*}$. We compute the approximate optimal actions for reliable and flexible demand, $\hat{u}_s^{r*}$ and $\hat{u}_s^{f,1*},\ldots,\hat{u}_s^{f,L*}$, using a multistage rollout algorithm. Then, each EV in a category can be charged according to any disaggregation algorithm, such as first-come, first-served (FCFS). The overall algorithm is described below.


A coarse approximation of the true computational complexity of the algorithm may be provided as follows. Let $U=\max\{\dim(\mathcal{U}_s^{r}),\dim(\mathcal{U}_s^{f,1}),\ldots,\dim(\mathcal{U}_s^{f,L})\}$. Then the time complexity of solving the joint linear optimization with constraints is at least $O((L+1)^{2}U^{2.5})$. With the two-stage algorithm according to the present disclosure, solving each stage has a time complexity of at least $O(U^{2.5})$, and the total time complexity is at least $O((L+1)U^{2.5})$.
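To make the two bounds concrete, the following sketch compares the leading-order operation counts. It is only a back-of-the-envelope illustration: the function names are hypothetical, constants are ignored, and the value U=90 is borrowed from the numerical example as an assumption.

```python
def joint_ops(num_flexible_types: int, max_action_dim: int) -> float:
    """Leading-order count for the joint problem: (L+1)^2 * U^2.5."""
    return (num_flexible_types + 1) ** 2 * max_action_dim ** 2.5


def staged_ops(num_flexible_types: int, max_action_dim: int) -> float:
    """Leading-order count for the staged algorithm: (L+1) * U^2.5."""
    return (num_flexible_types + 1) * max_action_dim ** 2.5


def speedup(num_flexible_types: int, max_action_dim: int) -> float:
    """Ratio of the two counts; it reduces to L + 1."""
    return joint_ops(num_flexible_types, max_action_dim) / staged_ops(
        num_flexible_types, max_action_dim
    )


# With one flexible type (L=1) the ratio is 2; with L=3 it is 4.
ratio_l1 = speedup(1, 90)
ratio_l3 = speedup(3, 90)
```

Under these bounds the saving grows linearly with the number of flexible demand types and is independent of U.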


The multi-stage EV charging scheduling and control algorithm according to embodiments of the disclosure may be summarized by the following pseudocode:














Part I: Multistage Fitted Value Iteration
 Initialize $v_{T+1}^{*}\equiv 0$.
 FOR s = T, . . ., 1 DO
   Generate state and noise samples for reliable demand: $\{x_{s,j}^{r}\}_{j=1}^{k'}$ and $\{W_{s,i}^{r}\}_{i=1}^{k}$.
   Create data set $\{x_{s,j}^{r},\hat{v}_s^{r,j}(x_{s,j}^{r})\}_{j=1}^{k'}$ using the fitted empirical Bellman operator, and
    obtain $\hat{v}_s^{r,j}$ with neural networks.
   Generate state and noise samples for flexible demand: $\{x_{s,j}^{f,l}\}_{j=1}^{k'}$ and $\{W_{s,i}^{f,l}\}_{i=1}^{k}$ for
    each l = 1, . . ., L. Create data set $\{x_{s,j}^{f,l},\hat{v}_s^{f,l,j}(x_{s,j}^{f,l})\}_{j=1}^{k'}$ using the fitted empirical
    Bellman operator, and obtain $\hat{v}_s^{f,l,j}$ with neural networks for each l = 1, . . ., L.
 END FOR
Part II: Multistage Rollout Algorithm
 Initialize $x_0=0$.
 FOR s = 1, . . ., T DO
   Update $x_s$ using (5), and decouple $x_s$ into $x_s^{r},x_s^{f,1},\ldots,x_s^{f,L}$.
   Pick $d_s^{r}=d_s$ and compute $\hat{u}_s^{r*}=\hat{\pi}_s^{r}(x_s^{r})$ by (13).
   FOR l = 1, . . ., L DO
     Update $d_s^{f,l}$ by (15).
     Compute $\hat{u}_s^{f,l*}=\hat{\pi}_s^{f,l}(x_s^{f,l})$ with (14).
   END FOR
   FOR (t,m,n) ∈ $T\times\mathcal{B}^{r}$ and (t,m,n,l) ∈ $T\times\mathcal{B}^{f}$ DO
     Charge each EV η units of electricity in the interval from s to s + 1 based on FCFS
      discipline, where:
     IF in category (t,m,n) THEN
       Reliable demand: $\eta=\hat{u}_s^{t,m,n*}/y_s^{t,m,n}$
     ELSE
       Flexible demand: $\eta=\hat{u}_s^{t,m,n,l*}/y_s^{t,m,n,l}$
     END IF
   END FOR
 END FOR
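The final disaggregation step of Part II can be sketched in two ways: an equal share per EV, matching the ratio η above, and one possible FCFS reading in which EVs are served in arrival order up to the charger rate. Both function names and the example numbers are hypothetical, and the FCFS variant is an interpretation rather than the disclosed procedure.

```python
from typing import List, Sequence


def equal_share(category_energy: float, ev_count: int) -> float:
    """Energy per EV when the category allocation is split evenly."""
    if ev_count == 0:
        return 0.0
    return category_energy / ev_count


def fcfs_disaggregate(
    category_energy: float,
    remaining_need: Sequence[float],
    rate: float,
) -> List[float]:
    """Split a category's energy across EVs listed in arrival order.

    Each EV receives at most its remaining need and at most `rate`
    (energy deliverable in one period), until the energy runs out.
    """
    allocation: List[float] = []
    left = category_energy
    for need in remaining_need:
        give = min(rate, need, left)
        allocation.append(give)
        left -= give
    return allocation


# 25 kWh scheduled for a category with three EVs, 10 kWh per EV per hour.
shares = fcfs_disaggregate(25.0, [10.0, 20.0, 30.0], rate=10.0)
eta = equal_share(30.0, 3)
```

Because the aggregate action is fixed by the scheduler, the choice of disaggregation rule changes which EV is charged first but not the total energy drawn in the period.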









Numerical Results
Simulation Setup

In this illustration of operation of the system or method for scheduling and controlling EV charging based on the scheduling of a large number of EVs, we consider the scheduling of EV charging for a T=24 hour period, that is, from 7 AM (day 1) to 7 AM (day 2). The electricity prices may vary according to a time-of-use schedule having two or more ranges or categories as well as the day of the week (such as weekdays/weekends) and a summer/winter season, for example. In this illustration, electricity prices vary according to peak/off-peak hours during the same season and days with the same rate (weekdays). We consider two types of customers, i.e. L=1, and the customers pay a constant price of 9.2¢/kWh for reliable demand and 7.36¢/kWh (20% discount) for flexible demand. The cost $c_s$ is the difference between the electricity price and the revenue per kWh from the customers. The platform will compensate customers $p_s$=2.5¢/kWh if their minimum flexible demand is not met. In what follows, we interpret the respective costs associated with reliable demand and flexible demand, $c_s^{r}$ and $c_s^{f,l}$, as negative profits. These parameters are shown in Table 1 below:









TABLE 1
Electricity Prices During Weekdays

| Time (h)             | 7-14     | 14-18   | 18-22    | 22-7     |
| Peak hours           | Mid-Peak | On-Peak | Mid-Peak | Off-Peak |
| $c_s^{r}$ (¢/kWh)    | 0        | 7.4     | 0        | −4.4     |
| $c_s^{f,1}$ (¢/kWh)  | 1.84     | 9.24    | 1.84     | −2.56    |
| $p_s^{f,1}$ (¢/kWh)  | 2.5      | 2.5     | 2.5      | 2.5      |
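The cost rows of Table 1 follow from the stated definition of the cost as electricity price minus customer revenue. The sketch below reproduces them; note that the grid prices are not stated in the text and are inferred here by adding the reliable revenue of 9.2¢/kWh back to the reliable cost row, so they should be read as an assumption. All names are hypothetical.

```python
# Customer prices stated in the text (cents/kWh).
RELIABLE_REVENUE = 9.2
FLEXIBLE_REVENUE = 7.36  # 20% discount on the reliable price

# Grid prices inferred from Table 1 (reliable cost + 9.2); an assumption,
# since the source does not list them explicitly.
INFERRED_GRID_PRICE = {"mid-peak": 9.2, "on-peak": 16.6, "off-peak": 4.8}


def net_cost(grid_price: float, revenue: float) -> float:
    """Net cost per kWh to the platform, rounded to 0.01 cents."""
    return round(grid_price - revenue, 2)


reliable_cost = {
    period: net_cost(price, RELIABLE_REVENUE)
    for period, price in INFERRED_GRID_PRICE.items()
}
flexible_cost = {
    period: net_cost(price, FLEXIBLE_REVENUE)
    for period, price in INFERRED_GRID_PRICE.items()
}
```

The flexible cost exceeds the reliable cost by 1.84¢/kWh in every period, which is exactly the discount given to flexible customers.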









For this example, we choose M=3 and N=6, and the charging rate is fixed at r=10 kW. The feasible menus $\mathcal{B}^{r}$ and $\mathcal{B}^{f,1}$ are given in Table 2 below. The resulting dimensionality of the state and action spaces is $\dim(\mathcal{X}_s^{r})=182$, $\dim(\mathcal{X}_s^{f,1})=272$ and $\dim(\mathcal{U}_s^{r})=\dim(\mathcal{U}_s^{f,1})=90$. The arrival processes $\{w_t^{r}\}$ and $\{w_t^{f,1}\}$ are sequences of Poisson-distributed random variables, with the distribution deduced from the ACN Dataset. We also pick $d_s$=[0 kWh, 10000 kWh] as the hourly grid bounds.









TABLE 2
Feasible Menu Given M = 3 And N = 6

| $\mathcal{B}^{r}$, $\mathcal{B}^{f,1}$ | n = 1  | n = 2  | n = 3  | n = 4  | n = 5  | n = 6  |
| m = 10 kWh                             | (1, 1) | (1, 2) | (1, 3) | (1, 4) | (1, 5) | (1, 6) |
| m = 20 kWh                             | x      | (2, 2) | (2, 3) | (2, 4) | (2, 5) | (2, 6) |
| m = 30 kWh                             | x      | x      | (3, 3) | (3, 4) | (3, 5) | (3, 6) |
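The pattern of feasible entries in Table 2 is consistent with a simple rule: a request of m×10 kWh is only offered if it can be delivered within n hours at the 10 kW rate. The sketch below enumerates the menu under that rule; the rule itself is inferred from the table rather than stated in the text, and the function name and default arguments are hypothetical.

```python
from typing import List, Tuple


def feasible_menu(
    max_energy_level: int,
    max_duration: int,
    energy_step_kwh: float = 10.0,
    rate_kw: float = 10.0,
) -> List[Tuple[int, int]]:
    """List (m, n) pairs whose energy fits in the parking duration.

    m indexes the requested energy (m * energy_step_kwh),
    n is the parking duration in hours.
    """
    menu = []
    for m in range(1, max_energy_level + 1):
        hours_needed = m * energy_step_kwh / rate_kw
        for n in range(1, max_duration + 1):
            if hours_needed <= n:
                menu.append((m, n))
    return menu


menu = feasible_menu(3, 6)
menu_size = len(menu)
```

For M=3 and N=6 this yields the 15 entries shown in the table, with the three infeasible cells (2, 1), (3, 1) and (3, 2) excluded.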









For the projection operator, we choose the number of state samples and noise samples to be 64 (i.e. j=64). The function approximation class is the set of neural networks with depth 8 and width $\dim(\mathcal{X}_s^{r})\times 2=364$ for reliable demand and $\dim(\mathcal{X}_s^{f,1})\times 2=544$ for flexible demand. The learning rate is chosen as 0.005. The empirical fitted value iteration is employed to compute the value functions $\hat{v}_s^{r,j}$ and $\hat{v}_s^{f,1,j}$.


Results of Reliable and Flexible Demand

We demonstrate the performance of our ADP algorithm, denoted as ADP, by comparing it with two other algorithms: SP (simple programming) and FCFS (first-come, first-served). The SP algorithm computes the optimal cost with knowledge of all the future demand. In this case, the problem reduces to solving a linear program with constraints, formulated as a deterministic optimization problem since $\{w_t^{r}\}$ and $\{w_t^{f,1}\}$ are known. We denote the optimal actions of SP as $\{u_{\mathrm{SP},s}^{r*}\}$ and $\{u_{\mathrm{SP},s}^{f,1*}\}$. The second algorithm, FCFS, follows the first-come, first-served discipline, which charges the EVs immediately when they arrive at the charging station. This is the most widely used scheduling algorithm across the world. Let the actions of FCFS be denoted by $\{u_{\mathrm{FCFS},s}^{r*}\}$ and $\{u_{\mathrm{FCFS},s}^{f,1*}\}$. The information required by the three algorithms is summarized in Table 3 below.









TABLE 3
Application Scenarios of the Algorithms Given Knowledge of the Future

| Algorithms | Future Demand | Demand Distribution | No Knowledge |
| SP         | Yes           | No                  | No           |
| ADP        | Yes           | Yes                 | No           |
| FCFS       | Yes           | Yes                 | Yes          |









We similarly denote the optimal actions of our ADP algorithm as $\{u_{\mathrm{ADP},s}^{r*}\}$ and $\{u_{\mathrm{ADP},s}^{f,1*}\}$. Let the cumulative cost for each sample path be

$$J_{\alpha,t}^{*}=\sum_{s=1}^{T}\left((c_s^{r})^{\top}u_{\alpha,s}^{r*}+(p_s^{r})^{\top}\left(m-\sum_{\tau=t}^{s}u_{\alpha,s}^{r*}\right)^{+}_{(t,m,n)\in\mathcal{J}_{s,1}^{r}}+(c_s^{f,l})^{\top}u_{\alpha,s}^{f,l*}+(p_s^{f,l})^{\top}\left(l-\sum_{\tau=t}^{s}u_{\alpha,s}^{f,l*}\right)^{+}_{(t,m,n,l)\in\mathcal{J}_{s,1}^{f,l}}\right) \tag{17}$$







where α∈{ADP,SP,FCFS}. Note that $J_{\mathrm{SP}}^{*}$ is a lower bound on $J_{\mathrm{ADP}}^{*}$ and $J_{\mathrm{FCFS}}^{*}$ since SP knows all the future demand. Ten sample paths are used to compare the performance of these algorithms, and the results are shown in FIG. 2. Since ADP and SP exploit knowledge of the future demand distribution or of the demand itself, their profits are much higher than those of the FCFS charging policy. We consider two types of FCFS for flexible demand: charging the EV up to the minimum target SoC or up to the maximum target SoC.


We can further observe from FIG. 2 that, with the exception of FCFS with maximum demand, all of the algorithms serve a similar amount of demand at the end of the day. In fact, ADP serves more demand, and achieves a relatively similar profit to SP. The optimality gap between SP and ADP in the upper plot of FIG. 2 is due to the approximation error of the value function and the uncertainty about the future in the ADP algorithm.



FIG. 3 depicts the error from the approximation in the flexible demand setting. This results from two major factors in serving flexible demand: 1) the value function is more difficult to approximate due to the penalty term in the cost function; 2) the two-stage optimization makes the value function of flexible demand more sensitive to the results of the reliable demand optimization, and the function approximator requires more data samples to achieve comparable accuracy. Under the reliable demand setting, the profits of SP and ADP are similar, whereas, under the flexible setting, there is an optimality gap between SP and ADP. However, ADP performs better than FCFS in both settings.


We also observe that the penalty of violating the charging demand can significantly affect the profits in the flexible demand setting, which is shown in FIG. 4. A lower penalty leads to lower energy consumption but higher profits since it allows dropping flexible demand during the peak hours.


As the penalty $p_s^{f,1}$ increases, the optimality gap between SP and ADP becomes larger, as demonstrated in FIG. 5. Here, the FCFS algorithm is not affected by the penalty changes. Note that at $p_s^{f,1}=1.84$ and $p_s^{f,1}=9.24$, the SP energy consumption increases because the penalty becomes higher than the charging cost during the mid-peak and peak hours respectively, so the demand with high charging cost can no longer be dropped. This leads to a trade-off between the total charge provided and the cumulative profits.
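The two thresholds mentioned above can be reproduced with a myopic per-kWh comparison: serving a unit of flexible demand in a period is worthwhile once the penalty for dropping it is at least the net charging cost in that period. The sketch below uses the flexible cost row of Table 1; it deliberately ignores the option of deferring charging to a cheaper period, and the function name is hypothetical.

```python
from typing import Dict, List

# Net cost of serving flexible demand per period (cents/kWh), from Table 1.
FLEXIBLE_COST: Dict[str, float] = {
    "mid-peak": 1.84,
    "on-peak": 9.24,
    "off-peak": -2.56,
}


def periods_worth_serving(
    penalty: float, cost_by_period: Dict[str, float] = FLEXIBLE_COST
) -> List[str]:
    """Periods where paying the cost is no worse than paying the penalty."""
    return sorted(
        name for name, cost in cost_by_period.items() if cost <= penalty
    )


low_penalty = periods_worth_serving(1.0)
table_penalty = periods_worth_serving(2.5)
high_penalty = periods_worth_serving(9.24)
```

With the 2.5¢/kWh penalty of Table 1, only on-peak demand is cheaper to drop than to serve, which matches the observation that a lower penalty reduces energy consumption during peak hours.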


Constraints Relaxation for Reliable Demand

As long as the grid bounds $d_s$ are sufficiently large, the reliable demand can always be fully satisfied. However, in various regions and/or during various times, the available grid power may be limited to less than the reliable demand. In this case, the limited grid bounds $d_s$ lead to $\Gamma^{r}(x_s^{r})=\emptyset$, and thus there is no feasible solution for (13). To circumvent this, we also add a penalty term to the reliable demand setting, which solves the following Bellman equation

$$\pi_s^{r}(x_s^{r})=\operatorname*{arg\,inf}_{u_s^{r}\in\Gamma^{r'}(x_s^{r})}\;(c_s^{r})^{\top}u_s^{r}+\underbrace{(p_s^{r})^{\top}\left(z_s^{r}-u_s^{r}\right)^{+}_{(t,m,n)\in\mathcal{J}_{s,1}^{r}}}_{\text{penalty for reliable demand}}+\mathbb{E}\left[v_{s+1}^{r'*}\left(f^{r}(x_s^{r},u_s^{r},W_s^{r})\right)\right], \tag{18}$$







where $\Gamma^{r'}$ is the relaxed feasible action set, i.e.

$$\Gamma^{r'}(x_s^{r}):=\left\{u_s^{r}\in\mathcal{U}_s^{r}:\ 0\le u_s^{r}\le g(x_s^{r}),\ \underline{d}_s^{r}\le\mathbf{1}^{\top}u_s^{r}\le\bar{d}_s^{r}\right\},$$

and $v_s^{r'*}$ is the corresponding value function. For this example, we choose $d_s$=[0, 6000 kWh] and $d_s$=[0, 8000 kWh] to demonstrate the performance of each algorithm in this scenario, which is shown in FIG. 5. In this case, we replace the heuristic baseline FCFS with EDF (earliest deadline first), since EDF requires smaller grid bounds to meet the charging demand. The results are depicted in FIG. 6. During the peak hours, SP and ADP consume 0 kWh of electricity since the demand is not served and the platform pays penalties to the customers; this leads to SP and ADP having a higher profit than EDF. In FIG. 6, we can also observe that the optimality gap between SP and ADP caused by the multi-stage algorithm shrinks if the grid bound is sufficiently large ($\bar{d}_s$=10000 kWh). This was established in Lemma 4.


As described herein, scheduling of EV charging may be modeled with multiple types of reliability constraints as a stochastic dynamic program. Due to very high dimensional state and action spaces, and a high number of constraints, the resulting problem cannot be solved using the usual dynamic programming algorithm. As such, various embodiments according to the disclosure use fitted value iteration to solve the problem, and apply a multi-stage algorithm to reduce the computational complexity of the solution approach. Simulations show that this algorithm yields profits close to the optimal profits obtained under full information about the future demand of the EVs, and performs better than heuristic algorithms such as FCFS and EDF. This disclosure also demonstrates robustness of the algorithm with relaxed constraints in the optimization. While the disclosed two-stage decoupling algorithm may provide acceptable results for various applications, it may be further improved to reduce the optimality gap.


While exemplary embodiments are described above, it is not intended that these embodiments describe all possible forms encompassed by the claims. The words used in the specification are words of description rather than limitation, and it is understood that various changes can be made without departing from the spirit and scope of the disclosure. As previously described, the features of various embodiments can be combined to form further embodiments of the invention that may not be explicitly described or illustrated. While various embodiments could have been described as providing advantages or being preferred over other embodiments or prior art implementations with respect to one or more desired characteristics, those of ordinary skill in the art recognize that one or more features or characteristics can be compromised to achieve desired overall system attributes, which depend on the specific application and implementation. These attributes may include, but are not limited to cost, strength, durability, marketability, appearance, packaging, size, serviceability, weight, manufacturability, ease of assembly, etc. As such, embodiments described as less desirable than other embodiments or prior art implementations with respect to one or more characteristics are not outside the scope of the disclosure and can be desirable for particular applications.

Claims
  • 1. A method for controlling charging of multiple electric vehicles (EVs) arriving at, and departing from, different charging stations at different times, comprising, by one or more processors: scheduling charging of each EV of the multiple EVs responsive to which one of a plurality of categories each EV is assigned, each EV assigned to one of the categories according to an arrival time at an associated one of the different charging stations, a departure time from the associated one of the different charging stations, an initial state of charge (SoC) of the EV, and a target SoC of the EV; and controlling charging of each EV responsive to the scheduling of charging for the assigned category of each EV.
  • 2. The method of claim 1 further comprising assigning each EV to one of the categories according to one of a plurality of charging demand types designated by the EV.
  • 3. The method of claim 2 wherein the plurality of charging demand types includes a reliable charging demand and a flexible charging demand.
  • 4. The method of claim 3 wherein scheduling charging of each EV designating the flexible charging demand includes scheduling the charging responsive to a specified minimum target SoC and a specified maximum target SoC of the EV.
  • 5. The method of claim 3 wherein scheduling charging of each EV comprises: scheduling charging of EVs designating the reliable charging demand assuming the EVs designating the reliable charging demand will consume all available electricity during a specified time period based on an associated grid upper demand band for the specified period; and scheduling charging of EVs designating the flexible charging demand based on the specified minimum target SoC of the EVs designated the flexible charging demand.
  • 6. The method of claim 3 wherein scheduling charging of each EV comprises: allocating available electricity from an electrical grid to each of the plurality of categories based on a number of EVs in each category arriving at the charging stations during a designated time period and a cost associated with allocated available electricity, the allocated available electricity limited by a minimum of remaining electricity available for each category and charging capacity of the multiple EVs.
  • 7. The method of claim 6 wherein the cost associated with the allocated available electricity corresponds to a net cost corresponding to cost of electricity supplied by an electric grid less a cost associated with a penalty corresponding to the allocated available electricity corresponding to EVs designating the reliable charging demand being insufficient to charge the EVs designating the reliable charging demand to associated target SoCs.
  • 8. The method of claim 6 wherein the cost associated with the allocated available electricity corresponds to a net cost corresponding to cost of electricity supplied by an electric grid less a cost associated with a penalty corresponding to the allocated available electricity corresponding to EVs designating the flexible charging demand being insufficient to charge the EVs designating the reliable charging demand to associated minimum target SoCs.
  • 9. The method of claim 6 wherein scheduling charging of each EV comprises: determining a number of EVs in each category designating a reliable charging demand and arriving at a specified time using a designated statistical distribution; allocating electricity from the grid available for charging to each category designating the reliable charging demand for a second specified time period; associating electricity cost of electricity allocated to each category designating a reliable charging demand; and allocating any electricity available after satisfying the reliable charging demand for the second specified time period to EVs designating the flexible charging demand.
  • 10. The method of claim 9 wherein scheduling charging of each EV further comprises limiting allocation of electricity from the grid available for charging to each category designating the reliable charging demand to a minimum of remaining electricity available from the grid and charging capacity of the EVs designating the reliable charging demand.
  • 11. A computer-implemented method for controlling charging of a large number of electric vehicles (EVs) arriving and departing different charging stations at different times, the method comprising, by one or more computers: assigning one of a plurality of categories to each EV that arrives at a charging station depending on arrival time to the charging station, designated departure time from the charging station, initial state of charge (SoC) of the EV, target SoC of the EV, and a charging demand type specified by the EV; scheduling charging of each EV according to which of the plurality of categories the EV has been assigned and the charging type specified by the EV; and controlling charging of each EV based on the scheduling.
  • 12. The method of claim 11 wherein the demand type specified by the EV corresponds to a flexible demand type, wherein EVs specifying the flexible demand type specify a minimum target SoC and a maximum target SoC, and wherein controlling charging is based on the minimum target SoC and the maximum target SoC specified by the EV.
  • 13. The method of claim 12 wherein, for categories associated with the flexible demand type, the scheduling is based on controlling the charging to satisfy the minimum target SoC for each EV specifying the flexible demand type, and continuing to charge to the maximum target SoC responsive to profit associated with charging to the maximum target SoC exceeding a threshold.
  • 14. The method of claim 13 wherein the demand type includes a reliable demand type, and wherein scheduling charging comprises allocating electricity available from the grid to categories associated with the reliable demand type before allocating the electricity available from the grid to categories associated with the flexible demand type.
  • 15. The method of claim 14 wherein scheduling comprises assigning a penalty cost for each EV specifying the flexible demand type that is not charged to at least the minimum target SoC prior to the departure time.
  • 16. The method of claim 11 wherein the scheduling is based on a statistical distribution of arrival times to the charging stations and designated departure times from the charging stations.
  • 17. A system comprising: a plurality of electric vehicle (EV) charging stations each configured to charge a plugged EV during a time period specified by at least one remotely-located processor, the processor configured to schedule charging of EVs for all of the charging stations by scheduling a plurality of charging categories, each EV assigned to one of the plurality of categories by the processor based on arrival time to a charging station, expected departure time from the charging station, state of charge (SoC) of the EV upon arrival, target SoC of the EV before the expected departure time, and a charging demand type specified by the EV.
  • 18. The system of claim 17 wherein each of the plurality of charging stations includes a processor configured to control charging of an associated plugged EV according to the charging schedule.
  • 19. The system of claim 17 wherein the processor is configured to schedule charging of EVs based on a minimum and maximum available power provided from an associated electric grid.
  • 20. The system of claim 17 wherein the charging demand type includes a flexible demand having an associated minimum target SoC and maximum target SoC specified by the EV, and wherein the processor is further configured to schedule charging of EVs to provide the minimum target SoC for all EVs specifying the flexible demand, and to continue charging the EVs specifying the flexible demand above the minimum target SoC only if an associated profit exceeds a threshold.
CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit under 35 U.S.C. § 119(e) of U.S. Application 63/390,779 filed Jul. 20, 2022, the disclosure of which is hereby incorporated by reference in its entirety.

Provisional Applications (1)
Number Date Country
63390779 Jul 2022 US