Threshold based selective maintenance of series-parallel systems

Information

  • Patent Grant
  • 11755971
  • Patent Number
    11,755,971
  • Date Filed
    Wednesday, April 7, 2021
  • Date Issued
    Tuesday, September 12, 2023
Abstract
A condition of an asset including one or more subsystems connected in series is monitored. Each subsystem includes one or more components connected in parallel. The asset has one or more jobs. A probability of the asset surviving a predetermined amount of time is determined based on the monitoring and one or more shared resources. The one or more shared resources are configured to be shared between the subsystems. A model is established using a threshold based heuristic maintenance policy. The model is configured to maximize a number of successful jobs that the asset is able to complete based on the determined probability. The one or more shared resources are allocated to the one or more subsystems based on the model.
Description
SUMMARY

Embodiments described herein involve a method comprising monitoring a condition of an asset comprising one or more subsystems connected in series. Each subsystem comprises one or more components connected in parallel. The asset has one or more jobs. A probability of the asset surviving a predetermined amount of time is determined based on the monitoring and one or more shared resources. The one or more shared resources are configured to be shared between the subsystems. A model is established using a threshold based heuristic maintenance policy. The model is configured to maximize a number of successful jobs that the asset is able to complete based on the determined probability. The one or more shared resources are allocated to the one or more subsystems based on the model.


Embodiments involve a system comprising a processor and a memory storing computer program instructions which when executed by the processor cause the processor to perform operations. The operations comprise monitoring a condition of an asset comprising one or more subsystems connected in series. Each subsystem comprises one or more components connected in parallel. The asset has one or more jobs. A probability of the asset surviving a predetermined amount of time is determined based on the monitoring and one or more shared resources. The one or more shared resources are configured to be shared between the subsystems. A model is established using a threshold based heuristic maintenance policy. The model is configured to maximize a number of successful jobs that the asset is able to complete based on the determined probability. The one or more shared resources are allocated to the one or more subsystems based on the model.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1A shows a process for allocating resources using a threshold based heuristic system in accordance with embodiments described herein;



FIG. 1B illustrates another process for allocating resources using a threshold based heuristic system in accordance with embodiments described herein;



FIG. 2 shows a block diagram of a system capable of implementing embodiments described herein;



FIG. 3 shows an optimal maintenance resource allocation policy of threshold versus time slots in accordance with embodiments described herein; and



FIG. 4 illustrates a maintenance resource allocation policy of optimal allocation versus number of maintenance resources in accordance with embodiments described herein.





The figures are not necessarily to scale. Like numbers used in the figures refer to like components. However, it will be understood that the use of a number to refer to a component in a given figure is not intended to limit the component in another figure labeled with the same number.


DETAILED DESCRIPTION

Industrial and mission critical assets such as factory assembly lines, aircraft engines and military equipment are composed of series-parallel systems that require periodic maintenance and replacement of faulty components. The assets typically operate over a finite time window and are brought in (or shut down) for service between consecutive operational windows (e.g., missions). To prevent excessive down times and costly breakdown, a selective maintenance policy that judiciously selects the sub-systems and components for repair and prescribes allocation of resources may be used.


Embodiments described herein involve a threshold based system and method that provides a near-optimal maintenance policy, especially when resources (manpower, budget, spare parts) are scarce and have to be shared across the entire planning horizon. The exact planning problem is computationally intensive, and the heuristic method produces near-optimal maintenance schedules in reasonable time suitable for practical applications. The threshold value is expressed as the long-term expected maximum reward yielded by holding a single resource in reserve. The heuristic policy described herein determines threshold values that are a function of the resource inventory levels and the state of the system. If the immediate reward yielded by allocating a resource exceeds the long-term expected marginal yield (e.g., the threshold value), the policy dictates that the resource be allocated. In this regard, we perform a tradeoff between the immediate reward and the long-term expected reward yielded by holding the resource in reserve (i.e., for allocation at a future time). If a plurality of resources are available, the policy dictates that the resource with the maximum marginal reward be allocated first. Also, if a plurality of sub-systems/components are in need of repair, the policy dictates that those sub-systems/components whose repair will yield the maximum asset reliability be repaired first. Embodiments described herein can be used in any type of system. For example, the techniques described herein can be used in aircraft engine maintenance, assembly line selective maintenance (e.g., in printer ink facilities), industrial equipment, military asset management, and/or weapons allocation (e.g., assignment of weapons to targets).


In some cases, finite resources are allocated for each service period (or break). Embodiments described herein specifically include allocation of scarce resources across the planning horizon. In general, consumable resources (e.g., man hours, parts) can be shared and therefore optimally assigned depending on the condition of the asset and the remaining number of missions (time horizon). The selective maintenance scheduling problem for series-parallel systems is computationally intensive, and optimal schedules may take hours to compute on present-day edge devices. Embodiments described herein exploit the structure in the problem to derive an intuitive threshold based heuristic policy that is amenable to fast real-time implementation. The heuristic policy employs a scalable linear backward recursion for computing the threshold values, as opposed to solving a non-linear Bellman recursion (which is typically not scalable for a system with a large number of sub-systems and components) for computing the optimal allocation.



FIG. 1A shows a process for allocating resources using a heuristic threshold-based system in accordance with embodiments described herein. A condition of an asset comprising one or more subsystems connected in series is monitored 110. For example, the monitoring 110 may involve determining if all subsystems are functioning properly. Each subsystem has one or more components connected in parallel. One or more of the components may be redundant within each subsystem. In some cases, all of the one or more components are redundant within each subsystem. According to various embodiments, the asset has one or more jobs. According to various embodiments the one or more jobs comprise one or more identical jobs.


A probability of the asset surviving a predetermined amount of time is determined 120 based on the monitoring 110 and one or more shared resources configured to be shared between the subsystems. The shared resources may include one or both of replenishable resources and consumable resources. Specifically, the one or more shared resources include one or more of man hours, budget, parts, and equipment used to perform maintenance.


A model is established 130 using a threshold based heuristic maintenance policy. The model may be configured to maximize a number of successful jobs that the asset is able to complete based on the determined probability. According to various embodiments, the model is configured to minimize a shared resource cost. The resource cost may be calculated using a Maximal Marginal Reward (MMR) algorithm that is configured to maximize an expected payout or minimize an expected cost.


The one or more shared resources are allocated 140 to the one or more subsystems based on the model. A maintenance schedule for the asset may be determined based on the model. In some cases, the one or more components have a known failure rate and the known failure rates are used to establish the model.



FIG. 1B shows a more detailed flow diagram for allocating resources using a heuristic threshold-based system in accordance with embodiments described herein. The process starts 145 and it is determined 155 whether it is the final decision stage. This may occur when enough information is available about the condition of the asset and the number of available resources to build a model.


If it is determined 155 that it is not the final decision stage, a failure probability for all components of the asset is determined 150 until the next decision epoch. The next decision epoch may be a current task that one or more subsystems of the asset is performing and/or a current mission of the asset, for example.


Components and/or subsystems are prioritized for repair and/or replacement 160. The prioritization may be based on a time to failure of the component and/or subsystem, for example. In some cases, the prioritization may be based on a cost of the replacement and/or repair.


A threshold value is computed 170 for the next available resource. It is determined 180 whether the immediate resource allocation reward is greater than the threshold. If it is determined 180 that the immediate resource allocation reward is not greater than the threshold, the process again computes 170 the threshold value for the next available resource.


If it is determined 180 that the immediate resource allocation reward is greater than the threshold, resources are allocated 190. The remaining resources and components selected for repair are iterated through. The process then advances 195 to the next decision epoch and the process returns to determining 155 if it is the final decision stage and the process continues.


If it is determined 155 that the final decision stage has been reached, an algorithm is used 165 to allocate all remaining resources. According to various configurations, the algorithm is the MMR algorithm. The process then ends 175.


The methods described herein can be implemented on a computer using well-known computer processors, memory units, storage devices, computer software, and other components. A high-level block diagram of such a computer is illustrated in FIG. 2. Computer 200 contains a processor 210, which controls the overall operation of the computer 200 by executing computer program instructions which define such operation. It is to be understood that the processor 210 can include any type of device capable of executing instructions. For example, the processor 210 may include one or more of a central processing unit (CPU), a graphical processing unit (GPU), a field-programmable gate array (FPGA), and an application-specific integrated circuit (ASIC). The computer program instructions may be stored in a storage device 220 and loaded into memory 230 when execution of the computer program instructions is desired. Thus, the steps of the methods described herein may be defined by the computer program instructions stored in the memory 230 and controlled by the processor 210 executing the computer program instructions. The computer 200 may include one or more network interfaces 250 for communicating with other devices via a network. The computer 200 also includes a user interface 260 that enables user interaction with the computer 200. The user interface 260 may include I/O devices 262 (e.g., keyboard, mouse, speakers, buttons, etc.) to allow the user to interact with the computer. The user interface may include a display 264. The computer may also include a receiver 215 configured to receive data from the user interface 260 and/or from the storage device 220. According to various embodiments, FIG. 2 is a high-level representation of possible components of a computer for illustrative purposes and the computer may contain other components.


Establishing the model using a threshold based heuristic maintenance policy is described in further detail below.


Markov Task Sets

The decision maker (DM) has at his disposal $M$ homogeneous resources that are to be sequentially allocated to incoming tasks. Decisions are made at discrete epochs, i.e., $t = 0, 1, \ldots$. Each task $i$ has a window of opportunity, $[t_i^s, t_i^f]$, within which it is active and therefore, the DM can assign resources towards completing the task. A single resource assigned to a task gets the job done at or before the next decision epoch (i.e., the task is completed) with probability $p < 1$. So, if $k$ resources are allocated to an incomplete task $i$ at decision epoch $t \in [t_i^s, t_i^f)$, the task will be completed with probability $1 - (1-p)^k$ at the next decision epoch $t+1$. We assume "complete information" in that the task completion status is known to the DM at the beginning of each decision epoch. Upon successful completion of task $i$, the DM earns the positive reward $r_i$. The DM also incurs a cost $c > 0$ for each resource assigned. Suppose at time 0, the DM knows that $N$ tasks shall arrive with a priori known windows of opportunity. Without loss of generality, we shall assume that at least one of the tasks starts at time 1, i.e., $\min_i t_i^s = 1$. Furthermore, let $T = \max_i t_i^f$ be the time horizon of interest. We wish to compute the optimal allocation of resources to tasks so that the DM accrues the maximal cumulative reward over the time window, $t \in [1, T]$.
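
As a small illustration of the completion model just described, the following Python sketch evaluates the probability that k resources complete a task by the next epoch, together with the expected one-stage payoff of committing k resources to a task with reward r and per-resource cost c. The function names are illustrative assumptions and do not appear in the source.

```python
def completion_probability(p, k):
    """Probability that at least one of k independent resources succeeds,
    each succeeding with probability p: 1 - (1 - p)**k."""
    return 1.0 - (1.0 - p) ** k


def expected_stage_payoff(p, k, r, c):
    """Expected payoff of assigning k resources for one epoch:
    pay c per resource, earn r if the task completes."""
    return -c * k + completion_probability(p, k) * r


if __name__ == "__main__":
    # Values borrowed from the numerical example later in the text
    # (r = 90, p = 0.25, c = 1).
    for k in range(4):
        print(k, completion_probability(0.25, k), expected_stage_payoff(0.25, k, 90.0, 1.0))
```

Because each extra resource multiplies the failure probability by 1 − p, the gain from one more resource shrinks geometrically, which is what later makes a threshold rule natural.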


Let the task completion status (state) variable $z(i) \in \{0, 1\}$ indicate that task $i$ is incomplete ($z(i)=0$) or complete ($z(i)=1$). At decision epoch $t$, let $V(z,k,t)$ indicate the optimal cumulative reward (value function) that can be achieved thereafter, where $k$ indicates the number of resources left and the vector $z$ encapsulates the status of each task.


A task $i$ is active at the current time if it is incomplete, i.e., $z(i)=0$, and if the current time $t \in [t_i^s, t_i^f]$. Accordingly, we define the set of active tasks, $A(z,t) = \{ i : t \in [t_i^s, t_i^f] \text{ and } z(i) = 0 \}$.


The decision variable $u$ is a vector wherein $u_i$ indicates the number of resources to be assigned to task $i$. The feasible allocation set is shown in (1).

$$U(z,k,t) = \Big\{ u : \sum_{i \in A(z,t)} u_i \le k,\ \ u_i \ge 0\ \ \forall i \in A(z,t),\ \ u_i = 0\ \ \forall i \notin A(z,t) \Big\}. \tag{1}$$
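
The active set and the feasible allocation set (1) can be enumerated directly for small instances. The Python sketch below is only an illustration under assumed data structures: `windows` is a hypothetical list of (start, finish) pairs, one per task, and `z` is a tuple of 0/1 completion flags; neither name comes from the source.

```python
from itertools import product


def active_tasks(z, t, windows):
    """Indices of tasks that are incomplete (z[i] == 0) and whose
    window of opportunity [start, finish] contains epoch t."""
    return [i for i, (start, finish) in enumerate(windows)
            if z[i] == 0 and start <= t <= finish]


def feasible_allocations(z, k, t, windows):
    """Yield every allocation vector u in U(z, k, t): non-negative integers
    on active tasks summing to at most k, and zero on all other tasks."""
    active = active_tasks(z, t, windows)
    for counts in product(range(k + 1), repeat=len(active)):
        if sum(counts) <= k:
            u = [0] * len(z)
            for i, u_i in zip(active, counts):
                u[i] = u_i
            yield tuple(u)


if __name__ == "__main__":
    windows = [(1, 3), (2, 4)]          # two tasks, hypothetical windows
    print(list(feasible_allocations((0, 0), 2, 1, windows)))
```

The number of feasible vectors grows combinatorially with the number of active tasks and with k, which is one way to see why the exact recursion does not scale.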

It follows that the value function V (z,k,t) satisfies the Bellman recursion:











$$V(z,k,t) = \max_{u \in U(z,k,t)} \Big\{ \sum_{i \in A(z,t)} \big[ -c\,u_i + (1 - q^{u_i})\, r_i \big] + \sum_{\bar{z}} \mathrm{Prob}(\bar{z})\, V\Big(\bar{z},\ k - \sum_{i \in A(z,t)} u_i,\ t+1\Big) \Big\}, \tag{2}$$

$$\forall z,\quad k = 1, \ldots, M,\quad t \in [1, T-1],$$






where $q = 1 - p$. In effect, we assign $\sum_{i \in A(z,t)} u_i$ resources in total. In the next decision epoch, the status of each task changes according to:










$$\mathrm{Prob}\big(\bar{z}(i) = z(i)\big) = \begin{cases} q^{u_i}, & i \in A(z,t), \\ 1, & i \notin A(z,t). \end{cases} \tag{3}$$








Since the evolution of each task is a completely independent Markov process, the transition probabilities in (3) are given by:

$$\mathrm{Prob}(\bar{z}) = \prod_{i=1}^{N} \mathrm{Prob}\big(\bar{z}(i)\big). \tag{4}$$

The boundary conditions for the value function are given by:











$$V(z,k,T) = \max_{u \in U(z,k,T)} \sum_{i \in A(z,T)} \big[ -c\,u_i + (1 - q^{u_i})\, r_i \big], \quad \forall z,\ k = 1, \ldots, M, \tag{5}$$

$$V(z,0,t) = 0, \quad \forall z,\ t. \tag{6}$$








We shall address the much simpler single task scenario first.


Single Task Allocation

Suppose we are only interested in the optimal allocation of resources to a single active task $i$. For ease of exposition, let us assume that $1 = t_i^s < t_i^f = T_i$. The corresponding task specific value function is given by:












$$V_i(k,t) = \max_{u_i \in [0,k]} \Big\{ \big[ -c\,u_i + (1 - q^{u_i})\, r_i \big] + q^{u_i}\, V_i(k - u_i,\ t+1) \Big\}, \tag{7}$$

$$k = 1, \ldots, M,\quad \forall t \in [1, T_i],$$





with the boundary conditions:












$$V_i(k, T_i) = \max_{u_i \in [0,k]} \big[ -c\,u_i + (1 - q^{u_i})\, r_i \big], \quad k = 1, \ldots, M, \tag{8}$$

$$V_i(0, t) = 0, \quad \forall t \in [1, T_i]. \tag{9}$$








The optimal policy is given by:

$$\mu_i(k,t) = \arg\max_{u_i \in [0,k]} \Big\{ \big[ -c\,u_i + (1 - q^{u_i})\, r_i \big] + q^{u_i}\, V_i(k - u_i,\ t+1) \Big\}, \quad k = 1, \ldots, M,\ \forall t \in [1, T_i]. \tag{10}$$

Suppose the DM has $u+1$ resources left at the terminal time $T_i$. We denote the marginal reward yielded by assigning 1 additional resource over and above $u$ resources to the active task $i$ as:

$$\Delta_T^i(u) = -c(u+1) + (1 - q^{u+1})\, r_i - \big[ -cu + (1 - q^{u})\, r_i \big] = -c + p\,q^{u} r_i. \tag{11}$$

Since $q < 1$ it follows that $\Delta_T^i(u)$ is monotonic decreasing in $u$. Let $\kappa_i(T_i)$ be the least non-negative integer at which the marginal reward becomes negative, i.e., $\kappa_i(T_i) = \min_{\ell \ge 0} \ell$ s.t. $p\,q^{\ell} r_i < c$. It follows that the optimal policy at the terminal time is given by:











$$\mu_i(k, T_i) = \arg\max_{u_i \in [0,k]} \big[ -c\,u_i + (1 - q^{u_i})\, r_i \big] = \begin{cases} k, & k < \kappa_i(T_i), \\ \kappa_i(T_i), & k \ge \kappa_i(T_i). \end{cases} \tag{12}$$

$$\Rightarrow\ V_i(k, T_i) = -c\,\mu_i(k, T_i) + \big[ 1 - q^{\mu_i(k, T_i)} \big]\, r_i, \quad \forall k. \tag{13}$$








In other words, we assign as many resources as are available but no more than the threshold. We will show likewise that at each decision epoch, there is an upper bound to the number of resources that the DM shall assign (optimally) for the remainder of the time horizon. Having computed the terminal value function, we are now in a position to compute the value function at the previous decision epoch. Indeed, we have from (7):












$$V_i(k, T_i - 1) = \max_{u_i \in [0,k]} \Big\{ \big[ -c\,u_i + (1 - q^{u_i})\, r_i \big] + q^{u_i}\, V_i(k - u_i,\ T_i) \Big\}, \quad k = 1, \ldots, M. \tag{14}$$






We can show that the value function $V_i(k, T_i - 1)$ has a threshold based analytical form similar to $V_i(k, T_i)$. Let the function inside the max operator in (14) be given by,










$$W(k, u, T_i) = \big[ -cu + (1 - q^{u})\, r_i \big] + q^{u}\, V_i(k - u,\ T_i) = \begin{cases} -cu + r_i - q^{u}\, \underline{r}_i, & u < k - \kappa_i(T_i), \\ -cu + r_i - q^{u} c\,(k - u) - q^{k} r_i, & u \ge k - \kappa_i(T_i), \end{cases} \tag{15}$$








where $\underline{r}_i = c\,\kappa_i(T_i) + q^{\kappa_i(T_i)}\, r_i$. Note that we substitute for $V_i(k-u, T_i)$ from (13) to get (15). As before, the marginal reward at time $T_i - 1$ is defined according to:











$$\Delta_{T-1}^i(k, u) = W(k, u+1, T_i) - W(k, u, T_i) = \begin{cases} -c + p\,q^{u}\, \underline{r}_i, & u < k - \kappa_i(T_i), \\ -c + c\,q^{u} f_{T_i-1}(k - u), & u \ge k - \kappa_i(T_i), \end{cases} \tag{16}$$








where $f_{T_i-1}(w) = pw + q$. Note however that, unlike $\Delta_T^i(u)$, the marginal reward at time $T_i - 1$ is also a function of $k$, the number of resources left with the DM. The optimal allocation is determined by the threshold value at which the marginal reward becomes negative.


Let $\kappa_i(T_i - 1) = \min_{\ell \ge 0} \ell$ s.t. $p\,q^{\ell}\, \underline{r}_i < c$. We have the following results.


Lemma 1. $\kappa_i(T_i - 1) \le \kappa_i(T_i)$.


Proof. If $\kappa_i(T_i) = 0$, then $\underline{r}_i = r_i$ and it follows from the definition of $\kappa_i(T_i - 1)$ that it will also equal 0. Suppose $\kappa_i(T_i) > 0$. We have:















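$$\begin{aligned} r_i - \underline{r}_i &= -c\,\kappa_i(T_i) + \big[ 1 - q^{\kappa_i(T_i)} \big]\, r_i \\ &= -c\,\kappa_i(T_i) + p \sum_{\ell=0}^{\kappa_i(T_i)-1} q^{\ell}\, r_i \\ &= \sum_{\ell=0}^{\kappa_i(T_i)-1} \big[ p\,q^{\ell} r_i - c \big] > 0, \end{aligned} \tag{17}$$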








where (17) follows from the definition of $\kappa_i(T_i)$. In other words, $p\,q^{\ell} r_i > c$ for all $\ell \in [0, \kappa_i(T_i))$. Since $\underline{r}_i < r_i$, it immediately follows from the definition of $\kappa_i(T_i - 1)$ that $\kappa_i(T_i - 1) \le \kappa_i(T_i)$.


From the definition of the marginal reward (16), we note that determining when it switches sign depends in part on $f_{T_i-1}(k-u)$. Clearly, $f_{T_i-1}(k-u) = p(k-u) + q$ is a decreasing function of $u$. Since $q^{0} f_{T_i-1}(k) = pk + q \ge 1$ and $q^{k} f_{T_i-1}(0) = q^{k+1} < 1$, there exists $u^*(k) = \min_{u \in [0,k]} u$ s.t. $q^{u} f_{T_i-1}(k-u) \le 1$.


Lemma 2











$$u^*(k-1) = \begin{cases} u^*(k), & \text{if } q^{u^*(k)} f_{T_i-1}\big(k - u^*(k)\big) > q, \\ u^*(k) - 1, & \text{otherwise.} \end{cases} \tag{18}$$







Proof. By definition we have: $q^{u^*(k)} f_{T_i-1}\big(k - u^*(k)\big) \le 1$ and $q^{u^*(k)-1} f_{T_i-1}\big(k - u^*(k) + 1\big) > 1$. In addition, we can write:













$$q^{u^*(k)} f_{T_i-1}\big(k - 1 - u^*(k)\big) = q^{u^*(k)} f_{T_i-1}\big(k - u^*(k)\big) - p\,q^{u^*(k)} < 1. \tag{19}$$

$$q^{u^*(k)-2} f_{T_i-1}\big(k - 1 - u^*(k) + 2\big) = q^{u^*(k)-2} f_{T_i-1}\big(k - u^*(k) + 1\big) > 1.$$

$$q^{u^*(k)-1} f_{T_i-1}\big(k - 1 - (u^*(k) - 1)\big) = q^{u^*(k)-1} f_{T_i-1}\big(k - u^*(k)\big)$$

$$\Rightarrow\ u^*(k-1) = \begin{cases} u^*(k) - 1, & \text{if } q^{u^*(k)} f_{T_i-1}\big(k - u^*(k)\big) \le q, \\ u^*(k), & \text{otherwise.} \end{cases}$$










Theorem 1 The optimal allocation policy at Ti−1 is dictated by:
















$$\mu_i(k, T_i - 1) = \begin{cases} u^*(k), & k \le \kappa_i(T_i) + \kappa_i(T_i - 1), \\ \kappa_i(T_i - 1), & k > \kappa_i(T_i) + \kappa_i(T_i - 1), \end{cases} \tag{20}$$

where $u^*\big(\kappa_i(T_i) + \kappa_i(T_i - 1)\big) = \kappa_i(T_i - 1)$ and for $1 < k \le \kappa_i(T_i) + \kappa_i(T_i - 1)$,

$$u^*(k-1) = \begin{cases} u^*(k) - 1, & \text{if } q^{u^*(k)} f_{T_i-1}\big(k - u^*(k)\big) \le q, \\ u^*(k), & \text{otherwise.} \end{cases} \tag{21}$$







Proof. First, we note that the marginal reward function $\Delta_{T-1}^i(k,u)$ defined earlier in (16) is monotonic decreasing in $u$. Indeed, for $u \in [0, k - \kappa_i(T_i))$, $\Delta_{T-1}^i(k,u) = -c + p\,q^{u}\, \underline{r}_i$ and since $q < 1$ it is decreasing in $u$. For $u \ge k - \kappa_i(T_i)$, $\Delta_{T-1}^i(k,u) = -c\,(1 - q^{u+1}) + c\,p\,q^{u}(k - u)$, which is clearly also a decreasing function of $u$. So, we only need to show that:

$$\Delta_{T-1}^i\big(k,\ k - \kappa_i(T_i) - 1\big) > \Delta_{T-1}^i\big(k,\ k - \kappa_i(T_i)\big).$$

From the definition of $\kappa_i(T_i)$, we have: $p\,q^{\kappa_i(T_i)-1} r_i > c$. So, we can write:














$$\begin{aligned} \Delta_{T-1}^i\big(k,\ k - \kappa_i(T_i)\big) &= -c + q^{k - \kappa_i(T_i)}\, c\, \big[ p\,\kappa_i(T_i) + q \big] \\ &< -c + q^{k - \kappa_i(T_i) - 1}\, c\,p\,\kappa_i(T_i) + q^{k - \kappa_i(T_i)}\, c \\ &< -c + q^{k - \kappa_i(T_i) - 1}\, c\,p\,\kappa_i(T_i) + q^{k - \kappa_i(T_i)} \big[ p\,q^{\kappa_i(T_i) - 1} r_i \big] \\ &= -c + p\,q^{k - \kappa_i(T_i) - 1}\, \underline{r}_i \\ &= \Delta_{T-1}^i\big(k,\ k - \kappa_i(T_i) - 1\big). \end{aligned} \tag{22}$$








As before, it follows that the optimal allocation $u^*$ is the threshold value at which the marginal reward becomes negative. However, the added complexity here is the dependence on $k$. So, we deal with the three possible scenarios that lead to different threshold values:

    • $k > \kappa_i(T_i) + \kappa_i(T_i - 1)$: For this case, $\kappa_i(T_i - 1) < k - \kappa_i(T_i)$. We have already shown that $\Delta_{T-1}^i(k,u)$ decreases for $u \in [0, k - \kappa_i(T_i))$ and so, the threshold occurs at $u^* = \kappa_i(T_i - 1)$. Indeed, this follows from the definition of $\kappa_i(T_i - 1)$.
    • $k = \kappa_i(T_i) + \kappa_i(T_i - 1)$: For this case, $k - \kappa_i(T_i) = \kappa_i(T_i - 1)$. As such, $\Delta_{T-1}^i(k,u) > 0$ for $u \in [0, k - \kappa_i(T_i))$. From the definition of $\kappa_i(T_i)$, we have: $p\,q^{\kappa_i(T_i)-1} r_i > c \Rightarrow p\,q^{\kappa_i(T_i)} r_i > cq$. The marginal reward at $\kappa_i(T_i - 1)$ is given by:














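$$\begin{aligned} \Delta_{T-1}^i\big(k,\ \kappa_i(T_i - 1)\big) &= c\,\big[ p\,\kappa_i(T_i) + q \big]\, q^{\kappa_i(T_i - 1)} - c \\ &= \big[ p\,c\,\kappa_i(T_i) + cq \big]\, q^{\kappa_i(T_i - 1)} - c \\ &< \big[ c\,\kappa_i(T_i) + q^{\kappa_i(T_i)} r_i \big]\, p\,q^{\kappa_i(T_i - 1)} - c \\ &= p\,q^{\kappa_i(T_i - 1)}\, \underline{r}_i - c < 0, \end{aligned} \tag{23}$$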








where the last inequality follows from the definition of κi(Ti−1). So, it follows that the threshold value at which the marginal reward turns negative is given by u*=κi(Ti−1).

    • $k < \kappa_i(T_i) + \kappa_i(T_i - 1)$: For this case, $k - \kappa_i(T_i) < \kappa_i(T_i - 1)$. As such, $\Delta_{T-1}^i(k,u) > 0$ for $u \in [0, k - \kappa_i(T_i))$. So, the threshold must occur at some $u \in [k - \kappa_i(T_i), k]$ at which $q^{u} f_{T_i-1}(k-u)$ dips below unity. From Lemma 2 it follows that: for $k \le \kappa_i(T_i) + \kappa_i(T_i - 1)$,

$$u^*(k-1) = \begin{cases} u^*(k) - 1, & \text{if } q^{u^*(k)} f_{T_i-1}\big(k - u^*(k)\big) \le q, \\ u^*(k), & \text{otherwise,} \end{cases} \tag{24}$$








with the boundary condition $u^*\big(\kappa_i(T_i) + \kappa_i(T_i - 1)\big) = \kappa_i(T_i - 1)$, which was established in the second case above.


In summary, we have:











$$\mu_i(k, T_i - 1) = \begin{cases} \kappa_i(T_i - 1), & k \ge \kappa_i(T_i) + \kappa_i(T_i - 1), \\ u^*(k), & k < \kappa_i(T_i) + \kappa_i(T_i - 1), \end{cases} \tag{25}$$








where u*(k) is computed via the backward recursion (24).


It follows from (15) that the optimal value function at time Ti−1 is given by:











$$V_i(k, T_i - 1) = \begin{cases} -c\,\kappa_i(T_i - 1) + r_i - q^{\kappa_i(T_i - 1)}\, \underline{r}_i, & k \ge \kappa_i(T_i) + \kappa_i(T_i - 1), \\ -c\,u^*(k) + \big[ 1 - q^{k} \big]\, r_i - q^{u^*(k)}\, c\,\big(k - u^*(k)\big), & \text{otherwise.} \end{cases} \tag{26}$$







Corollary 1 The optimal allocation at time Ti−1 is such that the number of resources left at the last stage Ti is no more than the optimal threshold κi(Ti). In other words,

$$k - \mu_i(k, T_i - 1) \le \kappa_i(T_i), \quad k < \kappa_i(T_i) + \kappa_i(T_i - 1). \tag{27}$$


Proof. This follows from (24). As k decreases by a value of 1, u*(k) either remains the same or goes down by 1. So, it cannot decrease fast enough that k−u*(k) exceeds the threshold κi(Ti). In other words, when the total number of resources is less than the sum of thresholds κi(Ti)+κi(Ti−1), no resources will be left unused at the final time Ti.
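
The threshold structure at stages $T_i$ and $T_i - 1$ can be checked numerically against a brute-force solution of the single-task recursion (7) with terminal condition (8) and boundary condition (9). The Python sketch below is a minimal illustration, not the patented implementation: the function name `solve_single_task` is an assumption, ties between maximisers are broken towards the smallest allocation, and no value is carried beyond the terminal stage.

```python
from functools import lru_cache


def solve_single_task(r, p, c, T):
    """Brute-force dynamic program for one task with reward r, per-resource
    success probability p, per-resource cost c and horizon T.
    Returns (V, policy): V(k, t) is the optimal value with k resources at
    stage t, policy(k, t) is the smallest maximising allocation."""
    q = 1.0 - p

    def stage_value(k, u, t):
        # Immediate expected payoff of allocating u resources now ...
        immediate = -c * u + (1.0 - q ** u) * r
        # ... plus the continuation value if all u resources fail.
        future = (q ** u) * V(k - u, t + 1) if t < T else 0.0
        return immediate + future

    @lru_cache(maxsize=None)
    def V(k, t):
        if k == 0:
            return 0.0          # boundary condition (9)
        return max(stage_value(k, u, t) for u in range(k + 1))

    def policy(k, t):
        # max() returns the first maximiser, i.e. the smallest u on ties.
        return max(range(k + 1), key=lambda u: stage_value(k, u, t))

    return V, policy


if __name__ == "__main__":
    # Parameters of the numerical example in the text: r = 90, p = 0.25, c = 1, T = 10.
    V, mu = solve_single_task(90.0, 0.25, 1.0, 10)
    for k in range(1, 21):
        print(k, mu(k, 10), mu(k, 9), k - mu(k, 9))
```

For these parameters, the last printed column (resources carried into the final stage) never exceeds the terminal threshold when k is below the sum of the two thresholds, which is the statement of Corollary 1.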


Threshold Based Optimal Policy and Recursion

In light of Theorem 1, one can easily compute the optimal thresholds, i.e., the number of resources to be allocated at each decision epoch when resources are aplenty. In particular, we have the result: for any $t \in [1, T_i]$,












$$\mu_i(k, t) = \kappa_i(t), \quad k \ge \sum_{\ell=0}^{T_i - t} \kappa_i(T_i - \ell), \tag{28}$$








where the thresholds can be computed via a backward recursion. For any $t = T_i, \ldots, 1$:















$$\kappa_i(t) = \min_{\ell \ge 0} \ell \ \ \text{s.t.}\ \ p\,q^{\ell} R_{T_i - t} < c, \qquad R_{T_i - t + 1} = c\,\kappa_i(t) + q^{\kappa_i(t)} R_{T_i - t}, \tag{29}$$








with the boundary condition $R_0 = r_i$. The optimal value function is given by:












$$V_i(k, t) = -c\,\kappa_i(t) + r_i - q^{\kappa_i(t)} R_{T_i - t}, \quad k \ge \sum_{\ell=0}^{T_i - t} \kappa_i(T_i - \ell). \tag{30}$$







In other words, one can quickly compute the thresholds κi(t) over the entire time window without resorting to the non-linear Bellman recursion. At decision time t, the DM allocates exactly the threshold amount so long as the number of available resources is no less than the sum of thresholds over the remainder of the time horizon.
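
The backward recursion (29) is short enough to sketch directly. The Python function below is a minimal illustration under the stated model (0 < p < 1, c > 0); the name `thresholds` and the dictionary return type are assumptions made for this example.

```python
def thresholds(r, p, c, T):
    """Backward recursion (29) for a single task.
    Returns a dict {t: kappa(t)} for t = T, ..., 1, starting from R_0 = r."""
    q = 1.0 - p
    R = r                      # R_0 = r_i
    kappa = {}
    for t in range(T, 0, -1):
        ell = 0
        # Least non-negative ell with p * q**ell * R < c.
        while p * (q ** ell) * R >= c:
            ell += 1
        kappa[t] = ell
        # R_{T - t + 1} = c * kappa(t) + q**kappa(t) * R_{T - t}
        R = c * ell + (q ** ell) * R
    return kappa


if __name__ == "__main__":
    # Parameters of the numerical example in the text.
    kappa = thresholds(r=90.0, p=0.25, c=1.0, T=10)
    print([kappa[t] for t in range(1, 11)])
```

Each stage costs only a handful of multiplications, and no maximisation over allocation vectors is needed. With the parameters of the Numerical Example section, the last two thresholds come out as 11 and 5, matching the values κ(10) and κ(9) quoted there.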


Remarks

This also tells us that for any task $i$, the DM requires at most

$$\sum_{\ell=0}^{T_i - 1} \kappa_i(T_i - \ell)$$

resources. Furthermore, since the rewards $R_\ell$ are strictly decreasing with $\ell$ (see (17) in Lemma 1), it follows that $\kappa_i(T_i - \ell)$ must go to zero as $\ell$ increases. Indeed, there exists $n(i) \in [0, \infty)$ such that $\kappa_i(T_i - n(i)) = 0$. So, the DM need never allocate more than

$$\sum_{\ell=0}^{n(i)} \kappa_i(T_i - \ell)$$

resources to task $i$ and furthermore, the window of opportunity for task $i$ needs to be no longer than $n(i)$. If the window is any longer, it is optimal for the DM to simply hold off without assigning any resources until the remaining time window shrinks to $n(i)$.


Numerical Example

To illustrate the method and results, we use as an example the allocation of maintenance personnel to a piece of industrial equipment that is prone to failure. The example problem involves the allocation of maintenance personnel to a single piece of equipment over a window of opportunity, T=10. The reward is r=90, the probability of resolving the fault is p=0.25, and the cost of allocating a maintainer is c=1. According to various configurations, since the cost of allocating one maintainer is c=1, minimizing this cost is equivalent to minimizing the number of maintainers. Using the backward recursion (29), we compute the threshold values, κ(j), j=1, . . . , 10 as shown in FIG. 3. In particular, κ(10)=11 and κ(9)=5. So, we compute the optimal allocation at time t=9 for all possible k<κ(10)+κ(9) as shown in FIG. 4. As per Lemma 2, the optimal allocation starts at κ(9) and either remains the same or decreases by one with decreasing k.
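
The allocation at t=9 described above can be reproduced with the backward recursion (24), started from the boundary value u*(κ(10)+κ(9)) = κ(9). The Python sketch below is illustrative only: the function names are assumptions, and the second function simply re-evaluates the defining condition of u*(k) (the least u with q^u f(k−u) ≤ 1, where f(w) = pw + q) as a cross-check.

```python
def u_star_table(p, kappa_T, kappa_Tm1):
    """Recursion (24): returns {k: u*(k)} for k from kappa_T + kappa_Tm1
    down to 1, starting at u*(kappa_T + kappa_Tm1) = kappa_Tm1."""
    q = 1.0 - p
    k = kappa_T + kappa_Tm1
    table = {k: kappa_Tm1}
    while k > 1:
        u = table[k]
        # Step down by one exactly when q**u * f(k - u) <= q.
        step_down = (q ** u) * (p * (k - u) + q) <= q
        table[k - 1] = u - 1 if step_down else u
        k -= 1
    return table


def u_star_direct(p, k):
    """Definition of u*(k): least u in [0, k] with q**u * f(k - u) <= 1."""
    q = 1.0 - p
    return next(u for u in range(k + 1)
                if (q ** u) * (p * (k - u) + q) <= 1.0)


if __name__ == "__main__":
    # p = 0.25, kappa(10) = 11 and kappa(9) = 5 are taken from the example.
    table = u_star_table(0.25, 11, 5)
    for k in sorted(table, reverse=True):
        print(k, table[k], u_star_direct(0.25, k))
```

Running the loop shows the staircase behaviour stated in the text: as k decreases from 16, the allocation either stays the same or drops by one, and the recursion and the direct definition agree at every k for these parameters.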


Single Task Optimal Allocation at All Stages

In the previous section, we have completely characterized the optimal allocation at stages $T_i$ and $T_i - 1$. Note that the optimal allocation at stage $T_i$ is linear up to the threshold $\kappa_i(T_i)$ and a constant thereafter. The optimal allocation at stage $T_i - 1$ is far more interesting in that it is piecewise constant and monotonic non-decreasing up to the threshold $\kappa_i(T_i - 1)$. Arguably, the optimal allocation at earlier stages will also exhibit a piecewise constant and monotonic non-decreasing behavior. The difficulty is in computing the switching points where the allocation increases by 1. When resources are abundant, we have the complete solution. The interesting case is when resources are scarce, where it is not obvious how to distribute them among the different stages knowing that future stages may not come into play. Nonetheless, one can still generalize some of the results that have been shown to be true for stage $T_i - 1$. Indeed, the optimal value function for earlier stages generalizes (26) and has the form: for $\ell > 0$ and $k \le \sum_{h=0}^{\ell} \kappa_i(T_i - h)$,

$$\begin{aligned} V_i(k, T_i - \ell) &= -c\,\mu(k, T_i - \ell) + \big[ 1 - q^{\mu(k, T_i - \ell)} \big]\, r_i + q^{\mu(k, T_i - \ell)} \Big\{ -c\,\mu(\tilde{k}_1, T_i - \ell + 1) + \big[ 1 - q^{\mu(\tilde{k}_1, T_i - \ell + 1)} \big]\, r_i + q^{\mu(\tilde{k}_1, T_i - \ell + 1)} \big[ \cdots \big] \Big\} \\ &= -c\,\mu(k, T_i - \ell) + \big[ 1 - q^{k} \big]\, r_i - c \sum_{j=1}^{\ell} q^{\sum_{n=0}^{j-1} \mu(\tilde{k}_n,\ T_i - \ell + n)}\ \mu(\tilde{k}_j,\ T_i - \ell + j), \end{aligned} \tag{31}$$

where $\tilde{k}_j = \tilde{k}_{j-1} - \mu(\tilde{k}_{j-1},\ T_i - \ell + j - 1)$, $j = 1, \ldots, \ell$, and $\tilde{k}_0 = k$.








The above value function reflects the fact that the DM incurs a cost of c×μ({tilde over (k)}j, Ticustom character+j) only if the resources allocated at all previous stages Ticustom character+n, n ∈ [0, j−1] are unsuccessful in completing the task. Moreover, the optimal allocation at stage Ticustom character+j is a function of the number of resources left at the stage, custom character, which, by definition, is the number of resources allocated at previous stages deducted from the initial inventory of k resources. Since all k resources are allocated, the expected reward equals [1−qk]ri. The exact distribution of the k resources amongst the different stages requires solving the Bellman recursion. For the special case of k=custom characterκi(Ti−h), the result is immediate and the optimal allocation, μ({tilde over (k)}j, Ticustom character+j)=κi(Ticustom character+j), j ∈ [0, custom character].


From (31), we can also generalize the marginal reward function. Suppose $u$ resources are allocated at stage $T_i-\ell$ and optimal allocations are made thereafter. We can write:

$$
W_i(k,u,T_i-\ell)=-cu+\left[1-q^{k}\right]r_i-c\,q^{u}\left[\mu(\tilde{k}_1,T_i-\ell+1)+\sum_{j=2}^{\ell}q^{\sum_{n=1}^{j-1}\mu(\tilde{k}_n,T_i-\ell+n)}\,\mu(\tilde{k}_j,T_i-\ell+j)\right],
\tag{32}
$$

where $\tilde{k}_j=\tilde{k}_{j-1}-\mu(\tilde{k}_{j-1},T_i-\ell+j-1)$, $j=2,\ldots,\ell$, and $\tilde{k}_1=k-u$.









It follows that the marginal reward at stage $T_i-\ell$ is given by:

\begin{align}
\Delta^i_{T_i-\ell}(k,u) &= W_i(k,u+1,T_i-\ell)-W_i(k,u,T_i-\ell) \notag\\
&= -c+c\,q^{u}\Big\{\mu(k-u,T_i-\ell+1)-q\,\mu(k-u-1,T_i-\ell+1) \notag\\
&\qquad +\sum_{j=2}^{\ell}q^{\sum_{n=1}^{j-1}\mu(\tilde{k}_n,T_i-\ell+n)}\,\mu(\tilde{k}_j,T_i-\ell+j) \notag\\
&\qquad -q\sum_{j=2}^{\ell}q^{\sum_{n=1}^{j-1}\mu(\underline{k}_n,T_i-\ell+n)}\,\mu(\underline{k}_j,T_i-\ell+j)\Big\}, \tag{33}\\
\underline{k}_j &= \underline{k}_{j-1}-\mu(\underline{k}_{j-1},T_i-\ell+j-1),\quad j=2,\ldots,\ell,\quad\text{and}\quad\underline{k}_1=k-u-1. \tag{34}
\end{align}








Let $\Delta^i_{T_i-\ell}(k,u)=-c+c\,q^{u}f_{T_i-\ell}(k-u)$, where $f_{T_i-\ell}(k-u)$ represents the quantity inside the curly brackets in (33). Assuming that the unit incremental property of the optimal allocation (see Lemma 2) holds for all stages, we have either $\mu(\tilde{k}_1,T_i-\ell+1)=\mu(\tilde{k}_1-1,T_i-\ell+1)$ or $\mu(\tilde{k}_1,T_i-\ell+1)=\mu(\tilde{k}_1-1,T_i-\ell+1)+1$. Accordingly,

    • 1) if $\mu(\tilde{k}_1,T_i-\ell+1)=\mu(\tilde{k}_1-1,T_i-\ell+1)$, we have:

\begin{align}
f_{T_i-\ell}(k-u) &= p\,\mu(\tilde{k}_1,T_i-\ell+1) \notag\\
&\quad +q^{\mu(\tilde{k}_1,T_i-\ell+1)}\Big[\sum_{j=2}^{\ell}q^{\sum_{n=2}^{j-1}\mu(\tilde{k}_n,T_i-\ell+n)}\,\mu(\tilde{k}_j,T_i-\ell+j) \notag\\
&\qquad -q\sum_{j=2}^{\ell}q^{\sum_{n=2}^{j-1}\mu(\underline{k}_n,T_i-\ell+n)}\,\mu(\underline{k}_j,T_i-\ell+j)\Big] \tag{35}\\
&= p\,\mu(\tilde{k}_1,T_i-\ell+1)+q^{\mu(\tilde{k}_1,T_i-\ell+1)}\,f_{T_i-\ell+1}\big(\tilde{k}_1-\mu(\tilde{k}_1,T_i-\ell+1)\big), \tag{36}
\end{align}








where we recognize that the expression inside the square brackets in (35) is, by definition, $f_{T_i-\ell+1}\big(\tilde{k}_1-\mu(\tilde{k}_1,T_i-\ell+1)\big)$.

    • 2) if $\mu(\tilde{k}_1,T_i-\ell+1)=\mu(\tilde{k}_1-1,T_i-\ell+1)+1$, we have: $\underline{k}_1=\tilde{k}_1-1$ and $\underline{k}_j=\tilde{k}_j$, $\forall j\ge 2$. It follows that,

$$f_{T_i-\ell}(k-u)=p\,\mu(\tilde{k}_1,T_i-\ell+1)+q. \tag{37}$$

In addition to the unit incremental property, further suppose (as in Lemma 2) that:

$$\mu(\tilde{k}_1-1,T_i-\ell+1)=\mu(\tilde{k}_1,T_i-\ell+1)-1\quad\text{if}\quad q^{\mu(\tilde{k}_1,T_i-\ell+1)}\,f_{T_i-\ell+1}\big(\tilde{k}_1-\mu(\tilde{k}_1,T_i-\ell+1)\big)\le q. \tag{38}$$

Combining (36), (37) and (38), we have:

$$f_{T_i-\ell}(w)=p\,\mu(w,T_i-\ell+1)+\max\Big\{q,\;q^{\mu(w,T_i-\ell+1)}\,f_{T_i-\ell+1}\big(w-\mu(w,T_i-\ell+1)\big)\Big\}. \tag{39}$$

Recall that the optimal allocation at the last stage, μ(k,Ti)=k, k≤κi(Ti). So, we have the initial condition for the update given by:

fTi−1(w)=pμ(w,Ti)+q=pw+q,  (40)

which matches with our earlier definition for fTi−1(w). Finally, we can extend the optimal allocation result to all stages by generalizing Theorem 1.


Theorem 2. The optimal allocation policy at $T_i-\ell$ for any $\ell>0$ is dictated by:

$$
\mu_i(k,T_i-\ell)=
\begin{cases}
u^*(k), & k\le\sum_{h=0}^{\ell}\kappa_i(T_i-h),\\[4pt]
\kappa_i(T_i-\ell), & k>\sum_{h=0}^{\ell}\kappa_i(T_i-h),
\end{cases}
\tag{41}
$$

where $u^*\big(\sum_{h=0}^{\ell}\kappa_i(T_i-h)\big)=\kappa_i(T_i-\ell)$ and, for $1<k\le\sum_{h=0}^{\ell}\kappa_i(T_i-h)$,

$$
u^*(k-1)=
\begin{cases}
u^*(k)-1, & \text{if } q^{u^*(k)}\,f_{T_i-\ell}\big(k-u^*(k)\big)\le q,\\[4pt]
u^*(k), & \text{otherwise}.
\end{cases}
\tag{42}
$$








The update function for $f_{T_i-\ell}(\cdot)$ is given by (39) with the initial condition (40).


To prove the above result, we have to generalize the results shown for $f_{T_i-1}(\cdot)$ to the general case of $f_{T_i-\ell}(\cdot)$ for any $\ell>1$. Recall that the optimal policy at the terminal time is given by:











$$
\mu_i(k,T_i)=
\begin{cases}
k, & k<\kappa_i(T_i),\\
\kappa_i(T_i), & k\ge\kappa_i(T_i).
\end{cases}
\tag{43}
$$







Multiple Temporally Disjoint Tasks

We now return to the case of multiple task arrivals. The ensuing analysis is greatly simplified when the arriving tasks are temporally disjoint. In other words, suppose the time windows $[t_s^i,t_f^i]$ for different tasks are completely disjoint. Without loss of generality, we can assume that the $N$ tasks are ordered temporally as follows: $1=t_s^1<t_f^1<t_s^2<t_f^2<\ldots<t_s^N<t_f^N=T$. Furthermore, we can also assume that $t_s^{i+1}=t_f^i+1$ for all $i<N$, since the DM cannot allocate anything between two successive disjoint windows of opportunity.


Concurrent Tasks

This case deals with the scenario wherein there are two or more tasks whose windows of opportunity have a non-empty intersection. The curse of dimensionality may render the dynamic programming recursion intractable. However, we can use the notion of marginal reward from the single task case to establish heuristic policies. The idea is to assign resources greedily to the tasks that yield the maximal marginal reward.


Random Task Arrivals

This would be the most complex scenario, wherein the windows of opportunity are no longer fixed. In other words, suppose the length of each task window $T_i$ is known, but the start time is random. The original recursion has to be modified to account for random arrivals. For example, we could assume that at each time step a task of type $i$ appears with probability $\alpha_i$. Furthermore, we could assume that once a task arrives, no other task can arrive until either the current task is completed or its window of opportunity expires.


The allocation of resources is described in more detail in the following paragraphs. According to various implementations, an asset performs a sequence of identical missions at times $t=1,2,\ldots,T$ and is serviced between missions. The asset comprises $m$ independent sub-systems connected in series, with each sub-system $i$ comprising $n_i$ independent, identical constant failure rate (CFR) components connected in parallel. At any point in time, a component is either functioning or has failed. A sub-system is likewise either functioning (if at least one of its components is functioning) or has failed. The asset is functioning and can perform a mission successfully only if all of its sub-systems are functioning. A faulty component can be repaired during the break between two consecutive missions. We assume that each component in sub-system $i$ has reliability $r_i$, i.e., a component that is functioning at the start of a mission fails during the mission with probability $q_i=1-r_i$. In this example, each component requires exactly 1 man-hour to be repaired. The decision maker (DM) has at his disposal a total budget of $M$ man-hours over the entire planning horizon. There is a unit cost $c>0$ borne by the DM for each man-hour expended. We assume “complete information” in that each component's status is known to the DM at the end of a mission (or beginning of a break). We wish to maximize the asset reliability while minimizing the cumulative service cost over the remainder of the planning horizon. At the end of mission $t$, suppose $s_i\le n_i$ components are found to be faulty in sub-system $i$ and the DM allocates $u_i\le s_i$ man-hours for repairing sub-system $i$. The reliability of sub-system $i$ for mission $t+1$ is given by $1-q_i^{\,n_i-s_i+u_i}$. In other words, this is the probability that not all functioning components in sub-system $i$ fail during the $(t+1)$th mission. Correspondingly, the reliability of the asset for mission $t+1$ is given by:

$$\prod_{i=1}^{m}\left[1-q_i^{\,n_i-s_i+u_i}\right].$$
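As a minimal sketch of the reliability expression above, the following Python functions compute the sub-system and asset reliability for the next mission. The function and argument names are illustrative assumptions and do not appear in the source.

```python
from math import prod


def subsystem_reliability(q, n, s, u):
    """Probability that at least one of the n - s + u working components survives.

    q: per-mission failure probability of one component (q = 1 - r)
    n: number of parallel components, s: faulty components, u: repairs made
    """
    working = n - s + u
    return 1.0 - q ** working


def asset_reliability(q, n, s, u):
    """Series system: product of the sub-system reliabilities.

    q, n, s, u are equal-length sequences, one entry per sub-system.
    """
    return prod(
        subsystem_reliability(qi, ni, si, ui)
        for qi, ni, si, ui in zip(q, n, s, u)
    )
```

For instance, with two sub-systems, `asset_reliability([0.1, 0.2], [3, 2], [1, 1], [1, 0])` multiplies $1-0.1^3$ by $1-0.2^1$.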




Single Mission Allocation

Suppose we have a budget of k man-hours and a single mission for which we wish to maximize asset reliability and minimize resource use (cost). As before, let si≤ni denote the number of faulty components before the start of the mission. Let us allocate ui≤si man hours towards repairing components in sub-system i. The optimization problem becomes:










$$
\max_{u}\left\{\prod_{i=1}^{m}\left[1-q_i^{\,n_i-s_i+u_i}\right]-c\sum_{i=1}^{m}u_i\right\}
\tag{44}
$$

$$\text{subject to}\quad\sum_{i=1}^{m}u_i\le k.$$






For this single stage optimization problem, it can be shown that the optimal allocation is given by the Maximal Marginal Reward (MMR) algorithm. Indeed, we assign one man-hour at a time to the sub-system that yields the highest marginal reward for the additional allocation. Suppose we have already assigned ui man-hours to sub-system i. The marginal reward i.e., increase in asset reliability yielded by assigning an additional man hour to sub-system i minus the additional cost is given by:












$$
\Delta_i(u)=r_i\,q_i^{\,n_i-s_i+u_i}\prod_{j=1;\,j\ne i}^{m}\left[1-q_j^{\,n_j-s_j+u_j}\right]-c.
\tag{45}
$$








We assign the next available man-hour to the sub-system with the highest payoff: $\arg\max_i\,[\Delta_i(u)]$. We stop assigning resources when this quantity becomes negative, indicating that any additional allocation yields a negative (marginal) reward.
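The greedy rule just described can be sketched as follows. This is an illustrative reading of the MMR procedure under the marginal reward (45), not the patented implementation; the function name `mmr_allocate` and its signature are assumptions, and ties are broken in favor of the lowest sub-system index.

```python
from math import prod


def mmr_allocate(q, n, s, k, c):
    """Assign up to k man-hours, one at a time, by maximal marginal reward.

    q, n, s: per-sub-system failure probability, component count, faulty count
    k: man-hours available, c: unit cost per man-hour
    Returns the list u of man-hours assigned to each sub-system.
    """
    m = len(q)
    u = [0] * m
    for _ in range(k):
        best_i, best_gain = None, 0.0
        for i in range(m):
            if u[i] >= s[i]:
                continue  # no faulty component left to repair in sub-system i
            others = prod(
                1.0 - q[j] ** (n[j] - s[j] + u[j]) for j in range(m) if j != i
            )
            # Marginal reward of one more man-hour for sub-system i, as in (45).
            gain = (1.0 - q[i]) * q[i] ** (n[i] - s[i] + u[i]) * others - c
            if best_i is None or gain > best_gain:
                best_i, best_gain = i, gain
        if best_i is None or best_gain < 0:
            break  # every further allocation has a negative marginal reward
        u[best_i] += 1
    return u
```

With two sub-systems of two components each, one fault apiece, and a low unit cost, the sketch repairs both faults; with a high unit cost it assigns nothing.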


Multi-Mission Allocation

Suppose at decision time t, the DM is left with k man-hours and the number of faulty components in sub-system i is given by si and the state vector, s=(s1, . . . , sm). Let the DM allocate ui≤si man-hours towards repairing components in sub-system i. The corresponding expected future payoff or value function is given by:











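$$
V(k,s,t)=\max_{u_i;\ \sum_{i=1}^{m}u_i\le k}\left\{\prod_{i=1}^{m}\left[1-q_i^{\,n_i-s_i+u_i}\right]-c\sum_{i=1}^{m}u_i+\sum_{\underline{s}}\mathrm{Prob}(\underline{s})\,V\Big(k-\sum_{i=1}^{m}u_i,\ \underline{s},\ t+1\Big)\right\},
\tag{46}
$$

for $s_i=1,\ldots,n_i$, $\forall i$,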






where the number of failed components in sub-system $i$ at the end of mission $t+1$ is given by $\underline{s}_i=s_i-u_i+z_i$. Note that $z_i$ is a binomial random variable with $n_i-s_i+u_i$ independent and identical Bernoulli trials having probability of success $q_i$, i.e., $z_i\sim B(n_i-s_i+u_i,\,q_i)$. Suppose we wish to assign an additional resource (man-hour) to sub-system $i$. As before, the immediate marginal reward yielded by this allocation is given by:

$$
\Delta_i(u)=r_i\,q_i^{\,n_i-s_i+u_i}\prod_{j=1;\,j\ne i}^{m}\left[1-q_j^{\,n_j-s_j+u_j}\right]-c.
\tag{47}
$$








On the other hand, if the additional resource is kept in reserve for future stages (breaks), the expected marginal future reward is given by:











$$
W\Big(k-\sum_{i=1}^{m}u_i,\ t+1\Big)-W\Big(k-\sum_{i=1}^{m}u_i-1,\ t+1\Big),
\tag{48}
$$

where:

$$W(k,t+1)=\sum_{s}\mathrm{Prob}(s)\,V(k,s,t+1). \tag{49}$$


Single Sub-System Multi-Mission Allocation

Suppose we are only interested in the optimal allocation of resources to a single sub-system i. Recall decisions are made at t=1, . . . , T before the start of the tth mission. We assume that all components are healthy before the start of the 1st mission. Suppose at decision time t, the DM is left with k man-hours and the number of faulty components is given by s. We wish to minimize the cost of labor and maximize the number of successful missions. The unit cost c>0 can be appropriately chosen to trade off the cost of labor with mission success probability. The corresponding sub-system specific value function is given by:












$$
V_i(k,s,t)=\max_{u\in[0,\min(k,s)]}\left\{R_i(s,u)+\sum_{\bar{s}=s-u}^{n_i}\mathrm{Prob}(\bar{s})\,V_i(k-u,\bar{s},t+1)\right\},
\tag{50}
$$

for $k=1,\ldots,M$, $s=1,\ldots,n_i$, $t\in[1,T-1]$,






where the immediate reward associated with allocation $u$ is given by $R_i(s,u)=1-q_i^{\,n_i-s+u}-cu$ and the number of failed components at the end of mission $t+1$ is given by $\bar{s}=s-u+z_i$. Note that $z_i$ is a binomial random variable with $n_i-s+u$ independent and identical Bernoulli trials having probability of success $q_i$, i.e., $z_i\sim B(n_i-s+u,\,q_i)$. Let the binomial probability mass function be denoted by

$$b(y,\ell,q_i)=\binom{y}{\ell}\,q_i^{\,\ell}\,r_i^{\,y-\ell}.$$
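To make the state transition concrete, the sketch below tabulates the distribution of the next fault count $\bar{s}=s-u+z_i$ using the binomial mass function just defined. The helper name `next_fault_distribution` is an assumption introduced for illustration.

```python
from math import comb


def next_fault_distribution(n, s, u, q):
    """Map each possible next fault count s - u + l to its probability.

    n - s + u components start the mission working; each fails
    independently with probability q, so l ~ Binomial(n - s + u, q).
    """
    y = n - s + u
    r = 1.0 - q
    return {s - u + l: comb(y, l) * q ** l * r ** (y - l) for l in range(y + 1)}
```

For example, with $n_i=4$, $s=2$, $u=1$ and $q_i=0.3$, three components start the mission working, so the next fault count ranges from 1 (no failures) to 4 (all three fail).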







So, the Bellman recursion becomes:












$$
V_i(k,s,t)=\max_{u\in[0,\min(k,s)]}\left\{R_i(s,u)+\sum_{\ell=0}^{n_i-s+u}b(n_i-s+u,\ell,q_i)\,V_i(k-u,\ s-u+\ell,\ t+1)\right\},
\tag{51}
$$

for $k=1,\ldots,M$, $s=1,\ldots,n_i$, $t\in[1,T-1]$.









The optimal policy is given by:












$$
\mu_i(k,s,t)=\arg\max_{u\in[0,\min(k,s)]}\left\{R_i(s,u)+\sum_{\ell=0}^{n_i-s+u}b(n_i-s+u,\ell,q_i)\,V_i(k-u,\ s-u+\ell,\ t+1)\right\},
\tag{52}
$$

for $k=1,\ldots,M$, $s=1,\ldots,n_i$, $t\in[1,T-1]$.









At the terminal decision epoch, we have the boundary condition:












$$
V_i(k,s,T)=\max_{u\in[0,\min(k,s)]}R_i(s,u),\quad k=1,\ldots,M,\quad s=1,\ldots,n_i.
\tag{53}
$$








When there are no man hours left, the system evolves autonomously and we have:












$$
V_i(0,s,t)=1-q_i^{\,n_i-s}+\sum_{\ell=0}^{n_i-s}b(n_i-s,\ell,q_i)\,V_i(0,\ s+\ell,\ t+1),\quad s=1,\ldots,n_i,\quad t\in[1,T-1].
\tag{54}
$$
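The recursion (51), the policy (52) and the boundary conditions (53) and (54) can be evaluated directly by memoized backward induction. The sketch below is one possible arrangement under the stated model, not the source's implementation; `solve_subsystem` and `best` are hypothetical names, and ties in the maximization are resolved toward the smaller allocation.

```python
from functools import lru_cache
from math import comb


def solve_subsystem(n, q, c, T):
    """Return best(k, s, t) -> (value, allocation) for one sub-system.

    n: parallel components, q: component failure probability per mission,
    c: unit cost per man-hour, T: index of the terminal decision epoch.
    """
    r = 1.0 - q

    def b(y, l):
        # Binomial mass function b(y, l, q).
        return comb(y, l) * q ** l * r ** (y - l)

    def reward(s, u):
        # Immediate reward R_i(s, u) = 1 - q^(n - s + u) - c u.
        return 1.0 - q ** (n - s + u) - c * u

    @lru_cache(maxsize=None)
    def best(k, s, t):
        options = []
        for u in range(min(k, s) + 1):
            value = reward(s, u)
            if t < T:  # add the expected future value, as in (51)
                y = n - s + u
                value += sum(
                    b(y, l) * best(k - u, s - u + l, t + 1)[0]
                    for l in range(y + 1)
                )
            options.append((value, u))
        return max(options, key=lambda vu: vu[0])

    return best
```

Calling `solve_subsystem(2, 0.5, 0.1, 1)(2, 1, 1)` evaluates the terminal epoch with two man-hours and one fault; with no man-hours left, the same function reduces to the autonomous evolution in (54).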







Lemma 1












$$
V_i(0,s,T-h+1)=h-\sum_{j=1}^{h}\left(1-r_i^{\,j}\right)^{n_i-s},\quad s=1,\ldots,n_i,\quad h\in[1,T].
\tag{55}
$$







Proof. From the boundary condition (53), we have:

$$V_i(0,s,T)=1-q_i^{\,n_i-s},\quad s=1,\ldots,n_i. \tag{56}$$

We make the induction assumption that:












$$
V_i(0,s,T-h+1)=h-\sum_{j=1}^{h}\left(1-r_i^{\,j}\right)^{n_i-s},\quad s=1,\ldots,n_i,
\tag{57}
$$








for some h>1. It follows that:











\begin{align}
V_i(0,s,T-h) &= 1-q_i^{\,n_i-s}+\sum_{\ell=0}^{n_i-s}b(n_i-s,\ell,q_i)\,V_i(0,s+\ell,T-h+1) \notag\\
&= 1-q_i^{\,n_i-s}+\sum_{\ell=0}^{n_i-s}\binom{n_i-s}{\ell}q_i^{\,\ell}\,r_i^{\,n_i-s-\ell}\left[h-\sum_{j=1}^{h}\left(1-r_i^{\,j}\right)^{n_i-s-\ell}\right] \tag{58}\\
&= h+1-q_i^{\,n_i-s}-q_i^{\,n_i-s}\sum_{\ell=0}^{n_i-s}\binom{n_i-s}{\ell}r_i^{\,n_i-s-\ell}\sum_{j=1}^{h}\left(\frac{1-r_i^{\,j}}{q_i}\right)^{n_i-s-\ell} \notag\\
&= h+1-q_i^{\,n_i-s}-q_i^{\,n_i-s}\sum_{\ell=0}^{n_i-s}\binom{n_i-s}{\ell}r_i^{\,n_i-s-\ell}\sum_{j=1}^{h}\left(\sum_{a=0}^{j-1}r_i^{\,a}\right)^{n_i-s-\ell} \notag\\
&= h+1-q_i^{\,n_i-s}\left[1+\sum_{j=1}^{h}\sum_{\ell=0}^{n_i-s}\binom{n_i-s}{\ell}\left(\sum_{a=1}^{j}r_i^{\,a}\right)^{n_i-s-\ell}\right] \tag{59}\\
&= h+1-q_i^{\,n_i-s}\left[1+\sum_{j=1}^{h}\left(1+\sum_{a=1}^{j}r_i^{\,a}\right)^{n_i-s}\right] \notag\\
&= h+1-q_i^{\,n_i-s}\left[1+\sum_{j=1}^{h}\left(\sum_{a=0}^{j}r_i^{\,a}\right)^{n_i-s}\right] \notag\\
&= h+1-\sum_{j=1}^{h+1}\left(1-r_i^{\,j}\right)^{n_i-s}. \tag{60}
\end{align}








We have used the induction assumption in (58) and repeatedly used $q_i=1-r_i$ and $1-r_i^{\,b}=q_i\sum_{j=0}^{b-1}r_i^{\,j}$ to arrive at (60).
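As a numerical sanity check on Lemma 1, the following sketch compares the closed form (55) with a direct evaluation of the recursion (54) started from the terminal value (56). The function names are illustrative assumptions, and the check covers only the parameter values one chooses to try.

```python
from math import comb


def v_recursive(n, s, q, T, t):
    """V_i(0, s, t) from recursion (54), with terminal value (56) at t = T."""
    r = 1.0 - q
    y = n - s  # components still working; none can be repaired when k = 0
    value = 1.0 - q ** y
    if t < T:
        value += sum(
            comb(y, l) * q ** l * r ** (y - l) * v_recursive(n, s + l, q, T, t + 1)
            for l in range(y + 1)
        )
    return value


def v_closed_form(n, s, q, h):
    """Closed form (55) for V_i(0, s, T - h + 1)."""
    r = 1.0 - q
    return h - sum((1.0 - r ** j) ** (n - s) for j in range(1, h + 1))
```

For any horizon `T` and `h` in `[1, T]`, `v_recursive(n, s, q, T, T - h + 1)` and `v_closed_form(n, s, q, h)` should agree up to floating-point error.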


At the terminal decision epoch T, if there are s faulty components, we will at most need only s man hours to repair them. So, the optimal allocation for k>s is the same as the optimal allocation for k=s. With this in mind, recall the boundary condition:












$$
V_i(k,s,T)=\max_{u\in[0,\min(k,s)]}R_i(s,u),\quad s=1,\ldots,n_i.
\tag{61}
$$







Suppose the DM has $u+1$ resources left at the terminal time $T$. We denote the marginal reward yielded by assigning 1 additional resource over and above $u$ resources to the active task $i$ as:

$$\Delta^i_T(u)=R_i(s,u+1)-R_i(s,u)=-c+q_i^{\,n_i-s+u}\,r_i. \tag{62}$$

Since $q_i<1$, it follows that $\Delta^i_T(u)$ is monotonic decreasing in $u$. Let $\kappa_i(T)$ be the least non-negative integer at which the marginal reward becomes negative, i.e., $\kappa_i(T)=\min\ell$ s.t. $q_i^{\,\ell}\,r_i<c$. It follows that the optimal policy at the terminal decision epoch is given by: for $k\le s$,











$$
\mu_i(k,s,T)=\arg\max_{u\in[0,k]}R_i(s,u)=
\begin{cases}
0, & n_i-s\ge\kappa_i(T),\\
k, & n_i-s+k\le\kappa_i(T),\\
\kappa_i(T)-n_i+s, & \text{otherwise}.
\end{cases}
\tag{63}
$$








For k>s, the optimal assignment is given by:











$$
\mu_i(k,s,T)=\mu_i(s,s,T)=
\begin{cases}
0, & n_i-s\ge\kappa_i(T),\\
s, & n_i\le\kappa_i(T),\\
\kappa_i(T)-n_i+s, & \text{otherwise}.
\end{cases}
\tag{64}
$$








In other words, we assign as many resources as possible so that $n_i-s+u$ gets close to the threshold $\kappa_i(T)$ without exceeding it. The optimal terminal value is given by:

$$V_i(k,s,T)=R_i\big(s,\mu_i(k,s,T)\big)=1-q_i^{\,n_i-s+\mu_i(k,s,T)}-c\,\mu_i(k,s,T). \tag{65}$$
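The terminal threshold policy in (63) and (64) can be sketched and compared against a brute-force maximization of $R_i(s,u)$. The names below are assumptions made for illustration, and the sketch assumes $0<q_i<1$ and $c>0$ so that the threshold search terminates.

```python
def kappa(q, c):
    """Least non-negative integer l with q**l * (1 - q) < c."""
    l = 0
    while q ** l * (1.0 - q) >= c:
        l += 1
    return l


def terminal_allocation(k, s, n, q, c):
    """Threshold policy at the terminal epoch, covering both k <= s and k > s."""
    budget = min(k, s)  # at most s man-hours are ever useful
    kap = kappa(q, c)
    if n - s >= kap:
        return 0
    if n - s + budget <= kap:
        return budget
    return kap - n + s


def brute_force_allocation(k, s, n, q, c):
    """Direct maximization of R_i(s, u) over u in [0, min(k, s)]."""
    rewards = [1.0 - q ** (n - s + u) - c * u for u in range(min(k, s) + 1)]
    return rewards.index(max(rewards))
```

With $q_i=0.5$ and $c=0.05$ the threshold evaluates to 4, so a sub-system with six components, five of them faulty, receives three man-hours regardless of how many more are available.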

We will show likewise that at each decision epoch, there is an upper bound to the number of resources that the DM shall assign (optimally) for the remainder of the time horizon. Having computed the terminal value function, we are now in a position to compute the value function at the previous decision epoch. Indeed, we have from (50):












$$
V_i(k,s,T-1)=\max_{u\in[0,\min(k,s)]}\big\{R(s,u)+Q(k,s,u,T)\big\},
\tag{66}
$$

for $k=1,\ldots,M$, $s=1,\ldots,n_i$,





where the future expected reward associated with allocation u is given by:










$$
Q(k,s,u,T)=\sum_{\ell=0}^{n_i-s+u}b(n_i-s+u,\ell,q_i)\,V_i(k-u,\ s-u+\ell,\ T).
\tag{67}
$$








We wish to show that the value function Vi(k,s,T−1) has a threshold based analytical form similar to Vi(k,s,T).


Case 1: $u<\kappa_i(T)-n_i+s$ and $k-u\ge\kappa_i(T)$. It follows that:

$$n_i-s+u-\ell<\kappa_i(T)\quad\text{and}\quad n_i-s+k-\ell\ge\kappa_i(T)\quad\forall\,\ell, \tag{68}$$

$$\Rightarrow\ \mu_i(k-u,\ s-u+\ell,\ T)=\kappa_i(T)-n_i+s-u+\ell. \tag{69}$$

So, we have:

















\begin{align}
Q(k,s,u,T) &= \sum_{\ell=0}^{n_i-s+u}b(n_i-s+u,\ell,q_i)\,V_i(k-u,\ s-u+\ell,\ T) \notag\\
&= \sum_{\ell=0}^{n_i-s+u}b(n_i-s+u,\ell,q_i)\left[1-c\big(\kappa_i(T)-n_i+s-u+\ell\big)-q_i^{\,\kappa_i(T)}\right] \notag\\
&= 1-c\,\kappa_i(T)-q_i^{\,\kappa_i(T)}+c\sum_{\ell=0}^{n_i-s+u}b(n_i-s+u,\ell,q_i)\,(n_i-s+u-\ell) \notag\\
&= 1-c\,\kappa_i(T)-q_i^{\,\kappa_i(T)}+c\sum_{\ell=0}^{n_i-s+u}\binom{n_i-s+u}{\ell}q_i^{\,\ell}\,r_i^{\,n_i-s+u-\ell}\,(n_i-s+u-\ell) \notag\\
&= 1-c\,\kappa_i(T)-q_i^{\,\kappa_i(T)}+c\,r_i\,(n_i-s+u)\sum_{\ell=0}^{n_i-s+u-1}\binom{n_i-s+u-1}{\ell}q_i^{\,\ell}\,r_i^{\,n_i-s+u-1-\ell} \notag\\
&= 1-c\,\kappa_i(T)-q_i^{\,\kappa_i(T)}+c\,r_i\,(n_i-s+u). \tag{70}
\end{align}
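The Case 1 result (70) can be checked numerically by summing the terminal values implied by (69) against the binomial weights. This is a sketch with assumed function names; the threshold $\kappa_i(T)$ is passed in as `kap` rather than derived, so the caller is responsible for choosing values that satisfy the Case 1 conditions.

```python
from math import comb


def q_case1_direct(n, s, u, q, c, kap):
    """Evaluate (67) term by term using the terminal allocation from (69)."""
    r = 1.0 - q
    y = n - s + u
    total = 0.0
    for l in range(y + 1):
        alloc = kap - n + s - u + l  # man-hours spent at the terminal epoch
        terminal_value = 1.0 - c * alloc - q ** kap
        total += comb(y, l) * q ** l * r ** (y - l) * terminal_value
    return total


def q_case1_closed(n, s, u, q, c, kap):
    """Closed form (70)."""
    r = 1.0 - q
    return 1.0 - c * kap - q ** kap + c * r * (n - s + u)
```

The agreement rests on the binomial mean: the expected number of surviving components is $r_i\,(n_i-s+u)$.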







Case 2: $n_i-s+k\le\kappa_i(T)$. It follows that:

$$n_i-s+k-\ell\le\kappa_i(T)\quad\forall\,\ell, \tag{71}$$

$$\Rightarrow\ \mu_i(k-u,\ s-u+\ell,\ T)=k-u. \tag{72}$$

So, we have:














\begin{align}
Q(k,s,u,T) &= \sum_{\ell=0}^{n_i-s+u}b(n_i-s+u,\ell,q_i)\,V_i(k-u,\ s-u+\ell,\ T) \notag\\
&= \sum_{\ell=0}^{n_i-s+u}b(n_i-s+u,\ell,q_i)\left[1-c(k-u)-q_i^{\,n_i-s+k-\ell}\right] \notag\\
&= 1-c(k-u)-\sum_{\ell=0}^{n_i-s+u}b(n_i-s+u,\ell,q_i)\,q_i^{\,n_i-s+k-\ell} \notag\\
&= 1-c(k-u)-\sum_{\ell=0}^{n_i-s+u}\binom{n_i-s+u}{\ell}q_i^{\,\ell}\,r_i^{\,n_i-s+u-\ell}\,q_i^{\,n_i-s+k-\ell} \notag\\
&= 1-c(k-u)-q_i^{\,n_i-s+k}\sum_{\ell=0}^{n_i-s+u}\binom{n_i-s+u}{\ell}r_i^{\,n_i-s+u-\ell} \notag\\
&= 1-c(k-u)-q_i^{\,n_i-s+k}\,(1+r_i)^{n_i-s+u}. \tag{73}
\end{align}








It follows that:

$$R(s,u)+Q(k,s,u,T)=2-ck-q_i^{\,n_i-s+u}-q_i^{\,n_i-s+k}\,(1+r_i)^{n_i-s+u}$$

$$\Rightarrow\ \Delta^i_{T-1}(k,u)=r_i\,q_i^{\,n_i-s+u}\left[1-q_i^{\,k-u}\,(1+r_i)^{n_i-s+u}\right]. \tag{74}$$
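The Case 2 expressions (73) and (74) can be checked in the same spirit. The sketch below evaluates the sum in (67) directly, compares it with the closed form, and obtains the marginal reward as a finite difference of $R+Q$. All function names are assumptions introduced for illustration.

```python
from math import comb


def q_case2_direct(n, s, k, u, q, c):
    """Evaluate (67) when all k - u remaining man-hours are used at T, as in (72)."""
    r = 1.0 - q
    y = n - s + u
    total = 0.0
    for l in range(y + 1):
        terminal_value = 1.0 - c * (k - u) - q ** (n - s + k - l)
        total += comb(y, l) * q ** l * r ** (y - l) * terminal_value
    return total


def q_case2_closed(n, s, k, u, q, c):
    """Closed form (73)."""
    r = 1.0 - q
    return 1.0 - c * (k - u) - q ** (n - s + k) * (1.0 + r) ** (n - s + u)


def total_reward(n, s, k, u, q, c):
    """R(s, u) + Q(k, s, u, T) using the closed form for Q."""
    return 1.0 - q ** (n - s + u) - c * u + q_case2_closed(n, s, k, u, q, c)


def marginal_reward(n, s, k, u, q):
    """Closed form (74) for the marginal reward at epoch T - 1."""
    r = 1.0 - q
    a = n - s + u
    return r * q ** a * (1.0 - q ** (k - u) * (1.0 + r) ** a)
```

Note that the unit cost cancels in the difference, which is why (74) does not depend on $c$.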


Unless otherwise indicated, all numbers expressing feature sizes, amounts, and physical properties used in the specification and claims are to be understood as being modified in all instances by the term “about.” Accordingly, unless indicated to the contrary, the numerical parameters set forth in the foregoing specification and attached claims are approximations that can vary depending upon the desired properties sought to be obtained by those skilled in the art utilizing the teachings disclosed herein. The use of numerical ranges by endpoints includes all numbers within that range (e.g. 1 to 5 includes 1, 1.5, 2, 2.75, 3, 3.80, 4, and 5) and any range within that range.


The various embodiments described above may be implemented using circuitry and/or software modules that interact to provide particular results. One of skill in the computing arts can readily implement such described functionality, either at a modular level or as a whole, using knowledge generally known in the art. For example, the flowcharts illustrated herein may be used to create computer-readable instructions/code for execution by a processor. Such instructions may be stored on a computer-readable medium and transferred to the processor for execution as is known in the art. The structures and procedures shown above are only a representative example of embodiments that can be used to facilitate ink jet ejector diagnostics as described above.


The foregoing description of the example embodiments has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the inventive concepts to the precise form disclosed. Many modifications and variations are possible in light of the above teachings. Any or all features of the disclosed embodiments can be applied individually or in any combination, not meant to be limiting but purely illustrative. It is intended that the scope be limited by the claims appended herein and not by the detailed description.

Claims
  • 1. A method comprising: monitoring a condition of a physical asset comprising one or more subsystems connected in series, each subsystem comprising one or more components connected in parallel, the physical asset having one or more jobs and experiencing one or more shutdowns between consecutive operational periods, wherein monitoring the condition of the physical asset involves determining that the one or more subsystems are functioning properly; determining a probability of the physical asset surviving a predetermined amount of time based on the monitoring and one or more shared resources, the one or more shared resources configured to be shared between the subsystems; establishing a model using a threshold based heuristic maintenance policy and known failure rates of the one or more components, the model configured to maximize a number of successful jobs that the physical asset is able to complete based on the determined probability; and mitigating downtime and a cost associated with shutdowns of the physical asset, including the one or more subsystems and the one or more components of each subsystem, by allocating the one or more shared resources to the one or more subsystems based on the model.
  • 2. The method of claim 1, wherein the one or more jobs comprise one or more substantially identical jobs.
  • 3. The method of claim 1, wherein the model comprises minimizing a shared resource cost.
  • 4. The method of claim 3, wherein the shared resource cost is calculated using a Maximal Marginal Reward (MMR) algorithm that is configured to maximize an expected payout.
  • 5. The method of claim 1, further comprising determining a maintenance schedule based on the model.
  • 6. The method of claim 1, wherein monitoring the condition of the physical asset comprises determining if all subsystems are functioning.
  • 7. The method of claim 1, wherein the one or more components are redundant within each subsystem.
  • 8. The method of claim 1, wherein the shared resources comprise one or more of man hours and parts used to perform maintenance.
  • 9. The method of claim 1, wherein the one or more shared resources comprise one or both of consumable resources and replenishable resources.
  • 10. A system, comprising: a processor; and a memory storing computer program instructions which when executed by the processor cause the processor to perform operations comprising: monitoring a condition of a physical asset comprising one or more subsystems connected in series, each subsystem comprising one or more components connected in parallel, the physical asset having one or more jobs and experiencing one or more shutdowns between consecutive operational periods, wherein monitoring the condition of the physical asset involves determining that the one or more subsystems are functioning properly; establishing one or more consumable resources that are shared among the one or more subsystems; determining a probability of the physical asset surviving a predetermined amount of time based on the monitoring and the one or more consumable resources; establishing a model using a threshold based heuristic maintenance policy and known failure rates of the one or more components, the model configured to maximize a number of successful jobs that the physical asset is able to complete based on the determined probability; and mitigating downtime and a cost associated with shutdowns of the physical asset, including the one or more subsystems and the one or more components of each subsystem, by allocating the one or more consumable resources to the one or more subsystems based on the model.
  • 11. The system of claim 10, wherein the one or more jobs comprise one or more substantially identical jobs.
  • 12. The system of claim 10, wherein the model comprises minimizing a shared resource cost.
  • 13. The system of claim 12, wherein the shared resource cost is calculated using a Maximal Marginal Reward (MMR) algorithm that is configured to maximize an expected payout.
  • 14. The system of claim 10, wherein the processor is configured to determine a maintenance schedule based on the model.
  • 15. The system of claim 10, wherein monitoring the condition of the physical asset comprises determining if all subsystems are functioning.
  • 16. The system of claim 10, wherein the one or more components are redundant within each subsystem.
  • 17. The system of claim 10, wherein the one or more consumable resources comprise one or more of man hours and parts used to perform maintenance.
  • 18. The system of claim 10, wherein the one or more consumable resources comprise replenishable resources.
US Referenced Citations (10)
Number Name Date Kind
8903750 Bodkin Dec 2014 B1
11106190 Huang Aug 2021 B2
20080140361 Bonissone Jun 2008 A1
20080234994 Goebel Sep 2008 A1
20090210081 Sustaeta Aug 2009 A1
20100017241 Lienhardt Jan 2010 A1
20110246093 Wood Oct 2011 A1
20140156584 Motukuri Jun 2014 A1
20140257526 Tiwari Sep 2014 A1
20170205818 Adendorff et al. Jul 2017 A1
Non-Patent Literature Citations (4)
Entry
Wu, Sze-jung, et al. “A neural network integrated decision support system for condition-based optimal predictive maintenance policy.” IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans 37.2 (2007): 226-236. (Year: 2007).
Patra, Sunandita. Acting, Planning, and Learning Using Hierarchical Operational Models. Diss. University of Maryland, College Park, 2020. (Year: 2020).
Ahadi et al., “Approximate Dynamic Programming for Selective Maintenance in Series-Parallel Systems”, IEEE Transactions on Reliability, vol. 69, Issue 3, Sep. 2020, pp. 1147-1164.
Meuleau et al., “Solving Very Large Weakly Coupled Markov Decision Processes”, Proceedings of the 15th National/Tenth Conference on Artificial Intelligence, Jul. 1998. pp. 165-172.
Related Publications (1)
Number Date Country
20220351106 A1 Nov 2022 US