The present disclosure relates to a novel window-based dynamic scrubbing scheduling algorithm for reducing scrubbing conflicts of a field-programmable gate array (FPGA) scrubbing module and improving reliability of a static random access memory (SRAM)-based FPGA in a high-radiation environment.
An SRAM-based FPGA is an energy-efficient computing platform for applications such as smart cars. It has a powerful computing capability, and its circuits can be flexibly reconfigured. However, in terms of system reliability, when FPGAs are exposed to a high-intensity radiation environment such as space, they are affected by single event upsets (SEUs). In this case, because charged particles strike the chip, the state of the configuration memory and the state of an on-chip memory (BRAM or flip-flop) may be flipped [1], which may change the function of the hardware and lead to an incorrect operating result.
Currently, there are two types of mainstream methods for ensuring reliability of an FPGA system. In the first type of methods, spatial redundancy, such as triple modular redundancy (TMR) [2, 3], dual modular redundancy (DMR) [4], or interconnection redundancy [5], is used to detect and mitigate an error. The methods based on spatial redundancy achieve high reliability by replicating a user design for a plurality of times. Therefore, the methods based on spatial redundancy have high circuit area and energy overheads, and some of the methods also attempt to balance reliability and a resource overhead by replicating a partial design or strengthening a specific design unit [6]. Although the methods based on spatial redundancy have high reliability, these methods all face problems of high energy consumption and area overheads of an FPGA circuit and long average fault repairing time.
To address these problems, the second type of methods improves the system reliability by using a scheduling-based scrubbing technology instead of the expensive spatial redundancy technology [1, 7, 8]. These methods try to scrub the configuration memory of each task before the user task is executed, so as to ensure correctness of the hardware at execution time. Because they do not rely on system redundancy, these methods have smaller area and power consumption overheads. However, when there are parallel or massive tasks, scrubbing requests may be numerous and frequent. As a result, the system scrubbing port (ICAP) becomes congested, which decreases the system reliability.
The present disclosure aims to resolve a technical problem that an SRAM-based FPGA is unreliable in a high-radiation environment.
In order to resolve the above technical problem, the technical solutions of the present disclosure provide a window-based dynamic scrubbing scheduling method, including the following steps:
where conflictcost represents a result of subtracting a total quantity of available ICAP ports in the system from a total quantity of ICAP ports required by the system during a scrubbing period of a currently considered scrubbing job; reliabilitycost represents a time interval between the currently considered scrubbing job and a corresponding user job, cn represents a congestion level on a time node n, Sl represents a minimum unit of time discretization, slmaxik represents a maximum feasible scheduling interval of a kth scrubbing job of an ith scrubbing task, SWi represents running time of the ith scrubbing task, m represents a currently investigated time node, and ξi represents importance of an ith user task;
step 202: scheduling the scrubbing job to a first time node on the shortest path to obtain optimal scheduling of the current scheduling job;
step 203: updating a congestion level cn on a time node on which the scrubbing scheduling job is scheduled; and
step 204: continuously scheduling a remaining scrubbing job until there is no conflict between scrubbing tasks or a specified maximum quantity of iterations is reached;
step 3: identifying a conflicting scrubbing job that cannot be resolved in the step 2, and dynamically deleting some scrubbing tasks to ensure that there is no conflict between legalized scrubbing job scheduling; and if there are a plurality of ICAP scrubbing ports in the FPGA system, using an excess ICAP port to dynamically allocate a scrubbing port for each scrubbing task through graph coloring; and
step 4: iteratively optimizing, based on a local optimal scheduling condition, the scrubbing scheduling generated in the step 3, such that the generated scrubbing scheduling is finally executed by a scrubbing module.
Preferably, the step 1 includes the following steps:
step 101: adjusting a scrubbing cycle STi of each user task by solving the ILP problem in a following formula:
where SWi represents scrubbing time of a scrubbing task corresponding to the ith user task, Ti represents a running cycle of the ith user task, STi represents a running cycle of the scrubbing task corresponding to the ith user task, ξi represents the importance of the ith user task, |SΛ| represents a quantity of scrubbing tasks in the FPGA system, and ubound represents a usage rate of the ICAP module in the FPGA system; and
step 102: generating the candidate scrubbing job within each scrubbing cycle based on a scrubbing cycle of a generated scrubbing task for scheduling in a subsequent step; and if there are the plurality of ICAP scrubbing ports in the FPGA system, using the excess ICAP port to dynamically allocate the scrubbing port for each scrubbing task through the graph coloring.
Preferably, in the step 2, an ODS algorithm model includes three groups of 0-1 decision variables, namely X,Y,Z={xlt, ylt, zlt; l=0, 1, . . . , |SE|−1, t=0, 1, 2, . . . , N−1}, where SE represents a job set containing all scrubbing jobs in the scheduling, |SE| represents a length of the job set, and N represents a time length of a current scheduling window, where when the heuristic NDS algorithm is used to resolve the scrubbing conflict, start time of each scrubbing task, namely, scrubbing task scheduling, is calculated by solving an integer programming system including an objective function shown in a formula (1) and a constraint condition shown in a formula (2);
in the formula (1), the reliability function describes a time interval between a scrubbing task and a user task,
and system reliability is optimized by minimizing the reliability function, where Σt t·ylt describes the start time of each scrubbing task, Sl represents the minimum unit of the time discretization, slmaxl represents a maximum feasible scheduling interval of a scrubbing task l; conflict=Σt zt, where a conflict between scrubbing tasks is optimized by minimizing the conflict function, and zt represents a port conflict for each time node; and κ1 and κ2 represent weights of the two objective functions; and
Preferably, the dynamically deleting some scrubbing tasks in the step 3 includes the following steps:
Preferably, a probability of deleting each scrubbing job is calculated according to a following formula:
Preferably, the dynamically allocating a scrubbing port to each scrubbing task in the step 3 includes the following steps:
Preferably, in the step 4, the iteratively optimizing, based on a local optimal scheduling condition, the scrubbing scheduling generated in the step 3 includes the following steps:
The present disclosure proposes a novel window-based dynamic scrubbing scheduling algorithm. By dynamically scheduling a user task and a scrubbing task, the algorithm disclosed in the present disclosure can reduce scrubbing conflicts of an FPGA scrubbing module and scrub each user task in a timely manner as much as possible. Compared with an existing method, the method provided in the present disclosure greatly reduces area and energy consumption overheads of a hardware circuit, and improves system reliability. Compared with existing technical means, the present disclosure has following innovative points:
The present disclosure will be further described below with reference to specific embodiments. It should be understood that these embodiments are only intended to describe the present disclosure, rather than to limit the scope of the present disclosure. In addition, it should be understood that various changes and modifications may be made on the present disclosure by those skilled in the art after reading the content of the present disclosure, and these equivalent forms also fall within the scope defined by the appended claims of the present disclosure.
As shown in
The step 1 specifically includes the following steps:
Firstly, scrubbing cycle ST of each user task is adjusted by solving an ILP problem in formula (1):
In the formula (1), SWi represents scrubbing time of a scrubbing task corresponding to an ith user task, Ti represents a running cycle of the ith user task, STi represents a running cycle of the scrubbing task corresponding to the ith user task, ξi represents importance of the ith user task, |SΛ| represents a quantity of scrubbing tasks in an FPGA system, and ubound represents a usage rate of an ICAP module in the FPGA system.
A scrubbing cycle of each scrubbing task is obtained by solving the integer programming problem in the formula (1). Minimizing the formula (1) gives an important user task a shorter scrubbing cycle and a secondary user task a longer scrubbing cycle. The formula (1) also constrains an ICAP utilization rate (ubound) in the system, such that the generated candidate scrubbing jobs do not exceed a total ICAP utilization rate of the system, which effectively controls a congestion level between the generated scrubbing jobs. Based on a scrubbing cycle of a generated scrubbing task, a candidate scrubbing job is generated within each scrubbing cycle for scheduling in a subsequent step.
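The generation of candidate scrubbing jobs from already chosen scrubbing cycles can be illustrated with a minimal Python sketch. The names `ScrubJob`, `icap_utilization`, and `candidate_jobs` are hypothetical, and the utilization measure (the sum of SWi/STi) is only an assumed reading of the ubound constraint, since the formula (1) itself is not reproduced in this text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScrubJob:
    task: int      # index i of the scrubbing task
    k: int         # index of the job within the task
    release: int   # start of the scrubbing cycle the job belongs to
    deadline: int  # end of that scrubbing cycle
    duration: int  # scrubbing time SW_i


def icap_utilization(sw, st):
    """Assumed utilization measure: sum of SW_i / ST_i over all scrubbing tasks."""
    return sum(w / p for w, p in zip(sw, st))


def candidate_jobs(sw, st, window):
    """Generate one candidate job per complete scrubbing cycle inside the window."""
    jobs = []
    for i, (w, p) in enumerate(zip(sw, st)):
        for k in range(window // p):
            jobs.append(ScrubJob(i, k, k * p, (k + 1) * p, w))
    return jobs


# Two scrubbing tasks with SW = (2, 1) and ST = (10, 5) in a window of 20 time units.
jobs = candidate_jobs([2, 1], [10, 5], 20)
utilization = icap_utilization([2, 1], [10, 5])
```

In this sketch a set of scrubbing cycles would be accepted only if `utilization` does not exceed ubound; how the cycles themselves are chosen is left to the ILP in the formula (1).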
The ODS algorithm converts the scrubbing scheduling into the ILP problem for solution. An ODS algorithm model includes three groups of 0-1 decision variables, namely X,Y,Z={xlt,ylt,zlt; l=0, 1, . . . , |SE|−1, t=0, 1, 2, . . . , N−1}, where SE represents a job set containing all scrubbing jobs in the scheduling, |SE| represents a length of the job set, and N represents a time length of a current scheduling window.
In the formula (2), the reliability function describes a time interval between a scrubbing task and a user task, where system reliability is optimized by minimizing the function, and Σt t·ylt describes the start time of each scrubbing task, where Sl represents a minimum unit of time discretization, and slmaxl represents a maximum feasible scheduling interval of scrubbing task l. conflict=Σt zt, where a conflict between scrubbing tasks is optimized by minimizing the conflict function, and zt represents a port conflict for each time node. κ1 and κ2 represent weights of the two objective functions, and a balance between the two objective functions is achieved by adjusting the weights.
By optimizing the objective function shown in the formula (2), the ODS algorithm can simultaneously optimize the system reliability and reduce conflicts between scrubbing tasks.
To ensure that the ODS algorithm generates the scheduling correctly, it is also required to constrain the objective function. The following formula (3) shows constraints of the system:
A first constraint describes that each scrubbing task can only have one start time. A second constraint describes that the scrubbing task must be scheduled within a legal scheduling interval, where slminl represents a minimum value of a feasible scheduling interval for the scrubbing job. Third and fifth constraints state that each scrubbing task must run continuously for SWl time units. A fourth constraint states that a quantity of ICAP ports used simultaneously on each time node cannot exceed zt, which reflects a congestion level on the current time node.
The start time of each scrubbing task, namely, scrubbing task scheduling, can be obtained by solving an integer programming system including the objective function shown in the formula (2) and a constraint condition shown in the formula (3).
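For a very small instance, the trade-off between the two objective terms can be illustrated by exhaustive search instead of an ILP solver. The following Python sketch is not the formulation of the formulas (2) and (3), which are not reproduced in this text. It assumes that the reliability term is the distance between slmaxl and the chosen start node, that the conflict term is the excess port demand summed over time nodes, and that Sl equals one time node. The function name `ods_bruteforce` and the default weights `k1` and `k2` are hypothetical.

```python
from itertools import product


def ods_bruteforce(jobs, n_nodes, ports=1, k1=1.0, k2=10.0):
    """jobs: list of (slmin, slmax, sw) in time-node units.

    Returns (starts, cost) minimizing k1 * reliability + k2 * conflict,
    where both terms follow the assumptions stated in the lead-in.
    """
    # Each job may start anywhere in its feasible interval that still fits the window.
    ranges = [range(lo, min(hi, n_nodes - sw) + 1) for lo, hi, sw in jobs]
    best_starts, best_cost = None, None
    for starts in product(*ranges):
        usage = [0] * n_nodes
        for s, (_, _, sw) in zip(starts, jobs):
            for t in range(s, s + sw):  # the job occupies sw consecutive nodes
                usage[t] += 1
        reliability = sum(hi - s for s, (_, hi, _) in zip(starts, jobs))
        conflict = sum(max(0, u - ports) for u in usage)
        cost = k1 * reliability + k2 * conflict
        if best_cost is None or cost < best_cost:
            best_starts, best_cost = starts, cost
    return best_starts, best_cost


# Two identical jobs competing for one ICAP port in a window of 6 time nodes.
starts, cost = ods_bruteforce([(0, 3, 2), (0, 3, 2)], n_nodes=6)
```

With a large conflict weight the search keeps the two jobs apart and pushes both as late as their intervals allow. The exhaustive enumeration grows exponentially with the number of jobs, which is consistent with the text's use of an approximate iterative algorithm for larger instances.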
(2) NDS algorithm: The NDS algorithm approximately solves a scrubbing task scheduling problem through iteration. Within a scheduling window, scheduling time is first discretized, and each time node represents a time period of Sl. By recording a congestion level on each time node, global scrubbing task congestion level information can be obtained. Based on a discretized time node and a to-be-scheduled scrubbing task, scheduling graph DSSG is first generated. A construction method is as follows: Firstly, each scrubbing task node is connected to a time node within its feasible scheduling interval. Secondly, each adjacent time node is connected. Each time node has congestion level information to help the NDS algorithm avoid a high-congestion time node.
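The construction of the scheduling graph described above can be sketched in Python as an adjacency list. The function name `build_dssg`, the node labels, and the choice of directed edges from each time node to its successor are assumptions made for illustration.

```python
def build_dssg(n_nodes, tasks):
    """Build a scheduling graph for one scheduling window.

    tasks: dict mapping a scrubbing task name to (slmin, slmax), the inclusive
    range of time-node indices in which the task may start.
    Returns (graph, congestion), where congestion holds the level c_n per node.
    """
    graph = {("time", n): [] for n in range(n_nodes)}
    congestion = {n: 0 for n in range(n_nodes)}
    # Secondly in the text: each time node is connected to its adjacent node.
    for n in range(n_nodes - 1):
        graph[("time", n)].append(("time", n + 1))
    # Firstly in the text: each task node is connected to its feasible time nodes.
    for name, (lo, hi) in tasks.items():
        graph[("task", name)] = [
            ("time", n) for n in range(lo, min(hi, n_nodes - 1) + 1)
        ]
    return graph, congestion


graph, congestion = build_dssg(4, {"s0": (1, 2)})
```

Keeping the congestion levels in a separate dictionary lets an iterative scheduler update them after every placement without rebuilding the graph.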
The NDS algorithm is an iterative algorithm, and each iteration includes following specific steps:
In the formula (4), conflictcost represents a result of subtracting a total quantity of available ICAP ports in the system from a total quantity of ICAP ports required by the system during a scrubbing period of a currently considered scrubbing job, reliabilitycost represents a time interval between the currently considered scrubbing job and a corresponding user job, cn represents a congestion level on time node n, Sl represents the minimum unit of the time discretization, slmaxik represents a maximum feasible scheduling interval of a kth scrubbing job of an ith scrubbing task, SWi represents running time of the ith scrubbing task, and m represents a currently investigated time node. It can be observed that pathcost considers both the system reliability and the conflict between the scrubbing tasks. The two indicators are optimized simultaneously by finding the shortest path.
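One iteration of this kind of placement can be sketched in Python as follows. The sketch is not the formula (4), which is not reproduced in this text. It assumes that the path cost is a weighted sum of conflictcost and reliabilitycost, that reliabilitycost equals slmax minus the start time, and that every path leaves the task node through one time node and then covers SW adjacent nodes, so that enumerating the first node stands in for a general shortest-path search. The names `path_cost`, `schedule_job`, and `conflict_weight` are hypothetical.

```python
def path_cost(m, sw, slmax, congestion, ports=1, sl=1, conflict_weight=10):
    """Assumed cost of starting a scrubbing job on time node m."""
    # Ports required over the scrubbing period if this job is added.
    required = sum(congestion[n] + 1 for n in range(m, m + sw))
    available = ports * sw
    conflict_cost = required - available
    # Distance between the scrubbing job and the latest feasible start.
    reliability_cost = slmax - sl * m
    return conflict_weight * conflict_cost + reliability_cost


def schedule_job(slmin, slmax, sw, congestion, ports=1, sl=1):
    """Place one job on its cheapest start node and update the congestion levels."""
    first = -(-slmin // sl)  # ceiling division to the first feasible node
    last = min(slmax // sl, len(congestion) - sw)
    if last < first:
        raise ValueError("no feasible start node inside the scheduling window")
    best = min(
        range(first, last + 1),
        key=lambda m: path_cost(m, sw, slmax, congestion, ports, sl),
    )
    for n in range(best, best + sw):  # corresponds to updating c_n after placement
        congestion[n] += 1
    return best


congestion = [0] * 6
first_start = schedule_job(0, 3, 2, congestion)
second_start = schedule_job(0, 3, 2, congestion)
```

The first job takes the latest feasible start because no node is congested yet. The second job then moves earlier to avoid the occupied nodes, which shows how the recorded congestion steers later placements.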
Specific steps for legalizing the scheduling are as follows:
If a congestion level on the current time node is greater than 0, a probability of deleting each scrubbing job on the time node is calculated, and the first n scrubbing jobs are deleted. Herein, n represents a result of subtracting the total quantity of available ICAP ports in the system from the quantity of ICAP ports required on the time node.
Each time node of a current scheduling window is iteratively traversed until all conflicts are resolved.
A probability of deleting each scrubbing job is calculated according to formula (5):
In the above formula (5), pni,k represents a probability of deleting the kth scrubbing job of the ith scrubbing task on the nth time node, w1, w2, and w3 respectively represent time since the ith user task is last scrubbed, a total quantity of conflicts between the kth scrubbing job and other scrubbing jobs, and importance of the kth scrubbing job of the ith scrubbing task, and θ1 and θ2 represent user-customized weights. This calculation method ensures that an important task is scrubbed more frequently, while also preventing starvation of a secondary task.
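The legalization pass can be sketched in Python with the deletion probability supplied as a function, because the formula (5) is not reproduced in this text. The name `legalize`, the job representation, and the example score based only on importance are assumptions. The sketch also assumes that the jobs deleted first are those with the highest deletion score.

```python
def legalize(jobs, n_nodes, ports, delete_score):
    """Delete jobs until no time node needs more ports than are available.

    jobs: dict mapping a job id to (start, duration) in time-node units.
    delete_score: callable returning a deletion score for a job id
    (hypothetical stand-in for the probability of the formula (5)).
    """
    kept = dict(jobs)
    for n in range(n_nodes):
        active = [j for j, (s, d) in kept.items() if s <= n < s + d]
        excess = len(active) - ports  # jobs that cannot be served on this node
        if excess > 0:
            active.sort(key=delete_score, reverse=True)
            for j in active[:excess]:
                del kept[j]
    return kept


importance = {"a": 3, "b": 1, "c": 2}
legal = legalize(
    {"a": (0, 2), "b": (1, 2), "c": (4, 1)},
    n_nodes=6,
    ports=1,
    delete_score=lambda j: -importance[j],  # less important jobs are deleted first
)
```

A single pass over the time nodes is enough in this sketch, because deleting a job can only lower the congestion on other nodes.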
If there are the plurality of ICAP scrubbing ports in the system, a legalization step makes full use of the excess ICAP port to dynamically allocate the scrubbing port for each scrubbing task through the graph coloring. An ICAP port allocation algorithm is as follows:
All scrubbing jobs in a scheduling window are arranged in an ascending order based on start time of all the scrubbing jobs.
An ICAP port is allocated to each scrubbing job in the arrangement order. A port allocation strategy for each scrubbing job is as follows: If the scrubbing job is located on a time node on which no ICAP port is allocated to another scrubbing job, any ICAP port in the system is randomly selected. If the scrubbing job is located on a time node on which an ICAP port is already allocated to another scrubbing job, an unused ICAP port is randomly selected from the system.
Because the scheduling legalization step ensures zero congestion on any time node, the ICAP port allocation algorithm mentioned above can be used to ultimately allocate a legal ICAP port for each scrubbing job.
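The port allocation above can be sketched in Python as a greedy coloring of jobs sorted by start time. The function name `allocate_ports` and the job representation are hypothetical, and the sketch picks the lowest-numbered free port instead of a random one so that the result is reproducible.

```python
def allocate_ports(jobs, ports):
    """Assign an ICAP port index to every scrubbing job.

    jobs: dict mapping a job id to (start, duration) in time-node units.
    The schedule is assumed to be legalized, so at most `ports` jobs overlap.
    """
    assignment = {}
    busy_until = [0] * ports  # time node at which each port becomes free
    # Jobs are visited in ascending order of start time.
    for job_id, (start, duration) in sorted(jobs.items(), key=lambda kv: kv[1][0]):
        free = [p for p in range(ports) if busy_until[p] <= start]
        if not free:
            raise ValueError(f"schedule is not legal: no free port for {job_id}")
        port = free[0]  # deterministic stand-in for the random selection
        assignment[job_id] = port
        busy_until[port] = start + duration
    return assignment


assignment = allocate_ports({"a": (0, 3), "b": (1, 2), "c": (3, 1)}, ports=2)
```

Because the jobs are processed in start-time order, a port that is free at the start of a job stays free for the whole job, so checking `busy_until` at the start node is sufficient.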
All scrubbing jobs in a scheduling window are arranged in a descending order based on start time of all the scrubbing jobs.
Two adjacent scrubbing jobs are sequentially optimized in an arrangement order.
For the two adjacent scrubbing jobs, a compact operator is first run. If there is a time gap between the two scrubbing jobs, a scrubbing job with earlier start time is rescheduled to make the scrubbing job adjacent to the other scrubbing job to eliminate the time gap.
For the two adjacent scrubbing jobs, a swap operator is then run. If the two scrubbing jobs have different scrubbing time, the scrubbing job with longer scrubbing time and the scrubbing job with shorter scrubbing time are swapped to ensure that the scrubbing job with the shorter scrubbing time always runs later.
The time interval between the scrubbing job and the user job is reduced by reducing the time gap between the scrubbing jobs, making the generated scheduling more reliable.
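The compact and swap operators can be sketched in Python for a single ICAP port. The function name `local_optimize` and the job representation are hypothetical. The sketch omits the check that a moved job stays inside its feasible scheduling interval, and it leaves overlapping jobs (the multi-port case) untouched.

```python
def local_optimize(jobs):
    """Apply the compact and swap operators to adjacent scrubbing jobs.

    jobs: dict mapping a job id to (start, duration) in time-node units.
    Returns a new dict with the adjusted start times.
    """
    # Jobs are arranged in descending order of start time.
    order = sorted(
        ([jid, s, d] for jid, (s, d) in jobs.items()),
        key=lambda j: j[1],
        reverse=True,
    )
    for i in range(len(order) - 1):
        later, earlier = order[i], order[i + 1]
        gap = later[1] - (earlier[1] + earlier[2])
        if gap < 0:
            continue  # overlapping jobs are not handled in this sketch
        earlier[1] += gap  # compact: move the earlier job up against the later one
        if later[2] > earlier[2]:
            # swap: the longer job runs first so that the shorter job runs later
            block_start = earlier[1]
            later[1] = block_start
            earlier[1] = block_start + later[2]
            order[i], order[i + 1] = earlier, later  # keep descending order
    return {jid: (s, d) for jid, s, d in order}


optimized = local_optimize({"a": (0, 1), "b": (5, 3)})
compacted = local_optimize({"a": (0, 2), "b": (4, 2), "c": (8, 2)})
```

In the first call the short job is first moved next to the long job and then swapped behind it. In the second call all jobs have equal scrubbing time, so only the compact operator has an effect.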
The above solution is implemented on an SRAM-based FPGA system. For any system with a high reliability requirement, high-quality scrubbing scheduling can be generated for each user task by extracting task information from the system and using the proposed scheduling algorithm.
| Number | Date | Country | Kind |
|---|---|---|---|
| 202310126349.9 | Feb 2023 | CN | national |
This application is a continuation application of International Application No. PCT/CN2023/083564, filed on Mar. 24, 2023, which is based upon and claims priority to Chinese Patent Application No. 202310126349.9, filed on Feb. 16, 2023, the entire contents of which are incorporated herein by reference.
| Relation | Number | Date | Country |
|---|---|---|---|
| Parent | PCT/CN2023/083564 | Mar 2023 | WO |
| Child | 18542792 | | US |