This application claims priority under 35 U.S.C. 119(a) to Indian Patent Application No. 2207/CHE/2009 filed Sep. 14, 2009.
The technical field of this invention is video processing in hardware engines.
This invention concerns the software overhead and the hardware utilization involved when using a hardware engine to process multiple channels (or contexts) of video with multiple frames of video per channel. The integration of such hardware engines into microprocessors running high level operating systems demands that the hardware engine be managed by a software driver.
Conventional drivers generally permit the application software to submit only one frame at a time. Software operating on video streams thus makes multiple submissions, one per frame. The hardware typically issues an interrupt upon completion of each submission. When a system manages only one or two channels of processing, the overhead of submission and of servicing the completion interrupt is generally not a problem. Multichannel video systems and aggregators, however, must deal with hundreds of channels. Software models for batch processing these plural channels in hardware engines have not yet been conceived.
The standard driver models in conventional high level operating systems provide a seamless interface between the hardware and the software but are not designed to maximize the utilization of the hardware. Accordingly, the hardware engine is not utilized as highly as feasible in the prior art.
This invention allows the application software to submit multiple (N) frames belonging to different and/or same channels in one submission. The driver maintains a request queue and serializes requests and manages the hardware utilization. The driver informs the software through a callback function when the entire submission has been serviced.
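A minimal sketch of this batched submission model in C follows; the type names, the simulated per-frame processing loop, and the callback signature are illustrative assumptions, not the actual driver interface:

```c
#include <stddef.h>

/* Hypothetical batched request: N frames, possibly belonging to
 * different channels, bundled into a single submission. */
typedef struct {
    int channel;   /* channel (context) this frame belongs to */
    int frame_id;  /* frame index within the channel          */
} vpe_frame;

typedef void (*vpe_done_cb)(size_t frames_done, void *ctx);

/* Sketch of the driver entry point: the driver serializes the N
 * frames to the hardware one after another, then invokes the
 * callback exactly once, after the entire submission is serviced. */
size_t vpe_submit_batch(const vpe_frame *frames, size_t n,
                        vpe_done_cb done, void *ctx)
{
    size_t processed = 0;
    for (size_t i = 0; i < n; i++) {
        /* stand-in for programming the hardware for frame i */
        (void)frames[i];
        processed++;
    }
    if (done)
        done(processed, ctx); /* single callback per submission */
    return processed;
}
```

An application bundling four frames from two channels would thus make one driver call and receive one completion callback, instead of four calls and four interrupts.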
These and other aspects of this invention are illustrated in the drawings, in which:
This invention is useful in signal processing including video processing where the input and output signals are video files or video streams. Applications of video processing include digital video discs (DVDs) and video players. The processing of video is performed using a hardware video processing engine (VPE). The VPE receives requests from multiple channels for processing one or more functions. A VPE driver provides the interface to an application program enabling use of the VPE for the video processing functions. The functions include de-interlacing and noise filtering of the video streams.
Existing models of the VPE driver provide an interface between an application program and the VPE. In the prior art, the VPE driver interface accepts one channel per request, so the application program must call the driver once for each channel. After completion of each request, a prior art VPE generates a callback to the application program, usually via an interrupt.
In some embodiments, some of the functioning of the VPE 115 can also be performed by processor 135 in connection with VPE 115. For example, the processor can support the application.
Application 210 places each request in request queue 211. Application 210 may run on VPE 115 or on processor 135.
VPE hardware 230 services requests from driver input buffer 221 one at a time in the order received. After processing each request, VPE hardware 230 issues a call-back function (Processing Done) to VPE driver 220 indicating the end of the processing function. The resulting processed data is stored and serialized in driver output queue 222.
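For contrast with the batched model, the prior-art flow just described (one frame per request, one completion notification per request) can be sketched as follows; the names and the counting harness are hypothetical:

```c
/* Prior-art model: one frame per request.  For N frames the
 * application makes N driver calls and receives N completion
 * callbacks. */
typedef void (*frame_done_cb)(int frame_id);

static int g_callbacks; /* counts completion callbacks received */

static void count_done(int frame_id)
{
    (void)frame_id;
    g_callbacks++;
}

/* One submission: program the hardware for a single frame, then
 * signal completion back to the caller. */
void vpe_submit_one(int frame_id, frame_done_cb done)
{
    /* stand-in for the hardware busy interval */
    done(frame_id);
}

/* Driving N frames therefore costs N submissions and N callbacks. */
int submit_n_frames(int n)
{
    g_callbacks = 0;
    for (int i = 0; i < n; i++)
        vpe_submit_one(i, count_done);
    return g_callbacks;
}
```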
Application 310 issues request R1 at time T0 311 to driver/kernel space 320.
At the end of busy interval 332 at time T2 322, hardware 330 produces the results of the first request. Hardware 330 communicates to driver/kernel space 320 at time T3 323. Driver/kernel space 320 communicates these results back to application 310 at time T5 313.
During this interval, at time T0+T 313, application 310 issues another request R2 to driver/kernel space 320. Driver/kernel space 320 cannot immediately supply this request to hardware 330 because hardware 330 is busy with the prior request. Driver/kernel space 320 communicates a data processing request and the necessary data to hardware 330 at time T4 324. Hardware 330 is idle during an interval 333 between completion of processing of the first request R1 at time T2 322 and receipt of the next data processing request at time T4 324. As a result of this request, hardware 330 is busy during an interval 334 performing the requested operation. At the end of busy interval 334 at time T6 325, hardware 330 produces the results of the second request. Hardware 330 communicates to driver/kernel space 320 at time T7 326. Driver/kernel space 320 communicates these results back to application 310 at time T9 314. Following completion of servicing the second request R2, hardware 330 is idle during an interval 335.
The time to complete N requests by the VPE is given by:
N*(Ts+Th)
where: Ts is the time for software overhead which is Tsa+Tsd; Tsa is the application to driver overhead; Tsd is the driver overhead; and Th is the actual hardware processing time.
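Assuming fixed per-request overheads, this prior-art completion time can be evaluated directly (times in arbitrary units; the function name is illustrative):

```c
/* Total time for N single-frame requests in the prior-art model:
 * every request pays the full software overhead Ts = Tsa + Tsd on
 * top of the hardware processing time Th, giving N * (Ts + Th). */
double prior_art_total(int n, double tsa, double tsd, double th)
{
    double ts = tsa + tsd; /* software overhead per request */
    return n * (ts + th);  /* N * (Ts + Th)                 */
}
```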
Application 410 places each request in request queue 411.
VPE hardware 430 services the requests received from driver input queue 421. After processing all M frames in a request, VPE hardware 430 issues a call-back function (Processing Done) to driver 420 indicating the end of the processing function. The resulting processed data is stored and serialized in driver output buffer 425.
Application 510 issues a combined request R1, R2, R3 and R4 at time T0 511 to driver/kernel space 520.
Hardware 530 is initially idle during an interval 531 before receipt of the data processing request. As a result of this request, hardware 530 is busy during an interval 532 performing the requested operation on the M frames.
During busy interval 532 at time T2 522, hardware 530 produces the results of the first request R1. Similarly, during busy interval 532 at time T3 523, hardware 530 produces the results of the second request R2. Hardware 530 produces the results of the third request R3 at time T4 524 and the results of the fourth request R4 at time T5 525. Hardware 530 communicates to driver/kernel space 520 at time T6 526. Driver/kernel space 520 communicates these results back to application 510 at time T7 512.
During this interval, at time T0+T 513, application 510 issues another request R5 to driver/kernel space 520. Driver/kernel space 520 cannot immediately supply this request to hardware 530 because hardware 530 is busy with the prior requests. Hardware 530 is idle during an interval 533 following completion of processing of the first set of requests R1, R2, R3 and R4. Driver/kernel space 520 dispatches this next request, ending idle interval 533 (not shown).
The time to complete N requests using the processing engine of this invention is given by:
Ts+N*Th
where: Ts is the time for software overhead which is Tsa+Tsd; Tsa is the application to driver overhead; Tsd is the driver overhead; and Th is the actual hardware processing time. This invention is advantageous over the prior art because it incurs the software overhead Ts less frequently: only once per N requests rather than once per request.
Table 1 is a comparison of the overhead incurred in the prior art and in this invention. The first row of Table 1 corresponds to the overhead calculations above. The second row of Table 1 shows the hardware utilization factor for N frames.
Table 1 shows the hardware utilization factor in the prior art approaches 1 (100% utilization) only as Th becomes large relative to Ts. Table 1 shows that the hardware utilization factor in this invention approaches 1 as N becomes larger.
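The two utilization factors follow from the timing formulas above: the hardware busy time N*Th divided by the total elapsed time gives Th/(Ts+Th) for the prior art and N*Th/(Ts+N*Th) for this invention. A small sketch with illustrative values (not the measured figures of Table 2):

```c
/* Hardware utilization = hardware busy time / total elapsed time. */

/* Prior art: N*Th / (N*(Ts+Th)) = Th / (Ts+Th); independent of N,
 * it approaches 1 only as Th grows large relative to Ts. */
double util_prior_art(double ts, double th)
{
    return th / (ts + th);
}

/* This invention: N*Th / (Ts + N*Th); approaches 1 as N grows,
 * because Ts is paid only once per batch of N requests. */
double util_batched(int n, double ts, double th)
{
    return (n * th) / (ts + n * th);
}
```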
Table 2 shows a comparison of hardware utilization of a prior art example product and the predicted hardware utilization of this invention for example processes. Table 2 shows the hardware overhead Th and the software overhead Ts for each of the example tasks.
The last two rows of Table 2 show that as N increases for the same operation, the hardware utilization approaches 100%.
Table 2 shows that the overhead can be decreased by up to 35% compared to the prior art. As N increases, the hardware efficiency can be improved towards 100%. The proposed VPE driver also allows a greater number of VPEs to be controlled by a single central processing unit (CPU). If a CPU controls software scheduling of the VPE engine(s), the reduced software overhead allows the same number of VPEs to be controlled by a less powerful CPU. Alternatively, at the same CPU frequency, more VPEs could be controlled. As another alternative, the CPU processing capability saved by this invention could be used for other CPU intensive processing tasks such as video encode/decode.
To obtain maximum utilization using this invention, the VPE hardware should support submission of multiple frames/streams at a time. If the hardware does not support multiple submissions, this invention may still be useful. Using this invention avoids incurring the driver software overhead on every submission as required by prior art VPE drivers. This invention avoids incurring the application to driver software overhead Tsa every frame. Only the software overhead Tsd of programming the hardware registers remains. This allows previously designed VPE engines to use this invention. New designs of VPE engines should support multiple submissions to get the maximum benefit out of this invention.
A further embodiment of this invention reduces the latency of the bundled requests. Rather than servicing requests in submission order, driver 420 could submit requests to the hardware using a priority system. This reduces latency for real time (high priority) requests at the expense of low priority requests. Latency can also be reduced using intermediate call-backs. The partial results of requests occurring at times T2 522, T3 523, T4 524 and T5 525 could be immediately communicated to application 510 rather than being bundled.
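One way the priority embodiment might be realized, assuming a simple in-memory request queue; the structure and function names are hypothetical:

```c
#include <stddef.h>

/* A queued request tagged with a priority.  Higher value = more
 * urgent (e.g. a real time channel). */
typedef struct {
    int id;
    int priority;
} vpe_request;

/* Pick the next request to dispatch: highest priority first, and
 * submission order (lowest index) among equal priorities.  Returns
 * the index of the chosen request, or -1 if the queue is empty.  */
int vpe_pick_next(const vpe_request *q, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (best < 0 || q[i].priority > q[best].priority)
            best = (int)i;
    }
    return best;
}
```

Using strict `>` for the comparison preserves submission order among requests of equal priority, so low priority traffic still progresses in FIFO order.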
Those skilled in the art will recognize that a wide variety of modifications, alterations, and combinations can be made with respect to the above described embodiments without departing from the scope of the present disclosure, and that such modifications, alterations, and combinations are to be viewed as being within the ambit of the inventive concept.
The foregoing description sets forth numerous specific details to convey a thorough understanding of embodiments of the present disclosure. However, it will be apparent to one skilled in the art that embodiments of the present disclosure may be practiced without these specific details. Some well-known features are not described in detail in order to avoid obscuring the present disclosure. Other variations and embodiments are possible in light of above teachings, and it is thus intended that the scope of present disclosure not be limited by this detailed description.
Number | Date | Country | Kind |
---|---|---|---|
2207/CHE/2009 | Sep 2009 | IN | national |