Data compression is used extensively in modern computing devices. Uses of data compression in computing devices include video compression, audio compression, and the like. Compression reduces the quantity of data needed to represent digital video images, audio files and the like.
Video compression typically operates on groups of neighboring pixels referred to as macroblocks. The macroblocks are compared from one frame to the next, and the video compression codec encodes the differences within those blocks. The compressed video may then be transmitted and/or stored as a series of reference frames encoding the macroblocks of a particular frame and one or more non-reference frames encoding the macroblock differences between the reference frame and another reference or non-reference frame. The difference between a reference frame and a non-reference frame is whether any following frame will use it as a reference.
The frames of audio and video data are sequential, and therefore encoding and decoding the compressed data can be done sequentially. The encoding and decoding, however, are typically computationally intensive, causing processing latency and requiring high communication bandwidth and/or large amounts of memory. Accordingly, there is a continued need for improved techniques for encoding and decoding video data, audio data and the like.
Embodiments of the present technology are directed toward scalable dynamic data encoding and decoding. In one embodiment, an encoding or decoding method includes receiving a frame based data stream. The type of each given frame is determined. If the given frame of data is a reference frame, the frame is encoded or decoded by a main frame processing unit. If the given frame of data is not a reference frame, a determination is made as to whether an auxiliary frame processing unit is available for processing the given frame of data. If the given frame of data is not a reference frame and a given auxiliary frame processing unit is available, the frame is encoded or decoded by the given auxiliary frame processing unit. If the given frame of data is not a reference frame and no auxiliary frame processing unit is available, the frame is encoded or decoded by the main frame processing unit.
Embodiments of the present technology are illustrated by way of example and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements and in which:
Reference will now be made in detail to the embodiments of the present technology, examples of which are illustrated in the accompanying drawings. While the present technology will be described in conjunction with these embodiments, it will be understood that they are not intended to limit the invention to these embodiments. On the contrary, the invention is intended to cover alternatives, modifications and equivalents, which may be included within the scope of the invention as defined by the appended claims. Furthermore, in the following detailed description of the present technology, numerous specific details are set forth in order to provide a thorough understanding of the present technology. However, it is understood that the present technology may be practiced without these specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the present technology.
Most conventional parallel computing efforts have been directed at or below the macroblock processing level. However, in modern computer architectures, a single computing device increasingly includes computing resources that are available to perform tasks beyond those for which they were originally designed. This makes system level or frame/slice level parallelism possible. For example, a typical computing device may include a central processing unit (CPU) or multi-core CPU, a graphics processing unit (GPU), and/or dedicated video decoding hardware. The GPU, which was originally designed to render graphics, may also be used to perform video decoding. Based on real-time usage, one or more of the plurality of computing resources can be dynamically utilized to perform a computation-intensive task together in parallel.
Referring now to
A frame processing unit (FPU) as used herein is any computing resource which can perform frame based data encoding and/or decoding. An FPU can be a dedicated encoder and/or decoder (e.g., codec) 120, a CPU 105 or CPU core 170 plus the necessary software running on it, or a GPU or GPU core plus the necessary software running on it. Due to the sequential nature of video frame decoding, any sequential-execution hardware processing unit is counted as one FPU. In one implementation, the FPU may be a video frame processing unit (VFPU), an audio frame processing unit, an audio/video frame processing unit and/or the like.
In a typical computing device, one FPU is referred to herein as the main FPU 170, and the other units are referred to as auxiliary FPUs 110, 115, 120, 175. The main FPU 170 provides resource management, bit stream dispatching, reference frame encoding or decoding, and other logic controls. It can be a combination of a dedicated hardwired encoder and/or decoder and a small portion of software running on a CPU core 170. In one implementation, when there is a dedicated FPU (e.g., video decoder 120), it is usually used as the main FPU in connection with the CPU or one of the CPU cores 170, because the dedicated decoder is likely faster than general purpose processors such as a CPU, CPU core, GPU, GPU core or the like. The task of the main FPU is to decode reference frames and manage system resources dynamically (e.g., dispatching non-reference frames to auxiliary FPUs). An allocated auxiliary FPU 110, 115, 120, 175 receives a bit stream and encodes or decodes it.
The techniques for scalable dynamic encoding and decoding described herein do not use fixed system resources. Instead, the technique allocates FPUs based on real-time usage. Referring now to
If an auxiliary FPU is available, the given non-reference frame is allocated to the given available auxiliary FPU, at 230. At 235, the given available auxiliary FPU decodes the given non-reference frame. After the given non-reference frame is dispatched to the auxiliary FPU, the process continues with the main FPU determining if the given frame is the last frame, at 240. Although the reference frames need to be decoded sequentially, the non-reference frames can be decoded in parallel because no other frames depend on them.
If an auxiliary FPU is not available, the given non-reference frame is decoded by the main FPU, at 245. The non-reference frame may be decoded partially as described below with respect to
If one or more macroblocks of the given non-reference frame needs deblocking, the main FPU determines if an auxiliary FPU is available, at 255. If an auxiliary FPU is available, one or more macroblocks of the given non-reference frame are allocated to the given available FPU for deblocking, at 260. At 265, the given available FPU deblocks the one or more macroblocks of the given non-reference frame. After the macroblocks of the given non-reference frame are dispatched to the available FPU, the process continues with the main FPU determining if the given frame is the last frame, at 240.
If an auxiliary FPU is not available, the one or more macroblocks of the given non-reference frame are deblocked by the main FPU, at 270. After the macroblocks of the given non-reference frame are deblocked, the process continues with the main FPU determining if the given frame is the last frame, at 240.
If the main FPU determines that the given frame is the last frame, decoding of the requested data stream is complete, at 275. If the given frame is not the last frame, the process continues with getting a next frame of the bit stream, at 210.
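The dispatch logic of the above described method can be summarized in a minimal sketch. The function name, the frame-type strings, and the fixed pool of auxiliary unit identifiers are illustrative assumptions, not the claimed implementation; for simplicity, the return of an auxiliary FPU to the idle pool upon completion is not modeled here.

```python
from collections import deque

def dispatch_frames(frame_types, num_aux_fpus):
    """Illustrative sketch of the dispatch logic: reference frames (I, P) are
    decoded by the main FPU in order; non-reference frames (B) go to an
    auxiliary FPU when one is available, else fall back to the main FPU."""
    free_aux = deque(range(num_aux_fpus))   # hypothetical pool of idle auxiliary FPUs
    assignments = []
    for ftype in frame_types:
        if ftype in ('I', 'P'):             # reference frame: always the main FPU
            assignments.append((ftype, 'main'))
        elif free_aux:                      # non-reference frame, a unit is free
            assignments.append((ftype, 'aux%d' % free_aux.popleft()))
        else:                               # no auxiliary unit free: main FPU
            assignments.append((ftype, 'main'))
    return assignments
```

For example, `dispatch_frames(['I', 'B', 'B', 'P'], 1)` assigns the first B frame to the single auxiliary unit and, with no second unit free, the second B frame to the main FPU.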
It is appreciated that, in accordance with the above described method, all reference frames are decoded by the main FPU, due to the sequential nature of video decoding and the like. For H.264 video streams, an on-spot deblocking method, as discussed below with respect to
If the computing system has only one FPU, the FPU will do conventional sequential decoding without sacrificing performance. If the computing system includes an auxiliary FPU that is available when a video decoding process needs it, parallel processing takes place and performance is improved. In typical cases, non-reference frames are the majority of a video stream. For example, one typical frame sequence in display order may be I, B, B, P, B, B, P, B, B, P, B, B, P, B, B, P . . . , where I's and P's are reference frames and B's are non-reference frames. By sending the B's to one or more auxiliary FPUs and decoding them in parallel, performance is significantly improved.
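As a quick check of the claim that non-reference frames are the majority, the share of offloadable B frames in the example sequence can be computed. A minimal sketch; the function name is an illustrative assumption:

```python
def parallelizable_fraction(sequence):
    """Fraction of frames in a display-order sequence that are non-reference
    (B) frames and can therefore be offloaded to auxiliary FPUs."""
    b_frames = sum(1 for f in sequence if f == 'B')
    return b_frames / len(sequence)

seq = list("IBBPBBPBBPBBPBBP")  # the example sequence from the text
```

For this sequence, 10 of the 16 frames are B frames, so 62.5% of the frames are candidates for parallel decoding.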
Referring now to
If the current macroblock was received in-order and the immediately preceding macroblock was completely decoded, then the current macroblock is deblocked at 330. In one implementation, the current macroblock is deblocked by calling a routine (Deblock_Available_MBs) for deblocking consecutive available macroblocks as described below with respect to
If the current macroblock was received out-of-order or the immediately preceding macroblock was not completely decoded, then the current decoded macroblock data is stored, at 345. At 350, the current macroblock is flagged as being decoded but not deblocked. In one implementation, a bit value corresponding to the current macroblock is set in the macroblock array. After the current macroblock is flagged as being decoded but not deblocked, the process returns to 315.
The on-spot deblocking method tries to deblock macroblocks as soon as possible. This is very useful for ASO/FMO frames of H.264 and the like. When a macroblock is decoded and can be deblocked, it is deblocked, and the following consecutive macroblocks, which are decoded but not deblocked, are deblocked as well. This makes it possible for the next decoded macroblock to be deblocked. Therefore, the data coming out of the motion compensation module does not have to be stored and loaded back for deblocking. At the same time, the technique does not sacrifice performance for in-order slice/frame decoding.
The techniques described herein achieve on-spot deblocking for a good portion of the macroblocks. Because the macroblock data are already in the working buffer, they do not need to be saved and reloaded. Therefore, traffic on the data bus is reduced and the bandwidth requirement is eased. At the same time, the memory used to store intermediate data is also reduced. As an immediate result, performance is improved.
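The on-spot decision for a freshly decoded macroblock can be sketched with a simple per-macroblock flag array. The function name, the list-based data layout, and the sentinel value of last_DB_MB are illustrative assumptions rather than the actual implementation:

```python
def on_spot_deblock(mb_id, decoded, deblocked, last_db_mb):
    """On-spot decision for one freshly decoded macroblock: if it immediately
    follows the last deblocked macroblock, deblock it and any consecutive
    macroblocks already flagged as decoded; otherwise only flag it as
    decoded-but-not-deblocked. Returns the updated last_DB_MB index."""
    decoded[mb_id] = True                  # set the bit for this macroblock
    if mb_id == last_db_mb + 1:            # in order, predecessor already done
        n = mb_id
        while n < len(decoded) and decoded[n]:
            deblocked[n] = True            # stand-in for the actual deblock filter
            n += 1
        return n - 1                       # new last deblocked macroblock
    return last_db_mb                      # out of order: deblocked in a later pass
```

For example, if macroblock 2 arrives before macroblock 1, it is only flagged; once macroblock 1 arrives, macroblocks 1 and 2 are deblocked in a single consecutive pass.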
Referring now to
If the current value of the identifier of the current macroblock is not greater than the macroblock number of the last macroblock in the frame, then the main FPU determines if the current macroblock can be deblocked, at 425. If it is determined that the macroblock can be deblocked, then it is deblocked at process 410. In one implementation, the bit value in the array corresponding to the current macroblock is read. If the bit value is set (e.g., equal to one), then the current macroblock can be deblocked. If the current macroblock cannot be deblocked, then the value of last_DB_MB is set to the current value of N decremented by one (N−1) at 430, and the value of last_DB_MB is returned and utilized as described above in conjunction with
Thus, according to the above described methods, macroblocks can be deblocked as soon as they are eligible to be deblocked. On-spot deblocking can be achieved for some macroblocks that are in an out-of-order (e.g., arbitrary slice ordering (ASO), flexible macroblock ordering (FMO)) frame. Accordingly, the amount of bus traffic can be reduced because it is not necessary to transfer all macroblocks in such a frame to and from memory, and the amount of memory consumed is also reduced. Furthermore, computing time can be reduced because decoding and deblocking can be accomplished in parallel—while one macroblock is being decoded, another macroblock can be deblocked.
Referring now to
If the current macroblock was received in-order, it is determined whether the frame is an ASO or FMO frame, at 530. In one implementation, the flag indicating whether the frame is an ASO or FMO frame is checked to see if it is set. If the current macroblock was received in-order and the frame is not an ASO or FMO frame, then the current macroblock is deblocked, at 535. After the current macroblock is deblocked, it is determined if the current macroblock is the last macroblock, at 540. If the current macroblock is not the last macroblock in the frame, then the process continues at 515. If the current macroblock is the last macroblock in the frame, then the process returns an indication that there is ‘no need for deblocking,’ at 545. In one implementation, the routine returns to process 250 as described above with regard to
If the current macroblock was received out-of-order, then the flag indicating that the frame is an ASO or FMO frame may be set, at 550. At 555, the current decoded macroblock data is stored prior to deblocking, along with deblocking related information, if the current macroblock was received out-of-order or the frame is an ASO or FMO frame. At 560, it is determined if the current macroblock is the last decoded macroblock in the frame. If the current macroblock is not the last decoded macroblock in the frame, then the process continues at 515. If the current macroblock is the last decoded macroblock in the frame, then the process returns an indication that macroblocks ‘need deblocking’ and the identifier of the last deblocked macroblock, at 565. In one implementation, the routine returns to process 250 as described above with regard to
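The routine just described — set the ASO/FMO flag on the first out-of-order macroblock, deblock in-order macroblocks on the spot, and store the remainder for a later pass — can be summarized in a short sketch. The function name, the return values, and the use of macroblock indices in decode order are illustrative assumptions:

```python
def decode_frame_detect_aso(mb_order):
    """Sketch of on-the-fly ASO/FMO detection: the first out-of-order
    macroblock sets the ASO/FMO flag; before that, in-order macroblocks are
    deblocked on the spot; after that, macroblock data is stored for a
    separate deblocking pass."""
    aso_fmo = False
    expected = 0        # identifier of the next macroblock in raster order
    pending = []        # decoded-but-not-deblocked macroblocks
    for mb in mb_order:
        if mb != expected:
            aso_fmo = True        # out-of-order arrival: flag the frame
        if aso_fmo:
            pending.append(mb)    # store data and deblocking info for later
        else:
            expected += 1         # in order: deblocked on the spot
    if aso_fmo:
        return 'need deblocking', pending
    return 'no need for deblocking', []
```

An in-order frame such as `[0, 1, 2, 3]` is fully deblocked on the spot, while `[0, 2, 1, 3]` is flagged after macroblock 2 arrives early, leaving macroblocks 2, 1 and 3 for the separate deblocking pass.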
Referring now to
The techniques described herein advantageously turn sequential processing of frame based data streams and the like into parallel computing by utilizing available computing resources. The native sequential processing is done in sequence. At the same time, tasks are split into sub-tasks which can be processed in parallel. If any parallel processing resource is available, it is utilized to process a sub-task.
Furthermore, dynamic computing resource management is introduced to make use of every possible resource. On modern computer systems, this speeds up encoding and decoding significantly. This design can be used on any computer system and is fully scalable. The scalable dynamic technique can be used for any video, audio, imaging or similar task (e.g., encoding and/or decoding).
The on-spot deblocking technique realizes on-the-fly ASO/FMO detection for the H.264 video decoding protocol, and also improves the decoding speed, eases bandwidth consumption and memory storage size requirements.
The foregoing descriptions of specific embodiments of the present technology have been presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the invention to the precise forms disclosed, and obviously many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the present technology and its practical application, to thereby enable others skilled in the art to best utilize the present technology and various embodiments with various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the Claims appended hereto and their equivalents.
Published as US 20100150244 A1, Jun. 2010, US.