This application claims the benefit of EP Application No. 04292882.0 filed on Dec. 6, 2004 and EP Application No. 05090251.9 filed on Sep. 1, 2005, the disclosures of which are herewith incorporated by reference in their entirety.
The invention relates to a method and to an apparatus for encoding or decoding two digital video signals arranged in a single-video signal path, e.g. an SDI or HDSDI format video signal path.
In the upper part of the professional video camera market a triax system is used for transferring various signals back and forth over a coaxial cable between the camera and a base unit. Transferring multiple signals in different directions over a single cable is feasible because frequency multiplexing is used, in which a separate frequency band is assigned to each type of signal.
In the lower part of the market a multi-core adapter solution is currently being used.
In earlier systems all signals were transferred as analogue signals over separate wires or cables. Because no frequency multiplexing/de-multiplexing is required, such a solution is much cheaper. However, a disadvantage is that the maximum distance between camera and base unit is restricted to about 100 meters, that the signals on the receiving side need to be equalized, and that every additional meter of cable has a negative influence on the signal quality, e.g. the S/N ratio.
In current systems the analogue camera CVBS video output signal (Composite Video Baseband Signal) is replaced by a standard serial SDI signal (Serial Digital Interface) achieving a maximum data rate of e.g. 270 Mbit/s, 143 Mbit/s, 360 Mbit/s or 540 Mbit/s for SDTV and 1.485 Gbit/s for HDTV over a coaxial cable. The SDI video signal has a word length of 10 bits and a multiplexed 4:2:2 format. Its clock rate is 27 MHz. It is standardized in ANSI/SMPTE 259M and ANSI/SMPTE 125M.
At the receiving base unit this SDI signal is re-clocked and/or converted to CVBS format or Y-Cr-Cb format. Thereby a degradation of the quality of the CVBS signal can be avoided. All the other signals in the multi-core cable remain in analog format.
A single SDI or HDSDI (High Definition Serial Digital Interface) connection is designed for carrying a single digital video signal. However, it is desirable to transmit a digital playback video signal as well as a digital teleprompter (TP) video signal from the base unit to a camera.
A problem to be solved by the invention is to provide transmission of two digital video signals, in particular a play-back video signal and a teleprompter video signal from a base unit to a professional camera, via a serial video signal connection designed for transmission of a single video signal.
A one-dimensional adaptive dynamic range compression (ADRC) is used to reduce the data word length of the two video signals to be transmitted via the SDI or HDSDI connection. To one of the two video signals (e.g. the teleprompter signal) a smaller data word length can be assigned than to the other one (e.g. the playback signal), whereby the MSB (most significant bit) of the SDI connection is not used for carrying bits of the two compressed video signals. As an alternative, two compressed video signals having equal word length can be used, whereby one video signal occupies the full range of 32 amplitude levels and the other video signal occupies a range of 31 amplitude levels.
Two compressed 8-bit multiplexed 4:2:2 signals are multiplexed into one 10-bit 4:2:2 stream. The ADRC compression is described e.g. in EP-A-0926898 and is a lossy compression which requires only low resources. The compression has a latency of less than 100 clock cycles and a constant bit rate. The two compressed video streams fit transparently into a standard 270 Mbit/s serial SDI video data stream. All other or auxiliary data signals like synchronization, data communication, private data, intercom and audio transport are also embedded in the SDI or HDSDI stream.
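As an illustration of this multiplexing principle, the following sketch (Python, with illustrative names) shows how one 5-bit compressed playback code word and one 4-bit compressed teleprompter code word could be packed into, and recovered from, a single 10-bit SDI word with the MSB left unused. The exact bit positions chosen are an assumption for illustration only.

```python
def pack_word(playback5: int, teleprompter4: int) -> int:
    """Pack a 5-bit playback code and a 4-bit teleprompter code into one
    10-bit word; bit 9 (the MSB) is left unused.  Bit layout is illustrative."""
    assert 0 <= playback5 < 32 and 0 <= teleprompter4 < 16
    return (teleprompter4 << 5) | playback5          # bit 9 remains '0'

def unpack_word(word10: int) -> tuple[int, int]:
    """Recover the 5-bit playback code and the 4-bit teleprompter code."""
    playback5 = word10 & 0x1F                        # lower five bits
    teleprompter4 = (word10 >> 5) & 0x0F             # next four bits
    return playback5, teleprompter4
```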
For compatibility with analogue recording equipment some analogue signals are also present on the adapter VTR plug that is the standard digital multi-core connector.
Only one SDI/HDSDI downstream and one SDI/HDSDI upstream form the link between camera and base unit. The upstream SDI signal contains two video signals, e.g. teleprompter video and playback video. These video signals are sent back to the camera. Playback video, also known as external video, can be used by the cameraman for orientation purposes. Teleprompter video is used by news readers for displaying text on a monitor or any other display.
Among the advantages of the one-dimensional ADRC compression are its low resource requirements, its low latency and its constant bit rate. A disadvantage is that there is some loss of amplitude resolution.
In principle, the inventive method is suited for encoding a first and a second digital video signal using compression, the samples of each of which have a pre-selected original word length, into a combined video signal the code words of which have a pre-selected main word length that is smaller than two times said original word length, said method including the steps:
In principle, the inventive method is suited for decoding a combined video signal including two compressed video signals into a first and a second digital video signal, the samples of each of which have a pre-selected original word length, whereby the code words of said combined video signal have a pre-selected main word length that is smaller than two times said original word length, said method including the steps:—parsing code words of said combined video signal, so as to regain from pre-selected lower bit positions—representing a first word length of each one of said code words—the bits of quantized difference values of said first video signal and from pre-selected upper bit positions—representing a second word length of corresponding ones of said code words—the bits of corresponding quantized difference values of said second video signal, said upper bit positions being arranged adjacent to said lower bit positions, wherein said first and second word lengths can be different, and to regain data words for a minimum amplitude value and a dynamic range value, or for a minimum amplitude value and a maximum amplitude value, of a current data block of said first and of said second digital video signal, whereby the bits of the data words for said minimum amplitude value and said dynamic range value or said maximum amplitude value, respectively, of said current data block each form one bit per code word of said quantized difference values of said first and second video signals, and whereby said dynamic range value represents the difference between said maximum amplitude value and said minimum amplitude value in said current data block;
In principle the inventive apparatus is suited for encoding a first and a second digital video signal using compression, the samples of each of which have a pre-selected original word length, into a combined video signal the code words of which have a pre-selected main word length that is smaller than two times said original word length, said apparatus including:
In principle the inventive apparatus is suited for decoding a combined video signal including two compressed video signals into a first and a second digital video signal, the samples of each of which have a pre-selected original word length, whereby the code words of said combined video signal have a pre-selected main word length that is smaller than two times said original word length, said apparatus including:
Exemplary embodiments of the invention are described with reference to the accompanying drawings, which show in:
In
In
A corresponding video line is depicted in
The line arrangement is depicted in more detail in
How the lines are arranged as a PAL or SECAM picture frame is shown in
For NTSC, field 1 and field 2 each contain in total 262.5 lines. The active portion of field 1 starts with full line 21 and ends with full line 262 or half line 263. The active portion of field 2 starts with half line 283 or full line 284 and ends with full line 525. E.g. 19 lines before the start of the active fields 1 and 2 may contain optional video data.
One video line includes 720 Y, 360 Cb and 360 Cr samples. These components are compressed separately. Returning to
The encoder contains a second part (not depicted) which basically corresponds to the first part described above. The first part processes e.g. the playback video signal IVS1 whereas the second part processes the teleprompter video signal IVS2. The second part generates corresponding output signals MIN2, DR2 and COD2. As an alternative, the input signals IVS1 and IVS2 are both processed in a single part in a multiplexed fashion.
The playback video signal sample amplitude differences output from SB are quantized to 4 bits in a 10-bit system, and those of the TP signal to 4 bits (or 3 bits) in a 10-bit system. Because the minimum value MIN and the dynamic range value DR, or the maximum value MAX, for each group or block are required by the decoder, these values are also transmitted. A different bit of the two current 8-bit data words for MIN and DR, or for MIN and MAX, is assigned to each of the compressed data words of the current group or block, i.e. the bits of these two values form a fifth bit of the playback video signal data words and a fifth (or fourth) bit of the TP signal data words. Preferably, these additional bits are arranged at the beginning or at the end of the compressed playback signal data words and the compressed TP signal data words.
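The bit-wise distribution of this side information can be sketched as follows; the chosen bit order (MIN bits first, MSB first, then DR bits) and the position of the additional bit at the top of each code word are assumptions for illustration only.

```python
def attach_side_info(quantized4, min8, dr8):
    """Distribute the 8 bits of MIN and the 8 bits of DR (or MAX) over the
    16 four-bit code words of a group, one bit per code word, giving 16
    five-bit code words.  Bit order and position are illustrative assumptions."""
    assert len(quantized4) == 16
    side_bits = [(min8 >> (7 - i)) & 1 for i in range(8)] + \
                [(dr8 >> (7 - i)) & 1 for i in range(8)]
    return [(bit << 4) | q for q, bit in zip(quantized4, side_bits)]

def detach_side_info(codewords5):
    """Inverse operation: recover the 4-bit quantized values and the two bytes."""
    quantized4 = [w & 0x0F for w in codewords5]
    bits = [(w >> 4) & 1 for w in codewords5]
    min8 = sum(b << (7 - i) for i, b in enumerate(bits[:8]))
    dr8 = sum(b << (7 - i) for i, b in enumerate(bits[8:]))
    return quantized4, min8, dr8
```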
The signals MIN1, DR1, COD1, MIN2, DR2 and COD2, as well as any required or desired auxiliary input signals AUXIS are fed to an assembling and multiplexing stage ASSMUX which outputs a corresponding SDI data stream SDIU.
In a corresponding decoder as shown in
As shown in a compression schematic overview in
As shown in a decompression schematic overview in
Following compression and merging of the two streams in ASSMUX or FMT, care must be taken that the resulting data words do not corrupt synchronization. In other words, e.g. the values ‘0’, ‘1’, ‘1022’ and ‘1023’ must not occur.
In case a 5-bit and a 4-bit stream are merged, one bit, e.g. the MSB, could be reserved for corruption prevention. If the constructed code words tend to get a value in the forbidden zone ‘0’ and ‘1’, ‘512’ is added, e.g. by setting the MSB to ‘1’.
However, thereby one half of the total range of 1024 values is consumed by corruption prevention. A more effective way of preventing timing corruption is to construct two 5-bit streams of which one occupies a full range of 32 values and the other one occupies only 31 values. The advantage is that only 32 values out of 1024 values are not used for video coding.
This is depicted in
If in the original compression processing values between ‘0’ and ‘15’ occur, ‘32’ must be subtracted. Thereby the forbidden range ‘0’ to ‘15’ is shifted to the range 992 to 1007. Note that subtracting ‘32’ is equivalent to adding 992 (=1024−32), since the arithmetic wraps around in 10 bits with no carry.
Correspondingly, in the decompression processing it is checked whether values in the range 992 to 1007 occur. If so, ‘32’ is added.
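A sketch of this corruption prevention step is given below; the range limits used (forbidden zone ‘0’ to ‘15’, re-mapped zone 992 to 1007) are those given in the description above.

```python
MOD = 1024  # 10-bit words, arithmetic wraps around with no carry

def avoid_forbidden(word10: int) -> int:
    """Encoder side: shift words out of the forbidden low range 0..15 by
    subtracting 32 (equivalently, adding 992 modulo 1024)."""
    if 0 <= word10 <= 15:
        return (word10 - 32) % MOD    # lands in 992..1007
    return word10

def restore_forbidden(word10: int) -> int:
    """Decoder side: map 992..1007 back to the original values 0..15.
    These words cannot occur naturally because one of the two 5-bit
    streams occupies only 31 of its 32 levels (see above)."""
    if 992 <= word10 <= 1007:
        return (word10 + 32) % MOD
    return word10
```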
For each line the active video portion of the stream is now multiplexed into three separate streams:
Multiplexed Stream 1
Cb0, Cb1, Cb2, Cb3, . . . , Cb357, Cb358, Cb359
Multiplexed Stream 2
Cr0, Cr1, Cr2, Cr3, . . . , Cr357, Cr358, Cr359
Multiplexed Stream 3
Y0, Y1, Y2, Y3, . . . , Y717, Y718, Y719
Per line, every multiplexed stream is partitioned into sample groups. Y is partitioned into 45 groups of 16 samples each (45*16=720). Cb and Cr are both divided into 20 groups of 18 samples each (20*18=360). These components add up to 720+2*360=1440 samples per line. The groups or blocks generated are:
(Cb0 . . . Cb17), (Cb18 . . . Cb35), (Cb36 . . . Cb53), etc.; (Cr0 . . . Cr17), (Cr18 . . . Cr35), (Cr36 . . . Cr53), etc.; (Y0 . . . Y15), (Y16 . . . Y31), (Y32 . . . Y47), (Y48 . . . Y63), etc.
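The partitioning of one line into these groups can be sketched as follows (illustrative Python; the sample values are dummy data):

```python
def partition(samples, group_size):
    """Split a per-line component stream into consecutive fixed-size groups."""
    assert len(samples) % group_size == 0
    return [samples[i:i + group_size] for i in range(0, len(samples), group_size)]

# One 4:2:2 line carries 720 Y, 360 Cb and 360 Cr samples (dummy data here).
y_line, cb_line, cr_line = [0] * 720, [0] * 360, [0] * 360
y_groups  = partition(y_line, 16)    # 45 groups of 16 samples, 45*16 = 720
cb_groups = partition(cb_line, 18)   # 20 groups of 18 samples, 20*18 = 360
cr_groups = partition(cr_line, 18)   # 20 groups of 18 samples
assert len(y_groups) == 45 and len(cb_groups) == len(cr_groups) == 20
```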
All samples from any group are always treated as positive numbers only. From every group the highest and lowest values Ghighest and Glowest are determined, both being 8-bit values.
The highest minus the lowest value is the group range Grange=Ghighest−Glowest.
All the samples of the group are scaled to this group range and are quantized to the available levels. The available quantization levels are ‘15’ for external-video and ‘14’ for teleprompter-video:
Qsample(i)=(Gsample(i)−Glowest)*(Qlevels−1)/Grange.
Y and C (i.e. Cb and Cr) are quantized using the same resolution for that channel. Each 5-bit channel is built from one bit for constructing the highest/lowest group values (or the lowest group value and the dynamic range value) and four bits for quantized values. Two bits in every C-group block can be left unused, or can be used as an additional data channel. Because the groups have different lengths, the colour information for a Y group is taken from either one C group or from two adjacent C groups.
Advantageously, the reconstructed stream is arranged as a components multiplex in the same way as defined in the above-mentioned SMPTE standard. The highest and lowest group values are sent bit-wise together with the quantized samples of that group. The arrangement of the highest and lowest group values in the reconstructed stream is as depicted in
The encoding formula for external-video is:
Qsample(i)=Truncate[((Gsample(i)−Glowest)*15)/Grange+0.5]
The encoding formula for teleprompter-video is:
Qsample(i)=Truncate[((Gsample(i)−Glowest)*14)/Grange+0.5]
wherein Grange, Glowest and Gsample(i) have values lying between ‘0’ and ‘255’. Gsample(i) is the sample taken from the original stream.
The decoding formula for external-video is:
Sample(i)=Truncate[(Qsample(i)*Grange)/15+Glowest+0.5]
The decoding formula for teleprompter-video is:
Sample(i)=Truncate[(Qsample(i)*Grange)/14+Glowest+0.5]
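A per-group sketch of the above encoding formulas, together with a matching decoding step, is given below; the decoding is written as the direct inverse of the encoding formula, and the handling of a flat group (Grange=0) is an assumption, since that case is not covered by the formulas above.

```python
def adrc_encode(group, q_max):
    """Quantize one sample group; q_max is 15 for external-video and 14 for
    teleprompter-video.  Returns (Glowest, Grange, quantized values)."""
    g_low, g_high = min(group), max(group)
    g_range = g_high - g_low
    if g_range == 0:                  # flat group: assumption, map all to level 0
        return g_low, 0, [0] * len(group)
    # int() truncates, which matches Truncate[...] for non-negative arguments.
    q = [int((s - g_low) * q_max / g_range + 0.5) for s in group]
    return g_low, g_range, q

def adrc_decode(g_low, g_range, quantized, q_max):
    """Reconstruct the group from Glowest, Grange and the quantized levels."""
    if g_range == 0:
        return [g_low] * len(quantized)
    return [int(q * g_range / q_max + g_low + 0.5) for q in quantized]

# Example: one 16-sample Y group of the external-video signal (q_max = 15).
group = [112, 118, 120, 121, 125, 119, 117, 122, 130, 128, 126, 124, 121, 119, 118, 116]
low, rng, q = adrc_encode(group, 15)
print(adrc_decode(low, rng, q, 15))   # values close to the original samples
```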
As an alternative embodiment shown in
Preferably, bit PB0 or bit PB4 of the playback signal data words and bit TP0 or bit TP3 of the TP signal data words represent the values MIN1/MIN2 and DR1/DR2 or MAX1/MAX2.
In
The dynamic performance of the inventive ADRC processing can be improved by shifting lowest group values. This is explained in connection with
ADRC makes use of the property that the amplitudes of a small group of consecutive pixel values usually do not differ much from each other. Put differently, in a small group of pixel values the dynamic amplitude range is small, i.e. the highest group sample value is close to the lowest group sample value.
However, if e.g. one sample in such group has a very low or high amplitude value in comparison with the other group samples, a visible column forming effect can occur.
The easiest way to deal with this situation would be to decrease the number of samples per group. But this would also increase the amount of data, i.e. the resulting data rate. An advantageous way to decrease column forming is to shift the lowest group values by half the group size.
For explaining this, the normal ADRC application is depicted first in
As shown in
Again, consider the samples of group 2, S16 to S31. The samples S16 to S23 are quantized using the highest group2 value and the lowest group1 value, whereas the samples S24 to S31 are quantized using the highest group2 value and the lowest group2 value.
At the decoder the same highest and lowest group values are used as at the encoder.
Advantageously, due to using shifted lowest group values the above-described column artefact effect can be reduced significantly.
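The shifted-lowest-value quantization can be sketched as follows; the treatment of the very first half-group (which has no preceding group) and the clamping of out-of-range quantization levels are assumptions that are not spelled out above.

```python
def shifted_adrc_encode(samples, group_size=16, q_max=15):
    """Quantize each group with its own highest value, while the lowest value
    applied to the first half of a group is taken from the preceding group
    (the shifted lowest group values described above)."""
    groups = [samples[i:i + group_size] for i in range(0, len(samples), group_size)]
    lowest = [min(g) for g in groups]
    highest = [max(g) for g in groups]
    half = group_size // 2
    quantized = []
    for k, group in enumerate(groups):
        for j, s in enumerate(group):
            # First half of group k uses the lowest value of group k-1
            # (the very first group falls back to its own lowest value).
            low = lowest[k - 1] if (j < half and k > 0) else lowest[k]
            rng = highest[k] - low
            q = int((s - low) * q_max / rng + 0.5) if rng > 0 else 0
            quantized.append(max(0, min(q_max, q)))   # clamping is an assumption
    return quantized, lowest, highest
```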
The numbers given in this description, e.g. the word lengths, can be adapted to different applications of the invention as required.
Number | Date | Country | Kind |
---|---|---|---|
04292882 | Dec 2004 | EP | regional |
05090251 | Sep 2005 | EP | regional |
Number | Name | Date | Kind |
---|---|---|---|
5157656 | Turudic et al. | Oct 1992 | A |
6009305 | Murata | Dec 1999 | A |
6345390 | Eto et al. | Feb 2002 | B1 |
7564484 | Rotte et al. | Jul 2009 | B2 |
20030142869 | Blaettermann et al. | Jul 2003 | A1 |
Number | Date | Country |
---|---|---|
32 30 270 | Feb 1984 | DE |
0 740 469 | Oct 1996 | EP |
0 848 517 | Jun 1998 | EP |
0 926 898 | Jun 1999 | EP |
Number | Date | Country | |
---|---|---|---|
20070116114 A1 | May 2007 | US |