This disclosure relates to transmission formats for video transmission and more particularly to systems and methods for controlling the transmission of independent but temporally related elementary video streams.
For reasons discussed in the above-referenced co-pending application titled SYSTEMS AND METHODS FOR HIGHLY EFFICIENT COMPRESSION OF VIDEO, situations exist where, for high compression efficiency, the video stream is divided into a Detail portion and a Carrier portion which, while locked together temporally, are actually independent streams compressed separately. For discussion purposes herein, the output of the encoding process is an elementary stream, that is, a series of bits representing the output of the encoder. In the prior art, there would be one elementary video stream for the video and one elementary audio stream for the audio. In the high compression encoder circuit discussed above, there are two elementary video streams, which presents challenges to transporting the streams using existing formats.
One challenge is to mesh the two elementary video streams into a single stream so as to be able to leverage existing video production systems and transports. This meshing must be accomplished in a manner that avoids the necessity of modifying existing supporting tools and equipment on a case-by-case basis. In some situations, such as MPEG-4, the file container allows a second video stream to be carried in the existing container format.
However, existing video production systems have a fairly structured framework as to how video and audio pipelines are laid out. In that structure there is only room for one video decoder stream. The problem with trying to put two video streams together stems primarily from the way video is packetized. When a frame (a series of pixels forming one picture) of raw video arrives there is certain information about that frame that must be stored with that portion of data. This information is, for example, a timestamp indicating when the associated frame will be presented and a resolution, i.e., the size of the stream of video associated with the frame.
Because of the asynchronous nature of the high compression encoding process, the Carrier and Detail videos have different timestamps and different resolutions, as well as many other differences in the information that must accompany each stream. For the most part, the existing transportation formats do not have provisions for concurrently handling dual informational channels.
By multiplexing a plurality of elementary video streams it is possible to combine the streams so that they appear as a single stream to existing transportation protocols. In one embodiment, the Carrier stream retains its timestamp and resolution information and metadata is added that allows for the reconstruction of the missing timestamps and/or resolution information for each frame of the Detail stream. In this manner, the transportation protocol is unaware that a second video stream has been hidden in the first stream and thus two video streams are transported concurrently using a protocol established for a single stream.
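By way of illustration, the multiplexing step described above may be sketched as follows. The dictionary keys, field names, and values here are illustrative stand-ins, not part of the disclosed format; the point is that only the Carrier keeps its container-visible timestamp and resolution, while the Detail frame carries only an offset from which its timing can be reconstructed.

```python
def mux_access_unit(carrier_frame, detail_frame):
    """Combine one Carrier frame and its temporally related Detail frame.

    carrier_frame/detail_frame are dicts with hypothetical keys:
    'pts' (seconds), 'width', 'height', 'nals' (list of byte strings).
    """
    # Only the Carrier keeps explicit timing/resolution; the Detail
    # frame's timestamp is reconstructed later from the stored offset.
    time_offset = detail_frame["pts"] - carrier_frame["pts"]
    return {
        "pts": carrier_frame["pts"],        # container-visible timestamp
        "width": carrier_frame["width"],
        "height": carrier_frame["height"],
        "carrier_nals": carrier_frame["nals"],
        "detail_nals": detail_frame["nals"],
        "detail_time_offset": time_offset,  # metadata for reconstruction
    }

unit = mux_access_unit(
    {"pts": 1.00, "width": 640, "height": 360, "nals": [b"c-nal"]},
    {"pts": 1.04, "width": 1920, "height": 1080, "nals": [b"d-nal"]},
)
```

To the container, `unit` looks like a single video access unit with one timestamp and one resolution; the second stream rides along as opaque payload plus one offset field.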
The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention. It should be appreciated by those skilled in the art that the conception and specific embodiment disclosed may be readily utilized as a basis for modifying or designing other structures for carrying out the same purposes of the present invention. It should also be realized by those skilled in the art that such equivalent constructions do not depart from the spirit and scope of the invention as set forth in the appended claims. The novel features which are believed to be characteristic of the invention, both as to its organization and method of operation, together with further objects and advantages will be better understood from the following description when considered in connection with the accompanying figures. It is to be expressly understood, however, that each of the figures is provided for the purpose of illustration and description only and is not intended as a definition of the limits of the present invention.
For a more complete understanding of the present invention, reference is now made to the following descriptions taken in conjunction with the accompanying drawing, in which:
Before beginning the Detailed Description, the following terms may be helpful to have in mind.
GOP—Group of pictures—a portion of the video stream starting with an I frame, to which a decoder can seek and immediately begin decoding.
H.264—Standard video compression format used for each of the two video streams which make up a dual temporally-related video stream.
MPEG4—Standard video file format used to hold dual compressed video streams.
NAL—Network Abstraction Layer—concept defined by H.264 to hold a discrete portion of an H.264 video stream, usually a single frame of video.
As shown in
Preceding the pixel specific information, at positions 5 through 11, are parameters that describe parts of the high compression process that are required to decode the video. This portion also contains a count of how many NAL units are carried in locations 12 through 19. Locations 3 and 4 contain security information; if encryption is used, the keys and other data would be at these locations. Locations 1 and 2 are header fields which identify this stream as a high compression stream. Note that the numbers below the locations show the number of bits in this embodiment, with VL meaning variable length and n representing whatever number of bits there happens to be in the data for a particular segment. As discussed above, the Configuration access unit contains the overall stream information, and the Video access unit contains actual pixel data NAL units. Note that for convenience the two access units (frames) are shown with different numbers for the locations. This is for convenience of discussion; in reality they are the same locations.
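The field layout just described can be summarized as a data structure. This is a sketch only: the field names and Python types are illustrative, and the grouping of locations into fields follows the description above rather than any normative definition.

```python
from dataclasses import dataclass
from typing import List, Optional

# Illustrative model of the Configuration access unit's locations.
@dataclass
class ConfigurationUnit:
    header: int              # locations 1-2: identify a high compression stream
    nonce: Optional[int]     # location 3: security info, present if encrypted
    bso: Optional[int]       # location 4: security info, present if encrypted
    params: List[bytes]      # locations 5-11: decode parameters, incl. NAL counts
    nal_units: List[bytes]   # locations 12-19: the NAL unit payloads

# Unencrypted stream: the security locations are simply absent.
cfg = ConfigurationUnit(header=0x05, nonce=None, bso=None,
                        params=[b"\x00"], nal_units=[])
```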
Similarly, every access unit stored in the container file must have a timestamp associated with it so that the decoder will know when to display that access unit. Position 26 contains the time offset which can be added or subtracted from the Carrier timestamp in the container to obtain the corresponding time for the “piggyback” Detail pixel frame. This then allows the Carrier and the Detail streams to be transported in a single protocol container while still maintaining different resolutions and timestamps.
Process 304 counts the number of NAL units that are taken up in each of those streams to make that frame of video. The count is called NC for the number of NAL units on the Carrier stream, and called ND for the number of NAL units on the Detail stream. Those counts are then stored in locations 10 and 11 of the Configuration field (
Next, the system deals with the timestamp issue. Process 306 calculates the presentation time PTC for the Carrier access unit and process 307 calculates the presentation time PTD for the Detail access unit. Process 308 then subtracts PTD from PTC, uses the frames-per-second value of the video stream to convert this absolute time difference into a relative frame count between the two frames, and stores that difference as CTTS into location 26 (
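The timestamp handling of processes 306 through 308 reduces to a small computation, sketched below. The exact encoding of location 26 is not specified here, so treating CTTS as a signed frame count is an assumption for illustration.

```python
def compute_ctts(pt_carrier, pt_detail, fps):
    """Sketch of processes 306-308: subtract the Detail presentation time
    (PTD) from the Carrier presentation time (PTC), then use the stream's
    frame rate to convert the absolute time difference into a relative
    frame count between the two frames."""
    delta_seconds = pt_carrier - pt_detail   # process 308: PTC - PTD
    return round(delta_seconds * fps)        # seconds -> frame count

# e.g. a Detail frame presented two frame periods after its Carrier
# frame, at 30 frames per second:
offset = compute_ctts(pt_carrier=1.0, pt_detail=1.0 + 2 / 30, fps=30)
```

At the decoder, the Detail frame's presentation time is then recovered by applying this offset to the Carrier timestamp found in the container.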
Process 309 determines the size of the Carrier (NC) and the size of the Detail (ND) and these values then go in locations 10 and 11. The values for locations 7 and 8 are known and/or calculated as detailed with respect to
Locations 1 and 21 each have two flags, an S flag and a P flag. The S flag signals whether or not the Access Unit is encrypted; in effect, it signals the presence or absence of locations 3, 4, or 23, which contain information used by the decryption process. The P flag signals the presence of the parameter blocks. These are generally always present, but there is the possibility that they might not be.
When process 310 determines that all of the parameters pertaining to the Video frame have been gathered, process 311 constructs the Video frame and writes out all the carrier NAL units and all the detail NAL units in the order they appear in the original H.264 streams. Each NAL unit is preceded by its length LEN (
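The write-out step of process 311 can be sketched as follows. The 4-byte big-endian LEN prefix is an assumption for illustration; the disclosure states that each NAL unit is preceded by its length but does not fix the field width here.

```python
import struct

def write_video_frame(carrier_nals, detail_nals):
    """Sketch of process 311: emit all Carrier NAL units, then all Detail
    NAL units, in the order they appear in the original H.264 streams,
    each preceded by its length (LEN)."""
    out = bytearray()
    for nal in list(carrier_nals) + list(detail_nals):
        out += struct.pack(">I", len(nal))  # LEN prefix (assumed 4 bytes)
        out += nal                          # the NAL unit itself
    return bytes(out)

frame = write_video_frame([b"\x65\x00\x01"], [b"\x41\x00"])
```

Because each unit is length-prefixed, a decoder can walk the frame and count off NC Carrier units followed by ND Detail units without any other framing.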
Process 320 determines if there is a Configuration frame already constructed for this program (or for the settings necessary for this particular frame). If so, then process 311 constructs the Video frame. If not, then process 321 obtains the necessary protocols as shown and discussed with respect to
If desired, the Access Units can be encrypted, for example, using 128-bit AES encryption. In such a situation, there would be a static shared key between the encoder and all the decoders. In such a situation, the decoder key is compiled into the net list of the decoder FPGA image.
In order to allow random access to the encrypted stream, the AES block cipher should be operated in counter mode, such that the 128-bit initialization vector for the AES algorithm is split into two sub-fields. In one embodiment, these sub-fields are locations 3 and 4 (23) of the configuration (video) frames. These fields are called NONCE and byte stream offset (BSO). The BSO is the offset of the start of the packet within the entire encrypted data. This then provides a unique key for every encrypted byte of data since every byte offset is different. The NONCE is created by using the current time the video is encrypted in combination with other factors to yield a one-time unique code.
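Assembly of the counter-mode initialization vector can be sketched as follows. The disclosure states only that the 128-bit IV is split into the NONCE and BSO sub-fields; the even 64/64-bit split and big-endian packing below are assumptions for illustration.

```python
import struct

def build_ctr_iv(nonce, bso):
    """Assemble the 128-bit AES-CTR initialization vector from the NONCE
    and byte stream offset (BSO) sub-fields (locations 3 and 4 of the
    Configuration frame, or 23 of the Video frame). Because every byte
    offset in the stream is different, each packet gets a unique IV."""
    return struct.pack(">QQ", nonce, bso)  # assumed 64-bit/64-bit split

iv = build_ctr_iv(nonce=0x1122334455667788, bso=4096)
```

Random access then follows directly: to decrypt from the middle of a stream, a decoder reads the NONCE and BSO from the packet itself and reconstructs the counter without processing any earlier ciphertext.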
The NONCE thus can be used for customer fencing since it can also serve as a customer ID. This NONCE value can then be examined in the field and compared against a table of the customer IDs to which this decoder belongs. In one embodiment, there would be a table of customer IDs stored on the decoders. These customer IDs would form a piece of the NONCE; the incoming NONCE would be split apart into its constituent parts, and the customer ID extracted and compared against this table. If the customer ID in the NONCE does not match a customer that this decoder belongs to, then the stream cannot be decrypted. This then allows for the encoding of content received under one agreement from one supplier and ensures that only customers of that supplier can decode the data.
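The fencing check can be sketched as follows. The position and width of the customer-ID sub-field within the NONCE are not specified in the disclosure, so placing it in the low 16 bits here is a hypothetical choice for illustration only.

```python
def decoder_may_decrypt(nonce, allowed_customer_ids, customer_bits=16):
    """Sketch of NONCE-based customer fencing: split the incoming NONCE
    apart, extract the customer-ID sub-field (assumed here to be the low
    `customer_bits` bits), and compare it against the table of customer
    IDs stored on this decoder."""
    customer_id = nonce & ((1 << customer_bits) - 1)
    return customer_id in allowed_customer_ids

# A decoder provisioned for two suppliers' customer IDs:
ok = decoder_may_decrypt(0xDEADBEEF00A1, {0x00A1, 0x00B2})
denied = decoder_may_decrypt(0xDEADBEEF00C3, {0x00A1, 0x00B2})
```

When the check fails, the decoder simply never forms the correct IV/key material for the stream, so the content remains opaque to decoders outside the supplier's customer base.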
The Video frame (
The Configuration frame (
Carrier reader 401 is a standard MPEG4 reader which extracts the carrier H.264 elementary stream from the first MPEG4 file container produced by encoder 41. Detail reader 402 is a standard MPEG4 reader which extracts the detail H.264 elementary stream from the second MPEG4 file container produced by encoder 41.
Muxer 403 performs the mux function to combine the carrier and detail elementary streams into an unencrypted elementary stream. Optionally, encryptor 50 (
Note the processors discussed herein can run software code and/or could be designed as firmware or hardware depending upon the situation. Also note that while the above discussion has focused on two video streams, the concepts would apply to more than two video streams and would also apply to other types of data streams having the same temporal relationships therebetween as discussed herein.
Process 502 performs nonce handling: if the packet is a Configuration frame, the nonce value used for the encryption is written out in location 3. Process 503 handles encryption and the BSO: the BSO value used for the encryption is written in location 4 (23), and the rest of the packet is then run through a standard AES-128 encryption function, using a secret 128-bit key and an initialization vector generated from the nonce and BSO, into a temporary buffer. The length of this buffer is added to the current BSO value for use in the next packet to be encrypted.
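The per-packet BSO bookkeeping described for processes 502 through 504 can be sketched as follows. The cipher itself is abstracted behind a stand-in callable; only the offset accounting, which is what makes each packet's IV unique, is shown.

```python
def encrypt_packet(payload, bso, encrypt_fn):
    """Sketch of processes 502-504: encrypt one packet body into a
    temporary buffer using the current BSO, then advance the BSO by the
    buffer's length for use in the next packet.  `encrypt_fn` stands in
    for the AES-128 counter-mode step (it receives the payload and the
    BSO from which the IV would be formed)."""
    ciphertext = encrypt_fn(payload, bso)
    return ciphertext, bso + len(ciphertext)

# Identity stand-in cipher, just to show the BSO accounting:
buf1, bso = encrypt_packet(b"\x00" * 100, 0, lambda p, b: p)
buf2, bso = encrypt_packet(b"\x00" * 40, bso, lambda p, b: p)
```

After the two calls, the running BSO equals the total number of encrypted bytes emitted so far, which is exactly the offset the next packet must record in location 4 (23).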
Process 504 writes out encrypted data to a temporary buffer which is then written out to the output stream.
Process 602 identifies the type of the output frame. For Configuration frames, the value 0x05 is written to location 2, and for Video frames the value 0x04 is written to location 22. (As discussed above, these are actually the same locations but numbered for clarity of discussion herein.) Process 603 gathers the parameters, first by creating an empty parameters-list structure in memory. The total number of H.264 NAL units in both the Carrier and Detail streams is written into this structure. If this is a Video frame, and if the Video frame from the Carrier stream is an I-frame, then the carrier width (CW) and height (CH) are written into this structure at locations 24 and 25. This allows a different carrier scaling size to be used for each GOP in the Carrier stream.
Process 604 examines the values which have been generated by process 603 and determines if any require writing out to the Video stream. If Carrier width/height are present, or if the carrier/detail NAL counts are not equal to 1, then they must be written out.
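The decision made by process 604 reduces to a simple predicate, sketched below. The default NAL count of 1 follows the text above; the function name and argument shapes are illustrative only.

```python
def must_write_optional_fields(carrier_wh, nc, nd):
    """Sketch of process 604: optional values are written out to the
    Video stream only when they carry information - i.e. when Carrier
    width/height are present, or when either the Carrier NAL count (nc)
    or the Detail NAL count (nd) differs from 1."""
    return carrier_wh is not None or nc != 1 or nd != 1

# Defaults suffice: nothing optional needs to be written.
skip = must_write_optional_fields(None, 1, 1)
# An I-frame carrying new carrier dimensions forces a write.
write = must_write_optional_fields((640, 360), 1, 1)
```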
The process calculates a flag set for location 6, encoded values for locations 7 through 11, and a total length (T. LEN) for location 5. These values are gathered together into a buffer and written out to the output packet. For Video frames, the same process fills the corresponding locations.
The scaler for location 9, if used, is either fixed for all the frames or variable as desired. One or more of the flags can be used to tell the decoder to ignore certain streams, or to not bother with the offset because there is only one video stream.
Although the present invention and its advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims. Moreover, the scope of the present application is not intended to be limited to the particular embodiments of the process, machine, manufacture, composition of matter, means, methods and steps described in the specification. As one of ordinary skill in the art will readily appreciate from the disclosure of the present invention, processes, machines, manufacture, compositions of matter, means, methods, or steps, presently existing or later to be developed that perform substantially the same function or achieve substantially the same result as the corresponding embodiments described herein may be utilized according to the present invention. Accordingly, the appended claims are intended to include within their scope such processes, machines, manufacture, compositions of matter, means, methods, or steps.
This application is related to commonly owned patent application SYSTEMS AND METHODS FOR HIGHLY EFFICIENT VIDEO COMPRESSION USING SELECTIVE RETENTION OF RELEVANT VISUAL DETAIL, U.S. patent application Ser. No. 12/176,374, filed on Jul. 19, 2008, Attorney Docket No. 54729/P012US/10808779; SYSTEMS AND METHODS FOR DEBLOCKING SEQUENTIAL IMAGES BY DETERMINING PIXEL INTENSITIES BASED ON LOCAL STATISTICAL MEASURES, U.S. patent application Ser. No. 12/333,708, filed on Dec. 12, 2008, Attorney Docket No. 54729/P013US/10808780; VIDEO DECODER, U.S. patent application Ser. No. 12/638,703, filed on Dec. 15, 2009, Attorney Docket No. 54729/P015US/11000742; and concurrently filed, co-pending, commonly owned patent applications SYSTEMS AND METHODS FOR HIGHLY EFFICIENT COMPRESSION OF VIDEO, U.S. patent application Ser. No. ______, Attorney Docket No. 54729/P016US/11000746; A METHOD FOR DOWNSAMPLING IMAGES, U.S. patent application Ser. No. ______, Attorney Docket No. 54729/P017US/11000747; DECODER FOR MULTIPLE INDEPENDENT VIDEO STREAM DECODING, U.S. patent application Ser. No. ______, Attorney Docket No. 54729/P018US/11000748; SYSTEMS AND METHODS FOR ADAPTING VIDEO DATA TRANSMISSIONS TO COMMUNICATION NETWORK BANDWIDTH VARIATIONS, U.S. patent application Ser. No. ______, Attorney Docket No. 54729/P020US/11000750; and SYSTEM AND METHOD FOR MASS DISTRIBUTION OF HIGH QUALITY VIDEO, U.S. patent application Ser. No. ______, Attorney Docket No. 54729/P021US/11000751. All of the above-referenced applications are hereby incorporated by reference herein.