The described embodiments relate to image processing and, more particularly, to encoding and decoding multiple image frames into a reduced number of multiplexed frames.
Any existing image data transmission or processing infrastructure typically has a limited data bandwidth. This bandwidth is normally sufficient to serve the functions that the infrastructure was designed for (e.g., providing video at a given resolution or quality). However, once enhanced or new functions are required (e.g., video of a higher image quality is desired to be transmitted), the infrastructure typically needs to be upgraded or replaced. This upgrade or replacement process may involve changing hardware, software and/or network connections between components (especially if higher data bandwidth is necessary with the new functions and the system algorithms and protocols remain unchanged).
For example, digital cinematic video is typically provided at a frame rate of 24 frames per second (fps), with each frame traditionally being provided at a resolution known as 2K (2048×1080, or approximately 2.2 megapixels) or 4K (4096×2160, or approximately 8.8 megapixels). To display cinematic video at this frame rate and these resolutions in a theatre, the theatre typically contains infrastructure that has sufficient bandwidth to transmit the cinematic video to the display device (e.g., a projector).
Recent developments in camera technology have allowed the capture of digital cinematic video at higher frame rates (commonly known as HFR video). For example, these videos may be provided at frame rates of 48 fps, 60 fps, or even 120 fps at a given spatial resolution. With HFR video, the amount of data to be transmitted to the display device is greatly increased.
As technology evolves, digital cinematic video may begin to be provided at image spatial resolutions higher than 2K and 4K. Even at existing frame rates, these higher-resolution video streams may also increase the amount of data to be transmitted to the display device.
Moreover, multiple-view video streams are gaining in popularity (e.g., stereoscopic video streams providing a three-dimensional (3D) viewing experience). These types of video streams require higher transmission bandwidth than normal single-view (e.g., two-dimensional (2D)) video streams.
Existing infrastructure at theatres may not provide sufficient bandwidth to allow transmission of these newer types of video streams to the display device. Upgrading the existing infrastructure, especially the hardware, software and/or system connection bandwidth, may be costly or undesirable. Accordingly, there is a need for improved methods and systems of encoding and decoding image frames in an image stream that allow image frames of these new types of video streams to be transmitted over existing bandwidth-limited infrastructure.
In one aspect, some embodiments of the invention provide a method of dynamic frame packing, the method comprising:
In another aspect, some embodiments of the invention provide a method of decoding a multiplexed frame, the method comprising:
In another aspect, some embodiments of the invention provide a method of delivering multiplexed frames, the method comprising:
In another aspect, some embodiments of the invention provide a method of displaying multiplexed frames, the method comprising:
In another aspect, some embodiments of the invention provide a system for transmitting a multiplexed frame, the system comprising:
In another aspect, some embodiments of the invention provide a system for decoding a multiplexed frame for display, the system comprising:
In another aspect, some embodiments of the invention provide a method of generating a nested multiplexed frame, the method comprising:
In another aspect, some embodiments of the invention provide a method of decoding a nested multiplexed frame, the method comprising:
A preferred embodiment of the present invention will now be described in detail with reference to the drawings, in which:
It will be appreciated that numerous specific details are set forth in order to provide a thorough understanding of the example embodiments described herein. However, it will be understood by those of ordinary skill in the art that the embodiments described herein may be practiced without these specific details. In other instances, well-known methods, procedures and components have not been described in detail so as not to obscure the embodiments described herein. Furthermore, this description and the drawings are not to be considered as limiting the scope of the embodiments described herein in any way, but rather as merely describing the implementation of the various embodiments described herein.
Particularly, the embodiments described herein relate to the field of image processing, and various drawings have been provided to illustrate the transformation of image data. It will be understood that the drawings are not to scale, and are provided for illustration purposes only.
The embodiments of the systems and methods described herein may be implemented in hardware or software, or a combination of both. However, preferably, these embodiments are implemented in computer programs executing on programmable computers each comprising at least one processor (e.g., a microprocessor), a data storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. For example and without limitation, the programmable computers (e.g., the various devices shown in
Each program is preferably implemented in a high-level procedural or object-oriented programming and/or scripting language to communicate with a computer system. However, the programs can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language. Each such computer program is preferably stored on a storage medium or device (e.g., ROM or a magnetic/optical diskette) readable by a general or special purpose programmable computer, for configuring and operating the computer when the storage medium or device is read by the computer to perform the procedures described herein. The subject system may also be considered to be implemented as a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner to perform the functions described herein.
Furthermore, the system, processes and methods of the described embodiments are capable of being distributed in a computer program product comprising a computer readable medium that bears computer usable instructions for one or more processors. The medium may be provided in various forms, including one or more diskettes, compact discs, tapes, chips, wireline transmissions, satellite transmissions, Internet transmissions or downloads, magnetic and electronic storage media, digital and analog signals, and the like. The computer usable instructions may also be in various forms, including compiled and non-compiled code.
Referring to
As will be understood, the various illustrated software modules may, in some embodiments, be implemented on one or more computer servers available at the pre-processing facility 102. Such server(s) may contain at least one processor and at least one memory storing various software modules containing instructions that, when executed by the at least one processor, cause the at least one processor to perform the acts of the illustrated software modules.
The pre-processing facility 102 may, optionally, also be provided with a compression module 108 (shown in dotted outline) that contains instructions that cause the processor to compress an image frame stream. The compression module may or may not be a part of the existing infrastructure. Further, the pre-processing facility 102 may be provided with various mechanisms to allow the transmission or storage of data. For example, such mechanisms may include network interface cards (e.g., for Ethernet, WiFi, etc.) or high-speed data ports (e.g., HD-SDI, HDMI, USB, Firewire, etc.) that allow a multiplexed image frame stream 120 to be transmitted over a communications network or stored on an external storage medium. These various mechanisms may be activated and/or controlled by, for example, a communication module (not shown) at the pre-processing facility 102.
It will be understood that in various embodiments, the compression module 108 may be provided on a different device from the pre-processing facility in
The encoding module 106 may be a stand-alone hardware device or a logical software component that contains the instructions to perform the method of encoding image frames described below. Viewed at a high level, the encoding module 106 may identify frames in the image frame stream received at the receiving module through a wider bandwidth channel 152, and then encode the original image frames into multiplexed frames. The multiplexed frames can then be transmitted through a narrower bandwidth channel 156 as a multiplexed image frame stream 120 that requires less bandwidth during transmission, so that the multiplexed image frame stream 120 can be provided to, and processed at, the display facility 140. The acts performed by the encoding module 106 will be described in greater detail below in relation to
If the compression module 108 is present on the pre-processing facility 102, the multiplexed image frame stream may be compressed prior to being transmitted and provided to the display facility 140. Such compression may be performed according to known video compression algorithms or specified by the existing system infrastructure. For example, if the compression module and the multiplexed image frame stream 120 are to be constructed according to the Digital Cinema Initiatives (DCI v1.0) specification, the multiplexed image frame stream would be compressed according to the ISO/IEC 15444-1 “JPEG2000” compression standard.
It will be understood that the acts performed by the encoding module 106 may be performed offline. That is, the encoding of an image frame stream to form a multiplexed image frame stream 120 need not be performed in real-time. Instead, the image frame stream may be processed by the encoding module 106 without any particular time constraint, and the resultant multiplexed image frame stream 120 may be transferred/transmitted to the display facility 140 after the entire multiplexed image frame stream 120 has been created.
Once created, the multiplexed image frame stream 120 may be provided to the display facility 140 (as illustrated, for example, by the dotted arrow from the compression module 108 to the decompression module 144 of the display facility 140). It will be understood that the multiplexed image frame stream 120 may be provided to the display facility 140 in any number of ways. For example, the multiplexed image frame stream 120 may be provided by way of a computer readable medium (such as a hard disk, optical disc or flash memory), which is then loaded onto a storage device (not shown) at the display facility 140. Additionally or alternatively, the frame stream 120 may be transmitted to the display facility 140 via computer network communications (e.g., the data may be transmitted over the Internet). Other methods of providing the frame stream 120 to the display facility 140 may also be possible.
The display facility 140 may include: a decoding module 146 that is able to decode the multiplexed frame stream 120 (e.g., so as to restore the frame stream for transmission through a wider bandwidth transmission channel 154); and a display device 148 (e.g., a projector) that displays the resultant video stream produced from the decoding module 146.
If the multiplexed image frame stream 120 has been compressed (e.g., by compression module 108) the display facility 140 may also optionally include a decompression module 144 (shown in dotted outline) that is configured to decompress the compressed multiplexed image frame stream 120 before the multiplexed image frame stream is provided to the decoding module 146. As discussed, the decompression may be performed according to the JPEG2000 standard if, for example, the multiplexed image frame stream 120 is provided according to the Digital Cinema Initiatives (DCI v. 1.0) specification. In variant embodiments, the decompression module 144 may be provided within the decoding module 146.
The decompression module 144 may contain a video or network interface component that allows the decompression module 144 to transmit the multiplexed image frame stream 120 to the decoding module 146. As will be understood, the decompression module 144 may contain a processor and a memory storing instructions that instruct the processor to interact with the video or network interface component.
In some embodiments, the decompression module 144 may be capable of processing the multiplexed image frame stream 120 in real-time without first storing the multiplexed image frame stream 120. Additionally or alternatively, the display facility 140 may include a storage device (not shown) that stores the multiplexed image frame stream 120, prior to the frame stream 120 being processed by the decompression module 144. Such a storage device may include a storage medium (such as a hard disk) to store the multiplexed image frame stream 120. In various embodiments, the decompression module 144 may be provided as an application executable on the storage device.
A bandwidth-limited transmission channel 150 (shown in
Due to the encoding performed by the encoding module 106 at the pre-processing facility 102, the multiplexed image frame stream 120 may contain video data at a lower frame rate than that of the original HFR video. That is, because more than one original image frame of the HFR video can be encoded and compressed into one multiplexed frame, the frame rate of the stream of multiplexed frames 120 is lower. Accordingly, the multiplexed image frame stream 120 may be able to carry HFR frame data across the bandwidth-limited communications channel 150. The multiplexed image frame stream 120 may then be restored into an HFR video stream by decoding the encoded frames in the multiplexed image frame stream 120, before the restored video stream is displayed by the display device 148.
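By way of a non-limiting illustration, the following sketch shows the frame-rate reduction that results from packing a fixed number of original frames into each multiplexed frame (the function name and the specific frame rates are illustrative only; the two-frames-per-multiplexed-frame packing follows the embodiments described below):

```python
# Sketch: effective frame rate of a multiplexed stream when a fixed number
# of original frames is packed into each multiplexed frame.
def multiplexed_rate(original_fps: float, frames_per_mux: int) -> float:
    """Return the frame rate of the resulting multiplexed stream."""
    return original_fps / frames_per_mux

# Example: 48 fps HFR content packed two frames per multiplexed frame yields
# a 24 fps multiplexed stream; 120 fps content packed the same way yields 60 fps.
assert multiplexed_rate(48, 2) == 24.0
assert multiplexed_rate(120, 2) == 60.0
```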
The decoding module 146 may be configured to perform the decoding of multiplexed frames. As will be understood, the connection 154 between the decoding module 146 and the display device 148 may vary. For example, the decoding module 146 may be provided as a separate computing device that has a high-bandwidth connection to the display device 148. Additionally or alternatively, the decoding module 146 may be provided as an add-on module of the display device 148. The various acts performed during the process of decoding a multiplexed frame will be described in greater detail below with respect to
Referring to
As noted, at least some of these acts may be performed by the encoding module 106 in the pre-processing facility 102 shown in
Step 202 involves identifying a plurality of image frames for a multiplexed frame that is to be created. This may involve selecting the plurality of image frames from an original image frame stream such as an HFR video stream or multiple-view video stream. For example, the HFR video stream may be cinematic or television video that resulted from HFR footage captured using HFR-capable cameras (such as cameras manufactured by RED®, for example). In the embodiments discussed herein, the plurality of image frames selected from the image frame stream may be a pair of image frames; however, it will be understood that any number of image frames may be selected from an original image stream to be encoded into a multiplexed frame.
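As a non-limiting sketch of this selection step (the grouping of two consecutive frames per multiplexed frame follows the example embodiments; the function name and the use of Python are illustrative assumptions):

```python
from typing import Iterable, List, Tuple

def select_frame_pairs(frames: List[str], frames_per_mux: int = 2) -> Iterable[Tuple[str, ...]]:
    """Group consecutive frames of the original stream, one group per multiplexed frame."""
    for i in range(0, len(frames) - frames_per_mux + 1, frames_per_mux):
        yield tuple(frames[i:i + frames_per_mux])

# Example: frames F1..F6 become the pairs (F1, F2), (F3, F4), (F5, F6).
pairs = list(select_frame_pairs(["F1", "F2", "F3", "F4", "F5", "F6"]))
assert pairs == [("F1", "F2"), ("F3", "F4"), ("F5", "F6")]
```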
The selected image frames may be consecutive image frames in an original image frame stream (this scenario is generally illustrated, for example, at
Referring simultaneously to
At step 204 in
At step 206, a resolution of each of the plurality of image frames is modified to generate a plurality of corresponding processed frames. This modification may include any processing that can modify the resolution of an image frame. For example, this processing may include modifying an aspect ratio of the image frame, vertically compressing the image frame, and/or horizontally compressing the image frame. The modification of the resolution of the image frames may be performed by image resampling algorithms, for example.
Step 206 may also include reducing an image frame to a portion of its original resolution. For example, consider the scenario in which a standard 2K frame with a resolution of 2048 by 1080 pixels (i.e., having an aspect ratio of 1.90:1) is used to convey an original movie with a 2.19:1 aspect ratio. If the full width of the original movie frames is to be displayed within the 2K frame, there may be 72 lines of black fill at both the top and bottom of the image (e.g., such that only the middle 936 rows of pixels in the 2K frame will be used to convey image information). In such a scenario, only the actual image portion may be encoded by cropping and/or resizing the 2K image frames to generate the processed image frame. The image aspect ratio or the number of vertical lines of actual original image content may then need to be stored and transmitted through the metadata of the multiplexed frame, as described in later sections.
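The letterbox arithmetic of this example can be sketched as follows (a non-limiting illustration; the rounding policy and the use of a simple centre crop are assumptions):

```python
import numpy as np

def crop_letterbox(frame: np.ndarray, content_aspect: float) -> np.ndarray:
    """Keep only the active image rows of a letterboxed frame.

    frame          -- array of shape (height, width) or (height, width, channels)
    content_aspect -- aspect ratio (width / height) of the original content
    """
    height, width = frame.shape[:2]
    active_rows = int(round(width / content_aspect))  # rows carrying real image data
    top = (height - active_rows) // 2                 # size of the top black bar
    return frame[top:top + active_rows]

# Example from the text: a 2048x1080 (1.90:1) frame carrying 2.19:1 content has
# roughly 72 black lines at the top and bottom, leaving about 936 active rows.
frame_2k = np.zeros((1080, 2048, 3), dtype=np.uint8)
cropped = crop_letterbox(frame_2k, 2.19)
print(cropped.shape[0])  # ~935-936, depending on the rounding convention used
```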
Referring again to
At step 208, the encoding module 106 may generate masking data that indicates how an output frame can be produced from one of the plurality of the processed frames and another of the plurality of the processed frames (circle number 4 in
In the example of
Reference is now to
Additionally or alternatively, the differences between each successive frame may be identified from an analysis of the original image frames 302 (as opposed to the processed frames 304, 304′). For example, identifying differences between the second original frame and first original frame may allow for a more accurate determination of the regions that are different between the successive frames because the original frames may contain image data that is of a higher resolution. The location of the differences can then be scaled so that the differences can be identified on the processed frames 304, 304′.
After these particular locations 402 have been identified, the masking data 316 can be created. In various embodiments, the masking data 316 may be considered to be a grid of data, with each position in the grid corresponding to a region on both of processed frames 304 and 304′. For example, in an example scenario in which processed frames 304 and 304′ have been divided into 12 regions horizontally and 8 regions vertically, a 12 by 8 grid of masking data 316 may be generated, with the position of a value in the masking data (e.g., row and column) mapping to a region at the same position (e.g., row and column) in processed frames 304, 304′. The value at each position of the masking data may indicate how the processed frame for F2 304′ differs from the processed frame for F1 304. In the example masking data 316, binary values ('0' or '1') are used, where '0' indicates that the image data does not change, and '1' indicates that the image data does change, between the processed frame for F1 304 and the processed frame for F2 304′. Accordingly, all the positions in the masking data contain a '0', except for positions 404 in masking data 316 that correspond to regions 402 of processed frame 304′ that show where the golf ball was in frame F1, and where the golf ball has moved to in F2. The two '1's in the masking data 316 thus show how the differences between the processed frame for F2 304′ and the processed frame for F1 304 can be identified.
It will be understood that the example masking data is shown as a 12 by 8 grid of data for illustration purposes only. In various alternate embodiments, the exact size and/or number of blocks in the masking data may be determined by another algorithm. For example, the size of masking data region may be set to be proportional to the first or second processed image frames.
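A non-limiting sketch of how such a binary masking grid could be derived is given below (the 12 by 8 grid and the per-region change test follow the description above; the per-region difference threshold and the function name are assumptions):

```python
import numpy as np

def make_masking_data(frame1: np.ndarray, frame2: np.ndarray,
                      rows: int = 8, cols: int = 12,
                      threshold: float = 2.0) -> np.ndarray:
    """Return a rows x cols grid: 1 where a region changed between the frames, 0 otherwise."""
    h, w = frame1.shape[:2]
    mask = np.zeros((rows, cols), dtype=np.uint8)
    for r in range(rows):
        for c in range(cols):
            y0, y1 = r * h // rows, (r + 1) * h // rows
            x0, x1 = c * w // cols, (c + 1) * w // cols
            diff = np.abs(frame1[y0:y1, x0:x1].astype(np.int16)
                          - frame2[y0:y1, x0:x1].astype(np.int16))
            mask[r, c] = 1 if diff.mean() > threshold else 0
    return mask

# Example: two otherwise identical frames differ only in one small region
# (the "golf ball"), so a single '1' appears in the corresponding grid cell.
f1 = np.zeros((96, 144), dtype=np.uint8)
f2 = f1.copy()
f2[2:10, 2:10] = 255
print(make_masking_data(f1, f2))
```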
As will be explained below with respect to the decoding of a multiplexed frame, the masking data value at a given position of the grid specifies how the image data at the same region on both processed frame 304 and 304′ can be combined to arrive at an output frame. For example, the masking data value may indicate the weighting to be given to one or the other of the processed frames 304 and 304′ for that particular region on the output frame. In the example masking data shown in
In further embodiments, the masking data 316 may contain not only binary values, but fractional values between 0 and 1 as well. If masking data 316 of the same size as the processed image frames 304 or 304′ is required for decoding, the masking data 316 can be interpolated to the same size as the processed image frames 304 or 304′. The exact value within the range may then indicate the proportions in which the first and second frames are to be blended together for a given region (e.g., a low value may indicate that more of the first frame is to be used, whereas a high value may indicate that more of the second frame is to be used).
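A non-limiting sketch of this fractional blending follows (nearest-neighbour expansion of the coarse grid is an assumption; other interpolation methods could equally be used):

```python
import numpy as np

def blend_with_mask(frame1: np.ndarray, frame2: np.ndarray,
                    mask_grid: np.ndarray) -> np.ndarray:
    """Blend two processed frames using a coarse grid of fractional weights.

    A low weight keeps more of frame1; a high weight keeps more of frame2.
    The coarse grid is first expanded to the full frame size (here by simple
    nearest-neighbour repetition).
    """
    h, w = frame1.shape[:2]
    gy, gx = mask_grid.shape
    weights = np.kron(mask_grid, np.ones((h // gy, w // gx)))[:h, :w]
    if frame1.ndim == 3:           # broadcast the weights over colour channels
        weights = weights[..., None]
    blended = (1.0 - weights) * frame1 + weights * frame2
    return blended.astype(frame1.dtype)

# Example: a 0.25 weight in one region blends 75% of frame1 with 25% of frame2.
f1 = np.full((96, 144), 100, dtype=np.uint8)
f2 = np.full((96, 144), 200, dtype=np.uint8)
grid = np.zeros((8, 12))
grid[0, 0] = 0.25
out = blend_with_mask(f1, f2, grid)
print(out[0, 0], out[50, 50])  # 125 in the blended region, 100 elsewhere
```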
Once the masking data 316 has been generated, the process may proceed to step 210 (circle number 5 in
Referring again to
During the partitioning, at least one of the plurality of frame components may be provided with a guard area 314 that overlaps with a portion of another of the plurality of frame components that is adjacent to it. Having a guard area that is overlapping (for example, between frame components 312) may allow for better reconstruction of the processed frame 304 during the decoding of the multiplexed frame. The size of guard area may be a constant pixel width or a predefined percentage of the original frame components. Within the guard area during the decoding, the two combining frame components are gradually blended together to avoid any possible visible seam appearing in the decoded image frames.
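A non-limiting sketch of such a partitioning with an overlapping guard area is given below (the two-component vertical split and the constant 16-pixel guard width are assumptions made for illustration):

```python
import numpy as np

def partition_with_guard(frame: np.ndarray, guard: int = 16):
    """Split a processed frame into two vertical halves that overlap by
    `guard` pixel columns on each side of the split, so that the seam can be
    blended away when the components are reassembled."""
    mid = frame.shape[1] // 2
    component_a = frame[:, :mid + guard]       # left half plus the guard area
    component_b = frame[:, mid - guard:]       # right half plus the guard area
    return component_a, component_b

# Example: a 1024-pixel-wide frame yields two 528-pixel-wide components
# (512 pixels of unique content plus a 16-pixel guard area each).
frame = np.zeros((540, 1024, 3), dtype=np.uint8)
a, b = partition_with_guard(frame)
print(a.shape[1], b.shape[1])  # 528 528
```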
It will be understood that the relative sizes of frame components 312, 312′ need not be the same, i.e., the sizes of the frame components resulting from the partitioning of the processed frame for F2 304′ may differ from the sizes of the frame components resulting from the partitioning of the processed frame for F1 304. Further, whereas the processed frame for F1 304 may be partitioned according to one algorithm, the processed frame for F2 304′ may be partitioned according to another algorithm, as their corresponding frame components 312, 312′ need not have the same image properties.
Optionally, one or more of the frame components 312, 312′ (and/or the processed frames 304, 304′, as the case may be) may be rotated prior to their insertion into the multiplexed frame. This may be performed at step 212 (shown in dotted outline), which involves at least one of the plurality of frame components 312 for the processed frame for F1 304 being rotated to generate a corresponding one or more rotated frame components 312 (circle number 6 in
At step 214, the plurality of processed frames 304, 304′ (e.g., their corresponding frame components 312, 312′) and the masking data may be combined to form the multiplexed frame 306 (circle number 7 in
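A non-limiting sketch of this combining step is given below (the component sizes, placements and the reserved masking-data strip shown here are illustrative only and do not reproduce the layout of the figures):

```python
import numpy as np

def pack_multiplexed_frame(canvas_shape, placements):
    """Pack frame components (and a masking-data strip) into one multiplexed frame.

    canvas_shape -- (height, width, channels) of the multiplexed frame
    placements   -- list of (component_array, top, left) tuples; in practice the
                    positions would come from a pre-defined layout table.
    """
    mux = np.zeros(canvas_shape, dtype=np.uint8)
    for component, top, left in placements:
        ch, cw = component.shape[:2]
        mux[top:top + ch, left:left + cw] = component
    return mux

# Illustrative layout only: two components of F1, two components of F2,
# and a strip reserved for the masking data / metadata.
f1a = np.full((540, 960, 3), 10, np.uint8)
f1b = np.full((540, 960, 3), 20, np.uint8)
f2a = np.full((500, 960, 3), 30, np.uint8)
f2b = np.full((500, 960, 3), 40, np.uint8)
mask_strip = np.full((40, 1920, 3), 255, np.uint8)
mux = pack_multiplexed_frame((1080, 1920, 3), [
    (f1a, 0, 0), (f1b, 0, 960),
    (f2a, 540, 0), (f2b, 540, 960),
    (mask_strip, 1040, 0),
])
print(mux.shape)  # (1080, 1920, 3)
```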
To assist the decoding module 146 to decode the multiplexed frame 306, the multiplexed frame 306 may also store metadata 328 (shown in dotted outline attached to the multiplexed frame 306) specifying how a plurality of output frames can be generated from the plurality of processed frames 304, 304′. As illustrated, the metadata 328 is stored separately from the masking data 316, although it will be understood that in various embodiments, the metadata 328 may include the masking data 316 as well. The metadata 328 may be stored in a dedicated region or any position of the multiplexed frame 306, including the position of the masking data 316 shown in multiplexed frame 306. Additionally or alternatively, at least a portion of the metadata may be stored by way of an invisible watermark of the multiplexed frame.
As used herein, an “invisible watermark” can relate to information embedded into existing data on a frame that may be undetectable during normal processing (e.g., this information may be imperceptible to human viewers when a frame is being viewed), but may otherwise be detectable under a specific process designed for identifying such information. In terms of image frames and multiplexed image frames, the embedded information may, for example, be encoded onto an original image frame, a processed frame, a frame component, masking data, metadata, guard data, or any component in a nested frame.
One example of a watermark can be a group of image pixels in which the intensity values in an original frame could be slightly modified. For example, the intensity values of a pre-defined spatial distribution pattern on the original frame may be modified based on a mathematical transform calculation to embed the information of the watermark. Such slight modification of the intensity values would likely not be perceivable by a human viewer, but would allow the embedded information to be subsequently retrieved (provided the decoder is aware of the pre-defined spatial distribution pattern and the mathematical transform calculation). In various embodiments, the encoded information could relate to metadata, copyright information or other small amount(s) of pertinent information.
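A non-limiting sketch of one such scheme is given below (a simple least-significant-bit modification at a pre-defined pattern of pixel positions; the actual spatial distribution pattern and mathematical transform of the embodiments are not specified, so both are assumptions):

```python
import numpy as np

# Pre-defined spatial pattern: pixel positions known to both the encoder and
# the decoder (the positions shown here are illustrative only).
PATTERN = [(10, 10), (10, 50), (50, 10), (50, 50),
           (90, 90), (90, 130), (130, 90), (130, 130)]

def embed_watermark(frame: np.ndarray, bits) -> np.ndarray:
    """Embed one bit per pattern position in the least significant bit."""
    marked = frame.copy()
    for (y, x), bit in zip(PATTERN, bits):
        marked[y, x] = (int(marked[y, x]) & 0xFE) | bit
    return marked

def extract_watermark(frame: np.ndarray, n_bits: int):
    """Recover the embedded bits from the pattern positions."""
    return [int(frame[y, x]) & 1 for (y, x) in PATTERN[:n_bits]]

frame = np.random.randint(0, 256, (200, 200), dtype=np.uint8)
payload = [1, 0, 1, 1, 0, 0, 1, 0]  # e.g. a small metadata or copyright code
marked = embed_watermark(frame, payload)
assert extract_watermark(marked, 8) == payload
```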
A digitally watermarked frame component may be considered different from a processed frame component for multiplexing in that it does not increase the data size of the frame or the bandwidth used to transmit the frame by a carrier signal. For example, an invisible digital watermark may be considered to trade a visually imperceptible amount of image quality for the ability to carry an extra amount of information within the same bandwidth.
The metadata 328 may include, for example, the aspect ratio of the original image content or the number of vertical lines of actual image content (e.g., as would be the case for the scenario mentioned above, in which the processed image frames are only a cropped portion of the input original image frames). The metadata 328 may additionally or alternatively include mapping data that can be used to derive the positioning or size information of the frame components 312, 312′ (and/or processed frames 304, 304′, as the case may be) in the multiplexed frame 306. For example, in the example multiplexed frame 306 shown in
In various embodiments, the mapping data may be a code that indicates a pre-defined layout of the processed frames within the multiplexed frame. For example, when combining the plurality of processed frames and the masking data to form the multiplexed frame, each of the processed frames may be positioned within the multiplexed frame according to a pre-defined layout that is selected from amongst a plurality of different pre-defined layouts (such that the code indicates which pre-defined layout is selected for the particular multiplexed frame).
The plurality of different pre-defined layouts may each correspond to a different type of scene that may be shown in the image frames. For example, it may be the case that certain layouts are better suited to be used with certain types of scenes. Since the encoding of the image frames can be performed offline, and the types of scenes shown in the image frames can be known before encoding, the pre-defined layout can be selected based on the scene shown in the frames to be encoded. A code representing the pre-defined layout can then be provided in the metadata of the multiplexed frame.
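A non-limiting sketch of how such a layout code might be shared between the encoder and the decoder is given below (the layout table, codes and region coordinates are hypothetical placeholders; only the idea of selecting one of several pre-defined layouts comes from the description above):

```python
# Hypothetical table of pre-defined layouts shared by the encoder and decoder.
# Each layout maps a component name to its (top, left, height, width) region
# within the multiplexed frame; the code stored in the metadata selects one.
PREDEFINED_LAYOUTS = {
    0: {  # e.g. a layout suited to low-motion scenes
        "F1A": (0, 0, 540, 960), "F1B": (0, 960, 540, 960),
        "F2A": (540, 0, 500, 960), "F2B": (540, 960, 500, 960),
        "MASK": (1040, 0, 40, 1920),
    },
    1: {  # e.g. a layout suited to high-motion scenes
        "F1A": (0, 0, 720, 960), "F1B": (0, 960, 720, 960),
        "F2A": (720, 0, 320, 960), "F2B": (720, 960, 320, 960),
        "MASK": (1040, 0, 40, 1920),
    },
}

def layout_for_code(code: int) -> dict:
    """The decoder looks up the layout indicated by the code in the metadata."""
    return PREDEFINED_LAYOUTS[code]

print(layout_for_code(1)["F2A"])  # (720, 0, 320, 960)
```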
The metadata 328 may also include various information about the original image frames 302 so that various aspects of the original image frames 302 may be restored from the processed frames 304, 304′. For example, this information may include an aspect ratio and/or frame size of one or more of the original image frames 302. Additionally or alternatively, the metadata 328 may also include a resolution ratio between the processed frames 304, 304′ (the resolution ratio may be stored in situations where the resolutions of the processed frame for F1 304 and the processed frame for F2 304′ are different). Further, the frame-multiplexing parameters discussed earlier with respect to step 204 in
As will be understood, the information that can be stored in the metadata area 328 may be limited. As a result, the information to be encoded and stored as metadata in the multiplexed frame may have to be carefully selected. For example, in some embodiments, the metadata 328 may only store a frame-multiplexing parameter that indicates the frame type (e.g., 2D or 3D), while the frame components' sizes, aspect ratios and boundary information can be pre-defined or implicitly derived by the decoder using image boundary detection algorithms. For example, one such boundary detection algorithm can optimally locate the boundary between two image components by conducting a Bayesian a posteriori estimation.
As will be understood, the steps of selecting image frames 302 for generating a multiplexed frame 306 according to the method of
A multiplexed frame may generally be compatible with existing infrastructure at a display facility because the frame format of a multiplexed frame will be the same as the frame format for original image frames. A multiplexed frame may also generally survive compression. As a result, a multiplexed frame stream may be able to be processed using existing bandwidth-limited infrastructure.
Referring to
At step 502, the decoding module 146 at the display facility 140 may receive a multiplexed frame comprising a plurality of processed frames and masking data. The multiplexed frame 306 may be part of a multiplexed image frame stream 120 that is transmitted from a decompression module 144 through a bandwidth-limited communications channel 150, with the multiplexed image frame stream 120 having originally been encoded at the encoding module 106 of pre-processing facility 102.
Referring simultaneously to
At step 504, the frame-packing parameters may be identified in the multiplexed frame (circle number 2 in
At step 506, the method involves unpacking the multiplexed frame 306 to identify the plurality of processed frames and the masking data within the multiplexed frame 306. In the example scenario of
The multiplexed frame 306 may also include metadata 328 (shown in dotted outline as attached to multiplexed frame 306) specifying how the plurality of output frames can be generated from the plurality of processed frames. The discussion above regarding how the metadata 328 is stored in a multiplexed frame 306 during the encoding process may also be applicable in this context. That is, during the decoding process, the decoding module 146 may be configured to identify the metadata 328 in the multiplexed frame 306 according to how the metadata 328 was stored within the multiplexed frame 306. For example, if the metadata 328 is stored as a watermark of the multiplexed frame 306, the decoding module 146 may be configured to identify the metadata 328 in the watermark of the multiplexed frame 306.
To identify the position of each of the frame components (and/or processed frames, as the case may be) during the unpacking step, the decoding module 146 may refer to mapping data that is stored in the metadata 328. As discussed above, the mapping data may identify the positioning of the processed frames and/or the frame components stored in the multiplexed frame 306. For example, the mapping data may identify a pre-defined layout that specifies the position of one or more of the processed frames and/or the masking data within the multiplexed frame 306. The pre-defined layout may then be used when identifying the plurality of processed frames and the masking data. Also as discussed above, since the pre-defined layout may be selected from a plurality of different pre-defined layouts, the decoding module may be provided with access to the different pre-defined layouts that may be used to encode image frames into multiplexed frames 306.
In another example, instead of, or in addition to, referring to a pre-defined layout to identify the processed frames and masking data in a multiplexed frame 306, the position of at least one of the processed frames and/or the masking data within the multiplexed frame may be identified according to an image boundary detection analysis of the multiplexed frame. For example, one such algorithm can optimally locate the boundary between any two image components by conducting a Bayesian a posteriori estimation.
After the frame components (or the processed frames) have been identified, any rotated frame components in the multiplexed frame that were rotated during encoding may be restored to their original orientation. This may be performed at step 508 (shown in dotted outline). At step 508, the method may include rotating the one or more rotated frame components 312, 312′ (circle number 4 in
At step 510, one or more frame components 312 may be assembled together to generate the processed frame 304′ (circle number 5 in
As noted above in the discussion regarding the encoding process, during the partitioning of a processed frame into frame components 312, the frame components 312, 312′ may be provided with a guard area 314, 314′ that overlaps with a portion of another of the frame components 312, 312′ for the same processed frame 304, 304′. Accordingly, in the example scenario of
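A non-limiting sketch of this reassembly with a gradual blend across the overlapping guard area is given below (the linear blending ramp and the 16-pixel guard width are assumptions; the sketch complements the partitioning sketch given earlier):

```python
import numpy as np

def join_with_guard(component_a: np.ndarray, component_b: np.ndarray,
                    guard: int = 16) -> np.ndarray:
    """Rejoin two vertical halves whose edges overlap by 2 * guard columns,
    blending linearly across the overlap so no visible seam remains."""
    left = component_a[:, :-2 * guard]                  # unshared part of A
    right = component_b[:, 2 * guard:]                  # unshared part of B
    overlap_a = component_a[:, -2 * guard:].astype(np.float32)
    overlap_b = component_b[:, :2 * guard].astype(np.float32)
    ramp = np.linspace(0.0, 1.0, 2 * guard)             # 0 -> all A, 1 -> all B
    ramp = ramp[None, :, None] if component_a.ndim == 3 else ramp[None, :]
    seam = ((1.0 - ramp) * overlap_a + ramp * overlap_b).astype(component_a.dtype)
    return np.concatenate([left, seam, right], axis=1)

# Example: rejoining the two 528-pixel-wide halves of a 1024-pixel-wide frame.
a = np.full((540, 528, 3), 100, np.uint8)
b = np.full((540, 528, 3), 200, np.uint8)
restored = join_with_guard(a, b)
print(restored.shape[1])  # 1024
```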
To generate a plurality of output frames from the plurality of identified processed frames 304, 304′, a first processed frame (e.g., the processed frame for F1 304) may first be identified as the base frame that will be the first output frame. Subsequent output frame(s) can then be generated by applying the masking data 316 to the first output frame, and then overlaying the first output frame with region(s) of a subsequent processed image (e.g., the processed frame for F2 304′) according to a corresponding value in the masking data 316.
However, before the overlaying can be performed, since the resolutions of the reconstituted processed frames 304, 304′ may be different, the resolution of any subsequent processed frame (e.g., the processed frame for F2 304′) may be modified so as to be the same as the resolution of the base first output frame. For example, in the example scenario shown in
At step 512, an output frame may be produced by combining one of the plurality of processed frames with another of the plurality of processed frames according to the masking data (circle number 6 in
Referring simultaneously to
After applying the masking data 316 to the processed frame for F1 304, the output frame corresponding to the processed frame for F2 304′ may be produced. Still referring to
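A non-limiting sketch of this mask-guided combination is given below (binary masking values are assumed, as in the earlier encoding example; the region grid dimensions and function name are illustrative):

```python
import numpy as np

def produce_output_frame(base: np.ndarray, frame2: np.ndarray,
                         mask: np.ndarray) -> np.ndarray:
    """Produce the second output frame: start from the first output frame and
    overlay only the regions that the masking data flags as changed."""
    out = base.copy()
    h, w = base.shape[:2]
    rows, cols = mask.shape
    for r in range(rows):
        for c in range(cols):
            if mask[r, c]:
                y0, y1 = r * h // rows, (r + 1) * h // rows
                x0, x1 = c * w // cols, (c + 1) * w // cols
                out[y0:y1, x0:x1] = frame2[y0:y1, x0:x1]
    return out

# Example: only the region containing the moving object is copied from F2.
f1 = np.zeros((96, 144), dtype=np.uint8)
f2 = f1.copy()
f2[2:10, 2:10] = 255
mask = np.zeros((8, 12), dtype=np.uint8)
mask[0, 0] = 1
out_f2 = produce_output_frame(f1, f2, mask)
assert np.array_equal(out_f2[2:10, 2:10], f2[2:10, 2:10])
```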
At step 514, once the output frame F1 has been identified, and the output frame F2 has been produced, the output frames may be modified to restore the properties of the original image frames (circle number 7 in
At step 516, the output frames ready for display are obtained (circle number 8 in
While the above discussion relates to the encoding of image frames directly into a multiplexed frame, in various embodiments, it may also be possible to encode another multiplexed frame into a multiplexed frame, so as to generate a nested multiplexed frame.
Referring to
For example, the method of generating a nested multiplexed frame may include identifying a multiplexed frame 306 which itself comprises a first processed frame that corresponds to a first image frame, and a second processed frame that corresponds to a second image frame. A third image frame F3 from an image frame stream may then be identified. As previously described, a first multiplexed frame 306 may contain: frame components 312 for an original image frame F1, frame components 312′ for an original image frame F2, and masking data for the multiplexed frame 306, where the masking data represents the relationship between image frame F2 and image frame F1. The masking data and metadata associated with the first multiplexed frame may be retained in section 316 and are referred to as "M.D. 1" in the Figures. The intermediate steps for encoding the multiplexed frame 306 are shown in
Similar to the multiplexing of processed frame components in 306, processed frame components 812′a and 812′b for the third processed frame 802 can be packed into the nested multiplexed frame 806, where 812′a is a first frame component of the third processed image frame 802 (with label 'F3A') and 812′b is a second frame component of the third processed frame 802 (with label 'F3B'). Both frame components can be rotated and inserted into the nested multiplexed frame 806. The nested multiplexed frame may contain frame components 812′a and 812′b for a processed image frame 802, and masking and metadata 816 (labeled "M.D. 2" in
In further embodiments, the creation of a nested multiplexed frame 806 may be further repeated when the nested multiplexed frame itself is further encoded into yet another nested multiplexed frame. This process of recursively multiplexing image frames may continue to be repeated until a minimum quality level in the processed frames is reached during the encoding process.
To decode a nested multiplexed image frame, a similar decoding process as was described above with regards to
Generating the output image frames from a nested multiplexed frame requires the third output image frame to be generated from the second processed frame; and before generating the second output image frame, the first processed frame needs to be identified. Therefore, the layers of nesting of recursively nested multiplexed frame may have to be “unraveled” (e.g., each of the nested processed image frames must be unpacked and identified) before any of the output image frames can be generated.
As will be understood, the steps of the decoding method of the multiplexed frames can generally be similar to the encoding steps carried out in a reverse order. In the example illustrated in
To generate the third output frame F3 in the
This process may continue where further nesting of multiplexed frames has occurred until the last output image frame of the last nested multiplexed frame is generated. The decoding order of all the output image frames thus occurs in the same order as encoding the image frames.
The decoding method may include receiving a nested multiplexed frame that comprises: a processed multiplexed frame of a first processed frame and a second processed frame; a third processed frame; and masking and metadata at this nested multiplexing level. The decoding module can use the metadata associated with the nested multiplexed frame to unpack the nested multiplexed frame and to identify the processed multiplexed frame, the third processed frame, and the masking data associated with the nested multiplexed frame that is at a lower nested multiplexing level.
The method may then unpack the nested multiplexed frame by identifying and reconstructing the multiplexed frame using the metadata associated with the nested multiplexed frame. Once reconstructed, the multiplexed frame can then be decoded using the metadata associated with the multiplexed frame to identify the first processed frame, the second processed frame, and the masking data.
Once the processed frames are generated, the method may then proceed to generate output image frames corresponding to the processed frames, i.e. the first output image frame, the second output image frame and third output image frame to reconstruct the original high frame rate image sequence for display.
In various embodiments, the encoding and decoding process described in the present disclosure may be applied to the field of multiple-view image frame streams (e.g., a frame stream having image frames that provide different views of the same scene).
An example of a multiple-view image frame stream is a three-dimensional (3D) cinematic video stream in which two views of a scene (e.g., a stereoscopic frame pair containing one view for a left eye, and the other view for the right eye) are provided. As is known, the viewing of the different views together (or quickly in succession) allows the perception of depth that generates the 3D effect. In these types of 3D cinematic video streams, alternating scenes for each eye may be shown sequentially, and polarized glasses may be used to only allow each eye to see the view that is intended for it.
As discussed above, traditional cinematic video is typically provided at 24 fps. To allow for alternating left and right views of a scene in 3D cinematic video, twice the number of frames per second would have to be provided in a 3D cinematic image frame stream so as to maintain the traditional frame rate of 24 fps perceived by each eye (e.g., although the actual frame stream may be displayed at 48 fps, only half of these frames are directed to each eye in any given second). 3D cinematic image frame streams are thus typically provided at 48 fps, and may therefore suffer from bandwidth constraints similar to those described above. Accordingly, 3D cinematic image frame streams may be suitable candidates for the encoding and decoding processes described in the present disclosure.
Referring to
For the multiplexing of stereoscopic frames, the generation of the masking data may differ from what was described in the previous examples, for better efficiency. For example, the analysis of differences between two different views of a scene may consider the scene depth and known or detected stereoscopic model parameters. In such embodiments, the scene depth information may be transmitted as a part of the masking data, for example.
Referring to
As illustrated, a first plurality of original image frames 1002 may be selected to be encoded (e.g., the left-eye view of scenes '0' and '1', labeled frames L0 and L1) to produce a resultant multiplexed frame 1006 containing the original frames labeled L0 and L1. Similarly, a second plurality of original frames 1002′ may be selected to be encoded (e.g., the right-eye view of scenes '0' and '1', labeled frames R0 and R1) to produce a resultant multiplexed frame 1006′ containing the original frames labeled R0 and R1. The process of encoding may then be performed repeatedly, alternating between sequential temporal pairs for each of a left-eye and a right-eye view of a scene. The result will be a multiplexed image frame stream that maintains the characteristic of a 3D video stream that provides left-eye and right-eye views in alternating sequence.
The process for generating output frames from a multiplexed image frame resulting from the embodiments shown in
Referring to
The process for generating output frames from a multiplexed image frame resulting from the embodiments shown in
Referring to
As illustrated, in the example scenario described above for packing a pair of original image frames F1 and F2 into a multiplexed frame, the frame components 312 for F1 (labeled ‘F1A’ and ‘F1B’) may be added to the multiplexed frame 1206. As well, instead of a processed frame for F2 that reflects the entirety of original frame F2 being added to the multiplexed frame 1206, only the regions 402 from the original second frame F2 (that are later used to generate an output frame) may be added to the multiplexed frame 1206.
The masking data 316 may be generated in a manner similar to what was described earlier, i.e., to show the regions of F1 that need to be overlaid with regions 402 when generating the output frame for frame F2. However, to allow the decoding module to associate a given position of the original frame F1 with a specific region 402, the multiplexed frame 1206 may also include mapping data in the metadata 328.
As the metadata 328 may only allow a small amount of data to be losslessly transmitted, in various embodiments, the layout of the positions of F1 (e.g., the 12×8 grid of data discussed earlier) may be fixed so that layout information need not be transmitted in the mapping data. In such a case, the mapping data may simply provide a sequence of indices into the grid that specify the positions of F1 that need to be replaced with the respective sequence of regions 402 provided in the multiplexed frame 1206.
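A non-limiting sketch of how such index-based mapping data might be applied by the decoder is given below (a fixed 12×8 grid in row-major index order is assumed; the function and constant names are illustrative only):

```python
import numpy as np

GRID_ROWS, GRID_COLS = 8, 12   # fixed layout known to both encoder and decoder

def apply_region_mapping(base: np.ndarray, regions, indices) -> np.ndarray:
    """Replace the grid positions named by `indices` (row-major order) with the
    corresponding region images carried in the multiplexed frame."""
    out = base.copy()
    h, w = base.shape[:2]
    for region, idx in zip(regions, indices):
        r, c = divmod(idx, GRID_COLS)
        y0, y1 = r * h // GRID_ROWS, (r + 1) * h // GRID_ROWS
        x0, x1 = c * w // GRID_COLS, (c + 1) * w // GRID_COLS
        out[y0:y1, x0:x1] = region
    return out

# Example: two carried regions 402 replace grid positions 0 and 13 of frame F1.
f1 = np.zeros((96, 144), dtype=np.uint8)
patch = np.full((12, 12), 255, dtype=np.uint8)  # one 12x12 grid cell
f2_out = apply_region_mapping(f1, [patch, patch], [0, 13])
print(f2_out[0, 0], f2_out[12, 12])  # 255 255
```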
In some embodiments, the mapping data may additionally or alternatively be provided by way of another data channel separate from the metadata 328.
It will be understood that various aspects of the encoding method illustrated in
Further, in various embodiments, the factors that affect the encoding process may be pre-selected depending on the type of scene being shown in the image frames. That is, since the encoding may be performed offline with full knowledge of the properties of the image frames to be multiplexed, it may be possible to determine that a particular configuration of the described encoding process may be better suited for certain selected sequences of the original image frame stream.
In these embodiments, there may be a pre-selected number of such scene-specific configurations of the encoding process which can then be selected based on the original image frame stream. These scene-specific configurations may be provided at the decoding module 146, so that the multiplexed frame 306 need only store a scene-specific configuration identifier that will indicate to the decoding module 146 how the multiplexed frame 306 is to be decoded. For example, as discussed above, an example of such an identifier may be a code to indicate which pre-defined layout is used to position frame components within a multiplexed frame.
The present invention has been described here by way of example only. Various modifications and variations may be made to these exemplary embodiments without departing from the scope of the invention, which is limited only by the appended claims. For example, the steps of a method in accordance with any of the embodiments described herein may be performed in any order, whether or not such steps are described in the claims, figures or otherwise in any sequential numbered or lettered manner.
This application is a continuation of PCT International Application No. PCT/CA2013/000028, filed Jan. 15, 2013. The entire content of PCT Application No. PCT/CA2013/000028 is incorporated herein by reference.
Relationship | Number | Date | Country
---|---|---|---
Parent | PCT/CA2013/000028 | Jan 2013 | US
Child | 14799044 | — | US