The present invention relates to methods for decoding signals, such as video signals, using hierarchical coding formats, as well as decoders and decoding systems. The present invention specifically relates to a video decoder for integrating a hierarchical decoder, preferably an LCEVC decoder, into an application or client.
A hybrid backward-compatible coding technology has been previously proposed, for example in WO 2014/170819 and WO 2018/046940, the contents of which are incorporated herein by reference. Further examples of tier-based coding formats include ISO/IEC MPEG-5 Part 2 LCEVC (hereafter “LCEVC”). LCEVC has been described in WO 2020/188273A1, and the associated standard specification documents including the Draft Text of ISO/IEC DIS 23094-2 Low Complexity Enhancement Video Coding published at MPEG 129 meeting in Brussels, held Monday, 13 Jan. 2020 to Friday, 17 Jan. 2020, both documents being incorporated by reference herein in their entirety.
In these coding formats a signal is decomposed into multiple “echelons” (also known as “hierarchical tiers”) of data, each corresponding to a “Level of Quality”, from the highest echelon at the sampling rate of the original signal to a lowest echelon. The lowest echelon is typically a low quality rendition of the original signal, and the other echelons contain information on corrections to apply to a reconstructed rendition in order to produce the final output.
LCEVC adopts this multi-layer approach where any base codec (for example Advanced Video Coding—AVC, also known as H.264, or High Efficiency Video Coding—HEVC, also known as H.265) can be enhanced via an additional low bitrate stream. LCEVC is defined by two component streams, a base stream typically decodable by a hardware decoder and an enhancement stream consisting of one or more enhancement layers suitable for software processing implementation with sustainable power consumption. The enhancement provides improved compression efficiency to existing codecs, and reduces encoding and decoding complexity.
Since LCEVC and similar coding formats leverage existing decoders and are inherently backwards-compatible, there exists a need for efficient and effective integration with existing video coding implementations without complete re-design. Examples of known video coding implementations include the software tool FFmpeg, which is used by the simple media player FFplay.
Moreover, LCEVC is not limited to known codecs and is theoretically capable of leveraging yet-to-be-developed codecs. As such any LCEVC implementation should be capable of integration with any hitherto known or yet-to-be-developed codec, implemented in hardware or software, without introducing coding complexity.
Aspects and variations of the present invention are set out in the appended claims. Certain unclaimed aspects are further set out in the detailed description below.
According to one aspect, there is provided a video decoder comprising: one or more decoder plug-ins that provide a wrapper for one or more respective base decoders to implement a base decoding layer to decode an encoded video signal, each wrapper implementing an interface for data exchange with a corresponding base decoder; an enhancement decoder to implement an enhancement decoding layer, the enhancement decoder being configured to: receive an encoded enhancement signal; and, decode the encoded enhancement signal to obtain one or more layers of residual data, the one or more layers of residual data being generated based on a comparison of data derived from the decoded video signal and data derived from an original input video signal; and a decoder integration layer to control operation of the one or more decoder plug-ins and the enhancement decoder to generate a decoded reconstruction of the original input video signal using a decoded video signal from the base decoding layer and the one or more layers of residual data from the enhancement decoding layer, wherein the decoder integration layer provides a control interface for the video decoder.
Preferably the enhancement decoder is an LCEVC decoder such that the decoder integration layer, the one or more plug-ins and the enhancement decoder together provide an LCEVC decoding software solution. The LCEVC decoding software stack may be implemented in one or more LCEVC decoder libraries and thus provides an optimised software library for decoding MPEG-5 enhanced streams.
LCEVC decoding is extremely lightweight, often freeing up resources and matching or reducing battery power consumption vs. native base codec decoding. The above aspect provides for rapid deployment of LCEVC across all platforms, including support of different base encodings and decoder implementations.
The decoder integration layer may also control an upscale operation to upscale the decoded video signal from the base decoding layer so that the one or more layers of residual data may be applied to the decoded video signal from the base decoding layer.
The decoder can be easily implemented on popular media players across platforms such as iOS®, Android® and Windows®.
The one or more decoder plug-ins may be configured to instruct the corresponding base decoder through a library function call or operating system function call. Function calls may include, for example, Android® MediaCodec, VTDecompressionSession and MFT calls, depending on the operating system. Hence, different base decoding implementations may be easily supported, including native implementations within an operating system and hardware-accelerated decoding.
The decoder integration layer may be configured to apply the one or more layers of residual data from the enhancement decoding layer to the decoded video signal from the base decoding layer to generate the decoded reconstruction of the original input video signal. In certain cases, the decoder integration layer may instruct a plug-in from the set of decoder plug-ins to apply the one or more layers of residual data; in other cases, the decoder integration layer may obtain a decoded output from the base decoding layer that was instructed using the decoder plug-in and combine this with the output of the enhancement decoder. Preferably the layers of residual data may be applied during playback.
In certain embodiments the decoder integration layer is configured to receive: one or more input buffers comprising the encoded video signal and the encoded enhancement signal in an encoding order, wherein the one or more input buffers are also fed to the base decoders; and, one or more base decoded frames of the decoded video signal from the base decoding layer, in presentation order. In this way minimal processing is needed by a client and the integration takes care of the operation for the client. The same input buffers can be passed to the base decoding layer and the enhancement decoding layer to aid simplicity.
In particularly preferred embodiments, the control interface comprises an output type configuration parameter, wherein the decoder integration layer is configured to vary how the decoded reconstruction of the original input video signal is output based on a value of the output type configuration parameter. The value of the output type configuration parameter may be stored in a configuration data structure retrieved by the decoder integration layer upon initialisation.
In one example of a configured output, the decoder integration layer is configured to output the decoded reconstruction of the original input video signal as one or more buffers. In another example, the decoder integration layer is configured to output the decoded reconstruction of the original input video signal as one or more on-screen surfaces. Alternatively, the decoder integration layer is configured to output the decoded reconstruction of the original input video signal as one or more off-screen textures. Each of these three example outputs may be selected by the output type configuration parameter.
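By way of non-limiting illustration, the output type configuration parameter might be expressed as follows. This is a minimal sketch; the type and value names are assumptions for illustration only and not the actual library definitions:

    // Hypothetical output type configuration parameter.
    enum class OutputType {
        Buffer,           // output as one or more buffers
        OnScreenSurface,  // render directly to one or more on-screen surfaces
        OffScreenTexture  // draw to off-screen textures, rendered on request
    };

The selected value may then be carried in the configuration data structure retrieved by the decoder integration layer upon initialisation, as described above.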
Where the output is selected to be one or more off-screen textures, the control interface may comprise a render instruction and, when the decoder integration layer receives the render instruction the decoder integration layer may be configured to render the off-screen texture. This is particularly useful when a client wants to finely manage the time of display of each frame and perhaps keep a queue of decoded frames ready for display at the right time. For this use, a separate render function is provided, that is, the render instruction.
The control interface may comprise a pipeline mode parameter, wherein the decoder integration layer is configured to control stages of the enhancement layer to be performed on a central processing unit (CPU) or graphical processing unit (GPU) based on a value of the pipeline mode parameter. For example, in one pipeline mode all the LCEVC stages may be performed on a CPU while a GPU is used only for a possible colour component (e.g. YUV/RGB) conversion. Similarly, in another mode, most of the LCEVC stages may be performed on a GPU using graphics library (GL) shaders, including colour component (e.g. YUV/RGB) conversions, while the CPU may only be used to produce the LCEVC residual planes. The configuration of the present decoder allows efficient distribution of processing across CPUs/GPUs, and for this to be configured via the decoder integration layer.
The decoder integration layer may be configured to fall back to passing an output of the base decoding layer as the decoded reconstruction of the original input video signal where no encoded enhancement signal is received. This is particularly beneficial as a video signal may still be output, albeit at a lower resolution than if an enhancement signal had been received successfully.
The control interface may comprise a skip frame instruction and the decoder integration layer may be configured to control the operation to not decode a frame of the encoded enhancement signal and/or not decode a frame of the encoded video signal in response to receiving the skip frame instruction. When a client skips frames, for example because of a seek in the timeline, or drops frames because they are ‘late’, it may alert the decoder integration layer using a suitable function. In response to the skip instruction, the decoder integration layer may internally perform a minimal frame decoding to keep the reference decoding buffer consistent, or may fall back to a ‘no operation’ case.
The one or more decoder plug-ins may provide a base control interface to the base decoder layer to call functions of the corresponding base decoder. The plug-ins thus provide an application programming interface (API) to control operations and exchange information.
The control interface may comprise a set of predetermined decoding options, wherein the decoder integration layer is configured to retrieve a configuration data structure comprising a set of decoding settings corresponding to the set of predetermined decoding options. The configuration data structure may be retrieved by the decoder integration layer upon initialisation. Examples of decoding settings include: graphics library versions (e.g. OpenGL major and minor versions or the use of graphics library functions for embedded systems such as OpenGL ES); bit-depth, e.g. use of 8 or 16 bit LCEVC residual planes; use of hardware buffers; user interface (UI) configurations (e.g. enabling an on-screen UI for stats and live configuration); and logging (e.g. enabling dumping stats and/or raw output frames to local storage).
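A minimal sketch of such a configuration data structure is given below, reusing the OutputType sketch above; all field names and default values are assumptions for illustration only:

    // Hypothetical configuration data structure retrieved on initialisation.
    struct DecoderConfig {
        int  glMajorVersion     = 3;      // graphics library major version
        int  glMinorVersion     = 0;      // graphics library minor version
        bool useOpenGLES        = false;  // GL functions for embedded systems
        int  residualBitDepth   = 8;      // 8 or 16 bit LCEVC residual planes
        bool useHardwareBuffers = false;  // enable hardware buffers
        bool showStatsUI        = false;  // on-screen UI for stats/live config
        bool logStats           = false;  // dump stats to local storage
        bool dumpRawFrames      = false;  // dump raw output frames to storage
        OutputType outputType   = OutputType::Buffer;  // see sketch above
    };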
In certain embodiments, the decoder integration layer may be configured to receive, via the control interface, an indication of a mode in which the decoder integration layer should control operation of the one or more decoder plug-ins and the enhancement decoder, wherein, in a synchronous mode, the decoder integration layer may be configured to block a call to a decode function until decoding is complete; and, in an asynchronous mode, the decoder integration layer may be configured to return (e.g. immediately) upon call to a decode function and call back when decoding completes. Thus, the decoder integration layer can be used in either synchronous or asynchronous mode, optionally by implementing a decode function in either mode.
Using the decoder integration layer is simplified for client applications, since the control interface operates at a relatively high level, has a small number of commands and hides additional complexity. The control interface may comprise a set of functions to instruct respective phases of operation of the decoder integration layer, the set of functions comprising one or more of: a create function, in response to which an instance of the decoder integration layer is created; a destruct function, in response to which the instance of the decoder integration layer is destroyed; a decode function, in response to which the decoder integration layer controls operation of the one or more decoder plug-ins and the enhancement decoder to generate a decoded reconstruction of the original input video signal using the decoded video signal from the base decoding layer and the one or more layers of residual data from the enhancement decoding layer; a feed input function which passes an input buffer comprising the encoded video signal and the encoded enhancement signal to the video decoder; and, a call back function, in response to which the decoder integration layer will call back when the decoded reconstruction of the original input video signal is generated. The call back may be thought of as a registered alert which indicates to a client that the decoding is complete.
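Expressed as a hypothetical C-style interface, this set of functions might be declared as follows. All names and signatures here are illustrative assumptions, not the actual library API:

    // Hypothetical control interface for the decoder integration layer.
    #include <cstddef>
    #include <cstdint>

    struct DecoderConfig;                    // settings, e.g. as sketched above
    typedef struct DILHandle DILHandle;      // opaque instance handle

    // Registered call back: invoked when the reconstruction is generated.
    typedef void (*DILCallback)(void* userData);

    DILHandle* DIL_Create(const DecoderConfig* config);   // create function
    void       DIL_Destroy(DILHandle* instance);          // destruct function
    int        DIL_FeedInput(DILHandle* instance,         // feed input function
                             const std::uint8_t* buffer, std::size_t length);
    int        DIL_Decode(DILHandle* instance);           // decode function
    void       DIL_SetCallback(DILHandle* instance,       // call back function
                               DILCallback callback, void* userData);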
According to a further aspect there may be provided a method of generating a decoded reconstruction of an original input video signal using a video decoder according to any of the above aspects, the method comprising: initialising an instance of the decoder integration layer; feeding an input to the video decoder comprising an encoded video signal and an associated encoded enhancement signal; instructing the decoder integration layer to generate the decoded reconstruction; and, destroying the instance of the decoder integration layer. The method may be performed by a client or application.
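Using the hypothetical interface sketched above, this method could map onto client-side calls such as the following (a sketch only; the function name and default settings are illustrative):

    // Illustrative client-side flow: initialise, feed input, decode, destroy.
    #include <cstdint>
    #include <vector>

    void reconstructVideo(const std::vector<std::uint8_t>& input) {
        DecoderConfig config{};                          // default settings
        DILHandle* dil = DIL_Create(&config);            // initialise instance
        if (dil == nullptr) return;                      // settings invalid
        DIL_FeedInput(dil, input.data(), input.size());  // encoded video plus
                                                         // enhancement signal
        DIL_Decode(dil);                                 // generate the
                                                         // reconstruction
        DIL_Destroy(dil);                                // destroy the instance
    }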
According to further aspects of the present invention, the video decoder and method may be provided by a computer readable medium comprising instructions which, when executed by a processor, cause the processor to perform the functionality of the video decoder or carry out the steps of the method.
According to a further aspect there may be provided a video decoding system, comprising: a video decoder according to the first aspect; one or more base decoders; and, a client which provides one or more calls to the video decoder via the control interface to instruct generation of a decoded reconstruction of an original input video signal using the video decoder. In conjunction with a base decoder, typically provided by the operating system, the video decoder described herein offers a complete solution from buffer to output. Examples of the one or more base codecs include AVC, HEVC, VP9, EVC and AV1, which may be implemented in software or hardware as is commonplace in this field.
According to a further aspect of the invention there may be provided a decoder integration layer to control operation of one or more decoder plug-ins and an enhancement decoder to generate a decoded reconstruction of an original input video signal using a decoded video signal from a base decoding layer and one or more layers of residual data from an enhancement decoding layer, wherein the decoder integration layer provides a control interface for a video decoder, wherein the one or more decoder plug-ins provide a wrapper for one or more respective base decoders to implement the base decoding layer to decode an encoded video signal, each wrapper implementing an interface for data exchange with a corresponding base decoder; and, wherein the enhancement decoder implements the enhancement decoding layer, the enhancement decoder being configured to: receive an encoded enhancement signal; and, decode the encoded enhancement signal to obtain one or more layers of residual data, the one or more layers of residual data being generated based on a comparison of data derived from the decoded video signal and data derived from an original input video signal. In this way, the decoder integration layer and a suitably configured client communicate via a simple API.
In a further illustrative aspect, the present disclosure provides a decoder for decoding input data including a plurality of layers of data in a hierarchical structure, wherein the plurality of layers includes base layer data and at least one enhancement layer data, wherein the at least one enhancement layer data is useable to enhance a rendition of the base layer data at a first level of quality to an enhanced rendition at a second level of quality, the second level of quality being higher than the first level of quality, characterized in that the decoder includes a base function arrangement (e.g. a base decoder) for processing the base layer data, and an enhancement layer arrangement (e.g. an enhancement decoder) for processing the at least one enhancement layer data, wherein the decoder further includes a plug-in system implemented using software to interface between the base function arrangement and the enhancement layer arrangement, and an applications layer arrangement (e.g. a functional layer) for executing one or more software applications executable on computing hardware for controlling operation of the decoder; wherein the decoder further includes an orchestration unit (e.g. a decoder integration layer) for adapting or selecting one or more plugins of the plugin system to use when data is communicated between the enhancement layer arrangement and the base function arrangement; and wherein the orchestration unit, when in operation, reconfigures the decoder via the plugin system to accommodate to at least changes in operating characteristics of the base function arrangement and the enhancement layer arrangement.
This aspect is of advantage in that use of the plugin system in combination with the orchestration unit enables the decoder to be reconfigurable and adaptable to changes in at least one of the base function arrangement and the enhancement layer arrangement.
Optionally, in the decoder, the orchestration unit, when in operation, monitors changes in the operating characteristics of the base function arrangement and the enhancement layer arrangement, and reconfigures the plugin system as a function of the changes in the operating characteristics.
Optionally, in the decoder, the orchestration unit (e.g. the decoder integration layer) is arranged in operation to apply test data to the base function arrangement and the enhancement layer arrangement to determine their operating characteristics, and to implement a selection or an adaptation of the plugin system as a function of the operating characteristics. More optionally, in the decoder, the selection or the adaptation is implemented using at least one of machine learning and artificial intelligence (AI) algorithms.
Optionally, the decoder includes a parsing unit for parsing the input data to divide the input data into the base layer data for the base function arrangement, and into the at least one enhancement layer data for the enhancement layer arrangement.
Optionally, in the decoder, the applications layer arrangement is updatable with enhancement functions that provide for additional functionalities to be provided by the decoder.
Optionally, in the decoder, the base function arrangement implements a base layer decoder complying with industry-recognised encoding standards.
According to a further illustrative aspect, there is provided a method for (namely, a method of) controlling a decoder for decoding input data including a plurality of layers of data in a hierarchical structure, wherein the plurality of layers includes base layer data and at least one enhancement layer data, wherein the at least one enhancement layer data is useable to enhance a rendition of the base layer data at a first level of quality to an enhanced rendition at a second level of quality, the second level of quality being higher than the first level of quality, characterized in that the method includes: (a) arranging for the decoder to include a base function arrangement for processing the base layer data, and an enhancement layer arrangement for processing the at least one enhancement layer data, (b) arranging for the decoder to include a plug-in system implemented using software to interface between the base function arrangement and the enhancement layer arrangement, and an applications layer arrangement for executing one or more software applications executable on computing hardware for controlling operation of the decoder; (c) arranging for the decoder to include an orchestration unit for adapting or selecting one or more plugins of the plugin system to use when data is communicated between the enhancement layer arrangement and the base function arrangement; and (d) arranging for the orchestration unit, when in operation, to reconfigure the decoder via the plugin system to accommodate to at least changes in operating characteristics of the base function arrangement and the enhancement layer arrangement.
Examples of systems and methods in accordance with the invention will now be described with reference to the accompanying drawings.
This disclosure describes an implementation for integration of a hybrid backward-compatible coding technology with existing decoders, optionally via a software update. In a non-limiting example, the disclosure relates to an implementation and integration of MPEG-5 Part 2 Low Complexity Enhancement Video Coding (LCEVC). LCEVC is a hybrid backward-compatible coding technology which is a flexible, adaptable, highly efficient and computationally inexpensive coding format combining a different video coding format, a base codec (i.e. an encoder-decoder pair such as AVC/H.264, HEVC/H.265, or any other present or future codec, as well as non-standard algorithms such as VP9, AV1 and others) with one or more enhancement levels of coded data.
Example hybrid backward-compatible coding technologies use a down-sampled source signal encoded using a base codec to form a base stream. An enhancement stream is formed using an encoded set of residuals which correct or enhance the base stream, for example by increasing resolution or by increasing frame rate. There may be multiple levels of enhancement data in a hierarchical structure. In certain arrangements, the base stream may be decodable by a hardware decoder while the enhancement stream(s) may be suitable for software processing implementation with sustainable power consumption. The streams are thus considered to be a base stream and one or more enhancement streams; typically two enhancement streams are possible, and often one enhancement stream is used.
The video frame is encoded hierarchically as opposed to using block-based approaches as done in the MPEG family of algorithms. Hierarchically encoding a frame includes generating residuals for the full frame, and then a reduced or decimated frame and so on. In the examples described herein, residuals may be considered to be errors or differences at a particular level of quality or resolution.
For context purposes only, it is noted that the detailed structure of LCEVC is known and set out in the approved draft standards specification.
By an additional PID we mean an additional track or PID; that is, not only an MPEG Transport Stream PID but also the equivalent in ISO Base Media File Format and WebM container types.
Throughout the present description, the invention may be described in the context of NAL units. However, it should be understood that the NAL units in this context may refer equally and more generally to elementary stream input buffers, or equivalent. That is, LCEVC is equally capable of supporting non-MPEG base codecs, e.g. VP8/VP9 and AV1, which typically do not use NAL encapsulation. Where the term NAL unit is used, it may therefore be read to mean an elementary stream input buffer, depending on the base codec utilised.
LCEVC can be rapidly implemented in existing decoders with a software update and is inherently backwards-compatible since devices that have not yet been updated to decode LCEVC are able to play the video using the underlying base codec, which further simplifies deployment.
In this context, there is proposed herein a decoder implementation to integrate decoding and rendering with existing systems and devices that perform base decoding. The integration is easy to deploy. It also enables the support of a broad range of encoding and player vendors and can be updated easily to support future systems.
The proposed decoder implementation may be provided through an optimised software library for decoding MPEG-5 LCEVC enhanced streams, providing a simple yet powerful control interface or API. This allows developers flexibility and the ability to deploy LCEVC at any level of a software stack, e.g. from low-level command-line tools to integrations with commonly used open-source encoders and players.
The terms LCEVC and enhancement may be used herein interchangeably; for example, the enhancement layer may comprise one or more enhancement streams, that is, the residual data of the LCEVC enhancement.
As noted above, when we refer to NAL units here, we refer to elementary stream input buffers, or equivalent, depending on the base codec used.
NAL units 24 comprising the encoded video signal together with associated enhancement data may be provided in one or more input buffers. For non-MPEG base codecs, such as VP8/VP9 or AV1, a similar non-MPEG elementary stream input buffer may be used instead. The input buffers may be fed (or made available) to the base decoder 26 and to the decoder integration layer 27, in particular the enhancement decoder that is controlled by the decoder integration layer 27. In certain examples, the encoded video signal may comprise an encoded base stream and be received separately from an encoded enhancement stream comprising the enhancement data; in other preferred examples, the encoded video signal comprising the encoded base stream may be received together with the encoded enhancement stream, e.g. as a single multiplexed encoded video stream. In the latter case, the same buffers may be fed (or made available) to both the base decoder 26 and to the decoder integration layer 27. In this case, the base decoder 26 may retrieve the encoded video signal comprising the encoded base stream and ignore any enhancement data in the NAL units. For example, the enhancement data may be carried in SEI messages for a base stream of video data, which may be ignored by the base decoder 26 if it is not adapted to process custom SEI message data. In this case, the base decoder 26 may operate as a conventional base decoder that is unaware of the enhancement layer.
On receipt of the encoded video signal comprising the encoded base stream, the base decoder 26 is configured to decode and output the encoded video signal as one or more base decoded frames. This output may then be received or accessed by the decoder integration layer 27 for enhancement. In one set of examples, the base decoded frames are passed as inputs to the decoder integration layer 27 in presentation order.
The decoder integration layer 27 extracts the LCEVC enhancement data from the input buffers and decodes the enhancement data. Decoding of the enhancement data is performed by the enhancement decoder 27b, which receives the enhancement data from the input buffers as an encoded enhancement signal and extracts residual data by applying an enhancement decoding pipeline to one or more streams of encoded residual data. For example, the enhancement decoder 27b may implement an LCEVC standard decoder as set out in the LCEVC specification.
A decoder plug-in is provided at the decoder integration layer to control the functions of the base decoder. In certain cases, the decoder plug-in 27a may handle receipt and/or access of the base decoded video frames and apply the LCEVC enhancement to these frames, preferably during playback. In other cases, the decoder plug-in may arrange for the output of the base decoder 26 to be accessible to the decoder integration layer 27, which is then arranged to control addition of a residual output from the enhancement decoder to generate the output surface 28. Once integrated in a decoding device, the LCEVC decoder 25 enables decoding and playback of video encoded with LCEVC enhancement. Rendering of a decoded, reconstructed video signal may be supported by one or more GPU functions 27c such as GPU shaders that are controlled by the decoder integration layer 27.
In general, the decoder integration layer 27 controls operation of the one or more decoder plug-ins and the enhancement decoder to generate a decoded reconstruction of the original input video signal 28 using a decoded video signal from the base decoding layer (i.e. as implemented by the base decoder 26) and the one or more layers of residual data from the enhancement decoding layer (i.e. as implemented by the enhancement decoder). The decoder integration layer 27 provides a control interface, e.g. to applications within a client device, for the video decoder 25.
Depending on configuration, the decoder integration layer may output the surface 28 of decoded data in different ways, for example as a buffer, as an off-screen texture or as an on-screen surface. Which output format to use may be set in configuration settings that are provided upon creation of an instance of the decoder integration layer 27, as further explained below.
In certain implementations, where no enhancement data is found in the input buffers, e.g. where the NAL units 24 do not contain enhancement data, the decoder integration layer 27 may fall back to passing through the video signal at the lower resolution to the output, that is, the output of the base decoding layer as implemented by the base decoder 26. In this case, the LCEVC decoder 25 may operate as per a conventional base-only video decoding pipeline.
The decoder integration layer 27 can be used for both application integration and operating system integration, e.g. for use by both client applications and operating systems. The decoder integration layer 27 may be used to control operating system functions, such as function calls to hardware accelerated base codecs, without the need for a client application to have knowledge of these functions. In certain cases, a plurality of decoder plug-ins may be provided, where each decoder plug-in provides a wrapper for a different base codec. It is also possible for a common base codec to have multiple decoder plug-ins. This may be the case where there are different implementations of a base codec, such as a GPU accelerated version, a native hardware accelerated version and an open-source software version.
The set of decoder plug-ins are configured to present a common interface (i.e. a common set of commands) to the decoder integration layer 27, such that the decoder integration layer 27 may operate without knowledge of the specific commands or functionality of each base decoder. The plug-ins thus allow for base codec specific commands, such as MediaCodec, VTDecompressionSession or MFT, to be mapped to a set of plug-in commands that are accessible by the decoder integration layer 27 (e.g. multiple different decoding function calls may be mapped to a single common plug-in “Decode( . . . )” function).
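As a sketch only (the class and method names are assumptions for illustration), such a common interface could be an abstract base class that each platform-specific plug-in implements:

    // Hypothetical common plug-in interface presented to the decoder
    // integration layer.
    #include <cstddef>
    #include <cstdint>

    class BaseDecoderPlugin {
    public:
        virtual ~BaseDecoderPlugin() = default;
        virtual bool open() = 0;                     // initialise the base codec
        virtual bool decode(const std::uint8_t* data,  // encoded base data
                            std::size_t length) = 0;
        virtual bool nextFrame(/* DecodedFrame& out */) = 0;  // presentation
                                                              // order
        virtual void close() = 0;                    // release the base codec
    };

An Android plug-in might implement decode() by issuing MediaCodec calls, an Apple plug-in by driving a VTDecompressionSession, and a Windows plug-in by invoking an MFT, all behind this single interface.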
Since the decoder integration layer 27 effectively comprises a ‘residuals engine’, i.e. a library that produces, from the LCEVC encoded NAL units, a set of correction planes at different levels of quality, the layer can behave as a complete decoder through control of the base decoder.
For simplicity, we will refer to the instructing entity here as the client but it will be understood that the client may be considered to be any application layer or functional layer and that the decoder integration layer 27 may be integrated simply and easily into a software solution. The terms client, application layer and user may be used herein interchangeably.
In an application integration, the decoder integration layer 27 may be configured to render directly to an on-screen surface, provided by a client, of arbitrary size (generally different from the content resolution). For example, even though a base decoded video may be Standard Definition (SD), the decoder integration layer 27, using the enhancement data, may render surfaces at High Definition (HD), Ultra High Definition (UHD) or a custom resolution. Further details of out-of-standard methods of upscaling and post-processing that may be applied to a LCEVC decoded video stream are found in PCT/GB2020/052420, the contents of which are incorporated herein by reference. Example application integrations include, for example, use of the LCEVC decoder 25 by ExoPlayer, an application level media player for Android, or VLCKit, an objective C wrapper for the libVLC media framework. In these cases, VLCKit and/or ExoPlayer may be configured to decode LCEVC video streams by using the LCEVC decoder 25 “under the hood”, where computer program code for VLCKit and/or ExoPlayer functions is configured to use and call commands provided by the decoder integration layer 27, i.e. the control interface of the LCEVC decoder 25. A VLCKit integration may be used to provide LCEVC rendering on iOS devices and an ExoPlayer integration may be used to provide LCEVC rendering on Android devices.
In an operating system integration, the decoder integration layer 27 may be configured to decode to a buffer or draw on an off-screen texture of the same size of the content final resolution. In this case, the decoder integration layer 27 may be configured such that it does not handle the final render to a display, such as a display device. In these cases, the final rendering may be handled by the operating system, and as such the operating system may use the control interface provided by the decoder integration layer 27 to provide LCEVC decoding as part of an operating system call. In these cases, the operating system may implement additional operations around the LCEVC decoding, such as YUV to RGB conversion, and/or resizing to the destination surface prior to the final rendering on a display device. Examples of operating system integration include integration with (or behind) MFT decoder for Microsoft Windows® operating systems or with (or behind) Open Media Acceleration (OpenMAX—OMX) decoder, OMX being a C-language based set of programming interfaces (e.g. at the kernel level) for low power and embedded systems, including smartphones, digital media players, games consoles and set-top boxes.
These modes of integration may be set by a client device or application and the mechanism for selection and configuration will be described in more detail below.
As described above, to integrate an LCEVC decoder such as 25 into a client, i.e. an application or operating system, a decoder integration layer such as 27 provides a control interface, or API, to receive instructions and configurations and exchange information.
At step 30, a first API call (e.g. “Create( . . . )”) is used to create an instance of the decoder integration layer. The term “instance” is used here as per its normal use within the field of computer programming, e.g. an instance may be an instance of a defined software class or object. If successful, the call returns an indication of success. In creating the instance, the decoder integration layer may retrieve a configuration data structure comprising a set of decoding settings. The specifics of the configuration system will be described below but the decoder integration layer may be unable to initialise an instance of the layer if the settings are incorrect or not retrieved properly. The layer is initialised in accordance with the values of the settings specified in the configuration data structure.
The main phases of the operation of the decoder integration layer are provided by two further types of API call: one, shown at step 31, is a feed input call (e.g. “FeedInput( . . . )”) which passes an input buffer comprising the encoded video signal and the encoded enhancement signal to the video decoder; the other, shown at step 32, is a decode call (e.g. “Decode( . . . )”) in response to which the decoder integration layer controls operation of the one or more decoder plug-ins and the enhancement decoder to generate the decoded reconstruction of the original input video signal. A destruct call (e.g. “Destroy( . . . )”), shown at step 33, may be used to destroy the instance of the decoder integration layer when decoding is finished.
A fifth type of API call is shown at step 34. This API call (e.g. SetCallBack( . . . )) may be used to register a call back to the client from the decoder integration layer, such that the client is informed by the decoder integration layer once a particular event has occurred. For example, a call back may be registered based on either the output of final (e.g. highest) resolution data from the LCEVC decoding process or on a surface change event. The call back may be thought of as an alert from the decoder integration layer and may be subsequently used by the client to perform post processing while or before rendering the decoded data. For example, the call back may allow the client to override a content resolution of the base decoding (e.g. the LCEVC decoding may generate a surface that is at a higher resolution than the default base decoding and this may need to be handled by the client). The client may register for a variety of call backs, such as when a surface has changed or when decoding is complete in asynchronous mode as set out below.
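By way of illustration (the event names and the callback signature are assumptions), registering and handling such call backs might look as follows:

    // Hypothetical call back events from the decoder integration layer.
    enum class DILEvent { OutputReady, SurfaceChanged };

    typedef void (*DILEventCallback)(DILEvent event, void* userData);

    void onDilEvent(DILEvent event, void* /*userData*/) {
        if (event == DILEvent::SurfaceChanged) {
            // e.g. the enhanced surface is at a higher resolution than the
            // default base decoding; the client adjusts its display here.
        } else {
            // OutputReady: post-process while or before rendering the frame.
        }
    }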
The API calls that instruct the decoder integration layer may come in different modes, synchronous or asynchronous, with each mode being signalled through the use of a specific API call for that decode mode. It will be noted that instead of signalling these modes through the use of a specific function, or API call, for each mode, the choice of decoding mode may instead be signalled in the configuration data structure retrieved upon initialisation of the decoder integration layer.
In synchronous mode, a caller is blocked on the API call (e.g. to the control interface of the decoder integration layer) until the entire decoding is completed. In asynchronous mode, the call returns immediately and then when the decode completes, the decoder integration layer calls back the client's registered call back function. The asynchronous mode may allow the caller (i.e. the client) to perform other operations while waiting for the decoded data.
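A sketch of the two modes, using the hypothetical interface above (DIL_DecodeAsync is an assumed name for the asynchronous variant):

    // Illustrative synchronous vs asynchronous decoding.
    int DIL_DecodeAsync(DILHandle* instance);  // assumed asynchronous variant

    void onDecodeDone(void* /*userData*/) {
        // consume the decoded reconstruction here
    }

    void decodeSynchronously(DILHandle* dil) {
        DIL_Decode(dil);              // blocks until decoding completes
    }

    void decodeAsynchronously(DILHandle* dil) {
        DIL_SetCallback(dil, onDecodeDone, nullptr);
        DIL_DecodeAsync(dil);         // returns immediately
        // ...the caller performs other operations; onDecodeDone fires when
        // the decode completes.
    }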
It was noted above how, upon creation of an instance of the decoder integration layer, the layer may retrieve a data structure comprising a set of configuration settings set by the client or populated by default. The data structure may be a program code data structure such as the C/C++ “struct” or any suitable mechanism to pass a set of values, such as a simple string, a JavaScript Object Notation (JSON) string, or Extensible Markup Language (XML) data.
Via one configuration setting, the decoder integration layer may be configured to work with different types of internal pipeline. For example, a particular internal pipeline may control how stages of the decoding operation are to be performed. In one case, different types of internal pipeline may distribute computation over one or more Central Processing Units (CPUs) and/or Graphical Processing Units (GPUs). In one case, two types of internal pipeline may be provided. A first example type may relate to a CPU-led operation, where the LCEVC stages (e.g. all the stages) are performed on the CPU of a computing device running the LCEVC decoder. A CPU-led mode may only use Single Instruction, Multiple Data (SIMD) acceleration, e.g. based on the implementation of the decoder plug-in(s) only. For this first example type, a GPU may be used only for possible YUV/RGB conversion, and the GPU functions 27c described above may not be used. A second example type may relate to a GPU-led operation, where most of the LCEVC stages are performed on the GPU using GL shaders, including YUV/RGB conversions, while the CPU is used only to produce the LCEVC residual planes; this second type may make use of the GPU functions 27c.
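A sketch of such a pipeline mode parameter (the names are illustrative assumptions only):

    // Hypothetical pipeline mode parameter controlling the CPU/GPU split.
    enum class PipelineMode {
        CpuLed,  // all LCEVC stages on the CPU (SIMD acceleration only);
                 // the GPU is used, at most, for YUV/RGB conversion
        GpuLed   // most LCEVC stages in GL shaders, including YUV/RGB
                 // conversion; the CPU only produces LCEVC residual planes
    };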
Myriad configurations may be set by the configuration data that is passed or set upon creation of an instance of the decoder integration layer. Further non-limiting examples which the client can configure in the decoder integration layer include the decoding settings discussed above, such as graphics library versions, the bit-depth of the LCEVC residual planes, the use of hardware buffers, user interface configurations and logging options.
As noted, the five main types of API call listed above are not exhaustive and may be supplemented by further calls.
A further API call (e.g. Skip( . . . )) may be defined and used to allow the client to indicate to the decoder integration layer that it should skip the processing of certain frames. The call may indicate that the decoder integration layer should skip decoding of the encoded video signal by the base layer or skip decoding of the enhancement data by the enhancement layer, that is, pass through as the output the input video signal at the lower resolution as decoded by the base decoding layer.
In certain cases, an LCEVC stream may be configured to use a temporal mode for encoded data. In this case, if the LCEVC content is encoded with a temporal feature, the LCEVC decoder may require each and every frame to be processed in order to keep internal temporal reference data (e.g. as used by a current decoder plug-in) correct. As a result, when the client skips frames, e.g. because of a seek in the timeline, or drops frames because they are “late,” this skip call may be used by the client to alert the decoder integration layer. The decoder integration layer may be aware of whether the temporal feature is on, and therefore can fall back to a “no operation” case. For example, the decoder integration layer can ensure appropriate processing in the temporal mode by a decoder plug-in but provide a “skip” behaviour in output provided to the client.
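The following sketch illustrates this behaviour (assuming a hypothetical skip entry point and an internal flag for the temporal feature):

    // Illustrative skip handling inside the decoder integration layer.
    void handleSkip(bool temporalFeatureEnabled) {
        if (temporalFeatureEnabled) {
            // Perform a minimal frame decoding that only updates internal
            // temporal reference data; provide "skip" behaviour in output.
        } else {
            // Temporal feature off: fall back to a "no operation" case.
        }
    }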
Via an initialisation command 40, the client is able to create and fill a settings data structure. Via an instance creation command 41, the client is able to create an instance of the decoder integration layer. The instance creation command 41 may be passed the settings data structure generated via initialisation command 40. For example, the client may configure the decoder integration layer to use CPU mode for its internal pipeline and to output to a buffer.
Upon successful initialisation, e.g. via commands 40 and 41, an instance of the decoder integration layer is now ready to receive an input buffer. The client may pass an input buffer or input buffer data to the instance of the decoder integration layer via buffer handling command 42. This command may include informing the decoder integration layer of the NAL format and how to identify the enhancement encoded data from the input buffers, that is, from metadata, properties or parsing of the input buffers.
In certain implementations described above, a decoder plug-in that corresponds to the base codec for the encoded base video stream may be selected (either via configuration data passed to the decoder integration layer and/or via a determination of the encoded video signal configuration data). This decoder plug-in may be selected from a plurality of available decoder plug-ins that each form an interface to a different base decoder implementation. The decoder plug-in is configured to receive commands from the decoder and issue function calls to the base decoder (e.g. using specific commands for the determined base decoder implementation). When using hardware-accelerated base decoders, the decoder plug-in may make system calls to operating system functions that provide such base decoding. The decoder integration layer also controls the enhancement layer decoding of the enhancement (i.e. LCEVC enhancement) data. This may be performed in parallel with the base decoding. The decoder integration layer lastly controls the combination of the decoded base and enhancement data to generate a reconstructed output frame. The reconstruction of the video signal, e.g. the combining of the decoded base and residual data for an LCEVC data stream, may be performed by one or more of the decoder plug-in, the enhancement decoder and the decoder integration layer.
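A sketch of such plug-in selection, reusing the BaseDecoderPlugin sketch above and keyed by a base codec identifier (all names are illustrative assumptions):

    // Hypothetical plug-in selection from a registry of factories.
    #include <functional>
    #include <map>
    #include <memory>
    #include <string>

    using PluginFactory = std::function<std::unique_ptr<BaseDecoderPlugin>()>;

    std::unique_ptr<BaseDecoderPlugin> selectPlugin(
            const std::map<std::string, PluginFactory>& registry,
            const std::string& baseCodec) {    // e.g. "avc", "hevc", "av1"
        const auto it = registry.find(baseCodec);
        if (it == registry.end()) return nullptr;  // no matching plug-in
        return it->second();                       // construct the wrapper
    }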
It will be noted that, in a situation of software applications providing various functions when executed on computing hardware, there is the need to be able to modify behaviours of the functions for various reasons, including adapting or enhancing their performance. These functions are susceptible to being varied and include video and audio processing, metadata extraction, image processing, error corrections, data security, detection and recognition and so forth. The pipeline set out herein addresses these challenges.
It is further noted that the aforesaid decoder is beneficially configured so that an enhancement is achieved that improves performance of the base function (for example, comprising a set of functions) within the application layer with minimal or no disruption to the system, so that the enhancement can be deployed in systems already in use. Such an advantage is achieved through a combination of the plug-in system and the enhancement integration layer.
In addition to controlling the operation of the generation of the decoded reconstruction of the original input signal at the decoder integration layer, beneficially, multiple plug-ins can be created for multiple functions of a base function, so that the enhancement can be applied to a selected subset of functions of the base function simply by deploying a new plug-in. New plug-ins can also be developed with the enhancement in place to improve the performance of an additional base system provided in the decoder.
Beneficially, capability extension is an optional small amount of modification that can be applied to a functional layer to provide it with additional functionality.
Where no modifications are applicable to the base function, the enhancement becomes a proxy for the functionality of both the base function and the enhanced base function.
When modifying the decoder, the enhancement integration layer, the plug-in system and the base function can be distributed mutually separately and at different temporal instances by relevant owners of software components to be incorporated into the decoder. Beneficially, add-ons thereby provided to the decoder are not disruptive but provide seamless improvements in performance of the decoder.
Extra functionalities that are susceptible to being added to the decoder include at least one of: dynamic bit rate control, various up-sampling functions, various down-sampling functions, dynamic adaptive quantization (for example in an encoder), dynamic adaptive dequantisation (for example in a decoder), various implementations of motion compensation, various high-dynamic range colour enhancements, error correction.
Beneficially, in an embodiment of the decoder, the enhancement and the base function can be arranged to provide a combined output such as encoded video data or decoded video data.
Enhancements applied to the decoder can be optionally implemented using one or more software modules that are introduced or exchanged within the decoder. For example, the one or more software modules beneficially include an orchestration (i.e. decoder integration) module and one or more core enhancement (i.e. enhancement decoder) modules.
The orchestration (i.e. decoder integration) module beneficially functions using one or more machine learning algorithms (for example, in a manner of artificial intelligence, AI) to characterize performance and functionality of the one or more functions of the base function and characteristics of the enhancement integration layer, for example in respect of input/output requirements and execution performance (for example, latency, compression efficiency, data format requirements and so forth), and from such characterization to select and/or adapt plug-ins of the plug-in system accordingly. For example, such characterization optionally involves the orchestration (i.e. decoder integration) module applying test data to the user functions and/or to the enhancement decoder layer to determine how they function, and whether or not their operation has temporally changed, for example as a result of a software upgrade being implemented or a physical hardware device being exchanged or replaced. When a given computational task arises, for example image compression, object analysis in images, or image enhancement, the orchestration module beneficially selects how much of the computational task is implemented in the base function and how much is implemented by the enhancement decoder layer, so that the decoder operates as efficiently as possible. The orchestration module beneficially includes a deep-learning neural network that is taught a priori, by example, how to configure the plug-ins of the plug-in system of the decoder to cope with various scenarios arising within the base function and/or the enhancement integration layer, for example by selecting and/or configuring plug-ins of the plug-in system.
A further example of this latter arrangement is now described.
There is shown an example of a decoder; the decoder is indicated generally by 500. The decoder 500 includes a base function 510 that, for example, implements the aforesaid base layer decoder. Optionally, the base function 510 includes a plurality of mutually different base layer decoders that can be selected depending on a nature of the input signal Sin provided to the decoder 500. The base function 510 is optionally implemented, at least in part, using hardware, for example by employing custom decoder integrated circuits or field programmable gate arrays (FPGAs). The decoder 500 further includes an application layer 520 that is used to manage specific functionalities, for example controlling a graphical user interface (GUI) for the decoder 500, performing object recognition in video data, video editing and so forth. The application layer 520 is conveniently implemented using one or more software products executable on computing hardware. The application layer 520 is susceptible to being periodically upgraded with extra functionality referred to as being ‘capability extension’, as denoted by 530; the application layer 520 therefore is reconfigurable as a function of time to accommodate new requirements and functionalities.
The decoder 500 also includes an enhancement integration layer 540, for example that implements the enhancement layer decoder as aforementioned. The enhancement integration layer 540 is beneficially implemented, at least in part, using one or more software products that are executable on computing hardware. However, it will be appreciated that the enhancement integration layer 540 potentially also needs upgrading from time-to-time as improved approaches to enhancing output from the base function 510 arise. With both the base function 510 and the enhancement integration layer 540 temporally being changed by software upgrades, technical problems can arise in respect of mutual compatibility of the enhancement integration layer 540 and the base function 510.
In order to address these technical problems, the decoder 500 further includes a plug-in system 550 that includes one or more software plug-ins that provide a flexible interface between the base function 510 and the enhancement integration layer 540. Thus, changes occurring as a function of time, as adaptations are made to the base function 510 and the enhancement integration layer 540, are accommodated by modifications in the plug-in system 550.
Conventionally, the application layer 520, for example a given program being run by a given user, and the base function 510 are tightly coupled and belong to a same given entity, for example a given software programmer or a given software house that built the program. However, contemporarily, it is no longer the case that there is tight coupling, and both open source and proprietary software allow for additional functionality to be made available through a variety of methods, including contributions to the decoder 500, plug-ins and extensions. However, the functionality of the base function 510 remains in a domain of its constructor, who may not always be incentivised or capable of improving its performance. Such problems with maintenance and upgrading of software in respect of the decoder 500 are addressed in examples.
The enhancement integration layer 540, when in operation, improves a performance of one or more functions of the base function 510 by modifying one or more of inputs, outputs and controls of the base function 510, for example via modification of the plug-in system 550. It will be appreciated that the enhancement provided in the enhancement integration layer 540 is optionally very similar, and sometimes equal, to a function of the base function 510 that it enhances.
In other words, in the decoder 500, there is employed a base function plug-in to provide an interface between the enhancement integration layer 540 and the base function 510 to ensure that an enhancement that is thereby provided can improve the performance of the base function 510, even in a condition where the base function 510 is unaware of the existence of the enhancement. The base function plug-in is designed to link dynamically and to load a given function of the base function 510 in order for the decoder 500 to perform well even under a condition where the given function of the base function 510 is not present. By such an approach, there is provided a clear separation of operation between the enhancement and the base function 510, wherein the separation maintains an ability for the enhancement integration layer 540 and the base function 510 to be distributed, priced, maintained and updated separately, for example by mutually different vendors.

At a decoder, for example implemented in a client device or a client device decoding from a data store, the methods and processes described herein can be embodied as code (e.g., software code) and/or data. The decoder may be implemented in hardware or software as is well-known in the art of data compression. For example, hardware acceleration using a specifically programmed Graphical Processing Unit (GPU) or a specifically designed Field Programmable Gate Array (FPGA) may provide certain efficiencies. For completeness, such code and data can be stored on one or more computer-readable media, which may include any device or medium that can store code and/or data for use by a computer system. When a computer system reads and executes the code and/or data stored on a computer-readable medium, the computer system performs the methods and processes embodied as data structures and code stored within the computer-readable storage medium. In certain embodiments, one or more of the steps of the methods and processes described herein can be performed by a processor (e.g., a processor of a computer system or data storage system).
Generally, any of the functionality described in this text or illustrated in the FIGS. can be implemented using software, firmware (e.g., fixed logic circuitry), programmable or nonprogrammable hardware, or a combination of these implementations. The terms “component” or “function” as used herein generally represents software, firmware, hardware or a combination of these. For instance, in the case of a software implementation, the terms “component” or “function” may refer to program code that performs specified tasks when executed on a processing device or devices. The illustrated separation of components and functions into distinct units may reflect any actual or conceptual physical grouping and allocation of such software and/or hardware and tasks.
Number | Date | Country | Kind
---|---|---|---
2011670.3 | Jul 2020 | GB | national
2018723.3 | Nov 2020 | GB | national

Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/GB2021/051940 | 7/28/2021 | WO |