The present disclosure relates to systems for removal of lossy compression artifacts to improve image quality and reduce bandwidth requirements using neural networks.
Data compression involves encoding information using fewer bits than an original representation. Typically, data compression is carried out in two discrete steps of encoding and decoding. During the encoding step, the input stream is transformed according to the compression scheme into a coded representation. During the decoding step, the inverse transformation is applied and the coded representation is restored or nearly restored to the original input stream. A special case of data compression is transcoding, where data compressed in a first compression scheme is decoded and then recoded using an encoder from a second compression scheme.
Data compression can be either lossless or lossy. Lossless compression reduces bits of information by identifying and eliminating statistical redundancy. During lossless compression, no information is actually lost and all the bits from the original representation can be recovered during the decoding (decompression) process. In contrast, lossy compression does not retain all the bits of the original representation during encoding, but instead removes bits that are not useful or important according to some metric. This process can greatly reduce the overall number of bits, at a cost of quality degradation. Unfortunately, lossy compression can result in compression artifacts. Examples of compression artifacts include blocking artifacts, cosine or wavelet transform artifacts, quantization artifacts, aliasing artifacts, etc.
Digital image or video cameras typically require a digital image processing pipeline that converts signals received by an image sensor into a usable image by use of image processing algorithms and filters. Because of the large quantity of associated digital information, data encoding, decoding, and transcoding using lossy compression schemes are often used to support connections to streaming devices. What is needed are systems and methods for removal of lossy compression artifacts to improve image quality and reduce bandwidth.
Non-limiting and non-exhaustive embodiments of the present disclosure are described with reference to the following figures, wherein like reference numerals refer to like parts throughout the various figures unless otherwise specified.
In some of the following described embodiments, methods, processing schemes, and systems for improving neural network (NN) processing are described. As will be disclosed in more detail, improved neural network processing encompasses a data compression system that can include a neural compression artifact removal module (NCARM) arranged to receive compressible data and output data with compression artifacts removed. A lossy compression module can be arranged to at least one of receive and send data to the NCARM, and a decompression module can likewise be arranged to at least one of receive and send data to the NCARM. In some embodiments, the NCARM sends data to the lossy compression module. Alternatively, the NCARM can receive data from the decompression module and/or data from the lossy compression module. As will be understood, while any lossy compressible data can be processed with the described system, in some embodiments the data is at least one of audio and video.
In other embodiments a data compression system can include a neural compression artifact removal module (NCARM) forming a portion of at least one of a transcoder and a lossy compression module, with the NCARM arranged to receive compressible data and output data with compression artifacts removed. A data encoder and decoder can be respectively connected to the NCARM, and an NCARM calibration module can be arranged to at least one of receive and send data to the NCARM.
In other embodiments a data compression system can include a neural compression artifact removal module (NCARM) forming a portion of at least one of a transcoder, a lossy compression module, an encoding module, and a decoding module, with the NCARM arranged to receive compressible data and output data with compression artifacts removed. A data encoder and decoder can be respectively connected to the NCARM. An NCARM neural network forms a portion of one of the data encoder and decoder.
In another embodiment a camera data compression system can include a camera and a neural compression artifact removal module (NCARM) connected to the camera. The NCARM can be arranged to receive compressible data and output data with compression artifacts removed. In some embodiments the NCARM is operable on the camera, while in other embodiments the NCARM is operable on a cloud or VMS system that receives compressible data from the camera.
In one embodiment, data can include but is not limited to a wide range of video, audio, streaming, sensor, or control data. Lossy compression is often used in such applications in part because of their large input complexity, and also because of the high degree of redundancy in their data streams. Processing data according to this disclosure can benefit image quality, reduce file size and bandwidth requirements, and improve downstream machine or artificial intelligence (AI) application performance. In effect, lossy data compression reduces file size at the expense of some signal loss. In addition to signal loss, in many cases the compression process will also introduce undesirable artifacts. Using neural network technology allows partial recovery of lost signals while also removing compression artifacts. The described systems and methods can improve signal fidelity and further reduce file size or bandwidth requirements, thereby enabling aggressive compression without compromising the original signal.
As will be understood, a wide variety of compression schemes can be used in the systems described in this disclosure. For example, both intra-frame (which uses single frame image compression) and inter-frame (which uses one or more preceding and/or succeeding frames in a sequence to compress the contents of the current frame) video compression systems can benefit from neural network mediated artifact removal. Common compression schemes include but are not limited to Motion JPEG (M-JPEG), MPEG-1 (CD, VCD), MPEG-2 (DVD), MPEG-4, and H.264 based compression (encoding) and decompression (decoding) schemes.
In one embodiment, the module providing neural network assisted compression artifact removal (NCARM) services is a neural network that has been calibrated (trained) to remove input complexity and compression artifacts, while preserving data fidelity. In the case of images and video, the network receives as input an image or a sequence of images, and outputs an enhanced image or sequence of images. Depending on system architecture, the NCARM processing module can be applied during the encoding phase, during the decoding phase, or during the transcoding phase. Further, the module can be standalone, or integrated with the decoder or encoder. In general, the processing submodule accepts as input an unencoded data stream and removes data complexity from the signal such that 1) the encoding process is more efficient with fewer artifacts, or 2) any artifacts resulting from the compression scheme are removed.
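By way of a non-limiting sketch, the following Python fragment illustrates these three placement options. The names `ncarm`, `encode`, `decode`, `decode_a`, and `encode_b` are hypothetical stand-ins for a trained artifact-removal network and lossy codec routines, not references to any particular library.

```python
# Non-limiting sketch of NCARM placement in a compression pipeline.
# All callables are hypothetical stand-ins, not a specific library API.

def encode_with_ncarm(frame, ncarm, encode):
    """Encoder-side placement: remove complexity before lossy encoding."""
    simplified = ncarm(frame)      # strip noise/complexity pre-encode
    return encode(simplified)      # codec spends its bits on useful signal

def decode_with_ncarm(bitstream, ncarm, decode):
    """Decoder-side placement: remove artifacts after lossy decoding."""
    degraded = decode(bitstream)   # frame with blocking/ringing artifacts
    return ncarm(degraded)         # restore fidelity post-decode

def transcode_with_ncarm(bitstream, ncarm, decode_a, encode_b):
    """Transcoder placement: clean between first-scheme decode and second-scheme encode."""
    return encode_b(ncarm(decode_a(bitstream)))
```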
In some embodiments, calibration for the module providing neural network assisted compression artifact removal (NCARM) can be made using manual or automated parameters. This can be accomplished via training of the processing submodule's neural network (NN), whereby some loss function is minimized or maximized. In some embodiments a calibration module receives both the high-fidelity signal as well as the encoded-decoded signal that has been degraded. The calibration submodule adjusts the processing submodule's parameters such that much of the original high-fidelity signal is restored. In the absence of a “paired” high-fidelity/degraded signal, a reference high-quality stream can be used. In this case, the calibration submodule does not attempt to restore the degraded stream to an identical copy of the high-fidelity stream, but via methods such as generative-adversarial training attempts to match the statistical distribution of the degraded stream with that of the high-fidelity stream.
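One possible paired-calibration loop is sketched below in PyTorch, assuming a hypothetical `paired_loader` that yields (degraded, high-fidelity) tensor batches. It minimizes an L1 reconstruction loss; in the unpaired case described above, an adversarial loss term would take its place.

```python
import torch
import torch.nn.functional as F

def calibrate_ncarm(ncarm, optimizer, paired_loader, epochs=1):
    """Adjust the processing submodule's parameters so that much of the
    original high-fidelity signal is restored. `ncarm` is any torch.nn.Module;
    `paired_loader` (hypothetical) yields (degraded, high_fidelity) batches."""
    for _ in range(epochs):
        for degraded, high_fidelity in paired_loader:
            restored = ncarm(degraded)
            # L1 reconstruction loss against the paired high-fidelity signal;
            # for unpaired reference streams, a generative-adversarial loss
            # matching statistical distributions would replace this term.
            loss = F.l1_loss(restored, high_fidelity)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```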
As will be understood, various embodiments of neural networks (NN) can be used. For example, neural networks can include fully convolutional, recurrent, generative adversarial, or deep convolutional networks. Convolutional neural networks are particularly useful for image processing applications such as described herein. Images can be pre-processed with conventional pixel operations or can preferably be fed with minimal modifications into a trained convolutional neural network. Processing can proceed through one or more convolutional layers, pooling layers, a fully connected layer, and end with output suitable for encoding or decoding. In operation, one or more convolutional layers apply a convolution operation to the input, passing the result to the next layer(s). After convolution, local or global pooling layers can combine outputs into a single or small number of nodes in the next layer. Repeated convolutions, or convolution/pooling pairs, are possible. After neural network processing is complete, the output can be passed between neural networks, to another local neural network, or in addition or alternatively to cloud-based neural network processing for additional neural network-based modifications.
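A minimal sketch of such a convolution/pooling stack follows; the layer counts, channel widths, and 10-way output head are arbitrary illustrative choices rather than a prescribed design.

```python
import torch
import torch.nn as nn

class ConvPoolSketch(nn.Module):
    """Conv -> pool -> conv -> global pool -> fully connected, per the
    description above. All sizes are illustrative, non-limiting choices."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),   # convolutional layer
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                              # local pooling layer
            nn.Conv2d(32, 64, kernel_size=3, padding=1),  # repeated convolution
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),                      # global pooling to one node per channel
        )
        self.head = nn.Linear(64, 10)                     # fully connected layer

    def forward(self, x):                                 # x: (batch, 3, H, W)
        return self.head(self.features(x).flatten(1))

# e.g., out = ConvPoolSketch()(torch.rand(1, 3, 64, 64))
```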
One neural network embodiment of particular utility is a fully convolutional and recurrent neural network. A fully convolutional and recurrent neural network is composed of convolutional layers without any of the fully connected layers usually found at the end of a network. Advantageously, fully convolutional neural networks are image size independent, with any size images being acceptable as input for training or image modification. Recurrent behavior is provided by feeding at least some portion of the output back into the convolutional layers or to other connected neural networks.
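A non-limiting sketch of such a network follows; the channel count and single-layer recurrence are illustrative simplifications.

```python
import torch
import torch.nn as nn

class RecurrentFCN(nn.Module):
    """All layers are convolutional (no fully connected layers), so any input
    size is accepted; part of the output is fed back as a recurrent state."""
    def __init__(self, channels=16):
        super().__init__()
        self.channels = channels
        self.conv_in = nn.Conv2d(3 + channels, channels, kernel_size=3, padding=1)
        self.conv_out = nn.Conv2d(channels, 3, kernel_size=3, padding=1)

    def forward(self, frame, hidden=None):
        # frame: (batch, 3, H, W); hidden: state carried from the prior frame
        if hidden is None:
            b, _, h, w = frame.shape
            hidden = frame.new_zeros(b, self.channels, h, w)
        hidden = torch.relu(self.conv_in(torch.cat([frame, hidden], dim=1)))
        return self.conv_out(hidden), hidden  # enhanced frame, state to feed back

# Usage over a video sequence: hidden = None, then for each frame:
#   restored, hidden = net(frame, hidden)
```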
The various neural networks can identify and improve data compression for many types of artifacts. For example, capture noise originating from scene light and camera sensor is a common artifact. This noise is not caused by the encoding process but does contribute to the signal complexity (filesize/bandwidth) and quality. Capture noise can be divided into two cases: 1) low compression, where the noise is reasonably represented in the compressed video and artifacts can be identified as “graininess”, and 2) high compression, where noise is poorly represented in the compressed video, and where it can be identified as irregular vertical or horizontal lines, or even checkerboards when viewed closely.
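For training purposes, capture noise of this kind can be approximated synthetically. The fragment below is a non-limiting illustration using a simple additive Gaussian model; real sensor noise is partly signal-dependent, so this is only an approximation.

```python
import numpy as np

def add_capture_noise(frame, sigma=0.02):
    """Approximate capture noise on a float image in [0, 1] with an additive
    Gaussian model; real sensor noise is partly signal-dependent (shot noise),
    so this is a simplification for illustration and training purposes."""
    noisy = frame + np.random.normal(0.0, sigma, frame.shape)
    return np.clip(noisy, 0.0, 1.0)
```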
Many artifacts are due to representation by basis functions and use of quantization. Quantization noise is present because compression schemes often represent the data as a combination of basis functions (wavelet, discrete cosine, etc.). In the limit, these can perfectly represent videos. However, many compression schemes reduce or remove the high frequency components since these do not significantly impact human perception. When aggressively compressing a signal, high frequency components appear as small patches of horizontal, vertical, or checkerboard patterns. These are the basis functions approximating the original signal with some error. Fortunately, such artifact errors can be corrected by use of neural networks and the described NCARM systems and methods.
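The following non-limiting numerical sketch makes this concrete: an 8×8 block containing a sharp edge is transformed with an orthonormal DCT-II basis, its high-frequency coefficients are zeroed as a crude stand-in for aggressive quantization, and the reconstruction exhibits the ripple patterns described above.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix, as used in many block-based codecs."""
    k = np.arange(n).reshape(-1, 1)
    x = np.arange(n).reshape(1, -1)
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    mat[0, :] = np.sqrt(1.0 / n)   # DC row has a different normalization
    return mat

def crush_high_frequencies(block, keep=3):
    """Zero all but the lowest keep x keep frequency coefficients (a crude
    stand-in for aggressive quantization) and reconstruct the block."""
    c = dct_matrix(block.shape[0])
    coeffs = c @ block @ c.T       # forward 2-D DCT
    coeffs[keep:, :] = 0.0         # discard high vertical frequencies
    coeffs[:, keep:] = 0.0         # discard high horizontal frequencies
    return c.T @ coeffs @ c        # inverse 2-D DCT

# A block with a sharp edge gains visible ripples after reconstruction:
block = np.zeros((8, 8)); block[:, 4:] = 1.0
print(np.round(crush_high_frequencies(block), 2))
```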
Another type of artifact, known as a blocking artifact, arises because many types of compression schemes aim to reuse as much information as possible. One way is to take a “patch” from the current or nearby frames and reference it in multiple other areas. The patch is unlikely to perfectly represent the other areas, so some error must be compensated for. In aggressive compression this error compensation is traded for filesize. After compression and decompression, the resultant video data includes small square patches whose boundaries do not perfectly blend with their neighbors. Again, such artifact errors can be corrected by use of neural networks and the described NCARM systems and methods.
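As a non-limiting illustration, one simple heuristic for detecting such seams compares luminance jumps at 8-pixel block boundaries against jumps elsewhere; the function and its interpretation here are illustrative rather than part of any codec standard.

```python
import numpy as np

def blockiness_score(luma, block=8):
    """Ratio of mean absolute luminance jumps at block boundaries to jumps
    elsewhere; values well above 1.0 suggest visible block seams.
    `luma` is a 2-D array of luminance values."""
    col_jumps = np.abs(np.diff(luma, axis=1))           # horizontal gradients
    boundary = col_jumps[:, block - 1::block].mean()    # jumps at block edges
    interior = np.delete(col_jumps, np.s_[block - 1::block], axis=1).mean()
    return boundary / (interior + 1e-9)
```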
Another type of artifact, known as aliasing, can also occur after compression and decompression. Aliasing is the result of a limited spatial sampling rate for a given signal, and commonly appears as jagged edges or moiré patterns. Such artifact errors can be corrected by use of neural networks and the described NCARM systems and methods.
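A minimal one-dimensional illustration of the effect follows: decimating a near-Nyquist stripe pattern without low-pass filtering produces a spurious coarse pattern.

```python
import numpy as np

# Decimating a near-Nyquist stripe pattern with no anti-alias filter
# produces a spurious coarse pattern (the 1-D analog of a moire pattern).
x = np.arange(64)
fine_stripes = (np.sin(2 * np.pi * 0.45 * x) > 0).astype(float)  # fine detail
naive_downsample = fine_stripes[::4]   # limited sampling, no low-pass filter
print(naive_downsample)                # appears as a much lower frequency
```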
Artifacts can be automatically identified using machine intelligence techniques, or alternatively or in addition can be identified by a trained operator. The NCARM module can be trained to identify and remove these artifacts by ensuring they are well represented within the dataset. A team of data labelers can build a database of each artifact, which can be fed directly as training data to the NCARM module or used to train an automated “artifact classifier” algorithm which automates labelling of newly acquired data. Furthermore, for the purposes of training, these artifacts can be “forced” into the training data by purposefully using aggressive compression on some source material (modifying the input data or compression parameters such that the desired artifact becomes dominant).
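By way of a non-limiting sketch, artifact forcing of this kind can be as simple as a lossy round trip at low quality; the fragment below uses the Pillow imaging library to build a (degraded, clean) training pair.

```python
import io
from PIL import Image

def force_compression_artifacts(clean_path, quality=10):
    """Round-trip a source image through aggressive JPEG compression so that
    blocking and ringing artifacts dominate, returning a (degraded, clean)
    training pair. Lower `quality` makes the forced artifacts more severe."""
    clean = Image.open(clean_path).convert("RGB")
    buffer = io.BytesIO()
    clean.save(buffer, format="JPEG", quality=quality)  # lossy round trip
    buffer.seek(0)
    degraded = Image.open(buffer).convert("RGB")
    return degraded, clean
```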
As will be appreciated, a wide range of still or video cameras can benefit from use of the neural network supported image or video processing systems and methods discussed within this disclosure. Camera types can include but are not limited to conventional DSLRs with still or video capability, smartphone, tablet, or laptop cameras, dedicated video cameras, webcams, or security cameras. In some embodiments, specialized cameras such as infrared cameras, thermal imagers, millimeter wave imaging systems, x-ray or other radiology imagers can be used. Embodiments can also include cameras with sensors capable of detecting infrared, ultraviolet, or other wavelengths to allow for hyperspectral image processing.
Cameras can be standalone, portable, or fixed systems. Typically, a camera includes a processor, memory, an image sensor, communication interfaces, a camera optical and actuator system, and memory storage. The processor controls the overall operations of the camera, such as operating the camera optical and sensor system and the available communication interfaces. The camera optical and sensor system controls camera operations such as exposure control for images captured at the image sensor, and may include a fixed lens system or an adjustable lens system (e.g., zoom and automatic focusing capabilities). Cameras can support memory storage systems such as removable memory cards, wired USB, or wireless data transfer systems.
In some embodiments, neural network processing can occur after transfer of audio, video, or other compressible data to remote computational resources, including a dedicated neural network processing system, laptop, PC, server, or cloud. In other embodiments, neural network processing can occur within the camera, using optimized software, neural processing chips, dedicated ASICs, custom integrated circuits, or programmable FPGA systems.
As will be understood, the camera systems and methods described herein can operate locally or via connections to either a wired or wireless connect subsystem for interaction with devices such as servers, desktop computers, laptops, tablets, or smart phones. Data and control signals can be received, generated, or transported among a variety of external data sources, including wireless networks, personal area networks, cellular networks, the Internet, or cloud mediated data sources. In addition, sources of local data (e.g. a hard drive, solid state drive, flash memory, or any other suitable memory, including dynamic memory such as SRAM or DRAM) can allow for local storage of user-specified preferences or protocols. In one particular embodiment, multiple communication systems can be provided. For example, a direct Wi-Fi connection (802.11b/g/n) can be used as well as a separate 4G cellular connection.
Connection to remote server embodiments may also be implemented in cloud computing environments. Cloud computing may be defined as a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned via virtualization and released with minimal management effort or service provider interaction, and then scaled accordingly. A cloud model can be composed of various characteristics (e.g., on-demand self-service, broad network access, resource pooling, rapid elasticity, measured service, etc.), service models (e.g., Software as a Service (“SaaS”), Platform as a Service (“PaaS”), Infrastructure as a Service (“IaaS”)), and deployment models (e.g., private cloud, community cloud, public cloud, hybrid cloud, etc.).
Reference throughout this specification to “one embodiment,” “an embodiment,” “one example,” or “an example” means that a particular feature, structure, or characteristic described in connection with the embodiment or example is included in at least one embodiment of the present disclosure. Thus, appearances of the phrases “in one embodiment,” “in an embodiment,” “one example,” or “an example” in various places throughout this specification are not necessarily all referring to the same embodiment or example. Furthermore, the particular features, structures, databases, or characteristics may be combined in any suitable combinations and/or sub-combinations in one or more embodiments or examples. In addition, it should be appreciated that the figures provided herewith are for explanation purposes to persons ordinarily skilled in the art and that the drawings are not necessarily drawn to scale.
The flow diagrams and block diagrams in the described Figures are intended to illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flow diagrams or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It will also be noted that each block of the block diagrams and/or flow diagrams, and combinations of blocks in the block diagrams and/or flow diagrams, may be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. These computer program instructions may also be stored in a computer-readable medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instruction means which implement the function/act specified in the flow diagram and/or block diagram block or blocks.
Embodiments in accordance with the present disclosure may be embodied as an apparatus, method, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware-comprised embodiment, an entirely software-comprised embodiment (including firmware, resident software, micro-code, etc.), or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module,” or “system.” Furthermore, embodiments of the present disclosure may take the form of a computer program product embodied in any tangible medium of expression having computer-usable program code embodied in the medium.
Any combination of one or more computer-usable or computer-readable media may be utilized. For example, a computer-readable medium may include one or more of a portable computer diskette, a hard disk, a random access memory (RAM) device, a read-only memory (ROM) device, an erasable programmable read-only memory (EPROM or Flash memory) device, a portable compact disc read-only memory (CDROM), an optical storage device, and a magnetic storage device. Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages. Such code may be compiled from source code to computer-readable assembly language or machine code suitable for the device or computer on which the code will be executed.
Many modifications and other embodiments of the invention will come to the mind of one skilled in the art having the benefit of the teachings presented in the foregoing descriptions and the associated drawings. Therefore, it is understood that the invention is not to be limited to the specific embodiments disclosed, and that modifications and embodiments are intended to be included within the scope of the appended claims. It is also understood that other embodiments of this invention may be practiced in the absence of an element/step not specifically disclosed herein.
This application claims the benefit of U.S. Provisional Application Ser. No. 63/289,454, filed Dec. 14, 2021, and entitled “Neural Network Assisted Removal of Video Compression Artifacts”, which is hereby incorporated by reference in its entirety.