High speed image collection generates large volumes of data that can choke conventional data channels. A particular example is a camera system that captures imagery of residential waste traveling on a high-speed conveyor belt, for recognition of recyclable items. One such system employs an array of six cameras, each capturing 300 frames per second of 16-bit imagery, at a resolution of 1280×1024 pixels. (Additional details on such arrangements are found in patent publications US20190306385, WO2020186234, and US20210299706.) Each of the cameras generates about 6.3 gigabits of raw image data every second.
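The per-camera figure follows directly from the stated capture parameters; a quick check (in Python, for illustration):

```python
# Raw per-camera data rate for the exemplary system, from the
# parameters stated above (1280x1024 pixels, 16 bits/pixel, 300 fps).
width, height = 1280, 1024     # pixels per frame
bit_depth = 16                 # bits per pixel
fps = 300                      # frames per second

bits_per_frame = width * height * bit_depth   # 20,971,520 bits
bits_per_second = bits_per_frame * fps        # 6,291,456,000 bits
print(bits_per_second / 1e9)   # → 6.291456 (about 6.3 gigabits/second)
```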
The referenced recycling system manages this data flow by dividing it up for analysis among dozens of threads of multiple hardware processors. The data finally resulting from the image analysis is of low bandwidth—simply indicating the locations on the belt of different items, and their respective identified plastic compositions.
For research and testing purposes it is desirable to log some or all of the captured imagery in a local or cloud archive. Yet the throughput of conventional data channels is an obstacle. Common disk storage interfaces, and most non-optical network connections, cannot handle such a large data rate. Some form of data compression is required.
Apart from bandwidth issues, compression also saves on storage requirements and costs.
It is desirable that compression be performed using a single thread of a single hardware processor, so that most processing threads can be allocated for image analysis. Thus, the compression method should be fast and light enough not to introduce a burdensome processing task.
Various data compression techniques are known, both lossless and lossy. In lossless compression, the original data can be perfectly reconstructed from the compressed counterpart. In lossy compression, only an approximation of the original data can be reconstructed from the compressed counterpart. In our application, lossless compression is required.
LZ77 and LZ78 are two familiar types of lossless data compression and respectively refer to methods taught by Lempel and Ziv in their 1977 and 1978 papers. Both are dictionary coders. A dictionary coder operates by searching for matches between the input data to be compressed and a set of strings contained in a data structure (the “dictionary”) maintained by the encoder. When the encoder finds such a match, it substitutes a reference to the string's position in the data structure.
In LZ77, a circular buffer called the “sliding window” holds the last N bits of data processed. This window serves as the dictionary, effectively storing every substring that has appeared in the past N bits as dictionary entries. Instead of a single index identifying a dictionary entry, two values are needed: the length, indicating the length of the matched text, and the offset (also called the distance), indicating where the match is found in the sliding window.
LZ78 uses a more explicit dictionary structure, which is compiled during use. Data found in the dictionary is represented in the output string simply by an index identifying the dictionary entry.
(Additional details are found in the original Lempel/Ziv papers: “A Universal Algorithm for Sequential Data Compression,” IEEE Transactions on Information Theory, 1977, Vol. 23, No. 3, pp. 337-343; and “Compression of Individual Sequences via Variable-Rate Coding,” IEEE Transactions on Information Theory, 1978, Vol. 24, No. 5, pp. 530-536.)
These and other dictionary coder techniques offer good compression, but the need to search the dictionary for a string that matches an input data string slows them down, preventing real-time operation with high data rates, especially if only a single processing thread is used. Moreover, these algorithms generally do not exploit particular characteristics of the input data which can offer opportunities for higher compression rate and speed.
Thus, there is a need for a system that can compress high volumes of imagery quickly and simply.
Certain aspects of the present technology address such needs. A variety of other features and advantages are also detailed.
One of the cameras, shown on the right, applies one of its execution threads to compression of the imagery captured by that camera. The compressed data is then stored on disk (or transmitted over a network connection).
While imagery from just one camera is compressed in the example of
The disk storage used in
In the embodiment of
The fast compression needed in these systems can be achieved by the arrangements detailed below.
In an exemplary system, the exposure interval for each captured image is very short—typically under 100 microseconds. Supplemental lighting must be used to assure adequate illumination of items on the conveyor belt. Even so, the resulting images tend to be dark, and vacant regions of the dark conveyor belt (which comprise more than 50% of most images) are nearly black, and are thus represented by pixels of low (dark) values. Moreover, the majority of the images contain no object at all; such images depict only the dark conveyor belt.
In an illustrative embodiment, a difference array—corresponding to the pixel array—is computed. That is, a difference is computed between pairs of neighboring pixel values. (The neighbors are typically adjoining pixels, but this is not strictly necessary.)
The first value in
The second value in
Small difference values require fewer bits to represent than large difference values. In the illustrative embodiment, fields of different bit lengths (e.g., 2-, 4-, 8-, 12- or 16-bit fields) are used to represent the difference values, with the shortest field that can represent each difference value being used in each instance.
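Selection of the shortest sufficient field can be sketched as follows; the set of candidate field lengths is taken from the illustrative embodiment:

```python
# Smallest field (2, 4, 8, 12 or 16 bits) that can hold the absolute
# value of a difference, per the illustrative embodiment.
FIELD_LENGTHS = (2, 4, 8, 12, 16)

def field_length(diff):
    magnitude = abs(diff)
    for n in FIELD_LENGTHS:
        if magnitude < (1 << n):   # fits in an n-bit unsigned field
            return n
    raise ValueError("difference exceeds 16 bits")

print(field_length(0))     # → 2
print(field_length(64))    # → 8
print(field_length(1648))  # → 12
```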
To permit a decoder to correctly interpret these different-length fields, so that it can thereby reconstruct the original array of pixel data, a field length tag is associated with each difference value. Field length tags of three bits are used in the illustrative embodiment, but longer or shorter tags can naturally be used, depending on the particular application. As shown in
In the exemplary embodiment, the absolute value of each difference is represented by the variable length (2-bit to 16-bit) field. The sign of the difference is represented by a separate single-bit flag, with “1” indicating a positive difference value, and “0” indicating a negative difference value. (A difference value of zero can use either bit flag.)
Thus, three data are associated with each value in the difference frame of
The field length tag “011” is used to signal that this difference value is represented as a string of 12 bits.
The difference 1648 is a positive number, so the polarity bit flag is a “1.”
These three data can be conveyed in any order. In the bottom row of the
It will be recognized that this just-mentioned string is 16 bits in length. The image pixel was originally represented as a 16-bit datum, so there is no economy in this form of representation, in this instance. But in other instances, a savings is achieved.
The next column of
The next two columns show savings of even more bits. Each represents a difference value by 6 bits, as contrasted with the 16-bits required for the original pixels. Ten bits are saved for each pixel.
The string at the bottom of
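For concreteness, this first compression phase can be sketched as follows, applied to the three-pixel example (1648, 1584, 1584) discussed in the decompression walkthrough below. The tag mapping (“000” for a 2-bit field through “100” for a 16-bit field) is an assumption consistent with that walkthrough, and the first pixel is differenced against an implicit zero:

```python
# First-phase encoder sketch: per pixel, a polarity bit, a 3-bit field
# length tag, then the |difference| in the tagged field. The tag
# mapping is assumed for illustration.
TAGS = {2: "000", 4: "001", 8: "010", 12: "011", 16: "100"}

def encode(pixels):
    out, prev = [], 0
    for p in pixels:
        diff = p - prev
        out.append("1" if diff >= 0 else "0")   # polarity flag
        mag = abs(diff)
        n = next(k for k in (2, 4, 8, 12, 16) if mag < (1 << k))
        out.append(TAGS[n])                     # field length tag
        out.append(format(mag, f"0{n}b"))       # difference magnitude
        prev = p
    return "".join(out)

print(encode([1648, 1584, 1584]))
# → "1011011001110000" + "001001000000" + "100000" (concatenated)
```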
Thus, in certain embodiments the above-detailed compression arrangement is followed by a second phase of compression, such as an implementation of LZ77 or LZ78 compression. In an illustrative embodiment, Zstandard software is used. Zstandard software (sometimes abbreviated Zstd) was developed at Facebook and is based on LZ77 principles. An open source reference implementation of the code is available from the Facebook site on the Github service (Facebook<dot>github<dot>io/zstd/). The output data from this second phase of data compression can be regarded as a twice-compressed counterpart to the input data. This twice-compressed data is then stored on a disk drive device, or transmitted on a network (e.g., for cloud storage or analysis).
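A sketch of this second phase, using the Python standard library's zlib (also an LZ77-family coder) as a stand-in; a Zstandard library would be used in the same pattern (compress on write, decompress on read):

```python
import zlib

# Second-phase compression sketch. The embodiment uses Zstandard; zlib,
# an LZ77-family coder in the standard library, stands in here.
# A once-compressed bit stream, packed into (repetitive) bytes:
once_compressed = bytes([0b10110110, 0b01110000, 0b00100100, 0b00]) * 1000

twice_compressed = zlib.compress(once_compressed, level=6)
assert zlib.decompress(twice_compressed) == once_compressed  # lossless
print(len(once_compressed), len(twice_compressed))
```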
Recovery of the original data from the compressed data is straightforward. If the compressed data is of the twice-compressed form, a decompression algorithm corresponding to the second phase of compression is applied. An advantage of Zstandard is that it can be trained on the data to increase the compression ratio and speed. This feature is helpful when the library is used to compress many similar data sets that mostly share similar patterns, as in our case here. The open source implementation of Zstandard software includes fast decompression code. Zstandard decompression yields the once-compressed counterpart to the input data, e.g., as depicted by the composite string at the bottom of
This composite string can be parsed serially. The first bit (“1”) indicates that the polarity of the first difference value is positive. The next three bits (“011”) indicate that the following difference value is represented as a 12-bit string. The decompressor takes the following 12-bits, and zero-pads them to yield a 16-bit number, which is the pixel value of the first pixel (1648).
The decompressor continues by examining the next bit in the composite string (“0”), which indicates that the polarity of the second difference value is negative. The following three bits (“010”) indicate that the following difference value is represented as an 8-bit string. The decompressor then takes the next 8-bits, and subtracts them from the just-determined value of the preceding pixel (because the polarity of the difference is negative), yielding the pixel value of the second pixel (1584).
The decompressor then examines the next bit in the composite string (“1”), which indicates that the polarity of the third difference value is positive. The following three bits (“000”) indicate that the following difference value is represented as a 2-bit string. The decompressor then takes the next 2 bits (here representing a difference of zero), and adds them to the just-determined value of the preceding pixel (because the polarity of the difference is positive), yielding the pixel value of the third pixel (1584).
The decompressor continues in this fashion until it has worked its way through the composite bit string and re-created the original array of pixel values shown in
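The serial parse just described can be sketched as follows; the 3-bit tag mapping is an assumption consistent with the walkthrough above:

```python
# Decoder sketch for the composite bit string walked through above.
LENGTHS = {"000": 2, "001": 4, "010": 8, "011": 12, "100": 16}

def decode(bits, pixel_count):
    pixels, prev, i = [], 0, 0
    for _ in range(pixel_count):
        sign = 1 if bits[i] == "1" else -1           # polarity flag
        n = LENGTHS[bits[i + 1:i + 4]]               # field length tag
        magnitude = int(bits[i + 4:i + 4 + n], 2)    # difference value
        prev += sign * magnitude
        pixels.append(prev)
        i += 4 + n
    return pixels

# The three-pixel example: 12-, 8- and 2-bit difference fields.
print(decode("1011011001110000" "001001000000" "100000", 3))
# → [1648, 1584, 1584]
```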
It will be recognized that the compression method detailed above is exceedingly fast and simple, and is suited to single-thread execution. No string-matching is required. The second phase of compression, when used, is slower. But it operates on data that has already been compressed once, so the throughput requirements for the second phase of compression are not as demanding as requirements for the first phase.
Having described and illustrated aspects of the technology with reference to illustrative embodiments, it will be recognized that the technology is not so limited.
For example, while the input data in the exemplary arrangements is natural image data, this is not required. The technology can be used with data of any sort, including audio and video data, synthetic image data, data from other sensors, and data resulting from other data processing operations.
Similarly, while the exemplary arrangements concern lossless compression, this is not required. In a different embodiment the input data can be quantized, losing one or more least significant bits of resolution. Such quantized data, with the LSB(s) truncated, can be compressed using the above-described arrangements, yielding fast and simple compression, but without the ability to recover the finest level of resolution.
While the difference data in the detailed embodiment is determined between the present pixel and the immediately-preceding pixel, this too is not required. The difference data can be relative to any other known pixel value in the input data set, such as one or two pixels away from the current pixel—either forwards or backwards.
Moreover, in some embodiments the difference may be relative to a spatially-corresponding pixel (i.e., at the same row/column coordinates) in a preceding image frame. Or where, as in the illustrative system, a camera captures imagery from a belt that advances by generally consistent spatial offsets between successive frames (e.g., 72 rows of pixels in a particular example), the difference can be between a pixel in the current frame and a pixel in the same column, but 72 rows away, in the previous frame.
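Selection of the reference pixel for such an inter-frame difference can be sketched as below. The 72-row offset and the fallback policy (preceding pixel in the row, else zero) are illustrative assumptions, and `predictor` is a hypothetical helper:

```python
def predictor(frames, f, r, c, row_offset=72):
    """Reference value for pixel (r, c) of frame f: the pixel in the
    same column, row_offset rows away, in the previous frame (matching
    the belt's advance between exposures). Falls back to the preceding
    pixel in the row, then to zero. A hypothetical helper."""
    if f > 0 and r + row_offset < len(frames[f - 1]):
        return frames[f - 1][r + row_offset][c]
    if c > 0:
        return frames[f][r][c - 1]
    return 0

# Two tiny 3-row "frames"; the belt advances one row between them, so
# row 0 of frame 1 depicts the same belt content as row 1 of frame 0.
frames = [[[10, 10], [20, 20], [30, 30]],
          [[20, 20], [30, 30], [40, 40]]]
print(predictor(frames, 1, 0, 0, row_offset=1))  # → 20
```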
Still further, the difference may be between the current pixel value, and an average of two or more other pixels. These other pixels may all be in the current image row, or may be from plural rows.
The use of five different bit field lengths in the detailed arrangement is exemplary and not limiting. Other arrangements can use more or fewer different bit field lengths. With the three-bit field length tag of the detailed embodiment, eight different bit field lengths can be represented (e.g., 2-, 4-, 6-, 8-, 10-, 12-, 14- and 16-bit fields).
Although the detailed embodiment was described as a pixel-by-pixel process in which a pixel value is input, and compressed data corresponding to that pixel is immediately output, this is not necessary. In some embodiments an entire frame of pixel values is buffered, and is then processed. In such an arrangement all of the difference values are computed. All of the polarities are then known. All of the needed field lengths are determined, together with associated field length tags. Only after all of this data is generated are any of the output data elements, i.e., the triplets of polarity tags, field length tags, and difference values, output.
In one such embodiment these data elements are not output as successive triplets, e.g., {polarity tag, field length tag, difference value, polarity tag, field length tag, difference value . . . }. Instead, the like elements are all grouped together. For example, the output data string may start with polarity tags for all 1,310,720 pixels in the frame, followed by field length tags for all the pixels in the frame, followed by the difference values for all the pixels in the frame. Or variant data packing can be used, such as pairing the polarity tag and field length tag for each pixel into a 4-bit string, and sending such strings for all 1,310,720 pixels in the frame grouped together, followed by the difference values for all the pixels in the frame, etc. In a further variant, the difference values are grouped based on their field lengths. For example, all of the 2-bit differences can be grouped together, followed by all of the 4-bit differences, etc. (The order with which such differences should be used during decompression is indicated by the order in which the field length tags are presented.)
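The grouped-output variant can be sketched by regrouping per-pixel triplets into three like-element streams; the bit strings below carry over the three-pixel example (1648, 1584, 1584) used earlier in this discussion:

```python
def group_streams(records):
    """Regroup per-pixel (polarity, tag, difference) triplets into
    three like-element streams: all polarity tags, then all field
    length tags, then all difference values. A sketch."""
    polarities = "".join(p for p, _, _ in records)
    tags = "".join(t for _, t, _ in records)
    diffs = "".join(d for _, _, d in records)
    return polarities + tags + diffs

# The three-pixel example as (polarity, tag, difference) triplets:
records = [("1", "011", "011001110000"),
           ("0", "010", "01000000"),
           ("1", "000", "00")]
grouped = group_streams(records)
print(grouped)  # same 34 bits as the triplet ordering, regrouped
```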
In another embodiment that buffers and then processes an image frame, the difference value for a pixel need not involve, solely, pixels that are earlier in a row/column scan down the image frame. For example, the difference value for a pixel can be relative to the average of the eight surrounding pixels, i.e., those that are vertically-, horizontally-, and diagonally-adjoining. In this case decompression is less straightforward and slower, since determination of each pixel value requires solving systems of equations in multiple variables. (The values of the four corner pixels, and of other pixels scattered through the image frame, can be left uncompressed as known constraints for this process.) But in many applications it is acceptable for the decompression process to be slower than the compression process. (This particular arrangement can also result in a small degree of data loss, since the averaging process can yield non-integer values.)
Although the technology is described in the context of a recycling system, it will be recognized that large volumes of data must be compressed quickly in many other contexts. One example is astronomy. Another is medical imaging. Another is particle detection in high energy physics. Etc.
The imagery in the detailed examples is assumed to be greyscale. Color imagery (e.g., RGB or CMYK) can be compressed similarly, with each color channel compressed separately.
Although not detailed earlier, it will be understood that the compressed output data is accompanied by certain overhead, or administrative data. This data can precede the compressed output data in a header data structure. This data structure can include, e.g., data identifying the image frame, data specifying the frame dimensions in rows/columns, data specifying the bit depth of the pixels (e.g., 16-bit), etc. It can further include error checking data, such as a CRC value. It may also include a count of the number of pixels represented as 2-bit differences, the number of pixels represented by 4-bit differences, etc.
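One way such a header might be laid out is sketched below; the specific byte layout, field order, and widths are illustrative assumptions, with only the kinds of fields taken from the description above:

```python
import struct, zlib

# Hypothetical fixed-width header: frame id, rows, cols, bit depth,
# per-field-length pixel counts (2-, 4-, 8-, 12-, 16-bit differences),
# and a CRC-32 over the compressed payload. Big-endian, no padding.
HEADER = struct.Struct(">IHHB5II")

def make_header(frame_id, rows, cols, depth, counts, payload):
    return HEADER.pack(frame_id, rows, cols, depth, *counts,
                       zlib.crc32(payload))

# Hypothetical counts summing to the 1,310,720 pixels of a 1280x1024 frame:
counts = (900000, 300000, 90000, 20000, 720)
hdr = make_header(7, 1024, 1280, 16, counts, b"\x00\x01")
print(len(hdr))  # → 33 (fixed-size header)
```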
In some embodiments the high bandwidth input (image) data is first stored on a high-speed disk drive (e.g., a solid state drive equipped with a USB 3.2 SuperSpeed+ interface), and this data is thereafter read from the high-speed drive and compressed using the detailed technology before storage on longer-term, slower-speed storage media, or transmission over the internet (e.g., to cloud storage).
As indicated, the processes and system components detailed in this specification can be implemented as instructions for computing devices, including general purpose processor instructions for a CPU such as the cited Intel 9960X processor. Implementation can also employ a variety of specialized processors, such as graphics processing units (GPUs, such as are included in the nVidia Tegra series, and the Adreno 530—part of the Qualcomm Snapdragon processor), and digital signal processors (e.g., the Texas Instruments TMS320 and OMAP series devices, and the ultra-low power Qualcomm Hexagon devices, such as the QDSP6V5A), etc. The instructions can be implemented as software, firmware, etc. These instructions can also be implemented in various forms of processor circuitry, including programmable logic devices, field programmable gate arrays (e.g., the Xilinx Virtex series devices), field programmable object arrays, and application specific circuits—including digital, analog and mixed analog/digital circuitry. Although single-threaded execution of the instructions is preferred, execution can be distributed among processors and/or made parallel across processors within a system or across a network of devices. Processing of data can also be distributed among different processor and memory devices. References to “processors,” “modules” or “components” should be understood to refer to functionality, rather than requiring a particular form of implementation.
Implementation can additionally, or alternatively, employ special purpose electronic circuitry that has been custom-designed and manufactured to perform some or all of the component acts, as an application specific integrated circuit (ASIC).
Software instructions for implementing the detailed functionality can be authored by artisans without undue experimentation from the descriptions provided herein, e.g., written in C, C++, etc., in conjunction with associated data.
Software and hardware configuration data/instructions are commonly stored as instructions in one or more data structures conveyed by tangible media, such as magnetic or optical discs, semiconductor memory, etc., which may be accessed across a network.
Although disclosed as a complete system, sub-combinations of the detailed arrangements are also separately contemplated (e.g., omitting various of the features of a complete system).
While aspects of the technology have been described by reference to illustrative methods, it will be recognized that apparatuses configured to perform the acts of such methods are also contemplated as part of applicant's inventive work. Likewise, other aspects have been described by reference to illustrative apparatus, and the methodology performed by such apparatus is likewise within the scope of the present technology. Still further, tangible computer readable media containing instructions for configuring a processor or other programmable system to perform such methods is also expressly contemplated.
To provide a comprehensive disclosure, while complying with the Patent Act's requirement of conciseness, applicant incorporates-by-reference each of the documents referenced herein. (Such materials are incorporated in their entireties, even if cited above in connection with specific of their teachings.) These references disclose technologies and teachings that applicant intends be incorporated into the arrangements detailed herein, and into which the technologies and teachings presently-detailed be incorporated.
In view of the wide variety of embodiments to which the principles and features discussed above can be applied, it should be apparent that the detailed embodiments are illustrative only, and should not be taken as limiting the scope of the invention.