The present embodiments relate generally to image compression and, more particularly, but not exclusively, to compressing an image to substantially a preset file size using statistical information obtained from a single subset of the image and an initial compression of the single subset.
In the digital world, JPEG is a commonly used method for compressing and storing digital images. The term “JPEG” stands for Joint Photographic Experts Group, and is the name of a standards committee that created the JPEG standard, among other standards. The JPEG standard defines how an image is compressed into a stream of bytes and decompressed back into an image. The standard enables a user to adjust the degree of compression of an image, and thereby allows for a selectable tradeoff between storage size for the compressed image, and the resulting image's quality when subsequently uncompressed.
However, many computing systems may be constrained in a size of storage space that may be allocated to storing images. Thus, the JPEG compression method appears to be a desirable method to use. However, current implementations of the JPEG compression algorithms often require the user to encode, or compress, an image a plurality of times in order to first obtain an estimate of a scale factor or quantization value that may then be used to obtain a desired preset target size of the compressed image. If the size of the compressed image exceeds the preset target size, then further compressions of the image may be required. Should the resulting size of the compressed image significantly undershoot the preset target size, then further compressions may again be required. This may be the case, for example, where underutilizing storage size may result in wasting space that might not be useable for storing other compressed images.
Moreover, the quantization factor defined by the JPEG standard is fixed for the entire image and does not provide for bit rate control during the compression of the image. Thus, it is with respect to these considerations and others that the present invention has been made.
Non-limiting and non-exhaustive embodiments are described with reference to the following drawings. In the drawings, like reference numerals refer to like parts throughout the various figures unless otherwise specified.
For a better understanding of the present embodiments, reference will be made to the following Detailed Description, which is to be read in association with the accompanying drawings, in which:
The present embodiments now will be described more fully hereinafter with reference to the accompanying drawings, which form a part hereof, and which show, by way of illustration, specific aspects in which the embodiments may be practiced. These embodiments may, however, take many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope to those skilled in the art. Among other things, the present embodiments may include methods or devices. Accordingly, the present embodiments may take the form of entirely hardware or a combination of software and hardware aspects. The following detailed description is, therefore, not to be taken in a limiting sense.
Throughout the specification and claims, the following terms take the meanings explicitly associated herein, unless the context clearly dictates otherwise. The phrase “in one embodiment” as used herein does not necessarily refer to the same embodiment, though it may. Furthermore, the phrase “in another embodiment” as used herein does not necessarily refer to a different embodiment, although it may. Thus, as described below, various embodiments of the invention may be readily combined, without departing from the scope or spirit of the invention.
In addition, as used herein, the term “or” is an inclusive “or” operator, and is equivalent to the term “and/or,” unless the context clearly dictates otherwise. The term “based on” is not exclusive and allows for being based on additional factors not described, unless the context clearly dictates otherwise. In addition, throughout the specification, the meaning of “a,” “an,” and “the” includes plural references. The meaning of “in” includes “in” and “on.”
As used herein, the term “image,” or “image data,” refers to data that defines an image to be displayed in at least two dimensions, and may take the form of a single display of the image, or a time varying display of a sequence of images comprising multiple video frames which may be spaced in time.
As used herein, the term “quantizer” refers to a value or values useable for modifying, such as by scaling, a transformation of an input image or portion of an input image. For example, where the transformation of the input image is represented as coefficients obtained from a discrete cosine transform (DCT), the modification of the transformation may include dividing a coefficient obtained from the DCT by the quantizer or scaling factor. In one embodiment, a result of the division may further be rounded to a nearest integer value.
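By way of a non-limiting illustration, the division and rounding described above may be sketched as follows. The function name quantize and the choice to round ties away from zero are assumptions made for this sketch and are not taken from the disclosure.

```python
import math


def quantize(coefficient, quantizer):
    """Divide a DCT coefficient by the quantizer and round to the nearest integer.

    Assumption for this sketch: ties (x.5) are rounded away from zero.
    """
    if quantizer <= 0:
        raise ValueError("quantizer must be positive")
    scaled = coefficient / quantizer
    # floor(|x| + 0.5) rounds the magnitude; copysign restores the sign.
    return int(math.copysign(math.floor(abs(scaled) + 0.5), scaled))


# A larger quantizer maps more small coefficients to zero.
coefficients = [100, -24, 7, -3]
coarse = [quantize(c, 16) for c in coefficients]
fine = [quantize(c, 4) for c in coefficients]
```

In this sketch, a coefficient whose magnitude is less than half the quantizer becomes zero, which is why increasing the quantizer (or scale factor) tends to reduce the compressed size.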
As used herein, the term “bit rate” refers to an amount of data produced by a compression method or encoder per a selected unit. In one embodiment, the selected unit may be time. Thus, a bit rate may be expressed in units of kilobits per second (kbps); however, other units may also be used. For example, bit rate may also be determined over a number of slices of an image, or the like.
The following briefly describes the embodiments in order to provide a basic understanding of some aspects. This brief description is not intended as an extensive overview. It is not intended to identify key or critical elements, or to delineate or otherwise narrow the scope. Its purpose is merely to present some concepts in a simplified form as a prelude to the more detailed description that is presented later.
Briefly stated, embodiments are directed towards compressing an image to substantially a preset file size using statistical information obtained from a single subset of the image and an initial compression of the single subset. A representative subset portion of the image is selected based in part on a clustering analysis of the image. The representative subset is then compressed, in one embodiment, twice, in order to obtain statistics useable for the entire image. A scale factor is then determined that may be used in the quantization and for creating a Bit Rate Control (BRC) curve. The BRC curve represents an amount of accumulated bits per Minimal Codec Unit (MCU) or slice. During the compression (sometimes called encoding) process, the BRC curve is used to prevent accumulating bits from overshooting a final preset file size target. In one embodiment, the disclosure below is applied using a JPEG compression; however, other compression algorithms may also be employed. For example, the embodiments disclosed below may be applied to virtually any other compression method where a same scaling factor or quantizer is employed for the entire image.
In addition, while the disclosure below discusses a single image as the input, other inputs may also be used, including, for example, a sequence of images or sequence of frames of an image, such as a video sequence, or the like.
By employing the disclosed zero pass JPEG compression with a BRC curve, it is anticipated that compression time may be reduced because multiple compressions of the entire image to obtain the preset file size are eliminated. As discussed below, this may be achieved at least in part because the scale factor is estimated based on compression of a subset of the image, the BRC curve is created from statistics obtained from the subset of the image, and the bit rate control using the BRC curve may be applied during the course of the compression. Thus, compressed images that satisfy the preset file size may be obtained more quickly than with conventional methods.
As shown, system 100 of
As shown, system 100 may include components on a single integrated circuit chip or on a plurality of different circuit chips. In any event, components shown in
Also shown is a volatile random-access memory (RAM) circuit chip 106 that may be coupled to EIP 200 to provide temporary data storage. In one embodiment, RAM 106 may be configured to receive and store image data, such as one or more frames of image data for use by EIP 200 or output data from EIP 200, as well as to store scale factors, BRC curves, cluster data, quantization data, and the like. A separate non-volatile read-only memory (ROM) chip 104 is also coupled to EIP 200 and may be employed for storage of a processor program, calibration data, look-up tables (LUTs), non-linear functions, a variety of other data useable by system 100, and the like. In one embodiment, ROM 104 may be flash memory, which is re-programmable, or a memory that is programmable once, such as programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), or any of a variety of other storage devices.
Although not illustrated, other types of memory or physical storage devices may be included within system 100, including, for example, memory cards that may include semiconductor flash electrically erasable and programmable read-only memory, removable rotating magnetic disk storage, removable universal serial bus (USB) devices, or any of a variety of other storage devices. In one embodiment, system 100 may also be configured through Input/Output (I/O) device 108 to access storage devices that may be external to system 100. Thus, it should be understood that EIP 200 may be configured to receive one or more frames of image data, operate upon the received one or more frames of image data to compress the data, and store or otherwise send a resulting compressed (encoded) bit-stream of data using a variety of storage devices, and/or communication mechanisms, and therefore is not limited to merely those described herein.
I/O device 108 includes circuitry for coupling system 100 to one or more external devices, networks or the like, and is constructed for use with one or more communication protocols and technologies, including any of a variety of communication protocols and technologies useable for communicating images to and/or from system 100. In one embodiment, I/O device 108 is sometimes known as a transceiver, transceiving device, or network interface card (NIC).
I/O device 108 may also provide for various other communications, including for use with various input devices, such as keypads, touch screens, or the like, as well as output devices including screen displays, audio outputs, or the like. Thus, although not shown, system 100 may also include a speaker and/or microphone that may be coupled to I/O device 108 to enable communications. System 100 may also include a display that may include a liquid crystal display (LCD), gas plasma, light emitting diode (LED), or any other type of display usable for providing text and/or an image for display. Further, in one embodiment, the display may also include a touch sensitive screen arranged to receive input from an object such as a stylus or a digit from a human hand.
Also illustrated, is an analog-to-digital converter (A/D) 110 that may be configured to receive an analog signal representing an image, and to convert the received signal into digital image data that, in one embodiment, may be a sequence of individual blocks of digital image data representing an intensity of light that may be received through various photo-detectors of an image sensor and/or lens arrangement (not shown). A/D 110 may then provide the digital data to EIP 200 for processing.
One embodiment of EIP 200 is shown in
As shown in
Moreover, in one embodiment, ZPC 201 may be implemented in software that operates within image processor 208. However, in another embodiment, ZPC 201 may represent a hardware component, integrated circuit, or the like, configured to perform actions as described herein.
Interfaces 210 may provide for various mechanisms to communicate with image processor 208 and/or memory management 206, or other components, to enable modifications to various actions, provide status of an action, or the like by another device, an end-user, or the like.
Network device 300 includes central processing unit 312, video display adapter 314, and a mass memory, all in communication with each other via bus 322. The mass memory generally includes RAM 316, ROM 332, and one or more permanent mass storage devices, such as hard disk drive 328, tape drive, compact-disc read only memory (CD-ROM)/digital versatile disc-ROM (DVD-ROM) drive 326, and/or floppy disk drive. The mass memory stores operating system 320 for controlling the operation of network device 300. Any general-purpose operating system or special purpose operating system may be employed. Basic input/output system (“BIOS”) 318 is also provided for controlling the low-level operation of network device 300. As illustrated in
The mass memory as described above illustrates another type of computer-readable or processor-readable device, namely non-transitory computer-readable storage media. Computer-readable storage media (devices) may include volatile, nonvolatile, non-transitory, removable, and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, or other data. Examples of non-transitory computer-readable storage media include RAM, ROM, Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other physical medium which can be used to store the desired information and which can be accessed by a computing device. Moreover, in at least one embodiment, one or more of the memory devices may be configured to employ storage mechanisms that may be based on having a preset file size for images that are to be stored thereon.
As shown, data stores 354 may include a database, text, spreadsheet, folder, file, or the like, that may be configured to maintain and store data useable for Zero Pass Controller (ZPC) 301, including range data, threshold data, function look-ups, tables, video images, single images, encoded data, reconstructed frame data, reference frame data, motion vectors, one or more frame data, or the like. Data stores 354 may further include program code, data, algorithms, and the like, for use by a processor, such as central processing unit (CPU) 312 to execute and perform actions. In one embodiment, at least some of the data and/or instructions stored in data stores 354 might also be stored on another device of network device 300, including, but not limited to, CD-ROM/DVD-ROM drive 326, hard disk drive 328, or other computer-readable storage device resident on network device 300 or accessible by network device 300 over, for example, network interface unit 310.
The mass memory also stores program code and data. One or more applications 350 are loaded into mass memory and run on operating system 320. Examples of application programs may include transcoders, schedulers, customizable user interface programs, security programs, and so forth. Memory may also include EIP 358 which may further include ZPC 301. It should be recognized that while EIP 358 and ZPC 301 are illustrated within RAM 316, other embodiments may include EIP 358 and/or ZPC 301 within ROM 332, and/or within one or more separate circuit boards (not shown) within network device 300.
EIP 358 and ZPC 301 operate substantially similar to EIP 200 and ZPC 201 of
It should be recognized that EIP 358 and ZPC 301 may operate on image data obtained from data stores 354, hard disk drive 328, CD-ROM/DVD-ROM drive 326, other storage devices, or even from a network or from another device through network interface unit 310, as well as from various image sensing devices, or the like.
The operation of certain aspects of the invention will now be described with respect to
Process 400 of
Prior to discussing how various embodiments of the EIP/ZPC operate, it may be of interest to first discuss how an image may be received and prepared for processing in one embodiment.
Jumping briefly to
In one embodiment, frame 1104 may be obtained from an analog source, and be represented by red (R), green (G), and blue (B) lines that may be converted into color difference components using, for example, various processes. For example, in one embodiment, such color difference components may be obtained based on the Rec. 601 (formerly known as CCIR-601) component color television standard from the International Telecommunication Union (ITU) Radiocommunication Sector (ITU-R). However, any of a variety of other techniques may also be employed, and embodiments are not constrained to a particular standard, or format. In any event, by way of example only, the image data may be defined by three components of the image signal; namely, a luminance component (Y), and two complementary chrominance (color differences) components (V=R−Y) and (U=B−Y). For three dimensional or higher dimensional images, and/or other types of image representations, other components may also be included.
In this example, each image sub-block (block 1108) may be formed of a given number of pixels of the image. A Y block 1110, may comprise 16 pixels horizontally and 16 lines of pixels vertically. Where the image data includes a color signal, then the image sub-blocks (block 1108) further include color information in the form of chrominance components, Cb and Cr, where Cb and Cr are the blue-difference (U) and red-difference (V) components, respectively. Each of the color components may be represented by respectively superimposed color blocks 1112.
Various mechanisms may be employed to convert the RGB data signals into color difference components, including for example using a matrix circuit to provide the luminance (Y), and chrominance (Cb, Cr) component signals. In one embodiment, the luminance component and the chrominance components may be received as analog signals that are provided to respective low pass (or equal bandwidth) filters and passed through analog-to-digital converters, to generate a digital data format. In one embodiment, the filtered and digitized luminance and chrominance components may be supplied to a block forming circuit, where the described image blocks may be formed.
Also illustrated in
Frame 1104 described above and in
Turning next to operations of the EIP/ZPC,
In one embodiment of
Block 402 of process 400 may use the raw Bayer image data 401 as input to determine AC/DC statistics for the image. However, in another embodiment, block 402 may also be configured to use as input the YUV formatted image data, such as processed YUV data of block 411.
In any event, the input, such as raw Bayer image 401, may be divided into blocks and rows, where the blocks are termed as slices. One non-limiting, non-exhaustive example of an image 500 that is shown as being divided into a plurality of slices is seen in
In one embodiment, statistics for the AC components may be determined for each color in the Bayer colors (R, G, and B) separately, for each sub-window (slice). For example, in one embodiment, the following calculations may be performed:
where pixels are referred to as p(x, y). The AC statistics for red may then be determined as
AC_red=AC_Horizontal_red+AC_Vertical_red
The AC statistics for the green and blue colors (AC_green and AC_blue) are computed in a similar approach. It is noted, however, that other methods may also be used to calculate the AC statistics. In any event, the above AC statistics may be interpreted as an approximation of the high frequency energy in the image (e.g., a first derivative of the pixel resolution).
The DC statistics may be calculated, in one embodiment, using a down scaled version of the Bayer formatted image. Referring briefly to
G1=G11+G12+G13+G14
R2=R21+R22+R23+R24
B3=B31+B32+B33+B34
G4=G41+G42+G43+G44
However, other scaling factors may also be used.
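As a non-limiting sketch of the down scaling above, each group of four same-color pixels may be summed into a single pixel of a smaller Bayer mosaic. The layout is an assumption of this sketch, since the referenced figure is not reproduced here: every 4x4 tile of the raw mosaic is taken to map to one 2x2 Bayer tile, and same-color pixels are identified by their row and column parity. The function name downscale_bayer is hypothetical.

```python
def downscale_bayer(raw):
    """Sum groups of four same-color Bayer pixels (e.g. G1=G11+G12+G13+G14).

    Assumed layout: each 4x4 input tile yields one 2x2 output Bayer tile,
    and a pixel's color is determined by its (row parity, column parity).
    """
    height, width = len(raw), len(raw[0])
    if height % 4 or width % 4:
        raise ValueError("image dimensions must be multiples of 4")
    out = [[0] * (width // 2) for _ in range(height // 2)]
    for y in range(height):
        for x in range(width):
            tile_y, tile_x = y // 4, x // 4   # which 4x4 tile the pixel is in
            par_y, par_x = y % 2, x % 2       # which Bayer color position
            out[2 * tile_y + par_y][2 * tile_x + par_x] += raw[y][x]
    return out


# A 4x4 tile of ones: every down scaled pixel is the sum of four inputs.
flat = downscale_bayer([[1] * 4 for _ in range(4)])

# Only the pixels at (even row, odd column) are lit; they collapse into
# the single down scaled pixel at row 0, column 1.
one_color = downscale_bayer(
    [[1 if (y % 2, x % 2) == (0, 1) else 0 for x in range(4)] for y in range(4)]
)
```

Other scaling factors would change the tile size but not the principle of summing only pixels of the same color.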
The DC statistics may then be determined for each color of the Bayer colors (R, G, and B) separately using the following equation for each sub-window (slice).
where, the down scaled pixels are referred to as p(x, y). The DC statistics of the green and blue colors (DC_green and DC_blue) may be calculated in a similar way. Again, it is noted that other equations may also be used to determine the DC components for the Bayer colors.
The above DC statistics may be interpreted as an approximation of the low frequency energy in the image (e.g., a first derivative of the low resolution of the image). However, higher derivative statistics may also be used for the DC and AC components.
The overall AC and DC statistics may then be computed as a weighted sum at block 402 of
AC=W_red*AC_red+W_green*AC_green+W_blue*AC_blue
DC=W_red*DC_red+W_green*DC_green+W_blue*DC_blue
where W refers to the various weights.
Continuing with process 400 of
At block 404, the AC/DC statistics may be used to segment the slices of the image (shown in
One embodiment for performing the clustering in block 404 may be described as follows. A slice index may be generated in a “raster order,” starting from an uppermost left-hand corner of the image, and progressing across rows from left to right. The AC and DC statistics of all of the slices may then be organized in a vector array, where each slice in the image has its corresponding AC/DC statistics, according to the corresponding slice index:
AC_DC [0: NumOfSlices−1] [0:1]
For each array entry, a first memory location contains a corresponding slice's AC statistics, and a second memory location contains the slice's DC statistics.
Then, the number of clusters, K, is selected. In one embodiment, the value of K may be selected based on a variety of performance requirements, including speed of convergence, quality of results, as well as based on engineering judgment, historical data, or the like. In one embodiment, K may be selected between 4 and 25; however, other values may also be used.
A vector that contains, for each cluster, the representative AC/DC statistics may be defined as:
ClusterCentroids [0:K-1] [0:1]
Further, the following vector may be defined to contain for each slice, the cluster index to which the slice belongs:
IDX [0: NumOfSlices−1]
The ClusterCentroids vector may be initialized with the AC/DC statistics of slices evenly placed at every Nth slice in raster order, where N=Round down (Number of slices/K).
Referring briefly to
Continuing the computations of block 404, the distance of each slice from the cluster representatives may be computed as:
d(m,n)=abs (AC_DC[m][0]−ClusterCentroids[n][0])+abs(AC_DC[m][1]−ClusterCentroids[n][1])
for:
∀m ∈[0, NumOfSlices−1] and ∀n ∈[0, K-1]
For each slice the cluster index to which it belongs may then be found according to the minimum distance from cluster representatives:
where the arg min provides the index n which results with the min value.
The number of members in each cluster is given by:
The cluster centers may be updated by calculating the mean of the updated member of each cluster, per AC and DC component, as:
Further, the sum of distances from the cluster centers may be calculated as:
The above equations may be repeated at least twice, or until
(Previous_SumOfDist−SumOfDist)/SumOfDist<Threshold
where Threshold may be selected based on a desired confidence of the clustering result.
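The clustering steps above may be sketched, in one non-limiting example, as a K-means style loop over the AC_DC array using the absolute-difference distance d(m,n). The function name cluster_slices, the default threshold, the iteration cap max_iters, and the handling of empty clusters are assumptions of this sketch. In particular, SumOfDist is accumulated here against the centroids in effect before each update, which the description does not specify.

```python
def cluster_slices(ac_dc, k, threshold=0.01, max_iters=50):
    """Cluster slices by their (AC, DC) statistics.

    Returns (idx, centroids, counts): the cluster index per slice, the
    representative AC/DC statistics per cluster, and the members per cluster.
    """
    num_slices = len(ac_dc)
    if not 1 <= k <= num_slices:
        raise ValueError("k must be between 1 and the number of slices")
    step = num_slices // k  # N = round down(NumOfSlices / K)
    centroids = [list(ac_dc[n * step]) for n in range(k)]
    idx, counts, prev_sum = [], [], None
    for _ in range(max_iters):
        idx, sum_of_dist = [], 0.0
        for ac, dc in ac_dc:
            # d(m, n) = |AC - centroid AC| + |DC - centroid DC|
            dists = [abs(ac - c[0]) + abs(dc - c[1]) for c in centroids]
            nearest = min(range(k), key=dists.__getitem__)
            idx.append(nearest)
            sum_of_dist += dists[nearest]
        counts = [idx.count(n) for n in range(k)]
        for n in range(k):
            if counts[n]:  # assumption: an empty cluster keeps its old centroid
                centroids[n] = [
                    sum(ac_dc[m][c] for m in range(num_slices) if idx[m] == n)
                    / counts[n]
                    for c in (0, 1)
                ]
        # Stop only after at least two passes, once the relative change is small.
        if prev_sum is not None and (
            sum_of_dist == 0
            or (prev_sum - sum_of_dist) / sum_of_dist < threshold
        ):
            break
        prev_sum = sum_of_dist
    return idx, centroids, counts


# Six slices forming two obvious groups in the AC/DC plane.
stats = [(1, 1), (1, 2), (2, 1), (10, 10), (10, 11), (11, 10)]
idx, centroids, counts = cluster_slices(stats, k=2)
```

With K=2 and six slices, the initial centroids in this sketch are taken from slices 0 and 3, matching the "every Nth slice" initialization with N=3.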
Turning briefly to
Returning to process 400 of
The cluster segment representatives for the K groups may then be given as:
Representative_IDX[n]=arg min_i [d(i,n)]
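As a non-limiting sketch, the representative slice for each cluster may be found by taking, for each cluster n, the slice index i that minimizes d(i,n). The function name pick_representatives and the sample values are hypothetical; the distance is the same absolute-difference measure used for the clustering.

```python
def pick_representatives(ac_dc, centroids):
    """Return, for each cluster n, the index i of the slice minimizing d(i, n)."""
    representatives = []
    for c_ac, c_dc in centroids:
        dists = [abs(ac - c_ac) + abs(dc - c_dc) for ac, dc in ac_dc]
        representatives.append(min(range(len(ac_dc)), key=dists.__getitem__))
    return representatives


# Hypothetical statistics and centroids for two clusters.
stats = [(1, 1), (1, 2), (2, 1), (10, 10), (10, 11), (11, 10)]
reps = pick_representatives(stats, [(1.2, 1.2), (10.4, 10.4)])
```

Where two slices are equally close to a centroid, this sketch keeps the one that comes first in raster order, which is an arbitrary tie-breaking choice.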
In one embodiment, after clustering in the AC/DC domain, the members may be mapped back onto the sectioned image, such as illustrated in
The representation cluster slices may be selected ideally to represent the entire image, and to provide estimates on what scale factor to use for the entire image, as well as how the BRC curve appears. As shown in
As noted above, the initial scale factor is determined at block 403 of
In any event, in one embodiment, at block 403, the initial scale factor may be determined based on the AC statistics determined from block 402, and a desired compression ratio (CR). For example, in one embodiment, the initial scale factor may be obtained from a look-up table (LUT), such as illustrated in the non-limiting non-exhaustive LUT 1000 of
As an example, looking briefly to LUT 1000 of
Returning to process 400 of
That is, each cluster representative is compressed and the result of the bitstream size may be stored in a vector Representative_stream_size[n] providing the size in bits.
The BPP for the cluster representative may be given by:
BPP[n]=Representative_stream_size[n]/input_image_number_pixels
The results of the compression are then used to calculate a weighted average Compression Ratio or the equivalent inverse BPP. That is, the average BPP is a weighted sum of the representative BPP and is given by:
Average_BPP=Σ BPP[n]*ClusterCount[n]/Σ ClusterCount[n]
for:
∀n ∈[0, K-1]
The average CR can be determined as:
Average_CR=YUV_image_BPP/Average_BPP
Weights may be set according to the relative size of each segment relative to the total number of slices. The scale factor may then be adjusted according to a degree of over/undershoot of a target BPP. For example, where the compression is of the YUV image, the result with a target_bitstream_size is then:
Target_CR=YUV_image_size/target_bitstream_size
And, the target_BPP is then determined as:
Target_BPP=YUV_image_BPP/Target_CR
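The averaging and target calculations above may be illustrated with the following non-limiting sketch. The function names and the sample numbers (including a YUV image BPP of 16) are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def average_bpp(bpp, cluster_count):
    """Weighted average of the representative BPP values, weighted by cluster size."""
    total = sum(cluster_count)
    if total == 0:
        raise ValueError("cluster_count must contain at least one member")
    return sum(b * c for b, c in zip(bpp, cluster_count)) / total


def compression_targets(yuv_image_bpp, yuv_image_size, target_bitstream_size):
    """Return (Target_CR, Target_BPP) for a preset bitstream size."""
    target_cr = yuv_image_size / target_bitstream_size
    target_bpp = yuv_image_bpp / target_cr
    return target_cr, target_bpp


# Two clusters: 3 slices at 2.0 BPP and 1 slice at 4.0 BPP.
avg_bpp = average_bpp([2.0, 4.0], [3, 1])           # (6 + 4) / 4
avg_cr = 16.0 / avg_bpp                             # Average_CR
target_cr, target_bpp = compression_targets(16.0, 16_000_000, 2_000_000)
```

In this example the first compression averages 2.5 BPP against a target of 2.0 BPP, so it overshoots the target and the scale factor would be adjusted toward stronger compression before the representation slices are compressed again.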
The representation slices are then compressed again using the adjusted scale factor. Again, using the equations above, the weighted average of the BPP is calculated. The first and second sets of scale factors and resulting average BPP values may then be used at block 407 of process 400 of
BPP=function(Scale factor)
where the log linear model may be defined as:
log2 y=a log2 x+b
While the above equation uses the base of two, other bases may also be used. The following system of linear equations may then be used based on values of scale factors and BPPs obtained from block 406:
At block 407 the following equations may then be solved to determine the parameters A[n] and B[n] for each representation slice.
log2 BPP1[n]=A[n] log2 SF1+B[n]
log2 BPP2[n]=A[n] log2 SF2+B[n]
For the global case, the BPP may be represented by the Average_BPP as determined above and the following equations may then be solved to determine the global parameters A and B
log2 Average_BPP1=A log2 SF1+B
log2 Average_BPP2=A log2 SF2+B
Briefly,
Proceeding to block 408 of
SFfinal=pow(2,(log2 BPPtarget−B)/A)
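As a non-limiting sketch, the two log linear equations may be solved for A and B directly from the two (scale factor, BPP) measurements, and the model may then be inverted to obtain the final scale factor. The function names are hypothetical, and the sample values are chosen as powers of two so the result can be checked by hand.

```python
import math


def fit_log_linear(sf1, bpp1, sf2, bpp2):
    """Solve log2(BPP) = A*log2(SF) + B from two measurements."""
    x1, x2 = math.log2(sf1), math.log2(sf2)
    y1, y2 = math.log2(bpp1), math.log2(bpp2)
    if x1 == x2:
        raise ValueError("the two scale factors must differ")
    a = (y2 - y1) / (x2 - x1)   # slope
    b = y1 - a * x1             # intercept
    return a, b


def final_scale_factor(a, b, bpp_target):
    """Invert the model: SFfinal = 2 ** ((log2(BPPtarget) - B) / A)."""
    return 2 ** ((math.log2(bpp_target) - b) / a)


# SF=1 gave 4 BPP and SF=4 gave 1 BPP, so A=-1 and B=2.
a, b = fit_log_linear(1.0, 4.0, 4.0, 1.0)
sf_final = final_scale_factor(a, b, 2.0)
```

The same two functions apply to the per-slice case by passing BPP1[n] and BPP2[n] to obtain A[n] and B[n].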
The individual MCUs (Minimal Codec Units) of the YUV image may be mapped onto the slices. Where there is a mismatch between image slice sizes and MCU sizes, an MCU might not fall entirely within a single slice. Thus, those MCUs that fall on edges between slices are assigned, in one embodiment, either by where an upper right hand corner falls, or by where a largest intersection in the area between the slice and MCU is found. Other approaches may also be used. This mapping is performed to determine the number of bits to allot to each MCU for the construction of the BRC curve described further below.
With the above LLM for each representation slice, the expected BPP for each representation slice may also be determined as:
BPPRepSlice[n]=pow(2,(A[n]*log2 SFfinal+B [n]))
Further, BPPRepSlice[n] represents the expected BPP of each representation slice, and the parameters A[n] and B[n] are used to calculate the target BPP for each cluster, using the equation above. It may be assumed that each slice has the same BPP as its representation slice and this may then be used to calculate the BRC curve.
Process 400 next flows to block 409, where the BRC curve is created.
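One non-limiting way to sketch the construction of such a curve is to accumulate the expected bits of each slice in raster order, assuming each slice takes the expected BPP of its cluster's representation slice. The function names, the per-slice granularity (rather than per MCU), and the uniform pixels_per_slice parameter are assumptions of this sketch and not details taken from the disclosure.

```python
import math


def expected_bpp(a, b, sf_final):
    """BPPRepSlice = 2 ** (A * log2(SFfinal) + B) for one representation slice."""
    return 2 ** (a * math.log2(sf_final) + b)


def build_brc_curve(idx, bpp_rep_slice, pixels_per_slice):
    """Accumulated expected bits after each slice, in raster order.

    idx[m] is the cluster of slice m; bpp_rep_slice[n] is the expected BPP
    of the representation slice of cluster n.
    """
    curve, accumulated = [], 0.0
    for cluster in idx:
        accumulated += bpp_rep_slice[cluster] * pixels_per_slice
        curve.append(accumulated)
    return curve


# Three slices of 100 pixels: two in cluster 0 (1 BPP), one in cluster 1 (2 BPP).
curve = build_brc_curve([0, 0, 1], [1.0, 2.0], 100)
```

The last entry of the curve is the expected total size in bits, so it may be compared against the preset file size as a sanity check.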
Continuing to block 410, using the final scale factor from block 408, and the BRC curve of block 409, the entire image may then be compressed. As discussed above, the image that is compressed may be compressed using the JPEG compression/encoders. In one embodiment, the YUV image after being converted from the Bayer image format and having undergone any additional image processing may be used as the input to block 410.
Along with compressing the image using the final scale factor, block 410 is also configured to monitor the JPEG encoder for possible overflows in bits. When the accumulated bits exceed the bits set by the BRC curve of block 409, a private quantizer technique may be employed to increase the effective scale factor for a specific MCU. The effect of the private quantizer is that fewer bits may be used to encode an MCU without modifying the scale factor.
As shown in process 1200, the input to block 1201 may be an MCU from the processed YUV of block 411 of
Processing then flows to block 1202 where quantization is performed. In one embodiment, the quantization may be performed using default quantization matrices for the luma (Y) and chroma (C) components, as defined by the JPEG Standard (and available from ISO/IEC IS 10918-1 ITU-T Recommendation T.81), examples of which are illustrated in
The output of the quantization block 1202 then is input to private quantizer 1203, along with the BRC curve from block 409 of
In any event, when block 1203 is activated, all of the AC values (u≠0 or v≠0) of the resulting Q[u][v] block are compared against a threshold value. In one embodiment, the DC value (u=v=0) may be unchanged. When an AC value is below the threshold, it may be zeroed, which removes small coefficients and reduces the BPP for the block, thereby maintaining a bit rate control consistent with the BRC curve for the image and preventing overshooting. In one embodiment, the threshold value may be different for each color component. However, in another embodiment, the threshold may be created as a coefficient matrix, such as T[u][v], and thus be specific to the DCT coefficients.
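The thresholding performed by the private quantizer may be sketched, by way of a non-limiting example, as follows. The function names are hypothetical, a small 3x3 block is used for brevity instead of an 8x8 block, and comparing the magnitude (absolute value) of each AC coefficient against a single scalar threshold is an assumption of this sketch.

```python
def should_activate(accumulated_bits, brc_curve, mcu_index):
    """Activate the private quantizer when accumulated bits exceed the BRC curve."""
    return accumulated_bits > brc_curve[mcu_index]


def private_quantize(block, threshold):
    """Zero every AC coefficient whose magnitude is below the threshold.

    The DC coefficient at (u, v) = (0, 0) is left unchanged.
    """
    out = [row[:] for row in block]  # copy so the input block is not modified
    for u, row in enumerate(out):
        for v, value in enumerate(row):
            is_ac = u != 0 or v != 0
            if is_ac and abs(value) < threshold:
                out[u][v] = 0
    return out


quantized = [[1, 3, -1],
             [2, -4, 0],
             [1, 1, 5]]
thinned = private_quantize(quantized, threshold=2)
```

In this example the DC value of 1 survives even though it is below the threshold, while the small AC values are zeroed, which lengthens the zero runs seen by the entropy coder.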
The output of process 1203 of
It will be understood that each component of the illustrations of
Accordingly, components of the flow illustrations support combinations of means for performing the specified actions, combinations of steps for performing the specified actions and program instruction means for performing the specified actions. It will also be understood that each component of the flow illustrations, and combinations of components in the flow illustrations, can be implemented by special purpose hardware-based systems, which perform the specified actions or steps, or combinations of special purpose hardware and computer instructions.
This application is a utility patent application based on previously filed U.S. Provisional Patent Application, Ser. No. 61/514,784 filed on Aug. 3, 2011, the benefit of which is hereby claimed under 35 U.S.C. §119(e) and incorporated herein by reference.
Number | Date | Country
---|---|---
61514784 | Aug 2011 | US