Embodiments of the present invention comprise methods and systems for halftoning digital image data. Some embodiments provide cache-optimized halftoning of digital image data. Some embodiments may relate to other areas of image processing, such as image resizing.
Image processing is typically both data intensive and CPU intensive. Two common image processing operations encountered in printer technology are resizing of image data and halftoning (or screening) of image data. These operations are sometimes done as a single composite operation; they are also done as two separate operations performed sequentially, with resizing of data done first.
There are many different resize algorithms. Some algorithms involve simple replication of input pixel data. Other algorithms involve interpolation of multiple input pixels.
The halftone operation can involve converting 8-bit-per-pixel input data to output data with a lower bit depth (common pixel depths are 1, 2, and 4 bits). It may also involve color adjustment in a device-dependent manner.
In practice, there are different strategies for combining the operations of resize and halftone. These approaches typically involve reading through the halftone screen data sequentially.
The image processing operations of resize and halftone are both processing- and data-intensive; these operations typically consume a significant percentage of time in the overall processing of an image. Performance and efficiency improvements in this area of image processing can provide significant benefits.
Some embodiments of the present invention comprise methods and systems for cache-optimized halftoning of digital image data. Some embodiments comprise processing image data in a non-sequential order that is related to the processing screen, mask or cell dimensions for increased processing performance.
The foregoing and other objectives, features, and advantages of the invention will be more readily understood upon consideration of the following detailed description of the invention taken in conjunction with the accompanying drawings.
Embodiments of the present invention will be best understood by reference to the drawings, wherein like parts are designated by like numerals throughout. The figures listed above are expressly incorporated as part of this detailed description.
It will be readily understood that the components of the present invention, as generally described and illustrated in the figures herein, could be arranged and designed in a wide variety of different configurations. Thus, the following more detailed description of the embodiments of the methods and systems of the present invention is not intended to limit the scope of the invention, but is merely representative of the presently preferred embodiments of the invention.
Elements of embodiments of the present invention may be embodied in hardware, firmware and/or software. While exemplary embodiments revealed herein may only describe one of these forms, it is to be understood that one skilled in the art would be able to effectuate these elements in any of these forms while resting within the scope of the present invention.
Some embodiments of the present invention improve performance of image processing activities by processing the image data in a non-sequential manner. Some embodiments may operate on the principle that the speed of data access increases the closer the data is to the processor core.
Accordingly, some embodiments may keep and re-use data in higher speed memory (L1 cache and registers) for a longer time, thus improving performance.
For multi-level threshold halftone algorithms, there are multiple bytes of screen data for every destination pixel to be written. In the case of 8-bit to 4-bit halftoning, there may be 15 bytes of screen data associated with any given destination pixel. Given that the halftone data is the greatest per-pixel burden on memory, some embodiments may process the image data in a manner that minimizes the loading of halftone screen data.
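By way of illustration only, the following minimal sketch shows one way a multi-level threshold halftone might map an 8-bit input value to a 4-bit output level using 15 threshold bytes for a single screen cell. The function name `halftone_pixel`, the evenly spaced thresholds, and the rule that the output level equals the number of thresholds met or exceeded are assumptions made for the example and are not taken from this description.

```python
def halftone_pixel(value, thresholds):
    """Map an 8-bit value to an output level using per-cell thresholds.

    Assumption: the output level is the count of thresholds that the
    input value meets or exceeds. With 15 thresholds the result lies
    in 0..15, which fits in 4 bits.
    """
    level = 0
    for t in thresholds:
        if value >= t:
            level += 1
    return level


# Hypothetical screen cell: 15 evenly spaced threshold bytes (16, 32, ..., 240).
example_cell = [16 * (i + 1) for i in range(15)]

low = halftone_pixel(0, example_cell)
mid = halftone_pixel(128, example_cell)
high = halftone_pixel(255, example_cell)
```

Because all 15 threshold bytes are consulted for each destination pixel, reusing one cell's thresholds across many pixels before loading the next cell is what reduces the memory traffic discussed above.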
In some embodiments of the present invention, processing may traverse the data in a halftone-centric manner. In this manner, a given halftone cell may be selected and every destination pixel associated with that cell is then identified and processed. Thus, the processing of data is not in the order of either the source or destination image data. Instead, the image traversal follows an order wherein the “next pixel” is the next one to use a given halftone cell's data.
Some embodiments of the present invention may be applied in a situation wherein a halftone screen is organized as a square (or rectangular) array, and has the same orientation as the output image (i.e., there is no skew or shift to the data). In this case, the image data may be processed modulo the halftone screen's dimension in the direction of processing, e.g., width and height. For instance, if the halftone screen is 128×128 pixels, the data may be processed in the order of every 128 (output) pixels, and every 128 (output) lines.
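The traversal described above can be sketched as follows, under stated assumptions. The example uses a bi-level screen (one threshold per cell) and treats a pixel as "on" when its value meets or exceeds the threshold; both choices, and the function names `halftone_sequential` and `halftone_screen_order`, are hypothetical and chosen only to keep the sketch short. The sketch loads each screen cell once and then visits every output pixel that uses it, stepping by the screen width and height.

```python
def halftone_sequential(image, screen):
    """Reference: visit output pixels in ordinary raster order."""
    sh, sw = len(screen), len(screen[0])
    return [[1 if image[y][x] >= screen[y % sh][x % sw] else 0
             for x in range(len(image[0]))]
            for y in range(len(image))]


def halftone_screen_order(image, screen):
    """Visit output pixels grouped by the screen cell they use."""
    height, width = len(image), len(image[0])
    sh, sw = len(screen), len(screen[0])
    out = [[0] * width for _ in range(height)]
    for sy in range(sh):
        for sx in range(sw):
            threshold = screen[sy][sx]  # loaded once, then reused
            # Every sh-th line and every sw-th pixel shares this cell.
            for y in range(sy, height, sh):
                for x in range(sx, width, sw):
                    out[y][x] = 1 if image[y][x] >= threshold else 0
    return out


# Small hypothetical image (5 lines of 7 pixels) and a 2x3 screen.
image = [[(x * 37 + y * 53) % 256 for x in range(7)] for y in range(5)]
screen = [[32, 96, 160], [224, 64, 128]]
same = halftone_screen_order(image, screen) == halftone_sequential(image, screen)
```

Both functions produce the same output; only the order in which pixels are visited differs, which is the property the non-sequential embodiments rely on.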
Some embodiments of the present invention may be described with reference to
This process may be applied to resizing or scaling, halftoning, filtering, interpolating, rotating, transforming, and other processes or combinations thereof. In some embodiments, combined processes, such as scaling and halftoning or scaling and filtering, may be implemented in the processing step before proceeding to the next non-sequential data unit. These combined processes may be referred to as a single “process” or “processing.”
Other embodiments of the present invention may be described with reference to
In alternative embodiments, data units may be processed by columns. Again, for example, starting with the upper-left data unit 40, every data unit having the same result from the operation y mod 3=1 may be processed. This operation will process every third data unit in the first column, after the first unit, followed by every third unit in each of the subsequent columns, after the first unit in each column. The second unit in each column followed by every third unit thereafter will then be processed and so on. Again, this relationship and other similar relationships may also be expressed as a periodic function or by some other mathematical or logical expression.
In some alternative embodiments, where the image data is multi-dimensional, data units corresponding to a multi-dimensional constraint may be processed. For example, data units corresponding to the relationship: x mod a=c and y mod b=d may be processed before incrementing values. In this example, starting with the upper-left data unit 40, every data unit in the first row that satisfies the relationship: x mod a=1 will be processed until the end of the first row. Thereafter, every data unit that satisfies the relationship: x mod a=1 in a row that satisfies the relationship: y mod b=1 will be processed and so on. These units are hatched and designated at 45 in
This process may also be applied to resizing or scaling, halftoning, filtering, interpolating, rotating, transforming, and other processes or combinations thereof. In some embodiments, combined processes, such as scaling and halftoning or scaling and filtering, may be implemented in the processing step before proceeding to the next non-sequential data unit. These combined processes may be referred to as a single “process” or “processing.”
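As a non-limiting illustration of the modular selection rules described above (for example, x mod a = c and y mod b = d), the following sketch lists the data units that satisfy a given constraint and then builds a complete processing order by stepping through the residues. It assumes 1-based coordinates with the upper-left data unit at (1, 1), which is one reading of the examples above; the function names `units_matching` and `full_order` are hypothetical.

```python
def units_matching(width, height, a, b, c, d):
    """Return (x, y) units with x mod a == c and y mod b == d.

    Assumption: coordinates are 1-based, upper-left unit is (1, 1).
    Units are listed row by row, left to right within each row.
    """
    return [(x, y)
            for y in range(1, height + 1) if y % b == d
            for x in range(1, width + 1) if x % a == c]


def full_order(width, height, a, b):
    """Visit every unit once, one (c, d) residue pair at a time."""
    order = []
    # Residues are taken in the order 1, 2, ..., modulus-1, 0 so that
    # processing starts with the upper-left unit.
    for d in [r % b for r in range(1, b + 1)]:
        for c in [r % a for r in range(1, a + 1)]:
            order.extend(units_matching(width, height, a, b, c, d))
    return order


first_group = units_matching(6, 6, 3, 3, 1, 1)
order = full_order(6, 6, 3, 3)
```

For a 6×6 grid with a = b = 3, the first group contains the four units at (1, 1), (4, 1), (1, 4) and (4, 4), and the complete order visits each of the 36 units exactly once.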
Other embodiments of the present invention may implement variations wherein the order of the data processing is based upon the size of the halftone screen, including (but not limited to) the following:
Given that the halftone data accounts for the great majority of memory references in these image processing operations, these embodiments will improve performance by minimizing the cache misses for the halftone data.
There is an additional opportunity for performance improvement with these embodiments: on many modern processors, using a given cell of halftone data repeatedly makes it possible to keep most or all of the cell data in the processor registers. Registers are the fastest storage option for data access, surpassing even L1 cache. As an example, for a given 32-register machine and a standard compiler, there is a sufficient number of scratch registers to store 14 of the 15 bytes of halftone data, without having to resort to assembly language programming.
In some embodiments, multiple operations, e.g., resize and halftone or rotate and filter, may be combined in a single algorithm in the interest of efficiency and performance. This adds a complication to these embodiments: not only is the data traversed in a non-standard order, but the location of the source input pixel (or pixels) must also be calculated based upon the constraints of the resize or rotate operation, in a manner which still yields a performance benefit for the operation as a whole.
Some embodiments of the present invention may be described with reference to
In the embodiments illustrated in
In
Similar to the input pixel values and the output pixel values, the screen 70 comprises elements that are labeled with a combination of the upper-case letter “S” and a number. These labels, S0-S9, may represent a screen operator, which may comprise a numeric value, a function or some other operation that is applied through the screen 70. In a typical application, a screen operator, e.g., S9 at 74, is applied to an input image pixel value, e.g., d at 58, to produce an output image pixel value, e.g., D9 at 62.
In these embodiments of the present invention, a first screen index location 72 is selected and the first output pixel 61 corresponding to that screen index location is determined. The input image pixel 51 associated with that first output pixel 61 is then determined and that input image pixel 51 is processed. This processing may comprise application of the screen operator, S0 at 72 to the input pixel value, a, at 51 to produce an output pixel value, A0, at 61. Processing then proceeds to the next input pixel 58 associated with an output image pixel 64 that corresponds to the selected screen index location 72. This process continues until all input image pixels associated with output image pixels corresponding to a screen index location are processed. Then, the next screen index location is selected and all input image pixels that relate to output image pixels associated with that next screen index location are processed. This process continues until each screen location is selected and the related pixels are processed.
For a first output pixel 61 location, the associated input pixel location 51 is determined. This location may be determined using an input location process 66 that, in some embodiments, may comprise an inverse process of combined process 57. In some embodiments, wherein combined process 57 comprises a resizing operation and one or more additional operations, input location process 66 may comprise an inverse resizing operation and may not relate to the additional operations. In alternative embodiments, this location may be determined by other methods. The data for this input pixel 51 (“a”) may be halftoned (or screened) using screen cell S0, to generate the output data (A0) for first output pixel 61. The processing order for these embodiments may then proceed to the next output destination pixel location. For the first screen cell, S0 at 72, the next output image pixel location is determined to be D0 at 64, and the associated input pixel location 58 is determined as discussed. The data for this input pixel (“d”) is halftoned or otherwise processed using cell S0, to generate the output image data, D0 at 64. This process may be repeated until the end of either the input stream or output buffer is reached.
When processing proceeds in this order, the address of the “next output pixel location” is the address of the previous output pixel location plus the screen width (10, in this case) until all pixels corresponding to a screen index location are processed. Then, the next screen index location is selected, e.g. S1, and the process is repeated for that screen index location.
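The stepping rule above, combined with an inverse resize to locate the input pixel, might be sketched for a single line as follows. This is a sketch under assumptions and not the method of the illustrated embodiment: it assumes nearest-neighbor (replication) resizing, so the input location is `dst_x * src_width // dst_width`, and a bi-level screen in which a pixel is "on" when its value meets or exceeds the threshold. The names `visit_order` and `resize_halftone_line` are hypothetical.

```python
def visit_order(dst_width, screen_width):
    """Output positions in the order they are processed."""
    order = []
    for s in range(screen_width):
        # Next output location = previous location + screen width.
        order.extend(range(s, dst_width, screen_width))
    return order


def resize_halftone_line(src, dst_width, screen_row):
    """Combined resize and halftone of one line, screen cell by cell."""
    src_width = len(src)
    screen_width = len(screen_row)
    dst = [0] * dst_width
    for s in range(screen_width):
        threshold = screen_row[s]  # one screen cell, reused below
        for dst_x in range(s, dst_width, screen_width):
            # Inverse resize (assumed nearest-neighbor): find the input
            # pixel that this output pixel is derived from.
            src_x = dst_x * src_width // dst_width
            dst[dst_x] = 1 if src[src_x] >= threshold else 0
    return dst


# Hypothetical data: 4 input pixels enlarged to 25 output pixels,
# with a screen that is 10 cells wide.
src = [0, 64, 128, 255]
screen_row = [12, 37, 62, 87, 112, 137, 162, 187, 212, 237]
line = resize_halftone_line(src, 25, screen_row)
```

With a screen width of 10 and 25 output pixels, the first screen cell is applied at output positions 0, 10 and 20 before the second cell is applied at positions 1, 11 and 21, and so on.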
The exemplary embodiment illustrated in
These embodiments of the present invention utilize a composite halftone-resize operation wherein the location of the output destination pixel and the nature of the resize algorithm/screen, e.g. screen index location, define which input pixel(s) will be used to determine the input value(s) to be used in the combined operation.
d. Increment destination column by halftone cell width
5. Increment the destination row by halftone cell height
The lines in bold designate elements that implement a non-sequential aspect of some embodiments: the data is not traversed in sequential order, but in steps determined by the height and width of the halftone screen.
This section describes some exemplary embodiments of the present invention. These embodiments process data on a per-line basis, as part of a broader resize-halftone operation over the entire image.
For ease of exposition, assume the following:
Empirical results indicate significant performance improvements using such an approach. The actual performance gain is dependent upon many factors:
The terms and expressions which have been employed in the foregoing specification are used therein as terms of description and not of limitation, and there is no intention in the use of such terms and expressions of excluding equivalents of the features shown and described or portions thereof, it being recognized that the scope of the invention is defined and limited only by the claims which follow.