The example embodiments relate generally to image processing, and more specifically, to multi-pass image processing.
A number of image processing systems use multi-pass architectures. In such architectures, multiple downscaled versions of an image may be sequentially processed. For example, a full-scale image (1:1 scale) may be received by an image front end (e.g., received from an image sensor), and a number of downscaled versions of that full-scale image may be generated, such as 1:4 scale and 1:16 scale versions. The full-scale image and the downscaled images may be stored in a memory, such as a random access memory (RAM). An image processor may then process the images sequentially from lower resolutions to higher resolutions. For example, such an image processor may process the 1:16 scale image, then the 1:4 scale image, and finally the 1:1 full-scale image. Such techniques may be used for two-dimensional filtering, de-mosaicing, lens rolloff correction, scaling, color correction, color conversion, noise reduction filtering, spatial filtering, scale space image processing, and other image processing applications.
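The conventional coarse-to-fine flow can be sketched as follows. This is a minimal, hypothetical illustration, not the disclosed implementation: the function names are invented, downscaling is assumed to be a simple box filter, and the scale ratios are treated as per-dimension factors (consistent with the UHD example later in this description, where 1:4 of 3840×2160 is 960×540). Note that the whole pyramid is built and held before the first (smallest) level is processed, which is the sequential dependency discussed below.

```python
from typing import Callable, List, TypeVar

Image = List[List[float]]
T = TypeVar("T")


def downscale(img: Image, factor: int) -> Image:
    """Box-filter downscale by `factor` in each dimension.

    Assumes both image dimensions are multiples of `factor`.
    """
    out: Image = []
    for r in range(0, len(img), factor):
        row = []
        for c in range(0, len(img[0]), factor):
            block = [img[r + i][c + j] for i in range(factor) for j in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def build_pyramid(full: Image, factor: int = 4, levels: int = 3) -> List[Image]:
    """Return [1:1, 1:factor, 1:factor**2, ...] versions of `full`."""
    pyramid = [full]
    for _ in range(levels - 1):
        pyramid.append(downscale(pyramid[-1], factor))
    return pyramid


def process_coarse_to_fine(pyramid: List[Image], process: Callable[[Image], T]) -> List[T]:
    """Conventional multi-pass order: lowest resolution first, full scale last."""
    return [process(level) for level in reversed(pyramid)]


# Usage: a 32x32 image yields 1:1, 1:4 and 1:16 levels, processed smallest first.
img = [[float(r * 32 + c) for c in range(32)] for r in range(32)]
shapes = process_coarse_to_fine(build_pyramid(img), lambda p: (len(p), len(p[0])))
# shapes == [(2, 2), (8, 8), (32, 32)]
```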
Multi-pass processing can allow for higher quality image processing at a relatively low cost. For example, multi-pass architectures can allow for effective kernel sizes of filters to be significantly larger than their actual size as implemented.
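As a rough worked example of the effective-kernel-size claim: a k×k kernel applied to a 1:s (per-dimension) downscaled image touches a region of roughly (k·s)×(k·s) full-scale pixels. The sketch below tabulates this under the stated assumptions (a hypothetical helper; the support of the downscaling and upscaling filters is ignored, so the figures are approximate).

```python
from typing import List, Tuple


def effective_footprints(kernel_size: int, factor: int = 4, levels: int = 3) -> List[Tuple[int, int]]:
    """Approximate per-dimension full-scale footprint of a kernel at each pyramid level.

    Returns (downscale factor, footprint in full-scale pixels) pairs. The support of
    the downscaling/upscaling filters is ignored, so values are approximate.
    """
    footprints = []
    for level in range(levels):
        scale = factor ** level
        footprints.append((scale, kernel_size * scale))
    return footprints


# A 5x5 kernel applied at the 1:16 level spans roughly 80x80 full-scale pixels.
print(effective_footprints(5))
```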
One aspect of conventional multi-pass architectures is that the larger resolution images cannot be processed or discarded until the smaller resolution images have been processed. This sequential dependency can be costly for high resolution images. For example, significant bandwidth may be expended writing each full-scale image into RAM, which may result in significant power consumption, particularly if the RAM is off-chip. It may also add one frame of delay to a preview stream corresponding to the processed images, as the full-scale image is copied to RAM and then fetched for further processing. Using on-chip memory or caching instead may reduce this bandwidth, power consumption, and extra frame delay, but may require a large amount of on-chip memory, which may be costly to implement.
This Summary is provided to introduce in a simplified form a selection of concepts that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to limit the scope of the claimed subject matter.
Aspects of the present disclosure are directed to methods and apparatus for processing image data. An example method may include sequentially receiving a plurality of raster lines corresponding to an image, and grouping the received plurality of raster lines into a plurality of full-scale horizontal stripes of image data. For each full-scale horizontal stripe of image data, the method may include generating a first downscaled version of the full-scale horizontal stripe, generating a full-scale rotated stripe by rotating the full-scale horizontal stripe to a vertical orientation, generating a first downscaled rotated stripe by rotating the first downscaled version of the full-scale horizontal stripe to the vertical orientation, and performing image processing on the full-scale rotated stripe and the first downscaled rotated stripe before all subsequent raster lines of the image have been received.
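The example method can be illustrated with a short streaming sketch. It is a hypothetical model under stated assumptions, not the claimed implementation: `process_stripes`, `downscale`, and `rotate_cw` are invented names, "rotating to a vertical orientation" is modeled as a 90-degree clockwise rotation, the first downscaled version uses a box filter, and `process` is a caller-supplied stand-in for the multi-pass image processing. Because the function is a generator, each stripe is downscaled, rotated, and processed as soon as its last raster line arrives, before the remaining lines have been received.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

Stripe = List[List[float]]
T = TypeVar("T")


def downscale(stripe: Stripe, factor: int) -> Stripe:
    """Box-filter downscale by `factor` in each dimension (dimensions must divide evenly)."""
    return [
        [
            sum(stripe[r + i][c + j] for i in range(factor) for j in range(factor)) / factor**2
            for c in range(0, len(stripe[0]), factor)
        ]
        for r in range(0, len(stripe), factor)
    ]


def rotate_cw(stripe: Stripe) -> Stripe:
    """Rotate 90 degrees clockwise: an H x W horizontal stripe becomes a W x H vertical stripe."""
    return [list(row) for row in zip(*stripe[::-1])]


def process_stripes(
    raster_lines: Iterable[List[float]],
    stripe_height: int,
    factor: int,
    process: Callable[[Stripe, Stripe], T],
) -> Iterator[T]:
    """Group raster lines into full-scale horizontal stripes and process each stripe
    (full-scale and first downscaled version, both rotated) as soon as it is complete."""
    buffer: Stripe = []
    for line in raster_lines:
        buffer.append(line)
        if len(buffer) == stripe_height:
            full, buffer = buffer, []
            down = downscale(full, factor)  # first downscaled version of the stripe
            yield process(rotate_cw(full), rotate_cw(down))
    if buffer:
        raise ValueError("image height is not a multiple of stripe_height")
```

Usage: with a source that yields 8 lines of width 8 and `stripe_height=4`, the first result is available after only 4 lines have been consumed.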
In another example, an image processing system configured to process an image is disclosed. The image processing system includes an image front end (IFE) to sequentially receive a plurality of raster lines corresponding to the image, and group the received plurality of raster lines into a plurality of full-scale horizontal stripes of image data. The image processing system also includes one or more processors, and a first memory storing instructions that, when executed by the one or more processors, cause the image processing system to, for each full-scale horizontal stripe of image data: generate a first downscaled version of the full-scale horizontal stripe, generate a full-scale rotated stripe by rotating the full-scale horizontal stripe to a vertical orientation, generate a first downscaled rotated stripe by rotating the first downscaled version of the full-scale horizontal stripe to the vertical orientation, and perform image processing on the full-scale rotated stripe and the first downscaled rotated stripe before all subsequent raster lines of the image have been received by the IFE.
In another example, a non-transitory computer readable storage medium is disclosed, storing instructions that when executed by one or more processors of an image processor, cause the image processor to process an image by performing operations including sequentially receiving a plurality of raster lines corresponding to an image, and grouping the received plurality of raster lines into a plurality of full-scale horizontal stripes of image data. For each full-scale horizontal stripe of image data, the operations may include generating a first downscaled version of the full-scale horizontal stripe, generating a full-scale rotated stripe by rotating the full-scale horizontal stripe to a vertical orientation, generating a first downscaled rotated stripe by rotating the first downscaled version of the full-scale horizontal stripe to the vertical orientation, and performing image processing on the full-scale rotated stripe and the first downscaled rotated stripe before all subsequent raster lines of the image have been received.
In another example, an image processing system configured to process an image is disclosed. The image processing system includes means for sequentially receiving a plurality of raster lines corresponding to an image, and means for grouping the received plurality of raster lines into a plurality of full-scale horizontal stripes of image data. For each full-scale horizontal stripe of image data, the image processing system may include means for generating a first downscaled version of the full-scale horizontal stripe, means for generating a full-scale rotated stripe by rotating the full-scale horizontal stripe to a vertical orientation, means for generating a first downscaled rotated stripe by rotating the first downscaled version of the full-scale horizontal stripe to the vertical orientation, and means for performing image processing on the full-scale rotated stripe and the first downscaled rotated stripe before all subsequent raster lines of the image have been received.
The example embodiments are illustrated by way of example and are not intended to be limited by the figures of the accompanying drawings, where:
Like reference numerals refer to corresponding parts throughout the drawing figures.
In the following description, numerous specific details are set forth such as examples of specific components, circuits, and processes to provide a thorough understanding of the present disclosure. The term “coupled” as used herein means connected directly to or connected through one or more intervening components or circuits. Also, in the following description and for purposes of explanation, specific nomenclature is set forth to provide a thorough understanding of the example embodiments. However, it will be apparent to one skilled in the art that these specific details may not be required to practice the example embodiments. In other instances, well-known circuits and devices are shown in block diagram form to avoid obscuring the present disclosure. Some portions of the detailed descriptions which follow are presented in terms of procedures, logic blocks, processing and other symbolic representations of operations on data bits within a computer memory. These descriptions and representations are the means used by those skilled in the relevant art to most effectively convey the substance of their work to others skilled in the art. In the present application, a procedure, logic block, process, or the like, is conceived to be a self-consistent sequence of steps or instructions leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, although not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated in a computer system.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussions, it is appreciated that throughout the present application, discussions utilizing the terms such as “accessing,” “receiving,” “sending,” “using,” “selecting,” “determining,” “normalizing,” “multiplying,” “averaging,” “monitoring,” “comparing,” “applying,” “updating,” “measuring,” “deriving” or the like, refer to the actions and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
In the figures, a single block may be described as performing a function or functions; however, in actual practice, the function or functions performed by that block may be performed in a single component or across multiple components, and/or may be performed using hardware, using software, or using a combination of hardware and software. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the example embodiments. Also, the example image processing devices may include components other than those shown, including well-known components such as one or more processors, memory and the like.
The techniques described herein may be implemented in hardware, software, firmware, or any combination thereof, unless specifically described as being implemented in a specific manner. Any features described as modules or components may also be implemented together in an integrated logic device or separately as discrete but interoperable logic devices. If implemented in software, the techniques may be realized at least in part by a non-transitory processor-readable storage medium comprising instructions that, when executed, perform one or more of the methods described above. The non-transitory processor-readable data storage medium may form part of a computer program product, which may include packaging materials.
The non-transitory processor-readable storage medium may comprise random access memory (RAM) such as synchronous dynamic random access memory (SDRAM), read only memory (ROM), non-volatile random access memory (NVRAM), electrically erasable programmable read-only memory (EEPROM), FLASH memory, other known storage media, and the like. The techniques additionally, or alternatively, may be realized at least in part by a processor-readable communication medium that carries or communicates code in the form of instructions or data structures and that can be accessed, read, and/or executed by a computer or another processor.
The various illustrative logical blocks, modules, circuits and instructions described in connection with the embodiments disclosed herein may be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), application specific instruction set processors (ASIPs), field programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. The term “processor,” as used herein may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated software modules or hardware modules configured as described herein, for example software modules or hardware modules comprising stages in one or more image processing pipelines. Also, the techniques could be fully implemented in one or more circuits or logic elements. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
The example embodiments are not to be construed as limited to specific examples described herein but rather to include within their scopes all embodiments defined by the appended claims.
As mentioned above, conventional multi-pass image processing architectures receive a full-scale image, generate one or more downscaled versions of the full-scale image, and process the full-scale image and the one or more downscaled versions. Such architectures can allow for increased effective kernel sizes as compared to non-multi-pass architectures. However, such architectures can introduce sequential dependency, where the larger resolution copies of the image to be processed cannot be processed or discarded before the smaller resolution images are processed. This sequential dependency can result in significant power consumption and bandwidth if off-chip RAM is used for storing the multiple copies of the image, and can require significant and costly amounts of on-chip memory if local caching is used instead. The requirement that all smaller resolution images are processed before the larger scale images can also introduce a frame delay—for example a frame delay in a preview stream corresponding to the processed image.
Note that the sequential dependency of such architectures results in the MPIP 140 being unable to process or discard full-scale image 104A until the lower resolution images 104B and 104C have been processed. As discussed above, this may be costly, as the memory required for storing each full-scale image can be quite large. In addition, if off-chip RAM (such as RAM 130) is used for storing the images 102A-102C, then the bandwidth required for storing these images, and then for the MPIP 140 to read them, can be considerable. For example, for ultra-high definition (UHD) resolution images (i.e., having a full-scale resolution of 3840×2160 pixels), approximately 24 MB of data may need to be buffered for electronic image stabilization (EIS), or 16 MB without EIS. For UHD 60 (UHD resolution at 60 frames per second) with EIS, the bandwidth consumed may be several gigabytes per second (GB/s), and the power consumed may be several hundred milliwatts (mW). As resolutions and frame rates continue to increase, this bandwidth and power consumption may become even more problematic.
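A back-of-envelope estimate shows why the bandwidth reaches the GB/s range. The sketch below is illustrative only: it assumes a hypothetical 2 bytes per pixel, a 1:1/1:4/1:16 pyramid, and that every level is written to RAM once and read back once per frame. It does not attempt to reproduce the 16 MB and 24 MB figures above, which depend on pixel format and EIS margin details not given here.

```python
from typing import Sequence


def pyramid_pixels(width: int, height: int, factors: Sequence[int] = (1, 4, 16)) -> int:
    """Pixels in a full-scale frame plus its per-dimension 1:f downscaled copies."""
    return sum((width // f) * (height // f) for f in factors)


def dram_bytes_per_second(
    width: int,
    height: int,
    bytes_per_pixel: float,
    fps: int,
    accesses: int = 2,  # assumption: one write plus one read of each level per frame
    factors: Sequence[int] = (1, 4, 16),
) -> float:
    """Estimated RAM traffic if every pyramid level round-trips through RAM each frame."""
    return pyramid_pixels(width, height, factors) * bytes_per_pixel * accesses * fps


# UHD 60 at an assumed 2 bytes/pixel: about 2.1e9 bytes/s (roughly 2.1 GB/s).
print(dram_bytes_per_second(3840, 2160, 2, 60))
```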
In addition, some multi-pass image processors employ stripe-based processing. In such systems, the full-scale image and the downscaled images are divided into stripes for processing. Such stripe-based processing can allow for cost savings in an MPIP, for example by allowing the MPIP to use smaller line buffers.
However, where MPIP 140 processes each of images 104A-104C of
MPIP 240 may process the full-scale and the downscaled images in an increasing size order (such as described above with respect to
MPIP 340 may then read each of the stored images as an equal number of vertical stripes. For example, with respect to
It would be advantageous for an image processing system to realize both the performance benefits of multi-pass processing and the cost benefits of stripe-based processing, while minimizing or avoiding the sequential dependence of the previously described image processing systems. Accordingly, the example embodiments described herein provide for stripe-based multi-pass image processing systems which allow for an MPIP to process received stripes of image data before all stripes of the captured image have been received by the IFE.
In accordance with the example embodiments, an image processing system may perform both multi-pass and stripe-based image processing, and may reduce or eliminate the sequential dependency of conventional stripe-based multi-pass image processing. The example embodiments may counter that sequential dependency by grouping the raster lines of a received full-scale image and its corresponding downscaled images into sets of horizontal stripes, and rotating each horizontal stripe to generate a set of vertical stripes corresponding to the full-scale horizontal stripe and to each of its corresponding downscaled horizontal stripes. The MPIP may then process sets of corresponding stripes in an increasing order of size, as in
IFE 420 may further generate one or more corresponding sets of downscaled horizontal stripes 402B(1)-402B(3) and 402C(1)-402C(3), where each downscaled horizontal stripe is a downscaled version of one of the full-scale horizontal stripes. For example, the downscaled horizontal stripes 402B(1) and 402C(1) may correspond to full-scale horizontal stripe 402A(1), downscaled horizontal stripes 402B(2) and 402C(2) may correspond to full-scale horizontal stripe 402A(2), and downscaled horizontal stripes 402B(3) and 402C(3) may correspond to full-scale horizontal stripe 402A(3). These corresponding downscaled horizontal stripes may also be stored in system cache 430. In some embodiments, the one or more corresponding sets of downscaled horizontal stripes may comprise a 1:4 resolution set and a 1:16 resolution set of horizontal stripes—for UHD, a full-scale resolution may be 3840×2160 pixels, a 1:4 resolution may be 960×540 pixels, and a 1:16 resolution may be 240×135 pixels. Note that while, in
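The correspondence between a full-scale horizontal stripe and its downscaled counterparts reduces to simple row arithmetic. The hypothetical helper below, a sketch assuming equal-height stripes and per-dimension scale factors, computes the [start, end) rows each stripe covers at each scale. For UHD split into three stripes (as in the 402A(1)-402A(3) example), each full-scale stripe is 720 rows, its 1:4 counterpart 180 rows, and its 1:16 counterpart 45 rows.

```python
from typing import Dict, List, Tuple


def stripe_row_ranges(
    height: int, num_stripes: int, factors: Tuple[int, ...] = (1, 4, 16)
) -> List[Dict[int, Tuple[int, int]]]:
    """For each full-scale horizontal stripe k, the [start, end) rows it covers at each 1:f scale."""
    if height % num_stripes:
        raise ValueError("height must divide evenly into stripes")
    full_h = height // num_stripes
    if any(full_h % f for f in factors):
        raise ValueError("stripe height must be a multiple of every downscale factor")
    return [
        {f: (k * full_h // f, (k + 1) * full_h // f) for f in factors}
        for k in range(num_stripes)
    ]


for k, ranges in enumerate(stripe_row_ranges(2160, 3), start=1):
    print(k, ranges)
```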
After a full-scale horizontal stripe and its corresponding downscaled horizontal stripes are stored, the MPIP 440 may read these stripes in a rotated (e.g., vertical) orientation. In particular, MPIP 440 may read a stored full-scale horizontal stripe as if it were a full-scale “vertical” stripe, and may read the corresponding downscaled horizontal stripes as if they were downscaled vertical stripes. In some other embodiments, the IFE 420 may store the full-scale horizontal stripes 402A(1)-402A(3) and the corresponding downscaled horizontal stripes 402B(1)-402B(3) and 402C(1)-402C(3) in the system cache 430 in a rotated orientation (e.g., storing the horizontal stripes in the vertical orientation). In such embodiments, the MPIP 440 may not need to read the stripes in a rotated orientation.
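Reading a stored horizontal stripe "as if it were" a vertical stripe amounts to an address remapping, so no rotated copy needs to be written. The sketch below assumes a row-major stripe buffer and a clockwise rotation (both assumptions; the description does not fix the rotation direction or memory layout), and `read_rotated` is a hypothetical name. In the alternative embodiment where the IFE writes stripes in the rotated orientation, the same mapping would instead be applied on the write side.

```python
from typing import List, Sequence


def read_rotated(flat: Sequence[float], height: int, width: int) -> List[List[float]]:
    """Read a row-major H x W horizontal stripe as a W x H vertical stripe.

    Element (i, j) of the clockwise-rotated stripe is element (H - 1 - j, i) of the
    stored stripe, i.e. flat[(H - 1 - j) * width + i]. No rotated copy is stored.
    """
    return [
        [flat[(height - 1 - j) * width + i] for j in range(height)]
        for i in range(width)
    ]


# A 2 x 3 stripe [[1, 2, 3], [4, 5, 6]] stored row-major, read as a 3 x 2 vertical stripe.
print(read_rotated([1, 2, 3, 4, 5, 6], height=2, width=3))
```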
Among other benefits, rotating the horizontal stripes before processing may allow each stripe to be processed as it is received by the image processing system 400, rather than only after the full image 401A has been captured by image sensor 410 and received by IFE 420. That is, the MPIP 440 may begin processing individual stripes before all subsequent stripes have been received. For example, with respect to
The example embodiments may reduce the frame latency of conventional multi-pass image processing systems. For example, conventional multi-pass image processing systems incur at least one frame of delay due to their sequential dependence. In contrast, as described above, the present embodiments may reduce this latency by allowing stripes to be processed as they are received, which may reduce preview or display latency by up to a full frame. Improvements in preview/display latency may be important for applications requiring actions to be performed in real time in response to the processed images, such as computer vision or remote vehicle navigation. For example, reduced preview/display latency may be helpful for navigating remote controlled vehicles (for example quadcopters or “drones”), as such navigation often depends on images captured and processed from a vehicle-mounted camera.
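The latency difference can be quantified by counting raster lines. The sketch below uses hypothetical helper names and assumes lines arrive evenly over the frame period (ignoring blanking intervals). With UHD split into three stripes at 60 frames per second, the first stripe can begin processing about 5.6 ms into the frame, whereas a conventional system must wait for the entire frame (about 16.7 ms) before processing can start, in addition to any RAM write and read delay.

```python
from fractions import Fraction
from typing import List


def stripe_ready_lines(height: int, num_stripes: int) -> List[int]:
    """Raster lines that must have arrived before each horizontal stripe can be processed."""
    stripe_h = height // num_stripes
    return [(k + 1) * stripe_h for k in range(num_stripes)]


def lines_to_ms(lines: int, height: int, fps: int) -> Fraction:
    """Convert a line count to readout time, assuming lines arrive evenly over 1/fps seconds."""
    return Fraction(1000, fps) * lines / height


ready = stripe_ready_lines(2160, 3)  # [720, 1440, 2160]
print([float(lines_to_ms(n, 2160, 60)) for n in ready])
```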
In some example embodiments, two or more of the sets of stripes—such as sets 404(1)-404(3) of
After each set of stripes 404(1)-404(3) has been processed, the resulting processed full-scale image may be stored in memory, such as a RAM. For example, with reference to the multi-pass image processing system 500 shown in
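Since the processor may optionally rotate processed images back to the original orientation of the received image, reassembly can be sketched as rotating each processed vertical stripe back and stacking the results top to bottom. This is a hypothetical illustration assuming the stripes were produced by a clockwise rotation; `rotate_ccw` and `reassemble` are invented names, and an identity "processing" step is used so the round trip can be checked.

```python
from typing import List

Stripe = List[List[float]]


def rotate_cw(stripe: Stripe) -> Stripe:
    """H x W horizontal stripe -> W x H vertical stripe (90 degrees clockwise)."""
    return [list(row) for row in zip(*stripe[::-1])]


def rotate_ccw(stripe: Stripe) -> Stripe:
    """Inverse of rotate_cw: W x H vertical stripe -> H x W horizontal stripe."""
    return [list(row) for row in zip(*stripe)][::-1]


def reassemble(processed_vertical_stripes: List[Stripe]) -> Stripe:
    """Rotate each processed stripe back to horizontal and stack them top to bottom."""
    image: Stripe = []
    for stripe in processed_vertical_stripes:
        image.extend(rotate_ccw(stripe))
    return image


# Round trip with identity "processing": split 6 x 4 into three 2-row stripes and back.
image = [[float(r * 4 + c) for c in range(4)] for r in range(6)]
vertical = [rotate_cw(image[r:r + 2]) for r in range(0, 6, 2)]
restored = reassemble(vertical)
```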
Image sensor 610 may include one or more image sensors, such as sensors having color filter arrays (CFAs) arranged on their surfaces, and may be coupled directly or indirectly to processor 620. Image sensor 610 may alternatively include other types of image sensors for capturing images. For example, image sensor 610 may include arrays of solid state sensor elements such as complementary metal-oxide semiconductor (CMOS) sensor elements, or other appropriate image sensor devices.
Memory 640 may include a non-transitory computer-readable medium (e.g., one or more nonvolatile memory elements, such as EPROM, EEPROM, Flash memory, a hard drive, and so on) that may store at least the following software (SW) modules:
Processor 620 may be any suitable one or more processors capable of executing scripts or instructions of one or more software programs stored in device 600 (e.g., within memory 640). Further, processor 620 may include one or more stages of an image processing pipeline. For example, processor 620 may execute the stripe reception software module 641 to receive raster lines of image data from the image sensor 610, and to group the received raster lines into stripes of image data. Processor 620 may also execute the downscaled stripe generation software module 642 to generate one or more downscaled stripes of image data corresponding to each full-scale stripe of image data. Processor 620 may further execute the stripe rotation software module 643 to read horizontal stripes of image data in a rotated orientation, and (optionally) rotate processed full-scale images to match an original orientation of a received image. Processor 620 may further execute the stripe processing software module 644 to process rotated stripes of full-scale and downscaled image data.
A plurality of raster lines may be sequentially received, where the plurality of raster lines corresponds to an image (710). The plurality of raster lines may be grouped into a plurality of full-scale horizontal stripes of image data (720). For example, the plurality of raster lines may be received from image sensor 410 of
After generating the full-scale rotated stripe and the downscaled rotated stripe of image data, the full-scale rotated stripe and the downscaled rotated stripe may be processed before all subsequent raster lines of the image have been received (734). For example, the full-scale rotated stripe and downscaled rotated stripe may be processed by MPIP 440 of
Those of skill in the art will appreciate that information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
Further, those of skill in the art will appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the aspects disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosure.
The methods, sequences or algorithms described in connection with the aspects disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor.
In the foregoing specification, the example embodiments have been described with reference to specific example embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader scope of the disclosure as set forth in the appended claims. The specification and drawings are, accordingly, to be regarded in an illustrative sense rather than a restrictive sense.