This invention relates to an instruction based parallel median filtering processor and method.
Median filtering is a non-linear signal enhancement technique for the smoothing of signals, the suppression of impulse noise, and the preservation of edges. It consists of sliding a window of an odd number of elements along the signal and replacing the center sample by the median of the samples in the window. The median value m of the samples in a window is the value for which half of the samples in the window have values smaller than m and the other half have values greater than m. In a one dimensional median filter having three samples P1, P2, P3, the median value is found by sorting the three samples and selecting the mid point as the median. In the straightforward approach P2 is compared to P3 in the first stage; the minimum of that comparison is compared to P1 in the second stage, and the minimum of the second stage is PMIN. In the third stage the maximum output of the second stage is compared to the maximum of the first stage. The maximum output of the third stage is PMAX and the minimum output of the third stage is PMED. One shortcoming of this approach is that the three stages operate sequentially; it requires three cycles of operation to obtain the median. Another problem is that each sort operation (finding the min and max between two samples) is dependent on the result of the previous one, which in a deeply pipelined machine would cause a pipeline stall: the pipeline will stop, waiting for the offending instruction to finish, before resuming work. A fully parallel solution that mitigates the multiple sequential operation problem uses a dedicated ASIC, which, however, embodies additional limited functionality hardware which permanently accompanies the DSP even though it may be only occasionally needed.
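By way of illustration only, the straightforward three stage approach described above may be sketched in software as follows. The function names sort2, median3_sequential and median_filter3 are hypothetical and do not appear in this disclosure, and the choice to pass the two end samples through unchanged is an assumption made only so that the sketch is complete. Each stage consumes the result of the stage before it, which is the sequential dependency noted above.

```python
def sort2(a, b):
    """One sort operation: return (min, max) of two samples."""
    return (a, b) if a <= b else (b, a)


def median3_sequential(p1, p2, p3):
    """Three dependent stages, as in the straightforward approach.

    Returns (PMIN, PMED, PMAX).
    """
    lo1, hi1 = sort2(p2, p3)      # stage 1: P2 compared to P3
    pmin, hi2 = sort2(p1, lo1)    # stage 2: P1 compared to the stage 1 minimum
    pmed, pmax = sort2(hi2, hi1)  # stage 3: stage 2 maximum vs. stage 1 maximum
    return pmin, pmed, pmax


def median_filter3(signal):
    """Slide a three sample window along the signal.

    Assumption: the first and last samples are copied through unchanged,
    since the text does not specify edge handling.
    """
    out = list(signal)
    for i in range(1, len(signal) - 1):
        out[i] = median3_sequential(signal[i - 1], signal[i], signal[i + 1])[1]
    return out
```

For example, median_filter3([1, 1, 9, 1, 1, 5, 5, 5]) returns [1, 1, 1, 1, 1, 5, 5, 5]: the isolated impulse 9 is suppressed while the step from 1 to 5 is preserved.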
Attempts to apply a parallel solution within a DSP that is optimized for multiply-accumulate actions, as occur in FIR and FFT operations, have not been pursued because, in a typical DSP where median filters are used, the compute-unit result bus has only half the width of the input bus: in the multiplication of two N bit numbers, the result stored to memory is one number of N bits. In median filters, however, the three, five . . . inputs are merely sorted and result in the same number of outputs.
It is therefore an object of this invention to provide an improved instruction based parallel median filtering processor and method.
It is a further object of this invention to provide such an improved instruction based parallel median filtering processor and method which is faster than conventional median filters and requires no additional ASIC or FPGA.
It is a further object of this invention to provide such an improved instruction based parallel median filtering processor and method which is compatible with conventional two input, one output compute-unit bus structures.
It is a further object of this invention to provide such an improved instruction based parallel median filtering processor and method which decomposes the three tap median filters into two parallel independent instructions.
It is a further object of this invention to provide such an improved instruction based parallel median filtering processor and method which removes pipeline dependency between the decomposed instructions.
It is a further object of this invention to provide such an improved instruction based parallel median filtering processor and method which reduces the processor die area by avoiding the limited functionality hardware block required for parallel median filtering.
It is a further object of this invention to provide such an improved instruction based parallel median filtering processor and method which can employ the existing hardware components of a traditional processor.
The invention results from the realization that improved instruction based median filtering which is faster than conventional median filters, requires no additional limited functionality ASIC or FPGA, is pipeline independent and is compatible with two input, one output compute-unit bus structures can be achieved by sorting in parallel each combination of pairs of inputs into greater and lesser members, determining from that sorting the minimum, maximum and median filter values of the inputs and applying pipeline independent decomposed instructions to enable the decision circuit to indicate at least one of the maximum, minimum and median filter values in response to one instruction and the others of those values in response to another instruction.
The subject invention, however, in other embodiments, need not achieve all these objectives and the claims hereof should not be limited to structures or methods capable of achieving these objectives.
This invention features a processor with instruction based parallel median filtering including a compute unit for receiving a plurality of inputs and including a comparing circuit for sorting in parallel each combination of pairs of inputs into greater and lesser members and a decision circuit responsive to the sorting of the pairs of inputs to determine the minimum, maximum and median filter values of the inputs. A program sequencer provides an instruction for enabling the decision circuit to indicate at least one of the maximum, minimum and median filter values.
In a preferred embodiment the comparing circuit may include a comparator circuit for comparing each pair of the inputs. Each comparator circuit may include a subtractor circuit for subtracting each pair of inputs. The greater and lesser members of each pair may be indicated by the sign of their difference. The decision circuit may include a logic circuit responsive to the pattern of signs of the differences to indicate the median filter value. The decision circuit may include a logic circuit responsive to the pattern of signs of the differences to indicate the maximum, minimum and median filter values. The program sequencer may provide one instruction for enabling the decision circuit to indicate one of the maximum, minimum and median filter values and another instruction to indicate the others of those values. There may be three inputs.
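By way of illustration only, the sign based comparing and decision steps for three inputs may be modelled in software as follows. This is a sketch under stated assumptions and not the circuit of the drawings: the names SORT_PATTERNS, compare_all_pairs and decide are hypothetical, the table below was derived for this sketch rather than copied from any figure, and a zero difference is assumed to be treated as non-negative. The three subtractions do not depend on one another, so they may be evaluated in parallel, and the decision is a single lookup on the resulting pattern of signs.

```python
# Illustrative sign pattern table (an assumption, not the table of the drawings).
# Key:   (P1-P2 negative, P1-P3 negative, P2-P3 negative)
# Value: indices (0 for P1, 1 for P2, 2 for P3) of the min, median and max.
# With a zero difference treated as non-negative, the two remaining patterns
# (True, False, True) and (False, True, False) cannot occur.
SORT_PATTERNS = {
    (True,  True,  True):  (0, 1, 2),
    (True,  True,  False): (0, 2, 1),
    (False, True,  True):  (1, 0, 2),
    (True,  False, False): (2, 0, 1),
    (False, False, True):  (1, 2, 0),
    (False, False, False): (2, 1, 0),
}


def compare_all_pairs(p1, p2, p3):
    """Comparing step: subtract every pair and keep only the signs."""
    d12, d13, d23 = p1 - p2, p1 - p3, p2 - p3  # three independent subtractions
    return (d12 < 0, d13 < 0, d23 < 0)


def decide(p1, p2, p3):
    """Decision step: map the sign pattern to (min, median, max)."""
    samples = (p1, p2, p3)
    i_min, i_med, i_max = SORT_PATTERNS[compare_all_pairs(p1, p2, p3)]
    return samples[i_min], samples[i_med], samples[i_max]
```

In this sketch no comparison waits on the outcome of another, in contrast to a staged sort in which each stage consumes the result of the previous one.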
The invention also features a method of instruction based parallel median filtering in a compute unit of a processor including sorting in parallel each combination of pairs of inputs into greater and lesser values and determining from that sorting the minimum, maximum and median filter values of the inputs. There is an applied instruction for indication of at least one of the maximum, minimum and median filter values.
In a preferred embodiment there may be applied decomposed instructions for enabling indication of at least one of the maximum, minimum and median filter values in response to one instruction and the others of those values in response to another instruction. There may be three inputs.
Other objects, features and advantages will occur to those skilled in the art from the following description of a preferred embodiment and the accompanying drawings, in which:
Aside from the preferred embodiment or embodiments disclosed below, this invention is capable of other embodiments and of being practiced or being carried out in various ways. Thus, it is to be understood that the invention is not limited in its application to the details of construction and the arrangements of components set forth in the following description or illustrated in the drawings. If only one embodiment is described herein, the claims hereof are not to be limited to that embodiment. Moreover, the claims hereof are not to be read restrictively unless there is clear and convincing evidence manifesting a certain exclusion, restriction, or disclaimer.
There is shown in
Conventional median filters, such as, median filter 30,
In accordance with this invention it is understood that with a fixed number of inputs, for example, three, there will be a predictable number of sort patterns, each one representing a different sort pattern of inputs, P1, P2, and P3 occupying the positions of Min, Med, and Max. This can be shown in the truth table of
An application of the realization according to this invention is shown in
A second problem can be addressed at the cost of only one more cycle by decomposing the instructions which operate compute unit 50. This problem arises from the fact that most processors' compute units generally have a result bus which is only half the size of the input bus. Typically, for example, the input bus would accommodate two 16 bit numbers for multiplication resulting in one 16 bit product. Here, however, three inputs of whatever size, 4 bits, 8 bits, 16 bits . . . are sorted and result in three similar outputs. To solve this problem, this invention decomposes the median filter instructions into two pipeline independent instructions.
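By way of illustration only, the decomposition may be sketched in software as two functions, each of which reads the same three inputs and returns fewer values than it receives. The names med3 and minmax3 are hypothetical and are not the instruction mnemonics of this disclosure; the split of the median into one instruction and the minimum and maximum into the other is one possible assignment assumed for the sketch, and the mapping of the returned values onto a physical result bus is not modelled. Neither function consumes the result of the other, so the two may be issued in either order without a dependency between them.

```python
def _signs(p1, p2, p3):
    """Signs of the three pairwise differences (True means negative)."""
    return (p1 - p2 < 0, p1 - p3 < 0, p2 - p3 < 0)


def med3(p1, p2, p3):
    """Hypothetical instruction 1: indicate only the median value."""
    s12, s13, s23 = _signs(p1, p2, p3)
    if s12 != s13:          # P1 lies between P2 and P3
        return p1
    return p2 if s12 == s23 else p3


def minmax3(p1, p2, p3):
    """Hypothetical instruction 2: indicate the minimum and the maximum."""
    s12, s13, s23 = _signs(p1, p2, p3)
    if s12 and s13:
        lo = p1
    elif not s12 and s23:
        lo = p2
    else:
        lo = p3
    if not s12 and not s13:
        hi = p1
    elif s12 and not s23:
        hi = p2
    else:
        hi = p3
    return lo, hi
```

A caller needing only the filtered sample would invoke med3 alone; a caller needing all three sorted values would invoke both, at the cost of the one additional cycle mentioned above.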
This is shown graphically in
Although thus far in
In keeping with this invention the median filters can be implemented, as explained previously, in the compute unit of a processor. Such a processor is shown in
The third problem of pipeline dependency can be addressed by decomposing the median filter instructions into two parallel pipeline independent instructions. In pipelined operations, when there is no dependency between the result of a previous instruction and the subsequent one across all processor parallel building blocks, the pipeline efficiencies are preserved. However, if there is such a dependency a pipeline stall can happen, where the pipeline will stop and wait for the offending instruction to finish before resuming work. Although the processor here is generally described as a digital signal processor, this is not a necessary limitation, as a controller, a MIPS, an ARM or any other suitable processor would be usable. The decomposed instructions for operating through the program sequencer 118 according to this invention are reproduced below:
The invention is not limited to the particular hardware shown or suggested but also encompasses a method carried out in a processor,
Although specific features of the invention are shown in some drawings and not in others, this is for convenience only as each feature may be combined with any or all of the other features in accordance with the invention. The words “including”, “comprising”, “having”, and “with” as used herein are to be interpreted broadly and comprehensively and are not limited to any physical interconnection. Moreover, any embodiments disclosed in the subject application are not to be taken as the only possible embodiments.
In addition, any amendment presented during the prosecution of the patent application for this patent is not a disclaimer of any claim element presented in the application as filed: those skilled in the art cannot reasonably be expected to draft a claim that would literally encompass all possible equivalents, many equivalents will be unforeseeable at the time of the amendment and are beyond a fair interpretation of what is to be surrendered (if anything), the rationale underlying the amendment may bear no more than a tangential relation to many equivalents, and/or there are many other reasons the applicant cannot be expected to describe certain insubstantial substitutes for any claim element amended.
Other embodiments will occur to those skilled in the art and are within the following claims.