1. Field of the Invention
This invention relates to a picture processing method and apparatus in which, in a device handling picture data, such as a computer handling picture data, an electronic still camera, a recording device, an editing equipment or a display equipment, high resolution picture data can be outputted from input low-resolution picture data.
2. Description of Related Art
As computers, digital cameras and networks become more popular, it is becoming a frequent practice to deform or correct picture data, captured on a computer, with picture data processing software. In particular, picture enlargement and contraction is practiced routinely. Above all, in enlarging a picture, the method for enlargement, that is the method for interpolation, is at issue, since the picture quality of the enlarged picture differs depending upon the enlargement method used. Representative interpolation methods are the nearest neighbor, linear interpolation and cubic convolution methods.
The nearest neighbor method is equivalent to an order-zero holding, with the number of pixels increasing with picture enlargement. The value of pixels lying physically closest to a pixel present from the outset is directly used as the value of added pixels.
The linear interpolation method uses the values of the pixels present from the outset on the upper and lower sides and on the left and right sides of an added pixel, averaged with weights determined by distance, with nearer pixels being weighted more heavily.
The cubic convolution method uses linearly filtered values of distant pixels as the values of the added pixels.
The above methods may be used in combination with a method of enhancing edges after interpolation to improve subjective picture quality.
The above-described interpolation methods suffer from a defect that, although the number of pixels can be increased, the spatial resolution cannot be improved beyond that of the original picture, and that, since the interpolation is not that by an ideal filter, aliasing tends to be produced.
The nearest neighbor method has a defect that a picture becomes blocked, due in particular to aliasing, such that an oblique line in a picture is rendered as a stepped line.
The cubic convolution method is affected to a lesser extent by aliasing since the interpolation is close to that by an ideal filter. However, since the spatial resolution is not changed from that of the original picture, the picture gives an impression of a subjectively blurred picture.
The linear interpolation is a method compromised between the two methods, such that a picture produced is also compromised between the pictures obtained with these methods. That is, the produced picture gives a blurred impression and also suffers from blocked distortion.
These inconveniences are particularly objectionable with a higher enlargement ratio, such that an enlarged picture appears to be non-optimum when seen at a shorter distance.
For improving the blurred impression, edge enhancement may be used in combination. The most routine method for edge enhancement is to find a waveform of an order-two differentiation and to add a moderate amount of the differentiated waveform to the original waveform. Although the blurred impression is thereby improved, pre-shooting or over-shooting is likely to be produced.
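The overshoot produced by this kind of edge enhancement can be seen in a small one-dimensional sketch. The function name `enhance_edges`, the `amount` value and the sample ramp are illustrative assumptions, not taken from the text; the second difference is subtracted here so that the edge is steepened.

```python
def enhance_edges(signal, amount=0.5):
    """Sharpen a 1-D signal using its second difference (end samples are copied)."""
    out = list(signal)
    for i in range(1, len(signal) - 1):
        # Discrete order-two differentiation at sample i.
        second_diff = signal[i - 1] - 2 * signal[i] + signal[i + 1]
        # Subtracting a moderate amount of it steepens the edge.
        out[i] = signal[i] - amount * second_diff
    return out


# Hypothetical blurred edge rising from 0 to 100.
ramp = [0, 0, 0, 50, 100, 100, 100]
sharpened = enhance_edges(ramp)
```

In this sketch the sharpened signal dips below 0 just before the edge and rises above 100 just after it, which corresponds to the pre-shooting and over-shooting mentioned above.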
Recently, proposal has been made of a method of converting standard television signals into HD signals of the high-vision television grade. This technique is not simple interpolation. Specifically, HD and SD signals, previously prepared from the same source, are used as training data, and a database is produced with the HD and SD signals associated with each other. When the SD signals are inputted, the data pace is lowered by way of performing non-linear processing to output HD signals. However, this method is limited to the case of doubling the multiplication in the vertical and horizontal directions, such that the method is difficult to apply for higher multiplication.
Thus, as a method for suppressing blocked distortion even in enlargement by a larger factor and for producing as clear a picture as possible, a method known as MAP (maximum a posteriori) estimation has been proposed (IEEE Transactions on Image Processing, vol. 3, no. 3, pp. 233 to 242, 1994). This method uses a picture produced by the nearest neighbor method as a starting picture to produce a target picture. Specifically, the values of respective pixels are updated to approach a natural picture on the assumption that a natural picture is smooth everywhere. This processing is repeated several times. The degree of non-smoothness of an entire picture, termed smoothness, is used as an energy, and the picture is updated using the steepest descent method so that this energy is decreased. With this method, the blocked distortion is not perceived, and the produced picture is clearer than a picture obtained with the cubic convolution method. However, this method has a drawback that the processing is slow because of the use of the steepest descent method.
In the MAP method, it is crucial how the energy function representing the above energy is determined. Representative of such energy functions is the Huber function, which is proportional to the square of the smoothness when the energy is small, that is when the picture in its entirety is smooth, and proportional to the smoothness when the energy is large, that is when the picture in its entirety is not smooth. The reason for using this form of the function is that, since sharp edges are inherently contained in a picture, the picture must be prevented from becoming excessively smooth so that the edges are protected. That is, a state of high smoothness (non-smoothness) must not be made a state of excessively high energy, since repetitive operation would then lower the energy and lead to excessive smoothness.
However, with the Huber function, it is necessary to set a parameter of setting a switching point between the power of two and the power of one. An optimum value of this parameter differs from picture to picture and hence it is not advisable to set this value unequivocally.
The known MAP method is applied to enlargement by an integer factor. Therefore, if enlargement by an arbitrary factor is desired, the MAP method can be combined with other methods. The other methods are known and hence are not explained here; only enlargement by an integer factor is explained.
First, enlargement of an input picture by a factor equal to q in the vertical and horizontal directions is considered. If a low resolution input picture of M×N pixels is Y, a high resolution output picture is X, a decimated matrix with a vertical to horizontal ratio equal to 1/q is T and the white Gaussian noise is n, the relationship of the equation 1:
Y=T×X+n (1)
holds, where T may be represented by the following equation (2):
Although it is desirable to find a high resolution output X from the equation (2), there are infinitely many X satisfying it, such that X cannot be found algebraically. Therefore, the following suppositions are made:
The supposition 2 postulates that the smaller the smoothness, the higher is the probability, that is that a natural picture is approximately smooth. Therefore, this supposition may also be said to be reasonable.
On the other hand, if the noise added in producing an input picture is thought to be a Gaussian noise, the probability of the noise n may be represented by the following equation (4):
where σ is the standard deviation.
From the above model, an input is assumed to be a low resolution picture Y, and an ideal high resolution picture X^ satisfying this supposition is found. From the above supposition, X^ maximizes Pr(X|Y). In general, Pr(X|Y) is represented by the equation (5):
Pr(X|Y)=Pr(Y|X)×Pr(X)/Pr(Y) (5)
That is, for finding X^, it suffices to maximize the equation (6). It is however obvious from the equations (6), (3) and (4) that it is sufficient if Σv∈V Sv^(X) is minimized, as may be seen from the following equation (7):
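The step from the equation (5) to an energy minimization can be written out as follows. This is a sketch under the Gaussian noise model stated above; the prior is written generically as $\Pr(X) \propto \exp(-E_s(X))$, where $E_s$ is a stand-in symbol for the smoothness term and is an assumption of this sketch, not the notation of the equations (3), (6) and (7).

$$
\hat{X} = \arg\max_X \Pr(X \mid Y) = \arg\max_X \Pr(Y \mid X)\,\Pr(X) = \arg\min_X \bigl[-\log \Pr(Y \mid X) - \log \Pr(X)\bigr]
$$

since $\Pr(Y)$ does not depend on $X$. With $n = Y - TX$ being white Gaussian noise of standard deviation $\sigma$,

$$
-\log \Pr(Y \mid X) = \frac{1}{2\sigma^2}\,\lVert Y - TX \rVert^2 + \text{const},
\qquad\text{so}\qquad
\hat{X} = \arg\min_X \Bigl[ E_s(X) + \frac{1}{2\sigma^2}\,\lVert Y - TX \rVert^2 \Bigr].
$$

Under these assumptions, maximizing the posterior probability is the same as minimizing a smoothness term plus a weighted squared difference between the input picture and the decimated estimate.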
Now, the function Sv(X), representing the smoothness, is defined. Since Sv(X) is the local smoothness of a picture, the less smooth a picture, the larger must be the magnitude of the function. As a function that meets this condition, 3 vertically consecutive by 3 horizontally consecutive pixels as shown in
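where hk(X) is an order-two FIR filter represented by the following equation (9):
$$
\begin{aligned}
h_0(X) &= X_{i+1,j} - 2X_{i,j} + X_{i-1,j} \\
h_1(X) &= X_{i+1,j-1} - 2X_{i,j} + X_{i-1,j+1} \\
h_2(X) &= X_{i,j-1} - 2X_{i,j} + X_{i,j+1} \\
h_3(X) &= X_{i-1,j-1} - 2X_{i,j} + X_{i+1,j+1}
\end{aligned}
\tag{9}
$$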
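The four filters of the equation (9) are second differences taken through the center pixel along the vertical, the two diagonal and the horizontal directions. A minimal sketch, assuming the picture is held as a list of rows and that (i, j) is an interior pixel; the function name `second_differences` is hypothetical, and h1 is taken along the anti-diagonal, matching the form of h′1 in the equation (15):

```python
def second_differences(X, i, j):
    """Return [h0, h1, h2, h3] of equation (9) at interior pixel (i, j)."""
    center = 2 * X[i][j]
    h0 = X[i + 1][j] - center + X[i - 1][j]          # along i
    h1 = X[i + 1][j - 1] - center + X[i - 1][j + 1]  # anti-diagonal
    h2 = X[i][j - 1] - center + X[i][j + 1]          # along j
    h3 = X[i - 1][j - 1] - center + X[i + 1][j + 1]  # main diagonal
    return [h0, h1, h2, h3]


# A planar (perfectly smooth) patch and a patch with an isolated spike.
plane = [[3 * i + 2 * j for j in range(3)] for i in range(3)]
spike = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
smooth_response = second_differences(plane, 1, 1)
spike_response = second_differences(spike, 1, 1)
```

All four responses vanish on the planar patch and are large in magnitude on the spike, which is the behavior required of a local measure of non-smoothness.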
If Sv^ (X) is defined in this manner, it is a scalar quantity contrary to a picture X which is a vector quantity.
Therefore, the equation 7 may be taken as an energy function determined by X. Next, the equation 7, as the energy function, is minimized. To this end, the well-known steepest descent method may be used. Suppose that, by m'th calculations, all pixel values are updated as shown by the following equation 10:
$$
X_{m+1} = X_m - \alpha_m D_m \tag{10}
$$
In the steepest descent method, since Dm is the gradient of the energy function, shown by the equation 11:
this gradient must be found.
On the other hand, when the right side of the equation 10 is set as a uni-variable function z(αm) having αm as a variable, αm is determined by finding the value of αm which minimizes
This processing is repeated until
is substantially unchanged to find targeted X^.
Meanwhile, if this processing is repeated, the entire picture becomes smooth and approaches a natural picture. However, if a picture has abundant edges, these edges are also smoothed, giving the impression of a blurred picture as an undesirable secondary effect. Therefore, the Huber function shown by the following equation (12):
is used.
By so doing, the gradient in case of a large magnitude of Sv(X) becomes smaller than in the equation 7, so that the degree of decrease of the value of the equation 7 by the steepest descent method is decreased. That is, the smoothness (degree of non-smoothness) is retained to prevent the edges in an edgy portion from becoming blurred to more than a necessary extent.
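The effect described above can be illustrated with the commonly used form of the Huber function. This is a sketch under an assumption: the function below (quadratic up to a threshold T, linear beyond it, with value and slope matched at the switching point) is the textbook form and is not copied from the equation (12); the names `huber` and `huber_slope` are hypothetical.

```python
def huber(s, T):
    """Quadratic for |s| <= T, linear beyond T, continuous at |s| == T."""
    a = abs(s)
    if a <= T:
        return s * s
    return 2 * T * a - T * T


def huber_slope(s, T):
    """Derivative of huber(): grows like 2*s, then saturates at +/- 2*T."""
    if abs(s) <= T:
        return 2 * s
    return 2 * T if s > 0 else -2 * T


# With T = 2, a large smoothness value of 5 has slope 4 instead of 10.
small_value = huber(1.0, 2.0)
large_value = huber(5.0, 2.0)
large_slope = huber_slope(5.0, 2.0)
```

Because the slope saturates for large arguments, strongly non-smooth regions such as edges are pulled toward smoothness less strongly than they would be by a purely quadratic energy.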
The foregoing is the explanation of the high definition technique by MAP so far known in the art.
So, in the MAP method, it is crucial how the energy function representing the above-mentioned energy is to be determined. To this end, the Huber function is preferentially used. This function is set so that it is proportional to the square of smoothness when the energy is small, that is when the picture in its entirety is smooth, while it is set so that it is proportional to smoothness when the energy is large, that is when the picture in its entirety is not smooth. The reason of doing this is that, since a picture inherently contains edges, excess smoothing needs to be prevented to protect the edges. Specifically, it is necessary to prevent the high smoothness state, that is the non-smooth state, from being converted to an excessively high energy state to prevent the energy from being lowered due to repetitive processing to bring about excess smoothness.
With the conventional MAP method, the picture energy is expressed by the Huber function, this energy being decreased by the steepest descent method. However, the calculations for finding α by the steepest descent method are expensive. Moreover, it is necessary with the Huber function to define a parameter T determining a switching point between a power of 2 and a power of 1. The optimum value of this parameter changes depending on a particular portion of a picture, and hence it cannot be said to be optimum to set this value uniquely.
It is therefore an object of the present invention to decrease the amount of calculations, without deteriorating the picture quality in comparison with that in the conventional method, in case picture enlargement processing is to be performed in accordance with the MAP method, in order to avoid blocked distortion and in order to obtain a clear picture.
It is another object of the present invention to provide a method for edge protection without employing the Huber function which tends to deteriorate the picture quality when the parameter values are not appropriate.
Now, α is considered. This value α is a parameter used for decreasing the energy function. By properly determining this value depending on a particular picture, the time until the state of convergence is reached, that is the number of times of repetition, can be decreased. On the other hand, an optimum value of α is affected by the definition of the energy function, as discussed above, such that, if the definition of the energy function is changed, the optimum value of α differs, even if the processing is that for the same picture. Therefore, in the MAP method, an optimum value of α needs to be found from one repetitive processing to another. Since the processing volume for calculating the value α is large, it is advisable to render this processing efficient in order to decrease the overall processing volume.
Meanwhile, our experiments by the conventional MAP method with the use of various pictures have revealed that, if the energy function is defined as in the equations 7 to 9, the number of times of repetition until convergence, where no visual changes are noticed, is on the order of three. This means that, since the gradation resolution of a human eye is 8 bits, that is 256 gradations, at most, an optimum picture is approached by three MAP processing operations, while subsequent changes from this point are minute.
It has also been seen that, in this repetitive processing from the first to the third operations, the value ranges assumed by α are comprised in a pre-set constant range.
Thus, a mean value of the values assumed by α in each repetitive operation for the above-mentioned various pictures is found. A mean value of α in an i'th processing is set as αave i. In actual processing, the calculation of α is not performed for the picture being processed, and the value αave i is used instead.
With this method, it may be feared that, since the value of α is not optimum, convergence becomes slower or is not achieved, such that the energy oscillates about the minimum energy state.
However, since αave i is inherently close to an optimum value, it shows the converging process comparable to that in using an optimum value. If the converging rate is slow, the energy value is equivalent to that when the optimum α value is used, on the condition that the convergence occurs to the minimum energy state, such that the output picture quality is not vitally different from that with the use of the optimum α value. If oscillations occur ultimately, the energy is already close to the minimum value, the output picture quality is within the range of the gradation resolution of the human eye and hence is not problematical. So, the problem is solely the calculating cost, so that it is advisable to compare the cost in the processing for calculating α and in memory consumption to that incurred due to the increased number of repetitions to use a method which is more favorable in cost.
Our experiments have revealed that, if αave i is used, the number of repetitions until convergence is on the order of three, which is not changed from the number in case of calculating an optimum α value. This may be ascribable to the fact that αave i is a value close to the optimum value of α and that the standard of verifying the convergence is the visual properties of the order of 8 bits as discussed above. It is therefore apparent that the processing volume is smaller with the use of αave i, it being unnecessary to make cost comparison as discussed above. This, however, does not apply if the definition of the energy function is changed, in which case it may occur that the number of repetitions until convergence becomes larger than that in case of calculating α. In such case, cost comparison is required, as explained previously. However, since the rate of convergence is not vitally changed even though the processing volume in calculating α is large, it is in general more preferred to use αave i.
Moreover, considering that αave i is not changed significantly from one repetitive processing to another, an experiment was conducted using the same value of αave in each repetitive processing, without providing αave i consistent with i. The results of this experiment revealed that the number of times of repetition in this case was also on the order of three. It was thus found that αave may be used from one repetitive processing to another since it is more advantageous in processing cost than if an optimum value of α is calculated, while being slightly more advantageous than if αave i is used.
In this consideration, we have decided to define the energy function first and to find, in advance, by experiments on a large number of data, an average value of α for the respective repetitive operations, or an average of the values over the entire processing operations, and to use this value for actual processing. By so doing, it is possible to omit the voluminous processing involved in calculating α from one actual repetitive processing operation to another.
We have also decided not to give judgment on the convergence conditions. Inherently, the value of the energy function is calculated from one repetitive processing to another, a difference is found from the result of the previous processing and the processing is terminated when the value is less than a pre-set value. However, if the energy function is defined as in the equations 7 to 9, convergence is achieved with three operations irrespective of the picture. If the energy function is defined otherwise, it is possible to discontinue the processing at a number of times of repetition beyond which discrimination is not possible, due to the human visual properties, as discussed above. The number of times until discrimination is impossible is previously determined, at the time point αave is determined, by processing various pictures. This renders it possible to omit the calculations required in convergence judgment.
Meanwhile, the value of α, previously found, may also be a median value, instead of being an average value αave.
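The advance determination of α can be sketched as follows. The recorded step sizes below are made-up placeholder numbers chosen only to make the arithmetic easy to follow, not experimental results, and the function name `summarize_step_sizes` is hypothetical. Each inner list is assumed to hold the optimum α found in the first, second and third repetition for one training picture.

```python
from statistics import mean, median


def summarize_step_sizes(alpha_runs):
    """Return (per-repetition means, overall mean, overall median) of alpha."""
    # zip(*runs) groups the values belonging to the same repetition index.
    per_repetition = [mean(list(column)) for column in zip(*alpha_runs)]
    all_values = [value for run in alpha_runs for value in run]
    return per_repetition, mean(all_values), median(all_values)


# Placeholder values for three hypothetical training pictures.
alpha_runs = [
    [0.5, 0.25, 0.25],
    [1.0, 0.5, 0.25],
    [0.75, 0.75, 0.25],
]
alpha_ave_i, alpha_ave, alpha_median = summarize_step_sizes(alpha_runs)
```

The per-repetition means correspond to αave i, the overall mean to a single αave used in every repetition, and the median to the alternative mentioned above.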
Also, in the Huber function, used for preventing the edge from becoming excessively smooth, it is difficult to determine the parameter T appropriately.
Therefore, smoothness S′v(X) is re-defined as in the equation 14:
where h′k(X) is a dynamic value defined in the equation (15):
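$$
\begin{aligned}
h'_0(X) &= bX_{i+1,j} + aX_{i,j} + bX_{i-1,j} \\
h'_1(X) &= bX_{i+1,j-1} + aX_{i,j} + bX_{i-1,j+1} \\
h'_2(X) &= bX_{i,j-1} + aX_{i,j} + bX_{i,j+1} \\
h'_3(X) &= bX_{i-1,j-1} + aX_{i,j} + bX_{i+1,j+1}
\end{aligned}
\tag{15}
$$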
in which a and b are changed with local picture values, that is a value determined by the equation (16):
In this case, since filter coefficients are changed with the local picture values, updating effects are changed depending on particular picture portions. With a large absolute value of Sv(X), that is if edge components are contained locally in large quantities, the values of S′v(X) or ∇S′v(X) are not excessively increased to prevent edges from becoming excessively smoothed by updating. Conversely, should local edge components be small in quantities, with the picture being smooth, the values of S′v(X) or ∇S′v(X) are not so small so that the effect of being smoothed on updating is not excessively small. That is, optimum results are achieved in keeping with features of particular picture portions.
As discussed above, the effect of edge protection is by the equation 14, while the equation 13 employing the Huber function is not used. This eliminates the necessity of considering the parameter T.
Thus, according to the present invention, the value of α is fixed and picture processing is performed using a dynamic filter instead of using the Huber function. This processing is repeated a pre-set number of times.
According to the present invention, an energy function of a picture is defined in advance and stored. An input picture is enlarged, that is the number of pixels is increased, and a gradient value of the energy function in the pixel of the enlarged picture is calculated. The product of the gradient value of the energy function with a value not dependent on the input picture is added to the pixel to update the pixel value to adjust the picture quality to raise the resolution. The pixel value is updated a number of times to adjust the picture quality to raise the resolution.
Moreover, according to the present invention, an energy function of a picture varied in dependence upon the input picture is defined in advance and stored. An input picture is enlarged, that is the number of pixels is increased, and a value which decreases the energy in the pixel of the enlarged picture is calculated. This energy decreasing value is added to the pixel and the pixel value is updated to adjust the picture quality to raise the resolution. The pixel value is updated a number of times to adjust the picture quality to raise the resolution.
According to the present invention, the picture obtained may be a clear high resolution picture without the picture suffering from the blocked distortion or becoming blurred due to insufficient spatial resolution. Moreover, the processing volume or the storage capacity needed in calculating the parameters can be diminished. By employing a dynamic filter, edge protection may be realized without using the Huber function in which picture quality deterioration is liable to be produced because of the difficulty encountered in optimizing the parameter T.
Referring to the drawings, preferred embodiments of the present invention will be explained in detail.
The present invention is applied to a picture processing apparatus 10 configured as shown in
This picture processing apparatus 10 processes low resolution picture data, inputted via an input interface 14 by a central processing unit (CPU) 12 to generate high resolution picture data which is outputted via an output interface 16. The picture processing apparatus 10 is made up of a CPU 12, connected to an internal bus 11, a memory 13, an input interface 14, a user interface 15 and an output interface 16.
The algorithm of picture processing by the CPU 12 in the picture processing apparatus 10 is explained by a flowchart shown in
First, at step S11, M×N pixel low resolution picture is inputted over the input interface 14. The input picture is a picture photographed by a CCD or stored in a hard disc, without regard to the picture furnishing source.
At step S12, the picture is enlarged by zero-order hold by a factor of q in both the vertical and horizontal directions. That is, the same pixel values are repeated for the vertical q by horizontal q pixels.
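The zero-order hold enlargement of step S12 can be sketched as below, assuming the picture is held as a list of rows; the function name `zero_order_hold` is hypothetical.

```python
def zero_order_hold(picture, q):
    """Enlarge a picture (list of rows) by an integer factor q in both directions."""
    enlarged = []
    for row in picture:
        # Repeat every pixel q times horizontally...
        wide_row = [value for value in row for _ in range(q)]
        # ...and repeat the widened row q times vertically.
        for _ in range(q):
            enlarged.append(list(wide_row))
    return enlarged


small = [[10, 20], [30, 40]]
large = zero_order_hold(small, 2)
```

Each input pixel becomes a q by q block of identical values, so an M×N input yields a qM by qN initial picture.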
This picture is to be an initial picture. As the number of times of picture updating processing operations is verified at step S13, the picture updating processing is repeated at step S14 a pre-set number of times. This decreases the value of the equation 7 each time. At step S13, it is checked whether or not a pre-set number of times is reached. If the pre-set number of times is reached, the value of the equation 7 is close to a minimum value, with the value of each pixel being not changed. So, the processing is terminated to output the processed enlarged picture via the output interface 16. The output destination may be enumerated by a storage device having a recording medium, such as a tape or a hard disc, a next-stage signal processing circuit, a display device, such as VRAM or CRT, or an output device, such as a printer, without regard to the type of the output destination.
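The control flow of steps S13 and S14, namely a fixed step size and a pre-set number of repetitions with no convergence test, can be sketched on a one-dimensional signal. Several simplifications are assumptions of this sketch: only a quadratic smoothness energy built from the second difference is used, the term comparing the estimate with the input picture is omitted, and the value 0.02 for α is a hypothetical choice small enough to keep this particular energy decreasing.

```python
def smoothness_energy(x):
    """Sum of squared second differences over the interior samples."""
    return sum((x[i - 1] - 2 * x[i] + x[i + 1]) ** 2 for i in range(1, len(x) - 1))


def energy_gradient(x):
    """Gradient of smoothness_energy() with respect to every sample."""
    grad = [0.0] * len(x)
    for i in range(1, len(x) - 1):
        h = x[i - 1] - 2 * x[i] + x[i + 1]
        # d(h*h)/dx is 2*h times the filter coefficients (1, -2, 1).
        grad[i - 1] += 2 * h
        grad[i] -= 4 * h
        grad[i + 1] += 2 * h
    return grad


def update_fixed_count(x, alpha, repetitions):
    """Apply x <- x - alpha * gradient a pre-set number of times."""
    energies = [smoothness_energy(x)]
    for _ in range(repetitions):
        grad = energy_gradient(x)
        x = [value - alpha * g for value, g in zip(x, grad)]
        energies.append(smoothness_energy(x))
    return x, energies


# A blocky step such as zero-order hold would produce.
start = [0.0, 0.0, 0.0, 90.0, 90.0, 90.0]
result, energies = update_fixed_count(start, alpha=0.02, repetitions=3)
```

The energies are recorded here only to show that the energy falls at every repetition; in the processing described above neither the energy nor a convergence condition needs to be evaluated.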
The picture updating processing at the above step S14 is specifically explained by a flowchart shown in
First, at step S21, (i, j)=(0,0) is set to determine a pixel processed first. As it is checked at step S22 whether or not the pixel updating processing has come to a close for the entire pixels, the pixel updating processing is sequentially repeatedly performed at step S23 for each of qM by qN pixels.
At step S24, (i,j) is updated to determine the pixel processed next. At step S22, it is checked whether or not the pixel updating processing has come to a close for the entire pixels. If the pixel updating processing has come to a close, the pixel updating processing is terminated.
Since the pixel updating processing at step S23 is the processing accompanied by FIR filtering, special processing is required at a terminal portion of the picture. For this processing, routinely used methods are directly used. For example, in the terminal portions of the picture, a mirror image picture is assumed to continue to an outer side of the picture and, based on this assumption, processing is executed as if the terminal portions are not the terminal portions. Alternatively, pixel values are intermediate values between the zero and the maximum value, or processing is not performed on the terminal portions.
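The mirror-image treatment of the terminal portions can be sketched as an index mapping. The helper name `mirror_index` is hypothetical, and the variant shown reflects about the edge pixel without repeating it, which is one of several common conventions.

```python
def mirror_index(k, size):
    """Map any integer index k into [0, size) by reflecting at the picture edges."""
    if size == 1:
        return 0
    period = 2 * (size - 1)
    k = k % period  # Python's % always returns a non-negative result here
    return k if k < size else period - k


# Indices just outside a row of 5 pixels fold back inside.
outside_left = [mirror_index(k, 5) for k in (-2, -1)]
outside_right = [mirror_index(k, 5) for k in (5, 6)]
```

A filter can then read `X[mirror_index(i + di, rows)][mirror_index(j + dj, cols)]` at every pixel, so that the terminal portions are processed in the same way as the interior.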
Referring to the flowchart of
In the pixel updating processing, the gradient Dm of the energy function is found, such as to decrease the energy function Σv∈V Sv^(X), and is multiplied by a constant α to update the pixel value by the equation (10).
Specifically, the gradient of the energy function is found at step S31. This is a sum of ∇(Σv∈V Sv^(X)) and ∇(β∥Y−TX∥²), as shown by the equation 11.
Since the former is the partial derivative, with respect to a pixel Xi,j, of the smoothness Σv∈V Sv^(X) of the entire picture, it is the sum of the partial derivatives with respect to Xi,j of each Sv(X) containing Xi,j in the picture. If this is calculated, the result is a 5×5 FIR filter shown in
The latter can be found on calculations, since β and T are constants, Y is an input picture and X is a current high resolution picture.
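The gradient of β∥Y−TX∥² can be sketched as below. Since the equation (2) defining T is not reproduced in this text, the sketch assumes that T averages each q by q block of the high resolution picture; that choice, and the names `decimate` and `data_term_gradient`, are assumptions. The picture dimensions are assumed to be multiples of q.

```python
def decimate(X, q):
    """Assumed form of T: average each q-by-q block of the picture X."""
    rows, cols = len(X) // q, len(X[0]) // q
    return [
        [
            sum(X[r * q + a][c * q + b] for a in range(q) for b in range(q)) / (q * q)
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def data_term_gradient(X, Y, q, beta):
    """Gradient of beta * ||Y - T X||^2 with respect to every pixel of X."""
    TX = decimate(X, q)
    grad = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            # Each pixel contributes 1/q^2 to the block average it belongs to.
            residual = Y[i // q][j // q] - TX[i // q][j // q]
            grad[i][j] = -2.0 * beta * residual / (q * q)
    return grad


Y = [[10.0]]
consistent = data_term_gradient([[10.0, 10.0], [10.0, 10.0]], Y, 2, 1.0)
too_bright = data_term_gradient([[12.0, 12.0], [12.0, 12.0]], Y, 2, 1.0)
```

When the current estimate decimates exactly to the input picture the gradient vanishes; when the estimate is too bright the gradient is positive, so that a descent step lowers the pixel values.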
At the next step, the value found at step S31 is multiplied by a constant α and added to the current pixel value to give a new pixel value.
The above is the pixel updating processing. By this processing, the pixels are updated in a direction of decreasing the energy, that is smoothing a picture. However, since the energy is calculated by a dynamic filter, as shown in the equations 14 to 16 and in
The present invention is not limited to the above-described embodiments and may be modified in structure or application without departing from the principle of the pixel updating method. For example, although the software processing is presupposed in the foregoing description, it may be implemented by a hardware logic.
That is, referring to
The present invention is not limited to enlarging an input picture. For example, in case of a picture with a large number of pixels not containing high frequency components, in which the spatial frequency is only up to an area that can be represented by one-half pixels in the vertical and horizontal directions, an input picture may be such a one having pixels decimated to one half in the vertical and horizontal directions and may subsequently be enlarged by a factor of two by zero order hold, in which case the high resolution may be realized with the number of pixels equal to that of the original picture. The present invention is not limited to processing accompanied by picture enlargement. Of course, a picture derived from the order zero hold may be used as an input picture and the initial enlargement operation may be omitted. The present invention may also be applied to a picture portion. The enlargement ratio is not limited to two.
In the definition of the energy function of the present invention, an order two FIR filter is used. However, the coefficients or the number of orders are not limited to this example. In such case, the filter used for finding the gradient of the energy function is necessarily different from that shown in the foregoing description.
In the MAP method in general, discussions are made on a model which does not take the noise into account, or a model in which an updated picture is compared to an input picture every updating operation to provide a constraint condition. Since using a constant α value and not using the Huber function are the same as to the method of pixel updating, the present invention naturally may be applied to these models.
Number | Date | Country | Kind |
---|---|---|---|
P11-147248 | May 1999 | JP | national |