An input circuit in an electronic device will typically be designed to accept an input signal in a predefined amplitude range. If the input signal exceeds the maximum design amplitude, the signal cannot be properly processed. This problem is very common in devices such as audio recorders, which include an input preamplifier to process the signal from a microphone. While it may be possible to selectively use an automatic gain control circuit upstream of the input preamplifier to attenuate input signals that exceed the maximum rated amplitude, a user may choose not to employ such a circuit to avoid limiting the dynamic range of the recorded signal. The user may prefer to control the input level by adjusting a manual gain or manual attenuation control that limits the amplitude of the input signal being recorded. Also, some recording devices do not include an automatic gain control.
It can be difficult to predict the maximum amplitude signal that will be produced during a recording session in order to properly set the input control of a recording device before a session is recorded. If the maximum rated input signal amplitude is exceeded, saturation of the input circuitry occurs and the recording that results will exhibit “clipping.” When a recorded signal is clipped, the peak levels of the recorded signal that were in excess of the rated maximum will not be accurately recorded. Instead, the recorded waveform will appear flat where the input signal exceeds the maximum rated amplitude level. When the recorded clipped audio signal is played back to a user, it will sound distorted. These clipped portions of the signal are analogous to short gaps in the recorded waveform, since the true amplitude of the waveform is missing in each clipped section.
To avoid clipping the recorded signal, a user can appropriately adjust the input signal amplitude, for example, while viewing the input level on a sound level meter, so that the sound level of the input signal applied to the input circuitry of the device does not exceed the rated amplitude. For making a recording of sound that is relatively constant in amplitude, the adjustment of the sound level for the input signal is relatively easy. However, in recording dynamically varying sound levels, it can be difficult or virtually impossible to predict the maximum amplitude that may be input, and clipping can occur before a user can respond to an increasing input signal level by adjusting the input control to reduce the signal below the clipping level. In addition, adjustment of the input control while a signal is being recorded can also adversely impact the dynamic range accuracy with which the input signal is recorded, since the input signal will be recorded at different input attenuation settings. Once an audio event has been recorded with clipping, it may not be possible to repeat the recording with the input control set properly to avoid clipping.
Accordingly, it would be desirable to repair a clipped signal to effectively restore the full dynamic range of the original input signal and minimize distortion that has resulted because of saturation of an input circuit and clipping of the audio signal that was recorded. Various approaches have been developed for accomplishing this task, but they typically do not produce a sound with the desired quality or may require too much processing to be carried out in a consumer application. Accordingly, a more effective and efficient approach for repairing a clipped waveform of a recorded audio signal is needed.
The discussion set forth above is directed to repairing audio data that have been clipped due to the dynamic range of an input signal exceeding the capabilities of an input circuit in a recording device. However, this problem with clipping of data can arise in many other applications and is not limited simply to audio recordings. Accordingly, an exemplary method is discussed below that is generally useful for repairing almost any type of data in which clipping has occurred. The goal of this method is to restore data that were lost due to the clipping. In this approach, the clipped audio data are processed in terms of relatively short frames. For example, each frame of data that is processed might be from 30 to 50 milliseconds in duration. For each frame of the data, the method includes a plurality of steps that are iteratively carried out. In one of these steps, an auto-covariance for the data in the frame is estimated and is used for determining a least-squares solution for the frame of data that was clipped. Based upon the least-squares solution, the method produces restored data in which the clipped data are estimated by interpolation from samples of the data in the frame that were not clipped. Next, for all but a last iteration, peak rectification is applied to correct inversion errors in the restored data. The result produces current repaired data for the frame. These steps are then repeated in the next iteration, but using the current repaired data that were just produced. The last iteration produces the final repaired data for the frame. The next frame of clipped data is then processed in the same iterative manner, until the successive frames of the data in which clipping has occurred have been repaired.
For most applications of this method, successive frames of the data overlap. However, for certain types of data, it may not be necessary to employ overlapping frames. If overlapping frames are not used, the method may include the step of adjusting a duration of the successive frames of data that do not overlap, so that a boundary between the successive frames does not coincide with a clipped portion of the data. This step may be necessary to avoid interpolation discontinuities at the boundary of a frame.
The method may also include the step of automatically detecting clipped samples in the data. To automatically detect the clipped samples, a vector of the data containing clipped samples is identified, based upon a set of indices at which the vector either exceeds a defined maximum value or is less than a defined minimum value.
The step of estimating the covariance can include the step of determining a sample mean for a vector of samples of data that are clipped in the frame. An estimate of the covariance is then determined, based upon the sample mean and the vector of the samples. Iteration tends to reduce an error in the step of estimating the covariance.
The method may further include the step of using interpolation for recombining the successive frames of final repaired data to produce a complete set of repaired data. If the data that are clipped comprise audio data, the complete set of the repaired data will then comprise repaired audio data. Further, the method can then include at least one step selected from a group of steps. Specifically, the group of steps includes the step of storing at least a portion of the complete set of the repaired audio data, enabling a person to listen to at least a portion of the complete set of the repaired audio data, and recording at least a portion of the complete set of the repaired audio data on a medium.
Another aspect of this technology is directed to a memory medium on which are stored machine executable and readable instructions, for carrying out the steps of the method. Yet another aspect of the technology is directed to an exemplary system for repairing data in which clipping has occurred, to restore data that were lost due to the clipping. The system includes a memory in which machine instructions are stored, and a processor that is coupled to the memory. The processor executes the machine instructions, which cause the processor to carry out a plurality of functions that are generally consistent with the steps of the method discussed above.
This Summary has been provided to introduce a few concepts in a simplified form that are further described in detail below in the Description. However, this Summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
Various aspects and attendant advantages of one or more exemplary embodiments and modifications thereto will become more readily appreciated as the same becomes better understood by reference to the following detailed description, when taken in conjunction with the accompanying drawings, wherein:
Exemplary embodiments are illustrated in referenced Figures of the drawings. It is intended that the embodiments and Figures disclosed herein are to be considered illustrative rather than restrictive. No limitation on the scope of the technology and of the claims that follow is to be imputed to the examples shown in the drawings and discussed herein.
Although the following discussion is directed to repairing clipped audio data, it will be understood that the same approach can be applied to repair almost any type of signal that has been clipped as a result of a dynamic range of an original input signal exceeding the saturation levels of an electronic circuit that processes the input signal. Accordingly, it will be understood that this novel approach is not intended to in any way be limited to repairing clipped audio data, but can be applied to other types of clipped data, either as recorded or in real time.
The overall clipping repair process can be broadly described as a windowing operation where the frames are restored individually and then recombined to synthesize the output. This process is illustrated in
Clipped Sample Selection for One Frame
A step 54 then provides for estimating an auto-covariance for the audio data in the frame currently being processed. Further details of step 54 are discussed below.
In addition, a step 56 provides for detecting clipped samples in the current frame. The detection of clipped samples in a frame of audio data is a straightforward process. If X is a vector of data containing clipped samples, then the set of indices at which X has clipped samples is defined as:
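$$C = \{\, i : 0 \le i < N \ \text{and}\ (X_i > \mu^{+} \ \text{or}\ X_i < \mu^{-}) \,\} \tag{1}$$
where
$$\mu^{+} = (1-\varepsilon)\max\{X\}, \qquad \mu^{-} = (1-\varepsilon)\min\{X\} \tag{2}$$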
for some tolerance ε. A good choice for ε is a small fraction such as 0.01, which permits some leeway in identifying the corners of clipped segments. In practice, a simple element-wise search of X determines the elements of C.
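The element-wise search described by Eqs. (1) and (2) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `detect_clipped` and the sample frame are hypothetical and do not come from the description above.

```python
def detect_clipped(x, eps=0.01):
    """Return the ordered indices i with x[i] > mu_plus or x[i] < mu_minus (Eqs. 1-2)."""
    mu_plus = (1.0 - eps) * max(x)    # upper threshold, slightly below the frame maximum
    mu_minus = (1.0 - eps) * min(x)   # lower threshold, slightly above the frame minimum
    return [i for i, v in enumerate(x) if v > mu_plus or v < mu_minus]


# Hypothetical frame that was hard-limited to the range [-1, 1].
frame = [0.0, 0.6, 1.0, 1.0, 1.0, 0.5, -0.4, -1.0, -1.0, -0.3]
print(detect_clipped(frame))  # [2, 3, 4, 7, 8]
```

Because the thresholds are derived from the extreme values of the frame itself, the rule as stated flags the largest positive sample even in a frame that was never clipped, so a practical implementation would presumably also check whether the frame reaches the known saturation level.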
Clipping Restoration for One Frame—Theory
The primary assumption of the clipping repair algorithm is that every clipped sample can be approximated by a linear combination of known samples. Let X be the N-length signal of clipped audio data. Further, denote C as the ordered set of indices for which X is unknown (i.e., clipped samples), and Ω as the set of known or “observed” samples. That is,
$$C \cup \Omega = \{0, 1, \ldots, N-1\}, \qquad C \cap \Omega = \emptyset,$$
where C is defined above. In the remainder of this section, C(i) will refer to the ith element of the ordered set C, and likewise for Ω(i).
By the assumption of linear prediction,
$$X_{C(i)} = \sum_{j=0}^{M-1} h_{i,j}\, X_{\Omega(j)} + e_i, \qquad i = 0, 1, \ldots, L-1, \tag{3}$$
where $h_{i,j}$ are coefficients to be determined and e is an error vector. If there are M known samples and L unknown samples in X, then the h coefficients can be grouped compactly as an L×M matrix H.
Along the lines of a Wiener filter, the optimal H is defined as the matrix that minimizes the expected value of the squared norm of the error term in Eq. (3). This condition is obtained by solving Eq. (3) for the vector e, taking the expected value of its squared norm, differentiating with respect to the elements $h_{i,j}$, and setting the derivative to zero. This problem is a standard quadratic optimization problem, which is discussed by Raymond Veldhuis, in his work entitled, “Restoration of Lost Samples in Digital Signals,” Prentice Hall International (1990) and has a unique minimum given by:
$$\sum_{j=0}^{M-1} h_{i,j}\, R_{\Omega(j)-\Omega(k)} = R_{C(i)-\Omega(k)}, \qquad 0 \le i < L,\ 0 \le k < M, \tag{4}$$
where $R_\tau$ is the auto-covariance sequence of X, assuming X is a finite realization from some stationary random process. The above expression can be converted to matrix notation and solved via:
$$H R' = \hat{R} \;\Rightarrow\; H = \hat{R}\,(R')^{-1} \tag{5}$$
where
$$R'_{p,q} = R_{\Omega(p)-\Omega(q)}, \qquad \hat{R}_{s,t} = R_{C(s)-\Omega(t)}. \tag{6}$$
In words, $R'$ is formed by taking the Toeplitz matrix $R_{n,m} = R_{n-m}$ and then deleting the rows and columns corresponding to unknown samples. Similarly, $\hat{R}$ is formed from $R_{n,m}$ by deleting the rows corresponding to known samples and the columns corresponding to unknown samples. Then, Eq. (5) is solved directly using matrix inversion.
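The row and column deletion described above can be illustrated with a short Python sketch. The function name `build_matrices`, the example auto-covariance sequence, and the choice of clipped indices are all hypothetical; the sketch only shows how the index sets C and Ω select entries of the Toeplitz matrix in Eq. (6).

```python
def build_matrices(r, clipped, n):
    """Form R' (M x M) and R_hat (L x M) from an auto-covariance sequence.

    r[t] holds R_t for lags t = 0..n-1; R_{-t} = R_t by symmetry,
    so the Toeplitz entry R_{n,m} is r[abs(n - m)].
    """
    clipped_set = set(clipped)
    unknown = sorted(clipped_set)                              # ordered set C
    known = [i for i in range(n) if i not in clipped_set]      # ordered set Omega
    # R': keep only rows and columns of known samples.
    r_prime = [[r[abs(p - q)] for q in known] for p in known]
    # R_hat: rows of unknown samples, columns of known samples.
    r_hat = [[r[abs(s - t)] for t in known] for s in unknown]
    return r_prime, r_hat, known, unknown


# Hypothetical example: R_t = 0.5**t, five samples, samples 1 and 3 clipped.
r = [0.5 ** t for t in range(5)]
r_prime, r_hat, known, unknown = build_matrices(r, [1, 3], 5)
print(known, unknown)  # [0, 2, 4] [1, 3]
```

With these matrices in hand, H follows from Eq. (5) by any standard linear solver.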
A step 58 in
$$\hat{X} = H X'. \tag{7}$$
The restored signal is then:
$$Y_n = \begin{cases} X_n, & n \in \Omega \\ \hat{X}_i, & n = C(i) \end{cases} \tag{8}$$
Implementation—Auto-Covariance Estimation and Iteration
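In practice, the chief difficulty in determining the least-squares solution is that the true auto-covariance sequence $R_\tau$ of X is unknown. For a finite X, $R_\tau$ is commonly approximated by the biased estimator:
$$R_\tau \approx \frac{1}{N} \sum_{n=0}^{N-1-|\tau|} \left(X_n - \bar{X}\right)\left(X_{n+|\tau|} - \bar{X}\right), \tag{9}$$
where $\bar{X} = \frac{1}{N}\sum_{n=0}^{N-1} X_n$ is the sample mean of X.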
Implementation—Peak Rectification
Another modification to the optimal interpolation scheme is peak rectification, which is carried out in a step 60 in
However, care must be taken to not rectify peaks in the final iteration of the clipping restoration procedure, because doing so may leave the output with audible discontinuities in the waveform. Instead, the rectification step helps guide the convergence of the interpolation process in all but the final iteration for each frame.
A decision step 62 then determines if the last iteration has been completed, which can simply be based upon carrying out a predefined number of iterations, or on determining if the last error in the estimate of the auto-covariance is within acceptable limits. If not, the process proceeds with another iteration, returning to step 52. Once the last iteration has been carried out in decision step 62, a step 64 provides for output of the repaired waveform for the current frame. These steps are then repeated for each successive frame of the clipped signal that is to be repaired in a step 66, and the process is done once the complete repaired audio waveform is produced by recombining the repaired audio data of these frames.
Implementation—Faster Solution Using Sub-Frames
The interpolation method outlined by Eqs. (5), (6), and (7) can also apply to sub-frames within a single frame of data, enabling each sub-frame to be interpolated separately, using appropriately truncated versions of the matrices in Eq. (5). Since the computational cost of matrix inversion grows polynomially with matrix size (roughly as the cube of the dimension for direct methods), reducing the matrix size can substantially improve the run-time of the algorithm. Even though interpolation may occur within sub-frames, each sub-frame uses the same auto-covariance estimate obtained from the entire frame of data, because a reliable auto-covariance estimate requires a sufficient number of samples. Once the estimate is found for a frame, however, each sub-frame then only needs a subset of the auto-covariance values corresponding to small lags. The results of sub-frame interpolation tend to be similar to the one-step interpolation method described above, because the covariance between samples generally decreases as the samples become more separated in time.
Summary of the Interpolation Process
The repair of clipped audio data in a frame essentially is an interpolation process. The entire clipping restoration algorithm for a single frame is summarized by the following steps:
1. Determine C as defined by Eqs. (1) and (2).
2. Set Y(0)=X.
3. Set q=0.
4. Compute Rτ(q) via Eq. (9) with Y(q) substituted for X.
5. Carry out Eqs. (5), (6), (7), and (8) using Rτ(q), obtaining Y(q+1) as a result.
6. If q<qmax, then rectify the peaks in Y(q+1) according to Eq. (10).
7. If q<qmax, then increment q, i.e., q=q+1, and repeat steps 4-7.
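The steps above can be sketched in Python as follows. This is a minimal sketch under several stated assumptions, not a definitive implementation: the function names are hypothetical; the auto-covariance is taken to be the standard biased estimator (dividing by N at every lag); the frame mean is subtracted before interpolation and added back afterwards; a small diagonal term (`ridge`) is added to R′ purely for numerical safety; and the peak rectification of step 6 is omitted because Eq. (10) is not reproduced in this text.

```python
import math


def detect_clipped(x, eps=0.01):
    """Step 1: indices flagged by the thresholds of Eqs. (1) and (2)."""
    hi, lo = (1 - eps) * max(x), (1 - eps) * min(x)
    return [i for i, v in enumerate(x) if v > hi or v < lo]


def autocov(y):
    """Biased auto-covariance for lags 0..N-1, plus the sample mean (assumed form)."""
    n = len(y)
    mean = sum(y) / n
    d = [v - mean for v in y]
    return [sum(d[i] * d[i + t] for i in range(n - t)) / n for t in range(n)], mean


def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][k] * z[k] for k in range(r + 1, n))
        z[r] = (aug[r][n] - s) / aug[r][r]
    return z


def restore_frame(x, q_max=3, ridge=1e-9):
    clipped = detect_clipped(x)                       # step 1
    cset = set(clipped)
    known = [i for i in range(len(x)) if i not in cset]
    y = list(x)                                       # step 2: Y(0) = X
    if not clipped or not known:
        return y
    for q in range(q_max + 1):                        # steps 3 and 7
        r, mean = autocov(y)                          # step 4
        # R' with an assumed tiny ridge on the diagonal for numerical safety.
        r_prime = [[r[abs(p - t)] + (ridge * r[0] if p == t else 0.0)
                    for t in known] for p in known]
        # z = (R')^-1 X', so that X_hat = R_hat z = H X' (Eqs. 5-7).
        z = solve(r_prime, [x[i] - mean for i in known])
        for s in clipped:                             # step 5: fill clipped samples
            y[s] = mean + sum(r[abs(s - t)] * zt for t, zt in zip(known, z))
        # Step 6 (peak rectification for q < q_max) is intentionally omitted here.
    return y


# Hypothetical test frame: two sinusoids hard-limited to [-0.8, 0.8].
x_true = [math.sin(2 * math.pi * 3 * n / 64) + 0.3 * math.sin(2 * math.pi * 7 * n / 64)
          for n in range(64)]
x_clip = [max(-0.8, min(0.8, v)) for v in x_true]
y = restore_frame(x_clip)
```

Solving R′z = X′ and then forming R̂z gives the same estimate as computing H explicitly, because R′ is symmetric, and it avoids building the full inverse. Without the rectification step, the interpolated peaks may in some cases fold in the wrong direction, which is the inversion error that the rectification is described as correcting.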
Data Segmentation via Windowing
Finding an optimal solution for long-duration clipping is possible only for a stationary signal. For natural time-varying signals like music and speech, a windowing scheme sub-divides the signal into short intervals over which the signal is locally stationary, which allows the procedure described above to separately restore each frame. The windowing scheme that is used for this exemplary approach is described below. In this discussion, the terms “window” and “frame” are used interchangeably.
Window Length
The window length (i.e., the length of each frame of the audio data that is processed) is a parameter that can be adjusted on a signal-by-signal basis, but in general a window of approximately 30 milliseconds in length is sufficient for most signals. Since the windowed frame will later be subject to an interpolation operation during the repair process (as described above), a rectangular window shape maintains proper relationships between the audio data as successive frames are processed.
Window Overlap
Overlapping windows or frames of audio data essentially average over many interpolation operations. This redundancy appears to have an overall smoothing effect on the relationships between adjacent frames, and thus maintains a sense of homogeneity as the analysis window slides across the clipped signal and repairs it. In practice, window or frame overlap is another parameter that can be adjusted, depending on the nature of the input signal. Some signals require no overlap, yet an overlap of 75% is a safe value that works for most cases.
Since the window shape is rectangular, the amount of overlap between frames is restricted to values for which the frame advance N−P divides the window length evenly. In other words, the window length N and the overlap P satisfy the relationship:
$$N = k\,(N - P), \qquad k \ \text{a positive integer}. \tag{11}$$
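As a small worked example of Eq. (11), the following Python sketch lists every overlap P that is admissible for a given window length N, together with the corresponding integer k. The function name `valid_overlaps` is hypothetical.

```python
def valid_overlaps(n):
    """Return (P, k) pairs with 0 <= P < N and N = k * (N - P) for integer k."""
    return [(p, n // (n - p)) for p in range(n) if n % (n - p) == 0]


print(valid_overlaps(8))  # [(0, 1), (4, 2), (6, 4), (7, 8)]
```

For N = 8, an overlap of 75% corresponds to P = 6 and k = 4, meaning that each interior sample is covered by four frames.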
Window overlap requires careful synthesis in order to correctly reconstruct the full-length signal when the repaired frames are recombined. Define one frame of data as:
$$x_i[n] = x[n]\, w[n - iP], \tag{12}$$
where the function $w[n]$ is the rectangular window of length N and amplitude 1. After repairing the clipping in x[n], the restored frames $y_i[n]$ are recombined as:
$$y[n] = \frac{1}{k} \sum_{i} y_i[n]. \tag{13}$$
Overlapping-window synthesis creates an averaging effect that tends to smooth over interpolation discontinuities at frame boundaries. The redundancy of overlapping frames also enables the elimination of such discontinuities entirely. Eq. (13) effectively expresses every sample of y[n] as an average of samples from k adjacent frames, but can be modified slightly to average only the non-discontinuous interpolated points. In this case, affected samples of y[n] will be averages of fewer than k terms, rather than k terms.
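The averaging synthesis can be sketched in Python as shown below. The sketch assumes that successive frames advance by N − P samples, which is what Eq. (11) implies, and it divides each output sample by the number of frames that actually cover it rather than by a fixed k, so that samples near the ends of the signal (covered by fewer than k frames) are still averaged correctly. The function names `split_frames` and `recombine` are hypothetical.

```python
def split_frames(x, n, p):
    """Cut x into rectangular frames of length n with overlap p (advance n - p)."""
    hop = n - p
    return [x[s:s + n] for s in range(0, len(x) - n + 1, hop)]


def recombine(frames, n, p, total_len):
    """Average the overlapping frames sample by sample."""
    hop = n - p
    acc = [0.0] * total_len   # running sum of frame values at each sample
    cnt = [0] * total_len     # number of frames covering each sample
    for i, frame in enumerate(frames):
        for j, v in enumerate(frame):
            acc[i * hop + j] += v
            cnt[i * hop + j] += 1
    return [a / c if c else 0.0 for a, c in zip(acc, cnt)]


# With unmodified frames, the synthesis reproduces the original signal.
x = [float(i) for i in range(16)]
frames = split_frames(x, n=8, p=6)          # 75% overlap, k = 4
y = recombine(frames, n=8, p=6, total_len=len(x))
print(y == x)  # True
```

To average only the non-discontinuous interpolated points, as described above, the same accumulation could skip the affected frame values, which reduces the count for those samples below k.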
Special Case—No Window Overlap
As previously mentioned, overlap is not required for all signals. In the interest of saving computation time, some signals can be safely processed without overlap, albeit with a slightly modified windowing scheme. In order to prevent interpolation discontinuities at the window boundaries, the edge of the analysis window must be prevented from coinciding with a clipped sample. An adjustable window length guarantees this condition.
To be explicit, when overlap is not used, the analysis window or frame has a variable length (in contrast to the use of a fixed length window or frame if overlap is used), defined as follows:
where Ni is the length of the ith window wi[n]. The length Ni is determined by first attempting to use a nominal length N, looking for clipped samples at the boundary points of the frame, and then adjusting the length accordingly to avoid a clipped sample coinciding with the boundary of the frame.
When adjusting the window length, there are two cases. In the first case, the final point of the ith frame coincides with a clipped sample. Let sk and fk be the starting and ending indices, respectively, of the kth contiguous segment of clipped samples within the scope of the frame, where K is the total number of contiguous clipped regions. The new window length, after adjustment, is then:
where δ is a constant value set to the integer nearest N/10. In other words, the new window length is set to the midpoint between the last two contiguous segments of clipped samples, as long as the segments are not too far separated.
In the second case, the first point of the ith frame coincides with a clipped sample. Nm is set to N+1 and the window position shifts backward by one sample. Such a move is valid so long as the first case is handled properly for the previous frame.
As noted above, the input signal with one or more gaps caused by clipping may be provided in the form of a stored signal on a memory medium such as a floppy disk or an optical storage medium, or may have been previously stored on data store 218 after being received over a connection to a network or the Internet 230, or from some other source, such as a recording device. Any one or more of a number of different input devices 224, such as a keyboard, mouse or other pointing device, trackball, touch screen input, etc., are connected to I/O interface 220. A monitor or other display device 226 is coupled to display interface 222, so that a user can view graphics and text produced by the computing system as a result of executing the machine instructions, both in regard to an operating system and any applications being executed by the computing system, enabling a user to interact with the system. An optical drive 232 is included for reading (and optionally writing to) CD-ROM 234, or some other form of optical memory medium.
Although the concepts disclosed herein have been described in connection with the preferred form of practicing them and modifications thereto, those of ordinary skill in the art will understand that many other modifications can be made thereto within the scope of the claims that follow. Accordingly, it is not intended that the scope of these concepts in any way be limited by the above description, but instead be determined entirely by reference to the claims that follow.
Number | Date | Country | |
---|---|---|---|
20090083031 A1 | Mar 2009 | US |