The present invention relates to an improved waveform display. In particular, but not exclusively, the present invention relates to a method and apparatus for displaying an audio signal as an improved waveform.
Many audio recording, editing and production systems, or Digital Audio Workstations (DAWs), use a waveform to represent audio recordings on a computer screen or video monitor. The most common method of displaying a waveform is the use of a two-dimensional graph representing amplitude against time. The problem with amplitude versus time waveforms is that the vast majority of audio recordings contain more information than can be represented on a computer screen or video monitor at one time. Therefore, DAWs have implemented a system of zooming in and zooming out on both the amplitude and time scales to better represent the sound and overcome the lack of detail represented on the computer screen or video monitor. However, repeatedly zooming in and out to view the detail is particularly laborious and inefficient.
A waveform is a two-dimensional graph representing amplitude against time. Typically, time is represented on the horizontal axis and amplitude on the vertical axis. The reverse arrangement of the axes is feasible, but not commonly used, if at all. Typically, waveforms are monochrome, in that the waveform is represented with a single colour. Different colours are often used within DAW systems to represent different recordings in a single project. For example, a vocal track may be coloured green, whilst a drum track may be coloured blue and so on.
In this field, the terms “microscopic” and “macroscopic” are used in relation to displays of audio signals. Any waveform showing individual samples making up the signal on the screen is considered microscopic. Any waveform where pixels on the screen represent a period of time comprising more than one sample is considered macroscopic.
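The distinction can be illustrated with a minimal sketch. The function name and parameters below are hypothetical and chosen for illustration only; the sketch assumes a uniform sample rate and a fixed time period per pixel.

```python
def view_kind(sample_rate_hz: float, seconds_per_pixel: float) -> str:
    """Classify a waveform view using the definitions given above.

    Hypothetical helper: a view is treated as macroscopic when one pixel
    covers a period of time comprising more than one sample.
    """
    samples_per_pixel = sample_rate_hz * seconds_per_pixel
    return "macroscopic" if samples_per_pixel > 1.0 else "microscopic"
```

For example, at 44,100 samples per second a pixel covering 10 ms spans 441 samples, so that view would be macroscopic.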
The parameters of sound that are useful to a user of a DAW system are the peak amplitude of a sound signal, the root-mean-square (RMS) amplitude of the sound signal and the frequency content, i.e. the amplitude or energy of the signal in certain frequency bands.
The peak amplitude is easily represented by the maximum and minimum sample values and is well executed in the majority of modern DAW systems.
The RMS amplitude has a simple yet strong mathematical background, but is often quite difficult to calculate and represent with complex audio recordings.
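As a point of reference, the peak and RMS amplitudes of a block of samples can be computed as in the following minimal sketch. It uses the standard definition of RMS (the square root of the mean of the squared sample values); the function name is hypothetical and the samples are assumed to be a non-empty sequence of numbers.

```python
import math


def peak_and_rms(samples):
    """Return (minimum, maximum, rms) for a non-empty sequence of samples."""
    minimum = min(samples)
    maximum = max(samples)
    # RMS: square root of the mean of the squared sample values.
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return minimum, maximum, rms
```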
One method of displaying frequency content is via a spectrogram. With reference to
Another type of apparatus and method for displaying audio data as a discrete waveform is disclosed in U.S. Pat. No. 5,634,020 assigned to Avid Technology, Inc. A smoothing operation is applied to a selected portion of audio data to obtain an average value for the sample and the average value is compared against a user-set or calculated threshold to generate a discrete waveform representative of the audio sample. The apparatus and method also includes an option of determining a root-mean-square of each sample of audio data during the comparison process. However, the root-mean-square is not directly represented in the display. The discrete waveform is displayed as either a series of coloured bars of equal height or as bars of the same colour, but of different heights, the colours/heights selected according to a value of the corresponding sample of audio.
This apparatus and method provide an alternative display method that aids in locating features of the audio data, such as breaks in sound and dialogue. However, frequency component detail is not represented in this display. Also, the improvement therein resides in displaying the results of a comparison between the signal, or a derived analysis of the signal, and a threshold which is user defined or derived from another signal. The improvement therefore does not necessarily apply to the entire waveform, or apply directly to the waveform in its own right. Furthermore, the Avid method and apparatus do not address the aforementioned problem of zooming in and out repeatedly.
Another type of waveform display method and apparatus is disclosed in U.S. Pat. No. 6,184,898 assigned to Comparisonics Corporation. A signal is partitioned into a plurality of consecutive time segments, which are then processed to extract frequency-dependent information that characterises each segment. The frequency-dependent information may depend on a dominant frequency or a subordinate frequency determined by the greatest or smallest amplitude respectively. The frequency spectrum is divided into bands and values are associated with each band. A value P is assigned to each time segment based on the band in which the characteristic frequency-dependent information falls. An amplitude variance V is also determined for each segment, the values P and V combining to create a signature that characterises each segment. The signatures are stored in memory and read to generate a display in which a column of pixels representing the time segment of the signal is displayed in a particular colour. The colour depends at least on the frequency-dependent value P.
The Comparisonics method uses a Fast Fourier Transform or a Linear Prediction Algorithm to provide some frequency analysis of the time segment. A Fourier Transform is not a favourable method of analysis because it requires the time segments to have an even number of samples (2, 4, 6, 8, 10, etc.). A Fast Fourier Transform is even less flexible because it requires segments whose length is a power of 2 (2, 4, 8, 16, 32, 64, etc.). Thus, the relationship between the duration of the segment and the time period represented by any point on the display can prove to be a point of weakness. Furthermore, the aforementioned problem of zooming in and out to view detail of the signal is again not addressed.
Another method is disclosed in U.S. Pat. No. 5,532,936 in the name of John W. Perry. In this invention, the audio signal is broken into a number of frequency bands, with a plurality of damped oscillators that are used to detect the presence of energy in certain frequency bands. This technique is more efficient and flexible than the Fourier Transform or Fast Fourier Transform methods. However, the technique is used to create a spectrogram and therefore suffers the same shortcomings as the abovementioned spectrogram display methods. In the spectrograms in this invention, the strength of the signal components is represented by pixels of varying intensity and/or colour. Low strengths are represented as blue pixels of low intensity and high strengths are represented as pink pixels of high intensity, with intermediate strengths represented by pixels coded along the colour and intensity continuums in between.
In addition to the shortcomings in the display, the disclosed technique of using a damped oscillator to determine frequency content is also less flexible because each damped oscillator is designed to respond to a certain frequency band. As the user zooms in and out on a waveform display, thus changing the time scale axis, the frequencies that can be shown on the display also change. Therefore, as the time scale changes, a change in the design of the damped oscillators would also be required in order to provide useful functionality at a range of time scales. Redesigning the damped oscillators would also require the audio signal to be re-processed with the new damped oscillator designs, which would be inefficient.
The Comparisonics and Avid methods and apparatus and the method of Perry all employ multiple colours, which can cause confusion in cases where different colours are used to represent different recordings in a project, such as vocals in one colour, drums in another colour and so on.
Hence, there is a need for a system, method and/or apparatus that addresses or ameliorates at least the aforementioned prior art problem of needing to zoom in and out on a signal to have an indication of the detail contained within the signal.
In this specification, the terms “comprises”, “comprising” or similar terms are intended to mean a non-exclusive inclusion, such that a method, system or apparatus that comprises a list of elements does not include those elements solely, but may well include other elements not listed.
In one form, although it need not be the only or indeed the broadest form, the invention resides in a method of displaying an audio signal as an improved waveform including the steps of:
a) determining samples of the audio signal which represent a waveform based on positions of the pixels in the waveform and a time scale of the waveform;
b) calculating minimum and maximum amplitudes of the samples for each pixel on the time axis;
c) calculating intensities of frequency components of the samples which cannot be represented at the time scale of the waveform for each pixel on the time axis; and
d) displaying the samples as an improved waveform of amplitude versus time wherein the intensities of the frequency components are represented in the improved waveform by shades of a single colour.
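Steps a) and b) above can be sketched as follows. This is a minimal illustration rather than the claimed implementation: the function names, the half-open sample ranges and the use of `None` for pixels that fall outside the recording are all assumptions made for the example.

```python
import math


def pixel_sample_range(pixel_x, start_time_s, seconds_per_pixel,
                       sample_rate_hz, n_samples):
    """Step a): map a pixel on the time axis to a half-open sample range."""
    t1 = start_time_s + pixel_x * seconds_per_pixel   # start time of the pixel
    t2 = t1 + seconds_per_pixel                       # end time of the pixel
    first = min(n_samples, max(0, math.floor(t1 * sample_rate_hz)))
    last = min(n_samples, max(first, math.ceil(t2 * sample_rate_hz)))
    return first, last


def min_max_per_pixel(samples, width_px, start_time_s,
                      seconds_per_pixel, sample_rate_hz):
    """Step b): minimum and maximum amplitude for each pixel column."""
    columns = []
    for x in range(width_px):
        first, last = pixel_sample_range(
            x, start_time_s, seconds_per_pixel, sample_rate_hz, len(samples))
        window = samples[first:last]
        # None marks a pixel that lies beyond the end of the recording.
        columns.append((min(window), max(window)) if window else None)
    return columns
```

Because the sample ranges are derived from the time scale, calling `min_max_per_pixel` again with a different `seconds_per_pixel` corresponds to recomputing the columns after zooming.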
Suitably, darker shades represent a higher intensity of high frequency components that cannot be displayed at the time scale of the waveform and lighter shades represent a lower intensity of high frequency components that cannot be displayed at the time scale of the waveform or vice versa.
Suitably, a gradient between a darkest shade and a lightest shade of the single colour used in the improved waveform is linear or curved. The method may include:
e) calculating root-mean-square amplitudes of the samples for each pixel on the time axis.
The method may further include representing the root-mean-square amplitudes of the samples in a profile of amplitude versus colour shade.
Suitably, the shade of a pixel comprising said improved waveform is indicative of the root-mean-square amplitude of the samples in the time interval represented by said pixel.
The method may further include representing the root-mean-square amplitudes of the samples in the improved waveform as a region of pixels of a darker shade within pixels of a lighter shade, said lighter shade pixels representing maximum and minimum amplitudes of the samples.
The method may further include repeating steps a)-d) when the time scale of the improved waveform is changed.
Suitably, steps b) and c) and optionally e) are performed in a single step.
Suitably, the colour of the waveform is the same as the colour employed for a recording type, such as vocals, bass or the like.
The method may include creating a plurality of overview packets as a summary of a recording of the audio signal enabling some or all of steps a) to d) to be performed without directly accessing the recording.
Suitably, the summary of the audio recording comprises approximations of one or more of the following: minimum amplitudes, maximum amplitudes, a root-mean-square amplitude, high frequency component energies.
The method may include transmitting a summary of processing conducted in a main processor to a graphical processor to enable the graphical processor to construct an image of the improved waveform.
In another form, the invention resides in an apparatus for displaying an audio signal as an improved waveform, said apparatus comprising:
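a processor for:
determining samples of the audio signal which represent a waveform based on positions of the pixels in the waveform and a time scale of the waveform;
calculating minimum and maximum amplitudes of the samples for each pixel on the time axis; and
calculating intensities of frequency components of the samples which cannot be represented at the time scale of the waveform for each pixel on the time axis; and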
a display coupled to be in communication with the processor for displaying the samples as an improved waveform of amplitude versus time wherein the intensities of the frequency components are represented in the waveform by shades of a single colour.
Suitably, the processor comprises a main processor coupled to be in communication with a graphical processor, said graphical processor coupled to be in communication with the display.
Suitably, the main processor creates a plurality of overview packets as a summary of a recording of the audio signal enabling some or all of the steps performed in the main processor to be performed without directly accessing the recording.
Preferably, the main processor transmits the summary to the graphical processor to enable the graphical processor to construct an image of the improved waveform.
Further features of the present invention will become apparent from the following detailed description.
By way of example only, preferred embodiments of the invention will be described more fully hereinafter with reference to the accompanying drawings, wherein:
The signal is stored in memory 12 as a file, such as an industry standard AIFF or WAVE file, or PCM (Pulse Code Modulated) data, and may be a recording from an original source via a microphone 20 coupled to an analogue-to-digital converter (ADC) 22. Alternatively, the file stored in memory 12 may be a recording from another source, such as a compact disc (CD), record, tape, electronic instrument (including guitars), synthesizer, tone generator or computer system which generates audio recordings.
The method of generating and displaying the improved waveform will now be described with reference to
With reference to step 120, the minimum and maximum amplitudes of the samples for each pixel are calculated and in step 125, the root-mean-square amplitudes for each pixel are calculated. In step 130, the intensities of the frequency components that cannot be represented at the time scale of the waveform are calculated for each pixel. Whilst steps 120, 125 and 130 are shown in
where t1 and t2 are the start time and end time respectively of the time period for the corresponding pixel.
In another embodiment, the intensities of the frequency components of a sample f(t) are calculated according to equation (2):
The inventor envisages that in a further embodiment, a Fourier Transform (FT) or a Fast Fourier Transform (FFT) could be employed to analyse the frequency components, although this is not preferred due to the aforementioned drawbacks of such algorithms. Once a FT or FFT is performed, a sum of the magnitude of frequency components would be carried out to determine the intensity of frequency components above the lower limit determined by the time scale.
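The Fourier-based alternative described in the preceding paragraph can be sketched as below. This is an illustration only: it uses a naive discrete Fourier transform rather than an FFT, so it accepts segments of any length, and the rule for the lower frequency limit (half the pixel rate) is an assumption made for the example, since the specification states only that the limit is determined by the time scale. All names are hypothetical.

```python
import cmath
import math


def assumed_cutoff_hz(seconds_per_pixel):
    """Assumed lower limit: half the pixel rate (not taken from the source)."""
    return 1.0 / (2.0 * seconds_per_pixel)


def high_frequency_intensity(samples, sample_rate_hz, cutoff_hz):
    """Sum the magnitudes of DFT components whose frequency exceeds cutoff_hz."""
    n = len(samples)
    total = 0.0
    for k in range(1, n // 2 + 1):          # positive-frequency bins, DC excluded
        freq_hz = k * sample_rate_hz / n
        if freq_hz <= cutoff_hz:
            continue
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        total += abs(coeff) / n             # magnitude normalised by segment length
    return total
```

A segment that alternates between +1 and -1 on every sample yields an intensity of about 1, whereas a constant segment or a slow sinusoid below the cutoff yields an intensity of about 0.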
Referring to step 140 in the flowchart in
There are many systems available for defining colours, each using a number of components. Among the most common systems are RGB (Red, Green and Blue), CMYK (Cyan, Magenta, Yellow and Key) and HSB (Hue, Saturation and Brightness). RGB is typically used in video and computer displays, because the components relate directly to the red, green and blue phosphors in a Cathode Ray Tube display, for example. CMYK is mostly used in print media industries, because the components relate directly to the cyan, magenta, yellow and key (usually black) inks used for printing on paper. The HSB colour system uses a different set of components, namely hue, saturation and brightness, which describe colours in terms more natural to an artist. Hue is a component that describes a range of colours from red through green through to blue, similar to the spectrum of colours in a rainbow. Saturation describes the intensity of a colour, which ranges from gray to vivid tones, for example describing the difference between tan and brown. Brightness describes the shade of a colour, from dark to light, ranging from black to a full intensity of the colour according to the values of the hue and saturation components.
Often, the description of colour in text relates to hue. Named colours, such as red, orange and blue correspond to colours in the rainbow and can be defined with values of the hue component in the HSB colour system.
In the existing display methods mentioned above, a variation in colour typically happens in the hue component. For example, different intensities in a spectrogram, or different dominant frequencies, are represented by a change in the hue of a colour thus creating a spectrum similar to the range of colours on the rainbow.
The present invention uses shades of a single colour, which maintain a constant hue. That is, the pixels comprising the improved waveform image have a constant value of the hue component and the brightness is varied to create a range of shades in a single colour.
In one embodiment, the gradient between the darkest shade and the lightest, default shade, which, in one embodiment, represent the maximum and minimum intensities of the frequency components respectively, is linear. Alternatively, the gradient between the shades may be curved to provide the best visual consistency across the range of time scales that can be viewed by zooming in and out on the improved waveform.
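The mapping from intensity to shade can be sketched using the HSB colour system (called HSV in Python's standard `colorsys` module). This is a minimal sketch under stated assumptions: intensities are assumed to be normalised to the range 0 to 1, the darkest and lightest brightness values are arbitrary defaults, and the curved gradient is modelled as a simple power curve. None of these specifics come from the specification.

```python
import colorsys


def shade_rgb(intensity, hue=0.33, saturation=1.0,
              darkest=0.4, lightest=1.0, curve=1.0):
    """Map a normalised intensity (0..1) to an 8-bit RGB shade of one hue.

    Hue and saturation stay constant; only brightness varies.
    curve == 1.0 gives a linear gradient; other values give a curved one.
    """
    x = min(1.0, max(0.0, intensity)) ** curve
    # Higher intensity maps to a darker shade (the reverse is equally possible).
    brightness = lightest + (darkest - lightest) * x
    r, g, b = colorsys.hsv_to_rgb(hue, saturation, brightness)
    return round(r * 255), round(g * 255), round(b * 255)
```

Passing the hue already assigned to a recording type, for example the green of a vocal track, keeps the improved waveform in that track's colour while the shade carries the frequency information.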
The improved waveform 24 generated by the present invention can be contrasted with a waveform for the same signal on the same time scale generated by a typical DAW. The typical prior art waveform 26 is shown in
It will be appreciated that where reference is made herein to the invention and representing the frequency components in shades of colour, such as red, blue, green and the like, in some embodiments, a grey scale may be employed and therefore the expression “shades of colour” also includes shades of grey.
With reference to step 150 in
In addition to showing the location of the frequency components in the improved waveform 24, in one embodiment, the improved waveform 24 also shows the RMS value of the signal. The shade of a pixel comprising the improved waveform 24 is indicative of a root-mean-square amplitude of the signal in the time interval represented by said pixel. Therefore, with reference to
The analysis of a signal may be saved in memory or cached on disk, either as a separate file or as meta-data embedded into an audio file, to speed up the drawing process and to reduce memory requirements and access times.
To further improve efficiency, in one embodiment, the method of the present invention may include reducing the audio recording into a plurality of packets. Each packet corresponds to a time period within the audio recording and comprises a summary of the audio recording during that period. The duration of these packets is independent of the display and can be specified by the user or by the application. The summary may comprise approximations of values in an effort to reduce memory requirements and/or increase the speed of drawing the improved waveform 24 by removing the need to access the audio recording directly. The summary may contain approximations of values representing the minimum and maximum amplitude, the RMS amplitude and/or the high frequency energy of the period of the audio recording. Suitably, in order to maintain maximum quality of improved waveform images, summary packets are used only when the time period of the packet is less than the time period associated with each pixel along the time axis.
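The packet scheme can be sketched as follows. This is a minimal illustration, not the disclosed format: the `Packet` fields and function names are hypothetical, exact values are stored where the specification allows approximations, and the high frequency energy is omitted because its formula is not reproduced in this text.

```python
import math
from dataclasses import dataclass


@dataclass
class Packet:
    count: int          # number of samples summarised by the packet
    minimum: float
    maximum: float
    mean_square: float  # kept instead of RMS so that packets can be combined


def build_packets(samples, packet_len):
    """Summarise a recording as fixed-length packets, independent of the display."""
    packets = []
    for start in range(0, len(samples), packet_len):
        block = samples[start:start + packet_len]
        packets.append(Packet(len(block), min(block), max(block),
                              sum(s * s for s in block) / len(block)))
    return packets


def summarise_pixel(packets):
    """Combine the packets covering one pixel without reading the recording."""
    total = sum(p.count for p in packets)
    mean_square = sum(p.count * p.mean_square for p in packets) / total
    return (min(p.minimum for p in packets),
            max(p.maximum for p in packets),
            math.sqrt(mean_square))
```

Storing the mean square rather than the RMS is a design choice made for this sketch: mean squares weighted by sample count can be averaged across packets, after which a single square root gives the RMS for the pixel.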
With reference to
The main processor 34 performs the signal analysis (steps 120, 125 and 130 in
Once the main processor 34 has performed the correct analysis of the audio signal, a summary of this information is sent to the graphical processor 36.
Typically this summary will be considerably smaller than the audio signal being displayed and also considerably smaller than the resulting image that is displayed on the attached display 16. Therefore the transferring of the summary of analysis from the main processor 34 to the graphical processor 36 is a very efficient task.
The graphical processor 36 receives a summary of the analysis of the audio signal in memory 12 from the main processor 34. The graphical processor 36 then constructs a waveform image that is shown on the display 16.
This combination of main processor 34 and graphical processor 36 yields a number of performance enhancements. The workload is distributed across two processors where each processor performs a part of the overall processing in a manner that can be optimized for that processor. The communication between the two processors is also very efficient because the amount of information leaving the main processor 34 is smaller in size and can be transmitted in less time. This allows the main processor 34 to return to other tasks, which is of great value to most Digital Audio Workstations. It also allows the specialized graphical processor 36 to be put to better use because it can communicate directly with the attached display 16 faster than the main processor 34.
Hence, the method and apparatus of the present invention provide a solution to the aforementioned prior art problem by virtue of representing a signal as an improved waveform in which frequency components of the signal that cannot be displayed at the current time scale of the waveform are represented by various shading of the improved waveform in a single colour. The particular level of shading depends on the frequency components at each time interval of the signal represented by the improved waveform. Therefore, a user of the improved waveform can easily see the locations of the frequency components within the waveform without having to zoom in on the waveform to determine whether further frequency components of the signal represented by the improved waveform are present. Nonetheless, zooming in and out on the improved waveform, i.e. changing the magnification and therefore the time scale, is, of course, possible in the present invention. Another advantage of the present invention is that the same method can be employed to generate the improved waveform irrespective of the time scale being processed.
In addition to the improved waveform displaying the minimum and maximum amplitudes of the signal at each time interval along the improved waveform and the aforementioned frequency component detail, in one embodiment, the present invention can also simultaneously display the RMS amplitude of the signal within each time interval displayed in the improved waveform. This is achieved because the shading varies along the amplitude axis as well as along the time axis.
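One way to picture shading that varies along the amplitude axis is to build a single pixel column in which a darker RMS region sits inside a lighter region spanning the minimum and maximum amplitudes. The sketch below is an assumption-laden illustration: amplitudes are taken to be normalised to the range -1 to +1, rows are listed from top to bottom, and the labels "dark", "light" and "bg" are placeholders for actual shades.

```python
def render_column(minimum, maximum, rms, height):
    """Return one label per row, top (+1.0) to bottom (-1.0), for a pixel column."""
    rows = []
    for row in range(height):
        # Amplitude at the centre of this row.
        amp = 1.0 - 2.0 * (row + 0.5) / height
        if minimum <= amp <= maximum:
            # Inside the peak envelope: darker where |amp| is within the RMS.
            rows.append("dark" if abs(amp) <= rms else "light")
        else:
            rows.append("bg")
    return rows
```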
A further advantage is that the present invention is easier to use by users with imperfect colour vision because different shades of a single colour are employed in the improved waveform. The prior art uses a range of colours to represent the waveform, which can often be problematic for users with imperfect colour vision. This is avoided in the present invention and the user can select the colour to be used in the improved waveform that is most agreeable to the user's colour vision.
The method of the present invention can form part of the suite of functions of a conventional Digital Audio Workstation (DAW) and is implemented in software. The present invention builds on the simplicity and intuitive nature of existing waveform display methods so that greater detail can be displayed and improved workflow can be achieved whilst maintaining a smooth and intuitive progression from microscopic to macroscopic time scales.
Throughout the specification the aim has been to describe the invention without limiting the invention to any one embodiment or specific collection of features. Persons skilled in the relevant art may realize variations from the specific embodiments that will nonetheless fall within the scope of the invention.
Number | Date | Country | Kind |
---|---|---|---|
2005904542 | Aug 2005 | AU | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/AU2006/001213 | 8/22/2006 | WO | 00 | 11/23/2007 |