This application claims the benefit of priority to U.S. Provisional patent application titled “SYSTEM AND METHOD FOR AUDIO SYNTHESIZER UTILIZING FREQUENCY APERTURE ARRAYS”, Application No. 61/378,765, filed Aug. 31, 2010; and U.S. Provisional patent application titled “SYSTEM AND METHOD FOR AUDIO SYNTHESIZER UTILIZING FREQUENCY APERTURE ARRAYS”, Application No. 61/379,094, filed Sep. 1, 2010; each of which applications is herein incorporated by reference in its entirety.
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
Embodiments of the invention are generally related to music, audio, and other sound processing and synthesis, and are particularly related to a system and method for an audio synthesizer utilizing frequency aperture cells (FAC) and frequency aperture arrays (FAA).
Disclosed herein is a system and method for an audio synthesizer utilizing frequency aperture cells (FAC) and frequency aperture arrays (FAA). In accordance with an embodiment, an audio processing system can be provided for the transformation of audio-band frequencies for musical and other purposes. In accordance with an embodiment, a single stream of mono, stereo, or multi-channel monophonic audio can be transformed into polyphonic music, based on a desired target musical note or set of multiple notes. At its core, the system utilizes one or more input waveforms (which can be either file-based or streamed), which are then fed into an array of filters, themselves optionally modulated, to generate a new synthesized audio output.
Advantages of various embodiments of the present invention over previous techniques include the following: the input audio source can be completely unpitched and unmusical, even consisting of just pure white noise or a person's whisper, and yet, after being synthesized by the FAA, can be completely musical, with easily recognized pitch and timbre components; and a real-time streamed audio input can be used to generate the input source which is to be synthesized. The frequency aperture synthesis approach allows for both file-based audio sources and real-time streamed input. The result is a completely new sound with unlimited scope, because the input source itself has unlimited scope. In accordance with an embodiment, the system also allows multiple syntheses to be combined to create unique hybrid sounds, or can accept input from a musical keyboard as an additional input source to the FAA filters. Other features and advantages will be evident from the following description.
An advantage of various embodiments of the present invention over previous techniques is that the input audio source can be completely unpitched and unmusical, even consisting of just pure white noise or a person's whisper, and yet, after being synthesized by the FAA, can be completely musical, with easily recognized pitch and timbre components. The output audio source is unlimited in its scope, and can include realistic instrument sounds such as violins, piano, brass instruments, etc., electronic sounds, sound effects, and sounds never conceived or heard before.
Another advantage of embodiments of the present invention over previous techniques is the use of a real-time streamed audio input to generate the input source which is to be synthesized. Previously, musical synthesizers have relied upon stored files (usually pitched) consisting of audio waveforms, either recorded (sample-based synthesis) or algorithmically generated (frequency- or amplitude-modulation synthesis), to provide the audio source which is then synthesized. The frequency aperture synthesis approach allows for both file-based audio sources and real-time streamed input. The result is a completely new sound with unlimited scope, because the input source itself has unlimited scope.
In order to facilitate pitched streamed audio input sources, in accordance with an embodiment the system also includes a dispersion algorithm which can take a pitched input source and make it unpitched and noise-like (broad spectrum). This signal then feeds into cells and filters of the FAA which further synthesizes the audio signal. This allows for a unique attribute in which a person can sing, whisper, talk or vocalize into the dispersion filter, which, when fed into the FAA filters and triggered by a keyboard or other source guiding the pitch components of the FAA synthesizer, can yield an output that sounds like anything, including a real instrument such as a piano, guitar, drumset, etc. The input source is not limited to vocalizations of course. Any pitched input source (guitar, drumset, piano, etc.) can be dispersed into broad spectrum noise and re-synthesized to produce any musical instrument output.
In accordance with an embodiment, the system also allows multiple syntheses to be combined to create unique hybrid sounds. Finally, embodiments of the invention include a method of using multiple impulse responses, mapped out across a musical keyboard, as an additional input source to the FAA filters, designed for, but not limited to, synthesizing the first moments of a sound.
Introduction
White noise is a sound that covers the entire range of audible frequencies, all of which possess equal intensity. White noise is analogous to white light, which contains roughly equal intensities of all frequencies of visible light. An approximation to white noise is the static that appears between FM radio stations. Pink noise contains all frequencies of the audible spectrum, but with a decreasing intensity of roughly three decibels per octave. This decrease approximates the audio spectrum composite of acoustic musical instruments or ensembles.
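For readers who want a concrete reference point, the short C++ sketch below (not part of the patent; the Voss-McCartney-style row scheme and all constants are assumptions chosen for illustration) generates white noise, in which every sample is an independent random value, and an approximation of pink noise, whose intensity falls by roughly three decibels per octave.

```cpp
#include <cstdint>
#include <random>
#include <vector>

// White noise: every sample is an independent uniform random value.
float white_sample(std::mt19937& rng) {
    std::uniform_real_distribution<float> dist(-1.0f, 1.0f);
    return dist(rng);
}

// Pink-ish noise (~ -3 dB/octave), Voss-McCartney style: several random
// "rows" are refreshed at octave-spaced rates, so lower-frequency
// contributions carry proportionally more energy.
class PinkNoise {
public:
    explicit PinkNoise(int rows = 8) : rows_(rows, 0.0f) {}
    float next(std::mt19937& rng) {
        std::uniform_real_distribution<float> dist(-1.0f, 1.0f);
        ++counter_;
        // Refresh the row selected by the number of trailing zero bits in the
        // counter; row k is therefore refreshed roughly every 2^(k+1) samples.
        int row = 0;
        uint32_t c = counter_;
        while ((c & 1u) == 0u && row < static_cast<int>(rows_.size()) - 1) { c >>= 1; ++row; }
        rows_[row] = dist(rng);
        float sum = 0.0f;
        for (float r : rows_) sum += r;
        return sum / rows_.size();
    }
private:
    std::vector<float> rows_;
    uint32_t counter_ = 0;
};
```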
In accordance with an embodiment, the system uses an array of audio frequency aperture cells, which separate noise components into harmonic and inharmonic frequency multiples. Much in the way that a prism separates white light into its constituent spectrum of frequencies, with the resultant frequencies depending on the material, internal feedback interference, and spectrum of incoming light, frequency aperture cells (FACs) do analogously with audio, based on their type, feedback properties, and the spectrum of incoming audio. Another aspect of the invention deals with the conversion of incoming pitched sounds into wide-band audio noise spectra, while at the same time preserving the intelligibility, sibilance, or transient aspects of the original sound, then routing the sound through the array of FACs.
A previous technique for dealing with both pitched and non-pitched audio input is known as subtractive synthesis, whereby single- or multi-pole high-pass, low-pass, band-pass, resonant, and non-resonant filters are used to subtract certain unwanted portions from the incoming sound. In this technique, the subtractive filters usually modify the perceived timbre of the note; however, the filter process does not determine the perceived pitch, except in the unusual case of extreme filter resonance. These filters are usually of IIR (Infinite Impulse Response) type, indicating a delay line and a feedback path. Noise routed through IIR filters has also been employed by Kevin Karplus and Alex Strong (1983), “Digital Synthesis of Plucked-String and Drum Timbres”, Computer Music Journal (MIT Press) 7 (2): 43-55, doi:10.2307/3680062, incorporated herein by reference. Although arguably also subtractive, in that case the resonance of the filter usually determines the pitch as well as affecting the timbre. There have been various improvements to this technique, whereby certain filter designs are intended to emulate certain portions of their acoustic counterparts.
In accordance with an embodiment, the system provides an improvement to both of these musical processes by employing arrays of frequency aperture cells. FACs have the ability to transform a spectrum of related or unrelated, harmonic or inharmonic input frequencies into an arbitrary, and potentially continuously changing, set of new output frequencies. There are no constraints on the type of filter designs employed, only that they have inherent slits of harmonic or inharmonic frequency bands that separate desired frequency components between their input and output. Both FIR (Finite Impulse Response) and IIR (Infinite Impulse Response) type designs are employed within different embodiments of the FAC types. Musically interesting effects are obtained as individual frequency slit width, analogous to frequency spacing, and height, analogous to amplitude, are varied between FAC stages. FAC stages are connected in series and in parallel, and can each be modulated by specific modulation signals, such as LFOs, envelope generators, or the outputs of prior stages.
Frequency spacing at the output of the FAC is often not even (i.e., not harmonic); hence the term “slit width” is used instead of “pitch”. Slit width can affect the pitch, the timbre, or both, so the use of “pitch” is not appropriate in the context of an FAC array.
Frequency Aperture Arrays
In accordance with an embodiment, frequency aperture arrays (FAAs) are n-series by m-parallel connections of frequency aperture cells, optionally together with other digital filters such as multimode HP/BP/LP/BR filters and/or resonators of varying type. The multi-mode filter can optionally be omitted.
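Purely as a structural illustration (the class names, member names, and the placeholder cell processing are hypothetical, not taken from the patent), an n-series by m-parallel array can be sketched in C++ as m parallel branches, each a chain of n cells processed in series, with the branch outputs summed:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one frequency aperture cell; a real cell would
// implement the multi-pass-band aperture filtering described in the text.
struct FrequencyApertureCell {
    float slit_width  = 1.0f;   // frequency spacing of the apertures
    float slit_height = 0.5f;   // amplitude / feedback of the apertures
    float process(float in) {
        // Placeholder: a real cell applies its aperture filter here.
        return in * slit_height;
    }
};

// n-series by m-parallel frequency aperture array: each of the m parallel
// branches is a chain of n cells; branch outputs are summed at the output.
class FrequencyApertureArray {
public:
    FrequencyApertureArray(std::size_t n_series, std::size_t m_parallel)
        : branches_(m_parallel, std::vector<FrequencyApertureCell>(n_series)) {}

    float process(float in) {
        float out = 0.0f;
        for (auto& branch : branches_) {        // parallel branches
            float x = in;
            for (auto& cell : branch)           // series cascade within a branch
                x = cell.process(x);
            out += x;
        }
        return out;
    }
private:
    std::vector<std::vector<FrequencyApertureCell>> branches_;
};
```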
Frequency Aperture Cells
Each frequency aperture cell, with varying feedback properties, produces an instantaneous output frequency based on both the instantaneous spectrum of the incoming audio and the specific frequency slits and resonance of the aperture filter. Two controlling properties are the frequency slit spacing (slit width) and the noise-to-frequency-band ratio (slit height).
An important distinction of the constituent FAA cells is that their slit widths are not necessarily representative of the pitch of the perceived audio output. FAA cells may be inharmonic themselves, or, in the case of two or more series-cascaded harmonic cells of differing slit width, may have their aperture slits at non-harmonic relationships, producing inharmonic transformations through cascaded harmonic cells. The perceived pitch is often a complex function of the slit widths and heights of all constituent cells and the character of their individual harmonic and inharmonic apertures. The slit width and height are as important to the timbre of the audio as they are to the resultant pitch.
In accordance with an embodiment, a system and method are provided that improve both of these musical processes by employing arrays of frequency aperture filters. FAAs have the ability to transform a spectrum of related or unrelated, harmonic or inharmonic input frequencies into an arbitrary, and potentially continuously changing, set of new output frequencies. There are no constraints on the type of filter designs employed, only that they have inherent slits of harmonic or inharmonic frequency bands that separate desired frequency components between their input and output. Both FIR (Finite Impulse Response) and IIR (Infinite Impulse Response) type designs are employed within different embodiments of the FAA types. Musically interesting effects are obtained as individual frequency slit width, analogous to frequency spacing, and height, analogous to amplitude, are varied between FAC stages. In accordance with an embodiment, FAC stages are connected in series and in parallel, and can each be modulated by specific modulation signals, such as LFOs, envelope generators, or the outputs of prior stages.
Frequency Aperture Filters
In accordance with an embodiment, frequency aperture filters (FAF) may be embodied as single or multiple digital filters of either the IIR (Infinite Impulse Response) or FIR (Finite Impulse Response) type, or any combination thereof. One characteristic of these filters is that both timbre and pitch are controlled by the filter parameters, and that input frequencies of adequate energy that line up with the multiple pass bands of the filter will be passed to the output of the collective filter, albeit with potentially differing amplitude and phase.
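One minimal IIR structure with this multiple-pass-band property is a recursive comb filter, whose resonant pass bands fall at evenly spaced multiples of Fs/D; the delay length D acts like a slit-width (spacing) control and the feedback gain acts like a slit-height control. The sketch below is offered only as an illustration of the idea, not as the patent's specific filter design.

```cpp
#include <cstddef>
#include <vector>

// Recursive (IIR) comb filter: y[n] = x[n] + g * y[n - D].
// Its resonant pass bands sit at k * Fs / D, so the delay D behaves like a
// slit-width control and the feedback gain g behaves like a slit-height control.
class FeedbackComb {
public:
    // delay_samples must be at least 1; feedback magnitude below 1 keeps it stable.
    FeedbackComb(std::size_t delay_samples, float feedback)
        : buffer_(delay_samples, 0.0f), feedback_(feedback) {}

    float process(float x) {
        float delayed = buffer_[index_];          // y[n - D]
        float y = x + feedback_ * delayed;
        buffer_[index_] = y;                      // store for D samples later
        index_ = (index_ + 1) % buffer_.size();
        return y;
    }
private:
    std::vector<float> buffer_;
    std::size_t index_ = 0;
    float feedback_;
};
```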
In accordance with an embodiment, this buffer is of circular (modulo) type and is composed of four interleave channels of equal modulo-4096 (or other 2^n) length, for simplicity of modulo addressing. The multiple channels are addressed by the same pointer index by adding offsets of 4096, 8192, and 12288, respectively. However, by virtue of the 4-channel interleave arrangement, execution of one single SIMD (Single Instruction Multiple Data) data access provides one address for all four variables simultaneously.
For example, with a SIMD CPU opcode such as the “MOVAPS [EAX], xmm2” instruction in the Intel SSE-capable architecture, with the EAX register having first been preloaded with the write index, four 32-bit floating point values are read from the xmm2 register and placed into the memory address indexed by the EAX register. This happens in as little as a single CPU cycle.
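A minimal sketch of this store using SSE intrinsics rather than raw assembly is shown below. The buffer name, the write_frame helper, and the frame-interleaved layout (the four channel samples for one index held contiguously, so that a single aligned store writes all of them) are assumptions made for the sketch; they represent one way to realize the single-store behavior described above.

```cpp
#include <xmmintrin.h>   // SSE intrinsics (_mm_set_ps, _mm_store_ps)
#include <cstddef>

// One possible realization of the quad-interleave circular buffer:
// 4096 frames, each holding the four interleave channels contiguously,
// so one aligned 128-bit store writes all four channel samples.
constexpr std::size_t kFrames = 4096;             // modulo-4096 (2^n) length
alignas(16) static float g_buffer[kFrames * 4];   // four interleaved channels

// Write the four channel samples for the current write index with one SSE
// store (the intrinsic form of the MOVAPS store described above).
void write_frame(std::size_t write_index,
                 float ch0, float ch1, float ch2, float ch3) {
    std::size_t frame = write_index & (kFrames - 1);   // cheap modulo-4096
    __m128 v = _mm_set_ps(ch3, ch2, ch1, ch0);         // pack four 32-bit floats
    _mm_store_ps(&g_buffer[frame * 4], v);             // single aligned store
}
```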
In accordance with an embodiment, Left and Right Stereo or mono audio is de-multiplexed into four channels, based on the combination type desired for the aperture spacing. This is the continuous live streaming audio that follows the impulse transient loading.
After that, continuous, successive write addresses are generated by the buffer address control for incoming combined input samples, as well as successive read addresses for outgoing samples into the Interpolation and Processing block.
In one example buffer address calculation, the read address is determined from the write address by subtracting from it a base reference value divided by the read step size. The read step size is calculated from the slit_width input. The pass bands of the filter may be determined in part by the spacing of the read and write pointers, which represents the Infinite Impulse, or feedback, portion of an IIR filter design. The read address in this case may have both an integer and a fractional component, the latter of which is used by the interpolation and processing block.
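A hedged sketch of that arithmetic follows; the base reference value, the 1:1 mapping from slit_width to read step size, and the assumption of a power-of-two buffer length are illustrative choices rather than values specified above.

```cpp
#include <cmath>
#include <cstddef>

struct ReadAddress {
    std::size_t integer;   // whole-sample index into the circular buffer
    float       fraction;  // sub-sample remainder used by the interpolator
};

// Example read-address calculation: subtract (base_reference / read_step)
// from the write address, where the read step is derived from slit_width.
// base_reference and the slit_width mapping are illustrative values only;
// buffer_frames is assumed to be a power of two so the mask works as modulo.
ReadAddress compute_read_address(std::size_t write_address,
                                 float slit_width,
                                 float base_reference = 4096.0f,
                                 std::size_t buffer_frames = 4096) {
    float read_step = slit_width;                 // assumed 1:1 mapping
    float delay     = base_reference / read_step; // distance behind the writer
    float read_pos  = static_cast<float>(write_address) - delay;
    while (read_pos < 0.0f)                       // wrap into the circular buffer
        read_pos += static_cast<float>(buffer_frames);

    float whole = std::floor(read_pos);
    ReadAddress r;
    r.fraction = read_pos - whole;                                      // fractional part
    r.integer  = static_cast<std::size_t>(whole) & (buffer_frames - 1); // modulo wrap
    return r;
}
```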
In accordance with an embodiment, the Interpolate and Process block is used to look up and calculate a value “in between” two successive buffer values at the audio sample rate. The interpolation may be of any type, such as the well-known linear, spline, or sine(x)/x windowed interpolation. By virtue of the quad interleave buffer, and the corresponding interleaved coefficient and state-variable data structures, four simultaneous calculations may be performed at once. In addition to interpolation, the block processing includes filtering for high-pass, low-pass, or other tone shaping. The four interleave channels have differing filter types and coefficients, for musicality and enhanced stereo imaging. In addition, there may be multiple types of interpolation needed at once: one to resolve the audio sample rate range via up-sampling and down-sampling, and one to resolve the desired slit_width.
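As one concrete example of the interpolation step, a plain linear interpolation between the two buffer frames that bracket the fractional read address might look like the following (SIMD is omitted for clarity, and the frame-interleaved layout from the earlier sketch is assumed).

```cpp
#include <cstddef>

// Linear interpolation between two successive frames of a 4-channel
// interleaved buffer, using the fractional part of the read address.
// 'buffer' holds 'frames' frames of four floats each; 'frames' is assumed
// to be a power of two so the mask acts as a modulo wrap.
void interpolate_frame(const float* buffer, std::size_t frames,
                       std::size_t read_index, float fraction,
                       float out[4]) {
    std::size_t i0 = read_index & (frames - 1);        // current frame
    std::size_t i1 = (read_index + 1) & (frames - 1);  // next frame (wrapped)
    for (int ch = 0; ch < 4; ++ch) {
        float a = buffer[i0 * 4 + ch];
        float b = buffer[i1 * 4 + ch];
        out[ch] = a + fraction * (b - a);              // linear interpolation
    }
}
```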
In accordance with an embodiment, the Selection and Combination block comprises adaptive stability compensation filtering based on the desired slit_width and slit height, recombination of the four interleave inputs from the Interpolate and Process block by mixing the various audio channels together at different amplitudes, and calculation and application of the amplitude scaling coefficient based on the slit height input. Adaptive stability compensation filtering is important for maintaining stability of a recursive IIR design at relatively higher values of slit_width and slit height, which may be changing continuously in value.
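A very small sketch of the recombination and scaling is given below; the per-channel mix weights, the gain law derived from slit height, and the omission of the adaptive stability compensation itself are all simplifications assumed for illustration.

```cpp
// Recombine the four interleave channels into a stereo pair, then apply an
// output gain derived from slit_height. The mix weights and the gain law are
// placeholders, not values taken from the patent.
struct StereoSample { float left; float right; };

StereoSample combine_channels(const float in[4], float slit_height) {
    // Example mix: channels 0/1 favor the left output, 2/3 the right.
    float left  = 0.7f * in[0] + 0.3f * in[1] + 0.2f * in[2];
    float right = 0.2f * in[1] + 0.3f * in[2] + 0.7f * in[3];

    // Example amplitude scaling: higher slit_height -> lower makeup gain,
    // standing in for the stability compensation described above.
    float gain = 1.0f / (1.0f + slit_height);
    return { left * gain, right * gain };
}
```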
After interpolation and processing, the audio is multiplexed in the output mux and combination block. The output multiplexing complements both the input de-multiplexing and the selection and combination blocks to accumulate the desired output audio signal and aperture spacing character.
Pitch Dispersion Transform Filter
In applications where the incoming audio stream is comprised of frequency peaks and valleys, as in the case of sounds of a definite pitch (e.g., a voice, guitar, or piano), it can be useful to first disperse the input into a broad, noise-like spectrum before it reaches the FAA.
Each frequency aperture cell in the array separates noise components into harmonic and inharmonic frequency multiples; an incoming sound with strong frequency peaks and valleys will therefore produce large, frequency-dependent peaks and valleys in output volume, depending on how the incoming frequency bands line up with the multiple pass-band spacing within the cell (or cell array).
In the accompanying expressions:
“Pi” = 22/7 (an approximation of π);
“(12th root of two)” = 1.0594630943592952645618252949463;
“Fs” = the sample rate of the digital audio; and
“Sine[ ]” = the mathematical sine of the argument in brackets.
The section between the “OUT” and “DISP_OUT” points may simply be a HP or other digital multimode filter. The Sine function arguments may also be adjusted to the source audio and offset slightly for enhanced stereo separation.
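The dispersion formula itself is not reproduced here. Purely to illustrate the general idea of smearing a pitched input toward a broad, noise-like spectrum while keeping its transients, the sketch below chains first-order allpass sections whose coefficients are spread using the 12th-root-of-two ratio, modulates one of them slowly with a sine, and finishes with a simple one-pole high-pass output stage. Every coefficient and the overall structure are assumptions of the sketch, not the patent's algorithm.

```cpp
#include <cmath>
#include <vector>

// First-order allpass section: unity gain at all frequencies, but with a
// frequency-dependent phase shift (group delay).
struct Allpass1 {
    float a = 0.0f, x1 = 0.0f, y1 = 0.0f;
    float process(float x) {
        float y = -a * x + x1 + a * y1;   // y[n] = -a*x[n] + x[n-1] + a*y[n-1]
        x1 = x; y1 = y;
        return y;
    }
};

// Illustrative phase-dispersion chain followed by a one-pole high-pass
// (standing in for the "DISP_OUT" output section mentioned in the text).
class DispersionSketch {
public:
    explicit DispersionSketch(float sample_rate, int stages = 24) : fs_(sample_rate) {
        float a = 0.3f;
        for (int i = 0; i < stages; ++i) {
            Allpass1 ap; ap.a = a;
            chain_.push_back(ap);
            a /= 1.0594630943592953f;     // spread coefficients by 2^(1/12)
        }
    }
    float process(float x) {
        // Slow sine modulation of the first coefficient (echoing the note that
        // the Sine arguments may be adjusted / offset for stereo separation).
        phase_ += 2.0f * 3.14159265f * 0.25f / fs_;   // 0.25 Hz LFO
        if (phase_ > 2.0f * 3.14159265f) phase_ -= 2.0f * 3.14159265f;
        chain_.front().a = 0.3f + 0.05f * std::sin(phase_);
        for (auto& ap : chain_) x = ap.process(x);
        float hp = x - lp_;               // simple one-pole high-pass output
        lp_ += 0.01f * (x - lp_);
        return hp;
    }
private:
    std::vector<Allpass1> chain_;
    float fs_, phase_ = 0.0f, lp_ = 0.0f;
};
```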
Input Audio Signal
The input audio signal can consist of any audio source in any format and be read in via a file-based system or streamed audio. A file-based input may include just the raw PCM data or the PCM data along with initial states of the FAA filter parameters and/or modulation data.
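As an illustration only, a file-based input of the richer kind might bundle the raw PCM samples with initial FAA parameter state and optional modulation data. The structure below is hypothetical; the patent does not define a file format.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical container for a file-based input source: raw PCM audio plus
// optional initial FAA filter parameters and modulation data.
struct FaaCellState {
    float slit_width;     // initial frequency spacing for one cell
    float slit_height;    // initial amplitude / feedback for one cell
};

struct FaaInputFile {
    uint32_t sample_rate = 44100;            // Hz
    uint16_t channels    = 2;                // mono, stereo, or multi-channel
    std::vector<float>        pcm;           // raw PCM data (interleaved)
    std::vector<FaaCellState> initial_state; // optional: one entry per cell
    std::vector<float>        modulation;    // optional: modulation data
};
```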
Modulation of the Input Audio Source
The input audio signal itself can be subject to modulation by various methods including algorithmic means (random generators, low frequency oscillation (LFO) modulation, envelope modulation, etc.), MIDI control means (MIDI Continuous Controllers, MIDI Note messages, MIDI system messages, etc.); or physical controllers which output MIDI messages or analog voltage. Other modulation methods may be possible as well.
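As a small example of algorithmic modulation, a low-frequency oscillator could sweep a cell's slit_width around a base value; the rate, depth, and choice of modulation target below are illustrative assumptions.

```cpp
#include <cmath>

// Low-frequency oscillator sweeping a FAC's slit_width around a base value.
// Rate, depth, and the modulated target are illustrative choices only.
class SlitWidthLfo {
public:
    SlitWidthLfo(float sample_rate, float rate_hz, float depth)
        : increment_(2.0f * 3.14159265f * rate_hz / sample_rate), depth_(depth) {}

    // Returns the modulated slit_width for the next audio sample.
    float next(float base_slit_width) {
        phase_ += increment_;
        if (phase_ > 2.0f * 3.14159265f) phase_ -= 2.0f * 3.14159265f;
        return base_slit_width * (1.0f + depth_ * std::sin(phase_));
    }
private:
    float increment_, depth_, phase_ = 0.0f;
};
```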
Hybrid FAA Synthesis
Console Application
Additional Applications
The above-described systems and methods can be used in accordance with various embodiments to provide a number of different applications, including but not limited to:
The present invention may be conveniently implemented using one or more conventional general purpose or specialized digital computers or microprocessors programmed according to the teachings of the present disclosure. Appropriate software coding can readily be prepared by skilled programmers based on the teachings of the present disclosure, as will be apparent to those skilled in the software art.
In some embodiments, the present invention includes a computer program product which is a storage medium (media) having instructions stored thereon/in which can be used to program a computer to perform any of the processes of the present invention. The storage medium can include, but is not limited to, any type of disk including floppy disks, optical discs, DVD, CD-ROMs, microdrive, and magneto-optical disks, ROMs, RAMs, EPROMs, EEPROMs, DRAMs, VRAMs, flash memory devices, magnetic or optical cards, nanosystems (including molecular memory ICs), or any type of media or device suitable for storing instructions and/or data.
The foregoing description of the present invention has been provided for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, thereby enabling others skilled in the art to understand the invention for various embodiments and with various modifications that are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the following claims and their equivalents.
U.S. Patent Documents
Number | Name | Date | Kind |
---|---|---|---|
4185531 | Oberheim et al. | Jan 1980 | A |
4649783 | Strong et al. | Mar 1987 | A |
4988960 | Tomisawa | Jan 1991 | A |
5194684 | Lisle et al. | Mar 1993 | A |
5448010 | Smith, III | Sep 1995 | A |
5524057 | Akiho et al. | Jun 1996 | A |
5684260 | Van Buskirk | Nov 1997 | A |
5777255 | Smith et al. | Jul 1998 | A |
5811706 | Van Buskirk et al. | Sep 1998 | A |
5841387 | Van Buskirk | Nov 1998 | A |
5890125 | Davis et al. | Mar 1999 | A |
5917919 | Rosenthal | Jun 1999 | A |
6008446 | Van Buskirk et al. | Dec 1999 | A |
6104822 | Melanson et al. | Aug 2000 | A |
7110554 | Brennan et al. | Sep 2006 | B2 |
7359520 | Brennan et al. | Apr 2008 | B2 |
20030063759 | Brennan et al. | Apr 2003 | A1 |
20030108214 | Brennan et al. | Jun 2003 | A1 |
20040131203 | Liljeryd et al. | Jul 2004 | A1 |
20050111683 | Chabries et al. | May 2005 | A1 |
20050132870 | Sakurai et al. | Jun 2005 | A1 |
20080260175 | Elko | Oct 2008 | A1 |
20080304676 | Alves et al. | Dec 2008 | A1 |
20090220100 | Ohta et al. | Sep 2009 | A1 |
20090323976 | Asada et al. | Dec 2009 | A1 |
20100124341 | Kano | May 2010 | A1 |
20120099732 | Visser | Apr 2012 | A1 |
20120128177 | Truman et al. | May 2012 | A1 |
20120166187 | Van Buskirk et al. | Jun 2012 | A1 |
20120288124 | Fejzo et al. | Nov 2012 | A1 |
Other Publications
Karplus and Strong, “Digital Synthesis of Plucked-String and Drum Timbres,” Computer Music Journal, vol. 7, No. 2 (Summer 1983), pp. 43-55, The MIT Press.
Jaffe and Smith, “Extensions of the Karplus-Strong Plucked-String Algorithm,” Computer Music Journal, vol. 7, No. 2 (Summer 1983), pp. 56-69, The MIT Press.
Office Action dated Dec. 19, 2012, in U.S. Appl. No. 13/196,690.
Office Action dated Jul. 18, 2013, in U.S. Appl. No. 13/196,690.
Related Publications
Number | Date | Country |
---|---|---|
20120166187 A1 | Jun 2012 | US |
Provisional Applications
Number | Date | Country |
---|---|---|
61378765 | Aug 2010 | US |
61379094 | Sep 2010 | US |