This disclosure relates to audio devices and, more particularly, to audio devices that generate audio output based on audio formats such as musical instrument digital interface (MIDI).
Musical Instrument Digital Interface (MIDI) is a format for the creation, communication and playback of audio sounds, such as music, speech, tones, alerts, and the like. A device that supports the MIDI format may store sets of audio information that can be used to create various “voices.” Each voice may correspond to a particular sound, such as a musical note by a particular instrument. For example, a first voice may correspond to a middle C as played by a piano, a second voice may correspond to a middle C as played by a trombone, a third voice may correspond to a D# as played by a trombone, and so on. In order to replicate the sounds played by various instruments, a MIDI compliant device may include a set of information for voices that specify various audio characteristics, such as the behavior of a low-frequency oscillator, effects such as vibrato, and a number of other audio characteristics that can affect the perception of different sounds. Almost any sound can be defined, conveyed in a MIDI file, and reproduced by a device that supports the MIDI format.
A device that supports the MIDI format may produce a musical note (or other sound) when an event occurs that indicates that the device should start producing the note. Similarly, the device stops producing the musical note when an event occurs that indicates that the device should stop producing the note. An entire musical composition may be coded in accordance with the MIDI format by specifying events that indicate when certain voices should start and stop. In this way, the musical composition may be stored and transmitted in a compact file format according to the MIDI format.
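The start and stop events described above are encoded in the MIDI 1.0 specification as three-byte channel messages. A Note On message has status 0x9n and a Note Off message has status 0x8n, where n is the channel number, and each status byte is followed by a note number and a velocity. The sketch below builds those bytes. The helper names and the default release velocity of 64 are illustrative choices and are not part of this disclosure.

```python
MIDDLE_C = 60  # MIDI note number conventionally used for middle C


def _check(channel: int, note: int, velocity: int) -> None:
    """Reject values that do not fit in a MIDI channel message."""
    if not 0 <= channel <= 15:
        raise ValueError("channel must be 0-15")
    if not (0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("note and velocity must be 0-127")


def note_on(channel: int, note: int, velocity: int) -> bytes:
    """Event telling a synthesizer to start sounding `note` (status 0x9n)."""
    _check(channel, note, velocity)
    return bytes([0x90 | channel, note, velocity])


def note_off(channel: int, note: int, velocity: int = 64) -> bytes:
    """Event telling a synthesizer to stop sounding `note` (status 0x8n)."""
    _check(channel, note, velocity)
    return bytes([0x80 | channel, note, velocity])


start = note_on(0, MIDDLE_C, 100)
stop = note_off(0, MIDDLE_C)
print(start.hex(), stop.hex())  # 903c64 803c40
```

A MIDI file stores such messages together with timing information, which is how an entire composition is reduced to a compact list of start and stop events.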
MIDI is supported in a wide variety of devices. For example, wireless communication devices, such as radiotelephones, may support MIDI files for downloadable ringtones or other audio output. Digital music players, such as the “iPod” devices sold by Apple Computer, Inc. and the “Zune” devices sold by Microsoft Corp., may also support MIDI file formats. Other devices that support the MIDI format may include various music synthesizers such as keyboards, sequencers, voice encoders (vocoders), and rhythm machines. In addition, a wide variety of devices may also support playback of MIDI files or tracks, including wireless mobile devices, direct two-way communication devices (sometimes called walkie-talkies), network telephones, personal computers, desktop and laptop computers, workstations, satellite radio devices, intercom devices, radio broadcasting devices, hand-held gaming devices, circuit boards installed in devices, information kiosks, video game consoles, various computerized toys for children, on-board computers used in automobiles, watercraft and aircraft, and a wide variety of other devices.
A number of other types of audio formats, standards and techniques have also been developed. Other examples include standards defined by the Moving Picture Experts Group (MPEG), Windows Media Audio (WMA) standards, standards by Dolby Laboratories, Inc., and quality assurance techniques developed by THX Ltd., to name a few. Moreover, many audio coding standards and techniques continue to emerge, including the digital MP3 standard and its successors, such as the advanced audio coding (AAC) standard used in “iPod” devices. Various video coding standards may also use audio coding techniques, e.g., to code multimedia frames that include audio and video information.
In general, this disclosure describes techniques for processing audio files. The techniques may be particularly useful for playback of audio files that comply with the musical instrument digital interface (MIDI) format, although the techniques may be useful with other audio formats, techniques or standards. As used herein, the term MIDI file refers to any audio information that contains at least one audio track that conforms to the MIDI format. According to this disclosure, techniques make use of a plurality of hardware elements that operate simultaneously to service various synthesis parameters generated from one or more audio files, such as MIDI files.
In one aspect, this disclosure provides a method comprising storing audio synthesis parameters generated for one or more audio files of an audio frame, processing a first audio synthesis parameter using a first audio processing element of a hardware unit to generate first audio information, processing a second audio synthesis parameter using a second audio processing element of the hardware unit to generate second audio information, and generating audio samples for the audio frame based at least in part on a combination of the first and second audio information.
In another aspect, this disclosure provides a device comprising a memory that stores audio synthesis parameters generated for one or more audio files of an audio frame, and a hardware unit that generates audio samples for the audio frame based on the audio synthesis parameters. The hardware unit includes a first audio processing element that generates first audio information based on a first audio synthesis parameter, and a second audio processing element that generates second audio information based on a second audio synthesis parameter, wherein the hardware unit generates the audio samples based at least in part on a combination of the first and second audio information.
In another aspect, this disclosure provides a device comprising means for storing audio synthesis parameters generated for one or more audio files of an audio frame, means for processing a first audio synthesis parameter to generate first audio information, means for processing a second audio synthesis parameter to generate second audio information, and means for generating audio samples for the audio frame based at least in part on a combination of the first and second audio information.
In another aspect, this disclosure provides a computer-readable medium comprising instructions that upon execution cause one or more processors to store audio synthesis parameters generated for one or more audio files of an audio frame, process a first audio synthesis parameter using a first audio processing element of a hardware unit to generate first audio information, process a second audio synthesis parameter using a second audio processing element of the hardware unit to generate second audio information, and generate audio samples for the audio frame based at least in part on a combination of the first and second audio information.
In another aspect, this disclosure provides a circuit configured to store audio synthesis parameters generated for one or more audio files of an audio frame, process a first audio synthesis parameter using a first audio processing element of a hardware unit to generate first audio information, process a second audio synthesis parameter using a second audio processing element of the hardware unit to generate second audio information, and generate audio samples for the audio frame based at least in part on a combination of the first and second audio information.
The details of one or more aspects of this disclosure are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.
This disclosure describes techniques for processing audio files. The techniques may be particularly useful for playback of audio files that comply with the musical instrument digital interface (MIDI) format, although the techniques may be useful with other audio formats, techniques or standards that make use of synthesis parameters. As used herein, the term MIDI file refers to any audio data or file that contains at least one audio track that conforms to the MIDI format. Examples of various file formats that may include MIDI tracks include CMX, SMAF, XMF, and SP-MIDI, to name a few. CMX stands for Compact Media Extensions, developed by Qualcomm Inc. SMAF stands for the Synthetic Music Mobile Application Format, developed by Yamaha Corp. XMF stands for eXtensible Music Format, and SP-MIDI stands for Scalable Polyphony MIDI.
MIDI files, or other audio files, can be conveyed between devices within audio frames, which may include audio information or audio-video (multimedia) information. An audio frame may comprise a single audio file, multiple audio files, or possibly one or more audio files and other information such as coded video frames. Any audio data within an audio frame may be termed an audio file, as used herein, including streaming audio data or audio data in any of the file formats listed above. According to this disclosure, techniques make use of a plurality of hardware elements that operate simultaneously to service various synthesis parameters generated from one or more audio files, such as MIDI files.
The described techniques may improve processing of audio files, such as MIDI files. The techniques may separate different tasks into software, firmware, and hardware. A general purpose processor may execute software to parse audio files of an audio frame and thereby identify timing parameters, and to schedule events associated with the audio files. The scheduled events can then be serviced by a DSP in a synchronized manner, as specified by timing parameters in the audio files. The general purpose processor dispatches the events to the DSP in a time-synchronized manner, and the DSP processes the events according to the time-synchronized schedule in order to generate synthesis parameters. The DSP then schedules processing of the synthesis parameters in a hardware unit, and the hardware unit can generate audio samples based on the synthesis parameters.
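The scheduling role of the general purpose processor can be sketched as grouping timestamped events by the audio frame in which they fall, so that each frame's events can be dispatched to the DSP in time order. This is a minimal sketch under stated assumptions: the 10 ms frame length, the `Event` fields and the `schedule` function are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

FRAME_MS = 10  # assumed audio frame length in milliseconds


@dataclass(frozen=True)
class Event:
    time_ms: float  # event time measured from the start of playback
    kind: str       # e.g. "note_on" or "note_off" (hypothetical labels)
    note: int       # MIDI note number


def schedule(events, frame_ms=FRAME_MS):
    """Group events by frame index, keeping them in time order within each frame."""
    frames = defaultdict(list)
    for ev in sorted(events, key=lambda e: e.time_ms):
        frames[int(ev.time_ms // frame_ms)].append(ev)
    return dict(frames)
```

The processor could then walk the frame indices in ascending order and hand each frame's event list to the DSP at the corresponding point in time.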
The synthesis parameters generated by the DSP can be stored in memory prior to processing by the hardware unit. According to this disclosure, the hardware unit includes a plurality of processing elements that operate simultaneously to service the different synthesis parameters. A first audio processing element, for example, processes a first audio synthesis parameter to generate first audio information. A second audio processing element processes a second audio synthesis parameter to generate second audio information. Audio samples can then be generated based at least in part on a combination of the first and second audio information. The different processing elements may each comprise an arithmetic logic unit that supports operations such as multiply, add and accumulate. In addition, each processing element may also support hardware specific operations for loading and/or storing to other hardware components such as a low frequency oscillator, a waveform fetch unit, and a summing buffer.
Alternatively, the tasks associated with MIDI file processing can be delegated between two different threads of a DSP and the dedicated hardware. That is to say, the tasks associated with the general purpose processor (as described herein) could alternatively be executed by a first thread of a multi-threaded DSP. In this case, the first thread of the DSP executes the scheduling, a second thread of the DSP generates the synthesis parameters, and the hardware unit generates audio samples based on the synthesis parameters. This alternative example may also be pipelined in a manner similar to the example that uses a general purpose processor for the scheduling.
The various components illustrated in
As illustrated in the example of
Device 4 may implement an architecture that separates audio processing tasks between software, hardware and firmware. As shown in
Once DSP 12 has generated the synthesis parameters, these synthesis parameters can be stored in memory unit 10. Memory unit 10 may comprise volatile or non-volatile storage. In order to support quick data transfer, memory unit 10 may comprise random access memory (RAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), FLASH memory, or the like. In any case, the synthesis parameters stored in memory unit 10 can be serviced by audio hardware unit 14 to generate audio samples.
In accordance with this disclosure, audio hardware unit 14 generates audio samples based on the synthesis parameters. To do so, audio hardware unit 14 may include a number of hardware components that can help to process the synthesis parameters in a fast and efficient manner. For example, according to this disclosure, audio hardware unit 14 includes a plurality of audio processing elements that operate simultaneously to service the different synthesis parameters. A first audio processing element, for example, processes a first audio synthesis parameter to generate first audio information while a second audio processing element processes a second audio synthesis parameter to generate second audio information. Audio hardware unit 14 can then generate audio samples based at least in part on a combination of the first and second audio information generated by the different audio processing elements.
The different processing elements in audio hardware unit 14 may each comprise an arithmetic logic unit (ALU) that supports operations such as multiply, add and accumulate. In addition, each processing element may also support hardware specific operations for loading and/or storing to other hardware components. The other hardware components in audio hardware unit 14, for example, may comprise a low frequency oscillator (LFO), a waveform fetch unit (WFU), and a summing buffer (SB). Thus, the processing elements in audio hardware unit 14 may support and execute instructions for interacting with and using these other hardware components in the audio processing. Additional details of one example of audio hardware unit 14 are provided below with reference to
In some cases, the processing of audio files by device 4 may be pipelined. For example, processor 8, DSP 12 and audio hardware unit 14 may operate simultaneously with respect to successive audio frames. Each of the audio frames may correspond to a block of time, e.g., a 10 millisecond (ms) interval, that includes many coded audio samples. Digital output of hardware unit 14, for example, may include 480 digital audio samples per audio frame, which can be converted into an analog audio signal by digital-to-analog converter 16. Many events may correspond to one instance of time so that many different sounds or notes can be included in one instance of time according to the MIDI format or similar audio format. Of course, the amount of time delegated to any audio frame and the number of audio samples defined in one frame may vary in different implementations.
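As a quick arithmetic check on the example figures (10 ms frames of 480 samples), the implied output sampling rate is:

$$
f_s = \frac{480\ \text{samples}}{10\ \text{ms}} = \frac{480\ \text{samples}}{0.010\ \text{s}} = 48\,000\ \text{samples/s} = 48\ \text{kHz}
$$

Other frame lengths or sampling rates change these numbers; for instance, a 10 ms frame at 44.1 kHz would hold 441 samples.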
In some cases, audio samples generated by audio hardware unit 14 are delivered back to DSP 12, e.g., via interrupt-driven techniques. In this case, DSP 12 may also perform post processing on the audio samples. The post processing may include filtering, scaling, volume adjustment, or a wide variety of audio post processing that may ultimately enhance the sound output. Digital-to-analog converter (DAC) 16 then converts the audio samples into analog signals, which can be used by drive circuit 18 to drive speakers 19A and 19B for output of audio sounds to a user.
Memory 10 may be structured such that processor 8, DSP 12 and audio hardware unit 14 can access any information needed to perform the various tasks delegated to these different components. In some cases, the storage layout of MIDI information in memory 10 may be arranged to allow for efficient access from the different components 8, 12 and 14. Again, memory 10 is used to store (among other things) the synthesis parameters associated with one or more audio files. Once DSP 12 generates these synthesis parameters, they can be processed by hardware unit 14, as explained herein, to generate audio samples. The audio samples generated by audio hardware unit 14 may comprise pulse-code modulation (PCM) samples, which are digital representations of an analog signal wherein the analog signal is sampled at regular intervals. Additional details of exemplary audio generation by audio hardware unit 14 are discussed in greater detail below with reference to
In addition, audio hardware unit 20 may include a coordination module 32. Coordination module 32 coordinates data flows within audio hardware unit 20. When audio hardware unit 20 receives an instruction from DSP 12 (
At the direction of coordination module 32, synthesis parameters may be loaded from memory 10 (
The instructions loaded into program RAM unit 44A or 44N instruct the associated processing element 34A or 34N to synthesize one of the voices indicated in the list of synthesis parameters in VPS RAM unit 46A or 46N. There may be any number of processing elements 34A-34N (collectively “processing elements 34”), and each may comprise one or more ALUs that are capable of performing mathematical operations, as well as one or more units for reading and writing data. Only two processing elements 34A and 34N are illustrated for simplicity, but many more may be included in hardware unit 20. Processing elements 34 may synthesize voices in parallel with one another. In particular, the plurality of different processing elements 34 work in parallel to process different synthesis parameters. In this manner, a plurality of processing elements 34 within audio hardware unit 20 can accelerate and possibly improve the generation of audio samples.
When coordination module 32 instructs one of processing elements 34 to synthesize a voice, the respective processing element may execute one or more instructions associated with the synthesis parameters. Again, these instructions may be loaded into program RAM unit 44A or 44N. The instructions loaded into program RAM unit 44A or 44N cause the respective one of processing elements 34 to perform voice synthesis. For example, processing elements 34 may send requests to a waveform fetch unit (WFU) 36 for a waveform specified in the synthesis parameters. Each of processing elements 34 may use WFU 36. An arbitration scheme may be used to resolve any conflicts if two or more processing elements 34 request use of WFU 36 at the same time.
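The disclosure leaves the arbitration scheme for WFU 36 open. One common choice is round-robin arbitration, in which priority rotates after each grant so that no processing element is starved. The sketch below assumes that scheme and a model in which one request is granted per cycle; the class and method names are hypothetical.

```python
class RoundRobinArbiter:
    """Grant a shared unit (e.g. a waveform fetch unit) to one requester per cycle."""

    def __init__(self, num_requesters: int):
        self.n = num_requesters
        self.last = self.n - 1  # so requester 0 has top priority on the first cycle

    def grant(self, requests):
        """`requests` is the set of requester ids asking this cycle.

        Returns the granted id, or None if nobody is requesting. The search
        starts just after the most recent winner, which rotates priority.
        """
        for offset in range(1, self.n + 1):
            candidate = (self.last + offset) % self.n
            if candidate in requests:
                self.last = candidate
                return candidate
        return None
```

With two processing elements repeatedly requesting at the same time, the grants alternate between them rather than always favoring the lower-numbered element.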
In response to a request from one of processing elements 34, WFU 36 returns one or more waveform samples to the requesting processing element. However, because the position at which a waveform must be read can fall between two stored samples (i.e., the wave may be phase shifted by a fraction of a sample period), WFU 36 may return two samples so that the requesting processing element can compensate for the phase shift using interpolation. Furthermore, because a stereo signal may include two separate waves for the two stereophonic channels, WFU 36 may return separate samples for each channel, e.g., resulting in up to four separate samples for stereo output.
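The two-sample fetch can be illustrated with linear interpolation, one simple way to compensate for a fractional read position; the disclosure does not specify which interpolation method is used. The sketch assumes the stored waveform loops, so the sample after the last one is the first, and the function names are hypothetical.

```python
import math


def fetch_interpolated(wave, phase):
    """Value of a looping waveform at fractional sample index `phase`.

    Reads the two stored samples that bracket `phase` and blends them
    linearly, which is why a fetch may need to return two samples.
    """
    n = len(wave)
    base = math.floor(phase)
    frac = phase - base          # fractional part, 0 <= frac < 1
    s0 = wave[base % n]
    s1 = wave[(base + 1) % n]    # neighbouring sample (wraps around the loop)
    return s0 + frac * (s1 - s0)


def fetch_stereo(left, right, phase):
    """Stereo fetch: two samples per channel, so up to four stored samples."""
    return fetch_interpolated(left, phase), fetch_interpolated(right, phase)
```

For example, reading the waveform [0.0, 1.0, 0.0, -1.0] at phase 1.25 blends 1.0 and 0.0 to give 0.75.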
After WFU 36 returns audio samples to one of processing elements 34, the respective processing element may execute additional program instructions based on the synthesis parameters. In particular, the instructions may cause one of processing elements 34 to request an asymmetric triangular wave from a low frequency oscillator (LFO) 38 in audio hardware unit 20. By multiplying a waveform returned by WFU 36 with a triangular wave returned by LFO 38, the respective processing element may manipulate various sonic characteristics of the waveform to achieve a desired audio effect. For example, multiplying a waveform by a triangular wave may result in a waveform that sounds more like a desired musical instrument.
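A minimal sketch of the LFO step, assuming the LFO produces an asymmetric triangle normalized to [0, 1] and that the processing element applies it as a sample-by-sample gain (amplitude modulation). The `rise_fraction` parameter, meaning the share of each period spent rising, and the function names are assumptions for illustration.

```python
def asymmetric_triangle(num_samples, period, rise_fraction):
    """Asymmetric triangle in [0, 1]: rises for `rise_fraction` of each period, then falls."""
    if not 0 < rise_fraction < 1:
        raise ValueError("rise_fraction must be strictly between 0 and 1")
    out = []
    for k in range(num_samples):
        p = (k % period) / period  # position within the current cycle, 0 <= p < 1
        if p < rise_fraction:
            out.append(p / rise_fraction)                  # rising edge
        else:
            out.append((1 - p) / (1 - rise_fraction))      # falling edge
    return out


def apply_lfo(wave, lfo):
    """Multiply a fetched waveform by the LFO output, sample by sample."""
    return [w * g for w, g in zip(wave, lfo)]
```

Because an LFO runs far below audio rates, in practice its period would span many output samples, so the product changes the waveform's loudness or timbre slowly over time.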
Other instructions executed based on the synthesis parameters may cause a respective one of processing elements 34 to loop the waveform a specific number of times, adjust the amplitude of the waveform, add reverberation, add a vibrato effect, or cause other effects. In this way, processing elements 34 can calculate a waveform for a voice that lasts one MIDI frame. Eventually, a respective processing element may encounter an exit instruction. When one of processing elements 34 encounters an exit instruction, that processing element signals the end of voice synthesis to coordination module 32. The calculated voice waveform can be provided to summing buffer 40 by a store instruction executed as part of the program instructions, which causes summing buffer 40 to store that calculated voice waveform.
When summing buffer 40 receives a calculated waveform from one of processing elements 34, summing buffer 40 adds the calculated waveform to the proper instance of time associated with an overall waveform for a MIDI frame. Thus, summing buffer 40 combines output of the plurality of processing elements 34. For example, summing buffer 40 may initially store a flat wave (i.e., a wave where all digital samples are zero). When summing buffer 40 receives audio information such as a calculated waveform from one of processing elements 34, summing buffer 40 can add each digital sample of the calculated waveform to respective samples of the waveform stored in summing buffer 40. In this way, summing buffer 40 accumulates and stores an overall digital representation of a waveform for a full audio frame.
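The accumulate-into-a-zeroed-buffer behavior described above can be modeled as follows. This is a sketch: the class name is hypothetical, and the `offset` parameter, which places a voice at a given sample position within the frame, and the choice to drop samples that fall past the end of the frame are simplifying assumptions.

```python
class SummingBuffer:
    """Accumulates calculated voice waveforms into one frame of samples."""

    def __init__(self, frame_len: int):
        self.samples = [0.0] * frame_len  # start from a flat (all-zero) wave

    def add(self, voice, offset: int = 0):
        """Add each sample of `voice` to the frame, starting at sample `offset`."""
        for k, v in enumerate(voice):
            idx = offset + k
            if 0 <= idx < len(self.samples):  # ignore samples outside this frame
                self.samples[idx] += v
```

Because addition is commutative, the final contents do not depend on the order in which processing elements deliver their voices.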
Summing buffer 40 essentially sums different audio information from different ones of processing elements 34. The different audio information is indicative of different instances of time associated with different generated voices. In this manner, summing buffer 40 creates audio samples representative of an overall audio compilation within a given audio frame.
Processing elements 34 may operate in parallel with one another, yet independently. That is to say, each of processing elements 34 may process a synthesis parameter, and then move on to the next synthesis parameter once the audio information generated for that synthesis parameter is added to summing buffer 40. Thus, each of processing elements 34 performs its processing tasks for one synthesis parameter independently of the other processing elements 34, and when the processing for a synthesis parameter is complete, that processing element becomes immediately available to process another synthesis parameter.
Eventually, coordination module 32 may determine that processing elements 34 have completed synthesizing all of the voices required for the current audio frame and have provided those voices to summing buffer 40. At this point, summing buffer 40 contains digital samples indicative of a completed waveform for the current audio frame. When coordination module 32 makes this determination, coordination module 32 sends an interrupt to DSP 12 (
Cache memory 48, WFU/LFO memory 39 and linked list memory 42 are also shown in
In accordance with this disclosure, any number of processing elements 34 may be included in audio hardware unit 20 provided that a plurality of processing elements 34 operate simultaneously with respect to different synthesis parameters stored in memory 10 (
Processing elements 34 may process all of the synthesis parameters for an audio frame. After processing each respective synthesis parameter, the respective one of processing elements 34 adds its processed audio information into the accumulation in summing buffer 40, and then moves on to the next synthesis parameter. In this way, processing elements 34 work collectively to process all of the synthesis parameters generated for one or more audio files of an audio frame. Then, after the audio frame is processed and the samples in summing buffer 40 are sent to DSP 12 for post processing, processing elements 34 can begin processing the synthesis parameters for the audio files of the next audio frame.
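The work distribution described above, in which each processing element independently takes the next unprocessed synthesis parameter and adds its result into the summing buffer, can be modeled in software with a shared work queue. The sketch below uses Python threads to stand in for processing elements and a lock to serialize accumulation. `render_frame` and the `synthesize` callback are hypothetical names, and Python threads model the scheduling behavior, not the speed, of parallel hardware.

```python
import queue
import threading


def render_frame(params, num_elements, frame_len, synthesize):
    """Render one frame by letting `num_elements` workers drain a shared queue.

    `synthesize(param, frame_len)` must return a list of `frame_len` samples.
    """
    work = queue.Queue()
    for p in params:
        work.put(p)
    summing = [0.0] * frame_len      # summing buffer starts as a flat wave
    lock = threading.Lock()          # serializes accumulation into the buffer

    def element():
        while True:
            try:
                p = work.get_nowait()
            except queue.Empty:
                return               # no synthesis parameters left for this frame
            voice = synthesize(p, frame_len)
            with lock:
                for k, v in enumerate(voice):
                    summing[k] += v  # add this voice, then take the next parameter

    workers = [threading.Thread(target=element) for _ in range(num_elements)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()                     # frame is complete once every element has exited
    return summing
```

For instance, with a toy `synthesize` that returns a constant "voice" of the given amplitude, rendering amplitudes 0.25, 0.5 and 0.25 with two elements yields a frame of all 1.0 samples, regardless of which element handled which parameter.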
Again, first audio processing element 34A processes a first audio synthesis parameter to generate first audio information while second audio processing element 34N processes a second audio synthesis parameter to generate second audio information. Next, first audio processing element 34A may process a third audio synthesis parameter to generate third audio information while second audio processing element 34N processes a fourth audio synthesis parameter to generate fourth audio information. Summing buffer 40 can combine the first, second, third and fourth audio information in the creation of one or more audio samples.
Decoder 55 is coupled to a respective one of program RAM units 44A or 44B (shown in
ALU 54 may support one or more multiply operations, one or more add operations, and one or more accumulate operations. ALU 54 can execute these operations in the processing of synthesis parameters. These basic operations may form a fundamental set of logic operations typically needed to service synthesis parameters. These basic operations, however, may also provide flexibility to processing element 50 such that it can be used for other purposes unrelated to synthesis parameter processing.
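To make the multiply, add and accumulate operations concrete, the sketch below interprets a tiny, hypothetical instruction list on a register file, roughly as a decoder might dispatch instructions to an ALU. The opcode names, tuple encoding and register names are invented for illustration; the disclosure does not define an instruction encoding.

```python
def run_element(program, regs):
    """Execute a hypothetical instruction list against a dict of registers.

    Supported ops (illustrative only):
      ("mul", d, a, b)   -> regs[d] = regs[a] * regs[b]
      ("add", d, a, b)   -> regs[d] = regs[a] + regs[b]
      ("mac", acc, a, b) -> regs[acc] += regs[a] * regs[b]
      ("exit",)          -> stop executing and return the registers
    """
    for instr in program:
        op = instr[0]
        if op == "mul":
            _, d, a, b = instr
            regs[d] = regs[a] * regs[b]
        elif op == "add":
            _, d, a, b = instr
            regs[d] = regs[a] + regs[b]
        elif op == "mac":
            _, acc, a, b = instr
            regs[acc] += regs[a] * regs[b]
        elif op == "exit":
            return regs
        else:
            raise ValueError(f"unknown op {op!r}")
    return regs
```

For example, two `mac` instructions can mix two samples with per-voice gains into one accumulator, which is the kind of arithmetic that dominates synthesis parameter processing.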
Load/store logic 56 supports one or more loading operations and one or more storing operations associated with a specific audio format. Load/store logic 56 can execute these load and store operations in the processing of synthesis parameters. In this manner, load/store logic 56 facilitates the use of other hardware for that specific audio format via loading and storing operations to such hardware. As an example, for the MIDI format, load/store logic 56 may support separate operations for a low frequency oscillator such as LFO 38 (
A plurality of different processing elements 34 then simultaneously process different synthesis parameters (82A, 82B and 82C). In particular, a first synthesis parameter is processed in a first processing element 34A (82A), a second synthesis parameter is processed in a second processing element (not shown in
Any number of processing elements 34 may be used. Any time that one of processing elements 34 finishes the respective processing and encounters exit and store instructions, the generated audio information associated with that processing element is accumulated in summing buffer 40 (83). In this manner, accumulation is used to generate audio samples in summing buffer 40. If more synthesis parameters exist for the audio frame (yes branch of 84), the respective processing element 34 then processes the next synthesis parameter (82A, 82B or 82C). This process continues until all of the synthesis parameters for the audio frame are serviced (no branch of 84). At this point, summing buffer 40 outputs the audio samples for the audio frame (85). For example, coordination module 32 may send an interrupt command to DSP 12 (
DSP 12 then processes the MIDI events according to the timing defined by processor 8 to generate synthesis parameters (63). The synthesis parameters generated by DSP 12 can be stored in memory (64). At this point, processing elements 34 simultaneously process different synthesis parameters (65A, 65B and 65C). Any time that one of processing elements 34 finishes the respective processing, the generated audio information associated with that processing element is combined with an accumulation in summing buffer 40 (66) to generate audio samples. If more synthesis parameters exist for the audio frame (yes branch of 67), the respective processing element 34 then processes the next synthesis parameter (65A, 65B or 65C). This process continues until all of the synthesis parameters for the audio frame are processed (no branch of 67). At this point, summing buffer 40 outputs the audio samples for the audio frame (68). For example, coordination module 32 may send an interrupt command to DSP 12 (
DSP 12 performs post processing on the audio samples (69). The post processing may include filtering, scaling, volume adjustment, or a wide variety of audio post processing that may ultimately enhance the sound output. Following the post processing, DSP 12 may output the post processed audio samples to DAC 16, which converts the digital audio samples into an analog signal (70). The output of DAC 16 may be provided to drive circuit 18, which amplifies the signal to drive one or more speakers 19A and 19B to create audible sound that is output to the user (71).
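A minimal sketch of two of the post processing steps named above: volume adjustment followed by conversion to 16-bit PCM for the DAC. It assumes the summing buffer holds floating-point samples nominally in [-1.0, 1.0] and that out-of-range values are clipped (saturated) rather than wrapped; both are assumptions, not details from the disclosure, and the function name is hypothetical.

```python
def post_process(samples, volume):
    """Scale float samples by `volume` and convert them to saturated 16-bit PCM."""
    out = []
    for s in samples:
        v = round(s * volume * 32767)          # scale to the signed 16-bit range
        out.append(max(-32768, min(32767, v)))  # clip instead of wrapping on overflow
    return out
```

Saturating matters because many voices summed in one frame can exceed full scale; wrapping an overflowed value would produce a loud click, while clipping only flattens the peak.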
In some cases, the processing by processor 8, DSP 12 and processing elements 34 of hardware unit 20 may be pipelined. That is to say, when processing elements 34 are processing the synthesis parameters for frame N, DSP 12 may be generating synthesis parameters for frame N+1 and processor 8 may be scheduling events for frame N+2. Although not shown in
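The three-stage pipeline can be tabulated by having each stage lag the one before it by one frame: the processor schedules events first, the DSP then generates synthesis parameters, and the hardware processing elements render samples last. The sketch below assumes each stage handles exactly one frame per time step; the function name is hypothetical.

```python
def pipeline_schedule(num_frames):
    """For each time step, report which frame each stage is working on (None = idle)."""
    stages = ("processor", "dsp", "hardware")
    steps = []
    for t in range(num_frames + len(stages) - 1):
        row = {}
        for depth, stage in enumerate(stages):
            frame = t - depth  # each stage lags the previous stage by one frame
            row[stage] = frame if 0 <= frame < num_frames else None
        steps.append(row)
    return steps
```

At step 2 of a three-frame run, for example, the hardware works on frame 0 while the DSP handles frame 1 and the processor schedules frame 2, so all three components are busy at once.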
Various examples have been described. One or more aspects of the techniques described herein may be implemented in hardware, software, firmware, or combinations thereof. Any features described as modules or components may be implemented together in an integrated logic device or separately as discrete but interoperable logic devices. If implemented in software, one or more aspects of the techniques may be realized at least in part by a computer-readable medium comprising instructions that, when executed, perform one or more of the methods described above. The computer-readable data storage medium may form part of a computer program product, which may include packaging materials. The computer-readable medium may comprise random access memory (RAM) such as synchronous dynamic random access memory (SDRAM), read-only memory (ROM), non-volatile random access memory (NVRAM), electrically erasable programmable read-only memory (EEPROM), FLASH memory, magnetic or optical data storage media, and the like. The techniques additionally, or alternatively, may be realized at least in part by a computer-readable communication medium that carries or communicates code in the form of instructions or data structures and that can be accessed, read, and/or executed by a computer.
The instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term “processor,” as used herein may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated software modules or hardware modules configured or adapted to perform the techniques of this disclosure.
If implemented in hardware, one or more aspects of this disclosure may be directed to a circuit, such as an integrated circuit, chipset, ASIC, FPGA, logic, or various combinations thereof configured or adapted to perform one or more of the techniques described herein. The circuit may include both the processor and one or more hardware units, as described herein, in an integrated circuit or chipset.
It should also be noted that a person having ordinary skill in the art will recognize that a circuit may implement some or all of the functions described above. There may be one circuit that implements all the functions, or there may also be multiple sections of a circuit that implement the functions. With current mobile platform technologies, an integrated circuit may comprise at least one DSP, and at least one Advanced Reduced Instruction Set Computer (RISC) Machine (ARM) processor to control and/or communicate with the DSP or DSPs. Furthermore, a circuit may be designed or implemented in several sections, and in some cases, sections may be re-used to perform the different functions described in this disclosure.
Various aspects and examples have been described. However, modifications can be made to the structure or techniques of this disclosure without departing from the scope of the following claims. For example, other types of devices could also implement the processing techniques described herein. Also, although the exemplary hardware unit 20 shown in
Claim of Priority under 35 U.S.C. §119
The present Application for Patent claims priority to Provisional Application No. 60/896,462 entitled “AUDIO PROCESSING HARDWARE ELEMENTS” filed Mar. 22, 2007, and assigned to the assignee hereof and hereby expressly incorporated by reference herein.
Number | Date | Country
---|---|---
60896462 | Mar 2007 | US