This invention generally relates to the field of digital audio signal processing. In particular, the invention is directed to digital audio signal programming and processing in the simulation of sounds moving through three dimensional (3D) space within a multimedia application.
3D positional audio in multimedia applications uses signal processing to localize a single sound to a specific location in three dimensional space around the listener. 3D positional audio is the most common audio effect used in multimedia applications such as interactive games because a sound effect, such as the sound of an opponent's automobile, can be localized to a specific position. This position, for instance, could be behind the listener and quickly moving around the listener's left side while all the other sounds are positioned separately.
One of the reasons that 3D positional audio is so popular in action video games is that it can be interactive. Sounds do not have to be preprocessed during the game's development to position them. As the listener changes location in a virtual world, all the sound objects can maintain their correct location, speed, and path of motion around the listener as the action unfolds.
3D positional audio generally refers to a system in which multimedia applications can use application programming interfaces (APIs) to set the position of sounds in 3D space. A “Head-Related Transfer Function” (HRTF) is one mechanism for achieving this: HRTF is a method by which sounds are processed to localize them in space around the player or user. Another mechanism might be to surround the user with speakers. Although the HRTF technique is acceptable for 3D positioning, it requires a large amount of processing power, which is the reason 3D audio hardware accelerators are becoming so common in personal computers (PCs).
Developers of multimedia applications, such as interactive video games that have 3D audio, generally use a 3D audio application programming interface (API) that interfaces with lower-level 3D audio rendering routines and/or the audio hardware accelerator to include 3D audio capability. An API is a series of software routines and development tools that comprise an interface between a computer application and lower-level services and functions (e.g., the operating system, device drivers, and other low-level software). APIs serve as building blocks for programmers putting together software applications. For example, in the case of interactive multimedia applications having 3D audio, developers may use 3D audio APIs such as the Microsoft® DirectSound3D® API, Environmental Audio Extensions (EAX®), and Aureal® 3D (A3D®). These, in turn, may rely on lower-level audio rendering APIs.
However, in many common 3D audio APIs, the hardware resources, raw audio data, and 3D audio positional parameters are all encapsulated in a single monolithic 3D buffer object. Also, the 3D audio sound-source object within a 3D audio API may tie 3D positional parameters to a given audio voice. By coupling 3D parameters to rendering resources, these designs inherently tie 3D audio positional algorithms to the underlying rendering API, restricting a multimedia application developer's ability to modify such functionality to suit their needs.
Thus, there is a need for systems and methods for 3D audio programming and processing that do not tie 3D audio positional algorithms to the underlying audio rendering API, and that provide more transparency and flexibility to application developers by allowing them to alter the way geometry calculations behave independently of the low-level digital signal processing (DSP) implementation.
The invention is directed to systems and methods for 3D audio programming and processing. In particular, a method is described for three dimensional (3D) audio processing comprising calculating digital signal processing (DSP) settings for 3D audio effects for a digital audio signal independently of the DSP rendering of the digital audio signal. The act of calculating DSP settings may comprise receiving coordinates of locations within a 3D environment representing at least one sound source and at least one audio listener, and calculating the DSP settings for 3D audio effects based on the distances between the at least one sound source and the at least one audio listener. The method may further comprise receiving at least one parameter relating to the audio behavior of the at least one sound source in relation to the at least one listener within the 3D environment, and calculating the DSP settings for 3D audio effects based on the at least one parameter received and the distances between the at least one sound source and the at least one audio listener. The DSP settings may then be communicated to a multimedia application engine or an audio rendering application programming interface (API).
Additional features of the invention will be made apparent from the following detailed description of illustrative embodiments that proceeds with reference to the accompanying drawings.
The foregoing summary, as well as the following detailed description of illustrative embodiments, is better understood when read in conjunction with the appended drawings. For the purpose of illustrating the invention, there is shown in the drawings illustrative embodiments of the invention; however, the invention is not limited to the specific methods and instrumentalities disclosed. In the drawings:
Referring first to
Due to the interactive aspect of many multimedia applications such as computer games, it is desirable to be able to render multiple sounds within a scene, often dozens or more at a time. Though it is unlikely that the listener will be able to hear and locate more sounds than that at one time, which sounds will be required at any instant is probably impossible to predict in any well-written interactive multimedia application. Therefore, the system must be rendering multiple sounds at all times, even if some are currently playing at lower volumes than other sounds and are therefore momentarily inaudible. The more sources that can be rendered at once, the better an interactive audio rendering engine can sustain the illusion of a realistic sound environment, and the more layers of sound, the closer it approaches realism. Accordingly, each point on the 3D coordinate system of
Referring next to
Referring next to
Referring next to
A multimedia application using the geometry information module 21 will treat 3D mathematical computations (using the 3D emitter and listener parameters 29) as separate functionality, independent of the rendering of an audio voice (voice) performed at the level of the audio rendering application programming interface (API) 23. Instead of creating a voice with 3D properties, the multimedia application engine geometry information module will create a generic “voice” for rendering, represented by the DSP settings 27. The generic voice has no integrated 3D properties, only the various DSP settings 27 representing its signal processing capabilities, such as matrix coefficients, delays, filter coefficients, and reverb send levels, for example.
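By way of illustration only, the following C++ sketch suggests one possible shape for such a decoupling; the names and members shown (DspSettings, GenericVoice) are hypothetical and do not correspond to any particular rendering API.

    #include <vector>

    // Hypothetical container for the signal processing settings computed
    // by the geometry information module 21. It carries no 3D state of
    // its own; it only describes how a generic voice should be rendered.
    struct DspSettings {
        std::vector<float> matrixCoefficients; // per channel-pair gains (the "matrix")
        float delaySeconds      = 0.0f;        // delay applied to the voice
        float filterCoefficient = 1.0f;        // e.g., a low-pass filter control
        float reverbSendLevel   = 0.0f;        // send level into a reverb effect
    };

    // Hypothetical generic voice: it exposes signal processing
    // capabilities but has no integrated 3D properties; 3D behavior is
    // expressed entirely through the DspSettings handed to it.
    class GenericVoice {
    public:
        void Apply(const DspSettings& settings) { current_ = settings; }
    private:
        DspSettings current_; // settings most recently supplied by the geometry module
    };

Because the voice accepts only generic DSP settings, the geometry calculations that produce those settings can be replaced or modified by the application developer without any change to the rendering layer.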
For each sound position, the multimedia application engine geometry information module 21 will create an audio emitter, such as those emitters 4, 5, 7, 9, 11 depicted in
Referring next to
Referring next to
As mentioned above, for each sound position, the multimedia application engine geometry information module 21 will create 31 an audio emitter, such as those emitters 4, 5, 7, 9, 11 depicted in
The library routines of the DSP settings generator 25 may use explicit piecewise curves made up of linear segments to directly define DSP behavior with respect to distance. This allows sound designers to better visualize and more accurately control 3D audio processing on a per-emitter basis. The piecewise curves could also be nonlinear, and the curves could be described algorithmically rather than as a table of line segments. Below are a few examples of such curves that may be used; however, these are not all-inclusive of the curves that may be used to define DSP behavior. Any variety of curves with varying shapes and applicability to audio behavior may be used instead of or in addition to the examples provided herein. Also, the curves can have any number of points, be user-definable, be modified dynamically, and be shared among many emitters to avoid wasting memory with redundant parameter structures.
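Before turning to those examples, the following C++ sketch illustrates how such a table of linear segments might be evaluated; the CurvePoint layout, the normalized-distance convention, and the clamping behavior at the table edges are assumptions made for this illustration only.

    #include <cstddef>
    #include <vector>

    // One point on a distance curve: a normalized distance (0..1 over
    // the emitter's audible range) and the parameter value there.
    struct CurvePoint {
        float distance; // normalized distance; non-decreasing across the table
        float value;    // DSP parameter value (e.g., a volume scale) at that distance
    };

    // Evaluate a piecewise linear curve at the given normalized distance.
    // Points are assumed to be sorted by distance; distances outside the
    // table clamp to the first or last point.
    float EvaluateCurve(const std::vector<CurvePoint>& curve, float distance) {
        if (curve.empty()) return 1.0f; // no curve: leave the parameter unscaled
        if (distance <= curve.front().distance) return curve.front().value;
        if (distance >= curve.back().distance)  return curve.back().value;
        for (std::size_t i = 1; i < curve.size(); ++i) {
            if (distance <= curve[i].distance) {
                const float span = curve[i].distance - curve[i - 1].distance;
                const float t = (span > 0.0f)
                    ? (distance - curve[i - 1].distance) / span : 0.0f;
                return curve[i - 1].value + t * (curve[i].value - curve[i - 1].value);
            }
        }
        return curve.back().value; // unreachable with a sorted table
    }

Because the table is passed by reference, a single curve instance can be shared among many emitters, consistent with the memory-sharing behavior described above.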
Referring next to
Referring next to
Referring next to
Referring next to
The audio processing system of
An example of the use of positional multi-channel sounds would be modeling more realistic audio for a car. One could have mono waves for the tires at the corners, combined with a stereo wave for the mufflers at the rear and another stereo wave for the engine sounds coming from the front. Changes to listener or emitter orientation would cause the entire sound field to rotate appropriately with respect to the listener. Such functionality allows multimedia application sound designers to more easily create complex audio environments without the extra work and runtime overhead required to break everything down into monaural points.
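A minimal sketch of how the car example above might be laid out, assuming a hypothetical ChannelEmitter structure; the offsets are illustrative values in the car's local coordinate frame, not values prescribed by the design.

    // Per-channel emitter offsets for the car example: four mono tire
    // emitters at the corners, a stereo muffler pair at the rear, and a
    // stereo engine pair at the front. Rotating the car (or the listener)
    // rotates the whole field of offsets together.
    struct Vec3 { float x, y, z; };

    struct ChannelEmitter {
        const char* name;
        Vec3        localOffset; // position relative to the car's origin (+z is forward)
    };

    static const ChannelEmitter kCarEmitters[] = {
        {"tireFrontLeft",  {-1.0f, 0.0f,  2.0f}},
        {"tireFrontRight", { 1.0f, 0.0f,  2.0f}},
        {"tireRearLeft",   {-1.0f, 0.0f, -2.0f}},
        {"tireRearRight",  { 1.0f, 0.0f, -2.0f}},
        {"mufflerLeft",    {-0.5f, 0.0f, -2.2f}},
        {"mufflerRight",   { 0.5f, 0.0f, -2.2f}},
        {"engineLeft",     {-0.5f, 0.5f,  2.2f}},
        {"engineRight",    { 0.5f, 0.5f,  2.2f}},
    };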
Note that a multi-channel source does not necessarily imply a multipoint source. A 5.1 wave for ambience might be authored to correspond to specific speaker locations, and dynamic orientation changes are not always desired. If a multi-channel source is sent position coordinates, the sound will be ‘transformed’ into a multipoint source with the following geometry.
The main difference between a multi-channel source and a multipoint source is that, when played back without position coordinates, a multi-channel sound will not ‘bleed’ sound from the authored speakers into other speakers the way positioned sounds must to ensure smooth panning. That is, it is not a 3D positioned sound, rather just a multi-channel sound with static speaker channel assignments.
Referring next to
With respect to a mono audio wave on a track, the audio wave can optionally be positioned relative to a listener at the origin (0,0,0). By default, a mono wave's azimuth is 0 degrees, so setting the position of the sound to x,y,z sets the position of the wave to x,y,z. Also, the sound can be assigned to one or more speaker locations, with levels.
With respect to a multi-channel wave on a track, each channel of the track's wave can optionally be positioned relative to a listener at the origin (0,0,0). The default positions are:
While displayed so the channel can be properly assigned, an LFE channel does not have an associated angle (and may be displayed as a point source directly on top of the listener). Also, the sound can be assigned to one or more speaker locations, with levels.
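As a rough illustration, and assuming a hypothetical TrackChannel structure, the optional per-channel positioning described above might be represented as follows, with the LFE channel carrying no azimuth, consistent with its treatment as a point source on top of the listener.

    #include <optional>

    // Hypothetical representation of one channel of a track's wave. A
    // mono wave would have a single channel with a default azimuth of
    // 0 degrees; an LFE channel has no azimuth (it behaves as a point
    // source directly on top of the listener).
    struct TrackChannel {
        std::optional<float> azimuthDegrees; // empty for an LFE channel
        float speakerLevels[8] = {};         // optional static speaker assignments, with levels
    };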
Many sounds in nature are inherently directional; that is, they are louder in one direction than another. A common example of this is a person speaking: a talker facing the listener will sound louder than a talker facing away from the listener. Sound cones are specified to account for this. A sound cone is specified by an inner diameter and an outer diameter, and by at least three signal processing parameters: volume, filter, and reverb modifiers. An example of possible settings in defining a sound cone is provided below:
The same types of user-definable curves, describing how these parameters vary between the inner and outer boundaries of the cone, may optionally be provided. In the example above, however, a linear value is interpolated between the inner value and the outer value for each parameter.
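A minimal sketch of that default interpolation, assuming for concreteness that the inner and outer cone boundaries are expressed as angles off the emitter's front axis; the function and parameter names are hypothetical.

    // Compute one cone parameter (volume, filter, or reverb modifier)
    // for a listener heard at 'angleRadians' off the emitter's front axis.
    float ConeValue(float angleRadians, float innerAngle, float outerAngle,
                    float innerValue, float outerValue) {
        if (angleRadians <= innerAngle) return innerValue; // inside the inner cone
        if (angleRadians >= outerAngle) return outerValue; // outside the outer cone
        // Default behavior described above: linear interpolation between
        // the inner and outer values across the transition region.
        const float t = (angleRadians - innerAngle) / (outerAngle - innerAngle);
        return innerValue + t * (outerValue - innerValue);
    }

For example, ConeValue(angle, inner, outer, 1.0f, 0.25f) would leave the volume unchanged in front of the emitter and attenuate it to one quarter behind it, with a linear transition in between.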
Support for emitter and listener orientation serves to determine how a multipoint source should be positioned relative to the listener. For instance, if the listener's orientation is due north while the sound source's orientation is due south, and a channel of that sound source is directed to play at 45 degrees (to the ‘right’), it will actually be transformed by the listener's opposing orientation so that it is heard from the ‘left’.
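The orientation transform in that example can be sketched as follows; the degree-based convention (azimuths measured clockwise from the facing direction) is an assumption made for this illustration.

    #include <cmath>

    // Given a channel's authored azimuth, the emitter's orientation, and
    // the listener's orientation (all in degrees, measured clockwise from
    // the respective facing direction), compute the azimuth at which the
    // listener actually hears that channel.
    float RelativeAzimuth(float channelAzimuth, float emitterOrientation,
                          float listenerOrientation) {
        float azimuth = std::fmod(channelAzimuth + emitterOrientation -
                                  listenerOrientation, 360.0f);
        if (azimuth < 0.0f) azimuth += 360.0f;
        return azimuth;
    }

    // In the example above: listener faces north (0 degrees), emitter
    // faces south (180 degrees), channel authored at 45 degrees to the
    // right. RelativeAzimuth(45.0f, 180.0f, 0.0f) == 225.0f, i.e., the
    // channel is heard behind and to the listener's left.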
Also, if a multimedia application has more than one source position for a single sound, that single sound may be rendered from multiple sound sources to a listener by adding (in the case of volume) the results of all the source/listener volume calculations together before sending the result to an audio renderer. As an example, in a skiing video game, the ski resort in the video game has set up loudspeakers at various places on the mountain for the skiers' enjoyment. There is one listener (the skier), one sound (music played by the ski resort), but multiple sound sources (each of the loudspeakers). The volumes are calculated for each sound source, summed together, and then applied to the audio voice that is playing the music. The result is that as the skier in the video game skis closer to one particular loudspeaker, the music from that loudspeaker gets louder while the skier still hears some music from behind (from the loudspeakers farther up the mountain).
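A minimal sketch of that summation, assuming an inverse-distance attenuation curve; the clamp to unity gain at the end is an assumption, since a particular renderer might accept gains above 1.0.

    #include <algorithm>
    #include <cmath>
    #include <vector>

    struct Vec3f { float x, y, z; };

    float Distance(const Vec3f& a, const Vec3f& b) {
        const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
        return std::sqrt(dx * dx + dy * dy + dz * dz);
    }

    // Assumed inverse-distance attenuation; any of the distance curves
    // described earlier could be substituted here.
    float VolumeForDistance(float distance) {
        return 1.0f / std::max(distance, 1.0f);
    }

    // One listener, one sound, several source positions: compute a
    // volume for each source/listener pair and sum the results before
    // applying them to the single audio voice playing the sound.
    float CombinedVolume(const std::vector<Vec3f>& sources, const Vec3f& listener) {
        float volume = 0.0f;
        for (const Vec3f& source : sources)
            volume += VolumeForDistance(Distance(source, listener));
        return std::min(volume, 1.0f); // clamp is an assumption, not required by the design
    }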
Exemplary Multimedia Console
Referring next to
A graphics processing unit (GPU) 108 and a video encoder/video codec (coder/decoder) 114 form a video processing pipeline for high speed and high resolution graphics processing. Data is carried from the graphics processing unit 108 to the video encoder/video codec 114 via a bus. The video processing pipeline outputs data to an A/V (audio/video) port 140 for transmission to a television or other display. A memory controller 110 is connected to the GPU 108 and CPU 101 to facilitate processor access to various types of memory 112, such as, but not limited to, a RAM (Random Access Memory).
The multimedia console 100 includes an I/O controller 120, a system management controller 122, an audio processing unit 123, a network interface controller 124, a first USB host controller 126, a second USB controller 128, and a front panel I/O subassembly 130 that are preferably implemented on a module 118. The USB controllers 126 and 128 serve as hosts for peripheral controllers 142(1)-142(2), a wireless adapter 148, and an external memory unit 146 (e.g., flash memory, external CD/DVD ROM drive, removable media, etc.). The network interface 124 and/or wireless adapter 148 provide access to a network (e.g., the Internet, home network, etc.) and may be any of a wide variety of wired or wireless interface components, including an Ethernet card, a modem, a Bluetooth module, a cable modem, and the like.
System memory 143 is provided to store application data that is loaded during the boot process. A media drive 144 is provided and may comprise a DVD/CD drive, hard drive, or other removable media drive, etc. The media drive 144 may be internal or external to the multimedia console 100. Application data may be accessed via the media drive 144 for execution, playback, etc. by the multimedia console 100. The media drive 144 is connected to the I/O controller 120 via a bus, such as a Serial ATA bus or other high speed connection (e.g., IEEE 1394).
The system management controller 122 provides a variety of service functions related to assuring availability of the multimedia console 100. The audio processing unit 123 and an audio codec 132 form a corresponding audio processing pipeline with high fidelity, 3D, surround, and stereo audio processing according to aspects of the present invention described above. Audio data is carried between the audio processing unit 123 and the audio codec 132 via a communication link. The audio processing pipeline outputs data to the A/V port 140 for reproduction by an external audio player or device having audio capabilities.
The front panel I/O subassembly 130 supports the functionality of the power button 150 and the eject button 152, as well as any LEDs (light emitting diodes) or other indicators exposed on the outer surface of the multimedia console 100. A system power supply module 136 provides power to the components of the multimedia console 100. A fan 138 cools the circuitry within the multimedia console 100.
The CPU 101, GPU 108, memory controller 110, and various other components within the multimedia console 100 are interconnected via one or more buses, including serial and parallel buses, a memory bus, a peripheral bus, and a processor or local bus using any of a variety of bus architectures.
When the multimedia console 100 is powered on or rebooted, application data may be loaded from the system memory 143 into memory 112 and/or caches 102, 104 and executed on the CPU 101. The application may present a graphical user interface that provides a consistent user experience when navigating to different media types available on the multimedia console 100. In operation, applications and/or other media contained within the media drive 144 may be launched or played from the media drive 144 to provide additional functionalities to the multimedia console 100.
The multimedia console 100 may be operated as a standalone system by simply connecting the system to a television or other display. In this standalone mode, the multimedia console 100 may allow one or more users to interact with the system, watch movies, listen to music, and the like. However, with the integration of broadband connectivity made available through the network interface 124 or the wireless adapter 148, the multimedia console 100 may further be operated as a participant in a larger network community.
Referring next to
The multimedia console depicted in
Also, over time, system features may be updated or added to a multimedia application. For example, complex audio environments associated with multimedia applications are becoming increasingly prevalent. The systems and methods described herein allow multimedia application sound designers to more easily create complex audio environments involving 3D audio without the extra work and runtime overhead required to break everything down into monaural points.
Exemplary Computing and Network Environment
Although the 3D audio processing system has been described thus far as it is applicable to a multimedia console, the processing may also run on, and be used with, other computing systems such as the exemplary computing and network environment described below. Referring to
Aspects of the invention are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable for use with the invention include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
Aspects of the invention may be implemented in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Aspects of the invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
An exemplary system for implementing aspects of the invention includes a general purpose computing device in the form of a computer 241. Components of computer 241 may include, but are not limited to, a processing unit 259, a system memory 222, and a system bus 221 that couples various system components including the system memory to the processing unit 259. The system bus 221 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus also known as Mezzanine bus.
Computer 241 typically includes a variety of computer readable media. Computer readable media can be any available media that can be accessed by computer 241 and includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computer 241. Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer readable media.
The system memory 222 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 223 and random access memory (RAM) 260. A basic input/output system 224 (BIOS), containing the basic routines that help to transfer information between elements within computer 241, such as during start-up, is typically stored in ROM 223. RAM 260 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 259. By way of example, and not limitation,
The computer 241 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only,
The drives and their associated computer storage media discussed above and illustrated in
The computer 241 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 246. The remote computer 246 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 241, although only a memory storage device 247 has been illustrated in
When used in a LAN networking environment, the computer 241 is connected to the LAN 245 through a network interface or adapter 237. When used in a WAN networking environment, the computer 241 typically includes a modem 250 or other means for establishing communications over the WAN 249, such as the Internet. The modem 250, which may be internal or external, may be connected to the system bus 221 via the user input interface 236, or other appropriate mechanism. In a networked environment, program modules depicted relative to the computer 241, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation,
It should be understood that the various techniques described herein may be implemented in connection with hardware or software or, where appropriate, with a combination of both. Thus, the methods and apparatus of the invention, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention. In the case of program code execution on programmable computers, the computing device generally includes a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. One or more programs may implement or utilize the processes described in connection with the invention, e.g., through the use of an API, reusable controls, or the like. Such programs are preferably implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the program(s) can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language, and may be combined with hardware implementations.
Although exemplary embodiments refer to utilizing aspects of the invention in the context of one or more stand-alone computer systems, the invention is not so limited, but rather may be implemented in connection with any computing environment, such as a network or distributed computing environment. Still further, aspects of the invention may be implemented in or across a plurality of processing chips or devices, and storage may similarly be effected across a plurality of devices. Such devices might include personal computers, network servers, handheld devices, supercomputers, or computers integrated into other systems such as automobiles and airplanes.
An exemplary networked computing environment is provided in
Distributed computing provides sharing of computer resources and services by exchange between computing devices and systems. These resources and services include the exchange of information, cache storage and disk storage for files. Distributed computing takes advantage of network connectivity, allowing clients to leverage their collective power to benefit the entire enterprise. In this regard, a variety of devices may have applications, objects or resources that may implicate the processes described herein.
This network 270 may itself comprise other computing entities that provide services to the system of
It can also be appreciated that an object, such as 275, may be hosted on another computing device 276. Thus, although the physical environment depicted may show the connected devices as computers, such illustration is merely exemplary, and the physical environment may alternatively be depicted or described as comprising various digital devices such as PDAs, televisions, MP3 players, etc., and software objects such as interfaces, COM objects, and the like.
There are a variety of systems, components, and network configurations that support distributed computing environments. For example, computing systems may be connected together by wired or wireless systems, by local networks or widely distributed networks. Currently, many networks are coupled to the Internet, which provides an infrastructure for widely distributed computing and encompasses many different networks. Any such infrastructures, whether coupled to the Internet or not, may be used in conjunction with the systems and methods provided.
A network infrastructure may enable a host of network topologies such as client/server, peer-to-peer, or hybrid architectures. The “client” is a member of a class or group that uses the services of another class or group to which it is not related. In computing, a client is a process, i.e., roughly a set of instructions or tasks, that requests a service provided by another program. The client process utilizes the requested service without having to “know” any working details about the other program or the service itself. In a client/server architecture, particularly a networked system, a client is usually a computer that accesses shared network resources provided by another computer, e.g., a server. In the example of
A server is typically, though not necessarily, a remote computer system accessible over a remote or local network, such as the Internet. The client process may be active in a first computer system, and the server process may be active in a second computer system, communicating with one another over a communications medium, thus providing distributed functionality and allowing multiple clients to take advantage of the information-gathering capabilities of the server. Any software objects may be distributed across multiple computing devices or objects.
Client(s) and server(s) communicate with one another utilizing the functionality provided by protocol layer(s). For example, HyperText Transfer Protocol (HTTP) is a common protocol that is used in conjunction with the World Wide Web (WWW), or “the Web.” Typically, a computer network address such as an Internet Protocol (IP) address or other reference such as a Universal Resource Locator (URL) can be used to identify the server or client computers to each other. The network address can be referred to as a URL address. Communication can be provided over a communications medium, e.g., client(s) and server(s) may be coupled to one another via TCP/IP connection(s) for high-capacity communication.
As the foregoing illustrates, the invention is directed to systems and methods for 3D audio processing. It is understood that changes may be made to the illustrative embodiments described above without departing from the broad inventive concepts disclosed herein. For example, while an illustrative embodiment has been described above as applied to a multimedia console running video games, for example, it is understood that the invention may be embodied in other computing environments. Furthermore, while illustrative embodiments have been described with respect to particular audio behaviors, embodiments including processing for other audio behaviors are also applicable. Accordingly, it is understood that the invention is not limited to the particular embodiments disclosed, but is intended to cover all modifications that are within the spirit and scope of the invention as defined by the appended claims.