A portion of the disclosure of this patent document contains material which is subject to copyright protection. This patent document may show and/or describe matter which is or may become trade dress of the owner. The copyright and trade dress owner has no objection to the facsimile reproduction by anyone of the patent disclosure as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright and trade dress rights whatsoever.
The process of making an audio recording commonly starts by capturing and storing one or more different audio objects to be combined into the ultimate recording. In this context, “capturing” means converting sounds audible to a listener into storable information. An “audio object” is a body of audio information that may be conveyed as one or more analog signals or digital data streams and may be stored as an analog recording or a digital data file or other data object. Raw, or unprocessed, audio objects may be commonly referred to as “tracks” in remembrance of a time when each audio object was, in fact, recorded on a physically separate track on a magnetic recording tape. Currently, “tracks” may be recorded on an analog recording tape or may be recorded digitally on digital audio tape or on a computer readable storage medium.
Digital Audio Workstations (DAWs) are commonly used by audio music professionals to integrate individual tracks into a desired final audio product that is eventually delivered to the end user. These final audio products are commonly referred to as “artistic mixes”. The creation of an artistic mix requires a considerable amount of effort and expertise. In addition, artistic mixes are normally subject to approval by the artists who own the rights to the particular content.
The term “stem” is widely used to describe audio objects. The term is also widely misunderstood since “stem” is commonly given different meanings in different contexts. During cinematic production, the term “stem” usually refers to a surround audio presentation. For example, the final audio used for movie audio playback is commonly referred to as a “print master stem”. For a 5.1 presentation, the print master stem consists of 6 channels of audio—left front, right front, center, LFE (low frequency effects, commonly known as subwoofer), left rear surround, and right rear surround. Each channel in the stem typically contains a mix of several components such as music, dialog, and effects. Each of these original components, in turn, may be created from hundreds of sources or “tracks”. To complicate things even further, when films are mixed, each component of the audio presentation is “printed” or recorded separately. At the same time that the print master is being created, each major component (e.g. dialog, music, effects) may also be recorded or “printed” to a stem. These are referred to as “D M & E” or dialog, music and effects stems. Each of these components may be a 5.1 presentation containing six audio channels. When the D M & E stems are played together in synchronism, they sound exactly the same as the print master stem. The D M & E stems are created for a variety of reasons, with foreign dialog replacement being a common example.
During recorded music production, the reason for the creation of stems and the nature of the stems are substantially different from the cinematic “stems” described above. A primary motivation for stem creation is to allow recorded music to be “re-mixed”. For example, a popular song that was not meant for playing in dance clubs may be re-mixed to be more compatible with dance club music. Artists and their record labels may also release stems to the public for public relations reasons. The public (typically fairly sophisticated users with access to digital audio workstations) prepare remixes which may be released for promotional purposes. Songs may also be remixed for use in video games, such as the very successful Guitar Hero and Rock Band games. Such games rely on the existence of stems representing individual instruments. The stems created during recorded music production typically contain music from different sources. For example, a set of stems for a rock song may include drums, guitar(s), bass, vocal(s), keyboards, and percussion.
In this patent, a “stem” is a component or sub-mix of an artistic mix generated by processing one or more tracks. The processing may commonly, but not necessarily, include mixing multiple tracks. The processing may include one or more of level modification by amplification or attenuation; spectrum modification such as low pass filtering, high pass filtering, or graphic equalization; dynamic range modification such as limiting or compression; time-domain modification such as phase shifting or delay; noise, hum, and feedback suppression; reverberation; and other processes. Stems are typically generated during the creation of an artistic mix. A stereo artistic mix is typically composed of four to eight stems. As few as two stems and more than eight stems may be used for some mixes. Each stem may include a single component or a left component and a right component.
Since the most common techniques for delivering audio content to listeners have been compact discs and radio broadcasts, the majority of artistic mixes are stereo, which is to say the majority of artistic mixes have only two channels. In this patent, a “channel” is a fully-processed audio object ready to be played to a listener through an audio reproduction system. However, due to the popularity of home theater systems, many homes and other venues have surround sound multi-channel audio systems. The term “surround” refers either to source material intended to be played on more than two speakers distributed in a two or three dimensional space, or to playback arrangements which include more than two speakers distributed in two or three dimensional space. Common surround sound formats include 5.1, which includes five separate audio channels plus a low frequency effects (LFE) or sub-woofer channel; 5.0, which includes five audio channels without an LFE channel; and 7.1, which includes seven audio channels plus an LFE channel. Surround mixes of audio content have great potential to achieve a more engaging listener experience. Surround mixes may also provide a higher quality of reproduction since the audio is reproduced by an increased number of speakers and thus may require less dynamic range compression and equalization of individual channels. However, creation of another artistic mix that is designated for multi-channel reproduction requires an additional mixing session with the participation of artists and mixing engineers. The cost of a surround artistic mix may not be approved by content owners or record companies.
In this patent, any audio content to be recorded and reproduced will be referred to as a “song”. A song may be, for example, a 3-minute pop tune, a non-musical theatrical event, or a complete symphony.
Referring now to the drawings in which like reference numbers represent corresponding parts throughout:
Throughout this description, elements appearing in figures are assigned three-digit reference designators, where the most significant digit is the figure number where the element is introduced and the two least significant digits are specific to the element. An element that is not described in conjunction with a figure may be presumed to have the same characteristics and function as a previously-described element having the same reference designator.
Description of Apparatus
Embodiments of this invention relate to audio signal processing and, in particular, to methods for automatic mixing of multi-channel audio signals. Referring now to
These electrical signals may be recorded by the recorder 120 as a plurality of tracks. Each track may record the sound produced by a single musician or instrument, or the sound produced by a plurality of instruments. In some cases, such as a drummer playing a set of drums, the sound produced by a single musician may be captured by a plurality of transducers. Electrical signals from the plurality of transducers may be recorded as a corresponding plurality of tracks or may be combined into a reduced number of tracks prior to recording. The various tracks to be combined into an artistic mix need not be recorded at the same time or even in the same location.
Once all of the tracks to be mixed have been recorded, the tracks may be combined into an artistic mix using the mixer 130. Functional elements of the mixer 130 may include track processors 132A-132F and adders 134L and 134R. Historically, track processors and adders were implemented by analog circuits operating on analog audio signals. Currently, track processors and adders are typically implemented using one or more digital processors such as digital signal processors. When two or more processors are present, the functional partitioning of the mixer 130 shown in
Each track processor 132A-132F may process one or more recorded tracks. The processes performed by each track processor may include some or all of summing or mixing multiple tracks; level modification by amplification or attenuation; spectrum modification such as low pass filtering, high pass filtering, or graphic equalization; dynamic range modification such as limiting or compression; time-domain modification such as phase shifting or delay; noise, hum, and feedback suppression; reverberation; and other processes. Specialized processes such as de-essing and chorusing may be performed on vocal tracks. Some processes, such as level modification, may be performed on individual tracks before they are mixed or added, and other processes may be performed after multiple tracks are mixed. The output of each track processor 132A-132F may be a respective stem 140A-140F, of which only stems 140A and 140F are identified in
In the example of
Each stem 140A-140F may include sounds produced by a particular instrument or group of instruments and musicians. The instrument or group of instruments and musicians included in a stem will be referred to herein as the “voice” of the stem. Voices may be named to reflect the musicians or instruments that contributed the tracks that were processed to generate the stem. For example, in
The stems 140A-140F generated during the creation of the stereo artistic mix 160 may be stored. Additionally, metadata identifying the voice, instrument or musician in the stem may be associated with each stem audio object. Associated metadata may be attached to each stem audio object or may be stored separately. Other metadata, such as the title of the song, the name of the group or musician, the genre of the song, the recording and/or mixing date, and other information may be attached to some or all of the stem audio objects or stored as a separate data object.
The multichannel encoder 240 may encode the surround artistic mix 235 in accordance with the MPEG-2 (Moving Picture Experts Group) standard, which allows encoding audio mixes with up to six channels for 5.1 surround audio systems. The multichannel encoder 240 may encode the surround artistic mix 235 in accordance with the Free Lossless Audio Codec (FLAC) standard, which allows encoding audio mixes with up to eight channels. The multichannel encoder 240 may encode the surround artistic mix 235 in accordance with the Advanced Audio Coding (AAC) enhancement to the MPEG-2 and MPEG-4 standards. AAC allows encoding audio mixes with up to 48 channels. The multichannel encoder 240 may encode the surround artistic mix 235 in accordance with some other standard.
The encoded audio produced by the multichannel encoder 240 may be transmitted over a distribution channel 242 to a compatible multichannel decoder 250. The distribution channel 242 may be a wireless broadcast, a network such as the Internet or a cable TV network, or some other distribution channel. The multichannel decoder 250 may recreate or nearly recreate the channels of the surround artistic mix 235 for presentation to listeners by a surround audio system 260.
As previously described, not every stereo artistic mix has an associated surround artistic mix.
The surround mix 275 may be encoded by the multichannel encoder 240 and transmitted over a distribution channel 242 to a compatible multichannel decoder 250. The multichannel decoder 250 may recreate or nearly recreate the channels of the surround mix 275 for presentation to listeners by a surround audio system 260. In the system 200B, a single surround mix produced by the automatic surround mixer 270 is distributed to all listeners.
The encoded stems may then be transmitted via a distribution channel 242 to a compatible multichannel decoder 255. The multichannel decoder 255 may recreate or nearly recreate the stems and metadata 232. The automatic surround mixer 270 may produce a surround mix 275 based on the recreated stems and metadata. The surround mix 275 may be tailored to the listener's preferences and/or the peculiarities of the listener's surround audio system 260.
Referring now to
The automatic surround mixer 300 may include a respective stem processor 310-1 to 310-6 for each input stem, a mixing matrix 320 that combines the processed stems in various proportions to provide the output channels, and a rule engine 340 to determine how the stems should be processed and mixed.
Each stem processor 310-1 to 310-6 may be capable of performing processes such as level modification by amplification or attenuation; spectrum modification by low pass filtering, high pass filtering, and/or graphic equalization; dynamic range modification by limiting, compression or decompression; noise, hum, and feedback suppression; reverberation; and other processes. One or more of the stem processors 310-1 to 310-6 may be capable of performing specialized processes such as de-essing and chorusing on vocal tracks. One or more of the stem processors 310-1 to 310-6 may provide multiple outputs subject to different processes. For example, one or more of the stem processors 310-1 to 310-6 may provide a low frequency portion of the respective stem for incorporation into the LFE channel and higher frequency portions of the respective stem for incorporation into one or more of the other output channels.
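The frequency split described above can be sketched in code. The following is a minimal illustration, not an implementation from the source: it splits a mono stem into an LFE portion and a complementary remainder using a simple one-pole low-pass filter. The 120 Hz crossover frequency is an assumed, illustrative default.

```python
import numpy as np

def split_for_lfe(stem, sample_rate, crossover_hz=120.0):
    """Split a mono stem into a low-frequency portion (for the LFE
    channel) and the complementary high-frequency remainder.
    The one-pole filter and 120 Hz crossover are illustrative
    assumptions, not values taken from the patent text."""
    # One-pole IIR low-pass: y[n] = y[n-1] + alpha * (x[n] - y[n-1])
    alpha = 1.0 - np.exp(-2.0 * np.pi * crossover_hz / sample_rate)
    low = np.empty_like(stem)
    acc = 0.0
    for i, x in enumerate(stem):
        acc += alpha * (x - acc)
        low[i] = acc
    high = stem - low  # complementary residue for the main channels
    return low, high
```

Because the high portion is computed as the residue, the two portions sum back to the original stem, which preserves the overall energy of the artistic mix.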
Each stem input to the automatic surround mixer 300 may have been subject to some or all of these processes as part of creating a stereo artistic mix. Thus, to preserve the general sound and feel of the stereo artistic mix, minimal processing may be performed by the stem processors 310-1 to 310-6. For example, the only processing performed by the stem processors may be adding reverberation to some or all of the stems and low-pass filtering to provide the LFE channel.
Each of the stem processors 310-1 to 310-6 may process the respective stem in accordance with effects parameters 342 provided by the rule engine 340. The effects parameters 342 may include, for example, data specifying an amount of attenuation or gain, a knee frequency and a slope of any filtering to be applied, equalization coefficients, compression or decompression coefficients, a delay and a relative amplitude of reverberation, and other parameters defining processes to be applied to each stem.
The mixing matrix 320 may combine the outputs from the stem processors 310-1 to 310-6 to provide the output channels in accordance with mixing parameters 344 provided by the rule engine. For example, the mixing matrix 320 may generate each output channel in accordance with the formula:
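The source formula is not reproduced here, but a mixing matrix that combines processed stems "in various proportions" can be sketched as a plain weighted sum, where each output channel is the sum of every stem scaled by a per-channel, per-stem gain. The shapes and gain values below are illustrative assumptions.

```python
import numpy as np

def mix(processed_stems, gains):
    """Combine processed stems into output channels as a weighted sum:
    channel_c[n] = sum over stems s of gains[c][s] * stem_s[n].
    `gains` is a (channels x stems) matrix of mixing parameters."""
    stems = np.asarray(processed_stems)  # shape: (n_stems, n_samples)
    gains = np.asarray(gains)            # shape: (n_channels, n_stems)
    return gains @ stems                 # shape: (n_channels, n_samples)
```

For a 5.1 mix of six stems, `gains` would be a 6 x 6 matrix; a zero entry simply excludes a stem from that channel.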
The rule engine 340 may determine the effects parameters 342 and the mixing parameters 344 based, at least in part, on metadata associated with the input stems. Metadata may be generated during the creation of a stereo artistic mix and may be attached to each stem object and/or included in a separate data object. The metadata may include, for example, the voice or type of instrument contained in each stem, the genre or other qualitative description of the program, data indicating the processing done on each stem during creation of the stereo artistic mix, and other information. The metadata may also include descriptive material, such as the program title or artist, of interest to the listener but not used during creation of a surround mix.
When appropriate metadata cannot be provided with the stems, metadata including the voice of each stem and the genre of the song may be developed through analysis of the content of each stem. For example, the spectral content of each stem may be analyzed to estimate what voice is contained in the stem and the rhythmic content of the stems, in combination with the voices present in the stems, may allow estimation of the genre of the song.
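One simple spectral feature such an analysis might use is the spectral centroid, sketched below. The thresholds and voice labels are illustrative assumptions for this sketch; the source does not specify a particular analysis method.

```python
import numpy as np

def spectral_centroid(stem, sample_rate):
    """Return the amplitude-weighted mean frequency of a stem, a crude
    feature for guessing the stem's voice (bass stems sit low,
    cymbal-heavy drum stems sit high)."""
    spectrum = np.abs(np.fft.rfft(stem))
    freqs = np.fft.rfftfreq(len(stem), d=1.0 / sample_rate)
    return float(np.sum(freqs * spectrum) / np.sum(spectrum))

def guess_voice(stem, sample_rate):
    """Classify a stem by centroid. The 250 Hz and 2000 Hz thresholds
    are hypothetical values chosen only for illustration."""
    centroid = spectral_centroid(stem, sample_rate)
    if centroid < 250.0:
        return "bass"
    if centroid < 2000.0:
        return "vocal"
    return "percussion"
```

A production analyzer would combine many such features with rhythmic analysis, as the text notes, rather than rely on a single threshold test.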
The automatic surround mixer 300 may be incorporated into a listener's surround audio system. In this case, the rule engine 340 may have access to configuration data indicating the surround audio system configuration (5.0, 5.1, 7.1, etc.) to be used to present the surround mix. When the automatic surround mixer 300 is not incorporated into a surround audio system, the rule engine 340 may receive information indicating the surround audio system configuration, for example, as manual inputs by the listener. Information indicating the surround audio system configuration may be obtained automatically from the audio system, for example by communications via an HDMI (High-Definition Multimedia Interface) connection.
The rule engine 340 may determine the effects parameters 342 and the mixing parameters 344 using a set of rules stored in a rule base. In this patent, the term “rules” encompasses logical statements, tabulated data, and other information used to generate effects parameters 342 and mixing parameters 344. Rules may be empirically developed, which is to say the rules may be based on the collected experience of one or more sound engineers who have created one or more artistic surround mixes. Rules may be developed by collecting and averaging mixing parameters and effects parameters for a plurality of artistic surround mixes. The rule base 346 may include different rules for different music genres and different rules for different surround audio system configurations.
In general, each rule may include a condition and an action that is executed if the condition is satisfied. The rule engine may evaluate the available data (i.e. metadata and speaker configuration data) and determine what rule conditions are satisfied. The rule engine 340 may then determine what actions are indicated by the satisfied rules, resolve any conflicts between the actions, and cause the indicated actions to occur (i.e. set the effects parameters 342 and the mixing parameters 344).
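The condition/action evaluation described above can be sketched minimally as follows. Each rule is a (condition, action) pair; conditions are tested against the available facts, and the actions of satisfied rules update the parameter set. The "later rules win" conflict-resolution policy and the example rules are assumptions for this sketch only; the source does not specify a policy.

```python
def apply_rules(rules, facts):
    """Evaluate each rule's condition against the facts (metadata and
    speaker configuration); merge the actions of satisfied rules into
    one parameter dict. Later rules override earlier ones on conflict."""
    params = {}
    for condition, action in rules:
        if condition(facts):
            params.update(action(facts))
    return params

# Hypothetical example rules modeled on those discussed in the text.
example_rules = [
    (lambda f: f.get("voice") == "lead vocal",
     lambda f: {"lead vocal": {"channel": "center"}}),
    (lambda f: not f.get("has_subwoofer", False),
     lambda f: {"bass": {"channel": ["front left", "front right"]}}),
]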
Rules stored in the rule base 346 may be in declarative form. For example, the rules stored in the rule base 346 may include “lead vocal goes to the center channel”. This rule, as stated, would apply to all music genres and all surround audio system configurations. The condition in the rule is inherent—the rule only applies if a lead vocal stem is present.
A more typical rule may have an expressed condition. For example, the rules stored in the rule base 346 may include “if the audio system has a sub-woofer, then low frequency components of drum, percussion, and bass stems go to the LFE channel, else low frequency components of drum, percussion, and bass stems are divided between the left front and right front channels”. A rule's express condition may incorporate logical expressions (“and”, “or”, “not”, etc.).
A common form of rule may have a condition, such as “if the genre of the music is X and the voice is Y, then . . . ” Rules of this type and other types may be stored in the rule base 346 in tabular form. For example, as shown in
For example, row 420 of the table 400 implements the rule, “for a 5.1 surround audio system and this particular genre, the lead vocal goes to the center channel” with the assumption that no effects processing is performed on the lead vocal stem. For further example, the row 430 of the table 400, implements the rule, “for a 5.1 surround audio system and this particular genre, low frequency components of the drum stem go to the LFE channel and high frequency components of the drum stem are divided between the front left and front right channels”.
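A tabular rule base of this kind can be sketched as a lookup keyed by (audio system configuration, genre, voice). The table entries, genre name, and gain values below are hypothetical illustrations of rows like 420 and 430, not data from the source.

```python
# Hypothetical tabular rule base: (configuration, genre, voice) -> per-channel gains.
RULE_TABLE = {
    ("5.1", "rock", "lead vocal"): {"center": 1.0},
    ("5.1", "rock", "drums"): {"lfe": 1.0, "front left": 0.5, "front right": 0.5},
}

def lookup_mixing_rule(config, genre, voice, table=RULE_TABLE):
    """Return the per-channel gains for a stem. When no tabulated rule
    matches, fall back to an equal front left/right split (an assumed
    default, not one stated in the source)."""
    return table.get((config, genre, voice),
                     {"front left": 0.5, "front right": 0.5})
```

Expressing rules as table rows makes the inherent condition explicit: a row simply does not match unless its configuration, genre, and voice are all present.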
Referring back to
The rule engine 340 may also receive data indicating listener preferences. For example, the listener may be provided an option to elect either a conventional mix or a nonconventional mix such as an a cappella (vocals only) mix or a “karaoke” mix (lead vocal suppressed). An election of a nonconventional mix may override some of the mixing parameters selected by the rule engine 340.
The functional elements of the automatic surround mixer 300 may be implemented by analog circuits, digital circuits, and/or one or more processors executing an automatic mixer software program. For example, the stem processors 310-1 to 310-6 and the mixing matrix 320 may be implemented using one or more digital processors such as digital signal processors. The rule engine 340 may be implemented using a general purpose processor. When two or more processors are present, the functional partitioning of the automatic surround mixer 300 shown in
Referring now to
The automatic surround mixer 500 may also include a rule engine 540 and a rule base 546. The rule engine 540 may determine effects parameters 342 based on metadata and surround audio system configuration data as previously described.
The rule engine 540 may not directly determine the mixing parameters 344, but may rather determine relative voice position data 548 based on rules stored in the rule base 546. Each relative voice position may indicate a position on a virtual stage of a hypothetical source of the respective stem. For example, the rule base 546 would not include the rule, “the lead vocal goes to the center channel”, but may include the rule, “the lead vocalist is positioned at the center front of the stage”. Similar rules may define the positions of other voices/musicians on the virtual stage for various genres.
A common form of rule may have a condition, such as “if the genre of the music is X and the voice is Y, then . . . ” Rules of this type may be stored in the rule base 546 in tabular form. For example, as shown in
The rules described in the previous paragraphs are simple examples. A more complete, but still exemplary, set of rules will be explained with reference to
A set of rules for mixing stems may be expressed in terms of the apparent angle from the listener to the source of the stem. The following exemplary set of rules may provide a pleasant surround mix for songs of various genres. Rules are stated in italics.
Referring back to
The rule engine 540 may also receive data indicating listener preferences. For example, the listener may be provided an option to elect either a conventional mix or a nonconventional mix such as an a cappella (vocals only) mix or a karaoke mix (lead vocal or lead and background vocals suppressed). The listener may have an option to select an “educational” mix where each stem is sent to a single speaker channel to allow the listener to focus on a particular instrument. An election of a nonconventional mix may override some of the mixing parameters selected by the rule engine 540.
The rule engine 540 may supply the voice position data 548 to a coordinate processor 550. The coordinate processor 550 may receive a listener election of a virtual listener position with respect to the virtual stage on which the voices are positioned. The listener election may be made, for example, by prompting the listener to choose one of two or more predetermined alternative positions. Possible choices for virtual listener position may include “in the band” (e.g. in the center of the virtual stage surrounded by the voices), “front row center”, and/or “middle of the audience”. The coordinate processor 550 may then generate mixing parameters 344 that cause the mixing matrix 320 to combine the processed stems into channels that provide the desired listener experience.
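The core step of the coordinate processor, converting the apparent angle from the virtual listener to a voice into speaker gains, can be sketched with constant-power panning between the two nearest speakers. The 5.1 speaker angles used in the test (0, ±30, ±110 degrees) are a common nominal layout assumed for illustration; the panning law is likewise an assumption, since the source does not name one.

```python
import math

def pan_gains(source_angle_deg, speaker_angles_deg):
    """Distribute a voice at `source_angle_deg` (degrees, listener at
    the origin) across the two nearest speakers using constant-power
    (sine/cosine) panning, so that g_a**2 + g_b**2 == 1."""
    def angular_distance(a):
        return abs((a - source_angle_deg + 180) % 360 - 180)

    # Pan between the two speakers nearest the source angle.
    ordered = sorted(speaker_angles_deg, key=angular_distance)
    a, b = ordered[0], ordered[1]
    span = abs((b - a + 180) % 360 - 180)
    frac = angular_distance(a) / span

    gains = {angle: 0.0 for angle in speaker_angles_deg}
    gains[a] = math.cos(frac * math.pi / 2)
    gains[b] = math.sin(frac * math.pi / 2)
    return gains
```

With this decoupling, changing the elected virtual listener position only changes the apparent angles fed to `pan_gains`; the rule base's voice positions on the virtual stage stay fixed.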
The coordinate processor 550 may also receive data indicating the relative position of the speakers in the surround audio system. This data may be used by the coordinate processor 550 to refine the mixing parameters to compensate, to at least some extent, for deviations in the speaker arrangement relative to the nominal speaker arrangement (such as the speaker arrangement shown in
The functional elements of the automatic surround mixer 500 may be implemented by analog circuits, digital circuits, and/or one or more processors executing an automatic mixer software program. For example, the stem processors 310-1 to 310-6 and the mixing matrix 320 may be implemented using one or more digital processors such as digital signal processors. The rule engine 540 and the coordinate processor 550 may be implemented using one or more general purpose processors. When two or more processors are present, the functional partitioning of the automatic surround mixer 500 shown in
Description of Processes
Referring now to
At 810, a rule base such as the rule bases 346 and 546 may be developed. The rule base may contain rules for combining stems into a surround mix. These rules may be developed by analysis of historical artistic surround mixes, by accumulating the consensus opinions and practices of recording engineers with experience creating artistic surround mixes, or in some other manner. The rule base may contain different rules for different music genres and different rules for different surround audio configurations. Rules in the rule base may be expressed in tabular form. The rule base is not necessarily permanent and may be expanded over time, for example to incorporate new mixing techniques and new music genres.
The initial rule base may be prepared before, during, or after a first song is recorded and a first artistic stereo mix is created. An initial rule base must be developed before a surround mix can be automatically generated. The rule base constructed at 810 may be conveyed to one or more automatic mixing systems. For example, the rule base may be incorporated into the hardware of each automatic surround mixing system or may be transmitted to each automatic surround mixing system over a network.
Tracks for the song may be recorded at 815. An artistic stereo mix may be created at 820 by processing and combining the tracks from 815 using known techniques. The artistic stereo mix may be used for conventional purposes such as recording CDs and radio broadcasting. During the creation of the artistic stereo mix at 820, two or more stems may be generated. Each stem may be generated by processing one or more tracks. Each stem may be a component or sub-mix of the stereo artistic mix. A stereo artistic mix may typically be composed of four to eight stems. As few as two stems and more than eight stems may be used for some mixes. Each stem may include a single channel or a left channel and a right channel.
At 825, metadata may be associated with the stems created at 820. The metadata may be generated during the creation of a stereo artistic mix at 820 and may be attached to each stem object and/or stored as a separate data object. The metadata may include, for example, the voice (i.e. type of instrument) of each stem, the genre or other qualitative description of the song, data indicating the processing done on each stem during creation of the stereo artistic mix, and other information. The metadata may also include descriptive material, such as the song title or artist name, of interest to the listener but not used during creation of a surround mix.
When appropriate metadata is unavailable from 820, metadata including the voice of each stem and the genre of the song may be extracted from the content of each stem at 825. For example, the spectral content of each stem may be analyzed to estimate what voice is contained in the stem and the rhythmic content of the stems, in combination with the voices present in the stems, may allow estimation of the genre of the song.
At 845, the stems and metadata from 825 may be acquired by an automatic surround mixing process 840. The automatic surround mixing process 840 may occur at the same location and may use the same system as the stereo mixing at 820. In this case, at 845 the automatic mixing process may simply retrieve the metadata and stems from memory. The automatic surround mixing process 840 may occur at one or more locations remote from the stereo mixing. In this case, at 845, the automatic surround mixing process 840 may receive the stems and associated metadata via a distribution channel (not shown). The distribution channel may be a wireless broadcast, a network such as the Internet or a cable TV network, or some other distribution channel.
At 850, the metadata associated with the stems and the surround audio configuration data may be used to extract applicable rules from the rule base. The automatic surround mixing process 840 may also use data indicating a target surround audio configuration (e.g. 5.0, 5.1, 7.1) to select rules. In general, each rule may define an express or inherent condition and one or more actions that are executed if the condition is satisfied. Rules may be expressed as logical statements. Some or all rules may be expressed in tabular form. Extracting applicable rules at 850 may include selecting only rules having conditions that are satisfied by the metadata and surround audio configuration data. The actions defined in each rule may include, for example, setting mixing parameters, effects parameters, and/or a relative position for a particular stem.
At 855 and 860, the extracted rules may be used to set mixing parameters and effects parameters, respectively. The actions at 855 and 860 may be performed in any order or in parallel.
At 865, the stems may be processed into channels for the surround audio system. Processing the stems into channels may include performing processes on some or all of the stems in accordance with the effects parameters set at 860. Processes that may be performed include level modification by amplification or attenuation; spectrum modification by low pass filtering, high pass filtering, and/or graphic equalization; dynamic range modification by limiting, compression or decompression; noise, hum, and feedback suppression; reverberation; and other processes. Additionally, specialized processes such as de-essing and chorusing may be performed on vocal stems. One or more of the stems may be divided into multiple components subject to different processes for inclusion in multiple channels. For example, one or more of the stems may be processed to provide a low frequency portion for incorporation into the LFE channel and a higher frequency portion for incorporation into one or more of the other output channels.
At 870, the processed stems from 865 may be mixed into channels. The channels may be input to the surround audio system. Optionally, the channels may also be recorded for future playback. The process 800 may end at 895 after the conclusion of the song.
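The process-then-mix sequence of the two preceding steps can be sketched end to end. For brevity this sketch reduces stem processing to a single per-stem gain, the simplest effects parameter; a real implementation would also apply the filtering, compression, and reverberation described above.

```python
import numpy as np

def automatic_surround_mix(stems, effects, gains):
    """Apply a per-stem gain (standing in for the full effects chain)
    to each stem, then mix the processed stems into output channels
    with a (channels x stems) gain matrix."""
    processed = [np.asarray(stem) * effects[i]["gain"]
                 for i, stem in enumerate(stems)]
    return np.asarray(gains) @ np.asarray(processed)
```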
Referring now to
At 975, rules extracted at 950 may be used to define a relative voice position for each stem. Each relative voice position may indicate a position on a virtual stage of a hypothetical source of the respective stem. For example, a rule extracted at 950 may be, “the lead vocalist is positioned at the center front of the stage”. Similar rules may define the positions of other voices/musicians on the virtual stage for various genres.
The automatic surround mixing process 940 may receive an operator's election of a virtual listener position with respect to the virtual stage on which the voice positions were defined at 975. The operator's election may be made, for example, by prompting the listener to choose one of two or more predetermined alternative positions. Example choices for virtual listener position include “in the band” (e.g. in the center of the virtual stage surrounded by the voices), “front row center”, and/or “middle of the audience”.
The automatic surround mixing process 940 may also receive data indicating the relative position of the speakers in the surround audio system. This data may be used to refine the mixing parameters to compensate, to at least some extent, for asymmetries in the speaker arrangement such as the center speaker not being centered between the left and right front speakers.
At 980, the voice positions defined at 975 may be transformed into mixing parameters in consideration of the elected virtual listener position and the speaker position data if available. The mixing parameters from 980 may be used at 770 to mix processed stems from 765 into channels that provide the desired listener experience.
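One way to picture the transformation at 980 is as a panning computation: the azimuth from the virtual listener to each voice position determines how that stem is distributed across the output channels. The sketch below assumes nominal 5.1 speaker azimuths and uses a constant-power pan between the two adjacent speakers bracketing the source; it is a stand-in for illustration, not the claimed method:

```python
import math

# Nominal speaker azimuths in degrees for the five full-range channels of a
# 5.1 layout (the LFE channel is fed separately); 0 degrees is straight
# ahead of the listener, positive is to the listener's right. These are
# assumed default angles that the speaker-position data could refine.
SPEAKER_AZIMUTHS = {"L": -30.0, "C": 0.0, "R": 30.0, "Ls": -110.0, "Rs": 110.0}

def mixing_gains(source_xy, listener_xy):
    """Transform a relative voice position on the virtual stage into
    per-channel gains, given the elected virtual listener position.
    The listener is assumed to face the stage (the +y direction).
    """
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    azimuth = math.degrees(math.atan2(dx, dy))
    gains = {name: 0.0 for name in SPEAKER_AZIMUTHS}
    ordered = sorted(SPEAKER_AZIMUTHS.items(), key=lambda kv: kv[1])
    # Constant-power pan between the adjacent speaker pair bracketing the
    # source azimuth, so that the panned gains satisfy g_a^2 + g_b^2 = 1.
    for (a_name, a_az), (b_name, b_az) in zip(ordered, ordered[1:]):
        if a_az <= azimuth <= b_az:
            frac = (azimuth - a_az) / (b_az - a_az)
            gains[a_name] = math.cos(frac * math.pi / 2)
            gains[b_name] = math.sin(frac * math.pi / 2)
            return gains
    # Source behind the listener, beyond the surround pair: route it to
    # the nearer surround speaker.
    gains["Ls" if azimuth < 0 else "Rs"] = 1.0
    return gains
```

For a listener at “front row center” (e.g. (0, -1)) and a lead vocal at center front of the stage (0, 0), the azimuth is 0 degrees and the gain falls entirely on the center channel, matching the example rule given above.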
Although not shown in
Throughout this description, the embodiments and examples shown should be considered as exemplars, rather than limitations on the apparatus and procedures disclosed or claimed. Although many of the examples presented herein involve specific combinations of method acts or system elements, it should be understood that those acts and those elements may be combined in other ways to accomplish the same objectives. With regard to flowcharts, additional and fewer steps may be taken, and the steps as shown may be combined or further refined to achieve the methods described herein. Acts, elements and features discussed only in connection with one embodiment are not intended to be excluded from a similar role in other embodiments.
As used herein, “plurality” means two or more. As used herein, a “set” of items may include one or more of such items. As used herein, whether in the written description or the claims, the terms “comprising”, “including”, “carrying”, “having”, “containing”, “involving”, and the like are to be understood to be open-ended, i.e., to mean including but not limited to. Only the transitional phrases “consisting of” and “consisting essentially of”, respectively, are closed or semi-closed transitional phrases with respect to claims. Use of ordinal terms such as “first”, “second”, “third”, etc., in the claims to modify a claim element does not by itself connote any priority, precedence, or order of one claim element over another or the temporal order in which acts of a method are performed; such terms are used merely as labels to distinguish one claim element having a certain name from another element having the same name (but for use of the ordinal term). As used herein, “and/or” means that the listed items are alternatives, but the alternatives also include any combination of the listed items.
Many other variations than those described herein will be apparent from this document. For example, depending on the embodiment, certain acts, events, or functions of any of the methods and algorithms described herein can be performed in a different sequence, can be added, merged, or left out altogether (such that not all described acts or events are necessary for the practice of the methods and algorithms). Moreover, in certain embodiments, acts or events can be performed concurrently, such as through multi-threaded processing, interrupt processing, or multiple processors or processor cores or on other parallel architectures, rather than sequentially. In addition, different tasks or processes can be performed by different machines and computing systems that can function together.
The various illustrative logical blocks, modules, methods, and algorithm processes and sequences described in connection with the embodiments disclosed herein can be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, and process actions have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. The described functionality can be implemented in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of this document.
The various illustrative logical blocks and modules described in connection with the embodiments disclosed herein can be implemented or performed by a machine, such as a general purpose processor, a processing device, a computing device having one or more processing devices, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor and processing device can be a microprocessor, but in the alternative, the processor can be a controller, microcontroller, or state machine, combinations of the same, or the like. A processor can also be implemented as a combination of computing devices, such as a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
Embodiments of the system and method described herein are operational within numerous types of general purpose or special purpose computing system environments or configurations. In general, a computing environment can include any type of computer system, including, but not limited to, a computer system based on one or more microprocessors, a mainframe computer, a digital signal processor, a portable computing device, a personal organizer, a device controller, a computational engine within an appliance, a mobile phone, a desktop computer, a mobile computer, a tablet computer, a smartphone, and appliances with an embedded computer, to name a few.
Such computing devices can typically be found in devices having at least some minimum computational capability, including, but not limited to, personal computers, server computers, hand-held computing devices, laptop or mobile computers, communications devices such as cell phones and PDAs, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, audio or video media players, and so forth. In some embodiments the computing devices will include one or more processors. Each processor may be a specialized microprocessor, such as a digital signal processor (DSP), a very long instruction word (VLIW) processor, or other micro-controller, or may be a conventional central processing unit (CPU) having one or more processing cores, including specialized graphics processing unit (GPU)-based cores in a multi-core CPU.
The process actions of a method, process, or algorithm described in connection with the embodiments disclosed herein can be embodied directly in hardware, in a software module executed by a processor, or in any combination of the two. The software module can be contained in computer-readable media that can be accessed by a computing device. The computer-readable media includes both volatile and nonvolatile media that is either removable, non-removable, or some combination thereof. The computer-readable media is used to store information such as computer-readable or computer-executable instructions, data structures, program modules, or other data. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media.
Computer storage media includes, but is not limited to, computer or machine readable media or storage devices such as Blu-ray discs (BD), digital versatile discs (DVDs), compact discs (CDs), floppy disks, tape drives, hard drives, optical drives, solid state memory devices, RAM memory, ROM memory, EPROM memory, EEPROM memory, flash memory or other memory technology, magnetic cassettes, magnetic tapes, magnetic disk storage, or other magnetic storage devices, or any other device which can be used to store the desired information and which can be accessed by one or more computing devices.
A software module can reside in the RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of non-transitory computer-readable storage medium, media, or physical computer storage known in the art. An exemplary storage medium can be coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium can be integral to the processor. The processor and the storage medium can reside in an application specific integrated circuit (ASIC). The ASIC can reside in a user terminal. Alternatively, the processor and the storage medium can reside as discrete components in a user terminal.
The phrase “non-transitory” as used in this document means “enduring or long-lived”. The phrase “non-transitory computer-readable media” includes any and all computer-readable media, with the sole exception of a transitory, propagating signal. This includes, by way of example and not limitation, non-transitory computer-readable media such as register memory, processor cache and random-access memory (RAM).
The phrase “audio signal” refers to a signal that is representative of a physical sound.
Retention of information such as computer-readable or computer-executable instructions, data structures, program modules, and so forth, can also be accomplished by using a variety of the communication media to encode one or more modulated data signals, electromagnetic waves (such as carrier waves), or other transport mechanisms or communications protocols, and includes any wired or wireless information delivery mechanism. In general, these communication media refer to a signal that has one or more of its characteristics set or changed in such a manner as to encode information or instructions in the signal. For example, communication media includes wired media such as a wired network or direct-wired connection carrying one or more modulated data signals, and wireless media such as acoustic, radio frequency (RF), infrared, laser, and other wireless media for transmitting, receiving, or both, one or more modulated data signals or electromagnetic waves. Combinations of any of the above should also be included within the scope of communication media.
Further, one or any combination of software, programs, computer program products that embody some or all of the various embodiments of the system and method described herein, or portions thereof, may be stored, received, transmitted, or read from any desired combination of computer or machine readable media or storage devices and communication media in the form of computer executable instructions or other data structures.
Embodiments of the system and method described herein may be further described in the general context of computer-executable instructions, such as program modules, being executed by a computing device. Generally, program modules include routines, programs, objects, components, data structures, and so forth, which perform particular tasks or implement particular abstract data types. The embodiments described herein may also be practiced in distributed computing environments where tasks are performed by one or more remote processing devices, or within a cloud of one or more devices, that are linked through one or more communications networks. In a distributed computing environment, program modules may be located in both local and remote computer storage media including media storage devices. Still further, the aforementioned instructions may be implemented, in part or in whole, as hardware logic circuits, which may or may not include a processor.
Conditional language used herein, such as, among others, “can,” “might,” “may,” “e.g.,” and the like, unless specifically stated otherwise, or otherwise understood within the context as used, is generally intended to convey that certain embodiments include, while other embodiments do not include, certain features, elements and/or states. Thus, such conditional language is not generally intended to imply that features, elements and/or states are in any way required for one or more embodiments or that one or more embodiments necessarily include logic for deciding, with or without author input or prompting, whether these features, elements and/or states are included or are to be performed in any particular embodiment. The terms “comprising,” “including,” “having,” and the like are synonymous and are used inclusively, in an open-ended fashion, and do not exclude additional elements, features, acts, operations, and so forth. Also, the term “or” is used in its inclusive sense (and not in its exclusive sense) so that when used, for example, to connect a list of elements, the term “or” means one, some, or all of the elements in the list.
While the above detailed description has shown, described, and pointed out novel features as applied to various embodiments, it will be understood that various omissions, substitutions, and changes in the form and details of the devices or algorithms illustrated can be made without departing from the scope of the disclosure. As will be recognized, certain embodiments of the inventions described herein can be embodied within a form that does not provide all of the features and benefits set forth herein, as some features can be used or practiced separately from others.
Moreover, although the subject matter has been described in language specific to structural features and methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.
This application is a divisional application of U.S. patent application Ser. No. 14/206,868, filed on Mar. 12, 2014, entitled “AUTOMATIC MULTI-CHANNEL MUSIC MIX FROM MULTIPLE AUDIO STEMS”, which claims priority from Provisional Patent Application No. 61/790,498, filed Mar. 15, 2013, entitled “AUTOMATIC MULTI-CHANNEL MUSIC MIX FROM MULTIPLE AUDIO STEMS”. The entire contents of both documents are hereby incorporated herein by reference.
Number | Name | Date | Kind |
---|---|---|---|
6226282 | Chung | May 2001 | B1 |
7078607 | Alferness | Jul 2006 | B2 |
7333863 | Lydecker et al. | Feb 2008 | B1 |
7343210 | Devito et al. | Mar 2008 | B2 |
7526348 | Marshall et al. | Apr 2009 | B1 |
7590249 | Jang et al. | Sep 2009 | B2 |
7636448 | Metcalf | Dec 2009 | B2 |
8331585 | Hagen et al. | Dec 2012 | B2 |
9136881 | Groeschel et al. | Sep 2015 | B2 |
20010055398 | Pachet et al. | Dec 2001 | A1 |
20070044643 | Huffman | Mar 2007 | A1 |
20070297624 | Gilman | Dec 2007 | A1 |
20080015867 | Kraemer | Jan 2008 | A1 |
20100098275 | Metcalf | Apr 2010 | A1 |
20100137662 | Sechrist et al. | Jun 2010 | A1 |
20100284543 | Sobota | Nov 2010 | A1 |
20100299151 | Soroka et al. | Nov 2010 | A1 |
20110013790 | Hilpert et al. | Jan 2011 | A1 |
20110022402 | Engdegard et al. | Jan 2011 | A1 |
20120057715 | Johnston | Mar 2012 | A1 |
20130170672 | Groeschel | Jul 2013 | A1 |
20140133683 | Robinson | May 2014 | A1 |
20140270181 | Siciliano | Sep 2014 | A1 |
Number | Date | Country |
---|---|---|
2009131391 | Oct 2009 | WO |
2010118763 | Oct 2010 | WO |
2013006338 | Jan 2013 | WO |
Entry |
---|
Pachet, et al., “Constraint-Based Spatialization,” journal, In First COST-G6 Workshop on Digital Audio Effects (DAXF98), Barcelona (Spain), Nov. 19-21, 1998, 4 total pages. |
Pachet, Francois, Music Listening: What is in the Air?, Sony CSL Internal Report, published in 1999, 16 total pages. |
WIPO, International Search Report and Written Opinion for PCT Application No. PCT/US2014/024962, dated Aug. 5, 2014, 10 total pages. |
Office Action in corresponding Japanese Patent Application No. P2016-501703; 3 pages. |
Number | Date | Country | |
---|---|---|---|
20170301330 A1 | Oct 2017 | US |
Number | Date | Country | |
---|---|---|---|
61790498 | Mar 2013 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 14206868 | Mar 2014 | US |
Child | 15583933 | US |