Current multi-channel audio compression methods are bulky and processor intensive. Multi-channel audio compression is often used to create “surround sound” where a system produces sound that appears to surround the listener. Speakers are situated around the listener to provide the impression that sounds are coming from all possible directions. Consequently, surround sound often provides a more realistic experience, especially when listening to soundtracks of motion pictures and when engaged in video games.
Current multi-channel audio compression methods require discrete speaker arrangements to output the sound in a quality manner. One approach to current multi-channel audio compression is using “n.n” audio tracks, such as “5.1,” “7.1,” etc. In a 5.1 system, there are 5 channels of sound (left, right, center, left surround, and right surround) and 1 channel for low frequency effects (LFE), usually produced by a subwoofer. A 7.1 system is similar but provides an additional left rear and right rear channel for seven channels with the same single channel for LFE. Currently, to produce these effects each channel is stored separately and is bandwidth intensive to transmit. The approaches often need matching speaker outputs to produce the sound correctly. These approaches also utilize intensive remixing in which the source is recoded by the same style of equipment. These approaches also result in perceptual coding that limits sound fidelity since re-composition depends on the psychoacoustic model that was used.
An approach is provided for creating a digital representation of an analog sound. The approach retrieves a number of digital sound data streams with each of the digital sound data streams corresponding to an orientation angle of the digital sound data streams with respect to one another. The digital representation of the analog sound is generated by processing the digital sound data streams and their corresponding orientation angles.
The foregoing is a summary and thus contains, by necessity, simplifications, generalizations, and omissions of detail; consequently, those skilled in the art will appreciate that the summary is illustrative only and is not intended to be in any way limiting. Other aspects, inventive features, and advantages will become apparent in the non-limiting detailed description set forth below.
This disclosure may be better understood by referencing the accompanying drawings, wherein:
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The detailed description has been presented for purposes of illustration, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.
As will be appreciated by one skilled in the art, aspects may be embodied as a system, method or computer program product. Accordingly, aspects may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present disclosure may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
Any combination of one or more computer readable storage medium(s) may be utilized. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. As used herein, a computer readable storage medium does not include a transitory signal.
Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
Aspects of the present disclosure are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
The following detailed description will generally follow the summary, as set forth above, further explaining and expanding the definitions of the various aspects and embodiments as necessary. To this end, this detailed description first sets forth a computing environment in
Northbridge 115 and Southbridge 135 connect to each other using bus 119. In one embodiment, the bus is a Direct Media Interface (DMI) bus that transfers data at high speeds in each direction between Northbridge 115 and Southbridge 135. In another embodiment, a Peripheral Component Interconnect (PCI) bus connects the Northbridge and the Southbridge. Southbridge 135, also known as the I/O Controller Hub (ICH) is a chip that generally implements capabilities that operate at slower speeds than the capabilities provided by the Northbridge. Southbridge 135 typically provides various busses used to connect various components. These busses include, for example, PCI and PCI Express busses, an ISA bus, a System Management Bus (SMBus or SMB), and/or a Low Pin Count (LPC) bus. The LPC bus often connects low-bandwidth devices, such as boot ROM 196 and “legacy” I/O devices (using a “super I/O” chip). The “legacy” I/O devices (198) can include, for example, serial and parallel ports, keyboard, mouse, and/or a floppy disk controller. The LPC bus also connects Southbridge 135 to Trusted Platform Module (TPM) 195. Other components often included in Southbridge 135 include a Direct Memory Access (DMA) controller, a Programmable Interrupt Controller (PIC), and a storage device controller, which connects Southbridge 135 to nonvolatile storage device 185, such as a hard disk drive, using bus 184.
ExpressCard 155 is a slot that connects hot-pluggable devices to the information handling system. ExpressCard 155 supports both PCI Express and USB connectivity as it connects to Southbridge 135 using both the Universal Serial Bus (USB) and the PCI Express bus. Southbridge 135 includes USB Controller 140 that provides USB connectivity to devices that connect to the USB. These devices include webcam (camera) 150, infrared (IR) receiver 148, keyboard and trackpad 144, and Bluetooth device 146, which provides for wireless personal area networks (PANs). USB Controller 140 also provides USB connectivity to other miscellaneous USB connected devices 142, such as a mouse, removable nonvolatile storage device 145, modems, network cards, ISDN connectors, fax, printers, USB hubs, and many other types of USB connected devices. While removable nonvolatile storage device 145 is shown as a USB-connected device, removable nonvolatile storage device 145 could be connected using a different interface, such as a Firewire interface, etcetera.
Wireless Local Area Network (LAN) device 175 connects to Southbridge 135 via the PCI or PCI Express bus 172. LAN device 175 typically implements one of the IEEE 802.11 standards of over-the-air modulation techniques that all use the same protocol to wirelessly communicate between information handling system 100 and another computer system or device. Optical storage device 190 connects to Southbridge 135 using Serial ATA (SATA) bus 188. Serial ATA adapters and devices communicate over a high-speed serial link. The Serial ATA bus also connects Southbridge 135 to other forms of storage devices, such as hard disk drives. Audio circuitry 160, such as a sound card, connects to Southbridge 135 via bus 158. Audio circuitry 160 also provides functionality such as audio line-in and optical digital audio in port 162, optical digital output and headphone jack 164, internal speakers 166, and internal microphone 168. Ethernet controller 170 connects to Southbridge 135 using a bus, such as the PCI or PCI Express bus. Ethernet controller 170 connects information handling system 100 to a computer network, such as a Local Area Network (LAN), the Internet, and other public and private computer networks.
While
The Trusted Platform Module (TPM 195) shown in
The core reasoning behind this algorithm is that N channels of audio arranged around a listener can be represented as an array {A_0 . . . A_(2π−θ)} for each t, where A is amplitude, θ is the sampling angle, and t is the time sample. The interval of θ can be chosen to give as rich or as poor a sampling rate as desired. At the lower limit of θ=2π, such a representation devolves to the monaural case of {A_0}, {A_1}, {A_2}, . . . {A_n} for t={0 . . . n}. For higher dimensions of θ, the sampling rate can be constructed as fits the fidelity needs of the source. For example, a 7.1 stream can be sampled without artifacts at θ=π/13.
For efficiency in compression and calculation, in one embodiment, the values of θ are restricted to powers of 2. This restriction gains four advantages. First, this restriction provides the ability to incorporate variable sampling depths without allocating too much data on indicator bits. Second, this restriction provides the ability to use packed binary compression routines against the sample data. Third, this restriction provides for automatic alignment of the data stream. And fourth, this restriction provides speed efficiency in higher level compression transforms.
Sampling
A sampling methodology of the analog audio is utilized. In one embodiment, the sampling methodology receives N channels of digital audio input coming in from a digital or analog source. Each channel has a constant associated angle αc from an arbitrary reference zero angle. A bit depth for each sample is specified ahead of time, such as an 8 or 16 bit depth. In addition, a time-based sampling rate is chosen ahead of time.
In one embodiment, such as for high-fidelity analog applications, the analog inputs are physically arranged along axes evenly distributed along the number of input channels. In another embodiment, arbitrary arrangements are utilized, such as for usual mid-fidelity sample bit depths of 8 or 16. The minimum angular division τ between two channels is computed by subtracting each αc from αc+1 modulo 2π. An angular sample size of θ=2π/(τ*2) is chosen. Angle zero is chosen in such a way that no analog input lies on a boundary, and the distribution across all samples is such that every other sample has no inputs lying in it. In one embodiment, angle zero represents the approximate direction of the intended observer, or listener, of the audio. Each audio channel from {1 . . . N} is assigned to a sample channel in {0 . . . 2π−θ}. This creates a sparse incoming channel signal.
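The channel-to-angle assignment described above can be sketched as follows. This is an illustrative sketch only, not the disclosed implementation. The function names are hypothetical, and the bin-count rule is an assumption: the sketch picks the smallest power-of-two number of angular samples whose width is at most half of τ, which guarantees that no two channels share an angular sample. It does not reproduce the expression θ=2π/(τ*2) literally and does not perform the zero-angle adjustment.

```python
import math

TWO_PI = 2 * math.pi


def min_angular_division(angles):
    """Smallest circular gap (radians) between adjacent channel angles.

    Expects at least two distinct channel angles.
    """
    ordered = sorted(a % TWO_PI for a in angles)
    gaps = [
        (ordered[(i + 1) % len(ordered)] - ordered[i]) % TWO_PI
        for i in range(len(ordered))
    ]
    return min(g for g in gaps if g > 0)


def choose_bin_count(tau):
    """Assumption: smallest power-of-two count with bin width <= tau / 2."""
    bins = 1
    while TWO_PI / bins > tau / 2:
        bins *= 2
    return bins


def assign_bins(angles, bins, zero_angle=0.0):
    """Map each channel angle to the index of the angular sample containing it."""
    theta = TWO_PI / bins
    return [int(((a - zero_angle) % TWO_PI) // theta) % bins for a in angles]


# Five channels at arbitrary angles (radians), a hypothetical arrangement.
channel_angles = [0.1, 1.3, 2.6, 3.9, 5.2]
tau = min_angular_division(channel_angles)
bins = choose_bin_count(tau)
slots = assign_bins(channel_angles, bins)
```

With five channels spread over sixteen angular samples, most samples are empty, which is the sparse incoming signal the text refers to.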
For each time t, a sample of the desired bit depth is taken from the input in each angle and the resulting channels connected together into a continuous waveform. Zero channels are dropped, and the dropped channels noted as a separate part of the sample. The samples are arranged in a variable length digital array for each time t.
In an embodiment using fewer than four channels, somewhat different handling may be utilized. In the case of two speakers that are not aligned opposite each other, or three speakers, it becomes inefficient to digitize on equal size channels. In this case, bytes that specify the angular offset of each channel can be added to the zero adjustment and marked in a compression header to aid in better decoding. Such header marking comprises one, two, or three 16 bit floating point values measured in radians.
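As a rough illustration of the low-channel header marking, the sketch below packs one to three angular offsets as 16-bit floating point values using the half-precision format code of Python's standard `struct` module. The function names and the little-endian byte order are assumptions made for the example; the text does not specify a byte layout.

```python
import struct


def pack_offsets(offsets_rad):
    """Pack one to three angular offsets (radians) as 16-bit floats."""
    if not 1 <= len(offsets_rad) <= 3:
        raise ValueError("expected one, two, or three offsets")
    # "e" is the IEEE 754 half-precision format; "<" (little-endian) is assumed.
    return struct.pack("<%de" % len(offsets_rad), *offsets_rad)


def unpack_offsets(blob):
    """Recover the offsets from a blob produced by pack_offsets."""
    count = len(blob) // 2
    return list(struct.unpack("<%de" % count, blob))


packed = pack_offsets([0.5, 1.0])
restored = unpack_offsets(packed)
```

Half precision keeps each offset to two bytes at the cost of angular resolution, so offsets that are not exactly representable come back rounded.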
Compression
Once an angular based array representation of the sample data is created, the results are compressed in several steps. First, a compression header is created. In one embodiment, the compression header has the following elements: (1) an eyecatcher that indicates the kind of compression used; (2) a version element; (3) a file size; (4) an entry indicating the number of angular channel samples; (5) an entry indicating the bit depth of each channel sample; (6) an entry indicating the time division sampling rate; and (7) an optional entry for angular displacement and low channel special case (i.e., fewer than four channels).
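One possible fixed-size layout for elements (1) through (6) of the compression header is sketched below. Every field width, the byte order, and the eyecatcher value `b"ANGC"` are hypothetical choices made for illustration; the text names the elements but not their sizes. The optional element (7) is omitted.

```python
import struct
from dataclasses import dataclass

# Assumed layout: 4-byte eyecatcher, uint16 version, uint32 file size,
# uint16 angular sample count, uint8 bit depth, uint32 sampling rate.
HEADER_FORMAT = "<4sHIHBI"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


@dataclass
class CompressionHeader:
    eyecatcher: bytes
    version: int
    file_size: int
    angular_samples: int
    bit_depth: int
    sample_rate_hz: int

    def pack(self):
        """Serialize the header fields in declaration order."""
        return struct.pack(
            HEADER_FORMAT,
            self.eyecatcher,
            self.version,
            self.file_size,
            self.angular_samples,
            self.bit_depth,
            self.sample_rate_hz,
        )

    @classmethod
    def unpack(cls, blob):
        """Parse a header from the first HEADER_SIZE bytes of blob."""
        return cls(*struct.unpack(HEADER_FORMAT, blob[:HEADER_SIZE]))


header = CompressionHeader(b"ANGC", 1, 0, 16, 16, 44100)
round_trip = CompressionHeader.unpack(header.pack())
```

The file size is written as zero here because, per the flow described later, the size is only known once the compressed stream is complete.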
Compression starts with an array of 2π/θ samples, such as {S_0, S_1, S_2 . . . S_(2π−θ)}. The approach reduces the sample array by dropping out (removing) zero values. Every other sample will be empty due to zero position adjustment, so the channels that contain data are noted in a bitfield B of the size π/θ. The channel samples are normalized against themselves by subtracting out a quantized mode value. The normalization constant M is stored.
In the approach utilizing this embodiment, the sample at time t now appears as {B, M, S_0-M, S_1-M . . . S_(2π−θ)-M}. At this point, using typical audio data, the majority of samples will now be zero. The approach uses this characteristic to make a determination based on the number of zeroes. If a typical sample is detected, the approach runs a run-length encoding (RLE) compression to reduce the sparse matrix to a smaller, non-sparse matrix. The RLE data is smaller than sample data (2-6 bits vs 8 or 16) so the approach can combine it with a known property bitfield to indicate that the data is RLE data. For example, the approach might define a bitfield of 16 bits with 1s on each end that is impossible in the sample data to represent RLE data.
In the approach, the sample at time t now looks like {B, M, S_0-M|Z_0, . . . S_(2π−θ)-M|Z_x}. The sample no longer has any zero samples in it and is fully useful data. At this point, the approach measures the compression of the sample against a desired goal. If compression is sufficient, the sample is stored and processing moves to the next time mark. At the end of the sample, the approach adds a unique eyecatcher, such as an eyecatcher of eight zero bits, indicating that the sample is stored. If additional compression is required, the approach runs a bitwise Fourier transform on the sample array. This will produce a new set of samples with a large number of contiguous bits. A bitwise RLE or token compression can be done to reduce the payload size further. Lossy compression can be done at this stage to even further reduce the data payload.
In one embodiment, the final compressed sample appears as {B, M, F_0, F_1, . . . F_j} where j<<2π/θ. This is stored along with an end eyecatcher indicating how the sample was further compressed. Samples are strung together along with time marks to compose the compressed audio bitstream. This bit stream can be saved or transmitted for later decompression.
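The zero-dropping, mode normalization, and run-length steps can be sketched on a single time sample as follows. This is a simplified illustration under stated assumptions: the bitfield here covers every angular sample rather than only π/θ positions, the plain mode is used in place of a quantized mode, and zero runs are written as hypothetical `("Z", run_length)` tokens rather than the bit-level marker the text describes.

```python
from collections import Counter


def drop_empty(bins):
    """Return (bitfield, kept): bit i is set when angular sample i is non-zero."""
    bitfield = 0
    kept = []
    for i, value in enumerate(bins):
        if value != 0:
            bitfield |= 1 << i
            kept.append(value)
    return bitfield, kept


def normalize(values):
    """Subtract the most common value M from every entry; return (M, residuals)."""
    if not values:
        return 0, []
    mode = Counter(values).most_common(1)[0][0]
    return mode, [v - mode for v in values]


def rle_zeros(values):
    """Collapse runs of zeros into ("Z", run_length) tokens."""
    out = []
    run = 0
    for v in values:
        if v == 0:
            run += 1
            continue
        if run:
            out.append(("Z", run))
            run = 0
        out.append(v)
    if run:
        out.append(("Z", run))
    return out


# Ten angular samples with data in every other position (hypothetical values).
sample = [0, 120, 0, 120, 0, 120, 0, 123, 0, 120]
B, kept = drop_empty(sample)
M, residuals = normalize(kept)
encoded = rle_zeros(residuals)
```

In this example only one residual is non-zero after normalization, so the ten original values reduce to B, M, and three tokens.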
Decompression
In one embodiment, decompression begins by receiving a compression header. The version included in the header is used to determine which algorithms are supported. The bit depth and time clocking found in the header are used to determine the size of receiver buffers and loops to use in decompression. Once initialized, the decompression proceeds on a time sample by time sample basis. For each time sample: (1) the eyecatcher is read and optional standard compression steps undone; (2) any Fourier transform (FFT) data is reversed; (3) RLE is used to expand the sample bits and zeroes into their respective bytes; (4) the quantization value is added back into the data; (5) zero channels are added back into the data; and (6) angular offsets, if present, are added back in to the data.
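Steps (3) through (5) of the per-sample decompression can be sketched as the inverse of a simple encoder. The token format is an assumption made for the example: zero runs are represented as `("Z", run_length)` tuples, and the bitfield has one bit per angular sample. Steps (1), (2), and (6) are not shown.

```python
def expand_rle(tokens):
    """Step (3): expand ("Z", n) tokens back into n zero residuals."""
    out = []
    for tok in tokens:
        if isinstance(tok, tuple) and tok[0] == "Z":
            out.extend([0] * tok[1])
        else:
            out.append(tok)
    return out


def denormalize(mode, residuals):
    """Step (4): add the stored normalization constant M back in."""
    return [r + mode for r in residuals]


def restore_empty(bitfield, kept, bins):
    """Step (5): reinsert zero channels wherever the bitfield bit is clear."""
    values = iter(kept)
    return [next(values) if (bitfield >> i) & 1 else 0 for i in range(bins)]


def decode_sample(bitfield, mode, tokens, bins):
    """Run steps (3) to (5) for one time sample."""
    residuals = expand_rle(tokens)
    kept = denormalize(mode, residuals)
    return restore_empty(bitfield, kept, bins)


decoded = decode_sample(682, 120, [("Z", 3), 3, ("Z", 1)], 10)
```

The number of set bits in the bitfield must equal the number of expanded residuals; a production decoder would validate this before rebuilding the array.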
Processing commences at step 520, where the process digitizes analog sound into N digital data streams (e.g., one stream per microphone, etc.). In the example shown, the sound would be digitized into seven data streams as seven microphones are depicted in audio recording location 500. However, any number of audio input devices can be utilized.
At step 525, the process gathers location metadata and this metadata is associated with each stream (angle of each microphone from sound source, etc.). For example, if the intended observer of the audio is represented by microphone 511, the location metadata of the stream corresponding to microphone 511 might be angle zero with the other microphones being at their respective angle intervals from microphone 511. In one embodiment, the location metadata is input through metadata entry 530 which may be a manual or automated process depending on the sophistication of audio recording location 500. The audio stream metadata is stored in data store 540.
At predefined process 550, the process performs the Combine Streams routine that combines the streams into a desired uncompressed representation (see
Data store 560 represents the audio stream data that is needed to perform compression as shown in
Processing commences whereupon, at step 610, the process computes the minimum angular division τ between two channels by subtracting each αc from αc+1 modulo 2π. At step 620, the process selects an angular sample size of θ=2π/(τ*2). At step 630, the process selects an input as angle zero with this input representing the direction of the intended observer of the audio. At step 635, the zero angle is adjusted so that no channel lies exactly on a sample border and so that a maximum number of empty samples are attained. At step 640, the process assigns each audio channel from {1 . . . N} to a sample channel in the range of {0 . . . 2π−θ} radians. This creates a sparse incoming channel signal. At step 650, for each time t, the process takes a sample of the desired bit depth from the input in each of the angles and the resulting channels are connected together into a continuous waveform. At step 660, the process drops, or removes, channels with values of zero, and the dropped channels are noted as a separate part of the sample. At step 670, the process arranges the samples in a variable length digital array for each time t. The audio data from N channels are stored in data store 560.
At step 710, the process determines the angle of the closest two input channels. At step 715, the process chooses a sampling angle size. At step 720, the process creates a compression header and fills in the known elements (e.g., eyecatcher, version, number of angular samples, angle offsets, channel bit depth, etc.). At step 730, the process grabs a first sample from each of the N channels. A loop is established with the process processing samples until no more samples remain (decision 735). Until the routine runs out of samples, decision 735 continues to branch to the ‘no’ branch to process the last sample grabbed. The looping continues until there are no more samples, at which point decision 735 branches to the ‘yes’ branch to conclude compression processing.
Steps 740 through 785 are processed for the sample grabbed at step 730. The process determines whether sequential zeros or constants dominate the sample that was grabbed (decision 740). If sequential zeros or constants dominate the sample that was grabbed, then decision 740 branches to the ‘yes’ branch whereupon, at step 745, run-length encoding (RLE) is performed on the sample. A determination is made as to whether the RLE compression of the sample was sufficient to satisfy compression thresholds (decision 750). If the RLE compression was not sufficient, then decision 750 branches to the ‘no’ branch for further compression steps. On the other hand, if the RLE compression was sufficient, then decision 750 branches to the ‘yes’ branch bypassing further compression found in steps 755 through 780.
Returning to decision 740, if sequential zeros or constants do not dominate the sample that was grabbed, then decision 740 branches to the ‘no’ branch bypassing the RLE compression found in steps 745 and 750. At step 755, the process performs a Fourier transform of the sample and the sample is accordingly marked as having been Fourier transformed. At step 760, the process performs an RLE compression of the Fourier transformed (FFT) data. The process determines whether to perform lossy compression on the sample (decision 765). The decision might be made based on a compression threshold so that lossy compression is performed if further compression of the sample is desired in view of the threshold.
If lossy compression is being performed on the sample, then decision 765 branches to the ‘yes’ branch to perform steps 770 through 780. On the other hand, if lossy compression is not being performed on the sample, then decision 765 branches to the ‘no’ branch bypassing steps 770 through 780. During lossy compression, at step 770, the process normalizes the sample. Then, at step 775, the process quantizes the sample. Finally, at step 780, the process marks the sample as having been lossy compressed. At step 785, after the sample has been compressed using steps 740 through 780, the process stores the compressed sample, the time corresponding to the sample, and any compression marks pertaining to the sample into compressed audio stream 725. Returning to decision 735, when the routine runs out of samples to process, then decision 735 branches to the ‘yes’ branch whereupon, at step 790, the size of the compressed audio stream is marked in the header area of the audio stream. Compression of the audio data using vector fields thereafter ends at 795.
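The branching among decisions 740, 750, and 765 can be sketched as control flow. This sketch only illustrates the decision structure: `transform` and `quantize` are caller-supplied placeholders for the Fourier transform and the normalize/quantize steps, the mark strings and the 50% zero threshold are hypothetical, and the sketch assumes that when RLE alone is insufficient the later stages restart from the raw sample, a point the text leaves open.

```python
def run_length(values):
    """Encode a list as [value, count] pairs."""
    pairs = []
    for v in values:
        if pairs and pairs[-1][0] == v:
            pairs[-1][1] += 1
        else:
            pairs.append([v, 1])
    return pairs


def zeros_dominate(sample, fraction=0.5):
    """Decision 740, simplified: more than `fraction` of entries are zero."""
    return sum(1 for v in sample if v == 0) > fraction * len(sample)


def compress_sample(sample, target_len, transform, quantize):
    """Return (encoded, marks) following the flowchart's branches."""
    marks = []
    if zeros_dominate(sample):               # decision 740
        encoded = run_length(sample)         # step 745
        if len(encoded) <= target_len:       # decision 750
            return encoded, ["RLE"]
    transformed = transform(sample)          # step 755 (placeholder)
    marks.append("FFT")
    encoded = run_length(transformed)        # step 760
    if len(encoded) > target_len:            # decision 765
        encoded = quantize(encoded)          # steps 770-775 (placeholder)
        marks.append("LOSSY")                # step 780
    return encoded, marks


# Stand-in stages for demonstration only; neither is a real transform.
sparse, sparse_marks = compress_sample(
    [0, 0, 0, 0, 5, 0, 0, 0], 4, lambda s: s, lambda e: e[:1]
)
dense, dense_marks = compress_sample(
    [1, 2, 3, 4], 2, list, lambda e: e[:2]
)
```

The returned marks correspond to the compression marks stored with each sample at step 785, which the decompressor can use to decide which stages to reverse.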
At step 810, the process grabs a compressed sample from data store 725. A loop is established to process samples until there are no more samples to process (decision 815). While samples remain to be processed, decision 815 continues to branch to the ‘no’ branch to decompress and output the sample. This looping continues until there are no more samples to process, at which point decision 815 branches to the ‘yes’ branch whereupon decompression processing ends at 895.
At step 820, the process decodes the selected sample using run-length encoding (RLE) if any RLE encoding was found in the sample. The process determines whether the sample contains additional compression (decision 825). If the sample contains additional compression, then decision 825 branches to the ‘yes’ branch to further decompress using steps 830 through 850. On the other hand, if the sample does not contain additional compression, then decision 825 branches to the ‘no’ branch bypassing steps 830 through 850. The process determines whether the sample was compressed using lossy compression (decision 830). If the sample was compressed using lossy compression, then decision 830 branches to the ‘yes’ branch whereupon, at step 835, the sample is de-normalized and, at step 840, the process interpolates quantized elements pertaining to the sample. On the other hand, if the sample was not compressed using lossy compression, then decision 830 branches to the ‘no’ branch bypassing steps 835 and 840.
At step 845, the process performs a reverse Fourier transform (FFT) on the sample. At step 850, the process decodes the sample using RLE decoding. After the sample has been decompressed using steps 820 through 850, then at step 855, the process de-normalizes the sample. The decompressed and de-normalized sample is then output to an audio renderer at step 860 with the audio renderer receiving angular encoded audio data which is stored in memory area 865.
While particular embodiments have been shown and described, it will be obvious to those skilled in the art, based upon the teachings herein, that changes and modifications may be made without departing from this disclosure and its broader aspects. Therefore, the appended claims are to encompass within their scope all such changes and modifications as are within the true spirit and scope of this disclosure. Furthermore, it is to be understood that the invention is solely defined by the appended claims. It will be understood by those with skill in the art that if a specific number of an introduced claim element is intended, such intent will be explicitly recited in the claim, and in the absence of such recitation no such limitation is present. For non-limiting example, as an aid to understanding, the following appended claims contain usage of the introductory phrases “at least one” and “one or more” to introduce claim elements. However, the use of such phrases should not be construed to imply that the introduction of a claim element by the indefinite articles “a” or “an” limits any particular claim containing such introduced claim element to others containing only one such element, even when the same claim includes the introductory phrases “one or more” or “at least one” and indefinite articles such as “a” or “an”; the same holds true for the use in the claims of definite articles.
Number | Date | Country | |
---|---|---|---|
20160293169 A1 | Oct 2016 | US |