This disclosure relates generally to transforms, and, more particularly, to methods and apparatus to identify sources of network streaming services using windowed sliding transforms.
The sliding discrete Fourier transform (DFT) is a method for efficiently computing the N-point DFT of a signal starting at sample m using the N-point DFT of the same signal starting at the previous sample m−1. The sliding DFT obviates the conventional need to compute a whole DFT for each starting sample.
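The recurrence described above can be illustrated with a minimal sketch. It assumes a forward DFT with the `exp(-j2πnk/N)` sign convention (as used by NumPy); the function name `sliding_dft_step` and the test signal are hypothetical and are not taken from this disclosure.

```python
import numpy as np

def sliding_dft_step(X_prev, x_out, x_in):
    """Update an N-point DFT when the block start advances by one sample.

    X_prev: DFT of the block starting at sample m-1
    x_out:  sample leaving the block, x[m-1]
    x_in:   sample entering the block, x[m-1+N]
    """
    N = len(X_prev)
    twiddle = np.exp(2j * np.pi * np.arange(N) / N)  # fixed coefficients
    return (X_prev - x_out + x_in) * twiddle

rng = np.random.default_rng(0)
N = 16
x = rng.standard_normal(4 * N)

X = np.fft.fft(x[:N])  # only the first block uses a conventional DFT
max_err = 0.0
for m in range(1, len(x) - N + 1):
    X = sliding_dft_step(X, x[m - 1], x[m - 1 + N])
    # Compare against a full DFT of the same block (for checking only).
    max_err = max(max_err, np.max(np.abs(X - np.fft.fft(x[m:m + N]))))
```

Each update costs on the order of N operations, whereas the reference `np.fft.fft` call inside the loop recomputes the whole transform and is present only to verify the result.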
In general, the same reference numbers will be used throughout the drawing(s) and accompanying written description to refer to the same or like parts. Connecting lines and/or connections shown in the various figures presented are intended to represent example functional relationships, physical couplings and/or logical couplings between the various elements.
Sliding transforms are useful in applications that require the computation of multiple DFTs for different portions, blocks, etc. of an input signal. For example, sliding transforms can be used to reduce the computations needed to compute transforms for different combinations of starting samples and window functions. For example, different combinations of starting samples and window functions can be used to identify the compression scheme applied to an audio signal as, for example, disclosed in U.S. patent application Ser. No. 15/793,543, filed on Oct. 25, 2017. The entirety of U.S. patent application Ser. No. 15/793,543 is incorporated herein by reference. Conventional solutions require that an entire DFT be computed after each portion of the input signal has had a window function applied. Such solutions are computationally inefficient and/or burdensome. In stark contrast, windowed sliding transformers are disclosed herein that can obtain the computational benefit of sliding transforms even when a window function is to be applied.
Reference will now be made in detail to non-limiting examples, some of which are illustrated in the accompanying drawings.
where the coefficients
are fixed values. An example operation of the example transformer 102 of
Conventionally, the DFT Z(i) of a portion of an input signal x after the portion has been windowed with a window function w is computed using the following mathematical expression:
Accordingly, an entire DFT must be computed for each portion of the input signal in known systems.
In some examples, the input signal 106 is held (e.g., buffered, queued, temporarily held, temporarily stored, etc.) for any period of time in an example buffer 110.
When EQN (2) is rewritten according to teachings of this disclosure using Parseval's theorem, as shown in the mathematical expression of EQN (3), the window function w is expressed as a kernel Kk,k′ 112, which can be applied to the transformed representation X(i) 108 of the portion 104.
In EQN (3), the transformed representation X(i) 108 of the portion 104 can be implemented using the example sliding DFT of EQN (1), as shown in EQN (4).
where the coefficients
and the kernel Kk,k′ 112 are fixed values. In stark contrast to conventional solutions, using EQN (4) obviates the requirement for a high-complexity transform to be computed for each portion of the input. Instead, EQN (4) provides a low-complexity sliding transform together with a low-complexity application of the kernel Kk,k′ 112.
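As a sketch in standard DFT notation, the relationships described above can be written as follows. The index and sign conventions are assumptions (a forward DFT using $e^{-j2\pi nk/N}$, with $x_{i+n}$ denoting sample $n$ of the block that starts at sample $i$); they are chosen for internal consistency and may differ in form from the numbered expressions of this disclosure.

```latex
\begin{aligned}
Z_k^{(i)} &= \sum_{n=0}^{N-1} w_n\, x_{i+n}\, e^{-j 2\pi n k / N}
  && \text{(window, then a full DFT)} \\
X_k^{(i)} &= \left[ X_k^{(i-1)} - x_{i-1} + x_{i+N-1} \right] e^{\,j 2\pi k / N}
  && \text{(sliding DFT)} \\
Z_k^{(i)} &= \sum_{k'=0}^{N-1} X_{k'}^{(i)}\, K_{k,k'},
\qquad
K_{k,k'} = \frac{1}{N} \sum_{n=0}^{N-1} w_n\, e^{-j 2\pi n (k-k') / N}
  && \text{(kernel applied in the frequency domain)}
\end{aligned}
```

Under these conventions the kernel depends only on the difference $k-k'$ and on the window $w$, not on the input signal, which is why it can be treated as a fixed value.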
To window the transformed representation 108, the example windowed sliding transformer 100 of
where the coefficients
and Kk,k′ are fixed values.
To compute the kernel 112, the example windowed sliding transformer 100 includes an example kernel generator 122. The example kernel generator 122 of
where ℱ( ) is a Fourier transform. The kernel Kk,k′ 112 is a frequency-domain representation of the window function w 120. The example windower 114 applies the frequency-domain representation Kk,k′ 112 to the frequency-domain representation X(i) 108. The kernel Kk,k′ 112 needs to be computed only once and, in some examples, is sparse. Accordingly, not all of the computations of multiplying the transformed representation X(i) and the kernel Kk,k′ 112 in EQN (3) and EQN (4) need to be performed. In some examples, the sparseness of the kernel Kk,k′ 112 is increased by only keeping values that satisfy (e.g., are greater than) a threshold. Example windows 120 include, but are not limited to, the sine, slope and Kaiser-Bessel-derived (KBD) windows.
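The kernel computation and thresholding described above might be sketched as follows. This is an illustration under stated assumptions, not the kernel generator 122 itself: the function name `make_kernel`, the 1/N scaling, the choice of a sine window of length 64, and the use of a threshold relative to the largest kernel magnitude are all hypothetical.

```python
import numpy as np

def make_kernel(w, rel_threshold=0.0):
    """Build an N x N frequency-domain kernel from a time-domain window w.

    Entries whose magnitude falls below rel_threshold times the largest
    magnitude are set to zero (assumed thresholding rule).
    """
    N = len(w)
    W = np.fft.fft(w) / N
    k = np.arange(N)
    K = W[(k[:, None] - k[None, :]) % N]  # K[k, k'] = W[(k - k') mod N]
    if rel_threshold > 0.0:
        K = np.where(np.abs(K) >= rel_threshold * np.abs(K).max(), K, 0.0)
    return K

N = 64
n = np.arange(N)
w = np.sin(np.pi * (n + 0.5) / N)  # sine window

rng = np.random.default_rng(1)
x = rng.standard_normal(N)
X = np.fft.fft(x)  # unwindowed frequency-domain representation

K_full = make_kernel(w)
K_sparse = make_kernel(w, rel_threshold=0.02)

Z_ref = np.fft.fft(w * x)  # window applied in the time domain, for reference
exact_err = np.max(np.abs(K_full @ X - Z_ref))
rel_err = np.linalg.norm(K_sparse @ X - Z_ref) / np.linalg.norm(Z_ref)
nonzeros_per_row = np.count_nonzero(K_sparse, axis=1)
```

With the full kernel the product matches the time-domain windowing to within rounding error. With the thresholded kernel only a few entries per row remain, at the cost of a small approximation error that depends on the window and the chosen threshold.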
References have been made above to sliding windowed DFT transforms. Other forms of sliding windowed transforms can be implemented. For example, the sliding N-point MDCT Y(i) 108 of an input signal x 106 starting from sample i from the N-point DFT X(i−1) of the input signal x 106 starting from sample i−1 can be expressed mathematically as:
where the kernel Kk,k′112 is computed using the following mathematical expression:
In another example, the sliding N-point complex MDCT (i) 108 of an input signal x 106 starting from sample i from the N-point DFT X(i−1) of the input signal x 106 starting from sample i−1 can be expressed mathematically as:
where the kernel Kk,k′112 is computed using the following mathematical expression:
While an example manner of implementing the example windowed sliding transformer 100 is illustrated in
A flowchart representative of example hardware logic or machine-readable instructions for implementing the windowed sliding transformer 100 is shown in
As mentioned above, the example processes of
“Including” and “comprising” (and all forms and tenses thereof) are used herein to be open ended terms. Thus, whenever a claim employs any form of “include” or “comprise” (e.g., comprises, includes, comprising, including, having, etc.) as a preamble or within a claim recitation of any kind, it is to be understood that additional elements, terms, etc. may be present without falling outside the scope of the corresponding claim or recitation. As used herein, when the phrase “at least” is used as the transition term in, for example, a preamble of a claim, it is open-ended in the same manner as the terms “comprising” and “including” are open ended. The term “and/or” when used, for example, in a form such as A, B, and/or C refers to any combination or subset of A, B, C such as (1) A alone, (2) B alone, (3) C alone, (4) A with B, (5) A with C, (6) B with C, and (7) A with B and with C.
The program of
The transformer 102 computes a DFT 108 of a first block 104 of samples of an input signal 106 (block 404). In some examples, the DFT 108 of the first block 104 is a conventional DFT. For all blocks 104 of the input signal 106 (block 406), the transformer 102 computes a DFT 108 of each block 104 based on the DFT 108 of a previous block 104 (block 408) by implementing, for example, the example mathematical expression of EQN (4).
For all kernels Kk,k′112 computed at block 402 (block 410), the example windower 114 applies the kernel Kk,k′112 to the current DFT 108 (block 412). For example, the example multiplier 116 implements the multiplication of the kernel Kk,k′ 112 and the DFT 108 shown in the example mathematical expression of EQN (3).
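The flow described above (kernels computed once, a conventional DFT for the first block, a sliding update for each later block, and each kernel applied to the current DFT) can be sketched as follows. The function names, the dictionary of windows, and the inclusion of a rectangular window alongside the sine window are assumptions made for illustration; the comments map steps to the block numbers mentioned in the text.

```python
import numpy as np

def kernel_from_window(w):
    """Frequency-domain kernel K[k, k'] = W[(k - k') mod N] / N."""
    N = len(w)
    W = np.fft.fft(w) / N
    k = np.arange(N)
    return W[(k[:, None] - k[None, :]) % N]

def windowed_sliding_dfts(x, windows):
    """Yield (start, name, Z) for every block start and every window."""
    N = len(next(iter(windows.values())))
    # Block 402: compute each kernel once.
    kernels = {name: kernel_from_window(w) for name, w in windows.items()}
    twiddle = np.exp(2j * np.pi * np.arange(N) / N)
    X = np.fft.fft(x[:N])                    # block 404: conventional DFT
    for start in range(len(x) - N + 1):      # block 406: all blocks
        if start > 0:                        # block 408: sliding update
            X = (X - x[start - 1] + x[start - 1 + N]) * twiddle
        for name, K in kernels.items():      # block 410: all kernels
            yield start, name, K @ X         # block 412: apply the kernel

N = 32
n = np.arange(N)
windows = {"sine": np.sin(np.pi * (n + 0.5) / N), "rect": np.ones(N)}

rng = np.random.default_rng(2)
x = rng.standard_normal(3 * N)
results = list(windowed_sliding_dfts(x, windows))

# Check against windowing in the time domain followed by a full DFT.
worst = max(
    np.max(np.abs(Z - np.fft.fft(windows[name] * x[s:s + N])))
    for s, name, Z in results
)
```

The dense matrix product is used here for clarity only; when the kernel is sparse, only its non-zero entries need to be multiplied.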
When all kernels Kk,k′112 and blocks 104 have been processed (blocks 414 and 416), control exits from the example program of
U.S. patent application Ser. No. 15/793,543 disclosed that, in some instances, different sources of streaming media (e.g., NETFLIX®, HULU®, YOUTUBE®, AMAZON PRIME®, APPLE TV®, etc.) use different audio compression configurations to store and stream the media they host. In some examples, an audio compression configuration is a set of one or more parameters that define, among possibly other things, an audio coding format (e.g., MP1, MP2, MP3, AAC, AC-3, Vorbis, WMA, DTS, etc.), compression parameters, framing parameters, etc. Because different sources use different audio compression, the sources can be distinguished (e.g., identified, detected, determined, etc.) based on the audio compression applied to the media. The media is de-compressed during playback. In some examples, the de-compressed audio signal is re-compressed using different trial audio compression configurations and examined for compression artifacts. Because compression artifacts become detectable (e.g., perceptible, identifiable, distinct, etc.) when a particular audio compression configuration matches the compression used during the original encoding, the presence of compression artifacts can be used to identify one of the trial audio compression configurations as the audio compression configuration used originally. After the compression configuration is identified, an audience measurement entity (AME) can infer the original source of the audio. Example compression artifacts are discontinuities between points in a spectrogram, a plurality of points in a spectrogram that are small (e.g., below a threshold, relative to other points in the spectrogram), one or more values in a spectrogram having probabilities of occurrence that are disproportionate compared to other values (e.g., a large number of small values), etc.
In instances where two or more sources use the same audio compression configuration and are associated with compression artifacts, the audio compression configuration may be used to reduce the number of sources to consider. Other methods may then be used to distinguish between the sources. However, for simplicity of explanation the examples disclosed herein assume that sources are associated with different audio compression configurations.
Disclosed examples identify the source(s) of media by identifying the audio compression applied to the media (e.g., to an audio portion of the media). In some examples, audio compression identification includes the identification of the compression that an audio signal has undergone, regardless of the content. Compression identification can include, for example, identification of the bit rate at which the audio data was encoded, the parameters used at the time-frequency decomposition stage, the samples in the audio signal where the framing took place before the windowing and transform were applied, etc. As disclosed herein, the audio compression can be identified from media that has been de-compressed and output using an audio device such as a speaker, and recorded. The recorded audio, which has undergone lossy compression and de-compression, can be re-compressed according to different trial audio compressions. In some examples, the trial re-compression that results in the largest compression artifacts is identified as the audio compression that was used to originally compress the media. The identified audio compression is used to identify the source of the media. While the examples disclosed herein only partially re-compress the audio (e.g., perform only the time-frequency analysis stage of compression), full re-compression may be performed. Reference will now be made in detail to non-limiting examples of this disclosure, examples of which are illustrated in the accompanying drawings. The examples are described below by referring to the drawings.
To present (e.g., playback, output, display, etc.) media, the example environment 500 of
To present (e.g., playback, output, etc.) audio (e.g., a song, an audio portion of a video, etc.), the example media presentation device 514 includes an example audio de-compressor 518, and an example audio output device 520. The example audio de-compressor 518 de-compresses the audio 510 to form de-compressed audio 522. In some examples, the audio compressor 512 specifies to the audio de-compressor 518 in the compressed audio 510 the audio compression configuration used by the audio compressor 512 to compress the audio. The de-compressed audio 522 is output by the example audio output device 520 as an audible signal 524. Example audio output devices 520 include, but are not limited to, a speaker, an audio amplifier, headphones, etc. While not shown, the example media presentation device 514 may include additional output devices, ports, etc. that can present signals such as video signals. For example, a television includes a display panel, a set-top box includes video output ports, etc.
To record the audible audio signal 524, the example environment 500 of
To identify the media source 506 associated with the audible audio signal 524, the example AME 502 includes an example coding format identifier 530 and an example source identifier 532. The example coding format identifier 530 identifies the audio compression applied by the audio compressor 512 to form the compressed audio signal 510. The coding format identifier 530 identifies the audio compression from the de-compressed audio signal 524 output by the audio output device 520, and recorded by the audio recorder 526. The recorded audio 106, which has undergone lossy compression at the audio compressor 512 and de-compression at the audio de-compressor 518, is re-compressed by the coding format identifier 530 according to different trial audio compression types and/or settings. In some examples, the trial re-compression that results in the largest compression artifacts is identified by the coding format identifier 530 as the audio compression that was used at the audio compressor 512 to originally compress the media.
The example source identifier 532 of
In U.S. patent application Ser. No. 15/793,543, for each starting location, a time-frequency analyzer applies a time-domain window function, and then computes a full time-to-frequency transform. Such solutions may be computationally infeasible, complex, costly, etc. In stark contrast, applying teachings of this disclosure to implement the example time-frequency analyzer of U.S. patent application Ser. No. 15/793,543 with the windowed sliding transform 100, as shown in
For example, computation of the sliding DFT of EQN (1) requires 2N additions and N multiplications (where N is the number of samples being processed). Therefore, the sliding DFT has a linear complexity of the order of N. By applying a time-domain window as the kernel Kk,k′ 112 after a sliding DFT as shown in EQN (4), the computational efficiency of the windowed sliding DFT is maintained. The complexity of the kernel Kk,k′ 112 is KN additions and SN multiplications, where S is the number of non-zero values in the kernel Kk,k′ 112. When S ≪ N (e.g., 3 or 5), the windowed sliding DFT remains of linear complexity of the order of N. In stark contrast, the conventional methods of computing a DFT and an FFT are of the order of N² and N log(N), respectively. Applying a conventional time-domain window function (i.e., applying the window on the signal before computing a DFT) will be at best of the order of N log(N) (plus some extra additions and multiplications) as the DFT needs to be computed for each sample. By way of comparison, complexity of the order of N is considered to be low complexity, complexity of the order of N log(N) is considered to be moderate complexity, and complexity of the order of N² is considered to be high complexity.
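The operation count discussed above can be illustrated with a sketch in which only S kernel values per output bin are applied. This reads S as the number of non-zero values per row of the kernel, which is an assumption, as are the helper names `dominant_taps` and `apply_taps` and the choice of keeping the S largest-magnitude bins of a sine window.

```python
import numpy as np

def dominant_taps(w, S):
    """Return the S largest-magnitude bins of the scaled window DFT.

    offsets[d] is a frequency offset (k - k') mod N; taps[d] is W[offset] / N.
    """
    N = len(w)
    W = np.fft.fft(w) / N
    offsets = np.argsort(np.abs(W))[::-1][:S]
    return offsets, W[offsets]

def apply_taps(X, offsets, taps):
    """Compute Z[k] = sum_d taps[d] * X[(k - offsets[d]) mod N].

    Each tap costs N multiplications, so S taps cost S * N multiplications.
    """
    Z = np.zeros_like(X)
    for d, t in zip(offsets, taps):
        Z += t * np.roll(X, int(d))
    return Z

N = 64
n = np.arange(N)
w = np.sin(np.pi * (n + 0.5) / N)  # sine window

rng = np.random.default_rng(3)
x = rng.standard_normal(N)
X = np.fft.fft(x)
Z_ref = np.fft.fft(w * x)  # time-domain windowing, for reference

offsets_5, taps_5 = dominant_taps(w, 5)
Z_all = apply_taps(X, *dominant_taps(w, N))  # every tap: exact
Z_5 = apply_taps(X, offsets_5, taps_5)       # S = 5 taps: approximate

err_all = np.max(np.abs(Z_all - Z_ref))
rel_err_5 = np.linalg.norm(Z_5 - Z_ref) / np.linalg.norm(Z_ref)
```

For this window the five retained taps sit at frequency offsets 0, ±1 and ±2 (modulo N), so each output bin combines five neighboring bins of the sliding DFT. Combined with the order-N sliding update, the total work per block start stays linear in N.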
To store (e.g., buffer, hold, etc.) incoming samples of the recorded audio 106, the example coding format identifier 530 includes an example buffer 110 of
To perform time-frequency analysis, the example coding format identifier 530 includes the example windowed sliding transformer 100. The example windowed sliding transformer 100 of
To compute compression artifacts, the example coding format identifier 530 of
To compute an average of the values of a spectrogram 804-806, the artifact computer 604 of
To detect the small values, the example artifact computer 604 includes an example differencer 716. The example differencer 716 of
To identify the largest difference D1, D2, . . . DN/2 between the averages A1, A2, . . . AN/2+1 of spectrograms 804-806, the example artifact computer 604 of
A peak in the differences D1, D2, . . . DN/2 nominally occurs every T samples in the signal. In some examples, T is the hop size of the time-frequency analysis stage of a coding format, which is typically half of the window length L. In some examples, confidence scores 808 and offsets 810 from multiple blocks of samples of a longer audio recording are combined to increase the accuracy of coding format identification. In some examples, blocks with scores under a chosen threshold are ignored. In some examples, the threshold can be a statistic computed from the differences, for example, the maximum divided by the mean. In some examples, the differences can also be first normalized, for example, by using the standard score. To combine confidence scores 808 and offsets 810, the example coding format identifier 530 includes an example post processor 722. The example post processor 722 of
To store sets of audio compression configurations, the example coding format identifier 530 of
The compression configurations may be stored in the example compression configurations data store 726 using any number and/or type(s) of data structure(s). The compression configurations data store 726 may be implemented using any number and/or type(s) of non-volatile, and/or volatile computer-readable storage device(s) and/or storage disk(s). The example controller 728 of
While an example implementation of the coding format identifier 530 is shown in
A flowchart representative of example machine-readable instructions for implementing the example AME 502 of
The example program of
A flowchart representative of example machine-readable instructions for implementing the example coding format identifier 530 of
The example program of
When all blocks have been processed (block 1120), the example post processor 722 translates the score 808 and offset 810 pairs for the currently considered trial coding format parameter set into polar coordinates, and computes a circular mean of the pairs in polar coordinates as an overall confidence score for the currently considered compression configuration (block 1122).
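One way to realize the polar-coordinate combination described above is sketched below. The mapping is an assumption: each offset is wrapped modulo a period T (taken here to be the hop size) and converted to an angle, each score is used as a radius, and the magnitude of the mean vector serves as the overall confidence score. The function name `overall_confidence` and the sample values are hypothetical.

```python
import numpy as np

def overall_confidence(scores, offsets, period):
    """Circular mean of (score, offset) pairs.

    Returns (magnitude, mean_offset): the magnitude is large only when
    high-scoring blocks agree on the offset modulo the period.
    """
    scores = np.asarray(scores, dtype=float)
    wrapped = np.asarray(offsets, dtype=float) % period
    angles = 2.0 * np.pi * wrapped / period
    mean_vec = np.mean(scores * np.exp(1j * angles))
    mean_angle = np.angle(mean_vec) % (2.0 * np.pi)
    return np.abs(mean_vec), mean_angle * period / (2.0 * np.pi)

T = 1024  # assumed hop size, in samples

# Blocks that agree on the offset modulo T reinforce one another.
agree_mag, agree_off = overall_confidence([1.0, 1.0, 1.0], [10, 10 + T, 10 + 2 * T], T)

# Blocks with opposite offsets cancel, giving a low overall score.
cancel_mag, _ = overall_confidence([1.0, 1.0], [0, T // 2], T)
```

Averaging in polar form rewards trial configurations whose peaks line up consistently across blocks, while inconsistent offsets pull the mean vector toward zero.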
When all trial compression configurations have been processed (block 1124), the controller 728 identifies the trial compression configuration set with the largest overall confidence score as the audio compression applied by the audio compressor 512 (block 1126). Control then exits from the example program of
As mentioned above, the example processes of
A flowchart representative of example hardware logic or machine-readable instructions for computing a plurality of compression artifacts for combinations of parameters using the windowed sliding transformer 100 is shown in
In comparison to
The processor platform 1300 of the illustrated example includes a processor 1310. The processor 1310 of the illustrated example is hardware. For example, the processor 1310 can be implemented by one or more integrated circuits, logic circuits, microprocessors, GPUs, DSPs or controllers from any desired family or manufacturer. The hardware processor may be a semiconductor based (e.g., silicon based) device. In this example, the processor implements the example windowed sliding transformer 100, the example artifact computer 712, the example averager 714, the example differencer 716, the example peak identifier 718, the example post processor 722, and the example controller 728.
The processor 1310 of the illustrated example includes a local memory 1312 (e.g., a cache). The processor 1310 of the illustrated example is in communication with a main memory including a volatile memory 1314 and a non-volatile memory 1316 via a bus 1318. The volatile memory 1314 may be implemented by Synchronous Dynamic Random-access Memory (SDRAM), Dynamic Random-access Memory (DRAM), RAMBUS® Dynamic Random-access Memory (RDRAM®) and/or any other type of random-access memory device. The non-volatile memory 1316 may be implemented by flash memory and/or any other desired type of memory device. Access to the main memory 1314, 1316 is controlled by a memory controller (not shown). In this example, the local memory 1312 and/or the memory 1314 implements the buffer 110.
The processor platform 1300 of the illustrated example also includes an interface circuit 1320. The interface circuit 1320 may be implemented by any type of interface standard, such as an Ethernet interface, a universal serial bus (USB) interface, a Bluetooth® interface, a near field communication (NFC) interface, and/or a peripheral component interconnect (PCI) express interface.
In the illustrated example, one or more input devices 1322 are connected to the interface circuit 1320. The input device(s) 1322 permit(s) a user to enter data and/or commands into the processor 1310. The input device(s) can be implemented by, for example, an audio sensor, a microphone, a camera (still or video), a keyboard, a button, a mouse, a touchscreen, a track-pad, a trackball, isopoint and/or a voice recognition system.
One or more output devices 1324 are also connected to the interface circuit 1320 of the illustrated example. The output devices 1324 can be implemented, for example, by display devices (e.g., a light emitting diode (LED), an organic light emitting diode (OLED), a liquid crystal display (LCD), a cathode ray tube display (CRT), an in-plane switching (IPS) display, a touchscreen, etc.), a tactile output device, a printer, and/or speakers. The interface circuit 1320 of the illustrated example, thus, typically includes a graphics driver card, a graphics driver chip and/or a graphics driver processor.
The interface circuit 1320 of the illustrated example also includes a communication device such as a transmitter, a receiver, a transceiver, a modem, a residential gateway, and/or network interface to facilitate exchange of data with external machines (e.g., computing devices of any kind) via a network 1326 (e.g., an Ethernet connection, a digital subscriber line (DSL), a telephone line, a coaxial cable, a cellular telephone system, a Wi-Fi system, etc.). In some examples of a Wi-Fi system, the interface circuit 1320 includes a radio frequency (RF) module, antenna(s), amplifiers, filters, modulators, etc.
The processor platform 1300 of the illustrated example also includes one or more mass storage devices 1328 for storing software and/or data. Examples of such mass storage devices 1328 include floppy disk drives, hard drive disks, CD drives, Blu-ray disk drives, redundant array of independent disks (RAID) systems, and DVD drives.
Coded instructions 1332 including the coded instructions of
The processor platform 1400 of the illustrated example includes a processor 1410. The processor 1410 of the illustrated example is hardware. For example, the processor 1410 can be implemented by one or more integrated circuits, logic circuits, microprocessors, GPUs, DSPs, or controllers from any desired family or manufacturer. The hardware processor may be a semiconductor based (e.g., silicon based) device. In this example, the processor implements the example transformer 102, the example windower 114, the example multiplier 116, the example kernel generator 122, and the example artifact computer 604.
The processor 1410 of the illustrated example includes a local memory 1412 (e.g., a cache). The processor 1410 of the illustrated example is in communication with a main memory including a volatile memory 1414 and a non-volatile memory 1416 via a bus 1418. The volatile memory 1414 may be implemented by Synchronous Dynamic Random-Access Memory (SDRAM), Dynamic Random-Access Memory (DRAM), RAMBUS® Dynamic Random-Access Memory (RDRAM®) and/or any other type of random access memory device. The non-volatile memory 1416 may be implemented by flash memory and/or any other desired type of memory device. Access to the main memory 1414, 1416 is controlled by a memory controller. In the illustrated example, the volatile memory 1414 implements the buffer 110.
The processor platform 1400 of the illustrated example also includes an interface circuit 1420. The interface circuit 1420 may be implemented by any type of interface standard, such as an Ethernet interface, a universal serial bus (USB) interface, a Bluetooth® interface, a near field communication (NFC) interface, and/or a peripheral component interconnect (PCI) express interface.
In the illustrated example, one or more input devices 1422 are connected to the interface circuit 1420. The input device(s) 1422 permit(s) a user to enter data and/or commands into the processor 1410. The input device(s) can be implemented by, for example, an audio sensor, a microphone, a camera (still or video), a keyboard, a button, a mouse, a touchscreen, a track-pad, a trackball, isopoint and/or a voice recognition system. In some examples, an input device 1422 is used to receive the input signal 106.
One or more output devices 1424 are also connected to the interface circuit 1420 of the illustrated example. The output devices 1424 can be implemented, for example, by display devices (e.g., a light emitting diode (LED), an organic light emitting diode (OLED), a liquid crystal display (LCD), a cathode ray tube display (CRT), an in-plane switching (IPS) display, a touchscreen, etc.), a tactile output device, a printer and/or speaker. The interface circuit 1420 of the illustrated example, thus, typically includes a graphics driver card, a graphics driver chip and/or a graphics driver processor.
The interface circuit 1420 of the illustrated example also includes a communication device such as a transmitter, a receiver, a transceiver, a modem, a residential gateway, a wireless access point, and/or a network interface to facilitate exchange of data with external machines (e.g., computing devices of any kind) via a network 1426. The communication can be via, for example, an Ethernet connection, a digital subscriber line (DSL) connection, a telephone line connection, a coaxial cable system, a satellite system, a line-of-sight wireless system, a cellular telephone system, etc. In some examples, input signals are received via a communication device and the network 1426.
The processor platform 1400 of the illustrated example also includes one or more mass storage devices 1428 for storing software and/or data. Examples of such mass storage devices 1428 include floppy disk drives, hard drive disks, CD drives, Blu-ray disk drives, redundant array of independent disks (RAID) systems, and DVD drives.
Coded instructions 1432 including the coded instructions of
From the foregoing, it will be appreciated that example methods, apparatus and articles of manufacture have been disclosed that identify sources of network streaming services. It will further be appreciated that the disclosed methods, apparatus and articles of manufacture enhance the operations of a computer by improving the accuracy of, and the ability to perform, identification of the sources of network streaming services. In some examples, computer operations can be made more efficient, accurate and robust based on the above techniques for performing source identification of network streaming services. That is, through the use of these processes, computers can operate more efficiently by relatively quickly performing source identification of network streaming services. Furthermore, example methods, apparatus, and/or articles of manufacture disclosed herein identify and overcome inaccuracies and limitations in the prior art with respect to performing source identification of network streaming services.
From the foregoing, it will be appreciated that example methods, apparatus and articles of manufacture have been disclosed that lower the complexity and increase the efficiency of sliding windowed transforms. Using teachings of this disclosure, sliding windowed transforms can be computed using the computational benefits of sliding transforms even when a window function is to be implemented. It will further be appreciated that the disclosed methods, apparatus and articles of manufacture enhance the operations of a computer by improving the ability to perform sliding transforms that include the application of window functions. In some examples, computer operations can be made more efficient based on the above equations and techniques for performing sliding windowed transforms. That is, through the use of these processes, computers can operate more efficiently by relatively quickly performing sliding windowed transforms. Furthermore, example methods, apparatus, and/or articles of manufacture disclosed herein identify and overcome limitations in the prior art with respect to performing sliding windowed transforms.
Example methods, apparatus, and articles of manufacture to compute sliding windowed transforms are disclosed herein. Further examples and combinations thereof include at least the following.
Example 1 is an apparatus, comprising a transformer to transform a first block of time-domain samples of an input signal into a first frequency-domain representation based on a second frequency-domain representation of a second block of time-domain samples of the input signal, and a windower to apply a third frequency-domain representation of a time-domain window function to the first frequency-domain representation.
Example 2 is the apparatus of example 1, wherein the windower includes a multiplier to multiply a vector including the first frequency-domain representation and a matrix including the third frequency-domain representation.
Example 3 is the apparatus of example 2, further including a kernel generator to compute the matrix by computing a transform of the time-domain window function.
Example 4 is the apparatus of example 3, wherein the kernel generator is to set a value of a cell of the matrix to zero based on a comparison of the value and a threshold.
Example 5 is the apparatus of any of examples 1 to 4, wherein the transformer computes the first frequency-domain representation based on the second frequency-domain representation using a sliding transform.
Example 6 is the apparatus of any of examples 1 to 5, further including a kernel generator to compute the third frequency-domain representation using a discrete Fourier transform, wherein the transformer computes the first frequency-domain representation based on the second frequency-domain representation using a sliding discrete Fourier transform, and wherein the windower includes a multiplier to multiply a vector including the first frequency-domain representation and a matrix including the third frequency-domain representation.
Example 7 is the apparatus of example 6, wherein the multiplication of the vector and the matrix by the multiplier implements an equivalent of a multiplication of the time-domain window function and the first block of time-domain samples.
Example 8 is the apparatus of any of examples 1 to 7, wherein the time-domain window function includes at least one of a sine window function, a slope window function, or a Kaiser-Bessel-derived window function.
Example 9 is a method, comprising transforming a first block of time-domain samples of an input signal into a first frequency-domain representation based on a second frequency-domain representation of a second block of time-domain samples of the input signal, and applying a third frequency-domain representation of a time-domain window function to the first frequency-domain representation.
Example 10 is the method of example 9, wherein the applying the third frequency-domain representation of a time-domain window function to the first frequency-domain representation includes multiplying a vector including the first frequency-domain representation and a matrix including the third frequency-domain representation.
Example 11 is the method of example 10, further including transforming the time-domain window function to the third frequency-domain representation.
Example 12 is the method of example 11, further including setting a value of a cell of the matrix to zero based on a comparison of the value and a threshold.
Example 13 is the method of any of examples 9 to 12, wherein transforming the first block of time-domain samples into the first frequency-domain representation includes computing a sliding discrete Fourier transform.
Example 14 is the method of any of examples 9 to 13, wherein the time-domain window function includes at least one of a sine window function, a slope window function, or a Kaiser-Bessel-derived window function.
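The sliding discrete Fourier transform referred to in Example 13 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of this disclosure: the helper names (`dft`, `sliding_dft_step`), the block length of 8, and the sample values are hypothetical. The update used is the standard sliding DFT recurrence, in which each bin of the previous block's DFT is corrected by the difference between the sample entering the block and the sample leaving it, and then rotated by one sample of phase.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT used as the reference result."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def sliding_dft_step(prev_spectrum, oldest_sample, newest_sample):
    """Advance an N-point DFT by one sample.

    prev_spectrum is the DFT of the block starting at sample m-1,
    oldest_sample is x[m-1] (leaving the block), and
    newest_sample is x[m-1+N] (entering the block).
    """
    n = len(prev_spectrum)
    delta = newest_sample - oldest_sample
    return [
        (prev_spectrum[k] + delta) * cmath.exp(2j * cmath.pi * k / n)
        for k in range(n)
    ]


N = 8
signal = [0.5, -1.0, 2.0, 0.25, -0.75, 1.5, 0.0, -2.0, 1.25, 0.75, -0.5, 3.0]

# Only the first block needs a full DFT; later blocks reuse the previous result.
spectrum = dft(signal[0:N])
max_error = 0.0
for m in range(1, len(signal) - N + 1):
    spectrum = sliding_dft_step(spectrum, signal[m - 1], signal[m - 1 + N])
    reference = dft(signal[m:m + N])
    step_error = max(abs(a - b) for a, b in zip(spectrum, reference))
    max_error = max(max_error, step_error)

print(f"largest deviation from a full DFT: {max_error:.3e}")
```

Each update costs a fixed amount of work per bin, whereas recomputing the transform for every starting sample costs a full DFT each time. Rounding error can accumulate over long runs of updates, so a practical implementation may occasionally recompute a full DFT.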
Example 15 is a non-transitory computer-readable storage medium comprising instructions that, when executed, cause a machine to transform a first block of time-domain samples of an input signal into a first frequency-domain representation based on a second frequency-domain representation of a second block of time-domain samples of the input signal, and apply a third frequency-domain representation of a time-domain window function to the first frequency-domain representation.
Example 16 is the non-transitory computer-readable storage medium of example 15, wherein the instructions, when executed, cause the machine to apply the third frequency-domain representation of the time-domain window function to the first frequency-domain representation by multiplying a vector including the first frequency-domain representation and a matrix including the third frequency-domain representation.
Example 17 is the non-transitory computer-readable storage medium of example 16, wherein the instructions, when executed, cause the machine to transform the time-domain window function to the third frequency-domain representation.
Example 18 is the non-transitory computer-readable storage medium of example 17, wherein the instructions, when executed, cause the machine to set a value of a cell of the matrix to zero based on a comparison of the value and a threshold.
Example 19 is the non-transitory computer-readable storage medium of any of examples 15 to 18, wherein the instructions, when executed, cause the machine to transform the first block of time-domain samples into the first frequency-domain representation by computing a sliding discrete Fourier transform.
Example 20 is the non-transitory computer-readable storage medium of any of examples 15 to 19, wherein the time-domain window function includes at least one of a sine window function, a slope window function, or a Kaiser-Bessel-derived window function.
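The thresholding of matrix cells described in Examples 12 and 18 might be sketched as below. The examples do not state how the comparison is made, so this sketch assumes that a cell is set to zero when its magnitude falls below a threshold, and it assumes a threshold of 5% of the largest cell magnitude. Those choices, the helper names, the sine window formula, the circulant matrix layout, and the test signal are all hypothetical.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) DFT, kept simple for illustration."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def window_kernel(w):
    """Matrix K with K[j][k] = W[(k - j) mod N] / N, where W = DFT(w)."""
    n = len(w)
    big_w = dft(w)
    return [[big_w[(k - j) % n] / n for k in range(n)] for j in range(n)]


def threshold_kernel(kernel, threshold):
    """Zero every cell whose magnitude is below the threshold (assumed rule)."""
    return [[c if abs(c) >= threshold else 0 for c in row] for row in kernel]


def apply_kernel(spectrum, kernel):
    """Row vector times matrix: Y[k] = sum over j of X[j] * K[j][k]."""
    n = len(spectrum)
    return [sum(spectrum[j] * kernel[j][k] for j in range(n)) for k in range(n)]


N = 16
window = [math.sin(math.pi * (t + 0.5) / N) for t in range(N)]  # sine window
kernel = window_kernel(window)

peak = max(abs(c) for row in kernel for c in row)
threshold = 0.05 * peak  # hypothetical choice, not from the disclosure
sparse = threshold_kernel(kernel, threshold)
kept = sum(1 for row in sparse for c in row if c != 0)

block = [math.cos(0.9 * t) + 0.3 * t for t in range(N)]  # made-up samples
spectrum = dft(block)
full = apply_kernel(spectrum, kernel)
approx = apply_kernel(spectrum, sparse)

max_error = max(abs(a - b) for a, b in zip(full, approx))
# Every dropped cell is smaller than the threshold, which bounds the error.
error_bound = threshold * sum(abs(v) for v in spectrum)
print(f"cells kept: {kept} of {N * N}; max error {max_error:.3e}")
```

Zeroed cells can be skipped during the multiplication, so thresholding trades a bounded loss of accuracy for fewer operations. How many cells survive depends on the window function and on the threshold chosen.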
Although certain example methods, apparatus and articles of manufacture have been disclosed herein, the scope of coverage of this patent is not limited thereto. On the contrary, this patent covers all methods, apparatus and articles of manufacture fairly falling within the scope of the claims of this patent.
This patent arises from a continuation-in-part of U.S. patent application Ser. No. 15/793,543, which was filed on Oct. 25, 2017, and from a continuation-in-part of U.S. patent application Ser. No. 15/899,220, which was filed on Feb. 19, 2018. U.S. patent application Ser. No. 15/793,543 and U.S. patent application Ser. No. 15/899,220 are hereby incorporated by reference in their entirety. Priority to U.S. patent application Ser. Nos. 15/793,543 and 15/899,220 is hereby claimed.
Number | Date | Country |
---|---|---|
20190122678 A1 | Apr 2019 | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 15899220 | Feb 2018 | US |
Child | 15942369 | | US |
Parent | 15793543 | Oct 2017 | US |
Child | 15899220 | | US |