This application claims the benefit, under 35 U.S.C. §365 of International Application PCT/EP2014/056304, filed Mar. 28, 2014, which was published in accordance with PCT Article 21(2) on Oct. 9, 2014 in English and which claims the benefit of European patent application No. 13305425.4, filed Apr. 2, 2013.
The invention relates to a method and to an apparatus for determining watermark symbols in a received audio signal that can contain echoes, reverberation and/or noise.
Audio watermarking is the process of embedding additional information into an audio signal in an inaudible way. The embedding is performed by changing the audio signal, for example by adding pseudo-random noise or echoes. To make the embedding inaudible, the strength of the embedding is controlled by a psycho-acoustic analysis of the audio signal. WO 2011/141292 A1 describes watermark detection in the presence of echoes, reverberation and/or noise in an audio signal, e.g. loudspeaker sound received by a microphone. These echoes result in multiple peaks within a correlation result value sequence of length N with a watermark symbol (i.e. a reference signal), and are used for improving the watermark detection reliability. Basic steps of that statistical detector are:
P(k) is the probability of falsely accepting the candidate watermark symbol. It describes the probability of k or more correlation result values from a non-watermarked signal section being greater than or equal to the actual k peak values under consideration.
That statistical detector solves the following problems:
That statistical detector uses several correlation result peaks in order to improve the detection performance. This improvement is especially advantageous if the watermarked tracks are transmitted over an acoustic path, resulting in multipath detection due to echoes. The np peaks v1≧ . . . ≧vn
However, this kind of processing does not reflect the physical reality in an optimum way, because the additional peak values in the correlation result value sequence stem from reverberations and are therefore grouped closely around the main peak. That is, they are time-delayed in relation to the direct path between the loudspeaker and the microphone of the detection device, but only within a limited time period. If the watermark detection device receives sound over an acoustic path with a distance of ds (measured in samples s) from the source, the propagation distance is dm=ds·cT meters, where cT is the distance the sound propagates within one sampling interval, T=1/f is the sampling interval, f is the sampling rate and c is the speed of sound.
For example, if dmd=4 m for the direct path and dmi=2dmd=8 m for an indirect path, the distance in samples between the main peak of the direct path and the neighbour peak due to the reflection is Δds=dsi−dsd=(dmi−dmd)/cT=dmd/cT≈560 samples with c≈343 m/s and f≈48000 samples/s. In a typical setting the correlation length is N=16 k samples (with 1 k≡1024). Therefore the peaks are to be searched in a window of e.g. size L≈1 k samples (i.e. L<<N) around the main peak value or peak values, and the corresponding false positive probabilities are to be calculated. Since the main correlation result peak value is located somewhere in the current set of N correlation result values, the false positive probability is calculated for all possible N−L+1 shifts of the window of size L within a buffer of size N.
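The arithmetic of this example can be checked with a few lines of Python. This is only an illustrative sketch: the function name and its default arguments are assumptions chosen to match the values quoted above (c≈343 m/s, f≈48000 samples/s).

```python
def reflection_delay_samples(direct_m, indirect_m, fs=48000.0, c=343.0):
    """Delay, in samples, of an indirect path relative to the direct path.

    One sampling interval T = 1/fs corresponds to a propagation distance of
    c*T meters, so a path difference in meters is divided by c/fs.
    """
    metres_per_sample = c / fs
    return (indirect_m - direct_m) / metres_per_sample


# Example from the text: direct path 4 m, indirect path 8 m.
delay = reflection_delay_samples(4.0, 8.0)
print(round(delay))  # about 560 samples, well inside a window of L = 1024
```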
A problem to be solved by the invention is to take the physics of multipath reception better into account than in known statistical watermark detectors, and thus to improve false positive probability calculation and watermark detection performance. This problem is solved by the method disclosed in claim 1. An apparatus that utilises this method is disclosed in claim 2.
As mentioned above, following transmission of a watermarked audio signal over an acoustic path causing echoes, reverberation and/or noise, watermark detection correlation result peak values generally are concentrated around main correlation result peaks within a limited temporal range, with its maximum size denoted as L, which is much smaller than the total correlation length N. The related task for watermark detection can be formulated as follows:
Given np peak values v=(v1, v2, . . . , vn
Measurements have shown that such average or expected probability distribution for correlation result values for unmarked content corresponds to, or is similar to, a Gaussian distribution.
Remark: when performing a correlation by shifting sample-by-sample a reference pattern over a current input audio signal section, the N-values result of this correlation can have positive peak values as well as negative peak values, which together are denoted ‘peak amount values’.
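If the distribution for unmarked content is modelled as Gaussian, the probability that a single correlation result amount value reaches a given peak amount value v can be sketched as follows. The zero mean, the standard deviation parameter `sigma` and the function name are assumptions made for illustration; the text only states that the measured distribution is Gaussian or similar to it.

```python
import math


def amount_tail_probability(v, sigma=1.0):
    """Pr{|c| >= v} for a zero-mean Gaussian c with standard deviation sigma.

    Both positive and negative correlation values count, because the
    comparison is made on amount (magnitude) values.
    """
    if v <= 0.0:
        return 1.0
    return math.erfc(v / (sigma * math.sqrt(2.0)))


# Probabilities for three peak amount values v1 >= v2 >= v3 (hypothetical numbers).
peaks = [4.0, 3.5, 3.0]
probabilities = [amount_tail_probability(v) for v in peaks]
```

A larger peak amount value gives a smaller tail probability, so the probabilities are ordered in the opposite direction to the peaks.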
To simplify the following description, some notations are introduced:
Let a sliding window of length L shift through N correlation result values. As mentioned above, the required FP (false positive) probability is the probability that, for one or more window positions, the sliding window contains np or more correlation result amount values greater than or equal to the np peak values in the average or expected probability distribution for correlation result values for non-watermarked audio signal content. The complementary case for that FP probability is that there is no sliding window containing np or more correlation result amount values greater than or equal to np peak values in the average or expected probability distribution for correlation result values for non-watermarked audio signal content, namely $\mathbf{c}_j \ngeq \mathbf{v}, \forall j$. Consequently, $\Pr\{\mathbf{c}_j \ngeq \mathbf{v}, \forall j\}$ is the complementary probability for the false positive probability $\Pr\{\exists j \in (1, \ldots, N-L+1), \mathbf{c}_j \geq \mathbf{v}\}$ (one or more windows indexed by j).
The final FP probability calculation can be expressed as (see the detailed description further below):
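For L=N (in that case $\mathbf{c}_1 \equiv \mathbf{c} = (c_1, c_2, \ldots, c_N)$), in view of the above Remark in the '$\mathbf{c}_j \ngeq \mathbf{v}$' definition, this general formula reduces to the case of the recursive calculation in the WO 2011/141292 statistical detector: $\Pr\{\mathbf{c}_1 \geq \mathbf{v}\} = 1 - \Pr\{\mathbf{c}_1 \ngeq \mathbf{v}\}$.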
The calculation of $\Pr\{\mathbf{c}_1 \ngeq \mathbf{v}\}$ is described for the statistical detector in WO 2011/141292: based on peak values in the correlation result values for a current signal section, it is detected which one of the candidate symbols is present in the current signal section by using related values of false positive probability of detection of the kind of watermark symbol. The false positive probability is calculated in a recursive manner: the total false positive probability for a given number of correlation result peak values is evaluated by initially using the false positive probabilities for a smaller number of correlation result peak values, and by gradually increasing the number of considered correlation result peak values according to the required detection reliability.
Therefore the calculation of the false positive probability in the following description is mapped to the problem of calculating $\Pr\{\mathbf{c}_2 \geq \mathbf{v}, \mathbf{c}_1 \ngeq \mathbf{v}\}$: for a given number np of peak values this probability can be recursively calculated from the probabilities for np−1 peak values starting with peaks i=1,2 (cf. section Description of embodiments).
While the invention improves the detection performance of the WO 2011/141292 statistical watermark detection processing by significantly reducing false positive detection decisions, it retains all advantages of the WO 2011/141292 watermark detection.
Advantageously, the invention can be used in a second-screen scenario in which a user watches TV and a device such as a tablet computer receives watermark information via the watermarked TV sound and uses that information for downloading and presenting, basically in synchronism, additional information related to the current TV program.
In principle, the inventive method is suited for determining watermark symbols in a received audio signal that can contain echoes, reverberation and/or noise, said method including the steps:
In principle the inventive apparatus is suited for determining watermark symbols in a received audio signal that can contain echoes, reverberation and/or noise, said apparatus including:
Advantageous additional embodiments of the invention are disclosed in the respective dependent claims.
Exemplary embodiments of the invention are described with reference to the accompanying drawings, which show in:
A. Definition of False Positive (FP) Probability
It is well-known in watermark detection based on cross-correlation to use a single correlation result peak value for determining the embedded watermark information. In this invention, however, optimum watermark detection employing multiple correlation result peak amount values is described. For evaluating FP probability it is assumed that:
For a single peak value v in a current correlation result of length N, the FP probability is the probability that in the average or expected probability distribution for correlation result values for unmarked content there are one or more values out of N correlation values not less than that peak value v. Similarly, for np peak values v1≧v2≧ . . . ≧vn
Given a specific correlation result vector sample c1L[c1, . . . , cL] following sorting (xn
If nGT=np, c1L has np or more values greater than or equal to v1n
A.1 Correlation Value Distribution, Comparison to Multiple Peaks
For the purpose of comparing correlation result values to multiple peak values, the complete range of correlation result amount values is divided into np+1 intervals:
[−∞,vn
Correlation value distribution is then performed by counting how many correlation result amount values are located within the individual intervals, which can be described by a representative vector. Because the number of values in some intervals is sometimes irrelevant for FP probability evaluation, representative vectors may have different lengths. Therefore, in this description, the rightmost element of a representative vector always corresponds to the interval [v1,+∞), while its leftmost element is referred to as its first element.
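The counting step can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes the full-length form with np+1 entries, where the first entry counts values below the smallest peak and the rightmost entry counts values in [v1,+∞). Taking the magnitude of each value is also an assumption, based on the use of amount values.

```python
def representative_vector(values, peaks):
    """Count correlation amount values per interval defined by the peaks.

    peaks: the n_p peak amount values (any order; sorted here so v1 is largest).
    Returns n_p + 1 counts ordered from the lowest interval (below v_np)
    to the highest interval [v1, +inf), which is the rightmost element.
    """
    ordered = sorted(peaks, reverse=True)          # v1 >= v2 >= ... >= v_np
    counts = [0] * (len(ordered) + 1)
    for x in values:
        amount = abs(x)
        # Number of peaks reached by this value; 0 means "below v_np",
        # len(ordered) means "at least v1".
        reached = sum(amount >= v for v in ordered)
        counts[reached] += 1
    return counts


# Hypothetical data: two values reach v1, one lies in [v2, v1), two are below v3.
rep = representative_vector([0.9, -0.8, 0.1, 0.5, 0.05], [0.7, 0.4, 0.2])
```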
For such a case, due to v1≧v2, v1≧v3, there is still at least one value ≧v1, one value ≧v2, and one value ≧v3. This can be interpreted as the two zeros associated with the intervals [v3,v2) and [v2,v1) being compensated by m≧3 in the interval [v1,∞). For simplicity, a ‘zero compensated representative vector’ denoted [[ . . . ]] can be defined in this case as [[1,1,m≧1]], derived from the original representative vector [m≧3]. If not otherwise stated, a zero compensated representative vector having a length np is used for the interval [vn
E.g. in case P2, there are two correlation result values in the interval [v1,∞), zero correlation result values in the interval [v2,v1), m≧1 correlation result values in the interval [v3,v2), and L−2−m correlation result values are less than v3.
Given a representative vector a=[an, . . . , a2, a1], its corresponding zero compensated counterpart is derived as follows:
The resulting vector a′ is the zero compensated representative vector for a. Note that zero compensated representative vectors for any interval [vk,+∞) with k≧n can be similarly obtained by just replacing np with k.
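One procedure that reproduces the zero compensated vectors quoted in this description is sketched below. It is inferred from the worked examples only ([0,0,2] giving [[0,1,1]], [0,1,0,2] giving [[0,1,1,1]], [1,0,0,2] giving [[1,0,1,1]]), so the function names and the choice of which element donates its surplus are assumptions, not the derivation steps themselves. The input is assumed to be padded with zeros to the full length np.

```python
def zero_compensate(rep):
    """Fill zero entries using surplus counts from higher intervals.

    rep lists counts for [v_n, v_{n-1}), ..., [v2, v1), [v1, +inf),
    with the rightmost element belonging to [v1, +inf).
    """
    comp = list(rep)
    # Walk from the highest interval (rightmost) towards the lowest.
    for i in range(len(comp) - 1, -1, -1):
        if comp[i] != 0:
            continue
        # Borrow one count from the nearest element to the right that
        # holds more than one value (assumed donor rule).
        for j in range(i + 1, len(comp)):
            if comp[j] > 1:
                comp[j] -= 1
                comp[i] = 1
                break
    return comp


def reaches_all_peaks(rep):
    """True if no zero remains after compensation."""
    return all(count > 0 for count in zero_compensate(rep))
```

For instance, `zero_compensate([0, 0, 3])` yields `[1, 1, 1]`, matching the [[1,1,m≧1]] pattern for m=3.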
For example, representative vectors and their zero compensated representative vectors for the cases listed in
Advantageously, with the introduction of such zero compensated representative vectors it becomes much easier to compare correlation result values including multiple peaks. Specifically, if there is no zero element in the zero compensated representative vector, it can be assumed that a correlation vector collecting N correlation values, denoted as c1N=(c1, c2, . . . , cN), is greater than or equal to np peaks, which is concisely denoted as c1N≧v1n
and the final FP probability is PFP=P1+P2+P3+P4+P5. The factor (pi−pi−1) is the probability of obtaining a correlation result peak with a value in the range [vi,vi−1), whereas (1−pi) is the probability of getting a peak in the range (−∞,vi).
B. FP Probability for Correlation Result Peak Values Within a Limited Range
As mentioned above, for signal transmission over an acoustic path correlation result peak amount values generally are concentrated within a limited temporal range of maximum size L, which is much smaller than the correlation length N.
Thus, for np peak amount values within a current window of length L out of N correlation result values, the probability is to be computed that in the average or expected probability distribution for correlation result values for unmarked content there are np or more correlation result amount values greater than or equal to the np peak amount values in this current length L window.
B.1 Inventive FP Probability Calculation Processing
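The definitions for $\mathbf{c}_j = (c_j, c_{j+1}, \ldots, c_{j+L-1})$, $\mathbf{c}_j \geq \mathbf{v}$ and $\mathbf{c}_j \ngeq \mathbf{v}$ were given above.
The complementary case for the sliding window containing for one or more times np or more values greater than or equal to np expected peaks is that there is no sliding window that contains np or more values greater than or equal to np expected peaks, namely, $\mathbf{c}_j \ngeq \mathbf{v}, \forall j$. Consequently, the complementary probability for the FP probability is
$$\Pr\{\mathbf{c}_j \ngeq \mathbf{v}, \forall j\} = \Pr\{\mathbf{c}_1 \ngeq \mathbf{v}, \mathbf{c}_2 \ngeq \mathbf{v}, \ldots, \mathbf{c}_{N-L+1} \ngeq \mathbf{v}\}. \tag{1}$$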
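Using the chain rule, the joint probability $\Pr\{\mathbf{c}_1 \ngeq \mathbf{v}, \mathbf{c}_2 \ngeq \mathbf{v}, \ldots, \mathbf{c}_{N-L+1} \ngeq \mathbf{v}\}$ can be calculated by means of conditional probabilities:
$$\Pr\{\mathbf{c}_1 \ngeq \mathbf{v}, \ldots, \mathbf{c}_{N-L+1} \ngeq \mathbf{v}\} = \Pr\{\mathbf{c}_1 \ngeq \mathbf{v}\} \prod_{j=2}^{N-L+1} \Pr\{\mathbf{c}_j \ngeq \mathbf{v} \mid \mathbf{c}_{j-1} \ngeq \mathbf{v}, \ldots, \mathbf{c}_1 \ngeq \mathbf{v}\}.$$
For a correlation vector $\mathbf{c}_j$ only the last predecessor $\mathbf{c}_{j-1}$ is relevant in the conditional probability $\Pr\{\mathbf{c}_j \ngeq \mathbf{v} \mid \mathbf{c}_{j-1} \ngeq \mathbf{v}, \ldots, \mathbf{c}_1 \ngeq \mathbf{v}\}$, since $\mathbf{c}_{j-1}$ already contains all elements of $\mathbf{c}_j$ except the single new one:
$$\Pr\{\mathbf{c}_j \ngeq \mathbf{v} \mid \mathbf{c}_{j-1} \ngeq \mathbf{v}, \ldots, \mathbf{c}_1 \ngeq \mathbf{v}\} = \Pr\{\mathbf{c}_j \ngeq \mathbf{v} \mid \mathbf{c}_{j-1} \ngeq \mathbf{v}\}.$$
Therefore, the joint probability $\Pr\{\mathbf{c}_1 \ngeq \mathbf{v}, \mathbf{c}_2 \ngeq \mathbf{v}, \ldots, \mathbf{c}_{N-L+1} \ngeq \mathbf{v}\}$ can be written as
$$\Pr\{\mathbf{c}_1 \ngeq \mathbf{v}, \ldots, \mathbf{c}_{N-L+1} \ngeq \mathbf{v}\} = \left(\prod_{j=2}^{N-L+1} \Pr\{\mathbf{c}_j \ngeq \mathbf{v} \mid \mathbf{c}_{j-1} \ngeq \mathbf{v}\}\right) \cdot \Pr\{\mathbf{c}_1 \ngeq \mathbf{v}\}.$$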
In addition, a subset of L correlation samples already represents a representative set of all the N samples because L is large enough, leading to the identity
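$$\Pr\{\mathbf{c}_j \ngeq \mathbf{v} \mid \mathbf{c}_{j-1} \ngeq \mathbf{v}\} \equiv \Pr\{\mathbf{c}_2 \ngeq \mathbf{v} \mid \mathbf{c}_1 \ngeq \mathbf{v}\},$$
which is employed in equation (2):
$$\Pr\{\mathbf{c}_1 \ngeq \mathbf{v}, \ldots, \mathbf{c}_{N-L+1} \ngeq \mathbf{v}\} = \left(\Pr\{\mathbf{c}_2 \ngeq \mathbf{v} \mid \mathbf{c}_1 \ngeq \mathbf{v}\}\right)^{N-L} \cdot \Pr\{\mathbf{c}_1 \ngeq \mathbf{v}\}. \tag{3}$$
Since it is known how to evaluate $\Pr\{\mathbf{c}_1 \ngeq \mathbf{v}\}$, the FP probability calculation reduces to evaluation of the conditional probability $\Pr\{\mathbf{c}_2 \ngeq \mathbf{v} \mid \mathbf{c}_1 \ngeq \mathbf{v}\}$.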
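As a minimal numerical sketch of how equation (3) yields the FP probability, the complement of the joint probability can be evaluated as below. The function and parameter names are hypothetical, and the two input probabilities are assumed to be available from the evaluations described in this section.

```python
def fp_from_window_probabilities(pr_first_not_ge, pr_next_not_ge_given_prev, N, L):
    """FP probability as the complement of equation (3).

    pr_first_not_ge:            Pr{c_1 does not reach the peaks v}
    pr_next_not_ge_given_prev:  Pr{c_2 does not reach v | c_1 does not reach v}
    N, L:                       correlation length and window length (L <= N)
    """
    if not 0 < L <= N:
        raise ValueError("window length L must satisfy 0 < L <= N")
    no_window_reaches = pr_first_not_ge * pr_next_not_ge_given_prev ** (N - L)
    return 1.0 - no_window_reaches


# With L = N there is a single window and only the first factor remains.
single_window = fp_from_window_probabilities(0.999, 0.5, 16384, 16384)
```

For very small probabilities a log-domain evaluation may be preferable, which this sketch does not attempt.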
B.2 Conditional Probability Evaluation
The conditional probability $\Pr\{\mathbf{c}_2 \ngeq \mathbf{v} \mid \mathbf{c}_1 \ngeq \mathbf{v}\}$ can be reformulated using the definition of the conditional probability. Given two events A and B with P(B)>0, the conditional probability of A given B is defined as P(A|B)=P(A∩B)/P(B).
Therefore $\Pr\{\mathbf{c}_2 \ngeq \mathbf{v} \mid \mathbf{c}_1 \ngeq \mathbf{v}\}$ can be written as:
Consequently, the FP probability can be evaluated as
This general formula reduces for L=N to the case of recursive calculation in the WO 2011/141292 statistical watermark detection processing (in that case $\mathbf{c}_1 = [c_1, \ldots, c_N]$):
$$\Pr\{\mathbf{c}_1 \geq \mathbf{v}\} = 1 - \Pr\{\mathbf{c}_1 \ngeq \mathbf{v}\}.$$
For calculating equation (6) in case L≠N, a split-up is carried out as explained in the following section.
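The displayed equations of this section are not reproduced in the present text. A reconstruction that follows only from equation (3), the definition of conditional probability, and the fact that the events $\mathbf{c}_2 \geq \mathbf{v}$ and $\mathbf{c}_2 \ngeq \mathbf{v}$ are complementary is given below. It is a sketch of what the omitted equations plausibly state, not a quotation of them.

```latex
\begin{align*}
\Pr\{\mathbf{c}_2 \ngeq \mathbf{v} \mid \mathbf{c}_1 \ngeq \mathbf{v}\}
  &= \frac{\Pr\{\mathbf{c}_2 \ngeq \mathbf{v},\, \mathbf{c}_1 \ngeq \mathbf{v}\}}
          {\Pr\{\mathbf{c}_1 \ngeq \mathbf{v}\}}
   = \frac{\Pr\{\mathbf{c}_1 \ngeq \mathbf{v}\}
           - \Pr\{\mathbf{c}_2 \geq \mathbf{v},\, \mathbf{c}_1 \ngeq \mathbf{v}\}}
          {\Pr\{\mathbf{c}_1 \ngeq \mathbf{v}\}} \\
  &= 1 - \frac{\Pr\{\mathbf{c}_2 \geq \mathbf{v},\, \mathbf{c}_1 \ngeq \mathbf{v}\}}
              {\Pr\{\mathbf{c}_1 \ngeq \mathbf{v}\}}, \\[1ex]
P_{FP}
  &= 1 - \Pr\{\mathbf{c}_1 \ngeq \mathbf{v}\}
     \left(1 - \frac{\Pr\{\mathbf{c}_2 \geq \mathbf{v},\, \mathbf{c}_1 \ngeq \mathbf{v}\}}
                    {\Pr\{\mathbf{c}_1 \ngeq \mathbf{v}\}}\right)^{N-L}.
\end{align*}
```

For L=N the exponent is zero and the expression collapses to $1 - \Pr\{\mathbf{c}_1 \ngeq \mathbf{v}\}$, in agreement with the special case stated above.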
B.3 Joint Probability Evaluation Based on Correlation Value Distributions
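The joint probability $\Pr\{\mathbf{c}_2 \geq \mathbf{v}, \mathbf{c}_1 \ngeq \mathbf{v}\}$ can be represented as
$$\Pr\{\mathbf{c}_2 \geq \mathbf{v}, \mathbf{c}_1 \ngeq \mathbf{v}\} = \Pr\{\mathbf{c}_2^L \text{ has exactly } (n_p-1) \text{ values} \geq \mathbf{v},\ \text{adding } c_{L+1} \text{ to } \mathbf{c}_2^L \text{ makes } \mathbf{c}_2 \geq \mathbf{v},\ \text{adding } c_1 \text{ to } \mathbf{c}_2^L \text{ makes } \mathbf{c}_1 \ngeq \mathbf{v}\}. \tag{7}$$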
Cases where c2L has exactly (np−1) values ≧v are divided into two disjoint groups again:
In the case of np=3, c2L has exactly np−2=1 value ≧v1n
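For example, P2 corresponds to a zero compensated representative vector [[0,1,1]]. If $c_1 < v_3$, adding $c_1$ to $\mathbf{c}_2^L$ will result in $\mathbf{c}_1^L \ngeq \mathbf{v}_1^3$. On the other hand, if $c_{L+1} \geq v_3$, adding $c_{L+1}$ to $\mathbf{c}_2^L$ will result in $\mathbf{c}_2^{L+1} \geq \mathbf{v}_1^3$.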
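Accordingly, the individual probabilities are calculated as
$$P_1 = \left(\binom{L-1}{1} p_1 \binom{L-2}{1} (p_2 - p_1)(1 - p_3)^{L-3}\right) p_3 (1 - p_3)$$
$$P_2 = \left(\binom{L-1}{2} p_1^2 (1 - p_3)^{L-3}\right) p_3 (1 - p_3)$$
$$P_3 = \left(\sum_{m=2}^{L-1} \binom{L-1}{m} (p_2 - p_1)^m (1 - p_2)^{L-1-m}\right) p_1 (1 - p_1)$$
$$P_4 = \left(\binom{L-1}{1} (p_2 - p_1) \sum_{m=1}^{L-2} \binom{L-2}{m} (p_3 - p_2)^m (1 - p_3)^{L-2-m}\right) p_1 (1 - p_1)$$
$$P_5 = \left(\binom{L-1}{1} p_1 \sum_{m=1}^{L-2} \binom{L-2}{m} (p_3 - p_2)^m (1 - p_3)^{L-2-m}\right) p_2 (1 - p_2)$$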
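The five expressions for np=3 can be evaluated numerically, and their sum can be cross-checked against a brute-force enumeration for small L. In this sketch pi is taken to be the probability that a single correlation amount value is at least vi, and the values are treated as independent; both are assumptions read from the structure of the formulas, and the function names are hypothetical.

```python
from itertools import product
from math import comb, prod


def joint_terms_np3(p1, p2, p3, L):
    """Return (P1, ..., P5) for n_p = 3 and window length L >= 3."""
    n = L - 1                                     # size of the shared part c_2^L
    inner = sum(comb(L - 2, m) * (p3 - p2) ** m * (1 - p3) ** (L - 2 - m)
                for m in range(1, L - 1))
    P1 = comb(n, 1) * p1 * comb(n - 1, 1) * (p2 - p1) * (1 - p3) ** (L - 3) * p3 * (1 - p3)
    P2 = comb(n, 2) * p1 ** 2 * (1 - p3) ** (L - 3) * p3 * (1 - p3)
    P3 = sum(comb(n, m) * (p2 - p1) ** m * (1 - p2) ** (n - m)
             for m in range(2, n + 1)) * p1 * (1 - p1)
    P4 = comb(n, 1) * (p2 - p1) * inner * p1 * (1 - p1)
    P5 = comb(n, 1) * p1 * inner * p2 * (1 - p2)
    return P1, P2, P3, P4, P5


def joint_brute_force_np3(p1, p2, p3, L):
    """Enumerate Pr{c_2 reaches v, c_1 does not} over interval categories.

    Category 0: >= v1, 1: [v2, v1), 2: [v3, v2), 3: below v3.
    """
    weights = [p1, p2 - p1, p3 - p2, 1 - p3]

    def reaches(window):
        # At least k values must be >= v_k, i.e. have a category below k.
        return all(sum(cat < k for cat in window) >= k for k in (1, 2, 3))

    total = 0.0
    for cats in product(range(4), repeat=L + 1):  # c_1 ... c_{L+1}
        if reaches(cats[1:]) and not reaches(cats[:-1]):
            total += prod(weights[cat] for cat in cats)
    return total
```

The enumeration grows as 4^(L+1), so it is only useful as a consistency check for small L.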
In general, calculating the sum terms for the P3, P4, P5 probabilities can be reformulated by employing the binomial theorem:
which significantly reduces the computational complexity if n1<<n2−n1.
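The saving can be illustrated with the standard binomial identity, in which a sum over the upper range of m is replaced by the full binomial expansion minus the few leading terms. The exact form of the omitted equation is not reproduced here; the function names and the parameter naming (n1 for the lower limit, n2 for the upper limit and binomial size) are assumptions.

```python
from math import comb, isclose


def upper_sum_direct(a, b, n1, n2):
    """Sum over m = n1..n2 of C(n2, m) a^m b^(n2-m), term by term."""
    return sum(comb(n2, m) * a ** m * b ** (n2 - m) for m in range(n1, n2 + 1))


def upper_sum_via_binomial_theorem(a, b, n1, n2):
    """Same sum as (a+b)^n2 minus the n1 leading terms m = 0..n1-1."""
    leading = sum(comb(n2, m) * a ** m * b ** (n2 - m) for m in range(n1))
    return (a + b) ** n2 - leading


# Shape of the sum in P3: a = p2 - p1, b = 1 - p2, lower limit 2 (hypothetical values).
direct = upper_sum_direct(0.02, 0.97, 2, 200)
reduced = upper_sum_via_binomial_theorem(0.02, 0.97, 2, 200)
```

With a lower limit of 2 the reduced form needs only two terms instead of n2−1, which is where the saving for n1<<n2−n1 comes from.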
In addition, for ‘m≧b’ cases such as P3, P4, P5 shown in
Having representative vectors for all disjoint cases where c2L has exactly (np−1) values ≧v, it is straightforward to evaluate the FP probability.
B.4 Recursive Representative Vector Construction
The representative and zero compensated vectors are used for computing the false positive probability FP(visk). If these vectors are known, equations for the probabilities P1 to P5 can be formulated, see the examples in the description for
In the following it is explained how to recursively obtain these representative vectors. As discussed previously, all cases in
In other words, given the remaining elements in a representative vector excluding the ‘m≧b’ element, the ‘m≧b’ element can be deduced: refer to
Motivated by
For s=0, the set of representative vectors is {[0]}, for s=1, the set of representative vectors is {[0,1], [1,0]}.
Dependent on the length lj of individual representative vectors in the set for s, the representative vectors of the current recursion for s+1 are constructed differently:
Update for s=1: two representative vectors [1,0] and [0,1] both having the length lj=s+1=2. For vector [0,1], there is no zero element after the first non-zero element. Therefore the new representative vectors are obtained as [0,0,2],[0,1,1],[1,0,1]. On the other hand, for vector [1,0] there is one zero element after its first non-zero element, and there is one zero left after zero compensation. Therefore, the new representative vectors are expanding and adding unit vectors: [0,2,0],[1,1,0], as well as adding unit vectors without expanding: [2,0].
Update for s=2 with representative vector set {[0,0,2],[0,1,1],[1,0,1],[0,2,0], [1,1,0],[2,0]}. According to the above description, representative vectors for s=3 are obtained as
For [0,1,0,2] with [[0,1,1,1]] as its zero compensated representative vector there is no zero on the right hand of its first non-zero element after zero compensation, while for [1,0,0,2] with [[1,0,1,1]] as its zero compensated representative vector there is one zero left on the right hand side of its first non-zero element even after zero compensation. Accordingly, different updating procedures will be performed for [0,1,0,2] and [1,0,0,2] to get representative vectors for s=4.
In the block diagram of the inventive watermark decoder in
In the flow diagram of
Because L is significantly smaller than N, the summed-up length of all length-L windows is smaller than length N. In practice, L<<N such that L is at least one order of magnitude smaller than N, i.e. N/L>10. For example, N=16 k and L=1 k.
Thereafter, an outer loop running from $i_s$=1 to nSymbols and an inner loop running from k=1 to M are entered, controlled by comparison steps 57 and 56, respectively. In the inner loop, the false positive probability FP(visk) is computed in step 54 from the current correlation result values, followed by a comparison step 55, an increment of k and a comparison step 56; in the outer loop this is followed by an increment of $i_s$ and a comparison step 57. The false positive probability FP(visk) value corresponds to the probability that for one or more times such a length-L window contains np or more correlation result amount values greater than or equal to np peak values in an average or expected probability distribution for correlation result amount values for a non-watermarked audio signal. Following comparison step 57, step 58 determines the candidate watermark symbol $i_s$ for which the final false positive probability fp is $\min_{i_s}(\min_k(\text{FP(visk)}))$. Step 59 checks, by comparing that fp value with a further threshold Tmax, whether or not a watermark symbol has been detected. If true, that detected watermark symbol is output. If not true, no watermark symbol has been detected.
In comparison step 55, if FP(visk) is smaller than a predetermined threshold Tmin, it is assumed that the correct candidate watermark symbol is has been found, both loops are left in order to save computation time, and that watermark symbol is output.
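The control flow of steps 54 to 59 can be sketched as two nested loops with an early exit. The callable `fp` stands in for the false positive probability computation of step 54 and is hypothetical, as are the function and argument names. Treating "fp below Tmax" as a successful detection is also an assumption, since the text does not state the direction of the comparison in step 59.

```python
def detect_symbol(fp, n_symbols, M, t_min, t_max):
    """Return the detected symbol index, or None if nothing is detected.

    fp(i_s, k) -> false positive probability for candidate symbol i_s
                  and inner-loop index k (step 54).
    """
    best_fp = float("inf")
    best_symbol = None
    for i_s in range(1, n_symbols + 1):           # outer loop, step 57
        for k in range(1, M + 1):                 # inner loop, step 56
            value = fp(i_s, k)                    # step 54
            if value < t_min:                     # step 55: early exit
                return i_s
            if value < best_fp:
                best_fp, best_symbol = value, i_s
    # Steps 58 and 59: minimum over all candidates, then threshold check.
    return best_symbol if best_fp < t_max else None


# Hypothetical probabilities for two candidate symbols and M = 2.
table = {(1, 1): 0.2, (1, 2): 0.05, (2, 1): 0.3, (2, 2): 0.01}
result = detect_symbol(lambda i_s, k: table[(i_s, k)], 2, 2, t_min=1e-6, t_max=0.02)
```

The early return avoids evaluating the remaining candidates once a value falls below Tmin, which mirrors the computation-time saving described for step 55.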
The inventive processing can be carried out by a single processor or electronic circuit, or by several processors or electronic circuits operating in parallel and/or operating on different parts of the inventive processing.
Number | Date | Country | Kind |
---|---|---|---|
13305425 | Apr 2013 | EP | regional |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2014/056304 | 3/28/2014 | WO | 00 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2014/161785 | 10/9/2014 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
20090187765 | Baum | Jul 2009 | A1 |
20100166120 | Baum | Jul 2010 | A1 |
20110103444 | Baum | May 2011 | A1 |
Number | Date | Country |
---|---|---|
WO2005078658 | Aug 2005 | EP |
2081188 | Jul 2009 | EP |
WO2011141292 | Nov 2011 | WO |
Entry |
---|
Arnold et al., “Robust detection of audio watermarks after acoustic path transmission”, Proceedings of the 12th ACM Workshop on Multimedia and Security, MM&SEC'10, Jan. 1, 2010, p. 117-126. |
Search Report Dated May 27, 2014. |
Number | Date | Country | |
---|---|---|---|
20160049154 A1 | Feb 2016 | US |