SIGNAL PROCESSING DEVICE AND METHOD, AND PROGRAM

Abstract
The present technology relates to a signal processing device and method, and a program that make it possible to reproduce sound more efficiently. A signal processing device includes: a rotation operation unit that rotates a head-related transfer function in a spherical harmonic domain by an operation on the basis of a rotation matrix corresponding to rotation of a head of a listener, the operation in which an order of the rotation matrix is limited; and a synthesis unit that synthesizes the head-related transfer function after rotation obtained by the operation and a sound signal of the spherical harmonic domain to generate a headphone drive signal. The present technology is applicable to an audio processor.
Description
TECHNICAL FIELD

The present technology relates to signal processing device and method, and a program, and specifically to signal processing device and method, and a program that make it possible to reproduce sound more efficiently.


BACKGROUND ART

In recent years, development and spread of systems that record, transmit, and reproduce spatial information from an entire environment have been progressing in the field of sound. For example, in Super Hi-Vision, broadcasting is being planned using three-dimensional 22.2 multichannel sound.


Further, in the field of virtual reality, systems that also reproduce signals surrounding the entire environment for sound in addition to an image surrounding the entire environment are becoming popular.


Among them, there is a technique of representing three-dimensional audio information, which is flexibly adaptable to any recording/reproducing system. The technique is called ambisonics, and has been attracting attention. In particular, second or higher order ambisonics is called higher order ambisonics (HOA) (see NPTL 1, for example).


In three-dimensional multichannel sound, sound information spreads along a spatial axis in addition to a time axis, and in ambisonics, information is held by performing frequency transformation, that is, spherical harmonic function transformation relative to an angular direction of three-dimensional polar coordinates. It is possible to consider that the spherical harmonic function transformation corresponds to time-frequency transformation of an audio signal with respect to a time axis.


Advantages of this method include ability to encode and decode information from any microphone array to any speaker array without limiting the number of microphones or the number of speakers.


In contrast, impediments to spread of ambisonics include need for a speaker array including a large number of speakers in a reproduction environment, and a narrow range (sweet spot) where it is possible to reproduce sound space.


For example, a speaker array including more speakers is necessary to increase spatial resolution of sound, but it is impractical to construct such a system at home or the like. In addition, in a space such as a movie theater, a region where it is possible to reproduce sound space is narrow, and it is difficult to give desired effects to an entire audience.


CITATION LIST
Non-Patent Literature



  • NPTL 1: Jerome Daniel, Rozenn Nicol, Sebastien Moreau, “Further Investigations of High Order Ambisonics and Wavefield Synthesis for Holophonic Sound Imaging,” AES 114th Convention, Amsterdam, Netherlands, 2003.



SUMMARY OF THE INVENTION
Problems to be Solved by the Invention

It is therefore conceivable to combine ambisonics and binaural reproduction technology. The binaural reproduction technology is generally called virtual auditory display (VAD), and is implemented using a head-related transfer function (HRTF).


Herein, the head-related transfer function expresses information regarding how sound is transmitted from every direction surrounding a human head to binaural eardrums as a function of frequency and arrival direction.


In a case where a synthesis obtained by synthesizing a target sound and the head-related transfer function from a certain direction is presented with headphones, a listener perceives the sound as if the sound comes from the direction of the head-related transfer function used, not from the headphones. The VAD is a system that utilizes such a principle.


In a case where a plurality of virtual speakers are reproduced by using the VAD, it is possible to achieve, by presentation with the headphones, the same effects as those of ambisonics in a speaker array including a plurality of speakers, which is difficult in reality.


However, such a system is unable to reproduce sound sufficiently efficiently. For example, in a case where ambisonics and binaural reproduction technology are combined, not only does an amount of operations such as a convolution operation of the head-related transfer function increase, but a usage amount of a memory used for the operations and the like also increases.


The present technology has been made in light of such a situation, and makes it possible to reproduce sound more efficiently.


Means for Solving the Problems

A signal processing device according to one aspect of the present technology includes: a rotation operation unit that rotates a head-related transfer function in a spherical harmonic domain by an operation on the basis of a rotation matrix corresponding to rotation of a head of a listener, the operation in which an order of the rotation matrix is limited; and a synthesis unit that synthesizes the head-related transfer function after rotation obtained by the operation and a sound signal of the spherical harmonic domain to generate a headphone drive signal.


A signal processing method or a program according to one aspect of the present technology includes steps of: rotating a head-related transfer function in a spherical harmonic domain by an operation on the basis of a rotation matrix corresponding to rotation of a head of a listener, the operation in which an order of the rotation matrix is limited; and synthesizing the head-related transfer function after rotation obtained by the operation and a sound signal of the spherical harmonic domain to generate a headphone drive signal.


In one aspect of the technology, the head-related transfer function in the spherical harmonic domain is rotated by the operation in which the order of the rotation matrix is limited on the basis of the rotation matrix corresponding to the rotation of the head of the listener, and the head-related transfer function after the rotation obtained by the operation and the sound signal of the spherical harmonic domain are synthesized to generate the headphone drive signal.


Effects of the Invention

According to one aspect of the present technology, it is possible to reproduce sound more efficiently.


It is to be noted that effects of the present technology are not necessarily limited to the effects described here, and may be any of the effects described in the present disclosure.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 is a diagram describing simulation of stereophony using a head-related transfer function.



FIG. 2 is a diagram describing calculation of a drive signal in a first technique.



FIG. 3 is a diagram describing calculation of a drive signal in a case where head tracking is performed.



FIG. 4 is a diagram describing calculation of a drive signal in a second technique.



FIG. 5 is a diagram describing calculation of a drive signal in a third technique.



FIG. 6 is a diagram describing an operation amount and a necessary memory amount.



FIG. 7 is a diagram describing calculation of a drive signal in a fourth technique.



FIG. 8 is a diagram describing a rotation matrix.



FIG. 9 is a diagram describing the rotation matrix.



FIG. 10 is a diagram describing the rotation matrix.



FIG. 11 is a diagram illustrating a configuration example of an audio processor.



FIG. 12 is a diagram describing a difference in an elevation angle direction.



FIG. 13 is a flow chart describing drive signal generation processing.



FIG. 14 is a diagram illustrating a configuration example of an audio processor.



FIG. 15 is a flow chart describing drive signal generation processing.



FIG. 16 is a diagram illustrating a configuration example of a control system.



FIG. 17 is a diagram describing resetting and an operation amount.



FIG. 18 is a diagram describing resetting for each degree.



FIG. 19 is a diagram describing resetting for each time frequency.



FIG. 20 is a diagram illustrating a configuration example of a control system.



FIG. 21 is a diagram illustrating a configuration example of a computer.





MODES FOR CARRYING OUT THE INVENTION

Some embodiments to which the present technology is applied are described below in detail with reference to the drawings.


First Embodiment
<About First Technique>

The present technology achieves a reproduction system that is more efficient in an operation amount and a memory usage amount by determining a head-related transfer function in a spherical harmonic domain corresponding to rotation of a head with use of accumulation of minute rotations and synthesizing, in the spherical harmonic domain, the head-related transfer function and an input signal of sound to be reproduced.


For example, spherical harmonic function transformation on a function f(θ, ϕ) on spherical coordinates is expressed by the following expression (1).










[Math. 1]

$$F_n^m = \int_0^{2\pi} \int_0^{\pi} f(\theta, \phi)\, \overline{Y_n^m(\theta, \phi)}\, \sin\theta \, d\theta \, d\phi \tag{1}$$







In the expression (1), θ and ϕ respectively represent an elevation angle and a horizontal angle in the spherical coordinates, and Ynm(θ, ϕ) represents a spherical harmonic function. In addition, the bar over the spherical harmonic function Ynm(θ, ϕ) denotes a complex conjugate of the spherical harmonic function Ynm(θ, ϕ).


Herein, the spherical harmonic function Ynm(θ, ϕ) is expressed by the following expression (2).









[Math. 2]

$$Y_n^m(\theta, \phi) = (-1)^m \sqrt{\frac{(2n+1)\,(n-m)!}{4\pi\,(n+m)!}}\; P_n^m(\cos\theta)\, e^{im\phi} \tag{2}$$







In the expression (2), n and m represent a degree and an order of the spherical harmonic function Ynm(θ, ϕ), and satisfy −n≤m≤n. The order m is also referred to as order or period, and hereinafter, in a case where it is not necessary to particularly distinguish n and m, the degree n and the order m are collectively referred to as degrees.


In addition, in the expression (2), i represents the imaginary unit, and Pnm(x) represents an associated Legendre function.


The associated Legendre function Pnm(x) is expressed by the following expression (3) or (4) in a case where n≥0 and 0≤m≤n. It is to be noted that the expression (3) is in a case where m=0.









[Math. 3]

$$P_n^0(x) = \frac{1}{2^n\, n!} \frac{d^n}{dx^n} \left(x^2 - 1\right)^n \tag{3}$$






[Math. 4]

$$P_n^m(x) = \left(1 - x^2\right)^{m/2} \frac{d^m}{dx^m} P_n^0(x) \tag{4}$$







In addition, in a case where −n≤m≤0, the associated Legendre function Pnm(x) is expressed by the following expression (5).









[Math. 5]

$$P_n^m(x) = (-1)^{-m}\, \frac{(n+m)!}{(n-m)!}\, P_n^{-m}(x) \tag{5}$$
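The expressions (2) to (5) may be evaluated numerically, for example, as in the following minimal Python sketch. It is to be noted that the function names assoc_legendre and sph_harm are hypothetical names introduced only for illustration, and that the upward recurrence in the degree n is a standard numerical substitute for the derivatives in the expressions (3) and (4), not a procedure stated in the present disclosure.

```python
import cmath
import math


def assoc_legendre(n: int, m: int, x: float) -> float:
    """Associated Legendre function P_n^m(x) in the sense of expressions (3) to (5)."""
    if abs(m) > n:
        raise ValueError("the order must satisfy -n <= m <= n")
    if m < 0:
        # Expression (5): a negative order is mapped onto the positive order -m.
        scale = (-1) ** (-m) * math.factorial(n + m) / math.factorial(n - m)
        return scale * assoc_legendre(n, -m, x)
    # For n = m, expressions (3) and (4) reduce to (2m-1)!! (1 - x^2)^(m/2).
    p_mm = max(0.0, 1.0 - x * x) ** (m / 2)
    for k in range(1, m + 1):
        p_mm *= 2 * k - 1
    if n == m:
        return p_mm
    # Standard upward recurrence in the degree for a fixed order m (assumption:
    # used here in place of the explicit derivatives of expressions (3) and (4)).
    p_prev, p_curr = p_mm, x * (2 * m + 1) * p_mm
    for deg in range(m + 2, n + 1):
        p_next = ((2 * deg - 1) * x * p_curr - (deg + m - 1) * p_prev) / (deg - m)
        p_prev, p_curr = p_curr, p_next
    return p_curr


def sph_harm(n: int, m: int, theta: float, phi: float) -> complex:
    """Spherical harmonic Y_n^m(theta, phi) following expression (2).

    theta is the elevation angle and phi is the horizontal angle.
    """
    norm = math.sqrt((2 * n + 1) * math.factorial(n - m)
                     / (4.0 * math.pi * math.factorial(n + m)))
    phase = cmath.exp(1j * m * phi)
    return (-1) ** m * norm * assoc_legendre(n, m, math.cos(theta)) * phase
```

For example, sph_harm(0, 0, θ, ϕ) evaluates to 1/√(4π) for any angles, and sph_harm(1, 0, θ, ϕ) evaluates to √(3/(4π)) cos θ.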







Further, inverse transformation from a function Fnm obtained by the spherical harmonic function transformation into the function f(θ, ϕ) on the spherical coordinates is as expressed in the following expression (6).









[Math. 6]

$$f(\theta, \phi) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} F_n^m\, Y_n^m(\theta, \phi) \tag{6}$$







From the above, transformation from an input signal D′nm(ω) of sound after correction in a radial direction, which is held in the spherical harmonic domain, into a speaker drive signal S(xi, ω) of each of L number of speakers arranged on a spherical surface having a radius R is as expressed in the following expression (7).









[Math. 7]

$$S(x_i, \omega) = \sum_{n=0}^{N} \sum_{m=-n}^{n} D_n^{\prime m}(\omega)\, Y_n^m(\beta_i, \alpha_i) \tag{7}$$







It is to be noted that in the expression (7), xi represents a position of the speaker, and ω represents a time frequency of a sound signal. The input signal D′nm(ω) is a sound signal corresponding to each degree n and each order m of the spherical harmonic function for a predetermined time frequency ω.


Further, xi=(R sin βi cos αi, R sin βi sin αi, R cos βi), and i represents a speaker index that specifies the speaker. Herein, i=1, 2, . . . , L, and βi and αi respectively represent an elevation angle and a horizontal angle that indicate a position of the i-th speaker.
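The relationship between the angles (βi, αi) and the position xi may be sketched in Python as follows. It is to be noted that the function names and the ring layout in the horizontal plane (βi = π/2, equally spaced αi) are assumptions introduced only for illustration.

```python
import math


def speaker_position(radius: float, beta: float, alpha: float) -> tuple:
    """Position x_i = (R sin(beta) cos(alpha), R sin(beta) sin(alpha), R cos(beta))."""
    return (radius * math.sin(beta) * math.cos(alpha),
            radius * math.sin(beta) * math.sin(alpha),
            radius * math.cos(beta))


def ring_positions(count: int, radius: float = 1.0) -> list:
    """Hypothetical layout: `count` speakers equally spaced on a horizontal ring."""
    return [speaker_position(radius, math.pi / 2.0, 2.0 * math.pi * i / count)
            for i in range(count)]
```

Every position returned by ring_positions lies at the distance R from the origin, which corresponds to arrangement on a spherical surface having the radius R.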


Such transformation expressed by the expression (7) is spherical harmonic inverse transformation corresponding to the expression (6). In addition, in a case of determining the speaker drive signal S(xi, ω) by the expression (7), it is necessary for the L number of speakers and a degree N of the spherical harmonic function, that is, a maximum value N of the degree n to satisfy a relationship expressed by the following expression (8). The L number of speakers is the number of reproducing speakers.





[Math. 8]

$$L > (N+1)^2 \tag{8}$$
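The condition of the expression (8) may be checked, for example, as in the following Python sketch. The function names are hypothetical. The count (N+1)² of coefficients follows from summing the 2n+1 orders over the degrees n = 0 to N.

```python
def num_coefficients(max_degree: int) -> int:
    """Number of (n, m) pairs with 0 <= n <= N and -n <= m <= n, i.e. (N+1)^2."""
    return sum(2 * n + 1 for n in range(max_degree + 1))


def satisfies_expression_8(num_speakers: int, max_degree: int) -> bool:
    """True when L > (N+1)^2 as required by expression (8)."""
    return num_speakers > (max_degree + 1) ** 2


def max_degree_for(num_speakers: int) -> int:
    """Largest N satisfying expression (8), or -1 when no degree satisfies it."""
    degree = -1
    while satisfies_expression_8(num_speakers, degree + 1):
        degree += 1
    return degree
```

With eight speakers, for example, the expression (8) holds for N=1, for which (N+1)² is 4, but not for N=2, for which (N+1)² is 9.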


Incidentally, a general technique of simulating stereophony at ears by presentation with headphones is, for example, a method using the head-related transfer function as illustrated in FIG. 1.


In an example illustrated in FIG. 1, an inputted ambisonics signal is decoded to generate a speaker drive signal of each of virtual speakers SP11-1 to SP11-8, which are a plurality of virtual speakers. The signal decoded at this time corresponds to, for example, the input signal D′nm(ω) described above.


Herein, the virtual speakers SP11-1 to SP11-8 are virtually arranged in a ring, and the speaker drive signal of each of the virtual speakers is determined by the calculation of the expression (7) described above. It is to be noted that the virtual speakers are also simply referred to as virtual speakers SP11 hereinafter in a case where it is not necessary to particularly distinguish the virtual speakers SP11-1 to SP11-8.


In a case where the speaker drive signals of the respective virtual speakers SP11 are thus obtained, for each of the virtual speakers SP11, left and right drive signals (binaural signals) of headphones HD11 that actually reproduce sound are generated by a convolution operation using the head-related transfer function. Then, the sum of the respective drive signals of the headphones HD11 obtained for each of the virtual speakers SP11 is a final drive signal.


It is to be noted that such a technique is described in detail in, for example, “ADVANCED SYSTEM OPTIONS FOR BINAURAL RENDERING OF AMBISONIC FORMAT” (Gerald Enzner et al., ICASSP 2013) and the like.


The head-related transfer function H(x, ω) used to generate the left and right drive signals of the headphones HD11 is obtained by normalizing a transfer characteristic H1(x, ω) from a sound source position x in a state in which a head of a user, who is a listener, exists in free space to positions of eardrums of the user by a transfer characteristic H0(x, ω) from the sound source position x in a state in which the head does not exist to a head center O. That is, the head-related transfer function H(x, ω) for the sound source position x is obtained by the following expression (9).









[Math. 9]

$$H(x, \omega) = \frac{H_1(x, \omega)}{H_0(x, \omega)} \tag{9}$$







Herein, the head-related transfer function H(x, ω) is convolved with an arbitrary audio signal, and a thus-obtained result is presented with headphones or the like, which makes it possible to give, to the listener, an illusion as if sound comes from a direction of the convolved head-related transfer function H(x, ω), that is, a direction of the sound source position x.


In the example illustrated in FIG. 1, the left and right drive signals of the headphones HD11 are generated with use of such a principle.


Specifically, the position of each of the virtual speakers SP11 is set as the position xi, and the speaker drive signals of these virtual speakers SP11 are set as S(xi, ω).


In addition, the number of the virtual speakers SP11 is set as L (herein, L=8), and the final left and right drive signals of the headphones HD11 are respectively set as Pl and Pr.


In this case, in a case where the speaker drive signals S(xi, ω) are simulated by presentation with the headphones HD11, it is possible to determine the left and right drive signals Pl and Pr of the headphones HD11 by calculation of the following expression (10).









[Math. 10]

$$P_l = \sum_{i=1}^{L} S(x_i, \omega)\, H_l(x_i, \omega), \qquad P_r = \sum_{i=1}^{L} S(x_i, \omega)\, H_r(x_i, \omega) \tag{10}$$







It is to be noted that, in the expression (10), Hl(xi, ω) and Hr(xi, ω) represent normalized head-related transfer functions from the position xi of the virtual speaker SP11 to left and right eardrum positions of the listener, respectively.


Such an operation makes it possible to finally reproduce the input signal D′nm(ω) of the spherical harmonic domain by presentation with the headphones. That is, it is possible to achieve the same effects as those of ambisonics by presentation with the headphones.


It is to be noted that, hereinafter, in a case where it is not necessary to particularly distinguish the drive signal Pl and the drive signal Pr for the time frequency ω, the drive signal Pl and the drive signal Pr are also simply referred to as drive signals P(ω). In addition, in a case where it is not necessary to particularly distinguish the head-related transfer function Hl(xi, ω) and the head-related transfer function Hr(xi, ω), the head-related transfer function Hl(xi, ω) and the head-related transfer function Hr(xi, ω) are also simply referred to as head-related transfer functions H(xi, ω).


Further, hereinafter, the technique of combining ambisonics and binaural reproduction technology described above is also referred to as first technique.


In the first technique, for example, an operation illustrated in FIG. 2 is performed to obtain the drive signal P(ω) of 1×1, that is, one row and one column.


In FIG. 2, H(ω) represents a vector (matrix) of 1×L including the L number of head-related transfer functions H(xi, ω). In addition, D′(ω) represents a vector including the input signal D′nm(ω), and the vector D′(ω) becomes K×1, where the number of input signals D′nm(ω) of bins of the same time frequency ω is K. Further, Y(x) represents a matrix including the spherical harmonic function Ynm(βi, αi) of each degree, and the matrix Y(x) becomes a matrix of L×K.


Accordingly, in the first technique, a matrix (vector) S obtained from a matrix operation of the matrix Y(x) of L×K and the vector D′(ω) of K×1 is determined, and a matrix operation of the matrix S and the vector (matrix) H(ω) of 1×L is further performed to obtain one drive signal P(ω).
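The two matrix operations of the first technique may be sketched in Python as follows for one time frequency ω and one ear. The function names are hypothetical, and the matrices are represented as plain lists of complex numbers only for illustration.

```python
def mat_vec(matrix: list, vector: list) -> list:
    """Product of an (L x K) matrix and a (K x 1) vector, giving L values."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def first_technique_drive_signal(hrtf_row: list, sh_matrix: list, coeffs: list) -> complex:
    """Drive signal P(w) = H(w) (Y(x) D'(w)) for one time frequency.

    hrtf_row  : 1 x L head-related transfer functions H(x_i, w)
    sh_matrix : L x K spherical harmonic matrix Y(x)
    coeffs    : K x 1 input signal D'(w) of the spherical harmonic domain
    """
    speaker_signals = mat_vec(sh_matrix, coeffs)          # S = Y(x) D'(w), L values
    return sum(h * s for h, s in zip(hrtf_row, speaker_signals))  # P = H(w) S
```

In this sketch, the first product takes L×K multiplications and the second product takes L multiplications for each time frequency and each ear.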


In addition, in a case where the head of the listener wearing the headphones HD11 rotates in a predetermined direction expressed by a rotation matrix gj (hereinafter also referred to as direction gj), for example, the drive signal Pl(gj, ω) of a left headphone of the headphones HD11 is as expressed in the following expression (11).









[Math. 11]

$$P_l(g_j, \omega) = \sum_{i=1}^{L} S(x_i, \omega)\, H_l\!\left(g_j^{-1} x_i, \omega\right) \tag{11}$$







It is to be noted that the rotation matrix gj is a three-dimensional, i.e., 3×3 rotation matrix represented by ϕ, θ, and ψ that are rotational angles of Euler angles. In addition, in the expression (11), the drive signal Pl(gj, ω) represents the drive signal Pl described above, and is written as the drive signal Pl(gj, ω) herein to clarify the position, that is, the direction gj and the time frequency ω.


In this case, the rotation direction of the head of the listener, that is, the direction gj of the head of the listener may be obtained by some sensor, and left and right drive signals of the headphones HD11 may be calculated using the head-related transfer function of a relative direction gj−1xi of each of the virtual speakers SP11 viewed from the head of the listener from among a plurality of head-related transfer functions. Thus, even in a case where sound is reproduced by the headphones HD11, it is possible to fix a sound image position viewed from the listener in space similarly to a case where real speakers are used.
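The relative direction gj−1xi may be computed, for example, as in the following Python sketch. It is to be noted that the restriction to a rotation about the vertical axis, the axis convention, and the function names are assumptions introduced only for illustration. The sketch uses the fact that the inverse of a rotation matrix is its transpose.

```python
import math


def yaw_rotation(angle: float) -> list:
    """3 x 3 rotation matrix for a rotation by `angle` about the z axis (assumed vertical)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]


def transpose(matrix: list) -> list:
    """Transpose of a matrix; for a rotation matrix this equals the inverse."""
    return [list(column) for column in zip(*matrix)]


def relative_direction(g: list, x: tuple) -> tuple:
    """Position of a virtual speaker viewed from the rotated head, g^{-1} x."""
    g_inv = transpose(g)
    return tuple(sum(g_inv[r][c] * x[c] for c in range(3)) for r in range(3))
```

For example, in a case where the head rotates by 90 degrees counterclockwise about the vertical axis, a virtual speaker at (1, 0, 0) is viewed from the head at approximately (0, −1, 0), and the head-related transfer function for that relative direction is used.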


<About Second Technique>

In addition, the convolution of the head-related transfer function that is performed in the time frequency domain in the first technique may instead be performed in a spherical harmonic domain. Doing so makes it possible to reduce the operation amount and the necessary memory amount as compared with the first technique, and to reproduce sound more efficiently. Such a technique of convolving the head-related transfer function in the spherical harmonic domain is also referred to as second technique, and the second technique is described below.


For example, in a case where attention is focused on the left headphone, the vector Pl(ω) including each of the drive signals Pl(gj, ω) of the left headphone for all rotation directions of the head of the user, who is a listener, is expressed by the following expression (12).









[Math. 12]

$$P_l(\omega) = H(\omega)\, S(\omega) = H(\omega)\, Y(x)\, D'(\omega) \tag{12}$$







It is to be noted that, in the expression (12), S(ω) is a vector including the speaker drive signal S(xi, ω), and S(ω)=Y(x)D′(ω). In addition, in the expression (12), Y(x) represents a matrix, expressed by the following expression (13), including the spherical harmonic function Ynm(xi) of each degree for the position xi of each of the virtual speakers. Herein, i=1, 2, . . . , L, and a maximum value (maximum degree) of the degree n is N.


D′(ω) represents a vector (matrix) including the input signal D′nm(ω) of sound corresponding to each degree, which is expressed by the following expression (14). Each input signal D′nm(ω) is a sound signal of the spherical harmonic domain.


Further, in the expression (12), H(ω) represents a matrix, as expressed by the following expression (15), including the head-related transfer function H(gj−1xi, ω) of the relative direction gj−1xi of each of the virtual speakers viewed from the head of the listener in a case where the direction of the head of the listener is the direction gj. In this example, the head-related transfer function H(gj−1xi, ω) of each of the virtual speakers is prepared for each of the total M number of directions g1 to gM.









[Math. 13]

$$Y(x) = \begin{pmatrix} Y_0^0(x_1) & \cdots & Y_N^N(x_1) \\ \vdots & \ddots & \vdots \\ Y_0^0(x_L) & \cdots & Y_N^N(x_L) \end{pmatrix} \tag{13}$$






[Math. 14]

$$D'(\omega) = \begin{pmatrix} D_0^{\prime 0}(\omega) \\ \vdots \\ D_N^{\prime N}(\omega) \end{pmatrix} \tag{14}$$






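[Math. 15]

$$H(\omega) = \begin{pmatrix} H\!\left(g_1^{-1} x_1, \omega\right) & \cdots & H\!\left(g_1^{-1} x_L, \omega\right) \\ \vdots & \ddots & \vdots \\ H\!\left(g_M^{-1} x_1, \omega\right) & \cdots & H\!\left(g_M^{-1} x_L, \omega\right) \end{pmatrix} \tag{15}$$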







In calculating the drive signal Pl(gj, ω) of the left headphone in a case where the head of the listener is directed in the direction gj, it is sufficient if a row corresponding to the direction gj, which is the direction of the head of the listener, that is, a row including the head-related transfer function H(gj−1xi, ω) for the direction gj is selected from the matrix H(ω) of the head-related transfer functions to perform calculation of the expression (12).


In this case, only a necessary row is calculated as illustrated in FIG. 3, for example.


In this example, the head-related transfer function is prepared for each of the M number of directions; therefore, matrix calculation expressed by the expression (12) is as indicated by an arrow A11.


That is, in a case where the number of input signals D′nm(ω) of the time frequency ω is K, the vector D′(ω) is K×1, that is, a matrix of K rows and one column. In addition, the matrix Y(x) of the spherical harmonic function is L×K, and the matrix H(ω) is M×L. Accordingly, in the calculation of the expression (12), the vector Pl(ω) is M×1.


Herein, a matrix operation (product-sum operation) of the matrix Y(x) and the vector D′(ω) is first performed in an online operation to determine the vector S(ω), which makes it possible to select a row corresponding to the direction gj of the head of the listener in the matrix H(ω) as indicated by the arrow A12 and reduce the operation amount at the time of calculation of the drive signal Pl(gj, ω). In FIG. 3, a hatched portion in the matrix H(ω) represents the row corresponding to the direction gj, and an operation of this row and the vector S(ω) is performed to calculate the desired drive signal Pl(gj, ω) of the left headphone.


Herein, the matrix H′(ω) is defined as expressed by the following expression (16), which makes it possible to express, by the following expression (17), the vector Pl(ω) expressed by the expression (12).





[Math. 16]






H′(ω)=H(ω)Y(x)  (16)





[Math. 17]

$$P_l(\omega) = H'(\omega)\, D'(\omega) \tag{17}$$


In the expression (16), the head-related transfer function, more specifically, the matrix H(ω) including the head-related transfer function in the time-frequency domain, is transformed by the spherical harmonic function transformation using the spherical harmonic function into the matrix H′(ω) including the head-related transfer function in the spherical harmonic domain.


Accordingly, in calculation of the expression (17), convolution of the speaker drive signal and the head-related transfer function is performed in the spherical harmonic domain. In other words, in the spherical harmonic domain, the product-sum operation of the head-related transfer function and the input signal is performed. It is to be noted that it is possible to calculate and hold the matrix H′(ω) in advance.


In this case, in calculating the drive signal Pl(gj, ω) of the left headphone in a case where the head of the listener is directed in the direction gj, it is sufficient if only the row corresponding to the direction gj of the head of the listener is selected from the matrix H′(ω) held in advance to calculate the expression (17).


In such a case, calculation of the expression (17) is calculation expressed by the following expression (18). This makes it possible to greatly reduce the operation amount and the necessary memory amount.









[Math. 18]

$$P_l(g_j, \omega) = \sum_{n=0}^{N} \sum_{m=-n}^{n} H_n^{\prime m}(g_j, \omega)\, D_n^{\prime m}(\omega) \tag{18}$$







In the expression (18), H′nm(gj, ω) is one element of the matrix H′(ω), that is, a head-related transfer function in the spherical harmonic domain, which is a component (element) corresponding to the direction gj of the head in the matrix H′(ω). In the head-related transfer function H′nm(gj, ω), n and m represent the degree n and the order m of the spherical harmonic function.


In such an operation expressed by the expression (18), the operation amount is reduced as illustrated in FIG. 4. That is, calculation expressed by the expression (12) is calculation to determine a product of the matrix H(ω) of M×L, the matrix Y(x) of L×K, and the vector D′(ω) of K×1 as indicated by an arrow A21 in FIG. 4.


Herein, H(ω)Y(x) is the matrix H′(ω) as defined in the expression (16); therefore, the calculation indicated by the arrow A21 eventually becomes as indicated by an arrow A22. In particular, it is possible to perform calculation for determining the matrix H′(ω) offline, that is, in advance; therefore, determining and holding the matrix H′(ω) in advance makes it possible to reduce the operation amount for determining the drive signals of the headphones online by that amount.


In a case where the matrix H′(ω) is thus determined in advance, the calculation indicated by the arrow A22, that is, the calculation of the expression (18) described above is performed to actually determine the drive signals of the headphones.


That is, as indicated by the arrow A22, the row corresponding to the direction gj of the head of the listener in the matrix H′(ω) is selected, and the drive signal Pl(gj, ω) of the left headphone is calculated by a matrix operation of that selected row and the vector D′(ω) including the inputted input signal D′nm(ω). In FIG. 4, a hatched portion in the matrix H′(ω) represents the row corresponding to the direction gj, and an element included in this row is the head-related transfer function H′nm(gj, ω) expressed by the expression (18).
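The offline and online portions of the second technique may be sketched in Python as follows. The function names are hypothetical, and the matrices are plain lists of complex numbers for one time frequency ω.

```python
def mat_mul(a: list, b: list) -> list:
    """Product of an (M x L) matrix and an (L x K) matrix."""
    columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in columns] for row in a]


def precompute_sh_hrtf(hrtf_matrix: list, sh_matrix: list) -> list:
    """Offline step of expression (16): H'(w) = H(w) Y(x), an M x K matrix."""
    return mat_mul(hrtf_matrix, sh_matrix)


def second_technique_drive_signal(sh_hrtf: list, direction_index: int, coeffs: list) -> complex:
    """Online step of expression (18): the row for direction g_j times D'(w)."""
    row = sh_hrtf[direction_index]
    return sum(h * d for h, d in zip(row, coeffs))
```

In this sketch, the online step takes only K multiplications for each time frequency and each ear, whereas computing H(ω)(Y(x)D′(ω)) online takes L×K+L multiplications, and both give the same drive signal.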


<About Third Technique>

Incidentally, in the second technique described above, while it is possible to greatly reduce the operation amount and the necessary memory amount, it is necessary to hold all the rotation directions of the head of the listener, that is, the rows corresponding to the respective directions gj in a memory as the matrix H′(ω) of the head-related transfer functions.


Accordingly, a matrix (vector) including the head-related transfer function of the spherical harmonic domain for one direction gj may be set as Hs(ω)=H′(gj), and only the matrix Hs(ω) including the row corresponding to the one direction gj of the matrix H′(ω) may be held, and a rotation matrix R′(gj) for performing rotation corresponding to head rotation of the listener in the spherical harmonic domain may be held by the number of the plurality of directions gj. Hereinafter, such a technique is referred to as third technique.


The rotation matrix R′(gj) of each of the directions gj is different from the matrix H′(ω) and has no time frequency dependence. This makes it possible to greatly reduce the memory amount as compared with making the matrix H′(ω) hold the component of the direction gj of rotation of the head.


First, a product H′(gj−1, ω) of a row H(gj−1x, ω) corresponding to a predetermined direction gj of the matrix H(ω) and the matrix Y(x) of the spherical harmonic function is considered as expressed by the following expression (19).





[Math. 19]

$$H'\!\left(g_j^{-1}, \omega\right) = H\!\left(g_j^{-1} x, \omega\right) Y(x) \tag{19}$$


In the second technique described above, coordinates of the head-related transfer function used are rotated from x to gj−1x for the direction gj of the rotation of the head of the listener. However, the same result is obtainable by rotating coordinates of the spherical harmonic function from x to gjx without changing the coordinates of the position x of the head-related transfer function. That is, the following expression (20) is established.





[Math. 20]

$$H'\!\left(g_j^{-1}, \omega\right) = H\!\left(g_j^{-1} x, \omega\right) Y(x) = H(x, \omega)\, Y(g_j x) \tag{20}$$


Further, the matrix Y(gjx) of the spherical harmonic function is the product of the matrix Y(x) and the rotation matrix R′(gj−1), and is as expressed by the following expression (21). It is to be noted that the rotation matrix R′(gj−1) is a matrix that rotates the coordinates by gj in the spherical harmonic domain.





[Math. 21]

$$Y(g_j x) = Y(x)\, R'\!\left(g_j^{-1}\right) \tag{21}$$


Herein, for the set Q expressed by the following expression (22), elements of the rotation matrix R′(gj) other than the elements in the row (n²+n+1+k) and the column (n²+n+1+m), where both (n²+n+1+k) and (n²+n+1+m) belong to Q, are zero.





[Math. 22]

$$Q = \left\{\, q \mid n^2 + 1 \le q \le (n+1)^2,\ \ q, n \in \{0, 1, 2, \ldots\} \,\right\} \tag{22}$$
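The index arithmetic behind the set Q may be sketched in Python as follows. The function names are hypothetical, and the reading that the rotation matrix is block diagonal with one (2n+1)×(2n+1) block for each degree n is an interpretation of the above description, stated here as an assumption.

```python
def sh_index(n: int, m: int) -> int:
    """1-based row or column index n^2 + n + 1 + m for degree n and order m."""
    if abs(m) > n:
        raise ValueError("the order must satisfy -n <= m <= n")
    return n * n + n + 1 + m


def degree_block(n: int) -> range:
    """Indices q with n^2 + 1 <= q <= (n+1)^2, i.e. the set Q for degree n."""
    return range(n * n + 1, (n + 1) ** 2 + 1)


def may_be_nonzero(row: int, column: int, max_degree: int) -> bool:
    """True when row and column fall in the block of the same degree n."""
    return any(row in degree_block(n) and column in degree_block(n)
               for n in range(max_degree + 1))
```

For example, the degree n=1 occupies the indices 2 to 4, and the degree n=2 occupies the indices 5 to 9, so that an element in the row 2 and the column 4 may be non-zero whereas an element in the row 1 and the column 2 is zero.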


Accordingly, it is possible to express the spherical harmonic function Ynm(gjx), which is an element of the matrix Y(gjx), by the following expression (23) using the element R′(n)k,m(gj) in the row (n²+n+1+k) and the column (n²+n+1+m) of the rotation matrix R′(gj).









[Math. 23]

$$Y_n^m(g_j x)=\sum_{k=-n}^{n} Y_n^k(x)\,R'^{(n)}_{k,m}(g_j^{-1}) \tag{23}$$







Herein, the element R′(n)k,m(gj) is expressed by the following expression (24).





[Math. 24]






$$R'^{(n)}_{k,m}(g_j)=e^{-im\phi}\,r^{(n)}_{k,m}(\theta)\,e^{-ik\psi} \tag{24}$$


In the expression (24), i represents the imaginary unit, θ, ϕ, and ψ represent the Euler angles of the rotation, and r(n)k,m(θ) is expressed by the following expression (25).














[Math. 25]

$$r^{(n)}_{k,m}(\theta)=\sqrt{\frac{(n+k)!\,(n-k)!}{(n+m)!\,(n-m)!}}\;\sum_{\sigma}\binom{n+m}{n-k-\sigma}\binom{n-m}{\sigma}(-1)^{n-k-\sigma}\left(\cos\frac{\theta}{2}\right)^{2\sigma+k+m}\left(\sin\frac{\theta}{2}\right)^{2n-2\sigma-k-m} \tag{25}$$
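As a minimal sketch of how the elements of the expressions (24) and (25) may be evaluated, the following Python code can be considered. It is to be noted that the function names small_r, rotation_element, and rotation_block are hypothetical names introduced only for illustration, and that the range of σ is assumed to cover all values for which both binomial coefficients are defined.

```python
import cmath
import math


def small_r(n, k, m, theta):
    """Element r^(n)_{k,m}(theta), evaluated term by term as in expression (25)."""
    prefactor = math.sqrt(
        math.factorial(n + k) * math.factorial(n - k)
        / (math.factorial(n + m) * math.factorial(n - m))
    )
    c, s = math.cos(theta / 2.0), math.sin(theta / 2.0)
    # Assumed summation range: every sigma for which both binomials are defined.
    sigma_lo = max(0, -(k + m))
    sigma_hi = min(n - k, n - m)
    total = 0.0
    for sigma in range(sigma_lo, sigma_hi + 1):
        total += (
            math.comb(n + m, n - k - sigma)
            * math.comb(n - m, sigma)
            * (-1) ** (n - k - sigma)
            * c ** (2 * sigma + k + m)
            * s ** (2 * n - 2 * sigma - k - m)
        )
    return prefactor * total


def rotation_element(n, k, m, phi, theta, psi):
    """Element R'^(n)_{k,m}(g) as in expression (24)."""
    return cmath.exp(-1j * m * phi) * small_r(n, k, m, theta) * cmath.exp(-1j * k * psi)


def rotation_block(n, phi, theta, psi):
    """(2n+1) x (2n+1) block for degree n; rows are k, columns are m, both from -n to n."""
    orders = range(-n, n + 1)
    return [[rotation_element(n, k, m, phi, theta, psi) for m in orders] for k in orders]
```

For example, rotation_block(2, phi, theta, psi) returns the 5×5 block of degree 2, and only such blocks need to be held because all other elements of the rotation matrix are zero.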







From the above, a binaural reproduction signal reflecting the rotation of the head of the listener, for example, the drive signal Pl(gj, ω) of the left headphone, is obtained by calculating the following expression (26) using the rotation matrix R′(gj−1). In addition, in a case where the left and right head-related transfer functions may be regarded as symmetric, inversion using a matrix Rref that horizontally flips either the vector D′(ω) of the input signal or the row vector Hs(ω) of the left head-related transfer function is performed as pre-processing of the expression (26), which makes it possible to obtain a right headphone drive signal while holding only the row vector Hs(ω) of the left head-related transfer function. Note that the description below basically assumes a case where different left and right head-related transfer functions are necessary.














[Math. 26]

$$P_l(g_j,\omega)=H(g_j^{-1}x,\omega)\,Y(x)\,D'(\omega)=H(x,\omega)\,Y(x)\,R'(g_j^{-1})\,D'(\omega)=H_S(\omega)\,R'(g_j^{-1})\,D'(\omega) \tag{26}$$







In the expression (26), the drive signal Pl(gj, ω) is determined by synthesizing the row vector Hs(ω), the rotation matrix R′(gj−1), and the vector D′(ω).


The calculation as described above is, for example, calculation illustrated in FIG. 5. That is, the vector Pl(ω) including the drive signal Pl(gj, ω) of the left headphone is obtained by the product of the matrix H(ω) of M×L, the matrix Y(x) of L×K, and the vector D′(ω) of K×1, as indicated by an arrow A41 in FIG. 5. This matrix operation is as expressed by the expression (12) described above.


This operation is represented by using the matrix Y(gjx) of the spherical harmonic function prepared for each of M number of directions gj, as indicated by an arrow A42. That is, the vector Pl(ω) including the drive signal Pl(gj, ω) corresponding to each of the M number of directions gj is obtained by the product of the predetermined row H(x, ω) of the matrix H(ω), the matrix Y(gjx), and the vector D′(ω) from a relationship expressed by the expression (20).


Herein, the row H(x, ω), which is a vector, is 1×L, the matrix Y(gjx) is L×K, and the vector D′(ω) is K×1. This is further transformed by using relationships expressed by the expressions (17) and (21), which is as indicated by an arrow A43. That is, as expressed by the expression (26), the vector Pl(ω) is obtained by the product of the row vector Hs(ω) of 1×K, the rotation matrix R′(gj−1) of K×K of each of the M number of directions gj, and the vector D′(ω) of K×1.


It is to be noted that, in FIG. 5, hatched portions of the rotation matrix R′(gj−1) represent non-zero elements of the rotation matrix R′(gj−1).
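The product of the expression (26) for one time frequency bin may be sketched as follows, using only the non-zero blocks of the rotation matrix. This is a minimal sketch, assuming that the block of the degree n occupies the (n²+1)-th to (n+1)²-th elements as in the expression (22); the function name drive_signal and the layout of its arguments are hypothetical.

```python
def drive_signal(h_s, blocks, d):
    """Compute Hs(w) R'(g^-1) D'(w) for one time frequency bin.

    h_s    : list of K numbers (row vector Hs)
    blocks : blocks[n] is the (2n+1) x (2n+1) non-zero block of R' for degree n
    d      : list of K numbers (column vector D')
    """
    total = 0j
    for n, block in enumerate(blocks):
        start = n * n          # zero-based index of the first element of degree n
        width = 2 * n + 1
        for r in range(width):
            for c in range(width):
                total += h_s[start + r] * block[r][c] * d[start + c]
    return total
```

Because elements outside the blocks are never touched, the number of multiplications follows the number of non-zero elements rather than K×K.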


In addition, FIG. 6 illustrates the operation amount and the necessary memory amount in such a third technique.


That is, it is assumed that, as illustrated in FIG. 6, the row vector Hs(ω) of 1×K is prepared for each time frequency bin ω, the rotation matrix R′(gj−1) of K×K is prepared for the M number of directions gj, and the vector D′(ω) is K×1. In addition, it is assumed that the number of time frequency bins ω is W, and the maximum value of the degree n of the spherical harmonic function, that is, the maximum degree is J.


At this time, the number of non-zero elements of the rotation matrix R′(gj) is (J+1)(2J+1)(2J+3)/3; therefore, the total calc/W of the number of product-sum operations per time frequency bin ω in the third technique is as expressed by the following expression (27).









[Math. 27]

$$\mathrm{calc}/W=\frac{(J+1)(2J+1)(2J+3)}{3}+2K \tag{27}$$







In addition, in the operation by the third technique, it is necessary to hold the row vector Hs(ω) of 1×K for each time frequency bin ω for left and right ears, and further, it is necessary to hold non-zero elements of the rotation matrix R′(gj−1) for each of the M number of directions. Accordingly, a memory amount “memory” necessary for the operation by the third technique is as expressed by the following expression (28).









[Math. 28]

$$\mathrm{memory}=M\times\frac{(J+1)(2J+1)(2J+3)}{3}+2\times K\times W \tag{28}$$







In the third technique, holding only the non-zero elements of the rotation matrix R′(gj−1) makes it possible to greatly reduce the necessary memory amount as compared with the second technique.
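The operation amount of the expression (27) and the memory amount of the expression (28) may be evaluated, for example, as in the following sketch. It is assumed here that K=(J+1)², which follows from the block layout of the expression (22) but is not restated in this part of the description; the function names are hypothetical.

```python
def nonzero_elements(max_degree):
    """Non-zero elements of the block diagonal R': sum of (2n+1)^2 for n = 0..J."""
    return sum((2 * n + 1) ** 2 for n in range(max_degree + 1))


def third_technique_cost(max_degree, num_directions, num_bins):
    """Return (calc/W, memory) following expressions (27) and (28)."""
    J = max_degree
    K = (J + 1) ** 2                                  # assumed vector length
    nnz = (J + 1) * (2 * J + 1) * (2 * J + 3) // 3    # closed form of the sum above
    calc_per_bin = nnz + 2 * K                        # expression (27)
    memory = num_directions * nnz + 2 * K * num_bins  # expression (28)
    return calc_per_bin, memory
```

For J=4, M=1000, and W=512, this gives 215 product-sum operations per time frequency bin and a memory amount of 190600 elements.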


<About Fourth Technique>

It is to be noted that, in the third technique, it is necessary to hold the rotation matrices R′(gj−1) for rotation of three axes of the head of the listener, that is, for each of M arbitrary directions gj. To hold such rotation matrices R′(gj−1), a certain memory amount is necessary, though the amount is less than that in a case of holding the matrix H′(ω) with time frequency dependence.


Accordingly, the rotation matrix R′(gj−1) for performing rotation about the head of the listener as a rotation center in the spherical harmonic domain may be sequentially determined at the time of an operation. Hereinafter, such a technique is also referred to as fourth technique.


Herein, it is possible to express a rotation matrix R′(g) by the following expression (29). In addition, g in the expression (29) is a rotation matrix, and is represented by the product of a matrix u(ϕ), a matrix a(θ), and a matrix u(ψ) as expressed by the following expression (30).





[Math. 29]






R′(g)=R′(u(ϕ)a(θ)u(ψ))=R′(u(ϕ))R′(a(θ))R′(u(ψ))   (29)





[Math. 30]






g=u(ϕ)a(θ)u(ψ)  (30)


It is to be noted that, in the expression (29), a(θ) and u(ϕ) are rotation matrices that rotate coordinates by an angle θ and an angle ϕ, respectively, each about a predetermined coordinate axis as a rotation axis of a coordinate system in which the position of the head of the listener is an origin point. In addition, u(ψ) is a rotation matrix that is only different in the rotation angle from u(ϕ) and rotates the coordinates by an angle ψ about the same coordinate axis as the rotation axis. It is to be noted that rotation angles of the respective matrices u(ϕ), a(θ), and u(ψ), that is, the angle ϕ, the angle θ, and the angle ψ are Euler angles.


For example, it is assumed that there is an orthogonal coordinate system in which the position of the head of the listener is set as the origin point, and an x axis, a y axis, and a z axis orthogonal to each other are respective axes. Herein, in a state in which the listener is directed to front, a positive direction of the x axis is a direction of the front, and the z axis is an upward-downward direction viewed from the listener directed to the front, that is, an axis in a vertical direction. The angle ϕ, the angle θ, and the angle ψ are rotation angles to respective rotation directions relative to the state in which the listener is directed to the front, that is, to the positive direction of the x axis.


Specifically, the rotation angle of the head in a case where the head moves in the upward-downward direction about the y axis as the rotation axis while the listener faces the front is the angle θ that is an elevation angle. Further, the rotation angle of the head in a case where the head moves in a horizontal direction viewed from the listener about the z axis as the rotation axis while the listener is directed to the front is the angle ϕ that is a horizontal angle.


The matrix a(θ) is a rotation matrix that rotates the coordinates (coordinate system) by the angle θ about the y axis as the rotation axis, and the matrix u(ϕ) is a rotation matrix that rotates the coordinates (coordinate system) by the angle ϕ about the z axis as the rotation axis. Specifically, these matrices a(θ) and u(ϕ) are as expressed by the following expressions (31) and (32), respectively.














[Math. 31]

$$\left\{\, a(\theta)=\begin{pmatrix}\cos\theta & 0 & \sin\theta\\ 0 & 1 & 0\\ -\sin\theta & 0 & \cos\theta\end{pmatrix} \;\middle|\; \theta\in[0,2\pi] \right\} \tag{31}$$






[Math. 32]

$$\left\{\, u(\phi)=\begin{pmatrix}\cos\phi & -\sin\phi & 0\\ \sin\phi & \cos\phi & 0\\ 0 & 0 & 1\end{pmatrix} \;\middle|\; \phi\in[0,2\pi] \right\} \tag{32}$$







Accordingly, for example, the matrix a(θ) acts on an arbitrary position v=(vx, vy, vz)T in the coordinate system with the position of the head of the listener as the origin point, which makes it possible to give rotation about the y axis as the rotation axis to the position v. A position v2 after the rotation of the position v is expressed by the following expression (33).


Similarly, the matrix u(ϕ) acts on the position v, which makes it possible to give rotation about the z axis as the rotation axis to the position v. A position v3 after the rotation of the position v is expressed by the following expression (34).





[Math. 33]






$$v_2=a(\theta)\,v \tag{33}$$





[Math. 34]






$$v_3=u(\phi)\,v \tag{34}$$


Accordingly, the rotation matrix R′(g)=R′(u(ϕ)a(θ)u(ψ)) is a rotation matrix that, in the spherical harmonic domain, rotates the coordinate system by the angle ϕ in a horizontal angle direction, then rotates, by the angle θ in an elevation angle direction viewed from that coordinate system, the coordinate system rotated by the angle ϕ, and further rotates, by the angle ψ in the horizontal angle direction viewed from that coordinate system, the coordinate system rotated by the angle θ.
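The matrices of the expressions (31) and (32), their action on a position v as in the expressions (33) and (34), and the composition of the expression (30) may be sketched in Python as follows. This is only an illustration in ordinary three-dimensional coordinates, not in the spherical harmonic domain; the helper names matmul, rotate_point, and g are hypothetical.

```python
import math


def a(theta):
    """Rotation about the y axis, as in expression (31)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def u(phi):
    """Rotation about the z axis, as in expression (32)."""
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(p, q):
    """Product of two 3 x 3 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def rotate_point(mat, v):
    """Apply a 3 x 3 matrix to a position v, as in expressions (33) and (34)."""
    return [sum(mat[i][k] * v[k] for k in range(3)) for i in range(3)]


def g(phi, theta, psi):
    """g = u(phi) a(theta) u(psi), as in expression (30)."""
    return matmul(matmul(u(phi), a(theta)), u(psi))
```

For example, rotate_point(u(math.pi / 2), [1.0, 0.0, 0.0]) moves a position on the x axis onto the y axis, up to rounding error.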


In addition, R′(u(ϕ)), R′(a(θ)), and R′(u(ψ)) represent the rotation matrices R′(g) in a case where the coordinates are rotated by rotations by the matrix (u(ϕ)), the matrix (a(θ)), and the matrix (u(ψ)), respectively.


In other words, the rotation matrix R′(u(ϕ)) is a rotation matrix that rotates the coordinates by the angle ϕ in the horizontal angle direction in the spherical harmonic domain, and the rotation matrix R′(a(θ)) is a rotation matrix that rotates the coordinates by the angle θ in the elevation angle direction in the spherical harmonic domain. In addition, the rotation matrix R′(u(ψ)) is a rotation matrix that rotates the coordinates by the angle ψ in the horizontal angle direction in the spherical harmonic domain.


Thus, for example, as indicated by an arrow A51 in FIG. 7, it is possible to express the rotation matrix R′(g)=R′(u(ϕ)a(θ)u(ψ)), which rotates the coordinates three times by the angle ϕ, the angle θ, and the angle ψ as rotation angles, by the product of three rotation matrices R′(u(ϕ)), R′(a(θ)), and R′(u(ψ)).


In this case, it is sufficient if, as data for obtaining the rotation matrix R′(gj−1), the rotation matrix R′(u(ϕ)), the rotation matrix R′(a(θ)), and the rotation matrix R′(u(ψ)) for the respective values of the rotation angles ϕ, θ, and ψ are held in tables in the memory. In addition, in a case where the same head-related transfer function is optionally used for the left and the right, the row vector Hs(ω) is held for only one ear, and the matrix Rref described above for horizontal inversion is also held in advance, which makes it possible to obtain the rotation matrix for the other ear by determining the product of this and a generated rotation matrix.


In addition, in a case where the vector Pl(ω) is actually calculated, one rotation matrix R′(gj−1) is calculated by calculating the product of the respective rotation matrices read out from the tables. Then, as indicated by an arrow A52, the product of the row vector Hs(ω) of 1×K, the rotation matrix R′(gj−1) of K×K common to all the time frequency bins ω, and the vector D′(ω) of K×1 is calculated for each of the time frequency bins ω to determine the vector Pl(ω).


Herein, for example, in a case where the rotation matrix R′(gj−1) itself of each rotation angle is held in the table, it is necessary to hold 360³=46656000 rotation matrices R′(gj−1), where the accuracy of the angle ϕ, the angle θ, and the angle ψ of each rotation is one degree (1°).


In contrast, in a case where the rotation matrix R′(u(ϕ)), the rotation matrix R′(a(θ)), and the rotation matrix R′(u(ψ)) of each rotation angle are held in tables, it is necessary to hold only 360×3=1080 rotation matrices, where the accuracy of the angle ϕ, the angle θ, and the angle ψ of each rotation is one degree (1°).


Accordingly, in a case where the rotation matrix R′(gj−1) itself is held, it is necessary to hold data of the order of O(n³), where n is the number of steps into which each rotation angle is divided. In contrast, in a case where the rotation matrix R′(u(ϕ)), the rotation matrix R′(a(θ)), and the rotation matrix R′(u(ψ)) are held, only data of the order of O(n) is sufficient, which makes it possible to greatly reduce the memory amount.
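The two table sizes compared above may be reproduced by the following short sketch, assuming that each of the three rotation angles is divided into the same number of steps; the function name table_sizes is hypothetical.

```python
def table_sizes(steps_per_angle):
    """Number of stored rotation matrices for the two table layouts."""
    combined = steps_per_angle ** 3   # one R'(g) per (phi, theta, psi) triple
    per_axis = steps_per_angle * 3    # separate tables for R'(u(phi)), R'(a(theta)), R'(u(psi))
    return combined, per_axis
```

With an accuracy of one degree, table_sizes(360) returns (46656000, 1080).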


Moreover, as indicated by the arrow A51, the rotation matrix R′(u(ϕ)) and the rotation matrix R′(u(ψ)) are diagonal matrices; therefore, it is sufficient if only diagonal components are held.


In addition, the rotation matrix R′(u(ϕ)) and the rotation matrix R′(u(ψ)) are both rotation matrices for performing rotation in the horizontal angle direction, which makes it possible to obtain the rotation matrix R′(u(ϕ)) and the rotation matrix R′(u(ψ)) from the same common table. In other words, the table of the rotation matrix R′(u(ϕ)) and the table of the rotation matrix R′(u(ψ)) may be the same.
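A common table of diagonal components may be sketched as follows. It is assumed here, from the expression (24) with the elevation angle set to zero, that the diagonal component for the degree n and the order m is exp(−imϕ), and that the components are ordered by degree and then by order; both function names are hypothetical.

```python
import cmath


def horizontal_rotation_diagonal(max_degree, phi):
    """Diagonal components of R'(u(phi)), one value exp(-i m phi) per (n, m)."""
    return [
        cmath.exp(-1j * m * phi)
        for n in range(max_degree + 1)
        for m in range(-n, n + 1)
    ]


def build_table(max_degree, steps):
    """One shared table indexed by quantized angle, usable for both phi and psi."""
    return [
        horizontal_rotation_diagonal(max_degree, 2 * cmath.pi * i / steps)
        for i in range(steps)
    ]
```

Each table entry holds only K values instead of K×K, and the same entry may be read for either horizontal angle.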


It is to be noted that, in FIG. 7, hatched portions of the respective rotation matrices represent non-zero elements.


Further, for k and m in a case where (n²+n+1+k) and (n²+n+1+m) belong to the set Q expressed by the expression (22) described above, elements other than elements in rows (n²+n+1+k) and columns (n²+n+1+m) of the rotation matrix R′(a(θ)) are zero; therefore, it is sufficient if only the non-zero elements are held as the rotation matrix R′(a(θ)), which makes it possible to further reduce the memory amount.


From the above, it is possible to further reduce the memory amount necessary to hold data for obtaining the rotation matrix R′(gj−1).


Specifically, for example, in a case where Φ number of rotation matrices R′(u(ϕ)), Θ number of rotation matrices R′(a(θ)), and Ψ number of rotation matrices R′(u(ψ)) are held, the number M of rotation directions gj of the head becomes M=Φ×Θ×Ψ.


In the fourth technique, the rotation matrices R′(a(θ)) are held by accuracy of the angle θ, that is, the Θ number of rotation matrices R′(a(θ)) are held; therefore, the memory amount necessary to hold the rotation matrices R′(a(θ)) is memory(a)=Θ×(J+1)(2J+1)(2J+3)/3.


In addition, for the rotation matrix R′(u(ϕ)) and the rotation matrix R′(u(ψ)), it is possible to use a common table, and in a case where the accuracy of the angle ϕ and the accuracy of the angle ψ are the same, it is sufficient if rotation matrices are held only by the accuracy of the angle ϕ, that is, the Φ number of rotation matrices are held, and it is sufficient if only the diagonal components of these rotation matrices are held. Accordingly, assuming that a length of the vector D′(ω) is K, the memory amount necessary to hold the rotation matrices R′(u(ϕ)) and the rotation matrices R′(u(ψ)) is memory(b)=Φ×K.


Further, assuming that the number of time frequency bins ω is W, the memory amount necessary to hold the row vector Hs(ω) of 1×K for each of the time frequency bins ω for the left and right ears is 2×K×W.


Accordingly, as the sum of these memory amounts, the memory amount necessary in the fourth technique is the memory amount memory=memory(a)+memory(b)+2KW.
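The sum described above may be written, for example, as the following sketch. It is assumed that K=(J+1)² and that the tables for the angle ϕ and the angle ψ are shared; the function name fourth_technique_memory is hypothetical.

```python
def fourth_technique_memory(max_degree, num_theta, num_phi, num_bins):
    """memory = memory(a) + memory(b) + 2KW for the fourth technique."""
    J = max_degree
    K = (J + 1) ** 2                                                 # assumed vector length
    memory_a = num_theta * (J + 1) * (2 * J + 1) * (2 * J + 3) // 3  # blocks of R'(a(theta))
    memory_b = num_phi * K                                           # shared diagonal table
    return memory_a + memory_b + 2 * K * num_bins
```

For J=4, Θ=Φ=360, and W=512, the result is 94000 elements.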


Such a fourth technique makes it possible to greatly reduce the necessary memory amount with an operation amount substantially the same as that in the third technique. Specifically, the fourth technique exerts more effects, for example, in a case where the accuracy of the angle ϕ, the angle θ, and the angle ψ is set to one degree (1°) or the like to withstand practical use in realizing a head tracking function.


<About Proposed Technique 1>

Incidentally, in the fourth technique, it is possible to reduce, to 1080, the number of rotation matrices to be held, for example, by having rotation with respect to three axes at every one degree, that is, by setting the accuracy of the angle ϕ, the angle θ, and the angle ψ to one degree (1°).


However, in the fourth technique, in terms of the operation amount, a reduction only to the order of the cube of the maximum degree J of the degree n of the spherical harmonic function is possible.


The reason for this is that the rotation matrix R′(a(θ)) for tracking rotation of the head of the listener (user) is a block diagonal matrix as illustrated in FIG. 8, for example.


It is to be noted that, in FIG. 8, a horizontal axis represents components of a column of the rotation matrix R′(a(θ)), and a vertical axis represents components of a row of the rotation matrix R′(a(θ)). In addition, in FIG. 8, shades of gray at respective positions of the rotation matrix R′(a(θ)) indicate levels (dB) of elements corresponding to these positions of the rotation matrix R′(a(θ)).



FIG. 8 illustrates the rotation matrix R′(a(θ)) in a case where the rotation angle θ is one degree. In this example, in a case where attention is focused on elements having a value of −400 dB or more, for example, in the rotation matrix R′(a(θ)), a portion including elements having such a value is a block having a size of (2n+1)×(2n+1) for the degree n. For example, a square portion indicated by an arrow A71 is a portion of one block of a block diagonal matrix, and a width (thickness) W11 of the block is 2n+1. That is, in the square portion indicated by the arrow A71, (2n+1) elements are arranged in a row direction, and (2n+1) elements are also arranged in a column direction.


Using the rotation matrix R′(a(θ)) that is such a block diagonal matrix makes it possible to reduce the operation amount to some extent, but if it is possible to further reduce the operation amount, it is possible to obtain the drive signal more quickly and efficiently.


Accordingly, the present technology focuses on characteristics of the rotation matrix for minute rotation, and performing tracking of rotation of the head of the listener (user) by accumulation of the minute rotations makes it possible to reduce the operation amount to the square order of the degree J.


The technique of the present technology (hereinafter also referred to as proposed technique 1) is described in detail below.


Of rotation of three axes of the head of the listener, that is, the rotation matrix R′(u(ϕ)), the rotation matrix R′(a(θ)), and the rotation matrix R′(u(ψ)), only the rotation matrix R′(a(θ)) is a block diagonal matrix, and the other rotation matrices R′(u(ϕ)) and R′(u(ψ)) are fully diagonal matrices.


However, depending on how a rotation axis is selected, two or more rotation matrices may become block diagonal matrices in some cases. In an example of this specification, a rotation axis that causes two or more rotation matrices to become block diagonal matrices is not used, but the present technique is applicable to a case where two or more rotation matrices are block diagonal matrices.


It is assumed that the angle θ is 0 degrees in a case where the listener is directed to the direction of the front in the upward-downward direction (the vertical direction), that is, in the elevation angle direction.


The angle θ becomes one degree in a case where the listener moves his head from a state in which the angle θ is 0 degrees to an upward direction (to a positive direction of the z axis) by +1 degree, i.e., rotates his head about the y axis as the rotation axis to the positive direction of the z axis by +1 degree.


The rotation matrix R′(a(θ)) in such a case where the angle θ is one degree is as illustrated in FIG. 8 as described above.


In the example illustrated in FIG. 8, it can be seen that the rotation matrix R′(a(θ)) is a block diagonal matrix, and a portion of each block of the block diagonal matrix is a square including (2n+1) elements on one side for each degree n. At the same time, the rotation matrix R′(g) that is a synthesis of the rotation matrix R′(a(θ)), rotation matrix R′(u(ϕ)), which is a diagonal matrix, and the rotation matrix R′(u(ψ)), which is a diagonal matrix, is also a similar block diagonal matrix. Herein, the direction gj may be a discrete value or a continuous value; therefore, gj is hereinafter simply referred to as g.


Now, in a case where the head-related transfer function in the spherical harmonic domain is rotated for one block of the rotation matrix R′(g) that is a block diagonal matrix, that is, rotated by the angle of the direction g using the portion of the block of a certain degree n of the rotation matrix R′(g), the head-related transfer function H′nm(g−1) after the rotation becomes as expressed by the following expression (35).









[Math. 35]

$$H'^{\,m}_{n}(g^{-1})=\sum_{k=-n}^{n} H'^{\,k}_{n}\,R'^{(n)}_{k,m}(g) \tag{35}$$







In the expression (35), k represents an order before the rotation, and m represents an order after the rotation. In addition, H′nk represents elements of the degree n and the order k in the row vector Hs(ω).


It can be seen from such calculation of the expression (35) that all (2n+1) elements R′(n)k,m(g) are used to determine the element of the order m after one rotation.


However, in a case where the angle θ is minute, such as a case of the angle θ=one degree, most of the respective elements of the rotation matrix R′(a(θ)) that is a block diagonal matrix have a minute value. Accordingly, most of the elements R′(n)k,m(g) of the rotation matrix R′(g) have a minute value.


That is, for example, FIG. 9 illustrates the rotation matrix R′(a(θ)) in a case where the angle θ is one degree, which is the same as the rotation matrix R′(a(θ)) illustrated in FIG. 8.


That is, in FIG. 9, a horizontal axis represents components of a column of the rotation matrix R′(a(θ)), and a vertical axis represents components of a row of rotation matrix R′(a(θ)).


In addition, shades of gray at respective positions of the rotation matrix R′(a(θ)) indicate levels (dB) of elements corresponding to these positions of the rotation matrix R′(a(θ)).


However, in FIG. 8, a range of the level of each element of the rotation matrix R′(a(θ)) is from −400 dB to 0 dB, whereas, in FIG. 9, the range of the level of each element of the rotation matrix R′(a(θ)) is limited to a range from −100 dB to 0 dB.


As with an example illustrated in FIG. 9, in a case where an element having an effective value in the rotation matrix R′(a(θ)) is an element having a level of −100 dB to 0 dB, it can be seen that the element having the effective value exists only in the vicinity of diagonal components.


Further, it can be seen that the number of elements having the effective value in one focused row of the rotation matrix R′(a(θ)), that is, the number of elements having the effective value (hereinafter also referred to as effective element width) that are continuously disposed side by side in a lateral direction in FIG. 9 is almost the same in all degrees n.


Accordingly, even though the degree n increases, the number of elements having an effective value in each degree n grows only in proportion to the block width (2n+1), and the total number of such elements is only on the order of the square of J, which is the maximum value of the degree n.


Therefore, an element having a value within a range of a predetermined level, such as an element having a level of −100 dB to 0 dB of the rotation matrix R′(a(θ)), is set as an effective element, and only the effective element is used to perform an operation of rotating the head-related transfer function in the spherical harmonic domain, which makes it possible to reduce the operation amount. In other words, an element having a value within a range of a predetermined level of the rotation matrix R′(g) is set as an effective element, and only the effective element is used to perform the operation of rotating the head-related transfer function in the spherical harmonic domain, which makes it possible to reduce the operation amount. The effective element width of the rotation matrix R′(g) is the same as the effective element width of the rotation matrix R′(a(θ)).


For example, in a case where the effective element width is 2C+1, calculation of the expression (35) described above is as expressed by the following expression (36).









[Math. 36]

$$H'^{\,m}_{n}(g^{-1})\approx\sum_{k=\max(-n,\,m-C)}^{\min(n,\,m+C)} H'^{\,k}_{n}\,R'^{(n)}_{k,m}(g) \tag{36}$$







Note that, in the expression (36), min(a, b) represents a function that selects a smaller one of a and b. In the expression (36), max(a, b) represents a function that selects a larger one of a and b.


In the expression (35), (2n+1) elements R′(n)k,m(g) of the order k ranging from −n to n are used for each degree n, but only (2C+1) elements R′(n)k,m(g) of the order k ranging from m−C to m+C, where m is set as a center, are used in calculation of the expression (36), thereby achieving a reduction in the operation amount. It is to be noted that, in a case where m+C is larger than n and in a case where m−C is smaller than −n, the operation is performed for k up to n and k down to −n, respectively, so as not to exceed the matrix range. The operation in which the order k is limited is performed in such a manner, that is, the operation is performed only on elements in which the order k has a value within a range determined by C, which makes it possible to reduce the operation amount.
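The order-limited operation of the expression (36) for one degree n may be sketched as follows. This is a minimal sketch under the assumption that the coefficients and the block are stored with the order k (or m) at index k+n (or m+n); the function name rotate_degree_limited is hypothetical.

```python
def rotate_degree_limited(h_n, r_n, n, c):
    """Rotate the degree-n coefficients using only orders k with |k - m| <= c.

    h_n : list of 2n+1 coefficients H'_n^k, stored at index k + n
    r_n : (2n+1) x (2n+1) block R'^(n)_{k,m}(g), row index k + n, column index m + n
    """
    rotated = []
    for m in range(-n, n + 1):
        k_lo = max(-n, m - c)   # clamp so as not to exceed the matrix range
        k_hi = min(n, m + c)
        rotated.append(
            sum(h_n[k + n] * r_n[k + n][m + n] for k in range(k_lo, k_hi + 1))
        )
    return rotated
```

When c is at least 2n, every order is included and the result coincides with the full sum of the expression (35); a smaller c skips the elements far from the diagonal.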


In this case, the effective element width of 2C+1 is the same in all degrees n; therefore, it can be seen that the larger the degree J, the more advantageous the proposed technique 1 is in terms of the operation, as compared with the fourth technique described above.
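The difference in the number of product-sum operations may be checked with the following sketch, which counts the elements used by the full blocks and by the order-limited operation with a constant C. The function names are hypothetical, and only the rotation itself is counted.

```python
def full_block_ops(max_degree):
    """Elements used when every (2n+1) x (2n+1) block is used in full."""
    return sum((2 * n + 1) ** 2 for n in range(max_degree + 1))


def limited_ops(max_degree, c):
    """Elements used when the order k is limited to the range m - c .. m + c."""
    total = 0
    for n in range(max_degree + 1):
        for m in range(-n, n + 1):
            total += min(n, m + c) - max(-n, m - c) + 1
    return total
```

For J=4 and C=1, the full blocks use 165 elements whereas the order-limited operation uses 65; the latter is bounded by (2C+1)(J+1)², so the gap widens as J increases.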


It is to be noted that, in the expression (36), a constant C determined from the effective element width is applied to all degrees n. However, C determining the effective element width of 2C+1 is not limited to a constant, and a function C(n) of the degree n (where C(n)<n) may be used as C, or a function C(n, k) of the degree n and the order k may be used as C. Herein, it is sufficient if the function C(n) or the function C(n, k) is a natural number smaller than the degree n. In other words, it is sufficient if the operation is performed with the number of elements even slightly smaller than that in the operation using the elements of an entire block of the rotation matrix R′(a(θ)), which is a block diagonal matrix, that is, the rotation matrix R′(g).


In addition, the element used in the operation of the rotation matrix R′(a(θ)) may be an element itself of the rotation matrix R′(a(θ)) or may be an approximate value of the element of the rotation matrix R′(a(θ)).


That is, more generally, it is assumed that it is possible to express the rotation matrix R′(a(θ)) as R′(a(θ))=A1+A2+A3+ . . . by combining a certain plurality of matrices. In this case, for an approximate rotation matrix Rs′(a(θ)) represented by the sum of some extracted ones of matrices included in the rotation matrix R′(a(θ)), an operation may be performed using a smaller number of elements than (2n+1)×(2n+1) elements in each of n-th order blocks.


For example, it is possible to express an n-th order block diagonal matrix R′(n)(β) of the rotation matrix R′(a(θ)) by the following expression (37).














[Math. 37]

$$R'^{(n)}(\beta)=\exp S_n(\beta)=\frac{S_n(\beta)^{0}}{0!}+\frac{S_n(\beta)^{1}}{1!}+\frac{S_n(\beta)^{2}}{2!}+\frac{S_n(\beta)^{3}}{3!}+\cdots \tag{37}$$







Herein, a matrix Sn(β) in the expression (37) is expressed by the following expression (38). In a case where a thickness of the approximate rotation matrix Rs′(a(θ)) is desired to be C with use of this matrix Sn(β), it is sufficient if calculation is limited to calculation up to the C-th power in a polynomial of the matrix expressed by the expression (37).














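[Math. 38]

$$S_n(\beta)=\frac{\beta}{2}\begin{pmatrix}
0 & \sqrt{1\cdot(2n)} & 0 & \cdots & & 0\\
-\sqrt{1\cdot(2n)} & 0 & \sqrt{2(2n-1)} & & & \\
0 & -\sqrt{2(2n-1)} & 0 & \ddots & & \vdots\\
\vdots & & \ddots & \ddots & \sqrt{2(2n-1)} & 0\\
 & & & -\sqrt{2(2n-1)} & 0 & \sqrt{1\cdot(2n)}\\
0 & & \cdots & 0 & -\sqrt{1\cdot(2n)} & 0
\end{pmatrix} \tag{38}$$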







By doing so, in the rotation matrix Rs′(a(θ)) used as the rotation matrix R′(a(θ)), elements having a non-zero value are almost only diagonal components. Accordingly, a rotation operation that causes the head-related transfer function to be rotated using the non-zero elements of the rotation matrix R′(g) obtained with use of the rotation matrix Rs′(a(θ)), that is, a matrix operation of the rotation matrix R′(g) and the row vector Hs(ω), is performed, resulting in performing an operation in which the order of the rotation matrix R′(g) is limited, which makes it possible to reduce the operation amount.
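The truncation of the polynomial of the expression (37) may be sketched as follows. It is assumed here that the matrix Sn(β) is the antisymmetric tridiagonal matrix whose j-th off-diagonal entry is (β/2)√(j(2n+1−j)), which is one reading of the expression (38); the function names generator, matmul, and truncated_exp are hypothetical.

```python
import math


def generator(n, beta):
    """Assumed form of S_n(beta): antisymmetric, tridiagonal, size (2n+1) x (2n+1)."""
    size = 2 * n + 1
    s = [[0.0] * size for _ in range(size)]
    for j in range(1, size):
        w = 0.5 * beta * math.sqrt(j * (size - j))
        s[j - 1][j] = w
        s[j][j - 1] = -w
    return s


def matmul(p, q):
    """Product of two square matrices given as lists of rows."""
    size = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def truncated_exp(s, terms):
    """Sum of S^p / p! for p = 0..terms, a truncation of expression (37)."""
    size = len(s)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    power = [row[:] for row in result]
    for p in range(1, terms + 1):
        power = matmul(power, s)
        for i in range(size):
            for j in range(size):
                result[i][j] += power[i][j] / math.factorial(p)
    return result
```

Because Sn(β) is tridiagonal, the sum up to the C-th power has non-zero elements only within C positions of the diagonal, so the truncation order directly sets the thickness of the approximate rotation matrix.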


It is to be noted that, in this case, for example, the rotation matrix R′(u(ϕ)), the rotation matrix Rs′(a(θ)), and the rotation matrix R′(u(ψ)) are synthesized to form the rotation matrix R′(g), and a matrix operation in which the order is limited is performed.


In a case where tracking of the rotation of the head of the listener is performed by the proposed technique 1 as described above, it is assumed that, for example, the listener has rotated his head by 30 degrees in the upward direction, that is, in the elevation angle direction. That is, it is assumed that the elevation angle (the angle θ) indicating the direction of the head of the listener has become 30 degrees.


In this case, the rotation matrix R′(a(θ)) becomes as illustrated in FIG. 10. It is to be noted that, in FIG. 10, a horizontal axis represents components of a column of the rotation matrix R′(a(θ)), and a vertical axis represents components of a row of the rotation matrix R′(a(θ)). In addition, in FIG. 10, shades of gray at respective positions of the rotation matrix R′(a(θ)) indicate levels (dB) of elements corresponding to these positions of the rotation matrix R′(a(θ)).


In FIG. 10, similarly to the case in FIG. 9, the range of the level of each element of the rotation matrix R′(a(θ)) is limited to a range from −100 dB to 0 dB.


However, in an example illustrated in FIG. 10, the larger the degree n, the thicker (larger) the effective element width of a block for that degree n becomes. That is, even if components of −100 dB or less are truncated, the rotation matrix R′(a(θ)) becomes a block-diagonal matrix having a thick effective element width.


As described above, in the rotation matrix R′(a(θ)), in a case where the rotation angle θ is small, the effective element width is narrow, which makes it possible to reduce the operation amount as described with reference to FIG. 9, but the effective element width becomes thicker with an increase in the rotation angle θ, which reduces an effect of reducing the operation amount.


In addition, in this state, it is necessary to increase the constant C that determines the effective element width 2C+1 as the head of the listener rotates more in the elevation angle direction.
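The dependence of the effective element width on the rotation angle θ can be checked numerically. The following is merely a minimal sketch and not part of the described device. It assumes that the block of the rotation matrix R′(a(θ)) for the degree n has the same magnitude structure as the Wigner small-d matrix, and that the level of an element is 20·log10 of its absolute value. Both are assumptions made for illustration, and the names wigner_small_d and effective_half_width are hypothetical.

```python
import math


def wigner_small_d(n, beta):
    """Return the (2n+1) x (2n+1) Wigner small-d matrix for degree n, angle beta (rad)."""
    c, s = math.cos(beta / 2), math.sin(beta / 2)
    f = math.factorial
    size = 2 * n + 1
    d = [[0.0] * size for _ in range(size)]
    for mp in range(-n, n + 1):
        for m in range(-n, n + 1):
            pref = math.sqrt(f(n + mp) * f(n - mp) * f(n + m) * f(n - m))
            total = 0.0
            # Summation index restricted so that every factorial argument is >= 0.
            for k in range(max(0, m - mp), min(n + m, n - mp) + 1):
                den = f(n + m - k) * f(k) * f(mp - m + k) * f(n - mp - k)
                term = c ** (2 * n + m - mp - 2 * k) * s ** (mp - m + 2 * k) / den
                total += (-1) ** (mp - m + k) * term
            d[mp + n][m + n] = pref * total
    return d


def effective_half_width(d, floor_db=-100.0):
    """Largest distance from the diagonal of any element at or above floor_db."""
    threshold = 10 ** (floor_db / 20)  # assumed level convention: 20*log10|x|
    size = len(d)
    return max(abs(i - j) for i in range(size) for j in range(size)
               if abs(d[i][j]) >= threshold)


small = effective_half_width(wigner_small_d(10, math.radians(1.0)))
large = effective_half_width(wigner_small_d(10, math.radians(30.0)))
print("half width at 1 degree:", small, "at 30 degrees:", large)
```

Under these assumptions, the half width obtained for a rotation of 1 degree is smaller than that obtained for 30 degrees, which corresponds to a smaller constant C being sufficient for a minute rotation.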


In order to track rotation of the head up to a large rotation angle θ in the elevation angle direction while keeping the operation amount small, it is sufficient if accumulation of minute rotations is used.


That is, for example, the direction of the head of the listener (user) at a predetermined time is expressed by (ϕ, θ, ψ) using the Euler angles. Herein, the angle ϕ, the angle θ, and the angle ψ respectively correspond to the rotation angle ϕ, the rotation angle θ, and the rotation angle ψ described above. It is to be noted that, herein, the direction g that is the rotation direction of the head of the listener is represented by the Euler angles, but may be represented by another method such as a quaternion. In the following description, unless otherwise specified, the direction g is represented with use of the Euler angles.


Specifically, the angle ϕ and the angle ψ are horizontal angles viewed from the listener, and the angle θ is an elevation angle viewed from the listener. In addition, the angle θ at a time t is hereinafter referred to as the angle θt. Similarly, the angle ϕ and the angle ψ at the time t are hereinafter referred to as the angle ϕt and the angle ψt, respectively.


In a case where accumulation of the minute rotations is used, it is sufficient if the rotation matrix R′(gt) is updated by determining a difference Δgt = gt(gt-1)^−1 between the angle gt indicating the direction g at the time t and an angle gt-1 at a time (t−1) immediately before the time t, and rotating a previously obtained rotation matrix R′(gt-1) by the difference Δgt. That is, it is sufficient if the product of the previously obtained rotation matrix R′(gt-1) at the time (t−1) and the rotation matrix R′(Δgt) corresponding to the difference Δgt is defined as the rotation matrix R′(gt) at the time t.
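The relationship between the difference and the accumulated rotation can be illustrated with ordinary 3×3 rotation matrices standing in for the rotation matrices of the spherical harmonic domain. The following is merely a sketch under stated assumptions: the direction g is built from the Euler angles in a z-y-z order, which is an assumed convention, and the function names are hypothetical. It shows only that the difference, which is the current rotation multiplied by the inverse of the previous rotation, reproduces the current rotation when applied to the previous one.

```python
import math


def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]


def rot_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_y(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def head_rotation(phi, theta, psi):
    """Assumed z-y-z Euler composition: horizontal, elevation, horizontal."""
    return matmul(rot_z(phi), matmul(rot_y(theta), rot_z(psi)))


def delta_rotation(g_now, g_prev):
    """Difference g_now * g_prev^-1; the inverse of a rotation matrix is its transpose."""
    return matmul(g_now, transpose(g_prev))


g_prev = head_rotation(0.10, 0.20, 0.30)   # direction at the time (t-1), in radians
g_now = head_rotation(0.12, 0.23, 0.29)    # direction at the time t
delta = delta_rotation(g_now, g_prev)
g_rebuilt = matmul(delta, g_prev)          # accumulate the minute rotation
max_err = max(abs(g_rebuilt[i][j] - g_now[i][j])
              for i in range(3) for j in range(3))
print("reconstruction error:", max_err)
```

Because the two directions differ only slightly, the difference is close to the identity, which is the situation in which the effective element width of the corresponding rotation matrix remains narrow.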


This makes it possible to obtain the rotation matrix R′(gt) with a smaller operation amount with use of the rotation matrix R′(Δgt)=R′(u(Δϕt))R′(a(Δθt))R′(u(Δψt)). The rotation matrix R′(Δgt) is obtained by synthesizing a rotation matrix R′(a(Δθt)) in which the effective element width is narrow for a difference Δθt of the difference Δgt, a rotation matrix R′(u(Δϕt)) that is a diagonal matrix for a difference Δϕt of the difference Δgt, and a rotation matrix R′(u(Δψt)) that is a diagonal matrix for a difference Δψt of the difference Δgt. The difference Δgt is a minute rotation angle.


It is to be noted that the difference Δθt is a difference between the angle θt and the angle θt-1, that is, a difference Δθt=θt−θt-1. Similarly, the difference Δϕt is a difference between the angle ϕt and the angle ϕt-1, and the difference Δψt is a difference between the angle ψt and the angle ψt-1.


<Configuration Example of Audio Processor>

Herein, description is given of an audio processor to which the present technology described above is applied. FIG. 11 is a diagram illustrating a configuration example of an embodiment of the audio processor to which the present technology is applied.


An audio processor 11 illustrated in FIG. 11 is a signal processing device that is incorporated in, for example, headphones or the like, and receives the input signal D′nm(ω) of the spherical harmonic domain, which is an acoustic signal of sound to be reproduced, and outputs drive signals of two-channel sound of a time domain. It is to be noted that, although description is given of a case where the audio processor 11 is incorporated in the headphones, the audio processor 11 may be incorporated in any other device different from the headphones, or may be any other device different from the headphones or the like.


The audio processor 11 includes a head rotation sensor unit 21, a previous direction holding unit 22, a rotation matrix operation unit 23, a rotation operation unit 24, a rotation coefficient holding unit 25, a head-related transfer function holding unit 26, a head-related transfer function synthesis unit 27, and a time frequency inverse transformation unit 28.


The head rotation sensor unit 21 includes, for example, an acceleration sensor, an image sensor, and the like attached to the head of the listener (user) as necessary. The head rotation sensor unit 21 detects rotation (movement) of the head of the listener, and supplies a detection result to the rotation matrix operation unit 23.


It is to be noted that the listener herein refers to a user who wears headphones, that is, a user who listens to sound reproduced by headphones on the basis of drive signals of left and right headphones obtained by the time frequency inverse transformation unit 28.


In the head rotation sensor unit 21, the angle ϕt, the angle θt, and the angle ψt at the time t that is the current time are obtained as a result of detecting the rotation of the head of the listener, that is, a direction in which the head of the listener is directed. Hereinafter, information that includes the angle ϕt, the angle θt, and the angle ψt and indicates the direction (rotation) of the head of the listener is also referred to as head rotation information. The direction at a certain time t indicated by the head rotation information is the angle gt corresponding to the direction g described above, and is angle information that indicates the direction of the head with reference to the x-axis direction, for example.


The previous direction holding unit 22 holds the angle at each time supplied from the rotation matrix operation unit 23 as previous direction information, and supplies the held previous direction information to the rotation matrix operation unit 23 at the subsequent time. Accordingly, for example, in a case where the head rotation information at the time t is supplied from the head rotation sensor unit 21 to the rotation matrix operation unit 23, the angle gt-1 at the time (t−1) is supplied as the previous direction information from the previous direction holding unit 22 to the rotation matrix operation unit 23.


The rotation matrix operation unit 23 holds a table indicating the rotation matrix R′(u(ϕ)) at each angle ϕ and a table indicating the rotation matrix R′(a(θ)) at each angle θ. It is to be noted that the table indicating the rotation matrix R′(u(ϕ)) is also used to determine the rotation matrix R′(u(ψ)). That is, a common table is used for the rotation matrix R′(u(ϕ)) and the rotation matrix R′(u(ψ)).
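A common table for the horizontal angles can be sketched as follows. This is merely an illustrative sketch: it assumes that the diagonal element of the rotation matrix R′(u(ϕ)) for the order m has the form exp(−i·m·ϕ), that the table is prepared at a fixed angular step, and that an angle is rounded to the nearest table entry. None of these details are specified above, and the names build_horizontal_table and lookup_horizontal are hypothetical.

```python
import cmath
import math


def build_horizontal_table(max_degree, step_deg=1.0):
    """Precompute the diagonal elements for every table angle in [0, 360)."""
    steps = int(round(360.0 / step_deg))
    table = {}
    for idx in range(steps):
        angle = math.radians(idx * step_deg)
        # One diagonal element per (degree n, order m) pair; assumed form exp(-i*m*angle).
        table[idx] = [cmath.exp(-1j * m * angle)
                      for n in range(max_degree + 1)
                      for m in range(-n, n + 1)]
    return table


def lookup_horizontal(table, angle_deg, step_deg=1.0):
    """Return the stored diagonal for the table angle nearest to angle_deg."""
    idx = int(round(angle_deg / step_deg)) % len(table)
    return table[idx]


table = build_horizontal_table(max_degree=2)
diag_phi = lookup_horizontal(table, 3.2)    # used for the angle phi
diag_psi = lookup_horizontal(table, -1.0)   # the same table serves the angle psi
```

Because only the diagonal elements are stored, the table holds one value per pair of the degree n and the order m for each table angle, and the same entries serve both the angle ϕ and the angle ψ.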


The rotation matrix operation unit 23 determines and outputs the rotation matrix R′(u(Δϕt)), the rotation matrix R′(a(Δθt)), and the rotation matrix R′(u(Δψt)) on the basis of the held tables, the head rotation information supplied from the head rotation sensor unit 21, and the previous direction information supplied from the previous direction holding unit 22. The rotation matrix operation unit 23 supplies the rotation matrix R′(u(Δϕt)), the rotation matrix R′(a(Δθt)), and the rotation matrix R′(u(Δψt)) to the rotation operation unit 24.


The rotation matrix R′(Δgt) that is a synthesis of the rotation matrix R′(u(Δϕt)), the rotation matrix R′(a(Δθt)), and the rotation matrix R′(u(Δψt)) is a rotation matrix for performing rotation by an angle of a difference (the difference Δgt) between rotation gt of the head of the listener at the time t and rotation gt-1 of the head of the listener at the time (t−1).


It is to be noted that, for the rotation matrix R′(u(Δϕt)), the rotation matrix R′(a(Δθt)), and the rotation matrix R′(u(Δψt)), the rotation matrix operation unit 23 may determine, without using the tables, the rotation matrix R′(u(Δϕt)), the rotation matrix R′(a(Δθt)), and the rotation matrix R′(u(Δψt)) by an operation on the basis of the difference Δϕt, the difference Δθt, and the difference Δψt. In addition, the table of rotation matrix R′(a(Δθt)) may indicate the rotation matrix Rs′(a(Δθt)) that is an approximation of the rotation matrix R′(a(Δθt)), or the rotation matrix Rs′(a(Δθt)) may be determined not from the tables but by an operation.


In addition, the rotation matrix operation unit 23 supplies head rotation information gt supplied from the head rotation sensor unit 21 as previous direction information to the previous direction holding unit 22 and causes the previous direction holding unit 22 to hold the head rotation information gt.


The rotation operation unit 24 calculates the row vector H′(gt−1, ω) and supplies the row vector H′(gt−1, ω) to the rotation coefficient holding unit 25 and the head-related transfer function synthesis unit 27.


Herein, the row vector H′(gt−1, ω) is a row vector obtained by performing a rotation operation that causes the head-related transfer function in the spherical harmonic domain, that is, the row vector Hs(ω) to be rotated by the angle gt on the basis of the rotation matrix R′(gt) at the time t.


Actually, the rotation operation unit 24 calculates the row vector H′(gt−1, ω) at the time t on the basis of the rotation matrix R′(Δgt) supplied from the rotation matrix operation unit 23 and a row vector H′(gt-1−1, ω) at the time (t−1) supplied from the rotation coefficient holding unit 25.


Such an operation is a rotation operation in which an operation result of a rotation operation at the time (t−1) is further rotated by an angle indicated by the difference Δgt. The operation result is the rotated head-related transfer function obtained by a rotation operation that causes the row vector Hs(ω) to be rotated by the angle gt-1.


Further, the rotation operation on the basis of the rotation matrix R′(Δgt) is a matrix operation in which only elements having the order k within a range determined by the predetermined value C in the rotation matrix R′(Δgt) are calculated, that is, an operation limited by the order k is performed. Accordingly, it can be said that the rotation matrix R′(Δgt) is a rotation matrix in which only elements having the order k within the range determined by the predetermined value C are elements having a non-zero effective value, that is, a rotation matrix limited by the order k.


It is to be noted that the rotation operation unit 24 calculates the row vector H′(gt−1, ω) on the basis of the row vector Hs(ω) of the head-related transfer function supplied from the head-related transfer function holding unit 26 and the rotation matrix R′(Δgt) supplied from the rotation matrix operation unit 23 at the start of processing, that is, in the absence of the row vector H′(gt-1−1, ω). In this case, the angle gt-1 is 0 degrees; therefore, the rotation matrix R′(Δgt) is equivalent to the rotation matrix R′(gt).


The rotation coefficient holding unit 25 holds the row vector H′(gt−1, ω) at the time t supplied from the rotation operation unit 24, and supplies the held row vector H′(gt−1, ω) to the rotation operation unit 24 at a subsequent time (t+1).


The head-related transfer function holding unit 26 holds the predetermined row vector Hs(ω) or the row vector Hs(ω) supplied from outside, and supplies the held row vector Hs(ω) to the rotation operation unit 24. It is to be noted that the row vector Hs(ω) may be prepared for each listener (user), or the common row vector Hs(ω) may be prepared for all listeners or a plurality of listeners included in one group.


Herein, the row vector H′(g−1, ω) is a matrix obtained by rotating the row vector Hs(ω) including the head-related transfer function in the spherical harmonic domain by the rotation matrix W(g−1), that is, a matrix including the head-related transfer function after rotation. In other words, the row vector H′(g−1, ω) is a matrix (vector) including, as an element, a head-related transfer function rotated by angles determined by the direction of the head of the listener in the spherical harmonic domain, that is, by the angle ϕ in the horizontal direction, the angle θ in the elevation angle direction, and the angle ψ in the horizontal direction.


It is to be noted that, herein, description has been given of an example in which the head-related transfer function is rotated in all directions of the angle θ, the angle ϕ, and the angle ψ by a difference between rotation at the time t and rotation at the time (t−1) with use of the row vector H′(gt-1−1, ω) that is an operation result at the time (t−1). However, this is not limitative, and a result of the rotation operation of the head-related transfer function at the time (t−1) may be further rotated in a direction (a rotation direction) of at least one of the angle θ, the angle ϕ, or the angle ψ by a difference between the angle at the time t and the angle at the time (t−1).


The head-related transfer function synthesis unit 27 synthesizes the input signal D′nm(ω) for each of the time frequency bins ω supplied from outside and the row vector H′(gt−1, ω) supplied from the rotation operation unit 24 to generate the drive signals of the left and right headphones. The input signal D′nm(ω) is a sound signal of the spherical harmonic domain.


That is, the head-related transfer function synthesis unit 27 calculates the drive signal Pl(g, ω) and the drive signal Pr(g, ω) of the left and right headphones by determining the product of the row vector H′(gt−1, ω) and the matrix D′(ω) including the input signal D′nm(ω) for each of the left and right headphones, and supplies the drive signal Pl(g, ω) and the drive signal Pr(g, ω) to the time frequency inverse transformation unit 28. The input signal D′nm(ω) is the sound signal of the spherical harmonic domain.
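The synthesis described above reduces, for one ear and one time frequency bin ω, to the product of a row vector and a column vector. The following is merely a minimal sketch; the function name synthesize_drive_signal and the ordering of the coefficients in the lists are assumptions made for illustration.

```python
def synthesize_drive_signal(h_rotated, d_input):
    """Product of the rotated row vector and the input column vector for one bin.

    h_rotated: rotated head-related transfer function coefficients for one ear.
    d_input:   input signal coefficients of the spherical harmonic domain,
               listed in the same (degree, order) ordering as h_rotated.
    """
    if len(h_rotated) != len(d_input):
        raise ValueError("coefficient lists must have the same length")
    return sum(h * d for h, d in zip(h_rotated, d_input))


# Two coefficients only, to keep the arithmetic easy to follow.
h_left = [1 + 1j, 2.0]
d_in = [1j, 0.5]
p_left = synthesize_drive_signal(h_left, d_in)
```

The same function is applied once with the row vector for the left headphone and once with that for the right headphone, and is repeated for each time frequency bin ω.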


Herein, the drive signal Pl(g, ω) is a drive signal (binaural signal) of the left headphone in the time frequency domain and the drive signal Pr(g, ω) is a drive signal (binaural signal) of the right headphone in the time frequency domain.


In the head-related transfer function synthesis unit 27, synthesis of the head-related transfer function on the input signal and spherical harmonic inverse transformation on the input signal are performed simultaneously.


The time frequency inverse transformation unit 28 performs time frequency inverse transformation on the drive signals in the time frequency domain supplied from the head-related transfer function synthesis unit 27 for the respective left and right headphones to determine the drive signal pl(g, t) of the left headphone in the time domain and the drive signal pr(g, t) of the right headphone in the time domain, and outputs these drive signals to a subsequent stage. In a reproduction device that reproduces sound by two channels or a plurality of channels, such as headphones in the subsequent stage, more specifically, headphones including earphones and speakers using transaural technology, sound is reproduced on the basis of the drive signals outputted from the time frequency inverse transformation unit 28. It is to be noted that, in a case where a signal to be inputted is not subjected to time frequency transformation, a time frequency transformation unit is provided in a signal input portion, that is, in a previous stage of the head-related transfer function synthesis unit 27, for example, or a convolution operation in the time domain is performed in the head-related transfer function synthesis unit 27.


Herein, processing in the respective components of the audio processor 11 is described in detail.


For example, the rotation matrix operation unit 23 determines, from the head rotation information at the time t, the difference Δgt = gt(gt-1)^−1 between the angle gt at the time t and the angle gt-1 at the time (t−1). Then, the rotation matrix operation unit 23 determines the difference Δθt, the difference Δϕt, and the difference Δψt from the difference Δgt, reads, from the tables of the held rotation matrix R′(a(θ)) and the held rotation matrix R′(u(ϕ)), the rotation matrix R′(a(θ)) in a case where the angle θ is the difference Δθt, and the rotation matrix R′(u(ϕ)) in a case where the angle ϕ is the difference Δϕt and the difference Δψt, and sets the rotation matrices as the rotation matrix R′(a(Δθt)), the rotation matrix R′(u(Δϕt)), and the rotation matrix R′(u(Δψt)).


Further, the rotation matrix operation unit 23 performs an operation similar to the expression (29) described above, and synthesizes the thus-obtained rotation matrix R′(a(Δθt)), the thus-obtained rotation matrix R′(u(Δϕt)), and the thus-obtained rotation matrix R′(u(Δψt)) to obtain the rotation matrix R′(Δgt).
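Because the two horizontal rotation matrices are diagonal, the synthesis of the three matrices for one degree n does not require general matrix products. The following is merely a sketch under the assumption that each diagonal matrix is passed as a list of its diagonal elements; the names compose_delta_block, matmul, and diag are hypothetical, and the full products are included only to check the shortcut.

```python
def compose_delta_block(u_phi, a_block, u_psi):
    """Element (k, m) of diag(u_phi) * a_block * diag(u_psi) for one degree block."""
    size = len(a_block)
    return [[u_phi[k] * a_block[k][m] * u_psi[m] for m in range(size)]
            for k in range(size)]


def matmul(a, b):
    """General matrix product, used here only as a reference."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def diag(values):
    """Square matrix with the given values on its diagonal."""
    return [[values[i] if i == j else 0 for j in range(len(values))]
            for i in range(len(values))]


u_phi = [2, 3]
a_block = [[1, 2], [3, 4]]
u_psi = [5, 7]
shortcut = compose_delta_block(u_phi, a_block, u_psi)
reference = matmul(matmul(diag(u_phi), a_block), diag(u_psi))
```

Scaling the rows and the columns in this manner leaves the positions of the zero elements unchanged, so the effective element width of the synthesized block is that of the elevation block.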


For example, in a case where the difference Δθt is determined for each frame of the input signal D′nm(ω), the difference Δθt is as illustrated in FIG. 12. It is to be noted that, in FIG. 12, a vertical axis represents the angle θ (elevation angle θ) at each time, and a horizontal axis represents time.


In an example illustrated in FIG. 12, a curve L11 represents the angles θ at respective times, and an enlarged portion of a region RZ11 in the curve L11 is as illustrated in a portion on a lower side of the diagram.


Herein, a period from the time t−1 to the time t is a period of one frame. Accordingly, a difference between the angle θt and the angle θt-1 is Δθt. The angle θt is the angle θ at the time t, and the angle θt-1 is the angle θ at the time t−1.


In the rotation matrix operation unit 23, the rotation matrix R′(Δgt) obtained on the basis of the difference Δgt is supplied to the rotation operation unit 24, and the angle gt at the time t is supplied to the previous direction holding unit 22 to update the previous direction information. That is, the newly supplied angle gt at the time t is held as the updated previous direction information.


In the rotation operation unit 24, the row vector H′(gt−1, ω) at the time t is calculated on the basis of the rotation matrix R′(Δgt) and the row vector H′(gt-1−1, ω) at the time (t−1).


For example, the following expression (39) is established for an arbitrary rotation g1 and an arbitrary rotation g2.





[Math. 39]

$$R'(g_1 g_2) = R'(g_1)\,R'(g_2) \qquad (39)$$


It can be seen from this that the following expression (40) is established, and the row vector H′(gt−1, ω) is determined by determining the product of the row vector H′(gt-1−1, ω) and the rotation matrix R′(Δgt).





[Math. 40]

$$H'(g_t^{-1},\omega) = H'(g_{t-1}^{-1},\omega)\,R'(\Delta g_t) \qquad (40)$$


That is, it is assumed that the element of the degree n and the order m in the row vector H′(gt−1, ω) is denoted by H′nm(gt−1, ω), the element in the row of the order k and the column of the order m of the block for the degree n in the rotation matrix R′(Δgt) is denoted by R′(n)k,m(Δgt), and the constant that determines the effective element width for the degree n of the rotation matrix R′(Δgt) is denoted by C. In this case, the following expression (41) is established. That is, it is possible to determine respective non-zero elements of the row vector H′(gt−1, ω) by an operation of the following expression (41).














[Math. 41]

$$H'_{nm}(g_t^{-1},\omega) = \sum_{k=-n}^{n} H'_{nk}(g_{t-1}^{-1},\omega)\,{R'}^{(n)}_{k,m}(\Delta g_t) \approx \sum_{k=\max(-n,\,m-C)}^{\min(n,\,m+C)} H'_{nk}(g_{t-1}^{-1},\omega)\,{R'}^{(n)}_{k,m}(\Delta g_t) \qquad (41)$$







The rotation operation unit 24 obtains the row vector H′(gt−1, ω) by calculating the expression (41). In the calculation of the expression (41), only (2C+1) elements having the order k ranging from m−C to m+C, where m is set as a center, are calculated similarly to the expression (36) described above. Note that the order k is limited to a range of −n≤k≤n. That is, the operation is performed only on elements in which the order k has a value within a range determined by C, which is the operation in which the order k is limited, and the operation amount is reduced.
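The order-limited summation can be sketched for a single block of the degree n as follows. This is merely an illustrative sketch: the function name rotate_block_banded and the storage of the order k at the list position k+n are assumptions, and the band matrix used in the example is an arbitrary test pattern rather than an actual rotation matrix.

```python
def rotate_block_banded(h_prev, r_block, n, c):
    """Row vector times one degree-n block, using only orders k with |k - m| <= c.

    h_prev:  2n+1 coefficients of the previous row vector for the degree n.
    r_block: (2n+1) x (2n+1) block, r_block[k + n][m + n] for orders k and m.
    """
    rotated = []
    for m in range(-n, n + 1):
        k_lo = max(-n, m - c)   # the order k never falls below -n
        k_hi = min(n, m + c)    # and never exceeds n
        rotated.append(sum(h_prev[k + n] * r_block[k + n][m + n]
                           for k in range(k_lo, k_hi + 1)))
    return rotated


n = 2
h_prev = [1.0, 2.0, 3.0, 4.0, 5.0]
# Arbitrary test block whose non-zero elements lie within one place of the diagonal.
band = [[1.0 if i == j else 0.5 if abs(i - j) == 1 else 0.0 for j in range(5)]
        for i in range(5)]
exact = rotate_block_banded(h_prev, band, n, c=2 * n)   # every order k is used
limited = rotate_block_banded(h_prev, band, n, c=1)     # effective element width 3
```

In this example the limited operation gives the same result as the full operation because every element outside the band is zero. For an actual rotation matrix the elements outside the band are small but not zero, so the limited result is an approximation.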


It is to be noted that, in the rotation matrix operation unit 23, the rotation matrix R′(a(Δθt)) may be sequentially determined by calculation, or the rotation matrix R′(a(Δθt)) may be selected from one or a plurality of candidates prepared in advance.


Further, a method of calculating the rotation matrix R′(a(Δθt)) each time and a method of selecting the rotation matrix R′(a(Δθt)) from one or a plurality of candidates may be combined, and an angle by which the head-related transfer function is rotated by tracking the actual angle θt of rotation of the head of the listener may be adjusted while changing frequency of using these respective methods.


<Description of Drive Signal Generation Processing>


Next, description is given of drive signal generation processing performed by the audio processor 11 with reference to a flow chart of FIG. 13.


In step S11, the head rotation sensor unit 21 detects rotation of the head of the user who is the listener, and supplies head rotation information obtained as a result of the detection to the rotation matrix operation unit 23.


In step S12, the rotation matrix operation unit 23 determines the difference Δgt between the angle gt of the head rotation information supplied from the head rotation sensor unit 21 and the angle gt-1 at the time (t−1) held as the previous direction information in the previous direction holding unit 22.


In addition, upon obtaining the difference Δgt, the rotation matrix operation unit 23 supplies the angle gt of the head rotation information obtained in the step S11 to the previous direction holding unit 22 to update the previous direction information. The previous direction holding unit 22 updates the previous direction information to cause the angle gt supplied from the rotation matrix operation unit 23 to become new previous direction information, and holds a thus-updated result.
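The determination of the difference followed by the update of the previous direction information can be sketched as follows. This is merely a sketch assuming that the direction is held as a tuple of the three Euler angles, that the difference is taken for each angle as described for the difference Δθt, and that the held direction is 0 degrees before the first frame. The class name PreviousDirectionHolder is hypothetical.

```python
class PreviousDirectionHolder:
    """Holds the direction (phi, theta, psi) of the previous time."""

    def __init__(self):
        # Before the first frame the previous direction is taken as 0 degrees.
        self.angles = (0.0, 0.0, 0.0)

    def difference_and_update(self, angles_now):
        """Return (d_phi, d_theta, d_psi), then hold angles_now for the next frame."""
        delta = tuple(now - prev for now, prev in zip(angles_now, self.angles))
        self.angles = tuple(angles_now)
        return delta


holder = PreviousDirectionHolder()
first = holder.difference_and_update((10.0, 5.0, 0.0))    # frame at the start
second = holder.difference_and_update((12.0, 4.0, 1.0))   # next frame
```

At the start of processing the difference equals the direction itself, and for every later frame only the change since the previous frame is returned.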


In step S13, on the basis of the difference Δgt obtained in the step S12, the rotation matrix operation unit 23 determines the rotation matrix R′(a(Δθt)) in the elevation angle direction corresponding to the difference Δθt of the difference Δgt. It is to be noted that, in the step S13, the rotation matrix operation unit 23 may determine, as the rotation matrix R′(a(Δθt)), the rotation matrix Rs′(a(Δθt)) corresponding to the difference Δθt. The rotation matrix Rs′(a(Δθt)) corresponds to the rotation matrix Rs′(a(θ)).


In step S14, on the basis of the difference Δϕt and the difference Δψt in rotation of the head determined from the difference Δgt that is obtained in the step S12, the rotation matrix operation unit 23 determines the rotation matrix R′(u(Δϕt)) and the rotation matrix R′(u(Δψt)) in the horizontal direction corresponding to the differences Δϕt and Δψt.


In step S15, the rotation matrix operation unit 23 synthesizes the rotation matrix R′(a(Δθt)) in the elevation angle direction obtained in the step S13 and the rotation matrix R′(u(Δϕt)) and the rotation matrix R′(u(Δψt)) in the horizontal direction obtained in the step S14 to determine the rotation matrix R′(Δgt) for performing rotation by a difference in rotation of the entire head, and supplies the rotation matrix R′(Δgt) to the rotation operation unit 24.


In step S16, the rotation operation unit 24 performs a rotation operation on the basis of the rotation matrix R′(Δgt) supplied from the rotation matrix operation unit 23 and the row vector H′(gt-1−1, ω) held in the rotation coefficient holding unit 25.


That is, for example, in the step S16, the expression (41) described above is calculated as a rotation operation on the basis of the effective element width 2C+1 determined by the constant C to calculate the row vector H′(gt−1, ω).


The rotation operation unit 24 supplies the obtained row vector H′(gt−1, ω) to the rotation coefficient holding unit 25, and causes the rotation coefficient holding unit 25 to hold the row vector H′(gt−1, ω), and also supplies the row vector H′(gt−1, ω) to the head-related transfer function synthesis unit 27.


In step S17, the head-related transfer function synthesis unit 27 synthesizes the supplied input signal D′nm(ω) and the row vector H′(gt−1, ω) of the head-related transfer function supplied from the rotation operation unit 24 to generate drive signals of the left and right headphones.


For example, in the step S17, the product of the row vector H′(gt−1, ω) and the matrix D′(ω) is determined for each of the left and right headphones to calculate the drive signal Pl(g, ω) and the drive signal Pr(g, ω) of the left and right headphones. The head-related transfer function synthesis unit 27 supplies the obtained drive signal Pl(g, ω) and the obtained drive signal Pr(g, ω) to the time frequency inverse transformation unit 28.


In step S18, the time frequency inverse transformation unit 28 performs time frequency inverse transformation on the drive signal Pl(g, ω) and the drive signal Pr(g, ω) supplied from the head-related transfer function synthesis unit 27, and outputs, to a subsequent stage, the drive signal pl(g, t) and the drive signal pr(g, t) that are obtained as results of the time frequency inverse transformation, and the drive signal generation processing ends.


As described above, the audio processor 11 determines the rotation matrix R′(Δgt) on the basis of the difference Δgt, and determines the current row vector H′(gt−1, ω) on the basis of the rotation matrix R′(Δgt) and the previous row vector H′(gt-1−1, ω).


Thus, rotations by the difference Δgt, which is a minute rotation angle, are accumulated to determine the row vector H′(gt−1, ω), which makes it possible to reduce the memory amount and the operation amount that are to be used. This makes it possible to reproduce sound more efficiently. Specifically, according to the proposed technique 1 described above, it is possible to obtain the drive signals with a memory amount substantially equal to that in the fourth technique and with a smaller operation amount than that in the fourth technique.


Second Embodiment
<Configuration Example of Audio Processor>

Incidentally, in the proposed technique 1 described above, only the elements in the block having the effective element width 2C+1 determined by the constant C, that is, only the effective elements are used to perform the operation, resulting in non-negligible errors in the rotation matrix R′(gt), that is, in the row vector H′(gt−1, ω).


In addition, in a case where the operation causing such an error is repeatedly performed for a while, the errors are accumulated, thereby causing the row vector H′(gt−1, ω) to become a value different from an original value. That is, the error in the row vector H′(gt−1, ω) becomes large.


Accordingly, accumulation of errors may be prevented by performing an operation of determining an accurate rotation matrix R′(gt−1) at a predetermined timing and resetting the value of the rotation matrix R′(gt−1), that is, the row vector H′(gt−1, ω) (hereinafter, simply referred to as resetting). Hereinafter, a technique of performing resetting at a predetermined timing in the proposed technique 1 is also referred to as proposed technique 2.


In the proposed technique 2, an operation whose operation amount is on the order of the cube of the degree n is necessary to determine the row vector H′(gt−1, ω) at the time of resetting, but performing the resetting less frequently makes it possible to reduce the operation amount as a whole.
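The difference in the operation amount can be illustrated by counting multiplications. The following is merely a rough sketch that counts one multiplication for each matrix element used when a row vector is multiplied by a block-diagonal rotation matrix up to a maximum degree. It ignores the cost of obtaining the rotation matrix itself, and the function names are hypothetical.

```python
def count_full(max_degree):
    """Multiplications when every element of every degree block is used."""
    return sum((2 * n + 1) ** 2 for n in range(max_degree + 1))


def count_banded(max_degree, c):
    """Multiplications when only orders k with |k - m| <= c are used."""
    total = 0
    for n in range(max_degree + 1):
        for m in range(-n, n + 1):
            total += min(n, m + c) - max(-n, m - c) + 1
    return total


for max_degree in (4, 8, 16):
    print(max_degree, count_full(max_degree), count_banded(max_degree, c=2))
```

The full count is equal to (N+1)(2N+1)(2N+3)/3 for a maximum degree N and therefore grows with the cube of N, whereas the count for a fixed constant C grows roughly with the square of N.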


In a case where the resetting is performed appropriately in such a manner, the audio processor 11 is configured as illustrated in FIG. 14. It is to be noted that, in FIG. 14, components corresponding to those in FIG. 11 are denoted by the same reference numerals, and description thereof is omitted as appropriate.


The audio processor 11 illustrated in FIG. 14 includes the head rotation sensor unit 21, the previous direction holding unit 22, the rotation matrix operation unit 23, the rotation operation unit 24, the rotation coefficient holding unit 25, the head-related transfer function holding unit 26, the head-related transfer function synthesis unit 27, and the time frequency inverse transformation unit 28.


The audio processor 11 illustrated in FIG. 14 is the same as the audio processor 11 in FIG. 11 in including components from the head rotation sensor unit 21 to the time frequency inverse transformation unit 28, but differs from the audio processor 11 in FIG. 11 in that a reset trigger that is a signal indicating a timing of the resetting is supplied to the rotation matrix operation unit 23 and the rotation operation unit 24.


In a case where the reset trigger is not supplied, that is, in a case where the reset trigger is off, the rotation matrix operation unit 23 determines the rotation matrix R′(Δgt) on the basis of the angle gt of the head rotation information and the angle gt-1 as the previous direction information, and supplies the rotation matrix R′(Δgt) to the rotation operation unit 24.


In contrast, in a case where the reset trigger is supplied, that is, in a case where the reset trigger is on, the rotation matrix operation unit 23 determines the rotation matrix R′(gt) on the basis of the angle gt of the head rotation information and supplies the rotation matrix R′(gt) to the rotation operation unit 24. That is, the resetting is performed to determine the accurate rotation matrix R′(gt). In other words, a rotation matrix determined by a difference such as the rotation matrix R′(Δgt) is not determined, but the absolute rotation matrix R′(gt) is determined.


In addition, in the case where the reset trigger is off, the rotation operation unit 24 calculates the row vector H′(gt−1, ω) on the basis of the rotation matrix R′(Δgt) supplied from the rotation matrix operation unit 23 and the row vector H′(gt-1−1, ω) held in the rotation coefficient holding unit 25.


In contrast, in the case where the reset trigger is on, the rotation operation unit 24 calculates the row vector H′(gt−1, ω) on the basis of the rotation matrix R′(gt) supplied from the rotation matrix operation unit 23 and the row vector Hs(ω) of the head-related transfer function held in the head-related transfer function holding unit 26.


In this case, in the rotation operation unit 24, the row vector H′(gt−1, ω) is calculated by performing calculation similar to that in the expression (35) or the expression (36) described above. That is, the product of the rotation matrix R′(gt) and the row vector Hs(ω) is determined to calculate the row vector H′(gt−1, ω).


In such a manner, the resetting is performed in response to input of the reset trigger to determine the accurate rotation matrix R′(gt) and the row vector H′(gt−1, ω), which makes it possible to obtain a drive signal having a small error while keeping the necessary memory amount and the necessary operation amount low.
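The switching between the operation using the difference and the operation at the time of resetting can be sketched as follows. This is merely an illustration under strong simplifications: a 2×2 plane rotation stands in for the rotation matrix of the spherical harmonic domain, the direction is a single angle, and no limitation by the order is applied, so the two paths agree up to rounding. All names are hypothetical.

```python
import math


def rot2(angle):
    """2x2 plane rotation used as a stand-in for the rotation matrix."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s], [s, c]]


def row_times_matrix(row, mat):
    return [sum(row[k] * mat[k][m] for k in range(len(row)))
            for m in range(len(mat[0]))]


def update_hrtf(h_prev, h_base, angle_prev, angle_now, reset):
    """Return the rotated row vector for the current frame."""
    if reset or h_prev is None:
        # Reset path (also used at the start): rotate the unrotated row vector
        # by the absolute rotation.
        return row_times_matrix(h_base, rot2(angle_now))
    # Normal path: rotate the previous result by the difference only.
    return row_times_matrix(h_prev, rot2(angle_now - angle_prev))


h_base = [1.0, 0.5]
h, prev = None, 0.0
for step in range(1, 31):                  # thirty frames of 1 degree each
    now = math.radians(step)
    h = update_hrtf(h, h_base, prev, now, reset=False)
    prev = now
h_reset = update_hrtf(h, h_base, prev, prev, reset=True)
drift = max(abs(a - b) for a, b in zip(h, h_reset))
```

In this simplified setting the accumulated result and the reset result differ only by rounding. When the operation using the difference is limited by the order, the difference between the two results is the accumulated error that the resetting removes.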


It is to be noted that, herein, description is given of an example in which the reset trigger is turned on or off at an arbitrary timing, but the reset trigger may be turned on at all times. That is, the rotation matrix R′(gt) may be calculated at all times.


In addition, the reset trigger may be turned on at any timing. For example, the timing at which the reset trigger is turned on may be a predetermined regular (periodic) timing such as a predetermined time interval, a timing at which the difference Δθt becomes equal to or greater than a threshold value, or a timing at which the angle θt becomes equal to or greater than a predetermined value.
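The timings listed above can be combined into one decision function. The following is merely a sketch: the parameter names, the use of absolute values for the angle comparisons, and the treatment of an omitted parameter as a disabled condition are assumptions made for illustration.

```python
def reset_trigger(frame_index, delta_theta, theta,
                  period=None, delta_limit=None, theta_limit=None):
    """Return True when any enabled reset condition holds for this frame."""
    if period is not None and frame_index % period == 0:
        return True   # regular (periodic) timing
    if delta_limit is not None and abs(delta_theta) >= delta_limit:
        return True   # the elevation difference reached its threshold
    if theta_limit is not None and abs(theta) >= theta_limit:
        return True   # the elevation angle reached its predetermined value
    return False


# Example: reset every 100 frames, or when the elevation exceeds 0.5 rad.
do_reset = reset_trigger(frame_index=250, delta_theta=0.01, theta=0.6,
                         period=100, theta_limit=0.5)
```

With every condition disabled the function always returns False, which corresponds to the proposed technique 1 without resetting, and a period of 1 corresponds to the reset trigger being on at all times.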


<Description of Drive Signal Generation Processing>

Next, description is given of drive signal generation processing to be performed by the audio processor 11 in FIG. 14 with reference to a flow chart in FIG. 15.


It is to be noted that a process in step S51 is the same as that in the step S11 in FIG. 13, and the description thereof is omitted.


In step S52, the rotation matrix operation unit 23 determines whether or not to perform the resetting on the basis of the reset trigger supplied from outside. For example, in a case where the reset trigger is turned on, it is determined to perform resetting.


In a case where it is determined not to perform the resetting in the step S52, the processing proceeds to step S53, and processes in steps S53 to S57 are performed.


It is to be noted that the processes in the steps S53 to S57 are the same as those in the steps S12 to S16 in FIG. 13, and the description thereof is omitted.


The process in the step S57 is performed, and then the rotation operation unit 24 supplies the obtained row vector H′(gt−1, ω) to the head-related transfer function synthesis unit 27 and the rotation coefficient holding unit 25, and thereafter, the processing proceeds to step S60.


In contrast, in a case where it is determined to perform the resetting in the step S52, in step S58, the rotation matrix operation unit 23 determines the rotation matrix R′(a(θt)) in the elevation angle direction, and the rotation matrix R′(u(ϕt)) and the rotation matrix R′(u(ψt)) in the horizontal direction on the basis of the angle gt of the head rotation information supplied from the head rotation sensor unit 21.


Further, the rotation matrix operation unit 23 synthesizes the rotation matrix R′(a(θt)), the rotation matrix R′(u(ϕt)), and the rotation matrix R′(u(ψt)) to determine the rotation matrix R′(gt), and supplies the rotation matrix R′(gt) to the rotation operation unit 24. It is to be noted that, in the step S58, the rotation matrix R′(a(θt)) may be obtained from the table on the basis of the angle θt, or the rotation matrix R′(a(θt)) may be obtained by an operation on the basis of the angle θt. Similarly, the rotation matrix R′(u(ϕt)) and the rotation matrix R′(u(ψt)) may be determined by an operation on the basis of the angle ϕt and the angle ψt, or the rotation matrix R′(u(ϕt)) and the rotation matrix R′(u(ψt)) may be obtained from the table on the basis of the angle ϕt and the angle ψt.


In step S59, the rotation operation unit 24 performs a rotation operation on the basis of the rotation matrix R′(gt) supplied from the rotation matrix operation unit 23 and the row vector Hs(ω) of the head-related transfer function held in the head-related transfer function holding unit 26 to calculate the row vector H′(gt−1, ω). For example, in the step S59, the row vector H′(gt−1, ω) is calculated by performing calculation similar to that in the expression (35) or the expression (36) described above.


The row vector H′(gt−1, ω) is obtained, and then the rotation operation unit 24 supplies the obtained row vector H′(gt−1, ω) to the head-related transfer function synthesis unit 27 and the rotation coefficient holding unit 25, and thereafter, the processing proceeds to step S60.


After the process in the step S57 or the step S59 is performed, processes in step S60 and step S61 are performed, and the drive signal generation processing ends; however, these processes are the same as those in the step S17 and the step S18 in FIG. 13, and the description thereof is omitted.


As described above, in the case where the reset trigger is turned on, the audio processor 11 determines the accurate rotation matrix R′(gt) and the accurate row vector H′(gt−1, ω) to generate the drive signal. Doing so makes it possible to obtain a drive signal having a small error while keeping the necessary memory amount and the necessary operation amount low.
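
The branch between the difference-based update (steps S53 to S57) and the resetting (steps S58 and S59) can be sketched as follows. This is merely an illustration under simplifying assumptions: a 2×2 planar rotation stands in for the rotation matrix in the spherical harmonic domain, no limitation of the order is applied, and all function names are hypothetical.

```python
import math


def rot2(angle_rad):
    """2x2 planar rotation used as a stand-in for the rotation matrix R'."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s], [s, c]]


def row_times_matrix(row, mat):
    """Row vector times matrix: out[j] = sum_i row[i] * mat[i][j]."""
    return [sum(row[i] * mat[i][j] for i in range(len(row)))
            for j in range(len(mat[0]))]


def update_row_vector(reset, h_static, h_prev, angle, prev_angle):
    """Return the rotated row vector for the current time."""
    if reset:
        # Resetting: rotate the held HRTF Hs by the full angle.
        return row_times_matrix(h_static, rot2(angle))
    # No resetting: rotate the previous result by the angle difference.
    return row_times_matrix(h_prev, rot2(angle - prev_angle))
```

With exact matrices both paths give the same result; the resetting matters when the difference-based path accumulates an error, for example, because only the effective elements are used.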


It is to be noted that, for example, in a case where the head of the listener is abruptly and largely rotated in the elevation angle direction, the difference Δθt abruptly increases. Accordingly, in a case where the row vector H′(gt−1, ω) is determined by tracking rotation of the head of the listener, determining the row vector H′(gt−1, ω) accurately increases the operation amount, and determining the row vector H′(gt−1, ω) with a small operation amount increases the error.


In such a case, for example, where it is desired to keep the operation amount low, if the actual difference Δθt becomes equal to or greater than a predetermined threshold value, such as 30 degrees, the value of the difference Δθt may be limited to a value equal to or less than one degree regardless of the actual value of the difference Δθt, and the rotation matrix operation unit 23 may determine the rotation matrix R′(a(Δθt)) from the limited value.


Doing so makes it possible to keep the operation amount low, though it is not possible to perform tracking perfectly until the actual difference Δθt becomes less than the threshold value, thereby causing an error in the rotation matrix R′(a(Δθt)). It is to be noted that it is possible to perform such processing independently of turning the reset trigger on or off.
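
The limitation of the difference Δθt described above can be sketched as follows. The function name and parameter names are hypothetical, and preserving the sign of the difference while capping its magnitude is an assumption of this sketch.

```python
def limit_delta_theta(delta_deg, threshold_deg=30.0, limit_deg=1.0):
    """Cap the elevation step used for R'(a(delta)) when the actual step is large.

    threshold_deg and limit_deg follow the example values in the text;
    keeping the direction of rotation is an assumption.
    """
    if abs(delta_deg) >= threshold_deg:
        # Keep the direction of rotation, cap the magnitude.
        return limit_deg if delta_deg > 0 else -limit_deg
    return delta_deg
```

For example, an actual difference of 45 degrees would be replaced by one degree, so the rotated head-related transfer function lags behind the head until the remaining difference falls below the threshold value.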


Further, for example, in a case where the actual difference Δθt becomes equal to or greater than a predetermined threshold value, such as 30 degrees, the rotation matrix operation unit 23 may determine the rotation matrix R′(a(θt)) only for the effective elements, that is, only for the elements in the block having the effective element width 2C+1, where C is a predetermined value that determines the effective element width. In this case, in the rotation operation unit 24, calculation of the expression (36) is performed only for the effective elements determined for C to determine the row vector H′(gt−1, ω).


In this example, the operation amount increases because the rotation matrix R′(gt) is used, but it is sufficient to perform the operation only for the effective elements determined by C, which makes it possible to keep the operation amount low to some extent while tracking rotation of the head of the listener. It is also possible to perform such processing independently of turning the reset trigger on or off.


Further, for example, in a case where the actual difference Δθt becomes equal to or greater than a predetermined threshold value, such as 30 degrees or greater, the rotation matrix R′(a(Δθt)) is determined, and the row vector H′(gt−1, ω) is determined from the rotation matrix R′(Δgt) and the row vector H′(gt-1−1, ω) determined by the rotation matrix R′(a(Δθt)), but at this time, the rotation operation unit 24 may temporarily increase C determining the effective element width 2C+1 to a value greater than a normal value. Herein, the value of C may be a constant, or may be determined by the degree n, the difference Δθt, or the like.


Doing so makes it possible to perform tracking of rotation of the head of the listener, though the operation amount in a case where the row vector H′(gt−1, ω) is determined is increased. Even in this case, it is possible to perform processing of changing the value of C independently of turning the reset trigger on or off.


In addition, the resetting may be performed, for example, in a case where the angle θt of the head rotation information becomes a predetermined value (hereinafter, also referred to as reset point).


Specifically, for example, the rotation matrix operation unit 23 holds the rotation matrix R′(a(θ)) determined in advance for the angle θ, which is the reset point, for every reset point or every plurality of reset points. For example, it is assumed that the rotation matrix R′(a(θ1)) is held in advance for an angle θ1 determined as the reset point.


In this case, for example, in a case where the angle θt is the angle θ1, the rotation matrix operation unit 23 determines the rotation matrix R′(gt) with use of the held rotation matrix R′(a(θ1)) as the rotation matrix R′(a(θt)), and supplies the rotation matrix R′(gt) to the rotation operation unit 24. By doing so, a memory is necessary to hold the rotation matrix R′(a(θ)) for each reset point, but it is not necessary to perform an operation of the rotation matrix R′(a(θt)), which makes it possible to keep the operation amount low while performing the resetting to obtain the accurate rotation matrix R′(a(θt)).
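
Holding the rotation matrix for each reset point amounts to a table lookup, which can be sketched as follows. The table layout, the tolerance, and the function name are hypothetical and are given only for illustration.

```python
def lookup_reset_matrix(theta_deg, table, tol_deg=1e-6):
    """Return the matrix held in advance for theta if it is a reset point.

    table maps a reset-point angle in degrees to a matrix held in advance.
    Returns None when theta is not a reset point, in which case the
    difference-based update would be used instead.
    """
    for point_deg, matrix in table.items():
        if abs(theta_deg - point_deg) <= tol_deg:
            return matrix
    return None
```

The memory amount grows with the number of entries in the table, whereas no operation of the rotation matrix is necessary at a reset point.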


Modification Example 1 of Second Embodiment
<About Resetting Control by Plurality of Devices>

In addition, for example, it is assumed that a plurality of listeners exists in a space, and as illustrated in FIG. 16, there is a control system in which each of a plurality of audio processors outputs a drive signal to each of the headphones and the like worn by each of the listeners.


The control system illustrated in FIG. 16 includes audio processors 71-1 to 71-4 and a switch 72.


Each of the audio processors 71-1 to 71-4 has the same configuration as that of the audio processor 11 illustrated in FIG. 14. Hereinafter, in a case where it is not necessary to particularly distinguish the audio processors 71-1 to 71-4, the audio processors 71-1 to 71-4 are also simply referred to as audio processors 71.


Each of the audio processors 71 receives the input signal D′nm(ω), performs processing similar to the drive signal generation processing described with reference to FIG. 15, and outputs the drive signal pl(g, t) and the drive signal pr(g, t) of the left and right headphones.


It is to be noted that each of the audio processors 71 may be one independent device, or these audio processors 71 may be provided in one device, but it is assumed herein that the respective audio processors 71 are provided in one computing system (device) located in the middle.


The switch 72 controls supply of the reset trigger to the audio processors 71 to supply the reset trigger to one audio processor 71 of the audio processors 71-1 to 71-4 at an arbitrary timing.


In such a control system, each of the plurality of listeners wears headphones, and each of the headphones reproduces sound on the basis of the drive signal supplied from each of the audio processors 71 that are different from each other.


Then, each of the audio processors 71 detects movement (rotation) of the headphones to which the drive signal is to be outputted, that is, movement (rotation) of the head of the listener wearing the headphones in the head rotation sensor unit 21, and rotates the head-related transfer function by tracking the movement of the head of the listener, and generates the drive signal.


In the control system, the reset trigger is supplied to each of the four audio processors 71 by the switch 72 at different timings; therefore, the resetting is not performed simultaneously on the audio processors 71. This makes it possible to suppress a sudden increase in an operation load in the entire control system. That is, it is possible to prevent a temporary increase in the operation amount.


In the control system, in a case where the resetting is performed simultaneously on four audio processors 71, the operation amount in the entire control system temporarily becomes large (increases) at a time at which the resetting is performed, as indicated by an arrow Q11 in FIG. 17, for example.


It is to be noted that, in FIG. 17, a vertical axis represents the operation amount in the entire control system, and a horizontal axis represents time.


For example, in an example indicated by the arrow Q11, the resetting is performed simultaneously on the four audio processors 71 in the control system at predetermined intervals. For example, the resetting is performed at a time t11, and the operation amount becomes large (increases) at the time t11, but the operation amount is kept low at other times at which the resetting is not performed.


In this case, although the operation amount increases with a low frequency, the operation load on the control system temporarily increases at the time of the resetting.


In contrast, in the example indicated by the arrow Q12, for example, the resetting is not performed simultaneously on the plurality of audio processors 71, but the resetting is performed on the respective audio processors 71 at different timings. The operation amount increases with a higher frequency, but the operation amount at each time does not become so large. That is, although the operation amount rises at the time of the resetting, an increase in the operation amount at that time is only an amount corresponding to the resetting on one audio processor 71; therefore, the operation load to be applied is not as large as the operation load applied in a case where the resetting is performed simultaneously on the plurality of audio processors 71.


For example, at a time t12, the resetting is performed on one audio processor 71, but the operation amount is kept low, as compared with that at the time t11 in the example indicated by the arrow Q11.


It is to be noted that, although description has been given of an example in which the resetting is performed on one audio processor 71 at a time, it is possible to suppress the operation load unless the resetting is performed simultaneously on all the audio processors 71. For example, all the audio processors 71 may be divided into a plurality of groups including one or a plurality of audio processors 71, and the resetting may be performed on each of the groups.


As described above, in a case where there is a plurality of audio processors 71, the resetting is performed on the respective audio processors 71 at timings different from each other, which makes it possible to suppress a temporary increase in the operation amount.
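
The difference between the examples indicated by the arrow Q11 and the arrow Q12 can be illustrated with the following sketch, which counts the operation amount per frame for the entire control system. The cost values, the even spacing of the reset timings, and all names are hypothetical.

```python
def load_per_frame(num_processors, num_frames, period, staggered,
                   base_cost=1, reset_cost=10):
    """Total operation amount per frame for the entire control system.

    base_cost and reset_cost are arbitrary illustrative units.
    """
    loads = []
    for frame in range(num_frames):
        total = 0
        for p in range(num_processors):
            # Staggered: processor p resets at its own offset in the period.
            offset = (p * period // num_processors) if staggered else 0
            is_reset = (frame % period) == offset
            total += reset_cost if is_reset else base_cost
        loads.append(total)
    return loads
```

With four processors and a period of eight frames, the simultaneous case peaks at 40 units, whereas the staggered case peaks at 13 units; the total operation amount over the period is the same in both cases.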


Modification Example 2 of Second Embodiment
<About Resetting for Each Degree or Each Order>

In addition, the resetting may be performed for each degree n or for each order m regardless of the example of the audio processor 11 illustrated in FIG. 14 and the example of the control system illustrated in FIG. 16, that is, regardless of whether one or a plurality of listeners exists. Doing so makes it possible to suppress an increase in the operation load at the time of the resetting.


For example, as illustrated in FIG. 18, it is assumed that the row vector H′(gt−1, ω) includes a matrix H0(ω) including elements of the degree n=0, a matrix H1(ω) including elements of the degree n=1, a matrix H2(ω) including elements of the degree n=2, and a matrix H3(ω) including elements of the degree n=3.


In such a case, for example, the resetting may be performed only on a component of a predetermined degree for the degree n. At this time, the resetting may be performed on components of the respective degrees at different timings, or the resetting may be performed simultaneously on components of some of the degrees.


For example, in a case where the resetting is performed only on a zeroth-order component of the degree n, that is, on a component of the degree n=0, the product of the rotation matrix R′(gt) and the row vector Hs(ω) for the zeroth-order component is determined to generate the matrix H0(ω).


In contrast, for first to third-order components of the degree n, the product of the rotation matrix R′(Δgt) and the row vector H′(gt-1−1, ω) is determined, that is, calculation of the expression (41) is performed to generate the matrix H1(ω), the matrix H2(ω), and the matrix H3(ω).


Then, the final row vector H′(gt−1, ω) is obtained from the matrix H0(ω), the matrix H1(ω), the matrix H2(ω), and the matrix H3(ω) that are thus obtained.


Accordingly, for example, in the audio processor 11 illustrated in FIG. 14, at a timing at which the resetting is performed only on the zeroth-order component of the degree n, the process in the step S58 in FIG. 15 and the process in the step S59 in FIG. 15 are performed on the zeroth-order component to generate the matrix H0(ω). In contrast, for the first to third-order components of the degree n, the processes in the steps S53 to S57 are performed to generate the matrix H1(ω), the matrix H2(ω), and the matrix H3(ω). Then, the row vector H′(gt−1, ω) is generated from the matrix H0(ω), the matrix H1(ω), the matrix H2(ω), and the matrix H3(ω).
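
Assembling the row vector from per-degree blocks can be sketched as follows. The two callables standing for the reset path and the difference-based path are hypothetical placeholders, as is the assumption that the elements are stored degree by degree starting from the degree n=0.

```python
def build_row_vector(max_degree, reset_degrees, reset_block, incremental_block):
    """Assemble the row vector from the blocks H_0 ... H_N.

    reset_block(n) and incremental_block(n) are caller-supplied functions
    (assumptions of this sketch) returning the 2n+1 elements of degree n
    obtained by the resetting or by the difference-based update.
    """
    row = []
    for n in range(max_degree + 1):
        if n in reset_degrees:
            block = reset_block(n)        # corresponds to steps S58 and S59
        else:
            block = incremental_block(n)  # corresponds to steps S53 to S57
        if len(block) != 2 * n + 1:
            raise ValueError("a block of degree n must have 2n+1 elements")
        row.extend(block)
    return row
```

For a maximum degree of 3, the assembled row vector has 1+3+5+7=16 elements, of which only the single element of the degree n=0 comes from the reset path when the set of degrees to be reset is {0}.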


It is to be noted that, even in a case where the resetting is performed for each degree, for example, some groups such as a group including zeroth and first orders of the degree n may be provided, and the resetting may be performed for each of the groups.


For example, in an example illustrated in FIG. 18, the number of elements is small from the zeroth to second orders of the degree n; therefore, the zeroth to second orders of the degree n may be set as one group, and the resetting may be performed simultaneously on the zeroth-order, first-order, and second-order components of the degree n. In this case, a timing at which the resetting is performed on the zeroth-order, first-order, and second-order components of the degree n is different from a timing at which the resetting is performed on the third-order component of the degree n.


It is to be noted that, although description has been given of a case where the resetting is performed for each degree n as a specific example, a case where the resetting is performed for each order m is also the same as the case where the resetting is performed for each degree n.


Modification Example 3 of Second Embodiment
<About Resetting for Each Time Frequency>

In addition, the resetting may be performed for each time frequency ω regardless of the example of the audio processor 11 illustrated in FIG. 14 and the example of the control system illustrated in FIG. 16, that is, regardless of whether one or a plurality of listeners exists. Doing so makes it possible to suppress an increase in the operation load at the time of the resetting.


For example, as illustrated in FIG. 19, it is assumed that the number of time frequency bins ω is W, and the row vector H′(gt−1, ω) is determined for W number of time frequencies ω1 to ωW. That is, row vectors H′(gt−1, ω1) to H′(gt−1, ωW) are obtained.


In such a case, for example, resetting may be performed only for a predetermined time frequency ω. At this time, the resetting may be performed at different timings for the respective time frequencies ω, or the resetting may be performed simultaneously for some time frequencies ω.


For example, in the audio processor 11 illustrated in FIG. 14, at a timing at which the resetting is performed only for the time frequency ω1, the processes in the steps S58 and S59 in FIG. 15 are performed for the time frequency ω1 to generate the row vector H′(gt−1, ω1).


In contrast, for the time frequencies ω2 to ωW, the processes in the steps S53 to S57 in FIG. 15 are performed to generate the row vectors H′(gt−1, ω2) to H′(gt−1, ωW).


It is to be noted that, even in a case where the resetting is performed for each time frequency ω, some groups such as a group including one or a plurality of time frequencies ω may be provided, and the resetting may be performed for each of the groups.
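
One possible way of grouping the time frequencies and resetting one group per frame is sketched below. The round-robin assignment of bins to groups and the function name are assumptions made only for illustration; any other grouping is equally possible.

```python
def bins_to_reset(frame_index, num_bins, num_groups):
    """Indices of the time frequency bins whose group is reset at this frame.

    Bin w belongs to group w % num_groups, and group frame_index % num_groups
    is reset at the given frame (an illustrative round-robin schedule).
    """
    group = frame_index % num_groups
    return [w for w in range(num_bins) if w % num_groups == group]
```

With W=8 bins and four groups, two bins are reset at each frame, and every bin is reset once every four frames; the remaining bins are updated on the basis of the difference.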


Modification Example 4 of Second Embodiment
<Another Example of Control System>

Further, in the control system illustrated in FIG. 16, it is assumed that the audio processors 71 corresponding to a plurality of listeners are operated by one computing system located in the middle.


However, it is difficult to previously determine performance of the computing system in the middle in a case where the number of listeners changes dynamically.


Accordingly, a system (slave) for each listener, such as a smartphone, may independently perform processing of generating a drive signal for that listener, and in a case where the slave does not have sufficient processing performance to perform the resetting described above, a device (master) in the middle to which the slave is coupled may perform a portion or the entirety of an operation at the time of the resetting.


In such a case, the control system is configured, for example, as illustrated in FIG. 20.


The control system illustrated in FIG. 20 includes a master device 101 and slave 102-1 to slave 102-9.


In this example, the master device 101 and each of the slaves 102-1 to 102-9 are coupled to each other through a wired or wireless network. It is to be noted that, hereinafter, in a case where it is not necessary to particularly distinguish the slaves 102-1 to 102-9, the slaves 102-1 to 102-9 are also simply referred to as slaves 102.


Instead of the slaves 102, the master device 101 performs a portion of an operation (processing) originally performed in the slaves 102, and supplies a result of the operation to the slaves 102.


The slaves 102 each include, for example, headphones, a smartphone, or the like, and correspond to the audio processor 11 illustrated in FIG. 14. The slaves 102 each perform the drive signal generation processing described with reference to FIG. 15 in accordance with rotation of the head of the listener and output the drive signal, but request the master device 101 to perform a portion of an operation of the drive signal generation processing, such as an operation at the time of the resetting.


In a specific example, it is possible for the master device 101 to perform the operation at the time of the resetting, for example.


In this case, the slave 102 transmits, to the master device 101, an operation request for calculation of the row vector H′(gt−1, ω) together with the angle gt or the rotation matrix R′(gt).


Then, the master device 101 that has received the operation request from the slave 102 and the angle gt or the rotation matrix R′(gt) performs an operation of the following expression (42) in response to the operation request, and transmits the resultant row vector H′(gt−1, ω) to the slave 102.





[Math. 42]






H′(gt−1,ω)=Hs(ω)R′(gt)  (42)


It is to be noted that the row vector Hs(ω) to be used in the operation of the expression (42) may be obtained from the slave 102 by the master device 101 in advance, or may be held in the master device 101 in advance.


In such a manner, it is possible for the slave 102 to obtain a drive signal of sound to be presented to the listener with a small operation amount with use of the row vector H′(gt−1, ω) received from the master device 101.


It is to be noted that, as described above, the resetting may be performed for each listener, each degree n, each order m, each time frequency ω, or the like, and appropriately determining the timing of the resetting makes it possible to reduce the operation load on the master device 101. For example, performing the resetting for the respective slaves 102 at different timings makes it possible to reduce the operation load on the master device 101.


Modification Example 5 of Second Embodiment
<Another Example of Control System>

Contrary to the case of the modification example 4 of the second embodiment, the slave 102 may perform the operation at the time of the resetting.


In such a case, the master device 101 sequentially receives the angle gt, the rotation matrix R′(Δgt), or the like from the slave 102, performs an operation expressed by the following expression (43), and calculates the row vector H′(gt−1, ω).









[Math. 43]

H'^{m}_{n}(g_t^{-1}, \omega) = \sum_{k=\max(-n,\, m-C)}^{\min(n,\, m+C)} H'^{k}_{n}(g_{t-1}^{-1}, \omega)\, R'^{(n)}_{k,m}(\Delta g_t) \qquad (43)







It is to be noted that the master device 101 may perform operations up to the calculation of the row vector H′(gt−1, ω) and the slave 102 may perform the remaining operations up to obtaining of the drive signal, or the master device 101 may calculate the drive signal with use of the row vector H′(gt−1, ω) and supply the drive signal to the slave 102.
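
The operation of the expression (43), in which the sum over k is restricted to the effective elements determined by C, can be sketched for a single degree n as follows. The storage layout (order k stored at index k+n) and the function name are assumptions of this sketch.

```python
def rotate_degree_banded(h_prev, r_delta, n, c):
    """Banded sum of the expression (43) for one degree n.

    h_prev[k + n]         : previous coefficient of order k, k = -n..n
    r_delta[k + n][m + n] : element (k, m) of the degree-n block of the
                            rotation matrix for the angle difference
    c                     : value C; the band has the width 2C+1
    """
    out = []
    for m in range(-n, n + 1):
        k_lo = max(-n, m - c)  # lower limit of the sum
        k_hi = min(n, m + c)   # upper limit of the sum
        out.append(sum(h_prev[k + n] * r_delta[k + n][m + n]
                       for k in range(k_lo, k_hi + 1)))
    return out
```

When C is at least 2n, the band covers the whole block and the result equals the full product of the row vector and the matrix; a smaller C reduces the number of multiplications for each output element to at most 2C+1.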


In addition, at the time of the resetting, the slave 102 performs the operation of the expression (42) described above, and the resultant row vector H′(gt−1, ω) is transmitted from the slave 102 to the master device 101. This makes it possible for the master device 101 to hold the row vector H′(gt−1, ω) received from the slave 102 and use the row vector H′(gt−1, ω) for the operation of the expression (43) to be performed next time.


The slave 102 performs the operation at the time of the resetting in such a manner, which makes it possible for the master device 101 to update the row vector H′(gt−1, ω) that is normally calculated on the basis of a difference to the more accurate row vector H′(gt−1, ω), and reset an error.


It is to be noted that the row vector Hs(ω) necessary for the operation at the time of the resetting may be obtained from the master device 101 by the slave 102 in advance, may be held in the slave 102 in advance, or may be held in both the master device 101 and the slave 102 in advance.


In addition, in a case where only one device of the master device 101 and the slave 102 holds the row vector Hs(ω) or the like, the row vector Hs(ω) or the like held by the one device may be transmitted to the other device at an arbitrary timing such as the time of coupling or the time of initialization.


Further, even in this embodiment, the resetting may be performed for each listener, each degree n, each order m, each time frequency ω, or the like, and it is possible to appropriately determine the timing of the resetting. However, in a case where the operation at the time of the resetting is performed by the slave 102, the operation at the time of the resetting for a plurality of listeners is not performed simultaneously in one slave 102; therefore, it is not necessary to disperse the timing of the resetting.


In addition, the master device 101 and the slave 102 may share the drive signal generation processing described with reference to FIG. 13 and FIG. 15. That is, the master device 101 has some of functions to perform the drive signal generation processing, which makes it possible to flexibly cope with a case where the number of listeners increases dynamically, and the like.


<Configuration Example of Computer>

Incidentally, it is possible to execute the series of processing described above by hardware or software. In a case where the series of processing is executed by the software, a program including that software is installed in a computer. Herein, the computer includes a computer incorporated into dedicated hardware and, for example, a general-purpose computer capable of executing various functions by being installed with various programs.



FIG. 21 is a block diagram illustrating a configuration example of the hardware of the computer that executes the series of processing described above by a program.


In the computer, a CPU (Central Processing Unit) 501, a ROM (Read Only Memory) 502, and a RAM (Random Access Memory) 503 are coupled to each other by a bus 504.


An input/output interface 505 is further coupled to the bus 504. An input unit 506, an output unit 507, a recording unit 508, a communication unit 509, and a drive 510 are coupled to the input/output interface 505.


The input unit 506 includes a keyboard, a mouse, a microphone, an imaging element, and the like. The output unit 507 includes a display, a speaker, and the like. The recording unit 508 includes a hard disk, a nonvolatile memory, and the like. The communication unit 509 includes a network interface and the like. The drive 510 drives a removable recording medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory.


In the computer configured as described above, the CPU 501 loads, for example, a program recorded on the recording unit 508 into the RAM 503 through the input/output interface 505 and the bus 504, and executes the program, thereby performing the series of processing described above.


It is possible to record the program to be executed by the computer (the CPU 501), for example, in the removable recording medium 511 as a package medium or the like and provide the program. Moreover, it is possible to provide the program through a wired or wireless transmission medium such as a local area network, the Internet, digital satellite broadcasting, or the like.


In the computer, it is possible to install the program in the recording unit 508 through the input/output interface 505 by mounting the removable recording medium 511 to the drive 510. Further, it is possible to receive the program by the communication unit 509 through the wired or wireless transmission medium and install the program on the recording unit 508. In addition, it is possible to install the program on the ROM 502 or the recording unit 508 in advance.


It is to be noted that the program to be executed by the computer may be a program in which processing is performed in time series in the sequence described in this specification, or may be a program in which processing is performed in parallel or at a necessary timing such as a timing at which calling is performed.


Moreover, embodiments of the present technology are not limited to the foregoing embodiments, and may be modified in a variety of ways within a scope that does not depart from the gist of the present technology.


For example, it is possible for the present technology to adopt a configuration of cloud computing in which one function is shared and collaboratively processed by a plurality of devices through a network.


It is possible to execute each of the steps described in the flow charts described above by one device or to share and execute each of the steps by a plurality of devices.


Further, in a case where a plurality of processes is included in one step, it is possible to execute the plurality of processes included in that one step by one device or to share and execute the plurality of processes by a plurality of devices.


In addition, effects described in this specification are merely illustrative and non-limiting, and other effects may be included.


Further, the present technology may have the following configurations.


(1)


A signal processing device including:


a rotation operation unit that rotates a head-related transfer function in a spherical harmonic domain by an operation on the basis of a rotation matrix corresponding to rotation of a head of a listener, the operation in which an order of the rotation matrix is limited; and a synthesis unit that synthesizes the head-related transfer function after rotation obtained by the operation and a sound signal of the spherical harmonic domain to generate a headphone drive signal.


(2)


The signal processing device according to (1), in which, for a rotation operation of the head-related transfer function in at least one rotation direction, the rotation operation unit performs the rotation operation at a predetermined time with use of an operation result of the rotation operation in the rotation direction at another time before the predetermined time to determine the head-related transfer function after the rotation at the predetermined time.


(3)


The signal processing device according to (2), in which the rotation operation unit performs the rotation operation in the rotation direction at the predetermined time on the basis of a rotation matrix corresponding to a difference between a rotation angle in the rotation direction of the head of the listener at the predetermined time and a rotation angle in the rotation direction of the head of the listener at the other time, and the operation result of the rotation operation in the rotation direction at the other time.


(4)


The signal processing device according to (3), in which the rotation operation unit performs the rotation operation only on an element having the order within a predetermined range as the operation in which the order is limited.


(5)


The signal processing device according to (3) or (4), in which, for an elevation angle direction as the rotation direction, the rotation operation unit performs the rotation operation in the rotation direction at the predetermined time with use of the operation result of the rotation operation in the rotation direction at the other time.


(6)


The signal processing device according to any one of (3) to (5), in which


in a case where resetting of the rotation matrix is not performed, the rotation operation unit performs the rotation operation in the rotation direction at the predetermined time with use of the operation result of the rotation operation in the rotation direction at the other time, and


in a case where the resetting of the rotation matrix is performed, the rotation operation unit performs the rotation operation in the rotation direction at the predetermined time on the basis of a rotation matrix corresponding to a rotation angle in the rotation direction of the head of the listener at the predetermined time and the head-related transfer function.


(7)


The signal processing device according to (6), in which the resetting is performed for each degree, each order, or each time frequency.


(8)


The signal processing device according to (6) or (7), in which, in a case where the headphone drive signal is generated for each of a plurality of the listeners, the resetting is performed for each of the listeners.


(9)


The signal processing device according to any one of (6) to (8), in which, in a case where the resetting is performed, the rotation operation unit performs the rotation operation with use of a rotation matrix determined in advance as the rotation matrix corresponding to the rotation angle in the rotation direction of the head of the listener at the predetermined time.


(10)


The signal processing device according to any one of (1) to (9), in which, in a case where a rotation matrix, which is included in the rotation matrix corresponding to the rotation of the head, for performing rotation to a predetermined rotation direction is represented by a sum of a plurality of matrices, the rotation operation unit performs, as the operation in which the order is limited, an operation of rotating the head-related transfer function with use of a sum of some of the plurality of the matrices as the rotation matrix for performing the rotation to the predetermined rotation direction.


(11)


A signal processing method including steps of:


rotating a head-related transfer function in a spherical harmonic domain by an operation on the basis of a rotation matrix corresponding to rotation of a head of a listener, the operation in which an order of the rotation matrix is limited; and


synthesizing the head-related transfer function after rotation obtained by the operation and a sound signal of the spherical harmonic domain to generate a headphone drive signal.


(12)


A program causing a computer to execute processing, the processing including steps of:


rotating a head-related transfer function in a spherical harmonic domain by an operation on the basis of a rotation matrix corresponding to rotation of a head of a listener, the operation in which an order of the rotation matrix is limited; and


synthesizing the head-related transfer function after rotation obtained by the operation and a sound signal of the spherical harmonic domain to generate a headphone drive signal.
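
As an illustrative, non-limiting sketch of configurations (2) to (6), the following Python example rotates a head-related transfer function in the spherical harmonic domain either incrementally, from the operation result at the previous time, or directly after a reset. Several simplifications are assumptions made for this example and are not taken from the present disclosure: only the azimuth direction is handled, for which the rotation matrix of complex spherical harmonics is diagonal with elements exp(-j·m·φ) (the sign convention is assumed); the order limitation is modeled by leaving elements with |m| above a threshold unrotated; and the synthesis is modeled as a product-sum for a single time-frequency bin. The names `sh_orders`, `azimuth_rotation`, `HrtfRotator`, and `synthesize` are hypothetical.

```python
import numpy as np


def sh_orders(max_degree):
    """Order m of every (n, m) coefficient, listed degree by degree."""
    return np.array([m for n in range(max_degree + 1) for m in range(-n, n + 1)])


def azimuth_rotation(orders, angle, order_limit=None):
    """Diagonal of an azimuth rotation matrix (assumed convention exp(-1j*m*angle)).

    Elements whose |m| exceeds order_limit are left unrotated (factor 1),
    which is this sketch's stand-in for an order-limited operation.
    """
    diag = np.exp(-1j * orders * angle)
    if order_limit is not None:
        diag = np.where(np.abs(orders) <= order_limit, diag, 1.0)
    return diag


class HrtfRotator:
    """Keeps the previous angle and previous rotated HRTF between calls."""

    def __init__(self, hrtf_sh, max_degree, order_limit=None):
        self.base = np.asarray(hrtf_sh, dtype=complex)  # unrotated HRTF
        self.orders = sh_orders(max_degree)
        self.order_limit = order_limit
        self.prev_angle = 0.0
        self.prev_result = self.base.copy()

    def rotate(self, angle, reset=False):
        if reset:
            # Reset: rotate the stored HRTF by the full angle.
            diag = azimuth_rotation(self.orders, angle, self.order_limit)
            result = diag * self.base
        else:
            # No reset: rotate the previous result by the angle difference.
            delta = angle - self.prev_angle
            diag = azimuth_rotation(self.orders, delta, self.order_limit)
            result = diag * self.prev_result
        self.prev_angle, self.prev_result = angle, result
        return result


def synthesize(rotated_hrtf, sound_sh):
    """Product-sum of rotated HRTF and sound coefficients for one bin."""
    return np.sum(rotated_hrtf * sound_sh)
```

In this sketch, calling `rotate(0.3)` followed by `rotate(0.5)` gives the same coefficients as a single `rotate(0.5, reset=True)`, because azimuth rotations compose exactly. For rotation matrices that are approximated or order-limited in other directions, errors can accumulate across incremental steps, which is one reason an occasional reset may be useful.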


REFERENCE SIGNS LIST






    • 11: audio processor


    • 21: head rotation sensor unit


    • 22: previous direction holding unit


    • 23: rotation matrix operation unit


    • 24: rotation operation unit


    • 25: rotation coefficient holding unit


    • 26: head-related transfer function holding unit


    • 27: head-related transfer function synthesis unit


    • 28: time frequency inverse transformation unit




Claims
  • 1. A signal processing device comprising: a rotation operation unit that rotates a head-related transfer function in a spherical harmonic domain by an operation on a basis of a rotation matrix corresponding to rotation of a head of a listener, the operation in which an order of the rotation matrix is limited; and a synthesis unit that synthesizes the head-related transfer function after rotation obtained by the operation and a sound signal of the spherical harmonic domain to generate a headphone drive signal.
  • 2. The signal processing device according to claim 1, wherein, for a rotation operation of the head-related transfer function in at least one rotation direction, the rotation operation unit performs the rotation operation at a predetermined time with use of an operation result of the rotation operation in the rotation direction at another time before the predetermined time to determine the head-related transfer function after the rotation at the predetermined time.
  • 3. The signal processing device according to claim 2, wherein the rotation operation unit performs the rotation operation in the rotation direction at the predetermined time on a basis of a rotation matrix corresponding to a difference between a rotation angle in the rotation direction of the head of the listener at the predetermined time and a rotation angle in the rotation direction of the head of the listener at the other time, and the operation result of the rotation operation in the rotation direction at the other time.
  • 4. The signal processing device according to claim 3, wherein the rotation operation unit performs the rotation operation only on an element having the order within a predetermined range as the operation in which the order is limited.
  • 5. The signal processing device according to claim 3, wherein, for an elevation angle direction as the rotation direction, the rotation operation unit performs the rotation operation in the rotation direction at the predetermined time with use of the operation result of the rotation operation in the rotation direction at the other time.
  • 6. The signal processing device according to claim 3, wherein in a case where resetting of the rotation matrix is not performed, the rotation operation unit performs the rotation operation in the rotation direction at the predetermined time with use of the operation result of the rotation operation in the rotation direction at the other time, and in a case where the resetting of the rotation matrix is performed, the rotation operation unit performs the rotation operation in the rotation direction at the predetermined time on a basis of a rotation matrix corresponding to a rotation angle in the rotation direction of the head of the listener at the predetermined time and the head-related transfer function.
  • 7. The signal processing device according to claim 6, wherein the resetting is performed for each degree, each order, or each time frequency.
  • 8. The signal processing device according to claim 6, wherein, in a case where the headphone drive signal is generated for each of a plurality of the listeners, the resetting is performed for each of the listeners.
  • 9. The signal processing device according to claim 6, wherein, in a case where the resetting is performed, the rotation operation unit performs the rotation operation with use of a rotation matrix determined in advance as the rotation matrix corresponding to the rotation angle in the rotation direction of the head of the listener at the predetermined time.
  • 10. The signal processing device according to claim 1, wherein, in a case where a rotation matrix, which is included in the rotation matrix corresponding to the rotation of the head, for performing rotation to a predetermined rotation direction is represented by a sum of a plurality of matrices, the rotation operation unit performs, as the operation in which the order is limited, an operation of rotating the head-related transfer function with use of a sum of some of the plurality of the matrices as the rotation matrix for performing the rotation to the predetermined rotation direction.
  • 11. A signal processing method comprising steps of: rotating a head-related transfer function in a spherical harmonic domain by an operation on a basis of a rotation matrix corresponding to rotation of a head of a listener, the operation in which an order of the rotation matrix is limited; and synthesizing the head-related transfer function after rotation obtained by the operation and a sound signal of the spherical harmonic domain to generate a headphone drive signal.
  • 12. A program causing a computer to execute processing, the processing comprising steps of: rotating a head-related transfer function in a spherical harmonic domain by an operation on a basis of a rotation matrix corresponding to rotation of a head of a listener, the operation in which an order of the rotation matrix is limited; and synthesizing the head-related transfer function after rotation obtained by the operation and a sound signal of the spherical harmonic domain to generate a headphone drive signal.
Priority Claims (1)
Number       Date      Country  Kind
2017-132187  Jul 2017  JP       national
PCT Information
Filing Document    Filing Date  Country  Kind
PCT/JP2018/023557  6/21/2018    WO       00