The present invention relates to a head-related transfer function generator, a head-related transfer function generation program, and a head-related transfer function generation method.
Priority is claimed on Japanese Patent Application No. 2020-090035, filed May 22, 2020 and Japanese Patent Application No. 2020-200590, filed Dec. 2, 2020, the contents of which are incorporated herein by reference.
Conventionally, research and development have progressed for the purpose of practical implementation of a three-dimensional acoustic system, virtual reality (VR) of sounds, and the like. For realizing practical implementation of such technologies, it is necessary to reproduce a head-related transfer function for each listener. As an example of a technology for reproducing a head-related transfer function for each listener, there is a head-related transfer function selecting device disclosed in Patent Document 1.
This head-related transfer function selecting device includes a measurement unit, a feature quantity extracting unit and a characteristic selecting unit. The measurement unit acquires a head-related impulse response of a user on the basis of a speech signal received by a microphone worn on the ears of the user in a state in which predetermined speech is generated as a measurement signal from a speaker. The feature quantity extracting unit extracts a feature quantity of frequency characteristics corresponding to the head-related impulse response. The characteristic selecting unit selects one head-related transfer function from a database in which a head-related transfer function of each of a plurality of persons and a feature quantity of the head-related transfer function are associated with each other on the basis of the extracted feature quantity.
However, the head-related transfer function selecting device described above only selects one of a plurality of head-related transfer functions stored in the database. For this reason, in a case in which a head-related transfer function that is appropriate for a listener is not stored in the database, naturally, the head-related transfer function selecting device cannot select a head-related transfer function that is appropriate for the listener.
In addition, in a case in which the head-related transfer function of a listener is to be actually measured, since it is necessary to exclude effects of unnecessary reflective sounds, surrounding noises, and the like, it is necessary to perform measurement not in a house, an office, and the like but in an anechoic chamber. However, anechoic chambers are present only in limited research organizations. In addition, a general user not having sufficient specialized knowledge of acoustics cannot measure a head-related transfer function of the user who is a listener with sufficient accuracy.
The present invention has been made in view of the problems described above, and an object thereof is to provide a head-related transfer function generator, a head-related transfer function generation program, and a head-related transfer function generation method capable of acquiring a head-related transfer function reproducing features of a head-related transfer function of a listener without actually measuring the head-related transfer function of the listener.
According to one aspect of the present invention, there is provided a head-related transfer function generator including: an actually measured head-related impulse response acquiring unit configured to acquire data that represents an actually measured head-related impulse response of sound waves arriving at external auditory meatus entrances of a listener for training; an early head-related transfer function generating unit configured to calculate an initial head-related impulse response by applying a window function to the actually measured head-related impulse response and generate data representing an early head-related transfer function by performing a Fourier transform on the initial head-related impulse response; a frequency band dividing unit configured to divide the early head-related transfer function into a plurality of frequency bands; and a modeled head-related transfer function generating unit configured to execute a process of extracting a peak or a notch on the basis of curvature of the early head-related transfer function and a process of determining a relative amplitude for each of the plurality of frequency bands and generate data representing a modeled head-related transfer function of the listener for training by interpolating points representing the relative amplitudes.
The head-related transfer function generator according to one aspect of the present invention further includes: a pinna shape acquiring unit configured to acquire data that represents a shape of a pinna of the listener for training; a frequency band identifying unit configured to identify a first frequency band including a first notch having a lowest frequency among notches included in the modeled head-related transfer function of the listener for training and a second frequency band including a second notch having a second lowest frequency among the notches included in the modeled head-related transfer function of the listener for training; and a relation deriving unit configured to execute a first process of deriving a relation between a first scale having a correlation with a first probability corresponding to the first frequency band and the shape of the pinna of the listener for training for each of the plurality of frequency bands and execute a second process of deriving a relation between a second scale having a correlation with a second probability corresponding to the second frequency band and the shape of the pinna of the listener for training for each of the plurality of frequency bands.
According to one aspect of the present invention, in the head-related transfer function generator described above, the relation deriving unit is configured to calculate a first correlation matrix as the relation derived by the first process by executing a discriminant analysis having the shape of the pinna of the listener for training as an explanatory variable and having the plurality of frequency bands as objective variables in the first process and calculate a second correlation matrix as the relation derived by the second process by executing a discriminant analysis having the shape of the pinna of the listener for training as an explanatory variable and having the plurality of frequency bands as objective variables in the second process.
According to one aspect of the present invention, in the head-related transfer function generator described above, the relation deriving unit is configured to calculate the first scale using the first correlation matrix and the shape of the pinna of the listener for training, identify a frequency band having the highest first probability among the plurality of frequency bands as the first frequency band on the basis of the first scale, calculate the second scale using the second correlation matrix and the shape of the pinna of the listener for training, and identify a frequency band having the highest second probability among the plurality of frequency bands as the second frequency band on the basis of the second scale.
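As a non-limiting illustration, the identification of the frequency band having the highest probability may be sketched as follows. The sketch assumes that each row of the correlation matrix holds one weight per pinna dimension followed by a bias term, and that a softmax converts the scales into probabilities; neither assumption is stated in this description, and the names band_probabilities and most_probable_band are hypothetical.

```python
import math


def band_probabilities(matrix, pinna_features):
    """Return one probability per frequency band.

    matrix: one row per band; weights for each pinna dimension followed
            by a bias (assumed layout, not taken from the description).
    pinna_features: measured pinna dimensions.
    """
    # Scale of each band: weighted sum of the pinna dimensions plus a bias.
    scales = [
        sum(w * x for w, x in zip(row[:-1], pinna_features)) + row[-1]
        for row in matrix
    ]
    # Assumption: a softmax maps the scales to probabilities.
    top = max(scales)
    exps = [math.exp(s - top) for s in scales]
    total = sum(exps)
    return [e / total for e in exps]


def most_probable_band(matrix, pinna_features):
    """Index of the band with the highest probability."""
    probs = band_probabilities(matrix, pinna_features)
    return max(range(len(probs)), key=probs.__getitem__)
```

Because the softmax is monotonic, the band with the largest scale is also the band with the highest probability, which is why the scale only needs to have a correlation with the probability.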
According to one aspect of the present invention, in the head-related transfer function generator described above, the relation deriving unit is configured to derive a first learned model that has been caused to learn using training data having the shape of the pinna of the listener for training as a problem and having the first frequency band as an answer as the relation derived by the first process in the first process and derive a second learned model that has been caused to learn using training data having the shape of the pinna of the listener for training as a problem and having the second frequency band as an answer as the relation derived by the second process in the second process.
According to one aspect of the present invention, in the head-related transfer function generator described above, the relation deriving unit is configured to calculate the first scale using the first learned model and the shape of the pinna of the listener for training, identify a frequency band having the highest first probability among the plurality of frequency bands as the first frequency band on the basis of the first scale, calculate the second scale using the second learned model and the shape of the pinna of the listener for training, and identify a frequency band having the highest second probability among the plurality of frequency bands as the second frequency band on the basis of the second scale.
According to one aspect of the present invention, in the head-related transfer function generator described above, the relation deriving unit is further configured to execute at least one of a first correction process of re-identifying a frequency band having a second highest first probability as the first frequency band and a second correction process of re-identifying a frequency band having a second highest second probability as the second frequency band in a case in which the number of frequency bands present between the frequency band identified as the first frequency band and the frequency band identified as the second frequency band is equal to or smaller than a predetermined lower limit threshold or equal to or larger than a predetermined upper limit threshold.
According to one aspect of the present invention, in the head-related transfer function generator described above, the relation deriving unit is configured to execute the first correction process in a case in which the number of frequency bands present between the frequency band identified as the first frequency band and the frequency band identified as the second frequency band is equal to or smaller than the predetermined lower limit threshold or equal to or larger than the predetermined upper limit threshold, and a predetermined size of the pinna of the listener for training is smaller than a first threshold.
According to one aspect of the present invention, in the head-related transfer function generator described above, the relation deriving unit is configured to execute the second correction process in a case in which the number of frequency bands present between the frequency band identified as the first frequency band and the frequency band identified as the second frequency band is equal to or smaller than the predetermined lower limit threshold or equal to or larger than the predetermined upper limit threshold, and a predetermined size of the pinna of the listener for training exceeds a second threshold.
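As a non-limiting illustration, the first correction process and the second correction process may be sketched as follows. The function and parameter names are hypothetical, frequency bands are represented by integer indices, and the two corrections are evaluated independently of each other, which is an assumption of this sketch.

```python
def ranked_bands(probs):
    """Band indices sorted from most probable to least probable."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)


def identify_notch_bands(first_probs, second_probs, pinna_size,
                         lower_limit, upper_limit,
                         first_threshold, second_threshold):
    """Return (first band, second band) after the optional corrections."""
    first_rank = ranked_bands(first_probs)
    second_rank = ranked_bands(second_probs)
    first_band, second_band = first_rank[0], second_rank[0]

    # Number of frequency bands lying between the two identified bands.
    between = abs(second_band - first_band) - 1
    out_of_range = between <= lower_limit or between >= upper_limit

    if out_of_range and pinna_size < first_threshold:
        # First correction: use the band with the second highest
        # first probability.
        first_band = first_rank[1]
    if out_of_range and pinna_size > second_threshold:
        # Second correction: use the band with the second highest
        # second probability.
        second_band = second_rank[1]
    return first_band, second_band
```

In this sketch, a small pinna moves the first frequency band to its runner-up and a large pinna moves the second frequency band to its runner-up, mirroring the two conditions described above.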
The head-related transfer function generator according to one aspect of the present invention further includes: a pinna shape acquiring unit configured to acquire data that represents a shape of a pinna of the listener for training; a frequency band integrating unit configured to generate at least two integrated frequency bands acquired by integrating a plurality of the frequency bands; an integrated frequency band identifying unit configured to identify a first integrated frequency band including a first notch having a lowest frequency among notches included in the modeled head-related transfer function of the listener for training and a second integrated frequency band including a second notch having a second lowest frequency among notches included in the modeled head-related transfer function of the listener for training; and a relation deriving unit configured to execute a first process of deriving a relation between a first scale having a correlation with a first probability corresponding to the first integrated frequency band and the shape of the pinna of the listener for training for each of the plurality of integrated frequency bands and execute a second process of deriving a relation between a second scale having a correlation with a second probability corresponding to the second integrated frequency band and the shape of the pinna of the listener for training for each of the plurality of integrated frequency bands.
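As a non-limiting illustration, the integration of frequency bands may be sketched as follows, assuming that adjacent frequency bands are grouped into integrated frequency bands of equal size. The grouping rule and the names integrate_bands and integrated_band_of are assumptions of this sketch and are not specified in this description.

```python
def integrate_bands(num_bands, bands_per_group):
    """Group consecutive band indices into integrated frequency bands."""
    return [
        list(range(start, min(start + bands_per_group, num_bands)))
        for start in range(0, num_bands, bands_per_group)
    ]


def integrated_band_of(band_index, integrated_bands):
    """Index of the integrated frequency band containing band_index."""
    for i, members in enumerate(integrated_bands):
        if band_index in members:
            return i
    raise ValueError("band index is outside every integrated band")
```

With such a grouping, a notch located in a given frequency band is assigned to the integrated frequency band that contains that frequency band.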
According to one aspect of the present invention, in the head-related transfer function generator described above, the relation deriving unit is configured to calculate a first correlation matrix as the relation derived by the first process by executing a discriminant analysis having the shape of the pinna of the listener for training as an explanatory variable and having the plurality of integrated frequency bands as objective variables in the first process and calculate a second correlation matrix as the relation derived by the second process by executing a discriminant analysis having the shape of the pinna of the listener for training as an explanatory variable and having the plurality of integrated frequency bands as objective variables in the second process.
According to one aspect of the present invention, in the head-related transfer function generator described above, the relation deriving unit is configured to calculate the first scale using the first correlation matrix and the shape of the pinna of the listener for training, identify an integrated frequency band having the highest first probability among the plurality of integrated frequency bands as the first integrated frequency band on the basis of the first scale, calculate the second scale using the second correlation matrix and the shape of the pinna of the listener for training, and identify an integrated frequency band having the highest second probability among the plurality of integrated frequency bands as the second integrated frequency band on the basis of the second scale.
According to one aspect of the present invention, in the head-related transfer function generator described above, the relation deriving unit is configured to derive a first learned model that has been caused to learn using training data having the shape of the pinna of the listener for training as a problem and having the first integrated frequency band as an answer as the relation derived by the first process in the first process and derive a second learned model that has been caused to learn using training data having the shape of the pinna of the listener for training as a problem and having the second integrated frequency band as an answer as the relation derived by the second process in the second process.
According to one aspect of the present invention, in the head-related transfer function generator described above, the relation deriving unit is configured to calculate the first scale using the first learned model and the shape of the pinna of the listener for training, identify an integrated frequency band having the highest first probability among the plurality of integrated frequency bands as the first integrated frequency band on the basis of the first scale, calculate the second scale using the second learned model and the shape of the pinna of the listener for training, and identify an integrated frequency band having the highest second probability among the plurality of integrated frequency bands as the second integrated frequency band on the basis of the second scale.
According to one aspect of the present invention, in the head-related transfer function generator, the pinna shape acquiring unit is further configured to acquire data that represents a shape of a pinna of a listener for inference, the head-related transfer function generator may further include a frequency band estimating unit configured to execute a third process of calculating a third scale having a correlation with a third probability corresponding to a third frequency band including a first notch having a lowest frequency among notches included in an individualized head-related transfer function of the listener for inference using the shape of the pinna of the listener for inference and the first correlation matrix and estimating a frequency band having the highest third probability as the third frequency band for each of the plurality of frequency bands and execute a fourth process of calculating a fourth scale having a correlation with a fourth probability corresponding to a fourth frequency band including a second notch having a second lowest frequency among the notches included in the individualized head-related transfer function of the listener for inference using the shape of the pinna of the listener for inference and the second correlation matrix and estimating a frequency band having the highest fourth probability as the fourth frequency band for each of the plurality of frequency bands.
According to one aspect of the present invention, in the head-related transfer function generator, the pinna shape acquiring unit is configured to acquire data representing a shape of a pinna of a listener for inference, the head-related transfer function generator may further include a frequency band estimating unit configured to execute a third process of calculating a third scale having a correlation with a third probability corresponding to a third frequency band including a first notch having a lowest frequency among notches included in an individualized head-related transfer function of the listener for inference using the shape of the pinna of the listener for inference and the first learned model and estimating a frequency band having the highest third probability as the third frequency band for each of the plurality of frequency bands and execute a fourth process of calculating a fourth scale having a correlation with a fourth probability corresponding to a fourth frequency band including a second notch having a second lowest frequency among the notches included in the individualized head-related transfer function of the listener for inference using the shape of the pinna of the listener for inference and the second learned model and estimating a frequency band having the highest fourth probability as the fourth frequency band for each of the plurality of frequency bands.
According to one aspect of the present invention, in the head-related transfer function generator described above, the frequency band estimating unit is further configured to execute at least one of a third correction process of re-estimating a frequency band having a second highest third probability as the third frequency band and a fourth correction process of re-estimating a frequency band having a second highest fourth probability as the fourth frequency band in a case in which the number of frequency bands present between the frequency band estimated as the third frequency band and the frequency band estimated as the fourth frequency band is equal to or smaller than a predetermined lower limit threshold or equal to or larger than a predetermined upper limit threshold.
According to one aspect of the present invention, in the head-related transfer function generator described above, the frequency band estimating unit is configured to execute the third correction process in a case in which the number of frequency bands present between the frequency band estimated as the third frequency band and the frequency band estimated as the fourth frequency band is equal to or smaller than a predetermined lower limit threshold or equal to or larger than a predetermined upper limit threshold, and a predetermined size of the pinna of the listener for inference is smaller than a third threshold.
According to one aspect of the present invention, in the head-related transfer function generator described above, the frequency band estimating unit is further configured to execute the fourth correction process in a case in which the number of frequency bands present between the frequency band estimated as the third frequency band and the frequency band estimated as the fourth frequency band is equal to or smaller than a predetermined lower limit threshold or equal to or larger than a predetermined upper limit threshold, and a predetermined size of the pinna of the listener for inference exceeds a fourth threshold.
According to one aspect of the present invention, in the head-related transfer function generator described above, the frequency band estimating unit may further include an individualized head-related transfer function generating unit configured to generate an individualized head-related transfer function of the listener for inference using results of estimation of the third frequency band and the fourth frequency band acquired by the frequency band estimating unit.
According to one aspect of the present invention, in the head-related transfer function generator described above, the pinna shape acquiring unit is further configured to acquire data that represents a shape of a pinna of a listener for inference, the head-related transfer function generator may further include an integrated frequency band estimating unit configured to execute a third process of calculating a third scale having a correlation with a third probability corresponding to a third integrated frequency band including a first notch having a lowest frequency among notches included in an individualized head-related transfer function of the listener for inference using the shape of the pinna of the listener for inference and the first correlation matrix and estimating an integrated frequency band having the highest third probability as the third integrated frequency band for each of the plurality of integrated frequency bands and execute a fourth process of calculating a fourth scale having a correlation with a fourth probability corresponding to a fourth integrated frequency band including a second notch having a second lowest frequency among the notches included in the individualized head-related transfer function of the listener for inference using the shape of the pinna of the listener for inference and the second correlation matrix and estimating an integrated frequency band having the highest fourth probability as the fourth integrated frequency band for each of the plurality of integrated frequency bands.
According to one aspect of the present invention, in the head-related transfer function generator described above, the pinna shape acquiring unit is further configured to acquire data that represents a shape of a pinna of a listener for inference, the head-related transfer function generator may further include an integrated frequency band estimating unit configured to execute a third process of calculating a third scale having a correlation with a third probability corresponding to a third integrated frequency band including a first notch having a lowest frequency among notches included in an individualized head-related transfer function of the listener for inference using the shape of the pinna of the listener for inference and the first learned model and estimating an integrated frequency band having the highest third probability as the third integrated frequency band for each of the plurality of integrated frequency bands and execute a fourth process of calculating a fourth scale having a correlation with a fourth probability corresponding to a fourth integrated frequency band including a second notch having a second lowest frequency among the notches included in the individualized head-related transfer function of the listener for inference using the shape of the pinna of the listener for inference and the second learned model and estimating an integrated frequency band having the highest fourth probability as the fourth integrated frequency band for each of the plurality of integrated frequency bands.
According to one aspect of the present invention, there is provided a head-related transfer function generation program causing a computer to execute: acquiring data that represents an actually measured head-related impulse response of sound waves arriving at external auditory meatus entrances of a listener for training; calculating an initial head-related impulse response by applying a window function to the actually measured head-related impulse response and generating data representing an early head-related transfer function by performing a Fourier transform on the initial head-related impulse response; dividing the early head-related transfer function into a plurality of frequency bands; and executing a process of extracting a peak or a notch on the basis of curvature of the early head-related transfer function and a process of determining a relative amplitude for each of the plurality of frequency bands and generating data representing a modeled head-related transfer function of the listener for training by interpolating points representing the relative amplitudes.
According to one aspect of the present invention, there is provided a head-related transfer function generation method including: acquiring data that represents an actually measured head-related impulse response of sound waves arriving at external auditory meatus entrances of a listener for training; calculating an initial head-related impulse response by applying a window function to the actually measured head-related impulse response and generating data representing an early head-related transfer function by performing a Fourier transform on the initial head-related impulse response; dividing the early head-related transfer function into a plurality of frequency bands; and executing a process of extracting a peak or a notch on the basis of curvature of the early head-related transfer function and a process of determining a relative amplitude for each of the plurality of frequency bands and generating data representing a modeled head-related transfer function of the listener for training by interpolating points representing the relative amplitudes.
According to the present invention, a head-related transfer function generator, a head-related transfer function generation program, and a head-related transfer function generation method capable of acquiring a head-related transfer function that reproduces features of the head-related transfer function of a listener without actually measuring the head-related transfer function of the listener can be provided.
First, a trunnion coordinate system used in describing a head-related transfer function generator according to an embodiment will be described with reference to
The trunnion coordinate system illustrated in
Next, hardware composing the head-related transfer function generator according to an embodiment will be described with reference to
The processor 11 is, for example, a central processing unit (CPU) and realizes each function of the head-related transfer function generator 1 by reading and executing a head-related transfer function generation program. In addition, the processor 11 may realize functions required for realizing each function of the head-related transfer function generator 1 by reading and executing a program other than the head-related transfer function generation program.
The main storage device 12 is, for example, a random access memory (RAM) and stores a head-related transfer function generation program and other programs, which are read and executed by the processor 11, in advance.
The communication interface 13 is an interface circuit that is used for communicating with other devices through a network. For example, the network is the Internet, an intranet, a wide area network (WAN), or a local area network (LAN).
For example, the auxiliary storage device 14 is a hard disk drive (HDD), a solid state drive (SSD), a flash memory, or a read only memory (ROM).
For example, the input/output device 15 is an input/output port. For example, a mouse 151, a keyboard 152, and a display 153 illustrated in
The bus 16 connects the processor 11, the main storage device 12, the communication interface 13, the auxiliary storage device 14, and the input/output device 15 such that data thereof can be transmitted and received.
Next, the functional configuration of the head-related transfer function generator according to the embodiment will be described with reference to
The actually measured head-related impulse response acquiring unit 101 acquires data that represents an actually measured head-related impulse response of sound waves that have arrived at the external auditory meatus entrance of a listener for training.
The actually measured head-related impulse response is transformed into an actually measured head-related transfer function through a Fourier transform. The head-related transfer function (HRTF) represents, in the frequency domain, the changes in the physical characteristics of sound waves that occur under the influence of the head of the listener and its periphery while the sound waves travel from a sound source to the external auditory meatus entrances of the listener. The actually measured head-related transfer function is a head-related transfer function generated by actually measuring sound waves.
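As a non-limiting illustration, the transformation of a sampled head-related impulse response into the amplitude of a head-related transfer function may be sketched as follows. The sketch evaluates the discrete Fourier transform directly, which is adequate for short responses; a fast Fourier transform routine would normally be used in practice. The names hrir_to_hrtf_db and bin_frequencies are hypothetical.

```python
import cmath
import math


def hrir_to_hrtf_db(hrir):
    """Amplitude of the one-sided discrete Fourier transform, in dB."""
    n = len(hrir)
    magnitudes_db = []
    for k in range(n // 2 + 1):
        # Direct evaluation of the DFT sum for frequency bin k.
        value = sum(
            sample * cmath.exp(-2j * math.pi * k * i / n)
            for i, sample in enumerate(hrir)
        )
        # A small floor avoids taking the logarithm of zero.
        magnitudes_db.append(20.0 * math.log10(max(abs(value), 1e-12)))
    return magnitudes_db


def bin_frequencies(n, sample_rate):
    """Frequency in Hz of each bin returned by hrir_to_hrtf_db."""
    return [k * sample_rate / n for k in range(n // 2 + 1)]
```

For a unit impulse, every bin has an amplitude of 0 dB, which provides a simple check of the transform.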
In
As illustrated in
In addition, the head-related transfer function of a listener in a case in which a sound source is located in a specific direction causes the listener to perceive a sound image located in the specific direction. The sound image is the whole of what a listener perceives in a case in which sound waves arrive at an eardrum of the listener, and is a psychological image formed by the listener through that perception. For example, the sound image includes temporal properties such as a feeling of reverberation, a sense of rhythm, and a feeling of sustain, spatial properties such as a feeling of direction, a feeling of distance, and a feeling of spaciousness, and quality properties such as loudness, pitch, and timbre. The listener's perception of a spatial position of a sound image will be referred to as sound image localization.
Because the head-related transfer function causes a listener to perceive a sound image, appropriately reproducing the head-related transfer function is key to realizing a three-dimensional acoustic system, virtual reality of sounds, and the like. However, the difference in the head-related transfer function between listeners is a hurdle to realizing such technologies.
The early head-related transfer function generating unit 102 calculates an initial head-related impulse response by applying a window function to the actually measured head-related impulse response. The window function described here is, for example, a Blackman-Harris window and extracts only the period from a maximum peak of a relative intensity included in the actually measured head-related impulse response until a predetermined time elapses.
Then, the early head-related transfer function generating unit 102 performs a Fourier transform on the initial head-related impulse response, thereby generating data representing the early head-related transfer function.
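As a non-limiting illustration, the extraction of the initial head-related impulse response may be sketched as follows. The sketch assumes that the falling half of a four-term Blackman-Harris window is placed at the maximum peak, so that samples before the peak and samples after the predetermined period become zero; the exact placement of the window is not specified in this description. The names blackman_harris, early_hrir, and keep_samples are hypothetical.

```python
import math

# Coefficients of the four-term Blackman-Harris window.
A0, A1, A2, A3 = 0.35875, 0.48829, 0.14128, 0.01168


def blackman_harris(length):
    """Symmetric four-term Blackman-Harris window of the given length."""
    if length == 1:
        return [1.0]
    return [
        A0
        - A1 * math.cos(2 * math.pi * i / (length - 1))
        + A2 * math.cos(4 * math.pi * i / (length - 1))
        - A3 * math.cos(6 * math.pi * i / (length - 1))
        for i in range(length)
    ]


def early_hrir(hrir, keep_samples):
    """Keep keep_samples samples starting at the maximum peak."""
    peak = max(range(len(hrir)), key=lambda i: abs(hrir[i]))
    # Falling half of a window of odd length 2 * keep_samples - 1: the
    # weight is 1 at the peak and decays towards the end of the period.
    taper = blackman_harris(2 * keep_samples - 1)[keep_samples - 1:]
    out = [0.0] * len(hrir)
    for offset, weight in enumerate(taper):
        if peak + offset < len(hrir):
            out[peak + offset] = hrir[peak + offset] * weight
    return out
```

The output has the same length as the input, so it can be passed to a Fourier transform without any change of frequency resolution.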
The frequency band dividing unit 103 divides the early head-related transfer function into a plurality of frequency bands. For example, the frequency band dividing unit 103 divides the early head-related transfer function denoted by the solid line in
The modeled head-related transfer function generating unit 104 executes a process of extracting a peak or a notch on the basis of the curvature of the early head-related transfer function for each of a plurality of frequency bands. A peak represents an upwardly convex part in the head-related transfer function. A notch represents a downwardly convex part in the head-related transfer function.
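As a non-limiting illustration, the extraction of peaks and notches on the basis of curvature may be sketched as follows. The sketch uses the discrete second difference of the amplitude as a stand-in for curvature, and the parameter min_curvature is a hypothetical threshold; this description does not specify how the curvature is calculated.

```python
def find_peaks_and_notches(amplitude_db, min_curvature=0.0):
    """Return (peak indices, notch indices) of a sampled amplitude curve."""
    peaks, notches = [], []
    for i in range(1, len(amplitude_db) - 1):
        left = amplitude_db[i - 1]
        mid = amplitude_db[i]
        right = amplitude_db[i + 1]
        # Discrete second difference, used here in place of curvature.
        curvature = left - 2.0 * mid + right
        if mid > left and mid > right and curvature < -min_curvature:
            peaks.append(i)    # upwardly convex part
        elif mid < left and mid < right and curvature > min_curvature:
            notches.append(i)  # downwardly convex part
    return peaks, notches
```

Raising min_curvature discards shallow extrema, so that only sharply convex parts are retained as peaks or notches.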
Next, the modeled head-related transfer function generating unit 104 executes a process of determining a relative amplitude on the basis of the curvature of the early head-related transfer function for each of the plurality of frequency bands. For example, the modeled head-related transfer function generating unit 104, first, searches for inflection points included in each frequency band. In a case in which one inflection point is found in the frequency band, the modeled head-related transfer function generating unit 104 determines a relative amplitude represented by the inflection point as the relative amplitude of the frequency band. On the other hand, in a case in which two or more inflection points are found in the frequency band, the modeled head-related transfer function generating unit 104 determines a maximum relative amplitude among relative amplitudes represented by such inflection points as the relative amplitude of the frequency band. In addition, in a case in which no inflection point is found in the frequency band, the modeled head-related transfer function generating unit 104 determines a relative amplitude at the center frequency of the frequency band as the relative amplitude of the frequency band.
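As a non-limiting illustration, the determination of the relative amplitude of each frequency band may be sketched as follows. The sketch assumes that an inflection point is a sample at which the discrete second difference changes sign, that the band edges are given in Hz, and that the center frequency of a band is the geometric mean of its edges; these assumptions and the names inflection_points and band_amplitudes are not taken from this description.

```python
def inflection_points(amplitude_db):
    """Indices at which the discrete second difference changes sign."""
    second = [
        amplitude_db[i - 1] - 2.0 * amplitude_db[i] + amplitude_db[i + 1]
        for i in range(1, len(amplitude_db) - 1)
    ]
    # second[j] belongs to sample j + 1 of amplitude_db.
    return [
        j + 1
        for j in range(1, len(second))
        if second[j - 1] * second[j] < 0
    ]


def band_amplitudes(freqs, amplitude_db, edges):
    """Return one (center frequency, relative amplitude) point per band."""
    marks = inflection_points(amplitude_db)
    points = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        inside = [amplitude_db[i] for i in marks if lo <= freqs[i] < hi]
        centre = (lo * hi) ** 0.5  # geometric mean (assumed center)
        if inside:
            # One inflection point: its amplitude; several: the maximum.
            level = max(inside)
        else:
            # No inflection point: amplitude nearest the center frequency.
            nearest = min(range(len(freqs)),
                          key=lambda i: abs(freqs[i] - centre))
            level = amplitude_db[nearest]
        points.append((centre, level))
    return points
```

The returned points are the points representing the relative amplitudes of the frequency bands, ready to be joined by interpolation.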
Then, the modeled head-related transfer function generating unit 104 interpolates points representing relative amplitudes of the frequency bands, thereby generating data representing a modeled head-related transfer function of the listener. For example, the modeled head-related transfer function generating unit 104 joins such points using line segments, thereby generating data representing the modeled head-related transfer function denoted by a broken line in
In addition, the modeled head-related transfer function generating unit 104 reproduces the early head-related transfer function with different accuracies in accordance with a width of the frequency band set by the frequency band dividing unit 103. Next, a relation between the width of frequency bands and the reproduction accuracy of the early head-related transfer function will be described with reference to
The reproduction accuracy of the early head-related transfer function using the modeled head-related transfer function generating unit 104 has an influence on listener's sound image localization. Thus, the influence of the reproduction accuracy of the early head-related transfer function on listener's sound image localization will be described with reference to
Thus, the width of frequency bands set by the frequency band dividing unit 103 is preferably 1/12 octave to 1/3 octave and is more preferably 1/12 octave to 1/6 octave. In accordance with this, the peak P2, the peak P3, the first notch N1 and the second notch N2 illustrated in
Next, a process for deriving a relation between a frequency band including a first notch and the shape of a pinna of a listener for training and a process for deriving a relation between a frequency band including a second notch and the shape of the pinna of the listener for training using the head-related transfer function generator 1 will be described.
The pinna shape acquiring unit 105 acquires data that represents the shape of the pinna of a listener.
For example, the pinna shape acquiring unit 105 acquires data that represents the coordinates of a point p1 to a point p10 illustrated in
The frequency band identifying unit 106 identifies a first frequency band including a first notch and a second frequency band including a second notch. The first notch is a notch having a lowest frequency among notches included in a modeled head-related transfer function of the listener for training. The second notch is a notch having a second lowest frequency among the notches included in the modeled head-related transfer function of the listener for training.
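The identification of the two bands might be sketched as below. The sketch assumes that a notch can be detected as a strict local minimum of the sampled modeled head-related transfer function and that the frequency bands are given as a list of ascending edge frequencies. The function names and all numerical values are hypothetical.

```python
def notch_indices(amps):
    """Indices of strict local minima (downwardly convex parts) of the curve."""
    return [i for i in range(1, len(amps) - 1)
            if amps[i] < amps[i - 1] and amps[i] < amps[i + 1]]


def band_of(freq_hz, band_edges_hz):
    """Index b of the band [edges[b], edges[b+1]) containing freq_hz, else None."""
    for b in range(len(band_edges_hz) - 1):
        if band_edges_hz[b] <= freq_hz < band_edges_hz[b + 1]:
            return b
    return None


def first_and_second_notch_bands(freqs_hz, amps, band_edges_hz):
    """Bands containing the lowest and the second lowest notch frequencies."""
    notches = sorted(freqs_hz[i] for i in notch_indices(amps))
    if len(notches) < 2:
        raise ValueError("fewer than two notches found")
    return band_of(notches[0], band_edges_hz), band_of(notches[1], band_edges_hz)


# Hypothetical modeled head-related transfer function and band edges.
freqs_hz = [4000, 5000, 6000, 7000, 8000, 9000, 10000, 11000, 12000]
amps_db = [5, 8, -10, 3, 6, -15, 2, -4, 0]
band_edges_hz = [4000, 5500, 7000, 8500, 10000, 12500]
first_band, second_band = first_and_second_notch_bands(freqs_hz, amps_db, band_edges_hz)
```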
The relation deriving unit 107 executes a first process of deriving a relation between a first scale having a correlation with a first probability corresponding to a first frequency band and the shape of the pinna of the listener for training for each of a plurality of frequency bands.
For example, in the first process, the relation deriving unit 107 executes a discriminant analysis having the shape of the pinna of the listener for training as an explanatory variable and having a plurality of frequency bands as objective variables, thereby calculating a first correlation matrix as a relation derived by the first process. The first correlation matrix is calculated for each frequency band. In addition, in this case, the first scale is a Mahalanobis distance or a value calculated using the Mahalanobis distance. The Mahalanobis distance is a product of a row vector in which parameters relating to the shape of the pinna of the listener for training are aligned, the first correlation matrix, and a column vector in which the parameters relating to the shape of the pinna of the listener for training are aligned.
In addition, the relation deriving unit 107 calculates a first scale using the first correlation matrix and the shape of the pinna of the listener for training and identifies a frequency band having the highest first probability among the plurality of frequency bands as a first frequency band on the basis of the first scale.
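A minimal sketch of this selection is given below. It assumes that the first scale is the squared Mahalanobis distance, computed as the quadratic form described above, and that a smaller distance corresponds to a higher first probability, as stated later for Equation (1). The two-parameter vector and the per-band matrices are hypothetical values chosen for readability; the embodiment uses ten pinna parameters.

```python
def quadratic_form(x, matrix):
    """Return x^T * matrix * x for a vector x and a square matrix (list of rows)."""
    n = len(x)
    return sum(x[j] * matrix[j][k] * x[k] for j in range(n) for k in range(n))


def most_probable_band(x, matrix_per_band):
    """Pick the band whose squared Mahalanobis distance for x is smallest."""
    distances = {band: quadratic_form(x, m) for band, m in matrix_per_band.items()}
    return min(distances, key=distances.get), distances


# Hypothetical pinna parameters and one matrix per candidate frequency band.
pinna_parameters = [1.0, 2.0]
matrix_per_band = {
    0: [[1.0, 0.0], [0.0, 1.0]],
    1: [[2.0, -1.0], [-1.0, 1.0]],
    2: [[0.5, 0.0], [0.0, 2.0]],
}
best_band, squared_distances = most_probable_band(pinna_parameters, matrix_per_band)
```

The same routine applies unchanged to the second scale when the second correlation matrix is supplied for each band.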
Furthermore, the relation deriving unit 107 executes a second process of deriving a relation between a second scale, which has a correlation with a second probability corresponding to a second frequency band, and the shape of the pinna of the listener for training for each of the plurality of frequency bands.
For example, in the second process, the relation deriving unit 107 executes a discriminant analysis having the shape of the pinna of the listener for training as an explanatory variable and having a plurality of frequency bands as objective variables, thereby calculating a second correlation matrix as a relation derived by the second process. The second correlation matrix is calculated for each frequency band. In addition, in this case, the second scale is a Mahalanobis distance or a value calculated using the Mahalanobis distance. The Mahalanobis distance is a product of a row vector in which parameters relating to the shape of the pinna of the listener for training are aligned, the second correlation matrix, and a column vector in which the parameters relating to the shape of the pinna of the listener for training are aligned.
In addition, the relation deriving unit 107 calculates a second scale using the second correlation matrix and the shape of the pinna of the listener for training and identifies a frequency band having the highest second probability among the plurality of frequency bands as a second frequency band on the basis of the second scale.
In addition, in a case in which the number of frequency bands present between a frequency band identified as the first frequency band and a frequency band identified as the second frequency band is equal to or smaller than a predetermined lower limit threshold or equal to or larger than a predetermined upper limit threshold, the relation deriving unit 107 may execute at least one of a first correction process and a second correction process. For example, the predetermined lower limit threshold described here is “3”. In addition, for example, the predetermined upper limit threshold described here is “8”. The first correction process is a process of re-identifying a frequency band having a second highest first probability as a first frequency band. In addition, the second correction process is a process of re-identifying a frequency band having a second highest second probability as a second frequency band.
Both the frequency band in which the first notch is included and the frequency band in which the second notch is included are over a range of about one octave, and parts thereof overlap each other. For this reason, by executing at least one of the first correction process and the second correction process, the relation deriving unit 107 can identify the first frequency band and the second frequency band with higher accuracy.
In addition, in a case in which the number of frequency bands present between a frequency band identified as the first frequency band and a frequency band identified as the second frequency band is equal to or smaller than a predetermined lower limit threshold or equal to or larger than a predetermined upper limit threshold, and a predetermined size of the pinna of the listener for training is smaller than a first threshold, the relation deriving unit 107 may execute the first correction process. The reason for this is that, in a case in which the pinna of the listener is small, the frequency band identified first as the first frequency band is incorrect in many cases.
Furthermore, in a case in which the number of frequency bands present between a frequency band identified as the first frequency band and a frequency band identified as the second frequency band is equal to or smaller than a predetermined lower limit threshold or equal to or larger than a predetermined upper limit threshold, and a predetermined size of the pinna of the listener for training exceeds a second threshold, the relation deriving unit 107 may execute the second correction process. The reason for this is that, in a case in which the pinna of the listener is large, the frequency band identified first as the second frequency band is incorrect in many cases.
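The conditions under which the first and second correction processes are executed might be sketched as follows. Several points are assumptions made only for this sketch: the number of frequency bands between two bands is counted as the bands lying strictly between them, at most one correction is applied per call, and the pinna size and the first and second thresholds are hypothetical values. The limits 3 and 8 are the example values given above.

```python
LOWER_LIMIT = 3  # example lower limit threshold from the text
UPPER_LIMIT = 8  # example upper limit threshold from the text


def bands_between(first_band, second_band):
    """Number of bands lying strictly between two band indices (an assumption)."""
    return max(abs(second_band - first_band) - 1, 0)


def apply_corrections(first_ranked, second_ranked, pinna_size,
                      first_threshold, second_threshold):
    """first_ranked and second_ranked list band indices from most to least probable."""
    first_band, second_band = first_ranked[0], second_ranked[0]
    gap = bands_between(first_band, second_band)
    if LOWER_LIMIT < gap < UPPER_LIMIT:
        return first_band, second_band      # spacing is plausible, keep both bands
    if pinna_size < first_threshold:
        first_band = first_ranked[1]        # first correction process
    elif pinna_size > second_threshold:
        second_band = second_ranked[1]      # second correction process
    return first_band, second_band


# Hypothetical rankings: the top candidates are 9 bands apart, which is too far.
small_pinna = apply_corrections([36, 40], [46, 43], 55.0, 60.0, 70.0)
large_pinna = apply_corrections([36, 40], [46, 43], 75.0, 60.0, 70.0)
medium_pinna = apply_corrections([36, 40], [46, 43], 65.0, 60.0, 70.0)
```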
Next, a process in which the head-related transfer function generator 1 estimates a frequency band including a first notch of the individualized head-related transfer function of a listener for inference and a frequency band including a second notch of the individualized head-related transfer function of the listener for inference using the shape of the pinna of the listener for inference, the relation derived by the first process, and the relation derived by the second process will be described.
The pinna shape acquiring unit 105 acquires data that represents the shape of the pinna of the listener for inference. For example, the data is data that is similar to the data described with reference to
The frequency band estimating unit 108 executes a third process. More specifically, the frequency band estimating unit 108 calculates a third scale that has a correlation with a third probability corresponding to a third frequency band including a first notch having the lowest frequency among notches included in the individualized head-related transfer function of the listener for inference using the shape of the pinna of the listener for inference and the first correlation matrix for each of a plurality of frequency bands. Then, the frequency band estimating unit 108 estimates the frequency band having the highest third probability as a third frequency band.
In addition, the frequency band estimating unit 108 executes a fourth process. More specifically, the frequency band estimating unit 108 calculates a fourth scale having a correlation with a fourth probability corresponding to a fourth frequency band including a second notch having a second lowest frequency among notches included in the individualized head-related transfer function of the listener for inference using the shape of the pinna of the listener for inference and the second correlation matrix for each of a plurality of frequency bands. Then, the frequency band estimating unit 108 estimates a frequency band having the highest fourth probability as a fourth frequency band.
For example, in a case in which the third frequency band and the fourth frequency band are estimated, the frequency band estimating unit 108 uses the following Equation (1). Equation (1) represents that a product of a row vector having parameters x1, x2, x3, x4, x5, x6, x7, x8, x9, and x10 representing shapes of the pinna of the listener for inference as its elements, an inverse matrix of a correlation matrix having a correlation coefficient rj,k of xj (here, j=1, 2, 3, ..., 10) and xk (here, k=1, 2, 3, ..., 10) as its elements, and a column vector having the same parameters as its elements is equal to the square of the Mahalanobis distance D. The inverse matrix of the matrix included in Equation (1) is an example of the first correlation matrix and the second correlation matrix described above. In addition, the Mahalanobis distance included in Equation (1) is an example of the first scale and the second scale described above. For example, the frequency band estimating unit 108 estimates the frequency band for which the Mahalanobis distance calculated using the first correlation matrix is a minimum as the third frequency band and estimates the frequency band for which the Mahalanobis distance calculated using the second correlation matrix is a minimum as the fourth frequency band.
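Equation (1) is not reproduced in this text. Based solely on the verbal description above, it may be written as follows; placing the inverse matrix between the row vector and the column vector is the conventional form of a quadratic form and is assumed here.

```latex
D^{2} =
\begin{pmatrix} x_{1} & x_{2} & \cdots & x_{10} \end{pmatrix}
\begin{pmatrix}
r_{1,1}  & r_{1,2}  & \cdots & r_{1,10}  \\
r_{2,1}  & r_{2,2}  & \cdots & r_{2,10}  \\
\vdots   & \vdots   & \ddots & \vdots    \\
r_{10,1} & r_{10,2} & \cdots & r_{10,10}
\end{pmatrix}^{-1}
\begin{pmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{10} \end{pmatrix}
\tag{1}
```

Here r_{j,k} is the correlation coefficient of x_j and x_k, and D is the Mahalanobis distance.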
In addition, in a case in which the number of frequency bands present between a frequency band estimated as the third frequency band and a frequency band estimated as the fourth frequency band is equal to or smaller than a predetermined lower limit threshold or equal to or larger than a predetermined upper limit threshold, the frequency band estimating unit 108 may execute at least one of a third correction process and a fourth correction process. For example, the predetermined lower limit threshold described here is “3”. In addition, for example, the predetermined upper limit threshold described here is “8”. The third correction process is a process of re-estimating a frequency band having a second highest third probability as a third frequency band. In addition, the fourth correction process is a process of re-estimating a frequency band having a second highest fourth probability as a fourth frequency band. For example, because the frequency band with the minimum Mahalanobis distance is the one estimated first, the frequency band estimating unit 108 re-estimates the frequency band for which the Mahalanobis distance calculated using Equation (1) is the second smallest as the third frequency band or the fourth frequency band.
Also for the individualized head-related transfer function, similar to the modeled head-related transfer function, both the frequency band in which the first notch is included and the frequency band in which the second notch is included are over a range of about one octave, and parts thereof overlap each other. For this reason, by executing at least one of the third correction process and the fourth correction process, the frequency band estimating unit 108 can identify the third frequency band and the fourth frequency band with higher accuracy.
In addition, in a case in which the number of frequency bands present between a frequency band estimated as the third frequency band and a frequency band estimated as the fourth frequency band is equal to or smaller than a predetermined lower limit threshold or equal to or larger than a predetermined upper limit threshold, and a predetermined size of the pinna of the listener for inference is smaller than a third threshold, the frequency band estimating unit 108 may execute the third correction process. The reason for this is that, in a case in which the pinna of the listener for inference is small, the frequency band estimated first as the third frequency band is incorrect in many cases.
Furthermore, in a case in which the number of frequency bands present between a frequency band estimated as the third frequency band and a frequency band estimated as the fourth frequency band is equal to or smaller than a predetermined lower limit threshold or equal to or larger than a predetermined upper limit threshold, and a predetermined size of the pinna of the listener for inference exceeds a fourth threshold, the frequency band estimating unit 108 may execute the fourth correction process. The reason for this is that, in a case in which the pinna of the listener for inference is large, the frequency band estimated first as the fourth frequency band is incorrect in many cases.
Next, an example of a process in which the head-related transfer function generator 1 generates an individualized head-related transfer function and an individualized head-related impulse response will be described with reference to
The individualized head-related transfer function generating unit 109 generates an individualized head-related transfer function of a listener for inference using results of estimation of the third frequency band and the fourth frequency band performed by the frequency band estimating unit 108.
More specifically, as illustrated in
Then, the individualized head-related transfer function generating unit 109 interpolates, for example, a point representing a frequency and a relative amplitude of a first peak, a point representing a center frequency and a relative amplitude of a third frequency band, a point representing a frequency and a relative amplitude of a second peak, and a point representing a center frequency and a relative amplitude of a fourth frequency band through linear interpolation or the like, thereby generating an individualized head-related transfer function of the listener for inference. The first peak is a peak that appears in a frequency area lower than the first notch. The second peak is a peak that appears in a frequency area higher than the first notch and lower than the second notch. The individualized head-related transfer function generating unit 109 outputs data representing the individualized head-related transfer function to the individualized head-related impulse response generating unit 110 and outside of the head-related transfer function generator 1.
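The linear interpolation through the four points might be sketched as follows. The frequencies and relative amplitudes are hypothetical, the function name `interpolate_linear` is introduced here, and holding the end values constant outside the range of the points is an assumption; the points are assumed to have distinct frequencies.

```python
def interpolate_linear(points, freq_hz):
    """Piecewise-linear interpolation through (frequency, amplitude) points."""
    pts = sorted(points)
    if freq_hz <= pts[0][0]:
        return pts[0][1]                    # below the first point: hold its value
    for (f0, a0), (f1, a1) in zip(pts, pts[1:]):
        if freq_hz <= f1:
            return a0 + (a1 - a0) * (freq_hz - f0) / (f1 - f0)
    return pts[-1][1]                       # above the last point: hold its value


# Hypothetical (frequency in Hz, relative amplitude in dB) points.
points = [
    (4000.0, 10.0),    # first peak
    (7000.0, -20.0),   # center frequency of the third frequency band (first notch)
    (9000.0, 5.0),     # second peak
    (11000.0, -15.0),  # center frequency of the fourth frequency band (second notch)
]
between_peak_and_notch = interpolate_linear(points, 5500.0)
between_notch_and_peak = interpolate_linear(points, 8000.0)
```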
As illustrated in
Next, a case in which the head-related transfer function generator according to the embodiment generates and uses an integrated frequency band acquired by integrating at least two frequency bands described above will be described with reference to
The frequency band integrating unit 106a generates at least two integrated frequency bands acquired by integrating a plurality of frequency bands.
For example, the frequency band integrating unit 106a selects a frequency band denoted by a number “42” in
For example, the frequency band integrating unit 106a selects a frequency band denoted by a number “45” in
In addition, for example, the frequency band integrating unit 106a selects a frequency band denoted by a number “48” in
In addition, for example, the frequency band integrating unit 106a selects a frequency band denoted by a number “51” in
All the integrated frequency bands illustrated in
Each center frequency illustrated in
The integrated frequency band identifying unit 107a identifies a first integrated frequency band that includes a first notch and a second integrated frequency band that includes a second notch.
The relation deriving unit 108a executes a first process of deriving a relation between a first scale having a correlation with a first probability corresponding to the first integrated frequency band and the shape of the pinna of the listener for training for each of a plurality of integrated frequency bands.
For example, in the first process, the relation deriving unit 108a executes a discriminant analysis having the shape of the pinna of the listener for training as an explanatory variable and having a plurality of integrated frequency bands as objective variables, thereby calculating a first correlation matrix as a relation derived by the first process. In addition, in this case, the first scale is a Mahalanobis distance or a value calculated using the Mahalanobis distance.
In addition, the relation deriving unit 108a calculates a first scale using the first correlation matrix and the shape of the pinna of the listener for training and identifies an integrated frequency band having the highest first probability among the plurality of integrated frequency bands as a first integrated frequency band on the basis of the first scale.
Furthermore, the relation deriving unit 108a executes a second process of deriving a relation between a second scale, which has a correlation with a second probability corresponding to a second integrated frequency band, and the shape of the pinna of the listener for training for each of a plurality of integrated frequency bands.
For example, in the second process, the relation deriving unit 108a executes a discriminant analysis having the shape of the pinna of the listener for training as an explanatory variable and having a plurality of integrated frequency bands as objective variables, thereby calculating a second correlation matrix as a relation derived by the second process. In addition, in this case, the second scale is a Mahalanobis distance or a value calculated using the Mahalanobis distance.
In addition, the relation deriving unit 108a calculates a second scale using the second correlation matrix and the shape of the pinna of the listener for training and identifies an integrated frequency band having the highest second probability among the plurality of integrated frequency bands as a second integrated frequency band on the basis of the second scale.
Next, a process in which the head-related transfer function generator 1a estimates an integrated frequency band including a first notch of the individualized head-related transfer function of a listener for inference and an integrated frequency band including a second notch of the individualized head-related transfer function of a listener for inference using the shape of the pinna of the listener for inference, the relation derived by the first process and the relation derived by the second process will be described.
The pinna shape acquiring unit 105 acquires data representing the shape of the pinna of a listener for inference. For example, this data is data that is similar to the data described with reference to
The integrated frequency band estimating unit 109a executes a third process. More specifically, the integrated frequency band estimating unit 109a calculates a third scale having a correlation with a third probability corresponding to a third integrated frequency band including a first notch having the lowest frequency among notches included in the individualized head-related transfer function of a listener for inference using the shape of the pinna of the listener for inference and the first correlation matrix for each of a plurality of integrated frequency bands. Then, the integrated frequency band estimating unit 109a estimates an integrated frequency band having the highest third probability as a third integrated frequency band.
The integrated frequency band estimating unit 109a executes a fourth process. More specifically, the integrated frequency band estimating unit 109a calculates a fourth scale having a correlation with a fourth probability corresponding to a fourth integrated frequency band including a second notch having a second lowest frequency among the notches included in the individualized head-related transfer function of the listener for inference using the shape of the pinna of the listener for inference and the second correlation matrix for each of a plurality of integrated frequency bands. Then, the integrated frequency band estimating unit 109a estimates an integrated frequency band having the highest fourth probability as a fourth integrated frequency band.
The individualized head-related transfer function generating unit 110a generates an individualized head-related transfer function of a listener for inference using results of estimation of the third integrated frequency band and the fourth integrated frequency band acquired by the integrated frequency band estimating unit 109a. More specifically, the individualized head-related transfer function generating unit 110a applies, to the integrated frequency bands, the technique by which the individualized head-related transfer function generating unit 109 described above generates an individualized head-related transfer function of a listener for inference from the results of estimation of the third frequency band and the fourth frequency band. In accordance with this, the individualized head-related transfer function generating unit 110a generates an individualized head-related transfer function on the basis of the integrated frequency bands.
The individualized head-related impulse response generating unit 111a performs an inverse Fourier transform on the individualized head-related transfer function generated by the individualized head-related transfer function generating unit 110a, thereby generating an individualized head-related impulse response.
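The inverse Fourier transform can be illustrated by the following sketch. It assumes that a complex spectrum covering all frequency bins is available; how the phase of the individualized head-related transfer function is obtained is not described in this section and is outside the sketch. The function name `inverse_dft` and the flat 8-bin spectrum are hypothetical.

```python
import cmath


def inverse_dft(spectrum):
    """Naive O(N^2) inverse discrete Fourier transform of a complex spectrum."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


# Hypothetical input: a flat spectrum, whose inverse transform is a unit impulse.
flat_spectrum = [1.0 + 0.0j] * 8
impulse_response = [value.real for value in inverse_dft(flat_spectrum)]
```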
Next, an example of a process executed by the head-related transfer function generator according to the embodiment will be described with reference to
In Step S101, the actually measured head-related impulse response acquiring unit 101 acquires data that represents an actually measured head-related impulse response of sound waves arriving at the external auditory meatus entrance of a listener for training.
In Step S102, the early head-related transfer function generating unit 102 calculates an initial head-related impulse response by applying a window function to the actually measured head-related impulse response and performs a Fourier transform on the initial head-related impulse response, thereby generating data representing an early head-related transfer function.
In Step S103, the frequency band dividing unit 103 divides the early head-related transfer function into a plurality of frequency bands.
In Step S104, the modeled head-related transfer function generating unit 104 extracts peaks or notches on the basis of the curvature of the early head-related transfer function for each of the plurality of frequency bands.
In Step S105, the modeled head-related transfer function generating unit 104 determines a relative amplitude on the basis of the curvature of the early head-related transfer function for each of the plurality of frequency bands.
In Step S106, the modeled head-related transfer function generating unit 104 interpolates points representing relative amplitudes, thereby generating data representing a modeled head-related transfer function of a listener for training.
In Step S201, the pinna shape acquiring unit 105 acquires data representing the shape of the pinna of a listener for training.
In Step S202, the frequency band identifying unit 106 identifies a first frequency band that includes a first notch and identifies a second frequency band that includes a second notch.
In Step S203, the relation deriving unit 107 executes the first process of deriving a relation between the first scale having a correlation with the first probability corresponding to the first frequency band and the shape of the pinna of the listener for training for each of a plurality of frequency bands.
In Step S204, the relation deriving unit 107 executes the second process of deriving a relation between the second scale having a correlation with the second probability corresponding to the second frequency band and the shape of the pinna of the listener for training for each of a plurality of frequency bands.
In Step S205, the relation deriving unit 107 identifies a frequency band having the highest first probability as the first frequency band and identifies a frequency band having the highest second probability as the second frequency band.
In Step S206, the relation deriving unit 107 determines whether or not the number of frequency bands present between the frequency band identified as the first frequency band and the frequency band identified as the second frequency band is equal to or smaller than a predetermined lower limit threshold or equal to or larger than a predetermined upper limit threshold. In a case in which it is determined that the number of frequency bands present between the frequency band identified as the first frequency band and the frequency band identified as the second frequency band is equal to or smaller than a predetermined lower limit threshold or equal to or larger than a predetermined upper limit threshold (Step S206: Yes), the relation deriving unit 107 causes the process to proceed to Step S207. On the other hand, in a case in which it is determined that the number of frequency bands present between the frequency band identified as the first frequency band and the frequency band identified as the second frequency band is neither equal to or smaller than the predetermined lower limit threshold nor equal to or larger than the predetermined upper limit threshold (Step S206: No), the relation deriving unit 107 ends the process.
In Step S207, the relation deriving unit 107 determines whether or not a predetermined size of the pinna of the listener for training is smaller than a first threshold. In a case in which it is determined that the predetermined size of the pinna of the listener for training is smaller than the first threshold (Step S207: Yes), the relation deriving unit 107 causes the process to proceed to Step S208. On the other hand, in a case in which it is determined that the predetermined size of the pinna of the listener for training is equal to or larger than the first threshold (Step S207: No), the relation deriving unit 107 causes the process to proceed to Step S209.
In Step S208, the relation deriving unit 107 executes the first correction process of re-identifying a frequency band having a second highest first probability as the first frequency band.
In Step S209, the relation deriving unit 107 determines whether or not the predetermined size of the pinna of the listener for training exceeds a second threshold. In a case in which it is determined that the predetermined size of the pinna of the listener for training exceeds the second threshold (Step S209: Yes), the relation deriving unit 107 causes the process to proceed to Step S210. On the other hand, in a case in which it is determined that the predetermined size of the pinna of the listener for training is equal to or smaller than the second threshold (Step S209: No), the relation deriving unit 107 causes the process to end.
In Step S210, the relation deriving unit 107 executes the second correction process of re-identifying a frequency band having a second highest second probability as the second frequency band.
In Step S301, the pinna shape acquiring unit 105 acquires data representing the shape of the pinna of a listener for inference.
In Step S302, the frequency band estimating unit 108 executes the third process of calculating a third scale having a correlation with the third probability corresponding to the third frequency band including a first notch and estimating a frequency band having the highest third probability as the third frequency band.
In Step S303, the frequency band estimating unit 108 executes the fourth process of calculating a fourth scale having a correlation with the fourth probability corresponding to the fourth frequency band including a second notch and estimating a frequency band having the highest fourth probability as the fourth frequency band.
In Step S304, the frequency band estimating unit 108 determines whether or not the number of frequency bands present between the frequency band estimated as the third frequency band and the frequency band estimated as the fourth frequency band is equal to or smaller than a predetermined lower limit threshold or equal to or larger than a predetermined upper limit threshold. In a case in which it is determined that the number of frequency bands present between the frequency band estimated as the third frequency band and the frequency band estimated as the fourth frequency band is equal to or smaller than the predetermined lower limit threshold or equal to or larger than the predetermined upper limit threshold (Step S304: Yes), the frequency band estimating unit 108 causes the process to proceed to Step S305. On the other hand, in a case in which it is determined that the number of frequency bands present between the frequency band estimated as the third frequency band and the frequency band estimated as the fourth frequency band is neither equal to or smaller than the predetermined lower limit threshold nor equal to or larger than the predetermined upper limit threshold (Step S304: No), the frequency band estimating unit 108 ends the process.
In Step S305, the frequency band estimating unit 108 determines whether or not a predetermined size of the pinna of the listener for inference is smaller than a third threshold. In a case in which it is determined that the predetermined size of the pinna of the listener for inference is smaller than the third threshold (Step S305: Yes), the frequency band estimating unit 108 causes the process to proceed to Step S306. On the other hand, in a case in which it is determined that the predetermined size of the pinna of the listener for inference is equal to or larger than the third threshold (Step S305: No), the frequency band estimating unit 108 causes the process to proceed to Step S307.
In Step S306, the frequency band estimating unit 108 executes the third correction process of re-estimating a frequency band having a second highest third probability as the third frequency band.
In Step S307, the frequency band estimating unit 108 determines whether or not a predetermined size of the pinna of the listener for inference exceeds a fourth threshold. In a case in which it is determined that the predetermined size of the pinna of the listener for inference exceeds the fourth threshold (Step S307: Yes), the frequency band estimating unit 108 causes the process to proceed to Step S308. On the other hand, in a case in which it is determined that the predetermined size of the pinna of the listener for inference is equal to or smaller than the fourth threshold (Step S307: No), the frequency band estimating unit 108 ends the process.
In Step S308, the frequency band estimating unit 108 executes the fourth correction process of re-estimating a frequency band having a second highest fourth probability as the fourth frequency band.
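As an illustrative sketch only, the branching of Steps S304 to S308 could be expressed in Python as follows. The function name `correct_bands`, the representation of the third and fourth probabilities as lists indexed by frequency band, and the counting of intervening bands as the index difference minus one are assumptions made for illustration and do not appear in the embodiment. The two pinna-size checks are treated here as independent conditions.

```python
def correct_bands(third_probs, fourth_probs, pinna_size,
                  lower_limit, upper_limit,
                  third_threshold, fourth_threshold):
    """Return (third_band, fourth_band) indices after the optional corrections."""

    def ranked(probs):
        # Band indices ordered from most probable to least probable.
        return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    third_rank = ranked(third_probs)
    fourth_rank = ranked(fourth_probs)
    third_band, fourth_band = third_rank[0], fourth_rank[0]

    # Assumed definition: number of bands lying strictly between the two estimates.
    gap = max(abs(fourth_band - third_band) - 1, 0)

    # Step S304: No. The gap is within the permitted range, so nothing is corrected.
    if lower_limit < gap < upper_limit:
        return third_band, fourth_band

    # Steps S305 and S306: small pinna, take the second most probable third band.
    if pinna_size < third_threshold:
        third_band = third_rank[1]

    # Steps S307 and S308: large pinna, take the second most probable fourth band.
    if pinna_size > fourth_threshold:
        fourth_band = fourth_rank[1]

    return third_band, fourth_band
```

With a third threshold that does not exceed the fourth threshold, at most one of the two corrections is applied for a given listener.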
In Step S401, the pinna shape acquiring unit 105 acquires data representing the shape of the pinna of a listener for training.
In Step S402, the frequency band integrating unit 106a generates at least two integrated frequency bands acquired by integrating a plurality of frequency bands.
In Step S403, the integrated frequency band identifying unit 107a identifies a first integrated frequency band that includes a first notch and identifies a second integrated frequency band that includes a second notch.
In Step S404, the relation deriving unit 108a executes the first process of deriving a relation between the first scale having a correlation with the first probability corresponding to the first integrated frequency band and the shape of the pinna of the listener for training for each of a plurality of integrated frequency bands.
In Step S405, the relation deriving unit 108a executes the second process of deriving a relation between the second scale having a correlation with the second probability corresponding to the second integrated frequency band and the shape of the pinna of the listener for training for each of a plurality of integrated frequency bands.
In Step S406, the relation deriving unit 108a identifies an integrated frequency band having the highest first probability as the first integrated frequency band and identifies an integrated frequency band having the highest second probability as the second integrated frequency band.
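The integration of Step S402 and the identification of Step S403 can be pictured with the following minimal Python sketch. It assumes that each frequency band is a `(low_hz, high_hz)` pair and that a fixed number of adjacent bands is merged into each integrated frequency band; the helper names `integrate_bands` and `band_index`, the grouping rule, and the example band edges and notch frequencies are hypothetical and are not taken from the embodiment.

```python
def integrate_bands(bands, group_size):
    """Merge each run of `group_size` adjacent (low_hz, high_hz) bands into one."""
    integrated = []
    for start in range(0, len(bands), group_size):
        group = bands[start:start + group_size]
        # The integrated band spans from the first lower edge to the last upper edge.
        integrated.append((group[0][0], group[-1][1]))
    return integrated


def band_index(bands, frequency_hz):
    """Index of the band containing frequency_hz, or None if no band contains it."""
    for i, (low, high) in enumerate(bands):
        if low <= frequency_hz < high:
            return i
    return None


# Hypothetical example: six 1 kHz bands merged in pairs into three integrated bands.
bands = [(4000, 5000), (5000, 6000), (6000, 7000),
         (7000, 8000), (8000, 9000), (9000, 10000)]
integrated = integrate_bands(bands, 2)
first_integrated = band_index(integrated, 6500)   # band holding an assumed first notch
second_integrated = band_index(integrated, 9200)  # band holding an assumed second notch
```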
In Step S501, the pinna shape acquiring unit 105 acquires data representing the shape of the pinna of a listener for inference.
In Step S502, the integrated frequency band estimating unit 109a executes the third process of calculating a third scale having a correlation with the third probability corresponding to the third integrated frequency band including a first notch and estimating an integrated frequency band having the highest third probability as the third integrated frequency band.
In Step S503, the integrated frequency band estimating unit 109a executes the fourth process of calculating a fourth scale having a correlation with the fourth probability corresponding to the fourth integrated frequency band including a second notch and estimating an integrated frequency band having the highest fourth probability as the fourth integrated frequency band.
As above, the head-related transfer function generator 1 according to the embodiment has been described. The head-related transfer function generator 1 executes the process of dividing the early head-related transfer function into a plurality of frequency bands and extracting a peak or a notch on the basis of the curvature of the early head-related transfer function for each of the plurality of frequency bands. Next, the head-related transfer function generator 1 executes the process of determining a relative amplitude on the basis of the curvature of the early head-related transfer function for each of the plurality of frequency bands. Then, the head-related transfer function generator 1 interpolates points representing relative amplitudes, thereby generating data that represents a modeled head-related transfer function of the listener for training.
In this way, the head-related transfer function generator 1 can acquire a modeled head-related transfer function, which reproduces the features of the head-related transfer function of the listener for training, without actually measuring the head-related transfer function of the listener for training.
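The three operations summarized above (extraction per frequency band on the basis of curvature, determination of a relative amplitude, and interpolation) might be sketched in Python as shown below. This is a sketch under stated assumptions rather than the processing of the embodiment: the curvature is approximated by a second difference, the relative amplitude is taken as the negated second difference purely as a stand-in, the interpolation is piecewise linear, and all function names and sample values are hypothetical.

```python
def second_difference(values, i):
    """Discrete curvature estimate at interior index i."""
    return values[i - 1] - 2.0 * values[i] + values[i + 1]


def extract_feature_points(freqs_hz, level_db, band_edges_hz):
    """Pick one peak or notch candidate per band from the curvature."""
    points = []
    for low, high in zip(band_edges_hz[:-1], band_edges_hz[1:]):
        best = None
        for i in range(1, len(freqs_hz) - 1):
            if low <= freqs_hz[i] < high:
                c = second_difference(level_db, i)
                if best is None or abs(c) > abs(best[1]):
                    best = (i, c)
        if best is not None:
            i, c = best
            # Positive curvature is read as a notch, otherwise as a peak.
            kind = "notch" if c > 0 else "peak"
            # Assumed stand-in: relative amplitude = negated second difference.
            points.append((freqs_hz[i], -c, kind))
    return points


def interpolate(points, freq_hz):
    """Piecewise-linear interpolation through the (frequency, amplitude) points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    if freq_hz <= xs[0]:
        return ys[0]
    if freq_hz >= xs[-1]:
        return ys[-1]
    for k in range(len(xs) - 1):
        if xs[k] <= freq_hz <= xs[k + 1]:
            ratio = (freq_hz - xs[k]) / (xs[k + 1] - xs[k])
            return ys[k] + (ys[k + 1] - ys[k]) * ratio


# Hypothetical early head-related transfer function sampled every 1 kHz.
freqs = [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000]
level = [0.0, 2.0, 0.0, 0.0, -6.0, 0.0, 0.0, 0.0]
points = extract_feature_points(freqs, level, [1000, 4000, 7000])
```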
In addition, the head-related transfer function generator 1 acquires data that represents the shape of the pinna of the listener for training. Next, the head-related transfer function generator 1 identifies a first frequency band and a second frequency band of the modeled head-related transfer function. Then, the head-related transfer function generator 1 executes the first process of deriving a relation between a first scale, which has a correlation with a first probability corresponding to a first frequency band, and the shape of the pinna of the listener for training for each of the plurality of frequency bands. In addition, the head-related transfer function generator 1 executes the second process of deriving a relation between a second scale, which has a correlation with a second probability corresponding to a second frequency band, and the shape of the pinna of the listener for training for each of the plurality of frequency bands.
In this way, the head-related transfer function generator 1 can derive a relation between the shape of the pinna and the first frequency band and a relation between the shape of the pinna and the second frequency band that can be used for generating a modeled head-related transfer function of the listener for inference.
In addition, in the first process, the head-related transfer function generator 1 executes a discriminant analysis having the shape of the pinna of the listener for training as an explanatory variable and having a plurality of frequency bands as objective variables, thereby calculating a first correlation matrix as a relation derived by the first process. Furthermore, in the second process, the head-related transfer function generator 1 executes a discriminant analysis having the shape of the pinna of the listener for training as an explanatory variable and having a plurality of frequency bands as objective variables, thereby calculating a second correlation matrix as a relation derived by the second process.
In this way, the head-related transfer function generator 1 can derive a relation between the shape of the pinna and the first frequency band and a relation between the shape of the pinna and the second frequency band with accuracy of a certain level or higher.
In addition, the head-related transfer function generator 1 calculates a first scale using the first correlation matrix and the shape of the pinna of the listener for training and identifies a frequency band having the highest first probability among the plurality of frequency bands as a first frequency band on the basis of the first scale. In addition, the head-related transfer function generator 1 calculates a second scale using the second correlation matrix and the shape of the pinna of the listener for training and identifies a frequency band having the highest second probability among the plurality of frequency bands as a second frequency band on the basis of the second scale.
In this way, the head-related transfer function generator 1 can identify the first frequency band and the second frequency band with accuracy of a certain level or higher.
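One way to picture the calculation of a scale from a correlation matrix and the selection of the most probable frequency band is the following Python sketch. It assumes that the scale for each frequency band is a linear score of the pinna measurements and that a softmax maps the scales to probabilities; both choices, as well as the names `band_scales`, `band_probabilities`, and `most_probable_band` and the numerical values, are assumptions for illustration and are not stated in the embodiment.

```python
import math


def band_scales(matrix, intercepts, pinna_features):
    """One linear score (the scale) per frequency band."""
    return [sum(w * x for w, x in zip(row, pinna_features)) + b
            for row, b in zip(matrix, intercepts)]


def band_probabilities(scales):
    """Softmax, used here as an assumed monotone map from scale to probability."""
    peak = max(scales)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scales]
    total = sum(exps)
    return [e / total for e in exps]


def most_probable_band(matrix, intercepts, pinna_features):
    """Return the index of the band with the highest probability and all probabilities."""
    probs = band_probabilities(band_scales(matrix, intercepts, pinna_features))
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best, probs


# Hypothetical matrix with one row per frequency band and one column per pinna measurement.
first_matrix = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
first_band, first_probs = most_probable_band(first_matrix, [0.0, 0.0, 0.0], [2.0, 1.0])
```

The same function could be reused with a second matrix to obtain the second frequency band.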
In addition, the head-related transfer function generator 1 executes at least one of the first correction process and the second correction process described above in a case in which the number of frequency bands present between a frequency band identified as the first frequency band and a frequency band identified as the second frequency band is equal to or smaller than the predetermined lower limit threshold or equal to or larger than the predetermined upper limit threshold.
In this way, the head-related transfer function generator 1 can identify, with higher accuracy, the first notch and the second notch, which play important roles when a listener perceives the vertical angle of the direction in which a sound image is located within the median plane.
In addition, the head-related transfer function generator 1 may execute the first correction process in a case in which the number of frequency bands present between a frequency band identified as the first frequency band and a frequency band identified as the second frequency band is equal to or smaller than a predetermined lower limit threshold or equal to or larger than a predetermined upper limit threshold, and a predetermined size of the pinna of the listener for training is smaller than a first threshold.
In this way, the head-related transfer function generator 1 executes the first correction process in a case in which the pinna of the listener for training is small, and the possibility of a frequency band identified first as the first frequency band being incorrect is relatively high and thus can identify the first frequency band with further higher accuracy.
In addition, the head-related transfer function generator 1 may execute the second correction process in a case in which the number of frequency bands present between a frequency band identified as the first frequency band and a frequency band identified as the second frequency band is equal to or smaller than the predetermined lower limit threshold or equal to or larger than the predetermined upper limit threshold, and a predetermined size of the pinna of the listener for training exceeds the second threshold.
In this way, the head-related transfer function generator 1 executes the second correction process in a case in which the pinna of the listener for training is large, and the possibility of a frequency band identified first as the second frequency band being incorrect is relatively high and thus can identify the second frequency band with further higher accuracy.
In addition, the head-related transfer function generator 1 acquires data that represents the shape of the pinna of a listener for inference. Then, the head-related transfer function generator 1 executes the third process and the fourth process. The third process is a process of calculating a third scale having a correlation with a third probability corresponding to a third frequency band including a first notch having the lowest frequency among notches included in the individualized head-related transfer function of a listener for inference using the shape of the pinna of the listener for inference and the first correlation matrix and estimating a frequency band having the highest third probability as a third frequency band. The fourth process is a process of calculating a fourth scale having a correlation with a fourth probability corresponding to a fourth frequency band including a second notch having the second lowest frequency among notches included in the individualized head-related transfer function of a listener for inference using the shape of the pinna of the listener for inference and the second correlation matrix and estimating a frequency band having the highest fourth probability as a fourth frequency band for each of a plurality of frequency bands.
In this way, the head-related transfer function generator 1 can estimate the third frequency band in which the first notch is included and the fourth frequency band in which the second notch is included with accuracy of a certain level or higher for the individualized head-related transfer function of a listener for inference whose shape of the pinna is unknown.
In addition, the head-related transfer function generator 1 executes at least one of the third correction process and the fourth correction process described above in a case in which the number of frequency bands present between a frequency band identified as the third frequency band and a frequency band identified as the fourth frequency band is equal to or smaller than a predetermined lower limit threshold or equal to or larger than a predetermined upper limit threshold.
In this way, the head-related transfer function generator 1 can estimate at least one of the third frequency band and the fourth frequency band with further higher accuracy for the individualized head-related transfer function of a listener for inference whose shape of the pinna is unknown.
In addition, the head-related transfer function generator 1 may execute the third correction process in a case in which the number of frequency bands present between a frequency band identified as the third frequency band and a frequency band identified as the fourth frequency band is equal to or smaller than the predetermined lower limit threshold or equal to or larger than the predetermined upper limit threshold, and a predetermined size of the pinna of the listener for inference is smaller than the third threshold.
In this way, the head-related transfer function generator 1 executes the third correction process in a case in which the pinna of the listener for inference is small, and the possibility of a frequency band identified first as the third frequency band being incorrect is relatively high and thus can identify the third frequency band with further higher accuracy.
In addition, the head-related transfer function generator 1 may execute the fourth correction process in a case in which the number of frequency bands present between a frequency band identified as the third frequency band and a frequency band identified as the fourth frequency band is equal to or smaller than the predetermined lower limit threshold or equal to or larger than the predetermined upper limit threshold, and a predetermined size of the pinna of the listener for inference exceeds the fourth threshold.
In this way, the head-related transfer function generator 1 executes the fourth correction process in a case in which the pinna of the listener for inference is large, and the possibility of a frequency band identified first as the fourth frequency band being incorrect is relatively high and thus can identify the fourth frequency band with further higher accuracy.
In addition, the head-related transfer function generator 1 generates an individualized head-related transfer function of the listener for inference using results of estimation of the third frequency band and the fourth frequency band that are acquired by the frequency band estimating unit 108.
In this way, the head-related transfer function generator 1 can acquire an individualized head-related transfer function that reproduces, with high accuracy, the first notch and the second notch, which play important roles when a listener for inference perceives the vertical angle of the direction in which a sound image is located within the median plane.
In addition, the head-related transfer function generator 1a acquires data that represents the shape of the pinna of the listener for training. Next, the head-related transfer function generator 1a generates at least two integrated frequency bands acquired by integrating a plurality of frequency bands. Next, the head-related transfer function generator 1a identifies the first integrated frequency band and the second integrated frequency band of the modeled head-related transfer function. Then, the head-related transfer function generator 1a executes the first process of deriving a relation between a first scale, which has a correlation with a first probability corresponding to a first integrated frequency band, and the shape of the pinna of the listener for training for each of a plurality of integrated frequency bands. In addition, the head-related transfer function generator 1a executes the second process of deriving a relation between a second scale having a correlation with a second probability corresponding to a second integrated frequency band and the shape of the pinna of the listener for training for each of a plurality of integrated frequency bands.
In this way, the head-related transfer function generator 1a can derive a relation between the shape of the pinna and the first frequency band that can be used for generating a modeled head-related transfer function of a listener for training on the basis of a frequency width that can be identified by the listener for training. In addition, in this way, the head-related transfer function generator 1a can derive a relation between the shape of the pinna and the second frequency band that can be used for generating a modeled head-related transfer function of a listener for training on the basis of a frequency width that can be identified by the listener for training.
In addition, the head-related transfer function generator 1a executes a discriminant analysis having the shape of the pinna of the listener for training as an explanatory variable and having a plurality of integrated frequency bands as objective variables in the first process, thereby calculating a first correlation matrix as a relation derived by the first process. Furthermore, the head-related transfer function generator 1a executes a discriminant analysis having the shape of the pinna of the listener for training as an explanatory variable and having a plurality of integrated frequency bands as objective variables in the second process, thereby calculating a second correlation matrix as a relation derived by the second process.
In this way, the head-related transfer function generator 1a can derive a relation between the shape of the pinna and the first frequency band that has accuracy of a certain level or higher and matches a frequency width that can be identified by the listener for training. In addition, in this way, the head-related transfer function generator 1a can derive a relation between the shape of the pinna and the second frequency band that has accuracy of a certain level or higher and matches a frequency width that can be identified by the listener for training.
In addition, the head-related transfer function generator 1a calculates a first scale using the first correlation matrix and the shape of the pinna of the listener for training and identifies an integrated frequency band having the highest first probability among the plurality of integrated frequency bands as a first integrated frequency band on the basis of the first scale. Furthermore, the head-related transfer function generator 1a calculates a second scale using the second correlation matrix and the shape of the pinna of the listener for training and identifies an integrated frequency band having the highest second probability among the plurality of integrated frequency bands as a second integrated frequency band on the basis of the second scale.
In this way, the head-related transfer function generator 1a can identify a first integrated frequency band that has accuracy of a certain level or higher and is based on the frequency width that can be identified by the listener for training. In addition, in this way, the head-related transfer function generator 1a can identify a second integrated frequency band that has accuracy of a certain level or higher and is based on the frequency width that can be identified by the listener for training.
In addition, the head-related transfer function generator 1a acquires data that represents the shape of the pinna of the listener for inference. Then, the head-related transfer function generator 1a executes the third process and the fourth process. The third process is a process of calculating a third scale having a correlation with a third probability corresponding to a third integrated frequency band including a first notch having the lowest frequency among notches included in the individualized head-related transfer function of a listener for inference using the shape of the pinna of the listener for inference and the first correlation matrix and estimating an integrated frequency band having the highest third probability as a third integrated frequency band for each of the plurality of integrated frequency bands. The fourth process is a process of calculating a fourth scale having a correlation with a fourth probability corresponding to a fourth integrated frequency band including a second notch having the second lowest frequency among notches included in the individualized head-related transfer function of a listener for inference using the shape of the pinna of the listener for inference and the second correlation matrix and estimating an integrated frequency band having the highest fourth probability as a fourth integrated frequency band for each of the plurality of integrated frequency bands.
In this way, the head-related transfer function generator 1a can estimate the third integrated frequency band that has accuracy of a certain level or higher and is based on a frequency width that can be identified by the listener for training for an individualized head-related transfer function of the listener for inference whose shape of the pinna is unknown. In addition, in this way, the head-related transfer function generator 1a can estimate the fourth integrated frequency band that has accuracy of a certain level or higher and is based on a frequency width that can be identified by the listener for training for an individualized head-related transfer function of the listener for inference whose shape of the pinna is unknown.
In addition, in the embodiment described above, although a case in which the head-related transfer function generator 1 calculates the first correlation matrix and the second correlation matrix by executing a discriminant analysis has been described as an example, the configuration is not limited thereto.
For example, the relation deriving unit 107, in the first process, may derive a first learned model that has been caused to learn using training data having the shape of the pinna of a listener for training as a problem and having a first frequency band as an answer as a relation derived by the first process. In such a case, the relation deriving unit 107 calculates a first scale using the first learned model and the shape of the pinna of the listener for training and identifies a frequency band having the highest first probability among a plurality of frequency bands as a first frequency band on the basis of the first scale.
In addition, for example, the relation deriving unit 107, in the second process, may derive a second learned model that has been caused to learn using training data having the shape of the pinna of a listener for training as a problem and having a second frequency band as an answer as a relation derived by the second process. In such a case, the relation deriving unit 107 calculates a second scale using the second learned model and the shape of the pinna of the listener for training and identifies a frequency band having the highest second probability among a plurality of frequency bands as a second frequency band on the basis of the second scale.
In addition, for example, the relation deriving unit 108a, in the first process, may derive a first learned model that has been caused to learn using training data having the shape of the pinna of a listener for training as a problem and having a first integrated frequency band as an answer as a relation derived by the first process. In such a case, the relation deriving unit 108a calculates a first scale using the first learned model and the shape of the pinna of the listener for training and identifies an integrated frequency band having the highest first probability among a plurality of integrated frequency bands as a first integrated frequency band on the basis of the first scale.
In addition, for example, the relation deriving unit 108a, in the second process, may derive a second learned model that has been caused to learn using training data having the shape of the pinna of a listener for training as a problem and having a second integrated frequency band as an answer as a relation derived by the second process. In such a case, the relation deriving unit 108a calculates a second scale using the second learned model and the shape of the pinna of the listener for training and identifies an integrated frequency band having the highest second probability among a plurality of integrated frequency bands as a second integrated frequency band on the basis of the second scale.
In addition, in the embodiment described above, although a case in which the head-related transfer function generator 1 calculates the third scale using the first correlation matrix and calculates the fourth scale using the second correlation matrix has been described as an example, the configuration is not limited thereto. For example, the frequency band estimating unit 108 may calculate the third scale using the first learned model. In addition, for example, the frequency band estimating unit 108 may calculate the fourth scale using the second learned model.
In addition, in the embodiment described above, although a case in which the head-related transfer function generator 1a calculates the third scale using the first correlation matrix and calculates the fourth scale using the second correlation matrix has been described as an example, the configuration is not limited thereto. For example, the integrated frequency band estimating unit 109a may calculate the third scale using the first learned model. In addition, for example, the integrated frequency band estimating unit 109a may calculate the fourth scale using the second learned model.
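As a rough illustration of how a learned model could take the place of a correlation matrix, the following Python sketch trains a nearest-centroid classifier on pairs of a pinna shape (the problem) and a frequency band (the answer) and then uses it to calculate a scale for a new pinna shape. The nearest-centroid technique, the function names, and the sample measurements are all assumptions chosen for brevity; the embodiment does not specify the type of learned model.

```python
def train_centroids(pinna_shapes, band_labels):
    """Average the pinna feature vectors of the listeners for training per band."""
    sums, counts = {}, {}
    for shape, band in zip(pinna_shapes, band_labels):
        acc = sums.setdefault(band, [0.0] * len(shape))
        for k, value in enumerate(shape):
            acc[k] += value
        counts[band] = counts.get(band, 0) + 1
    return {band: [v / counts[band] for v in acc] for band, acc in sums.items()}


def scales_from_model(centroids, shape):
    """Scale per band: negated squared distance, so a larger scale means more likely."""
    return {band: -sum((a - b) ** 2 for a, b in zip(shape, centre))
            for band, centre in centroids.items()}


def estimate_band(centroids, shape):
    """Band whose scale is highest for the given pinna shape."""
    scales = scales_from_model(centroids, shape)
    return max(scales, key=scales.get)


# Hypothetical training data: two pinna measurements per listener and a band index as the answer.
shapes = [[60.0, 30.0], [62.0, 32.0], [70.0, 36.0], [72.0, 38.0]]
labels = [5, 5, 3, 3]
model = train_centroids(shapes, labels)
estimated = estimate_band(model, [63.0, 31.0])
```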
Furthermore, at least some of the functions of the head-related transfer function generator 1 according to the embodiment described above may be realized by recording a program for realizing such functions in a computer-readable recording medium and causing a computer system to read and execute the program recorded in this recording medium. The “computer system” described here includes an operating system (OS) and hardware such as peripherals.
Furthermore, the “computer-readable recording medium” represents a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM or a storage unit such as a hard disk built into the computer system. In addition, the “computer-readable recording medium” may include a medium that dynamically stores the program for a short time, such as a communication line used in a case in which the program is transmitted via a network such as the Internet or via a communication line such as a telephone line, and a medium that stores the program for a predetermined time, such as a volatile memory inside a computer system serving as a server or a client in that case. In addition, the program described above may be used for realizing some of the functions described above and may realize the functions described above in combination with a program that has already been recorded in the computer system.
While preferred embodiments of the invention have been described and illustrated above, it should be understood that these are exemplary of the invention and are not to be considered as limiting. Additions, omissions, substitutions, and other modifications can be made without departing from the spirit or scope of the present invention. Accordingly, the invention is not to be considered as being limited by the foregoing description, and is only limited by the scope of the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
JP2020-090035 | May 2020 | JP | national |
JP2020-200590 | Dec 2020 | JP | national |
Number | Name | Date | Kind |
---|---|---|---|
6795556 | Sibbald | Sep 2004 | B1 |
9681250 | Luo | Jun 2017 | B2 |
9961466 | Oh | May 2018 | B2 |
20150156599 | Romigh | Jun 2015 | A1 |
20190014431 | Lee | Jan 2019 | A1 |
20200374647 | Cappello et al. | Nov 2020 | A1 |
Number | Date | Country |
---|---|---|
2008-211834 | Sep 2008 | JP |
A-2016-201723 | Dec 2016 | JP |
2017-085362 | May 2017 | JP |
2019-169835 | Oct 2019 | JP |
2020-170938 | Oct 2020 | JP |
Entry |
---|
Aizaki et al., “Band-divided notch-peak model for head-related transfer function—Relation between divided bandwidth and accuracy of sound image localization,” (w/ Partial English Translation) 2019 Graduation Research Summary, Jan. 29, 2020, 12 pages. |
Aizaki et al., “Band-divided notch-peak model for head-related transfer function—Relation between divided bandwith and accuracy of sound image localization,” (w/ Partial English Translation) Proceedings of the Acoustical Society of Japan, Japanese Acoustical Society 2020 Spring Research Conference Mar. 2, 2020, 28 pages. |
Nishiyama et al., “Category estimation of notch frequency of individual head-related transfer function based on auricle shape,” (w/ English Translation) 3-4-4, Reports of the meeting of the Acoustical Society of Japan, Sep. 2020, 11 pages. |
Japanese Notice of Allowance (w/ English translation) for corresponding JP Application No. 2020-200590, dated Oct. 26, 2021, 5 pages. |
Number | Date | Country |
---|---|---|
20210368285 A1 | Nov 2021 | US |