The present disclosure relates to a method and system for biometric authentication and dynamic compensation for a headphone based on headphone transfer function (HPTF).
The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.
Biometric authentication is used to enable a seamless user experience on edge devices, such as mobile phones and laptops, while providing device security. To enable a better user experience, various techniques are known to reduce the intent-to-action time. The intent-to-action time is defined as the interval from the moment the user wants the target device to execute an action to the moment the edge device finishes execution. Modern recognition techniques, such as image and speech recognition techniques, reduce the intent-to-action time. Recent advancements in edge computing combined with cloud services have greatly improved the quality of life.
Facial recognition relies on a camera mounted on the target device and is achieved by comparing captured facial features with pre-registered facial features using neural-network-related techniques. Various techniques are then used to enhance the visual precision, such as infrared-based (IR-based) depth sensors and stereoscopic imaging. These methods are mostly used to prevent ill-intentioned persons from defeating the system by showing a photo of the target. However, these systems tend to be more costly in terms of power consumption and sensor costs. In addition, mobile devices may omit front image sensors to achieve a higher screen-to-body ratio.
Speech recognition relies on a microphone to capture acoustic input and then compares the real-time streaming input against pre-registered commands for a match. Since the recognition accuracy is coupled with the signal-to-noise ratio (SNR), commonly known routines such as multi-mic and noise reduction routines are used to increase accuracy. Multi-channel and noise reduction techniques are also costly in terms of power consumption and sensor costs. Also, voice recognition requires users to speak the keywords, which may be inconvenient in public.
Moreover, commonly in many headphones, the HPTF is measured by using ear simulators on dummy heads. The acoustics operator tunes the frequency response of the headphone according to the measured HPTF. However, due to the individual differences, the HPTF measured by the ear simulator may not be satisfactory. When an end user uses the headphone and listens to the music, the audio output may not be the desired sound that the acoustics operator has tuned. Different listeners may hear different sound in one headphone regardless of how the headphone is worn. In addition, even though the headphone may have sufficient bass performance, the listener may hear a lesser degree of bass when the user does not wear the headphone properly due to air leakage between the headphone and the user's ear.
The individual HPTF of the listener differs from the measured HPTF because the reflections between the inner surface of the headphone and the eardrum are different, or simply because of undesired air leakage, and this difference introduces timbre distortions.
To play back sounds to different listeners through headphones, the HPTF may be calibrated and compensated.
This section provides a general summary of the disclosure and is not a comprehensive disclosure of its full scope or all of its features.
According to one aspect of the disclosure, a method of authentication and dynamic compensation for a headphone is provided. The method includes performing the authentication for a user based on a headphone transfer function (HPTF) of the user when the user wears the headphone. The method includes detecting whether a frequency response deviation exists between the HPTF of the user and a tuned HPTF. The method includes dynamically compensating for the HPTF of the user based on the detected frequency response deviation. According to another aspect of the present disclosure, a system of authentication and dynamic compensation for a headphone is provided. The system comprises a computer-readable storage medium and a processor coupled to the computer-readable storage medium. The processor is configured to perform the authentication for a user based on a headphone transfer function (HPTF) of the user when the user wears the headphone. Further, the processor is configured to detect whether a frequency response deviation exists between the HPTF of the user and a tuned HPTF. Furthermore, the processor is configured to dynamically compensate for the user's HPTF based on the detected frequency response deviation.
According to yet another aspect of the present disclosure, a computer-readable storage medium comprising computer-executable instructions is provided which, when executed by a computer, causes the computer to perform the methods disclosed herein.
In one aspect, performing the authentication further comprises constructing an HPTF model and an authentication decision, measuring the HPTF of the user, and authenticating the user based on the measured HPTF, the constructed HPTF model, and the authentication decision. In one aspect, constructing the HPTF model and the authentication decision further comprises collecting global HPTF from a plurality of additional users, forming a global model with a global distribution based on the collected global HPTF, collecting local HPTF from the user, forming a local model with a local distribution based on the collected local HPTF, and determining run time loss coefficients based on a predefined loss function. In one aspect, the method includes computing a feature distance based on the global model and the local model, determining the authentication is successful when the feature distance is closer to the local model than the global model, and determining the authentication is unsuccessful when the feature distance is closer to the global model than the local model. In one aspect, the global model and the local model are based on a Gaussian Mixture Model. In one aspect, the method further comprises measuring an anechoic free field transducer to microphone transfer function. In one aspect, detecting the frequency response deviation between the HPTF of the user and the tuned HPTF further comprises generating an estimated HPTF of the user based on a filtered least mean squared routine, obtaining a magnitude response of the estimated HPTF of the user, comparing the magnitude response and a tuned magnitude response, and determining the frequency response deviation in real time based on the comparison.
According to another aspect of the disclosure, a method of authentication and dynamic compensation for a headphone is provided. The method includes measuring a headphone transfer function (HPTF) of a user when the user wears the headphone, constructing an HPTF model and an authentication decision, and authenticating the user based on the measured HPTF, the constructed HPTF model, and the authentication decision. The method includes generating an estimated HPTF of the user based on a filtered least mean squared routine, obtaining a magnitude response of the estimated HPTF of the user, comparing the magnitude response and a tuned magnitude response, detecting a frequency response deviation between the HPTF of the user and a tuned HPTF, and dynamically compensating for the HPTF of the user based on the detected frequency response deviation.
Further areas of applicability will become apparent from the description provided herein. It should be understood that the description and specific examples are intended for purposes of illustration only and are not intended to limit the scope of the present disclosure.
In order that the disclosure may be well understood, there will now be described various forms thereof, given by way of example, reference being made to the accompanying drawings, in which:
To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the figures. It is contemplated that elements disclosed in one embodiment may be utilized in other embodiments without specific recitation. The drawings referred to here should not be understood as being drawn to scale unless specifically noted. Also, the drawings are often simplified, and details or components may be omitted for clarity of presentation and explanation. The drawings and discussion serve to explain principles discussed below, where like designations denote like elements.
The drawings described herein are for illustration purposes only and are not intended to limit the scope of the present disclosure in any way.
The following description is merely exemplary in nature and is not intended to limit the present disclosure, application, or uses. It should be understood that throughout the drawings, corresponding reference numerals indicate like or corresponding parts and features.
Examples will be provided below for illustration. The descriptions of the various examples will be presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments.
The headphone transfer function (HPTF) is defined as the acoustic transfer function from the speaker of a headphone to the sound pressure at the eardrum. In general, the individual HPTF varies with different headphones or listeners, since each headphone has its own designed feature, and each listener has unique characteristics of the ear. Accordingly, this disclosure will provide embodiments for applications based on HPTF. For example, in a headphone product, the method and the system discussed herein may be applied to a biometric authentication. After the biometric authentication, the disclosure will provide a method and system for detection and calibration of frequency response deviation to obtain a desired sound performance for individual users during use of the headphone product.
Active Noise Cancelling (ANC) headphones are based on monitoring the surrounding noise. Namely, they capture the environmental sound using both internal and external microphones. Then, by keeping the magnitude and inverting the phase of the surrounding noise with a calibrated playback system, high-precision anti-noise with closely coupled feedback loops can be reproduced.
The HPTF involves at least two parts, namely the free field measurement and the impulse response between the pinna plus ear canal and the internal microphone. Since the free field response can be measured in a controlled environment, and the manufacturing tolerance can be calibrated on the production line, the remaining variable is the microphone to pinna plus ear canal response, which is referred to hereinafter as Ear Reference Point (ERP) to Ear Entrance Point (EEP). This ERP to EEP transfer function ($H_{\text{ear}}$) differs from person to person because the pinna and ear canal differ between individuals.
$$w(n+1) = w(n) - \mu\, e(n)\, r'(n) \tag{1}$$
In relation (1), $\mu$ is the adaptation step size, $w(n)$ is the weight vector at time $n$, and $e(n) = d(n) + w^{T}(n)\, r(n)$ is the residual noise measured by the error microphone. $d(n)$ is the noise to be canceled, and $r(n)$ and $r'(n)$ are obtained from the convolutions $r(n) = h(n) * x(n)$ and $r'(n) = h'(n) * x(n)$, respectively. $x(n)$ is the synthesized reference signal, and $h(n)$ and $h'(n)$ are the impulse responses corresponding to $H(f)$ and $H'(f)$, respectively. $H(f)$ is the transfer function of the secondary path, and $H'(f)$ is the estimate of $H(f)$, which is also regarded as the HPTF. The system configuration of FxLMS is illustrated as
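The weight update of relation (1) can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions, not the disclosed implementation: the helper names `fir_filter` and `fxlms`, the tap count, the step size, and the example path coefficients are all hypothetical, and the secondary path is modelled as a short FIR filter.

```python
import math


def fir_filter(taps, signal):
    """Causal FIR convolution, truncated to len(signal) samples."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:
                acc += tap * signal[n - k]
        out.append(acc)
    return out


def fxlms(x, d, h, h_est, num_taps=8, mu=0.01):
    """Apply the update of relation (1); return (final weights, error history)."""
    r = fir_filter(h, x)          # r(n)  = h(n)  * x(n)
    r_est = fir_filter(h_est, x)  # r'(n) = h'(n) * x(n)
    w = [0.0] * num_taps
    errors = []
    for n in range(len(x)):
        # Most recent num_taps samples, newest first, zero-padded at the start.
        r_vec = [r[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        r_est_vec = [r_est[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        # e(n) = d(n) + w^T(n) r(n)
        e = d[n] + sum(wi * ri for wi, ri in zip(w, r_vec))
        # w(n+1) = w(n) - mu * e(n) * r'(n)
        w = [wi - mu * e * ri for wi, ri in zip(w, r_est_vec)]
        errors.append(e)
    return w, errors


if __name__ == "__main__":
    # Hypothetical test signal: a single tone as reference, filtered to form the noise.
    x = [math.sin(2 * math.pi * 0.05 * n) for n in range(2000)]
    d = fir_filter([0.5, -0.3, 0.1], x)
    _, errors = fxlms(x, d, h=[0.9, 0.1], h_est=[0.85, 0.15])
    early = sum(abs(e) for e in errors[:100]) / 100
    late = sum(abs(e) for e in errors[-100:]) / 100
    print("mean |e| over first 100 samples:", early)
    print("mean |e| over last 100 samples:", late)
```

In this sketch the estimate `h_est` is deliberately slightly different from `h`, to reflect that $H'(f)$ is only an estimate of the secondary path; the residual still shrinks as the weights adapt.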
HPTF Authentication
As for the application of authentication, the HPTF difference problem can be transformed into an identification problem, which could be solved with statistical modelling, such as a Bayesian approach and neural networks.
To distinguish a generic HPTF from that of the target user, statistical models are used. In this embodiment, a Gaussian Mixture Model (GMM) is constructed based on the measured impulse response. To construct the GMM reference, the free field response in the anechoic chamber is first measured as $H_{\text{free-field}}(f)$. For each of the $P$ persons used for training (index $i$), each of whom is measured $M$ times (total size $P \times M$), the transducer to microphone transfer function is captured and is denoted $H_{\text{HPTF}}(f)$ (index $i$ omitted). Then $H_{\text{ear}}(f)$ is obtained based on relation (2) shown below.
$$H_{\text{ear}}(f) = \frac{H_{\text{HPTF}}(f)}{H_{\text{free-field}}(f)} \tag{2}$$
To increase the accuracy, the data may be pre-processed into magnitude data and relative phase data, as shown below in relations (3)-(4).
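$$|H_{\text{ear}}(f)| = \sqrt{\operatorname{Re}(H_{\text{ear}}(f))^{2} + \operatorname{Im}(H_{\text{ear}}(f))^{2}} \tag{3}$$
$$\angle H_{\text{ear}}(f) = \tan^{-1}\left[\frac{\operatorname{Im}(H_{\text{ear}}(f))}{\operatorname{Re}(H_{\text{ear}}(f))}\right] \tag{4}$$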
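Relations (2) to (4) amount to a per-frequency-bin complex division followed by a magnitude and phase split. The sketch below illustrates this on complex frequency responses given as Python lists; the function names are hypothetical, the free-field response is assumed to be nonzero in every bin, and `math.atan2` is used as the four-quadrant form of the inverse tangent in relation (4).

```python
import math


def ear_response(h_hptf, h_free_field):
    """Relation (2): divide the measured HPTF by the free-field response, bin by bin."""
    if len(h_hptf) != len(h_free_field):
        raise ValueError("responses must have the same number of frequency bins")
    return [a / b for a, b in zip(h_hptf, h_free_field)]


def magnitude_and_phase(h_ear):
    """Relations (3) and (4): per-bin magnitude and phase of a complex response."""
    magnitudes = [math.hypot(h.real, h.imag) for h in h_ear]
    phases = [math.atan2(h.imag, h.real) for h in h_ear]
    return magnitudes, phases


if __name__ == "__main__":
    # Two made-up frequency bins, purely for illustration.
    h_ear = ear_response([2 + 2j, -3 + 0j], [2 + 0j, 0 + 1j])
    mags, phases = magnitude_and_phase(h_ear)
    print("magnitudes:", mags)
    print("phases (rad):", phases)
```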
Then, each data point $i$ can be treated as a vector of [magnitude, phase]×[left, right] per sample, measured $M$ times on each test subject's head for different fittings. The global model is then trained following the GMM construction procedure to obtain $X \sim N_{\text{global}}(\mu, \sigma)$.
HPTF Model Construction and Authentication Decision
For example, the anechoic free field transducer to microphone transfer function may be measured, i.e., $H_{\text{free-field}}(f)$ is obtained. Referring to
To register a new target, FxLMS is combined with the stored $H_{\text{free-field}}(f)$ to extract $H_{\text{target}}(f) = H_{\text{HPTF}}(f) / H_{\text{free-field}}(f)$. This process is repeated $M$ times for the target user to create the local model $Y \sim N_{\text{local}}(\mu, \sigma)$ with a predefined feature distance $D$, which in this case could be simplified as the distribution Minimum Mean Square Error (MMSE), as shown below in relation (5).
In relation (5), $\beta_0, \ldots, \beta_P$ are parameter estimates.
To achieve bio-authentication using the model created above, the feature distance is evaluated as follows. If $\operatorname{mean}(\lVert X - Y \rVert) > \lVert Y - \mu_Y \rVert$, that is, if the feature distance is closer to the local model $Y \sim N_{\text{local}}(\mu_Y, \sigma_Y)$ than to the global model $X \sim N_{\text{global}}(\mu_X, \sigma_X)$, then it can be determined that the device is authenticated. Otherwise, if the feature distance is closer to the global model than to the local model, the authentication returns failure as the result.
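The closer-to-local-than-global decision can be illustrated with a deliberately simplified sketch. It is not the disclosed model: each GMM is replaced here by a single Gaussian with independent dimensions, the feature distance is assumed to be a standard-deviation-scaled root-mean-square distance to the model mean, and all function names and sample values are hypothetical.

```python
import math
import statistics


def fit_diagonal_gaussian(samples):
    """Per-dimension mean and standard deviation of a list of feature vectors."""
    dims = list(zip(*samples))
    means = [statistics.fmean(d) for d in dims]
    # Guard against a zero standard deviation so the distance stays finite.
    stds = [statistics.pstdev(d) or 1e-9 for d in dims]
    return means, stds


def normalized_distance(vector, model):
    """RMS distance to the model mean, scaled by the per-dimension std."""
    means, stds = model
    total = sum(((v - m) / s) ** 2 for v, m, s in zip(vector, means, stds))
    return math.sqrt(total / len(vector))


def authenticate(probe, local_model, global_model):
    """Accept when the probe is closer to the local model than to the global one."""
    return normalized_distance(probe, local_model) < normalized_distance(probe, global_model)


if __name__ == "__main__":
    # Made-up two-dimensional features standing in for [magnitude, phase] data.
    global_model = fit_diagonal_gaussian(
        [[1.0, 0.2], [1.4, 0.5], [0.6, -0.1], [1.2, 0.3], [0.8, 0.1]]
    )
    local_model = fit_diagonal_gaussian([[1.9, 0.9], [2.0, 1.0], [2.1, 1.1]])
    print("enrolled user accepted:", authenticate([2.0, 0.95], local_model, global_model))
    print("generic user accepted:", authenticate([1.0, 0.2], local_model, global_model))
```

A mixture model with several components per distribution would replace `fit_diagonal_gaussian` and `normalized_distance` in a fuller treatment; the accept/reject comparison itself would keep the same shape.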
Runtime HPTF Extraction Model
Deviation Detection and Frequency Response Calibration
To play back sounds to different listeners through headphones and improve the sound experience of the user, the HPTF may be calibrated and compensated. For example, one method is to put a microphone inside the ear canal of the listener and perform a one-time calibration by playing a sweep signal or other measurement signal. This can compensate for the HPTF, but the compensation may remain valid only for a short time, since the listener might not wear the headphone at the same position each time, which means the listener has to repeat the calibration every time the listener wants to use the headphone. Otherwise, the calibration may be ineffective. An improved adaptive and effective method for compensation in real time is further disclosed herein.
Considering that listeners may wear the headphone with air leakage, and that HPTFs differ among listeners and from that of a standard dummy head, a method is proposed herein to compensate for the difference between the real HPTF and the HPTF tuned by the acoustics operator.
$$M(f) = |H(f)|, \quad M_0(f) = |H_0(f)| \tag{6}$$
In relation (6), “| |” is the absolute value operator. Then, at S503, $M(f)$ and $M_0(f)$ are compared to determine the frequency response deviation when the listener wears the headphone, for example, to determine how much air leakage there is in the low frequency range.
Then, at S504, the dynamic compensation for the user's HPTF curve is performed based on the detected frequency response deviation. For example, a smooth and limited calibration function F(*) is applied to the difference between $M_0(f)$ and $M(f)$ to obtain the compensated magnitude $M_c(f)$, as shown below in relation (7).
$$M_c(f) = F(M_0(f) - M(f)) \tag{7}$$
In relation (7), F(*) may be a linear or nonlinear function, for example,
and α and β are two parameters we can tune depending on the real system.
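Relations (6) and (7) can be sketched as follows. The specific form of F(*) used here is an assumption made only for illustration, chosen because it is smooth and limited: a scaled hyperbolic tangent in which α is taken to be the slope for small deviations and β the saturation limit. The function names and example responses are likewise hypothetical.

```python
import math


def magnitude(response):
    """Relation (6): per-bin absolute value of a complex frequency response."""
    return [abs(h) for h in response]


def calibration(delta, alpha=1.0, beta=0.5):
    """Assumed F(*): slope alpha near zero, saturating smoothly at +/- beta."""
    return beta * math.tanh(alpha * delta / beta)


def compensated_magnitude(h_user, h_tuned, alpha=1.0, beta=0.5):
    """Relation (7): Mc(f) = F(M0(f) - M(f)) for every frequency bin."""
    m = magnitude(h_user)
    m0 = magnitude(h_tuned)
    return [calibration(t - u, alpha, beta) for u, t in zip(m, m0)]


if __name__ == "__main__":
    # Made-up three-bin responses: the user's response is weak in the lowest bin,
    # as could happen with air leakage at low frequencies.
    h_tuned = [1 + 0j, 1 + 0j, 1 + 0j]
    h_user = [0.2 + 0j, 0.9 + 0j, 1 + 0j]
    print("compensated magnitude per bin:", compensated_magnitude(h_user, h_tuned))
```

Limiting the output to ±β keeps a large detected deviation, such as a badly seated earcup, from producing an unbounded correction.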
In this disclosure, systems and methods are provided to detect the individual differences in HPTF across different users. The systems and methods leverage these differences for applications such as biometric authentication and headphone fit detection based on frequency response deviation. Finally, based on the difference between the detected HPTF and the target curve, dynamic compensation can be performed to provide a consistent listening experience.
The systems and methods described herein use the runtime computed HPTF model to interact with hearable devices. Such actions may be found in consumer devices, such as unlocking secure devices (e.g., mobile phones) and acoustic personalization (e.g., play/pause, load/store playlist, etc.). The systems and methods may also be applied to e-commerce and software services, for example, as an authentication protocol for secured payments (e.g., Google® Store) and in conference software for user identification and verification (e.g., WebEx® login ID automated meeting setup). The technique disclosed herein is based on the differences of HPTF between individuals from both the left and right ears and provides an alternative embodiment for both digital authentication and human computer interaction. The systems and methods described herein are applicable to the method of using statistical analysis to determine the hearable acoustic behavior.
Aspects of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module”, “unit” or “system.”
The present disclosure may be a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present disclosure.
The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers.
Aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.
While the foregoing is directed to embodiments of the present disclosure, other and further embodiments of the disclosure may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.
Unless otherwise expressly indicated herein, all numerical values indicating mechanical/thermal properties, compositional percentages, dimensions and/or tolerances, or other characteristics are to be understood as modified by the word “about” or “approximately” in describing the scope of the present disclosure. This modification is desired for various reasons including industrial practice, material, manufacturing, and assembly tolerances, and testing capability.
As used herein, the phrase at least one of A, B, and C should be construed to mean a logical (A OR B OR C), using a non-exclusive logical OR, and should not be construed to mean “at least one of A, at least one of B, and at least one of C.”
The term memory is a subset of the term computer-readable medium. The term computer-readable medium, as used herein, does not encompass transitory electrical or electromagnetic signals propagating through a medium (such as on a carrier wave); the term computer-readable medium may therefore be considered tangible and non-transitory. Non-limiting examples of a non-transitory, tangible computer-readable medium are nonvolatile memory circuits (such as a flash memory circuit, an erasable programmable read-only memory circuit, or a mask read-only circuit), volatile memory circuits (such as a static random access memory circuit or a dynamic random access memory circuit), magnetic storage media (such as an analog or digital magnetic tape or a hard disk drive), and optical storage media (such as a CD, a DVD, or a Blu-ray Disc).
The apparatuses and methods described in this application may be partially or fully implemented by a special purpose computer created by configuring a general-purpose computer to execute one or more particular functions embodied in computer programs. The functional blocks, flowchart components, and other elements described above serve as software specifications, which can be translated into the computer programs by the routine work of a skilled technician or programmer.
The description of the disclosure is merely exemplary in nature and, thus, variations that do not depart from the substance of the disclosure are intended to be within the scope of the disclosure. Such variations are not to be regarded as a departure from the spirit and scope of the disclosure.
This application is a continuation of International Application No. PCT/CN2020/112776, filed on Sep. 1, 2020. The disclosure of the above application is incorporated herein by reference.
Relation | Number | Date | Country
---|---|---|---
Parent | PCT/CN2020/112776 | Sep 2020 | US
Child | 18115875 | | US