With increasing public awareness of privacy concerns, voice modification, in particular voice anonymization, is of particular interest, see, e.g., [2]. Voice anonymization may, e.g., be conducted to address privacy concerns with respect to voice recordings in smart speakers and in other IoT scenarios, where speech signals are recorded, stored, and analyzed. Moreover, technology to address regulatory requirements regarding privacy (for example, the General Data Protection Regulation, GDPR) may, e.g., be needed. Avatar adaptation for conversations in the metaverse is a further field where voice anonymization is appreciated.
The introduction of the VoicePrivacy Challenge has stirred multinational interest in the design of voice anonymization systems. The introduced framework consists of baselines, evaluation metrics and attack models, and has been utilized by researchers to improve voice anonymization.
Voice anonymization may, for example, be conducted by a voice processing block that modifies a speech signal, so that a voice recording cannot be traced back to the original speaker.
For example, an acoustic front-end that anonymizes the speaker's characteristics before exchanging data with a voice assistant service has been proposed (see [16]).
In conventional technology, a system for voice anonymization, referred to as the baseline B1 system or the B1.a system, has been provided in [1]. Further submissions mostly focused on changes to the individual blocks of the baselines. However, regardless of the individual modifications to this baseline by different groups, the obtained audio recordings are considered ‘unnatural’, see [2].
To improve anonymization performance as well as intelligibility, F0 modifications have been explored in the previous edition of the VoicePrivacy Challenge and in subsequent works utilizing the challenge framework. Among the techniques investigated are creating a dictionary of F0 statistics (mean and variance) per identity and utilizing these for shifting and scaling the F0 trajectories [3], applying low-complexity DSP modifications [4] and applying functional principal component analysis (PCA) to obtain speaker-dependent components [5].
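For illustration purposes, the dictionary-based shifting and scaling of [3] may, e.g., be sketched as follows. The function below is a simplified illustration, not the implementation of [3]; in [3], the target statistics are looked up per identity from a dictionary, whereas here they are passed as plain parameters.

```python
import numpy as np

def transform_f0(f0, target_log_mean, target_log_std):
    """Shift and scale an F0 trajectory towards target speaker statistics.

    f0: 1-D array of F0 values in Hz; zeros mark unvoiced frames.
    target_log_mean, target_log_std: log-F0 statistics of the target
    identity (in [3], taken from a per-identity dictionary).
    """
    out = f0.copy()
    voiced = f0 > 0.0
    log_f0 = np.log(f0[voiced])
    # Remove the source speaker's statistics, then impose the target's.
    normalized = (log_f0 - log_f0.mean()) / (log_f0.std() + 1e-8)
    out[voiced] = np.exp(normalized * target_log_std + target_log_mean)
    return out

# Example: shift a low trajectory towards a higher-pitched identity.
f0 = np.array([0.0, 110.0, 115.0, 120.0, 0.0, 118.0])
print(transform_f0(f0, target_log_mean=np.log(200.0), target_log_std=0.1))
```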
BNs are extracted using a time delay neural network (TDNN) that actively prevents leaking of the speaker-dependent parts [6]. Thus, it is safe to assume that BNs do not contain immediately available speaker-dependent cues. x-vectors are returned as a single average per utterance or speaker, and are hence expected to average out the effects of the different linguistic content within the presented voice sample(s). Instead of PPGs obtained by supervised training, unsupervised representations are also used to represent individual sounds (see, e.g., [18]).
On the other hand, F0s are a complex combination of the identity of the speaker, the linguistic meaning, and the prosody, which also includes situational aspects such as emotions and speech rate [7]. Many speech synthesizers, notably the neural source-filter (NSF) models, incorporate F0 trajectories as a parameter to control the initial excitation, mimicking the vocal cords [8]. Thus, data-driven parts of the architectures have relatively little control over shaping the excitation.
According to an embodiment, a system for conducting voice modification on an audio input signal including speech to obtain an audio output signal may have: a feature extractor for extracting feature information of the speech from the audio input signal, a fundamental frequencies generator for generating modified fundamental frequency information depending on the feature information, such that the modified fundamental frequency information includes modified fundamental frequencies being different from real fundamental frequencies of the speech, and/or such that the modified fundamental frequency information indicates a modified fundamental frequency trajectory being different from a real fundamental frequency trajectory of the speech, and a synthesizer for generating the audio output signal depending on the modified fundamental frequency information and depending on the feature information.
According to another embodiment, a system may have: a system for conducting voice anonymization, wherein the speech in the audio input signal is speech that has not been anonymized, wherein the modifier is an anonymizer for generating anonymized second feature information as the modified second feature information depending on the second feature information, such that the anonymized second feature information is different from the second feature information, wherein the fundamental frequencies generator is configured to generate anonymized fundamental frequency information as the modified fundamental frequency information using the first feature information and using the anonymized second feature information, and wherein the synthesizer is configured to generate the audio output signal using the anonymized fundamental frequency information, using the first feature information and using the anonymized second feature information; and a system for conducting voice de-anonymization, wherein the system for conducting voice anonymization is configured to generate an audio output signal including speech that is anonymized, wherein the system for conducting voice de-anonymization is configured to receive the audio output signal that has been generated by the system for conducting voice anonymization as an audio input signal, and wherein the system for conducting voice de-anonymization is configured to generate an audio output signal from the audio input signal such that the speech in the audio output signal is de-anonymized.
According to another embodiment, a method for conducting voice modification on an audio input signal including speech to obtain an audio output signal may have the steps of: extracting feature information of the speech from the audio input signal, generating modified fundamental frequency information depending on the feature information, such that the modified fundamental frequency information includes modified fundamental frequencies being different from real fundamental frequencies of the speech, and/or such that the modified fundamental frequency information indicates a modified fundamental frequency trajectory being different from a real fundamental frequency trajectory of the speech, and generating the audio output signal depending on the modified fundamental frequency information and depending on the feature information.
Another embodiment may have a non-transitory digital storage medium having a computer program stored thereon to perform the inventive method for conducting voice modification on an audio input signal including speech to obtain an audio output signal, when said computer program is run by a computer.
A system for conducting voice modification on an audio input signal comprising speech to obtain an audio output signal according to an embodiment is provided. The system comprises a feature extractor for extracting feature information of the speech from the audio input signal. Moreover, the system comprises a fundamental frequencies generator to generate modified fundamental frequency information depending on the feature information, such that the modified fundamental frequency information comprises modified fundamental frequencies being different from real fundamental frequencies of the speech, and/or such that the modified fundamental frequency information indicates a modified fundamental frequency trajectory being different from a real fundamental frequency trajectory of the speech. Furthermore, the system comprises a synthesizer for generating the audio output signal depending on the modified fundamental frequency information and depending on the feature information.
Moreover, a method for conducting voice modification on an audio input signal comprising speech to obtain an audio output signal according to an embodiment is provided. The method comprises: extracting feature information of the speech from the audio input signal; generating modified fundamental frequency information depending on the feature information, such that the modified fundamental frequency information comprises modified fundamental frequencies being different from real fundamental frequencies of the speech, and/or such that the modified fundamental frequency information indicates a modified fundamental frequency trajectory being different from a real fundamental frequency trajectory of the speech; and generating the audio output signal depending on the modified fundamental frequency information and depending on the feature information.
Furthermore, a computer program for implementing the above-described method when being executed on a computer or signal processor according to an embodiment is provided.
According to some embodiments, a fundamental frequency trajectory may, e.g., be derived from BN/PPG features and from an anonymized x-vector, e.g., on a frame-by-frame level, using a neural network.
In some embodiments, a classification of voiced and unvoiced frames from BN/PPG features and from an anonymized x-vector on a frame-by-frame level may, e.g., be conducted using a neural network.
Some embodiments relate to deriving fundamental frequencies (F0) from x-vectors and phonetic posteriorgrams (PPG) for voice modification, e.g., voice anonymization.
According to some embodiments, a (e.g., supervised) training of a neural network may, e.g., be conducted using F0 trajectories of speech signals as ground truth and BN/PPG features and x-vectors as input.
In some embodiments, a voice modification system, for example, with BN/PPG feature extraction and with x-vector feature extraction, for example without F0 feature extraction, is provided.
According to an embodiment, a (possibly optional) manipulation (e.g., smoothing, modulation) of a derived F0 trajectory to further anonymize F0 may, e.g., be conducted.
Some embodiments provide a VoicePrivacy system description which realizes speaker anonymization with feature-matched F0 trajectories.
According to an embodiment, a novel method to improve the performance of the VoicePrivacy Challenge 2022 baseline B1 variants is provided. Known deficiencies of x-vector-based anonymization systems include the insufficient disentangling of the input features. In particular, the fundamental frequency (F0) trajectories are used for voice synthesis without any modification. Especially in cross-gender conversion, this causes unnatural-sounding voices, increased word error rates (WERs), and personal information leakage.
Embodiments overcome the problems of conventional technology by synthesizing an F0 trajectory, which better harmonizes with the anonymized x-vector.
Some embodiments utilize a low-complexity deep neural network to estimate an appropriate F0 value per frame, using the linguistic content from the bottleneck features (BN) and the anonymized x-vector. The inventive approach results in a significantly improved anonymization system and increased naturalness of the synthesized voice.
The present invention is inter alia based on the finding that anonymizing speech can be achieved by synthesizing one or more of the following three components, namely, the fundamental frequencies (F0) of the input speech, the phonetic posteriorgrams (also referred to as bottleneck features, BN) and an anonymized x-vector.
Some embodiments are based on the finding that F0 trajectories contribute to anonymization and modifications are promising to improve the performance of the system.
Embodiments may, e.g., apply a correction to the F0 trajectories before the synthesis such that they match the BNs and x-vectors.
For some of the provided embodiments, F0 extraction is not required for voice anonymization.
Embodiments of the present invention will be detailed subsequently referring to the appended drawings.
Before embodiments of the present invention are described in detail, some background information is provided.
However, it has been found in the past that improved systems other than LPC-based voice anonymization systems would be desirable.
Basically, the system of the conventional technology extracts three components from the input speech: the fundamental frequencies, the phonetic posteriorgrams and an x-vector, and the extracted x-vector is then anonymized.
The concept of extracting the x-vector from speech input using a neural network has been proposed in 2018 in [15]: Snyder et al., “X-Vectors: Robust DNN Embeddings for Speaker Recognition”, and is now well-known to the skilled person. The content of that well-known paper, in particular its section 2, is hereby incorporated by reference. The resulting x-vector depends on and characterizes the speech input. Also, [1] explains and employs the extraction of x-vector features in its chapter 3.1, which is herein incorporated by reference.
Using the extracted fundamental frequencies, the obtained phonetic posteriorgrams and the anonymized x-vector, a synthesizer 240 then generates the speech output with the anonymized voice.
[1]: “Speaker Anonymization Using X-vector and Neural Waveform Models”, 2019, proposes a particular, well-known approach in its chapter 3.3 “Waveform Generation”, which is also incorporated herein by reference, to obtain the output speech from the anonymized x-vector. See also [2].
In the system of the conventional technology, the F0 trajectories extracted from the input speech are used for the speech synthesis without any modification.

The inventors have found that it may be beneficial in such a system to not use the F0 trajectories extracted from the input speech for the synthesis, but to instead employ fundamental frequency information that matches the modified features.
In the following, embodiments of the present invention are provided in detail.
The system comprises a feature extractor 210 for extracting feature information of the speech from the audio input signal.
Moreover, the system comprises a fundamental frequencies generator 230 to generate modified fundamental frequency information depending on the feature information, such that the modified fundamental frequency information comprises modified fundamental frequencies being different from real fundamental frequencies of the speech, and/or such that the modified fundamental frequency information indicates a modified fundamental frequency trajectory being different from a real fundamental frequency trajectory of the speech.
Furthermore, the system comprises a synthesizer 240 for generating the audio output signal depending on the modified fundamental frequency information and depending on the feature information.
According to an embodiment, the feature information may, e.g., comprise first feature information and second feature information. The system may, e.g., comprise a modifier 220 for generating modified second feature information depending on the second feature information, such that the modified second feature information is different from the second feature information. The fundamental frequencies generator 230 may, e.g., be configured to generate the modified fundamental frequency information using the first feature information and using the modified second feature information. The synthesizer 240 may, e.g., be configured to generate the audio output signal using the modified fundamental frequency information, using the first feature information and using the modified second feature information.
In an embodiment, the first feature information may, e.g., comprise phonetic posteriorgrams or other bottleneck features of the speech. The fundamental frequencies generator 230 may, e.g., be configured to generate the modified fundamental frequency information using the phonetic posteriorgrams or the other bottleneck features of the speech and using the modified second feature information. The synthesizer 240 may, e.g., be configured to generate the audio output signal using the modified fundamental frequency information, using the phonetic posteriorgrams or the other bottleneck features of the speech and using the modified second feature information.
Bottleneck features of the speech may, for example, be phonetic posteriorgrams of the speech, or may, for example, be triphone-based bottleneck features (see [17]: P. Champion, D. Jouvet, and A. Larcher, “Speaker information modification in the VoicePrivacy 2020 toolchain”; this paper, in particular its chapters 1 to 4, is herewith incorporated by reference). Triphone-based bottleneck features are, unlike the PPGs, by default not sanitized of personal information. Thus, semi-adversarial training may, e.g., be useful.
According to an embodiment, the fundamental frequencies generator 230 may, e.g., be implemented as a machine-trained system and/or may, e.g., be implemented as an artificial intelligence system.
In an embodiment, the fundamental frequencies generator 230 may, e.g., be implemented as a neural network, being configured to receive the first feature information and the modified second feature information as input values of the neural network, wherein the output values of the neural network comprise the modified fundamental frequencies and/or indicate the modified fundamental frequencies trajectory.
According to an embodiment, the neural network of the fundamental frequencies generator 230 may, e.g., comprise one or more fully connected layers such that each node of the one or more fully connected layers depends on all input values of the neural network, such that each node of the fully connected layers depends on the first feature information and depends on the modified second feature information.
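For illustration purposes, such a fully connected network may, e.g., be sketched in PyTorch as follows; the feature dimensions, the layer sizes and the hidden activations are illustrative assumptions and not part of the described embodiments.

```python
import torch
import torch.nn as nn

class F0Regressor(nn.Module):
    """Frame-wise F0 regressor: a BN/PPG frame and a (modified) x-vector
    go in, a predicted (log-)F0 value and a voicing logit come out.
    All dimensions are illustrative assumptions."""

    def __init__(self, bn_dim=256, xvec_dim=512, hidden=256, p_dropout=0.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(bn_dim + xvec_dim, hidden),
            nn.ReLU(),
            nn.Dropout(p_dropout),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2),  # [predicted log-F0, voicing logit]
        )

    def forward(self, bn_frame, xvec):
        # Fully connected: every node sees the complete input, i.e. the
        # concatenation of the BN/PPG frame and the modified x-vector.
        x = torch.cat([bn_frame, xvec], dim=-1)
        out = self.net(x)
        return out[..., 0], out[..., 1]  # F0 prediction, voicing logit
```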
In an embodiment, the neural network of the fundamental frequencies generator 230 has been trained by conducting supervised training of the neural network using fundamental frequencies and/or fundamental frequency trajectories of speech signals.
According to an embodiment, the neural network of the fundamental frequencies generator 230 may, e.g., be a first neural network. The modifier 220 may, e.g., be implemented as a second neural network. The second neural network may, e.g., be configured to receive input values from a plurality of frames of the audio input signal. The second neural network may, e.g., be configured to output the second feature information as its output values.
In an embodiment, the second feature information may, e.g., be an x-vector of the speech.
According to an embodiment, the modifier 220 may, e.g., be configured to generate a modified x-vector as the modified second feature information by choosing, depending on the x-vector of the speech, an x-vector from a group of available x-vectors, such that the x-vector chosen from the group of x-vectors is different from the x-vector of the speech. The first neural network of the fundamental frequencies generator 230 may, e.g., be configured to receive the phonetic posteriorgrams or the other bottleneck features of the speech and the modified x-vector as the input values of the first neural network, and may, e.g., be configured to output its output values comprising the modified fundamental frequencies and/or indicating the modified fundamental frequencies trajectory. The synthesizer 240 may, e.g., be configured to generate the audio output signal using the phonetic posteriorgrams or the other bottleneck features of the speech and using the modified x-vector and depending on the output values of the first neural network that comprise the modified fundamental frequencies and/or that indicate the modified fundamental frequencies trajectory.
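For illustration purposes, such a selection may, e.g., be sketched as follows. The cosine-distance criterion and the threshold below are illustrative assumptions; other selection strategies, such as the PLDA-based selection among farthest speaker candidates used by the challenge baselines, may, e.g., be employed instead.

```python
import torch
import torch.nn.functional as F

def choose_replacement_xvector(xvec, pool, min_distance=0.5):
    """Pick an x-vector from `pool` that differs from the source `xvec`.

    xvec: (D,) x-vector of the input speech.
    pool: (N, D) group of available x-vectors.
    Returns the candidate farthest from the source in cosine distance,
    provided the distance exceeds the (illustrative) threshold.
    """
    sims = F.cosine_similarity(pool, xvec.unsqueeze(0), dim=-1)  # (N,)
    idx = torch.argmin(sims)  # lowest similarity = farthest candidate
    if 1.0 - sims[idx] < min_distance:
        raise ValueError("no sufficiently different x-vector in the pool")
    return pool[idx]

# Example usage with random stand-in vectors.
pool = torch.randn(100, 512)
modified_xvec = choose_replacement_xvector(torch.randn(512), pool)
```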
In an embodiment, the system may, e.g., further comprise an output value modifier 235 for modifying the output values of the first neural network of the fundamental frequencies generator 230 to obtain amended values that comprise amended fundamental frequencies and/or that indicate an amended fundamental frequencies trajectory. The synthesizer 240 may, e.g., be configured to generate the audio output signal using the phonetic posteriorgrams or the other bottleneck features of the speech, using the modified x-vector and using the amended values.
According to an embodiment, the system may, e.g., further comprise a fundamental frequencies extractor 216 for extracting the real fundamental frequencies of the speech. The system may, e.g., comprise a second fundamental frequencies generator 231 for generating second fundamental frequency information using the phonetic posteriorgrams or the other bottleneck features of the speech and using the x-vector of the speech. The system may, e.g., further comprise a first combiner 232 (e.g., a subtractor 232) for generating (e.g., subtracting), depending on the real fundamental frequencies of the speech and depending on the second fundamental frequency information, values indicating a fundamental frequencies residuum. The system may, e.g., comprise a second combiner for combining (e.g., adding) the output values of the first neural network of the fundamental frequencies generator 230 and the values indicating the fundamental frequencies residuum to obtain combined values. The synthesizer 240 may, e.g., be configured to generate the audio output signal using the phonetic posteriorgrams or the other bottleneck features of the speech and using the modified x-vector and depending on the combined values.
In an embodiment, the synthesizer 240 may, e.g., be implemented as a neural vocoder and/or may, e.g., be implemented as a machine-trained system and/or may, e.g., be implemented as an artificial intelligence system and/or may, e.g., be implemented as a neural network.
According to an embodiment, the system may, e.g., be a system for conducting voice anonymization. The speech in the audio input signal may, e.g., be speech that has not been anonymized. The modifier 220 may, e.g., be an anonymizer 221 for generating anonymized second feature information as the modified second feature information depending on the second feature information, such that the anonymized second feature information may, e.g., be different from the second feature information. The fundamental frequencies generator 230 may, e.g., be configured to generate anonymized fundamental frequency information as the modified fundamental frequency information using the first feature information and using the anonymized second feature information. The synthesizer 240 may, e.g., be configured to generate the audio output signal using the anonymized fundamental frequency information, using the first feature information and using the anonymized second feature information.
In an embodiment, the system may, e.g., be a system for conducting voice de-anonymization. The speech in the audio input signal may, e.g., be speech that has been anonymized. The modifier 220 may, e.g., be a de-anonymizer 222 for generating de-anonymized second feature information as the modified second feature information depending on the second feature information, such that the de-anonymized second feature information may, e.g., be different from the second feature information. The fundamental frequencies generator 230 may, e.g., be configured to generate de-anonymized fundamental frequency information as the modified fundamental frequency information using the first feature information and using the de-anonymized second feature information. The synthesizer 240 may, e.g., be configured to generate the audio output signal using the de-anonymized fundamental frequency information, using the first feature information and using the de-anonymized second feature information.
According to an embodiment, the speech in the audio input signal may, e.g., be speech that has been anonymized according to a first mapping rule. The de-anonymizer 222 may, e.g., be configured to generating de-anonymized second feature information depending on the second feature information using a second mapping rule that depends on the first mapping rule. For example, the first and the second mapping rule may, e.g., define a mapping from an x-vector of the speech to a modified x-vector. Or, the first and the second mapping rule may, e.g., define a rule for selecting an x-vector from a plurality of x-vectors as a selected x-vector/as a modified x-vector depending on an (extracted) x-vector of the speech in the audio input signal.
In an embodiment, the system may, e.g., be configured to receive the information on the second mapping rule by receiving a bitstream that comprises the information on the second mapping rule. Or, the system may, e.g., be configured to receive information on the first mapping rule by receiving a bitstream that comprises the information on the first mapping rule, and the system may, e.g., be configured to derive information on the second mapping rule from the information on the first mapping rule.
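For illustration purposes only, the relation between the first and the second mapping rule may, e.g., be sketched as follows. Representing a mapping rule as an index mapping over a pool of x-vectors and serializing it to JSON for the bitstream are illustrative assumptions, not a prescribed format.

```python
import json

def derive_second_rule(first_rule):
    """Derive the de-anonymization (second) mapping rule by inverting the
    anonymization (first) rule; here a rule maps x-vector pool indices."""
    return {v: k for k, v in first_rule.items()}

first_rule = {0: 17, 1: 4, 2: 23}            # original -> replacement index
bitstream = json.dumps(first_rule).encode()  # e.g., conveyed in a bitstream

# Receiver side: parse the bitstream and derive the inverse rule.
received = {int(k): v for k, v in json.loads(bitstream).items()}
second_rule = derive_second_rule(received)
print(second_rule)  # {17: 0, 4: 1, 23: 2}
```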
Moreover, a system is provided. The system comprises a system for conducting voice anonymization, and a system for conducting voice de-anonymization. The system for conducting voice anonymization may, e.g., be configured to generate an audio output signal comprising speech that may, e.g., be anonymized. The system for conducting voice de-anonymization may, e.g., be configured to receive the audio output signal that has been generated by the system for conducting voice anonymization as an audio input signal. Moreover, the system for conducting voice de-anonymization may, e.g., be configured to generate an audio output signal from the audio input signal such that the speech in the audio output signal may, e.g., be de-anonymized.
In the following, particular embodiments of the present invention are described.
The embodiment of the present invention differs from the conventional system in that the fundamental frequencies used for the speech synthesis are not extracted from the input speech.

Instead, the system of this embodiment comprises a fundamental frequencies generator 230 that generates the fundamental frequencies artificially.

For this purpose, in this embodiment, a neural network of the fundamental frequencies generator 230 receives the modified x-vector and the phonetic posteriorgrams or the other bottleneck features as its input.
Thus, rather than from the input speech, the fundamental frequencies/F0 trajectories (which are then used for the speech synthesis, e.g., in synthesizer 240) are generated from the modified x-vector and from the phonetic posteriorgrams or from the other bottleneck features using the neural network, e.g., using a Deep Neural Network (DNN).
Optionally, the system may, for example, also comprise an output value modifier 235 to further modify the modified fundamental frequencies after they have been created (see the modification block 217 of the corresponding figure).
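For illustration purposes, such a further manipulation (e.g., smoothing, modulation; cf. the manipulation described above) may, e.g., be sketched as follows; the window length and the modulation parameters are illustrative assumptions.

```python
import numpy as np

def smooth_and_modulate(f0, window=5, mod_depth=0.02, mod_period=50):
    """Further modify a derived F0 trajectory (cf. output value modifier).

    Applies a moving-average smoothing and then a slow sinusoidal
    modulation to the voiced frames; all parameter values are
    illustrative and would be tuned in practice.
    """
    out = f0.copy()
    voiced = out > 0.0
    kernel = np.ones(window) / window
    smoothed = np.convolve(out, kernel, mode="same")  # simple smoothing
    out[voiced] = smoothed[voiced]
    n = np.arange(len(out))
    out[voiced] *= 1.0 + mod_depth * np.sin(2.0 * np.pi * n[voiced] / mod_period)
    return out
```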
The system of a further embodiment is configured to conduct voice de-anonymization on speech that has previously been anonymized.

For example, the generation of the anonymized x-vector from the x-vector of the speech in the audio output signal generated by the system of the above-described embodiment may, e.g., follow a first mapping rule, such that a receiving system that knows a corresponding second mapping rule can reverse the modification.

The systems of the above-described embodiments provide, inter alia, the following advantages.
For example, a better disentangling of the input speech may, e.g., be achieved by not using F0 trajectories derived from input speech. This results in significantly better voice modification/anonymization/de-anonymization.
Moreover, a potentially better speech synthesis quality may, e.g., be achieved because of harmonized input features.
Furthermore, the provided concept does not affect the word error rate of the modified voice. Moreover, a frame-wise performance may, e.g., be obtained.
Moreover, complexity reduction is achieved by embodiments in which, for example, no F0 extraction from the input speech is required.
In the following, a regression DNN for F0 trajectories according to an embodiment is described.
In an embodiment, F0 trajectories may, e.g., be predicted in logarithmic scale with a global mean-variance normalization. Two output neurons in the last layer signify the predicted pitch value F̂0[n] (no activation function) and the probability pv[n] of the frame signifying a voiced sound (sigmoid activation function). According to this probability, the F0 value for the frame is either passed as is (if the probability is greater than 0.5), or zeroed out (otherwise). The loss function for a batched input may, e.g., take the form of Equation 1,

L = α · MSE(F̂0, F0) + (1 − α) · BCE(pv, v),   (Equation 1)

where ‘MSE(⋅)’ and ‘BCE(⋅)’ denote the ‘mean-squared error’ and the ‘binary cross entropy with logits’ as implemented by PyTorch, the variable v denotes the voiced/unvoiced label of the frame, and α denotes a trade-off parameter balancing the classification and regression tasks.
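For illustration purposes, this output head and loss may, e.g., be sketched in PyTorch as follows. The weighting of the two terms and the masking of unvoiced frames in the regression term are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def f0_loss(f0_hat, voicing_logit, f0_true, v, alpha=0.5):
    """Joint regression/classification loss in the spirit of Equation 1.

    f0_hat:        predicted (normalized log-)F0 per frame.
    voicing_logit: raw logit of the voiced/unvoiced decision.
    f0_true:       ground-truth (normalized log-)F0 per frame.
    v:             voiced/unvoiced labels in {0.0, 1.0} (float tensor).
    alpha:         trade-off between regression and classification.
    """
    # Restrict the regression error to voiced frames by masking with v
    # (an illustrative choice); BCE operates on the raw logits.
    mse = F.mse_loss(f0_hat * v, f0_true * v)
    bce = F.binary_cross_entropy_with_logits(voicing_logit, v)
    return alpha * mse + (1.0 - alpha) * bce

def gate_f0(f0_hat, voicing_logit):
    """Pass the F0 value where the voicing probability exceeds 0.5,
    zero it out otherwise, as described above."""
    p_v = torch.sigmoid(voicing_logit)
    return torch.where(p_v > 0.5, f0_hat, torch.zeros_like(f0_hat))
```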
As in the embodiment described above, the fundamental frequencies generator 230 generates the modified fundamental frequencies from the modified x-vector and from the phonetic posteriorgrams or the other bottleneck features.

In addition, in this embodiment, a fundamental frequencies extractor 216 extracts the real fundamental frequencies from the input speech, and another fundamental frequencies generator 231 generates artificial fundamental frequencies from the (unmodified) x-vector of the speech and from the phonetic posteriorgrams or the other bottleneck features.
Then, in subtractor 232, a subtraction is conducted between the real fundamental frequencies, extracted from the input speech by the fundamental frequencies extractor 216, and the artificial fundamental frequencies, generated by the other fundamental frequencies generator 231. What remains after the subtraction is an F0 residuum that still comprises, for example, the excitation of the input speech but without the real fundamental frequencies.
Optionally, a strength control 233 may, e.g., amplify or attenuate this F0 residuum. The strength control 233 may thus, e.g., allow leakage of utterance-specific F0 character to be added to the speech synthesis.
A combiner (not shown) may, e.g., then combine (for example, add) the F0 residuum to the modified fundamental frequencies generated by fundamental frequencies generator 230.
This approach makes it possible to keep, in the output speech, some signal properties of the input speech that are not related to the fundamental frequencies.
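For illustration purposes, this residuum path may, e.g., be sketched as follows; 'f0_from_original' and 'f0_from_modified' stand for the outputs of the fundamental frequencies generators 231 and 230, respectively, and the default strength value is an illustrative assumption.

```python
import numpy as np

def f0_with_residuum(f0_real, f0_from_original, f0_from_modified,
                     strength=0.5):
    """Combine the synthetic F0 trajectory with a scaled F0 residuum.

    f0_real:           F0 extracted from the input speech (extractor 216).
    f0_from_original:  F0 predicted from BN/PPG + original x-vector (231).
    f0_from_modified:  F0 predicted from BN/PPG + modified x-vector (230).
    strength:          gain of the strength control 233 (0 = no leakage
                       of utterance-specific F0 character).
    """
    residuum = f0_real - f0_from_original           # subtractor 232
    return f0_from_modified + strength * residuum   # combiner
```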
In the following, training strategies and hyperparameter optimization are considered.
A DNN according to an embodiment may, e.g., be implemented using PyTorch [9], and may, e.g., be trained using PyTorch Ignite [10].
All files in the libri-dev-* and vctk-dev-* subsets may, e.g., be concatenated into a single tall matrix; then a random (90%, 10%) train-validation split is performed, allowing frames from different utterances to be present in a single batch. In an embodiment, early stopping is employed after 10 epochs without improvement, together with learning rate reduction (multiplication by 0.1 after 5 epochs without improvement in validation loss).
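For illustration purposes, this training setup may, e.g., be sketched with plain PyTorch utilities as follows (the PyTorch Ignite handlers of [10] provide equivalent functionality); the stand-in data, the dimensions, the batch size and the simplified loss are illustrative assumptions.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset, random_split

# Stand-in for the single tall matrix of concatenated dev-set frames.
inputs = torch.randn(10000, 768)            # BN/PPG frame + x-vector
targets = torch.randn(10000, 2)             # [normalized log-F0, voicing]
targets[:, 1] = (targets[:, 1] > 0).float()

dataset = TensorDataset(inputs, targets)
n_train = int(0.9 * len(dataset))           # random (90%, 10%) split;
train_set, val_set = random_split(          # frames of different
    dataset, [n_train, len(dataset) - n_train])  # utterances share batches

model = torch.nn.Sequential(torch.nn.Linear(768, 256), torch.nn.ReLU(),
                            torch.nn.Linear(256, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
sched = torch.optim.lr_scheduler.ReduceLROnPlateau(
    opt, factor=0.1, patience=5)            # x0.1 after 5 stagnant epochs

best, stale = float("inf"), 0
for epoch in range(100):
    for x, y in DataLoader(train_set, batch_size=512, shuffle=True):
        opt.zero_grad()
        out = model(x)
        loss = (F.mse_loss(out[:, 0], y[:, 0])
                + F.binary_cross_entropy_with_logits(out[:, 1], y[:, 1]))
        loss.backward()
        opt.step()
    val_loss, n = 0.0, 0
    with torch.no_grad():
        for x, y in DataLoader(val_set, batch_size=1024):
            val_loss += F.mse_loss(model(x)[:, 0], y[:, 0],
                                   reduction="sum").item()
            n += len(x)
    val_loss /= n
    sched.step(val_loss)
    if val_loss < best:
        best, stale = val_loss, 0
    else:
        stale += 1
        if stale >= 10:                     # early stopping
            break
```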
For conventional systems, Optuna [11] tunes the learning rate lr, the trade-off parameter α and the dropout probability p. Optimal values obtained after 50 trials are listed in Table 1. However, the inventors have found that a system according to an embodiment may, e.g., perform better without dropout. Thus, for some embodiments, p may, e.g., be set to p=0.
The above table depicts the hyperparameter values obtained using Optuna.
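For illustration purposes, such a hyperparameter search may, e.g., be sketched with Optuna as follows; 'train_and_validate' is a hypothetical helper wrapping the training procedure described above, and the search ranges are illustrative assumptions.

```python
import optuna

def objective(trial):
    # Search spaces are illustrative; tuned are lr, alpha and p.
    lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)
    alpha = trial.suggest_float("alpha", 0.0, 1.0)
    p = trial.suggest_float("dropout", 0.0, 0.5)
    # train_and_validate is a hypothetical helper that trains the F0
    # regressor with these values and returns the validation loss.
    return train_and_validate(lr=lr, alpha=alpha, dropout=p)

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)  # 50 trials, as described above
print(study.best_params)
```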
In the following, embodiments of the present invention are evaluated.
Regarding an analysis of the generated F0 trajectories, the inventors have verified the performance of the F0 regressor by visualizing the reconstructions for matched x-vectors and for cross-gender x-vectors. The latter allows evaluating the generalization capabilities.
Evaluation has also been conducted with respect to a challenge framework. The inventors have executed evaluation scripts provided by the challenge organizers. As a system according to a particular embodiment did not include a tunable parameter that governs the trade-off between the equal error rate (EER) and WER, the inventors have submitted a single set of results.
As can be seen from the corresponding results table, the provided system performs significantly better than any other baseline system (cf. [13]). For the VCTK conditions, the WER scores also improve. For every data subset, the pitch correlation ρF0 also satisfies the challenge requirement.
Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus. Some or all of the method steps may be executed by (or using) a hardware apparatus, like for example, a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps may be executed by such an apparatus.
Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software or at least partially in hardware or at least partially in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.
Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.
Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may for example be stored on a machine readable carrier.
Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.
In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein. The data carrier, the digital storage medium or the recorded medium are typically tangible and/or non-transitory.
A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet.
A further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.
A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
A further embodiment according to the invention comprises an apparatus or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may, for example, be a computer, a mobile device, a memory device or the like. The apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.
In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are performed by any hardware apparatus.
The apparatus described herein may be implemented using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer.
The methods described herein may be performed using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer.
While this invention has been described in terms of several advantageous embodiments, there are alterations, permutations, and equivalents, which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations, and equivalents as fall within the true spirit and scope of the present invention.
This application is a continuation of copending International Application No. PCT/EP2023/071584, filed Aug. 3, 2023, which is incorporated herein by reference in its entirety, and additionally claims priority from European Application No. EP 22189150.0, filed Aug. 5, 2022, which is also incorporated herein by reference in its entirety.

The present invention relates to voice modification, and, in particular, to a system and a method for voice modification.
Related application data: parent application PCT/EP2023/071584, filed Aug. 2023 (WO); child application 19019980 (US).