The accompanying drawings illustrate a number of exemplary embodiments and are a part of the specification. Together with the following description, these drawings demonstrate and explain various principles of the present disclosure.
Throughout the drawings, identical reference characters and descriptions indicate similar, but not necessarily identical, elements. While the exemplary embodiments described herein are susceptible to various modifications and alternative forms, specific embodiments have been shown by way of example in the drawings and will be described in detail herein. However, the exemplary embodiments described herein are not intended to be limited to the particular forms disclosed. Rather, the present disclosure covers all modifications, equivalents, and alternatives falling within the scope of the appended claims.
Audio data can include unwanted signals that interfere with the quality of wanted signals. For example, background noise in a call may obstruct the voice of a speaker, making it difficult to discern words spoken during the call. To reduce the effect of unwanted noise, various noise suppression techniques may attempt to enhance incoming signals or suppress unwanted signals. In some cases, methods for noise suppression may estimate noise signals and develop filters to suppress the noise. However, it may be difficult to accurately and efficiently estimate noise, particularly for real-time audio streaming where mouth-to-ear latency is an issue.
Some noise suppression methods may attempt to mask some signals and increase the gain on other signals on a magnitude spectrum. Other methods may focus on post-processing techniques to hide residual noise or to exploit traits of human perception to enhance the audio. However, complex techniques may also increase the time and processing power required to perform the techniques, thereby increasing the latency or impacting the battery life of devices. In some embodiments, models using deep learning methods may have both algorithm latency caused by complex models and operating latency caused by processor use. For example, in some types of models, frames of audio may be convolved and concatenated with previous and/or subsequent frames for processing, which may be a time-consuming and processor-intensive process. In a live streaming scenario, such as during a call between multiple users, latency in processing audio may be especially problematic. On the other hand, models that attempt to reduce latency may face tradeoffs with the complexity of the model, which may reduce the quality of the noise suppression. For example, smaller models with lower complexity may be designed to quickly process audio, but this may compromise the quality of audio and may inaccurately reduce wanted signals or may incompletely suppress unwanted background noise. Thus, better methods of performing noise suppression are needed to minimize latency while improving accuracy.
The present disclosure is generally directed to systems and methods for noise suppression. As will be explained in greater detail below, embodiments of the present disclosure may, by training and optimizing a neural network model, create a filter to quickly process noisy audio and produce a cleaned audio clip. For example, by training a model to recognize spoken words and remove interference, the disclosed systems and methods may process speech for an audio application, such as a communications application, or for other machine learning applications like automatic speech recognition software. By applying machine learning to pairs of noisy audio clips and associated clean audio clips, the systems and methods described herein may first train a neural network model to address the quality loss in audio signals. The disclosed systems and methods may then quickly process audio, such as by applying a filter derived from neural networks to live streaming audio, to produce a cleaned version of the original audio using the trained model. Additionally, the trained model may include layers of encoding, layers of decoding, and a feature processor to process the audio and transform it into a cleaned and filtered version of the audio.
The disclosed systems and methods may also optimize the model to reduce latency and to reduce required processing power. The disclosed systems and methods may divide an audio clip or a captured live-streamed media clip into frames of time to process one frame at a time. For some models, concatenating separate frames of the audio, with each frame truncated to a short period of time, may help to determine a word spoken over a period of several frames by deriving context from neighboring frames to predict the content of a single frame. However, the process of concatenating data from multiple frames may be costly in terms of time and processing power. Rather than combining input from multiple frames into a single input tensor using concatenation, the systems and methods described herein may use indirection buffers to point to separate tensor locations of input from multiple frames to avoid the costly process of concatenation. The disclosed systems and methods may then apply the optimized model on each frame to encode, process, and decode the frame to identify a signal of interest, such as speech, and potential noisy signals. The systems and methods described herein may also perform pre-processing and post-processing steps to create a final clean audio clip combining the divided frames. Furthermore, the disclosed systems and methods may iteratively improve the trained model by retraining it using the captured or live-streaming audio and the processed clean audio.
In addition, the systems and methods described herein may improve the functioning of a computing device by improving the noise suppression model to increase the speed of audio processing and reduce the utilization of processing power. These systems and methods may also improve the fields of audio processing and communications by quickly and accurately reducing noise and artifacts from both recorded audio and live audio using a neural network model. Thus, the disclosed systems and methods may improve over typical methods of noise suppression.
Features from any of the embodiments described herein may be used in combination with one another in accordance with the general principles described herein. These and other embodiments, features, and advantages will be more fully understood upon reading the following detailed description in conjunction with the accompanying drawings and claims.
The following will provide, with reference to
As illustrated in
The systems described herein may perform step 110 in a variety of ways. In one example, media clip 212 may include an audio clip, a video clip, and/or a multimedia clip. In some examples, media clip 212 may represent a recorded media clip, and capture module 204 may extract audio data from media clip 212. In other examples, media clip 212 may represent live-streaming media, such as a teleconferencing call, and capture module 204 may capture media clip 212 in real time, such as by continuously capturing short segments of the audio. In these examples, performing noise suppression on media clip 212 may require reduced latency to avoid delays in communication between users. For example, a user of computing device 202 may be holding a video conference with a remote user of a different computing device. In this example, a delay of one second caused by audio processing may disrupt a conversation between the users, degrading the experience of using the video conferencing software. Thus, capture module 204 may capture very short bursts of audio, such as clips 10 milliseconds long at a time, to quickly process the audio before moving to the next segment of audio as it is received. In these examples, media clip 212 may represent audio to be sent from computing device 202 to a different computing device or audio received by computing device 202 from the different computing device. Additionally, in some examples, computing device 202 may act as an intermediary between client devices to process media clip 212 before forwarding cleaned audio to a client device.
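The sketch below illustrates one way such short-burst capture might look in practice. It is a minimal, non-limiting illustration: the 16 kHz sample rate and the stream object's read() interface are assumptions for the sketch, not details drawn from this disclosure.

```python
import numpy as np

SAMPLE_RATE = 16_000  # assumed wideband-speech rate; not fixed by this disclosure
FRAME_MS = 10         # short capture bursts, as described above
FRAME_SAMPLES = SAMPLE_RATE * FRAME_MS // 1000  # 160 samples per 10 ms burst

def capture_frames(stream):
    """Yield fixed-length audio bursts from a live source.

    `stream` is a hypothetical object exposing read(n), standing in for a
    microphone or network feed.
    """
    while True:
        frame = stream.read(FRAME_SAMPLES)
        if frame is None or len(frame) < FRAME_SAMPLES:
            break  # end of stream; a short tail could instead be zero-padded
        yield np.asarray(frame, dtype=np.float32)
```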
In one example, computing device 202 of
Furthermore, in some embodiments, computing device 202 may be in communication with a server or other computing devices and systems via a wireless or wired network. In some examples, the term “network” may refer to any medium or architecture capable of facilitating communication or data transfer. Examples of networks include, without limitation, an intranet, a Wide Area Network (WAN), a Local Area Network (LAN), a Personal Area Network (PAN), the Internet, Power Line Communications (PLC), a cellular network (e.g., a Global System for Mobile Communications (GSM) network), or the like.
Returning to
The systems described herein may perform step 120 in a variety of ways. As discussed above, in some examples, division module 206 may divide media clip 212 into 10 ms frames of audio. In other examples, division module 206 may divide media clip 212 based on an optimal length of time for processing audio to suppress noise. For example, the disclosed systems and methods may determine a length of time to process the audio and output a clean version of the audio without significantly disrupting a live streaming session for a user. In other examples, the disclosed systems and methods may determine the predetermined length of time based on the amount of time taken to perform the disclosed noise suppression methods. For example, for a 10 ms frame of audio, the disclosed systems and methods may perform noise suppression within 3 ms, thereby performing the process quickly enough to avoid delays longer than the 10 ms length of time of the frame. For processing delays longer than 10 ms, a system may need to discard the frame of audio during live streaming to avoid accrued delays. In other words, the processing time for a frame may ideally be shorter than the predetermined length of time such that a frame is processed while a next frame is captured.
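As an illustrative sketch only, dividing a clip into fixed frames of a predetermined length might be implemented as follows; the 16 kHz rate is an assumption, and dropping (rather than padding) a short trailing segment is one of several reasonable choices.

```python
import numpy as np

def divide_into_frames(audio, sample_rate=16_000, frame_ms=10):
    """Split a 1-D audio array into consecutive frames of frame_ms each."""
    frame_len = sample_rate * frame_ms // 1000
    n_frames = len(audio) // frame_len
    # A trailing segment shorter than one frame is dropped here; it could
    # instead be zero-padded to form a final frame.
    return audio[: n_frames * frame_len].reshape(n_frames, frame_len)
```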
Returning to
The systems described herein may perform step 130 in a variety of ways. In some embodiments, trained neural network model 220 may be trained using a machine learning method on pairs of clean audio samples and noisy audio samples to create a filter or a mask for noisy audio. In other embodiments, trained neural network model 220 may be trained to produce clean audio directly from noisy audio, such as by transforming noisy audio clips into clean audio clips. In some examples, the terms “neural network” and “neural network model” may refer to a machine learning model that can learn from labeled or unlabeled data using multiple processing layers to estimate functions. For example, a deep belief neural network may use unsupervised training of input data to detect features within the data. Other examples of neural networks may include, without limitation, linear neural networks, convolutional neural networks, recurrent neural networks, memory networks, encoder-decoder networks, and/or any other suitable form of artificial neural network used for learning from data. In some examples, the term “machine learning” may refer to a computational algorithm that may learn from data in order to make predictions. Examples of machine learning may include, without limitation, support vector machines, neural networks, clustering, decision trees, regression analysis, classification, variations or combinations of one or more of the same, and/or any other suitable supervised, semi-supervised, or unsupervised methods. In some examples, the terms “filter” and “mask” may refer to a process of transforming data by excluding or enhancing specific portions or types of data, such as by removing frequencies determined to be noise from audio data.
In some examples, the term “quantization” may refer to a process of simplifying data to increase the processing speed of the data. For example, data that is stored in 32 bits may be quantized to be stored in 4 bits, thereby reducing the amount of time to process the smaller amount of data. In some examples, the term “tensor” may refer to a mathematical construct or data format of potentially high dimensionality that describes relationships between sets of objects in a vector space. For example, scalar data may be represented by a single number or dimension, and vector data may be represented by a list of numbers. In this example, a matrix may be represented by a higher dimension of numbers, such as an array with rows and columns. In this example, tensor data may encompass scalar data, vector data, and matrices, and tensor data may include additional dimensions, such as a matrix of matrices, as well as the relationship between these dimensions. Thus, a single tensor datapoint may include a more complex representation of data that may otherwise require multiple datapoints of other types of data. By training trained neural network model 220 to use input tensors, the disclosed systems and methods may enable trained neural network model 220 to process fewer individual inputs. By quantizing trained neural network model 220, the disclosed systems and methods may also simplify the complex input data represented by the input tensors. Thus, the disclosed systems and methods may create a more compact model that runs faster than the same model without quantization of the input audio data. In some examples, quantization may be performed dynamically, statically, and/or using any other variation of quantization that improves speed.
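As one concrete, non-limiting illustration of dynamic quantization, PyTorch can convert the linear layers of a floating-point model to 8-bit integer weights and quantize activations on the fly; the toy model below is a stand-in, and the int8 width shown is one common choice rather than the 4-bit example given above.

```python
import torch
import torch.nn as nn

# Stand-in model; only its linear (matmul-heavy) layers are quantized.
model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 256))

# Dynamic quantization stores weights as int8 and quantizes activations
# per call, shrinking the model and speeding up inference.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)
```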
In the above embodiments, one or more machine learning methods may train trained neural network model 220 on pairs of clean audio samples and noisy audio samples, with each noisy audio sample representing a version of the corresponding clean audio sample that includes more unwanted noise. In some examples, the noisy audio samples may include one or more noisy audio clips, and the corresponding clean audio clips may represent the noisy audio clips pre-processed for noise suppression. For example, noisy audio samples may be collected and processed using trained neural network model 220 or a pre-existing filter to create the corresponding clean audio samples. The disclosed systems and methods may then train or retrain trained neural network model 220 using the corresponding pairs of clean and noisy audio samples. In other examples, the noisy audio samples may include clean audio clips transformed into noisy audio clips using data augmentation. In these examples, the noisy audio samples may represent target clean audio with intentionally introduced noise that trained neural network model 220 is trained to remove. For example, an audio clip of a user speaking may be augmented by distorting the speech and introducing additional background noise, such as wind sound or other voices. In some examples, the same clean audio sample or multiple clean audio samples may be distorted or augmented multiple ways to create multiple pairs of clean and noisy audio samples, thereby training trained neural network model 220 to recognize different types of noise. For example, the disclosed systems and methods may perform data augmentation on-the-fly to simulate multiple speakers, simulate multiple sources of sound such as multiple microphones, insert silence, create gain variation, add reverberation, and/or apply any other form of noise simulation. Additionally, the training data composed of clean audio samples and noisy audio samples may be transformed intentionally and/or randomly.
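A minimal sketch of one such augmentation, mixing a clean clip with recorded noise at a random signal-to-noise ratio, appears below; the SNR range and mixing scheme are illustrative assumptions, not parameters taken from this disclosure.

```python
import numpy as np

def augment(clean, noise, rng, snr_db_range=(0.0, 20.0)):
    """Mix a clean clip with noise at a random signal-to-noise ratio.

    One simple on-the-fly augmentation; gain variation, reverberation,
    or inserted silence could be layered on in the same way. `noise`
    is assumed to be at least as long as `clean`.
    """
    snr_db = rng.uniform(*snr_db_range)
    clean_power = np.mean(clean ** 2) + 1e-12
    noise_power = np.mean(noise ** 2) + 1e-12
    # Scale noise so that 10*log10(clean_power / scaled_noise_power) == snr_db.
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise[: len(clean)]
```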
In one embodiment, trained neural network model 220 may be trained by comparing the noisy audio samples to the clean audio samples to determine a set of losses. In this embodiment, a loss may represent the difference between a noisy audio sample and the corresponding clean audio sample, such as by comparing a difference in volume at different frequencies. In some examples, the set of losses may be used to create the filter to process noisy audio and remove potential noise as detected by trained neural network model 220. In some embodiments, trained neural network model 220 may be trained by jointly using the clean audio samples and the noisy audio samples with specially designed losses.
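The following sketch shows one generic supervised step over a (noisy, clean) pair; the L1 waveform loss is a simple stand-in, since the specially designed losses mentioned above are not specified here.

```python
import torch
import torch.nn.functional as F

def training_step(model, optimizer, noisy, clean):
    """One supervised training step on a (noisy, clean) audio pair."""
    optimizer.zero_grad()
    denoised = model(noisy)            # model's attempt to clean the input
    loss = F.l1_loss(denoised, clean)  # distance from the clean target
    loss.backward()
    optimizer.step()
    return loss.item()
```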
In some examples, trained neural network model 220 may include one or more of a set of encoder layers, a feature processor, and/or a set of decoder layers. In some examples, the term “encoder” may refer to a machine learning mechanism that processes data to extract features that may be used to classify or label data for further analysis and that transforms variable data into a state with a fixed shape. In some examples, the term “decoder” may refer to a machine learning mechanism that corresponds with an encoder to map the fixed shape data to variable data. For example, an encoder may transform variable audio data into a feature tensor, and a decoder may take the feature tensor to output transformed audio data. In some examples, the term “feature processor” may refer to a form of neural network capable of processing features of data, such as features extracted by an encoder, to learn from the data. Examples of feature processors may include, without limitation, gated recurrent units (GRUs), long short-term memory (LSTM) neural networks, temporal convolutional networks (TCNs), latency-controlled bidirectional LSTM neural networks, split-shuffled LSTM neural networks, and/or any other form of neural networks capable of learning from data, particularly sequential data.
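A skeletal version of such an encoder/feature-processor/decoder arrangement is sketched below, using strided convolutions for encoding, a GRU as the feature processor, and transposed convolutions for decoding; the channel sizes, layer counts, and kernel sizes are illustrative assumptions, not parameters taken from this disclosure.

```python
import torch
import torch.nn as nn

class DenoiserSkeleton(nn.Module):
    """Encoder layers -> feature processor (GRU) -> decoder layers."""

    def __init__(self, channels=32, layers=3):
        super().__init__()
        self.encoders = nn.ModuleList(
            nn.Conv1d(1 if i == 0 else channels, channels, 4, stride=2)
            for i in range(layers))
        self.feature_processor = nn.GRU(channels, channels, batch_first=True)
        self.decoders = nn.ModuleList(
            nn.ConvTranspose1d(channels, 1 if i == layers - 1 else channels,
                               4, stride=2)
            for i in range(layers))

    def forward(self, x, state=None):
        # x: (batch, 1, samples) -> downsampled feature map
        for enc in self.encoders:
            x = torch.relu(enc(x))
        # The GRU expects (batch, time, features) with batch_first=True.
        y, state = self.feature_processor(x.transpose(1, 2), state)
        x = y.transpose(1, 2)
        for dec in self.decoders:
            x = dec(x)
        return x, state  # processed audio and recurrent state for the next frame
```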
In some embodiments, trained neural network model 220 may be quantized and sped up by using indirection buffers to point to separate input tensor locations during encoding. In some examples, the term “indirection buffer” may refer to a buffer of pointers that indicate the location of data, such that an indirection buffer may be passed as data in place of the original data. For example,
In the above embodiments, an encoder layer in set of encoder layers 402 may save a state of a frame as a first input tensor of the frame, may save an output of a previous encoder layer for a previous frame occurring chronologically before the frame as a second input tensor of the frame, may use the indirection buffers to identify a location of the first input tensor and a location of the second input tensor, and may output an encoding of the frame using the first input tensor and the second input tensor. For example, an encoder may use depthwise separable convolutions or a single convolutional neural network block. As shown in
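The toy NumPy sketch below conveys the idea behind indirection buffers: the two per-frame input tensors stay wherever they already live, and the buffer records only where to find each logical element, so no concatenated tensor is ever materialized. A production kernel would work with raw pointer arrays rather than a Python class; this is an assumption-laden illustration, not the disclosure's implementation.

```python
import numpy as np

class IndirectionBuffer:
    """Index across several tensors without concatenating them."""

    def __init__(self, *tensors):
        self.parts = tensors  # references to the original tensors, not copies
        self.offsets = np.cumsum([0] + [len(t) for t in tensors])

    def __getitem__(self, i):
        # Map a logical index onto the tensor that actually holds it.
        part = int(np.searchsorted(self.offsets, i, side="right")) - 1
        return self.parts[part][i - self.offsets[part]]

# The previous layer's output for the prior frame and the current frame's
# state stay in place; the buffer only records where they live.
previous_output = np.ones(160, dtype=np.float32)
current_state = np.zeros(160, dtype=np.float32)
window = IndirectionBuffer(previous_output, current_state)
assert window[0] == 1.0 and window[160] == 0.0
```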
In some examples, a decoder layer in set of decoder layers 410 may decode a frame of the set of frames, may save a state of the decoded frame, and may save a partial output as a state of a subsequent decoder layer for a subsequent frame occurring chronologically after the frame. For example, a decoder may use transposed convolutions. As shown in
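To make the partial output concrete, the sketch below hand-rolls a one-dimensional stride-2 transposed convolution over a single frame: the tail of each frame's output overlaps the head of the next frame's output, so the tail is carried forward as saved state instead of being recomputed. The kernel values and sizes are illustrative assumptions.

```python
import numpy as np

def decode_frame(frame, kernel, carry, stride=2):
    """Transposed convolution of one frame, carrying the overlap forward."""
    out_len = (len(frame) - 1) * stride + len(kernel)
    out = np.zeros(out_len, dtype=np.float32)
    for i, x in enumerate(frame):
        out[i * stride : i * stride + len(kernel)] += x * kernel
    out[: len(carry)] += carry     # fold in the previous frame's tail
    emit = len(frame) * stride     # samples that are final now
    return out[:emit], out[emit:]  # (decoded audio, new partial output)

kernel = np.array([0.25, 0.5, 0.5, 0.25], dtype=np.float32)
carry = np.zeros(len(kernel) - 2, dtype=np.float32)
decoded, carry = decode_frame(np.zeros(80, dtype=np.float32), kernel, carry)
```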
In one example, trained neural network model 220 may be used to detect speech. In this example, frames before and/or after a current frame may help determine what words are spoken during the current frame. Thus, by saving outputs and partial outputs at each encoder layer and each decoder layer, trained neural network model 220 may use previous frames to detect what is in current frames at each layer. In other examples, such as for recorded media clip 212 stored in a saved file, encoder layers and decoder layers may additionally use contextual data from subsequent frames to improve processing of each current frame. In these examples, trained neural network model 220 may have relaxed latency requirements and may enable processing of wideband audio with more audio context.
In some embodiments, the disclosed systems and methods may further include improving trained neural network model 220 by combining a gating mechanism and a normalization process to process an output of an encoder layer of set of encoder layers 402 to create an input of a next encoder layer. Additionally or alternatively, the disclosed systems and methods may combine the gating mechanism and the normalization process to process an output of a decoder layer of set of decoder layers 410 to create an input of a next decoder layer. In some examples, the term “gating mechanism” may refer to a neural network technique to pass data and information forward and to store data to update current states. In some examples, the term “normalization” may refer to a method to adjust the values of input data to conform to a common scale. For example, frames of audio data may be resampled to a range of frequencies and/or normalized to a volume range. The frames of audio data may then be processed and passed from one encoder layer to the next and/or from one decoder layer to the next to update the states of encoder and decoder layers, such as by using gated linear unit (GLU) layers and separate normalization layers. In this example, by fusing the GLU and normalization layers, the frames of audio data may be normalized while being passed to subsequent encoder or decoder layers to reduce the processing time of performing these steps separately. Additionally, the disclosed systems and methods may vectorize existing operators to streamline data processing. For example, trained neural network model 220 may take input audio from media clip 212, normalize the audio, output the processed and normalized audio, and reverse the normalization of the output.
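A compact sketch of such a fused gating-plus-normalization block is shown below; folding the two steps into one module stands in for the combined pass described above, and the use of LayerNorm in particular is an assumption for the sketch.

```python
import torch
import torch.nn as nn

class FusedGLUNorm(nn.Module):
    """Apply GLU gating and normalization in a single pass."""

    def __init__(self, channels):
        super().__init__()
        self.proj = nn.Linear(channels, 2 * channels)  # values and gates
        self.norm = nn.LayerNorm(channels)

    def forward(self, x):
        value, gate = self.proj(x).chunk(2, dim=-1)
        return self.norm(value * torch.sigmoid(gate))  # gate, then normalize
```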
In some embodiments, the disclosed systems and methods may quantize one or more of linear layers 408(1)-(N) of feature processor 406 and/or combine operations to combine outputs of linear layers 408(1)-(N) of feature processor 406. For example, linear layers of a GRU type of feature processor may be dynamically quantized to reduce the complexity of the data processed by feature processor 406 and/or to reduce the complexity of feature processor 406 itself. As another example, by combining the outputs of linear layers 408(1)-(N), the disclosed systems and methods may avoid materialization of intermediate output tensors, which reduces computing time and complexity.
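The sketch below illustrates both ideas at toy scale: three parallel gate projections are collapsed into a single matrix multiply, so no separate intermediate output tensors are materialized, and a GRU's internal linear layers are dynamically quantized to int8. The sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Three parallel gate projections, as in a GRU-style cell.
gates = nn.ModuleList(nn.Linear(32, 32) for _ in range(3))

# Combined form: stack the weights once and run a single matmul that
# yields all three outputs, avoiding three intermediate tensors.
w = torch.cat([g.weight for g in gates], dim=0)  # (96, 32)
b = torch.cat([g.bias for g in gates], dim=0)    # (96,)
x = torch.randn(1, 32)
update, reset, candidate = torch.split(x @ w.t() + b, 32, dim=-1)

# Dynamic quantization of the GRU's internal linear layers:
gru = torch.ao.quantization.quantize_dynamic(
    nn.GRU(32, 32, batch_first=True), {nn.GRU}, dtype=torch.qint8
)
```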
To further reduce computing and processor usage, trained neural network model 220 may also use internal state management, simplify multiple layers of trained neural network model 220, simplify embedding management, quantize GRUs in trained neural network model 220, combine operations that combine the output of linear layers 408(1)-(N), and/or perform various other methods to streamline trained neural network model 220. For example, by vectorizing some operations of trained neural network model 220, the disclosed systems and methods may improve the speed of the operations. In some examples, the disclosed systems and methods may additionally include steps to perform acoustic echo cancellation and/or acoustic echo suppression to specifically reduce noise produced by echo recorded by sensors or microphones. In some examples, trained neural network model 220 may avoid or replace costly convolution processes and/or modify convolutions to accept multiple inputs. In some examples, using dynamic quantization in trained neural network model 220 may reduce overhead and latency in comparison to static quantization operations and may adjust to different input data. In some examples, trained neural network model 220 may balance the complexity of quantized data with the quality of audio processing to determine an optimal degree of quantization.
In some embodiments, performance module 208 may perform noise suppression process 218 by identifying a signal of interest, detecting the signal of interest in one or more frames of set of frames 214 using trained neural network model 220, and filtering one or more other signals from the one or more frames of set of frames 214. In these embodiments, performance module 208 may detect the signal of interest by extracting a set of parameters from trained neural network model 220 to perform just-in-time compilation and deployment of noise suppression process 218. For example, performance module 208 may determine that speech is a signal of interest, detect speech in live streaming audio of media clip 212, and extract parameters that identify speech to perform just-in-time filtering of background noise from media clip 212 while media clip 212 is streaming. By performing state management internally, trained neural network model 220 may avoid passing states back and forth, and operations to forward states may assume states are valid.
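One plausible reading of just-in-time compilation and deployment is sketched below with TorchScript tracing: the trained parameters travel with the compiled artifact, which can then be loaded and run wherever the suppression is performed. The stand-in model and frame size are assumptions, not the disclosure's mechanism.

```python
import torch
import torch.nn as nn

# Stand-in for the trained model; the real architecture is not shown here.
model = nn.Sequential(nn.Conv1d(1, 8, 4, stride=2), nn.ReLU(),
                      nn.ConvTranspose1d(8, 1, 4, stride=2))

example_frame = torch.randn(1, 1, 160)  # one 10 ms frame at an assumed 16 kHz
compiled = torch.jit.trace(model, example_frame)
compiled.save("noise_suppressor.pt")    # deployable, self-contained artifact
```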
Returning to
The systems described herein may perform step 140 in a variety of ways. In one embodiment, creation module 210 may process the output of trained neural network model 220 to create clean media clip 224 by combining individual processed frames of set of frames 214. For example, trained neural network model 220 may process one frame at a time, and clean media clip 224 may represent a combination of the processed frames reconstituted into a single media clip. In some examples, the term “clean media clip” may refer to a media clip that has been processed or enhanced, such as by trained neural network model 220. For example, trained neural network model 220 may be trained to reduce artifacts in processed speech, and clean media clip 224 may represent a version of media clip 212 with enhanced quality of speech for clarity to a listener, such as the example of
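When frames are processed independently and back-to-back, reconstitution can be as simple as concatenation, as sketched below; an overlap-add or short crossfade would be needed instead if the frames overlapped.

```python
import numpy as np

def reassemble(processed_frames):
    """Concatenate per-frame model outputs back into one clip."""
    return np.concatenate(list(processed_frames))
```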
In some embodiments, the systems and methods disclosed herein may further include iteratively improving trained neural network model 220 by retraining trained neural network model 220 with media clip 212 and clean media clip 224. For example, as illustrated in
As explained above in connection with method 100 in
By simplifying and combining operations such as convolutions and concatenation and by quantizing input data, the disclosed systems and methods may enable complex models that reduce runtime complexity and latency without compromising audio quality. Additionally, by enabling the model to quickly process a frame and using contextual information from a previous frame more efficiently, the disclosed systems and methods may reduce latency and decrease processor use to enable real-time noise suppression of live streaming audio. Thus, the systems and methods described herein may improve noise suppression and the processing of audio data to remove environmental distractions and artifacts.
Example 1: A computer-implemented method for noise suppression may include 1) capturing, by a computing device, a media clip, 2) dividing, by the computing device, the media clip into a set of frames, wherein each frame may include an audio portion of the media clip of a predetermined length of time, 3) performing, by the computing device, a noise suppression process on each frame of the set of frames using a trained neural network model, wherein the trained neural network model is quantized to use input tensors, and 4) creating, by the computing device, a clean media clip based on the noise suppression process.
Example 2: The computer-implemented method of Example 1, wherein the media clip may include one or more of an audio clip, a video clip, and/or a multimedia clip.
Example 3: The computer-implemented method of any of Examples 1 and 2, wherein the trained neural network model may be trained using a machine learning method on pairs of clean audio samples and noisy audio samples to create a filter for noisy audio and/or to produce clean audio directly from noisy audio.
Example 4: The computer-implemented method of Example 3, wherein the noisy audio samples may include one or more noisy audio clips and/or clean audio clips transformed into noisy audio clips using data augmentation.
Example 5: The computer-implemented method of any of Examples 3 and 4, wherein the trained neural network model may be trained by comparing the noisy audio samples to the clean audio samples to determine a set of losses.
Example 6: The computer-implemented method of any of Examples 1-5, wherein the trained neural network model may include one or more of a set of encoder layers, a feature processor, and/or a set of decoder layers.
Example 7: The computer-implemented method of Example 6, wherein the trained neural network model may be quantized by using indirection buffers to point to separate input tensor locations during encoding.
Example 8: The computer-implemented method of Example 7, wherein an encoder layer in the set of encoder layers may save a state of a frame as a first input tensor of the frame, may save an output of a previous encoder layer for a previous frame occurring chronologically before the frame as a second input tensor of the frame, may use the indirection buffers to identify a location of the first input tensor and a location of the second input tensor, and may output an encoding of the frame using the first input tensor and the second input tensor.
Example 9: The computer-implemented method of any of Examples 6-8, wherein a decoder layer in the set of decoder layers may decode a frame of the set of frames, may save a state of the decoded frame, and may save a partial output as a state of a subsequent decoder layer for a subsequent frame occurring chronologically after the frame.
Example 10: The computer-implemented method of any of Examples 6-9 may further include improving the trained neural network model by one or more of the following: combining a gating mechanism and a normalization process to process an output of an encoder layer of the set of encoder layers to create an input of a next encoder layer, quantizing one or more linear layers of the feature processor, combining operations to combine outputs of linear layers of the feature processor, and/or combining the gating mechanism and the normalization process to process an output of a decoder layer of the set of decoder layers to create an input of a next decoder layer.
Example 11: The computer-implemented method of any of Examples 1-10, wherein performing the noise suppression process may include identifying a signal of interest, detecting the signal of interest in one or more frames of the set of frames using the trained neural network model, and filtering one or more other signals from the one or more frames of the set of frames.
Example 12: The computer-implemented method of Example 11, wherein detecting the signal of interest in the one or more frames of the set of frames may include extracting a set of parameters from the trained neural network model to perform just-in-time compilation and deployment of the noise suppression process.
Example 13: The computer-implemented method of any of Examples 1-12 may further include iteratively improving the trained neural network model by retraining the trained neural network model with the media clip and the clean media clip.
Example 14: A corresponding system for noise suppression may include several modules stored in memory, including 1) a capture module that captures, by a computing device, a media clip, 2) a division module that divides, by the computing device, the media clip into a set of frames, wherein each frame may include an audio portion of the media clip of a predetermined length of time, 3) a performance module that performs, by the computing device, a noise suppression process on each frame of the set of frames using a trained neural network model, wherein the trained neural network model is quantized to use input tensors, and 4) a creation module that creates, by the computing device, a clean media clip based on the noise suppression process. The system may also include one or more hardware processors that execute the capture module, the division module, the performance module, and the creation module.
Example 15: The system of Example 14, wherein the trained neural network model may be trained using a machine learning method on pairs of clean audio samples and noisy audio samples to create a filter for noisy audio and/or to produce clean audio directly from noisy audio.
Example 16: The system of any of Examples 14 and 15, wherein the trained neural network model may include one or more of a set of encoder layers, a feature processor, and/or a set of decoder layers.
Example 17: The system of Example 16, wherein the trained neural network model may be quantized by using indirection buffers to point to separate input tensor locations during encoding.
Example 18: The system of Example 17, wherein an encoder layer in the set of encoder layers may save a state of a frame as a first input tensor of the frame, may save an output of a previous encoder layer for a previous frame occurring chronologically before the frame as a second input tensor of the frame, may use the indirection buffers to identify a location of the first input tensor and a location of the second input tensor, and may output an encoding of the frame using the first input tensor and the second input tensor.
Example 19: The system of any of Examples 16-18, wherein a decoder layer in the set of decoder layers may decode a frame of the set of frames, may save a state of the decoded frame, and may save a partial output as a state of a subsequent decoder layer for a subsequent frame occurring chronologically after the frame.
Example 20: The above-described method may be encoded as computer-readable instructions on a computer-readable medium. For example, a non-transitory computer-readable medium may include one or more computer-executable instructions that, when executed by one or more processors of a computing device, may cause the computing device to 1) capture a media clip, 2) divide the media clip into a set of frames, wherein each frame may include an audio portion of the media clip of a predetermined length of time, 3) perform a noise suppression process on each frame of the set of frames using a trained neural network model, wherein the trained neural network model is quantized to use input tensors, and 4) create a clean media clip based on the noise suppression process.
As detailed above, the computing devices and systems described and/or illustrated herein broadly represent any type or form of computing device or system capable of executing computer-readable instructions, such as those contained within the modules described herein. In their most basic configuration, these computing device(s) may each include at least one memory device and at least one physical processor.
In some examples, the term “memory device” generally refers to any type or form of volatile or non-volatile storage device or medium capable of storing data and/or computer-readable instructions. In one example, a memory device may store, load, and/or maintain one or more of the modules described herein. Examples of memory devices include, without limitation, Random Access Memory (RAM), Read Only Memory (ROM), flash memory, Hard Disk Drives (HDDs), Solid-State Drives (SSDs), optical disk drives, caches, variations or combinations of one or more of the same, or any other suitable storage memory.
In some examples, the term “physical processor” generally refers to any type or form of hardware-implemented processing unit capable of interpreting and/or executing computer-readable instructions. In one example, a physical processor may access and/or modify one or more modules stored in the above-described memory device. Examples of physical processors include, without limitation, microprocessors, microcontrollers, Central Processing Units (CPUs), Field-Programmable Gate Arrays (FPGAs) that implement softcore processors, Application-Specific Integrated Circuits (ASICs), portions of one or more of the same, variations or combinations of one or more of the same, or any other suitable physical processor.
Although illustrated as separate elements, the modules described and/or illustrated herein may represent portions of a single module or application. In addition, in certain embodiments one or more of these modules may represent one or more software applications or programs that, when executed by a computing device, may cause the computing device to perform one or more tasks. For example, one or more of the modules described and/or illustrated herein may represent modules stored and configured to run on one or more of the computing devices or systems described and/or illustrated herein. One or more of these modules may also represent all or portions of one or more special-purpose computers configured to perform one or more tasks.
In addition, one or more of the modules described herein may transform data, physical devices, and/or representations of physical devices from one form to another. For example, one or more of the modules recited herein may receive an audio clip to be transformed, transform the audio clip into short frames of audio, output a result of the transformation to suppress noise in each frame, use the result of the transformation to create a clean audio clip, and store the result of the transformation to iteratively improve a noise suppression process. Additionally or alternatively, one or more of the modules recited herein may transform a processor, volatile memory, non-volatile memory, and/or any other portion of a physical computing device from one form to another by executing on the computing device, storing data on the computing device, and/or otherwise interacting with the computing device.
In some embodiments, the term “computer-readable medium” generally refers to any form of device, carrier, or medium capable of storing or carrying computer-readable instructions. Examples of computer-readable media include, without limitation, transmission-type media, such as carrier waves, and non-transitory-type media, such as magnetic-storage media (e.g., hard disk drives, tape drives, and floppy disks), optical-storage media (e.g., Compact Disks (CDs), Digital Video Disks (DVDs), and BLU-RAY disks), electronic-storage media (e.g., solid-state drives and flash media), and other distribution systems.
The process parameters and sequence of the steps described and/or illustrated herein are given by way of example only and can be varied as desired. For example, while the steps illustrated and/or described herein may be shown or discussed in a particular order, these steps do not necessarily need to be performed in the order illustrated or discussed. The various exemplary methods described and/or illustrated herein may also omit one or more of the steps described or illustrated herein or include additional steps in addition to those disclosed.
The preceding description has been provided to enable others skilled in the art to best utilize various aspects of the exemplary embodiments disclosed herein. This exemplary description is not intended to be exhaustive or to be limited to any precise form disclosed. Many modifications and variations are possible without departing from the spirit and scope of the present disclosure. The embodiments disclosed herein should be considered in all respects illustrative and not restrictive. Reference should be made to the appended claims and their equivalents in determining the scope of the present disclosure.
Unless otherwise noted, the terms “connected to” and “coupled to” (and their derivatives), as used in the specification and claims, are to be construed as permitting both direct and indirect (i.e., via other elements or components) connection. In addition, the terms “a” or “an,” as used in the specification and claims, are to be construed as meaning “at least one of.” Finally, for ease of use, the terms “including” and “having” (and their derivatives), as used in the specification and claims, are interchangeable with and have the same meaning as the word “comprising.”