The present invention relates to a method and audio processing system for source separation and remixing.
Recorded audio signals may comprise a representation of one or more audio sources in addition to a noise component. Especially for User Generated Content (UGC) it is in general true that many individual audio sources will be picked up in addition to a noise audio component (such as white noise) when recording audio.
Consider e.g. a user recording the audio track of a video, recording a podcast or making a phone call using a headset or smartphone from the sidewalk of a busy street or in a forest during windy conditions. The recorded audio signal from the busy street could for instance, in addition to the voice of the user, include the voices of other nearby pedestrians, the ringtone of a nearby pedestrian's cellphone, the sound of passing cars or busses, sounds from a nearby construction site, the sound of a siren from an emergency vehicle and the noise component. Similarly, the recorded audio signal from the forest could for instance include the voice of the user, birdsong, the sound of an airplane passing above, the sound of the wind rattling the leaves and noise.
The recorded audio signal will comprise audio from all of these recorded sound sources, which makes a desired audio signal, e.g. the voice of the user recording a video or making a phone call, less intelligible. To this end, neural network models for speech separation have been proposed which receive, as input, an audio signal comprising recorded speech alongside other audio sources and noise, and output either a processed audio signal with enhanced speech intelligibility or a speech isolation filter (often referred to as a “mask”) for suppressing the non-speech audio components of the audio signal. Accordingly, by using neural network models the intelligibility of speech present in audio signals can be enhanced, allowing users to record audio signals at many locations.
In other situations, especially for Professionally Generated Content (PGC) such as the recording of an audio track for a movie, all audio sources, or at least additional audio sources besides the recorded voice, may be of interest. For instance, for a movie audio track which is recorded in a forest during windy conditions the sound of a voice, the sound of the rattling leaves and birdsong are desired audio signal components whereas the sound of an airplane passing above is an undesired audio signal component. Accordingly, a neural network for speech separation may be used to enhance the intelligibility of the voice, whereby individually recorded audio signals containing only birdsong and only the sound of rattling leaves are mixed with the intelligibility-enhanced speech to achieve a desired mix of audio sources for the movie audio track. The final mix then has enhanced speech intelligibility but also comprises birdsong and the sound of rattling leaves, but not the sound of a passing airplane, which provides a desirable and believable ambience effect.
A drawback with the prior solutions is that while many neural network models perform well in terms of removing noise components, each model is trained to remove a specific type of predetermined noise. Due to differing definitions of noise, a single neural network model will perform well only if the definition of noise used to train the model overlaps with the undesired noise which is to be removed. As soon as the trained model is applied to remove noise which is defined differently from the noise definition used during training, the noise suppression performance decreases.
For instance, the trained speech separation model may be aggressive and trained to treat all audio signal components which are not speech as noise. Using such a speech separation model on e.g. a movie audio track where speech, birdsong and the sound of leaves rattling are all desired audio signals will suppress the birdsong and the sound of the leaves rattling to isolate only the speech. On the other hand, a less aggressive speech separation model, which e.g. is trained to predict and remove only the stationary background noise, will suppress only the stationary background noise and not e.g. the unwanted sound of an airplane momentarily passing above (which is not an example of stationary background noise).
Thus, it is a purpose of the present disclosure to provide an enhanced method for audio processing which alleviates at least some of the drawbacks of the above-mentioned existing solutions.
A first aspect of the present invention relates to a method of processing audio for source separation, the method comprising obtaining an audio signal including a mixture of speech content and noise content, determining speech content from the audio signal, determining stationary noise content from the audio signal, and determining non-speech content from the audio signal, wherein the stationary noise content is a true subset of the non-speech content. The method further comprises determining, based on a difference between the stationary noise content and the non-speech content, a non-stationary noise content, obtaining a set of weighting factors comprising a weighting factor corresponding to each of the speech content, the stationary noise content, and the non-stationary noise content respectively, and forming a processed audio signal based on a combination of the speech content, the stationary noise content, and the non-stationary noise content weighted with the respective weighting factor.
With stationary noise content it is meant noise content which remains constant over time and which does not carry any interpretable information. White noise or thermal noise are both examples of stationary noise. Further examples of stationary noise are pink noise, Gaussian noise, any noise which e.g. is introduced by an audio amplifier and any noise with a time-independent distribution.
Non-speech may be defined as the difference between a clean speech audio signal (such as a speech signal recorded in an anechoic chamber with any stationary noise removed) and a clean speech audio signal with added disturbances (such as stationary noise or birdsong). That is, non-speech content comprises stationary noise but also other types of non-stationary noise such as birdsong or the sound of rain.
The first aspect of the invention is at least partially based on the understanding that by extracting the non-stationary noise as the difference between the non-speech content and the stationary noise content, two independent noise content types are obtained in addition to the independent speech content. This facilitates remixing, as the relative magnitude of the three content types is adjusted by selecting a desired set of weighting coefficients. For example, by adjusting the three weighting coefficients the stationary noise content may be omitted entirely, the non-stationary noise attenuated but not removed, and the speech content amplified, which results in a processed audio signal with enhanced speech intelligibility while also providing some amount of ambience (as at least a portion of the non-stationary noise content is kept).
In some implementations, determining the stationary noise content comprises providing the audio signal to a stationary noise isolator model trained to predict a stationary noise mask for removing stationary noise content from the audio signal and determining the stationary noise content based on the stationary noise mask and the audio signal.
Thus, an accurate trained model (e.g. implemented with a neural network) may be used to determine the stationary noise content given a representation of an audio signal. Stationary noise content may be defined precisely, and large amounts of training data are readily available, may be recorded or may be created synthetically, which means the stationary noise isolator model can be trained to be very accurate.
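By way of illustration only, stationary noise of this kind could be synthesized with a few lines of numpy; the function names and the RMS level below are arbitrary choices and not part of this disclosure:

```python
import numpy as np

def white_noise(num_samples: int, rms: float = 0.05) -> np.ndarray:
    """Zero-mean Gaussian white noise, i.e. noise with a time-independent distribution."""
    return rms * np.random.randn(num_samples)

def pink_noise(num_samples: int, rms: float = 0.05) -> np.ndarray:
    """Pink (1/f) noise, created by shaping white noise in the frequency domain."""
    spectrum = np.fft.rfft(np.random.randn(num_samples))
    freqs = np.fft.rfftfreq(num_samples)
    freqs[0] = freqs[1]                # avoid division by zero at DC
    spectrum /= np.sqrt(freqs)         # 1/f power corresponds to 1/sqrt(f) amplitude
    noise = np.fft.irfft(spectrum, n=num_samples)
    return rms * noise / np.std(noise)

# A synthetic training example could then be formed as clean_speech + white_noise(len(clean_speech)).
```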
Similarly, in some implementations determining the non-speech content comprises providing the audio signal to a speech isolator model trained to predict a noise mask for removing non-speech content from the audio signal; and determining non-speech content based on the noise mask and the audio signal.
Separating speech from arbitrary audio signals may be performed accurately with a model (e.g. implemented with a neural network) trained to predict a mask for separating speech content given a representation of an audio signal. Additionally, the same mask used to extract the speech content may also be used to extract the non-speech content, meaning that the same trained model may be used to determine both the speech content and the non-speech content.
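As a rough sketch of this idea (not the claimed implementation), a single predicted mask applied to the mixture spectrogram yields both estimates; here `mask_model` stands in for any trained speech isolator and the STFT parameters are arbitrary:

```python
import numpy as np
from scipy.signal import stft, istft

def split_speech_and_nonspeech(x, mask_model, fs=16000, nperseg=512):
    """Apply one predicted speech mask to obtain both the speech estimate and
    the non-speech estimate from the same mixture."""
    _, _, X = stft(x, fs=fs, nperseg=nperseg)       # T-F representation of the mixture
    m = mask_model(np.abs(X))                       # predicted speech mask in [0, 1]
    _, speech = istft(m * X, fs=fs, nperseg=nperseg)             # speech content
    _, nonspeech = istft((1.0 - m) * X, fs=fs, nperseg=nperseg)  # non-speech content
    return speech, nonspeech
```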
While it is difficult to train a model to separate between different types of noise, such as stationary noise content and non-stationary noise content, some implementations of the first aspect of the present invention utilize trained models adapted for separating more distinctly different types of audio content, such as speech and stationary noise, followed by a manipulation of the separated audio content to more accurately separate the different types of noise. The manipulation comprises determining the difference between the stationary noise content and the non-speech content.
In some implementations, the method further comprises bandpass filtering the non-stationary noise content with a bandpass filter configured to isolate a noise object in the non-stationary noise.
That is, while the non-stationary noise may comprise audio content associated with a plurality of non-stationary noise objects, the application of a suitable bandpass filter will isolate at least one desired noise object. A benefit of applying the bandpass filter to the non-stationary noise content is that the filter will not let through any speech content or stationary noise content, as these are not present in the non-stationary noise content.
In some implementations, the bandpass filter has been obtained by analyzing an example audio signal wherein the method further comprises collecting an example audio signal, the example audio signal comprising at least one example of a noise object, determining the frequency distribution of the example audio signal and defining the bandpass filter based on the frequency distribution of the example audio signal.
To this end, the frequency distribution of any arbitrary non-stationary noise object(s) may be determined and used to generate a bandpass filter for filtering the non-stationary noise.
According to a second aspect of the invention there is provided an audio processing system, the audio processing system comprising an audio content separation unit, the audio content separation unit being configured to obtain an audio signal, the audio signal including a mixture of speech content and noise content, and determine, from the audio signal, speech content, stationary noise content, and non-speech content, wherein the stationary noise content is a true subset of the non-speech content. The audio content separation unit is further configured to determine, based on a difference between the stationary noise content and the non-speech content, a non-stationary noise content, and the audio processing system further comprises a mixing unit configured to: obtain a set of weighting factors comprising a weighting factor corresponding to each of the speech content, the stationary noise content, and the non-stationary noise content respectively, and form a processed audio signal based on a combination of the speech content, the stationary noise content, and the non-stationary noise content weighted with the respective weighting factor.
According to a third aspect of the invention there is provided a non-transitory computer-readable medium storing instructions that, upon execution by one or more processors, cause the one or more processors to perform the method according to the first aspect of the invention.
Aspects of the present invention will be described in more detail with reference to the appended drawings, showing currently preferred embodiments.
Systems and methods disclosed in the present application may be implemented as software, firmware, hardware or a combination thereof. In a hardware implementation, the division of tasks does not necessarily correspond to the division into physical units: to the contrary, one physical component may have multiple functionalities, and one task may be carried out by several physical components in cooperation.
The computer hardware may for example be a server computer, a client computer, a personal computer (PC), a tablet PC, a set-top box (STB), a personal digital assistant (PDA), a cellular telephone, a smartphone, a web appliance, a network router, switch or bridge, or any machine capable of executing instructions (sequential or otherwise) that specify actions to be taken by that computer hardware. Further, the present disclosure shall relate to any collection of computer hardware that individually or jointly execute instructions to perform any one or more of the concepts discussed herein.
Certain or all components may be implemented by one or more processors that accept computer-readable (also called machine-readable) code containing a set of instructions that when executed by one or more of the processors carry out at least one of the methods described herein. Any processor capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken are included. Thus, one example is a typical processing system (i.e. a computer hardware) that includes one or more processors. Each processor may include one or more of a CPU, a graphics processing unit, and a programmable DSP unit. The processing system further may include a memory subsystem including a hard drive, SSD, RAM and/or ROM. A bus subsystem may be included for communicating between the components. The software may reside in the memory subsystem and/or within the processor during execution thereof by the computer system.
The one or more processors may operate as a standalone device or may be connected, e.g., networked to other processor(s). Such a network may be built on various different network protocols, and may be the Internet, a Wide Area Network (WAN), a Local Area Network (LAN), or any combination thereof.
The software may be distributed on computer readable media, which may comprise computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to a person skilled in the art, the term computer storage media includes both volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, physical (non-transitory) storage media in various forms, such as EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Further, it is well known to the skilled person that communication media (transitory) typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
The audio signal Sin, which comprises a mixture of speech and noise content, may be referred to as x(k) in the time domain where k is the time sample index. Thus, x(k) may be expressed as

x(k) = s(k) + n(k)    (equation 1)

in the time domain. By transforming the time domain representation in equation 1 to the spectral domain it is derived that

Xm,f = Sm,f + Nm,f    (equation 2)

where X, S, N denote the time-frequency (T-F) representations of the audio signal mixture x(k), source s, and the noise n while the subscripts m and f denote the time frame index and frequency bin index respectively.
The audio signal Sin may be provided to a trained model wherein the trained model has been trained to output a mask M1, M2 for suppressing a certain type of noise, wherein the mask M1, M2 is typically defined as the magnitude ratio between the desired speech Sm,f and the audio signal mixture Xm,f for each time frame and frequency bin. That is, the mask M is defined as

Mm,f = |Sm,f| / |Xm,f|    (equation 3)
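For reference, such a magnitude-ratio mask could be computed from a known clean source and mixture roughly as follows; the clipping to [0, 1] is a common practical choice and is assumed here rather than taken from the text:

```python
import numpy as np

def magnitude_ratio_mask(S: np.ndarray, X: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Mask M with one value per time frame m and frequency bin f,
    defined as |S[m, f]| / |X[m, f]| and clipped to [0, 1]."""
    return np.clip(np.abs(S) / (np.abs(X) + eps), 0.0, 1.0)
```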
Depending on the type and training of the mask predicting model the mask M1, M2 may suppress different types of noise. While
With further reference to
The second trained model 12 is trained to output a mask M2 for suppressing only the stationary noise content N̂2 of the audio signal Sin and leave all audio content which is not stationary noise content, which is referred to as the residual content Ŝ2, unaffected. Applying the mask M2 to the audio signal Sin effectively removes stationary noise, which remains constant over time (i.e. noise with a probability distribution which is constant over time), while other types of noise which are potentially undesired (e.g. the sound of a nearby car revving its engine) are unaffected.
By using these two trained models simultaneously, the first model 11, a speech isolator model, trained to output a first mask M1 and the second model 12, a stationary noise isolator model, trained to output a second mask M2, wherein the first mask M1 is for suppressing non-speech N̂1 and the second mask M2 is for suppressing stationary noise N̂2, four partial representations of the audio signal Sin may be obtained. The estimated speech content Ŝ1 and non-speech content N̂1 (i.e. noise such as birdsong and stationary noise) of the first model 11 are obtained as

Ŝ1 = M1·X    (equation 4)
N̂1 = (1 − M1)·X = X − Ŝ1    (equation 5)

and, similarly, the estimated residual content Ŝ2 (i.e. all content but the stationary noise content) and stationary noise content N̂2 of the second model 12 are obtained as

Ŝ2 = M2·X    (equation 6)
N̂2 = (1 − M2)·X = X − Ŝ2    (equation 7)
The output audio signal, Sout, can now be determined by combining Ŝ1, Ŝ2, N̂1 and N̂2 from equations 4, 5, 6 and 7 as:

Sout = α1·Ŝ1 + β1·Ŝ2 + γ1·N̂1 + μ1·N̂2    (equation 8)

where α1, β1, γ1, μ1 are weighting factors for each of the speech content Ŝ1, the residual content Ŝ2, the non-speech content N̂1 and the stationary noise content N̂2 respectively. Alternatively, the output audio signal Sout from equation 8 can be rewritten in terms of the input audio signal mix X, the speech content Ŝ1 and the residual content Ŝ2 as

Sout = c1·Ŝ1 + c2·Ŝ2 + c3·X    (equation 9)
wherein c1, c2, c3 form an alternative set of weighting factors. It is understood that the same output audio signal Sout may be acquired with both equation 8 and equation 9, which means that there exists a mapping between the weighting factors α1, β1, γ1, μ1 and the weighting factors c1, c2, c3. However, as will now be described, the representation from equation 8 has some properties which can be exploited.
The above audio signal components Ŝ1, Ŝ2, N̂1 and N̂2 from equation 8 are not independent as e.g. the speech content Ŝ1 may be comprised partially or wholly in the residual content Ŝ2, which means that it may not be possible to achieve a desired mix of the components from equation 8. To this end, the non-speech content N̂1 and the stationary noise content N̂2 are used to define a new type of noise content, referred to as the non-stationary noise content N̂NS or the object noise content, which is defined as

N̂NS = N̂1 − N̂2    (equation 10)

and the stationary noise content N̂2 is renamed N̂S, meaning that

N̂S = N̂2    (equation 11)
The stationary noise content N̂S and the non-stationary noise content N̂NS are independent parts of the audio signal Sin (as opposed to N̂1 and N̂2 which are dependent) wherein the stationary noise content N̂S captures e.g. white noise and the non-stationary noise content N̂NS captures all content which is neither stationary noise content nor speech content. Examples of non-stationary noise N̂NS include birdsong, the sound of rattling leaves, the sound of cars, airplanes, helicopters and sirens, the sound of gusts of wind, and the sound of rain or thunder. Each of these examples, in addition to other not mentioned examples, forms a respective noise object N̂OBJ,1, N̂OBJ,2 wherein each noise object N̂OBJ,1, N̂OBJ,2 is a true subset of the non-stationary noise content N̂NS and associated with a certain type of audio content or audio content with a certain audio source (e.g. a machine, animal or vehicle).
Accordingly, the audio signal components Ŝ1, Ŝ2, N̂S and N̂NS are combined in a manner similar to equation 8, as

Sout = α2·Ŝ1 + β2·Ŝ2 + γ2·N̂S + μ2·N̂NS    (equation 12)

wherein α2, β2, γ2, μ2 are weighting factors and γ2 and μ2 will influence the extent to which the stationary noise N̂S and non-stationary noise N̂NS is introduced into the output audio signal Sout. For instance, if μ2 is high the non-stationary noise content such as the noise objects N̂OBJ,1, N̂OBJ,2 will be emphasized in the processed audio signal Sout and if γ2 is set to zero the stationary noise is omitted entirely, whereby the balance between α2, β2, and μ2 will influence the relative volume of the non-stationary noise with respect to the speech Ŝ1 and the residual Ŝ2.
It is noted that the output signal Sout as calculated with equation 12 using Ŝ1, Ŝ2, N̂S, N̂NS may alternatively be expressed in terms of Ŝ1, Ŝ2, N̂1, N̂2 from equation 8 or in terms of Ŝ1, Ŝ2, X from equation 9. Accordingly, there exists a mapping between all three sets of weighting coefficients, namely the weighting coefficients α2, β2, γ2, μ2, the weighting coefficients α1, β1, γ1, μ1 and the weighting coefficients c1, c2, c3. However, the representation from equation 12 has the benefit of featuring three independent content types (if Ŝ2 is omitted) which facilitates more accurate remixing of the output audio signal Sout.
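To make the existence of such a mapping concrete, and under the standard assumption that each mask is applied multiplicatively so that N̂1 = X − Ŝ1 and N̂2 = X − Ŝ2, substituting equations 10 and 11 into equation 12 gives

```latex
S_{out} = \alpha_2 \hat{S}_1 + \beta_2 \hat{S}_2 + \gamma_2 (X - \hat{S}_2)
          + \mu_2 \big( (X - \hat{S}_1) - (X - \hat{S}_2) \big)
        = (\alpha_2 - \mu_2)\,\hat{S}_1 + (\beta_2 - \gamma_2 + \mu_2)\,\hat{S}_2 + \gamma_2\,X
```

i.e. c1 = α2 − μ2, c2 = β2 − γ2 + μ2 and c3 = γ2 in the notation of equation 9.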
In some implementations, β1 or β2 is set to zero or the residual content Ŝ2 is omitted from equations 8 and 12, as Ŝ2 will involve some overlap with both the speech content Ŝ1 and the non-speech content N̂1 as predicted by the first trained model 11.
With reference to
At step S1 an audio signal comprising a mix of speech content and noise content is obtained and provided to an audio separation unit 10. The audio separation unit 10 comprises a speech isolator model 11 trained to predict a mask M1 for separating the speech content Ŝ1 from the non-speech content N̂1 in the audio signal. By applying the mask M1 to the audio signal, e.g. in accordance with equations 4 and 5 in the above, the speech content Ŝ1 and non-speech content N̂1 are determined at step S2a and step S2c respectively.
Analogously, the audio signal is provided to the stationary noise isolator model 12 trained to predict a mask M2 for separating the residual audio content Ŝ2 from the stationary noise content N̂2. By applying the mask M2 to the audio signal, e.g. in accordance with equation 7 in the above, at least the stationary noise content N̂2 is determined at step S2b.
At step S3 the non-stationary noise content N̂NS is determined by the audio separation unit 10 as the difference between the non-speech content N̂1 predicted by the speech isolator model 11 and the stationary noise N̂2 as predicted by the stationary noise isolator model 12. Alternatively, the audio separation unit 10 outputs the speech content Ŝ1, the non-speech content N̂1 and the stationary noise content N̂2, whereby the non-stationary noise content N̂NS is determined by an auxiliary computation unit.
The method may then go to step S5 which comprises obtaining at least one weighting factor for each of the speech content Ŝ1, the stationary noise content N̂2 = N̂S and the non-stationary noise content N̂NS. The weighting factors are e.g. predetermined or set by a user/mixing engineer to obtain a desired mix of the independent speech content Ŝ1, stationary noise content N̂2 = N̂S and non-stationary noise content N̂NS in the output audio signal. Additionally, as will be described below, a selector may select or suggest a set of weighting coefficients based on the detected noise objects present in the audio signal.
At step S6 the speech content Ŝ1, the stationary noise content N̂2 = N̂S and the non-stationary noise content N̂NS are combined by the mixer unit 14 with their respective weighting factor to form the processed audio signal, e.g. in accordance with equation 12 in the above. That is, the different independent content types of the audio signal are remixed to form a processed output audio signal.
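A compact sketch of steps S1 through S6, assuming multiplicative masks and with β set to zero as discussed above (the mask models, default weights and STFT parameters are placeholders, not prescribed by the method):

```python
import numpy as np
from scipy.signal import stft, istft

def remix(x, speech_mask_model, stationary_mask_model,
          alpha=1.0, gamma=0.0, mu=0.3, fs=16000, nperseg=512):
    """Separate the mixture, derive the non-stationary noise as a difference,
    and remix the independent content types with per-content weighting factors."""
    _, _, X = stft(x, fs=fs, nperseg=nperseg)

    m1 = speech_mask_model(np.abs(X))        # speech mask (model 11)
    m2 = stationary_mask_model(np.abs(X))    # stationary-noise-removal mask (model 12)

    S1 = m1 * X                  # speech content
    N1 = X - S1                  # non-speech content
    N2 = X - m2 * X              # stationary noise content (N_S)
    N_ns = N1 - N2               # non-stationary noise content (cf. equation 10)

    Y = alpha * S1 + gamma * N2 + mu * N_ns  # weighted recombination (cf. equation 12, beta = 0)
    _, y = istft(Y, fs=fs, nperseg=nperseg)
    return y
```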
Optionally, as seen in the exemplary implementation in
The filter 13 may in turn be determined by collecting an example audio signal, the example audio signal comprising at least one example of a (non-stationary) target noise object such as birdsong or a group of target noise objects such as traffic sounds, and determining the frequency distribution of the example audio signal. The frequency distribution of the example audio signal will reveal the energy distribution of the audio signal whereby a suitable bandpass filter 13 may be defined with a passband which allows at least a predetermined portion of the example audio signal to pass through. For instance, the bandpass filter 13 is defined to be as narrow as possible while still featuring a passband which allows at least 50%, and preferably at least 70%, and most preferably at least 90% of the energy of the test signal to pass through. That is, the bandpass filter 13 will attenuate noise objects different from the target noise object(s).
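One possible way to search for such a narrowest passband is sketched below; the two-pointer scan over the power spectrum and the 90% coverage default are illustrative assumptions:

```python
import numpy as np

def estimate_passband(example, fs, coverage=0.9):
    """Find the narrowest frequency band [f_lo, f_hi] whose bins contain at least
    `coverage` of the example signal's spectral energy."""
    spectrum = np.abs(np.fft.rfft(example)) ** 2
    freqs = np.fft.rfftfreq(len(example), d=1.0 / fs)
    total = spectrum.sum()
    best = (freqs[0], freqs[-1])
    lo, acc = 0, 0.0
    for hi in range(len(spectrum)):                       # grow the window to the right ...
        acc += spectrum[hi]
        while acc - spectrum[lo] >= coverage * total:     # ... and shrink it from the left
            acc -= spectrum[lo]
            lo += 1
        if acc >= coverage * total and freqs[hi] - freqs[lo] < best[1] - best[0]:
            best = (freqs[lo], freqs[hi])
    return best
```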
To obtain a more accurate bandpass filter 13, the example audio signal should comprise a clean example of the target noise object or group of noise objects. To this end the target audio signal may be manually cleaned to remove audio components or noise which is not an example of the target noise object(s) or cleaned with a reliable automatic process. Additionally, a longer example audio signal, with more/longer examples of the target noise object(s) is preferred to avoid averaging errors. For instance, the example audio signal comprises at least one hour, and preferably at least five hours and most preferably at least ten hours of noise object audio content.
As an illustrative example, the target noise object is birdsong whereby an example audio signal with ten hours of clean birdsong is obtained and the frequency distribution determined. The frequency distribution reveals that most of the example signal energy is contained between 3 kHz and 7 kHz whereby a bandpass filter 13 with a passband between 3 kHz and 7 kHz, and a stopband which starts at 1 kHz and 9 kHz respectively, is defined to separate the birdsong from other noise objects present in the non-stationary noise N̂NS.
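A corresponding filter could, for example, be realized as a Butterworth bandpass; the filter order is an arbitrary choice and the stopband edges of such a design will not exactly match the 1 kHz and 9 kHz figures above:

```python
from scipy.signal import butter, sosfiltfilt

def birdsong_bandpass(non_stationary_noise, fs=48000, low_hz=3000.0, high_hz=7000.0, order=6):
    """Isolate a birdsong-like noise object from the non-stationary noise content
    with a 3-7 kHz Butterworth bandpass filter."""
    sos = butter(order, [low_hz, high_hz], btype='bandpass', output='sos', fs=fs)
    return sosfiltfilt(sos, non_stationary_noise)
```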
and wherein α1, α2, and α3 are weighting factors for each of the dry speech Ŝd, the dry speech and early reverberation Ŝe and the dry speech, early reverberation and late reverberation Ŝl.
Thus, by e.g. setting α2 and α3 to small values relative to α1 the dry speech will be emphasized in the output audio signal Sout and by setting α1 and α3 to small values relative to α2 the dry speech with early reverberation will be emphasized in the output audio signal Sout.
With late reverberation it is meant speech reverberation with a reverberation time which exceeds a predetermined threshold and with early reverberation it is meant speech reverberation with a time constant below the predetermined threshold.
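Written out, the weighted combination implied by these factors presumably takes the form (reconstructed from the surrounding description rather than quoted):

```latex
S_{out} = \alpha_1 \hat{S}_d + \alpha_2 \hat{S}_e + \alpha_3 \hat{S}_l
```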
The speech isolator model 11′ may comprise one trained model for each of the different speech types Ŝd, Ŝe, Ŝl, or the speech isolator may comprise a single isolator model 11′ trained to predict one mask for separating each of the different types of speech Ŝd, Ŝe, Ŝl.
While the implementation of the audio processing system 1 in
In
To this end, the classifier 15 may be a neural network trained to predict the presence of at least one noise object given a representation of an audio signal. It is envisaged that the neural network predicts a likelihood of the audio signal comprising one or more predetermined noise objects, wherein the noise object associated with the greatest likelihood is the predicted noise object.
The selector 16 may retrieve the filter 13′ from a database 171 of different sets of filter data 172a, 172b, 172c wherein each set of filter data is associated with a noise object and describes a filter 13′ to be applied. For instance, for each noise object present in the predetermined set of noise objects which are possible outputs of the classifier 15 there is a corresponding set of filter data 172a, 172b, 172c in the database 171. Additionally, as seen in
In the exemplary embodiment shown in
While the audio processing system 1 in
In connection to
For instance, if the classifier 15 predicts the presence of birdsong the selector 16 may select a set of weighting factors 176c which suppresses the stationary noise, amplifies the non-stationary noise and amplifies the speech content, as birdsong is considered to not disturb the speech intelligibility while adding a pleasant ambiance. On the other hand, if the classifier 15 predicts the presence of wind sounds the selector 16 may select a different set of weighting factors 176a which suppresses the stationary noise and the non-stationary noise (which includes the wind sound) while amplifying the speech content, as wind sounds are considered to be an unwanted disturbance.
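As a small sketch of such rule-based selection (the class names and numeric weights are purely illustrative, not values from this disclosure):

```python
# Hypothetical weight sets keyed by predicted noise object:
# alpha weights the speech, gamma the stationary noise, mu the non-stationary noise.
WEIGHT_SETS = {
    "birdsong": {"alpha": 1.2, "gamma": 0.0, "mu": 1.0},   # keep the pleasant ambience
    "wind":     {"alpha": 1.2, "gamma": 0.0, "mu": 0.1},   # suppress the disturbance
    "traffic":  {"alpha": 1.0, "gamma": 0.0, "mu": 0.2},
}

def select_weights(class_likelihoods: dict) -> dict:
    """Pick the weight set associated with the most likely predicted noise object."""
    predicted = max(class_likelihoods, key=class_likelihoods.get)
    return WEIGHT_SETS.get(predicted, {"alpha": 1.0, "gamma": 0.5, "mu": 0.5})
```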
In this manner, the selector 16 automatically selects a suitable weighting factor set 176a, 176b, 176c for all audio signals according to a predetermined set of rules, wherein a user or mixing engineer optionally provides some preferences to modify the rules. The preferences e.g. indicate a desire to suppress some noise objects more than others (e.g. suppress all manmade noise objects such as machine sounds and traffic sounds but keep all nature sounds such as birdsong, rain sound and thunder sound). Alternatively or additionally, the preferences e.g. indicate a desire to enhance speech intelligibility at the cost of less ambience, wherein any reverberation and stationary noise is omitted entirely and any noise object is attenuated.
In some implementations (not shown) the classifier 15 may receive the non-stationary noise content N̂NS (instead of the entire audio signal) which has been extracted using the output of the stationary noise isolator model 12 and the speech isolator model 11. As the noise objects will be in the non-stationary noise content N̂NS the classifier 15 can still correctly predict the presence of at least one noise object, while the classification can be made more accurate due to the non-stationary noise N̂NS comprising only a true subset of the audio signal content.
During training the internal weights and/or parameters of the isolation models 11, 12 are adjusted so as to predict a mask M1 which accurately isolates the speech and a mask M2 which accurately isolates the stationary noise. To accomplish this, the resulting audio signal after applying mask M1 is compared to a ground truth signal comprising the clean speech from the speech database 179 and the resulting audio signal after applying mask M2 is compared to a ground truth signal comprising only the stationary noise added from the noise database 177. By changing the internal weights and/or parameters of the isolation models 11, 12 so as to minimize discrepancies between the audio signal with the respective mask applied and the ground truth signal the models 11, 12 will gradually learn to predict masks M1, M2 for accurate speech separation and stationary noise separation.
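A minimal training-step sketch, assuming a PyTorch-style setup (the framework, the mean-squared-error loss and the spectrogram-domain comparison are assumptions; the disclosure does not prescribe them):

```python
import torch

def training_step(model, mixture_spec, target_spec, optimizer,
                  loss_fn=torch.nn.functional.mse_loss):
    """One generic mask-learning step: the model predicts a mask from the mixture
    spectrogram and the masked mixture is compared against the ground-truth
    target appropriate for that model (e.g. clean speech for the speech isolator)."""
    optimizer.zero_grad()
    mask = model(mixture_spec)                     # predicted mask in [0, 1]
    loss = loss_fn(mask * mixture_spec, target_spec)
    loss.backward()                                # adjust internal weights ...
    optimizer.step()                               # ... to minimize the discrepancy
    return loss.item()
```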
The one or more noise object isolator models 174a, 174b, 174c of the database 173 described in connection to
Unless specifically stated otherwise, as apparent from the following discussions, it is appreciated that throughout the disclosure discussions utilizing terms such as “processing”, “computing”, “calculating”, “determining”, “analyzing” or the like, refer to the action and/or processes of a computer hardware or computing system, or similar electronic computing devices, that manipulate and/or transform data represented as physical, such as electronic, quantities into other data similarly represented as physical quantities.
It should be appreciated that in the above description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the embodiments of the invention utilize more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the Detailed Description are hereby expressly incorporated into this Detailed Description, with each claim standing on its own as a separate embodiment of this invention. Furthermore, while some embodiments described herein include some but not other features included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention, and form different embodiments, as would be understood by those skilled in the art. For example, in the following claims, any of the claimed embodiments can be used in any combination.
Furthermore, some of the embodiments are described herein as a method or combination of elements of a method that can be implemented by a processor of a computer system or by other means of carrying out the function. Thus, a processor with instructions for carrying out such a method or element of a method forms a means for carrying out the method or element of a method. Note that when the method includes several elements, e.g., several steps, no ordering of such elements is implied, unless specifically stated. Furthermore, an element described herein of an apparatus embodiment is an example of a means for carrying out the function performed by the element for the purpose of carrying out the embodiments of the invention. In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In other instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
The person skilled in the art realizes that the aspects of the invention are by no means limited to the embodiments described above. On the contrary, many modifications and variations are possible within the scope of the appended claims. For example, while the classifier 15 and selector 16 of the implementations depicted in
Various aspects of the present invention may be appreciated from the following enumerated example embodiments (EEEs):
This application claims priority of the following priority application: International application PCT/CN2021/131462 (reference: D21131WO), filed 18-11-2021, U.S. provisional application 63/288,996 (reference: D21131USP1), filed 13-12-2021 and U.S. provisional application 63/336,824 (reference: D21131USP2), filed 29-4-2022 and EP patent application Ser. No. 22/171,560.0, filed 4-5-2022, each of which is hereby incorporated by reference in its entirety.