Sound quality detection method and device for homologous audio and storage medium

Description

CROSS-REFERENCE TO RELATED APPLICATION

This application is a U.S. national stage of international application No. PCT/CN2019/130094, filed on Dec. 30, 2019, which claims priority to Chinese Patent Application No. 201910468263.8, filed on May 31, 2019 and entitled “METHOD FOR DETECTING TONE QUALITY OF HOMOLOGOUS AUDIO, DEVICE AND STORAGE MEDIUM,” which is incorporated herein by reference in its entirety.

TECHNICAL FIELD

The present application relates to the field of audio technologies, and in particular, relates to a sound quality detection method and device for homologous audio and a storage medium.

BACKGROUND

At present, a music platform usually stores a large number of homologous audio files. Homologous audio files are audio files acquired by transcoding the same audio file one or more times, for example, audio files of the same song with different sound quality.

Due to the large number of homologous audio files stored in the music platform and uneven sound quality of the audio files, costs for storing, acquiring, and managing the homologous audio files are relatively high. Therefore, the sound quality of the homologous audio files needs to be detected to effectively manage the homologous audio files based on the sound quality, thereby reducing the costs of storing, acquiring, and managing the homologous audio files.

SUMMARY

Embodiments of the present application provide a sound quality detection method and device for homologous audio and a storage medium. The technical solutions are as follows:

According to one aspect, a sound quality detection method for homologous audio is provided. The method includes:

acquiring a plurality of audio files to be detected, wherein the plurality of audio files are homologous audio files;

acquiring at least one audio feature of each of the plurality of audio files by performing feature extraction on the audio file, and generating a correspondence list between the at least one audio feature of each of the plurality of audio files and an audio file identifier; and

determining, using a sound quality detection model, a sound quality score of each of the plurality of audio files based on the correspondence list between the at least one audio feature of each of the plurality of audio files and the audio file identifier, wherein the sound quality detection model is configured to detect sound quality of homologous audio files.

Optionally, acquiring the at least one audio feature of each of the plurality of audio files by performing the feature extraction on the audio file includes:

by performing the feature extraction on a first audio file in the plurality of audio files, acquiring at least one of a sampling rate, a bit depth, a bitrate, a maximum value among energy roll-off differences of all frames, a spectral contrast, spectral flatness in time, a mean value of an energy shadow region upon audio energy normalization, a mean value and variance of normalized energy of all frames in time, a peak ratio of envelope amplitudes of all frames, spectral entropy, a spectral centroid, and a spectral height of the first audio file, wherein the first audio file is any one of the plurality of audio files.

Optionally, determining, using the sound quality detection model, the sound quality score of each of the plurality of audio files based on the correspondence list between the at least one audio feature of each of the plurality of audio files and the audio file identifier includes:

inputting the correspondence list between the at least one audio feature of each of the plurality of audio files and the audio file identifier to the sound quality detection model, and outputting the sound quality score of each of the plurality of audio files by the sound quality detection model.

Optionally, prior to determining, using the sound quality detection model, the sound quality score of each of the plurality of audio files based on the correspondence list between the at least one audio feature of each of the plurality of audio files and the audio file identifier, the method further includes:

acquiring a plurality of sets of sample data, wherein each of the plurality of sets of sample data includes a plurality of sample audio files that are homologous audio files, and sample sound quality scores of the plurality of sample audio files; and

acquiring the sound quality detection model by training a to-be-trained sound quality detection model based on the plurality of sets of sample data.

Optionally, acquiring the plurality of sets of sample data may specifically include:

acquiring a source audio file for any set of sample data in the plurality of sets of sample data;

acquiring the plurality of sample audio files by continuously performing lossy transcoding on the source audio file M times, wherein M is a positive integer;

determining the sample sound quality score of each of the plurality of sample audio files; and

determining the plurality of sample audio files and the sample sound quality scores of the plurality of sample audio files as the any set of sample data.

Optionally, acquiring the plurality of sample audio files by continuously performing the lossy transcoding on the source audio file M times includes:

acquiring a lossy audio file by performing the lossy transcoding on the source audio file;

determining the lossy audio file as an r^thlossy audio file, and letting r=1;

acquiring an (r+1)^thlossy audio file by performing the lossy transcoding on the r^thlossy audio file;

in the case that r+1 is not equal to M, letting r=r+1, and returning to the step of acquiring the (r+1)^thlossy audio file by performing the lossy transcoding on the r^thlossy audio file; and

in the case that r+1 is equal to M, determining the source audio file and the first lossy audio file to an M^thlossy audio file as the plurality of sample audio files.

acquiring a plurality of sets of test data, wherein each set of test data includes a plurality of test audio files that are homologous audio files and sample sound quality scores of the plurality of test audio files;

determining, using the sound quality detection model, a test sound quality score of each of the plurality of test audio files in each of the plurality of sets of test data;

comparing the test sound quality score of each of the plurality of test audio files in each set of test data with the sample sound quality score; and

performing the step of determining, using the sound quality detection model, the sound quality score of each of the plurality of audio files based on the correspondence list between the at least one audio feature of each of the plurality of audio files and the audio file identifier in response to determining, based on a comparison result, that the sound quality detection model meets a sound quality detection condition.

Optionally, upon comparing the test sound quality score of each of the plurality of test audio files in each set of test data with the sample sound quality score, the method further includes:

updating the sound quality detection model based on the plurality of sets of test data in response to determining, based on the comparison result, that the sound quality detection model does not meet the sound quality detection condition; and

determining, using the sound quality detection model, the sound quality score of each of the plurality of audio files based on the correspondence list between the at least one audio feature of each of the plurality of audio files and the audio file identifier includes:

determining, using the sound quality detection model as updated, the sound quality score of each of the plurality of audio files based on the correspondence list between the at least one audio feature of each of the plurality of audio files and the audio file identifier.

Optionally, upon determining, using the sound quality detection model, the sound quality score of each of the plurality of audio files based on the correspondence list between the at least one audio feature of each of the plurality of audio files and the audio file identifier, the method further includes:

selecting first N audio files in the plurality of audio files ranked in descending order of their sound quality scores, wherein N is a positive integer; and

determining the N audio files as first-type audio files and audio files other than the N audio files in the plurality of audio files as second-type audio files.

Optionally, upon determining the N audio files as the first-type audio files and the audio files other than the N audio files in the plurality of audio files as the second-type audio files, the method further includes:

deleting the second-type audio files.

According to one aspect, a sound quality detection device for homologous audio is provided. The device includes: a processor; and a memory configured to store at least one instruction executable by the processor; wherein the processor, when executing the at least one instruction, is caused to perform:

acquiring a plurality of audio files to be detected, wherein the plurality of audio files are homologous audio files;

Optionally, the processor, when executing the at least one instruction, is caused to perform:

Optionally, the processor, when executing the at least one instruction, is further caused to perform:

acquiring a plurality of sets of sample data, wherein each of the plurality of sets of sample data includes a plurality of sample audio files that are homologous audio files and sample sound quality scores of the plurality of sample audio files; and