Embodiments of the present disclosure relate to the field of human-computer interaction technology, and in particular, to an audio processing method and apparatus, a device and a storage medium.
With the continuous growth of media content and the rapid development of computer technology, users have the need to interact with media and create personalized media content in the process of using media data. Audio editing is a common way to create media content.
The existing audio editing functions are limited and cannot meet the needs of users for processing and creating based on different audio.
Embodiments of the present disclosure provide an audio processing method and apparatus, a device and a storage medium to improve the efficiency of audio processing and meet the individual needs of users for audio production.
In a first aspect, an embodiment of the present disclosure provides an audio processing method, including:
In a second aspect, an embodiment of the present disclosure provides an audio processing apparatus, including:
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: at least one processor and a memory;
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium storing computer-executable instructions, where when a processor executes the computer-executable instructions, the audio processing method described above in the first aspect and various possible designs of the first aspect is implemented.
In a fifth aspect, an embodiment of the present disclosure provides a computer program product including a computer program, where when the computer program is executed by a processor, the audio processing method described above in the first aspect and various possible designs of the first aspect is implemented.
In a sixth aspect, an embodiment of the present disclosure provides a computer program, where when the computer program is executed by a processor, the audio processing method described above in the first aspect and various possible designs of the first aspect is implemented.
In order to more clearly illustrate the technical solutions in the embodiments of the present disclosure or in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present disclosure, and a person of ordinary skill in the art may obtain other drawings from these drawings without creative effort.
In order to make the purposes, technical solutions and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the protection scope of the present disclosure.
To address, but not limited to, one or more of the above problems, the embodiments of the present disclosure propose an audio processing method. The method provides a visualized, intelligent audio processing flow: it can automatically integrate a vocal and an accompaniment into a target audio, allows audio editing directly after the intelligent processing, and can package and output a material package when exporting the audio processing file, thereby meeting the individual needs of different users and improving the user experience of audio production.
In order to facilitate the understanding of the technical solution provided in the present disclosure, the following first gives a brief introduction to the application scenario of the audio processing method.
As an example, a user accesses the server 102 through the terminal device 101, for example, by uploading two pieces of audio data through the terminal device 101. The server 102 first performs sound source extraction (for a vocal, a musical instrument, etc.) on each of the two pieces of audio data, for example, acquiring the vocal from one piece of audio data and the accompaniment from the other; second, it performs paragraph recognition on the vocal and the accompaniment to acquire a target segment of each (such as the climax segment); finally, it performs rhythm detection and rhythm alignment on the target segments of the vocal and the accompaniment to generate the mixed target audio. The server 102 sends the target audio to the terminal device 101 so that the user can audition, save, share, or post-process the target audio.
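The server-side flow of this scenario can be read as a three-stage pipeline. The following is a minimal Python sketch of that data flow only. The class name `MashupPipeline`, its field names, and the stub stages are hypothetical and are not taken from the disclosure, and audio is assumed to be a mono list of floating-point samples.

```python
from dataclasses import dataclass
from typing import Callable, List

Samples = List[float]  # mono audio as floats; a simplifying assumption


@dataclass
class MashupPipeline:
    """Hypothetical orchestration of the three server-side stages."""
    extract_vocal: Callable[[Samples], Samples]
    extract_accompaniment: Callable[[Samples], Samples]
    find_segment: Callable[[Samples], Samples]
    align_and_mix: Callable[[Samples, Samples], Samples]

    def run(self, first_audio: Samples, second_audio: Samples) -> Samples:
        # Stage 1: sound source extraction on each uploaded piece of audio.
        vocal = self.extract_vocal(first_audio)
        accompaniment = self.extract_accompaniment(second_audio)
        # Stage 2: paragraph recognition to obtain the target segments.
        vocal_segment = self.find_segment(vocal)
        accompaniment_segment = self.find_segment(accompaniment)
        # Stage 3: rhythm detection, rhythm alignment and mixing.
        return self.align_and_mix(vocal_segment, accompaniment_segment)


# Stub stages that only demonstrate the data flow; real stages would be models.
pipeline = MashupPipeline(
    extract_vocal=lambda audio: audio,
    extract_accompaniment=lambda audio: audio,
    find_segment=lambda audio: audio[:2],
    align_and_mix=lambda v, a: [x + y for x, y in zip(v, a)],
)
target_audio = pipeline.run([0.1, 0.2, 0.3], [0.4, 0.5, 0.6])
```

Passing the stages in as callables keeps the sketch neutral about where each stage runs, which matches the statement that processing may be split between the terminal device and the server.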
The terminal device in this embodiment can be any electronic device with information display function, including but not limited to a smart phone, a laptop, a tablet, a smart vehicle device, a smart wearable device, a smart screen, etc.
The server of this embodiment can be an ordinary server or a cloud server. A cloud server, also called a cloud computing server or a cloud host, is a host product in a cloud computing service system. The server can also be a server of a distributed system, or a server combined with a blockchain.
It should be noted that the product implementation form of the present disclosure is a program code contained in platform software and deployed on an electronic device (which can also be a computing cloud or a mobile terminal and other hardware with computing capability). For example, the program code of the present disclosure may be stored inside an electronic device. At runtime, the program code runs in a host memory and/or a GPU memory of the electronic device.
It should also be noted that the technical solution provided in the present disclosure may be applied to a server or a terminal device, or a part of the processing may be performed by the terminal device and a part of the processing may be performed by the server, without any limitation to this embodiment.
The following is a detailed description of the technical solution provided by the present disclosure in combination with several specific embodiments. The following embodiments may be combined with each other and may not be repeated in some embodiments for the same or similar concepts or processes.
The embodiments of the present disclosure provide an audio processing method and apparatus, a device, and a storage medium. The method includes: acquiring, in response to a first instruction, a vocal from a piece of audio uploaded by a user; acquiring, in response to a second instruction, an accompaniment from another piece of audio uploaded by the user; and automatically mixing and matching, in response to a third instruction, the vocal and the accompaniment from the two pieces of audio. In this way, the efficiency of audio processing is improved and the personalized needs of users for audio production are met.
In this embodiment, the acquiring the vocal in response to the first instruction includes: acquiring audio data containing only the vocal in response to the first instruction.
In a possible implementation, the first instruction is generated in response to the user touching the screen control, and the vocal is acquired according to the first instruction.
In a possible implementation, the first instruction is generated in response to the user clicking on the screen control with the mouse, and the vocal is acquired according to the first instruction.
In a possible implementation, the first instruction is generated in response to the voice control of the user, and the vocal is acquired according to the first instruction.
It should be noted that the first instruction is not only used to indicate the acquisition of the audio data containing the vocal, but also to trigger the extraction of the vocal portion of the audio data.
In this embodiment, the acquiring the accompaniment in response to the second instruction includes: acquiring audio data containing only the accompaniment in response to the second instruction.
In a possible implementation, the second instruction is generated in response to the user touching the screen control, and the accompaniment is acquired according to the second instruction.
In a possible implementation, the second instruction is generated in response to the user clicking on the screen control with the mouse, and the accompaniment is acquired according to the second instruction.
In a possible implementation, the second instruction is generated in response to the voice control of the user, and the accompaniment is acquired according to the second instruction.
It should be noted that the second instruction is not only used to indicate the acquisition of the audio data containing the accompaniment, but also to trigger the extraction of the accompaniment portion of the audio data.
In a possible implementation, the third instruction is generated in response to the user touching the screen control, and the vocal and the accompaniment are mixed according to the third instruction to acquire the target audio.
In a possible implementation, the third instruction is generated in response to the user clicking on the screen control with the mouse, and the vocal and the accompaniment are mixed according to the third instruction to acquire the target audio.
In a possible implementation, the third instruction is generated in response to the voice control of the user, and the vocal and the accompaniment are mixed according to the third instruction to acquire the target audio.
In the embodiment of the present disclosure, mixing of the vocal and the accompaniment can also be described as performing a mashup of the vocal and the accompaniment. The user uploads two pieces of audio data to be mashed up by means of interface touch control or voice control; the vocal is extracted from one piece of the audio data, and the accompaniment is extracted from the other piece. In this way, the vocal and the accompaniment in the two pieces of audio are mixed and matched automatically, the efficiency of audio processing is improved, and the personalized needs of users for audio production are met.
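The way the three instructions build on one another can be sketched as a small state holder. This is an illustrative Python sketch under stated assumptions: the names `Instruction` and `MashupSession` are hypothetical, the vocal and accompaniment extraction is replaced by a placeholder copy, and the mix is a plain sample-wise sum.

```python
from enum import Enum
from typing import List, Optional


class Instruction(Enum):
    FIRST = "acquire_vocal"           # the first instruction
    SECOND = "acquire_accompaniment"  # the second instruction
    THIRD = "mix"                     # the third instruction


class MashupSession:
    """Hypothetical holder of the state built up by the three instructions."""

    def __init__(self) -> None:
        self.vocal: Optional[List[float]] = None
        self.accompaniment: Optional[List[float]] = None
        self.target_audio: Optional[List[float]] = None

    def handle(self, instruction: Instruction,
               audio: Optional[List[float]] = None) -> None:
        if instruction is Instruction.FIRST:
            # Placeholder: a real system would extract the vocal here.
            self.vocal = list(audio or [])
        elif instruction is Instruction.SECOND:
            # Placeholder: a real system would extract the accompaniment here.
            self.accompaniment = list(audio or [])
        elif instruction is Instruction.THIRD:
            if self.vocal is None or self.accompaniment is None:
                raise ValueError("vocal and accompaniment must be acquired first")
            # Placeholder mix: sample-wise sum over the shorter length.
            self.target_audio = [
                v + a for v, a in zip(self.vocal, self.accompaniment)
            ]


session = MashupSession()
session.handle(Instruction.FIRST, [0.25, 0.5])
session.handle(Instruction.SECOND, [0.25, 0.25])
session.handle(Instruction.THIRD)
```

The same `handle` entry point can be called regardless of whether the instruction originated from a touch operation, a mouse click, or voice control.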
On the basis of the above embodiments, the above audio processing process is introduced in detail through a specific embodiment.
In this embodiment, the first interface may also be described as an audio import interface. Optionally, the touch operation for the first control on the first interface, the touch operation for the second control, and the touch operation for the third control include but are not limited to click operations.
Illustratively,
It should be pointed out that the interface controls in this embodiment include but are not limited to icons, buttons, drop-down boxes, sliders, etc. The touch operations include but are not limited to click operations, long press operations, double click operations, sliding operations, etc.
Optionally, in some embodiments, the first voice input by the user is acquired in response to a long-press operation for the first control on the first interface, the first instruction is generated by the speech recognition, the first audio is imported according to the first instruction, and the vocal is extracted from the first audio.
Optionally, in some embodiments, the second voice input by the user is acquired in response to a long-press operation for the second control on the first interface, the second instruction is generated by the speech recognition, the second audio is imported according to the second instruction, and the accompaniment is extracted from the second audio.
Optionally, in some embodiments, the third voice input by the user is acquired in response to a long-press operation for the third control on the first interface, the third instruction is generated by the speech recognition, and the vocal and accompaniment are mixed according to the third instruction to acquire the target audio.
Optionally, in some embodiments, the user may also enter a control voice via a device physical button, such as a smartphone side key, to import the first or second audio as described above or to perform audio mixing.
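As an illustration of how a speech recognition result could be turned into one of the three instructions, the following Python sketch maps recognized text to an instruction name by keyword. The phrase table and the function name `instruction_from_speech` are assumptions made for this example; the disclosure does not specify the phrases or the recognition method.

```python
from typing import Optional

# Hypothetical phrase table, checked in order. "mix" comes first because a
# mixing command may also mention the vocal and the accompaniment.
KEYWORD_TO_INSTRUCTION = [
    ("mix", "third_instruction"),
    ("mashup", "third_instruction"),
    ("accompaniment", "second_instruction"),
    ("vocal", "first_instruction"),
]


def instruction_from_speech(recognized_text: str) -> Optional[str]:
    """Map text produced by a speech recognizer to an instruction name."""
    text = recognized_text.lower()
    for keyword, instruction in KEYWORD_TO_INSTRUCTION:
        if keyword in text:
            return instruction
    return None  # no known command was recognized
```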
Based on the first interface shown in
As an example,
For example, the user uploads recorded playing and singing audio and also uploads an existing finished music work. The vocal is extracted from the user's playing and singing audio, the accompaniment is extracted from the finished music work, and the extracted vocal and the extracted accompaniment are mixed to acquire the target audio, which integrates the user's vocal with the existing accompaniment. The above audio processing process makes it much easier for users to create personalized music and meets the music creation needs of different users.
Illustratively,
Optionally, the first interface 400 further includes: a fourth control 405 and a fifth control 406. The fourth control 405 is used to trigger custom processing of the vocal and/or the accompaniment, where the custom processing includes audio clipping of the vocal and/or the accompaniment. The fifth control 406 is used to trigger audio editing of the vocal and/or the accompaniment (that is, to go to the recording studio for audio editing or processing), as detailed below.
In an implementation of this embodiment, the acquiring the target audio by mixing the vocal and the accompaniment includes: acquiring a vocal segment of the vocal and an accompaniment segment of the accompaniment; acquiring the target audio by mixing the vocal segment and the accompaniment segment. That is, when mixing the vocal and the accompaniment, the mixable vocal segment and the mixable accompaniment segment are first extracted respectively from the vocal and accompaniment, and then the audio mixing is performed according to the vocal segment and accompaniment segment to acquire the target audio. Specifically, the vocal segment and accompaniment segment can be acquired by the following embodiments.
In an implementation, the vocal and the accompaniment are input into a paragraph recognition model respectively to acquire the vocal segment of the vocal and the accompaniment segment of the accompaniment. The paragraph recognition model is used to identify the target segment of audio. Specifically, the vocal is input into the paragraph recognition model to acquire the target segment of the vocal. The accompaniment is input into the paragraph recognition model to acquire the target segment of the accompaniment. The target segment can be a chorus segment, a climax segment, or other segment of the audio, for example, the target segment is a repeating segment in a song.
Optionally, the paragraph recognition model can be trained using a deep learning model, and this embodiment does not limit the structure of the deep learning model. The embodiment realizes the intelligent extraction of the vocal segment and the accompaniment segment by training the model, which can improve the efficiency and accuracy of the audio processing.
Optionally, the training process of the paragraph recognition model includes acquiring a training data set, where the training data set includes multiple audio samples and labeling information for each audio sample, and the labeling information is used to indicate the target segment corresponding to the audio sample; taking the multiple audio samples in the training data set as the input of the paragraph recognition model; taking the labeling information of each audio sample in the training data set as the output of the paragraph recognition model; training the paragraph recognition model until the loss function of the paragraph recognition model converges; stopping the training of the paragraph recognition model; and obtaining the model parameter of the trained paragraph recognition model.
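The "train until the loss function converges, then stop" step can be illustrated with a deliberately tiny stand-in. The sketch below is not the paragraph recognition model: it swaps in a one-feature logistic regression trained by gradient descent, with each sample assumed to be a hypothetical (loudness, label) pair where label 1 marks a frame belonging to the target segment. The learning rate, tolerance, and function names are assumptions.

```python
import math
from typing import List, Tuple


def predict(w: float, b: float, x: float) -> float:
    """Probability that a frame with feature x belongs to the target segment."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


def train_until_converged(
    samples: List[Tuple[float, int]],
    lr: float = 0.5,
    tol: float = 1e-6,
    max_epochs: int = 10_000,
) -> Tuple[float, float, float]:
    """Gradient descent on cross-entropy loss; stops when the loss stabilizes."""
    w, b = 0.0, 0.0
    prev_loss = float("inf")
    loss = prev_loss
    n = len(samples)
    for _ in range(max_epochs):
        grad_w = grad_b = loss = 0.0
        for x, y in samples:
            p = min(max(predict(w, b, x), 1e-12), 1.0 - 1e-12)  # keep log finite
            loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
            grad_w += (p - y) * x
            grad_b += p - y
        loss /= n
        if abs(prev_loss - loss) < tol:  # convergence criterion: stop training
            break
        prev_loss = loss
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b, loss  # the trained model parameters and the final loss


# Toy labeled data: quiet frames are non-target (0), loud frames are target (1).
data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.7, 1), (0.8, 1), (0.9, 1)]
w, b, final_loss = train_until_converged(data)
```

A deep learning model would replace `predict` and the hand-written gradients, but the outer loop (forward pass, loss, convergence check, parameter update) has the same shape.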
In this embodiment, the paragraph recognition model can be used to analyze the rhythm and loudness changes and other information of the input audio, identify the audio prelude, verse, chorus, interlude, bridge, epilogue, mute and other segments, and extract the most likely chorus, namely the climax. Specifically, the start and end time stamps of different segments are extracted, and subsequent clipping is carried out to finally output the target segment of the audio.
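Once segment labels and time stamps are available, selecting and clipping the target segment is straightforward. The following Python sketch assumes the model output has already been converted into a list of labeled segments; the `Segment` fields, including the `score` confidence, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    label: str    # e.g. "prelude", "verse", "chorus"
    start: float  # start time stamp in seconds
    end: float    # end time stamp in seconds
    score: float  # hypothetical model confidence for this label


def pick_target_segment(segments: List[Segment],
                        label: str = "chorus") -> Optional[Segment]:
    """Return the most likely segment with the given label, if any."""
    candidates = [s for s in segments if s.label == label]
    return max(candidates, key=lambda s: s.score, default=None)


def clip(samples: List[float], sample_rate: int, segment: Segment) -> List[float]:
    """Cut the samples between the segment's start and end time stamps."""
    start = max(0, int(round(segment.start * sample_rate)))
    end = min(len(samples), int(round(segment.end * sample_rate)))
    return samples[start:end]


segments = [
    Segment("verse", 0.0, 2.0, 0.9),
    Segment("chorus", 2.0, 4.0, 0.6),
    Segment("chorus", 6.0, 8.0, 0.8),
]
target = pick_target_segment(segments)
clipped = clip([float(i) for i in range(10)], sample_rate=1, segment=target)
```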
In an implementation, a soundtrack of the vocal and a soundtrack of the accompaniment are displayed on a second interface in response to a touch operation for a fourth control on the first interface; the vocal segment is acquired in response to an editing operation for the soundtrack of the vocal; and the accompaniment segment is acquired in response to an editing operation for the soundtrack of the accompaniment.
In this implementation, the target segment of the vocal and the target segment of the accompaniment are acquired through the user's editing operations on the interface and are then used for subsequent audio mixing. This approach adds user-defined processing of the imported vocal and accompaniment, increases the user's participation in audio production, and meets the needs of different users in audio production.
In an implementation of this embodiment, the acquiring the target audio by mixing the vocal and the accompaniment includes: acquiring a first rhythm of the vocal and a second rhythm of the accompaniment, performing rhythm alignment on the first rhythm of the vocal and the second rhythm of the accompaniment, and mixing based on the aligned vocal and accompaniment to acquire the target audio.
In an implementation, the second rhythm of the accompaniment is adjusted based on the first rhythm of the vocal, so that the first rhythm of the vocal is consistent with the second rhythm of the accompaniment.
In an implementation, the first rhythm of the vocal is adjusted based on the second rhythm of the accompaniment, so that the first rhythm of the vocal is consistent with the second rhythm of the accompaniment.
In an implementation of this embodiment, the acquiring the target audio by mixing the vocal and the accompaniment includes: acquiring the target audio by mixing the vocal segment of the vocal and the accompaniment segment of the accompaniment. Specifically, the first rhythm of the vocal segment and the second rhythm of the accompaniment segment are acquired, rhythm alignment is performed on the first rhythm of the vocal segment and the second rhythm of the accompaniment segment, and the aligned vocal segment and accompaniment segment are mixed to acquire the target audio.
In an implementation, the second rhythm of the accompaniment segment is adjusted based on the first rhythm of the vocal segment, so that the first rhythm of the vocal segment is consistent with the second rhythm of the accompaniment segment.
In an implementation, the first rhythm of the vocal segment is adjusted based on the second rhythm of the accompaniment segment, so that the first rhythm of the vocal segment is consistent with the second rhythm of the accompaniment segment.
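The two alignment directions described above differ only in which rhythm is taken as the reference. As a hedged illustration, the Python sketch below represents each rhythm as a tempo in beats per minute (an assumption; the disclosure does not fix the representation) and returns the factor by which each track's duration would be multiplied. The function names are hypothetical.

```python
from typing import Tuple


def stretch_ratio(source_bpm: float, reference_bpm: float) -> float:
    """Duration multiplier that brings the source tempo to the reference tempo.

    A ratio below 1 shortens (speeds up) the source; above 1 lengthens it.
    """
    if source_bpm <= 0 or reference_bpm <= 0:
        raise ValueError("tempos must be positive")
    return source_bpm / reference_bpm


def align(vocal_bpm: float, accompaniment_bpm: float,
          reference: str = "vocal") -> Tuple[float, float]:
    """Return (vocal_ratio, accompaniment_ratio) for the chosen reference."""
    if reference == "vocal":
        # Adjust the second rhythm (accompaniment) based on the first (vocal).
        return 1.0, stretch_ratio(accompaniment_bpm, vocal_bpm)
    if reference == "accompaniment":
        # Adjust the first rhythm (vocal) based on the second (accompaniment).
        return stretch_ratio(vocal_bpm, accompaniment_bpm), 1.0
    raise ValueError("reference must be 'vocal' or 'accompaniment'")
```

For example, with a 100 BPM vocal and a 120 BPM accompaniment, taking the vocal as the reference leaves the vocal unchanged and lengthens the accompaniment by a factor of 1.2; the same function applies to whole tracks and to segments.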
Based on the above several embodiments of the rhythm alignment, it can be seen that:
The third audio may be one audio out of the vocal and the accompaniment, and correspondingly, the fourth audio may be another audio out of the vocal and the accompaniment, or, the third audio may be one audio out of the vocal segment and the accompaniment segment, and the fourth audio may be another audio out of the vocal segment and the accompaniment segment.
It should be noted that the above embodiments involve the rhythm detection of the audio or the audio segment, which is used to detect the downbeat time in the beat and infer the speed of the entire audio or audio segment. The adjustment of the audio rhythm involves stretching or compressing the audio rhythm, usually by aligning the rhythm of the vocal track to the accompaniment track, and processing the vocal track file by stretching or compressing the audio.
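To make the detection and adjustment steps concrete, the Python sketch below infers a tempo from detected downbeat times and stretches or compresses a track by a given duration ratio. Several assumptions apply: the downbeat times are taken as given, a 4/4 meter is assumed by default, and the stretch uses plain linear-interpolation resampling, which also shifts pitch. A production system would more likely use a pitch-preserving time-scale modification method; that substitution is made here only to keep the example short.

```python
from statistics import median
from typing import List


def estimate_bpm(downbeat_times: List[float], beats_per_bar: int = 4) -> float:
    """Infer tempo from the median spacing between consecutive downbeats."""
    if len(downbeat_times) < 2:
        raise ValueError("at least two downbeats are needed")
    intervals = [b - a for a, b in zip(downbeat_times, downbeat_times[1:])]
    bar_seconds = median(intervals)  # median is robust to a missed downbeat
    return 60.0 * beats_per_bar / bar_seconds


def stretch(samples: List[float], ratio: float) -> List[float]:
    """Resample so the duration is multiplied by ratio (pitch shifts too)."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    if not samples:
        return []
    out_len = max(1, round(len(samples) * ratio))
    last = len(samples) - 1
    out = []
    for i in range(out_len):
        pos = min(i / ratio, last)  # position in the source signal
        left = int(pos)
        right = min(left + 1, last)
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out
```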
Through the rhythm detection and alignment of two audio or two audio segments, the vocal and accompaniment in the mixed target audio are better integrated, and the audio processing effect is improved.
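After alignment, the mixing step itself can be as simple as a weighted sum. The following is a minimal Python sketch, assuming mono samples normalized to the range [-1, 1]; the gain parameters and the hard clipping are illustrative choices, not details from the disclosure.

```python
from typing import List


def mix(vocal: List[float], accompaniment: List[float],
        vocal_gain: float = 1.0, accompaniment_gain: float = 1.0) -> List[float]:
    """Sum two aligned tracks sample by sample, padding the shorter with silence."""
    length = max(len(vocal), len(accompaniment))
    mixed = []
    for i in range(length):
        v = vocal[i] if i < len(vocal) else 0.0
        a = accompaniment[i] if i < len(accompaniment) else 0.0
        sample = vocal_gain * v + accompaniment_gain * a
        mixed.append(max(-1.0, min(1.0, sample)))  # hard clip to [-1, 1]
    return mixed


mixed_audio = mix([0.5, 0.8], [0.25, 0.5, -0.5])
```

Lowering one of the gains before summing is a common way to avoid clipping when both tracks are loud.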
In an implementation of this embodiment, in response to a touch operation for a third control on the first interface, the interface jumps to a third interface, where the third interface includes a third playing control, and the third playing control is used to trigger playing of the target audio. The third interface is the audio mixing preview interface. The following is a graphical illustration of the user interface changes involved in acquiring the target audio after the user imports two pieces of audio.
Illustratively,
Illustratively,
Illustratively,
Based on the above graphical third interface, the following will provide a detailed explanation of each functional control on the third interface through several specific embodiments.
In an implementation of this embodiment, a first window is displayed in response to a touch operation for a cover editing control on the third interface, and a target cover is acquired in response to a control selection operation on the first window. The first window includes a cover import control, one or more preset static cover controls, and one or more preset animation effect controls.
Optionally, the target cover is a static cover or a dynamic cover.
In an implementation, if the target cover is the dynamic cover, the acquiring the target cover in response to the control selection operation on the first window includes: acquiring a static cover and an animation effect in response to the control selection operation on the first window; and generating, according to an audio characteristic of the target audio, the static cover and the animation effect, a dynamic cover that changes with the audio characteristic of the target audio. The audio characteristic includes an audio beat and/or volume.
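One possible reading of a cover that "changes with the volume" is a cover whose display scale follows the loudness of the target audio. The Python sketch below computes a per-frame RMS volume envelope and maps it to a scale factor for the static cover. The frame size, the `base` and `depth` parameters, and the choice of a scaling animation are all assumptions made for illustration.

```python
import math
from typing import List


def volume_envelope(samples: List[float], frame_size: int) -> List[float]:
    """Root-mean-square volume of each consecutive frame of samples."""
    envelope = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        envelope.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return envelope


def cover_scales(envelope: List[float], base: float = 1.0,
                 depth: float = 0.2) -> List[float]:
    """Map each frame's volume to a cover scale in [base, base + depth]."""
    peak = max(envelope, default=0.0)
    if peak == 0.0:
        return [base for _ in envelope]  # silent audio: cover stays static
    return [base + depth * (v / peak) for v in envelope]


envelope = volume_envelope([0.0, 0.0, 1.0, 1.0, 0.5, 0.5], frame_size=2)
scales = cover_scales(envelope)
```

A beat-driven variant could pulse the scale at detected beat times instead of following the envelope.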
Illustratively,
Illustratively,
By providing users with the function of setting the audio cover, this embodiment realizes the personalized editing of the cover by different users and improves the user experience of audio production.
In an implementation of this embodiment, the data associated with the target audio is exported to the target location in response to an export instruction on the third interface. Optionally, the target location includes an album or a file system.
Illustratively, as shown in
Optionally, in some implementations, a fourth voice input by the user is acquired in response to a long-press operation for an export control 702 on a third interface 701, an export instruction is generated by speech recognition, and the data associated with the target audio is exported to the target location according to the export instruction.
In an implementation of this embodiment, the data associated with the target audio is shared to the target application in response to a sharing instruction on the third interface.
For example, as shown in
Optionally, in some implementations, a fifth voice input by the user is acquired in response to the long-press operation for the sharing control 704 on the third interface 701, the sharing instruction is generated by speech recognition, and the data associated with the target audio is shared, according to the sharing instruction, to the target application or the specified user in the target application.
Optionally, the data associated with the target audio includes at least one of the following: the target audio, the vocal, the accompaniment, a vocal segment of the vocal, an accompaniment segment of the accompaniment, a static cover of the target audio, and a dynamic cover of the target audio.
In summary, the data exported or shared by the user can contain only the target audio, or it can contain all the intermediate data during the process of acquiring the target audio. If the exported or shared data is too large, the data can be compressed and then the compressed data is exported locally or shared with other users. If the shared data received by other users contains all the intermediate data in the process of acquiring the target audio, the user can not only play the target audio, but also query or re-edit the intermediate data to generate new target audio, so as to realize the cooperation of multiple users in audio production, increase the interaction between users, and improve the user experience.
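Packaging the target audio together with its intermediate data, optionally compressed, can be sketched with a ZIP archive. The following Python example uses the standard `zipfile` module; the function name `build_material_package`, the file names, and the `manifest.json` entry are hypothetical and are not part of the disclosure.

```python
import io
import json
import zipfile
from typing import Dict


def build_material_package(files: Dict[str, bytes], compress: bool = True) -> bytes:
    """Bundle the target audio and intermediate data into one ZIP archive."""
    method = zipfile.ZIP_DEFLATED if compress else zipfile.ZIP_STORED
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=method) as archive:
        # A small manifest lets the recipient see what the package contains.
        archive.writestr("manifest.json", json.dumps({"entries": sorted(files)}))
        for name, data in files.items():
            archive.writestr(name, data)
    return buffer.getvalue()


# Placeholder byte strings stand in for real audio and cover data.
package = build_material_package({
    "target_audio.wav": b"\x00" * 1000,
    "vocal_segment.wav": b"\x01" * 1000,
    "accompaniment_segment.wav": b"\x02" * 1000,
    "static_cover.png": b"cover-bytes",
})
```

A recipient who opens the archive gets the intermediate segments alongside the target audio, which is what makes re-editing by another user possible.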
In an implementation of this embodiment, the interface jumps from the third interface to a fourth interface in response to a touch operation for an audio editing control on the third interface, where the fourth interface includes an audio processing function control. The fourth interface is the interface for audio post-processing, which can also be described as a recording studio interface; on the fourth interface, the user can perform audio post-processing on the vocal and the accompaniment in the target audio.
In an implementation of this embodiment, the interface jumps from the third interface to a fourth interface in response to a touch operation for an audio editing control on the third interface, where the fourth interface includes a trigger control associated with the audio processing function control, and the trigger control is used to trigger display of the audio processing function control.
Optionally, the audio processing function control includes one or more of the following:
Optionally, the audio optimization includes the optimization processing of the vocal and/or the accompaniment of the playing and singing audio of the user, that is, the audio optimization includes the optimization of playing and singing, such as the optimization processing of male guitar, female guitar, male piano, female piano, etc.
Optionally, the extraction of the accompaniment includes removal of vocal, removal of instruments, and other extraction processing.
Optionally, the style synthesis includes style optimization such as hot songs in car, classic pop, heart moments, relaxation moments, childhood memories, reggae style, and other styles.
Optionally, the audio mashup includes optimization processing for rhythm alignment, shifting, etc.
Illustratively,
Illustratively,
In the embodiment of the present disclosure, the user imports the first audio by touching the first control on the first interface, and the vocal is extracted from the first audio; the user then imports the second audio by touching the second control on the first interface, and the accompaniment is extracted from the second audio; finally, the vocal and the accompaniment are mixed to acquire the target audio when the user touches the third control on the first interface. The above process realizes the automatic mashup of the vocal and the accompaniment in two pieces of audio, improves the audio processing effect, and meets the personalized needs of users for audio production.
For the audio processing method corresponding to the above embodiment,
The acquisition module 1101 is configured to acquire a vocal in response to a first instruction;
The acquisition module 1101 is further configured to acquire an accompaniment in response to a second instruction;
The processing module 1102 is configured to acquire a target audio by mixing the vocal and the accompaniment in response to a third instruction.
In an embodiment of the present disclosure, the acquisition module 1101 is configured to import first audio and extract the vocal from the first audio in response to a touch operation for a first control on a first interface;
The acquisition module 1101 is further configured to import second audio and extract the accompaniment from the second audio in response to a touch operation for a second control on the first interface.
In an embodiment of the present disclosure, the processing module 1102 is configured to acquire the target audio by mixing the vocal and the accompaniment in response to a touch operation for a third control on the first interface.
In an embodiment of the present disclosure, the processing module 1102 is configured to:
In an embodiment of the present disclosure, the processing module 1102 is configured to:
In an embodiment of the present disclosure, the audio processing apparatus 1100 further includes: a display module 1103;
In an embodiment of the present disclosure, the processing module 1102 is configured to:
In an embodiment of the present disclosure, the processing module 1102 is configured to adjust the second rhythm of the fourth audio based on the first rhythm of the third audio, so that the first rhythm of the third audio and the second rhythm of the fourth audio are consistent.
In an embodiment of the present disclosure, the first interface includes:
In an embodiment of the present disclosure, the processing module 1102 is configured to jump to a third interface in response to a touch operation for the third control on the first interface, where the third interface includes a third playing control, and the third playing control is used to trigger playing of the target audio.
In an embodiment of the present disclosure, the display module 1103 is configured to display a first window in response to a touch operation for a cover editing control on a third interface, where the first window includes a cover import control, one or more preset static cover controls, and one or more preset animation effect controls;
In an embodiment of the present disclosure, the acquisition module 1101 is configured to acquire a static cover and animation effect in response to a control selection operation on the first window;
In an embodiment of the present disclosure, the processing module 1102 is configured to export data associated with the target audio to a target location in response to an export instruction on a third interface; where the target location includes an album or a file system.
In an embodiment of the present disclosure, the processing module 1102 is configured to share data associated with the target audio to a target application in response to a sharing instruction on a third interface.
In an embodiment of the present disclosure, the data associated with the target audio includes at least one of the following:
In an embodiment of the present disclosure, the processing module 1102 is configured to jump from a third interface to a fourth interface in response to a touch operation for an audio editing control on the third interface, where the fourth interface includes an audio processing function control or a trigger control associated with the audio processing function control, and the trigger control is used to trigger the display of the audio processing function control;
The audio processing apparatus provided in this embodiment can be used to execute the technical solutions of the above method embodiments. Its implementation principles and technical effects are similar, and details are not repeated here.
As shown in
Generally, the following apparatuses can be connected to the I/O interface 1205: an input apparatus 1206 such as a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output apparatus 1207 such as a liquid crystal display (LCD), loudspeaker, vibrator, etc.; a storage apparatus 1208 such as a magnetic tape, hard disk, etc.; and a communication apparatus 1209. The communication apparatus 1209 allows the electronic device 1200 to communicate with other devices in a wireless or wired manner to exchange data. Although
In particular, the process described above with reference to the flow diagram may be implemented as a computer software program in accordance with an embodiment of the present disclosure. For example, an embodiment of the present disclosure includes a computer program product that includes a computer program carried on a computer readable medium, where the computer program contains program code for performing the method shown in the flow diagram. In such an embodiment, the computer program can be downloaded and installed from the network via the communication apparatus 1209, or installed from the storage apparatus 1208 or the ROM 1202. When the computer program is executed by the processing apparatus 1201, the functions defined in the method of the embodiment of the present disclosure are performed.
It should be noted that the computer readable medium mentioned in the present disclosure may be a computer readable signal medium, a computer readable storage medium, or any combination of the two. A computer readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, equipment or device, or any combination of the above. More specific examples of a computer readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that contains or stores a program, where the program may be used by or in combination with an instruction execution system, equipment or device. In the present disclosure, a computer readable signal medium may include a data signal that is propagated in baseband or as part of a carrier wave and that carries computer readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. The computer readable signal medium may also be any computer readable medium, other than a computer readable storage medium, that can send, propagate or transmit a program used by or in combination with an instruction execution system, equipment or device.
The program code contained on the computer readable medium may be transmitted by any appropriate medium, including but not limited to: wire, optical cable, radio frequency (RF), etc., or any suitable combination of the above.
The computer readable medium may be included in the electronic device; or it may exist alone and not be incorporated into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, enable the electronic device to perform the method shown in the above embodiments.
Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof. The programming languages include object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the “C” language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, over the Internet using an Internet service provider).
The flow diagrams and block diagrams in the attached drawings illustrate possible implementations of the architecture, functions and operations of the systems, methods and computer program products in accordance with the various embodiments of the present disclosure. In this regard, each block in a flow diagram or block diagram may represent a module, program segment, or part of code that contains one or more executable instructions for implementing a specified logical function. It should also be noted that, in some alternative implementations, the functions labeled in the blocks can occur in an order different from that labeled in the attached drawings. For example, two blocks shown in succession can actually be executed substantially in parallel, and they can sometimes be executed in the reverse order, depending on the functionality involved. Note also that each block in the block diagrams and/or flow diagrams, and any combination of blocks in the block diagrams and/or flow diagrams, can be implemented either with a dedicated hardware-based system that performs the specified function or operation, or with a combination of dedicated hardware and computer instructions.
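As a non-limiting illustration of the point that two blocks shown in succession may be executed substantially in parallel, the following minimal sketch runs two independent blocks both sequentially and concurrently and obtains the same result. The functions `block_a`, `block_b`, `run_sequential` and `run_parallel` are hypothetical names introduced only for this illustration and do not correspond to any block in the drawings.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def block_a(samples: List[float]) -> List[float]:
    """Hypothetical block: halve the amplitude of every sample."""
    return [s * 0.5 for s in samples]


def block_b(samples: List[float]) -> int:
    """Hypothetical block: count the samples (independent of block_a)."""
    return len(samples)


def run_sequential(samples: List[float]) -> Tuple[List[float], int]:
    # Blocks executed one after the other, in the order drawn.
    return block_a(samples), block_b(samples)


def run_parallel(samples: List[float]) -> Tuple[List[float], int]:
    # Because neither block depends on the other's output, both can be
    # submitted at once and executed concurrently.
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_a = pool.submit(block_a, samples)
        future_b = pool.submit(block_b, samples)
        return future_a.result(), future_b.result()
```

Reordering of this kind is only safe when the blocks are independent; where one block consumes the output of another, the drawn order has to be preserved.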
The units described in an embodiment of the present disclosure may be implemented by means of software or hardware. In some cases, the name of a unit does not limit the unit itself. For example, the first acquisition unit can also be described as “the unit that acquires at least two Internet protocol addresses”.
The functions described above in the present disclosure can be performed at least in part by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that can be used include: a field-programmable gate array (FPGA), an application specific integrated circuit (ASIC), an application specific standard product (ASSP), a system on chip (SOC), a complex programmable logic device (CPLD) and so on.
In the context of the present disclosure, a machine readable medium may be a tangible medium that may contain or store a program for use by or in combination with an instruction execution system, apparatus, or device. The machine readable medium may be a machine readable signal medium or a machine readable storage medium. The machine readable storage medium may include, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of the machine readable storage medium may include an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In a first aspect, an audio processing method is provided according to one or more embodiments of the present disclosure, including:
According to one or more embodiments of the present disclosure, the acquiring the vocal in response to the first instruction includes:
According to one or more embodiments of the present disclosure, the acquiring the target audio by mixing the vocal and the accompaniment in response to the third instruction includes:
According to one or more embodiments of the present disclosure, the acquiring the target audio by mixing the vocal and the accompaniment includes:
According to one or more embodiments of the present disclosure, the acquiring the vocal segment of the vocal and the accompaniment segment of the accompaniment includes:
According to one or more embodiments of the present disclosure, the acquiring the vocal segment of the vocal and the accompaniment segment of the accompaniment includes:
According to one or more embodiments of the present disclosure, the acquiring the target audio by mixing the vocal and the accompaniment includes:
According to one or more embodiments of the present disclosure, the performing the rhythm alignment on the first rhythm of the third audio and the second rhythm of the fourth audio includes:
According to one or more embodiments of the present disclosure, the first interface includes:
According to one or more embodiments of the present disclosure, further including: jumping to a third interface in response to a touch operation for the third control on the first interface, where the third interface includes a third playing control, and the third playing control is used to trigger playing of the target audio.
According to one or more embodiments of the present disclosure, further including: displaying a first window in response to a touch operation for a cover editing control on a third interface, where the first window includes a cover import control, one or more preset static cover controls, and one or more preset animation effect controls;
According to one or more embodiments of the present disclosure, if the target cover is a dynamic cover, the acquiring the target cover in response to the control selection operation on the first window includes:
According to one or more embodiments of the present disclosure, further including: exporting data associated with the target audio to a target location in response to an export instruction on a third interface; where the target location includes an album or a file system.
According to one or more embodiments of the present disclosure, further including: sharing data associated with the target audio to a target application in response to a sharing instruction on the third interface.
According to one or more embodiments of this disclosure, the data associated with the target audio includes at least one of the following:
According to one or more embodiments of the present disclosure, further including: jumping from a third interface to a fourth interface in response to a touch operation for an audio editing control on the third interface, where the fourth interface includes an audio processing function control or a trigger control associated with the audio processing function control, and the trigger control is used to trigger display of the audio processing function control;
In a second aspect, an audio processing apparatus is provided according to one or more embodiments of the present disclosure, including:
According to one or more embodiments of the present disclosure, the acquisition module is configured to import first audio and extract the vocal from the first audio in response to a touch operation for a first control on a first interface;
According to one or more embodiments of the present disclosure, the processing module is configured to acquire the target audio by mixing the vocal and the accompaniment in response to a touch operation for a third control on the first interface.
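As a non-limiting illustration of mixing the vocal and the accompaniment into the target audio, the following minimal sketch sums two mono tracks sample by sample. It assumes that both tracks are already decoded to floating-point samples in the range [-1.0, 1.0] at the same sample rate; the function name `mix_tracks` and the gain parameters are hypothetical and are not taken from the present disclosure.

```python
from typing import List


def mix_tracks(
    vocal: List[float],
    accompaniment: List[float],
    vocal_gain: float = 1.0,
    accompaniment_gain: float = 1.0,
) -> List[float]:
    """Mix two mono tracks of float samples in [-1.0, 1.0].

    The shorter track is treated as silence once it ends, and the summed
    signal is hard-clipped so it stays within the valid sample range.
    """
    length = max(len(vocal), len(accompaniment))
    mixed: List[float] = []
    for i in range(length):
        v = vocal[i] if i < len(vocal) else 0.0
        a = accompaniment[i] if i < len(accompaniment) else 0.0
        sample = v * vocal_gain + a * accompaniment_gain
        mixed.append(max(-1.0, min(1.0, sample)))
    return mixed
```

Hard clipping is used here only to keep the sketch short; an actual implementation might instead normalize or limit the mixed signal.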
According to one or more embodiments of the present disclosure, the processing module is configured to:
According to one or more embodiments of the present disclosure, the processing module is configured to:
According to one or more embodiments of the present disclosure, the audio processing apparatus also includes: a display module;
According to one or more embodiments of the present disclosure, the processing module is configured to:
According to one or more embodiments of the present disclosure, the processing module is configured to adjust the second rhythm of the fourth audio based on the first rhythm of the third audio, so that the first rhythm of the third audio and the second rhythm of the fourth audio are consistent.
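As a non-limiting illustration of adjusting the second rhythm based on the first rhythm, the following minimal sketch assumes that each rhythm can be summarized as a tempo in beats per minute (BPM), which is an assumption of this example rather than a statement of the present disclosure. The functions `stretch_ratio` and `align_beats` are hypothetical names; the sketch only computes the playback-rate factor and the resulting beat positions, and does not perform the time stretching of the audio samples.

```python
from typing import List


def stretch_ratio(first_bpm: float, second_bpm: float) -> float:
    """Playback-rate factor to apply to the fourth audio so that its tempo
    (second_bpm) becomes equal to the tempo of the third audio (first_bpm)."""
    if first_bpm <= 0 or second_bpm <= 0:
        raise ValueError("tempos must be positive")
    return first_bpm / second_bpm


def align_beats(beat_times: List[float], ratio: float) -> List[float]:
    """Beat timestamps (seconds) of the fourth audio after it is played
    back 'ratio' times faster; a faster rate shortens every interval."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return [t / ratio for t in beat_times]
```

For example, if the third audio is at 120 BPM and the fourth audio is at 60 BPM, the factor is 2.0, and beats of the fourth audio at 0 s, 1 s and 2 s move to 0 s, 0.5 s and 1 s, which matches the beat spacing of the third audio.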
According to one or more embodiments of the present disclosure, the first interface includes:
According to one or more embodiments of the present disclosure, the processing module is configured to jump to a third interface in response to a touch operation for the third control on the first interface, where the third interface includes a third playing control, and the third playing control is used to trigger playing of the target audio.
According to one or more embodiments of the present disclosure, the display module is configured to display a first window in response to a touch operation for a cover editing control on a third interface, where the first window includes a cover import control, one or more preset static cover controls, and one or more preset animation effect controls;
According to one or more embodiments of the present disclosure, the acquisition module is configured to acquire a static cover and animation effect in response to a control selection operation on the first window;
According to one or more embodiments of the present disclosure, the processing module is configured to export data associated with the target audio to a target location in response to an export instruction on a third interface; where the target location includes an album or a file system.
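As a non-limiting illustration of exporting to a file system as the target location, the following minimal sketch writes mono floating-point samples to a 16-bit PCM WAV file using the Python standard library. The function name `export_wav`, the WAV format and the default sample rate are assumptions made for this example; the present disclosure does not specify a file format, and export to an album is not covered by this sketch.

```python
import struct
import wave
from pathlib import Path
from typing import List


def export_wav(samples: List[float], path: Path, sample_rate: int = 44100) -> Path:
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    # Create the target directory if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    # Clamp each sample and convert it to a little-endian signed 16-bit value.
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    with wave.open(str(path), "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16 bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(frames)
    return path
```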
According to one or more embodiments of the present disclosure, the processing module is configured to share data associated with the target audio to a target application in response to a sharing instruction on a third interface.
According to one or more embodiments of the present disclosure, the data associated with the target audio includes at least one of the following:
According to one or more embodiments of the present disclosure, the processing module is configured to jump from a third interface to a fourth interface in response to a touch operation for an audio editing control on the third interface, where the fourth interface includes an audio processing function control or a trigger control associated with the audio processing function control, and the trigger control is used to trigger the display of the audio processing function control;
In a third aspect, according to one or more embodiments of the present disclosure, an electronic device is provided, including: at least one processor and a memory;
In a fourth aspect, according to one or more embodiments of the present disclosure, a computer-readable memory medium is provided, where the computer-readable memory medium stores a computer-executed instruction, and when a processor executes the computer-executed instruction, the audio processing method described above in the first aspect and in various possible designs of the first aspect is implemented.
In a fifth aspect, according to one or more embodiments of the present disclosure, a computer program product is provided, including a computer program, where, when the computer program is executed by a processor, the audio processing method described above in the first aspect and in various possible designs of the first aspect is implemented.
In a sixth aspect, according to one or more embodiments of the present disclosure, a computer program is provided, where, when the computer program is executed by a processor, the audio processing method described above in the first aspect and in various possible designs of the first aspect is implemented.
The above description merely sets out preferred embodiments of the present disclosure and explains the technical principles used. Those skilled in the art should understand that the scope of the present disclosure is not limited to technical solutions formed by the particular combination of the above technical features, but also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the disclosed concept, for example, a technical solution formed by substituting the above features with (but not limited to) technical features having similar functions disclosed in the present disclosure.
Furthermore, although operations are described in a particular order, this should not be understood as requiring that they be performed in the particular order indicated or in sequential order. In certain circumstances, multitasking and parallel processing can be beneficial. Similarly, although a number of specific implementation details are included in the above discussion, these should not be interpreted as limiting the scope of the present disclosure. Some features described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment can also be implemented in multiple embodiments individually or in any suitable sub-combination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it should be understood that the subject matter defined in the attached claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims.
Number | Date | Country | Kind |
---|---|---|---|
202210495456.4 | May 2022 | CN | national |
This application is a National Stage of International Application No. PCT/CN2023/092377, filed May 5, 2023, which claims priority to Chinese Patent Application No. 202210495456.4, filed May 7, 2022, both of which are hereby incorporated by reference in their entireties.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2023/092377 | 5/5/2023 | WO | |