The present application relates to the field of information processing technologies and, in particular, to an audio processing method and apparatus, a device and a storage medium.
With the continuous development of computer technologies and the growing demand for personalization, more and more users are no longer satisfied with a fixed media creation style and hope to create media content in their own styles. Audio editing is a typical way for users to edit media content to create stylized media content.
The existing audio editing functions are limited and cannot meet the diverse and personalized media creation needs of users. Therefore, there is an urgent need to expand audio editing functions to meet the diverse and personalized needs of users.
Embodiments of the present application provide an audio processing method and apparatus, a device and a storage medium, to diversify audio editing functions and meet the personalized needs of users.
In a first aspect, an embodiment of the present disclosure provides an audio processing method, including:
In a second aspect, an embodiment of the present disclosure provides an audio processing apparatus, including:
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: a processor and a memory;
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium, where a computer-executable instruction is stored in the computer-readable storage medium, and when the computer-executable instruction is executed by a processor, the audio processing method described in the above first aspect and various possible designs of the first aspect is implemented.
In a fifth aspect, an embodiment of the present disclosure provides a computer program product, including a computer program, where when the computer program is executed by a processor, the audio processing method described in the above first aspect and various possible designs of the first aspect is implemented.
In a sixth aspect, an embodiment of the present disclosure provides a computer program, where when the computer program is executed by a processor, the audio processing method described in the above first aspect and various possible designs of the first aspect is implemented.
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description serve to describe the principles of the disclosure.
To make the objectives, technical solutions and advantages of embodiments of the present disclosure clearer, in the following, the technical solutions in the embodiments of the present disclosure will be clearly and comprehensively described with reference to the accompanying drawings in the embodiments of the present disclosure. Apparently, the described embodiments are some but not all of the embodiments of the present disclosure. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments in the present disclosure without making creative efforts shall fall within the protection scope of the present disclosure.
Embodiments of the present disclosure aim at the problem that the existing audio editing functions cannot meet the needs of users for diversified and personalized audio production, and propose an audio processing method. With this method, not only can extraction processing be performed on audio, for example, extraction of a vocal and/or an accompaniment, but the extracted vocal and/or accompaniment can also be presented to a user for audition, storage, sharing or post-processing, so that the diverse needs of the user can be met and the user experience is improved.
The technical solution provided by the embodiments of the present disclosure can be applied to a scenario where an electronic device processes audio. The electronic device here may be any device having an audio processing function, and may be a terminal device, a server, a virtual machine, etc., or may be a distributed computer system composed of one or more servers and/or computers, etc. The terminal device here includes but is not limited to a smart phone, a notebook computer, a desktop computer, a tablet computer, a vehicle-mounted device, a smart wearable device, a smart screen, etc., which is not limited in the embodiments of the present disclosure. The server may be an ordinary server or a cloud server. The cloud server is also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system. The server here may also be a server of a distributed system, or a server combined with a blockchain.
It is worth noting that the product implementation of the present disclosure is in the form of program code included in platform software and deployed on an electronic device (or hardware having computing capabilities, such as a computing cloud or a mobile terminal). Illustratively, the program code of the present disclosure may be stored inside the electronic device. At runtime, the program code runs in a host memory and/or a GPU memory of the electronic device.
In the embodiments of the present disclosure, “multiple” means two or more. “And/or” describes the association relationship of associated objects, indicating that there may be three types of relationships. For example, A and/or B may indicate: A exists alone, A and B exist simultaneously, and B exists alone. The character “/” generally indicates that the contextual objects are in an “or” relationship.
In the following, the technical solution of the present disclosure will be described in detail through specific embodiments. It should be noted that the following specific embodiments may be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments.
Embodiments of the present disclosure provide an audio processing method and apparatus, a device, and a storage medium. By acquiring, in response to an audio acquisition instruction, to-be-processed audio, performing, in response to an audio extraction instruction for the to-be-processed audio, audio extraction on the to-be-processed audio to obtain target audio, where the target audio is a vocal and/or an accompaniment extracted from the to-be-processed audio, and presenting the target audio, the directly extracted vocal and/or accompaniment can be presented to a user for the user to play, save, share or process; thus, the diverse needs of users can be met and the user experience is improved.
Illustratively, the audio processing method may include the following steps.
S101, acquire, in response to an audio acquisition instruction, to-be-processed audio.
In the embodiment of the present disclosure, when a user uses the electronic device to process audio, the audio acquisition instruction may be sent to the electronic device, so that the electronic device acquires the to-be-processed audio, in response to the acquired audio acquisition instruction.
Illustratively, the audio acquisition instruction may be sent by the user through a human-computer interaction interface of the electronic device, for example, by touch-controlling a control on the human-computer interaction interface, or by voice (in this case, the electronic device has controls with functions such as voice acquisition or playing), and there is no limitation in this regard here.
In an implementation, in response to the detected or received audio acquisition instruction, the electronic device may receive the to-be-processed audio from other devices, or read the to-be-processed audio from a database stored by itself (at this time, a database is deployed in the electronic device), or may fetch the to-be-processed audio from a cloud. The way of acquiring the to-be-processed audio is not limited in the embodiment of the present disclosure, and may be determined according to an actual scenario, and details are not described here.
It can be understood that in the embodiment of the present disclosure, the to-be-processed audio acquired by the electronic device may be preprocessed audio, for example, audio data obtained through audio extraction on an acquired target video by the electronic device, or may be unprocessed audio. There is no limitation in this regard in the embodiment.
S102, perform, in response to an audio extraction instruction for the to-be-processed audio, audio extraction on the to-be-processed audio, to obtain target audio, where the target audio is a vocal and/or an accompaniment extracted from the to-be-processed audio.
Illustratively, after the electronic device acquires the to-be-processed audio, the user can send the audio extraction instruction to the electronic device, so that the electronic device, in response to the audio extraction instruction, performs the audio extraction on the to-be-processed audio and extracts the target audio from the to-be-processed audio, to obtain the vocal and/or the accompaniment extracted from the to-be-processed audio, that is, the target audio may be at least one of the vocal and the accompaniment.
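The embodiment does not limit the specific separation algorithm used for the audio extraction. Purely as an illustrative sketch, the following Python code shows one classical baseline, center-channel cancellation on stereo audio; the file paths are assumptions for illustration and are not part of the disclosure:

```python
# Sketch of a classical vocal/accompaniment split for stereo audio:
# center-channel cancellation. The separation algorithm and the file
# paths here are illustrative assumptions, not the disclosed method.
import numpy as np
import soundfile as sf

def split_stereo(path_in: str):
    """Return (vocal_estimate, accompaniment_estimate, sample_rate)."""
    audio, sr = sf.read(path_in)  # shape (n_samples, 2) for stereo input
    if audio.ndim != 2 or audio.shape[1] != 2:
        raise ValueError("center-channel cancellation needs stereo input")
    left, right = audio[:, 0], audio[:, 1]
    # Vocals are commonly mixed to the center, i.e. equally in both channels.
    vocal = 0.5 * (left + right)          # mid signal: rough vocal estimate
    accompaniment = 0.5 * (left - right)  # side signal: rough accompaniment
    return vocal, accompaniment, sr

if __name__ == "__main__":
    vocal, accomp, sr = split_stereo("to_be_processed.wav")  # assumed path
    sf.write("target_vocal.wav", vocal, sr)
    sf.write("target_accompaniment.wav", accomp, sr)
```

In practice, a learned source separation model would typically replace this baseline, since center-channel cancellation only roughly approximates the vocal/accompaniment split.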
Illustratively, the electronic device may acquire the audio extraction instruction sent by the user through the human-computer interaction interface, or may acquire the audio extraction instruction sent by the user by voice. There is no limitation in this regard in the embodiment.
S103, present the target audio.
In the embodiment, after extracting the target audio from the to-be-processed audio, the electronic device can present the target audio for the user to play, save, share and/or process.
Illustratively, the electronic device may present the target audio on an interface of a target application on which controls operable by the user, such as a save control, a play control, a process control, and the like, are deployed. In an implementation, the process control is configured to trigger the presentation of the target audio on a processing page. The processing page may be a page for performing audio processing. On this page, users can perform various audio editing and/or processing, and output final processing results.
In the audio processing method provided by the embodiment of the present disclosure, the to-be-processed audio is acquired in response to the audio acquisition instruction; in response to the audio extraction instruction for the to-be-processed audio, the audio extraction is performed on the to-be-processed audio to obtain the target audio, where the target audio is the vocal and/or the accompaniment extracted from the to-be-processed audio; and finally the target audio is presented. In this technical solution, by presenting the extracted target audio, that is, by using the solution which enables the accompaniment extraction result to be open and output to the user, the user can choose to play, save, share, process or perform other operations on the target audio according to the needs; thus, the personalized needs of users are met and the user experience is improved.
In order to enable readers to have a deeper understanding of the implementation principles of the present disclosure, further refinements are now made in conjunction with the following embodiments.
Illustratively, on the basis of the foregoing embodiments, the audio processing method may include the following steps.
S201, acquire, in response to a touch-control operation on a first control on a first interface, the to-be-processed audio.
In the embodiment of the present disclosure, it is assumed that the to-be-processed audio is audio acquired by the electronic device in response to the user's touch-control operation on the first interface. That is, in this embodiment, the first interface is an interface for audio uploading.
Illustratively,
It can be understood that the touch-control operation may also be interpreted as a press operation, a touch operation, or a click operation, etc. The press operation may be a long press, a short press, or a continuous press. The embodiment does not limit the specific meaning of the touch-control operation.
Illustratively, as shown in (b) of
In an implementation, the extraction option may include a vocal removal control and an accompaniment removal control. The vocal removal control is configured to trigger the removal of the vocal in the audio, and the accompaniment removal control is configured to trigger the removal of the accompaniment in the audio.
In a possible design of the embodiment, the extraction option may also include an accompaniment extraction control (not shown) which can be configured to trigger the extraction of various types of audio components, such as the vocal and the accompaniment, in the audio, to obtain the vocal and the accompaniment in the audio. There is no limitation in this regard in the embodiment.
S202, perform, in response to a touch-control operation on a second control on a second interface, the audio extraction on the to-be-processed audio, to obtain the target audio, where the second control is configured to trigger the audio extraction.
In the embodiment of the present disclosure, after acquiring the to-be-processed audio, the electronic device can perform the extraction operation on the to-be-processed audio, to obtain the target audio.
Illustratively, as shown in (b) of
It can be understood that, in the embodiment of the present disclosure, the first interface, the second interface, and subsequent interfaces represent different interfaces, and there is no sequence between them. Similarly, the first control, the second control, and subsequent controls also only represent different controls, and there is no sequence between them either. For example, the second control may be a first control on the second interface.
Illustratively, in a possible design of the embodiment of the present disclosure, the above S103 may be implemented through the following S203.
S203, display, on a third interface, an audio graphic corresponding to the target audio and/or a third control associated with the target audio, where the third control is configured to trigger playing of the target audio.
Illustratively, in this possible design of the embodiment, after obtaining the target audio, the electronic device can display the audio graphic corresponding to the target audio and/or the third control associated with the target audio on the third interface, so as to present the target audio to the user.
Illustratively, as shown in (c) of
In an implementation, in the first area 330 in (c) of
Correspondingly, when the user touch-controls the third control 331, the electronic device can play the target audio in response to the touch-control operation on the third control 331, and present the audio graphic 332 that changes with a waveform amplitude of the target audio.
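Purely as an illustrative sketch, the audio graphic described above could be driven by a per-frame amplitude envelope computed from the target audio; the frame length and the normalization below are assumptions for illustration:

```python
# Sketch of the amplitude envelope that an audio graphic could render as
# bars changing with the waveform amplitude; frame_len is an assumption.
import numpy as np

def amplitude_envelope(samples: np.ndarray, frame_len: int = 1024) -> np.ndarray:
    """Peak amplitude per frame, normalized to [0, 1] for drawing."""
    n_frames = len(samples) // frame_len
    if n_frames == 0:
        return np.zeros(0)
    frames = samples[: n_frames * frame_len].reshape(n_frames, frame_len)
    peaks = np.abs(frames).max(axis=1)
    peak = peaks.max()
    return peaks / peak if peak > 0 else peaks
```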
Illustratively, in another possible design of the embodiment of the present disclosure, the above S103 may be implemented through the following S204.
S204, display, on a third interface, a fourth control associated with the target audio, where the fourth control is configured to trigger an export of data associated with the target audio to a target location.
The target location includes an album or a file system.
Illustratively, in this possible design of the embodiment, after obtaining the target audio, the electronic device may present the target audio to the user in a manner of displaying the fourth control associated with the target audio on the third interface.
Illustratively, as shown in (c) of
Correspondingly, when the user touch-controls the fourth control 333, the electronic device may export the target audio to the target location in response to the touch-control operation on the fourth control 333.
Illustratively, when the electronic device exports the target audio, the target audio may be exported to the target location in an audio format or in a file format. There is no limitation in this regard in the embodiment.
Illustratively, in another possible design of the embodiment of the present disclosure, the above S103 may be implemented through the following S205.
S205, display, on a third interface, a fifth control associated with the target audio, where the fifth control is configured to trigger audio editing of the target audio.
Illustratively, in this possible design of the embodiment, after obtaining the target audio, the electronic device may also display the fifth control associated with the target audio on the third interface.
Illustratively, as shown in (c) of
Correspondingly, when the user touch-controls the fifth control 334, the electronic device may perform an audio editing operation on the target audio in response to the touch-control operation on the fifth control 334.
In an embodiment, the audio editing may include one or more of the following: editing the audio to optimize the audio; extracting the vocal and/or the accompaniment from the audio; extracting the vocal from the audio, and mixing the extracted vocal with a preset accompaniment; and extracting the vocal from a first audio, extracting the accompaniment from a second audio, and mixing the extracted vocal with the extracted accompaniment.
The embodiment does not limit the specific content of audio editing, which can be determined according to an actual situation, and details are not described here.
In the audio processing method provided in the embodiment, the to-be-processed audio is acquired in response to the touch-control operation on the first control on the first interface, and the audio extraction is performed, in response to the touch-control operation on the second control on the second interface, on the to-be-processed audio to obtain the target audio, where the second control is configured to trigger the audio extraction. Finally, the audio graphic corresponding to the target audio and/or the third control associated with the target audio and configured to trigger the playing of the target audio can be displayed on the third interface; and/or the fourth control associated with the target audio and configured to trigger the export of the data associated with the target audio to the target location can be displayed on the third interface; and/or the fifth control associated with the target audio and configured to trigger the audio editing of the target audio can be displayed on the third interface. In this technical solution, audio uploading, audio processing, and audio presentation in various ways are performed through the controls on the interfaces; thus, the audio processing functions of the electronic device are enriched, the audio processing intelligence of the electronic device is improved, the personalized needs of users are met, and the user experience is improved.
In an embodiment of the present disclosure, the audio editing of the target audio in S205 above may include the following steps.
A1, present, in response to an audio processing instruction, one or more audio processing function controls, where the one or more audio processing function controls are configured to trigger execution of corresponding audio processing functions.
A2, perform, in response to a touch-control operation on one audio processing function control in the one or more audio processing function controls, audio processing corresponding to the one audio processing function control, on the target audio, to obtain the processed target audio.
In an implementation, in these steps, when the electronic device presents the obtained target audio on the third interface 33, after the user plays the target audio through the third control 331, listens to it, and determines that the target audio does not meet requirements, the user can further give the audio processing instruction to proceed with editing the target audio to obtain the processed target audio.
Illustratively, when receiving the user's audio processing instruction, the electronic device may respond to the instruction and present the one or more audio processing function controls, so as to detect the audio processing instructions given by the user through touch-controlling different audio processing function controls, and then perform different audio processing functions in response to the detected operations.
In a possible design of the embodiment, after the user's touch-control operation on the fifth control 334 on the third interface (for example, the export to audio track in
As an example, the electronic device presents, in response to a touch-control operation on a sixth control on the fourth interface, one or more audio processing function controls or a seventh control associated with the one or more audio processing function controls. The seventh control is configured to trigger the presentation of the one or more audio processing function controls on a fifth interface.
In an implementation, presenting the one or more audio processing function controls includes presenting the one or more audio processing function controls in a window form, or presenting the one or more audio processing function controls through the fifth interface.
In a possible design,
Illustratively, when the electronic device responds to the detection of a touch-control operation on the sixth control 411, as shown in (b) of
In another possible design,
Illustratively, as shown in (b) of
Correspondingly, in response to the detection of the touch-control operation on the seventh control 512, as shown in (c) of
As another example, in response to a sliding operation on the fourth interface, the electronic device presents one or more audio processing function controls or the seventh control associated with the one or more audio processing function controls, and the seventh control is configured to trigger the presentation of the one or more audio processing function controls on the fifth interface.
In a possible design of the embodiment of the present disclosure, when the user performs a sliding operation on the fourth interface 41, correspondingly, the electronic device may present, in response to the sliding operation on the fourth interface 41, the one or more audio processing function controls directly through a window form or on the fifth interface. Reference may be made to
In another possible design of the embodiment of the present disclosure, when the user performs a sliding operation on the fourth interface (for example, a left sliding operation; correspondingly, when the user performs a right sliding operation, it is possible to return from the audio console interface 51 to the fourth interface 41), correspondingly, the electronic device may present, in response to the sliding operation on the fourth interface, the seventh control associated with the one or more audio processing function controls and then present, in response to the detection of the touch-control operation on the seventh control, the one or more audio processing function controls directly in the form of a window or on the fifth interface. Reference may be made to
In an embodiment of the present disclosure, referring to the above-mentioned
It can be understood that the embodiment of the present disclosure does not limit the types and functions of the controls included on each interface, which can be determined according to actual needs, and details are not described here.
Illustratively, the following function support may further be included on the fourth interface 41:
At the same time, on the audio console interface 51, control of the volume of a sub-track 515 and a general output channel 516 is also supported; on the right side of the volume slider, an effector control 517 is also included. By touch-controlling the effector control 517, one can choose to enter an effector interface, where one can choose a desired effect preset and modify the degree to which an effect is applied. Under the effector button, one can also choose audio processing to unlock more audio processing methods, which are not described here.
Furthermore, in the embodiment of the present disclosure, after the various audio generation processes are completed, if the duration of an audio track needs to be edited, the user can return to the fourth interface (the audio track interface) and click to select an audio track waveform, with the following operations supported: audio splitting, audio cutting, audio copying and segment deletion.
In an implementation, upon a long press on a blank track, a paste button can be called out, with which cut or copied audio can be pasted. In addition, changing the audio duration by dragging the beginning and the end of the audio is also supported.
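Purely as an illustrative sketch, the track edits named above, i.e., splitting a clip and changing its duration by moving its beginning and end, can be modeled as sample-index slicing; the time-based helper functions below are assumptions for illustration, not the disclosed implementation:

```python
# Sketch of audio splitting and duration trimming as sample-index slicing;
# the time-based API is an illustrative assumption.
import numpy as np

def split(audio: np.ndarray, sr: int, at_sec: float):
    """Split one clip into two clips at a time point (seconds)."""
    i = int(at_sec * sr)
    return audio[:i].copy(), audio[i:].copy()

def trim(audio: np.ndarray, sr: int, start_sec: float, end_sec: float):
    """Change a clip's duration by moving its beginning and end."""
    return audio[int(start_sec * sr): int(end_sec * sr)].copy()
```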
In an embodiment of the present disclosure, as shown in the above-mentioned
In an embodiment, audio optimization may also be referred to as playing and singing optimization, which is a solution for optimizing the vocal and/or musical instruments in the audio. For example, referring to
Accompaniment extraction can include options of vocal removal, accompaniment removal, or accompaniment extraction (i.e., obtaining the vocal and the accompaniment after the extraction).
Style synthesis can also be called one-key remix, that is, the extracted vocal can be mixed and edited with a preset accompaniment. In an implementation, the preset accompaniment may include, but is not limited to, different genres such as car music, classic pop, heartbeat moments, relaxing moments, childhood fun, hip-hop backstreet, future bass, reggae style, drumbeat, etc. The embodiment of the present disclosure does not limit the names of the various genres, which can be named based on the needs of users, and details are not described here.
Audio mashup is a solution for mixing and editing at least two pieces of audio, which can be mixing and editing of the vocal and the accompaniment, or mixing and editing of at least two pieces of vocal, or mixing and editing of at least two pieces of accompaniment. The embodiment of the present disclosure does not limit the source audio used.
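Purely as an illustrative sketch, the mixing and editing described for style synthesis and audio mashup can be expressed as a gain-weighted overlay of two signals; the gain values and the truncate-to-shortest alignment below are assumptions for illustration:

```python
# Sketch of mixing an extracted vocal with an accompaniment; the gains
# and length alignment are illustrative assumptions.
import numpy as np

def mix(vocal: np.ndarray, accompaniment: np.ndarray,
        vocal_gain: float = 1.0, accomp_gain: float = 0.8) -> np.ndarray:
    n = min(len(vocal), len(accompaniment))       # align the two lengths
    mixed = vocal_gain * vocal[:n] + accomp_gain * accompaniment[:n]
    peak = np.abs(mixed).max()
    return mixed / peak if peak > 1.0 else mixed  # normalize to avoid clipping
```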
In the embodiment, the electronic device may perform, in response to a touch-control operation on a first audio processing function control, an audio processing function corresponding to the first audio processing function control. The first audio processing function control may be at least one set of controls from various types of controls such as the audio optimization control, the accompaniment extraction control, the style synthesis control, and the audio mashup control.
In the embodiment of the present disclosure, a solution to jump from an accompaniment extraction function interface to an audio processing function interface is provided for the user, shortening the operation path, allowing the user to continue editing and creation, meeting the user's diverse and personalized creation needs, and improving the user experience.
On the basis of the above-mentioned embodiments, when the electronic device presents the obtained target audio on the third interface 33, after the user plays the target audio through the third control 331, listens to it, and determines that the target audio meets the requirements, the user can give an audio export instruction through the fourth control 333 on the third interface 33, to export the target audio to the target location, for example, to the album or the file system.
As an example, in response to an operation on the fourth control 333 on the third interface 33, the data associated with the target audio can be directly exported to the target location, where the data associated with the target audio may include the to-be-processed audio, the target audio (the accompaniment and/or the vocal) obtained through execution of the audio extraction, etc., and may also be an audio segment used in an audio processing process, and details are not described here.
As another example, the embodiment of the present disclosure also provides a function of adding a cover to the target audio. Therefore, in response to the touch-control operation on the fourth control 333 on the third interface 33, a jump of the interface from the third interface 33 to a sixth interface can be performed, and the target audio is displayed on the sixth interface.
Correspondingly, in response to an interface editing instruction given on the sixth interface by the user, a cover can be added to the generated target audio or an original cover of the generated target audio can be changed. Similarly, in response to a detected save instruction, a generated target cover and the data associated with the target audio can be saved to a target location; in response to a detected sharing instruction, the generated target cover and the data associated with the target audio can be shared to a target application; and in response to a detected import-to-audio-track instruction, the data associated with the target audio can also be imported to the audio track interface for the user to continue editing.
It can be understood that the embodiment of the present disclosure does not limit specific operations on the sixth interface, and corresponding operations can be performed based on user instructions to implement different functions.
In a possible design of the present disclosure, in response to an operation on the fifth control 334 on the third interface 33, a jump to the audio processing interface is performed and one or more audio processing function controls are presented. In response to a touch-control operation on one audio processing function control in the one or more audio processing function controls, audio processing corresponding to the one audio processing function control is performed on the target audio to obtain the processed target audio. Then, upon a detection of an export instruction, a jump to the sixth interface is performed, and the processed target audio is displayed on the sixth interface.
Illustratively,
In an implementation, in (a) of
In an implementation, in (a) of
Illustratively, the cover editing may be implemented through the following steps.
S701, display, in response to a touch-control operation on the ninth control on the sixth interface, a first window, where the first window includes a cover import control, one or more preset static cover controls, and one or more preset animation effect controls.
In the embodiment of the present disclosure, when the electronic device presents a ninth control 612 configured to trigger cover editing, the user may give a cover editing instruction through the ninth control 612. For example, when the user's touch-control operation on the ninth control 612 is detected by the electronic device, in response to the touch-control operation, the electronic device may present an interface as shown in (b) of
Referring to (b) of
In an implementation, the cover part includes a custom cover import control and one or more preset static cover controls. The cover import control is configured to trigger an import of a local picture, and the one or more preset static cover controls are configured to trigger selection of a preset static cover. It can be understood that the static cover is one of a plurality of pictures preset in a target application of the electronic device, for example, cover 1, cover 2 and cover 3.
In an implementation, the animation part includes an animation-unwanted control and one or more preset animation effect controls. The animation-unwanted control is configured to trigger selection of no animation, that is, a cover generated by the electronic device has no animation effect. The one or more preset animation effect controls are configured to trigger selection of preset animation effects. It can be understood that the animation effects are a variety of dynamic changes preset in a target application of the electronic device. For example, the animation effects may include animation 1, animation 2 and animation 3.
S702, acquire a target cover in response to a control selection operation on the first window, where the target cover is a static cover or a dynamic cover.
In the embodiment, the user can select various controls presented on the sixth interface according to actual needs. For example, when the user touch-controls the custom cover import control, the electronic device can use a locally imported photo as a static cover of the audio, and when the user selects the animation-unwanted control from the animation part, the generated target cover is a static cover.
As another example, when the user selects a cover from the cover part and an animation from the animation part, respectively, a dynamic cover can be generated. Specifically, in the embodiment of the present disclosure, if the target cover is a dynamic cover, S702 may be implemented through the following steps.
B1, acquire, in response to the control selection operation on the first window, a static cover and an animation effect.
B2, generate, according to an audio characteristic of the processed target audio and the static cover and the animation effect, a dynamic cover that changes with the audio characteristic of the processed target audio, where the audio characteristic includes audio tempo and/or volume.
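Purely as an illustrative sketch of B2, the audio characteristics named above, i.e., tempo and per-frame volume, can be extracted and mapped to an animation parameter such as the cover's zoom level; librosa is an assumed dependency here, and the mapping to a scale factor is illustrative rather than the disclosed design:

```python
# Sketch of deriving tempo and per-frame volume to drive a dynamic cover;
# librosa and the volume-to-zoom mapping are illustrative assumptions.
import librosa

def cover_scale_track(path: str):
    y, sr = librosa.load(path, mono=True)
    tempo, _beats = librosa.beat.beat_track(y=y, sr=sr)  # global tempo (BPM)
    rms = librosa.feature.rms(y=y)[0]                    # per-frame volume
    if rms.max() > 0:
        rms = rms / rms.max()
    scale = 1.0 + 0.1 * rms  # cover zooms by up to 10% with the volume
    return tempo, scale      # a renderer could key animation frames to these
```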
In an embodiment, the electronic device may detect the user's control selection operation. For example, as shown in (b) of
It can be understood that, in the embodiment of the present disclosure, when the user clicks the eighth control 611 below the dynamic cover 620 in (c) of
In an implementation, when the final audio processing and editing operations are completed, the electronic device can also export the generated target cover and the data associated with the target audio in response to the user's operation. In an implementation, an export to an album or a file is supported. The cover can be replaced at the time of the export to the album, and after the export is completed, the user can choose to finish or to share to a target application.
In addition, the user can also choose to share to a file. At this time, a compressed package containing the audio is automatically generated to facilitate the user's sending it elsewhere for further editing.
In an embodiment of the present disclosure, after the above S702, the audio processing method may further include the following step.
S703, export, in response to an export instruction on a sixth interface, data associated with the processed target audio, to a target location, where the target location includes an album or a file system.
In the embodiment, the export instruction may be a voice, a touch-control operation on an export control, and the like.
For example, when a voice recognition function on the sixth interface is enabled, the user can give the export instruction by voice.
For another example, as shown in (a) and (c) of
In an embodiment of the present disclosure, after the above S702, the audio processing method may further include the following step.
S704, share, in response to a sharing instruction on the sixth interface, the data associated with the processed target audio, to a target application.
Illustratively, in the embodiment, the sharing instruction may be a voice, a touch-control operation on a share control, and the like. For example, when the voice recognition function on the sixth interface is enabled, the user can give a sharing instruction by voice.
For another example, as shown in (a) and (c) of
It can be understood that, in the embodiment of the present disclosure, the above-mentioned data associated with the processed target audio includes at least one of the following:
It can be understood that in the embodiment, the data associated with the processed target audio may be materials such as audio clips and audio data (for example, vocal, an accompaniment, etc.) at various stages of audio processing, or materials such as the static cover and the dynamic cover of the target audio, or compression packages, material packages, etc. that are compressed from multiple pieces of audio data. The embodiment does not limit the specific forms of the data associated with the processed target audio.
Illustratively, the electronic device may share and/or export various data associated with the processed target audio. For example, based on the user's instruction(s), the electronic device may export and/or share the generated data associated with the processed target audio, and may also export and/or share the audio-processed target audio (the vocal or an accompaniment, etc.), and may also export and/or share a generated target cover (a static cover or a dynamic cover) together with the target audio. There is no limitation in this regard in the embodiment.
On the basis of the foregoing embodiments, the audio processing method may further include the following steps.
S801, perform, in response to a detection of a touch-control operation on the accompaniment extraction control, audio extraction on the to-be-processed audio to obtain the target audio.
As an example, the electronic device may process the to-be-processed audio to obtain the target audio.
As another example, the electronic device may also upload the to-be-processed audio to a cloud, so as to invoke a remote extraction service to extract the target audio from the to-be-processed audio. In an implementation,
Specifically, as shown in
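Purely as an illustrative sketch of the client side of such a remote extraction call: the endpoint URL, field names, and response format below are hypothetical, since the disclosure does not define the service interface:

```python
# Sketch of uploading to-be-processed audio to a cloud extraction service.
# The URL, form fields, and byte-stream response are hypothetical.
import requests

def extract_remotely(path: str, mode: str = "accompaniment_extraction") -> bytes:
    with open(path, "rb") as f:
        resp = requests.post(
            "https://example.com/api/audio/extract",  # hypothetical endpoint
            files={"audio": f},
            data={"mode": mode},  # e.g. vocal removal / accompaniment removal
            timeout=60,
        )
    resp.raise_for_status()
    return resp.content  # extracted target audio returned as raw bytes
```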
It can be understood that, after obtaining the target audio, the electronic device may execute different processes in response to the user's touch-control operations on different controls.
As an example, after S801, the audio processing method may include the following steps.
S802, export, in response to a detection of a touch-control operation on an export-to-audio-track control, the target audio to the audio track interface for subsequent editing, to obtain the processed target audio.
S803, save, in response to a detection of a touch-control operation on the save control, the data associated with the processed target audio to a file system or an album.
Illustratively, for a generated audio file, in order to facilitate subsequent editing on other devices, the processed target audio and its related data can be compressed into a file in the form of a compressed package for unified processing and storage.
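Purely as an illustrative sketch, such a compressed package can be produced with a standard archive library; the file names below are assumptions:

```python
# Sketch of bundling the processed target audio and its related data into
# one compressed package for editing on other devices; names are assumed.
import zipfile

def pack_project(package_path: str, file_paths: list[str]) -> None:
    with zipfile.ZipFile(package_path, "w", compression=zipfile.ZIP_DEFLATED) as z:
        for path in file_paths:
            z.write(path)  # audio clips, covers, project data, etc.

# e.g. pack_project("project.zip", ["target_vocal.wav", "cover.png"])
```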
In an embodiment, when the data associated with the processed target audio is saved to the album, replacing the cover of a file such as the target audio, or adding a cover thereto by default, may be supported, so as to improve the user's aesthetic experience when enjoying the target audio.
As another example, after S801, the audio processing method may include the following step.
S804, save the data associated with the target audio in response to a detection of a touch-control operation on the save control.
Illustratively, the data associated with the target audio may be saved to a file system or an album.
In an implementation, in the above S803 and S804, for a manner of saving the data associated with the target audio, reference may be made to the following
For the specific implementation of each step in the embodiment, reference may be made to the descriptions in the foregoing embodiments, and details are not repeated here.
From the contents recorded in the above-mentioned embodiments, it can be seen that the audio processing method provided by the embodiments of the present disclosure enables the result of the accompaniment extraction to be open and output to users, meeting the diverse needs of the users. The method provides a jump from the accompaniment extraction function to the audio track processing interface, which not only shortens the interface jump path, but also provides the possibility to continue editing and creation on the results of the accompaniment extraction. The method further provides a new way of saving, that is, supporting saving to a file and saving to an album, and supports changing the cover of the file, thus improving the intelligence of the application program to which the audio processing method is applicable and improving the user experience.
The following are apparatus embodiments of the present application, which can be used to implement the method embodiments of the present application. For details not disclosed in the apparatus embodiments of the present application, please refer to the method embodiments of the present application.
In an optional embodiment of the present disclosure, the acquiring module 1101 is specifically configured to acquire, in response to a touch-control operation on a first control on a first interface, the to-be-processed audio, where the first control is configured to trigger loading of audio.
In an optional embodiment of the present disclosure, the processing module 1102 is specifically configured to perform, in response to a touch-control operation on a second control on a second interface, the audio extraction on the to-be-processed audio, to obtain the target audio, where the second control is configured to trigger the audio extraction.
In an optional embodiment of the present disclosure, the presenting module 1103 is specifically configured to display, on a third interface, an audio graphic corresponding to the target audio and/or a third control associated with the target audio, where the third control is configured to trigger playing of the target audio.
In an optional embodiment of the present disclosure, the presenting module 1103 is specifically configured to display, on a third interface, a fourth control associated with the target audio, where the fourth control is configured to trigger an export of data associated with the target audio to a target location, and the target location includes an album or a file system.
In an optional embodiment of the present disclosure, the presenting module 1103 is specifically configured to display, on a third interface, a fifth control associated with the target audio, where the fifth control is configured to trigger audio editing of the target audio.
In an optional embodiment of the present disclosure, the presenting module 1103 is further configured to present, in response to an audio processing instruction, one or more audio processing function controls, where the one or more audio processing function controls are configured to trigger execution of corresponding audio processing functions; and
In an optional embodiment of the present disclosure, the presenting module 1103 is specifically configured to present, in response to a touch-control operation on a sixth control on a fourth interface, the one or more audio processing function controls or a seventh control associated with the one or more audio processing function controls, where the seventh control is configured to trigger presentation of the one or more audio processing function controls on a fifth interface.
In an optional embodiment of the present disclosure, the presenting module 1103 is specifically configured to present, in response to a sliding operation on a fourth interface, the one or more audio processing function controls or a seventh control associated with the one or more audio processing function controls, where the seventh control is configured to trigger presentation of the one or more audio processing function controls on a fifth interface.
In an optional embodiment of the present disclosure, the audio processing function controls include:
In an optional embodiment of the present disclosure, the presenting module 1103 is further configured to display the processed target audio on a sixth interface, where the sixth interface includes an eighth control, and the eighth control is configured to trigger playing of the processed target audio.
In an optional embodiment of the present disclosure, the sixth interface further includes a ninth control, and the presenting module 1103 is further configured to display, in response to a touch-control operation on the ninth control on the sixth interface, a first window, where the first window includes a cover import control, one or more preset static cover controls, and one or more preset animation effect controls; and
In an optional embodiment of the present disclosure, if the target cover is the dynamic cover, the processing module 1102 is specifically configured to:
In an optional embodiment of the present disclosure, the processing module 1102 is further configured to export, in response to an export instruction on a sixth interface, data associated with the processed target audio, to a target location, where the target location includes an album or a file system.
In an optional embodiment of the present disclosure, the processing module 1102 is further configured to share, in response to a sharing instruction on a sixth interface, data associated with the processed target audio, to a target application.
In an optional embodiment of the present disclosure, the data associated with the processed target audio includes at least one of the following:
The audio processing apparatus provided in the embodiment can be used to implement the technical solutions of the above method embodiments, and their implementation principles and technical effects are similar, and are not described in detail here in the embodiment.
As shown in
Generally, the following apparatuses may be connected to the I/O interface 1205: an input apparatus 1206 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; an output apparatus 1207 including, for example, a liquid crystal display (LCD for short), a speaker, a vibrator, etc.; a storage apparatus 1208 including, for example, a magnetic tape, a hard disk, etc.; and a communication apparatus 1209. The communication apparatus 1209 may allow the electronic device 1200 to perform wireless or wired communication with other devices to exchange data. Although
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a computer-readable medium, where the computer program includes program codes for executing the methods shown in the flowcharts. In such an embodiment, the computer program may be downloaded from a network and installed via the communication apparatus 1209, or installed from the storage apparatus 1208, or installed from the ROM 1202. When the computer program is executed by the processing apparatus 1201, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are performed.
It should be noted that the above-mentioned computer-readable medium in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination thereof. The computer-readable storage medium may be, for example, but not limited to, an electrical, a magnetic, an optical, an electromagnetic, an infrared, or a semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to, an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM for short, or a flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM for short), an optical storage device, a magnetic storage device, or any suitable combination thereof. In the present disclosure, the computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. In the present disclosure, the computer-readable signal medium may include a data signal propagated in a baseband or as part of a carrier wave carrying computer-readable program codes therein. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination thereof. The computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium, and the computer-readable signal medium can transmit, propagate, or transport the program used by or in conjunction with the instruction execution system, apparatus, or device. The program codes contained on the computer-readable medium may be transmitted through any appropriate medium, including but not limited to: an electric wire, an optical fiber cable, an RF (radio frequency), etc., or any suitable combination thereof.
The above-mentioned computer-readable medium may be included in the above-mentioned electronic device, or may exist independently without being incorporated into the electronic device.
The above-mentioned computer-readable medium carries one or more programs, and when the above-mentioned one or more programs are executed by the electronic device, the electronic device is caused to implement the methods shown in the above-mentioned embodiments.
The computer program codes for carrying out the operations of the present disclosure may be written in one or more programming languages, or a combination thereof, where the above programming languages include an object-oriented programming language, such as Java, Smalltalk, and C++, as well as a conventional procedural programming language, such as “C” language or similar programming languages. The program codes may be executed entirely on a user computer, executed partly on a user computer, executed as a stand-alone software package, executed partly on a user computer and partly on a remote computer, or executed entirely on a remote computer or a server. In a case involving the remote computer, the remote computer may be connected to the user computer through any kind of network, including a local area network (LAN for short) or a wide area network (WAN for short), or may be connected to an external computer (e.g., connected via the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of the systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of codes that includes one or more executable instructions for implementing a specified logical function. It should also be noted that, in some alternative implementations, functions indicated in the blocks may occur in an order different from that indicated in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, or a combination of blocks in the block diagrams and/or flowcharts, may be implemented in a special purpose hardware-based system that performs a specified function or operation, or may be implemented in a combination of special purpose hardware and computer instructions.
The apparatuses or modules involved in the embodiments described in the present disclosure may be implemented by means of software or by means of hardware. Names of the apparatuses do not constitute a limitation on the apparatuses or modules per se under certain circumstances.
The functions described herein above may be performed at least in part by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a field programmable gate array (FPGA for short), an application specific integrated circuit (ASIC for short), an application specific standard product (ASSP for short), a system on chip (SOC for short), a complex programmable logic device (CPLD for short), etc.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, a magnetic, an optical, an electromagnetic, an infrared, or a semiconductor system, apparatus, or device, or any suitable combination thereof. More specific examples of the machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read only memory (ROM), an erasable programmable read only memory (EPROM or a flash memory), fiber optics, a portable compact disk read only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In a first aspect, according to one or more embodiments of the present disclosure, an audio processing method is provided, including:
According to one or more embodiments of the present disclosure, the acquiring, in response to the audio acquisition instruction, the to-be-processed audio includes:
According to one or more embodiments of the present disclosure, the performing, in response to the audio extraction instruction for the to-be-processed audio, the audio extraction on the to-be-processed audio, to obtain the target audio includes:
According to one or more embodiments of the present disclosure, the presenting the target audio includes:
According to one or more embodiments of the present disclosure, the presenting the target audio includes:
According to one or more embodiments of the present disclosure, the presenting the target audio includes:
According to one or more embodiments of the present disclosure, the audio editing of the target audio includes:
According to one or more embodiments of the present disclosure, the presenting, in response to the audio processing instruction, the one or more audio processing function controls includes:
According to one or more embodiments of the present disclosure, the presenting, in response to the audio processing instruction, the one or more audio processing function controls includes:
According to one or more embodiments of the present disclosure, the audio processing function controls include:
According to one or more embodiments of the present disclosure, the method further includes: displaying the processed target audio on a sixth interface, where the sixth interface includes an eighth control, and the eighth control is configured to trigger playing of the processed target audio.
According to one or more embodiments of the present disclosure, the sixth interface further includes a ninth control, and the method further includes:
According to one or more embodiments of the present disclosure, if the target cover is the dynamic cover, the acquiring, in response to the control selection operation on the first window, the target cover includes:
According to one or more embodiments of the present disclosure, the method further includes:
According to one or more embodiments of the present disclosure, the method further includes:
According to one or more embodiments of the present disclosure, the data associated with the processed target audio includes at least one of the following:
In a second aspect, according to one or more embodiments of the present disclosure, an audio processing device is provided, including:
According to one or more embodiments of the present disclosure, the acquiring module is specifically configured to acquire, in response to a touch-control operation on a first control on a first interface, the to-be-processed audio, where the first control is configured to trigger loading of audio.
According to one or more embodiments of the present disclosure, the processing module is specifically configured to perform, in response to a touch-control operation on a second control on a second interface, the audio extraction on the to-be-processed audio, to obtain the target audio, where the second control is configured to trigger the audio extraction.
According to one or more embodiments of the present disclosure, the presenting module is specifically configured to display, on a third interface, an audio graphic corresponding to the target audio and/or a third control associated with the target audio, where the third control is configured to trigger playing of the target audio.
According to one or more embodiments of the present disclosure, the presenting module is specifically configured to display, on a third interface, a fourth control associated with the target audio, where the fourth control is configured to trigger an export of data associated with the target audio to a target location, and the target location includes an album or a file system.
According to one or more embodiments of the present disclosure, the presenting module is specifically configured to display, on a third interface, a fifth control associated with the target audio, where the fifth control is configured to trigger audio editing of the target audio.
According to one or more embodiments of the present disclosure, the presenting module is further configured to present, in response to an audio processing instruction, one or more audio processing function controls, where the one or more audio processing function controls are configured to trigger execution of corresponding audio processing functions; and
According to one or more embodiments of the present disclosure, the presenting module is specifically configured to present, in response to a touch-control operation on a sixth control on a fourth interface, the one or more audio processing function controls or a seventh control associated with the one or more audio processing function controls, where the seventh control is configured to trigger presentation of the one or more audio processing function controls on a fifth interface.
According to one or more embodiments of the present disclosure, the presenting module is specifically configured to present, in response to a sliding operation on a fourth interface, the one or more audio processing function controls or a seventh control associated with the one or more audio processing function controls, where the seventh control is configured to trigger presentation of the one or more audio processing function controls on a fifth interface.
According to one or more embodiments of the present disclosure, the audio processing function controls include:
According to one or more embodiments of the present disclosure, the presenting module is further configured to display the processed target audio on a sixth interface, where the sixth interface includes an eighth control, and the eighth control is configured to trigger playing of the processed target audio.
According to one or more embodiments of the present disclosure, the sixth interface further includes a ninth control, and the presenting module is further configured to display, in response to a touch-control operation on the ninth control on the sixth interface, a first window, where the first window includes a cover import control, one or more preset static cover controls, and one or more preset animation effect controls; and
According to one or more embodiments of the present disclosure, if the target cover is the dynamic cover, the processing module is specifically configured to:
According to one or more embodiments of the present disclosure, the processing module is further configured to export, in response to an export instruction on a sixth interface, data associated with the processed target audio, to a target location, where the target location includes an album or a file system.
According to one or more embodiments of the present disclosure, the processing module is further configured to share, in response to a sharing instruction on a sixth interface, data associated with the processed target audio, to a target application.
According to one or more embodiments of the present disclosure, the data associated with the processed target audio includes at least one of the following:
In a third aspect, according to one or more embodiments of the present disclosure, an electronic device is provided, including: at least one processor and a memory;
In a fourth aspect, according to one or more embodiments of the present disclosure, a computer-readable storage medium is provided, where a computer-executable instruction is stored in the computer-readable storage medium, and when the computer-executable instruction is executed by a processor, the audio processing method described in the above first aspect and various possible designs of the first aspect is implemented.
In a fifth aspect, according to one or more embodiments of the present disclosure, a computer program product is provided, including a computer program, where when the computer program is executed by a processor, the audio processing method described in the above first aspect and various possible designs of the first aspect is implemented.
In a sixth aspect, according to one or more embodiments of the present disclosure, a computer program is provided, where when the computer program is executed by a processor, the audio processing method described in the above first aspect and various possible designs of the first aspect is implemented.
The above descriptions are only preferred embodiments of the present disclosure and illustrations of the applied technical principles. Those skilled in the art should understand that the disclosure scope involved in the present disclosure is not limited to a technical solution formed by a specific combination of the above-mentioned technical features, but should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above disclosure concept, for example, technical solutions formed by a mutual replacement between the above features and technical features with similar functions disclosed in (but not limited to) the present disclosure.
In addition, while operations are depicted in a particular order, this should not be understood as requiring that the operations are performed in the particular order shown or performed in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while the above discussion contains several specific implementation details, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment may also be implemented in multiple embodiments separately or in any suitable sub-combination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims.
This application is a National Stage of International Application No. PCT/CN2023/092363, filed on May 5, 2023, which claims priority to Chinese patent application No. 202210495460.0, entitled “AUDIO PROCESSING METHOD, APPARATUS, DEVICE AND STORAGE MEDIUM” and filed with the China National Intellectual Property Administration on May 7, 2022. Both of the above applications are incorporated herein by reference in their entireties.