The present disclosure is based on and claims the benefit of priority to Chinese Patent Application No. 202111373450.1, titled "VIDEO GENERATION METHOD, APPARATUS, DEVICE, AND STORAGE MEDIUM", filed on Nov. 18, 2021, which is hereby incorporated by reference in its entirety into the present disclosure.
The present disclosure relates to the technical field of video generation, and in particular, to a video generation method, apparatus, device, and storage medium.
Currently, an electronic device may generate a video from a set of images selected by a user.
In the related art, an electronic device acquires a set of images selected by a user and music matched with the set of images, and generates a video from the set of images and the music.
In the above-described related art, the video is generated from only the set of images and the music, so that the generated video is poor in richness.
Embodiments of the present disclosure provide a video generation method, apparatus, device, and storage medium, for solving the problem of poor richness of the generated video.
In a first aspect, an embodiment of the present disclosure provides a video generation method, comprising: acquiring a plurality of images and music matched with the plurality of images; determining first feature information of the plurality of images and second feature information of the music; determining a target rendering effect combination according to the first feature information, the second feature information and a pre-stored plurality of rendering effects; wherein the rendering effects are animation, special effect or transition; and generating a video according to the plurality of images, the music and the target rendering effect combination.
Alternatively, the first feature information comprises a first global feature and a first local feature; the second feature information comprises a second global feature and a second local feature; the determining a target rendering effect combination according to the first feature information, the second feature information and a pre-stored plurality of rendering effects, comprises: determining a plurality of candidate effects in the plurality of rendering effects according to the first global feature and the second global feature; determining one or more target effects in the plurality of candidate effects according to the first local feature, and performing combination processing on the one or more target effects to obtain one or more rendering combinations; and determining the target rendering effect combination according to the first local feature, the second local feature and the one or more rendering combinations.
Alternatively, the first global feature comprises a first image emotion, a first image style and a first image scene corresponding to the plurality of images; the second global feature comprises a first music emotion, a first music style and a first music theme; and the determining a plurality of candidate effects in the plurality of rendering effects according to the first global feature and the second global feature, comprises: for each rendering effect, acquiring a first initial score corresponding to each of the first image emotion, the first image style and the first image scene according to an identification of the rendering effect; screening the plurality of rendering effects according to the first initial score to obtain a plurality of intermediate effects; for each intermediate effect, acquiring a second initial score corresponding to each of the first music emotion, the first music style and the first music theme according to an identification of the intermediate effect; and screening the plurality of intermediate effects according to the second initial score to obtain the plurality of candidate effects.
Alternatively, the screening the plurality of rendering effects according to the first initial score to obtain a plurality of intermediate effects, comprises: for each rendering effect, determining a sum of the first initial score corresponding to each of the first image emotion, the first image style and the first image scene, as a first target score corresponding to the rendering effect; and determining rendering effects of which the first target score is greater than or equal to a first threshold in the plurality of rendering effects, as the plurality of intermediate effects.
Alternatively, the screening the plurality of intermediate effects according to the second initial score to obtain the plurality of candidate effects, comprises: for each intermediate effect, determining a sum of the second initial score corresponding to each of the first music emotion, the first music style and the first music theme, as a second target score corresponding to the intermediate effect; and determining intermediate effects of which the second target score is greater than or equal to a second threshold in the plurality of intermediate effects, as the plurality of candidate effects.
Alternatively, the first local feature comprises a second image emotion, a second image style and a second image scene corresponding to each of the plurality of images; the determining one or more target effects in the plurality of candidate effects according to the first local feature, comprises: for each image, determining a third target score corresponding to each of the plurality of candidate effects under a condition of the image, according to the plurality of candidate effects and the second image emotion, the second image style and the second image scene corresponding to the image; and determining the one or more target effects in the plurality of candidate effects according to the third target score corresponding to each of the plurality of candidate effects.
Alternatively, the first local feature comprises a second image emotion, a second image style and a second image scene corresponding to each of the plurality of images; the second local feature comprises a chorus point, a phrase and section point and a beat point of a music fragment corresponding to each of the plurality of images in the music; the determining the target rendering effect combination according to the first local feature, the second local feature and the one or more rendering combinations, comprises: screening the one or more rendering combinations according to a second image emotion, a second image style and a second image scene corresponding to a first image in the plurality of images, and a chorus point, a phrase and section point and a beat point of a music fragment corresponding to the first image, to obtain N initial candidate combinations, where N is an integer greater than or equal to 1; determining M (j−1)th candidate combinations according to the N initial candidate combinations and the one or more rendering combinations, where M is equal to a product of N and a total number of the one or more rendering combinations; screening the M (j−1)th candidate combinations according to a second image emotion, a second image style and a second image scene corresponding to a jth image, and a chorus point, a phrase and section point and a beat point of a music fragment corresponding to the jth image, to determine N jth candidate combinations; taking the N jth candidate combinations as N new initial candidate combinations, adding 1 to j, and repeating the steps until a last image in the plurality of images is reached; and determining a candidate combination corresponding to the last image as the target rendering effect combination; where an initial value of j is 2.
Alternatively, the screening the one or more rendering combinations according to a second image emotion, a second image style and a second image scene corresponding to a first image, and a chorus point, a phrase and section point and a beat point of a music fragment corresponding to the first image, to obtain N initial candidate combinations, comprises: determining a combination score corresponding to each of the one or more rendering combinations according to the second image emotion, the second image style and the second image scene corresponding to the first image, and the chorus point, the phrase and section point and the beat point of the music fragment corresponding to the first image; and determining N rendering combinations of which the combination score is greater than or equal to a fourth threshold in the one or more rendering combinations, as the N initial candidate combinations.
Alternatively, the determining a combination score corresponding to each of the one or more rendering combinations according to the second image emotion, the second image style and the second image scene corresponding to the first image, and the chorus point, the phrase and section point and the beat point of the music fragment corresponding to the first image, comprises: for each rendering combination in the one or more rendering combinations, determining a music matching score according to the chorus point, the phrase and section point and the beat point of the music fragment corresponding to the first image and an identification of each rendering effect in the rendering combination; determining an image matching score according to the second image emotion, the second image style and the second image scene corresponding to the first image and the identification of each rendering effect in the rendering combination; determining an internal combination score corresponding to the first image; and determining the music matching score, the image matching score and the internal combination score as the combination score corresponding to the rendering combination.
Alternatively, the determining first feature information of the plurality of images and second feature information of the music, comprises: performing feature extraction on the plurality of images through a pre-stored image feature extraction model, to obtain the first feature information of the plurality of images; and performing feature extraction on the music through a pre-stored music feature extraction model, to obtain the second feature information.
Alternatively, the target rendering effect combination comprises the animation, special effect and transition corresponding to each of the plurality of images; the generating a video according to the plurality of images, the music and the target rendering effect combination, comprises: sequentially displaying the plurality of images according to the animation, special effect and transition corresponding to each of the plurality of images in the target rendering effect combination, and playing the music, to generate the video.
Alternatively, the acquiring a plurality of images and music matched with the plurality of images, comprises: in response to a selection operation on a plurality of target images in a plurality of candidate images, determining the plurality of target images as the plurality of images; and in response to a selection operation on target music in a plurality of candidate music, determining the target music as the music matched with the plurality of images.
In a second aspect, an embodiment of the present disclosure provides a video generation apparatus, comprising: an acquisition module configured to acquire a plurality of images and music matched with the plurality of images; a first determination module configured to determine first feature information of the plurality of images and second feature information of the music; a second determination module configured to determine a target rendering effect combination according to the first feature information, the second feature information and a pre-stored plurality of rendering effects; the rendering effects being animation, special effect or transition; and a generation module configured to generate a video according to the plurality of images, the music and the target rendering effect combination.
Alternatively, the first feature information comprises a first global feature and a first local feature; the second feature information comprises a second global feature and a second local feature; the second determination module is specifically configured to: determine a plurality of candidate effects in the plurality of rendering effects according to the first global feature and the second global feature; determine one or more target effects in the plurality of candidate effects according to the first local feature, and perform combination processing on the one or more target effects to obtain one or more rendering combinations; and determine the target rendering effect combination according to the first local feature, the second local feature and the one or more rendering combinations.
Alternatively, the first global feature comprises a first image emotion, a first image style and a first image scene corresponding to the plurality of images, and the second global feature comprises a first music emotion, a first music style and a first music theme; the second determination module is specifically configured to: for each rendering effect, acquire a first initial score corresponding to each of the first image emotion, the first image style and the first image scene according to an identification of the rendering effect; screen the plurality of rendering effects according to the first initial score to obtain a plurality of intermediate effects; for each intermediate effect, acquire a second initial score corresponding to each of the first music emotion, the first music style and the first music theme according to an identification of the intermediate effect; and screen the plurality of intermediate effects according to the second initial score to obtain the plurality of candidate effects.
Alternatively, the second determination module is specifically configured to: for each rendering effect, determine a sum of the first initial score corresponding to each of the first image emotion, the first image style and the first image scene, as a first target score corresponding to the rendering effect; and determine rendering effects of which the first target score is greater than or equal to a first threshold in the plurality of rendering effects, as the plurality of intermediate effects.
Alternatively, the second determination module is specifically configured to: for each intermediate effect, determine a sum of the second initial score corresponding to each of the first music emotion, the first music style and the first music theme, as a second target score corresponding to the intermediate effect; and determine intermediate effects of which the second target score is greater than or equal to a second threshold in the plurality of intermediate effects, as the plurality of candidate effects.
Alternatively, the first local feature comprises a second image emotion, a second image style and a second image scene corresponding to each of the plurality of images; the second determination module is specifically configured to: for each image, determine a third target score corresponding to each of the plurality of candidate effects under a condition of the image, according to the plurality of candidate effects and the second image emotion, the second image style and the second image scene corresponding to the image; and determine the one or more target effects in the plurality of candidate effects according to the third target score corresponding to each of the plurality of candidate effects.
Alternatively, the first local feature comprises a second image emotion, a second image style and a second image scene corresponding to each of the plurality of images; the second local feature comprises a chorus point, a phrase and section point and a beat point of a music fragment corresponding to each of the plurality of images in the music; the second determination module is specifically configured to: screen the one or more rendering combinations according to a second image emotion, a second image style and a second image scene corresponding to a first image in the plurality of images, and a chorus point, a phrase and section point and a beat point of a music fragment corresponding to the first image, to obtain N initial candidate combinations, where N is an integer greater than or equal to 1; determine M (j−1)th candidate combinations according to the N initial candidate combinations and the one or more rendering combinations, where M is equal to a product of N and a total number of the one or more rendering combinations; screen the M (j−1)th candidate combinations according to a second image emotion, a second image style and a second image scene corresponding to a jth image, and a chorus point, a phrase and section point and a beat point of a music fragment corresponding to the jth image, to determine N jth candidate combinations; take the N jth candidate combinations as N new initial candidate combinations, add 1 to j, and repeat the steps until a last image in the plurality of images is reached; and determine a candidate combination corresponding to the last image as the target rendering effect combination; where an initial value of j is 2.
Alternatively, the second determination module is specifically configured to: determine a combination score corresponding to each of the one or more rendering combinations according to the second image emotion, the second image style and the second image scene corresponding to the first image, and the chorus point, the phrase and section point and the beat point of the music fragment corresponding to the first image; and determine N rendering combinations of which the combination score is greater than or equal to a fourth threshold in the one or more rendering combinations, as the N initial candidate combinations.
Alternatively, the second determination module is specifically configured to: for each rendering combination in the one or more rendering combinations, determine a music matching score according to the chorus point, the phrase and section point and the beat point of the music fragment corresponding to the first image and an identification of each rendering effect in the rendering combination; determine an image matching score according to the second image emotion, the second image style and the second image scene corresponding to the first image and the identification of each rendering effect in the rendering combination; determine an internal combination score corresponding to the first image; and determine the music matching score, the image matching score and the internal combination score as the combination score corresponding to the rendering combination.
Alternatively, the first determination module is specifically configured to: perform feature extraction on the plurality of images through a pre-stored image feature extraction model, to obtain the first feature information of the plurality of images; and perform feature extraction on the music through a pre-stored music feature extraction model, to obtain the second feature information.
Alternatively, the generation module is specifically configured to: sequentially display the plurality of images according to the animation, special effect and transition corresponding to each of the plurality of images in the target rendering effect combination, and play the music, to generate the video.
Alternatively, the acquisition module is specifically configured to: in response to a selection operation on a plurality of target images in a plurality of candidate images, determine the plurality of target images as the plurality of images; and in response to a selection operation on target music in a plurality of candidate music, determine the target music as the music matched with the plurality of images.
In a third aspect, an embodiment of the present disclosure provides an electronic device, comprising: a processor, and a memory communicatively connected to the processor; wherein the memory stores computer-executable instructions, and the processor executes the computer-executable instructions stored in the memory to implement the method according to any one of the first aspect.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium, having computer-executable instructions stored thereon, which, when executed by a processor, implement the method according to any one of the first aspect.
In a fifth aspect, an embodiment of the present disclosure provides a computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of the first aspect.
In a sixth aspect, an embodiment of the present disclosure provides a computer program which, when executed by a processor, implements the method according to any one of the first aspect.
The embodiments of the present disclosure provide a video generation method, apparatus, device, and storage medium, wherein the video generation method comprises: acquiring a plurality of images and music matched with the plurality of images; determining first feature information of the plurality of images and second feature information of the music; determining a target rendering effect combination according to the first feature information, the second feature information and a pre-stored plurality of rendering effects; the rendering effects being animation, special effect or transition; and generating a video according to the plurality of images, the music and the target rendering effect combination. In the method, the richness of the video is improved by generating the video according to the plurality of images, the music and the target rendering effect combination.
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the present disclosure.
With the above drawings, explicit embodiments of the present disclosure are shown and will be described in more detail hereinafter. The drawings and the description thereof are not intended to limit the scope of the concepts of the present disclosure in any manner, but rather to illustrate the concepts of the present disclosure for those skilled in the art with reference to specific embodiments.
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the drawings. The following description refers to the drawings in which the same numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
First, the technical terms involved in the present disclosure will be explained.
Animation refers to the effect of deformation and displacement applied to an image.
Transition refers to the effect of switching between two images.
Special effect refers to particle effects or light, shade and color changes applied to an image.
The related art is explained next.
In the related art, an electronic device recommends a special effect for the set of images according to music, and then generates a video according to the set of images, the music and the recommended special effect; since only the recommended special effect is applied, the generated video is poor in richness.
In the present disclosure, in order to improve the richness of the video, the inventors found that a target rendering effect combination corresponding to a plurality of images and music may be determined, where the target rendering effect combination may include one or more rendering effects among animation, transition and special effect; a video is then generated according to the plurality of images, the music and the target rendering effect combination, so that the video carries the one or more rendering effects and the richness of the video is improved.
The application scenario involved in the present disclosure is described below with reference to
The plurality of images are matched with the music.
The target rendering effect combination includes rendering effects. The target rendering effect combination is determined according to first feature information of the plurality of images, second feature information of the music and a plurality of rendering effects.
The video is generated according to the plurality of images, the music, and the target rendering effect combination.
In the present disclosure, the target rendering effect combination is determined according to the first feature information of the plurality of images, the second feature information of the music and the plurality of rendering effects, and then the video is generated according to the plurality of images, the music and the target rendering effect combination, so that the video contains the rendering effects in the target rendering effect combination, and the richness of the video is improved.
The following describes in detail the technical solution of the present disclosure and how the technical solution of the present disclosure solves the above technical problem by specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. The embodiments of the present disclosure will be described below with reference to the drawings.
S201, acquiring a plurality of images and music matched with the plurality of images.
Alternatively, an execution subject of the embodiments of the present disclosure may be an electronic device, or a video generation apparatus provided in the electronic device, where the video generation apparatus may be implemented by software and/or hardware.
The electronic device may be a Personal Digital Assistant (PDA), a user device or user equipment, a tablet computer, a desktop computer, a video camera, a video recorder, or the like.
Alternatively, the plurality of images and the music matched with the plurality of images may be acquired in either of the following two ways.
Way 1, in response to a selection operation on a plurality of target images in a plurality of candidate images, determining the plurality of target images as the plurality of images; and in response to a selection operation on target music in a plurality of candidate music, determining the target music as the music matched with the plurality of images.
Way 2, in response to a selection operation on a plurality of target images in a plurality of candidate images, determining the plurality of target images as the plurality of images, and processing the plurality of images and a plurality of candidate music by a preset music matching model, to obtain the music matched with the plurality of images.
Alternatively, the plurality of candidate images may be images pre-stored in the electronic device, and the plurality of candidate music may be music pre-stored in the electronic device and/or a preset server.
It should be noted that each image corresponds to a music fragment of the music, and the music fragments corresponding to the plurality of images together constitute the music.
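As a non-limiting illustration, this correspondence between images and music fragments can be sketched in a few lines of Python; the even division by duration below is an assumption for illustration, not the disclosed matching method.

    # Hypothetical sketch: divide a piece of music evenly into one
    # fragment per image; each fragment is a (start, end) pair in seconds.
    def split_music(duration_s: float, num_images: int) -> list:
        step = duration_s / num_images
        return [(i * step, (i + 1) * step) for i in range(num_images)]

    # Example: a 30-second track and 5 images yield 5 fragments of 6 s each.
    print(split_music(30.0, 5))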
S202, determining first feature information of the plurality of images and second feature information of the music.
Alternatively, the first feature information of the plurality of images and the second feature information of the music may be determined in either of the following two ways.
Way 1, performing feature extraction on the plurality of images and the music through a pre-trained feature extraction model, to obtain the first feature information of the plurality of images and the second feature information of the music.
Alternatively, the pre-trained feature extraction model is obtained by training with a plurality of pieces of sample data. Each piece of sample data includes one or more sample images and sample music matched with the one or more sample images.
Way 2, performing feature extraction on the plurality of images through a pre-stored image feature extraction model, to obtain the first feature information of the plurality of images; and performing feature extraction on the music through a pre-stored music feature extraction model to obtain the second feature information.
The first feature information includes a first global feature and a first local feature.
The first global feature is a comprehensive feature of all of the plurality of images, and the first local feature is a feature of each of the plurality of images.
The first global feature includes one or more of an image emotion label, an image style label or an image scene label corresponding to the plurality of images.
The image emotion label includes a first image emotion. For example, the first image emotion includes image emotions Tm1, Tm2, Tm3, Tm4, and the like.
The image style label includes a first image style. For example, the first image style includes image styles Tf1, Tf3, and the like.
The image scene label includes a first image scene. For example, the first image scene includes image scenes Tt1, Tt2, Tt3, and the like.
The first local feature includes one or more of an image emotion label, an image style label and an image scene label corresponding to each of the plurality of images.
The image emotion label includes a second image emotion. The image style label includes a second image style. The image scene label includes a second image scene.
Alternatively, the first image emotion may be the same as or different from the second image emotion, the first image style may be the same as or different from the second image style, and the first image scene may be the same as or different from the second image scene.
The second feature information includes a second global feature and a second local feature.
The second global feature includes a music emotion label, a music style label and a music theme label of the music.
The music emotion label includes a first music emotion. For example, the first music emotion includes music emotions Me1, Me2, and the like.
The music style label includes a first music style. For example, the first music style includes music styles Mf1, Mf2, and the like.
The music theme label includes a first music theme. For example, the first music theme includes music themes Mt1, Mt2, Mt3, and the like.
The second local feature includes a chorus point, a phrase and section point and a beat point of a music fragment corresponding to each of the plurality of images in the music.
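For illustration only, the first global feature and the second local feature may be represented by simple data structures such as those in the following Python sketch; the field names are assumptions, not the disclosed storage format.

    from dataclasses import dataclass

    @dataclass
    class FirstGlobalFeature:
        image_emotions: list   # e.g. ["Tm1", "Tm2", "Tm3", "Tm4"]
        image_styles: list     # e.g. ["Tf1", "Tf3"]
        image_scenes: list     # e.g. ["Tt1", "Tt2", "Tt3"]

    @dataclass
    class SecondLocalFeature:
        chorus_point: float          # chorus point of the music fragment
        phrase_section_point: float  # phrase and section point
        beat_point: float            # beat point

    # Usage with the example labels given above.
    g = FirstGlobalFeature(["Tm1", "Tm2"], ["Tf1"], ["Tt1", "Tt2"])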
S203, determining a target rendering effect combination according to the first feature information, the second feature information and a pre-stored plurality of rendering effects.
The pre-stored plurality of rendering effects may be stored in a preset animation-transition-special effect database.
The rendering effect can be any one of animation, special effect or transition.
The plurality of rendering effects may include a plurality of different animations, a plurality of different special effects, and a plurality of different transitions.
Each rendering effect has a respective attribute. For example, attributes include effect direction, visual impact, and the like.
Alternatively, the target rendering effect combination may include X rendering effects corresponding to each of the plurality of images, may also include X rendering effects corresponding to each of some of the plurality of images, and may also include one or more of an identification, a name, a type, and the like corresponding to each of the X rendering effects.
Alternatively, the X rendering effects may include one or more of animation, special effect, or transition.
Specifically, please refer to the embodiment in
S204, generating a video according to the plurality of images, the music and the target rendering effect combination.
In response to the target rendering effect combination comprising the animation, special effect and transition corresponding to each of the plurality of images, the plurality of images are displayed sequentially according to the animation, special effect and transition corresponding to each of the plurality of images in the target rendering effect combination, and the music is played, to generate the video.
It should be noted that, in response to the target rendering effect combination comprising the animation, special effect and transition corresponding to each of only some of the plurality of images, in the process of sequentially displaying the plurality of images, each of those images is displayed according to its corresponding animation, special effect and transition in the target rendering effect combination.
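A minimal sketch of this assembly step is given below; the timeline format and the helper name build_timeline are hypothetical, and the actual rendering (applying the effects and encoding the video) is outside the sketch.

    # Hypothetical sketch: pair each image with its music fragment and its
    # rendering effects; images without effects simply get None entries.
    def build_timeline(images, effects_by_image, fragments):
        timeline = []
        for img, (start, end) in zip(images, fragments):
            fx = effects_by_image.get(img, {})
            timeline.append({
                "image": img,
                "start": start, "end": end,
                "animation": fx.get("animation"),
                "special_effect": fx.get("special_effect"),
                "transition": fx.get("transition"),
            })
        return timeline

    # Usage: only img1 has effects; img2 is displayed without them.
    print(build_timeline(["img1", "img2"],
                         {"img1": {"animation": "anim2", "transition": "trans4",
                                   "special_effect": "fx6"}},
                         [(0.0, 6.0), (6.0, 12.0)]))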
In the video generation method according to the embodiment of
In the present disclosure, after the plurality of images and the music matched with the plurality of images are acquired, the electronic device can automatically execute the video generation method according to the embodiments of the present disclosure, so that the video generation time is reduced and the video generation efficiency is improved. Moreover, the plurality of images and the music matched with the plurality of images can be selected by the user, so that the video generation method according to the embodiments of the present disclosure remains operable by the user, and the user experience is improved.
On the basis of the embodiment of
S301, determining a plurality of candidate effects in the plurality of rendering effects according to the first global feature and the second global feature.
For example, in response to the first global feature comprising an image emotion label, an image style label and an image scene label (the image emotion label including a first image emotion, the image style label including a first image style, and the image scene label including a first image scene), and the second global feature comprising a music emotion label, a music style label and a music theme label (the music emotion label including a first music emotion, the music style label including a first music style, and the music theme label including a first music theme), S301 may specifically include:
Alternatively, according to the identification of each rendering effect, the first initial score corresponding to the first image emotion is acquired from a first preset list, the first initial score corresponding to the first image style is acquired from a second preset list, and the first initial score corresponding to the first image scene is acquired from a third preset list. The first preset list is a list corresponding to the image emotion label, the second preset list is a list corresponding to the image style label, and the third preset list is a list corresponding to the image scene label. For example, the first preset list is shown in Table 1 below, the second preset list is shown in Table 2 below, and the third preset list is shown in Table 3 below.
In Tables 1-3, A11-A62, B11-B62, and C11-C62 are the first initial scores.
For example, in response to the first image emotion including Tm1 and Tm2, the first image style including Tf1, and the first image scene including Tt1 and Tt2: for the rendering effect identified as 1, according to the identification 1, a first initial score A11 corresponding to Tm1 and a first initial score A12 corresponding to Tm2 are acquired from the first preset list, a first initial score B11 corresponding to Tf1 is acquired from the second preset list, and a first initial score C11 corresponding to Tt1 and a first initial score C12 corresponding to Tt2 are acquired from the third preset list.
The screening the plurality of rendering effects according to the first initial score to obtain a plurality of intermediate effects, includes: for each rendering effect, determining a sum of the first initial score corresponding to the first image emotion, the first initial score corresponding to the first image style and the first initial score corresponding to the first image scene, as a first target score corresponding to the rendering effect; and determining rendering effects of which the first target score is greater than or equal to a first threshold in the plurality of rendering effects, as the plurality of intermediate effects.
For example, for the rendering effect identified as 1, a sum of A11, A12, B11, C11, and C12 is determined as the first target score corresponding to the rendering effect identified as 1.
Further, according to the identification of the intermediate effect, a second initial score corresponding to the first music emotion may be acquired from a fourth preset list, a second initial score corresponding to the first music style may be acquired from a fifth preset list, and a second initial score corresponding to the first music theme may be acquired from a sixth preset list. The fourth preset list is a list corresponding to the music emotion label, the fifth preset list is a list corresponding to the music style label, and the sixth preset list is a list corresponding to the music theme label. For example, the fourth preset list is shown in Table 4 below, the fifth preset list is shown in Table 5 below, and the sixth preset list is shown in Table 6 below.
In Tables 4-6, D11-D62, E11-E62, and F11-F62 are the second initial scores.
For example, in response to the first music emotion comprising Me1, the first music style comprising Mf1 and Mf2, and the first music theme comprising Mt1: for an intermediate effect identified as 4, according to the identification 4, a second initial score D41 corresponding to Me1 is acquired from the fourth preset list, a second initial score E41 corresponding to Mf1 and a second initial score E42 corresponding to Mf2 are acquired from the fifth preset list, and a second initial score F41 corresponding to Mt1 is acquired from the sixth preset list.
Alternatively, the screening the plurality of intermediate effects according to the second initial score to obtain the plurality of candidate effects, includes: for each intermediate effect, determining a sum of the second initial score corresponding to the first music emotion, the second initial score corresponding to the first music style and the second initial score corresponding to the first music theme, as a second target score corresponding to the intermediate effect; and determining intermediate effects of which the second target score is greater than or equal to a second threshold in the plurality of intermediate effects, as the plurality of candidate effects.
For example, for the intermediate effect identified as 4, a sum of D41, E41, E42 and F41 is determined as the second target score corresponding to the intermediate effect identified as 4.
Alternatively, the first threshold and the second threshold may be the same or different.
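The two screening stages of S301 can be illustrated with the following Python sketch; the score tables and thresholds are placeholders standing in for Tables 1 to 6 and the first and second thresholds.

    # Placeholder score tables keyed by (effect identification, label).
    image_scores = {(1, "Tm1"): 0.6, (1, "Tm2"): 0.2, (1, "Tf1"): 0.5,
                    (1, "Tt1"): 0.4, (1, "Tt2"): 0.3,
                    (2, "Tm1"): 0.1, (2, "Tf1"): 0.2, (2, "Tt1"): 0.1}
    music_scores = {(1, "Me1"): 0.7, (1, "Mf1"): 0.4, (1, "Mt1"): 0.5,
                    (2, "Me1"): 0.2, (2, "Mf1"): 0.1, (2, "Mt1"): 0.1}

    def screen(effect_ids, labels, table, threshold):
        """Keep effects whose summed label scores reach the threshold."""
        return [eid for eid in effect_ids
                if sum(table.get((eid, lab), 0.0) for lab in labels) >= threshold]

    image_labels = ["Tm1", "Tm2", "Tf1", "Tt1", "Tt2"]  # first global feature
    music_labels = ["Me1", "Mf1", "Mt1"]                # second global feature
    intermediate = screen([1, 2], image_labels, image_scores, threshold=1.0)
    candidates = screen(intermediate, music_labels, music_scores, threshold=1.0)
    print(candidates)  # effects surviving both screening stages: [1]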
S302, determining one or more target effects in the plurality of candidate effects according to the first local feature.
Alternatively, the first local feature includes a second image emotion, a second image style and a second image scene corresponding to each of the plurality of images; and the determining one or more target effects in the plurality of candidate effects according to the first local feature, includes: for each image, determining a third target score corresponding to each of the plurality of candidate effects under a condition of the image, according to the plurality of candidate effects and the second image emotion, the second image style and the second image scene corresponding to the image; and determining the one or more target effects in the plurality of candidate effects according to the third target score corresponding to each of the plurality of candidate effects.
Alternatively, a way to obtain the third target score will be described below by taking as an example that the plurality of images include image 1 and image 2, where the second image emotion of the image 1 includes Tm1, the second image style thereof includes Tf1, and the second image scene thereof includes Tt1; and the second image emotion of the image 2 includes Tm2, the second image style thereof includes Tf1, and the second image scene thereof includes Tt1 and Tt2.
Under the condition of the image 1:
For the candidate effect identified as 1, a first initial score A11 corresponding to Tm1 is acquired from the first preset list, a first initial score B11 corresponding to Tf1 is acquired from the second preset list, a first initial score C11 corresponding to Tt1 is acquired from the third preset list, and a sum of A11, B11 and C11 is determined as a third target score corresponding to the candidate effect identified as 1 under the condition of the image 1.
Under the condition of the image 2: for the candidate effect identified as 1, a first initial score A12 corresponding to Tm2 is acquired from the first preset list, a first initial score B11 corresponding to Tf1 is acquired from the second preset list, first initial scores C11 and C12 corresponding to Tt1 and Tt2 are acquired from the third preset list, and a sum of A12, B11, C11 and C12 is determined as a third target score corresponding to the candidate effect identified as 1 under the condition of the image 2.
Alternatively, the one or more target effects may be determined in the plurality of candidate effects in any of the following three ways.
Way 1, determining candidate effects of which the third target score is greater than or equal to a third threshold in the plurality of candidate effects, as the one or more target effects.
For example, in the plurality of candidate effects, a candidate effect of which the third target score under the condition of the image 1 is greater than or equal to the third threshold, and a candidate effect of which the third target score under the condition of the image 2 is greater than or equal to the third threshold, are determined as the one or more target effects.
Way 2, for each candidate effect, determining a sum of the third target score corresponding to the candidate effect under the condition of the image 1 and the third target score corresponding to the candidate effect under the condition of the image 2, as a total score corresponding to the candidate effect; and determining candidate effects of which the total score is greater than or equal to a fifth threshold in the plurality of candidate effects as the one or more target effects.
Way 3, in the plurality of candidate effects, determining candidate effects of which the third target score under the condition of the image 1 is greater than or equal to the third threshold, as a first target effect; determining candidate effects of which the third target score under the condition of the image 2 is greater than or equal to the third threshold, as a second target effect; and taking the first target effect and the second target effect together as the one or more target effects.
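The following Python sketch illustrates Way 1 and Way 2 under assumed third target scores; the scores and the threshold values are placeholders.

    # scores[effect_id][image] is the third target score of a candidate
    # effect under the condition of that image (placeholder values).
    scores = {1: {"image 1": 0.9, "image 2": 0.4},
              2: {"image 1": 0.3, "image 2": 0.2}}

    def way1(scores, third_threshold):
        # Keep effects passing the threshold under at least one image.
        return [eid for eid, per_img in scores.items()
                if any(s >= third_threshold for s in per_img.values())]

    def way2(scores, fifth_threshold):
        # Keep effects whose total score over all images passes the threshold.
        return [eid for eid, per_img in scores.items()
                if sum(per_img.values()) >= fifth_threshold]

    print(way1(scores, 0.8))  # [1]
    print(way2(scores, 1.2))  # [1]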
S303, performing combination processing on the one or more target effects to obtain one or more rendering combinations.
For example, in response to the one or more target effects comprising 2 animations, 5 transitions and 3 special effects, the 2 animations, 5 transitions and 3 special effects are combined to obtain 30 rendering combinations (equal to the product of 2, 5 and 3).
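This combination processing is a plain Cartesian product, as in the following sketch reproducing the 2 x 5 x 3 example; the effect names are placeholders.

    from itertools import product

    animations = ["anim1", "anim2"]
    transitions = ["trans1", "trans2", "trans3", "trans4", "trans5"]
    special_effects = ["fx1", "fx2", "fx3"]

    # One animation, one transition and one special effect per combination.
    rendering_combinations = list(product(animations, transitions, special_effects))
    print(len(rendering_combinations))  # 30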
S304, determining a target rendering effect combination according to the first local feature, the second local feature and the one or more rendering combinations.
Specifically, the first local feature includes a second image emotion, a second image style and a second image scene corresponding to each of the plurality of images; the second local feature includes a chorus point, a phrase and section point and a beat point of a music fragment corresponding to each of the plurality of images in the music; S304 includes:
S3041, screening the one or more rendering combinations according to a second image emotion, a second image style and a second image scene corresponding to a first image in the plurality of images and a chorus point, a phrase and section point and a beat point of a music fragment corresponding to the first image, to obtain N initial candidate combinations.
N is an integer greater than or equal to 1.
Specifically, please refer to the embodiment in
S3042, determining M (j−1)th candidate combinations according to the N initial candidate combinations and the one or more rendering combinations, where M is equal to a product of N and a total number of the one or more rendering combinations; screening the M (j−1)th candidate combinations according to a second image emotion, a second image style and a second image scene corresponding to a jth image, and a chorus point, a phrase and section point and a beat point of a music fragment corresponding to the jth image, to determine N jth candidate combinations; taking the N jth candidate combinations as N new initial candidate combinations, adding 1 to j, and repeating the steps until a last image in the plurality of images is reached; and determining a candidate combination corresponding to the last image as the target rendering effect combination.
j is an integer greater than or equal to 2. An initial value of j is 2.
Note that S3042 is repeated for the images other than the first image in the plurality of images. Specifically, please refer to the embodiments in
In the embodiments of
In addition, in the embodiments of
Next, the execution process of S3041 will be described below with reference to
S401, determining a combination score corresponding to each of the one or more rendering combinations according to the second image emotion, the second image style and the second image scene corresponding to the first image, and the chorus point, the phrase and section point and the beat point of the music fragment corresponding to the first image.
For each of the one or more rendering combinations: determining a music matching score according to the chorus point, the phrase and section point and the beat point of the music fragment corresponding to the first image and an identification of each rendering effect in the rendering combination; determining an image matching score according to the second image emotion, the second image style and the second image scene corresponding to the first image and the identification of each rendering effect in the rendering combination; determining an internal combination score corresponding to the first image; and determining a sum of the music matching score, the image matching score and the internal combination score as the combination score corresponding to the rendering combination.
Alternatively, the music matching score is determined by S4011 to S4013 as follows.
S4011, acquiring, according to the identification of each rendering effect in the rendering combination, the third initial score corresponding to each of the chorus point, the phrase and section point and the beat point from a seventh preset list.
For example, the seventh preset list has a format as shown in Table 7 below.
In Table 7, G11-G63 are the third initial scores.
For example, in response to the rendering combination comprising an animation identified as 2, a transition identified as 4, and a special effect identified as 6: according to the identification 2, third initial scores G21, G22 and G23 corresponding to the chorus point, the phrase and section point and the beat point are acquired from the seventh preset list; according to the identification 4, third initial scores G41, G42 and G43 are acquired from the seventh preset list; and according to the identification 6, third initial scores G61, G62 and G63 are acquired from the seventh preset list.
S4012, determining a global score of each rendering effect according to the identification of each rendering effect in the rendering combination.
For example, in response to the first music emotion comprising Me1, the first music style comprising Mf1 and Mf2, and the first music theme comprising Mt1, and in response to the rendering combination comprising the animation identified as 2, the transition identified as 4, and the special effect identified as 6, the global score of each rendering effect is determined in the manner of the second target score described above; for example, the global score of the animation identified as 2 is a sum of the second initial scores corresponding to Me1, Mf1, Mf2 and Mt1 that are acquired from the fourth, fifth and sixth preset lists according to the identification 2.
S4013, determining a sum of the third initial scores corresponding to the chorus point, the phrase and section point and the beat point and the global scores of the rendering effects in the rendering combination, as the music matching score.
On the basis of the above S4011 and S4012, the music matching score is equal to a sum of G21, G22, G23, G41, G42, G43, G61, G62, G63, the global score of the animation identified as 2, the global score of the transition identified as 4, and the global score of the special effect identified as 6.
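A sketch of S4011 to S4013 under placeholder table entries is shown below; the G values and global scores are stand-ins for Table 7 and the per-effect global scores.

    # Table 7 analogue: (effect identification, point) -> third initial score.
    point_scores = {
        (2, "chorus"): 0.3, (2, "phrase_section"): 0.2, (2, "beat"): 0.1,
        (4, "chorus"): 0.4, (4, "phrase_section"): 0.1, (4, "beat"): 0.2,
        (6, "chorus"): 0.2, (6, "phrase_section"): 0.3, (6, "beat"): 0.1,
    }
    global_scores = {2: 1.1, 4: 0.9, 6: 1.3}  # per-effect global scores

    def music_matching_score(effect_ids):
        points = ("chorus", "phrase_section", "beat")
        table_sum = sum(point_scores[(eid, p)] for eid in effect_ids for p in points)
        return table_sum + sum(global_scores[eid] for eid in effect_ids)

    print(music_matching_score([2, 4, 6]))  # 1.9 + 3.3 = 5.2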
Alternatively, the image matching score may be determined by S4021 to S4022 as follows.
S4021, acquiring the first initial score corresponding to the second image emotion from the first preset list, the first initial score corresponding to the second image style from the second preset list, and the first initial score corresponding to the second image scene from the third preset list, according to the identification of each rendering effect in the rendering combination.
For example, in response to the rendering combination comprising the animation identified as 2, the transition identified as 4, and the special effect identified as 6, and the second image emotion of the first image including Tm1, the second image style thereof including Tf1, and the second image scene thereof including Tt1: according to the identification 2, a first initial score A21 corresponding to Tm1, a first initial score B21 corresponding to Tf1 and a first initial score C21 corresponding to Tt1 are acquired; according to the identification 4, first initial scores A41, B41 and C41 are acquired; and according to the identification 6, first initial scores A61, B61 and C61 are acquired.
S4022, determining an image matching score according to the plurality of first initial scores obtained in S4021.
Alternatively, on the basis of S4021, a sum of A21, B21, C21, A41, B41, C41, A61, B61 and C61 is determined as the image matching score.
Alternatively, the image matching score may also be determined according to the plurality of first initial scores obtained in S4021 in other manners, and details are not repeated here.
Alternatively, taking as an example that the attributes of the rendering effect include effect direction and visual impact, a method for determining an internal combination score corresponding to the first image is described in the following S4031 to S4035.
S4031, determining, in an eighth preset list, a fourth initial score of an effect direction corresponding to each rendering effect, according to the identification of each rendering effect in the rendering combination.
For example, the eighth preset list has a format as shown in Table 8 below.
In Table 8, H11-H63 are the fourth initial scores.
It should be noted that, in Tables 1 to 8, the identification is an identification of a rendering effect, the name is a name of the rendering effect, and the type is a type of the rendering effect.
For example, in response to the rendering combination comprising an animation identified as 2, a transition identified as 4, and a special effect identified as 6, according to the identification 2, a fourth initial score H21 of an effect direction corresponding to the animation identified as 2 is acquired from the eighth preset list; according to the identification 4, a fourth initial score H41 of an effect direction corresponding to the transition identified as 4 is acquired from the eighth preset list; according to the identification 6, a fourth initial score H61 of an effect direction corresponding to the special effect identified as 6 is acquired from the eighth preset list.
S4032, determining an effect direction attribute score according to the fourth initial score of the effect direction corresponding to each rendering effect.
Alternatively, determining a similarity corresponding to each two rendering effects according to the fourth initial scores of the effect directions corresponding to the two rendering effects; and determining a sum of the similarities corresponding to each two rendering effects as the effect direction attribute score. Alternatively, the similarity may be cosine similarity or another similarity.
For example, in response to the rendering combination comprising the animation identified as 2, the transition identified as 4, and the special effect identified as 6: the similarity corresponding to the animation identified as 2 and the transition identified as 4 is determined according to the fourth initial score H21 of the effect direction corresponding to the animation identified as 2 and the fourth initial score H41 of the effect direction corresponding to the transition identified as 4; the similarity corresponding to the animation identified as 2 and the special effect identified as 6 is determined according to the fourth initial score H21 and the fourth initial score H61 of the effect direction corresponding to the special effect identified as 6; the similarity corresponding to the transition identified as 4 and the special effect identified as 6 is determined according to the fourth initial score H41 and the fourth initial score H61; and a sum of the three similarities is determined as the effect direction attribute score.
S4033, according to the identification of each rendering effect in the rendering combination, determining a fourth initial score of a visual impact corresponding to each rendering effect in the eighth preset list.
For example, in response to the rendering combination comprising the animation identified as 2, the transition identified as 4 and the special effect identified as 6: according to the identification 2, a fourth initial score H22 of the visual impact corresponding to the animation identified as 2 is acquired from the eighth preset list; according to the identification 4, a fourth initial score H42 of the visual impact corresponding to the transition identified as 4 is acquired from the eighth preset list; and according to the identification 6, a fourth initial score H62 of the visual impact corresponding to the special effect identified as 6 is acquired from the eighth preset list.
S4034, determining a visual impact attribute score according to the fourth initial score of the visual impact corresponding to each rendering effect.
Alternatively, an impact difference score corresponding to each two rendering effects is determined according to the fourth initial scores of the visual impacts corresponding to the two rendering effects; and a sum of the impact difference scores corresponding to each two rendering effects is determined as the visual impact attribute score.
Alternatively, the impact difference score corresponding to two rendering effects may be represented by the following formula: −β*|X1−X2|; where β is a preset value, − is a negative sign, * is a multiplication sign, X1 is a fourth initial score of the visual impact corresponding to one of the two rendering effects, X2 is a fourth initial score of the visual impact corresponding to the other of the two rendering effects, and | | denotes the absolute value.
For example, in response to the fourth initial score of the visual impact corresponding to the animation identified as 2 being H22, the fourth initial score of the visual impact corresponding to the transition identified as 4 being H42, and the fourth initial score of the visual impact corresponding to the special effect identified as 6 being H62, the visual impact attribute score is equal to a sum of (−β*|H22−H42|), (−β*|H42−H62|) and (−β*|H22−H62|).
S4035, determining a sum of the effect direction attribute score and the visual impact attribute score as an internal combination score corresponding to the first image.
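A sketch of S4031 to S4035 follows; it assumes, for illustration, that each effect direction is stored as a two-dimensional vector (so that cosine similarity applies) and that visual impact is a scalar, with β set to a placeholder value.

    from math import hypot

    BETA = 0.5  # β is a preset value in the disclosure; 0.5 is a placeholder

    # Assumed Table 8 analogues: effect direction vectors and impact scalars.
    directions = {2: (1.0, 0.0), 4: (0.8, 0.6), 6: (0.0, 1.0)}
    impacts = {2: 0.9, 4: 0.5, 6: 0.7}

    def cosine(u, v):
        return (u[0] * v[0] + u[1] * v[1]) / (hypot(*u) * hypot(*v))

    def internal_combination_score(effect_ids):
        pairs = [(a, b) for i, a in enumerate(effect_ids) for b in effect_ids[i + 1:]]
        direction_score = sum(cosine(directions[a], directions[b]) for a, b in pairs)
        impact_score = sum(-BETA * abs(impacts[a] - impacts[b]) for a, b in pairs)
        return direction_score + impact_score

    print(internal_combination_score([2, 4, 6]))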
S402, determining N rendering combinations of which the combination score is greater than or equal to a fourth threshold in the one or more rendering combinations as N initial candidate combinations.
Specifically, N is a preset value. For example, N may be 10, 20, etc., and N is not limited herein.
In the embodiment in
On the basis of the above embodiments, a loop process involved in determining the target rendering effect combination is described below with reference to
S501, determining M (j−1)th candidate combinations according to the N initial candidate combinations and the one or more rendering combinations.
M is equal to a product of N and a total number of the one or more rendering combinations.
S502, according to a second image emotion, a second image style and a second image scene corresponding to a jth image, and a chorus point, a phrase and section point and a beat point of a music fragment corresponding to the jth image, screening the M (j−1)th candidate combinations, to determine N jth candidate combinations.
The initial value of j is 2.
S503, judging whether j is greater than a total number of the plurality of images.
If not, executing S504, otherwise executing S505.
S504, taking the N jth candidate combinations as N new initial candidate combinations, adding 1 to j, and repeating S501-S503.
S505, determining the jth candidate combination with the largest combination score in the N jth candidate combinations as the target rendering effect combination.
It should be noted that, in response to S505 being executed, it indicates that the jth image is the last image, and at this time, the jth candidate combination with the largest combination score in the N jth candidate combinations is the candidate combination corresponding to the last image.
For each jth candidate combination, the combination score corresponding to the jth candidate combination is equal to a sum of the music matching score, the image matching score, the internal combination score and the combination matching score.
The method for determining the music matching score is similar to the execution process of the above S4011 to S4013, and is not described herein again.
The method for determining the image matching score is similar to the execution process of the above S4021 to S4022, and is not described herein again.
The method of determining the internal combination score is similar to the execution process of the above S4031-S4035, and is not described herein again.
The determining of the combination matching score is explained below by taking an example that the plurality of images include a first image and a second image, and the jth candidate combination includes the animation, transition and special effect of the first image, and the animation, transition and special effect of the second image: determining a first similarity according to the fourth initial scores of the effect directions corresponding to the animation of the first image and the animation of the second image; determining a second similarity according to the fourth initial scores of the effect directions corresponding to the transition of the first image and the transition of the second image; determining a third similarity according to the fourth initial scores of the effect directions corresponding to the special effect of the first image and the special effect of the second image; similarly determining a first impact difference score, a second impact difference score and a third impact difference score according to the fourth initial scores of the visual impacts corresponding to the animations, the transitions and the special effects of the two images, respectively; and determining a sum of the first similarity, the second similarity, the third similarity, the first impact difference score, the second impact difference score and the third impact difference score as the combination matching score.
The determination method of the first similarity, the second similarity and the third similarity is similar to the determination method of the similarity corresponding to each two rendering effects in S4032, and is not described herein again.
The determination method of the first impact difference score, the second impact difference score and the third impact difference score is similar to the determination method of the impact difference score corresponding to each two rendering effects in S4034, and is not described herein again.
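Putting S501 to S505 together, the loop behaves like a beam search that keeps N candidate combinations per image, as in the following sketch; score() is a stand-in for the combination score (the music matching, image matching, internal combination and combination matching scores), and keeping the top N is a simplification of the threshold screening described above, assumed here for brevity.

    def select_target_combination(images, rendering_combos, score, n):
        # Initial candidates: the N best rendering combinations for the first image.
        beam = sorted(([c] for c in rendering_combos),
                      key=lambda seq: score(seq, images[0]), reverse=True)[:n]
        for img in images[1:]:
            # Extend each kept candidate by every rendering combination
            # (M = N * number of combinations), then keep the best N again.
            extended = [seq + [c] for seq in beam for c in rendering_combos]
            beam = sorted(extended, key=lambda seq: score(seq, img), reverse=True)[:n]
        return beam[0]  # candidate combination with the largest combination score

    # Toy usage with a hypothetical, deterministic score function.
    combos = [("anim1", "trans1", "fx1"), ("anim2", "trans2", "fx2")]
    print(select_target_combination(["img1", "img2"], combos,
                                    lambda seq, img: len(str(seq)), n=2))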
The video generation apparatus provided in the embodiments of the present disclosure may implement the above video generation method, and their implementation principles and beneficial effects are similar, which are not described herein again.
Alternatively, the first feature information comprises a first global feature and a first local feature; the second feature information comprises a second global feature and a second local feature; the second determination module 103 is specifically configured to: determine a plurality of candidate effects in the plurality of rendering effects according to the first global feature and the second global feature; determine one or more target effects in the plurality of candidate effects according to the first local feature, and perform combination processing on the one or more target effects to obtain one or more rendering combinations; and determine the target rendering effect combination according to the first local feature, the second local feature and the one or more rendering combinations.
Alternatively, the first global feature comprises a first image emotion, a first image style and a first image scene corresponding to the plurality of images, and the second global feature comprises a first music emotion, a first music style and a first music theme; the second determination module 103 is specifically configured to: for each rendering effect, acquire a first initial score corresponding to each of the first image emotion, the first image style and the first image scene according to an identification of the rendering effect; screen the plurality of rendering effects according to the first initial score to obtain a plurality of intermediate effects; for each intermediate effect, acquire a second initial score corresponding to each of the first music emotion, the first music style and the first music theme according to an identification of the intermediate effect; and screen the plurality of intermediate effects according to the second initial score to obtain the plurality of candidate effects.
Alternatively, the second determination module 103 is specifically configured to: for each rendering effect, determine a sum of the first initial score corresponding to the first image emotion, the first initial score corresponding to the first image style and the first initial score corresponding to the first image scene, as a first target score corresponding to the rendering effect; and determine rendering effects of which the first target score is greater than or equal to a first threshold in the plurality of rendering effects, as the plurality of intermediate effects.
Alternatively, the second determination module 103 is specifically configured to: for each intermediate effect, determine a sum of the second initial score corresponding to each of the first music emotion, the first music style and the first music theme, as a second target score corresponding to the intermediate effect; and determine intermediate effects of which the second target score is greater than or equal to a second threshold in the plurality of intermediate effects, as the plurality of candidate effects.
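A compact sketch of this two-stage screening follows; the per-identification score tables are hypothetical stand-ins for the first and second initial scores, and the threshold comparisons mirror the first and second target scores described above.

```python
from typing import Dict, Iterable, List

def screen_candidate_effects(
    effects: Iterable[str],
    image_scores: Dict[str, Dict[str, float]],  # effect id -> first initial scores
    music_scores: Dict[str, Dict[str, float]],  # effect id -> second initial scores
    first_threshold: float,
    second_threshold: float,
) -> List[str]:
    # Stage 1: the first target score is the sum of the first initial scores
    # (image emotion, style, scene); effects reaching the first threshold
    # become intermediate effects.
    intermediate = [e for e in effects
                    if sum(image_scores[e].values()) >= first_threshold]
    # Stage 2: the second target score is the sum of the second initial
    # scores (music emotion, style, theme); intermediate effects reaching
    # the second threshold become the candidate effects.
    return [e for e in intermediate
            if sum(music_scores[e].values()) >= second_threshold]
```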
Alternatively, the first local feature comprises a second image emotion, a second image style and a second image scene corresponding to each of the plurality of images; the second determination module 103 is specifically configured to: for each image, determine a third target score corresponding to each of the plurality of candidate effects under the condition of the each image, according to the plurality of candidate effects and the second image emotion, the second image style and the second image scene corresponding to the each image; and determine the one or more target effects in the plurality of candidate effects according to the third target score corresponding to each of the plurality of candidate effects.
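The disclosure leaves the exact aggregation of the per-image third target scores open; the sketch below assumes, purely for illustration, that each candidate effect's scores are summed over the images and thresholded.

```python
from typing import Callable, List, Sequence

def select_target_effects(
    candidate_effects: Sequence[str],
    images: Sequence[dict],   # each holding its second image emotion/style/scene
    third_score: Callable[[str, dict], float],
    threshold: float,
) -> List[str]:
    # Assumed aggregation: sum each candidate effect's third target score
    # over all images, then keep effects reaching a threshold. The disclosure
    # only states that the target effects follow from the third target scores.
    totals = {e: sum(third_score(e, img) for img in images)
              for e in candidate_effects}
    return [e for e in candidate_effects if totals[e] >= threshold]
```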
Alternatively, the first local feature comprises a second image emotion, a second image style and a second image scene corresponding to each of the plurality of images; the second local feature comprises a chorus point, a phrase and section point and a beat point of a music fragment corresponding to each of the plurality of images in the music; the second determination module 103 is specifically configured to: screen the one or more rendering combinations according to a second image emotion, a second image style and a second image scene corresponding to a first image in the plurality of images, and a chorus point, a phrase and section point and a beat point of a music fragment corresponding to the first image, to obtain N initial candidate combinations, where N is an integer greater than or equal to 1; determine M (j−1)th candidate combinations according to the N initial candidate combinations and the one or more rendering combinations, where M is equal to a product of N and a total number of the one or more rendering combinations; screen the M (j−1)th candidate combinations according to a second image emotion, a second image style and a second image scene corresponding to a jth image, and a chorus point, a phrase and section point and a beat point of a music fragment corresponding to the jth image, to determine N jth candidate combinations, take the N jth candidate combinations as new N initial candidate combinations, add 1 to j, and repeat the steps until a last image in the plurality of images is processed, and determine a candidate combination corresponding to the last image as the target rendering effect combination; where an initial value of j is 2.
Alternatively, the second determination module 103 is specifically configured to: determine a combination score corresponding to each of the one or more rendering combinations according to the second image emotion, the second image style and the second image scene corresponding to the first image, and the chorus point, the phrase and section point and the beat point of the music fragment corresponding to the first image; and determine N rendering combinations of which the combination score is greater than or equal to a fourth threshold in the one or more rendering combinations, as the N initial candidate combinations.
Alternatively, the second determination module 103 is specifically configured to: for each rendering combination in the one or more rendering combinations, determine a music matching score according to the chorus point, the phrase and section point and the beat point of the music fragment corresponding to the first image and an identification of each rendering effect in the rendering combination; determine an image matching score according to the second image emotion, the second image style and the second image scene corresponding to the first image and the identification of each rendering effect in the rendering combination; determine an internal combination score corresponding to the first image; and determine a sum of the music matching score, the image matching score and the internal combination score as the combination score corresponding to the rendering combination.
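A sketch of this initial screening under the fourth threshold might look as follows; `combo_score` is a hypothetical callable returning the sum of the music matching, image matching and internal combination scores described in the preceding paragraph.

```python
from typing import Callable, List, Sequence

def initial_candidate_combinations(
    rendering_combinations: Sequence,
    first_image: dict,
    first_fragment: dict,
    combo_score: Callable,      # music matching + image matching + internal
    fourth_threshold: float,
) -> List:
    # The N initial candidate combinations are those whose combination score
    # for the first image reaches the fourth threshold (N is data-dependent).
    return [c for c in rendering_combinations
            if combo_score(c, first_image, first_fragment) >= fourth_threshold]
```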
Alternatively, the first determination module 102 is specifically configured to: perform feature extraction on the plurality of images through a pre-stored image feature extraction model, to obtain the first feature information of the plurality of images; and perform feature extraction on the music through a pre-stored music feature extraction model, to obtain the second feature information.
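The disclosure treats the two pre-stored models as opaque; the sketch below only illustrates plausible data structures for the extracted features, with field names taken from the feature lists above rather than from any actual model API.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class ImageFeatures:          # first feature information
    emotion: str              # first image emotion (global)
    style: str                # first image style (global)
    scene: str                # first image scene (global)
    per_image: List[dict]     # second image emotion/style/scene (local)

@dataclass
class MusicFeatures:          # second feature information
    emotion: str              # first music emotion (global)
    style: str                # first music style (global)
    theme: str                # first music theme (global)
    per_fragment: List[dict]  # chorus, phrase-and-section, beat points (local)

def extract_features(
    images, music,
    image_model: Callable[..., ImageFeatures],  # pre-stored, treated as opaque
    music_model: Callable[..., MusicFeatures],
) -> Tuple[ImageFeatures, MusicFeatures]:
    return image_model(images), music_model(music)
```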
Alternatively, the generation module 104 is specifically configured to: sequentially display the plurality of images according to the animation, special effect and transition corresponding to each of the plurality of images in the target rendering effect combination, and play the music, to generate the video.
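One way to picture the generation module is as laying the images on a timeline aligned with the music, each carrying its assigned effects; the sketch below builds such a timeline and leaves actual rendering and encoding to a media backend (all names are illustrative).

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

@dataclass
class RenderedSegment:
    image_id: str
    animation: str
    special_effect: str
    transition: str
    start: float      # seconds into the music
    duration: float

def build_timeline(
    image_ids: Sequence[str],
    target_combination: Sequence[Dict[str, str]],  # per-image effects
    durations: Sequence[float],                    # per-image fragment lengths
) -> List[RenderedSegment]:
    # Lay the images out back-to-back, each carrying the animation, special
    # effect and transition that the target rendering effect combination
    # assigns to it; the music track spans the whole timeline.
    timeline, t = [], 0.0
    for image_id, effects, dur in zip(image_ids, target_combination, durations):
        timeline.append(RenderedSegment(image_id, effects["animation"],
                                        effects["special_effect"],
                                        effects["transition"], t, dur))
        t += dur
    return timeline
```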
Alternatively, the acquisition module 101 is specifically configured to: in response to a selection operation of a plurality of target images in a plurality of candidate images, determine the plurality of target images as the plurality of images; and in response to a selection operation of target music in a plurality of candidate music, determine the target music as the music matched with the plurality of images.
The video generation apparatus provided in the embodiment of the present disclosure may implement the above video generation method, and their implementation principles and beneficial effects are similar, which are not described herein again.
An embodiment of the present disclosure further provides an electronic device, comprising a memory 202 and a processor 203. The memory 202 is configured to store computer-executable instructions;
The processor 203 is configured to execute the computer-executable instructions stored in the memory 202, to cause the processor 203 to perform the video generation method described above.
An embodiment of the present disclosure provides a computer-readable storage medium, having computer-executable instructions stored thereon, which, in response to being executed by a processor, implement the video generation method described above.
An embodiment of the present disclosure further provides a computer program product, comprising a computer program, which, in response to being executed by a processor, implements the video generation method described above.
An embodiment of the present disclosure further provides a computer program, which, in response to being executed by a processor, implements the video generation method described above.
All or part of the steps of the above-described method embodiments may be performed by hardware associated with program instructions. The foregoing program may be stored in a readable memory. In response to being executed, the program performs the steps of the method embodiments described above; and the memory (storage medium) includes: read-only memory (ROM), random access memory (RAM), flash memory, hard disk, solid state disk, magnetic tape, floppy disk, optical disc, and any combination thereof.
The embodiments of the present disclosure are described with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to the embodiments of the present disclosure. It will be understood that each flow and/or block of the flowchart and/or block diagram, and combinations of flows and/or blocks in the flowchart and/or block diagram, can be implemented by computer program instructions. These computer program instructions may be provided to a processing unit of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing devices to produce a machine, such that the instructions, executed via the processing unit of the computer or other programmable data processing devices, create means for implementing the functions specified in one flow or one or more flows in the flowchart and/or one block or one or more blocks in the block diagram.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing devices to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the functions specified in one flow or one or more flows in the flowchart and/or one block or one or more blocks in the block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing devices, to cause a series of operational steps to be performed on the computer or other programmable devices to produce a computer implemented process, such that the instructions executed on the computer or other programmable devices provide steps for implementing the functions specified in one flow or one or more flows in the flowchart and/or one block or one or more blocks in the block diagram.
It will be apparent to those skilled in the art that various variations and modifications may be made to the embodiments of the present disclosure without departing from the spirit and scope of the present disclosure. Thus, if such modifications and variations to the embodiments of the present disclosure fall within the scope of the claims of the present disclosure and their equivalents, the present disclosure is also intended to encompass these modifications and variations.
In the present disclosure, the term “include” and variations thereof may refer to non-limiting inclusions; the term “or” and variations thereof may mean “and/or”. The terms “first,” “second,” and the like in the present disclosure are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. In the present disclosure, “at least two” means two or more. The term “and/or” describes an association relationship between associated objects, indicating that there may be three relationships; for example, “A and/or B” may indicate that: A exists alone, A and B exist simultaneously, or B exists alone. The character “/” generally indicates that the former and latter associated objects are in an “or” relationship.
Other embodiments of the present disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. The present disclosure is intended to cover any variation, use, or adaptation of the present disclosure, which follow the general principles of the present disclosure, and include common knowledge or customary technical means in the technical field not disclosed in the present disclosure. It is intended that the specification and the embodiments be considered as exemplary only, with a true scope and spirit of the present disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements that have been described above and shown in the drawings, and that various modifications and changes can be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
| Number | Date | Country | Kind |
|---|---|---|---|
| 202111373450.1 | Nov 2021 | CN | national |
| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/SG2022/050839 | 11/18/2022 | WO | |