Embodiments of the present disclosure relate to a method and an apparatus for video frame interpolation processing, and a non-transitory computer-readable storage medium.
Video processing is a typical application of artificial intelligence, and video frame interpolation is a typical technique in video processing. Video frame interpolation aims to synthesize smoothly transitioning intermediate frames based on the preceding and succeeding frames of a video, in order to make video playback smoother and improve the user's viewing experience. For example, a video with a frame rate of 24 can be transformed into a video with a frame rate of 48 through video frame interpolation processing, so that the video appears clearer and smoother to viewers.
At least one embodiment of the present disclosure provides a method for video frame interpolation processing, including: acquiring the first video frame and the second video frame of the video; acquiring the first comparison result between the first video frame and the second video frame based on the first video frame and the second video frame; and determining whether to interpolate a frame between the first video frame and the second video frame based on the first comparison result. The first video frame and the second video frame are adjacent temporally, and the first video frame is a forward frame of the second video frame. The first comparison result indicates whether a picture switch exists between the first video frame and the second video frame.
For example, in the method provided in at least one embodiment of the present disclosure, the picture switch includes a subtitle switch and/or a scene switch.
For example, in the method provided in at least one embodiment of the present disclosure, acquiring the first comparison result between the first video frame and the second video frame based on the first video frame and the second video frame, includes: determining whether the subtitle switch exists between the first video frame and the second video frame based on whether a subtitle content of the first video frame and a subtitle content of the second video frame are identical.
For example, in the method provided in at least one embodiment of the present disclosure, determining whether the subtitle switch exists between the first video frame and the second video frame based on whether the subtitle content of the first video frame and the subtitle content of the second video frame are identical, includes: acquiring an audio segment corresponding to the first video frame; acquiring a start video frame and an end video frame corresponding to the audio segment based on the audio segment; and determining whether the subtitle switch exists between the first video frame and the second video frame based on the start video frame and the end video frame.
For example, in the method provided in at least one embodiment of the present disclosure, determining whether the subtitle switch exists between the first video frame and the second video frame based on the start video frame and the end video frame, includes: in response to the second video frame being between the start video frame and the end video frame, determining that the subtitle switch does not exist between the first video frame and the second video frame; and in response to the second video frame not being between the start video frame and the end video frame, determining that the subtitle switch exists between the first video frame and the second video frame.
For example, in the method provided in at least one embodiment of the present disclosure, determining whether the subtitle switch exists between the first video frame and the second video frame based on whether the subtitle content of the first video frame and the subtitle content of the second video frame are identical, includes: acquiring the first recognition text content of the first video frame; acquiring the second recognition text content of the second video frame; and in response to the first recognition text content and the second recognition text content being identical, determining that the subtitle switch does not exist between the first video frame and the second video frame.
For example, in the method provided in at least one embodiment of the present disclosure, determining whether the subtitle switch exists between the first video frame and the second video frame based on whether the subtitle content of the first video frame and the subtitle content of the second video frame are identical, further includes: in response to the first recognition text content being different from the second recognition text content, acquiring a first sub-image of the first video frame; acquiring a second sub-image of the second video frame; and determining whether the subtitle switch exists between the first video frame and the second video frame based on the first sub-image and the second sub-image. The first sub-image corresponds to the first subtitle content of the first video frame; and the second sub-image corresponds to the second subtitle content of the second video frame.
For example, in the method provided in at least one embodiment of the present disclosure, determining whether the subtitle switch exists between the first video frame and the second video frame based on the first sub-image and the second sub-image, includes: determining the first similarity between the first sub-image and the second sub-image based on the first sub-image and the second sub-image; in response to the first similarity being greater than the first threshold, determining that the subtitle switch does not exist between the first video frame and the second video frame; and in response to the first similarity being not greater than the first threshold, determining that the subtitle switch exists between the first video frame and the second video frame.
For example, in the method provided in at least one embodiment of the present disclosure, acquiring the first comparison result between the first video frame and the second video frame based on the first video frame and the second video frame, includes: determining whether the scene switch exists between the first video frame and the second video frame based on whether the scene of the first video frame and the scene of the second video frame are identical.
For example, in the method provided in at least one embodiment of the present disclosure, determining whether the scene switch exists between the first video frame and the second video frame based on whether the scene of the first video frame and the scene of the second video frame are identical, includes: acquiring the second similarity between the first video frame and the second video frame; in response to the second similarity being greater than the second threshold, determining that the scene switch does not exist between the first video frame and the second video frame; and in response to the second similarity being not greater than the second threshold, determining that the scene switch exists between the first video frame and the second video frame.
For example, in the method provided in at least one embodiment of the present disclosure, determining whether to interpolate a frame between the first video frame and the second video frame based on the first comparison result, includes: in response to the first comparison result indicating that the picture switch does not exist between the first video frame and the second video frame, determining to interpolate a frame between the first video frame and the second video frame; and in response to the first comparison result indicating that the picture switch exists between the first video frame and the second video frame, determining not to interpolate a frame between the first video frame and the second video frame.
For example, the method provided in at least one embodiment of the present disclosure further includes: setting the first frame interpolation flag; and in response to the picture switch existing between the first video frame and the second video frame, modifying the first frame interpolation flag to the second frame interpolation flag.
For example, the method provided in at least one embodiment of the present disclosure further includes: in response to the picture switch existing between the first video frame and the second video frame, acquiring the fourth video frame; acquiring the second comparison result between the second video frame and the fourth video frame based on the second video frame and the fourth video frame; and determining whether to interpolate a frame between the second video frame and the fourth video frame based on the second comparison result. The fourth video frame and the second video frame are adjacent temporally, the second video frame is a forward frame of the fourth video frame, and the second comparison result indicates whether the picture switch exists between the second video frame and the fourth video frame.
For example, in the method provided in at least one embodiment of the present disclosure, determining whether to interpolate a frame between the second video frame and the fourth video frame based on the second comparison result, includes: in response to the second comparison result indicating that the picture switch does not exist between the second video frame and the fourth video frame, interpolating a plurality of video frames between the second video frame and the fourth video frame. The number of the plurality of video frames is based on the second frame interpolation flag.
For example, in the method provided in at least one embodiment of the present disclosure, determining whether to interpolate a frame between the second video frame and the fourth video frame based on the second comparison result, includes: in response to the second comparison result indicating that the picture switch exists between the second video frame and the fourth video frame, determining not to interpolate a frame between the second video frame and the fourth video frame; and modifying the second frame interpolation flag to the third frame interpolation flag. The third frame interpolation flag is used to indicate the number of frames to be interpolated next.
For example, the method provided in at least one embodiment of the present disclosure further includes: in response to interpolating the third video frame between the first video frame and the second video frame, acquiring the first sub-image of the first video frame; acquiring the third sub-image of the third video frame; and determining whether to replace the third video frame with the first video frame based on the first sub-image and the third sub-image. The first sub-image corresponds to the first subtitle content of the first video frame, and the third sub-image corresponds to the third subtitle content of the third video frame.
For example, in the method provided in at least one embodiment of the present disclosure, determining whether to replace the third video frame with the first video frame based on the first sub-image and the third sub-image, includes: acquiring a pixel value of the first pixel in the first sub-image; setting a pixel value of the third pixel in the third sub-image based on the pixel value of the first pixel in the first sub-image; and determining whether to replace the third video frame with the first video frame based on the first sub-image and the set third sub-image. The pixel value of the first pixel is greater than the third threshold; and the relative position of the third pixel in the third sub-image is the same as the relative position of the first pixel in the first sub-image.
At least one embodiment of the present disclosure also provides an apparatus for video frame interpolation processing including an acquisition module, a comparison module, and an operation module. The acquisition module is configured to acquire the first video frame and the second video frame of the video. The first video frame and the second video frame are adjacent temporally, and the first video frame is a forward frame of the second video frame. The comparison module is configured to acquire the first comparison result between the first video frame and the second video frame based on the first video frame and the second video frame. The first comparison result indicates whether a picture switch exists between the first video frame and the second video frame. The operation module is configured to determine whether to interpolate a frame between the first video frame and the second video frame based on the first comparison result.
At least one embodiment of the present disclosure also provides an apparatus for video frame interpolation processing including a processor and a memory. The memory includes one or more computer program modules. The one or more computer program modules are stored in the memory and are configured to be executed by the processor, and the one or more computer program modules include instructions for executing the method for video frame interpolation processing in any of the above embodiments.
At least one embodiment of the present disclosure also provides a non-transitory computer-readable storage medium storing computer instructions. The computer instructions, upon execution by a processor, cause the processor to execute the method for video frame interpolation processing in any of the above embodiments.
In order to illustrate the technical solutions of the embodiments of the present disclosure more clearly, the drawings of the embodiments are briefly introduced below. Apparently, the drawings described below only relate to some embodiments of the present disclosure, rather than limiting the present disclosure.
In order to make objects, technical details and advantages of the embodiments of the present disclosure apparent, the technical solutions of the embodiments are described in a clearly and fully understandable way in connection with the drawings related to the embodiments of the present disclosure. Apparently, the described embodiments are just a part but not all of the embodiments of the present disclosure. Based on the described embodiments herein, those skilled in the art can obtain other embodiment(s), without any inventive work, which should be within the scope of the present disclosure.
Flowcharts are used in the present disclosure to illustrate the operations performed by the system according to the embodiments of the present disclosure. It should be understood that the preceding or following operations are not necessarily performed in an exact order. Instead, various steps may be processed in reverse order or concurrently, as desired. At the same time, other operations can be added to these procedures, or a certain step or steps can be removed from these procedures.
Unless otherwise defined, all the technical and scientific terms used herein have the same meanings as commonly understood by those of ordinary skill in the art to which the present disclosure belongs. The terms “first”, “second”, and the like, which are used in the description and the claims of the present disclosure, are not intended to indicate any sequence, amount or importance, but used to distinguish various components. Similarly, the terms “a”, “an”, “the”, or the like are not intended to indicate a limitation of quantity, but indicate that there is at least one. The terms, such as “comprise/comprising”, “include/including”, or the like are intended to specify that the elements or the objects stated before these terms encompass the elements or the objects and equivalents thereof listed after these terms, but not preclude other elements or objects. The terms, such as “connect/connecting/connected”, “couple/coupling/coupled”, or the like, are not limited to a physical connection or mechanical connection, but may include an electrical connection/coupling, directly or indirectly. The terms, “on”, “under”, “left”, “right”, or the like are only used to indicate relative position relationship, and when the position of the object which is described is changed, the relative position relationship may be changed accordingly.
As illustrated in
Currently, commonly used video frame interpolation algorithms cannot deal with the deformation problem well, for example, the deformation problem caused by scene switch, subtitle switch, and the like. This is because most video frame interpolation algorithms need to use the information of the previous and subsequent frames of the video. When the subtitles/scenes of the previous and subsequent frames of the video are switched, the optical flow information of the previous and subsequent frames cannot be correctly estimated, and therefore obvious deformation will occur.
At least to overcome the above technical problems, at least one embodiment of the present disclosure provides a method for video frame interpolation processing, including: acquiring the first video frame and the second video frame of the video; acquiring the first comparison result between the first video frame and the second video frame based on the first video frame and the second video frame; and determining whether to interpolate a frame between the first video frame and the second video frame based on the first comparison result. The first video frame and the second video frame are adjacent temporally, and the first video frame is a forward frame of the second video frame. The first comparison result indicates whether a picture switch exists between the first video frame and the second video frame.
Correspondingly, at least one embodiment of the present disclosure further provides an apparatus for video frame interpolation processing and a non-transitory computer-readable storage medium corresponding to the above method for video frame interpolation processing.
The method of video frame interpolation processing according to at least one embodiment of the present disclosure can solve the obvious deformation problem caused by the video picture switch during the frame interpolation processing, ensure the smoothness of the video, and thereby improve the user's viewing experience.
The video frame interpolation processing method provided according to at least one embodiment of the present disclosure is described below in a non-limiting manner through several examples or embodiments. As described below, different features in these specific examples or embodiments may be combined with each other without conflicting with each other, thereby obtaining new examples or embodiments, and all of these new examples or embodiments also should be within the scope of the present disclosure.
At least one embodiment of the present disclosure provides the method for video frame interpolation processing 10, as illustrated in
S101: acquiring the first video frame and the second video frame of the video. The first video frame and the second video frame are adjacent temporally, and the first video frame is a forward frame of the second video frame.
S102: acquiring the first comparison result between the first video frame and the second video frame based on the first video frame and the second video frame. The first comparison result indicates whether a picture switch exists between the first video frame and the second video frame.
S103: determining whether to interpolate a frame between the first video frame and the second video frame based on the first comparison result.
It should be noted that, in the embodiments of the present disclosure, the terms "first video frame" and "second video frame" are used to refer to any two temporally consecutive or adjacent images or video frames in the video or video frame sequence. The term "first video frame" is used to refer to the previous frame of the two temporally adjacent images, the term "second video frame" is used to refer to the subsequent frame of the two temporally adjacent images, and the term "third video frame" is used to refer to an intermediate frame or an interpolation frame interpolated between two temporally adjacent images. The term "first video frame", "second video frame", or "third video frame" is not limited to a specific frame of image or a specific sequence. The term "first comparison result" is used to refer to the comparison result between two adjacent frames of images in the video, and is not limited to a specific comparison result or a specific order. It should also be noted that the embodiments of the present disclosure may use the forward frame of two adjacent frames as a reference, or the backward frame of two adjacent frames as a reference, as long as the entire video frame interpolation processing method is consistent.
For example, in at least one embodiment of the present disclosure, for step S102, in order to avoid the deformation problem caused by the video picture switch between the previous and subsequent frames of the video, the first video frame and the second video frame which are adjacent can be compared to determine whether the picture switch exists between the first video frame and the second video frame.
For example, in at least one embodiment of the present disclosure, for step S103, it can be determined whether to perform a frame interpolation operation between the first video frame and the second video frame based on the first comparison result between the first video frame and the second video frame. For example, in some examples, the frame interpolation operation can be achieved by using the optical flow prediction method to calculate the intermediate frame/interpolation frame based on the adjacent first and second video frames.
It should be noted that the embodiments of the present disclosure do not specifically limit the method of how to acquire the intermediate frame/interpolation frame (i.e., the third video frame), and various conventional frame interpolation methods may be used to acquire the third video frame. For example, the intermediate frame/interpolation frame may be generated based on two adjacent video frames, may be generated based on more adjacent frames, or may be generated based on a specific or some specific video frames, which is not limited in the present disclosure and can be set according to the actual situation. For example, in at least one embodiment of the present disclosure, step S103 includes in response to the first comparison result indicating that the picture switch does not exist between the first video frame and the second video frame, determining to interpolate a frame between the first video frame and the second video frame; and in response to the first comparison result indicating that the picture switch exists between the first video frame and the second video frame, determining not to interpolate a frame between the first video frame and the second video frame.
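The decision logic of step S103 can be sketched as follows. This is a minimal illustrative sketch, not the disclosed implementation: the helper callables `detect_picture_switch` and `interpolate_frame` are hypothetical stand-ins for whatever comparison and interpolation methods (e.g., optical flow prediction) an embodiment employs.

```python
# Hypothetical sketch of step S103: interpolate only when no picture switch
# is detected between two temporally adjacent frames.

def process_pair(first_frame, second_frame, detect_picture_switch, interpolate_frame):
    """Return the interpolated frame, or None when a picture switch exists."""
    if detect_picture_switch(first_frame, second_frame):
        # A subtitle/scene switch exists: skip interpolation to avoid deformation.
        return None
    # No switch: synthesize an intermediate frame (e.g., via optical flow).
    return interpolate_frame(first_frame, second_frame)
```

The two detection branches (subtitle switch and scene switch) described below can both be plugged in as `detect_picture_switch`.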
Therefore, in the method for video frame interpolation processing 10 provided by at least one embodiment of the present disclosure, the frame interpolation operation is selectively performed according to the comparison result between adjacent video frames, which effectively avoids the obvious deformation problem caused by the video picture switch during the frame interpolation processing, ensures the smoothness of the video, and thereby improves the user's viewing experience.
For example, in at least one embodiment of the present disclosure, the picture switch between the first video frame and the second video frame includes a subtitle switch, a scene switch, and the like. The embodiments of the present disclosure do not limit this.
For example, in one example, the subtitle in the first video frame is “Where are you going” and the subtitle in the second video frame is “I'm going to school”. When the subtitle in the first video frame is different from the subtitle in the second video frame, it can be considered that a subtitle switch occurs between the first video frame and the second video frame. It should be noted that the embodiments of the present disclosure do not limit the subtitle content.
For example, in one example, when the scene in the first video frame is a shopping mall and the scene in the second video frame is a school, the scene in the first video frame is different from the scene in the second video frame, and it can be considered that a scene switch occurs between the first video frame and the second video frame. It should be noted that in the embodiments of the present disclosure, the scene in each video frame may include any scene such as the shopping mall, the school, the scenic spot, and the like. The embodiments of the present disclosure do not limit this.
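One plausible way to score scene similarity between two frames, for use in the scene-switch check described later, is to compare normalized grayscale histograms. The disclosure does not fix a particular similarity metric; the histogram intersection below is only an illustrative choice, and frames are modeled as flat lists of pixel intensities for simplicity.

```python
# Illustrative scene similarity: histogram intersection of two frames.
# This is an assumed metric, not the one prescribed by the disclosure.

def scene_similarity(frame_a, frame_b, bins=16, max_value=256):
    """Frames are flat lists of pixel intensities in [0, max_value)."""
    def histogram(frame):
        hist = [0] * bins
        for p in frame:
            hist[p * bins // max_value] += 1
        total = len(frame)
        return [h / total for h in hist]
    ha, hb = histogram(frame_a), histogram(frame_b)
    # Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint ones.
    return sum(min(a, b) for a, b in zip(ha, hb))
```

Comparing this score against the second threshold then decides whether a scene switch exists.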
For example, in at least one embodiment of the present disclosure, for step S102, acquiring the first comparison result between the first video frame and the second video frame based on the first video frame and the second video frame includes: determining whether the subtitle switch exists between the first video frame and the second video frame based on whether the subtitle content of the first video frame and the subtitle content of the second video frame are identical.
For example, in at least one embodiment of the present disclosure, for determining whether the subtitle switch occurs between adjacent frames, the start and end of each sentence in the audio of the video can be located to acquire the two video frames corresponding to the audio, and the two video frames can be marked according to the time information of the corresponding audio frames to determine whether the corresponding subtitle is switched.
For example, in at least one embodiment of the present disclosure, determining whether the subtitle switch exists between the first video frame and the second video frame based on whether the subtitle content of the first video frame and the subtitle content of the second video frame are identical, includes the following steps S201 to S203, as illustrated in
S201: acquiring an audio segment corresponding to the first video frame.
S202: acquiring a start video frame and an end video frame corresponding to the audio segment based on the audio segment.
S203: determining whether the subtitle switch exists between the first video frame and the second video frame based on the start video frame and the end video frame.
It should be noted that in embodiments of the present disclosure, “start video frame” and “end video frame” are used to refer to two video frames determined based on the time information of the corresponding audio segment. The “start video frame” and “end video frame” are not limited to a particular video frame or a particular order.
For example, in at least one embodiment of the present disclosure, for step S201, the corresponding audio data can be input into a voice recognition system for voice segmentation, obtaining the voice recognition result and the corresponding time information. For example, the time information includes the start time and the end time of the corresponding audio segment. Based on the voice recognition result and the corresponding time information, an audio segment corresponding to the first video frame can be acquired.
For example, in at least one embodiment of the present disclosure, for step S202, based on the time information of the recognized corresponding audio segment, the start video frame and end video frame corresponding to the audio segment can be determined.
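Assuming the recognized audio segment carries a start time and an end time in seconds, the corresponding start and end video frames of step S202 can be located from the video's frame rate. A minimal sketch; the function name and the assumption of a constant frame rate are illustrative.

```python
# Hypothetical mapping from an audio segment's time span to frame indices,
# assuming a constant frame rate (fps).

def frames_for_segment(start_time, end_time, fps):
    """Map a segment's (start_time, end_time) in seconds to (start_frame, end_frame)."""
    start_frame = int(start_time * fps)  # first frame at/after the segment start
    end_frame = int(end_time * fps)      # last frame within the segment
    return start_frame, end_frame
```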
It should be noted that the embodiments of the present disclosure do not limit the voice recognition method, and any effective voice recognition method may be used.
For example, in at least one embodiment of the present disclosure, step S203 includes: in response to the second video frame being between the start video frame and the end video frame, determining that the subtitle switch does not exist between the first video frame and the second video frame; and in response to the second video frame not being between the start video frame and the end video frame, determining that the subtitle switch exists between the first video frame and the second video frame.
For example, in at least one example of the present disclosure, a video includes a sequence of video frames, for example, temporally adjacent video frame 1, video frame 2, video frame 3, video frame 4, video frame 5, and so on. Assuming that the first video frame is the video frame 2, and the audio segment corresponding to the first video frame is "where are you going", according to the time information of the audio segment (e.g., the start time point and end time point of a sentence), it is determined that the start video frame corresponding to the audio segment is the video frame 1 and the end video frame is the video frame 4. In this case, the subtitles displayed on the pictures from the video frame 1 to the video frame 4 are all "where are you going", that is, the same subtitle content is displayed. For example, assuming that the second video frame is the video frame 3, which is between the video frame 1 and the video frame 4, then no subtitle switch occurs between the first video frame and the second video frame. For another example, assuming that the second video frame is the video frame 5, which is not between the video frame 1 and the video frame 4, then a subtitle switch occurs between the first video frame and the second video frame. With the above operation, it can be determined which video frames have the subtitle switch through the audio corresponding to the video.
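Once the start and end video frames of the audio segment are known, the check of step S203 reduces to an index comparison. A minimal sketch using illustrative frame indices:

```python
# Step S203 as an index-range test: a subtitle switch exists exactly when the
# second video frame falls outside the [start, end] span of the audio segment.

def subtitle_switch_exists(second_frame_index, start_frame_index, end_frame_index):
    """True when the second video frame is not between the start and end frames."""
    return not (start_frame_index <= second_frame_index <= end_frame_index)
```

For the example above (start frame 1, end frame 4), frame 3 yields no switch, while frame 5 yields a switch.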
For example, in at least one embodiment of the present disclosure, for determining whether the subtitle switch occurs between adjacent video frames, in addition to determining by audio, a method of text recognition may also be used. For example, in some examples, a text recognition algorithm is used to acquire the subtitle contents displayed on the first video frame and the second video frame, and a comparison is made to determine whether the subtitle switch occurs between the first video frame and the second video frame. It should be noted that the embodiments of the present disclosure do not limit the text recognition algorithm, as long as the text content can be recognized.
For example, in at least one embodiment of the present disclosure, as illustrated in
For example, in at least one embodiment of the present disclosure, the determining whether the subtitle switch occurs between adjacent video frames (the first video frame and the second video frame) includes: acquiring the first recognition text content of the first video frame, acquiring the second recognition text content of the second video frame, in response to the first recognition text content and the second recognition text content being the same, determining that the subtitle switch does not exist between the first video frame and the second video frame.
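The text-based check can be sketched as follows. The `ocr` callable stands in for whatever text recognition algorithm an embodiment uses (the disclosure does not limit the choice); normalizing whitespace before comparing is an added illustrative safeguard against trivial differences in recognizer output.

```python
# Hypothetical text-based subtitle comparison: OCR both frames and compare
# the whitespace-normalized recognized strings.

def same_subtitle_text(first_frame, second_frame, ocr):
    """True when the recognized text contents of the two frames are identical."""
    text1 = " ".join(ocr(first_frame).split())
    text2 = " ".join(ocr(second_frame).split())
    # Identical recognized text: no subtitle switch between the frames.
    return text1 == text2
```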
It should be noted that in the embodiments of the present disclosure, the term “first recognition text content” or “second recognition text content” is used to refer to the recognition text content obtained by performing the text recognition operation on the corresponding video frame. The “first recognition text content” and “second recognition text content” are not limited to specific text contents or specific order.
For example, in at least one embodiment of the present disclosure, in order to recognize the subtitle more accurately, the scope for applying the text recognition operation can be set in advance. Because the display position of the subtitle in the video picture is generally fixed, the approximate region where the subtitle is located can be set in advance.
In practice, text recognition algorithms cannot achieve 100% accuracy. For example, the text regions may be partitioned inaccurately, or text recognized at positions other than the subtitle may cause a mismatch between the text sequences recognized in the previous and subsequent frames. In order to determine more accurately whether the subtitle switch exists, the method for video frame interpolation processing 10 provided in the embodiments of the present disclosure includes the following steps S301 to S303, as illustrated in
S301: in response to the first recognition text content being different from the second recognition text content, acquiring the first sub-image of the first video frame. The first sub-image corresponds to the first subtitle content of the first video frame;
S302: acquiring the second sub-image of the second video frame. The second sub-image corresponds to the second subtitle content of the second video frame;
S303: determining whether the subtitle switch exists between the first video frame and the second video frame based on the first sub-image and the second sub-image. It should be noted that in the embodiments of the present disclosure, the term “first subtitle content” or “second subtitle content” is used to refer to the subtitle content displayed in the corresponding video frame. The terms “first subtitle content” and “second subtitle content” are not limited to specific subtitle contents or the specific order.
It should also be noted that in the embodiments of the present disclosure, the term “first sub-image”, “second sub-image”, or “third sub-image” is used to refer to the image of the subtitle region in the corresponding video frame. The terms “first sub-image”, “second sub-image”, and “third sub-image” are not limited to specific images or a specific order.
For example, in at least one embodiment of the present disclosure, a text recognition operation is performed on a certain video frame, and the coordinates of the subtitle in the video frame are recognized (for example, the coordinates of the top left, bottom left, top right, and bottom right vertex positions of a complete sentence of a subtitle). Based on the coordinates, a region where the subtitle is located in the video frame can be obtained, thereby obtaining a sub-image corresponding to the subtitle content of the video frame.
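The cropping step just described can be sketched as follows. For illustration, the four vertex coordinates returned by the text recognition operation are simplified here to an axis-aligned bounding box, which is an assumption rather than a requirement of the disclosure.

```python
import numpy as np

def crop_subtitle_region(frame: np.ndarray, box: tuple) -> np.ndarray:
    """Crop the sub-image corresponding to the subtitle content.
    `box` is (x_min, y_min, x_max, y_max), derived from the vertex
    coordinates returned by the text recognition operation."""
    x_min, y_min, x_max, y_max = box
    # Image rows are indexed by y (height), columns by x (width).
    return frame[y_min:y_max, x_min:x_max]
```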
For example, in at least one embodiment of the present disclosure, step S303 includes: determining the first similarity between the first sub-image and the second sub-image based on the first sub-image and the second sub-image; in response to the first similarity being greater than the first threshold, determining that the subtitle switch does not exist between the first video frame and the second video frame; and in response to the first similarity being not greater than the first threshold, determining that the subtitle switch exists between the first video frame and the second video frame.
It should be noted that in the embodiments of the present disclosure, the term “first similarity” is used to refer to the image similarity between subtitle sub-images of two adjacent video frames. The term “second similarity” is used to refer to the image similarity between two adjacent video frames. The terms “first similarity” and “second similarity” are not limited to a specific similarity or order.
It should also be noted that in the embodiments of the present disclosure, there are no limitations to the values of the terms “first threshold”, “second threshold”, and “third threshold”, which can be set according to actual needs. The terms “first threshold”, “second threshold”, and “third threshold” are not limited to a specific value or order.
For example, in the embodiments of the present disclosure, the image similarity between two images can be calculated through various methods, for example, the cosine similarity algorithm, the histogram algorithm, the perceptual hash algorithm, the mutual information-based algorithm, and the like. The embodiments of the present disclosure do not limit the methods for calculating the image similarity, which can be selected according to actual needs.
For example, in at least one embodiment of the present disclosure, the structural similarity (SSIM) algorithm can be used to calculate the similarity between two images. SSIM is a full-reference image quality evaluation indicator that measures image similarity from three aspects: luminance, contrast, and structure. The formula for calculating SSIM is as follows:
SSIM(x, y)=((2μxμy+c1)(2σxy+c2))/((μx2+μy2+c1)(σx2+σy2+c2))
where μx represents the average value of x, μy represents the average value of y, σx2 represents the variance of x, σy2 represents the variance of y, and σxy represents the covariance of x and y. c1=(k1L)2 and c2=(k2L)2 represent the constants for maintaining stability, L represents the dynamic range of the pixel values, k1=0.01, and k2=0.03. The range of values for the structural similarity is −1 to 1; the larger the value, the smaller the image distortion. When the two images are identical, the SSIM value is equal to 1.
For example, in at least one embodiment of the present disclosure, the “first threshold” can be set to 0.6 or 0.8. It should be noted that the embodiments of the present disclosure do not limit the value of the “first threshold”, which can be set according to actual needs.
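A single-window implementation of the SSIM formula above, together with the first-threshold decision of step S303, can be sketched as follows. Note that library implementations (e.g., scikit-image) compute SSIM over a sliding window and average the result, so this global version is a simplification for illustration.

```python
import numpy as np

def global_ssim(x: np.ndarray, y: np.ndarray,
                k1: float = 0.01, k2: float = 0.03, L: float = 255.0) -> float:
    """Single-window SSIM between two equally sized grayscale images,
    following the formula above."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()               # sigma_x^2, sigma_y^2
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()     # sigma_xy
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2         # stability constants
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

def subtitle_switch_exists(sub1: np.ndarray, sub2: np.ndarray,
                           first_threshold: float = 0.8) -> bool:
    """Step S303: the subtitle switch exists when the first similarity
    is not greater than the first threshold."""
    return not (global_ssim(sub1, sub2) > first_threshold)
```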
For example, in at least one embodiment of the present disclosure, as illustrated in
It should be noted that the embodiments of the present disclosure do not limit the method of calculating text similarity. For example, methods such as the Euclidean distance, the Manhattan distance, and the cosine similarity can be used to calculate text similarity. It should also be noted that the embodiments of the present disclosure do not specifically limit the threshold of the text similarity, which can be set according to actual needs.
For example, in at least one embodiment of the present disclosure, the picture switch includes the scene switch in addition to the subtitle switch. For example, step S102 includes: determining whether the scene switch exists between the first video frame and the second video frame based on whether the scene of the first video frame and the scene of the second video frame are identical.
For example, in at least one embodiment of the present disclosure, when the scene switch exists in the video, the image similarity (e.g., the SSIM value) between the previous and subsequent frames is significantly reduced. Therefore, the scene switch can be detected by calculating the image similarity.
For example, in at least one embodiment of the present disclosure, the determining whether the scene switch exists between the first video frame and the second video frame includes the following steps: acquiring the second similarity between the first video frame and the second video frame; in response to the second similarity being greater than the second threshold, determining that the scene switch does not exist between the first video frame and the second video frame; in response to the second similarity being not greater than the second threshold, determining that the scene switch exists between the first video frame and the second video frame.
For example, in at least one embodiment of the present disclosure, the second similarity can be the structural similarity (SSIM), or can be calculated between images (i.e., video frames) through, for example, the perceptual hash algorithm, the histogram algorithm, and the like. The embodiments of the present disclosure do not limit the algorithm for calculating the image similarity.
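As one of the alternative similarity measures mentioned above, a histogram-based scene-switch check can be sketched as follows. The cosine comparison of gray-level histograms and the second threshold of 0.5 used here are illustrative assumptions, not values prescribed by the disclosure.

```python
import numpy as np

def histogram_similarity(frame1: np.ndarray, frame2: np.ndarray,
                         bins: int = 32, L: int = 256) -> float:
    """Cosine similarity between the gray-level histograms of two
    video frames."""
    h1, _ = np.histogram(frame1, bins=bins, range=(0, L))
    h2, _ = np.histogram(frame2, bins=bins, range=(0, L))
    h1 = h1.astype(np.float64)
    h2 = h2.astype(np.float64)
    return float(np.dot(h1, h2) / (np.linalg.norm(h1) * np.linalg.norm(h2)))

def scene_switch_exists(frame1: np.ndarray, frame2: np.ndarray,
                        second_threshold: float = 0.5) -> bool:
    """The scene switch exists when the second similarity is not
    greater than the second threshold."""
    return not (histogram_similarity(frame1, frame2) > second_threshold)
```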
It should be noted that, in the embodiments of the present disclosure, the number of frames to be interpolated is described by taking 2-fold interpolation as an example. For example, interpolating from 30 fps (frames per second) to 60 fps indicates that the number of frames transmitted per second is increased from 30 frames to 60 frames. When a scene switch or a subtitle switch is detected between two adjacent video frames, no frame interpolation is performed between the current two frames. In order to keep the number of frames consistent, two frames are interpolated in the next frame interpolation. For another example, when the scene switch or the subtitle switch occurs twice consecutively, it results in two unexecuted frame interpolation operations. If only two frames were interpolated in the next frame interpolation, the overall video would have fewer frames.
For example, in order to avoid the occurrence of fewer frames described above, in at least one embodiment of the present disclosure, in addition to steps S101-S103, the method for video frame interpolation processing 10 includes: setting the first frame interpolation flag; and in response to the picture switch existing between the first video frame and the second video frame, modifying the first frame interpolation flag to the second frame interpolation flag.
It should be noted that, in the embodiments of the present disclosure, the terms “first frame interpolation flag”, “second frame interpolation flag” and “third frame interpolation flag” refer to frame interpolation flags at different time points or stages, so as to indicate how many consecutive picture switches exist in the video. The terms “first frame interpolation flag”, “second frame interpolation flag” and “third frame interpolation flag” are not limited to a specific value or a specific order.
For example, in some examples, it is assumed that the video includes a sequence of video frames, for example, the video includes temporally adjacent video frame 1, video frame 2, video frame 3, video frame 4, video frame 5, . . . . For example, in one example, a frame interpolation flag is provided, for example, the frame interpolation flag is initialized as (0, 0). Two adjacent video frames (e.g., the first video frame and the second video frame) are input, assuming that the first video frame is the video frame 2 and the second video frame is the video frame 3. Whether a picture switch (subtitle switch or scene switch) exists between the video frame 2 and the video frame 3 is determined by the method described in the above embodiments. If a picture switch exists between the video frame 2 and the video frame 3, the frame interpolation flag is modified from (0, 0) to (0, 1). For example, in some examples, when it is determined that a picture switch occurs between two adjacent video frames, a value “1” is added to the frame interpolation flag (0, 0), and the previous value “0” is popped out, that is, the updated frame interpolation flag is (0, 1). When it is determined that no picture switch exists between two adjacent video frames, a value “0” is added to the frame interpolation flag (0, 0), and the previous value “0” is popped out, that is, the updated frame interpolation flag is (0, 0).
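The push-and-pop behavior described above can be modeled as a fixed-length queue of the two most recent switch decisions; a minimal sketch, with the two-element length being the example used in this disclosure:

```python
from collections import deque

def make_flag() -> deque:
    """Initialize the frame interpolation flag to (0, 0)."""
    return deque([0, 0], maxlen=2)

def update_flag(flag: deque, picture_switch: bool) -> tuple:
    """Push the newest decision (1 = picture switch, 0 = no switch);
    with maxlen=2, the oldest value is popped out automatically."""
    flag.append(1 if picture_switch else 0)
    return tuple(flag)
```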
For example, in at least one embodiment of the present disclosure, in response to the picture switch existing between the first video frame and the second video frame, the fourth video frame is acquired; the second comparison result between the second video frame and the fourth video frame is acquired based on the second video frame and the fourth video frame; and whether to interpolate a frame between the second video frame and the fourth video frame is determined based on the second comparison result. The fourth video frame and the second video frame are adjacent temporally, and the second video frame is a forward frame of the fourth video frame. The second comparison result indicates whether the picture switch exists between the second video frame and the fourth video frame.
For example, in at least one embodiment of the present disclosure, determining whether to interpolate a frame between the second video frame and the fourth video frame based on the second comparison result includes: in response to the second comparison result indicating that the picture switch does not exist between the second video frame and the fourth video frame, interpolating a plurality of video frames between the second video frame and the fourth video frame. The number of the plurality of video frames is based on the second frame interpolation flag.
For example, in at least one embodiment of the present disclosure, determining whether to interpolate a frame between the second video frame and the fourth video frame based on the second comparison result includes: in response to the second comparison result indicating that the picture switch exists between the second video frame and the fourth video frame, determining not to interpolate a frame between the second video frame and the fourth video frame; and modifying the second frame interpolation flag to the third frame interpolation flag. The third frame interpolation flag is used to indicate the number of frames to be interpolated next.
It should be noted that the term “fourth video frame” is used to refer to the subsequent frame temporally adjacent to the “second video frame”, and the fourth video frame is not limited to a specific frame or a specific order. The term “second comparison result” is used to refer to the comparison result between two adjacent frames (the second video frame and the fourth video frame) in the video, and is not limited to a specific comparison result or a specific order.
For example, in some examples, it is assumed that the video includes a sequence of video frames, for example, the video includes temporally adjacent video frame 1, video frame 2, video frame 3, video frame 4, video frame 5, . . . . Assuming that the first video frame is the video frame 1, the second video frame is the video frame 2, and the fourth video frame is the video frame 3. As illustrated in FIG. 7, if the video frame 1 and the video frame 2 are input, it is determined that a picture switch (subtitle switch or scene switch) exists between the video frame 1 and the video frame 2, in this case, no frame interpolation operation is performed between the video frame 1 and the video frame 2, and the frame interpolation flag is set to (0,1). Then, two adjacent video frames, that is, the video frame 2 and the video frame 3, are input and it is determined whether a picture switch (subtitle switch or scene switch) exists between the video frame 2 and the video frame 3 through the method provided by the above embodiments. For example, if it is determined that no picture switch exists between the video frame 2 and the video frame 3, a frame interpolation operation is performed between the video frame 2 and the video frame 3. In this case, the frame interpolation flag is (0, 1), indicating that a picture switch occurs (i.e., there is no frame interpolation between the video frame 1 and the video frame 2). In order to avoid the problem of fewer frames, it is necessary to interpolate two video frames between the video frame 2 and the video frame 3. For another example, if it is determined that the picture switch still exists between the video frame 2 and the video frame 3, the frame interpolation operation between the video frame 2 and the video frame 3 is not performed. In this case, the frame interpolation flag is modified from (0,1) to (1,1). For example, a value “1” is added to the frame interpolation flag (0,1), and the previous value “0” is popped out. 
The frame interpolation flag (1,1) indicates that there are two consecutive picture switches in the video frame sequence. For example, a picture switch exists between the video frame 1 and the video frame 2, and a picture switch still exists between the video frame 2 and the video frame 3. For example, continue to compare the video frame 3 and the video frame 4 through the similar operation. If no picture switch exists between the video frame 3 and the video frame 4, the frame interpolation operation can be performed. In order to avoid the problem of fewer frames, based on the frame interpolation flag (1,1), it can be seen that 3 video frames need to be interpolated between the video frame 3 and the video frame 4. Thus, the overall integrity of the video after frame interpolation is guaranteed.
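Under 2-fold interpolation, the number of frames to insert at the next opportunity can be read directly off the flag: one regular frame plus one make-up frame per recorded switch. A minimal sketch:

```python
def frames_to_interpolate(flag: tuple) -> int:
    """For 2-fold interpolation: (0, 0) -> 1 frame, (0, 1) -> 2 frames,
    (1, 1) -> 3 frames, matching the examples above."""
    return 1 + sum(flag)
```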
It should be noted that in the practical application, it is rare for the picture switch to occur in several consecutive adjacent video frames. Therefore, the above embodiments of the present disclosure take up to 2 consecutive picture switches as an example, and initialize the frame interpolation flag to (0,0). The embodiments of the present disclosure do not limit this, and it may be set according to actual needs.
For example, in at least one embodiment of the present disclosure, the method for video frame interpolation processing 10 further includes the following steps S401-S403, as illustrated in
S401: in response to interpolating the third video frame between the first video frame and the second video frame, acquiring the first sub-image of the first video frame. The first sub-image corresponds to the first subtitle content of the first video frame.
S402: acquiring the third sub-image of the third video frame. The third sub-image corresponds to the third subtitle content of the third video frame.
S403: determining whether to replace the third video frame with the first video frame based on the first sub-image and the third sub-image.
For example, in at least one embodiment of the present disclosure, step S403 includes: acquiring a pixel value of the first pixel in the first sub-image; setting a pixel value of the third pixel in the third sub image based on the pixel value of the first pixel in the first sub-image; and determining whether to replace the third video frame with the first video frame based on the first sub-image and the third sub-image after being set. The pixel value of the first pixel is greater than the third threshold. The relative position of the third pixel in the third sub-image is identical to the relative position of the first pixel in the first sub-image.
For example, in the embodiments of the present disclosure, the relative position of the third pixel in the third sub-image and the relative position of the first pixel in the first sub-image are the same. It can be understood that, for example, taking the top left corner vertex of the first sub-image as the coordinate origin, the position coordinates of the first pixel in this coordinate system are the same as the position coordinates of the third pixel in the coordinate system taking the top left corner vertex of the third sub-image as the coordinate origin.
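The pixel assignment of step S403 can be sketched as follows. The default third threshold of 220 is the example value discussed below; the function name and the grayscale, equally sized sub-images are illustrative assumptions.

```python
import numpy as np

def assign_subtitle_pixels(sub1: np.ndarray, sub3: np.ndarray,
                           third_threshold: int = 220) -> np.ndarray:
    """Copy every first pixel of the first sub-image whose value is
    greater than the third threshold into the third pixel at the same
    relative position of the third sub-image."""
    assert sub1.shape == sub3.shape
    out = sub3.copy()
    mask = sub1 > third_threshold   # bright pixels, likely the subtitle
    out[mask] = sub1[mask]
    return out
```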
Based on the detailed description associated with
For example, in some examples, when interpolating a third video frame between the first video frame and the second video frame, in order to improve the accuracy of frame interpolation, it can be determined whether the subtitle of the first video frame and the subtitle of the third video frame are the same, that is, whether the subtitle switch occurs, as illustrated in
For example, in some examples, because the color of the subtitle usually remains stable, for example, most subtitles are white, it is possible to select the pixel (i.e., the first pixel) in the first sub-image of the first video frame (i.e., the region corresponding to the recognized coordinate C0) whose pixel value is greater than a certain threshold (i.e., the third threshold). For example, the third threshold is set to 220, and the pixel value range is generally 0-255. The value of the first pixel is assigned to the pixel located at the same position as the first pixel (i.e., the third pixel) in the third sub-image (i.e., the region corresponding to the recognized coordinate Ct). For example, in
For example, in at least one embodiment of the present disclosure, comparing the first sub-image and the assigned third sub-image includes computing the difference between the pixel values of each pair of corresponding pixels of the first sub-image and the assigned third sub-image, and determining whether the number of pixels for which the absolute value of the pixel difference exceeds a certain threshold (e.g., 150) is greater than another threshold (e.g., 30). If the number of pixels for which the absolute value of the pixel difference exceeds 150 is greater than 30, it is considered that there is a significant deformation in the subtitle of the interpolated third video frame. The first video frame is directly copied to replace the deformed interpolation frame (i.e., the third video frame). Of course, the second video frame can also be used to replace the deformed interpolation frame (i.e., the third video frame), and the embodiments of the present disclosure do not limit this. In this way, the deformation problem caused by the significant movement of the subtitle background can be avoided.
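The deformation check just described can be sketched as follows, with the thresholds 150 and 30 taken from the example above; the function name is an illustrative choice.

```python
import numpy as np

def subtitle_deformed(sub1: np.ndarray, sub3_assigned: np.ndarray,
                      diff_threshold: int = 150,
                      count_threshold: int = 30) -> bool:
    """Return True when the interpolated subtitle shows significant
    deformation, i.e., when more than `count_threshold` pixels differ
    from the first sub-image by more than `diff_threshold`; the caller
    then replaces the interpolated frame with a copy of the first
    (or second) video frame."""
    # Widen the dtype so the subtraction cannot wrap around.
    diff = np.abs(sub1.astype(np.int64) - sub3_assigned.astype(np.int64))
    return int((diff > diff_threshold).sum()) > count_threshold
```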
As illustrated in
Therefore, the method for video frame interpolation processing 10 provided by at least one embodiment of the present disclosure can solve the obvious deformation problem caused by the video picture switch and the significant movement of the subtitle background during the frame interpolation processing, ensure the smoothness of the video, and thereby improve the user's viewing experience.
It should also be noted that in the various embodiments of the present disclosure, the execution order of each step of the video frame interpolation processing method 10 is not limited. Although the execution process of each step is described in a specific order above, this does not constitute a limitation on the embodiments of the present disclosure. The various steps in the video frame interpolation processing method 10 can be executed serially or in parallel, according to actual needs. For example, the video frame interpolation processing method 10 may also include more or fewer steps, and the embodiments of the present disclosure do not limit this.
At least one embodiment of the present disclosure also provides an apparatus for video frame interpolation processing, in which the frame interpolation operation is selectively performed according to the comparison result between adjacent video frames. This effectively avoids the obvious deformation problem caused by the video picture switch during the frame interpolation processing, ensures the smoothness of the video, and thereby improves the user's viewing experience.
For example, in at least one embodiment of the present disclosure, as illustrated in
For example, in at least one embodiment of the present disclosure, the acquisition module 801 is configured to acquire the first video frame and the second video frame of the video. The first video frame and the second video frame are adjacent temporally, and the first video frame is a forward frame of the second video frame. For example, the acquisition module 801 can implement the step S101, and the specific implementation method can refer to the relevant description of the step S101, which will not be repeated here.
For example, in at least one embodiment of the present disclosure, the comparison module 802 is configured to acquire the first comparison result between the first video frame and the second video frame based on the first video frame and the second video frame. The first comparison result indicates whether a picture switch exists between the first video frame and the second video frame. For example, the comparison module 802 can implement the step S102, and the specific implementation method can refer to the relevant description of the step S102, which will not be repeated here.
For example, in at least one embodiment of the present disclosure, the operation module 803 is configured to determine whether to interpolate a frame between the first video frame and the second video frame based on the first comparison result. For example, the operation module 803 can implement the step S103, and the specific implementation method can refer to the relevant description of the step S103, which will not be repeated here.
It should be noted that the acquisition module 801, the comparison module 802, and the operation module 803 can be implemented through software, hardware, firmware, or any combination of them. For example, they can be respectively implemented as an acquisition circuit 801, a comparison circuit 802, and an operation circuit 803. The embodiments of the present disclosure do not limit the specific implementations of them.
It should be understood that the apparatus for video frame interpolation processing 80 provided in the embodiments of the present disclosure may implement the aforementioned method for video frame interpolation processing 10, and may also achieve technical effects similar to the aforementioned method for video frame interpolation processing 10, which will not be repeated here.
It should be noted that in the embodiments of the present disclosure, the video frame interpolation processing apparatus 80 may include more or fewer circuits or units, and the connection relationship between each circuit or unit is not limited and can be determined according to actual needs. The specific composition of each circuit is not limited, and can be composed of analog devices based on circuit principles, digital chips, or other applicable methods.
At least one embodiment of the present disclosure also provides an apparatus for video frame interpolation processing 90. As illustrated in
For example, the processor 910 may be a central processing unit (CPU), a digital signal processor (DSP), or other forms of processing unit with data processing and/or program execution capabilities, such as a field programmable gate array (FPGA). For example, the central processing unit (CPU) can be an X86 or ARM architecture. The processor 910 can be a general-purpose processor or a specialized processor, which can control other components in the video frame interpolation processing apparatus 90 to perform the desired function.
For example, the memory 920 may include any combination of one or more computer program products, which may include various forms of computer-readable storage medium, such as a volatile memory and/or a non-volatile memory. The volatile memory may include, for example, a random-access memory (RAM) and/or a cache memory. The non-volatile memory may include, for example, a read-only memory (ROM), a hard disk, an erasable programmable read-only memory (EPROM), a portable compact disc read-only memory (CD-ROM), a USB memory, a flash memory, and the like. One or more computer program modules 921 can be stored on a computer-readable storage medium, and the processor 910 can run one or more computer program modules 921 to implement various functions of the video frame interpolation processing apparatus 90. In the computer-readable storage medium, various applications and data, as well as various data used and/or generated by applications, can also be stored. The specific functions and technical effects of the video frame interpolation processing apparatus 90 can be referred to the description of the video frame interpolation processing method 10 mentioned above, and will not be repeated here.
The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a laptop, a digital broadcasting receiver, a PDA (personal digital assistant), a PAD (portable Android device), a PMP (portable multimedia player), and a car terminal (e.g., a car navigation terminal), and a fixed terminal such as a digital TV, a desktop computer, and the like. The video frame interpolation processing apparatus 300 illustrated in
For example, as illustrated in
For example, the following components may be connected to the I/O interface 305: an input apparatus 306 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, and the like; an output apparatus 307 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, and the like; a storage apparatus 308 including, for example, a magnetic tape, a hard disk, and the like; and a communication apparatus 309 including, for example, a network interface card such as a LAN card, a modem, and the like. The communication apparatus 309 may allow the video frame interpolation processing apparatus 300 to communicate with other devices through a wired or wireless method to exchange data, and to perform communication processing through a network such as the Internet. A drive 310 is also connected to the I/O interface 305 as needed. A removable medium 311, such as a magnetic disk, a CD-ROM, a semiconductor memory, and the like, is mounted to the drive 310 as needed to allow computer programs read therefrom to be mounted into the storage apparatus 308 as needed. Although
For example, the video frame interpolation processing apparatus 300 may further include a peripheral interface (not shown), and the like. The peripheral interface may be of various types, such as a USB interface, a Lightning interface, and the like. The communication apparatus 309 may communicate with networks and other devices through wireless communication, such as the Internet, an internal network, and/or a wireless network such as a cellular telephone network, a wireless local area network (LAN), and/or a metropolitan area network (MAN). The wireless communication may use any of various communication standards, protocols, and technologies, including but not limited to the global system for mobile communications (GSM), enhanced data GSM environment (EDGE), wideband code division multiple access (W-CDMA), code division multiple access (CDMA), time division multiple access (TDMA), Bluetooth, Wi-Fi (e.g., based on the IEEE 802.11a, IEEE 802.11b, IEEE 802.11g, and/or IEEE 802.11n standards), voice over internet protocol (VoIP), Wi-MAX, protocols for e-mail, instant messaging, and/or short message service (SMS), or any other suitable communication protocol.
For example, the video frame interpolation processing apparatus 300 can be any device such as a cell phone, a tablet computer, a laptop computer, an e-book, a game console, a television, a digital photo frame, a navigator, and the like, or any combination of the data processing device and hardware, and the embodiments of the present disclosure do not limit this.
For example, according to the embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, the embodiments of the present disclosure include a computer program product that includes a computer program carried on a non-transient computer-readable medium, which includes program code for executing the method illustrated in the flowchart. In such embodiments, the computer program may be downloaded and installed from the network through the communication apparatus 309, or installed from the storage apparatus 308, or installed from ROM 302. When the computer program is executed by the processing apparatus 301, the video frame interpolation processing method 10 in the embodiments of the present disclosure is executed.
It should be noted that the computer-readable medium mentioned in the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination thereof. The computer-readable storage medium can be, for example, but not limited to, a system, a device, or an apparatus of electricity, magnetism, light, electromagnetism, infrared, or semiconductors, or a combination of any of the above. More specific examples of computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard drive, a random-access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device or any suitable combination of the above. In the embodiments of the present disclosure, the computer-readable storage medium may be any tangible medium containing or storing a program, which may be used by or in combination with an instruction execution system, apparatus, or device. In the embodiments of the present disclosure, the computer-readable signal medium may include a data signal propagated in the baseband or as part of a carrier carrying the computer-readable program code. Such propagated data signals may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. The computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. 
The program code contained on the computer-readable medium may be transmitted using any suitable medium, including but not limited to: a wire, an optical cable, RF (radio frequency), and the like, or any suitable combination of the above.
The computer-readable medium mentioned above may be included in the video frame interpolation processing apparatus 300, or may exist separately without being assembled into the video frame interpolation processing apparatus 300.
The embodiments of the present disclosure also provide a non-instantaneous readable storage medium.
For example, the non-instantaneous readable storage medium 140 may be any combination of one or more computer-readable storage media. For example, one computer-readable storage medium contains computer-readable program code for acquiring the first video frame and the second video frame of the video, another computer-readable storage medium contains computer-readable program code for acquiring the first comparison result between the first video frame and the second video frame based on the first video frame and the second video frame, and still another computer-readable storage medium contains computer-readable program code for determining whether to interpolate a frame between the first video frame and the second video frame based on the first comparison result. Of course, the above program codes may also be stored on the same computer-readable medium, and the embodiments of the present disclosure do not limit this.
For example, when the program code is read by a computer, the computer may execute the program code stored in the computer-readable storage medium to perform, for example, the method for video frame interpolation processing 10 provided in any embodiment of the present disclosure.
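As an illustration only, the three program-code steps described above (acquire two temporally adjacent frames, compare them, and decide whether to interpolate) may be sketched as follows. This is a hypothetical, simplified sketch: the function names, the mean-absolute-difference comparison, the threshold value, and the midpoint blend are all illustrative stand-ins chosen for this example, not the comparison or interpolation techniques of the present disclosure.

```python
def picture_switch_exists(frame_a, frame_b, threshold=30.0):
    """Illustrative comparison step: treat a large mean absolute
    pixel difference between the two frames as a picture switch
    (e.g. a scene cut). Frames are flat lists of pixel values."""
    diff = sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)
    return diff > threshold

def maybe_interpolate(frame_a, frame_b):
    """Decision step: interpolate an intermediate frame only when
    no picture switch is detected; otherwise return None so the
    caller skips frame interpolation across the switch."""
    if picture_switch_exists(frame_a, frame_b):
        return None
    # Naive midpoint blend as a placeholder for a real interpolation model.
    return [(a + b) / 2 for a, b in zip(frame_a, frame_b)]

# Two nearly identical frames: an intermediate frame is synthesized.
static = maybe_interpolate([10, 10, 10], [12, 10, 11])
# Two very different frames (a cut): interpolation is skipped.
cut = maybe_interpolate([10, 10, 10], [200, 190, 210])
```

The sketch captures only the control flow of the claimed method: the comparison result gates whether an intermediate frame is produced at all.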
For example, the storage medium may include a storage component of a smart phone or a tablet computer, a hard disk of a personal computer, a random-access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a portable compact disk read-only memory (CD-ROM), a flash memory, any combination of the above storage media, or other applicable storage media. For example, the readable storage medium may also be the memory 920 of
Embodiments of the present disclosure also provide an electronic device.
In the present disclosure, the term “plurality” refers to two or more, unless otherwise specified.
After considering the specification and practicing the disclosure herein, those skilled in the art will easily conceive of other implementations of the present disclosure. The present disclosure is intended to cover any variations, uses, or adaptive changes of the present disclosure, which follow the general principles of the present disclosure and include common knowledge or customary technical means in the technical field that are not disclosed in the present disclosure. The specification and embodiments are to be considered exemplary only, with the true scope and spirit of the present disclosure being indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise structure described above and illustrated in the drawings, and various modifications and changes can be made without departing from its scope. The scope of the present disclosure is limited only by the accompanying claims.
Number | Date | Country | Kind |
---|---|---|---|
202210178989.X | Feb 2022 | CN | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2023/077905 | 2/23/2023 | WO |