The present disclosure relates to the technical field of dynamic image processing, and more particularly, to a method and electronic device for processing dynamic image.
After many mobile device manufacturers introduced new image media formats such as Zoe and LivePhoto, it is very likely that dynamic image formats will replace existing static image formats and become the next important competitive segment in the field of mobile device technology innovation. An existing dynamic image records only the image information within the shooting scope and purely records the original digital media signal, without considering the sound content information of the shooting scenario; therefore, in the field of dynamic image format processing, there is much room to improve the user experience.
The present disclosure provides a method for processing a dynamic image and an electronic device thereof, so as to solve the technical problem that an existing dynamic image records only image information within a shooting scope and purely records an original digital media signal without considering sound content information under a shooting scenario.
In a first aspect, embodiments of the present disclosure provide a method for processing a dynamic image, the method including:
taking a dynamic image, and recording a sound during a process of taking the dynamic image;
extracting a sound print feature from recorded sound information; and
writing an extracted sound print feature into the dynamic image, and performing sound print tagging on the dynamic image.
In a second aspect, embodiments of the present disclosure provide a non-volatile computer storage medium which stores computer executable instructions, wherein the computer executable instructions are executed to:
take a dynamic image according to an image taking instruction, and record a sound during a process of taking the dynamic image according to the image taking instruction;
extract a sound print feature from recorded sound information according to a sound print extraction instruction; and
write an extracted sound print feature into the dynamic image according to a sound print tagging instruction, and perform sound print tagging on the dynamic image.
In a third aspect, embodiments of the present disclosure further provide an electronic device, configured to perform the method for processing a dynamic image, including:
at least one processor; and
a memory communicably connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, wherein the instructions, when executed by the at least one processor, cause the at least one processor to:
take a dynamic image according to an image taking instruction, and record a sound during a process of taking the dynamic image according to the image taking instruction;
extract a sound print feature from recorded sound information according to a sound print extraction instruction; and
write an extracted sound print feature into the dynamic image according to a sound print tagging instruction, and perform sound print tagging on the dynamic image.
One or more embodiments are illustrated by way of example, and not by limitation, in the figures of the accompanying drawings, wherein elements having the same reference numeral designations represent like elements throughout. The drawings are not to scale, unless otherwise disclosed.
In order to make the present disclosure easy to understand, the following describes the present disclosure more fully with reference to the relevant accompanying drawings, which show preferred embodiments of the present disclosure. However, the present disclosure may be implemented in multiple manners and is not limited to the embodiments described herein; on the contrary, these embodiments are provided so that the present disclosure will be understood more thoroughly and completely.
Unless otherwise specified, all technical and scientific terms used herein have the same meanings as commonly understood by a person skilled in the technical field of the present disclosure. The terms used in the description of the present disclosure are merely intended to describe specific embodiments of the present disclosure, and are not intended to limit the present disclosure.
Referring to the accompanying drawings, a method for processing a dynamic image according to an embodiment of the present disclosure includes the following steps.
Step 100: launching a dynamic image shooting function and starting to take a dynamic image.
Step 200: launching a sound recording function to record a sound during the process of taking the dynamic image, and storing the dynamic image that has been taken and the recorded sound information.
In step 200, according to this embodiment of the present disclosure, the dynamic image is stored in the form of Thumbnail + MOV, where the image comes from Preview data of a camera, multiple frames of image data are encoded to generate the MOV, and the image at the center of the time axis is cut out as the Thumbnail. The MOV format (QuickTime File Format, an audio and video file format developed by Apple and used to store common digital media types) recorded by default contains a 4-second video together with its sound source, and the recorded sound information includes a voice, an ambient sound, or a noise.
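For illustration, the following minimal Python sketch shows the center-frame selection described above; the function name is illustrative, and the actual MOV encoding requires a media codec and is deliberately left out of the sketch.

```python
def split_dynamic_image(preview_frames):
    """Given the camera Preview frames of one dynamic image, pick the
    frame at the center of the time axis as the Thumbnail; the full
    frame sequence is what would be encoded into the MOV."""
    if not preview_frames:
        raise ValueError("no preview frames captured")
    thumbnail = preview_frames[len(preview_frames) // 2]
    return thumbnail, preview_frames  # (Thumbnail, frames to encode as MOV)
```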
Step 300: performing, by a sound print extracting module, sound print feature extraction on the recorded sound information that has been stored, and storing a sound print feature that has been extracted.
In step 300, according to this embodiment of the present disclosure, a special segment of media information is used to store the sound print feature. Specifically, as shown in the accompanying drawings, the sound print feature extraction includes the following steps.
Step 301: endpoint checking: checking for valid sound source data entry.
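The disclosure does not fix the endpoint checking algorithm; as one possible sketch, short-time energy per frame can be compared against a fraction of the peak frame energy, where the frame length and threshold ratio below are assumptions.

```python
import numpy as np

def detect_endpoints(samples, frame_len=1024, energy_ratio=0.1):
    # Short-time energy endpoint check: a frame counts as valid sound
    # source data when its energy clears a fraction of the peak energy.
    n = len(samples) // frame_len
    if n == 0:
        return None
    frames = np.asarray(samples[:n * frame_len], dtype=np.float64)
    frames = frames.reshape(n, frame_len)
    energy = (frames ** 2).sum(axis=1)
    active = np.where(energy > energy_ratio * energy.max())[0]
    if active.size == 0:
        return None  # no valid sound source data entered
    return active[0] * frame_len, (active[-1] + 1) * frame_len
```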
Step 302: pre-emphasis: performing differential and filtering processing on entered sound source data.
In step 302, an algorithm formula for pre-emphasis filtering is:
H(z)=1−μz⁻¹ (1)
where μ is the pre-emphasis coefficient, typically close to 1.
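A minimal sketch of formula (1) applied in the time domain, y(n)=x(n)−μ·x(n−1); the default coefficient value 0.97 is a common choice and an assumption here, as the disclosure does not specify it.

```python
import numpy as np

def pre_emphasis(samples, mu=0.97):
    """Differential filtering with H(z) = 1 - mu * z^(-1):
    y[n] = x[n] - mu * x[n-1]."""
    x = np.asarray(samples, dtype=np.float64)
    return np.append(x[0], x[1:] - mu * x[:-1])
```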
Step 303: audio framing: performing discretization processing on a streaming sound source.
In step 303, in order to retain some detail features of the sound source, especially the special sound quality of some environment scenarios, and in consideration of the volume of data to be processed, a 1-channel 44100 Hz sampling standard is selected in the present disclosure. According to a common rule for audio processing, the duration of an audio frame is controlled at about 20-30 ms; therefore, the number of sampling points for a single audio frame may be set to 1024, which corresponds to a duration of 1024÷44100×1000 ≈ 23.2 ms.
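The framing step can be sketched as follows for the 1-channel 44100 Hz standard with 1024 sampling points per frame; the non-overlapping frame layout is an assumption of the sketch (practical pipelines often overlap adjacent frames).

```python
import numpy as np

SAMPLE_RATE = 44100   # 1 channel, 44100 Hz sampling standard
FRAME_LEN = 1024      # sampling points per frame, about 23.2 ms

def frame_audio(samples):
    """Discretize a streaming sound source into fixed-length frames;
    trailing samples that do not fill a whole frame are dropped."""
    n = len(samples) // FRAME_LEN
    return np.asarray(samples[:n * FRAME_LEN]).reshape(n, FRAME_LEN)
```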
Step 304: windowing processing: performing windowing processing on frame data by selecting a common Hamming window.
In step 304, after Hamming windowing is performed on each frame of audio data S(n) on which framing processing has been performed, data S′(n)=S(n)×W(n) is obtained, where W(n) is the Hamming window function of the following form:
W(n)=0.54−0.46×cos(2πn/(N−1)), 0≤n≤N−1
where N is the number of sampling points in a frame.
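A sketch of the windowing step using numpy's built-in Hamming window, which has exactly the form of W(n) above.

```python
import numpy as np

def apply_hamming(frames):
    """Compute S'(n) = S(n) x W(n) for every frame; np.hamming(N)
    returns 0.54 - 0.46*cos(2*pi*n/(N-1)) for n = 0..N-1."""
    window = np.hamming(frames.shape[-1])
    return frames * window
```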
Step 305: FFT (Fast Fourier Transformation): converting a time domain sound source into frequency domain energy.
In step 305, the time domain sound source is converted into frequency domain data by using Fast Fourier Transformation, where the conversion formula is:
X(k) = Σ (n=0 to N−1) x(n)·e^(−j2πnk/N), k = 0, 1, …, N−1
where x(n) is the windowed frame data and N is the number of sampling points in a frame.
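A sketch of the conversion into frequency domain energy; the per-frame normalization by N is one common convention for the power spectrum, not a requirement of the disclosure.

```python
import numpy as np

def power_spectrum(windowed_frames, n_fft=1024):
    """Apply the FFT per frame and return frequency domain energy
    |X(k)|^2 / N for k = 0 .. N/2 (real input, one-sided spectrum)."""
    spectrum = np.fft.rfft(windowed_frames, n=n_fft)
    return (np.abs(spectrum) ** 2) / n_fft
```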
Step 306: performing band-pass filtering and sound print feature extraction on a sound source.
In step 306, filtering and sound print feature extraction are performed by using a specific filter and a specific extraction algorithm with respect to the different sound source features required for analysis. For example, for a voice feature, a triangular band-pass filter + DCT may be used to extract MFCC coefficient features; and for an ambient sound, a logarithmic filter + wavelet transformation may be used to extract a Jaccard coefficient bit feature.
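For the voice branch, the following compressed sketch follows one common MFCC recipe (mel-spaced triangular band-pass filters, logarithm, then DCT); the filter count and number of retained coefficients are typical defaults assumed here, as is the use of scipy for the DCT.

```python
import numpy as np
from scipy.fftpack import dct

def mel_filterbank(num_filters, n_fft, sample_rate):
    # Triangular band-pass filters spaced evenly on the mel scale
    # between 0 Hz and the Nyquist frequency.
    high_mel = 2595.0 * np.log10(1.0 + (sample_rate / 2.0) / 700.0)
    mel_points = np.linspace(0.0, high_mel, num_filters + 2)
    hz_points = 700.0 * (10.0 ** (mel_points / 2595.0) - 1.0)
    bins = np.floor((n_fft + 1) * hz_points / sample_rate).astype(int)
    fbank = np.zeros((num_filters, n_fft // 2 + 1))
    for m in range(1, num_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def mfcc(power_frames, sample_rate=44100, num_filters=26, num_coeffs=13):
    # power_frames: per-frame frequency domain energy of shape
    # (num_frames, n_fft // 2 + 1), e.g. the output of step 305.
    n_fft = (power_frames.shape[-1] - 1) * 2
    fbank = mel_filterbank(num_filters, n_fft, sample_rate)
    energies = power_frames @ fbank.T               # band-pass filtering
    log_energies = np.log(np.maximum(energies, 1e-10))
    return dct(log_energies, type=2, axis=-1, norm="ortho")[:, :num_coeffs]
```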
Step 400: reading the dynamic image that has been stored, writing the extracted sound print feature in serialized form into a specified file data node of the dynamic image, and performing sound print tagging on the dynamic image.
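The writing step can be sketched as appending a QuickTime-style atom (a 4-byte big-endian size including the 8-byte header, a 4-byte type, then the payload) to the stored file; the custom atom type "sprt" and the JSON payload are illustrative assumptions, since the disclosure only specifies writing into a specified file data node.

```python
import json
import struct

def tag_dynamic_image(path, sound_print_feature):
    """Serialize the extracted sound print feature and append it to the
    dynamic image file as one atom-shaped data node."""
    payload = json.dumps(sound_print_feature).encode("utf-8")
    atom = struct.pack(">I", 8 + len(payload)) + b"sprt" + payload
    with open(path, "ab") as f:
        f.write(atom)
```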
Step 500: classifying and storing, according to the sound print feature, the dynamic image on which sound print tagging has been performed.
In step 500, a classifying manner for classifying, according to the sound print feature, the dynamic image on which sound print tagging has been performed includes voice feature classification, ambient sound feature classification, or noise feature classification.
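As a sketch of such a classifying manner (the tag field names and values below are assumptions, not specified by the disclosure):

```python
def classify_tagged_image(sound_print_tag):
    """Map a sound print tag onto one of the three classifying manners
    named above; unrecognized tags fall into a catch-all category."""
    categories = {
        "voice": "voice feature classification",
        "ambient": "ambient sound feature classification",
        "noise": "noise feature classification",
    }
    return categories.get(sound_print_tag.get("type"), "unclassified")
```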
Step 600: performing retrieval by voice input or category search, so as to quickly retrieve a dynamic image having a specific sound print feature.
In step 600, a voice feature may be quickly indexed directly by performing recognition according to the similarity of the input voice; a more complex ambient sound feature or noise feature, as well as other sound features, shall be classified according to attributes such as the sound-making object, the scenario location, and the sound strength, and searched by category.
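A sketch of the voice-similarity branch, ranking stored voice sound print features (for example, per-image mean MFCC vectors) by cosine similarity to the feature of the input voice; the use of cosine similarity is an assumption, as the disclosure does not fix the similarity measure.

```python
import numpy as np

def retrieve_by_voice(query_feature, tagged_features, top_k=5):
    """Return the ids of the top_k dynamic images whose voice sound
    print features are most similar to the query feature."""
    q = np.asarray(query_feature, dtype=np.float64)
    q /= np.linalg.norm(q)
    scored = []
    for image_id, feature in tagged_features.items():
        f = np.asarray(feature, dtype=np.float64)
        f /= np.linalg.norm(f)
        scored.append((float(q @ f), image_id))
    scored.sort(reverse=True)
    return [image_id for _, image_id in scored[:top_k]]
```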
Referring to the accompanying drawings, an embodiment of the present disclosure further provides an apparatus for processing a dynamic image, which includes an image taking module, a sound recording module, a storage module, a sound print extracting module, a sound print tagging module, a classifying module, and a retrieving module; wherein:
the image taking module is configured to take a dynamic image;
the sound recording module is configured to record a sound during a process of taking the dynamic image;
the storage module is configured to store the dynamic image that has been taken and the recorded sound information; and
the sound print extracting module is configured to extract a sound print feature from recorded sound information, and store the sound print feature that has been extracted. Specifically, the sound print extracting module further includes an endpoint checking unit, a pre-emphasis unit, an audio framing unit, a windowing unit, an audio source conversion unit, and a filtering unit; wherein:
the endpoint checking unit is configured to check for valid sound source data entry;
the pre-emphasis unit is configured to perform differential and filtering processing on entered sound source data, where an algorithm formula for pre-emphasis filtering is:
H(z)=1−μz⁻¹ (1)
where μ is the pre-emphasis coefficient, typically close to 1;
the audio framing unit is configured to perform discretization processing on a streaming sound source. In order to retain some detail features of the sound source, especially the special sound quality of some environment scenarios, and in consideration of the volume of data to be processed, a 1-channel 44100 Hz sampling standard is selected in the present disclosure. According to a common rule for audio processing, the duration of an audio frame is controlled at about 20-30 ms; therefore, the number of sampling points for a single audio frame may be set to 1024, which corresponds to a duration of 1024÷44100×1000 ≈ 23.2 ms.
the windowing unit is configured to perform windowing processing on frame data by using a Hamming window. After Hamming windowing is performed on each frame of audio data S(n) on which framing processing has been performed, data S′(n)=S(n)×W(n) is obtained, where W(n) is the Hamming window function of the following form:
W(n)=0.54−0.46×cos(2πn/(N−1)), 0≤n≤N−1.
the audio source conversion unit is configured to convert a time domain sound source into frequency domain energy by using FFT. The time domain sound source is converted into frequency domain data by using Fast Fourier Transformation, where the conversion formula is:
X(k) = Σ (n=0 to N−1) x(n)·e^(−j2πnk/N), k = 0, 1, …, N−1.
the filtering unit is configured to perform band-pass filtering and sound print feature extraction on a sound source. Filtering and sound print feature extraction are performed by using a specific filter and a specific extraction algorithm with respect to the different sound source features required for analysis. For example, for a voice feature, a triangular band-pass filter + DCT may be used to extract MFCC coefficient features; and for an ambient sound, a logarithmic filter + wavelet transformation may be used to extract a Jaccard coefficient bit feature.
The sound print tagging module is configured to read the dynamic image that has been stored, write the extracted sound print feature in serialization into a specified file data node of the dynamic image, and perform sound print tagging on the dynamic image.
The classifying module is configured to classify and store, according to the sound print feature, the dynamic image on which sound print tagging has been performed. A classifying manner for classifying, according to the sound print feature, the dynamic image on which sound print tagging has been performed includes voice feature classification, ambient sound feature classification, or noise feature classification.
The retrieving module is configured to perform retrieval by voice input or category search, so as to quickly retrieve a dynamic image having a specific sound print feature. A voice feature may be quickly indexed directly by performing recognition according to the similarity of the input voice; a more complex ambient sound feature or noise feature, as well as other sound features, shall be classified according to attributes such as the sound-making object, the scenario location, and the sound strength, and searched by category.
An embodiment of the present disclosure provides a non-volatile computer storage medium, wherein the computer storage medium stores computer executable instructions, which may be executed to perform the method for processing a dynamic image according to any one of the above method embodiments.
Referring to the accompanying drawings, an embodiment of the present disclosure further provides an electronic device for performing the method for processing a dynamic image.
The electronic device includes at least one processor and a memory.
The electronic device for performing the method for processing a dynamic image may further include an image taking apparatus and a sound recording apparatus. The image taking apparatus may be configured to take a dynamic image, and the sound recording apparatus may be configured to record sound information such as a voice, an ambient sound, or a noise.
The processor, the memory, the image taking apparatus and the sound recording apparatus may be connected to each other via a bus or in another manner.
The memory, as a non-volatile computer readable storage medium, may be configured to store non-volatile software programs, non-volatile computer executable programs and modules, for example, the program instructions/modules corresponding to the methods for processing a dynamic image in the embodiments of the present disclosure (for example, the sound print tagging module, the classifying module, the retrieving module, and the like described above).
The memory may also include a program storage area and a data storage area. The program storage area may store an operating system and an application implementing at least one function. The data storage area may store the dynamic image that has been taken and the recorded sound information. In addition, the memory may include a high speed random access memory, or include a non-volatile memory, for example, at least one disk storage device, a flash memory device, or another non-volatile solid-state storage device. In some embodiments, the memory optionally includes memories remotely configured relative to the processor, and these memories may be connected to the electronic device over a network. Examples of the network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The one or more modules are stored in the memory, and when being executed by the one or more processors, perform the method for processing a dynamic image in any of the above method embodiments.
The product may perform the method according to the embodiments of the present disclosure, has corresponding function modules for performing the method, and achieves the corresponding beneficial effects.
In the method for processing a dynamic image and the electronic device thereof according to embodiments of the present disclosure, real-time calculation is performed and a sound print feature of the scenario in which a dynamic image is taken is extracted; the sound print feature is written into the dynamic image to implement sound print tagging on the dynamic image; and the dynamic image is classified according to the sound print feature, so that the dynamic image can be retrieved by category and quickly matched and queried based on the sound print feature, which makes it more efficient and intuitive for users to retrieve an image.
The above embodiments are preferred embodiments of the present disclosure; however, embodiments of the present disclosure are not limited to the above embodiments. Any other change, amendment, alteration, combination, or simplification made without departing from the spirit and principle of the present disclosure shall be an equivalent replacement, and shall fall within the protection scope of the present disclosure.
This application is a continuation of International Application No. PCT/CN2016/088859, filed on Jul. 6, 2016, which is based upon and claims priority to Chinese Patent Application No. 201610196491.0, filed on Mar. 31, 2016, the entire contents of which are incorporated herein by reference.
Related U.S. Application Data: parent application PCT/CN2016/088859, filed Jul. 2016; child application Ser. No. 15245743.