An example embodiment of the present invention relates generally to cinemagraphs, and more particularly to providing audio enabled cinemagraphs.
Cinemagraphs are animated photographs where a part of the image moves repeatedly. Cinemagraphs can be created by automated programs, such as the Nokia Lumia 920 Cinemagraph Lens Application, where a user starts the cinemagraph lens, records a scene for a moment and then chooses which area of the video is animated. Current cinemagraphs do not provide audio.
Methods, apparatuses, and computer program products are provided according to example embodiments of the present invention in order to create optimized audio enabled cinemagraphs.
In one embodiment, a method is provided that at least includes receiving at least two image frames and audio, wherein the duration of the audio is longer than the duration of the at least two image frames; receiving a selection of a segment of the at least two image frames; defining an output image by looping the selected segment of the at least two image frames; defining an output audio from the received audio based at least on a start time and a stop time of the selected segment; and producing an animated image by at least combining the output image and the output audio.
In some embodiments, receiving the at least two image frames and audio may comprise causing recording of image frames and audio and/or receiving previously recorded image frames and audio. In some embodiments, receiving the at least two image frames and audio may comprise recording of at least one audio signal and recording of at least two image frames, wherein the recording of the at least one audio signal begins before and ends after the recording of the at least two image frames.
In some embodiments, the selection of the segment of the at least two image frames may comprise one of: automatically selecting a whole image comprising the at least two frames, receiving a selection of a whole image comprising the at least two frames, or receiving a selection of at least one region of a whole image comprising the at least two frames for generating a dynamic region and a selection of a region of the whole image for generating a substantially static region.
In some embodiments, the duration of the output audio may be an integer multiple of the duration of the output image. In some embodiments, generating the animated image may further comprise overlapping multiple instances of the output audio by a specified duration. In some embodiments, producing the animated image may further comprise the output image and the output audio being synchronized at regular intervals.
In some embodiments, the method may further comprise determining an amount of audio overlap to be used in generating the animated image; determining an amount of audio segments in the received audio before and after the output image, wherein an audio segment is the same length as the output image; determining a desired length for the output audio; and selecting an integer multiple of audio segments before and after the output image to generate the desired length output audio.
In some embodiments, the method may further comprise determining an amount of audio overlap to be used in generating the animated image; determining an amount of audio segments in the received audio before and after the output image, wherein an audio segment is the same length as the output image; determining a desired length for the output audio; generating a set of potential audio outputs, wherein the potential audio outputs are different combinations of the audio segments before and after the output image which provide the desired length output audio; for each potential audio output, determining at least one of: a correlation between an overlap segment at the beginning of the potential audio output and an overlap segment at the end of the potential audio output and a quietness of an overlap segment at the beginning of the potential audio output and an overlap segment at the end of the potential audio output, wherein the overlap segments are equal to the amount of audio overlap; and selecting the potential audio output with the best correlation or that produces the quietest overlap as the output audio for use in generating the animated image.
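By way of a non-limiting illustrative sketch (the function name, the array representation of audio, and the use of a normalized cross-correlation as the correlation measure are assumptions for illustration, not part of the described embodiments), the correlation-based selection among potential audio outputs may be realized as:

```python
import numpy as np

def select_output_audio(candidates, overlap):
    """Pick the candidate output audio whose beginning and ending
    overlap segments correlate best (illustrative sketch).

    candidates -- iterable of 1-D sample arrays of the desired length
    overlap    -- overlap segment length in samples
    """
    best, best_score = None, -np.inf
    for cand in candidates:
        head = cand[:overlap]          # overlap segment at the beginning
        tail = cand[-overlap:]         # overlap segment at the end
        # normalized cross-correlation between the two overlap segments
        denom = np.sqrt(np.sum(head ** 2) * np.sum(tail ** 2))
        score = np.sum(head * tail) / denom if denom else 0.0
        if score > best_score:
            best, best_score = cand, score
    return best
```

A quietness-based variant would instead score each candidate by, for example, the summed energy of its two overlap segments and keep the minimum.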
In some embodiments, the method may further comprise determining an amount of audio overlap to be used in generating the animated image; determining an amount of audio segments in the received audio before and after the output image, wherein an audio segment is the same length as the output image; causing display of a received image frame timeline and a received audio timeline; causing display of an indication of the output image on the timelines; receiving a selection of a start position and a stop position on the received audio timeline; and generating the output audio using a segment of received audio between the start position and the stop position.
In another embodiment, an apparatus is provided that includes at least one processor and at least one memory including computer program instructions with the at least one memory and the computer program instructions configured to, with the at least one processor, cause the apparatus at least to receive at least two image frames and audio, wherein the duration of the audio is longer than the duration of the at least two image frames; receive a selection of a segment of the at least two image frames; define an output image by looping the selected segment of the at least two image frames; define an output audio from the received audio based at least on a start time and a stop time of the selected segment; and produce an animated image by at least combining the output image and the output audio.
In some embodiments, the at least one memory and the computer program instructions may be further configured to, with the at least one processor, cause the apparatus to record image frames and audio or receive previously recorded image frames and audio. In some embodiments, the at least one memory and the computer program instructions may be further configured to, with the at least one processor, cause the apparatus to record at least one audio signal and record image frames, wherein the recording of the at least one audio signal begins before and ends after the recording of image frames.
In some embodiments, the selection of the segment of the at least two image frames comprises one of automatically selecting a whole image comprising the at least two image frames, receiving a selection of a whole image comprising the at least two image frames, or receiving a selection of at least one region of a whole image comprising the at least two image frames for generating a dynamic region and a selection of a region of the whole image for generating a substantially static region.
In some embodiments, the duration of the output audio may be an integer multiple of the duration of the output image. In some embodiments, producing the animated image may further comprise overlapping multiple instances of the output audio by a specified duration.
In some embodiments, the at least one memory and the computer program instructions may be further configured to, with the at least one processor, cause the apparatus at least to determine an amount of audio overlap to be used in generating the animated image; determine an amount of audio segments in the received audio before and after the output image, wherein an audio segment is the same length as the output image; determine a desired length for the output audio; and select an integer multiple of audio segments before and after the output image to generate the desired length output audio.
In some embodiments, the at least one memory and the computer program instructions may be further configured to, with the at least one processor, cause the apparatus at least to determine an amount of audio overlap to be used in generating the animated image; determine an amount of audio segments in the received audio before and after the output image, wherein an audio segment is the same length as the output image; determine a desired length for the output audio; generate a set of potential audio outputs, wherein the potential audio outputs are different combinations of the audio segments before and after the output image which provide the desired length output audio; for each potential audio output, determine at least one of: a correlation between an overlap segment at the beginning of the potential audio output and an overlap segment at the end of the potential audio output and a quietness of an overlap segment at the beginning of the potential audio output and an overlap segment at the end of the potential audio output, wherein the overlap segments are equal to the amount of audio overlap; and select the potential audio output with the best correlation or that produces the quietest overlap as the output audio for use in generating the animated image.
In some embodiments, the at least one memory and the computer program instructions may be further configured to, with the at least one processor, cause the apparatus at least to determine an amount of audio overlap to be used in generating the animated image; determine an amount of audio segments in the received audio before and after the output image, wherein an audio segment is the same length as the output image; cause display of a received image frame timeline and a received audio timeline; cause display of an indication of the output image on the timelines; receive a selection of a start position and a stop position on the received audio timeline; and generate the output audio using the segment of received audio between the start position and the stop position.
In some embodiments, the apparatus may further comprise a user interface, the user interface configured to provide for display of the recorded video; provide for selection of the recorded video segment and selection of the video regions for generating a dynamic region and a substantially static region; provide for display of a recorded video timeline and a recorded audio timeline; and provide for selection of a start position and a stop position on the audio timeline.
In a further embodiment, a computer program product is provided that includes at least one non-transitory computer-readable storage medium bearing computer program instructions embodied therein for use with a computer with the computer program instructions including program instructions configured to receive at least two image frames and audio, wherein the duration of the audio is longer than the duration of the at least two image frames; receive a selection of a segment of the at least two image frames; define an output image by looping the selected segment of the at least two image frames; define an output audio from the received audio based at least on a start time and a stop time of the selected segment; and produce an animated image by at least combining the output image and the output audio.
In some embodiments, the program instructions may be further configured to record image frames and audio or receive previously recorded image frames and audio. In some embodiments, the program instructions may be further configured to record at least one audio signal and record at least two image frames, wherein the recording of the at least one audio signal begins before and ends after the recording of image frames.
In some embodiments, the selection of the segment of the at least two image frames may comprise one of automatically selecting a whole image comprising the at least two image frames, receiving a selection of a whole image comprising the at least two image frames, or receiving a selection of at least one region of a whole image comprising the at least two image frames for generating a dynamic region and a selection of a region of the whole image for generating a substantially static region.
In some embodiments, the program instructions may be further configured to determine an amount of audio overlap to be used in generating the animated image; determine an amount of audio segments in the received audio before and after the output image, wherein an audio segment is the same length as the output image; determine a desired length for the output audio; and select an integer multiple of audio segments before and after the output image to generate the desired length output audio.
In some embodiments, the program instructions may be further configured to determine an amount of audio overlap to be used in generating the animated image; determine an amount of audio segments in the received audio before and after the output image, wherein an audio segment is the same length as the output image; determine a desired length for the output audio; generate a set of potential audio outputs, wherein the potential audio outputs are different combinations of the audio segments before and after the output image which provide the desired length output audio; for each potential audio output, determine at least one of: a correlation between an overlap segment at the beginning of the potential audio output and an overlap segment at the end of the potential audio output and a quietness of an overlap segment at the beginning of the potential audio output and an overlap segment at the end of the potential audio output, wherein the overlap segments are equal to the amount of audio overlap; and select the potential audio output with the best correlation or that produces the quietest overlap as the output audio for use in generating the animated image.
In some embodiments, the program instructions may be further configured to determine an amount of audio overlap to be used in generating the animated image; determine an amount of audio segments in the received audio before and after the output image, wherein an audio segment is the same length as the output image; cause display of a received image frame timeline and a received audio timeline; cause display of an indication of the output image on the timelines; receive a selection of a start position and a stop position on the received audio timeline; and generate the output audio using the segment of received audio between the start position and the stop position.
In another embodiment, an apparatus is provided that includes at least means for receiving at least two image frames and audio, wherein the duration of the audio is longer than the duration of the at least two image frames; means for receiving a selection of a segment of the at least two image frames; means for defining an output image by looping the selected segment of the at least two image frames; means for defining an output audio from the received audio based at least on a start time and a stop time of the selected segment; and means for producing an animated image by at least combining the output image and the output audio.
In some embodiments, the means for receiving the at least two image frames and audio may comprise means for causing recording of image frames and audio or means for receiving previously recorded image frames and audio. In some embodiments, the means for receiving the at least two image frames and audio may comprise means for recording of at least one audio signal and means for recording of at least two image frames, wherein the recording of the at least one audio signal begins before and ends after the recording of the at least two image frames.
In some embodiments, the means for generating the output audio may further comprise means for determining an amount of audio overlap to be used in generating the animated image; means for determining an amount of audio segments in the received audio before and after the output image, wherein an audio segment is the same length as the output image; means for determining a desired length for the output audio; and means for selecting an integer multiple of audio segments before and after the output image to generate the desired length output audio.
In some embodiments, the means for generating the output audio may further comprise means for determining an amount of audio overlap to be used in generating the animated image; means for determining an amount of audio segments in the received audio before and after the output image, wherein an audio segment is the same length as the output image; means for determining a desired length for the output audio; means for generating a set of potential audio outputs, wherein the potential audio outputs are different combinations of the audio segments before and after the output image which provide the desired length output audio; means for determining at least one of: a correlation between an overlap segment at the beginning of the potential audio output and an overlap segment at the end of the potential audio output and a quietness of an overlap segment at the beginning of the potential audio output and an overlap segment at the end of the potential audio output, wherein the overlap segments are equal to the amount of audio overlap; and means for selecting the potential audio output with the best correlation or that produces the quietest overlap as the output audio for use in generating the animated image.
In some embodiments, the means for generating the output audio may further comprise means for determining an amount of audio overlap to be used in generating the animated image; means for determining an amount of audio segments in the received audio before and after the output image, wherein an audio segment is the same length as the output image; means for causing display of a received image frame timeline and a received audio timeline; means for causing display of an indication of the output image on the timelines; means for receiving a selection of a start position and a stop position on the received audio timeline; and means for generating the output audio using a segment of received audio between the start position and the stop position.
Having thus described certain embodiments of the invention in general terms, reference will now be made to the accompanying drawings, which are not necessarily drawn to scale, and wherein:
Some embodiments of the present invention will now be described more fully hereinafter with reference to the accompanying drawings, in which some, but not all, embodiments of the invention are shown. Indeed, various embodiments of the invention may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will satisfy applicable legal requirements. Like reference numerals refer to like elements throughout. As used herein, the terms “data,” “content,” “information,” and similar terms may be used interchangeably to refer to data capable of being transmitted, received and/or stored in accordance with embodiments of the present invention. Thus, use of any such terms should not be taken to limit the spirit and scope of embodiments of the present invention.
Additionally, as used herein, the term ‘circuitry’ refers to (a) hardware-only circuit implementations (e.g., implementations in analog circuitry and/or digital circuitry); (b) combinations of circuits and computer program product(s) comprising software and/or firmware instructions stored on one or more computer readable memories that work together to cause an apparatus to perform one or more functions described herein; and (c) circuits, such as, for example, a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation even if the software or firmware is not physically present. This definition of ‘circuitry’ applies to all uses of this term herein, including in any claims. As a further example, as used herein, the term ‘circuitry’ also includes an implementation comprising one or more processors and/or portion(s) thereof and accompanying software and/or firmware. As another example, the term ‘circuitry’ as used herein also includes, for example, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in a server, a cellular network device, other network device, and/or other computing device.
As defined herein, a “computer-readable storage medium,” which refers to a non-transitory physical storage medium (e.g., volatile or non-volatile memory device), can be differentiated from a “computer-readable transmission medium,” which refers to an electromagnetic signal.
Methods, apparatuses and computer program products are provided in accordance with example embodiments of the present invention to create optimized audio enabled cinemagraphs or animated images.
In some example embodiments, video and audio may be captured simultaneously for use in generating an audio enabled cinemagraph. In some embodiments, audio may be recorded for a period longer than the duration of the portion of the recorded video that may be used for looping when a cinemagraph is created. An audio length may be selected in integer multiples of the video loop length before and after the video loop segment to create the audio loop. The audio loop may be played together with the video loop and because of the integer multiples used for the audio loop, the audio may be in sync with the video at regular intervals.
In example embodiments, a device may start to record audio as soon as the cinemagraph lens application is started and end the recording of audio only just before the audio is needed for generation of the cinemagraph. Once the video for the cinemagraph is created, information is known about where the looping segment of the video was taken from the recorded video (i.e. the start and end times of the looping video segment). This information may then be used in generating the audio for the cinemagraph.
In some example embodiments, the video and audio may be received from another device or they may be extracted from pre-recorded video and audio data. In some embodiments, the received video may comprise two or more image frames.
In some example embodiments, the video may comprise animated images comprising at least two frames. The cinemagraph may be created from a whole image comprising the at least two frames, which may be selected either automatically or manually by a user, or the cinemagraph may be created from a user selection of a region of the whole image comprising the at least two frames. In some embodiments, the video may comprise a series of images comprising at least two image frames.
The system of an embodiment of the present invention may include an apparatus 100 as generally described below in conjunction with
It should also be noted that while
Referring now to
In some embodiments, the processor (and/or co-processors or any other processing circuitry assisting or otherwise associated with the processor) may be in communication with the memory 104 via a bus for passing information among components of the apparatus. The memory device 104 may include, for example, a non-transitory memory, such as one or more volatile and/or non-volatile memories. In other words, for example, the memory 104 may be an electronic storage device (e.g., a computer readable storage medium) comprising gates configured to store data (e.g., bits) that may be retrievable by a machine (e.g., a computing device like the processor). The memory 104 may be configured to store information, data, content, applications, instructions, or the like for enabling the apparatus to carry out various functions in accordance with an example embodiment of the present invention. For example, the memory 104 could be configured to buffer input data for processing by the processor 102. Additionally or alternatively, the memory 104 could be configured to store instructions for execution by the processor.
In some embodiments, the apparatus 100 may be embodied as a chip or chip set. In other words, the apparatus may comprise one or more physical packages (e.g., chips) including materials, components and/or wires on a structural assembly (e.g., a baseboard). The structural assembly may provide physical strength, conservation of size, and/or limitation of electrical interaction for component circuitry included thereon. The apparatus may therefore, in some cases, be configured to implement an embodiment of the present invention on a single chip or as a single “system on a chip.” As such, in some cases, a chip or chipset may constitute means for performing one or more operations for providing the functionalities described herein.
The processor 102 may be embodied in a number of different ways. For example, the processor may be embodied as one or more of various hardware processing means such as a coprocessor, a microprocessor, a controller, a digital signal processor (DSP), a processing element with or without an accompanying DSP, or various other processing circuitry including integrated circuits such as, for example, an ASIC (application specific integrated circuit), an FPGA (field programmable gate array), a microcontroller unit (MCU), a hardware accelerator, a special-purpose computer chip, or the like. As such, in some embodiments, the processor may include one or more processing cores configured to perform independently. A multi-core processor may enable multiprocessing within a single physical package. Additionally or alternatively, the processor may include one or more processors configured in tandem via the bus to enable independent execution of instructions, pipelining and/or multithreading.
In an example embodiment, the processor 102 may be configured to execute instructions stored in the memory 104 or otherwise accessible to the processor. Alternatively or additionally, the processor may be configured to execute hard coded functionality. As such, whether configured by hardware or software methods, or by a combination thereof, the processor may represent an entity (e.g., physically embodied in circuitry) capable of performing operations according to an embodiment of the present invention while configured accordingly. Thus, for example, when the processor is embodied as an ASIC, FPGA or the like, the processor may be specifically configured hardware for conducting the operations described herein. Alternatively, as another example, when the processor is embodied as an executor of software instructions, the instructions may specifically configure the processor to perform the algorithms and/or operations described herein when the instructions are executed. However, in some cases, the processor may be a processor of a specific device configured to employ an embodiment of the present invention by further configuration of the processor by instructions for performing the algorithms and/or operations described herein. The processor may include, among other things, a clock, an arithmetic logic unit (ALU) and logic gates configured to support operation of the processor.
The apparatus 100 may include a user interface 106 that may, in turn, be in communication with the processor 102 to provide output to the user and, in some embodiments, to receive an indication of a user input. For example, the user interface may include a display and, in some embodiments, may also include a keyboard, a mouse, a joystick, a touch screen, touch areas, soft keys, a microphone, a speaker, or other input/output mechanisms. The processor may comprise user interface circuitry configured to control at least some functions of one or more user interface elements such as a display and, in some embodiments, a speaker, ringer, microphone and/or the like. The processor and/or user interface circuitry comprising the processor may be configured to control one or more functions of one or more user interface elements through computer program instructions (e.g., software and/or firmware) stored on a memory accessible to the processor (e.g., memory 104, and/or the like).
The apparatus 100 may include a recording interface 108 that may, in turn, be in communication with the processor 102 to provide for capturing video or audio in some embodiments. For example, the recording interface may include a camera, one or more microphones, a video module, an audio module, and/or other recording mechanisms. For example, in an example embodiment in which the recording interface comprises a camera, the camera may include a digital camera capable of forming a digital image file from a captured image. As such, the camera may include all hardware (for example, a lens or other optical component(s), image sensor, image signal processor, and/or the like) and software necessary for creating a digital image file from a captured image and/or video. Alternatively, the camera may include only the hardware needed to view an image, while a memory device 104 of the apparatus stores instructions for execution by the processor in the form of software necessary to create a digital image file from a captured image. In an example embodiment, the camera may further include a processing element such as a co-processor which assists the processor in processing image data and an encoder and/or decoder for compressing and/or decompressing image data. The encoder and/or decoder may encode and/or decode according to, for example, a joint photographic experts group (JPEG) standard, a moving picture experts group (MPEG) standard, or other format.
The apparatus 100 may optionally include a communication interface 110 which may be any means such as a device or circuitry embodied in either hardware or a combination of hardware and software that is configured to receive and/or transmit data from/to a network and/or any other device or module in communication with the apparatus 100. In this regard, the communication interface may include, for example, an antenna (or multiple antennas) and supporting hardware and/or software for enabling communications with a wireless communication network. Additionally or alternatively, the communication interface may include the circuitry for interacting with the antenna(s) to cause transmission of signals via the antenna(s) or to handle receipt of signals received via the antenna(s). In some environments, the communication interface may alternatively or also support wired communication. As such, for example, the communication interface may include a communication modem and/or other hardware/software for supporting communication via cable, digital subscriber line (DSL), universal serial bus (USB) or other mechanisms.
In this regard, the apparatus 100 may include means, such as the processor 102, memory 104, or the like, for starting a cinemagraph application. See block 202 of
The apparatus 100 may include means, such as the processor 102, memory 104, recording interface 108, or the like, for causing audio recording to be started. See block 204 of
As shown in block 206 of
In alternative embodiments, the recorded video and audio may be received from another device or may be extracted from previously recorded video and audio files. In some embodiments, the received or recorded video may comprise two or more image frames.
As shown in block 210 of
As shown in block 214 of
As shown in block 218 of
As shown in block 220 of
In some example embodiments, various parameters of the recorded video and audio may be used in generating the video loop and audio loop for an audio enabled cinemagraph. Such parameter values may include:
Based on the above values, the length of the looped video may be described as ce−cb. A further parameter o may be defined as the length of audio overlap needed for smooth audio looping. In some embodiments, the overlap length may be defined to be about 0.25 seconds, for example. It is assumed that o<ce−cb (the duration of the looping video clip). In some embodiments, there may be no audio overlap used so that o=0. Additionally, it may be assumed that ce+o≦ae because the audio was recorded for a short duration longer than the video.
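As a non-limiting illustration of the constraints above and of a smooth loop point (the function names and the linear crossfade are assumptions for illustration; the parameter names ab, ae, cb, ce, and o follow the text):

```python
import numpy as np

def validate_params(ab, ae, cb, ce, o):
    """Check the constraints stated above:
    o < ce - cb (overlap shorter than the looping clip) and
    ce + o <= ae (audio recorded longer than the video)."""
    return o < ce - cb and ce + o <= ae and ab <= cb

def crossfade(tail, head):
    """Linearly crossfade the overlapping tail and head segments so the
    audio loop point is smooth (equal-length 1-D sample arrays assumed)."""
    n = len(tail)
    fade = np.linspace(0.0, 1.0, n)
    return tail * (1.0 - fade) + head * fade
```

When o=0, the crossfade step is simply skipped and the output audio segments are concatenated directly.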
As shown in block 402 of
The apparatus 100 may include means, such as the processor 102, memory 104, or the like, for determining the amount of audio that is available before and after the looped video clip. See block 404 of
ce+N(ce−cb)+o≦ae and ab≦cb−M(ce−cb).
As shown in block 406 of
In some embodiments, it may be desirable that the length of audio that is looped is an integer multiple (AL) times longer than the video that is looped. In an example embodiment, AL may be defined as 7, so the audio loop length would be 7 times the video loop length, for example. In example embodiments, the value of AL may depend on the length of the video loop. For example, in some embodiments a comfortable audio length may be greater than five seconds.
In an example embodiment, the audio that is selected for the audio loop may be Mb=3 video loop lengths before the video loop and Ne=3 video loop lengths after the video loop, so that Mb+Ne+1=AL. However, there might not always be a sufficient number of audio segments available, that is, it may be that M<Mb or N<Ne. In example embodiments, the following pseudo-code may be used to select the audio for the audio loop:
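The pseudo-code itself is not reproduced here; one possible selection routine can be sketched in Python, under the assumption that a shortfall on one side of the loop is compensated from the other side where audio is available:

```python
def select_audio_segments(M, N, M_b=3, N_e=3):
    """Choose how many video-loop lengths of audio to take before (m)
    and after (n) the video loop. M and N are the counts actually
    available; M_b and N_e the desired counts. The balancing rule
    (borrowing a shortfall from the other side) is an assumption."""
    target = M_b + N_e            # total segments wanted around the loop
    m = min(M, M_b)
    n = min(N, N_e)
    # Borrow any shortfall from whichever side still has audio left.
    shortfall = target - (m + n)
    extra_after = min(shortfall, N - n)
    n += extra_after
    m += min(shortfall - extra_after, M - m)
    return m, n
```

With sufficient audio this returns the desired (3, 3); with only one segment before the loop it returns (1, 5), keeping the total loop count at M_b+N_e where possible.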
As shown in block 408 of
Operation may then return to block 220 of
As shown in
In another embodiment, instead of trying to center the looped audio around the looped video, if there is enough audio available, all possible combinations of M and N could be generated and the audio from the beginning (AB) and the audio from the end (AE) of the looped audio of each combination are compared. The combination of M and N that produces the best correlation between AB and AE may then be used for creating the audio loop for the audio enabled cinemagraph. For example, if AB and AE are N samples long, such that AB=xi, with i=1, . . . , N and AE=yi, with i=1, . . . , N, then the correlation between AB and AE would be:
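The formula itself is not reproduced here; a commonly used normalized cross-correlation fits the setup of two equal-length sample windows xi and yi, and can be sketched as follows (this specific formulation is an assumption, not taken from the text):

```python
import math

def normalized_correlation(ab, ae):
    """Normalized cross-correlation of two equal-length sample windows:
    sum(x*y) / sqrt(sum(x^2) * sum(y^2)), in the range [-1, 1]."""
    num = sum(x * y for x, y in zip(ab, ae))
    den = math.sqrt(sum(x * x for x in ab) * sum(y * y for y in ae))
    return num / den if den else 0.0
```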
As shown in block 602 of
The apparatus 100 may include means, such as the processor 102, memory 104, or the like, for determining the amount of audio that is available before and after the looped video clip. See block 604 of
ce+N(ce−cb)+o≦ae, and ab≦cb−M(ce−cb).
As shown in block 606 of
As shown in block 608 of
from: cb−M(ce−cb) to: cb−M(ce−cb)+o,
and the audio at the end, called AE, may be defined as the time period:
from: ce+N(ce−cb) to: ce+N(ce−cb)+o.
The apparatus may then determine the correlation between the AB and the AE for each combination of M and N.
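The search over all feasible (M, N) pairs can be sketched in Python as follows (function and variable names are illustrative; `audio` is a list of samples starting at time a_b, `sr` the sample rate):

```python
def best_m_n(audio, sr, a_b, a_e, c_b, c_e, o):
    """For each feasible (M, N), take the o-second windows AB (starting
    at cb - M(ce-cb)) and AE (starting at ce + N(ce-cb)) and return the
    pair with the highest normalized correlation."""
    loop = c_e - c_b
    n_o = int(round(o * sr))                 # overlap length in samples

    def window(start):                       # samples of [start, start+o)
        i = int(round((start - a_b) * sr))
        return audio[i:i + n_o]

    def corr(ab, ae):                        # normalized correlation
        num = sum(x * y for x, y in zip(ab, ae))
        den = (sum(x * x for x in ab) * sum(y * y for y in ae)) ** 0.5
        return num / den if den else 0.0

    best, best_corr = None, -2.0
    M = 1
    while c_b - M * loop >= a_b:             # ab <= cb - M(ce - cb)
        N = 1
        while c_e + N * loop + o <= a_e:     # ce + N(ce - cb) + o <= ae
            c = corr(window(c_b - M * loop), window(c_e + N * loop))
            if c > best_corr:
                best, best_corr = (M, N), c
            N += 1
        M += 1
    return best
```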
As shown in block 610 of
As shown in block 612 of
Operation may then return to block 220 of
In another embodiment, instead of limiting the correlation search to the few points as discussed above, the correlation could be searched more accurately. In an example embodiment, if there is enough audio available, all possible combinations of M, N and τ could be tried, where τ is an optimization variable defined as
where L defines the number of different values of τ to be tested. The audio from the beginning (AB) and the audio from the end (AE) of the looped audio of each combination are compared. The combination of M, N and τ that produces the best correlation between AB and AE may then be used for creating the audio loop for the audio enabled cinemagraph.
In an example embodiment, when the audio enabled cinemagraph is played back, the audio playback is still started from a point that is an integer multiple of the video loop length away from the video loop, e.g. from the point cb−M(ce−cb), to preserve the time synchronization between the video and the audio and also maintain the best possible correlation during the audio fade-in and fade-out.
As shown in block 702 of
The apparatus 100 may include means, such as the processor 102, memory 104, or the like, for determining the amount of audio that is available before and after the looped video clip. See block 704 of
ce+N(ce−cb)+o≦ae and ab≦cb−M(ce−cb).
As shown in block 706 of
where L is 128.
As shown in block 708 of
from: cb−M(ce−cb)+τ to: cb−M(ce−cb)+τ+o,
and the audio at the end, called AE, may be defined as the time period:
from: ce+N(ce−cb)+τ to: ce+N(ce−cb)+τ+o.
The apparatus may then determine the correlation between the AB and the AE for each combination of M, N and τ.
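The extended search can be sketched as follows. The original definition of τ is not reproduced above; this sketch assumes L evenly spaced offsets within one video-loop length, and all other names are illustrative:

```python
def best_m_n_tau(audio, sr, a_b, a_e, c_b, c_e, o, L=128):
    """Search (M, N, tau), where tau takes L evenly spaced values
    within one loop length (an assumption), and return the triple whose
    AB/AE overlap windows correlate best."""
    loop = c_e - c_b
    n_o = int(round(o * sr))

    def window(start):
        i = int(round((start - a_b) * sr))
        return audio[i:i + n_o]

    def corr(ab, ae):
        num = sum(x * y for x, y in zip(ab, ae))
        den = (sum(x * x for x in ab) * sum(y * y for y in ae)) ** 0.5
        return num / den if den else 0.0

    best, best_corr = None, -2.0
    for l in range(L):
        tau = l * loop / L
        M = 1
        while c_b - M * loop + tau >= a_b:
            N = 1
            while c_e + N * loop + tau + o <= a_e:
                c = corr(window(c_b - M * loop + tau),
                         window(c_e + N * loop + tau))
                if c > best_corr:
                    best, best_corr = (M, N, tau), c
                N += 1
            M += 1
    return best
```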
As shown in block 710 of
As shown in block 712 of
Operation may then return to block 220 of
In another embodiment, instead of using correlation to find the best points for creating the audio loop as described above, the apparatus may use the quietest places in the audio for looping. During quiet parts of the audio, the overlap is nearly inaudible. Further, important speech is less likely to occur during the quiet parts of the audio and, as such, the audio overlap is less likely to fall in the middle of a word. If there is enough audio available, all possible combinations of M and N could be generated and the audio from the beginning (AB) and the audio from the end (AE) of the looped audio of each combination compared. The combination of M and N that produces the quietest AB and AE may be used for creating the audio loop for the audio enabled cinemagraph. If there are several nearly equally good combinations of M and N, the combination where both AB and AE are quiet and AB and AE are strongly correlated may be used.
As shown in block 802 of
The apparatus 100 may include means, such as the processor 102, memory 104, or the like, for determining the amount of audio that is available before and after the looped video clip. See block 804 of
ce+N(ce−cb)+o≦ae and ab≦cb−M(ce−cb).
As shown in block 806 of
As shown in block 808 of
from: cb−M(ce−cb) to: cb−M(ce−cb)+o,
and the audio at the end, called AE, may be defined as the time period:
from: ce+N(ce−cb) to: ce+N(ce−cb)+o.
The apparatus may then compare the AB and the AE for each combination of M and N to determine the quietness of each combination.
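Quietness can be measured, for example, as the signal energy of the overlap windows (the energy measure and the names below are assumptions):

```python
def quietest_combination(windows):
    """`windows` maps each candidate (M, N) pair to its (AB, AE) sample
    windows; return the pair whose overlap windows carry the least
    total energy, i.e. the quietest looping point."""
    def energy(samples):
        return sum(x * x for x in samples)
    return min(windows,
               key=lambda mn: energy(windows[mn][0]) + energy(windows[mn][1]))
```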
As shown in block 810 of
As shown in block 812 of
Operation may then return to block 220 of
In another embodiment, instead of detecting how quiet the signal is, a voice activity detector may be used. The combination of M and N that produces the smallest likelihood of speech during AB and AE may be chosen for creating the audio loop. In an example embodiment, to avoid interrupting speech during looping of the audio, it may be possible to record the audio for the cinemagraph from a direction that has as little speech as possible. For example, if the apparatus has directional microphones, audio for the cinemagraph may be chosen from the microphone signal that has the lowest probability of speech.
In another embodiment, an apparatus may provide a user interface to receive user input for performing selection of the audio loop, such as illustrated in
As shown in block 902 of
The apparatus 100 may include means, such as the processor 102, memory 104, user interface 106, or the like, for causing the display of timelines for the recorded video and the recorded audio, such as video timeline 1006 and audio timeline 1008 of
As shown in block 906 of
As shown in block 908 of
As shown in block 912 of
As shown in block 914 of
Operation may then return to block 220 of
In some embodiments, only some of the operations described in relation to
As shown in
In some example embodiments, when an audio enabled cinemagraph is viewed, the audio playback may be started from a part of the looped audio where the audio and the video are in sync. In some example embodiments, the desired audio loop length (AL) may be limited to a fixed value, such as a number of seconds for example, instead of being dependent on the video loop length. In some example embodiments, the recorded audio may be trimmed to remove the end and/or beginning if there is device handling noise, for example. Such noise can be detected easily because it causes the audio signal to clip. In some example embodiments, the audio may be repeated only a predefined number of times and then stopped.
In some example embodiments, when generating automated cinemagraphs, the looped video may be trimmed in a rather straightforward manner from the recorded video. In some embodiments, generating the looped video may be performed such that the beginning of the looped video is taken to be the time at which the first frame of the looped video is taken from the recorded video and the end of the looped video is taken to be the beginning of the looped video plus the length of the looped video, i.e. beginning+number_of_taken_frames*framelength.
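The stated trimming rule can be expressed directly (illustrative names; times in seconds):

```python
def looped_video_bounds(beginning, number_of_taken_frames, frame_length):
    """Begin/end times of the looped video within the recorded video:
    end = beginning + number_of_taken_frames * frame_length."""
    return beginning, beginning + number_of_taken_frames * frame_length
```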
As described above,
Accordingly, blocks of the flowchart support combinations of means for performing the specified functions and combinations of operations for performing the specified functions. It will also be understood that one or more blocks of the flowchart, and combinations of blocks in the flowchart, can be implemented by special purpose hardware-based computer systems which perform the specified functions, or by combinations of special purpose hardware and computer instructions.
In some embodiments, certain ones of the operations above may be modified or further amplified. Furthermore, in some embodiments, additional optional operations may be included, such as shown by the blocks with dashed outlines. Modifications, additions, or amplifications to the operations above may be performed in any order and in any combination.
Many modifications and other embodiments of the inventions set forth herein will come to mind to one skilled in the art to which these inventions pertain having the benefit of the teachings presented in the foregoing descriptions and the associated drawings. Therefore, it is to be understood that the inventions are not to be limited to the specific embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of the appended claims. Moreover, although the foregoing descriptions and the associated drawings describe example embodiments in the context of certain example combinations of elements and/or functions, it should be appreciated that different combinations of elements and/or functions may be provided by alternative embodiments without departing from the scope of the appended claims. In this regard, for example, different combinations of elements and/or functions than those explicitly described above are also contemplated as may be set forth in some of the appended claims. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.