The proposed apparatus generally relates to audio and video synchronization and, more specifically, to synchronizing audio with video recorded using a zoom objective.
Conventional video devices, including consumer cameras and other mobile devices such as phones with cameras, traditionally have microphones placed on a body of the device. When filming or recording a subject at a distance such as a few meters from the lens, a lag in the audio with respect to the video is noticeable due to a difference in propagation delay. For example, at a speed of sound of approximately 331 m/s, a distance of ten meters from the subject can account for a delay of roughly 30 ms. When recording video using a wide-angle long shot, the audio delay appears normal. However, when recording video using a zoom function, the delay between the audio and video is noticeable. Prior systems have been unable to account for such a delay by synchronizing the audio with the video when using a zoom function to make the subject appear closer than the actual distance from the lens.
What is needed is a device and method for synchronizing audio and video with respect to a focus distance of the video device when using a zoom function.
The proposed apparatus relates to video recording devices such as, for example, a mobile phone with a camera and microphone. It will be appreciated that the proposed apparatus is not limited to any specific type of device and may be applied to any video recording device.
According to a first aspect of the disclosure, a device includes an image sensor, at least one lens coupled to a focus control, at least one microphone and a processor. The processor adjusts audio signals recorded by the at least one microphone with respect to video signals recorded by the image sensor. The focus control provides a zoom factor. The zoom factor may be adjustable. The audio signal matching is based on the zoom factor. The processor also identifies an audio source supplying the audio signals and adjusts the audio signals based on the audio source.
In another embodiment, the processor includes an audio to zoom adaptation circuit that delays or advances the audio signals with respect to the video signals recorded by the image sensor, based on a distance for a subject of the video signals from the device and a zoom factor of the at least one lens.
In another embodiment, the audio to zoom adaptation circuit includes a control circuit, an audio processor and a variable delay circuit. The control circuit is configured to receive a focus distance signal, the zoom factor signal and an input selection signal and generates a control signal and an audio mixing signal. The audio processor is configured to receive audio signals from the at least one microphone along with the mixing signal and generates a mixed audio signal. The variable delay circuit is configured to receive the mixed audio signal and the control signal and then delay or advance the mixed audio signal based on the delay control signal.
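The signal path of this embodiment (mixing followed by a variable delay) can be illustrated with a minimal software sketch. It assumes equally long lists of samples per microphone and a delay expressed in whole samples; the function names `mix` and `shift` and the zero-padding behavior are illustrative assumptions and are not taken from the disclosure.

```python
def mix(channels, weights):
    """Weighted sum of equally long microphone sample lists (hypothetical helper)."""
    length = len(channels[0])
    return [
        sum(weight * channel[i] for channel, weight in zip(channels, weights))
        for i in range(length)
    ]


def shift(samples, offset):
    """Delay (offset > 0) or advance (offset < 0) a signal by whole samples.

    The output keeps the input length; vacated positions are zero-padded.
    """
    n = len(samples)
    if offset >= 0:
        pad = min(offset, n)
        return [0.0] * pad + samples[: n - pad]
    drop = min(-offset, n)
    return samples[drop:] + [0.0] * drop


# Example: two microphones mixed equally, then delayed by one sample.
front = [1.0, 2.0, 3.0, 4.0]
side = [3.0, 2.0, 1.0, 0.0]
mixed = mix([front, side], [0.5, 0.5])
delayed = shift(mixed, 1)
advanced = shift(mixed, -1)
```

In this sketch the weights play the role of the mixing signal and the offset plays the role of the control signal; a hardware implementation would typically use a buffer rather than list slicing.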
In another embodiment, the delay or advance of the recorded audio signals with respect to the video images is based on the at least one lens performing a zoom operation.
In another embodiment, the processor is further configured to determine a distance for the audio source from the device.
In another embodiment, an input device is provided for receiving input selection signals. The input device may be one of a user interface, a keyboard, a touch screen, a mouse, a pointer and/or an eye tracker. The input device is used to select an object for which audio signals are to be delayed.
In another embodiment, the recorded audio is delayed or advanced based on air temperature and/or a medium in which the device is positioned.
In another embodiment, the audio source is recorded as a delayed or advanced audio signal or as an audio signal together with the audio delay or advance, or distance information.
In another embodiment, the device is a Plenoptic camera system configured so that at least two audio channels may be recorded with a different audio delay or advance or playout information.
According to another aspect of the disclosure, a method is described in which video signals of a subject are recorded using an image sensor coupled to at least one lens having a zoom factor, and audio signals are recorded. The recorded audio signals are adjusted by a processor with respect to the video signals recorded by the image sensor based on the zoom factor. The processor also identifies an audio source supplying the audio signals and adjusts the audio signals based on the audio source.
In another embodiment, delaying or advancing the audio signals recorded by the at least one microphone with respect to the video signals includes determining a distance between the subject and the image sensor, and then delaying or advancing the audio signals with respect to the video signals based on the determined distance and a zoom factor for the at least one lens.
In another embodiment, the audio signals are recorded using at least one microphone including at least one of a front directional microphone, a front microphone and a side microphone. The step of delaying or advancing the audio signals with respect to the video signals includes receiving a focus distance signal, a zoom factor signal and an input selection signal; generating a control signal and a mixing signal based on at least one of the focus distance signal, zoom factor signal and input selection signal; generating a mixed audio signal based on audio signals received from at least one of the front directional microphone, front microphone and side microphone and the mixing signal; and delaying or advancing the mixed audio signal based on the control signal.
In another embodiment, the step of adjusting the audio signals with respect to the video signals includes combining the delayed or advanced mixed audio signal and the video signal to form an audio/video signal.
In another embodiment, the step of delaying or advancing of the audio signals with respect to the video images is performed when the at least one lens is performing a zoom operation.
In another embodiment, when a source of the audio signals is at a different distance from the image sensor than the subject, the method further includes identifying the source of the audio signals, determining a distance between the source of the audio signals and the device upon receipt of an input audio selection signal, and delaying or advancing the audio signals with respect to the video signals based on the determined distance between the source of the audio signals and the device.
At least parts of the methods of the disclosure may be computer implemented. Accordingly, embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit”, “module” or “system”. Furthermore, the embodiments may take the form of a computer program product embodied in any tangible medium of expression having computer usable program code embodied in the medium.
Since embodiments can be implemented in software, at least parts of the methods can be embodied as computer readable code for provision to a programmable apparatus on any suitable carrier medium. A tangible carrier medium may comprise a storage medium such as a floppy disk, a CD-ROM, a hard disk drive, a magnetic tape device or a solid-state memory device and the like. A transient carrier medium may include a signal such as an electrical signal, an electronic signal, an optical signal, an acoustic signal, a magnetic signal or an electromagnetic signal, e.g., a microwave or RF signal.
These and other aspects, features and advantages of the present disclosure will be described or become apparent from the following detailed description of the preferred embodiments, which is to be read in connection with the accompanying drawings. The drawings include the following figures briefly described below:
It should be understood that the drawings are for purposes of illustrating the concepts of the disclosure and are not necessarily the only possible configuration for illustrating the disclosure.
It should be understood that the elements shown in the figures may be implemented in various forms of hardware, software or combinations thereof. Preferably, these elements are implemented in a combination of hardware and software on one or more appropriately programmed general-purpose devices, which may include a processor, memory and input/output interfaces. Herein, the phrase “coupled” is defined to mean directly connected to or indirectly connected with through one or more intermediate components. Such intermediate components may include both hardware and software based components.
The present description illustrates the principles of the present disclosure. It will thus be appreciated that those skilled in the art will be able to devise various arrangements that, although not explicitly described or shown herein, embody the principles of the disclosure and are included within its spirit and scope.
All examples and conditional language recited herein are intended for instructional purposes to aid the reader in understanding the principles of the disclosure and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions.
Moreover, all statements herein reciting principles, aspects, and embodiments of the disclosure, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure.
Thus, for example, it will be appreciated by those skilled in the art that the block diagrams presented herein represent conceptual views of illustrative circuitry embodying the principles of the disclosure. Similarly, it will be appreciated that any flow charts, flow diagrams, state transition diagrams, pseudocode, and the like represent various processes which may be substantially represented in computer readable media and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.
The functions of the various elements shown in the figures may be provided through the use of dedicated hardware as well as hardware capable of executing software in association with appropriate software. When provided by a processor, the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared. Moreover, explicit use of the term “processor” or “controller” should not be construed to refer exclusively to hardware capable of executing software, and may implicitly include, without limitation, digital signal processor (“DSP”) hardware, read only memory (“ROM”) for storing software, random access memory (“RAM”), and nonvolatile storage.
Other hardware, conventional and/or custom, may also be included. Similarly, any switches shown in the figures are conceptual only. Their function may be carried out through the operation of program logic, through dedicated logic, through the interaction of program control and dedicated logic, or even manually, the particular technique being selectable by the implementer as more specifically understood from the context.
In the claims hereof, any element expressed as a means for performing a specified function is intended to encompass any way of performing that function including, for example, a) a combination of circuit elements that performs that function or b) software in any form, including, therefore, firmware, microcode or the like, combined with appropriate circuitry for executing that software to perform the function. The disclosure as defined by such claims resides in the fact that the functionalities provided by the various recited means are combined and brought together in the manner which the claims call for. It is thus regarded that any means that can provide those functionalities are equivalent to those shown herein.
The present arrangement provides a method and device for synchronizing audio and video when recording using a zoom function.
Modern photo cameras, including cameras on mobile devices such as phones, have increasingly extreme telephoto lenses (zoom objectives) and are able to record talking movies/video, so called “live shots” (pictures that contain video snippets of 3-5 seconds including sound) and Sound-Shots (a picture with up to nine seconds of sound). As these cameras become more advanced, they are able to record audio/video with greater clarity and record images from greater distances using zoom features. When using zoom features, cameras may include one or more different lenses such as a wide angle lens and a long shot lens. Alternatively, more than one camera, each having a separate image sensor and lens may be used. The further away the subject image being recorded is from the video device, the greater the delay in receiving any audio captured from the subject image and the more noticeable the offset between the audio and video portions of the recording. It thus becomes necessary to synchronize the recorded audio and video when using zoom features.
Depicted in
In
The Audio-to-Zoom adaptation control circuit 300 is shown in greater detail in
These three signals may be used by the Audio-to-Zoom circuit 320 to generate a delay control signal for delaying the audio and video in relation to each other. These signals may also be used by the Audio-to-Zoom circuit 320 to generate a control signal that controls mixing of the microphones 170 having different ranges, positions and directions. Normally, video processing takes longer than audio processing, and thus the audio signal is generally delayed. This delay may be altered based on the focus distance, zoom factor and input selection signal to account for the use of a zoom feature and the distance of the subject being recorded from the video device. Delaying video and audio in modern digital systems using streaming (MPEG, H264, etc.) can be performed before coding in a traditional manner by delaying in the time domain, as well as by setting presentation time stamps correctly in the video domain. An overview of exemplary actions performed by the audio-to-zoom adaptation control circuit 300 based on different scenarios is depicted in Table 2. The distances are exemplary and may vary for different applications, as may the actions (delay and filtering) that are taken.
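The time stamp approach mentioned above can be illustrated with a minimal sketch. It assumes a 90 kHz time base, as used for presentation time stamps in MPEG transport streams; other containers may use a different time base. The function names are hypothetical and are not part of the disclosure.

```python
PTS_CLOCK_HZ = 90_000  # assumed time base (MPEG transport stream PTS resolution)


def delay_to_pts_ticks(delay_s, clock_hz=PTS_CLOCK_HZ):
    """Convert a delay in seconds to the nearest whole number of clock ticks."""
    return round(delay_s * clock_hz)


def shift_audio_pts(audio_pts, delay_s, clock_hz=PTS_CLOCK_HZ):
    """Return audio time stamps shifted by the given delay.

    A positive delay presents the audio later; a negative delay presents it earlier.
    """
    offset = delay_to_pts_ticks(delay_s, clock_hz)
    return [pts + offset for pts in audio_pts]


# Example: three audio access units, presented half a second later.
shifted = shift_audio_pts([0, 1920, 3840], 0.5)
```

Shifting time stamps leaves the coded audio untouched, so the same recording could in principle be re-synchronized later with a different offset.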
In
Further, signals from the microphones may be compared and analyzed for individual processing of the detected sound sources in addition to the individually adopted delay of the microphones for increasing the sound of the video. In a simplified example, the directed, front microphone may also pick up sound from sides and behind the video device. The undirected, front microphone may pick up a cloud of sounds from all sound sources as illustrated in
In
A flow chart describing the audio-to-zoom adaptation is provided in
Total audio delay = video delay (130) − audio processing delay (330) − AAD (320)
wherein c = the speed of sound (approx. 331 m/second); OFD = optical focus distance; AFD = audio focus distance (in most cases AFD = OFD); and AAD = AFD/c.
If the result is negative, the video must be delayed until the result is “0” or positive. Based on the air temperature, the speed of sound (c) used in the above calculation may be adjusted according to c = 331 m/s + 0.6 m/s * temperature (Celsius). If the camera is waterproof and the video is taken submerged, a different speed of sound based on the liquid medium of water may be applied (1500 m/s).
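The calculation above can be sketched in a few lines. This is a minimal illustration, assuming all delays are expressed in seconds and distances in meters; the function names, the `medium` parameter and the returned pair (audio delay, additional video delay) are assumptions made for the example and are not defined in the disclosure.

```python
def speed_of_sound(temperature_c=0.0, medium="air"):
    """Approximate speed of sound in m/s, using the figures given in the text."""
    if medium == "water":
        return 1500.0
    return 331.0 + 0.6 * temperature_c


def total_audio_delay(video_delay_s, audio_processing_delay_s, afd_m, c):
    """Return (audio_delay_s, extra_video_delay_s).

    AAD = AFD / c is the acoustic propagation delay. If the total would be
    negative, the audio delay is clamped to zero and the video is delayed
    by the missing amount instead.
    """
    aad_s = afd_m / c
    result = video_delay_s - audio_processing_delay_s - aad_s
    if result < 0:
        return 0.0, -result
    return result, 0.0


# Example: subject 10 m away, air at 0 degrees Celsius.
c_air = speed_of_sound(0.0)
audio_delay, video_delay = total_audio_delay(0.100, 0.020, 10.0, c_air)
```

With the example values the acoustic delay is 10/331 s, or about 30 ms, so roughly 50 ms of audio delay remains and no additional video delay is needed.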
In a further embodiment, not only is the audio focus depth measured but also the direction is determined. This direction does not need to be in line with the optical axis of the recording. In the case of selecting the audio focus to the pair of people 630 but aiming the camera at the single person 640 a deviation of the direction is evident. By an intelligent mixing of the microphones a direction from which the source of the audio signal is originating is recognizable. This improved signal is properly delayed and recorded.
In an alternate embodiment, the adaptation parameters may be written into the meta-data of the image or video file. Additionally, other parameters including but not limited to camera type, shutter speed, aperture, GPS coordinates etc. may also be written into the meta-data of the image or video file.
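As one possible illustration of writing the adaptation parameters as meta-data, the following sketch serializes them to a JSON record. The field names and the choice of JSON are assumptions made for the example; the disclosure does not specify a meta-data format.

```python
import json


def adaptation_metadata(focus_distance_m, zoom_factor, audio_delay_s,
                        temperature_c=None):
    """Serialize hypothetical adaptation parameters to a JSON string."""
    record = {
        "focus_distance_m": focus_distance_m,
        "zoom_factor": zoom_factor,
        "audio_delay_s": audio_delay_s,
    }
    # Temperature is optional because it may not be measured by every device.
    if temperature_c is not None:
        record["temperature_c"] = temperature_c
    return json.dumps(record, sort_keys=True)


# Example: parameters for a subject at 10 m recorded with a 4x zoom.
metadata = adaptation_metadata(10.0, 4.0, 0.05, temperature_c=20.0)
restored = json.loads(metadata)
```

Such a record could be stored next to other parameters such as camera type or GPS coordinates, so that playback software can reapply or undo the adaptation.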
In a further alternate embodiment, at least two audio tracks may be recorded. A first audio track may include the originally captured sound without adaptation, and a second audio track may include the captured sound after mixing and with the delay applied. If different audio channels are used, recording of the adaptation parameters may be performed in parallel with the recorded audio channel(s). The adaptation parameters require less data space than an audio stream. In particular, plenoptic recordings, which allow the focus to be adjusted at a later time, use multiple audio tracks with different audio focus depths. The recording of one or more additional audio tracks will therefore be useful. The user can not only select a new optical focus distance (objects being displayed sharp) but also adjust the audio to that different optical focus depth.
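The multi-track arrangement can be illustrated by a small sketch that describes one unadapted track plus one track per audio focus depth, storing parameters rather than audio data. The function name, the dictionary keys and the default speed of sound of 331 m/s are assumptions made for the example.

```python
def build_track_plan(focus_depths_m, c=331.0):
    """Describe an original track plus one adapted track per audio focus depth."""
    tracks = [{"track": 0, "adapted": False}]
    for index, depth_m in enumerate(focus_depths_m, start=1):
        tracks.append({
            "track": index,
            "adapted": True,
            "audio_focus_depth_m": depth_m,
            # Acoustic propagation delay for this focus depth (AFD / c).
            "acoustic_delay_s": depth_m / c,
        })
    return tracks


# Example: a plenoptic recording with two selectable audio focus depths.
plan = build_track_plan([3.31, 33.1])
```

On playback, selecting a new optical focus distance would then amount to picking the track whose audio focus depth is closest to it.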
For a plenoptics audio recording two basic approaches are possible. All microphones are recorded and the mixing process is done later, or according to
Although embodiments which incorporate the teachings of the present disclosure have been shown and described in detail herein, those skilled in the art can readily devise many other varied embodiments that still incorporate these teachings. Having described preferred embodiments of a system and method for enhancing content (which are intended to be illustrative and not limiting), it is noted that modifications and variations can be made by persons skilled in the art in light of the above teachings. It is therefore to be understood that changes may be made in the particular embodiments of the disclosure which are within the scope of the disclosure as outlined by the appended claims.
| Number | Date | Country | Kind |
|---|---|---|---|
| 16306748.1 | Dec 2016 | EP | regional |

| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/EP2017/083998 | 12/21/2017 | WO | 00 |