The present invention relates to a video enhancement technique for enhancing a minute change in a video.
Video magnification is a video enhancement technique which detects only a desired minute change (a color change or motion) in an input video and enhances and visualizes the detected change (for example, refer to NPLs 1 to 3). If video magnification is applied, it is possible, for example, to input a video of a seemingly motionless human face and synthesize a video which enhances minute undulations of blood vessels due to pulsation, changes in facial color, and the like. Video magnification consists of multi-stage processing composed of (1) time-frequency band-pass filtering, (2) weighted enhancement filtering, and (3) addition processing. (1) In the time-frequency band-pass filtering, a time-series signal representing minute changes in an arbitrary time-frequency band is detected from a video signal. (2) In the weighted enhancement filtering, an enhanced minute signal is generated by enhancing only minute components of the obtained time-series signal. (3) In the addition processing, the enhanced minute signal is added to the original video signal. By performing these multi-stage processes, video magnification can obtain a video in which only the minute changes in the input video are enhanced and visualized.
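The three-stage pipeline above can be sketched for a single pixel's time series as follows (an illustrative Python sketch; the function name, the FFT-mask band-pass, and the uniform gain α are assumptions, not the exact processing of NPLs 1 to 3):

```python
import numpy as np

def magnify_pixel(signal, fs, f_lo, f_hi, alpha):
    """Illustrative three-stage sketch for one pixel's time series.
    The function name, the FFT-mask band-pass, and the scalar gain
    are assumptions, not the exact processing of NPLs 1 to 3."""
    # (1) time-frequency band-pass filtering: keep only [f_lo, f_hi]
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    mask = (freqs >= f_lo) & (freqs <= f_hi)
    minute = np.fft.irfft(spectrum * mask, n=len(signal))
    # (2) weighted enhancement filtering: amplify the minute component
    enhanced = alpha * minute
    # (3) addition processing: add the enhanced signal to the original
    return signal + enhanced
```

For example, with a pass band around 1 to 2.5 Hz this would lift a pulse-rate color oscillation while leaving the DC skin tone untouched.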
However, NPLs 1 to 3 use multi-step processing as described above, which makes the algorithm complex. Algorithm complexity reduces readability, makes implementation difficult, and increases computational cost. In addition, since the operation and effects of video magnification are difficult to understand, it is difficult for users to predict or interpret the behavior and results of applying the algorithm. Furthermore, the performance of each process itself is insufficient, so there is also the problem that minute changes not intended by the user are enhanced and artifacts (noise) are generated during enhancement.
An object of the present invention is to facilitate implementation of video enhancement processing which enhances a minute change in a video in view of the above technical problems.
A video synthesizing device according to a first aspect of the present invention includes: a signal conversion unit configured to extract a color signal of a predetermined resolution from an input video; a filtering unit configured to generate an enhanced minute color signal in which a minute color change included in the color signal is enhanced by applying a weighted enhanced time-frequency band-pass filter to the color signal; and a video synthesizing unit configured to synthesize an enhanced video in which the minute color change in the input video is enhanced by using the color signal and the enhanced minute color signal.
A video synthesizing device according to a second aspect of the present invention includes: a signal conversion unit configured to extract a phase signal corresponding to a desired change in motion from an input video; a filter processing unit configured to generate an enhanced minute phase signal in which a minute phase change included in the phase signal is enhanced by applying a self-addition weighted enhancement time-frequency band-pass filter to the phase signal; and a video synthesizing unit configured to synthesize an enhanced video in which minute motion changes in the input video are enhanced by using the phase signal and the enhanced minute phase signal.
The present invention integrates a plurality of processes performed in multiple stages in the related art into one filtering process. Therefore, according to the present invention, it is possible to easily implement video enhancement processing for enhancing a minute change in a video.
An embodiment of the invention will be described in detail below. Note that, in the drawings, constituent parts having the same functions will be denoted by the same numbers and redundant explanations will be omitted.
Note that, although symbols such as “{circumflex over ( )}” used in the text should be written directly above the character which immediately follows them, they are written immediately before that character due to text notation limitations. In mathematical expressions, these symbols are written in their original positions, that is, directly above the characters.
A first embodiment of the present invention is a video synthesizing device and a method for detecting a minute color change in an arbitrary time-frequency band in a video and synthesizing a video enhancing the detected minute color change. As shown in
A video synthesizing device is, for example, a special device configured by reading a special program into a publicly known or dedicated computer having a central processing unit (CPU), a main storage device (random access memory: RAM), and the like. The video synthesizing device performs each process under the control of, for example, a central processing unit. Data input to the video synthesizing device and data obtained in each process are stored in, for example, a main storage device and data stored in the main storage device is read out to the central processing unit as needed and used for other processes. At least a part of each processing unit included in the video synthesizing device may be configured by hardware such as an integrated circuit.
A video synthesizing method performed by the video synthesizing device 1 of the first embodiment will be described below with reference to
A target video signal is input to the video synthesizing device 1. The target video signal is, for example, a digital video signal such as an RGB signal or a YIQ signal. In the embodiment, it is assumed that the target video signal is an RGB signal and is expressed by Expression (1).
Here, (x,y) represents a pixel position and t represents a time frame index. A target video signal Ic(x, y, t) input to the video synthesizing device 1 is input to the video input unit 11.
In Step S11, the video input unit 11 selects one or more color signals corresponding to a minute color change to be enhanced from the input target video signal Ic (x, y, t). In the following description, assuming that a color signal corresponding to green is selected (that is, c=g), the selected color signal (hereinafter also referred to as a “target color signal”) is denoted by Ig (x, y, t). When selecting a color signal corresponding to red, the target color signal should be read as Ir (x, y, t). Similarly, when selecting a color signal corresponding to blue, the target color signal should be read as Ib (x, y, t). The video input unit 11 outputs the target color signal Ig (x, y, t) to the signal conversion unit 12 and the addition unit 14 and outputs the color signal Ir (x, y, t) and Ib (x, y, t) other than the target color signal Ig (x, y, t) to the video synthesizing unit 15.
In Step S12, the signal conversion unit 12 receives the target color signal Ig (x, y, t) from the video input unit 11 and converts the target color signal Ig (x, y, t) into a multi-resolution representation. For example, it may be converted into a multi-resolution representation called a Gaussian pyramid, defined by Expression (2).
Here, N represents the number of resolutions and n represents the resolution index.
The signal conversion unit 12 selects a color signal Ing (x, y, t) with a predetermined resolution n from the target color signal {Ing (x, y, t)| n=1, . . . , N} converted to a multi-resolution representation. The signal conversion unit 12 outputs the target color signal Ing (x, y, t) of the selected resolution n to the filtering unit 13.
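Expression (2) is not reproduced in this text, so the following sketch assumes the standard blur-then-subsample construction of a Gaussian pyramid (the 5-tap binomial kernel is the usual choice, not necessarily the one in Expression (2)):

```python
import numpy as np

def gaussian_pyramid(frame, num_levels):
    """Sketch of a Gaussian pyramid: each level is a blurred, half-size
    copy of the previous one. The 5-tap binomial kernel is an assumption."""
    k = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
    levels = [frame]
    for _ in range(num_levels - 1):
        f = levels[-1]
        # separable blur along both axes, then drop every other sample
        f = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 0, f)
        f = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, f)
        levels.append(f[::2, ::2])
    return levels
```

The resolution index n of the text then corresponds to selecting one element of the returned list.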
In Step S13, the filtering unit 13 receives the target color signal Ing (x, y, t) of resolution n from the signal conversion unit 12 and applies weighted enhancement time-frequency band-pass filtering to the target color signal Ing (x, y, t) of resolution n on the basis of a predetermined enhancement rate α∈R and a time frequency ft∈R arbitrarily selected by a user, as shown in Expression (3).
Here, Bng (x, y, t) represents an enhanced minute color signal obtained by enhancing only the minute color signal at the time frequency ft with the enhancement rate α. k∈[−K, K] represents the range for filtering. That is to say, a window width for filtering is 2K+1. Parameters d, ε, and σ will be described later. The filtering unit 13 outputs the generated enhanced minute color signal Bng (x, y, t) to the addition unit 14.
The processing of the filtering unit 13 will be described in more detail below. In Expression (3), the part which implements the time-frequency band-pass filtering is sLOG (k; σ). Furthermore, A(d; ε) is the part which implements the weighted enhancement filtering process. The steps will be described in order below.
The role of sLOG (k; σ) in Expression (3) will be explained. First, LOG (k; σ) is a filter called the LOG (Laplacian of Gaussian) filter. The LOG filter is defined as the second derivative of the Gaussian function, as shown in Expression (4).
Time-frequency band-pass filtering can be performed by convolving the LOG filter with the color signal. Here, the problem is how to choose the optimum window width 2K+1 and the parameter σ of LOG (k; σ). In NPLs 1 and 2, the window width 2K+1 = fs/(4ft) and the parameter σ = fs/(4√2·ft) used in the field of video matching are adopted. However, from the viewpoint of time-frequency band-pass filtering, this window width 2K+1 and parameter σ are not suitable, and thus the time-frequency selectivity is poor. Therefore, this embodiment sets the optimal window width 2K+1 and parameter σ for time-frequency band-pass filtering.
First, a method for setting the optimum window width 2K+1 will be described. Assuming that the sampling frequency of the target video signal is fs, the frequency resolution Δf of LOG (k; σ) satisfies Expression (5) from a time-frequency trade-off relationship.
That is to say, increasing the window width improves the frequency resolution of LOG (k; σ). However, the larger the window width, the higher the calculation cost. Therefore, in the present invention, the window width is set adaptively as shown in Expression (6).
As a result, the frequency bins of LOG (k; σ) are set as in Expression (7).
By configuring in this way, the direct current (DC) component is assigned to f=0 and all frequency band components lower than f=ft, other than the DC component, are assigned to f=ft/2. Therefore, the DC component can be clearly separated out and the ideal minimum frequency resolution is obtained.
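Expressions (5) to (7) are not reproduced in this text, so the following sketch assumes the rule they imply: the DFT bin spacing of a length-(2K+1) window is Δf = fs/(2K+1), and placing bins at 0, ft/2, ft, . . . as described requires 2K+1 ≈ 2·fs/ft (the function name is illustrative):

```python
import numpy as np

def adaptive_window_width(fs, ft):
    """Assumed adaptive window width 2K+1 ≈ 2·fs/ft, which places DFT
    bins at approximately 0, ft/2, ft, ... as described in the text."""
    K = int(np.floor(fs / ft))  # rounds 2·fs/ft to an odd length 2K+1
    return 2 * K + 1
```

For fs = 30 fps and ft = 3 Hz this gives a 21-tap window with bin spacing 30/21 ≈ 1.43 Hz, close to the ideal ft/2 = 1.5 Hz.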
Subsequently, a method for setting the optimum parameter σ will be described. From the point of view of time-frequency band-pass filtering, it is desirable that LOG (k; σ) has a maximum frequency response at time-frequency ft. Therefore, in this embodiment, the optimum parameter σ is obtained by solving the optimization problem of Expression (8).
[Math. 8]
ft = argmax_{f>0} F2K+1[LOG(k; σ)](f)  (8)
Here, F2K+1[·](f) represents the (2K+1)-point one-dimensional Fourier transform. The optimization problem of Expression (8) can be solved in closed form as in Expression (9).
The optimal parameter σ determined above allows LOG (k; σ) to have the maximum frequency response at the time frequency ft.
LOG (k; σ) has the maximum frequency response at the time frequency ft; by normalizing that maximum value to 1, minute color changes are multiplied purely by α when the enhancement rate α is applied, making it easier to control the degree of enhancement. Therefore, sLOG (k; σ), obtained by scaling LOG (k; σ), is defined as in Expression (10).
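The scaled kernel sLOG(k; σ) can be sketched as follows. Expression (9) is not reproduced here, so the closed form σ = fs/(√2·π·ft), which is the continuous-domain maximizer of the LoG magnitude response, is assumed; the kernel is then scaled so that its frequency response at ft equals 1, as described above:

```python
import numpy as np

def slog_filter(fs, ft, K):
    """Sketch of the scaled LoG kernel sLOG(k; σ). The closed form
    sigma = fs / (sqrt(2)*pi*ft) is an assumption (Expression (9) is not
    reproduced); it maximizes the continuous LoG frequency response at ft.
    The kernel is scaled so its response magnitude at ft equals 1."""
    sigma = fs / (np.sqrt(2.0) * np.pi * ft)
    k = np.arange(-K, K + 1, dtype=float)
    # LoG: second derivative of a Gaussian (Expression (4))
    log = (k**2 / sigma**4 - 1.0 / sigma**2) * np.exp(-k**2 / (2.0 * sigma**2))
    # scale so that the frequency response at f = ft has magnitude 1
    response_at_ft = np.abs(np.sum(log * np.exp(-2j * np.pi * (ft / fs) * k)))
    return log / response_at_ft
```

With this scaling, a minute oscillation at ft passes through the filter with unit gain before the enhancement weight is applied.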
The role of A(d; ε) in Expression (3) will be explained. First, A(d; ε) is defined as in Expressions (11) to (13).
As shown in Expressions (11) to (13), A(d; ε) weights the enhancement rate α on the basis of the variation d of the color signal Inc (x, y, t) and is necessary for enhancing only minute color changes. Here, √(2ln2)·σa = ε indicates the half width at half maximum, and the enhancement rate is designed so that when d = ε is satisfied it is exactly half, A(d; ε) = (α−1)/2. By configuring in this manner, when the variation d of the color signal is too large relative to the value ε selected by the user, the enhancement rate becomes 0, making it possible to enhance only minute color signals.
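A form of A(d; ε) consistent with this description, a Gaussian weight whose half width at half maximum is ε so that A(ε; ε) = (α−1)/2, can be sketched as follows (Expressions (11) to (13) are not reproduced, so this is an assumed reconstruction):

```python
import numpy as np

def enhancement_weight(d, eps, alpha):
    """Assumed form of A(d; ε): a Gaussian weight with half width at half
    maximum ε, peaking at (α − 1) for d = 0, so that the original signal
    plus the filtered output is multiplied by α overall for minute changes
    and left unchanged (weight 0) for large changes."""
    sigma_a = eps / np.sqrt(2.0 * np.log(2.0))  # sqrt(2 ln 2)·σa = ε
    return (alpha - 1.0) * np.exp(-d**2 / (2.0 * sigma_a**2))
```

For d well beyond ε the weight decays to essentially zero, which is what suppresses enhancement of large, user-unintended changes.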
Finally, when the filter processing of Expression (3) is performed on the basis of Expressions (10) to (13), it is possible to obtain an enhanced minute color signal Bng (x, y, t) in which only the minute color signal at the time frequency ft is enhanced with the enhancement rate α.
In Step S14, the addition unit 14 receives the target color signal Ig (x, y, t) from the video input unit 11 and the enhanced minute color signal Bng (x, y, t) from the filtering unit 13, up-samples the enhanced minute color signal Bng(x, y, t) to the original resolution to obtain a signal Bg(x, y, t), and generates an enhancement target color signal {circumflex over ( )}Ig(x, y, t) by adding the signal Bg(x, y, t) to the target color signal Ig(x, y, t). The addition unit 14 outputs the generated enhancement target color signal {circumflex over ( )}Ig(x, y, t) to the video synthesizing unit 15.
In Step S15, the video synthesizing unit 15 receives the enhancement target color signal {circumflex over ( )}Ig(x, y, t) from the addition unit 14 and the color signals Ir(x, y, t) and Ib(x, y, t) other than the target color signal Ig(x, y, t) from the video input unit 11, and synthesizes these signals {circumflex over ( )}Ig(x, y, t), Ir(x, y, t), and Ib(x, y, t) to generate an enhanced video signal {circumflex over ( )}Ic(x, y, t). The video synthesizing unit 15 outputs the generated enhanced video signal {circumflex over ( )}Ic(x, y, t) from the video synthesizing device 1.
A second embodiment of the present invention is a video synthesizing device which detects a minute movement in an arbitrary time-frequency band in a video and synthesizes a video which enhances the detected minute movement and a method therefor. As shown in
The video synthesizing method performed by the video synthesizing device 2 of the second embodiment will be described below with reference to
A target video signal is input to the video synthesizing device 2. In this embodiment, it is assumed that the target video signal is the YIQ signal and is represented by Expression (14).
Here, (x,y) represents the pixel position and t represents the time frame index. The target video signal Ic(x, y, t) input to the video synthesizing device 2 is input to the video input unit 21.
In Step S21, the video input unit 21 selects a luminance signal Iy(x, y, t) from the input target video signal Ic(x, y, t). The video input unit 21 outputs the luminance signal Iy(x, y, t) to the signal conversion unit 22 and outputs the signals Ii(x, y, t) and Iq(x, y, t) other than the luminance signal Iy(x, y, t) to the video synthesizing unit 24.
In Step S22, the signal conversion unit 22 receives the luminance signal Iy(x, y, t) from the video input unit 21 and converts the luminance signal Iy(x, y, t) into analytic signals at a plurality of band frequencies ω∈Ω and in a plurality of directions θ∈Θ. These analytic signals are represented by Expression (15).
Here, an analytic signal with a certain frequency ω and a certain direction θ is given by Expression (16).
Here, Rω,θ
The signal conversion unit 22 selects a phase signal φω,θ
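The conversion to an analytic signal and the extraction of its phase can be sketched in one dimension as follows (hedged: Expression (16) is not reproduced, so a complex Gabor filter is assumed as the band-pass analytic filter, and the envelope width is an illustrative choice):

```python
import numpy as np

def local_phase(row, omega):
    """Hedged 1-D sketch of phase extraction: the analytic signal R_ω is
    approximated by convolving a scanline with a complex Gabor filter of
    spatial frequency omega (cycles per pixel); angle(R_ω) is the phase
    signal and |R_ω| the amplitude signal. The envelope width 1/omega is
    an illustrative assumption."""
    sigma = 1.0 / omega
    half = int(np.ceil(3.0 * sigma))
    k = np.arange(-half, half + 1, dtype=float)
    gabor = np.exp(-k**2 / (2.0 * sigma**2)) * np.exp(2j * np.pi * omega * k)
    analytic = np.convolve(row, gabor, mode="same")
    return np.angle(analytic), np.abs(analytic)
```

A small spatial shift of the input then appears as a proportional shift of the phase signal, which is what makes phase a proxy for local motion.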
In Step S23, the filtering unit 23 receives the phase signals φω,θ
Here, {circumflex over ( )}φω,θ
Here, √(2ln2)·σa = εω is derived from the shift theorem of the Fourier transform and converts the local motion ε selected by the user into a phase amount εω at frequency ω. By configuring in this way, when the variation d of the phase signal is too large relative to εω, based on the local motion ε selected by the user, the enhancement rate becomes 0 and only minute phase signals can be enhanced.
Also, δ(k) is defined by Expression (21).
This allows the phase signals φω,θ
That is to say, the procedure of adding the result of weighted enhancement filtering by A (d; ε) sLOG (k; σ) to the original phase signal φω,θ
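The self-addition filter described here folds the final addition step into the kernel itself. A minimal sketch (the kernel form δ(k) + A·sLOG(k; σ) is taken from the description above; the function name is illustrative):

```python
import numpy as np

def self_addition_kernel(weighted_slog):
    """Form δ(k) + A·sLOG(k; σ) from an already-weighted sLOG kernel:
    adding a unit impulse at k = 0 makes a single convolution return
    'original + enhanced' directly, removing the separate addition step."""
    kernel = weighted_slog.copy()
    kernel[len(kernel) // 2] += 1.0  # δ(k): unit impulse at the center tap
    return kernel
```

By linearity of convolution, filtering with this kernel equals the original signal plus its weighted-enhanced band-pass component.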
In Step S24, the video synthesizing unit 24 receives the amplitude signal Aω,θ
Subsequently, the video synthesizing unit 24 generates an enhanced luminance signal {circumflex over ( )}Iy (x, y, t) in which only minute movements are enhanced from the set of enhanced analysis signals {circumflex over ( )}Rω,θ
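The recombination in Step S24 can be sketched as follows (hedged: the collapse over all ω and θ is not reproduced in this text, so only the per-band recombination ^R = A·exp(i·^φ) is shown):

```python
import numpy as np

def reconstruct_analytic(amplitude, enhanced_phase):
    """Recombine the amplitude signal with the enhanced phase signal to
    obtain the enhanced analytic signal ^R = A·exp(i·^φ); collapsing these
    signals over all bands and directions then yields the enhanced
    luminance (the collapse itself is omitted from this sketch)."""
    return amplitude * np.exp(1j * enhanced_phase)
```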
The present invention simplifies the video enhancement processing algorithm for enhancing a minute change in a video. Specifically, the multi-stage processing performed in conventional video magnification is integrated into a single filtering process. This makes the algorithm more readable and easier to implement. In addition, because each process included in the conventional multi-step processing is reviewed while being integrated into a single filtering process, the performance of video magnification itself is improved: the minute color changes and movements that users expect can be enhanced, and the occurrence of artifacts during enhancement is reduced. Furthermore, when enhancing minute movements, memory usage can be reduced at the same time.
Although the embodiments of the present invention have been described above, the specific configuration is not limited to these embodiments and it is needless to say that the present invention includes any appropriate design changes without departing from the gist of the present invention. The various processes described in the embodiments are not only performed in chronological order in accordance with the described order, but may also be performed in parallel or individually according to the processing capacity of the device that performs the processes or as necessary.
When the various processing functions of each device described in the above embodiments are realized by a computer, the processing contents of the functions that each device needs to have are described by a program. Furthermore, various processing functions in each of the devices described above are realized on the computer by operating the arithmetic processing unit 1010, the input unit 1030, the output unit 1040, and the like by loading this program into the storage unit 1020 of the computer shown in
A program describing the contents of this processing can be recorded on a computer-readable recording medium. Computer-readable recording media are, for example, non-transitory recording media such as magnetic recording devices and optical discs.
Also, this program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. In addition, the program may be distributed by storing it in a storage device of a server computer and transferring it from the server computer to other computers via a network.
A computer which executes such a program, for example, first stores the program recorded on a portable recording medium or transferred from a server computer in the auxiliary recording unit 1050, which is its own non-transitory storage device. When performing the processing, the computer reads the program stored in the auxiliary recording unit 1050 into the storage unit 1020, which is a temporary storage device, and performs processing in accordance with the read program. As another execution form of this program, the computer may read the program directly from the portable recording medium and perform processing in accordance with the program, or may sequentially execute processing in accordance with the received program each time the program is transferred from the server computer to the computer. Moreover, the above-described processing may be performed by a so-called Application Service Provider (ASP) type service, which does not transfer the program from the server computer to the computer and realizes the processing function only through an execution instruction and result acquisition. Note that the program in this embodiment includes information which is to be used for processing by a computer and is equivalent to a program (such as data which is not a direct command to a computer but has the property of prescribing computer processing).
Moreover, in this embodiment, the device is configured by executing a predetermined program on a computer, but at least a part of these processing contents may be implemented by hardware.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2021/018210 | 5/13/2021 | WO |