The present invention relates to a coding method, a coding apparatus, and a program.
A camera may capture a video of a physical body having a flat outer appearance, such as a painting, a tablet, or the ground (hereinafter referred to as a "flat physical body"). The shape, size, and position of the image of the physical body captured in each frame of a moving image vary according to the movement of the physical body and the movement of the camera. A coding apparatus may compensate for the movement of the image of the physical body (motion compensation) so that the shape, size, and position of the image of the captured flat physical body are the same in each frame of the moving image.
MPEG-4 advanced simple profile (ASP), which is one of the moving image coding standards, employs a motion compensation method called global motion compensation (GMC). The coding apparatus determines a two-dimensional motion vector for each corner of the frame of the moving image to perform motion compensation.
Reference 1: ISO/IEC 14496-2:2004 Information technology—Coding of audio-visual objects—Part 2: Visual
Reference 2: F. Zou, J. Chen, M. Karczewicz, X. Li, H.-C. Chuang, W.-J. Chien, "Improved affine motion prediction", JVET-C0062, May 2016
Reference 3: M. Narroschke, R. Swoboda, "Extending HEVC by an affine motion model", Picture Coding Symposium 2013
If the value of "no_of_sprite_warping_points" is 3, the coding apparatus uses affine transformation to perform motion compensation. The degree of freedom of affine transformation is lower than the degree of freedom of projective transformation.
If the value of "no_of_sprite_warping_points" is 2, the coding apparatus uses similarity transformation to perform motion compensation. The degree of freedom of similarity transformation is also lower than the degree of freedom of projective transformation.
Thus, a method of adaptively switching the value of "no_of_sprite_warping_points" between 2 and 3 has been proposed in a draft of the Joint Video Exploration Team (JVET) on future video coding.
Motion compensation using a transformation equivalent to the affine transformation used when the value of "no_of_sprite_warping_points" is 3 has also been proposed. In H.264/advanced video coding (AVC) and H.265/high efficiency video coding (HEVC), the coding apparatus performs motion compensation only for the deformation of images of a physical body performing a translational (non-rotational) movement between frames. This motion compensation corresponds to the case where the value of "no_of_sprite_warping_points" is 1.
The relation between coordinates in two-dimensional images (frames) of a flat physical body (rigid body) existing in a three-dimensional space, captured by a camera while the camera moves, can be expressed as in Equation (1).
The coding apparatus performs projective transformation on the basis of “h11, . . . , h32” and Equations (2) to (5) to derive a point (x′, y′) of the frame 401 corresponding to a point (x, y) of the frame 400. The 3×3 matrix “H” in Equation (2) is a homography matrix.
Eight parameters (x′1, y′1, . . . , x′4, y′4) representing the movement destinations of the four known points in the frame 400 are the parameters needed by the coding apparatus to transform the point (x, y) into the point (x′, y′). This means that the number of variables "h11, . . . , h32" in the homography matrix H is eight, and that the global motion compensation of ASP in MPEG-4 corresponds to "no_of_sprite_warping_points = 4" (number of parameters = 8).
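Equations (1) to (5) themselves are not reproduced in this text. As a reference sketch only, the textbook homography formulation consistent with the description above is the following (the notation, in particular the scale factor λ, may differ from the original equations):

  λ(x′, y′, 1)ᵀ = H(x, y, 1)ᵀ, where

      [ h11  h12  h13 ]
  H = [ h21  h22  h23 ]
      [ h31  h32  1   ]

so that

  x′ = (h11·x + h12·y + h13) / (h31·x + h32·y + 1)
  y′ = (h21·x + h22·y + h23) / (h31·x + h32·y + 1).

With h33 normalized to 1, the eight variables h11, . . . , h32 remain, matching the eight parameters described above.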
In this way, when an image of a flat physical body captured by a camera that is moving is deformed in accordance with, for example, a relative position of the camera and the physical body, the coding apparatus performs motion compensation using projective transformation on the basis of eight parameters. Furthermore, also when an image of a physical body in a still state having any shape captured by a camera at a fixed position is deformed in accordance with a camera parameter of the camera, the coding apparatus performs motion compensation using projective transformation on the basis of eight parameters.
However, the physical deformation of a flat physical body is limited. Thus, the number of degrees of freedom of the physical deformation of the flat physical body is smaller than the number of degrees of freedom of the deformation that can be expressed by projective transformation (eight parameters).
In view of the above circumstances, an object of the present invention is to provide a coding method, a coding apparatus, and a program capable of improving the coding efficiency of an image.
One aspect of the present invention is a coding method for coding an image to be coded using a reference image, the coding method including identifying a reference area being a part of the reference image, the reference area corresponding to an area to be coded being an area obtained by dividing the image to be coded, and obtaining a predicted area with respect to the area to be coded, by prediction using the reference area, wherein the area to be coded and the reference area have different sizes and/or different shapes, and in the identifying, the reference area is identified by utilizing a difference between a manner of projection of an object corresponding to the area to be coded and a manner of projection of the object corresponding to the reference area, due to an operation performed on a camera when the image to be coded and the reference image are acquired.
One aspect of the present invention is a coding apparatus for coding an image to be coded using a reference image, the coding apparatus including an identification unit configured to identify a reference area being a part of the reference image, the reference area corresponding to an area to be coded being an area obtained by dividing the image to be coded, and a predictor configured to obtain a predicted area with respect to the area to be coded, by prediction using the reference area, wherein the area to be coded and the reference area have different sizes and/or different shapes, and the identification unit identifies the reference area by utilizing a difference between a manner of projection of an object corresponding to the area to be coded and a manner of projection of the object corresponding to the reference area, due to an operation performed on a camera when the image to be coded and the reference image are acquired.
One aspect of the present invention provides a program that causes a computer to operate as the coding apparatus mentioned above.
According to the present invention, it is possible to improve the coding efficiency of an image.
An embodiment of the present invention will be described below with reference to the drawings.
In VVC (NPL 1), which is currently being standardized, reference areas used at the time of predicting blocks to be coded are not required to have the same shape and size. This is because affine motion compensation prediction, which is expected to be implemented in VVC and later standards, can be utilized. However, the affine motion compensation prediction expected to be implemented in VVC uses motion vectors related to four vertices of the block to be coded to identify a reference area. When motion vectors related to the four vertices are used, eight parameters need to be used (because each motion vector defines a movement in the xy-plane, that is, two components). That is, eight parameters are transmitted to the decoding apparatus for each block to be coded. In VVC, eight parameters are used to identify the reference area, regardless of the relationship between the shape and size of the block to be coded and the shape and size of the reference area.
However, it is assumed that in some cases, the above-mentioned relationship can be identified without using eight parameters, and thus, there remains room for improvement in coding efficiency. On the other hand, the coding apparatus uses projective transformation to express the deformation of the image of a physical body. Because the physical deformation of a physical body is limited, the coding apparatus can use projective transformation employing fewer than eight parameters (degrees of freedom) to express the deformation of images of the physical body in frames of a moving image. The coding apparatus can improve the coding efficiency of images by highly accurate motion compensation using projective transformation with any number N (N being an integer from 1 to 4) of parameters (degrees of freedom), which is fewer than eight.
When the relationship mentioned above is broken down into subordinate concepts and organized, the number of parameters required for identification can be reduced. Specifically, the minimum number of parameters required for identifying the relationship in shape and size is determined by which of pan, tilt, roll, and zoom, or which combination thereof, is the change (operation) performed on the camera between the time when the image to be coded is captured and the time when the reference image is captured. The relationship broken down into subordinate concepts can be derived from the change performed on the camera between these two times, and thus, a camera parameter is utilized for estimating the relationship broken down into the subordinate concepts. In other words, for a correlation that is low due to a difference between the manner in which a predetermined object is projected in the image to be coded and the manner in which it is projected in the reference image, the difference in the manners of projection is identified and corrected, so that the correlation is increased.
When the coding apparatus uses one parameter, the parameter is obtained from any one of pan, tilt, roll, and zoom. When the coding apparatus uses two parameters, the parameters are obtained from any two of pan, tilt, roll, and zoom; three parameters are obtained from any three of them; and four parameters are obtained from all four. The coding apparatus uses a camera parameter related to the image to be coded and a camera parameter related to the reference image to identify the operation performed on the camera, and determines the number of parameters in accordance with the identified operation; a brief illustrative sketch of this correspondence is given below, followed by a description of a specific configuration.
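The following minimal sketch illustrates this correspondence only; the function and its interface are assumptions for illustration, not the apparatus's actual interface:

```python
# Hypothetical sketch (not the apparatus's actual interface): the number N
# of projective-transformation parameters equals the number of distinct
# camera operations (pan, tilt, roll, zoom) identified between the
# reference image and the image to be coded.
CAMERA_OPERATIONS = {"pan", "tilt", "roll", "zoom"}

def determine_parameter_count(identified_ops):
    """Return N (an integer from 1 to 4) for the identified operations."""
    ops = set(identified_ops) & CAMERA_OPERATIONS
    if not ops:
        raise ValueError("no supported camera operation identified")
    return len(ops)

# Example: pan and zoom identified -> two parameters.
assert determine_parameter_count({"pan", "zoom"}) == 2
```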
The coding apparatus 1 includes a camera parameter determination unit 10, a parameter number determination unit 11, a motion vector determination unit 12, a subtractor 13, a transformer 14, a quantizer 15, an entropy coder 16, an inverse quantizer 17, an inverse transformer 18, an adder 19, a distortion removal filter 20, a frame memory 21, an intra-frame predictor 22, a motion compensator 23, and a switch 24.
Each functional unit other than the motion compensator 23 in the coding apparatus 1 may operate according to a well-known moving image coding standard such as “H.265/HEVC” and “H.264/AVC”. A part of the motion compensator 23 in the coding apparatus 1 may operate according to a well-known moving image coding standard.
A part or all of the coding apparatus 1 is achieved as software by a processor, such as a central processing unit (CPU) or a graphics processing unit (GPU), executing a program stored in a memory that is a nonvolatile recording medium (non-transitory recording medium). A part or all of the coding apparatus 1 may instead be achieved by using hardware such as a large scale integration (LSI) circuit or a field programmable gate array (FPGA).
The camera parameter determination unit 10 determines a camera parameter, based on a signal representing a moving image to be coded (hereinafter referred to as a "moving image signal"). For example, the camera parameter determination unit 10 determines the internal matrix A of the camera as the camera parameter A. The internal matrix A of the camera is a 3×3 matrix representing the focal length, the pixel size, and the image center of the camera. When the zoom function of the camera is utilized to capture a moving image, the focal length of the camera changes; in that case, the camera parameter determination unit 10 determines the internal matrix A′ of the camera as the camera parameter A′. That is, when the zoom is not utilized, the camera parameter A′ is equal to A. The camera parameter determination unit 10 outputs the determination result of the camera parameter, as the camera parameter signal, to the outside, the parameter number determination unit 11, the motion vector determination unit 12, and the motion compensator 23.
The parameter number determination unit 11 determines the number of parameters required for projective transformation of the image to be coded represented by the moving image signal, based on the moving image signal and the camera parameter signal. The parameter number determination unit 11 uses a camera parameter related to the image to be coded and a camera parameter related to the reference image to identify an operation performed on the camera, and determines the number of parameters in accordance with the identified operation. The motion vector determination unit 12 determines a motion vector, based on the moving image signal, the camera parameter signal, and the number of parameters. Specifically, the motion vector determination unit 12 outputs the motion vector, based on the number of parameters and a position in the image determined in advance according to the number of parameters. For example, when the number of parameters is 1 or 2, the motion vector determination unit 12 outputs a motion vector at the upper left corner of the image, and when the number of parameters is 3 or 4, the motion vector determination unit 12 outputs motion vectors at the upper left corner and the lower right corner of the image. Note that the positions in the image are not limited to those described above.
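A minimal sketch of the position selection described above (the function name and the exact corner coordinates are assumptions for illustration):

```python
# Hypothetical sketch of the motion vector determination unit's position
# selection: N = 1 or 2 uses the upper left corner only; N = 3 or 4 uses
# the upper left and lower right corners (other positions are possible).
def motion_vector_positions(n_params, width, height):
    upper_left = (0, 0)
    lower_right = (width - 1, height - 1)
    if n_params in (1, 2):
        return [upper_left]
    if n_params in (3, 4):
        return [upper_left, lower_right]
    raise ValueError("N must be an integer from 1 to 4")

print(motion_vector_positions(3, 1920, 1080))  # [(0, 0), (1919, 1079)]
```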
The subtractor 13 subtracts a predicted signal from the moving image signal. The predicted signal is generated for each predetermined unit of processing (area to be coded) by the intra-frame predictor 22 or the motion compensator 23. In H.265/HEVC, the predetermined unit of processing is a prediction unit. The subtractor 13 outputs a predicted residue signal resulting from the subtraction, to the transformer 14. The transformer 14 applies a discrete cosine transform to the predicted residue signal. The quantizer 15 quantizes a result of the discrete cosine transform. The entropy coder 16 performs entropy coding on a result of the quantization. The entropy coder 16 outputs coded data resulting from the entropy coding to an external apparatus (not illustrated) such as a decoding apparatus.
The inverse quantizer 17 performs inverse quantization on the result of the quantization. The inverse transformer 18 applies an inverse discrete cosine transform to a result of the inverse quantization. The adder 19 calculates a sum of a result of the inverse discrete cosine transform and the predicted signal to generate a decoded image. The distortion removal filter 20 removes a distortion from the decoded image to generate a decoded image signal from which the distortion is removed.
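The forward and inverse paths described above can be sketched as follows, assuming a uniform scalar quantizer and a whole-block orthonormal DCT (the actual apparatus follows the transforms, quantization, and entropy coding of the relevant standard, per prediction unit):

```python
import numpy as np
from scipy.fft import dctn, idctn

def encode_decode_block(block, predicted, q_step=16.0):
    """Sketch of subtractor -> transformer -> quantizer and the inverse
    path (inverse quantizer -> inverse transformer -> adder)."""
    residual = block - predicted                 # subtractor 13
    coeffs = dctn(residual, norm="ortho")        # transformer 14
    levels = np.round(coeffs / q_step)           # quantizer 15
    # entropy coder 16 would code `levels`; omitted in this sketch
    dequant = levels * q_step                    # inverse quantizer 17
    rec_residual = idctn(dequant, norm="ortho")  # inverse transformer 18
    return predicted + rec_residual              # adder 19 (decoded image)

block = np.random.rand(8, 8) * 255
predicted = np.full((8, 8), 128.0)
decoded = encode_decode_block(block, predicted)
```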
The frame memory 21 stores the decoded image signal (reference image) from which the distortion is removed. The decoded image signal stored in the frame memory 21 is the same as the decoded image signal generated by the decoding apparatus. The frame memory 21 deletes a decoded image signal that has been stored for a predetermined time or longer. Note that the frame memory 21 may store a decoded image signal of a long-time reference frame until the frame memory 21 acquires a deletion instruction, and need not store the decoded image signal of a frame that is not used as a reference.
The intra-frame predictor 22 executes intra-frame prediction processing on the decoded image signal to generate a predicted signal according to a result of the intra-frame prediction processing. The motion compensator 23 executes motion compensation prediction processing on the decoded image signal to generate a predicted signal according to a result of the motion compensation prediction processing. For example, the motion compensator 23 identifies a reference area that is a part of the reference image represented by the decoded image signal and makes a prediction using the reference area to obtain a predicted area with respect to the area to be coded. The area to be coded and the reference area have different sizes and/or different shapes. The switch 24 outputs, to the subtractor 13, either the predicted signal according to the result of the intra-frame prediction processing or the predicted signal according to the result of the motion compensation prediction processing.
Next, a configuration example of the motion compensator 23 will be described.
Motion compensation modes include a first mode and a second mode. The first mode is a motion compensation mode on the basis of inter-frame prediction processing in a well-known moving image coding standard such as “H.265/HEVC” and “H.264/AVC”. The second mode is a motion compensation mode in which a homography matrix on the basis of one or more motion vectors (the N-parameter signal) is used to execute projective transformation for each unit of projective transformation on the decoded image signal stored in the frame memory 21.
The analyzer 231 acquires a plurality of frames (hereinafter, referred to as a “frame group”) of the moving image in a predetermined time period (time interval) as the moving image signal. Furthermore, the analyzer 231 acquires a camera parameter signal for each frame from the camera parameter determination unit 10. The analyzer 231 determines whether or not the acquired frame group is a frame group captured in a time period during which the camera parameter is constant. The accuracy of the projective transformation using the homography matrix is high for a frame group captured in a time period during which the camera parameter is constant, and thus, the second mode motion compensation is more suitable than the first mode motion compensation.
When the analyzer 231 determines that the frame group is a frame group captured in a time period during which the camera parameter is not constant, the analyzer 231 generates a motion compensation mode signal representing the first mode (hereinafter, referred to as a “first motion compensation mode signal”). The analyzer 231 outputs the first motion compensation mode signal to the inter-frame predictor 232 and the switch 235.
When the analyzer 231 determines that the frame group is a frame group captured in a time period during which the camera parameter is constant, the analyzer 231 generates a motion compensation mode signal representing the second mode (hereinafter, referred to as a “second motion compensation mode signal”). The analyzer 231 outputs the second motion compensation mode signal to the matrix generator 233 and the switch 235.
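A sketch of this mode decision, assuming the per-frame camera parameters are supplied as comparable values (the equality test stands in for whatever constancy criterion the analyzer actually applies):

```python
# Hypothetical sketch of the analyzer's mode decision: the second mode
# (projective-transformation motion compensation) is chosen only for a
# frame group captured while the camera parameter is constant.
def select_motion_compensation_mode(camera_params_per_frame):
    first = camera_params_per_frame[0]
    is_constant = all(p == first for p in camera_params_per_frame)
    return "second" if is_constant else "first"

# Example: the focal length changes in the last frame -> first mode.
print(select_motion_compensation_mode([(1000, 960, 540)] * 3 + [(1100, 960, 540)]))
```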
When the inter-frame predictor 232 acquires the first motion compensation mode signal from the analyzer 231, the inter-frame predictor 232 acquires the decoded image signal from the frame memory 21. The inter-frame predictor 232 acquires the moving image signal from the analyzer 231. The inter-frame predictor 232 executes motion compensation on the basis of the inter-frame prediction processing in a well-known moving image coding standard, with respect to the decoded image signal. The inter-frame predictor 232 outputs a predicted signal on the basis of the first mode motion compensation, to the switch 235.
When the matrix generator 233 acquires the second motion compensation mode signal from the analyzer 231, the matrix generator 233 acquires the frame group and the camera parameter signal from the analyzer 231. When the matrix generator 233 acquires the second motion compensation mode signal from the analyzer 231, the matrix generator 233 acquires the decoded image signal from the frame memory 21. When the matrix generator 233 acquires the second motion compensation mode signal from the analyzer 231, the matrix generator 233 acquires the motion vector from the motion vector determination unit 12.
The matrix generator 233 outputs the N-parameter signal for each frame to an external apparatus (not illustrated) such as a decoding apparatus and the projective transformer 234. The matrix generator 233 outputs the N-parameter signal to an external apparatus (not illustrated) such as a decoding apparatus and the projective transformer 234, for each unit of projective transformation defined in the decoded image. The external apparatus such as the decoding apparatus may use the output camera parameter signal and the output N-parameter signal to derive a homography matrix. The matrix generator 233 uses the camera parameter signal and the motion vector to generate a homography matrix “H”. For example, the matrix generator 233 identifies the reference area by utilizing a difference between a manner of projection of an object corresponding to the area to be coded and a manner of projection of the object corresponding to the reference area, due to an operation performed on the camera. The operations performed on the camera are the above-described pan, tilt, roll, and zoom.
The projective transformer 234 executes projective transformation using the homography matrix “H” on the decoded image signal stored in the frame memory 21. The projective transformer 234 outputs a predicted signal on the basis of the second mode motion compensation to the switch 235.
The motion vector determination unit 12 determines a motion vector, based on the moving image signal, the camera parameter signal, and the number of parameters (step S103). The motion vector determination unit 12 outputs a determination result of the motion vector to the motion compensator 23. The subtractor 13 generates a predicted residue signal (step S104). The transformer 14 applies a discrete cosine transform to the predicted residue signal. The quantizer 15 quantizes a result of the discrete cosine transform (step S105). The entropy coder 16 performs entropy coding on a result of the quantization (step S106).
The inverse quantizer 17 performs inverse quantization on the result of the quantization. The inverse transformer 18 applies an inverse discrete cosine transform to a result of the inverse quantization (step S107). The adder 19 calculates a sum of a result of the inverse discrete cosine transform and the predicted signal to generate a decoded image (step S108). The distortion removal filter 20 removes a distortion from the decoded image to generate a decoded image signal from which the distortion is removed (step S109).
The distortion removal filter 20 records the decoded image signal in the frame memory 21 (step S110). The intra-frame predictor 22 executes intra-frame prediction processing on the decoded image signal to generate a predicted signal according to a result of the intra-frame prediction processing. The motion compensator 23 executes motion compensation prediction processing on the decoded image signal to generate a predicted signal according to a result of the motion compensation prediction processing (step S111).
The analyzer 231 acquires a frame group and a camera parameter signal (step S201). The analyzer 231 determines whether or not the acquired frame group is a frame group captured in a time period during which a camera parameter “B” is constant (step S202). When the analyzer 231 determines that the acquired frame group is a frame group captured in a time period during which the camera parameter “B” is constant (step S202: YES), the analyzer 231 outputs the second motion compensation mode signal to the matrix generator 233 and the switch 235 (step S203).
The matrix generator 233 outputs the N-parameter signal for each frame to an external apparatus (not illustrated) such as a decoding apparatus (step S204). Furthermore, the matrix generator 233 outputs the N-parameter signal to an external apparatus (not illustrated) such as a decoding apparatus, for each unit of projective transformation (prediction unit) defined in the decoded image.
The matrix generator 233 uses the camera parameter signal, the decoded image signal, and the motion vector to generate the homography matrix “H” (step S205).
First, equations utilized in the following description will be described. The rotation matrices when the camera performs each of tilt (a rotation about the x-axis), pan (a rotation about the y-axis), and roll (a rotation about the z-axis) are expressed by Equations (6) below.
θx in Equations (6) represents the rotation angle about the x-axis, θy the rotation angle about the y-axis, and θz the rotation angle about the z-axis. Furthermore, the camera parameter A is expressed by Equation (7) below.
In Equation (7), ox represents half of the horizontal image size, oy represents half of the vertical image size, and fx and fy are determined from the focal length and the vertical and horizontal sizes of the pixels in the imaging plane; normally, fx = fy = f is satisfied. The space rotation amount R is expressed by Equation (8) below using Equations (6).
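Equations (6) and (7) themselves are not reproduced in this text. For reference, the textbook forms consistent with the symbol definitions above are the following (sign conventions may differ from the original equations):

  Rx(θx) = [ 1, 0, 0; 0, cos θx, −sin θx; 0, sin θx, cos θx ]
  Ry(θy) = [ cos θy, 0, sin θy; 0, 1, 0; −sin θy, 0, cos θy ]
  Rz(θz) = [ cos θz, −sin θz, 0; sin θz, cos θz, 0; 0, 0, 1 ]   (cf. Equations (6))

  A = [ fx, 0, ox; 0, fy, oy; 0, 0, 1 ]   (cf. Equation (7))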
R = Rx(θx) Ry(θy) Rz(θz)   (8)
The matrix generator 233 generates the homography matrix “H”, based on Equation (9) below.
A′RA⁻¹ in Equation (9) corresponds to the homography matrix "H". Note that, when the zoom is not utilized in capturing the image to be coded, A′ in Equation (9) is equal to the camera parameter A. In Equation (9), a point (x, y) in the decoded image signal corresponds to a point (vx/v1, vy/v1) in the moving image signal.
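Equation (9) itself is likewise not reproduced; in the standard form consistent with the description above, it reads (vx, vy, v1)ᵀ = A′RA⁻¹(x, y, 1)ᵀ. The following is a minimal numeric sketch of generating "H" under this form (all function names and numeric values are illustrative assumptions, not the apparatus's actual interface):

```python
import numpy as np

def rot_x(t):  # tilt, cf. Equations (6)
    return np.array([[1, 0, 0],
                     [0, np.cos(t), -np.sin(t)],
                     [0, np.sin(t), np.cos(t)]])

def rot_y(t):  # pan
    return np.array([[np.cos(t), 0, np.sin(t)],
                     [0, 1, 0],
                     [-np.sin(t), 0, np.cos(t)]])

def rot_z(t):  # roll
    return np.array([[np.cos(t), -np.sin(t), 0],
                     [np.sin(t), np.cos(t), 0],
                     [0, 0, 1]])

def intrinsic(f, ox, oy):  # camera parameter A, cf. Equation (7)
    return np.array([[f, 0, ox], [0, f, oy], [0, 0, 1]], dtype=float)

def homography(A, A_prime, tx, ty, tz):
    R = rot_x(tx) @ rot_y(ty) @ rot_z(tz)  # Equation (8)
    return A_prime @ R @ np.linalg.inv(A)  # H = A'RA^-1, Equation (9)

A = intrinsic(f=1000.0, ox=960.0, oy=540.0)                # illustrative values
H = homography(A, A, tx=0.0, ty=np.deg2rad(2.0), tz=0.0)   # pan only, no zoom
v = H @ np.array([0.0, 0.0, 1.0])                          # upper left origin point
print(v[0] / v[2], v[1] / v[2])                            # (vx/v1, vy/v1)
```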
Next, a specific processing for generating the homography matrix “H” by the matrix generator 233 will be described with reference to
As illustrated in the corresponding drawing, suppose that a single rotation operation (for example, a pan) is performed on the camera 31; in this case, the relation of Equation (10) holds for the upper left origin point of the screen.
In Equation (10), vx/v1 is the x-component of the motion vector of the origin point, and thus, the matrix generator 233 solves Equation (10) to obtain θ (or sin θ and cos θ) and generate the homography matrix "H" (ARA⁻¹) on the entire screen. Note that the movement of the upper left origin point is focused on in the corresponding drawing.
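A sketch of this single-unknown solve, using numeric root finding on the x-coordinate residual (the apparatus may instead use a closed-form solution; rot_y and intrinsic are reused from the sketch above):

```python
import numpy as np
from scipy.optimize import brentq

# Reuses rot_y and intrinsic from the sketch above.
def solve_pan_angle(vx_over_v1, A, lo=-0.5, hi=0.5):
    """Find theta such that H = A Ry(theta) A^-1 maps the origin (0, 0)
    to the observed x-coordinate vx_over_v1 (cf. Equation (10)). The
    bracket [lo, hi] (radians) is assumed to contain the solution."""
    def residual(theta):
        H = A @ rot_y(theta) @ np.linalg.inv(A)
        v = H @ np.array([0.0, 0.0, 1.0])
        return v[0] / v[2] - vx_over_v1
    return brentq(residual, lo, hi)
```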
As illustrated in the corresponding drawing, suppose that two operations, a rotation and a zoom, are performed on the camera 31; in this case, the relation of Equation (11) holds.
In Equation (11), (vx/v1, vy/v1) is the motion vector of the origin point, and thus, the matrix generator 233 solves Equation (11) to obtain θ (or sin θ and cos θ) and f′, to generate the homography matrix "H" (A′RA⁻¹) on the entire screen. f′ in Equation (11) is expressed as f′ = s·f. Here, s is a value representing the change ratio of f; s > 1 in the case of zooming in, and s < 1 in the case of zooming out. Note that the movement of the upper left origin point is focused on in the corresponding drawing.
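A sketch of the corresponding two-unknown solve, assuming for illustration that the rotation is a pan (the drawing specifying the operation is not reproduced); intrinsic and rot_y are again reused from the sketch above:

```python
import numpy as np
from scipy.optimize import fsolve

# Reuses intrinsic and rot_y from the sketches above.
def solve_pan_and_zoom(vx_over_v1, vy_over_v1, f, ox, oy):
    """Find (theta, s) such that H = A' Ry(theta) A^-1, with A' built
    from f' = s * f, maps the origin to the observed point
    (vx/v1, vy/v1) (cf. Equation (11))."""
    A = intrinsic(f, ox, oy)
    def residuals(params):
        theta, s = params
        H = intrinsic(s * f, ox, oy) @ rot_y(theta) @ np.linalg.inv(A)
        v = H @ np.array([0.0, 0.0, 1.0])
        return [v[0] / v[2] - vx_over_v1, v[1] / v[2] - vy_over_v1]
    theta, s = fsolve(residuals, x0=[0.0, 1.0])  # illustrative initial guess
    return theta, s
```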
Thus, when two of the pan, tilt, roll, and zoom operations are performed on the camera 31, the matrix generator 233 identifies the reference area by using a combination of parameters expressed in one dimension or a parameter expressed in two dimensions. Specifically, when the two-dimensional components of a motion vector at one specific point of the image to be coded are used, the matrix generator 233 generates the homography matrix "H" by using the two-dimensional components, a camera parameter at the time when the image to be coded is acquired, and a camera parameter at the time when the reference image is acquired, and identifies the reference area by using the generated homography matrix "H". When one-dimensional components (for example, only the x-components) of the respective motion vectors at two specific points of the image to be coded are used, the matrix generator 233 generates the homography matrix "H" by using the one-dimensional components and the two camera parameters, and identifies the reference area in the same manner. When parameters at two or more points are used, it is preferable to select points that are as far apart from each other as possible in the image plane.
As illustrated in the corresponding drawing, suppose that three of the pan, tilt, roll, and zoom operations are performed on the camera 31; in this case, the relations of Equations (12) hold.
In Equations (12), (vx1/v11, vy1/v11) and (vx2/v12, vy2/v12) are the motion vectors in the upper left corner and the lower right corner of the screen, and thus, the matrix generator 233 solves Equations (12) to obtain θ, θy, and θz (or the sines and cosines thereof) and f′, to generate the homography matrix "H" (A′RA⁻¹) on the entire screen. Thus, when three of the pan, tilt, roll, and zoom operations are performed on the camera 31, the matrix generator 233 identifies the reference area by using a parameter expressed in one dimension and a parameter expressed in two dimensions. Specifically, the matrix generator 233 uses the two-dimensional components (x-component and y-component) of the motion vector at one of two specific points of the image to be coded, a one-dimensional component (for example, only the x-component) of the motion vector at the other of the two specific points, a camera parameter at the time when the image to be coded is acquired, and a camera parameter at the time when the reference image is acquired to generate the homography matrix "H", and uses the generated homography matrix "H" to identify the reference area.
As illustrated in the corresponding drawing, suppose that all of the pan, tilt, roll, and zoom operations are performed on the camera 31; in this case, the relations of Equations (13) hold.
In Equations (13), (vx1/v11, vy1/v11) and (vx2/v12, vy2/v12) are the motion vectors in the upper left corner and the lower right corner of the screen, and thus, the matrix generator 233 solves Equations (13) to obtain θx, θy, and θz (or the sines and cosines of θx, θy, and θz) and f′, to generate the homography matrix "H" (A′RA⁻¹) on the entire screen.
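The three- and four-parameter cases follow the same pattern with more unknowns. The following hypothetical sketch handles the four-parameter case of Equations (13), in which two full motion vectors supply four constraints; it reuses intrinsic and the rotation helpers above:

```python
import numpy as np
from scipy.optimize import fsolve

# Reuses intrinsic, rot_x, rot_y, rot_z from the sketches above.
def solve_four_params(p1, p1_obs, p2, p2_obs, f, ox, oy):
    """Find (theta_x, theta_y, theta_z, s) from two full motion vectors,
    e.g. at the upper left and lower right corners (cf. Equations (13))."""
    A_inv = np.linalg.inv(intrinsic(f, ox, oy))
    def residuals(params):
        tx, ty, tz, s = params
        H = intrinsic(s * f, ox, oy) @ rot_x(tx) @ rot_y(ty) @ rot_z(tz) @ A_inv
        out = []
        for (px, py), (qx, qy) in ((p1, p1_obs), (p2, p2_obs)):
            v = H @ np.array([px, py, 1.0])
            out += [v[0] / v[2] - qx, v[1] / v[2] - qy]
        return out
    return fsolve(residuals, x0=[0.0, 0.0, 0.0, 1.0])  # illustrative initial guess
```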
The projective transformer 234 executes the second mode motion compensation on the decoded image signal stored in the frame memory 21 by performing projective transformation using the homography matrix “H” (step S206). The projective transformer 234 outputs a predicted signal on the basis of the second mode motion compensation to the switch 235. The switch 235 outputs the predicted signal on the basis of the second mode motion compensation to the subtractor 13 (step S207).
The projective transformer 234 determines whether or not the second mode motion compensation is executed for all the frames in the acquired frame group (step S208). When the projective transformer 234 determines that there is a frame for which the second mode motion compensation is not yet executed (step S208: NO), the projective transformer 234 returns the processing to step S204. When the projective transformer 234 determines that the second mode motion compensation has been executed for all of the frames (step S208: YES), the matrix generator 233 and the projective transformer 234 terminate the motion compensation processing for the acquired frame group.
When the frame group is a frame group captured in a time period during which the camera parameter “B” is not constant (a frame group suitable for well-known inter-frame prediction processing) (step S202: NO), the analyzer 231 outputs the first motion compensation mode signal to the inter-frame predictor 232 and the switch 235 (step S209).
The inter-frame predictor 232 executes motion compensation on the basis of the inter-frame prediction processing in a well-known moving image coding standard, for the decoded image signal stored in the frame memory 21 (step S210). The inter-frame predictor 232 outputs a predicted signal on the basis of the first mode motion compensation, to the switch 235. The switch 235 outputs the predicted signal on the basis of the first mode motion compensation, to the subtractor 13 (step S211).
The inter-frame predictor 232 determines whether or not the first mode motion compensation is executed for all the frames in the acquired frame group (step S212). When the inter-frame predictor 232 determines that there is a frame for which the first mode motion compensation is not yet executed (step S212: NO), the inter-frame predictor 232 returns the processing to step S210. When the inter-frame predictor 232 determines that the first mode motion compensation has been executed for all of the frames (step S212: YES), the inter-frame predictor 232 terminates the motion compensation processing for the acquired frame group.
The coding apparatus 1 of the embodiment generates coded data having a small coding amount and allowing for generation of a high-quality decoded image, by motion compensation on the basis of projective transformation of an image of a physical body. Thus, the coding apparatus 1 of the embodiment can improve the coding efficiency of the image.
The following appendices are disclosed regarding the coding apparatus 1 of the embodiment.
A coding method for coding an image to be coded using a reference image includes identifying a reference area being a part of the reference image, the reference area corresponding to an area to be coded being an area obtained by dividing the image to be coded, and obtaining a predicted area with respect to the area to be coded, by prediction using the reference area, wherein
the area to be coded and the reference area have different sizes and/or different shapes, and in the identifying, the reference area is identified by utilizing a difference between a manner of projection of an object corresponding to the area to be coded and a manner of projection of the object corresponding to the reference area, due to an operation performed on a camera when the image to be coded and the reference image are acquired.
In the coding method described above, the operation performed on the camera is at least one of pan, tilt, roll, or zoom, or a combination of at least two of pan, tilt, roll, or zoom.
In the coding method described above, in the identifying, the operation is identified by using a camera parameter related to the image to be coded and a camera parameter related to the reference image.
In the coding method described above, in the identifying, when the operation is at least one of pan, tilt, roll, or zoom, the reference area is identified by using a parameter expressed in one dimension.
In the coding method described above, in the identifying, a homography matrix is generated by using a one-dimensional component of a motion vector at one specific point of the image to be coded, a camera parameter at a time when the image to be coded is acquired, and a camera parameter at a time when the reference image is acquired, and the generated homography matrix is used for identification.
In the coding method described above, in the identifying, when the operation is a combination of at least two of pan, tilt, roll, or zoom, the reference area is identified by using a combination of parameters expressed in one dimension or a parameter expressed in two dimensions.
In the coding method described above, in the identifying, when two-dimensional components of a motion vector at one specific point of the image to be coded are used, a homography matrix is generated by using the two-dimensional components, a camera parameter at a time when the image to be coded is acquired, and a camera parameter at a time when the reference image is acquired, and the generated homography matrix is used for identification, and when one-dimensional components of respective motion vectors at two specific points of the image to be coded are used, a homography matrix is generated by using the one-dimensional components, a camera parameter at a time when the image to be coded is acquired, and a camera parameter at a time when the reference image is acquired, and the generated homography matrix is used for identification.
In the coding method described above, in the identifying, when the operation is a combination of at least three of pan, tilt, roll, or zoom, the reference area is identified by using a parameter expressed in one dimension and a parameter expressed in two dimensions.
In the coding method described above, in the identifying, a homography matrix is generated by using two-dimensional components of a motion vector at one of two specific points of the image to be coded, a one-dimensional component of a motion vector at the other one of the two specific points, a camera parameter at a time when the image to be coded is acquired, and a camera parameter at a time when the reference image is acquired, and the generated homography matrix is used for identification.
In the coding method described above, in the identifying, when the operation is a combination of all of pan, tilt, roll, and zoom, the reference area is identified by using a plurality of parameters expressed in two dimensions.
In the coding method described above, in the identifying, a homography matrix is generated by using two-dimensional components of motion vectors at two specific points of the image to be coded, a camera parameter at a time when the image to be coded is acquired, and a camera parameter at a time when the reference image is acquired, and the generated homography matrix is used for identification.
Although the embodiment of the present invention has been described in detail with reference to the drawings, specific configurations are not limited to the embodiment, and designs and the like within a range not departing from the gist of the present invention are also included.
The present invention is applicable to a coding apparatus that performs lossless coding or lossy coding of a still image or a moving image.