The present invention relates to an apparatus for determining a video based on depth information and a method thereof, and more particularly, to an apparatus and method that compare feature information extracted from an original multi-view video and a query multi-view video to determine whether the two videos are the same.
The term "metaverse" is a compound of "meta", meaning artificial or abstract, and "universe", meaning the real world, and refers to a 3D virtual world. The metaverse allows a user to be more deeply immersed in content by using augmented reality (AR) and virtual reality (VR) technologies.
Meanwhile, content provided in the metaverse is mainly video captured through a multi-view camera, and a video disseminated after a slight transformation, for example, an illegally copied video in which part of the screen is cropped or the resolution is lowered, may be used.
In addition, since the fully revised Copyright Act includes a provision mandating technical measures (filtering) to block the transmission of illegally copied videos by online service providers (OSPs) of special types such as P2P and webhard services, studies are being conducted to identify whether a video provided to a user in the metaverse is the same as another person's copyrighted work.
However, since multi-view videos are transmitted in a file format in which a basic video (basic view) and a depth information screen (additional view) are combined, the file sizes of the original video and the query video are larger than those of a general stereo-type video, and there is a problem in that sending and receiving the files to identify the videos is slow.
In addition, since existing filtering technology for illegal works is mainly used to determine whether a 2D video is illegally copied, it has the limitation that it cannot be applied to 3D videos such as 360-degree virtual reality (VR) videos.
The background technology of the present invention is disclosed in Korean Unexamined Patent Publication No. 10-2005-001802 (published on Feb. 23, 2005).
Accordingly, an object of the present invention is to provide an apparatus for determining a video based on depth information, and a method thereof, that compare feature information extracted from an original multi-view video and a query multi-view video to determine whether the two videos are the same.
According to an embodiment of the present invention for achieving the above object, there is provided an apparatus for determining a video based on depth information, the apparatus including: an original video extraction unit that receives an original multi-view video, extracts a plurality of frames from the input original multi-view video, extracts a plurality of basic screen feature information from the basic screen of each extracted frame, and stores the extracted plurality of basic screen feature information in an original database for each frame; a query video extraction unit that receives a query multi-view video, extracts a plurality of frames and depth information screens from the input query multi-view video, extracts a plurality of depth information screen feature information from the extracted depth information screen for each frame, and stores the plurality of depth information screen feature information in a query database for each frame; and a video determination unit that compares the basic screen feature information of the original multi-view video with the depth information screen feature information of the query multi-view video to determine whether the query multi-view video is the same as the original multi-view video.
The query video extraction unit may calculate each of the differences between the basic screens in a plurality of multi-view videos captured by a plurality of cameras installed at different locations, sum the calculated difference values, and then compress the sums to generate the depth information screen.
The depth information screen may have a smaller size and capacity than those of the basic screen.
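For illustration only, the generation of such a depth information screen may be sketched as follows. This is a minimal sketch assuming each view is a grayscale array of equal size; the 2×2 average pooling standing in for the compression step is an assumption, since the specification does not prescribe a particular compression scheme.

```python
# Minimal sketch of depth information screen generation: per-view
# differences against the basic screen are summed and then "compressed"
# (here, 2x2 average pooling as an illustrative stand-in).
import numpy as np

def generate_depth_information_screen(views, basic_index=0):
    basic = views[basic_index].astype(np.int32)
    diff_sum = np.zeros_like(basic)
    for i, view in enumerate(views):
        if i == basic_index:
            continue
        # Difference between this view and the reference basic screen.
        diff_sum += np.abs(view.astype(np.int32) - basic)
    # Pool 2x2 blocks so the depth screen is smaller than the basic screen.
    h, w = diff_sum.shape
    pooled = diff_sum[: h // 2 * 2, : w // 2 * 2].reshape(
        h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    peak = pooled.max()
    if peak > 0:
        pooled = pooled / peak * 255  # normalize to 8-bit range
    return pooled.astype(np.uint8)
```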
The original video extraction unit may include a video frame extraction module that extracts a plurality of frames from the input original multi-view video; a basic screen feature information extraction module that extracts the basic screen from the original multi-view video and extracts basic screen feature information of a preset type for each frame from the extracted basic screen; and an original database generation module that stores the extracted plurality of basic screen feature information for each frame in the original database.
The query video extraction unit may include a video frame extraction module that receives the query multi-view video and extracts a plurality of frames from the input query multi-view video; a depth information screen extraction module that extracts a plurality of depth information screens in which depth information is stored from the input query multi-view video; and a query database generation module that extracts a plurality of depth information screen feature information from the extracted depth information screen for each frame and stores the plurality of depth information screen feature information for each frame in the query database.
The video determination unit may include a feature information input module that receives input of the generated original database and query database; a similarity determination module that compares and determines a similarity by applying the input original database and query database to a pre-learned similarity model; and a providing module that provides, through a blockchain network, whether the query multi-view video is the same as the original multi-view video as a result of the similarity determination.
According to another embodiment of the present invention, there is provided a method for determining a video based on depth information executed by an apparatus for determining a video, the method including a step of receiving an original multi-view video and extracting a plurality of frames from the input original multi-view video; a step of extracting a plurality of basic screen feature information from each of the extracted frames; a step of storing the extracted plurality of basic screen feature information in an original database for each frame; a step of receiving a query multi-view video and extracting a plurality of frames and depth information screens from the input query multi-view video; a step of extracting a plurality of depth information screen feature information from the extracted depth information screen for each frame; a step of storing the extracted plurality of depth information screen feature information in a query database for each frame; and a step of comparing the basic screen feature information extracted from the original multi-view video with the depth information screen feature information of the query multi-view video to determine whether the query multi-view video is the same as the original multi-view video.
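Purely as a sketch, the overall flow of this method can be expressed as follows; the callable parameters are hypothetical placeholders for the extraction modules and the pre-learned similarity model, which the specification defines functionally rather than as code.

```python
# Hedged sketch of the end-to-end determination flow. All callables are
# placeholders for the units described in this specification.
from typing import Callable, Dict, List
import numpy as np

def determine_same_video(
    original_frames: List[np.ndarray],        # frames of the original video
    query_depth_screens: List[np.ndarray],    # depth screens of the query video
    basic_feature_fn: Callable[[np.ndarray], np.ndarray],
    depth_feature_fn: Callable[[np.ndarray], np.ndarray],
    similarity_fn: Callable[[Dict[int, np.ndarray], Dict[int, np.ndarray]], float],
    threshold: float = 0.85,                  # preset ratio lower bound
) -> bool:
    # Original database: frame index -> basic screen feature information.
    original_db = {i: basic_feature_fn(f) for i, f in enumerate(original_frames)}
    # Query database: frame index -> depth information screen features.
    query_db = {i: depth_feature_fn(d) for i, d in enumerate(query_depth_screens)}
    # Same video when the similarity model's score reaches the preset range.
    return similarity_fn(original_db, query_db) >= threshold
```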
As described above, according to the present invention, whether the videos are the same can be determined by using the basic screen feature information of the original multi-view video and the depth information screen feature information of the query multi-view video, so the determination can be performed more quickly than before, and the communication load can be reduced because the depth information screen feature information has a small data capacity.
Hereinafter, with reference to the accompanying drawings, embodiments of the present invention will be described in detail so that those skilled in the art may easily practice the present invention. However, the present invention may be implemented in many different forms and is not limited to the embodiments described herein. In addition, in order to clearly explain the present invention in the drawings, parts irrelevant to the description are omitted, and similar reference numerals are attached to similar parts throughout the specification.
Throughout the specification, when a certain part is said to "include" a certain component, this means that it may further include other components, rather than excluding them, unless specifically stated otherwise.
First, an apparatus for determining a video based on depth information according to an embodiment of the present invention will be described with reference to the accompanying drawings.
As shown in the drawings, the apparatus 100 for determining a video based on depth information includes an original video extraction unit 110, a query video extraction unit 120, and a video determination unit 130.
First, the original video extraction unit 110 receives an original multi-view video, extracts a plurality of frames from the input original multi-view video, extracts a plurality of basic screen feature information from the extracted basic screen for each frame, and stores the extracted plurality of basic screen feature information in an original database for each frame.
Further, the query video extraction unit 120 receives the query multi-view video, extracts a plurality of frames and depth information screens from the input query multi-view video, extracts a plurality of depth information screen feature information from the extracted depth information screen for each frame, and stores the extracted plurality of depth information screen feature information in a query database for each frame.
To elaborate, the query video extraction unit 120 may calculate each of the differences between the basic screens in a plurality of query multi-view videos captured by a plurality of cameras installed at different locations, sum the calculated difference values, and then compress the sums to generate the depth information screen.
Here, the depth information screen may have a smaller size and capacity than those of the basic screen.
In this case, the apparatus 100 for determining a video based on depth information may receive the original multi-view video or the query multi-view video using MPEG immersive video (MIV) technology.
Here, MIV is a technology that encodes and decodes a screen combining the basic screen (basic view) of the multi-view video with the depth information screen (additional view) obtained by summing the differences between the unique feature information of each view and the basic view and then compressing the result. The multi-view video is captured by a plurality of cameras at different positions (for example, a total of 16 cameras placed at different positions in upper, middle, and lower rows), and the basic screen may be a reference frame including feature information of a video among the videos captured by the plurality of cameras.
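As a conceptual illustration only, the combined screen can be pictured as packing the basic screen and the smaller depth information screen into a single frame; the side-by-side layout below is an assumption for illustration and does not reflect the actual MIV bitstream structure.

```python
# Illustrative packing of a basic view and a smaller depth view into one
# frame, assuming 2D grayscale arrays with the depth screen no taller
# than the basic screen.
import numpy as np

def combine_views(basic_screen: np.ndarray, depth_screen: np.ndarray) -> np.ndarray:
    h, w = basic_screen.shape
    dh, dw = depth_screen.shape
    combined = np.zeros((h, w + dw), dtype=basic_screen.dtype)
    combined[:, :w] = basic_screen       # basic view on the left
    combined[:dh, w:] = depth_screen     # additional (depth) view on the right
    return combined
```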
That is, the original video extraction unit 110 may match or combine the depth information screen with the basic screen, similarly to the query video extraction unit 120.
Next, the video determination unit 130 compares basic screen feature information of the original multi-view video with the depth information screen feature information of the query multi-view video to determine whether the query multi-view video is the same as the original multi-view video.
Referring to the drawing, the original video extraction unit 110 includes a video frame extraction module 111, a basic screen feature information extraction module 112, and an original database generation module 113.
First, the video frame extraction module 111 extracts a plurality of frames from the input original multi-view video.
In addition, the basic screen feature information extraction module 112 extracts the basic screen from the original multi-view video, and extracts basic screen feature information of a preset type for each frame from the extracted basic screen.
At this time, the basic screen feature information extraction module 112 may extract a plurality of basic screen feature information of a preset type (for example, object type, object position, object outline, object shape and size, and motion).
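One possible realization of such preset-type extraction, offered only as a sketch, uses OpenCV contour analysis to recover object position, outline, and shape/size from a single basic screen; object type and motion would additionally require a detector and inter-frame analysis, which are omitted here.

```python
# Sketch of preset-type feature extraction from one 8-bit grayscale
# basic screen via contour analysis (one possible implementation).
import cv2
import numpy as np

def extract_basic_screen_features(frame_gray: np.ndarray) -> list:
    # Segment foreground objects with Otsu thresholding.
    _, mask = cv2.threshold(frame_gray, 0, 255,
                            cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    features = []
    for contour in contours:
        x, y, w, h = cv2.boundingRect(contour)    # object position and size
        area = cv2.contourArea(contour)           # object shape proxy
        perimeter = cv2.arcLength(contour, True)  # object outline length
        features.append({"position": (x, y), "size": (w, h),
                         "area": area, "outline": perimeter})
    return features
```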
Finally, the original database generation module 113 stores the extracted plurality of basic screen feature information for each frame in the original database.
Referring to the drawing, the query video extraction unit 120 includes a video frame extraction module 121, a depth information screen extraction module 122, and a query database generation module 123.
First, the video frame extraction module 121 receives the query multi-view video and extracts a plurality of frames from the input query multi-view video.
Further, the depth information screen extraction module 122 extracts a plurality of depth information screens in which depth information is stored from the input query multi-view video.
At this time, the depth information screen extraction module 122 may calculate each of the differences between the basic screens in the plurality of query multi-view videos captured by a plurality of cameras at different positions (for example, 16 cameras arranged in a 4×4 grid), and may extract the depth information screen, generated by summing and compressing the calculated difference values, in a form combined with the basic screen.
Also, the depth information screen extraction module 122 may extract the depth information screen from the query multi-view video of a form in which the depth information screen is combined with the basic screen.
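Under the same illustrative side-by-side layout assumed earlier, extracting the depth information screen from the combined form may be sketched as the inverse split; the split_views helper and its arguments are hypothetical.

```python
# Inverse of the illustrative packing: split a combined frame back into
# the basic screen and the depth information screen.
import numpy as np

def split_views(combined: np.ndarray, basic_width: int,
                depth_shape: tuple) -> tuple:
    basic_screen = combined[:, :basic_width]
    dh, dw = depth_shape
    depth_screen = combined[:dh, basic_width:basic_width + dw]
    return basic_screen, depth_screen
```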
Finally, the query database generation module 123 extracts a plurality of depth information screen feature information from the extracted depth information screen for each frame and stores the plurality of depth information screen feature information for each frame in the query database.
Here, the query database generation module 123 may extract a plurality of depth information screen feature information of a preset type (for example, object type, object position, object outline, object shape, size, and motion), and store them in the query database.
Referring to the drawing, the video determination unit 130 includes a feature information input module 131, a similarity determination module 132, and a providing module 133.
First, the feature information input module 131 receives input of the original database stored in the original video extraction unit 110 and the query database stored in the query video extraction unit 120.
In other words, the feature information input module 131 may receive the plurality of basic screen feature information stored for each frame in the original multi-view video and the plurality of depth information screen feature information stored for each frame in the query multi-view video.
Then, the similarity determination module 132 compares and determines the similarity by applying the input original database and query database to the pre-learned similarity model.
Specifically, the similarity determination module 132 compares the similarity by applying the original database and the query database to a pre-learned similarity model, and when the similarity falls within a preset ratio range (for example, 85% to 100%), may determine that the query multi-view video is the same as the original multi-view video.
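As a sketch only, the comparison may look like the following, with mean per-frame cosine similarity standing in for the pre-learned similarity model, whose internals the specification does not detail; each database is assumed to map frame indices to fixed-length feature vectors.

```python
# Sketch of the similarity determination with cosine similarity standing
# in for the pre-learned model; the 85%-100% range follows the example
# given in the specification.
import numpy as np

def is_same_video(original_db: dict, query_db: dict,
                  low: float = 0.85, high: float = 1.0) -> bool:
    common = sorted(set(original_db) & set(query_db))
    sims = []
    for idx in common:
        a, b = original_db[idx], query_db[idx]
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        sims.append(float(a @ b) / denom if denom else 0.0)
    mean_sim = float(np.mean(sims)) if sims else 0.0
    # Same video when the similarity falls within the preset ratio range.
    return low <= mean_sim <= high
```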
In addition, the providing module 133 provides, through a blockchain network, whether the query multi-view video is the same as the original multi-view video as a result of the similarity determination.
At this time, the blockchain network may be a public blockchain network, a private blockchain network, or a hybrid blockchain network, and the determination result may be provided efficiently and securely.
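For illustration only, recording the determination result so that it can later be provided through such a network may be sketched with a simple hash-chained record; an actual deployment would use a client for a real public, private, or hybrid blockchain rather than this stand-in.

```python
# Sketch of an append-only, hash-chained record of determination results;
# a stand-in for submitting the result to an actual blockchain network.
import hashlib
import json
import time

def append_result(chain: list, query_id: str, is_same: bool) -> dict:
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    record = {"query_id": query_id, "is_same": is_same,
              "timestamp": time.time(), "prev_hash": prev_hash}
    # Hashing the canonical JSON form links each record to the previous one.
    record["hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()).hexdigest()
    chain.append(record)
    return record
```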
Hereinafter, a method for determining a video based on depth information according to an embodiment of the present invention will be described in detail with reference to the accompanying drawings.
As shown in the drawings, the original video extraction unit 110 receives an original multi-view video and extracts a plurality of frames from the input original multi-view video (S510).
Specifically, the video frame extraction module 111 may extract a preset number of frames (for example, 60 frames/cycle) from the input original multi-view video.
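A minimal sketch of such frame extraction using OpenCV follows; the uniform sampling policy is an assumption, since the specification states only the preset count.

```python
# Sketch of extracting a preset number of frames, uniformly sampled
# across the video (the sampling policy is an illustrative assumption).
import cv2

def extract_frames(path: str, count: int = 60) -> list:
    cap = cv2.VideoCapture(path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    step = max(total // count, 1)
    frames = []
    for i in range(0, total, step):
        cap.set(cv2.CAP_PROP_POS_FRAMES, i)
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)
        if len(frames) == count:
            break
    cap.release()
    return frames
```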
Next, the original video extraction unit 110 extracts the plurality of basic screen feature information from each extracted frame (S520).
Referring to the drawing, the basic screen feature information extraction module 112 extracts the basic screen from the input original multi-view video (S521).
Next, the basic screen feature information extraction module 112 extracts a plurality of basic screen feature information of a preset type for each frame from the extracted basic screen (S522).
At this time, the basic screen feature information extraction module 112 may extract the plurality of basic screen feature information of the preset type (for example, object type, object position, object outline, object shape, size, and motion).
Next, the original database generation module 113 stores the extracted plurality of basic screen feature information for each frame in the original database (S530).
Next, the video frame extraction module 121 receives the query multi-view video and extracts a plurality of frames from the input query multi-view video, and the depth information screen extraction module 122 extracts the plurality of depth information screens from the input query multi-view video (S540).
Here, the depth information screen may have a smaller size and capacity than those of the basic screen.
As shown in the drawing, the depth information screen extraction module 122 extracts the plurality of depth information screens in which the depth information is stored from the input query multi-view video.
At this time, the depth information screen extraction module 122 may calculate each of the differences between the basic screens in the plurality of query multi-view videos captured by a plurality of cameras at different positions (for example, 16 cameras arranged in a 4×4 grid), and may extract the depth information screen, generated by summing and compressing the calculated difference values, in a form combined with the basic screen.
Also, the depth information screen extraction module 122 may extract the depth information screen from the query multi-view video in which the depth information screen is combined with the basic screen.
Next, the query database generation module 123 extracts the plurality of depth information screen feature information from the extracted depth information screen for each frame (S550).
To elaborate, the query database generation module 123 may extract the plurality of depth information screen feature information of the preset type (for example, object type, object position, object outline, object shape, size, and motion) from the extracted depth information screen for each frame.
Next, the query database generation module 123 stores the extracted plurality of depth information screen feature information for each frame in the query database (S560).
Finally, the video determination unit 130 compares the basic screen feature information of the extracted original multi-view video with the depth information screen feature information of the query multi-view video to determine whether the query multi-view video is the same as the original multi-view video (S570).
As shown in the drawing, the feature information input module 131 receives input of the original database stored in the original video extraction unit 110 and the query database stored in the query video extraction unit 120 (S571).
In other words, the feature information input module 131 may receive the plurality of basic screen feature information stored for each frame in the original multi-view video and the plurality of depth information screen feature information stored for each frame in the query multi-view video.
Then, the similarity determination module 132 compares and determines the similarity by applying the input original database and query database to the pre-learned similarity model (S572).
Specifically, the similarity determination module 132 may compare the similarity by applying the original database and the query database to the pre-learned similarity model and, when the similarity falls within a preset ratio range (for example, 85% to 100%), determine that the query multi-view video is the same as the original multi-view video.
In addition, the providing module 133 provides, through a blockchain network, whether the query multi-view video is the same as the original multi-view video as a result of the similarity determination (S573).
At this time, the blockchain network may be a public blockchain network, a private blockchain network, or a hybrid blockchain network, and the determination result may be provided efficiently and securely.
As described above, according to the embodiments of the present invention, whether the videos are the same can be determined by using the basic screen feature information of the original multi-view video and the depth information screen feature information of the query multi-view video, so the determination can be performed more quickly than before, and the communication load can be reduced because the depth information screen feature information has a small data capacity.
Although the present invention has been described with reference to the embodiments shown in the drawings, this is only exemplary, and those skilled in the art will understand that various modifications and equivalent other embodiments are possible therefrom. Therefore, the true technical scope of protection of the present invention should be determined by the technical spirit of the appended claims.
Number | Date | Country | Kind
---|---|---|---
10-2022-0178113 | Dec 2022 | KR | national

Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/KR2022/020917 | 12/21/2022 | WO |