This non-provisional application claims priority under 35 U.S.C. §119(a) on Patent Application No(s). 109115105 filed in Taiwan, ROC on May 6, 2020, the entire contents of which are hereby incorporated by reference.
This disclosure relates to artificial intelligence, neural networks, pattern recognition, and object detection, and more particularly to a transparency adjustment method adapted to a target object image shown in a video and to a document camera applying this method.
Generally, when shooting a teaching video, the speaker's body often blocks the writing on the blackboard or the lecture content displayed on the slides, which causes inconvenience to learners watching the video.
To date, image processing may perform a segmentation of the human body contour and then transparentize the human body part against the background. However, human body contour segmentation requires a huge amount of computation and consumes considerable computing power, so additional hardware is required to support real-time video processing. If the human body contour segmentation technology is applied to the hardware platform of a general video camera, the limited hardware performance cannot meet the requirements of real-time video processing.
According to an embodiment of the present disclosure, a transparency adjustment method adapted to a target object image shown in a video comprises: extracting a first frame from the video, wherein the target object image is not in the first frame; extracting a second frame from the video after extracting the first frame, wherein the target object image is in the second frame; selecting a target block from the second frame, wherein the target block contains the target object image; obtaining a position of the target block in the second frame and selecting a background block from the first frame according to the position; replacing the target block of the second frame with the background block of the first frame to generate a third frame, wherein the third frame comprises the background block and a part of the second frame other than the target block; and generating an output frame according to the third frame, a transparency parameter, and one of the second frame and the target block.
According to an embodiment of the present disclosure, a document camera comprises: a camera device configured to obtain a video; a processor electrically connected to the camera device, wherein the processor is configured to extract a first frame and a second frame from the video, select a target block from the second frame, select a background block from the first frame, and generate a third frame and an output frame; and a display device electrically connected to the processor, wherein the display device is configured to display an output video according to the output frame; wherein a target object image is not in the first frame and is in the second frame; the third frame is the second frame whose target block is replaced with the background block of the first frame; the target block contains the target object image, the target block is located at a position in the second frame, and the background block corresponds to the same position in the first frame; and the output frame is generated according to the third frame, a transparency parameter, and one of the second frame and the target block.
The present disclosure will become more fully understood from the detailed description given hereinbelow and the accompanying drawings which are given by way of illustration only and thus are not limitative of the present disclosure and wherein:
In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the disclosed embodiments. It will be apparent, however, that one or more embodiments may be practiced without these specific details. In other instances, well-known structures and devices are schematically shown in order to simplify the drawings.
Please refer to
Please refer to
Please refer to
The processor 3 electrically connects to the camera device 1. The processor 3 is configured to extract a first frame and a second frame from the video, select a target block from the second frame, select the background block from the first frame, and generate a third frame and an output frame. The processor 3 is, for example, a System on Chip (SoC), a Field Programmable Gate Array (FPGA), a Digital Processor Unit (DPU), a Central Processing Unit (CPU), a control chip, or a combination thereof. However, the present disclosure is not limited thereto. In an embodiment, the processor 3 comprises a computing unit 32 and a processing unit 34.
The computing unit 32 performs an algorithm to detect the target object image 7′. The algorithm is, for example, the Single Shot MultiBox Detector (SSD) or You Only Look Once (YOLO). However, the present disclosure is not limited thereto. In another embodiment of the present disclosure, the computing unit 32 is an artificial intelligence computing unit, which loads a pre-trained model to perform the algorithm. For example, images of various types of the target object 7 (such as the human hand) are collected in advance, these images serve as the input layer, and a neural network is adopted to train a model to determine whether the target object image 7′ appears in the video. Said neural network is, for example, a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN), or a Deep Neural Network (DNN); however, the present disclosure is not limited thereto.
In an embodiment, the computing unit 32 determines whether the target object image 7′ is in the extracted frame. If the target object image 7′ is not in the extracted frame, this frame is set as the first frame. If the target object image 7′ is in the extracted frame, this frame is set as the second frame. The extraction timing of the first frame should be earlier than the extraction timing of the second frame. In addition, the computing unit 32 selects the target block from the second frame and outputs the related information of the selected target block to the processing unit 34. The target block comprises the target object image 7′. In an embodiment, the computing unit 32 selects a model corresponding to a shape of the target block, wherein the shape is a rectangle or an outline of the target object 7 (such as a human hand).
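As a minimal sketch of this classification logic (the detector callable and the state dictionary here are illustrative assumptions, not part of the disclosure), the first/second frame selection may look like:

```python
def classify_frame(frame, detect, state):
    """Set the frame as the first frame (no target object image present)
    or as the second frame (target object image present).

    `detect` is a hypothetical detector (e.g. an SSD/YOLO wrapper) that
    returns a bounding box for the target block, or None if the target
    object image is not found in the frame.
    """
    box = detect(frame)
    if box is None:
        state["first"] = frame           # frame without the target object image
    else:
        state["second"] = frame          # frame containing the target object image
        state["target_block"] = box      # related information passed to unit 34
    return state
```

Note that the state keeps the most recent first frame, so the extraction timing of the first frame is always earlier than that of the second frame.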
The processing unit 34 electrically connects to the computing unit 32. Based on the related information of the target block outputted by the computing unit 32, such as the coordinates of the target block in the second frame, the processing unit 34 determines a position of the target block in the second frame and selects the background block from the first frame at the same position. The processing unit 34 further generates a third frame according to the first frame and the second frame. The third frame is the second frame whose target block is replaced with the background block. In an embodiment of the present disclosure, the processing unit 34 generates an output frame according to the second frame, the third frame, and a transparency parameter. In another embodiment of the present disclosure, the processing unit 34 generates an output frame according to the target block, the third frame, and the transparency parameter.
Please refer to
Please refer to
Please refer to step S1, which shows “extracting a first frame”. Please refer to
Please refer to step S2, which shows “extracting a second frame”. Please refer to
Please refer to step S3, which shows “selecting a target block from the second frame”. Please refer to
Please refer to step S4, which shows “selecting a background block from the first frame”. Please refer to
Please refer to step S5, which shows “generating a third frame”. The processor 3 replaces the target block B1 of the second frame F2 with the background block B2 of the first frame F1. Please refer to
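Assuming rectangular blocks described by an illustrative (top, left, height, width) tuple and frames stored as NumPy arrays, the replacement of step S5 may be sketched as:

```python
import numpy as np

def make_third_frame(first, second, box):
    """Replace the target block of the second frame with the background
    block at the same position in the first frame (step S5)."""
    y, x, h, w = box
    third = second.copy()                        # keep the second frame intact
    third[y:y+h, x:x+w] = first[y:y+h, x:x+w]    # copy background block B2 over B1
    return third
```

Copying the second frame first leaves the original second frame F2 available for the blending of step S6.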
Please refer to step S6, which shows “generating an output frame”. In an embodiment of step S6, the processor 3 generates the output frame according to the second frame F2, the third frame F3, and the transparency parameter. For example, assuming the transparency parameter is α, the output frame may be generated according to the following equation:
RGB(F4) = RGB(F2)×α + RGB(F3)×(1−α), wherein RGB represents the values of the three primary colors (red, green, blue) of the frame. The transparency parameter α is between 0 and 1, such as 0.3. Please refer to
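Treating each frame as a NumPy array of RGB values, the equation above is a per-pixel linear blend; a minimal sketch:

```python
import numpy as np

def blend(second, third, alpha=0.3):
    """Output frame: RGB(F4) = RGB(F2)*alpha + RGB(F3)*(1 - alpha),
    with the transparency parameter alpha between 0 and 1."""
    out = second.astype(np.float32) * alpha + third.astype(np.float32) * (1 - alpha)
    return out.astype(np.uint8)
```

A smaller α makes the target object image more transparent in the output frame, since the third frame (without the target object image) then dominates the blend.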
In another embodiment of step S6, the output frame F4 is generated according to the target block B1, the third frame F3, and the transparency parameter α; the rest of the process is the same as the foregoing description and is not repeated here.
The above describes a process flow of the transparency adjustment method adapted to a target object image shown in a video according to an embodiment of the present disclosure. In practice, the processor 3 repeats the process of steps S1-S6 to continuously update the first frame F1, the second frame F2, the third frame F3, and the output frame F4, and thereby displays the video with a transparent target object image 7′, so that the viewer may clearly see the text blocked by the speaker's body. Regarding the process for updating the first frame F1, for example, the processor 3 may update the first frame after the third frame F3 is generated in step S5 and before step S1 is performed again. Specifically, the processor 3 sets the third frame F3 generated in step S5 as the first frame F1 when step S1 is performed next time. The processes for updating the second frame F2, the third frame F3, and the output frame F4 are performed according to steps S1-S6 as described previously, wherein the third frame F3 serves as a new first frame F1 in step S1.
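A minimal end-to-end sketch of this repeated process, in which the generated third frame serves as the new first frame for the next iteration (the detector callable and the (top, left, height, width) bounding-box tuple are illustrative assumptions), may look like:

```python
import numpy as np

def transparency_loop(frames, detect, alpha=0.3):
    """Repeat steps S1-S6 over a sequence of frames.

    `detect` is a hypothetical detector returning a (top, left, height,
    width) bounding box for the target block, or None when the target
    object image is absent.
    """
    first = None
    outputs = []
    for frame in frames:
        box = detect(frame)
        if box is None:
            first = frame                   # S1: frame without target -> first frame
            outputs.append(frame)
            continue
        if first is None:                   # no background frame known yet
            outputs.append(frame)
            continue
        y, x, h, w = box                    # S3/S4: target block and background block
        third = frame.copy()
        third[y:y+h, x:x+w] = first[y:y+h, x:x+w]        # S5: third frame
        out = (frame.astype(np.float32) * alpha
               + third.astype(np.float32) * (1 - alpha)).astype(np.uint8)  # S6
        outputs.append(out)
        first = third                       # third frame serves as the new first frame
    return outputs
```

Because only the target block is replaced and blended, the background content blocked by the speaker is refreshed as soon as the speaker moves away.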
In view of the above, the present disclosure uses object detection and algorithms from the artificial intelligence field to extract a first frame without the target object image and a second frame with the target object image. The present disclosure selects the background block from the first frame, whose extraction timing is earlier, selects the target block from the second frame, whose extraction timing is later, replaces the target block with the background block to generate a third frame without the target object image, and performs a mixing operation according to the second frame, the third frame, and the transparency parameter to achieve the effect of a transparent target object. The transparency adjustment method adapted to a target object image shown in a video proposed by the present disclosure may make the speaker's body transparent so that the teaching material is not blocked by the body. The present disclosure provides great convenience in the video production of teaching and speeches. The background content blocked by the speaker is updated after the speaker moves away.
The object detection technology adopted by the present disclosure is mature in terms of stability and accuracy. Said object detection technology adopts a block detection policy. Compared with the pixel-based detection mechanism used in traditional human-contour segmentation, the computing power required by the present disclosure is lower. The present disclosure does not need to update every frame of the video, so the computing tasks can be further reduced. The present disclosure is suitable for current video cameras.
Number | Date | Country | Kind |
---|---|---|---|
109115105 | May 2020 | TW | national |