This application claims the priority benefit of China application serial no. 201911188023.9, filed on Nov. 28, 2019. The entirety of the above-mentioned patent application is hereby incorporated by reference herein and made a part of this specification.
The invention relates to a conference apparatus. More particularly, the invention relates to a video conference apparatus and a video conference method.
Nowadays, as the demand for video conferences increases, how to provide a video conference apparatus design suitable for various types of conference scenarios while providing a good video effect is an important goal in the research and development of video conference apparatuses. For instance, in a conference space, when one or more conference members are present, how one or a plurality of sound sources may be automatically tracked so that a corresponding conference image may be provided is an important technical issue to be overcome at present. Moreover, in a conventional video conference apparatus, after a conference image is obtained, a large amount of processor resources is generally required to perform image analysis on the captured overall conference image, so that the position of a close-up human face (the speaker) may be determined. Accordingly, several solutions of embodiments are provided as follows to provide a video conference apparatus capable of achieving the effects of automatically tracking the sound source and appropriately displaying the conference image with low data computation for image processing.
The information disclosed in this Background section is only for enhancement of understanding of the background of the described technology, and therefore it may contain information that does not constitute prior art that is already known to a person of ordinary skill in the art. Further, the information disclosed in the Background section does not mean that one or more problems to be resolved by one or more embodiments of the invention were acknowledged by a person of ordinary skill in the art.
The invention is directed to a video conference apparatus and a video conference method capable of automatically generating an appropriate close-up conference image and providing a favorable video conference experience.
In order to achieve one or a portion of or all of the objects or other objects, a video conference apparatus provided by the invention includes an image detection device, a sound source detection device, and a processor. The image detection device is configured to obtain a conference image of a conference space. The sound source detection device is configured to detect a sound source of the conference space and to output a positioning signal corresponding to the sound source. The processor is coupled to the image detection device and the sound source detection device and is configured to receive the conference image and the positioning signal, so as to select a first sub-conference image corresponding to the sound source in the conference image according to the positioning signal. The processor performs human face detection on the first sub-conference image to detect a human face image closest to a central axis of the first sub-conference image. The processor selects a second sub-conference image in the conference image by treating the human face image as an image center and outputs the second sub-conference image.
In order to achieve one or a portion of or all of the objects or other objects, a video conference method provided by the invention includes the following steps. A conference image of a conference space is obtained through an image detection device. A sound source of the conference space is detected, and a positioning signal corresponding to the sound source is outputted through a sound source detection device. A first sub-conference image corresponding to the sound source in the conference image is selected according to the positioning signal through a processor. Human face detection is performed on the first sub-conference image to detect a human face image closest to a central axis of the first sub-conference image through the processor. A second sub-conference image in the conference image is selected by treating the human face image as an image center, and the second sub-conference image is outputted through the processor.
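Purely as an illustrative aid, the method steps above may be sketched in code as follows. The sketch assumes the conference image is a panoramic frame held as a NumPy-style array, that the positioning signal is a horizontal angle relative to the apparatus, and that detect_faces is a hypothetical callable returning face boxes as (x, y, w, h) tuples; none of these names or interfaces are part of the claimed apparatus or method.

```python
def video_conference_step(conference_image, source_angle_deg, detect_faces,
                          fov_deg=360.0, window_w=640):
    """One pass of the sketched method: sound-source window, face detection,
    face-centered reselection. All parameter names are illustrative only."""
    h, w = conference_image.shape[:2]

    # Select the first sub-conference image around the sound source by
    # mapping the positioning angle to a horizontal position in the frame.
    cx = int((source_angle_deg % fov_deg) / fov_deg * w)
    x0 = max(0, min(w - window_w, cx - window_w // 2))
    first_sub = conference_image[:, x0:x0 + window_w]

    # Human face detection is performed on the first sub-conference image
    # only, and the face closest to its central axis is kept.
    faces = detect_faces(first_sub)
    if not faces:
        return first_sub                      # fall back to the sound-source window
    axis = window_w / 2
    fx, _, fw, _ = min(faces, key=lambda f: abs(f[0] + f[2] / 2 - axis))

    # Reselect the second sub-conference image so that the face is centered.
    face_cx = x0 + fx + fw // 2
    x1 = max(0, min(w - window_w, face_cx - window_w // 2))
    return conference_image[:, x1:x1 + window_w]
```

In this sketch, obtaining the conference image and the positioning signal (the first two steps of the method) is assumed to have happened upstream; the function only covers the window selection and face-centering logic.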
Based on the above, in the video conference apparatus and the video conference method provided by the invention, the conference image of the conference space may be obtained through the image detection device. Moreover, a partial conference image corresponding to the sound source in the conference image may be selected according to the positioning signal of the sound source detection device. In this way, the partial conference image may be outputted to an external display apparatus to be displayed.
Other objectives, features and advantages of the present invention will be further understood from the further technological features disclosed by the embodiments of the present invention wherein there are shown and described preferred embodiments of this invention, simply by way of illustration of modes best suited to carry out the invention.
The accompanying drawings are included to provide a further understanding of the invention, and are incorporated in and constitute a part of this specification. The drawings illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention.
It is to be understood that other embodiments may be utilized and structural changes may be made without departing from the scope of the present invention. Also, it is to be understood that the phraseology and terminology used herein are for the purpose of description and should not be regarded as limiting. The use of “including,” “comprising,” or “having” and variations thereof herein is meant to encompass the items listed thereafter and equivalents thereof as well as additional items. Unless limited otherwise, the terms “connected,” “coupled,” and “mounted,” and variations thereof herein are used broadly and encompass direct and indirect connections, couplings, and mountings.
In order to make the disclosure more comprehensible, several embodiments are described below as examples of implementation of the invention. Moreover, components/members/steps with the same reference numerals represent the same or similar parts in the accompanying figures and embodiments where appropriate.
In this embodiment, the video conference apparatus 100 may be an independent and movable apparatus and may be placed at any appropriate position in the conference space. For instance, the video conference apparatus 100 may be placed at the center of a table, on the ceiling of a conference room, or the like, so as to obtain the conference image of the conference space and detect the sound source in the conference space. Nevertheless, in another embodiment, the video conference apparatus 100 may also be integrated with other computer apparatuses or display apparatuses, which is not limited by the invention. In this embodiment, the processor 110 may select a first sub-conference image corresponding to the sound source in the conference image according to the positioning signal and perform human face detection on the first sub-conference image, so as to detect a human face image closest to a central axis of the first sub-conference image. The processor 110 reselects a second sub-conference image in the conference image by treating the human face image as an image center and outputs the second sub-conference image. In other words, the processor 110 provided by this embodiment may first determine a range of the first sub-conference image in the conference image according to the conference image provided by the image detection device 130 and the positioning signal provided by the sound source detection device 140, and then determine a range of the second sub-conference image in the conference image according to a determination result of the human face detection performed on the first sub-conference image. Moreover, in the second sub-conference image outputted by the processor 110, the human face image corresponding to the sound source is located at a central position of the second sub-conference image. That is, through the video conference apparatus 100 provided by this embodiment, image processing or human face identification is not required to be performed on the entire conference image. Instead, an appropriate close-up conference image is automatically generated with low data computation for image processing.
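One detail the paragraph leaves open is how the positioning signal maps onto a window of the conference image. A minimal sketch, assuming the signal is a horizontal angle and the conference image is a 360-degree panorama (so the window may wrap across the panorama seam), could be:

```python
import numpy as np

def first_sub_image_from_angle(panorama, angle_deg, window_w=640):
    """Hypothetical mapping from a sound-source angle to the first
    sub-conference image; handles wrap-around at the panorama seam."""
    h, w = panorama.shape[:2]
    center_col = int((angle_deg % 360.0) / 360.0 * w)
    cols = np.arange(center_col - window_w // 2,
                     center_col + window_w // 2) % w    # wrap past the seam
    return panorama[:, cols]
```

Only this window, rather than the entire conference image, would then be handed to the face detection step, which is what keeps the image-processing load low.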
Further, when the processor 110 provided by this embodiment performs the human face detection on the first sub-conference image, the processor 110 reads the neural network model 121 from the memory 120 and inputs the first sub-conference image into the neural network model 121, so as to identify at least one human face in the first sub-conference image through the neural network model 121. Next, the processor 110 determines the human face image closest to the central axis of the first sub-conference image according to distribution of the at least one human face in the first sub-conference image. In addition, the neural network model 121 provided by this embodiment may be trained in advance through a plurality of reference conference images of different conference scenarios, so that the trained neural network model 121 may be configured to at least identify whether any given object in the first sub-conference image is a human face. The different conference scenarios described above may refer to different conference backgrounds, different levels of conference room brightness, different conference objects, and so on, which is not limited by the invention.
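As an illustration of the selection rule described above, the following sketch assumes the trained model behaves as a callable returning (box, score) pairs with boxes given as (x, y, w, h); this interface is an assumption made for illustration, not the disclosed form of the neural network model 121.

```python
def face_closest_to_central_axis(face_model, first_sub_image, score_threshold=0.5):
    """Keep only detections the model scores as human faces, then pick the
    one whose horizontal center is nearest the image's central axis."""
    detections = face_model(first_sub_image)         # [((x, y, w, h), score), ...]
    faces = [box for box, score in detections if score >= score_threshold]
    if not faces:
        return None                                  # no face near the sound source
    axis_x = first_sub_image.shape[1] / 2            # vertical central axis
    return min(faces, key=lambda b: abs(b[0] + b[2] / 2 - axis_x))
```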
In this embodiment, the processor 110 may include a central processing unit (CPU) having image data analysis and computation processing functions, or may include a general-purpose or special-purpose programmable microprocessor, an image processing unit (IPU), a graphics processing unit (GPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a programmable logic device (PLD), other similar operational circuits, or a combination of these circuits. Moreover, the processor 110 is coupled to the memory 120, so as to store the neural network model 121, related image data, image analysis software, and image processing software required to implement the video conference method provided by the invention into the memory 120, so that the processor 110 may read and execute the related software programs. The memory 120 may be, for example, a movable random access memory (RAM), a read-only memory (ROM), a flash memory, a similar component, or a combination of the foregoing components. In an embodiment, the video conference apparatus 100 may also be integrated with other computer apparatuses or display apparatuses, which is not limited by the invention.
Besides, in another embodiment, the processor 110 of the video conference apparatus 100 may further determine whether the human face image 301 of the conference member 204 in the second sub-conference image 320 is greater than a first image range threshold or less than a second image range threshold, so as to perform an image scaling operation with the human face image 301 as the center, and output the scaled second sub-conference image 320.
In other words, the video conference apparatus 100 may automatically and appropriately adjust an image size of the human face image 301 in the second sub-conference image 320 according to a distance between the speaking conference member 204 and the video conference apparatus 100, so that an appropriate human face close-up image of the speaker is provided. Moreover, the first image range threshold and the second image range threshold may be determined according to a display resolution of an external display apparatus, which is not limited by the invention.
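The scaling decision might be sketched as follows. The two fractional thresholds stand in for the first and second image range thresholds and would in practice be chosen from the display resolution, as noted above; OpenCV's resize is used only for the final resampling, and the rest of the interface is hypothetical rather than part of the disclosure.

```python
import cv2

def zoom_on_face(conference_image, face_box, out_size=(640, 360),
                 max_face_frac=0.6, min_face_frac=0.2):
    """If the face would appear too large or too small in the output window,
    recrop around the face so it lands near a mid-range size, then resize."""
    fx, fy, fw, fh = face_box                   # face box in conference-image pixels
    out_w, out_h = out_size
    h, w = conference_image.shape[:2]

    face_frac = fh / out_h                      # face height if cropped 1:1
    if face_frac > max_face_frac or face_frac < min_face_frac:
        crop_h = int(fh / ((max_face_frac + min_face_frac) / 2))
    else:
        crop_h = out_h                          # size already acceptable
    crop_w = int(crop_h * out_w / out_h)        # preserve the output aspect ratio
    crop_h, crop_w = min(crop_h, h), min(crop_w, w)

    # Crop centered on the face, clamped to the image borders, then resize.
    cx, cy = fx + fw // 2, fy + fh // 2
    x0 = max(0, min(w - crop_w, cx - crop_w // 2))
    y0 = max(0, min(h - crop_h, cy - crop_h // 2))
    crop = conference_image[y0:y0 + crop_h, x0:x0 + crop_w]
    return cv2.resize(crop, out_size)
```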
In addition, sufficient teachings, suggestions, and implementation description related to implementation, variation, and extension of each step of this embodiment may be acquired with reference to the description of the embodiments of
Therefore, with reference to
In addition, sufficient teachings, suggestions, and implementation description related to implementation, variation, and extension of the video conference apparatus of this embodiment may be acquired with reference to the description of the embodiments of
In addition, sufficient teachings, suggestions, and implementation description related to implementation, variation, and extension of the video conference apparatus of this embodiment may be acquired with reference to the description of the embodiments of
In view of the foregoing, in the video conference apparatus and the video conference method provided by the invention, the panoramic conference image of the conference space may be obtained through the image detection device. Moreover, a partial conference image corresponding to the sound source and captured from the panoramic conference image may be determined according to the positioning signal of the sound source detection device. Herein, the human face image of the speaker corresponding to the sound source is automatically positioned at the center of the partial conference image. Therefore, in the video conference apparatus and the video conference method provided by the invention, an appropriate close-up conference image may be automatically generated, so that a favorable video conference experience is provided.
The foregoing description of the preferred embodiments of the invention has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form or to exemplary embodiments disclosed. Accordingly, the foregoing description should be regarded as illustrative rather than restrictive. Obviously, many modifications and variations will be apparent to practitioners skilled in this art. The embodiments are chosen and described in order to best explain the principles of the invention and its best mode practical application, thereby to enable persons skilled in the art to understand the invention for various embodiments and with various modifications as are suited to the particular use or implementation contemplated. It is intended that the scope of the invention be defined by the claims appended hereto and their equivalents in which all terms are meant in their broadest reasonable sense unless otherwise indicated. Therefore, the term “the invention”, “the present invention” or the like does not necessarily limit the claim scope to a specific embodiment, and the reference to particularly preferred exemplary embodiments of the invention does not imply a limitation on the invention, and no such limitation is to be inferred. The invention is limited only by the spirit and scope of the appended claims. The abstract of the disclosure is provided to comply with the rules requiring an abstract, which will allow a searcher to quickly ascertain the subject matter of the technical disclosure of any patent issued from this disclosure. It is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. Any advantages and benefits described may not apply to all embodiments of the invention. It should be appreciated that variations may be made in the embodiments described by persons skilled in the art without departing from the scope of the present invention as defined by the following claims. Moreover, no element and component in the present disclosure is intended to be dedicated to the public regardless of whether the element or component is explicitly recited in the following claims.
Number | Date | Country | Kind
--- | --- | --- | ---
201911188023.9 | Nov. 28, 2019 | CN | national