First, an overview of embodiments of the present invention will be explained in brief as follows.
For example, in a mixed video delivered to the own device of a certain user A, the user A performs control so that the facial image of a user B, with whom the user A wants to hold a local conversation among the other parties of communication displayed in the mixed video, is displayed enlarged, and the user A thereby shortens the virtual sense of distance from the user B. In this case, the face of the user A is also automatically controlled so as to be displayed enlarged on the user B side, and the user B thereby likewise shortens the virtual sense of distance from the user A. In this condition, only the voice of the user A out of the mixed voice delivered to the user B is emphasized and mixed, and only the voice of the user B out of the mixed voice delivered to the user A is emphasized and mixed. That is, after the sense of distance is shortened, even if the user A and the user B hold a conversation in voices lower than their normal voices, the conversation between the two parties is emphasized and can consequently be heard more easily. On the other hand, the other users hear the conversation between the user A and the user B only at the same low volume. In this way, it is possible to hold a local conversation during a video-conference in much the same way as at an actual conference.
Hereinafter, a first embodiment of the present invention will be explained with reference to drawings.
First, a video-conferencing system using the present invention will be explained and then the effects thereof will be explained.
Each conference terminal (21 to 24) is equipped with a camera device (Camera-21 to Camera-24) to take in an input video (V1 to V4), a microphone device (Microphone-21 to Microphone-24) to take in an input voice (A1 to A4), a display device (Monitor-21 to Monitor-24) to display a mixed video (MV1 to MV4) and a speaker device (Speaker-21 to Speaker-24) to reproduce a mixed voice (MA1 to MA4), respectively. The multipoint control unit 1, on the other hand, is equipped with the video mixing unit 11, which mixes input videos and outputs a mixed video, the voice mixing unit 12, which mixes input voices and outputs a mixed voice, and the layout change instruction analyzer 13. The layout change instruction analyzer 13 generates a video mixing control signal and inputs it to the video mixing unit 11, and can thereby control the mixing method for the mixed video generated by the video mixing unit 11. Furthermore, according to the present invention, the layout change instruction analyzer 13 also generates a voice mixing control signal and inputs it to the voice mixing unit 12, and can thereby control the mixing method for the mixed voice generated by the voice mixing unit 12. Between the conference terminal 21 and the multipoint control unit 1, there are a communication route Vc21-1 to transmit a video from the conference terminal 21, a communication route Vc21-2 to transmit a mixed video from the multipoint control unit 1, a communication route Ac21-1 to transmit a voice from the conference terminal 21 and a communication route Ac21-2 to transmit a mixed voice from the multipoint control unit 1, and there is also a communication route Cc-21 to send/receive a parameter used when mixing a video. Here, the “parameter” transmitted from the conference terminal 21 is used to change the screen split layout of the mixed video transmitted from the multipoint control unit 1 to the conference terminal 21 (hereinafter referred to as the “layout change parameter”). That is, by transmitting a layout change parameter, the conference terminal 21 can freely change the screen split layout of the mixed video delivered to the own terminal. Communication routes to send/receive a video, a voice and a layout change parameter are likewise provided between each of the conference terminals 22, 23 and 24 and the multipoint control unit 1. The layout change parameter corresponds, for example, to video selection information.
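The set of communication routes per terminal can be summarized as a small data structure. The following is a minimal sketch in Python; the class and field names are illustrative assumptions, not components defined in this description.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TerminalChannels:
    """Communication routes between one conference terminal and the
    multipoint control unit 1 (route names follow the text)."""
    video_up: str     # video from the terminal (e.g. Vc21-1)
    video_down: str   # mixed video from the multipoint control unit (e.g. Vc21-2)
    voice_up: str     # voice from the terminal (e.g. Ac21-1)
    voice_down: str   # mixed voice from the multipoint control unit (e.g. Ac21-2)
    control: str      # layout change parameters, both directions (e.g. Cc-21)

# The route set for each of the four conference terminals 21 to 24
channels = {
    t: TerminalChannels(f"Vc{t}-1", f"Vc{t}-2", f"Ac{t}-1", f"Ac{t}-2", f"Cc-{t}")
    for t in (21, 22, 23, 24)
}
```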
Hereinafter, details of the method of implementing the present invention will be explained.
The computer main unit 21-1 has a thin box-shaped housing and a pointing device 21-3 and a keyboard are arranged on the top surface thereof. Moreover, a network communication device 21-4 is incorporated in the computer main unit 21-1.
This network communication device 21-4 is a device to execute a communication over a network and is designed to execute a communication conforming, for example, to Ethernet. Alternatively, it is designed to execute a radio communication conforming to IEEE 802.11b or IEEE 802.11a. The communication operation of the network communication device 21-4 is controlled by a network transmission/reception program (see
This network transmission/reception program has a function of performing transmission/reception processing on video data and voice data using RTP, in addition to network protocol processing such as TCP/IP and UDP.
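As a concrete illustration of the RTP processing mentioned above, the following sketch builds the fixed 12-byte RTP header of RFC 3550 using only Python's standard library; the function name and parameter choices are assumptions made for illustration.

```python
import struct

def rtp_header(seq: int, timestamp: int, ssrc: int,
               payload_type: int, marker: bool = False) -> bytes:
    """Build a minimal 12-byte RTP fixed header (RFC 3550):
    version 2, no padding, no extension, no CSRCs."""
    byte0 = 2 << 6                                   # V=2, P=0, X=0, CC=0
    byte1 = (0x80 if marker else 0) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)

# Example: header for a G.711 mu-law voice packet (payload type 0 per RFC 3551)
hdr = rtp_header(seq=1, timestamp=160, ssrc=0x1234ABCD, payload_type=0)
```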
Furthermore, the computer main unit 21-1 is provided with terminals for a microphone input and a speaker output, to which a microphone device Microphone-21 and a speaker device Speaker-21, or a headset that integrates the microphone device Microphone-21 and the speaker device Speaker-21 into an earphone unit, can be connected.
The microphone device Microphone-21 connected to this microphone input terminal is a device to input a voice to the conference terminal 21. The voice input operation of the microphone device Microphone-21 is controlled by a voice acquisition program (see
Furthermore, the computer main unit 21-1 includes a USB connection terminal and a camera device Camera-21 can be connected thereto.
The camera device Camera-21 connected to this USB connection terminal is a device to input a video to the conference terminal 21. The video input operation of the camera device Camera-21 is controlled by a video acquisition program (see
The display operation of the mixed video MV1 is controlled by a video reproducing program (see
The CPU is a processor provided to control the operation of the conference terminal 21 and executes the operating system (OS) and various application programs loaded into the main memory from the hard disk drive (HDD).
The north bridge is a bridge device which bidirectionally connects a local bus of the CPU and a high-speed bus between the north bridge and the south bridge. The north bridge incorporates a display controller. The display controller controls the display device Monitor-21 which is used as the display monitor of the conference terminal 21. The display controller in this embodiment displays a mixed video on the display device Monitor-21 according to the video reproducing program.
The south bridge is a bridge device which bidirectionally connects the high-speed bus on the north bridge side and a low-speed bus to which a keyboard and the like are connected. The south bridge incorporates the USB (Universal Serial Bus) controller. The camera device Camera-21 is connected to this USB controller. The camera device Camera-21 captures a video under the control of the video acquisition program and converts the captured video to an electric signal so that the captured video can be processed inside the conference terminal 21. Furthermore, the south bridge also incorporates the sound controller. The microphone device Microphone-21 and the speaker device Speaker-21 are connected to this sound controller. The microphone device Microphone-21 collects sound under the control of the voice acquisition program and converts the collected sound to an electric signal so that the sound can be processed inside the conference terminal 21. The speaker device Speaker-21, under the control of the voice reproducing program, reproduces as a sound wave the sound processed as an electric signal inside the conference terminal 21. The south bridge also incorporates the LAN controller. The network communication device 21-4, such as a physical layer device of Ethernet, is connected to this LAN controller. The network communication device 21-4 modulates transmission data and demodulates received data under the control of the network transmission/reception program.
This network communication device 1-4 is a device which executes a network communication and is designed to execute a communication conforming, for example, to Ethernet. Alternatively, it is designed to execute a radio communication conforming to IEEE 802.11b or IEEE 802.11a. The communication operation of the network communication device 1-4 is controlled by the network transmission/reception program (see
This network transmission/reception program has a function of performing transmission/reception processing on video data and voice data using RTP, in addition to network protocol processing such as TCP/IP and UDP.
The CPU is a processor provided to control the operation of the multipoint control unit 1 and executes the operating system (OS) and various application programs loaded into the main memory from the hard disk drive (HDD).
The video compression program executes its processing following the video mixing program: it compresses and encodes the mixed video data generated by the video mixing program into a format such as MPEG-4, and the network transmission/reception program then transmits the compressed and encoded video data.
The video decompression program executes its processing following the network transmission/reception program: it decompresses and decodes the received video data, which has been compressed and encoded into a format such as MPEG-4 and subjected to reception processing by the network transmission/reception program, into non-compressed video data, and the video mixing program then generates a mixed video using the non-compressed video data.
The voice compression program executes its processing following the voice mixing program: it compresses and encodes the mixed voice data generated by the voice mixing program into a format such as G.711, and the network transmission/reception program then transmits the compressed and encoded voice data.
The voice decompression program executes its processing following the network transmission/reception program: it decompresses and decodes the received voice data, which has been compressed and encoded into a format such as G.711 and subjected to reception processing by the network transmission/reception program, into non-compressed voice data, and the voice mixing program then generates a mixed voice using the non-compressed voice data.
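For reference, the G.711 μ-law coding named above can be expressed compactly. The following is a sketch of the standard linear-to-μ-law conversion for a single 16-bit PCM sample; in the actual programs the codec would, of course, process whole buffers.

```python
BIAS = 0x84   # 132: bias that aligns the encoder's segment boundaries
CLIP = 32635  # largest magnitude representable after biasing

def linear_to_ulaw(sample: int) -> int:
    """Convert one signed 16-bit PCM sample to an 8-bit G.711 mu-law code."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(-sample if sample < 0 else sample, CLIP) + BIAS
    exponent = (magnitude >> 7).bit_length() - 1    # segment number, 0..7
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF  # G.711 transmits inverted bits
```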
The layout change instruction analysis program executes its processing following the network transmission/reception program and performs analysis processing on the layout change parameter subjected to reception processing by the network transmission/reception program. The video mixing program changes the screen split layout of the mixed video according to the analysis result of the layout change instruction analysis program. Furthermore, as part of the analysis processing on the layout change parameter, the layout change instruction analysis program calculates the volume level to be applied to each voice when a mixed voice is generated. The voice mixing program adjusts the volume of each voice in the mixed voice according to the calculation result of the layout change instruction analysis program.
The more specific processing functions of the layout change instruction analysis program, video mixing program and voice mixing program will be described later.
The video compression program and the video decompression program at the multipoint control unit 1 in this embodiment process four videos simultaneously and independently of one another, and the voice compression program and the voice decompression program likewise process four voices simultaneously and independently. The video mixing program generates four independent mixed videos using the four videos, and the voice mixing program generates four independent mixed voices using the four voices. Furthermore, the network transmission/reception program performs transmission/reception processing on the videos and voices of the four conference terminals and reception processing on the layout change parameters independently of one another.
The north bridge is a bridge device which bidirectionally connects a local bus of the CPU and a high-speed bus between the north bridge and the south bridge.
A LAN controller is incorporated in the south bridge. The network communication device 1-4 such as a physical layer device of Ethernet is connected to this LAN controller. The network communication device 1-4 modulates transmission data and demodulates received data under the control of the network transmission/reception program.
As internal components, the conference terminal 21 is provided with a network transmission/reception unit 211, a video compression unit 212, a video decompression unit 213, a voice compression unit 214, a voice decompression unit 215, a video acquisition unit 216, a video reproducing unit 217, a voice acquisition unit 218, a voice reproducing unit 219 and a layout change instructor 300. The above described network transmission/reception unit 211, video compression unit 212, video decompression unit 213, voice compression unit 214, voice decompression unit 215, video acquisition unit 216, video reproducing unit 217, voice acquisition unit 218, voice reproducing unit 219 and layout change instructor 300 are realized by the processing routines of the network transmission/reception program, video compression program, video decompression program, voice compression program, voice decompression program, video acquisition program, video reproducing program, voice acquisition program, voice reproducing program and layout change instruction program shown in
The video reproducing unit 217 allows drawing data created inside to be displayed on the display screen 2100 shown in
The network transmission/reception unit 211 sends/receives video data and voice data in a streaming format, manages the start and end of transmission/reception thereof, can identify video data and voice data that are sent/received, and sends/receives video data and voice data using appropriate communication channels. Upon receiving video data, the network transmission/reception unit 211 outputs the video data to the video decompression unit 213 and upon receiving voice data, the network transmission/reception unit 211 outputs the voice data to the voice decompression unit 215.
The video acquisition unit 216 controls the camera device Camera-21 and instructs the start and end of video capturing. When video capturing is started, the video (V1) captured by the camera device Camera-21 is inputted to the video acquisition unit 216 as video data. The video acquisition unit 216 outputs the video data to the video compression unit 212 so as to transmit the inputted video data to the multipoint control unit 1. When the video data is inputted, the video compression unit 212 encodes (compresses) the video data into MPEG-4 format and outputs it to the network transmission/reception unit 211. The network transmission/reception unit 211 performs processing on the compressed video data so that it can be transmitted to the multipoint control unit 1 over the network and then transmits the video data using the communication channel Vc21-1.
The voice acquisition unit 218 controls the microphone device Microphone-21 and instructs the start and end of sound collection. When sound collection starts, the voice (A1) collected by the microphone is inputted to the voice acquisition unit 218 as voice data. The voice acquisition unit 218 outputs the voice data to the voice compression unit 214 so as to transmit the inputted voice data to the multipoint control unit 1. When the voice data is inputted, the voice compression unit 214 encodes (compresses) the voice data into G.711 format and outputs it to the network transmission/reception unit 211. The network transmission/reception unit 211 performs processing on the compressed voice data so that it can be transmitted to the multipoint control unit 1 over the network and then transmits the voice data using the communication channel Ac21-1.
When receiving data through the communication channel Vc21-2, the network transmission/reception unit 211 outputs the compressed video data included in the received data to the video decompression unit 213. When the compressed video data is inputted, the video decompression unit 213 decodes (decompresses) it into non-compressed video data and outputs the generated non-compressed video data to the video reproducing unit 217. The video reproducing unit 217 has the functions of controlling the display device Monitor-21 and of creating and displaying the window 2101 as an application; when displayable video data is inputted, it displays the video data as the “mixed video MV1” in the display area 1000 in the window 2101.
When receiving data through the communication channel Ac21-2, the network transmission/reception unit 211 outputs the compressed voice data included in the received data to the voice decompression unit 215. When the compressed voice data is inputted, the voice decompression unit 215 decodes (decompresses) it into non-compressed voice data and outputs the generated non-compressed voice data to the voice reproducing unit 219. The voice reproducing unit 219 controls the speaker device Speaker-21 and reproduces the inputted voice data as the “mixed voice MA1”.
An example of the embodiment of the layout change instructor 300 will be shown below.
First, the operation when the layout change instructor 300 is initialized will be explained.
The table management unit 304 internally creates and stores an area management table which is shown in
When the layout change instructor 300 is initialized, the area detection unit 302 acquires the area management table information in the initialized condition from the table management unit 304 and outputs the area management table information to the control data generation unit 305.
When the area management table information is inputted from the area detection unit 302, the control data generation unit 305 builds a payload unit of a mixed video control packet to transmit the area management table information to the multipoint control unit 1.
Upon receiving the mixed video control packet from the control data generation unit 305, the control data transmission processor 306 outputs this control packet to the network transmission/reception unit 211 together with additional information, such as the destination address information on the network, which is necessary to transmit this control packet to the multipoint control unit 1. When the mixed video control packet with the additional information added is inputted from the control data transmission processor 306, the network transmission/reception unit 211 transmits this mixed video control packet to the multipoint control unit 1 as the layout change parameter through the communication channel Cc-21.
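To make the data flow concrete, the following sketch models an area management table entry and the payload construction performed by the control data generation unit 305. The text does not specify the wire format of the mixed video control packet, so JSON is used purely as a placeholder serialization; all names and the example layout are illustrative assumptions.

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class AreaEntry:
    """One row of the area management table: video source ID, position
    (x, y), size (w, h) and stacking order (Layer, where 1 is the top)."""
    ID: int
    x: int
    y: int
    w: int
    h: int
    Layer: int

def build_control_payload(table: list[AreaEntry]) -> bytes:
    """Build the payload unit of a mixed video control packet from the
    area management table (placeholder serialization)."""
    return json.dumps([asdict(e) for e in table]).encode("utf-8")

# Example: an assumed initial 2x2 layout of four 160x120 videos
initial = [AreaEntry(1, 0, 0, 160, 120, 1), AreaEntry(2, 160, 0, 160, 120, 2),
           AreaEntry(3, 0, 120, 160, 120, 3), AreaEntry(4, 160, 120, 160, 120, 4)]
payload = build_control_payload(initial)
```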
Next, the operation of the layout change instructor 300 accompanied by the user's operation after initialization will be explained.
The pointer detection unit 301 detects that the pointer 200 is in the display area 1000 of the mixed video MV1 in the window 2101 in the display screen 2100 and when an operation event further occurs at the position, the pointer detection unit 301 detects the event. An operation event is generated by clicking, double-clicking, drag and drop or the like through operations of the pointing device 21-3. As shown in
As shown in
When the rectangular area information {ID, x, y, w, h, Layer} is inputted from the area detection unit 302, the frame display unit 303 displays, using the values of x, y, w, h, the rectangular frame 2000 in the display area 1000 in the window 2101 of the display screen 2100 managed with XY coordinates.
Here, the method whereby the user moves the display position of the pointer 200 and thereby changes the size and position of the rectangular frame displayed by the frame display unit 303 will be described. As described above, the pointer detection unit 301 detects the position of the pointer 200 and outputs the position information of the pointer 200 (expressed in X′Y′ coordinates) and operation event information (left click, cancellation of left click, right click or the like) to the area detection unit 302. When the inputted operation event information is valid, the area detection unit 302 temporarily stores the position information of the pointer 200 converted to XY coordinates together with the operation event information. At this time, the area detection unit 302 detects whether or not the detected XY coordinate position belongs to the area of the rectangular area information {ID, x, y, w, h, Layer} stored inside. When the detected position does not belong to the area, the area detection unit 302 carries out the processing on the “position confirmation signal” described above; when the detected position is found to belong to the area, the area detection unit 302 executes the “rectangular frame change processing”. The explanation of the processing on the “position confirmation signal” described above also corresponds to the case where no rectangular area information is stored inside the area detection unit 302.
Hereinafter, the “rectangular frame change processing” will be explained using
First, suppose a case where the pointer 200 is moved to a vertex of the rectangular frame 2000, the left button is clicked there, the pointer 200 is moved with the left button held down, and the left click is released after the movement. In this case, the pointer detection unit 301 detects the first left click and inputs the information to the area detection unit 302, whereby the area detection unit 302 recognizes that the vertex of the rectangular frame 2000 marks the start of the specified “rectangular frame change processing”. Next, the pointer detection unit 301 detects the movement of the pointer and inputs the information to the area detection unit 302, whereby the area detection unit 302 recognizes that the processing is that of changing the size of the rectangular frame 2000. Furthermore, the pointer detection unit 301 detects that the left click is released and inputs the information to the area detection unit 302, whereby the area detection unit 302 recognizes that the change of the size of the rectangular frame 2000 is confirmed, that is, the end of the “rectangular frame change processing”. When the area detection unit 302 recognizes the processing of changing the size of the rectangular frame 2000, it changes the values of x, y, w, h of the rectangular area information {ID, x, y, w, h, Layer} stored inside as required and outputs the changed rectangular area information to the frame display unit 303. For example, in the processing whereby the size of the frame is changed by moving the vertex clicked with the left button, the area detection unit 302 changes the values of x, y, w, h as appropriate so that the vertex diagonally opposite the clicked vertex remains fixed. While the size of the rectangular frame 2000 is being changed, the area detection unit 302 outputs the rectangular area information, as needed, only to the frame display unit 303 so that the display of the rectangular frame in the display area 1000 is updated; upon recognizing the end of the “rectangular frame change processing”, the area detection unit 302 changes the information of x, y, w, h, Layer of the corresponding ID in the area management table under the management of the table management unit 304 and outputs the changed area management table information to the control data generation unit 305. In this embodiment, the length-to-width aspect ratio of the rectangular frame is kept constant; when the position of the pointer 200 does not satisfy this aspect ratio requirement at the moment the end of the “rectangular frame change processing” is recognized, the pointer detection unit 301 automatically corrects the position of the pointer 200 to a point that satisfies the requirement. Furthermore, the size can be changed only to four fixed sizes in the display area 1000: the maximum display size (320 pixels×240 pixels in this embodiment) and sizes ¾, ½ and ¼ thereof. When the frame does not match any of these sizes, it is automatically corrected to the closest one among them.
Next, suppose the pointer 200 is moved to a position inside the rectangular frame 2000 but away from its vertices, the left button is clicked there, the pointer 200 is moved with the left button held down, and the left click is released after the movement. In this case, the pointer detection unit 301 detects the first left click and inputs the information to the area detection unit 302, whereby the area detection unit 302 recognizes a position other than the vertices of the rectangular frame 2000 as the start of the specified “rectangular frame change processing”. Next, the pointer detection unit 301 detects the movement of the pointer 200 and inputs the information to the area detection unit 302, whereby the area detection unit 302 recognizes that the processing is that of changing the position of the rectangular frame 2000. Furthermore, the pointer detection unit 301 detects that the left click is released and inputs the information to the area detection unit 302, whereby the area detection unit 302 recognizes that the change of the position of the rectangular frame 2000 has been confirmed, that is, the end of the “rectangular frame change processing”. When the area detection unit 302 recognizes the processing of changing the position of the rectangular frame 2000, it changes the values of x, y of the rectangular area information {ID, x, y, w, h, Layer} stored inside and outputs the changed rectangular area information to the frame display unit 303. For example, assuming the size of the frame does not change during the position change, the values of x, y are changed as appropriate using the difference between the position of the pointer 200 recognized at the start of the “rectangular frame change processing” and the position of the pointer 200 during the movement. While the position of the rectangular frame 2000 is being changed, the area detection unit 302 outputs the rectangular area information, as needed, only to the frame display unit 303 so that the display of the rectangular frame in the display area 1000 is updated; at the time of recognizing the end of the “rectangular frame change processing”, the area detection unit 302 changes the information of x, y, w, h, Layer of the corresponding ID in the area management table under the management of the table management unit 304 and outputs the changed area management table information to the control data generation unit 305.
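The size change with the diagonally opposite vertex fixed, the snap to the four permitted sizes and the position change by drag described above can be sketched as follows. The entries are plain dictionaries using the keys of the rectangular area information, and all function names are assumptions for illustration.

```python
MAX_W, MAX_H = 320, 240
# Permitted frame sizes: the maximum display size and 3/4, 1/2, 1/4 thereof
ALLOWED_SIZES = [(MAX_W * n // 4, MAX_H * n // 4) for n in (4, 3, 2, 1)]

def snap_size(w: int, h: int) -> tuple[int, int]:
    """Correct a dragged size to the closest of the four permitted sizes."""
    return min(ALLOWED_SIZES, key=lambda s: abs(s[0] - w) + abs(s[1] - h))

def resize_frame(entry: dict, new_w: int, new_h: int, corner: str) -> None:
    """Resize so that the vertex diagonally opposite the dragged corner
    (e.g. 'top-left') stays fixed."""
    w, h = snap_size(new_w, new_h)
    if "left" in corner:                 # dragging a left vertex: right edge fixed
        entry["x"] = entry["x"] + entry["w"] - w
    if "top" in corner:                  # dragging a top vertex: bottom edge fixed
        entry["y"] = entry["y"] + entry["h"] - h
    entry["w"], entry["h"] = w, h

def move_frame(entry: dict, start: tuple, now: tuple) -> None:
    """Translate the frame by the pointer displacement since the drag start."""
    entry["x"] += now[0] - start[0]
    entry["y"] += now[1] - start[1]
```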
In the processing of changing the size or the position of the rectangular frame 2000, the area detection unit 302 changes the information of x, y, w, h, Layer of the corresponding ID in the area management table under the management of the table management unit 304; it is also possible to perform control such that the Layer of the corresponding ID is set to 1 so that the corresponding video source is arranged at the top. In this case, the layer value that was previously 1 in the area management table is incremented by 1, and if this results in an overlap with another registered entry, the layer value of that entry is incremented by 1 in turn.
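A sketch of this layer reordering: the selected source goes to layer 1, and every entry that was stacked above the selected source's old layer is pushed down by one, which is exactly the result of the cascading increment described above. The function name is an assumption.

```python
def bring_to_top(table: list[dict], selected_id: int) -> None:
    """Set the selected entry's Layer to 1. The previous layer-1 entry is
    incremented by 1, and each resulting overlap is resolved by a further
    increment, i.e. all layers above the vacated one shift down by one."""
    selected = next(e for e in table if e["ID"] == selected_id)
    vacated = selected["Layer"]
    if vacated == 1:
        return                      # already on top
    for e in table:
        if e is not selected and e["Layer"] < vacated:
            e["Layer"] += 1         # cascading increment
    selected["Layer"] = 1
```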
The processing at the control data generation unit 305 and the control data transmission processor 306 when area management table information is inputted corresponds to that explained above as the operation when the layout change instructor 300 is initialized.
Conversely, when the conference terminal 21 receives a mixed video control packet from the multipoint control unit 1, the conference terminal 21 extracts the area management table included therein and overwrites the area management table information under its own management with the extracted table.
As the internal components, the multipoint control unit 1 is provided with a network transmission/reception unit 101, four video compression units 102-1 to 102-4, four video decompression units 103-1 to 103-4, four voice compression units 104-1 to 104-4, four voice decompression units 105-1 to 105-4, a video mixing unit 11, a voice mixing unit 12 and a layout change instruction analyzer 13. The above described network transmission/reception unit 101, video compression units 102-1 to 102-4, video decompression units 103-1 to 103-4, voice compression units 104-1 to 104-4, voice decompression units 105-1 to 105-4, video mixing unit 11, voice mixing unit 12 and layout change instruction analyzer 13 are realized by processing routines of the network transmission/reception program, video compression program, video decompression program, voice compression program, voice decompression program, video mixing program, voice mixing program and layout change instruction analysis program shown in
The network transmission/reception unit 101 can receive video data using the communication channels Vc21-1 to Vc24-1 shown in
The network transmission/reception unit 101 transmits/receives video data and voice data in a streaming format, manages the start and the end of transmission/reception thereof, can identify video data and voice data to be transmitted/received and transmits/receives video data and voice data using appropriate communication channels.
The network transmission/reception unit 101 outputs the video data received through the Vc21-1 to the video decompression unit 103-1, outputs the video data received through the Vc22-1 to the video decompression unit 103-2, outputs the video data received through the Vc23-1 to the video decompression unit 103-3 and outputs the video data received through the Vc24-1 to the video decompression unit 103-4.
The network transmission/reception unit 101 outputs the voice data received through the Ac21-1 to the voice decompression unit 105-1, outputs the voice data received through the Ac22-1 to the voice decompression unit 105-2, outputs the voice data received through the Ac23-1 to the voice decompression unit 105-3 and outputs the voice data received through the Ac24-1 to the voice decompression unit 105-4.
The non-compressed video data decompressed by the video decompression unit 103-1, video decompression unit 103-2, video decompression unit 103-3 and video decompression unit 103-4 are inputted to the video mixing unit 11. The video mixing unit 11 internally creates four kinds of mixed videos MV1 to MV4, outputs the mixed video MV1 to the video compression unit 102-1, outputs the mixed video MV2 to the video compression unit 102-2, outputs the mixed video MV3 to the video compression unit 102-3 and outputs the mixed video MV4 to the video compression unit 102-4.
The non-compressed voice data decompressed by the voice decompression unit 105-1, voice decompression unit 105-2, voice decompression unit 105-3 and voice decompression unit 105-4 are inputted to the voice mixing unit 12. The voice mixing unit 12 internally creates four kinds of mixed voices MA1 to MA4, outputs mixed voice MA1 to the voice compression unit 104-1, outputs mixed voice MA2 to the voice compression unit 104-2, outputs mixed voice MA3 to the voice compression unit 104-3 and outputs mixed voice MA4 to the voice compression unit 104-4.
The reduction parameters for the reduction circuits 31 to 34 and the position parameters for the mixing circuits 41 to 44, which are inputted to the video mixing unit 11 from outside, are collectively called “video mixing control signals”.
Parameters for the adjustment circuits 51 to 54 inputted to the voice mixing unit 12 from outside are collectively called “voice mixing control signals”.
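A sketch of how these voice mixing control signals might be applied: each mixed voice MA1 to MA4 is a weighted sum of the four input voices, with one gain per (destination, source) pair standing in for the adjustment circuits 51 to 54. The matrix formulation is an assumption introduced for illustration.

```python
import numpy as np

def mix_voices(voices: np.ndarray, gains: np.ndarray) -> np.ndarray:
    """voices: (4, n) array of PCM samples A1..A4 from terminals 21..24.
    gains:  (4, 4) voice mixing control signal; gains[d, s] is the volume
            applied to source s in the mix delivered to destination d.
    Returns the four mixed voices MA1..MA4 as a (4, n) array."""
    return np.clip(gains @ voices, -1.0, 1.0)   # weighted sum, clipped

# Default: every terminal hears the others at unit volume, not itself
default_gains = np.ones((4, 4)) - np.eye(4)
```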
The multipoint control unit 1 whose configuration is shown in
The layout change instruction analyzer 13 judges which conference terminal transmitted a mixed video control packet (S11). The terminal which transmitted the packet is defined as a “transmission terminal”.
The layout change instruction analyzer 13 extracts an area management table from the mixed video control packet (S12). This is defined as a “transmission area management table”.
The layout change instruction analyzer 13 analyzes the area management table and recognizes how the transmission terminal intends to change the screen split layout of the mixed video delivered to the transmission terminal (S13). In the case of this embodiment, the size and the arrangement position of each video used to generate a mixed video can be analyzed from the area management table shown in
The layout change instruction analyzer 13 identifies, using the size of each video recognized in step S13, the conference terminal which delivers the video whose size is instructed to be increased by the transmission terminal (S14). The conference terminal which delivers this video is defined as a “target terminal”.
The layout change instruction analyzer 13 generates a second area management table to instruct the screen split layout of the mixed video to be delivered to the target terminal (S15). This second area management table is defined as a “target area management table”. The target area management table is set so that the size of the video delivered by the transmission terminal increases. For example, the size of the video delivered by the transmission terminal is adjusted so as to be equal to the size of the video delivered by the target terminal specified in the transmission area management table. Furthermore, an arrangement position is specified so that the video whose size is increased falls within the range of the mixed video. Furthermore, the hierarchy information is specified so that the video of the transmission terminal comes to the top layer.
The layout change instruction analyzer 13 generates a video mixing control signal using the information of the transmission area management table and the target area management table and outputs it to the video mixing unit (S16).
The layout change instruction analyzer 13 generates a voice mixing control signal to control a mixed voice delivered to the transmission terminal and the target terminal and outputs it to the voice mixing unit (S17). In this case, parameters are adjusted so that the volume of the voice delivered from the target terminal becomes louder in the mixed voice delivered to the transmission terminal. Furthermore, parameters are adjusted so that the volume of the voice delivered from the transmission terminal becomes louder in the mixed voice delivered to the target terminal.
The layout change instruction analyzer 13 generates a mixed video control packet including the target area management table and transmits it to the target terminal (S18).
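The steps S11 to S18 can be condensed into the following sketch. The helper objects (mcu, its mixers and stored tables) and the heuristic for detecting the enlarged video are assumptions introduced for illustration, not interfaces defined in this description.

```python
def find_enlarged_source(old_table, new_table):
    """S13/S14: find the ID of the video whose display area grew."""
    old_area = {e["ID"]: e["w"] * e["h"] for e in old_table}
    grown = [e for e in new_table if e["w"] * e["h"] > old_area.get(e["ID"], 0)]
    return grown[0]["ID"] if grown else None

def handle_layout_change(mcu, sender, tx_table):
    # S11/S12: 'sender' is the transmission terminal, 'tx_table' its
    # transmission area management table
    target = find_enlarged_source(mcu.tables[sender], tx_table)   # S13/S14
    mcu.tables[sender] = tx_table
    if target is None or target == sender:
        return
    # S15: target area management table - enlarge the sender's video in
    # the mix delivered to the target, on the top layer
    new_size = next(e for e in tx_table if e["ID"] == target)
    tgt_table = [dict(e) for e in mcu.tables[target]]
    for e in tgt_table:
        if e["ID"] == sender:
            e.update(w=new_size["w"], h=new_size["h"], Layer=1)
    mcu.tables[target] = tgt_table
    mcu.video_mixer.apply(sender, tx_table)        # S16: video mixing control
    mcu.video_mixer.apply(target, tgt_table)
    mcu.voice_mixer.boost(dst=sender, src=target)  # S17: mutual voice emphasis
    mcu.voice_mixer.boost(dst=target, src=sender)
    mcu.send_control_packet(target, tgt_table)     # S18: notify the target
```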
The layout change instruction analyzer 13 judges which conference terminal transmitted the mixed video control packet (S21). The terminal which transmitted the packet is defined as a “transmission terminal”.
The layout change instruction analyzer 13 extracts an area management table from the mixed video control packet (S22). This is defined as a “transmission area management table”.
The layout change instruction analyzer 13 analyzes the area management table and recognizes how the transmission terminal will change the screen split layout of the mixed video delivered to the transmission terminal (S23). In the case of this embodiment, the size and the arrangement position of each video for generating a mixed video can be analyzed from the area management table shown in
The layout change instruction analyzer 13 identifies, using the size of each video recognized in step S23, the conference terminal which delivers the video whose size is instructed to be increased by the transmission terminal (S24). The conference terminal which delivers this video is defined as a “target terminal”. Furthermore, the terminals other than the transmission terminal and the target terminal are defined as “non-target terminals”.
The layout change instruction analyzer 13 generates a second area management table and a third area management table to instruct a screen split layout of the mixed video delivered to the target terminal and the non-target terminals (S25). This second area management table is defined as a “target area management table” and the third area management table is defined as a “non-target area management table”. The target area management table is set so that the size of the video delivered by the transmission terminal increases. For example, the size of the video delivered by the transmission terminal is adjusted so as to be equal to the size of the video delivered by the target terminal specified in the transmission area management table. Furthermore, the arrangement position is specified so that the video in the increased size falls within the range of the mixed video. Furthermore, the hierarchy information is specified so that the video of the transmission terminal comes to the top layer. On the other hand, the non-target area management table is set so that the size of the video delivered by the transmission terminal and the size of the video delivered by the target terminal become smaller. For example, the sizes of videos delivered by the transmission terminal and the target terminal are adjusted to become the smallest. Furthermore, the arrangement position is specified so that the video in the reduced size falls within the range of the mixed video. Furthermore, the hierarchy information is specified so that the video of the transmission terminal comes to the top layer and the video of the target terminal comes to the second layer.
The layout change instruction analyzer 13 generates a video mixing control signal using the information of the transmission area management table and the target area management table and outputs it to the video mixing unit (S26).
The layout change instruction analyzer 13 generates a voice mixing control signal to control a mixed voice to be delivered to the transmission terminal and the target terminal and outputs it to the voice mixing unit (S27). In this case, parameters are adjusted so that the volume of the voice delivered from the target terminal becomes louder in the mixed voice delivered to the transmission terminal. Furthermore, parameters are adjusted so that the volume of the voice delivered from the transmission terminal becomes louder in the mixed voice delivered to the target terminal. Furthermore, parameters are adjusted so that the volume of the voice delivered from the transmission terminal and the volume of the voice delivered from the target terminal become smaller in the mixed voice delivered to the non-target terminals.
The layout change instruction analyzer 13 generates a mixed video control packet including the target area management table and transmits it to the target terminal (S28). Furthermore, the layout change instruction analyzer 13 generates a mixed video control packet including the non-target area management table and transmits it to the non-target terminals.
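Procedure example 2 differs from example 1 mainly in S25, S27 and S28, in that the non-target terminals are also updated. Its distinctive voice-side adjustment (S27) can be sketched as follows, building on the gain matrix shown earlier; the emphasis and attenuation factors are arbitrary assumptions.

```python
def apply_local_conversation_gains(gains, sender, target, n_terminals=4,
                                   emphasis=2.0, attenuation=0.5):
    """S27: emphasize the pair's voices for each other and attenuate both
    of them in the mixes delivered to the non-target terminals."""
    gains[sender][target] *= emphasis      # target's voice louder for the sender
    gains[target][sender] *= emphasis      # sender's voice louder for the target
    for t in range(n_terminals):
        if t not in (sender, target):      # non-target terminals
            gains[t][sender] *= attenuation
            gains[t][target] *= attenuation
```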
As a result of processing procedure example 1 in the above described layout change instruction analyzer 13, when, for example, the user A increases the display size of the user B (changing it from 160×120 pixels to 240×180 pixels) in the mixed video delivered to the own conference terminal 21 as shown in
Furthermore, as a result of processing procedure example 2 in the above described layout change instruction analyzer 13, when, for example, the user A increases the display size of the user B (changing it from 160×120 pixels to 240×180 pixels) in the mixed video delivered to the own conference terminal 21 as shown in
This embodiment has explained the case where the number of conference terminals is four, but the number of terminals is not limited to this and the number of terminals may also be more or less than four. When there are many conference terminals, such a case can be handled by increasing the number of corresponding components in the multipoint control unit 1.
This embodiment has explained the case where the sizes of all videos transmitted from the conference terminals 21 to 24 are 320×240 pixels, but the sizes of videos transmitted from the respective conference terminals may also differ from one another. In such a case, it is possible to input videos, for example, to a video size decision unit 71 as shown in
This embodiment assumes that the average volumes of voices transmitted from the conference terminals 21 to 24 are the same, but the average volumes of voices transmitted from the respective conference terminals may also differ from one another. In such a case, it is possible to input voices, for example, to a volume level decision unit 81 as shown in
Furthermore,
Furthermore,
As the first embodiment of the present invention, the detailed configurations and operations of the multipoint control unit 1 and conference terminals 21 to 24 and the video-conferencing system made up of these components have been shown so far.
At an actual conference, local conversations (private conversations) such as private consultations and confirmations are often held. When a local conversation is held during an actual conference, the interested party often talks to the other party in a voice so low that the other conferees cannot hear it. That is, the party approaches the other party and talks in a suppressed tone of voice.
For example, a certain user A performs control so that, in the mixed video delivered to the own device, the facial image of a user B with whom the user A wants to hold a local conversation among the other parties of communication displayed in the mixed video is displayed enlarged, and the virtual sense of distance from the user B is thereby shortened. In this case, control is automatically performed so that the face of the user A is also displayed enlarged on the user B side, and the virtual sense of distance from the user A is thereby shortened for the user B, too. In this condition, only the voice of the user A out of the mixed voice delivered to the user B is emphasized and mixed, and only the voice of the user B out of the mixed voice delivered to the user A is emphasized and mixed. That is, after the sense of distance is shortened, even if the user A and the user B hold a conversation in voices lower than their normal voices, the conversation between the two parties becomes easier to hear as a result of the emphasis. On the other hand, the other users hear the conversation between the user A and the user B only at the same low volume. The present invention thus allows users to hold a local conversation even during a video-conference in a sense similar to that at an actual conference.
Here, in the above described example of
Furthermore, in the example of
This embodiment has described the “rectangular frame change processing” as a specific example of the operation method of enlarging the display of the facial image of the other party with whom a user wants to hold a local conversation in the mixed video displayed on the conference terminal side, but the operation method is not limited to this. For example, as an operation to select the other party, when the mouse button is clicked on the facial image of the other party with whom the user wants to hold a local conversation, it is possible to send position information indicating the clicked point in the mixed video from the conference terminal to the multipoint control unit, detect on the multipoint control unit side which parties are holding the local conversation from this information, and then generate and deliver a mixed video with the sizes of the respective facial images adjusted for the parties as well as a mixed voice with the volumes of the respective voices adjusted. It is also possible to perform control such that a left click causes the sizes of the facial images or the sound volumes of the parties to double or become a maximum, and a right click causes the sizes of the facial images or the sound volumes that were increased by the left click to be reduced to ½ or returned to their original levels.
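A sketch of the click-based selection just described: the multipoint control unit maps the clicked position in the mixed video to the topmost video under it and adjusts that party's display size. The doubling/halving behavior follows the example above; the hit-testing helper and all names are assumptions.

```python
def party_at(layout_table: list[dict], cx: int, cy: int):
    """Return the entry of the topmost video containing the clicked point."""
    hits = [e for e in layout_table
            if e["x"] <= cx < e["x"] + e["w"] and e["y"] <= cy < e["y"] + e["h"]]
    return min(hits, key=lambda e: e["Layer"]) if hits else None  # Layer 1 = top

def on_click(layout_table: list[dict], cx: int, cy: int, button: str):
    """Left click doubles the clicked party's display size; right click
    halves a size previously increased by a left click."""
    party = party_at(layout_table, cx, cy)
    if party is None:
        return None
    if button == "left":
        party["w"], party["h"] = party["w"] * 2, party["h"] * 2
    elif button == "right":
        party["w"], party["h"] = party["w"] // 2, party["h"] // 2
    return party["ID"]   # the selected party for voice emphasis
```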
Moreover, this embodiment generates and delivers a mixed video in which the facial images of the parties holding a local conversation are displayed enlarged, but the presentation after selecting the other party need not be limited to enlarging the facial image of the party. For example, it is possible to generate and deliver a mixed video in which the facial image of the party is displayed with a surrounding frame, or a mixed video in which the facial images of the users other than the parties are displayed with lowered color tones and darkened so that only the parties are highlighted.
Hereinafter, a second embodiment of the present invention will be explained with reference to drawings.
The configurations of the conference terminals 21 to 24 and the multipoint control unit 1 of this embodiment are the same as those of the first embodiment, with additional functions provided to the layout change instruction analyzer 13.
On the other hand,
At an actual conference, local conversations (private conversations) such as private consultations and confirmations are often held during the conference. At an actual conference, while holding a local conversation, the party concerned often converses with the other party in a voice so low that the other people at the conference cannot hear it. That is, the parties come close to each other and talk in a suppressed tone of voice. The present invention allows the other conferees to recognize that a local conversation is being held and, depending on their needs, to have the local conversation stopped or to participate in it themselves.
This application is based upon and claims the benefit of priority from Japanese Patent Application No. 2006-244553, filed in September 2006.