This Nonprovisional application claims priority under 35 U.S.C. § 119(a) to Japanese Patent Application No. 2023-129793, filed in Japan on Aug. 9, 2023, the entire contents of which are hereby incorporated by reference.
An embodiment of the present disclosure relates to an image control method, an image control apparatus, and a program.
Japanese Unexamined Patent Application Publication No. 2022-16997 discloses an information processing method that, in a case in which it is determined that a user is in an utterance state, outputs fast-forward image data of the user that is obtained by fast-forwarding image data of the user prior to a predetermined time, among buffered image data, to other users.
With the information processing method of Japanese Unexamined Patent Application Publication No. 2022-16997, it is difficult to convey an intention not to talk, or a desire to talk, to other users, since the screen display does not change regardless of whether the microphone is muted or unmuted.
In view of the foregoing, an embodiment of the present disclosure is directed to providing an image control method that is able to clearly convey an intention not to talk, or a desire to talk, to a conference participant.
An image control method includes determining a mute operation, sending, to a camera, control information to switch between a first state and a second state, based on a result of the determination, and causing the camera, which outputs first image information in the first state and outputs second image information in the second state, to output the first image information or the second image information based on the control information.
According to an embodiment of the present disclosure, an intention not to talk or a desire to talk is able to be clearly conveyed to a conference participant.
The microphone 10 is installed on a ceiling in a room. The microphone 10 has a housing having a thin rectangular parallelepiped shape. The controller 20 and the camera 30 are installed on a desk.
The desk is installed directly under the housing of the microphone 10.
The camera 30 obtains an image of a user. Predetermined signal processing is performed on a video signal according to the obtained image, and the video signal on which the signal processing has been performed is sent to the PC 40. The camera 30 performs framing processing such as pan, tilt, or zoom, for example.
The microphone 10 obtains a voice of the user. The microphone 10 includes a communication interface (I/F) 11, a processing controller 12, a flash memory 13, a RAM 14, and a microphone unit 15.
The processing controller 12 reads out an operating program from the flash memory 13 to the RAM 14 and collectively controls operations of the microphone 10. It is to be noted that the program does not need to be stored in the flash memory 13 of the microphone 10 itself. The processing controller 12 may, for example, download the program from a server or the like each time and read it out to the RAM 14.
The processing controller 12 functions as a processor that processes an audio signal. The processing controller 12 performs predetermined signal processing on the audio signal obtained by the microphone unit 15. The microphone unit 15 is an array microphone that has a plurality of microphone units, for example. The processing controller 12 performs directivity processing of beamforming. Beamforming is processing that aligns phases in the direction of a talker by delay-and-sum processing, forming a sound collection beam having increased sensitivity in the direction of the talker, for example.
The processing controller 12 may obtain direction information on the voice of a talker and perform processing to direct the sound collection beam in the direction of the talker. The processing controller 12 analyzes the audio signals obtained from a plurality of microphones in the microphone unit 15 and estimates a voice arrival direction. The method of analyzing the audio signals may be any method such as a cross-correlation method, a delay-and-sum method, or a MUSIC (Multiple Signal Classification) method. In the cross-correlation method, the processing controller 12 calculates cross-correlations of the audio signals of the plurality of microphones, for example. The processing controller 12 obtains a cross-correlation peak of the audio signals of certain two microphones, for example, and further obtains a cross-correlation peak of the audio signals of two different microphones. In other words, the processing controller 12 selects two or more pairs from the plurality of microphones and obtains a plurality of cross-correlation peaks. The processing controller 12 estimates the voice arrival direction based on the plurality of cross-correlation peaks calculated in such a manner. The estimated voice arrival direction is represented by a space vector, for example.
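As an illustration of the cross-correlation method described above, the following sketch estimates the inter-microphone delay from a cross-correlation peak and converts it into an arrival angle for one pair of microphones. The function names, the use of NumPy, and the far-field geometry (angle measured from the broadside of the pair, speed of sound c = 343 m/s) are illustrative assumptions, not details taken from the disclosure.

```python
import numpy as np

def estimate_delay(sig_a, sig_b):
    """Estimate the sample delay of sig_b relative to sig_a from the
    peak of their cross-correlation (hypothetical helper)."""
    corr = np.correlate(sig_b, sig_a, mode="full")
    # Index len(sig_a)-1 corresponds to zero lag in "full" mode.
    return int(np.argmax(corr)) - (len(sig_a) - 1)

def estimate_arrival_angle(sig_a, sig_b, mic_spacing_m, fs, c=343.0):
    """Convert the inter-microphone delay into an arrival angle in
    radians (0 = broadside of the two-microphone pair), assuming a
    far-field source."""
    delay_s = estimate_delay(sig_a, sig_b) / fs
    # Clamp to the physically possible range before taking arcsin.
    ratio = np.clip(delay_s * c / mic_spacing_m, -1.0, 1.0)
    return float(np.arcsin(ratio))
```

With more than two microphones, repeating this for two or more pairs yields the plurality of cross-correlation peaks from which a space vector can be estimated, as described above.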
The processing controller 12 sends the audio signal on which the signal processing has been performed to the PC 40 through the communication I/F 11. The PC 40 is connected to another information processing apparatus in a remote place through a network such as the Internet. The PC 40 sends the audio signal received from the microphone 10 and the video signal received from the camera 30 to the information processing apparatus on a far-end side. The PC 40 may display the video signal received from the camera 30 on its own display (not shown).
In addition, the PC 40 receives a video signal and an audio signal from the information processing apparatus on the far-end side. The PC 40 outputs the received video signal to the display (not shown) and outputs the received audio signal to a speaker (not shown). As a result, the image control system functions as a component of a remote conference system for holding a remote conference.
The controller 20 is an example of the image control apparatus of the present disclosure and is a remote controller for operating the microphone 10 or the camera 30. The controller 20 includes a communication I/F 21, a processing controller 22, a flash memory 23, a RAM 24, and a user I/F 25.
The processing controller 22 reads out an operating program 231 from the flash memory 23 to the RAM 24 and collectively controls operations of the controller 20. It is to be noted that the program does not need to be stored in the flash memory 23 of the controller 20 itself. The processing controller 22 may, for example, download the program from a server or the like each time and read it out to the RAM 24.
The processing controller 22 receives an operation by a user through the user I/F 25. The user I/F 25 has at least a mute button, and may also include other operation elements such as a volume change button or a power button.
The microphone 10 receives the mute information (S21). In a case of receiving the mute-on information, the microphone 10 stops the output of the audio signal obtained by the microphone unit 15 and enters a mute state. In a case of receiving the mute-off information, the microphone 10 resumes the output of the audio signal obtained by the microphone unit 15 and cancels the mute state.
The processing controller 22 of the controller 20 sends, to the camera 30, control information to switch between a first state and a second state, based on the determination result (S13). In the case of determining the mute-on operation, the processing controller 22 sends, to the camera 30, the control information to switch to the first state; in the case of determining the mute-off operation, the processing controller 22 sends, to the camera 30, the control information to switch to the second state.
The camera 30 receives the control information (S31). The camera 30 switches a camera state based on the control information (S32). The camera state includes the first state in which first image information is outputted and the second state in which second image information is outputted.
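The mute-to-camera handshake of steps S11 through S32 can be sketched as follows. This is a minimal illustrative model, not code from the disclosure; the class and method names (Controller, Microphone, Camera, and so on) and the string representation of the two camera states are assumptions made for illustration.

```python
class Microphone:
    """Receives mute information and gates its audio output (S21)."""
    def __init__(self):
        self.muted = False

    def receive_mute_info(self, mute_on):
        # Mute-on stops the audio output; mute-off resumes it.
        self.muted = mute_on


class Camera:
    """Outputs first image information in the first state and second
    image information in the second state (S31/S32)."""
    FIRST_STATE = "wide"     # whole-room shot while muted
    SECOND_STATE = "talker"  # talker-focused shot while unmuted

    def __init__(self):
        self.state = self.SECOND_STATE

    def receive_control_info(self, target_state):
        # S32: switch the camera state based on the control information.
        self.state = target_state

    def output_image(self):
        return f"{self.state}-image"


class Controller:
    """Determines the mute operation and distributes the results."""
    def __init__(self, mic, camera):
        self.mic = mic
        self.camera = camera

    def on_mute_button(self, mute_on):
        # S11/S12: determine the mute operation, send mute information.
        self.mic.receive_mute_info(mute_on)
        # S13: send control information based on the determination result.
        target = Camera.FIRST_STATE if mute_on else Camera.SECOND_STATE
        self.camera.receive_control_info(target)
```

For example, pressing the mute button (`on_mute_button(True)`) both mutes the microphone and switches the camera to the whole-room shot in one step.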
Alternatively, the reference state may correspond to a state in which an image of a specific user is focused on. The specific user is a chairperson who facilitates a conference. The specific user is preset by the PC 40. Alternatively, the reference state may be an initial state in which the framing processing such as pan, tilt, and zoom, for example, is not performed.
The second state is a state in which the framing processing such as pan, tilt, and zoom, for example, is performed and the talker is focused on. As an example, the camera 30 performs processing to recognize the face of the talker by a predetermined model using a neural network or the like. The camera 30 performs pan, tilt, and zoom so that the image of the recognized talker is at the center of the screen and so that the occupancy rate of the image of the talker in the screen becomes a predetermined rate (50%, for example).
It is to be noted that the camera 30 may receive the direction information on the voice of the talker that the microphone 10 obtains, and may perform the framing processing based on the direction information. In addition, in the second state, the camera 30 may perform processing to mask an image of a person other than the talker or an image of a person who is not participating in the conference.
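The framing computation described above (centering the recognized talker and bringing the occupancy rate to a predetermined rate such as 50%) can be sketched geometrically. The function name and the pixel-offset representation of pan and tilt are hypothetical; an actual camera would map such values onto its own motor and lens commands.

```python
def framing_parameters(face_box, frame_w, frame_h, target_occupancy=0.5):
    """Given a detected talker's bounding box (x, y, w, h) in pixels,
    return (pan_px, tilt_px, zoom): the horizontal and vertical shifts
    that would center the box, and the linear zoom factor that would
    scale its area to the target share of the screen."""
    x, y, w, h = face_box
    cx, cy = x + w / 2, y + h / 2
    pan = frame_w / 2 - cx   # shift right/left to center the talker
    tilt = frame_h / 2 - cy  # shift up/down to center the talker
    occupancy = (w * h) / (frame_w * frame_h)
    # Area scales with the square of linear zoom, hence the square root.
    zoom = (target_occupancy / occupancy) ** 0.5
    return pan, tilt, zoom
```

A face box already centered in a 1920x1080 frame yields zero pan and tilt, with the zoom factor determined only by how small the face currently is relative to the target occupancy.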
As described above, the image control system according to the present embodiment changes the camera state of the camera 30 in conjunction with the mute button of the controller 20. In a case in which a user performs the mute-on operation, the camera 30 outputs the first image information, which is an image capturing all of the users. In a case in which a user performs the mute-off operation, the camera 30 outputs the second image information, which is an image focusing on a talker (the user u1 in the illustrated example).
The first image information and the second image information are sent to the information processing apparatus on the far-end side through the PC 40. The first image information or the second image information is displayed on a display of the information processing apparatus on the far-end side. Alternatively, the PC 40 may display the first image information and the second image information on its own display (not shown).
Normally, a mute-on state and a mute-off state are displayed on a GUI of the software of the remote conference system for holding a remote conference. Therefore, as a comparative example, in a case in which the camera state does not change even when the user performs the mute operation, a user of the information processing apparatus on the far-end side is unlikely to notice a change from mute-on to mute-off or from mute-off to mute-on.
However, in the image control system according to the present embodiment, in the case in which the user performs the mute-on operation, the image of the camera 30 is the image capturing all of the users. A user who looks at the image capturing all of the users can intuitively understand that nobody wants to talk. In the image control system according to the present embodiment, in the case in which the user performs the mute-off operation, the image of the camera 30 is the image focusing on a talker (the user u1 in the illustrated example).
In such a manner, a user of the image control system according to the present embodiment can gain a new customer experience of being able to clearly convey an intention not to talk or a desire to talk, to other conference participants.
In the image control system according to the first modification, the camera 30 is directly connected to the PC 40. The controller 20 sends the control information to switch between the first state and the second state to the PC 40 in the processing of S13.
The PC 40 receives the control information (S41). The PC 40 switches the state of the camera 30, based on the received control information (S42). The camera state, as described above, includes the first state in which the first image information is outputted and the second state in which the second image information is outputted.
In such a manner, the controller 20 may send the control information to the camera 30 through an information processing apparatus that receives a video signal of a camera.
Specifically, in the image control system according to the second modification, when the controller 20 sends the mute information to the microphone 10 in the processing of S12, the microphone 10 sends, to the camera 30, the control information to switch between the first state and the second state (S23).
In other words, in the second modification, the controller 20 does not send the control information to the camera 30 but sends the mute information to the microphone 10, which causes the microphone 10 to send the control information to the camera 30.
It is to be noted that, in a case in which the camera 30 receives the direction information on the voice of a talker that the microphone 10 obtains and performs the framing processing based on the direction information, the microphone 10 may stop sending the direction information when sending, to the camera 30, the control information to switch to the first state. As a result, in a case in which the camera 30 switches to the first state, a conflict between the framing processing to focus on a talker based on the direction information and the framing processing to capture the whole of the plurality of users is able to be prevented.
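The conflict avoidance above can be sketched as a simple gate on the direction information. The class name and interface are hypothetical illustrations; the point is only that direction updates are withheld while the camera is in the first (whole-room) state, so direction-driven framing cannot fight the wide shot.

```python
class DirectionGate:
    """Hypothetical gate inside the microphone: forwards talker-direction
    updates to the camera only while the camera is NOT in the first
    (whole-room) state."""
    def __init__(self):
        self.camera_in_first_state = False

    def set_first_state(self, first_state):
        # Updated when the control information to switch states is sent.
        self.camera_in_first_state = first_state

    def forward_direction(self, direction_deg):
        # Drop direction information while the wide shot is active.
        if self.camera_in_first_state:
            return None
        return direction_deg
```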
An image control system according to the third modification includes a processor 50. The processor 50 may be signal processing hardware (a DSP: Digital Signal Processor). The processor 50 controls devices connected to the image control system through the network, such as the microphone 10, a speaker (not shown), or the camera 30, and performs signal processing such as routing, mixing, or effects on a signal to be inputted into each device or a signal to be outputted from each device.
The processor 50 may receive, from the microphone 10, the audio signal obtained by the microphone unit 15 and perform the directivity processing of beamforming. Alternatively, the processor 50 may analyze the audio signals obtained from the plurality of microphones in the microphone unit 15 and estimate a voice arrival direction. Alternatively, the processor 50 may receive the video signal captured by the camera 30 and perform the framing processing.
The controller 20 is directly connected to the processor 50. The controller 20 sends the mute information to the processor 50 in the processing of S12. The processor 50 receives the mute information (S51) and sends the mute information to the microphone 10 (S52). Then, the processor 50 sends, to the camera 30, the control information to switch between the first state and the second state (S53).
In such a manner, in a case in which the image control system includes the processor 50, the controller 20 may send the control information to the camera 30 through the processor 50.
In the image control system according to the fourth modification, the camera 30 and the controller 20 are directly connected to the PC 40. The controller 20 sends the mute information to the PC 40 in the processing of S12.
The PC 40, when receiving the mute information (S61), sends the mute information to the microphone 10 (S62). Then, the PC 40 sends, to the camera 30, the control information to switch between the first state and the second state (S63).
In such a manner, the controller 20 may send the mute information to the microphone 10 through the PC 40 and send the control information to the camera 30.
The description of the foregoing embodiments is illustrative in all points and should not be construed to limit the present disclosure. The scope of the present disclosure is defined not by the foregoing embodiments but by the following claims. Further, the scope of the present disclosure is intended to include all modifications within the scopes of the claims and within the meanings and scopes of equivalents.
For example, each device including the processor 50 does not need to be connected through the network and may be connected by another communication line such as USB. Alternatively, each device including the processor 50 may be connected wirelessly, such as by wireless LAN or Bluetooth (registered trademark).