The present invention relates to an information processing system, an edge device, a server, a control method, and a storage medium, and more particularly to an information processing system, an edge device, a server, a control method, and a storage medium that control communication of avatars in a virtual space.
It has become common for a plurality of users to each wear a head mounted display (an HMD) and communicate with one another through their own alter egos, known as avatars, in a virtual space created by virtual reality (VR). Voice chat is often used as a communication means between the users wearing the HMDs.
Japanese Patent No. 6289703 discloses a technique for imparting directionality to a voice outputted from a sound source object of a first avatar and setting a sound volume parameter of the outputted voice according to a position of a sound collection object of a second avatar.
For example, consider a situation in which a user and an acquaintance, each wearing an HMD, participate as spectators in a live music event held in a virtual space. At the live music event, the surroundings of the user and the acquaintance are usually noisy due to the singing and playing of performers, the cheers of other spectators, and the like. Therefore, there is an issue that even in the case that the acquaintance speaks to the user while the two are participating in the live music event held in the virtual space, the user has difficulty hearing the acquaintance's voice and carrying on a conversation with the acquaintance.
In the case that the technique disclosed in Japanese Patent No. 6289703 is applied in this situation, several problems are expected to occur. The first problem is that the setting of the sound volume parameter can only be controlled on the sound source object side, so even in the case that the user of the second avatar (the sound collection object side) feels that he or she is not able to hear the voice outputted from the first avatar (the sound source object side), the user of the second avatar cannot solve this by himself or herself. The second problem is that the user of the first avatar (the sound source object side) has to determine whether or not it is necessary to change the setting of the sound volume parameter. The third problem is that in the case that there are avatar(s) of other spectator(s) near a desired acquaintance (the second avatar), the loud voice outputted from the sound source object of the first avatar will reach the other spectator(s) and cause a nuisance.
The present invention provides an information processing system, an edge device, a server, a control method, and a storage medium that allow a user to establish a conversation with a desired person through an avatar even in a noisy environment in a virtual space.
Accordingly, the present invention provides an information processing system that comprises three or more user terminals, each including a voice input unit and a voice output unit, connected via a network, and that provides a virtual space including avatars respectively linked to the multiple user terminals, the information processing system comprising a voice generating unit configured to generate voice data whose sound source is each avatar linked to each of the multiple user terminals based on a voice inputted into the voice input unit of each of the multiple user terminals, and one or more processors and/or circuitry configured to execute a voice data transmitting processing that transmits, to the multiple user terminals, the voice data whose sound source is each avatar, execute an association determination processing that determines whether or not first voice data whose sound source is a first avatar linked to a first user terminal among the multiple user terminals is associated with a second avatar linked to a second user terminal among the multiple user terminals, execute a voice data analysis processing that analyzes the first voice data and second voice data excluding the first voice data, among voice data to be transmitted to the second user terminal by the voice data transmitting processing, and execute a sound volume adjustment processing that, in a case that it is determined in the association determination processing that the first voice data is associated with the second avatar and it is analyzed in the voice data analysis processing that the second voice data disturbs the first voice data, adjusts a sound volume of the voice data to be transmitted to the second user terminal, which is outputted from the voice output unit of the second user terminal.
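By way of illustration only, the following Python sketch shows how the voice data transmitting processing, the association determination processing, the voice data analysis processing, and the sound volume adjustment processing might fit together for one destination terminal; the names used (VoicePacket, is_associated, disturbs) are hypothetical and do not form part of the claimed configuration.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class VoicePacket:
    source_avatar: str     # avatar whose utterance produced this voice data
    samples: List[float]   # placeholder for the audio itself
    volume: float = 1.0    # linear gain applied on playback

def prepare_for_second_terminal(
    first: VoicePacket,
    others: List[VoicePacket],
    is_associated: Callable[[VoicePacket], bool],
    disturbs: Callable[[List[VoicePacket], VoicePacket], bool],
) -> List[VoicePacket]:
    """Assemble the voice data to be sent to the second user terminal."""
    # The association determination and the voice data analysis together
    # gate the sound volume adjustment.
    if is_associated(first) and disturbs(others, first):
        # Raising the associated voice here; lowering the other voices is
        # the described alternative.
        first.volume *= 2.0
    return [first] + others  # the voice data transmitting processing sends these
```

In this sketch the adjustment simply doubles the gain; the adjustment amount is left to the settings of the server or the receiving terminal in the embodiment described below.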
Accordingly, the present invention provides an edge device that functions as one of the multiple user terminals included in the information processing system, the edge device comprising one or more processors and/or circuitry configured to execute the sound volume adjustment processing.
Accordingly, the present invention provides a server that is included in the information processing system and that is connected to the multiple user terminals via the network, the server comprising one or more processors and/or circuitry configured to execute the voice data transmitting processing, the association determination processing, and the voice data analysis processing.
According to the present invention, the user is able to establish the conversation with the desired person through the avatar even in the noisy environment in the virtual space.
Further features of the present invention will become apparent from the following description of exemplary embodiments with reference to the attached drawings.
The present invention will now be described in detail below with reference to the accompanying drawings showing embodiments thereof.
Hereinafter, a preferred embodiment of the present invention will be described in detail with reference to the accompanying drawings. It should be noted that the following embodiment does not limit the present invention as defined by the claims. Although the following embodiment will describe a plurality of features, not all of the plurality of features are necessarily essential to the present invention, and the plurality of features may be combined in any desired manner. Furthermore, in the accompanying drawings, the same or similar configurations (components) will be given the same reference numerals and duplicated descriptions will be omitted.
The CPU 101a executes various kinds of processing by using programs and data stored in the RAM 103a or the ROM 102a. As a result, the CPU 101a performs the operation control of the user terminal 100 as a whole, and also executes or controls various kinds of processing which will be described below as those performed by the user terminal 100.
The ROM 102a stores setting data for the user terminal 100, computer programs or data related to the startup of the user terminal 100, or computer programs or data related to the basic operations of the user terminal 100, and the like.
The RAM 103a includes an area for storing computer programs or data loaded from the ROM 102a or the HDD 104a. In addition, the RAM 103a includes a working area that is used when the CPU 101a executes various kinds of processing. In this way, the RAM 103a is able to provide various kinds of areas as needed.
The HDD 104a is an example of a large-capacity information storage device. The HDD 104a stores an operating system (an OS), and computer programs or data for causing the CPU 101a to execute and control the various kinds of processing which will be described below as those performed by the user terminal 100. In addition, registering and retaining of various kinds of information described herein may be performed by the HDD 104a or the RAM 103a. The computer programs or the data stored in the HDD 104a are loaded into the RAM 103a as appropriate under the control of the CPU 101a, and are processed by the CPU 101a.
It should be noted that in addition to the HDD 104a or instead of the HDD 104a, a medium (a recording medium) and a drive device that reads and writes computer programs and/or data from and into the medium may be provided. Known examples of such a medium include a flexible disk (an FD), a CD-ROM, a DVD, a USB memory, an MO disk, and a flash memory.
The hardware configuration of the information processing apparatus applicable to the user terminal 100 is not limited to the configuration described above.
The HMD 106 includes a function of displaying a field-of-view image in a virtual space that is created by VR and is provided by the server 111. In addition, the HMD 106 includes various kinds of sensors (a first detecting unit and a second detecting unit) built therein for detecting an inclination of the HMD 106 itself, a line of sight of a user wearing the HMD 106 (hereinafter, referred to as “a wearing user”), movements of hands of the wearing user, etc. The CPU 101a (a line-of-sight direction setting unit) sets a line-of-sight direction (a gaze direction) of an avatar of the wearing user based on a position of the avatar of the wearing user and the line of sight of the wearing user that has been detected by the HMD 106, and determines a field-of-view image to be displayed by the HMD 106 in accordance with the line-of-sight direction. Furthermore, the CPU 101a (an avatar movement control unit) moves hands of the avatar of the wearing user, which is displayed within the virtual space, in response to the movements of the hands of the wearing user that have been detected by the HMD 106. It should be noted that based on the detection results obtained by the various kinds of sensors built in the HMD 106, the server 111 may perform the setting of the line-of-sight direction of the avatar of the wearing user in the virtual space, the determination of the field-of-view image, and the control of the movements of the hands of the avatar of the wearing user.
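As a non-authoritative sketch of the line-of-sight direction setting described above, the following converts an HMD orientation reported by the built-in sensors into a unit gaze vector; the yaw/pitch convention and the function name are assumptions made for illustration.

```python
import math

def gaze_direction(yaw_deg: float, pitch_deg: float):
    """Map an HMD orientation (yaw about the vertical axis, pitch as
    elevation) to a unit line-of-sight vector for the wearing user's avatar."""
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    return (math.cos(pitch) * math.sin(yaw),   # x: lateral component
            math.sin(pitch),                   # y: vertical component
            math.cos(pitch) * math.cos(yaw))   # z: forward component

# Looking straight ahead yields the forward axis:
print(gaze_direction(0.0, 0.0))  # (0.0, 0.0, 1.0)
```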
When the microphone 107 (serving as a voice input unit) accepts an input of the wearing user's voice, the microphone 107 (serving as a voice generating unit) generates voice data thereof and then transmits the generated voice data to the CPU 101a. The CPU 101a transmits the voice data from the microphone 107 to the server 111 as voice data whose sound source is an avatar linked to the user terminal 100. It should be noted that in the case that the wearing user is a performer of a live music event held in the virtual space, the microphone 107 accepts an input of a singing voice or a playing sound of the performer. The speaker 108 (serving as a voice output unit) outputs voice data collected in the virtual space.
The controller 109 (serving as a setting unit) accepts input(s) from the wearing user and reflects the contents of the input(s) in moving the position of the avatar, which represents the wearing user in the virtual space as his or her alter ego, and in operating a user interface (a UI) displayed in the virtual space.
Unlike the user terminal 100, the server 111 is not connected to external apparatuses such as the HMD 106, the microphone 107, the speaker 108, and the controller 109; in other respects, the server 111 is an information processing apparatus whose hardware configuration is the same as that of the user terminal 100. In other words, the server 111 includes a CPU 101b, a ROM 102b, a RAM 103b, and an HDD 104b, and these functional units are communicably connected to one another via a bus 105b. In the hardware configurations described above, the constituent elements of the user terminal 100 are denoted by reference numerals suffixed with “a”, and the corresponding constituent elements of the server 111 are denoted by reference numerals suffixed with “b”.
In addition, numerical values, processing timings, processing orders, the processing entities, transmission destinations/transmission sources/storage locations of data (information), etc., which are used in the embodiment described below, are given as an example to provide a concrete description, and are not intended to be limited to such an example.
The information processing system 1 includes multiple user terminals 100 (a first user terminal 100-1, a second user terminal 100-2, etc.) and the server 111, which are connected to one another via the network 110.
The first user terminal 100-1 includes an input/output unit 112-1 and a transmitting and receiving unit (user terminal) 113-1, and the server 111 according to the present embodiment includes a transmitting and receiving unit (server) 114, an association determining unit 115, a voice data analyzing unit 116, and a sound volume adjusting unit 117.
The respective functional units of the first user terminal 100-1 and the server 111 described above are realized by the CPU 101a or the CPU 101b executing computer programs loaded into the RAM 103a or the RAM 103b.
The input/output unit 112-1 accepts data inputted from external apparatuses (an HMD 106-1, a microphone 107-1, a speaker 108-1, a controller 109-1, etc.) connected to the first user terminal 100-1, and outputs data to these external apparatuses.
The transmitting and receiving unit (user terminal) 113-1 transmits data retained by the first user terminal 100-1 (for example, voice data generated by the microphone 107-1, the position of the avatar of the wearing user, the movements of the hands of the wearing user, etc.) to the server 111 via the network 110. In addition, the transmitting and receiving unit (user terminal) 113-1 receives data transmitted from the server 111 via the network 110.
The transmitting and receiving unit (server) 114 (a voice data transmitting unit) transmits data retained by the server 111 (for example, voice data whose sound source is each avatar in the virtual space, a position of each avatar, movements of hands of each avatar, etc.) to the user terminal 100, which is the transmission destination, via the network 110. In addition, the transmitting and receiving unit (server) 114 receives data transmitted from the user terminal 100, which is the transmission source, via the network 110.
With such a configuration, the voice data whose sound source is each avatar in the virtual space, information on the position of each avatar in the virtual space, and information on the movements of the hands of each avatar are shared between the server 111 and the user terminal 100.
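A minimal sketch of this shared information, assuming one record per avatar mirrored on the server 111 and each user terminal 100, is shown below; the field names are illustrative and are not taken from the embodiment.

```python
from dataclasses import dataclass

@dataclass
class AvatarState:
    """Per-avatar information mirrored between the server and the terminals."""
    position: tuple = (0.0, 0.0, 0.0)   # position of the avatar in the space
    hand_pose: tuple = (0.0, 0.0, 0.0)  # detected movements of the hands
    voice_chunk: bytes = b""            # latest voice data from this avatar

# One such mapping per node keeps the shared information consistent.
shared_world = {"A-san": AvatarState(position=(1.0, 0.0, 2.0))}
print(shared_world["A-san"].position)
```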
The association determining unit 115 determines whether or not the voice data transmitted from the first user terminal 100-1 is associated with an avatar linked to the second user terminal 100-2.
In the case that the association determining unit 115 determines that the voice data transmitted from the first user terminal 100-1 is associated with the avatar linked to the second user terminal 100-2, the voice data analyzing unit 116 analyzes the voice data to be transmitted to the second user terminal 100-2.
The sound volume adjusting unit 117 adjusts a sound volume of the voice data to be transmitted to the second user terminal 100-2, which is outputted from a speaker 108-2.
Hereinafter, the processing executed in the information processing system 1 according to the present embodiment will be described with reference to the accompanying drawings.
First, an example of the processing of adjusting the sound volume of the voice data in accordance with the situation in the virtual space, which is executed in the information processing system 1, will be described with reference to a flowchart.
In the flowchart, steps S201 and S202 are executed by the first user terminal 100-1, steps S203 to S205 are executed by the server 111, and steps S206 and S207 are executed by the second user terminal 100-2.
In the step S201, the input/output unit 112-1 accepts a voice uttered by a user of the first user terminal 100-1 into the microphone 107-1 as first voice data. In the step S202, the transmitting and receiving unit (user terminal) 113-1 transmits the first voice data accepted in the step S201 to the server 111 via the network 110, and the present processing on the side of the first user terminal 100-1 ends.
In the step S203, the transmitting and receiving unit (server) 114 receives, via the network 110, the first voice data transmitted in the step S202.
In the step S204, the association determining unit 115, the voice data analyzing unit 116, and the sound volume adjusting unit 117 execute a sound volume adjustment processing that adjusts the sound volume of the voice data to be transmitted to the second user terminal 100-2. The sound volume adjustment processing will be described in detail below.
In the step S205, the transmitting and receiving unit (server) 114 transmits the voice data whose sound volume has been adjusted in the step S204 to the second user terminal 100-2 via the network 110, and the present processing on the side of the server 111 ends.
In the step S206, a transmitting and receiving unit (user terminal) 113-2 receives the voice data transmitted from the server 111 via the network 110 in the step S205.
In the step S207, an input/output unit 112-2 outputs the voice data that has been received in the step S206 from the speaker 108-2 at the sound volume that has been adjusted in the step S204, and the present processing on the side of the second user terminal 100-2 ends.
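The server-side portion of this flow (the steps S203 to S205) can be pictured with the following sketch, in which `adjust` is a hypothetical stub standing in for the sound volume adjustment processing of the step S204.

```python
def server_handle(first_voice_data, adjust, send_to_second_terminal):
    # S203: the server receives the first voice data via the network 110.
    received = dict(first_voice_data)
    # S204: the sound volume adjustment processing (detailed below).
    adjusted = adjust(received)
    # S205: transmit the (possibly adjusted) voice data to the second terminal.
    send_to_second_terminal(adjusted)

# S201/S202 (terminal side) and S206/S207 (playback) reduced to stubs:
outbox = []
server_handle({"samples": [0.1, -0.2], "volume": 1.0},
              adjust=lambda v: {**v, "volume": v["volume"] * 2.0},
              send_to_second_terminal=outbox.append)
print(outbox)  # the second user terminal would output this at the new volume
```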
Next, an example of the sound volume adjustment processing in the step S204 described above will be described.
First, in a step S301, the association determining unit 115 executes an association determination processing that determines whether or not the first voice data received in the step S203 is associated with the second avatar linked to the second user terminal 100-2. The association determination processing will be described in detail below.
In a step S302, in the case that the result of the association determination processing performed in the step S301 is a determination result indicating that the first voice data is associated with the second avatar, the association determining unit 115 advances the sound volume adjustment processing to a step S303, and otherwise, ends the sound volume adjustment processing.
In the step S303, the voice data analyzing unit 116 executes a voice data analysis processing that analyzes the first voice data and second voice data excluding the first voice data, among the voice data to be transmitted to the second user terminal 100-2. The voice data analysis processing will be described in detail below.
In a step S304, in the case that the result of the voice data analysis processing performed in the step S303 is an analysis result indicating that the second voice data disturbs the first voice data, the voice data analyzing unit 116 advances the sound volume adjustment processing to a step S305, and otherwise, ends the sound volume adjustment processing. Here, “the second voice data disturbs the first voice data” means a situation in which the first voice data cannot be heard, or is very difficult to hear, due to the second voice data.
In the step S305, the sound volume adjusting unit 117 adjusts the sound volume of the voice data to be transmitted to the second user terminal 100-2, and then ends the sound volume adjustment processing. The sound volume of the voice data to be transmitted to the second user terminal 100-2 is adjusted by increasing the sound volume of the first voice data or decreasing the sound volume of the second voice data in accordance with the settings of the server 111 or the second user terminal 100-2. In this way, in the case that the second voice data disturbs the first voice data, increasing the sound volume of the first voice data makes the first voice data audible. Alternatively, decreasing the sound volume of the second voice data makes the first voice data relatively easy to hear.
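A minimal sketch of the two alternatives of the step S305 follows, assuming linear gains and a hypothetical `mode` setting that selects between increasing the first voice data and decreasing the second voice data.

```python
def adjust_volumes(first_gain: float, second_gain: float,
                   mode: str = "boost_first"):
    """Make the first voice data audible either by raising it or by
    lowering everything else, per the settings of the server 111 or the
    second user terminal 100-2 (`mode` is an illustrative assumption)."""
    if mode == "boost_first":
        return first_gain * 2.0, second_gain   # roughly +6 dB on the voice
    if mode == "duck_second":
        return first_gain, second_gain * 0.5   # roughly -6 dB on the rest
    raise ValueError(f"unknown mode: {mode}")

print(adjust_volumes(1.0, 1.0))                      # (2.0, 1.0)
print(adjust_volumes(1.0, 1.0, mode="duck_second"))  # (1.0, 0.5)
```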
Next, an example of the association determination processing in the step S301 described above will be described.
First, in a step S306, the association determining unit 115 determines whether or not the second avatar is placing his/her hand on his/her ear toward the first avatar, which is the sound source of the first voice data. In the case that the second avatar is placing his/her hand on his/her ear toward the first avatar (YES in the step S306), the association determination processing proceeds to a step S307, and otherwise (NO in the step S306), the association determination processing proceeds to a step S308.
In the step S307, the association determining unit 115 determines that the first voice data is associated with the second avatar, and then ends the association determination processing.
In the step S308, the association determining unit 115 determines whether or not the first avatar and the second avatar are in a friendly relationship. In the case that the first avatar and the second avatar are in a friendly relationship (YES in the step S308), the association determination processing proceeds to the step S307, and otherwise (NO in the step S308), the association determination processing proceeds to a step S309. The case where the first avatar and the second avatar are in a friendly relationship is, for example, a case where the first avatar and the second avatar have been registered in the server 111 as friends with each other. The details of this determination method will be described below.
In the step S309, the association determining unit 115 determines whether or not the first avatar is facing the direction of the second avatar (that is, whether or not the first avatar faces the second avatar). In the case that the first avatar faces the second avatar (YES in the step S309), the association determination processing proceeds to the step S307, and otherwise (NO in the step S309), the association determination processing proceeds to a step S310. The case where the first avatar faces the second avatar is, for example, a case where the second avatar is present in a line-of-sight direction of the first avatar or on an extension line in front of the first avatar's face. Whether or not the second avatar is present on the extension line in front of the first avatar's face is determined based on the position of the first avatar set by a user operation with respect to the controller 109-1 and the position of the second avatar set by a user operation with respect to the controller 109-2.
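The facing determination of the step S309 can be sketched as an angular test between the first avatar's facing direction and the direction toward the second avatar; the unit-vector representation and the tolerance angle are assumptions made for illustration.

```python
import math

def faces(first_pos, first_dir, second_pos, tolerance_deg: float = 15.0) -> bool:
    """Return True when the second avatar lies on an extension line of the
    first avatar's facing direction `first_dir` (assumed unit length),
    within an angular tolerance."""
    to_second = [b - a for a, b in zip(first_pos, second_pos)]
    distance = math.hypot(*to_second)
    if distance == 0.0:
        return True  # co-located avatars trivially face each other
    cosine = sum(d * t for d, t in zip(first_dir, to_second)) / distance
    return cosine >= math.cos(math.radians(tolerance_deg))

# The second avatar stands five units straight ahead of the first avatar:
print(faces((0, 0, 0), (0.0, 0.0, 1.0), (0, 0, 5)))  # True
```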
In the step S310, the association determining unit 115 determines whether or not a name of the second avatar is included in the first voice data. Specifically, the association determining unit 115 performs a voice recognition processing and a natural language processing with respect to the first voice data, and determines whether or not a name “B-san” displayed above the head of the second avatar, which will be described below, is included in the first voice data. In the case that the name of the second avatar is included in the first voice data (YES in the step S310), the association determination processing proceeds to the step S307, and otherwise (NO in the step S310), the association determination processing proceeds to a step S311.
In the step S311, the association determining unit 115 determines that the first voice data is not associated with the second avatar, and then ends the association determination processing.
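The name determination of the step S310 might be sketched as follows, where `transcribe` is a stand-in for the voice recognition processing and the natural language processing; the simple substring test is a deliberate simplification, not the disclosed method.

```python
def name_is_called(first_voice_data: bytes, avatar_name: str, transcribe) -> bool:
    """Determine whether the second avatar's displayed name appears in the
    utterance; `transcribe` is assumed to return plain text."""
    text = transcribe(first_voice_data)
    # A real implementation would handle honorifics and word boundaries;
    # a substring test suffices to illustrate the determination.
    return avatar_name in text

# With a stubbed recognizer, an utterance containing “B-san” is associated:
print(name_is_called(b"...", "B-san", transcribe=lambda _: "Hey, B-san!"))  # True
```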
Next, an example of the voice data analysis processing in the step S303 described above will be described.
First, in a step S312, the voice data analyzing unit 116 compares the first voice data with the second voice data, for example, in terms of their sound volumes. In the case that the comparison result indicates that the first voice data cannot be heard, or is very difficult to hear, due to the second voice data (YES in the step S312), the voice data analysis processing proceeds to a step S313, and otherwise (NO in the step S312), the voice data analysis processing proceeds to a step S314.
In the step S313, the voice data analyzing unit 116 determines that the second voice data disturbs the first voice data, and then ends the voice data analysis processing.
In the step S314, the voice data analyzing unit 116 determines that the second voice data does not disturb the first voice data, and then ends the voice data analysis processing.
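One conceivable criterion for the comparison of the step S312 is a loudness comparison, sketched below with a root-mean-square measure and a 3 dB margin; the embodiment does not prescribe this particular criterion, so both the measure and the margin are assumptions.

```python
import math

def rms(samples):
    """Root-mean-square level of a (non-empty) block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def second_disturbs_first(second, first, margin_db: float = 3.0) -> bool:
    """Treat the second voice data as disturbing when it is not at least
    `margin_db` quieter than the first voice data."""
    ratio = max(rms(second), 1e-12) / max(rms(first), 1e-12)
    return 20.0 * math.log10(ratio) > -margin_db

# Loud surroundings drown out a quiet utterance:
print(second_disturbs_first([0.8, -0.7, 0.9], [0.05, -0.04, 0.06]))  # True
```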
Next, specific examples of the respective kinds of processing shown in the flowcharts described above will be described.
In the virtual space, a first avatar 401 linked to the first user terminal 100-1 and a second avatar 402 linked to the second user terminal 100-2 are present.
In addition, a name 404 “B-san” of the second avatar 402 is displayed above the head of the second avatar 402.
A dashed line speech bubble 405 indicates that voice data corresponding to the voice “ . . . ” uttered by the user of the first user terminal 100-1 is being generated from the first avatar 401 as a sound source. The dashed line speech bubble 405 is shown for convenience of explanation and is not actually displayed in the virtual space.
In the step S306, the association determining unit 115 determines whether or not the second avatar 402 is placing his/her hand on his/her ear toward the first avatar 401, which is the sound source of the first voice data. Here, the second avatar 402 is placing his/her hand on his/her ear toward the first avatar 401, and hence the association determination processing proceeds to the step S307.
In the step S306, the association determining unit 115 determines whether or not the second avatar 402 is placing his/her hand on his/her ear toward the first avatar 401, which is the sound source of the first voice data. Here, unlike the case described above, the second avatar 402 is not placing his/her hand on his/her ear toward the first avatar 401, and hence the association determination processing proceeds to the step S308.
In the step S308, the association determining unit 115 determines whether or not the first avatar 401 and the second avatar 402 are in a friendly relationship. Here, the first avatar 401 and the second avatar 402 have been registered in the server 111 as friends with each other, and hence the association determination processing proceeds to the step S307.
In the step S306, the association determining unit 115 determines whether or not the second avatar 402 is placing his/her hand on his/her ear toward the first avatar 401, which is the sound source of the first voice data. As shown in FIG. 4D, similar to the case described above, the second avatar 402 is not placing his/her hand on his/her ear toward the first avatar 401, and hence the association determination processing proceeds to the step S308.
In the step S308, the association determining unit 115 determines whether or not the first avatar 401 and the second avatar 402 are in a friendly relationship. However, here, a friend registration status different from the friend registration status between avatars in the example described above is assumed; the first avatar 401 and the second avatar 402 have not been registered as friends with each other, and hence the association determination processing proceeds to the step S309.
In the step S309, the association determining unit 115 determines whether or not the first avatar 401 is facing the direction of the second avatar 402 (that is, whether or not the first avatar 401 faces the second avatar 402). As shown in FIG. 4D, the first avatar 401 is facing the direction of the second avatar 402, and hence the association determination processing proceeds to the step S307.
In the step S306, the association determining unit 115 determines whether or not the second avatar 402 is placing his/her hand on his/her ear toward the first avatar 401, which is the sound source of the first voice data. As shown in FIG. 4E, similar to the cases described above, the second avatar 402 is not placing his/her hand on his/her ear toward the first avatar 401, and hence the association determination processing proceeds to the step S308.
In the step S308, the association determining unit 115 determines whether or not the first avatar 401 and the second avatar 402 are in a friendly relationship. However, here as well, the first avatar 401 and the second avatar 402 have not been registered as friends with each other, and hence the association determination processing proceeds to the step S309.
In the step S309, the association determining unit 115 determines whether or not the first avatar 401 is facing the direction of the second avatar 402. As shown in FIG. 4E, the first avatar 401 is not facing the direction of the second avatar 402, and hence the association determination processing proceeds to the step S310.
In the step S310, the association determining unit 115 determines whether or not the name 404 of the second avatar 402 is included in the first voice data whose sound source is the first avatar 401. As shown in FIG. 4E, the voice uttered by the user of the first user terminal 100-1 includes the name 404 “B-san” of the second avatar 402, and hence the association determination processing proceeds to the step S307.
Thus, according to the present embodiment, in the case that it is determined that the first voice data is associated with the second avatar and it is analyzed that the second voice data disturbs the first voice data, the sound volume of the voice data to be transmitted to the second user terminal 100-2 is adjusted. As a result, even in a noisy environment in the virtual space, a user A is able to establish a conversation with a desired person, a user B.
In the above description, the adjustment of the sound volume of the voice data to be transmitted to the second user terminal 100-2 is automatically performed by the server 111, and the user does not know whether the sound volume has been adjusted or not. However, it may be possible to store information indicating that the sound volume adjusting unit 117 has adjusted the sound volume of the voice data to be transmitted to the second user terminal 100-2, and the first user terminal 100-1 and the second user terminal 100-2 may be notified that the sound volume has been adjusted. As a result, the user of the first user terminal 100-1 is able to know that the sound volume of the voice uttered by the user of the first user terminal 100-1 has been adjusted so that it can be easily heard by the user of the second user terminal 100-2. In addition, in the case that the sound volume of the second voice data is decreased in the step S305, the user of the second user terminal 100-2 is able to know the reason why the sound volume of the second voice data is decreased.
In addition, in the above description, in the case that it is determined that the first voice data is associated with the second avatar and it is analyzed that the second voice data disturbs the first voice data, the sound volume of the voice data to be transmitted to the second user terminal 100-2 is always adjusted. However, instead of always automatically adjusting the sound volume, the sound volume adjusting unit 117 may transmit an inquiry to the user of the second user terminal 100-2 as to whether or not to adjust the sound volume, and allow the user to choose whether or not to adjust the sound volume. In this case, in the step S305, instead of adjusting the sound volume of the voice data to be transmitted to the second user terminal 100-2, the sound volume adjusting unit 117 stores information indicating that the sound volume should be adjusted and an adjustment amount of the sound volume of the voice data. Thereafter, the voice data with the unadjusted sound volume and the inquiry for prompting the user to choose whether or not to adjust the sound volume are transmitted to the second user terminal 100-2. In response to this inquiry, the input/output unit 112-2 of the second user terminal 100-2 causes the HMD 106-2 to display a dialog box prompting the user to choose whether or not to adjust the sound volume, allowing the user to make the choice with the controller 109-2. In the case that the user has chosen to adjust the sound volume, a sound volume adjustment request is transmitted to the server 111, and the sound volume adjusting unit 117 adjusts the sound volume of the voice data by the stored adjustment amount and then transmits the voice data with the adjusted sound volume to the second user terminal 100-2.
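The deferred-adjustment variant described above can be sketched as follows; the class name, method names, and message shapes are hypothetical.

```python
class PendingAdjustment:
    """Store the adjustment amount in the step S305 instead of applying it,
    and apply it only if the user of the second user terminal accepts."""

    def __init__(self):
        self.pending = {}  # terminal id -> stored adjustment amount

    def defer(self, terminal_id: str, amount: float):
        # Record that the sound volume *should* be adjusted, and by how much.
        self.pending[terminal_id] = amount

    def on_user_choice(self, terminal_id: str, accepted: bool, voice: dict):
        # Called when the dialog box on the HMD returns the user's choice.
        amount = self.pending.pop(terminal_id, None)
        if accepted and amount is not None:
            voice["volume"] *= amount  # apply the stored amount, then resend
        return voice

server = PendingAdjustment()
server.defer("terminal-2", 2.0)
print(server.on_user_choice("terminal-2", True, {"volume": 1.0}))  # volume 2.0
```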
In addition, in the above description, the determinations of the steps S306 to S310 of the association determination processing are performed in the order of the steps, but they may be performed based on “an OR condition”, in which the determinations of the steps S306 to S310 are performed simultaneously and the processing proceeds to the step S307 when any one of the determinations of the steps S306 to S310 is established. Alternatively, the determinations of the steps S306 to S310 of the association determination processing may be performed based on “an AND condition”, in which the processing proceeds to the step S307 when two or more of the determinations of the steps S306 to S310 are established simultaneously. For example, in the case of the AND condition of the steps S308 and S309, the processing may proceed to the step S307 in the case that the first avatar and the second avatar are in a friendly relationship and the first avatar faces the second avatar.
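These sequential, OR, and AND combinations can be sketched with ordinary predicate combinators; the function name and the `mode` flag are illustrative assumptions.

```python
def associated(checks, mode: str = "or"):
    """Combine the determinations of the steps S306 to S310.
    `checks` is a list of zero-argument predicates, one per determination."""
    if mode == "or":
        # Proceed to the step S307 as soon as any determination holds
        # (this also covers evaluating them one by one in step order).
        return any(check() for check in checks)
    if mode == "and":
        # Require the selected determinations to hold simultaneously.
        return all(check() for check in checks)
    raise ValueError(f"unknown mode: {mode}")

# AND condition of the steps S308 and S309: friendly AND facing.
friendly = lambda: True   # stub for the S308 result
facing = lambda: True     # stub for the S309 result
print(associated([friendly, facing], mode="and"))  # True
```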
In addition, in the above description, the adjustment of the sound volume of the voice data to be transmitted to the second user terminal 100-2 is performed by the server 111, but may be performed by the second user terminal 100-2. In this case, the server 111 transmits, to the second user terminal 100-2, the voice data with the unadjusted sound volume and an adjustment amount of the sound volume of each piece of the voice data.
In addition, in the above description, a client-server system in which the server 111 exists has been described, but a peer-to-peer system may also be used. In this case, the server 111 does not exist, and the functional configuration and the processing of the server 111 are performed by the user terminal 100 instead of the server 111.
Embodiment(s) of the present invention can also be realized by a computer of a system or apparatus that reads out and executes computer executable instructions (e.g., one or more programs) recorded on a storage medium (which may also be referred to more fully as a ‘non-transitory computer-readable storage medium’) to perform the functions of one or more of the above-described embodiment(s) and/or that includes one or more circuits (e.g., application specific integrated circuit (ASIC)) for performing the functions of one or more of the above-described embodiment(s), and by a method performed by the computer of the system or apparatus by, for example, reading out and executing the computer executable instructions from the storage medium to perform the functions of one or more of the above-described embodiment(s) and/or controlling the one or more circuits to perform the functions of one or more of the above-described embodiment(s). The computer may comprise one or more processors (e.g., central processing unit (CPU), micro processing unit (MPU)) and may include a network of separate computers or separate processors to read out and execute the computer executable instructions. The computer executable instructions may be provided to the computer, for example, from a network or the storage medium. The storage medium may include, for example, one or more of a hard disk, a random-access memory (RAM), a read only memory (ROM), a storage of distributed computing systems, an optical disk (such as a compact disc (CD), digital versatile disc (DVD), or Blu-ray Disc (BD)™), a flash memory device, a memory card, and the like.
While the present invention has been described with reference to exemplary embodiments, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.
This application claims the benefit of Japanese Patent Application No. 2023-172214, filed on Oct. 3, 2023, which is hereby incorporated by reference herein in its entirety.